Semester 2 2022

Sunday 13-16, Schreiber 007

Home page on http://www.tau.ac.il/~saharon/StatsGenetics.html

Lecturer: Saharon Rosset, Schreiber 022, saharon@tauex.tau.ac.il

Office hrs: By appointment.

In class we did a quick introduction to genetics on the board and with the following presentation: Class 1 presentation.

We then started discussing the problem of time estimation under molecular clock assumptions, covering the first part of this Class note.
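As a rough illustration of time estimation under a molecular clock, the sketch below assumes the simplest possible setting: two sequences that diverged T years ago, a known constant mutation rate per site per year, and a Poisson number of observed differences. The function name, the model details, and the numbers are assumptions made for this example, not taken from the class note.

```python
import math

def estimate_divergence_time(num_differences, num_sites, rate_per_site_per_year):
    """Estimate the time since two lineages split under a strict molecular clock.

    Assumed model (illustrative): the number of differences between the two
    sequences is Poisson with mean 2 * rate * num_sites * T, ignoring
    repeated mutations at the same site.
    """
    # Expected number of new differences accumulated per year on both branches.
    differences_per_year = 2.0 * rate_per_site_per_year * num_sites
    # Maximum likelihood estimate of T for a Poisson count.
    t_hat = num_differences / differences_per_year
    # Plug-in standard error: Var(count) = mean, estimated by the count itself.
    std_error = math.sqrt(num_differences) / differences_per_year
    return t_hat, std_error

# Hypothetical numbers, chosen only for illustration.
t_hat, std_error = estimate_divergence_time(
    num_differences=20, num_sites=10_000, rate_per_site_per_year=1e-8
)
print(f"Estimated divergence time: {t_hat:.0f} years (SE about {std_error:.0f})")
```

The factor 2 reflects that mutations accumulate independently on both branches leading back to the common ancestor.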

In this class we analyzed the molecular clock example we started last week, using this Class note.

In our effort to understand mtDNA evolution and estimate the distribution of rates, we reviewed this African mtDNA paper and used its data.

Homework 1 due 20 March before class. Resources for this homework:

mtDNA mutation counts for problem 1.

mtDNA loci list for problem 1.

The paper by Whittaker et al. (2003) for problem 3 is available in pdf or html.

Today our discussion focused on "classical" substitution model theory using this Class note and additional sources:

Review on nucleotide substitution models and ML fitting by Huelsenbeck and Crandall.

Whittaker et al. (2003) paper estimating mutation models for STRs.
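For readers who want a concrete instance of a classical substitution model, here is a minimal sketch of the Jukes-Cantor (JC69) model, the simplest of the nucleotide models covered in reviews such as Huelsenbeck and Crandall's. It assumes every nucleotide changes to each of the other three at the same rate alpha; the function names are illustrative and not taken from the class note.

```python
import math

def jc69_transition_probs(alpha, t):
    """Return (P(same base), P(one specific different base)) after time t.

    Under JC69 each base changes to each of the 3 others at rate alpha.
    """
    decay = math.exp(-4.0 * alpha * t)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff

def jc69_distance(p_observed):
    """Expected substitutions per site given the observed fraction of differing sites.

    This corrects for multiple substitutions at the same site; the formula is
    only defined for observed fractions below 3/4.
    """
    if not 0.0 <= p_observed < 0.75:
        raise ValueError("observed proportion must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p_observed / 3.0)

# Round trip with hypothetical values: 3 * alpha * t substitutions per site.
alpha, t = 1e-3, 100.0
p_same, p_diff = jc69_transition_probs(alpha, t)
recovered = jc69_distance(3.0 * p_diff)
print(f"true distance {3 * alpha * t:.4f}, recovered {recovered:.4f}")
```

The distance correction matters because the raw fraction of differing sites saturates at 3/4 as time grows, while the true number of substitutions per site keeps increasing.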

Class note on phylogenetic reconstruction.

Reading materials on phylogenetic reconstruction:

Review by Huelsenbeck and Crandall.

Inferring Phylogenies, book by Felsenstein.

Class note: introduction to discovering phenotype-genotype connections.

R code for analyzing the kidney disease data shown in class.

Homework 2 due 10 April (UPDATE: submission extended to 24 April). PHYLIP homepage for problem 1.

The primate data for problem 1.

HapMap YRI Chromosome 22 dataset for problem 3.

Class note: LD, multiple testing, stratification (including carry-over material from last week's note).

R code for analyzing the kidney disease data shown in class.

Introduction to GWAS presentation, covering many of the topics at a lighter level.

Material: the class note and presentation from last week.

Material is mostly carried over from the previous note: testing approaches and stratification estimation using EM.

The EM solution is based on Estimation of individual admixture: analytical and study design considerations by Tang et al.

R code implementing the approach.
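To make the EM idea concrete, the sketch below estimates one individual's admixture proportions when the ancestral allele frequencies are treated as known. This is a simplified setting chosen for illustration: it is not claimed to be the algorithm of Tang et al. or the course R code, and the function name and data layout are assumptions.

```python
def em_admixture(allele_copies, ancestral_freqs, n_iter=200):
    """EM estimate of admixture proportions q for a single individual.

    allele_copies: list of (marker_index, allele) pairs, allele in {0, 1},
        one pair per observed allele copy.
    ancestral_freqs: ancestral_freqs[k][j] is the frequency of allele 1 at
        marker j in ancestral population k, assumed known and strictly
        between 0 and 1 (so no likelihood term is exactly zero).
    """
    n_pops = len(ancestral_freqs)
    q = [1.0 / n_pops] * n_pops  # start from equal proportions
    for _ in range(n_iter):
        totals = [0.0] * n_pops
        for j, allele in allele_copies:
            # E-step: posterior probability that this copy came from each population.
            weights = [
                q[k] * (ancestral_freqs[k][j] if allele == 1 else 1.0 - ancestral_freqs[k][j])
                for k in range(n_pops)
            ]
            norm = sum(weights)
            for k in range(n_pops):
                totals[k] += weights[k] / norm
        # M-step: new proportions are the average posterior memberships.
        q = [total / len(allele_copies) for total in totals]
    return q

# Hypothetical data: 2 ancestral populations, 3 markers, 2 allele copies per marker.
freqs = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.3]]
copies = [(0, 1), (0, 1), (1, 1), (1, 0), (2, 1), (2, 1)]
print(em_admixture(copies, freqs))
```

Each allele copy is treated as an independent draw from a mixture over ancestral populations, which is what makes the E-step a simple Bayes-rule calculation.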

Homework 3 due 11 May (note this is a Wednesday; the last HW will be handed out that week).

Class note on PCA. As time permits we will also start discussing heritability using this note.

PCA in GWAS: Genes Mirror Geography Within Europe by Novembre et al.

Code: Running PCA on movies example. Comparing using EM and PCA on genetic ancestry estimation.

PCA Corrects for Stratification by Price et al.

Class note on heritability: definitions, traditional methods and approach based on LMM.

Height GWASs: Weedon et al., Lettre et al., Gudbjartsson et al.

Homework 4 due 29 May before last class (note no extensions will be given).

Class note on heritability estimation in GWAS.

Yang et al.'s famous paper on using the LMM approach for estimating heritability.

Some material on MCMC which we probably won't have time to cover: Introductory presentation, and R code for the cipher example mentioned in the presentation.

Class note on heritability estimation in disease GWAS.

Lee et al.'s paper on estimating disease heritability by fitting an LMM to the observed 0-1 data, then applying the Dempster-Lerner correction.

Paper by Golan et al. (2014) on estimating heritability of disease phenotypes.
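The Dempster-Lerner correction mentioned above converts a heritability estimate on the observed 0-1 scale to the liability scale. The sketch below uses the commonly quoted form of that transformation, including the extra factor for case-control ascertainment; the function and parameter names are assumptions made for this example, and the class note should be consulted for the exact derivation used in the course.

```python
from statistics import NormalDist

def liability_scale_h2(h2_observed, prevalence, case_fraction=None):
    """Convert observed-scale (0-1) heritability to the liability scale.

    prevalence: population prevalence K of the disease, 0 < K < 1.
    case_fraction: proportion P of cases in the sample; defaults to K,
        which corresponds to a random (unascertained) population sample.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    k = prevalence
    p = k if case_fraction is None else case_fraction
    std_normal = NormalDist()
    # Liability threshold t such that P(liability > t) = K.
    threshold = std_normal.inv_cdf(1.0 - k)
    # Height of the standard normal density at the threshold.
    z = std_normal.pdf(threshold)
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))

# Hypothetical inputs, for illustration only.
print(liability_scale_h2(h2_observed=0.1, prevalence=0.01))
print(liability_scale_h2(h2_observed=0.1, prevalence=0.01, case_fraction=0.5))
```

With `case_fraction` left at its default the expression reduces to h2_observed * K(1-K) / z^2, the form without ascertainment.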

The famous Science paper on the Neanderthal genome, including the admixture analyses, specifically Fig. 5 and Supp. notes 15, 16, 18.

The Genetics paper on which form the human-Neanderthal admixture was likely to take.

Earlier Genetics paper describing the approach of calculating likelihoods with the generating function.

We will start with a brief introduction to Genetics concepts, and gradually start elaborating on statistical aspects of the questions that come up. As needed, we will introduce relevant areas of statistics in some detail.

In the latter part of the course we will pick a hot current research topic and concentrate on it for a few weeks.

The final grade will be based on a combination of homework (3-4), a final take home exam, and possibly a class presentation.

Tentative topics list (each topic 1-2 weeks):

- Introduction to Genetics and quantitative Genetics
- Mutation models: stochastic processes; estimation from data
- Phylogenetic analysis: algorithms and inference
- Human population genetics: statistical inference about human history
- Estimation of ancestry
- Principal component analysis in Genetics
- Genome-wide association studies (GWAS)
- Major public data sources like HapMap and the 1000 Genomes Project, and their analysis
- Linear mixed models (LMM) in Genetics

Undergraduate courses in: Probability; Theoretical Statistics

Statistical programming experience in

Prior basic knowledge in Biology and Genetics is an advantage

An excellent introduction to Human Genetics, with a quantitative flavor

Comprehensive overview of computational methods in Genetics

Collection of tutorials and reviews on major topics in Statistical Genetics

R Project website also contains extensive documentation.

A basic "getting you started in R" tutorial. Uses the Boston Housing Data (thanks to Giles Hooker).

File translated from T

On 29 May 2022, 10:09.