Topics in Statistical Genetics

Semester 2 2022
Sunday 13-16, Schreiber 007
Home page on ~saharon/StatsGenetics.html
Lecturer: Saharon Rosset
Schreiber 022
Office hrs: By appointment.

Announcements and handouts

(20 February)
In class we did a quick introduction to genetics on the board and with the following presentation: Class 1 presentation.
We then started discussing the problem of time estimation under molecular clock assumptions, and covered the first part of this Class note.
Some general interest reading: This New Yorker article on using genetics to catch criminals (thanks to Anna Dunietz for sending the link).
(27 February)
In this class we analyzed the molecular clock example we started last week, using this Class note.
In our effort to understand mtDNA evolution and estimate the distribution of rates, we reviewed this African mtDNA paper and used its data.
(1 March)
Homework 1 due 20 March before class. Resources for this homework:
mtDNA mutation counts for problem 1.
mtDNA loci list for problem 1.
The paper by Whittaker et al. (2003) for problem 3 is available in pdf or html.
(6 March)
Today our discussion focused on "classical" substitution model theory using this Class note and additional sources:
Review on nucleotide substitution models and ML fitting by Huelsenbeck and Crandall.
Whittaker et al. (2003) paper estimating mutation models for STRs.
(13 March)
Class note on phylogenetic reconstruction.
Reading materials on phylogenetic reconstruction:
review by Huelsenbeck and Crandall
Inferring phylogenies book by Felsenstein.
(20 March)
Class note: introduction to discovering phenotype-genotype connections.
R code for analyzing the kidney disease data shown in class.
(21 March)
Homework 2 due 10 April (UPDATE: submission extended to 24 April). PHYLIP homepage for problem 1.
The primate data for problem 1.
HapMap YRI Chromosome 22 dataset for problem 3.
(27 March)
Class note: LD, multiple testing, stratification (including carry-over material from last week's note).
R code for analyzing the kidney disease data shown in class.
Introduction to GWAS presentation covering many of the topics at a lighter level.
(3 April)
Material: the class note and presentation from last week.
(24 April)
Material is mostly carry-over from the previous note: testing approaches and stratification estimation using EM.
The EM solution is based on Estimation of individual admixture: analytical and study design considerations by Tang et al.
R code implementing the approach.
Homework 3 due 11 May (note this is a Wednesday; the last HW will be handed out that week).
(1 May)
Class note on PCA. As time permits we will also start discussing heritability using this note.
PCA in GWAS: Genes Mirror Geography Within Europe by Novembre et al.
Code: Running PCA on the movies example. Comparing EM and PCA for genetic ancestry estimation.
PCA Corrects for Stratification by Price et al.
(8 May)
Class note on heritability: definitions, traditional methods and approach based on LMM.
Height GWAS papers: Weedon et al., Lettre et al., Gudbjartsson et al.
(11 May)
Homework 4 due 29 May before last class (note no extensions will be given).
(15 May)
Class note on heritability estimation in GWAS.
Yang et al.'s famous paper on using the LMM approach for estimating heritability.
Some material on MCMC which we probably won't have time to cover: Introductory presentation, and R code for the cipher example mentioned in the presentation.
(22 May)
Class note on heritability estimation in disease GWAS.
Lee et al.'s paper on estimating disease heritability by fitting an LMM to the observed 0-1 data, then applying the Dempster-Lerner correction.
Paper by Golan et al. (2014) on estimating heritability of disease phenotypes.
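As a rough illustration of the Dempster-Lerner correction mentioned above, the sketch below converts a heritability estimate on the observed 0-1 scale to the liability scale. It assumes the standard liability-threshold model with population prevalence K, and optionally applies the additional case-control ascertainment factor used by Lee et al. for a sample with case proportion P. The function name liability_h2 and its argument names are illustrative only and are not taken from the class note or the course's R code.

```python
from statistics import NormalDist


def liability_h2(h2_obs, K, P=None):
    """Convert observed-scale heritability to the liability scale.

    h2_obs : heritability estimated on the observed 0-1 scale
    K      : disease prevalence in the population (0 < K < 1)
    P      : proportion of cases in the sample; if None, the sample is
             assumed to be a random (non-ascertained) population sample
    """
    std_normal = NormalDist()
    # Liability threshold t such that P(liability > t) = K.
    t = std_normal.inv_cdf(1.0 - K)
    # Height of the standard normal density at the threshold.
    z = std_normal.pdf(t)
    # Dempster-Lerner factor for a random population sample.
    factor = K * (1.0 - K) / z ** 2
    if P is not None:
        # Extra factor for case-control ascertainment (cases oversampled).
        factor *= K * (1.0 - K) / (P * (1.0 - P))
    return h2_obs * factor


if __name__ == "__main__":
    # Hypothetical inputs: 1% prevalence, balanced case-control sample.
    print(liability_h2(0.3, K=0.01, P=0.5))
```

When P equals K the ascertainment factor is 1, so the function reduces to the plain Dempster-Lerner correction.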
(29 May) Zoom link for today's class
The famous Science paper on the Neanderthal genome, including the admixture analyses, specifically Fig. 5 and Supp. notes 15, 16, 18.
The Genetics paper about which form the human-neanderthal admixture was likely to take.
Earlier Genetics paper describing the approach of calculating likelihoods with the generating function.


The goal of this course is to introduce some of the major topics in Genetics, and gain a statistical perspective on them.
We will start with a brief introduction to Genetics concepts, and gradually start elaborating on statistical aspects of the questions that come up. As needed, we will introduce relevant areas of statistics in some detail.
In the latter part of the course we will pick a hot current research topic and concentrate on it for a few weeks.
The final grade will be based on a combination of homework (3-4), a final take-home exam, and possibly a class presentation.
Tentative topics list (each topic 1-2 weeks):


Basic knowledge of mathematical foundations: Calculus; Linear Algebra
Undergraduate courses in: Probability; Theoretical Statistics
Statistical programming experience in R is an advantage
Prior basic knowledge in Biology and Genetics is an advantage


There will be four homework assignments, which will count for about 30% of the final grade, and a final take-home project. Both the homework and the project will combine theoretical analysis with hands-on data analysis.

Some recommended books

Human Evolutionary Genetics by Jobling, Hurles and Tyler-Smith
An excellent introduction to Human Genetics, with a quantitative flavor
Principles of Population Genetics by Hartl and Clark
Comprehensive overview of computational methods in Genetics
Statistical Methods in Molecular Evolution edited by R. Nielsen
Collection of tutorials and reviews on major topics in Statistical Genetics


The course will require some use of statistical modeling software. It is strongly recommended to use R (freely available for PC/Unix/Mac).
The R Project website also contains extensive documentation.
A basic "getting you started in R" tutorial. Uses the Boston Housing Data (thanks to Giles Hooker).
Modern Applied Statistics with Splus by Venables and Ripley is an excellent source for statistical computing help for R/Splus.

File translated from TEX by TTH, version 4.12.
On 29 May 2022, 10:09.