Topics in Statistical Genetics
Semester 2 2024
Wednesday, 13-16, Kaplun 118
Home page on http://www.tau.ac.il/~saharon/StatsGenetics.html
Lecturer: Saharon Rosset
Schreiber 203
saharon@tauex.tau.ac.il
Office hours: By appointment.
Final signup form (deadline 1/8/24)
(29 May 2024)
In class we have a quick introduction to genetics on the board and with the following presentation:
Class 1 presentation.
We then start discussing the problem of time estimation under molecular clock assumptions, covering
this Class note.
Some general interest reading: This New Yorker article on using genetics to catch criminals.
(5 June 2024)
We analyze the molecular clock example we started last week, using this Class note.
To understand mtDNA evolution and estimate the distribution of rates, we use this
African mtDNA paper and analyze its data.
(19 June 2024)
We will finish the molecular clock analysis we did, then move on to a more fundamental discussion
of nucleotide substitution models using this class note.
Review on nucleotide substitution models and ML fitting by Huelsenbeck and Crandall.
Whittaker et al. (2003) paper estimating mutation models for STRs.
Homework 1 due 3 July in class. Resources for this homework:
mtDNA mutation counts for problem 1.
mtDNA loci list for problem 1.
The paper by Whittaker et al. (2003) for problem 3 is available in pdf or html.
(26 June 2024)
In the first part we will discuss STR mutation models using last week’s class note and the Whittaker
et al. (2003) paper estimating mutation models for STRs.
Then we will switch to discussing phylogenetic tree reconstruction using this class note.
Reading materials on phylogenetic reconstruction:
Review by Huelsenbeck and Crandall
Inferring Phylogenies book by Felsenstein.
(3 July 2024)
We will complete the discussion of phylogenetic tree reconstruction using this class
note.
Reading materials on phylogenetic reconstruction: Review by Huelsenbeck and Crandall
We will then switch to discussing Genotype-Phenotype modeling in Genome Wide Association
Studies (GWAS), using this class note.
This presentation gives an accessible, popular-level introduction to this area.
(6 July 2024)
Homework 2 due 24 July before class. Resources for this homework:
The program PHYLIP
14-species primates+mammals mtDNA database, with documentation.
dnamlk help page.
For problem 2: HapMap Yoruban haplotype data on Chromosome 22 (note that individuals are in
columns and SNPs in rows; each entry is two letters separated by a space, i.e., a genotype, and
entries are separated by tabs).
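The layout described above can be read with a few lines of code. The following is a minimal sketch in Python (R would work equally well), not part of the homework resources. It assumes the file has no header row and no SNP-ID column, which may not hold for the actual HapMap file; the function names and the toy data are made up for illustration.

```python
# Toy parser for a tab-separated genotype table:
# SNPs in rows, individuals in columns, each entry "X Y" is one genotype.
# Assumption: no header row and no SNP-ID column (the real file may differ).

def parse_genotype_line(line):
    """Split one SNP row into a list of (allele1, allele2) tuples, one per individual."""
    entries = line.rstrip("\n").split("\t")                 # entries are tab-separated
    return [tuple(entry.split(" ")) for entry in entries]   # alleles are space-separated


def parse_genotypes(text):
    """Parse the whole table into a list of SNP rows, skipping blank lines."""
    return [parse_genotype_line(line) for line in text.splitlines() if line.strip()]


def allele_counts(row):
    """Count how often each allele letter appears in one SNP row."""
    counts = {}
    for genotype in row:
        for allele in genotype:
            counts[allele] = counts.get(allele, 0) + 1
    return counts


# Made-up toy data: 2 SNPs (rows) x 3 individuals (columns).
toy = "A G\tA A\tG G\nC C\tC T\tC C\n"
rows = parse_genotypes(toy)
print(rows[0])                 # genotypes of the three individuals at the first SNP
print(allele_counts(rows[1]))  # allele counts at the second SNP
```

Splitting on tabs first and on spaces second mirrors the two-level structure of the file, so a genotype is never confused with two separate entries.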
(10 July 2024)
We will solve HW1 in class and discuss it.
We will then discuss Genotype-Phenotype modeling in Genome Wide Association Studies
(GWAS), using this class note.
This presentation gives an accessible, popular-level introduction to this area.
(17 July 2024)
Zoom link.
We discussed LD last time. In this week’s note we focus on testing and accounting
for multiplicity and, as time permits, will start talking about stratification / ancestry
estimation.
R code for analyzing the kidney disease data.
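As a small illustration of what accounting for multiplicity means when many SNPs are tested, here is a minimal sketch of the Bonferroni correction in Python. It is one standard approach and is not necessarily the one developed in the class note; the function names and the p-values are made up for illustration.

```python
# Bonferroni correction: to keep the family-wise error rate at most alpha
# across n_tests hypotheses, test each one at level alpha / n_tests.

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


def bonferroni_reject(p_values, alpha=0.05):
    """Return one True/False rejection decision per p-value."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Made-up p-values for four hypothetical SNP association tests.
toy_p = [1e-9, 0.003, 0.04, 0.5]
decisions = bonferroni_reject(toy_p)
print(decisions)

# With one million tests, the per-test level becomes very small.
print(bonferroni_threshold(0.05, 1_000_000))
```

With four tests the per-test level is 0.05 / 4 = 0.0125, so only the first two toy p-values lead to rejection.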
(24 July 2024)
Zoom link.
We will discuss admixture/stratification estimation using expectation-maximization (EM), in the
second half of last week’s note.
The EM solution is based on Estimation of individual admixture: analytical and study design
considerations by Tang et al.
R code implementing the approach.
R code for analyzing the kidney disease data using EM estimates.
(24 July 2024)
Homework 3 due 12 August before class (no extensions as the semester ends and I will discuss it
in this class). Resources for this homework:
EM code
Code for problem 2
(31 July 2024)
The first two hours will be pre-recorded on Thursday 25 July, 16-18, Seminar room in Schreiber
basement, Zoom link.
The third hour will be a live Zoom meeting at the regular hour 15-16 on 31 July, Zoom
link.
In this class we will discuss Principal Component Analysis (PCA) and its applications in
stratification estimation and beyond, using this note.
Resources and reading materials:
PCA in GWAS: Genes Mirror Geography Within Europe by Novembre et al.
Code: Running PCA on a movies example. Comparing EM and PCA for genetic ancestry
estimation.
PCA Corrects for Stratification by Price et al.
(7 August 2024)
The first two hours will be pre-recorded on Friday 26 July, 9-11, Seminar room in Schreiber
basement, Zoom link.
The third hour will be a live Zoom meeting at the regular hour 15-16 on 7 August, Zoom
link.
In this class we will introduce heritability estimation using this note.
Resources:
Height GWAS’s: Weedon et al., Lettre et al., Gudbjartsson et al.
Yang et al.’s famous paper on using the LMM approach for estimating heritability
(12 August 2024)
The first two hours will be pre-recorded on Thursday 1 August, 17:30-19:30, Zoom link.
The third hour will be a live Zoom meeting at the regular hour 15-16 on 12 August, Zoom link.
We will discuss in some detail the famous Science paper on the Neanderthal genome, including the
admixture analyses, specifically Fig. 5 and Supp. notes 15 and 18.
In the third hour, we will summarize the course and discuss HW3.
The goal of this course is to introduce some of the major topics in Genetics, and gain a statistical
perspective on them.
We will start with a brief introduction to Genetics concepts, and gradually start elaborating on
statistical aspects of the questions that come up. As needed, we will introduce relevant areas of
statistics in some detail.
In the latter part of the course we will pick a hot current research topic and concentrate on it for a
few weeks.
The final grade will be based on a combination of homework (3-4), a final take-home exam, and
possibly a class presentation.
Tentative topics list (each topic 1-2 weeks):
Introduction to Genetics and quantitative Genetics
Mutation models: stochastic processes; estimation from data
Phylogenetic analysis: algorithms and inference
Human population genetics: statistical inference about human history
Estimation of ancestry
Principal component analysis in Genetics
Genome-wide association studies (GWAS)
Major public data sources like HapMap and the 1000 Genomes Project, and their analysis
Linear mixed models (LMM) in Genetics
Basic knowledge of mathematical foundations: Calculus; Linear Algebra
Undergraduate courses in: Probability; Theoretical Statistics
Statistical programming experience in R is an advantage
Prior basic knowledge in Biology and Genetics is an advantage
There will be three or four homework assignments, which will count for about 30% of the final grade, and a final take-home project. Both the homework and the project will combine theoretical analysis with hands-on data analysis.
Human Evolutionary Genetics by Jobling, Hurles and Tyler-Smith
An excellent introduction to Human Genetics, with a quantitative flavor
Principles of Population Genetics by Hartl and Clark
Comprehensive overview of computational methods in Genetics
Statistical Methods in Molecular Evolution edited by R. Nielsen
Collection of tutorials and reviews on major topics in Statistical Genetics
The course will require some use of statistical modeling software. It is strongly recommended to use
R (freely available for PC/Unix/Mac).
The R Project website also contains extensive documentation.
A basic “getting you started in R” tutorial. Uses the Boston Housing Data (thanks to Giles
Hooker).
Modern Applied Statistics with S-PLUS by Venables and Ripley is an excellent source for statistical
computing help for R/S-PLUS.