Topics in Statistical Genetics
 
Semester 1 2025-6
Thursday, 13-16, Dan-David 211
Home page on http://www.tau.ac.il/~saharon/StatsGenetics.html

 

Lecturer: Saharon Rosset
Schreiber 203
saharon@tauex.tau.ac.il
Office hrs: By appointment.

Announcements and handouts

(30 Oct 2025)
In class we have a quick introduction to genetics, on the board and with the following presentation: Class 1 presentation (a pdf copy with no transitions, in case the browser refuses to download the ppt).
We then start discussing the problem of time estimation under molecular clock assumptions, covering this Class note.
Some general interest reading: This New Yorker article on using genetics to catch criminals.

(6 Nov 2025)
We analyze a molecular clock estimation problem, using this Class note.
In our effort to understand mtDNA evolution and estimate the distribution of rates, we use this African mtDNA paper and analyze its data.

(9 Nov 2025)
Homework 1 due 23 Nov. Resources for this homework:
mtDNA mutation counts for problem 1.
mtDNA loci list for problem 1.
The paper by Whittaker et al. (2003) for problem 3 is available in pdf or html.

(13 Nov 2025)
We will finish the molecular clock analysis we did, then move on to a more fundamental discussion of nucleotide substitution models using this class note.
Review on nucleotide substitution models and ML fitting by Huelsenbeck and Crandall.
Whittaker et al. (2003) paper estimating mutation models for STRs.
 

(20 Nov 2025)
In the first part we completed the discussion of STR mutation models using last week’s class note and the Whittaker et al. (2003) paper estimating mutation models for STRs.
Then we switched to discussing phylogenetic tree reconstruction using this class note.
Reading materials on phylogenetic reconstruction:
Review by Huelsenbeck and Crandall
Inferring Phylogenies book by Felsenstein.

(27 Nov 2025)
We will complete the discussion of phylogenetic tree reconstruction using this class note.
Reading materials on phylogenetic reconstruction: Review by Huelsenbeck and Crandall
We will then switch to discussing Genotype-Phenotype modeling in Genome Wide Association Studies (GWAS), using this class note.
This presentation gives a popular-level introduction to this area.

(28 Nov 2025)
Homework 2 due 14 December. Resources for this homework:
The program PHYLIP
14-species primates+mammals mtDNA database, with documentation.
dnamlk help page.
For problem 2: HapMap Yoruban haplotype data on Chromosome 22 (note that individuals are in columns and SNPs are in rows; each entry is a genotype written as two letters separated by a space, and entries are separated by tabs).
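For those working in Python, the layout described above could be read along the following lines. This is only a minimal sketch: the toy data, the function names parse_genotypes and allele_counts, and the assumption that the file has no header row and no SNP identifier column are all illustrative and should be checked against the actual file.

```python
from collections import Counter


def parse_genotypes(text):
    """Parse tab-separated genotype text: one SNP per row, one individual per column.

    Assumes (not verified against the real file) that there is no header row
    and no SNP identifier column.
    """
    snps = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Entries are separated by tabs; each entry is a genotype such as "A G".
        row = [tuple(entry.split()) for entry in line.split("\t")]
        snps.append(row)
    return snps


def allele_counts(snp_row):
    """Count how often each allele appears across all individuals at one SNP."""
    return Counter(allele for genotype in snp_row for allele in genotype)


# Hypothetical toy data: 3 SNPs (rows) x 4 individuals (columns).
sample = "A A\tA G\tG G\tA G\nC C\tC C\tC T\tC C\nT T\tT T\tT T\tT T\n"

snps = parse_genotypes(sample)
# Transpose to get one tuple of genotypes per individual.
by_individual = list(zip(*snps))

print(len(snps), "SNPs,", len(by_individual), "individuals")
print(allele_counts(snps[0]))
```

Splitting on tabs before splitting on spaces matters here, because the space inside each genotype would otherwise be confused with the separator between individuals.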

Syllabus

The goal of this course is to introduce some of the major topics in Genetics, and gain a statistical perspective on them.
We will start with a brief introduction to Genetics concepts, and gradually start elaborating on statistical aspects of the questions that come up. As needed, we will introduce relevant areas of statistics in some detail.
In the latter part of the course we will pick a hot current research topic and concentrate on it for a few weeks.
The final grade will be based on a combination of homework (3-4 assignments), a final take-home exam, and possibly a class presentation.
Tentative topics list (each topic 1-2 weeks):

Prerequisites

Basic knowledge of mathematical foundations: Calculus; Linear Algebra
Undergraduate courses in: Probability; Theoretical Statistics
Statistical programming experience in R is an advantage
Prior basic knowledge in Biology and Genetics is an advantage

Grading

There will be three or four homework assignments, which will count for about 30% of the final grade, and a final take-home project. Both the homework and the project will combine theoretical analysis with hands-on data analysis.

Some recommended books

Human Evolutionary Genetics by Jobling, Hurles and Tyler-Smith
An excellent introduction to Human Genetics, with a quantitative flavor
 
Principles of Population Genetics by Hartl and Clark
Comprehensive overview of computational methods in Genetics
 
Statistical Methods in Molecular Evolution edited by R. Nielsen
Collection of tutorials and reviews on major topics in Statistical Genetics

Computing

The course will require some use of statistical modeling software. Class examples will be given in R (freely available for PC/Unix/Mac).
The R Project website also contains extensive documentation.
Modern Applied Statistics with S-PLUS by Venables and Ripley is an excellent source of statistical computing help for R/S-PLUS.
You are welcome to use Python or other tools as you wish.