# Introduction to Statistical Learning

Semester 2, 2015/16
Tuesday 14-17, Dan-David 204
Lecturer: Saharon Rosset, Schreiber 022, saharon@post.tau.ac.il
Office hours: Thursday 16-17 or by appointment (coordination needed in any case).

## Announcements and handouts

(8 March) R code from class to analyze the prostate data using least squares regression and nearest neighbors.
(15 March) R code from class to analyze the advertising data using least squares regression.
(18 March) Homework 1, due on 5/4 in class. k-NN R code to be used for problem 1.
(22 March) Nature paper from 2009 introducing Google Flu Trends (GFT)
(29 March) R code from class to analyze the default data using logistic regression and LDA.
(5 April) R code for Chapter 4 from the book website to demonstrate classification methods.
(19 April) Homework 2, due on 3/5 in class.
(3 May) R code from class verifying the theoretical result on optimism.
(8 May) Homework 3, due on 24/5 in class.
(25 May) Homework 4, due on 9/6 in my mailbox on floor 1 of Schreiber (or 7/6 in class). It uses the code boost.r.
(7 June) Bootstrap presentation from class

## Syllabus

The goal of this course is to introduce the basic ideas of "modern" statistical learning and predictive modeling, from a statistical, theoretical and computational perspective, together with applications in big data.
The topics we will cover include:
• Introduction: some examples of problems in regression and classification; focus on Google Flu Trends (GFT)
• Basic methods for regression: Linear regression and local (neighbor-based) methods
• Basic methods for classification: Logistic regression and discriminant analysis
• Resampling methods: cross validation and bootstrap
• Model selection and regularization
• Modern methods and their applications: trees, support vector machines
Both the class material and homework will combine theoretical aspects with practical implementation aspects and demonstrations on data.

## Prerequisites

Basic knowledge of mathematical foundations: Calculus; Linear Algebra
Undergraduate courses in: Probability; Regression; Theoretical Statistics (possibly in parallel)
Statistical programming experience in R is an advantage

## Textbook

An Introduction to Statistical Learning with Applications in R by James, Witten, Hastie and Tibshirani.

## Computing

The book labs, all class demonstrations and any code given in the homework will be in R (freely available for PC/Unix/Mac). Using R is not required, but it is highly recommended.
The R Project website also contains extensive documentation.
A basic "getting you started in R" tutorial, which uses the Boston Housing data (thanks to Giles Hooker).
Modern Applied Statistics with S-PLUS by Venables and Ripley is an excellent source of statistical computing help for R/S-PLUS.

File translated from TEX by TTH, version 4.08.
On 13 Jun 2016, 17:02.