Lectures
The schedule below is tentative and subject to change, depending on time and class interests. We will move at a pace dictated by class discussions. Please check this page often for updates.
| Week | Date | Content | 
|---|---|---|
| 1 | 9/24 – 9/26 | Intro to Machine Learning; Nearest Neighbours; Bias-Variance Trade-Off | 
| 2 | 10/1 – 10/3 | Cross Validation | 
| 3 | 10/8 – 10/10 | Decision Trees; Bagging and Random Forests; Boosting and Boosted Additive Models | 
| 4 | 10/15 – 10/17 | Categorical Outcomes and Classification Models | 
| 5 | 10/22 – 10/24 | Logistic Regression; Intro to Neural Networks | 
| 6 | 10/29 – 10/31 | Neural Networks | 
| 7 | 11/5 – 11/7 | Recommender Systems | 
| 8 | 11/12 – 11/14 | Networks | 
| 9 | 11/19 – 11/21 | Naive Bayes; Probabilistic Graphical Models | 
| 10 | 12/3 – 12/5 | Hidden Markov Models (If time permits: anomaly detection in time series) | 
| 11 | 12/11 | Final Project due | 
Weeks 1-2
Lecture Slides: 
Overview 
Introduction to Predictive Models and kNN
R code: 
docv.R 
bias-variance-illustration.R 
BostonHousing_KNN_BiasVarTradeOff_CrossValid.Rmd
- please save as an R Markdown (.Rmd) file on your computer, open it in RStudio, and run it with "Knit PDF"; OR:
- you may also find it in the Programming Scripts > Boston Housing > R folder if you have cloned and synced the course GitHub repo down to your computer
- note: R Markdown requires TeX for conversion to PDF
Python code: 
BostonHousing_KNN_BiasVarTradeOff_CrossValid.ipynb
- please save as an iPython Notebook (.ipynb) file on your computer and open in an iPython Notebook server session; OR:
- you may also find it in the Programming Scripts > Boston Housing > Python folder if you have cloned and synced the course GitHub repo down to your computer
- The iPython NBviewer website allows you to view the notebook in a read-only mode
Homework assignments: 
Optional textbook reading: 
An Introduction to Statistical Learning: Section 2, Section 5.1, Section 8.1
Additional reading:
Machine Learning: Trends, Perspectives, and Prospects 
M. I. Jordan and T. M. Mitchell 
A Science review article from two leading experts in Machine Learning
Week 3
Lecture Slides: 
Trees, Bagging, Random Forests and Boosting
R code: 
knn-bagging.R 
boosting_demo_1D.R 
boosting_demo_2D.R 
BostonHousing_Trees_RandomForests_BoostedAdditiveModels.Rmd
- you can find an advanced version featuring the popular caret package and parallel computation here: BostonHousing_Trees_RandomForests_BoostedAdditiveModels_usingCaretPackage.Rmd
Python code: 
BostonHousing_Trees_RandomForests_BoostedAdditiveModels.ipynb
Homework assignment: See Problem 9.1 in the lecture notes for this week.
Optional textbook reading: An Introduction to Statistical Learning: Chapter 8
Week 4
Lecture Slides: 
Classification 
Perceptron 
Perceptron -- R Markdown Script to recreate slides
R code: 
fglass.R 
04_kaggle_logit_rf_boost.R 
04_simulation_logit_rf_boost.R 
04_tabloid_logit_rf_boost.R 
KaggleCreditScoring_usingCaretPackage.Rmd 
TabloidMarketing_usingCaretPackage.Rmd
- The use of the caret package and parallel computing is highly encouraged from this point on, especially if you decide to train models on a sizeable training data set
Python code: 
KaggleCreditScoring.ipynb 
TabloidMarketing.ipynb
Homework assignment:  
04_hw.pdf 
Start early.
- model answers in R Markdown; note: the whole thing takes a long time to run completely
- model answers in iPython Notebook; note: the whole thing takes a long time to run completely
Optional textbook reading: An Introduction to Statistical Learning: Chapter 4 (we will not talk about linear discriminant analysis)
Pedro Domingos: A Few Useful Things to Know about Machine Learning PDF
D. Sculley et al.: Machine Learning: The High Interest Credit Card of Technical Debt PDF
Week 5
Lecture Slides: 
Logistic regression 
RMarkdown -- Logistic regression 
R code: 
lr_decision_surface.R 
we8there.R 
Optional textbook reading: An Introduction to Statistical Learning: Chapter 4, Section 6.2
Midterm Exam
- model answers in R Markdown
Week 6
Lecture Slides: 
Neural networks 
MNIST example 
R code: 
See our GitHub. 
We suggest that you clone the folder "Lecture06" or download all of its contents, as the folder contains some pretrained models that may take a long time to train again.
To install the h2o package, go to http://h2o-release.s3.amazonaws.com/h2o/master/3232/index.html, click on "INSTALL IN R", and follow the instructions.
Alternatively, you can type the following in R:
source("https://raw.githubusercontent.com/ChicagoBoothML/HelpR/master/booth.ml.packages.R")
Python code: 
MNISTDigits_NeuralNet_KerasPackage.ipynb
Homework assignment:  
- model answers in R Markdown; note: the whole thing takes a long time to run completely
- model answers in iPython Notebook; note: the whole thing takes a long time to run completely
To load data use:
source("ParseData.R")
data <- parse_human_activity_recog_data()
Due Sunday, November 8.
Optional textbook reading: The Elements of Statistical Learning: Sections 11.3 - 11.5
Some h2o resources:
Week 7
Lecture Slides: 
Recommender Systems 
R code: 
simpleScript.R: a toy example illustrating how to compute similarities between users, recommend items, and predict ratings.
MovieLens_MovieRecommendation.Rmd
In this lecture, we will be using the recommenderlab package.
recommenderlab: Reference manual 
recommenderlab: Vignette 
Python code: 
MovieLens_LatentFactorRec.ipynb
- note: this script requires Apache Spark to be installed
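For readers who want a quick feel for the three steps that the toy example covers (user similarities, rating prediction, item recommendation), here is a minimal pure-Python sketch. It is not a port of simpleScript.R and does not use recommenderlab or Spark. The ratings, the user and item names, the choice of cosine similarity, and the helper functions are all assumptions made up for illustration.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); NOT the course data.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity restricted to the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_rating(user, item, ratings):
    """Similarity-weighted average of the other users' ratings for `item`.

    Returns None when no other user with positive similarity rated the item.
    """
    num = 0.0
    den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den > 0 else None


def recommend(user, ratings, top_n=1):
    """Rank the items the user has not rated by predicted rating."""
    seen = set(ratings[user])
    candidates = {item for r in ratings.values() for item in r} - seen
    scored = []
    for item in candidates:
        score = predict_rating(user, item, ratings)
        if score is not None:
            scored.append((score, item))
    scored.sort(reverse=True)
    return [item for _, item in scored[:top_n]]
```

Calling `recommend("alice", ratings)` ranks the single item alice has not rated, and `predict_rating("alice", "D", ratings)` gives the weighted-average estimate behind that ranking. Real systems usually also centre ratings per user and limit the neighbourhood size; both are left out here to keep the sketch short.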
Homework assignment:  
Assignment 
Data: videoGames.json.gz 
Starter script: starterScript.R 
Optional reading:
Amazon.com Recommendations 
Cold Start Problem 
Matrix Factorization Techniques For Recommender Systems 
All Together Now: A Perspective on the Netflix Prize
Week 8
Lecture Slides: 
Networks 
Ego nets 
Slides from KDD tutorial on Graph-Based User Behaviour Modeling 
R code: 
See our GitHub. 
Homework assignment:  
Assignment 
Data: wikipedia.gml 
Starter script: starterScript.R 
Optional reading:
See Chapters 3 and 4 of "Statistical Analysis of Network Data with R" (PDF available through UChicago library)
Weeks 9-10
Lecture Slides: 
Probabilistic Graphical Models 
Example PGM 
Hidden Markov Models 
R code: 
NB_reviews.R 
The Large Movie Review Dataset can be downloaded from here.  
Direct link to data: aclImdb_v1.tar.gz
Homework assignment:  
Assignment 
Data: emails.csv
Starter script: starterScript.R 
Optional reading:
Andrew Moore's basic probability tutorial 
Rabiner's Detailed HMMs Tutorial  
Text mining package  
Graphical Models with R   
HMM Tutorial 
Animated HMM Tutorial