Lectures
The schedule below is tentative and subject to change, depending on time and class interests. We will move at a pace dictated by class discussions. Please check this page often for updates.
Week | Date | Content |
---|---|---|
1 | 9/24 – 9/26 | Intro to Machine Learning; Nearest Neighbours; Bias-Variance Trade-Off |
2 | 10/1 – 10/3 | Cross Validation |
3 | 10/8 – 10/10 | Decision Trees; Bagging and Random Forests; Boosting and Boosted Additive Models |
4 | 10/15 – 10/17 | Categorical Outcomes and Classification Models |
5 | 10/22 – 10/24 | Logistic Regression and Intro to Neural Networks |
6 | 10/29 – 10/31 | Neural Networks |
7 | 11/5 – 11/7 | Recommender Systems |
8 | 11/12 – 11/14 | Networks |
9 | 11/19 – 11/21 | Naive Bayes; Probabilistic Graphical Models |
10 | 12/3 – 12/5 | Hidden Markov Models (If time permits: anomaly detection in time series) |
11 | 12/11 | Final Project due |
Weeks 1-2
Lecture Slides:
Overview
Introduction to Predictive Models and kNN
R code:
docv.R
bias-variance-illustration.R
BostonHousing_KNN_BiasVarTradeOff_CrossValid.Rmd
- please save as an R Markdown (.Rmd) file on your computer, open in RStudio, and run by "Knit PDF"; OR:
- you may also find it in the Programming Scripts > Boston Housing > R folder if you have cloned and synced the course GitHub repo down to your computer
- note: R Markdown requires TeX for conversion to PDF
Python code:
BostonHousing_KNN_BiasVarTradeOff_CrossValid.ipynb
- please save as an iPython Notebook (.ipynb) file on your computer and open in an iPython Notebook server session; OR:
- you may also find it in the Programming Scripts > Boston Housing > Python folder if you have cloned and synced the course GitHub repo down to your computer
- The iPython NBviewer website allows you to view the notebook in a read-only mode
Homework assignments:
Optional textbook reading:
An Introduction to Statistical Learning: Section 2, Section 5.1, Section 8.1
Additional reading:
Machine Learning: Trends, Perspectives, and Prospects
M. I. Jordan and T. M. Mitchell
A Science review article from two leading experts in Machine Learning
Week 3
Lecture Slides:
Trees, Bagging, Random Forests and Boosting
R code:
knn-bagging.R
boosting_demo_1D.R
boosting_demo_2D.R
BostonHousing_Trees_RandomForests_BoostedAdditiveModels.Rmd
- you can find an advanced version featuring the popular caret package and parallel computation here: BostonHousing_Trees_RandomForests_BoostedAdditiveModels_usingCaretPackage.Rmd
Python code:
BostonHousing_Trees_RandomForests_BoostedAdditiveModels.ipynb
Homework assignment: See Problem 9.1 in the lecture notes for this week.
Optional textbook reading: An Introduction to Statistical Learning: Chapter 8
Week 4
Lecture Slides:
Classification
Perceptron
Perceptron -- R Markdown Script to recreate slides
R code:
fglass.R
04_kaggle_logit_rf_boost.R
04_simulation_logit_rf_boost.R
04_tabloid_logit_rf_boost.R
KaggleCreditScoring_usingCaretPackage.Rmd
TabloidMarketing_usingCaretPackage.Rmd
- The use of the caret package and parallel computing is highly encouraged from this point on, especially if you decide to train models on a sizeable training data set
Python code:
KaggleCreditScoring.ipynb
TabloidMarketing.ipynb
Homework assignment:
04_hw.pdf
Start early.
- model answers in R Markdown; note: the whole thing takes a long time to run completely
- model answers in iPython Notebook; note: the whole thing takes a long time to run completely
Optional textbook reading: An Introduction to Statistical Learning: Chapter 4 (we will not talk about linear discriminant analysis)
Pedro Domingos: A Few Useful Things to Know about Machine Learning PDF
D. Sculley et al.: Machine Learning: The High Interest Credit Card of Technical Debt PDF
Week 5
Lecture Slides:
Logistic regression
RMarkdown -- Logistic regression
R code:
lr_decision_surface.R
we8there.R
Optional textbook reading: An Introduction to Statistical Learning: Chapter 4, Section 6.2
Midterm Exam
- model answers in R Markdown:
Week 6
Lecture Slides:
Neural networks
MNIST example
R code:
See our GitHub.
We suggest that you clone the "Lecture06" folder or download all of its contents, since the folder contains some pretrained models that would take a long time to train again.
To install the h2o package, go to http://h2o-release.s3.amazonaws.com/h2o/master/3232/index.html, click on "INSTALL IN R", and follow the instructions.
Alternatively, you can type the following in R:
source("https://raw.githubusercontent.com/ChicagoBoothML/HelpR/master/booth.ml.packages.R")
Python code:
MNISTDigits_NeuralNet_KerasPackage.ipynb
Homework assignment:
- model answers in R Markdown; note: the whole thing takes a long time to run completely
- model answers in iPython Notebook; note: the whole thing takes a long time to run completely
To load data use:
source("ParseData.R")
data <- parse_human_activity_recog_data()
Due Sunday, November 8.
Optional textbook reading: The Elements of Statistical Learning: Sections 11.3 - 11.5
Some h2o resources:
Week 7
Lecture Slides:
Recommender Systems
R code:
simpleScript.R This is a toy example illustrating how to compute similarities between users, recommend items and predict ratings.
MovieLens_MovieRecommendation.Rmd
In this lecture, we will be using the recommenderlab package.
recommenderlab: Reference manual
recommenderlab: Vignette
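The three steps named for simpleScript.R above (computing similarities between users, predicting ratings, and recommending items) can be sketched in a few lines. This is not the course script and does not use recommenderlab. It is a minimal, self-contained Python illustration on a made-up ratings table, and it assumes cosine similarity over co-rated items and a similarity-weighted average for prediction. The names `ratings`, `cosine_similarity`, `predict_rating` and `recommend` are hypothetical.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not course data.
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}

def cosine_similarity(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = 0.0
    den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(user, other)
        num += sim * their_ratings[item]
        den += abs(sim)
    # None signals that no other user has rated the item.
    return num / den if den > 0 else None

def recommend(user, k=1):
    """Top-k items the user has not rated, ranked by predicted rating."""
    seen = set(ratings[user])
    all_items = {i for r in ratings.values() for i in r}
    scored = []
    for item in all_items - seen:
        score = predict_rating(user, item)
        if score is not None:
            scored.append((score, item))
    return [item for score, item in sorted(scored, reverse=True)[:k]]
```

Here `predict_rating("alice", "D")` lands closer to bob's rating than to carol's, because alice's ratings are more similar to bob's. Real recommender code usually also centers ratings per user and restricts the average to the nearest neighbours.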
Python code:
MovieLens_LatentFactorRec.ipynb
- note: this script requires Apache Spark to be installed
Homework assignment:
Assignment
Data: videoGames.json.gz
Starter script: starterScript.R
Optional reading:
Amazon.com Recommendations
Cold Start Problem
Matrix Factorization Techniques For Recommender Systems
All Together Now: A Perspective on the Netflix Prize
Week 8
Lecture Slides:
Networks
Ego nets Slides from KDD tutorial on Graph-Based User Behaviour Modeling
R code:
See our GitHub.
Homework assignment:
Assignment
Data: wikipedia.gml
Starter script: starterScript.R
Optional reading:
See Chapters 3 and 4 of "Statistical Analysis of Network Data with R" (PDF available through UChicago library)
Weeks 9-10
Lecture Slides:
Probabilistic Graphical Models
Example PGM
Hidden Markov Models
R code:
NB_reviews.R.
The Large Movie Review Dataset can be downloaded from here.
Direct link to data: aclImdb_v1.tar.gz
Homework assignment:
Assignment
Data: emails.csv
Starter script: starterScript.R
Optional reading:
Andrew Moore's basic probability tutorial
Rabiner's Detailed HMMs Tutorial
Text mining package
Graphical Models with R
HMM Tutorial
Animated HMM Tutorial