Instructor: Prof. Mladen Kolar (MKolar@chicagobooth.edu), office: Harper Center 338
TAs:
Chaoxing Dai (chaoxingdai@uchicago.edu)
Jingyu He (jingyuhe@uchicago.edu)
Office hours: by appointment
Review Session: 12:15pm-1:pm Saturdays, Gleacher 406
Piazza Discussion Forum
https://piazza.com/chicagobooth/winter2018/bus41204/home
You can add yourself to the course here: https://piazza.com/chicagobooth/winter2018/bus41204
We will be using Piazza for class discussion. Please post any questions that you may have there, rather than emailing TAs or instructors. All of your classmates will benefit from the public discussions.
We also encourage you to answer other students’ questions. Answering questions will reinforce your learning and understanding of the material. We will carefully check all discussions and clarify any confusion. Sometimes the best answer may be “Let me Google that for you…”
Course Summary
Machine Learning is the very core of modern Data Analytics, which companies big and small are leveraging to mine commercial value out of their increasingly vast troves of information. This course aims to give you an introduction to a wide range of popular, powerful and highly practical Machine Learning algorithms, with the hope that what you take away here will help you rise up to data-enabled commercial opportunities in your subsequent careers.
You will learn about state-of-the-art Machine Learning techniques and how to apply them to business-related problems. We will introduce techniques in the context of business applications and emphasize how Machine Learning can be used to provide insights and create value from data.
The first, and biggest, part of the course will focus on Predictive Analytics / Supervised Learning. You will learn about Decision Trees, Nearest-Neighbor Classifiers, Boosting, Random Forests, Deep Neural Networks, Naive Bayes and Support Vector Machines. Among other examples, we will apply these techniques to detecting spam in email, click-through rate prediction in online advertisement, image classification, face recognition, sentiment analysis and churn prediction. You will learn what techniques to apply and why.
In the second part of the class, you will learn about Unsupervised Learning techniques for extracting actionable patterns from data. Examples include Clustering, Collaborative Filtering, and Dimensionality Reduction with applications to customer segmentation, recommender systems, graph and time series mining, and anomaly detection.
Prerequisites
This course does not require sophisticated mathematical knowledge nor extensive programming experience. However, the nature of the material is somewhat technical.
BUS 41000 (Business Statistics) or BUS 41100 (Applied Regression) is highly recommended. You should be familiar with basic Probability and Linear Regression. If you have taken neither of these courses, you can still take Machine Learning if you have a strong quantitative background. However, if there are gaps in your background, we expect you to fill them as soon as possible or otherwise withdraw from the course.
In addition, you should be familiar with programming in R, a platform for statistical analysis. Students in the previous year found using R to be a harder part of the course than the necessary mathematical background.
Computing
All computing in class will be conducted in R. The focus of the class will be on teaching Machine Learning concepts rather than how to use R.
Our colleague Matt Taddy (of BUS 41201 Big Data fame) lists some useful resources for learning R here.
Other programming languages: There are many programming languages that have readily available Machine Learning libraries. You are free to use any of them in your homework and projects. However, if you opt to do so, we may not be able to provide much support.
Textbooks
There are no required textbooks. All materials and notes will be available on the class website.
Optional textbooks: The books below do not cover all the material we plan to cover in the class. We list both technical books and business-related books.
An Introduction to Statistical Learning with Applications in R
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
This book provides a great introduction to Machine Learning.
Concepts are well explained, without too much technical detail.
PDF available online
Deep Learning
Ian Goodfellow, Yoshua Bengio and Aaron Courville
Another great introductory book, but focusing on deep learning.
PDF available online
Elements of Statistical Learning
Trevor Hastie, Robert Tibshirani and Jerome Friedman
This book covers material similar to An Introduction to Statistical Learning, but at a more technical level.
PDF available online
Machine Learning: a Probabilistic Perspective
Kevin Murphy
PhD-level book, providing an encyclopedic survey of the area.
The following three books are very light on technical details, but do talk about applying Machine Learning in the context of business applications.
Data Science for Business
Foster Provost and Tom Fawcett
Predictive Analytics
Eric Siegel
Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners
Jared Dean
Optional references on R. Here are additional resources.
R in a Nutshell
Joseph Adler
A Beginner’s Guide to R
Alain Zuur, Elena N. Ieno, Erik Meesters
Evaluation
Grades will be determined by:
- group homework (19%)
- one individual homework assignment (5%)
- participation on Piazza (2%)
- a take-home midterm exam (37%)
- a group final project (37%)
Students must adhere to the Booth Honor Code. However, you do not need to include the honor code, signatures, etc. on your work.