BUS 41204: Machine Learning

Weekly Lecture Time

Section Time Venue
41204-01 Thursday 08:30 AM – 11:30 AM Harper Center C01
41204-81 Friday 06:00 PM – 09:00 PM Gleacher Center 208
41204-85 Saturday 01:30 PM – 04:30 PM Gleacher Center 208

People

Instructors:
Mladen Kolar (MKolar@chicagobooth.edu), office: Harper Center 306
Robert E. McCulloch (Robert.McCulloch@chicagobooth.edu), office: Harper Center 365

Teaching Assistants:
Daniel Hedblom (Hedblom@uchicago.edu)
Juan Yrigoyen (JYrigoyen@chicagobooth.edu)
Vinh Luong (MBALearnsToCode@uchicago.edu)

Office Hours: by appointment

Course Materials

Will be available on this website: Lectures, Data Sets

Course Materials on GitHub (optional)

GitHub Repository: http://GitHub.com/ChicagoBoothML/MachineLearning_Fall2015

Please follow the instructions below to install Git and SourceTree, sign up for GitHub, and sync the course materials to a folder on your computer.

GitHub.com, built on the version-control software Git, is currently the most popular platform for hosting open-source software code and related documentation and teaching materials.

Git and GitHub give us an efficient way to keep track of changes and updates to our course materials, and give you an easy way to stay in sync with the latest version. For this reason, we have decided to also distribute materials through this GitHub repository.

We ask that you follow these instructions to install Git, sign up for a GitHub.com account, install an app called SourceTree and sync this repo down to a folder on your computer.

Piazza Discussion Forum

http://Piazza.com/ChicagoBooth/Fall2015/BUS41204/home

You can add yourself to the course here: http://Piazza.com/ChicagoBooth/Fall2015/BUS41204

We will be using Piazza for class discussion. Please post any questions that you may have there, rather than emailing TAs or instructors. All of your classmates will benefit from the public discussions.

We also encourage you to answer other students' questions. Answering questions will reinforce your learning and understanding of the material. We will carefully check all discussions and clarify any points of confusion. Sometimes the best answer may be "Let me Google that for you..."

If we answer your questions after 2am, consider sending a rose...

Course Summary

If you are visiting this site in Fall 2015, we predict with >90% confidence (and 110% excitement!!) that you're among the Booth students who, together with us, are making some nice history. This inaugural Machine Learning course, specially tailored for MBAs, is, as far as we know, the very first of its kind in the curriculum of any leading U.S. or international business school.

Machine Learning is the very core of modern Data Analytics, which companies big and small are leveraging to mine commercial value out of their increasingly vast troves of information. This course aims to give you a 10-week tasting session on a diverse buffet of popular, powerful and highly practical Machine Learning algorithms, with the hope that what you take away here will help you rise up to data-enabled commercial opportunities in your subsequent careers.

You will learn about state-of-the-art Machine Learning techniques and how to apply them to business-related problems. We will introduce techniques in the context of business applications and emphasize how Machine Learning can be used to provide insights and create value from data.

The first, and biggest, part of the course will focus on Predictive Analytics / Supervised Learning. You will learn about Decision Trees, Nearest-Neighbor Classifiers, Boosting, Random Forests, Deep Neural Networks, Naive Bayes and Support Vector Machines. Among other examples, we will apply these techniques to detecting spam in email, click-through rate prediction in online advertising, image classification, face recognition, sentiment analysis and churn prediction. You will learn which techniques to apply and why.

In the second part of the class, you will learn about Unsupervised Learning techniques for extracting actionable patterns from data. Examples include Clustering, Collaborative Filtering, Probabilistic Graphical Models and Dimensionality Reduction with applications to customer segmentation, recommender systems, graph and time series mining, and anomaly detection.

Prerequisites

This course does not require sophisticated mathematical knowledge nor extensive programming experience. However, the nature of the material is somewhat technical.

BUS 41000 (Business Statistics) or BUS 41100 (Applied Regression) is highly recommended. You should be familiar with basic Probability and Linear Regression. If you have taken neither of these courses, you can still take Machine Learning if you have a strong quantitative background. However, if there are gaps in your background, we expect you to fill them as soon as possible or else withdraw from the course.

In addition, you should be familiar with programming in R, a platform for statistical analysis.

Computing

All computing in class will be conducted in R. The focus of the class will be on teaching Machine Learning concepts rather than how to use R.

Our colleague Matt Taddy (of BUS 41201 Big Data fame) lists some useful resources for learning R here.

Other programming languages: There are many programming languages that have readily available Machine Learning libraries. You are free to use any of them in your homework and projects. However, if you opt to do so, we may not be able to provide much support. One particular language with which we can offer you substantial help is Python, which has a strong Machine Learning software ecosystem that closely parallels that of R. Our TA Vinh is your go-to pal if you want to try out Python.

Textbooks

There are no required textbooks. All materials and notes will be available on the class website.

Optional textbooks: The books below do not cover all the material we plan to cover in the class. We list both technical books and business-related books.

An Introduction to Statistical Learning with Applications in R
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
This book provides a great introduction to Machine Learning. Concepts are well explained, without too much technical detail.
PDF available online

Elements of Statistical Learning
Trevor Hastie, Robert Tibshirani and Jerome Friedman
This book covers material similar to the one above, but in greater technical depth.
PDF available online

Machine Learning: a Probabilistic Perspective
Kevin Murphy
PhD-level book, providing an encyclopedic survey of the area.

The following three books are very light on technical details, but do talk about applying Machine Learning in the context of business applications.

Data Science for Business
Foster Provost and Tom Fawcett

Predictive Analytics
Eric Siegel

Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners
Jared Dean

Optional references on R. Here are additional resources.

R in a Nutshell
Joseph Adler

A Beginner's Guide to R
Alain Zuur, Elena N. Ieno, Erik Meesters

Grading

Homework assignments: 20%; Midterm: 40%; Final Project: 40%

There will be 8 homework assignments, due weekly except in weeks 1 and 6. These assignments can be done in groups (max size 4). You should submit only one write-up per group. Homework should be submitted through Chalk. Only your top 7 submissions count towards your grade.

There is a take-home midterm that will be posted in week 5 and will be due in week 6. You must work individually on the midterm.

The final project can be done in a group. The goal will be to apply Machine Learning to a particular business problem. More details about the project can be found here.
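
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It is not the official grading procedure. It assumes that every component is scored on a 0-100 scale, that the 7 counted homework submissions are weighted equally, and that a missing submission counts as 0. The function name course_grade and the sample scores are hypothetical.

```python
def course_grade(homework_scores, midterm, final_project):
    """Combine component scores (each assumed to be on a 0-100 scale).

    Assumptions for illustration only: the 7 best homework scores are
    averaged with equal weight, and a missing submission counts as 0.
    """
    # Keep the 7 best homework scores; with 8 submissions the lowest is dropped.
    top_scores = sorted(homework_scores, reverse=True)[:7]
    # Divide by 7 even if fewer scores were supplied (missing ones count as 0).
    homework_avg = sum(top_scores) / 7
    # Weights from the syllabus: homework 20%, midterm 40%, final project 40%.
    return 0.20 * homework_avg + 0.40 * midterm + 0.40 * final_project


# Hypothetical scores for one student, including one skipped assignment.
scores = [90, 80, 100, 70, 0, 85, 95, 75]
print(course_grade(scores, midterm=80, final_project=90))
```

In this example the skipped assignment (the 0) is the one that gets dropped, so it does not lower the homework average.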