Final Project
Your class project is an opportunity for you to explore an interesting Machine Learning problem of your choice in the context of a real-world data set. Your class project must be about new things you have done this semester; you can't use results you have developed in previous semesters. We will provide some project ideas here, but the best idea is to combine Machine Learning with a problem in your business area.
Deadlines:
- Project Proposal: Week 7: Sunday, Nov 8
- Project Write-up: Week 11: Friday, Dec 11
Project Proposal
Proposals should be one page maximum.
Include the following information:
- Project Title
- Team members
- Data Set. Briefly describe the data set you are going to use. You should already have access to this data set, rather than promising to collect data.
- Project Idea. Describe what you want to achieve. What Machine Learning tools are you going to use? How is the data set going to help you answer your question?
We do not expect the same amount of work from a team of four as from a team of one or two.
Think of the project proposal as an early draft of your final write-up. Ideally you will be able to start from your proposal and expand it to the final document.
Where to get data
Here are some ideas on where to get data. You are welcome to use data from any other source as well.
- City of Chicago data: https://data.cityofchicago.org/
- Airlines data set: http://www.stat.purdue.edu/~sguha/rhipe/doc/html/airline.html
- Kaggle Competitions: http://www.kaggle.com/
- UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets.html
- Stanford Large Network Dataset Collection: http://snap.stanford.edu/data/index.html
- Million Song Dataset: http://labrosa.ee.columbia.edu/millionsong/pages/getting-dataset
- PGA Tour (contributed by Tyler Burkett): http://www.pgatour.com/stats/shotlinkintelligence.html
- Big list of publicly available data sets: http://blog.bigml.com/list-of-public-data-sources-fit-for-machine-learning/
- Quora answer on "Where can I find large datasets open to the public?": https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
Project Write-Up
Create a professional-grade report on your project, written for a very statistically savvy client. The report should be no longer than 10 pages, including figures and your analysis. You do not need to include your R code with the submission. If you do decide to include it, please put it in an appendix.
Your report should summarize what the goal of the project was, what data you used, and what you did to your data and why. Tell us what you learned from the project. Did you manage to solve a particular business problem using machine learning tools? If your method did not perform as well as you expected, can you explain why? What conclusions can you draw from the project?