Computing Resources
This page contains links to software installation guides and some resources for R (our main computing platform) and Python (the best alternative).
R Resources
We will be using R as the main platform for data analysis in this class; however, you are welcome to use any other tool or programming language you are familiar with.
We strongly encourage you to become familiar with the basics of R so that you can focus on Machine Learning. We will go through examples in R in class and provide some instructions. We do not expect you to have previously taken a class that uses R. That said, this class is not a class on R.
- R, RStudio and CRAN Packages installation guides
- Rob's notes on R
- R's Introduction to R: This is written by stat/computing geeks for stat/computing geeks. It is not always the easiest thing to read. That being said, if you want to learn R, you should read at least the first bit of the chapter on major topics (e.g. data frames).
- Section 2.3 of the optional textbook An Introduction to Statistical Learning is dedicated to the basics of R
- R Reference Card
- Google's R Tutorial
- Princeton's R Tutorial
- R Markdown tutorial: a tutorial by Vinh on R Markdown, an excellent dynamic document generator. We recommend that you use it for your homework: it will save you a ton of time, and its output is easy on the eye, adding extra shine to your already brilliant analysis.
- Please note that R Markdown requires TeX for conversion to PDF
- R data.table tutorial: a tutorial by Vinh on R's data.table, an advanced, high-performance alternative to R's default data.frame
- Introduction to the caret Package: caret is a popular R package providing standardized interfaces to about 200 Machine Learning algorithms and many related data-processing procedures
- Google and the question-and-answer website Stack Overflow
- There are many books and tutorials on R. If you come across something particularly useful, please share it with us on Piazza.
Python Resources
- Anaconda Python, PyCharm IDE, and Conda & PyPI Packages installation guides
- Theano package installation guide
- iPython Notebook tutorial: a tutorial by Vinh on iPython Notebook, a superb dynamic document editor/generator built on Python and a perfect answer to the aforementioned R Markdown. If you do your homework or final project in Python, we encourage you to use iPython Notebook to generate your analysis reports
- Please note that iPython Notebook requires TeX for conversion to PDF
- Pandas tutorials: pandas is a data frame solution in Python, Python's answer to R's data.frame and data.table
- A Basic Tutorial on SciKit-Learn: SciKit-Learn is the most popular Python package providing standardized interfaces to over 100 Machine Learning algorithms and many related data-processing procedures
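To show how the pandas and SciKit-Learn entries above fit together, here is a minimal sketch. It assumes pandas and scikit-learn are installed, for example through Anaconda. The column names and values are made-up toy data chosen so that y = 1 + 2·x1 + x2 exactly. A pandas DataFrame plays the role of R's data.frame, and the model fit is roughly what `lm(y ~ x1 + x2, data = df)` would do in R. Every SciKit-Learn estimator exposes the same `fit`/`predict` methods, and that shared interface is what "standardized" means here.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data, built much as you would with data.frame() in R.
# The values satisfy y = 1 + 2*x1 + 1*x2 exactly.
df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [0.0, 1.0, 0.0, 1.0, 0.0],
    "y":  [3.0, 6.0, 7.0, 10.0, 11.0],
})

# Row filtering, analogous to df[df$x2 == 1, ] in R.
subset = df[df["x2"] == 1]

# Select predictors and response by column name.
X = df[["x1", "x2"]]
y = df["y"]

# Every scikit-learn estimator shares the same fit/predict interface,
# so another model (e.g. a tree or an SVM) could be swapped in here.
model = LinearRegression()
model.fit(X, y)
predictions = model.predict(X)

print(model.coef_, model.intercept_)
```

Because the toy data are exactly linear, the fitted coefficients should be close to 2 and 1 and the intercept close to 1, up to floating-point error.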
Comparison between R and Python
The R and Python ecosystems for Machine Learning / Data Science and related software parallel each other closely. Here is a brief comparison table of the two ecosystems, listing the leading who's-who and what's-what in various areas:
| | R | Python |
|---|---|---|
| Linear Algebra | (built-in) | NumPy |
| Package Repository | Comprehensive R Archive Network (CRAN) | Python Package Index (PyPI) |
| Go-To Package for Popular ML Algorithms | caret | SciKit-Learn |
| Data Frame for Data Processing | data.frame, data.table | Pandas |
| Visualization | ggplot2, ggvis, dygraphs | MatPlotLib, GGPlot, Bokeh, Plotly, Pyxley |
| Large-Scale Parallel Computation | parallel, doMC, doParallel, snow | Apache Spark, Theano, Numba |
| Symbolic Math | Ryacas, rSymPy | SymPy |
| Dynamic Document Editors / Generators | R Markdown, Slidify | iPython Notebook |
| App Development Frameworks | Shiny | Django, Flask, Jinja2 |
| Software Unit-Testing Frameworks | testthat | Nose, DocTest, Py.Test, PyUnit, Tox |
| Leading Developers | RStudio, Revolution Analytics (Microsoft subsidiary) | Continuum Analytics, Enthought |
| Popular Integrated Development Environments (IDEs) | RStudio | PyCharm, Spyder, Rodeo |
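The first row of the table says linear algebra is built into R, while Python relies on NumPy. Here is a rough sketch of the NumPy counterparts to a few base-R operations: `%*%` for matrix multiplication, `solve(A, b)` for linear systems, `t(A)` for transposition, and `solve(A)` for the inverse. It assumes NumPy is installed, and the matrix and vector are arbitrary example values.

```python
import numpy as np

# An arbitrary 2x2 system A x = b (example values only).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Matrix-vector product: R's A %*% b corresponds to A @ b.
Ab = A @ b

# Solve the linear system: R's solve(A, b) corresponds to np.linalg.solve(A, b).
x = np.linalg.solve(A, b)

# Transpose and inverse: R's t(A) and solve(A).
At = A.T
A_inv = np.linalg.inv(A)

print(x)
```

For this system, 2x + y = 3 and x + 3y = 5, so the solution is x = 0.8, y = 1.4. As in R, solving the system directly is generally preferred over multiplying by an explicit inverse.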
Other Software Installation
- Git, GitHub & SourceTree installation guides
- TeX: for rendering math equations in various document formats
- Cygwin installation guide (for Windows users only): Cygwin provides a Unix-style command-line environment for Windows