Problem Set 1

Problem Set 1

This is to be completed by November 27th, 2017. (THIS IS A TYPO: This should read October 26th, 2017). It is okay if you finish by this ridiculous first due date.


For several of these exercises, you will be asked to install software and/or setup accounts from outside websites. These websites may try to convince you to purchase some of their products. You do not need to purchase anything to do these exercises. All of the required materials have been made freely available to us.


  1. Learn the most important skill for this course
    • Learn how to use Google to find an answer to almost any technical problem (e.g., ‘How do I install — for Windows?’ or ‘How do I do — in R/Python?’ or ‘I’m getting a weird error message, how do I fix it?’). If you have not learned how to do this yet, I highly recommend it. It will open up your world to a range of new possibilities.
  2. Datacamp
    • Setup a Datacamp account and join the course group by using the email invitation. Note you should not have to pay any money to do this. I do not believe you will need to provide any bank information for this purpose.
    • Try out some of the lessons designated for the course.
    • During the course you should keep up with the assigned lessons.
  3. RStudio
    • Follow this tutorial to install RStudio. RStudio is an interactive development environment for the R programming language. Which will make it easier to complete the R exercises. Again, you should not have to pay anything for this.
    • Install the ISLR package from CRAN. While you’re at it you may as well install caret, dplyr, tidyr, and ggplot2.
    • Try out the commands from this lab.
  4. Python (Optional, but recommended for those interested in learning Python1)
    • Install Anaconda using Python 3.x (Python 2.7 is outdated, but still often used). As I remember it, this requires you to setup an account, but you will not have to pay anything. They will send you emails which you will have to unsubscribe from. I believe this is a small price to pay for a system that manages your Python installation and includes many of the packages used in data science.
  5. Additional material
    • Glance at some of the additional source material for the course.
    • Since we will be using it for R labs, download the ISLR book (or go and buy it if you like it!).
  6. R lab
    • Complete exercise 8 from Chapter 2 of ISLR. In this exercise you will be exploring data gathered from universities and colleges in the United States.

  1. In my humble opinion, Python is a truly wonderful language with an incredible number of highly useful libraries. Once you know Python, it is a surprisingly short road to solving many typical computer tasks. It is typically much slower than other, or more grown-up, languages such as C/C++ or Java, but the speed-up in development time makes up for it for the kind of tasks we will be considering. Moreover, the syntax is very natural and easy to learn. (Drops the microphone) 

Leave a Reply

Your email address will not be published. Required fields are marked *