Problem Set 13

This is to be completed by February 1st, 2018.

Exercises

  1. Datacamp
    • Complete the lesson:
      a. Python Data Science Toolbox (Part II)
  2. For a multiclass logistic regressor (ending in a softmax), write down the update rules for gradient descent.
  3. For a two-layer perceptron ending in a softmax with an intermediate ReLU non-linearity, write down the update rules for gradient descent. (A sketch of the key gradients for exercises 2 and 3 follows this list.)
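
As a hint for exercises 2 and 3, here is a sketch of how the key gradients work out under one common set of conventions (a single sample $x$ with one-hot label $y$, cross-entropy loss, and learning rate $\eta$); these conventions are our assumptions, not fixed by the exercises. With $z = Wx + b$, $p = \mathrm{softmax}(z)$, and $L = -\sum_k y_k \log p_k$, one finds $\partial L/\partial z = p - y$, so the updates are $$ W \leftarrow W - \eta\,(p-y)\,x^\top, \qquad b \leftarrow b - \eta\,(p-y). $$ For the two-layer network with $h = \mathrm{ReLU}(W_1 x + b_1)$ and $p = \mathrm{softmax}(W_2 h + b_2)$, the same updates apply to $W_2$ and $b_2$ with $h$ in place of $x$, and the error signal backpropagates through the ReLU as $$ \delta_1 = \bigl(W_2^\top (p - y)\bigr) \odot \mathbf{1}[W_1 x + b_1 > 0], \qquad W_1 \leftarrow W_1 - \eta\,\delta_1\, x^\top, \qquad b_1 \leftarrow b_1 - \eta\,\delta_1. $$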

Python Lab

  1. Build a two-layer perceptron (choose your non-linearity) in numpy for a multi-class classification problem and test it on MNIST. (A minimal numpy sketch to start from appears after this list.)
  2. Build an MLP in Keras and test it on MNIST.
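
To get started on the first lab item, here is a minimal sketch of a two-layer perceptron with a ReLU hidden layer and a softmax output, trained by full-batch gradient descent; all names are ours, and the MNIST loading, mini-batching, and hyperparameter choices are left to you.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_mlp(X, y, hidden=64, lr=0.1, epochs=200, seed=0):
    """Two-layer perceptron: ReLU hidden layer, softmax output,
    mean cross-entropy loss, full-batch gradient descent.
    X: (n, d) float array; y: (n,) integer class labels."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = y.max() + 1
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, k)); b2 = np.zeros(k)
    Y = np.eye(k)[y]                      # one-hot labels, shape (n, k)
    for _ in range(epochs):
        # Forward pass.
        z1 = X @ W1 + b1
        h = np.maximum(z1, 0)             # ReLU
        p = softmax(h @ W2 + b2)
        # Backward pass: dL/dz2 = (p - Y)/n for mean cross-entropy.
        dz2 = (p - Y) / n
        dW2 = h.T @ dz2; db2 = dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * (z1 > 0)     # ReLU gradient
        dW1 = X.T @ dz1; db1 = dz1.sum(axis=0)
        # Gradient descent step.
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    h = np.maximum(X @ W1 + b1, 0)
    return (h @ W2 + b2).argmax(axis=1)
```

And for the second item, a correspondingly minimal Keras sketch (the layer sizes and optimizer here are arbitrary choices on our part, not prescribed by the exercise):

```python
from keras.models import Sequential
from keras.layers import Dense

# Minimal MLP: one ReLU hidden layer, softmax output, as in the numpy version.
model = Sequential([
    Dense(64, activation="relu", input_shape=(784,)),  # MNIST images flattened to 784
    Dense(10, activation="softmax"),                   # 10 digit classes
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=10, batch_size=128)  # supply MNIST data here
```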

Problem Set 12

This is to be completed by January 25th, 2018.

Exercises

  1. Datacamp
    • Complete the lesson:
      a. Python Data Science Toolbox (Part I)
  2. Let $S\subset \Bbb R^n$ with $|S|<\infty$. Let $\mu=\frac{1}{|S|}\sum_{x_i\in S} x_i$. Show that $$ \frac{1}{|S|}\sum_{(x_i,x_j)\in S\times S} ||x_i-x_j||^2 = 2\sum_{x_i\in S} ||x_i-\mu||^2.$$
  3. Prove that the $K$-means clustering algorithm converges. (Hints for exercises 2 and 3 are sketched after this list.)
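
Hint for exercise 2 (one possible route, writing $n = |S|$): expand around the mean. Since $\sum_{x_i\in S}(x_i-\mu)=0$, $$ \sum_{(x_i,x_j)\in S\times S} ||x_i-x_j||^2 = \sum_{i,j} ||(x_i-\mu)-(x_j-\mu)||^2 = 2n\sum_{x_i\in S} ||x_i-\mu||^2 - 2\Big|\Big|\sum_{x_i\in S}(x_i-\mu)\Big|\Big|^2 = 2n\sum_{x_i\in S} ||x_i-\mu||^2, $$ and dividing by $n$ gives the identity. For exercise 3, the standard argument is that each of the two alternating steps (assigning every point to its nearest centroid, then moving each centroid to the mean of its cluster) can only decrease the within-cluster sum of squares, and a finite set admits only finitely many partitions, so the objective cannot decrease forever and must stabilize.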

Python Lab

  1. Implement a $K$-Nearest Neighbors classifier and apply it to the MNIST dataset (you will probably need to apply PCA first; you may use a library for this at this point).
  2. Implement a $K$-Means clustering algorithm and apply it to the MNIST dataset (after removing the labels and applying a PCA transformation) with $K=10$. Compare the cluster labelings with the actual labels. (A minimal $K$-means sketch appears after this list.)
  3. Complete the implementation of the decision tree algorithm from last week.
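
As a starting point for the second lab item, here is a minimal sketch of $K$-means (Lloyd's algorithm) in numpy; the function name and initialization scheme are our own choices, and details such as restarts and blockwise distance computation for large datasets are omitted.

```python
import numpy as np

def kmeans(X, k=10, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps.
    X: (n, d) array. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = centroids.copy()
        for j in range(k):
            if (labels == j).any():        # keep the old centroid if its cluster is empty
                new_centroids[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break                          # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids
```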

Problem Set 11

This is to be completed by January 18th, 2018.

Exercises

  1. Datacamp
    • Complete the lesson:
      a. Intermediate Python for Data Science
  2. What is the maximum depth of a decision tree trained on $N$ samples?
  3. If we train a decision tree to an arbitrary depth, what will be the training error?
  4. How can we alter a loss function to help regularize a decision tree? (One standard answer is sketched below.)
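
As a hint for exercise 4, one standard answer is cost-complexity pruning (as in CART and ISLR Chapter 8): add a penalty on the number of leaves. In our notation, $$ L_\alpha(T) = \sum_{m=1}^{|T|} \sum_{x_i \in R_m} \ell\bigl(y_i, \hat{y}_{R_m}\bigr) + \alpha |T|, $$ where $|T|$ is the number of leaves of the tree $T$, $R_m$ is the region belonging to the $m$th leaf, $\hat{y}_{R_m}$ is that leaf's prediction, and $\alpha \ge 0$ controls the trade-off between fit and tree complexity.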

Python Lab

  1. Construct a function that transforms a dataframe of numerical features into a dataframe of binary features of the same shape, setting the value of the $j$th feature of the $i$th sample to True precisely when the value is greater than or equal to the median value of that feature.
  2. Construct a function that, given a dataframe of binary features, labeled outputs, and a corresponding loss function, chooses the feature to split on that minimizes the loss function. Here we assume that on each split the function simply returns the mean value of the outputs within each branch. (A sketch of both functions appears after this list.)
  3. Test these functions on a real-world dataset (for classification), either from ISLR or from Kaggle.
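
A minimal pandas sketch of the first two functions (the function names and the loss signature are our own choices; we assume the loss takes arrays of true and predicted values and returns a scalar):

```python
import numpy as np
import pandas as pd

def binarize_by_median(df):
    """Map each numerical feature to True where its value is >= that
    feature's median; the result has the same shape as the input."""
    return df >= df.median()

def best_split(X, y, loss):
    """Return the column of the binary dataframe X whose split minimizes
    the given loss, predicting the mean of y on each side of the split."""
    best_feature, best_score = None, np.inf
    for col in X.columns:
        mask = X[col].values
        if mask.all() or (~mask).all():
            continue                      # degenerate split: every sample on one side
        pred = np.where(mask, y[mask].mean(), y[~mask].mean())
        score = loss(np.asarray(y), pred)
        if score < best_score:
            best_feature, best_score = col, score
    return best_feature

# Example usage with squared-error loss:
# mse = lambda y_true, y_pred: ((y_true - y_pred) ** 2).mean()
# X_bin = binarize_by_median(X)
# feature = best_split(X_bin, y, mse)
```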