Problem Set 12

This is to be completed by January 25th, 2018.

Exercises

  1. DataCamp
    • Complete the lesson:
      a. Python Data Science Toolbox (Part I)
  2. Let $S\subset \mathbb{R}^n$ with $|S|<\infty$. Let $\mu=\frac{1}{|S|}\sum_{x_i\in S} x_i$. Show that $$ \frac{1}{|S|}\sum_{(x_i,x_j)\in S\times S} \|x_i-x_j\|^2 = 2\sum_{x_i\in S} \|x_i-\mu\|^2.$$
  3. Prove that the $K$-means clustering algorithm converges.
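
Before attempting the proof in Exercise 2, it can help to check the identity numerically. The sketch below is only a sanity check on random points, not a proof; the helper names `sq_dist`, `pairwise_side`, and `centered_side` are made up for illustration, and it uses plain Python lists rather than any particular library.

```python
import random

def sq_dist(x, y):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def pairwise_side(S):
    """(1/|S|) * sum over all ordered pairs (x_i, x_j) of ||x_i - x_j||^2."""
    return sum(sq_dist(x, y) for x in S for y in S) / len(S)

def centered_side(S):
    """2 * sum over x_i of ||x_i - mu||^2, where mu is the mean of S."""
    n = len(S)
    mu = [sum(coords) / n for coords in zip(*S)]
    return 2 * sum(sq_dist(x, mu) for x in S)

# Assumption: 50 random points in R^3 stand in for an arbitrary finite S.
random.seed(0)
S = [tuple(random.gauss(0, 1) for _ in range(3)) for _ in range(50)]

lhs, rhs = pairwise_side(S), centered_side(S)
# The two sides should agree up to floating-point rounding.
print(abs(lhs - rhs) < 1e-9 * max(1.0, abs(lhs)))
```

Agreement on a few random samples suggests the constants are right, but the exercise still asks for an argument that holds for every finite $S$.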

Python Lab

  1. Implement a $K$-Nearest Neighbors classifier and apply it to the MNIST dataset. You will probably need to apply PCA first; you may use a library for this at this point.
  2. Implement the $K$-means clustering algorithm and apply it to the MNIST dataset (after removing the labels and applying a PCA transformation) with $K=10$. Compare the resulting cluster assignments with the actual labels.
  3. Complete the implementation of the decision tree algorithm from last week.
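
For the comparison step in Lab item 2, one possible approach is to tabulate which true labels fall into each cluster and then compute the cluster purity. The sketch below assumes you already have two equal-length sequences, `cluster_ids` and `labels`; the function names `contingency` and `purity` are hypothetical, purity is only one of several reasonable measures, and the toy data is made up rather than taken from MNIST.

```python
from collections import Counter, defaultdict

def contingency(cluster_ids, labels):
    """Map each cluster id to a Counter of the true labels it contains."""
    table = defaultdict(Counter)
    for c, y in zip(cluster_ids, labels):
        table[c][y] += 1
    return table

def purity(cluster_ids, labels):
    """Fraction of points whose true label is the majority label of their cluster."""
    table = contingency(cluster_ids, labels)
    matched = sum(counts.most_common(1)[0][1] for counts in table.values())
    return matched / len(labels)

# Toy example with made-up assignments (not real MNIST output).
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
labels      = [7, 7, 1, 3, 3, 1, 1, 7]

for c, counts in sorted(contingency(cluster_ids, labels).items()):
    print(c, dict(counts))
print(purity(cluster_ids, labels))
```

Cluster ids from $K$-means are arbitrary, so comparing them to digit labels directly is not meaningful; a table like this shows which digits each cluster mostly captures.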
