## Problem Set 12

This is to be completed by January 25th, 2018.

### Exercises

- Datacamp
  - Complete the lesson:
    a. Python Data Science Toolbox (Part I)

- Let $S\subset \mathbb{R}^n$ be a nonempty finite set and let $\mu=\frac{1}{|S|}\sum_{x_i\in S} x_i$ be its mean. Show that $$ \frac{1}{|S|}\sum_{(x_i,x_j)\in S\times S} \|x_i-x_j\|^2 = 2\sum_{x_i\in S} \|x_i-\mu\|^2.$$
- Prove that the $K$-means clustering algorithm converges, i.e., that it terminates after finitely many iterations.
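Before writing the proof of the identity above, a quick numerical sanity check can catch sign or normalization slips (it is not a proof). The sketch below assumes NumPy and stores the points of $S$ as the rows of an array; the function names are illustrative only.

```python
import numpy as np

def pairwise_sum_over_size(S: np.ndarray) -> float:
    """Left-hand side: (1/|S|) * sum over all ordered pairs (x_i, x_j) of ||x_i - x_j||^2."""
    diffs = S[:, None, :] - S[None, :, :]  # shape (m, m, n): every pairwise difference
    return float(np.sum(diffs ** 2) / len(S))

def twice_scatter_about_mean(S: np.ndarray) -> float:
    """Right-hand side: 2 * sum_i ||x_i - mu||^2, where mu is the mean of S."""
    mu = S.mean(axis=0)
    return float(2 * np.sum((S - mu) ** 2))

rng = np.random.default_rng(0)
S = rng.normal(size=(50, 3))  # 50 random points in R^3, one per row
lhs = pairwise_sum_over_size(S)
rhs = twice_scatter_about_mean(S)
print(lhs, rhs, np.isclose(lhs, rhs))
```

The left-hand sum runs over ordered pairs, as in the exercise, so each unordered pair is counted twice and the $i=j$ terms contribute zero.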

### Python Lab

- Implement a $K$-Nearest Neighbors classifier and apply it to the MNIST dataset. You will probably need to reduce the dimensionality with PCA first; at this point you may use a library for the PCA step.
- Implement a $K$-Means clustering algorithm and apply it to the MNIST dataset with $K=10$, after removing the labels and applying a PCA transformation. Compare the resulting cluster assignments with the true digit labels.
- Complete the implementation of the decision tree algorithm from last week.
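For the $K$-Nearest Neighbors lab, a minimal NumPy sketch might look like the following. It assumes PCA computed from an SVD of the centered training data (scikit-learn's `sklearn.decomposition.PCA` would serve the same purpose), Euclidean distance, and majority voting with ties broken toward the smallest label; all function names are hypothetical. Loading MNIST is not shown (one option is `sklearn.datasets.fetch_openml("mnist_784", as_frame=False)`), so the demo uses synthetic data instead.

```python
import numpy as np

def pca_fit(X: np.ndarray, n_components: int):
    """Return (mean, components) for PCA computed via SVD of the centered data."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project X onto the given principal directions."""
    return (X - mean) @ components.T

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    # Squared distances via ||a||^2 - 2 a.b + ||b||^2, shape (n_test, n_train).
    d2 = (np.sum(X_test ** 2, axis=1)[:, None]
          - 2 * X_test @ X_train.T
          + np.sum(X_train ** 2, axis=1)[None, :])
    # First k columns hold the indices of the k smallest distances (in no particular order).
    nearest = np.argpartition(d2, kth=k - 1, axis=1)[:, :k]
    preds = []
    for idx in nearest:
        labels, counts = np.unique(y_train[idx], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # ties go to the smallest label
    return np.array(preds)

# Synthetic stand-in for MNIST: 3 well-separated classes in R^20.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 20))
y = np.repeat(np.arange(3), 40)
X = centers[y] + rng.normal(size=(120, 20))
perm = rng.permutation(120)
X, y = X[perm], y[perm]
X_train, y_train, X_test, y_test = X[:90], y[:90], X[90:], y[90:]

mean, comps = pca_fit(X_train, n_components=5)  # fit PCA on the training data only
Z_train = pca_transform(X_train, mean, comps)
Z_test = pca_transform(X_test, mean, comps)
acc = np.mean(knn_predict(Z_train, y_train, Z_test, k=5) == y_test)
print(f"accuracy: {acc:.2f}")
```

On the full MNIST test set the distance matrix is large (10,000 × 60,000 entries), so in practice you would call `knn_predict` on batches of test points.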
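For the $K$-Means lab, here is a sketch of Lloyd's algorithm, again assuming NumPy, with centroids initialized at $k$ randomly chosen data points (k-means++ initialization is a common alternative). It records the within-cluster sum of squares after every assignment step; that sequence never increases, which is the monotonicity the usual convergence proof relies on. To compare clusters with true labels it builds a contingency table and reports purity, which is only one possible measure (the adjusted Rand index, `sklearn.metrics.adjusted_rand_score`, is another). All names are hypothetical, and the demo uses synthetic 2-D data rather than MNIST.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, assignments, objective history)."""
    rng = np.random.default_rng(seed)
    # Initialize at k distinct data points (copied, since centroids are updated in place).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    history, assign = [], None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_assign = d2.argmin(axis=1)
        history.append(float(d2[np.arange(len(X)), new_assign].sum()))  # within-cluster SS
        if assign is not None and np.array_equal(new_assign, assign):
            break  # assignments unchanged: fixed point reached
        assign = new_assign
        # Update step: move each centroid to the mean of its points (keep it if empty).
        for j in range(k):
            members = X[assign == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, assign, history

def contingency(assign, labels, k, n_classes):
    """table[j, c] = number of points in cluster j whose true label is c."""
    table = np.zeros((k, n_classes), dtype=int)
    np.add.at(table, (assign, labels), 1)
    return table

def purity(table):
    """Fraction of points that carry the majority true label of their cluster."""
    return table.max(axis=1).sum() / table.sum()

rng = np.random.default_rng(1)
true_centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
labels = np.repeat(np.arange(3), 50)
X = true_centers[labels] + rng.normal(size=(150, 2))
centroids, assign, history = kmeans(X, k=3)
table = contingency(assign, labels, k=3, n_classes=3)
print(table)
print("purity:", purity(table))
```

Cluster indices are arbitrary, so row $j$ of the table reads as "which digits ended up in cluster $j$", not as a match with digit $j$.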