## Problem Set 13

This is to be completed by February 1st, 2018.

### Exercises

1. Datacamp
• Complete the lesson:
a. Python Data Science Toolbox (Part II)

2. For a multiclass logistic regressor ending in a softmax, write down the update rules for gradient descent.

3. For a two-layer perceptron ending in a softmax, with an intermediate ReLU non-linearity, write down the update rules for gradient descent.
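A hint of the shape these updates take: for the softmax regressor with weight matrix $W$, one-hot target $y$, and cross-entropy loss, the per-sample gradient collapses to $p - y$ (the convention $z = W^t x$ and the per-sample update are assumptions; adapt to your own setup):

```latex
% Softmax regressor on one sample (x, y), y one-hot, learning rate \eta:
p = \operatorname{softmax}(W^t x), \qquad
L = -\sum_k y_k \log p_k, \qquad
\frac{\partial L}{\partial W} = x\,(p - y)^t, \qquad
W \leftarrow W - \eta\, x\,(p - y)^t .
% For the two-layer perceptron, backpropagate p - y through the ReLU layer.
```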

### Python Lab

1. Build a two-layer perceptron (choose your non-linearity) in numpy for a multi-class classification problem and test it on MNIST.
2. Build an MLP in Keras and test it on MNIST.
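A minimal numpy sketch of the forward and backward pass for lab exercise 1, trained on synthetic blobs rather than MNIST (the layer size, learning rate, and epoch count are arbitrary choices):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_mlp(X, Y, hidden=32, lr=1.0, epochs=500, seed=0):
    """Two-layer perceptron (ReLU hidden layer, softmax output) trained by
    full-batch gradient descent on cross-entropy.
    X: (n, d) inputs; Y: (n, k) one-hot labels. Returns a prediction function."""
    rng = np.random.default_rng(seed)
    d, k, n = X.shape[1], Y.shape[1], len(X)
    W1, b1 = rng.normal(0, 0.1, (d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.1, (hidden, k)), np.zeros(k)
    for _ in range(epochs):
        H = np.maximum(0, X @ W1 + b1)   # ReLU activations
        P = softmax(H @ W2 + b2)         # predicted class probabilities
        dZ2 = (P - Y) / n                # gradient of the loss w.r.t. the logits
        dW2, db2 = H.T @ dZ2, dZ2.sum(0)
        dH = (dZ2 @ W2.T) * (H > 0)      # backprop through the ReLU
        dW1, db1 = X.T @ dH, dH.sum(0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    def predict(Xnew):
        H = np.maximum(0, Xnew @ W1 + b1)
        return np.argmax(H @ W2 + b2, axis=1)  # argmax of logits = argmax of softmax
    return predict

# Usage on two well-separated Gaussian blobs (a stand-in for MNIST here):
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])
labels = np.repeat([0, 1], 50)
predict = train_mlp(X, np.eye(2)[labels])
acc = (predict(X) == labels).mean()
```

The `(P - Y) / n` step is the standard simplification of the softmax-plus-cross-entropy gradient; deriving it is essentially exercise 2 above.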

## Problem Set 12

This is to be completed by January 25th, 2018.

### Exercises

1. Datacamp
• Complete the lesson:
a. Python Data Science Toolbox (Part I)
2. Let $S\subset \Bbb R^n$ with $|S|<\infty$. Let $\mu=\frac{1}{|S|}\sum_{x_i\in S} x_i$. Show that $$\frac{1}{|S|}\sum_{(x_i,x_j)\in S\times S} ||x_i-x_j||^2 = 2\sum_{x_i\in S} ||x_i-\mu||^2.$$
3. Prove that the $K$-means clustering algorithm converges.

### Python Lab

1. Implement a $K$-Nearest Neighbors classifier and apply it to the MNIST dataset (you will probably need to apply PCA; at this point you may use a library for that).
2. Implement a $K$-Means clustering algorithm and apply it to the MNIST dataset (after removing the labels and applying a PCA transformation) with $K=10$. Compare the cluster labelings with the actual labelings.
3. Complete the implementation of the decision tree algorithm from last week.
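A minimal numpy sketch of Lloyd's algorithm for lab exercise 2 (the toy 1-D data and the initialization-by-sampling scheme are illustrative stand-ins for PCA-reduced MNIST with $K=10$):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain K-means: alternate nearest-centroid assignment and centroid
    recomputation until the centroids stop moving. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # sample k points
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Squared distances from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # fixed point reached (cf. exercise 3)
            break
        centroids = new
    return centroids, labels

# Usage: two tight, well-separated 1-D blobs should be recovered exactly.
rng = np.random.default_rng(2)
X = np.concatenate([rng.normal(0, 0.1, 50), rng.normal(10, 0.1, 50)]).reshape(-1, 1)
centroids, labels = kmeans(X, 2)
```

The stopping test is exactly the fixed-point condition whose existence exercise 3 asks you to prove.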

## Problem Set 11

This is to be completed by January 18th, 2018.

### Exercises

1. Datacamp
• Complete the lesson:
a. Intermediate Python for Data Science
2. What is the maximum depth of a decision tree trained on $N$ samples?
3. If we train a decision tree to an arbitrary depth, what will be the training error?
4. How can we alter a loss function to help regularize a decision tree?

### Python Lab
1. Construct a function which transforms a dataframe of numerical features into a dataframe of binary features of the same shape by setting the value of the $j$th feature of the $i$th sample to true precisely when it is greater than or equal to the median value of that feature.
2. Construct a function which, given a dataframe of binary features, a vector of labeled outputs, and a corresponding loss function, chooses the feature to split on that minimizes the loss. Here we assume that on each side of the split the function simply predicts the mean value of the outputs.
3. Test these functions on a real world dataset (for classification) either from ISLR or from Kaggle.
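A pandas sketch of the two lab functions on a toy frame (the squared-error loss and column names are illustrative, and the split chooser assumes both sides of a split are non-empty):

```python
import pandas as pd

def binarize_by_median(df):
    """Lab exercise 1: True exactly where a value is >= its column's median.
    pandas aligns the median Series on columns, so this is one comparison."""
    return df >= df.median()

def best_split(features, y, loss):
    """Lab exercise 2: pick the binary feature whose split minimizes `loss`,
    predicting the mean of y on each side of the split."""
    def split_loss(col):
        mask = features[col]
        pred = pd.Series(index=y.index, dtype=float)
        pred[mask] = y[mask].mean()
        pred[~mask] = y[~mask].mean()
        return loss(y, pred)
    return min(features.columns, key=split_loss)

# Usage with squared-error loss: column "a" binarizes to [F, F, T, T],
# which splits y perfectly, while "c" binarizes to [F, T, F, T] and does not.
df = pd.DataFrame({"a": [1, 2, 3, 4], "c": [1, 3, 2, 4]})
y = pd.Series([0.0, 0.0, 1.0, 1.0])
binary = binarize_by_median(df)
sq_loss = lambda y, p: ((y - p) ** 2).sum()
best = best_split(binary, y, sq_loss)
```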

## Problem Set 10

This is to be completed by January 11th, 2018.

### Exercises

1. Datacamp
• Complete the lesson:
a. Intro to Python for Data Science

During this week’s problem session I will provide an introduction to Python.

## Problem Set 9

This is to be completed by December 21st, 2017.

### Exercises

1. Datacamp
• Complete the lesson:
a. Intermediate R: Practice
2. R Lab:
• Consider a two-class classification problem with one class designated positive.
• Given a vector of predicted probabilities for the positive class, a vector of the true labels (0s and 1s), and an integer $N\ge 2$, construct a function which produces an $N\times 2$ matrix/dataframe whose $i$th row (starting at 1) is the pair $(x,y)$, where $x$ is the false positive rate and $y$ is the true positive rate of the classifier which predicts positive when the probability is greater than or equal to $(i-1)/(N-1)$.
• Construct another function which produces the line graph associated to the points from the previous function.
• Finally, produce another function which estimates the area under the curve of the previous graph.
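As a cross-check for the R lab, the same computation sketched in Python (the function names and the trapezoid rule for the area are my choices, not part of the assignment):

```python
import numpy as np

def roc_points(probs, labels, N):
    """N x 2 array of (FPR, TPR) pairs; row i (1-based) uses the classifier
    that predicts positive when prob >= (i - 1) / (N - 1)."""
    probs, labels = np.asarray(probs), np.asarray(labels)
    n_pos, n_neg = (labels == 1).sum(), (labels == 0).sum()
    pts = []
    for i in range(1, N + 1):
        pred = probs >= (i - 1) / (N - 1)
        tpr = (pred & (labels == 1)).sum() / n_pos
        fpr = (pred & (labels == 0)).sum() / n_neg
        pts.append((fpr, tpr))
    return np.array(pts)

def auc_estimate(pts):
    """Trapezoidal estimate of the area under the (FPR, TPR) polyline."""
    # lexsort orders by FPR, breaking ties by TPR, tracing the curve upward.
    order = np.lexsort((pts[:, 1], pts[:, 0]))
    x, y = pts[order, 0], pts[order, 1]
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2))

# Usage: a classifier that ranks perfectly should score an AUC of 1.
pts = roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0], N=11)
auc = auc_estimate(pts)
```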

## Problem Set 7

This is to be completed by December 7th, 2017.

### Exercises

1. Datacamp
• Complete the lesson:
a. Credit Risk Modeling in R
2. Exercises from Elements of Statistical Learning
• Complete exercise:
a. 4.5 (Use the reduced form of the logistic classifier that fits an (n,k-1)-matrix for a problem with n features and k classes).
3. R Lab:
• Construct a logistic regression classifier by hand and test it on MNIST.

## Problem Set 6

This is to be completed by November 30th, 2017.

### Exercises

1. Datacamp
• Complete the lesson:
a. Text Mining: Bag of Words
2. Exercises from Elements of Statistical Learning
• Complete exercises:
a. 4.2
b. 4.6
3. Run the perceptron learning algorithm by hand for the two-class classification problem with $(X,Y)$-pairs (given by bitwise or): $((0,0),0), ((1,0),1), ((0,1),1), ((1,1),1)$.

4. R Lab:

• Update the LDA Classifier from last week as follows.
a. After fitting an LDA Classifier, produce a function which projects an input sample onto the hyperplane containing the class centroids.
b. Update the classifier to use these projections for classification. Compare the runtimes of prediction of the two methods when the number of features is large relative to the number of classes.
• Construct a perceptron classifier for two-class classification. Put an upper bound on the number of steps.
a. Evaluate the perceptron on the above problem and on the bitwise xor problem: $((0,0),0), ((1,0),1), ((0,1),1), ((1,1),0)$.
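A numpy sketch of such a bounded perceptron, as a cross-check for the R lab (the $\{0,1\}$ prediction convention and the step cap of 1000 passes are arbitrary choices):

```python
import numpy as np

def perceptron(X, y, max_passes=1000):
    """Perceptron learning rule with a bias term and a cap on the number of
    passes. y in {0, 1}; returns (weights, converged_flag)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(max_passes):
        updated = False
        for xi, yi in zip(Xb, y):
            pred = 1 if xi @ w >= 0 else 0
            if pred != yi:
                w += (yi - pred) * xi   # classic perceptron update
                updated = True
        if not updated:
            return w, True              # a full mistake-free pass: converged
    return w, False                     # hit the bound (e.g. on xor)

# Usage on the two problems above:
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], float)
w_or,  ok_or  = perceptron(X, np.array([0, 1, 1, 1]))  # or: separable
w_xor, ok_xor = perceptron(X, np.array([0, 1, 1, 0]))  # xor: not separable
```

The step bound matters precisely because xor is not linearly separable: without it, the update loop never terminates.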

## Problem Set 5

This is to be completed by November 23rd, 2017.

### Exercises

1. Datacamp
• Complete the lesson:
a. Machine Learning Toolbox
2. R Lab:
• Write a function in R that will take in a vector of discrete variables and will produce the corresponding one hot encodings.
• Write a function in R that will take in a matrix $X$ of samples and a vector $Y$ of classes (in $\{1,\dots,K\}$) and produces a function which classifies a new sample according to the LDA rule (do not use R's built-in machine learning facilities).
• Do the same for QDA.
• Apply your models to the MNIST dataset for handwriting classification. There are various ways to get this dataset, but perhaps the easiest is to pull it in through the keras package; having keras installed is useful anyway. You may need to reduce the dimension of the data and/or the number of samples to get this to work in a reasonable amount of time.
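As a cross-check for the R lab, a numpy sketch of the one-hot encoding and the plain LDA rule (the pooled-covariance estimate and the small ridge term for invertibility are my own choices):

```python
import numpy as np

def one_hot(v):
    """One-hot encode a vector of discrete values (first lab item)."""
    classes, idx = np.unique(v, return_inverse=True)
    return np.eye(len(classes))[idx]

def lda_fit(X, y):
    """Fit the LDA rule by hand: class means, shared pooled covariance, and
    log-priors; returns a classifier function (second lab item)."""
    classes = np.unique(y)
    mus = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([(y == c).mean() for c in classes])
    # Pooled within-class covariance, ridged slightly for invertibility.
    S = sum(np.cov(X[y == c].T, bias=True) * (y == c).sum() for c in classes)
    S = S / len(X) + 1e-6 * np.eye(X.shape[1])
    Sinv = np.linalg.inv(S)
    quad = 0.5 * np.sum((mus @ Sinv) * mus, axis=1)  # (1/2) mu_k^T S^-1 mu_k
    def classify(Xnew):
        scores = Xnew @ Sinv @ mus.T - quad + np.log(priors)
        return classes[scores.argmax(axis=1)]
    return classify

# Usage on two Gaussian blobs (a stand-in for reduced MNIST):
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
y = np.repeat([0, 1], 100)
classify = lda_fit(X, y)
acc = (classify(X) == y).mean()
```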

## Problem Set 4

This is to be completed by November 16th, 2017.

### Exercises

1. Datacamp
• Complete the lessons:
a. Supervised Learning in R: Regression
b. Supervised Learning in R: Classification
c. Exploratory Data Analysis (If you did not already do so)
2. Let $\lambda\geq 0$, $X\in \Bbb R^n\otimes \Bbb R^m$, $Y\in \Bbb R^n$, and $\beta \in \Bbb R^m$ suitably regarded as matrices.
• Identify when $$\textrm{argmin}_\beta (X\beta-Y)^t(X\beta-Y)+\lambda \beta^t\beta$$ exists, and determine it in these cases.
• How does the size of $\lambda$ affect the solution? When might it be desirable to set $\lambda$ to be positive?
3. Bayesian approach to linear regression. Suppose that $\beta\sim N(0,\tau^2 I)$, and the distribution of $Y$ conditional on $X$ is $N(X\beta,\sigma^2 I)$, i.e., $\beta$, $X$, and $Y$ are vector-valued random variables. Show that, after seeing some data $D$, the MAP and mean estimates of the posterior distribution for $\beta$ correspond to solutions of the previous problem.
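A sketch connecting exercises 2 and 3: the first-order condition of the penalized objective gives the ridge solution, and the Gaussian prior reproduces the same objective up to scaling (the identification $\lambda = \sigma^2/\tau^2$ is the standard correspondence):

```latex
% Exercise 2: set the gradient of the penalized least-squares objective to zero.
\nabla_\beta\!\left[(X\beta - Y)^t(X\beta - Y) + \lambda\,\beta^t\beta\right]
  = 2X^tX\beta - 2X^tY + 2\lambda\beta = 0
\;\Longrightarrow\;
\hat\beta = (X^tX + \lambda I)^{-1}X^tY ,
% which exists whenever \lambda > 0 or X has full column rank.

% Exercise 3: up to additive constants, the log-posterior is
\log p(\beta \mid D)
  = -\tfrac{1}{2\sigma^2}(X\beta - Y)^t(X\beta - Y)
    - \tfrac{1}{2\tau^2}\,\beta^t\beta + \text{const},
% so maximizing it is the problem above with \lambda = \sigma^2/\tau^2.
```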

4. R Lab:

• Write a linear regression function that takes in a matrix of $x$-values and a corresponding vector of $y$-values and returns a function derived from the linear regression fit.
• Write a function that takes in a non-negative integer (the degree), a vector of $x$-values and a corresponding vector of $y$-values and returns a function derived from the polynomial regression fit.
• Write a function that takes in a number $n$, a vector of $x$-values, and a corresponding vector of $y$-values and returns a function of the form: $$f(x)=\sum_{i=0}^n a_i \sin(ix)+b_i\cos(ix).$$
• Generate suitable testing data for the three functions constructed above and plot the fitted functions.
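All three fits reduce to least squares on a suitable design matrix; a numpy sketch (in Python rather than R, for illustration; `np.linalg.lstsq` handles the rank-deficient $\sin(0\cdot x)$ column via its minimum-norm solution):

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a prediction function."""
    design = lambda X: np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return lambda X: design(X) @ coef

def fit_poly(degree, x, y):
    """Polynomial regression via a Vandermonde design matrix."""
    design = lambda x: np.vander(x, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(design(x), y, rcond=None)
    return lambda x: design(x) @ coef

def fit_fourier(n, x, y):
    """Fit f(x) = sum_{i=0}^n a_i sin(ix) + b_i cos(ix) by least squares."""
    def design(x):
        i = np.arange(n + 1)
        # The sin(0*x) column is identically zero; lstsq assigns it weight 0.
        return np.hstack([np.sin(np.outer(x, i)), np.cos(np.outer(x, i))])
    coef, *_ = np.linalg.lstsq(design(x), y, rcond=None)
    return lambda x: design(x) @ coef

# Usage: each fit recovers a function its basis can represent exactly.
x = np.linspace(0, 2 * np.pi, 40)
fhat = fit_fourier(1, x, np.cos(x))
```

Each function returns a closure over the fitted coefficients, matching the lab's "returns a function" requirement; only the design-matrix builder changes between the three.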