Problem Set 11

This is to be completed by January 18th, 2018.

Exercises

  1. Datacamp
    • Complete the lesson:
      a. Intermediate Python for Data Science
  2. What is the maximum depth of a decision tree trained on $N$ samples?
  3. If we train a decision tree to an arbitrary depth, what will be the training error?
  4. How can we alter a loss function to help regularize a decision tree?
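For Exercise 4, one standard approach is cost-complexity pruning, as presented in ISLR's chapter on tree-based methods. It adds a penalty on the number of terminal nodes $|T|$ to the training loss. For a regression tree $T$ with terminal regions $R_1, \dots, R_{|T|}$, leaf predictions $\hat{y}_{R_m}$ (the mean response in $R_m$), and a tuning parameter $\alpha \ge 0$, the penalized objective is:

$$
\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha \, |T|
$$

Setting $\alpha = 0$ recovers the unpenalized squared-error loss, and larger values of $\alpha$ favor smaller trees. In practice $\alpha$ is usually chosen by cross-validation.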

Python Lab
1. Construct a function that transforms a dataframe of numerical features into a dataframe of binary features of the same shape. The value of the $j$th feature of the $i$th sample should be true precisely when it is greater than or equal to the median value of the $j$th feature.
2. Construct a function that, when given a dataframe of binary features, labeled outputs, and a corresponding loss function, chooses the feature to split on that minimizes the loss function. Here we assume that each side of the split simply predicts the mean value of the outputs on that side.
3. Test these functions on a real-world classification dataset from either ISLR or Kaggle.
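Lab items 1 and 2 can be sketched roughly as follows. This is a minimal sketch, not a reference solution. The names `binarize_by_median`, `best_split`, and `mse` are hypothetical. The sketch assumes that the loss function takes `(y_true, y_pred)` as 1-D NumPy arrays and returns a float, and that the labels are numeric (for example 0/1 for classification) so that their mean is defined.

```python
import numpy as np
import pandas as pd


def binarize_by_median(df):
    """Return a boolean DataFrame with the same shape as `df`.

    Entry (i, j) is True when df.iloc[i, j] >= the median of column j.
    Assumes every column of `df` is numeric.
    """
    # df.median() gives one median per column; .ge compares column-wise.
    return df.ge(df.median())


def best_split(X, y, loss):
    """Pick the binary feature whose split minimizes `loss`.

    Assumed (hypothetical) interface: loss(y_true, y_pred) -> float,
    where both arguments are 1-D NumPy arrays. Each side of a split
    predicts the mean of the outputs that fall on that side.
    Returns (best_feature_name, best_loss); ties keep the first feature.
    """
    y_arr = np.asarray(y, dtype=float)
    best_feature, best_loss = None, np.inf
    for col in X.columns:
        mask = X[col].to_numpy(dtype=bool)
        pred = np.empty_like(y_arr)
        # Fill predictions for the True side and the False side separately.
        # An empty side is skipped, and the other side then covers every sample.
        for side in (mask, ~mask):
            if side.any():
                pred[side] = y_arr[side].mean()
        current = loss(y_arr, pred)
        if current < best_loss:
            best_feature, best_loss = col, current
    return best_feature, best_loss


def mse(y_true, y_pred):
    """Mean squared error, used here only as an example loss."""
    return float(np.mean((y_true - y_pred) ** 2))


# Toy data, made up for illustration: column "a" separates the labels perfectly.
X_num = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 40, 20, 30]})
y = pd.Series([0, 0, 1, 1])
X_bin = binarize_by_median(X_num)
print(X_bin)
print(best_split(X_bin, y, mse))  # → ('a', 0.0)
```

With 0/1 labels, each side's mean prediction is the fraction of positive samples on that side. For item 3, the same two calls could be run on the numeric feature columns and a 0/1-encoded label column of the chosen dataset.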
