{"id":563,"date":"2018-01-13T17:15:20","date_gmt":"2018-01-13T17:15:20","guid":{"rendered":"http:\/\/www.nullplug.org\/ML-Blog\/?p=563"},"modified":"2018-01-13T17:15:20","modified_gmt":"2018-01-13T17:15:20","slug":"problem-set-11","status":"publish","type":"post","link":"https:\/\/www.nullplug.org\/ML-Blog\/2018\/01\/13\/problem-set-11\/","title":{"rendered":"Problem Set 11"},"content":{"rendered":"<h2>Problem Set 11<\/h2>\n<p>This is to be completed by January 18th, 2018.<\/p>\n<h3>Exercises<\/h3>\n<ol>\n<li><a href=\"https:\/\/www.datacamp.com\/home\">Datacamp<\/a>\n<ul>\n<li>Complete the lesson:<br \/>\na. Intermediate Python for Data Science<\/li>\n<\/ul>\n<\/li>\n<li>What is the maximum depth of a decision tree trained on $N$ samples?<\/li>\n<li>If we train a decision tree to an arbitrary depth, what will be the training error?<\/li>\n<li>How can we alter a loss function to help regularize a decision tree?<\/li>\n<\/ol>\n<p>Python Lab<br \/>\n1. Construct a function which will transform a dataframe of numerical features into a dataframe of binary features of the same shape by setting the value of the jth feature of the ith sample to be true precisely when the value is greater than or equal to the median value of that feature.<br \/>\n2. Construct a function which, when presented with a dataframe of binary features, labeled outputs, and a corresponding loss function, chooses the feature to split upon which will minimize the loss function. Here we assume that on each split the function will just return the mean value of the outputs.<br \/>\n3. Test these functions on a real-world dataset (for classification) either from ISLR or from Kaggle.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Problem Set 11 This is to be completed by January 18th, 2018. Exercises Datacamp Complete the lesson: a. Intermediate Python for Data Science What is the maximum depth of a decision tree trained on $N$ samples? 
If we train a decision tree to an arbitrary depth, what will be the training error? How can we &hellip; <a href=\"https:\/\/www.nullplug.org\/ML-Blog\/2018\/01\/13\/problem-set-11\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Problem Set 11&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[],"class_list":["post-563","post","type-post","status-publish","format-standard","hentry","category-general"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p9dIpN-95","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":555,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/12\/18\/problem-set-9\/","url_meta":{"origin":563,"position":0},"title":"Problem Set 9","author":"Justin Noel","date":"December 18, 2017","format":false,"excerpt":"Problem Set 9 This is to be completed by December 21st, 2017. Exercises Datacamp Complete the lesson: a. Intermediate R: Practice R Lab: Consider a two class classification problem with one class denoted positive. 
Given a list of probability predictions for the positive class, a list of the correct probabilities\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":538,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/11\/24\/problem-set-6\/","url_meta":{"origin":563,"position":1},"title":"Problem Set 6","author":"Justin Noel","date":"November 24, 2017","format":false,"excerpt":"Problem Set 6 This is to be completed by November 30th, 2017. Exercises Datacamp Complete the lesson: a. Text Mining: Bag of Words Exercises from Elements of Statistical Learning Complete exercises: a. 4.2 b. 4.6 Run the perceptron learning algorithm by hand for the two class classification problem with $(X,Y)$-pairs\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":33,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/09\/26\/machine-learning-overview\/","url_meta":{"origin":563,"position":2},"title":"Machine Learning Overview","author":"Justin Noel","date":"September 26, 2017","format":false,"excerpt":"Science is knowledge which we understand so well that we can teach it to a computer; and if we don't fully understand something, it is an art to deal with it. 
Donald Knuth Introduction First Attempt at a Definition One says that an algorithm learns if its performance improves with\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/web.stanford.edu\/class\/cs234\/images\/header2.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/web.stanford.edu\/class\/cs234\/images\/header2.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/web.stanford.edu\/class\/cs234\/images\/header2.png?resize=525%2C300 1.5x, https:\/\/i0.wp.com\/web.stanford.edu\/class\/cs234\/images\/header2.png?resize=700%2C400 2x"},"classes":[]},{"id":486,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/11\/03\/problem-set-3\/","url_meta":{"origin":563,"position":3},"title":"Problem Set 3","author":"Justin Noel","date":"November 3, 2017","format":false,"excerpt":"Problem Set 3 This is to be completed by November 9th, 2017. Exercises [Datacamp](https:\/\/www.datacamp.com\/home Complete the lesson \"Introduction to Machine Learning\". This should have also included \"Exploratory Data Analysis\". This has been added to the next week's assignment. MLE for the uniform distribution. (Source: Kaelbling\/Murphy) Consider a uniform distribution centered\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":572,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2018\/01\/19\/572\/","url_meta":{"origin":563,"position":4},"title":"Problem Set  12","author":"Justin Noel","date":"January 19, 2018","format":false,"excerpt":"Problem Set 12 This is to be completed by January 25th, 2018. Exercises Datacamp Complete the lesson: a. Python Data Science Toolbox (Part I) Let $S\\subset \\Bbb R^n$ with $|S|<\\infty$. 
Let $\\mu=\\frac{1}{|S|}\\sum_{x_i\\in S} x_i$. Show that $$ \\frac{1}{|S|}\\sum_{(x_i,x_j)\\in S\\times S} ||x_i-x_j||^2 = 2\\sum_{x_i\\in S} ||x_i-\\mu||^2.$$ Prove that the $K$-means clustering\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":214,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/10\/04\/linear-regression\/","url_meta":{"origin":563,"position":5},"title":"Linear Regression","author":"Justin Noel","date":"October 4, 2017","format":false,"excerpt":"Prediction is very difficult, especially about the future. - Niels Bohr The problem Suppose we have a list of vectors (which we can think of as samples) $x_1, \\cdots, x_m\\in \\Bbb R^n$ and a corresponding list of output scalars $y_1, \\cdots, y_m \\in \\Bbb R$ (which we can regard as\u2026","rel":"","context":"In &quot;Regression&quot;","block_context":{"text":"Regression","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/regression\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.nullplug.org\/ML-Blog\/wp-content\/uploads\/2017\/10\/compressed_linreg_normal.gif?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.nullplug.org\/ML-Blog\/wp-content\/uploads\/2017\/10\/compressed_linreg_normal.gif?resize=350%2C200 1x, https:\/\/i0.wp.com\/www.nullplug.org\/ML-Blog\/wp-content\/uploads\/2017\/10\/compressed_linreg_normal.gif?resize=525%2C300 
1.5x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts\/563","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/comments?post=563"}],"version-history":[{"count":1,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts\/563\/revisions"}],"predecessor-version":[{"id":564,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts\/563\/revisions\/564"}],"wp:attachment":[{"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/media?parent=563"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/categories?post=563"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/tags?post=563"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}