{"id":468,"date":"2017-10-27T05:48:26","date_gmt":"2017-10-27T05:48:26","guid":{"rendered":"http:\/\/www.nullplug.org\/ML-Blog\/?p=468"},"modified":"2017-10-27T05:48:26","modified_gmt":"2017-10-27T05:48:26","slug":"problem-set-2","status":"publish","type":"post","link":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/10\/27\/problem-set-2\/","title":{"rendered":"Problem Set 2"},"content":{"rendered":"<h2>Problem Set 2<\/h2>\n<p>This is to be completed by November 2nd, 2017.<\/p>\n<h3>Exercises<\/h3>\n<ol>\n<li><a href=\"https:\/\/www.datacamp.com\/home\">Datacamp<\/a>\n<ul>\n<li>Complete the lesson &#8220;Data Visualization in R&#8221;.<\/li>\n<\/ul>\n<\/li>\n<li>Probabilities are sensitive to the form of the question that was used to generate the answer:\n<ul>\n<li>(Source: Minka, Murphy.) My neighbor has two children. Assuming that the gender of a child is like a coin flip, it is most likely, a priori, that my neighbor has one boy and one girl, with probability 1\/2. The other possibilities\u2014two boys or two girls\u2014have probabilities 1\/4 and 1\/4.<br \/>\na. Suppose I ask him whether he has any boys, and he says yes. What is the probability that one child is a girl?<br \/>\nb. Suppose instead that I happen to see one of his children run by, and it is a boy. What is the probability that the other child is a girl?<\/li>\n<\/ul>\n<\/li>\n<li>Legal reasoning\n<ul>\n<li>(Source: Peter Lee, Murphy.) Suppose a crime has been committed. Blood is found at the scene for which there is no innocent explanation. It is of a type which is present in 1% of the population.<br \/>\na. The prosecutor claims: \u201cThere is a 1% chance that the defendant would have the crime blood type if he were innocent. Thus there is a 99% chance that he is guilty\u201d. This is known as the <em>prosecutor\u2019s fallacy<\/em>. What is wrong with this argument?<br \/>\nb. The defender claims: \u201cThe crime occurred in a city of 800,000 people. 
The blood type would be found in approximately 8000 people. The evidence has provided a probability of just 1 in 8000 that the defendant is guilty, and thus has no relevance.\u201d This is known as the <em>defender\u2019s fallacy<\/em>. What is wrong with this argument?<\/li>\n<\/ul>\n<\/li>\n<li>Bayes rule for medical diagnosis\n<ul>\n<li>(Source: Koller, Murphy.) After your yearly checkup, the doctor has bad news and good news. The bad news is that you tested positive for a serious disease, and that the test is 99% accurate (i.e., the probability of testing positive given that you have the disease is 0.99, as is the probability of testing negative given that you don\u2019t have the disease). The good news is that this is a rare disease, striking only one in 10,000 people.<br \/>\na. What are the chances that you actually have the disease? (Show your calculations as well as giving the final result.)<\/li>\n<\/ul>\n<\/li>\n<li>Conditional independence (Source: Koller.)\n<ul>\n<li>Let $H\\in &#123;1,\\cdots, K&#125;$ be a discrete random variable, and let $e_1$ and $e_2$ be the observed values of two other random variables $E_1$ and $E_2$. Suppose we wish to calculate the vector<br \/>\n$$P(H|e_1, e_2) = (P(H=1|e_1,e_2),\\cdots, P(H=K|e_1, e_2)).$$<br \/>\na. Which of the following sets of numbers are sufficient for the calculation?<br \/>\ni. $P(e_1, e_2), P(H), P(e_1| H), P(e_2, H)$.<br \/>\nii. $P(e_1, e_2), P(H), P(e_1, e_2 | H)$.<br \/>\niii. $P(e_1|H), P(e_2|H), P(H)$.  <\/li>\n<\/ul>\n<p>b. Now suppose we assume $E_1\\perp E_2 | H$ (i.e., $E_1$ and $E_2$ are independent given $H$). Which of the above 3 sets are sufficient now?<\/p>\n<\/li>\n<li>R lab\n<ul>\n<li>Estimate the value of $\\pi$ by taking uniform random samples from the square $[-1,1]\\times [-1,1]$ and seeing which lie in the disc $x^2+y^2\\leq 1$.<\/li>\n<li>A company is trying to determine why their employees leave and why they stay. 
They have a list of roughly 15000 employee records <a href=\"https:\/\/www.kaggle.com\/ludobenistant\/hr-analytics\/data\">here<\/a>.<br \/>\na. Download this dataset and load it into R (this may require setting up a Kaggle account if you don&#8217;t already have one).<br \/>\nb. Examine the dataset and see if you need to transform any of the features (columns); e.g., are there factors that were not recognized as such, or is there missing data?<br \/>\nc. Randomly shuffle the rows and cut the dataset into two pieces, with 10000 entries in a data frame called train and the remaining entries in a data frame called valid.<br \/>\nd. Study the train data frame and see if you can find any features that predict whether or not an employee will leave.<br \/>\ne. Make a hypothesis about how you can predict whether an employee will leave by studying the train data.<br \/>\nf. Once you have fixed this hypothesis, evaluate how well your criteria work on the valid data frame.<br \/>\ng. Justify your proposal with data and charts. Save at least one of these charts to a pdf file to share with management.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>Problem Set 2 This is to be completed by November 2nd, 2017. Exercises Datacamp Complete the lesson &#8220;Data Visualization in R&#8221;. Probabilities are sensitive to the form of the question that was used to generate the answer: (Source: Minka, Murphy.) My neighbor has two children. 
Assuming that the gender of a child is like a &hellip; <a href=\"https:\/\/www.nullplug.org\/ML-Blog\/2017\/10\/27\/problem-set-2\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Problem Set 2&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[],"class_list":["post-468","post","type-post","status-publish","format-standard","hentry","category-general"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p9dIpN-7y","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":555,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/12\/18\/problem-set-9\/","url_meta":{"origin":468,"position":0},"title":"Problem Set 9","author":"Justin Noel","date":"December 18, 2017","format":false,"excerpt":"Problem Set 9 This is to be completed by December 21st, 2017. Exercises Datacamp Complete the lesson: a. Intermediate R: Practice R Lab: Consider a two class classification problem with one class denoted positive. 
Given a list of probability predictions for the positive class, a list of the correct probabilities\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":61,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/09\/26\/probability-and-statistics-background\/","url_meta":{"origin":468,"position":1},"title":"Probability and Statistics Background","author":"Justin Noel","date":"September 26, 2017","format":false,"excerpt":"Statistics - A subject which most statisticians find difficult, but in which nearly all physicians are expert. - Stephen S. Senn Introduction For us, we will regard probability theory as a way of logically reasoning about uncertainty. I realize that this is not a precise mathematical definition, but neither is\u2026","rel":"","context":"In &quot;Supplementary material&quot;","block_context":{"text":"Supplementary material","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/supplementary-material\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":342,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/10\/10\/hypothesis-testing\/","url_meta":{"origin":468,"position":2},"title":"Hypothesis Testing","author":"Justin Noel","date":"October 10, 2017","format":false,"excerpt":"Acceptance without proof is the fundamental characteristic of Western religion, rejection without proof is the fundamental characteristic of Western science. - Gary Zukav, \"The Dancing Wu Li Masters\" Hypothesis Testing Now we consider Hypothesis Testing in an example. 
While Bayesians also have a form of hypothesis testing, the term is\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":344,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/10\/10\/parameter-estimation\/","url_meta":{"origin":468,"position":3},"title":"Parameter Estimation","author":"Justin Noel","date":"October 10, 2017","format":false,"excerpt":"\u2026the statistician knows\u2026that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world. - George Box (JASA, 1976, Vol.\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.nullplug.org\/ML-Blog\/wp-content\/uploads\/2017\/10\/compressed_polyreg_normal.gif?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.nullplug.org\/ML-Blog\/wp-content\/uploads\/2017\/10\/compressed_polyreg_normal.gif?resize=350%2C200 1x, https:\/\/i0.wp.com\/www.nullplug.org\/ML-Blog\/wp-content\/uploads\/2017\/10\/compressed_polyreg_normal.gif?resize=525%2C300 1.5x"},"classes":[]},{"id":486,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/11\/03\/problem-set-3\/","url_meta":{"origin":468,"position":4},"title":"Problem Set 3","author":"Justin Noel","date":"November 3, 2017","format":false,"excerpt":"Problem Set 3 This is to be completed by November 9th, 2017. Exercises [Datacamp](https:\/\/www.datacamp.com\/home Complete the lesson \"Introduction to Machine Learning\". This should have also included \"Exploratory Data Analysis\". This has been added to the next week's assignment. MLE for the uniform distribution. 
(Source: Kaelbling\/Murphy) Consider a uniform distribution centered\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":286,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/10\/05\/statistical-inference-2\/","url_meta":{"origin":468,"position":5},"title":"Statistical Inference","author":"Justin Noel","date":"October 5, 2017","format":false,"excerpt":"All models are wrong, but some are useful. - George Box Introduction The general setup for statistical inference is that we are given some data $D$ which we assume arise as the values of a random variable that we assume is distributed according to some parametric model $m(\\theta)$. The goal\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts\/468","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/comments?post=468"}],"version-history":[{"count":6,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts\/468\/revisions"}],"predecessor-version":[{"id":475,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts\/468\/revisions\/475"}],"wp:attachment":[{"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/media?parent=468"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-jso
n\/wp\/v2\/categories?post=468"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/tags?post=468"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}