{"id":63,"date":"2017-09-26T20:00:54","date_gmt":"2017-09-26T20:00:54","guid":{"rendered":"http:\/\/www.nullplug.org\/ML-Blog\/?p=63"},"modified":"2017-10-11T11:09:56","modified_gmt":"2017-10-11T11:09:56","slug":"computer-science-background","status":"publish","type":"post","link":"http:\/\/www.nullplug.org\/ML-Blog\/2017\/09\/26\/computer-science-background\/","title":{"rendered":"Computer Science Background"},"content":{"rendered":"<blockquote><p>\n  If you find that you&#8217;re spending almost all your time on theory, start turning some attention to practical things; it will improve your theories. If you find that you&#8217;re spending almost all your time on practice, start turning some attention to theoretical things; it will improve your practice. &#8211; <a href=\"https:\/\/en.wikiquote.org\/wiki\/Donald_Knuth\">Donald Knuth<\/a>\n<\/p><\/blockquote>\n<h2>Introduction<\/h2>\n<p>This section will contain some very basic computer science background. Since the target audience of this lecture series has little to no training in this area, I will attempt to fill in some gaps (admittedly very poorly).<\/p>\n<h3>Numerical analysis<\/h3>\n<p>It should come as no surprise that computers store data as sequences of 1&#8217;s and 0&#8217;s. For example, non-negative integers are represented by their base-2 expansions (e.g., 7 is represented by 111 and 8 by 1000). Since we can only allocate a finite amount of space for each integer (e.g., 16, 32, or 64 bits), we can only hold a finite range of integers (e.g., $-2^{63}\leq i \leq 2^{63}-1$) (see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Two%27s_complement\">two&#8217;s complement<\/a>). This means that one must avoid applying operations that produce values outside this range. 
Arithmetic operations that generate values outside their allowed range are said to <em>overflow<\/em>.<\/p>\n<p>Problems with the finite representation of integers are dwarfed by the problems arising from the finite representation of real numbers. The (IEEE) standard way to finitely represent a real number is as a <em>floating point number<\/em>. The idea is to approximate a real number $x$ by a non-negative integer $\alpha$, an integer $\beta$, and a sign bit $\epsilon$ so that $x\approx (-1)^\epsilon \cdot \alpha \cdot 2^{\beta}$. Since, using a fixed number of bits, we can only represent finitely many integers, there are only finitely many such floating point numbers.<\/p>\n<p>The finiteness of this representation leads to some obvious consequences as well as some non-obvious ones.<\/p>\n<ol>\n<li>Since the exponent $\beta$ is bounded above, there is a bound on the range of real numbers that we can encode. This leads to the same type of overflow problems as for integers.<\/li>\n<li>Since the exponent $\beta$ is bounded below, there is a smallest positive real number we can encode in this representation. This leads to a new set of problems called <em>underflow<\/em>. For example, $e^{-n}=0$ in floating point arithmetic for $n\gg 0$. As a consequence, $e^{-n}\cdot e^{n}\neq e^0=1$ for large $n$. If we instead evaluate this product from the middle outward, then we obtain $$e\cdot(\cdots(e\cdot(e\cdot e^{-1})\cdot e^{-1})\cdots)\cdot e^{-1}\approx 1.$$ <em>So multiplication of floating point numbers fails to be associative<\/em>.<\/li>\n<li>Since $\alpha$ is a bounded positive integer, we can only encode a finite amount of precision. This implies that for certain (large) values of $m$ and $n$ (e.g., $m=52$ and $n=30$ in IEEE double precision), $$\sum_{i=1}^{2^n} 2^{-n} + 2^m = 2^m+1,$$ if we apply the operations from left to right, while the sum is $2^m$ if we apply them from right to left. 
<em>So addition of floating point numbers fails to be associative.<\/em><\/li>\n<li>We cannot encode fractions precisely, so $1\/n \cdot n\neq 1$ in general. On older calculators one can witness this firsthand and see that $1\/3\cdot 3&#8220;=&#8221;0.9999999999$.<\/li>\n<\/ol>\n<h4>Remark<\/h4>\n<p>I sometimes find it helpful to visualize all of the floating point numbers placed on the real line as a sequence of dots. These dots cluster heavily near 0 and gradually spread farther and farther apart as we move away from the origin until we meet the size bounds. Beyond these points there are no longer any dots.<\/p>\n<p>Since both the encoding of real numbers as floating point numbers and the floating point operations themselves contribute errors<sup id=\"fnref-63-0\"><a href=\"#fn-63-0\" class=\"jetpack-footnote\">1<\/a><\/sup>, care needs to be taken in the design of floating point algorithms to ensure that these errors do not grow out of control; otherwise, things may go seriously <a href=\"http:\/\/www-users.math.umn.edu\/~arnold\/disasters\/\">wrong<\/a>.<\/p>\n<h4>Remark<\/h4>\n<p>In <a href=\"http:\/\/www.nullplug.org\/ML-Blog\/2017\/09\/26\/probability-and-statistics-background\/\">probability theory<\/a> and, especially, <a href=\"http:\/\/www.nullplug.org\/ML-Blog\/2017\/10\/05\/statistical-inference-2\/\">statistical inference<\/a>, it is very common to take iterated products of probabilities. Since many of these probabilities are very small, there is a high risk of numerical underflow as we calculate the product.<\/p>\n<p>One way to avoid this underflow is to use the simple formula $$\prod_{i=1}^n x_i = \exp(\sum_{i=1}^n \log x_i).$$ This will help avoid numerical underflow in the product. 
To ensure that a sum of quantities of varying magnitude is as accurate as possible, we can first sort the values by increasing magnitude and then add them, smallest terms first.<\/p>\n<h3>Algorithm Analysis<\/h3>\n<p>Computer scientists and engineers are very concerned with space and computational efficiency. Making fair comparisons between two implementations of algorithms can be tricky. Implementation details of essentially identical algorithms can result in significant changes in runtime performance and memory costs (e.g., consider a program written in Python 2.6 versus one written in Python 3.0).<\/p>\n<p>An easier comparison can be made about how the memory and runtime requirements of an algorithm change as we increase the size of the input. For example, the standard grade school algorithm for multiplying two $n\times n$ matrices requires $n^3$ multiplications and $n^2\cdot(n-1)$ addition operations<sup id=\"fnref-63-0.5\"><a href=\"#fn-63-0.5\" class=\"jetpack-footnote\">2<\/a><\/sup>. What is important to realize about this algorithm is not the exact count of the operations, nor how much faster addition is than multiplication; it is that if we replace $n$ with $100n$, then we will require roughly $10^6$ times as many operations!<\/p>\n<p>At this scale, the addition operations and their speed relative to multiplication matter far less than this observation. 
The dominant feature is that the time to multiply two $n\times n$ matrices grows cubically in $n$.<\/p>\n<p>One way to encode the essential features of the growth of functions is with <a href=\"https:\/\/en.wikipedia.org\/wiki\/Big_O_notation\">big O notation<\/a>.<\/p>\n<h4>Definition<\/h4>\n<p>We say<sup id=\"fnref-63-1\"><a href=\"#fn-63-1\" class=\"jetpack-footnote\">3<\/a><\/sup> $f(n)=O(g(n))$ if there exists an $M&gt;0$ such that for all sufficiently large $n$, $$|f(n)|\leq M|g(n)|.$$<\/p>\n<p>For most big data tasks, any algorithm whose runtime is not $O(n \cdot \log n)$ requires some attention (especially attention to how one can parallelize the computation across a cluster). More importantly, once the data is too large to fit into a single computer&#8217;s memory, this entire approach to comparing algorithms breaks down (as do the standard implementations of the algorithms). In this case, one has to think very carefully about how to break the task up into separate pieces. This is all complicated by the fact that moving data between computers is much <a href=\"https:\/\/gist.github.com\/jboner\/2841832\">slower<\/a> than moving data within a single computer&#8217;s memory.<\/p>\n<div class=\"footnotes\">\n<hr \/>\n<ol>\n<li id=\"fn-63-0\">\nOne way to reduce the errors in floating point operations is to use more bits for the representations, do the operation, and then truncate.&#160;<a href=\"#fnref-63-0\">&#8617;<\/a>\n<\/li>\n<li id=\"fn-63-0.5\">\nIt turns out this is not asymptotically optimal. For example, see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Coppersmith%E2%80%93Winograd_algorithm\">here<\/a>.&#160;<a href=\"#fnref-63-0.5\">&#8617;<\/a>\n<\/li>\n<li id=\"fn-63-1\">\nI would like to, on the record, state my disapproval of the standard usage of the equal-sign in the expression $f(n)=O(g(n))$. 
I can see no reasonable interpretation of the equality symbol here.&#160;<a href=\"#fnref-63-1\">&#8617;<\/a>\n<\/li>\n<\/ol>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>If you find that you&#8217;re spending almost all your time on theory, start turning some attention to practical things; it will improve your theories. If you find that you&#8217;re spending almost all your time on practice, start turning some attention to theoretical things; it will improve your practice. &#8211; Donald Knuth Introduction This section will &hellip; <a href=\"http:\/\/www.nullplug.org\/ML-Blog\/2017\/09\/26\/computer-science-background\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Computer Science Background&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[2],"tags":[],"class_list":["post-63","post","type-post","status-publish","format-standard","hentry","category-supplementary-material"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlin
k":"https:\/\/wp.me\/p9dIpN-11","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":35,"url":"http:\/\/www.nullplug.org\/ML-Blog\/2017\/09\/26\/supervised-learning\/","url_meta":{"origin":63,"position":0},"title":"Supervised Learning","author":"Justin Noel","date":"September 26, 2017","format":false,"excerpt":"A big computer, a complex algorithm, and a long time does not equal science. - Robert Gentleman Examples Before getting into what supervised learning precisely is, let's look at some examples of supervised learning tasks: Identifying breast cancer. A sample study. Image classification. List of last year's ILSVRC Winners Threat\u2026","rel":"","context":"In &quot;Supervised Learning&quot;","block_context":{"text":"Supervised Learning","link":"http:\/\/www.nullplug.org\/ML-Blog\/category\/supervised-learning\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":377,"url":"http:\/\/www.nullplug.org\/ML-Blog\/2017\/10\/11\/problem-set-1\/","url_meta":{"origin":63,"position":1},"title":"Problem Set 1","author":"Justin Noel","date":"October 11, 2017","format":false,"excerpt":"Problem Set 1 This is to be completed by November 27th, 2017. (THIS IS A TYPO: This should read October 26th, 2017). It is okay if you finish by this ridiculous first due date. 
Forewarning For several of these exercises, you will be asked to install software and\/or setup accounts\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"http:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":33,"url":"http:\/\/www.nullplug.org\/ML-Blog\/2017\/09\/26\/machine-learning-overview\/","url_meta":{"origin":63,"position":2},"title":"Machine Learning Overview","author":"Justin Noel","date":"September 26, 2017","format":false,"excerpt":"Science is knowledge which we understand so well that we can teach it to a computer; and if we don't fully understand something, it is an art to deal with it. Donald Knuth Introduction First Attempt at a Definition One says that an algorithm learns if its performance improves with\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"http:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/web.stanford.edu\/class\/cs234\/images\/header2.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/web.stanford.edu\/class\/cs234\/images\/header2.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/web.stanford.edu\/class\/cs234\/images\/header2.png?resize=525%2C300 1.5x, https:\/\/i0.wp.com\/web.stanford.edu\/class\/cs234\/images\/header2.png?resize=700%2C400 2x"},"classes":[]},{"id":118,"url":"http:\/\/www.nullplug.org\/ML-Blog\/2017\/09\/27\/additional-sources\/","url_meta":{"origin":63,"position":3},"title":"Additional Sources","author":"Justin Noel","date":"September 27, 2017","format":false,"excerpt":"Textbooks Machine Learning: A probabilistic perspective\u00a0by Kevin Murphy. The material in this book is closest to what we will cover in the course, but is unfortunately not available for free. 
Written by an academic and a practitioner of machine learning, this text is full of real world examples and applications,\u2026","rel":"","context":"In &quot;Supplementary material&quot;","block_context":{"text":"Supplementary material","link":"http:\/\/www.nullplug.org\/ML-Blog\/category\/supplementary-material\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":342,"url":"http:\/\/www.nullplug.org\/ML-Blog\/2017\/10\/10\/hypothesis-testing\/","url_meta":{"origin":63,"position":4},"title":"Hypothesis Testing","author":"Justin Noel","date":"October 10, 2017","format":false,"excerpt":"Acceptance without proof is the fundamental characteristic of Western religion, rejection without proof is the fundamental characteristic of Western science. - Gary Zukav, \"The Dancing Wu Li Masters\" Hypothesis Testing Now we consider Hypothesis Testing in an example. While Bayesians also have a form of hypothesis testing, the term is\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"http:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":486,"url":"http:\/\/www.nullplug.org\/ML-Blog\/2017\/11\/03\/problem-set-3\/","url_meta":{"origin":63,"position":5},"title":"Problem Set 3","author":"Justin Noel","date":"November 3, 2017","format":false,"excerpt":"Problem Set 3 This is to be completed by November 9th, 2017. Exercises [Datacamp](https:\/\/www.datacamp.com\/home Complete the lesson \"Introduction to Machine Learning\". This should have also included \"Exploratory Data Analysis\". This has been added to the next week's assignment. MLE for the uniform distribution. 
(Source: Kaelbling\/Murphy) Consider a uniform distribution centered\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"http:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"http:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts\/63","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/comments?post=63"}],"version-history":[{"count":10,"href":"http:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts\/63\/revisions"}],"predecessor-version":[{"id":395,"href":"http:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts\/63\/revisions\/395"}],"wp:attachment":[{"href":"http:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/media?parent=63"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/categories?post=63"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/tags?post=63"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}