{"id":405,"date":"2017-10-18T12:11:21","date_gmt":"2017-10-18T12:11:21","guid":{"rendered":"http:\/\/www.nullplug.org\/ML-Blog\/?p=405"},"modified":"2017-10-18T12:13:57","modified_gmt":"2017-10-18T12:13:57","slug":"tensor-calculus","status":"publish","type":"post","link":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/10\/18\/tensor-calculus\/","title":{"rendered":"Tensor Calculus"},"content":{"rendered":"<h2>Introduction<\/h2>\n<p>I will assume that you have seen some calculus, including multivariable calculus. That is you know how to differentiate a differentiable function $f\\colon \\Bbb R \\to \\Bbb R$, to obtain a new function $$\\frac{\\partial f}{\\partial x} \\colon \\Bbb R \\to \\Bbb R.$$ You also know how to differentiate a multivariable function $f\\colon \\Bbb R^m \\to \\Bbb R^n$, to obtain a map $$Jf\\colon \\Bbb R^m \\to \\mathrm{Hom}(\\Bbb R^m, \\Bbb R^n),$$ which takes a point $x\\in \\Bbb R^m$ to the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Jacobian_matrix_and_determinant\">Jacobian<\/a> at $x$, which is an $(m,n)$-matrix; alternatively you can think of the Jacobian as a matrix of functions with the $(i,j)$-entry being $\\frac{\\partial f_i}{\\partial x_j}\\colon \\Bbb R^m \\to \\Bbb R.$<\/p>\n<p>The Jacobian is a convenient form for packaging the derivative of a vector valued function with respect to a list of variables arranged into a vector. Many of us learn the basic results (the product rule, the sum rule, the chain rule) very well for single variable calculus and, hopefully, pretty well for multivariable calculus. 
We may have less experience using these compact forms in practice.<\/p>\n<p>While this knowledge is very useful for considering vector-valued functions indexed by vectors of variables, it can be a bit tricky to differentiate matrix-valued functions with respect to matrices or vectors of variables and still get a useful compact form.<\/p>\n<h3>Potential objection<\/h3>\n<p>A bright student might observe that an $(m,n)$-matrix is an element of $\\Bbb R^m\\otimes \\Bbb R^n\\cong \\Bbb R^{mn}$ and hence we can always regard it as an $mn$-vector<sup id=\"fnref-405-1\"><a href=\"#fn-405-1\" class=\"jetpack-footnote\">1<\/a><\/sup>. This is true, but we usually organize data into matrices because that is the most natural structure for them to have, e.g., the Jacobian above. Here are some similar naturally occurring structures:<\/p>\n<ol>\n<li>We have a table with 100 rows. Each row holds data from some sample, say temperature and pressure. This naturally fits into a $(100,2)$-matrix, whose rows are the samples and whose columns are the respective measurements. So we can recover each datum by using two numbers: the row index and the column index. <\/li>\n<li>We have a list of 100 grayscale 640&#215;480 images. To recover a datum (the darkness of a pixel in a sample) in this table, we would use <em>three<\/em> integers:\n<ul>\n<li>the image number, <\/li>\n<li>the $x$-coordinate, and <\/li>\n<li>the $y$-coordinate.<br \/>\nThis is encoded as a <em>tensor<\/em> of shape $(100,640,480)$, i.e., an element of $\\Bbb R^{100}\\otimes \\Bbb R^{640} \\otimes \\Bbb R^{480}$. While we could encode this as a $30720000$-vector, that would make it much more difficult to understand and meaningfully manipulate. <\/li>\n<\/ul>\n<\/li>\n<li>We have a list of 100 color (i.e., RGB) 640&#215;480 images. 
To recover a datum (the strength of a particular color (red\/green\/blue) of a pixel in a sample) in this table, we would use <em>four<\/em> integers:\n<ul>\n<li>the image number, <\/li>\n<li>the $x$-coordinate, <\/li>\n<li>the $y$-coordinate, and <\/li>\n<li>which color we are interested in (e.g., red = 0, green = 1, blue = 2).<br \/>\nThis is encoded as a tensor of shape $(100,640,480,3)$, i.e., an element of $\\Bbb R^{100}\\otimes \\Bbb R^{640} \\otimes \\Bbb R^{480}\\otimes \\Bbb R^3$. <\/li>\n<\/ul>\n<\/li>\n<li>We have a list of 100 color (i.e., RGB) videos, each consisting of 18000 frames of 640&#215;480 images. To recover a datum (the strength of a particular color (red\/green\/blue) of a pixel at a particular frame of a sample) in this table, we would use <em>five<\/em> integers:\n<ul>\n<li>the video number, <\/li>\n<li>the frame number, <\/li>\n<li>the $x$-coordinate, <\/li>\n<li>the $y$-coordinate, and <\/li>\n<li>which color we are interested in (e.g., red = 0, green = 1, blue = 2).<br \/>\nThis is encoded as a tensor of shape $(100,18000,640,480,3)$, i.e., an element of $\\Bbb R^{100}\\otimes \\Bbb R^{18000}\\otimes \\Bbb R^{640} \\otimes \\Bbb R^{480}\\otimes \\Bbb R^3$. Again, it would be very unnatural to work with this as a single vector. <\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<h3>Definition<\/h3>\n<p>Suppose we have a function $F$ taking values in tensors of shape $(n_1,\\cdots, n_\\ell)$ which depends on variables indexed by a tensor $X$ of shape $(m_1,\\cdots, m_k)$. 
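This shape bookkeeping can itself be checked numerically: a finite-difference derivative of a tensor-valued function produces a tensor whose shape is the output shape followed by the input shape. A sketch (assuming NumPy; `tensor_derivative_fd` is a name introduced here for illustration):

```python
import numpy as np

def tensor_derivative_fd(F, X, eps=1e-6):
    """Approximate dF/dX at X by central differences.

    If F(X) has shape s and X has shape t, the result has shape s + t,
    with entry (i..., j...) approximating dF_{i...}/dX_{j...}.
    """
    X = np.asarray(X, dtype=float)
    out_shape = np.asarray(F(X)).shape
    D = np.zeros(out_shape + X.shape)
    for j in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[j] = eps
        # Fill the slice of D indexed by the input multi-index j.
        D[(Ellipsis,) + j] = (np.asarray(F(X + step)) - np.asarray(F(X - step))) / (2 * eps)
    return D

# Example: F(X) = X @ v maps a (4, 3) matrix of variables to a 4-vector,
# so dF/dX should be a tensor of shape (4,) + (4, 3) = (4, 4, 3).
v = np.array([1.0, -1.0, 2.0])
F = lambda X: X @ v
X = np.ones((4, 3))
print(tensor_derivative_fd(F, X).shape)  # (4, 4, 3)
```

Here the exact derivative has entries $\partial F_i/\partial X_{ab} = \delta_{ia} v_b$, which the finite-difference result should match closely.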
The derivative $\\frac{\\partial F}{\\partial X}$ is a function from $\\Bbb R^{m_1}\\otimes \\cdots \\otimes \\Bbb R^{m_k}$ to tensors of shape $(n_1,\\cdots,n_\\ell,m_1,\\cdots,m_k)$ whose $(i_1,\\cdots, i_\\ell, j_1, \\cdots, j_k)$-entry is $\\frac{\\partial F_{i_1,\\cdots, i_\\ell}}{\\partial X_{j_1,\\cdots, j_k}}(-)$.<\/p>\n<h3>Some scandalous abuses of notation<\/h3>\n<p>We will typically abuse notation and let $n$-vectors also denote column vectors, i.e., tensors of shape $(n,1)$.<\/p>\n<p>For convenience, many authors (including this one) will sometimes take tensors of shape $(\\cdots, 1, \\cdots)$ and regard them as tensors of shape $(\\cdots,\\cdots)$ (i.e., we omit the 1 term by &#8216;squeezing the tensor&#8217;). This abuse of notation is both convenient and confusing. Avoiding this abuse tends to cause ones to accumulate in the shape of our tensors, and these ones do not add any value to our interpretation. The best way to follow what is going on is to check it for yourself.<\/p>\n<p>A less scandalous abuse of notation is to freely add transposes to tensors of shape $(1,1)$ when convenient.<\/p>\n<h2>Some results<\/h2>\n<p>Since the only way to get used to tensor differentiation is to do some work with it yourself, we leave the following results as exercises.<\/p>\n<h3>Exercise<\/h3>\n<ol>\n<li>Suppose that $f\\colon \\Bbb R^m\\to \\Bbb R^n$ is a smooth function depending on an $m$-vector $X$ of variables. Show that $\\frac{\\partial f}{\\partial X}=Jf$. <\/li>\n<li>Suppose that $f\\colon \\Bbb R^m\\to \\Bbb R^n$ is a smooth function, depending on an $m$-vector $X$ of variables, of the form $f(x)=Ax$ for an $(n,m)$-matrix $A$. Show that $\\frac{\\partial f}{\\partial X}=A$ (a constant function). <\/li>\n<li>Suppose that $f\\colon \\Bbb R^m\\to \\Bbb R^n$ is a smooth function, depending on an $m$-vector $X$ of variables, of the form $f(x)=x^TA$ for an $(m,n)$-matrix $A$. 
Show that $\\frac{\\partial f}{\\partial X}$ is a function valued in $(1,n,m)$-tensors whose $(1,i,j)$-entry is $(A^T)_{i,j}$.<\/li>\n<li>Suppose that $f\\colon \\Bbb R^m\\to \\Bbb R$ is a smooth function, depending on an $m$-vector $X$ of variables, of the form $f(x)=x^TA x$ for an $(m,m)$-matrix $A$. Show that $\\frac{\\partial f}{\\partial X}$ is a function valued in $(1,m)$-tensors, whose $(1,i)$-entry is $((A+A^T)x)_i$. <\/li>\n<li>Suppose that $f\\colon \\Bbb R^m\\to \\Bbb R$ is a smooth function, depending on an $m$-vector $X$ of variables, of the form $f(x)=(Ax-y)^T(Ax-y)$ for an $(m,m)$-matrix $A$ and a constant $(m,1)$-matrix $y$. Show that $\\frac{\\partial f}{\\partial X}$ is a function valued in $(1,m)$-tensors, whose entries give $2(Ax-y)^TA.$ <\/li>\n<li>Suppose that $f\\colon \\Bbb R^m\\to \\Bbb R^m$ is a smooth function, depending on an $m$-vector $X$ of variables, of the form $f(x)=2(Ax-y)^TA$ for an $(m,m)$-matrix $A$ and a constant $(m,1)$-matrix $y$. Here we regard the function as taking values in $(m)$-tensors. Show that $\\frac{\\partial f}{\\partial X}$ is the constant $(m,m)$-tensor $2A^TA$.<\/li>\n<\/ol>\n<div class=\"footnotes\">\n<hr \/>\n<ol>\n<li id=\"fn-405-1\">\nTo simplify our notation, $\\otimes := \\otimes_{\\Bbb R}$.&#160;<a href=\"#fnref-405-1\">&#8617;<\/a>\n<\/li>\n<\/ol>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Introduction I will assume that you have seen some calculus, including multivariable calculus. 
That is you know how to differentiate a differentiable function $f\\colon \\Bbb R \\to \\Bbb R$, to obtain a new function $$\\frac{\\partial f}{\\partial x} \\colon \\Bbb R \\to \\Bbb R.$$ You also know how to differentiate a multivariable function $f\\colon \\Bbb R^m &hellip; <a href=\"https:\/\/www.nullplug.org\/ML-Blog\/2017\/10\/18\/tensor-calculus\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Tensor Calculus&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[2],"tags":[],"class_list":["post-405","post","type-post","status-publish","format-standard","hentry","category-supplementary-material"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p9dIpN-6x","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":214,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/10\/04\/linear-regression\/","url_meta":{"origin":405,"position":0},"title":"Linear Regression","author":"Justin Noel","date":"October 4, 2017","format":false,"excerpt":"Prediction is very 
difficult, especially about the future. - Niels Bohr The problem Suppose we have a list of vectors (which we can think of as samples) $x_1, \\cdots, x_m\\in \\Bbb R^n$ and a corresponding list of output scalars $y_1, \\cdots, y_m \\in \\Bbb R$ (which we can regard as\u2026","rel":"","context":"In &quot;Regression&quot;","block_context":{"text":"Regression","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/regression\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.nullplug.org\/ML-Blog\/wp-content\/uploads\/2017\/10\/compressed_linreg_normal.gif?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/www.nullplug.org\/ML-Blog\/wp-content\/uploads\/2017\/10\/compressed_linreg_normal.gif?resize=350%2C200 1x, https:\/\/i0.wp.com\/www.nullplug.org\/ML-Blog\/wp-content\/uploads\/2017\/10\/compressed_linreg_normal.gif?resize=525%2C300 1.5x"},"classes":[]},{"id":508,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/11\/09\/problem-set-4\/","url_meta":{"origin":405,"position":1},"title":"Problem Set 4","author":"Justin Noel","date":"November 9, 2017","format":false,"excerpt":"Problem Set 4 This is to be completed by November 16th, 2017. Exercises Datacamp Complete the lessons: a. Supervised Learning in R: Regression b. Supervised Learning in R: Classification c. 
Exploratory Data Analysis (If you did not already do so) Let $\\lambda\\geq 0$, $X\\in \\Bbb R^n\\otimes \\Bbb R^m$, $Y\\in \\Bbb\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":61,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/09\/26\/probability-and-statistics-background\/","url_meta":{"origin":405,"position":2},"title":"Probability and Statistics Background","author":"Justin Noel","date":"September 26, 2017","format":false,"excerpt":"Statistics - A subject which most statisticians find difficult, but in which nearly all physicians are expert. - Stephen S. Senn Introduction For us, we will regard probability theory as a way of logically reasoning about uncertainty. I realize that this is not a precise mathematical definition, but neither is\u2026","rel":"","context":"In &quot;Supplementary material&quot;","block_context":{"text":"Supplementary material","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/supplementary-material\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":486,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/11\/03\/problem-set-3\/","url_meta":{"origin":405,"position":3},"title":"Problem Set 3","author":"Justin Noel","date":"November 3, 2017","format":false,"excerpt":"Problem Set 3 This is to be completed by November 9th, 2017. Exercises [Datacamp](https:\/\/www.datacamp.com\/home Complete the lesson \"Introduction to Machine Learning\". This should have also included \"Exploratory Data Analysis\". This has been added to the next week's assignment. MLE for the uniform distribution. 
(Source: Kaelbling\/Murphy) Consider a uniform distribution centered\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":35,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/09\/26\/supervised-learning\/","url_meta":{"origin":405,"position":4},"title":"Supervised Learning","author":"Justin Noel","date":"September 26, 2017","format":false,"excerpt":"A big computer, a complex algorithm, and a long time does not equal science. - Robert Gentleman Examples Before getting into what supervised learning precisely is, let's look at some examples of supervised learning tasks: Identifying breast cancer. A sample study. Image classification. List of last year's ILSVRC Winners Threat\u2026","rel":"","context":"In &quot;Supervised Learning&quot;","block_context":{"text":"Supervised Learning","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/supervised-learning\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":33,"url":"https:\/\/www.nullplug.org\/ML-Blog\/2017\/09\/26\/machine-learning-overview\/","url_meta":{"origin":405,"position":5},"title":"Machine Learning Overview","author":"Justin Noel","date":"September 26, 2017","format":false,"excerpt":"Science is knowledge which we understand so well that we can teach it to a computer; and if we don't fully understand something, it is an art to deal with it. 
Donald Knuth Introduction First Attempt at a Definition One says that an algorithm learns if its performance improves with\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.nullplug.org\/ML-Blog\/category\/general\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/web.stanford.edu\/class\/cs234\/images\/header2.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/web.stanford.edu\/class\/cs234\/images\/header2.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/web.stanford.edu\/class\/cs234\/images\/header2.png?resize=525%2C300 1.5x, https:\/\/i0.wp.com\/web.stanford.edu\/class\/cs234\/images\/header2.png?resize=700%2C400 2x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts\/405","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/comments?post=405"}],"version-history":[{"count":10,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts\/405\/revisions"}],"predecessor-version":[{"id":420,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/posts\/405\/revisions\/420"}],"wp:attachment":[{"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/media?parent=405"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/categories?post=405"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nullplug.org\/ML-Blog\/wp-json\/wp\/v2\/tags?post=405"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}