Machine Learning for Mathies

The purpose of this blog is to share information and notes relevant to a course on “Machine Learning from a Mathematical Perspective” at the University of Regensburg. For the follow-up seminar series see here. The table of contents below offers a more natural form of navigation and organization than the blog format was designed for.

Machine Learning Overview
- Supervised Learning
  - Linear Regression
  - Naive Bayes Classifiers
  - Linear and Quadratic Discriminant Analysis
  - Logistic Regression
  - Decision Trees
  - Ensemble methods (Bagging and Boosting)
  - Perceptrons and fully connected neural networks
  - $K$-Nearest Neighbors
  - Support Vector Machines
- Unsupervised Learning
  - Principal Components Analysis
  - $k$-Means Clustering
  - Gaussian Mixtures
- Reinforcement Learning
Computer Science Background
Probability and Statistics Background
Tensor Calculus
Additional Sources

Problem Sets

Problem Set 1 to be completed by October 26th, 2017 (previously misstated here as November 27th).
Problem Set 2 to be completed by November 2nd, 2017.
Problem Set 3 to be completed by November 9th, 2017.
Problem Set 4 to be completed by November 16th, 2017.
Problem Set 5 to be completed by November 23rd, 2017.
Problem Set 6 to be completed by November 30th, 2017.
Problem Set 7 to be completed by December 7th, 2017.
Problem Set 8 to be completed by December 14th, 2017.
Problem Set 9 to be completed by December 21st, 2017.
Problem Set 10 to be completed by January 11th, 2018.
Problem Set 11 to be completed by January 18th, 2018.
Problem Set 12 to be completed by January 25th, 2018.
Problem Set 13 to be completed by February 1st, 2018.

Exercise session pages

How does our approach differ from others?

The goal of this course is to provide an introduction to machine learning for those with some background in mathematics, but not necessarily a strong background in computer science, probability theory, or statistics. As such, we shy away from neither the technicalities nor the mathematics. Our approach will be noticeably less user-friendly than others, but I think it will offer added value to the mathematically inclined.

With the growing interest in the field, many rapid courses have appeared which emphasize an essentially “plug and play” approach. That is, to apply machine learning, one can simply feed a dataset into any of the numerous available toolkits. While this is practical, it also lends itself to misuse, as practitioners may not understand whether a given approach is suitable for a given problem or what implicit assumptions they are making about that problem. This black-box approach also makes it rather difficult to improve on these methods, since it is not clear what the limitations of a given algorithm are or why they appear.

Our approach will not differ substantially from that of those who study the subject from the perspective of probability theory. That perspective makes clear what is being assumed and exactly which problem we are attempting to solve. The mathematics underlying the required probability theory is rich and worth learning.

However, I have found introductory material on probability theory lacking in rigor. For example, the definition of a random variable given in some introductory texts is often too vague for a mathematician’s taste¹. I expect that the target audience of this lecture series will benefit from a rapid review of the relevant probability theory, starting from rigorous foundations.

Second, probability theorists may have little interest in the efficiency of their algorithms or the numerical stability problems that arise in their implementation, so this approach misses some important details from computer science and numerical analysis that mathematicians are well prepared to grasp. Finally, this approach is happy to treat much of the linear algebra and some of the calculus as a black box. Since mathematicians often understand the contents of the black box better than how it is applied, it is a good idea to explore this part of the material in greater detail.

Why do this as a blog?

I realize that a blog is not exactly the right format for what I am trying to do here. Rather than treat blog posts as finished, isolated products, I would prefer to keep editing and changing them over time. This runs contrary to the natural flow of a blog, which is why I have included a table of contents. I could have done this as a wiki, but reading Wikipedia has given me a strong bias as to what such a wiki should look like, and that format does not fit the more casual lecture style. I could have done this as a properly formatted website (something like an online book). This is the most natural choice, but it would require me to organize the material in a way that I expect to become clear only once the course is over.

In the end, I chose the blog format because it was easy. It is easy to include mathematics. It is easy to share the information and make revisions. It is easy to embed web content, including dynamic media. It is easy for me to shape the material to the needs of the course participants without instituting significant organizational changes.

¹ I seem to suffer from a disability which I think might be common amongst mathematicians. I find it nearly impossible to understand an imprecise definition and completely impossible to use one mathematically. Vague definitions usually wash over me, leaving little trace behind. It is not that I find imprecise definitions unhelpful; rather, I view them as a way to interpret a precise definition. Without the latter, the former is useless. ↩