Topics in Machine Learning Seminar

Structure

Each participant seeking credit will be expected to give a 90-minute presentation in the seminar. This presentation must include both a lecture on the material (at least 30 minutes long) and a presentation of an implementation of the material (at least 30 minutes long). Each participant will also have to meet with me twice for 30 minutes apiece to discuss their presentation. The first meeting will take place 3 weeks before the presentation (Mondays at 15:00); there we will agree on proposed source material and a method of implementation, so the participant should have proposals ready at this time. During the break this meeting can be held via Skype or Google Hangouts. The second meeting will take place 1 week before the presentation (Mondays at 15:30). At this point the participant should have their material ready for review. This will give us some time to determine whether there are any significant problems and how to solve them.

Implementation

You should view the implementation as a fun project that you can share with others and that demonstrates what you know about the material. Please feel free to propose new methods and extensions. Experiment! Have fun!

The implementation should be in the format of an interactive notebook, e.g., Jupyter Notebook or R Markdown. The choice of programming language is flexible and should be appropriate for the problem (R or Python will always be fine). If you are taking the course for examination credit (Prüfungsleistung), then this notebook should include a written version of your talk. If it is necessary to use a language that does not support interactive notebooks (e.g., Java+Caffe), then a separate writeup must be prepared.

The implementation should be written by the participant and not copied from outside sources, even with alterations (although you are free to consult such sources for assistance). For implementations that are extremely resource intensive to train, the participant should build a much smaller/simpler version of the model and train it on a less resource-intensive version of the training data. You will not be expected to match state-of-the-art results in these cases; some ML algorithms have only achieved satisfactory results with very large models trained on extremely large datasets. Do not be discouraged if your model performs poorly in these cases, but you should be able to demonstrate that the algorithm has indeed learned something from the data.
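
For illustration, here is a minimal sketch of what such a scaled-down experiment might look like (the dataset, model size, and library here are placeholder assumptions on my part, not requirements):

```python
# A scaled-down demo: instead of a deep network on a huge dataset,
# a small MLP on scikit-learn's tiny 8x8 digits set. The goal is only
# to show that the algorithm has learned *something* from the data.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)  # 1797 images, 8x8 pixels each
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)

# Well above the ~10% chance level => the model learned something.
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```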

There may be some instances where any reimplementation of the algorithm would require too much work (there is no hard rule here, but > 1000 lines of code should be considered too much). In such cases, an arrangement should be made with me. One possible alternative is to find an existing implementation, make it work on a slightly different problem, and then add extensive comments to the code. In this case you may want to ask to contribute your modifications to the original code on GitHub, although that is neither required nor will it affect your grade.

Presentation

As a rough outline: the talk should include a brief introduction to the problem being addressed, the new idea being proposed, and why it works (demonstrated rigorously if possible, otherwise via a survey of the empirical evidence). What are other possible approaches? How does this approach compare? The verbal part of the presentation can be given at the blackboard or, if necessary, as a computer presentation (timing the latter is notoriously tricky).

Some words of caution

We will be moving toward more cutting-edge material in this seminar. Some of the implementations may be more complicated and only very roughly sketched in the source material. You might require a fairly flexible machine learning library, e.g., TensorFlow, which may be more cumbersome to use than its alternatives. Once you have identified the library you want to use, work through some of the online tutorials to make sure you have a handle on what is going on in very elementary cases.
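
As a concrete "elementary case" check of the kind meant above, here is a minimal sketch using TensorFlow's Keras API (any comparable library would do just as well): fit a line to noisy data and confirm the learned parameters are sensible.

```python
# Sanity check: learn y = 2x + 1 from noisy samples with a single
# Dense unit. If every line here makes sense, you are ready for the
# tutorials on more realistic models.
import numpy as np
import tensorflow as tf

x = np.linspace(-1, 1, 200, dtype="float32").reshape(-1, 1)
y = 2 * x + 1 + 0.1 * np.random.randn(200, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.Input(shape=(1,)),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")
model.fit(x, y, epochs=100, verbose=0)

w, b = model.layers[0].get_weights()
print(f"weight ~ {w[0][0]:.2f} (true 2), bias ~ {b[0]:.2f} (true 1)")
```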

For a great deal of this material there will be no textbook source, and you will have to look at preprints on the arXiv. I find ML papers much easier to read than math papers, but you may have to do some research and consult a variety of resources to get a good overview of the material. I have only a cursory understanding of some of these topics myself, but I will be happy to work through them with you.

In other words, once you are at the frontiers of knowledge you should expect to do some exploration on your own. Don’t be intimidated and happy hunting! I’m always here if you need me.

Proposed plan of talks

Book Sources:
ESLR = The Elements of Statistical Learning
MLPP = Machine Learning: A Probabilistic Perspective
DL = Deep Learning

** The seminars marked with an “!” will be cancelled! If you would like to speak earlier, please let me know. **

  1. (9.4.2018 Justin Noel) Introduction, the Expectation-Maximization algorithm, and fitting Gaussian mixtures (primary source: ESLR 8.5; secondary sources: MLPP 11.4, DL 19.2). Example application: clustering. (A short code sketch of this algorithm appears after this list.)
  2. (16.4.2018 Phillip Gäbelein) Principal components analysis and non-negative matrix factorizations (primary sources: ESLR 14.5.1 + 14.6, DL 2.12, MLPP 12.2). Possible applications: topic models, document clustering.
  3. (23.4.2018 Markus Eppelsheimer) Deep Dream/style transfer (Deep Dream blog post, fast style transfer, video style transfer, etc.).
  4. (30.4.2018 Luke Wallner) Speech recognition (https://arxiv.org/pdf/1412.5567.pdf).
  5. (7.5.2018 Gesina Schwalbe) Image segmentation (Mask R-CNN).
  6. (14.5.2018 Jonas Kleinoeder) Reinforcement learning.
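
To give a concrete sense of the expected implementation component, here is a rough NumPy sketch of the EM algorithm from talk 1, fitting a two-component 1-D Gaussian mixture (a toy setting with made-up data; an actual notebook would of course go further):

```python
# EM for a two-component 1-D Gaussian mixture in plain NumPy.
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two Gaussian clusters with true means -2 and 3.
data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 700)])

# Initial guesses for mixture weights, means, and variances.
pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

def gauss(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

for _ in range(50):
    # E-step: responsibility of each component for each point.
    r = pi * gauss(data[:, None], mu, var)      # shape (n, 2)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities.
    n_k = r.sum(axis=0)
    pi = n_k / len(data)
    mu = (r * data[:, None]).sum(axis=0) / n_k
    var = (r * (data[:, None] - mu) ** 2).sum(axis=0) / n_k

print(mu)  # should be close to the true means (-2, 3)
```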

Potential topics:
1. Make a suggestion!
1. Word2vec, recurrent neural networks, long short-term memory, gated recurrent units (DL 10, and many online sources). Potential applications: text generation, sentiment analysis (e.g., Rotten Tomatoes reviews).
1. Independent components analysis (primary sources: ESLR 14.7.2 + 14.7.4, DL 13.2, MLPP 12.6, source article).
1. Image super-resolution, denoising, inpainting and, time permitting, the general problem of data imputation. (Sources: this will require some research, but this may get you started; for data imputation with neural networks see here.)
1. Support vector machines and the kernel trick. Derive the dual optimization problem, introduce the KKT conditions and Mercer's theorem (proofs will not be necessary, but some discussion of why passing to the dual problem works is required). (Sources: ESLR 12.1-12.3, DL 5.7.2, MLPP 14.4-14.5, any textbook on convex optimization, e.g. this one.)
1. Image classification state of the art: CNNs and current best practices. Find one of the best models of recent years (an older but simpler model is ResNet) and implement a small, simplified version of it. (Primary sources: DL 9, papers on the arXiv, including the ResNet paper and papers that cite it; see Google Scholar.)
1. You Only Look Once (YOLO) object detection (paper).
1. Generative adversarial networks (DL 20.10.4, paper). Potential application: image generation.
1. Reinforcement learning (this is a big task; just discuss a specific problem you are interested in and a potential algorithm for solving it, e.g. REINFORCE or deep Q-learning). You will probably need an existing implementation built on the OpenAI Gym to start here. Potential applications: basic games and/or Atari games.
1. Hidden Markov models, the forward(-backward) algorithm, and maybe the Viterbi algorithm (MLPP 17.3 + 17.4). Potential applications: phoneme transcription, text generation.
1. State space models (MLPP 18.1-18.3). Potential applications: object tracking.
1. Latent Dirichlet allocation (MLPP 27.3). Potential applications: unsupervised topic identification.
1. Markov chain Monte Carlo: a selection of standard algorithms, e.g., Gibbs sampling, Metropolis(-Hastings), BUGS (MLPP 24). Potential applications: Bayesian inference.
1. Speech generation. There are many potential sources here; the current state of the art is WaveNet.
1. Ranking algorithms (I don't know much about this beyond the PageRank algorithm, which is not state of the art; some research will be required).
1. Recommender systems.