Supervised Learning

A big computer, a complex algorithm, and a long time does not equal science. – Robert Gentleman


Before getting into what supervised learning precisely is, let’s look at some examples of supervised learning tasks:

  1. Identifying breast cancer.
  2. Image classification.
    • List of last year’s ILSVRC Winners
  3. Threat assessment
  4. Language Translation
  5. Identifying faces in images


Supervised learning is concerned with the construction of machine learning algorithms $f\colon 2^{D\times T}\times D\to T$. The subsets of $D\times T$ are referred to as subsets of labeled examples.

We often assume that the labeled examples arise from restricting some presumed function $F\colon D\to T$ whose values we know on $E\subset D$. We can then train $f$ on pairs $\{s, F(s) | s\in E\}$. We can then devise a cost function which measures the distance from the learned $f$ and the presumed $F$ (e.g., $L^2$-distance).

Supervised learning has been the most successful of the three branches to real world problems. The existence of labeled examples usually leads to well-defined performance metrics that transform supervised learning tasks into two other tasks finding an appropriate parametrized class of functions to choose $f$ from and an optimization problem of finding the best function in that class.

Typically $D$ will be some subset of $\Bbb R^n$ and we will refer to the components of $\Bbb R^n$ as features, dependent variables, or attributes (many concepts in machine learning have many names). Sometimes $D$ will be discrete, in which case we refer to these as categorical variables. There are a few simple tricks to map categorical variables into $\Bbb R^n$ (such as one-hot encoding), so it usually does not hurt to think of $D$ as a subset of $\Bbb R^n$.

When $T$ is discrete (typically finite), then the supervised learning problem is called a classification problem. When it is continuous (e.g, $\Bbb R$) then it is called a regression problem.


Let us consider the following sequence of supervised learning methods in turn.

  1. Linear Regression (Regression problems).
  2. $k$-nearest Neighbors (Classification or Regression problems).
  3. Logistic Regression (Classification problems1).
  4. Naive Bayes Classifier (Classification problems).
  5. Linear Discriminant Analysis (Classification problems).
  6. Support Vector Machines (Classification problems).
  7. Decision trees (Primarily Classification problems).
  8. Neural networks (Classification or Regression problems).

  1. I know the name is confusing. 

Leave a Reply

Your email address will not be published. Required fields are marked *