A big computer, a complex algorithm, and a long time does not equal science. – Robert Gentleman

## Examples

Before getting into what supervised learning precisely is, let’s look at some examples of supervised learning tasks:

- Identifying breast cancer.
  - A sample study.
- Image classification.
  - List of last year’s ILSVRC winners.
- Threat assessment.
- Language translation.
- Identifying faces in images.

#### Definition

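*Supervised learning* is concerned with the construction of machine learning algorithms $f\colon 2^{D\times T}\times D\to T$. Here $2^{D\times T}$ denotes the collection of subsets of $D\times T$; each such subset is a set of *labeled examples*, that is, pairs consisting of an input in $D$ and its label in $T$.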

We often assume that the labeled examples arise from restricting some presumed function $F\colon D\to T$ whose values we know only on a subset $E\subset D$. We can then train $f$ on the pairs $\{(s, F(s)) \mid s\in E\}$ and devise a cost function which measures the distance between the learned $f$ and the presumed $F$ (e.g., the $L^2$-distance).
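As a rough illustration of such a cost function, the sketch below computes an empirical $L^2$-distance over a finite set $E$. The particular `F`, the set `E`, and the function `candidate` are hypothetical choices made only for this example; in practice $F$ is unknown and only its values on $E$ are available.

```python
import math

def F(s):
    """Presumed target function (hypothetical, chosen for illustration)."""
    return 2.0 * s + 1.0

# E: the finite set of points where the values of F are known.
E = [0.0, 1.0, 2.0, 3.0]

# The labeled examples {(s, F(s)) | s in E}.
labeled_examples = [(s, F(s)) for s in E]

def l2_cost(f, examples):
    """Empirical L2 distance between f and the labels on the examples."""
    return math.sqrt(sum((f(s) - t) ** 2 for s, t in examples))

def candidate(s):
    """A learned function that is off by exactly one at every point."""
    return 2.0 * s

cost_candidate = l2_cost(candidate, labeled_examples)
cost_exact = l2_cost(F, labeled_examples)
print(cost_candidate, cost_exact)
```

A cost of zero on $E$ only says that $f$ agrees with $F$ on the known examples; it says nothing by itself about the points of $D$ outside $E$.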

Supervised learning has been the most successful of the three branches when applied to real-world problems. The existence of labeled examples usually leads to well-defined performance metrics, which reduce a supervised learning task to two other tasks: finding an appropriate parametrized class of functions from which to choose $f$, and solving the optimization problem of finding the best function in that class.
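To make the two tasks concrete, here is a minimal sketch under simple assumptions: the parametrized class is the family of lines $f_{a,b}(x) = ax + b$, and the optimization problem (minimizing the sum of squared errors) is solved with the standard closed-form least-squares formulas. The data and the name `fit_line` are illustrative only.

```python
def fit_line(examples):
    """Return (a, b) minimizing sum((a*x + b - t)**2) over the examples."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_t = sum(t for _, t in examples) / n
    # Spread of the inputs, and how inputs and labels vary together.
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in examples)
    a = sxt / sxx              # best slope
    b = mean_t - a * mean_x    # best intercept
    return a, b

# Hypothetical labeled examples lying exactly on the line t = 2x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
a, b = fit_line(examples)
print(a, b)
```

Choosing a different class of functions (polynomials, trees, neural networks) changes the optimization problem, but the overall pattern of "pick a class, then optimize within it" stays the same.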

Typically $D$ will be some subset of $\Bbb R^n$ and we will refer to the coordinates of $\Bbb R^n$ as *features*, *independent variables*, or *attributes* (many concepts in machine learning have many names). Sometimes $D$ will be discrete, in which case we refer to its coordinates as *categorical variables*. There are a few simple tricks to map categorical variables into $\Bbb R^n$ (such as one-hot encoding), so it usually does not hurt to think of $D$ as a subset of $\Bbb R^n$.
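One-hot encoding can be sketched in a few lines: each of the $n$ distinct categories is sent to a standard basis vector of $\Bbb R^n$. The helper `one_hot_encoder` and the color data below are hypothetical, and sorting the categories is just one convenient way to fix an ordering.

```python
def one_hot_encoder(categories):
    """Build an encoder mapping each distinct category to a basis vector."""
    # Assign every distinct category a coordinate index (sorted for determinism).
    index = {c: i for i, c in enumerate(sorted(set(categories)))}

    def encode(value):
        vec = [0.0] * len(index)
        vec[index[value]] = 1.0  # a single 1 in the category's coordinate
        return vec

    return encode

colors = ["red", "green", "blue", "green"]
encode = one_hot_encoder(colors)
encoded = [encode(c) for c in colors]
print(encoded)
```

With three distinct colors, every value lands in $\Bbb R^3$, and equal categories receive equal vectors.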

When $T$ is discrete (typically finite), the supervised learning problem is called a *classification problem*. When $T$ is continuous (e.g., $\Bbb R$), it is called a *regression problem*.
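The distinction shows up clearly in a method that handles both cases. The sketch below is a bare-bones $k$-nearest neighbors predictor on one-dimensional inputs: for a discrete $T$ it takes a majority vote among the nearest labels, and for $T = \Bbb R$ it averages them. The data sets and function names are made up for illustration.

```python
from collections import Counter

def nearest_labels(train, x, k):
    """Labels of the k training points closest to x (1-D inputs)."""
    ranked = sorted(train, key=lambda pair: abs(pair[0] - x))
    return [t for _, t in ranked[:k]]

def knn_classify(train, x, k=3):
    """Classification: T is discrete, so take a majority vote."""
    return Counter(nearest_labels(train, x, k)).most_common(1)[0][0]

def knn_regress(train, x, k=3):
    """Regression: T is continuous, so average the neighbors' labels."""
    labels = nearest_labels(train, x, k)
    return sum(labels) / len(labels)

# Hypothetical data: discrete labels for classification...
class_data = [(1.0, "small"), (1.5, "small"), (2.0, "small"),
              (8.0, "large"), (9.0, "large")]
# ...and real-valued labels for regression.
reg_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (8.0, 16.0)]

predicted_class = knn_classify(class_data, 1.2)
predicted_value = knn_regress(reg_data, 2.0)
print(predicted_class, predicted_value)
```

The neighbor search is identical in both cases; only the final step that combines the labels depends on whether $T$ is discrete or continuous.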

### Examples

Let us consider the following sequence of supervised learning methods in turn.

- Linear Regression (Regression problems).
- $k$-nearest Neighbors (Classification or Regression problems).
- Logistic Regression (Classification problems$^{1}$).
- Naive Bayes Classifier (Classification problems).
- Linear Discriminant Analysis (Classification problems).
- Support Vector Machines (Classification problems).
- Decision trees (Primarily Classification problems).
- Neural networks (Classification or Regression problems).

$^{1}$ I know the name is confusing.