A big computer, a complex algorithm, and a long time does not equal science. – Robert Gentleman
Examples
Before getting into what supervised learning precisely is, let’s look at some examples of supervised learning tasks:
- Identifying breast cancer.
  - A sample study.
- Image classification.
  - A list of last year's ILSVRC winners.
- Threat assessment.
- Language translation.
- Identifying faces in images.
Definition
Supervised learning is concerned with the construction of machine learning algorithms $f\colon 2^{D\times T}\times D\to T$, where $D$ is the space of inputs and $T$ is the space of targets. Subsets of $D\times T$ are referred to as sets of labeled examples.
We often assume that the labeled examples arise from restricting some presumed function $F\colon D\to T$ whose values we know on $E\subset D$. We can then train $f$ on the pairs $\{(s, F(s)) \mid s\in E\}$ and devise a cost function which measures the distance between the learned $f$ and the presumed $F$ (e.g., the $L^2$-distance).
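To make the signature concrete, here is a minimal sketch (an illustration of my own in Python with NumPy; the function names are not from the text) of a learner with the shape $f\colon 2^{D\times T}\times D\to T$, namely a 1-nearest-neighbor rule, together with an empirical $L^2$-style cost that compares the learned $f$ to the presumed $F$ on held-out pairs $(s, F(s))$.

```python
import numpy as np

def nearest_neighbor(labeled_examples, x):
    """A learner f: 2^(D x T) x D -> T.

    labeled_examples: a list of pairs (s, F(s)) with s a point of R^n.
    x: a query point in D.
    Returns the label of the closest training point (the 1-nearest-neighbor rule).
    """
    points = np.array([s for s, _ in labeled_examples], dtype=float)
    labels = [t for _, t in labeled_examples]
    distances = np.linalg.norm(points - np.asarray(x, dtype=float), axis=1)
    return labels[int(np.argmin(distances))]

def empirical_l2_cost(f, labeled_examples, held_out_pairs):
    """Estimate the L^2 distance between the learned f and the presumed F
    on held-out pairs (s, F(s)); assumes T is numeric."""
    errors = [(f(labeled_examples, s) - t) ** 2 for s, t in held_out_pairs]
    return float(np.sqrt(np.mean(errors)))
```

For instance, with $D=\Bbb R^2$ and $T=\Bbb R$, `nearest_neighbor([((0, 0), 1.0), ((1, 1), 2.0)], (0.2, 0.1))` returns `1.0`.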
Supervised learning has been the most successful of the three branches of machine learning when applied to real-world problems. The existence of labeled examples usually leads to well-defined performance metrics, which transform a supervised learning task into two other tasks: finding an appropriate parametrized class of functions from which to choose $f$, and the optimization problem of finding the best function in that class.
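As a rough illustration of that decomposition (a sketch under my own assumptions, using NumPy least squares; nothing here is prescribed by the text): take the parametrized class to be the linear functions $f_\theta(x)=\theta^\top x$, and let the optimization step select the $\theta$ that minimizes the squared error on the labeled examples.

```python
import numpy as np

# Step 1: a parametrized class of functions -- here, linear maps f_theta(x) = theta . x.
def f_theta(theta, x):
    return float(np.dot(theta, x))

# Step 2: an optimization problem -- find the theta in that class minimizing the
# squared error over the labeled examples (s_i, t_i), here via least squares.
def fit_linear(S, t):
    theta, *_ = np.linalg.lstsq(np.asarray(S, dtype=float),
                                np.asarray(t, dtype=float), rcond=None)
    return theta

# Toy usage: noiseless examples generated with theta = (2, -1) are recovered.
rng = np.random.default_rng(0)
S = rng.normal(size=(50, 2))
t = S @ np.array([2.0, -1.0])
theta_hat = fit_linear(S, t)   # approximately [2., -1.]
```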
Typically $D$ will be some subset of $\Bbb R^n$, and we will refer to the components of $\Bbb R^n$ as features, independent variables, or attributes (many concepts in machine learning have many names). Sometimes $D$ will be discrete, in which case we refer to its components as categorical variables. There are a few simple tricks to map categorical variables into $\Bbb R^n$ (such as one-hot encoding, sketched below), so it usually does not hurt to think of $D$ as a subset of $\Bbb R^n$.
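Here is a minimal sketch of one-hot encoding in plain Python and NumPy (in practice a library routine such as scikit-learn's `OneHotEncoder` or pandas' `get_dummies` would typically be used instead).

```python
import numpy as np

def one_hot(values):
    """Map a categorical variable into R^k by one-hot encoding:
    each of the k distinct categories becomes a standard basis vector."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        encoded[row, index[v]] = 1.0
    return encoded, categories

# one_hot(["red", "green", "red"]) gives categories ["green", "red"] and the
# rows [[0, 1], [1, 0], [0, 1]], i.e. points of R^2.
```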
When $T$ is discrete (typically finite), the supervised learning problem is called a classification problem. When $T$ is continuous (e.g., $\Bbb R$), it is called a regression problem.
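As a small concrete contrast (toy data of my own, not from the text), the labeled-example format is the same in both cases; only the target space $T$ changes.

```python
# Classification: T is a finite set of labels.
spam_examples = [([0.1, 3.0], "spam"), ([0.0, 0.2], "not spam")]

# Regression: T is continuous (here, a sale price in dollars).
price_examples = [([1200.0, 3.0], 250_000.0), ([800.0, 2.0], 180_000.0)]
```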
Examples
Let us consider the following sequence of supervised learning methods in turn.
- Linear Regression (Regression problems).
- $k$-nearest Neighbors (Classification or Regression problems).
- Logistic Regression (Classification problems¹).
- Naive Bayes Classifier (Classification problems).
- Linear Discriminant Analysis (Classification problems).
- Support Vector Machines (Classification problems).
- Decision trees (Primarily Classification problems).
- Neural networks (Classification or Regression problems).
¹ I know the name is confusing.