Here we approach the two-class classification problem in a direct way:
We try to find a plane that separates the classes in feature space
If we cannot, we get creative in two ways:
We soften what we mean by separates, and
We enrich and enlarge the feature space so that separation is possible
What is a Hyperplane?
A hyperplane in p dimensions is a flat affine subspace of dimension p−1
In general the equation for a hyperplane has the form
β0+β1X1+⋯+βpXp=0
In p=2 dimensions a hyperplane is a line
If β0=0, the hyperplane goes through the origin, otherwise not
The vector β=(β1,β2,⋯,βp) is called the normal vector - it points in a direction orthogonal to the surface of a hyperplane
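As a quick illustration, here is a minimal numpy sketch (with made-up coefficients β0 = −1 and β = (2, 3)) that evaluates f(X) = β0 + β1X1 + β2X2 at a few points; the sign of f tells us which side of the hyperplane each point lies on

```python
import numpy as np

beta0 = -1.0                     # intercept (assumed value)
beta = np.array([2.0, 3.0])      # normal vector (beta1, beta2), assumed values

def f(X):
    # signed value of the hyperplane function at the rows of X (n x p)
    return beta0 + X @ beta

points = np.array([[1.0, 0.0],   # f = +1 -> one side
                   [0.0, 0.0],   # f = -1 -> other side
                   [-1.0, 1.0]]) # f =  0 -> exactly on the hyperplane
print(f(points))                 # [ 1. -1.  0.]
```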
Separating Hyperplanes
If f(X)=β0+β1X1+⋯+βpXp, then f(X)>0 for points on one side of the hyperplane, and f(X)<0 for points on the other
If we code the colored points as Yi=+1 for blue, say, and Yi=−1 for mauve, then if Yi⋅f(Xi)>0 for all i, f(X)=0 defines a separating hyperplane
The distance from a point X to the hyperplane is d = ∣β0 + β1X1 + ⋯ + βpXp∣ / √(β1² + ⋯ + βp²)
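A small sketch of the separation check Yi⋅f(Xi)>0 and the distance formula above, using the same assumed coefficients and a few assumed toy points:

```python
import numpy as np

beta0, beta = -1.0, np.array([2.0, 3.0])      # assumed coefficients
X = np.array([[1.0, 1.0], [2.0, 0.0],         # assumed toy points
              [-1.0, 0.0], [0.0, 0.0]])
y = np.array([+1, +1, -1, -1])                # +1 = blue, -1 = mauve, say

fX = beta0 + X @ beta
separates = np.all(y * fX > 0)                # y_i * f(x_i) > 0 for all i?
dist = np.abs(fX) / np.linalg.norm(beta)      # |f(x_i)| / sqrt(beta1^2 + ... + betap^2)
print(separates, dist)
```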
Maximal Margin Classifier
Among all separating hyperplanes, find the one that makes the biggest gap or margin between the two classes
Constrained optimization problem
maximize M subject to β1² + ⋯ + βp² = 1 and yi(β0 + β1xi1 + ⋯ + βpxip) ≥ M for all i = 1, …, N
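This optimization is usually handed to quadratic-programming software; as a rough sketch, a hard-margin fit can be approximated with scikit-learn's linear SVC by making its cost parameter very large (the toy data below are assumed):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [2, 0],        # class +1 (assumed toy data)
              [-1, -1], [-2, -2], [-2, 0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

# a very large C penalizes any margin violation, approximating the hard margin
clf = SVC(kernel="linear", C=1e10).fit(X, y)
print(clf.coef_, clf.intercept_)    # the fitted (beta1, ..., betap) and beta0
print(clf.support_vectors_)         # the points that sit on the margin
```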
Non-separable Data
The data are not separable by a linear boundary
This is often the case, unless N<p
Noisy Data
Sometimes the data are separable, but noisy
This can lead to a poor solution for the maximal-margin classifier
The support vector classifier maximizes a soft margin
Support Vector Classifier
maximize M subject to β1² + ⋯ + βp² = 1, yi(β0 + β1xi1 + ⋯ + βpxip) ≥ M(1−ϵi) with ϵi ≥ 0, and ϵ1 + ⋯ + ϵn ≤ C
ϵi: slack variable
C: budget
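A sketch of fitting the soft-margin classifier on assumed, overlapping toy data; note that scikit-learn's C is a cost on margin violations, so a small sklearn C corresponds roughly to a large budget C in the formulation above:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=+1.0, size=(20, 2)),   # assumed overlapping classes
               rng.normal(loc=-1.0, size=(20, 2))])
y = np.array([1] * 20 + [-1] * 20)

for cost in (0.01, 1.0, 100.0):        # small cost ~ large budget ~ wider soft margin
    clf = SVC(kernel="linear", C=cost).fit(X, y)
    print(cost, clf.n_support_)        # number of support vectors per class
```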
Linear boundary can fail
Sometimes a linear boundary simply won't work, no matter what value of C
The example on the left is such a case. What to do?
Feature Expansion
Enlarge the space of features by including transformations;
e.g. X1², X1³, X1X2, X1X2², … Hence we go from a p-dimensional space to an M>p dimensional space
Fit a support-vector classifier in the enlarged space
This results in non-linear decision boundaries in the original space
Example: Suppose we use (X1, X2, X1², X2², X1X2) instead of just (X1, X2). Then the decision boundary would be of the form
β0 + β1X1 + β2X2 + β3X1² + β4X2² + β5X1X2 = 0
This leads to nonlinear decision boundaries in the original space (quadratic conic sections)
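A sketch of this quadratic expansion on assumed data with a circular true boundary, using scikit-learn's PolynomialFeatures followed by a linear support-vector classifier:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                             # assumed data
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)    # circular true boundary

# degree-2 expansion gives (X1, X2, X1^2, X1*X2, X2^2); the classifier is
# linear in these five features, hence quadratic in (X1, X2)
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearSVC(C=1.0, max_iter=10000))
print(model.fit(X, y).score(X, y))
```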
Cubic Polynomials
Here we use a basis expansion of cubic polynomials
From 2 variables to 9
The support-vector classifier in the enlarged space solves the problem in the lower-dimensional space
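For instance, a degree-3 expansion of (X1, X2) without the constant term yields exactly 9 features, as this small sketch (assumed data) shows:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.default_rng(2).normal(size=(5, 2))   # assumed data, 2 variables
Z = PolynomialFeatures(degree=3, include_bias=False).fit_transform(X)
print(Z.shape)   # (5, 9): X1, X2, X1^2, X1X2, X2^2, X1^3, X1^2*X2, X1*X2^2, X2^3
```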