Kernel Support Vector Classifier (SVC) extends the Support Vector Machine (SVM) concept to handle non-linear data by applying kernel functions. This approach allows the algorithm to operate in a transformed feature space without explicitly computing the transformation, enabling efficient classification of complex and non-linearly separable data sets. Kernel SVC is widely recognized for its robustness and versatility in various machine learning tasks.
The core idea behind Kernel SVC is to map the input data into a higher-dimensional space where it becomes linearly separable, using a kernel function. This process facilitates finding a hyperplane in the transformed space that can separate the data points based on their class labels with a maximal margin while minimizing classification error.
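As a small illustration of this idea (independent of the luma API), the sketch below compares a degree-2 polynomial kernel evaluated directly in input space with the inner product of an explicit feature map; the helper names `poly_kernel` and `phi` are purely illustrative:

```python
import numpy as np

def poly_kernel(x, z, c=1.0, d=2):
    # Degree-d polynomial kernel, evaluated directly in the input space
    return (x @ z + c) ** d

def phi(x):
    # Explicit degree-2 feature map for a 2-D input with c = 1:
    # phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2, sqrt(2)*x1, sqrt(2)*x2, 1)
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(poly_kernel(x, z))   # 4.0
print(phi(x) @ phi(z))     # 4.0 -- same value, without ever materializing phi
```

The kernel returns the same inner product as the six-dimensional feature map, which is exactly what lets the classifier operate in the transformed space without ever constructing it.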
Given a dataset of points $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where $\mathbf{x}_i \in \mathbb{R}^d$ is an input feature vector and $y_i \in \{-1, +1\}$ is the class label, the goal is to find a function $f(\mathbf{x})$ that can predict the class label of a new instance.
The decision function in the context of kernel SVC is formulated as:

$$f(\mathbf{x}) = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\right)$$

where:

- $\alpha_i$ are the Lagrange multipliers (dual coefficients) obtained from training,
- $y_i$ are the training labels,
- $K(\cdot, \cdot)$ is the kernel function, and
- $b$ is the bias term.
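As a minimal sketch of how this formula is evaluated (toy values only; `rbf_kernel`, `decision_function`, and the arrays below are illustrative and not part of the luma API), assume training has already produced dual coefficients `alpha`, labels `y`, support vectors `X_sv`, and a bias `b`:

```python
import numpy as np

def rbf_kernel(A, x, gamma=1.0):
    # K(x_i, x) = exp(-gamma * ||x_i - x||^2), computed for every row of A
    return np.exp(-gamma * np.sum((A - x) ** 2, axis=-1))

def decision_function(x, X_sv, y, alpha, b, gamma=1.0):
    # f(x) = sign( sum_i alpha_i * y_i * K(x_i, x) + b )
    score = np.sum(alpha * y * rbf_kernel(X_sv, x, gamma)) + b
    return np.sign(score)

# Toy quantities standing in for the result of training
X_sv  = np.array([[0.0, 0.0], [1.0, 1.0]])   # support vectors
y     = np.array([-1.0, 1.0])                # their labels
alpha = np.array([0.5, 0.5])                 # dual coefficients

print(decision_function(np.array([0.9, 1.1]), X_sv, y, alpha, b=0.0))  # 1.0
```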
The optimization problem maximizes the margin between classes while minimizing classification error, subject to certain constraints related to the data labels and the Lagrange multipliers.
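Concretely, this corresponds to the standard soft-margin dual problem, written here with the kernel $K$ in place of explicit inner products:

$$
\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, K(\mathbf{x}_i, \mathbf{x}_j)
\quad \text{subject to} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0.
$$

The regularization constant $C$ bounds each $\alpha_i$ and controls the trade-off between margin width and misclassification penalty; it corresponds to the `C` parameter listed below.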
Commonly used kernel functions include:

- Linear: $K(\mathbf{x}, \mathbf{z}) = \mathbf{x}^\top \mathbf{z}$
- Polynomial: $K(\mathbf{x}, \mathbf{z}) = (\gamma\, \mathbf{x}^\top \mathbf{z} + c)^{d}$
- RBF (Gaussian): $K(\mathbf{x}, \mathbf{z}) = \exp\left(-\gamma \lVert \mathbf{x} - \mathbf{z} \rVert^{2}\right)$
- Sigmoid: $K(\mathbf{x}, \mathbf{z}) = \tanh(\gamma\, \mathbf{x}^\top \mathbf{z} + c)$
Parameters:

- `C` : float, default = 1.0
- `deg` : int, default = 3
- `gamma` : float, default = 1.0
- `coef` : float, default = 1.0
- `learning_rate` : float, default = 0.001
- `max_iter` : int, default = 1000
- `batch_size` : int, default = 100
- `kernel` : KernelUtil.func_type, default = 'rbf'

Test on wine dataset with dimensionality reduction via RFE and tuning with RandomizedSearchCV:
from luma.classifier.svm import KernelSVC
from luma.classifier.naive_bayes import GaussianNaiveBayes
from luma.preprocessing.scaler import StandardScaler
from luma.reduction.selection import RFE
from luma.model_selection.split import TrainTestSplit
from luma.model_selection.search import RandomizedSearchCV
from luma.visual.evaluation import DecisionRegion, ConfusionMatrix
from sklearn.datasets import load_wine
import matplotlib.pyplot as plt
import numpy as np
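# Load the wine dataset and split it 70/30 into train and test sets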
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = TrainTestSplit(X, y,
test_size=0.3,
random_state=42).get
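# Standardize features with statistics computed on the training split only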
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
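# Recursive feature elimination down to the two most informative features (for 2-D plotting)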
rfe = RFE(estimator=GaussianNaiveBayes(),
n_features=2,
step_size=1,
cv=5,
random_state=42,
verbose=True)
rfe.fit(X_train_std, y_train)
X_train_rfe = rfe.transform(X_train_std)
X_test_rfe = rfe.transform(X_test_std)
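# Randomized hyperparameter search over the kernel SVC with 5-fold cross-validation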
param_dist = {'C': np.logspace(-3, 2, 5),
'deg': range(2, 10),
'gamma': np.logspace(-3, 1, 5),
'learning_rate': np.logspace(-3, -1, 5),
'kernel': ['poly', 'rbf', 'sigmoid']}
rand = RandomizedSearchCV(estimator=KernelSVC(),
param_dist=param_dist,
max_iter=20,
cv=5,
refit=True,
random_state=42,
verbose=True)
rand.fit(X_train_rfe, y_train)
ksvc_best = rand.best_model
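# Visualize the decision region and confusion matrix over the combined train and test data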
X_concat = np.concatenate((X_train_rfe, X_test_rfe))
y_concat = np.concatenate((y_train, y_test))
fig = plt.figure(figsize=(10, 5))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)
dec = DecisionRegion(ksvc_best, X_concat, y_concat)
dec.plot(ax=ax1)
conf = ConfusionMatrix(y_concat, ksvc_best.predict(X_concat))
conf.plot(ax=ax2, show=True)
# Best params: {
# 'C': 0.31622776601683794,
# 'deg': 9,
# 'gamma': 10.0,
# 'learning_rate': 0.001,
# 'kernel': 'rbf'
# }
# Best score: 0.912