[Classifier] Linear Discriminant Analysis (LDA)

안암동컴맹 · February 22, 2024

Linear Discriminant Analysis Classifier

Introduction

Linear Discriminant Analysis (LDA) is widely used in machine learning and statistics for pattern classification and dimensionality reduction. It works by finding a projection that maximizes the ratio of between-class variance to within-class variance, thereby maximizing class separability.

Background and Theory

LDA is based on the concept of searching for a linear combination of variables (features) that best separates two or more classes. This concept can be traced back to Ronald A. Fisher in 1936 with the introduction of Fisher's linear discriminant.

The mathematical foundation of LDA considers two key statistical properties:

  • The mean vectors (centroids) of the classes.
  • The variance within each class and between the classes.

How the Algorithm Works

Mathematical Formulation

Given a dataset with $d$ dimensions and $N$ samples, suppose we have $C$ classes. Let $\mathbf{x}_i$ be a sample with class label $y_i$. The goal of LDA is to project $\mathbf{x}_i$ onto a lower-dimensional space $S$ that maximizes class separability.

The within-class scatter matrix $S_W$ and the between-class scatter matrix $S_B$ are defined as follows (a short NumPy sketch of both definitions appears after the list):

  • Within-class scatter matrix $S_W$
    $$S_W = \sum_{c=1}^{C} S_{W_c} = \sum_{c=1}^{C} \sum_{\mathbf{x}_i \in \text{class}_c} (\mathbf{x}_i - \boldsymbol{\mu}_c)(\mathbf{x}_i - \boldsymbol{\mu}_c)^T$$
    where $\boldsymbol{\mu}_c$ is the mean vector of class $c$.
  • Between-class scatter matrix $S_B$
    $$S_B = \sum_{c=1}^{C} N_c (\boldsymbol{\mu}_c - \boldsymbol{\mu})(\boldsymbol{\mu}_c - \boldsymbol{\mu})^T$$
    where $N_c$ is the number of samples in class $c$ and $\boldsymbol{\mu}$ is the overall mean of the dataset.
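
The definitions above translate directly into code. The following is a minimal NumPy sketch (independent of the luma API; `X` is assumed to be an $N \times d$ data matrix and `y` the corresponding label vector) that accumulates $S_W$ and $S_B$ class by class.

import numpy as np

def scatter_matrices(X: np.ndarray, y: np.ndarray) -> tuple:
    """Compute the within-class (S_W) and between-class (S_B) scatter matrices."""
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)              # mu: overall mean of the dataset

    S_W = np.zeros((n_features, n_features))
    S_B = np.zeros((n_features, n_features))

    for c in np.unique(y):
        X_c = X[y == c]                        # samples belonging to class c
        mu_c = X_c.mean(axis=0)                # class centroid mu_c

        # Within-class scatter: sum of (x_i - mu_c)(x_i - mu_c)^T over class c
        S_W += (X_c - mu_c).T @ (X_c - mu_c)

        # Between-class scatter: N_c * (mu_c - mu)(mu_c - mu)^T
        diff = (mu_c - overall_mean).reshape(-1, 1)
        S_B += X_c.shape[0] * diff @ diff.T

    return S_W, S_B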

The objective is to maximize the Fisher criterion, defined as the ratio of the determinant of the projected between-class scatter matrix to that of the projected within-class scatter matrix:

$$J(W) = \frac{|W^T S_B W|}{|W^T S_W W|}$$

where $W$ is the matrix of projection vectors.

Solving for W

The optimal $W$ can be found by solving the eigenvalue problem:

$$S_W^{-1} S_B W = \lambda W$$
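
Since $S_B$ has rank at most $C - 1$, only the eigenvectors associated with the largest (at most $C - 1$) eigenvalues carry discriminative information; these eigenvectors form the columns of the optimal $W$. Below is a minimal NumPy sketch, reusing the hypothetical `scatter_matrices` helper from the previous snippet rather than the luma API:

import numpy as np

def fisher_directions(X: np.ndarray, y: np.ndarray, n_components: int) -> np.ndarray:
    """Return the top eigenvectors of S_W^{-1} S_B as columns of the projection matrix W."""
    S_W, S_B = scatter_matrices(X, y)          # from the earlier sketch

    # Eigendecomposition of S_W^{-1} S_B (generally non-symmetric, so np.linalg.eig)
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)

    # Sort by eigenvalue; small imaginary parts are numerical noise
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs.real[:, order[:n_components]]  # W maximizes J(W) = |W^T S_B W| / |W^T S_W W|
    return W

# Projecting the data, Z = X @ W, gives the maximally separated low-dimensional representation.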

Implementation

Parameters

No parameters.

Examples

Test with the iris flower dataset:

from luma.classifier.discriminant import LDAClassifier
from luma.preprocessing.scaler import StandardScaler
from luma.model_selection.split import TrainTestSplit
from luma.reduction.linear import PCA
from luma.visual.evaluation import DecisionRegion, ConfusionMatrix

from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
import numpy as np


X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = TrainTestSplit(X, y,
                                                  test_size=0.2, 
                                                  random_state=42).get

sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)  # scale the test set with the training statistics

pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)  # project the test set with the fitted components

lda = LDAClassifier()
lda.fit(X_train_pca, y_train)

fig = plt.figure(figsize=(10, 5))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)

X_trans = np.concatenate((X_train_pca, X_test_pca))
y_trans = np.concatenate((y_train, y_test))

dec = DecisionRegion(lda, X_trans, y_trans)
dec.plot(ax=ax1)

conf = ConfusionMatrix(y_trans, lda.predict(X_trans))
conf.plot(ax=ax2, show=True)

Notes

  • To use LDA for dimensionality reduction, refer to luma.reduction.linear.LDA

Application and Use Cases

LDA is used in various fields such as facial recognition, medical diagnosis, and financial fraud detection, where it helps in reducing the number of features while retaining the characteristics that contribute most to the classification.

Strengths and Limitations

  • Strengths
    • Reduces computational costs by lowering the dimensionality
    • Minimizes the risk of overfitting
  • Limitations
    • Assumes that the data is normally distributed
    • Assumes that the classes have identical covariance matrices
    • Less effective if the mean vectors are not well separated

Advanced Topics and Further Reading

For those interested in exploring beyond basic LDA, topics such as Quadratic Discriminant Analysis (QDA) and Regularized Discriminant Analysis (RDA) offer more flexibility by relaxing some of LDA's assumptions.

References

  1. Fisher, Ronald A. "The use of multiple measurements in taxonomic problems." Annals of Eugenics, vol. 7, no. 2, 1936, pp. 179-188.
  2. Duda, Richard O., Peter E. Hart, and David G. Stork. Pattern Classification. 2nd ed., Wiley-Interscience, 2001.