[Reduction] Linear Discriminant Analysis (LDA)

안암동컴맹 · March 30, 2024

Linear Discriminant Analysis (LDA)

Introduction

Linear Discriminant Analysis (LDA) is a supervised machine learning algorithm used for both classification and dimensionality reduction. It seeks to reduce dimensions while preserving as much of the class discriminatory information as possible. Unlike Principal Component Analysis (PCA), which focuses on capturing the variance in the data, LDA aims to model differences between classes. This document provides a comprehensive overview of LDA, focusing on its application in dimensionality reduction, supported by mathematical formulations.

Background and Theory

LDA is based on the concepts of separation and projection. It projects data points onto a lower-dimensional space in a way that maximizes the separability between different classes. This is achieved by maximizing the ratio of the between-class variance to the within-class variance, ensuring that classes are as distinct as possible in the reduced space.

Mathematical Foundations

Consider a dataset with $n$ samples and $p$ features, grouped into $k$ distinct classes. Let $\mathbf{x}_i$ be a sample with class label $y_i$.

Within-Class Scatter Matrix ($S_W$)

The within-class scatter matrix measures the spread of the samples within each class:

$$S_W = \sum_{c=1}^{k} \sum_{\mathbf{x}_i \in \text{class}_c} (\mathbf{x}_i - \mathbf{m}_c)(\mathbf{x}_i - \mathbf{m}_c)^T$$

where $\mathbf{m}_c$ is the mean vector of class $c$:

$$\mathbf{m}_c = \frac{1}{n_c} \sum_{\mathbf{x}_i \in \text{class}_c} \mathbf{x}_i$$

and $n_c$ is the number of samples in class $c$.
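
As a concrete illustration, the following minimal NumPy sketch (not luma's internal code) computes the class means and $S_W$ for the Iris data used in the example further below; the array shapes follow the definitions above.

import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)   # X: (n, p) = (150, 4), y: (n,) integer class labels
n, p = X.shape

# Within-class scatter: sum of per-class scatter around each class mean
S_W = np.zeros((p, p))
for c in np.unique(y):
    X_c = X[y == c]                 # samples belonging to class c
    m_c = X_c.mean(axis=0)          # class mean vector m_c
    diff = X_c - m_c                # deviations (x_i - m_c), shape (n_c, p)
    S_W += diff.T @ diff            # accumulate (x_i - m_c)(x_i - m_c)^T

print(S_W.shape)                    # (4, 4) for Iris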

Between-Class Scatter Matrix ($S_B$)

The between-class scatter matrix measures the spread between the classes:

$$S_B = \sum_{c=1}^{k} n_c (\mathbf{m}_c - \mathbf{m})(\mathbf{m}_c - \mathbf{m})^T$$

where $\mathbf{m}$ is the overall mean of the data:

$$\mathbf{m} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$$
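
A matching sketch for $S_B$, again purely illustrative and self-contained on the same Iris data:

import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
p = X.shape[1]
m = X.mean(axis=0)                  # overall mean vector m

# Between-class scatter: class-size-weighted scatter of the class means around m
S_B = np.zeros((p, p))
for c in np.unique(y):
    X_c = X[y == c]
    n_c = X_c.shape[0]              # number of samples in class c
    m_c = X_c.mean(axis=0)          # class mean vector
    d = (m_c - m).reshape(-1, 1)    # (m_c - m) as a (p, 1) column vector
    S_B += n_c * (d @ d.T)          # n_c (m_c - m)(m_c - m)^T

print(S_B.shape)                    # (4, 4)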

Objective Function

The goal of LDA is to find the projection matrix $\mathbf{W}$ that maximizes the ratio of the determinant of the between-class scatter matrix to the determinant of the within-class scatter matrix in the projected space:

$$\max_{\mathbf{W}} J(\mathbf{W}) = \frac{\det(\mathbf{W}^T S_B \mathbf{W})}{\det(\mathbf{W}^T S_W \mathbf{W})}$$

This optimization problem can be solved as a generalized eigenvalue problem:

$$S_B \mathbf{w} = \lambda S_W \mathbf{w}$$

where the vectors $\mathbf{w}$ are the columns of $\mathbf{W}$, i.e., the directions onto which the data are projected.
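
For concreteness, here is a minimal NumPy sketch (illustrative only, not luma's actual solver) that builds both scatter matrices, diagonalizes $S_W^{-1} S_B$, and keeps the two leading discriminant directions; the procedural steps below summarize exactly this sequence.

import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
p = X.shape[1]
m = X.mean(axis=0)

# Rebuild both scatter matrices as in the snippets above
S_W = np.zeros((p, p))
S_B = np.zeros((p, p))
for c in np.unique(y):
    X_c = X[y == c]
    m_c = X_c.mean(axis=0)
    S_W += (X_c - m_c).T @ (X_c - m_c)
    d = (m_c - m).reshape(-1, 1)
    S_B += X_c.shape[0] * (d @ d.T)

# Solve S_B w = lambda S_W w by diagonalizing S_W^{-1} S_B
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]   # sort directions by decreasing eigenvalue
W = eigvecs.real[:, order[:2]]           # keep the top-2 discriminant directions

Z = X @ W                                # projected data, shape (150, 2)
print(Z.shape)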

Procedural Steps

  1. Compute Mean Vectors: Calculate the mean vectors for each class.
  2. Compute Scatter Matrices: Calculate the within-class and between-class scatter matrices.
  3. Solve the Eigenvalue Problem: Solve for the eigenvalues and eigenvectors of $S_W^{-1} S_B$.
  4. Select Linear Discriminants: Choose the top eigenvectors based on their corresponding eigenvalues to form the transformation matrix $\mathbf{W}$.
  5. Project Data: Use $\mathbf{W}$ to project the data onto a lower-dimensional space.

Implementation

Parameters

  • n_components: int
    Number of linear discriminants

Notes

To use LDA for classification, refer to luma.classifier.discriminant.LDAClassifier.

Examples

from luma.reduction.linear import LDA

from sklearn.datasets import load_iris

import matplotlib.pyplot as plt
import numpy as np

iris_df = load_iris()
X = iris_df.data
y = iris_df.target

# Fit LDA on the labeled data and project the 4-D features onto 2 discriminants
model = LDA(n_components=2)
X_trans = model.fit_transform(X, y)

fig = plt.figure(figsize=(11, 5))
ax1 = fig.add_subplot(1, 2, 1, projection="3d")
ax2 = fig.add_subplot(1, 2, 2)

# Left panel: original 4-D data (three spatial axes, color encodes the fourth feature)
for cl, m in zip(np.unique(y), ["s", "o", "D"]):
    X_cl = X[y == cl]
    sc = ax1.scatter(
        X_cl[:, 0],
        X_cl[:, 1],
        X_cl[:, 2],
        c=X_cl[:, 3],
        vmin=X[:, 3].min(),  # share one color scale across classes
        vmax=X[:, 3].max(),
        marker=m,
        label=iris_df.target_names[cl],
    )

ax1.set_xlabel(iris_df.feature_names[0])
ax1.set_ylabel(iris_df.feature_names[1])
ax1.set_zlabel(iris_df.feature_names[2])
ax1.set_title("Original Iris Dataset")
ax1.legend()

cbar = ax1.figure.colorbar(sc, fraction=0.04)
cbar.set_label(iris_df.feature_names[3])

# Right panel: the same samples after projection onto the two discriminants
for cl, m in zip(np.unique(y), ["s", "o", "D"]):
    X_tr_cl = X_trans[y == cl]
    ax2.scatter(
        X_tr_cl[:, 0],
        X_tr_cl[:, 1],
        marker=m,
        edgecolors="black",
        label=iris_df.target_names[cl],
    )

ax2.set_xlabel(r"$z_1$")
ax2.set_ylabel(r"$z_2$")
ax2.set_title(
    f"Iris Dataset after {type(model).__name__} "
    + r"$(\mathcal{X}\rightarrow\mathcal{Z})$"
)
ax2.legend()

plt.tight_layout()
plt.show()

Applications

  • Feature Extraction: Reducing the dimensionality of input features while maintaining the class-discriminatory information.
  • Data Visualization: Projecting data into two or three dimensions for visualization purposes.
  • Preprocessing for Classification: Improving the performance of classifiers by reducing the dimensionality of the feature space, as sketched below.
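
As a hedged illustration of this preprocessing use case, the sketch below uses scikit-learn's LinearDiscriminantAnalysis and KNeighborsClassifier in a pipeline rather than luma's classes; the particular estimators and fold count are incidental choices.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Reduce 4 features to 2 discriminants, then classify in the reduced space
pipe = make_pipeline(
    LinearDiscriminantAnalysis(n_components=2),
    KNeighborsClassifier(n_neighbors=5),
)

scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())  # cross-validated accuracy with LDA as a preprocessing step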

Strengths and Limitations

Strengths

  • Class Separation: Explicitly focuses on maximizing class separability.
  • Efficiency: Generally requires fewer dimensions to achieve high class separation.

Limitations

  • Assumption of Linearity: Assumes linear relationships between features.
  • Normal Distribution: Assumes features are normally distributed within each class.
  • Equal Covariance: Assumes identical covariance matrices for all classes.

Advanced Topics

  • Regularized LDA: Introduces regularization to handle scenarios where $S_W$ is singular or near-singular (a minimal sketch follows after this list).
  • Kernel LDA: Extends LDA to nonlinear transformations using kernel methods.
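
One common form of regularization is shrinking $S_W$ toward a scaled identity before inverting it. The sketch below illustrates this idea with a hypothetical shrinkage strength gamma; the symbol, value, and formula are illustrative and not part of luma's API.

import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
p = X.shape[1]

S_W = np.zeros((p, p))
for c in np.unique(y):
    X_c = X[y == c]
    S_W += (X_c - X_c.mean(axis=0)).T @ (X_c - X_c.mean(axis=0))

gamma = 1e-3                                             # illustrative shrinkage strength
S_W_reg = S_W + gamma * (np.trace(S_W) / p) * np.eye(p)  # shrink toward a scaled identity

# The regularized matrix stays well-conditioned even when S_W itself is singular
print(np.linalg.cond(S_W), np.linalg.cond(S_W_reg))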
