Laplacian Eigenmap is a dimensionality reduction technique used in machine learning and data science to project high-dimensional data into a lower-dimensional space while preserving the local structure of the data. Unlike linear techniques such as Principal Component Analysis (PCA), Laplacian Eigenmap is a non-linear method that focuses on maintaining local distances between data points, which makes it particularly useful for data that lies on a manifold.
The technique is rooted in the field of spectral graph theory and is designed to uncover the intrinsic geometry of data represented as a graph. By constructing a neighborhood graph from the data points, Laplacian Eigenmap uses the graph Laplacian's eigenvalues and eigenvectors to find a low-dimensional representation that best preserves the graph's local properties.
To understand Laplacian Eigenmaps, it's essential to grasp some underlying mathematical concepts, primarily those related to graph theory and spectral analysis.
A graph consists of a set of vertices and a set of edges connecting these vertices. In the context of Laplacian Eigenmaps, vertices represent data points, and edges represent the connections between them, typically based on some notion of similarity or proximity.
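As a concrete illustration, the following is a minimal sketch of such a construction, using a k-nearest-neighbor rule with heat-kernel edge weights; the helper name `neighborhood_graph` and the defaults `k=10`, `sigma=1.0` are illustrative choices of this sketch, not part of luma's API.

```python
import numpy as np

def neighborhood_graph(X, k=10, sigma=1.0):
    """Build a symmetric weighted adjacency matrix W from data points X.

    Vertices are the rows of X; an edge connects each point to its
    k nearest neighbors, weighted by the heat kernel
    exp(-||x_i - x_j||^2 / (2 * sigma^2)).
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        # Indices of the k nearest neighbors (excluding the point itself)
        neighbors = np.argsort(sq_dists[i])[1:k + 1]
        W[i, neighbors] = np.exp(-sq_dists[i, neighbors] / (2 * sigma**2))
    # Symmetrize: keep an edge if either endpoint selected it
    return np.maximum(W, W.T)
```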
The graph Laplacian is a matrix representation that characterizes the difference between a vertex and its neighbors. It is defined as $L = D - W$, where:

- $W$ is the weight (adjacency) matrix, with $W_{ij}$ the weight of the edge between vertices $i$ and $j$;
- $D$ is the diagonal degree matrix, with $D_{ii} = \sum_j W_{ij}$.
The graph Laplacian plays a central role in Laplacian Eigenmaps, as it captures the essential structure of the data.
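Given a weight matrix such as the one built above, the Laplacian follows directly from its definition; a minimal sketch:

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a weight matrix W."""
    D = np.diag(W.sum(axis=1))  # degree matrix: row sums of W on the diagonal
    return D - W
```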
The objective of the Laplacian Eigenmap algorithm is to find a low-dimensional representation of the data that preserves local distances. This is achieved through the eigenvalue decomposition of the Laplacian matrix $L$. Specifically, the algorithm involves the following steps (a from-scratch sketch follows the list):

1. Construct a neighborhood graph over the data points, connecting each point to its nearest neighbors (e.g., via k-nearest neighbors or an $\epsilon$-ball).
2. Weight the edges, typically with the heat kernel $W_{ij} = \exp\left(-\lVert x_i - x_j \rVert^2 / (2\sigma^2)\right)$.
3. Solve the generalized eigenvalue problem $L y = \lambda D y$, where $D$ is the degree matrix.
4. Discard the constant eigenvector associated with the zero eigenvalue and use the eigenvectors of the next $d$ smallest eigenvalues as the $d$-dimensional embedding.
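Putting the steps together, here is a minimal sketch that reuses the `neighborhood_graph` and `graph_laplacian` helpers defined above; it illustrates the method and is not luma's implementation.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmap(X, n_components=2, k=10, sigma=1.0):
    """Embed X into n_components dimensions via Laplacian Eigenmaps."""
    W = neighborhood_graph(X, k=k, sigma=sigma)
    L = graph_laplacian(W)
    D = np.diag(W.sum(axis=1))
    # Generalized eigenproblem L y = lambda D y; eigh returns
    # eigenvalues in ascending order (assumes the graph is connected).
    eigvals, eigvecs = eigh(L, D)
    # Skip the trivial constant eigenvector (eigenvalue ~ 0) and keep
    # the next n_components eigenvectors as the embedding coordinates.
    return eigvecs[:, 1:n_components + 1]
```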
In luma, the algorithm is exposed as the `LaplacianEigenmap` class with the following parameters:

- `n_components` : int
  The dimensionality of the embedding space.
- `sigma` : float, default = 1.0
  The bandwidth of the heat kernel used to compute edge weights.

The example below embeds the 3-D S-curve dataset into two dimensions:

```python
from luma.reduction.manifold import LaplacianEigenmap
from sklearn.datasets import make_s_curve
import matplotlib.pyplot as plt

# Generate a noisy 3-D S-curve; y encodes the position along the curve
X, y = make_s_curve(n_samples=500, noise=0.1)

# Embed into 2 dimensions with a fairly narrow heat kernel
model = LaplacianEigenmap(n_components=2, sigma=0.3)
Z = model.fit_transform(X)

# Plot the original manifold next to its 2-D embedding
fig = plt.figure(figsize=(10, 5))
ax1 = fig.add_subplot(1, 2, 1, projection="3d")
ax2 = fig.add_subplot(1, 2, 2)

ax1.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap="rainbow")
ax1.set_xlabel(r"$x$")
ax1.set_ylabel(r"$y$")
ax1.set_zlabel(r"$z$")
ax1.set_title("Original S-Curve")

ax2.scatter(Z[:, 0], Z[:, 1], c=y, cmap="rainbow")
ax2.set_xlabel(r"$z_1$")
ax2.set_ylabel(r"$z_2$")
ax2.set_title(f"After {type(model).__name__}")
ax2.grid(alpha=0.2)

plt.tight_layout()
plt.show()
```
Extending Laplacian Eigenmap to new data points requires techniques such as the Nyström method, which approximates the eigenfunctions of the Laplacian matrix for out-of-sample data based on a subset of the original data.
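Since the full Nyström formula also rescales by the training eigenvalues, the sketch below shows only the simpler kernel-smoother approximation it reduces to; `embed_new_point` is a hypothetical helper, and `Z_train` denotes the embeddings produced on the training set.

```python
import numpy as np

def embed_new_point(x_new, X_train, Z_train, sigma=1.0):
    """Approximate embedding of an unseen point x_new.

    Weights each training embedding Z_train[i] by the heat-kernel
    similarity between x_new and X_train[i]; a full Nystrom treatment
    would additionally rescale by the corresponding eigenvalues.
    """
    sq_dists = np.sum((X_train - x_new) ** 2, axis=1)
    w = np.exp(-sq_dists / (2 * sigma**2))
    return w @ Z_train / w.sum()
```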
Kernel methods can also be integrated with Laplacian Eigenmaps: computing edge weights from a kernel function rather than raw Euclidean distances yields a more flexible method that can capture more complex data structures.
- Belkin, Mikhail, and Partha Niyogi. "Laplacian Eigenmaps for Dimensionality Reduction and Data Representation." Neural Computation, vol. 15, no. 6, 2003, pp. 1373-1396.
- von Luxburg, Ulrike. "A Tutorial on Spectral Clustering." Statistics and Computing, vol. 17, no. 4, 2007, pp. 395-416.