Multidimensional Scaling (MDS) is a statistical technique for analyzing similarity or dissimilarity data. It represents the objects in a lower-dimensional space (typically two or three dimensions) so that the similarities or dissimilarities between pairs of objects can be visualized. The essence of MDS is to place each object in an N-dimensional space such that the between-object distances are preserved as well as possible. MDS is applied in fields such as psychology, marketing, and bioinformatics, where it provides visualizations of complex datasets.
MDS is rooted in the mathematical theory of metric spaces and dimensionality reduction. The goal of MDS is to find a configuration of points in a lower-dimensional space that reflects the observed distances (similarities or dissimilarities) among a set of items as accurately as possible. The original distances are usually derived from direct measurements or computed using a distance metric such as Euclidean, Manhattan, or more complex measures that suit the data characteristics.
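As a minimal sketch of how such a dissimilarity matrix might be computed from raw feature vectors, the NumPy snippet below builds the pairwise Euclidean and Manhattan distance matrices. The helper name `pairwise_dissimilarities` and the toy data are illustrative assumptions, not part of any library API.

```python
import numpy as np


def pairwise_dissimilarities(X, metric="euclidean"):
    """Return the (n, n) matrix of pairwise dissimilarities between rows of X.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    # diff[i, j] holds the coordinate-wise difference between items i and j
    diff = X[:, None, :] - X[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    raise ValueError(f"unsupported metric: {metric!r}")


# Three toy items lying on a straight line in 2-D
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D_euc = pairwise_dissimilarities(X, "euclidean")
D_man = pairwise_dissimilarities(X, "manhattan")
print(D_euc)
print(D_man)
```

Both matrices are symmetric with a zero diagonal, which is the form of input the rest of this section assumes.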
Given a set of $n$ items with a matrix $\Delta = (\delta_{ij})$ representing the dissimilarities between each pair of items, MDS seeks a set of $n$ points in $p$ dimensions (where $p < n$), such that the Euclidean distances between these points closely match the original dissimilarities. The objective is to minimize the stress function $\sigma$, which is a measure of the discrepancy between the distances in the lower-dimensional representation and the original dissimilarities. One common form is the raw stress:

$$
\sigma = \sqrt{\sum_{i<j} \left( d_{ij} - \delta_{ij} \right)^2}
$$

where $d_{ij}$ is the Euclidean distance between points $i$ and $j$ in the lower-dimensional space, and $\delta_{ij}$ is the original dissimilarity between items $i$ and $j$.
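To make the stress concrete, the following sketch evaluates it for a given configuration of points. It assumes the raw (unnormalized) form, the square root of the summed squared differences between embedded distances and dissimilarities over all pairs $i < j$. Other normalizations exist, and the function name `raw_stress` is a hypothetical helper, not a luma function.

```python
import numpy as np


def raw_stress(Z, delta):
    """Raw stress of configuration Z (n, p) against dissimilarities delta (n, n).

    Assumed form: sqrt(sum over i < j of (d_ij - delta_ij)^2).
    """
    Z = np.asarray(Z, dtype=float)
    delta = np.asarray(delta, dtype=float)
    # Euclidean distances between every pair of embedded points
    diff = Z[:, None, :] - Z[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Count each unordered pair once (strict upper triangle)
    iu = np.triu_indices(Z.shape[0], k=1)
    return np.sqrt(((d[iu] - delta[iu]) ** 2).sum())


# Dissimilarities between three collinear items: 5, 10 and 5 apart
delta = np.array([[0.0, 5.0, 10.0],
                  [5.0, 0.0, 5.0],
                  [10.0, 5.0, 0.0]])

Z_exact = np.array([[0.0], [5.0], [10.0]])  # 1-D layout that matches delta exactly
Z_off = np.array([[0.0], [5.0], [9.0]])     # last point misplaced by one unit

print(raw_stress(Z_exact, delta))  # zero: every distance is reproduced
print(raw_stress(Z_off, delta))    # sqrt(2): two pairs are each off by 1
```

A lower value means the embedding reproduces the dissimilarities more faithfully, so an iterative MDS solver would adjust the coordinates in `Z` to drive this quantity down.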
Parameters:
- n_components : int, default = None

Test on the iris dataset:
    from luma.reduction.manifold import MDS

    from sklearn.datasets import load_iris
    import matplotlib.pyplot as plt
    import numpy as np

    # Load the iris data: 150 samples, 4 features, 3 classes
    iris_df = load_iris()
    X = iris_df.data
    y = iris_df.target

    # Embed the 4-dimensional samples into 2 dimensions
    model = MDS(n_components=2)
    X_trans = model.fit_transform(X)

    fig = plt.figure(figsize=(11, 5))
    ax1 = fig.add_subplot(1, 2, 1, projection="3d")
    ax2 = fig.add_subplot(1, 2, 2)

    # Left panel: first three features as axes, fourth feature as color
    for cl, m in zip(np.unique(y), ["s", "o", "D"]):
        X_cl = X[y == cl]
        sc = ax1.scatter(
            X_cl[:, 0],
            X_cl[:, 1],
            X_cl[:, 2],
            c=X_cl[:, 3],
            # Shared color limits so the single colorbar is valid for all classes
            vmin=X[:, 3].min(),
            vmax=X[:, 3].max(),
            marker=m,
            label=iris_df.target_names[cl],
        )

    ax1.set_xlabel(iris_df.feature_names[0])
    ax1.set_ylabel(iris_df.feature_names[1])
    ax1.set_zlabel(iris_df.feature_names[2])
    ax1.set_title("Original Iris Dataset")
    ax1.legend()

    cbar = ax1.figure.colorbar(sc, fraction=0.04)
    cbar.set_label(iris_df.feature_names[3])

    # Right panel: the 2-D embedding, one marker per class
    for cl, m in zip(np.unique(y), ["s", "o", "D"]):
        X_tr_cl = X_trans[y == cl]
        ax2.scatter(
            X_tr_cl[:, 0],
            X_tr_cl[:, 1],
            marker=m,
            edgecolors="black",
            label=iris_df.target_names[cl],
        )

    ax2.set_xlabel(r"$z_1$")
    ax2.set_ylabel(r"$z_2$")
    ax2.set_title(
        f"Iris Dataset after {type(model).__name__} "
        + r"$(\mathcal{X}\rightarrow\mathcal{Z})$"
    )
    ax2.legend()
    ax2.grid(alpha=0.2)

    plt.tight_layout()
    plt.show()
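
For readers who want to see what an MDS embedding involves without the library, the sketch below implements classical (Torgerson) MDS in plain NumPy: square the distances, double-center them, and take the leading eigenvectors. This is one well-known MDS variant chosen for illustration. Nothing here is confirmed to be the algorithm used inside luma's `MDS` class, and the function name `classical_mds` and the random test data are assumptions.

```python
import numpy as np


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix D of shape (n, n).

    Illustrative sketch; not the luma implementation.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double centering turns squared distances into a Gram (inner-product) matrix
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order; keep the largest ones
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by round-off
    lam = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(lam)


# Points that truly live in 2-D, so a 2-D embedding can match D exactly
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

Z = classical_mds(D, n_components=2)
D_hat = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
print(np.allclose(D, D_hat))
```

The recovered coordinates `Z` generally differ from `X` by a rotation, reflection, or translation, since MDS only constrains the distances between points, not their absolute positions.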
MDS is versatile and is applied across domains such as psychology, marketing, and bioinformatics.