The Local Outlier Factor (LOF) algorithm is a density-based outlier detection method used in data analysis and machine learning. It identifies anomalous data points by measuring how far the local density of a point deviates from that of its neighbors. Unlike global outlier detection methods, LOF accounts for local variation in density, which makes it effective on datasets whose density differs from region to region.
LOF relies on the concept of local density: the density around a data point is compared with the densities around its neighbors. A point is flagged as an outlier when the density around it is significantly lower than the density around its neighbors. A point in a sparse region is therefore not an outlier as long as its neighbors are similarly sparse.
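The contrast between a global and a local criterion can be illustrated with a small sketch. It is not the LOF computation itself: it uses a simplified ratio of nearest-neighbor distances, and the data and helper names (`nearest`, `nn_dist`) are made up for illustration.

```python
# Illustrative 1-D data: a tight cluster, one point just outside it,
# and a loose cluster whose members are naturally far apart.
tight = [0.0, 0.1, 0.2, 0.3, 0.4]
suspect = 1.4
loose = [20.0, 25.0, 30.0, 35.0, 40.0]
data = tight + [suspect] + loose
suspect_idx = len(tight)  # position of the suspect point in `data`


def nearest(i):
    """Index of the nearest other point to data[i]."""
    others = (j for j in range(len(data)) if j != i)
    return min(others, key=lambda j: abs(data[i] - data[j]))


def nn_dist(i):
    """Distance from data[i] to its nearest other point."""
    return abs(data[i] - data[nearest(i)])


# Global view: rank every point by its raw nearest-neighbor distance.
global_rank = sorted(range(len(data)), key=nn_dist, reverse=True)

# Local view: divide each point's distance by its neighbor's own distance.
local_ratio = [nn_dist(i) / nn_dist(nearest(i)) for i in range(len(data))]
local_top = max(range(len(data)), key=lambda i: local_ratio[i])

print(suspect_idx in global_rank[:5])  # False: the loose cluster ranks first
print(local_top == suspect_idx)        # True: only the suspect stands out locally
```

The raw distance ranks the whole loose cluster ahead of the suspect point, whereas the ratio stays near 1 for both clusters and is about 10 for the suspect point.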
The LOF of a data point is calculated based on its local reachability density (LRD) and those of its neighbors. The key steps and formulas involved are as follows:
k-Distance: For each data point $p$, the k-distance, written $k\text{-distance}(p)$, is the distance from $p$ to its $k$-th nearest neighbor.
Reachability Distance: The reachability distance of a data point $p$ from a point $o$ is the maximum of the k-distance of $o$ and the actual distance between $p$ and $o$, denoted $d(p, o)$. It is given by:
$$\text{reach-dist}_k(p, o) = \max\{k\text{-distance}(o),\ d(p, o)\}$$
Local Reachability Density (LRD): The LRD of a data point $p$ is the inverse of the average reachability distance of $p$ from its neighbors, indicating the density around this point. It is computed as:
$$\text{lrd}_k(p) = \left( \frac{\sum_{o \in N_k(p)} \text{reach-dist}_k(p, o)}{|N_k(p)|} \right)^{-1}$$
where $N_k(p)$ is the set of the $k$ nearest neighbors of $p$.
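Local Outlier Factor: Finally, the LOF of a data point $p$ is the ratio of the average LRD of its neighbors $o \in N_k(p)$ to its own LRD. A value close to 1 means $p$ is about as dense as its neighbors, while a value significantly greater than 1 indicates an outlier. It is given by:
$$\text{LOF}_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\text{lrd}_k(o)}{\text{lrd}_k(p)}$$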
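The four steps above can be sketched in plain Python. This is a minimal brute-force illustration rather than the library's implementation: the function names `knn` and `lof_scores` and the toy data are assumptions made for this example, the distance is Euclidean, and the sketch assumes the data does not contain more than $k$ copies of the same point (otherwise the average reachability distance can be zero).

```python
import math


def knn(points, i, k):
    """Return (k-distance, neighbor indices) for points[i].

    Points tied at the k-th distance are all kept, so the
    neighborhood may hold more than k points.
    """
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    k_dist = dists[k - 1][0]
    neighbors = [j for d, j in dists if d <= k_dist]
    return k_dist, neighbors


def lof_scores(points, k):
    """Compute the LOF of every point with a brute-force neighbor search."""
    n = len(points)
    info = [knn(points, i, k) for i in range(n)]
    k_dist = [kd for kd, _ in info]
    nbrs = [nb for _, nb in info]

    def reach_dist(p, o):
        # Reachability distance of p from o: max(k-distance(o), d(p, o)).
        return max(k_dist[o], math.dist(points[p], points[o]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for p in range(n):
        mean_reach = sum(reach_dist(p, o) for o in nbrs[p]) / len(nbrs[p])
        lrd.append(1.0 / mean_reach)

    # LOF: average LRD of the neighbors divided by the point's own LRD.
    return [
        sum(lrd[o] for o in nbrs[p]) / (len(nbrs[p]) * lrd[p])
        for p in range(n)
    ]


# Toy data: four evenly spaced points and one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
scores = lof_scores(points, k=2)
print([round(s, 3) for s in scores])  # [1.0, 1.0, 1.0, 1.0, 5.0]
```

The four evenly spaced points all have the same local reachability density, so their LOF is 1, while the distant point is about five times sparser than its neighbors. The pairwise search costs $O(n^2)$ distance evaluations, which is fine for a sketch but slow on large datasets.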
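The LocalOutlierFactor class accepts the following parameters:
- n_neighbors : int, default = 10
- threshold : float, default = 1.5

The following example fits LOF to synthetic data and plots the resulting scores:

from luma.preprocessing.outlier import LocalOutlierFactor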
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import numpy as np

# Seed NumPy's global RNG so make_blobs generates the same data on every run
np.random.seed(10)
# Three 2-D Gaussian blobs with a wide spread (cluster_std=3)
X, y = make_blobs(n_samples=300, cluster_std=3, centers=3)

# Fit LOF and compute one score per sample
lof = LocalOutlierFactor(n_neighbors=10, threshold=1.5)
lof.fit(X)
scores = lof.get_scores(X)

# Exaggerate marker size for high scores so that outliers stand out
scores_amp = np.exp(scores**2) * 20
sc = plt.scatter(
    X[:, 0], X[:, 1], s=scores_amp, c=scores, cmap="coolwarm", alpha=0.3
)
# Overlay the raw sample positions
plt.scatter(X[:, 0], X[:, 1], s=10, c="black", marker="x", alpha=0.5)
plt.colorbar(sc, label="Local Outlier Factor")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Local Outlier Factor")
plt.grid(alpha=0.2)
plt.tight_layout()
plt.show()
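To show how a threshold such as the 1.5 used in the example might turn scores into labels, here is a minimal sketch. The scores are made-up values rather than library output, and the rule "score greater than threshold means outlier" is an assumption about how the threshold is applied, not a documented behavior of the class.

```python
# Hypothetical LOF scores for six points (made-up values, not library output).
scores = [0.98, 1.05, 1.62, 1.10, 2.40, 1.49]
threshold = 1.5  # same value as passed to LocalOutlierFactor in the example

# Assumed rule: a point is an outlier when its score exceeds the threshold.
outlier_idx = [i for i, s in enumerate(scores) if s > threshold]
inlier_idx = [i for i, s in enumerate(scores) if s <= threshold]

print(outlier_idx)  # [2, 4]
print(inlier_idx)   # [0, 1, 3, 5]
```

Under this assumed rule, raising the threshold flags fewer points. A score of 1.49 stays an inlier here even though it is clearly above 1.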
- Breunig, Markus M., et al. "LOF: identifying density-based local outliers." Proceedings of the 2000 ACM SIGMOD international conference on Management of data. 2000.
- Kriegel, Hans-Peter, Peer Kröger, and Arthur Zimek. "Outlier detection techniques." Tutorial at KDD. 2010.