[Single] Perceptron Classification

안암동컴맹 · April 11, 2024

Perceptron Classifier

Introduction

The perceptron is a type of artificial neuron and the simplest form of a neural network. It is the foundational building block of more complex neural networks and deep learning models. The concept of the perceptron was introduced by Frank Rosenblatt in 1957 as a binary classifier designed for linearly separable data. It is a supervised learning algorithm, meaning it learns from labeled data to make predictions.

The perceptron makes its predictions based on a linear predictor function combining a set of weights with the feature vector. The algorithm iteratively adjusts these weights based on the errors made in previous predictions. The simplicity of the perceptron makes it a starting point for understanding more complex neural network models.

Mathematical Foundations

Model Representation

A perceptron takes a vector of real-valued inputs $\mathbf{x} = [x_1, x_2, \ldots, x_n]$, where $n$ is the number of features. Each input is weighted by a corresponding weight in $\mathbf{w} = [w_1, w_2, \ldots, w_n]$, and the perceptron makes predictions based on the weighted sum of its inputs plus a bias term $b$. The prediction $\hat{y}$ can be represented as follows:

$$\hat{y} = f(\mathbf{w} \cdot \mathbf{x} + b)$$

where $\mathbf{w} \cdot \mathbf{x}$ denotes the dot product of the vectors $\mathbf{w}$ and $\mathbf{x}$, and $f$ is the activation function. In the case of the perceptron, the activation function is typically a step function:

$$f(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{otherwise} \end{cases}$$
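
To make the forward pass concrete, here is a minimal plain-NumPy sketch, independent of any library; the helper names step and predict are purely illustrative. It computes the weighted sum and applies the step activation:

import numpy as np

def step(z: float) -> int:
    # Step activation: 1 if z >= 0, else 0
    return 1 if z >= 0 else 0

def predict(w: np.ndarray, b: float, x: np.ndarray) -> int:
    # Weighted sum of the inputs plus the bias, passed through the step function
    return step(np.dot(w, x) + b)

# A perceptron with these weights and bias implements logical AND
w, b = np.array([1.0, 1.0]), -1.5
print(predict(w, b, np.array([1, 1])))  # 1
print(predict(w, b, np.array([0, 1])))  # 0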

Learning Algorithm

The perceptron learns by iteratively adjusting its weights and bias to minimize the difference between the actual and predicted labels on the training data. The weights are updated as follows:

$$w_i^{(\text{new})} = w_i + \eta (y - \hat{y}) x_i$$
$$b^{(\text{new})} = b + \eta (y - \hat{y})$$

Where:

  • $w_i$ is the current weight,
  • $x_i$ is the input feature,
  • $y$ is the actual label,
  • $\hat{y}$ is the predicted label,
  • $\eta$ is the learning rate, a small positive value that determines the size of the weight updates.

The perceptron rule implies that the weight update is performed only if the prediction $\hat{y}$ is wrong. If $\hat{y} = y$, then the weights are not updated.
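
This update rule translates almost directly into code. The following is a minimal NumPy sketch of the learning loop for binary labels in {0, 1}; it is purely illustrative, not the luma implementation, and omits extras such as regularization and shuffling:

import numpy as np

def train_perceptron(X, y, learning_rate=0.01, max_iter=1000):
    # X: (n_samples, n_features), y: binary labels in {0, 1}
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_iter):
        errors = 0
        for xi, yi in zip(X, y):
            y_hat = 1 if np.dot(w, xi) + b >= 0 else 0
            update = learning_rate * (yi - y_hat)  # zero when the prediction is correct
            w += update * xi                       # w_i <- w_i + eta * (y - y_hat) * x_i
            b += update                            # b   <- b   + eta * (y - y_hat)
            errors += int(update != 0.0)
        if errors == 0:  # every sample classified correctly: stop early
            break
    return w, b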

Convergence Theorem

The perceptron convergence theorem guarantees that if the two classes of data are linearly separable, the perceptron algorithm will converge to a solution in a finite number of steps. However, if the classes cannot be separated by a linear boundary, the algorithm will not converge to a stable set of weights.
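
The theorem is easy to observe with the train_perceptron sketch above: on the logical AND problem, which is linearly separable, the loop stops after a few passes, whereas on XOR, which is not, the weights keep being corrected until max_iter is exhausted and the final predictions cannot match the labels.

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])  # linearly separable: AND
y_xor = np.array([0, 1, 1, 0])  # not linearly separable: XOR

w, b = train_perceptron(X, y_and, learning_rate=0.1)
print([1 if np.dot(w, xi) + b >= 0 else 0 for xi in X])  # [0, 0, 0, 1]

w, b = train_perceptron(X, y_xor, learning_rate=0.1)
print([1 if np.dot(w, xi) + b >= 0 else 0 for xi in X])  # never equals [0, 1, 1, 0]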

Implementation

Parameters

  • learning_rate: float, default = 0.01
    Step size for updating the weights
  • max_iter: int, default = 1000
    Maximum number of training iterations
  • regularization: Literal['l1', 'l2', 'elastic-net'], default = None
    Regularization method to apply
  • alpha: float, default = 0.0001
    Regularization strength
  • l1_ratio: float, default = 0.5
    Mixing ratio of L1 regularization when using elastic-net
  • random_state: int, default = None
    Seed for random shuffling during SGD
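
For instance, assuming only the parameters documented above, a model with L2 regularization could be instantiated as follows (the values are illustrative; a full working example is given in the next section):

from luma.neural.single import PerceptronClassifier

model = PerceptronClassifier(learning_rate=0.01,
                             max_iter=1000,
                             regularization="l2",
                             alpha=0.0001,
                             random_state=42)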

Examples

from luma.preprocessing.scaler import StandardScaler
from luma.model_selection.split import TrainTestSplit
from luma.model_selection.search import RandomizedSearchCV
from luma.model_selection.fold import StratifiedKFold
from luma.neural.single import PerceptronClassifier
from luma.visual.evaluation import ConfusionMatrix

from sklearn.datasets import load_wine
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)

# Load the Wine dataset (three cultivar classes)
X, y = load_wine(return_X_y=True)
n_classes = len(np.unique(y))

# Standardize the features to zero mean and unit variance
sc = StandardScaler()
X_std = sc.fit_transform(X)

X_train, X_test, y_train, y_test = TrainTestSplit(X_std, y,
                                                  test_size=0.2,
                                                  shuffle=True,
                                                  stratify=True).get

# Hyperparameter search space for the randomized search
param_dist = {"learning_rate": np.logspace(-3, -1, 10),
              "alpha": np.logspace(-2, 2, 10),
              "l1_ratio": np.linspace(0, 1, 10),
              "regularization": ["l1", "l2", "elastic-net"]}

# Randomized search over the perceptron hyperparameters with stratified 5-fold CV
rand = RandomizedSearchCV(estimator=PerceptronClassifier(max_iter=100),
                          param_dist=param_dist,
                          max_iter=50,
                          cv=5,
                          maximize=True,
                          refit=True,
                          shuffle=True,
                          fold_type=StratifiedKFold,
                          verbose=True)

rand.fit(X_train, y_train)
print(rand.best_params, rand.best_score)

# Recombine the train and test splits for the final visualizations
X_all = np.vstack((X_train, X_test))
y_all = np.hstack((y_train, y_test))

# Plot the trained weight matrix and the confusion matrix side by side
model = rand.best_model
fig = plt.figure(figsize=(10, 5))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)

im = ax1.imshow(model.weights_, cmap="gray")
ax1.set_yticks(range(n_classes), range(n_classes))
ax1.set_ylabel("Classes")
ax1.set_xlabel("Weights")
ax1.set_title("Trained Weights")
ax1.figure.colorbar(im)

conf = ConfusionMatrix(y_all, model.predict(X_all))
conf.plot(ax=ax2, show=True)

Applications

  • Binary Classification: Its primary application, useful in scenarios where objects need to be categorized into two groups (e.g., spam vs. non-spam emails).
  • Feature Extraction: Used as a preliminary step to reduce dimensionality before applying more sophisticated algorithms.

Strengths and Limitations

Strengths

  • Simplicity: Easy to implement and understand.
  • Efficiency: Requires relatively few computational resources.

Limitations

  • Linear Separability: Only capable of classifying linearly separable data sets.
  • Binary Outputs: Limited to binary classification tasks.

Advanced Topics

  • Kernel Perceptron: Extension of the perceptron that uses kernel functions to classify data that is not linearly separable (a short sketch follows this list).
  • Multi-layer Perceptrons (MLP): Networks of perceptrons (also known as feedforward neural networks) that can model complex, non-linear relationships between inputs and outputs.
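
To make the kernel idea concrete, here is a minimal, purely illustrative sketch of a kernel perceptron with an RBF kernel, using labels in {-1, +1}; the names rbf_kernel, train_kernel_perceptron, and predict_kernel are assumptions, not part of any library. On XOR, which defeats the linear perceptron, it classifies all four points correctly:

import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian (RBF) kernel between two feature vectors
    return np.exp(-gamma * np.sum((a - b) ** 2))

def train_kernel_perceptron(X, y, kernel=rbf_kernel, epochs=10):
    # y must contain labels in {-1, +1}; alpha[i] counts mistakes on sample i
    n = len(X)
    alpha = np.zeros(n)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    for _ in range(epochs):
        for i in range(n):
            if np.sign(np.sum(alpha * y * K[:, i])) != y[i]:
                alpha[i] += 1
    return alpha

def predict_kernel(X_train, y_train, alpha, x, kernel=rbf_kernel):
    score = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y_train, X_train))
    return 1 if score >= 0 else -1

# XOR: not linearly separable, but separable in the RBF feature space
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])
alpha = train_kernel_perceptron(X, y)
print([predict_kernel(X, y, alpha, xi) for xi in X])  # [-1, 1, 1, -1]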

References

  1. Rosenblatt, Frank. "The perceptron: A probabilistic model for information storage and organization in the brain." Psychological Review 65.6 (1958): 386.
  2. Minsky, Marvin, and Seymour Papert. "Perceptrons: An introduction to computational geometry." The MIT Press, 1969.
  3. Goodfellow, Ian, et al. "Deep Learning." MIT Press, 2016.