The perceptron is a type of artificial neuron and the simplest form of a neural network. It is the foundational building block of more complex neural networks and deep learning models. The concept of the perceptron was introduced by Frank Rosenblatt in 1957 as a binary classifier designed to classify linearly separable datasets. It is a supervised learning algorithm, meaning it learns from labeled data to make predictions.
The perceptron makes its predictions based on a linear predictor function combining a set of weights with the feature vector. The algorithm iteratively adjusts these weights based on the errors made in previous predictions. The simplicity of the perceptron makes it a starting point for understanding more complex neural network models.
A perceptron takes a vector of real-valued inputs $\mathbf{x} = (x_1, x_2, \dots, x_n)$, where $n$ is the number of features. Each input $x_i$ is weighted by a corresponding weight $w_i$, and the perceptron makes predictions based on the weighted sum of its inputs plus a bias term $b$. The prediction ($\hat{y}$) can be represented as follows:

$$
\hat{y} = f(\mathbf{w} \cdot \mathbf{x} + b)
$$

where $\mathbf{w} \cdot \mathbf{x}$ denotes the dot product of the vectors $\mathbf{w}$ and $\mathbf{x}$, and $f$ is the activation function. In the case of the perceptron, the activation function is typically a step function:

$$
f(z) =
\begin{cases}
1 & \text{if } z \geq 0 \\
0 & \text{otherwise}
\end{cases}
$$
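As a concrete illustration of the forward pass, the prediction above can be written in a few lines of NumPy. This is a minimal sketch of the formula, not the luma implementation; the names `step` and `predict` are illustrative.

```python
import numpy as np

def step(z):
    """Step activation: 1 where z >= 0, else 0."""
    return np.where(z >= 0, 1, 0)

def predict(X, weights, bias):
    """Perceptron forward pass: step(w . x + b) for each row of X."""
    return step(X @ weights + bias)
```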
The perceptron learns by iteratively adjusting its weights and bias to minimize the difference between the actual and predicted labels on the training data. The weights are updated as follows:

$$
w_i \leftarrow w_i + \eta \,(y - \hat{y})\, x_i, \qquad b \leftarrow b + \eta \,(y - \hat{y})
$$

where $\eta$ is the learning rate, $y$ is the true label, $\hat{y}$ is the predicted label, and $x_i$ is the $i$-th input feature.

The perceptron rule implies that the weight update is performed only if the prediction is wrong. If $\hat{y} = y$, the error term $(y - \hat{y})$ is zero and the weights are not updated.
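A full training pass simply repeats this update over the dataset until no mistakes remain. Below is a minimal NumPy sketch of the rule above; the function name `fit_perceptron` and the epoch-based loop are illustrative choices, not the luma implementation.

```python
import numpy as np

def fit_perceptron(X, y, learning_rate=0.01, max_iter=1000):
    """Train a binary perceptron with the classic error-driven update rule."""
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(max_iter):
        errors = 0
        for xi, target in zip(X, y):
            y_hat = 1 if xi @ weights + bias >= 0 else 0
            update = learning_rate * (target - y_hat)
            if update != 0:            # update only on a misclassification
                weights += update * xi
                bias += update
                errors += 1
        if errors == 0:                # every sample classified correctly
            break
    return weights, bias
```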
The perceptron convergence theorem guarantees that if the two classes of data are linearly separable, the perceptron algorithm will converge to a solution in a finite number of steps. However, if the classes cannot be separated by a linear boundary, the algorithm will not converge to a stable set of weights.
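For instance, the illustrative `fit_perceptron` above converges within a few epochs on the AND function, which is linearly separable, but never settles on XOR, which is not:

```python
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])   # linearly separable: the loop terminates early
y_xor = np.array([0, 1, 1, 0])   # not linearly separable: runs for all max_iter epochs

w, b = fit_perceptron(X, y_and, learning_rate=0.1)
print(np.where(X @ w + b >= 0, 1, 0))   # [0 0 0 1]
```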
The `PerceptronClassifier` in luma exposes the following hyperparameters:

- `learning_rate` : float, default = 0.01
- `max_iter` : int, default = 1000
- `regularization` : Literal['l1', 'l2', 'elastic-net'], default = None
- `alpha` : float, default = 0.0001
- `l1_ratio` : float, default = 0.5
- `random_state` : int, default = None
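Based on the parameter list above and the `fit`/`predict` usage in the example that follows, a minimal usage sketch might look like the following; the toy data and the direct `fit`/`predict` calls are illustrative assumptions rather than excerpts from the luma documentation.

```python
import numpy as np
from luma.neural.single import PerceptronClassifier

# Toy two-class data: two Gaussian blobs (illustrative only)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Parameters follow the list above; fit/predict mirror the fuller example below
clf = PerceptronClassifier(learning_rate=0.01, max_iter=1000, random_state=42)
clf.fit(X, y)
print((clf.predict(X) == y).mean())  # training accuracy
```

The complete example below tunes the regularization-related hyperparameters of `PerceptronClassifier` with a randomized search on the wine dataset.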
from luma.preprocessing.scaler import StandardScaler
from luma.model_selection.split import TrainTestSplit
from luma.model_selection.search import RandomizedSearchCV
from luma.model_selection.fold import StratifiedKFold
from luma.neural.single import PerceptronClassifier
from luma.visual.evaluation import ConfusionMatrix
from sklearn.datasets import load_wine
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)
# Load the wine dataset and standardize its features
X, y = load_wine(return_X_y=True)
n_classes = len(np.unique(y))
sc = StandardScaler()
X_std = sc.fit_transform(X)
# Stratified 80/20 train-test split
X_train, X_test, y_train, y_test = TrainTestSplit(X_std, y,
                                                  test_size=0.2,
                                                  shuffle=True,
                                                  stratify=True).get
# Hyperparameter search space for the randomized search
param_dist = {"learning_rate": np.logspace(-3, -1, 10),
              "alpha": np.logspace(-2, 2, 10),
              "l1_ratio": np.linspace(0, 1, 10),
              "regularization": ["l1", "l2", "elastic-net"]}
# Randomized hyperparameter search with stratified 5-fold cross-validation
rand = RandomizedSearchCV(estimator=PerceptronClassifier(max_iter=100),
                          param_dist=param_dist,
                          max_iter=50,
                          cv=5,
                          maximize=True,
                          refit=True,
                          shuffle=True,
                          fold_type=StratifiedKFold,
                          verbose=True)
rand.fit(X_train, y_train)
print(rand.best_params, rand.best_score)
# Recombine the splits and retrieve the refitted best model
X_all = np.vstack((X_train, X_test))
y_all = np.hstack((y_train, y_test))
model = rand.best_model
fig = plt.figure(figsize=(10, 5))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)
# Heatmap of the trained weight matrix (one row per class)
im = ax1.imshow(model.weights_, cmap="gray")
ax1.set_yticks(range(n_classes), range(n_classes))
ax1.set_ylabel("Classes")
ax1.set_xlabel("Weights")
ax1.set_title("Trained Weights")
ax1.figure.colorbar(im)
# Confusion matrix of the best model on the full dataset
conf = ConfusionMatrix(y_all, model.predict(X_all))
conf.plot(ax=ax2, show=True)
- Rosenblatt, Frank. "The perceptron: A probabilistic model for information storage and organization in the brain." Psychological Review 65.6 (1958): 386.
- Minsky, Marvin, and Seymour Papert. "Perceptrons: An introduction to computational geometry." The MIT Press, 1969.
- Goodfellow, Ian, et al. "Deep Learning." MIT Press, 2016.