Linear Classifiers in Python


Applying logistic regression and SVM


Scikit-learn refresher

import sklearn.datasets
newsgroups = sklearn.datasets.fetch_20newsgroups_vectorized()
X, y = newsgroups.data, newsgroups.target

X.shape
y.shape
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 1)
knn.fit(X, y)
y_pred = knn.predict(X)

Model evaluation

knn.score(X, y) #not meaningful, have to look at prediction on unseen data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
knn.fit(X_train, y_train)
knn.score(X_test, y_test)

Applying logistic regression and SVM

Using LogisticRegression

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(X_train, y_train)
lr.predict(X_test)
lr.score(X_test, y_test)
lr.predict_proba(X_train[:1])

LinearSVC

import sklearn.datasets
wine = sklearn.datasets.load_wine()
from sklearn.svm import LinearSVC
svm = LinearSVC()
svm.fit(wine.data, wine.target)
svm.score(wine.data, wine.target)

SVC (fits non-linear decision boundaries by default, via the RBF kernel)

import sklearn.datasets
wine = sklearn.datasets.load_wine()
from sklearn.svm import SVC
svm = SVC() #default hyperparameters
svm.fit(wine.data, wine.target)
svm.score(wine.data, wine.target) #score: 1 (can be overfitting)
  • more complex models like nonlinear SVMs carry a higher risk of overfitting

Complexity review

  • underfitting: model is too simple, low training accuracy
  • overfitting: model is too complex, low test accuracy
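
A quick way to check which regime a model is in (my own sketch, not from the course) is to compare training and test accuracy on a held-out split: a large gap suggests overfitting, low accuracy on both suggests underfitting.

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

wine = load_wine()
Xw_train, Xw_test, yw_train, yw_test = train_test_split(
    wine.data, wine.target, random_state=0)

svm = SVC()
svm.fit(Xw_train, yw_train)
print(svm.score(Xw_train, yw_train)) #training accuracy
print(svm.score(Xw_test, yw_test))   #test accuracy; a large gap points to overfitting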

Linear decision boundaries

Decision boundary: tells us what class our classifier will predict for any value of x

  • classifier predicts the blue class in the blue shaded area
    • blue shaded area: feature 2 is small
  • classifier predicts the red class in the red shaded area
    • red shaded area: feature 2 is large
  • decision boundary: dividing line between the two regions
    • line can be in any orientation
    • in this specific case the boundary happens to be horizontal; it is linear because it is a straight line
  • in basic forms, logistic regression & SVMs are linear classifiers
    • they learn linear decision boundaries
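
To make this concrete, here is a small sketch of my own (not the course's figure) that draws the linear decision boundary a logistic regression model learns on two wine features:

import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets
from sklearn.linear_model import LogisticRegression

wine = sklearn.datasets.load_wine()
X2 = wine.data[:, :2]                 #two features so the boundary can be drawn
y2 = (wine.target == 0).astype(int)   #binary problem: class 0 vs. the rest

lr = LogisticRegression()
lr.fit(X2, y2)

#evaluate predictions on a grid and shade the two regions
xx, yy = np.meshgrid(np.linspace(X2[:, 0].min(), X2[:, 0].max(), 200),
                     np.linspace(X2[:, 1].min(), X2[:, 1].max(), 200))
zz = lr.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz, alpha=0.3)   #the straight border between regions is the decision boundary
plt.scatter(X2[:, 0], X2[:, 1], c=y2)
plt.show()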

Vocabulary

  • classification: supervised learning when the y-values are categories
    • in contrast w/ regression (predicting continuous values)
  • decision boundary: the surface separating different predicted classes
  • linear classifier: a classifier that learns linear decision boundaries
    • (ex) logistic regression, linear SVM
  • linearly separable: a data set that can be perfectly explained (separated) by a linear classifier

  • left figure: no single line that separates the red and blue examples
  • right figure: we could divide 2 classes w/ a straight line → linearly separable
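
As a toy illustration of my own (not from the course), a linear classifier reaches 100% training accuracy on a linearly separable data set:

import numpy as np
from sklearn.svm import LinearSVC

#two clusters that a straight line can separate perfectly
X_sep = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]])
y_sep = np.array([0, 0, 0, 1, 1, 1])

clf = LinearSVC()
clf.fit(X_sep, y_sep)
print(clf.score(X_sep, y_sep)) #1.0 -> the data are linearly separable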

Loss Functions


Linear classifiers: the coefficients

Dot product

x = np.arange(3) #array([0, 1, 2])
y = np.arange(3, 6) #array([3, 4, 5])
x*y #[(0*3), (1*4),(2*5)] -> array([0, 4, 10])
np.sum(x*y) #0+4+10 = 14
x@y #same as above = 14

Linear classifier prediction

  • raw model output = coefficients x features + intercept
  • linear classifier prediction: compute raw model output, check the sign
    • if positive, predict one class
    • if negative, predict the other class
  • this is the same for logistic regression & linear SVM
    • .fit() is different but .predict() is the same
    • differences in .fit() relate to loss functions
lr = LogisticRegression()
lr.fit(X, y)
lr.predict(X)[10] #0
lr.predict(X)[20] #1
lr.coef_ @ X[10] + lr.intercept_ #raw model output -> array([-33.78572166]) -> negative value -> 0
lr.coef_ @ X[20] + lr.intercept_ # -> array([0.08050621]) -> positive value -> 1

What is a loss function?

Least squares: the squared loss

  • scikit-learn’s LinearRegression minimizes a loss:

$$\sum_{i=1}^{n} \left(\textrm{true } i\textrm{th target value} - \textrm{predicted } i\textrm{th target value}\right)^2$$

  • minimizes sum of squares of errors made on training set
  • error is defined as the difference b/w the true target value & the predicted target value
  • jiggle the coefficients (parameters) around until the error term (loss function) is as small as possible
    • i.e., the loss is minimized with respect to the coefficients/parameters
  • the loss function is a penalty score that tells us how well or badly the model is doing on the training data
  • think of the “fit” function as running code that minimizes the loss
  • scikit-learn model.score() isn’t necessarily the loss function
    • could be, but not guaranteed
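
As a rough numeric illustration (my own, not from the course), the squared loss for a handful of predictions can be computed directly:

import numpy as np

y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5,  0.0, 2.0])

squared_loss = np.sum((y_true - y_pred) ** 2) #0.25 + 0.25 + 0.0 = 0.5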

Classification errors: the 0-1 loss

  • Squared loss is not appropriate for classification problems
    • b/c y-values are categories, not numbers
  • a natural loss for classification problem: number of errors
  • 0-1 loss:
    • 0 for a correct prediction
    • 1 for incorrect prediction
  • by summing this function over all training examples, we get the number of mistakes we’ve made on the training set
    • since we add 1 to the total for each mistake
  • but the loss is hard to minimize!
    • thus LR & SVMs don’t use it
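
A small sketch of my own showing the 0-1 loss as a count of mistakes:

import numpy as np

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 0])

zero_one_loss = np.sum(y_true != y_pred) #number of mistakes: 2
error_rate = np.mean(y_true != y_pred)   #fraction of mistakes: 0.4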

Minimizing a loss

from scipy.optimize import minimize
minimize(np.square, 0).x #result: 0
  • minimize(function, initial guess).x
    • 1st: function
    • 2nd: initial guess
    • .x : grab the input value that makes the function as small as possible
    • result is 0 for the above code b/c the function is minimized when x = 0
      • the square of a number can only be zero or more
        • smallest possible value is attained when x = 0
minimize(np.square, 2).x #array([-1.88846401e-08])
  • the very small number is normal for numerical optimization:
    • we don’t expect exactly the right answer, but something very close
  • inputs: model coefficients
  • to answer the question: “what values of the model coefficients make my squared error as small as possible?”
    • what linear regression is doing
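
Putting the pieces together, here is a sketch on toy data of my own (not the course's exercise): minimizing the squared loss over the coefficients recovers essentially the same coefficients that LinearRegression computes.

import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

def squared_loss(w):
    #sum of squared errors for coefficient vector w (intercept omitted for brevity)
    return np.sum((y_toy - X_toy @ w) ** 2)

w_hat = minimize(squared_loss, np.zeros(3)).x
print(w_hat)                                                          #close to [1, -2, 0.5]
print(LinearRegression(fit_intercept=False).fit(X_toy, y_toy).coef_)  #nearly identical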

Loss Function Diagrams

The raw model output

  • Since we predict using the sign of the raw model output, the plot (drawn for a training example whose true label is +1) is divided into 2 halves
    • the left half: raw output is negative → predict the -1 class
      • incorrect predictions for this example
    • the right half: raw output is positive → predict the +1 class
      • correct predictions

0-1 loss diagram

  • By definition of 0-1 loss, incorrect predictions get a penalty of 1 & correct ones get no penalty
  • this picture is the loss for a particular training example
    • to get the whole loss, we need to sum up the contribution from all examples

Linear regression loss diagram

  • squared/quadratic function
  • the raw model output is the prediction
  • intuitively, the loss is higher as the prediction is further away from the true target value (1)
  • problem: the left side behaves sensibly (loss increases as the output moves further below 1), but the right side does not
    • on the right side we predict +1, which is correct, yet the loss still grows as the raw output moves past 1
    • perfectly good models are penalized by this loss
  • we need specialized loss functions for classification

Logistic loss diagram

  • used in logistic regression
  • a smoother version of the 0-1 loss
  • as you move to the right (towards the zone of correct predictions), loss goes down

Hinge loss

  • used in SVMs
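
The shapes of these diagrams can be reproduced with a few lines (my own sketch, not the course's figures): plot each loss against the raw model output for a training example whose true label is +1.

import numpy as np
import matplotlib.pyplot as plt

raw = np.linspace(-3, 3, 300)        #raw model output for a true +1 example
zero_one = (raw <= 0).astype(float)  #1 if the sign is wrong, else 0
logistic = np.log(1 + np.exp(-raw))  #logistic loss (smooth)
hinge = np.maximum(0, 1 - raw)       #hinge loss (used by SVMs)

plt.plot(raw, zero_one, label="0-1 loss")
plt.plot(raw, logistic, label="logistic loss")
plt.plot(raw, hinge, label="hinge loss")
plt.xlabel("raw model output")
plt.ylabel("loss")
plt.legend()
plt.show()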

Logistic Regression


Logistic regression and regularization

  • regularization combats overfitting by making the model coefficients smaller

  • The figure shows the learned coefficients of a logistic regression model w/ default regularization
  • In scikit-learn, the hyperparameter “C” is the inverse of the regularization strength
    • larger C → less regularization
    • smaller C → more regularization

  • orange curve: with smaller value of C
    • more regularization for our logistic regression model
  • regularization makes the coefficients smaller
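
A quick sketch of that comparison (assuming X_train and y_train from the earlier split are available; the variable names are my own):

import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

lr_default = LogisticRegression(C=1).fit(X_train, y_train)      #default regularization
lr_more_reg = LogisticRegression(C=0.01).fit(X_train, y_train)  #smaller C -> more regularization

plt.plot(lr_default.coef_.flatten(), label='C=1')
plt.plot(lr_more_reg.coef_.flatten(), label='C=0.01')  #coefficients shrink toward zero
plt.legend()
plt.show()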

How does regularization affect training accuracy?

lr_weak_reg = LogisticRegression(C=100) #weak regularization
lr_strong_reg = LogisticRegression(C=0.01) #strong regularization

#fit both models
lr_weak_reg.fit(X_train, y_train)
lr_strong_reg.fit(X_train, y_train)

#compute training accuracy
lr_weak_reg.score(X_train, y_train) #weak regularization -> higher training accuracy
lr_strong_reg.score(X_train, y_train)
  • model w/ weak regularization gets a higher training accuracy
  • regularization: an extra term added to the original loss function, which penalizes large values of the coefficients

$$\textrm{regularized loss} = \textrm{original loss} + \textrm{large coefficient penalty}$$

  • more regularization → lower training accuracy
  • w/o regularization, we maximize the training accuracy
    • do better on the metric
  • when we add regularization, we modify the loss function to penalize large coefficients, which distracts from the goal of optimizing accuracy
  • more regularization (smaller C)
    → more deviation from goal of maximizing training accuracy
    → lower training accuracy

How does regularization affect test accuracy?

lr_weak_reg.score(X_test, y_test) #0.86
lr_strong_reg.score(X_test, y_test) #0.88
  • more regularization reduces training accuracy but IMPROVES test accuracy
  • not having access to a particular feature is equivalent to setting the corresponding coefficient to zero
  • regularizing (making your coefficient smaller) is like a compromise b/w not using the feature at all (setting the coefficient to zero) & fully using it (the un-regularized coefficient value)
  • using a feature too heavily → overfitting
    • regularization lessens overfitting

L1 vs. L2 regularization

  • Lasso: linear regression w/ L1 regularization
  • Ridge: linear regression w/ L2 regularization
  • for other models like logistic regression we just say L1, L2, etc.
  • both help reduce overfitting
  • L1 performs feature selection
lr_L1 = LogisticRegression(penalty='l1', solver='liblinear') #L1 needs a solver that supports it (e.g. liblinear or saga)
lr_L2 = LogisticRegression() #penalty='l2' by default

lr_L1.fit(X_train, y_train)
lr_L2.fit(X_train, y_train)

plt.plot(lr_L1.coef_.flatten())
plt.plot(lr_L2.coef_.flatten())

  • L1 regularization: sets many of the coefficients to exactly zero
    • i.e., it ignores those features entirely
    • it performed feature selection for us
  • L2 regularization: shrinks the coefficients to be smaller
    • analogous to what happens w/ Lasso & Ridge regression
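
To see the feature-selection effect numerically (assuming lr_L1 and lr_L2 from above have been fit):

import numpy as np
print(np.sum(lr_L1.coef_ == 0)) #many coefficients are exactly zero
print(np.sum(lr_L2.coef_ == 0)) #typically few or none are exactly zero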