Given a set of labeled examples (x, y), learn a mapping function g: X→Y
Model "generalization" is a goal (to perform well on the unseen data)
Training error (E_train)
Testing error (E_test) can serve as a proxy for the generalization error (E_gen)
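As a sketch of the notation (the loss ℓ and the dataset symbols are assumptions, not from the notes): E_train and E_test are empirical averages of a loss over a dataset, while E_gen is the expectation over the true data distribution.

```latex
% Empirical errors of g under a loss \ell (symbols are assumed notation):
E_{\text{train}} = \frac{1}{N_{\text{tr}}} \sum_{(x_i, y_i) \in D_{\text{tr}}} \ell\big(g(x_i), y_i\big),
\qquad
E_{\text{test}} = \frac{1}{N_{\text{te}}} \sum_{(x_j, y_j) \in D_{\text{te}}} \ell\big(g(x_j), y_j\big)
% Generalization error: expectation over the unknown data distribution
E_{\text{gen}} = \mathbb{E}_{(x, y) \sim p(x, y)}\big[\ell(g(x), y)\big]
```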
Curse of dimensionality: as the input/feature dimension increases, the number of samples needed grows exponentially, which is infeasible in practice
To avoid overfitting, we can use:
Cross validation (CV)
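A minimal k-fold CV sketch, assuming scikit-learn; the model and dataset choices here are illustrative, not prescribed by the notes.

```python
# 5-fold cross-validation: train on 4 folds, validate on the held-out
# fold, rotate, and average the scores.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())  # validation score and its spread
```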
Linear models
Univariate problem: the output is determined by a single feature
Multivariate problem: the output is determined by multiple features
Linear regression framework
Iterative optimization by Gradient descent
Gradient: the vector of partial derivatives of a function; it points in the direction of steepest increase (the negative gradient gives steepest decrease)
Which direction? Steepest descent: greedily step in the direction of the negative gradient
cf. the error surface
Works well even when the number of samples is large
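A minimal sketch of batch gradient descent for linear regression under squared loss; the data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)
lr = 0.1                                  # fixed learning rate (assumed)
for _ in range(500):
    grad = 2 / len(y) * X.T @ (X @ w - y)  # gradient of mean squared error
    w -= lr * grad                         # step against the gradient
print(w)  # should approach true_w
```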
Stochastic Gradient Descent (SGD)
To avoid local optima:
Learning rate scheduling
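A sketch of SGD with a simple 1/t-style decaying learning rate; the schedule constants are assumptions, and other schedules (step decay, cosine) are common as well.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(3)
t = 0
for epoch in range(20):
    for i in rng.permutation(len(y)):        # shuffle each epoch
        t += 1
        lr = 0.1 / (1.0 + 1e-3 * t)          # decaying step size
        grad = 2 * (X[i] @ w - y[i]) * X[i]  # gradient on one sample
        w -= lr * grad
print(w)  # noisy steps, but converges near true_w
```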
To avoid overfitting, use "regularization"
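As one concrete form of regularization, an L2 (ridge) sketch: adding λ‖w‖² to the loss adds 2λw to the gradient, shrinking weights toward zero (λ is an assumed hyperparameter).

```python
import numpy as np

def ridge_gradient(X, y, w, lam=0.1):
    # Drop-in replacement for the gradient in the descent loops above:
    # data-fit term plus the derivative of lam * ||w||^2.
    data_grad = 2 / len(y) * X.T @ (X @ w - y)
    return data_grad + 2 * lam * w
```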
Linear classification: hypothesis class H
Linear Classification Framework
Score and Margin
Sigmoid function: maps a score to a probability
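A minimal sketch of score, margin, and the sigmoid mapping for a binary linear classifier with labels y ∈ {−1, +1}; the function names are assumptions.

```python
import numpy as np

def score(w, x):
    return w @ x                  # raw linear score

def margin(w, x, y):              # y in {-1, +1}
    return y * score(w, x)        # positive margin = correct, confident

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))  # squashes a score into (0, 1)
```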
Multiclass classification
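One standard multiclass construction (assumed here, not spelled out in the notes): keep one weight vector per class and turn the score vector into class probabilities with softmax.

```python
import numpy as np

def softmax(scores):
    z = scores - scores.max()     # subtract max to stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

W = np.zeros((3, 4))              # 3 classes, 4 features (illustrative sizes)
x = np.ones(4)
probs = softmax(W @ x)            # uniform here because W is all zeros
print(probs, probs.argmax())
```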
Support Vector Machine (SVM)
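A minimal linear-SVM sketch with scikit-learn; the dataset and the value of C are illustrative.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0)   # C trades margin width vs. violations
clf.fit(X, y)
print(clf.support_vectors_.shape)   # the points that define the margin
```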
Artificial Neural Network (ANN)
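A bare-bones one-hidden-layer forward pass, assuming ReLU hidden units and a sigmoid output; the layer sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # input dim 4 -> hidden 8
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)   # hidden 8 -> 1 output

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)             # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # sigmoid output probability

print(forward(np.ones(4)))
```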
Convolutional Neural Network (CNN): a classification model for high-dimensional data such as images
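The operation that gives CNNs their name, sketched without padding, stride, or channels (all kept minimal for illustration): slide a small filter over the input and take dot products.

```python
import numpy as np

def conv2d(image, kernel):
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product of the filter with one image patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

edge = np.array([[1.0, -1.0]])        # a tiny horizontal edge detector
print(conv2d(np.eye(4), edge).shape)  # (4, 3)
```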
Ensemble learning: combining multiple ML models, which may work by the same or different mechanisms, and using them together regardless of the underlying algorithm type
(+) can improve predictive performance and does not require much parameter tuning
Bagging and Boosting
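A side-by-side sketch of the two with scikit-learn; the hyperparameters are illustrative defaults, not prescriptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: train many trees on bootstrap resamples, average their votes.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)
# Boosting: train weak learners sequentially, reweighting hard examples.
boost = AdaBoostClassifier(n_estimators=50)

for model in (bag, boost):
    print(type(model).__name__, model.fit(X, y).score(X, y))
```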
Performance Evaluation in SL
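A few common metrics via scikit-learn; the labels below are made up for illustration.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))    # fraction of correct predictions
print(f1_score(y_true, y_pred))          # harmonic mean of precision/recall
print(confusion_matrix(y_true, y_pred))  # rows: true class, cols: predicted
```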