Artificial Intelligence

Hyun Lee · March 21, 2023

University


0. Concepts

forward = inference = forward pass = forward propagation = prediction
output value = prediction = inference
MLP(multi layer perceptron) = fully-connected network
error = input gradient = activation gradient
n-layer perceptron = fully-connected network = dense network = feed forward network
input image = input activation = input feature map
output image = output activation = output feature map
filter = kernel = weight
Output channels = Output depth

1. Traditional AI methods

2. Mathematical Backgrounds

The gradient is a vector (or matrix) of partial derivatives that points in the direction of steepest increase of a function, i.e. the direction in which the function changes fastest.
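To make the definition concrete, here is a minimal sketch that estimates a gradient numerically with central differences. The function `f` and the helper name `numerical_gradient` are made up for illustration and are not from the course material.

```python
def numerical_gradient(f, point, eps=1e-6):
    """Central-difference estimate of the gradient of f at `point`."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        up[i] += eps
        down = list(point)
        down[i] -= eps
        # Partial derivative with respect to coordinate i.
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

# Hypothetical function f(x, y) = x^2 + 3y, whose exact gradient is (2x, 3).
def f(p):
    return p[0] ** 2 + 3 * p[1]

g = numerical_gradient(f, [1.0, 2.0])
print(g)  # approximately [2.0, 3.0]
```

Each component says how fast `f` grows when only that coordinate is increased, so the vector as a whole points uphill.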

3. Python Backgrounds

4. Introduction to Deep Learning

5. Neural Network Basics

Gradient Descent

: A widely used optimization algorithm that minimizes a cost function by iteratively moving in the direction of steepest descent, which is given by the negative gradient
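The update rule can be sketched in a few lines. The quadratic cost, the learning rate, and the step count below are assumptions chosen only to keep the example small.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - lr * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Hypothetical cost J(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
def cost_grad(w):
    return 2.0 * (w - 3.0)

w_min = gradient_descent(cost_grad, w0=0.0)
print(w_min)  # close to 3, the minimizer of J
```

With this cost, each step shrinks the distance to the minimum by a factor of 0.8. A learning rate that is too large would overshoot instead of converging.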

Batch

Hyperparameters

6. Multi-Layer Perceptron

Multi-layer perceptron (MLP)

The need for multiple layers (XOR problem)
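A single perceptron draws one linear decision boundary, and XOR is not linearly separable. As an illustration, the sketch below wires up a two-layer network by hand. The weights and thresholds are picked manually for this example (not learned, and not taken from the lecture).

```python
def step(z):
    """Threshold activation of a classic perceptron."""
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    # Hidden layer: two perceptrons with hand-picked (assumed) weights.
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    # Output layer: "OR but not AND" is exactly XOR.
    return step(h_or - h_and - 0.5)

truth_table = [xor_mlp(a, b) for a in (0, 1) for b in (0, 1)]
print(truth_table)  # [0, 1, 1, 0]
```

The hidden layer remaps the inputs into a space where one more linear threshold is enough.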

MLP: Training

: finding the weight and bias matrices that minimize the cost function on the given training data

Gradient Descent: MLP

7. Deep Neural Networks

Weights must not be initialized to 0.
With tanh or ReLU, the weights stay stuck at 0 forever;
with sigmoid, the weights all receive identical updates.
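The zero-initialization note above can be checked on a toy network. The sketch below assumes a 2-2-1 network with a linear output, zero biases, and squared error. The input, the target, and all function names are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def d_tanh(z):
    return 1.0 - math.tanh(z) ** 2

def zero_init_grads(act, d_act, x=(1.0, 2.0), target=1.0):
    """Gradients of 0.5 * (y - target)^2 for a 2-2-1 net whose weights are all 0."""
    w1 = [[0.0, 0.0], [0.0, 0.0]]  # w1[j][i]: input i -> hidden unit j
    w2 = [0.0, 0.0]                # hidden unit j -> output
    # Forward pass (biases are zero and omitted).
    z = [sum(w1[j][i] * x[i] for i in range(2)) for j in range(2)]
    h = [act(zj) for zj in z]
    y = sum(w2[j] * h[j] for j in range(2))
    # Backward pass via the chain rule.
    dy = y - target
    g_w2 = [dy * h[j] for j in range(2)]
    g_w1 = [[dy * w2[j] * d_act(z[j]) * x[i] for i in range(2)] for j in range(2)]
    return g_w1, g_w2

tanh_g1, tanh_g2 = zero_init_grads(math.tanh, d_tanh)
sig_g1, sig_g2 = zero_init_grads(sigmoid, d_sigmoid)
print(all(g == 0 for g in tanh_g2))  # True: tanh(0) = 0 kills every weight gradient
print(sig_g2[0] == sig_g2[1] != 0)   # True: sigmoid(0) = 0.5 gives equal, nonzero gradients
```

With tanh, the hidden activations are 0, so no weight gradient is ever nonzero. With sigmoid, the hidden units start identical and receive identical gradients, so they never become different from one another.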

8. Convolutional Neural Network

Convolutional Neural Network (CNN)

Role of the convolution (kernel)....>

CNN models

AlexNet

VGG

GoogLeNet – ResNet

Gradient vanishing/exploding problem

ResNet

Batch normalization

MobileNet

YOLO

9. CONV Backpropagation

10. Regularization

Underfitting/Overfitting

Validation set

Model architecture selection

Data augmentation

Weight decay

Early stopping

Dropout

Past exam questions

x
x
x
x
o
a
e
a

The Universal Approximation Theorem states that a neural network with a single hidden layer can approximate any continuous function, given enough hidden units.

Use ReLU / use residual blocks / use an auxiliary classifier

Feeding several samples as input at once, rather than a single sample.
1. Performing one weight update over the entire dataset is not feasible
2. Matrix operations are fast
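A minimal sketch of splitting a dataset into batches is shown below. The helper name `iterate_minibatches` and the toy data are assumptions for illustration. In a real training loop each batch would be stacked into a matrix and passed through the network in one go.

```python
def iterate_minibatches(samples, batch_size):
    """Yield consecutive slices of `samples`, each holding up to `batch_size` items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

data = list(range(10))  # stand-in for 10 training samples
batches = list(iterate_minibatches(data, batch_size=4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

One weight update is performed per batch, so this example would give three updates per epoch.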

1. A shallow network needs exponentially more hidden units
2. Deeper layers give better non-linearity / are good at generalization

To classify the extracted features

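1. With tanh or ReLU as the activation function, the weights stay fixed at 0
2. With sigmoid, the weights are updated only in units of columns (or rows), so every entry in a column (or row) receives the same update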

A 1x1 convolution is applied to adjust the channel size (the number of channels)
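As a rough illustration of how a 1x1 convolution changes the channel count, the sketch below mixes channels at each pixel using plain nested lists. The function name `conv1x1`, the tensor layout (channels, height, width), and the numbers are assumptions. A real framework would do the same thing with an optimized tensor operation.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution (no bias).

    feature_map: list of C_in channels, each an H x W list of lists.
    weights: list of C_out filters, each a list of C_in scalars.
    Returns C_out channels of size H x W.
    """
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [
            [
                # Weighted sum across input channels at pixel (i, j).
                sum(w * feature_map[c][i][j] for c, w in enumerate(filt))
                for j in range(width)
            ]
            for i in range(height)
        ]
        for filt in weights
    ]

# 3 input channels of size 2x2, reduced to 1 output channel.
x = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
w = [[1, 1, 1]]  # one filter that sums the three channels
y = conv1x1(x, w)
print(len(y), y[0])  # 1 [[111, 222], [333, 444]]
```

The spatial size is untouched. Only the depth changes, from the number of input channels to the number of filters.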

For each epoch, compute the validation loss (accuracy) and monitor its gap from the training accuracy. Once the gap starts to widen, overfitting is setting in.
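One possible way to automate this check is sketched below. The function name `overfitting_onset`, the `patience` parameter, and the accuracy curves are all made up for illustration. Accuracies are given in whole percent to avoid floating-point noise in the comparison.

```python
def overfitting_onset(train_acc, val_acc, patience=2):
    """Return the first epoch of a run of `patience` consecutive epochs in which
    the train/validation gap widened, or None if no such run exists."""
    gaps = [t - v for t, v in zip(train_acc, val_acc)]
    widening = 0
    for epoch in range(1, len(gaps)):
        if gaps[epoch] > gaps[epoch - 1]:
            widening += 1
            if widening == patience:
                return epoch - patience + 1
        else:
            widening = 0
    return None

# Made-up accuracy curves (percent) for illustration only.
train_acc = [60, 70, 80, 88, 94, 98]
val_acc = [58, 68, 78, 80, 80, 79]
onset = overfitting_onset(train_acc, val_acc)
print(onset)  # 3
```

Requiring several consecutive widening epochs is a design choice that keeps a single noisy epoch from triggering the signal.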

Ways to address overfitting:
1. Model architecture selection: an overly complex model tends to overfit, so choose a model of appropriate capacity
2. Larger dataset size => Data augmentation
3. Weight Decay: Suppress weights to be small values. Add L1 or L2 regularization term in the error function
4. Early Stopping
5. Dropout : Prevent overfitting by reducing co-adaptation of neurons
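Two of the items above are easy to sketch in code. The example assumes plain SGD with an L2 penalty for weight decay and the "inverted" variant of dropout, which rescales the surviving units during training. The function names, learning rate, and decay coefficient are hypothetical.

```python
import random

def sgd_step_with_weight_decay(weights, grads, lr=0.1, decay=0.01):
    """One SGD step on E + (decay / 2) * sum(w^2).
    The L2 term adds decay * w to each gradient, pulling weights toward 0."""
    return [w - lr * (g + decay * w) for w, g in zip(weights, grads)]

def dropout(activations, drop_prob, rng):
    """Inverted dropout for training: zero each unit with probability drop_prob
    and scale the survivors by 1 / (1 - drop_prob)."""
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# With a zero data gradient, weight decay alone shrinks every weight.
shrunk = sgd_step_with_weight_decay([1.0, -2.0], [0.0, 0.0])
print(shrunk)

# Training-time dropout on a hypothetical hidden layer.
hidden = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(hidden, drop_prob=0.5, rng=random.Random(0))
print(dropped)
```

At test time dropout is switched off. With the inverted variant no extra rescaling is needed then, because the scaling was already applied during training.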