Optimization in Neural Networks
Regression with a Perceptron




- Neural network (perceptron): takes some inputs and produces the output we want to predict
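A minimal sketch of the model this describes, assuming a linear (regression) perceptron with weights w and bias b (the notation here is mine, not from the lecture):

```latex
\hat{y} = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = \mathbf{w}^\top \mathbf{x} + b
```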
Regression with a perceptron - Loss function

- To treat positive and negative errors the same way, we square the error
- Multiplying by 1/2 is purely cosmetic: differentiating (y - ŷ)^2 produces a lingering factor of 2, which the 1/2 cancels
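Worked out, assuming the squared-error loss hinted at above (the symbol L is my label for the loss):

```latex
L(y, \hat{y}) = \tfrac{1}{2}\,(y - \hat{y})^2,
\qquad
\frac{\partial L}{\partial \hat{y}}
  = \tfrac{1}{2}\cdot 2\,(y - \hat{y})\cdot(-1)
  = -(y - \hat{y})
```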

Regression with a perceptron - Gradient Descent
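A minimal sketch of one gradient-descent step for this setup (a single linear perceptron with the squared loss above; variable names and the learning rate are illustrative, not from the lecture):

```python
import numpy as np

def gradient_descent_step(w, b, x, y, lr=0.01):
    """One gradient-descent update for y_hat = w.x + b with loss 1/2*(y - y_hat)^2."""
    y_hat = np.dot(w, x) + b      # forward pass: prediction
    error = y_hat - y             # dL/dy_hat = -(y - y_hat) = (y_hat - y)
    grad_w = error * x            # dL/dw = (y_hat - y) * x
    grad_b = error                # dL/db = (y_hat - y)
    return w - lr * grad_w, b - lr * grad_b

# illustrative usage with made-up numbers
w, b = np.array([0.5, -0.3]), 0.0
x, y = np.array([1.0, 2.0]), 1.5
w, b = gradient_descent_step(w, b, x, y)
```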




Classification with Perceptron


- Activation function: squashes the output into the interval between 0 and 1
- Sigmoid function: σ(x) = 1 / (1 + e^{-x})
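A quick illustration (my own numbers) that the sigmoid squashes any real input into (0, 1):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
# roughly [4.5e-05, 0.269, 0.5, 0.731, 0.99995] - all strictly between 0 and 1
```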
Classification with Perceptron - The sigmoid function

Q. Why is the sigmoid useful?
A. Its derivative is really nice
- σ(z) = (1 + e^{-z})^{-1}
- By the chain rule, dσ(z)/dz is
- -1 · (1 + e^{-z})^{-1-1} · d/dz(1 + e^{-z}) = -(1 + e^{-z})^{-2} · (-e^{-z}) = e^{-z} / (1 + e^{-z})^2



- In the end, since 1 - σ(z) = e^{-z} / (1 + e^{-z}), the derivative of the sigmoid is
- dσ(z)/dz = σ(z)(1 - σ(z)) (see the quick numerical check below)
- It looks beautiful
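A quick numerical check of that identity (a sketch with an arbitrary test point, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7                                                  # arbitrary test point
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)    # central-difference derivative
analytic = sigmoid(z) * (1 - sigmoid(z))                 # sigma(z) * (1 - sigma(z))
print(abs(numeric - analytic) < 1e-8)                    # True: the formula matches
```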
Classification with Perceptron - Gradient Descent


- Goal: find the weights and bias that minimize the log loss L(y, ŷ)
- Gradient descent

- Starts with arbitrary (random) values for the weights and biases, then updates them iteratively
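A hedged sketch of one such update for a single sigmoid perceptron with log loss (names and the learning rate are illustrative; it uses the standard result that, for this combination, dL/dw = (ŷ - y)·x):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss_step(w, b, x, y, lr=0.1):
    """One gradient-descent update for y_hat = sigmoid(w.x + b) with log loss."""
    y_hat = sigmoid(np.dot(w, x) + b)
    error = y_hat - y                    # gradient of log loss w.r.t. the pre-activation
    return w - lr * error * x, b - lr * error

# illustrative usage
w, b = np.random.randn(2) * 0.01, 0.0    # start from small random weights
x, y = np.array([0.5, -1.2]), 1.0
w, b = log_loss_step(w, b, x, y)
```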
Classification with a Neural Network
- Neural network: a bunch of perceptrons organized in layers


- How a neural network computes its output (forward pass; see the sketch below)
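A minimal sketch of the forward computation for a tiny network (one hidden layer of sigmoid perceptrons feeding one output perceptron; the sizes and names are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, w2, b2):
    """Forward pass: input -> hidden layer of perceptrons -> output perceptron."""
    h = sigmoid(W1 @ x + b1)      # each hidden unit is a perceptron with a sigmoid
    y_hat = sigmoid(w2 @ h + b2)  # output perceptron combines the hidden activations
    return y_hat, h

# illustrative shapes: 2 inputs, 3 hidden units, 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
w2, b2 = rng.normal(size=3), 0.0
y_hat, _ = forward(np.array([0.2, -0.4]), W1, b1, w2, b2)
```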
Classification with a Neural Network - Minimizing log-loss



- How to update the hidden-layer weights to minimize the log loss (chain-rule sketch below)
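The update follows the chain rule; a sketch with my own notation (hidden weight w_{ij}, hidden activation h_j, output ŷ, learning rate α):

```latex
\frac{\partial L}{\partial w_{ij}}
  = \frac{\partial L}{\partial \hat{y}}
    \cdot \frac{\partial \hat{y}}{\partial h_j}
    \cdot \frac{\partial h_j}{\partial w_{ij}},
\qquad
w_{ij} \leftarrow w_{ij} - \alpha\,\frac{\partial L}{\partial w_{ij}}
```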



Gradient Descent and Backpropagation
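A hedged end-to-end sketch of one training step (forward pass, then backpropagation of the log-loss gradient through a tiny one-hidden-layer network; all names and sizes are illustrative, not the lecture's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, W1, b1, w2, b2, lr=0.1):
    # forward pass
    h = sigmoid(W1 @ x + b1)
    y_hat = sigmoid(w2 @ h + b2)
    # backward pass (gradients of the log loss)
    d_out = y_hat - y                 # gradient w.r.t. the output pre-activation
    d_w2, d_b2 = d_out * h, d_out
    d_h = d_out * w2                  # propagate the error back to the hidden layer
    d_z1 = d_h * h * (1 - h)          # through the sigmoid: h * (1 - h)
    d_W1, d_b1 = np.outer(d_z1, x), d_z1
    # gradient-descent updates
    return (W1 - lr * d_W1, b1 - lr * d_b1,
            w2 - lr * d_w2, b2 - lr * d_b2)
```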




Newton's Method
Newton's Method
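The core update, for reference: Newton's method finds a zero of f by repeatedly jumping to where the tangent line crosses zero; applied to optimization, the same update is run on f' to find a stationary point.

```latex
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} \quad\text{(zero finding)},
\qquad
x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)} \quad\text{(minimizing } f\text{)}
```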


Newton's Method: An example
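An illustrative run (not necessarily the function used in the lecture) of Newton's method for minimization, applied to f(x) = e^x - log(x):

```python
import numpy as np

def f_prime(x):
    return np.exp(x) - 1.0 / x        # derivative of e^x - log(x)

def f_double_prime(x):
    return np.exp(x) + 1.0 / x**2     # second derivative

x = 0.05                              # illustrative starting guess
for _ in range(10):
    x = x - f_prime(x) / f_double_prime(x)   # Newton step applied to f'
print(x)   # converges to about 0.567, where e^x = 1/x
```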




The second derivative
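One line of context on why the second derivative enters: minimizing the local quadratic (second-order Taylor) approximation of f around x_k gives exactly the Newton step.

```latex
f(x) \approx f(x_k) + f'(x_k)(x - x_k) + \tfrac{1}{2} f''(x_k)(x - x_k)^2
\;\Rightarrow\;
x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}
```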









The Hessian
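For reference, the Hessian of a two-variable function f(x, y) is the matrix of second partial derivatives:

```latex
H(x, y) =
\begin{pmatrix}
f_{xx} & f_{xy} \\
f_{yx} & f_{yy}
\end{pmatrix}
=
\begin{pmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x\,\partial y} \\
\frac{\partial^2 f}{\partial y\,\partial x} & \frac{\partial^2 f}{\partial y^2}
\end{pmatrix}
```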




Hessians and concavity
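A sketch of the standard eigenvalue test (the example function is my own): all Hessian eigenvalues positive means a local minimum (concave up), all negative a local maximum (concave down), mixed signs a saddle point.

```python
import numpy as np

# Hessian of f(x, y) = x**2 + 3*y**2 (constant here): f_xx = 2, f_xy = 0, f_yy = 6
H = np.array([[2.0, 0.0],
              [0.0, 6.0]])

eigenvalues = np.linalg.eigvalsh(H)      # eigenvalues of a symmetric matrix
if np.all(eigenvalues > 0):
    print("positive definite -> local minimum (concave up)")
elif np.all(eigenvalues < 0):
    print("negative definite -> local maximum (concave down)")
else:
    print("indefinite -> saddle point")
```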



Newton's Method for two variables






- Only 8 steps to reach the minimum - fast!
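A hedged sketch of the two-variable Newton update, x_{k+1} = x_k - H^{-1} ∇f(x_k), on an illustrative convex function (not necessarily the lecture's example; the 8-step loop count is just for illustration):

```python
import numpy as np

# Illustrative convex function: f(x, y) = x^2 + y^2 + x*y + exp(x)
def grad(p):
    x, y = p
    return np.array([2*x + y + np.exp(x), 2*y + x])

def hessian(p):
    x, y = p
    return np.array([[2 + np.exp(x), 1.0],
                     [1.0,           2.0]])

p = np.array([1.0, 1.0])                              # arbitrary starting point
for k in range(8):
    p = p - np.linalg.solve(hessian(p), grad(p))      # Newton step: p - H^{-1} grad
    print(k, p, np.linalg.norm(grad(p)))              # gradient norm shrinks very fast
```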