Neural Networks mimic the biological neuron model
Generalized formula for perceptrons
Input Layer : receives the raw input data (features)
Output Layer : final estimate of the output
- There could be multiple output neurons (e.g., one per class)
Hidden Layers : the layers between the input and output layers
- Deep Network : 2 or more hidden layers
- Might be difficult to interpret
z = x*w + b : the basic formula for each perceptron
- w : how much weight or strength to give the incoming input
- b : a bias, an offset value making x*w have to reach a certain threshold before having an effect
Activation Function f(z) : sets boundaries on the output values from the neuron
Step Function : Useful for classification
- a "strong" function: small changes in the input aren't reflected in the output
Sigmoid Function : a smoothed, moderate form of the step function, f(z) = 1 / (1 + e^(-z))
- more sensitive to small changes in the input
ReLU (Rectified Linear Unit) : max(0, z)
Activation functions for multiclass classification (Non-exclusive : one sample can belong to multiple classes)
- Sigmoid Function : each output neuron gives an independent per-class probability
Activation functions for multiclass classification (Exclusive : each sample belongs to exactly one class)
- Softmax Function : outputs a probability distribution over the classes; the target class chosen will have the highest probability
Notations
- ŷ : estimation of what the model predicts the label to be
- y : true value
- a : neuron's prediction
Cost Function
- must average over the training samples so it outputs a single value
- Used to keep track of our loss/cost during training to monitor network performance
Quadratic cost function : C = (1/2n) * Σ ||y(x) - aL(x)||^2
- aL is the prediction at layer L (the output layer)
- Why do we square it? It keeps the cost positive and punishes large errors more heavily
Generalization of cost function : C(W, B, Sr, Er)
- W is our neural network's weights, B is our neural network's biases, Sr is the input of a single training sample, and Er is the desired output of that training sample.
Gradient Descent : find the w values that minimize the cost
- Learning Rate : the step size, i.e. how far to move along the gradient on each update
Gradient : the derivative generalized to N-dimensional vectors
- ∇C(w1, w2, ..., wn) : the vector of partial derivatives of the cost with respect to each weight
Cross Entropy Loss Function : for classification problems
- binary classification : -(y*log(p) + (1-y)*log(1-p))
- multiclass classification : -Σc yc*log(pc), summed over the classes c
- Use w and x to set the activation a for the input layer & repeat for each subsequent layer (the forward pass)