
Activation functions transform the output of each layer of a neural network non-linearly
Role of the Activation Function
Considerations when Choosing an Activation Function
Sigmoid: a non-linear function that maps any real-valued input to a value between 0 and 1
Formula: \( \sigma(x) = \frac{1}{1 + e^{-x}} \)
Graph

Derivative of the Sigmoid Function
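The sigmoid derivative can be written in terms of the function itself: \( \sigma'(x) = \sigma(x)\,(1 - \sigma(x)) \). A minimal Python sketch of both follows. The function names `sigmoid` and `sigmoid_derivative` are chosen here for illustration, and the sign-based branching is a common numerical-stability trick rather than part of the definition.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: maps a real x to a value between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive
    # argument, which would raise OverflowError.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_derivative(x: float) -> float:
    """Derivative expressed through the function value: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-5.0, 0.0, 5.0):
    print(x, sigmoid(x), sigmoid_derivative(x))
```

The derivative reaches its maximum of 0.25 at x = 0 and shrinks toward 0 as |x| grows, so gradients passed through many sigmoid layers can become very small.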
Characteristics
Drawbacks
Tanh: similar to the Sigmoid function, but with an output range between -1 and 1
Formula: \( \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \)
Graph

Derivative of the Tanh Function
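The tanh derivative is \( \frac{d}{dx}\tanh(x) = 1 - \tanh^2(x) \). As a small sketch, the standard library's `math.tanh` can be wrapped to compute it; the helper name `tanh_derivative` is an illustrative choice.

```python
import math

def tanh_derivative(x: float) -> float:
    """Derivative of tanh, computed from the function value: 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t

for x in (-3.0, 0.0, 3.0):
    # tanh output stays between -1 and 1 and is centered on 0
    print(x, math.tanh(x), tanh_derivative(x))
```

The derivative equals 1 at x = 0 and approaches 0 for large |x|.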
Characteristics
Drawbacks
ReLU: returns 0 for negative inputs and the input value itself for positive inputs
Formula: \( f(x) = \max(0, x) \)
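A minimal Python sketch of ReLU and its gradient is shown below. The names `relu` and `relu_derivative` are illustrative. The derivative is undefined at exactly x = 0, and returning 0 there is a convention assumed for this sketch.

```python
def relu(x: float) -> float:
    """ReLU: 0 for negative inputs, the input itself for positive inputs."""
    return max(0.0, x)

def relu_derivative(x: float) -> float:
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise.

    The value at x == 0 is a convention (assumed to be 0 here).
    """
    return 1.0 if x > 0 else 0.0

for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), relu_derivative(x))
```

Because the gradient is exactly 0 for every negative input, a neuron whose inputs are always negative receives no weight updates.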
Graph

Characteristics
Drawbacks
Leaky ReLU: developed to address the dead neuron problem of ReLU. It applies a small slope to negative inputs, so that negative inputs still produce a non-zero gradient
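Formula: \( f(x) = x \) if \( x > 0 \), otherwise \( f(x) = \alpha x \), where \( \alpha \) is a small positive constant (for example 0.01)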
Graph

Derivative of the Leaky ReLU Function
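The derivative of Leaky ReLU is 1 for positive inputs and \( \alpha \) for negative inputs. The sketch below assumes a slope of `alpha = 0.01`, a commonly used default that the text does not specify. The function names are illustrative, and the value returned at exactly x = 0 is a convention.

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: identity for positive inputs, alpha * x otherwise."""
    # alpha = 0.01 is an assumed default, not a value given in the text
    return x if x > 0 else alpha * x

def leaky_relu_derivative(x: float, alpha: float = 0.01) -> float:
    """Gradient: 1 for positive inputs, alpha otherwise (never exactly 0)."""
    return 1.0 if x > 0 else alpha

for x in (-2.0, 0.5, 3.0):
    print(x, leaky_relu(x), leaky_relu_derivative(x))
```

Unlike plain ReLU, the gradient for negative inputs is `alpha` rather than 0, so those neurons still receive weight updates.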
Characteristics
Drawbacks
Softmax: converts input values to values between 0 and 1 that sum to 1, so the output can be interpreted as a probability distribution
Formula: \( \mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}} \)
Exponentiates the output value \( z_i \) for each class, then divides by the sum of the exponentials over all classes to obtain the probability for each class
Characteristics
How it Works
Exponentiate the Logits: Apply the exponential function to the logit (raw output value) of each class
Normalization: Divide the exponential value of each class by the sum of the exponential values across all classes. This converts the results into probabilities that sum to 1
Probability Calculation: After normalization, the calculated probability for each class represents the likelihood that the data belongs to that class
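The three steps above can be sketched as a short Python function. The name `softmax` is an illustrative choice. Subtracting the maximum logit before exponentiating is a common numerical-stability addition that the steps do not mention; it leaves the result unchanged because the shift cancels in the ratio.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Turn a list of logits into probabilities that sum to 1."""
    # Stability shift (an addition to the steps above): avoids overflow
    # in math.exp for large logits without changing the result.
    shift = max(logits)
    # Step 1: exponentiate each (shifted) logit
    exps = [math.exp(z - shift) for z in logits]
    # Step 2: normalize by the sum of all exponentials
    total = sum(exps)
    # Step 3: each ratio is the probability of the corresponding class
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
```

Larger logits always receive larger probabilities, so the ranking of the classes is preserved.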
Softmax Function Example
Convert the logits to exponential values
Sum of exponential values for all classes:
Probability calculation for each class:
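As an illustration of these three calculations, the sketch below walks through them for three classes. The logits `[2.0, 1.0, 0.1]` are hypothetical values chosen for this example, not figures from the original text.

```python
import math

# Hypothetical logits for three classes (assumed for illustration only)
logits = [2.0, 1.0, 0.1]

# Convert the logits to exponential values: approx. 7.389, 2.718, 1.105
exp_values = [math.exp(z) for z in logits]

# Sum of exponential values for all classes: approx. 11.213
total = sum(exp_values)

# Probability for each class: approx. 0.659, 0.242, 0.099
probabilities = [v / total for v in exp_values]

for z, e, p in zip(logits, exp_values, probabilities):
    print(f"logit={z:.1f}  exp={e:.3f}  probability={p:.3f}")
print(f"sum of probabilities = {sum(probabilities):.3f}")
```

The class with the largest logit receives the largest probability, and the three probabilities sum to 1.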
Pros and Cons