The hyperbolic tangent function, commonly referred to as tanh, is a widely used activation function in neural networks. Its mathematical form is similar to the sigmoid function but outputs values ranging from -1 to 1, making it zero-centered. This characteristic often leads to better performance in neural networks, especially in the hidden layers, by mitigating the vanishing gradient problem to some extent and helping in faster convergence.
The tanh function is defined mathematically as:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
This formula shows how tanh uses the exponential function to map any real-valued number into the range $(-1, 1)$. Its zero-centered nature is a significant advantage over functions like the sigmoid because it distributes activations around 0, reducing the bias-shift effect in subsequent layers.
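As a quick numerical illustration (a minimal sketch using NumPy's `np.tanh`), the snippet below evaluates the function at a few points to confirm the bounded, zero-centered, odd-symmetric behavior described above:

```python
import numpy as np

# Evaluate tanh at a few sample points to inspect its output range.
x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
y = np.tanh(x)

print(y)                                      # all values lie strictly within (-1, 1)
print(np.tanh(0.0))                           # 0.0 -- outputs are centered around zero
print(np.allclose(np.tanh(-x), -np.tanh(x)))  # True: tanh is an odd function
```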
Activation functions introduce non-linear properties to the network, enabling it to learn complex data patterns and perform tasks beyond mere linear classification. The choice of activation function significantly impacts the network's learning dynamics and performance.
The tanh function's effectiveness partly lies in its derivative, which is used during backpropagation for computing gradients:

$$\frac{d}{dx}\tanh(x) = 1 - \tanh^2(x)$$
This derivative indicates how changes in the input affect the output, playing a crucial role in updating weights through gradient descent. Notably, the derivative of $\tanh(x)$ is largest for values of $x$ close to 0, facilitating stronger gradients and potentially faster learning, but it still suffers from the vanishing gradient problem for large values of $|x|$, where the function saturates.
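A short sketch (assuming NumPy, with a hypothetical helper `tanh_grad`) verifies the analytic derivative against finite differences and shows how the gradient shrinks away from zero:

```python
import numpy as np

def tanh_grad(x):
    """Analytic derivative of tanh: d/dx tanh(x) = 1 - tanh(x)**2."""
    return 1.0 - np.tanh(x) ** 2

# Check the analytic derivative against a central finite difference.
x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
eps = 1e-6
numeric = (np.tanh(x + eps) - np.tanh(x - eps)) / (2 * eps)
print(np.allclose(tanh_grad(x), numeric, atol=1e-6))  # True

# Gradients are largest near 0 and shrink toward 0 as |x| grows (saturation).
print(tanh_grad(np.array([0.0, 2.0, 5.0])))  # approx. [1.0, 0.0707, 0.00018]
```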
Implementing tanh as an activation function in neural networks involves applying it element-wise to a layer's pre-activations during the forward pass and multiplying incoming gradients by its derivative, $1 - \tanh^2(x)$, during the backward pass.
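A minimal sketch of such a layer, assuming a plain NumPy setup and a hypothetical `TanhLayer` class (not tied to any particular framework), might look like this:

```python
import numpy as np

class TanhLayer:
    """Illustrative tanh activation layer with forward and backward passes."""

    def forward(self, x):
        # Apply tanh element-wise and cache the output for the backward pass.
        self.out = np.tanh(x)
        return self.out

    def backward(self, grad_output):
        # Chain rule: dL/dx = dL/dy * (1 - tanh(x)**2), reusing the cached output.
        return grad_output * (1.0 - self.out ** 2)

# Usage sketch
layer = TanhLayer()
x = np.random.randn(4, 3)             # a small batch of pre-activations
y = layer.forward(x)                  # activations in (-1, 1)
dx = layer.backward(np.ones_like(y))  # gradient with respect to the inputs
```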
The tanh function is particularly favored in the hidden layers of shallower networks and in recurrent architectures such as vanilla RNNs, LSTMs, and GRUs, where its bounded, zero-centered output helps keep hidden-state dynamics stable.
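For example, the hidden-state update of a vanilla RNN commonly passes through tanh to keep the state bounded; the sketch below uses hypothetical weight names (`W_xh`, `W_hh`, `b_h`) purely for illustration:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: tanh keeps the new hidden state bounded in (-1, 1)."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Usage sketch with small random parameters
rng = np.random.default_rng(0)
x_t = rng.standard_normal((1, 8))          # input at time step t
h_prev = np.zeros((1, 16))                 # previous hidden state
W_xh = 0.1 * rng.standard_normal((8, 16))
W_hh = 0.1 * rng.standard_normal((16, 16))
b_h = np.zeros(16)
h_t = rnn_step(x_t, h_prev, W_xh, W_hh, b_h)
print(h_t.shape, float(np.abs(h_t).max()) < 1.0)  # (1, 16) True
```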
While tanh is a foundational activation function, the evolution of deep learning has seen the rise of other functions like ReLU and its variants, which, despite their simplicity, often outperform tanh in deep networks due to their ability to alleviate the vanishing gradient problem more effectively.
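To make that contrast concrete, the short sketch below compares the local gradients of tanh and ReLU at increasingly large positive inputs: tanh saturates, while ReLU's gradient stays at 1.

```python
import numpy as np

x = np.array([1.0, 5.0, 10.0])

tanh_grad = 1.0 - np.tanh(x) ** 2    # shrinks rapidly toward 0 (saturation)
relu_grad = (x > 0).astype(float)    # remains 1 for every positive input

print(tanh_grad)  # approx. [4.2e-01, 1.8e-04, 8.2e-09]
print(relu_grad)  # [1. 1. 1.]
```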