- The LSTM is well suited to time-series sequences.
- RNNs have difficulty handling long-term dependencies, because gradients either vanish or explode during training.
- An LSTM network comprises a forget gate, input gate, and output gate, and can appropriately handle long-term dependencies.
- Given the previous cell state, previous hidden state, and current input feature, an LSTM network determines each gate value as follows:
- $f_t = \sigma(W_f \cdot [C_{t-1}, h_{t-1}, x_t] + b_f)$
  $i_t = \sigma(W_i \cdot [C_{t-1}, h_{t-1}, x_t] + b_i)$
  $o_t = \sigma(W_o \cdot [C_t, h_{t-1}, x_t] + b_o)$
- $f_t$, $i_t$, $o_t$: the forget, input, and output gate values, respectively; $W$ and $b$ denote the corresponding weight matrices and bias vectors, $C_{t-1}$ the previous cell state, $h_{t-1}$ the previous hidden state, and $x_t$ the current input.
- $\sigma$: the activation function
- In an LSTM, the logistic sigmoid function is used as the gate activation function.
- Because these gates appropriately control long-term dependencies, an LSTM network mitigates the vanishing and exploding gradient problems.
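The gate equations above can be sketched as a single forward step in NumPy. This is a minimal illustration, not a production implementation: the weight shapes, the candidate-cell computation, and the state updates ($C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$, $h_t = o_t \odot \tanh(C_t)$) are standard LSTM conventions assumed here, not taken from the text. Note the peephole-style inputs: the forget and input gates see $C_{t-1}$, while the output gate sees the freshly computed $C_t$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(c_prev, h_prev, x, params):
    """One LSTM step following the gate equations in the text.

    params holds weight matrices W* and bias vectors b* (hypothetical
    names). Gates take [C, h, x] concatenated; the candidate cell uses
    [h, x], which is an assumption beyond the text.
    """
    z_prev = np.concatenate([c_prev, h_prev, x])

    f = sigmoid(params["Wf"] @ z_prev + params["bf"])   # forget gate f_t
    i = sigmoid(params["Wi"] @ z_prev + params["bi"])   # input gate i_t

    # Candidate cell state and cell update (standard LSTM convention).
    c_tilde = np.tanh(params["Wc"] @ np.concatenate([h_prev, x]) + params["bc"])
    c = f * c_prev + i * c_tilde                        # new cell state C_t

    # Output gate peeks at the *current* cell state C_t, as in the text.
    z_curr = np.concatenate([c, h_prev, x])
    o = sigmoid(params["Wo"] @ z_curr + params["bo"])   # output gate o_t

    h = o * np.tanh(c)                                  # new hidden state h_t
    return f, i, o, c, h
```

Because every gate passes through the logistic sigmoid, each gate value lies strictly in (0, 1), which is what lets the forget gate scale, rather than repeatedly multiply away, the cell-state gradient.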