- A model proposed to solve the long-term dependency and vanishing/exploding gradient problem, the biggest weakness of the RNN (Recurrent Neural Network).
    - Because the same weight is shared across every time step, the gradient becomes uncontrollably small or large once that weight goes even slightly wrong (a small numeric sketch follows this list).
    - So we need an independent cell that is responsible for regulating how much of the signal is carried forward.
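Below is a minimal NumPy sketch (my own illustration, not from the original notes; the dimensions and scales are arbitrary) of why reusing one weight matrix at every step makes the backpropagated gradient vanish or explode:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50                                             # number of unrolled time steps
base = rng.standard_normal((4, 4))
base /= np.max(np.abs(np.linalg.eigvals(base)))    # rescale so the spectral radius is exactly 1

for scale in (0.9, 1.1):                           # slightly "shrinking" vs. slightly "growing" weight
    W = scale * base                               # the single weight matrix reused at every step
    grad = np.eye(4)
    for _ in range(T):                             # backprop multiplies by (roughly) W.T once per step
        grad = grad @ W.T
    print(f"scale {scale}: gradient norm after {T} steps = {np.linalg.norm(grad):.2e}")
```

With the spectral radius just below 1 the norm collapses toward zero, and just above 1 it blows up, which is exactly the vanishing/exploding behaviour described above.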
- The reason the RNN cannot hold information from the past for long is that the hidden vector quickly dissipates the existing information through the non-linearity and the repeated matrix multiplications.
    - To improve this, we add a cell state that carries information forward almost untouched, so it is not constantly lost (see the contrast sketched below).
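A tiny numeric contrast (again my own illustration with made-up values; the real cell-state update also mixes in new input, which is omitted here to isolate the effect) between the squashed hidden-state path and an additive, gated cell-state path:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4)) * 0.5   # recurrent weight for the plain hidden-state path
h = np.array([3.0, -2.0, 1.0, 0.5])     # some initial information to be remembered
c = h.copy()
f = 0.95                                # forget gate held close to 1 ("keep almost everything")

for _ in range(20):
    h = np.tanh(W @ h)                  # non-linearity + matrix product: original values get mixed and squashed
    c = f * c                           # gated additive carry (new-input term omitted): only scaled by f**20 ~ 0.36

print("hidden-state path after 20 steps:", np.round(h, 3))
print("cell-state path after 20 steps:  ", np.round(c, 3))
```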
- In addition, rather than simply passing the current information through, gates regulate how information is kept or leaked (a toy sketch of this gating follows the list):
    - how much of the information accumulated up to the previous step is forgotten (Forget)
    - how much of the current information is taken in (Input)
    - how much of the resulting value is actually exposed (Output)
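As a toy illustration of what a "gate" is (my own example values), each gate is just a sigmoid vector in [0, 1] that elementwise scales another vector:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

prev_info = np.array([2.0, -1.0, 0.5])     # information carried from the previous step
gate_logits = np.array([4.0, -4.0, 0.0])   # large -> keep, very negative -> forget, zero -> keep half
gate = sigmoid(gate_logits)                # approximately [0.982, 0.018, 0.5]

print(gate * prev_info)                    # first entry mostly kept, second almost erased, third halved
```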
- Specifically, let the input at time $t$ be $x_t$, and let the cell state (long-term) and the hidden state (short-term) at time $t$ be $c_t$ and $h_t$.
- In addition to these three quantities, there are four more terms inside the gates that serve as the forget, input, output, and core (candidate) values. Call them $f_t$, $i_t$, $o_t$, $g_t$, respectively.
- Then the update formulas are as follows:
$$
\begin{aligned}
f_t &= \sigma\left(W_{xh}^{f} x_t + W_{hh}^{f} h_{t-1} + b_h^{f}\right) \\
i_t &= \sigma\left(W_{xh}^{i} x_t + W_{hh}^{i} h_{t-1} + b_h^{i}\right) \\
o_t &= \sigma\left(W_{xh}^{o} x_t + W_{hh}^{o} h_{t-1} + b_h^{o}\right) \\
g_t &= \tanh\left(W_{xh}^{g} x_t + W_{hh}^{g} h_{t-1} + b_h^{g}\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$
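A minimal NumPy sketch of a single step following these equations (the function name `lstm_step`, the parameter dictionary, and the toy sizes are my own illustrative choices, not from the original notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of the update formulas above (p holds the per-gate weights)."""
    f = sigmoid(p["Wxh_f"] @ x_t + p["Whh_f"] @ h_prev + p["bh_f"])   # forget gate
    i = sigmoid(p["Wxh_i"] @ x_t + p["Whh_i"] @ h_prev + p["bh_i"])   # input gate
    o = sigmoid(p["Wxh_o"] @ x_t + p["Whh_o"] @ h_prev + p["bh_o"])   # output gate
    g = np.tanh(p["Wxh_g"] @ x_t + p["Whh_g"] @ h_prev + p["bh_g"])   # candidate ("core") value
    c_t = f * c_prev + i * g            # additive, activation-free cell-state update
    h_t = o * np.tanh(c_t)              # hidden state passed on to the next step
    return h_t, c_t

# toy sizes: input dimension 3, hidden/cell dimension 4
rng = np.random.default_rng(0)
D, H = 3, 4
p = {}
for gate in ("f", "i", "o", "g"):
    p[f"Wxh_{gate}"] = rng.standard_normal((H, D)) * 0.1
    p[f"Whh_{gate}"] = rng.standard_normal((H, H)) * 0.1
    p[f"bh_{gate}"] = np.zeros(H)

h, c = np.zeros(H), np.zeros(H)
for _ in range(5):                      # run a short random sequence
    h, c = lstm_step(rng.standard_normal(D), h, c, p)
print("h_5 =", np.round(h, 3))
print("c_5 =", np.round(c, 3))
```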
- Note that the update of $c_t$ does not pass through an activation function, so linearity is preserved along the cell-state path.
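As a short added derivation (not in the original notes, but it follows directly from the update rule above): differentiating the cell-state update with respect to the previous cell state, and treating the gate values as constants along this direct path, gives

$$
\frac{\partial c_t}{\partial c_{t-1}} = \mathrm{diag}(f_t),
\qquad
\frac{\partial c_t}{\partial c_{t-k}} = \prod_{s=t-k+1}^{t} \mathrm{diag}(f_s),
$$

so the gradient along the cell-state path is a product of forget-gate values that the network itself controls, rather than repeated multiplications by one fixed shared weight matrix; keeping $f_s$ close to 1 keeps this gradient from vanishing.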