PARALLELIZING LINEAR RECURRENT NEURAL NETS OVER SEQUENCE LENGTH

About_work · July 25, 2023

Abstract

  • We show that the training of RNNs with only linear sequential dependencies can be parallelized over the sequence length using the parallel scan algorithm (see the sketch after this list),
    • leading to rapid training on long sequences even with small minibatch size.
  • We develop a parallel linear recurrence CUDA kernel and show that
    • it can be applied to immediately speed up training and inference of several state-of-the-art RNN architectures by up to 9x.
  • We abstract recent work on linear RNNs into a new framework of linear surrogate RNNs
  • and develop a linear surrogate model for the long short-term memory unit, the GILR-LSTM, that utilizes parallel linear recurrence.
  • We extend sequence learning to new extremely long sequence regimes that were previously out of reach
    • by successfully training a GILR-LSTM on a synthetic sequence classification task with a one million timestep dependency.
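
The key observation behind the abstract is that a linear recurrence h_t = a_t ⊙ h_{t-1} + b_t composes associatively: applying (a_1, b_1) and then (a_2, b_2) is the same as applying (a_2 ⊙ a_1, a_2 ⊙ b_1 + b_2). That associativity is exactly what a parallel scan needs, so all T steps can be evaluated in O(log T) parallel depth instead of T sequential steps. The sketch below is not the paper's CUDA kernel; it is a minimal illustration of the same idea using jax.lax.associative_scan, and the function name, shapes, and random inputs are my own choices for the example.

```python
import jax
import jax.numpy as jnp


def parallel_linear_recurrence(a, b, h0):
    """Compute h_t = a_t * h_{t-1} + b_t for t = 1..T with a parallel scan.

    a, b: arrays of shape (T, d); h0: initial state of shape (d,).
    Returns an array of shape (T, d) whose row t holds the state after step t.
    """
    # Fold the initial state into the first element so the scan starts from h0.
    b = b.at[0].add(a[0] * h0)

    def combine(left, right):
        # Composition of the affine maps h -> a*h + b; this operation is
        # associative, which is what makes the parallel scan applicable.
        a_l, b_l = left
        a_r, b_r = right
        return a_r * a_l, a_r * b_l + b_r

    # associative_scan evaluates all prefix compositions in O(log T) depth.
    _, h = jax.lax.associative_scan(combine, (a, b))
    return h


# Example usage with random decay coefficients and inputs.
key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
T, d = 8, 4
a = jax.random.uniform(key_a, (T, d))   # elementwise "forget" coefficients
b = jax.random.normal(key_b, (T, d))    # elementwise inputs
h = parallel_linear_recurrence(a, b, jnp.zeros(d))
```

The paper's contribution goes further: a CUDA kernel that runs this scan (and its backward pass) directly on the GPU, which the GILR-LSTM and the other linear surrogate RNNs build on. The sketch only shows why the linear form of the sequential dependency permits the parallelization; a recurrence that depends nonlinearly on h_{t-1}, like the standard LSTM cell, cannot be evaluated with this scan directly, which is why the paper introduces linear surrogates such as the GILR-LSTM.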
