
We argue that a missing principle is making attention algorithms IO-aware: accounting for reads and writes between levels of GPU memory. We propose FlashAttention, an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high-bandwidth memory (HBM) and on-chip SRAM.
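A minimal NumPy sketch of the tiling idea behind an IO-aware exact attention: keys and values are processed block by block with an online softmax (running max and normalizer), so the full N x N score matrix is never materialized. This is an illustrative assumption-laden sketch, not the actual FlashAttention CUDA kernel, which performs this blocking in on-chip SRAM; the function names and block size here are hypothetical.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Standard attention: materializes the full N x N score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # IO-aware sketch: iterate over K/V blocks, keeping a running row max
    # and softmax denominator, so only a block of scores exists at a time.
    N, d = Q.shape
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running row-wise max of scores
    l = np.zeros(N)           # running softmax denominator
    for j in range(0, N, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T / np.sqrt(d)
        m_new = np.maximum(m, S.max(axis=-1))
        P = np.exp(S - m_new[:, None])
        scale = np.exp(m - m_new)         # rescale previous partial sums
        l = l * scale + P.sum(axis=-1)
        O = O * scale[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
print(np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V)))  # True
```

The two functions agree to floating-point tolerance, which is the sense in which the tiled version is "exact" attention rather than an approximation.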

We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.