Introduction

In natural language processing (NLP), self-attention-based architectures, in particular Transformers, have become the model of choice. Their computational efficiency and scalability have made it possible to train models of unprecedented size.
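
To make the mechanism behind these architectures concrete, the sketch below implements scaled dot-product self-attention in plain NumPy. It is a minimal illustration only; the dimensions, random weights, and function name `self_attention` are illustrative assumptions, not values or APIs from any particular model.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence (a minimal sketch).

    x:             (seq_len, d_model) input token embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q = x @ w_q                      # queries
    k = x @ w_k                      # keys
    v = x @ w_v                      # values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # pairwise similarities, scaled by sqrt(d_k)
    # softmax over keys for each query position
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v               # attention-weighted mixture of values

# Illustrative usage with assumed (hypothetical) sizes
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): one context-mixed vector per input position
```

Because every output position attends to every input position in a single matrix product, the computation is dominated by dense matrix multiplications, which is precisely what makes these models efficient to scale on modern accelerators.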