MoE

1. Mixtral of Experts

2. Mixture of A Million Experts

3. MoRAL: MoE Augmented LoRA for LLMs’ Lifelong Learning

4. Mixture-of-Experts with Expert Choice Routing

5. Yuan 2.0-M32: Mixture of Experts with Attention Router

6. MoEAtt: A Deep Mixture of Experts Model using Attention-based Routing Gate

7. OLMoE: Open Mixture-of-Experts Language Models

8. MONET: Mixture of Monosemantic Experts for Transformers

9. Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

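The posts above all revolve around the same sparse MoE building block: a router that picks the top-k expert FFNs for each token and mixes their outputs by the renormalized router weights. The sketch below is a minimal PyTorch illustration of that pattern (Mixtral-style top-2 routing over 8 experts); the class name, dimensions, and layer choices are illustrative assumptions, not code from any of the papers listed.

```python
# Minimal sketch of a top-k gated MoE layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                     # x: (batch, seq, d_model)
        logits = self.router(x)               # (batch, seq, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # renormalize over the chosen k experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e    # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = TopKMoE()
    tokens = torch.randn(2, 5, 64)
    print(layer(tokens).shape)                # torch.Size([2, 5, 64])
```

The posts differ mainly in how this routing decision is made (token choice vs. expert choice, attention-based routers, monosemantic experts) and in how the experts are obtained (trained jointly, merged from separate LLMs, or combined with LoRA).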