MoE

1. Mixtral of Experts

2. Mixture of A Million Experts

3. MoRAL: MoE Augmented LoRA for LLMs’ Lifelong Learning

4. Mixture-of-Experts with Expert Choice Routing

5. Yuan 2.0-M32: Mixture of Experts with Attention Router

6. MoEAtt: A Deep Mixture of Experts Model using Attention-based Routing Gate

7. OLMoE: Open Mixture-of-Experts Language Models

8. MONET: Mixture of Monosemantic Experts for Transformers

9. Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

10. LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-Training

11. DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism

12. FlexOlmo

13. DEMIX Layers: Disentangling Domains for Modular Language Modeling

14. SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling
