Deep Learning

1. [NeurIPS'17] Attention Is All You Need

2. [ICML'23] Fast Inference from Transformers via Speculative Decoding

3. Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Thinking

4. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

5. Qwen3 Technical Report

6. [ICLR'25] MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding

7. [ICML'24] Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

8. [ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

9. [NeurIPS'25] MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE

10. Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE

11. POSS: Position Specialist Generates Better Draft for Speculative Decoding

12. [NeurIPS'25] GRIFFIN: Effective Token Alignment for Faster Speculative Decoding

13. [NeurIPS'25] Scaling Speculative Decoding with Lookahead Reasoning
