Paper review

1. [Paper review] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

2. [Paper review] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

3. [Paper review] ZeroQ: A Novel Zero Shot Quantization Framework

4. [Paper review] GenQ: Quantization in Low Data Regimes with Generative Synthetic Data

5. [Paper review] Learned Token Pruning for Transformers

6. [Paper review] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

7. [Paper review] KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference

8. [Paper review] Quantization in Layer’s Input is Matter

9. [Paper review] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

10. [Paper review] LoRA: Low-Rank Adaptation of Large Language Models

11. [Paper review] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

12. [Paper review] STAIR: Improving Safety Alignment with Introspective Reasoning

13. [Paper review] HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

14. [Paper review] Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory
