Latest Paper Tracking

ODD · August 29, 2024

Papers


https://huggingface.co/papers
https://arxiv.org/list/cs.CV/recent
https://arxiv.org/list/cs.AI/recent

Key topics: Quantization, Pruning, Object detection, Transformer, Mamba

2024.08.30-2024.09.04

SA-MLP: Enhancing Point Cloud Classification with Efficient Addition and Shift Operations in MLP Architectures

  • follow-up study of ShiftAddNet
  • ShiftAddNet doubles the number of layers and has limited representational capacity because its shift weights are frozen
  • SA-MLP keeps the original number of layers and does not freeze the shift weights (see the sketch after this list)
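
To make the operation swap concrete, here is a minimal PyTorch sketch of a shift layer and an adder layer; the class names `ShiftLinear`/`AddLinear` and the straight-through rounding are my own assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class ShiftLinear(nn.Module):
    """Linear layer whose weights are rounded to signed powers of two, so each
    multiply can be realized as a bit shift on integer hardware (sketch only)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features).uniform_(-0.5, 0.5))

    def forward(self, x):
        w = self.weight
        p = torch.round(torch.log2(w.abs().clamp(min=1e-8)))   # shift amounts
        w_q = torch.sign(w) * torch.pow(2.0, p)                 # signed power-of-two weights
        w_ste = w + (w_q - w).detach()                          # straight-through estimator keeps w trainable
        return x @ w_ste.t()

class AddLinear(nn.Module):
    """AdderNet-style layer: the output is the negative L1 distance between the
    input and each weight template, i.e. only additions/subtractions are used."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))

    def forward(self, x):
        # output[b, o] = -sum_i |x[b, i] - weight[o, i]|
        return -(x.unsqueeze(1) - self.weight.unsqueeze(0)).abs().sum(dim=-1)
```

The straight-through rounding is only there so the power-of-two weights remain trainable without freezing, which is the point the paper makes against ShiftAddNet.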

One-Index Vector Quantization Based Adversarial Attack on Image Classification

  • a one-index attack in the VQ domain that generates adversarial images with a differential evolution algorithm
  • modifies a single index in the compressed data stream so that the decompressed image is misclassified (toy sketch after this list)
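
A toy sketch of the single-index idea, assuming hypothetical `decode`/`classify` callables and using random search in place of the paper's differential evolution:

```python
import numpy as np

def one_index_attack(indices, codebook, decode, classify, true_label, iters=200, rng=None):
    """Toy sketch of a one-index attack on a VQ-compressed image.

    indices  : 1-D array of codebook indices for the compressed image
    codebook : (K, d) array of codewords
    decode   : maps an index array back to an image (hypothetical callable)
    classify : returns the predicted label for an image (hypothetical callable)
    A real attack would search (position, new_index) with differential evolution;
    random search is used here only for brevity.
    """
    rng = np.random.default_rng() if rng is None else rng
    K = codebook.shape[0]
    for _ in range(iters):
        cand = indices.copy()
        pos = rng.integers(len(cand))          # which single index to modify
        cand[pos] = rng.integers(K)            # replacement codeword index
        if classify(decode(cand)) != true_label:
            return cand                        # adversarial index stream found
    return None
```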

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

  • VQ decomposes each model weight into a codebook and assignments
  • previous VQ methods calibrate only the codebook without calibrating the assignments => different weight sub-vectors get stuck with the same incorrect assignment => inconsistent gradients during calibration
  • VQ4DiT keeps a candidate assignment set per sub-vector and reconstructs the sub-vector as a weighted average over the candidate codewords
  • using a zero-data, block-wise calibration method, the optimal assignment is efficiently selected from the set (sketch after this list)
    (In prior PTQ work, only the quantization error is reduced while the cluster assignments are kept fixed => the assignment distribution also needs to be optimized!)
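
A rough sketch of the two pieces involved: a plain k-means VQ decomposition of a weight matrix, and a weighted-average reconstruction over candidate assignments. The names and the k-means step are my simplifications, not the VQ4DiT algorithm itself.

```python
import torch

def vq_decompose(weight, subvec_len=4, num_codewords=256, iters=10):
    """Sketch: split a weight matrix into sub-vectors (numel must be divisible
    by subvec_len) and fit a codebook + hard assignments with plain k-means."""
    subvecs = weight.reshape(-1, subvec_len)                   # (N, d)
    idx = torch.randperm(subvecs.shape[0])[:num_codewords]
    codebook = subvecs[idx].clone()                            # (K, d)
    for _ in range(iters):
        dist = torch.cdist(subvecs, codebook)                  # (N, K)
        assign = dist.argmin(dim=1)                            # hard assignments
        for k in range(num_codewords):
            mask = assign == k
            if mask.any():
                codebook[k] = subvecs[mask].mean(dim=0)
    return codebook, assign

def reconstruct_soft(codebook, candidates, weights):
    """VQ4DiT-style idea in sketch form: each sub-vector keeps a small candidate
    assignment set and is reconstructed as a weighted average of the corresponding
    codewords. `candidates` is (N, C) indices, `weights` is (N, C) scores that a
    block-wise calibration would tune before picking the final assignment."""
    w = torch.softmax(weights, dim=1)                           # (N, C) ratios
    return (codebook[candidates] * w.unsqueeze(-1)).sum(dim=1)  # (N, d)
```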

Dreaming Is All You Need

  • SleepNet seamlessly integrates supervised learning with unsupervised "sleep" stages using pre-trained encoder models
  • DreamNet employs full encoder-decoder frameworks to reconstruct the hidden states, mimicking the human "dreaming" process (rough sketch after this list)
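
A heavily simplified sketch of the "dream" reconstruction idea, assuming a hidden-state tensor taken from a frozen pre-trained encoder; the module name and shapes are my own, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DreamBlock(nn.Module):
    """Sketch only: compress the hidden state of a pre-trained encoder and train
    a small encoder-decoder to reconstruct it as an unsupervised objective."""
    def __init__(self, hidden_dim=768, latent_dim=128):
        super().__init__()
        self.compress = nn.Linear(hidden_dim, latent_dim)   # "fall asleep"
        self.expand = nn.Linear(latent_dim, hidden_dim)      # "dream" the state back

    def forward(self, hidden):
        recon = self.expand(torch.relu(self.compress(hidden)))
        recon_loss = nn.functional.mse_loss(recon, hidden)   # reconstruction objective
        return recon, recon_loss
```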

2024.08.22-2024.08.29 (Hugging Face)

Quantization
MobileQuant: Mobile-friendly Quantization for On-device Language Models
(Samsung AI Center, Cambridge)

  • PTQ
  • on-device deployment of LLMs with integer-only quantization
  • jointly optimizes the weight transformation and the activation range parameters in an end-to-end manner (sketch after this list)
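
A sketch of what jointly learning a weight-equivalent transform and an activation clipping range could look like; `QuantLinear`, `s`, and `alpha` are assumed names for illustration, not the MobileQuant implementation.

```python
import torch
import torch.nn as nn

def fake_quant(x, scale, num_bits=8):
    """Uniform integer fake-quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    x_q = q * scale
    return x + (x_q - x).detach()

class QuantLinear(nn.Module):
    """Sketch: a per-channel transform `s` divides the activations and multiplies
    the weights (preserving the float function), while `alpha` learns the
    activation clipping range; both are optimized end-to-end against the
    quantized output."""
    def __init__(self, weight):
        super().__init__()
        self.weight = nn.Parameter(weight.detach().clone())
        self.s = nn.Parameter(torch.ones(weight.shape[1]))          # per-input-channel transform
        self.alpha = nn.Parameter(weight.detach().abs().max().clone())  # activation range

    def forward(self, x):
        x_t = x / self.s                                             # transformed activations
        x_t = torch.minimum(torch.maximum(x_t, -self.alpha), self.alpha)  # learned clipping
        w_t = self.weight * self.s                                   # inverse transform on weights
        x_q = fake_quant(x_t, self.alpha / 127)
        w_q = fake_quant(w_t, w_t.abs().max() / 127)
        return x_q @ w_q.t()
```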

Pruning
LLM Pruning and Distillation in Practice: The Minitron Approach
(NVIDIA)

  • instead of training each small model from scratch, the small models are obtained by pruning a larger model and then applying knowledge distillation
  • lack of access to the original training data => fine-tune the teacher model on their own dataset first (teacher correction); a generic distillation-loss sketch follows this list
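
For reference, a generic logit-distillation loss in the spirit of the recipe: teacher correction happens before this step by fine-tuning the teacher on the distillation data, and the actual Minitron loss also uses intermediate states, which this sketch omits.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic KL-divergence distillation loss on logits (sketch, not the exact
    Minitron objective). Applied after the teacher has been 'corrected', i.e.
    fine-tuned on the same dataset used for distillation."""
    t = temperature
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)
```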

Mamba
ReMamba: Equip Mamba with Effective Long-Sequence Modeling

  • selective compression and adaptation techniques within a two-stage re-forward process (rough sketch after this list)
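
My rough reading of the two-stage idea as a PyTorch sketch; the `model(tokens)` API returning per-position hidden states is hypothetical, and this is not the ReMamba implementation.

```python
import torch

def two_stage_reforward(model, tokens, keep_ratio=0.25):
    """Sketch of a selective-compression re-forward: stage 1 scores prompt
    positions from an ordinary forward pass, stage 2 re-runs the model on the
    compressed (selected) prompt only."""
    # Stage 1: forward pass to obtain per-position hidden states (hypothetical API).
    hidden = model(tokens)                      # (seq_len, d)
    query = hidden[-1]                          # use the final state as a query
    scores = hidden @ query                     # importance score per position
    k = max(1, int(keep_ratio * len(tokens)))
    keep = torch.topk(scores, k).indices.sort().values   # keep original order
    # Stage 2: re-forward on the selected positions only.
    return model(tokens[keep])
```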
