Trying Out the Pruning Library Wanda

J. Hwang · January 23, 2025

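Model compression is the set of techniques for shrinking large AI models while preserving as much of their performance as possible.
The main approaches are Pruning, Knowledge Distillation, and Quantization; this time I wanted to try Pruning hands-on.
Available pruning implementations include SparseGPT, Wanda, DSnoT, LLM-Pruner, Shortened LLaMA, FLAP, and SliceGPT.
This post is a record of my attempt to use Wanda.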
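
For context, Wanda (Pruning by Weights and activations) scores each weight by its magnitude multiplied by the L2 norm of the matching input feature over some calibration data, then removes the lowest-scoring weights within each output row. The sketch below illustrates that idea in plain Python so it runs without dependencies. It is not the repository's PyTorch implementation; the function name `wanda_prune_rows` and the toy numbers are made up for illustration.

```python
import math


def wanda_prune_rows(weights, activations, sparsity_ratio):
    """Zero the lowest-scoring weights in each output row.

    weights:      list of rows, shape (out_features, in_features)
    activations:  list of calibration samples, shape (n_samples, in_features)
    Score per weight: |W[i][j]| * L2 norm of input feature j over the samples.
    """
    in_features = len(weights[0])
    feature_norms = [
        math.sqrt(sum(sample[j] ** 2 for sample in activations))
        for j in range(in_features)
    ]
    n_prune = int(in_features * sparsity_ratio)

    pruned = []
    for row in weights:
        scores = [abs(w) * feature_norms[j] for j, w in enumerate(row)]
        # Indices of the n_prune smallest scores within this row only.
        drop = set(sorted(range(in_features), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy example (hypothetical values): 2 output rows, 4 input features.
toy_weights = [[0.5, -2.0, 1.0, 0.1], [3.0, 0.2, -0.4, 1.5]]
toy_activations = [[1.0, 0.1, 2.0, 4.0], [1.0, 0.1, 2.0, 3.0]]
print(wanda_prune_rows(toy_weights, toy_activations, 0.5))
# [[0.5, 0.0, 1.0, 0.0], [3.0, 0.0, 0.0, 1.5]]
```

In the first row the weight -2.0 has the largest magnitude but is still removed, because its input feature is almost always close to zero. Plain magnitude pruning would have kept it.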

Installation

Installation follows the INSTALL.md in the repository.

# Step 1: Create a new conda environment:

conda create -n prune_llm python=3.9
conda activate prune_llm

# Step 2: Install relevant packages

conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install transformers==4.28.0 datasets==2.11.0 wandb sentencepiece
pip install accelerate==0.18.0

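However, I installed everything with pip alone instead of conda, so I had to use different versions, and version mismatches caused errors along the way. The requirements.txt I finally settled on, which runs without errors, is shown below (as of January 24, 2025, CUDA Version 12.2).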

transformers==4.47.1
datasets==3.2.0
wandb
sentencepiece
accelerate==0.26.0
torch==2.0.1+rocm5.4.2
torchvision==0.15.2+rocm5.4.2
torchaudio==2.0.2+rocm5.4.2
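
Since the trouble here came from version mismatches, it can help to compare what is actually installed against the pins before running anything. The following is a rough sketch using only the standard library; `parse_pins` and `find_mismatches` are hypothetical helper names, and the parser assumes simple `name==version` or bare `name` lines like the ones above.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Map each package name to its pinned version, or None when unpinned."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip() or None
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed)} for every pin that is not met.

    A package that is not installed at all is reported with installed=None.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is None or (pinned is not None and installed != pinned):
            mismatches[name] = (pinned, installed)
    return mismatches
```

Calling `find_mismatches(parse_pins(open("requirements.txt").read()))` in the target environment should return an empty dict when everything lines up.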

After that, clone the Wanda code with git.
git clone https://github.com/locuslab/wanda.git

Running

I ran the code as described in the README. The goal was to prune llama-3.2-3B-Instruct from 3B down to roughly 2B, which corresponds to a sparsity ratio of about 0.33.

python main.py --model meta-llama/Llama-3.2-3B-Instruct --prune_method wanda --sparsity_ratio 0.33 --sparsity_type unstructured --save out/llama_3b/unstructured/wanda/
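
The `--sparsity_ratio 0.33` in the command above comes from simple arithmetic on the nominal parameter counts. The sketch below makes that explicit; it assumes the round figures 3B and 2B and treats every parameter as prunable, which is a simplification. The function names are hypothetical.

```python
def required_sparsity(total_params, target_params):
    """Fraction of weights that must be zeroed to leave target_params nonzero."""
    if not 0 < target_params <= total_params:
        raise ValueError("target_params must be in (0, total_params]")
    return 1.0 - target_params / total_params


def remaining_params(total_params, sparsity_ratio):
    """Number of nonzero weights left after pruning at the given ratio."""
    return total_params * (1.0 - sparsity_ratio)


# Nominal sizes only (assumption): 3B total, 2B target.
print(round(required_sparsity(3e9, 2e9), 2))        # 0.33
print(round(remaining_params(3e9, 0.33) / 1e9, 2))  # 2.01
```

Note that with `--sparsity_type unstructured` the pruned weights are set to zero in place, so "2B" here refers to nonzero weights rather than to a smaller dense checkpoint.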

However, the model failed to load with the following error.

OSError: meta-llama/Llama-3.2-3B-Instruct does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

This turned out to be caused by an unsuitable transformers version. After replacing the initially installed transformers==4.28.0 with transformers==4.47.1, the error went away.

References

https://github.com/locuslab/wanda
https://sofar-sogood.tistory.com/entry/Diffusers-에러-OSError-Error-no-file-named-pytorchmodelbin-tfmodelh5-modelckptindex-or-flaxmodelmsgpack-found-in-directory
