vLLM version compatibility notes
- With vLLM 0.5.3.post1, the flashinfer library is required to serve Gemma2 models
| vLLM | PyTorch | flashinfer |
|---|---|---|
| 0.5.3.post1 | 2.3.1 | 0.1.2+cu121torch2.3 |
| 0.6.3.post1 | 2.4.1 | 0.1.2+cu121torch2.4 |
| 0.6.6.post1 | 2.5.1 | 0.1.6+cu121torch2.4 |
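
The table above can be encoded as a small lookup so that an install script picks matching pins. This is a minimal sketch: the function name `vllm_pins` is hypothetical, and the values are copied from the table rather than from any official compatibility matrix.

```shell
#!/bin/sh
# Hypothetical helper: print the PyTorch and flashinfer versions paired with
# a given vLLM version, using only the rows of the table above.
vllm_pins() {
    case "$1" in
        0.5.3.post1) echo "2.3.1 0.1.2+cu121torch2.3" ;;
        0.6.3.post1) echo "2.4.1 0.1.2+cu121torch2.4" ;;
        0.6.6.post1) echo "2.5.1 0.1.6+cu121torch2.4" ;;
        *) echo "unknown vLLM version: $1" >&2; return 1 ;;
    esac
}

# Example: look up the pins for the newest row.
vllm_pins 0.6.6.post1
```

Versions not listed in the table return a non-zero status, so a calling script can stop before installing a combination that has not been tried.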
Installation example
#!/bin/sh
pip install vllm==0.6.6.post1
pip install torch==2.5.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall
pip install -U typing-extensions filelock
pip install flashinfer==0.1.6 -i https://flashinfer.ai/whl/cu121/torch2.4/
Serving example (with 8-bit quantization)
#!/bin/sh
vllm serve LLM_MODEL_PATH \
--load-format auto --enforce-eager --trust-remote-code \
--quantization fp8 --dtype auto \
--max-model-len 512 --gpu-memory-utilization 0.95 \
--seed 42
sh run_vllm.sh
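
Once the server is up, it can be checked with a request to the OpenAI-compatible completions endpoint. The sketch below assumes the server listens on vLLM's default `localhost:8000` and that the model name is the same path passed to `vllm serve`. `BASE_URL`, `MODEL`, `build_payload`, and `query_vllm` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Assumptions: default vLLM address, and model name equal to the served path.
BASE_URL="${BASE_URL:-http://localhost:8000}"
MODEL="${MODEL:-LLM_MODEL_PATH}"

# Build the JSON request body from model, prompt, and max_tokens.
# Note: the prompt is inserted as-is, so it must not contain double quotes
# or backslashes.
build_payload() {
    printf '{"model": "%s", "prompt": "%s", "max_tokens": %d}' "$1" "$2" "$3"
}

# Send a completion request; max_tokens stays small because the server
# above was started with --max-model-len 512.
query_vllm() {
    curl -s "$BASE_URL/v1/completions" \
        -H "Content-Type: application/json" \
        -d "$(build_payload "$MODEL" "$1" 64)"
}

# Usage (requires the server to be running):
# query_vllm "Hello, my name is"
```

The prompt length plus `max_tokens` has to fit within the 512-token context set at launch, so longer prompts need a larger `--max-model-len`.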