Intel OpenVINO For Fast CPU/NPU Inference

Cafelatte · April 5, 2025

LLM cpu intel npu quantization

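OpenVINO Feature Overview

An LLM inference framework for CPUs and NPUs
Supports model quantization
Supports converting LLM checkpoints to the OV format (OpenVINO's own model format)
Can serve as a lightweight model inference framework for AI agents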

Installation

# for llm inference
pip install openvino-genai
# for OV format conversion
pip install nncf
pip install git+https://github.com/huggingface/optimum-intel.git
# for supporting latest models
pip install -U transformers
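
After running the commands above, it can help to confirm that each package actually landed in the active environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` and the `PACKAGES` list are illustrative, and the distribution names are assumed to match the pip commands above (`optimum-intel` is assumed to be the name registered by the Git install).

```python
from importlib import metadata

# Distribution names assumed from the pip commands above.
PACKAGES = ["openvino-genai", "nncf", "optimum-intel", "transformers"]


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if the package is missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print one line per package so missing installs are easy to spot.
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A `None` entry suggests the corresponding `pip install` step should be re-run in the same environment.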

Model Conversion

Converting to a 4-bit quantized model

# For text-only models ($SOURCE_MODEL_PATH: original model, $OUTPUT_MODEL_PATH: converted model)
optimum-cli export openvino --model $SOURCE_MODEL_PATH --task text-generation-with-past --weight-format int4 $OUTPUT_MODEL_PATH
# For vision-text models
optimum-cli export openvino --model $SOURCE_MODEL_PATH --task image-text-to-text --weight-format int4 $OUTPUT_MODEL_PATH
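
When several models need converting, the export command can be assembled from Python instead of typed by hand. This is only a sketch: `build_export_command` and `export_model` are hypothetical helper names, and the flags are exactly the ones used in the conversion commands in this section, with no additional options assumed.

```python
import subprocess


def build_export_command(source_model, output_dir,
                         task="text-generation-with-past",
                         weight_format="int4"):
    """Build the optimum-cli argument list used in this post (hypothetical helper)."""
    return [
        "optimum-cli", "export", "openvino",
        "--model", source_model,
        "--task", task,
        "--weight-format", weight_format,
        output_dir,
    ]


def export_model(source_model, output_dir, **kwargs):
    """Run the export; raises CalledProcessError if optimum-cli fails."""
    command = build_export_command(source_model, output_dir, **kwargs)
    subprocess.run(command, check=True)
```

For a vision-text model, pass `task="image-text-to-text"`. Passing the arguments as a list rather than a single shell string avoids quoting problems when a path contains spaces.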

Model Loading and Generation

import openvino_genai as ov_genai
# load model; OV_MODEL_PATH is the output directory of the conversion step
pipe = ov_genai.LLMPipeline(OV_MODEL_PATH, "CPU")
# generation
print(pipe.generate("hello world", max_new_tokens=100))
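
The device string does not have to be hard-coded. The sketch below prefers an NPU when OpenVINO reports one and falls back to the CPU otherwise. `pick_device` and `load_pipeline` are hypothetical helpers; the sketch assumes the `openvino` package is importable alongside `openvino_genai`, and that the NPU shows up under the name "NPU" in `Core().available_devices`. An NPU may also impose requirements on how the model was exported that are not covered here.

```python
def pick_device(available, preferred=("NPU", "CPU")):
    """Return the first device from `preferred` that appears in `available`."""
    for device in preferred:
        if device in available:
            return device
    raise RuntimeError(f"none of {preferred} found in {list(available)}")


def load_pipeline(model_dir):
    """Load an LLMPipeline on the best available device (hypothetical helper)."""
    # Local imports keep pick_device usable without OpenVINO installed.
    import openvino as ov
    import openvino_genai as ov_genai

    device = pick_device(ov.Core().available_devices)
    return ov_genai.LLMPipeline(model_dir, device), device
```

Calling `pipe, device = load_pipeline(OV_MODEL_PATH)` then gives a pipeline that can be used with `pipe.generate(...)` as before, plus the name of the device that was chosen.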

Generation Example
