An old memory of learning, the hard way, that ignorance is a sin.
The CUDA tag is attached to keep a record of those past sins.
That said, I'm switching back to NVIDIA now anyway, lol.
python -m pip install intel_extension_for_pytorch -f https://developer.intel.com/ipex-whl-stable-cpu
pip install --upgrade intel-extension-for-tensorflow[xpu]
pip install --upgrade intel-extension-for-tensorflow-weekly[gpu] -f https://developer.intel.com/itex-whl-weekly
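Quick sanity check after installing (a minimal sketch: the version attributes are standard, but the XPU probe only means anything with an Intel GPU build plus drivers):
# Sanity-check sketch: confirm the wheels import and report their versions.
import torch
import intel_extension_for_pytorch as ipex
print(torch.__version__, ipex.__version__)
print(torch.xpu.is_available() if hasattr(torch, "xpu") else "CPU-only build, no xpu backend")

import tensorflow as tf
import intel_extension_for_tensorflow as itex
print(tf.__version__, itex.__version__)
print(tf.config.list_physical_devices())  # an 'XPU' entry should show up when the [xpu] plugin is active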
pip install intel-extension-for-transformers
https://pypi.org/project/intel-extension-for-transformers/
https://pypi.org/project/intel-extension-for-pytorch/
https://pytorch.org/tutorials/recipes/recipes/intel_extension_for_pytorch.html
https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/cheat_sheet.html
https://www.intel.com/content/www/us/en/developer/articles/technical/introducing-intel-extension-for-pytorch-for-gpus.html
https://pypi.org/project/intel-extension-for-tensorflow/
https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/guide/practice_guide.md
# NeuralChat: build a chatbot with the default config and ask it a question.
from intel_extension_for_transformers.neural_chat import build_chatbot
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
print(response)
# Weight-only quantization: INT4 weights with INT8 compute dtype.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModel, WeightOnlyQuantConfig
model_name = "EleutherAI/gpt-j-6B"
config = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4")
prompt = "Once upon a time, a little girl"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
model = AutoModel.from_pretrained(model_name, quantization_config=config)
gen_tokens = model.generate(inputs, max_new_tokens=300)
gen_text = tokenizer.batch_decode(gen_tokens)
print(gen_text)
# Same flow, but with INT8 weights and BF16 compute dtype.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModel, WeightOnlyQuantConfig
model_name = "EleutherAI/gpt-j-6B"
config = WeightOnlyQuantConfig(compute_dtype="bf16", weight_dtype="int8")
prompt = "Once upon a time, a little girl"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
model = AutoModel.from_pretrained(model_name, quantization_config=config)
gen_tokens = model.generate(inputs, max_new_tokens=300)
gen_text = tokenizer.batch_decode(gen_tokens)
print(gen_text)
As for Hugging Face, there is an Intel section for data and models, but honestly just running on CPU seems easier.
pipeline에서 device (int or str or torch.device) — Defines the device (e.g., "cpu", "cuda:1", "mps", or a GPU ordinal rank like 1) on which this pipeline will be allocated.
But 1 means CUDA, and even passing 0 still grabs CUDA (CPU is -1). Sigh.
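To actually keep a pipeline on the CPU, pass the device explicitly (a minimal sketch; gpt2 is just a stand-in model, and -1 is the CPU ordinal):
from transformers import pipeline
# device=-1 pins the pipeline to CPU; device=0 or "cuda:0" would take the first GPU.
pipe = pipeline("text-generation", model="gpt2", device=-1)
print(pipe("Once upon a time, a little girl", max_new_tokens=20)[0]["generated_text"])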
Model | FP32 accuracy | INT4 accuracy (group size 32) | INT4 accuracy (group size 128) | Next-token latency |
---|---|---|---|---|
EleutherAI/gpt-j-6B | 0.643 | 0.644 | 0.64 | 21.98ms |
meta-llama/Llama-2-7b-hf | 0.69 | 0.69 | 0.685 | 24.55ms |
decapoda-research/llama-7b-hf | 0.689 | 0.682 | 0.68 | 24.84ms |
EleutherAI/gpt-neox-20b | 0.674 | 0.672 | 0.669 | 80.16ms |
mosaicml/mpt-7b-chat | 0.672 | 0.67 | 0.666 | 35.84ms |
tiiuae/falcon-7b | 0.698 | 0.694 | 0.693 | 36.1ms |
baichuan-inc/baichuan-7B | 0.474 | 0.471 | 0.47 | Coming Soon |
facebook/opt-6.7b | 0.65 | 0.647 | 0.643 | Coming Soon |
databricks/dolly-v2-3b | 0.613 | 0.609 | 0.609 | 22.02ms |
tiiuae/falcon-40b-instruct | 0.756 | 0.757 | 0.755 | Coming Soon |
# Cheat-sheet fragment: `model` and `data` are assumed to already exist.
import intel_extension_for_pytorch as ipex
# Move the model and inputs to the Intel GPU ('xpu'), then let IPEX apply its optimizations.
model = model.to('xpu')
data = data.to('xpu')
model = ipex.optimize(model)
https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/cheat_sheet.html
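Since the wheel installed above is the CPU build, here is a minimal CPU inference sketch (ResNet-50 from torchvision is just a stand-in model; BF16 assumes the CPU supports it):
import torch
import intel_extension_for_pytorch as ipex
import torchvision.models as models

# Stand-in model; ipex.optimize fuses ops and prepares weights for BF16 inference.
model = models.resnet50(weights="DEFAULT").eval()
model = ipex.optimize(model, dtype=torch.bfloat16)

with torch.no_grad(), torch.cpu.amp.autocast():
    out = model(torch.rand(1, 3, 224, 224))
print(out.shape)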
Let's just forget PlaidML ever happened...
git clone https://github.com/intel/intel-extension-for-transformers.git itrex
cd itrex
pip install -r requirements.txt
pip install -v .
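A quick smoke test after the source build (just an import check, nothing more is implied):
# Confirms the package resolves to the freshly installed build.
import intel_extension_for_transformers as itrex
print(itrex.__file__)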