Python ML with an Intel GPU

seongjae · October 19, 2023

An old memory that taught me, the hard way, that ignorance is a sin.
I keep the CUDA tag on this post to record my past sins.
But now I'm going back to NVIDIA anyway lol

Intel extensions for ML

python -m pip install intel_extension_for_pytorch -f https://developer.intel.com/ipex-whl-stable-cpu
  • note: this channel is the CPU build; Intel GPUs need the separate xpu wheel channel

pip install --upgrade intel-extension-for-tensorflow[xpu]

pip install --upgrade intel-extension-for-tensorflow-weekly[gpu] -f https://developer.intel.com/itex-whl-weekly

pip install intel-extension-for-transformers
  • Watch out for underscores vs. dashes in the package names; they pick whichever they feel like (a quick sanity check below)
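
After installing, a quick import check catches the underscore/dash confusion early. A minimal sanity check, assuming the xpu build of intel-extension-for-pytorch and a working Intel GPU driver:

import torch
import intel_extension_for_pytorch as ipex  # dashes in pip, underscores in import

print(torch.__version__, ipex.__version__)
# The xpu build registers an "xpu" device; this should be True on a working setup
print(torch.xpu.is_available())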

https://pypi.org/project/intel-extension-for-transformers/
https://pypi.org/project/intel-extension-for-pytorch/
https://pytorch.org/tutorials/recipes/recipes/intel_extension_for_pytorch.html
https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/cheat_sheet.html
https://www.intel.com/content/www/us/en/developer/articles/technical/introducing-intel-extension-for-pytorch-for-gpus.html
https://pypi.org/project/intel-extension-for-tensorflow/
https://github.com/intel/intel-extension-for-tensorflow/blob/main/docs/guide/practice_guide.md

Transformers

from intel_extension_for_transformers.neural_chat import build_chatbot

# Build a chatbot with the default configuration and query it
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")

INT4 Inference

from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModel, WeightOnlyQuantConfig

model_name = "EleutherAI/gpt-j-6B"
# Weight-only quantization: INT4 weights with INT8 compute
config = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4")
prompt = "Once upon a time, a little girl"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids

# The model is quantized at load time via the quantization_config
model = AutoModel.from_pretrained(model_name, quantization_config=config)
gen_tokens = model.generate(inputs, max_new_tokens=300)
gen_text = tokenizer.batch_decode(gen_tokens)

INT8 Inference

from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModel, WeightOnlyQuantConfig

model_name = "EleutherAI/gpt-j-6B"
# Same flow as the INT4 example; only the dtypes change: INT8 weights, BF16 compute
config = WeightOnlyQuantConfig(compute_dtype="bf16", weight_dtype="int8")
prompt = "Once upon a time, a little girl"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids

model = AutoModel.from_pretrained(model_name, quantization_config=config)
gen_tokens = model.generate(inputs, max_new_tokens=300)
gen_text = tokenizer.batch_decode(gen_tokens)

Hugging Face

On the Hugging Face side there is an Intel section on the Hub (models and datasets), but honestly it seems easier to just use the CPU.
From the pipeline docs: device (int or str or torch.device) — Defines the device (e.g., "cpu", "cuda:1", "mps", or a GPU ordinal rank like 1) on which this pipeline will be allocated.
But 1 means cuda:1, and even passing 0 grabs CUDA, ugh (see the sketch below).
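
A minimal sketch of that behavior (the model name is just an example):

from transformers import pipeline

# device=-1 selects the CPU; any non-negative integer is treated as a
# CUDA ordinal, so both 0 and 1 land on CUDA devices, not the Intel XPU
pipe_cpu = pipeline("text-generation", model="gpt2", device=-1)
pipe_gpu = pipeline("text-generation", model="gpt2", device=0)  # cuda:0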

Supported models

| Model | FP32 | INT4 (Group size 32) | INT4 (Group size 128) | Next Token Latency |
| --- | --- | --- | --- | --- |
| EleutherAI/gpt-j-6B | 0.643 | 0.644 | 0.64 | 21.98ms |
| meta-llama/Llama-2-7b-hf | 0.69 | 0.69 | 0.685 | 24.55ms |
| decapoda-research/llama-7b-hf | 0.689 | 0.682 | 0.68 | 24.84ms |
| EleutherAI/gpt-neox-20b | 0.674 | 0.672 | 0.669 | 80.16ms |
| mosaicml/mpt-7b-chat | 0.672 | 0.67 | 0.666 | 35.84ms |
| tiiuae/falcon-7b | 0.698 | 0.694 | 0.693 | 36.1ms |
| baichuan-inc/baichuan-7B | 0.474 | 0.471 | 0.47 | Coming Soon |
| facebook/opt-6.7b | 0.65 | 0.647 | 0.643 | Coming Soon |
| databricks/dolly-v2-3b | 0.613 | 0.609 | 0.609 | 22.02ms |
| tiiuae/falcon-40b-instruct | 0.756 | 0.757 | 0.755 | Coming Soon |
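
The two INT4 columns differ by quantization group size. If I am reading the ITREX quantization config right, this is exposed as a group_size argument on WeightOnlyQuantConfig (an assumption; verify against your installed version):

from intel_extension_for_transformers.transformers import WeightOnlyQuantConfig

# group_size is assumed from the ITREX config; a smaller group size (32)
# usually gives slightly better accuracy than 128, as in the table above
config_gs32 = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4", group_size=32)
config_gs128 = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4", group_size=128)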

PyTorch

import torch
import intel_extension_for_pytorch as ipex

# Move model and data to the Intel GPU ("xpu"), then apply IPEX optimizations
model = model.to('xpu')
data = data.to('xpu')
model = ipex.optimize(model)

https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/cheat_sheet.html
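
Putting it together, a minimal end-to-end sketch (the toy model is made up for illustration; assumes the xpu build of IPEX and a working Intel GPU driver):

import torch
import intel_extension_for_pytorch as ipex

# Toy model, for illustration only
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
).eval()
data = torch.randn(32, 128)

model = model.to('xpu')
data = data.to('xpu')
model = ipex.optimize(model)

with torch.no_grad():
    out = model(data)
print(out.shape)  # torch.Size([32, 10])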

Let's forget PlaidML ever existed...

Error Log

  1. Installing the transformers extension
  • The "pyproject.toml-based projects" problem. Ugh... setup.py.
  • Forced the install through anyway, but lots of version mismatches... I should have done this in a venv.
    Everything feels more and more tangled.
  • Some suggest using pip install --use-pep517.
git clone https://github.com/intel/intel-extension-for-transformers.git itrex
cd itrex
pip install -r requirements.txt
pip install -v .

Reference

Stop using setup.py
