Today, let's serve a simple kogpt2 model with BentoML!
The previous post was based on BentoML v1.0-pre, but since the Transformers-related API lives in BentoML 0.11.1 (stable), today's environment is BentoML==0.11.1!
(transformers==4.18.0 / torch==1.11.0)
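Before starting, a minimal environment setup might look like this (just a sketch assuming a plain pip environment with the versions above; the exact torch wheel for your CUDA setup may differ):
# bash
pip install bentoml==0.11.1 transformers==4.18.0 torch==1.11.0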
kogpt2 is a "Korean text generation model" built on Transformers from Hugging Face, famous for its concept of 'democratizing NLP'.
Since the Transformer, with its self-attention mechanism, delivered such dramatic performance gains, a wave of deep-learning algorithms across NLP domains has adopted that mechanism. The most representative are GPT (generation) and BERT (pre-training).
Transformers is the much-appreciated package that makes all of those algorithms available behind a simple API :)
Recently it has gone a step further: like Docker Hub, it now runs a weight hub, acting as a platform where data scientists can freely publish their models and even handle deployment at the API level.
In the end, kogpt2 is a "Korean" NLP generation model: feed it a sentence and it generates a continuation based on it!
# Example
from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast.from_pretrained("skt/kogpt2-base-v2",
                                                    bos_token='</s>', eos_token='</s>', unk_token='<unk>',
                                                    pad_token='<pad>', mask_token='<mask>')
# print(tokenizer.tokenize("안녕하세요. 한국어 GPT-2 입니다.😤:)l^o"))
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained('skt/kogpt2-base-v2')

text = '달 밝은 밤'  # "A moonlit night"
input_ids = tokenizer.encode(text, return_tensors='pt')
gen_ids = model.generate(input_ids,
                         max_length=256,
                         repetition_penalty=4.0,
                         pad_token_id=tokenizer.pad_token_id,
                         eos_token_id=tokenizer.eos_token_id,
                         bos_token_id=tokenizer.bos_token_id,
                         use_cache=True)
generated = tokenizer.decode(gen_ids[0])
print(generated)
>>> 달 밝은 밤하늘을 볼 수 있는 곳.</d> 지난해 12월 31일 오후 2시 서울 종로구 세종문화회관 대극장에서 열린 “2018 대한민국연극제” 개막식에는 배우들과 관객들이 대거 참석했다.
이날 개막한 연극제는 올해로 10회째를 맞는 국내 최대 규모의 공연예술축제다.
올해는 코로나19로 인해 온라인으로 진행된다. ...
(The model continues "A moonlit night" with a fluent but unrelated news-style passage about a theater festival.)
https://docs.bentoml.org/en/0.13-lts/quickstart.html#example-hello-world
https://docs.bentoml.org/en/0.13-lts/frameworks.html#transformers
https://sooftware.io/bentoml/
The code is organized into two scripts.
main.py loads the transformers model and tokenizer, and "pack"s those objects into the service.
The Service, the core concept of BentoML, is implemented in bento_service.py and imported for use in main.py.
# main.py
import torch
from transformers import GPT2LMHeadModel
from transformers import PreTrainedTokenizerFast

from bento_service import TransformerService

model_name = 'kogpt2'
model = GPT2LMHeadModel.from_pretrained('skt/kogpt2-base-v2')
tokenizer = PreTrainedTokenizerFast.from_pretrained("skt/kogpt2-base-v2",
                                                    bos_token='</s>', eos_token='</s>', unk_token='<unk>',
                                                    pad_token='<pad>', mask_token='<mask>')

# Pack the model and tokenizer into the service as a single named artifact
service = TransformerService()
artifact = {'model': model, 'tokenizer': tokenizer}
service.pack("kogpt2Model", artifact)

# Save the service as a versioned BentoML bundle
saved_path = service.save()
TransformerService is created by subclassing the BentoService class.
As for the decorators, it's enough to understand them as the fixed format below; that should be all you need to use them without trouble.
- @env defines the packages/versions required at inference time.
- @artifacts specifies which of BentoML's predefined Artifact types to use and declares the artifact's name.
- @api fixes the input format in advance, and exposes the decorated method as an API endpoint once the model is served.
Then you just write the predict logic.
To use the artifact handed over from main.py, note that the model and tokenizer are retrieved through the "kogpt2Model" name declared in the @artifacts decorator.
# bento_service.py
from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import JsonInput
from bentoml.frameworks.transformers import TransformersModelArtifact


@env(pip_packages=["transformers==4.18.0", "torch==1.11.0"])
@artifacts([TransformersModelArtifact("kogpt2Model")])
class TransformerService(BentoService):
    @api(input=JsonInput(), batch=False)
    def predict(self, parsed_json):
        src_text = parsed_json.get("text")

        # Retrieve the packed model and tokenizer by the artifact name
        model = self.artifacts.kogpt2Model.get("model")
        tokenizer = self.artifacts.kogpt2Model.get("tokenizer")

        input_ids = tokenizer.encode(src_text, return_tensors="pt")
        gen_ids = model.generate(input_ids,
                                 max_length=256,
                                 repetition_penalty=4.0,
                                 pad_token_id=tokenizer.pad_token_id,
                                 eos_token_id=tokenizer.eos_token_id,
                                 bos_token_id=tokenizer.bos_token_id,
                                 use_cache=True)
        output = tokenizer.decode(gen_ids[0])
        return output
First, run main.py, which contains the service.save() logic!
# bash
python main.py
If it runs successfully, you'll see a message like the one below telling you that the Service has been saved.
In fact, if you inspect that directory, you'll find that files needed for deployment, such as requirements.txt and a Dockerfile, have been generated automatically.
[2022-04-19 23:49:29,091] INFO - BentoService bundle 'TransformerService:20220419234923_5080D7' saved to: /home/kang/bentoml/repository/TransformerService/20220419234923_5080D7
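As a quick sanity check, you can list the bundle directory from the log above (a sketch reusing my bundle path; the exact file list varies by BentoML version). Since the generated Dockerfile sits at the bundle root, building an image straight from it should also work:
# bash
ls /home/kang/bentoml/repository/TransformerService/20220419234923_5080D7
# -> Dockerfile, requirements.txt, ... plus the packaged service code
docker build -t kogpt2-service /home/kang/bentoml/repository/TransformerService/20220419234923_5080D7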
Then, if you run that Service locally, the server comes up!
# bash
bentoml serve TransformerService:latest
>>> [2022-04-19 23:51:14,614] INFO - Getting latest version TransformerService:20220419234923_5080D7
[2022-04-19 23:51:14,622] INFO - Starting BentoML API proxy in development mode..
[2022-04-19 23:51:14,623] INFO - Starting BentoML API server in development mode..
[2022-04-19 23:51:14,743] INFO - Your system nofile limit is 4096, which means each instance of microbatch service is able to hold this number of connections at same time. You can increase the number of file descriptors for the server process, or launch more microbatch instances to accept more concurrent connection.
======== Running on http://0.0.0.0:5000 ========
(Press CTRL+C to quit)
* Serving Flask app 'TransformerService' (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://127.0.0.1:54471 (Press CTRL+C to quit)
If you open http://127.0.0.1:54471, you can conveniently try requests out in the Swagger UI :)
Of course, you can also hit port 5000 using the Python requests package or curl :)
import requests

res = requests.post("http://127.0.0.1:5000/predict",
                    json={"text": "가끔 이상한 말도 해요"})  # "Sometimes it says strange things"
print(res.text)
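The same request over curl, for reference (equivalent /predict endpoint and JSON body as the requests example above):
# bash
curl -X POST "http://127.0.0.1:5000/predict" \
     -H "Content-Type: application/json" \
     -d '{"text": "가끔 이상한 말도 해요"}'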
Looking at the outputs from kogpt2, the sentences themselves read fine, but I did find spots where it strings together words that don't quite fit the context ;<
Next time, I'll tweak kogpt2 to my own taste, build a fun model, and serve that in the following post :)
Also, I got a lot of help studying the post at https://sooftware.io/bentoml/. Thank you!
Thanks for reading today's post!
Hello,
Thank you for this. I am having a hard time understanding BentoML artifacts for the various frameworks.
In my case, my colleague was using the model below from the Transformers library's pipeline module to get emotion scores for the input text:
classifier = pipeline("text-classification", model='bhadresh-savani/distilbert-base-uncased-emotion', return_all_scores=True)
Now I need to create a REST API for this, and I am using BentoML. I just can't understand how to use a pipeline instance in the Artifacts wrapper.
I tried the approach below, writing the BentoML code:
import re
import bentoml
from transformers import DistilBertTokenizer
from transformers import DistilBertModel
from transformers import DistilBertForSequenceClassification
from bentoml.adapters import JsonInput
from bentoml.frameworks.transformers import TransformersModelArtifact


@bentoml.env(pip_packages=["transformers==4.18.0", "torch==1.11.0"])
@bentoml.artifacts([TransformersModelArtifact('distilbert')])
class DistillBertService(bentoml.BentoService):
    ...  # predict method (which calls model.generate, per the traceback below) omitted from the comment

ds = DistillBertService()
MODEL_NAME = 'distilbert-base-cased'
model = DistilBertModel.from_pretrained(MODEL_NAME)
tokenizer = DistilBertTokenizer.from_pretrained(MODEL_NAME)
artifact = {"model": model, "tokenizer": tokenizer}
ds.pack("distilbert", artifact)
saved_path = ds.save()
When I hit the endpoint with a request, I get the error below:
File "/home/dd00740409/bentoml/repository/DistillBertService/20220421051733_82868D/DistillBertService/distill_bert_service_with_preprocessing.py", line 41, in predict
    output = model.generate(input_ids, max_length=50)
File "/home/dd00740409/.conda/envs/distill-bert/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
File "/home/dd00740409/.conda/envs/distill-bert/lib/python3.7/site-packages/transformers/generation_utils.py", line 1263, in generate
    **model_kwargs,
File "/home/dd00740409/.conda/envs/distill-bert/lib/python3.7/site-packages/transformers/generation_utils.py", line 1649, in greedy_search
    next_token_logits = outputs.logits[:, -1, :]
AttributeError: 'BaseModelOutput' object has no attribute 'logits'
Can you guide me a bit?
Thanks