[MLOps] 🍱 BentoML - 2 : kogpt2 with transformers!

강콩콩 · April 19, 2022

😎 Today, let's do a simple serving of the kogpt2 model with BentoML!
😅 The previous post used BentoML v1.0-pre, but since the Transformers-related API lives in BentoML 0.11.1 (stable), today's environment is BentoML==0.11.1!
(transformers==4.18.0 / torch==1.11.0)
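
(If you want to reproduce this environment, an install along these lines should do it; the pins match the versions above.)

# bash
pip install bentoml==0.11.1 transformers==4.18.0 torch==1.11.0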

kogpt2?

https://github.com/SKT-AI/KoGPT2

🤗 It is a "Korean text generation model" built on Transformers from huggingface, the library famous for its concept of 'democratizing NLP'.

huggingface? Transformer?

https://huggingface.co/docs/transformers/index

😉 Ever since the Transformer, with its Self-Attention mechanism, delivered a dramatic jump in performance, deep-learning algorithms borrowing that mechanism have appeared across every NLP domain.
😄 The most representative of these are GPT (generation) and BERT (pre-training).
😋 huggingface's library is a much-appreciated package that wraps all of these algorithms behind a simple API :) (see the sketch below)
😌 More recently it has gone a step further, building a weight hub much like Docker Hub, so that it now also acts as a platform where Data Scientists can freely publish, and even deploy, their models through the same API.
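
✨ As a quick taste of that API (a minimal sketch on my part, using the skt/kogpt2-base-v2 weights that appear later in this post), pulling a hub-hosted model down is only a couple of lines:

# Minimal sketch: load a generation pipeline straight from the Hugging Face Hub.
# (Illustrative only; the rest of this post loads the model/tokenizer explicitly.)
from transformers import pipeline

generator = pipeline("text-generation", model="skt/kogpt2-base-v2")
print(generator("달 밝은 밤", max_length=32))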

kogpt2!

http://aidev.co.kr/chatbotdeeplearning/9538

😁 In short, it is a "Korean" NLP generation model: give it a sentence, and it continues writing from that sentence!

# Example
from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast.from_pretrained("skt/kogpt2-base-v2",
  bos_token='</s>', eos_token='</s>', unk_token='<unk>',
  pad_token='<pad>', mask_token='<mask>')

# print(tokenizer.tokenize("안녕하세요. 한국어 GPT-2 입니다.😤:)l^o"))

import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained('skt/kogpt2-base-v2')

text = '달 밝은 밤'  # prompt: "a moonlit night"

input_ids = tokenizer.encode(text, return_tensors='pt')

gen_ids = model.generate(input_ids,
                           max_length=256,
                           repetition_penalty=4.0,
                           pad_token_id=tokenizer.pad_token_id,
                           eos_token_id=tokenizer.eos_token_id,
                           bos_token_id=tokenizer.bos_token_id,
                           use_cache=True)

generated = tokenizer.decode(gen_ids[0])

print(generated)
>>> 달 밝은 밤하늘을 볼 수 있는 곳.</d> 지난해 12월 31일 오후 2시 서울 종로구 세종문화회관 대극장에서 열린 ‘2018 대한민국연극제’ 개막식에는 배우들과 관객들이 대거 참석했다.
이날 개막한 연극제는 올해로 10회째를 맞는 국내 최대 규모의 공연예술축제다.
올해는 코로나19로 인해 온라인으로 진행됐다. ...
(Roughly: "A place where you can see the moonlit night sky. ... actors and audiences turned out in force for the opening ceremony of the '2018 Korea Theater Festival' held at the Sejong Center... This year it was held online because of COVID-19.")
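
✔ (An aside, and an assumption of mine rather than something from the original run: the example above uses greedy decoding with a repetition penalty; if the continuation repeats or drifts, sampling-based decoding is the usual knob to try. The parameters below are illustrative.)

# Illustrative variation: top-k / nucleus sampling instead of greedy decoding.
gen_ids = model.generate(input_ids,
                         max_length=128,
                         do_sample=True,
                         top_k=50,
                         top_p=0.95,
                         pad_token_id=tokenizer.pad_token_id,
                         eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(gen_ids[0]))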

BentoML!

https://docs.bentoml.org/en/0.13-lts/quickstart.html#example-hello-world
https://docs.bentoml.org/en/0.13-lts/frameworks.html#transformers
https://sooftware.io/bentoml/

Script

😆 The project consists of two scripts.
😋 main.py loads the transformers model and tokenizer, then "pack"s those objects into the service.
😎 The Service, the central concept of BentoML, is implemented in bento_service.py and imported from there.

# main.py
from transformers import GPT2LMHeadModel
from transformers import PreTrainedTokenizerFast
from bento_service import TransformerService

# load the pretrained KoGPT2 model and its tokenizer
model = GPT2LMHeadModel.from_pretrained('skt/kogpt2-base-v2')
tokenizer = PreTrainedTokenizerFast.from_pretrained("skt/kogpt2-base-v2",
  bos_token='</s>', eos_token='</s>', unk_token='<unk>',
  pad_token='<pad>', mask_token='<mask>')

service = TransformerService()

# bundle the model and tokenizer into the "kogpt2Model" artifact declared in bento_service.py
artifact = {'model': model, 'tokenizer': tokenizer}
service.pack("kogpt2Model", artifact)

saved_path = service.save()
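
✔ (Optional, and an assumption on my part rather than documented behavior: since @api marks the method rather than replacing it, the packed service can be sanity-checked by calling predict directly, before or after save().)

# Quick local sanity check of the packed service
# (assumes the @api-decorated method stays directly callable).
print(service.predict({"text": "달 밝은 밤"}))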

😁 TransformerService is written by subclassing the BentoService class.
✨ As for the decorators, treating them as the required boilerplate format is enough to use them without trouble.
✔ @env defines the packages/versions required for inference.
✔ @artifacts specifies which of BentoML's predefined Artifact types to use, and declares the artifact's name.
✔ @api fixes the input format in advance and exposes the method directly as an API once the model is served.
😜 After that, you just write the predict logic.
✔ To use the artifact handed over from main.py, notice that the model and tokenizer are retrieved via "kogpt2Model", the name set in the @artifacts decorator.

# bento_service.py
from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import JsonInput
from bentoml.frameworks.transformers import TransformersModelArtifact

@env(pip_packages=["transformers==4.18.0", "torch==1.11.0"])
@artifacts([TransformersModelArtifact("kogpt2Model")])
class TransformerService(BentoService):
    @api(input=JsonInput(), batch=False)
    def predict(self, parsed_json):
        src_text = parsed_json.get("text")

        model = self.artifacts.kogpt2Model.get("model")
        tokenizer = self.artifacts.kogpt2Model.get("tokenizer")

        input_ids = tokenizer.encode(src_text, return_tensors="pt")

        gen_ids = model.generate(input_ids,
                           max_length=256,
                           repetition_penalty=4.0,
                           pad_token_id=tokenizer.pad_token_id,
                           eos_token_id=tokenizer.eos_token_id,
                           bos_token_id=tokenizer.bos_token_id,
                           use_cache=True)

        output = tokenizer.decode(gen_ids[0])

        return output

Model Serve!

😀 First, run main.py, which contains the service.save() logic!

# bash
python main.py

😜 If it runs successfully, a message like the one below tells you the Service has been saved.
✨ And if you actually inspect that directory, you will find that files needed for deployment, such as requirements.txt and a Dockerfile, have been generated automatically.

[2022-04-19 23:49:29,091] INFO - BentoService bundle 'TransformerService:20220419234923_5080D7' saved to: /home/kang/bentoml/repository/TransformerService/20220419234923_5080D7
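
✨ (Not performed in this post, but since the bundle ships with a Dockerfile, something along these lines should build and run a servable image; the path comes from the log above.)

# bash
cd ~/bentoml/repository/TransformerService/20220419234923_5080D7
docker build -t kogpt2-service .
docker run -p 5000:5000 kogpt2-service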

😆 Then, run that Service locally and a server comes up!

# bash
bentoml serve TransformerService:latest
>>> [2022-04-19 23:51:14,614] INFO - Getting latest version TransformerService:20220419234923_5080D7
[2022-04-19 23:51:14,622] INFO - Starting BentoML API proxy in development mode..
[2022-04-19 23:51:14,623] INFO - Starting BentoML API server in development mode..
[2022-04-19 23:51:14,743] INFO - Your system nofile limit is 4096, which means each instance of microbatch service is able to hold this number of connections at same time. You can increase the number of file descriptors for the server process, or launch more microbatch instances to accept more concurrent connection.
======== Running on http://0.0.0.0:5000 ========
(Press CTRL+C to quit)
 * Serving Flask app 'TransformerService' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:54471 (Press CTRL+C to quit)

😉 If you open http://127.0.0.1:54471, you can conveniently try out requests in the Swagger UI :)



✨ Of course, you can also send requests to port 5000 with the python requests package or curl (both shown below) :)

import requests
res = requests.post("http://127.0.0.1:5000/predict", json={"text": "가끔 이상한 말도 해요"})
print(res.text)
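
✨ The same request with curl, hitting the same /predict route:

# bash
curl -X POST http://127.0.0.1:5000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": "가끔 이상한 말도 해요"}'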

๊ธ€์„ ์ •๋ฆฌํ•˜๋ฉฐ

😃 Looking at kogpt2's outputs, the sentences themselves hold up, but it does end up stringing together words that drift slightly out of context ;<
🤗 In the next post, I'll shape kogpt2 into a more entertaining model to my own taste and serve that :)
💕 Also, the post at https://sooftware.io/bentoml/ was a great help while studying this. Thank you! 👍
😉 Thanks for reading today as well!


1๊ฐœ์˜ ๋Œ“๊ธ€

April 21, 2022

Hello

Thank you for this. I am having a hard time understanding BentoML artifacts for the various frameworks.

In my case, my colleague was using the model below, via the Transformers library's pipeline module, to get emotion scores for the input text:

classifier = pipeline("text-classification",model='bhadresh-savani/distilbert-base-uncased-emotion', return_all_scores=True)

Now I need to create a REST API for this, and I am using BentoML. I just can't figure out how to use the pipeline instance with the artifacts wrapper.

I tried the approach below:

BentoML code:

import re
import bentoml
from transformers import DistilBertTokenizer
from transformers import DistilBertModel
from transformers import DistilBertForSequenceClassification
from bentoml.adapters import JsonInput
from bentoml.frameworks.transformers import TransformersModelArtifact

@bentoml.env(pip_packages=["transformers==4.18.0", "torch==1.11.0"])
@bentoml.artifacts([TransformersModelArtifact('distilbert')])
class DistillBertService(bentoml.BentoService):

    @bentoml.api(input=JsonInput(), batch=False)
    def predict(self, parsed_json):
        print('Input: ', parsed_json['text'])

        model = self.artifacts.distilbert.get("model")
        tokenizer = self.artifacts.distilbert.get("tokenizer")
        input_ids = tokenizer.encode(parsed_json['text'], return_tensors="pt")
        output = model.generate(input_ids, max_length=50)
        output = tokenizer.decode(output[0], skip_special_tokens=True)
        return output

ds = DistillBertService()

MODEL_NAME = 'distilbert-base-cased'
model = DistilBertModel.from_pretrained(MODEL_NAME)
tokenizer = DistilBertTokenizer.from_pretrained(MODEL_NAME)
artifact = {"model": model, "tokenizer": tokenizer}
ds.pack("distilbert", artifact)
saved_path = ds.save()

When I send a request, I get the error below:
File "/home/dd00740409/bentoml/repository/DistillBertService/20220421051733_82868D/DistillBertService/distill_bert_service_with_preprocessing.py", line 41, in predict
output = model.generate(input_ids, max_length=50)
File "/home/dd00740409/.conda/envs/distill-bert/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/dd00740409/.conda/envs/distill-bert/lib/python3.7/site-packages/transformers/generation_utils.py", line 1263, in generate
**model_kwargs,
File "/home/dd00740409/.conda/envs/distill-bert/lib/python3.7/site-packages/transformers/generation_utils.py", line 1649, in greedy_search
next_token_logits = outputs.logits[:, -1, :]
AttributeError: 'BaseModelOutput' object has no attribute 'logits'

Can you guide me a bit?

Thanks
