
Triton supports several deployment backends (PyTorch, TensorFlow, Python, etc.). This post is a quick-start write-up on deploying a model with the Python backend. References
# 1. Launch the Triton server container (replace <xx.yy> with a release tag, e.g. 22.02)
docker run --shm-size=1g --ulimit memlock=-1 -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 -ti nvcr.io/nvidia/tritonserver:<xx.yy>-py3
# 2. Clone the python_backend repository at the matching release branch
git clone https://github.com/triton-inference-server/python_backend -b r<xx.yy>
cd python_backend
# 3. Build a model repository from the add_sub example
mkdir -p models/add_sub/1/
cp examples/add_sub/model.py models/add_sub/1/model.py
cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
# 4. Start the server
tritonserver --model-repository `pwd`/models
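Once the server is up, readiness can be checked over HTTP; a quick sketch against the standard /v2/health/ready endpoint (status 200 means the server and its models are ready):

import requests

# 200 means the server and its models are ready to serve requests.
r = requests.get("http://localhost:8000/v2/health/ready")
print(r.status_code)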
# 5. In a separate terminal, launch the client SDK container
docker run -ti --net host nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk /bin/bash
# 6. Inside the SDK container, clone the examples and run the client
git clone https://github.com/triton-inference-server/python_backend -b r<xx.yy>
python python_backend/examples/add_sub/client.py
The commands below can replace step 5 (the SDK container): install the client libraries directly with pip instead.
pip install tritonclient
pip install gevent
pip install geventhttpclient
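With these packages installed, the SDK container is no longer needed; a client can be written directly against the tritonclient HTTP API. A minimal sketch for the opt model defined below (the input text is just an example):

import numpy as np
import tritonclient.http as httpclient

# Connect to the Triton HTTP endpoint (port 8000 by default).
client = httpclient.InferenceServerClient(url="localhost:8000")

# TYPE_STRING inputs are sent as numpy object arrays with datatype "BYTES".
text = np.array(["go to school"], dtype=np.object_)
inp = httpclient.InferInput("INPUT0", [1], "BYTES")
inp.set_data_from_numpy(text)

out = httpclient.InferRequestedOutput("OUTPUT0")
result = client.infer(model_name="opt", inputs=[inp], outputs=[out])
print(result.as_numpy("OUTPUT0"))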
The docker run variants below serve a model repository on the host (~/triton_test/models), used for the custom opt model described next.

# Basic run with GPU support and the model repository mounted as a volume
sudo docker run --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ~/triton_test/models:/opt/tritonserver/models -ti nvcr.io/nvidia/tritonserver:22.02-py3
# Same, with a larger shared-memory size (the Python backend talks to its stub processes over /dev/shm) and remapped host ports
sudo docker run --shm-size=1g --gpus all -p 8003:8000 -p 8004:8001 -p 8005:8002 -v ~/triton_test/models:/opt/tritonserver/models -ti nvcr.io/nvidia/tritonserver:22.02-py3
# Using a custom image (opt:1.0) that already contains the model repository
sudo docker run --shm-size=1g --gpus all -p 8003:8000 -p 8004:8001 -p 8005:8002 -ti opt:1.0
# With resource limits (memlock and stack ulimits, as in the official quick start)
sudo docker run --shm-size=1g --ulimit memlock=-1 --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 -v ~/triton_test/models:/opt/tritonserver/models -ti nvcr.io/nvidia/tritonserver:22.02-py3
models/
└── opt/                  (model name)
    ├── 1/                (model version)
    │   └── model.py      (model code or model file)
    └── config.pbtxt      (model configuration file)
model.py
name: "opt"
backend: "python"
input [
{
name: "INPUT0"
data_type: TYPE_STRING
dims: [ 1 ]
}
]
output [
{
name: "OUTPUT0"
data_type: TYPE_STRING
dims: [ 1 ]
}
]
instance_group [{ kind: KIND_CPU }]
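A minimal sketch of what model.py can look like for this config. The echo logic is a placeholder assumption; a real model would run its inference in execute():

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Triton's Python backend requires this exact class name."""

    def initialize(self, args):
        # Called once when the model is loaded. args["model_config"]
        # holds the config.pbtxt contents as a JSON string.
        pass

    def execute(self, requests):
        # Called with a batch of inference requests.
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            text = in0.as_numpy()  # TYPE_STRING arrives as an array of bytes

            # Placeholder: echo the input back. A real model would build
            # OUTPUT0 from its inference result here.
            out0 = pb_utils.Tensor("OUTPUT0", text.astype(np.object_))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
        return responses

    def finalize(self):
        # Called once when the model is unloaded.
        pass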
config.pbtxt
name: "opt"
backend: "python"
input [
{
name: "INPUT0"
data_type: TYPE_STRING
dims: [ 1 ]
}
]
output [
{
name: "OUTPUT0"
data_type: TYPE_STRING
dims: [ 1 ]
}
]
instance_group [{ kind: KIND_CPU }]
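instance_group [{ kind: KIND_CPU }] runs the model on CPU. To run on GPU instead, the entry can be changed, for example (assuming one instance on GPU 0):

instance_group [{ count: 1, kind: KIND_GPU, gpus: [ 0 ] }]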
import requests

# KServe v2 inference endpoint for the "opt" model
URL = "http://localhost:8000/v2/models/opt/infer"


def main():
    # TYPE_STRING tensors are sent as datatype "BYTES" over HTTP
    data = {
        "name": "opt",
        "inputs": [
            {
                "name": "INPUT0",
                "shape": [1],
                "datatype": "BYTES",
                "data": ["go to school"]
            }
        ]
    }
    res = requests.post(URL, json=data)
    print(res.json())


if __name__ == "__main__":
    main()
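If the request succeeds, res.json() follows the KServe v2 response schema, roughly of the form below (the actual data depends on what model.py returns):

# Approximate response shape per the KServe v2 protocol (values illustrative).
{
    "model_name": "opt",
    "model_version": "1",
    "outputs": [
        {"name": "OUTPUT0", "datatype": "BYTES", "shape": [1], "data": ["..."]}
    ]
}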
Note: the model class defined in model.py must be named TritonPythonModel.