sudo docker run --gpus=all --shm-size=1g --ulimit memlock=-1 -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 -ti nvcr.io/nvidia/tritonserver:24.10-py3
git clone https://github.com/triton-inference-server/python_backend -b r24.10
To check which Python version an image ships, consult the NVIDIA NGC tags page.
To tell the image variants apart (py3, py3-sdk, etc.), check the Overview tab.
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver/tags

Options for launching the Triton server once inside the container:
tritonserver --model-repository `pwd`/models --model-control-mode=explicit
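The directory passed to --model-repository must follow Triton's layout: one directory per model, a `config.pbtxt`, and numeric version subdirectories holding the model file. A minimal sketch that scaffolds that layout for a python_backend model; the model name `add_sub` and the `max_batch_size` value are illustrative assumptions:

```python
import pathlib

def scaffold_model_repo(root, model_name="add_sub"):
    """Create the layout Triton expects:
    <root>/<model_name>/config.pbtxt and <root>/<model_name>/<version>/model.py
    """
    model_dir = pathlib.Path(root) / model_name
    version_dir = model_dir / "1"          # numeric version subdirectory
    version_dir.mkdir(parents=True, exist_ok=True)
    # Placeholder python_backend entry point; real logic lives in TritonPythonModel.
    (version_dir / "model.py").touch()
    # Minimal config: model name, backend, and a batching limit (assumed values).
    (model_dir / "config.pbtxt").write_text(
        'name: "%s"\nbackend: "python"\nmax_batch_size: 8\n' % model_name
    )
    return model_dir
```

Running `scaffold_model_repo("models")` produces `models/add_sub/config.pbtxt` and `models/add_sub/1/model.py`, ready to be served with the command above.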
Differences between --model-control-mode values:
- explicit: models are loaded/unloaded only on request via the model repository API (models to load at startup can be named with --load-model).
- none (default): every model in the repository is loaded at startup, and load/unload API calls are rejected.
- poll: Triton periodically re-scans the repository and applies any changes.
Query the repository index (lists models and their load state):
curl -X POST 192.168.1.75:8000/v2/repository/index

Load a model by name (explicit mode only):
curl -X POST 192.168.1.75:8000/v2/repository/models/{model_name}/load
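The same repository API can be driven from Python with only the standard library. A sketch, assuming the server address 192.168.1.75:8000 from the curl examples above; `/unload` is the documented counterpart of the `/load` endpoint:

```python
import json
import urllib.request

BASE = "http://192.168.1.75:8000"  # example host from the curl commands above

def repo_url(action, model=None):
    """Build a v2 repository endpoint URL: the index, or per-model load/unload."""
    if model is None:
        return f"{BASE}/v2/repository/{action}"
    return f"{BASE}/v2/repository/models/{model}/{action}"

def post(url):
    """POST an empty JSON body and decode the response, if any."""
    req = urllib.request.Request(
        url, data=b"{}", method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        return json.loads(body) if body else None

# post(repo_url("index"))              # list models and their states
# post(repo_url("load", "my_model"))   # load a model (explicit mode only)
# post(repo_url("unload", "my_model")) # unload it again
```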