[MLOps] Building a Triton Inference Server, Part 2 - Creating a model repository

Ellie·March 1, 2023

Nvidia Onnx TensorRT Triton gpu inference mlops server serving torch


If you managed to bring up Triton Inference Server with Docker in the previous post, it is now time to put the models you want to serve on Triton. Triton reads each model and its metadata (versions, etc.) from a model repository and serves them. The model repository can live on the local filesystem or in remote cloud storage such as AWS S3, Azure Blob Storage, or Google Cloud Storage. This post covers setting up a local repository.

Structure of the Model Repository

First, create a folder where the models to be served will live. Then, under that folder, place the model files, config files, and so on in the following layout.

[Model-repository layout example]

```
<model-repository-path>/
  <model-name>/
    [config.pbtxt]
    [<output-labels-file> ...]
    <version>/
      <model-definition-file>
    <version>/
      <model-definition-file>
    ...
  <model-name>/
    [config.pbtxt]
    [<output-labels-file> ...]
    <version>/
      <model-definition-file>
    <version>/
      <model-definition-file>
    ...
  ...
```
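As a minimal sketch (assuming Python 3 and only the standard library), the directory skeleton above can be created with `pathlib`. The helpers `create_model_repository` and `print_tree` and the model names are hypothetical; config.pbtxt and the model files still have to be added afterwards.

```python
import tempfile
from pathlib import Path


def create_model_repository(root, models):
    """Create the empty directory skeleton of a local model repository.

    `models` maps a model name to the version numbers to create, e.g.
    {"text_detection": [1]}. config.pbtxt and the model files are NOT
    created here; they still have to be copied in afterwards.
    """
    root = Path(root)
    for model_name, versions in models.items():
        for version in versions:
            # <model-repository-path>/<model-name>/<version>/
            (root / model_name / str(version)).mkdir(parents=True, exist_ok=True)
    return root


def print_tree(root):
    """Print the repository roughly in the layout shown above."""
    root = Path(root)
    print(f"{root.name}/")
    for path in sorted(root.rglob("*")):
        depth = len(path.relative_to(root).parts)
        suffix = "/" if path.is_dir() else ""
        print("  " * depth + path.name + suffix)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = create_model_repository(
            Path(tmp) / "model_repository",
            {"text_detection": [1], "text_recognition": [1, 2]},  # hypothetical names
        )
        print_tree(repo)
```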

The model repository consists of four components:

1. model-name

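The model's identifying name: the folder name you choose here becomes the model's name when Triton Inference Server starts. For example, if you create a folder named 'text_detection', the Triton server picks up the model as follows.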

```
# log after starting triton inference server
I0712 16:37:18.246487 128 server.cc:626]
+------------------+---------+--------+
| Model            | Version | Status |
+------------------+---------+--------+
| text_detection   | 1       | READY  |
+------------------+---------+--------+
```
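The mapping from folder name to model name can be illustrated with a small, hypothetical helper `model_names`. It is a simplification, not Triton's actual scanning code: it just treats every immediate subdirectory of the repository as one model named after that directory, and skips plain files.

```python
import tempfile
from pathlib import Path


def model_names(repository):
    """List the model names a local repository would expose (simplified).

    Every immediate subdirectory is treated as one model, and its directory
    name is used as the model name; plain files are skipped.
    """
    repo = Path(repository)
    return sorted(p.name for p in repo.iterdir() if p.is_dir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "text_detection" / "1").mkdir(parents=True)
        print(model_names(repo))  # ['text_detection']
```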

2. config.pbtxt

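A file describing each model. In it you can define the names, shapes, and data types of the model's input and output tensors, the backend, whether to run on CPU or GPU, and more. For TensorRT, TensorFlow, ONNX, and OpenVINO models, Triton can generate the required parts of config.pbtxt automatically, but for TorchScript models you have to write the config file by hand. See the official documentation for details!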

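name
Optional. If given, it must match the name of the model's directory.
backend
The backend used to run the model. Triton supports many backends, including TensorFlow, PyTorch, Python, and ONNX.
max_batch_size
The maximum batch size the model supports. If it is greater than 0, dims omit the batch dimension and Triton adds it implicitly; if it is 0, Triton adds no batch dimension and dims must give the full shape, as in the example below.
input/output
The name, shape, data type, etc. of each input/output tensor. Reshaping is also supported.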

```
name: "text_recognition"
backend: "onnxruntime"
max_batch_size: 0
input [
  {
    name: "input.1"
    data_type: TYPE_FP32
    dims: [ 1, 1, 32, 100 ]
  }
]
output [
  {
    name: "308"
    data_type: TYPE_FP32
    dims: [ 1, 26, 37 ]
  }
]
```
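If you manage many models, it may help to generate config.pbtxt instead of writing it by hand. The sketch below uses hypothetical helpers `make_config` and `write_config` that cover only the four fields described above; it reproduces the example config and refuses a `name` that does not match the model directory.

```python
import tempfile
from pathlib import Path


def tensor_block(kind, tensors):
    """Render an `input [...]` or `output [...]` list in protobuf text format."""
    entries = []
    for t in tensors:
        dims = ", ".join(str(d) for d in t["dims"])
        entries.append(
            "  {\n"
            f'    name: "{t["name"]}"\n'
            f"    data_type: {t['data_type']}\n"
            f"    dims: [ {dims} ]\n"
            "  }"
        )
    # elements of a repeated field in list syntax are separated by commas
    return f"{kind} [\n" + ",\n".join(entries) + "\n]"


def make_config(name, backend, max_batch_size, inputs, outputs):
    """Build config.pbtxt text containing only the fields discussed above."""
    return "\n".join([
        f'name: "{name}"',
        f'backend: "{backend}"',
        f"max_batch_size: {max_batch_size}",
        tensor_block("input", inputs),
        tensor_block("output", outputs),
    ]) + "\n"


def write_config(model_dir, **fields):
    """Write config.pbtxt, rejecting a `name` that differs from the directory name."""
    model_dir = Path(model_dir)
    if fields["name"] != model_dir.name:
        raise ValueError(f"name {fields['name']!r} != directory {model_dir.name!r}")
    path = model_dir / "config.pbtxt"
    path.write_text(make_config(**fields))
    return path


# The example config from this post, expressed as Python data.
TEXT_RECOGNITION = dict(
    name="text_recognition",
    backend="onnxruntime",
    max_batch_size=0,
    inputs=[{"name": "input.1", "data_type": "TYPE_FP32", "dims": [1, 1, 32, 100]}],
    outputs=[{"name": "308", "data_type": "TYPE_FP32", "dims": [1, 26, 37]}],
)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp) / "text_recognition"
        model_dir.mkdir()
        print(write_config(model_dir, **TEXT_RECOGNITION).read_text())
```

Keeping the check on `name` in the writer means a renamed folder fails early, before Triton ever tries to load the model.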

3. version

Versioning lets several versions of the same model exist side by side. Version directory names follow a rule: they must be numeric, and any directory whose name starts with 0 is ignored.
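The naming rule can be checked with a short sketch. `loadable_versions` and `latest_version` are hypothetical helpers; `latest_version` mirrors Triton's default version policy, which serves only the highest-numbered version unless `version_policy` in config.pbtxt says otherwise.

```python
import re
import tempfile
from pathlib import Path

# Positive integer without a leading zero: "1", "2", "10" qualify; "0", "01", "v3" do not.
VERSION_DIR = re.compile(r"[1-9][0-9]*")


def loadable_versions(model_dir):
    """Return the version numbers whose directory names follow the naming rule."""
    versions = [
        int(p.name)
        for p in Path(model_dir).iterdir()
        if p.is_dir() and VERSION_DIR.fullmatch(p.name)
    ]
    return sorted(versions)  # numeric order, so 10 sorts after 2


def latest_version(model_dir):
    """Highest valid version, or None if the model has no valid version directory."""
    versions = loadable_versions(model_dir)
    return versions[-1] if versions else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["1", "2", "10", "01", "0", "v3"]:
            (Path(tmp) / name).mkdir()
        print(loadable_versions(tmp))  # [1, 2, 10]
        print(latest_version(tmp))  # 10
```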

4. model file

One of Triton's strengths is that it can serve many kinds of models. Place the model file according to the model type, as follows.

TorchScript model
A single file, named model.pt by default. The name can be overridden with default_model_filename in the config file.
```
<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.pt
```

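ONNX model
A single file, or a directory containing multiple files (for example, a model whose weights are stored as separate external-data files). Either way it is named model.onnx by default, and as with model.pt the name can be overridden with default_model_filename. The directory form looks like this:

```
<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.onnx/
         model.onnx
```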
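To sanity-check file placement before starting the server, a hypothetical helper can compute where the model file is expected. It only covers the two backends in this post (`pytorch` for TorchScript, `onnxruntime` for ONNX) and honors a `default_model_filename` override; other backends use other default names, listed in the official docs.

```python
import tempfile
from pathlib import Path

# Default model file names for the two backends covered in this post only.
DEFAULT_MODEL_FILENAMES = {
    "pytorch": "model.pt",        # TorchScript
    "onnxruntime": "model.onnx",  # ONNX
}


def expected_model_path(repo, model_name, version, backend, default_model_filename=None):
    """Where the model should sit; default_model_filename replaces the default name."""
    filename = default_model_filename or DEFAULT_MODEL_FILENAMES[backend]
    return Path(repo) / model_name / str(version) / filename


def model_file_present(repo, model_name, version, backend, default_model_filename=None):
    """True if the expected model file (or, for ONNX, directory) exists."""
    path = expected_model_path(repo, model_name, version, backend, default_model_filename)
    if backend == "onnxruntime":
        # ONNX may be a single file or a directory with the same name
        return path.is_file() or path.is_dir()
    return path.is_file()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(expected_model_path(tmp, "text_detection", 1, "pytorch"))
        print(model_file_present(tmp, "text_detection", 1, "pytorch"))  # False
```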

Other model types:
https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md#model-files

The examples at the link below make this much easier to understand!
https://github.com/triton-inference-server/tutorials/tree/main/Conceptual_Guide/Part_1-model_deployment/model_repository
