Torch-TensorRT converts a PyTorch model into a TensorRT model.
TensorRT is NVIDIA's inference engine; it applies optimizations such as kernel fusion, graph optimization, and low-precision execution.
In this tutorial, converting a model from PyTorch to TensorRT™ involves the following general steps:
Build a PyTorch model using either of these two options:
Train a model in PyTorch
Get a pre-trained model from the PyTorch ModelZoo, another model repository, or directly from Deci’s SuperGradients, an open-source PyTorch-based deep learning training library.
Convert the PyTorch model to ONNX (a minimal export sketch follows this list).
Convert from ONNX to TensorRT™.
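A minimal sketch of the PyTorch → ONNX step, assuming a torchvision ResNet-50 and a fixed (1, 3, 224, 224) input (both are illustrative choices, not part of the official tutorial):

import torch
import torchvision

# Illustrative pre-trained model; any torch.nn.Module works here.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()

# Dummy input that fixes the traced input shape.
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX; assumes the resnet50/ directory exists,
# and the path matches the trtexec commands further down.
torch.onnx.export(
    model,
    dummy_input,
    "resnet50/model.onnx",
    input_names=["input"],
    output_names=["output"],
)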
How to convert a Transformers model to TensorRT
The traditional(?) approach is to take a model trained in PyTorch, convert it to ONNX, and then convert the ONNX model to TensorRT (i.e., two conversion passes):
Train a model using PyTorch
Convert the model to ONNX format
Use NVIDIA TensorRT for inference
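Between steps 2 and 3, it is worth sanity-checking the exported ONNX file (a short sketch, assuming the onnx package is installed and the export path used above):

import onnx

# Load and structurally validate the exported graph.
model = onnx.load("resnet50/model.onnx")
onnx.checker.check_model(model)  # raises onnx.checker.ValidationError if the graph is malformed

# Input names are useful later for matching TensorRT bindings.
print([i.name for i in model.graph.input])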
However, with Torch-TensorRT (https://github.com/pytorch/TensorRT), you can compile a PyTorch model to a TensorRT engine directly, skipping the ONNX step.
There is another library, torch2trt (https://github.com/NVIDIA-AI-IOT/torch2trt), but it seems to be outdated: no updates since November 2022.
First, install the Python packages below:
pip3 install nvidia-pyindex
pip3 install nvidia-tensorrt
pip3 install torch-tensorrt==<VERSION> -f https://github.com/pytorch/TensorRT/releases/expanded_assets/
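A quick way to confirm the Torch-TensorRT install (a minimal check, mirroring the tensorrt version check further down):

import torch_tensorrt
print(torch_tensorrt.__version__)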
If you are using the DeepSpeed Docker image, these are already installed in the container.
Official repo example :
https://github.com/triton-inference-server/server/tree/main/docs/examples/stable_diffusion
Method 1. CLI TensorRT
Official Docs : https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#save-model
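In short, the CLI path boils down to a single trtexec call (a sketch; --fp16 is optional and enables half-precision kernels):

trtexec --onnx=resnet50/model.onnx --saveEngine=resnet_engine.trt --fp16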
Method 2. Python TensorRT
2-1. tensorrt
Installation
python3 -m pip install --upgrade tensorrt
Version check
import tensorrt
print(tensorrt.__version__)
assert tensorrt.Builder(tensorrt.Logger())  # fails here if TensorRT cannot create a Builder, i.e. the install is broken
Convert the model:
trtexec --onnx=resnet50/model.onnx --saveEngine=resnet_engine.trt
trtexec --onnx=resnet50/model.onnx --saveEngine=resnet_engine.plan
(.trt and .plan are interchangeable extensions for the same serialized engine; Triton Inference Server conventionally uses .plan.)
To tell trtexec where to find the ONNX model, pass:
--onnx=resnet50/model.onnx
To tell trtexec where to save the optimized TensorRT engine, pass:
--saveEngine=resnet_engine.trt
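Once the engine file exists, Python inference looks roughly like this. A sketch against the TensorRT 8.x API (newer releases replace execute_v2 with execute_async_v3), using torch CUDA tensors as device buffers; the 1×1000 output shape assumes the ResNet-50 example:

import tensorrt as trt
import torch

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Deserialize the engine built by trtexec above.
with open("resnet_engine.trt", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# torch CUDA tensors serve as device buffers (avoids a pycuda dependency).
input_buf = torch.randn(1, 3, 224, 224, device="cuda")
output_buf = torch.empty(1, 1000, device="cuda")

# execute_v2 takes device pointers ordered by binding index (TensorRT 8.x).
context.execute_v2([input_buf.data_ptr(), output_buf.data_ptr()])
print(output_buf.argmax(dim=1))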
2-2. Torch-TensorRT library
NVIDIA Tech Blog 1 : https://developer.nvidia.com/blog/estimating-depth-beyond-2d-using-custom-layers-on-tensorrt-and-onnx-models/
NVIDIA Tech Blog 2 : https://developer.nvidia.com/blog/accelerating-inference-up-to-6x-faster-in-pytorch-with-torch-tensorrt/
https://developer.nvidia.com/blog/optimizing-and-serving-models-with-nvidia-tensorrt-and-nvidia-triton/
Torch-TensorRT Docs : https://pytorch.org/TensorRT/
https://github.com/pytorch/TensorRT
Huggingface Docs (mostly a comparison between CUDAExecutionProvider and TensorrtExecutionProvider, though) : https://huggingface.co/docs/optimum/onnxruntime/usage_guides/gpu
Korean blog : https://velog.io/@pjs102793/Triton-Inference-Server%EC%97%90%EC%84%9C-TensorRT-Engine-Inference
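A minimal Torch-TensorRT sketch along the lines of the NVIDIA blog above (the API surface varies across Torch-TensorRT versions, so treat this as the shape of the workflow rather than exact code):

import torch
import torch_tensorrt
import torchvision

# Same illustrative ResNet-50 as in the ONNX path above.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval().cuda()

# Compile directly from PyTorch; no ONNX intermediate step.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.half},  # allow FP16 kernels where possible
)

x = torch.randn(1, 3, 224, 224, device="cuda")
print(trt_model(x).shape)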