Torch-TensorRT converts a PyTorch model into a TensorRT model.
TensorRT is NVIDIA's inference engine; it applies optimizations such as kernel fusion, graph optimization, and low-precision execution.
In this tutorial, converting a model from PyTorch to TensorRT™ involves the following general steps:
Build a PyTorch model using either of these two options:
Train a model in PyTorch
Get a pre-trained model from the PyTorch ModelZoo, another model repository, or directly from Deci’s SuperGradients, an open-source PyTorch-based deep learning training library.
Convert the PyTorch model to ONNX (a minimal export sketch follows this list).
Convert from ONNX to TensorRT™.
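A minimal sketch of the PyTorch → ONNX step, assuming a torchvision ResNet-50 and a fixed (1, 3, 224, 224) input (both are illustrative choices, not part of the official tutorial):

import torch
import torchvision

# Illustrative pre-trained model; any torch.nn.Module works here.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval()

# Dummy input that fixes the traced input shape.
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX; assumes the resnet50/ directory exists,
# and the path matches the trtexec commands further down.
torch.onnx.export(
    model,
    dummy_input,
    "resnet50/model.onnx",
    input_names=["input"],
    output_names=["output"],
)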
How to convert a Transformers model to TensorRT
The traditional(?) approach is to take a model trained in PyTorch, convert it to ONNX, and then convert the ONNX model to TensorRT (i.e., two conversion passes):
Train a model using PyTorch
Convert the model to ONNX format
Use NVIDIA TensorRT for inference
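Between steps 2 and 3, it is worth sanity-checking the exported ONNX file (a short sketch, assuming the onnx package is installed and the export path used above):

import onnx

# Load and structurally validate the exported graph.
model = onnx.load("resnet50/model.onnx")
onnx.checker.check_model(model)  # raises onnx.checker.ValidationError if the graph is malformed

# Input names are useful later for matching TensorRT bindings.
print([i.name for i in model.graph.input])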
However, with Torch-TensorRT (https://github.com/pytorch/TensorRT), you can compile a PyTorch model to a TensorRT engine directly, skipping the ONNX step.
There is another library, torch2trt (https://github.com/NVIDIA-AI-IOT/torch2trt), but it seems to be outdated: no updates since November 2022.
First, install the Python packages below:
pip3 install nvidia-pyindex
pip3 install nvidia-tensorrt
pip3 install torch-tensorrt==<VERSION> -f https://github.com/pytorch/TensorRT/releases/expanded_assets/
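A quick way to confirm the Torch-TensorRT install (a minimal check, mirroring the tensorrt version check further down):

import torch_tensorrt
print(torch_tensorrt.__version__)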
If you are using the DeepSpeed Docker image, these are already installed in the container.
Official repo example :
https://github.com/triton-inference-server/server/tree/main/docs/examples/stable_diffusion
Method 1. CLI TensorRT
Official Docs : https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#save-model
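In short, the CLI path boils down to a single trtexec call (a sketch; --fp16 is optional and enables half-precision kernels):

trtexec --onnx=resnet50/model.onnx --saveEngine=resnet_engine.trt --fp16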
Method 2. Python TensorRT
2-1. tensorrt
Installation
python3 -m pip install --upgrade tensorrt
Version check
import tensorrt
print(tensorrt.__version__)
assert tensorrt.Builder(tensorrt.Logger())  # fails here if TensorRT cannot create a Builder, i.e. the install is broken
Convert the model:
trtexec --onnx=resnet50/model.onnx --saveEngine=resnet_engine.trt
trtexec --onnx=resnet50/model.onnx --saveEngine=resnet_engine.plan
(.trt and .plan are interchangeable extensions for the same serialized engine; Triton Inference Server conventionally uses .plan.)
To tell trtexec where to find the ONNX model, pass:
--onnx=resnet50/model.onnx
To tell trtexec where to save the optimized TensorRT engine, pass:
--saveEngine=resnet_engine.trt
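Once the engine file exists, Python inference looks roughly like this. A sketch against the TensorRT 8.x API (newer releases replace execute_v2 with execute_async_v3), using torch CUDA tensors as device buffers; the 1×1000 output shape assumes the ResNet-50 example:

import tensorrt as trt
import torch

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Deserialize the engine built by trtexec above.
with open("resnet_engine.trt", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# torch CUDA tensors serve as device buffers (avoids a pycuda dependency).
input_buf = torch.randn(1, 3, 224, 224, device="cuda")
output_buf = torch.empty(1, 1000, device="cuda")

# execute_v2 takes device pointers ordered by binding index (TensorRT 8.x).
context.execute_v2([input_buf.data_ptr(), output_buf.data_ptr()])
print(output_buf.argmax(dim=1))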
2-2. Torch-TensorRT library
NVIDIA Tech Blog 1 : https://developer.nvidia.com/blog/estimating-depth-beyond-2d-using-custom-layers-on-tensorrt-and-onnx-models/
NVIDIA Tech Blog 2 : https://developer.nvidia.com/blog/accelerating-inference-up-to-6x-faster-in-pytorch-with-torch-tensorrt/
https://developer.nvidia.com/blog/optimizing-and-serving-models-with-nvidia-tensorrt-and-nvidia-triton/
Torch-TensorRT Docs : https://pytorch.org/TensorRT/
https://github.com/pytorch/TensorRT
Huggingface Docs (mostly a comparison between CUDAExecutionProvider and TensorrtExecutionProvider, though) : https://huggingface.co/docs/optimum/onnxruntime/usage_guides/gpu
Korean blog : https://velog.io/@pjs102793/Triton-Inference-Server%EC%97%90%EC%84%9C-TensorRT-Engine-Inference
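A minimal Torch-TensorRT sketch along the lines of the NVIDIA blog above (the API surface varies across Torch-TensorRT versions, so treat this as the shape of the workflow rather than exact code):

import torch
import torch_tensorrt
import torchvision

# Same illustrative ResNet-50 as in the ONNX path above.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1").eval().cuda()

# Compile directly from PyTorch; no ONNX intermediate step.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.half},  # allow FP16 kernels where possible
)

x = torch.randn(1, 3, 224, 224, device="cuda")
print(trt_model(x).shape)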