[NMT] 7. CTranslate2

Judy · July 18, 2023

I originally planned to use TorchServe as the serving framework for the translation model,
but first I'll try a simple serving setup with CTranslate2, which is provided by OpenNMT.

Why CTranslate2?

CTranslate2 is a C++ and Python library for efficient inference with Transformer models.

When I tried to serve a translation model trained with OpenNMT and run inference (that is, generate translated sentences), I ran into a couple of problems.

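1. OpenNMT's default inference uses text files as its input/output format.
   What I wanted was to feed a single sentence straight into the model and get the translation back,
   so the input/output code would have to be modified.
2. Serving has to be fast without a GPU.
   I don't currently have a GPU of my own, so translation has to run on CPU,
   and in my experience CPU inference takes roughly 10 times as long as GPU inference.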

CTranslate2 solved both of these problems.
1. Inference takes and returns sentences rather than files
2. Fast inference is possible on CPU

The official documentation explains how this is achieved:

The project implements a custom runtime that applies many performance optimization techniques such as weights quantization, layers fusion, batch reordering, etc., to accelerate and reduce the memory usage of Transformer models on CPU and GPU.

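In other words, CTranslate2 serves the model after applying various optimization techniques to make it lighter,
and to do so it requires the translation model to be converted to the 'CTranslate2 format'.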
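
To give a feel for what "weights quantization" in the quote above means, here is a minimal sketch of symmetric int8 quantization in plain Python. It is only an illustration under simplifying assumptions (a single scale for the whole weight list, made-up weight values, and hypothetical function names `quantize_int8` and `dequantize`); it is not how CTranslate2 itself is implemented.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer representation."""
    return [q * scale for q in quantized]


# Made-up example weights, for illustration only
weights = [0.12, -0.5, 0.33, 0.0, 0.49]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each value now fits in 1 byte instead of 4, at the cost of a small rounding error
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error)
```

The rounding error per weight is bounded by half the scale, which is why quantization can shrink the model and speed up CPU inference while usually changing the output only slightly.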

Quickstart

1. Install the Python packages

Install CTranslate2 for serving the model, OpenNMT-py for the model itself,
and SentencePiece for tokenization.

pip install ctranslate2 OpenNMT-py==2.* sentencepiece
or
poetry add ctranslate2 OpenNMT-py==2.* sentencepiece
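
After installing, a quick sanity check can confirm that the three packages are visible to the interpreter. This is a small sketch; the helper name `check_installed` is hypothetical. Note that OpenNMT-py is imported under the module name `onmt`, not under its pip name.

```python
import importlib.util

# Import names differ from pip names: OpenNMT-py is imported as "onmt"
REQUIRED_MODULES = ("ctranslate2", "onmt", "sentencepiece")


def check_installed(modules=REQUIRED_MODULES):
    """Return {module_name: True/False} without importing the packages."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


for name, ok in check_installed().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```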

2. Prepare Transformer model trained with OpenNMT-py

The official documentation downloads a model pretrained with OpenNMT-py,
but I will use a model I trained myself.
Besides the trained translation model, the tokenizer models also need to be prepared.

model.pt
Source & Target language sentencepiece model
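
For context on why the tokenizer models are needed: the translation model works on subword pieces, not raw text, and SentencePiece marks the start of each word with "▁" (U+2581). The sketch below shows, in a simplified way, how such pieces map back to text. The pieces are made up and `pieces_to_text` is a hypothetical helper; the real `decode` method of SentencePiece handles more cases than this.

```python
WORD_BOUNDARY = "\u2581"  # the "▁" marker SentencePiece puts where a space was


def pieces_to_text(pieces):
    """Simplified detokenization: join the pieces and turn markers back into spaces."""
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


# Made-up pieces for illustration; real pieces depend on the trained tokenizer model
pieces = [WORD_BOUNDARY + "Hello", WORD_BOUNDARY + "wor", "ld", "!"]
print(pieces_to_text(pieces))
```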

3. Convert the model to the CTranslate2 format

Convert the translation model to the CTranslate2 format.

ct2-opennmt-py-converter --model_path <translation_model.pt> --output_dir <output_dir>
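
If the conversion needs to be scripted, the same command can be assembled and run from Python. This is a minimal sketch: the helper names and the example paths are hypothetical, and the optional `--quantization` flag of the converter is not used in this post; it is shown only as one possible option.

```python
import subprocess


def build_convert_command(model_path, output_dir, quantization=None):
    """Build the ct2-opennmt-py-converter command as an argument list."""
    cmd = [
        "ct2-opennmt-py-converter",
        "--model_path", model_path,
        "--output_dir", output_dir,
    ]
    if quantization:
        # Optional: store the weights in a smaller type, e.g. "int8"
        cmd += ["--quantization", quantization]
    return cmd


def convert(model_path, output_dir, quantization=None):
    """Run the converter; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_convert_command(model_path, output_dir, quantization), check=True)


# Example paths are placeholders; only print the command here instead of running it
print(" ".join(build_convert_command("ko-en.pt", "ct2_model", quantization="int8")))
```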

The converted model.bin is saved in <output_dir>.
I then rearranged the model files as follows.

├── nmt
│   └── model
│       ├── bin
│       │   └── ko-en
│       │       ├── config.json
│       │       ├── ko-en.pt (trained model)
│       │       ├── model.bin (file converted to the CTranslate2 format)
│       │       ├── source_vocabulary.json
│       │       └── target_vocabulary.json
│       └── sentencepiece
│           ├── sp_model.en (English SentencePiece tokenizer model)
│           └── sp_model.ko (Korean SentencePiece tokenizer model)
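
Before loading the model, it can help to verify that the converted directory contains the expected files. The sketch below assumes the file names shown in the tree above (they may differ across CTranslate2 versions); `missing_model_files` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the directory tree above; other versions may differ
EXPECTED_FILES = (
    "config.json",
    "model.bin",
    "source_vocabulary.json",
    "target_vocabulary.json",
)


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in expected if not (model_dir / name).is_file()]


missing = missing_model_files("nmt/model/bin/ko-en")
print("missing:", missing or "nothing")
```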

4. Translate texts with the Python API

Load all of the models and run inference.

app/service/translate.py

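import ctranslate2
import sentencepiece as spm

# NOTE: `crud` is this project's own DB module; its import is not shown in the post.


async def translate_text(params) -> str:
    # Translate the text and attach the result to params
    mt = await ctranslate(params)
    setattr(params, 'mt', mt)

    # Save the language pair and the sentences to the DB
    await crud.create_translate(params)

    return mt


async def ctranslate(params) -> str:
    # Load the converted CTranslate2 model for the source-target language pair
    translator = ctranslate2.Translator(
        f"nmt/model/bin/{params.sl}-{params.tl}", device="cpu")
    # Load the SentencePiece tokenizers for the source and target languages
    sp_sl = spm.SentencePieceProcessor(
        f"nmt/model/sentencepiece/sp_model.{params.sl}")
    sp_tl = spm.SentencePieceProcessor(
        f"nmt/model/sentencepiece/sp_model.{params.tl}")

    # Tokenize the input into subword pieces and translate it as a batch of one
    input_tokens = sp_sl.encode(params.text, out_type=str)
    results = translator.translate_batch([input_tokens])

    # Take the best hypothesis and detokenize it back into plain text
    output_tokens = results[0].hypotheses[0]
    output_text = sp_tl.decode(output_tokens)

    return output_text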
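
One thing to note about the code above is that it reloads the translator and both tokenizers on every request, and that `translate_batch` is a blocking call inside an `async` function. A possible variation, sketched below, caches the loaded models per language pair and runs the blocking work in a worker thread. The names `load_models`, `translate_sync`, and `ctranslate_cached` and the `loader` parameter are hypothetical; the paths follow the directory layout used in this post, and `asyncio.to_thread` assumes Python 3.9 or later.

```python
import asyncio
from functools import lru_cache


@lru_cache(maxsize=None)
def load_models(sl, tl):
    """Load the translator and tokenizers once per language pair."""
    # Imported lazily so this module can be imported without the packages installed
    import ctranslate2
    import sentencepiece as spm

    translator = ctranslate2.Translator(f"nmt/model/bin/{sl}-{tl}", device="cpu")
    sp_sl = spm.SentencePieceProcessor(f"nmt/model/sentencepiece/sp_model.{sl}")
    sp_tl = spm.SentencePieceProcessor(f"nmt/model/sentencepiece/sp_model.{tl}")
    return translator, sp_sl, sp_tl


def translate_sync(text, sl, tl, loader=load_models):
    """Blocking translation of one sentence using the (cached) models."""
    translator, sp_sl, sp_tl = loader(sl, tl)
    input_tokens = sp_sl.encode(text, out_type=str)
    results = translator.translate_batch([input_tokens])
    return sp_tl.decode(results[0].hypotheses[0])


async def ctranslate_cached(text, sl, tl, loader=load_models):
    """Run the blocking translation in a thread so the event loop stays responsive."""
    return await asyncio.to_thread(translate_sync, text, sl, tl, loader)
```

With this shape, `ctranslate(params)` could be replaced by `await ctranslate_cached(params.text, params.sl, params.tl)`, and only the first request for each language pair pays the model loading cost.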