[Translation] MLOps: Continuous delivery and automation pipelines in machine learning (6) MLOps level 2: CI/CD pipeline automation

Hayley · May 29, 2022

Hi, this is Hayley! :) Work has been busy and a lot has been going on personally, so it has been a while since my last post.

Today I'll continue the series I've been working on: a translation of Google's article MLOps: Continuous delivery and automation pipelines in machine learning.

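If you spot a mistranslation or have a question, please leave a comment!
You can also read a Korean version of the original by changing the language setting at the link above (if yours was already set to Korean, you may have wondered why I'm translating this at all 🤔), so if you want the "official" translation, please check the Korean version there. (When I read it, it felt like it had been produced with Google Translate. That said, the quality of the translation wasn't bad!)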

MLOps level 2: CI/CD pipeline automation

For a rapid and reliable update of the pipelines in production, you need a robust automated CI/CD system. This automated CI/CD system lets your data scientists rapidly explore new ideas around feature engineering, model architecture, and hyperparameters. They can implement these ideas and automatically build, test, and deploy the new pipeline components to the target environment.

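To update pipelines in production quickly and reliably, you need a robust automated CI/CD system. This automated CI/CD system lets data scientists quickly explore new ideas about feature engineering (creating derived variables), model architecture, and hyperparameters. Once they implement these ideas, the newly implemented pipeline components can be automatically built, tested, and deployed to the target environment (production).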

The following diagram shows the implementation of the ML pipeline using CI/CD, which has the characteristics of the automated ML pipelines setup plus the automated CI/CD routines.

The diagram below shows an implementation of the ML pipeline using CI/CD. This pipeline has the characteristics of the automated ML pipeline setup seen in level 1 and, in addition, automated CI/CD routines.

Figure 4. CI/CD and automated ML pipeline.

This MLOps setup includes the following components:
This MLOps setup includes the components below:

Source control
Test and build services
Deployment services
Model registry (translator's note: a registry is a place where records are kept)
Feature store
ML metadata store
ML pipeline orchestrator

Characteristics

The following diagram shows the stages of the ML CI/CD automation pipeline:
The diagram below shows the stages of the ML CI/CD automation pipeline:

Figure 5. Stages of the CI/CD automated ML pipeline.

The pipeline consists of the following stages:
The pipeline is made up of the following stages:

Development and experimentation: You iteratively try out new ML algorithms and new modeling where the experiment steps are orchestrated. The output of this stage is the source code of the ML pipeline steps that are then pushed to a source repository.
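Development and experimentation: You repeatedly experiment with new ML algorithms and new modeling approaches, and the experiment steps are orchestrated (translator's note: the steps are connected, for example as a DAG, and run by a tool such as Kubeflow). The output of this stage is the source code of each ML pipeline step, which is pushed to a source code repository.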
Pipeline continuous integration: You build source code and run various tests. The outputs of this stage are pipeline components (packages, executables, and artifacts) to be deployed in a later stage.
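Pipeline continuous integration (CI): You build the source code and run various tests. The output of this stage is the set of pipeline components to be deployed in a later stage: packages, executables (runnable programs), and artifacts (objects produced by running the pipeline, such as models).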
Pipeline continuous delivery: You deploy the artifacts produced by the CI stage to the target environment. The output of this stage is a deployed pipeline with the new implementation of the model.
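Pipeline continuous delivery (CD): You deploy the artifacts (objects) produced in the CI stage to the target environment. The result of this stage is a deployed pipeline that reflects the new implementation of the model.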
Automated triggering: The pipeline is automatically executed in production based on a schedule or in response to a trigger. The output of this stage is a trained model that is pushed to the model registry.
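Automated triggering: The pipeline runs automatically in production, either on a schedule or in response to a specific trigger. The output of this stage is a trained model that is pushed to the model registry.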
Model continuous delivery: You serve the trained model as a prediction service for the predictions. The output of this stage is a deployed model prediction service.
Model continuous delivery (CD): You serve the trained model as a prediction service. The output of this stage is a deployed model prediction service.
Monitoring: You collect statistics on the model performance based on live data. The output of this stage is a trigger to execute the pipeline or to execute a new experiment cycle.
Monitoring: You collect statistics on model performance based on live data. The output of this stage is a trigger either to rerun the pipeline (for retraining) or to start a new experiment cycle.
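
To make the last three stages a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and not from the Google article: the in-memory `ModelRegistry`, the toy "training" that just averages numbers, and the two accuracy thresholds are assumptions chosen only for illustration. It shows a triggered pipeline run pushing a model to a registry, and a monitoring step deciding which trigger (if any) to emit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelRegistry:
    """Hypothetical in-memory stand-in for a model registry."""
    versions: List[Dict[str, float]] = field(default_factory=list)

    def push(self, model: Dict[str, float]) -> int:
        """Store a trained model and return its 1-based version number."""
        self.versions.append(model)
        return len(self.versions)


def run_training_pipeline(data: List[float], registry: ModelRegistry) -> int:
    """Automated triggering stage: train, then push the result to the registry."""
    model = {"mean": sum(data) / len(data)}  # toy "training", for illustration only
    return registry.push(model)


def monitoring_decision(live_accuracy: float,
                        retrain_below: float = 0.90,
                        experiment_below: float = 0.75) -> Optional[str]:
    """Monitoring stage: turn a live metric into a trigger.

    The thresholds are assumptions; the article does not prescribe any values.
    """
    if live_accuracy < experiment_below:
        return "new_experiment"   # performance degraded badly: back to experimentation
    if live_accuracy < retrain_below:
        return "rerun_pipeline"   # mild degradation: retrain with fresh data
    return None                   # healthy: no trigger


registry = ModelRegistry()
version = run_training_pipeline([1.0, 2.0, 3.0], registry)
trigger = monitoring_decision(live_accuracy=0.85)
```

In a real setup the registry and the trigger would be provided by the platform's own services rather than by hand-written classes like these.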

The data analysis step is still a manual process for data scientists before the pipeline starts a new iteration of the experiment. The model analysis step is also a manual process.
The data analysis step is still manual: data scientists carry it out before the pipeline starts a new iteration of the experiment. The model analysis step is also manual.

Continuous integration

In this setup, the pipeline and its components are built, tested, and packaged when new code is committed or pushed to the source code repository. Besides building packages, container images, and executables, the CI process can include the following tests:
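In this setup, the pipeline and its components are automatically built, tested, and packaged when new code is committed or pushed to the source code repository. Besides building packages, container images, and executables, the CI process can run the following tests: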

Unit testing your feature engineering logic.
Unit test the feature engineering logic.
Unit testing the different methods implemented in your model. For example, you have a function that accepts a categorical data column and you encode the function as a one-hot feature.
Unit test the various methods implemented in the model. For example, there may be a function that takes a categorical data column and one-hot encodes it.
Testing that your model training converges (that is, the loss of your model goes down by iterations and overfits a few sample records).
Test that model training converges (that is, the model's loss goes down over training iterations and the model overfits a few sample records).
Testing that your model training doesn't produce NaN values due to dividing by zero or manipulating small or large values.
Test that model training does not produce NaN values from dividing by zero or from handling very small or very large values.
Testing that each component in the pipeline produces the expected artifacts.
Test that each component in the pipeline produces the expected artifacts.
Testing integration between pipeline components.
Test the integration between pipeline components.
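
A few of the tests listed above could look roughly like the sketch below. It uses only the Python standard library. The function names (`one_hot_encode`, `train_linear`), the toy data, and the learning rate are all assumptions made for this example, not anything defined in the article. The tests are written as plain functions with `assert`, so they could be collected by a test runner such as pytest or simply called directly.

```python
import math
from typing import List, Sequence, Tuple


def one_hot_encode(column: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Encode a categorical column as one-hot rows (categories sorted for stability)."""
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}
    rows = [[1 if index[v] == i else 0 for i in range(len(categories))]
            for v in column]
    return categories, rows


def train_linear(xs: Sequence[float], ys: Sequence[float],
                 steps: int = 500, lr: float = 0.05) -> List[float]:
    """Fit y = w*x + b by gradient descent and return the MSE loss at each step."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)
        w -= lr * 2 * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * 2 * sum(errors) / n
    return losses


def test_one_hot_encoding() -> None:
    categories, rows = one_hot_encode(["red", "green", "red"])
    assert categories == ["green", "red"]
    assert rows == [[0, 1], [1, 0], [0, 1]]


def test_training_converges_on_a_few_samples() -> None:
    # Four records generated from y = 2x + 1; the model should overfit them.
    losses = train_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
    assert losses[-1] < losses[0]
    assert losses[-1] < 1e-3


def test_training_produces_no_nan() -> None:
    losses = train_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
    assert all(math.isfinite(loss) for loss in losses)
```

The convergence test deliberately uses only a handful of records: the point is not model quality but catching a training loop that is broken outright.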

Continuous delivery

In this level, your system continuously delivers new pipeline implementations to the target environment that in turn delivers prediction services of the newly trained model. For rapid and reliable continuous delivery of pipelines and models, you should consider the following:
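At MLOps level 2, the ML system continuously delivers new pipeline implementations to the target environment, and that target environment in turn delivers a prediction service that uses the newly trained model. For rapid and reliable continuous delivery of pipelines and models, consider the following: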

Verifying the compatibility of the model with the target infrastructure before you deploy your model. For example, you need to verify that the packages that are required by the model are installed in the serving environment, and that the memory, compute, and accelerator resources are available.
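Before deploying the model, verify that it is compatible with the target infrastructure. For example, check that the packages the model requires are installed in the serving environment and that memory, compute, and accelerator resources are available.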
Testing the prediction service by calling the service API with the expected inputs, and making sure that you get the response that you expect. This test usually captures problems that might occur when you update the model version and it expects a different input.
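Test the prediction service by calling the service API with expected inputs and confirming that the expected response comes back. This test usually catches problems that can arise when a model version update changes the shape of the input data.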
Testing prediction service performance, which involves load testing the service to capture metrics such as queries per second (QPS) and model latency.
Test prediction service performance. This includes load testing the service to measure metrics such as QPS (queries per second) and model latency.
Validating the data either for retraining or batch prediction.
Validate the data needed for retraining or batch prediction.
Verifying that models meet the predictive performance targets before they are deployed.
Verify that the model meets its predictive performance targets before it is deployed.
Automated deployment to a test environment, for example, a deployment that is triggered by pushing code to the development branch.
Consider automated deployment to a test environment, for example a deployment triggered when code is pushed to the develop branch.
Semi-automated deployment to a pre-production environment, for example, a deployment that is triggered by merging code to the main branch after reviewers approve the changes.
Consider semi-automated deployment to a pre-production (staging) environment, for example a deployment triggered by merging code into the main branch after reviewers approve the changes.
Manual deployment to a production environment after several successful runs of the pipeline on the pre-production environment.
Consider deploying to the production environment manually, after confirming that the pipeline has run successfully several times in the pre-production (staging) environment.
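
Three of the checks above (calling the service with an expected input, load testing, and the performance gate) can be sketched as follows. This is only an illustration under stated assumptions: `predict` is a hypothetical local function standing in for a real prediction service API, the `age` and `income` fields and the `score` response are invented, and a real load test would call the deployed endpoint with a dedicated tool rather than loop in-process.

```python
import statistics
import time
from typing import Callable, Dict

Service = Callable[[Dict[str, float]], Dict[str, float]]

SAMPLE_INPUT = {"age": 40.0, "income": 50_000.0}  # hypothetical expected input


def predict(features: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical stand-in for the deployed prediction service."""
    score = 0.3 * features["age"] / 100 + 0.7 * features["income"] / 100_000
    return {"score": min(max(score, 0.0), 1.0)}


def check_contract(service: Service) -> bool:
    """Call the service with the expected input and validate the response shape."""
    try:
        response = service(SAMPLE_INPUT)
    except KeyError:
        # A new model version that expects different input fields fails here.
        return False
    return set(response) == {"score"} and 0.0 <= response["score"] <= 1.0


def load_test(service: Service, n_requests: int = 1000) -> Dict[str, float]:
    """Measure rough throughput (QPS) and median latency for sequential calls."""
    latencies = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        service(SAMPLE_INPUT)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {"qps": n_requests / elapsed,
            "median_latency_s": statistics.median(latencies)}


def meets_target(metric_value: float, target: float) -> bool:
    """Performance gate: only models at or above the target may be deployed."""
    return metric_value >= target


def predict_v2(features: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical updated model that expects an extra input field."""
    return {"score": features["zip_code"] * 0.0}
```

Here `check_contract(predict)` passes, while `check_contract(predict_v2)` fails because the updated model expects an input the caller does not send, which is the kind of problem the API test is meant to surface before deployment.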

To summarize, implementing ML in a production environment doesn't only mean deploying your model as an API for prediction. Rather, it means deploying an ML pipeline that can automate the retraining and deployment of new models. Setting up a CI/CD system enables you to automatically test and deploy new pipeline implementations. This system lets you cope with rapid changes in your data and business environment. You don't have to immediately move all of your processes from one level to another. You can gradually implement these practices to help improve the automation of your ML system development and production.
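To summarize, implementing ML in a production environment does not only mean deploying a model as an API for prediction. Rather, it means deploying an ML pipeline that can automate the retraining and deployment of new models. Setting up a CI/CD system lets you automatically test and deploy new pipeline implementations. Such a system lets you respond to rapid changes in your data and business environment. You do not have to move all of your current processes to a higher level right away. By implementing these practices gradually, you can improve the automation of your ML system development and operations.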

To sum up in my own words,

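At level 2, which you could call the final boss (?) of MLOps, the model training pipeline is not simply deployed as is; it goes through automated build and test in a CI/CD pipeline before being deployed. CI (continuous integration) means that when source code is changed and pushed to the repository, it goes through automated build and test and is turned into a package (or a container image, or an executable). At this point you can run unit tests (on feature engineering, variable encoding, and other functions implemented for modeling) and test whether model training converges, whether the model avoids returning NaN, and whether each component produces the expected artifacts. CD (continuous delivery) means that once the ML pipeline has been packaged by CI, it is deployed quickly and reliably through development and staging to production, and finally the model trained by the newly deployed ML pipeline is deployed as a prediction service. CD also requires verification, such as checking that prediction API requests and responses are as expected, load testing the prediction service, and checking that predictive performance exceeds the target. Deployment to the development environment can be automatic, but deployment to staging and to production should happen only after this verification, semi-automatically and manually respectively.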
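
The promotion rule in that last sentence (automatic, then semi-automatic, then manual) can be written down as a small policy function. This is a sketch under assumptions, not a real deployment tool: the environment names follow the original article ("test", "pre-production", "production"), while `MIN_PREPROD_RUNS = 3` and the parameter names are invented here, since the article only says "several" successful runs.

```python
from enum import Enum


class Mode(Enum):
    AUTOMATED = "automated"
    SEMI_AUTOMATED = "semi-automated"
    MANUAL = "manual"


# Deployment mode per environment, following the description in the article.
DEPLOYMENT_MODE = {
    "test": Mode.AUTOMATED,
    "pre-production": Mode.SEMI_AUTOMATED,
    "production": Mode.MANUAL,
}

MIN_PREPROD_RUNS = 3  # assumption: the article only says "several" successful runs


def may_deploy(environment: str, *, checks_passed: bool,
               review_approved: bool = False,
               successful_preprod_runs: int = 0,
               operator_confirmed: bool = False) -> bool:
    """Decide whether a deployment to the given environment may proceed."""
    if not checks_passed:
        return False  # failed verification blocks every environment
    mode = DEPLOYMENT_MODE[environment]
    if mode is Mode.AUTOMATED:
        return True  # e.g. triggered by a push to the development branch
    if mode is Mode.SEMI_AUTOMATED:
        return review_approved  # e.g. merge to main after reviewers approve
    # Manual: needs a track record in pre-production and a human decision.
    return successful_preprod_runs >= MIN_PREPROD_RUNS and operator_confirmed
```

For example, `may_deploy("pre-production", checks_passed=True)` returns `False` until `review_approved=True` is passed, and a production deployment additionally needs both the run count and an explicit operator confirmation.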

That was a whirlwind of a translation. I think I need a bit more time to organize everything, so in the next post I'll summarize the whole article.

See you in the next post! 😁
