MLflow_2_Example 1 & Serving

정원석 · March 22, 2024

MLOps


1. Looking at the example code

https://github.com/mlflow/mlflow/tree/master/examples/sklearn_elasticnet_diabetes

# VM or Linux users
wget https://raw.githubusercontent.com/mlflow/mlflow/master/examples/sklearn_elasticnet_diabetes/linux/train_diabetes.py

# Mac users
wget https://raw.githubusercontent.com/mlflow/mlflow/master/examples/sklearn_elasticnet_diabetes/osx/train_diabetes.py

train_diabetes.py is one of the many examples that MLflow provides.
- It trains an ElasticNet model on the diabetes progression dataset bundled with scikit-learn, runs predict, and logs the resulting evaluation metrics to MLflow.
- The task: for 442 diabetes patients, use 10 independent variables (X) such as age, sex, and BMI to predict disease progression one year later (y).
A detailed analysis of the data and a detailed explanation of ElasticNet are omitted here.
- ElasticNet : Linear Regression + L1 Regularization + L2 Regularization
  - parameter
    - alpha : regularization coefficient
    - l1_ratio : ratio between L1 regularization and L2 regularization
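For reference, the objective that scikit-learn's ElasticNet minimizes shows how the two parameters interact. The notation is an addition to this post: n is the number of samples, w the coefficient vector, α is alpha, and ρ stands for l1_ratio.

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

With ρ = 1 only the L1 term remains (Lasso), and with ρ = 0 only the L2 term remains.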
Let's go through the code together.
- Pay attention to the parts related to MLflow.
  - mlflow.log_param
  - mlflow.log_metric
  - mlflow.sklearn.log_model
  - mlflow.log_artifact
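The logging calls can be sketched roughly as follows. This is not the contents of train_diabetes.py: the function names `parse_hyperparams` and `train_and_log`, the 0.05 defaults, the train/test split, and the `coefficients.txt` artifact are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def parse_hyperparams(argv, default_alpha=0.05, default_l1_ratio=0.05):
    """Read alpha and l1_ratio from positional CLI args, falling back to defaults."""
    alpha = float(argv[1]) if len(argv) > 1 else default_alpha
    l1_ratio = float(argv[2]) if len(argv) > 2 else default_l1_ratio
    return alpha, l1_ratio


def train_and_log(alpha, l1_ratio):
    """Train ElasticNet on the diabetes data and record the run in MLflow."""
    # Heavy imports stay inside the function so argument parsing works without them.
    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import ElasticNet
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    with mlflow.start_run():
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        model.fit(X_train, y_train)
        pred = model.predict(X_test)

        # Hyperparameters of this run.
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)

        # Evaluation metrics on the held-out split.
        mlflow.log_metric("rmse", mean_squared_error(y_test, pred) ** 0.5)
        mlflow.log_metric("mae", mean_absolute_error(y_test, pred))
        mlflow.log_metric("r2", r2_score(y_test, pred))

        # The fitted model, stored under the run's artifacts as "model".
        mlflow.sklearn.log_model(model, "model")

        # log_artifact uploads any local file; this coefficient dump is hypothetical.
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "coefficients.txt"
            path.write_text("\n".join(str(c) for c in model.coef_))
            mlflow.log_artifact(str(path))
```

Calling `train_and_log(*parse_hyperparams(sys.argv))` from a script (after `import sys`) would give the same positional-argument interface as the commands used later in this post.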

2. Running the example code

# Move to the same directory where mlflow ui was started
cd mlflow-tutorial

# Run the example code, then check that the run is recorded in MLflow
python train_diabetes.py

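In the MLflow UI, confirm that the run recorded its parameters, metrics, and artifacts, and that the model's pkl file was saved along with its metadata.

Next, run the script with a variety of parameters and check MLflow again.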

python train_diabetes.py  0.01 0.01
python train_diabetes.py  0.01 0.75
python train_diabetes.py  0.01 1.0
python train_diabetes.py  0.05 1.0
python train_diabetes.py  0.05 0.01
python train_diabetes.py  0.5 0.8
python train_diabetes.py  0.8 1.0
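The same sweep could also be driven from Python instead of typing each command. A minimal sketch, assuming train_diabetes.py sits in the current directory; `build_commands` and `run_sweep` are hypothetical helper names.

```python
import subprocess
import sys

# (alpha, l1_ratio) pairs taken from the commands above.
PAIRS = [
    (0.01, 0.01),
    (0.01, 0.75),
    (0.01, 1.0),
    (0.05, 1.0),
    (0.05, 0.01),
    (0.5, 0.8),
    (0.8, 1.0),
]


def build_commands(pairs, script="train_diabetes.py"):
    """Return one argv list per (alpha, l1_ratio) pair."""
    return [
        [sys.executable, script, str(alpha), str(l1_ratio)]
        for alpha, l1_ratio in pairs
    ]


def run_sweep(pairs, script="train_diabetes.py"):
    """Run the training script once per pair; stop at the first failure."""
    for cmd in build_commands(pairs, script):
        subprocess.run(cmd, check=True)
```

`run_sweep(PAIRS)` would then create one MLflow run per pair, using the same interpreter that runs the sweep.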

The MLflow UI then shows a screen listing every run, where metrics and parameters can be compared at a glance.

3. How MLflow stores data

cd mlruns/0
ls
# A large number of directories have been created.
# (Each opaque directory name is an MLflow run-id.)

# Enter any one of them.
cd <run-id>
ls

# There are directories such as artifacts, metrics, params, and tags, and the actual metadata of the MLflow run is stored inside them.
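To make that layout concrete, here is a minimal sketch that reads params and metrics straight from the directories. It assumes the local file store keeps one file per parameter (containing the value) and one file per metric (lines of `timestamp value step`). That on-disk format is an implementation detail that may differ between MLflow versions, and the function names are hypothetical.

```python
from pathlib import Path


def read_params(run_dir):
    """Each file under params/ is named after the parameter and holds its value."""
    params_dir = Path(run_dir) / "params"
    return {
        path.name: path.read_text().strip()
        for path in sorted(params_dir.iterdir())
        if path.is_file()
    }


def read_last_metrics(run_dir):
    """Each file under metrics/ holds one 'timestamp value step' line per point."""
    metrics = {}
    for path in sorted((Path(run_dir) / "metrics").iterdir()):
        if not path.is_file():
            continue
        lines = [ln for ln in path.read_text().splitlines() if ln.strip()]
        if lines:
            # Fields are timestamp, value, step; keep the most recent value.
            metrics[path.name] = float(lines[-1].split()[1])
    return metrics


def summarize_experiment(experiment_dir):
    """Map run-id -> (params, metrics) for every run directory in an experiment."""
    summary = {}
    for run_dir in sorted(Path(experiment_dir).iterdir()):
        if (
            run_dir.is_dir()
            and (run_dir / "params").is_dir()
            and (run_dir / "metrics").is_dir()
        ):
            summary[run_dir.name] = (read_params(run_dir), read_last_metrics(run_dir))
    return summary
```

For example, `summarize_experiment("mlruns/0")` would return one entry per run-id. For anything beyond inspection, the MLflow client API is the safer route, since it does not depend on the on-disk format.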

4. Serving example with MLflow

MLflow can also be used to serve a model with very little setup.

https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html

# General form
mlflow models serve -m $(pwd)/mlruns/0/<run-id>/artifacts/model -p <port>

# Example
mlflow models serve -m $(pwd)/mlruns/0/63d1a9cde7f84190a5634648467be195/artifacts/model -p 1234

Look up the run id of the model you want, choose a port, and run the mlflow models serve command.
- Put simply, serving a model means that its .predict() function becomes available as a REST API at 127.0.0.1:1234.
Now let's send a request to that server and check the result of predict().

To send a request, you need to know the format of the data that goes into the request body.

Let's check the columns and some sample rows of the diabetes data.

https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html

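import pandas as pd
from sklearn.datasets import load_diabetes

# Load the bundled diabetes dataset and list its feature names.
data = load_diabetes()
print(data.feature_names)

# Show the first rows of the feature matrix.
df = pd.DataFrame(data.data)
print(df.head())

# Target value of the first sample.
print(data.target[0])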

    ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
          0         1         2         3         4         5         6  \
0  0.038076  0.050680  0.061696  0.021872 -0.044223 -0.034821 -0.043401   
1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163  0.074412   
2  0.085299  0.050680  0.044451 -0.005671 -0.045599 -0.034194 -0.032356   
3 -0.089063 -0.044642 -0.011595 -0.036656  0.012191  0.024991 -0.036038   
4  0.005383 -0.044642 -0.036385  0.021872  0.003935  0.015596  0.008142   

          7         8         9  
0 -0.002592  0.019908 -0.017646  
1 -0.039493 -0.068330 -0.092204  
2 -0.002592  0.002864 -0.025930  
3  0.034309  0.022692 -0.009362  
4 -0.002592 -0.031991 -0.046641  

151.0

Let's call the POST /invocations API served at 127.0.0.1:1234.

curl -X POST -H "Content-Type:application/json" --data '{"columns":["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"],"data":[[0.038076, 0.050680,  0.061696,  0.021872, -0.044223, -0.034821, -0.043401, -0.002592,  0.019908, -0.017646]]}' http://127.0.0.1:1234/invocations

The prediction value comes back as the API response.
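The same request can be sketched in Python with only the standard library. The helper names `build_payload` and `invoke` are hypothetical. The optional `wrap_key` is an addition not covered in this post: MLflow 2.x scoring servers expect the split-oriented body nested under a key such as `dataframe_split`, so depending on the installed version the bare form shown in the curl command may be rejected.

```python
import json
import urllib.request

COLUMNS = ["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"]
FIRST_ROW = [0.038076, 0.050680, 0.061696, 0.021872, -0.044223,
             -0.034821, -0.043401, -0.002592, 0.019908, -0.017646]


def build_payload(columns, rows, wrap_key=None):
    """Build a pandas 'split'-oriented body; optionally nest it under wrap_key."""
    body = {"columns": list(columns), "data": [list(row) for row in rows]}
    return {wrap_key: body} if wrap_key else body


def invoke(url, payload, timeout=10):
    """POST the JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from the previous step running, `invoke("http://127.0.0.1:1234/invocations", build_payload(COLUMNS, [FIRST_ROW]))` mirrors the curl command, and passing `wrap_key="dataframe_split"` produces the nested form.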

Now let's send a POST /invocations request whose data does not match the expected size.

curl -X POST -H "Content-Type:application/json" --data '{"columns":["Age", "Sex", "Body mass index", "Average blood pressure", "S1", "S2", "S3", "S4", "S5", "S6", "S7"],"data":[[0.038076, 0.050680,  0.061696,  0.021872, -0.044223, -0.034821, -0.043401, -0.002592,  0.019908, -0.017646]]}' http://127.0.0.1:1234/invocations

The server returns an error saying that the data size does not fit what predict expects.
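A mismatch like this can also be caught on the client before the request is sent. A minimal sketch, assuming the caller knows the 10 feature names the model was trained on; `check_split_payload` is a hypothetical helper, not part of MLflow.

```python
EXPECTED_COLUMNS = ["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"]


def check_split_payload(columns, rows, expected_columns=EXPECTED_COLUMNS):
    """Return a list of problems; an empty list means the shapes agree."""
    problems = []
    if len(columns) != len(expected_columns):
        problems.append(
            f"expected {len(expected_columns)} columns, got {len(columns)}"
        )
    for index, row in enumerate(rows):
        if len(row) != len(columns):
            problems.append(
                f"row {index}: {len(row)} values for {len(columns)} columns"
            )
    return problems
```

For the request above (11 column names, 10 values) this reports both the extra column and the row whose length does not match the column list.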

Serving with MLflow is possible, but later posts will cover serving with tools such as Flask and seldon-core.
