What is Continuous Training (CT) in MLOps and what steps are needed to achieve it?
CT is the process of automatically retraining ML Models in Production Environments in response to a specific trigger. Let's look at some prerequisites for this:
1️⃣ Automation of ML Pipelines.
👉 Pipelines are orchestrated.
👉 Each pipeline step is developed independently and can run on a different technology stack.
👉 Pipelines are treated as code artifacts.
✅ You deploy Pipelines instead of Model Artifacts, which enables Continuous Training in production.
✅ Reuse of components allows for rapid experimentation.
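To make "Pipelines as code" concrete, here is a minimal sketch in plain Python. It is not tied to any orchestrator; the step names, the `run_pipeline` helper and the toy "model" are all hypothetical and only illustrate that the deployed unit is the ordered list of steps rather than a trained artifact.

```python
from typing import Any, Callable, Dict, List, Optional

Context = Dict[str, Any]
Step = Callable[[Context], Context]


def load_data(ctx: Context) -> Context:
    # Hypothetical stand-in for reading a training dataset.
    return {**ctx, "rows": [1.0, 2.0, 3.0, 4.0]}


def train(ctx: Context) -> Context:
    # Toy "model": just the mean of the training rows.
    rows = ctx["rows"]
    return {**ctx, "model": {"mean": sum(rows) / len(rows)}}


def run_pipeline(steps: List[Step], ctx: Optional[Context] = None) -> Context:
    # Each step receives the context produced by the previous step.
    ctx = dict(ctx or {})
    for step in steps:
        ctx = step(ctx)
    return ctx


# The pipeline definition itself is the versioned, deployable artifact.
TRAINING_PIPELINE: List[Step] = [load_data, train]

result = run_pipeline(TRAINING_PIPELINE)
```

Because every step has the same signature, steps can be reused across pipelines or swapped out one at a time during experimentation.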
2️⃣ Introduction of strict Data and Model Validation steps in the ML Pipeline.
👉 Data is validated before training the Model. If inconsistencies are found, the Pipeline is aborted.
👉 The Model is validated after training. Only after it passes validation is it handed over for deployment.
✅ Short-circuiting the Pipeline allows for safe CT in production.
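The two validation gates could look roughly like this. It is a sketch under assumptions: the checks (non-empty data, no missing values, a mean-absolute-error threshold), the `PipelineAborted` exception and the toy mean model are hypothetical placeholders for whatever your real pipeline validates.

```python
from typing import Dict, List, Optional


class PipelineAborted(Exception):
    """Raised to short-circuit the pipeline when a validation gate fails."""


def validate_data(rows: List[Optional[float]]) -> None:
    # Gate 1: runs before training.
    if not rows:
        raise PipelineAborted("no training rows")
    if any(r is None for r in rows):
        raise PipelineAborted("missing values in training data")


def train_model(rows: List[float]) -> Dict[str, float]:
    # Toy model: predicts the mean of the training rows.
    return {"mean": sum(rows) / len(rows)}


def validate_model(model: Dict[str, float], holdout: List[float], max_mae: float) -> None:
    # Gate 2: runs after training, before the model is handed to deployment.
    mae = sum(abs(y - model["mean"]) for y in holdout) / len(holdout)
    if mae > max_mae:
        raise PipelineAborted(f"model MAE {mae:.2f} exceeds {max_mae:.2f}")


def run_training(rows: List[Optional[float]], holdout: List[float],
                 max_mae: float = 1.0) -> Dict[str, float]:
    validate_data(rows)
    model = train_model(rows)
    validate_model(model, holdout, max_mae)
    return model  # only reached when both gates pass
```

If either gate raises, nothing is returned, so a bad dataset or a degraded model never reaches the deployment step.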
3️⃣ Introduction of an ML Metadata Store.
👉 Any Metadata related to ML artifact creation is tracked here.
👉 We also track the performance of the ML Model.
✅ Experiments become reproducible and comparable with each other.
✅ The Model Registry acts as glue between training and deployment pipelines.
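As an illustration of what gets tracked, here is a tiny in-memory stand-in for a metadata store. The `MetadataStore` and `RunRecord` names are hypothetical and this is not the API of any real tracking tool; it only shows parameters, a data fingerprint and metrics being recorded per run so that runs can be compared.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class RunRecord:
    run_id: str
    params: Dict[str, Any]
    data_fingerprint: str
    metrics: Dict[str, float]


class MetadataStore:
    """In-memory sketch; a real store would persist records."""

    def __init__(self) -> None:
        self._runs: Dict[str, RunRecord] = {}

    def log_run(self, run_id: str, params: Dict[str, Any],
                data: List[float], metrics: Dict[str, float]) -> RunRecord:
        # Hash the training data so a run can be tied to the exact dataset used.
        payload = json.dumps(data, sort_keys=True).encode("utf-8")
        record = RunRecord(run_id, params, hashlib.sha256(payload).hexdigest(), metrics)
        self._runs[run_id] = record
        return record

    def best_run(self, metric: str, lower_is_better: bool = True) -> RunRecord:
        # Compare runs on a single tracked metric.
        pick = min if lower_is_better else max
        return pick(self._runs.values(), key=lambda r: r.metrics[metric])
```

A deployment pipeline could then ask for `best_run("mae")` and promote that run's model, which is roughly the "glue" role described above.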
4️⃣ Different Pipeline triggers in production.
👉 Ad-hoc.
👉 Cron.
👉 Reactive to Metrics produced by the Model Monitoring System.
👉 Arrival of New Data.
✅ This is where Continuous Training is actually triggered.
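The four trigger types can be sketched as a single decision function. Everything here is an assumption for illustration: the `TriggerState` fields, the weekly schedule, the accuracy floor and the new-row threshold are hypothetical values, and in practice each signal would come from your scheduler, monitoring system or data platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TriggerState:
    manual_request: bool          # someone asked for a retrain (ad-hoc)
    last_trained_at: datetime
    now: datetime
    monitored_accuracy: float     # metric reported by model monitoring
    new_rows_since_training: int  # volume of newly arrived data


def retrain_reason(state: TriggerState,
                   schedule: timedelta = timedelta(days=7),
                   min_accuracy: float = 0.90,
                   min_new_rows: int = 1000) -> Optional[str]:
    """Return the name of the first trigger that fires, or None."""
    if state.manual_request:
        return "ad-hoc"
    if state.now - state.last_trained_at >= schedule:
        return "cron"
    if state.monitored_accuracy < min_accuracy:
        return "metric"
    if state.new_rows_since_training >= min_new_rows:
        return "new-data"
    return None
```

Starting with only the first two branches enabled matches the advice to rely on ad-hoc and scheduled retraining before automating the reactive triggers.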
5️⃣ Introduction of a Feature Store (Optional).
👉 Avoid work duplication when defining features.
👉 Reduce the risk of Training/Serving Skew.
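The core idea behind avoiding Training/Serving Skew is that a feature is defined once and the same definition is used on both paths. The sketch below shows only that idea with a hypothetical in-process registry; a real Feature Store also handles storage, freshness and point-in-time lookups, none of which is modeled here.

```python
from typing import Any, Callable, Dict, List

FeatureFn = Callable[[Dict[str, Any]], Any]

# Hypothetical registry: one definition per feature, shared by all consumers.
FEATURE_REGISTRY: Dict[str, FeatureFn] = {}


def feature(name: str) -> Callable[[FeatureFn], FeatureFn]:
    def register(fn: FeatureFn) -> FeatureFn:
        FEATURE_REGISTRY[name] = fn
        return fn
    return register


@feature("amount_bucket")
def amount_bucket(raw: Dict[str, Any]) -> int:
    # Bucket the amount into steps of 100, capped at bucket 9.
    return min(int(raw["amount"] // 100), 9)


@feature("is_weekend")
def is_weekend(raw: Dict[str, Any]) -> bool:
    # Assumes weekday is 0 (Monday) to 6 (Sunday).
    return raw["weekday"] >= 5


def build_features(raw: Dict[str, Any], names: List[str]) -> Dict[str, Any]:
    # Called by BOTH the training job and the online prediction service.
    return {name: FEATURE_REGISTRY[name](raw) for name in names}
```

Since training and serving both go through `build_features`, a change to a feature definition lands on both sides at once instead of drifting apart.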
My thoughts on CT:
➡️ Introducing CT is not straightforward and you should approach it iteratively. The following could be good Quarterly Goals to set:
👉 Experiment Tracking is extremely important at any level of ML Maturity and is the least invasive change to the ML Model training process. I would start by introducing an ML Metadata Store.
👉 Orchestration of ML Pipelines is always a good idea, and there are many tools supporting it (Airflow, Kubeflow, Vertex AI, etc.). If you are not doing it yet, tackle this next, and make the validation steps part of this goal.
👉 The need for a Feature Store will vary with the types of Models you are deploying. I would prioritize it if you have Models that perform Online predictions, as it helps avoid Training/Serving Skew.
👉 Don't rush into Automated retraining. Ad-hoc and scheduled retraining will take you a long way.
Let me know your thoughts!
Follow Aurimas Griciūnas to upskill in #MLOps, #MachineLearning, #DataEngineering, #DataScience and the overall #Data space.
Source: Aurimas, Machine Learning Community