
[Key challenges in "Deploy in production"]
[Key challenges in "Monitor & maintain system"]
[Deployment patterns]
Shadow mode:
- ML system shadows the human and runs in parallel.
- The ML system's output is not used for any decisions during this phase.
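A minimal sketch of shadow mode, assuming the human and the model can both be represented as plain functions. The names `run_in_shadow`, `human_inspector`, and `shadow_model` are hypothetical, as is the defect-inspection example. The point is that only the human's decision is returned for use, while the model's output is just compared and counted.

```python
from typing import Callable, List, Tuple


def run_in_shadow(
    human_decide: Callable[[str], str],
    model_predict: Callable[[str], str],
    inputs: List[str],
) -> Tuple[List[str], float]:
    """Return the decisions actually used and the model/human agreement rate."""
    decisions = []
    agreements = 0
    for x in inputs:
        human_label = human_decide(x)    # the decision that is actually used
        shadow_label = model_predict(x)  # recorded for comparison only
        decisions.append(human_label)
        agreements += int(shadow_label == human_label)
    agreement_rate = agreements / len(inputs) if inputs else 0.0
    return decisions, agreement_rate


# Hypothetical stand-ins for a human inspector and an ML model.
def human_inspector(item: str) -> str:
    return "defect" if "scratch" in item else "ok"


def shadow_model(item: str) -> str:
    return "defect" if ("scratch" in item or "dent" in item) else "ok"


used, agreement = run_in_shadow(
    human_inspector,
    shadow_model,
    ["scratch on lid", "dent on side", "clean", "clean"],
)
```

The agreement rate gathered this way is one possible signal for deciding whether the model is ready to make real decisions.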
Canary deployment:
- Roll out to a small fraction (say 5%) of traffic initially.
- Monitor system and ramp up traffic gradually.
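The canary steps above can be sketched as follows. This is an illustration under assumptions, not a prescribed method: hashing the request ID into 100 buckets, doubling the traffic share at each healthy step, and the 1% error-rate limit are all hypothetical choices, as are the function names.

```python
import hashlib


def canary_bucket(request_id: str) -> int:
    """Map a request ID to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def route(request_id: str, canary_percent: int) -> str:
    """Send roughly `canary_percent` percent of requests to the new version."""
    return "new" if canary_bucket(request_id) < canary_percent else "old"


def next_canary_percent(
    current: int, error_rate: float, max_error_rate: float = 0.01
) -> int:
    """Ramp up if the monitored error rate is acceptable, else roll back to 0.

    Doubling and the 1% default limit are assumptions for this sketch.
    """
    if error_rate > max_error_rate:
        return 0
    return min(100, current * 2)
```

Because the bucket depends only on the request ID, a request that reaches the new version at 5% keeps reaching it as the percentage grows.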
Blue-green deployment:
- Use a router to switch traffic between the old version and the new version.
- Easy way to enable rollback.
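A minimal sketch of the router idea, assuming each version is a callable prediction function. The class `BlueGreenRouter` and its method names are hypothetical. Both versions stay loaded, so a rollback is just switching the live name back.

```python
from typing import Callable, Dict


class BlueGreenRouter:
    """Route all traffic to exactly one of two deployed versions."""

    def __init__(
        self, blue: Callable[[float], float], green: Callable[[float], float]
    ) -> None:
        self._versions: Dict[str, Callable[[float], float]] = {
            "blue": blue,    # old version
            "green": green,  # new version
        }
        self.live = "blue"   # start on the old version

    def switch_to(self, name: str) -> None:
        """Point all traffic at the named version."""
        if name not in self._versions:
            raise ValueError(f"unknown version: {name}")
        self.live = name

    def predict(self, x: float) -> float:
        return self._versions[self.live](x)


# Hypothetical old and new models.
router = BlueGreenRouter(blue=lambda x: x + 1.0, green=lambda x: x * 2.0)
before = router.predict(3.0)   # served by blue (old)
router.switch_to("green")
after = router.predict(3.0)    # served by green (new)
router.switch_to("blue")       # rollback
rolled_back = router.predict(3.0)
```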
[Degrees of automation]
[Monitoring]
- Software metrics: memory, compute, latency, throughput, server load, ...
- Input metrics: Avg input length, Avg input volume, Num missing values, ...
- Output metrics: Num missing values, Num outliers, ...
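A few of the input and output metrics listed above could be computed along these lines. This is a sketch under assumptions: inputs are optional strings, outputs are optional numbers, an "outlier" is any value outside a fixed `[low, high]` range, and the function names and thresholds are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def input_metrics(inputs: List[Optional[str]]) -> Dict[str, float]:
    """Average input length and number of missing values."""
    present = [x for x in inputs if x is not None]
    return {
        "avg_input_length": mean([len(x) for x in present]) if present else 0.0,
        "num_missing": len(inputs) - len(present),
    }


def output_metrics(
    outputs: List[Optional[float]], low: float, high: float
) -> Dict[str, int]:
    """Number of missing outputs and number outside the range [low, high]."""
    present = [y for y in outputs if y is not None]
    return {
        "num_missing": len(outputs) - len(present),
        "num_outliers": sum(1 for y in present if y < low or y > high),
    }


def breached(metrics: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Names of metrics that exceed their alert threshold, sorted by name."""
    return sorted(
        name for name, limit in thresholds.items() if metrics.get(name, 0) > limit
    )
```

For example, `breached(output_metrics(preds, 0.0, 1.0), {"num_missing": 0, "num_outliers": 5})` would list the metrics that warrant an alert, where `preds` is a hypothetical batch of model outputs.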
When these metrics indicate a problem, the model is retrained, either manually or automatically.