[Obtaining data]
[Data pipelines]
Raw data → Pre-processing → ML training → Test set performance → Post-processing → Product
POC (proof-of-concept) phase
Production phase
[Meta-data, data provenance and lineage]

Keep track of data provenance and lineage (Hard!)
- provenance: where the data comes from
- lineage: the sequence of processing steps the data has gone through
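
One way to make provenance and lineage concrete is to carry them along with the data. A minimal sketch, assuming small JSON-serializable records; the names `TrackedData`, `fingerprint`, and the source string are hypothetical, not part of any library or of the course material:

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(records):
    """Return a short, stable hash of a list of JSON-serializable records."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class TrackedData:
    records: list
    source: str                                   # provenance: where the data came from
    lineage: list = field(default_factory=list)   # lineage: steps applied so far

    def apply(self, step_name, fn):
        """Run one pipeline step and return new data with the step logged."""
        out = fn(self.records)
        entry = {
            "step": step_name,
            "input_hash": fingerprint(self.records),
            "output_hash": fingerprint(out),
        }
        # Build a new object so earlier stages keep their own lineage.
        return TrackedData(out, self.source, self.lineage + [entry])


# Hypothetical source label and toy records, for illustration only.
raw = TrackedData(records=[" Hello ", "WORLD", ""], source="vendor_A/2024-01 (hypothetical)")
cleaned = raw.apply("strip", lambda rs: [r.strip() for r in rs])
cleaned = cleaned.apply("drop_empty", lambda rs: [r for r in rs if r])
cleaned = cleaned.apply("lowercase", lambda rs: [r.lower() for r in rs])

for entry in cleaned.lineage:
    print(entry["step"], entry["input_hash"], "->", entry["output_hash"])
```

Because each step's output hash equals the next step's input hash, a break in the chain shows that data was changed outside the logged steps. In a production pipeline a dedicated tool would likely replace this hand-rolled log.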
Make extensive use of metadata
- for error analysis: spotting unexpected effects
- for keeping track of data provenance
Examples of metadata:
- Manufacturing visual inspection: time, factory, line #, camera settings, inspector ID, ...
- Speech recognition: device type, labeler ID, VAD (voice activity detection) model ID, ...
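
Metadata pays off in error analysis when results can be sliced by it. A minimal sketch, assuming each evaluated example stores its metadata next to the outcome; the `results` data and the helper `error_rate_by` are made up for illustration:

```python
from collections import defaultdict

# Hypothetical evaluation results for a visual inspection model.
results = [
    {"correct": True,  "meta": {"factory": "F1", "line": 1}},
    {"correct": True,  "meta": {"factory": "F1", "line": 1}},
    {"correct": True,  "meta": {"factory": "F1", "line": 2}},
    {"correct": False, "meta": {"factory": "F1", "line": 2}},
    {"correct": False, "meta": {"factory": "F2", "line": 3}},
    {"correct": False, "meta": {"factory": "F2", "line": 3}},
    {"correct": True,  "meta": {"factory": "F2", "line": 3}},
    {"correct": False, "meta": {"factory": "F2", "line": 3}},
]


def error_rate_by(results, key):
    """Group examples by one metadata field and return the error rate per value."""
    counts = defaultdict(lambda: [0, 0])  # value -> [errors, total]
    for r in results:
        value = r["meta"].get(key, "unknown")
        counts[value][1] += 1
        if not r["correct"]:
            counts[value][0] += 1
    return {value: errors / total for value, (errors, total) in counts.items()}


print(error_rate_by(results, "factory"))  # {'F1': 0.25, 'F2': 0.75}
print(error_rate_by(results, "line"))     # {1: 0.0, 2: 0.5, 3: 0.75}
```

Here the slice by factory shows most errors coming from F2, which is the kind of unexpected effect that is hard to spot without metadata.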
[Balanced train/dev/test splits]