Testing and Debugging

findingflow · November 21, 2021

gcl ml engineer

GCP products relevant to the exam

AI Hub (Kubeflow, Kubeflow Pipelines)
Cloud AI Platform (AutoML Tables, Training, Serving, Explanations)
TensorFlow Data Validation, TensorFlow Transform, TensorFlow Model Analysis
Compute Engine
Cloud Storage
Cloud SQL
Cloud Memorystore, Datastore, Bigtable
BigQuery
Dataflow, Dataprep, Dataproc
Operations (formerly Stackdriver)
Cloud Build
Container Registry, Cloud Source Repositories
Cloud Composer
Cloud Functions
Cloud Run
GKE
Data Studio

– AI Platform enables full lifecycle support for custom ML models in a variety of frameworks (though the limitations need to be understood; TensorFlow is a first-class citizen compared to the rest).
– AutoML supports custom model development for a limited set of use cases in a codeless manner, leveraging transfer learning and neural architecture search under the covers.
– BigQuery ML supports custom model development for a variety of use cases involving structured data directly in BigQuery, through SQL. This isn’t as hands-off as AutoML, but is much less complex than AI Platform.
– Pre-trained models are consumed as-is through a REST API, with no customization and low implementation complexity.
– Solutions like Contact Center AI and Document AI solve industry-specific problems with the underlying ML products and services.

Testing and Debugging

Rubric for ML Paper: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf

NaN/Inf loss --> exploding gradients

Usual cause: learning rate too high (Keras defaults: Adam 0.001, SGD 0.01)
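
As a rough illustration of the note above, the sketch below runs plain gradient descent on a toy loss, loss(w) = w * w, with two learning rates. The toy loss, the `train` helper, and the step count are assumptions made up for this example and do not come from any framework. The only point is that a learning rate that is too high makes the loss grow until it overflows to Inf.

```python
import math


def train(lr, steps=600, w=1.0):
    """Gradient descent on the toy loss w * w; returns the loss history."""
    losses = []
    for _ in range(steps):
        loss = w * w
        losses.append(loss)
        if not math.isfinite(loss):
            break  # stop at the first NaN/Inf loss
        grad = 2.0 * w      # d(loss)/dw
        w = w - lr * grad   # plain gradient descent update
    return losses


stable = train(lr=0.1)    # each step shrinks w by a factor of 0.8
diverged = train(lr=1.5)  # each step multiplies w by -2, so the loss blows up

print("final loss with lr=0.1:", stable[-1])
print("final loss with lr=1.5:", diverged[-1], "after", len(diverged), "steps")
```

Checking `math.isfinite(loss)` inside the training loop is a cheap way to catch the problem at the step where it first appears rather than at the end of training.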

Data validation with a data schema (rules for expected statistics)

Numeric --> expected range and distribution
Categorical --> set of allowed values
Ensure the data splits are of good quality (statistically similar to each other)
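
The schema idea above can be sketched in a few lines of plain Python. This is not the TensorFlow Data Validation API. The `SCHEMA` layout, the `validate` helper, the feature names, and the sample rows are all hypothetical and exist only to show what "range for numeric, allowed values for categorical" means in practice.

```python
# Hypothetical schema: one rule per feature.
SCHEMA = {
    "fare": {"type": "numeric", "min": 3.0, "max": 500.0},
    "payment_type": {"type": "categorical", "values": {"cash", "card"}},
}


def validate(rows, schema):
    """Return a list of (row_index, feature, problem) tuples."""
    anomalies = []
    for i, row in enumerate(rows):
        for name, rule in schema.items():
            value = row.get(name)
            if value is None:
                anomalies.append((i, name, "missing"))
            elif rule["type"] == "numeric":
                if not rule["min"] <= value <= rule["max"]:
                    anomalies.append((i, name, "out of range"))
            elif rule["type"] == "categorical":
                if value not in rule["values"]:
                    anomalies.append((i, name, "unexpected value"))
    return anomalies


rows = [
    {"fare": 12.5, "payment_type": "card"},
    {"fare": 1.0, "payment_type": "cash"},     # below the minimum fare
    {"fare": 8.0, "payment_type": "crypto"},   # value not in the schema
]
anomalies = validate(rows, SCHEMA)
print(anomalies)
```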

Baselines

Regression: predict the mean label. Classification: predict the most common label
A baseline that is always wrong is not meaningful (e.g. predicting every taxi fare as $1 when the minimum fare is $3)
Only use a trained model as a baseline after it has been fully validated in production
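
A minimal sketch of the two simple baselines named above, using only the standard library. The helper names and the sample data are invented for illustration. A trained model should beat these numbers before it is worth debugging any further.

```python
from collections import Counter
from statistics import mean


def mean_baseline_mae(y_train, y_test):
    """Regression baseline: predict the training mean for every example."""
    prediction = mean(y_train)
    mae = mean(abs(y - prediction) for y in y_test)
    return prediction, mae


def majority_baseline_accuracy(y_train, y_test):
    """Classification baseline: predict the most common training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    accuracy = sum(y == majority for y in y_test) / len(y_test)
    return majority, accuracy


# Made-up taxi fares and labels, for illustration only.
fare_pred, fare_mae = mean_baseline_mae([3.0, 5.0, 7.0, 9.0], [4.0, 8.0])
label_pred, label_acc = majority_baseline_accuracy(
    ["spam", "ham", "ham", "ham"], ["ham", "spam", "ham", "ham"]
)
print(fare_pred, fare_mae)    # baseline prediction and its mean absolute error
print(label_pred, label_acc)  # baseline label and its accuracy
```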

Split skew

Run df.describe() on the train and test sets and compare the summary statistics
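
One possible way to do this comparison with pandas is to place the two `describe()` outputs side by side. The `describe_side_by_side` helper and the tiny `fare` columns are assumptions made for this sketch. The test split is deliberately skewed so that the difference is easy to see.

```python
import pandas as pd


def describe_side_by_side(train_df, test_df):
    """Concatenate describe() of both splits under 'train' / 'test' column keys."""
    return pd.concat(
        [train_df.describe(), test_df.describe()],
        axis=1,
        keys=["train", "test"],
    )


# Made-up data: the test split contains much higher fares than the train split.
train_df = pd.DataFrame({"fare": [3.0, 5.5, 8.0, 12.0, 20.0]})
test_df = pd.DataFrame({"fare": [30.0, 45.0, 60.0, 80.0, 95.0]})

summary = describe_side_by_side(train_df, test_df)
print(summary)
```

Large gaps between the `train` and `test` columns in rows such as `mean`, `std`, `min`, and `max` suggest the splits are not statistically similar.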

Wrong loss/activation function

Compare true vs. predicted classes
- Calculate the mean/std of each
- Plot both distributions
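
The mean/std check above can be sketched as follows. The numbers are invented: `y_pred` imitates what a regression model might output if its last layer used a sigmoid by mistake, so every prediction is squashed into the 0 to 1 range while the true labels are not. The `summarize` helper is hypothetical.

```python
import statistics


def summarize(values):
    """Return (mean, population standard deviation) of a list of numbers."""
    return statistics.mean(values), statistics.pstdev(values)


# Made-up regression targets and predictions, for illustration only.
y_true = [3.0, 7.5, 12.0, 20.0, 35.0]
y_pred = [0.91, 0.97, 0.99, 1.0, 1.0]  # all values at or below 1

true_mean, true_std = summarize(y_true)
pred_mean, pred_std = summarize(y_pred)

print(f"true: mean={true_mean:.2f} std={true_std:.2f}")
print(f"pred: mean={pred_mean:.2f} std={pred_std:.2f}")
```

When the predicted mean and spread are far from those of the labels, as they are here, the output activation or loss function is worth checking before anything else.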
