ML Metadata

Hanna · March 17, 2022

Process Outline

Here is the figure shown in class that describes the different components in an ML Metadata store:


The green box in the middle shows the data model that ML Metadata (MLMD) follows. The official documentation describes each of its elements, and the descriptions are reproduced here for easy reference:

  • ArtifactType describes an artifact's type and its properties that are stored in the metadata store. You can register these types on-the-fly with the metadata store in code, or you can load them in the store from a serialized format. Once you register a type, its definition is available throughout the lifetime of the store.
  • An Artifact describes a specific instance of an ArtifactType, and its properties that are written to the metadata store.
  • An ExecutionType describes a type of component or step in a workflow, and its runtime parameters.
  • An Execution is a record of a component run or a step in an ML workflow and the runtime parameters. An execution can be thought of as an instance of an ExecutionType. Executions are recorded when you run an ML pipeline or step.
  • An Event is a record of the relationship between artifacts and executions. When an execution happens, events record every artifact that was used by the execution, and every artifact that was produced. These records allow for lineage tracking throughout a workflow. By looking at all events, MLMD knows what executions happened and what artifacts were created as a result. MLMD can then recurse back from any artifact to all of its upstream inputs.
  • A ContextType describes a type of conceptual group of artifacts and executions in a workflow, and its structural properties. For example: projects, pipeline runs, experiments, owners etc.
  • A Context is an instance of a ContextType. It captures the shared information within the group. For example: project name, changelist commit id, experiment annotations etc. It has a user-defined unique name within its ContextType.
  • An Attribution is a record of the relationship between artifacts and contexts.
  • An Association is a record of the relationship between executions and contexts.
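To make the role of events concrete, here is a minimal sketch of lineage tracking in plain Python. It is not the MLMD implementation or its API: the `Event` dataclass, the `upstream_artifacts` function, and the integer ids are all hypothetical stand-ins, chosen only to show how a list of input/output events is enough to walk back from an artifact to everything upstream of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Toy stand-in for an MLMD event: links one artifact to one execution."""
    artifact_id: int
    execution_id: int
    kind: str  # "INPUT" or "OUTPUT" (hypothetical labels for this sketch)


def upstream_artifacts(artifact_id, events):
    """Return the ids of every artifact that `artifact_id` depends on,
    directly or transitively, by walking the events backwards."""
    seen = set()
    stack = [artifact_id]
    while stack:
        current = stack.pop()
        # Executions that produced the current artifact.
        producers = {
            e.execution_id
            for e in events
            if e.artifact_id == current and e.kind == "OUTPUT"
        }
        # Every input of those executions is an upstream artifact.
        for e in events:
            if (e.execution_id in producers and e.kind == "INPUT"
                    and e.artifact_id not in seen):
                seen.add(e.artifact_id)
                stack.append(e.artifact_id)
    return seen


events = [
    Event(artifact_id=1, execution_id=10, kind="INPUT"),   # dataset -> validation run
    Event(artifact_id=2, execution_id=10, kind="OUTPUT"),  # validation run -> schema
    Event(artifact_id=2, execution_id=11, kind="INPUT"),   # schema -> a later step
    Event(artifact_id=3, execution_id=11, kind="OUTPUT"),  # later step -> new artifact
]

print(sorted(upstream_artifacts(3, events)))  # prints [1, 2]
```

Artifact 3 was produced by execution 11, which consumed artifact 2, which in turn came from execution 10 and artifact 1. That chain is all the information the events need to carry.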

As mentioned earlier, you will use TFDV to generate a schema and record this process in the ML Metadata store. You will be starting from scratch, so you will define each component of the data model yourself. The steps are:

  1. Defining the ML Metadata's storage database
  2. Setting up the necessary artifact types
  3. Setting up the execution types
  4. Generating an input artifact unit
  5. Generating an execution unit
  6. Registering an input event
  7. Running the TFDV component
  8. Generating an output artifact unit
  9. Registering an output event
  10. Updating the execution unit
  11. Setting up and generating a context unit
  12. Generating attributions and associations

You can then retrieve information from the database to investigate aspects of your project. For example, you can find which dataset was used to generate a particular schema. You will also do that in this exercise.

For each of these steps, you may want to keep the MetadataStore API documentation open so you can look up the methods you will use to interact with the metadata store. You can also consult the metadata_store protocol buffer definition for descriptions of each data type covered in this tutorial.
