What is a dbt source?
- Name and description of data loaded by EL process
dbt에서 소스는 EL 프로세스에 의해 로드된 데이터의 이름을 지정하고, 설명하는 기능을 나타냅니다.
이는 데이터 웨어하우스에 있거나, 들어오려고하는 데이터에 추가 정보를 적용하는 것입니다.
- Helps define data lineage
- Tests
- Documentation
Sources
- Provides data lineage information
- Describes the flow of data in warehouse
- Use the Jinja
{{ source() }}
function
- Similar to the
{{ ref() }}
function
- Simplifies accessing raw data
select *
from
{{ source('raw', 'orders') }}
Defining a source
- In the .yml file
- File can be
models/model_properties.yml
- Or any other .yml file in the directory
- Under the
source:
section
- Name the source with the
- name:
option
- Define each source table with a
- name:
option under tables:
- Different options available depending on the data warehouse type, refer to dbt documentation
version: 2
sources:
- name: raw
tables:
- name: phone_orders
- name: web_orders
Accessing sources
- Use the `{{ source() }}
- {{ source(source_name, table_name) }}
- Provides the proper name of the table in the compiled query
select * from
{{ source('raw', 'phone_orders') }}
UNION
select * from
{{ source('raw', 'web_orders') }}
-- dbt compiled
select * from
'raw'.'phone_orders'
UNION
select * from
'raw'.'web_orders'
Testing sources
- You can apply tests to sources
- Same methods as applying to models
- Defined in the
sources:
section instead of models:
- In .yml file where sources defined
version: 2
sources:
- name: raw
tables:
- name: phone_orders
columns:
- name: id
tests:
- not_null
- unique
- name: web_orders
Documentation
- Document using the same tools as models
- Defined in the
sources:
section
version: 2
sources:
- name: raw
tables:
- name: phone_orders
description: >
Sales orders by phone, daily
columns:
- name: id
tests:
- not_null
- unique
Model VS Sourc