What are dbt seeds?
- CSV files that dbt loads into the data warehouse
- Typically rarely changing data sets
- List of countries
- List of postal codes (zip codes)
- Not meant for loading raw data (not performant for large files)
Why?
- Easy to manage
- Easy to use in various scenarios
- Source controllable (versioned together with the rest of the dbt project)
How are seeds defined?
- Add the CSV file to the seeds directory
- Make sure the header (column names) is the first row
- Load it into the warehouse with the dbt seed command
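The seeds directory is not hard-coded: dbt looks for seed CSVs in the paths listed under seed-paths in dbt_project.yml. A minimal sketch, assuming a dbt 1.x project that keeps the default location (dbt versions before 1.0 used data-paths with a data directory instead):

```yaml
# dbt_project.yml (excerpt)
# Directories dbt scans for seed CSV files; ["seeds"] is the default in dbt 1.x
seed-paths: ["seeds"]
```

With a file such as seeds/zipcodes.csv (hypothetical example file) in place, dbt seed creates a table named after the file (zipcodes), and dbt seed --select zipcodes loads only that one seed.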
Further configuration
- Several options
- Which schema?
- What database?
- Column quoting
- Column data types
- Can be applied to the whole project or individual seeds
- Can be set in dbt_project.yml or in a properties file such as seeds/properties.yml
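Project-wide settings go under the seeds: key of dbt_project.yml, nested first by project name and then by seed name, with a + prefix on config keys. A sketch assuming a project named my_project and a schema named seed_data (both hypothetical), plus the zipcodes seed:

```yaml
# dbt_project.yml (excerpt)
seeds:
  my_project:                 # hypothetical: must match the project's name: setting
    +schema: seed_data        # hypothetical custom schema, applied to every seed in the project
    # +database: ...          # target database can be overridden the same way
    +quote_columns: false     # do not quote column names when creating seed tables
    zipcodes:                 # applies only to seeds/zipcodes.csv
      +column_types:
        zipcode: varchar(5)
```

More specific settings override broader ones, so column_types here affects only zipcodes while +schema affects all seeds. With dbt's default schema naming, a custom schema is appended to the target schema (e.g. <target_schema>_seed_data) rather than replacing it.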
Defining data types
- Available data types depend on the data warehouse
- Common types (strings, integers, dates, ...) are generally available
- If a column's type is not defined, it is inferred from the data itself
version: 2
seeds:
  - name: zipcodes
    config:
      column_types:
        zipcode: varchar(5)
Tests & documentation
- Support tests
- Support documentation
- Just like models and sources
version: 2
seeds:
  - name: zipcodes
    description: US zipcodes
    config:
      column_types:
        zipcode: varchar(5)
    columns:
      - name: zipcode
        tests:
          - unique
Accessing seeds
- Available via the {{ ref() }} function
- Behaves like a model once it has been loaded
select * from {{ ref('zipcodes') }}