데이터엔지니어스터디

DBT seeds

우상욱 · March 2, 2024

What are dbt seeds?

CSV files to be loaded into data warehouse
Typically small sets of data that rarely change
- List of countries
- List of postal codes
Not meant for raw data

Why?

Easy to manage
Easy to use in various scenarios
Source controllable

How are seeds defined?

Add CSV file to the seeds directory
Make sure the header is the first row
Import using the dbt seed command

Further configuration

Several options
- Which schema?
- What database?
- Column quoting
- Column data types
Can be applied to the whole project or individual seeds
Can be added to dbt_project.yml or to a properties file in the seeds directory (e.g. seeds/properties.yml)
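
As a rough sketch of the project-level variant, the options listed above could be set in dbt_project.yml as shown here. The project name my_project and the schema name reference are hypothetical placeholders, not taken from this post; zipcodes is the seed used in the later examples.

```yaml
# dbt_project.yml (excerpt) - a sketch, not a complete project file
seeds:
  my_project:              # hypothetical; must match the `name:` of your dbt project
    +schema: reference     # hypothetical custom schema applied to every seed
    +quote_columns: false  # column quoting for all seeds in the project
    zipcodes:              # settings below apply only to seeds/zipcodes.csv
      +column_types:
        zipcode: varchar(5)
```

Settings at the project level apply to every seed, and the nested zipcodes block narrows a setting to a single seed. Note that with dbt's default schema naming, a custom schema is appended to the target schema rather than replacing it.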

Defining datatypes

Available data types depend on the data warehouse
Typical ones are available
- Integer
- Varchar
- etc
If a type is not defined, it is inferred from the data

version: 2

seeds:
  - name: zipcodes
    config:
      column_types: 
        zipcode: varchar(5)

Tests & documentation

Support tests
Support documentation
Just like models and sources

version: 2

seeds:
  - name: zipcodes
    description: US zipcodes
    config:
      column_types:
        zipcode: varchar(5)
    columns:
      - name: zipcode
        tests:
          - unique

Accessing seeds

Available via the {{ ref() }} function
Behaves as a model after initial import

select * from
  {{ ref('zipcodes') }}
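
Because a seed can be referenced like a model, it can also serve as the lookup target of a relationships test. The following is a minimal sketch under the assumption that a model named customers with a zipcode column exists (both are hypothetical and do not appear in this post); the exact test syntax may differ slightly across dbt versions.

```yaml
version: 2

models:
  - name: customers            # hypothetical model, for illustration only
    columns:
      - name: zipcode
        tests:
          # every customers.zipcode value should exist in the zipcodes seed
          - relationships:
              to: ref('zipcodes')
              field: zipcode
```

This keeps the reference data in source control while letting dbt check that downstream models only use known values.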
