DBT seeds

우상욱 · March 2, 2024

What are dbt seeds?


  • CSV files to be loaded into the data warehouse
  • Typically data sets that rarely change
    • List of countries
    • List of postal codes
  • Not meant for raw data

Why?


  • Easy to manage
  • Easy to use in various scenarios
  • Can be kept under source control

How are seeds defined?


  • Add a CSV file to the seeds directory
  • Make sure the header is the first row
  • Import using the dbt seed command

Further configuration


  • Several options
    • Which schema?
    • What database?
    • Column quoting
    • Column data types
  • Can be applied to the whole project or individual seeds
  • Can be added to dbt_project.yml or seeds/properties.yml
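
As a rough sketch of the project-wide variant, the options above could be set in dbt_project.yml along these lines. The project name my_project and the schema name seed_data are hypothetical placeholders, not taken from this post; zipcodes is the seed used in the examples of this post.

```yaml
# dbt_project.yml (fragment)
seeds:
  my_project:               # hypothetical; must match the `name:` of your dbt project
    +schema: seed_data      # hypothetical custom schema, applied to every seed in the project
    +quote_columns: false   # column quoting for all seeds
    zipcodes:               # overrides for one individual seed (seeds/zipcodes.csv)
      +column_types:
        zipcode: varchar(5)
```

Settings placed directly under the project name apply to all seeds, while settings nested under a seed name apply only to that seed. Note that with dbt's default behaviour a custom schema is appended to the target schema rather than replacing it.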

Defining datatypes


  • Available data types depend on the data warehouse
  • Typical ones are available
    • Integer
    • Varchar
    • etc
  • If a type is not defined, it is inferred from the data
version: 2

seeds:
  - name: zipcodes
    config:
      column_types:
        zipcode: varchar(5)

Tests & documentation


  • Seeds support tests
  • Seeds support documentation
  • Both are declared just like for models and sources
version: 2

seeds:
  - name: zipcodes
    description: US zipcodes
    config:
      column_types:
        zipcode: varchar(5)
    columns:
      - name: zipcode
        tests:
          - unique

Accessing seeds


  • Available via the {{ ref() }} function
  • Behaves like a model after the initial import
select * from
  {{ ref('zipcodes') }}