데이터엔지니어스터디

Automating with dbt build

우상욱 · March 2, 2024

Review

sources and seeds feed initial data to dbt
Sources are loaded through an EL pipeline; seeds are typically CSV files.
models handle the transformation of data (usually from sources / seeds) for downstream users
snapshots track changes in datasets over time
tests can validate sources, seeds, models, and even snapshots
- Built-in (unique, not_null, relationships, accepted_values)
- Singular
- Generic / Reusable
dbt build performs all these tasks in a single subcommand, usually in production

DBT build

dbt build :

Combination of multiple tasks
It combines several dbt tasks into one command, running every model and its validations
Run models (dbt run)
Run validations via tests (dbt test)
Process any seeds (dbt seed)
Capture any snapshots (dbt snapshot)

Remember the commands can be run individually if required
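
Running the commands individually can be sketched as a small Python wrapper. This is only an illustration: the step order (seed, run, snapshot, test) is an assumed manual order, and `run_manually` and `default_runner` are hypothetical helper names. Unlike this fixed sequence, dbt build interleaves the resource types in dependency order.

```python
import subprocess

# Assumed manual order for the individual subcommands.
# `dbt build` itself interleaves these resource types in dependency order.
MANUAL_STEPS = [
    ["dbt", "seed"],
    ["dbt", "run"],
    ["dbt", "snapshot"],
    ["dbt", "test"],
]


def default_runner(cmd):
    """Run one command and return its exit code (requires dbt on PATH)."""
    return subprocess.run(cmd).returncode


def run_manually(runner=default_runner):
    """Run each step in order and stop at the first non-zero exit code.

    Returns the list of commands that were attempted.
    """
    attempted = []
    for cmd in MANUAL_STEPS:
        attempted.append(" ".join(cmd))
        if runner(cmd) != 0:
            break
    return attempted
```

Passing a custom `runner` makes it possible to dry-run the sequence without dbt installed, for example `run_manually(lambda cmd: 0)`.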

Why?

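What is the purpose of dbt build?

Individual subcommands work well, but don't handle all potential issues
- dbt run does not validate data; it updates models without running any tests first.
- dbt snapshot is not run automatically, so tracked changes can go uncaptured.
- dbt seed may not have finished loading the data that certain downstream model queries rely on.
- dbt build resolves dependencies across the whole project and runs the tests on each resource before building anything downstream of it.
- dbt build can be overkill in a test environment or for small changes.

In general, dbt build is the best choice when running against production data.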

Steps can be run manually instead if required.
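
The dependency handling described in this section can be illustrated with a toy model. This is a simplified simulation, not dbt's actual implementation: the node names (`raw_orders`, `stg_orders`, and so on) and the `simulate_build` function are hypothetical. It walks a small graph in dependency order and skips every node downstream of one whose tests fail.

```python
from graphlib import TopologicalSorter

# Hypothetical project graph: each node maps to the nodes it depends on.
DEPENDS_ON = {
    "raw_orders": set(),                 # seed
    "stg_orders": {"raw_orders"},        # model
    "orders_snapshot": {"stg_orders"},   # snapshot
    "fct_orders": {"stg_orders"},        # model
}


def simulate_build(depends_on, failing_tests=frozenset()):
    """Visit nodes in dependency order.

    A node whose tests fail is marked "test_failed", and every node
    downstream of a failed or skipped node is marked "skipped".
    """
    status = {}
    for node in TopologicalSorter(depends_on).static_order():
        if any(status[upstream] != "ok" for upstream in depends_on[node]):
            status[node] = "skipped"
        elif node in failing_tests:
            status[node] = "test_failed"
        else:
            status[node] = "ok"
    return status
```

Calling `simulate_build(DEPENDS_ON, {"stg_orders"})` leaves the seed marked "ok" while both resources that depend on `stg_orders` are skipped, which is the protection that separate dbt run and dbt test invocations do not give.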

Execution order in a production environment

dbt build
Fix any errors
dbt build (re-run to confirm the fixes)
Once no errors remain, dbt docs generate: generate the documentation
dbt docs serve: start the dbt docs server
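
The sequence above can be sketched as a small gate in Python: documentation is generated only after dbt build exits cleanly. This is a minimal sketch; `build_then_docs` and `default_runner` are hypothetical helpers, and it relies on dbt returning a non-zero exit code when a build or test fails. `dbt docs serve` is left as a manual step because it starts a long-running server.

```python
import subprocess


def default_runner(cmd):
    """Run one command and return its exit code (requires dbt on PATH)."""
    return subprocess.run(cmd).returncode


def build_then_docs(runner=default_runner):
    """Run `dbt build`, then generate docs only if the build succeeded.

    Returns a short message describing the next step.
    """
    if runner(["dbt", "build"]) != 0:
        return "fix errors and re-run dbt build"
    if runner(["dbt", "docs", "generate"]) != 0:
        return "docs generation failed"
    return "ready: run dbt docs serve"
```

In a scheduled job the returned message could be logged, with the fix-and-rebuild loop repeated until the build passes.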
