Base image: quay.io/astronomer/astro-runtime:12.6.0
Airflow version: 2.10.4+astro.1
Additional package: pyspark (installed in the image)
pip install astro-cli
astro dev init airflow-spark
cd airflow-spark
Main structure generated:
airflow-spark/
├─ dags/
├─ include/
├─ Dockerfile
├─ requirements.txt
Be careful about when you switch to USER airflow (that user may not exist in the image)
FROM quay.io/astronomer/astro-runtime:12.6.0
USER root
# Install Java 17 and required tools
RUN apt-get update && \
    apt-get install -y --no-install-recommends openjdk-17-jdk wget && \
    rm -rf /var/lib/apt/lists/*
# MSSQL JDBC driver (in preparation for the Spark integration)
RUN mkdir -p /opt/airflow/jars && \
    wget -O /opt/airflow/jars/mssql-jdbc-12.4.2.jre11.jar \
    https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/12.4.2.jre11/mssql-jdbc-12.4.2.jre11.jar && \
    chmod 644 /opt/airflow/jars/mssql-jdbc-12.4.2.jre11.jar
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
ENV PATH=$PATH:$JAVA_HOME/bin
# Install PySpark
RUN pip install --no-cache-dir pyspark
version: "3.1"
services:
  scheduler:
    environment:
      - JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
      - SPARK_HOME=/home/airflow/.local/lib/python3.11/site-packages/pyspark
    volumes:
      - ./jars:/opt/airflow/jars
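The SPARK_HOME value above hard-codes a Python version and a per-user install path, and both depend on the image in use. As a minimal sketch, the snippet below reports where a pip-installed package actually lives, so the path can be checked from inside the container. `package_dir` is a hypothetical helper name, and pointing SPARK_HOME at the pyspark package directory is taken from the configuration above rather than verified here.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the directory of an installed top-level package.

    Returns None if the name is missing or is a plain module, not a package.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    # For a regular package the first entry is the package directory
    return Path(next(iter(spec.submodule_search_locations)))


# Compare this against SPARK_HOME; prints None when pyspark is not installed
print(package_dir("pyspark"))
```

Running this with the same Python interpreter that the scheduler uses gives the value to compare with the SPARK_HOME entry.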
Handling Variables safely
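import os

from airflow.models import Variable

# Fall back to the environment variable when the Airflow Variable is not set
api_key = Variable.get(
    "weather_decoding_api_key",
    default_var=os.getenv("WEATHER_DECODING_API_KEY"),
)
if not api_key:
    raise ValueError("Missing API key")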
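The same fallback order (Airflow Variable first, then the environment, then a clear error) can be sketched as a small function that does not import Airflow, which makes it easy to unit test. `resolve_api_key` and its parameters are hypothetical names introduced for illustration only.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_api_key(
    get_variable: Callable[[str], Optional[str]],
    env: Mapping[str, str] = os.environ,
    variable_name: str = "weather_decoding_api_key",
    env_name: str = "WEATHER_DECODING_API_KEY",
) -> str:
    """Return the key from the Variable store, then the environment, else raise."""
    value = get_variable(variable_name) or env.get(env_name)
    if not value:
        raise ValueError(
            f"Missing API key: set Variable {variable_name!r} or {env_name}"
        )
    return value
```

Inside a task it could be called as `resolve_api_key(lambda name: Variable.get(name, default_var=None))`, so the lookup happens when the task runs rather than while the DAG file is being parsed.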
How to use data_interval_end
def fetch_ultra_srt_ncst(**context):
    # End of the data interval that this DAG run covers
    data_interval_end = context["data_interval_end"]
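One likely use of data_interval_end is to derive the request date and time for the weather API from the run's interval instead of from the wall clock. The sketch below assumes the API wants a `YYYYMMDD` date and an `HHMM` time in Korea Standard Time; that format, the `to_base_date_time` name, and the `KST` constant are assumptions, since the request format is not shown in these notes.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Fixed UTC+9 offset for Korea Standard Time (assumed target timezone, no DST)
KST = timezone(timedelta(hours=9), "KST")


def to_base_date_time(data_interval_end: datetime) -> Tuple[str, str]:
    """Format a timezone-aware datetime as (YYYYMMDD, HHMM) strings in KST."""
    local = data_interval_end.astimezone(KST)
    return local.strftime("%Y%m%d"), local.strftime("%H%M")
```

Because the value comes from the DAG run context, re-running or backfilling the same run produces the same request parameters.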
start_date must always be a fixed value
Error
unable to find user airflow: no matching entries in passwd file
Cause
The Dockerfile contains USER airflow, but the airflow user does not exist in the image.
Fix
Remove USER airflow.
Symptom
webserver healthcheck timed out after 1m0s
Key log lines
ERROR: You need to upgrade the database.
AttributeError: execution_date
Cause
Fix (the right answer for local development)
astro dev stop
docker ps -aq --filter "name=airflow-spark" | ForEach-Object { docker rm -f $_ }
docker volume ls --format "{{.Name}}" | findstr airflow-spark | ForEach-Object { docker volume rm $_ }
astro dev start --wait 5m
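The container and volume cleanup above is written for PowerShell. On other shells, the same two steps could be sketched in Python as follows. The function names are hypothetical, the `docker` subcommands are the same ones used above, and removing the volumes deletes whatever local data they hold, so this is for local development only.

```python
import subprocess
from typing import List


def matching_names(names: List[str], project: str) -> List[str]:
    """Keep only the names that contain the project name (the findstr step)."""
    return [n for n in names if project in n]


def docker_lines(*args: str) -> List[str]:
    """Run a docker command and return its non-empty output lines."""
    out = subprocess.run(
        ["docker", *args], check=True, capture_output=True, text=True
    )
    return [line.strip() for line in out.stdout.splitlines() if line.strip()]


def cleanup(project: str = "airflow-spark") -> None:
    """Force-remove the project's containers, then its volumes."""
    for cid in docker_lines("ps", "-aq", "--filter", f"name={project}"):
        subprocess.run(["docker", "rm", "-f", cid], check=True)
    volumes = docker_lines("volume", "ls", "--format", "{{.Name}}")
    for vol in matching_names(volumes, project):
        subprocess.run(["docker", "volume", "rm", vol], check=True)
```

Calling `cleanup()` between `astro dev stop` and `astro dev start --wait 5m` mirrors the sequence above.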
Error
ModuleNotFoundError: airflow.providers.standard
ModuleNotFoundError: airflow.sdk
Cause
Fix (imports compatible with Airflow 2.10.x)
from airflow.operators.python import PythonOperator
from airflow.decorators import dag, task
from airflow.datasets import Dataset
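The ModuleNotFoundError messages show that `airflow.providers.standard` and `airflow.sdk` are not importable in this environment. If a DAG file needs to load under more than one import layout, one option is a small fallback helper. This is a sketch only: `import_first` is a hypothetical name, and whether a fallback is preferable to pinning one Airflow version is a design choice left open here.

```python
import importlib
from typing import Any, Sequence


def import_first(candidates: Sequence[str]) -> Any:
    """Return the first importable 'module:attribute' target from candidates."""
    errors = []
    for target in candidates:
        module_name, _, attr = target.partition(":")
        try:
            return getattr(importlib.import_module(module_name), attr)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed and try the next one
            errors.append(f"{target}: {exc}")
    raise ImportError(
        "none of the candidates could be imported: " + "; ".join(errors)
    )
```

For example, `PythonOperator = import_first(["airflow.providers.standard.operators.python:PythonOperator", "airflow.operators.python:PythonOperator"])` would resolve to the second path on Airflow 2.10.x when the first module is absent.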
Error
KeyError: 'Variable weather_decoding_api_key does not exist'
Cause
Variable.get() is called for a Variable that has not been registered.
Fix
Register weather_decoding_api_key as an Airflow Variable:
astro dev run variables set weather_decoding_api_key "<API_KEY>"
start_date=pendulum.datetime(2026, 1, 16, tz=local_tz)