[Airflow/EC2/Troubleshooting] File write permission problem in an Airflow crawling batch

김진만 · July 11, 2023

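User / Group / Others
When a container scrapes data for you, the container counts as Others.
In other words, the user the task runs as inside the container (airflow) is neither the owner of the target directory nor in its group, so only the Others permission bits apply.
So you have to chmod the directory and give Others write permission.
This gave me a hard time while running a crawling batch on the server.
The crawler is the container, so it gets treated as an outsider.
FYI, everyone.
Nobody told me this anywhere, not even GPT..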
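To make the User / Group / Others rule concrete, here is a minimal Python sketch (Python because the DAG itself is Python). It assumes plain POSIX mode bits and deliberately ignores root, supplementary groups and ACLs. The helper names permission_class, can_write and grant_others_write are hypothetical, not from Airflow or the original DAG; grant_others_write has the same effect as chmod o+w.

```python
import os
import stat

# Write bit for each chmod class.
WRITE_BIT = {"user": stat.S_IWUSR, "group": stat.S_IWGRP, "others": stat.S_IWOTH}


def permission_class(path: str, uid: int, gid: int) -> str:
    """Which class (user/group/others) a process with this uid/gid falls into for path.

    Simplified sketch: ignores root, supplementary groups and ACLs.
    """
    st = os.stat(path)
    if uid == st.st_uid:
        return "user"    # owner: only the user bits are checked
    if gid == st.st_gid:
        return "group"   # same primary group: only the group bits are checked
    return "others"      # everyone else, like the container's task user in this post


def can_write(path: str, uid: int, gid: int) -> bool:
    """True if the write bit of the class that applies to uid/gid is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & WRITE_BIT[permission_class(path, uid, gid)])


def grant_others_write(path: str) -> None:
    """Same effect as `chmod o+w path`: add Others' write bit, keep all other bits."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_IWOTH)
```

For example, a directory with mode 755 owned by someone else gives can_write(...) == False for an outsider; after grant_others_write it becomes 757 and the same check returns True. Creating a file inside a directory also needs the x bit, which 755 already gives to Others. Note that o+w opens the directory to every user on the machine; changing the directory's owner or group to match the container's UID/GID is the narrower alternative.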
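I went deep into the logs inside the container and found that
the CSV file had been created but couldn't be written, so the DAG errored out.
I spent an hour yesterday and another hour today troubleshooting, and it turned out to be an outbound "All traffic" error,
sigh..
Anyway, I feel great now haha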

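Here's the error log for a look.. (the Permission Denied I found in the task log, 7 directories deep inside the container)
I went in there because no matter how often I checked the container's logs from the host PC, nothing showed up..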
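The 7 levels come from how Airflow names task log files. Below is a small Python sketch that rebuilds the path, assuming Airflow 2.x's default log_filename_template (dag_id=<dag_id>/run_id=<run_id>/task_id=<task_id>/[map_index=<n>/]attempt=<try>.log) and the /opt/airflow/logs base folder seen in the session below. task_log_path is a hypothetical helper; a customized log_filename_template or base_log_folder would change the result.

```python
from pathlib import PurePosixPath


def task_log_path(dag_id: str, run_id: str, task_id: str, try_number: int,
                  map_index: int = -1,
                  base: str = "/opt/airflow/logs") -> PurePosixPath:
    """Build the task log file path, assuming Airflow 2.x's default filename template."""
    parts = [f"dag_id={dag_id}", f"run_id={run_id}", f"task_id={task_id}"]
    if map_index >= 0:
        # Only mapped (dynamic) task instances get an extra map_index level.
        parts.append(f"map_index={map_index}")
    parts.append(f"attempt={try_number}.log")
    return PurePosixPath(base, *parts)
```

With the run from this post, task_log_path("binance", "manual__2023-07-11T10:18:13.187454+00:00", "binance_crawl", 1) returns /opt/airflow/logs/dag_id=binance/run_id=manual__2023-07-11T10:18:13.187454+00:00/task_id=binance_crawl/attempt=1.log, which is the file opened with cat in the session below.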

airflow@0b77f1cca39a:/opt/airflow/logs/dag_id=binance/run_id=manual__2023-07-11T08:46:43.391781+00:00/task_id=binance_crawl$ sudo cat 'attempt=1.log'

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

#1) Respect the privacy of others.
#2) Think before you type.
#3) With great power comes great responsibility.

[sudo] password for airflow:
sudo: a password is required
airflow@0b77f1cca39a:/opt/airflow/logs/dag_id=binance/run_id=manual__2023-07-11T08:46:43.391781+00:00/task_id=binance_crawl$ vi 'attempt=1.log'
airflow@0b77f1cca39a:/opt/airflow/logs/dag_id=binance/run_id=manual__2023-07-11T08:46:43.391781+00:00/task_id=binance_crawl$ vi 'attempt=1.log'
airflow@0b77f1cca39a:/opt/airflow/logs/dag_id=binance/run_id=manual__2023-07-11T10:18:13.187454+00:00/task_id=binance_crawl$ cat 'attempt=1.log'
[2023-07-11T10:18:19.471+0000] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: binance.binance_crawl manual__2023-07-11T10:18:13.187454+00:00 [queued]>
[2023-07-11T10:18:19.481+0000] {taskinstance.py:1103} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: binance.binance_crawl manual__2023-07-11T10:18:13.187454+00:00 [queued]>
[2023-07-11T10:18:19.481+0000] {taskinstance.py:1308} INFO - Starting attempt 1 of 1
[2023-07-11T10:18:19.502+0000] {taskinstance.py:1327} INFO - Executing <Task(PythonOperator): binance_crawl> on 2023-07-11 10:18:13.187454+00:00
[2023-07-11T10:18:19.508+0000] {standard_task_runner.py:57} INFO - Started process 1315 to run task
[2023-07-11T10:18:19.511+0000] {standard_task_runner.py:84} INFO - Running: ['airflow', 'tasks', 'run', 'binance', 'binance_crawl', 'manual__2023-07-11T10:18:13.187454+00:00', '--job-id', '6', '--raw', '--subdir', 'DAGS_FOLDER/binance_crawl.py', '--cfg-path', '/tmp/tmp7pzfq8yh']
[2023-07-11T10:18:19.513+0000] {standard_task_runner.py:85} INFO - Job 6: Subtask binance_crawl
[2023-07-11T10:18:19.560+0000] {task_command.py:410} INFO - Running <TaskInstance: binance.binance_crawl manual__2023-07-11T10:18:13.187454+00:00 [running]> on host 0b77f1cca39a
[2023-07-11T10:18:19.895+0000] {taskinstance.py:1547} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='binance' AIRFLOW_CTX_TASK_ID='binance_crawl' AIRFLOW_CTX_EXECUTION_DATE='2023-07-11T10:18:13.187454+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-07-11T10:18:13.187454+00:00'
[2023-07-11T10:18:20.231+0000] {taskinstance.py:1824} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/python.py", line 181, in execute
    return_value = self.execute_callable()
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/operators/python.py", line 198, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/opt/airflow/dags/binance_crawl.py", line 47, in _binance_api
    df.to_csv(f'/home/airflow/data/{file_name}', index=False)
  File "/home/airflow/.local/lib/python3.7/site-packages/pandas/core/generic.py", line 3482, in to_csv
    storage_options=storage_options,
  File "/home/airflow/.local/lib/python3.7/site-packages/pandas/io/formats/format.py", line 1105, in to_csv
    csv_formatter.save()
  File "/home/airflow/.local/lib/python3.7/site-packages/pandas/io/formats/csvs.py", line 243, in save
    storage_options=self.storage_options,
  File "/home/airflow/.local/lib/python3.7/site-packages/pandas/io/common.py", line 707, in get_handle
    newline="",
PermissionError: [Errno 13] Permission denied: '/home/airflow/data/bccusdt_1month.csv'
[2023-07-11T10:18:20.239+0000] {taskinstance.py:1350} INFO - Marking task as FAILED. dag_id=binance, task_id=binance_crawl, execution_date=20230711T101813, start_date=20230711T101819, end_date=20230711T101820
[2023-07-11T10:18:20.266+0000] {standard_task_runner.py:109} ERROR - Failed to execute job 6 for task binance_crawl ([Errno 13] Permission denied: '/home/airflow/data/bccusdt_1month.csv'; 1315)
[2023-07-11T10:18:20.285+0000] {local_task_job_runner.py:225} INFO - Task exited with return code 1
[2023-07-11T10:18:20.303+0000] {taskinstance.py:2651} INFO - 0 downstream tasks scheduled from follow-on schedule check
airflow@0b77f1cca39a:/opt/airflow/logs/dag_id=binance/run_id=manual__2023-07-11T10:18:13.187454+00:00/task_id=binance_crawl$
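One way to surface this failure earlier is a pre-flight check before df.to_csv. The sketch below is a hypothetical helper (ensure_writable_dir is not part of Airflow or pandas) that uses os.access to test whether the current process can create files in the target directory and, if not, raises a PermissionError that includes the UID/GID and mode you need to pick the right chmod. os.access checks against the real UID/GID, and root passes regardless of the mode bits.

```python
import os


def ensure_writable_dir(directory: str) -> None:
    """Raise early, with a hint, if this process cannot create files in directory.

    Hypothetical helper; call it right before writing, e.g. before df.to_csv(...).
    """
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"{directory} does not exist or is not a directory")
    # Creating a file in a directory needs write + execute (search) permission on it.
    if not os.access(directory, os.W_OK | os.X_OK):
        st = os.stat(directory)
        raise PermissionError(
            f"uid={os.getuid()} gid={os.getgid()} cannot create files in {directory} "
            f"(owner uid={st.st_uid} gid={st.st_gid}, mode={oct(st.st_mode & 0o777)}); "
            "grant write permission to the class this process falls into, e.g. chmod o+w"
        )
```

In the DAG from this post that would mean calling ensure_writable_dir('/home/airflow/data') at the top of _binance_api, so the task fails with the owner and mode in the message instead of a bare Errno 13.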
