DataOps Error Troubleshooting

문주은 · January 18, 2024

Error name : Airflow Error response from daemon

1) Detail

Error response from daemon: unable to find user 1001 #50000: no matching entries in passwd file

Occurs when running Airflow on Windows.

2) Causes

  • A problem in the part of the Docker container setup that creates the user or specifies the user ID

3) Solution

  • The error occurs when running directly on Windows. Running under WSL on Windows works normally.

Error name : Airflow invalid Host Header ERROR

1) Detail

Running docker-compose up --build fails with http: invalid Host Header ERROR

2) Causes

3) Solution

DOCKER_BUILDKIT=0 docker-compose up --build  


Error name : Airflow could not read served logs

1) Detail

could not read served logs: Request URL is missing an 'http://' or 'https://' protocol

2) Causes

  • The logs directory has the wrong owner, so logs cannot be written to it

3) Solution

sudo chown -R {username}:{username} {path}
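
Before running chown, it can help to confirm that ownership really is the problem. The following is a minimal sketch, assuming a POSIX host. The helper name `owned_by` and the `./logs` path are illustrative and not taken from the original setup.

```python
import os


def owned_by(path: str, uid: int) -> bool:
    """Return True when `path` is owned by the given numeric user ID."""
    return os.stat(path).st_uid == uid


if __name__ == "__main__":
    log_dir = "./logs"  # assumed location of the mounted Airflow logs directory
    if os.path.isdir(log_dir):
        # Compare against the user running this script on the host
        print(f"{log_dir} owned by current user: {owned_by(log_dir, os.getuid())}")
    else:
        print(f"{log_dir} does not exist")
```

If the containers run under a different UID than the host user, pass that UID instead of `os.getuid()`.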


Error name : Airflow cannot stop container: permission denied

1) Detail

Error response from daemon: cannot stop container: permission denied

2) Causes

  • The Docker daemon lacks the permission it needs when it tries to stop the container

3) Solution

sudo systemctl restart docker.socket docker.service

Error name : Airflow Please make sure that all your Airflow components

1) Detail

Please make sure that all your Airflow components (e.g. schedulers, webservers, workers and triggerer) have the same 'secret_key' configured in 'webserver' section and time is synchronized on all your machines (for example with ntpd)

2) Causes

  • A warning that every component must be configured with the same 'secret_key'. The clocks on all machines must also be synchronized.

3) Solution

Run python -c 'import os; print(os.urandom(16))' inside the container to generate a value for the shared 'secret_key'.
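
The output of os.urandom(16) is raw bytes, which is awkward to paste into a config file. Below is a minimal sketch of a hex-encoded alternative using the standard `secrets` module. The variable name `AIRFLOW__WEBSERVER__SECRET_KEY` follows Airflow's `AIRFLOW__{SECTION}__{KEY}` environment variable convention. How that variable reaches each container (for example through a .env file) is an assumption about the setup, not something the original notes describe.

```python
import secrets


def generate_secret_key(num_bytes: int = 16) -> str:
    """Return a random hex string (two hex characters per byte)."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    key = generate_secret_key()
    # Use the SAME value for every Airflow component
    # (scheduler, webserver, workers, triggerer).
    print(f"AIRFLOW__WEBSERVER__SECRET_KEY={key}")
```

Generate the key once and reuse it. Running the generator separately in each container would produce different keys and reproduce the warning.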



Error name : Cannot connect to the Docker daemon

1) Detail

Running docker ps produces the following output:

$ docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

2) Causes

Another team had changed the external network settings on this server without notice, which caused the issue.

3) Solution

Solution1) Check docker status

$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/docker.service.d
             └─http-proxy.conf
     Active: active (running) since Mon 2024-02-05 23:29:18 UTC; 3min 20s ago
>>> Confirmed that it is running normally ('active')

Solution2) Check the docker user group

## 1. Check whether the docker group exists
$ cat /etc/group | grep -i docker
docker:x:995:
## 2. Add the account to the docker group
$ sudo usermod -aG docker $(whoami)
## 3. Confirm that the account (user1) was added to the docker group
$ su - user1
$ docker ps 
>> Logged in as user1 and confirmed that docker commands run normally

Solution3) Check the socket file
The /run/docker.sock file was not found >> the daemon was not running
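
The socket check can also be scripted. This is a minimal sketch, assuming a Linux host. The helper name `is_unix_socket` is illustrative, and the two paths are the ones that appear in this section.

```python
import os
import stat


def is_unix_socket(path: str) -> bool:
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    for candidate in ("/var/run/docker.sock", "/run/docker.sock"):
        print(f"{candidate}: socket present = {is_unix_socket(candidate)}")
```

A missing socket only shows that nothing is listening there. It does not say why the daemon failed to start.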

Solution4) reboot

$ sudo reboot
>> After this command, docker worked normally


Error name : OS Error Text file busy

1) Detail

OSError: [Errno 26] Text file busy: '/home/***/.cache/selenium/chromedriver/linux64/{ip}/chromedriver'

2) Causes

3) Solution

Solution1) pidof
Check for chrome processes and kill them

# confirm chrome process
pidof chrome  
# kill chrome process
pkill -f chrome

Solution2) chromedriver 재설치

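rm /home/***/.cache/selenium/chromedriver/linux64/{ip}/chromedriver
# reinstall the driver automatically (Python, selenium 4.x with webdriver-manager)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
options = webdriver.ChromeOptions()
service = ChromeService(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)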


Error name : docker-proxy still running

1) Detail

docker-proxy using port when no containers are running

2) Causes

3) Solution

Solution1) Kill the leftover docker-proxy processes

# Stop docker
sudo service docker stop 
# Find your particular zombie proxy processes
sudo netstat -pna | grep docker-proxy
# tcp6       0      0 0.0.0.0:8080       :::*     LISTEN      <PID_A>/docker-proxy  
# tcp6       0      0 :::8080            :::*     LISTEN      <PID_B>/docker-proxy
# ...
# Kill them
sudo kill -9 PID_A PID_B ...
# restart
sudo service docker start


Error name : Error response from daemon

1) Detail

Error response from daemon: 
driver failed programming external connectivity on endpoint airflow_webserver_container 
(666e108e2513d1daf8c59c8dcf3dd547960349cc299e882588e913309b3dcdce): 
Error starting userland proxy: listen tcp4 0.0.0.0:8080: bind: address already in use

2) Causes

Port 8080 is already in use when docker-compose up -d runs, so the same port is claimed twice on one server.

3) Solution

Solution1) Kill the processes holding the port, then run again

# check port
$ sudo netstat -pna | grep  8080
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      9068/docker-proxy
tcp6       0      0 :::8080                 :::*                    LISTEN      9076/docker-proxy 
# check pid (List who's using the port)
$ sudo lsof -i -P -n | grep  8080
docker-pr   9068            root    4u  IPv4    80599      0t0  TCP *:8080 (LISTEN)
docker-pr   9076            root    4u  IPv6    78739      0t0  TCP *:8080 (LISTEN)
# kill duplicated pid
$ sudo kill -9 9068
$ sudo kill -9 9076
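
The port check can also be done from Python before starting the stack. This is a minimal sketch using only the standard library. The helper name `is_port_in_use` is illustrative, and it only detects listeners reachable on the given host address.

```python
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    port = 8080  # the port named in the error message
    print(f"port {port} in use: {is_port_in_use(port)}")
```

Unlike netstat or lsof, this does not report which process holds the port, so it works as a quick pre-check rather than a replacement for the commands above.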


Error name : Jenkins Host key verification failed

1) Detail

'Host key verification failed' in deployment of Jenkins

Along with the error above, a shell script that works under the user1 account does not work under the jenkins account.

2) Causes

For example, one server has multiple users (user1, jenkins, docker, root).
jenkins is a service account with its own home directory, so it does not share the SSH keys set up for user1.

3) Solution

Solution1) Configure the jenkins account the same way as the default user account

# switch to the jenkins account
$ sudo su -s /bin/bash jenkins
## Set up passwordless SSH to GitHub
# 1. generate a key pair and print the public key
$ ssh-keygen
$ cat /var/lib/jenkins/.ssh/id_rsa.pub
# 2. copy the public key and paste it into the GitHub SSH settings
## Set up passwordless SSH to the remote server
# 1. open authorized_keys on the remote server
$ vi ~/.ssh/authorized_keys # in remote server
# 2. paste the contents of id_rsa.pub

Error name : "git submodule update" failed with 'fatal: detected dubious ownership in repository at...

1) Detail

fatal: detected dubious ownership in repository at '/media/data/users/jhu3szh/serialize'
To add an exception for this directory, call:

git config --global --add safe.directory /media/data/users/jhu3szh/serialize

This error occurred when I ran "git pull" on the Jenkins server as the jenkins user account.

2) Causes

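This comes from Git's safe.directory ownership check, not from a problem in the repository itself: Git stops when the repository is owned by a user other than the one running the command.
The check can be silenced for all directories.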

3) Solution

Solution1) git config setting

$ sudo su -s /bin/bash jenkins
$ git config --global --add safe.directory '*'

Error name : Permission denied: '/opt/airflow/logs/scheduler'

1) Detail

When I run docker-compose up --build, I get the error below in airflow_scheduler_container:

PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler'

2) Causes

3) Solution

Solution1) Change the log directory location
I changed my log path to "/usr/local/airflow/log"

# docker-compose.yaml
...
volumes:
    - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
    - ${AIRFLOW_PROJ_DIR:-.}/comm:/opt/airflow/comm
    - ${AIRFLOW_PROJ_DIR:-.}/logs:/usr/local/airflow/log

Reference : https://github.com/xnuinside/airflow_in_docker_compose/issues/4


Error name : Permission denied Docker daemon sock

1) Detail

This error occurred when I deployed Airflow with Jenkins.

ERROR: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock

2) Causes

docker is a system group,
and the current user account (jenkins) is not a member of it.

# List all groups
cat /etc/group

# Check which accounts belong to the docker group
groups docker 
root user1
> jenkins is not in the docker group

3) Solution

Solution1) Add the user to the docker group

# add user(jenkins) to docker group
$ sudo usermod -aG docker jenkins
# open up permissions on the socket
$ sudo chmod 666 /var/run/docker.sock
# restart docker daemon service
$ sudo systemctl restart docker

Error name : Runtime.InvalidEntrypoint

1) Detail

This error occurred when I deployed a Lambda function from a container image in ECR.

{
  "errorType": "Runtime.InvalidEntrypoint",
  "errorMessage": "RequestId: e38a5180-079b-4c73-86a3-9512f2798da3 Error: fork/exec /lambda-entrypoint.sh: exec format error"
}

2) Causes

The CPU architecture differs between my local machine (Apple M3) and the Lambda function environment.
Apple M3 is a series of ARM-based systems on a chip.

3) Solution

Solution 1) Match arm64
When building the container image, I set the platform explicitly to the CPU architecture 'arm64'.

# AS-IS
$ docker build --no-cache -t lambda . 
# TO-BE
$ docker build --no-cache --platform linux/arm64 -t lambda . 
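
An exec format error generally means the image was built for a different CPU architecture than the one executing it, so the value passed to --platform has to match the architecture configured on the Lambda function (x86_64 or arm64). The sketch below maps a local machine name to a Docker platform string. The helper `docker_platform` and its lookup table are illustrative assumptions and cover only those two architectures.

```python
import platform

# Common `uname -m` style names mapped to Docker --platform values
_DOCKER_PLATFORMS = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "arm64": "linux/arm64",
    "aarch64": "linux/arm64",
}


def docker_platform(machine: str) -> str:
    """Map a machine name such as 'arm64' to a Docker --platform value."""
    try:
        return _DOCKER_PLATFORMS[machine.lower()]
    except KeyError:
        raise ValueError(f"unrecognized machine type: {machine!r}") from None


if __name__ == "__main__":
    local = platform.machine()  # e.g. 'arm64' on Apple silicon
    try:
        print(f"local default platform: {docker_platform(local)}")
    except ValueError as exc:
        print(exc)
```

Comparing the local default with the function's configured architecture shows whether an explicit --platform flag is needed at build time.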

Error name : airflow uid, gid error

1) Detail

2) Causes

3) Solution

Solution 1) Update the UID
Set AIRFLOW_UID in the .env file:
[.env]
AIRFLOW_UID=50000


Error name : pip install permission denied metadata

1) Detail

ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/home/airflow/.local/lib/python3.10/site-packages/pip'

2) Causes

When the Dockerfile runs RUN pip install --upgrade pip
or RUN pip install --user --no-cache-dir -r /opt/airflow/dags/requirements.txt,
a Permission denied error occurs.

Checking the permissions inside the Docker container gave this result:

$ whoami
airflow

$ id
uid=50000(airflow) gid=50000 groups=50000

$ ls -ld /home/airflow/.local/lib/python3.10/site-packages/
drwxrwxr-x 1001 1001

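The current user is airflow, with UID and GID both set to 50000.
The /home/airflow/.local/lib/python3.10/site-packages/ directory shows drwxrwxr-x 1001 1001, so it is owned by UID 1001 and GID 1001. The airflow user is neither that owner nor a member of that group, so it falls under "other" and has read and execute permission but no write permission.
This is why the Permission denied error occurs.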
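
The owner/group/other rule behind this can be worked through in a few lines. This is a simplified sketch of the classic POSIX check, ignoring root and ACLs. The function name `can_write` is illustrative, and the numbers are the ones from the output above.

```python
import stat


def can_write(mode: int, owner_uid: int, owner_gid: int, uid: int, gids: set[int]) -> bool:
    """Apply the owner/group/other rule for write permission (ignores root and ACLs)."""
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# drwxrwxr-x (0o775) owned by 1001:1001, checked for the airflow user 50000:50000
mode = 0o775
print(can_write(mode, owner_uid=1001, owner_gid=1001, uid=50000, gids={50000}))  # False
print(can_write(mode, owner_uid=1001, owner_gid=1001, uid=1001, gids={1001}))    # True
```

Only the last permission triplet (r-x) applies to UID 50000, which is why changing the owner of the directory resolves the error.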

3) Solution

Solution 1) change owner and permission

# Run as the root user
USER root
RUN mkdir -p /home/airflow/.local/ && \
    chown -R airflow:airflow /home/airflow/.local/
# Switch back to the airflow user
USER airflow
RUN pip install --upgrade pip
RUN pip install --user --no-cache-dir -r /opt/airflow/dags/requirements.txt

Error name : Failed to allocate directory watch: Too many open files

1) Detail

I'm using Airbyte in a k8s environment.

$ systemctl restart nrpe
Failed to allocate directory watch: Too many open files

2) Causes

Resource limits

  • The system limits resources such as the number of files that can be open at once.

  • Current limit on open files:

    $ ulimit -n
    1024

3) Solution

Solution 1) update limits.conf

$ sudo vi /etc/security/limits.conf
* soft nofile 15360
* hard nofile 15360

Solution 2) ulimit

ulimit -n 15360
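
The same limit can be inspected and raised from inside a Python process with the standard `resource` module. This is a minimal sketch, assuming Linux, and the helper names are illustrative. Like ulimit -n in a shell, it affects only the current process and its children and does not persist.

```python
import resource


def open_file_limits() -> tuple[int, int]:
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target: int) -> int:
    """Raise the soft limit toward `target`, capped at the hard limit."""
    soft, hard = open_file_limits()
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if target > soft:  # never lower an existing limit
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return open_file_limits()[0]


if __name__ == "__main__":
    print("before:", open_file_limits())
    print("soft limit now:", raise_soft_limit(15360))
```

An unprivileged process can raise its soft limit only up to the hard limit, which is why the limits.conf change is still needed when the hard limit itself is too low.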


---

## Template 
## Error name : 
#### 1) Detail
#### 2) Causes
#### 3) Solution
> __Solution 1) AAA__
> __Solution 2) BBB__