[AEWS Cohort 3] Week 3 Study Notes

ajufresh · February 20, 2025

1. Deploying the Lab Environment

The setup looks similar to week 2, but there is one difference: an EFS file system has been added and connected to the pods, which use EFS for storage.

[1] Deploy the base lab environment with AWS CloudFormation

# Download the yaml file
curl -O https://s3.ap-northeast-2.amazonaws.com/cloudformation.cloudneta.net/K8S/myeks-3week.yaml

# Deploy
# aws cloudformation deploy --template-file myeks-3week.yaml --stack-name myeks --parameter-overrides KeyName=<My SSH Keyname> SgIngressSshCidr=<My Home Public IP Address>/32 --region <region>
# Example:
aws cloudformation deploy --template-file ~/Downloads/myeks-3week.yaml \
     --stack-name myeks --parameter-overrides KeyName=kp-gasida SgIngressSshCidr=$(curl -s ipinfo.io/ip)/32 --region ap-northeast-2

# After the CloudFormation stack is deployed, print the ops server EC2 IP
aws cloudformation describe-stacks --stack-name myeks --query 'Stacks[*].Outputs[*].OutputValue' --output text
# Example output: 3.35.137.31

# SSH into the ops server EC2
# Example: ssh ec2-user@3.35.137.31
ssh -i <ssh key file> ec2-user@$(aws cloudformation describe-stacks --stack-name myeks --query 'Stacks[*].Outputs[0].OutputValue' --output text)

Once the deployment finishes, the EFS file system is visible in the AWS console.

[2] Deploy EKS with eksctl
Set the environment variables needed for the yaml-based deployment.

export CLUSTER_NAME=myeks

# Look up the myeks-VPC/Subnet information and store it in variables
export VPCID=$(aws ec2 describe-vpcs --filters "Name=tag:Name,Values=$CLUSTER_NAME-VPC" --query 'Vpcs[*].VpcId' --output text)
echo $VPCID

export PubSubnet1=$(aws ec2 describe-subnets --filters Name=tag:Name,Values="$CLUSTER_NAME-Vpc1PublicSubnet1" --query "Subnets[0].[SubnetId]" --output text)
export PubSubnet2=$(aws ec2 describe-subnets --filters Name=tag:Name,Values="$CLUSTER_NAME-Vpc1PublicSubnet2" --query "Subnets[0].[SubnetId]" --output text)
export PubSubnet3=$(aws ec2 describe-subnets --filters Name=tag:Name,Values="$CLUSTER_NAME-Vpc1PublicSubnet3" --query "Subnets[0].[SubnetId]" --output text)
echo $PubSubnet1 $PubSubnet2 $PubSubnet3


#------------------
SSHKEYNAME=<your own SSH key pair name>
SSHKEYNAME=aews

Next, write the myeks.yaml file and deploy EKS from the final yaml.

cat << EOF > myeks.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: myeks
  region: ap-northeast-2
  version: "1.31"

iam:
  withOIDC: true # enables the IAM OIDC provider as well as IRSA for the Amazon CNI plugin

  serviceAccounts: # service accounts to create in the cluster. See IAM Service Accounts
  - metadata:
      name: aws-load-balancer-controller
      namespace: kube-system
    wellKnownPolicies:
      awsLoadBalancerController: true

vpc:
  cidr: 192.168.0.0/16
  clusterEndpoints:
    privateAccess: true # if you only want to allow private access to the cluster
    publicAccess: true # if you want to allow public access to the cluster
  id: $VPCID
  subnets:
    public:
      ap-northeast-2a:
        az: ap-northeast-2a
        cidr: 192.168.1.0/24
        id: $PubSubnet1
      ap-northeast-2b:
        az: ap-northeast-2b
        cidr: 192.168.2.0/24
        id: $PubSubnet2
      ap-northeast-2c:
        az: ap-northeast-2c
        cidr: 192.168.3.0/24
        id: $PubSubnet3

addons:
  - name: vpc-cni # "version: latest" below picks the newest available version
    version: latest # auto discovers the latest available
    attachPolicyARNs: # attach IAM policies to the add-on's service account
      - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
    configurationValues: |-
      enableNetworkPolicy: "true"

  - name: kube-proxy
    version: latest

  - name: coredns
    version: latest

  - name: metrics-server
    version: latest

managedNodeGroups:
- amiFamily: AmazonLinux2023
  desiredCapacity: 3
  iam:
    withAddonPolicies:
      certManager: true # Enable cert-manager
      externalDNS: true # Enable ExternalDNS
  instanceType: t3.medium
  preBootstrapCommands:
    # install additional packages
    - "dnf install nvme-cli links tree tcpdump sysstat ipvsadm ipset bind-utils htop -y"
  labels:
    alpha.eksctl.io/cluster-name: myeks
    alpha.eksctl.io/nodegroup-name: ng1
  maxPodsPerNode: 100
  maxSize: 3
  minSize: 3
  name: ng1
  ssh:
    allow: true
    publicKeyName: $SSHKEYNAME
  tags:
    alpha.eksctl.io/nodegroup-name: ng1
    alpha.eksctl.io/nodegroup-type: managed
  volumeIOPS: 3000
  volumeSize: 120
  volumeThroughput: 125
  volumeType: gp3
EOF

eksctl create cluster -f myeks.yaml --verbose 4

Once the deployment finishes, check the basic information from the console.

#
kubectl cluster-info

# Switch the current namespace to default
kubens default

# 
kubectl ctx
kubectl config rename-context "<your own IAM User>@myeks.ap-northeast-2.eksctl.io" "eksworkshop"
kubectl config rename-context "admin@myeks.ap-northeast-2.eksctl.io" "eksworkshop"

#
kubectl get node --label-columns=node.kubernetes.io/instance-type,eks.amazonaws.com/capacityType,topology.kubernetes.io/zone
kubectl get node -v=6

#
kubectl get pod -A
kubectl get pdb -n kube-system

# Check the managed node group
eksctl get nodegroup --cluster $CLUSTER_NAME
aws eks describe-nodegroup --cluster-name $CLUSTER_NAME --nodegroup-name ng1 | jq

# Check the EKS add-ons
eksctl get addon --cluster $CLUSTER_NAME

# Confirm the IAM service account created for aws-load-balancer-controller: AWS IAM role bound to a Kubernetes service account
eksctl get iamserviceaccount --cluster $CLUSTER_NAME
NAMESPACE	NAME				ROLE ARN
kube-system	aws-load-balancer-controller	arn:aws:iam::911283464785:role/eksctl-myeks-addon-iamserviceaccount-kube-sys-Role1-S60JsHI62pHB

In the AWS console as well, you can confirm that the node group role includes the permissions defined in the yaml, such as ExternalDNS and cert-manager.

[3] Access the managed node group (EC2) and check node information: max-pods

# Check instance information 1
aws ec2 describe-instances --query "Reservations[*].Instances[*].{InstanceID:InstanceId, PublicIPAdd:PublicIpAddress, PrivateIPAdd:PrivateIpAddress, InstanceName:Tags[?Key=='Name']|[0].Value, Status:State.Name}" --filters Name=instance-state-name,Values=running --output table

# Check instance information 2: AZ, ID, public IP
aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=myeks-ng1-Node" \
    --query "Reservations[].Instances[].{InstanceID:InstanceId, PublicIP:PublicIpAddress, AZ:Placement.AvailabilityZone}" \
    --output table

# Public IP of the EC2 instance in AZ1
aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=myeks-ng1-Node" "Name=availability-zone,Values=ap-northeast-2a" \
    --query 'Reservations[*].Instances[*].PublicIpAddress' \
    --output text

# Public IP of the EC2 instance in AZ2
aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=myeks-ng1-Node" "Name=availability-zone,Values=ap-northeast-2b" \
    --query 'Reservations[*].Instances[*].PublicIpAddress' \
    --output text

# Public IP of the EC2 instance in AZ3
aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=myeks-ng1-Node" "Name=availability-zone,Values=ap-northeast-2c" \
    --query 'Reservations[*].Instances[*].PublicIpAddress' \
    --output text

# Store the EC2 public IPs in variables
export N1=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=myeks-ng1-Node" "Name=availability-zone,Values=ap-northeast-2a" --query 'Reservations[*].Instances[*].PublicIpAddress' --output text)
export N2=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=myeks-ng1-Node" "Name=availability-zone,Values=ap-northeast-2b" --query 'Reservations[*].Instances[*].PublicIpAddress' --output text)
export N3=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=myeks-ng1-Node" "Name=availability-zone,Values=ap-northeast-2c" --query 'Reservations[*].Instances[*].PublicIpAddress' --output text)
echo $N1, $N2, $N3


# ID of the security group whose name contains *remoteAccess*
aws ec2 describe-security-groups --filters "Name=group-name,Values=*remoteAccess*" | jq
export MNSGID=$(aws ec2 describe-security-groups --filters "Name=group-name,Values=*remoteAccess*" --query 'SecurityGroups[*].GroupId' --output text)

# Add an inbound rule for your home public IP to that security group
aws ec2 authorize-security-group-ingress --group-id $MNSGID --protocol '-1' --cidr $(curl -s ipinfo.io/ip)/32

# Add an inbound rule for the ops server's internal IP to that security group
aws ec2 authorize-security-group-ingress --group-id $MNSGID --protocol '-1' --cidr 172.20.1.100/32


# ping test
ping -c 2 $N1
ping -c 2 $N2
ping -c 2 $N3

# SSH into the worker nodes
ssh -i <SSH key> -o StrictHostKeyChecking=no ec2-user@$N1 hostname
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh -o StrictHostKeyChecking=no ec2-user@$i hostname; echo; done

ssh ec2-user@$N1
exit
ssh ec2-user@$N2
exit
ssh ec2-user@$N3
exit

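SSH access works because ssh.allow is set to true in the yaml. With this setting, eksctl attaches a remoteAccess security group to the nodes, and the inbound rules added to that security group above are what let us reach the nodes. (Setting it to false is obviously safer, but it is set to true here for the convenience of the lab.)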

Confirm that ping works and that SSH into the worker nodes succeeds.

# Check basic node information
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i hostnamectl; echo; done
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo ip -c addr; echo; done

# Check block devices and the root filesystem
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i lsblk; echo; done
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i df -hT /; echo; done

# Check storage classes and CSI nodes
kubectl get sc
kubectl describe sc gp2

kubectl get crd
kubectl get csinodes

# Check max-pods
kubectl describe node | grep Capacity: -A13
kubectl get nodes -o custom-columns="NAME:.metadata.name,MAXPODS:.status.capacity.pods"

# Check on the nodes
# (AL2 AMIs only: AL2023 nodes are bootstrapped by nodeadm and have no /etc/eks/bootstrap.sh)
# for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i cat /etc/eks/bootstrap.sh; echo; done
ssh ec2-user@$N1 sudo cat /etc/kubernetes/kubelet/config.json | jq
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo cat /etc/kubernetes/kubelet/config.json | grep maxPods; echo; done
for i in $N1 $N2 $N3; do echo ">> node $i <<"; ssh ec2-user@$i sudo cat /etc/kubernetes/kubelet/config.json.d/00-nodeadm.conf | grep maxPods; echo; done

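For reference, gp3 is a better choice than gp2: it provides a baseline of 3,000 IOPS and 125 MiB/s regardless of volume size and costs less per GB.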

The default max-pods for a t3.medium node is 17, but here it was raised to 100 (maxPodsPerNode: 100 in myeks.yaml).
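The default of 17 follows from the VPC CNI formula used without prefix delegation: max pods = ENIs × (IPv4 addresses per ENI − 1) + 2. The sketch below assumes a t3.medium supports 3 ENIs with 6 IPv4 addresses each (values from the EC2 ENI limits table; check them for your own instance type).

```shell
#!/usr/bin/env bash
# Sketch: VPC CNI max-pods formula (no prefix delegation).
# ASSUMPTION: t3.medium = 3 ENIs x 6 IPv4 addresses per ENI.
set -eu

max_pods() {  # usage: max_pods <enis> <ipv4_per_eni>
  local enis=$1 ips_per_eni=$2
  # Each ENI's primary IP is not handed out to pods, hence (ips - 1);
  # +2 covers host-network pods such as aws-node and kube-proxy.
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

echo "t3.medium max-pods: $(max_pods 3 6)"
```

Keep in mind that maxPodsPerNode: 100 only raises the kubelet limit. As we understand it, without prefix delegation every non-hostNetwork pod still needs a secondary IP, so a t3.medium can hand out only about 3 × 5 = 15 pod IPs.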

[4] Ops server EC2: set up the EKS kubeconfig and test mounting EFS
Set up credentials on the ops server as well, and check that Kubernetes information is returned correctly.

# Configure the IAM credentials that were used to create EKS
aws configure
...

# Check get-caller-identity
aws sts get-caller-identity --query Arn

# Create the kubeconfig
aws eks update-kubeconfig --name myeks --user-alias <credential user printed above>
aws eks update-kubeconfig --name myeks --user-alias admin

# 
kubectl cluster-info
kubectl ns default
kubectl get node -v6

Next, practice mounting EFS.

# Check the current EFS information
aws efs describe-file-systems | jq

# Print only the file system ID
aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text
fs-040469b8fab273469

# Check the EFS mount target information
aws efs describe-mount-targets --file-system-id $(aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text) | jq

# Print only the IPs
aws efs describe-mount-targets --file-system-id $(aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text) --query "MountTargets[*].IpAddress" --output text
192.168.2.102	192.168.1.71	192.168.3.184

# DNS query: why does it fail?
# EFS domain name (example): fs-040469b8fab273469.efs.ap-northeast-2.amazonaws.com
dig +short $(aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text).efs.ap-northeast-2.amazonaws.com


# EFS mount test
EFSIP1=<pick any IP from the "print only the IPs" output>
EFSIP1=192.168.1.71

df -hT
mkdir /efs
mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport $EFSIP1:/ /efs
findmnt -t nfs4
df -hT --type nfs4

The mount succeeded. Once mounted this way, the data in the EFS file system can be used as if it were on the local file system.

In other words, this command mounted the NFS export served by $EFSIP1 (192.168.1.71 here) onto the local directory /efs.

# Write a file
nfsstat
echo "EKS Workshop" > /efs/memo.txt
nfsstat
ls -l /efs
cat /efs/memo.txt

It looks like the file is stored locally, but it is actually stored on network-based shared storage (EFS).

[5] Install AWS LoadBalancerController, ExternalDNS, and kube-ops-view

Install the tools we use most often.

# kube-ops-view
helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 --set service.main.type=ClusterIP  --set env.TZ="Asia/Seoul" --namespace kube-system

# AWS LoadBalancerController
helm repo add eks https://aws.github.io/eks-charts
helm repo update
kubectl get sa -n kube-system aws-load-balancer-controller
helm install aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=$CLUSTER_NAME \
  --set serviceAccount.create=false --set serviceAccount.name=aws-load-balancer-controller

# ExternalDNS
MyDomain=ajufresh.com
MyDnzHostedZoneId=$(aws route53 list-hosted-zones-by-name --dns-name "$MyDomain." --query "HostedZones[0].Id" --output text)
curl -s https://raw.githubusercontent.com/gasida/PKOS/main/aews/externaldns.yaml | MyDomain=$MyDomain MyDnzHostedZoneId=$MyDnzHostedZoneId envsubst | kubectl apply -f -

# Check the certificate ARN in the region in use: make sure it is valid (an expired certificate causes an error!)
CERT_ARN=$(aws acm list-certificates --query 'CertificateSummaryList[].CertificateArn[]' --output text)
echo $CERT_ARN

# Ingress for kubeopsview: the group setting lets multiple Ingresses share a single ALB
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/certificate-arn: $CERT_ARN
    alb.ingress.kubernetes.io/group.name: study
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
    alb.ingress.kubernetes.io/load-balancer-name: myeks-ingress-alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/success-codes: 200-399
    alb.ingress.kubernetes.io/target-type: ip
  labels:
    app.kubernetes.io/name: kubeopsview
  name: kubeopsview
  namespace: kube-system
spec:
  ingressClassName: alb
  rules:
  - host: kubeopsview.$MyDomain
    http:
      paths:
      - backend:
          service:
            name: kube-ops-view
            port:
              number: 8080
        path: /
        pathType: Prefix
EOF

# Check the installed pods
kubectl get pods -n kube-system

# Check service, ep, ingress
kubectl get ingress,svc,ep -n kube-system

# Check the Kube Ops View access URL
echo -e "Kube Ops View URL = https://kubeopsview.$MyDomain/#scale=1.5"
open "https://kubeopsview.$MyDomain/#scale=1.5" # macOS

Everything comes up correctly.

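2. Understanding Storage

With no extra configuration, all of a container's data is lost when the pod stops. Adding a volume to the pod is how data gets kept.

If some data must be preserved, the place it is stored needs a lifecycle separate from the Pod's. The Persistent Volume (PV) was introduced to meet this requirement, and the PVC (PersistentVolumeClaim) acts as the bridge between the pod and the PV.

Using a Storage Class to automatically create a volume and attach it to a pod when the pod is created is called Dynamic Provisioning.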

emptyDir
hostPath
PV/PVC

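CSI Driver: as with CNI for networking, Kubernetes defines a standard storage interface (CSI) that drivers implement. A DaemonSet is deployed on every node to check that volumes are actually attached and detached.

For reference, the maximum number of volumes per node is 25 or 39, depending on the EC2 instance type.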

kubectl describe csinodes

Nothing is installed yet, so only basic information is shown.

[Checking default pod storage and emptyDir behavior]

# Monitoring
kubectl get pod -w

# Create the redis pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: redis
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: redis
    image: redis
EOF

# Write a file inside the redis pod
kubectl exec -it redis -- pwd
kubectl exec -it redis -- sh -c "echo hello > /data/hello.txt"
kubectl exec -it redis -- cat /data/hello.txt

# Install ps
kubectl exec -it redis -- sh -c "apt update && apt install procps -y"
kubectl exec -it redis -- ps aux

# Force-kill the redis process: what happens to the pod? hint) restartPolicy
kubectl exec -it redis -- kill 1
kubectl get pod

# Check the file inside the redis pod
kubectl exec -it redis -- cat /data/hello.txt
kubectl exec -it redis -- ls -l /data

# Delete the pod
kubectl delete pod redis

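1) When the redis process is force-killed, the container is started again (the pod's default restartPolicy is Always).

2) After the restart, the file is gone.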

# Monitoring
kubectl get pod -w

# Create the redis pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: redis
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: redis
    image: redis
    volumeMounts:
    - name: redis-storage
      mountPath: /data/redis
  volumes:
  - name: redis-storage
    emptyDir: {}
EOF

# Write a file inside the redis pod
kubectl exec -it redis -- pwd
kubectl exec -it redis -- sh -c "echo hello > /data/redis/hello.txt"
kubectl exec -it redis -- cat /data/redis/hello.txt

# Install ps
kubectl exec -it redis -- sh -c "apt update && apt install procps -y"
kubectl exec -it redis -- ps aux

# Force-kill the redis process: what happens to the pod? hint) restartPolicy
kubectl exec -it redis -- kill 1
kubectl get pod

# Check the file inside the redis pod
kubectl exec -it redis -- cat /data/redis/hello.txt
kubectl exec -it redis -- ls -l /data/redis

# Delete the pod, recreate it, and check the file
kubectl delete pod redis
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: redis
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: redis
    image: redis
    volumeMounts:
    - name: redis-storage
      mountPath: /data/redis
  volumes:
  - name: redis-storage
    emptyDir: {}
EOF

# Check the file inside the redis pod
kubectl exec -it redis -- cat /data/redis/hello.txt
kubectl exec -it redis -- ls -l /data/redis

# Delete the pod
kubectl delete pod redis

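1) When the redis process is force-killed, the container is started again.

2) Unlike before, however, the file is not deleted (an emptyDir lives as long as the pod).

3) After the pod is deleted and created again, the file is gone.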

[PV/PVC using a host path: deploying the local-path-provisioner storage class]

A controller that lets pods use a directory, created with a dynamically generated name, under a specific host directory on the node.

# Deploy
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.31/deploy/local-path-storage.yaml
...
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-path-config
  namespace: local-path-storage
data:
  config.json: |-
    {
            "nodePathMap":[
            {
                    "node":"DEFAULT_PATH_FOR_NON_LISTED_NODES",
                    "paths":["/opt/local-path-provisioner"]
            }
            ]
    }
  setup: |-
    #!/bin/sh
    set -eu
    mkdir -m 0777 -p "$VOL_DIR"
  teardown: |-
    #!/bin/sh
    set -eu
    rm -rf "$VOL_DIR"
...

# Verify
kubectl get-all -n local-path-storage
kubectl get pod -n local-path-storage -owide
kubectl describe cm -n local-path-storage local-path-config
kubectl get sc
kubectl get sc local-path
NAME         PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  34s
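The ConfigMap above shows what the provisioner does on the node: the setup script creates $VOL_DIR with mode 0777 and the teardown script removes it. A minimal local simulation of that lifecycle, assuming the pvc-<uid>_<namespace>_<pvc-name> directory pattern seen in the tree output later in this post; a temp directory stands in for /opt/local-path-provisioner and the PV uid is made up.

```shell
#!/usr/bin/env bash
# Sketch: simulate local-path-provisioner's setup/teardown on a temp directory.
# ASSUMPTIONS: BASE stands in for /opt/local-path-provisioner; the naming
# pattern is inferred from this post's tree output; the uid is hypothetical.
set -eu

BASE=$(mktemp -d)
PV_NAME="pvc-00000000-0000-0000-0000-000000000000"  # hypothetical PV name
NAMESPACE="default"
PVC_NAME="localpath-claim"
VOL_DIR="$BASE/${PV_NAME}_${NAMESPACE}_${PVC_NAME}"

# setup script from local-path-config (runs when the volume is provisioned)
mkdir -m 0777 -p "$VOL_DIR"
echo "after setup:"; ls "$BASE"

# teardown script (runs when the PV is deleted under reclaimPolicy: Delete)
rm -rf "$VOL_DIR"
echo "after teardown:"; ls "$BASE"
```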

Now create a pod that uses a PV/PVC.

# Create the PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: localpath-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 1Gi
EOF

# Check the PVC
kubectl get pvc
kubectl describe pvc


# Create the pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo \$(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: localpath-claim
EOF

# Check the pod
kubectl get pod,pv,pvc
kubectl describe pv    # check Node Affinity
kubectl exec -it app -- tail -f /data/out.txt
Thu Feb 13 09:33:53 UTC 2025
Thu Feb 13 09:33:58 UTC 2025
... 

# out.txt exists under the path below only on the worker node where the pod is currently deployed
for node in $N1 $N2 $N3; do ssh ec2-user@$node tree /opt/local-path-provisioner; done
/opt/local-path-provisioner
└── pvc-f1615862-e4cd-47d0-b89c-8d0e99270678_default_localpath-claim
    └── out.txt

# Check out.txt directly on that worker node: the node ($N1 here) and the pvc-<id> directory name differ per lab environment
ssh ec2-user@$N1 tail -f /opt/local-path-provisioner/pvc-f1615862-e4cd-47d0-b89c-8d0e99270678_default_localpath-claim/out.txt
...

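Right after the PVC is created, it stays Pending because there is no pod to bind it to yet (the local-path StorageClass uses volumeBindingMode: WaitForFirstConsumer).

The volume directory exists only on the node where the pod is running; nodes without the pod have none.

Delete the pod, recreate it, and check whether the data is preserved.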

# Delete the pod, then check the PV/PVC
kubectl delete pod app
kubectl get pod,pv,pvc
for node in $N1 $N2 $N3; do ssh ec2-user@$node tree /opt/local-path-provisioner; done

# Run the pod again
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo \$(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: localpath-claim
EOF
 
# Verify
kubectl exec -it app -- head /data/out.txt
kubectl exec -it app -- tail -f /data/out.txt

Unlike emptyDir, the file was still there after the pod was recreated.

# Delete the pod and the PVC
kubectl delete pod app
kubectl get pv,pvc
kubectl delete pvc localpath-claim

# Verify
kubectl get pv
for node in $N1 $N2 $N3; do ssh ec2-user@$node tree /opt/local-path-provisioner; done

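Because the reclaimPolicy is Delete, deleting the PVC also deletes the PV, and the teardown script removes the directory on the node.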

[Measuring performance with Kubestr]
A tool for measuring the performance of Kubernetes storage.

# [Ops server EC2] Download the kubestr tool
wget https://github.com/kastenhq/kubestr/releases/download/v0.4.48/kubestr_0.4.48_Linux_amd64.tar.gz
tar xvfz kubestr_0.4.48_Linux_amd64.tar.gz && mv kubestr /usr/local/bin/ && chmod +x /usr/local/bin/kubestr

# Check storage classes
kubestr -h
kubestr

# Monitoring
watch 'kubectl get pod -owide;echo;kubectl get pv,pvc'

## SSH into each server below and leave iostat running: checks I/O statistics
ssh ec2-user@$N1
ssh ec2-user@$N2
ssh ec2-user@$N3
-----------------
iostat -xmdz 1
--------------------------------------------------------------
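# rrqm/s : read requests merged per second while queued to the driver
# wrqm/s : write requests merged per second while queued to the driver
# r/s : read requests issued to the disk device per second
# w/s : write requests issued to the disk device per second
# rMB/s : megabytes read from the disk device per second
# wMB/s : megabytes written to the disk device per second
# await : the most important metric, average response time; includes both time spent waiting in the driver request queue and the device's I/O service time (unit: ms)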
iostat -xmdz 1 -p xvdf
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdf              0.00     0.00 2637.93    0.00    10.30     0.00     8.00     6.01    2.28    2.28    0.00   0.33  86.21
--------------------------------------------------------------
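When comparing runs, it helps to pull out only the key columns. A small sketch that picks r/s, w/s and await for one device by header name rather than by position, fed with the xvdf sample line above. It assumes the sysstat version in use prints a plain await column; newer releases may print only r_await/w_await.

```shell
#!/usr/bin/env bash
# Sketch: extract r/s, w/s and await for one device from `iostat -x` text,
# looking columns up by header name instead of position.
# ASSUMPTION: the header has an "await" column (as in the sample above).
set -eu

# Sample copied from the iostat output above
iostat_sample='Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdf              0.00     0.00 2637.93    0.00    10.30     0.00     8.00     6.01    2.28    2.28    0.00   0.33  86.21'

pick() {  # usage: pick <device>   (reads iostat text on stdin)
  awk -v dev="$1" '
    /^Device/ { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header name -> field index
    $1 == dev {
      printf "%s r/s=%s w/s=%s await=%sms\n", dev, $(col["r/s"]), $(col["w/s"]), $(col["await"])
    }
  '
}

printf '%s\n' "$iostat_sample" | pick xvdf
```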

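# Run a random-read performance test: takes about 3 minutes
# Measures the random-read performance of high-performance storage with the libaio engine and direct I/O.
# Bypasses the OS cache and performs disk I/O directly (direct flag)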
cat << EOF > fio-read.fio
[global]
ioengine=libaio
direct=1
bs=4k
runtime=120
time_based=1
iodepth=16
numjobs=4
group_reporting
size=1g
rw=randread
[read]
EOF
kubestr fio -f fio-read.fio -s local-path --size 10G  # if --size is omitted, the default is 100G, which can fill up the node disk; be careful
PVC created kubestr-fio-pvc-v5wzp
Pod created kubestr-fio-pod-24nqd
Running FIO test (fio-read.fio) on StorageClass (local-path) with a PVC of Size (10G)
Elapsed time- 2m33.103404647s
FIO test results:
  
FIO version - fio-3.36
Global options - ioengine=libaio verify= direct=1 gtod_reduce=

JobName: 
  blocksize= filesize= iodepth= rw=
read:
  IOPS=3023.671631 BW(KiB/s)=12094
  iops: min=2306 max=8972 avg=3025.259521
  bw(KiB/s): min=9224 max=35888 avg=12101.108398

Disk stats (read/write):
  nvme0n1: ios=362470/161 merge=0/55 ticks=6337974/3107 in_queue=6341080, util=96.498177%
  -  OK

# (Reference) [NVMe] average read IOPS is 20300
kubestr fio -f fio-read.fio -s local-path --size 10G
...
read:
  IOPS=20300.531250 BW(KiB/s)=81202
  iops: min=17304 max=71653 avg=20309.919922
  bw(KiB/s): min=69216 max=286612 avg=81239.710938


# Run a random-write performance test: takes about 5 minutes
# numjobs=16, iodepth=16: a total of 16×16 = 256 I/O requests in flight at the same time
cat << EOF > fio-write.fio
[global]
ioengine=libaio
numjobs=16
iodepth=16
direct=1
bs=4k
runtime=120
time_based=1
size=1g
group_reporting
rw=randrw
rwmixread=0
rwmixwrite=100
[write]
EOF
kubestr fio -f fio-write.fio -s local-path --size 20G
...
write:
  IOPS=3024.619873 BW(KiB/s)=12098
  iops: min=1557 max=8682 avg=3024.849365
  bw(KiB/s): min=6231 max=34732 avg=12099.703125
...

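IOPS≈3,024 matches the 3,000 IOPS shown in the AWS console for the node's gp3 root volume (volumeIOPS: 3000 in myeks.yaml), so the test is bounded by the volume's provisioned IOPS.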
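A quick sanity check on these numbers: with bs=4k every I/O moves 4 KiB, so bandwidth should be roughly IOPS × 4 KiB. The sketch recomputes bandwidth from the IOPS figures reported above (awk handles the floating-point math). It also shows why the gp3 runs are IOPS-bound: 3,000 × 4 KiB is about 12,000 KiB/s, far below the 125 MiB/s throughput set by volumeThroughput: 125.

```shell
#!/usr/bin/env bash
# Sketch: bandwidth (KiB/s) ~= IOPS x block size (KiB).
# IOPS values are copied from the kubestr/fio runs above; %d drops the fraction.
set -eu

bw_kib() {  # usage: bw_kib <iops> <block_size_kib>
  awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%d\n", iops * bs }'
}

echo "gp3 randread : $(bw_kib 3023.671631 4) KiB/s (fio reported 12094)"
echo "gp3 randwrite: $(bw_kib 3024.619873 4) KiB/s (fio reported 12098)"
echo "NVMe randread: $(bw_kib 20300.531250 4) KiB/s (fio reported 81202)"

# gp3 settings from myeks.yaml: 3000 IOPS at 4k vs the 125 MiB/s throughput cap
echo "3000 IOPS at 4k = $(bw_kib 3000 4) KiB/s; 125 MiB/s = $((125 * 1024)) KiB/s"

# In-flight I/Os in the write job: numjobs x iodepth
echo "write job in-flight I/Os: $((16 * 16))"
```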

*** [Info] Choosing the right storage for cloud native CI/CD on Amazon Elastic Kubernetes Service - Blog (a post on choosing the right storage)

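3. AWS EBS Controller

The AWS EBS CSI driver has two main components:
1) CSI-Controller (EBS CSI-Controller), which calls AWS APIs to manage AWS storage
2) CSI-Node, which interacts with kubelet to mount AWS storage into pods

The actual deployment is illustrated in the diagram below.

The accessModes of the persistentvolume and persistentvolumeclaim must be set to ReadWriteOnce (an EBS volume is generally attached to a single node at a time).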

[Installation: Amazon EBS CSI driver as an Amazon EKS add-on - Parameters]

# List all aws-ebs-csi-driver versions and which one is installed by default (True)
aws eks describe-addon-versions \
    --addon-name aws-ebs-csi-driver \
    --kubernetes-version 1.31 \
    --query "addons[].addonVersions[].[addonVersion, compatibilities[].defaultVersion]" \
    --output text

# IRSA setup: uses the AWS managed policy AmazonEBSCSIDriverPolicy
eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster ${CLUSTER_NAME} \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve \
  --role-only \
  --role-name AmazonEKS_EBS_CSI_DriverRole

# Check IRSA
eksctl get iamserviceaccount --cluster ${CLUSTER_NAME}
NAMESPACE	    NAME				            ROLE ARN
kube-system 	ebs-csi-controller-sa		arn:aws:iam::911283464785:role/AmazonEKS_EBS_CSI_DriverRole
...

# Deploy (install) the Amazon EBS CSI driver add-on
export ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
eksctl create addon --name aws-ebs-csi-driver --cluster ${CLUSTER_NAME} --service-account-role-arn arn:aws:iam::${ACCOUNT_ID}:role/AmazonEKS_EBS_CSI_DriverRole --force
kubectl get sa -n kube-system ebs-csi-controller-sa -o yaml | head -5

# Verify
eksctl get addon --cluster ${CLUSTER_NAME}
kubectl get deploy,ds -l=app.kubernetes.io/name=aws-ebs-csi-driver -n kube-system
kubectl get pod -n kube-system -l 'app in (ebs-csi-controller,ebs-csi-node)'
kubectl get pod -n kube-system -l app.kubernetes.io/component=csi-driver

# Check the 6 containers in the ebs-csi-controller pod
kubectl get pod -n kube-system -l app=ebs-csi-controller -o jsonpath='{.items[0].spec.containers[*].name}' ; echo
ebs-plugin csi-provisioner csi-attacher csi-snapshotter csi-resizer liveness-probe

# Check csinodes
kubectl api-resources | grep -i csi
kubectl get csinodes
kubectl describe csinodes
...
Name:               ip-192-168-1-104.ap-northeast-2.compute.internal
Labels:             <none>
Annotations:        storage.alpha.kubernetes.io/migrated-plugins:
                      kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/azure-file,kubernetes.io/cinder,kubernetes.io/gce-pd,kubernetes.io/portworx-v...
CreationTimestamp:  Sat, 15 Feb 2025 13:43:03 +0900
Spec:
  Drivers:
    ebs.csi.aws.com:
      Node ID:  i-01fe8eed1ead9cde5
      Allocatables:
        Count:        25
      Topology Keys:  [kubernetes.io/os topology.ebs.csi.aws.com/zone topology.kubernetes.io/zone]
Events:               <none>
...

kubectl get csidrivers
NAME              ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES        AGE
ebs.csi.aws.com   true             false            false             <unset>         false               Persistent   109s
efs.csi.aws.com   false            false            false             <unset>         false               Persistent   40m

kubectl describe csidrivers ebs.csi.aws.com


# (Reference) Change the maximum number of EBS volumes that can be attached to a node
aws eks update-addon --cluster-name ${CLUSTER_NAME} --addon-name aws-ebs-csi-driver \
  --addon-version v1.39.0-eksbuild.1 --configuration-values '{
    "node": {
      "volumeAttachLimit": 31,
      "enableMetrics": true
    }
  }'
Or
cat << EOF > node-attachments.yaml
"node":
  "volumeAttachLimit": 31
  "enableMetrics": true
EOF
aws eks update-addon --cluster-name ${CLUSTER_NAME} --addon-name aws-ebs-csi-driver \
  --addon-version v1.39.0-eksbuild.1 --configuration-values 'file://node-attachments.yaml'


## Verify
kubectl get ds -n kube-system ebs-csi-node -o yaml
...
      containers:
      - args:
        - node
        - --endpoint=$(CSI_ENDPOINT)
        - --csi-mount-point-prefix=/var/lib/kubelet/plugins/kubernetes.io/csi/ebs.csi.aws.com/
        - --volume-attach-limit=31
        - --logging-format=text
        - --v=2

kubectl describe csinodes
...
Spec:
  Drivers:
    ebs.csi.aws.com:
      Node ID:  i-0660cdc75451595ab
      Allocatables:
        Count:        31

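You can confirm that the CSI driver is installed.

Unlike before, the allocatable attachment count is now shown: 25 by default, and 31 after applying the volumeAttachLimit change above.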

[Creating a gp3 storage class: AWS EBS storage class parameters]

# Create the gp3 storage class
kubectl get sc
cat <<EOF | kubectl apply -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
allowVolumeExpansion: true
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  #iops: "5000"
  #throughput: "250"
  allowAutoIOPSPerGBIncrease: 'true'
  encrypted: 'true'
  fsType: xfs # the default is ext4
EOF
kubectl get sc
kubectl describe sc gp3 | grep Parameters

[PVC/PV pod test]

# Check the worker nodes' EBS volumes: filter by tag (key/value)
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --output table
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --query "Volumes[*].Attachments" | jq
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --query "Volumes[*].{ID:VolumeId,Tag:Tags}" | jq
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --query "Volumes[].[VolumeId, VolumeType, Attachments[].[InstanceId, State][]][]" | jq
aws ec2 describe-volumes --filters Name=tag:Name,Values=$CLUSTER_NAME-ng1-Node --query "Volumes[].{VolumeId: VolumeId, VolumeType: VolumeType, InstanceId: Attachments[0].InstanceId, State: Attachments[0].State}" | jq

# Check the EBS volumes added to pods on the worker nodes
aws ec2 describe-volumes --filters Name=tag:ebs.csi.aws.com/cluster,Values=true --output table
aws ec2 describe-volumes --filters Name=tag:ebs.csi.aws.com/cluster,Values=true --query "Volumes[*].{ID:VolumeId,Tag:Tags}" | jq
aws ec2 describe-volumes --filters Name=tag:ebs.csi.aws.com/cluster,Values=true --query "Volumes[].{VolumeId: VolumeId, VolumeType: VolumeType, InstanceId: Attachments[0].InstanceId, State: Attachments[0].State}" | jq

# Monitor the EBS volumes added to pods on the worker nodes
while true; do aws ec2 describe-volumes --filters Name=tag:ebs.csi.aws.com/cluster,Values=true --query "Volumes[].{VolumeId: VolumeId, VolumeType: VolumeType, InstanceId: Attachments[0].InstanceId, State: Attachments[0].State}" --output text; date; sleep 1; done

# Create the PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  storageClassName: gp3
EOF
kubectl get pvc,pv

# Create the pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo \$(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: ebs-claim
EOF

# Check the PVC and pod
kubectl get pvc,pv,pod
kubectl get VolumeAttachment
kubectl df-pv

# Check details of the added EBS volume: also visible in the AWS console under EC2 (EBS)
aws ec2 describe-volumes --volume-ids $(kubectl get pv -o jsonpath="{.items[0].spec.csi.volumeHandle}") | jq

A 4 GiB volume has been added, and you can confirm that its type is gp3.


# Check PV details: what does the nodeAffinity mean?
kubectl get pv -o yaml
...
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.ebs.csi.aws.com/zone
            operator: In
            values:
            - ap-northeast-2b
...

kubectl get node --label-columns=topology.ebs.csi.aws.com/zone,topology.k8s.aws/zone-id
kubectl describe node

# Confirm that lines keep being appended to the file
kubectl exec app -- tail -f /data/out.txt

## Check volume information from inside the pod
kubectl exec -it app -- sh -c 'df -hT --type=overlay'
kubectl exec -it app -- sh -c 'df -hT --type=xfs'

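Inside the pod we can see the overlay and xfs filesystems. /dev/nvme1n1 is the 4 GiB EBS volume that was dynamically created and attached for the pod.

Next, expand the volume. Note that a volume can be grown but never shrunk (the gp3 StorageClass was created with allowVolumeExpansion: true).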
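The grow-only rule can be checked before patching. The helper below is hypothetical (it is not part of kubectl) and assumes sizes are written as whole numbers with a Gi suffix; the API server itself normally rejects a PVC update that lowers spec.resources.requests.storage.

```shell
#!/usr/bin/env bash
# Sketch: hypothetical pre-check before patching a PVC; expansion only grows.
# ASSUMPTION: sizes are whole numbers with the Gi suffix (e.g. 4Gi, 10Gi).
set -eu

check_resize() {  # usage: check_resize <current> <requested>
  local cur=${1%Gi} req=${2%Gi}
  if (( req > cur )); then
    echo "ok: $1 -> $2"
  elif (( req == cur )); then
    echo "no-op: already $1"
  else
    echo "refused: $1 -> $2 (shrinking is not supported)" >&2
    return 1
  fi
}

check_resize 4Gi 10Gi
check_resize 10Gi 4Gi || true   # prints the refusal on stderr
```

In practice the current size could come from kubectl get pvc ebs-claim -o jsonpath='{.status.capacity.storage}'.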

# Grow the current PVC from 4G to 10G: change .spec.resources.requests.storage from 4Gi to 10Gi
kubectl get pvc ebs-claim -o jsonpath={.spec.resources.requests.storage} ; echo
kubectl get pvc ebs-claim -o jsonpath={.status.capacity.storage} ; echo
kubectl patch pvc ebs-claim -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'
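kubectl patch pvc ebs-claim -p '{"status":{"capacity":{"storage":"10Gi"}}}' # not needed: once the command above expands the EBS volume, status is updated to 10Gi automatically

# Verify: the resize has to be applied to the volume first, so the numbers may take a little while to update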
kubectl exec -it app -- sh -c 'df -hT --type=xfs'
kubectl df-pv
aws ec2 describe-volumes --volume-ids $(kubectl get pv -o jsonpath="{.items[0].spec.csi.volumeHandle}") | jq

The volume has grown to 10 GB.

Likewise, Avail inside the pod increased from 4 GB to 10 GB.
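The grow-only rule fits in a small check. Kubernetes rejects a PVC update that lowers spec.resources.requests.storage, and expansion also requires allowVolumeExpansion: true on the StorageClass. The sketch below uses hypothetical helper names (to_bytes, check_resize). It understands only the binary suffixes Ki/Mi/Gi/Ti, not the full Kubernetes quantity grammar.

```python
"""Sketch: allow a PVC size change only when it grows the volume."""

# Binary suffixes only; the full Kubernetes quantity grammar is not modeled.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a plain byte count such as "1073741824"


def check_resize(current: str, requested: str) -> str:
    cur, req = to_bytes(current), to_bytes(requested)
    if req < cur:
        raise ValueError(f"PVC cannot be shrunk: {current} -> {requested}")
    return "no-op" if req == cur else "expand"


print(check_resize("4Gi", "10Gi"))  # expand
try:
    check_resize("10Gi", "4Gi")
except ValueError as err:
    print(err)  # PVC cannot be shrunk: 10Gi -> 4Gi
```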

kubectl delete pod app && kubectl delete pvc ebs-claim

3. AWS Volume Snapshots Controller

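What if an administrator accidentally deletes the Pod and the PVC/PV? With the default Delete reclaim policy, the backing volume and its data are deleted along with them.
To guard against this, volumes can be backed up with snapshots.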

[Installing the VolumeSnapshot controller - links: VolumeSnapshot example, Blog, Docs]

# Install Snapshot CRDs
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl get crd | grep snapshot
kubectl api-resources  | grep snapshot

# Install Common Snapshot Controller
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
kubectl get deploy -n kube-system snapshot-controller
kubectl get pod -n kube-system

# Install Snapshotclass
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-ebs-csi-driver/master/examples/kubernetes/snapshot/manifests/classes/snapshotclass.yaml
kubectl get vsclass # 혹은 volumesnapshotclasses
kubectl describe vsclass

Create a test PVC and pod.

# Create the PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  storageClassName: gp3
EOF
kubectl get pvc,pv

# Create the pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo \$(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: ebs-claim
EOF

# Confirm that new lines keep being appended to the file
kubectl exec app -- tail -f /data/out.txt

# Create a VolumeSnapshot referencing the PersistentVolumeClaim name
# (the EBS snapshot can also be checked in the AWS console)
cat <<EOF | kubectl apply -f -
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ebs-volume-snapshot
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    persistentVolumeClaimName: ebs-claim
EOF

# Check the VolumeSnapshot
kubectl get volumesnapshot
kubectl get volumesnapshot ebs-volume-snapshot -o jsonpath={.status.boundVolumeSnapshotContentName} ; echo
kubectl describe volumesnapshot.snapshot.storage.k8s.io ebs-volume-snapshot
kubectl get volumesnapshotcontents

# Check the EBS snapshot ID behind the VolumeSnapshot
kubectl get volumesnapshotcontents -o jsonpath='{.items[*].status.snapshotHandle}' ; echo

# Check the EBS snapshot on the AWS side
aws ec2 describe-snapshots --owner-ids self | jq
aws ec2 describe-snapshots --owner-ids self --query 'Snapshots[]' --output table

This is the snapshot taken of the pod's dynamically provisioned volume.
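The chain runs VolumeSnapshot → VolumeSnapshotContent → EBS snapshot ID, stored in status.snapshotHandle (the field queried with jsonpath above). A script might parse the JSON output instead of using jsonpath. The sketch below assumes the shape of `kubectl get volumesnapshotcontents -o json`, and the names and snapshot IDs in the sample are made up.

```python
"""Sketch: list EBS snapshot IDs behind ready VolumeSnapshotContents."""
import json

# Made-up, trimmed sample shaped like `kubectl get volumesnapshotcontents -o json`
SAMPLE = """
{
  "items": [
    {"metadata": {"name": "snapcontent-1111"},
     "status": {"readyToUse": true, "snapshotHandle": "snap-0abc123"}},
    {"metadata": {"name": "snapcontent-2222"},
     "status": {"readyToUse": false, "snapshotHandle": "snap-0def456"}}
  ]
}
"""


def ready_snapshot_ids(doc: dict) -> dict:
    ids = {}
    for item in doc.get("items", []):
        status = item.get("status", {})
        # Skip snapshots that are still being taken (readyToUse false or absent).
        if status.get("readyToUse") and "snapshotHandle" in status:
            ids[item["metadata"]["name"]] = status["snapshotHandle"]
    return ids


print(ready_snapshot_ids(json.loads(SAMPLE)))
```

To use real data, pipe the kubectl output into the script and replace json.loads(SAMPLE) with json.load(sys.stdin), adding import sys.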

# Delete the app pod and the PVC to simulate an accident
kubectl delete pod app && kubectl delete pvc ebs-claim

Now restore from the snapshot.

# Restore a PVC from the snapshot
kubectl get pvc,pv
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-snapshot-restored-claim
spec:
  storageClassName: gp3
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
  dataSource:
    name: ebs-volume-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
EOF

# Verify
kubectl get pvc,pv

# Create the pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: app
    image: centos
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo \$(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: ebs-snapshot-restored-claim
EOF

# Check the file: entries written before the pod was deleted are still there, and new entries keep being appended after the pod is recreated
kubectl exec app -- cat /data/out.txt
...
Sat Dec 24 15:12:24 UTC 2022
Sat Dec 24 15:12:24 UTC 2022
Sat Dec 24 15:24:23 UTC 2022
Sat Dec 24 15:24:23 UTC 2022
...


What is new in this PVC is the following block:

  dataSource:
    name: ebs-volume-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io

This declares that the PVC should be created from the snapshot instead of as an empty volume.

Checking the file again with kubectl exec shows that its contents have been restored.

kubectl delete pod app && kubectl delete pvc ebs-snapshot-restored-claim && kubectl delete volumesnapshots ebs-volume-snapshot

4. AWS EFS Controller

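EBS is block storage, while EFS is file storage (NFS). A key feature of EFS is that it can be mounted and used directly, including by many clients at the same time.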

[Checking the EFS file system and installing the EFS Controller add-on]

# Check the EFS file system ID
aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text

# List all aws-efs-csi-driver versions and which one is installed by default (True)
aws eks describe-addon-versions \
    --addon-name aws-efs-csi-driver \
    --kubernetes-version 1.31 \
    --query "addons[].addonVersions[].[addonVersion, compatibilities[].defaultVersion]" \
    --output text

# Create an IAM policy (optional: the IRSA step below attaches the AWS managed policy instead)
curl -s -O https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/master/docs/iam-policy-example.json
aws iam create-policy --policy-name AmazonEKS_EFS_CSI_Driver_Policy --policy-document file://iam-policy-example.json

# IRSA setup: attaches the AWS managed policy AmazonEFSCSIDriverPolicy (the customer-managed AmazonEKS_EFS_CSI_Driver_Policy is not used here)
eksctl create iamserviceaccount \
  --name efs-csi-controller-sa \
  --namespace kube-system \
  --cluster ${CLUSTER_NAME} \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEFSCSIDriverPolicy \
  --approve \
  --role-only \
  --role-name AmazonEKS_EFS_CSI_DriverRole

# Verify IRSA
eksctl get iamserviceaccount --cluster ${CLUSTER_NAME}

# Deploy (install) the Amazon EFS CSI driver add-on
export ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
eksctl create addon --name aws-efs-csi-driver --cluster ${CLUSTER_NAME} --service-account-role-arn arn:aws:iam::${ACCOUNT_ID}:role/AmazonEKS_EFS_CSI_DriverRole --force
kubectl get sa -n kube-system efs-csi-controller-sa -o yaml | head -5

# Verify
eksctl get addon --cluster ${CLUSTER_NAME}
kubectl get pod -n kube-system -l "app.kubernetes.io/name=aws-efs-csi-driver,app.kubernetes.io/instance=aws-efs-csi-driver"
kubectl get pod -n kube-system -l app=efs-csi-controller -o jsonpath='{.items[0].spec.containers[*].name}' ; echo
kubectl get csidrivers efs.csi.aws.com -o yaml

[Configuring pods to use the EFS file system: Add empty StorageClasses from static example - Workshop link]

# Monitoring
watch 'kubectl get sc efs-sc; echo; kubectl get pv,pvc,pod'

# [Operations server EC2]
# Clone the example code
git clone https://github.com/kubernetes-sigs/aws-efs-csi-driver.git /root/efs-csi
cd /root/efs-csi/examples/kubernetes/multiple_pods/specs && tree

# Create and check the EFS StorageClass
cat storageclass.yaml
kubectl apply -f storageclass.yaml
kubectl get sc efs-sc

# Create and check the PV: replace volumeHandle with your own EFS file system ID
EfsFsId=$(aws efs describe-file-systems --query "FileSystems[*].FileSystemId" --output text)
sed -i "s/fs-4af69aab/$EfsFsId/g" pv.yaml
cat pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-05699d3c12ef609e2

kubectl apply -f pv.yaml
kubectl get pv; kubectl describe pv

Because the access mode is ReadWriteMany (RWX), multiple pods can mount and use the volume at the same time.

# Create and check the PVC
cat claim.yaml
kubectl apply -f claim.yaml
kubectl get pvc

# Create the pods; /data inside each pod is backed by EFS
cat pod1.yaml pod2.yaml
kubectl apply -f pod1.yaml,pod2.yaml
# kubectl df-pv  (not used in this step)

# Check the pods: why does the PV say 5Gi while the NFS4 volume inside the pod shows 8.0E?
kubectl get pods
kubectl exec -ti app1 -- sh -c "df -hT -t nfs4"
kubectl exec -ti app2 -- sh -c "df -hT -t nfs4"

# Check writes to the shared storage
tree /mnt/myefs              # run on the operations server EC2
tail -f /mnt/myefs/out1.txt  # run on the operations server EC2
tail -f /mnt/myefs/out2.txt  # run on the operations server EC2
kubectl exec -ti app1 -- tail -f /data/out1.txt
kubectl exec -ti app2 -- tail -f /data/out2.txt

Both pods are writing to the same storage.

The files are identical too: pod 1, pod 2, and the operations server all use the same file system over the network (NFS).
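About the 5Gi vs 8.0E question: EFS is elastic and does not enforce the capacity in the PV or PVC. Kubernetes requires the field, but the EFS CSI driver does not use it. Instead, the NFS client reports a nominal size of roughly 2^63 bytes, which df -h prints as 8.0E. The helper below is a simplified approximation of df -h's 1024-based formatting (df rounds up, while this sketch rounds to nearest). It exists only to show where 8.0E comes from, and the exact byte count EFS advertises is an assumption.

```python
"""Why `df -h` shows 8.0E for an EFS mount (simplified df-style formatting)."""


def df_human(nbytes: int) -> str:
    # 1024-based units like `df -h`; rounds to nearest (df itself rounds up).
    units = ["B", "K", "M", "G", "T", "P", "E"]
    value = float(nbytes)
    i = 0
    while value >= 1024 and i < len(units) - 1:
        value /= 1024
        i += 1
    return f"{value:.1f}{units[i]}" if value < 10 else f"{value:.0f}{units[i]}"


EFS_REPORTED_BYTES = 2**63      # assumption: roughly what EFS advertises to NFS clients
PV_CAPACITY_BYTES = 5 * 2**30   # storage: 5Gi in pv.yaml (not enforced by EFS)

print(df_human(EFS_REPORTED_BYTES))  # 8.0E
print(df_human(PV_CAPACITY_BYTES))   # 5.0G
```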

# Delete the Kubernetes resources
kubectl delete pod app1 app2
kubectl delete pvc efs-claim && kubectl delete pv efs-pv && kubectl delete sc efs-sc

[Configuring the EFS file system for multiple pods: Dynamic provisioning using EFS ← Fargate nodes are currently not supported]

EFS access points improve security (isolation) when many workloads share one file system.

# Monitoring
watch 'kubectl get sc efs-sc; echo; kubectl get pv,pvc,pod'

# [Operations server EC2]
# Create and check the EFS StorageClass
curl -s -O https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/master/examples/kubernetes/dynamic_provisioning/specs/storageclass.yaml
cat storageclass.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap #  The type of volume to be provisioned by Amazon EFS. Currently, only access point based provisioning is supported (efs-ap).
  fileSystemId: fs-92107410 # The file system under which the access point is created.
  directoryPerms: "700" # The directory permissions of the root directory created by the access point.
  gidRangeStart: "1000" # optional, The starting range of the Posix group ID to be applied onto the root directory of the access point. The default value is 50000.
  gidRangeEnd: "2000" # optional, The ending range of the Posix group ID. The default value is 7000000.
  basePath: "/dynamic_provisioning" # optional, The path on the file system under which the access point root directory is created. If the path isn't provided, the access points root directory is created under the root of the file system.
  subPathPattern: "${.PVC.namespace}/${.PVC.name}" # optional, A pattern that describes the subPath under which an access point should be created. So if the pattern were ${.PVC.namespace}/${PVC.name}, the PVC namespace is foo and the PVC name is pvc-123-456, and the basePath is /dynamic_provisioner the access point would be created at /dynamic_provisioner/foo/pvc-123-456
  ensureUniqueDirectory: "true" # optional # A boolean that ensures that, if set, a UUID is appended to the final element of any dynamically provisioned path, as in the above example. This can be turned off but this requires you as the administrator to ensure that your storage classes are set up correctly. Otherwise, it's possible that 2 pods could end up writing to the same directory by accident. Please think very carefully before setting this to false!
  reuseAccessPoint: "false" # optional
  
sed -i "s/fs-92107410/$EfsFsId/g" storageclass.yaml
kubectl apply -f storageclass.yaml
kubectl get sc efs-sc

# Create and check the PVC and pod
curl -s -O https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/master/examples/kubernetes/dynamic_provisioning/specs/pod.yaml
cat pod.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: efs-app
spec:
  containers:
    - name: app
      image: centos
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo $(date -u) >> /data/out; sleep 5; done"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: efs-claim

kubectl apply -f pod.yaml
kubectl get pvc,pv,pod

# Check the PVC/PV provisioning logs
kubectl krew install stern
kubectl stern -n kube-system -l app=efs-csi-controller -c csi-provisioner
# or
kubectl logs -n kube-system -l app=efs-csi-controller -c csi-provisioner -f

# Check the volume inside the pod
kubectl exec -it efs-app -- sh -c "df -hT -t nfs4"
Filesystem           Type            Size      Used Available Use% Mounted on
127.0.0.1:/          nfs4            8.0E         0      8.0E   0% /data

# Check writes to the shared storage
tree /mnt/myefs              # run on the operations server EC2
kubectl exec efs-app -- bash -c "cat /data/out"
kubectl exec efs-app -- bash -c "ls -l /data/out"
kubectl exec efs-app -- bash -c "stat /data/"

The result looks similar to before, except that an access point has now been created. Access points let you control permissions per path.
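Each PVC's access point root is derived from the StorageClass parameters shown earlier. The driver starts from basePath, then applies subPathPattern, and appends a UUID when ensureUniqueDirectory is true. The sketch below rebuilds that description by hand to show why two PVCs do not share a directory. It is not the driver's code, the "-" separator before the UUID is an assumption, and the PVC is assumed to live in the default namespace. The access point then applies directoryPerms ("700") and a POSIX GID taken from the gidRangeStart to gidRangeEnd range to that root directory.

```python
"""Hypothetical sketch of where an EFS access point root directory lands."""
import posixpath
import uuid
from typing import Optional


def access_point_root(base_path: str, namespace: str, pvc_name: str,
                      ensure_unique: bool = True,
                      unique_suffix: Optional[str] = None) -> str:
    # subPathPattern "${.PVC.namespace}/${.PVC.name}", expanded by hand.
    sub_path = f"{namespace}/{pvc_name}"
    if ensure_unique:
        # ensureUniqueDirectory: append a UUID to the last path element
        # (the "-" separator is an assumption of this sketch).
        sub_path += "-" + (unique_suffix or str(uuid.uuid4()))
    return posixpath.join(base_path, sub_path)


# PVC "efs-claim" from pod.yaml, assumed to be in the "default" namespace.
print(access_point_root("/dynamic_provisioning", "default", "efs-claim",
                        ensure_unique=False))
print(access_point_root("/dynamic_provisioning", "default", "efs-claim"))
```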

kubectl delete -f pod.yaml
kubectl delete -f storageclass.yaml
cd $HOME

5. EKS Persistent Volumes for Instance Store & Add NodeGroup

When a pod needs very high storage IOPS and losing the data on disk is acceptable => use the EC2 instance store (temporary block storage).

Drawback: data is lost if the underlying disk drive fails, or if the instance is stopped, hibernated, or terminated, among other cases.
Advantage: the storage is local to the host, so I/O is very fast.

# Instance store size of every c5 type that has instance store volumes
aws ec2 describe-instance-types \
 --filters "Name=instance-type,Values=c5*" "Name=instance-storage-supported,Values=true" \
 --query "InstanceTypes[].[InstanceType, InstanceStorageInfo.TotalSizeInGB]" \
 --output table
--------------------------
|  DescribeInstanceTypes |
+---------------+--------+
|  c5d.large    |  50    |
|  c5d.12xlarge |  1800  |
...

These are the instance types that offer an instance store. It is fast and inexpensive, but the data can be lost.

# Look up the public subnet IDs
export PubSubnet1=$(aws ec2 describe-subnets --filters Name=tag:Name,Values="$CLUSTER_NAME-Vpc1PublicSubnet1" --query "Subnets[0].[SubnetId]" --output text)
export PubSubnet2=$(aws ec2 describe-subnets --filters Name=tag:Name,Values="$CLUSTER_NAME-Vpc1PublicSubnet2" --query "Subnets[0].[SubnetId]" --output text)
export PubSubnet3=$(aws ec2 describe-subnets --filters Name=tag:Name,Values="$CLUSTER_NAME-Vpc1PublicSubnet3" --query "Subnets[0].[SubnetId]" --output text)
echo $PubSubnet1 $PubSubnet2 $PubSubnet3

# SSH key pair name
# SSHKEYNAME=<your own SSH key pair name>
SSHKEYNAME=aews

cat << EOF > myng2.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: myeks
  region: ap-northeast-2
  version: "1.31"

managedNodeGroups:
- amiFamily: AmazonLinux2
  desiredCapacity: 1
  instanceType: c5d.large
  labels:
    alpha.eksctl.io/cluster-name: myeks
    alpha.eksctl.io/nodegroup-name: ng2
    disk: instancestore
  maxPodsPerNode: 110
  maxSize: 1
  minSize: 1
  name: ng2
  ssh:
    allow: true
    publicKeyName: $SSHKEYNAME
  subnets:
  - $PubSubnet1
  - $PubSubnet2
  - $PubSubnet3
  tags:
    alpha.eksctl.io/nodegroup-name: ng2
    alpha.eksctl.io/nodegroup-type: managed
  volumeIOPS: 3000
  volumeSize: 30
  volumeThroughput: 125
  volumeType: gp3
  preBootstrapCommands:
    - |
      # Install Tools
      yum install nvme-cli links tree jq tcpdump sysstat -y

      # Filesystem & Mount
      mkfs -t xfs /dev/nvme1n1
      mkdir /data
      mount /dev/nvme1n1 /data

      # Get disk UUID
      uuid=\$(blkid -o value -s UUID /dev/nvme1n1)

      # Mount the disk during a reboot (by UUID; nofail keeps the boot going if the disk is missing)
      echo "UUID=\$uuid /data xfs defaults,noatime,nofail 0 2" >> /etc/fstab
EOF

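With myng2.yaml written, the node group can be created from it. The commands below show another way to produce the same file: generate it with eksctl --dry-run, splice in the preBootstrapCommands with sed, then create the node group.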

# Review the options before creating the new node group
eksctl create nodegroup --help
eksctl create nodegroup -c $CLUSTER_NAME -r ap-northeast-2 --subnet-ids "$PubSubnet1","$PubSubnet2","$PubSubnet3" --ssh-access \
  -n ng2 -t c5d.large -N 1 -m 1 -M 1 --node-volume-size=30 --node-labels disk=instancestore --max-pods-per-node 100 --dry-run > myng2.yaml

cat <<EOT > nvme.yaml
  preBootstrapCommands:
    - |
      # Install Tools
      yum install nvme-cli links tree jq tcpdump sysstat -y

      # Filesystem & Mount
      mkfs -t xfs /dev/nvme1n1
      mkdir /data
      mount /dev/nvme1n1 /data

      # Get disk UUID
      uuid=\$(blkid -o value -s UUID /dev/nvme1n1)

      # Mount the disk during a reboot (by UUID; nofail keeps the boot going if the disk is missing)
      echo "UUID=\$uuid /data xfs defaults,noatime,nofail 0 2" >> /etc/fstab
EOT
sed -i -n -e '/volumeType/r nvme.yaml' -e '1,$p' myng2.yaml

# Create the new node group from myng2.yaml
eksctl create nodegroup -f myng2.yaml
```

![](https://velog.velcdn.com/images/ajufresh/post/2b9031c1-b335-4eca-b9c6-42b37ae009d2/image.png)


```
# Verify
kubectl get node --label-columns=node.kubernetes.io/instance-type,eks.amazonaws.com/capacityType,topology.kubernetes.io/zone
kubectl get node -l disk=instancestore


# ID of the ng2 node group security group (name contains *ng2-remoteAccess*); allow access from my IP and the operations server
aws ec2 describe-security-groups --filters "Name=group-name,Values=*ng2-remoteAccess*" | jq
export NG2SGID=$(aws ec2 describe-security-groups --filters "Name=group-name,Values=*ng2-remoteAccess*" --query 'SecurityGroups[*].GroupId' --output text)
aws ec2 authorize-security-group-ingress --group-id $NG2SGID --protocol '-1' --cidr $(curl -s ipinfo.io/ip)/32
aws ec2 authorize-security-group-ingress --group-id $NG2SGID --protocol '-1' --cidr 172.20.1.100/32


# SSH into the new worker node
# N4=<public IP of your own worker node 4>
N4=3.37.44.222
ssh ec2-user@$N4 hostname

# Verify
ssh ec2-user@$N4 sudo nvme list
ssh ec2-user@$N4 sudo lsblk -e 7 -d
ssh ec2-user@$N4 sudo df -hT -t xfs
ssh ec2-user@$N4 sudo tree /data
ssh ec2-user@$N4 sudo cat /etc/fstab

# (Optional) check max pods
kubectl describe node -l disk=instancestore | grep Allocatable: -A7

# (Optional) check kubelet parameters: --max-pods appears twice (29, then 110); the later value, 110, is the one that takes effect
ssh ec2-user@$N4 cat /etc/eks/bootstrap.sh
ssh ec2-user@$N4 sudo ps -ef | grep kubelet
root        3012       1  0 06:50 ?        00:00:02 /usr/bin/kubelet --config /etc/kubernetes/kubelet/kubelet-config.json --kubeconfig /var/lib/kubelet/kubeconfig --container-runtime-endpoint unix:///run/containerd/containerd.sock --image-credential-provider-config /etc/eks/image-credential-provider/config.json --image-credential-provider-bin-dir /etc/eks/image-credential-provider --node-ip=192.168.2.228 --pod-infra-container-image=602401143452.dkr.ecr.ap-northeast-2.amazonaws.com/eks/pause:3.5 --v=2 --hostname-override=ip-192-168-2-228.ap-northeast-2.compute.internal --cloud-provider=external --node-labels=eks.amazonaws.com/sourceLaunchTemplateVersion=1,alpha.eksctl.io/cluster-name=myeks,alpha.eksctl.io/nodegroup-name=ng2,disk=instancestore,eks.amazonaws.com/nodegroup-image=ami-0fa05db9e3c145f63,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup=ng2,eks.amazonaws.com/sourceLaunchTemplateId=lt-0955d0931c1d712c1 --max-pods=29 --max-pods=110

The c5d.large type comes with an instance store; the "d" in c5d indicates local NVMe SSD storage.
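The 29 in --max-pods=29 comes from the ENI-based formula the Amazon VPC CNI uses without prefix delegation: ENIs × (IPv4 addresses per ENI − 1) + 2. A c5d.large has 3 ENIs with 10 IPv4 addresses each; the ENI count also appears in the ec2-instance-selector table later. The function below is a small worked version of that arithmetic. Raising max-pods to 110 only helps if the CNI can actually hand out that many pod IPs, for example with prefix delegation.

```python
"""ENI-based max pods for the Amazon VPC CNI (no prefix delegation)."""


def eni_max_pods(enis: int, ipv4_per_eni: int) -> int:
    # Each ENI's primary IP is not handed to pods, hence (ipv4_per_eni - 1);
    # +2 covers host-network pods such as aws-node and kube-proxy.
    return enis * (ipv4_per_eni - 1) + 2


# c5d.large: 3 ENIs, 10 IPv4 addresses per ENI
print(eni_max_pods(enis=3, ipv4_per_eni=10))  # 29
```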

[Recreating the local-path storage class: changing the path]

# Delete the existing local-path storage class
kubectl delete -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.31/deploy/local-path-storage.yaml

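# Reinstall it with the storage path changed from /opt to /data (the instance store mount point)
curl -sL https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.31/deploy/local-path-storage.yaml | sed 's|/opt/local-path-provisioner|/data/local-path-provisioner|g' | kubectl apply -f -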

kubectl describe cm -n local-path-storage local-path-config
...
        "nodePathMap":[
        {
                "node":"DEFAULT_PATH_FOR_NON_LISTED_NODES",
                "paths":["/data/local-path-provisioner"]
        }
        ]
...

# Monitoring
watch 'kubectl get pod -owide;echo;kubectl get pv,pvc'
ssh ec2-user@$N4 iostat -xmdz 1 -p nvme1n1

# [Operations server EC2] Measure read performance
kubestr fio -f fio-read.fio -s local-path --size 10G --nodeselector disk=instancestore
...
read:
  IOPS=20309.355469 BW(KiB/s)=81237
  iops: min=17392 max=93872 avg=20316.857422
  bw(KiB/s): min=69570 max=375488 avg=81268.023438

Disk stats (read/write):
  nvme1n1: ios=2432488/9 merge=0/3 ticks=7639891/23 in_queue=7639913, util=99.950768%

IOPS is far higher than in the earlier EBS test.

Average IOPS measured with kubestr, regular EBS (gp3 default of 3,000 IOPS) vs. instance store ← the instance store is about 7x faster => far superior in terms of performance.
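The "about 7x" figure is the measured average read IOPS divided by gp3's default 3,000 IOPS. This compares a measurement against a provisioned limit, and gp3 can be provisioned higher at extra cost, so treat the ratio as a rough, lab-specific number.

```python
"""Rough ratio between the measured instance-store read IOPS and gp3's baseline."""

GP3_BASELINE_IOPS = 3000              # gp3 default provisioned IOPS
instance_store_iops = 20309.355469    # kubestr fio read result above (c5d.large)

ratio = instance_store_iops / GP3_BASELINE_IOPS
print(f"instance store / gp3 baseline = {ratio:.2f}x")  # prints: instance store / gp3 baseline = 6.77x
```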

# Delete the local-path storage class
kubectl delete -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.31/deploy/local-path-storage.yaml

# Delete the ng2 node group
eksctl delete nodegroup -c $CLUSTER_NAME -n ng2

6. Node groups

[[Operations server EC2] Enabling docker buildx: multi- (or cross-) platform builds]

Ubuntu images are published per platform (for example arm64v8/ubuntu and riscv64/ubuntu on Docker Hub).

# Check the host CPU architecture
arch
x86_64

# Try running arm64v8 and riscv64 images (without emulation these fail with an exec format error)
docker run --rm -it riscv64/ubuntu bash
docker run --rm -it arm64v8/ubuntu bash


# Extended build capabilities with BuildKit - List builder instances
docker buildx ls
NAME/NODE DRIVER/ENDPOINT STATUS  BUILDKIT PLATFORMS
default * docker
  default default         running v0.12.5  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/386


# Enable docker buildx (needed for multi-architecture builds): register QEMU emulators for other architectures
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
docker images

docker buildx create --use --name mybuilder
docker buildx ls

# Verify that buildx works
docker buildx inspect --bootstrap
...
Platforms: linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
...

docker buildx ls
NAME/NODE    DRIVER/ENDPOINT             STATUS  BUILDKIT PLATFORMS
mybuilder *  docker-container
  mybuilder0 unix:///var/run/docker.sock running v0.19.0  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
default      docker
  default    default                     running v0.12.5  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/386, linux/arm64, linux/riscv64, linux/ppc64, linux/ppc64le, linux/s390x, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6

docker ps
CONTAINER ID   IMAGE                           COMMAND       CREATED              STATUS              PORTS     NAMES
fa8773b87c70   moby/buildkit:buildx-stable-1   "buildkitd"   About a minute ago   Up About a minute             buildx_buildkit_mybuilder0

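Once buildx is set up with QEMU emulation, images for other architectures can be built and run.

The Platforms column lists the supported platforms: the host is x86_64, yet it can build images for Arm-based platforms as well.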
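The arch command printed x86_64, while buildx lists platforms such as linux/amd64 and linux/arm64. Kernel machine names and Docker platform strings use different vocabularies. The mapping below is a hand-written sketch covering a few common cases (macOS on Apple silicon reports arm64). It is not an exhaustive table from Docker.

```python
"""Map `arch` / `uname -m` machine names to Docker platform strings (partial)."""

UNAME_TO_PLATFORM = {
    "x86_64": "linux/amd64",
    "aarch64": "linux/arm64",
    "arm64": "linux/arm64",      # macOS on Apple silicon reports arm64
    "armv7l": "linux/arm/v7",
    "riscv64": "linux/riscv64",
    "s390x": "linux/s390x",
    "ppc64le": "linux/ppc64le",
    "i686": "linux/386",
}


def docker_platform(machine: str) -> str:
    try:
        return UNAME_TO_PLATFORM[machine]
    except KeyError:
        raise ValueError(f"no mapping for machine type {machine!r}") from None


print(docker_platform("x86_64"))   # linux/amd64
print(docker_platform("aarch64"))  # linux/arm64
```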

[(Sample) Building and running a container image on a Windows PC (amd64) and macOS (arm64)]

# Create a working directory
mkdir myweb && cd myweb

# Write server.py
cat > server.py <<EOF
from http.server import ThreadingHTTPServer, BaseHTTPRequestHandler
from datetime import datetime
import socket

class RequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/plain')
        self.end_headers()
        
        now = datetime.now()
        hostname = socket.gethostname()
        response_string = now.strftime("The time is %-I:%M:%S %p, VERSION 0.0.1\n")
        response_string += f"Server hostname: {hostname}\n"
        self.wfile.write(bytes(response_string, "utf-8")) 

def startServer():
    try:
        server = ThreadingHTTPServer(('', 80), RequestHandler)
        print("Listening on " + ":".join(map(str, server.server_address)))
        server.serve_forever()
    except KeyboardInterrupt:
        server.shutdown()

if __name__ == "__main__":
    startServer()
EOF


# Create the Dockerfile
cat > Dockerfile <<EOF
FROM python:3.12
ENV PYTHONUNBUFFERED=1
COPY . /app
WORKDIR /app 
CMD python3 server.py
EOF

# Build, run, then remove
docker pull python:3.12
docker build -t myweb:1 -t myweb:latest .
docker images
docker run -d -p 8080:80 --name=timeserver myweb
curl http://localhost:8080
docker rm -f timeserver


# Multi-platform build and push
docker images
docker login

# DOCKERNAME=<your Docker Hub account name>
DOCKERNAME=gasida

docker buildx build --platform linux/amd64,linux/arm64 --push --tag $DOCKERNAME/myweb:multi .
docker images
docker manifest inspect $DOCKERNAME/myweb:multi | jq
docker buildx imagetools inspect $DOCKERNAME/myweb:multi

# Run the container: try the same image path on both a Windows PC (amd64) and macOS (arm64)!
docker ps
docker run -d -p 8080:80 --name=timeserver $DOCKERNAME/myweb:multi
docker ps

# Access the container and check its logs
curl http://localhost:8080
docker logs timeserver

# List the files inside the container image
docker exec -it timeserver ls -l

# Show server.py inside the container image
docker exec -it timeserver cat server.py

# Remove the container
docker rm -f timeserver

The container runs on both machines even though their architectures differ => that is what multi-platform means!
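The same tag works on both machines because the pushed tag points to a manifest list (image index) with one entry per platform. Each client pulls the entry that matches its own OS and architecture. The sketch below imitates that selection on a trimmed, made-up `docker manifest inspect` output: the digests are shortened, and the unknown/unknown entry stands in for a build attestation. Real clients also consider the variant field, which this sketch ignores.

```python
"""Sketch: how a client picks its image from a multi-platform manifest list."""
import json

# Trimmed, made-up example of `docker manifest inspect <image>:multi` output.
INDEX = json.loads("""
{
  "manifests": [
    {"digest": "sha256:aaa", "platform": {"architecture": "amd64", "os": "linux"}},
    {"digest": "sha256:bbb", "platform": {"architecture": "arm64", "os": "linux"}},
    {"digest": "sha256:ccc", "platform": {"architecture": "unknown", "os": "unknown"}}
  ]
}
""")


def pick_digest(index: dict, os_name: str, arch: str) -> str:
    for entry in index["manifests"]:
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return entry["digest"]
    raise LookupError(f"no image for {os_name}/{arch}")


print(pick_digest(INDEX, "linux", "amd64"))  # Windows PC (amd64) -> sha256:aaa
print(pick_digest(INDEX, "linux", "arm64"))  # macOS (arm64)      -> sha256:bbb
```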

[Using an AWS ECR private repository]

# Log in to ECR
export ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
aws ecr get-login-password \
--region ap-northeast-2 | docker login \
--username AWS \
--password-stdin ${ACCOUNT_ID}.dkr.ecr.ap-northeast-2.amazonaws.com
cat /root/.docker/config.json | jq

# Create the ECR private repository
aws ecr create-repository --repository-name myweb

# Push to the ECR private repository
docker buildx build --platform linux/amd64,linux/arm64 --push --tag ${ACCOUNT_ID}.dkr.ecr.ap-northeast-2.amazonaws.com/myweb:multi .
docker images

# Run the container: try the same image path on both a Windows PC (amd64) and macOS (arm64)!
docker run -d -p 8080:80 --name=timeserver ${ACCOUNT_ID}.dkr.ecr.ap-northeast-2.amazonaws.com/myweb:multi
docker ps
curl http://localhost:8080

# Remove the container
docker rm -f timeserver

This runs fine as well.

[ARM node group]
AWS Graviton processors: custom AWS silicon built on 64-bit Arm processor cores ⇒ 20-40% better price performance

# Show each node's CPU architecture
kubectl get nodes -L kubernetes.io/arch

# Create a new node group
eksctl create nodegroup --help
eksctl create nodegroup -c $CLUSTER_NAME -r ap-northeast-2 --subnet-ids "$PubSubnet1","$PubSubnet2","$PubSubnet3" \
  -n ng3 -t t4g.medium -N 1 -m 1 -M 1 --node-volume-size=30 --node-labels family=graviton --dry-run > myng3.yaml
cat myng3.yaml
eksctl create nodegroup -f myng3.yaml

# Verify
kubectl get nodes --label-columns eks.amazonaws.com/nodegroup,kubernetes.io/arch,eks.amazonaws.com/capacityType
kubectl describe nodes --selector family=graviton
aws eks describe-nodegroup --cluster-name $CLUSTER_NAME --nodegroup-name ng3 | jq .nodegroup.taints

# Set taints (takes about 2-3 minutes to apply)
aws eks update-nodegroup-config --cluster-name $CLUSTER_NAME --nodegroup-name ng3 --taints "addOrUpdateTaints=[{key=frontend, value=true, effect=NO_EXECUTE}]"

# Verify
kubectl describe nodes --selector family=graviton | grep Taints
aws eks describe-nodegroup --cluster-name $CLUSTER_NAME --nodegroup-name ng3 | jq .nodegroup.taints
# NO_SCHEDULE - This corresponds to the Kubernetes NoSchedule taint effect. This configures the managed node group with a taint that repels all pods that don't have a matching toleration. Running pods are not evicted from the managed node group's nodes.
# NO_EXECUTE - This corresponds to the Kubernetes NoExecute taint effect. Allows nodes configured with this taint to not only repel newly scheduled pods but also evicts any running pods without a matching toleration.
# PREFER_NO_SCHEDULE - This corresponds to the Kubernetes PreferNoSchedule taint effect. If possible, EKS avoids scheduling Pods that do not tolerate this taint onto the node.
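With the frontend=true:NoExecute taint in place, only pods that carry a matching toleration can be scheduled on, or stay on, the Graviton node. The sketch below is a simplified model of Kubernetes taint/toleration matching. It ignores tolerationSeconds and treats PreferNoSchedule as a soft preference. A pod meant for this node would also need a nodeSelector such as family=graviton to be steered there, because a toleration only permits placement.

```python
"""Simplified model of Kubernetes taint/toleration matching."""

HARD_EFFECTS = {"NoSchedule", "NoExecute"}  # PreferNoSchedule is only a preference


def tolerates(toleration: dict, taint: dict) -> bool:
    # An empty effect in the toleration matches every effect.
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with an empty key tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def allowed_on_node(tolerations: list, taints: list) -> bool:
    # Every hard taint must be tolerated by at least one toleration.
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint["effect"] in HARD_EFFECTS
    )


# The taint set on ng3 above (EKS API NO_EXECUTE == Kubernetes NoExecute)
graviton_taints = [{"key": "frontend", "value": "true", "effect": "NoExecute"}]
frontend_tolerations = [
    {"key": "frontend", "operator": "Equal", "value": "true", "effect": "NoExecute"}
]

print(allowed_on_node(frontend_tolerations, graviton_taints))  # True
print(allowed_on_node([], graviton_taints))                     # False
```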

[Spot instances]

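Spot Instances provide EC2 capacity at a steep discount, but AWS can reclaim them at any time (with a two-minute warning). As Kubernetes worker nodes, they therefore suit workloads such as stateless API endpoints, batch processing, ML training, big-data ETL with Apache Spark, queue-processing applications, and CI/CD pipelines.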

# [Operations server EC2] Install ec2-instance-selector
curl -Lo ec2-instance-selector https://github.com/aws/amazon-ec2-instance-selector/releases/download/v2.4.1/ec2-instance-selector-`uname | tr '[:upper:]' '[:lower:]'`-amd64 && chmod +x ec2-instance-selector
mv ec2-instance-selector /usr/local/bin/
ec2-instance-selector --version

# Use the tool to pick suitable instance specs
ec2-instance-selector --vcpus 2 --memory 4 --gpus 0 --current-generation -a x86_64 --deny-list 't.*' --output table-wide
Instance Type   VCPUs   Mem (GiB)  Hypervisor  Current Gen  Hibernation Support  CPU Arch  Network Performance  ENIs    GPUs    GPU Mem (GiB)  GPU Info  On-Demand Price/Hr  Spot Price/Hr (30d avg)
-------------   -----   ---------  ----------  -----------  -------------------  --------  -------------------  ----    ----    -------------  --------  ------------------  -----------------------
c5.large        2       4          nitro       true         true                 x86_64    Up to 10 Gigabit     3       0       0              none      $0.096              $0.02837
c5a.large       2       4          nitro       true         false                x86_64    Up to 10 Gigabit     3       0       0              none      $0.086              $0.04022
c5d.large       2       4          nitro       true         true                 x86_64    Up to 10 Gigabit     3       0       0              none      $0.11               $0.03265
c6i.large       2       4          nitro       true         true                 x86_64    Up to 12.5 Gigabit   3       0       0              none      $0.096              $0.03425
c6id.large      2       4          nitro       true         true                 x86_64    Up to 12.5 Gigabit   3       0       0              none      $0.1155             $0.03172
c6in.large      2       4          nitro       true         true                 x86_64    Up to 25 Gigabit     3       0       0              none      $0.1281             $0.04267
c7i-flex.large  2       4          nitro       true         true                 x86_64    Up to 12.5 Gigabit   3       0       0              none      $0.09576            $0.02872
c7i.large       2       4          nitro       true         true                 x86_64    Up to 12.5 Gigabit   3       0       0              none      $0.1008             $0.02977

# Internally, ec2-instance-selector calls the DescribeInstanceTypes API for the specific region and filters the instances based on the criteria given on the command line. In our case we filtered for instances that meet the following criteria:
- Instances with no GPUs
- of x86_64 Architecture (no ARM instances like A1 or m6g instances for example)
- Instances that have 2 vCPUs and 4 GB of RAM
- Instances of current generation (4th gen onwards)
- Instances that don’t meet the regular expression t.* to filter out burstable instance types

ec2-instance-selector recommends suitable instance types based on the requested specs and filters.
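To make the criteria concrete, the sketch below applies similar filters (2 vCPUs, 4 GiB, x86_64, no GPUs, deny list t.*) to a handful of rows. It then computes the Spot discount from the prices in the table above. This is not how ec2-instance-selector is implemented. The prices are the 30-day averages printed during the lab and will change, and t3.medium and m6g.large are added only to show what the filters drop. The three types that pass are the ones passed to --instance-types for the managed-spot node group.

```python
"""Sketch: ec2-instance-selector-style filtering plus Spot savings."""
import re

# (instance type, vCPUs, memory GiB, CPU arch, GPUs)
# c5/c5a/c5d rows come from the table above; t3.medium and m6g.large are
# added only to show what the filters drop.
CANDIDATES = [
    ("c5.large", 2, 4, "x86_64", 0),
    ("c5a.large", 2, 4, "x86_64", 0),
    ("c5d.large", 2, 4, "x86_64", 0),
    ("t3.medium", 2, 4, "x86_64", 0),
    ("m6g.large", 2, 8, "arm64", 0),
]

# (On-Demand $/hr, Spot 30-day average $/hr) as printed in the table above.
PRICES = {
    "c5.large": (0.096, 0.02837),
    "c5a.large": (0.086, 0.04022),
    "c5d.large": (0.11, 0.03265),
}


def select(rows, vcpus, mem_gib, arch, deny=r"t.*"):
    deny_re = re.compile(deny)
    return [
        name
        for name, cpu, mem, cpu_arch, gpus in rows
        if cpu == vcpus and mem == mem_gib and cpu_arch == arch and gpus == 0
        # Deny-list regex anchored at the start of the name (an assumption).
        and not deny_re.match(name)
    ]


def spot_discount(on_demand: float, spot: float) -> float:
    return 1 - spot / on_demand


for name in select(CANDIDATES, vcpus=2, mem_gib=4, arch="x86_64"):
    on_demand, spot = PRICES[name]
    print(f"{name}: {spot_discount(on_demand, spot):.0%} cheaper on Spot")
```

With these prices the loop reports roughly 70% for c5.large and c5d.large and 53% for c5a.large.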

# Check the capacity type of the existing nodes
kubectl get nodes -l eks.amazonaws.com/capacityType=ON_DEMAND
kubectl get nodes -L eks.amazonaws.com/capacityType
NAME                                              STATUS   ROLES    AGE   VERSION               CAPACITYTYPE
ip-192-168-1-65.ap-northeast-2.compute.internal   Ready    <none>   75m   v1.28.5-eks-5e0fdde   ON_DEMAND
ip-192-168-2-89.ap-northeast-2.compute.internal   Ready    <none>   75m   v1.28.5-eks-5e0fdde   ON_DEMAND
ip-192-168-3-39.ap-northeast-2.compute.internal   Ready    <none>   75m   v1.28.5-eks-5e0fdde   ON_DEMAND

# Create the node group (reusing the IAM role of the existing ng1 node group)
NODEROLEARN=$(aws iam list-roles --query "Roles[?contains(RoleName, 'nodegroup-ng1')].Arn" --output text)
echo $NODEROLEARN

aws eks create-nodegroup \
  --cluster-name $CLUSTER_NAME \
  --nodegroup-name managed-spot \
  --subnets $PubSubnet1 $PubSubnet2 $PubSubnet3 \
  --node-role $NODEROLEARN \
  --instance-types c5.large c5d.large c5a.large \
  --capacity-type SPOT \
  --scaling-config minSize=2,maxSize=3,desiredSize=2 \
  --disk-size 20

# The command can be used to wait until a specific EKS node group is active and ready for use.
aws eks wait nodegroup-active --cluster-name $CLUSTER_NAME --nodegroup-name managed-spot

# Verify
kubectl get nodes -L eks.amazonaws.com/capacityType,eks.amazonaws.com/nodegroup
NAME                                               STATUS   ROLES    AGE    VERSION               CAPACITYTYPE   NODEGROUP
ip-192-168-1-229.ap-northeast-2.compute.internal   Ready    <none>   102s   v1.31.5-eks-5d632ec   SPOT           managed-spot
ip-192-168-1-68.ap-northeast-2.compute.internal    Ready    <none>   87m    v1.31.5-eks-5d632ec   ON_DEMAND      ng1
ip-192-168-2-138.ap-northeast-2.compute.internal   Ready    <none>   103s   v1.31.5-eks-5d632ec   SPOT           managed-spot
ip-192-168-2-27.ap-northeast-2.compute.internal    Ready    <none>   88m    v1.31.5-eks-5d632ec   ON_DEMAND      ng1
ip-192-168-3-183.ap-northeast-2.compute.internal   Ready    <none>   87m    v1.31.5-eks-5d632ec   ON_DEMAND      ng1

While the command waits, the Spot instances appear under Spot Requests in the EC2 console.

Unlike before, there are now nodes whose capacity type is SPOT.

# Run a test pod pinned to a Spot node
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  terminationGracePeriodSeconds: 3
  containers:
  - name: busybox
    image: busybox
    command:
    - "/bin/sh"
    - "-c"
    - "while true; do date >> /home/pod-out.txt; cd /home; sync; sync; sleep 10; done"
  nodeSelector:
    eks.amazonaws.com/capacityType: SPOT
EOF

# Check which node the pod was scheduled on
kubectl get pod -owide

# Delete
kubectl delete pod busybox

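With managed node groups, there is no need to install an automation tool such as AWS Node Termination Handler in the cluster to handle Spot interruptions. EKS configures the underlying Amazon EC2 Auto Scaling group (with Capacity Rebalancing) and handles interruptions automatically.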

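Cleaning up resources after the lab

(If you did that part) delete the AWS ECR repository: aws ecr delete-repository --repository-name myweb --force
Delete the Amazon EKS cluster (takes about 10 minutes): eksctl delete cluster --name $CLUSTER_NAME
(After confirming the cluster is gone) delete the AWS CloudFormation stack: aws cloudformation delete-stack --stack-name myeks
Remove the convenience variables set up after the EKS deployment: macOS: vi ~/.zshrc, Windows (WSL2): vi ~/.bashrc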
