[Week3] AWS EKS Storage & Nodegroup (3/3)

오태경 · March 23, 2024

This post is based on the AWS EKS Workshop Study led by Gasida (서종호) of the CloudNet@ team.

NodeGroup


  • Create a Graviton Node Group
kubectl get nodes -L kubernetes.io/arch

# Create a new node group
eksctl create nodegroup --help
eksctl create nodegroup -c $CLUSTER_NAME -r $AWS_DEFAULT_REGION --subnet-ids "$PubSubnet1","$PubSubnet2","$PubSubnet3" --ssh-access \
  -n ng3 -t t4g.medium -N 1 -m 1 -M 1 --node-volume-size=30 --node-labels family=graviton --dry-run > myng3.yaml
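# Optionally review the generated config before applying it (my addition, not in the original lab)
cat myng3.yaml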
eksctl create nodegroup -f myng3.yaml

# Verify
kubectl get nodes --label-columns eks.amazonaws.com/nodegroup,kubernetes.io/arch
NAME                                               STATUS   ROLES    AGE     VERSION               NODEGROUP   ARCH
ip-192-168-1-170.ap-northeast-2.compute.internal   Ready    <none>   4m47s   v1.28.5-eks-5e0fdde   ng3         arm64
ip-192-168-1-171.ap-northeast-2.compute.internal   Ready    <none>   3h16m   v1.28.5-eks-5e0fdde   ng1         amd64
ip-192-168-2-201.ap-northeast-2.compute.internal   Ready    <none>   3h16m   v1.28.5-eks-5e0fdde   ng1         amd64
ip-192-168-3-39.ap-northeast-2.compute.internal    Ready    <none>   3h16m   v1.28.5-eks-5e0fdde   ng1         amd64

kubectl describe nodes --selector family=graviton

aws eks describe-nodegroup --cluster-name $CLUSTER_NAME --nodegroup-name ng3 | jq .nodegroup.taints

# Set the taints -> takes about 2-3 minutes to apply
aws eks update-nodegroup-config --cluster-name $CLUSTER_NAME --nodegroup-name ng3 --taints "addOrUpdateTaints=[{key=frontend, value=true, effect=NO_EXECUTE}]"

# Verify
kubectl describe nodes --selector family=graviton | grep Taints
Taints:             frontend=true:NoExecute

aws eks describe-nodegroup --cluster-name $CLUSTER_NAME --nodegroup-name ng3 | jq .nodegroup.taints
[
  {
    "key": "frontend",
    "value": "true",
    "effect": "NO_EXECUTE"
  }
]
# NO_SCHEDULE - Corresponds to the Kubernetes NoSchedule taint effect. The managed node group gets a taint that repels all pods without a matching toleration, but pods already running on its nodes are not evicted.
# NO_EXECUTE - Corresponds to the Kubernetes NoExecute taint effect. Nodes with this taint not only repel newly scheduled pods but also evict any running pods without a matching toleration.
# PREFER_NO_SCHEDULE - Corresponds to the Kubernetes PreferNoSchedule taint effect. If possible, EKS avoids scheduling pods that do not tolerate this taint onto the node.
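
# To roll the taint back later, the same call accepts a removeTaints list
# (a hedged sketch mirroring the addOrUpdateTaints call above; also takes a few minutes to apply)
aws eks update-nodegroup-config --cluster-name $CLUSTER_NAME --nodegroup-name ng3 --taints "removeTaints=[{key=frontend,value=true,effect=NO_EXECUTE}]"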
  • Run a Pod on the Graviton Node

    #
    cat << EOT > busybox.yaml
    apiVersion: v1
    kind: Pod
    metadata:
     name: busybox
    spec:
     terminationGracePeriodSeconds: 3
     containers:
     - name: busybox
       image: busybox
       command:
       - "/bin/sh"
       - "-c"
       - "while true; do date >> /home/pod-out.txt; cd /home; sync; sync; sleep 10; done"
     tolerations:
       - effect: NoExecute
         key: frontend
         operator: Exists
    EOT
    
    kubectl apply -f busybox.yaml
    
# Check which node the pod was scheduled on
    kubectl get pod -owide
    NAME      READY   STATUS    RESTARTS   AGE   IP              NODE                                               NOMINATED NODE   READINESS GATES
    busybox   1/1     Running   0          54s   192.168.1.168   ip-192-168-1-170.ap-northeast-2.compute.internal   <none>           <none>
    
# Clean up
    kubectl delete pod busybox
    eksctl delete nodegroup -c $CLUSTER_NAME -n ng3
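
    Note that a toleration only allows the pod onto the tainted node; it does not pin it there. A minimal sketch (my addition, reusing the family=graviton label set above) that actually restricts scheduling to ng3 combines the toleration with a nodeSelector:

    spec:
     nodeSelector:
       family: graviton
     tolerations:
       - effect: NoExecute
         key: frontend
         operator: Exists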
    



  • Instance type diversification - Link

# Install ec2-instance-selector
    curl -Lo ec2-instance-selector https://github.com/aws/amazon-ec2-instance-selector/releases/download/v2.4.1/ec2-instance-selector-`uname | tr '[:upper:]' '[:lower:]'`-amd64 && chmod +x ec2-instance-selector
    mv ec2-instance-selector /usr/local/bin/
    ec2-instance-selector --version
    v2.4.1
    
# Usage
    ec2-instance-selector --vcpus 2 --memory 4 --gpus 0 --current-generation -a x86_64 --deny-list 't.*' --output table-wide
    Instance Type  VCPUs   Mem (GiB)  Hypervisor  Current Gen  Hibernation Support  CPU Arch  Network Performance  ENIs    GPUs    GPU Mem (GiB)  GPU Info  On-Demand Price/Hr  Spot Price/Hr (30d avg)
    c5.large       2       4          nitro       true         true                 x86_64    Up to 10 Gigabit     3       0       0              none      $0.096              $0.04839
    c5a.large      2       4          nitro       true         false                x86_64    Up to 10 Gigabit     3       0       0              none      $0.086              $0.0292
    c5d.large      2       4          nitro       true         true                 x86_64    Up to 10 Gigabit     3       0       0              none      $0.11               $0.03285
    c6i.large      2       4          nitro       true         true                 x86_64    Up to 12.5 Gigabit   3       0       0              none      $0.096              $0.03542
    c6id.large     2       4          nitro       true         true                 x86_64    Up to 12.5 Gigabit   3       0       0              none      $0.1155             $0.02768
    c6in.large     2       4          nitro       true         false                x86_64    Up to 25 Gigabit     3       0       0              none      $0.1281             $0.02754
    c7i.large      2       4          nitro       true         true                 x86_64    Up to 12.5 Gigabit   3       0       0              none      $0.1008             $0.02998
    
    #Internally, ec2-instance-selector calls DescribeInstanceTypes for the given region and filters the instances based on the criteria selected on the command line; in our case we filtered for instances that meet the following criteria:
    #- Instances with no GPUs
    #- of x86_64 Architecture (no ARM instances like A1 or m6g instances for example)
    #- Instances that have 2 vCPUs and 4 GB of RAM
    #- Instances of current generation (4th gen onwards)
    #- Instances that don’t meet the regular expression t.* to filter out burstable instance types
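
    Since the default (non-table) output is just one instance type per line, it can feed a node group definition directly. A hedged sketch (the TYPES variable name is mine):

    TYPES=$(ec2-instance-selector --vcpus 2 --memory 4 --gpus 0 --current-generation -a x86_64 --deny-list 't.*' | tr '\n' ' ')
    echo $TYPES    # pass these to --instance-types, as in the Spot node group below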
    
  • Create Spot Capacity - Link

    #
    kubectl get nodes -l eks.amazonaws.com/capacityType=ON_DEMAND
    kubectl get nodes -L eks.amazonaws.com/capacityType
    NAME                                               STATUS   ROLES    AGE     VERSION               CAPACITYTYPE
    ip-192-168-1-171.ap-northeast-2.compute.internal   Ready    <none>   3h32m   v1.28.5-eks-5e0fdde   ON_DEMAND
    ip-192-168-2-201.ap-northeast-2.compute.internal   Ready    <none>   3h32m   v1.28.5-eks-5e0fdde   ON_DEMAND
    ip-192-168-3-39.ap-northeast-2.compute.internal    Ready    <none>   3h32m   v1.28.5-eks-5e0fdde   ON_DEMAND
    
    # Create: for --node-role below, enter your own node role ARN
    # Try it with the AWSServiceRoleForAmazonEKSNodegroup role as well
    
    aws eks create-nodegroup \
     --cluster-name $CLUSTER_NAME \
     --nodegroup-name managed-spot \
     --subnets $PubSubnet1 $PubSubnet2 $PubSubnet3 \
     --node-role arn:aws:iam::911283464785:role/eksctl-myeks-nodegroup-ng1-NodeInstanceRole-wvZ2FX2m79Vv \
     --instance-types c5.large c5d.large c5a.large \
     --capacity-type SPOT \
     --scaling-config minSize=2,maxSize=3,desiredSize=2 \
     --disk-size 20
    
    #
    aws eks wait nodegroup-active --cluster-name $CLUSTER_NAME --nodegroup-name managed-spot
    
    # Verify
    kubectl get nodes -L eks.amazonaws.com/capacityType,eks.amazonaws.com/nodegroup
    NAME                                               STATUS   ROLES    AGE     VERSION               CAPACITYTYPE   NODEGROUP
    ip-192-168-1-171.ap-northeast-2.compute.internal   Ready    <none>   3h37m   v1.28.5-eks-5e0fdde   ON_DEMAND      ng1
    ip-192-168-1-254.ap-northeast-2.compute.internal   Ready    <none>   41s     v1.28.5-eks-5e0fdde   SPOT           managed-spot
    ip-192-168-2-201.ap-northeast-2.compute.internal   Ready    <none>   3h37m   v1.28.5-eks-5e0fdde   ON_DEMAND      ng1
    ip-192-168-3-139.ap-northeast-2.compute.internal   Ready    <none>   40s     v1.28.5-eks-5e0fdde   SPOT           managed-spot
    ip-192-168-3-39.ap-northeast-2.compute.internal    Ready    <none>   3h37m   v1.28.5-eks-5e0fdde   ON_DEMAND      ng1
  • Check Spot requests
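
    Managed Spot node groups place the underlying Spot requests for you; they can be listed via the EC2 API (a hedged extra, not part of the original lab):

    aws ec2 describe-spot-instance-requests --filters "Name=state,Values=active" --output table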

  • Run a workload on a Spot Node

    #
    cat << EOT > busybox.yaml
    apiVersion: v1
    kind: Pod
    metadata:
     name: busybox
    spec:
     terminationGracePeriodSeconds: 3
     containers:
     - name: busybox
       image: busybox
       command:
       - "/bin/sh"
       - "-c"
       - "while true; do date >> /home/pod-out.txt; cd /home; sync; sync; sleep 10; done"
     nodeSelector:
       eks.amazonaws.com/capacityType: SPOT
    EOT
    kubectl apply -f busybox.yaml
    
    # Check which node the pod was scheduled on
    kubectl get pod -owide
    NAME      READY   STATUS    RESTARTS   AGE   IP             NODE                                               NOMINATED NODE   READINESS GATES
    busybox   1/1     Running   0          63s   192.168.1.10   ip-192-168-1-254.ap-northeast-2.compute.internal   <none>           <none>
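
    # A hedged extra check (my addition): read the Spot settings back from the EKS API
    aws eks describe-nodegroup --cluster-name $CLUSTER_NAME --nodegroup-name managed-spot | jq '.nodegroup | {capacityType, instanceTypes}'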
    
    # Clean up
    kubectl delete pod busybox
    eksctl delete nodegroup -c $CLUSTER_NAME -n managed-spot
