[flagger] Overview

zzery · May 19, 2022

This post is a summary of the official docs; I haven't actually used Flagger yet.

install

  • Installed with Helm.
  • Note: when installing locally, check that the Prometheus domain passed as metricsServer is correct.
❯ helm repo add flagger https://flagger.app
"flagger" has been added to your repositories

# apply the Canary CRDs
❯ kubectl apply -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml
customresourcedefinition.apiextensions.k8s.io/canaries.flagger.app created
customresourcedefinition.apiextensions.k8s.io/metrictemplates.flagger.app created
customresourcedefinition.apiextensions.k8s.io/alertproviders.flagger.app created

❯ k get crd
NAME                                       CREATED AT
alertproviders.flagger.app                 2022-05-19T15:54:19Z # 👍
authorizationpolicies.security.istio.io    2022-05-19T15:05:50Z
canaries.flagger.app                       2022-05-19T15:54:19Z # 👍
demoes.demoapp.my.domain                   2022-04-24T10:46:39Z
destinationrules.networking.istio.io       2022-05-19T15:05:50Z
envoyfilters.networking.istio.io           2022-05-19T15:05:50Z
gateways.networking.istio.io               2022-05-19T15:05:50Z
istiooperators.install.istio.io            2022-05-19T15:05:50Z
metrictemplates.flagger.app                2022-05-19T15:54:19Z # 👍
peerauthentications.security.istio.io      2022-05-19T15:05:50Z
proxyconfigs.networking.istio.io           2022-05-19T15:05:50Z
requestauthentications.security.istio.io   2022-05-19T15:05:50Z
serviceentries.networking.istio.io         2022-05-19T15:05:50Z
sidecars.networking.istio.io               2022-05-19T15:05:50Z
telemetries.telemetry.istio.io             2022-05-19T15:05:50Z
virtualservices.networking.istio.io        2022-05-19T15:05:50Z
wasmplugins.extensions.istio.io            2022-05-19T15:05:50Z
workloadentries.networking.istio.io        2022-05-19T15:05:50Z
workloadgroups.networking.istio.io         2022-05-19T15:05:50Z

# deploy with the istio provider
❯ helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set crd.create=false \
--set meshProvider=istio \
--set metricsServer=http://prometheus:9090

For Istio multi-cluster shared control plane

  • Install Flagger on each cluster.
  • Point it at the kubeconfig of the Istio control-plane host cluster (the last three --set flags below are what differ from the single-cluster install).
❯ helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set crd.create=false \
--set meshProvider=istio \
--set metricsServer=http://istio-cluster-prometheus:9090 \
--set controlplane.kubeconfig.secretName=istio-kubeconfig \
--set controlplane.kubeconfig.key=kubeconfig

Usage

  1. Install Flagger
  2. Deploy the app (Deployment or DaemonSet)
  3. Write a Canary CR

Not sure yet whether the canary keeps monitoring continuously.
Also need to check which deployment order is expected:

  • deploy the app first, then create the Canary
  • create the app and the Canary at the same time


Deploying the app

  • target: Deployment, DaemonSet
  • service: Service

The Canary CR for the deployed app:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: "app에 대한 canary CR 이름 작성"
spec:
  # app-related settings ---
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <"여기에 Deployment 이름 작성"
  service:
    port: 9898
  # ---
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 50
    stepWeight: 5
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"

About the target

  • targetRef
    • information about the target Deployment
    • Flagger creates deployment/<targetRef.name>-primary
  • autoscalerRef (optional)
    • HPA reference
    • Flagger creates hpa/<autoscalerRef.name>-primary

The primary resource

  • The stable release of the deployed app.
  • All traffic is routed to the primary, and the target is scaled down to 0.
  • When Flagger detects a change to the target (including its ConfigMaps and Secrets), it runs a canary analysis.
  • If the canary analysis passes, the primary is promoted to the new version.
spec:
  progressDeadlineSeconds: 60
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo

Note: the target's label selector must consist of exactly one label.

  • app: <DEPLOYMENT-NAME>
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
spec:
  selector: # 🙊
    matchLabels: # 🙊
      app: podinfo # 🙊
  template:
    metadata:
      labels:
        app: podinfo

We deploy with multiple selector labels, though; is that compatible...?

selector:
  matchLabels:
    app: "app-name"
    code: "service-code"
    env: "dev"
    part-of: "app-name"

It seems fine for an already-deployed Deployment to carry multiple selector labels;
you just designate exactly one of them as the label the Canary watches.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
spec:
  selector:
    matchLabels:
      app: podinfo
      affinity: podinfo
  template:
    metadata:
      labels:
        app: podinfo
        affinity: podinfo
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  affinity: podinfo
              topologyKey: topology.kubernetes.io/zone

Selector formats Flagger supports by default

  • app: <DEPLOYMENT-NAME>
  • name: <DEPLOYMENT-NAME>
  • app.kubernetes.io/name: <DEPLOYMENT-NAME>

If you want to use a different label as the selector, use one of these two methods:

  1. Edit the Flagger Deployment manifest (see the sketch after this list)
    • pass -selector-labels=my-app-label as an arg
  2. Add the option when installing with Helm
    • --set selectorLabels=my-app-label
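
A rough sketch of method 1, assuming the usual Flagger Deployment layout (the container name and the other args shown are illustrative):

spec:
  template:
    spec:
      containers:
        - name: flagger
          args:
            - -mesh-provider=istio
            - -metrics-server=http://prometheus:9090
            - -selector-labels=my-app-label # custom selector label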

Handling ConfigMaps and Secrets

Flagger will create a copy of each object using the -primary suffix and will reference these objects in the primary deployment.

Just like the Deployment, copies with the -primary suffix are created and used.

You can annotate a ConfigMap or Secret to exclude it from config tracking, so the same resource keeps being shared by the primary and the canary (see the sketch below):

  • flagger.app/config-tracking: disabled
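
A minimal sketch of a ConfigMap that opts out of config tracking (the name and data are illustrative):

apiVersion: v1
kind: ConfigMap
metadata:
  name: podinfo-config
  annotations:
    flagger.app/config-tracking: disabled
data:
  color: blue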

There are also two ways to disable config tracking globally:

  1. Edit the Flagger Deployment manifest
    • pass -enable-config-tracking=false as an arg
  2. Add the option when installing with Helm
    • --set configTracking.enabled=false

That said, the docs note that adding the annotation per resource fits most use cases better than the global setting.


autoscaler (optional)

Flagger will pause the traffic increase while the target and primary deployments are scaled up or down.

HPA can help reduce the resource usage during the canary analysis.

When the autoscaler reference is specified, any changes made to the autoscaler are only made active in the primary autoscaler when a rollout for the deployment starts and completes successfully.

Optionally, you can create two HPAs, one for canary and one for the primary to update the HPA without doing a new rollout.
As the canary deployment will be scaled to 0, the HPA on the canary will be inactive.


service

By default it assumes the Service name is the same as the Deployment name.

But since you can set a separate Service name in canary.spec.service, that seems fine.

spec:
  service:
    name: podinfo # if a name is specified, it can differ from the Deployment name
    port: 9898
    portName: http
    targetPort: 9898
    portDiscovery: true

portName is also optional and defaults to http.
If the app serves gRPC, portName must be set to grpc.
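
For example, a minimal sketch of a gRPC service block (port numbers reused from the example above):

spec:
  service:
    port: 9898
    portName: grpc # per the docs, set to grpc for gRPC traffic
    targetPort: 9898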

port discovery

If port discovery is enabled, Flagger scans the target workload and extracts the container ports, excluding the port specified in the canary service and service mesh sidecar ports. These ports will be used when generating the ClusterIP services.

Service objects generated from canary.spec.service:

  • <service.name>.<namespace>.svc.cluster.local
    • selector app=<name>-primary
  • <service.name>-primary.<namespace>.svc.cluster.local
    • selector app=<name>-primary
  • <service.name>-canary.<namespace>.svc.cluster.local
    • selector app=<name>

This ensures that traffic to podinfo.test:9898 will be routed to the latest stable release of your app. The podinfo-canary.test:9898 address is available only during the canary analysis and can be used for conformance testing or load testing.
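
For illustration, the apex Service generated for the podinfo example would look roughly like this (only the relevant fields shown; the exact generated spec may differ):

apiVersion: v1
kind: Service
metadata:
  name: podinfo
  namespace: test
spec:
  selector:
    app: podinfo-primary # the apex Service routes to the stable (primary) pods
  ports:
    - name: http
      port: 9898
      targetPort: 9898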

The selector settings can be modified.

apex?

  • Annotations defined here are applied to both the generated Kubernetes Service and the generated service mesh/ingress object.
    • This allows using external-dns with Istio VirtualServices and TraefikServices.

For Istio-specific settings, see the FAQ.

spec:
  service:
    port: 9898
    apex:
      annotations:
        test: "test"
      labels:
        test: "test"
    canary:
      annotations:
        test: "test"
      labels:
        test: "test"
    primary:
      annotations:
        test: "test"
      labels:
        test: "test"

Checking the Canary resource status

❯ k get canaries --all-namespaces

NAMESPACE   NAME      STATUS        WEIGHT   LASTTRANSITIONTIME
test        podinfo   Progressing   15       2019-06-30T14:05:07Z
prod        frontend  Succeeded     0        2019-06-30T16:15:07Z
prod        backend   Failed        0        2019-06-30T17:05:07Z

A successful status for this resource looks like this:

status:
  canaryWeight: 0
  failedChecks: 0
  iterations: 0
  lastAppliedSpec: "14788816656920327485"
  lastPromotedSpec: "14788816656920327485"
  conditions:
  - lastTransitionTime: "2019-07-10T08:23:18Z"
    lastUpdateTime: "2019-07-10T08:23:18Z"
    message: Canary analysis completed successfully, promotion finished.
    reason: Succeeded
    status: "True"
    type: Promoted

The possible statuses are:

  1. Initialized
  2. Waiting
  3. Progressing
  4. WaitingPromotion
  5. Promoting
  6. Finalising
  7. Succeeded or Failed

A failed canary will have the Promoted condition set to False, the reason set to Failed, and the last applied spec will differ from the last promoted one.
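
Based on the fields above, a failed canary's status would look roughly like this (the spec checksums and timestamps are illustrative):

status:
  canaryWeight: 0
  failedChecks: 10
  lastAppliedSpec: "7b6c7f4f9" # differs from lastPromotedSpec
  lastPromotedSpec: "14788816656920327485"
  conditions:
  - lastTransitionTime: "2019-07-10T09:12:00Z"
    lastUpdateTime: "2019-07-10T09:12:00Z"
    reason: Failed
    status: "False"
    type: Promoted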

There is also a CI example:

# update the container image
kubectl set image deployment/podinfo podinfod=stefanprodan/podinfo:3.0.1

# wait for Flagger to detect the change
ok=false
until ${ok}; do
    kubectl get canary/podinfo | grep 'Progressing' && ok=true || ok=false
    sleep 5
done

# wait for the canary analysis to finish
kubectl wait canary/podinfo --for=condition=promoted --timeout=5m

# check if the deployment was successful 
kubectl get canary/podinfo | grep Succeeded

canary analysis

Template

  analysis:
    # schedule interval (default 60s)
    interval:
    # max number of failed metric checks before rollback
    threshold:
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight:
    # canary increment step
    # percentage (0-100)
    stepWeight:
    # promotion increment step
    # percentage (0-100)
    stepWeightPromotion:
    # total number of iterations
    # used for A/B Testing and Blue/Green
    iterations:
    # threshold of primary pods that need to be available to consider it ready
    # before starting rollout. this is optional and the default is 100
    # percentage (0-100)
    primaryReadyThreshold: 100
    # threshold of canary pods that need to be available to consider it ready
    # before starting rollout. this is optional and the default is 100
    # percentage (0-100)
    canaryReadyThreshold: 100
    # canary match conditions
    # used for A/B Testing
    match:
      - # HTTP header
    # key performance indicators
    metrics:
      - # metric check
    # alerting
    alerts:
      - # alert provider
    # external checks
    webhooks:
      - # hook
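
For reference, a minimal sketch of what the match block can look like for A/B testing with the Istio provider (the header name and value are illustrative):

  analysis:
    iterations: 10 # A/B testing uses a fixed number of iterations (see the template above)
    match:
      - headers:
          x-canary:
            exact: "insider"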