CKA - Cluster Maintenance


OS Upgrade

1

Let us explore the environment first. How many nodes do you see in the cluster?

Including the controlplane and worker nodes.
$ k get nodes

2

How many applications do you see hosted on the cluster?

Check the number of deployments in the default namespace.
$ k get deploy

3

Which nodes are the applications hosted on?
$ k get pods -o wide

4

We need to take node01 out for maintenance. Empty the node of all applications and mark it unschedulable.

Node node01 Unschedulable
Pods evicted from node01
$ k drain node01 --ignore-daemonsets
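drain both cordons the node and evicts the pods managed by controllers, so their Deployments reschedule them elsewhere. A quick sanity check might look like this (a sketch; pod names will differ per environment):

$ k get pods -o wide   # evicted pods should be recreated on the other node
$ k get node node01    # STATUS should read Ready,SchedulingDisabled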

5

The maintenance tasks have been completed. 
Configure the node node01 to be schedulable again.
$ k uncordon node01
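Note that uncordon only re-enables scheduling; pods evicted during the drain are not moved back automatically. To confirm the node accepts new pods again (a sketch):

$ k get node node01    # STATUS should be back to plain Ready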

6

hr-app is a critical app; we do not want it removed, and we do not want any more pods scheduled on node01.
Mark node01 as unschedulable so that no new pods are scheduled on this node.

Make sure that hr-app is not affected.
$ k cordon node01
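Unlike drain, cordon only marks the node unschedulable and evicts nothing, which is why hr-app keeps running. A verification sketch:

$ k get node node01                   # Ready,SchedulingDisabled
$ k get pods -o wide | grep node01    # hr-app should still be on node01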

Cluster Upgrade Process

1

This lab tests your skills at upgrading a Kubernetes cluster. We have a production cluster with applications running on it. Let us explore the setup first.

What is the current version of the cluster?
$ k get nodes

2

How many nodes can host workloads in this cluster? Inspect the applicable taints on the nodes.
$ k describe nodes | grep -i taints

3

How many applications are hosted on the cluster?

Count the number of deployments in the default namespace.
$ k get deploy

4

What is the latest stable version of Kubernetes as of today?
$ kubeadm upgrade plan

5

What is the latest version available for an upgrade with the current version of the kubeadm tool installed?
$ kubeadm upgrade plan
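The versions that kubeadm upgrade plan offers are bounded by the installed kubeadm binary, so it helps to know which version of the tool itself you are running (a sketch):

$ kubeadm version -o short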

6

We will be upgrading the controlplane node first. Drain the controlplane node of workloads and mark it unschedulable.
$ k drain controlplane --ignore-daemonsets

7

Upgrade the controlplane components to exact version v1.27.0
$ apt update
$ apt-get install -y kubeadm='1.27.0-00'
$ kubeadm upgrade apply v1.27.0
$ apt-get install -y kubelet='1.27.0-00' kubectl='1.27.0-00'
$ systemctl daemon-reload
$ systemctl restart kubelet

$ kubeadm version
$ k get node
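The VERSION column of k get node is reported by each node's kubelet, so it only changes after the kubelet package is upgraded and the service restarted. The binary can also be checked directly (a sketch):

$ kubelet --version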

8

Mark the controlplane node as "Schedulable" again
$ k uncordon controlplane

9

Next is the worker node. Drain the worker node of its workloads and mark it unschedulable.
$ k drain node01 --ignore-daemonsets

10

Upgrade the worker node to the exact version v1.27.0
$ ssh node01
$ apt-get update
$ apt-get install -y kubeadm='1.27.0-00'
$ kubeadm upgrade node
$ apt-get install -y kubelet='1.27.0-00'
$ systemctl daemon-reload
$ systemctl restart kubelet
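The package installs and kubeadm upgrade node run on node01 itself; the final check happens back on the controlplane (a sketch, assuming you return with exit):

$ exit           # back to the controlplane
$ k get nodes    # both nodes should now report v1.27.0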

11

Remove the restriction and mark the worker node as schedulable again.
$ k uncordon node01

Backup and Restore Methods

1

We have a working Kubernetes cluster with a set of web applications running. Let us first explore the setup.

How many deployments exist in the cluster in default namespace?
$ k get deploy

2

What is the version of ETCD running on the cluster?
$ k describe pod -n kube-system etcd-controlplane | grep -i image

3

At what address can you reach the ETCD cluster from the controlplane node?
$ k describe pod -n kube-system etcd-controlplane | grep -i '\--listen-client-urls'

4

Where is the ETCD server certificate file located?
$ k describe pod -n kube-system etcd-controlplane | grep -i '\--cert-file'

5

Where is the ETCD CA Certificate file located?
$ k describe pod -n kube-system etcd-controlplane | grep -i 'ca.crt'
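All three of these flags live in the static pod manifest, so instead of describing the pod you can grep the file directly (a sketch; /etc/kubernetes/manifests/etcd.yaml is the standard kubeadm location):

$ grep -E 'listen-client-urls|cert-file|ca.crt' /etc/kubernetes/manifests/etcd.yaml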

6

The master node in our cluster is planned for a regular maintenance reboot tonight. 
While we do not anticipate anything to go wrong, we are required to take the necessary backups. 
Take a snapshot of the ETCD database using the built-in snapshot functionality.
$ ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /opt/snapshot-pre-boot.db
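Before the reboot it is worth sanity-checking the backup; etcdctl can print the snapshot's hash, revision, and key count (a sketch):

$ ETCDCTL_API=3 etcdctl snapshot status /opt/snapshot-pre-boot.db -w table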

7

Luckily we took a backup. Restore the original state of the cluster using the backup file.
$ ETCDCTL_API=3 etcdctl --data-dir /var/lib/etcd-from-backup \
snapshot restore /opt/snapshot-pre-boot.db
$ vi /etc/kubernetes/manifests/etcd.yaml

etcd.yaml

volumes:
- hostPath:
    path: /var/lib/etcd-from-backup
    type: DirectoryOrCreate
  name: etcd-data
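Only the hostPath needs to change: the container still mounts this volume at the path given by --data-dir, so the restored data shows up where etcd expects it. Because etcd-controlplane is a static pod, the kubelet recreates it as soon as the manifest changes; a verification sketch:

$ watch crictl ps    # wait for the etcd container to come back up
$ k get deploy       # the original workloads should reappear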

Backup and Restore Methods 2

1

How many clusters are defined in the kubeconfig on the student-node?
$ k config get-clusters

2

How many nodes (both controlplane and worker) are part of cluster1?
$ k config use-context cluster1
$ k get node

3

What is the name of the controlplane node in cluster2?
$ kubectl config use-context cluster2
$ k get node

4

How is ETCD configured for cluster1?
$ kubectl config use-context cluster1
$ k get pod -n kube-system

5

How is ETCD configured for cluster2?
$ kubectl config use-context cluster2
$ ssh cluster2-controlplane
$ ps -ef | grep etcd
$ kubectl -n kube-system describe pod kube-apiserver-cluster2-controlplane 
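The giveaway for an external topology is that no etcd pod appears in kube-system; instead the kube-apiserver's --etcd-servers flag points at an address outside the cluster (a sketch, run on cluster2-controlplane):

$ ps -ef | grep kube-apiserver | grep -o 'etcd-servers=[^ ]*'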

6

What is the default data directory used for the ETCD datastore in cluster1?
Remember, this cluster uses a Stacked ETCD topology.
$ k describe pod -n kube-system etcd-cluster1-controlplane | grep data-dir
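Since cluster1's etcd is stacked, the same value is also visible in the static pod manifest on its controlplane (a sketch; the node name cluster1-controlplane is inferred from the pod name above):

$ ssh cluster1-controlplane
$ grep data-dir /etc/kubernetes/manifests/etcd.yaml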

7

What is the default data directory used for the ETCD datastore in cluster2?
Remember, this cluster uses an External ETCD topology.
$ ssh etcd-server
$ ps -ef | grep -i etcd
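On a dedicated etcd server the process is typically managed by systemd rather than run as a static pod, so --data-dir can also be read from the unit file (a sketch; the unit path is an assumption and may differ on your host):

$ grep data-dir /etc/systemd/system/etcd.service    # hypothetical unit path; adjust for your setup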