Logging & Monitoring
Test Monitor Cluster Components
1
Deploy the metrics-server by creating all of the downloaded components.
Run kubectl create -f . (or kubectl apply -f ., as used below) from within the cloned repository.
$ git clone https://github.com/kodekloudhub/kubernetes-metrics-server.git
$ cd kubernetes-metrics-server
$ k apply -f .
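It can take a minute or two before metrics are available. As a quick check (assuming the manifests deploy metrics-server into the kube-system namespace):
$ k get pod -n kube-system | grep metrics-server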
2
Identify the node that consumes the most CPU (cores).
$ k top node
3
Identify the POD that consumes the most memory (bytes) in the default namespace.
$ k top pod
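With many nodes or pods, the output can be sorted directly (assuming a kubectl version that supports --sort-by on top):
$ k top node --sort-by=cpu
$ k top pod --sort-by=memory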
Test Managing Application Logs
1
A user, USER5, has expressed concerns about accessing the application. Identify the cause of the issue.
Inspect the logs of the POD
$ k get pod
$ k logs webapp-1
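If the issue is intermittent, the logs can be streamed; for a multi-container pod the container has to be named explicitly (illustrative flags, not required for this lab):
$ k logs -f webapp-1
$ k logs webapp-1 -c <container-name>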
2
A user is reporting issues while trying to purchase an item. Identify the user and the cause of the issue.
Inspect the logs of the webapp in the POD
$ k get pod
$ k logs webapp-2
Cluster Maintenance
OS Upgrades
1
Let us explore the environment first. How many nodes do you see in the cluster?
$ k get node
2
How many applications do you see hosted on the cluster?
$ k get deploy
3
Which nodes are the applications hosted on?
$ k get pod -o wide
4
We need to take node01 out for maintenance. Empty the node of all applications and mark it unschedulable.
$ k drain node01 --ignore-daemonsets
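If the drain is blocked by pods that are not managed by a controller or that use emptyDir volumes, extra flags may be needed. Use them with care, since such pods are deleted and not rescheduled automatically:
$ k drain node01 --ignore-daemonsets --delete-emptydir-data --force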
5
The maintenance tasks have been completed. Configure the node node01 to be schedulable again.
$ k uncordon node01
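Note that uncordon only allows new pods to be scheduled on node01 again; existing pods are not moved back automatically. To confirm where the pods are running:
$ k get pod -o wide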
6
hr-app is a critical application, so we do not want it to be removed and we do not want any more pods scheduled on node01.
Mark node01 as unschedulable so that no new pods are scheduled on this node.
Make sure that hr-app is not affected.
$ k cordon node01
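Unlike drain, cordon only marks the node unschedulable and does not evict the pods already running on it, which is why hr-app is unaffected. To verify:
$ k get node
$ k get pod -o wide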
Cluster Upgrade Process
1
This lab tests your skills in upgrading a Kubernetes cluster. We have a production cluster with applications running on it. Let us explore the setup first.
What is the current version of the cluster?
$ k get node -o wide
2
How many nodes can host workloads in this cluster?
$ k describe node | grep -i taint
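Nodes carrying a NoSchedule taint (typically the controlplane) cannot host regular workloads. One way to list the taints per node (a JSONPath sketch, other approaches work too):
$ k get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'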
3
How many applications are hosted on the cluster?
$ k get deploy
4
What nodes are the pods hosted on?
$ k get pod -o wide
5
What is the latest version available for an upgrade with the current version of the kubeadm tool installed?
Use the kubeadm tool
$ kubeadm upgrade plan
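It also helps to confirm which kubeadm version is currently installed, since upgrade plan only offers versions reachable from it:
$ kubeadm version -o short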
6
We will be upgrading the controlplane node first. Drain the controlplane node of workloads and mark it unschedulable.
$ k drain controlplane --ignore-daemonsets
7
Upgrade the controlplane components to exact version v1.29.0
$ vi /etc/apt/sources.list.d/kubernetes.list
/etc/apt/sources.list.d/kubernetes.list
deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /
$ apt update
$ apt-cache madison kubeadm
$ apt-get install kubeadm=1.29.0-1.1
$ kubeadm upgrade plan v1.29.0
$ kubeadm upgrade apply v1.29.0
$ apt-get install kubelet=1.29.0-1.1
$ systemctl daemon-reload
$ systemctl restart kubelet
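To confirm the controlplane upgrade, check that the VERSION column reports v1.29.0 once the kubelet has re-registered:
$ kubectl get nodes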
8
Mark the controlplane node as "Schedulable" again
$ kubectl uncordon controlplane
9
Next is the worker node. Drain the worker node of its workloads and mark it unschedulable.
$ k drain node01 --ignore-daemonsets
10
Upgrade the worker node to the exact version v1.29.0
$ ssh node01
$ vi /etc/apt/sources.list.d/kubernetes.list
/etc/apt/sources.list.d/kubernetes.list
deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /
$ apt update
$ apt-cache madison kubeadm
$ apt-get install kubeadm=1.29.0-1.1
$ kubeadm upgrade node
$ apt-get install kubelet=1.29.0-1.1
$ systemctl daemon-reload
$ systemctl restart kubelet
$ exit
$ k uncordon node01
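Finally, verify that both nodes report v1.29.0 and are Ready:
$ kubectl get nodes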
Backup and Restore Methods
1
We have a working Kubernetes cluster with a set of web applications running. Let us first explore the setup.
How many deployments exist in the default namespace of the cluster?
$ k get deploy
2
What is the version of ETCD running on the cluster?
$ k describe pod -n kube-system etcd-controlplane | grep -i image
3
At what address can you reach the ETCD cluster from the controlplane node?
Check the ETCD Service configuration in the ETCD POD
$ k describe pod -n kube-system etcd-controlplane
4
The master node in our cluster is planned for a regular maintenance reboot tonight. While we do not anticipate anything going wrong, we are required to take the necessary backups. Take a snapshot of the ETCD database using the built-in snapshot functionality.
Store the backup file at location /opt/snapshot-pre-boot.db
$ ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /opt/snapshot-pre-boot.db
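To sanity-check the backup, the snapshot metadata can be inspected (recent etcdctl releases mark this as deprecated in favour of etcdutl, but it still works):
$ ETCDCTL_API=3 etcdctl snapshot status /opt/snapshot-pre-boot.db --write-out=table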
5
Luckily we took a backup. Restore the original state of the cluster using the backup file.
$ ETCDCTL_API=3 etcdctl --data-dir /var/lib/etcd-from-backup \
snapshot restore /opt/snapshot-pre-boot.db
$ cd /etc/kubernetes/manifests
$ vi etcd.yaml
  volumes:
  - hostPath:
      path: /var/lib/etcd-from-backup
      type: DirectoryOrCreate
    name: etcd-data
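Because etcd runs as a static pod, the kubelet picks up the edited manifest and recreates it automatically; allow a minute or two for the controlplane pods to stabilise. To watch the recovery and confirm the workloads are back (assuming crictl and watch are available on the node):
$ watch crictl ps
$ k get pod -n kube-system
$ k get deploy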
Backup and Restore Methods 2
1
How many clusters are defined in the kubeconfig on the student-node?
$ k config view
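The dedicated subcommands give an equivalent way to count the clusters and contexts:
$ k config get-clusters
$ k config get-contexts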
2
How many nodes (both controlplane and worker) are part of cluster1?
$ k config use-context cluster1
$ k get node
3
What is the name of the controlplane node in cluster2?
$ k config use-context cluster2
$ k get node
4
How is ETCD configured for cluster1?
$ k config use-context cluster1
$ k get pod -n kube-system
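If an etcd static pod such as etcd-cluster1-controlplane appears in kube-system, ETCD runs on the controlplane node itself, i.e. a stacked topology. A quick filter:
$ k get pod -n kube-system | grep etcd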
5
How is ETCD configured for cluster2?
Remember, you can access the clusters from student-node using the kubectl tool. You can also ssh to the cluster nodes from the student-node.
$ k config use-context cluster2
$ kubectl get pods -n kube-system
$ ssh cluster2-controlplane
$ cd /etc/kubernetes/manifests
$ ps -ef | grep etcd
kube-apiserver --advertise-address=192.11.47.6 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.pem --etcd-certfile=/etc/kubernetes/pki/etcd/etcd.pem --etcd-keyfile=/etc/kubernetes/pki/etcd/etcd-key.pem --etcd-servers=https://192.11.47.16:2379 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
6
What is the IP address of the External ETCD datastore used in cluster2?
$ k describe pod -n kube-system kube-apiserver-cluster2-controlplane | grep -i etcd
7
What is the default data directory used for the ETCD datastore in cluster1?
Remember, this cluster uses a Stacked ETCD topology.
$ k describe pod -n kube-system etcd-cluster1-controlplane
In the pod description, the etcd command includes: --data-dir=/var/lib/etcd
8
What is the default data directory used for the ETCD datastore in cluster2?
Remember, this cluster uses an External ETCD topology.
$ ssh etcd-server
$ ps aux | grep etcd
Look for the --data-dir flag in the etcd process arguments.
9
How many nodes are part of the ETCD cluster that etcd-server is a part of?
$ ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/etcd/pki/ca.pem \
--cert=/etc/etcd/pki/etcd.pem \
--key=/etc/etcd/pki/etcd-key.pem \
member list
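Each line in the output corresponds to one ETCD member; the table output is slightly easier to read (a minor variation):
$ ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/pki/ca.pem --cert=/etc/etcd/pki/etcd.pem --key=/etc/etcd/pki/etcd-key.pem member list --write-out=table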
10
Take a backup of etcd on cluster1 and save it on the student-node at the path /opt/cluster1.db
$ k describe pod -n kube-system etcd-cluster1-controlplane
$ ssh cluster1-controlplane
$ ETCDCTL_API=3 etcdctl --endpoints=https://192.11.47.3:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save /opt/cluster1.db
$ exit
$ scp cluster1-controlplane:/opt/cluster1.db /opt/cluster1.db
11
An ETCD backup for cluster2 is stored at /opt/cluster2.db. Use this snapshot file to carry out a restore on cluster2 to a new path /var/lib/etcd-data-new.
Once the restore is complete, ensure that the controlplane components on cluster2 are running.
The snapshot was taken when there were objects created in the critical namespace on cluster2. These objects should be available post restore.
$ scp /opt/cluster2.db etcd-server:/root
$ ssh etcd-server
$ ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/etcd/pki/ca.pem --cert=/etc/etcd/pki/etcd.pem --key=/etc/etcd/pki/etcd-key.pem snapshot restore /root/cluster2.db --data-dir /var/lib/etcd-data-new
$ cd /etc/systemd/system
$ vi etcd.service
/etc/systemd/system/etcd.service
ExecStart=/usr/local/bin/etcd \
  --name etcd-server \
  --data-dir=/var/lib/etcd-data-new \
$ cd /var/lib
$ chown -R etcd:etcd /var/lib/etcd-data-new
$ systemctl daemon-reload
$ systemctl restart etcd
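To verify the restore end to end, check that etcd is healthy and that the objects in the critical namespace are visible again from the student-node. If the controlplane components on cluster2 are slow to recover, restarting the kubelet on cluster2-controlplane is a reasonable extra step (an assumption based on typical behaviour, not part of the original lab instructions):
$ systemctl status etcd
$ exit
$ k config use-context cluster2
$ k get pods -n critical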