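In the previous post I covered how to create a basic cluster with Kamaji Control Plane and CAPO. This post summarizes the issues I ran into while operating an actual POC environment, and how I resolved each of them.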
ssh ubuntu@XX.XX.XXX.XX -i capi-ssh-key.pem
$ kubectl get nodes -o wide
NAME             STATUS   ROLES           AGE   VERSION   INTERNAL-IP
mgmt-master-01   Ready    control-plane   25h   v1.33.2   10.0.0.XXX
mgmt-master-02   Ready    control-plane   25h   v1.33.2   10.0.0.XXX
mgmt-master-03   Ready    control-plane   25h   v1.33.2   10.0.0.XXX
mgmt-worker-01   Ready    <none>          25h   v1.33.2   10.0.0.XXX
mgmt-worker-02   Ready    <none>          25h   v1.33.2   10.0.0.XXX
# 1. Install the CSI (Local Path Provisioner, sufficient for a POC)
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.31/deploy/local-path-storage.yaml
# 2. Install cert-manager (needed by the Kamaji webhook)
helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set installCRDs=true
# 3. Install Kamaji (the clastix chart repository must be added first)
helm repo add clastix https://clastix.github.io/charts
helm repo update
helm install kamaji clastix/kamaji \
  --version 0.0.0+latest \
  --namespace kamaji-system \
  --create-namespace \
  --set image.tag=latest
# 4. Install ORC (OpenStack Resource Controller, required)
kubectl apply -f https://github.com/k-orc/openstack-resource-controller/releases/latest/download/install.yaml
# 5. Install CAPI and CAPO
clusterctl init --infrastructure openstack --control-plane kamaji
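Before moving on, it helps to confirm that the components installed above have actually become available. The following is a minimal sketch and not part of the original setup: `wait_for_deployments` and the `KUBECTL` override are hypothetical names, and `capi-system` and `capo-system` are assumed to be the default namespaces created by `clusterctl init`.

```shell
#!/bin/sh
# Hypothetical helper: block until every Deployment in a namespace is Available.
# KUBECTL can be overridden, for example with a wrapper or a stub for dry runs.
KUBECTL="${KUBECTL:-kubectl}"

wait_for_deployments() {
    # $1 = namespace, $2 = timeout (optional, defaults to 300s)
    $KUBECTL wait --namespace "$1" \
        --for=condition=Available deployment --all \
        --timeout="${2:-300s}"
}

# Usage (assumed namespaces; adjust to your installation):
#   for ns in cert-manager kamaji-system capi-system capo-system; do
#       wait_for_deployments "$ns" || echo "not ready: $ns" >&2
#   done
```

Running the check per namespace makes it easier to see which step failed than a single cluster-wide wait would.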
The OpenStack Cloud Controller Manager (CCM) is required so that the LoadBalancer Services created by Kamaji are backed by OpenStack Octavia.
# Write cloud.conf
[Global]
auth-url=https://keystone-xxx.internal.com/v3
user-id=xxxxxxxxx
password=xxxxxxxxxxxx
region=region-name
tenant-id=xxxxxxxxx
[LoadBalancer]
subnet-id=xxxxxxxxx
floating-network-id=xxxxxxxxx
# Create the Secret and deploy the CCM
kubectl create secret -n kube-system generic cloud-config --from-file=cloud.conf
kubectl apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-openstack/master/manifests/controller-manager/cloud-controller-manager-roles.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-openstack/master/manifests/controller-manager/cloud-controller-manager-role-bindings.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/cloud-provider-openstack/master/manifests/controller-manager/openstack-cloud-controller-manager-ds.yaml
# Label the control plane nodes
kubectl label node mgmt-master-01 node-role.kubernetes.io/control-plane=true --overwrite
kubectl label node mgmt-master-02 node-role.kubernetes.io/control-plane=true --overwrite
kubectl label node mgmt-master-03 node-role.kubernetes.io/control-plane=true --overwrite
The first major hurdle was a failure to connect to the OpenStack endpoints:
E0709 01:31:47.696482 1 controller.go:316] "Reconciler error"
err="providerClient authentication err: Post \"https://keystone-xxx.internal.com/v3/auth/tokens\":
dial tcp 10.XX.XX.200:443: connect: connection timed out"
Cause: NodeLocal DNS resolved the OpenStack hostnames to the wrong IP.
Fix: add hosts entries to both CoreDNS and NodeLocal DNS.
# Add to the CoreDNS ConfigMap
hosts {
    172.17.X.200 keystone-xxx.internal.com
    172.17.X.200 neutron-xxx.internal.com
    172.17.X.200 nova-xxx.internal.com
    # ... other OpenStack services
    fallthrough
}
# Add the same entries to the NodeLocal DNS ConfigMap
.:53 {
    errors
    hosts {
        172.17.X.200 keystone-xxx.internal.com
        # ... same entries as above
        fallthrough
    }
    # ... remaining plugins unchanged
}
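After editing both ConfigMaps, the fix can be checked by resolving each endpoint and comparing the answer with the address that was entered in the hosts block. This is a minimal sketch under stated assumptions: `check_dns` is a hypothetical helper, it defaults to `getent hosts` as the resolver, and it has to run somewhere that actually uses the cluster DNS (for example a debug Pod whose image ships `getent`), otherwise it only tests the host's own resolver.

```shell
#!/bin/sh
# Hypothetical helper: verify that a hostname resolves to the expected IP.
# $3 is an optional resolver command; it defaults to `getent hosts`, which
# prints "<ip> <name>" lines.
check_dns() {
    host=$1
    expected=$2
    resolver=${3:-"getent hosts"}
    # Take the first address returned by the resolver.
    actual=$($resolver "$host" | awk 'NR == 1 {print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK   $host -> $actual"
    else
        echo "FAIL $host -> ${actual:-unresolved} (expected $expected)"
        return 1
    fi
}

# Usage (replace the masked address with the real one from the hosts block):
#   check_dns keystone-xxx.internal.com 172.17.X.200
```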
Next, it was impossible to exec into Pods in the tenant cluster:
$ kubectl --kubeconfig=tenant-kubeconfig exec -it pod-name -- sh
Error: no preferred addresses found; known addresses: [{Hostname node-name}]
$ kubectl --kubeconfig=tenant-kubeconfig get nodes -o wide
NAME               STATUS   ROLES    INTERNAL-IP   EXTERNAL-IP
tenant-worker-01   Ready    <none>   <none>        <none>        # problem: InternalIP is empty
Root cause: the worker node's InternalIP was never set.
Fix: add a node-ip setting to the KubeadmConfigTemplate.
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
spec:
  template:
    spec:
      preKubeadmCommands:
        - |
          MAIN_IP=$(hostname -I | awk '{print $1}')
          echo "KUBELET_EXTRA_ARGS=--node-ip=${MAIN_IP}" > /etc/default/kubelet
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: external
            provider-id: "openstack:///'{{ instance_id }}'"
At the same time, set preferredAddressTypes to InternalIP on the KamajiControlPlane:
apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
kind: KamajiControlPlane
spec:
  kubelet:
    cgroupfs: systemd
    preferredAddressTypes:
      - InternalIP # this is the key setting
  # ... other settings
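Once new worker nodes have joined with these settings, it is worth checking that none of them is still missing an InternalIP. The sketch below is an illustration rather than part of the original procedure: `nodes_missing_internal_ip` is a hypothetical helper that filters two-column `name ip` output, and the `custom-columns` expression in the usage comment is one assumed way to produce that output.

```shell
#!/bin/sh
# Hypothetical helper: read "<node> <internal-ip>" lines on stdin and print the
# names of nodes whose InternalIP column is "<none>" or empty.
nodes_missing_internal_ip() {
    awk 'NF && ($2 == "<none>" || $2 == "") {print $1}'
}

# Usage against the tenant cluster (kubeconfig name as used in this post):
#   kubectl --kubeconfig=tenant-kubeconfig get nodes --no-headers \
#     -o custom-columns='NAME:.metadata.name,IP:.status.addresses[?(@.type=="InternalIP")].address' \
#     | nodes_missing_internal_ip
```

An empty result means every node reports an InternalIP, which is what the API server needs in order to reach the kubelet for exec and logs.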
Routing setup for an environment with multiple external gateways:
# Create a router for external traffic
openstack router create kamaji-ext-router
openstack router set kamaji-ext-router \
  --external-gateway xxxxxxxxx \
  --fixed-ip subnet=xxxxxxxxx
# Create a port and attach it to the router
openstack port create \
  --network mgmt-network \
  --fixed-ip subnet=mgmt-subnet,ip-address=10.0.0.254 \
  kamaji-router-port
openstack router add port kamaji-ext-router kamaji-router-port
# Add a static route to the management router
openstack router add route \
  --route destination=172.17.5.0/24,gateway=10.0.0.254 \
  mgmt-router
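The static route only affects traffic whose destination falls inside `172.17.5.0/24`; everything else on `mgmt-router` keeps using its existing gateway. When debugging, it can be handy to check quickly whether a given address is covered by the route. The following is a small sketch with hypothetical helper names (`ip_to_int`, `in_cidr`), handling IPv4 only and assuming a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: succeed if address $1 lies inside CIDR $2.
in_cidr() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Usage:
#   in_cidr 172.17.5.20 172.17.5.0/24 && echo "sent via 10.0.0.254"
```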
By default the instances use an ephemeral disk; the following switches the root disk to a Cinder volume:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
spec:
  template:
    spec:
      rootVolume:
        sizeGiB: 50
        type: '${CINDER_VOL_TYPE}'
        availabilityZone:
          name: '${CINDER_AZ}'
The core parts of the template actually in use:
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
kind: KamajiControlPlane
spec:
  replicas: 3
  version: 'v1.32.4'
  apiServer:
    extraArgs:
      - --cloud-provider=external
  controllerManager:
    extraArgs:
      - --cloud-provider=external
  kubelet:
    cgroupfs: systemd
    preferredAddressTypes:
      - InternalIP
  network:
    serviceType: LoadBalancer
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackCluster
spec:
  # Must be false when using Kamaji
  apiServerLoadBalancer:
    enabled: false
  disableAPIServerFloatingIP: true
  managedSecurityGroups:
    allNodesSecurityGroupRules:
      # For Calico BGP traffic
      - description: "BGP (calico)"
        direction: ingress
        etherType: IPv4
        name: "BGP (Calico)"
        portRangeMin: 179
        portRangeMax: 179
        protocol: "tcp"
        remoteManagedGroups:
          - controlplane
          - worker
# When a cluster is stuck in Terminating
kubectl patch cluster cluster-name -p '{"metadata":{"finalizers":[]}}' --type=merge
kubectl delete cluster cluster-name --force --grace-period=0
# Clean up SOFT_DELETED instances
for i in $(openstack server list --deleted | grep SOFT | awk '{print $2}'); do
  openstack server delete "$i" --force
done
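The `grep SOFT` filter above works on the table output, so it would also match a server whose name happens to contain "SOFT". A slightly stricter variant, shown here as a sketch, asks the CLI for plain `ID Status` columns and matches the status exactly. `soft_deleted_ids` is a hypothetical helper name; the `-f value -c ID -c Status` output options are assumed to be available in your OpenStack client version.

```shell
#!/bin/sh
# Hypothetical helper: read "<id> <status>" lines on stdin and print only the
# IDs whose status is exactly SOFT_DELETED.
soft_deleted_ids() {
    awk '$2 == "SOFT_DELETED" {print $1}'
}

# Usage (list first as a dry run, then delete):
#   openstack server list --deleted -f value -c ID -c Status | soft_deleted_ids
#   openstack server list --deleted -f value -c ID -c Status | soft_deleted_ids \
#     | while read -r id; do openstack server delete --force "$id"; done
```

Listing the IDs before deleting them gives a chance to confirm that only the intended instances are affected.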
External IP (172.17.5.XX) → LB VIP (10.0.0.XX) → NodePort (31920, 30330)
Manual steps after deploying a tenant cluster:
# Download the kubeconfig
kubectl get secret ${TENANT_NAME}-control-plane-admin-kubeconfig \
  -o jsonpath='{.data.admin\.conf}' | base64 -d > tenant-kubeconfig
# Deploy the CNI (should be automated in the future)
curl https://raw.githubusercontent.com/projectcalico/calico/v3.24.1/manifests/calico.yaml -O
kubectl --kubeconfig=tenant-kubeconfig apply -f calico.yaml