In a previous post I wrote about how to scale workload cluster control plane and worker nodes vertically. This post explains how to do the same for the TKG Management Cluster nodes.
Scaling vertically means increasing or decreasing the CPU, memory or disk of the nodes, or changing other settings such as the network. Using the Cluster API it is possible to make these changes on the fly, and Kubernetes will use rolling updates to apply them.
First, switch your kubectl context to the TKG Management Cluster.
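As a minimal sketch of that step: the commands in this post use k as shorthand for kubectl, so the snippet below defines it as a shell function and adds a small helper to switch context. The helper name switch_to_mgmt and the MGMT_CONTEXT variable are hypothetical, and the default context name is an assumption based on the management cluster being called tkg-mgmt. Check the output of k config get-contexts for the real name.

```shell
# "k" in this post is shorthand for kubectl; define it if your shell lacks it.
k() { kubectl "$@"; }

# Hypothetical helper: switch to the management cluster context.
# The default context name is an assumption; list yours with:
#   k config get-contexts
switch_to_mgmt() {
  k config use-context "${MGMT_CONTEXT:-tkg-mgmt-admin@tkg-mgmt}"
}
```

Run switch_to_mgmt once before the steps that follow, or set MGMT_CONTEXT first if your context has a different name.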
Scaling Worker Nodes
Run the following to list all the vSphereMachineTemplates.
k get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -A
NAMESPACE NAME AGE
tkg-system tkg-mgmt-control-plane 20h
tkg-system tkg-mgmt-worker 20h
These custom resources are immutable, so we need to export one to a YAML file, edit the copy and create a new vSphereMachineTemplate from it.
k get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -n tkg-system tkg-mgmt-worker -o yaml > tkg-mgmt-worker-new.yaml
Now edit the new file named tkg-mgmt-worker-new.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infrastructure.cluster.x-k8s.io/v1beta1","kind":"VSphereMachineTemplate","metadata":{"annotations":{"vmTemplateMoid":"vm-9726"},"name":"tkg-mgmt-worker","namespace":"tkg-system"},"spec":{"template":{"spec":{"cloneMode":"fullClone","datacenter":"/home.local","datastore":"/home.local/datastore/lun01","diskGiB":40,"folder":"/home.local/vm/tkg-vsphere-tkg-mgmt","memoryMiB":8192,"network":{"devices":[{"dhcp4":true,"networkName":"/home.local/network/tkg-mgmt"}]},"numCPUs":2,"resourcePool":"/home.local/host/Management/Resources/tkg-vsphere-tkg-Mgmt","server":"vcenter.vmwire.com","storagePolicyName":"","template":"/home.local/vm/Templates/photon-3-kube-v1.22.9+vmware.1"}}}}
    vmTemplateMoid: vm-9726
  creationTimestamp: "2022-12-23T15:23:56Z"
  generation: 1
  name: tkg-mgmt-worker
  namespace: tkg-system
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: tkg-mgmt
    uid: 9acf6370-64be-40ce-9076-050ab8c6f41f
  resourceVersion: "3069"
  uid: 4a8f305f-0b61-4d33-ba02-7fb3fcc8ba22
spec:
  template:
    spec:
      cloneMode: fullClone
      datacenter: /home.local
      datastore: /home.local/datastore/lun01
      diskGiB: 40
      folder: /home.local/vm/tkg-vsphere-tkg-mgmt
      memoryMiB: 8192
      network:
        devices:
        - dhcp4: true
          networkName: /home.local/network/tkg-mgmt
      numCPUs: 2
      resourcePool: /home.local/host/Management/Resources/tkg-vsphere-tkg-Mgmt
      server: vcenter.vmwire.com
      storagePolicyName: ""
      template: /home.local/vm/Templates/photon-3-kube-v1.22.9+vmware.1
Change the name of the resource (metadata.name) on line 10, for example to tkg-mgmt-worker-new. Make any other changes you need, such as CPU (numCPUs) on line 32 or RAM (memoryMiB) on line 27. Save the file.
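If you'd rather not edit the file by hand, the same change can be sketched with sed. The helper name resize_template is hypothetical, and it assumes the file has the two-space indentation that kubectl writes with -o yaml, so that only metadata.name is matched and not the owner reference's name.

```shell
# Hypothetical helper: rename an exported VSphereMachineTemplate and resize it.
# Arguments: file, old name, new name, number of CPUs, memory in MiB.
resize_template() {
  file=$1 old=$2 new=$3 cpus=$4 mem=$5
  # Match metadata.name only (exactly two spaces of indent, exact old name),
  # then rewrite the numCPUs and memoryMiB values wherever they are indented.
  sed -e "s/^  name: ${old}\$/  name: ${new}/" \
      -e "s/^\( *numCPUs:\) .*/\1 ${cpus}/" \
      -e "s/^\( *memoryMiB:\) .*/\1 ${mem}/" \
      "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

For example, resize_template tkg-mgmt-worker-new.yaml tkg-mgmt-worker tkg-mgmt-worker-new 4 16384 would rename the template and set 4 CPUs and 16 GiB of RAM. Those sizes are illustrative values only, so pick whatever your environment needs.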
Now you’ll need to create the new vSphereMachineTemplate.
k apply -f tkg-mgmt-worker-new.yaml
Now we’re ready to make the change.
Let's first take a look at the MachineDeployments.
k get machinedeployments.cluster.x-k8s.io -A
NAMESPACE NAME CLUSTER REPLICAS READY UPDATED UNAVAILABLE PHASE AGE VERSION
tkg-system tkg-mgmt-md-0 tkg-mgmt 2 2 2 0 Running 20h v1.22.9+vmware.1
Now edit this MachineDeployment.
k edit machinedeployments.cluster.x-k8s.io -n tkg-system tkg-mgmt-md-0
Find the section spec.template.spec.infrastructureRef. The field to change is the name, on line 56.
53 infrastructureRef:
54 apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
55 kind: VSphereMachineTemplate
56 name: tkg-mgmt-worker
Change line 56 to the name of the new VSphereMachineTemplate we created earlier.
53 infrastructureRef:
54 apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
55 kind: VSphereMachineTemplate
56 name: tkg-mgmt-worker-new
Save and quit. You’ll notice that a new VM will immediately start being cloned in vCenter. This VM is the new worker node with the updated CPU and memory sizing, and it will replace the current worker node. After a few minutes the old worker node is deleted, and you are left with a worker node sized according to the new VSphereMachineTemplate.
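The same edit can also be made without opening an editor. This is a sketch using kubectl patch with a merge patch. The helper names set_md_template and watch_machines are hypothetical, and the patch only sets the name field under spec.template.spec.infrastructureRef.

```shell
# Hypothetical helper: point a MachineDeployment at another VSphereMachineTemplate.
# Arguments: namespace, MachineDeployment name, new template name.
set_md_template() {
  kubectl patch machinedeployments.cluster.x-k8s.io "$2" -n "$1" --type merge \
    -p "{\"spec\":{\"template\":{\"spec\":{\"infrastructureRef\":{\"name\":\"$3\"}}}}}"
}

# Hypothetical helper: watch the Machine objects while the rollout runs
# (Ctrl+C to stop). Argument: namespace.
watch_machines() {
  kubectl get machines.cluster.x-k8s.io -n "$1" -w
}
```

With the names used in this post, that would be set_md_template tkg-system tkg-mgmt-md-0 tkg-mgmt-worker-new followed by watch_machines tkg-system.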
Scaling Control Plane Nodes
Scaling the control plane nodes is similar.
k get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -n tkg-system tkg-mgmt-control-plane -o yaml > tkg-mgmt-control-plane-new.yaml
Edit the file and perform the same steps as the worker nodes.
You’ll notice that there is no MachineDeployment for the control plane nodes of a TKG Management Cluster. Instead we have to edit the custom resource of kind KubeadmControlPlane.
Run this command
k get kubeadmcontrolplane -A
NAMESPACE NAME CLUSTER INITIALIZED API SERVER AVAILABLE REPLICAS READY UPDATED UNAVAILABLE AGE VERSION
tkg-system tkg-mgmt-control-plane tkg-mgmt true true 1 1 1 0 21h v1.22.9+vmware.1
Now we can edit it
k edit kubeadmcontrolplane -n tkg-system tkg-mgmt-control-plane
Change the section under spec.machineTemplate.infrastructureRef, around line 106.
102 machineTemplate:
103 infrastructureRef:
104 apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
105 kind: VSphereMachineTemplate
106 name: tkg-mgmt-control-plane
107 namespace: tkg-system
Change line 106 to
102 machineTemplate:
103 infrastructureRef:
104 apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
105 kind: VSphereMachineTemplate
106 name: tkg-mgmt-control-plane-new
107 namespace: tkg-system
Save the file. You’ll notice that another VM will start cloning and eventually you’ll have a new control plane node up and running. This new control plane node will replace the older one. It will take longer than the worker node so be patient.
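As with the workers, the control plane change can be sketched as a merge patch instead of an interactive edit. The helper name set_kcp_template is hypothetical. It only sets the name field under spec.machineTemplate.infrastructureRef, and it assumes the new template already exists in the same namespace.

```shell
# Hypothetical helper: point a KubeadmControlPlane at another VSphereMachineTemplate.
# Arguments: namespace, KubeadmControlPlane name, new template name.
set_kcp_template() {
  kubectl patch kubeadmcontrolplane "$2" -n "$1" --type merge \
    -p "{\"spec\":{\"machineTemplate\":{\"infrastructureRef\":{\"name\":\"$3\"}}}}"
}
```

With the names used in this post, that would be set_kcp_template tkg-system tkg-mgmt-control-plane tkg-mgmt-control-plane-new. Running k get kubeadmcontrolplane -A again afterwards shows the READY and UPDATED counts as the rollout progresses.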
Hey Hugo, really like the blog, I have been trying this with TKGm 2.1 and this method does not seem to work anymore
Thanks Rolf, I’ve not tried this with 2.1, will do sometime this weekend and let you know.
Did you deploy a TKG 2.1 cluster using cluster class instead of legacy?