TKG 2.3 changes how TKG clusters with multiple availability zones (AZs) are deployed. This post summarises those changes.
These changes enable some useful new options, such as:
- Deploy a TKG cluster into multiple AZs, where each AZ can be a vSphere cluster or a host group (a host group can contain one or more ESX hosts).
- Deploy worker nodes across AZs without placing the control plane nodes into any AZ.
- Deploy worker nodes across AZs, and force the control plane nodes into a single AZ.
- Deploy TKG clusters without AZs.
- Deploy all nodes into just one AZ, think vSAN stretched cluster use cases.
- Enable multi-AZ for already deployed clusters that were initially deployed without AZs.
- All of the above, with either one control plane node (CLUSTER_PLAN: dev) or three control plane nodes (CLUSTER_PLAN: prod).
- All of the above, with single-node clusters too!
- CSI topology support is unchanged, and topology-aware volume provisioning is still supported.
VSphereDeploymentZone requires labels
The VSphereDeploymentZone objects need to be labeled so that the new configuration variable VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS can select them. This variable is used to place the control plane nodes into the desired AZ.
Note that if VSPHERE_ZONE and VSPHERE_REGION are specified in the cluster configuration file, then you must also specify VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS. If you don't, you'll get this error:
Error: workload cluster configuration validation failed: VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS should be configured if VSPHERE_ZONE/VSPHERE_REGION are configured
You also cannot leave VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS blank, or give it a label that doesn't exist, e.g., VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS: "fake=fake", as you'll get this error:
Error: workload cluster configuration validation failed: unable find VsphereDeploymentZone by the matchlabels.
However, there are ways around this, which I’ll cover below.
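If your VSphereDeploymentZone objects already exist but were created without labels, one way to add them after the fact is with kubectl against the management cluster. This is a sketch assuming zones named az-1, az-2 and az-3 as in the manifests below:

```shell
# Label existing VSphereDeploymentZone objects so that
# VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS can match them.
# Run this in the management cluster context; the zone names
# az-1/az-2/az-3 are assumptions from this environment.
kubectl label vspheredeploymentzone az-1 region=cluster az=az-1
kubectl label vspheredeploymentzone az-2 region=cluster az=az-2
kubectl label vspheredeploymentzone az-3 region=cluster az=az-3
```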
Below is my manifest for the VSphereDeploymentZone objects; note the labels for region and az.
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
  name: az-1
  labels:
    region: cluster
    az: az-1
spec:
  server: vcenter.vmwire.com
  failureDomain: az-1
  placementConstraint:
    resourcePool: tkg-vsphere-workload
    folder: tkg-vsphere-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
  name: az-2
  labels:
    region: cluster
    az: az-2
spec:
  server: vcenter.vmwire.com
  failureDomain: az-2
  placementConstraint:
    resourcePool: tkg-vsphere-workload
    folder: tkg-vsphere-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
  name: az-3
  labels:
    region: cluster
    az: az-3
spec:
  server: vcenter.vmwire.com
  failureDomain: az-3
  placementConstraint:
    resourcePool: tkg-vsphere-workload
    folder: tkg-vsphere-workload
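After applying the manifests, it's worth confirming that the labels actually landed on the objects before creating any clusters, since the matching-labels validation above fails hard otherwise. A quick check (run against the management cluster context):

```shell
# List the deployment zones together with their labels;
# each zone should show its region= and az= labels.
kubectl get vspheredeploymentzones --show-labels
```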
Deploy a TKG cluster with multi AZs
Let's say you have an environment with three AZs, and you want both the control plane nodes and the worker nodes to be distributed across the AZs.
The cluster config file would need to have the following variables.
VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS: "region=cluster"
VSPHERE_REGION: k8s-region
VSPHERE_ZONE: k8s-zone
VSPHERE_AZ_0: az-1
VSPHERE_AZ_1: az-2
VSPHERE_AZ_2: az-3
USE_TOPOLOGY_CATEGORIES: true
tanzu cluster create tkg-workload1 -f tkg-cluster.yaml --dry-run > tkg-workload1-spec.yaml
tanzu cluster create -f tkg-workload1-spec.yaml
Deploy a TKG cluster with multi AZs but not for control plane nodes
tanzu cluster create tkg-workload2 -f tkg-cluster.yaml --dry-run > tkg-workload2-spec.yaml
Edit the tkg-workload2-spec.yaml file and remove the following lines so that the control plane nodes are not deployed into an AZ:
- name: controlPlaneZoneMatchingLabels
  value:
    region: cluster
tanzu cluster create -f tkg-workload2-spec.yaml
Deploy a TKG cluster with multi AZs and force control plane nodes into one AZ
The cluster config file would need to have the following variables.
VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS: "az=az-1"
VSPHERE_REGION: k8s-region
VSPHERE_ZONE: k8s-zone
VSPHERE_AZ_0: az-1
VSPHERE_AZ_1: az-2
VSPHERE_AZ_2: az-3
USE_TOPOLOGY_CATEGORIES: true
tanzu cluster create tkg-workload3 -f tkg-cluster.yaml --dry-run > tkg-workload3-spec.yaml
tanzu cluster create -f tkg-workload3-spec.yaml
Deploy a TKG cluster into one AZ
The cluster config file would need to have the following variables.
VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS: "az=az-1"
VSPHERE_REGION: k8s-region
VSPHERE_ZONE: k8s-zone
VSPHERE_AZ_0: az-1
USE_TOPOLOGY_CATEGORIES: true
tanzu cluster create tkg-workload4 -f tkg-cluster.yaml --dry-run > tkg-workload4-spec.yaml
tanzu cluster create -f tkg-workload4-spec.yaml
Deploy TKG cluster with only one control plane node
You can also deploy any of the options above with just one control plane node, which minimises resource usage if you're resource-constrained.
To do this your cluster config file would have the following variables.
CLUSTER_PLAN: dev
VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS: "region=cluster"
VSPHERE_REGION: k8s-region
VSPHERE_ZONE: k8s-zone
VSPHERE_AZ_0: az-1
VSPHERE_AZ_1: az-2
VSPHERE_AZ_2: az-3
USE_TOPOLOGY_CATEGORIES: true
tanzu cluster create tkg-workload5 -f tkg-cluster.yaml --dry-run > tkg-workload5-spec.yaml
Edit the tkg-workload5-spec.yaml file and remove the following lines so that the control plane node is not deployed into an AZ:
- name: controlPlaneZoneMatchingLabels
  value:
    region: cluster
Also, since CLUSTER_PLAN is set to dev, you'll see that the machineDeployments place all three replicas in az-1. To deploy one replica in each AZ instead, change the file to the following:
workers:
  machineDeployments:
  - class: tkg-worker
    failureDomain: az-1
    metadata:
      annotations:
        run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=photon
    name: md-0
    replicas: 1
    strategy:
      type: RollingUpdate
  - class: tkg-worker
    failureDomain: az-2
    metadata:
      annotations:
        run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=photon
    name: md-1
    replicas: 1
    strategy:
      type: RollingUpdate
  - class: tkg-worker
    failureDomain: az-3
    metadata:
      annotations:
        run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=photon
    name: md-2
    replicas: 1
    strategy:
      type: RollingUpdate
tanzu cluster create -f tkg-workload5-spec.yaml
How to find which AZs the nodes are deployed into
kubectl get machines -o json | jq -r '[.items[] | {name:.metadata.name, failureDomain:.spec.failureDomain}]'
[
  {
    "name": "tkg-workload2-md-0-xkdm2-6f58d5f5bbxpkfcz-ffvmn",
    "failureDomain": "az-1"
  },
  {
    "name": "tkg-workload2-md-1-w9dk7-cf5c7cbd7xs9gwz-2mjj4",
    "failureDomain": "az-2"
  },
  {
    "name": "tkg-workload2-md-2-w9dk7-cf5c7cbd7xs9gwz-4j9ds",
    "failureDomain": "az-3"
  },
  {
    "name": "tkg-workload2-vnpbp-5rt4b",
    "failureDomain": null
  },
  {
    "name": "tkg-workload2-vnpbp-8rtqd",
    "failureDomain": null
  },
  {
    "name": "tkg-workload2-vnpbp-dq68j",
    "failureDomain": null
  }
]
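On larger clusters, listing every machine gets noisy. The same jq pipeline can instead summarise how many machines sit in each failure domain; machines with no failureDomain set (the control plane nodes in the tkg-workload2 example above) are grouped under null:

```shell
# Count machines per failure domain; control plane machines without
# an AZ assignment show up with "failureDomain": null.
kubectl get machines -o json | jq '
  [.items[] | {name: .metadata.name, failureDomain: .spec.failureDomain}]
  | group_by(.failureDomain)
  | map({failureDomain: .[0].failureDomain, count: length})'
```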