With the vSphere CSI driver version 2.4.1, it is now possible to use local storage with TKG clusters. This is enabled by TKG’s Topology Aware Volume Provisioning capability.
Using local storage has distinct advantages over shared storage, especially when it comes to supporting faster and cheaper storage media for applications that do not benefit from or require the added complexity of having their data replicated by the storage layer. Examples of applications that do not require storage protection (RAID or failures to tolerate) are applications that can achieve data protection at the application level.
With this model, it is possible to present individual SSDs or NVMe drives attached to an ESXi host and configure a local datastore for use with topology aware volume provisioning. Kubernetes can then create persistent volumes and schedule pods that are deployed onto the worker nodes that are on the same ESXi host as the volume. This enables Kubernetes pods to have direct local access to the underlying storage.
To set up such an environment, we first need to go over the requirements:
- Deploy Tanzu Kubernetes Clusters to Multiple Availability Zones on vSphere
- Spread Nodes Across Multiple Hosts in a Single Compute Cluster
- Configure Tanzu Kubernetes Plans and Clusters with an overlay that is topology-aware
- Deploy TKG clusters into a multi-AZ topology
- Deploy the k8s-local-ssd storage class
- Deploy Workloads with WaitForFirstConsumer Mode in Topology-Aware Environment
Before you start
Note that local storage in a multi-AZ topology is only supported by the CSI driver for vSphere version 2.4.1. To check that you have the correct version in your TKG cluster, run the following:
tanzu package installed get vsphere-csi -n tkg-system
Retrieving installation details for vsphere-csi...
NAME: vsphere-csi
PACKAGE-NAME: vsphere-csi.tanzu.vmware.com
PACKAGE-VERSION: 2.4.1+vmware.1-tkg.1
STATUS: Reconcile succeeded
CONDITIONS: [{ReconcileSucceeded True }]
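If you prefer to check from inside the cluster, the image tag on the CSI controller also shows the driver version. This assumes the default TKG deployment name and namespace (vsphere-csi-controller in kube-system); adjust if your cluster differs.
kubectl -n kube-system get deployment vsphere-csi-controller -o jsonpath='{.spec.template.spec.containers[*].image}'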
Deploy Tanzu Kubernetes Clusters to Multiple Availability Zones on vSphere
In my example, I am using the Spread Nodes Across Multiple Hosts in a Single Compute Cluster approach: each ESXi host is an availability zone (AZ) and the vSphere cluster is the region.
Figure 1 shows a TKG cluster with three worker nodes, each running on a separate ESXi host. Each ESXi host has a local SSD drive formatted with VMFS 6. The topology-aware volume provisioner always places pods and their replicas on separate worker nodes, and their persistent volume claims (PVCs) on separate ESXi hosts.
Parameter | Specification | vSphere object | Datastore
Region | tagCategory: k8s-region | cluster* |
Zone az-1 | tagCategory: k8s-zone, host-group-1 | esx1.vcd.lab | esx1-ssd-1
Zone az-2 | tagCategory: k8s-zone, host-group-2 | esx2.vcd.lab | esx2-ssd-1
Zone az-3 | tagCategory: k8s-zone, host-group-3 | esx3.vcd.lab | esx3-ssd-1
Storage Policy | k8s-local-ssd | esx1-ssd-1, esx2-ssd-1, esx3-ssd-1 |
Tags | tagCategory: k8s-storage, tag: k8s-local-ssd | esx1-ssd-1, esx2-ssd-1, esx3-ssd-1 |
*Note that “cluster” is the name of my vSphere cluster.
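If you prefer the CLI over the vSphere Client for the storage tagging, something like the following govc commands could create the k8s-storage tag category, create the k8s-local-ssd tag and attach it to the three local datastores. This is only a sketch: the datastore inventory paths assume the home.local datacenter from my lab, and the k8s-local-ssd storage policy itself is still created in vCenter as a tag-based placement policy.
govc tags.category.create -t Datastore k8s-storage
govc tags.create -c k8s-storage k8s-local-ssd
govc tags.attach k8s-local-ssd /home.local/datastore/esx1-ssd-1
govc tags.attach k8s-local-ssd /home.local/datastore/esx2-ssd-1
govc tags.attach k8s-local-ssd /home.local/datastore/esx3-ssd-1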
Ensure that you've set up the correct rules that pin the worker nodes to their respective ESXi hosts. Always use "Must run on hosts in group"; this is very important for the local storage topology to work. The worker nodes are labelled for topology awareness, so if a worker node is accidentally vMotion'd to another host, the CSI driver will not be able to bind the PVC to the worker node.
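As a sketch of how those groups and rules could be scripted for az-1, the govc commands below create the host group, the VM group and a mandatory "Must run on hosts in group" rule. The group and rule names match my vsphere-zones.yaml below; treat the host name resolution and the ability to create an empty VM group as assumptions for your environment (CAPV adds the worker VMs to the VM group later, or you can create the VM group in the vSphere Client instead).
govc cluster.group.create -cluster cluster -name host-group-1 -host esx1.vcd.lab
govc cluster.group.create -cluster cluster -name workers-group-1 -vm
govc cluster.rule.create -cluster cluster -name workers-group-1-must-run-on-host-group-1 -enable -mandatory -vm-host -vm-group workers-group-1 -host-affine-group host-group-1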

Below is my vsphere-zones.yaml file.
Note that autoConfigure is set to true, which means that you do not have to tag the cluster or the ESXi hosts yourself; you only need to set up the affinity rules under Cluster, Configure, VM/Host Groups and VM/Host Rules. With autoConfigure: true, CAPV then configures the tags and tag categories for you automatically.
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereFailureDomain
metadata:
  name: az-1
spec:
  region:
    name: cluster
    type: ComputeCluster
    tagCategory: k8s-region
    autoConfigure: true
  zone:
    name: az-1
    type: HostGroup
    tagCategory: k8s-zone
    autoConfigure: true
  topology:
    datacenter: home.local
    computeCluster: cluster
    hosts:
      vmGroupName: workers-group-1
      hostGroupName: host-group-1
    datastore: lun01
    networks:
    - tkg-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereFailureDomain
metadata:
  name: az-2
spec:
  region:
    name: cluster
    type: ComputeCluster
    tagCategory: k8s-region
    autoConfigure: true
  zone:
    name: az-2
    type: HostGroup
    tagCategory: k8s-zone
    autoConfigure: true
  topology:
    datacenter: home.local
    computeCluster: cluster
    hosts:
      vmGroupName: workers-group-2
      hostGroupName: host-group-2
    datastore: lun01
    networks:
    - tkg-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereFailureDomain
metadata:
  name: az-3
spec:
  region:
    name: cluster
    type: ComputeCluster
    tagCategory: k8s-region
    autoConfigure: true
  zone:
    name: az-3
    type: HostGroup
    tagCategory: k8s-zone
    autoConfigure: true
  topology:
    datacenter: home.local
    computeCluster: cluster
    hosts:
      vmGroupName: workers-group-3
      hostGroupName: host-group-3
    datastore: lun01
    networks:
    - tkg-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
  name: az-1
spec:
  server: vcenter.vmwire.com
  failureDomain: az-1
  placementConstraint:
    resourcePool: tkg-vsphere-workload
    folder: tkg-vsphere-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
  name: az-2
spec:
  server: vcenter.vmwire.com
  failureDomain: az-2
  placementConstraint:
    resourcePool: tkg-vsphere-workload
    folder: tkg-vsphere-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
  name: az-3
spec:
  server: vcenter.vmwire.com
  failureDomain: az-3
  placementConstraint:
    resourcePool: tkg-vsphere-workload
    folder: tkg-vsphere-workload
Note that Kubernetes does not like names that are not standard. For your vmGroupName and hostGroupName parameters, I suggest using lowercase and dashes instead of periods, for example host-group-3 instead of Host.Group.3; the latter will be rejected.
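The VSphereFailureDomain and VSphereDeploymentZone objects above are applied to the management cluster before the workload cluster is created. A minimal sketch, assuming a management cluster context named tkg-mgmt-admin@tkg-mgmt (use your own context name):
kubectl config use-context tkg-mgmt-admin@tkg-mgmt
kubectl apply -f vsphere-zones.yaml
kubectl get vspherefailuredomains,vspheredeploymentzones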
Configure Tanzu Kubernetes Plans and Clusters with an overlay that is topology-aware
To ensure that TKG can build this topology, we first need to create a TKG cluster plan overlay that tells Tanzu what to do when creating worker nodes in a multi-availability-zone topology.
Let's take a look at my az-overlay.yaml file.
Since I have three AZs, I need to create an overlay file that includes the cluster plan for all three AZs.
Zone | VSphereMachineTemplate | KubeadmConfigTemplate
az-1 | -worker-0 | -md-0
az-2 | -worker-1 | -md-1
az-3 | -worker-2 | -md-2
#! Please add any overlays specific to vSphere provider under this file.
#@ load("@ytt:overlay", "overlay")
#@ load("@ytt:data", "data")
#@ load("lib/helpers.star", "get_bom_data_for_tkr_name", "get_default_tkg_bom_data", "kubeadm_image_repo", "get_image_repo_for_component", "get_vsphere_thumbprint")
#@ load("lib/validate.star", "validate_configuration")
#@ load("@ytt:yaml", "yaml")
#@ validate_configuration("vsphere")
#@ bomDataForK8sVersion = get_bom_data_for_tkr_name()
#@ if data.values.CLUSTER_PLAN == "dev" and not data.values.IS_WINDOWS_WORKLOAD_CLUSTER:
#@overlay/match by=overlay.subset({"kind":"VSphereCluster"})
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereCluster
metadata:
  name: #@ data.values.CLUSTER_NAME
spec:
  thumbprint: #@ get_vsphere_thumbprint()
  server: #@ data.values.VSPHERE_SERVER
  identityRef:
    kind: Secret
    name: #@ data.values.CLUSTER_NAME
#@overlay/match by=overlay.subset({"kind":"MachineDeployment", "metadata":{"name": "{}-md-0".format(data.values.CLUSTER_NAME)}})
---
spec:
  template:
    spec:
      #@overlay/match missing_ok=True
      #@ if data.values.VSPHERE_AZ_0:
      failureDomain: #@ data.values.VSPHERE_AZ_0
      #@ end
      infrastructureRef:
        name: #@ "{}-worker-0".format(data.values.CLUSTER_NAME)
#@overlay/match by=overlay.subset({"kind":"VSphereMachineTemplate", "metadata":{"name": "{}-worker".format(data.values.CLUSTER_NAME)}})
---
metadata:
  name: #@ "{}-worker-0".format(data.values.CLUSTER_NAME)
spec:
  template:
    spec:
      #@overlay/match missing_ok=True
      #@ if data.values.VSPHERE_AZ_0:
      failureDomain: #@ data.values.VSPHERE_AZ_0
      #@ end
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
  #@overlay/match missing_ok=True
  annotations:
    vmTemplateMoid: #@ data.values.VSPHERE_TEMPLATE_MOID
spec:
  template:
    spec:
      cloneMode: #@ data.values.VSPHERE_CLONE_MODE
      datacenter: #@ data.values.VSPHERE_DATACENTER
      datastore: #@ data.values.VSPHERE_DATASTORE
      storagePolicyName: #@ data.values.VSPHERE_STORAGE_POLICY_ID
      diskGiB: #@ data.values.VSPHERE_WORKER_DISK_GIB
      folder: #@ data.values.VSPHERE_FOLDER
      memoryMiB: #@ data.values.VSPHERE_WORKER_MEM_MIB
      network:
        devices:
        #@overlay/match by=overlay.index(0)
        #@overlay/replace
        - networkName: #@ data.values.VSPHERE_NETWORK
          #@ if data.values.WORKER_NODE_NAMESERVERS:
          nameservers: #@ data.values.WORKER_NODE_NAMESERVERS.replace(" ", "").split(",")
          #@ end
          #@ if data.values.TKG_IP_FAMILY == "ipv6":
          dhcp6: true
          #@ elif data.values.TKG_IP_FAMILY in ["ipv4,ipv6", "ipv6,ipv4"]:
          dhcp4: true
          dhcp6: true
          #@ else:
          dhcp4: true
          #@ end
      numCPUs: #@ data.values.VSPHERE_WORKER_NUM_CPUS
      resourcePool: #@ data.values.VSPHERE_RESOURCE_POOL
      server: #@ data.values.VSPHERE_SERVER
      template: #@ data.values.VSPHERE_TEMPLATE
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: #@ "{}-md-2".format(data.values.CLUSTER_NAME)
  #@overlay/match missing_ok=True
  annotations:
    vmTemplateMoid: #@ data.values.VSPHERE_TEMPLATE_MOID
spec:
  template:
    spec:
      cloneMode: #@ data.values.VSPHERE_CLONE_MODE
      datacenter: #@ data.values.VSPHERE_DATACENTER
      datastore: #@ data.values.VSPHERE_DATASTORE
      storagePolicyName: #@ data.values.VSPHERE_STORAGE_POLICY_ID
      diskGiB: #@ data.values.VSPHERE_WORKER_DISK_GIB
      folder: #@ data.values.VSPHERE_FOLDER
      memoryMiB: #@ data.values.VSPHERE_WORKER_MEM_MIB
      network:
        devices:
        #@overlay/match by=overlay.index(0)
        #@overlay/replace
        - networkName: #@ data.values.VSPHERE_NETWORK
          #@ if data.values.WORKER_NODE_NAMESERVERS:
          nameservers: #@ data.values.WORKER_NODE_NAMESERVERS.replace(" ", "").split(",")
          #@ end
          #@ if data.values.TKG_IP_FAMILY == "ipv6":
          dhcp6: true
          #@ elif data.values.TKG_IP_FAMILY in ["ipv4,ipv6", "ipv6,ipv4"]:
          dhcp4: true
          dhcp6: true
          #@ else:
          dhcp4: true
          #@ end
      numCPUs: #@ data.values.VSPHERE_WORKER_NUM_CPUS
      resourcePool: #@ data.values.VSPHERE_RESOURCE_POOL
      server: #@ data.values.VSPHERE_SERVER
      template: #@ data.values.VSPHERE_TEMPLATE
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
  name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
spec:
  clusterName: #@ data.values.CLUSTER_NAME
  replicas: #@ data.values.WORKER_MACHINE_COUNT_1
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
        node-pool: #@ "{}-worker-pool".format(data.values.CLUSTER_NAME)
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
      clusterName: #@ data.values.CLUSTER_NAME
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineTemplate
        name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
      version: #@ data.values.KUBERNETES_VERSION
      #@ if data.values.VSPHERE_AZ_1:
      failureDomain: #@ data.values.VSPHERE_AZ_1
      #@ end
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
  name: #@ "{}-md-2".format(data.values.CLUSTER_NAME)
spec:
  clusterName: #@ data.values.CLUSTER_NAME
  replicas: #@ data.values.WORKER_MACHINE_COUNT_2
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
        node-pool: #@ "{}-worker-pool".format(data.values.CLUSTER_NAME)
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: #@ "{}-md-2".format(data.values.CLUSTER_NAME)
      clusterName: #@ data.values.CLUSTER_NAME
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineTemplate
        name: #@ "{}-md-2".format(data.values.CLUSTER_NAME)
      version: #@ data.values.KUBERNETES_VERSION
      #@ if data.values.VSPHERE_AZ_2:
      failureDomain: #@ data.values.VSPHERE_AZ_2
      #@ end
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
  namespace: '${ NAMESPACE }'
spec:
  template:
    spec:
      useExperimentalRetryJoin: true
      joinConfiguration:
        nodeRegistration:
          criSocket: /var/run/containerd/containerd.sock
          kubeletExtraArgs:
            cloud-provider: external
            tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
          name: '{{ ds.meta_data.hostname }}'
      preKubeadmCommands:
      - hostname "{{ ds.meta_data.hostname }}"
      - echo "::1 ipv6-localhost ipv6-loopback" >/etc/hosts
      - echo "127.0.0.1 localhost" >>/etc/hosts
      - echo "127.0.0.1 {{ ds.meta_data.hostname }}" >>/etc/hosts
      - echo "{{ ds.meta_data.hostname }}" >/etc/hostname
      files: []
      users:
      - name: capv
        sshAuthorizedKeys:
        - #@ data.values.VSPHERE_SSH_AUTHORIZED_KEY
        sudo: ALL=(ALL) NOPASSWD:ALL
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: #@ "{}-md-2".format(data.values.CLUSTER_NAME)
  namespace: '${ NAMESPACE }'
spec:
  template:
    spec:
      useExperimentalRetryJoin: true
      joinConfiguration:
        nodeRegistration:
          criSocket: /var/run/containerd/containerd.sock
          kubeletExtraArgs:
            cloud-provider: external
            tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
          name: '{{ ds.meta_data.hostname }}'
      preKubeadmCommands:
      - hostname "{{ ds.meta_data.hostname }}"
      - echo "::1 ipv6-localhost ipv6-loopback" >/etc/hosts
      - echo "127.0.0.1 localhost" >>/etc/hosts
      - echo "127.0.0.1 {{ ds.meta_data.hostname }}" >>/etc/hosts
      - echo "{{ ds.meta_data.hostname }}" >/etc/hostname
      files: []
      users:
      - name: capv
        sshAuthorizedKeys:
        - #@ data.values.VSPHERE_SSH_AUTHORIZED_KEY
        sudo: ALL=(ALL) NOPASSWD:ALL
#@ end
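For the overlay to be picked up by the Tanzu CLI, it needs to sit alongside the other provider ytt files on the machine you run tanzu from. In my TKG 1.5 setup that is the directory below, but treat the exact path as an assumption for your TKG version:
cp az-overlay.yaml ~/.config/tanzu/tkg/providers/infrastructure-vsphere/ytt/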
Deploy a TKG cluster into a multi-AZ topology
To deploy a TKG cluster that spreads its worker nodes over multiple AZs, we need to add some key-value pairs to the cluster config file.
Below is an example of my cluster config file, tkg-hugo.yaml.
The new key-value pairs are described in the table below.
Parameter | Specification | Details
VSPHERE_REGION | k8s-region | Must be the same as the configuration in the vsphere-zones.yaml file
VSPHERE_ZONE | k8s-zone | Must be the same as the configuration in the vsphere-zones.yaml file
VSPHERE_AZ_0, VSPHERE_AZ_1, VSPHERE_AZ_2 | az-1, az-2, az-3 | Must be the same as the configuration in the vsphere-zones.yaml file
WORKER_MACHINE_COUNT | 3 | The total number of worker nodes for the cluster, distributed in a round-robin fashion across the specified AZs
A note on WORKER_MACHINE_COUNT when using CLUSTER_PLAN: dev instead of prod: if you change the az-overlay.yaml condition from #@ if data.values.CLUSTER_PLAN == "prod" to #@ if data.values.CLUSTER_PLAN == "dev", then WORKER_MACHINE_COUNT becomes the number of workers for each AZ. So if you set this number to 3 in a three-AZ topology, you would end up with a TKG cluster with nine workers!
CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: tkg-hugo
CLUSTER_PLAN: prod
ENABLE_CEIP_PARTICIPATION: 'false'
ENABLE_MHC: 'true'
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: vsphere
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: false
DEPLOY_TKG_ON_VSPHERE7: 'true'
VSPHERE_DATACENTER: /home.local
VSPHERE_DATASTORE: lun02
VSPHERE_FOLDER: /home.local/vm/tkg-vsphere-workload
VSPHERE_NETWORK: /home.local/network/tkg-workload
VSPHERE_PASSWORD: <encoded:snipped>
VSPHERE_RESOURCE_POOL: /home.local/host/cluster/Resources/tkg-vsphere-workload
VSPHERE_SERVER: vcenter.vmwire.com
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa <snipped> administrator@vsphere.local
VSPHERE_USERNAME: administrator@vsphere.local
CONTROLPLANE_SIZE: small
WORKER_MACHINE_COUNT: 3
WORKER_SIZE: small
VSPHERE_INSECURE: 'true'
ENABLE_AUDIT_LOGGING: 'true'
ENABLE_DEFAULT_STORAGE_CLASS: 'false'
ENABLE_AUTOSCALER: 'false'
AVI_CONTROL_PLANE_HA_PROVIDER: 'true'
VSPHERE_REGION: k8s-region
VSPHERE_ZONE: k8s-zone
VSPHERE_AZ_0: az-1
VSPHERE_AZ_1: az-2
VSPHERE_AZ_2: az-3
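With the config file in place, the cluster is created with the Tanzu CLI. Optionally, you can render the manifests first to check that the failureDomain values landed on the right MachineDeployments; the output file name below is just an example.
tanzu cluster create --file tkg-hugo.yaml --dry-run > tkg-hugo-manifests.yaml
tanzu cluster create --file tkg-hugo.yaml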
Deploy the k8s-local-ssd Storage Class
Below is my storageclass-k8s-local-ssd.yaml.
Note that parameters.storagePolicyName is set to k8s-local-ssd, which is the name of the storage policy for the local storage. All three local VMFS datastores that are backed by the local SSD drives are members of this storage policy.
Note that the volumeBindingMode is set to WaitForFirstConsumer.
Instead of creating a volume immediately, the WaitForFirstConsumer setting instructs the volume provisioner to wait until a pod using the associated PVC runs through scheduling. In contrast with the Immediate volume binding mode, when the WaitForFirstConsumer setting is used, the Kubernetes scheduler drives the decision of which failure domain to use for volume provisioning based on the pod's scheduling constraints.
This guarantees that the pod and its volume are always in the same AZ (ESXi host).
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: k8s-local-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.vsphere.vmware.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  storagePolicyName: k8s-local-ssd
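Apply the storage class and confirm that it is registered as the default class before deploying workloads; the file name is simply what I saved the manifest above as.
kubectl apply -f storageclass-k8s-local-ssd.yaml
kubectl get storageclass k8s-local-ssd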
Deploy a workload that uses Topology Aware Volume Provisioning
Below is a statefulset that deploys three pods running nginx. It configures two persistent volumes, one for www and another for logs. Both of these volumes are provisioned onto the same ESXi host where the pod is running. The statefulset also runs an initContainer that downloads a simple html file from my repo and copies it to the www mount point (/usr/share/nginx/html).
You can see under spec.affinity how the statefulset uses nodeAffinity and podAntiAffinity to make use of the topology.
The statefulset then exposes the nginx app using the nginx-service, which uses the Gateway API that I wrote about in a previous blog post.
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: default
  labels:
    ako.vmware.com/gateway-name: gateway-tkg-workload-vip
    ako.vmware.com/gateway-namespace: default
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  serviceName: nginx-service
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.csi.vmware.com/k8s-zone
                operator: In
                values:
                - az-1
                - az-2
                - az-3
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: topology.csi.vmware.com/k8s-zone
      terminationGracePeriodSeconds: 10
      initContainers:
      - name: install
        image: busybox
        command:
        - wget
        - "-O"
        - "/www/index.html"
        - https://raw.githubusercontent.com/hugopow/cse/main/index.html
        volumeMounts:
        - name: www
          mountPath: "/www"
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
        - name: logs
          mountPath: /logs
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: k8s-local-ssd
      resources:
        requests:
          storage: 2Gi
  - metadata:
      name: logs
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: k8s-local-ssd
      resources:
        requests:
          storage: 1Gi
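Apply the manifest above and watch the pods and PVCs come up; the file name is an assumption, use whatever you saved it as.
kubectl apply -f statefulset.yaml
kubectl get pods,pvc -o wide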
What if you wanted to use more than three availability zones?
Some notes here on what I experienced during my testing.
The TKG cluster config has the following three lines to specify the names of the AZs that you want to use; these are passed to the Tanzu CLI, which uses them together with the ytt overlay file to deploy your TKG cluster. However, the Tanzu CLI only supports a total of three AZs.
VSPHERE_AZ_0: az-1
VSPHERE_AZ_1: az-2
VSPHERE_AZ_2: az-3
If you wanted to use more than three AZs, you would have to remove these three lines from the TKG cluster config and change the ytt overlay so that it does not use the VSPHERE_AZ_# variables but instead hard-codes the AZs into the overlay file.
To do this replace the following:
#@ if data.values.VSPHERE_AZ_2:
failureDomain: #@ data.values.VSPHERE_AZ_2
#@ end
with the following:
failureDomain: az-3
and create an additional block of MachineDeployment and KubeadmConfigTemplate for each additional AZ that you need, as sketched below.
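A minimal sketch of what one such additional MachineDeployment could look like, assuming a fourth zone named az-4 with matching VSphereFailureDomain and VSphereDeploymentZone objects; the -md-3 names, the replica count and the hard-coded failureDomain are all illustrative, and the corresponding VSphereMachineTemplate and KubeadmConfigTemplate for -md-3 would be copied from the -md-2 blocks in the overlay above.
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
  name: #@ "{}-md-3".format(data.values.CLUSTER_NAME)
spec:
  clusterName: #@ data.values.CLUSTER_NAME
  #! hypothetical worker count for this AZ
  replicas: 1
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
        node-pool: #@ "{}-worker-pool".format(data.values.CLUSTER_NAME)
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: #@ "{}-md-3".format(data.values.CLUSTER_NAME)
      clusterName: #@ data.values.CLUSTER_NAME
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineTemplate
        name: #@ "{}-md-3".format(data.values.CLUSTER_NAME)
      version: #@ data.values.KUBERNETES_VERSION
      #! hard-coded AZ name instead of a VSPHERE_AZ_# variable
      failureDomain: az-4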
Summary
Below are screenshots and the resulting deployed objects after running kubectl apply -f against the manifests above.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
tkg-hugo-md-0-7d455b7488-d6jrl Ready <none> 3h23m v1.22.5+vmware.1
tkg-hugo-md-1-bc76659f7-cntn4 Ready <none> 3h23m v1.22.5+vmware.1
tkg-hugo-md-2-6bb75968c4-mnrk5 Ready <none> 3h23m v1.22.5+vmware.1
You can see that the worker nodes are distributed across the ESXi hosts as per our vsphere-zones.yaml and also our az-overlay.yaml files.
kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
web-0 1/1 Running 0 3h14m 100.124.232.195 tkg-hugo-md-2-6bb75968c4-mnrk5 <none> <none>
web-1 1/1 Running 0 3h13m 100.122.148.67 tkg-hugo-md-1-bc76659f7-cntn4 <none> <none>
web-2 1/1 Running 0 3h12m 100.108.145.68 tkg-hugo-md-0-7d455b7488-d6jrl <none> <none>
You can see that each pod is placed on a separate worker node.
kubectl get csinodes -o jsonpath='{range .items[*]}{.metadata.name} {.spec}{"\n"}{end}'
tkg-hugo-md-0-7d455b7488-d6jrl {"drivers":[{"allocatable":{"count":59},"name":"csi.vsphere.vmware.com","nodeID":"tkg-hugo-md-0-7d455b7488-d6jrl","topologyKeys":["topology.csi.vmware.com/k8s-region","topology.csi.vmware.com/k8s-zone"]}]}
tkg-hugo-md-1-bc76659f7-cntn4 {"drivers":[{"allocatable":{"count":59},"name":"csi.vsphere.vmware.com","nodeID":"tkg-hugo-md-1-bc76659f7-cntn4","topologyKeys":["topology.csi.vmware.com/k8s-region","topology.csi.vmware.com/k8s-zone"]}]}
tkg-hugo-md-2-6bb75968c4-mnrk5 {"drivers":[{"allocatable":{"count":59},"name":"csi.vsphere.vmware.com","nodeID":"tkg-hugo-md-2-6bb75968c4-mnrk5","topologyKeys":["topology.csi.vmware.com/k8s-region","topology.csi.vmware.com/k8s-zone"]}]}
We can see that the CSI driver has correctly configured the worker nodes with the topologyKeys that enable topology-aware volume provisioning.
kubectl get pvc -o wide
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE VOLUMEMODE
logs-web-0 Bound pvc-13cf4150-db60-4c13-9ee2-cbc092dba782 1Gi RWO k8s-local-ssd 3h18m Filesystem
logs-web-1 Bound pvc-e99cfe33-9fa4-46d8-95f8-8a71f4535b15 1Gi RWO k8s-local-ssd 3h17m Filesystem
logs-web-2 Bound pvc-6bd51eed-e0aa-4489-ac0a-f546dadcee16 1Gi RWO k8s-local-ssd 3h17m Filesystem
www-web-0 Bound pvc-8f46420a-41c4-4ad3-97d4-5becb9c45c94 2Gi RWO k8s-local-ssd 3h18m Filesystem
www-web-1 Bound pvc-c3c9f551-1837-41aa-b24f-f9dc6fdb9063 2Gi RWO k8s-local-ssd 3h17m Filesystem
www-web-2 Bound pvc-632a9f81-3e9d-492b-847a-9316043a2d47 2Gi RWO k8s-local-ssd 3h17m Filesystem
kubectl get pv -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.claimRef.name}{"\t"}{.spec.nodeAffinity}{"\n"}{end}'
pvc-13cf4150-db60-4c13-9ee2-cbc092dba782 logs-web-0 {"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.csi.vmware.com/k8s-region","operator":"In","values":["cluster"]},{"key":"topology.csi.vmware.com/k8s-zone","operator":"In","values":["az-3"]}]}]}}
pvc-632a9f81-3e9d-492b-847a-9316043a2d47 www-web-2 {"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.csi.vmware.com/k8s-region","operator":"In","values":["cluster"]},{"key":"topology.csi.vmware.com/k8s-zone","operator":"In","values":["az-1"]}]}]}}
pvc-6bd51eed-e0aa-4489-ac0a-f546dadcee16 logs-web-2 {"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.csi.vmware.com/k8s-region","operator":"In","values":["cluster"]},{"key":"topology.csi.vmware.com/k8s-zone","operator":"In","values":["az-1"]}]}]}}
pvc-8f46420a-41c4-4ad3-97d4-5becb9c45c94 www-web-0 {"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.csi.vmware.com/k8s-region","operator":"In","values":["cluster"]},{"key":"topology.csi.vmware.com/k8s-zone","operator":"In","values":["az-3"]}]}]}}
pvc-c3c9f551-1837-41aa-b24f-f9dc6fdb9063 www-web-1 {"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.csi.vmware.com/k8s-region","operator":"In","values":["cluster"]},{"key":"topology.csi.vmware.com/k8s-zone","operator":"In","values":["az-2"]}]}]}}
pvc-e99cfe33-9fa4-46d8-95f8-8a71f4535b15 logs-web-1 {"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.csi.vmware.com/k8s-zone","operator":"In","values":["az-2"]},{"key":"topology.csi.vmware.com/k8s-region","operator":"In","values":["cluster"]}]}]}}
Here we can see the placement of the persistent volumes within the AZs, and that they align to the correct worker nodes.
k get no tkg-hugo-md-0-7d455b7488-d6jrl -o yaml | grep topology.kubernetes.io/zone:
topology.kubernetes.io/zone: az-1
k get no tkg-hugo-md-1-bc76659f7-cntn4 -o yaml | grep topology.kubernetes.io/zone:
topology.kubernetes.io/zone: az-2
k get no tkg-hugo-md-2-6bb75968c4-mnrk5 -o yaml | grep topology.kubernetes.io/zone:
topology.kubernetes.io/zone: az-3
k get volumeattachments.storage.k8s.io
NAME ATTACHER PV NODE ATTACHED AGE
csi-476b244713205d0d4d4e13da1a6bd2beec49ac90fbd4b64c090ffba8468f6479 csi.vsphere.vmware.com pvc-c3c9f551-1837-41aa-b24f-f9dc6fdb9063 tkg-hugo-md-1-bc76659f7-cntn4 true 9h
csi-5a759811557125917e3b627993061912386f4d2e8fb709e85fc407117138b178 csi.vsphere.vmware.com pvc-8f46420a-41c4-4ad3-97d4-5becb9c45c94 tkg-hugo-md-2-6bb75968c4-mnrk5 true 9h
csi-6016904b0ac4ac936184e95c8ff0b3b8bebabb861a99b822e6473c5ee1caf388 csi.vsphere.vmware.com pvc-6bd51eed-e0aa-4489-ac0a-f546dadcee16 tkg-hugo-md-0-7d455b7488-d6jrl true 9h
csi-c5b9abcc05d7db5348493952107405b557d7eaa0341aa4e952457cf36f90a26d csi.vsphere.vmware.com pvc-13cf4150-db60-4c13-9ee2-cbc092dba782 tkg-hugo-md-2-6bb75968c4-mnrk5 true 9h
csi-df68754411ab34a5af1c4014db9e9ba41ee216d0f4ec191a0d191f07f99e3039 csi.vsphere.vmware.com pvc-e99cfe33-9fa4-46d8-95f8-8a71f4535b15 tkg-hugo-md-1-bc76659f7-cntn4 true 9h
csi-f48a7db32aafb2c76cc22b1b533d15d331cd14c2896b20cfb4d659621fd60fbc csi.vsphere.vmware.com pvc-632a9f81-3e9d-492b-847a-9316043a2d47 tkg-hugo-md-0-7d455b7488-d6jrl true 9h
And finally, some other screenshots to show the PVCs in vSphere.
[Screenshots: PVCs on the local datastores of ESX1, ESX2 and ESX3]