Using a Let’s Encrypt Cluster Issuer for Certificate Manager

I’m a big fan of Let’s Encrypt as it’s a great service that provides free TLS certificates for your applications and websites. This post summarizes the steps to set up Let’s Encrypt as the cluster issuer for cert-manager.

A Cluster Issuer enables your applications to automatically request TLS certificates from Let’s Encrypt. It basically avoids having to do the following manually:

  • Typing certbot certonly --manual --cert-name something.domain.com --preferred-challenges dns to create a manual TLS certificate request.
  • Then going to your DNS service and creating the TXT record.
  • Then downloading the cert.pem and privkey.pem.
  • Then creating a secret to use the new certificate.
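
For reference, that last manual step is typically a one-liner with kubectl (a sketch, assuming the cert.pem and privkey.pem files downloaded from certbot and an example secret name):

kubectl create secret tls something-domain-com-tls --cert=cert.pem --key=privkey.pem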

Pre-requisites

  1. A Kubernetes cluster (I’m using a TKG cluster)
  2. A domain name managed by a well-known DNS provider (I’m using Cloudflare, but Route 53 and others can also be used)

Step 1. Install Cert Manager into your Kubernetes cluster

# Install Tanzu Standard Repository
tanzu package repository add tanzu-standard --url projects.registry.vmware.com/tkg/packages/standard/repo:v2024.2.1 --namespace tkg-system

# Create namespace for cert-manager tanzu packages
k create ns my-packages

# Install cert-manager 
tanzu package install cert-manager --package cert-manager.tanzu.vmware.com --namespace my-packages --version 1.12.2+vmware.2-tkg.2

# Install cert-manager custom resource definitions
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.crds.yaml

Step 2. Create a Secret for Cloud Flare

I am using Cloudflare as my DNS provider. Cloudflare has an API that can be used with an API Token or an API Key. I am using an API Key so that cert-manager can create the DNS-01 challenge records in Cloudflare that Let’s Encrypt uses to verify domain ownership.

You can get your Global API Key from the Cloudflare dashboard under My Profile > API Tokens.

Then create the following file secret-cloudflare.yaml.

apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-key-secret
  namespace: cert-manager
type: Opaque
stringData:
  api-key: <your-cloud-flare-api-key>
  # - or -
  # api-token: your-api-token

Step 3. Create the Let’s Encrypt Cluster Issuer

I am using Let’s Encrypt as the certificate issuer, and it will validate certificate requests against the domain ownership in Cloudflare using the Secret created in Step 2.

Create another file named cluster-issuer-production.yaml.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    email: <your-email-address>
    # Letsencrypt Production
    server: https://acme-v02.api.letsencrypt.org/directory
    # - or -
    # Letsencrypt Staging
    # server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: example-issuer-account-key
    solvers:
    - dns01:
        cloudflare:
          email: <your-cloudflare-email-account>
          apiKeySecretRef:
            name: cloudflare-api-key-secret
            key: api-key

Step 4. Apply both files to create the Secret containing the Cloudflare API Key and the ClusterIssuer.

kubectl apply -f secret-cloudflare.yaml

kubectl apply -f cluster-issuer-production.yaml
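
You can also check that the ClusterIssuer has registered its ACME account with Let’s Encrypt and is showing as Ready before requesting any certificates:

kubectl get clusterissuer letsencrypt-production

kubectl describe clusterissuer letsencrypt-production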

Your cluster is now ready to automatically issue TLS certificates using cert-manager.

Example Application

The following is an example application manifest that uses the letsencrypt-production ClusterIssuer to request a TLS certificate from Let’s Encrypt for the hostname nginx.k8slabs.com.

My test domain k8slabs.com is managed by Cloudflare.

The manifest has the following sections:

  • namespace – creates the nginx namespace for all of the resources below
  • service – ClusterIP service that exposes the nginx pods created by the statefulset below
  • statefulset – creates the statefulset that will deploy the nginx pods
  • certificate – issued by the ClusterIssuer using Let’s Encrypt, with domain ownership validated against the DNS records in Cloudflare
  • httpproxy (ingress) – creates an ingress and uses the certificate created by the ClusterIssuer to expose the nginx application over secure TLS

Sample application nginx-statefulset-contour-tls.yaml

---
apiVersion: v1
kind: Namespace
metadata:
  name: nginx
  labels:
    name: nginx
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: nginx
spec:
  selector:
    app: nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
      protocol: TCP
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx
  namespace: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  serviceName: "nginx-service"
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: nginx
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: nginx
  namespace: nginx
spec:
  secretName: nginx
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
  dnsNames:
    - 'nginx.k8slabs.com'
---
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  annotations:
  labels:
    app: nginx
  name: nginx-httpproxy
  namespace: nginx
spec:
  routes:
  - conditions:
    - prefix: /
    pathRewritePolicy:
      replacePrefix:
      - prefix: /
        replacement: /
    services:
    - name: nginx-service
      port: 80
  virtualhost:
    fqdn: nginx.k8slabs.com
    tls:
      secretName: nginx
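
After applying the manifest, you can watch the certificate request progress. Once Let’s Encrypt completes the DNS-01 challenge against Cloudflare, the Certificate reports Ready and the nginx secret referenced by the HTTPProxy is created:

kubectl get certificate -n nginx

kubectl describe certificate nginx -n nginx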

TKG 2.3 Multi AZ Day 2 Operations

In the previous post, I highlighted the key updates that the TKG 2.3 release brought to multi availability zone (AZ) enabled clusters. That post discussed greenfield Day-0 deployments of TKG clusters using multiple AZs. This post focuses on Day-2 operations, such as how to enable AZs for already deployed clusters that were not AZ enabled, for example clusters deployed with an older version of TKG that did not have generally available support for the multi-AZ feature.

Enabling multi-AZ for clusters initially deployed without AZs

To enable multi-AZ for a cluster that was initially deployed without AZs, you can follow the procedure below. Note that this is for a workload cluster and not a management cluster. To enable this for a management cluster, add the tkg-system namespace to the commands and use the management cluster’s name instead.

We’ve made it very easy to do Day-2 operations, since the AZs are just labels, and if you’re already familiar with Kubernetes labels, it’s a simple operation of adding the label to the controlPlaneZoneMatchingLabels key.

Note that the labels need to match the labels defined in your vsphere-zones.yaml file, which is applied to the TKG management cluster. My example is below:

---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
 name: az-1
 labels:
   region: cluster
   az: az-1
spec:
 server: vcenter.vmwire.com
 failureDomain: az-1
 placementConstraint:
   resourcePool: tkg-vsphere-workload
   folder: tkg-vsphere-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
 name: az-2
 labels:
   region: cluster
   az: az-2
spec:
 server: vcenter.vmwire.com
 failureDomain: az-2
 placementConstraint:
   resourcePool: tkg-vsphere-workload
   folder: tkg-vsphere-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
 name: az-3
 labels:
   region: cluster
   az: az-3
spec:
 server: vcenter.vmwire.com
 failureDomain: az-3
 placementConstraint:
   resourcePool: tkg-vsphere-workload
   folder: tkg-vsphere-workload
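
Apply the zones to the TKG management cluster and confirm they are present (my file is named vsphere-zones.yaml as mentioned above):

kubectl apply -f vsphere-zones.yaml

kubectl get vspheredeploymentzones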

Control Plane nodes

When ready, run the command below against the TKG Management Cluster context to set the label for control plane nodes of the TKG cluster named tkg-cluster.
kubectl get cluster tkg-cluster -o json | jq '.spec.topology.variables |= map(if .name == "controlPlaneZoneMatchingLabels" then .value = {"region": "cluster"} else . end)'| kubectl replace -f -

You should receive the following response.

cluster.cluster.x-k8s.io/tkg-cluster replaced

You can check the cluster status to ensure that the failure domains have been updated as expected.

kubectl get cluster tkg-cluster -o json | jq -r '.status.failureDomains | to_entries[].key'

The response would look something like

az-1
az-2
az-3

Next, we patch the KubeadmControlPlane with rolloutAfter to trigger a rollout of the control plane node(s).

kubectl patch kcp tkg-cluster-f2km7 --type merge -p "{\"spec\":{\"rolloutAfter\":\"$(date +'%Y-%m-%dT%TZ')\"}}"

You should see vCenter start to clone new control plane nodes, and when the nodes start, they will be placed in an AZ. You can also check with the command below.

kubectl get machines -o json | jq -r '[.items[] | {name:.metadata.name, failureDomain:.spec.failureDomain}]'

As nodes are started and join the cluster, they will get placed into the right AZ.

[
  {
    "name": "tkg-cluster-f2km7-2kwgs",
    "failureDomain": null
  },
  {
    "name": "tkg-cluster-f2km7-6pgmr",
    "failureDomain": null
  },
  {
    "name": "tkg-cluster-f2km7-cqndc",
    "failureDomain": "az-2"
  },
  {
    "name": "tkg-cluster-f2km7-pzqwx",
    "failureDomain": null
  },
  {
    "name": "tkg-cluster-md-0-j6c24-6c8c9d45f7xjdchc-97q57",
    "failureDomain": null
  },
  {
    "name": "tkg-cluster-md-1-nqvsf-55b5464bbbx4xzkd-q6jhq",
    "failureDomain": null
  },
  {
    "name": "tkg-cluster-md-2-srr2c-77cc694688xcx99w-qcmwg",
    "failureDomain": null
  }
]

And after a few minutes…

[
  {
    "name": "tkg-cluster-f2km7-2kwgs",
    "failureDomain": null
  },
  {
    "name": "tkg-cluster-f2km7-4tn6l",
    "failureDomain": "az-1"
  },
  {
    "name": "tkg-cluster-f2km7-cqndc",
    "failureDomain": "az-2"
  },
  {
    "name": "tkg-cluster-f2km7-w7vs5",
    "failureDomain": "az-3"
  },
  {
    "name": "tkg-cluster-md-0-j6c24-6c8c9d45f7xjdchc-97q57",
    "failureDomain": null
  },
  {
    "name": "tkg-cluster-md-1-nqvsf-55b5464bbbx4xzkd-q6jhq",
    "failureDomain": null
  },
  {
    "name": "tkg-cluster-md-2-srr2c-77cc694688xcx99w-qcmwg",
    "failureDomain": null
  }
]

Worker nodes

The procedure is almost the same for the worker nodes.

Let’s check the current MachineDeployment topology.

kubectl get cluster tkg-cluster -o=jsonpath='{range .spec.topology.workers.machineDeployments[*]}{"Name: "}{.name}{"\tFailure Domain: "}{.failureDomain}{"\n"}{end}'

The response should be something like this, since this cluster was initially deployed without AZs.

Name: md-0	Failure Domain:
Name: md-1	Failure Domain:
Name: md-2	Failure Domain:

Patch the cluster tkg-cluster with the failure domains az-1, az-2 and az-3. In this example, the tkg-cluster cluster plan is prod and has three MachineDeployments. If your cluster uses the dev plan, then you only need to update one MachineDeployment in spec.topology.workers.machineDeployments.

kubectl patch cluster tkg-cluster --type=json -p='[ {"op": "replace", "path": "/spec/topology/workers/machineDeployments/0/failureDomain", "value": "az-1"}, {"op": "replace", "path": "/spec/topology/workers/machineDeployments/1/failureDomain", "value": "az-2"}, {"op": "replace", "path": "/spec/topology/workers/machineDeployments/2/failureDomain", "value": "az-3"}]'

Let’s check the MachineDeployment topology now that the change has been made.

kubectl get cluster tkg-cluster -o=jsonpath='{range .spec.topology.workers.machineDeployments[*]}{"Name: "}{.name}{"\tFailure Domain: "}{.failureDomain}{"\n"}{end}'

The response should now show the failure domains assigned to each MachineDeployment.

Name: md-0	Failure Domain: az-1
Name: md-1	Failure Domain: az-2
Name: md-2	Failure Domain: az-3

vCenter should immediately start deploying new worker nodes; when they start, they will be placed into the correct AZs.

You can also check with the command below.

kubectl get machines -o json | jq -r '[.items[] | {name:.metadata.name, failureDomain:.spec.failureDomain}]'

[
  {
    "name": "tkg-cluster-f2km7-4tn6l",
    "failureDomain": "az-1"
  },
  {
    "name": "tkg-cluster-f2km7-cqndc",
    "failureDomain": "az-2"
  },
  {
    "name": "tkg-cluster-f2km7-w7vs5",
    "failureDomain": "az-3"
  },
  {
    "name": "tkg-cluster-md-0-j6c24-6c8c9d45f7xjdchc-97q57",
    "failureDomain": null
  },
  {
    "name": "tkg-cluster-md-0-j6c24-8f6b4f8d5xplqlf-p8d8k",
    "failureDomain": "az-1"
  },
  {
    "name": "tkg-cluster-md-1-nqvsf-55b5464bbbx4xzkd-q6jhq",
    "failureDomain": null
  },
  {
    "name": "tkg-cluster-md-1-nqvsf-7dc48df8dcx6bs6b-kmj9r",
    "failureDomain": "az-2"
  },
  {
    "name": "tkg-cluster-md-2-srr2c-77cc694688xcx99w-qcmwg",
    "failureDomain": null
  },
  {
    "name": "tkg-cluster-md-2-srr2c-f466d4484xxc9xz-8sjfn",
    "failureDomain": "az-3"
  }
]

And after a few minutes…

kubectl get machines -o json | jq -r '[.items[] | {name:.metadata.name, failureDomain:.spec.failureDomain}]'

[
  {
    "name": "tkg-cluster-f2km7-4tn6l",
    "failureDomain": "az-1"
  },
  {
    "name": "tkg-cluster-f2km7-cqndc",
    "failureDomain": "az-2"
  },
  {
    "name": "tkg-cluster-f2km7-w7vs5",
    "failureDomain": "az-3"
  },
  {
    "name": "tkg-cluster-md-0-j6c24-8f6b4f8d5xplqlf-p8d8k",
    "failureDomain": "az-1"
  },
  {
    "name": "tkg-cluster-md-1-nqvsf-7dc48df8dcx6bs6b-kmj9r",
    "failureDomain": "az-2"
  },
  {
    "name": "tkg-cluster-md-2-srr2c-f466d4484xxc9xz-8sjfn",
    "failureDomain": "az-3"
  }
]

Update CPI and CSI for topology awareness

We also need to update the CPI and CSI to reflect the support for multi-AZ. Note that this is only required for Day-2 operations, as CPI and CSI topology awareness is configured automatically for greenfield clusters.

First, check that the machineDeployments have been updated with failure domains, as shown in the previous section.

In TKG 2.3 with cluster class based clusters, CPI and CSI are managed by Tanzu Packages (pkgi). You can see these by running the following commands:

k get vspherecpiconfigs.cpi.tanzu.vmware.com

k get vspherecsiconfigs.csi.tanzu.vmware.com

Now we need to update the VSphereCPIConfig and add the k8s-region and k8s-zone into the spec.

k edit vspherecpiconfigs.cpi.tanzu.vmware.com tkg-workload12-vsphere-cpi-package

Add the region and zone into the spec.

spec:
  vsphereCPI:
    antreaNSXPodRoutingEnabled: false
    mode: vsphereCPI
    region: k8s-region
    tlsCipherSuites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    zone: k8s-zone

Change to the workload cluster context and run this command to check the reconciliation status for VSphereCPIConfig

k get pkgi -n tkg-system tkg-workload12-vsphere-cpi

If it shows anything but Reconcile succeeded, then we need to force the update with a deletion.

k delete pkgi -n tkg-system tkg-workload12-vsphere-cpi

Next, we need to update the VSphereCSIConfig and add the k8s-region and k8s-zone into the spec.

Change back to the TKG Management cluster context and run the following command

k edit vspherecsiconfigs.csi.tanzu.vmware.com tkg-workload12

spec:
  vsphereCSI:
    config:
      datacenter: /home.local
      httpProxy: ""
      httpsProxy: ""
      insecureFlag: false
      noProxy: ""
      useTopologyCategories: true
      region: k8s-region
      zone: k8s-zone
    mode: vsphereCSI

Delete the csinodes and csinodetopologies to make the change.

Change to the workload cluster context and run the following commands

k delete csinode --all --context tkg-workload12-admin@tkg-workload12


k delete csinodetopologies.cns.vmware.com --all --context tkg-workload12-admin@tkg-workload12

Run the following command to check the reconciliation process

k get pkgi -n tkg-system tkg-workload12-vsphere-csi

We need to delete the CSI pkgi to force the change

k delete pkgi -n tkg-system tkg-workload12-vsphere-csi

We can check that the topology keys are now active with this command

kubectl get csinodes -o jsonpath='{range .items[*]}{.metadata.name} {.spec}{"\n"}{end}'

tkg-workload12-md-0-5j2dw-76bf777bbdx6b4ss-v7fn4 {"drivers":[{"allocatable":{"count":59},"name":"csi.vsphere.vmware.com","nodeID":"4225d4f9-ded1-611b-1fd5-7320ffffbe28","topologyKeys":["topology.csi.vmware.com/k8s-region","topology.csi.vmware.com/k8s-zone"]}]}
tkg-workload12-md-1-69s4n-85b74654fdx646xd-ctrkg {"drivers":[{"allocatable":{"count":59},"name":"csi.vsphere.vmware.com","nodeID":"4225ff47-9c82-b377-a4a2-d3ea15bce5aa","topologyKeys":["topology.csi.vmware.com/k8s-region","topology.csi.vmware.com/k8s-zone"]}]}
tkg-workload12-md-2-h2p9p-5f85887b47xwzcpq-7pgc8 {"drivers":[{"allocatable":{"count":59},"name":"csi.vsphere.vmware.com","nodeID":"4225b76d-ef40-5a7f-179a-31d804af969c","topologyKeys":["topology.csi.vmware.com/k8s-region","topology.csi.vmware.com/k8s-zone"]}]}
tkg-workload12-x2jb5-6nt2b {"drivers":[{"allocatable":{"count":59},"name":"csi.vsphere.vmware.com","nodeID":"4225ba85-53dc-56fd-3e9c-5ce609bb08d3","topologyKeys":["topology.csi.vmware.com/k8s-region","topology.csi.vmware.com/k8s-zone"]}]}
tkg-workload12-x2jb5-7sl8j {"drivers":[{"allocatable":{"count":59},"name":"csi.vsphere.vmware.com","nodeID":"42251a1c-871c-5826-5a45-a6747c181962","topologyKeys":["topology.csi.vmware.com/k8s-region","topology.csi.vmware.com/k8s-zone"]}]}
tkg-workload12-x2jb5-mmhvb {"drivers":[{"allocatable":{"count":59},"name":"csi.vsphere.vmware.com","nodeID":"42257d5a-daab-2ba6-dfb7-aa75f4063250","topologyKeys":["topology.csi.vmware.com/k8s-region","topology.csi.vmware.com/k8s-zone"]}]}
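
With the topology keys active, you can optionally create a topology-aware StorageClass so that PVCs are provisioned in the same AZ as the pods that consume them. Below is a minimal sketch; the storage policy name is an example and should be replaced with your own SPBM policy.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topology-aware
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "k8s-storage-policy" # example policy name, replace with yours
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.csi.vmware.com/k8s-zone
    values:
    - az-1
    - az-2
    - az-3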

That’s it! We’ve successfully updated a cluster that was originally deployed without AZs so that it can now use AZs for pod placement and PVC placement with topology awareness.

TKG 2.3 Multi Availability Zone Updates

TKG 2.3 has some changes to how TKG clusters with multi availability zones are deployed. This post summarises these changes.

These changes allow some cool new options such as

  • Deploy a TKG cluster into multiple AZs, where each AZ can be a vSphere cluster or a host group, and a host group can have one or more ESXi hosts.
  • Deploy worker nodes across AZs, but do not deploy control plane nodes into any AZ.
  • Deploy worker nodes across AZs, and pin the control plane nodes to one AZ.
  • Deploy TKG clusters without AZs.
  • Deploy all nodes into just one AZ, think vSAN stretched cluster use cases.
  • Enable multi-AZ for already deployed clusters that were initially deployed without AZs.
  • All of the above but with one control plane node (CLUSTER_PLAN: dev) or three control plane nodes (CLUSTER_PLAN: prod)
  • All of the above but with single node clusters too!
  • CSI topology has not changed and is supported for topology aware volume provisioning.

VSphereDeploymentZone requires labels

The VSphereDeploymentZone needs to be labeled in order for the new configuration variable VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS to use the labels. This parameter is used to place the control plane nodes into the desired AZ.

Note that if VSPHERE_ZONE and VSPHERE_REGION are specified in the cluster configuration file then you must also specify VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS. If you don’t, you’ll get this error:

Error: workload cluster configuration validation failed: VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS should be configured if VSPHERE_ZONE/VSPHERE_REGION are configured

You also cannot leave VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS blank or give a non-existent label, e.g. VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS: "fake=fake", as you’ll get this error:

Error: workload cluster configuration validation failed: unable find VsphereDeploymentZone by the matchlabels.

However, there are ways around this, which I’ll cover below.

Below is my manifest for the VSphereDeploymentZones; note the labels for region and az.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
 name: az-1
 labels:
   region: cluster
   az: az-1
spec:
 server: vcenter.vmwire.com
 failureDomain: az-1
 placementConstraint:
   resourcePool: tkg-vsphere-workload
   folder: tkg-vsphere-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
 name: az-2
 labels:
   region: cluster
   az: az-2
spec:
 server: vcenter.vmwire.com
 failureDomain: az-2
 placementConstraint:
   resourcePool: tkg-vsphere-workload
   folder: tkg-vsphere-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
 name: az-3
 labels:
   region: cluster
   az: az-3
spec:
 server: vcenter.vmwire.com
 failureDomain: az-3
 placementConstraint:
   resourcePool: tkg-vsphere-workload
   folder: tkg-vsphere-workload

Deploy a TKG cluster with multi AZs

Let’s say you have an environment with three AZs, and you want both the control plane nodes and the worker nodes to be distributed across the AZs.

The cluster config file would need to have the following variables.

VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS: "region=cluster"
VSPHERE_REGION: k8s-region
VSPHERE_ZONE: k8s-zone
VSPHERE_AZ_0: az-1
VSPHERE_AZ_1: az-2
VSPHERE_AZ_2: az-3
USE_TOPOLOGY_CATEGORIES: true

tanzu cluster create tkg-workload1 -f tkg-cluster.yaml --dry-run > tkg-workload1-spec.yaml

tanzu cluster create -f tkg-workload1-spec.yaml

Deploy a TKG cluster with multi AZs but not for control plane nodes

tanzu cluster create tkg-workload2 -f tkg-cluster.yaml --dry-run > tkg-workload2-spec.yaml

Edit the tkg-workload2-spec.yaml file and remove the following lines so that the control plane nodes are not deployed into an AZ:

    - name: controlPlaneZoneMatchingLabels
      value:
        region: cluster

tanzu cluster create -f tkg-workload2-spec.yaml

Deploy a TKG cluster with multi AZs and force control plane nodes into one AZ

The cluster config file would need to have the following variables.

VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS: "az=az-1"
VSPHERE_REGION: k8s-region
VSPHERE_ZONE: k8s-zone
VSPHERE_AZ_0: az-1
VSPHERE_AZ_1: az-2
VSPHERE_AZ_2: az-3
USE_TOPOLOGY_CATEGORIES: true

tanzu cluster create tkg-workload3 -f tkg-cluster.yaml --dry-run > tkg-workload3-spec.yaml

tanzu cluster create -f tkg-workload3-spec.yaml

Deploy a TKG cluster into one AZ

The cluster config file would need to have the following variables.

VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS: "az=az-1"
VSPHERE_REGION: k8s-region
VSPHERE_ZONE: k8s-zone
VSPHERE_AZ_0: az-1
USE_TOPOLOGY_CATEGORIES: true

tanzu cluster create tkg-workload4 -f tkg-cluster.yaml --dry-run > tkg-workload4-spec.yaml

tanzu cluster create -f tkg-workload4-spec.yaml

Deploy TKG cluster with only one control plane node

You can also deploy all of the options above, but with just one control plane node. This minimises resources if you’re resource constrained.

To do this your cluster config file would have the following variables.

CLUSTER_PLAN: dev
VSPHERE_AZ_CONTROL_PLANE_MATCHING_LABELS: "region=cluster"
VSPHERE_REGION: k8s-region
VSPHERE_ZONE: k8s-zone
VSPHERE_AZ_0: az-1
VSPHERE_AZ_1: az-2
VSPHERE_AZ_2: az-3
USE_TOPOLOGY_CATEGORIES: true

tanzu cluster create tkg-workload5 -f tkg-cluster.yaml --dry-run > tkg-workload5-spec.yaml

Edit the tkg-workload5-spec.yaml file and remove the following lines so that the control plane nodes are not deployed into an AZ:

    - name: controlPlaneZoneMatchingLabels
      value:
        region: cluster

Also, since the CLUSTER_PLAN is set to dev, you’ll see that the machineDeployments will show az-1 having three replicas. To change the machineDeployments to deploy one replica in each AZ, change the file to the following:

    workers:
      machineDeployments:
      - class: tkg-worker
        failureDomain: az-1
        metadata:
          annotations:
            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=photon
        name: md-0
        replicas: 1
        strategy:
          type: RollingUpdate
      - class: tkg-worker
        failureDomain: az-2
        metadata:
          annotations:
            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=photon
        name: md-1
        replicas: 1
        strategy:
          type: RollingUpdate
      - class: tkg-worker
        failureDomain: az-3
        metadata:
          annotations:
            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=photon
        name: md-2
        replicas: 1
        strategy:
          type: RollingUpdate

tanzu cluster create -f tkg-workload5-spec.yaml

How to find which AZs the nodes are deployed into

kubectl get machines -o json | jq -r '[.items[] | {name:.metadata.name, failureDomain:.spec.failureDomain}]'

[
  {
    "name": "tkg-workload2-md-0-xkdm2-6f58d5f5bbxpkfcz-ffvmn",
    "failureDomain": "az-1"
  },
  {
    "name": "tkg-workload2-md-1-w9dk7-cf5c7cbd7xs9gwz-2mjj4",
    "failureDomain": "az-2"
  },
  {
    "name": "tkg-workload2-md-2-w9dk7-cf5c7cbd7xs9gwz-4j9ds",
    "failureDomain": "az-3"
  },
  {
    "name": "tkg-workload2-vnpbp-5rt4b",
    "failureDomain": null
  },
  {
    "name": "tkg-workload2-vnpbp-8rtqd",
    "failureDomain": null
  },
  {
    "name": "tkg-workload2-vnpbp-dq68j",
    "failureDomain": null
  }
]

Avi DNS Provider for Kubernetes

Avi DNS can host the names and IP addresses of the virtual services configured in Avi Vantage. Avi Vantage serves as DNS provider for the hosted virtual services.

Avi DNS runs a virtual service with System-DNS application profile type and a network profile using per-packet load balancing.

When an Avi Ingress is created in Kubernetes, Avi will automatically create the DNS record for the ingress service.

For example, an ingress created for nginx.tkg-workload1.vmwire.com will automatically be resolvable and routed to the nginx pod by the Avi DNS provider, as in the example manifest below.

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-ingress
  annotations:
    ako.vmware.com/enable-tls: "true"
  labels:
    app: nginx
spec:
  ingressClassName: aviingressclass-tkg-workload-vip
  rules:
    - host: "nginx.tkg-workload1.vmwire.com"
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: nginx-service
                port:
                  number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: default
  labels:
spec:
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: ClusterIP

Step 1 – Create a virtual service for DNS

Click on Applications | Virtual Services | Create Virtual Service | Advanced Setup

Select the Cloud to create the DNS virtual service in.

Under Application Profile, select System DNS.

Under VS VIP, click on Create VS VIP.

Press the ADD button under VIPs.

Give the service a name, select a VIP Address Allocation Network, IPv4 Subnet and Placement Network. Don’t set anything for DNS or RBAC.

Then press Save a few times to complete the wizard.

Go to the Advanced tab and choose a Service Engine Group for the DNS service to use.

Press Save to complete the virtual service setup.

Step 2 – Enable DNS Service for Avi

Navigate to the Administration tab and select the DNS Virtual Service in the drop-down menu.

Step 3 – Edit the default DNS Profile

Navigate to the Templates tab and edit the default DNS profile, the type is Avi Vantage DNS.

Under DNS Service Domains, add the domain that will be delegated to the Avi DNS Service. Then press Save.

Step 4 – Assign the DNS Profile to the Cloud

Navigate to the Infrastructure tab and edit the cloud that you want to enable for Avi DNS.

Click on the IPAM/DNS button at the top and it should take you to that section.

Make sure that the DNS profile is selected under DNS Profile.

Step 5 – Add the Avi DNS Service as a delegated domain in DNS

Find out the IP address of the Avi DNS virtual service, mine is 172.16.4.67.

You can identify it by going to Applications | Virtual Services.

I use Microsoft DNS, so I use DNS Manager for the DNS delegation. I want to use *.tkg-workload1.vmwire.com with Avi Ingress, so to delegate the tkg-workload1 subdomain with Microsoft DNS we create a new Delegation.

Enter the IP address for the FQDN.

That’s it!

You’re now ready for Avi to manage DNS records for the sub domain delegation.
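
You can quickly verify the delegation by resolving one of the hostnames, both via your normal DNS and directly against the Avi DNS virtual service (my example hostname and Avi DNS IP are shown, substitute your own):

nslookup nginx.tkg-workload1.vmwire.com

nslookup nginx.tkg-workload1.vmwire.com 172.16.4.67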

Using Contour to expose Grafana and Prometheus with TLS

The Tanzu Packages in Tanzu Kubernetes Grid (TKG) include Contour, Grafana and Prometheus. The Grafana and Prometheus packages can automatically expose their UIs over TLS if ingress is enabled. This post shows how to update the prometheus-data-values.yaml and grafana-data-values.yaml files to use TLS certificates with ingress using Contour.

This post can be used for TKG on vSphere and CSE with VCD. The examples below use TKG with CSE 4.0.3.

Install Contour

List available contour packages

tanzu package available list contour.tanzu.vmware.com -A

We shall install the latest version available for TKG 1.6.1 used by CSE 4.0.3, which is 1.20.2+vmware.2-tkg.1. First we need a contour-data-values.yaml file to use to install Contour.

If you want to use a static IP address for the Envoy load balancer service, for example to re-use the external public IP address currently used by the Kube API, you can add a line under the envoy.service section (line 12, type: LoadBalancer, in the file below):

loadBalancerIP: <external-ip>

---
infrastructure_provider: vsphere
namespace: tanzu-system-ingress
contour:
 configFileContents: {}
 useProxyProtocol: false
 replicas: 2
 pspNames: "vmware-system-restricted"
 logLevel: info
envoy:
 service:
   type: LoadBalancer
   annotations: {}
   labels: {}
   nodePorts:
     http: null
     https: null
   externalTrafficPolicy: Cluster
   disableWait: false
 hostPorts:
   enable: true
   http: 80
   https: 443
 hostNetwork: false
 terminationGracePeriodSeconds: 300
 logLevel: info
 pspNames: null
certificates:
 duration: 8760h
 renewBefore: 360h

Then install with this command

kubectl create ns my-packages
tanzu package install contour \
--package contour.tanzu.vmware.com \
--version 1.20.2+vmware.2-tkg.1 \
--values-file /home/contour/contour-data-values.yaml \
--namespace my-packages
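
Before moving on, you can confirm that the package reconciled and that Envoy obtained an external IP (the namespaces match the install command above):

tanzu package installed get contour --namespace my-packages

kubectl get svc -n tanzu-system-ingress envoy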

Install Prometheus

tanzu package available list prometheus.tanzu.vmware.com -A

The latest available version for TKG 1.6.1 used by CSE 4.0.3 is 2.36.2+vmware.1-tkg.1.

Update your prometheus-data-values.yaml file with the TLS certificate and private key, enable ingress, and update the virtual_host_fqdn. Use a pipe "|" to include all lines of your certificate.

ingress:
  annotations:
    service.beta.kubernetes.io/vcloud-avi-ssl-no-termination: "true"
  alertmanager_prefix: /alertmanager/
  alertmanagerServicePort: 80
  enabled: true
  prometheus_prefix: /
  prometheusServicePort: 80
  tlsCertificate:
    tls.crt: |
      -----BEGIN CERTIFICATE-----
      MIIEZDCCA0ygAwIBAgISA1UHbwcEhpImsiCGFwSMTVQsMA0GCSqGSIb3DQEBCwUA
      MDIxCzAJBgNVBAYTAlVTMRYwFAYDVQQKEw1MZXQncyBFbmNyeXB0MQswCQYDVQQD
      -- snipped --
      -----END CERTIFICATE-----
    tls.key: |

    ca.crt:
  virtual_host_fqdn: prometheus.tenant1.vmwire.com

Install Prometheus with this command

tanzu package install prometheus \
--package prometheus.tanzu.vmware.com \
--version 2.36.2+vmware.1-tkg.1 \
--values-file prometheus-data-values.yaml \
--namespace my-packages

Install Grafana

List available Grafana packages

tanzu package available list grafana.tanzu.vmware.com -A

The latest available version for TKG 1.6.1 used by CSE 4.0.3 is 7.5.7+vmware.2-tkg.1.

Update your grafana-data-values.yaml file with the TLS certificate and private key, enable ingress, and update the virtual_host_fqdn. Use a pipe "|" to include all lines of your certificate.

ingress:
  annotations:
    service.beta.kubernetes.io/vcloud-avi-ssl-no-termination: "true"
  enabled: true
  prefix: /
  servicePort: 80
  virtual_host_fqdn: grafana.tenant1.vmwire.com
  tlsCertificate:
    tls.crt: |
      -----BEGIN CERTIFICATE-----
      MIIEZDCCA0ygAwIBAgISA1UHbwcEhpImsiCGFwSMTVQsMA0GCSqGSIb3DQEBCwUA
      MDIxCzAJBgNVBAYTAlVTMRYwFAYDVQQKEw1MZXQncyBFbmNyeXB0MQswCQYDVQQD
      --snipped--
      -----END CERTIFICATE-----
    tls.key: |
      -----BEGIN PRIVATE KEY-----
      
      -----END PRIVATE KEY-----

Install Grafana with this command

tanzu package install grafana \
--package grafana.tanzu.vmware.com \
--version 7.5.7+vmware.2-tkg.1 \
--values-file grafana-data-values.yaml \
--namespace my-packages

Update DNS records

Update DNS records for the FQDNs to point to the IP address of the envoy service. You can find the External IP address used by Envoy by typing

k get svc -n tanzu-system-ingress envoy

Single node clusters with TKG

Single-node clusters have been a Tech Preview feature of TKG on vSphere since 2.1. It’s not actually a single-node cluster per se, but a collapsed Kubernetes node with both the control plane and the worker role on one virtual machine, which can be deployed as a single node or as a cluster of more than one such node.

Use cases include edge deployments or hardware constrained environments.

You can deploy a single node or three nodes that have both the control plane and the worker node roles. In fact, to Kubernetes, the node is recognised as a control plane node, but pods are allowed to be scheduled on it since we set spec.topology.variables.controlPlaneTaint to false in the cluster object specification.

A few things to know about single node clusters

  • Supported on TKG 2.1 and newer with the standalone management cluster only, not supported with vSphere with Tanzu (TKG with Supervisor).
  • Single node clusters are supported with Cluster Class based clusters only. Legacy clusters are not supported.
  • Single node clusters behave just like any other TKG clusters so it will support everything you are used to.
  • You can deploy nodes that are both control plane and workers in odd numbers only; this is because Kubernetes still treats these nodes as control plane nodes but allows any pod to be scheduled on them. So scaling the cluster up from one node to 3, 5, 7 etc. is possible with a simple one-line command: tanzu cluster scale <cluster-name> -c #. Here is a cluster with five nodes. As you can see, Kubernetes assigns the control-plane role to the nodes. However, deploying a single-node cluster removes the taint from the nodes; on any other cluster type you’ll see the taint node-role.kubernetes.io/control-plane:NoSchedule, which is removed for single-node clusters.
k get no -o wide
NAME                     STATUS   ROLES           AGE     VERSION            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
tkg-single-ngbmw-dcljq   Ready    control-plane   17m     v1.25.7+vmware.2   172.16.3.84   172.16.3.84   Ubuntu 20.04.6 LTS   5.4.0-144-generic   containerd://1.6.18-1-gdbc99e5b1
tkg-single-ngbmw-mm6tp   Ready    control-plane   9m51s   v1.25.7+vmware.2   172.16.3.85   172.16.3.85   Ubuntu 20.04.6 LTS   5.4.0-144-generic   containerd://1.6.18-1-gdbc99e5b1
tkg-single-ngbmw-mvdv2   Ready    control-plane   14m     v1.25.7+vmware.2   172.16.3.70   172.16.3.70   Ubuntu 20.04.6 LTS   5.4.0-144-generic   containerd://1.6.18-1-gdbc99e5b1
tkg-single-ngbmw-ngqxd   Ready    control-plane   12m     v1.25.7+vmware.2   172.16.3.75   172.16.3.75   Ubuntu 20.04.6 LTS   5.4.0-144-generic   containerd://1.6.18-1-gdbc99e5b1
tkg-single-ngbmw-tqq79   Ready    control-plane   3h1m    v1.25.7+vmware.2   172.16.3.82   172.16.3.82   Ubuntu 20.04.6 LTS   5.4.0-144-generic   containerd://1.6.18-1-gdbc99e5b1
  • You can also scale down
k get no
NAME                     STATUS   ROLES           AGE   VERSION
tkg-single-ngbmw-mm6tp   Ready    control-plane   18m   v1.25.7+vmware.2
  • You can register single-node clusters with TMC. This is possible as TKG sets the metadata for single-node clusters to the workload cluster type. You can see this by looking at the tkg-metadata config map with k get cm -n tkg-system-public tkg-metadata -o yaml (see line 6 below).
apiVersion: v1
data:
  metadata.yaml: |
    cluster:
        name: tkg-single
        type: workload
        plan: dev
        kubernetesProvider: VMware Tanzu Kubernetes Grid
        tkgVersion: v2.2.0
        edition: tkg
        infrastructure:
            provider: vsphere
        isClusterClassBased: true
    bom:
        configmapRef:
            name: tkg-bom
kind: ConfigMap
metadata:
  creationTimestamp: "2023-05-29T14:47:14Z"
  name: tkg-metadata
  namespace: tkg-system-public
  resourceVersion: "250"
  uid: 944a120b-595c-4367-a570-db295af54d11

To deploy a single-node cluster, you can refer to the documentation here.

  • In summary, switch to the TKG management cluster context and type this command to enable single-node clusters: tanzu config set features.cluster.single-node-clusters true
  • Create a cluster config file as normal and save it as a yaml file, for example tkg-single.yaml.
#! ---------------------------------------------------------------------
#! Basic cluster creation configuration
#! ---------------------------------------------------------------------

# CLUSTER_NAME:
ALLOW_LEGACY_CLUSTER: false
INFRASTRUCTURE_PROVIDER: vsphere
CLUSTER_PLAN: dev
NAMESPACE: default
# CLUSTER_API_SERVER_PORT: # For deployments without NSX Advanced Load Balancer
CNI: antrea
ENABLE_DEFAULT_STORAGE_CLASS: false

#! ---------------------------------------------------------------------
#! Node configuration
#! ---------------------------------------------------------------------

# SIZE:
#CONTROLPLANE_SIZE: small
#WORKER_SIZE: small

# VSPHERE_NUM_CPUS: 2
# VSPHERE_DISK_GIB: 40
# VSPHERE_MEM_MIB: 4096

VSPHERE_CONTROL_PLANE_NUM_CPUS: 4
VSPHERE_CONTROL_PLANE_DISK_GIB: 40
VSPHERE_CONTROL_PLANE_MEM_MIB: 8192
# VSPHERE_WORKER_NUM_CPUS: 2
# VSPHERE_WORKER_DISK_GIB: 40
# VSPHERE_WORKER_MEM_MIB: 4096

# CONTROL_PLANE_MACHINE_COUNT:
# WORKER_MACHINE_COUNT:
# WORKER_MACHINE_COUNT_0:
# WORKER_MACHINE_COUNT_1:
# WORKER_MACHINE_COUNT_2:

#! ---------------------------------------------------------------------
#! vSphere configuration
#! ---------------------------------------------------------------------

#VSPHERE_CLONE_MODE: "fullClone"
VSPHERE_NETWORK: tkg-workload
# VSPHERE_TEMPLATE:
# VSPHERE_TEMPLATE_MOID:
# IS_WINDOWS_WORKLOAD_CLUSTER: false
# VIP_NETWORK_INTERFACE: "eth0"
VSPHERE_SSH_AUTHORIZED_KEY: <-- snipped -->
VSPHERE_USERNAME: administrator@vsphere.local
VSPHERE_PASSWORD: 
# VSPHERE_REGION:
# VSPHERE_ZONE:
# VSPHERE_AZ_0:
# VSPHERE_AZ_1:
# VSPHERE_AZ_2:
# USE_TOPOLOGY_CATEGORIES: false
VSPHERE_SERVER: vcenter.vmwire.com
VSPHERE_DATACENTER: home.local
VSPHERE_RESOURCE_POOL: tkg-vsphere-workload
VSPHERE_DATASTORE: lun01
VSPHERE_FOLDER: tkg-vsphere-workload
# VSPHERE_STORAGE_POLICY_ID
# VSPHERE_WORKER_PCI_DEVICES:
# VSPHERE_CONTROL_PLANE_PCI_DEVICES:
# VSPHERE_IGNORE_PCI_DEVICES_ALLOW_LIST:
VSPHERE_CONTROL_PLANE_CUSTOM_VMX_KEYS: 'ethernet0.ctxPerDev=3,ethernet0.pnicFeatures=4,sched.cpu.shares=high'
# VSPHERE_WORKER_CUSTOM_VMX_KEYS: 'ethernet0.ctxPerDev=3,ethernet0.pnicFeatures=4,sched.cpu.shares=high'
# WORKER_ROLLOUT_STRATEGY: "RollingUpdate"
# VSPHERE_CONTROL_PLANE_HARDWARE_VERSION:
# VSPHERE_WORKER_HARDWARE_VERSION:
VSPHERE_TLS_THUMBPRINT: <-- snipped -->
VSPHERE_INSECURE: false
# VSPHERE_CONTROL_PLANE_ENDPOINT: # Required for Kube-Vip
# VSPHERE_CONTROL_PLANE_ENDPOINT_PORT: 6443
# VSPHERE_ADDITIONAL_FQDN:
AVI_CONTROL_PLANE_HA_PROVIDER: true


#! ---------------------------------------------------------------------
#! Common configuration
#! ---------------------------------------------------------------------

ADDITIONAL_IMAGE_REGISTRY_1: "harbor.vmwire.com"
ADDITIONAL_IMAGE_REGISTRY_1_SKIP_TLS_VERIFY: false
ADDITIONAL_IMAGE_REGISTRY_1_CA_CERTIFICATE: <-- snipped -->


# TKG_CUSTOM_IMAGE_REPOSITORY: ""
# TKG_CUSTOM_IMAGE_REPOSITORY_SKIP_TLS_VERIFY: false
# TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE: ""

# TKG_HTTP_PROXY: ""
# TKG_HTTPS_PROXY: ""
# TKG_NO_PROXY: ""
# TKG_PROXY_CA_CERT: ""

ENABLE_AUDIT_LOGGING: false

CLUSTER_CIDR: 100.96.0.0/11
SERVICE_CIDR: 100.64.0.0/13

# OS_NAME: ""
# OS_VERSION: ""
# OS_ARCH: ""

#! ---------------------------------------------------------------------
#! Autoscaler configuration
#! ---------------------------------------------------------------------

ENABLE_AUTOSCALER: false

Then use the --dry-run option and save the cluster object spec file with tanzu cluster create <name-of-new-cluster> -f tkg-single.yaml --dry-run > tkg-single-spec.yaml. This creates a new file called tkg-single-spec.yaml that you need to edit before creating the single-node cluster.

Edit the tkg-single-spec.yaml file and change the following sections.

Under spec.topology.variables, add the following:

- name: controlPlaneTaint
  value: false

Under spec.topology.workers, delete the entire block, including the workers section heading.

Your changed file should look like the example below.

apiVersion: csi.tanzu.vmware.com/v1alpha1
kind: VSphereCSIConfig
metadata:
  name: tkg-single
  namespace: default
spec:
  vsphereCSI:
    config:
      datacenter: /home.local
      httpProxy: ""
      httpsProxy: ""
      noProxy: ""
      region: null
      tlsThumbprint: <-- snipped -->
      useTopologyCategories: false
      zone: null
    mode: vsphereCSI
---
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: ClusterBootstrap
metadata:
  annotations:
    tkg.tanzu.vmware.com/add-missing-fields-from-tkr: v1.25.7---vmware.2-tkg.1
  name: tkg-single
  namespace: default
spec:
  additionalPackages:
  - refName: metrics-server*
  - refName: secretgen-controller*
  - refName: pinniped*
  - refName: tkg-storageclass*
    valuesFrom:
      inline:
        infraProvider: ""
  csi:
    refName: vsphere-csi*
    valuesFrom:
      providerRef:
        apiGroup: csi.tanzu.vmware.com
        kind: VSphereCSIConfig
        name: tkg-single
  kapp:
    refName: kapp-controller*
---
apiVersion: v1
kind: Secret
metadata:
  name: tkg-single
  namespace: default
stringData:
  password: 
  username: administrator@vsphere.local
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  annotations:
    osInfo: ubuntu,20.04,amd64
    tkg/plan: dev
  labels:
    tkg.tanzu.vmware.com/cluster-name: tkg-single
  name: tkg-single
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 100.96.0.0/11
    services:
      cidrBlocks:
      - 100.64.0.0/13
  topology:
    class: tkg-vsphere-default-v1.0.0
    controlPlane:
      metadata:
        annotations:
          run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=ubuntu
      replicas: 1
    variables:
    - name: controlPlaneTaint
      value: false
    - name: cni
      value: antrea
    - name: controlPlaneCertificateRotation
      value:
        activate: true
        daysBefore: 90
    - name: additionalImageRegistries
      value:
      - caCert: <-- snipped -->
        host: harbor.vmwire.com
        skipTlsVerify: false
    - name: auditLogging
      value:
        enabled: false
    - name: podSecurityStandard
      value:
        audit: baseline
        deactivated: false
        warn: baseline
    - name: aviAPIServerHAProvider
      value: true
    - name: vcenter
      value:
        cloneMode: fullClone
        datacenter: /home.local
        datastore: /home.local/datastore/lun01
        folder: /home.local/vm/tkg-vsphere-workload
        network: /home.local/network/tkg-workload
        resourcePool: /home.local/host/cluster/Resources/tkg-vsphere-workload
        server: vcenter.vmwire.com
        storagePolicyID: ""
        template: /home.local/vm/Templates/ubuntu-2004-efi-kube-v1.25.7+vmware.2
        tlsThumbprint: <-- snipped -->
    - name: user
      value:
        sshAuthorizedKeys:
        - <-- snipped -->
    - name: controlPlane
      value:
        machine:
          customVMXKeys:
            ethernet0.ctxPerDev: "3"
            ethernet0.pnicFeatures: "4"
            sched.cpu.shares: high
          diskGiB: 40
          memoryMiB: 8192
          numCPUs: 4
    - name: worker
      value:
        count: 1
        machine:
          diskGiB: 40
          memoryMiB: 4096
          numCPUs: 2
    version: v1.25.7+vmware.2-tkg.1
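
Once the spec file has been edited, create the single-node cluster from it in the same way as the other examples in this post:

tanzu cluster create -f tkg-single-spec.yaml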

AviInfraSetting with IngressClass

Avi Infra Setting provides a way to segregate Layer-4/Layer-7 virtual services to have properties based on different underlying infrastructure components, like Service Engine Group, intended VIP Network etc.

Here I have a different network that I want a new Ingress to use, in this case the tkg-wkld-trf-vip network, 172.16.4.97/27. Let’s assume it’s used for 5G traffic connectivity and its NSX-T T1 is connected to a different T0 VRF. This isolates the traffic between VRFs, so that we can expose certain applications on different VRFs.

In this example, I’ll change Grafana from using the default VIP network to the tkg-wkld-trf-vip network instead. You can read up on how this was originally done using the default VIP network in the previous post.

aviinfrasetting-tkg-wkld-trf-vip.yaml

---
apiVersion: ako.vmware.com/v1alpha1
kind: AviInfraSetting
metadata:
  name: aviinfrasetting-tkg-wkld-trf-vip
spec:
  seGroup:
    name: tkg-workload1
  network:
    vipNetworks:
      - networkName: tkg-wkld-trf-vip
        cidr: 172.16.4.96/27
    enableRhi: false

Attaching Avi Infra Setting to Ingress

Avi Infra Settings can be applied to Ingress resources, using the IngressClass construct. IngressClass provides a way to configure Controller-specific load balancing parameters and applies these configurations to a set of Ingress objects. AKO supports listening to IngressClass resources in Kubernetes version 1.19+. The Avi Infra Setting reference can be provided in the Ingress Class as shown below:

aviingressclass-tkg-wkld-trf-vip.yaml

---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: aviingressclass-tkg-wkld-trf-vip
spec:
  controller: ako.vmware.com/avi-lb
  parameters:
    apiGroup: ako.vmware.com
    kind: AviInfraSetting
    name: aviinfrasetting-tkg-wkld-trf-vip

dashboard-ingress.yaml

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dashboard-ingress
  namespace: tanzu-system-dashboards
  annotations:
    ako.vmware.com/enable-tls: "true"
  labels:
    app: dashboard-ingress
spec:
  ingressClassName: aviingressclass-tkg-wkld-trf-vip
  rules:
    - host: "grafana.tkg-workload1.vmwire.com"
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: grafana
                port:
                  number: 80
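
Apply the three manifests above; since the dashboard-ingress already exists from the previous post, applying it again updates it to the new IngressClass (the file names are the ones used above):

k apply -f aviinfrasetting-tkg-wkld-trf-vip.yaml

k apply -f aviingressclass-tkg-wkld-trf-vip.yaml

k apply -f dashboard-ingress.yaml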

Below you can see that Grafana is now using the new AviInfraSetting and has been assigned an IP address of 172.16.4.98.

Introduction to Avi Ingress and Replacing Contour for Prometheus and Grafana

Avi Ingress is an alternative to Contour and NGINX ingress controllers.

Tanzu Kubernetes Grid ships with Contour as the default ingress controller that Tanzu Packages uses to expose Prometheus and Grafana. Prometheus and Grafana are configured to use Contour if you set ingress: true in their values.yaml files.

This post details how to set Avi Ingress up and use it to expose these applications using signed TLS certificates.

Let’s start

Install AKO with Helm as normal, using ClusterIP as the service type in the AKO values.yaml config file (a sketch of the install is shown after the reference link below).

Reference link to documentation:

https://avinetworks.com/docs/ako/1.9/networking-v1-ingress/
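
A sketch of a typical AKO Helm install is below. The chart repository URL, version, and value names are taken from the AKO documentation linked above and should be checked against your AKO release; the key point for this post is that the service type is ClusterIP in values.yaml.

helm repo add ako https://projects.registry.vmware.com/chartrepo/ako

kubectl create ns avi-system

helm install ako/ako --generate-name --version 1.9.3 \
  --namespace avi-system \
  -f values.yaml \
  --set ControllerSettings.controllerHost=<avi-controller-ip> \
  --set avicredentials.username=<avi-username> \
  --set avicredentials.password=<avi-password>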

Create a secret for the ingress certificate. Using a wildcard certificate will enable Avi to automatically secure all applications with the TLS certificate.

The tls.key and tls.crt values are in base64 encoded format.

router-certs-default.yaml

apiVersion: v1
kind: Secret
metadata:
  name: router-certs-default
  namespace: avi-system
type: kubernetes.io/tls
data:
  tls.key: --snipped--
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUVjVENDQTFtZ0F3SUJBZ0lTQTI0MDJNMStJN01kaTIwRWZlK2hlQitQTUEwR0NTcUdTSWIzRFFFQkN3VUEKTURJeEN6QUpCZ05WQkFZVEFsVlRNUll3RkFZRFZRUUtFdzFNWlhRbmN5QkZibU55ZVhCME1Rc3dDUVlEVlFRRApFd0pTTXpBZUZ3MHlNekF6TWpReE1qSTBNakphRncweU16QTJNakl4TWpJME1qRmFNQ1V4SXpBaEJnTlZCQU1NCkdpb3VkR3RuTFhkdmNtdHNiMkZrTVM1MmJYZHBjbVV1WTI5dE1Ga3dFd1lIS29aSXpqMENBUVlJS29aSXpqMEQKQVFjRFFnQUVmcEs2MUQ5bFkyQUZzdkdwZkhwRlNEYVl1alF0Nk05Z21yYUhrMG5ySHJhTUkrSEs2QXhtMWJyRwpWMHNrd2xDWEtrWlNCbzRUZmFlTDF6bjI1N0M1QktPQ0FsY3dnZ0pUTUE0R0ExVWREd0VCL3dRRUF3SUhnREFkCkJnTlZIU1VFRmpBVUJnZ3JCZ0VGQlFjREFRWUlLd1lCQlFVSEF3SXdEQVlEVlIwVEFRSC9CQUl3QURBZEJnTlYKSFE0RUZnUVVxVjMydlU4Yzl5RFRpY3NVQmJCMFE0MFNsZFl3SHdZRFZSMGpCQmd3Rm9BVUZDNnpGN2RZVnN1dQpVQWxBNWgrdm5Zc1V3c1l3VlFZSUt3WUJCUVVIQVFFRVNUQkhNQ0VHQ0NzR0FRVUZCekFCaGhWb2RIUndPaTh2CmNqTXVieTVzWlc1amNpNXZjbWN3SWdZSUt3WUJCUVVITUFLR0ZtaDBkSEE2THk5eU15NXBMbXhsYm1OeUxtOXkKWnk4d0pRWURWUjBSQkI0d0hJSWFLaTUwYTJjdGQyOXlhMnh2WVdReExuWnRkMmx5WlM1amIyMHdUQVlEVlIwZwpCRVV3UXpBSUJnWm5nUXdCQWdFd053WUxLd1lCQkFHQzN4TUJBUUV3S0RBbUJnZ3JCZ0VGQlFjQ0FSWWFhSFIwCmNEb3ZMMk53Y3k1c1pYUnpaVzVqY25sd2RDNXZjbWN3Z2dFR0Jnb3JCZ0VFQWRaNUFnUUNCSUgzQklIMEFQSUEKZHdCNk1veFUyTGN0dGlEcU9PQlNIdW1FRm5BeUU0Vk5POUlyd1RwWG8xTHJVZ0FBQVljVHlxNTJBQUFFQXdCSQpNRVlDSVFEekZNSklaT3NKMG9GQTV2UVVmNUpZQUlaa3dBMnkxNE92K3ljcTU0ZDZmZ0loQUxOcmNnM0lrNllsCkxlMW1ROHFVZmttNWsxRTZBSDU4OFJhYWZkZlhONTJCQUhjQTZEN1EyajcxQmpVeTUxY292SWxyeVFQVHk5RVIKYSt6cmFlRjNmVzBHdlc0QUFBR0hFOHF1VlFBQUJBTUFTREJHQWlFQW9Wc3ZxbzhaR2o0cmszd1hmL0xlSkNCbApNQkg2UFpBb2UyMVVkbko5aThvQ0lRRGoyS1Q1eWlUOGtRdjFyemxXUWgveHV6VlRpUGtkdlBHL3Zxd3J0SWhjCjJEQU5CZ2txaGtpRzl3MEJBUXNGQUFPQ0FRRUFFczlKSTFwZ3R6T2JyRmd0Vnpsc1FuZC8xMi9QYWQ5WXI2WVMKVE5XM3F1bElhaEZ4UDdVcVRIT0xVSGw0cGdpTThxZ2ZlcmhyTHZXbk1wOUlxQ3JVVElTSnFRblh5bnkyOHA2Zwoyc2NqS2xFSWt2RURvcExoek0ydGpCenc4a1dUYUdYUE8yM0dhcHBHWW14OS9Ma2NkUDVSS0xKMmlRTEJXZlhTCmNQRlNmZWsySEc3dEw1N0s0Uit4eDB4MTdsZ2RLeFdOL1JYQ2RvcHFPY3RyTCtPL0lwWVVWZXNiVzNJbkpFZDkKdjZmS1RmVE84K3JVVnlkajVmUGdFUWJva2Q2L3BDTGdIYS81UVpQMjZ1ZytRa1llUEJvUWRrTkpGOTk4a2NHWQpBZGc0THpJZjdYdU9SNDB4eU90aHIyN1p4Y1FXZnhMM2M4bGJuUlJrMXZNL3pMMDhIdz09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0=

k apply -f router-certs-default.yaml

Here is an example online store website deployment using ingress with the certificate. Let’s play with this before we get around to exposing Prometheus and Grafana.

sample-ingress.yaml

---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: http-ingress-deployment
  labels:
    app: http-ingress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: http-ingress
  template:
    metadata:
      labels:
        app: http-ingress
    spec:
      containers:
        - name: http-ingress
          image: ianwijaya/hackazon
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
      imagePullSecrets:
      - name: regcred
---
kind: Service
apiVersion: v1
metadata:
  name: ingress-svc
  labels:
    svc: ingress-svc
spec:
  ports:
    - name: http
      port: 80
      targetPort: 80
  selector:
    app: http-ingress
  type: ClusterIP

avisvcingress.yaml

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: avisvcingress
  annotations:
    ako.vmware.com/enable-tls: "true"
  labels:
    app: avisvcingress
spec:
  ingressClassName: avi-lb
  rules:
    - host: "hackazon.tkg-workload1.vmwire.com"
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: ingress-svc
                port:
                  number: 80

Note that the Service uses ClusterIP and the Ingress is annotated with ako.vmware.com/enable-tls: "true" to use the default TLS certificate specified in router-certs-default.yaml. Also add the ingressClassName to the Ingress manifest.

k apply -f sample-ingress.yaml

k apply -f avisvcingress.yaml

k get ingress avisvcingress

NAME            CLASS    HOSTS                               ADDRESS       PORTS   AGE
avisvcingress   avi-lb   hackazon.tkg-workload1.vmwire.com   172.16.4.69   80      13m

Let’s add another host

Append another host to the avisvcingress.yaml file.

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: avisvcingress
  annotations:
    ako.vmware.com/enable-tls: "true"
  labels:
    app: avisvcingress
spec:
  ingressClassName: avi-lb
  rules:
    - host: "hackazon.tkg-workload1.vmwire.com"
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: ingress-svc
                port:
                  number: 80
    - host: "nginx.tkg-workload1.vmwire.com"
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: nginx-service
                port:
                  number: 80

k replace -f avisvcingress.yaml

And use the trusty statefulset file, statefulset-topology-aware.yaml, to create an nginx web page.

---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: default
  labels:
spec:
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  serviceName: nginx-service
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                - az-1
                - az-2
                - az-3
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: failure-domain.beta.kubernetes.io/zone
      terminationGracePeriodSeconds: 10
      initContainers:
      - name: install
        image: busybox
        command:
        - wget
        - "-O"
        - "/www/index.html"
        - https://raw.githubusercontent.com/hugopow/cse/main/index.html
        volumeMounts:
        - name: www
          mountPath: "/www"
      containers:
        - name: nginx
          image: k8s.gcr.io/nginx-slim:0.8
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
            - name: logs
              mountPath: /logs
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: tanzu-local-ssd
        resources:
          requests:
            storage: 2Gi
    - metadata:
        name: logs
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: tanzu-local-ssd
        resources:
          requests:
            storage: 1Gi

k apply -f statefulset-topology-aware.yaml

k get ingress avisvcingress

NAME            CLASS    HOSTS                                                             ADDRESS       PORTS   AGE
avisvcingress   avi-lb   hackazon.tkg-workload1.vmwire.com,nginx.tkg-workload1.vmwire.com   172.16.4.69   80      7m33s

Notice that another host is added to the same ingress, and both hosts share the same VIP.

Let’s add Prometheus to this!

Create a new manifest for Prometheus to use called monitoring-ingress.yaml.

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: monitoring-ingress
  namespace: tanzu-system-monitoring
  annotations:
    ako.vmware.com/enable-tls: "true"
  labels:
    app: monitoring-ingress
spec:
  ingressClassName: avi-lb
  rules:
    - host: "prometheus.tkg-workload1.vmwire.com"
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: prometheus-server
                port:
                  number: 80

Note that since Prometheus when deployed by Tanzu Packages is deployed into the namespace tanzu-system-monitoring, we also need to create the new ingress in the same namespace.

Deploy Prometheus following the documentation here, but do not enable ingress in the prometheus-data-values.yaml file, as that uses Contour. We don’t want that since we are using Avi Ingress instead.

Add Grafana too!

Create a new manifest for Grafana to use called dashboard-ingress.yaml.

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dashboard-ingress
  namespace: tanzu-system-dashboards
  annotations:
    ako.vmware.com/enable-tls: "true"
  labels:
    app: dashboard-ingress
spec:
  ingressClassName: avi-lb
  rules:
    - host: "grafana.tkg-workload1.vmwire.com"
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: grafana
                port:
                  number: 80

Note that Grafana, when deployed by Tanzu Packages, is installed into the tanzu-system-dashboards namespace, so the new ingress also needs to be created in that namespace.

Deploy Grafana following the documentation here, but do not enable ingress in the grafana-data-values.yaml file, as that uses Contour. We don’t want that because we are using Avi Ingress instead.
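
Apply and verify the Grafana ingress in the same way:

k apply -f dashboard-ingress.yaml

k get ingress -n tanzu-system-dashboards dashboard-ingress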

Summary

Ingress with Avi is really nice. I like it! A single Secret stores the TLS certificates and all hosts are automatically configured to use TLS. You also only need to expose TCP 80 as ClusterIP Services; Avi does the rest for you and exposes the application over TCP 443 using the TLS cert.

Here you can see all four of our applications – hackazon, nginx running across three AZs, Grafana and Prometheus – using Ingress and sharing a single IP address.

Very cool indeed!

k get ingress -A

NAMESPACE                 NAME                 CLASS    HOSTS                                                              ADDRESS       PORTS   AGE
default                   avisvcingress        avi-lb   hackazon.tkg-workload1.vmwire.com,nginx.tkg-workload1.vmwire.com   172.16.4.69   80      58m
tanzu-system-dashboards   dashboard-ingress    avi-lb   grafana.tkg-workload1.vmwire.com                                   172.16.4.69   80      3m47s
tanzu-system-monitoring   monitoring-ingress   avi-lb   prometheus.tkg-workload1.vmwire.com                                172.16.4.69   80      14m

CSE TKG Clusters can’t pull from GitHub

During TKG cluster creation you might see the following errors.

Error: failed to get
provider components for the "cluster-api:v1.1.3" provider: failed to get
repository client for the CoreProvider with name cluster-api: error creating
the GitHub repository client: failed to get GitHub latest version: failed to
get repository versions: failed to get repository versions: rate limit for
github api has been reached. Please wait one hour or get a personal API
token and assign it to the GITHUB_TOKEN environment variable

This is due to GitHub rate limiting for anonymous access. CSE TKG clusters pull images from GitHub, and if you are pulling too many within a short period of time, you will eventually hit the rate limits.

To ensure that you don’t hit the limits, a GitHub personal access token is needed.
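
The error message above points at the fix: create a personal access token in GitHub and export it on the machine running the cluster deployment, something like this (the token value is a placeholder):

export GITHUB_TOKEN=<your-github-personal-access-token>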

Then configure CSE to use the GitHub Access Token using the CSE documentation here.

Scaling TKG Management Cluster Nodes Vertically

In a previous post I wrote about how to scale workload cluster control plane and worker nodes vertically. This post explains how to do the same for the TKG Management Cluster nodes.

Scaling vertically means increasing or decreasing the CPU, memory, disk or other settings, such as the network, for the nodes. Using the Cluster API it is possible to make these changes on the fly; Kubernetes will use rolling updates to apply the necessary changes.

First, change to the TKG Management Cluster context before making any changes.
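
For example, something like this (assuming your management cluster is named tkg-mgmt, as it is later in this post):

kubectl config use-context tkg-mgmt-admin@tkg-mgmt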

Scaling Worker Nodes

Run the following to list all the vSphereMachineTemplates.

k get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -A
NAMESPACE    NAME                         AGE
tkg-system   tkg-mgmt-control-plane       20h
tkg-system   tkg-mgmt-worker              20h

These templates are immutable, so we will need to export the YAML of an existing one and edit it to create a new VSphereMachineTemplate.

k get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -n tkg-system   tkg-mgmt-worker -o yaml > tkg-mgmt-worker-new.yaml

Now edit the new file named tkg-mgmt-worker-new.yaml.

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infrastructure.cluster.x-k8s.io/v1beta1","kind":"VSphereMachineTemplate","metadata":{"annotations":{"vmTemplateMoid":"vm-9726"},"name":"tkg-mgmt-worker","namespace":"tkg-system"},"spec":{"template":{"spec":{"cloneMode":"fullClone","datacenter":"/home.local","datastore":"/home.local/datastore/lun01","diskGiB":40,"folder":"/home.local/vm/tkg-vsphere-tkg-mgmt","memoryMiB":8192,"network":{"devices":[{"dhcp4":true,"networkName":"/home.local/network/tkg-mgmt"}]},"numCPUs":2,"resourcePool":"/home.local/host/Management/Resources/tkg-vsphere-tkg-Mgmt","server":"vcenter.vmwire.com","storagePolicyName":"","template":"/home.local/vm/Templates/photon-3-kube-v1.22.9+vmware.1"}}}}
    vmTemplateMoid: vm-9726
  creationTimestamp: "2022-12-23T15:23:56Z"
  generation: 1
  name: tkg-mgmt-worker
  namespace: tkg-system
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: tkg-mgmt
    uid: 9acf6370-64be-40ce-9076-050ab8c6f41f
  resourceVersion: "3069"
  uid: 4a8f305f-0b61-4d33-ba02-7fb3fcc8ba22
spec:
  template:
    spec:
      cloneMode: fullClone
      datacenter: /home.local
      datastore: /home.local/datastore/lun01
      diskGiB: 40
      folder: /home.local/vm/tkg-vsphere-tkg-mgmt
      memoryMiB: 8192
      network:
        devices:
        - dhcp4: true
          networkName: /home.local/network/tkg-mgmt
      numCPUs: 2
      resourcePool: /home.local/host/Management/Resources/tkg-vsphere-tkg-Mgmt
      server: vcenter.vmwire.com
      storagePolicyName: ""
      template: /home.local/vm/Templates/photon-3-kube-v1.22.9+vmware.1

Change the name on line 10, for example to tkg-mgmt-worker-new. Make any other changes you need, such as CPU on line 32 or RAM on line 27. You may also want to remove the server-generated metadata (creationTimestamp, resourceVersion, uid and ownerReferences) since you are creating a new object. Save the file.
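
If you prefer not to edit the file by hand, here is a rough yq sketch of the same edits. The new name matches what we use further down; the 4 vCPU / 16 GB values are just example sizes, and stripping the server-generated metadata is optional but keeps the new object clean:

yq -i '
  del(.metadata.creationTimestamp) |
  del(.metadata.resourceVersion) |
  del(.metadata.uid) |
  del(.metadata.ownerReferences) |
  .metadata.name = "tkg-mgmt-worker-new" |
  .spec.template.spec.numCPUs = 4 |
  .spec.template.spec.memoryMiB = 16384
' tkg-mgmt-worker-new.yaml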

Now you’ll need to create the new vSphereMachineTemplate.

k apply -f tkg-mgmt-worker-new.yaml

Now we’re ready to make the change.

Let’s first take a look at the MachineDeployments.

k get machinedeployments.cluster.x-k8s.io -A

NAMESPACE    NAME            CLUSTER    REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
tkg-system   tkg-mgmt-md-0   tkg-mgmt   2          2       2         0             Running   20h   v1.22.9+vmware.1

Now edit this MachineDeployment.

k edit machinedeployments.cluster.x-k8s.io -n tkg-system   tkg-mgmt-md-0

You need to make the change to the spec.template.spec.infrastructureRef section, around line 56.

 53       infrastructureRef:
 54         apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
 55         kind: VSphereMachineTemplate
 56         name: tkg-mgmt-worker

Change line 56 to reference the new VSphereMachineTemplate we created earlier.

 53       infrastructureRef:
 54         apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
 55         kind: VSphereMachineTemplate
 56         name: tkg-mgmt-worker-new

Save and quit. You’ll notice that a new VM immediately starts cloning in vCenter. Wait for it to complete; this new VM is the new worker with the updated CPU and memory sizing, and it will replace the current worker node. After a few minutes the old worker node is deleted, leaving you with a new worker node with the CPU and RAM specified in the new VSphereMachineTemplate.
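
If you want to watch the rollout, the Cluster API machines show the old and new workers side by side while the replacement happens (assuming the management cluster objects live in tkg-system as above):

k get machines.cluster.x-k8s.io -n tkg-system -w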

Scaling Control Plane Nodes

Scaling the control plane nodes is similar.

k get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -n tkg-system tkg-mgmt-control-plane -o yaml > tkg-mgmt-control-plane-new.yaml

Edit the file and perform the same steps as the worker nodes.

You’ll notice that there is no MachineDeployment for the control plane node of a TKG Management Cluster. Instead we have to edit the resource named KubeadmControlPlane.

Run this command

k get kubeadmcontrolplane -A

NAMESPACE    NAME                     CLUSTER    INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
tkg-system   tkg-mgmt-control-plane   tkg-mgmt   true          true                   1          1       1         0             21h   v1.22.9+vmware.1

Now we can edit it

k edit kubeadmcontrolplane -n tkg-system   tkg-mgmt-control-plane

Change the section under spec.machineTemplate.infrastructureRef, around line 106.

102   machineTemplate:
103     infrastructureRef:
104       apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
105       kind: VSphereMachineTemplate
106       name: tkg-mgmt-control-plane
107       namespace: tkg-system

Change line 106 to

102   machineTemplate:
103     infrastructureRef:
104       apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
105       kind: VSphereMachineTemplate
106       name: tkg-mgmt-control-plane-new
107       namespace: tkg-system

Save the file. You’ll notice that another VM will start cloning and eventually you’ll have a new control plane node up and running. This new control plane node will replace the older one. It takes longer than the worker node, so be patient.

Rights for VMware Data Solutions

Creating a new Global Role

You’ll need to create a new global role with the correct rights to be able to deploy data solutions into a TKG cluster.

The easiest way to do this is to clone the role named Kubernetes Cluster Author created by CSE 4.0 and add additional rights for Data Solutions.

Administrator View: VMWARE:CAPVCDCLUSTER
Administrator View: VMWARE:DSCONFIG
Administrator View: VMWARE:DSINSTANCETEMPLATE
Administrator View: VMWARE:DSINSTANCE
Administrator View: VMWARE:DSPROVISIONING
Administrator View: VMWARE:DSCLUSTER

Administrator Full Control: VMWARE:DSINSTANCE

View: VMWARE:DSCONFIG
View: VMWARE:DSPROVISIONING
View: VMWARE:DSINSTANCE
View: VMWARE:DSINSTANCETEMPLATE
View: VMWARE:DSCLUSTER

Full Control: VMWARE:DSPROVISIONING
Full Control: VMWARE:DSCLUSTER
Full Control: VMWARE:DSINSTANCE

Edit VMWARE:DSINSTANCE
Edit VMWARE:DSCLUSTER
Edit VMWARE:DSPROVISIONING

Now publish this new Global Role to a tenant and assign it to a tenant user; that user can then deploy Data Solutions into a TKG cluster.

Cleaning up CSE 4.0 beta

For those partners that have been testing the beta, you’ll need to remove all traces of it before you can install the GA version. VMware does not support upgrading or migrating from beta builds to GA builds.

This is a post to help you clean up your VMware Cloud Director environment in preparation for the GA build of CSE 4.0.

If you don’t clean up, when you try to configure CSE again with the CSE Management wizard, you’ll see the message below:

“Server configuration entity already exists.”

Delete CSE Roles

First delete all the CSE Roles that the beta has setup, the GA version of CSE will recreate these for you when you use the CSE management wizard. Don’t forget to assign the new role to your CSE service account when you deploy the CSE GA OVA.

Use the Postman Collection to clean up

I’ve included a Postman collection on my Github account, available here.

Hopefully, it is self-explanatory. Authenticate against the VCD API, then run each API request in order, making sure you obtain the entity and entityType IDs before you delete anything.

If you’re unable to delete the entity or entityTypes, you may need to delete all of the CSE clusters first. That means cleaning up all PVCs, PVs and deployments, and then the clusters themselves.

Deploy CSE GA Normally

You’ll now be able to use the Configure Management wizard and deploy CSE 4.0 GA as normal.

Known Issues

If you’re unable to delete any of these entities then run a POST using /resolve.

For example, https://vcd.vmwire.com/api-explorer/provider#/definedEntity/resolveDefinedEntity

Once it is resolved, you can go ahead and delete the entity.
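
If you prefer curl over the API Explorer, the calls look roughly like this. This is just a sketch: $VCD_TOKEN is assumed to hold a valid provider bearer token and <entity-id> is the URN of the stuck entity, so double-check the exact paths for your VCD version in the API Explorer:

# Resolve the stuck entity
curl -k -X POST "https://vcd.vmwire.com/cloudapi/1.0.0/entities/<entity-id>/resolve" \
  -H "Authorization: Bearer $VCD_TOKEN"

# Then delete it
curl -k -X DELETE "https://vcd.vmwire.com/cloudapi/1.0.0/entities/<entity-id>" \
  -H "Authorization: Bearer $VCD_TOKEN"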

Using Velero with Restic for Kubernetes Data Protection

Overview

Velero (formerly Heptio Ark) gives you tools to back up and restore your Kubernetes cluster resources and persistent volumes. You can run Velero with a cloud provider or on-premises. Velero lets you:

  • Take backups of your cluster and restore in case of loss.
  • Migrate cluster resources to other clusters.
  • Replicate your production cluster to development and testing clusters.

Velero consists of:

  • A server that runs on your Kubernetes cluster
  • A command-line client that runs locally

Velero works with any Kubernetes cluster, including Tanzu Kubernetes Grid and Kubernetes clusters deployed using Container Service Extension with VMware Cloud Director.

This solution can be used for air-gapped environments where the Kubernetes clusters do not have Internet access and cannot use public services such as Amazon S3, or Tanzu Mission Control Data Protection. These services are SaaS services which are pretty much out of bounds in air-gapped environments.

Install Velero onto your workstation

Download the latest Velero release for your preferred operating system onto the machine where you usually run your kubectl tools.

https://github.com/vmware-tanzu/velero/releases

Extract the contents.

tar zxvf velero-v1.8.1-linux-amd64.tar.gz

You’ll see a folder structure like the following.

ls -l
total 70252
-rw-r----- 1 phanh users    10255 Mar 10 09:45 LICENSE
drwxr-x--- 4 phanh users     4096 Apr 11 08:40 examples
-rw-r----- 1 phanh users    15557 Apr 11 08:52 values.yaml
-rwxr-x--- 1 phanh users 71899684 Mar 15 02:07 velero

Copy the velero binary to the /usr/local/bin location so it is usable from anywhere.

sudo cp velero /usr/local/bin/velero

sudo chmod +x /usr/local/bin/velero

sudo chmod 755 /usr/local/bin/velero

If you want to enable bash auto completion, please follow this guide.
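
To confirm the CLI is installed correctly, you can print the client version without contacting a cluster:

velero version --client-only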

Setup an S3 service and bucket

I’m using TrueNAS’ S3-compatible storage in my lab. TrueNAS provides S3-compliant object storage and is incredibly easy to set up. You can use other S3-compatible object stores such as Amazon S3. A full list of supported providers can be found here.

Follow these instructions to setup S3 on TrueNAS.

  1. Add certificate, go to System, Certificates
  2. Add, Import Certificate, copy and paste cert.pem and cert.key
  3. Storage, Pools, click on the three dots next to the Pools that will hold the S3 root bucket.
  4. Add a Dataset, give it a name such as s3-storage
  5. Services, S3, click on pencil icon.
  6. Setup like the example below.

Setup the access key and secret key for this configuration.

access key: AKIAIOSFODNN7EXAMPLE
secret key: wJalrXUtnFEMIK7MDENGbPxRfiCYEXAMPLEKEY

Update DNS to point s3.vmwire.com to 10.92.124.20 (the IP of TrueNAS). Note that this FQDN and IP address needs to be accessible from the Kubernetes worker nodes. For example, if you are installing Velero onto Kubernetes clusters in VCD, the worker nodes on the Organization network need to be able to route to your S3 service. If you are a service provider, you can place your S3 service on the services network that is accessible by all tenants in VCD.
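
A quick sanity check from one of the worker nodes (or anywhere that should reach the S3 service) never hurts. The port depends on how you configured the TrueNAS S3 service; mine listens on 9000, which matches the s3Url used further down:

nslookup s3.vmwire.com

curl -k https://s3.vmwire.com:9000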

Test access

Download and install the S3 browser tool https://s3-browser.en.uptodown.com/windows

Setup the connection to your S3 service using the access key and secret key.

Create a new bucket to store some backups. If you are using Container Service Extension with VCD, create a new bucket for each tenant organization. This ensures multi-tenancy is maintained. I’ve created a new bucket named tenant1 which corresponds to one of my tenant organizations in my VCD environment.

Install Velero into the Kubernetes cluster

You can use the velero-plugin-for-aws and the AWS provider with any S3 API compatible system, this includes TrueNAS, Cloudian Hyperstore etc.

Set up a file named credentials-velero with your access key and secret key details.

vi credentials-velero
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMIK7MDENGbPxRfiCYEXAMPLEKEY

Change your Kubernetes context to the cluster that you want to enable for Velero backups. The Velero CLI will connect to your Kubernetes cluster and deploy all the resources for Velero.

velero install \
    --use-restic \
    --default-volumes-to-restic \
    --use-volume-snapshots=false \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.4.0 \
    --bucket tenant1 \
    --backup-location-config region=default,s3ForcePathStyle="true",s3Url=https://s3.vmwire.com:9000 \
    --secret-file ./credentials-velero

To install Restic, use the --use-restic flag in the velero install command. See the install overview for more details on other flags for the install command.

velero install --use-restic

When using Restic on a storage provider that doesn’t have Velero snapshot support, the --use-volume-snapshots=false flag prevents an unused VolumeSnapshotLocation from being created on installation. The VCD CSI provider does not provide native snapshot capability, which is why using Restic is a good option here.

I’ve enabled the default behavior of including all persistent volumes in pod backups by running the velero install command with the --default-volumes-to-restic flag. Refer to the install overview for details.

Specify the bucket with the --bucket flag, I’m using tenant1 here to correspond to a VCD tenant that will have its own bucket for storing backups in the Kubernetes cluster.

For the --backup-location-config flag, configure your settings like mine, and use the s3Url option to point to your S3 object store; if you don’t set this, Velero will use AWS’ public S3 URLs.

A working deployment looks like this

time="2022-04-11T19:24:22Z" level=info msg="Starting Controller" logSource="/go/pkg/mod/github.com/bombsimon/logrusr@v1.1.0/logrusr.go:111" logger=controller.downloadrequest reconciler group=velero.io reconciler kind=DownloadRequest
time="2022-04-11T19:24:22Z" level=info msg="Starting controller" controller=restore logSource="pkg/controller/generic_controller.go:76"
time="2022-04-11T19:24:22Z" level=info msg="Starting controller" controller=backup logSource="pkg/controller/generic_controller.go:76"
time="2022-04-11T19:24:22Z" level=info msg="Starting controller" controller=restic-repo logSource="pkg/controller/generic_controller.go:76"
time="2022-04-11T19:24:22Z" level=info msg="Starting controller" controller=backup-sync logSource="pkg/controller/generic_controller.go:76"
time="2022-04-11T19:24:22Z" level=info msg="Starting workers" logSource="/go/pkg/mod/github.com/bombsimon/logrusr@v1.1.0/logrusr.go:111" logger=controller.backupstoragelocation reconciler group=velero.io reconciler kind=BackupStorageLocation worker count=1
time="2022-04-11T19:24:22Z" level=info msg="Starting workers" logSource="/go/pkg/mod/github.com/bombsimon/logrusr@v1.1.0/logrusr.go:111" logger=controller.downloadrequest reconciler group=velero.io reconciler kind=DownloadRequest worker count=1
time="2022-04-11T19:24:22Z" level=info msg="Starting workers" logSource="/go/pkg/mod/github.com/bombsimon/logrusr@v1.1.0/logrusr.go:111" logger=controller.serverstatusrequest reconciler group=velero.io reconciler kind=ServerStatusRequest worker count=10
time="2022-04-11T19:24:22Z" level=info msg="Validating backup storage location" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:114"
time="2022-04-11T19:24:22Z" level=info msg="Backup storage location valid, marking as available" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:121"
time="2022-04-11T19:25:22Z" level=info msg="Validating backup storage location" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:114"
time="2022-04-11T19:25:22Z" level=info msg="Backup storage location valid, marking as available" backup-storage-location=default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:121"

To see all resources deployed, use this command.

k get all -n velero
NAME                          READY   STATUS    RESTARTS   AGE
pod/restic-x6r69              1/1     Running   0          49m
pod/velero-7bc4b5cd46-k46hj   1/1     Running   0          49m

NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/restic   1         1         1       1            1           <none>          49m

NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/velero   1/1     1            1           49m

NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/velero-7bc4b5cd46   1         1         1       49m

Example to test Velero and Restic integration

Please use this link here: https://velero.io/docs/v1.5/examples/#snapshot-example-with-persistentvolumes

You may need to edit the with-pv.yaml manifest if you don’t have a default storage class.
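
To check whether your cluster has a default storage class, and to set one if it doesn’t (standard Kubernetes commands, substitute your own storage class name):

kubectl get storageclass

kubectl patch storageclass <storage-class-name> -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'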

Useful commands

velero get backup-locations
NAME      PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
default   aws        tenant1          Available   2022-04-11 19:26:22 +0000 UTC   ReadWrite     true

Create a backup example

velero backup create nginx-backup --selector app=nginx

Show backup logs

velero backup logs nginx-backup

Delete a backup

velero delete backup nginx-backup

Show all backups

velero backup get
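
Restores aren’t shown above, but the flow is the mirror image of a backup. A minimal example using the nginx-backup created earlier (Velero generates the restore name unless you specify one):

velero restore create --from-backup nginx-backup

velero restore get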

Back up the VCD PostgreSQL database (see this previous blog post).

velero backup create postgresql --ordered-resources 'statefulsets=vmware-cloud-director/postgresql-primary' --include-namespaces=vmware-cloud-director

Show logs for this backup

velero backup logs postgresql

Describe the postgresql backup

velero backup describe postgresql

Describe volume backups

kubectl -n velero get podvolumebackups -l velero.io/backup-name=nginx-backup -o yaml

apiVersion: v1
items:
- apiVersion: velero.io/v1
  kind: PodVolumeBackup
  metadata:
    annotations:
      velero.io/pvc-name: nginx-logs
    creationTimestamp: "2022-04-13T17:55:04Z"
    generateName: nginx-backup-
    generation: 4
    labels:
      velero.io/backup-name: nginx-backup
      velero.io/backup-uid: c92d306a-bc76-47ba-ac81-5b4dae92c677
      velero.io/pvc-uid: cf3bdb2f-714b-47ee-876c-5ed1bbea8263
    name: nginx-backup-vgqjf
    namespace: velero
    ownerReferences:
    - apiVersion: velero.io/v1
      controller: true
      kind: Backup
      name: nginx-backup
      uid: c92d306a-bc76-47ba-ac81-5b4dae92c677
    resourceVersion: "8425774"
    uid: 1fcdfec5-9854-4e43-8bc2-97a8733ee38f
  spec:
    backupStorageLocation: default
    node: node-7n43
    pod:
      kind: Pod
      name: nginx-deployment-66689547d-kwbzn
      namespace: nginx-example
      uid: 05afa981-a6ac-4caf-963b-95750c7a31af
    repoIdentifier: s3:https://s3.vmwire.com:9000/tenant1/restic/nginx-example
    tags:
      backup: nginx-backup
      backup-uid: c92d306a-bc76-47ba-ac81-5b4dae92c677
      ns: nginx-example
      pod: nginx-deployment-66689547d-kwbzn
      pod-uid: 05afa981-a6ac-4caf-963b-95750c7a31af
      pvc-uid: cf3bdb2f-714b-47ee-876c-5ed1bbea8263
      volume: nginx-logs
    volume: nginx-logs
  status:
    completionTimestamp: "2022-04-13T17:55:06Z"
    path: /host_pods/05afa981-a6ac-4caf-963b-95750c7a31af/volumes/kubernetes.io~csi/pvc-cf3bdb2f-714b-47ee-876c-5ed1bbea8263/mount
    phase: Completed
    progress:
      bytesDone: 618
      totalBytes: 618
    snapshotID: 8aa5e473
    startTimestamp: "2022-04-13T17:55:04Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Kubernetes Gateway API with NSX Advanced Load Balancer (Avi)

Using LoadBalancers, Gateways, GatewayClasses, AviInfraSettings, IngressClasses and Ingresses

Gateway API replaces services of type LoadBalancer in applications that require shared IP with multiple services and network segmentation. The Gateway API can be used to meet the following requirements:

  1. Shared IP – supporting multiple services, protocols and ports on the same load balancer external IP address
  2. Network segmentation – supporting multiple networks, e.g., oam, signaling and traffic on the same load balancer

NSX Advanced Load Balancer (Avi) supports both of these requirements through the use of the Gateway API. The following section describes how this is implemented.

The Gateway API introduces a few new resource types:

  • GatewayClasses are cluster-scoped resources that act as templates to explicitly define behavior for Gateways derived from them. This is similar in concept to StorageClasses, but for networking data-planes.
  • Gateways are the deployed instances of GatewayClasses. They are the logical representation of the data-plane which performs routing, which may be in-cluster proxies, hardware LBs, or cloud LBs.

Aviinfrasetting

Avi Infra Setting provides a way to segregate Layer-4/Layer-7 virtual services to have properties based on different underlying infrastructure components, like Service Engine Group, intended VIP Network etc.

A sample Avi Infra Setting is as shown below:

apiVersion: ako.vmware.com/v1alpha1
kind: AviInfraSetting
metadata:
  name: aviinfrasetting-tkg-wkld-oam
spec:
  seGroup:
    name: tkgvsphere-tkgworkload-group10
  network:
    vipNetworks:
      - networkName: tkg-wkld-oam-vip
        cidr: 10.223.63.0/26
    enableRhi: false

Avi Infra Setting is a cluster scoped CRD and can be attached to the intended Services. Avi Infra setting resources can be attached to Services using Gateway APIs. 

GatewayClass

Gateway APIs provide interfaces to structure Kubernetes service networking.

AKO supports Gateway APIs via the servicesAPI flag in the values.yaml.

The Avi Infra Setting resource can be attached to a Gateway Class object, via the .spec.parametersRef as shown below:

apiVersion: networking.x-k8s.io/v1alpha1
kind: GatewayClass
metadata:
  name: avigatewayclass-tkg-wkld-oam
spec:
  controller: ako.vmware.com/avi-lb
  parametersRef:
    group: ako.vmware.com
    kind: AviInfraSetting
    name: aviinfrasetting-tkg-wkld-oam

Gateway

The Gateway object provides a way to configure multiple Services as backends to the Gateway using label matching. The labels are specified as constant key-value pairs, the keys being ako.vmware.com/gateway-namespace and ako.vmware.com/gateway-name. The values corresponding to these keys must match the Gateway namespace and name respectively, for AKO to consider the Gateway valid. If any of the label keys are not provided as part of matchLabels, or the namespace/name provided in the label values does not match the actual Gateway namespace/name, AKO will consider the Gateway invalid. Please see https://avinetworks.com/docs/ako/1.5/gateway/.

kind: Gateway
apiVersion: networking.x-k8s.io/v1alpha1
metadata:
  name: app-gateway-admin-0
  namespace: default
spec:
  gatewayClassName: avigatewayclass-tkg-wkld-oam
  listeners:
  - protocol: UDP
    port: 161
    routes:
      selector:
        matchLabels:
          ako.vmware.com/gateway-name: app-gateway-admin-0
          ako.vmware.com/gateway-namespace: default
      group: v1
      kind: Service
  - protocol: TCP
    port: 80
    routes:
      selector:
        matchLabels:
          ako.vmware.com/gateway-name: app-gateway-admin-0
          ako.vmware.com/gateway-namespace: default
      group: v1
      kind: Service
  - protocol: TCP
    port: 443
    routes:
      selector:
        matchLabels:
          ako.vmware.com/gateway-name: app-gateway-admin-0
          ako.vmware.com/gateway-namespace: default
      group: v1
      kind: Service

How to use the GatewayAPI

In your Helm charts, for any service that previously needed a LoadBalancer Service, you now use ClusterIP instead and add labels such as the following:

apiVersion: v1
kind: Service
metadata:
  name: web-statefulset-service-oam
  namespace: default
  labels:
    ako.vmware.com/gateway-name: app-gateway-admin-0
    ako.vmware.com/gateway-namespace: default
spec:
  selector:
    app: nginx
  ports:
  - port: 8443
    targetPort: 443
    protocol: TCP
  type: ClusterIP

The Gateway Labels

ako.vmware.com/gateway-name: app-gateway-admin-0
ako.vmware.com/gateway-namespace: default

together with the ClusterIP type, tell the Avi Kubernetes Operator (AKO) to use the gateways; each gateway is on a separate network segment for traffic separation.

The gateways also define the relevant ports that the application uses. Configure your gateway and change your Helm chart to use the gateway objects.

Ingress Class

Avi Infra Settings can be applied to Ingress resources, using the IngressClass construct. IngressClass provides a way to configure Controller-specific load balancing parameters and applies these configurations to a set of Ingress objects. AKO supports listening to IngressClass resources in Kubernetes version 1.19+. The Avi Infra Setting reference can be provided in the Ingress Class as shown below:

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: avi-ingress-class-oam
spec:
  controller: ako.vmware.com/avi-lb
  parameters:
    apiGroup: ako.vmware.com
    kind: AviInfraSetting
    name: aviinfrasetting-tkg-wkld-oam
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: avi-ingress-class-trf
spec:
  controller: ako.vmware.com/avi-lb
  parameters:
    apiGroup: ako.vmware.com
    kind: AviInfraSetting
    name: aviinfrasetting-tkg-wkld-trf
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: avi-ingress-class-sigtran
spec:
  controller: ako.vmware.com/avi-lb
  parameters:
    apiGroup: ako.vmware.com
    kind: AviInfraSetting
    name: aviinfrasetting-tkg-wkld-sigtran

Using IngressClass

The Avi Infra Setting resource can be attached to a GatewayClass object or an IngressClass object via .spec.parametersRef. However, if you use annotations on a LoadBalancer Service instead of labels with Gateway API objects, you will not be able to share the same protocol and port on the same IP address, for example TCP and UDP 53 on the same LoadBalancer IP address. This is not supported until MixedProtocolLBService is supported by Kubernetes.

To have a controller implement a given ingress, in addition to creating the IngressClass object, specify an ingressClassName in the Ingress that matches the IngressClass name. The ingress looks as shown below:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
spec:
  ingressClassName: avi-ingress-class-oam
  rules:
    - host: my-website.my-domain.com
      http:
        paths:
        - path: /foo
          pathType: Prefix
          backend:
            service:
              name: web-service-1
              port:
                number: 443

Using Annotation with Services of type LoadBalancer

Services of Type LoadBalancer can specify the Avi Infra Setting using an annotation as shown below without using Gateway API objects:

annotations:
    aviinfrasetting.ako.vmware.com/name: "aviinfrasetting-tkg-wkld-sigtran"

annotations:
    aviinfrasetting.ako.vmware.com/name: "aviinfrasetting-tkg-wkld-trf"

annotations:
    aviinfrasetting.ako.vmware.com/name: "aviinfrasetting-tkg-wkld-oam"
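
For an existing Service you can also attach the setting with kubectl annotate rather than editing the manifest. my-loadbalancer-service is just a placeholder name here:

kubectl annotate service my-loadbalancer-service aviinfrasetting.ako.vmware.com/name=aviinfrasetting-tkg-wkld-oam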

Automated installation of Container Service Extension 3.1.2

This post is an update to enable the automated installation of Container Service Extension version 3.1.2; the script has also been updated for better efficiency.

You can find the details on my github account under the repository named cse-automated.

https://github.com/hugopow/cse-automated

Ensure you review the README.MD and read the comments in the script too.

Pre-Requisites

  1. Deploy Photon OVA into vSphere, 2 VCPUs, 4GB RAM is more than enough
  2. Assign VM a hostname and static IP
  3. Ensure it can reach the Internet
  4. Ensure it can also reach VCD on TCP 443 and vCenter servers registered in VCD on TCP 443.
  5. SSH into the Photon VM
  6. Note that my environment has CA signed SSL certs and the script has been tested against this environment. I have not tested the script in environments with self-signed certificates.

Download cse-install.sh script to Photon VM

# Download the script to the Photon VM
curl https://raw.githubusercontent.com/hugopow/cse-automated/main/cse-install.sh --output cse-install.sh

#  Make script executable
chmod +x cse-install.sh

Change the cse-install.sh script

Make sure you change passwords, CA SSL certificates and environment variables to suit your environment.

Launch the script, sit back and relax

# Run as root
sh cse-install.sh

Demo Video

Old video of CSE 3.0.4 automated install, but still the same process.

Enable Feature Gates for kube-apiserver on TKG clusters

Feature gates are a set of key=value pairs that describe Kubernetes features. You can turn these features on or off using a ytt overlay file or by editing the KubeadmControlPlane or VSphereMachineTemplate. This post shows you how to enable a feature gate by adding MixedProtocolLBService to the TKG kube-apiserver. The same approach can be used to enable other feature gates as well; I am using MixedProtocolLBService to test this at one of my customers.

Note that enabling feature gates on TKG clusters is unsupported.

The customer has a requirement to test mixed protocols in the same load balancer service (multiple ports and protocols on the same load balancer IP address). This feature is currently in alpha and getting a head start on alpha features is always a good thing to do to stay ahead.

For example to do this in a LoadBalancer service (with the MixedProtocolLBService feature gate enabled):

apiVersion: v1
kind: Service
metadata:
  name: mixed-protocol-dns
spec:
  type: LoadBalancer
  ports:
    - name: dns-udp
      port: 53
      protocol: UDP
    - name: dns-tcp
      port: 53
      protocol: TCP
  selector:
    app: my-dns-server

Today, without enabling this feature gate, this can only be achieved using the Gateway API. The gateway object would look something like this:

apiVersion: networking.x-k8s.io/v1alpha1
kind: Gateway
metadata:
  name: gateway-tkg-dns
  namespace: default
spec:
  gatewayClassName: gatewayclass-tkg-workload
  listeners:
  - protocol: TCP
    port: 53
    routes:
      selector:
        matchLabels:
          ako.vmware.com/gateway-name: gateway-tkg-dns
          ako.vmware.com/gateway-namespace: default
      group: v1
      kind: Service
  - protocol: UDP
    port: 53
    routes:
      selector:
        matchLabels:
          ako.vmware.com/gateway-name: gateway-tkg-dns
          ako.vmware.com/gateway-namespace: default
      group: v1
      kind: Service

And the service would look something like this.

apiVersion: v1
kind: Service
metadata:
  name: mixed-protocol-dns
  namespace: default
  labels:
    ako.vmware.com/gateway-name: gateway-tkg-dns
    ako.vmware.com/gateway-namespace: default
spec:
  selector:
    app: nginx
  ports:
    - port: 53
      targetPort: 53
      protocol: TCP
    - port: 53
      targetPort: 53
      protocol: UDP
  type: ClusterIP

Let’s assume that you want to enable this feature gate before deploying a new TKG cluster. I’ll show you how to enable this on an existing cluster further down the post.

Greenfield – before creating a new TKG cluster

Create a new overlay file named kube-apiserver-feature-gates.yaml. Place this file in your ~/.config/tanzu/tkg/providers/infrastructure-vsphere/ytt/ directory. For more information on ytt overlays, please read this link.

#! Please add any overlays specific to vSphere provider under this file.

#@ load("@ytt:overlay", "overlay")
#@ load("@ytt:data", "data")

#! Enable MixedProtocolLBService feature gate on kube api.
#@overlay/match by=overlay.subset({"kind":"KubeadmControlPlane"})
---
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          #@overlay/match missing_ok=True
          feature-gates: MixedProtocolLBService=true

Deploy the TKG cluster.
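
For reference, deploying the cluster is the usual tanzu CLI call; tkg-test is the cluster name used in the output below and the config file name is just an example:

tanzu cluster create tkg-test --file tkg-test-cluster-config.yaml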

Inspect the kube-apiserver pod for feature gate

k get po -n kube-system kube-apiserver-tkg-test-control-plane-#####  -o yaml

You should see on line 44 that the overlay has enabled the feature gate.

kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 172.16.3.66:6443
    kubernetes.io/config.hash: 15fb674a0f0f4d8b5074593f74365f98
    kubernetes.io/config.mirror: 15fb674a0f0f4d8b5074593f74365f98
    kubernetes.io/config.seen: "2022-03-08T22:05:59.729647404Z"
    kubernetes.io/config.source: file
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2022-03-08T22:06:00Z"
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver-tkg-test-control-plane-fmpw2
  namespace: kube-system
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Node
    name: tkg-test-control-plane-fmpw2
    uid: 9fa5077e-4802-46ac-bce7-0cf62252e0e6
  resourceVersion: "2808"
  uid: fe22305b-5be1-48b3-b4be-d660d1d307b6
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=172.16.3.66
    - --allow-privileged=true
    - --audit-log-maxage=30
    - --audit-log-maxbackup=10
    - --audit-log-maxsize=100
    - --audit-log-path=/var/log/kubernetes/audit.log
    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cloud-provider=external
    - --enable-admission-plugins=NodeRestriction
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --feature-gates=MixedProtocolLBService=true
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-issuer=https://kubernetes.default.svc.cluster.local
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=100.64.0.0/13
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt

Inspect the kubeadmcontrolplane; this is the control plane template for the master node and all subsequent master nodes that are deployed. You can see on line 32 that the feature gate flag is enabled.

k get kubeadmcontrolplane tkg-test-control-plane -o yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  creationTimestamp: "2022-03-08T22:03:12Z"
  finalizers:
  - kubeadm.controlplane.cluster.x-k8s.io
  generation: 1
  labels:
    cluster.x-k8s.io/cluster-name: tkg-test
  name: tkg-test-control-plane
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Cluster
    name: tkg-test
    uid: b0d75a37-9968-4119-bc56-c9fa2347be55
  resourceVersion: "8160318"
  uid: 72d74b68-d386-4f75-b54b-b1a8ab63b379
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          audit-log-maxage: "30"
          audit-log-maxbackup: "10"
          audit-log-maxsize: "100"
          audit-log-path: /var/log/kubernetes/audit.log
          audit-policy-file: /etc/kubernetes/audit-policy.yaml
          cloud-provider: external
          feature-gates: MixedProtocolLBService=true

Now if you created a service with mixed protocols, the kube-apiserver will accept the service and will tell the load balancer to deploy the service.

Brownfield – enable feature gates on an existing cluster

Enabling feature gates on an already deployed cluster is a little bit harder to do, as you need to be extra careful that you don’t break your current cluster.

Let’s edit the KubeadmControlPlane template. You need to do this in the tkg-mgmt cluster context.

kubectl config use-context tkg-mgmt-admin@tkg-mgmt
kubectl edit kubeadmcontrolplane tkg-hugo-control-plane

Find the line:

spec.kubeadmConfigSpec.clusterConfiguration.apiServer.extraArgs

Add in the following line:

feature-gates: MixedProtocolLBService=true

so that section now looks like this:

spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          feature-gates: MixedProtocolLBService=true
          audit-log-maxage: "30"
          audit-log-maxbackup: "10"
          audit-log-maxsize: "100"
          audit-log-path: /var/log/kubernetes/audit.log
          audit-policy-file: /etc/kubernetes/audit-policy.yaml
          cloud-provider: external

Save the changes with :wq!

You’ll see that TKG has immediately started to clone a new control plane VM. Wait for the new VM to replace the current one.

If you inspect the new control plane VM, you’ll see that it has the feature gate applied. You need to do this in the worker cluster context that you want the feature gate enabled on, in my case tkg-hugo.

Note that adding the feature gate to spec.kubeadmConfigSpec.clusterConfiguration.apiServer.extraArgs actually enables the feature gate on the kube-apiserver, which in TKG runs in a pod.

kubectl config use-context tkg-hugo-admin@tkg-hugo
k get po kube-apiserver-tkg-hugo-control-plane-#### -n kube-system -o yaml

Go to the kube-apiserver command under spec.containers.command. You’ll see something like the following:

spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=172.16.3.82
    - --allow-privileged=true
    - --audit-log-maxage=30
    - --audit-log-maxbackup=10
    - --audit-log-maxsize=100
    - --audit-log-path=/var/log/kubernetes/audit.log
    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cloud-provider=external
    - --enable-admission-plugins=NodeRestriction
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --feature-gates=MixedProtocolLBService=true

Congratulations, the feature gate is now enabled!
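
To prove it works, apply the mixed protocol Service from earlier (assuming you saved it as mixed-protocol-dns.yaml) and check that the API server accepts it instead of rejecting the mixed protocols:

k apply -f mixed-protocol-dns.yaml

k get svc mixed-protocol-dns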

Deploy Harbor Registry with Tanzu Packages and expose with Ingress

In the previous post, I described how to install Harbor using Helm to utilize ChartMuseum for running Harbor as a Helm chart repository.

The Harbor registry that comes shipped with TKG 1.5.1 uses Tanzu Packages to deploy Harbor into a TKG cluster. This version of Harbor does not support Helm charts using ChartMuseum. VMware dropped support for ChartMuseum in TKG and is adopting OCI registries instead. This post describes how to deploy Harbor using Tanzu Packages (Kapp) and use Harbor as an OCI registry that fully supports Helm charts. This is the preferred way to use chart and image registries.

The latest versions as of TKG 1.5.1 packages, February 2022.

Package        Version
cert-manager   1.5.3+vmware.2-tkg.1
contour        1.18.2+vmware.1-tkg.1
harbor         2.3.3+vmware.1-tkg.1

Or run the following to see the latest available versions.

tanzu package available list harbor.tanzu.vmware.com -A

Pre-requisites

Before installing Harbor, you need to install Cert Manager and Contour. You can follow this other guide here to get started. This post uses Ingress, which requires NSX Advanced Load Balancer (Avi). The previous post will show you how to install these pre-requisites.

Deploy Harbor

Create a configuration file named harbor-data-values.yaml. This file configures the Harbor package. Follow the steps below to obtain a template file.

image_url=$(kubectl -n tanzu-package-repo-global get packages harbor.tanzu.vmware.com.2.3.3+vmware.1-tkg.1 -o jsonpath='{.spec.template.spec.fetch[0].imgpkgBundle.image}')

imgpkg pull -b $image_url -o /tmp/harbor-package-2.3.3+vmware.1-tkg.1

cp /tmp/harbor-package-2.3.3+vmware.1-tkg.1/config/values.yaml harbor-data-values.yaml

Set the mandatory passwords and secrets in the harbor-data-values.yaml file by automatically generating random passwords and secrets:

bash /tmp/harbor-package-2.3.3+vmware.1-tkg.1/config/scripts/generate-passwords.sh harbor-data-values.yaml

Specify other settings in the harbor-data-values.yaml file.

Set the hostname setting to the hostname you want to use to access Harbor via ingress. For example, harbor.yourdomain.com.
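
If you script these edits, hostname is a top-level key in the template I pulled, so a quick way to set it is (adjust the path if your package version differs):

yq -i '.hostname = "harbor.yourdomain.com"' harbor-data-values.yaml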

To use your own certificates, update the tls.crt, tls.key, and ca.crt settings with the contents of your certificate, key, and CA certificate. The certificate can be signed by a trusted authority or be self-signed. If you leave these blank, Tanzu Kubernetes Grid automatically generates a self-signed certificate.

The format of the tls.crt and tls.key looks like this:

tlsCertificate:
  tls.crt: |
    -----BEGIN CERTIFICATE-----
    ---snipped---
    -----END CERTIFICATE-----
  tls.key: |
    -----BEGIN PRIVATE KEY-----
    ---snipped---
    -----END PRIVATE KEY-----

If you used the generate-passwords.sh script, optionally update the harborAdminPassword with something that is easier to remember.

Optionally update other persistence settings to specify how Harbor stores data.

If you need to store a large quantity of container images in Harbor, set persistence.persistentVolumeClaim.registry.size to a larger number.

If you do not update the storageClass under persistence settings, Harbor uses the cluster’s default storageClass.

Remove all comments in the harbor-data-values.yaml file:

yq -i eval '... comments=""' harbor-data-values.yaml

Install the Harbor package:

tanzu package install harbor \
--package-name harbor.tanzu.vmware.com \
--version 2.3.3+vmware.1-tkg.1 \
--values-file harbor-data-values.yaml \
--namespace my-packages

Obtain the address of the Envoy service load balancer.

kubectl get svc envoy -n tanzu-system-ingress -o jsonpath='{.status.loadBalancer.ingress[0]}'

Update your DNS record to point the hostname to the IP address above.
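
Once DNS resolves, a quick way to check that Harbor is responding is its health endpoint (Harbor 2.x exposes this under /api/v2.0/health):

curl -k https://harbor.yourdomain.com/api/v2.0/health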

Update Harbor

To update the Harbor installation in any way, such as updating the TLS certificate, make your changes to the harbor-data-values.yaml file and then run the following to update Harbor.

tanzu package installed update harbor --version 2.3.3+vmware.1-tkg.1 --values-file harbor-data-values.yaml --namespace my-packages

Using Harbor as an OCI Registry for Helm Charts

Login to the registry

helm registry login -u admin harbor2.vmwire.com

Package a helm chart if you haven’t got one already packaged

helm package buildachart

Upload a chart to the registry

helm push buildachart-0.1.0.tgz oci://harbor2.vmwire.com/chartrepo

The chart can now be seen in the Harbor UI in the view as where normal Docker images are.
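
To confirm the chart is actually retrievable from the registry (Helm 3.8+ OCI support), you can pull its metadata back:

helm show chart oci://harbor2.vmwire.com/chartrepo/buildachart --version 0.1.0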

OCI based Harbor

Notice that this is an OCI registry and not a Helm repository based on ChartMuseum; that’s why you won’t see the ‘Helm Charts’ tab next to the ‘Repositories’ tab.

ChartMuseum based Harbor

Deploy an application with Helm

Let’s deploy the buildachart application. This is a simple nginx application that can use TLS, so we get a secure site with HTTPS.

Create a new namespace and the TLS secret for the application. Copy the tls.crt and tls.key files in pem format to $HOME/certs/

# Create a new namespace for cherry
k create ns cherry

# Create a TLS secret with the contents of tls.key and tls.crt in the cherry namespace
kubectl create secret tls cherry-tls --key $HOME/certs/tls.key --cert $HOME/certs/tls.crt -n cherry

Deploy the app using Harbor as the Helm chart repository

helm install buildachart oci://harbor2.vmwire.com/chartrepo/buildachart --version 0.1.0 -n cherry
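
Then check that everything came up in the cherry namespace:

k get all -n cherry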

If you need to install Helm

Follow this link here.

https://helm.sh/docs/topics/registries/

https://opensource.com/article/20/5/helm-charts

https://itnext.io/helm-3-8-0-oci-registry-support-b050ff218911

Quick guide to install cert-manager, contour, prometheus and grafana into TKG using Tanzu Packages (Kapp)

Intro

For an overview of Kapp, please see this link here.

The latest versions as of TKG 1.5.1, February 2022.

Package        Version
cert-manager   1.5.3+vmware.2-tkg.1
contour        1.18.2+vmware.1-tkg.1
prometheus     2.27.0+vmware.2-tkg.1
grafana        7.5.7+vmware.2-tkg.1

Or run the following to see the latest available versions.

tanzu package available list cert-manager.tanzu.vmware.com -A
tanzu package available list contour.tanzu.vmware.com -A
tanzu package available list prometheus.tanzu.vmware.com -A
tanzu package available list grafana.tanzu.vmware.com -A

Install Cert Manager

tanzu package install cert-manager \
--package-name cert-manager.tanzu.vmware.com \
--namespace my-packages \
--version 1.5.3+vmware.2-tkg.1 \
--create-namespace

I’m using ingress with Contour, which needs a load balancer to expose the ingress services. Install AKO and NSX Advanced Load Balancer (Avi) by following this previous post.

Install Contour

Create a file named contour-data-values.yaml; this example uses NSX Advanced Load Balancer (Avi).

---
infrastructure_provider: vsphere
namespace: tanzu-system-ingress
contour:
 configFileContents: {}
 useProxyProtocol: false
 replicas: 2
 pspNames: "vmware-system-restricted"
 logLevel: info
envoy:
 service:
   type: LoadBalancer
   annotations: {}
   nodePorts:
     http: null
     https: null
   externalTrafficPolicy: Cluster
   disableWait: false
 hostPorts:
   enable: true
   http: 80
   https: 443
 hostNetwork: false
 terminationGracePeriodSeconds: 300
 logLevel: info
 pspNames: null
certificates:
 duration: 8760h
 renewBefore: 360h

Remove comments in the contour-data-values.yaml file.

yq -i eval '... comments=""' contour-data-values.yaml

Deploy contour

tanzu package install contour \
--package-name contour.tanzu.vmware.com \
--version 1.18.2+vmware.1-tkg.1 \
--values-file contour-data-values.yaml \
--namespace my-packages

Install Prometheus

Download the prometheus-data-values.yaml file to use custom values to use ingress.

image_url=$(kubectl -n tanzu-package-repo-global get packages prometheus.tanzu.vmware.com.2.27.0+vmware.2-tkg.1 -o jsonpath='{.spec.template.spec.fetch[0].imgpkgBundle.image}')

imgpkg pull -b $image_url -o /tmp/prometheus-package-2.27.0+vmware.2-tkg.1

cp /tmp/prometheus-package-2.27.0+vmware.2-tkg.1/config/values.yaml prometheus-data-values.yaml

Edit the file and change any settings you need such as adding the TLS certificate and private key for ingress. It’ll look something like this.

ingress:
  enabled: true
  virtual_host_fqdn: "prometheus-tkg-mgmt.vmwire.com"
  prometheus_prefix: "/"
  alertmanager_prefix: "/alertmanager/"
  prometheusServicePort: 80
  alertmanagerServicePort: 80
  tlsCertificate:
    tls.crt: |
      -----BEGIN CERTIFICATE-----
      --- snipped---
      -----END CERTIFICATE-----
    tls.key: |
      -----BEGIN PRIVATE KEY-----
      --- snipped---
      -----END PRIVATE KEY-----

Remove comments in the prometheus-data-values.yaml file.

yq -i eval '... comments=""' prometheus-data-values.yaml

Deploy prometheus

tanzu package install prometheus \
--package-name prometheus.tanzu.vmware.com \
--version 2.27.0+vmware.2-tkg.1 \
--values-file prometheus-data-values.yaml \
--namespace my-packages

Install Grafana

Download the grafana-data-values.yaml file.

image_url=$(kubectl -n tanzu-package-repo-global get packages grafana.tanzu.vmware.com.7.5.7+vmware.2-tkg.1 -o jsonpath='{.spec.template.spec.fetch[0].imgpkgBundle.image}')

imgpkg pull -b $image_url -o /tmp/grafana-package-7.5.7+vmware.2-tkg.1

cp /tmp/grafana-package-7.5.7+vmware.2-tkg.1/config/values.yaml grafana-data-values.yaml

Generate a Base64 password and edit the grafana-data-values.yaml file to update the default admin password.

echo -n 'Vmware1!' | base64

Also update the TLS configuration to use signed certificates for ingress. It will look something like this.

  secret:
    type: "Opaque"
    admin_user: "YWRtaW4="
    admin_password: "Vm13YXJlMSE="

ingress:
  enabled: true
  virtual_host_fqdn: "grafana-tkg-mgmt.vmwire.com"
  prefix: "/"
  servicePort: 80
  #! [Optional] The certificate for the ingress if you want to use your own TLS certificate.
  #! We will issue the certificate by cert-manager when it's empty.
  tlsCertificate:
    #! [Required] the certificate
    tls.crt: |
      -----BEGIN CERTIFICATE-----
      ---snipped---
      -----END CERTIFICATE-----
    #! [Required] the private key
    tls.key: |
      -----BEGIN PRIVATE KEY-----
      ---snipped---
      -----END PRIVATE KEY-----

Since I’m using ingress to expose the Grafana service, also change line 33 from LoadBalancer to ClusterIP. This prevents Kapp from creating an unnecessary service that would consume an IP address.

#! Grafana service configuration
   service:
     type: ClusterIP
     port: 80
     targetPort: 3000
     labels: {}
     annotations: {}

Remove comments in the grafana-data-values.yaml file.

yq -i eval '... comments=""' grafana-data-values.yaml

Deploy Grafana

tanzu package install grafana \
--package-name grafana.tanzu.vmware.com \
--version 7.5.7+vmware.2-tkg.1 \
--values-file grafana-data-values.yaml \
--namespace my-packages

Accessing Grafana

Since I’m using ingress, I set the ingress FQDN as grafana-tkg-mgmt.vmwire.com and I also used TLS. I can now access the Grafana UI using https://grafana-tkg-mgmt.vmwire.com and enjoy a secure connection.

Listing all installed packages

tanzu package installed list -A

Making changes to Contour, Prometheus or Grafana

If you need to make changes to any of the configuration files, you can then update the deployment with the tanzu package installed update command.

tanzu package installed update contour \
--version 1.18.2+vmware.1-tkg.1 \
--values-file contour-data-values.yaml \
--namespace my-packages
tanzu package installed update prometheus \
--version 2.27.0+vmware.2-tkg.1 \
--values-file prometheus-data-values.yaml \
--namespace my-packages
tanzu package installed update grafana \
--version 7.5.7+vmware.2-tkg.1 \
--values-file grafana-data-values.yaml \
--namespace my-packages

Removing Cert Manager, Contour, Prometheus or Grafana

tanzu package installed delete cert-manager -n my-packages
tanzu package installed delete contour -n my-packages
tanzu package installed delete prometheus -n my-packages
tanzu package installed delete grafana -n my-packages

Copypasta for doing this again on another cluster

Place all your completed data-values files into a directory and just run the entire code block below to set everything up in one go.

# Deploy cert-manager
tanzu package install cert-manager \
--package-name cert-manager.tanzu.vmware.com \
--namespace my-packages \
--version 1.5.3+vmware.2-tkg.1 \
--create-namespace

# Deploy contour
yq -i eval '... comments=""' contour-data-values.yaml
tanzu package install contour \
--package-name contour.tanzu.vmware.com \
--version 1.18.2+vmware.1-tkg.1 \
--values-file contour-data-values.yaml \
--namespace my-packages

# Deploy prometheus
yq -i eval '... comments=""' prometheus-data-values.yaml
tanzu package install prometheus \
--package-name prometheus.tanzu.vmware.com \
--version 2.27.0+vmware.2-tkg.1 \
--values-file prometheus-data-values.yaml \
--namespace my-packages

# Deploy grafana
yq -i eval '... comments=""' grafana-data-values.yaml
tanzu package install grafana \
--package-name grafana.tanzu.vmware.com \
--version 7.5.7+vmware.2-tkg.1 \
--values-file grafana-data-values.yaml \
--namespace my-packages

Using local storage with Tanzu Kubernetes Grid Topology Aware Volume Provisioning

With the vSphere CSI driver version 2.4.1, it is now possible to use local storage with TKG clusters. This is enabled by TKG’s Topology Aware Volume Provisioning capability.

With this model, it is possible to present individual SSDs or NVMe drives attached to an ESXi host and configure a local datastore for use with topology aware volume provisioning. Kubernetes can then create persistent volumes and schedule pods that are deployed onto the worker nodes that are on the same ESXi host as the volume. This enables Kubernetes pods to have direct local access to the underlying storage.

Using local storage has distinct advantages over shared storage, especially when it comes to supporting faster and cheaper storage media for applications that do not benefit from or require the added complexity of having their data replicated by the storage layer. Examples of applications that do not require storage protection (RAID or failures to tolerate) are applications that can achieve data protection at the application level.

Figure 1. A TKG cluster with three worker nodes, each on a separate ESXi host backed by its own local SSD datastore.

To set up such an environment, it is necessary to go over some of the requirements first.

  1. Deploy Tanzu Kubernetes Clusters to Multiple Availability Zones on vSphere – link
  2. Spread Nodes Across Multiple Hosts in a Single Compute Cluster
  3. Configure Tanzu Kubernetes Plans and Clusters with an overlay that is topology-aware – link
  4. Deploy TKG clusters into a multi-AZ topology
  5. Deploy the k8s-local-ssd storage class
  6. Deploy Workloads with WaitForFirstConsumer Mode in Topology-Aware Environment – link

Before you start

Note that only version 2.4.1 of the vSphere CSI driver supports local storage in a multi-AZ topology. To check that you have the correct version in your TKG cluster, run the following.

tanzu package installed get vsphere-csi -n tkg-system
- Retrieving installation details for vsphere-csi... I0224 19:20:29.397702  317993 request.go:665] Waited for 1.03368201s due to client-side throttling, not priority and fairness, request: GET:https://172.16.3.94:6443/apis/secretgen.k14s.io/v1alpha1?timeout=32s
\ Retrieving installation details for vsphere-csi...
NAME:                    vsphere-csi
PACKAGE-NAME:            vsphere-csi.tanzu.vmware.com
PACKAGE-VERSION:         2.4.1+vmware.1-tkg.1
STATUS:                  Reconcile succeeded
CONDITIONS:              [{ReconcileSucceeded True  }]

Deploy Tanzu Kubernetes Clusters to Multiple Availability Zones on vSphere

In my example, I am following the Spread Nodes Across Multiple Hosts in a Single Compute Cluster approach: each ESXi host is an availability zone (AZ) and the vSphere cluster is the region.

Figure 1 shows a TKG cluster with three worker nodes, each running on a separate ESXi host. Each ESXi host has a local SSD drive formatted with VMFS 6. The topology aware volume provisioner always places pods and their replicas on separate worker nodes, and any persistent volume claims (PVCs) on separate ESXi hosts.

| Parameter | Specification | vSphere object | Datastore |
| --- | --- | --- | --- |
| Region | tagCategory: k8s-region | cluster* | |
| Zone | tagCategory: k8s-zone (az-1, az-2, az-3) | host-group-1 (esx1.vcd.lab), host-group-2 (esx2.vcd.lab), host-group-3 (esx3.vcd.lab) | esx1-ssd-1, esx2-ssd-1, esx3-ssd-1 |
| Storage Policy | k8s-local-ssd | | esx1-ssd-1, esx2-ssd-1, esx3-ssd-1 |
| Tags | tagCategory: k8s-storage, tag: k8s-local-ssd | | esx1-ssd-1, esx2-ssd-1, esx3-ssd-1 |

*Note that “cluster” is the name of my vSphere cluster.

Ensure that you’ve set up the correct rules that pin worker nodes to their respective ESXi hosts. Always use “Must run on hosts in group”; this is very important for local storage topology to work. The worker nodes are labelled for topology awareness, and if a worker node is accidentally vMotion’d to another host, the CSI driver will not be able to bind the PVC to the worker node.

Below is my vsphere-zones.yaml file.

Note that autoConfigure is set to true, which means that you do not have to tag the cluster or the ESXi hosts yourself; you only need to set up the affinity rules under Cluster > Configure > VM/Host Groups and VM/Host Rules. With autoConfigure: true, CAPV automatically creates the tags and tag categories for you.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereFailureDomain
metadata:
 name: az-1
spec:
 region:
   name: cluster
   type: ComputeCluster
   tagCategory: k8s-region
   autoConfigure: true
 zone:
   name: az-1
   type: HostGroup
   tagCategory: k8s-zone
   autoConfigure: true
 topology:
   datacenter: home.local
   computeCluster: cluster
   hosts:
     vmGroupName: workers-group-1
     hostGroupName: host-group-1
   datastore: lun01
   networks:
   - tkg-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereFailureDomain
metadata:
 name: az-2
spec:
 region:
   name: cluster
   type: ComputeCluster
   tagCategory: k8s-region
   autoConfigure: true
 zone:
   name: az-2
   type: HostGroup
   tagCategory: k8s-zone
   autoConfigure: true
 topology:
   datacenter: home.local
   computeCluster: cluster
   hosts:
     vmGroupName: workers-group-2
     hostGroupName: host-group-2
   datastore: lun01
   networks:
   - tkg-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereFailureDomain
metadata:
 name: az-3
spec:
 region:
   name: cluster
   type: ComputeCluster
   tagCategory: k8s-region
   autoConfigure: true
 zone:
   name: az-3
   type: HostGroup
   tagCategory: k8s-zone
   autoConfigure: true
 topology:
   datacenter: home.local
   computeCluster: cluster
   hosts:
     vmGroupName: workers-group-3
     hostGroupName: host-group-3
   datastore: lun01
   networks:
   - tkg-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
 name: az-1
spec:
 server: vcenter.vmwire.com
 failureDomain: az-1
 placementConstraint:
   resourcePool: tkg-vsphere-workload
   folder: tkg-vsphere-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
 name: az-2
spec:
 server: vcenter.vmwire.com
 failureDomain: az-2
 placementConstraint:
   resourcePool: tkg-vsphere-workload
   folder: tkg-vsphere-workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
 name: az-3
spec:
 server: vcenter.vmwire.com
 failureDomain: az-3
 placementConstraint:
   resourcePool: tkg-vsphere-workload
   folder: tkg-vsphere-workload

Note that Kubernetes rejects names that do not conform to its naming conventions, so for your vmGroupName and hostGroupName parameters I suggest using lowercase and dashes instead of periods. For example, use host-group-3 instead of Host.Group.3; the latter will be rejected.
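
Once the file is complete, apply it to the TKG management cluster and check that the failure domain and deployment zone objects were created. A minimal sketch, assuming your kubectl context points at the management cluster:

kubectl apply -f vsphere-zones.yaml
kubectl get vspherefailuredomains,vspheredeploymentzones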

Configure Tanzu Kubernetes Plans and Clusters with an overlay that is topology-aware

To ensure that this topology can be built by TKG, we first need to create a TKG cluster plan overlay that tells Tanzu what to do when creating worker nodes in a multi-availability zone topology.

Let’s take a look at my az-overlay.yaml file.

Since I have three AZs, I need to create an overlay file that includes the cluster plan for all three AZs.

| Parameter | Specification |
| --- | --- |
| Zone | az-1, az-2, az-3 |
| VSphereMachineTemplate | -worker-0, -worker-1, -worker-2 |
| KubeadmConfigTemplate | -md-0, -md-1, -md-2 |

#! Please add any overlays specific to vSphere provider under this file.

#@ load("@ytt:overlay", "overlay")
#@ load("@ytt:data", "data")

#@ load("lib/helpers.star", "get_bom_data_for_tkr_name", "get_default_tkg_bom_data", "kubeadm_image_repo", "get_image_repo_for_component", "get_vsphere_thumbprint")

#@ load("lib/validate.star", "validate_configuration")
#@ load("@ytt:yaml", "yaml")
#@ validate_configuration("vsphere")

#@ bomDataForK8sVersion = get_bom_data_for_tkr_name()

#@ if data.values.CLUSTER_PLAN == "dev" and not data.values.IS_WINDOWS_WORKLOAD_CLUSTER:
#@overlay/match by=overlay.subset({"kind":"VSphereCluster"})
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereCluster
metadata:
  name: #@ data.values.CLUSTER_NAME
spec:
  thumbprint: #@ get_vsphere_thumbprint()
  server: #@ data.values.VSPHERE_SERVER
  identityRef:
    kind: Secret
    name: #@ data.values.CLUSTER_NAME

#@overlay/match by=overlay.subset({"kind":"MachineDeployment", "metadata":{"name": "{}-md-0".format(data.values.CLUSTER_NAME)}})
---
spec:
  template:
    spec:
      #@overlay/match missing_ok=True
      #@ if data.values.VSPHERE_AZ_0:
      failureDomain: #@ data.values.VSPHERE_AZ_0
      #@ end
      infrastructureRef:
        name: #@ "{}-worker-0".format(data.values.CLUSTER_NAME)

#@overlay/match by=overlay.subset({"kind":"VSphereMachineTemplate", "metadata":{"name": "{}-worker".format(data.values.CLUSTER_NAME)}})
---
metadata:
  name: #@ "{}-worker-0".format(data.values.CLUSTER_NAME)
spec:
  template:
    spec:
      #@overlay/match missing_ok=True
      #@ if data.values.VSPHERE_AZ_0:
      failureDomain: #@ data.values.VSPHERE_AZ_0
      #@ end
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
  #@overlay/match missing_ok=True
  annotations:
    vmTemplateMoid: #@ data.values.VSPHERE_TEMPLATE_MOID
spec:
  template:
    spec:
      cloneMode:  #@ data.values.VSPHERE_CLONE_MODE
      datacenter: #@ data.values.VSPHERE_DATACENTER
      datastore: #@ data.values.VSPHERE_DATASTORE
      storagePolicyName: #@ data.values.VSPHERE_STORAGE_POLICY_ID
      diskGiB: #@ data.values.VSPHERE_WORKER_DISK_GIB
      folder: #@ data.values.VSPHERE_FOLDER
      memoryMiB: #@ data.values.VSPHERE_WORKER_MEM_MIB
      network:
        devices:
          #@overlay/match by=overlay.index(0)
          #@overlay/replace
          - networkName: #@ data.values.VSPHERE_NETWORK
            #@ if data.values.WORKER_NODE_NAMESERVERS:
            nameservers: #@ data.values.WORKER_NODE_NAMESERVERS.replace(" ", "").split(",")
            #@ end
            #@ if data.values.TKG_IP_FAMILY == "ipv6":
            dhcp6: true
            #@ elif data.values.TKG_IP_FAMILY in ["ipv4,ipv6", "ipv6,ipv4"]:
            dhcp4: true
            dhcp6: true
            #@ else:
            dhcp4: true
            #@ end
      numCPUs: #@ data.values.VSPHERE_WORKER_NUM_CPUS
      resourcePool: #@ data.values.VSPHERE_RESOURCE_POOL
      server: #@ data.values.VSPHERE_SERVER
      template: #@ data.values.VSPHERE_TEMPLATE
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: #@ "{}-md-2".format(data.values.CLUSTER_NAME)
  #@overlay/match missing_ok=True
  annotations:
    vmTemplateMoid: #@ data.values.VSPHERE_TEMPLATE_MOID
spec:
  template:
    spec:
      cloneMode:  #@ data.values.VSPHERE_CLONE_MODE
      datacenter: #@ data.values.VSPHERE_DATACENTER
      datastore: #@ data.values.VSPHERE_DATASTORE
      storagePolicyName: #@ data.values.VSPHERE_STORAGE_POLICY_ID
      diskGiB: #@ data.values.VSPHERE_WORKER_DISK_GIB
      folder: #@ data.values.VSPHERE_FOLDER
      memoryMiB: #@ data.values.VSPHERE_WORKER_MEM_MIB
      network:
        devices:
          #@overlay/match by=overlay.index(0)
          #@overlay/replace
          - networkName: #@ data.values.VSPHERE_NETWORK
            #@ if data.values.WORKER_NODE_NAMESERVERS:
            nameservers: #@ data.values.WORKER_NODE_NAMESERVERS.replace(" ", "").split(",")
            #@ end
            #@ if data.values.TKG_IP_FAMILY == "ipv6":
            dhcp6: true
            #@ elif data.values.TKG_IP_FAMILY in ["ipv4,ipv6", "ipv6,ipv4"]:
            dhcp4: true
            dhcp6: true
            #@ else:
            dhcp4: true
            #@ end
      numCPUs: #@ data.values.VSPHERE_WORKER_NUM_CPUS
      resourcePool: #@ data.values.VSPHERE_RESOURCE_POOL
      server: #@ data.values.VSPHERE_SERVER
      template: #@ data.values.VSPHERE_TEMPLATE
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
  name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
spec:
  clusterName: #@ data.values.CLUSTER_NAME
  replicas: #@ data.values.WORKER_MACHINE_COUNT_1
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
        node-pool: #@ "{}-worker-pool".format(data.values.CLUSTER_NAME)
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
      clusterName: #@ data.values.CLUSTER_NAME
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineTemplate
        name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
      version: #@ data.values.KUBERNETES_VERSION
      #@ if data.values.VSPHERE_AZ_1:
      failureDomain: #@ data.values.VSPHERE_AZ_1
      #@ end
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
  name: #@ "{}-md-2".format(data.values.CLUSTER_NAME)
spec:
  clusterName: #@ data.values.CLUSTER_NAME
  replicas: #@ data.values.WORKER_MACHINE_COUNT_2
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
        node-pool: #@ "{}-worker-pool".format(data.values.CLUSTER_NAME)
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: #@ "{}-md-2".format(data.values.CLUSTER_NAME)
      clusterName: #@ data.values.CLUSTER_NAME
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineTemplate
        name: #@ "{}-md-2".format(data.values.CLUSTER_NAME)
      version: #@ data.values.KUBERNETES_VERSION
      #@ if data.values.VSPHERE_AZ_2:
      failureDomain: #@ data.values.VSPHERE_AZ_2
      #@ end
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
  namespace: '${ NAMESPACE }'
spec:
  template:
    spec:
      useExperimentalRetryJoin: true
      joinConfiguration:
        nodeRegistration:
          criSocket: /var/run/containerd/containerd.sock
          kubeletExtraArgs:
            cloud-provider: external
            tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
          name: '{{ ds.meta_data.hostname }}'
      preKubeadmCommands:
        - hostname "{{ ds.meta_data.hostname }}"
        - echo "::1         ipv6-localhost ipv6-loopback" >/etc/hosts
        - echo "127.0.0.1   localhost" >>/etc/hosts
        - echo "127.0.0.1   {{ ds.meta_data.hostname }}" >>/etc/hosts
        - echo "{{ ds.meta_data.hostname }}" >/etc/hostname
      files: []
      users:
        - name: capv
          sshAuthorizedKeys:
            - #@ data.values.VSPHERE_SSH_AUTHORIZED_KEY
          sudo: ALL=(ALL) NOPASSWD:ALL
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: #@ "{}-md-2".format(data.values.CLUSTER_NAME)
  namespace: '${ NAMESPACE }'
spec:
  template:
    spec:
      useExperimentalRetryJoin: true
      joinConfiguration:
        nodeRegistration:
          criSocket: /var/run/containerd/containerd.sock
          kubeletExtraArgs:
            cloud-provider: external
            tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
          name: '{{ ds.meta_data.hostname }}'
      preKubeadmCommands:
        - hostname "{{ ds.meta_data.hostname }}"
        - echo "::1         ipv6-localhost ipv6-loopback" >/etc/hosts
        - echo "127.0.0.1   localhost" >>/etc/hosts
        - echo "127.0.0.1   {{ ds.meta_data.hostname }}" >>/etc/hosts
        - echo "{{ ds.meta_data.hostname }}" >/etc/hostname
      files: []
      users:
        - name: capv
          sshAuthorizedKeys:
            - #@ data.values.VSPHERE_SSH_AUTHORIZED_KEY
          sudo: ALL=(ALL) NOPASSWD:ALL
#@ end
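
Save the overlay into the Tanzu CLI’s ytt directory for the vSphere provider so it is picked up when the cluster manifest is generated. A sketch, assuming the default Tanzu CLI configuration path (adjust for your TKG version):

cp az-overlay.yaml ~/.config/tanzu/tkg/providers/infrastructure-vsphere/ytt/az-overlay.yaml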

Deploy a TKG cluster into a multi-AZ topology

To deploy a TKG cluster that spreads its worker nodes over multiple AZs, we need to add some key-value pairs to the cluster config file.

Below is an example for my cluster config file – tkg-hugo.yaml.

The new key value pairs are described in the table below.

| Parameter | Specification | Details |
| --- | --- | --- |
| VSPHERE_REGION | k8s-region | Must be the same as the configuration in the vsphere-zones.yaml file |
| VSPHERE_ZONE | k8s-zone | Must be the same as the configuration in the vsphere-zones.yaml file |
| VSPHERE_AZ_0, VSPHERE_AZ_1, VSPHERE_AZ_2 | az-1, az-2, az-3 | Must be the same as the configuration in the vsphere-zones.yaml file |
| WORKER_MACHINE_COUNT | 3 | The total number of worker nodes for the cluster, distributed in a round-robin fashion across the AZs specified |
A note on WORKER_MACHINE_COUNT when using CLUSTER_PLAN: dev instead of prod: if you change the az-overlay.yaml condition from #@ if data.values.CLUSTER_PLAN == "prod" to #@ if data.values.CLUSTER_PLAN == "dev", then WORKER_MACHINE_COUNT becomes the number of workers per AZ. So if you set this number to 3 in a three-AZ topology, you would end up with a TKG cluster with nine workers!

Here is my tkg-hugo.yaml cluster config file.

CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: tkg-hugo
CLUSTER_PLAN: prod
ENABLE_CEIP_PARTICIPATION: 'false'
ENABLE_MHC: 'true'
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: vsphere
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: false
DEPLOY_TKG_ON_VSPHERE7: 'true'
VSPHERE_DATACENTER: /home.local
VSPHERE_DATASTORE: lun02
VSPHERE_FOLDER: /home.local/vm/tkg-vsphere-workload
VSPHERE_NETWORK: /home.local/network/tkg-workload
VSPHERE_PASSWORD: <encoded:snipped>
VSPHERE_RESOURCE_POOL: /home.local/host/cluster/Resources/tkg-vsphere-workload
VSPHERE_SERVER: vcenter.vmwire.com
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa <snipped> administrator@vsphere.local
VSPHERE_USERNAME: administrator@vsphere.local
CONTROLPLANE_SIZE: small
WORKER_MACHINE_COUNT: 3
WORKER_SIZE: small
VSPHERE_INSECURE: 'true'
ENABLE_AUDIT_LOGGING: 'true'
ENABLE_DEFAULT_STORAGE_CLASS: 'false'
ENABLE_AUTOSCALER: 'false'
AVI_CONTROL_PLANE_HA_PROVIDER: 'true'
VSPHERE_REGION: k8s-region
VSPHERE_ZONE: k8s-zone
VSPHERE_AZ_0: az-1
VSPHERE_AZ_1: az-2
VSPHERE_AZ_2: az-3
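
With the overlay in place and the config file completed, create the cluster with the Tanzu CLI. The cluster name matches CLUSTER_NAME in the file:

tanzu cluster create tkg-hugo --file tkg-hugo.yaml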

Deploy the k8s-local-ssd Storage Class

Below is my storageclass-k8s-local-ssd.yaml.

Note that parameters.storagePolicyName is set to k8s-local-ssd, which is the same as the name of the storage policy for the local storage. All three of the local VMFS datastores that are backed by the local SSD drives are members of this storage policy.

Note that the volumeBindingMode is set to WaitForFirstConsumer.

Instead of creating a volume immediately, the WaitForFirstConsumer setting instructs the volume provisioner to wait until a pod using the associated PVC runs through scheduling. In contrast with the Immediate volume binding mode, when the WaitForFirstConsumer setting is used, the Kubernetes scheduler drives the decision of which failure domain to use for volume provisioning using the pod policies.

This guarantees that the pod and its volume are always in the same AZ (ESXi host).

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: k8s-local-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.vsphere.vmware.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  storagePolicyName: k8s-local-ssd
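
Apply the storage class and confirm it is registered as the default:

kubectl apply -f storageclass-k8s-local-ssd.yaml
kubectl get storageclass k8s-local-ssd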

Deploy a workload that uses Topology Aware Volume Provisioning

Below is a statefulset that deploys three pods running nginx. It configures two persistent volumes, one for www and another for logs. Both of these volumes will be provisioned onto the same ESXi host where the pod is running. The statefulset also runs an initContainer that downloads a simple html file from my repo and copies it to the www mount point (/usr/share/nginx/html).

You can see under spec.affinity.nodeAffinity how the statefulset uses the topology.

The statefulset then exposes the nginx app using the nginx-service, which uses the Gateway API that I wrote about in a previous blog post.

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: default
  labels:
    ako.vmware.com/gateway-name: gateway-tkg-workload-vip
    ako.vmware.com/gateway-namespace: default
spec:
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  serviceName: nginx-service
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.csi.vmware.com/k8s-zone
                operator: In
                values:
                - az-1
                - az-2
                - az-3
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: topology.csi.vmware.com/k8s-zone
      terminationGracePeriodSeconds: 10
      initContainers:
      - name: install
        image: busybox
        command:
        - wget
        - "-O"
        - "/www/index.html"
        - https://raw.githubusercontent.com/hugopow/cse/main/index.html
        volumeMounts:
        - name: www
          mountPath: "/www"
      containers:
        - name: nginx
          image: k8s.gcr.io/nginx-slim:0.8
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
            - name: logs
              mountPath: /logs
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: k8s-local-ssd
        resources:
          requests:
            storage: 2Gi
    - metadata:
        name: logs
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: k8s-local-ssd
        resources:
          requests:
            storage: 1Gi
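
Save the manifest and apply it, then watch the pods and PVCs come up. The file name statefulset.yaml is just an example:

kubectl apply -f statefulset.yaml
kubectl get pods,pvc -o wide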

What if you wanted to use more than three availability zones?

Some notes here on what I experienced during my testing.

The TKG cluster config has the following three lines to specify the names of the AZs that you want to use; these are passed to the Tanzu CLI, which deploys your TKG cluster using the ytt overlay file. However, the Tanzu CLI only supports a total of three AZs.

VSPHERE_AZ_0: az-1
VSPHERE_AZ_1: az-2
VSPHERE_AZ_2: az-3

If you wanted to use more than three AZs, then you would have to remove these three lines from the TKG cluster config and change the ytt overlay to not use the VSPHERE_AZ_# variables but to hard code the AZs into the ytt overlay file instead.

To do this, replace the following:

      #@ if data.values.VSPHERE_AZ_2:
      failureDomain: #@ data.values.VSPHERE_AZ_0
      #@ end

with the following:

      failureDomain: az-2

and create an additional block of MachineDeployment and KubeadmConfigTemplate for each additional AZ that you need.
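
For example, a MachineDeployment block for a hypothetical fourth AZ would look like the sketch below; the remaining fields and the matching KubeadmConfigTemplate follow the existing -md-1 and -md-2 blocks above, and the names are assumptions:

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: #@ "{}-md-3".format(data.values.CLUSTER_NAME)
spec:
  clusterName: #@ data.values.CLUSTER_NAME
  template:
    spec:
      #! other fields as in the -md-1 and -md-2 MachineDeployments above
      failureDomain: az-4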

Summary

Below are screenshots and the resulting deployed objects after running kubectl apply -f against the manifests above.

kubectl get nodes
NAME                             STATUS   ROLES                  AGE     VERSION
tkg-hugo-md-0-7d455b7488-d6jrl   Ready    <none>                 3h23m   v1.22.5+vmware.1
tkg-hugo-md-1-bc76659f7-cntn4    Ready    <none>                 3h23m   v1.22.5+vmware.1
tkg-hugo-md-2-6bb75968c4-mnrk5   Ready    <none>                 3h23m   v1.22.5+vmware.1

You can see that the worker nodes are distributed across the ESXi hosts as per our vsphere-zones.yaml and also our az-overlay.yaml files.

kubectl get po -o wide
NAME    READY   STATUS    RESTARTS   AGE     IP                NODE                             NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          3h14m   100.124.232.195   tkg-hugo-md-2-6bb75968c4-mnrk5   <none>           <none>
web-1   1/1     Running   0          3h13m   100.122.148.67    tkg-hugo-md-1-bc76659f7-cntn4    <none>           <none>
web-2   1/1     Running   0          3h12m   100.108.145.68    tkg-hugo-md-0-7d455b7488-d6jrl   <none>           <none>

You can see that each pod is placed on a separate worker node.

kubectl get csinodes -o jsonpath='{range .items[*]}{.metadata.name} {.spec}{"\n"}{end}'
tkg-hugo-md-0-7d455b7488-d6jrl {"drivers":[{"allocatable":{"count":59},"name":"csi.vsphere.vmware.com","nodeID":"tkg-hugo-md-0-7d455b7488-d6jrl","topologyKeys":["topology.csi.vmware.com/k8s-region","topology.csi.vmware.com/k8s-zone"]}]}
tkg-hugo-md-1-bc76659f7-cntn4 {"drivers":[{"allocatable":{"count":59},"name":"csi.vsphere.vmware.com","nodeID":"tkg-hugo-md-1-bc76659f7-cntn4","topologyKeys":["topology.csi.vmware.com/k8s-region","topology.csi.vmware.com/k8s-zone"]}]}
tkg-hugo-md-2-6bb75968c4-mnrk5 {"drivers":[{"allocatable":{"count":59},"name":"csi.vsphere.vmware.com","nodeID":"tkg-hugo-md-2-6bb75968c4-mnrk5","topologyKeys":["topology.csi.vmware.com/k8s-region","topology.csi.vmware.com/k8s-zone"]}]}

We can see that the CSI driver has correctly configured the worker nodes with the topologyKeys that enables the topology aware volume provisioning.

kubectl get pvc -o wide
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE     VOLUMEMODE
logs-web-0   Bound    pvc-13cf4150-db60-4c13-9ee2-cbc092dba782   1Gi        RWO            k8s-local-ssd   3h18m   Filesystem
logs-web-1   Bound    pvc-e99cfe33-9fa4-46d8-95f8-8a71f4535b15   1Gi        RWO            k8s-local-ssd   3h17m   Filesystem
logs-web-2   Bound    pvc-6bd51eed-e0aa-4489-ac0a-f546dadcee16   1Gi        RWO            k8s-local-ssd   3h17m   Filesystem
www-web-0    Bound    pvc-8f46420a-41c4-4ad3-97d4-5becb9c45c94   2Gi        RWO            k8s-local-ssd   3h18m   Filesystem
www-web-1    Bound    pvc-c3c9f551-1837-41aa-b24f-f9dc6fdb9063   2Gi        RWO            k8s-local-ssd   3h17m   Filesystem
www-web-2    Bound    pvc-632a9f81-3e9d-492b-847a-9316043a2d47   2Gi        RWO            k8s-local-ssd   3h17m   Filesystem
kubectl get pv -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.claimRef.name}{"\t"}{.spec.nodeAffinity}{"\n"}{end}'
pvc-13cf4150-db60-4c13-9ee2-cbc092dba782        logs-web-0      {"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.csi.vmware.com/k8s-region","operator":"In","values":["cluster"]},{"key":"topology.csi.vmware.com/k8s-zone","operator":"In","values":["az-3"]}]}]}}
pvc-632a9f81-3e9d-492b-847a-9316043a2d47        www-web-2       {"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.csi.vmware.com/k8s-region","operator":"In","values":["cluster"]},{"key":"topology.csi.vmware.com/k8s-zone","operator":"In","values":["az-1"]}]}]}}
pvc-6bd51eed-e0aa-4489-ac0a-f546dadcee16        logs-web-2      {"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.csi.vmware.com/k8s-region","operator":"In","values":["cluster"]},{"key":"topology.csi.vmware.com/k8s-zone","operator":"In","values":["az-1"]}]}]}}
pvc-8f46420a-41c4-4ad3-97d4-5becb9c45c94        www-web-0       {"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.csi.vmware.com/k8s-region","operator":"In","values":["cluster"]},{"key":"topology.csi.vmware.com/k8s-zone","operator":"In","values":["az-3"]}]}]}}
pvc-c3c9f551-1837-41aa-b24f-f9dc6fdb9063        www-web-1       {"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.csi.vmware.com/k8s-region","operator":"In","values":["cluster"]},{"key":"topology.csi.vmware.com/k8s-zone","operator":"In","values":["az-2"]}]}]}}
pvc-e99cfe33-9fa4-46d8-95f8-8a71f4535b15        logs-web-1      {"required":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"topology.csi.vmware.com/k8s-zone","operator":"In","values":["az-2"]},{"key":"topology.csi.vmware.com/k8s-region","operator":"In","values":["cluster"]}]}]}}

Here we see the placement for the persistent volumes within the AZs and they also align to the right worker node.

k get no tkg-hugo-md-0-7d455b7488-d6jrl -o yaml | grep topology.kubernetes.io/zone:
topology.kubernetes.io/zone: az-1
k get no tkg-hugo-md-1-bc76659f7-cntn4 -o yaml | grep topology.kubernetes.io/zone:
topology.kubernetes.io/zone: az-2
k get no tkg-hugo-md-2-6bb75968c4-mnrk5 -o yaml | grep topology.kubernetes.io/zone:
topology.kubernetes.io/zone: az-3
k get volumeattachments.storage.k8s.io
NAME                                                                   ATTACHER                 PV                                         NODE                             ATTACHED   AGE
csi-476b244713205d0d4d4e13da1a6bd2beec49ac90fbd4b64c090ffba8468f6479   csi.vsphere.vmware.com   pvc-c3c9f551-1837-41aa-b24f-f9dc6fdb9063   tkg-hugo-md-1-bc76659f7-cntn4    true       9h
csi-5a759811557125917e3b627993061912386f4d2e8fb709e85fc407117138b178   csi.vsphere.vmware.com   pvc-8f46420a-41c4-4ad3-97d4-5becb9c45c94   tkg-hugo-md-2-6bb75968c4-mnrk5   true       9h
csi-6016904b0ac4ac936184e95c8ff0b3b8bebabb861a99b822e6473c5ee1caf388   csi.vsphere.vmware.com   pvc-6bd51eed-e0aa-4489-ac0a-f546dadcee16   tkg-hugo-md-0-7d455b7488-d6jrl   true       9h
csi-c5b9abcc05d7db5348493952107405b557d7eaa0341aa4e952457cf36f90a26d   csi.vsphere.vmware.com   pvc-13cf4150-db60-4c13-9ee2-cbc092dba782   tkg-hugo-md-2-6bb75968c4-mnrk5   true       9h
csi-df68754411ab34a5af1c4014db9e9ba41ee216d0f4ec191a0d191f07f99e3039   csi.vsphere.vmware.com   pvc-e99cfe33-9fa4-46d8-95f8-8a71f4535b15   tkg-hugo-md-1-bc76659f7-cntn4    true       9h
csi-f48a7db32aafb2c76cc22b1b533d15d331cd14c2896b20cfb4d659621fd60fbc   csi.vsphere.vmware.com   pvc-632a9f81-3e9d-492b-847a-9316043a2d47   tkg-hugo-md-0-7d455b7488-d6jrl   true       9h

And finally, some other screenshots to show the PVCs in vSphere.

ESX1

ESX2

ESX3

Deploying Kubeapps on TKG in vCloud Director Clouds

Kubeapps is a web-based UI for deploying and managing applications in Kubernetes clusters. This guide shows how you can deploy Kubeapps into your TKG clusters deployed in VMware Cloud Director.

With Kubeapps you can browse Helm charts from public or private repositories, deploy and upgrade applications, and manage them from a web UI.

Pre-requisites:

  • A Kubernetes cluster deployed in VCD
  • Avi set up for VCD to provide L4 load balancing for Kubernetes services
  • NSX-T set up for VCD
  • A default storage class defined for your Kubernetes cluster
  • Helm installed on your workstation; if using Photon OS, it’s already installed

Step 1: Install KubeApps

helm repo add bitnami https://charts.bitnami.com/bitnami
kubectl create namespace kubeapps
helm install kubeapps --namespace kubeapps bitnami/kubeapps
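
It can take a few minutes for the Kubeapps pods to pull and start; check that everything is running before continuing:

kubectl get pods -n kubeapps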

Step 2: Create demo credentials

kubectl create --namespace default serviceaccount kubeapps-operator
kubectl create clusterrolebinding kubeapps-operator --clusterrole=cluster-admin --serviceaccount=default:kubeapps-operator

Step 3: Obtain token to login to KubeApps

kubectl get --namespace default secret $(kubectl get --namespace default serviceaccount kubeapps-operator -o jsonpath='{range .secrets[*]}{.name}{"\n"}{end}' | grep kubeapps-operator-token) -o jsonpath='{.data.token}' -o go-template='{{.data.token | base64decode}}' && echo
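
If you just want to test Kubeapps before exposing it externally, a port-forward from your workstation works too:

kubectl port-forward -n kubeapps svc/kubeapps 8080:80

Then browse to http://127.0.0.1:8080 and paste in the token.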

Step 4: Expose KubeApps using Avi load balancer

k edit svc kubeapps -n kubeapps

Change the line from

type: ClusterIP

to

type: LoadBalancer

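You can also patch the service type without opening an editor:

kubectl patch svc kubeapps -n kubeapps -p '{"spec":{"type":"LoadBalancer"}}'
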
Or, to expose it using the Gateway API instead, add the ako.vmware.com labels to the kubeapps service like this (not supported in VCD clouds):

apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: kubeapps
    meta.helm.sh/release-namespace: kubeapps
  creationTimestamp: "2022-03-26T13:47:45Z"
  labels:
    ako.vmware.com/gateway-name: gateway-tkg-workload-vip
    ako.vmware.com/gateway-namespace: default
    app.kubernetes.io/component: frontend
    app.kubernetes.io/instance: kubeapps
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: kubeapps
    helm.sh/chart: kubeapps-7.8.13
  name: kubeapps
  namespace: kubeapps

Step 5: Log into KubeApps with the token
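
Once the service has an external address, find it, open it in a browser and paste in the token from Step 3:

kubectl get svc kubeapps -n kubeapps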