In a previous post I wrote about how to scale workload cluster control plane and worker nodes vertically. This post explains how to do the same for the TKG Management Cluster nodes.
Scaling vertically means increasing or decreasing the CPU, memory or disk of the nodes, or changing other properties such as the network. Using the Cluster API it is possible to make these changes on the fly; Kubernetes will use rolling updates to apply the changes.
First, change to the TKG Management Cluster context to make the changes.
Scaling Worker Nodes
Run the following to list all the vSphereMachineTemplates.
k get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -A
NAMESPACE    NAME                     AGE
tkg-system   tkg-mgmt-control-plane   20h
tkg-system   tkg-mgmt-worker          20h
These vSphereMachineTemplate resources are immutable, so we will need to export one to a YAML file and edit it to create a new vSphereMachineTemplate.
k get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -n tkg-system tkg-mgmt-worker -o yaml > tkg-mgmt-worker-new.yaml
Now edit the new file named tkg-mgmt-worker-new.yaml
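The copy needs a new name, the immutable metadata stripped, and the new sizing. A minimal sketch of the fields to edit (the name tkg-mgmt-worker-new and the sizes here are examples, not required values):

```yaml
# tkg-mgmt-worker-new.yaml (sketch) - keep the apiVersion and kind as exported
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: tkg-mgmt-worker-new   # a new, unique name
  namespace: tkg-system
  # remove creationTimestamp, resourceVersion, uid and ownerReferences
spec:
  template:
    spec:
      numCPUs: 4        # example: new CPU count
      memoryMiB: 8192   # example: new RAM in MiB
      diskGiB: 40       # example: new disk size
      # ...leave the remaining fields as exported...
```

Apply the new template with kubectl apply -f tkg-mgmt-worker-new.yaml, then point the worker MachineDeployment's infrastructureRef at the new template name (for example with kubectl edit machinedeployment -n tkg-system; the exact MachineDeployment name depends on your cluster) to trigger the rolling replacement.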
Save and quit. You’ll notice that a new VM will immediately start being cloned in vCenter. Wait for it to complete; this new VM is the new worker with the updated CPU and memory sizing, and it will replace the current worker node. After a few minutes, the old worker node will be deleted and you will be left with a new worker node with the updated CPU and RAM specified in the new vSphereMachineTemplate.
Scaling Control Plane Nodes
Scaling the control plane nodes is similar.
k get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -n tkg-system tkg-mgmt-control-plane -o yaml > tkg-mgmt-control-plane-new.yaml
Edit the file and perform the same steps as the worker nodes.
You’ll notice that there is no MachineDeployment for the control plane of a TKG Management Cluster. Instead we have to edit the KubeadmControlPlane resource.
Run this command
k get kubeadmcontrolplane -A
NAMESPACE    NAME                     CLUSTER    INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
tkg-system   tkg-mgmt-control-plane   tkg-mgmt   true          true                   1          1       1         0             21h   v1.22.9+vmware.1
Now we can edit it
k edit kubeadmcontrolplane -n tkg-system tkg-mgmt-control-plane
Change the section under spec.machineTemplate.infrastructureRef, around line 106.
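For reference, a sketch of that section after the change, where tkg-mgmt-control-plane-new is the name given to the copied template:

```yaml
spec:
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: VSphereMachineTemplate
      name: tkg-mgmt-control-plane-new   # point at the new template
      namespace: tkg-system
```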
Save the file. You’ll notice that another VM will start cloning and eventually you’ll have a new control plane node up and running. This new control plane node will replace the older one. It will take longer than the worker node so be patient.
Load balancer service requests for application services default to using a two-arm load balancer with NSX Advanced Load Balancer (Avi) in Container Service Extension (CSE) provisioned Tanzu Kubernetes Grid (TKG) clusters deployed in VMware Cloud Director (VCD).
VCD tells NSX-T to create a DNAT towards an internal-only IP range of 192.168.8.x. This may be undesirable for some customers, and it is now possible to remove the need for this and just use a one-arm load balancer instead.
This capability is only available with VCD 10.4.x; this support was not available in prior versions of VCD.
The requirements are:
CSE 4.0
VCD 10.4
Avi configured for VCD
A TKG cluster provisioned by the CSE UI.
If you’re still running VCD 10.3.x then this blog article is irrelevant.
The vcloud-ccm-configmap config map stores vcloud-ccm-config.yaml, which is used by the vmware-cloud-director-ccm deployment.
Step 1 – make a copy of the vcloud-ccm-configmap
k get cm -n kube-system vcloud-ccm-configmap -o yaml
Make a copy of the config map to edit it and then apply, since the current config map is immutable.
k get cm -n kube-system vcloud-ccm-configmap -o yaml > vcloud-ccm-configmap.yaml
Step 2 – Edit the vcloud-ccm-configmap
Edit the file: delete the oneArm: block under data (including its startIP and endIP lines) and change the value of the key enableVirtualServiceSharedIP to true. Ignore the rest of the file.
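The same edit can be sketched with sed against a hypothetical excerpt of the exported file; the key names below follow the description above, but check them against your own vcloud-ccm-configmap.yaml before scripting this:

```shell
# Hypothetical excerpt of the exported config map copy (sample data only).
cat > vcloud-ccm-config-sample.yaml << 'EOF'
loadbalancer:
  oneArm:
    startIP: "192.168.8.2"
    endIP: "192.168.8.100"
  enableVirtualServiceSharedIP: false
EOF

# Remove the one-arm IP range and enable the shared virtual service IP.
sed -i \
  -e '/oneArm:/d' -e '/startIP:/d' -e '/endIP:/d' \
  -e 's/enableVirtualServiceSharedIP: false/enableVirtualServiceSharedIP: true/' \
  vcloud-ccm-config-sample.yaml

cat vcloud-ccm-config-sample.yaml
```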
To apply the new config map, you need to delete the old configmap first.
k delete cm -n kube-system vcloud-ccm-configmap
configmap "vcloud-ccm-configmap" deleted
Apply the new config map with the yaml file that you just edited.
k apply -f vcloud-ccm-configmap.yaml
configmap/vcloud-ccm-configmap created
To finalize the change, take a backup of the vmware-cloud-director-ccm deployment and then delete it so that it can pick up the new config map.
You can check the config map that this deployment uses by typing:
k get deploy -n kube-system vmware-cloud-director-ccm -o yaml
Step 4 – Redeploy the vmware-cloud-director-ccm deployment
Take a backup of the vmware-cloud-director-ccm deployment by typing:
k get deploy -n kube-system vmware-cloud-director-ccm -o yaml > vmware-cloud-director-ccm.yaml
No need to edit this time. Now delete the deployment:
k delete deploy -n kube-system vmware-cloud-director-ccm
You can now recreate the deployment from the yaml file:
k apply -f vmware-cloud-director-ccm.yaml
deployment.apps/vmware-cloud-director-ccm created
Now when you deploy an application and request a load balancer service, NSX ALB (Avi) will route the external VIP IP towards the k8s worker nodes, instead of to the NSX-T DNAT segment on 192.168.2.x first.
Step 5 – Deploy a load balancer service
k apply -f https://raw.githubusercontent.com/hugopow/tkg-dev/main/web-statefulset.yaml
You’ll notice a few things happening with this example. A new statefulset with one replica is scheduled with an nginx pod. The statefulset also requests a 1 GiB PVC to store the website. A load balancer service is also requested.
Note that there is no DNAT setup on this tenant’s NAT services, this is because after the config map change, the vmware-cloud-director cloud controller manager is not using a two-arm load balancer architecture anymore, therefore no need to do anything with NSX-T NAT rules.
If you check your NSX ALB settings you’ll notice that it is indeed now using a one-arm configuration, where the external VIP IP address is 10.149.1.113 on port TCP 80. NSX ALB routes that to the two worker nodes with IP addresses 192.168.0.100 and 192.168.0.102 on port TCP 30020.
k get svc -n web-statefulset
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
web-statefulset-service   LoadBalancer   100.66.198.78   10.149.1.113   80:30020/TCP   13m
k get no -o wide
NAME                                        STATUS   ROLES    AGE     VERSION            INTERNAL-IP     EXTERNAL-IP     OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
tkg-1-worker-node-pool-1-68c67d5fd6-c79kr   Ready    <none>   5h      v1.22.9+vmware.1   192.168.0.102   192.168.0.102   Ubuntu 20.04.4 LTS   5.4.0-109-generic   containerd://1.5.11
tkg-1-worker-node-pool-2-799d6bccf5-8vj7l   Ready    <none>   4h46m   v1.22.9+vmware.1   192.168.0.100   192.168.0.100   Ubuntu 20.04.4 LTS   5.4.0-109-generic   containerd://1.5.11
For those partners that have been testing the beta, you’ll need to remove all traces of it before you can install the GA version. VMware does not support upgrading or migrating from beta builds to GA builds.
This is a post to help you clean up your VMware Cloud Director environment in preparation for the GA build of CSE 4.0.
If you don’t clean up, when you try to configure CSE again with the CSE Management wizard, you’ll see the message below:
“Server configuration entity already exists.”
Delete CSE Roles
First delete all the CSE Roles that the beta has setup, the GA version of CSE will recreate these for you when you use the CSE management wizard. Don’t forget to assign the new role to your CSE service account when you deploy the CSE GA OVA.
Use the Postman Collection to clean up
I’ve included a Postman collection on my Github account, available here.
Hopefully it is self-explanatory: authenticate against the VCD API, then run each API request in order, making sure you obtain the entity and entityType IDs before you delete.
If you’re unable to delete the entity or entityTypes, you may need to delete all of the CSE clusters first; that means cleaning up all PVCs, PVs and deployments, and then the clusters themselves.
Deploy CSE GA Normally
You’ll now be able to use the Configure Management wizard and deploy CSE 4.0 GA as normal.
Known Issues
If you’re unable to delete any of these entities then run a POST using /resolve.
For example, https://vcd.vmwire.com/api-explorer/provider#/definedEntity/resolveDefinedEntity
Once it is resolved, you can go ahead and delete the entity.
This post summarizes how you can migrate the VMware Cloud Director database from PostgreSQL running in the VCD appliance into a PostgreSQL pod running in Kubernetes, and then create new VCD cells running as pods in Kubernetes to run the VCD services. In summary, modernizing VCD into a modern application.
I wanted to experiment with VMware Cloud Director to see if it would run in Kubernetes. One of the reasons for this is to reduce resource consumption in my home lab. The VCD appliance is quite a resource-hungry VM, needing a minimum of 2 vCPUs and 6GB of RAM. Running VCD in Kubernetes would definitely reduce this and free up much needed RAM for other applications. Running this workload in Kubernetes also brings other benefits, such as faster deployment, higher availability, easier lifecycle management and operations, and additional benefits from the ecosystem such as observability tools.
Here’s a view of the current VCD appliance in the portal. 172.16.1.34 is the IP of the appliance, 172.16.1.0/27 is the network for the NSX-T segment that I’ve created for the VCD DMZ network. At the end of this post, you’ll see VCD running in Kubernetes pods with IP addresses assigned by the CNI instead.
Tanzu Kubernetes Grid Shared Services Cluster
I am using a Tanzu Kubernetes Grid cluster set up for shared services. It’s the ideal place to run applications that, in the virtual machine world, would have been running in a traditional vSphere Management Cluster. I also run the Container Service Extension and App Launchpad Kubernetes pods in this cluster.
Step 1. Deploy PostgreSQL with Kubeapps into a Kubernetes cluster
If you have Kubeapps, this is the easiest way to deploy PostgreSQL.
Copy my settings below to create a PostgreSQL database server and the vcloud user and database that are required for the database restore.
Step 1. Alternatively, use Helm directly.
# Create database server using KubeApps or Helm, vcloud user with password
helm repo add bitnami https://charts.bitnami.com/bitnami
# Pull the chart, unzip then edit values.yaml
helm pull bitnami/postgresql
tar zxvf postgresql-11.1.11.tgz
helm install postgresql bitnami/postgresql -f /home/postgresql/values.yaml -n vmware-cloud-director
# Expose postgres service using load balancer
k expose pod -n vmware-cloud-director postgresql-primary-0 --type=LoadBalancer --name postgresql-public
# Get the IP address of the load balancer service
k get svc -n vmware-cloud-director postgresql-public
# Connect to database as postgres user from VCD appliance to test connection
psql --host 172.16.4.70 -U postgres -p 5432
# Type password you used when you deployed postgresql
# Quit
\q
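If you went the Helm route, the vcloud user and database settings can be sketched in the values.yaml for the Bitnami chart; the key names below follow the bitnami/postgresql chart's auth block, and the passwords are placeholders:

```yaml
# values.yaml (sketch) for bitnami/postgresql
auth:
  postgresPassword: Vmware1!   # placeholder admin password
  username: vcloud             # user required by the VCD database restore
  password: Vmware1!           # placeholder password for the vcloud user
  database: vcloud             # database required by the VCD database restore
architecture: replication      # optional: primary plus a read replica
```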
Step 2. Backup database from VCD appliance and restore to PostgreSQL Kubernetes pod
Log into the VCD appliance using SSH.
# Stop vcd services on all VCD appliances
service vmware-vcd stop
# Backup database and important files on VCD appliance
/opt/vmware/appliance/bin/create_backup.sh
# Unzip the zip file into /opt/vmware/vcloud-director/data/transfer/backups
# Restore database using pg_dump backup file. Do this from the VCD appliance as it already has the postgres tools installed.
pg_restore --host 172.16.4.70 -U postgres -p 5432 -C -d postgres /opt/vmware/vcloud-director/data/transfer/backups/vcloud-database.sql
# Edit responses.properties and change IP address of database server from load balancer IP to the assigned FQDN for the postgresql pod, e.g. postgresql-primary.vmware-cloud-director.svc.cluster.local
# Shut down the VCD appliance; it's no longer needed
Step 3. Deploy Helm Chart for VCD
# Pull the Helm Chart
helm pull oci://harbor.vmwire.com/library/vmware-cloud-director
# Uncompress the Helm Chart
tar zxvf vmware-cloud-director-0.5.0.tgz
# Edit the values.yaml to suit your needs
# Deploy the Helm Chart
helm install vmware-cloud-director vmware-cloud-director --version 0.5.0 -n vmware-cloud-director -f /home/vmware-cloud-director/values.yaml
# Wait for about five minutes for the installation to complete
# Monitor logs
k logs -f -n vmware-cloud-director vmware-cloud-director-0
Known Issues
If you see an error such as:
Error starting application: Unable to create marker file in the transfer spooling area: VfsFile[fileObject=file:///opt/vmware/vcloud-director/data/transfer/cells/4c959d7c-2e3a-4674-b02b-c9bbc33c5828]
This is due to the transfer share being created by a different vcloud user on the original VCD appliance. That user has a different Linux user ID (normally 1000 or 1001), so we need to change ownership to work with the new vcloud user.
Run the following commands to resolve this issue:
# Launch a bash session into the VCD pod
k exec -it -n vmware-cloud-director vmware-cloud-director-0 -- /bin/bash
# Change ownership of the /transfer share to the vcloud user
chown -R vcloud:vcloud /opt/vmware/vcloud-director/data/transfer
# type exit to quit
exit
Once that’s done, the cell can start and you’ll see the following:
Successfully verified transfer spooling area: VfsFile[fileObject=file:///opt/vmware/vcloud-director/data/transfer]
Cell startup completed in 2m 26s
Accessing VCD
The VCD pod is exposed using a load balancer in Kubernetes. Ports 443 and 8443 are exposed on a single IP, just like how it is configured on the VCD appliance.
Run the following to obtain the new load balancer IP address of VCD.
k get svc -n vmware-cloud-director vmware-cloud-director
Redirect your DNS server record to point to this new IP address for both the HTTP and VMRC services, e.g., 172.16.4.71.
If everything ran successfully, you should now be able to log into VCD. Here’s my VCD instance that I use for my lab environment which was previously running in a VCD appliance, now migrated over to Kubernetes.
Notice that the old cell is now inactive because it is powered off. It can now be removed from VCD and deleted from vCenter.
The pod vmware-cloud-director-0 is now running the VCD application. Notice its assigned IP address of 100.107.74.159. This is the pod’s IP address.
Everything else will work as normal; any UI customizations and TLS certificates are kept just as before the migration. This is because we restored the database and used responses.properties to add the new cells.
Even opening a remote console to a VM will continue to work.
Load Balancer is NSX Advanced LB (Avi)
Avi provides the load balancing services automatically through the Avi Kubernetes Operator (AKO).
AKO automatically configures the services in Avi for you when services are exposed.
Deploy another VCD cell, I mean pod
It is very easy now to scale the VCD by deploying additional replicas.
Edit the values.yaml file and change the replicas number from 1 to 2.
# Upgrade the Helm Chart
helm upgrade vmware-cloud-director vmware-cloud-director --version 0.5.0 -n vmware-cloud-director -f /home/vmware-cloud-director/values.yaml
# Wait for about five minutes for the installation to complete
# Monitor logs
k logs -f -n vmware-cloud-director vmware-cloud-director-1
When the VCD services start up successfully, you’ll notice that the cell will appear in the VCD UI and Avi is also updated automatically with another pool.
We can also see that Avi is load balancing traffic across the two pods.
Deploy as many replicas as you like.
Resource usage
Here’s a very brief overview of what we have deployed so far.
Notice that the two PostgreSQL pods together are only using about 700 MB of RAM. The VCD pods consume much more, but that is still a vast improvement over the 6GB that one appliance needed previously.
High Availability
You can ensure that the VCD pods are scheduled on different Kubernetes worker nodes by using a multi-availability zone topology. To do this, just change the values.yaml.
# Availability zones in deployment.yaml are setup for TKG and must match VsphereFailureDomain and VsphereDeploymentZones
availabilityZones:
enabled: true
This makes sure that if you scale up the vmware-cloud-director statefulset, Kubernetes will not place any two of its pods on the same worker node.
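One way such a deployment.yaml can express this is a pod anti-affinity rule keyed on the zone topology label; the pod label selector below is an assumption about how the chart labels its pods, so adjust it to match your deployment:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: vmware-cloud-director   # assumed pod label
      topologyKey: topology.kubernetes.io/zone
```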
As you can see from the Kubernetes Dashboard output under Resource usage above, vmware-cloud-director-0 and vmware-cloud-director-1 pods are scheduled on different worker nodes.
More importantly, you can see that I have also used the same for the postgresql-primary-0 and postgresql-read-0 pods. These are really important to keep separate in case of failure of a worker node or of an ESX server that the worker node runs on.
Finally
Here are a few screenshots of VCD, CSE and ALP all running in my Shared Services Kubernetes cluster.
Backing up the PostgreSQL database
For Day 2 operations, such as backing up the PostgreSQL database you can use Velero or just take a backup of the database using the pg_dump tool.
Backing up the database with pg_dump using a Docker container
It’s super easy to take a database backup using a Docker container; just make sure you have Docker running on your workstation and that it can reach the load balancer IP address for the PostgreSQL service.
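A sketch of such a backup; the load balancer IP is the one used earlier in this post, and the image tag, password and output path are examples:

```shell
# Run pg_dump from a disposable Bitnami PostgreSQL container,
# writing the backup into the current directory on the workstation.
docker run --rm -v "$(pwd)":/backup -e PGPASSWORD='Vmware1!' \
  bitnami/postgresql:latest \
  pg_dump --host 172.16.4.70 -U postgres -p 5432 -Fc vcloud -f /backup/vcloud-database.bak
```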
For an overview of Kapp, please see this link.
The latest versions as of TKG 1.5.1, February 2022.
Package        Version
cert-manager   1.5.3+vmware.2-tkg.1
contour        1.18.2+vmware.1-tkg.1
prometheus     2.27.0+vmware.2-tkg.1
grafana        7.5.7+vmware.2-tkg.1
Or run the following to see the latest available versions.
tanzu package available list cert-manager.tanzu.vmware.com -A
tanzu package available list contour.tanzu.vmware.com -A
tanzu package available list prometheus.tanzu.vmware.com -A
tanzu package available list grafana.tanzu.vmware.com -A
I’m using ingress with Contour which needs a load balancer to expose the ingress services. Install AKO and NSX Advanced Load Balancer (Avi) by following this previous post.
Install Contour
Create a file named contour-data-values.yaml. This example uses NSX Advanced Load Balancer (Avi).
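The exact contents depend on your TKG version, but a minimal sketch that requests a LoadBalancer service for Envoy (which AKO/Avi will fulfil) might look like this; the key layout follows the TKG Contour package defaults, so verify it against your own data values file:

```yaml
# contour-data-values.yaml (sketch)
infrastructure_provider: vsphere
namespace: tanzu-system-ingress
contour:
  replicas: 2
envoy:
  service:
    type: LoadBalancer   # Avi assigns the VIP via AKO
```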
Generate a Base64 password and edit the grafana-data-values.yaml file to update the default admin password.
echo -n 'Vmware1!' | base64
Also update the TLS configuration to use signed certificates for ingress. It will look something like this.
secret:
  type: "Opaque"
  admin_user: "YWRtaW4="
  admin_password: "Vm13YXJlMSE="
ingress:
  enabled: true
  virtual_host_fqdn: "grafana-tkg-mgmt.vmwire.com"
  prefix: "/"
  servicePort: 80
  #! [Optional] The certificate for the ingress if you want to use your own TLS certificate.
  #! We will issue the certificate by cert-manager when it's empty.
  tlsCertificate:
    #! [Required] the certificate
    tls.crt: |
      -----BEGIN CERTIFICATE-----
      ---snipped---
      -----END CERTIFICATE-----
    #! [Required] the private key
    tls.key: |
      -----BEGIN PRIVATE KEY-----
      ---snipped---
      -----END PRIVATE KEY-----
Since I’m using ingress to expose the Grafana service, also change line 33, from LoadBalancer to ClusterIP. This prevents Kapp from creating an unnecessary service that will consume an IP address.
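After that change, the relevant part of grafana-data-values.yaml would look something like this (key layout per the TKG Grafana package defaults; verify against your own file):

```yaml
grafana:
  service:
    type: ClusterIP   # was LoadBalancer; ingress now fronts Grafana
```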
Since I’m using ingress with the ingress FQDN set to grafana-tkg-mgmt.vmwire.com and TLS enabled, I can now access the Grafana UI at https://grafana-tkg-mgmt.vmwire.com and enjoy a secure connection.
Listing all installed packages
tanzu package installed list -A
Making changes to Contour, Prometheus or Grafana
If you need to make changes to any of the configuration files, you can then update the deployment with the tanzu package installed update command.
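For example, to push an edited data values file back into the Contour installation (the version is the one from the table above; run it in the namespace your package was installed into):

```shell
# Update an installed package after editing its data values file.
tanzu package installed update contour \
  --version 1.18.2+vmware.1-tkg.1 \
  --values-file contour-data-values.yaml
```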
With the vSphere CSI driver version 2.4.1, it is now possible to use local storage with TKG clusters. This is enabled by TKG’s Topology Aware Volume Provisioning capability.
Using local storage has distinct advantages over shared storage, especially when it comes to supporting faster and cheaper storage media for applications that do not benefit from or require the added complexity of having their data replicated by the storage layer. Examples of applications that do not require storage protection (RAID or failures to tolerate) are applications that can achieve data protection at the application level.
With this model, it is possible to present individual SSDs or NVMe drives attached to an ESXi host and configure a local datastore for use with topology aware volume provisioning. Kubernetes can then create persistent volumes and schedule pods that are deployed onto the worker nodes that are on the same ESXi host as the volume. This enables Kubernetes pods to have direct local access to the underlying storage.
Figure 1.
To setup such an environment, it is necessary to go over some of the requirements first.
Deploy Tanzu Kubernetes Clusters to Multiple Availability Zones on vSphere – link
Spread Nodes Across Multiple Hosts in a Single Compute Cluster
Configure Tanzu Kubernetes Plans and Clusters with an overlay that is topology-aware – link
Deploy TKG clusters into a multi-AZ topology
Deploy the k8s-local-ssd storage class
Deploy Workloads with WaitForFirstConsumer Mode in Topology-Aware Environment – link
Before you start
Note that only the CSI driver for vSphere version 2.4.1 supports local storage topology in a multi-AZ topology. To check if you have the correct version in your TKG cluster, run the following.
tanzu package installed get vsphere-csi -n tkg-system
- Retrieving installation details for vsphere-csi... I0224 19:20:29.397702 317993 request.go:665] Waited for 1.03368201s due to client-side throttling, not priority and fairness, request: GET:https://172.16.3.94:6443/apis/secretgen.k14s.io/v1alpha1?timeout=32s
\ Retrieving installation details for vsphere-csi...
NAME: vsphere-csi
PACKAGE-NAME: vsphere-csi.tanzu.vmware.com
PACKAGE-VERSION: 2.4.1+vmware.1-tkg.1
STATUS: Reconcile succeeded
CONDITIONS: [{ReconcileSucceeded True }]
Deploy Tanzu Kubernetes Clusters to Multiple Availability Zones on vSphere
In my example, I am using the Spread Nodes Across Multiple Hosts in a Single Compute Cluster example, each ESXi host is an availability zone (AZ) and the vSphere cluster is the Region.
Figure 1. shows a TKG cluster with three worker nodes, each node is running on a separate ESXi host. Each ESXi host has a local SSD drive formatted with VMFS 6. The topology aware volume provisioner would always place pods and their replicas on separate worker nodes and also any persistent volume claims (PVC) on separate ESXi hosts.
*Note that “cluster” is the name of my vSphere cluster.
Ensure that you’ve set up the correct rules that enforce worker nodes to their respective ESXi hosts. Always use "Must run on hosts in group"; this is very important for local storage topology to work. This is because the worker nodes will be labelled for topology awareness, and if a worker node is accidentally vMotion’d then the CSI driver will not be able to bind the PVC to the worker node.
Below is my vsphere-zones.yaml file.
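The file itself is not reproduced here, but a sketch of one zone, using the CAPV VSphereFailureDomain and VSphereDeploymentZone types, looks like the following; the vCenter address, datacenter, resource pool, folder and group names are placeholders, and you would repeat the pair for az-2 and az-3:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereFailureDomain
metadata:
  name: az-1
spec:
  region:
    name: cluster                 # "cluster" is the name of my vSphere cluster
    type: ComputeCluster
    tagCategory: k8s-region
    autoConfigure: true
  zone:
    name: az-1
    type: HostGroup
    tagCategory: k8s-zone
    autoConfigure: true
  topology:
    datacenter: home.local        # placeholder datacenter name
    computeCluster: cluster
    hosts:
      vmGroupName: workers-group-1   # placeholder VM/Host group names
      hostGroupName: host-group-1
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
  name: az-1
spec:
  server: vcenter.vmwire.com      # placeholder vCenter address
  failureDomain: az-1
  placementConstraint:
    resourcePool: rp-az-1         # placeholder
    folder: tkg                   # placeholder
```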
Note that autoConfigure is set to true, which means that you do not have to tag the cluster or the ESXi hosts yourself; you only need to set up the affinity rules under Cluster > Configure > VM/Host Groups and VM/Host Rules. With autoConfigure: true, CAPV automatically configures the tags and tag categories for you.
Note that Kubernetes does not like parameter names that are not standard. For your vmGroupName and hostGroupName parameters, I suggest using lowercase and dashes instead of periods, for example host-group-3 instead of Host.Group.3. The latter will be rejected.
Configure Tanzu Kubernetes Plans and Clusters with an overlay that is topology-aware
To ensure that this topology can be built by TKG, we first need to create a TKG cluster plan overlay that tells Tanzu what to do when creating worker nodes in a multi-availability zone topology.
Let’s take a look at my az-overlay.yaml file.
Since I have three AZs, I need to create an overlay file that includes the cluster plan for all three AZs.
To deploy a TKG cluster that spreads its worker nodes over multiple AZs, we need to add some key value pairs into the cluster config file.
Below is an example for my cluster config file – tkg-hugo.yaml.
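The file itself is not shown here, but the relevant new key value pairs, matching the table below, can be sketched as:

```yaml
# Extract of tkg-hugo.yaml: the multi-AZ keys
VSPHERE_REGION: k8s-region
VSPHERE_ZONE: k8s-zone
VSPHERE_AZ_0: az-1
VSPHERE_AZ_1: az-2
VSPHERE_AZ_2: az-3
WORKER_MACHINE_COUNT: 3
```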
The new key value pairs are described in the table below.
Parameter                                  Specification      Details
VSPHERE_REGION                             k8s-region         Must be the same as the configuration in the vsphere-zones.yaml file
VSPHERE_ZONE                               k8s-zone           Must be the same as the configuration in the vsphere-zones.yaml file
VSPHERE_AZ_0, VSPHERE_AZ_1, VSPHERE_AZ_2   az-1, az-2, az-3   Must be the same as the configuration in the vsphere-zones.yaml file
WORKER_MACHINE_COUNT                       3                  The number of worker nodes for the cluster; the total is distributed in a round-robin fashion across the AZs specified
A note on WORKER_MACHINE_COUNT when using CLUSTER_PLAN: dev instead of prod.
If you change az-overlay.yaml from @ if data.values.CLUSTER_PLAN == "prod" to @ if data.values.CLUSTER_PLAN == "dev", then WORKER_MACHINE_COUNT becomes the number of workers for each AZ. So if you set this number to 3 in a three-AZ topology, you would end up with a TKG cluster with nine workers!
Note that parameters.storagePolicyName is set to k8s-local-ssd, which is the same as the name of the storage policy for the local storage. All three of the local VMFS datastores that are backed by the local SSD drives are members of this storage policy.
Note that the volumeBindingMode is set to WaitForFirstConsumer.
Instead of creating a volume immediately, the WaitForFirstConsumer setting instructs the volume provisioner to wait until a pod using the associated PVC runs through scheduling. In contrast with the Immediate volume binding mode, when the WaitForFirstConsumer setting is used, the Kubernetes scheduler drives the decision of which failure domain to use for volume provisioning using the pod policies.
This guarantees that the pod and its volume are always on the same AZ (ESXi host).
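Putting the two notes above together, the k8s-local-ssd storage class can be sketched as:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: k8s-local-ssd
provisioner: csi.vsphere.vmware.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  storagePolicyName: k8s-local-ssd
```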
Deploy a workload that uses Topology Aware Volume Provisioning
Below is a statefulset that deploys three pods running nginx. It configures two persistent volumes, one for www and another for log. Both of these volumes are going to be provisioned onto the same ESXi host where the pod is running. The statefulset also runs an initContainer that will download a simple html file from my repo and copy it to the www mount point (/usr/share/nginx/html).
You can see under spec.affinity.nodeAffinity how the statefulset uses the topology.
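That section can be sketched as follows; the topology label key shown is an assumption based on the k8s-zone tag category used earlier, so verify the exact key on your nodes with kubectl get nodes --show-labels:

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.csi.vmware.com/k8s-zone   # assumed label key
            operator: In
            values:
            - az-1
            - az-2
            - az-3
```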
The statefulset then exposes the nginx app using the nginx-service which uses the Gateway API, that I wrote about in a previous blog post.
What if you wanted to use more than three availability zones?
Some notes here on what I experienced during my testing.
The TKG cluster config has the following three lines to specify the names of the AZs that you want to use, which are passed to the Tanzu CLI to deploy your TKG cluster using the ytt overlay file. However, the Tanzu CLI only supports a total of three AZs.
If you wanted to use more than three AZs, then you would have to remove these three lines from the TKG cluster config and change the ytt overlay to not use the VSPHERE_AZ_# variables but to hard code the AZs into the ytt overlay file instead.
To do this replace the following:
#@ if data.values.VSPHERE_AZ_2:
failureDomain: #@ data.values.VSPHERE_AZ_0
#@ end
with the following:
failureDomain: az-2
and create an additional block of MachineDeployment and KubeadmConfigTemplate for each additional AZ that you need.
Summary
Below are screenshots and the resulting deployed objects after running kubectl apply -f to the above.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
tkg-hugo-md-0-7d455b7488-d6jrl Ready <none> 3h23m v1.22.5+vmware.1
tkg-hugo-md-1-bc76659f7-cntn4 Ready <none> 3h23m v1.22.5+vmware.1
tkg-hugo-md-2-6bb75968c4-mnrk5 Ready <none> 3h23m v1.22.5+vmware.1
You can see that the worker nodes are distributed across the ESXi hosts as per our vsphere-zones.yaml and also our az-overlay.yaml files.
kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
web-0 1/1 Running 0 3h14m 100.124.232.195 tkg-hugo-md-2-6bb75968c4-mnrk5 <none> <none>
web-1 1/1 Running 0 3h13m 100.122.148.67 tkg-hugo-md-1-bc76659f7-cntn4 <none> <none>
web-2 1/1 Running 0 3h12m 100.108.145.68 tkg-hugo-md-0-7d455b7488-d6jrl <none> <none>
You can see that each pod is placed on a separate worker node.
kubectl get csinodes -o jsonpath='{range .items[*]}{.metadata.name} {.spec}{"\n"}{end}'
This post describes how to setup Harbor to run on a standalone VM. There are times when you want to do this, such as occasions where your environment does not have internet access or you want to have a local repository running close to your environment.
I found that I was running a lot of TKG deployments against TKG staging builds in my lab and wanted to speed up cluster creation times, so building a local Harbor repository would make things a bit quicker and more reliable.
This post describes how you can setup a Harbor repository on a Photon VM.
# Configuration file of Harbor

# The IP address or hostname to access admin UI and registry service.
# DO NOT use localhost or 127.0.0.1, because Harbor needs to be accessed by external clients.
hostname: harbor.vmwire.com

# http related config
http:
  # port for http, default is 80. If https enabled, this port will redirect to https port
  port: 80

# https related config
https:
  # https port for harbor, default is 443
  port: 443
  # The path of cert and key files for nginx
  certificate: /etc/docker/certs.d/harbor.vmwire.com/harbor.cert
  private_key: /etc/docker/certs.d/harbor.vmwire.com/harbor_key.key

[snipped]
Update line 5 with your harbor instance’s FQDN.
Update lines 17 and 18 with the certificate and private key.
You can leave all the other lines on default.
Install Harbor with the following command:
./install.sh
Check to see if services are running
docker-compose ps
Step 9: Add harbor FQDN to your DNS servers and connect to Harbor.
To upgrade, download the new offline installer and run ./install.sh again.
Photon OS 3 does not support Linux guest customization, unfortunately, so we will use the links below to manually set up the OS with a hostname and static IP address.
Boot the VM, the default credentials are root with password changeme. Change the default password.
Photon 3 has the older repositories, so we will need to update to newer repositories as detailed in this KB article. I’ve included this in the instructions below.
Copy and paste the commands below, or create a bash script from them.
# Update Photon repositories
cd /etc/yum.repos.d/
sed -i 's/dl.bintray.com\/vmware/packages.vmware.com\/photon\/$releasever/g' photon.repo photon-updates.repo photon-extras.repo photon-debuginfo.repo
# If you get errors with the above command, then copy the command from the KB article.
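To sanity-check the substitution before running it against the repo files, you can pipe a sample baseurl line through the same sed expression (the sample line is illustrative, not the exact file content):

```shell
# Dry-run the repo rewrite on one sample line; no files are modified.
# Note: $releasever is literal text in the repo file (tdnf expands it later),
# not a shell variable, which is why the sed script is single-quoted.
echo 'baseurl=https://dl.bintray.com/vmware/photon_updates_$releasever_$basearch' \
  | sed 's/dl.bintray.com\/vmware/packages.vmware.com\/photon\/$releasever/g'
# → baseurl=https://packages.vmware.com/photon/$releasever/photon_updates_$releasever_$basearch
```
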
# Update Photon
tdnf --assumeyes update
# Install dependencies
tdnf --assumeyes install build-essential python3-devel python3-pip git
# Update python3. CSE supports Python 3.7.3 or greater; it does not support Python 3.8 or above.
tdnf --assumeyes update python3
# Prepare cse user and application directories
mkdir -p /opt/vmware/cse
chmod 775 -R /opt
chmod 777 /
groupadd cse
useradd cse -g cse -m -p Vmware1! -d /opt/vmware/cse
chown cse:cse -R /opt
# Run as cse user, add your public ssh key to CSE server
su - cse
mkdir -p ~/.ssh
cat >> ~/.ssh/authorized_keys << EOF
ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQEAhcw67bz3xRjyhPLysMhUHJPhmatJkmPUdMUEZre+MeiDhC602jkRUNVu43Nk8iD/I07kLxdAdVPZNoZuWE7WBjmn13xf0Ki2hSH/47z3ObXrd8Vleq0CXa+qRnCeYM3FiKb4D5IfL4XkHW83qwp8PuX8FHJrXY8RacVaOWXrESCnl3cSC0tA3eVxWoJ1kwHxhSTfJ9xBtKyCqkoulqyqFYU2A1oMazaK9TYWKmtcYRn27CC1Jrwawt2zfbNsQbHx1jlDoIO6FLz8Dfkm0DToanw0GoHs2Q+uXJ8ve/oBs0VJZFYPquBmcyfny4WIh4L0lwzsiAVWJ6PvzF5HMuNcwQ== rsa-key-20210508
EOF
cat >> ~/.bash_profile << EOF
# For Container Service Extension
export CSE_CONFIG=/opt/vmware/cse/config/config.yaml
export CSE_CONFIG_PASSWORD=Vmware1!
source /opt/vmware/cse/python/bin/activate
EOF
# Install CSE in virtual environment
python3 -m venv /opt/vmware/cse/python
source /opt/vmware/cse/python/bin/activate
pip3 install container-service-extension==3.1.1
cse version
source ~/.bash_profile
# Prepare vcd-cli
mkdir -p ~/.vcd-cli
cat > ~/.vcd-cli/profiles.yaml << EOF
extensions:
- container_service_extension.client.cse
EOF
vcd cse version
# Add my Let's Encrypt intermediate and root certs. Use your certificates issued by your CA to enable verify=true with CSE.
cat >> /opt/vmware/cse/python/lib/python3.7/site-packages/certifi/cacert.pem << EOF
-----BEGIN CERTIFICATE-----
MIIFFjCCAv6gAwIBAgIRAJErCErPDBinU/bWLiWnX1owDQYJKoZIhvcNAQELBQAw
TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMjAwOTA0MDAwMDAw
WhcNMjUwOTE1MTYwMDAwWjAyMQswCQYDVQQGEwJVUzEWMBQGA1UEChMNTGV0J3Mg
RW5jcnlwdDELMAkGA1UEAxMCUjMwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEK
AoIBAQC7AhUozPaglNMPEuyNVZLD+ILxmaZ6QoinXSaqtSu5xUyxr45r+XXIo9cP
R5QUVTVXjJ6oojkZ9YI8QqlObvU7wy7bjcCwXPNZOOftz2nwWgsbvsCUJCWH+jdx
sxPnHKzhm+/b5DtFUkWWqcFTzjTIUu61ru2P3mBw4qVUq7ZtDpelQDRrK9O8Zutm
NHz6a4uPVymZ+DAXXbpyb/uBxa3Shlg9F8fnCbvxK/eG3MHacV3URuPMrSXBiLxg
Z3Vms/EY96Jc5lP/Ooi2R6X/ExjqmAl3P51T+c8B5fWmcBcUr2Ok/5mzk53cU6cG
/kiFHaFpriV1uxPMUgP17VGhi9sVAgMBAAGjggEIMIIBBDAOBgNVHQ8BAf8EBAMC
AYYwHQYDVR0lBBYwFAYIKwYBBQUHAwIGCCsGAQUFBwMBMBIGA1UdEwEB/wQIMAYB
Af8CAQAwHQYDVR0OBBYEFBQusxe3WFbLrlAJQOYfr52LFMLGMB8GA1UdIwQYMBaA
FHm0WeZ7tuXkAXOACIjIGlj26ZtuMDIGCCsGAQUFBwEBBCYwJDAiBggrBgEFBQcw
AoYWaHR0cDovL3gxLmkubGVuY3Iub3JnLzAnBgNVHR8EIDAeMBygGqAYhhZodHRw
Oi8veDEuYy5sZW5jci5vcmcvMCIGA1UdIAQbMBkwCAYGZ4EMAQIBMA0GCysGAQQB
gt8TAQEBMA0GCSqGSIb3DQEBCwUAA4ICAQCFyk5HPqP3hUSFvNVneLKYY611TR6W
PTNlclQtgaDqw+34IL9fzLdwALduO/ZelN7kIJ+m74uyA+eitRY8kc607TkC53wl
ikfmZW4/RvTZ8M6UK+5UzhK8jCdLuMGYL6KvzXGRSgi3yLgjewQtCPkIVz6D2QQz
CkcheAmCJ8MqyJu5zlzyZMjAvnnAT45tRAxekrsu94sQ4egdRCnbWSDtY7kh+BIm
lJNXoB1lBMEKIq4QDUOXoRgffuDghje1WrG9ML+Hbisq/yFOGwXD9RiX8F6sw6W4
avAuvDszue5L3sz85K+EC4Y/wFVDNvZo4TYXao6Z0f+lQKc0t8DQYzk1OXVu8rp2
yJMC6alLbBfODALZvYH7n7do1AZls4I9d1P4jnkDrQoxB3UqQ9hVl3LEKQ73xF1O
yK5GhDDX8oVfGKF5u+decIsH4YaTw7mP3GFxJSqv3+0lUFJoi5Lc5da149p90Ids
hCExroL1+7mryIkXPeFM5TgO9r0rvZaBFOvV2z0gp35Z0+L4WPlbuEjN/lxPFin+
HlUjr8gRsI3qfJOQFy/9rKIJR0Y/8Omwt/8oTWgy1mdeHmmjk7j1nYsvC9JSQ6Zv
MldlTTKB3zhThV1+XWYp6rjd5JW1zbVWEkLNxE7GJThEUG3szgBVGP7pSWTUTsqX
nLRbwHOoq7hHwg==
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
-----END CERTIFICATE-----
EOF
# Create service account
vcd login vcd.vmwire.com system administrator -p Vmware1!
cse create-service-role vcd.vmwire.com
# Enter system administrator username and password
# Create VCD service account for CSE
vcd user create --enabled svc-cse Vmware1! "CSE Service Role"
# Create config file
mkdir -p /opt/vmware/cse/config
cat > /opt/vmware/cse/config/config-not-encrypted.conf << EOF
mqtt:
  verify_ssl: false
vcd:
  host: vcd.vmwire.com
  log: true
  password: Vmware1!
  port: 443
  username: administrator
  verify: true
vcs:
- name: vcenter.vmwire.com
  password: Vmware1!
  username: administrator@vsphere.local
  verify: true
service:
  enforce_authorization: false
  legacy_mode: false
  log_wire: false
  no_vc_communication_mode: false
  processors: 15
  telemetry:
    enable: true
broker:
  catalog: cse-catalog
  ip_allocation_mode: pool
  network: default-organization-network
  org: cse
  remote_template_cookbook_url: https://raw.githubusercontent.com/vmware/container-service-extension-templates/master/template_v2.yaml
  storage_profile: 'iscsi'
  vdc: cse-vdc
EOF
cse encrypt /opt/vmware/cse/config/config-not-encrypted.conf --output /opt/vmware/cse/config/config.yaml
chmod 600 /opt/vmware/cse/config/config.yaml
cse check /opt/vmware/cse/config/config.yaml
cse template list
# Import TKGm ova with this command
# Copy the ova to /tmp/ first, the ova can be obtained from my.vmware.com, ensure that it has chmod 644 permissions.
cse template import -F /tmp/ubuntu-2004-kube-v1.20.5-vmware.2-tkg.1-6700972457122900687.ova
# You may need to enable 644 permissions on the file if cse complains that the file is not readable.
# Install CSE
cse install -k ~/.ssh/authorized_keys
# Or use this if you've already installed and want to skip template creation again
cse upgrade --skip-template-creation -k ~/.ssh/authorized_keys
# Register the cse extension with vcd if it did not already register
vcd system extension create cse cse cse vcdext '/api/cse, /api/cse/.*, /api/cse/.*/.*'
# Setup cse.sh
cat > /opt/vmware/cse/cse.sh << EOF
#!/usr/bin/env bash
source /opt/vmware/cse/python/bin/activate
export CSE_CONFIG=/opt/vmware/cse/config/config.yaml
export CSE_CONFIG_PASSWORD=Vmware1!
cse run
EOF
# Make cse.sh executable
chmod +x /opt/vmware/cse/cse.sh
# Deactivate the python virtual environment and go back to root
deactivate
exit
# Setup cse.service, use MQTT and not RabbitMQ
cat > /etc/systemd/system/cse.service << EOF
[Unit]
Description=Container Service Extension for VMware Cloud Director
[Service]
ExecStart=/opt/vmware/cse/cse.sh
User=cse
WorkingDirectory=/opt/vmware/cse
Type=simple
Restart=always
[Install]
WantedBy=default.target
EOF
systemctl enable cse.service
systemctl start cse.service
systemctl status cse.service
Enable the CSE UI Plugin for VCD
The new CSE UI extension is bundled with VCD 10.3.1.
Enable it for the tenants that you want or for all tenants.
For 3.1.1 you will also need to edit the cse:nativeCluster Entitlement Rights Bundle and add the following two rights:
ACCESS CONTROL, User, Manage user’s own API token
COMPUTE, Organization VDC, Create a Shared Disk
Then publish the Rights Bundle to all tenants.
Enable Global Roles to use CSE or Configure Rights Bundles
The quickest way to get CSE working is to add the relevant rights to the Organization Administrator role. You can create a custom rights bundle and create a custom role for the k8s admin tenant persona if you like. I won’t cover that in this post.
Log in as the /Provider and go to the Administration menu and click on Global Roles on the left.
Edit the Organization Administrator role and scroll all the way down to the bottom and click both the View 8/8 and Manage 12/12, then Save.
Setting up VCD CSI and CPI Operators
You may notice that when the cluster is up you cannot deploy any pods. This is because the cluster is not ready: it is in a tainted state because the CSI and CPI Operators do not yet have credentials.
This article describes how to set up vCenter, VCD, NSX-T and NSX Advanced Load Balancer to support exposing Kubernetes applications in Kubernetes clusters provisioned into VCD.
At the end of this post, you will be able to run this command:
… and have NSX ALB together with VCD and NSX-T automate the provisioning and setup of everything that allows you to expose that application to the outside world using a Kubernetes service of type LoadBalancer.
Create a Content Library for NSX ALB
In vCenter (Resource vCenter managing VCD PVDCs), create a Content Library for NSX Advanced Load Balancer to use to upload the service engine ova.
Create T1 for Avi Service Engine management network
Create T1 for Avi Service Engine management network. You can either attach this T1 to the default T0 or create a new T0.
enable DHCP server for the T1
enable All Static Routes and All Connected Segments & Service Ports under Route Advertisement
Create a network segment for Service Engine management network
Create a network segment for the Avi Service Engine management network. Attach the segment to the T1 that was created in the previous step.
Ensure you enable DHCP; this will assign IP addresses to the service engines automatically, so you won’t need to set up IPAM profiles in Avi Vantage.
NSX Advanced Load Balancer Settings
A couple of things to setup here.
You do not need to create any tenants in NSX ALB, just use the default admin context.
No IPAM/DNS Profiles are required as we will use DHCP from NSX-T for all networks.
Use FQDNs instead of IP addresses
Use the same FQDN in all systems for consistency and to ensure that registration between the systems works
NSX ALB
VCD
NSX-T
Navigate to Administration, User Credentials and setup user credentials for NSX-T controller and vCenter server
Navigate to Administration, Settings, Tenant Settings and ensure that the settings are as follows
Setup an NSX-T Cloud
Navigate to Infrastructure, Clouds. Set up your cloud similar to mine; I have called my NSX-T cloud nsx.vmwire.com (which is the FQDN of my NSX-T Controller).
Let’s go through these settings from the top.
use the FQDN of your NSX-T manager for the name
click the DHCP option, we will be using NSX-T’s DHCP server so we can ignore IPAM/DNS later
enter something for the Object Name Prefix; this will give the SE VM names a prefix so they can be identified in vCenter. I used avi here, so it will look like this in vCenter
type the FQDN of the NSX-T manager into the NSX-T Manager Address
choose the NSX-T Manager Credentials that you configured earlier
select the Transport Zone that you are using in VCD for your tenants
under Management Network Segment, select the T1 that you created earlier for SE management networking
under Segment ID, select the network segment that you created earlier for the SE management network
click ADD under the Data Network Segment(s)
select the T1 that is used by the tenant in VCD
select the tenant organization routed network that is attached to the t1 in the previous task
the two previous settings tell NSX ALB where to place the data/VIP network for front-end load balancing. NSX ALB will create a new segment for this in NSX-T automatically, and VCD will automatically create DNAT rules when a virtual service is requested in NSX ALB
the last step is to add the vCenter server, this would be the vCenter server that is managing the PVDCs used in VCD.
Now wait for a while until the status icon turns green and shows Complete.
Setup a Service Engine Group
Decide whether you want to use a shared service engine group for all VCD tenants or dedicate a service engine group to each tenant.
I use the dedicated model.
navigate to Infrastructure, Service Engine Group
change the cloud to the NSX-T cloud that you setup earlier
create a new service engine group with your preferred settings, you can read about the options here.
Setup Avi in VCD
Log into VCD as a Provider and navigate to Resources, Infrastructure Resources, NSX-ALB, Controllers and click on the ADD link.
Wait for a while for Avi to sync with VCD. Then continue to add the NSX-T Cloud.
Navigate to Resources, Infrastructure Resources, NSX-ALB, NSX-T Clouds and click on the ADD link.
Proceed when you can see the status is healthy.
Navigate to Resources, Infrastructure Resources, NSX-ALB, Service Engine Groups and click on the ADD link.
Staying logged in as a Provider, navigate to the tenant that you wish to enable NSX ALB load balancing services and navigate to Networking, Edge Gateways, Load Balancer, Service Engine Groups. Then add the service engine group to this tenant.
This will enable this tenant to use NSX ALB load balancing services.
Deploy a new Kubernetes cluster in VCD with Container Service Extension
Deploy a new Kubernetes cluster using Container Service Extension in VCD as normal.
Once the cluster is ready, download the kube config file and log into the cluster.
Check that all the nodes and pods are up as normal.
You might see that the following pods in the kube-system namespace are in a pending state. If everything is already working then move onto the next section.
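The application in this example is exposed with a Kubernetes Service of type LoadBalancer. The original manifest isn’t reproduced here as text, but based on the webserver service in the output below it would look something like this sketch (the selector and target port are assumptions and must match your nginx deployment’s pod labels):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: webserver
spec:
  type: LoadBalancer
  selector:
    app: webserver   # assumption: must match the deployment's pod labels
  ports:
  - port: 80
    targetPort: 80   # assumption: nginx listening on 80
```
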
Wait for the load balancer service to start and the pod to go into a running state. During this time, you’ll see the service engines being provisioned automatically by NSX ALB. It’ll take 10 minutes or so to get everything up and running.
You can use this command to check when the load balancer service has completed and check the EXTERNAL-IP.
kubectl get service webserver
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
webserver LoadBalancer 100.71.45.194 10.149.1.114 80:32495/TCP 7h48m
You can see that NSX ALB, VCD and NSX-T all worked together to expose the nginx application to the outside world.
The external IP of 10.149.1.114 in my environment is an uplink segment on a T0 that I have configured for VCD tenants to use as egress and ingress into their organization VDC. It is the external network for their VDCs.
Paste the external IP into a web browser and you should see the nginx web page.
In the next post, I’ll go over the end to end network flow to show how this all connects NSX ALB, VCD, NSX-T and Kubernetes together.
Container Service Extension (CSE) 3.1.1 now supports persistent volumes that are backed by VCD’s Named Disk feature.
Setting up the VCD CSI driver on your Kubernetes cluster
Container Service Extension (CSE) 3.1.1 now supports persistent volumes that are backed by VCD’s Named Disk feature. These now appear under Storage – Named disks in VCD. To use this functionality today (28 September 2021), you’ll need to deploy CSE 3.1.1 beta with VCD 10.3. See this previous post for details.
Ideally, you want to deploy the CSI driver using the same user that also deployed the Kubernetes cluster into VCD. In my environment, I used a user named tenant1-admin, this user has the Organization Administrator role with the added right:
Compute – Organization VDC – Create a Shared Disk.
Create the vcloud-basic-auth.yaml
Before you can create persistent volumes you have to setup the Kubernetes cluster with the VCD CSI driver.
Ensure you can log into the cluster by downloading the kube config and logging into it using the correct context.
kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* kubernetes-admin@kubernetes kubernetes kubernetes-admin
Create the vcloud-basic-auth.yaml file which is used to setup the VCD CSI driver for this Kubernetes cluster.
Notice that the storageProfile needs to be set to either “*” for any storage policy or the name of a storage policy that you have access to in your Organization VDC.
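The storage-class.yaml itself isn’t reproduced here as text, but based on the values shown in the kubectl get storageclass output below, it would look roughly like this sketch (the “*” storageProfile is an assumption; substitute a storage policy name you have access to):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vcd-disk-dev
provisioner: named-disk.csi.cloud-director.vmware.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  storageProfile: "*"   # or the name of a storage policy in your Org VDC
```
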
Create the storage class by applying that file.
kubectl apply -f storage-class.yaml
You can see if that was successful by getting all storage classes.
kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
vcd-disk-dev named-disk.csi.cloud-director.vmware.com Delete Immediate false 43h
Now that we’ve got a storage class and the driver installed, we can deploy a persistent volume claim and attach it to a pod. Let’s create a persistent volume claim first.
Creating a persistent volume claim
We will need to prepare another file; I’ve called mine my-pvc.yaml, and it looks like this.
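A sketch of my-pvc.yaml consistent with the kubectl get pvc output below (1Gi, ReadWriteOnce, the vcd-disk-dev storage class):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: vcd-disk-dev
  resources:
    requests:
      storage: 1Gi
```
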
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
my-pvc Bound pvc-2ddeccd0-e092-4aca-a090-dff9694e2f04 1Gi RWO vcd-disk-dev 36m
Attaching the persistent volume to a pod
Let’s deploy an nginx pod that will attach the PV and use it for nginx.
Note the persistentVolumeClaim section: claimName: my-pvc matches the name of the PVC created earlier. I’ve also mounted it to /usr/share/nginx/html within the nginx pod.
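A sketch of pod.yaml, reconstructed from the kubectl describe output further down (the pod name, container name, image, mount path and claim name all match that output; the port is taken from the container spec there):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod
  labels:
    app: nginx
spec:
  containers:
  - name: my-pod-container
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: my-pod-storage
      mountPath: /usr/share/nginx/html
  volumes:
  - name: my-pod-storage
    persistentVolumeClaim:
      claimName: my-pvc
```
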
Let’s attach the PV.
kubectl apply -f pod.yaml
You’ll see a few things happen in the Recent Tasks pane when you run this. Kubernetes attaches the PV to the nginx pod using the CSI driver, and the driver tells VCD to attach the disk to the worker node.
If you open up vSphere Web Client, you can see that the disk is now attached to the worker node.
You can also see the CSI driver doing its thing if you take a look at the logs with this command.
You can log into the nginx pod using this command.
kubectl exec -it pod -- bash
Then type mount and df to see the mount is present and the size of the mount point.
df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb 999320 1288 929220 1% /usr/share/nginx/html
mount
/dev/sdb on /usr/share/nginx/html type ext4 (rw,relatime)
The size is correct, being 1GB and the disk is mounted.
Describing the pod gives us more information.
kubectl describe po pod
Name: pod
Namespace: default
Priority: 0
Node: node-xgsw/192.168.0.101
Start Time: Sun, 26 Sep 2021 12:43:15 +0300
Labels: app=nginx
Annotations: <none>
Status: Running
IP: 100.96.1.12
IPs:
IP: 100.96.1.12
Containers:
my-pod-container:
Container ID: containerd://6a194ac30dab7dc5a5127180af139e531e650bedbb140e4dc378c21869bd570f
Image: nginx
Image ID: docker.io/library/nginx@sha256:853b221d3341add7aaadf5f81dd088ea943ab9c918766e295321294b035f3f3e
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 26 Sep 2021 12:43:34 +0300
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/usr/share/nginx/html from my-pod-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-xm4gd (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
my-pod-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: my-pvc
ReadOnly: false
default-token-xm4gd:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-xm4gd
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
Useful commands
Show storage classes
kubectl get storageclass
Show persistent volumes and persistent volume claims
kubectl get pv
kubectl get pvc
Photon OS 3 does not support Linux guest customization, unfortunately, so we will use the links below to manually set up the OS with a hostname and static IP address.
Boot the VM, the default credentials are root with password changeme. Change the default password.
Photon 3 has the older repositories, so we will need to update to newer repositories as detailed in this KB article. I’ve included this in the instructions below.
Copy and paste the commands below, or create a bash script from them.
# Update Photon repositories
cd /etc/yum.repos.d/
sed -i 's/dl.bintray.com\/vmware/packages.vmware.com\/photon\/$releasever/g' photon.repo photon-updates.repo photon-extras.repo photon-debuginfo.repo
# Update Photon
tdnf --assumeyes update
# Install dependencies
tdnf --assumeyes install build-essential python3-devel python3-pip git
# Prepare cse user and application directories
mkdir -p /opt/vmware/cse
chmod 775 -R /opt
chmod 777 /
groupadd cse
useradd cse -g cse -m -p Vmware1! -d /opt/vmware/cse
chown cse:cse -R /opt
# Run as cse user
su - cse
mkdir -p ~/.ssh
cat >> ~/.ssh/authorized_keys << EOF
ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQEAhcw67bz3xRjyhPLysMhUHJPhmatJkmPUdMUEZre+MeiDhC602jkRUNVu43Nk8iD/I07kLxdAdVPZNoZuWE7WBjmn13xf0Ki2hSH/47z3ObXrd8Vleq0CXa+qRnCeYM3FiKb4D5IfL4XkHW83qwp8PuX8FHJrXY8RacVaOWXrESCnl3cSC0tA3eVxWoJ1kwHxhSTfJ9xBtKyCqkoulqyqFYU2A1oMazaK9TYWKmtcYRn27CC1Jrwawt2zfbNsQbHx1jlDoIO6FLz8Dfkm0DToanw0GoHs2Q+uXJ8ve/oBs0VJZFYPquBmcyfny4WIh4L0lwzsiAVWJ6PvzF5HMuNcwQ== rsa-key-20210508
EOF
cat >> ~/.bash_profile << EOF
# For Container Service Extension
export CSE_CONFIG=/opt/vmware/cse/config/config.yaml
export CSE_CONFIG_PASSWORD=Vmware1!
source /opt/vmware/cse/python/bin/activate
EOF
# Install CSE in virtual environment
python3 -m venv /opt/vmware/cse/python
source /opt/vmware/cse/python/bin/activate
pip3 install git+https://github.com/vmware/container-service-extension.git@3.1.1.0b2
cse version
source ~/.bash_profile
# Prepare vcd-cli
mkdir -p ~/.vcd-cli
cat > ~/.vcd-cli/profiles.yaml << EOF
extensions:
- container_service_extension.client.cse
EOF
vcd cse version
# Add my Let's Encrypt intermediate and root certs. Use your certificates issued by your CA to enable verify=true with CSE.
cat >> /opt/vmware/cse/python/lib/python3.7/site-packages/certifi/cacert.pem << EOF
-----BEGIN CERTIFICATE-----
MIIFFjCCAv6gAwIBAgIRAJErCErPDBinU/bWLiWnX1owDQYJKoZIhvcNAQELBQAw
TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMjAwOTA0MDAwMDAw
WhcNMjUwOTE1MTYwMDAwWjAyMQswCQYDVQQGEwJVUzEWMBQGA1UEChMNTGV0J3Mg
RW5jcnlwdDELMAkGA1UEAxMCUjMwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEK
AoIBAQC7AhUozPaglNMPEuyNVZLD+ILxmaZ6QoinXSaqtSu5xUyxr45r+XXIo9cP
R5QUVTVXjJ6oojkZ9YI8QqlObvU7wy7bjcCwXPNZOOftz2nwWgsbvsCUJCWH+jdx
sxPnHKzhm+/b5DtFUkWWqcFTzjTIUu61ru2P3mBw4qVUq7ZtDpelQDRrK9O8Zutm
NHz6a4uPVymZ+DAXXbpyb/uBxa3Shlg9F8fnCbvxK/eG3MHacV3URuPMrSXBiLxg
Z3Vms/EY96Jc5lP/Ooi2R6X/ExjqmAl3P51T+c8B5fWmcBcUr2Ok/5mzk53cU6cG
/kiFHaFpriV1uxPMUgP17VGhi9sVAgMBAAGjggEIMIIBBDAOBgNVHQ8BAf8EBAMC
AYYwHQYDVR0lBBYwFAYIKwYBBQUHAwIGCCsGAQUFBwMBMBIGA1UdEwEB/wQIMAYB
Af8CAQAwHQYDVR0OBBYEFBQusxe3WFbLrlAJQOYfr52LFMLGMB8GA1UdIwQYMBaA
FHm0WeZ7tuXkAXOACIjIGlj26ZtuMDIGCCsGAQUFBwEBBCYwJDAiBggrBgEFBQcw
AoYWaHR0cDovL3gxLmkubGVuY3Iub3JnLzAnBgNVHR8EIDAeMBygGqAYhhZodHRw
Oi8veDEuYy5sZW5jci5vcmcvMCIGA1UdIAQbMBkwCAYGZ4EMAQIBMA0GCysGAQQB
gt8TAQEBMA0GCSqGSIb3DQEBCwUAA4ICAQCFyk5HPqP3hUSFvNVneLKYY611TR6W
PTNlclQtgaDqw+34IL9fzLdwALduO/ZelN7kIJ+m74uyA+eitRY8kc607TkC53wl
ikfmZW4/RvTZ8M6UK+5UzhK8jCdLuMGYL6KvzXGRSgi3yLgjewQtCPkIVz6D2QQz
CkcheAmCJ8MqyJu5zlzyZMjAvnnAT45tRAxekrsu94sQ4egdRCnbWSDtY7kh+BIm
lJNXoB1lBMEKIq4QDUOXoRgffuDghje1WrG9ML+Hbisq/yFOGwXD9RiX8F6sw6W4
avAuvDszue5L3sz85K+EC4Y/wFVDNvZo4TYXao6Z0f+lQKc0t8DQYzk1OXVu8rp2
yJMC6alLbBfODALZvYH7n7do1AZls4I9d1P4jnkDrQoxB3UqQ9hVl3LEKQ73xF1O
yK5GhDDX8oVfGKF5u+decIsH4YaTw7mP3GFxJSqv3+0lUFJoi5Lc5da149p90Ids
hCExroL1+7mryIkXPeFM5TgO9r0rvZaBFOvV2z0gp35Z0+L4WPlbuEjN/lxPFin+
HlUjr8gRsI3qfJOQFy/9rKIJR0Y/8Omwt/8oTWgy1mdeHmmjk7j1nYsvC9JSQ6Zv
MldlTTKB3zhThV1+XWYp6rjd5JW1zbVWEkLNxE7GJThEUG3szgBVGP7pSWTUTsqX
nLRbwHOoq7hHwg==
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIFYDCCBEigAwIBAgIQQAF3ITfU6UK47naqPGQKtzANBgkqhkiG9w0BAQsFADA/
MSQwIgYDVQQKExtEaWdpdGFsIFNpZ25hdHVyZSBUcnVzdCBDby4xFzAVBgNVBAMT
DkRTVCBSb290IENBIFgzMB4XDTIxMDEyMDE5MTQwM1oXDTI0MDkzMDE4MTQwM1ow
TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwggIiMA0GCSqGSIb3DQEB
AQUAA4ICDwAwggIKAoICAQCt6CRz9BQ385ueK1coHIe+3LffOJCMbjzmV6B493XC
ov71am72AE8o295ohmxEk7axY/0UEmu/H9LqMZshftEzPLpI9d1537O4/xLxIZpL
wYqGcWlKZmZsj348cL+tKSIG8+TA5oCu4kuPt5l+lAOf00eXfJlII1PoOK5PCm+D
LtFJV4yAdLbaL9A4jXsDcCEbdfIwPPqPrt3aY6vrFk/CjhFLfs8L6P+1dy70sntK
4EwSJQxwjQMpoOFTJOwT2e4ZvxCzSow/iaNhUd6shweU9GNx7C7ib1uYgeGJXDR5
bHbvO5BieebbpJovJsXQEOEO3tkQjhb7t/eo98flAgeYjzYIlefiN5YNNnWe+w5y
sR2bvAP5SQXYgd0FtCrWQemsAXaVCg/Y39W9Eh81LygXbNKYwagJZHduRze6zqxZ
Xmidf3LWicUGQSk+WT7dJvUkyRGnWqNMQB9GoZm1pzpRboY7nn1ypxIFeFntPlF4
FQsDj43QLwWyPntKHEtzBRL8xurgUBN8Q5N0s8p0544fAQjQMNRbcTa0B7rBMDBc
SLeCO5imfWCKoqMpgsy6vYMEG6KDA0Gh1gXxG8K28Kh8hjtGqEgqiNx2mna/H2ql
PRmP6zjzZN7IKw0KKP/32+IVQtQi0Cdd4Xn+GOdwiK1O5tmLOsbdJ1Fu/7xk9TND
TwIDAQABo4IBRjCCAUIwDwYDVR0TAQH/BAUwAwEB/zAOBgNVHQ8BAf8EBAMCAQYw
SwYIKwYBBQUHAQEEPzA9MDsGCCsGAQUFBzAChi9odHRwOi8vYXBwcy5pZGVudHJ1
c3QuY29tL3Jvb3RzL2RzdHJvb3RjYXgzLnA3YzAfBgNVHSMEGDAWgBTEp7Gkeyxx
+tvhS5B1/8QVYIWJEDBUBgNVHSAETTBLMAgGBmeBDAECATA/BgsrBgEEAYLfEwEB
ATAwMC4GCCsGAQUFBwIBFiJodHRwOi8vY3BzLnJvb3QteDEubGV0c2VuY3J5cHQu
b3JnMDwGA1UdHwQ1MDMwMaAvoC2GK2h0dHA6Ly9jcmwuaWRlbnRydXN0LmNvbS9E
U1RST09UQ0FYM0NSTC5jcmwwHQYDVR0OBBYEFHm0WeZ7tuXkAXOACIjIGlj26Ztu
MA0GCSqGSIb3DQEBCwUAA4IBAQAKcwBslm7/DlLQrt2M51oGrS+o44+/yQoDFVDC
5WxCu2+b9LRPwkSICHXM6webFGJueN7sJ7o5XPWioW5WlHAQU7G75K/QosMrAdSW
9MUgNTP52GE24HGNtLi1qoJFlcDyqSMo59ahy2cI2qBDLKobkx/J3vWraV0T9VuG
WCLKTVXkcGdtwlfFRjlBz4pYg1htmf5X6DYO8A4jqv2Il9DjXA6USbW1FzXSLr9O
he8Y4IWS6wY7bCkjCWDcRQJMEhg76fsO3txE+FiYruq9RUWhiF1myv4Q6W+CyBFC
Dfvp7OOGAN6dEOM4+qR9sdjoSYKEBpsr6GtPAQw4dy753ec5
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDSjCCAjKgAwIBAgIQRK+wgNajJ7qJMDmGLvhAazANBgkqhkiG9w0BAQUFADA/
MSQwIgYDVQQKExtEaWdpdGFsIFNpZ25hdHVyZSBUcnVzdCBDby4xFzAVBgNVBAMT
DkRTVCBSb290IENBIFgzMB4XDTAwMDkzMDIxMTIxOVoXDTIxMDkzMDE0MDExNVow
PzEkMCIGA1UEChMbRGlnaXRhbCBTaWduYXR1cmUgVHJ1c3QgQ28uMRcwFQYDVQQD
Ew5EU1QgUm9vdCBDQSBYMzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEB
AN+v6ZdQCINXtMxiZfaQguzH0yxrMMpb7NnDfcdAwRgUi+DoM3ZJKuM/IUmTrE4O
rz5Iy2Xu/NMhD2XSKtkyj4zl93ewEnu1lcCJo6m67XMuegwGMoOifooUMM0RoOEq
OLl5CjH9UL2AZd+3UWODyOKIYepLYYHsUmu5ouJLGiifSKOeDNoJjj4XLh7dIN9b
xiqKqy69cK3FCxolkHRyxXtqqzTWMIn/5WgTe1QLyNau7Fqckh49ZLOMxt+/yUFw
7BZy1SbsOFU5Q9D8/RhcQPGX69Wam40dutolucbY38EVAjqr2m7xPi71XAicPNaD
aeQQmxkqtilX4+U9m5/wAl0CAwEAAaNCMEAwDwYDVR0TAQH/BAUwAwEB/zAOBgNV
HQ8BAf8EBAMCAQYwHQYDVR0OBBYEFMSnsaR7LHH62+FLkHX/xBVghYkQMA0GCSqG
SIb3DQEBBQUAA4IBAQCjGiybFwBcqR7uKGY3Or+Dxz9LwwmglSBd49lZRNI+DT69
ikugdB/OEIKcdBodfpga3csTS7MgROSR6cz8faXbauX+5v3gTt23ADq1cEmv8uXr
AvHRAosZy5Q6XkjEGB5YGV8eAlrwDPGxrancWYaLbumR9YbK+rlmM6pZW87ipxZz
R8srzJmwN0jP41ZL9c8PDHIyh8bwRLtTcm1D9SZImlJnt1ir/md2cXjbDaJWFBM5
JDGFoqgCWjBH4d1QB7wCCZAA62RjYJsWvIjJEubSfZGL+T0yjWW06XyxV3bqxbYo
Ob8VZRzI9neWagqNdwvYkQsEjgfbKbYK7p2CNTUQ
-----END CERTIFICATE-----
EOF
# Create service account
vcd login vcd.vmwire.com system administrator -p Vmware1!
cse create-service-role vcd.vmwire.com
# Enter system administrator username and password
# Create VCD service account for CSE
vcd user create --enabled svc-cse Vmware1! "CSE Service Role"
# Create config file
mkdir -p /opt/vmware/cse/config
cat > /opt/vmware/cse/config/config-not-encrypted.conf << EOF
mqtt:
  verify_ssl: false
vcd:
  host: vcd.vmwire.com
  log: true
  password: Vmware1!
  port: 443
  username: administrator
  verify: true
vcs:
- name: vcenter.vmwire.com
  password: Vmware1!
  username: administrator@vsphere.local
  verify: true
service:
  enforce_authorization: false
  legacy_mode: false
  log_wire: false
  processors: 15
  telemetry:
    enable: true
broker:
  catalog: cse-catalog
  ip_allocation_mode: pool
  network: default-organization-network
  org: cse
  remote_template_cookbook_url: https://raw.githubusercontent.com/vmware/container-service-extension-templates/master/template_v2.yaml
  storage_profile: 'truenas-iscsi-luns'
  vdc: cse-vdc
EOF
cse encrypt /opt/vmware/cse/config/config-not-encrypted.conf --output /opt/vmware/cse/config/config.yaml
chmod 600 /opt/vmware/cse/config/config.yaml
cse check /opt/vmware/cse/config/config.yaml
cse template list
mkdir -p ~/.ssh
# Add your public key(s) here
cat >> ~/.ssh/authorized_keys << EOF
ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQEAhcw67bz3xRjyhPLysMhUHJPhmatJkmPUdMUEZre+MeiDhC602jkRUNVu43Nk8iD/I07kLxdAdVPZNoZuWE7WBjmn13xf0Ki2hSH/47z3ObXrd8Vleq0CXa+qRnCeYM3FiKb4D5IfL4XkHW83qwp8PuX8FHJrXY8RacVaOWXrESCnl3cSC0tA3eVxWoJ1kwHxhSTfJ9xBtKyCqkoulqyqFYU2A1oMazaK9TYWKmtcYRn27CC1Jrwawt2zfbNsQbHx1jlDoIO6FLz8Dfkm0DToanw0GoHs2Q+uXJ8ve/oBs0VJZFYPquBmcyfny4WIh4L0lwzsiAVWJ6PvzF5HMuNcwQ== rsa-key-20210508
EOF
# Import TKGm ova with this command
# Copy the ova to /home/ first, the ova can be obtained from my.vmware.com, ensure that it has chmod 644 permissions.
cse template import -F /home/ubuntu-2004-kube-v1.20.5-vmware.2-tkg.1-6700972457122900687.ova
# Install CSE
cse install -k ~/.ssh/authorized_keys
# Or use this if you've already installed and want to skip template creation again
cse upgrade --skip-template-creation -k ~/.ssh/authorized_keys
# Setup cse.sh
cat > /opt/vmware/cse/cse.sh << EOF
#!/usr/bin/env bash
source /opt/vmware/cse/python/bin/activate
export CSE_CONFIG=/opt/vmware/cse/config/config.yaml
export CSE_CONFIG_PASSWORD=Vmware1!
cse run
EOF
# Make cse.sh executable
chmod +x /opt/vmware/cse/cse.sh
# Deactivate the python virtual environment and go back to root
deactivate
exit
# Setup cse.service, use MQTT and not RabbitMQ
cat > /etc/systemd/system/cse.service << EOF
[Unit]
Description=Container Service Extension for VMware Cloud Director
[Service]
ExecStart=/opt/vmware/cse/cse.sh
User=cse
WorkingDirectory=/opt/vmware/cse
Type=simple
Restart=always
[Install]
WantedBy=default.target
EOF
systemctl enable cse.service
systemctl start cse.service
systemctl status cse.service
Enable Global Roles to use CSE or Configure Rights Bundles
The quickest way to get CSE working is to add the relevant rights to the Organization Administrator role. You can create a custom rights bundle and create a custom role for the k8s admin tenant persona if you like. I won’t cover that in this post.
Log in as the /Provider and go to the Administration menu and click on Global Roles on the left.
Edit the Organization Administrator role and scroll all the way down to the bottom and click both the View 8/8 and Manage 12/12, then Save.
A quick note on the Rights Bundles for Container Service Extension when enabling native, TKGm or TKGs clusters.
The rights bundle named vmware:tkgcluster Entitlement is for TKGs clusters and NOT for TKGm.
The rights bundle named cse:nativeCluster Entitlement is for native clusters AND also for TKGm clusters.
Yes, this is confusing; it will be fixed in an upcoming release.
You can see a brief note about this on the release notes here.
Users deploying VMware Tanzu Kubernetes Grid clusters should have the rights required to deploy exposed native clusters and additionally the right Full Control: CSE:NATIVECLUSTER. This right is crucial for VCD CPI to work properly.
So in summary, for a user to be able to deploy TKGm clusters they will need to have the cse:nativeCluster Entitlement rights.
To publish these rights, go to the Provider portal and navigate to Administration, Rights Bundles.
Click on the radio button next to cse:nativeCluster Entitlement and click on Publish, then publish to the desired tenant or to all tenants.
In this post I show how to deploy the Kubernetes Dashboard onto a Tanzu Kubernetes Grid cluster.
Dashboard provides information on the state of Kubernetes resources in your cluster and on any errors that may have occurred.
Dashboard is a web-based Kubernetes user interface. You can use Dashboard to deploy containerized applications to a Kubernetes cluster, troubleshoot your containerized application, and manage the cluster resources. You can use Dashboard to get an overview of applications running on your cluster, as well as for creating or modifying individual Kubernetes resources (such as Deployments, Jobs, DaemonSets, etc). For example, you can scale a Deployment, initiate a rolling update, restart a pod or deploy new applications using a deploy wizard.
In the previous post I prepared NSX ALB for Tanzu Kubernetes Grid ingress services. In this post I will deploy a new TKG cluster and use it for Tanzu Shared Services.
Tanzu Kubernetes Grid includes binaries for tools that provide in-cluster and shared services to the clusters running in your Tanzu Kubernetes Grid instance. All of the provided binaries and container images are built and signed by VMware.
A shared services cluster is just a Tanzu Kubernetes Grid workload cluster used for shared services. It can be provisioned using the standard CLI command tanzu cluster create, or through Tanzu Mission Control.
You can add functionalities to Tanzu Kubernetes clusters by installing extensions to different cluster locations as follows:
The Harbor service runs on a shared services cluster, to serve all the other clusters in an installation. The Harbor service requires the Contour service to also run on the shared services cluster. In many environments, the Harbor service also benefits from External DNS running on its cluster, as described in Harbor Registry and External DNS.
Some extensions require or are enhanced by other extensions deployed to the same cluster:
Contour is required by Harbor, External DNS, and Grafana
Prometheus is required by Grafana
External DNS is recommended for Harbor on infrastructures with load balancing (AWS, Azure, and vSphere with NSX Advanced Load Balancer), especially in production or other environments in which Harbor availability is important.
Each Tanzu Kubernetes Grid instance can only have one shared services cluster.
Relationships
The following table shows the relationships between the NSX ALB system, the TKG cluster deployment config and the AKO config. It is important to get these three correct.
Avi Controller: Service Engine Group name tkg-ssc-se-group
TKG cluster deployment file: AVI_LABELS 'cluster': 'tkg-ssc'
AKO Config file: clusterSelector: matchLabels: cluster: tkg-ssc, and serviceEngineGroup: tkg-ssc-se-group
TKG Cluster Deployment Config File – tkg-ssc.yaml
Let's first take a look at the deployment configuration file for the Shared Services Cluster.
The two key value pairs that are important in this file are highlighted in bold. The first is
AVI_LABELS: |
'cluster': 'tkg-ssc'
We are labeling this TKG cluster so that Avi knows about it. The other key value pair is
AVI_SERVICE_ENGINE_GROUP: tkg-ssc-se-group
This ensures that this TKG cluster will use the service engine group named tkg-ssc-se-group.
While we have this file open, you'll notice that the long string under AVI_CA_DATA_B64 is the Avi Controller certificate that I copied in the previous post.
Take some time to review my cluster deployment config file for the Shared Services Cluster below. You'll see that you need to specify the VIP network for NSX ALB to use:
AVI_DATA_NETWORK: tkg-ssc-vip
AVI_DATA_NETWORK_CIDR: 172.16.4.32/27
Basically, any key that begins with AVI_ needs to have the corresponding setting configured in NSX ALB. This is what we prepared in the previous post.
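Putting the pieces above together, the NSX ALB related section of tkg-ssc.yaml looks roughly like the sketch below. Only the AVI_ keys discussed in this post are shown, and the certificate value is a placeholder; everything else in your real file is environment-specific.

```yaml
# Sketch of the NSX ALB related keys in tkg-ssc.yaml.
# Values are the ones used in this post; the certificate is a placeholder.
AVI_LABELS: |
  'cluster': 'tkg-ssc'
AVI_SERVICE_ENGINE_GROUP: tkg-ssc-se-group
AVI_DATA_NETWORK: tkg-ssc-vip
AVI_DATA_NETWORK_CIDR: 172.16.4.32/27
AVI_CA_DATA_B64: <base64-encoded-Avi-Controller-certificate>
```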
The next file we need to configure is the AKODeploymentConfig file; Kubernetes uses it to ensure that L4 load balancing is backed by NSX ALB.
I’ve highlighted some settings that are important.
clusterSelector: matchLabels: cluster: tkg-ssc
Here we are specifying a cluster selector for AKO that will use the name of the cluster, this corresponds to the following setting in the tkg-ssc.yaml file.
AVI_LABELS: | 'cluster': 'tkg-ssc'
The next key value pair specifies what Service Engines to use for this TKG cluster. This is of course what we configured within Avi in the previous post.
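For reference, here is a sketch of how these two settings sit inside the AKODeploymentConfig manifest. The apiVersion shown is the one used by TKG 1.3-era releases, and other fields (cloud name, controller address, and so on) are omitted for brevity.

```yaml
# Sketch of tkg-ssc-akodeploymentconfig.yaml showing the two settings
# discussed above; remaining spec fields are omitted.
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  name: ako-for-tkg-ssc
spec:
  clusterSelector:
    matchLabels:
      cluster: tkg-ssc               # must match AVI_LABELS in tkg-ssc.yaml
  serviceEngineGroup: tkg-ssc-se-group   # must match the SE Group name in Avi
```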
Set up the new AKO configuration before deploying the new TKG cluster
Before deploying the new TKG cluster, we have to set up a new AKO configuration. To do this, run the following command under the TKG Management Cluster context.
kubectl apply -f <Path_to_YAML_File>
Which in my example is
kubectl apply -f tkg-ssc-akodeploymentconfig.yaml
You can use the following to check that it was successful.
kubectl get akodeploymentconfig
root@photon-manager [ ~/.tanzu/tkg/clusterconfigs ]# kubectl get akodeploymentconfig
NAME AGE
ako-for-tkg-ssc 3d19h
You can also show additional details by using the kubectl describe command
For any new AKO configs that you need, just take a copy of the .yaml file and edit the contents that correspond to the new AKO config. For example, to create another AKO config for a new tenant, take a copy of the tkg-ssc-akodeploymentconfig.yaml file and give it a new name such as tkg-tenant-1-akodeploymentconfig.yaml, and change the following highlighted key value pairs.
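As a sketch, the tenant copy would differ from the shared services file only in values like the ones below; the tenant cluster label and SE group name here are illustrative.

```yaml
# Sketch: values changed in tkg-tenant-1-akodeploymentconfig.yaml relative
# to tkg-ssc-akodeploymentconfig.yaml. Names are illustrative.
metadata:
  name: ako-for-tenant-1
spec:
  clusterSelector:
    matchLabels:
      cluster: tkg-tenant-1
  serviceEngineGroup: tkg-tenant-1-se-group
```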
root@photon-manager [ ~/.tanzu/tkg/clusterconfigs ]# tanzu cluster list --include-management-cluster
NAME      NAMESPACE   STATUS    CONTROLPLANE  WORKERS  KUBERNETES        ROLES           PLAN
tkg-ssc   default     running   1/1           1/1      v1.20.5+vmware.2  tanzu-services  dev
tkg-mgmt  tkg-system  running   1/1           1/1      v1.20.5+vmware.2  management      dev
Add the key value pair of cluster=tkg-ssc to label this cluster and complete the setup of AKO.
kubectl label cluster tkg-ssc cluster=tkg-ssc
Once the cluster is labelled, switch to the tkg-ssc context and you will notice a new namespace named “avi-system” being created and a new pod named “ako-0” being started.
root@photon-manager [ ~/.tanzu/tkg/clusterconfigs ]# kubectl get ns
NAME                      STATUS   AGE
avi-system                Active   3d18h
cert-manager              Active   3d15h
default                   Active   3d19h
kube-node-lease           Active   3d19h
kube-public               Active   3d19h
kube-system               Active   3d19h
kubernetes-dashboard      Active   3d16h
tanzu-system-monitoring   Active   3d15h
tkg-system                Active   3d19h
tkg-system-public         Active   3d19h
root@photon-manager [ ~/.tanzu/tkg/clusterconfigs ]# kubectl get pods -n avi-system
NAME    READY   STATUS    RESTARTS   AGE
ako-0   1/1     Running   0          3d18h
Summary
We now have a new TKG Shared Services Cluster up and running and configured for Kubernetes ingress services with NSX ALB.
In the next post I’ll deploy the Kubernetes Dashboard onto the Shared Services Cluster and show how this then configures the NSX ALB for ingress services.
In this post I describe how to setup NSX ALB (Avi) in preparation for use with Tanzu Kubernetes Grid, more specifically, the Avi Kubernetes Operator (AKO).
AKO is a Kubernetes operator which works as an ingress controller and performs Avi-specific functions in a Kubernetes environment with the Avi Controller. It runs as a pod in the cluster and translates the required Kubernetes objects to Avi objects and automates the implementation of ingresses/routes/services on the Service Engines (SE) via the Avi Controller.
Avi Kubernetes Operator Architecture
First, let's describe the architecture for TKG + AKO.
For each tenant that you have, you will have at least one AKO configuration.
A tenant can have one or more TKG workload clusters and more than one TKG workload cluster can share an AKO configuration. This is important to remember for multi-tenant services when using Tanzu Kubernetes Grid. However, you can of course configure an AKO config for each TKG workload cluster if you wish to provide multiple AKO configurations. This will require more Service Engines and Service Engine Groups as we will discuss further below.
So as a minimum, you will have several AKO configs. Let me summarize in the following table.
AKO Config: install-ako-for-all
Description: The default AKO configuration, used for the TKG Management Cluster and deployed by default.
Specification: Provider side AKO configuration for the TKG Management Cluster only.

AKO Config: ako-for-tkg-ssc
Description: The AKO configuration for the Tanzu Shared Services Cluster.
Specification: Provider side AKO configuration for the Tanzu Shared Services Cluster only, defined in tkg-ssc-akodeploymentconfig.yaml.

AKO Config: ako-for-tenant-1
Description: The AKO configuration for Tenant 1.
Specification: AKO configuration prepared by the Provider and deployed for the tenant to use, defined in tkg-tenant-1-akodeploymentconfig.yaml.

AKO Config: ako-for-tenant-x
Description: The AKO configuration for Tenant x.
Although TKG deploys a default AKO config, we do not use any ingress services for the TKG Management Cluster. Therefore we do not need to deploy a Service Engine Group and Service Engines for this cluster.
Service Engine Groups and Service Engines are only required if you need ingress services to your applications. We of course need this for the Tanzu Shared Services and any applications deployed into a workload cluster.
I will go into more detail in a follow-up post where I will demonstrate how to setup the Tanzu Shared Services Cluster that uses the preparation steps described in this post.
Let's start the Avi Controller configuration. Although I am using the Tanzu Shared Services Cluster as the example for this guide, the same steps can be repeated for all additional Tanzu Kubernetes Grid workload clusters. All that is needed is a few changes to the .yaml files and you're good to go.
Clouds
I prefer not to use the Default-Cloud, and will always create a new cloud.
The benefit of using NSX ALB in write mode (orchestration mode) is that NSX ALB will orchestrate the creation of service engines for you and scale out more service engines if your applications demand more capacity. However, if you are using VMware Cloud on AWS, this is not possible due to RBAC constraints within VMC, so only non-orchestration mode is possible there.
In this post I’m using my home lab which is running vSphere.
Navigate to Infrastructure, Clouds and click on the CREATE button and select the VMware vCenter/VMware vSphere ESX option. This post uses vCenter as a cloud type.
Fill in the details as my screenshots show. You can leave the IPAM Profile settings empty for now, we will complete these in the next step.
Select the Data Center within your vSphere hierarchy. I’m using my home lab for this example. Again leave all the other settings on the defaults.
The next tab will take you to the network options for the management network to use for the service engines. This network needs to be routable between the Avi Controller(s) and the service engines.
The network tab will show you the networks that it finds from the vCenter connection, I am using my management network. This network is where I run all of the management appliances, such as vCenter, NSX-T, Avi Controllers etc.
It's best to configure a static IP pool for the service engines. Generally you'll need just a handful of IP addresses, as each service engine group will have two service engines and each service engine only needs one management IP. A service engine group can provide Kubernetes load balancing services for an entire Kubernetes cluster. This of course depends on your sizing requirements, which can be reviewed here. For my home lab, fourteen IP addresses are more than sufficient for my needs.
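As a quick sanity check on pool sizing: a subnet holds 2^(32 - prefix) addresses, so a /27 (the size used for the networks later in this post) contains 32 addresses, 30 of them usable once the network and broadcast addresses are excluded. A minimal shell sketch:

```shell
# Addresses available in a subnet of a given prefix length.
prefix=27
total=$(( 1 << (32 - prefix) ))   # 2^(32-prefix) addresses in the block
usable=$(( total - 2 ))           # minus network and broadcast addresses
echo "/${prefix}: ${total} addresses, ${usable} usable"
```

This prints "/27: 32 addresses, 30 usable", so a fourteen-address SE management pool fits comfortably.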
Service Engine Group
While we’re in the Infrastructure settings lets proceed to setup a new Service Engine Group. Navigate to Infrastructure, Service Engine Group, select the new cloud that we previously setup and then click on the CREATE button. Its important that you ensure you select your new cloud from that drop down menu.
Give your new service engine group a name, I tend to use a naming format such as tkg-<cluster-name>-se-group. For this example, I am setting up a new SE group for the Tanzu Shared Services Cluster.
Reduce the maximum number of service engines down if you wish. You can leave all other settings on defaults.
Click on the Advanced tab to set up some vSphere specifics. Here you can configure options that help you identify the SEs in the vSphere hierarchy, place the SEs into a VM folder, include or exclude compute clusters or hosts, and even include or exclude a datastore.
Service Engine Groups are important as they are the boundary that TKG clusters use for L4 services. Each SE Group needs a unique name; this is important, as each TKG workload cluster references this name in its AKODeploymentConfig file, the config file that maps a TKG cluster to NSX ALB for L4 load balancing services.
With TKG, when you create a TKG workload cluster you must specify some key value pairs that correspond to service engine group names and this is then applied in the AKODeploymentConfig file.
The following table shows where these relationships lie and I will go into more detail in a follow-up post where I will demonstrate how to setup the Tanzu Shared Services Cluster.
Avi Controller: Service Engine Group name tkg-ssc-se-group
TKG cluster deployment file: AVI_LABELS 'cluster': 'tkg-ssc'
AKO Config file: clusterSelector: matchLabels: cluster: tkg-ssc, and serviceEngineGroup: tkg-ssc-se-group
Networks
Navigate to Infrastructure, Networks, again ensure that you select your new cloud from the drop down menu.
The Avi Controller will show you all the networks that it has detected using the vCenter connection that you configured. In this section we configure the networks that NSX ALB will use when it configures a service for Kubernetes. Generally, depending on how you set up your network architecture for TKG, you will have one network that the TKG cluster uses and another for the front-end VIPs. The VIP network is what you expose the pods on; think of it as a load balancer DMZ network.
In my home lab, I use the following setup.
Network: tkg-mgmt
Description: TKG Management Cluster
Specification: Network 172.16.3.0/27, Static IP Pool 172.16.3.26 – 172.16.3.29

Network: tkg-ssc
Description: TKG Shared Services Cluster
Specification: Network 172.16.3.32/27, Static IP Pool 172.16.3.59 – 172.16.3.62

Network: tkg-ssc-vip
Description: TKG Shared Services Cluster front-end VIPs
Specification: Network 172.16.4.32/27, Static IP Pool 172.16.4.34 – 172.16.4.62
IPAM Profile
Create an IPAM profile by navigating to Templates, Profiles, IPAM/DNS Profiles and clicking on the CREATE button and select IPAM Profile.
Select the cloud that you setup above and select all of the usable networks that you will use for applications that will use the load balancer service from NSX ALB. You want to select the networks that you configured in the step above.
Avi Controller Certificate
We also need the SSL certificate used by the Avi Controller; I am using a signed certificate from Let's Encrypt in my home lab, which I wrote about in a previous post.
Navigate to Templates, Security, SSL/TLS Certificates, and click on the icon with a downward arrow in a circle next to the certificate for your Avi Controller; it's normally the first one in the list.
Click on the Copy to clipboard button and paste the certificate into Notepad++ or similar.
At this point we have NSX ALB setup for deploying a new TKG workload cluster using the new Service Engine Group that we have prepared. In the next post, I’ll demonstrate how to setup the Tanzu Shared Services Cluster to use NSX ALB for ingress services.
Deploying your first pod with a persistent volume claim and service on vSphere with Tanzu. With sample code for you to try.
Learning the k8s ropes…
This is not a how-to article to get vSphere with Tanzu up and running; there are plenty of guides out there, here and here. This post is more of a "let's have some fun with Kubernetes now that I have a vSphere with Tanzu cluster to play with".
Answering the following question would be a good start to get to grips with understanding Kubernetes from a VMware perspective.
How do I do things that I did in the past in a VM but now do it with Kubernetes in a container context instead?
For example building the certbot application in a container instead of a VM.
Let's try to create an Ubuntu deployment that deploys one Ubuntu container into a vSphere Pod with persistent storage and a load balancer service from NSX-T to get to the /bin/bash shell of the deployed container.
Let’s go!
I created two yaml files for this, accessible from Github. You can read up on what these objects are here.
Filename: certbot-deployment.yaml
What's it for? The Kubernetes deployment specification.
What does it do? Deploys one Ubuntu pod, claims a 16Gi volume mounted at /mnt/sdb, and creates a load balancer service to enable remote management with SSH.

The second file creates a 16Gi persistent volume claim from the underlying vSphere storage class named tanzu-demo-storage; the PVC is then consumed by the deployment.
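To make the moving parts concrete, here is a sketch of what these manifests might contain, based on the behaviour described above: a 16Gi PVC from the tanzu-demo-storage storage class, one Ubuntu pod mounting it at /mnt/sdb, and an SSH LoadBalancer service. The image tag, command, and label names are assumptions; see the Github repo for the actual files.

```yaml
# Hypothetical sketch of the certbot deployment, PVC and service.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: certbot-pvc
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: tanzu-demo-storage   # vSphere storage class from the post
  resources:
    requests:
      storage: 16Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: certbot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: certbot
  template:
    metadata:
      labels:
        app: certbot
    spec:
      containers:
      - name: certbot
        image: ubuntu:20.04
        command: ["sleep", "infinity"]   # keep the container running
        volumeMounts:
        - name: data
          mountPath: /mnt/sdb            # the 16Gi volume seen in df later
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: certbot-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: certbot
spec:
  type: LoadBalancer                     # NSX-T provides the external IP
  selector:
    app: certbot
  ports:
  - port: 22
    targetPort: 22
```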
kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
certbot 1/1 1 1 47m
kubectl get pods
NAME READY STATUS RESTARTS AGE
certbot-68b4747476-pq5j2 1/1 Running 0 47m
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
certbot-pvc Bound pvc-418a0d4a-f4a6-4aef-a82d-1809dacc9892 16Gi RWO tanzu-demo-storage 84m
Let's log into our pod; note the name from the kubectl get pods command above.
certbot-68b4747476-pq5j2
It's not yet possible to log into the pod using SSH since this fresh container does not have SSH installed, so let's log in first using kubectl and install SSH.
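A minimal sketch of those two steps (the pod name is the one from kubectl get pods above; the package and service names assume a standard Ubuntu 20.04 image):

```shell
# Open a shell in the running container (pod name from `kubectl get pods`).
kubectl exec -it certbot-68b4747476-pq5j2 -- /bin/bash

# Then, inside the container, install and start the SSH server.
apt update && apt install -y openssh-server
service ssh start
```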
You will then be inside the container at the /bin/bash prompt.
root@certbot-68b4747476-pq5j2:/# ls
bin dev home lib32 libx32 mnt proc run srv tmp var
boot etc lib lib64 media opt root sbin sys usr
root@certbot-68b4747476-pq5j2:/#
Before we can log into the container over an SSH connection, we need to find out what the external IP is for the SSH service that the NSX-T load balancer configured for the deployment. You can find this using the command:
kubectl get services
kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
certbot LoadBalancer 10.96.0.44 172.16.2.3 22:31731/TCP 51m
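To pull just the external IP out of that output, you can parse the EXTERNAL-IP column, for example with awk. In this sketch the kubectl output is inlined as a string so the parsing is reproducible; normally you would pipe kubectl get services straight into awk:

```shell
# Sample `kubectl get services` output, inlined for illustration.
svc_output='NAME      TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
certbot   LoadBalancer   10.96.0.44   172.16.2.3    22:31731/TCP   51m'

# Print the EXTERNAL-IP (4th column) of the certbot service.
echo "$svc_output" | awk '$1 == "certbot" { print $4 }'
```

kubectl can also emit this directly with -o jsonpath='{.status.loadBalancer.ingress[0].ip}'.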
The IP that we use to get to the Ubuntu container over SSH is 172.16.2.3. Let's try that with a putty/terminal session…
login as: root
certbot@172.16.2.3's password:
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 4.19.126-1.ph3-esx x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.
To restore this content, you can run the 'unminimize' command.
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
$ ls
bin dev home lib32 libx32 mnt proc run srv tmp var
boot etc lib lib64 media opt root sbin sys usr
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
overlay 258724 185032 73692 72% /
/mnt/sdb 16382844 45084 16321376 1% /mnt/sdb
tmpfs 249688 12 249676 1% /run/secrets/kubernetes.io/serviceaccount
/dev/sda 258724 185032 73692 72% /dev/termination-log
$
You can see that there is a 16Gi mount at /mnt/sdb, just as we specified, and remote SSH access is working.