This post summarizes how you can migrate the VMware Cloud Director database from PostgreSQL running in the VCD appliance to a PostgreSQL pod running in Kubernetes, and then create new VCD cells running as Kubernetes pods to run the VCD services. In short, modernizing VCD into a modern application.
I wanted to experiment with VMware Cloud Director to see if it would run in Kubernetes. One of the reasons for this is to reduce resource consumption in my home lab. The VCD appliance is quite a resource-hungry VM, needing a minimum of 2 vCPUs and 6 GB of RAM. Running VCD in Kubernetes reduces this footprint considerably and frees up much-needed RAM for other applications. Running this workload in Kubernetes also brings faster deployment, higher availability, easier lifecycle management and operations, and additional benefits from the ecosystem, such as observability tools.
Here’s a view of the current VCD appliance in the portal. 172.16.1.34 is the IP of the appliance, 172.16.1.0/27 is the network for the NSX-T segment that I’ve created for the VCD DMZ network. At the end of this post, you’ll see VCD running in Kubernetes pods with IP addresses assigned by the CNI instead.
Tanzu Kubernetes Grid Shared Services Cluster
I am using a Tanzu Kubernetes Grid cluster set up for shared services. It's the ideal place to run applications that, in the virtual machine world, would have run in a traditional vSphere management cluster. I also run the Container Service Extension and App Launchpad Kubernetes pods in this cluster.
Step 1. Deploy PostgreSQL with Kubeapps into a Kubernetes cluster
If you have Kubeapps, this is the easiest way to deploy PostgreSQL.
Copy my settings below to create a PostgreSQL database server and the vcloud user and database that are required for the database restore.
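If you prefer to see the relevant values in text form, the excerpt below is a sketch of the Bitnami chart's auth block that creates the vcloud user and database. The password shown is the example lab password used later in this post; substitute your own.

# values.yaml excerpt (a sketch): create the vcloud role and database that
# the VCD restore expects. "Vmware1!" is this post's example lab password;
# replace it with your own.
auth:
  postgresPassword: "Vmware1!"   # password for the postgres superuser
  username: "vcloud"             # application user required by VCD
  password: "Vmware1!"           # password for the vcloud user
  database: "vcloud"             # database that the backup will be restored into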
Step 1 (alternative). Use Helm directly.
# Create the database server using Kubeapps or Helm, with the vcloud user and password
helm repo add bitnami https://charts.bitnami.com/bitnami

# Pull the chart, unpack it, then edit values.yaml
helm pull bitnami/postgresql
tar zxvf postgresql-11.1.11.tgz

helm install postgresql bitnami/postgresql -f /home/postgresql/values.yaml -n vmware-cloud-director

# Expose the postgres service using a load balancer
k expose pod -n vmware-cloud-director postgresql-primary-0 --type=LoadBalancer --name postgresql-public

# Get the IP address of the load balancer service
k get svc -n vmware-cloud-director postgresql-public

# Connect to the database as the postgres user from the VCD appliance to test the connection
psql --host 172.16.4.70 -U postgres -p 5432
# Enter the password you set when you deployed PostgreSQL

# Quit
\q
Step 2. Backup database from VCD appliance and restore to PostgreSQL Kubernetes pod
Log into the VCD appliance using SSH.
# Stop VCD services on all VCD appliances
service vmware-vcd stop

# Back up the database and important files on the VCD appliance
/opt/vmware/appliance/bin/create_backup.sh

# Unzip the backup file into /opt/vmware/vcloud-director/data/transfer/backups

# Restore the database using the pg_dump backup file. Do this from the VCD
# appliance as it already has the postgres tools installed.
pg_restore --host 172.16.4.70 -U postgres -p 5432 -C -d postgres /opt/vmware/vcloud-director/data/transfer/backups/vcloud-database.sql

# Edit responses.properties and change the database server address from the
# load balancer IP to the FQDN assigned to the postgresql pod, e.g.
# postgresql-primary.vmware-cloud-director.svc.cluster.local

# Shut down the VCD appliance; it's no longer needed
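The responses.properties edit can also be scripted. The snippet below is a sketch that operates on a stand-in file with a hypothetical jdbc line, since the exact property layout depends on your VCD version; on the appliance you would point FILE at the real responses.properties from the backup.

# Sketch: swap the database host in responses.properties from the load
# balancer IP to the in-cluster FQDN. A stand-in file with a hypothetical
# jdbc line is created here so the snippet is self-contained.
FILE=responses.properties
printf 'database.jdbcUrl=jdbc:postgresql://172.16.4.70:5432/vcloud\n' > "$FILE"  # stand-in line
sed -i 's/172\.16\.4\.70/postgresql-primary.vmware-cloud-director.svc.cluster.local/g' "$FILE"
cat "$FILE"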
Step 3. Deploy Helm Chart for VCD
# Pull the Helm chart
helm pull oci://harbor.vmwire.com/library/vmware-cloud-director

# Uncompress the Helm chart
tar zxvf vmware-cloud-director-0.5.0.tgz

# Edit the values.yaml to suit your needs

# Deploy the Helm chart
helm install vmware-cloud-director vmware-cloud-director --version 0.5.0 -n vmware-cloud-director -f /home/vmware-cloud-director/values.yaml

# Wait about five minutes for the installation to complete

# Monitor the logs
k logs -f -n vmware-cloud-director vmware-cloud-director-0
If you see an error such as:
Error starting application: Unable to create marker file in the transfer spooling area: VfsFile[fileObject=file:///opt/vmware/vcloud-director/data/transfer/cells/4c959d7c-2e3a-4674-b02b-c9bbc33c5828]
This is because the transfer share was created by a different vcloud user on the original VCD appliance. That user has a different Linux user ID, normally 1000 or 1001, which we need to change to match the new vcloud user.
Run the following commands to resolve this issue:
# Launch a bash session into the VCD pod
k exec -it -n vmware-cloud-director vmware-cloud-director-0 -- /bin/bash

# Change ownership of the /transfer share to the vcloud user
chown -R vcloud:vcloud /opt/vmware/vcloud-director/data/transfer

# Type exit to quit
exit
Once that’s done, the cell can start and you’ll see the following:
Successfully verified transfer spooling area: VfsFile[fileObject=file:///opt/vmware/vcloud-director/data/transfer]
Cell startup completed in 2m 26s
The VCD pod is exposed using a load balancer in Kubernetes. Ports 443 and 8443 are exposed on a single IP, just like how it is configured on the VCD appliance.
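For illustration, a Service exposing both ports on a single IP looks roughly like the sketch below; the Helm chart creates this for you, and the selector label is an assumption about how the chart labels its pods.

# Sketch of a LoadBalancer Service exposing VCD's HTTPS and console proxy
# ports on a single IP. The selector is a guess at the chart's pod labels;
# check the chart's templates for the real values.
apiVersion: v1
kind: Service
metadata:
  name: vmware-cloud-director
  namespace: vmware-cloud-director
spec:
  type: LoadBalancer
  selector:
    app: vmware-cloud-director   # assumed label
  ports:
    - name: https
      port: 443
      targetPort: 443
    - name: console-proxy
      port: 8443
      targetPort: 8443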
Run the following to obtain the new load balancer IP address of VCD.
k get svc -n vmware-cloud-director vmware-cloud-director
NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                        AGE
vmware-cloud-director   LoadBalancer   100.64.230.197   172.16.4.71   443:31999/TCP,8443:30016/TCP   16m
Redirect your DNS server record to point to this new IP address for both the HTTP and VMRC services, e.g., 172.16.4.71.
If everything ran successfully, you should now be able to log into VCD. Here’s my VCD instance that I use for my lab environment which was previously running in a VCD appliance, now migrated over to Kubernetes.
Notice that the old cell is now inactive because it is powered off. It can now be removed from VCD and deleted from vCenter.
The pod vmware-cloud-director-0 is now running the VCD application. Notice its assigned IP address of 100.107.74.159. This is the pod’s IP address.
Everything else works as normal: any UI customizations and TLS certificates are kept just as they were before the migration, because we restored the database and used responses.properties to add the new cells.
Even opening a remote console to a VM will continue to work.
Load Balancer is NSX Advanced LB (Avi)
Avi provides the load balancing services automatically through the Avi Kubernetes Operator (AKO).
AKO automatically configures the services in Avi for you when services are exposed.
Deploy another VCD cell, I mean pod
It is now very easy to scale VCD by deploying additional replicas.
Edit the values.yaml file and change the replicas number from 1 to 2.
# Upgrade the Helm chart
helm upgrade vmware-cloud-director vmware-cloud-director --version 0.5.0 -n vmware-cloud-director -f /home/vmware-cloud-director/values.yaml

# Wait about five minutes for the upgrade to complete

# Monitor the logs of the new pod
k logs -f -n vmware-cloud-director vmware-cloud-director-1
When the VCD services start up successfully, you’ll notice that the cell will appear in the VCD UI and Avi is also updated automatically with another pool.
We can also see that Avi is load balancing traffic across the two pods.
Deploy as many replicas as you like.
Here’s a very brief overview of what we have deployed so far.
Notice that the two PostgreSQL pods together are only using about 700 MB of RAM. The VCD pods consume much more, but this is still a vast improvement over the 6 GB that a single appliance needed previously.
You can ensure that the VCD pods are scheduled on different Kubernetes worker nodes by using a multi availability zone topology. To do this, just change the values.yaml.
# Availability zones in deployment.yaml are set up for TKG and must match
# VsphereFailureDomain and VsphereDeploymentZones
availabilityZones:
  enabled: true
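Under the hood, an option like this typically renders scheduling constraints similar to the sketch below; the actual spec depends on the chart's templates, and the pod label is an assumption.

# Sketch: spread the statefulset's pods across zones. This is what an
# availability zone option commonly renders to; the real template may differ.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: vmware-cloud-director   # assumed pod label
        topologyKey: topology.kubernetes.io/zone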
This makes sure that if you scale up the vmware-cloud-director statefulset, Kubernetes will not place any two of the pods on the same worker node.
As you can see from the Kubernetes Dashboard output under Resource usage above, the vmware-cloud-director pods are scheduled on different worker nodes. More importantly, I have used the same approach for the postgresql-read-0 pods. It is really important to keep these separate in case of a failure of a worker node, or of the ESX server that the worker node runs on.
Here are a few screenshots of VCD, CSE and ALP all running in my Shared Services Kubernetes cluster.
Backing up the PostgreSQL database
For Day 2 operations, such as backing up the PostgreSQL database, you can use Velero or just take a backup of the database using the pg_dump tool.
Backing up the database with pg_dump using a Docker container
It's super easy to take a database backup using a Docker container; just make sure you have Docker running on your workstation and that it can reach the load balancer IP address of the PostgreSQL service.
docker run -it -e PGPASSWORD=Vmware1! postgres:14.2 pg_dump -h 172.16.4.70 -U postgres vcloud > backup.sql
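Once the dump completes, a quick sanity check is worthwhile, since plain-format pg_dump output always contains a "-- PostgreSQL database dump" header comment. The snippet below is a sketch that writes a stand-in backup.sql so it can run anywhere; on your workstation, skip the printf and check the real dump produced by the docker command above.

# Sanity-check a pg_dump output file. A stand-in backup.sql is created here
# so the snippet is self-contained; drop the printf when checking a real dump.
printf -- '--\n-- PostgreSQL database dump\n--\n' > backup.sql  # stand-in
if [ -s backup.sql ] && grep -q 'PostgreSQL database dump' backup.sql; then
  echo "backup.sql looks like a pg_dump file"
else
  echo "backup.sql is missing or not a pg_dump file" >&2
fi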
The command will create a file in the current working directory named backup.sql.
Backing up the database with Velero
Please see this other post on how to setup Velero and Restic to backup Kubernetes pods and persistent volumes.
To create a backup of the PostgreSQL database using Velero run the following command.
velero backup create postgresql --ordered-resources 'statefulsets=vmware-cloud-director/postgresql-primary' --include-namespaces=vmware-cloud-director
Describe the backup
velero backup describe postgresql
Show backup logs
velero backup logs postgresql
To delete the backup
velero backup delete postgresql