CSE TKG Clusters can’t pull from GitHub

During TKG cluster creation you might see the following errors.

Error: failed to get
provider components for the "cluster-api:v1.1.3" provider: failed to get
repository client for the CoreProvider with name cluster-api: error creating
the GitHub repository client: failed to get GitHub latest version: failed to
get repository versions: failed to get repository versions: rate limit for
github api has been reached. Please wait one hour or get a personal API
token and assign it to the GITHUB_TOKEN environment variable

This is caused by GitHub rate limiting of anonymous API access. CSE TKG clusters download Cluster API provider components from GitHub during creation, and if you make too many requests within a short period of time, you will eventually hit the rate limit.

To avoid hitting the limit, create a GitHub personal access token.

Then configure CSE to use the GitHub access token by following the CSE documentation here.
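
To see how much of the GitHub API quota is left, with and without a token, something like the following can help. This is a minimal sketch, assuming curl is installed and the token is exported as GITHUB_TOKEN (the variable named in the error message). The function names are illustrative and not part of CSE.

```shell
#!/usr/bin/env bash
# Assumption: the token is exported as GITHUB_TOKEN, the variable named in the
# error message. Function names here are illustrative only.

# Print the Authorization header when a token is set; print nothing otherwise.
github_auth_header() {
  if [ -n "${GITHUB_TOKEN:-}" ]; then
    printf 'Authorization: Bearer %s\n' "$GITHUB_TOKEN"
  fi
}

# Ask GitHub for the current quota. The JSON response reports the limit,
# the remaining requests and the reset time for the calling identity.
github_rate_limit() {
  header="$(github_auth_header)"
  if [ -n "$header" ]; then
    curl -sS -H "$header" https://api.github.com/rate_limit
  else
    curl -sS https://api.github.com/rate_limit
  fi
}

# Example: compare the anonymous and authenticated quota.
#   github_rate_limit
#   GITHUB_TOKEN=<your token> github_rate_limit
```

If the authenticated call reports a much larger "remaining" value than the anonymous one, the token is being accepted.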

Best practices for installing CSE 4.0

Container Service Extension 4 was released recently. This post aims to ease the setup of CSE 4.0, which has a different deployment model: it uses the Solutions framework instead of deploying the CSE appliance into the traditional management cluster that service providers use to run VMware management components such as vCenter, NSX-T Managers, Avi Controllers and other management systems.

Step 1 – Create a CSE Service Account

Perform these steps using the administrator@system account or an equivalent system administrator role.

Set up a service account in the Provider (system) organization with the role CSE Admin Role.

In my environment I created a user to use as a service account named svc-cse. You’ll notice that this user has been assigned the CSE Admin Role.

The CSE Admin Role is created automatically by CSE when you use the CSE Management UI as a Provider administrator, so perform these steps using the administrator@system account.

Step 2 – Create a token for the Service Account

Log out of VCD and log back into the Provider organization as the service account you created in Step 1 above. Once logged in, it should look like the following screenshot. Notice that the svc-cse user is logged into the Provider organization.

Click on the downward arrow at the top right of the screen, next to the user svc-cse and select User Preferences.

Under Access Tokens, create a new token and copy the token to a safe place. This is what you use to deploy the CSE appliance later.

Log out of VCD and log back in as administrator@system to the Provider organization.

Step 3 – Deploy CSE appliance

Create a new tenant Organization where you will run CSE. This new organization is dedicated to VCD extensions such as CSE and is managed by the service provider.

For example, you can name this new organization something like “solutions-org”. Create an Org VDC within this organization and also the necessary network infrastructure such as a T1 router and an organization network with internet access.

Still logged into the Provider organization, open another tab by clicking on the Open in Tenant Portal link to your “solutions-org” organization. You must deploy the CSE vApp as a Provider.

Now you can deploy the CSE vApp.

Use the Add vApp From Catalog workflow.

Accept the EULA and continue with the workflow.

When you get to Step 8 of the Create vApp from Template workflow, ensure that you set up the OVF properties as in my screenshot below:

The important thing to note is to ensure that you are using the correct service account username and use the token from Step 2 above.

Also since you must have the service account in the Provider organization, leave the default system organization for CSE service account’s org.

The last value is very important: it must be set to the tenant organization that will run the CSE appliance, in our case the “solutions-org” org.

Once the OVA is deployed, you can boot it. If you want to customize the root password, do so before you start the vApp; otherwise the default credentials are root and vmware.

Rights required for deploying TKG clusters

Ensure that the user that is logged into a tenant organization has the correct rights to deploy a TKG cluster. This user must have at a minimum the rights in the Kubernetes Cluster Author Global Role.

App LaunchPad

You’ll also need to upgrade App Launchpad to the latest version alp-2.1.2-20764259 to support CSE 4.0 deployed clusters.

Also ensure that the App-Launchpad-Service role has the rights to manage CAPVCD clusters.

Otherwise you may encounter the following issue:

VCD API Protected by Web Application Firewalls

If you are using a web application firewall (WAF) in front of your VCD cells and blocking access to the provider-side APIs, you will need to add the SNAT IP address of the T1 from the solutions-org to the WAF whitelist.

The CSE appliance will need access to the VCD provider side APIs.

I wrote about using a WAF in front of VCD in the past to protect provider side APIs. You can read those posts here and here.

Cleaning up CSE 4.0 beta

For those partners that have been testing the beta, you’ll need to remove all traces of it before you can install the GA version. VMware does not support upgrading or migrating from beta builds to GA builds.

This is a post to help you clean up your VMware Cloud Director environment in preparation for the GA build of CSE 4.0.

If you don’t clean up, when you try to configure CSE again with the CSE Management wizard, you’ll see the message below:

“Server configuration entity already exists.”

Delete CSE Roles

First delete all the CSE Roles that the beta has set up. The GA version of CSE will recreate these for you when you use the CSE management wizard. Don’t forget to assign the new role to your CSE service account when you deploy the CSE GA OVA.

Use the Postman Collection to clean up

I’ve included a Postman collection on my Github account, available here.

Hopefully, it is self-explanatory. Authenticate against the VCD API, then run each API request in order. Make sure you obtain the entity and entityType IDs before you delete.

If you’re unable to delete the entity or entityTypes, you may need to delete all of the CSE clusters first. That means cleaning up all PVCs, PVs and deployments, and then the clusters themselves.

Deploy CSE GA Normally

You’ll now be able to use the Configure Management wizard and deploy CSE 4.0 GA as normal.

Known Issues

If you’re unable to delete any of these entities then run a POST using /resolve.

For example, https://vcd.vmwire.com/api-explorer/provider#/definedEntity/resolveDefinedEntity

Once it is resolved, you can go ahead and delete the entity.
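
The resolve-then-delete sequence can also be scripted with curl instead of the API explorer. This is a minimal sketch, assuming the defined-entity endpoints under /cloudapi/1.0.0/entities, a bearer token obtained from an earlier login, and an API version of 37.0. VCD_HOST, VCD_TOKEN, VCD_API_VERSION and the function names are placeholders of mine, so adjust them to your environment.

```shell
#!/usr/bin/env bash
# Assumptions: VCD_HOST, VCD_TOKEN (a bearer token from an earlier login) and
# VCD_API_VERSION are placeholders; adjust them to your environment.
VCD_HOST="${VCD_HOST:-vcd.vmwire.com}"
VCD_API_VERSION="${VCD_API_VERSION:-37.0}"

# Build the CloudAPI URL for a defined entity, optionally with a sub-path.
entity_url() {
  printf 'https://%s/cloudapi/1.0.0/entities/%s%s\n' "$VCD_HOST" "$1" "${2:-}"
}

# Send a request with the Accept and Authorization headers; -f makes curl
# return a non-zero status on HTTP errors.
vcd_call() {
  curl -fsS -X "$1" "$2" \
    -H "Accept: application/json;version=${VCD_API_VERSION}" \
    -H "Authorization: Bearer ${VCD_TOKEN:-}"
}

# Resolve the entity first, then delete it only if the resolve succeeded.
resolve_and_delete() {
  vcd_call POST "$(entity_url "$1" /resolve)" && vcd_call DELETE "$(entity_url "$1")"
}

# Example: resolve_and_delete "<entity-id>"
```

Chaining the two calls with && means the delete is only attempted after a successful resolve.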

VMware Cloud Director, Container Service Extension and App Launchpad Running in Kubernetes

I’ve been experimenting with the VMware Cloud Director, Container Service Extension and App Launchpad applications and wanted to test if these applications would run in Kubernetes.

The short answer is yes!

I initially deployed these apps as a standalone Docker container to see if they would run as a container. I wanted to eventually get them to run in a Kubernetes cluster to benefit from all the goodies that Kubernetes provides.

Packaging the apps wasn’t too difficult; it just needed patience and a lot of Googling. The process was as follows:

Run a container from a Linux base image, CentOS for VCD and Photon for ALP and CSE.
Prepare all the prerequisites, such as yum update and tdnf update.
Commit the image to a Harbor registry.
Build a Helm chart to deploy the applications using the images, and then create a shell script that runs when the image starts to install and run the applications.

Well, it’s not that simple, but you can take a look at the code for all three Helm Charts on my Github or pull them from my public Harbor repository.
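
The first three steps of that process can be sketched with plain Docker commands. This is an illustration rather than the exact commands used for these charts: the container name, the photon:4.0 tag and the target image name are assumptions. By default the function only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# build_base_image BASE PREP_COMMAND TARGET
build_base_image() {
  base="$1"; prep="$2"; target="$3"
  # 1. Start from a Linux base image and run the prerequisite command in it.
  run docker run --name app-prep "$base" sh -c "$prep"
  # 2. Commit the stopped container as a new image, tagged for the registry.
  run docker commit app-prep "$target"
  # 3. Push the image to the registry and remove the helper container.
  run docker push "$target"
  run docker rm app-prep
}

# Example (hypothetical image names):
#   build_base_image photon:4.0 'tdnf -y update' harbor.example.com/library/alp-base:0.1
```

The Helm chart and the startup script from the last step are not covered here; they live in the repositories linked in this post.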

VMware Cloud Director

Github: https://github.com/hugopow/vmware-cloud-director

Helm Chart: helm pull oci://harbor.vmwire.com/library/vmware-cloud-director

How to install: Update values.yaml and then run

helm install vmware-cloud-director oci://harbor.vmwire.com/library/vmware-cloud-director --version 0.5.0 -n vmware-cloud-director

Notice how easy that was to install?

The values.yaml file is the only file you’ll need to edit, just update to suit your environment.

# Default values for vmware-cloud-director.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 1

installFirstCell:
  enabled: true

installAdditionalCell:
  enabled: false

storageClass: iscsi
pvcCapacity: 2Gi

vcdNfs:
  server: 10.92.124.20
  mountPath: /mnt/nvme/vcd-k8s

vcdSystem:
  user: administrator
  password: Vmware1!
  email: admin@domain.local
  systemName: VCD
  installationId: 1

postgresql:
  dbHost: postgresql.vmware-cloud-director.svc.cluster.local
  dbName: vcloud
  dbUser: vcloud
  dbPassword: Vmware1!

# Availability zones in deployment.yaml are setup for TKG and must match VsphereFailureDomain and VsphereDeploymentZones
availabilityZones:
  enabled: false

httpsService:
  type: LoadBalancer
  port: 443

consoleProxyService:
  port: 8443

publicAddress:
  uiBaseUri: https://vcd-k8s.vmwire.com
  uiBaseHttpUri: http://vcd-k8s.vmwire.com
  restapiBaseUri: https://vcd-k8s.vmwire.com
  restapiBaseHttpUri: http://vcd-k8s.vmwire.com
  consoleProxy: vcd-vmrc.vmwire.com

tls:
  certFullChain: |-
    -----BEGIN CERTIFICATE-----
          wildcard certificate
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
          intermediate certificate
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
          root certificate
    -----END CERTIFICATE-----
  certKey: |-
    -----BEGIN PRIVATE KEY-----
          wildcard certificate private key
    -----END PRIVATE KEY-----

The installation process is quite fast, less than three minutes to get the first pod up and running and two minutes for each subsequent pod. That means a VCD multi-cell system up and running in less than ten minutes.

I’ve deployed VCD as a StatefulSet with three replicas. Because the replica count is set to three, three VCD “Pods” are deployed; in the old world these would be the cells. Here you can see three pods running, which provide both load balancing and high availability. The other pod is the PostgreSQL database that these cells use. You should also be able to see that Kubernetes has scheduled each pod on a different worker node. I have three worker nodes in this Kubernetes cluster.

Below is the view in VCD of the three cells.

The StatefulSet also has a LoadBalancer service configured for performing the load balancing of the HTTP and Console Proxy traffic on TCP 443 and TCP 8443 respectively.

You can see the LoadBalancer service has configured the services for HTTP and Console Proxy. Note that this is done automatically by Kubernetes using a manifest in the Helm Chart.

Migrating an existing VCD instance to Kubernetes

If you want to migrate an existing instance to Kubernetes, then use this post here.

Container Service Extension

Github: https://github.com/hugopow/container-service-extension

Helm Chart: helm pull oci://harbor.vmwire.com/library/container-service-extension

How to install: Update values.yaml and then run helm install container-service-extension oci://harbor.vmwire.com/library/container-service-extension --version 0.2.0 -n container-service-extension

Here’s CSE running as a pod in Kubernetes. Since CSE is a stateless application, I’ve configured it to run as a Deployment.

CSE also does not need a database as it purely communicates with VCD through a message bus such as MQTT or RabbitMQ. Additionally no external access to CSE is required as this is done via VCD, so no load balancer is needed either.

You can see that when CSE is idle it only needs 1 millicore of CPU and 102 MiB of RAM. This is much better in terms of resource requirements than running CSE in a VM, and it is one of the advantages of running pods instead of VMs: pods use considerably fewer resources.

App Launchpad

Github: https://github.com/hugopow/app-launchpad

Helm Chart: helm pull oci://harbor.vmwire.com/library/app-launchpad

How to install: Update values.yaml and then run helm install app-launchpad oci://harbor.vmwire.com/library/app-launchpad --version 0.4.0 -n app-launchpad

The values.yaml file is the only file you’ll need to edit, just update to suit your environment.

# Default values for app-launchpad.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

alpConnect:
  saUser: "svc-alp"
  saPass: Vmware1!
  url: https://vcd-k8s.vmwire.com
  adminUser: administrator@system
  adminPass: Vmware1!
  mqtt: true
  eula: accept
# If you accept the EULA then type "accept" in the EULA key value to install ALP. You can find the EULA in the README.md file.

I’ve already written an article about ALP here. That article contains a lot more details so I’ll share a few screenshots below for ALP.

Just like CSE, ALP is a stateless application and is deployed as a Deployment. ALP also does not require external access through a load balancer as it too communicates with VCD using the MQTT or RabbitMQ message bus.

You can see that ALP when idle requires just 3 millicores of CPU and 400 MiB of RAM.

ALP can be deployed with multiple instances to provide load balancing and high availability. This is done by deploying RabbitMQ and connecting ALP and VCD to the same exchange. VCD does not support multiple instances of ALP if MQTT is used.

When RabbitMQ is configured, then ALP can be scaled by changing the Deployment number of replicas to two or more. Kubernetes would then deploy additional pods with ALP.
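
Scaling can be done with kubectl. This is a minimal sketch, assuming the Deployment and namespace are both named app-launchpad (taken from the Helm release name and namespace used above; check the real name with kubectl get deployments). By default it only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/usr/bin/env bash
# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# scale_alp REPLICAS: scale the ALP Deployment and wait for the rollout.
scale_alp() {
  # Reject empty, non-numeric and zero values before touching the cluster.
  case "$1" in
    ''|*[!0-9]*|0)
      echo "replicas must be a positive integer" >&2
      return 1
      ;;
  esac
  run kubectl scale deployment/app-launchpad -n app-launchpad --replicas="$1"
  run kubectl rollout status deployment/app-launchpad -n app-launchpad
}

# Example: DRY_RUN=0 scale_alp 2
```

Only scale beyond one replica when RabbitMQ is in use, for the reason given above.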

Migrating VMware Cloud Director to Kubernetes

This post summarizes how you can migrate the VMware Cloud Director database from PostgreSQL running in the VCD appliance into a PostgreSQL pod running in Kubernetes, and then create new VCD cells running as pods in Kubernetes to run VCD services. In short, it turns VCD into a modern application.

I wanted to experiment with VMware Cloud Director to see if it would run in Kubernetes. One of the reasons is to reduce resource consumption in my home lab. The VCD appliance can be a resource-hungry VM, needing a minimum of 2 vCPUs and 6 GB of RAM. Running VCD in Kubernetes would reduce this and free up much-needed RAM for other applications. Other benefits of running this workload in Kubernetes include faster deployment, higher availability, easier lifecycle management and operations, and access to ecosystem tooling such as observability tools.

Here’s a view of the current VCD appliance in the portal. 172.16.1.34 is the IP of the appliance, 172.16.1.0/27 is the network for the NSX-T segment that I’ve created for the VCD DMZ network. At the end of this post, you’ll see VCD running in Kubernetes pods with IP addresses assigned by the CNI instead.

Tanzu Kubernetes Grid Shared Services Cluster

I am using a Tanzu Kubernetes Grid cluster set up for shared services. It’s the ideal place to run applications that in the virtual machine world would have been running in a traditional vSphere Management Cluster. I also run the Container Service Extension and App Launchpad pods in this cluster.

Step 1. Deploy PostgreSQL with Kubeapps into a Kubernetes cluster

If you have Kubeapps, this is the easiest way to deploy PostgreSQL.

Copy my settings below to create a PostgreSQL database server and the vcloud user and database that are required for the database restore.

Step 1. Alternatively, use Helm directly.

# Create database server using KubeApps or Helm, vcloud user with password

helm repo add bitnami https://charts.bitnami.com/bitnami

# Pull the chart, unzip then edit values.yaml
helm pull bitnami/postgresql
tar zxvf postgresql-11.1.11.tgz

helm install postgresql bitnami/postgresql -f /home/postgresql/values.yaml -n vmware-cloud-director

# Expose postgres service using load balancer (k is an alias for kubectl)
k expose pod -n vmware-cloud-director postgresql-primary-0 --type=LoadBalancer --name postgresql-public

# Get the IP address of the load balancer service
k get svc -n vmware-cloud-director postgresql-public

# Connect to database as postgres user from VCD appliance to test connection
psql --host 172.16.4.70 -U postgres -p 5432

# Type password you used when you deployed postgresql

# Quit
\q

Step 2. Backup database from VCD appliance and restore to PostgreSQL Kubernetes pod

Log into the VCD appliance using SSH.

# Stop vcd services on all VCD appliances
service vmware-vcd stop

# Backup database and important files on VCD appliance
/opt/vmware/appliance/bin/create_backup.sh

# Unzip the zip file into /opt/vmware/vcloud-director/data/transfer/backups

# Restore database using pg_dump backup file. Do this from the VCD appliance as it already has the postgres tools installed.

pg_restore --host 172.16.4.70 -U postgres -p 5432 -C -d postgres /opt/vmware/vcloud-director/data/transfer/backups/vcloud-database.sql

# Edit responses.properties and change IP address of database server from  load balancer IP to the assigned FQDN for the postgresql pod, e.g. postgresql-primary.vmware-cloud-director.svc.cluster.local

# Shut down the VCD appliance, it's no longer needed
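
The responses.properties edit can be scripted. This is a minimal sketch that replaces the load balancer IP with the in-cluster FQDN wherever it appears in the file, without assuming any particular property name. The function name is mine, and the file path in the example is where the file normally lives on a cell, so verify it on your system.

```shell
#!/usr/bin/env bash
# replace_db_host FILE OLD NEW
# Replace every literal occurrence of OLD with NEW in FILE, keeping a .bak copy.
replace_db_host() {
  file="$1"; old="$2"; new="$3"
  # Escape dots so the IP address is matched literally rather than as a regex.
  old_re="$(printf '%s' "$old" | sed 's/\./\\./g')"
  cp "$file" "$file.bak"
  sed "s/${old_re}/${new}/g" "$file.bak" > "$file"
}

# Example:
#   replace_db_host /opt/vmware/vcloud-director/etc/responses.properties \
#     172.16.4.70 postgresql-primary.vmware-cloud-director.svc.cluster.local
```

The .bak copy makes it easy to compare the two versions with diff before the file is used to add new cells.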

Step 3. Deploy Helm Chart for VCD

# Pull the Helm Chart
helm pull oci://harbor.vmwire.com/library/vmware-cloud-director

# Uncompress the Helm Chart
tar zxvf vmware-cloud-director-0.5.0.tgz

# Edit the values.yaml to suit your needs

# Deploy the Helm Chart
helm install vmware-cloud-director vmware-cloud-director --version 0.5.0 -n vmware-cloud-director -f /home/vmware-cloud-director/values.yaml

# Wait for about five minutes for the installation to complete

# Monitor logs
k logs -f  -n vmware-cloud-director vmware-cloud-director-0

Known Issues

If you see an error such as:

Error starting application: Unable to create marker file in the transfer spooling area: VfsFile[fileObject=file:///opt/vmware/vcloud-director/data/transfer/cells/4c959d7c-2e3a-4674-b02b-c9bbc33c5828]

This is due to the transfer share being created by a different vcloud user on the original VCD appliance. That user has a different Linux user ID, normally 1000 or 1001, so we need to change the ownership to work with the new vcloud user.

Run the following commands to resolve this issue:

# Launch a bash session into the VCD pod
k exec -it -n vmware-cloud-director vmware-cloud-director-0 -- /bin/bash

# Change ownership of the transfer share to the vcloud user
chown -R vcloud:vcloud /opt/vmware/vcloud-director/data/transfer

# type exit to quit
exit

Once that’s done, the cell can start and you’ll see the following:

Successfully verified transfer spooling area: VfsFile[fileObject=file:///opt/vmware/vcloud-director/data/transfer]
Cell startup completed in 2m 26s

Accessing VCD

The VCD pod is exposed using a load balancer in Kubernetes. Ports 443 and 8443 are exposed on a single IP, just like how it is configured on the VCD appliance.

Run the following to obtain the new load balancer IP address of VCD.

k get svc -n vmware-cloud-director  vmware-cloud-director

vmware-cloud-director   LoadBalancer   100.64.230.197   172.16.4.71   443:31999/TCP,8443:30016/TCP   16m

Redirect your DNS server record to point to this new IP address for both the HTTP and VMRC services, e.g., 172.16.4.71.

If everything ran successfully, you should now be able to log into VCD. Here’s my VCD instance that I use for my lab environment which was previously running in a VCD appliance, now migrated over to Kubernetes.

Notice that the old cell is now inactive because it is powered off. It can now be removed from VCD and deleted from vCenter.

The pod vmware-cloud-director-0 is now running the VCD application. Notice its assigned IP address of 100.107.74.159. This is the pod’s IP address.

Everything else will work as normal. UI customizations and TLS certificates are kept just as before the migration, because we restored the database and used the responses.properties to add new cells.

Even opening a remote console to a VM will continue to work.

Load Balancer is NSX Advanced LB (Avi)

Avi provides the load balancing services automatically through the Avi Kubernetes Operator (AKO).

AKO automatically configures the services in Avi for you when services are exposed.

Deploy another VCD cell, I mean pod

It is very easy now to scale the VCD by deploying additional replicas.

Edit the values.yaml file and change the replicaCount value from 1 to 2.

# Upgrade the Helm Chart
helm upgrade vmware-cloud-director vmware-cloud-director --version 0.5.0 -n vmware-cloud-director -f /home/vmware-cloud-director/values.yaml

# Wait for about five minutes for the installation to complete

# Monitor logs
k logs -f  -n vmware-cloud-director vmware-cloud-director-1

When the VCD services start up successfully, you’ll notice that the cell will appear in the VCD UI and Avi is also updated automatically with another pool.

We can also see that Avi is load balancing traffic across the two pods.

Deploy as many replicas as you like.

Resource usage

Here’s a very brief overview of what we have deployed so far.

Notice that the two PostgreSQL pods together are only using 700 MB of RAM. The VCD pods are consuming much more, but this is still a vast improvement over the 6 GB that one appliance needed previously.

High Availability

You can ensure that the VCD pods are scheduled on different Kubernetes worker nodes by using multi availability zone topology. To do this just change the values.yaml.

# Availability zones in deployment.yaml are setup for TKG and must match VsphereFailureDomain and VsphereDeploymentZones
availabilityZones:
  enabled: true

This makes sure that if you scale up the vmware-cloud-director statefulset, Kubernetes will ensure that each of the pods will not be placed on the same worker node.

As you can see from the Kubernetes Dashboard output under Resource usage above, vmware-cloud-director-0 and vmware-cloud-director-1 pods are scheduled on different worker nodes.

More importantly, you can see that I have also used the same for the postgresql-primary-0 and postgresql-read-0 pods. These are really important to keep separate in case of failure of a worker node or of an ESX server that the worker node runs on.
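
The placement can also be checked from the command line rather than from the dashboard. This is a minimal sketch, assuming kubectl access to the cluster. The helper name is mine; it reads "pod node" pairs on standard input and fails if any node hosts more than one of them.

```shell
#!/usr/bin/env bash
# Reads "POD NODE" lines on stdin; exits non-zero if a node appears twice.
pods_on_distinct_nodes() {
  awk '
    { count[$2]++ }
    END {
      for (n in count)
        if (count[n] > 1) { print "node " n " hosts " count[n] " pods"; bad = 1 }
      exit bad
    }'
}

# Example: check the VCD pods.
#   kubectl get pods -n vmware-cloud-director --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
#     | grep '^vmware-cloud-director-' | pods_on_distinct_nodes
```

Changing the grep pattern to '^postgresql-' runs the same check for the two database pods.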

Finally

Here are a few screenshots of VCD, CSE and ALP all running in my Shared Services Kubernetes cluster.

Backing up the PostgreSQL database

For Day 2 operations, such as backing up the PostgreSQL database you can use Velero or just take a backup of the database using the pg_dump tool.

Backing up the database with pg_dump using a Docker container

It’s super easy to take a database backup using a Docker container. Just make sure you have Docker running on your workstation and that it can reach the load balancer IP address for the PostgreSQL service.

docker run --rm -e PGPASSWORD='Vmware1!' postgres:14.2 pg_dump -h 172.16.4.70 -U postgres vcloud > backup.sql

The command will create a file in the current working directory named backup.sql.
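
Before relying on the file, a quick sanity check is worthwhile. This is a minimal sketch, assuming a plain-text dump as produced by the pg_dump command above. It only checks that the file is non-empty and carries the header and footer comments that pg_dump writes; it does not validate the contents. The function name is mine.

```shell
#!/usr/bin/env bash
# verify_pg_dump FILE: basic completeness check for a plain-text pg_dump file.
verify_pg_dump() {
  file="$1"
  # The file must exist and be non-empty.
  [ -s "$file" ] || { echo "backup is missing or empty: $file" >&2; return 1; }
  # pg_dump starts plain-text dumps with a "PostgreSQL database dump" comment...
  head -n 5 "$file" | grep -q 'PostgreSQL database dump' \
    || { echo "no pg_dump header found in $file" >&2; return 1; }
  # ...and ends them with a "PostgreSQL database dump complete" comment.
  tail -n 10 "$file" | grep -q 'PostgreSQL database dump complete' \
    || { echo "dump looks truncated: $file" >&2; return 1; }
}

# Example: verify_pg_dump backup.sql && echo "backup looks complete"
```

A missing footer usually means the dump was interrupted, for example by a dropped connection to the load balancer.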

Backing up the database with Velero

Please see this other post on how to set up Velero and Restic to back up Kubernetes pods and persistent volumes.

To create a backup of the PostgreSQL database using Velero run the following command.

velero backup create postgresql --ordered-resources 'statefulsets=vmware-cloud-director/postgresql-primary' --include-namespaces=vmware-cloud-director

Describe the backup

velero backup describe postgresql

Show backup logs

velero backup logs postgresql

To delete the backup

velero backup delete postgresql

Install Container Service Extension 3.1.1 with VCD 10.3.1

Prepare the Photon OS 3 VM

Deploy the OVA using this link.

Photon OS 3 does not support Linux guest customization unfortunately, so we will use the links below to manually setup the OS with a hostname and static IP address.

Boot the VM, the default credentials are root with password changeme. Change the default password.

Set host name by changing the /etc/hostname file.

Configure a static IP using this guide.

Add DNS server using this guide.

Reboot.
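
For reference, the static IP and DNS settings end up in a systemd-networkd unit file. This is a minimal sketch, assuming the interface is eth0. The helper name, the file name 10-static-en.network and the addresses in the example are placeholders, so follow the linked guides for the details of your release.

```shell
#!/usr/bin/env bash
# write_static_network FILE INTERFACE ADDRESS/PREFIX GATEWAY DNS
# Write a systemd-networkd unit with a static address, gateway and DNS server.
write_static_network() {
  cat > "$1" <<EOF
[Match]
Name=$2

[Network]
Address=$3
Gateway=$4
DNS=$5
EOF
  # The file must be readable by the systemd-networkd service.
  chmod 644 "$1"
}

# Example (placeholder addresses), run as root:
#   write_static_network /etc/systemd/network/10-static-en.network \
#     eth0 10.0.0.10/24 10.0.0.1 10.0.0.2
#   systemctl restart systemd-networkd
```

The low number in the file name matters because systemd-networkd reads unit files in lexical order, so this file is matched before any DHCP default with a higher number.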

Photon 3 has the older repositories, so we will need to update to newer repositories as detailed in this KB article. I’ve included this in the instructions below.

Copy and paste the commands below, or put them in a bash script.

# Update Photon repositories
cd /etc/yum.repos.d/
sed  -i 's/dl.bintray.com\/vmware/packages.vmware.com\/photon\/$releasever/g' photon.repo photon-updates.repo photon-extras.repo photon-debuginfo.repo

# If you get errors with the above command, then copy the command from the KB article.

# Update Photon
tdnf --assumeyes update

# Install dependencies
tdnf --assumeyes install build-essential python3-devel python3-pip git

# Update python3. CSE supports Python 3.7.3 or greater, but does not support Python 3.8 or above.
tdnf --assumeyes update python3

# Prepare cse user and application directories
mkdir -p /opt/vmware/cse
chmod 775 -R /opt
chmod 777 /
groupadd cse
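useradd cse -g cse -m -d /opt/vmware/cse
# useradd -p expects an already-encrypted password, so set the password separately
echo 'cse:Vmware1!' | chpasswd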
chown cse:cse -R /opt

# Run as cse user, add your public ssh key to CSE server
su - cse
mkdir -p ~/.ssh
cat >> ~/.ssh/authorized_keys << EOF
ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQEAhcw67bz3xRjyhPLysMhUHJPhmatJkmPUdMUEZre+MeiDhC602jkRUNVu43Nk8iD/I07kLxdAdVPZNoZuWE7WBjmn13xf0Ki2hSH/47z3ObXrd8Vleq0CXa+qRnCeYM3FiKb4D5IfL4XkHW83qwp8PuX8FHJrXY8RacVaOWXrESCnl3cSC0tA3eVxWoJ1kwHxhSTfJ9xBtKyCqkoulqyqFYU2A1oMazaK9TYWKmtcYRn27CC1Jrwawt2zfbNsQbHx1jlDoIO6FLz8Dfkm0DToanw0GoHs2Q+uXJ8ve/oBs0VJZFYPquBmcyfny4WIh4L0lwzsiAVWJ6PvzF5HMuNcwQ== rsa-key-20210508
EOF

cat >> ~/.bash_profile << EOF
# For Container Service Extension
export CSE_CONFIG=/opt/vmware/cse/config/config.yaml
export CSE_CONFIG_PASSWORD=Vmware1!
source /opt/vmware/cse/python/bin/activate
EOF

# Install CSE in virtual environment
python3 -m venv /opt/vmware/cse/python
source /opt/vmware/cse/python/bin/activate
pip3 install container-service-extension==3.1.1

cse version

source ~/.bash_profile

# Prepare vcd-cli
mkdir -p ~/.vcd-cli
cat >  ~/.vcd-cli/profiles.yaml << EOF
extensions:
- container_service_extension.client.cse
EOF

vcd cse version

# Add my Let's Encrypt intermediate and root certs. Use your certificates issued by your CA to enable verify=true with CSE.
cat >> /opt/vmware/cse/python/lib/python3.7/site-packages/certifi/cacert.pem << EOF
-----BEGIN CERTIFICATE-----
MIIFFjCCAv6gAwIBAgIRAJErCErPDBinU/bWLiWnX1owDQYJKoZIhvcNAQELBQAw
TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMjAwOTA0MDAwMDAw
WhcNMjUwOTE1MTYwMDAwWjAyMQswCQYDVQQGEwJVUzEWMBQGA1UEChMNTGV0J3Mg
RW5jcnlwdDELMAkGA1UEAxMCUjMwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEK
AoIBAQC7AhUozPaglNMPEuyNVZLD+ILxmaZ6QoinXSaqtSu5xUyxr45r+XXIo9cP
R5QUVTVXjJ6oojkZ9YI8QqlObvU7wy7bjcCwXPNZOOftz2nwWgsbvsCUJCWH+jdx
sxPnHKzhm+/b5DtFUkWWqcFTzjTIUu61ru2P3mBw4qVUq7ZtDpelQDRrK9O8Zutm
NHz6a4uPVymZ+DAXXbpyb/uBxa3Shlg9F8fnCbvxK/eG3MHacV3URuPMrSXBiLxg
Z3Vms/EY96Jc5lP/Ooi2R6X/ExjqmAl3P51T+c8B5fWmcBcUr2Ok/5mzk53cU6cG
/kiFHaFpriV1uxPMUgP17VGhi9sVAgMBAAGjggEIMIIBBDAOBgNVHQ8BAf8EBAMC
AYYwHQYDVR0lBBYwFAYIKwYBBQUHAwIGCCsGAQUFBwMBMBIGA1UdEwEB/wQIMAYB
Af8CAQAwHQYDVR0OBBYEFBQusxe3WFbLrlAJQOYfr52LFMLGMB8GA1UdIwQYMBaA
FHm0WeZ7tuXkAXOACIjIGlj26ZtuMDIGCCsGAQUFBwEBBCYwJDAiBggrBgEFBQcw
AoYWaHR0cDovL3gxLmkubGVuY3Iub3JnLzAnBgNVHR8EIDAeMBygGqAYhhZodHRw
Oi8veDEuYy5sZW5jci5vcmcvMCIGA1UdIAQbMBkwCAYGZ4EMAQIBMA0GCysGAQQB
gt8TAQEBMA0GCSqGSIb3DQEBCwUAA4ICAQCFyk5HPqP3hUSFvNVneLKYY611TR6W
PTNlclQtgaDqw+34IL9fzLdwALduO/ZelN7kIJ+m74uyA+eitRY8kc607TkC53wl
ikfmZW4/RvTZ8M6UK+5UzhK8jCdLuMGYL6KvzXGRSgi3yLgjewQtCPkIVz6D2QQz
CkcheAmCJ8MqyJu5zlzyZMjAvnnAT45tRAxekrsu94sQ4egdRCnbWSDtY7kh+BIm
lJNXoB1lBMEKIq4QDUOXoRgffuDghje1WrG9ML+Hbisq/yFOGwXD9RiX8F6sw6W4
avAuvDszue5L3sz85K+EC4Y/wFVDNvZo4TYXao6Z0f+lQKc0t8DQYzk1OXVu8rp2
yJMC6alLbBfODALZvYH7n7do1AZls4I9d1P4jnkDrQoxB3UqQ9hVl3LEKQ73xF1O
yK5GhDDX8oVfGKF5u+decIsH4YaTw7mP3GFxJSqv3+0lUFJoi5Lc5da149p90Ids
hCExroL1+7mryIkXPeFM5TgO9r0rvZaBFOvV2z0gp35Z0+L4WPlbuEjN/lxPFin+
HlUjr8gRsI3qfJOQFy/9rKIJR0Y/8Omwt/8oTWgy1mdeHmmjk7j1nYsvC9JSQ6Zv
MldlTTKB3zhThV1+XWYp6rjd5JW1zbVWEkLNxE7GJThEUG3szgBVGP7pSWTUTsqX
nLRbwHOoq7hHwg==
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
-----END CERTIFICATE-----
EOF

# Create service account
vcd login vcd.vmwire.com system administrator -p Vmware1!
cse create-service-role vcd.vmwire.com
# Enter system administrator username and password

# Create VCD service account for CSE
vcd user create --enabled svc-cse Vmware1! "CSE Service Role"

# Create config file
mkdir -p /opt/vmware/cse/config

cat > /opt/vmware/cse/config/config-not-encrypted.conf << EOF
mqtt:
  verify_ssl: false

vcd:
  host: vcd.vmwire.com
  log: true
  password: Vmware1!
  port: 443
  username: administrator
  verify: true

vcs:
- name: vcenter.vmwire.com
  password: Vmware1!
  username: administrator@vsphere.local
  verify: true

service:
  enforce_authorization: false
  legacy_mode: false
  log_wire: false
  no_vc_communication_mode: false
  processors: 15
  telemetry:
    enable: true

broker:
  catalog: cse-catalog
  ip_allocation_mode: pool
  network: default-organization-network
  org: cse
  remote_template_cookbook_url: https://raw.githubusercontent.com/vmware/container-service-extension-templates/master/template_v2.yaml
  storage_profile: 'iscsi'
  vdc: cse-vdc
EOF

cse encrypt /opt/vmware/cse/config/config-not-encrypted.conf --output /opt/vmware/cse/config/config.yaml
chmod 600 /opt/vmware/cse/config/config.yaml
cse check /opt/vmware/cse/config/config.yaml

cse template list

# Import TKGm ova with this command
# Copy the ova to /tmp/ first, the ova can be obtained from my.vmware.com, ensure that it has chmod 644 permissions.
cse template import -F /tmp/ubuntu-2004-kube-v1.20.5-vmware.2-tkg.1-6700972457122900687.ova

# You may need to enable 644 permissions on the file if cse complains that the file is not readable.

# Install CSE
cse install -k ~/.ssh/authorized_keys

# Or use this if you've already installed and want to skip template creation again
cse upgrade --skip-template-creation -k ~/.ssh/authorized_keys

# Register the cse extension with vcd if it did not already register
vcd system extension create cse cse cse vcdext '/api/cse, /api/cse/.*, /api/cse/.*/.*'

# Setup cse.sh
cat > /opt/vmware/cse/cse.sh << EOF
#!/usr/bin/env bash
source /opt/vmware/cse/python/bin/activate
export CSE_CONFIG=/opt/vmware/cse/config/config.yaml
export CSE_CONFIG_PASSWORD=Vmware1!
cse run
EOF

# Make cse.sh executable
chmod +x /opt/vmware/cse/cse.sh

# Deactivate the python virtual environment and go back to root
deactivate
exit

# Setup cse.service, use MQTT and not RabbitMQ
cat > /etc/systemd/system/cse.service << EOF
[Unit]
Description=Container Service Extension for VMware Cloud Director

[Service]
ExecStart=/opt/vmware/cse/cse.sh
User=cse
WorkingDirectory=/opt/vmware/cse
Type=simple
Restart=always

[Install]
WantedBy=default.target
EOF

systemctl enable cse.service
systemctl start cse.service

systemctl status cse.service

Enable the CSE UI Plugin for VCD

The new CSE UI extension is bundled with VCD 10.3.1.

Enable it for the tenants that you want or for all tenants.

Enable the rights bundles

Follow the instructions in this other post.

For 3.1.1 you will also need to edit the cse:nativeCluster Entitlement Rights Bundle and add the two following rights:

ACCESS CONTROL, User, Manage user’s own API token

COMPUTE, Organization VDC, Create a Shared Disk

Then publish the Rights Bundle to all tenants.

Enable Global Roles to use CSE or Configure Rights Bundles

The quickest way to get CSE working is to add the relevant rights to the Organization Administrator role. You can create a custom rights bundle and create a custom role for the k8s admin tenant persona if you like. I won’t cover that in this post.

Edit the Organization Administrator role and scroll all the way down to the bottom and click both the View 8/8 and Manage 12/12, then Save.

Setting up VCD CSI and CPI Operators

When the cluster is up, you might not be able to deploy any pods. This is because the cluster is not ready and is in a tainted state while the CSI and CPI Operators do not have the credentials they need.

kubectl get pods -A
NAMESPACE     NAME                                         READY   STATUS    RESTARTS   AGE
kube-system   antrea-agent-lhsxv                           2/2     Running   0          10h
kube-system   antrea-agent-pjwtp                           2/2     Running   0          10h
kube-system   antrea-controller-5cd95c574d-4qb7p           0/1     Pending   0          10h
kube-system   coredns-6598d898cd-9vbzv                     0/1     Pending   0          10h
kube-system   coredns-6598d898cd-wwpk9                     0/1     Pending   0          10h
kube-system   csi-vcd-controllerplugin-0                   0/3     Pending   0          37s
kube-system   etcd-mstr-h8mg                               1/1     Running   0          10h
kube-system   kube-apiserver-mstr-h8mg                     1/1     Running   0          10h
kube-system   kube-controller-manager-mstr-h8mg            1/1     Running   0          10h
kube-system   kube-proxy-2dzwh                             1/1     Running   0          10h
kube-system   kube-proxy-wd7tf                             1/1     Running   0          10h
kube-system   kube-scheduler-mstr-h8mg                     1/1     Running   0          10h
kube-system   vmware-cloud-director-ccm-5489b6788c-kgtsn   1/1     Running   0          13s

To bring up the pods to a ready state, you will need to follow this previous post.
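To spot the stuck pods quickly, the listing above can be filtered down to anything that is not Running. This is only a minimal sketch and not part of CSE: the function name not_running_pods is my own, and it assumes the six-column layout that kubectl get pods -A prints, where STATUS is the fourth column.

```shell
# Hypothetical helper: reads `kubectl get pods -A` output on stdin and prints
# namespace/name and status for every pod whose STATUS is not "Running".
# Assumes the column order NAMESPACE NAME READY STATUS RESTARTS AGE.
not_running_pods() {
  awk 'NR > 1 && $4 != "Running" { print $1 "/" $2 " " $4 }'
}

# Usage against a live cluster:
#   kubectl get pods -A | not_running_pods
```

Note that pods in a Completed state would also be listed, since the filter only checks for the literal status Running.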

Useful links

https://github.com/vmware/container-service-extension/commit/5d2a60b5eeb164547aef39602f9871c06726863e

https://vmware.github.io/container-service-extension/cse3_1/RELEASE_NOTES.html

Kubernetes Load Balancer Service for CSE on Cloud Director

This article describes how to set up vCenter, VCD, NSX-T and NSX Advanced Load Balancer to support exposing Kubernetes applications in Kubernetes clusters provisioned into VCD.

At the end of this post, you will be able to run this command:

kubectl expose deployment webserver --port=80 --type=LoadBalancer

… and have NSX ALB together with VCD and NSX-T automate the provisioning and setup of everything that allows you to expose that application to the outside world using a Kubernetes service of type LoadBalancer.

Create a Content Library for NSX ALB

In vCenter (Resource vCenter managing VCD PVDCs), create a Content Library for NSX Advanced Load Balancer to use to upload the service engine ova.

Create T1 for Avi Service Engine management network

Create T1 for Avi Service Engine management network. You can either attach this T1 to the default T0 or create a new T0.

enable DHCP server for the T1
enable All Static Routes and All Connected Segments & Service Ports under Route Advertisement

Create a network segment for Service Engine management network

Create a network segment for the Avi Service Engine management network. Attach the segment to the T1 that was created in the previous step.

Ensure you enable DHCP. This will assign IP addresses to the service engines automatically, and you won’t need to set up IPAM profiles in Avi Vantage.

NSX Advanced Load Balancer Settings

A couple of things to set up here.

You do not need to create any tenants in NSX ALB, just use the default admin context.
No IPAM/DNS Profiles are required as we will use DHCP from NSX-T for all networks.
Use FQDNs instead of IP addresses
Use the same FQDN in all systems for consistency and to ensure that registration between the systems works
- NSX ALB
- VCD
- NSX-T

Navigate to Administration, User Credentials and setup user credentials for NSX-T controller and vCenter server
Navigate to Administration, Settings, Tenant Settings and ensure that the settings are as follows

Setup an NSX-T Cloud

Navigate to Infrastructure, Clouds. Set up your cloud similar to mine. I have called my NSX-T cloud nsx.vmwire.com (which is the FQDN of my NSX-T Controller).

Let’s go through these settings from the top.

use the FQDN of your NSX-T manager for the name
click the DHCP option, we will be using NSX-T’s DHCP server so we can ignore IPAM/DNS later
enter something for the Object Name Prefix, this will give the SE VM name a prefix so they can be identified in vCenter. I used avi here, so it will look like this in vCenter

type the FQDN of the NSX-T manager into the NSX-T Manager Address
choose the NSX-T Manager Credentials that you configured earlier
select the Transport Zone that you are using in VCD for your tenants
- under Management Network Segment, select the T1 that you created earlier for SE management networking
- under Segment ID, select the network segment that you created earlier for the SE management network
click ADD under the Data Network Segment(s)
- select the T1 that is used by the tenant in VCD
- select the tenant organization routed network that is attached to the t1 in the previous task
the two previous settings tell NSX ALB where to place the data/vip network for front-end load balancing use. NSX-ALB will create a new segment for this in NSX-T automatically, and VCD will automatically create DNAT rules when a virtual service is requested in NSX ALB
the last step is to add the vCenter server, this would be the vCenter server that is managing the PVDCs used in VCD.

Now wait for a while until the status icon turns green and shows Complete.

Setup a Service Engine Group

Decide whether you want to use a shared service engine group for all VCD tenants or a dedicated service engine group for each tenant.

I use the dedicated model.

navigate to Infrastructure, Service Engine Group
change the cloud to the NSX-T cloud that you setup earlier
create a new service engine group with your preferred settings, you can read about the options here.

Setup Avi in VCD

Log into VCD as a Provider and navigate to Resources, Infrastructure Resources, NSX-ALB, Controllers and click on the ADD link.

Wait for a while for Avi to sync with VCD. Then continue to add the NSX-T Cloud.

Navigate to Resources, Infrastructure Resources, NSX-ALB, NSX-T Clouds and click on the ADD link.

Proceed when you can see the status is healthy.

Navigate to Resources, Infrastructure Resources, NSX-ALB, Service Engine Groups and click on the ADD link.

Staying logged in as a Provider, navigate to the tenant that you wish to enable NSX ALB load balancing services and navigate to Networking, Edge Gateways, Load Balancer, Service Engine Groups. Then add the service engine group to this tenant.

This will enable this tenant to use NSX ALB load balancing services.

Deploy a new Kubernetes cluster in VCD with Container Service Extension

Deploy a new Kubernetes cluster using Container Service Extension in VCD as normal.

Once the cluster is ready, download the kube config file and log into the cluster.

Check that all the nodes and pods are up as normal.

kubectl get nodes -A

kubectl get pods -A
NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE
kube-system   antrea-agent-7nlqs                          2/2     Running   0          21m
kube-system   antrea-agent-q5qc8                          2/2     Running   0          24m
kube-system   antrea-controller-5cd95c574d-r4q2z          0/1     Pending   0          8m38s
kube-system   coredns-6598d898cd-qswn8                    0/1     Pending   0          24m
kube-system   coredns-6598d898cd-s4p5m                    0/1     Pending   0          24m
kube-system   csi-vcd-controllerplugin-0                  0/3     Pending   0          4m29s
kube-system   etcd-mstr-zj9p                              1/1     Running   0          24m
kube-system   kube-apiserver-mstr-zj9p                    1/1     Running   0          24m
kube-system   kube-controller-manager-mstr-zj9p           1/1     Running   0          24m
kube-system   kube-proxy-76m4h                            1/1     Running   0          24m
kube-system   kube-proxy-9229x                            1/1     Running   0          21m
kube-system   kube-scheduler-mstr-zj9p                    1/1     Running   0          24m
kube-system   vmware-cloud-director-ccm-99fd59464-qjj7n   1/1     Running   0          24m

You might see that the following pods in the kube-system namespace are in a pending state. If everything is already working then move onto the next section.

kube-system   coredns-6598d898cd-qswn8     0/1     Pending
kube-system   coredns-6598d898cd-s4p5m     0/1     Pending
kube-system   csi-vcd-controllerplugin-0   0/3     Pending

This is due to the cluster waiting for the csi-vcd-controllerplugin-0 to start.

To get this working, we just need to configure the csi-vcd-controllerplugin-0 with the instructions in this previous post.

Once done, you’ll see that the pods are all now healthy.

kubectl get pods -A
NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE
kube-system   antrea-agent-7nlqs                          2/2     Running   0          23m
kube-system   antrea-agent-q5qc8                          2/2     Running   0          26m
kube-system   antrea-controller-5cd95c574d-r4q2z          1/1     Running   0          10m
kube-system   coredns-6598d898cd-qswn8                    1/1     Running   0          26m
kube-system   coredns-6598d898cd-s4p5m                    1/1     Running   0          26m
kube-system   csi-vcd-controllerplugin-0                  3/3     Running   0          60s
kube-system   csi-vcd-nodeplugin-twr4w                    2/2     Running   0          49s
kube-system   etcd-mstr-zj9p                              1/1     Running   0          26m
kube-system   kube-apiserver-mstr-zj9p                    1/1     Running   0          26m
kube-system   kube-controller-manager-mstr-zj9p           1/1     Running   0          26m
kube-system   kube-proxy-76m4h                            1/1     Running   0          26m
kube-system   kube-proxy-9229x                            1/1     Running   0          23m
kube-system   kube-scheduler-mstr-zj9p                    1/1     Running   0          26m
kube-system   vmware-cloud-director-ccm-99fd59464-qjj7n   1/1     Running   0          26m

Testing the Load Balancer service

Let’s deploy an nginx web server and expose it using all of the infrastructure that we set up above.

kubectl create deployment webserver --image nginx

Wait for the deployment to start and the pod to go into a running state. You can use this command to check:

kubectl get deploy webserver
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
webserver   1/1     1            1           7h47m

We can’t access the nginx default web page until we expose it using the load balancer service.

kubectl expose deployment webserver --port=80 --type=LoadBalancer
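If you prefer to keep the service definition in a file, the expose command can be approximated with a manifest. The sketch below prints one from a small shell function. The function name render_lb_service is hypothetical, and it assumes the pods carry the label app=webserver, which is what kubectl create deployment webserver sets by default.

```shell
# Sketch: print a Service manifest roughly equivalent to
#   kubectl expose deployment <name> --port=<port> --type=LoadBalancer
# Assumes the deployment's pods carry the label app=<name>, which is what
# `kubectl create deployment <name>` sets by default.
render_lb_service() {
  name="$1"
  port="$2"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
spec:
  type: LoadBalancer
  selector:
    app: ${name}
  ports:
    - port: ${port}
      targetPort: ${port}
      protocol: TCP
EOF
}

# Usage:
#   render_lb_service webserver 80 | kubectl apply -f -
```

Use either the expose command or the manifest, not both, as they create a service with the same name.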

Wait for the load balancer service to start and the pod to go into a running state. During this time, you’ll see the service engines being provisioned automatically by NSX ALB. It’ll take 10 minutes or so to get everything up and running.

You can use this command to check when the load balancer service has completed and check the EXTERNAL-IP.

kubectl get service webserver
NAME        TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
webserver   LoadBalancer   100.71.45.194   10.149.1.114   80:32495/TCP   7h48m
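Rather than re-running that command by hand while the service engines deploy, you could poll for the external IP. This is a minimal sketch of a generic retry loop; wait_for_output is a name I made up, and the attempt count and delay in the usage example are assumptions based on the 10 minutes or so mentioned above.

```shell
# Hypothetical helper: run a command repeatedly until it prints something,
# or give up after a number of attempts.
#   wait_for_output <attempts> <sleep-seconds> <command> [args...]
wait_for_output() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # A failing command is treated the same as empty output.
    result=$("$@" 2>/dev/null) || true
    if [ -n "$result" ]; then
      printf '%s\n' "$result"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage: poll for up to 15 minutes (90 attempts, 10 seconds apart)
#   wait_for_output 90 10 kubectl get service webserver \
#     -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```

The jsonpath expression returns nothing while EXTERNAL-IP is still pending, so the loop keeps going until NSX ALB has assigned an address.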

You can see that NSX ALB, VCD and NSX-T all worked together to expose the nginx application to the outside world.

The external IP of 10.149.1.114 in my environment is an uplink segment on a T0 that I have configured for VCD tenants to use as egress and ingress into their organization VDC. It is the external network for their VDCs.

Paste the external IP into a web browser and you should see the nginx web page.

In the next post, I’ll go over the end to end network flow to show how this all connects NSX ALB, VCD, NSX-T and Kubernetes together.

VMware Cloud Director CSI Driver for Kubernetes

Container Service Extension (CSE) 3.1.1 now supports persistent volumes that are backed by VCD’s Named Disk feature.

Setting up the VCD CSI driver on your Kubernetes cluster

Container Service Extension (CSE) 3.1.1 now supports persistent volumes that are backed by VCD’s Named Disk feature. These now appear under Storage – Named disks in VCD. To use this functionality today (28 September 2021), you’ll need to deploy CSE 3.1.1 beta with VCD 10.3. See this previous post for details.

Ideally, you want to deploy the CSI driver using the same user that also deployed the Kubernetes cluster into VCD. In my environment, I used a user named tenant1-admin, this user has the Organization Administrator role with the added right:

Compute – Organization VDC – Create a Shared Disk.

Create the vcloud-basic-auth.yaml

Before you can create persistent volumes you have to setup the Kubernetes cluster with the VCD CSI driver.

Ensure you can log into the cluster by downloading the kube config and logging into it using the correct context.

kubectl config get-contexts
CURRENT   NAME                          CLUSTER      AUTHINFO           NAMESPACE
*         kubernetes-admin@kubernetes   kubernetes   kubernetes-admin

Create the vcloud-basic-auth.yaml file which is used to setup the VCD CSI driver for this Kubernetes cluster.

VCDUSER=$(echo -n 'tenant1-admin' | base64)
PASSWORD=$(echo -n 'Vmware1!' | base64)

cat > vcloud-basic-auth.yaml << END
---
apiVersion: v1
kind: Secret
metadata:
 name: vcloud-basic-auth
 namespace: kube-system
data:
 username: "$VCDUSER"
 password: "$PASSWORD"
END
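Values under data: in a Secret must be base64 encoded, and a stray trailing newline would become part of the password, which is why the commands above use echo -n. The sketch below does the same encoding with printf and decodes the result again as a sanity check. The function names are my own, and base64 -d assumes GNU coreutils or a compatible base64.

```shell
# Encode a value for the `data:` section of a Kubernetes Secret.
# printf '%s' avoids the trailing newline; tr removes line wraps that some
# base64 implementations insert for long input.
b64_encode() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode again to confirm the round trip before applying the Secret.
# Assumes a base64 that accepts -d (GNU coreutils does).
b64_decode() {
  printf '%s' "$1" | base64 -d
}

# Usage:
#   VCDUSER=$(b64_encode 'tenant1-admin')
#   b64_decode "$VCDUSER"    # prints the original user name
```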

Install the CSI driver into the Kubernetes cluster.

kubectl apply -f vcloud-basic-auth.yaml

You should see three new pods starting in the kube-system namespace.

kube-system   csi-vcd-controllerplugin-0                  3/3     Running   0          43m     100.96.1.10     node-xgsw   <none>           <none>
kube-system   csi-vcd-nodeplugin-bckqx                    2/2     Running   0          43m     192.168.0.101   node-xgsw   <none>           <none>
kube-system   vmware-cloud-director-ccm-99fd59464-swh29   1/1     Running   0          43m     192.168.0.100   mstr-31jt   <none>           <none>

Setup a Storage Class

Here’s my storage-class.yaml file, which is used to setup the storage class for my Kubernetes cluster.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  name: vcd-disk-dev
provisioner: named-disk.csi.cloud-director.vmware.com
reclaimPolicy: Delete
parameters:
  storageProfile: "truenas-iscsi-luns"
  filesystem: "ext4"

Notice that the storageProfile needs to be set to either “*” for any storage policy or the name of a storage policy that you have access to in your Organization VDC.
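If you need more than one storage class, for example one per storage policy plus a catch-all, generating the manifest saves some copy and paste. This is only a sketch: render_storage_class and the class name vcd-disk-any are hypothetical, and the provisioner, reclaim policy and filesystem are simply carried over from my file above.

```shell
# Sketch: print a StorageClass manifest for a given name and VCD storage policy.
# Pass '*' as the second argument to allow any storage policy.
render_storage_class() {
  sc_name="$1"
  sc_profile="$2"
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${sc_name}
provisioner: named-disk.csi.cloud-director.vmware.com
reclaimPolicy: Delete
parameters:
  storageProfile: "${sc_profile}"
  filesystem: "ext4"
EOF
}

# Usage:
#   render_storage_class vcd-disk-any '*' | kubectl apply -f -
```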

Create the storage class by applying that file.

kubectl apply -f storage-class.yaml

You can see if that was successful by getting all storage classes.

kubectl get storageclass
NAME           PROVISIONER                                RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
vcd-disk-dev   named-disk.csi.cloud-director.vmware.com   Delete          Immediate           false                  43h

Make the storage class the default

kubectl patch storageclass vcd-disk-dev -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
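To confirm the patch took effect, look for the (default) marker that kubectl appends to the class name in the listing. A small sketch that pulls out just that name is below. default_storage_class is a hypothetical helper, and it assumes the marker appears as the second whitespace-separated field.

```shell
# Hypothetical helper: print the name of the default storage class from
# `kubectl get storageclass` output on stdin. kubectl marks the default by
# appending "(default)" after the class name.
default_storage_class() {
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}

# Usage:
#   kubectl get storageclass | default_storage_class
```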

Using the VCD CSI driver

Now that we’ve got a storage class and the driver installed, we can deploy a persistent volume claim and attach it to a pod. Let’s create a persistent volume claim first.

Creating a persistent volume claim

We will need to prepare another file. I’ve called mine my-pvc.yaml, and it looks like this.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: "vcd-disk-dev"

Let’s deploy it.

kubectl apply -f my-pvc.yaml

We can check that it deployed with this command

kubectl get pvc
NAME     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
my-pvc   Bound    pvc-2ddeccd0-e092-4aca-a090-dff9694e2f04   1Gi        RWO            vcd-disk-dev   36m

Attaching the persistent volume to a pod

Let’s deploy an nginx pod that will attach the PV and use it for nginx.

My pod.yaml looks like this.

apiVersion: v1
kind: Pod
metadata:
  name: pod
  labels:
    app: nginx
spec:
  volumes:
    - name: my-pod-storage
      persistentVolumeClaim:
        claimName: my-pvc
  containers:
    - name: my-pod-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: my-pod-storage

Notice that claimName under persistentVolumeClaim is set to my-pvc, which matches the name of the PVC created earlier. I’ve mounted the volume at /usr/share/nginx/html within the nginx pod.

Let’s attach the PV.

kubectl apply -f pod.yaml

You’ll see a few things happen in the Recent Tasks pane when you run this. Kubernetes attaches the PV to the nginx pod through the CSI driver, and the driver instructs VCD to attach the disk to the worker node.

If you open up vSphere Web Client, you can see that the disk is now attached to the worker node.

You can also see the CSI driver doing its thing if you take a look at the logs with this command.

kubectl logs csi-vcd-controllerplugin-0 -n kube-system -c csi-attacher

Checking the mount in the pod

You can log into the nginx pod using this command.

kubectl exec -it pod -- bash

Then type mount and df to see the mount is present and the size of the mount point.

df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sdb          999320    1288    929220   1% /usr/share/nginx/html

mount
/dev/sdb on /usr/share/nginx/html type ext4 (rw,relatime)

The size is correct at roughly 1 GiB, and the disk is mounted.
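The df figure is a little below the 1Gi requested in the PVC because the ext4 filesystem keeps some of the disk for its own metadata. A quick bit of shell arithmetic shows how close it is. usable_percent is just an illustrative name.

```shell
# Percentage of the requested capacity that df reports as 1K-blocks.
#   usable_percent <df-1K-blocks> <requested-GiB>
usable_percent() {
  blocks_kib="$1"
  requested_kib=$(( $2 * 1024 * 1024 ))   # GiB -> KiB
  echo $(( blocks_kib * 100 / requested_kib ))
}

usable_percent 999320 1    # prints 95
```

So about 95% of the named disk shows up as filesystem capacity, which is in line with what you would expect from a freshly formatted ext4 volume of this size.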

Describing the pod gives us more information.

kubectl describe po pod
Name:         pod
Namespace:    default
Priority:     0
Node:         node-xgsw/192.168.0.101
Start Time:   Sun, 26 Sep 2021 12:43:15 +0300
Labels:       app=nginx
Annotations:  <none>
Status:       Running
IP:           100.96.1.12
IPs:
  IP:  100.96.1.12
Containers:
  my-pod-container:
    Container ID:   containerd://6a194ac30dab7dc5a5127180af139e531e650bedbb140e4dc378c21869bd570f
    Image:          nginx
    Image ID:       docker.io/library/nginx@sha256:853b221d3341add7aaadf5f81dd088ea943ab9c918766e295321294b035f3f3e
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 26 Sep 2021 12:43:34 +0300
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /usr/share/nginx/html from my-pod-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xm4gd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  my-pod-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  my-pvc
    ReadOnly:   false
  default-token-xm4gd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-xm4gd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

Useful commands

Show storage classes

kubectl get storageclass

Show persistent volumes and persistent volume claims

kubectl get pv,pvc

Show all pods running in the cluster

kubectl get po -A -o wide

Describe the nginx pod

kubectl describe po pod

Show logs for the CSI driver

kubectl logs csi-vcd-controllerplugin-0 -n kube-system -c csi-attacher

kubectl logs csi-vcd-controllerplugin-0 -n kube-system -c csi-provisioner

kubectl logs csi-vcd-controllerplugin-0 -n kube-system -c vcd-csi-plugin

kubectl logs vmware-cloud-director-ccm-99fd59464-swh29 -n kube-system

Useful links

https://github.com/vmware/cloud-director-named-disk-csi-driver/blob/0.1.0-beta/README.md

Install Container Service Extension 3.1.1 beta with VCD 10.3

Prepare the Photon OS 3 VM

Deploy the OVA using this link.

Photon OS 3 does not support Linux guest customization unfortunately, so we will use the links below to manually setup the OS with a hostname and static IP address.

Boot the VM. The default credentials are root with the password changeme. Change the default password.

Set host name by changing the /etc/hostname file.

Configure a static IP using this guide.

Add DNS server using this guide.

Reboot.

Photon 3 has the older repositories, so we will need to update to newer repositories as detailed in this KB article. I’ve included this in the instructions below.
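If you want to see what the repository change does before touching the real files, you can run the same sed expression against a sample line. This is a dry-run sketch: the sample baseurl is my assumption about what the stock Photon 3 repo files contain, so check your own files under /etc/yum.repos.d/, and rewrite_photon_repo is a made-up wrapper name.

```shell
# Dry run of the repository rewrite on a sample line instead of the real files.
# The sample baseurl is an assumption about the stock Photon 3 repo files.
rewrite_photon_repo() {
  sed 's/dl.bintray.com\/vmware/packages.vmware.com\/photon\/$releasever/g'
}

echo 'baseurl=https://dl.bintray.com/vmware/photon_release_$releasever_$basearch' | rewrite_photon_repo
```

The output should show the bintray host replaced with packages.vmware.com/photon/$releasever, with $releasever left as a literal string for tdnf to expand.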

Copy and paste the commands below, or put them into a bash script.

# Update Photon repositories
cd /etc/yum.repos.d/
sed  -i 's/dl.bintray.com\/vmware/packages.vmware.com\/photon\/$releasever/g' photon.repo photon-updates.repo photon-extras.repo photon-debuginfo.repo

# Update Photon
tdnf --assumeyes update

# Install dependencies
tdnf --assumeyes install build-essential python3-devel python3-pip git

# Prepare cse user and application directories
mkdir -p /opt/vmware/cse
chmod 775 -R /opt
chmod 777 /
groupadd cse
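useradd cse -g cse -m -d /opt/vmware/cse
# useradd -p expects an already-encrypted hash, so set the password with chpasswd instead
echo 'cse:Vmware1!' | chpasswd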
chown cse:cse -R /opt

# Run as cse user
su - cse
mkdir -p ~/.ssh
cat >> ~/.ssh/authorized_keys << EOF
ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQEAhcw67bz3xRjyhPLysMhUHJPhmatJkmPUdMUEZre+MeiDhC602jkRUNVu43Nk8iD/I07kLxdAdVPZNoZuWE7WBjmn13xf0Ki2hSH/47z3ObXrd8Vleq0CXa+qRnCeYM3FiKb4D5IfL4XkHW83qwp8PuX8FHJrXY8RacVaOWXrESCnl3cSC0tA3eVxWoJ1kwHxhSTfJ9xBtKyCqkoulqyqFYU2A1oMazaK9TYWKmtcYRn27CC1Jrwawt2zfbNsQbHx1jlDoIO6FLz8Dfkm0DToanw0GoHs2Q+uXJ8ve/oBs0VJZFYPquBmcyfny4WIh4L0lwzsiAVWJ6PvzF5HMuNcwQ== rsa-key-20210508
EOF

cat >> ~/.bash_profile << EOF
# For Container Service Extension
export CSE_CONFIG=/opt/vmware/cse/config/config.yaml
export CSE_CONFIG_PASSWORD=Vmware1!
source /opt/vmware/cse/python/bin/activate
EOF

# Install CSE in virtual environment
python3 -m venv /opt/vmware/cse/python
source /opt/vmware/cse/python/bin/activate
pip3 install git+https://github.com/vmware/container-service-extension.git@3.1.1.0b2

cse version

source ~/.bash_profile

# Prepare vcd-cli
mkdir -p ~/.vcd-cli
cat >  ~/.vcd-cli/profiles.yaml << EOF
extensions:
- container_service_extension.client.cse
EOF

vcd cse version

# Add my Let's Encrypt intermediate and root certs. Use your certificates issued by your CA to enable verify=true with CSE.
cat >> /opt/vmware/cse/python/lib/python3.7/site-packages/certifi/cacert.pem << EOF #ok
-----BEGIN CERTIFICATE-----
MIIFFjCCAv6gAwIBAgIRAJErCErPDBinU/bWLiWnX1owDQYJKoZIhvcNAQELBQAw
TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMjAwOTA0MDAwMDAw
WhcNMjUwOTE1MTYwMDAwWjAyMQswCQYDVQQGEwJVUzEWMBQGA1UEChMNTGV0J3Mg
RW5jcnlwdDELMAkGA1UEAxMCUjMwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEK
AoIBAQC7AhUozPaglNMPEuyNVZLD+ILxmaZ6QoinXSaqtSu5xUyxr45r+XXIo9cP
R5QUVTVXjJ6oojkZ9YI8QqlObvU7wy7bjcCwXPNZOOftz2nwWgsbvsCUJCWH+jdx
sxPnHKzhm+/b5DtFUkWWqcFTzjTIUu61ru2P3mBw4qVUq7ZtDpelQDRrK9O8Zutm
NHz6a4uPVymZ+DAXXbpyb/uBxa3Shlg9F8fnCbvxK/eG3MHacV3URuPMrSXBiLxg
Z3Vms/EY96Jc5lP/Ooi2R6X/ExjqmAl3P51T+c8B5fWmcBcUr2Ok/5mzk53cU6cG
/kiFHaFpriV1uxPMUgP17VGhi9sVAgMBAAGjggEIMIIBBDAOBgNVHQ8BAf8EBAMC
AYYwHQYDVR0lBBYwFAYIKwYBBQUHAwIGCCsGAQUFBwMBMBIGA1UdEwEB/wQIMAYB
Af8CAQAwHQYDVR0OBBYEFBQusxe3WFbLrlAJQOYfr52LFMLGMB8GA1UdIwQYMBaA
FHm0WeZ7tuXkAXOACIjIGlj26ZtuMDIGCCsGAQUFBwEBBCYwJDAiBggrBgEFBQcw
AoYWaHR0cDovL3gxLmkubGVuY3Iub3JnLzAnBgNVHR8EIDAeMBygGqAYhhZodHRw
Oi8veDEuYy5sZW5jci5vcmcvMCIGA1UdIAQbMBkwCAYGZ4EMAQIBMA0GCysGAQQB
gt8TAQEBMA0GCSqGSIb3DQEBCwUAA4ICAQCFyk5HPqP3hUSFvNVneLKYY611TR6W
PTNlclQtgaDqw+34IL9fzLdwALduO/ZelN7kIJ+m74uyA+eitRY8kc607TkC53wl
ikfmZW4/RvTZ8M6UK+5UzhK8jCdLuMGYL6KvzXGRSgi3yLgjewQtCPkIVz6D2QQz
CkcheAmCJ8MqyJu5zlzyZMjAvnnAT45tRAxekrsu94sQ4egdRCnbWSDtY7kh+BIm
lJNXoB1lBMEKIq4QDUOXoRgffuDghje1WrG9ML+Hbisq/yFOGwXD9RiX8F6sw6W4
avAuvDszue5L3sz85K+EC4Y/wFVDNvZo4TYXao6Z0f+lQKc0t8DQYzk1OXVu8rp2
yJMC6alLbBfODALZvYH7n7do1AZls4I9d1P4jnkDrQoxB3UqQ9hVl3LEKQ73xF1O
yK5GhDDX8oVfGKF5u+decIsH4YaTw7mP3GFxJSqv3+0lUFJoi5Lc5da149p90Ids
hCExroL1+7mryIkXPeFM5TgO9r0rvZaBFOvV2z0gp35Z0+L4WPlbuEjN/lxPFin+
HlUjr8gRsI3qfJOQFy/9rKIJR0Y/8Omwt/8oTWgy1mdeHmmjk7j1nYsvC9JSQ6Zv
MldlTTKB3zhThV1+XWYp6rjd5JW1zbVWEkLNxE7GJThEUG3szgBVGP7pSWTUTsqX
nLRbwHOoq7hHwg==
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIFYDCCBEigAwIBAgIQQAF3ITfU6UK47naqPGQKtzANBgkqhkiG9w0BAQsFADA/
MSQwIgYDVQQKExtEaWdpdGFsIFNpZ25hdHVyZSBUcnVzdCBDby4xFzAVBgNVBAMT
DkRTVCBSb290IENBIFgzMB4XDTIxMDEyMDE5MTQwM1oXDTI0MDkzMDE4MTQwM1ow
TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwggIiMA0GCSqGSIb3DQEB
AQUAA4ICDwAwggIKAoICAQCt6CRz9BQ385ueK1coHIe+3LffOJCMbjzmV6B493XC
ov71am72AE8o295ohmxEk7axY/0UEmu/H9LqMZshftEzPLpI9d1537O4/xLxIZpL
wYqGcWlKZmZsj348cL+tKSIG8+TA5oCu4kuPt5l+lAOf00eXfJlII1PoOK5PCm+D
LtFJV4yAdLbaL9A4jXsDcCEbdfIwPPqPrt3aY6vrFk/CjhFLfs8L6P+1dy70sntK
4EwSJQxwjQMpoOFTJOwT2e4ZvxCzSow/iaNhUd6shweU9GNx7C7ib1uYgeGJXDR5
bHbvO5BieebbpJovJsXQEOEO3tkQjhb7t/eo98flAgeYjzYIlefiN5YNNnWe+w5y
sR2bvAP5SQXYgd0FtCrWQemsAXaVCg/Y39W9Eh81LygXbNKYwagJZHduRze6zqxZ
Xmidf3LWicUGQSk+WT7dJvUkyRGnWqNMQB9GoZm1pzpRboY7nn1ypxIFeFntPlF4
FQsDj43QLwWyPntKHEtzBRL8xurgUBN8Q5N0s8p0544fAQjQMNRbcTa0B7rBMDBc
SLeCO5imfWCKoqMpgsy6vYMEG6KDA0Gh1gXxG8K28Kh8hjtGqEgqiNx2mna/H2ql
PRmP6zjzZN7IKw0KKP/32+IVQtQi0Cdd4Xn+GOdwiK1O5tmLOsbdJ1Fu/7xk9TND
TwIDAQABo4IBRjCCAUIwDwYDVR0TAQH/BAUwAwEB/zAOBgNVHQ8BAf8EBAMCAQYw
SwYIKwYBBQUHAQEEPzA9MDsGCCsGAQUFBzAChi9odHRwOi8vYXBwcy5pZGVudHJ1
c3QuY29tL3Jvb3RzL2RzdHJvb3RjYXgzLnA3YzAfBgNVHSMEGDAWgBTEp7Gkeyxx
+tvhS5B1/8QVYIWJEDBUBgNVHSAETTBLMAgGBmeBDAECATA/BgsrBgEEAYLfEwEB
ATAwMC4GCCsGAQUFBwIBFiJodHRwOi8vY3BzLnJvb3QteDEubGV0c2VuY3J5cHQu
b3JnMDwGA1UdHwQ1MDMwMaAvoC2GK2h0dHA6Ly9jcmwuaWRlbnRydXN0LmNvbS9E
U1RST09UQ0FYM0NSTC5jcmwwHQYDVR0OBBYEFHm0WeZ7tuXkAXOACIjIGlj26Ztu
MA0GCSqGSIb3DQEBCwUAA4IBAQAKcwBslm7/DlLQrt2M51oGrS+o44+/yQoDFVDC
5WxCu2+b9LRPwkSICHXM6webFGJueN7sJ7o5XPWioW5WlHAQU7G75K/QosMrAdSW
9MUgNTP52GE24HGNtLi1qoJFlcDyqSMo59ahy2cI2qBDLKobkx/J3vWraV0T9VuG
WCLKTVXkcGdtwlfFRjlBz4pYg1htmf5X6DYO8A4jqv2Il9DjXA6USbW1FzXSLr9O
he8Y4IWS6wY7bCkjCWDcRQJMEhg76fsO3txE+FiYruq9RUWhiF1myv4Q6W+CyBFC
Dfvp7OOGAN6dEOM4+qR9sdjoSYKEBpsr6GtPAQw4dy753ec5
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIDSjCCAjKgAwIBAgIQRK+wgNajJ7qJMDmGLvhAazANBgkqhkiG9w0BAQUFADA/
MSQwIgYDVQQKExtEaWdpdGFsIFNpZ25hdHVyZSBUcnVzdCBDby4xFzAVBgNVBAMT
DkRTVCBSb290IENBIFgzMB4XDTAwMDkzMDIxMTIxOVoXDTIxMDkzMDE0MDExNVow
PzEkMCIGA1UEChMbRGlnaXRhbCBTaWduYXR1cmUgVHJ1c3QgQ28uMRcwFQYDVQQD
Ew5EU1QgUm9vdCBDQSBYMzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEB
AN+v6ZdQCINXtMxiZfaQguzH0yxrMMpb7NnDfcdAwRgUi+DoM3ZJKuM/IUmTrE4O
rz5Iy2Xu/NMhD2XSKtkyj4zl93ewEnu1lcCJo6m67XMuegwGMoOifooUMM0RoOEq
OLl5CjH9UL2AZd+3UWODyOKIYepLYYHsUmu5ouJLGiifSKOeDNoJjj4XLh7dIN9b
xiqKqy69cK3FCxolkHRyxXtqqzTWMIn/5WgTe1QLyNau7Fqckh49ZLOMxt+/yUFw
7BZy1SbsOFU5Q9D8/RhcQPGX69Wam40dutolucbY38EVAjqr2m7xPi71XAicPNaD
aeQQmxkqtilX4+U9m5/wAl0CAwEAAaNCMEAwDwYDVR0TAQH/BAUwAwEB/zAOBgNV
HQ8BAf8EBAMCAQYwHQYDVR0OBBYEFMSnsaR7LHH62+FLkHX/xBVghYkQMA0GCSqG
SIb3DQEBBQUAA4IBAQCjGiybFwBcqR7uKGY3Or+Dxz9LwwmglSBd49lZRNI+DT69
ikugdB/OEIKcdBodfpga3csTS7MgROSR6cz8faXbauX+5v3gTt23ADq1cEmv8uXr
AvHRAosZy5Q6XkjEGB5YGV8eAlrwDPGxrancWYaLbumR9YbK+rlmM6pZW87ipxZz
R8srzJmwN0jP41ZL9c8PDHIyh8bwRLtTcm1D9SZImlJnt1ir/md2cXjbDaJWFBM5
JDGFoqgCWjBH4d1QB7wCCZAA62RjYJsWvIjJEubSfZGL+T0yjWW06XyxV3bqxbYo
Ob8VZRzI9neWagqNdwvYkQsEjgfbKbYK7p2CNTUQ
-----END CERTIFICATE-----
EOF

# Create service account
vcd login vcd.vmwire.com system administrator -p Vmware1!
cse create-service-role vcd.vmwire.com
# Enter system administrator username and password

# Create VCD service account for CSE
vcd user create --enabled svc-cse Vmware1! "CSE Service Role"

# Create config file
mkdir -p /opt/vmware/cse/config

cat > /opt/vmware/cse/config/config-not-encrypted.conf << EOF
mqtt:
  verify_ssl: false

vcd:
  host: vcd.vmwire.com
  log: true
  password: Vmware1!
  port: 443
  username: administrator
  verify: true

vcs:
- name: vcenter.vmwire.com
  password: Vmware1!
  username: administrator@vsphere.local
  verify: true

service:
  enforce_authorization: false
  legacy_mode: false
  log_wire: false
  processors: 15
  telemetry:
    enable: true

broker:
  catalog: cse-catalog
  ip_allocation_mode: pool
  network: default-organization-network
  org: cse
  remote_template_cookbook_url: https://raw.githubusercontent.com/vmware/container-service-extension-templates/master/template_v2.yaml
  storage_profile: 'truenas-iscsi-luns'
  vdc: cse-vdc
EOF

cse encrypt /opt/vmware/cse/config/config-not-encrypted.conf --output /opt/vmware/cse/config/config.yaml
chmod 600 /opt/vmware/cse/config/config.yaml
cse check /opt/vmware/cse/config/config.yaml

cse template list

mkdir -p ~/.ssh

# Add your public key(s) here
cat >> ~/.ssh/authorized_keys << EOF
ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQEAhcw67bz3xRjyhPLysMhUHJPhmatJkmPUdMUEZre+MeiDhC602jkRUNVu43Nk8iD/I07kLxdAdVPZNoZuWE7WBjmn13xf0Ki2hSH/47z3ObXrd8Vleq0CXa+qRnCeYM3FiKb4D5IfL4XkHW83qwp8PuX8FHJrXY8RacVaOWXrESCnl3cSC0tA3eVxWoJ1kwHxhSTfJ9xBtKyCqkoulqyqFYU2A1oMazaK9TYWKmtcYRn27CC1Jrwawt2zfbNsQbHx1jlDoIO6FLz8Dfkm0DToanw0GoHs2Q+uXJ8ve/oBs0VJZFYPquBmcyfny4WIh4L0lwzsiAVWJ6PvzF5HMuNcwQ== rsa-key-20210508
EOF

# Import TKGm ova with this command
# Copy the ova to /home/ first, the ova can be obtained from my.vmware.com, ensure that it has chmod 644 permissions.
cse template import -F /home/ubuntu-2004-kube-v1.20.5-vmware.2-tkg.1-6700972457122900687.ova

# Install CSE
cse install -k ~/.ssh/authorized_keys

# Or use this if you've already installed and want to skip template creation again
cse upgrade --skip-template-creation -k ~/.ssh/authorized_keys

# Setup cse.sh
cat > /opt/vmware/cse/cse.sh << EOF
#!/usr/bin/env bash
source /opt/vmware/cse/python/bin/activate
export CSE_CONFIG=/opt/vmware/cse/config/config.yaml
export CSE_CONFIG_PASSWORD=Vmware1!
cse run
EOF

# Make cse.sh executable
chmod +x /opt/vmware/cse/cse.sh

# Deactivate the python virtual environment and go back to root
deactivate
exit

# Setup cse.service, use MQTT and not RabbitMQ
cat > /etc/systemd/system/cse.service << EOF
[Unit]
Description=Container Service Extension for VMware Cloud Director

[Service]
ExecStart=/opt/vmware/cse/cse.sh
User=cse
WorkingDirectory=/opt/vmware/cse
Type=simple
Restart=always

[Install]
WantedBy=default.target
EOF

systemctl enable cse.service
systemctl start cse.service

systemctl status cse.service

Install and enable the CSE UI Plugin for VCD

Download the latest version from https://github.com/vmware/container-service-extension/raw/master/cse_ui/3.0.4/container-ui-plugin.zip.

Enable it for the tenants that you want or for all tenants.

Enable the rights bundles

Follow the instructions in this other post.

Enable Global Roles to use CSE or Configure Rights Bundles

Edit the Organization Administrator role, scroll to the bottom, select both View 8/8 and Manage 12/12, then click Save.

Useful links

https://github.com/vmware/container-service-extension/commit/5d2a60b5eeb164547aef39602f9871c06726863e

https://vmware.github.io/container-service-extension/cse3_1/RELEASE_NOTES.html

Rights Bundles for Container Service Extension

A quick note on the Rights Bundles for Container Service Extension when enabling native, TKGm or TKGs clusters.

The rights bundle named vmware:tkgcluster Entitlement is for TKGs clusters and NOT for TKGm.

The rights bundle named cse:nativeCluster Entitlement is for native clusters AND also for TKGm clusters.

Yes, this is very confusing and will be fixed in an upcoming release.

You can see a brief note about this in the release notes here.

Users deploying VMware Tanzu Kubernetes Grid clusters should have the rights required to deploy exposed native clusters and additionally the right Full Control: CSE:NATIVECLUSTER. This right is crucial for VCD CPI to work properly.

So in summary, for a user to be able to deploy TKGm clusters they will need to have the cse:nativeCluster Entitlement rights.

To publish these rights, go to the Provider portal and navigate to Administration, Rights Bundles.

Click on the radio button next to cse:nativeCluster Entitlement and click on Publish, then publish to the desired tenant or to all tenants.

Using Let’s Encrypt certificates with Cloud Director

Let’s Encrypt (LE) is a certificate authority that issues free SSL certificates for use in your web applications. This post details how to get LE set up to support Cloud Director, specifically with a wildcard certificate.

Certbot

LE uses an application called certbot to request, automatically download and renew certificates. You can think of certbot as the client for LE.

First you’ll need to create a client machine that can request certificates from LE. I started with a simple CentOS VM. For more details about installing certbot into your preferred OS read this page here.

Once you get yours on the network with outbound internet access, you can start by performing the following.

 # Update software
 yum update
 
 # Install wget if not already installed
 yum install wget
 
 # Download the certbot application.
 wget https://dl.eff.org/certbot-auto
 
 # Move certbot into a local application directory
 sudo mv certbot-auto /usr/local/bin/certbot-auto
 
 # Set ownership to root
 sudo chown root /usr/local/bin/certbot-auto
 
 # Change permissions for certbot
 sudo chmod 0755 /usr/local/bin/certbot-auto

Now you’re ready to request certificates. Run the following command, replacing '*.vmwire.com' with your own domain.

/usr/local/bin/certbot-auto --config-dir $HOME/.certbot --work-dir $HOME/.certbot/work --logs-dir $HOME/.certbot/logs  certonly --manual --preferred-challenges=dns -d '*.vmwire.com'

This will create a request for a wildcard certificate for *.vmwire.com.

You’ll then be asked to create a new DNS TXT record on your public DNS server for the domain that you are requesting to validate that you can manage that domain. Here’s what mine looks like for the above.

This means that you can only request public certificates with LE; private certificates are not supported.

You will then see a response from LE such as the following:

IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at:
   /root/.certbot/live/vmwire.com/fullchain.pem
   Your key file has been saved at:
   /root/.certbot/live/vmwire.com/privkey.pem
   Your cert will expire on 2020-12-24. To obtain a new or tweaked
   version of this certificate in the future, simply run certbot-auto
   again. To non-interactively renew *all* of your certificates, run
   "certbot-auto renew"
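The expiry date in that output is worth tracking, since the certificate has to be renewed before it lapses. Here is a minimal Python sketch, assuming you copy the expiry date from the certbot output by hand. The function names and the 30-day threshold are illustrative choices, not part of certbot.

```python
from datetime import date


def days_until_expiry(expiry: date, today: date) -> int:
    """Return the number of days left before the certificate expires."""
    return (expiry - today).days


def should_renew(expiry: date, today: date, threshold_days: int = 30) -> bool:
    """True when fewer than threshold_days remain (an assumed policy)."""
    return days_until_expiry(expiry, today) < threshold_days


if __name__ == "__main__":
    expiry = date(2020, 12, 24)  # expiry date taken from the certbot output above
    print(days_until_expiry(expiry, date.today()))
    print(should_renew(expiry, date.today()))
```

You could run something like this from a scheduled job and alert when should_renew returns True.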

Updating Cloud Director certificates

Before you can use the new certificate, you need to perform some operations with the Java keytool to import the pem formatted certificates into the certificates.ks file that Cloud Director uses.

The issued certificate is available in the directory

/root/.certbot/live/

Navigate there using an SSH client and you’ll see a structure like this

Download the entire folder for the next steps. Within the folder you’ll see the following files

Filename	Purpose
cert.pem	your certificate in pem format
chain.pem	the Let’s Encrypt CA chain certificate in pem format (the issuing intermediate CA; the steps below import it under the alias root)
fullchain.pem	your wildcard certificate AND the LE CA chain certificate in pem format
privkey.pem	the private key for your certificate (without passphrase)

We need to rename the files to something that the Java keytool can work with. I renamed mine to the following:

Original filename	New Filename
cert.pem	vmwire-com.crt
chain.pem	vmwire-com-ca.crt
fullchain.pem	not needed
privkey.pem	vmwire-com.key
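If you repeat this at every renewal, the renaming can be scripted. This is a minimal Python sketch, assuming the downloaded folder contains the four files listed above. The function name copy_renamed and the destination directory are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List

# Mapping taken from the table above; fullchain.pem is not needed.
RENAMES = {
    "cert.pem": "vmwire-com.crt",
    "chain.pem": "vmwire-com-ca.crt",
    "privkey.pem": "vmwire-com.key",
}


def copy_renamed(src_dir: Path, dst_dir: Path) -> List[Path]:
    """Copy each certbot file into dst_dir under its new name."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for old_name, new_name in RENAMES.items():
        target = dst_dir / new_name
        # copyfile follows symlinks, so it also works on certbot's live/ directory
        shutil.copyfile(src_dir / old_name, target)
        copied.append(target)
    return copied
```

For example, copy_renamed(Path("vmwire.com"), Path("upload")) would leave the three renamed files in an upload folder, ready to copy to a cell.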

Copy the three new files to one of the Cloud Director cells, use the /tmp directory.

Now launch an SSH session to one of the Cloud Director cells and perform the following.

# Import the certificate and the private key into a new pfx format certificate
openssl pkcs12 -export -out /tmp/vmwire-com.pfx -inkey /tmp/vmwire-com.key -in /tmp/vmwire-com.crt

# Create a new certificates.ks file and import the pfx formatted certificate
/opt/vmware/vcloud-director/jre/bin/keytool -keystore /tmp/certificates.ks -storepass Vmware1! -keypass Vmware1! -storetype JCEKS -importkeystore -srckeystore /tmp/vmwire-com.pfx -srcstorepass Vmware1!

# Change the alias for the first entry to be http
/opt/vmware/vcloud-director/jre/bin/keytool -keystore /tmp/certificates.ks -storetype JCEKS -changealias -alias 1 -destalias http -storepass Vmware1!

# Import the certificate again, this time creating alias 1 again (we will use the same wildcard certificate for the consoleproxy)
/opt/vmware/vcloud-director/jre/bin/keytool -keystore /tmp/certificates.ks -storepass Vmware1! -keypass Vmware1! -storetype JCEKS -importkeystore -srckeystore /tmp/vmwire-com.pfx -srcstorepass Vmware1!

# Change the alias for the first entry to be consoleproxy
/opt/vmware/vcloud-director/jre/bin/keytool -keystore /tmp/certificates.ks -storetype JCEKS -changealias -alias 1 -destalias consoleproxy -storepass Vmware1!

# Import the root certificate into the certificates.ks file
/opt/vmware/vcloud-director/jre/bin/keytool -importcert -alias root -file /tmp/vmwire-com-ca.crt -storetype JCEKS -keystore /tmp/certificates.ks -storepass Vmware1!

# List all the entries, you should now see three, http, consoleproxy and root
/opt/vmware/vcloud-director/jre/bin/keytool  -list -keystore /tmp/certificates.ks -storetype JCEKS -storepass Vmware1!

# Stop the Cloud Director service on all cells
service vmware-vcd stop

# Make a backup of the current certificate
mv /opt/vmware/vcloud-director/certificates.ks /opt/vmware/vcloud-director/certificates.ks.old

# Copy the new certificate to the Cloud Director directory
cp /tmp/certificates.ks /opt/vmware/vcloud-director/

# List all the entries, you should now see three, http, consoleproxy and root
/opt/vmware/vcloud-director/jre/bin/keytool  -list -keystore /opt/vmware/vcloud-director/certificates.ks -storetype JCEKS -storepass Vmware1!

# Reconfigure the Cloud Director application to use the new certificate
/opt/vmware/vcloud-director/bin/configure

# Start the Cloud Director application
service vmware-vcd start

# Monitor startup logs
tail -f /opt/vmware/vcloud-director/logs/cell.log

Copy the certificates.ks file to the other cells and run the configure step on each of them to update the certificates for all cells. Don’t forget to update the certificate on the load balancer too. This other post shows how to do it with the NSX-T load balancer.

Check out the new certificate at https://vcloud.vmwire.com/tenant/vmwire.

Automate NSX-T Load Balancer setup for Cloud Director and the Tenant App

This post describes how to use the NSX-T Policy API to automate the creation of load balancer configurations for Cloud Director and the vRealize Operations Tenant App.

I’ve included a Postman collection that contains all of the necessary API calls to get everything configured. There is also a Postman environment that contains the necessary variables to successfully configure the load balancer services.

To get started import the collection and environment into Postman.

You’ll see the collection in Postman named NSX-T Load Balancer Setup. The steps are numbered and cover importing the certificates and configuring the Cloud Director load balancer services. I’ve also included the calls to create the load balancer services for the vRealize Operations Tenant App.

Before you run any of those API calls, you’ll first want to import the Postman environment. Once imported, you’ll see the environment in the top right of the Postman window; it is called NSX-T Load Balancer Setup.

Complete your environment variables.

Variable	Value Description
nsx_vip	nsx-t manager cluster virtual ip
nsx-manager-user	nsx-t manager username, usually admin
nsx-manager-password	nsx-t manager password
vcd-public-ip	public ip address for the vcd service to be configured on the load balancer
tenant-app-public-ip	public ip address for the tenant app service to be configured on the load balancer
vcd-cert-name	a name for the imported vcd http certificate
vcd-cert-private-key	vcd http certificate private key in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character. For example: -----BEGIN RSA PRIVATE KEY-----\n<private key>\n-----END RSA PRIVATE KEY-----
vcd-cert-passphrase	vcd private key passphrase
vcd-certificate	vcd http certificate in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character. For example: -----BEGIN CERTIFICATE-----\nMIIGADCCBOigAwIBAgIRALUVXndtVGMeRM1YiMqzBCowDQYJKoZIhvcNAQELBQAw\ngY8xCzAJBgNVBAYTAkdCMRswGQYDVQQIExJHcmVhdGVyIE1hbmNoZXN0ZXIxEDAO\nBgNVBAcTB1NhbGZvcmQxGDAWBgNVBAoTD1NlY3RpZ28gTGltaXRlZDE3MDUGA1UE\nAxMuU2VjdGlnbyBSU0EgRG9tYWluIFZhbGlkYXRpb24gU2VjdXJlIFNlcnZlciBD\nQTAeFw0xOTA4MjMwMDAwMDBaFw0yMDA4MjIyMzU5NTlaMFUxITAfBgNVBAsTGERv\nbWFpbiBDb250cm9sIFZhbGlkYXRlZDEUMBIGA1UECxMLUG9zaXRpdmVTU0wxGjAY\nBgNVBAMTEXZjbG91ZC52bXdpcmUuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A\nMIIBCgKCAQEAqh9sn6bNiDmmg3fJSG4zrK9IbrdisALFqnJQTkkErvoky2ax0RzV\n/ZJ/1fNHpvy1yT7RSZbKcWicoxatYPCgFHDzz2JwgvfwQCRMOfbPzohTSAhrPZph\n4FOPnrF8iwGggTxp+/2/ixg0DjQZL32rc9ax1qEvSURt571hUE7uLkRbPrdbocSZ\n4c2atVh8K1fp3uBqEbAs0UyjW5PK3wIN5ZRFArxc5kiGW0btN1RmoWwOmuJkAtu7\nzuaAJcgr/UVb1PP+GgAvKdmikssB1MWQALTRHm7H2GJp2MlbyGU3ZROSPkSSaNsq\n4otCJxtvQze/lB5QGWj5V2B7YbNJKwJdXQIDAQABo4ICjjCCAoowHwYDVR0jBBgw\nFoAUjYxexFStiuF36Zv5mwXhuAGNYeEwHQYDVR0OBBYEFNhZaRisExXrYrqfIIm6\n9TP8JrqwMA4GA1UdDwEB/wQEAwIFoDAMBgNVHRMBAf8EAjAAMB0GA1UdJQQWMBQG\nCCsGAQUFBwMBBggrBgEFBQcDAjBJBgNVHSAEQjBAMDQGCysGAQQBsjEBAgIHMCUw\nIwYIKwYBBQUHAgEWF2h0dHBzOi8vc2VjdGlnby5jb20vQ1BTMAgGBmeBDAECATCB\nhAYIKwYBBQUHAQEEeDB2ME8GCCsGAQUFBzAChkNodHRwOi8vY3J0LnNlY3RpZ28u\nY29tL1NlY3RpZ29SU0FEb21haW5WYWxpZGF0aW9uU2VjdXJlU2VydmVyQ0EuY3J0\nMCMGCCsGAQUFBzABhhdodHRwOi8vb2NzcC5zZWN0aWdvLmNvbTAzBgNVHREELDAq\nghF2Y2xvdWQudm13aXJlLmNvbYIVd3d3LnZjbG91ZC52bXdpcmUuY29tMIIBAgYK\nKwYBBAHWeQIEAgSB8wSB8ADuAHUAsh4FzIuizYogTodm+Su5iiUgZ2va+nDnsklT\nLe+LkF4AAAFsv3BsIwAABAMARjBEAiBat+l0e3BTu+EBcRJfR8hCA/CznWm1mbVl\nxZqDoKM6tAIgON6U0YoqA91xxpXH2DyA04o5KSdSvNT05wz2aa7zkzwAdQBep3P5\n31bA57U2SH3QSeAyepGaDIShEhKEGHWWgXFFWAAAAWy/cGw+AAAEAwBGMEQCIDHl\njofAcm5GqECwtjBfxYD7AFkJn4Ez0IGRFrux4ldiAiAaNnkMbf0P9arSDNno4hQT\nIJ2hUaIWNfuKBEIIkfqhCTANBgkqhkiG9w0BAQsFAAOCAQEAZCubBHRV+m9iiIeq\nCoaFV2YZLQUz/XM4wzQL+73eqGHINp6xh/+kYY6vw4j+ypr9P8m8+ouqichqo7GJ\nMhjtbXrB+TTRwqQgDHNHP7egBjkO+eDMxK4aa3x1r1AQoRBclPvEbXCohg2sPUG5\nZleog76NhPARR43gcxYC938OH/2TVAsa4JApF3vbCCILrbTuOy3Z9rf3aQLSt6Jp\nkh85w6AlSkXhQJWrydQ1o+NxnfQmTOuIH8XEQ2Ne1Xi4sbiMvWQ7dlH5/N8L8qWQ\nEPCWn+5HGxHIJFXMsgLEDypvuXGt28ZV/T91DwPLeGCEp8kUC3N+uamLYeYMKOGD\nMrToTA==\n-----END CERTIFICATE-----
ca-cert-name	a name for the imported ca root certificate
ca-certificate	ca root certificate in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character.
vcd-node1-name	the hostname for the first vcd appliance
vcd-node1-ip	the dmz ip address for the first vcd appliance
vcd-node2-name	the hostname for the second vcd appliance
vcd-node2-ip	the dmz ip address for the second vcd appliance
vcd-node3-name	the hostname for the third vcd appliance
vcd-node3-ip	the dmz ip address for the third vcd appliance
tenant-app-node-name	the hostname for the vrealize operations tenant app appliance
tenant-app-node-ip	the dmz ip address for the vrealize operations tenant app appliance
tenant-app-cert-name	a name for the imported tenant app certificate
tenant-app-cert-private-key	tenant app certificate private key in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character. For example: -----BEGIN RSA PRIVATE KEY-----\n<private key>\n-----END RSA PRIVATE KEY-----
tenant-app-cert-passphrase	tenant app private key passphrase
tenant-app-certificate	tenant app certificate in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character. For example: -----BEGIN CERTIFICATE-----\nMIIGADCCBOigAwIBAgIRALUVXndtVGMeRM1YiMqzBCowDQYJKoZIhvcNAQELBQAw\ngY8xCzAJBgNVBAYTAkdCMRswGQYDVQQIExJHcmVhdGVyIE1hbmNoZXN0ZXIxEDAO\nBgNVBAcTB1NhbGZvcmQxGDAWBgNVBAoTD1NlY3RpZ28gTGltaXRlZDE3MDUGA1UE\nAxMuU2VjdGlnbyBSU0EgRG9tYWluIFZhbGlkYXRpb24gU2VjdXJlIFNlcnZlciBD\nQTAeFw0xOTA4MjMwMDAwMDBaFw0yMDA4MjIyMzU5NTlaMFUxITAfBgNVBAsTGERv\nbWFpbiBDb250cm9sIFZhbGlkYXRlZDEUMBIGA1UECxMLUG9zaXRpdmVTU0wxGjAY\nBgNVBAMTEXZjbG91ZC52bXdpcmUuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A\nMIIBCgKCAQEAqh9sn6bNiDmmg3fJSG4zrK9IbrdisALFqnJQTkkErvoky2ax0RzV\n/ZJ/1fNHpvy1yT7RSZbKcWicoxatYPCgFHDzz2JwgvfwQCRMOfbPzohTSAhrPZph\n4FOPnrF8iwGggTxp+/2/ixg0DjQZL32rc9ax1qEvSURt571hUE7uLkRbPrdbocSZ\n4c2atVh8K1fp3uBqEbAs0UyjW5PK3wIN5ZRFArxc5kiGW0btN1RmoWwOmuJkAtu7\nzuaAJcgr/UVb1PP+GgAvKdmikssB1MWQALTRHm7H2GJp2MlbyGU3ZROSPkSSaNsq\n4otCJxtvQze/lB5QGWj5V2B7YbNJKwJdXQIDAQABo4ICjjCCAoowHwYDVR0jBBgw\nFoAUjYxexFStiuF36Zv5mwXhuAGNYeEwHQYDVR0OBBYEFNhZaRisExXrYrqfIIm6\n9TP8JrqwMA4GA1UdDwEB/wQEAwIFoDAMBgNVHRMBAf8EAjAAMB0GA1UdJQQWMBQG\nCCsGAQUFBwMBBggrBgEFBQcDAjBJBgNVHSAEQjBAMDQGCysGAQQBsjEBAgIHMCUw\nIwYIKwYBBQUHAgEWF2h0dHBzOi8vc2VjdGlnby5jb20vQ1BTMAgGBmeBDAECATCB\nhAYIKwYBBQUHAQEEeDB2ME8GCCsGAQUFBzAChkNodHRwOi8vY3J0LnNlY3RpZ28u\nY29tL1NlY3RpZ29SU0FEb21haW5WYWxpZGF0aW9uU2VjdXJlU2VydmVyQ0EuY3J0\nMCMGCCsGAQUFBzABhhdodHRwOi8vb2NzcC5zZWN0aWdvLmNvbTAzBgNVHREELDAq\nghF2Y2xvdWQudm13aXJlLmNvbYIVd3d3LnZjbG91ZC52bXdpcmUuY29tMIIBAgYK\nKwYBBAHWeQIEAgSB8wSB8ADuAHUAsh4FzIuizYogTodm+Su5iiUgZ2va+nDnsklT\nLe+LkF4AAAFsv3BsIwAABAMARjBEAiBat+l0e3BTu+EBcRJfR8hCA/CznWm1mbVl\nxZqDoKM6tAIgON6U0YoqA91xxpXH2DyA04o5KSdSvNT05wz2aa7zkzwAdQBep3P5\n31bA57U2SH3QSeAyepGaDIShEhKEGHWWgXFFWAAAAWy/cGw+AAAEAwBGMEQCIDHl\njofAcm5GqECwtjBfxYD7AFkJn4Ez0IGRFrux4ldiAiAaNnkMbf0P9arSDNno4hQT\nIJ2hUaIWNfuKBEIIkfqhCTANBgkqhkiG9w0BAQsFAAOCAQEAZCubBHRV+m9iiIeq\nCoaFV2YZLQUz/XM4wzQL+73eqGHINp6xh/+kYY6vw4j+ypr9P8m8+ouqichqo7GJ\nMhjtbXrB+TTRwqQgDHNHP7egBjkO+eDMxK4aa3x1r1AQoRBclPvEbXCohg2sPUG5\nZleog76NhPARR43gcxYC938OH/2TVAsa4JApF3vbCCILrbTuOy3Z9rf3aQLSt6Jp\nkh85w6AlSkXhQJWrydQ1o+NxnfQmTOuIH8XEQ2Ne1Xi4sbiMvWQ7dlH5/N8L8qWQ\nEPCWn+5HGxHIJFXMsgLEDypvuXGt28ZV/T91DwPLeGCEp8kUC3N+uamLYeYMKOGD\nMrToTA==\n-----END CERTIFICATE-----
tier1-full-path	the full path to the nsx-t tier1 gateway that will run the load balancer, for example /infra/tier-1s/stage1-m-ec01-t1-gw01
vcd-dmz-segment-name	the portgroup name of the vcd dmz portgroup, for example stage1-m-vCDFront
allowed_ip_a	an ip address that is allowed to access the /provider URI and the admin API
allowed_ip_b	an ip address that is allowed to access the /provider URI and the admin API

Variables
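The certificate and key variables need the pem content flattened onto one line with literal \n separators. Here is a minimal Python sketch of that conversion. The function name pem_to_single_line is my own and not part of the Postman collection.

```python
def pem_to_single_line(pem_text: str) -> str:
    """Join the lines of a PEM block with literal backslash-n sequences."""
    lines = [line.strip() for line in pem_text.strip().splitlines() if line.strip()]
    # "\\n" is a backslash followed by the letter n, not a real newline
    return "\\n".join(lines)


if __name__ == "__main__":
    # Placeholder content, not a real certificate
    sample = "-----BEGIN CERTIFICATE-----\nAAAA\nBBBB\n-----END CERTIFICATE-----\n"
    print(pem_to_single_line(sample))
```

Paste the printed value into the matching environment variable, for example vcd-certificate.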

Now you’re ready to run the calls.

The collection and environment are available to download from GitHub.

Protecting Cloud Director with NSX-T Load Balancer L7 HTTP Policies

Running Cloud Director (formerly vCloud Director) over the Internet has its benefits, but it also exposes the portal to security risks. To reduce this exposure, we can use the native load balancing capabilities of NSX-T to serve HTTP access only to the URIs that are required and block access to unnecessary URIs from the rest of the Internet.

An example of this is to disallow the /provider and /cloudapi/1.0.0/sessions/provider URIs as these are provider side administrator only URIs that a service provider uses to manage the cloud and should not be accessible from the Internet.

The other article that I wrote previously describes the safe and unsafe URIs that can be exposed over the Internet; you can find that article here. That article discusses implementing the L7 HTTP policies using Avi. This article goes through how you can achieve the same with the built-in NSX-T load balancer.

This article assumes that you already have the Load Balancer configured, with the Cloud Director Virtual Servers, Server Pools, HTTPS Profiles and Monitors set up. If you need a guide on how to do this, then please visit Tomas Fojta’s article here.

The L7 HTTP rules can be set up under Load Balancing | Virtual Servers. Edit the Virtual Server rule for the Cloud Director service and open up the Load Balancer Rules section.

Click on the Set link next to HTTP Access Phase. I’ve already set mine up so you can see that I already have two rules. You should also end up with two rules once this is complete.

Go ahead and add a new rule with the Add Rule button.

The first rule we want to set up is to prevent access from the Internet to the /provider URI but allow an IP address or group of IP addresses to access the service for provider side administration, such as a management bastion host.

Set up your rule as follows:

What we are doing here is creating a condition so that when the /provider URI is requested, we drop all incoming connections unless the connection is initiated from the management jump box, which has an IP address of 10.37.5.30. The Negate option is enabled to achieve this. Think of negate as the opposite of the rule: with negate enabled, connections to /provider are not dropped when the source IP address is 10.37.5.30.

Here’s the brief explanation from the official NSX-T 3.0 Administration Guide.

If negate is enabled, when Connection Drop is configured, all requests not
matching the specified match condition are dropped. Requests matching the
specified match condition are allowed.

Save this rule and let’s set up another one to prevent access to the admin API. Set up this second rule as follows:

This time use /cloudapi/1.0.0/sessions/provider as the URI. Again, use the Negate option for your management IP address. Save your second rule and Apply all the changes.

Now you should be able to access /tenant URIs over the Internet but not the /provider URI. However, accessing the /provider URI from 10.37.5.30 (or whatever your equivalent is) will work.
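The decision these two rules make can be written out as a short Python sketch. This is a simplified model for reasoning about the behaviour, not how NSX-T implements it. The prefix match and the name should_drop are assumptions for illustration.

```python
# Simplified model of the two HTTP Access rules described above.
PROTECTED_URIS = ("/provider", "/cloudapi/1.0.0/sessions/provider")
ALLOWED_SOURCE_IP = "10.37.5.30"  # management jump box from the example


def should_drop(request_uri: str, source_ip: str) -> bool:
    """Return True when the rules would drop the connection."""
    # Assumption: a case-insensitive prefix match stands in for the rule's URI match
    uri_is_protected = any(request_uri.lower().startswith(p) for p in PROTECTED_URIS)
    # Negate: the source condition holds for every address except the allowed one
    source_not_allowed = source_ip != ALLOWED_SOURCE_IP
    # Both conditions must hold for the drop action to fire
    return uri_is_protected and source_not_allowed


if __name__ == "__main__":
    checks = [
        ("/provider", "203.0.113.10"),
        ("/provider", "10.37.5.30"),
        ("/tenant/vmwire", "203.0.113.10"),
    ]
    for uri, ip in checks:
        print(uri, ip, "drop" if should_drop(uri, ip) else "allow")
```

The address 203.0.113.10 is a documentation address standing in for any client on the Internet.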

Doing this with the API

Do a PUT against /policy/api/v1/infra/lb-virtual-servers/vcloud with the following.

(Note that the Terraform provider for NSX-T doesn’t support HTTP Access yet. So to automate, use the NSX-T API directly instead.)

{
  "enabled": true,
  "ip_address": "<IP_address_of_this_load_balancer>",
  "ports": [
    "443"
  ],
  "access_log_enabled": false,
  "lb_persistence_profile_path": "/infra/lb-persistence-profiles/default-source-ip-lb-persistence-profile",
  "lb_service_path": "/infra/lb-services/vcloud",
  "pool_path": "/infra/lb-pools/vcd-appliances",
  "application_profile_path": "/infra/lb-app-profiles/vcd-https",
  "client_ssl_profile_binding": {
    "ssl_profile_path": "/infra/lb-client-ssl-profiles/default-balanced-client-ssl-profile",
    "default_certificate_path": "/infra/certificates/my-signed-certificate",
    "client_auth": "IGNORE",
    "certificate_chain_depth": 3
  },
  "server_ssl_profile_binding": {
    "ssl_profile_path": "/infra/lb-server-ssl-profiles/default-balanced-server-ssl-profile",
    "server_auth": "IGNORE",
    "certificate_chain_depth": 3,
    "client_certificate_path": "/infra/certificates/my-signed-certificate"
  },
  "rules": [
    {
      "match_conditions": [
        {
          "uri": "/cloudapi/1.0.0/sessions/provider",
          "match_type": "CONTAINS",
          "case_sensitive": false,
          "type": "LBHttpRequestUriCondition",
          "inverse": false
        },
        {
          "source_address": "10.37.5.30",
          "type": "LBIpHeaderCondition",
          "inverse": true
        }
      ],
      "match_strategy": "ALL",
      "phase": "HTTP_ACCESS",
      "actions": [
        {
          "type": "LBConnectionDropAction"
        }
      ]
    },
    {
      "match_conditions": [
        {
          "uri": "/provider",
          "match_type": "EQUALS",
          "case_sensitive": false,
          "type": "LBHttpRequestUriCondition",
          "inverse": false
        },
        {
          "source_address": "10.37.5.30",
          "type": "LBIpHeaderCondition",
          "inverse": true
        }
      ],
      "match_strategy": "ALL",
      "phase": "HTTP_ACCESS",
      "actions": [
        {
          "type": "LBConnectionDropAction"
        }
      ]
    }
  ],
  "log_significant_event_only": false,
  "resource_type": "LBVirtualServer",
  "id": "vcloud",
  "display_name": "vcloud",
  "_revision": 1
}
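Both rule entries in the body above share the same shape, so they can be generated rather than copied by hand. Here is a minimal Python sketch that builds only the "rules" fragment, using the field names from the JSON above. The helper name build_drop_rule is hypothetical, and sending the request is left out.

```python
import json


def build_drop_rule(uri: str, match_type: str, allowed_ip: str) -> dict:
    """Build one HTTP_ACCESS rule that drops requests to uri unless they come from allowed_ip."""
    return {
        "match_conditions": [
            {
                "uri": uri,
                "match_type": match_type,
                "case_sensitive": False,
                "type": "LBHttpRequestUriCondition",
                "inverse": False,
            },
            {
                "source_address": allowed_ip,
                "type": "LBIpHeaderCondition",
                "inverse": True,  # negate: match every source except allowed_ip
            },
        ],
        "match_strategy": "ALL",
        "phase": "HTTP_ACCESS",
        "actions": [{"type": "LBConnectionDropAction"}],
    }


rules = [
    build_drop_rule("/cloudapi/1.0.0/sessions/provider", "CONTAINS", "10.37.5.30"),
    build_drop_rule("/provider", "EQUALS", "10.37.5.30"),
]

if __name__ == "__main__":
    print(json.dumps({"rules": rules}, indent=2))
```

The printed fragment can be merged into the rest of the virtual server body before the PUT.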

Workflow for end-to-end tenant provisioning with VMware Cloud Director

VMware vRealize Orchestrator workflows for VMware Cloud Director to automate the provisioning of cloud services.

Firstly, apologies to all those who asked for the workflow at VMworld 2019 in Barcelona and also e-mailed me for a copy. It’s been hectic in my professional and personal life. I also wanted to clean up the workflows and remove any customer specific items that are not relevant to this workflow. Sorry it took so long!

If you’d like to see an explanation video of the workflows in action, please take a look at the VMworld session recording.

Credits

These vRealize Orchestrator workflows were co-created and developed by Benoit Serratrice and Henri Timmerman.

You can download a copy of the workflow using this link here.

What does it do?

The workflow does the following:

Creates an organization based on your initial organization name as an input.
Creates a vDC in this organization.
Adds a gateway to the vDC.
Adds a routed network with a gateway CIDR that you enter.
Adds a direct external network.
Converts the organization network to use distributed routing.
Adds a default outbound firewall rule for the routed network.
Adds a source NAT rule to allow the routed network to go to the external network.
Adds a catalog.

It also cleans up the provisioning if there is a failure. I have also included a separate Decommission Customer workflow to enable you to delete vCD objects quickly and easily. It is designed for lab environments, so bear this in mind when using it.

Other caveats: the workflows contained in this package are unsupported. I’ll help in the comments below as much as I can.

Getting Started

Import the package after downloading it from GitHub.

The first thing you need to do is set up the global settings in the Global, Commission, storageProfiles and the other configurations. You can find these under Assets > Configurations.

You should then see the Commission Customer v5 workflow under Workflows in your vRO client, it should look something like this.

Enter a customer name and enter the gateway IP in CIDR into the form.

Press Run, then sit back and enjoy the show.

Known Issues

Commissioning a customer fails when there are no existing edge gateways deployed that use an external network. You see the following error in the vRO logs:

item: 'Commission Customer v5/item12', state: 'failed', business state: 'null', exception: 'TypeError: Cannot read property "ipAddress" from null (Workflow:Commission Customer v5 / get next ip (item8)#5)'

This happens because no IP addresses are in use from the external network pool. The Commission Customer workflow calculates the next IP address to assign to the edge gateway, and it cannot do this if the last IP in use is null. As a workaround, manually provision something that uses one IP address from the external network IP pool, then run the Commission Customer workflow again; it should now work.
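A guard for the empty-pool case would avoid the error. The workflow itself runs as vRO scripting, so the following Python is only a sketch of the idea and not the workflow's code. The function name next_ip and the pool_start parameter are hypothetical, and the sketch does not check the end of the pool.

```python
import ipaddress
from typing import Optional


def next_ip(last_used: Optional[str], pool_start: str) -> str:
    """Return the address after last_used, or pool_start when nothing is in use yet."""
    if last_used is None:
        # Nothing allocated from the external network pool: start at the beginning
        return pool_start
    return str(ipaddress.ip_address(last_used) + 1)


if __name__ == "__main__":
    print(next_ip(None, "192.0.2.10"))
    print(next_ip("192.0.2.10", "192.0.2.10"))
```

The 192.0.2.x addresses are documentation addresses used as placeholders for your external network pool.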

The Commission Customer workflow completes successfully, but you see the following errors:

[2020-03-22 19:30:44.596] [I] orgNetworkId: 545b5ef4-ff89-415b-b8ef-bae3559a1ac7
[2020-03-22 19:30:44.662] [I] =================================================================== Converting Org network to a distributed interface...
[2020-03-22 19:30:44.667] [I] ** API endpoint: vcloud.vmwire.com/api/admin/network/545b5ef4-ff89-415b-b8ef-bae3559a1ac7/action/convertToDistributedInterface
[2020-03-22 19:30:44.678] [I] error caught!
[2020-03-22 19:30:44.679] [I] error details: InternalError: Cannot execute the request:  (Workflow:Convert net to distributed interface / Post to vCD (item4)#21)
[2020-03-22 19:30:44.680] [I] error details: Cannot execute the request:  (Workflow:Convert net to distributed interface / Post to vCD (item4)#21)
[2020-03-22 19:30:44.728] [I] Network converted succesfully.

The workflow attempts to convert the org network from an internal interface to a distributed interface, but the conversion does not work even though the log says it was successful. Let me know if you are able to fix this.

VMworld 2019 Rewatch: Building a Modern Cloud Hosting Platform on VMware Cloud Foundation with VMware vCloud Director (HBI1321BE)

Rewatch my session with Onni Rautanen at VMworld EMEA 2019 where we cover the clouds that we are building together with Tieto.

Description: In this session, you will get a technical deep dive into Tieto’s next generation service provider cloud hosting platform running on VMware vCloud Director Cloud POD architecture deployed on top of VMware Cloud Foundation. Administrators and cloud engineers will learn from Tieto cloud architects about their scalable design and implementation guidance for building a modern multi-tenant hosting platform for 10,000+ VMs. Other aspects of this session will discuss the API integration of ServiceNow into the VMware cloud stack, Backup and DR, etc.

You’ll need to create a free VMworld account to access this video and many other videos that are made available during and after the VMworld events.

https://videos.vmworld.com/global/2019/videoplayer/29271

A look at VMware vCloud Director Organization LDAP Authentication Options

VMware vCloud Director can use three different authentication mechanisms for subscriber authentication to the VCD portal. The portal is accessed using the URL https://<cloud-url>/cloud/org/<organisation>. In this post, I’ll try to highlight some of the authentication options that a subscriber can use to access the VCD portal.

Supported LDAP Services

Platform	LDAP Server	Authentication Methods
Windows Server 2003	Active Directory	Simple, Simple SSL, Kerberos, Kerberos SSL
Windows Server 2008	Active Directory	Simple
Windows 7 (2008 R2)	Active Directory	Simple, Simple SSL, Kerberos, Kerberos SSL
Linux	OpenLDAP	Simple, Simple SSL

VCD LDAP Options

A provider can configure a subscriber to use three different authentication mechanisms as highlighted by Figure 1.

Figure 1 – VCD LDAP Options

Do not use LDAP (also known as local authentication)

This is the simplest authentication method. Selecting this radio button when configuring a new Organization means that no LDAP service is used. Instead, new users need to be created using the VCD GUI or the VCD API, and these users are stored within the VCD database. Some of the disadvantages of local authentication are:

Groups cannot be used
A minimum password length of only 6 characters
No password complexity policies
No password expiration policies
No password history
No authentication failure controls
No integration with enterprise identity management systems

VCD system LDAP service

Selecting this will force the Organization to use the same LDAP service as the one used by the VCD system (Provider). Although a separate OU can be used for each Organization, this is not the ideal model for large cloud deployments. Some of the disadvantages of using the VCD system LDAP service are:

Organizations must use the same LDAP service as the Provider.
Although separate OUs can be used, Organizations may not want to have their Users and Groups managed by the Provider.
Organizations may not want to share the same LDAP service with another Organization, even if separate OUs are used.
No self-service of the LDAP service by each subscriber is possible unless complex access is set up for each subscriber to their respective OU.

Custom LDAP service

Selecting this allows the Organization to use its own private LDAP service. This means that each Organization can use a completely separate and unique LDAP service; it does not need to use the same service as the VCD system. This can be a completely separate Active Directory Forest, for example, with no network links to any other AD Forest.

VCD System LDAP Service

Consider the following example:

I run a Public Cloud, so I am a Provider of cloud services. My VCD system authenticates to a Microsoft Active Directory Forest with a domain name of HUGO.LOCAL. This allows me as a System Administrator to log into my VCD portal as a user on HUGO.LOCAL.

As the System Administrator, I first configure an LDAP service for the VCD System:

Figure 2 – VCD System LDAP

Then, a new Security Group called SG_VCD.System.Administrators is created in the HUGO.LOCAL domain, with the user HUGO.LOCAL\HPhan as a member of that group.

Figure 3 – VCD System Administrators Group

The new Security Group SG_VCD.System.Administrators is then added to the System Administrator role in VCD.

Figure 4 – Import LDAP group into VCD role

Now I can log into my cloud as a System Administrator with my domain user HUGO\HPhan.

Figure 5 – System LDAP

Organization Custom LDAP Service

So pretty easy and straightforward so far right? What happens when a subscriber comes along and wants to use my cloud services? Let’s do another example.

A new organization, let’s say Coke, wishes to use its own LDAP service to authenticate with the VCD portal. An Organization LDAP service is configured in much the same way as the System LDAP.

As a System Administrator, I first configure a LDAP service for the Coke Organization, instead of using the HUGO.LOCAL LDAP service, I now direct this Organization’s LDAP service to a unique LDAP service for Coke. This can be a LDAP service hosted by me (the Provider) and managed by Coke (think co-lo), or a LDAP service managed by Coke in Coke’s datacentres (think MPLS/IPVPN):

Figure 6 – Organization LDAP

Then a new Security Group called Organization Administrators is created in the COKE.LOCAL domain, with the user COKE.LOCAL\John.Smith as a member of that group.

Figure 7 – VCD Organization Administrators Group and Members

The new Security Group Organization Administrators is then added to the Organization Administrator role in Coke’s Organization.

Figure 8 – Assign LDAP Group to VCD Role

John Smith can now log into the Coke Organization as an Organization Administrator with the domain user COKE\John.Smith.

Figure 9 – LDAP User logged into VCD

So what happens when another Organization joins the party? Extending our example above, let’s say Pepsi also wants to use my cloud services. In much the same way that the Coke Organization is configured to use its own LDAP service, we do the same for the Pepsi Organization. An Organization Administrators group is created in the PEPSI.LOCAL domain, and a user named Peter.Smith is a member of that group. Peter Smith can then log into Pepsi’s Organization as an Organization Administrator.

Figure 10 – Another LDAP User logged into VCD

In Summary

In summary, the Provider uses the System LDAP. Subscriber Organizations could also use the System LDAP if required (with or without a separate OU), but you can also configure each Organization to use its own LDAP service.

We have a Provider which uses the domain HUGO.LOCAL to authenticate the System VCD, with the Active Directory Security Group SG_VCD.System.Administrators having the System Administrator role in VCD and my account HUGO\HPhan is a member of this group.
We have subscriber 1 with an Organization named Coke Co, and this organization uses its own LDAP service which is backed by a domain COKE.LOCAL.
We have another subscriber, subscriber 2 with an Organization named Pepsi Co, and this organization uses its own LDAP service which is backed by a domain PEPSI.LOCAL.
Provider – Uses HUGO.LOCAL – System LDAP
Subscriber 1 – Uses COKE.LOCAL – Custom LDAP
Subscriber 2 – Uses PEPSI.LOCAL – Custom LDAP
No trust is required between the Provider’s LDAP and any Subscriber’s LDAP.
More importantly, there is no trust and no network connectivity between any of the subscribers’ LDAP systems.

Securing Custom LDAP Services

For each Organization, a single LDAP service must be configured as the Custom LDAP that the Organization authenticates against. To enable this functionality, the vCloud Director Cell must be able to connect to ALL LDAP servers over TCP 389 or 636. The VMware vCloud Security Hardening Guide gives good recommendations on how Service Providers can host Subscribers’ LDAP servers and also how to maintain connectivity to Subscribers’ LDAP servers if hosted remotely over MPLS/VPN etc.

It is therefore important that the vCD Cell is secured and that network connectivity to each Organization’s LDAP service is also secured. The following extract from the VMware vCloud Security Hardening Guide explains the connectivity options for subscribers’ LDAP services:

Connectivity from the VMware vCloud Director cells to the system LDAP server and any Organization LDAP servers must be enabled for the software to properly authenticate users. As recommended in this document, the system LDAP server must be located on the private management network, separated from the DMZ by a firewall. Some cloud providers and most IT organizations will run any Organization LDAP servers required, and those too would be on a private network, not the DMZ. Another option for an Organization LDAP server is to have it hosted and managed outside of the cloud provider’s environment and under the control of the Organization. In that case, it must be exposed to the VMware vCloud Director cells, potentially through the enterprise datacenter’s own DMZ (see Shared Resource Cloud Service Provider Deployment above).

In all of these circumstances, opening the appropriate ports through the various firewalls in the path between the cells and the LDAP server is required. By default, this port is 389/TCP for LDAP and 636/TCP for LDAPS; however, this port is customizable with most servers and in the LDAP settings in the Web UI. Also, a concern that arises when the Organization is hosting their own LDAP server is exposing it through their DMZ. It is not a service that needs to be accessible to the general public, so steps should be taken to limit access only to the VMware vCloud Director cells. One simple way to do that is to configure the LDAP server and/or the external firewall to only allow access from IP addresses that belong to the VMware vCloud Director cells as reported by the cloud provider. Other options include systems such as per-Organization site-to-site VPNs connecting those two sets of systems, hardened LDAP proxies or virtual directories, or other options, all outside the scope of this document.
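As a quick sanity check of the firewall rules described in the extract, a small script run from a vCD cell can confirm that each LDAP server answers on the expected port. This is only a minimal sketch: the hostnames are hypothetical placeholders for the example Organizations in this post, the function names are my own, and a successful TCP connection says nothing about whether the LDAP bind itself will work.

```python
import socket
from typing import Dict

# Default ports named in the hardening guide extract; both are customizable.
LDAP_PORT = 389
LDAPS_PORT = 636


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ldap_servers(servers: Dict[str, str], port: int = LDAPS_PORT) -> Dict[str, bool]:
    """Map each Organization name to whether its LDAP server is reachable."""
    return {org: can_connect(host, port) for org, host in servers.items()}


if __name__ == "__main__":
    # Hypothetical LDAP hostnames, one per Organization (assumptions, not from the post).
    servers = {
        "System": "dc01.hugo.local",
        "Coke": "dc01.coke.local",
        "Pepsi": "dc01.pepsi.local",
    }
    for org, reachable in check_ldap_servers(servers).items():
        print(f"{org}: {'reachable' if reachable else 'NOT reachable'} on {LDAPS_PORT}/TCP")
```

Run it from each cell in turn, since the firewall rules are written against the cells' IP addresses.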

Figure 11 – Multiple Custom LDAP in VCD

Note: Coke and Pepsi are used purely as an example of multi-tenancy within a public cloud, and the names appear on this blog for information purposes only.

Uninstalling vCD agent on ESXi host

To uninstall the vCD agent (vslad) on an ESXi host:

Enable Remote Tech Support (SSH) in Configuration | Security Profile | Properties

Log into the ESXi host using your favourite SSH client
Navigate to /opt/vmware/uninstallers
Now run the script named vslad-uninstall.sh, or simply run the following after logging into the ESXi host

/opt/vmware/uninstallers/vslad-uninstall.sh

Disable Remote Tech Support (SSH)
Restart your ESXi host.

Incorrectly configured URL for Organisation in vCloud Director 1.0

VMware vCloud Director (vCD) automatically creates a URL for each organisation that is created in vCD. A minor bug means the URL is not created properly, so the URL displayed under Customer | Administration | Settings | General is incorrect.

For example, if you create an organisation called Customer1, the default URL that is created will be:

https://url.of.your.cloud/org/Customer1/

This is of course wrong and if you clicked on the link you would see a page similar to this:

So how do we fix this?

Simple, just add cloud into the URL so the new URL will be:

https://url.of.your.cloud/cloud/org/Customer1/

This WILL work but you will have to do this for every new customer and also remember to publish the correct URL.
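If you have already handed out the broken links, the manual fix can be scripted. The following is only a sketch of the string manipulation described above; fix_org_url is a name I made up for illustration and is not part of vCD.

```python
from urllib.parse import urlsplit, urlunsplit


def fix_org_url(url: str) -> str:
    """Insert the missing /cloud prefix into an organisation URL if it is absent."""
    parts = urlsplit(url)
    path = parts.path
    if not path.startswith("/cloud/"):
        path = "/cloud" + path
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(fix_org_url("https://url.of.your.cloud/org/Customer1/"))
    # prints https://url.of.your.cloud/cloud/org/Customer1/
```

URLs that already contain /cloud/ are returned unchanged, so it is safe to run over a mixed list.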

However, there is a better way: amend the system VCD public URL under System | Administration | System Settings | Public Addresses

vCD Public URL

This will automatically add cloud into all organisation VCD public URLs.

vShield Manager Notes

Most administrative changes to vShield Manager can be done using the command line interface (CLI) by initiating a console session to the vShield Manager virtual machine. You can log in to the CLI by using the default user name admin and password default.

You can also access the CLI by enabling SSH.

To enable SSH:

Log in to the CLI by using the default user name and password
Enter configuration mode by typing

manager# en

manager# configure terminal

manager(config)# ssh start

manager(config)# cli ssh allow

To change the hostname of vShield Manager

vShield Manager uses manager as the default hostname but there is no easy way to change the hostname using the web interface or the vSphere plugin. You can only change vShield Manager’s hostname using the CLI.

Log in to the CLI by using the default user name and password
Enter configuration mode by typing

manager# en

manager# configure terminal

manager(config)# hostname newhostname

vShield will then restart its web services and accept the changes

More to follow….

Creating a VMware vCloud Director Cluster

Overview

A VMware vCloud Director (vCD) cluster contains one or more vCD servers. These servers are referred to as “Cells” and form the basis of the VMware cloud. A cloud can be formed of multiple cells.

This diagram is a good representation of the vCD Cluster concept.

To enable multiple servers to participate in a cluster, each host must meet the same pre-requisites as a single-host installation, plus the following:

Each host must mount the shared transfer server storage at $VCLOUD_HOME/data/transfer, which is typically /opt/vmware/cloud-director/data/transfer.

This shared storage could be an NFS mount, mounted on all participating servers with rw access for root. Decide whether a cluster is required before configuring the first server. If you intend to use a vCD Cluster, configure the shared transfer server storage before executing the vCD installer.

Check out the vCloud Director Installation and Configuration Guide for pre-requisites.

Shared Transfer Server Storage

For this post, I’ve set up an NFS volume on FreeNAS and given rw permissions for all cluster members to the volume. It is assumed that you have a completely clean installation of RHEL 5 x64 (or if, like me, you are running this in a lab, CentOS 5 x64), with all the latest updates and pre-requisite packages.

Now to mount the volume on all hosts:

Connect to your first host using SSH or login directly
Edit your /etc/fstab file and add the following line, remembering to change it to your NFS server and relevant mount point
vcd-freenas.vmwire.local:/mnt/SSD /opt/vmware/cloud-director/data/transfer nfs rw,soft,_netdev 0 0
The resulting /etc/fstab should look something like this:
/etc/fstab
Now create the shared transfer server storage folder structure, /opt/vmware/cloud-director/data/transfer (mkdir -p will create the full path)
Run chkconfig netfs on
Repeat steps 1-6 for any other hosts
Restart servers
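Since the same /etc/fstab line has to go onto every cell, it can help to generate it once and check for an existing entry before appending. The sketch below assumes the NFS server, export and mount options from my lab above; fstab_entry and has_mount are hypothetical helper names, and the script only reads the file rather than editing it.

```python
TRANSFER_DIR = "/opt/vmware/cloud-director/data/transfer"


def fstab_entry(server: str, export: str,
                mount_point: str = TRANSFER_DIR,
                options: str = "rw,soft,_netdev") -> str:
    """Build the NFS line for /etc/fstab in the format used in this post."""
    return f"{server}:{export} {mount_point} nfs {options} 0 0"


def has_mount(fstab_text: str, mount_point: str = TRANSFER_DIR) -> bool:
    """Return True if fstab_text already has a non-comment entry for mount_point."""
    for line in fstab_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


if __name__ == "__main__":
    entry = fstab_entry("vcd-freenas.vmwire.local", "/mnt/SSD")
    with open("/etc/fstab") as handle:
        current = handle.read()
    if has_mount(current):
        print("Transfer storage is already listed in /etc/fstab")
    else:
        print("Add this line to /etc/fstab:")
        print(entry)
```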

Now you are ready to install vCD onto the first host, making sure that you have met all the pre-requisites as detailed in the vCloud Director Installation and Configuration Guide. Once completed you should have a working cell with its shared transfer server storage folder located on the NFS volume.

Setting up a second cell as part of the Cloud Director Cluster

At this point you should already have a working cell with the vCD shared transfer server storage located on the NFS volume. Before you install vCD onto a second server, the following must be done:

All pre-requisites for a single server installation must also be met for subsequent servers as part of a vCD Cluster
The second server must also have rw access for root to the shared transfer server storage
The second server must have access to the response file, which is located in /opt/vmware/cloud-director/etc/responses.properties on the first successfully installed server
Copy the above file to the second server or to the shared transfer server storage
It is important to note that the response file contains values that were used for the first server. Subsequent servers will use the response file, and as such if you stored your certificates.ks file for the first server in a location not recognised by subsequent servers, you will be prompted by the installation script to enter the correct path to the certificates.ks file for any subsequent servers. To avoid this, you could create all the certificates.ks files for all cluster members and place them in the shared transfer server storage, with of course unique names such as vcd-cell1-certificates.ks and vcd-cell2-certificates.ks.
You can now install vCD onto subsequent servers with the command vmware-cloud-director-1.0.0-285979.bin -r /opt/vmware/cloud-director/data/transfer/responses.properties
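To keep the per-cell keystore names and the installer invocation consistent across cells, the naming convention above can be captured in a couple of helpers. This is purely illustrative: keystore_path and install_command are names I invented, and the sketch assumes the response file has been copied to the shared transfer server storage as suggested in the list.

```python
import posixpath

TRANSFER_DIR = "/opt/vmware/cloud-director/data/transfer"
INSTALLER = "vmware-cloud-director-1.0.0-285979.bin"


def keystore_path(cell_number: int) -> str:
    """Return the uniquely named keystore for a cell, e.g. vcd-cell1-certificates.ks."""
    return posixpath.join(TRANSFER_DIR, f"vcd-cell{cell_number}-certificates.ks")


def install_command(installer: str = INSTALLER) -> str:
    """Return the installer command line that reuses the first cell's response file."""
    responses = posixpath.join(TRANSFER_DIR, "responses.properties")
    return f"{installer} -r {responses}"


if __name__ == "__main__":
    for cell in (1, 2):
        print(keystore_path(cell))
    print(install_command())
```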

The installer will automatically complete most prompts for you, but you will still need to select the correct eth adapter for the http and consoleproxy services. Everything else will be automatic.

Go ahead and have a play and maybe even deploy a load balancer on top.

Here’s a screenshot of my two cells working side by side, connecting to the same shared transfer server storage and Oracle database, and managing the same vCenters.

For more information read the overview at Yellow Bricks which also includes links to the product pages.