In a previous post I wrote about how to scale workload cluster control plane and worker nodes vertically. This post explains how to do the same for the TKG Management Cluster nodes.
Scaling vertically means increasing or decreasing CPU, memory, disk, or changing other settings such as the network for the nodes. Using the Cluster API it is possible to make these changes on the fly; Kubernetes will use rolling updates to make the necessary changes.
First, switch to the TKG Management Cluster context to make the changes.
Scaling Worker Nodes
Run the following to list all the vSphereMachineTemplates.
k get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -A
NAMESPACE NAME AGE
tkg-system tkg-mgmt-control-plane 20h
tkg-system tkg-mgmt-worker 20h
These custom resources are immutable, so we need to export the existing template to a YAML file and edit it to create a new vSphereMachineTemplate.
k get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -n tkg-system tkg-mgmt-worker -o yaml > tkg-mgmt-worker-new.yaml
Now edit the new file named tkg-mgmt-worker-new.yaml
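In the new file, give the template a new name and set the sizing you want; everything else stays as exported (strip the status, resourceVersion and uid fields from the export). A minimal sketch, with example values for the sizing:
kind: VSphereMachineTemplate
metadata:
  name: tkg-mgmt-worker-new   # a new, unique name
  namespace: tkg-system
spec:
  template:
    spec:
      numCPUs: 4         # new vCPU count (example value)
      memoryMiB: 8192    # new RAM size in MiB (example value)
      diskGiB: 40
      # keep the remaining exported fields unchanged
Apply the new template with k apply -f tkg-mgmt-worker-new.yaml, then edit the worker MachineDeployment (k edit machinedeployment -n tkg-system) and point spec.template.spec.infrastructureRef.name at the new template.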
Save and quit. You'll notice that a new VM will immediately start being cloned in vCenter. Wait for it to complete; this new VM is the new worker with the updated CPU and memory sizing, and it will replace the current worker node. Eventually, after a few minutes, the old worker node will be deleted and you will be left with a new worker node with the updated CPU and RAM specified in the new VSphereMachineTemplate.
Scaling Control Plane Nodes
Scaling the control plane nodes is similar.
k get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -n tkg-system tkg-mgmt-control-plane -o yaml > tkg-mgmt-control-plane-new.yaml
Edit the file and perform the same steps as the worker nodes.
You'll notice that there is no MachineDeployment for the control plane nodes of a TKG Management Cluster. Instead, we have to edit the custom resource named KubeadmControlPlane.
Run this command
k get kubeadmcontrolplane -A
NAMESPACE NAME CLUSTER INITIALIZED API SERVER AVAILABLE REPLICAS READY UPDATED UNAVAILABLE AGE VERSION
tkg-system tkg-mgmt-control-plane tkg-mgmt true true 1 1 1 0 21h v1.22.9+vmware.1
Now we can edit it
k edit kubeadmcontrolplane -n tkg-system tkg-mgmt-control-plane
Change the section under spec.machineTemplate.infrastructureRef, around line 106.
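Only the template name needs to change; a sketch of the edited block (keep the existing apiVersion and kind from the file):
spec:
  machineTemplate:
    infrastructureRef:
      kind: VSphereMachineTemplate
      name: tkg-mgmt-control-plane-new   # the new template created above
      namespace: tkg-system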
Save the file. You’ll notice that another VM will start cloning and eventually you’ll have a new control plane node up and running. This new control plane node will replace the older one. It will take longer than the worker node so be patient.
Container Service Extension 4 was released recently. This post aims to ease the setup of CSE 4.0, as it has a different deployment model: it uses the Solutions framework instead of deploying the CSE appliance into the traditional management cluster that service providers use to run VMware management components such as vCenter, NSX-T Managers, Avi Controllers and other management systems.
Step 1 – Create a CSE Service Account
Perform these steps using the administrator@system account or an equivalent system administrator role.
Set up a Service Account in the Provider (system) organization with the CSE Admin Role.
In my environment I created a user to use as a service account named svc-cse. You’ll notice that this user has been assigned the CSE Admin Role.
The CSE Admin Role is created automatically by CSE when you use the CSE Management UI as a Provider administrator; perform those steps using the administrator@system account.
Step 2 – Create a token for the Service Account
Log out of VCD and log back into the Provider organization as the service account you created in Step 1 above. Once logged in, it should look like the following screenshot, notice that the svc-cse user is logged into the Provider organization.
Click on the downward arrow at the top right of the screen, next to the user svc-cse and select User Preferences.
Under Access Tokens, create a new token and copy the token to a safe place. This is what you use to deploy the CSE appliance later.
Log out of VCD and log back in as administrator@system to the Provider organization.
Step 3 – Deploy CSE appliance
Create a new tenant Organization where you will run CSE. This new organization is dedicated to VCD extensions such as CSE and is managed by the service provider.
For example, you can name this new organization something like "solutions-org". Create an Org VDC within this organization, along with the necessary network infrastructure such as a T1 router and an organization network with internet access.
Still logged into the Provider organization, open another tab by clicking on the Open in Tenant Portal link to your “solutions-org” organization. You must deploy the CSE vApp as a Provider.
Now you can deploy the CSE vApp.
Use the Add vApp From Catalog workflow.
Accept the EULA and continue with the workflow.
When you get to Step 8 of the Create vApp from Template workflow, ensure that you set up the OVF properties like my screenshot below:
The important thing to note is to ensure that you are using the correct service account username and use the token from Step 2 above.
Also, since the service account lives in the Provider organization, leave the default system organization as the CSE service account's org.
The last value is very important: it must be set to the tenant organization that will run the CSE appliance, in our case the "solutions-org" org.
Once the OVA is deployed you can boot it up, or, if you want to customize the root password, do so before you start the vApp. Otherwise, the default credentials are root and vmware.
Rights required for deploying TKG clusters
Ensure that the user that is logged into a tenant organization has the correct rights to deploy a TKG cluster. This user must have at a minimum the rights in the Kubernetes Cluster Author Global Role.
App LaunchPad
You’ll also need to upgrade App Launchpad to the latest version alp-2.1.2-20764259 to support CSE 4.0 deployed clusters.
Also ensure that the App-Launchpad-Service role has the rights to manage CAPVCD clusters.
Otherwise you may encounter the following issue:
VCD API Protected by Web Application Firewalls
If you are using a web application firewall (WAF) in front of your VCD cells and you are blocking access to the provider-side APIs, you will need to add the SNAT IP address of the T1 from the solutions-org into the WAF whitelist.
The CSE appliance will need access to the VCD provider side APIs.
I wrote about using a WAF in front of VCD in the past to protect provider side APIs. You can read those posts here and here.
By default, load balancer service requests for application services use a two-arm load balancer with NSX Advanced Load Balancer (Avi) in Container Service Extension (CSE) provisioned Tanzu Kubernetes Grid (TKG) clusters deployed in VMware Cloud Director (VCD).
VCD tells NSX-T to create a DNAT towards an internal-only IP range of 192.168.8.x. This may be undesirable for some customers, and it is now possible to remove the need for this and just use a one-arm load balancer instead.
This capability is enabled only for VCD 10.4.x; in prior versions of VCD this support was not available.
The requirements are:
CSE 4.0
VCD 10.4
Avi configured for VCD
A TKG cluster provisioned by the CSE UI.
If you’re still running VCD 10.3.x then this blog article is irrelevant.
The vcloud-ccm-configmap config map stores vcloud-ccm-config.yaml, which is used by the vmware-cloud-director-ccm deployment.
Step 1 – Make a copy of the vcloud-ccm-configmap
k get cm -n kube-system vcloud-ccm-configmap -o yaml
Make a copy of the config map so that you can edit it and apply it as a replacement, since the current config map is immutable.
k get cm -n kube-system vcloud-ccm-configmap -o yaml > vcloud-ccm-configmap.yaml
Step 2 – Edit the vcloud-ccm-configmap
Edit the file: delete the oneArm: block under data (including the startIP and endIP lines) and change the value of the key enableVirtualServiceSharedIP to true. Leave the rest of the file as it is.
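After the edit, the load balancer section of the embedded vcloud-ccm-config.yaml ends up looking roughly like the sketch below; the structure is based on the VCD cloud provider config format, with all other keys left untouched:
data:
  vcloud-ccm-config.yaml: |
    # ...VCD connection settings unchanged...
    loadbalancer:
      # the oneArm: block with startIP and endIP has been removed
      enableVirtualServiceSharedIP: true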
Step 3 – Apply the new config map
To apply the new config map, you need to delete the old config map first.
k delete cm -n kube-system vcloud-ccm-configmap
configmap "vcloud-ccm-configmap" deleted
Apply the new config map with the yaml file that you just edited.
k apply -f vcloud-ccm-configmap.yaml
configmap/vcloud-ccm-configmap created
To finalize the change, take a backup of the vmware-cloud-director-ccm deployment and then delete it so that it can be recreated with the new config map.
You can check the config map that this deployment uses by typing:
k get deploy -n kube-system vmware-cloud-director-ccm -o yaml
Step 4 – Redeploy the vmware-cloud-director-ccm deployment
Take a backup of the vmware-cloud-director-ccm deployment by typing:
k get deploy -n kube-system vmware-cloud-director-ccm -o yaml > vmware-cloud-director-ccm.yaml
No need to edit this time. Now delete the deployment:
k delete deploy -n kube-system vmware-cloud-director-ccm
You can now recreate the deployment from the yaml file:
k apply -f vmware-cloud-director-ccm.yaml
deployment.apps/vmware-cloud-director-ccm created
Now when you deploy an application and request a load balancer service, NSX ALB (Avi) will route the external VIP IP towards the Kubernetes worker nodes directly, instead of to the NSX-T DNAT segment on 192.168.2.x first.
Step 5 – Deploy a load balancer service
k apply -f https://raw.githubusercontent.com/hugopow/tkg-dev/main/web-statefulset.yaml
You’ll notice a few things happening with this example. A new statefulset with one replica is scheduled with an nginx pod. The statefulset also requests a 1 GiB PVC to store the website. A load balancer service is also requested.
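The load balancer service portion of that manifest looks roughly like the sketch below; the name, namespace and port match the output further down, while the selector label is an assumption:
apiVersion: v1
kind: Service
metadata:
  name: web-statefulset-service
  namespace: web-statefulset
spec:
  type: LoadBalancer
  selector:
    app: web-statefulset   # assumed pod label
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80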
Note that there is no DNAT setup on this tenant’s NAT services, this is because after the config map change, the vmware-cloud-director cloud controller manager is not using a two-arm load balancer architecture anymore, therefore no need to do anything with NSX-T NAT rules.
If you check your NSX ALB settings you’ll notice that it is indeed now using a one-arm configuration. Where the external VIP IP address is 10.149.1.113 and port is TCP 80. NSX ALB is routing that to the two worker nodes with IP addresses of 192.168.0.100 and 192.168.0.102 towards port TCP 30020.
k get svc -n web-statefulset
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
web-statefulset-service   LoadBalancer   100.66.198.78   10.149.1.113   80:30020/TCP   13m
k get no -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
tkg-1-worker-node-pool-1-68c67d5fd6-c79kr Ready 5h v1.22.9+vmware.1 192.168.0.102 192.168.0.102 Ubuntu 20.04.4 LTS 5.4.0-109-generic containerd://1.5.11
tkg-1-worker-node-pool-2-799d6bccf5-8vj7l Ready 4h46m v1.22.9+vmware.1 192.168.0.100 192.168.0.100 Ubuntu 20.04.4 LTS 5.4.0-109-generic containerd://1.5.11
I’ve been experimenting with the VMware Cloud Director, Container Service Extension and App Launchpad applications and wanted to test if these applications would run in Kubernetes.
The short answer is yes!
I initially deployed these apps as a standalone Docker container to see if they would run as a container. I wanted to eventually get them to run in a Kubernetes cluster to benefit from all the goodies that Kubernetes provides.
Packaging the apps wasn't too difficult, it just needed patience and a lot of Googling. The process was as follows:
Run a Docker container from a base Linux image: CentOS for VCD and Photon OS for ALP and CSE.
Prepare all the prerequisites, such as yum update and tdnf update.
Commit the image to a Harbor registry.
Build a Helm chart to deploy the applications using the images, with a shell script that runs when the container starts to install and run the application.
Well, it's not that simple, but you can take a look at the code for all three Helm charts on my Github or pull them from my public Harbor repository.
The values.yaml file is the only file you’ll need to edit, just update to suit your environment.
# Default values for vmware-cloud-director.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 1

installFirstCell:
  enabled: true

installAdditionalCell:
  enabled: false

storageClass: iscsi
pvcCapacity: 2Gi

vcdNfs:
  server: 10.92.124.20
  mountPath: /mnt/nvme/vcd-k8s

vcdSystem:
  user: administrator
  password: Vmware1!
  email: admin@domain.local
  systemName: VCD
  installationId: 1

postgresql:
  dbHost: postgresql.vmware-cloud-director.svc.cluster.local
  dbName: vcloud
  dbUser: vcloud
  dbPassword: Vmware1!

# Availability zones in deployment.yaml are setup for TKG and must match VsphereFailureDomain and VsphereDeploymentZones
availabilityZones:
  enabled: false

httpsService:
  type: LoadBalancer
  port: 443

consoleProxyService:
  port: 8443

publicAddress:
  uiBaseUri: https://vcd-k8s.vmwire.com
  uiBaseHttpUri: http://vcd-k8s.vmwire.com
  restapiBaseUri: https://vcd-k8s.vmwire.com
  restapiBaseHttpUri: http://vcd-k8s.vmwire.com
  consoleProxy: vcd-vmrc.vmwire.com

tls:
  certFullChain: |-
    -----BEGIN CERTIFICATE-----
    wildcard certificate
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    intermediate certificate
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    root certificate
    -----END CERTIFICATE-----
  certKey: |-
    -----BEGIN PRIVATE KEY-----
    wildcard certificate private key
    -----END PRIVATE KEY-----
The installation process is quite fast: less than three minutes to get the first pod up and running and two minutes for each subsequent pod. That means a multi-cell VCD system can be up and running in less than ten minutes.
I've deployed VCD as a StatefulSet with three replicas. Since the replica count is set to three, three VCD pods are deployed; in the old world these would be the cells. Here you can see three pods running, which provides both load balancing and high availability. The other pod is the PostgreSQL database that these cells use. You should also be able to see that Kubernetes has scheduled each pod on a different worker node. I have three worker nodes in this Kubernetes cluster.
Below is the view in VCD of the three cells.
The StatefulSet also has a LoadBalancer service configured for performing the load balancing of the HTTP and Console Proxy traffic on TCP 443 and TCP 8443 respectively.
You can see the LoadBalancer service has configured the services for HTTP and Console Proxy. Note, that this is done automatically by Kubernetes using a manifest in the Helm Chart.
Migrating an existing VCD instance to Kubernetes
If you want to migrate an existing instance to Kubernetes, then use this post here.
How to install: Update values.yaml and then run helm install container-service-extension oci://harbor.vmwire.com/library/container-service-extension --version 0.2.0 -n container-service-extension
Here’s CSE running as a pod in Kubernetes. Since CSE is a stateless application, I’ve configured it to run as a Deployment.
CSE also does not need a database as it purely communicates with VCD through a message bus such as MQTT or RabbitMQ. Additionally no external access to CSE is required as this is done via VCD, so no load balancer is needed either.
You can see that when CSE is idle it only needs 1 millicore of CPU and 102 MiB of RAM. This is so much better in terms of resource requirements than running CSE in a VM, and is one of the advantages of running pods vs VMs: pods use considerably fewer resources than VMs.
How to install: Update values.yaml and then run helm install app-launchpad oci://harbor.vmwire.com/library/app-launchpad --version 0.4.0 -n app-launchpad
The values.yaml file is the only file you’ll need to edit, just update to suit your environment.
# Default values for app-launchpad.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

alpConnect:
  saUser: "svc-alp"
  saPass: Vmware1!
  url: https://vcd-k8s.vmwire.com
  adminUser: administrator@system
  adminPass: Vmware1!
  mqtt: true
  eula: accept
# If you accept the EULA then type "accept" in the EULA key value to install ALP. You can find the EULA in the README.md file.
I’ve already written an article about ALP here. That article contains a lot more details so I’ll share a few screenshots below for ALP.
Just like CSE, ALP is a stateless application and is deployed as a Deployment. ALP also does not require external access through a load balancer as it too communicates with VCD using the MQTT or RabbitMQ message bus.
You can see that ALP, when idle, requires just 3 millicores of CPU and 400 MiB of RAM.
ALP can be deployed with multiple instances to provide load balancing and high availability. This is done by deploying RabbitMQ and connecting ALP and VCD to the same exchange. VCD does not support multiple instances of ALP if MQTT is used.
When RabbitMQ is configured, then ALP can be scaled by changing the Deployment number of replicas to two or more. Kubernetes would then deploy additional pods with ALP.
Overview
Velero (formerly Heptio Ark) gives you tools to back up and restore your Kubernetes cluster resources and persistent volumes. You can run Velero with a cloud provider or on-premises. Velero lets you:
Take backups of your cluster and restore in case of loss.
Migrate cluster resources to other clusters.
Replicate your production cluster to development and testing clusters.
Velero consists of:
A server that runs on your Kubernetes cluster
A command-line client that runs locally
Velero works with any Kubernetes cluster, including Tanzu Kubernetes Grid and Kubernetes clusters deployed using Container Service Extension with VMware Cloud Director.
This solution can be used for air-gapped environments where the Kubernetes clusters do not have Internet access and cannot use public services such as Amazon S3, or Tanzu Mission Control Data Protection. These services are SaaS services which are pretty much out of bounds in air-gapped environments.
Install Velero onto your workstation
Download the latest Velero release for your preferred operating system; this is usually the workstation where you have your kubectl tools.
If you want to enable bash auto completion, please follow this guide.
Setup an S3 service and bucket
I'm using TrueNAS' S3-compatible storage in my lab. TrueNAS provides S3-compatible object storage and is incredibly easy to set up. You can use other S3-compatible object stores such as Amazon S3. A full list of supported providers can be found here.
Follow these instructions to setup S3 on TrueNAS.
Add certificate, go to System, Certificates
Add, Import Certificate, copy and paste cert.pem and cert.key
Storage, Pools, click on the three dots next to the Pools that will hold the S3 root bucket.
Add a Dataset, give it a name such as s3-storage
Services, S3, click on pencil icon.
Setup like the example below.
Setup the access key and secret key for this configuration.
Update DNS to point s3.vmwire.com to 10.92.124.20 (the IP of TrueNAS). Note that this FQDN and IP address need to be accessible from the Kubernetes worker nodes. For example, if you are installing Velero onto Kubernetes clusters in VCD, the worker nodes on the Organization network need to be able to route to your S3 service. If you are a service provider, you can place your S3 service on the services network that is accessible by all tenants in VCD.
Setup the connection to your S3 service using the access key and secret key.
Create a new bucket to store some backups. If you are using Container Service Extension with VCD, create a new bucket for each tenant organization. This ensures multi-tenancy is maintained. I've created a new bucket named tenant1 which corresponds to one of my tenant organizations in my VCD environment.
Install Velero into the Kubernetes cluster
You can use the velero-plugin-for-aws and the AWS provider with any S3 API compatible system, this includes TrueNAS, Cloudian Hyperstore etc.
Set up a file with your access key and secret key details; the file is named credentials-velero.
vi credentials-velero
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMIK7MDENGbPxRfiCYEXAMPLEKEY
Change your Kubernetes context to the cluster that you want to enable for Velero backups. The Velero CLI will connect to your Kubernetes cluster and deploy all the resources for Velero.
To install Restic, use the --use-restic flag in the velero install command. See the install overview for more details on other flags for the install command.
velero install --use-restic
When using Restic on a storage provider that doesn’t have Velero support for snapshots, the --use-volume-snapshots=false flag prevents an unused VolumeSnapshotLocation from being created on installation. The VCD CSI provider does not provide native snapshot capability, that’s why using Restic is a good option here.
I've enabled the default behaviour of including all persistent volumes in pod backups by running the velero install command with the --default-volumes-to-restic flag. Refer to the install overview for details.
Specify the bucket with the --bucket flag, I’m using tenant1 here to correspond to a VCD tenant that will have its own bucket for storing backups in the Kubernetes cluster.
For the --backup-location-config flag, configure your settings like mine, and use the s3Url key to point to your S3 object store; if you don't set this, Velero will use AWS' public S3 URIs.
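Putting those flags together, the install command looks something like the sketch below; the plugin version and the region value are assumptions, while the bucket and s3Url come from the earlier steps:
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.3.0 \
  --bucket tenant1 \
  --secret-file ./credentials-velero \
  --use-restic \
  --default-volumes-to-restic \
  --use-volume-snapshots=false \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=https://s3.vmwire.com
Once the install completes, check that the Velero deployment and the restic daemonset are running:
kubectl get all -n velero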
NAME READY STATUS RESTARTS AGE
pod/restic-x6r69 1/1 Running 0 49m
pod/velero-7bc4b5cd46-k46hj 1/1 Running 0 49m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/restic 1 1 1 1 1 <none> 49m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/velero 1/1 1 1 49m
NAME DESIRED CURRENT READY AGE
replicaset.apps/velero-7bc4b5cd46 1 1 1 49m
This post summarizes how you can migrate the VMware Cloud Director database from PostgreSQL running in the VCD appliance into a PostgreSQL pod running in Kubernetes, and then create new VCD cells running as pods in Kubernetes to run the VCD services. In summary, modernizing VCD into a modern application.
I wanted to experiment with VMware Cloud Director to see if it would run in Kubernetes. One of the reasons for this is to reduce resource consumption in my home lab. The VCD appliance can be quite a resource-hungry VM, needing a minimum of 2 vCPUs and 6 GB of RAM. Running VCD in Kubernetes would definitely reduce this and free up much-needed RAM for other applications. Running this workload in Kubernetes also brings faster deployment, higher availability, easier lifecycle management and operations, and additional benefits from the ecosystem such as observability tools.
Here’s a view of the current VCD appliance in the portal. 172.16.1.34 is the IP of the appliance, 172.16.1.0/27 is the network for the NSX-T segment that I’ve created for the VCD DMZ network. At the end of this post, you’ll see VCD running in Kubernetes pods with IP addresses assigned by the CNI instead.
Tanzu Kubernetes Grid Shared Services Cluster
I am using a Tanzu Kubernetes Grid cluster set up for shared services. It's the ideal place to run applications that, in the virtual machine world, would have been running in a traditional vSphere management cluster. I also run the Container Service Extension and App Launchpad Kubernetes pods in this cluster.
Step 1. Deploy PostgreSQL with Kubeapps into a Kubernetes cluster
If you have Kubeapps, this is the easiest way to deploy PostgreSQL.
Copy my settings below to create a PostgreSQL database server and the vcloud user and database that are required for the database restore.
Step 1. Alternatively, use Helm directly.
# Create database server using KubeApps or Helm, vcloud user with password
helm repo add bitnami https://charts.bitnami.com/bitnami
# Pull the chart, unzip then edit values.yaml
helm pull bitnami/postgresql
tar zxvf postgresql-11.1.11.tgz
helm install postgresql bitnami/postgresql -f /home/postgresql/values.yaml -n vmware-cloud-director
# Expose postgres service using load balancer
k expose pod -n vmware-cloud-director postgresql-primary-0 --type=LoadBalancer --name postgresql-public
# Get the IP address of the load balancer service
k get svc -n vmware-cloud-director postgresql-public
# Connect to database as postgres user from VCD appliance to test connection
psql --host 172.16.4.70 -U postgres -p 5432
# Type password you used when you deployed postgresql
# Quit
\q
Step 2. Backup database from VCD appliance and restore to PostgreSQL Kubernetes pod
Log into the VCD appliance using SSH.
# Stop vcd services on all VCD appliances
service vmware-vcd stop
# Backup database and important files on VCD appliance
/opt/vmware/appliance/bin/create_backup.sh
# Unzip the zip file into /opt/vmware/vcloud-director/data/transfer/backups
# Restore database using pg_dump backup file. Do this from the VCD appliance as it already has the postgres tools installed.
pg_restore --host 172.16.4.70 -U postgres -p 5432 -C -d postgres /opt/vmware/vcloud-director/data/transfer/backups/vcloud-database.sql
# Edit responses.properties and change IP address of database server from load balancer IP to the assigned FQDN for the postgresql pod, e.g. postgresql-primary.vmware-cloud-director.svc.cluster.local
# Shut down the VCD appliance, it's no longer needed
Step 3. Deploy Helm Chart for VCD
# Pull the Helm Chart
helm pull oci://harbor.vmwire.com/library/vmware-cloud-director
# Uncompress the Helm Chart
tar zxvf vmware-cloud-director-0.5.0.tgz
# Edit the values.yaml to suit your needs
# Deploy the Helm Chart
helm install vmware-cloud-director vmware-cloud-director --version 0.5.0 -n vmware-cloud-director -f /home/vmware-cloud-director/values.yaml
# Wait for about five minutes for the installation to complete
# Monitor logs
k logs -f -n vmware-cloud-director vmware-cloud-director-0
Known Issues
If you see an error such as:
Error starting application: Unable to create marker file in the transfer spooling area: VfsFile[fileObject=file:///opt/vmware/vcloud-director/data/transfer/cells/4c959d7c-2e3a-4674-b02b-c9bbc33c5828]
This is due to the transfer share being created by a different vcloud user on the original VCD appliance. That user has a different Linux user ID (normally 1000 or 1001), so we need to change the ownership to work with the new vcloud user.
Run the following commands to resolve this issue:
# Launch a bash session into the VCD pod
k exec -it -n vmware-cloud-director vmware-cloud-director-0 -- /bin/bash
# Change ownership of the /transfer share to the vcloud user
chown -R vcloud:vcloud /opt/vmware/vcloud-director/data/transfer
# type exit to quit
exit
Once that’s done, the cell can start and you’ll see the following:
Successfully verified transfer spooling area: VfsFile[fileObject=file:///opt/vmware/vcloud-director/data/transfer]
Cell startup completed in 2m 26s
Accessing VCD
The VCD pod is exposed using a load balancer in Kubernetes. Ports 443 and 8443 are exposed on a single IP, just like how it is configured on the VCD appliance.
Run the following to obtain the new load balancer IP address of VCD.
k get svc -n vmware-cloud-director vmware-cloud-director
Redirect your DNS server record to point to this new IP address for both the HTTP and VMRC services, e.g., 172.16.4.71.
If everything ran successfully, you should now be able to log into VCD. Here’s my VCD instance that I use for my lab environment which was previously running in a VCD appliance, now migrated over to Kubernetes.
Notice that the old cell is now inactive because it is powered off. It can now be removed from VCD and deleted from vCenter.
The pod vmware-cloud-director-0 is now running the VCD application. Notice its assigned IP address of 100.107.74.159. This is the pod’s IP address.
Everything else will work as normal. Any UI customizations and TLS certificates are kept just as before the migration; this is because we restored the database and used responses.properties to add the new cells.
Even opening a remote console to a VM will continue to work.
Load Balancer is NSX Advanced LB (Avi)
Avi provides the load balancing services automatically through the Avi Kubernetes Operator (AKO).
AKO automatically configures the services in Avi for you when services are exposed.
Deploy another VCD cell, I mean pod
It is now very easy to scale VCD by deploying additional replicas.
Edit the values.yaml file and change the replicas number from 1 to 2.
# Upgrade the Helm Chart
helm upgrade vmware-cloud-director vmware-cloud-director --version 0.4.0 -n vmware-cloud-director -f /home/vmware-cloud-director/values.yaml
# Wait for about five minutes for the installation to complete
# Monitor logs
k logs -f -n vmware-cloud-director vmware-cloud-director-1
When the VCD services start up successfully, you’ll notice that the cell will appear in the VCD UI and Avi is also updated automatically with another pool.
We can also see that Avi is load balancing traffic across the two pods.
Deploy as many replicas as you like.
Resource usage
Here’s a very brief overview of what we have deployed so far.
Notice that the two PostgreSQL pods together are only using about 700 MiB of RAM. The VCD pods are consuming much more, but this is still a vast improvement over the 6 GB that one appliance needed previously.
High Availability
You can ensure that the VCD pods are scheduled on different Kubernetes worker nodes by using multi availability zone topology. To do this just change the values.yaml.
# Availability zones in deployment.yaml are setup for TKG and must match VsphereFailureDomain and VsphereDeploymentZones
availabilityZones:
  enabled: true
This makes sure that if you scale up the vmware-cloud-director statefulset, Kubernetes will ensure that each of the pods will not be placed on the same worker node.
As you can see from the Kubernetes Dashboard output under Resource usage above, vmware-cloud-director-0 and vmware-cloud-director-1 pods are scheduled on different worker nodes.
More importantly, you can see that I have also used the same approach for the postgresql-primary-0 and postgresql-read-0 pods. These are really important to keep separate in case of failure of a worker node or of the ESXi host that the worker node runs on.
Finally
Here are a few screenshots of VCD, CSE and ALP all running in my Shared Services Kubernetes cluster.
Backing up the PostgreSQL database
For Day 2 operations, such as backing up the PostgreSQL database you can use Velero or just take a backup of the database using the pg_dump tool.
Backing up the database with pg_dump using a Docker container
It's super easy to take a database backup using a Docker container; just make sure you have Docker running on your workstation and that it can reach the load balancer IP address for the PostgreSQL service.
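A sketch of that, using the load balancer IP and vcloud credentials from earlier in this post; the postgres image tag is an assumption:
# Run pg_dump from a temporary PostgreSQL container against the load balancer IP
docker run --rm -e PGPASSWORD='Vmware1!' postgres:14 \
  pg_dump -h 172.16.4.70 -p 5432 -U vcloud -d vcloud > vcloud-backup.sql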
Using LoadBalancers, Gateways, GatewayClasses, AviInfraSettings, IngressClasses and Ingresses
Gateway API replaces services of type LoadBalancer in applications that require shared IP with multiple services and network segmentation. The Gateway API can be used to meet the following requirements:
Shared IP – supporting multiple services, protocols and ports on the same load balancer external IP address
Network segmentation – supporting multiple networks, e.g., oam, signaling and traffic on the same load balancer
NSX Advanced Load Balancer (Avi) supports both of these requirements through the use of the Gateway API. The following section describes how this is implemented.
The Gateway API introduces a few new resource types:
GatewayClasses are cluster-scoped resources that act as templates to explicitly define behavior for Gateways derived from them. This is similar in concept to StorageClasses, but for networking data-planes.
Gateways are the deployed instances of GatewayClasses. They are the logical representation of the data-plane which performs routing, which may be in-cluster proxies, hardware LBs, or cloud LBs.
AviInfraSetting
Avi Infra Setting provides a way to segregate Layer-4/Layer-7 virtual services to have properties based on different underlying infrastructure components, like Service Engine Group, intended VIP Network etc.
Avi Infra Setting is a cluster scoped CRD and can be attached to the intended Services. Avi Infra setting resources can be attached to Services using Gateway APIs.
GatewayClass
Gateway APIs provide interfaces to structure Kubernetes service networking.
AKO supports Gateway APIs via the servicesAPI flag in the values.yaml.
The Avi Infra Setting resource can be attached to a Gateway Class object, via the .spec.parametersRef as shown below:
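A sketch of such a GatewayClass, based on the Gateway API v1alpha1 resources that AKO consumes; the AviInfraSetting name is a placeholder:
apiVersion: networking.x-k8s.io/v1alpha1
kind: GatewayClass
metadata:
  name: avi-gateway-class
spec:
  controller: ako.vmware.com/avi-lb
  parametersRef:
    group: ako.vmware.com
    kind: AviInfraSetting
    name: aviinfrasetting-oam   # placeholder AviInfraSetting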
The Gateway object provides a way to configure multiple Services as backends to the Gateway using label matching. The labels are specified as constant key-value pairs, the keys being ako.vmware.com/gateway-namespace and ako.vmware.com/gateway-name. The values corresponding to these keys must match the Gateway namespace and name respectively for AKO to consider the Gateway valid. If any of the label keys is not provided as part of matchLabels, or the namespace/name provided in the label values does not match the actual Gateway namespace/name, AKO will consider the Gateway invalid. Please see https://avinetworks.com/docs/ako/1.5/gateway/.
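A sketch of a matching Gateway, again using the v1alpha1 API and placeholder names:
apiVersion: networking.x-k8s.io/v1alpha1
kind: Gateway
metadata:
  name: my-gateway
  namespace: default
spec:
  gatewayClassName: avi-gateway-class
  listeners:
  - protocol: TCP
    port: 80
    routes:
      selector:
        matchLabels:
          ako.vmware.com/gateway-namespace: default
          ako.vmware.com/gateway-name: my-gateway
      group: v1
      kind: Service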
In your Helm charts, for any service that previously needed a LoadBalancer service, you would now use ClusterIP instead, together with labels such as the following:
The labels and the ClusterIP type tell the Avi Kubernetes Operator (AKO) to use the Gateways; each Gateway is on a separate network segment for traffic separation.
The Gateways also expose the relevant ports that the application uses; configure your Gateway and change your Helm chart to use the Gateway objects.
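A sketch of such a Service, with placeholder names; the two labels must match the Gateway's namespace and name:
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
  labels:
    ako.vmware.com/gateway-namespace: default
    ako.vmware.com/gateway-name: my-gateway
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080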
Ingress Class
Avi Infra Settings can be applied to Ingress resources, using the IngressClass construct. IngressClass provides a way to configure Controller-specific load balancing parameters and applies these configurations to a set of Ingress objects. AKO supports listening to IngressClass resources in Kubernetes version 1.19+. The Avi Infra Setting reference can be provided in the Ingress Class as shown below:
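A sketch of such an IngressClass, with a placeholder AviInfraSetting name:
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: avi-ingress-class
spec:
  controller: ako.vmware.com/avi-lb
  parameters:
    apiGroup: ako.vmware.com
    kind: AviInfraSetting
    name: aviinfrasetting-oam   # placeholder AviInfraSetting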
The Avi Infra Setting resource can be attached to a GatewayClass object or an IngressClass object via .spec.parametersRef. However, if you use annotations on a LoadBalancer object instead of labels with a Gateway API object, you will not be able to share protocols and ports on the same IP address, for example TCP and UDP 53 on the same LoadBalancer IP address. This is not supported until MixedProtocolLB is supported by Kubernetes.
To provide a controller to implement a given Ingress, in addition to creating the IngressClass object, the ingressClassName should be specified in the Ingress, matching the IngressClass name. The Ingress looks as shown below:
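A sketch of such an Ingress, with placeholder host and service names:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  namespace: default
spec:
  ingressClassName: avi-ingress-class
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app
            port:
              number: 80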
This post is an update to enable the automated installation of Container Service Extension to version 3.1.2, the script is also updated for better efficiency.
You can find the details on my github account under the repository named cse-automated.
Ensure you review the README.MD and read the comments in the script too.
Pre-Requisites
Deploy Photon OVA into vSphere, 2 VCPUs, 4GB RAM is more than enough
Assign VM a hostname and static IP
Ensure it can reach the Internet
Ensure it can also reach VCD on TCP 443 and vCenter servers registered in VCD on TCP 443.
SSH into the Photon VM
Note that my environment has CA signed SSL certs and the script has been tested against this environment. I have not tested the script in environments with self-signed certificates.
Download cse-install.sh script to Photon VM
# Download the script to the Photon VM
curl https://raw.githubusercontent.com/hugopow/cse-automated/main/cse-install.sh --output cse-install.sh
# Make script executable
chmod +x cse-install.sh
Change the cse-install.sh script
Make sure you change passwords, CA SSL certificates and environment variables to suit your environment.
Launch the script, sit back and relax
# Run as root
sh cse-install.sh
Demo Video
Old video of CSE 3.0.4 automated install, but still the same process.
VMware Cloud Director App Launchpad allows users to deploy applications from public and private registries very easily into their VCD clouds, either as virtual machines or as containers into Kubernetes clusters provisioned into VCD by Container Service Extension.
Here are a few screenshots of ALP in action in VCD.
How is App Launchpad Installed in a VM?
Installing ALP requires a Linux system, followed by installing the application from an RPM file and then going through some configuration commands to connect ALP to the VCD system. Tedious at best and prone to errors.
This post shows how you can run ALP as a Kubernetes pod in a Kubernetes cluster instead of running ALP in a VM.
Disclaimer: This is unsupported. This post is an example of how you can run App Launchpad in Kubernetes instead of deploying it on a traditional VM. Use at your own risk. Please continue to run ALP in supported configurations in production environments.
VMs vs Containers
Running containers in Kubernetes instead of VMs provides enhanced benefits. Such as:
Containers are more lightweight than VMs, as their images are measured in megabytes rather than gigabytes
Containers require fewer IT resources to deploy, run, and manage
Containers spin up in milliseconds
Since their order of magnitude is smaller, a single system can host many more containers as compared to VMs
Containers are easier to deploy and fit in well with infrastructure as code concepts
Developing, testing, running and managing applications are easier and more efficient with containers.
You'll find the values.yaml file in the /app-launchpad directory. Edit it to your liking and accept the ALP EULA; you'll find the EULA in the README.md file.
alpConnect:
  saUser: "svc-alp"
  saPass: Vmware1!
  url: https://vcd.vmwire.com
  adminUser: administrator@system
  adminPass: Vmware1!
  mqtt: true
  eula: accept
# If you accept the EULA then type "accept" in the EULA key value to install ALP.
You can either package the chart and place it into your own registry or just use mine.
values.yaml
NAME: app-launchpad
LAST DEPLOYED: Fri Mar 18 09:54:16 2022
NAMESPACE: app-launchpad
STATUS: deployed
REVISION: 1
TEST SUITE: None
Running the following command will show that the deployment is successful
helm list -n app-launchpad
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
app-launchpad app-launchpad 1 2022-03-18 09:54:16.560871812 +0000 UTC deployed app-launchpad-0.4.0 2.1.1
Running the following commands you’ll see that the pod has started
kubectl get deploy -n app-launchpad
NAME READY UP-TO-DATE AVAILABLE AGE
app-launchpad 1/1 1 1 25m
kubectl get po -n app-launchpad
NAME READY STATUS RESTARTS AGE
app-launchpad-669786b6dd-p8fjw 1/1 Running 0 25m
The pod's startup logs show the ALP RPM being installed and configured inside the container, and the ALP services being started:
Uninstalling...
Removed /etc/systemd/system/multi-user.target.wants/alp.service.
Removed /etc/systemd/system/multi-user.target.wants/alp-deployer.service.
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
warning: %postun(vmware-alp-2.1.1-19234432.x86_64) scriptlet failed, exit status 1
warning: /home/vmware-alp-2.1.1-19234432.ph3.x86_64.rpm: Header V3 RSA/SHA1 Signature, key ID 001e5cc9: NOKEY
Verifying... ########################################
Preparing... ########################################
Updating / installing...
vmware-alp-2.1.1-19234432 ########################################
New installing...
Found the /opt/vmware/alp/log, change log directory owner and permission ...
chmod: /opt/vmware/alp/log/*: No such file or directory
chown: /opt/vmware/alp/log/*: No such file or directory
Created symlink /etc/systemd/system/multi-user.target.wants/alp.service → /lib/systemd/system/alp.service.
Created symlink /etc/systemd/system/multi-user.target.wants/alp-deployer.service → /lib/systemd/system/alp-deployer.service.
Setup ALP connections
VMWARE END USER LICENSE AGREEMENT
Last updated: 03 May 2021
--- snipped ---
Cloud Director Setting for App Launchpad
+--------------------------------+-----------------------------------------------+
| Cloud Director URL | https://vcd.vmwire.com |
| App Launchpad Service Account | svc-alp |
| App Launchpad Service Password | Vmware1! |
| MQTT Triplet | VMware/AppLaunchpad/1.0.0 |
| MQTT Token | e089774e-389c-4e12-82d0-11378a30981d |
| MQTT Topic of Monitor | topic/extension/VMware/AppLaunchpad/1.0.0/ext |
| MQTT Topic of Response | topic/extension/VMware/AppLaunchpad/1.0.0/vcd |
| App Launchpad extension UUID | 9ba4f6c8-a1e4-3a57-bd4c-e5ca5c2f8375 |
+--------------------------------+-----------------------------------------------+
Successfully connected and configured with Cloud Director for App Launchpad.
start ALP Deployer service
Start ALP service
==> /opt/vmware/alp/deployer/log/deployer/default.log <==
{"level":"info","timestamp":"2022-03-18T09:54:24.446Z","caller":"cmd/deployer.go:68","msg":"Starting server","Config":{"ALP":{"System":{"Deployer":{"AuthToken":"***"}},"VCDEndpoint":{"URL":"https://vcd.vmwire.com","FingerprintsSHA256":"f4:e0:1b:7c:9c:d2:da:15:94:52:58:6f:80:02:2a:46:8f:ab:a5:91:d7:43:f6:8b:85:60:23:16:93:8b:2a:87"},"Deployer":{"Host":"127.0.0.1","Port":8087,"KubeRESTClient":{"QPS":256,"Burst":512,"Timeout":180000,"CertificateValidation":false},"ChartCacheSize":128}},"Logging":{"Stdout":false,"File":{"Path":"log/deployer/"},"Level":{"Com":{"VMware":{"ALP":"INFO"}}}}}}
{"level":"info","timestamp":"2022-03-18T09:54:24.550Z","caller":"server/manager.go:59","msg":"The manager is starting mux-router"}
[VMware ALP ASCII art banner]
:: Spring Boot Version : 2.4.13
:: VMware vCloud Director App LaunchPad Version : 2.1.1-19234432, Build Date: Thu Jan 20 02:18:37 GMT 2022
=================================================================================================================
What next?
It will take around two minutes until ALP is ready.
Open the VCD provider portal and click on the More menu to open up App Launchpad.
From here you can configure App Launchpad and enjoy using the app in a container running in a Kubernetes cluster.
Some other details
You’ll notice (if you deployed Kubernetes Dashboard), that the pod uses minimal resources after it has started and settled down to an idle state.
Using pretty much no CPU and around 300 MiB of memory. This is so much better than running this thing in a VM, right?
Note that I have used MQTT for the message bus between ALP and VCD. If you use RabbitMQ, you can in fact deploy multiple pods of ALP and enable Kubernetes to run ALP as a clustered service. MQTT does not support multiple instances of ALP.
Just change the replicaCount value from 1 to 2, and also edit the configMap to change from MQTT to RabbitMQ.
To finish off
I’ve found that moving my lab applications such as ALP and Container Service Extension to Kubernetes has freed up a lot of memory and CPU. This is the main use case for me as I run a lot of labs and demo environments. It is also just a lot easier to deploy these applications with Helm into Kubernetes than using virtual machines.
This is just one example of modernizing some of the VCPP applications to take advantage of the benefits of running in Kubernetes.
I hope this helps you too. Feel free to comment below if you find this useful. I am also working on improving my Container Service Extension Helm chart and will publish that when it is ready.
Feature gates are a set of key=value pairs that describe Kubernetes features. You can turn these features on or off using a ytt overlay file or by editing the KubeadmControlPlane or VSphereMachineTemplate. This post shows you how to enable a feature gate by adding the MixedProtocolLBService feature gate to the TKG kube-apiserver. The same approach can be used to enable other feature gates; I am using MixedProtocolLBService to test this at one of my customers.
Note that enabling feature gates on TKG clusters is unsupported.
The customer has a requirement to test mixed protocols in the same load balancer service (multiple ports and protocols on the same load balancer IP address). This feature is currently in alpha and getting a head start on alpha features is always a good thing to do to stay ahead.
For example to do this in a LoadBalancer service (with the MixedProtocolLBService feature gate enabled):
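A sketch of such a Service, exposing TCP and UDP 53 on the same load balancer IP; the names and selector are placeholders:
apiVersion: v1
kind: Service
metadata:
  name: dns-service
spec:
  type: LoadBalancer
  selector:
    app: dns
  ports:
  - name: dns-tcp
    protocol: TCP
    port: 53
  - name: dns-udp
    protocol: UDP
    port: 53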
Let’s assume that you want to enable this feature gate before deploying a new TKG cluster. I’ll show you how to enable this on an existing cluster further down the post.
Greenfield – before creating a new TKG cluster
Create a new overlay file named kube-apiserver-feature-gates.yaml. Place this file in your ~/.config/tanzu/tkg/providers/infrastructure-vsphere/ytt/ directory. For more information on ytt overlays, please read this link.
#! Please add any overlays specific to vSphere provider under this file.

#@ load("@ytt:overlay", "overlay")
#@ load("@ytt:data", "data")

#! Enable MixedProtocolLBService feature gate on kube api.
#@overlay/match by=overlay.subset({"kind":"KubeadmControlPlane"})
---
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          #@overlay/match missing_ok=True
          feature-gates: MixedProtocolLBService=true
Deploy the TKG cluster.
Inspect the kube-apiserver pod for feature gate
k get po -n kube-system kube-apiserver-tkg-test-control-plane-##### -o yaml
You should see on line 44 that the overlay has enabled the feature gate.
Inspect the KubeadmControlPlane; this is the control plane template for the first control plane node and all subsequent control plane nodes that are deployed. You can see on line 32 that the feature gate flag is enabled.
k get kubeadmcontrolplane tkg-test-control-plane -o yaml
Now if you created a service with mixed protocols, the kube-apiserver will accept the service and will tell the load balancer to deploy the service.
Brownfield – enable feature gates on an existing cluster
Enabling feature gates on an already deployed cluster is a little bit harder to do, as you need to be extra careful that you don’t break your current cluster.
Let's edit the KubeadmControlPlane template; you need to do this in the tkg-mgmt cluster context.
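A sketch of that flow; the context and object names below are assumptions that follow the naming used elsewhere in this post:
# Switch to the management cluster context
kubectl config use-context tkg-mgmt-admin@tkg-mgmt

# Edit the workload cluster's KubeadmControlPlane
kubectl edit kubeadmcontrolplane tkg-hugo-control-plane

# Add under spec.kubeadmConfigSpec.clusterConfiguration.apiServer.extraArgs:
#   feature-gates: MixedProtocolLBService=true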
You’ll see that TKG has immediately started to clone a new control plane VM. Wait for the new VM to replace the current one.
If you inspect the new control plane VM, you’ll see that it has the feature gate applied. You need to do this in the worker cluster context that you want the feature gate enabled on, in my case tkg-hugo.
Note that adding the feature gate to spec.kubeadmConfigSpec.clusterConfiguration.apiServer.extraArgs actually enables the feature gate on the kube-apiserver, which in TKG runs in a pod.
kubectl config use-context tkg-hugo-admin@tkg-hugo
k get po kube-apiserver-tkg-hugo-control-plane-#### -n kube-system -o yaml
Go to the line spec.containers.command.kubeapiserver. You’ll see something like the following:
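Roughly like this, with most of the flags omitted:
spec:
  containers:
  - command:
    - kube-apiserver
    - --feature-gates=MixedProtocolLBService=true
    # ...the rest of the kube-apiserver flags...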
In the previous post, I described how to install Harbor using Helm to utilize ChartMuseum for running Harbor as a Helm chart repository.
The Harbor registry that comes shipped with TKG 1.5.1 uses Tanzu Packages to deploy Harbor into a TKG cluster. This version of Harbor does not support Helm charts using ChartMuseum. VMware dropped support for ChartMuseum in TKG and is adopting OCI registries instead. This post describes how to deploy Harbor using Tanzu Packages (kapp) and use Harbor as an OCI registry that fully supports Helm charts. This is the preferred way to use chart and image registries.
The latest package versions as of TKG 1.5.1, February 2022.
Package – Version
cert-manager – 1.5.3+vmware.2-tkg.1
contour – 1.18.2+vmware.1-tkg.1
harbor – 2.3.3+vmware.1-tkg.1
Or run the following to see the latest available versions.
tanzu package available list harbor.tanzu.vmware.com -A
Pre-requisites
Before installing Harbor, you need to install Cert Manager and Contour. You can follow this other guide here to get started. This post uses Ingress, which requires NSX Advanced Load Balancer (Avi). The previous post will show you how to install these pre-requisites.
Deploy Harbor
Create a configuration file named harbor-data-values.yaml. This file configures the Harbor package. Follow the steps below to obtain a template file.
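One way to obtain the template, based on the standard TKG package workflow; the namespace, package name format and paths below are assumptions that may differ in your environment:
image_url=$(kubectl -n tanzu-package-repo-global get packages harbor.tanzu.vmware.com.2.3.3+vmware.1-tkg.1 -o jsonpath='{.spec.template.spec.fetch[0].imgpkgBundle.image}')
imgpkg pull -b $image_url -o /tmp/harbor-package
cp /tmp/harbor-package/config/values.yaml harbor-data-values.yaml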
Specify other settings in the harbor-data-values.yaml file.
Set the hostname setting to the hostname you want to use to access Harbor via ingress. For example, harbor.yourdomain.com.
To use your own certificates, update the tls.crt, tls.key, and ca.crt settings with the contents of your certificate, key, and CA certificate. The certificate can be signed by a trusted authority or be self-signed. If you leave these blank, Tanzu Kubernetes Grid automatically generates a self-signed certificate.
The format of the tls.crt and tls.key looks like this:
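The relevant block in harbor-data-values.yaml looks roughly like this; the key names follow the TKG Harbor package defaults, so check them against your template:
hostname: harbor.yourdomain.com
tlsCertificate:
  tls.crt: |
    -----BEGIN CERTIFICATE-----
    your certificate chain
    -----END CERTIFICATE-----
  tls.key: |
    -----BEGIN PRIVATE KEY-----
    your private key
    -----END PRIVATE KEY-----
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    your CA certificate
    -----END CERTIFICATE-----
With the values file ready, install the package; the --namespace value is a placeholder for wherever you keep your package installs:
tanzu package install harbor \
  --package-name harbor.tanzu.vmware.com \
  --version 2.3.3+vmware.1-tkg.1 \
  --values-file harbor-data-values.yaml \
  --namespace my-packages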
Obtain the address of the Envoy service load balancer.
kubectl get svc envoy -n tanzu-system-ingress -o jsonpath='{.status.loadBalancer.ingress[0]}'
Update your DNS record to point the hostname to the IP address above.
Update Harbor
To update the Harbor installation in any way, such as updating the TLS certificate, make your changes to the harbor-data-values.yaml file, then run the following to update Harbor.
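Something like this, with the namespace being wherever the package was installed:
tanzu package installed update harbor \
  --version 2.3.3+vmware.1-tkg.1 \
  --values-file harbor-data-values.yaml \
  --namespace my-packages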
The chart can now be seen in the Harbor UI in the view as where normal Docker images are.
OCI based Harbor
Notice that this is an OCI registry and not a Helm repository based on ChartMuseum; that's why you won't see the 'Helm Charts' tab next to the 'Repositories' tab.
ChartMuseum based Harbor
Deploy an application with Helm
Let's deploy the buildachart application; this is a simple nginx application that can use TLS so we have a secure site with HTTPS.
Create a new namespace and the TLS secret for the application. Copy the tls.crt and tls.key files in pem format to $HOME/certs/
# Create a new namespace for cherry
k create ns cherry
# Create a TLS secret with the contents of tls.key and tls.crt in the cherry namespace
kubectl create secret tls cherry-tls --key $HOME/certs/tls.key --cert $HOME/certs/tls.crt -n cherry
Deploy the app using Harbor as the Helm chart repository
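A sketch of that workflow with Helm 3.8+; the Harbor project (library) and chart version are assumptions:
# Log in to the OCI registry
helm registry login harbor.yourdomain.com

# Package and push the chart to Harbor as an OCI artifact
helm package buildachart
helm push buildachart-0.1.0.tgz oci://harbor.yourdomain.com/library

# Install the app into the cherry namespace straight from the OCI registry
helm install buildachart oci://harbor.yourdomain.com/library/buildachart --version 0.1.0 -n cherry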
Intro
The Harbor registry that comes shipped with TKG 1.5.1 uses Tanzu Packages to deploy Harbor into a TKG cluster. This version of Harbor does not support Helm charts using ChartMuseum. VMware dropped support for ChartMuseum in TKG and is adopting OCI registries instead. This post describes how to deploy the upstream Harbor distribution that supports ChartMuseum for a Helm repository. Follow this other post here to deploy Harbor with Tanzu Packages (kapp) with support for OCI.
The example below uses the following components:
TKG 1.5.1
AKO 1.6.1
Contour 1.18.2
Helm 3.8.0
Use the previous post to deploy the pre-requisites.
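A sketch of pulling the upstream Harbor chart, assuming the public goharbor chart repository; this gives you the local chart directory and values.yaml referred to in the next steps:
# Add the upstream Harbor chart repository and pull the chart locally
helm repo add harbor https://helm.goharbor.io
helm pull harbor/harbor --untar
cd harbor
# Edit values.yaml (storage class, admin password, expose section) before installing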
In values.yaml, set a storage class if you don't have a default storage class (leave blank to use your default storage class), and change the admin password, which sits around line 355. The expose section should look like this:
expose:
  # Set the way how to expose the service. Set the type as "ingress",
  # "clusterIP", "nodePort" or "loadBalancer" and fill the information
  # in the corresponding section
  type: ingress
  tls:
    # Enable the tls or not.
    # Delete the "ssl-redirect" annotations in "expose.ingress.annotations" when TLS is disabled and "expose.type" is "ingress"
    # Note: if the "expose.type" is "ingress" and the tls
    # is disabled, the port must be included in the command when pull/push
    # images. Refer to https://github.com/goharbor/harbor/issues/5291
    # for the detail.
    enabled: true
    # The source of the tls certificate. Set it as "auto", "secret"
    # or "none" and fill the information in the corresponding section
    # 1) auto: generate the tls certificate automatically
    # 2) secret: read the tls certificate from the specified secret.
    # The tls certificate can be generated manually or by cert manager
    # 3) none: configure no tls certificate for the ingress. If the default
    # tls certificate is configured in the ingress controller, choose this option
    certSource: secret
    auto:
      # The common name used to generate the certificate, it's necessary
      # when the type isn't "ingress"
      commonName: ""
    secret:
      # The name of secret which contains keys named:
      # "tls.crt" - the certificate
      # "tls.key" - the private key
      secretName: "harbor-cert"
      # The name of secret which contains keys named:
      # "tls.crt" - the certificate
      # "tls.key" - the private key
      # Only needed when the "expose.type" is "ingress".
      notarySecretName: "harbor-cert"
  ingress:
    hosts:
      core: harbor.vmwire.com
      notary: notary.harbor.vmwire.com
---snipped---
Step 3 – Create a TLS secret for ingress
Copy the tls.crt and tls.key files in pem format to $HOME/certs/
# Create a new namespace for harbor
k create ns harbor
# Create a TLS secret with the contents of tls.key and tls.crt in the harbor namespace
kubectl create secret tls harbor-cert --key $HOME/certs/tls.key --cert $HOME/certs/tls.crt -n harbor
Step 4 – Install Harbor
Ensure you’re in the directory that you ran Step 2 in.
helm install harbor . -n harbor
Monitor deployment with
kubectl get po -n harbor
Log in
Use admin and the password you set on line 355 of the values.yaml file. The default password is Harbor12345.
For an overview of Kapp, please see this link here.
The latest versions as of TKG 1.5.1, February 2022.
Package        Version
cert-manager   1.5.3+vmware.2-tkg.1
contour        1.18.2+vmware.1-tkg.1
prometheus     2.27.0+vmware.2-tkg.1
grafana        7.5.7+vmware.2-tkg.1
Or run the following to see the latest available versions.
tanzu package available list cert-manager.tanzu.vmware.com -A
tanzu package available list contour.tanzu.vmware.com -A
tanzu package available list prometheus.tanzu.vmware.com -A
tanzu package available list grafana.tanzu.vmware.com -A
I’m using ingress with Contour which needs a load balancer to expose the ingress services. Install AKO and NSX Advanced Load Balancer (Avi) by following this previous post.
Install Contour
Create a file named contour-data-values.yaml; this example uses NSX Advanced Load Balancer (Avi) to provide the LoadBalancer service for Envoy.
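The data values file was embedded in the original post; here is a minimal sketch of what it might contain, assuming Avi fronts the Envoy LoadBalancer service. Check the package's default values for the full schema.
infrastructure_provider: vsphere
namespace: tanzu-system-ingress
contour:
  replicas: 2
envoy:
  service:
    type: LoadBalancer
Then install the package with the Tanzu CLI, for example (the my-packages namespace is just an example, use whichever namespace you install packages into):
tanzu package install contour \
  --package-name contour.tanzu.vmware.com \
  --version 1.18.2+vmware.1-tkg.1 \
  --values-file contour-data-values.yaml \
  --namespace my-packages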
Generate a Base64 password and edit the grafana-data-values.yaml file to update the default admin password.
echo -n 'Vmware1!' | base64
Also update the TLS configuration to use signed certificates for ingress. It will look something like this.
secret:
  type: "Opaque"
  admin_user: "YWRtaW4="
  admin_password: "Vm13YXJlMSE="
ingress:
  enabled: true
  virtual_host_fqdn: "grafana-tkg-mgmt.vmwire.com"
  prefix: "/"
  servicePort: 80
  #! [Optional] The certificate for the ingress if you want to use your own TLS certificate.
  #! We will issue the certificate by cert-manager when it's empty.
  tlsCertificate:
    #! [Required] the certificate
    tls.crt: |
      -----BEGIN CERTIFICATE-----
      ---snipped---
      -----END CERTIFICATE-----
    #! [Required] the private key
    tls.key: |
      -----BEGIN PRIVATE KEY-----
      ---snipped---
      -----END PRIVATE KEY-----
Since I’m using ingress to expose the Grafana service, also change line 33 from LoadBalancer to ClusterIP. This prevents Kapp from creating an unnecessary LoadBalancer service that would consume an IP address.
With the ingress FQDN set to grafana-tkg-mgmt.vmwire.com and TLS enabled, I can now access the Grafana UI at https://grafana-tkg-mgmt.vmwire.com over a secure connection.
Listing all installed packages
tanzu package installed list -A
Making changes to Contour, Prometheus or Grafana
If you need to make changes to any of the configuration files, you can then update the deployment with the tanzu package installed update command.
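For example, after editing grafana-data-values.yaml, an update might look like this; the my-packages namespace is an assumption, use the namespace the package was installed into.
tanzu package installed update grafana \
  --version 7.5.7+vmware.2-tkg.1 \
  --values-file grafana-data-values.yaml \
  --namespace my-packages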
This post shows how to run a replicated stateful application on local storage using a StatefulSet controller. This application is a replicated MySQL database. The example topology has a single primary server and multiple replicas, using asynchronous row-based replication. The MySQL data is using a storage class backed by local SSD storage provided by the vSphere CSI driver performing the dynamic PersistentVolume provisioner.
This post continues from the previous post where I described how to setup multi-AZ topology aware volume provisioning with local storage.
I used this example here to setup a StatefulSet with MySQL to get an example application up and running.
However, I did not use the default storage class, but added one line to the mysql-statefulset.yaml file to use the storage class that is backed by local SSDs instead.
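For reference, a hedged sketch of that change in the volumeClaimTemplates section; the claim name and size follow the kubernetes.io example, and only the storageClassName line is the addition.
volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: ["ReadWriteOnce"]
    storageClassName: k8s-local-ssd   # the added line, pointing at the local SSD backed storage class
    resources:
      requests:
        storage: 10Gi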
I also appended the StatefulSet to include the spec.template.spec.affinity and spec.template.spec.podAntiAffinity settings to make use of the three AZs for pod scheduling.
spec:
  selector:
    matchLabels:
      app: mysql
  serviceName: mysql
  replicas: 3
  template:
    metadata:
      labels:
        app: mysql
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.csi.vmware.com/k8s-zone
                operator: In
                values:
                - az-1
                - az-2
                - az-3
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - mysql
            topologyKey: topology.csi.vmware.com/k8s-zone
Everything else stayed the same. Please spend some time reading the example from kubernetes.io as I will be performing the same steps but using local storage instead to test the behavior of MySQL replication.
Architecture
I am using the same setup, with three replicas in the StatefulSet to match with the three AZs that I have setup in my lab.
My AZ layout is the following.
AZ     ESX host       TKG worker
az-1   esx1.vcd.lab   tkg-hugo-md-0-7d455b7488-g28bl
az-2   esx2.vcd.lab   tkg-hugo-md-1-7bbd55cdb8-996x2
az-3   esx3.vcd.lab   tkg-hugo-md-2-6c6c49dc67-xbpg7
We can see which pod runs on which worker using the following command:
k get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mysql-0 2/2 Running 0 3h24m 100.120.135.67 tkg-hugo-md-1-7bbd55cdb8-996x2 <none> <none>
mysql-1 2/2 Running 0 3h22m 100.127.29.3 tkg-hugo-md-0-7d455b7488-g28bl <none> <none>
mysql-2 2/2 Running 0 113m 100.109.206.65 tkg-hugo-md-2-6c6c49dc67-xbpg7 <none> <none>
To see which PVCs are using which AZs using the CSI driver’s node affinity we can use this command.
kubectl get pv -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.claimRef.name}{"\t"}{.spec.nodeAffinity}{"\n"}{end}'
The server_ids are 100, 101 and 102, corresponding to mysql-0, mysql-1 and mysql-2 respectively. We can read data from all three pods, which means our MySQL service is running well across all three AZs.
Simulating Pod and Node downtime
To demonstrate the increased availability of reading from the pool of replicas instead of a single server, keep the SELECT @@server_id loop from above running while you force a Pod out of the Ready state.
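The loop itself is the one from the kubernetes.io tutorial; it looks something like this.
kubectl run mysql-client-loop --image=mysql:5.7 -i -t --rm --restart=Never -- \
  bash -ic "while sleep 1; do mysql -h mysql-read -e 'SELECT @@server_id,NOW()'; done"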
Delete Pods
The StatefulSet also recreates Pods if they’re deleted, similar to what a ReplicaSet does for stateless Pods.
kubectl delete pod mysql-2
The StatefulSet controller notices that no mysql-2 Pod exists anymore, and creates a new one with the same name and linked to the same PersistentVolumeClaim. You should see server ID 102 disappear from the loop output for a while and then return on its own.
Drain a Node
If your Kubernetes cluster has multiple Nodes, you can simulate Node downtime (such as when Nodes are upgraded) by issuing a drain.
We already know that mysql-2 is running on worker tkg-hugo-md-2. Then drain the Node by running the following command, which cordons it so no new Pods may schedule there, and then evicts any existing Pods.
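The drain command was not shown above; using the worker name from the table earlier and the flags from the kubernetes.io example, it would look something like this.
kubectl drain tkg-hugo-md-2-6c6c49dc67-xbpg7 --force --delete-emptydir-data --ignore-daemonsets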
What happens now is that the pod mysql-2 is evicted and its PVC is detached. Because we only have one worker per AZ, mysql-2 cannot be scheduled on another node in another AZ.
The mysql-client-loop pod will show that 102 (mysql-2) is no longer serving MySQL requests. The pod mysql-2 will stay in Pending status until a worker is available in az-2 again.
Perform maintenance on ESX
After draining the worker node, we can now go ahead and perform maintenance operations on the ESX host by placing it into maintenance mode. Doing so will vMotion any VMs that are not using shared storage. However, because the worker node is still powered on and has locally attached VMDKs, it will prevent the ESX host from entering maintenance mode.
We know that the worker node is already drained and the MySQL application has two other replicas that are running in two other AZs, so we can safely power off this worker and enable the ESX host to complete going into maintenance mode. Yes, power off instead of gracefully shutting down. Kubernetes worker nodes are cattle and not pets and Kubernetes will destroy it anyway.
Operations with local storage
Consider the following when using local storage with Tanzu Kubernetes Grid.
TKG worker nodes that have been tagged with a k8s-zone and have attached PVs cannot be vMotioned.
TKG worker nodes that have been tagged with a k8s-zone and do not have attached PVs also cannot be vMotioned, as they have the affinity rule set to “Must run on hosts in group”.
Placing a ESX host into maintenance mode will not complete until the TKG worker node running on that host has been powered off.
However, do not be alarmed by any of this, as this is normal behavior. Kubernetes workers can be replaced very often and since we have a stateful application with more than one replica, we can do this with no consequences.
The following section shows why this is the case.
How do TKG clusters with local storage handle ESX maintenance?
To perform maintenance on an ESX host that requires a host reboot perform the following.
Drain the TKG worker node of the host that you want to place into maintenance mode
What this does is evict all pods except DaemonSets; it also evicts the MySQL pod running on this node, including removing the volume mount. In our example, we still have the other two MySQL pods running on two other worker nodes.
Now place the ESX host into maintenance mode.
Power off the TKG worker node on this ESX host to allow the host to go into maintenance mode.
You might notice that TKG will try to delete that worker node and clone a new worker node on this host, but it cannot due to the host being in maintenance mode. This is normal behavior as any Kubernetes clusters will try to replace a worker that is no longer accessible. This of course is the case as we have powered ours off.
You will notice that Kubernetes does not try to create a worker node on any other ESX host. This is because the powered-off worker is labelled with one of the AZs therefore Kubernetes tries to place a new worker in the same AZ.
Perform ESX maintenance as normal and when complete exit the host from maintenance mode.
When the host exits maintenance mode, you’ll notice that Kubernetes can now delete the powered-off worker and replace it with a new one.
When the new worker node powers on and becomes ready, you will notice that the previous PV that was attached to the now deleted worker node is now attached to the new worker node.
The MySQL pod will then claim the PV and the pod will start and come out of pending status into ready status.
All three MySQL pods are now up and running and we have a healthy MySQL cluster again. Any MySQL data that was changed during this maintenance window will be replicated to the MySQL pod.
Summary
Using local storage backed storage classes with TKG is a viable alternative to shared storage when your applications can perform data protection and replication at a higher level. Databases such as the MySQL example I used can benefit from cheaper, locally attached, fast solid state media such as SSD or NVMe without the need to build hyperconverged storage environments. Applications that can replicate data at the application level can avoid SAN and NAS completely, benefiting from simpler infrastructure, lower costs, faster storage and lower latencies.
With the vSphere CSI driver version 2.4.1, it is now possible to use local storage with TKG clusters. This is enabled by TKG’s Topology Aware Volume Provisioning capability.
Using local storage has distinct advantages over shared storage, especially when it comes to supporting faster and cheaper storage media for applications that do not benefit from or require the added complexity of having their data replicated by the storage layer. Examples of applications that do not require storage protection (RAID or failures to tolerate) are applications that can achieve data protection at the application level.
With this model, it is possible to present individual SSDs or NVMe drives attached to an ESXi host and configure a local datastore for use with topology aware volume provisioning. Kubernetes can then create persistent volumes and schedule pods that are deployed onto the worker nodes that are on the same ESXi host as the volume. This enables Kubernetes pods to have direct local access to the underlying storage.
Figure 1.
To setup such an environment, it is necessary to go over some of the requirements first.
Deploy Tanzu Kubernetes Clusters to Multiple Availability Zones on vSphere – link
Spread Nodes Across Multiple Hosts in a Single Compute Cluster
Configure Tanzu Kubernetes Plans and Clusters with an overlay that is topology-aware – link
Deploy TKG clusters into a multi-AZ topology
Deploy the k8s-local-ssd storage class
Deploy Workloads with WaitForFirstConsumer Mode in Topology-Aware Environment – link
Before you start
Note that only the CSI driver for vSphere version 2.4.1 supports local storage topology in a multi-AZ topology. To check if you have the correct version in your TKG cluster, run the following.
tanzu package installed get vsphere-csi -n tkg-system
- Retrieving installation details for vsphere-csi... I0224 19:20:29.397702 317993 request.go:665] Waited for 1.03368201s due to client-side throttling, not priority and fairness, request: GET:https://172.16.3.94:6443/apis/secretgen.k14s.io/v1alpha1?timeout=32s
\ Retrieving installation details for vsphere-csi...
NAME: vsphere-csi
PACKAGE-NAME: vsphere-csi.tanzu.vmware.com
PACKAGE-VERSION: 2.4.1+vmware.1-tkg.1
STATUS: Reconcile succeeded
CONDITIONS: [{ReconcileSucceeded True }]
Deploy Tanzu Kubernetes Clusters to Multiple Availability Zones on vSphere
In my example, I am using the Spread Nodes Across Multiple Hosts in a Single Compute Cluster example, each ESXi host is an availability zone (AZ) and the vSphere cluster is the Region.
Figure 1. shows a TKG cluster with three worker nodes, each node is running on a separate ESXi host. Each ESXi host has a local SSD drive formatted with VMFS 6. The topology aware volume provisioner would always place pods and their replicas on separate worker nodes and also any persistent volume claims (PVC) on separate ESXi hosts.
*Note that “cluster” is the name of my vSphere cluster.
Ensure that you’ve set up the correct rules that pin worker nodes to their respective ESXi hosts. Always use “Must run on hosts in group”; this is very important for local storage topology to work, because the worker nodes are labelled for topology awareness, and if a worker node is accidentally vMotioned the CSI driver will not be able to bind the PVC to the worker node.
Below is my vsphere-zones.yaml file.
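The file itself was embedded in the original post. As a rough, hedged sketch of the shape of such a file for one AZ: the cluster name (cluster), tag categories (k8s-region, k8s-zone), autoConfigure and the vm/host group naming come from this post, while the apiVersion, datacenter, vCenter server, resource pool and folder values are assumptions, so verify the exact schema against the TKG documentation linked above. The az-2 and az-3 objects follow the same pattern.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereFailureDomain
metadata:
  name: az-1
spec:
  region:
    name: cluster
    type: ComputeCluster
    tagCategory: k8s-region
    autoConfigure: true
  zone:
    name: az-1
    type: HostGroup
    tagCategory: k8s-zone
    autoConfigure: true
  topology:
    datacenter: home-dc            # assumption: your datacenter name
    computeCluster: cluster
    hosts:
      vmGroupName: vm-group-1
      hostGroupName: host-group-1
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
  name: az-1
spec:
  server: vcenter.vmwire.com       # assumption: your vCenter FQDN
  failureDomain: az-1
  placementConstraint:
    resourcePool: tkg              # assumption
    folder: tkg                    # assumption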
Note that autoConfigure is set to true, which means that you do not have to tag the cluster or the ESX hosts yourself; you only need to set up the affinity rules under Cluster, Configure, VM/Host Groups and VM/Host Rules. With autoConfigure: true, CAPV automatically configures the tags and tag categories for you.
Note that Kubernetes does not accept non-standard parameter names. For your vmGroupName and hostGroupName parameters, use lowercase and dashes instead of periods, for example host-group-3 instead of Host.Group.3; the latter will be rejected.
Configure Tanzu Kubernetes Plans and Clusters with an overlay that is topology-aware
To ensure that this topology can be built by TKG, we first need to create a TKG cluster plan overlay that tells Tanzu what to do when creating worker nodes in a multi-availability zone topology.
Let’s take a look at my az-overlay.yaml file.
Since I have three AZs, I need to create an overlay file that includes the cluster plan for all three AZs.
To deploy a TKG cluster that spreads its worker nodes over multiple AZs, we need to add some key value pairs into the cluster config file.
Below is an example for my cluster config file – tkg-hugo.yaml.
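The full config was embedded in the original post; here is a sketch of just the relevant additions, using the values from the parameter list below.
VSPHERE_REGION: k8s-region
VSPHERE_ZONE: k8s-zone
VSPHERE_AZ_0: az-1
VSPHERE_AZ_1: az-2
VSPHERE_AZ_2: az-3
WORKER_MACHINE_COUNT: 3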
The new key value pairs are described below.
VSPHERE_REGION: k8s-region – must match the configuration in the vsphere-zones.yaml file.
VSPHERE_ZONE: k8s-zone – must match the configuration in the vsphere-zones.yaml file.
VSPHERE_AZ_0, VSPHERE_AZ_1, VSPHERE_AZ_2: az-1, az-2, az-3 – must match the configuration in the vsphere-zones.yaml file.
WORKER_MACHINE_COUNT: 3 – the total number of worker nodes for the cluster, distributed in a round-robin fashion across the AZs specified.
A note on WORKER_MACHINE_COUNT when using CLUSTER_PLAN: dev instead of prod.
If you change az-overlay.yaml from @ if data.values.CLUSTER_PLAN == "prod" to @ if data.values.CLUSTER_PLAN == "dev", then WORKER_MACHINE_COUNT becomes the number of workers per AZ rather than the total. So if you set this number to 3 in a three-AZ topology, you would end up with a TKG cluster with nine workers.
Note that parameters.storagePolicyName is set to k8s-local-ssd, which is the name of the storage policy for the local storage. All three of the local VMFS datastores that are backed by the local SSD drives are members of this storage policy.
Note that the volumeBindingMode is set to WaitForFirstConsumer.
Instead of creating a volume immediately, the WaitForFirstConsumer setting instructs the volume provisioner to wait until a pod using the associated PVC runs through scheduling. In contrast with the Immediate volume binding mode, when the WaitForFirstConsumer setting is used, the Kubernetes scheduler drives the decision of which failure domain to use for volume provisioning using the pod policies.
This guarantees that the pod and its volume are always in the same AZ (ESXi host).
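A sketch of what the k8s-local-ssd storage class might look like, based on the parameters described above; the provisioner is the vSphere CSI driver, but treat the exact manifest as an assumption and compare it with your own.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: k8s-local-ssd
provisioner: csi.vsphere.vmware.com
parameters:
  storagePolicyName: k8s-local-ssd
volumeBindingMode: WaitForFirstConsumer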
Deploy a workload that uses Topology Aware Volume Provisioning
Below is a statefulset that deploys three pods running nginx. It configures two persistent volumes, one for www and another for log. Both of these volumes are provisioned onto the same ESXi host where the pod is running. The statefulset also runs an initContainer that downloads a simple html file from my repo and copies it to the www mount point (/usr/share/nginx/html).
You can see under spec.affinity.nodeAffinity how the statefulset uses the topology.
The statefulset then exposes the nginx app using a service called nginx-service, which uses the Gateway API that I wrote about in a previous blog post.
What if you wanted to use more than three availability zones?
Some notes here on what I experienced during my testing.
The TKG cluster config has three lines (VSPHERE_AZ_0, VSPHERE_AZ_1 and VSPHERE_AZ_2) that specify the names of the AZs to use; these are passed to the Tanzu CLI, which deploys the TKG cluster using the ytt overlay file. However, the Tanzu CLI only supports a total of three AZs.
If you want to use more than three AZs, remove these three lines from the TKG cluster config and change the ytt overlay so that it does not use the VSPHERE_AZ_# variables but hard codes the AZs into the overlay file instead.
To do this replace the following:
#@ if data.values.VSPHERE_AZ_2:
failureDomain: #@ data.values.VSPHERE_AZ_2
#@ end
with the following:
failureDomain: az-2
and create an additional block of MachineDeployment and KubeadmConfigTemplate for each additional AZ that you need.
Summary
Below are screenshots and the resulting deployed objects after running kubectl apply -f to the above.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
tkg-hugo-md-0-7d455b7488-d6jrl Ready <none> 3h23m v1.22.5+vmware.1
tkg-hugo-md-1-bc76659f7-cntn4 Ready <none> 3h23m v1.22.5+vmware.1
tkg-hugo-md-2-6bb75968c4-mnrk5 Ready <none> 3h23m v1.22.5+vmware.1
You can see that the worker nodes are distributed across the ESXi hosts as per our vsphere-zones.yaml and also our az-overlay.yaml files.
kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
web-0 1/1 Running 0 3h14m 100.124.232.195 tkg-hugo-md-2-6bb75968c4-mnrk5 <none> <none>
web-1 1/1 Running 0 3h13m 100.122.148.67 tkg-hugo-md-1-bc76659f7-cntn4 <none> <none>
web-2 1/1 Running 0 3h12m 100.108.145.68 tkg-hugo-md-0-7d455b7488-d6jrl <none> <none>
You can see that each pod is placed on a separate worker node.
kubectl get csinodes -o jsonpath='{range .items[*]}{.metadata.name} {.spec}{"\n"}{end}'
In a previous post I went through how to deploy the Kubernetes Dashboard into a Kubernetes cluster with default settings, running with a self-signed certificate. This post covers how to update the configuration to use a signed certificate. I’m a fan of Let’s Encrypt so will be using a signed wildcard certificate from Let’s Encrypt for this post.
You can prepare Let’s Encrypt by referring to a previous post here.
Step 1. Create a new namespace
Create a new namespace for Kubernetes Dashboard
kubectl create ns kubernetes-dashboard
Step 2. Upload certificates
Upload your certificate and private key to $HOME/certs in pem format. Let’s Encrypt just happens to issue certificates in pem format with the following names:
cert.pem and privkey.pem
All we need to do is to rename these to:
tls.crt and tls.key
And then upload them to $HOME/certs where our kubectl tool is installed.
Step 3. Create secret
Create the secret for the custom certificate by running this command.
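The command itself is not shown here; a hedged example following the Kubernetes Dashboard convention is below. The secret name kubernetes-dashboard-certs is what the standard Dashboard manifest references, but treat it as an assumption for your deployment.
kubectl create secret generic kubernetes-dashboard-certs --from-file=$HOME/certs -n kubernetes-dashboard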
Save changes to the file. Now we’re ready to deploy.
If you want to use the Avi Services API (K8s Gateway API), add labels to the service like this; this ensures that the service uses the Avi gateway.
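A hedged example for the Kubernetes Dashboard service, assuming a Gateway named gateway-dashboard in the default namespace; both of those names are placeholders for whatever Gateway you created.
apiVersion: v1
kind: Service
metadata:
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
  labels:
    ako.vmware.com/gateway-namespace: default        # placeholder: namespace of your Gateway
    ako.vmware.com/gateway-name: gateway-dashboard   # placeholder: name of your Gateway
spec:
  type: ClusterIP
  ports:
  - port: 443
    targetPort: 8443
  selector:
    k8s-app: kubernetes-dashboard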
This post describes how to setup Harbor to run on a standalone VM. There are times when you want to do this, such as occasions where your environment does not have internet access or you want to have a local repository running close to your environment.
I found that I was running a lot of TKG deployments against TKG staging builds in my lab and wanted to speed up cluster creation times, so building a local Harbor repository would make things a bit quicker and more reliable.
This post describes how you can setup a Harbor repository on a Photon VM.
# Configuration file of Harbor
# The IP address or hostname to access admin UI and registry service.
# DO NOT use localhost or 127.0.0.1, because Harbor needs to be accessed by external clients.
hostname: harbor.vmwire.com
# http related config
http:
  # port for http, default is 80. If https enabled, this port will redirect to https port
  port: 80
# https related config
https:
  # https port for harbor, default is 443
  port: 443
  # The path of cert and key files for nginx
  certificate: /etc/docker/certs.d/harbor.vmwire.com/harbor.cert
  private_key: /etc/docker/certs.d/harbor.vmwire.com/harbor_key.key
[snipped]
Update line 5 with your harbor instance’s FQDN.
Update lines 17 and 18 with the certificate and private key.
You can leave all the other lines on default.
Install Harbor with the following command:
./install.sh
Check to see if services are running
docker-compose ps
Step 9: Add harbor FQDN to your DNS servers and connect to Harbor.
To upgrade, download the new offline installer and run
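A hedged sketch of a typical upgrade sequence; check the Harbor upgrade documentation for your specific versions, as some upgrades also require a data migration step.
# stop and remove the running Harbor containers (persistent data is kept on the host)
docker-compose down -v
# extract the new offline installer, copy your existing harbor.yml into it, then re-run the installer
./install.sh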
Introduction
Avi (NSX Advanced Load Balancer) supports Kubernetes Gateway API. This post shows how to install and use the Gateway API to expose applications using this custom resource definition (CRD).
Gateway API is an open source project managed by the SIG-NETWORK community. It is a collection of resources that model service networking in Kubernetes. These resources – GatewayClass, Gateway, HTTPRoute, TCPRoute, Service, etc. – aim to evolve Kubernetes service networking through expressive, extensible, and role-oriented interfaces that are implemented by many vendors and have broad industry support.
For a quick introduction to the Kubernetes Gateway API, read this link and this link from the Avi documentation.
Why use Gateway API?
You would want to use the Gateway API if you had the following requirements:
Network segmentation – exposing applications from the same Kubernetes cluster to different network segments
Shared IP – exposing multiple services that use both TCP and UDP ports on the same IP address
NSX Advanced Load Balancer supports both of these requirements through the use of the Gateway API. The following section describes how this is implemented.
The Gateway API introduces a few new resource types:
GatewayClasses are cluster-scoped resources that act as templates to explicitly define behavior for Gateways derived from them. This is similar in concept to StorageClasses, but for networking data-planes.
Gateways are the deployed instances of GatewayClasses. They are the logical representation of the data-plane which performs routing, which may be in-cluster proxies, hardware LBs, or cloud LBs.
AVI Infra Setting
Aviinfrasetting provides a way to segregate Layer-4/Layer-7 virtual services to have properties based on different underlying infrastructure components, like Service Engine Group, intended VIP Network etc.
Avi Infra Setting is a cluster scoped CRD and can be attached to the intended Services. Avi Infra setting resources can be attached to Services using Gateway APIs.
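A hedged example of an AviInfraSetting, based on the AKO documentation; the Service Engine Group name, network name and CIDR are assumptions for your environment.
apiVersion: ako.vmware.com/v1alpha1
kind: AviInfraSetting
metadata:
  name: aviinfrasetting-dmz
spec:
  seGroup:
    name: Default-Group            # assumption: your Avi Service Engine Group
  network:
    vipNetworks:
      - networkName: dmz-network   # assumption: the VIP network for this segment
        cidr: 10.149.1.0/24        # assumption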
GatewayClass
Gateway APIs provide interfaces to structure Kubernetes service networking.
AKO supports Gateway APIs via the servicesAPI flag in the values.yaml.
The Avi Infra Setting resource can be attached to a Gateway Class object, via the .spec.parametersRef as shown below:
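A hedged example, referencing the AviInfraSetting sketched above; the apiVersion and controller string are as I recall them from the AKO documentation, so verify them against your AKO version.
apiVersion: networking.x-k8s.io/v1alpha1
kind: GatewayClass
metadata:
  name: avi-gateway-class
spec:
  controller: ako.vmware.com/avi-lb
  parametersRef:
    group: ako.vmware.com
    kind: AviInfraSetting
    name: aviinfrasetting-dmz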
The Gateway object provides a way to configure multiple Services as backends to the Gateway using label matching. The labels are specified as constant key-value pairs, the keys being ako.vmware.com/gateway-namespace and ako.vmware.com/gateway-name. The values corresponding to these keys must match the Gateway namespace and name respectively for AKO to consider the Gateway valid. If either of the label keys is not provided as part of matchLabels, or the namespace/name provided in the label values does not match the actual Gateway namespace/name, AKO will consider the Gateway invalid.
A Gateway uses a GatewayClass, which in turn uses an AviInfraSetting. Therefore, when a Gateway is used by a Service with the relevant labels, that service is exposed on the network referenced by the AviInfraSetting via .spec.network.vipNetworks.
In your Helm charts, for any service that would normally need a LoadBalancer, use ClusterIP instead and add labels such as the following:
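A minimal fragment of such a Service; the Gateway name and namespace are placeholders.
metadata:
  labels:
    ako.vmware.com/gateway-name: gateway-dmz         # placeholder: your Gateway name
    ako.vmware.com/gateway-namespace: default        # placeholder: your Gateway namespace
spec:
  type: ClusterIP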
The labels and the ClusterIP type tell the AKO operator to use the Gateways, and each Gateway sits on a separate network segment for traffic separation: the Gateway references its GatewayClass via spec.gatewayClassName, and the GatewayClass references the AviInfraSetting via spec.parametersRef.name.
This post describes how to change TKGm control plane nodes resources, such as vCPU and RAM. In the previous post, I described how to increase resources for a worker node. This process was quite simple and straightforward and initially I had a tough time finding the right resource to edit as the control plane nodes use a different resource to provision the virtual machines.
Step 1. Change to the TKG management cluster context
kubectl config use-context tkg-mgmt
Step 2. List VSphereMachineTemplate
kubectl get VSphereMachineTemplate
Step 4. Make a copy of the current control plane VsphereMachineTemplate to a new file
kubectl get vspheremachinetemplates tkg-ssc-control-plane -o yaml > tkg-ssc-control-plane-new.yaml
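The remaining steps were shown as gists in the original post; here is a hedged sketch of what they involve, with example sizing values only.
# edit tkg-ssc-control-plane-new.yaml:
#   - set metadata.name to tkg-ssc-control-plane-new
#   - set the new sizing, e.g. spec.template.spec.numCPUs: 4 and spec.template.spec.memoryMiB: 8192
#   - remove metadata.resourceVersion, metadata.uid and metadata.creationTimestamp
kubectl apply -f tkg-ssc-control-plane-new.yaml
# then point the control plane at the new template
kubectl edit kubeadmcontrolplane tkg-ssc-control-plane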
Change line 32 to use the new VSphereMachineTemplate called tkg-ssc-control-plane-new. Once you save and quit with :wq!, the control plane nodes will be redeployed.
Envoy is configured to run as a non-root user by default. This is much more secure but we won’t be able to use any ports that are lower than 1024. Therefore we must change the values.yaml file for contour.
Edit the values.yaml file located in the directory that you untar the tkz file into and search for
envoy.containerPorts.http
Change the http port to 8080 and the https port to 8443.
It should end up looking like this:
containerPorts:
  http: 8080
  https: 8443
Step 4. Installing Contour (and Envoy)
Install Contour by running the following command
helm install ingress <path-to-contour-directory>
You should get one daemonset named ingress-contour-envoy and a deployment named ingress-contour-contour. These spin up two pods.
You will also see two services starting, one called ingress-contour with a service type of ClusterIP and another called ingress-contour-envoy with a service type LoadBalancer. Wait for NSX ALB to assign an external IP for the envoy service from your Organization network IP pool.
This IP is now your Kubernetes cluster IP for ingress services. Make a note of this IP address. My example uses 10.149.1.116 as the external IP.
Step 5. Setup DNS
The next step is to set up DNS. I’m using Windows DNS in my lab, so I’ve set up a subdomain called apps.vmwire.com and a wildcard A record for *.apps.vmwire.com pointing to the Envoy external IP.
*.apps.vmwire.com 10.149.1.116
DNS is now set up to point *.apps.vmwire.com to the external IP assigned to Envoy. From this point forward, any DNS request for *.apps.vmwire.com resolves to that IP and is handled by Contour.
There are two yaml files: one deploys a sample web application and the other exposes the application using Contour and Envoy.
You don’t have to edit the shapes.yaml file, but you will need to edit the shapes-ingress.yaml file and change lines 9 and 16 to your desired DNS settings.
In this example, Contour will use circles.apps.vmwire.com to expose the circles application and triangles.apps.vmwire.com to expose the triangles application. Note that we are not adding circles. or triangles. A records into the DNS server.
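The actual shapes-ingress.yaml was embedded in the original post; below is a generic sketch of one of its two ingress rules. The service name and port are assumptions, and the triangles rule follows the same pattern.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: circles
spec:
  rules:
  - host: circles.apps.vmwire.com   # change this to your desired DNS name
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: circles           # assumption: the service created by shapes.yaml
            port:
              number: 80            # assumption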
Let’s deploy the circles and triangles apps.
kubectl apply -f shapes.yaml
And then expose the applications with Contour
kubectl apply -f shapes-ingress.yaml
Now open up a web browser and navigate to http://circles.<your-domain> or http://triangles.<your-domain> to see the apps being exposed by Contour. If you don’t get a connection, it’s probably because you haven’t enabled port 80 through your Edge Gateway.