Enable external access to Kubernetes clusters running in VMware Cloud Director with CSE expose

This post details how you can enable Kubernetes clusters provisioned by the Container Service Extension to be accessible from outside of the cloud provider networks.

Providing great user experience to Kubernetes as a service from a cloud provider is important and as such enabling users to use their tools running on their personal devices to connect to remotely hosted Kubernetes clusters running in the cloud is a key feature of any cloud service.

A brief review of VCD networking

VMware Cloud Director provides network isolation between tenants by leveraging Geneve based networking provided by NSX-T. In simple terms, a tenant can utilize any network subnet without worrying about clashing with any other tenant using the same VCD cloud.

That means a tenant with a private address space can deploy a Kubernetes cluster and use internal addresses for the Control Plane and Worker nodes. A user can then access the Control Plane endpoint from inside the tenant's VDC, for example from a jumpbox, and run kubectl commands happily. However, doing this from outside of the organization virtual datacenter will not work, even if you set up a DNAT rule on the Edge gateway that maps an external IP to the internal IP of the Control Plane endpoint.

It doesn't work because of the X.509 certificate that kubeadm generates when it creates the Kubernetes cluster. The certificate needs to include all subject alternative names (SANs) at that point, and with CSE there is no way for the operator to define additional SANs during cluster provisioning.

If you attempt to connect using the external IP of the DNAT rule, you may get an error like the below:

kubectl get nodes -A --kubeconfig=tkg-vcd.yaml

Unable to connect to the server: x509: certificate is valid for 10.96.0.1, 192.168.0.100, not 10.149.1.101

For context, 192.168.0.100 is the internal IP of the Control Plane node. 10.149.1.101 is the external IP in the external IP pool allocated to this tenant’s Edge gateway. See the high-level architecture diagram.
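
If you want to check for yourself which SANs ended up in the API server certificate, you can inspect it with openssl from a jumpbox that can reach the Control Plane endpoint (a quick check rather than part of the CSE workflow; swap in your own Control Plane IP):

echo | openssl s_client -connect 192.168.0.100:6443 2>/dev/null | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"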

How can we enable a better user experience to access a Kubernetes cluster running in a provider’s cloud?

Container Service Extension has a feature called ‘expose’ that can be used during Kubernetes cluster provisioning to enable the DNAT changes to the Edge gateway as well as including the external IP into the x.509 certificate SANs. This is done automatically and at the current CSE 3.0.4 version only through the vcd cse cli. Please see my previous post to learn more.

What is supported with CSE 3.0.4?

Expose works under the following conditions:

  • cluster deployment via vcd cse cli only, no UI
  • new Kubernetes cluster deployments only
  • you can deploy a cluster without expose initially but you cannot expose it later
  • you can deploy a cluster with expose and then un-expose it later, however you cannot re-expose it again
  • you are using NSX-T for VCD networking
  • the tenant has an Edge gateway defined for their VDCs
  • you have an external IP pool assigned to the Edge gateway
  • expose works with both TKGm and native k8s runtimes

High Level Architecture

Deploying a Kubernetes cluster using expose

To enable this feature, create a cluster config file on a machine with the vcd cse cli installed. Below is an example of my config.yaml file. Notice the kind line: use either TKGm for a TKGm runtime or native for a native runtime, and change the template_name to suit the runtime.

The expose: true line under the spec section enables this feature.

api_version: '35.0'
kind: TKGm
metadata:
  cluster_name: tkg1
  org_name: tenant1
  ovdc_name: tenant1-vdc
spec:
  control_plane:
    count: 1
    storage_profile: truenas-iscsi-luns
  expose: true
  k8_distribution:
    template_name: ubuntu-20.04_tkgm-1.20_antrea-0.11
    template_revision: 1
  settings:
    network: default-organization-network
    rollback_on_failure: true
    ssh_key: null
  workers:
    count: 1
    storage_profile: truenas-iscsi-luns

Log into VCD using tenant credentials. A tenant can use the vcd cse cli to do this themselves, which maintains the self-service use case; as a provider you don't have to do this on a tenant's behalf.

The syntax is vcd login <cloud-url> <organization> <user>

vcd login vcd.vmwire.com tenant1 tenant1-admin -i -w

Now to run the deployment just use this command

vcd cse cluster apply config.yaml

You'll see the tasks kick off in VCD and your new cluster will be made available soon. During deployment, VCD picks up an IP address, either via DHCP or a static IP pool, on the internal network (the Geneve-backed NSX-T segment); in my example this is an address in the 192.168.0.0/24 range on the organization network named default-organization-network. This IP is assigned to the master node of the Control Plane, in my case 192.168.0.100.

VCD will also create a DNAT rule and pick up the next available IP address from the external IP pool allocated to the Edge gateway. In my example this will be 10.149.1.102.
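
Once the deployment completes, you can confirm which external IP CSE picked for the exposed cluster by querying it with the CSE client, for example (the same command appears in the useful commands list in the install guide further down):

vcd cse cluster info tkg1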

You can review the tasks for this workflow below.

Once the cluster is ready, a user will just need to download the kubeconfig file onto their workstation and use the cluster.

Notice that the Control Plane Gateway IP is not an internal IP but in fact one of the external IPs of the organization VDC.

This is also reflected in the kubeconfig file on line 5. CSE expose uses the external IP and also adds all the IPs into the SANs.

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: [snipped]
    server: https://10.149.1.102:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: [snipped]
    client-key-data: [snipped]

Logging into the Kubernetes cluster from outside of the cloud

As long as your workstation can route to the Control Plane Gateway IP you will be able to access the cluster from anywhere. Note that you can allocate public IP addresses directly to the Edge gateway, and in fact I work with providers who do this using BGP to the NSX-T T0. CSE expose basically uses an IP from the external network IP allocation pool.

The easiest way to test connectivity is to use kubectl like the following example.

kubectl get nodes -A --kubeconfig=/root/kubeconfig-native4.yaml

Which will have a response of

NAME        STATUS   ROLES                  AGE    VERSION
mstr-18nu   Ready    control-plane,master   13m    v1.21.2
node-oh7f   Ready    <none>                 8m4s   v1.21.2

This of course corresponds to what has been deployed in VCD.

More screenshots

Install Guide – Container Service Extension 3.0.4 with VMware Cloud Director for Tanzu Kubernetes Grid

This post covers how to install CSE 3.0.4 and enable it to work with VCD 10.2.2 and TKG 1.3. It is a simplified step by step guide on how to install CSE and get up and running with VCD as quickly as possible.

A Short Introduction to Container Service Extension

Container Service Extension (CSE) is a VMware vCloud Director (VCD) extension that helps tenants create and work with Kubernetes clusters.

CSE brings Kubernetes as a Service to VCD, by creating customized VM templates (Kubernetes templates) and enabling tenant users to deploy fully functional Kubernetes clusters as self-contained vApps.

CSE has a server component that installs as a VCD extension. It exposes REST API endpoints via VCD. CSE also has a client component that plugs into vcd-cli, communicates with the CSE server via the exposed API endpoints, and enables VCD users to create Kubernetes clusters in VCD. The following diagram illustrates the interactions between the components.

cse-workflow

Please refer to the official documentation for more details.

However complicated the above diagram is, I aim to make the installation process super-simple. You can use this article to get up and running more quickly than using the official documentation above.

Preparing CSE Server

Choose a Linux distribution to use for the CSE server, deploy it into your vSphere management cluster, and ensure that it can route to the public interface of your VMware Cloud Director instance.

We will be using MQTT, which is embedded in VCD, so RabbitMQ is not needed.

I used a CentOS 8 VM with the following settings.

[Update] I’ve recently published a new post to show how you can deploy CSE server on Photon OS and run it as a Linux service. I recommend using Photon OS instead of CentOS.

Component             Specification
CentOS 8 image        CentOS-8.4.2105-x86_64-boot.iso
vCPUs                 1
Memory                2GB
Network               Management Network (same as vCenter, NSX-T Manager etc)
Routes                Routable to vCD public URL and has outbound Internet access.
Other configuration   DNS, NTP, VMware Tools

Perform the following on the CentOS 8 VM.

yum update
yum upgrade

yum install -y yum-utils
yum groupinstall -y development
yum -y install python38 python38-pip python38-devel
easy_install-3.8 pip
pip3 install --user vcd-cli

# Add /root/.local/bin to PATH to remove path errors for vcd-cli
PATH=$PATH:/root/.local/bin
export PATH

# Check vcd cli is installed correctly
vcd version

# Check python version
python3 --version

# Uninstall cryptography and humanfriendly
pip uninstall cryptography
pip uninstall humanfriendly

# Install CSE
pip3 install git+https://github.com/vmware/container-service-extension.git@3.0.4

# Check versions
cse version
vcd cse version

# To enable the CSE client in vcd-cli, make the ~/.vcd-cli directory
mkdir ~/.vcd-cli

# create a new file in ~/.vcd-cli/profiles.yaml
vi ~/.vcd-cli/profiles.yaml

# to include the following contents in that file
extensions: 
- container_service_extension.client.cse

CSE Server Configuration

Generate a sample config.yaml file

cse sample -o config.yaml

Contents of my file

# Only one of the amqp or mqtt sections should be present. I am using MQTT, which is built into VCD 10.2 and is supported by CSE 3.

#amqp:
#  exchange: cse-ext
#  host: amqp.vmware.com
#  password: guest
#  port: 5672
#  prefix: vcd
#  routing_key: cse
#  username: guest
#  vhost: /

# using verify_ssl: false as this is a demo lab
mqtt:
  verify_ssl: false


vcd:
  api_version: '35.0'
  host: vcd.vmwire.com
  log: true
  password: Vmware1!
  port: 443
  username: administrator
  verify: false

vcs:
# vcenter name needs to be in FQDN format in vCD too, see screenshots below.
- name: vcenter.vmwire.com
  password: Vmware1!
  username: administrator@vsphere.local
  verify: false

service:
  enable_tkg_m: true
  enforce_authorization: false
  log_wire: false
  processors: 15
  telemetry:
    enable: true

# ensure that you have setup a dedicated organization, VDC, internet accessible network and catalog for CSE.

broker:
  catalog: cse-catalog
  default_template_name: ubuntu-20.04_tkgm-1.20_antrea-0.11
  default_template_revision: 1
  ip_allocation_mode: pool
  network: default-organization-network
  org: cse
  remote_template_cookbook_url: https://raw.githubusercontent.com/vmware/container-service-extension-templates/tkgm/template.yaml
  storage_profile: 'truenas-iscsi-luns'
  vdc: cse-vdc

A couple of notes on this config.yaml file.

  • Disable certificate verification if you do not have signed SSL certificates or this is for lab purposes and you are comfortable with this.
  • Create a new organization, org VDC (any allocation model), catalog, organization network (with access to the internet). See my screenshots below.

If you prefer to use an org routed network behind an NSX-T T0, then don't forget to set up the Edge firewall and source NAT rules; I've provided screenshots below. Otherwise you can use a direct connect organization network backed by a port group instead. Just ensure that this network has outbound internet access.

Create a static IP pool for this organization network so that the VMs that CSE prepares can be configured with networking details.

  • Ensure that this new org corresponds to the settings under the broker section in the config.yaml file.
  • the default_template_name can correspond to any of the templates listed in this file; look for the name parameter. That file is the TKGm-specific file; if you also want to support native upstream k8s then you can use this file instead. In fact you can support both at the same time: first install CSE with one config file (TKGm) and then upgrade CSE with the other config file (native). Or use my script here that does everything for you.
  • Read this documentation for pointers or place a comment below and I’ll help you out.
  • Under the vcs section, you'll notice that you need to specify a vCenter name; this has to be the same name, but in FQDN format, as the vCenter Server Instance setting under Infrastructure Resources in VCD, like my settings below:

Once everything is ready you will need to encrypt the config file as CSE will only work with an encrypted file.

cse encrypt config.yaml --output encrypted-config.yaml

CSE will ask you for an encryption password; please keep a note of it.

Install CSE

Remove group and world permissions from the config file; CSE will complain if you don't.

chmod 600 encrypted-config.yaml

First check the validity of the config file before installing CSE.

cse check encrypted-config.yaml

Install CSE with this command

cse install -c encrypted-config.yaml

This activity will take a long time, over an hour, as CSE will do the following:

  1. Download all of the OVAs from the template specification file; there are five templates to download
  2. Upload each OVA to the VCD CSE organization catalog, in my case cse-catalog under the cse organization
  3. Create a temporary vApp for each template
  4. Prepare the VM by downloading the required bits
  5. Upload the prepared VM to the catalog as a template

Once complete you’ll be able to see the following templates in the catalog. Note that I’ve enabled CSE to use both TKGm and native upstream k8s, hence the many templates listed here.

Enable CSE 2.3 Plugin in VCD

CSE 3.0.4 does not support the CSE 2.2 plugin that is enabled by default with VCD 10.2.2. We need to disable and remove the CSE 2.2 plugin, then upload and enable the CSE 2.3 plugin instead.

This plugin is available from this link on my.vmware.com.

To install it, go to the /Provider portal and, under More, use the Customize Portal option.

And then publish the plugin to all/select Tenants.

Enable Tenant access to CSE

Note that CSE supports both TKGm and native k8s runtimes at the same time, and you can provision both with VCD.

TKG related options won’t show up in vcd-cli, unless explicitly enabled. To enable TKG options in vcd-cli, set the following environment variable

export CSE_TKG_M_ENABLED=True
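
If you want this to persist across sessions on the machine running vcd-cli, you could append it to your shell profile, for example (a minimal sketch assuming a bash shell):

echo 'export CSE_TKG_M_ENABLED=True' >> ~/.bash_profile
source ~/.bash_profile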

First login to VCD using the vcd cli

vcd login vcd.vmwire.com system administrator --password Vmware1! -w -i

Check for registered extensions

vcd system extension list

Register the cse extension

vcd system extension create cse cse cse vcdext '/api/cse, /api/cse/.*, /api/cse/.*/.*'

Enable a tenant to use TKGm with CSE. The command below enables CSE for another organization named tenant1 and for the org VDC named tenant1-vdc.

vcd cse ovdc enable tenant1-vdc -o tenant1 --tkg

You should see the following output.

OVDC Update: Updating OVDC placement policies
task: 900ea6f4-e08b-4d29-ae24-2f604e50276e, Operation success, result: success

To enable a tenant to use native upstream k8s with CSE, use the following command.

vcd cse ovdc enable --native --org tenant1 tenant1-vdc
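
To confirm which org VDCs are now enabled, and for which runtimes, you can list them (the same command appears in the useful commands section below):

vcd cse ovdc list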

Enable Global Roles to use CSE or Configure Rights Bundles

The quickest way to get CSE working is to add the relevant rights to the Organization Administrator role. You can create a custom rights bundle and create a custom role for the k8s admin tenant persona if you like. I won’t cover that in this post.

Log in as the /Provider and go to the Administration menu and click on Global Roles on the left.

Edit the Organization Administrator role and scroll all the way down to the bottom and click both the View 8/8 and Manage 12/12, then Save.

Starting CSE

First let's check our installation.

cse check encrypted-config.yaml --check-install

Run CSE from command line

# Run server in foreground
cse run --config encrypted-config.yaml

# Run server in background
nohup cse run --config encrypted-config.yaml > nohup.out 2>&1 &
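
When running in the background, I find it handy to follow the nohup output and then query the running server with the CSE client from a logged-in session (both commands also appear in the useful commands list below):

# Watch the server start up
tail -f nohup.out

# Query the running CSE server via VCD
vcd cse system info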

You can also run CSE as a service, please refer to this link if you prefer to do this instead.

Deploying a TKG cluster as a Tenant

Congratulations, now we’re ready to deploy a k8s cluster.

Log into VCD as a tenant and go to More, Kubernetes Container Clusters.

Click on New and you should now see an option to deploy a Native Kubernetes runtime or a VMware Tanzu Kubernetes Grid runtime. VCD also supports vSphere with Tanzu (which is not installed as part of this article); you'll see a third tile here if you have enabled vSphere with Tanzu (TKGs).

On the next page, give the k8s cluster a name, select a runtime and optionally paste in your SSH public key for easier access to the Kubernetes cluster later.

Proceed as per the following screenshots.

CSE 3.0.4 does not support multi-master, i.e., more than one node for the Control Plane. This is coming in a future release.

Next select the storage policies that the Control Plane node and the Worker node(s) will be deployed into. You can also opt to deploy another node to use as persistent volumes through NFS.

Select the network.

Review the final page and click on Finish. CSE will now deploy the TKG cluster for you and it will be ready once all nodes are up and running. You’ll see the following once ready.

Which you can also see with this command in CSE

vcd cse cluster list
[root@cse .vcd-cli]# vcd cse cluster list
Name    Org      Owner          VDC          K8s Runtime    K8s Version       Status
------  -------  -------------  -----------  -------------  ----------------  ----------------
tkg     tenant1  tenant1-admin  tenant1-vdc  native         upstream 1.14.10  CREATE:SUCCEEDED

The only thing left to do is download the kubeconfig file and log in with kubectl.

Useful commands

# Login to VCD
vcd login vcd.vmwire.com system administrator --password Vmware1! -w -i

# Register CSE extension with VCD
vcd system extension create cse cse cse vcdext '/api/cse, /api/cse/.*, /api/cse/.*/.*'

# List VCD extensions
vcd system extension list

# Describe CSE extension
vcd system extension info cse

# Describe CSE configuration
vcd cse system info

# List organization VDCs with CSE enabled
vcd cse ovdc list

# Enable CSE for org VDC
vcd cse ovdc enable --native --org tenant1 tenant1-vdc

# Look at CSE logs
cat /root/.cse-logs/cse-server-info.log /root/.cse-logs/cse-server-debug.log

# Tail the CSE logs
tail -f /root/.cse-logs/cse-server-info.log /root/.cse-logs/cse-server-debug.log

# Upgrading CSE or changing config file parameters, e.g., changing verify_ssl certs to true, note the skip-template-creation which will save you a lot of time
cse upgrade --config <config_file> --skip-template-creation

# Get info for a cluster named tkg
vcd cse cluster info tkg

# Login as a tenant user
vcd login vcd.vmwire.com tenant1 tenant1-admin -i -w

# Deploy tkg cluster using the command line
vcd cse cluster apply tkg7.yaml

Deploy Kubernetes Dashboard to your Tanzu Kubernetes Cluster

In this post I show how to deploy the Kubernetes Dashboard onto a Tanzu Kubernetes Grid cluster.

Dashboard provides information on the state of Kubernetes resources in your cluster and on any errors that may have occurred.

Dashboard is a web-based Kubernetes user interface. You can use Dashboard to deploy containerized applications to a Kubernetes cluster, troubleshoot your containerized application, and manage the cluster resources. You can use Dashboard to get an overview of applications running on your cluster, as well as for creating or modifying individual Kubernetes resources (such as Deployments, Jobs, DaemonSets, etc). For example, you can scale a Deployment, initiate a rolling update, restart a pod or deploy new applications using a deploy wizard.

Deploy Kubernetes Dashboard

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.2.0/aio/deploy/recommended.yaml

To give the kubernetes-dashboard service account full cluster access, run the following

kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:kubernetes-dashboard

Expose the dashboard using a LoadBalancer service so it can be accessed externally; I'm using NSX Advanced Load Balancer (Avi) in my lab.

kubectl expose deployment -n kubernetes-dashboard kubernetes-dashboard --type=LoadBalancer --name=kubernetes-dashboard-public

This will then expose the kubernetes-dashboard deployment over port 8443 using the AKO ingress config for that TKG cluster.

Get the service details using this command

kubectl get -n kubernetes-dashboard svc kubernetes-dashboard-public

You should see the following output:

root@photon-manager [ ~/.local/share/tanzu-cli ]# kubectl get -n kubernetes-dashboard svc kubernetes-dashboard-public
NAME                          TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
kubernetes-dashboard-public   LoadBalancer   100.71.203.41   172.16.4.34   8443:32360/TCP   17h
root@photon-manager [ ~/.local/share/tanzu-cli ]#

You can see that the service is exposed on an IP address of 172.16.4.34, with a port of 8443.

Open up a web browser and enter that IP and port. You’ll see a screen similar to the following.

You'll need a token to log in. To obtain a token, use the following command.

kubectl describe -n kubernetes-dashboard secret kubernetes-dashboard-token
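
Alternatively, if you want just the raw token string rather than the full describe output, something like this works on clusters of this vintage (a sketch that assumes a pre-1.24 cluster, where service account token secrets are still created automatically):

kubectl -n kubernetes-dashboard get secret \
  $(kubectl -n kubernetes-dashboard get sa kubernetes-dashboard -o jsonpath='{.secrets[0].name}') \
  -o jsonpath='{.data.token}' | base64 --decode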

Copy just the token and paste it into the browser to log in. Enjoy Kubernetes Dashboard!

Deploying a Tanzu Shared Services Cluster with NSX Advanced Load Balancer Ingress Services Integration

In the previous post I prepared NSX ALB for Tanzu Kubernetes Grid ingress services. In this post I will deploy a new TKG cluster and use it for Tanzu Shared Services.

Tanzu Kubernetes Grid includes binaries for tools that provide in-cluster and shared services to the clusters running in your Tanzu Kubernetes Grid instance. All of the provided binaries and container images are built and signed by VMware.

A shared services cluster is just a Tanzu Kubernetes Grid workload cluster used for shared services. It can be provisioned using the standard CLI command tanzu cluster create, or through Tanzu Mission Control.

You can add functionalities to Tanzu Kubernetes clusters by installing extensions to different cluster locations as follows:

Function             Extension            Location                                       Procedure
Ingress Control      Contour              Tanzu Kubernetes or Shared Services cluster   Implementing Ingress Control with Contour
Service Discovery    External DNS         Tanzu Kubernetes or Shared Services cluster   Implementing Service Discovery with External DNS
Log Forwarding       Fluent Bit           Tanzu Kubernetes cluster                      Implementing Log Forwarding with Fluent Bit
Container Registry   Harbor               Shared Services cluster                       Deploy Harbor Registry as a Shared Service
Monitoring           Prometheus, Grafana  Tanzu Kubernetes cluster                      Implementing Monitoring with Prometheus and Grafana

The Harbor service runs on a shared services cluster, to serve all the other clusters in an installation. The Harbor service requires the Contour service to also run on the shared services cluster. In many environments, the Harbor service also benefits from External DNS running on its cluster, as described in Harbor Registry and External DNS.

Some extensions require or are enhanced by other extensions deployed to the same cluster:

  • Contour is required by Harbor, External DNS, and Grafana
  • Prometheus is required by Grafana
  • External DNS is recommended for Harbor on infrastructures with load balancing (AWS, Azure, and vSphere with NSX Advanced Load Balancer), especially in production or other environments in which Harbor availability is important.

Each Tanzu Kubernetes Grid instance can only have one shared services cluster.

Relationships

The following table shows the relationships between the NSX ALB system, the TKG cluster deployment config and the AKO config. It is important to get these three correct.

Avi Controller:                Service Engine Group name: tkg-ssc-se-group
TKG cluster deployment file:   AVI_LABELS: 'cluster': 'tkg-ssc'
AKO config file:               clusterSelector.matchLabels.cluster: tkg-ssc
                               serviceEngineGroup: tkg-ssc-se-group

TKG Cluster Deployment Config File – tkg-ssc.yaml

Let's first take a look at the deployment configuration file for the Shared Services Cluster.

There are two key value pairs in this file that are important. The first is

AVI_LABELS: |
    'cluster': 'tkg-ssc'

We are labeling this TKG cluster so that Avi knows about it. The second key value pair is

AVI_SERVICE_ENGINE_GROUP: tkg-ssc-se-group

This ensures that this TKG cluster will use the service engine group named tkg-ssc-se-group.

While we have this file open, you'll notice that the long certificate under AVI_CA_DATA_B64 is a copy and paste of the Avi Controller certificate that I copied in the previous post.

Take some time to review my cluster deployment config file for the Shared Services Cluster below. You'll see that you will need to specify the VIP network for NSX ALB to use:

AVI_DATA_NETWORK: tkg-ssc-vip

AVI_DATA_NETWORK_CIDR: 172.16.4.32/27

Basically, any key that begins with AVI_ needs to have the corresponding setting configured in NSX ALB. This is what we prepared in the previous post.

root@photon-manager [ ~/.tanzu/tkg/clusterconfigs ]# cat tkg-ssc.yaml
AVI_CA_DATA_B64: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUZIakNDQkFhZ0F3SUJBZ0lTQXo2SUpRV3hvMDZBUUdZaXkwUGpyN0pqTUEwR0NTcUdTSWIzRFFFQkN3VUEKTURJeEN6QUpCZ05WQkFZVEFsVlRNUll3RkFZRFZRUUtFdzFNWlhRbmN5QkZibU55ZVhCME1Rc3dDUVlEVlFRRApFd0pTTXpBZUZ3MHlNVEEzTURnd056TXlORGRhRncweU1URXdNRFl3TnpNeU5EWmFNQmN4RlRBVEJnTlZCQU1NCkRDb3VkbTEzYVhKbExtTnZiVENDQVNJd0RRWUpLb1pJaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFLVVkKbE9iZjk4QlF1cnhyaFNzWmFDRVFrMWxHWUJMd1REMTRITVlWZlNaOWE3WWRuNVFWZ2dKdUd6Um1tdVdFaWRTRgppaW1teTdsWHMyVTZjazdjOXhQZkxWOHpySEZjL2tEbDB4ZzZRWVNJa0wrRWdkWW9VeG1vTk95VzArc0FrTEFQCkZaU1JYUW44K3BiYlJDVFZ2L3ErTjBueUJzNlZyTVJySjA3UEhwck9qTE01WHo2eEJxR2E5WXEzbUhNZ0lXaXoKYXVVam84ZnBrYllxMlpKSlkzVXFxNkhBTEw3MDVTblJxZXk5c05pOEJkT2RtTk5ka1ovZXBacUlCRDZQZk1WegpEWVVEV2h0OUczR2NXcGxUT1AweDJTYmZ0MEFCUWYvU1BEWHlBSFBYYWhoNThKc1kwNHd5ZE04QWNQRkRwbUc5CjRvTnpoaTdTVXJLdDJ4cStXazhDQXdFQUFhT0NBa2N3Z2dKRE1BNEdBMVVkRHdFQi93UUVBd0lGb0RBZEJnTlYKSFNVRUZqQVVCZ2dyQmdFRkJRY0RBUVlJS3dZQkJRVUhBd0l3REFZRFZSMFRBUUgvQkFJd0FEQWRCZ05WSFE0RQpGZ1FVbk1QeDFQWUlOL1kvck5WL2FNYU9MNVBzUGtRd0h3WURWUjBqQkJnd0ZvQVVGQzZ6RjdkWVZzdXVVQWxBCjVoK3ZuWXNVd3NZd1ZRWUlLd1lCQlFVSEFRRUVTVEJITUNFR0NDc0dBUVVGQnpBQmhoVm9kSFJ3T2k4dmNqTXUKYnk1c1pXNWpjaTV2Y21jd0lnWUlLd1lCQlFVSE1BS0dGbWgwZEhBNkx5OXlNeTVwTG14bGJtTnlMbTl5Wnk4dwpGd1lEVlIwUkJCQXdEb0lNS2k1MmJYZHBjbVV1WTI5dE1Fd0dBMVVkSUFSRk1FTXdDQVlHWjRFTUFRSUJNRGNHCkN5c0dBUVFCZ3Q4VEFRRUJNQ2d3SmdZSUt3WUJCUVVIQWdFV0dtaDBkSEE2THk5amNITXViR1YwYzJWdVkzSjUKY0hRdWIzSm5NSUlCQkFZS0t3WUJCQUhXZVFJRUFnU0I5UVNCOGdEd0FIWUFSSlJsTHJEdXpxL0VRQWZZcVA0bwp3TnJtZ3I3WXl6RzFQOU16bHJXMmdhZ0FBQUY2aFQ5NlpRQUFCQU1BUnpCRkFpQlRkTUErcUdlQmM4cnpvREtMClFQaGgya05aVitVMmFNVEJacGYyaFpFRnpBSWhBT2Y2bVZFQkMwVlY3VWlUNC9tSWZWUEM2NUphL0RRc0xEOFkKMHFjQzdzNGhBSFlBZlQ3eStJLy9pRlZvSk1MQXlwNVNpWGtyeFE1NENYOHVhcGRvbVg0aThOY0FBQUY2aFQ5Ngpqd0FBQkFNQVJ6QkZBaUVBeEx1SDJTS2ZlVThpRUd6NWd4dmc2ME8zSmxmY25mN2JKS3h2K1dzUENPUUNJRjArCm1aV0psQSt2Uy85cjhPbmZ5cElMTFhaamIydzBEQlREWTN6WlErOUxNQTBHQ1NxR1NJYjNEUUVCQ3dVQUE0SUIKQVFDczNYeUhJbUF2T25JdjVMdm1nZ2l5M0s0Q1J1Zkg5ZmZWc1RtSDkwSzZETmorWkNySDEvNlJBUUREek45SApYZDJJd1BCWkJHcURocmYvZ0xOWTg2NHlkT2pPRHFuSXdsNVFDb095c0h4NGtISnFXdmpwVVF3d3FiS3lZeDZCCmkxalRtU0RTZmV0OHVtTUtMZ1Z3djBHdnZaV0daYmNUaVJCMEIycW5MR0VPbDJ2WnoyQlhkK1RGMmgyUGRkMzkKTHFDYzJ6UFZOVnB4WVR3OVJ4U1Q1d2FiR2doK0pOVGl1aVVlOUFhQmxOcmRjYjJ0d3d5TnA4Y3krdEw2RXMyagpRbVB2WndMYVRGQngzNWJHeE9LNGF1NFM5SVBIUG84cVVQVVR6RVAwYWtyK2FqY3hOMXlPalRSRENzWEIrcitrCmQ5ZW9CVlFjbFJTa0dMbE5Nc044OEhlKwotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0t
AVI_CLOUD_NAME: vcenter.vmwire.com
AVI_CONTROLLER: avi.vmwire.com
AVI_DATA_NETWORK: tkg-ssc-vip
AVI_DATA_NETWORK_CIDR: 172.16.4.32/27
AVI_ENABLE: "true"
AVI_LABELS: |
    'cluster': 'tkg-ssc'
AVI_PASSWORD: <encoded:Vm13YXJlMSE=>
AVI_SERVICE_ENGINE_GROUP: tkg-ssc-se-group
AVI_USERNAME: admin
CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: tkg-ssc
CLUSTER_PLAN: dev
ENABLE_CEIP_PARTICIPATION: "false"
ENABLE_MHC: "true"
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: vsphere
LDAP_BIND_DN: ""
LDAP_BIND_PASSWORD: ""
LDAP_GROUP_SEARCH_BASE_DN: ""
LDAP_GROUP_SEARCH_FILTER: ""
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: ""
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: ""
LDAP_ROOT_CA_DATA_B64: ""
LDAP_USER_SEARCH_BASE_DN: ""
LDAP_USER_SEARCH_FILTER: ""
LDAP_USER_SEARCH_NAME_ATTRIBUTE: ""
LDAP_USER_SEARCH_USERNAME: userPrincipalName
OIDC_IDENTITY_PROVIDER_CLIENT_ID: ""
OIDC_IDENTITY_PROVIDER_CLIENT_SECRET: ""
OIDC_IDENTITY_PROVIDER_GROUPS_CLAIM: ""
OIDC_IDENTITY_PROVIDER_ISSUER_URL: ""
OIDC_IDENTITY_PROVIDER_NAME: ""
OIDC_IDENTITY_PROVIDER_SCOPES: ""
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM: ""
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: "false"
VSPHERE_CONTROL_PLANE_DISK_GIB: "20"
VSPHERE_CONTROL_PLANE_ENDPOINT: 172.16.3.58
VSPHERE_CONTROL_PLANE_MEM_MIB: "4096"
VSPHERE_CONTROL_PLANE_NUM_CPUS: "2"
VSPHERE_DATACENTER: /home.local
VSPHERE_DATASTORE: /home.local/datastore/vsanDatastore
VSPHERE_FOLDER: /home.local/vm/tkg-ssc
VSPHERE_NETWORK: tkg-ssc
VSPHERE_PASSWORD: <encoded:Vm13YXJlMSE=>
VSPHERE_RESOURCE_POOL: /home.local/host/cluster/Resources/tkg-ssc
VSPHERE_SERVER: vcenter.vmwire.com
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQEAhcw67bz3xRjyhPLysMhUHJPhmatJkmPUdMUEZre+MeiDhC602jkRUNVu43Nk8iD/I07kLxdAdVPZNoZuWE7WBjmn13xf0Ki2hSH/47z3ObXrd8Vleq0CXa+qRnCeYM3FiKb4D5IfL4XkHW83qwp8PuX8FHJrXY8RacVaOWXrESCnl3cSC0tA3eVxWoJ1kwHxhSTfJ9xBtKyCqkoulqyqFYU2A1oMazaK9TYWKmtcYRn27CC1Jrwawt2zfbNsQbHx1jlDoIO6FLz8Dfkm0DToanw0GoHs2Q+uXJ8ve/oBs0VJZFYPquBmcyfny4WIh4L0lwzsiAVWJ6PvzF5HMuNcwQ== rsa-key-20210508
VSPHERE_TLS_THUMBPRINT: D0:38:E7:8A:94:0A:83:69:F7:95:80:CD:99:9B:D3:3E:E4:DA:62:FA
VSPHERE_USERNAME: administrator@vsphere.local
VSPHERE_WORKER_DISK_GIB: "20"
VSPHERE_WORKER_MEM_MIB: "4096"
VSPHERE_WORKER_NUM_CPUS: "2"

AKODeploymentConfig – tkg-ssc-akodeploymentconfig.yaml

The next file we need to configure is the AKODeploymentConfig file. This file is used by Kubernetes to ensure that L4 load balancing uses NSX ALB.

I’ve highlighted some settings that are important.

clusterSelector:
  matchLabels:
    cluster: tkg-ssc

Here we are specifying a cluster selector for AKO that uses the name of the cluster. This corresponds to the following setting in the tkg-ssc.yaml file.

AVI_LABELS: |
    'cluster': 'tkg-ssc'

The next key value pair specifies what Service Engines to use for this TKG cluster. This is of course what we configured within Avi in the previous post.

serviceEngineGroup: tkg-ssc-se-group

root@photon-manager [ ~/.tanzu/tkg/clusterconfigs ]# cat tkg-ssc-akodeploymentconfig.yaml
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  finalizers:
  - ako-operator.networking.tkg.tanzu.vmware.com
  generation: 2
  name: ako-for-tkg-ssc
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: vcenter.vmwire.com
  clusterSelector:
    matchLabels:
      cluster: tkg-ssc
  controller: avi.vmwire.com
  dataNetwork:
    cidr: 172.16.4.32/27
    name: tkg-ssc-vip
  extraConfigs:
    image:
      pullPolicy: IfNotPresent
      repository: projects-stg.registry.vmware.com/tkg/ako
      version: v1.3.2_vmware.1
    ingress:
      defaultIngressController: false
      disableIngressClass: true
  serviceEngineGroup: tkg-ssc-se-group

Setup the new AKO configuration before deploying the new TKG cluster

Before deploying the new TKG cluster, we have to set up a new AKO configuration. To do this, run the following command under the TKG Management Cluster context.

kubectl apply -f <Path_to_YAML_File>

Which in my example is

kubectl apply -f tkg-ssc-akodeploymentconfig.yaml

You can use the following to check that this was successful.

kubectl get akodeploymentconfig

root@photon-manager [ ~/.tanzu/tkg/clusterconfigs ]# kubectl get akodeploymentconfig
NAME              AGE
ako-for-tkg-ssc   3d19h

You can also show additional details by using the kubectl describe command

kubectl describe akodeploymentconfig  ako-for-tkg-ssc

For any new AKO configs that you need, just take a copy of the .yaml file and edit the contents that correspond to the new AKO config. For example, to create another AKO config for a new tenant, take a copy of the tkg-ssc-akodeploymentconfig.yaml file and give it a new name such as tkg-tenant-1-akodeploymentconfig.yaml, and change the following key value pairs.

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  finalizers:
  - ako-operator.networking.tkg.tanzu.vmware.com
  generation: 2
  name: ako-for-tenant-1
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: vcenter.vmwire.com
  clusterSelector:
    matchLabels:
      cluster: tkg-tenant-1
  controller: avi.vmwire.com
  dataNetwork:
    cidr: 172.16.4.64/27
    name: tkg-tenant-1-vip
  extraConfigs:
    image:
      pullPolicy: IfNotPresent
      repository: projects-stg.registry.vmware.com/tkg/ako
      version: v1.3.2_vmware.1
    ingress:
      defaultIngressController: false
      disableIngressClass: true
  serviceEngineGroup: tkg-tenant-1-se-group
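
If you find yourself stamping these out for several tenants, a rough shortcut is to generate the tenant copy with sed and then hand-edit anything that doesn't follow the naming pattern (a sketch only; double-check the metadata name, the VIP network name and the CIDR afterwards):

sed 's/tkg-ssc/tkg-tenant-1/g' tkg-ssc-akodeploymentconfig.yaml > tkg-tenant-1-akodeploymentconfig.yaml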

Create the Tanzu Shared Services Cluster

Now we can deploy the Shared Services Cluster with the file named tkg-ssc.yaml

tanzu cluster create --file /root/.tanzu/tkg/clusterconfigs/tkg-ssc.yaml

This will deploy the cluster according to that cluster spec.

Obtain credentials for shared services cluster tkg-ssc

tanzu cluster kubeconfig get tkg-ssc --admin

Add labels to the cluster tkg-ssc

Add the cluster role as a Tanzu Shared Services Cluster

kubectl label cluster.cluster.x-k8s.io/tkg-ssc cluster-role.tkg.tanzu.vmware.com/tanzu-services="" --overwrite=true

Check that the label was applied

tanzu cluster list --include-management-cluster

root@photon-manager [ ~/.tanzu/tkg/clusterconfigs ]# tanzu cluster list --include-management-cluster
  NAME      NAMESPACE   STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES           PLAN
  tkg-ssc   default     running  1/1           1/1      v1.20.5+vmware.2  tanzu-services  dev
  tkg-mgmt  tkg-system  running  1/1           1/1      v1.20.5+vmware.2  management      dev

Add the key value pair of cluster=tkg-ssc to label this cluster and complete the setup of AKO.

kubectl label cluster tkg-ssc cluster=tkg-ssc

Once the cluster is labelled, switch to the tkg-ssc context and you will notice a new namespace named “avi-system” being created and a new pod named “ako-0” being started.

kubectl config use-context tkg-ssc-admin@tkg-ssc

kubectl get ns

kubectl get pods -n avi-system

root@photon-manager [ ~/.tanzu/tkg/clusterconfigs ]# kubectl config use-context tkg-ssc-admin@tkg-ssc
Switched to context "tkg-ssc-admin@tkg-ssc".

root@photon-manager [ ~/.tanzu/tkg/clusterconfigs ]# kubectl get ns
NAME                      STATUS   AGE
avi-system                Active   3d18h
cert-manager              Active   3d15h
default                   Active   3d19h
kube-node-lease           Active   3d19h
kube-public               Active   3d19h
kube-system               Active   3d19h
kubernetes-dashboard      Active   3d16h
tanzu-system-monitoring   Active   3d15h
tkg-system                Active   3d19h
tkg-system-public         Active   3d19h

root@photon-manager [ ~/.tanzu/tkg/clusterconfigs ]# kubectl get pods -n avi-system
NAME    READY   STATUS    RESTARTS   AGE
ako-0   1/1     Running   0          3d18h

Summary

We now have a new TKG Shared Services Cluster up and running and configured for Kubernetes ingress services with NSX ALB.

In the next post I’ll deploy the Kubernetes Dashboard onto the Shared Services Cluster and show how this then configures the NSX ALB for ingress services.

Preparing NSX Advanced Load Balancer for Tanzu Kubernetes Grid Ingress Services

In this post I describe how to setup NSX ALB (Avi) in preparation for use with Tanzu Kubernetes Grid, more specifically, the Avi Kubernetes Operator (AKO).

AKO is a Kubernetes operator which works as an ingress controller and performs Avi-specific functions in a Kubernetes environment with the Avi Controller. It runs as a pod in the cluster and translates the required Kubernetes objects to Avi objects and automates the implementation of ingresses/routes/services on the Service Engines (SE) via the Avi Controller.

AKO
Avi Kubernetes Operator Architecture

First lets describe the architecture for TKG + AKO.

For each tenant that you have, you will have at least one AKO configuration.

A tenant can have one or more TKG workload clusters and more than one TKG workload cluster can share an AKO configuration. This is important to remember for multi-tenant services when using Tanzu Kubernetes Grid. However, you can of course configure an AKO config for each TKG workload cluster if you wish to provide multiple AKO configurations. This will require more Service Engines and Service Engine Groups as we will discuss further below.

So as a minimum, you will have several AKO configs. Let me summarize in the following table.

AKO Config: install-ako-for-all
Description: The default AKO configuration used for the TKG Management Cluster and deployed by default.
Specification: Provider side AKO configuration for the TKG Management Cluster only.

AKO Config: ako-for-tkg-ssc
Description: The AKO configuration for the Tanzu Shared Services Cluster.
Specification: Provider side AKO configuration for the Tanzu Shared Services Cluster only (tkg-ssc-akodeploymentconfig.yaml).

AKO Config: ako-for-tenant-1
Description: The AKO configuration for Tenant 1.
Specification: AKO configuration prepared by the Provider and deployed for the tenant to use (tkg-tenant-1-akodeploymentconfig.yaml).

AKO Config: ako-for-tenant-x
Description: The AKO configuration for Tenant x.

Although TKG deploys a default AKO config, we do not use any ingress services for the TKG Management Cluster. Therefore we do not need to deploy a Service Engine Group and Service Engines for this cluster.

Service Engine Groups and Service Engines are only required if you need ingress services to your applications. We of course need this for the Tanzu Shared Services and any applications deployed into a workload cluster.

I will go into more detail in a follow-up post where I will demonstrate how to setup the Tanzu Shared Services Cluster that uses the preparation steps described in this post.

Let's start the Avi Controller configuration. Although I am using the Tanzu Shared Services Cluster as an example for this guide, the same steps can be repeated for all additional Tanzu Kubernetes Grid workload clusters. All that is needed is a few changes to the .yaml files and you're good to go.

Clouds

I prefer not to use the Default-Cloud, and will always create a new cloud.

The benefit of using NSX ALB in write mode (orchestration mode) is that NSX ALB will orchestrate the creation of service engines for you and also scale out more service engines if your applications demand more capacity. However, if you are using VMware Cloud on AWS, this is not possible due to the RBAC constraints within VMC, so only non-orchestration mode is possible there.

In this post I’m using my home lab which is running vSphere.

Navigate to Infrastructure, Clouds and click on the CREATE button and select the VMware vCenter/VMware vSphere ESX option. This post uses vCenter as a cloud type.

Fill in the details as my screenshots show. You can leave the IPAM Profile settings empty for now, we will complete these in the next step.

Select the Data Center within your vSphere hierarchy. I’m using my home lab for this example. Again leave all the other settings on the defaults.

The next tab will take you to the network options for the management network to use for the service engines. This network needs to be routable between the Avi Controller(s) and the service engines.

The network tab will show you the networks that it finds from the vCenter connection, I am using my management network. This network is where I run all of the management appliances, such as vCenter, NSX-T, Avi Controllers etc.

It's best to configure a static IP pool for the service engines. Generally, you'll need just a handful of IP addresses, as each service engine group will have two service engines and each service engine will only need one management IP. A service engine group can provide Kubernetes load balancing services for the entire Kubernetes cluster. This of course depends on your sizing requirements, and can be reviewed here. For my home lab, fourteen IP addresses is more than sufficient for my needs.

Service Engine Group

While we're in the Infrastructure settings, let's proceed to set up a new Service Engine Group. Navigate to Infrastructure, Service Engine Group, select the new cloud that we previously set up and then click on the CREATE button. It's important that you select your new cloud from that drop-down menu.

Give your new service engine group a name, I tend to use a naming format such as tkg-<cluster-name>-se-group. For this example, I am setting up a new SE group for the Tanzu Shared Services Cluster.

Reduce the maximum number of service engines if you wish. You can leave all other settings on their defaults.

Click on the Advanced tab to set up some vSphere specifics. Here you can configure options that help you identify the SEs in the vSphere hierarchy, place the SEs into a VM folder, include or exclude compute clusters or hosts, and even include or exclude a datastore.

Service Engine Groups are important as they are the boundary that TKG clusters use for L4 services. Each SE Group needs a unique name; each TKG workload cluster references this name in its AKODeploymentConfig file, the config file that maps a TKG cluster to NSX ALB for L4 load balancing services.

With TKG, when you create a TKG workload cluster you must specify some key value pairs that correspond to service engine group names and this is then applied in the AKODeploymentConfig file.

The following table shows where these relationships lie and I will go into more detail in a follow-up post where I will demonstrate how to setup the Tanzu Shared Services Cluster.

Avi Controller:                Service Engine Group name: tkg-ssc-se-group
TKG cluster deployment file:   AVI_LABELS: 'cluster': 'tkg-ssc'
AKO config file:               clusterSelector.matchLabels.cluster: tkg-ssc
                               serviceEngineGroup: tkg-ssc-se-group

Networks

Navigate to Infrastructure, Networks, again ensure that you select your new cloud from the drop down menu.

The Avi Controller will show you all the networks that it has detected using the vCenter connection that you configured. What we need to do in this section is configure the networks that NSX ALB will use when configuring a service for Kubernetes. Generally, depending on how you set up your network architecture for TKG, you will have one network that the TKG cluster uses and another for the front-end VIPs. The VIP network is what you will use to expose the pods on; think of it as a load balancer DMZ network.

In my home lab, I use the following setup.

Network       Description                                   Specification
tkg-mgmt      TKG Management Cluster                        Network: 172.16.3.0/27, Static IP Pool: 172.16.3.26 – 172.16.3.29
tkg-ssc       TKG Shared Services Cluster                   Network: 172.16.3.32/27, Static IP Pool: 172.16.3.59 – 172.16.3.62
tkg-ssc-vip   TKG Shared Services Cluster front-end VIPs    Network: 172.16.4.32/27, Static IP Pool: 172.16.4.34 – 172.16.4.62

IPAM Profile

Create an IPAM profile by navigating to Templates, Profiles, IPAM/DNS Profiles and clicking on the CREATE button and select IPAM Profile.

Select the cloud that you set up above, then add the usable networks for applications that will consume the load balancer service from NSX ALB. These are the networks that you configured in the step above.

Avi Controller Certificate

We also need the SSL certificate used by the Avi Controller, I am using a signed certificate in my home lab from Let’s Encrypt, which I wrote about in a previous post.

Navigate to Templates, Security, SSL/TLS Certificates, click on the icon with a downward arrow in a circle next to the certificate for your Avi Controller; it's normally the first one in the list.

Click on the Copy to clipboard button and paste the certificate into Notepad++ or similar.

At this point we have NSX ALB set up for deploying a new TKG workload cluster using the new Service Engine Group that we have prepared. In the next post, I'll demonstrate how to set up the Tanzu Shared Services Cluster to use NSX ALB for ingress services.

Tanzu Bootstrap using Photon OS 4.0

A quick post on how to set up a Tanzu Kubernetes Grid bootstrap VM using Photon OS.

This topic explains how to install and initialize the Tanzu command line interface (CLI) on a bootstrap machine. I've found that using Photon OS 4.0 is the fastest and most straightforward method to get going with Tanzu. I've tried Ubuntu and CentOS, but these require a lot more preparation with prerequisites and dependencies, such as building a VM from an ISO and installing Docker, which Photon OS already comes with. The other thing I've noticed with Linux distros such as Ubuntu is that other components might interfere with your Tanzu deployment; AppArmor and the ufw firewall come to mind.

The Tanzu bootstrap machine is the laptop, host, or server that you deploy management and workload clusters from, and that keeps the Tanzu and Kubernetes configuration files for your deployments. The bootstrap machine is typically local, but it can also be a physical machine or VM that you access remotely. We will be using a ready built Photon OS OVA that is supported by VMware.

Photon OS, is an open-source minimalist Linux operating system from VMware that is optimized for cloud computing platforms, VMware vSphere deployments, and applications native to the cloud.

Photon OS is a Linux container host optimized for vSphere and cloud-computing platforms such as Amazon Elastic Compute Cloud and Google Compute Engine. As a lightweight and extensible operating system, Photon OS works with the most common container formats, including Docker, Rocket, and Garden. Photon OS includes a yum-compatible, package-based lifecycle management system called tdnf.

Once the Tanzu CLI is installed, the second and last step to deploying Tanzu Kubernetes Grid is using the Tanzu CLI to create or designate a management cluster on each cloud provider that you use. The Tanzu CLI then communicates with the management cluster to create and manage workload clusters on the cloud provider.

Tanzu bootstrap high-level architecture

Download the Photon OS OVA from this direct link.

Deploy that OVA using vCenter Linux Guest Customization and ensure that you deploy it on the same network as the TKG management cluster network that you intend to use. This is important, as the bootstrap will set up a temporary kind cluster on Docker that will then be moved over to the TKG management cluster.

Ensure your VM has a minimum of 6GB of RAM. You can read up on other pre-requisites here.

Log in as root with the password changeme; you will be asked to update the root password. All the steps below are done using the root account. If you wish to use a non-root account with Photon OS, ensure that you add that account to the docker group; more details in this link here.
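
For reference, creating a non-root account and adding it to the docker group looks something like this (a minimal sketch with a hypothetical user named tanzu):

useradd -m tanzu
usermod -aG docker tanzu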

Install the tar package, which we will need later to extract the Tanzu CLI.

tdnf install tar

The first thing to do is create a new directory called /tanzu.

mkdir /tanzu

and then go into that directory; we will be performing most tasks in this directory.

cd /tanzu

Copy the following files to the /tanzu directory. You can get these files from my.vmware.com under the Tanzu Kubernetes Grid product listing.

kubectl-linux-v1.20.5-vmware.1.gz, this is the kubectl tool.

tanzu-cli-bundle-v1.3.1-linux-amd64.tar, this is the tanzu CLI.

Unpack the Tanzu CLI

tar -xvf tanzu-cli-bundle-v1.3.1-linux-amd64.tar

Unzip the kubectl tool

gunzip kubectl-linux-v1.20.5-vmware.1.gz

Install kubectl

install kubectl-linux-v1.20.5-vmware.1 /usr/local/bin/kubectl

Now go to the /cli directory

cd /tanzu/cli

Install Tanzu CLI

install core/v1.3.1/tanzu-core-linux_amd64 /usr/local/bin/tanzu

Install Tanzu CLI Plugins

Now go back to the /tanzu directory

tanzu plugin clean
tanzu plugin install --local cli all
tanzu plugin list

Enable bash auto completion for tanzu cli

source <(tanzu completion bash)

Enable bash auto completion for kubectl

source <(kubectl completion bash)

Install Carvel Tools – ytt, kapp, kbld and imgpkg

Install ytt

gunzip ytt-linux-amd64-v0.31.0+vmware.1.gz
chmod ugo+x ytt-linux-amd64-v0.31.0+vmware.1
mv ./ytt-linux-amd64-v0.31.0+vmware.1 /usr/local/bin/ytt

Install kapp

gunzip kapp-linux-amd64-v0.36.0+vmware.1.gz
chmod ugo+x kapp-linux-amd64-v0.36.0+vmware.1
mv ./kapp-linux-amd64-v0.36.0+vmware.1 /usr/local/bin/kapp

Install kbld

gunzip kbld-linux-amd64-v0.28.0+vmware.1.gz
chmod ugo+x kbld-linux-amd64-v0.28.0+vmware.1
mv ./kbld-linux-amd64-v0.28.0+vmware.1 /usr/local/bin/kbld

Install imgpkg

gunzip imgpkg-linux-amd64-v0.5.0+vmware.1.gz
chmod ugo+x imgpkg-linux-amd64-v0.5.0+vmware.1
mv ./imgpkg-linux-amd64-v0.5.0+vmware.1 /usr/local/bin/imgpkg
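
If you prefer not to repeat the same three commands four times, the Carvel tools can also be handled in a small loop (a sketch that assumes all four .gz files from the CLI bundle are sitting in the current /tanzu directory and match these version patterns):

for f in ytt kapp kbld imgpkg; do
  gunzip ${f}-linux-amd64-v*.gz
  chmod ugo+x ${f}-linux-amd64-v*
  mv ./${f}-linux-amd64-v* /usr/local/bin/${f}
done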

Stop Docker

systemctl stop docker

Upgrade software on Photon OS

tdnf -y upgrade docker

Start Docker

systemctl start docker

Add your SSH public key so that you can log in with your private key. Please see this link for a guide. Log out and use your private key to log in.

vi ~/.ssh/authorized_keys

Start TKG UI Installer

tanzu management-cluster create --ui --bind <ip-of-your-photon-vm>:8080 --browser none

Updating Let’s Encrypt SSL Certificates for the Avi Controller

Again, you’ll need the fullchain.pem file but with the appended DST Root CA X3 certificate that was prepared in this article.

Navigate to Templates and then under Security, click on the SSL/TLS tab.

First we need to import each of the CA certificates in the chain before we import the certificate for the Avi Controller.

Again the certificates in the fullchain.pem file in order are

Subscriber Certificate
R3 Certificate
ISRG Root X1 Certificate
DST Root CA X3 Certificate

Click on CREATE, Root/Intermediate CA Certificate. Then import each certificate individually starting from the bottom. Click on Validate and then Import.

Do this again for the other two certificates, the ISRG Root X1 certificate and then the R3 intermediate certificate. Once done, you’ll see the following.

The Subscriber certificate is done differently.

Click on CREATE, Controller Certificate. Then give the certificate a name, click on the Import option and browse to the fullchain.pem file and also the privkey.pem file. A passphrase is not required as Let’s Encrypt does not create a passphrase. Click on Validate and then Import.

Once done, you’ll see the following.

Now that we’ve imported the Let’s Encrypt CA certificates, we can proceed to change the SSL certificate used by the Avi Controller for HTTPS web management.

Navigate to Administration, Settings, Access Settings, then click on the pencil icon.

Delete all of the current certificates in the SSL/TLS Certificate box and then select the new Subscriber certificate that we imported earlier; in my case I named it star-vmwire-com.

Once you press Save, you can close the browser session and open up a new one to start enjoying secure connections to your Avi Controller.

Updating Let’s Encrypt SSL Certificates for NSX-T Manager

Updating NSX-T Manager to use a CA signed SSL certificate is a little bit different from how we updated the vCenter certificate. It requires interacting with the NSX-T API.

First lets import the certificate into NSX-T. Again, you’ll need the fullchain.pem file but with the appended DST Root CA X3 certificate that was prepared in this article.

Navigate to System and then under Settings, click on the Certificates link.

First we need to import each of the CA certificates in the chain before we import the certificate for NSX-T Manager.

Again the certificates in the fullchain.pem file in order are

Subscriber Certificate
R3 Certificate
ISRG Root X1 Certificate
DST Root CA X3 Certificate

Click on IMPORT, Import CA Certificate. Then import each certificate individually starting from the bottom. Make sure to deselect the Service Certificate slider, as we are not using these certificates for virtual services.

It's important to import from the bottom up, as this enables NSX-T to check the issuer of each subsequent certificate that you import. So import in reverse order of the fullchain.pem file, starting with this order:

DST Root CA X3 Certificate
ISRG Root X1 Certificate
R3 Certificate
Subscriber Certificate

Once you've imported all three of the CA root and intermediate certificates (DST Root CA X3, ISRG Root X1 and R3), you can then import the Subscriber Certificate *.vmwire.com last. Once all done, you'll see the following.

Summarized in the following table.

Order in fullchain.pem       Name in NSX-T     Issued By
Subscriber Certificate       star-vmwire-com   R3
R3 Certificate               R3                ISRG Root X1
ISRG Root X1 Certificate     ISRG Root X1      DST Root CA X3
DST Root CA X3 Certificate   DST Root CA X3    DST Root CA X3

You’ll need the certificate ID for the certificate star-vmwire-com to use to update the NSX-T Manager certificate.

Click on the ID column of that certificate and copy the ID to your clipboard.

Now you’ll need to open a tool such as Postman to make the change.

First, let's validate that our certificate is OK by using this GET against the NSX-T API; paste the certificate ID into the URL.

GET https://nsx.vmwire.com/api/v1/trust-management/certificates/21fd7e8a-3a2e-4938-9dc7-5f3eccd791e7/?action=validate

If the status is “OK”, we’re good to continue.

Next, we will POST the certificate ID against the following URL.

POST https://nsx.vmwire.com/api/v1/node/services/http?action=apply_certificate&certificate_id=21fd7e8a-3a2e-4938-9dc7-5f3eccd791e7
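
If you prefer curl over Postman, the equivalent calls look roughly like this (hypothetical admin user; -k is only there because my NSX-T Manager was still presenting an untrusted certificate at this point):

# Validate the certificate
curl -k -u admin 'https://nsx.vmwire.com/api/v1/trust-management/certificates/21fd7e8a-3a2e-4938-9dc7-5f3eccd791e7/?action=validate'

# Apply it to the HTTP (web UI/API) service
curl -k -u admin -X POST 'https://nsx.vmwire.com/api/v1/node/services/http?action=apply_certificate&certificate_id=21fd7e8a-3a2e-4938-9dc7-5f3eccd791e7'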

Once done, close your NSX-T Manager browser session, and enjoy using a CA signed certificate with NSX-T.

Updating Let’s Encrypt SSL Certificates for vCenter Server

I prefer to use wildcard certificates for my environment to reduce the number of certificates that I need to manage. This is because Let's Encrypt limits certificate validity to 90 days, which means that you'll need to renew each certificate every 90 days or so. Using a wildcard certificate reduces your operational overhead. However, vCenter does not support wildcard certificates.

After you’ve prepped the fullchain.pem file according to the previous article, you can now update the vCenter SSL certificate using vCenter’s Certificate Management tool.

Navigate to Menu then Administration and click on Certificate Management.

Under the Machine SSL Certificate, click on Actions and choose Import and Replace Certificate.

Select the Replace with external CA certificate (requires private key).

Copy the section for the Subscriber Certificate part into the Machine SSL Certificate box, and then the rest into the Chain of trusted root certificates box.
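
If you're not sure where the Subscriber certificate ends inside fullchain.pem, a small sketch like this splits the first certificate from the rest so that you can paste each part into the right box. It assumes the standard PEM layout produced by certbot; the output file names are arbitrary.

# first certificate block = the Subscriber (leaf) certificate, for the Machine SSL Certificate box
awk '/BEGIN CERTIFICATE/{i++} i==1' fullchain.pem > machine-ssl.pem

# everything after the first block = the chain of trusted root certificates box
awk '/BEGIN CERTIFICATE/{i++} i>1' fullchain.pem > trusted-chain.pem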

Copy the contents of the privkey.pem file into the Private Key box.

Once you click on Replace, vCenter will restart its services and you can open a new browser window to the FQDN of vCenter and enjoy a secured vCenter session.

Preparing Let’s Encrypt SSL Certificates for vCenter, NSX-T Manager and Avi Controller

The Let’s Encrypt DST Root CA X3 certificate is missing from the fullchain.pem and chain.pem files, therefore errors such as the following prevent certificates from being imported by VMware appliances such as NSX-T and vCenter.

This post summarizes how to fix this issue.

Let's Encrypt is a great service that provides free SSL certificates. I recently rebuilt my lab and decided to use SSL certs for my management appliances. However, none of the management appliances would accept the certificates issued by Let's Encrypt due to an incomplete chain. This post summarizes how to fix this issue.

[Updated here]

TL;DR the Let’s Encrypt DST Root CA X3 certificate is missing from the fullchain.pem and chain.pem files, therefore errors such as the following prevent certificates from being imported by VMware appliances such as NSX-T and vCenter.

Certificate chain validation failed. Make sure a valid chain is provided in order leaf,intermediate,root certificate. (Error code: 2076)

Get your certbot tool up and running; you can read more with this link.

Grab your files from the /etc/letsencrypt/live folder for your vCenter certificate. Mine is in /etc/letsencrypt/live/vcenter.vmwire.com

You should now have the following files.

cert.pem
chain.pem
fullchain.pem
privkey.pem

A note on the Let's Encrypt certificate chain. If you look at the Certification Path for Let's Encrypt certificates, you'll notice something like this.

figure 1.

vcenter.vmwire.com is issued by the R3 CA certificate. This is Let’s Encrypt’s Intermediate certificate.

R3 is issued by the DST Root CA X3 certificate. This is the Let's Encrypt root certificate.

Then the DST Root CA X3 certificate needs to be trusted by all of our management appliances, vCenter, NSX-T and Avi Controller.

What I found is that this is not the case, and trying to import a Let's Encrypt certificate without the root certificate that issued the DST Root CA X3 certificate will fail. Here's an example from NSX-T when importing the chain.pem certificate.

figure 2. Importing the chain.pem certificate to NSX

The chain.pem file contains the R3 certificate and the DST Root CA X3 certificate. When you open it in notepad++ it looks like this.

figure 3. chain.pem

So we have a problem. We need the certificate that issued the DST Root CA X3 certificate to complete the chain and pass the chain validation.

Lets take a look at Let’s Encrypt certificates on their website.

ISRG Certificate Hierarchy Diagram, as of December 2020

So looking up the chain, it appears that my certificate vcenter.vmwire.com corresponds to the Subscriber Cert, which is issued by R3. This confirms the assumptions above in figure 1. However, it looks like the R3 certificate is not issued by the DST Root CA X3 certificate but in fact by another certificate named ISRG Root X1.

Let's test this theory and import each of the certificates in the chain.pem file individually using NSX-T.

After importing, you can see that this is in fact the ISRG Root X1 certificate that is issued by the DST Root CA X3 certificate. My assumption from figure 3 is then incorrect.

So what is the top certificate in the chain.pem file?

Let's import it and find out. Yup, it's the R3 certificate.

So where is the DST Root CA X3 certificate that we need to complete the validation chain?

We can obtain this from the Let’s Encrypt website. Scroll all the way down to the bottom of that page and you’ll see the following:

Clicking on that link will get you to the following page with this link.

And we will get closer to our DST Root CA X3 certificate when we click on that link above.

Clicking on that link gets us to this page here.

Then clicking on that link will get us to this page here.

We can now grab our certificate with this link highlighted here.

When you click on this link, you'll be able to download a file named 8395.crt; this is the DST Root CA X3 certificate that we need to complete the chain. However, it is in .crt format and we need to work with .pem.

To convert a crt certificate to pem use the following command.

openssl x509 -in 8395.crt -out 8395.pem -outform PEM

Once we import it into NSX-T we can see that it is indeed our missing certificate.

Looking at all the certificates in NSX-T, it all looks good.

You can see from my terrible Apple Pencil skills that it all now looks good.

So how do we use all these certificates together?

Well, the easiest approach is to take the fullchain.pem file, which contains the following certificates in this order.

Subscriber Certificate
R3 Certificate
ISRG Root X1 Certificate

That means we just need to append our new DST Root CA X3 certificate to the bottom of the fullchain.pem file to get a valid chain. It will now look like this.

Subscriber Certificate
R3 Certificate
ISRG Root X1 Certificate
DST Root CA X3 Certificate

Once we import this new fullchain into NSX-T, it goes through successfully.
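
On the certbot machine, the append itself is a one-liner. The sketch below assumes the 8395.pem file converted above and that you are working in the live folder that contains fullchain.pem.

# keep a copy of the original chain, just in case
cp fullchain.pem fullchain.pem.bak

# append the DST Root CA X3 certificate to the bottom of the chain
cat 8395.pem >> fullchain.pem

# sanity check: the file should now contain four certificates
grep -c "BEGIN CERTIFICATE" fullchain.pem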

Now that we have a working fullchain.pem, we can use that file or its contents to import these signed certificates into our management appliances.

Read on – the following articles summarize how to use these certificates with vCenter, NSX-T Manager and the Avi Controller.

Updating vCenter to use Let’s Encrypt certificates.

Updating NSX-T to use Let’s Encrypt certificates.

Updating Avi Controller to use Let’s Encrypt certificates.

Playing with Tanzu – persistent volume claims, deployments & services

Deploying your first pod with a persistent volume claim and service on vSphere with Tanzu. With sample code for you to try.

Learning the k8s ropes…

This is not a how-to article to get vSphere with Tanzu up and running; there are plenty of guides out there, here and here. This post is more of a "let's have some fun with Kubernetes now that I have a vSphere with Tanzu cluster to play with".

Answering the following question would be a good start to get to grips with understanding Kubernetes from a VMware perspective.

How do I do things that I did in the past in a VM but now do it with Kubernetes in a container context instead?

For example building the certbot application in a container instead of a VM.

Let's try to create an Ubuntu deployment that deploys one Ubuntu container into a vSphere Pod with persistent storage and a load balancer service from NSX-T to get to the /bin/bash shell of the deployed container.

Let’s go!

I created two yaml files for this, accessible from Github. You can read up on what these objects are here.

Filename | What's it for? | What does it do? | Github link
certbot-deployment.yaml | k8s deployment specification | Deploys one ubuntu pod, claims a 16Gb volume, mounts it to /mnt/sdb and creates a load balancer to enable remote management with ssh. | ubuntu-deployment.yaml
certbot-pvc.yaml | persistent volume claim specification | Creates a persistent volume of 16Gb size from the underlying vSphere storage class named tanzu-demo-storage. The PVC is then consumed by the deployment. | ubuntu-pvc.yaml
Table 1. The only two files that you need.

Here's the certbot-deployment.yaml file that shows the required fields and object spec for a Kubernetes Deployment and its LoadBalancer Service.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: certbot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: certbot
  template:
    metadata:
      labels:
        app: certbot
    spec:
      volumes:
      - name: certbot-storage
        persistentVolumeClaim:
          claimName: certbot-pvc
      containers:
      - name: ubuntu
        image: ubuntu:latest
        command: ["/bin/sleep", "3650d"]
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - mountPath: "/mnt/sdb"
          name: certbot-storage
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: certbot
  name: certbot
spec:
  ports:
  - port: 22
    protocol: TCP
    targetPort: 22
  selector:
    app: certbot
  sessionAffinity: None
  type: LoadBalancer

Here’s the certbot-pvc.yaml file that shows the required fields and object spec for a Kubernetes Persistent Volume Claim.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: certbot-pvc
  labels:
    storage-tier: tanzu-demo-storage
    availability-zone: home
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: tanzu-demo-storage
  resources:
    requests:
        storage: 16Gi

First deploy the persistent volume claim with this command:

kubectl apply -f certbot-pvc.yaml

Then deploy the deployment with this command:

kubectl apply -f certbot-deployment.yaml

Magic happens and you can monitor the vSphere client and kubectl for status. Here are a couple of screenshots to show you what's happening.

kubectl describe deployment certbot
Name:                   certbot
Namespace:              new
CreationTimestamp:      Thu, 11 Mar 2021 23:40:25 +0200
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=certbot
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=certbot
  Containers:
   ubuntu:
    Image:      ubuntu:latest
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sleep
      3650d
    Environment:  <none>
    Mounts:
      /mnt/sdb from certbot-storage (rw)
  Volumes:
   certbot-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  certbot-pvc
    ReadOnly:   false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   certbot-68b4747476 (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  44m   deployment-controller  Scaled up replica set certbot-68b4747476 to 1
kubectl describe pvc
Name:          certbot-pvc
Namespace:     new
StorageClass:  tanzu-demo-storage
Status:        Bound
Volume:        pvc-418a0d4a-f4a6-4aef-a82d-1809dacc9892
Labels:        availability-zone=home
               storage-tier=tanzu-demo-storage
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
               volumehealth.storage.kubernetes.io/health: accessible
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      16Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Mounted By:    certbot-68b4747476-pq5j2
Events:        <none>
kubectl get deployments
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
certbot     1/1     1            1           47m


kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
certbot-68b4747476-pq5j2     1/1     Running   0          47m


kubectl get pvc
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
certbot-pvc   Bound    pvc-418a0d4a-f4a6-4aef-a82d-1809dacc9892   16Gi       RWO            tanzu-demo-storage   84m

Let’s log into our pod, note the name from the kubectl get pods command above.

certbot-68b4747476-pq5j2

It's not yet possible to log into the pod using SSH since this is a fresh container that does not have SSH installed, so let's log in first using kubectl and install SSH.

kubectl exec --stdin --tty certbot-68b4747476-pq5j2 -- /bin/bash

You will then be inside the container at the /bin/bash prompt.

root@certbot-68b4747476-pq5j2:/# ls
bin   dev  home  lib32  libx32  mnt  proc  run   srv  tmp  var
boot  etc  lib   lib64  media   opt  root  sbin  sys  usr
root@certbot-68b4747476-pq5j2:/#

Let's install some tools and configure ssh.

# refresh the package lists
apt-get update
# install basic network tooling
apt-get install iputils-ping
# install the OpenSSH client and server
apt-get install ssh

# set a root password so that we can log in over SSH
passwd root

# restart the SSH service to pick up the change
service ssh restart

# leave the container shell
exit

Before we can log into the container over an SSH connection, we need to find out what the external IP is for the SSH service that the NSX-T load balancer configured for the deployment. You can find this using the command:

kubectl get services

kubectl get services
NAME        TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
certbot     LoadBalancer   10.96.0.44   172.16.2.3    22:31731/TCP   51m

The IP that we use to get to the Ubuntu container over SSH is 172.16.2.3. Let's try that with a PuTTY/terminal session…

login as: root
certbot@172.16.2.3's password:
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 4.19.126-1.ph3-esx x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.


The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

$ ls
bin   dev  home  lib32  libx32  mnt  proc  run   srv  tmp  var
boot  etc  lib   lib64  media   opt  root  sbin  sys  usr
$ df
Filesystem     1K-blocks   Used Available Use% Mounted on
overlay           258724 185032     73692  72% /
/mnt/sdb        16382844  45084  16321376   1% /mnt/sdb
tmpfs             249688     12    249676   1% /run/secrets/kubernetes.io/servic                         eaccount
/dev/sda          258724 185032     73692  72% /dev/termination-log
$

You can see that there is a 16Gb mount point at /mnt/sdb, just as we specified in the spec, and remote SSH access is working.
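
When you're finished playing, the same two files can be used to tear everything down again. Note that, with a storage class using the default Delete reclaim policy, removing the PVC also removes the backing volume, so only do this if you no longer need the data.

# remove the deployment and its load balancer service
kubectl delete -f certbot-deployment.yaml

# remove the persistent volume claim (and, depending on the reclaim policy, the backing volume)
kubectl delete -f certbot-pvc.yaml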

How to import existing infrastructure into Terraform management

This post will focus on how to import existing infrastructure into Terraform's management. One scenario where this could happen is that you've already deployed infrastructure and have only recently started to look into infrastructure as code; maybe you've tried to use PowerShell, Ansible and other tools, but none are quite as declarative as Terraform.

Terraform is a great framework to use to start developing and working with infrastructure-as-code to manage resources. It provides awesome benefits such as extremely fast deployment through automation, managing configuration drift, adding configuration changes and destroying entire environments with a few key strokes. Plus it supports many providers so you can easily use the same code logic to deploy and manage different resources, for example on VMware clouds, AWS or Azure at the same time.

For more information if you haven’t looked at Terraform before, please take a quick run through HashiCorp’s website:

https://www.terraform.io/

Getting started with Terraform is really quite simple when the environment that you are starting to manage is green-field, that is, you are starting from a completely fresh deployment on Day 0. If we take AWS as an example, this is as fresh as signing up to the AWS free tier with a new account and having nothing deployed in your AWS console.

Terraform has a few simple files that are used to build and manage infrastructure through code: the configuration and the state. These are the basic building blocks of Terraform. There are other files and concepts that could be used, such as variables and modules, but I won't cover these in much detail in this post.

How do you bring in infrastructure that is already deployed into Terraform’s management?

This post will focus on how to import existing infrastructure (brown-field) into Terraform's management. One scenario where this could happen is that you've already deployed infrastructure and have only recently started to look into infrastructure as code; maybe you've tried to use PowerShell, Ansible and other tools, but none are quite as useful as Terraform.

Assumptions

First, let's assume that you've deployed the Terraform CLI or are already using Terraform Cloud; the concepts are pretty much the same. I will be using the Terraform CLI for the examples in this post together with AWS. I'm also going to assume that you know how to obtain access and secret keys from your AWS Console.

By all means this import method works with any supported Terraform provider, including all the VMware ones. For this exercise, I will work with AWS.

My AWS environment consists of the following infrastructure; yours will be different of course, and I'm using the infrastructure below in the examples.

You will need to obtain the AWS resource IDs from your environment, use the AWS Console or API to obtain this information.

# | Resource | Name | AWS Resource ID
1 | VPC | VPC | vpc-02d890cacbdbaaf87
2 | PublicSubnetA | PublicSubnetA | subnet-0f6d45ef0748260c6
3 | PublicSubnetB | PublicSubnetB | subnet-092bf59b48c62b23f
4 | PrivateSubnetA | PrivateSubnetA | subnet-03c31081bf98804e0
5 | PrivateSubnetB | PrivateSubnetB | subnet-05045746ac7362070
6 | IGW | IGW | igw-09056bba88a03f8fb
7 | NetworkACL | NACL | acl-0def8bcfeff536048
8 | RoutePublic | PublicRoute | rtb-082be686bca733626
9 | RoutePrivate | PrivateRoute | rtb-0d7d3b5eacb25a022
10 | Instance1 | Instance1 | i-0bf15fecd31957129
11 | elb | elb-UE360LJ7779C | elb-158WU63HHVD3
12 | SGELB | ELBSecurityGroup | sg-0b8f9ee4e1e2723e7
13 | SGapp | AppServerSecurityGroup | sg-031fadbb59460a776
Table 1. AWS Resource IDs

But I used CloudFormation to deploy my infrastructure…

If you used CloudFormation to deploy your infrastructure and you now want to use Terraform, then you will need to update the CloudFormation deletion policy to retain before bringing any resources into Terraform. This is important, as any accidental deletion or change with the CloudFormation stack would impact your Terraform configuration and state. I recommend setting this policy before importing resources with Terraform.

This link has some more information that will help you enable the deletion policy on all resources.

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-deletionpolicy.html

For example to change a CloudFormation configuration with the deletion policy enabled, the code would look like this:

Resources:


    VPC:
      Type: AWS::EC2::VPC
      DeletionPolicy: Retain
      Properties:
        CidrBlock: 10.0.0.0/16
        InstanceTenancy: default
        EnableDnsSupport: 'true'
        EnableDnsHostnames: 'true'

Let's get started!

Set up your main.tf configuration file for a new project that will import an existing AWS infrastructure. The first version of our main.tf file will look like this, with the only resource that we will import being the VPC. It's always good to work with a single resource first to ensure that your import works before going all out and importing all the rest.

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = "3.28.0"
    }
  }
}

provider "aws" {
  # Configuration options
  region = "eu-west-1"
  access_key = "my_access_key"
  secret_key = "my_secret_key"
}

resource "aws_vpc" "VPC" {
  # (resource arguments)
}

Run the following to initialize the AWS provider in Terraform.

terraform init

Import the VPC resource with this command in your terminal

terraform import aws_vpc.VPC vpc-02d890cacbdbaaf87

You can then review the terraform state file, it should be named terraform.tfstate, and it will look something like this. (Open it in a text editor).

{
  "version": 4,
  "terraform_version": "0.14.6",
  "serial": 13,
  "lineage": "xxxx",
  "outputs": {},
  "resources": [    {
  "mode": "managed",
      "type": "aws_vpc",
      "name": "VPC",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 1,
          "attributes": {
            "arn": "xxxx",
            "assign_generated_ipv6_cidr_block": false,
            "cidr_block": "10.0.0.0/16",
            "default_network_acl_id": "acl-067e11c10e2327cc9",
            "default_route_table_id": "rtb-0a55b9e1683991242",
            "default_security_group_id": "sg-0db58c5c159b1ebf9",
            "dhcp_options_id": "dopt-7d1b121b",
            "enable_classiclink": false,
            "enable_classiclink_dns_support": false,
            "enable_dns_hostnames": true,
            "enable_dns_support": true,
            "id": "vpc-02d890cacbdbaaf87",
            "instance_tenancy": "default",
            "ipv6_association_id": "",
            "ipv6_cidr_block": "",
            "main_route_table_id": "rtb-0a55b9e1683991242",
            "owner_id": "xxxxxxx",
            "tags": {
              "Name": "VPC",
              "environment": "aws",
              "project": "Imported by Terraform"
            }
          },
          "sensitive_attributes": [],
          "private": "xxxxxx"
        }
      ]
    }
  ]
}

Notice that the VPC and all of the VPC settings have now been imported into Terraform.

Now that we have successfully imported the VPC, we can continue and import the rest of the infrastructure. The remaining AWS services we need to import are detailed in Table 1. AWS Resource IDs.

To import the remaining infrastructure we need to add the code to the main.tf file to import the other resources. Edit your main.tf so that it looks like this. Notice that all thirteen resources are defined in the configuration file and the resource arguments are all empty. We will update the resource arguments later; initially we just need to import the resources into the Terraform state and then update the configuration with the known state.

Terraform does not support automatic creation of a configuration out of a state.

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = "3.28.0"
    }
  }
}

provider "aws" {
  # Configuration options
  region = "eu-west-1"
  access_key = "my_access_key"
  secret_key = "my_secret_key"
}

resource "aws_vpc" "SAVPC" {
  # (resource arguments)
}

resource "aws_subnet" "PublicSubnetA" {
  # (resource arguments)
}

resource "aws_subnet" "PublicSubnetB" {
  # (resource arguments)
}

resource "aws_subnet" "PrivateSubnetA" {
  # (resource arguments)
}

resource "aws_subnet" "PrivateSubnetB" {
  # (resource arguments)
}

resource "aws_internet_gateway" "IGW" {
  # (resource arguments)
}

resource "aws_network_acl" "NACL" {
  # (resource arguments)
}

resource "aws_route_table" "PublicRoute" {
  # (resource arguments)
}

resource "aws_route_table" "PrivateRoute" {
  # (resource arguments)
}

resource "aws_instance" "Instance1" {
  # (resource arguments)
}

resource "aws_elb" "elb-UE360LJ7779C" {
  # (resource arguments)
}

resource "aws_security_group" "ELBSecurityGroup" {
  # (resource arguments)
}

resource "aws_security_group" "AppServerSecurityGroup" {
  # (resource arguments)
}

Run the following commands in your terminal to import the remaining resources into Terraform.

terraform import aws_subnet.PublicSubnetA subnet-0f6d45ef0748260c6
terraform import aws_subnet.PublicSubnetB subnet-092bf59b48c62b23f
terraform import aws_subnet.PrivateSubnetA subnet-03c31081bf98804e0
terraform import aws_subnet.PrivateSubnetB subnet-05045746ac7362070
terraform import aws_internet_gateway.IGW igw-09056bba88a03f8fb
terraform import aws_network_acl.NACL acl-0def8bcfeff536048
terraform import aws_route_table.PublicRoute rtb-082be686bca733626
terraform import aws_route_table.PrivateRoute rtb-0d7d3b5eacb25a022
terraform import aws_instance.Instance1 i-0bf15fecd31957129
terraform import aws_elb.elb-158WU63HHVD3 elb-158WU63HHVD3
terraform import aws_security_group.ELBSecurityGroup sg-0b8f9ee4e1e2723e7
terraform import aws_security_group.AppServerSecurityGroup sg-031fadbb59460a776

Now that all thirteen resources are imported, you will need to manually update the configuration file, in our case main.tf, with the resource arguments that correspond to the current state of all the resources that were just imported. The easiest way to do this is to first take a look at the Terraform provider for AWS documentation to find the mandatory fields that are needed. Let's use aws_subnet as an example:

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/subnet

From the documentation we need two things

cidr_block – (Required) The CIDR block for the subnet.

vpc_id – (Required) The VPC ID.

We know that we need these two as a minimum, but what if there are other configuration items that were done in the AWS Console or CloudFormation before you started to work with Terraform? An example of this is of course tags and other configuration parameters. You want to update your main.tf file with the same configuration as what was just imported into the state. This is very important.

To do this, do not use the terraform.tfstate but instead run the following command.

terraform show

You'll get an output of the current state of your AWS environment, from which you can copy and paste the resource arguments into your main.tf configuration.

I won't cover how to do all thirteen resources in this post, so I'll again use one of the aws_subnet resources as the example. Here is the PublicSubnetA aws_subnet resource information copied and pasted straight out of the terraform show command.

# aws_subnet.PublicSubnetA:
resource "aws_subnet" "PublicSubnetA" {
    arn                             = "arn:aws:ec2:eu-west-1:xxxx:subnet/subnet-0f6d45ef0748260c6"
    assign_ipv6_address_on_creation = false
    availability_zone               = "eu-west-1a"
    availability_zone_id            = "euw1-az2"
    cidr_block                      = "10.0.0.0/24"
    id                              = "subnet-0f6d45ef0748260c6"
    map_customer_owned_ip_on_launch = false
    map_public_ip_on_launch         = true
    owner_id                        = "xxxx"
    tags                            = {
        "Name"        = "PublicSubnetA"
        "environment" = "aws"
        "project"     = "my_project"
    }
    vpc_id                          = "vpc-02d890cacbdbaaf87"

    timeouts {}
}

Not all resource arguments are needed; again, review the documentation. Here is an example of my changes to the main.tf file with some of the settings taken from the output of the terraform show command.

resource "aws_subnet" "PublicSubnetA" {
    assign_ipv6_address_on_creation = false
    cidr_block                      = var.cidr_block_PublicSubnetA
    map_public_ip_on_launch         = true
    tags                            = {
        Name        = "PublicSubnetA"
        environment = "aws"
        project     = "my_project"
    }
    vpc_id                          = var.vpc_id

    timeouts {}
}

Notice that I have turned the values for cidr_block and vpc_id into variables.

Using Variables

Using variables simplifies a lot of your code. I'm not going to explain what these are in this post; you can read up on them with this link:

https://www.terraform.io/docs/language/values/variables.html

However, the contents of my terraform.tfvars file looks like this:

cidr_block = "10.0.0.0/16"
vpc_id = "vpc-02d890cacbdbaaf87"
cidr_block_PublicSubnetA = "10.0.0.0/24"
cidr_block_PublicSubnetB = "10.0.1.0/24"
cidr_block_PrivateSubnetA = "10.0.2.0/24"
cidr_block_PrivateSubnetB = "10.0.3.0/24"
instance_type = "t2.micro"
ami_id = "ami-047bb4163c506cd98"
instance_port = "80"
instance_protocol = "http"
lb_port = "80"
lb_protocol = "http"

Just place your terraform.tfvars file in the same location as your main.tf file. Terraform automatically picks up the default file, or you can reference a different variable file; again, refer to the documentation.
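
One thing to remember is that every variable referenced in the configuration also needs a declaration, for example in a variables.tf. The original files for this environment aren't shown here, so the snippet below is only a minimal sketch of such declarations for the two variables used in the subnet example above, not the actual file.

# minimal sketch: declare the two variables referenced in the PublicSubnetA example
cat > variables.tf <<'EOF'
variable "vpc_id" {
  type        = string
  description = "ID of the imported VPC"
}

variable "cidr_block_PublicSubnetA" {
  type        = string
  description = "CIDR block for PublicSubnetA"
}
EOF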

Finalizing the configuration

Once you’ve updated your main.tf configuration with all the correct resource arguments, you can test to see if what is in the configuration is the same as what is in the state. To do this run the following command:

terraform plan

If you copied, pasted and updated your main.tf correctly, then you will get output from your terminal similar to the following:

terraform plan
[ Removed content to save space ]

No changes. Infrastructure is up-to-date.

This means that Terraform did not detect any differences between your
configuration and real physical resources that exist. As a result, no
actions need to be performed.

Congratulations, you’ve successfully imported an infrastructure that was built outside of Terraform.

You can now proceed to manage your infrastructure with Terraform. For example, changing the following terraform.tfvars parameters:

lb_port = "443"
lb_protocol = "https"

And then running plan and apply will update the elastic load balancer elb-158WU63HHVD3's health check from port 80 to port 443.

terraform plan
[ removed content to save space ]
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_elb.elb-158WU63HHVD3 will be updated in-place
  ~ resource "aws_elb" "elb-158WU63HHVD3" {
      ~ health_check {
          ~ target              = "TCP:80" -> "TCP:443"           
        }
    }
terraform apply
[ content removed to save space] 

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.

State path: terraform.tfstate

And that's how you import existing resources into Terraform. I hope you find this post useful. Please comment below if you have a better method or any suggestions for improvements, and feel free to comment if you have questions and need help.

Reducing HCX on-premises appliances resources

When HCX is deployed, there are three appliances that are deployed as part of the Service Mesh. If you're testing or deploying these in a nested lab, the resource requirements may be too high for your infrastructure. This post shows you how you can edit the OVF appliances to be deployed with lower resource requirements.

When HCX is deployed there are three appliances that are deployed as part of the Service Mesh. These are detailed below.

Appliance | Role | vCPU | Memory (GB)
IX | Interconnect appliance | 8 | 3
NE | L2 network extension appliance | 8 | 3
WO | WAN optimization appliance | 8 | 14
Total |  | 24 | 20

As you can see, these three appliances require a lot of resources just for one Service Mesh. A Service Mesh is created on a 1:1 basis between source and destination. If you connected your on-premises environment to another destination, you would need another service mesh.

For example, if you had the following hybrid cloud requirements:

Service Mesh | Source site | Destination site | vCPUs | Memory (GB)
1 | On-premises | VCPP Provider | 24 | 20
2 | On-premises | VMware Cloud on AWS | 24 | 20
3 | On-premises | Another on-premises | 24 | 20
Total |  |  | 72 | 60

As you can see, resource requirements will add up.

Disclaimer: The following is unsupported by VMware. Reducing vCPU and memory on any of the HCX appliances will impact HCX services.

  1. Log into your HCX manager appliance with the admin account.
  2. Do a su - to gain root access, use the same password.
  3. Go into the /common/appliances directory.
  4. Here you'll see folders for sp and vcc; these are the only two that you need to work in.
  5. First let's start with sp; sp stands for Silverpeak, which is what runs the WAN optimization.
  6. Go into the /common/appliances/sp/7.3.9.0 directory.
  7. vi the file VX-0000-7.3.9.0_62228.ovf.
  8. Go to the section where virtual CPUs and memory are configured and change it to the following. (I find that reducing to four vCPUs and 7GB RAM for the WO appliance is good.)
       <Item>
         <rasd:AllocationUnits>hertz * 10^6</rasd:AllocationUnits>
         <rasd:Description>Number of Virtual CPUs</rasd:Description>
         <rasd:ElementName>4 virtual CPU(s)</rasd:ElementName>
         <rasd:InstanceID>1</rasd:InstanceID>
         <rasd:Reservation>1000</rasd:Reservation>
         <rasd:ResourceType>3</rasd:ResourceType>
         <rasd:VirtualQuantity>4</rasd:VirtualQuantity>
       </Item>
       <Item>
         <rasd:AllocationUnits>byte * 2^20</rasd:AllocationUnits>
         <rasd:Description>Memory Size</rasd:Description>
         <rasd:ElementName>7168MB of memory</rasd:ElementName>
         <rasd:InstanceID>2</rasd:InstanceID>
         <rasd:Reservation>2000</rasd:Reservation>
         <rasd:ResourceType>4</rasd:ResourceType>
         <rasd:VirtualQuantity>7168</rasd:VirtualQuantity>
       </Item> 

Next, configure the IX and L2E appliances; these are both contained in the vcc directory.

  1. go to the /common/appliances/vcc/3.5.0 directory
  2. vi the vcc-va-large-3.5.3-17093722.ovf file, again changing the vCPU to 4 and leaving RAM at 3 GB.
  <Item>
         <rasd:AllocationUnits>hertz * 10^6</rasd:AllocationUnits>
         <rasd:Description>Number of Virtual CPUs</rasd:Description>
         <rasd:ElementName>4 virtual CPU(s)</rasd:ElementName>
         <rasd:InstanceID>1</rasd:InstanceID>
         <rasd:ResourceType>3</rasd:ResourceType>
         <rasd:VirtualQuantity>4</rasd:VirtualQuantity>
       </Item> 
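
A quick way to double-check your edits before creating the Service Mesh is to grep the OVF files for the values you changed. The paths and file names below are the ones from this example and may differ in your HCX build.

# confirm the new vCPU and memory values in the WAN optimization appliance OVF
grep -A6 "Number of Virtual CPUs\|Memory Size" /common/appliances/sp/7.3.9.0/VX-0000-7.3.9.0_62228.ovf | grep VirtualQuantity

# confirm the new vCPU value in the IX/L2E appliance OVF
grep -A5 "Number of Virtual CPUs" /common/appliances/vcc/3.5.0/vcc-va-large-3.5.3-17093722.ovf | grep VirtualQuantity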

Once you save your changes and create a Service Mesh, you will notice that the new appliances will be deployed with reduced virtual hardware requirements.

Hope this helps your testing with HCX!

Upgrade Cloud Director to 10.2

A very quick post on how to upgrade your Cloud Director cluster to 10.2. This post contains some shortcuts that remove some repetitive steps.

Download the latest Cloud Director 10.2 from my.vmware.com. You can read more about the latest updates in the release notes.

Primarily, VCD 10.2 brings some significant new functionality, including:

  1. NSX-T Advanced functional parity – NSX Advanced Load Balancer (Avi), Distributed Firewall, VRF-lite, Cross VDC networking, IPv6, Dual stack (IPv4/IPv6) on the same network, SLAAC, DHCPv6, CVDS (vSphere 7.0/NSX-T 3.0), L2VPN – API only.
  2. vSphere with Kubernetes support – Provider and tenant UI for managing and consuming Kubernetes clusters.

Start on the primary appliance

  1. You can find which appliance is primary by going to https://<ip-of-vcd-appliance>:5480.
  2. Copy the appliance update package to one of the appliances, directly into the transfer share so that you don’t have to do this for all the appliances in your cluster.
  3. Once copied do the following on the first primary appliance.
root@vcd01 [ /opt/vmware/vcloud-director/data/transfer ]# ls
VMware_Cloud_Director_10.2.0.5190-17029810_update.tar.gz cells
appliance-nodes responses.properties

root@vcd01 [ /opt/vmware/vcloud-director/data/transfer ]# chmod u+x VMware_Cloud_Director_10.2.0.5190-17029810_update.tar.gz

root@vcd01 [ /opt/vmware/vcloud-director/data/transfer ]# mkdir update-package

root@vcd01 [ /opt/vmware/vcloud-director/data/transfer ]# tar -zxf VMware_Cloud_Director_10.2.0.5190-17029810_update.tar.gz -C update-package/

root@vcd01 [ /opt/vmware/vcloud-director/data/transfer ]# vamicli update --repo file://opt/vmware/vcloud-director/data/transfer/update-package

Set local repository address…
Done.

root@vcd01 [ /opt/vmware/vcloud-director/data/transfer ]# vamicli update --check
Checking for available updates, this process can take a few minutes….
Available Updates -
10.2.0.5190 Build 17029810

root@vcd01 [ /opt/vmware/vcloud-director/data/transfer ]# /opt/vmware/vcloud-director/bin/cell-management-tool -u administrator cell --shutdown

root@vcd01 [ /opt/vmware/vcloud-director/data/transfer ]# vamicli update --install latest

root@vcd01 [ /opt/vmware/vcloud-director/data/transfer ]# /opt/vmware/appliance/bin/create-db-backup

2020-10-16 08:41:01 | Invoking Database backup utility
2020-10-16 08:41:01 | Command line usage to create embedded PG DB backup: create-db-backup
2020-10-16 08:41:01 | Using "vcloud" as default PG DB to backup since DB_NAME is not provided
2020-10-16 08:41:01 | Creating back up directory /opt/vmware/vcloud-director/data/transfer/pgdb-backup if it does not already exist …
2020-10-16 08:41:01 | Creating the "vcloud" DB backup at /opt/vmware/vcloud-director/data/transfer/pgdb-backup…
2020-10-16 08:41:03 | "vcloud" DB backup has been successfully created.
2020-10-16 08:41:03 | Copying the primary node's properties and certs …
2020-10-16 08:41:04 | "vcloud" DB backup, Properties files and certs have been successfully saved to /opt/vmware/vcloud-director/data/transfer/pgdb-backup/db-backup-2020-10-16-084101.tgz.

Note: To restore the postgres DB dump copy this tar file to the remote system.

root@vcd01 [ /opt/vmware/vcloud-director/data/transfer ]# /opt/vmware/vcloud-director/bin/upgrade

Welcome to the VMware Cloud Director upgrade utility
Verify that you have a valid license key to use the version of the
VMware Cloud Director software to which you are upgrading.
This utility will apply several updates to the database. Please
ensure you have created a backup of your database prior to continuing.

Do you wish to upgrade the product now? [Y/N] y

Examining database at URL: jdbc:postgresql://172.16.2.28:5432/vcloud?socketTimeout=90&ssl=true
The next step in the upgrade process will change the VMware Cloud Director database schema.

Backup your database now using the tools provided by your database vendor.

Enter [Y] after the backup is complete. y

Running 5 upgrade tasks
Executing upgrade task:
Successfully ran upgrade task
Executing upgrade task:
Successfully ran upgrade task
Executing upgrade task:
Successfully ran upgrade task
Executing upgrade task:
…..\Successfully ran upgrade task
Executing upgrade task:
……………[15]
Successfully ran upgrade task
Database upgrade complete
Upgrade complete

Would you like to start the Cloud Director service now? If you choose not
to start it now, you can manually start it at any time using this command:
service vmware-vcd start

Start it now? [y/n] n

Skipping start up for now

On the remaining appliances

root@vcd02 [ /opt/vmware/vcloud-director/data/transfer ]# vamicli update --repo file://opt/vmware/vcloud-director/data/transfer/update-package

Set local repository address…
Done.

root@vcd02 [ /opt/vmware/vcloud-director/data/transfer ]# vamicli update --check
Checking for available updates, this process can take a few minutes….
Available Updates -
10.2.0.5190 Build 17029810

root@vcd02 [ /opt/vmware/vcloud-director/data/transfer ]# /opt/vmware/vcloud-director/bin/cell-management-tool -u administrator cell --shutdown
Please enter the administrator password:
Cell successfully deactivated and all tasks cleared in preparation for shutdown

root@vcd02 [ /opt/vmware/vcloud-director/data/transfer ]# vamicli update --install latest
Installing version -
10.2.0.5190 Build 17029810
………………………………………………………………………………

Reboot appliances

Now reboot each appliance one at a time, starting with the primary; wait for it to rejoin the PostgreSQL cluster before rebooting the other appliances.
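
To check that the primary has rejoined the PostgreSQL cluster before you reboot the next appliance, you can query the embedded repmgr from the appliance console. This is a minimal sketch, assuming the repmgr utility shipped with the appliance at its usual path; all nodes should show as running before you continue.

# run on any appliance cell as root and check that every node reports a healthy status
sudo -i -u postgres /opt/vmware/vpostgres/current/bin/repmgr cluster show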

Using Let’s Encrypt certificates with Cloud Director

Let’s Encrypt (LE) is a certificate authority that issues free SSL certificates for use in your web applications. This post details how to get LE setup to support Cloud Director specifically with a wildcard certificate.

Certbot

LE uses an application called certbot to request, automatically download and renew certificates. You can think of certbot as the client for LE.

First you’ll need to create a client machine that can request certificates from LE. I started with a simple CentOS VM. For more details about installing certbot into your preferred OS read this page here.

Once you get yours on the network with outbound internet access, you can start by performing the following.

 # Update software
 yum update
 
 # Install wget if not already installed
 yum install wget
 
 # Download the certbot application.
 wget https://dl.eff.org/certbot-auto
 
 # Move certbot into a local application directory
 sudo mv certbot-auto /usr/local/bin/certbot-auto
 
 # Set ownership to root
 sudo chown root /usr/local/bin/certbot-auto
 
 # Change permisssions for certbot
 sudo chmod 0755 /usr/local/bin/certbot-auto

Now you're ready to request certificates. Run the following command, of course replacing the domain with your own.

/usr/local/bin/certbot-auto --config-dir $HOME/.certbot --work-dir $HOME/.certbot/work --logs-dir $HOME/.certbot/logs  certonly --manual --preferred-challenges=dns -d '*.vmwire.com'

This will create a request for a wildcard certificate for *.vmwire.com.

You'll then be asked to create a new DNS TXT record on your public DNS server for the domain that you are requesting, to validate that you can manage that domain. Here's what mine looks like for the above.
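
Before pressing Enter to continue the challenge, it's worth checking that the TXT record has propagated. A quick check from another terminal might look like this; the record name below assumes the *.vmwire.com request above.

# the value returned should match the challenge string certbot displayed
dig +short TXT _acme-challenge.vmwire.com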

This means that you can only request certificates for public domains with LE; private domains are not supported.

You will then see a response from LE such as the following:

IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at:
   /root/.certbot/live/vmwire.com/fullchain.pem
   Your key file has been saved at:
   /root/.certbot/live/vmwire.com/privkey.pem
   Your cert will expire on 2020-12-24. To obtain a new or tweaked
   version of this certificate in the future, simply run certbot-auto
   again. To non-interactively renew *all* of your certificates, run
   "certbot-auto renew"

Updating Cloud Director certificates

Before you can use the new certificate, you need to perform some operations with the Java keytool to import the pem formatted certificates into the certificates.ks file that Cloud Director uses.

The issued certificate is available in the directory

/root/.certbot/live/

Navigate there using an SSH client and you'll see a structure like this

Download the entire folder for the next steps. Within the folder you’ll see the following files

Filename | Purpose
cert.pem | your certificate in pem format
chain.pem | the Let's Encrypt root CA certificate in pem format
fullchain.pem | your wildcard certificate AND the LE root CA certificate in pem format
privkey.pem | the private key for your certificate (without passphrase)

We need to rename the files to something that the Java keytool can work with. I renamed mine to the following:

Original filename | New filename
cert.pem | vmwire-com.crt
chain.pem | vmwire-com-ca.crt
fullchain.pem | not needed
privkey.pem | vmwire-com.key

Copy the three new files to one of the Cloud Director cells, use the /tmp directory.

Now launch an SSH session to one of the Cloud Director cells and perform the following.

# Import the certificate and the private key into a new pfx format certificate
openssl pkcs12 -export -out /tmp/vmwire-com.pfx -inkey /tmp/vmwire-com.key -in /tmp/vmwire-com.crt

# Create a new certificates.ks file and import the pfx formatted certificate
/opt/vmware/vcloud-director/jre/bin/keytool -keystore /tmp/certificates.ks -storepass Vmware1! -keypass Vmware1! -storetype JCEKS -importkeystore -srckeystore /tmp/vmwire-com.pfx -srcstorepass Vmware1!

# Change the alias for the first entry to be http
/opt/vmware/vcloud-director/jre/bin/keytool -keystore /tmp/certificates.ks -storetype JCEKS -changealias -alias 1 -destalias http -storepass Vmware1!

# Import the certificate again, this time creating alias 1 again (we will use the same wildcard certificate for the consoleproxy)
/opt/vmware/vcloud-director/jre/bin/keytool -keystore /tmp/certificates.ks -storepass Vmware1! -keypass Vmware1! -storetype JCEKS -importkeystore -srckeystore /tmp/vmwire-com.pfx -srcstorepass Vmware1!

# Change the alias for the first entry to be consoleproxy
/opt/vmware/vcloud-director/jre/bin/keytool -keystore /tmp/certificates.ks -storetype JCEKS -changealias -alias 1 -destalias consoleproxy -storepass Vmware1!

# Import the root certificate into the certificates.ks file
/opt/vmware/vcloud-director/jre/bin/keytool -importcert -alias root -file /tmp/vmwire-com-ca.crt -storetype JCEKS -keystore /tmp/certificates.ks -storepass Vmware1!

# List all the entries, you should now see three, http, consoleproxy and root
/opt/vmware/vcloud-director/jre/bin/keytool  -list -keystore /tmp/certificates.ks -storetype JCEKS -storepass Vmware1!

# Stop the Cloud Director service on all cells
service vmware-vcd stop

# Make a backup of the current certificate
mv /opt/vmware/vcloud-director/certificates.ks /opt/vmware/vcloud-director/certificates.ks.old

# Copy the new certificate to the Cloud Director directory
cp /tmp/certificates.ks /opt/vmware/vcloud-director/

# List all the entries, you should now see three, http, consoleproxy and root
/opt/vmware/vcloud-director/jre/bin/keytool  -list -keystore /opt/vmware/vcloud-director/certificates.ks -storetype JCEKS -storepass Vmware1!

# Reconfigure the Cloud Director application to use the new certificate
/opt/vmware/vcloud-director/bin/configure

# Start the Cloud Director application
service vmware-vcd start

# Monitor startup logs
tail -f /opt/vmware/vcloud-director/logs/cell.log

Copy the certificates.ks file to the other cells and run the configure step on those cells to update the certificates everywhere. Don't forget to update the certificate on the load balancer too; this other post shows how to do it with the NSX-T load balancer.
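
A minimal sketch of that copy, assuming a second cell reachable as vcd02 with root SSH enabled between the cells:

# copy the new keystore to the second cell, then repeat for any remaining cells
scp /opt/vmware/vcloud-director/certificates.ks root@vcd02:/opt/vmware/vcloud-director/certificates.ks

# then on vcd02, re-run the configure tool and start the service as shown above
# /opt/vmware/vcloud-director/bin/configure
# service vmware-vcd start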

Check out the new certificate at https://vcloud.vmwire.com/tenant/vmwire.

Automate NSX-T Load Balancer setup for Cloud Director and the Tenant App

This post describes how to use the NSX-T Policy API to automate the creation of load balancer configurations for Cloud Director and the vRealize Operations Tenant App.

Postman collection

I’ve included a Postman collection that contains all of the necessary API calls to get everything configured. There is also a Postman environment that contains the necessary variables to successfully configure the load balancer services.

To get started import the collection and environment into Postman.

You'll see the collection in Postman named NSX-T Load Balancer Setup. The steps are numbered to import the certificates and then configure the Cloud Director load balancer services. I've also included the calls to create the load balancer services for the vRealize Operations Tenant App.

Before you run any of those API calls, you'll first want to import the Postman environment. Once imported, you'll see the environment in the top right of the Postman screen; it is also called NSX-T Load Balancer Setup.

Complete your environment variables.

Variable | Value Description
nsx_vip | nsx-t manager cluster virtual ip
nsx-manager-user | nsx-t manager username, usually admin
nsx-manager-password | nsx-t manager password
vcd-public-ip | public ip address for the vcd service to be configured on the load balancer
tenant-app-public-ip | public ip address for the tenant app service to be configured on the load balancer
vcd-cert-name | a name for the imported vcd http certificate
vcd-cert-private-key | vcd http certificate private key in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character. For example:

-----BEGIN RSA PRIVATE KEY-----\n<private key>\n-----END RSA PRIVATE KEY-----

vcd-cert-passphrase | vcd private key passphrase
vcd-certificate | vcd http certificate in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character.

For example:
—–BEGIN CERTIFICATE—–\nMIIGADCCBOigAwIBAgIRALUVXndtVGMeRM1YiMqzBCowDQYJKoZIhvcNAQELBQAw\ngY8xCzAJBgNVBAYTAkdCMRswGQYDVQQIExJHcmVhdGVyIE1hbmNoZXN0ZXIxEDAO\nBgNVBAcTB1NhbGZvcmQxGDAWBgNVBAoTD1NlY3RpZ28gTGltaXRlZDE3MDUGA1UE\nAxMuU2VjdGlnbyBSU0EgRG9tYWluIFZhbGlkYXRpb24gU2VjdXJlIFNlcnZlciBD\nQTAeFw0xOTA4MjMwMDAwMDBaFw0yMDA4MjIyMzU5NTlaMFUxITAfBgNVBAsTGERv\nbWFpbiBDb250cm9sIFZhbGlkYXRlZDEUMBIGA1UECxMLUG9zaXRpdmVTU0wxGjAY\nBgNVBAMTEXZjbG91ZC52bXdpcmUuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A\nMIIBCgKCAQEAqh9sn6bNiDmmg3fJSG4zrK9IbrdisALFqnJQTkkErvoky2ax0RzV\n/ZJ/1fNHpvy1yT7RSZbKcWicoxatYPCgFHDzz2JwgvfwQCRMOfbPzohTSAhrPZph\n4FOPnrF8iwGggTxp+/2/ixg0DjQZL32rc9ax1qEvSURt571hUE7uLkRbPrdbocSZ\n4c2atVh8K1fp3uBqEbAs0UyjW5PK3wIN5ZRFArxc5kiGW0btN1RmoWwOmuJkAtu7\nzuaAJcgr/UVb1PP+GgAvKdmikssB1MWQALTRHm7H2GJp2MlbyGU3ZROSPkSSaNsq\n4otCJxtvQze/lB5QGWj5V2B7YbNJKwJdXQIDAQABo4ICjjCCAoowHwYDVR0jBBgw\nFoAUjYxexFStiuF36Zv5mwXhuAGNYeEwHQYDVR0OBBYEFNhZaRisExXrYrqfIIm6\n9TP8JrqwMA4GA1UdDwEB/wQEAwIFoDAMBgNVHRMBAf8EAjAAMB0GA1UdJQQWMBQG\nCCsGAQUFBwMBBggrBgEFBQcDAjBJBgNVHSAEQjBAMDQGCysGAQQBsjEBAgIHMCUw\nIwYIKwYBBQUHAgEWF2h0dHBzOi8vc2VjdGlnby5jb20vQ1BTMAgGBmeBDAECATCB\nhAYIKwYBBQUHAQEEeDB2ME8GCCsGAQUFBzAChkNodHRwOi8vY3J0LnNlY3RpZ28u\nY29tL1NlY3RpZ29SU0FEb21haW5WYWxpZGF0aW9uU2VjdXJlU2VydmVyQ0EuY3J0\nMCMGCCsGAQUFBzABhhdodHRwOi8vb2NzcC5zZWN0aWdvLmNvbTAzBgNVHREELDAq\nghF2Y2xvdWQudm13aXJlLmNvbYIVd3d3LnZjbG91ZC52bXdpcmUuY29tMIIBAgYK\nKwYBBAHWeQIEAgSB8wSB8ADuAHUAsh4FzIuizYogTodm+Su5iiUgZ2va+nDnsklT\nLe+LkF4AAAFsv3BsIwAABAMARjBEAiBat+l0e3BTu+EBcRJfR8hCA/CznWm1mbVl\nxZqDoKM6tAIgON6U0YoqA91xxpXH2DyA04o5KSdSvNT05wz2aa7zkzwAdQBep3P5\n31bA57U2SH3QSeAyepGaDIShEhKEGHWWgXFFWAAAAWy/cGw+AAAEAwBGMEQCIDHl\njofAcm5GqECwtjBfxYD7AFkJn4Ez0IGRFrux4ldiAiAaNnkMbf0P9arSDNno4hQT\nIJ2hUaIWNfuKBEIIkfqhCTANBgkqhkiG9w0BAQsFAAOCAQEAZCubBHRV+m9iiIeq\nCoaFV2YZLQUz/XM4wzQL+73eqGHINp6xh/+kYY6vw4j+ypr9P8m8+ouqichqo7GJ\nMhjtbXrB+TTRwqQgDHNHP7egBjkO+eDMxK4aa3x1r1AQoRBclPvEbXCohg2sPUG5\nZleog76NhPARR43gcxYC938OH/2TVAsa4JApF3vbCCILrbTuOy3Z9rf3aQLSt6Jp\nkh85w6AlSkXhQJWrydQ1o+NxnfQmTOuIH8XEQ2Ne1Xi4sbiMvWQ7dlH5/N8L8qWQ\nEPCWn+5HGxHIJFXMsgLEDypvuXGt28ZV/T91DwPLeGCEp8kUC3N+uamLYeYMKOGD\nMrToTA==\n—–END CERTIFICATE—–
ca-cert-name | a name for the imported ca root certificate
ca-certificate | ca root certificate in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character.
vcd-node1-name | the hostname for the first vcd appliance
vcd-node1-ip | the dmz ip address for the first vcd appliance
vcd-node2-name | the hostname for the second vcd appliance
vcd-node2-ip | the dmz ip address for the second vcd appliance
vcd-node3-name | the hostname for the third vcd appliance
vcd-node3-ip | the dmz ip address for the third vcd appliance
tenant-app-node-name | the hostname for the vrealize operations tenant app appliance
tenant-app-node-ip | the dmz ip address for the vrealize operations tenant app appliance
tenant-app-cert-name | a name for the imported tenant app certificate
tenant-app-cert-private-key | tenant app certificate private key in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character. For example:

-----BEGIN RSA PRIVATE KEY-----\n<private key>\n-----END RSA PRIVATE KEY-----
tenant-app-cert-passphrase | tenant app private key passphrase
tenant-app-certificate | tenant app certificate in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character. For example:
—–BEGIN CERTIFICATE—–\nMIIGADCCBOigAwIBAgIRALUVXndtVGMeRM1YiMqzBCowDQYJKoZIhvcNAQELBQAw\ngY8xCzAJBgNVBAYTAkdCMRswGQYDVQQIExJHcmVhdGVyIE1hbmNoZXN0ZXIxEDAO\nBgNVBAcTB1NhbGZvcmQxGDAWBgNVBAoTD1NlY3RpZ28gTGltaXRlZDE3MDUGA1UE\nAxMuU2VjdGlnbyBSU0EgRG9tYWluIFZhbGlkYXRpb24gU2VjdXJlIFNlcnZlciBD\nQTAeFw0xOTA4MjMwMDAwMDBaFw0yMDA4MjIyMzU5NTlaMFUxITAfBgNVBAsTGERv\nbWFpbiBDb250cm9sIFZhbGlkYXRlZDEUMBIGA1UECxMLUG9zaXRpdmVTU0wxGjAY\nBgNVBAMTEXZjbG91ZC52bXdpcmUuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A\nMIIBCgKCAQEAqh9sn6bNiDmmg3fJSG4zrK9IbrdisALFqnJQTkkErvoky2ax0RzV\n/ZJ/1fNHpvy1yT7RSZbKcWicoxatYPCgFHDzz2JwgvfwQCRMOfbPzohTSAhrPZph\n4FOPnrF8iwGggTxp+/2/ixg0DjQZL32rc9ax1qEvSURt571hUE7uLkRbPrdbocSZ\n4c2atVh8K1fp3uBqEbAs0UyjW5PK3wIN5ZRFArxc5kiGW0btN1RmoWwOmuJkAtu7\nzuaAJcgr/UVb1PP+GgAvKdmikssB1MWQALTRHm7H2GJp2MlbyGU3ZROSPkSSaNsq\n4otCJxtvQze/lB5QGWj5V2B7YbNJKwJdXQIDAQABo4ICjjCCAoowHwYDVR0jBBgw\nFoAUjYxexFStiuF36Zv5mwXhuAGNYeEwHQYDVR0OBBYEFNhZaRisExXrYrqfIIm6\n9TP8JrqwMA4GA1UdDwEB/wQEAwIFoDAMBgNVHRMBAf8EAjAAMB0GA1UdJQQWMBQG\nCCsGAQUFBwMBBggrBgEFBQcDAjBJBgNVHSAEQjBAMDQGCysGAQQBsjEBAgIHMCUw\nIwYIKwYBBQUHAgEWF2h0dHBzOi8vc2VjdGlnby5jb20vQ1BTMAgGBmeBDAECATCB\nhAYIKwYBBQUHAQEEeDB2ME8GCCsGAQUFBzAChkNodHRwOi8vY3J0LnNlY3RpZ28u\nY29tL1NlY3RpZ29SU0FEb21haW5WYWxpZGF0aW9uU2VjdXJlU2VydmVyQ0EuY3J0\nMCMGCCsGAQUFBzABhhdodHRwOi8vb2NzcC5zZWN0aWdvLmNvbTAzBgNVHREELDAq\nghF2Y2xvdWQudm13aXJlLmNvbYIVd3d3LnZjbG91ZC52bXdpcmUuY29tMIIBAgYK\nKwYBBAHWeQIEAgSB8wSB8ADuAHUAsh4FzIuizYogTodm+Su5iiUgZ2va+nDnsklT\nLe+LkF4AAAFsv3BsIwAABAMARjBEAiBat+l0e3BTu+EBcRJfR8hCA/CznWm1mbVl\nxZqDoKM6tAIgON6U0YoqA91xxpXH2DyA04o5KSdSvNT05wz2aa7zkzwAdQBep3P5\n31bA57U2SH3QSeAyepGaDIShEhKEGHWWgXFFWAAAAWy/cGw+AAAEAwBGMEQCIDHl\njofAcm5GqECwtjBfxYD7AFkJn4Ez0IGRFrux4ldiAiAaNnkMbf0P9arSDNno4hQT\nIJ2hUaIWNfuKBEIIkfqhCTANBgkqhkiG9w0BAQsFAAOCAQEAZCubBHRV+m9iiIeq\nCoaFV2YZLQUz/XM4wzQL+73eqGHINp6xh/+kYY6vw4j+ypr9P8m8+ouqichqo7GJ\nMhjtbXrB+TTRwqQgDHNHP7egBjkO+eDMxK4aa3x1r1AQoRBclPvEbXCohg2sPUG5\nZleog76NhPARR43gcxYC938OH/2TVAsa4JApF3vbCCILrbTuOy3Z9rf3aQLSt6Jp\nkh85w6AlSkXhQJWrydQ1o+NxnfQmTOuIH8XEQ2Ne1Xi4sbiMvWQ7dlH5/N8L8qWQ\nEPCWn+5HGxHIJFXMsgLEDypvuXGt28ZV/T91DwPLeGCEp8kUC3N+uamLYeYMKOGD\nMrToTA==\n—–END CERTIFICATE—–
tier1-full-path | the full path to the nsx-t tier1 gateway that will run the load balancer, for example /infra/tier-1s/stage1-m-ec01-t1-gw01
vcd-dmz-segment-name | the portgroup name of the vcd dmz portgroup, for example stage1-m-vCDFront
allowed_ip_a | an ip address that is allowed to access the /provider URI and the admin API
allowed_ip_b | an ip address that is allowed to access the /provider URI and the admin API
Variables

Now you’re ready to run the calls.

The collection and environment are available to download from Github.
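
If you'd rather script the same calls outside of Postman, each request in the collection maps to a plain HTTPS call against the NSX-T Policy API. A minimal sketch of the pattern is below; the hostname is the NSX-T Manager from earlier in this post and curl will prompt for the admin password, so substitute the nsx_vip and credentials from your own environment.

# quick connectivity and credentials test against the Policy API
# (lists the tier-1 gateways; the value you use for tier1-full-path should appear here)
curl -k -u admin 'https://nsx.vmwire.com/policy/api/v1/infra/tier-1s'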

Protecting Cloud Director with NSX-T Load Balancer L7 HTTP Policies

Running Cloud Director (formerly vCloud Director) over the Internet has its benefits, however it opens up the portal to security risks. To prevent this, we can use the native load balancing capabilities of NSX-T to serve only HTTP access to the URIs that are required, preventing access to unnecessary URIs from the rest of the Internet.

An example of this is to disallow the /provider and /cloudapi/1.0.0/sessions/provider URIs as these are provider side administrator only URIs that a service provider uses to manage the cloud and should not be accessible from the Internet.

The other article that I wrote previously describes the safe URIs and unsafe URIs that can be exposed over the Internet; you can find that article here. That article discusses doing the L7 HTTP policies using Avi. This article will go through how you can achieve the same with the built-in NSX-T load balancer.

This article assumes that you already have the load balancer configured with the Cloud Director Virtual Servers, Server Pools, HTTPS Profiles and Monitors set up. If you need a guide on how to do this, then please visit Tomas Fojta’s article here.

The L7 HTTP rules can be set up under Load Balancing | Virtual Servers. Edit the Virtual Server rule for the Cloud Director service and open up the Load Balancer Rules section.

Click on the Set link next to HTTP Access Phase. I’ve already set mine up, so you can see that I have two rules. You should also end up with two rules once this is complete.

Go ahead and add a new rule with the Add Rule button.

The first rule we want to set up is to prevent access from the Internet to the /provider URI but allow an IP address or group of IP addresses to access the service for provider side administration, such as a management bastion host.

Set up your rule as follows:

What we are doing here is creating a condition so that when the /provider URI is requested, we drop all incoming connections unless the connection is initiated from the management jump box, which in this example has an IP address of 10.37.5.30. The Negate option is enabled to achieve this. Think of Negate as inverting the match: connections to /provider are not dropped when the source IP address is 10.37.5.30, but they are dropped from everywhere else.

Here’s the brief explanation from the official NSX-T 3.0 Administration Guide.

If negate is enabled, when Connection Drop is configured, all requests not
matching the specified match condition are dropped. Requests matching the
specified match condition are allowed.

Save this rule and let’s set up another one to prevent access to the admin API. Set up this second rule as follows:

This time use /cloudapi/1.0.0/sessions/provider as the URI. Again, use the Negate option for your management IP address. Save your second rule and Apply all the changes.

Now you should be able to access /tenant URIs over the Internet but not the /provider URI. However, accessing the /provider URI from 10.37.5.30 (or whatever your equivalent is) will work.
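
A quick way to verify the rules is with a couple of curl checks. This is a minimal sketch, assuming vcloud.example.com resolves to the virtual server IP; run the first command from a host on the Internet and the second from the management jump box (10.37.5.30 in this example), and replace the tenant organisation name with your own.

# From the Internet: the connection to /provider should be dropped (curl times out or reports 000)
curl -k --max-time 10 -o /dev/null -w "%{http_code}\n" https://vcloud.example.com/provider

# From the management jump box: the same request should complete with an HTTP response code
curl -k --max-time 10 -o /dev/null -w "%{http_code}\n" https://vcloud.example.com/provider

# Tenant URIs should remain reachable from anywhere
curl -k --max-time 10 -o /dev/null -w "%{http_code}\n" https://vcloud.example.com/tenant/vmwire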

Doing this with the API

Do a PUT against /policy/api/v1/infra/lb-virtual-servers/vcloud with the following.

(Note that the Terraform provider for NSX-T doesn’t support HTTP Access yet. So to automate, use the NSX-T API directly instead.)

{
  "enabled": true,
  "ip_address": "<IP_address_of_this_load_balancer>",
  "ports": [
    "443"
  ],
  "access_log_enabled": false,
  "lb_persistence_profile_path": "/infra/lb-persistence-profiles/default-source-ip-lb-persistence-profile",
  "lb_service_path": "/infra/lb-services/vcloud",
  "pool_path": "/infra/lb-pools/vcd-appliances",
  "application_profile_path": "/infra/lb-app-profiles/vcd-https",
  "client_ssl_profile_binding": {
    "ssl_profile_path": "/infra/lb-client-ssl-profiles/default-balanced-client-ssl-profile",
    "default_certificate_path": "/infra/certificates/my-signed-certificate",
    "client_auth": "IGNORE",
    "certificate_chain_depth": 3
  },
  "server_ssl_profile_binding": {
    "ssl_profile_path": "/infra/lb-server-ssl-profiles/default-balanced-server-ssl-profile",
    "server_auth": "IGNORE",
    "certificate_chain_depth": 3,
    "client_certificate_path": "/infra/certificates/my-signed-certificate"
  },
    "rules": [
    {
      "match_conditions": [
        {
          "uri": "/cloudapi/1.0.0/sessions/provider",
          "match_type": "CONTAINS",
          "case_sensitive": false,
          "type": "LBHttpRequestUriCondition",
          "inverse": false
        },
        {
          "source_address": "10.37.5.30",
          "type": "LBIpHeaderCondition",
          "inverse": true
        }
      ],
      "match_strategy": "ALL",
      "phase": "HTTP_ACCESS",
      "actions": [
        {
          "type": "LBConnectionDropAction"
        }
      ]
    },
    {
      "match_conditions": [
        {
          "uri": "/provider",
          "match_type": "EQUALS",
          "case_sensitive": false,
          "type": "LBHttpRequestUriCondition",
          "inverse": false
        },
        {
          "source_address": "10.37.5.30",
          "type": "LBIpHeaderCondition",
          "inverse": true
        }
      ],
      "match_strategy": "ALL",
      "phase": "HTTP_ACCESS",
      "actions": [
        {
          "type": "LBConnectionDropAction"
        }
      ]
    }
  ],
  "log_significant_event_only": false,
  "resource_type": "LBVirtualServer",
  "id": "vcloud",
  "display_name": "vcloud",
  "_revision": 1
}
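
If you prefer to script this instead of using Postman, the same PUT can be issued with curl. This is a minimal sketch, assuming the JSON above is saved as vcloud-vs.json and your NSX Manager is reachable at nsx-manager.example.com; note that the _revision value in the body must match the current revision of the object, which you can check with a GET first.

# Read the current virtual server definition (and its current _revision)
curl -k -u admin https://nsx-manager.example.com/policy/api/v1/infra/lb-virtual-servers/vcloud

# Push the updated definition containing the two HTTP Access rules
curl -k -u admin -X PUT -H "Content-Type: application/json" -d @vcloud-vs.json https://nsx-manager.example.com/policy/api/v1/infra/lb-virtual-servers/vcloud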

Workflow for end-to-end tenant provisioning with VMware Cloud Director

VMware vRealize Orchestrator workflows for VMware Cloud Director to automate the provisioning of cloud services.

Firstly, apologies to all those who asked for the workflow at VMworld 2019 in Barcelona and also e-mailed me for a copy. It’s been hectic in my professional and personal life. I also wanted to clean up the workflows and remove any customer specific items that are not relevant to this workflow. Sorry it took so long!

If you’d like to see an explanation video of the workflows in action, please take a look at the VMworld session recording.

Credits

These vRealize Orchestrator workflows were co-created and developed by Benoit Serratrice and Henri Timmerman.

You can download a copy of the workflow using this link here.

What does it do?

Commission Customer Process

The workflow does the following:

  1. Creates an organization based on your initial organisation name as an input.
  2. Creates a vDC into this organization.
  3. Adds a gateway to the vDC.
  4. Adds a routed network with a gateway CIDR that you enter.
  5. Adds a direct external network.
  6. Converts the organization network to use distributed routing.
  7. Adds a default outbound firewall rule for the routed network.
  8. Adds a source NAT rule to allow the routed network to reach the external network.
  9. Adds a catalog.
Commission Customer vRO Workflow

It also cleans up the provisioning if there is a failure. I have also included a separate Decommission Customer workflow to enable you to delete vCD objects quickly and easily. It is designed for lab environments, so bear this in mind when using it.

Other caveats: the workflows contained in this package are unsupported. I’ll help in the comments below as much as I can.

Getting Started

Import the package after downloading it from GitHub.

The first thing you need to do is set up the global settings in the Global, Commission, storageProfiles and the other configuration elements. You can find these under Assets > Configurations.

You should then see the Commission Customer v5 workflow under Workflows in your vRO client; it should look something like this.

Enter a customer name and the gateway IP in CIDR format into the form.

Press Run, then sit back and enjoy the show.

Known Issues

Commissioning a customer fails when there are no existing edge gateways deployed that use an external network. You see the following error in the vRO logs:

item: 'Commission Customer v5/item12', state: 'failed', business state: 'null', exception: 'TypeError: Cannot read property "ipAddress" from null (Workflow:Commission Customer v5 / get next ip (item8)#5)'

This happens because no IP addresses are in use from the external network pool. The Commission Customer workflow calculates the next IP address to assign to the edge gateway, and it cannot do this if the last IP in use is null. To work around this, manually provision something that uses one IP address from the external network IP pool, then run the Commission Customer workflow again; it should now work.

Commissioning a customer workflow completes successfully, however you see the following errors:

[2020-03-22 19:30:44.596] [I] orgNetworkId: 545b5ef4-ff89-415b-b8ef-bae3559a1ac7
[2020-03-22 19:30:44.662] [I] =================================================================== Converting Org network to a distributed interface...
[2020-03-22 19:30:44.667] [I] ** API endpoint: vcloud.vmwire.com/api/admin/network/545b5ef4-ff89-415b-b8ef-bae3559a1ac7/action/convertToDistributedInterface
[2020-03-22 19:30:44.678] [I] error caught!
[2020-03-22 19:30:44.679] [I] error details: InternalError: Cannot execute the request:  (Workflow:Convert net to distributed interface / Post to vCD (item4)#21)
[2020-03-22 19:30:44.680] [I] error details: Cannot execute the request:  (Workflow:Convert net to distributed interface / Post to vCD (item4)#21)
[2020-03-22 19:30:44.728] [I] Network converted succesfully.

The workflow attempts to convert the org network from an internal interface to a distributed interface but it does not work, even though the logs say it was successful. Let me know if you are able to fix this.

VMworld 2019 Rewatch: Building a Modern Cloud Hosting Platform on VMware Cloud Foundation with VMware vCloud Director (HBI1321BE)

Rewatch my session with Onni Rautanen at VMworld EMEA 2019 where we cover the clouds that we are building together with Tieto.

Description: In this session, you will get a technical deep dive into Tieto’s next generation service provider cloud hosting platform running on VMware vCloud Director Cloud POD architecture deployed on top of VMware Cloud Foundation. Administrators and cloud engineers will learn from Tieto cloud architects about their scalable design and implementation guidance for building a modern multi-tenant hosting platform for 10,000+ VMs. Other aspects of this session will discuss the API integration of ServiceNow into the VMware cloud stack, Backup and DR, etc.

You’ll need to create a free VMworld account to access this video and many other videos that are made available during and after the VMworld events.

https://videos.vmworld.com/global/2019/videoplayer/29271

Load Balancing and Protecting Cloud Director with Avi Networks

This article covers protecting and load balancing the Cloud Director application with Avi Networks. It covers SSL termination, health monitoring and layer 7 HTTP filtering. It can also be used as a reference for other load balancer products such as F5 LTM or NGINX.

Overview

The Avi Vantage platform is built on software-defined principles, enabling a next generation architecture to deliver the flexibility and simplicity expected by IT and lines of business. The Avi Vantage architecture separates the data and control planes to deliver application services beyond load balancing, such as application analytics, predictive autoscaling, micro-segmentation, and self-service for app owners in both on-premises or cloud environments. The platform provides a centrally managed, dynamic pool of load balancing resources on commodity x86 servers, VMs or containers, to deliver granular services close to individual applications. This allows network services to scale near infinitely without the added complexity of managing hundreds of disparate appliances.

https://avinetworks.com/docs/18.2/architectural-overview/

Avi components

Controllers – these are the management appliances that are responsible for state data; Service Engines are deployed by the controllers. The controllers run in a management network.

Service Engines – the load balancing services run in here. These generally run in a DMZ network. Service Engines can have one or more network adaptors connected to multiple networks: at least one network with routing to the controllers, with the remaining networks used as data networks.

Deployment modes

Avi can be installed in a variety of deployment types. For VMware Cloud on AWS, it is not currently possible to deploy using ‘write access’ as vCenter is locked down in VMC and it also has a different API from a vSphere 6.7 vCenter Server. You’ll also find that other tools may not work with vCenter in a VMware Cloud on AWS SDDC, such as govc.

Instead Avi needs to be deployed using ‘No Access’ mode.

You can refer to this link for instructions to deploy Avi Controllers in ‘No Access’ mode.

Since it is only possible to use ‘No Access’ mode with VMC-based SDDCs, it’s also a requirement to deploy the service engines manually. To do this, follow the guide in this link, and start at the section titled Downloading Avi Service Engine on OVA.

If you’re using Avi with on-premises deployments of vCenter, then ‘Write Mode’ can be used to automate the provisioning of service engines. Refer to this link for more information on the different modes.

Deploying Avi Controller with govc

You can deploy the Avi Controller onto non VMware Cloud on AWS vCenter servers using the govc tool. Refer to this other post on how to do so. I’ve copied the JSON for the controller.ova for your convenience below.

{
    "DiskProvisioning": "flat",
    "IPAllocationPolicy": "dhcpPolicy",
    "IPProtocol": "IPv4",
    "PropertyMapping": [
        {
            "Key": "avi.mgmt-ip.CONTROLLER",
            "Value": ""
        },
        {
            "Key": "avi.mgmt-mask.CONTROLLER",
            "Value": ""
        },
        {
            "Key": "avi.default-gw.CONTROLLER",
            "Value": ""
        },
        {
            "Key": "avi.sysadmin-public-key.CONTROLLER",
            "Value": ""
        }
    ],
    "NetworkMapping": [
        {
            "Name": "Management",
            "Network": ""
        }
    ],
    "MarkAsTemplate": false,
    "PowerOn": false,
    "InjectOvfEnv": false,
    "WaitForIP": false,
    "Name": null
}
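
To show how the JSON above is consumed, the commands below deploy the controller OVA with govc. This is a sketch only; the vCenter address, credentials, inventory paths and file names are assumptions that you will need to adjust for your environment.

# Point govc at the target vCenter
export GOVC_URL='https://vcenter.example.com'
export GOVC_USERNAME='administrator@vsphere.local'
export GOVC_PASSWORD='changeme'
export GOVC_INSECURE=1
export GOVC_DATASTORE='datastore1'
export GOVC_RESOURCE_POOL='*/Resources'

# Deploy the Avi Controller using an options file containing the JSON above
govc import.ova -options=controller.json -name=avi-controller controller.ova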

Architecture

For a high-level architecture overview, this link provides a great starting point.

Figure 1. Avi architecture

Service Engine Typical Deployment Architecture

Generally, in legacy deployments where BGP is not used, the service engines tend to have three network interfaces. These are used for the frontend, backend and management networks, as is typical of traditional deployments with F5 LTM for example.

For our example here, I will use three networks for the SEs as laid out below.

Network name         Gateway CIDR      Purpose
sddc-cgw-vcd-dmz1    10.104.125.1/24   Management
sddc-cgw-vcd-dmz2    10.104.126.1/24   Backend
sddc-cgw-vcd-dmz3    10.104.127.1/24   Frontend

The service engines are configured with the following details. It is important to make a note of the MAC addresses in ‘No access’ mode as you will need this information later.

Service Engine   avi-se1                                    avi-se2
Management       IP 10.104.125.11, MAC 00:50:56:8d:c0:2e    IP 10.104.125.12, MAC 00:50:56:8d:38:33
Backend          IP 10.104.126.11, MAC 00:50:56:8d:8e:41    IP 10.104.126.12, MAC 00:50:56:8d:53:f6
Frontend         IP 10.104.127.11, MAC 00:50:56:8d:89:b4    IP 10.104.127.12, MAC 00:50:56:8d:80:41

The Management network is used for communications between the SEs and the Avi controllers. For the port requirements, please refer to this link.

The Backend network is used for communications between the SEs and the application that is being load balanced and protected by Avi.

The Frontend network is used for upstream communications to the clients, in this case the northbound router or firewall towards the Internet.

Sample Application

Let’s use VMware Cloud Director as the sample application for configuring Avi. vCD, as it is more commonly known, is a cloud platform that is deployed with an Internet-facing portal. Because of this, it is always best to protect the portal from malicious attacks by employing a number of methods.

Some of these include SSL termination and web application filtering. The following two documents explain this in more detail.

vCloud Director Security and VMware vCloud Director Security Hardening Guide.

The vCD application is configured as below:

                  vCD Appliance 1                    vCD Appliance 2
Name              vcd-fe1                            vcd-fe2
eth0 IP address   10.104.123.21                      10.104.123.22
Static route      10.104.126.0/24 via 10.104.123.1   10.104.126.0/24 via 10.104.123.1
eth1 IP address   10.104.124.21                      10.104.124.22

You’ll notice that the eth0 and eth1 interfaces are connected to two different management networks 10.104.123.0/24 and 10.104.124.0/24 respectively. For vCD, it is generally good practice to separate the two interfaces into separate networks.
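
To illustrate the static route in the table above, you can test the return route towards the Avi backend network on each vCD appliance with iproute2 before making it persistent in the appliance network configuration. This is a sketch, assuming eth0 is the interface on the 10.104.123.0/24 network:

# Route traffic destined for the Avi backend network via the eth0 gateway
ip route add 10.104.126.0/24 via 10.104.123.1 dev eth0

# Confirm which route will be used to reach a backend service engine address
ip route get 10.104.126.11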

Network name          Gateway CIDR      Purpose
sddc-cgw-vcd-mgmt-1   10.104.123.1/24   vCD Frontend (UI/API/VM Remote Console)
sddc-cgw-vcd-mgmt-2   10.104.124.1/24   vCD Backend (PostgreSQL, SSH etc.)

For simplicity, I also deployed my Avi controllers onto the sddc-cgw-vcd-mgmt-2 network.

The diagram below summarises the above architecture for the HTTP interface for vCD. For this guide, I’ve used VMware Cloud on AWS together with Avi Networks to protect vCD running as an appliance inside the SDDC. This is not a typical deployment model, as Cloud Director Service will be able to use VMware Cloud on AWS SDDC resources soon, but I wanted to showcase the possibilities and constraints when using Avi with VMC-based SDDCs.

Figure 2 . vCD HTTP Diagram

Configuring Avi for Cloud Director

After you have deployed the Avi Controllers and the Service Engines, there are a few more steps needed before vCD is fully up and operational. They can be summarised as follows:

  1. Setup networking for the service engines by assigning the right IP address to the correct MAC addresses for the data networks
  2. Configure the network subnets for the service engines
  3. Configure static routes for the service engines to reach vCD
  4. Setup Legacy HA mode for the service engine group
  5. Setup the SSL certificate for the HTTP service
  6. Setup the Virtual Services for HTTP and Remote Console (VMRC)
  7. Setup the server pools
  8. Setup health monitors
  9. Setup HTTP security policies

Map Service Engine interfaces

Using the Avi Vantage Controller, navigate to Infrastructure > Service Engine, select one of the Service Engines then click on the little pencil icon. Then map the MAC addresses to the correct IP addresses.

Configure the network subnets for the service engines

Navigate to Infrastructure > Networks and create the subnets.

Configure static routes

Navigate to Infrastructure > Routing and set up any static routes. You’ll notice from figure 2 that since the service engine has three network interfaces on different networks, we need to create a static route on the interface that does not have the default gateway. This is so the service engines know which gateway to use for particular traffic types; in this case, the gateway the service engine uses to route HTTP and Remote Console traffic southbound to the vCD cells.

Setup Legacy HA mode for the service engine group

Navigate to Infrastructure > Service Engine Group.

Set the HA mode to Legacy HA. This is the simplest configuration; you can use Elastic HA if you wish.

Configure the HTTP and Remote Console Virtual Services

Navigate to Applications > Virtual Services.

Creating a Virtual Service has a few sub-tasks, which include the creation of the downstream server pools and SSL certificates.

Create a new Virtual Service for the HTTP service; this is for the Cloud Director UI and API. Then use this example to create another Virtual Service for the Remote Console.

For the Remote Console service, you will need to accept TCP 443 on the load balancer but connect southbound to the Cloud Director appliances on port TCP 8443. TCP 8443 is the port that VMRC uses as it shares the same IP addresses as the HTTP service.
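
A quick way to confirm that the console proxy is answering southbound on TCP 8443 is to test it directly from a machine that can reach the vCD appliances. This is a sketch, assuming vcd-fe1 resolves to the appliance eth0 address:

# Check that the console proxy answers on 8443 and presents its certificate
openssl s_client -connect vcd-fe1:8443 </dev/null 2>/dev/null | openssl x509 -noout -subject -dates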

You may notice that the screenshot is for an already configured Virtual Service for the vCD HTTP service. The server pool and SSL certificate are already configured. Below are the screenshots for those.

Certificate Management

You may already have a signed HTTP certificate that you wish to use with the load balancer for SSL termination. To do so, you will need to use the Java keytool to manipulate the HTTP certificate, obtaining the private key and converting the keystore from JCEKS to PKCS12. The Java keytool is available on the vCD appliance at /opt/vmware/vcloud-director/jre/bin/.

Figure 3. SSL termination on load balancer

For detailed instructions on creating a signed certificate for vCD, please follow this guide.

Convert the certificates.ks keystore file from JCEKS to PKCS12:

keytool -importkeystore -srcstoretype JCEKS -srckeystore certificates.ks -destkeystore certificates_pkcs12.ks -deststoretype PKCS12

Export the private key for the HTTP certificate from the certificates_pkcs12.ks file:

keytool -importkeystore -srckeystore certificates_pkcs12.ks -srcalias http -destalias http -destkeystore httpcert.p12 -deststoretype PKCS12

Now that you have the private key for the HTTP certificate, you can go ahead and configure the HTTP certificate on the load balancer.

For the certificate file, you can either paste the text or upload the certificate file (.cer, .crt) from the certificate authority for the HTTP certificate.

For the Key (PEM) or PKCS12 file, you can use the httpcert.p12 file that you extracted from the certificates_pkcs12.ks file above.

The Key Passphrase is the password that you used to secure the httpcert.p12 file earlier.
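
If you would rather paste a PEM private key into Avi instead of uploading the PKCS12 bundle, you can extract it from httpcert.p12 with openssl. This is a sketch, assuming openssl is available on the appliance or your workstation:

# Extract the unencrypted private key in PEM format from the PKCS12 file
openssl pkcs12 -in httpcert.p12 -nocerts -nodes -out http.key

# Optionally extract the certificate in PEM format as well
openssl pkcs12 -in httpcert.p12 -clcerts -nokeys -out http.pem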

Note that the vCD Remote Console (VMRC) must use pass-through for SSL termination, i.e., termination of the VMRC session must happen on the Cloud Director cell. Therefore, the above certificate management activities on Avi are not required for the VMRC.

Health Monitors

Navigate to Applications > Pools.

Edit the HTTP pool using the pencil icon and click on the Add Active Monitor green button.

Health monitoring of the HTTP service uses

GET /cloud/server_status HTTP/1.0

With an expected server response of

Service is up.

And a response code of 200.
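
You can reproduce what the monitor does with curl against a cell directly, which is handy when troubleshooting a pool member that is marked down. This is a sketch, assuming vcd-fe1 resolves to the appliance eth0 address:

# Expect "Service is up." in the body and an HTTP 200 response code
curl -ki https://vcd-fe1/cloud/server_status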

The vCD Remote Console Health monitor is a lot simpler as you can see below.

Layer 7 HTTP Security

Layer 7 HTTP security is very important and is highly recommended for any application exposed to the Internet. Layer 3 firewalling and SSL certificates alone are never enough to protect and secure applications.

Navigate to Applications > Virtual Services.

Click on the pencil icon for the HTTP virtual service and then click on the Policies tab. Then click on the HTTP Security policy. Add a new policy with the following settings. You can read more about Layer 7 HTTP policies here.

Allowed Strings                     Required by
/tenant                             Tenant use
/login                              Login
/network                            Access to networking
/tenant-networking                  Access to networking
/cloud                              For SAML/SSO logins
/transfer                           Uploads/downloads of ISOs and templates
/api                                General API access
/cloudapi                           General API access
/docs                               Swagger API browser

Blocked Strings
/cloudapi/1.0.0/sessions/provider   Specifically block admin APIs from the Internet

This will drop all provider-side services when accessed from the Internet. To access provider-side services, such as /provider or the admin APIs, use an internal connection to the Cloud Director cells.

Change Cloud Director public addresses

If not already done so, you should also change the public address settings in Cloud Director.

Testing the Cloud Director portal

Try to access https://vcloud.vmwire.com/provider

You won’t be able to access it as /provider is not on the list of allowed URI strings that we configured in the L7 HTTPS Security settings.

However, if you try to access https://vcloud.vmwire.com/tenant/vmwire, you will be able to reach the tenant portal for the organisation named VMwire.
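
These checks can be scripted so that you can re-run them whenever the policy changes. This is a minimal sketch, assuming vcloud.vmwire.com is the public FQDN on the virtual service and vmwire is your tenant organisation name; depending on how the policy action is configured, a blocked URI typically shows 000 (connection closed) rather than an HTTP response code.

# /provider should be blocked; the tenant and API URIs should return an HTTP response code
for uri in /tenant/vmwire /login /api /cloudapi /provider; do
  printf '%-16s ' "$uri"
  curl -k --max-time 10 -s -o /dev/null -w '%{http_code}\n' "https://vcloud.vmwire.com${uri}"
done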

Many thanks to Mikael Steding, our Avi Network Systems Engineer for helping me with setting this up.

Please reach out to me if you have any questions.