Enable external access to Kubernetes clusters running in VMware Cloud Director with CSE expose

This post details how you can enable Kubernetes clusters provisioned by the Container Service Extension to be accessible from outside of the cloud provider networks.

Providing a great user experience for Kubernetes as a service is important for a cloud provider. Users expect to connect to remotely hosted Kubernetes clusters with the tools running on their personal devices, so enabling this is a key feature of any cloud service.

A brief review of VCD networking

VMware Cloud Director provides network isolation between tenants by leveraging Geneve-based networking provided by NSX-T. In simple terms, a tenant can use any network subnet without worrying about clashing with another tenant on the same VCD cloud.

That means that a tenant with a private address space can deploy a Kubernetes cluster and use internal addresses for the Control Plane and Worker nodes. A user can then reach the Control Plane endpoint from inside the tenant’s VDC, using a jumpbox for example, and run kubectl commands without issue. However, doing this from outside of the organization virtual datacenter will not work, even if you set up a DNAT rule on the Edge gateway that maps an external IP to the internal IP of the Control Plane endpoint.

It doesn’t work because of the x.509 certificate that kubeadm generates when it creates the Kubernetes cluster. The certificate must list every address used to reach the API server as a subject alternative name (SAN), and CSE gives the operator no way to define additional SANs during cluster provisioning.

If you attempt to connect using the external IP of the DNAT rule, you may get an error like the one below:

kubectl get nodes -A --kubeconfig=tkg-vcd.yaml

Unable to connect to the server: x509: certificate is valid for 10.96.0.1, 192.168.0.100, not 10.149.1.101

For context, 192.168.0.100 is the internal IP of the Control Plane node. 10.149.1.101 is the external IP in the external IP pool allocated to this tenant’s Edge gateway. See the high-level architecture diagram.
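
To make the failure concrete, the check the client performs can be approximated in a few lines of Python. This is only a sketch of the idea, not how kubectl or its TLS library is implemented. The function name ip_in_sans is made up for illustration, and the addresses are the ones from the error above.

```python
import ipaddress


def ip_in_sans(target_ip, san_ips):
    """Return True if target_ip matches one of the certificate's IP SANs.

    Hypothetical helper: it only models the IP comparison, not real
    certificate parsing or chain validation.
    """
    target = ipaddress.ip_address(target_ip)
    return any(target == ipaddress.ip_address(san) for san in san_ips)


# IP SANs reported in the error message above
cert_sans = ["10.96.0.1", "192.168.0.100"]

print(ip_in_sans("192.168.0.100", cert_sans))  # internal Control Plane IP
print(ip_in_sans("10.149.1.101", cert_sans))   # external IP behind the DNAT rule
```

The first call prints True and the second prints False. The DNAT rule delivers the packets, but the certificate never mentions the external IP, so the client rejects the connection.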

How can we enable a better user experience to access a Kubernetes cluster running in a provider’s cloud?

Container Service Extension has a feature called ‘expose’ that can be used during Kubernetes cluster provisioning to create the DNAT rule on the Edge gateway and to add the external IP to the x.509 certificate SANs. This is done automatically and, as of CSE 3.0.4, only through the vcd cse cli. Please see my previous post to learn more.

What is supported with CSE 3.0.4?

Expose works under the following conditions:

  • cluster deployment via vcd cse cli only, no UI
  • new kubernetes cluster deployments only
  • you can deploy a cluster without expose initially but you cannot expose it later
  • you can deploy a cluster with expose and then un-expose it later, however you cannot re-expose it again
  • you are using NSX-T for VCD networking
  • the tenant has an Edge gateway defined for their VDCs
  • you have an external IP pool assigned to the Edge gateway
  • expose works with both TKGm and native k8s runtimes

High Level Architecture

Deploying a Kubernetes cluster using expose

To enable this feature, create a cluster config file on any machine that has the vcd cse cli installed. Below is an example of my config.yaml file. Note the kind: line: use TKGm for a TKGm runtime or native for a native runtime, and change template_name to suit the runtime.

The line under the spec section for expose: true will enable this feature.

api_version: '35.0'
kind: TKGm
metadata:
  cluster_name: tkg1
  org_name: tenant1
  ovdc_name: tenant1-vdc
spec:
  control_plane:
    count: 1
    storage_profile: truenas-iscsi-luns
  expose: true
  k8_distribution:
    template_name: ubuntu-20.04_tkgm-1.20_antrea-0.11
    template_revision: 1
  settings:
    network: default-organization-network
    rollback_on_failure: true
    ssh_key: null
  workers:
    count: 1
    storage_profile: truenas-iscsi-luns
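
If you deploy several clusters, or switch between the TKGm and native runtimes, it can help to generate this file rather than edit it by hand. The following is a minimal sketch that fills the same fields as my example from a template string. The function render_cluster_config and the template constant are hypothetical names, not part of CSE, and the template name for a native runtime must be supplied by you because it depends on what your provider has published.

```python
# Template mirroring the config.yaml example in this post; only the
# placeholders in braces vary between clusters.
CONFIG_TEMPLATE = """\
api_version: '35.0'
kind: {kind}
metadata:
  cluster_name: {cluster_name}
  org_name: tenant1
  ovdc_name: tenant1-vdc
spec:
  control_plane:
    count: 1
    storage_profile: truenas-iscsi-luns
  expose: {expose}
  k8_distribution:
    template_name: {template_name}
    template_revision: 1
  settings:
    network: default-organization-network
    rollback_on_failure: true
    ssh_key: null
  workers:
    count: 1
    storage_profile: truenas-iscsi-luns
"""


def render_cluster_config(cluster_name, kind, template_name, expose=True):
    """Return the cluster config as YAML text (hypothetical helper)."""
    if kind not in ("TKGm", "native"):
        raise ValueError("kind must be 'TKGm' or 'native'")
    return CONFIG_TEMPLATE.format(
        kind=kind,
        cluster_name=cluster_name,
        template_name=template_name,
        expose="true" if expose else "false",
    )


print(render_cluster_config("tkg1", "TKGm", "ubuntu-20.04_tkgm-1.20_antrea-0.11"))
```

Redirect the output to config.yaml and the result should match the file shown above.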

Log into VCD using tenant credentials. A tenant can use the vcd cse cli to do this themselves, which maintains self-service use cases; as a provider you don’t have to do this on a tenant’s behalf.

The syntax is vcd login <cloud-url> <organization> <user>

vcd login vcd.vmwire.com tenant1 tenant1-admin -i -w

To run the deployment, use this command:

vcd cse cluster apply config.yaml

You’ll see the tasks kick off in VCD, and your new cluster will be available soon. During deployment, VCD picks an IP address for the internal network (the Geneve NSX-T segment) using either DHCP or a static IP pool. In my example this is an IP in the 192.168.0.0/24 range on the organization network named default-organization-network. This IP is assigned to the master node of the Control Plane, in my case 192.168.0.100.

VCD will also create a DNAT rule and pick up the next available IP address from the external IP pool allocated to the Edge gateway. In my example this will be 10.149.1.102.

You can review the tasks for this workflow below

Once the cluster is ready, a user just needs to download the kubeconfig file onto their workstation and use the cluster.

Notice that the Control Plane Gateway IP is not an internal IP but in fact one of the external IPs of the organization VDC.

This is also reflected in the kubeconfig file on line 5. CSE expose uses the external IP and also adds all the IPs into the SANs.

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: [snipped]
    server: https://10.149.1.102:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: [snipped]
    client-key-data: [snipped]
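
A quick way to confirm that a downloaded kubeconfig points at the external address rather than an internal one is to pull out the server entry and compare it with the internal subnet. The sketch below uses only the Python standard library and a regular expression instead of a YAML parser, so it assumes the usual single-line server: entry. The function names are hypothetical, and the subnet is the one from my example.

```python
import ipaddress
import re
from urllib.parse import urlsplit


def api_server_endpoint(kubeconfig_text):
    """Return (host, port) from the first 'server:' entry in a kubeconfig."""
    match = re.search(r"^\s*server:\s*(\S+)\s*$", kubeconfig_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'server:' entry found")
    url = urlsplit(match.group(1))
    return url.hostname, url.port


def is_internal(host, internal_cidr):
    """Return True if host is an IP inside internal_cidr.

    Raises ValueError if host is a DNS name rather than an IP address.
    """
    return ipaddress.ip_address(host) in ipaddress.ip_network(internal_cidr)


# Trimmed copy of the kubeconfig shown above
sample = """\
clusters:
- cluster:
    server: https://10.149.1.102:6443
  name: kubernetes
"""

host, port = api_server_endpoint(sample)
print(host, port)
print(is_internal(host, "192.168.0.0/24"))
```

For the kubeconfig in this post the endpoint is 10.149.1.102 on port 6443, and the subnet check prints False, which is what you want for an exposed cluster.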

Logging into the Kubernetes cluster from outside of the cloud

As long as your workstation can route to the Control Plane Gateway IP you will be able to access the cluster from anywhere. Note that you can allocate public IP addresses directly to the Edge gateway, and in fact I work with providers who do this using BGP to the NSX-T T0. CSE expose basically uses an IP from the external network IP allocation pool.

The easiest way to test connectivity is to use kubectl like the following example.

kubectl get nodes -A --kubeconfig=/root/kubeconfig-native4.yaml

This returns:

NAME        STATUS   ROLES                  AGE    VERSION
mstr-18nu   Ready    control-plane,master   13m    v1.21.2
node-oh7f   Ready    <none>                 8m4s   v1.21.2

This of course corresponds to what has been deployed in VCD.
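
If kubectl hangs or times out instead of listing nodes, it helps to separate a routing or firewall problem from a certificate problem. A plain TCP connection attempt to the API server port does that without involving TLS at all. This is a small sketch using the Python standard library; can_reach is a hypothetical helper name.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This checks reachability only; it does not perform a TLS handshake
    or validate the certificate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the cluster in this post you would call can_reach("10.149.1.102", 6443). A result of False points at routing, firewall rules or the DNAT rule, while True combined with an x509 error from kubectl points back at the certificate SANs.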

Author: Hugo Phan

@hugophan
