How to import existing infrastructure into Terraform management

This post focuses on how to import existing infrastructure into Terraform’s management. A common scenario is that you’ve already deployed infrastructure and have only recently started to look into infrastructure as code. Maybe you’ve tried PowerShell, Ansible and other tools, but none are quite as declarative as Terraform.

Terraform is a great framework for developing and working with infrastructure-as-code to manage resources. It provides benefits such as fast, automated deployment, management of configuration drift, easy configuration changes and the ability to destroy entire environments with a few keystrokes. It also supports many providers, so you can use the same code logic to deploy and manage different resources, for example on VMware clouds, AWS or Azure, at the same time.

If you haven’t looked at Terraform before, take a quick look through HashiCorp’s website for more information:

https://www.terraform.io/

Getting started with Terraform is really quite simple when the environment that you are starting to manage is green-field, meaning you are starting from a completely fresh deployment on Day-0. If we take AWS as an example, this is as fresh as signing up to the AWS free-tier with a new account and having nothing deployed in your AWS console.

Terraform uses a few simple files to build and manage infrastructure through code: the configuration and the state. These are the basic building blocks of Terraform. There are other files and concepts, such as variables and modules, but I won’t cover these in much detail in this post.

How do you bring in infrastructure that is already deployed into Terraform’s management?

This is the brown-field case: infrastructure that already exists and was built outside of Terraform, for example through the AWS Console, CloudFormation or tools such as PowerShell and Ansible. The rest of this post walks through importing it into Terraform’s management.

Assumptions

First, let’s assume that you’ve deployed the Terraform CLI or are already using Terraform Cloud; the concepts are pretty much the same. I will be using the Terraform CLI for the examples in this post, together with AWS. I’m also going to assume that you know how to obtain access and secret keys from your AWS Console.

This import method works with any supported Terraform provider, including all the VMware ones. For this exercise, I will work with AWS.

My AWS environment consists of the following infrastructure, which I use in the examples below. Yours will of course be different.

You will need to obtain the AWS resource IDs from your environment; use the AWS Console or API to obtain this information.

#	Resource	Name	AWS Resource ID
1	VPC	VPC	vpc-02d890cacbdbaaf87
2	PublicSubnetA	PublicSubnetA	subnet-0f6d45ef0748260c6
3	PublicSubnetB	PublicSubnetB	subnet-092bf59b48c62b23f
4	PrivateSubnetA	PrivateSubnetA	subnet-03c31081bf98804e0
5	PrivateSubnetB	PrivateSubnetB	subnet-05045746ac7362070
6	IGW	IGW	igw-09056bba88a03f8fb
7	NetworkACL	NACL	acl-0def8bcfeff536048
8	RoutePublic	PublicRoute	rtb-082be686bca733626
9	RoutePrivate	PrivateRoute	rtb-0d7d3b5eacb25a022
10	Instance1	Instance1	i-0bf15fecd31957129
11	elb	elb-UE360LJ7779C	elb-158WU63HHVD3
12	SGELB	ELBSecurityGroup	sg-0b8f9ee4e1e2723e7
13	SGapp	AppServerSecurityGroup	sg-031fadbb59460a776

Table 1. AWS Resource IDs

But I used CloudFormation to deploy my infrastructure…

If you used CloudFormation to deploy your infrastructure and you now want to use Terraform, you will need to set the CloudFormation deletion policy to Retain before bringing any resources into Terraform. This is important because any accidental deletion of, or change to, the CloudFormation stack would otherwise impact your Terraform configuration and state. I recommend setting this policy before importing resources with Terraform.

This link has some more information that will help you enable the deletion policy on all resources.

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-deletionpolicy.html

For example, a CloudFormation template with the deletion policy set to Retain on the VPC would look like this:

Resources:
    VPC:
      Type: AWS::EC2::VPC
      DeletionPolicy: Retain
      Properties:
        CidrBlock: 10.0.0.0/16
        InstanceTenancy: default
        EnableDnsSupport: 'true'
        EnableDnsHostnames: 'true'
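
If a stack has many resources, adding the attribute by hand is error-prone. The following is a minimal Python sketch of one way to set it on every resource. It assumes the template is available as JSON (CloudFormation accepts JSON as well as YAML) and has already been loaded into a dictionary; the function name retain_all and the sample template are hypothetical, for illustration only.

```python
import copy
import json


def retain_all(template: dict) -> dict:
    """Return a copy of the template with DeletionPolicy set to Retain on every resource."""
    updated = copy.deepcopy(template)  # leave the caller's template untouched
    for resource in updated.get("Resources", {}).values():
        # DeletionPolicy sits next to Type and Properties, not inside Properties.
        resource["DeletionPolicy"] = "Retain"
    return updated


# Hypothetical minimal template, mirroring the VPC example above.
template = {
    "Resources": {
        "VPC": {
            "Type": "AWS::EC2::VPC",
            "Properties": {"CidrBlock": "10.0.0.0/16"},
        }
    }
}
print(json.dumps(retain_all(template), indent=2))
```

The updated template still has to be applied as a stack update before the policy takes effect, and a YAML template would need a YAML parser instead of the json module.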

Let’s get started!

Set up your main.tf configuration file for a new project that will import an existing AWS infrastructure. The first version of our main.tf file will look like this, with the VPC as the only resource that we will import. It’s always good to work with a single resource first to ensure that your import works before going all out and importing all the rest.

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = "3.28.0"
    }
  }
}

provider "aws" {
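  # Configuration options
  # The keys below are placeholders. The AWS provider can also read
  # AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment,
  # which keeps real credentials out of main.tf.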
  region = "eu-west-1"
  access_key = "my_access_key"
  secret_key = "my_secret_key"
}

resource "aws_vpc" "VPC" {
  # (resource arguments)
}

Run the following to initialize the AWS provider in Terraform.

terraform init

Import the VPC resource with this command in your terminal:

terraform import aws_vpc.VPC vpc-02d890cacbdbaaf87

You can then review the Terraform state file by opening it in a text editor. It should be named terraform.tfstate and will look something like this:

{
  "version": 4,
  "terraform_version": "0.14.6",
  "serial": 13,
  "lineage": "xxxx",
  "outputs": {},
  "resources": [
    {
      "mode": "managed",
      "type": "aws_vpc",
      "name": "VPC",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 1,
          "attributes": {
            "arn": "xxxx",
            "assign_generated_ipv6_cidr_block": false,
            "cidr_block": "10.0.0.0/16",
            "default_network_acl_id": "acl-067e11c10e2327cc9",
            "default_route_table_id": "rtb-0a55b9e1683991242",
            "default_security_group_id": "sg-0db58c5c159b1ebf9",
            "dhcp_options_id": "dopt-7d1b121b",
            "enable_classiclink": false,
            "enable_classiclink_dns_support": false,
            "enable_dns_hostnames": true,
            "enable_dns_support": true,
            "id": "vpc-02d890cacbdbaaf87",
            "instance_tenancy": "default",
            "ipv6_association_id": "",
            "ipv6_cidr_block": "",
            "main_route_table_id": "rtb-0a55b9e1683991242",
            "owner_id": "xxxxxxx",
            "tags": {
              "Name": "VPC",
              "environment": "aws",
              "project": "Imported by Terraform"
            }
          },
          "sensitive_attributes": [],
          "private": "xxxxxx"
        }
      ]
    }
  ]
}

Notice that the VPC and all of the VPC settings have now been imported into Terraform.
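
To check at a glance what has been imported, without reading the whole file, a short Python sketch like the one below can list each resource address and ID. It only reads the version 4 layout shown above, which is an internal format that may change between Terraform releases; the function name list_imported and the trimmed sample state are illustrative assumptions. The supported way to get the list of addresses is the terraform state list command.

```python
def list_imported(state: dict) -> list:
    """Return (address, id) pairs for every managed resource instance in a state document."""
    pairs = []
    for resource in state.get("resources", []):
        if resource.get("mode") != "managed":
            continue  # skip data sources
        address = f"{resource['type']}.{resource['name']}"
        for instance in resource.get("instances", []):
            pairs.append((address, instance.get("attributes", {}).get("id")))
    return pairs


# Trimmed-down copy of the state shown above, used here instead of reading the file.
sample_state = {
    "version": 4,
    "resources": [
        {
            "mode": "managed",
            "type": "aws_vpc",
            "name": "VPC",
            "instances": [{"attributes": {"id": "vpc-02d890cacbdbaaf87"}}],
        }
    ],
}
print(list_imported(sample_state))
```

To run it against the real file, load terraform.tfstate with json.load and pass the resulting dictionary to list_imported.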

Now that we have successfully imported the VPC, we can continue and import the rest of the infrastructure. The remaining AWS resources we need to import are detailed in Table 1. AWS Resource IDs.

To import the remaining infrastructure, we need to add a resource block for each of the other resources to the main.tf file. Edit your main.tf so that it looks like this. Notice that all thirteen resources are defined in the configuration file and the resource arguments are all empty. We will update the resource arguments later; initially we just need to import the resources into the Terraform state and then update the configuration to match the known state.

The Terraform version used here (0.14) does not support automatic creation of a configuration out of a state.

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      version = "3.28.0"
    }
  }
}

provider "aws" {
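  # Configuration options
  # Placeholder keys; the AWS provider can also read AWS_ACCESS_KEY_ID
  # and AWS_SECRET_ACCESS_KEY from the environment.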
  region = "eu-west-1"
  access_key = "my_access_key"
  secret_key = "my_secret_key"
}

resource "aws_vpc" "VPC" {
  # (resource arguments)
}

resource "aws_subnet" "PublicSubnetA" {
  # (resource arguments)
}

resource "aws_subnet" "PublicSubnetB" {
  # (resource arguments)
}

resource "aws_subnet" "PrivateSubnetA" {
  # (resource arguments)
}

resource "aws_subnet" "PrivateSubnetB" {
  # (resource arguments)
}

resource "aws_internet_gateway" "IGW" {
  # (resource arguments)
}

resource "aws_network_acl" "NACL" {
  # (resource arguments)
}

resource "aws_route_table" "PublicRoute" {
  # (resource arguments)
}

resource "aws_route_table" "PrivateRoute" {
  # (resource arguments)
}

resource "aws_instance" "Instance1" {
  # (resource arguments)
}

resource "aws_elb" "elb-158WU63HHVD3" {
  # (resource arguments)
}

resource "aws_security_group" "ELBSecurityGroup" {
  # (resource arguments)
}

resource "aws_security_group" "AppServerSecurityGroup" {
  # (resource arguments)
}

Run the following commands in your terminal to import the remaining resources into Terraform.

terraform import aws_subnet.PublicSubnetA subnet-0f6d45ef0748260c6
terraform import aws_subnet.PublicSubnetB subnet-092bf59b48c62b23f
terraform import aws_subnet.PrivateSubnetA subnet-03c31081bf98804e0
terraform import aws_subnet.PrivateSubnetB subnet-05045746ac7362070
terraform import aws_internet_gateway.IGW igw-09056bba88a03f8fb
terraform import aws_network_acl.NACL acl-0def8bcfeff536048
terraform import aws_route_table.PublicRoute rtb-082be686bca733626
terraform import aws_route_table.PrivateRoute rtb-0d7d3b5eacb25a022
terraform import aws_instance.Instance1 i-0bf15fecd31957129
terraform import aws_elb.elb-158WU63HHVD3 elb-158WU63HHVD3
terraform import aws_security_group.ELBSecurityGroup sg-0b8f9ee4e1e2723e7
terraform import aws_security_group.AppServerSecurityGroup sg-031fadbb59460a776
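
Typing a dozen import commands by hand invites copy-and-paste mistakes. As a rough sketch, the commands can be generated from a mapping of Terraform resource addresses to AWS resource IDs. The mapping below holds a few rows copied from Table 1; the function name import_commands is hypothetical, and the sketch only prints the commands rather than running them.

```python
import shlex

# Terraform resource address -> AWS resource ID, copied from Table 1.
# Extend this with the remaining rows of your own environment.
RESOURCES = {
    "aws_subnet.PublicSubnetA": "subnet-0f6d45ef0748260c6",
    "aws_internet_gateway.IGW": "igw-09056bba88a03f8fb",
    "aws_security_group.AppServerSecurityGroup": "sg-031fadbb59460a776",
}


def import_commands(resources: dict) -> list:
    """Build one 'terraform import ADDRESS ID' command line per resource."""
    return [
        f"terraform import {shlex.quote(address)} {shlex.quote(resource_id)}"
        for address, resource_id in resources.items()
    ]


for command in import_commands(RESOURCES):
    print(command)
```

Each address must match a resource block that already exists in main.tf, otherwise the import fails.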

Now that all thirteen resources are imported, you need to manually update the configuration file (main.tf in our case) with the resource arguments that correspond to the current state of the resources that were just imported. The easiest way to do this is to first look at the Terraform AWS provider documentation to find the mandatory arguments. Let’s use aws_subnet as an example:

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/subnet

From the documentation we need two things:

cidr_block – (Required) The CIDR block for the subnet.

vpc_id – (Required) The VPC ID.

We know that we need these two as a minimum, but what if other configuration was done in the AWS Console or CloudFormation before you started to work with Terraform? Tags are an obvious example, along with other configuration parameters. You want your main.tf file to contain the same configuration as what was just imported into the state. This is very important, because any difference will show up as a pending change the next time you run a plan.

To do this, do not copy from terraform.tfstate; instead, run the following command.

terraform show

You’ll get an output of the current state of your AWS environment, from which you can copy and paste the resource arguments into your main.tf configuration.

I won’t cover all thirteen resources in this post, so I’ll again use one of the aws_subnet resources as the example. Here is the PublicSubnetA aws_subnet resource information, copied and pasted straight out of the terraform show output.

# aws_subnet.PublicSubnetA:
resource "aws_subnet" "PublicSubnetA" {
    arn                             = "arn:aws:ec2:eu-west-1:xxxx:subnet/subnet-0f6d45ef0748260c6"
    assign_ipv6_address_on_creation = false
    availability_zone               = "eu-west-1a"
    availability_zone_id            = "euw1-az2"
    cidr_block                      = "10.0.0.0/24"
    id                              = "subnet-0f6d45ef0748260c6"
    map_customer_owned_ip_on_launch = false
    map_public_ip_on_launch         = true
    owner_id                        = "xxxx"
    tags                            = {
        "Name"        = "PublicSubnetA"
        "environment" = "aws"
        "project"     = "my_project"
    }
    vpc_id                          = "vpc-02d890cacbdbaaf87"

    timeouts {}
}

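Not all resource arguments are needed; again, review the documentation. Attributes such as arn, id and owner_id are reported by the provider rather than set by you, so they are left out. Here is an example of my changes to the main.tf file, with some of the settings taken from the output of the terraform show command.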

resource "aws_subnet" "PublicSubnetA" {
    assign_ipv6_address_on_creation = false
    cidr_block                      = var.cidr_block_PublicSubnetA
    map_public_ip_on_launch         = true
    tags                            = {
        Name        = "PublicSubnetA"
        environment = "aws"
        project     = "my_project"
    }
    vpc_id                          = var.vpc_id

    timeouts {}
}

Notice that I have turned the values for cidr_block and vpc_id into variables.
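
Trimming the terraform show output by hand gets tedious across thirteen resources. The Python sketch below produces a first draft of a resource block from a dictionary of attributes, dropping the ones that are reported rather than configured. The READ_ONLY set and the function names are assumptions for illustration; check each resource type against the provider documentation and review the result before using it.

```python
import json

# Assumption: for aws_subnet these attributes are exported by the provider,
# not set in configuration. Other resource types need their own list.
READ_ONLY = {"arn", "id", "owner_id"}


def hcl_value(value, indent=2):
    """Render a Python value as an HCL literal (strings, numbers, bools, lists, maps)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, dict):
        pad = " " * (indent + 2)
        body = "".join(
            f"{pad}{json.dumps(key)} = {hcl_value(item, indent + 2)}\n"
            for key, item in value.items()
        )
        return "{\n" + body + " " * indent + "}"
    return json.dumps(value)  # quoted strings, bare numbers, JSON-style lists


def render_resource(resource_type, name, attributes):
    """Build a draft resource block, skipping read-only and empty attributes."""
    lines = [f'resource "{resource_type}" "{name}" {{']
    for key, value in attributes.items():
        if key in READ_ONLY or value in ("", None):
            continue
        lines.append(f"  {key} = {hcl_value(value)}")
    lines.append("}")
    return "\n".join(lines)


# A few of the PublicSubnetA attributes from the terraform show output above.
subnet = {
    "arn": "arn:aws:ec2:eu-west-1:xxxx:subnet/subnet-0f6d45ef0748260c6",
    "cidr_block": "10.0.0.0/24",
    "id": "subnet-0f6d45ef0748260c6",
    "map_public_ip_on_launch": True,
    "tags": {"Name": "PublicSubnetA"},
    "vpc_id": "vpc-02d890cacbdbaaf87",
}
print(render_resource("aws_subnet", "PublicSubnetA", subnet))
```

The draft still contains literal values; swapping them for variables, as in my example above, remains a manual step.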

Using Variables

Using variables simplifies a lot of your code. I’m not going to explain what they are in this post; you can read up on them at this link:

https://www.terraform.io/docs/language/values/variables.html

However, the contents of my terraform.tfvars file look like this:

cidr_block = "10.0.0.0/16"
vpc_id = "vpc-02d890cacbdbaaf87"
cidr_block_PublicSubnetA = "10.0.0.0/24"
cidr_block_PublicSubnetB = "10.0.1.0/24"
cidr_block_PrivateSubnetA = "10.0.2.0/24"
cidr_block_PrivateSubnetB = "10.0.3.0/24"
instance_type = "t2.micro"
ami_id = "ami-047bb4163c506cd98"
instance_port = "80"
instance_protocol = "http"
lb_port = "80"
lb_protocol = "http"

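Just place your terraform.tfvars file in the same location as your main.tf file. Terraform automatically loads a file with that default name, or you can point it at a different variable file with the -var-file option. Each variable also needs a matching variable block declared in your configuration; again, refer to the documentation.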

Finalizing the configuration

Once you’ve updated your main.tf configuration with all the correct resource arguments, you can test to see if what is in the configuration is the same as what is in the state. To do this run the following command:

terraform plan

If you copied, pasted and updated your main.tf correctly, you will get output in your terminal similar to the following:

terraform plan
[ Removed content to save space ]

No changes. Infrastructure is up-to-date.

This means that Terraform did not detect any differences between your
configuration and real physical resources that exist. As a result, no
actions need to be performed.

Congratulations, you’ve successfully imported an infrastructure that was built outside of Terraform.
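
If you want to repeat this check from a script, for example after editing each resource block, terraform plan accepts the -detailed-exitcode flag, which returns 0 when there are no changes, 1 on error and 2 when changes are pending. The following Python sketch wraps that call; the function names are hypothetical, and it assumes terraform is on your PATH and the working directory has already been initialized.

```python
import subprocess

# Exit codes documented for: terraform plan -detailed-exitcode
PLAN_EXIT_MEANINGS = {
    0: "no changes: configuration matches the imported state",
    1: "error: the plan could not be produced",
    2: "changes pending: configuration and state differ",
}


def describe_plan_exit(code: int) -> str:
    """Translate a terraform plan -detailed-exitcode result into a message."""
    return PLAN_EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def check_plan(workdir: str = ".") -> str:
    """Run terraform plan in workdir and describe the outcome."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
    )
    return describe_plan_exit(result.returncode)
```

Calling check_plan() from the project directory prints the normal plan output and returns one of the three messages, which makes it easy to stop a script as soon as the configuration drifts from what was imported.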

You can now proceed to manage your infrastructure with Terraform. For example, change these parameters in terraform.tfvars:

lb_port = "443"
lb_protocol = "https"

Running plan and apply will then update the elastic load balancer elb-158WU63HHVD3, changing its health check from port 80 to port 443.

terraform plan
[ removed content to save space ]
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_elb.elb-158WU63HHVD3 will be updated in-place
  ~ resource "aws_elb" "elb-158WU63HHVD3" {
      ~ health_check {
          ~ target              = "TCP:80" -> "TCP:443"           
        }
    }

terraform apply
[ content removed to save space] 

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.

State path: terraform.tfstate

And that’s how you import existing resources into Terraform. I hope you find this post useful. Please comment below if you have a better method, suggestions for improvements, or questions and need help.

Automate NSX-T Load Balancer setup for Cloud Director and the Tenant App

This post describes how to use the NSX-T Policy API to automate the creation of load balancer configurations for Cloud Director and the vRealize Operations Tenant App.

I’ve included a Postman collection that contains all of the necessary API calls to get everything configured. There is also a Postman environment that contains the necessary variables to successfully configure the load balancer services.

To get started, import the collection and environment into Postman.

You’ll see the collection in Postman named NSX-T Load Balancer Setup. The steps are numbered and cover importing the certificates and configuring the Cloud Director load balancer services. I’ve also included the calls to create the load balancer services for the vRealize Operations Tenant App.

Before you run any of those API calls, you’ll first want to import the Postman environment. Once imported, you’ll see the environment at the top right of the Postman window; it is called NSX-T Load Balancer Setup.

Complete your environment variables.

Variable	Value Description
nsx_vip	nsx-t manager cluster virtual ip
nsx-manager-user	nsx-t manager username, usually admin
nsx-manager-password	nsx-t manager password
vcd-public-ip	public ip address for the vcd service to be configured on the load balancer
tenant-app-public-ip	public ip address for the tenant app service to be configured on the load balancer
vcd-cert-name	a name for the imported vcd http certificate
vcd-cert-private-key	vcd http certificate private key in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character. For example: -----BEGIN RSA PRIVATE KEY-----\n<private key>\n-----END RSA PRIVATE KEY-----
vcd-cert-passphrase	vcd private key passphrase
vcd-certificate	vcd http certificate in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character. For example: -----BEGIN CERTIFICATE-----\nMIIGADCCBOigAwIBAgIRALUVXndtVGMeRM1YiMqzBCowDQYJKoZIhvcNAQELBQAw\ngY8xCzAJBgNVBAYTAkdCMRswGQYDVQQIExJHcmVhdGVyIE1hbmNoZXN0ZXIxEDAO\nBgNVBAcTB1NhbGZvcmQxGDAWBgNVBAoTD1NlY3RpZ28gTGltaXRlZDE3MDUGA1UE\nAxMuU2VjdGlnbyBSU0EgRG9tYWluIFZhbGlkYXRpb24gU2VjdXJlIFNlcnZlciBD\nQTAeFw0xOTA4MjMwMDAwMDBaFw0yMDA4MjIyMzU5NTlaMFUxITAfBgNVBAsTGERv\nbWFpbiBDb250cm9sIFZhbGlkYXRlZDEUMBIGA1UECxMLUG9zaXRpdmVTU0wxGjAY\nBgNVBAMTEXZjbG91ZC52bXdpcmUuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A\nMIIBCgKCAQEAqh9sn6bNiDmmg3fJSG4zrK9IbrdisALFqnJQTkkErvoky2ax0RzV\n/ZJ/1fNHpvy1yT7RSZbKcWicoxatYPCgFHDzz2JwgvfwQCRMOfbPzohTSAhrPZph\n4FOPnrF8iwGggTxp+/2/ixg0DjQZL32rc9ax1qEvSURt571hUE7uLkRbPrdbocSZ\n4c2atVh8K1fp3uBqEbAs0UyjW5PK3wIN5ZRFArxc5kiGW0btN1RmoWwOmuJkAtu7\nzuaAJcgr/UVb1PP+GgAvKdmikssB1MWQALTRHm7H2GJp2MlbyGU3ZROSPkSSaNsq\n4otCJxtvQze/lB5QGWj5V2B7YbNJKwJdXQIDAQABo4ICjjCCAoowHwYDVR0jBBgw\nFoAUjYxexFStiuF36Zv5mwXhuAGNYeEwHQYDVR0OBBYEFNhZaRisExXrYrqfIIm6\n9TP8JrqwMA4GA1UdDwEB/wQEAwIFoDAMBgNVHRMBAf8EAjAAMB0GA1UdJQQWMBQG\nCCsGAQUFBwMBBggrBgEFBQcDAjBJBgNVHSAEQjBAMDQGCysGAQQBsjEBAgIHMCUw\nIwYIKwYBBQUHAgEWF2h0dHBzOi8vc2VjdGlnby5jb20vQ1BTMAgGBmeBDAECATCB\nhAYIKwYBBQUHAQEEeDB2ME8GCCsGAQUFBzAChkNodHRwOi8vY3J0LnNlY3RpZ28u\nY29tL1NlY3RpZ29SU0FEb21haW5WYWxpZGF0aW9uU2VjdXJlU2VydmVyQ0EuY3J0\nMCMGCCsGAQUFBzABhhdodHRwOi8vb2NzcC5zZWN0aWdvLmNvbTAzBgNVHREELDAq\nghF2Y2xvdWQudm13aXJlLmNvbYIVd3d3LnZjbG91ZC52bXdpcmUuY29tMIIBAgYK\nKwYBBAHWeQIEAgSB8wSB8ADuAHUAsh4FzIuizYogTodm+Su5iiUgZ2va+nDnsklT\nLe+LkF4AAAFsv3BsIwAABAMARjBEAiBat+l0e3BTu+EBcRJfR8hCA/CznWm1mbVl\nxZqDoKM6tAIgON6U0YoqA91xxpXH2DyA04o5KSdSvNT05wz2aa7zkzwAdQBep3P5\n31bA57U2SH3QSeAyepGaDIShEhKEGHWWgXFFWAAAAWy/cGw+AAAEAwBGMEQCIDHl\njofAcm5GqECwtjBfxYD7AFkJn4Ez0IGRFrux4ldiAiAaNnkMbf0P9arSDNno4hQT\nIJ2hUaIWNfuKBEIIkfqhCTANBgkqhkiG9w0BAQsFAAOCAQEAZCubBHRV+m9iiIeq\nCoaFV2YZLQUz/XM4wzQL+73eqGHINp6xh/+kYY6vw4j+ypr9P8m8+ouqichqo7GJ\nMhjtbXrB+TTRwqQgDHNHP7egBjkO+eDMxK4aa3x1r1AQoRBclPvEbXCohg2sPUG5\nZleog76NhPARR43gcxYC938OH/2TVAsa4JApF3vbCCILrbTuOy3Z9rf3aQLSt6Jp\nkh85w6AlSkXhQJWrydQ1o+NxnfQmTOuIH8XEQ2Ne1Xi4sbiMvWQ7dlH5/N8L8qWQ\nEPCWn+5HGxHIJFXMsgLEDypvuXGt28ZV/T91DwPLeGCEp8kUC3N+uamLYeYMKOGD\nMrToTA==\n-----END CERTIFICATE-----
ca-cert-name	a name for the imported ca root certificate
ca-certificate	ca root certificate in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character.
vcd-node1-name	the hostname for the first vcd appliance
vcd-node1-ip	the dmz ip address for the first vcd appliance
vcd-node2-name	the hostname for the second vcd appliance
vcd-node2-ip	the dmz ip address for the second vcd appliance
vcd-node3-name	the hostname for the third vcd appliance
vcd-node3-ip	the dmz ip address for the third vcd appliance
tenant-app-node-name	the hostname for the vrealize operations tenant app appliance
tenant-app-node-ip	the dmz ip address for the vrealize operations tenant app appliance
tenant-app-cert-name	a name for the imported tenant app certificate
tenant-app-cert-private-key	tenant app certificate private key in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character. For example: -----BEGIN RSA PRIVATE KEY-----\n<private key>\n-----END RSA PRIVATE KEY-----
tenant-app-cert-passphrase	tenant app private key passphrase
tenant-app-certificate	tenant app certificate in pem format, the APIs only accept single line and no spaces in the certificate chain, use \n as an end of line character. For example: -----BEGIN CERTIFICATE-----\nMIIGADCCBOigAwIBAgIRALUVXndtVGMeRM1YiMqzBCowDQYJKoZIhvcNAQELBQAw\ngY8xCzAJBgNVBAYTAkdCMRswGQYDVQQIExJHcmVhdGVyIE1hbmNoZXN0ZXIxEDAO\nBgNVBAcTB1NhbGZvcmQxGDAWBgNVBAoTD1NlY3RpZ28gTGltaXRlZDE3MDUGA1UE\nAxMuU2VjdGlnbyBSU0EgRG9tYWluIFZhbGlkYXRpb24gU2VjdXJlIFNlcnZlciBD\nQTAeFw0xOTA4MjMwMDAwMDBaFw0yMDA4MjIyMzU5NTlaMFUxITAfBgNVBAsTGERv\nbWFpbiBDb250cm9sIFZhbGlkYXRlZDEUMBIGA1UECxMLUG9zaXRpdmVTU0wxGjAY\nBgNVBAMTEXZjbG91ZC52bXdpcmUuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A\nMIIBCgKCAQEAqh9sn6bNiDmmg3fJSG4zrK9IbrdisALFqnJQTkkErvoky2ax0RzV\n/ZJ/1fNHpvy1yT7RSZbKcWicoxatYPCgFHDzz2JwgvfwQCRMOfbPzohTSAhrPZph\n4FOPnrF8iwGggTxp+/2/ixg0DjQZL32rc9ax1qEvSURt571hUE7uLkRbPrdbocSZ\n4c2atVh8K1fp3uBqEbAs0UyjW5PK3wIN5ZRFArxc5kiGW0btN1RmoWwOmuJkAtu7\nzuaAJcgr/UVb1PP+GgAvKdmikssB1MWQALTRHm7H2GJp2MlbyGU3ZROSPkSSaNsq\n4otCJxtvQze/lB5QGWj5V2B7YbNJKwJdXQIDAQABo4ICjjCCAoowHwYDVR0jBBgw\nFoAUjYxexFStiuF36Zv5mwXhuAGNYeEwHQYDVR0OBBYEFNhZaRisExXrYrqfIIm6\n9TP8JrqwMA4GA1UdDwEB/wQEAwIFoDAMBgNVHRMBAf8EAjAAMB0GA1UdJQQWMBQG\nCCsGAQUFBwMBBggrBgEFBQcDAjBJBgNVHSAEQjBAMDQGCysGAQQBsjEBAgIHMCUw\nIwYIKwYBBQUHAgEWF2h0dHBzOi8vc2VjdGlnby5jb20vQ1BTMAgGBmeBDAECATCB\nhAYIKwYBBQUHAQEEeDB2ME8GCCsGAQUFBzAChkNodHRwOi8vY3J0LnNlY3RpZ28u\nY29tL1NlY3RpZ29SU0FEb21haW5WYWxpZGF0aW9uU2VjdXJlU2VydmVyQ0EuY3J0\nMCMGCCsGAQUFBzABhhdodHRwOi8vb2NzcC5zZWN0aWdvLmNvbTAzBgNVHREELDAq\nghF2Y2xvdWQudm13aXJlLmNvbYIVd3d3LnZjbG91ZC52bXdpcmUuY29tMIIBAgYK\nKwYBBAHWeQIEAgSB8wSB8ADuAHUAsh4FzIuizYogTodm+Su5iiUgZ2va+nDnsklT\nLe+LkF4AAAFsv3BsIwAABAMARjBEAiBat+l0e3BTu+EBcRJfR8hCA/CznWm1mbVl\nxZqDoKM6tAIgON6U0YoqA91xxpXH2DyA04o5KSdSvNT05wz2aa7zkzwAdQBep3P5\n31bA57U2SH3QSeAyepGaDIShEhKEGHWWgXFFWAAAAWy/cGw+AAAEAwBGMEQCIDHl\njofAcm5GqECwtjBfxYD7AFkJn4Ez0IGRFrux4ldiAiAaNnkMbf0P9arSDNno4hQT\nIJ2hUaIWNfuKBEIIkfqhCTANBgkqhkiG9w0BAQsFAAOCAQEAZCubBHRV+m9iiIeq\nCoaFV2YZLQUz/XM4wzQL+73eqGHINp6xh/+kYY6vw4j+ypr9P8m8+ouqichqo7GJ\nMhjtbXrB+TTRwqQgDHNHP7egBjkO+eDMxK4aa3x1r1AQoRBclPvEbXCohg2sPUG5\nZleog76NhPARR43gcxYC938OH/2TVAsa4JApF3vbCCILrbTuOy3Z9rf3aQLSt6Jp\nkh85w6AlSkXhQJWrydQ1o+NxnfQmTOuIH8XEQ2Ne1Xi4sbiMvWQ7dlH5/N8L8qWQ\nEPCWn+5HGxHIJFXMsgLEDypvuXGt28ZV/T91DwPLeGCEp8kUC3N+uamLYeYMKOGD\nMrToTA==\n-----END CERTIFICATE-----
tier1-full-path	the full path to the nsx-t tier1 gateway that will run the load balancer, for example /infra/tier-1s/stage1-m-ec01-t1-gw01
vcd-dmz-segment-name	the portgroup name of the vcd dmz portgroup, for example stage1-m-vCDFront
allowed_ip_a	an ip address that is allowed to access the /provider URI and the admin API
allowed_ip_b	an ip address that is allowed to access the /provider URI and the admin API

Variables
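
The certificate and key variables need the PEM text collapsed onto one line with a literal \n between the original lines. A minimal Python sketch of that conversion is shown below; the function name pem_to_single_line and the dummy PEM body are assumptions for illustration, not part of the Postman collection.

```python
def pem_to_single_line(pem_text: str) -> str:
    """Join the lines of a PEM block with a literal backslash-n sequence."""
    lines = [line.strip() for line in pem_text.strip().splitlines() if line.strip()]
    return "\\n".join(lines)


# Dummy PEM content for illustration only; use the text of your own file.
sample = """-----BEGIN CERTIFICATE-----
AAAA
BBBB
-----END CERTIFICATE-----
"""
print(pem_to_single_line(sample))
```

Paste the printed value into the matching Postman environment variable, for example vcd-certificate or ca-certificate.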

Now you’re ready to run the calls.

The collection and environment are available to download from GitHub.

Atlantis USX Data Services with Hyper-Converged Architecture – Web Scale, Virtual Volumes & In-line Deduplication

Atlantis USX has some very cool technology which I’ve had the pleasure to ‘play’ with over the past few weeks. In this series of posts I’ll attempt to cover the various technologies within the Atlantis USX stack.

The key technologies in the Atlantis USX In-Memory Data Services are:

Inline IO and Data de-duplication
Content aware IO processing
Compression
Fast Clone
Storage Policies
Thin Provisioning

This post focuses on Inline IO and Data de-duplication (or just dedupe for short) and Fast Clone, and on how these rich data services enable a hyper-converged solution to outperform enterprise storage arrays.

Why would you use Atlantis USX?

The best way to approach this is to look at some use cases. Crazy as it seems, Atlantis USX delivers all-flash array performance and also gives five times the capacity of traditional storage arrays. It does this entirely in software, with no hardware appliances: true software-defined storage that enables web-scale architecture.

The majority of storage vendors today do one or the other, not both. So you could end up with storage silos where IOPS are provided by an all-flash array and capacity is provided by a traditional SAN.

USX Use Cases

The three key Atlantis USX messages are:

Why buy more storage when you can do more with the storage you already have?
- Get up to 5X the capacity out of your existing storage array
- Avoid buying any new storage hardware for the next 5 years
- Reduce storage costs by up to 75%
Use cases: Storage capacity running out in your current arrays.
- Don’t buy another disk tray or array; free up capacity by leveraging Atlantis USX Inline Deduplication.
- Get more capacity out of your all-flash array purchase: all-flash arrays (AFA) provide great performance but not great capacity, so get 5X more capacity by using USX on top of your AFA.

Accelerate the performance of your existing storage array
- Deliver all-flash performance to applications with your existing storage at a fraction of the cost
- Works with any storage system type: SAN, NAS, Hybrid, DAS
Use cases: Current storage arrays not providing enough IOPS to your applications. Place USX in front of your array and gain all-flash performance by using RAM from your hypervisor to accelerate and optimize the IO.

Build hyper-converged systems INSTANTLY without buying any new hardware
- With RAM, local disk (SSD/SAS/SATA) or VMware VSAN on your existing servers
- Don’t replace your servers of choice with alternative appliances
- Use blade servers for hyper-converged infrastructure
Use cases: Leverage existing investment in your compute estate by using USX to pool and protect local RAM and DAS to create a hyper-converged solution which can leverage both the DAS and any shared storage resources already deployed, including traditional SAN/NAS and VMware VSAN. Also use your preferred server architecture for hyper-converged; USX allows you to use both blade and rack server form factors due to the reduction in the number of disks required.

What if I want to do all of the above, all at the same time?

Well yes you can. And yes Duncan, we are doing this today (http://www.yellow-bricks.com/2014/05/30/looking-back-software-defined-storage/).

You can get the benefits of rich data services coupled with crazy fast storage and in-line deduplication enabling immediate capacity savings today.

What is Inline IO and Data de-duplication?

In short, it is the ability to dedupe data blocks and therefore IO operations before those blocks and IO operations reach the underlying storage. Atlantis USX reduces the load on the underlying storage by processing IO using the distributed in-memory technology within Atlantis USX.

To demonstrate this, the blue graph below represents IOPS provided by USX to VMs. The red graph represents the actual IOPS that USX then sends down to the underlying storage (if it needs to). [The red graph would be for IO operations that are required for unique writes, however I won’t go into detail about that here in this post.]

Similarly, the same graphs can be used to show data de-duplication: just replace the IOPS metric on the y-axis with Capacity Utilization (GB) and you will see the same savings in the red graph. Atlantis USX uses in-memory in-line de-duplication to offload IOPS from the underlying storage and to reduce consumed capacity on the underlying storage. I’ll show you how this works in the labs below.
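
To make the idea concrete, here is a toy Python sketch of inline de-duplication. It is purely illustrative and says nothing about how Atlantis USX is actually implemented; the class name DedupeStore, the use of SHA-256 as the content fingerprint and the in-memory dictionaries are all assumptions. Every write from a VM counts as a front-end IO, but only blocks with previously unseen content are written to the back end.

```python
import hashlib


class DedupeStore:
    """Toy inline de-duplicating block store (illustrative only)."""

    def __init__(self):
        self.blocks = {}          # content hash -> data, stands in for the underlying storage
        self.volume = {}          # logical block address -> content hash
        self.frontend_writes = 0  # writes issued by VMs
        self.backend_writes = 0   # unique writes that reach the underlying storage

    def write(self, address: int, data: bytes) -> None:
        self.frontend_writes += 1
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            # Only previously unseen content goes down to the back end.
            self.blocks[digest] = data
            self.backend_writes += 1
        self.volume[address] = digest

    def read(self, address: int) -> bytes:
        return self.blocks[self.volume[address]]


store = DedupeStore()
for address in range(100):
    store.write(address, b"identical block content")
store.write(100, b"one unique block")
print(store.frontend_writes, store.backend_writes)  # 101 2
```

The gap between the two counters corresponds to the gap between the blue and red graphs: many IOs are served to the VMs, while only the unique ones reach the storage and consume capacity.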

Examples in the lab

Let’s see some of these use cases in action in the lab.

Lab setup

3 x SuperMicro servers installed with vSphere 5.5 U1b, with 32GB RAM, 1 x SSD and 1 x SATA disk, plus some shared storage (not used in this post) presented from an all-flash array (Violin Memory) and a SAN (Nexenta), both over iSCSI.
Local direct attached storage (DAS) pooled, protected and managed by Atlantis USX.

Use Case 1: Building hyper-converged using Atlantis USX for VDI

In this use case I’ve created a hyper-converged system using the three servers and pooling the local SSDs as a performance pool and the local SATA drives as a capacity pool.

Memory is not used as a performance pool due to the servers only having 32GB of RAM. In a real world deployment you can of course use RAM as the performance pool and not require any SSDs altogether. I’ll use RAM in another blog post.

In the vSphere Client, these disks are shown as local VMFS5 data stores.

Pooling Local Resources

What USX then does is pool the SSDs into a Performance Pool and the SATA disks into a Capacity Pool.

Performance Pools

Atlantis USX pools the SSDs into a Performance Pool to provide performance. Performance Pools provide redundancy and resiliency to the underlying resources. In this example, where we are only using three servers, the raw capacity provided by the SSDs is 120GB x 3 = 360GB. However, because the Performance Pool provides redundancy, the usable capacity is about 66% (two-thirds) of this, so 240GB is usable. This is the minimum configuration for a 3-node vSphere cluster. If you had a 4-node cluster, you would have the option to deploy a Performance Pool with a ‘RAID-10’ configuration, which would give you 480GB raw and 240GB usable. It’s really up to you to define how local resources are protected by Atlantis USX, and by adding more nodes to your vSphere cluster and/or more local resources you can create hyper-converged infrastructure which is truly web scale.

Side note 1: an aside on web scale

Atlantis USX can pool, protect and manage multiple vCenter Servers and their resources. vCenter Servers can manage thousands of vSphere ESXi hosts. You can even create a Virtual Volume from resources which span over multiple ESXi servers, which are not in the same vSphere Cluster and not managed by the same vCenter Server. Heck, you can even use USX to provide the rich data services through Virtual Volumes which use multiple vsanDatastores (VMware VSAN). What I’m trying to say is that your USX Virtual Volume is not restricted to a vCenter construct and as such is free to roam as it is in essence decoupled from any underlying hardware. More on Virtual Volumes later.

Back to Capacity Pools

Atlantis USX pools the SATA disks into a Capacity Pool to provide capacity. Capacity Pools also provide redundancy and resiliency to the underlying resources. In this example, where we are only using three servers, the raw capacity provided by the SATA disks is 1000GB x 3 = 3000GB. However, because the Capacity Pool provides redundancy, the usable capacity is about 66% (two-thirds) of this, so 2000GB is usable.
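
The pool arithmetic can be checked with a few lines of Python. This is only a sketch of the sums in this post: it assumes the disk sizes are in GB, that the 66% figure means two-thirds of raw capacity for the three-node layout, and that the ‘RAID-10’ layout leaves half of raw capacity usable. The function names are hypothetical.

```python
def raw_capacity_gb(per_node_gb: int, nodes: int) -> int:
    """Raw capacity when every node contributes one disk of the same size."""
    return per_node_gb * nodes


def usable_three_node_gb(raw_gb: int) -> int:
    # Assumption: "66%" means two-thirds of raw capacity remains usable.
    return raw_gb * 2 // 3


def usable_raid10_gb(raw_gb: int) -> int:
    # Assumption: a 'RAID-10' style layout leaves half of raw capacity usable.
    return raw_gb // 2


ssd_raw = raw_capacity_gb(120, 3)
sata_raw = raw_capacity_gb(1000, 3)
ssd_raw_4_nodes = raw_capacity_gb(120, 4)

print(ssd_raw, usable_three_node_gb(ssd_raw))          # 360 240
print(sata_raw, usable_three_node_gb(sata_raw))        # 3000 2000
print(ssd_raw_4_nodes, usable_raid10_gb(ssd_raw_4_nodes))  # 480 240
```

Swapping in your own disk sizes and node counts gives a quick estimate of what a pool will provide before any data reduction is applied.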

The resources from the Performance Pool and Capacity Pool are then used to carve out resources to Virtual Volumes.

Side note 2: a quick introduction to Atlantis USX Virtual Volumes

The concept of a Virtual Volume is not new. It was proposed by VMware back in 2012 (http://blogs.vmware.com/vsphere/2012/10/virtual-volumes-vvols-tech-preview-with-video.html) and described in more detail by Duncan here (http://www.yellow-bricks.com/2012/08/07/vmware-vstorage-apis-for-vm-and-application-granular-data-management/), but it has not really had the engineering focus that it deserves until now (http://www.punchingclouds.com/2014/06/30/virtual-volumes-public-beta/). The concept is very straightforward: your application should not be dependent on the underlying storage for its storage needs.

“Virtual Volumes is all about making the storage VM-centric – in other words making the VMDK a first class citizen in the storage world” – Cormac Hogan

Your application should be able to define its own set of requirements and then the storage will configure itself to accommodate the application. Some of these requirements could be:

The amount of capacity
The performance – IOPS and latency
The level of availability – backup and replication
The isolation level – single virtual volume container just for this application or shared between multiple applications of a similar workload

With Atlantis USX, Virtual Volumes have a storage policy which defines those exact requirements. Atlantis USX will provide the rich data services for the virtual volumes, which can then be consumed by the application at the request of an Application Owner. This enables self-service storage requests and management for an application, without waiting for a storage admin to calculate the RAID level and getting your LUN two weeks later. Is this still happening?
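
To make the idea of a storage policy a little more concrete, here is a purely illustrative Python sketch. None of these class, field or function names come from Atlantis USX; they are hypothetical stand-ins for the four requirements listed above, and the IOPS and latency values in the example are made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Isolation(Enum):
    DEDICATED = "dedicated"  # one volume just for this application
    SHARED = "shared"        # shared between applications of a similar workload


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical policy record: capacity, performance, availability, isolation."""
    capacity_gb: int
    min_iops: int
    max_latency_ms: float
    replicated: bool
    isolation: Isolation


def satisfies(offered: StoragePolicy, requested: StoragePolicy) -> bool:
    """Return True if the offered volume meets every requirement of the request."""
    return (
        offered.capacity_gb >= requested.capacity_gb
        and offered.min_iops >= requested.min_iops
        and offered.max_latency_ms <= requested.max_latency_ms
        and (offered.replicated or not requested.replicated)
        and offered.isolation == requested.isolation
    )


# Illustrative values only
requested = StoragePolicy(100, 500, 5.0, False, Isolation.SHARED)
offered = StoragePolicy(100, 1000, 1.0, True, Isolation.SHARED)

print(satisfies(offered, requested))  # the offer meets the request
print(satisfies(requested, offered))  # the reverse does not
```

The point of the sketch is the direction of the relationship: the application states what it needs and the storage is checked (or configured) against that, rather than the other way round.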

An Atlantis USX Virtual Volume is created from some memory from the hypervisor, some resource from the Performance Pool and some resource from the Capacity Pool. The Atlantis USX rich data services (inline data deduplication and content aware IO processing) happen at the Virtual Volume level. The Virtual Volume is then exported by Atlantis USX as NFS or iSCSI (Object and CIFS are coming very soon), either to the underlying hypervisor as a datastore or directly to the application. Think of a Virtual Volume as either a) an application container or b) a datastore, all with the storage policy characteristics explained above and of course supporting all of the lovely vSphere, Horizon View, vCloud and VCAC features that you’ve come to love and depend on:

HA
DRS
vMotion
Fault Tolerance
Snapshots
Thin Provisioning
vSphere Replication
Storage Profiles
Linked Clones
Fast Provisioning
VAAI

Back to creating Virtual Volumes from Pools

In our example here, the maximum size for one Virtual Volume would be constructed from 240GB from the Performance Pool and 2000GB from the Capacity Pool. However, to take advantage of Atlantis USX in-memory I/O optimization and de-duplication, you would create multiple Virtual Volumes, one for a particular workload type. Doing so will make the most out of the Atlantis USX Content Aware IO Processing engine.

Let’s configure a single Virtual Volume for a VDI use case. I’ll create a Virtual Volume with just 100GB from the Capacity Pool and 5GB from the Performance Pool. We will then deploy some Windows 8 VMs into this Virtual Volume and see the Atlantis USX in-memory data deduplication and content aware IO processing in action.
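
Carving a Virtual Volume out of the two pools can be pictured with a small Python sketch. The Pool class and its allocate method are hypothetical names used for illustration, not a USX interface; the capacities are the usable figures from this example.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical pool with a fixed usable capacity and simple allocation tracking."""
    name: str
    usable_gb: int
    allocated_gb: int = 0

    @property
    def free_gb(self) -> int:
        return self.usable_gb - self.allocated_gb

    def allocate(self, size_gb: int) -> None:
        # Refuse to hand out more than the pool has left
        if size_gb > self.free_gb:
            raise ValueError(f"{self.name}: only {self.free_gb}GB free")
        self.allocated_gb += size_gb


performance = Pool("Performance Pool", usable_gb=240)
capacity = Pool("Capacity Pool", usable_gb=2000)

# The VDI volume in this post: 5GB of SSD and 100GB of SATA
performance.allocate(5)
capacity.allocate(100)

print(f"{performance.name}: {performance.free_gb}GB left")
print(f"{capacity.name}: {capacity.free_gb}GB left")
```

After this one volume there is still 235GB of Performance Pool and 1900GB of Capacity Pool left for further Virtual Volumes, one per workload type.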

Here’s our Virtual Volume below, configured from 100GB of resilient SATA and just 5GB of resilient SSD. Note that VAAI integration is supported and for NFS the following primitives are currently available: ‘Full File Clone’ and ‘Fast File Clone/Native Snapshot Support’.

[Dear VMware, how about a new ‘Drive Type’ label named ‘In-Memory’, ‘USX’, ‘Crazy Fast’?]

As you can see the datastore is empty. Very empty. The status graphs within USX currently show no IO offload and no deduplication. There’s nothing to dedupe and no IO to process.

Let’s start using this datastore by cloning a Windows 8 template into it. We will immediately see deduplication savings on the full clone after it is copied to our new virtual volume.

Here’s our new template, cloned from the ‘Windows 8.1 Template’ template above which is now located on the new usx-hyb-vol1 virtual volume.

The same graph below shows that, for just that single workload, USX has already deduplicated 18% of the data.

Let’s jump into Horizon View and create a desktop pool that uses Full Clones for any new desktops. I’ll use the template named win8-template-on-usx as the base template for the new desktop pool and our new virtual volume usx-hyb-vol1 as the datastore.

Let’s see what happens when we deploy one new virtual machine via a full clone with Horizon View which uses an Atlantis USX Virtual Volume. Hint: The clone happens almost instantly due to the VAAI Full Clone offload to USX. We will also see the deduplication ratio increase and IO offload will also increase.

The Full Clone completes in about 9 seconds. Happy days!

The deduplication ratio has increased to 63%, with just two VMs on this datastore: the template win8-template-on-usx and the first VM usx-vdi1.

Taking a look with the vSphere Client datastore browser again, we now see two VMs in the virtual volume which are both full VMs, not linked clones.

Two Full VMs, only occupying 8.9GB.

Let’s now go ahead and deploy an additional 5 VMs using Horizon View.

All five new VMs are provisioned pretty much instantly as shown in the vSphere Client Recent Tasks pane.

Checking the Atlantis USX status graphs again, the deduplication ratio has increased to 88%.

And we now see 6 Full Clones and the template in the datastore but still just consuming 10.57GB.
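
Why does the ratio keep climbing as clones are added? A toy Python model of block-level deduplication shows the general effect. This is a sketch of the generic technique (hash each block, store each distinct block once), not how Atlantis USX implements it. The block size, the class name and the fake "template" data are all assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this toy example


class DedupStore:
    """Toy inline-deduplicating block store: identical blocks are kept once."""

    def __init__(self) -> None:
        self.unique_blocks = {}   # SHA-256 digest -> block contents
        self.logical_blocks = 0   # blocks written by clients, duplicates included

    def write(self, data: bytes) -> None:
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            # Only store the block if its content has not been seen before
            self.unique_blocks.setdefault(digest, block)
            self.logical_blocks += 1

    def dedup_ratio(self) -> float:
        """Fraction of written blocks that did not need new physical space."""
        if self.logical_blocks == 0:
            return 0.0
        return 1 - len(self.unique_blocks) / self.logical_blocks


# A fake "template" made of 100 distinct 4KiB blocks
template = b"".join(i.to_bytes(4, "big") * 1024 for i in range(100))

store = DedupStore()
store.write(template)                 # the template itself
ratio_template = store.dedup_ratio()
store.write(template)                 # first full clone
ratio_one_clone = store.dedup_ratio()
for _ in range(5):                    # five more full clones
    store.write(template)
ratio_six_clones = store.dedup_ratio()

print(f"template only: {ratio_template:.0%}")
print(f"plus 1 clone:  {ratio_one_clone:.0%}")
print(f"plus 6 clones: {ratio_six_clones:.0%}")
```

The toy numbers will not match the graphs in this post, because a real Windows image already contains duplicate blocks (hence the 18% on a single copy) and real clones are not byte-for-byte identical. The trend is the same though: every extra full clone adds a lot of logical data and very little unique data.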

Additionally, because the workloads are pretty much exactly the same, with all six VMs deployed and running in the usx-hyb-vol1 Virtual Volume and with Atlantis USX in-memory Content Aware IO processing and data de-duplication, the IO Offload is pretty much at 100%. This will decrease accordingly as users start using the virtual desktops and more unique data is created, but Atlantis USX will always try to serve all IO from the Performance Pool (RAM, Flash or SSD).

No storage blog post is complete without an Iometer test

Let’s do a VDI Iometer profile with 80% writes, 20% reads at 80% random with 4k blocks using the guide from Jim (http://www.jimmoyle.com/2013/08/how-to-use-iometer-to-simulate-a-desktop-workload/).

Here’s the result:

55k IOPS (fifty-five thousand IOPS!) and pretty much negligible read and write latency on just three vSphere ESXi hosts. To put that into context, if I deployed one hundred Windows 8 VDI desktops into that Virtual Volume, each desktop (and therefore user) would basically have 550 IOPS. You can read more about IOPS per user in this post by Brian Madden (http://searchvirtualstorage.techtarget.com/video/Brian-Madden-discusses-VDI-IOPS-SSD-storageless-VDI). To put this IOPS number into further context, that Virtual Volume is configured to use just 10GB of RAM from the hypervisor, 5GB of SSD and 100GB (of which only 10.57GB is in use, which is an 88% capacity saving) of super slow SATA disks in total over the three vSphere ESXi hosts. If you want more IOPS, you just need to create more Virtual Volumes or add more ESXi hosts to scale out the hyper-converged solution.
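
For anyone who wants to replay the arithmetic, here is a short Python sketch. The only inputs are the figures quoted in this post; the variable names are mine, and I am assuming "4k" means 4KiB blocks and taking 55k as exactly 55,000.

```python
TOTAL_IOPS = 55_000            # measured Iometer result, rounded as quoted
BLOCK_SIZE_BYTES = 4 * 1024    # assuming "4k" means 4KiB
DESKTOPS = 100                 # the hypothetical one hundred VDI desktops
WRITE_PERCENT = 80             # Iometer profile: 80% writes, 20% reads

iops_per_desktop = TOTAL_IOPS / DESKTOPS
write_iops = TOTAL_IOPS * WRITE_PERCENT // 100
read_iops = TOTAL_IOPS - write_iops
throughput_mib_s = TOTAL_IOPS * BLOCK_SIZE_BYTES / (1024 * 1024)

print(f"IOPS per desktop: {iops_per_desktop:.0f}")
print(f"Write IOPS: {write_iops}, read IOPS: {read_iops}")
print(f"Throughput at 4KiB blocks: {throughput_mib_s:.1f} MiB/s")
```

With small 4KiB blocks the raw throughput is modest (a little over 200 MiB/s); what matters for VDI is the number of operations per second and the latency, which is why IOPS per user is the figure worth watching.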

In other words… crazy performance on hyper-converged architecture with just a few off-the-shelf disks on a few servers. No unicorns or magic in Atlantis USX, just pure speed and space savings. BOOM!

Summary

To summarize, Atlantis USX is a software-defined storage solution that delivers the performance of an All-Flash Array at half the cost of traditional SAN or NAS. You can pool any SAN, NAS or DAS storage and accelerate its performance, while at the same time consolidating storage to increase storage capacity by up to five times. With Atlantis USX, you can avoid purchasing additional storage for more than five years, meet the performance needs of any application without buying hardware, and transition from costly shared storage systems to lower cost hyper-converged systems based on direct-attached storage as I’ve demonstrated here.

In part 2, I’ll use local RAM instead of SSDs, and in part 3, I’ll demonstrate how Atlantis USX can be used to get more capacity and IOPS from your current storage array.