Terraform - Python orchestration

Terraform is a great tool for configuring infrastructure. Past a certain level of project complexity, though, managing the Terraform configuration itself becomes a problem. It is easy to start a project simple, but as it grows you need more and more resources: for production, development and staging environments, for different regions and cloud providers, and for individual clients. All these environments need to be similar, yet they may still differ slightly.

There are different ways to address this growing complexity. One is to manage all infrastructure through a single module. That module becomes the single source of truth, keeps environments uniform and avoids code duplication. On the downside, it tightly couples all the infrastructure in a project and doesn't scale well beyond a certain complexity. The next logical step is to split the project into many subprojects. But as soon as configuration and Terraform state are split into independent subprojects, they need to be managed and coordinated.

Terraform's built-in support for working with multiple configurations is rather limited, which has made tools like Terragrunt very popular. Terragrunt extends some of Terraform's capabilities, but it is still quite limited. The real root of the problem is this: HCL (the configuration language used by both Terraform and Terragrunt) is convenient for defining infrastructure, but it is not as flexible or powerful as a general-purpose language. That is where Python can be very helpful.

Configuration generation

In Terraform you can write configuration not only in HCL but also in JSON, using files with the .tf.json extension. And, as you know, Python has plenty of tooling for working with JSON. Unsurprisingly, many libraries have been developed that let you define Terraform artifacts directly in Python. One of them, terraformpy, lets you create Terraform artifacts with a syntax nearly identical to HCL:

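    from terraformpy import Data, Resource

    # ami.id becomes a reference to the data source's id attribute
    ami = Data('aws_ami', 'ecs_ami', most_recent=True, owners=['amazon'])

    Resource(
        'aws_instance', 'example',
        ami=ami.id,
        instance_type='m4.xlarge'
    )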
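For comparison, Terraform's JSON syntax nests each block under its block type, resource type and name. Terraform expressions such as references are written as "${...}" template strings. The sketch below builds a JSON equivalent of such a resource by hand, using only the standard library. It assumes the AMI comes from an aws_ami data source named ecs_ami (an illustrative name). The printed document could be redirected into a main.tf.json file.

```python
import json


def instance_config(ami_ref: str, instance_type: str) -> dict:
    """Build a Terraform JSON document declaring a single aws_instance."""
    return {
        "resource": {
            "aws_instance": {
                "example": {
                    # References stay plain strings in "${...}" template form
                    "ami": ami_ref,
                    "instance_type": instance_type,
                }
            }
        }
    }


# Assumes an aws_ami data source named "ecs_ami" exists elsewhere in the config
config = instance_config("${data.aws_ami.ecs_ami.id}", "m4.xlarge")
print(json.dumps(config, indent=4, sort_keys=True))
```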

With such a library, one can define an entire configuration in Python. Or, even better, one can use it to combine and orchestrate existing Terraform modules.
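To illustrate the orchestration idea, here is a minimal sketch that calls the same public VPC module from one JSON document per environment. Each document is meant for its own state directory. The environment names, CIDR ranges and the state/<env> layout are hypothetical. The name and cidr values are inputs of the terraform-aws-modules VPC module.

```python
import json

# Hypothetical per-environment settings; in practice these could come from YAML
ENVIRONMENTS = {
    "dev": {"cidr": "10.0.0.0/16"},
    "staging": {"cidr": "10.1.0.0/16"},
    "prod": {"cidr": "10.2.0.0/16"},
}


def vpc_config(env: str, cidr: str) -> dict:
    """Terraform JSON for one environment: a single call to the VPC module."""
    return {
        "module": {
            "vpc": {
                "source": "terraform-aws-modules/vpc/aws",
                "version": "2.44.0",
                "name": "{}-vpc".format(env),
                "cidr": cidr,
            }
        }
    }


# One document per environment, so each environment keeps a separate state
configs = {env: vpc_config(env, s["cidr"]) for env, s in ENVIRONMENTS.items()}
for env, doc in configs.items():
    print("# state/{}/main.tf.json".format(env))
    print(json.dumps(doc, indent=4, sort_keys=True))
```

Keeping one document per environment means a change to the dev environment never touches the production state.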

Managing Terraform modules

Provisioning even the simplest project may require many interdependent resources, and wiring them together can be challenging. To make developers' lives easier, the community has published many configurable, open-source Terraform modules, which can be found on GitHub and in the Terraform Registry. In practice, your configuration often relies on many such modules coming from different sources. To organize and manage them, I usually place them in a modules folder and describe them in terrafile.yaml:

aws-vps:
  name: terraform-aws-modules-vpc
  source: terraform-aws-modules/vpc/aws
  version: "2.44.0"
aws-ecs:
  name: terraform-aws-modules-ecs
  source: terraform-aws-modules/ecs/aws
  version: "2.3.0"

The keys in terrafile.yaml serve as short module IDs. The optional name specifies the exact directory name of the module. The source may be a GitHub URL, a Terraform Registry name or a local file path. A simple utility class can then check, update and download modules when necessary (see the source code):

from terraformy.terrafile import Terrafile
terrafile = Terrafile()
terrafile.update()
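The Terrafile class itself lives in the accompanying repository and is not reproduced here. As a hypothetical sketch of the bookkeeping such a class needs, the code below starts from the already-parsed terrafile.yaml entries (for example, the result of yaml.safe_load). It resolves each module's directory name and classifies its source. The ModuleEntry type, the helper names and the source-detection rules are illustrative assumptions. The download URL follows the public Terraform Registry's module download endpoint.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Registry sources look like <namespace>/<name>/<provider>
REGISTRY_PATTERN = re.compile(r"^[\w-]+/[\w-]+/[\w-]+$")


@dataclass
class ModuleEntry:
    key: str                      # short module ID, e.g. "aws-vps"
    source: str
    version: Optional[str] = None
    name: Optional[str] = None    # optional explicit directory name

    @property
    def directory(self) -> str:
        # Fall back to the short ID when no explicit name is given
        return self.name or self.key

    @property
    def kind(self) -> str:
        if self.source.startswith(("./", "../", "/")):
            return "local"
        if "github.com" in self.source:
            return "github"
        if REGISTRY_PATTERN.match(self.source):
            return "registry"
        raise ValueError("unrecognised module source: " + self.source)

    def registry_download_url(self) -> str:
        # The registry answers this endpoint with an X-Terraform-Get header
        if self.kind != "registry" or not self.version:
            raise ValueError("only versioned registry modules have a download URL")
        return "https://registry.terraform.io/v1/modules/{}/{}/download".format(
            self.source, self.version)


def parse_entries(raw: Dict[str, dict]) -> Dict[str, ModuleEntry]:
    """Turn the dict loaded from terrafile.yaml into typed entries."""
    return {key: ModuleEntry(key=key, **value) for key, value in raw.items()}


entries = parse_entries({
    "aws-vps": {"name": "terraform-aws-modules-vpc",
                "source": "terraform-aws-modules/vpc/aws", "version": "2.44.0"},
})
print(entries["aws-vps"].directory)
print(entries["aws-vps"].registry_download_url())
```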

Managing multiple Terraform states

I usually place all top-level Terraform modules (root modules, which own state) in subdirectories of a state directory. This distinguishes them from the reusable modules in the modules directory. I try to keep root modules simple by generating them from configuration. For example, the simple helper class below generates Terraform configuration and invokes the corresponding Terraform commands in the appropriate subdirectory:

import json
import pathlib
import subprocess
from typing import Iterable

import yaml
from terraformpy import compile as compile_terraform  # terraformpy's JSON compiler

from terraformy.config import TerraformConfig
from terraformy.terrafile import Terrafile


class RootModule:
    name: str
    state_dir: str
    terrafile: Terrafile
    config: Iterable[TerraformConfig]

    def __init__(self, name: str, terrafile: Terrafile, config: Iterable[TerraformConfig]):
        self.name = name
        self.config = config
        self.terrafile = terrafile
        # Each root module keeps its own state in ./state/<name>
        self.state_dir = "./state/{}".format(name)

    def ensure_dir_exists(self):
        pathlib.Path(self.state_dir).mkdir(parents=True, exist_ok=True)

    def generate_config(self):
        self.ensure_dir_exists()
        for conf in self.config:
            # Registers terraformpy objects (providers, modules, ...) globally
            conf.config()
            # Keep a copy of the input parameters next to the generated code
            with open("{}/{}.yaml".format(self.state_dir, conf.name), "w") as file:
                yaml.dump(conf.dict(), file)

        print("terraformpy - Writing main.tf.json")

        with open("{}/main.tf.json".format(self.state_dir), "w") as fd:
            json.dump(compile_terraform(), fd, indent=4, sort_keys=True)

    def run(self, *args: str, **kwargs):
        """Run a terraform subcommand inside the state directory; raise on failure."""
        self.ensure_dir_exists()
        return subprocess.run(["terraform", *args], cwd=self.state_dir, check=True, **kwargs)

    def init(self, **kwargs):
        return self.run("init", **kwargs)

    def plan(self, **kwargs):
        return self.run("plan", "-out=plan.saved", **kwargs)

    def apply(self, **kwargs):
        return self.run("apply", "-auto-approve", **kwargs)

    def destroy(self, **kwargs):
        return self.run("destroy", "-auto-approve", **kwargs)
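Root modules often depend on each other. For instance, an ECS cluster needs the VPC to exist first, so the order in which their Terraform commands run matters. A minimal sketch, assuming the dependencies are declared by hand as a dict, orders root modules with the standard library's graphlib.TopologicalSorter (Python 3.9+). It only builds (directory, command) pairs, so the plan can be inspected as a dry run before passing each pair to subprocess.run. The module names are hypothetical. Destroy runs in reverse order so that dependents are removed before the modules they rely on.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical root modules and the root modules each one depends on
DEPENDENCIES: Dict[str, Set[str]] = {
    "vpc": set(),
    "ecs": {"vpc"},
    "app": {"ecs", "vpc"},
}

# Extra flags mirroring the RootModule helper above
EXTRA_ARGS = {
    "plan": ["-out=plan.saved"],
    "apply": ["-auto-approve"],
    "destroy": ["-auto-approve"],
}


def apply_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Dependencies first; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(deps).static_order())


def build_commands(deps: Dict[str, Set[str]], action: str) -> List[Tuple[str, List[str]]]:
    """Return (working directory, argv) pairs; destroy runs in reverse order."""
    order = apply_order(deps)
    if action == "destroy":
        order.reverse()
    extra = EXTRA_ARGS.get(action, [])
    return [("./state/" + name, ["terraform", action, *extra]) for name in order]


for cwd, argv in build_commands(DEPENDENCIES, "apply"):
    print("(cd {}) {}".format(cwd, " ".join(argv)))
```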

Defining Terraform configuration

To define a new Terraform configuration, create a class that subclasses BaseConfig. BaseConfig is shown below, together with the TerraformConfig protocol it implements:

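import abc

from pydantic import BaseModel

from terraformy.terrafile import Terrafile


class TerraformConfig(abc.ABC):
    """Protocol every configuration class implements."""
    name: str

    @abc.abstractmethod
    def config(self):
        """Declare terraformpy objects (providers, modules, resources)."""

    @abc.abstractmethod
    def dict(self):
        """Return the configuration parameters as a plain dict."""


# pydantic's BaseModel.dict() comes first in the MRO and satisfies the abstract dict()
class BaseConfig(BaseModel, TerraformConfig):
    terrafile: Terrafile

    def module(self, name):
        """Return the path to a module, relative to ./state/<root module>."""
        module_dir = name

        # Use the explicit directory name from terrafile.yaml when one is given
        if name in self.terrafile.entries and self.terrafile.entries[name].name:
            module_dir = self.terrafile.entries[name].name

        return "../../modules/{}".format(module_dir)

    class Config:
        # Terrafile is a plain class, not a pydantic model
        arbitrary_types_allowed = True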

BaseConfig above defines a simple protocol: the config and dict methods plus the name field. It also provides pydantic validation and a helper method, module, which returns the relative path to a module given its short ID from terrafile.yaml. For example, a configuration that creates an AWS VPC with the help of the terraform-aws-modules/vpc/aws module could be defined as follows:

from terraformpy import Module, Provider

from terraformy.config import BaseConfig

class MyVpc(BaseConfig):
    name: str
    region: str = "eu-west-2"
    private_number: int = 2
    public_number: int = 2

    def config(self):
        name = self.name
        region = self.region
        private_number = self.private_number
        public_number = self.public_number

        Provider("aws", region=region)
        params = dict(
            source=self.module("aws-vps"),
            name=name,
            cidr="10.0.0.0/16",
            azs=[region + "a", region + "b"],
            private_subnets=[
                "10.0.{}.0/24".format(k) for k in range(1, private_number + 1)
            ],
            # 10.0.101.0/24, 10.0.102.0/24, ... (also valid beyond nine subnets)
            public_subnets=[
                "10.0.{}.0/24".format(100 + k) for k in range(1, public_number + 1)
            ],
            enable_ipv6=True,
            enable_nat_gateway=True,
            single_nat_gateway=True,
            public_subnet_tags={"Name": "overridden-name-public"},
            tags={"Owner": "user", "Environment": "dev"},
            vpc_tags={"Name": name},
        )

        Module(name, **params)
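MyVpc builds its subnet lists with string formatting, which is fine while the counts stay small. As an alternative sketch, Python's standard ipaddress module can carve /24 subnets out of the VPC CIDR directly and fail loudly when the address space runs out. The helper name subnets is hypothetical. The offsets 1 and 101 reproduce the addressing scheme used above, with private subnets from 10.0.1.0/24 and public subnets from 10.0.101.0/24.

```python
import ipaddress
from typing import List


def subnets(vpc_cidr: str, first: int, count: int, new_prefix: int = 24) -> List[str]:
    """Return `count` consecutive subnets of the VPC, starting at index `first`."""
    network = ipaddress.ip_network(vpc_cidr)
    candidates = list(network.subnets(new_prefix=new_prefix))
    if first + count > len(candidates):
        raise ValueError("not enough address space in " + vpc_cidr)
    return [str(net) for net in candidates[first:first + count]]


private_subnets = subnets("10.0.0.0/16", first=1, count=2)
public_subnets = subnets("10.0.0.0/16", first=101, count=2)
print(private_subnets)  # ['10.0.1.0/24', '10.0.2.0/24']
print(public_subnets)   # ['10.0.101.0/24', '10.0.102.0/24']
```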

The best of both worlds

By combining Python with Terraform modules, we can leverage both open-source Terraform modules and the power of Python. Infrastructure is still defined declaratively in HCL inside the modules. Meanwhile, the orchestration layer has access to everything Python offers: *.yaml and *.json config files, schema validation, tools to build CLIs and REST APIs, and so on.
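As an example of the CLI tooling mentioned above, a thin argparse front end could dispatch to the RootModule methods. This sketch only parses arguments and maps each command to a method name. The command names, the --env option and the METHODS mapping are hypothetical. A real tool would construct the RootModule objects and call the mapped method on each.

```python
import argparse
from typing import List, Optional

# Hypothetical CLI command -> RootModule method name
METHODS = {
    "generate": "generate_config",
    "init": "init",
    "plan": "plan",
    "apply": "apply",
    "destroy": "destroy",
}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Generate Terraform configuration and run Terraform per root module")
    parser.add_argument("action", choices=sorted(METHODS), help="step to run")
    parser.add_argument("modules", nargs="+", help="root module names under ./state")
    parser.add_argument("--env", default="dev", help="environment name")
    return parser.parse_args(argv)


args = parse_args(["plan", "vpc", "ecs", "--env", "staging"])
for module in args.modules:
    # A real tool would call getattr(RootModule(module, ...), METHODS[args.action])()
    print(module, METHODS[args.action], args.env)
```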

The source code used in this blog post is available on GitHub.