Terraform - Python orchestration
Terraform is a great tool for configuring infrastructure, but beyond a certain level of project complexity, managing Terraform configuration becomes a problem in its own right. It is easy to start a project simple, but as it grows you need to create more and more resources: for production, dev and staging environments, for different regions and cloud providers, for individual clients. All these environments need to be similar, yet may still vary slightly.
There are different approaches to this growing complexity. One is to manage all infrastructure through a single module, which becomes the single source of truth and helps achieve uniformity and avoid code duplication. On the downside, this tightly couples all the infrastructure in a project and does not scale well beyond a certain complexity. The next logical step is to split the project into many subprojects. But as soon as the configuration and Terraform states are split into independent subprojects, they need to be managed and coordinated.
When it comes to multiple configurations, Terraform's built-in capabilities are limited, which is what made tools like Terragrunt so popular. Terragrunt extends some of Terraform's capabilities, but it is still quite limited. The real root of the problem is that HCL (the configuration language used by both Terraform and Terragrunt), convenient as it is for infrastructure definitions, is not as flexible and powerful as a general-purpose language. That is where Python can be very helpful.
Configuration generation
In Terraform, configuration can be written not only in HCL but also in JSON files. And, as you know, Python has plenty of tooling for working with JSON. Unsurprisingly, many libraries have been developed that let you define Terraform artifacts directly in Python. One of them, terraformpy, lets you create Terraform artifacts using syntax nearly identical to HCL:
Resource(
    'aws_instance', 'example',
    ami=ami.id,
    instance_type='m4.xlarge'
)
With such a library, you can easily define an entire configuration in Python. Or, even better, you can use it just to combine and orchestrate existing Terraform modules.
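To make the JSON route concrete, here is a minimal sketch that builds the same kind of resource block with only the standard library. The resource_block helper and the AMI ID are hypothetical, introduced for illustration; this is not how terraformpy is implemented, only the shape of the JSON document that Terraform accepts.

```python
import json


def resource_block(resource_type: str, name: str, **attributes) -> dict:
    """Build a Terraform JSON 'resource' block (hypothetical helper, not part of terraformpy)."""
    return {"resource": {resource_type: {name: attributes}}}


# The AMI ID below is a placeholder used only for illustration.
config = resource_block(
    "aws_instance",
    "example",
    ami="ami-00000000",
    instance_type="m4.xlarge",
)

# Saved as main.tf.json, this text is read by Terraform like any .tf file.
print(json.dumps(config, indent=4, sort_keys=True))
```

A library such as terraformpy adds conveniences on top of this idea, for example references between objects, but the end result is still a JSON file on disk.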
Managing Terraform modules
Provisioning even the simplest project may require creating many interdependent resources, and wiring them together can be challenging. To make developers' lives easier, many configurable open source Terraform modules have been developed. Such modules can be found on GitHub and/or in the Terraform Registry. Your infrastructure configuration often relies on many such modules, coming from different sources. To organize and manage them, I usually place them in a modules folder and describe/update them using terrafile.yaml:
aws-vps:
  name: terraform-aws-modules-vpc
  source: terraform-aws-modules/vpc/aws
  version: "2.44.0"
aws-ecs:
  name: terraform-aws-modules-ecs
  source: terraform-aws-modules/ecs/aws
  version: "2.3.0"
The keys in a terrafile.yaml file serve as short module IDs; names (if given) specify the exact module directory name. The source may be a GitHub URL, a Terraform Registry name, or a local file path. A simple utility class can be used to check, update, and download modules when necessary (see source code):
from terraformy.terrafile import Terrafile

terrafile = Terrafile()
terrafile.update()
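The Terrafile class itself is not shown here. As a rough sketch of the bookkeeping it has to do, the code below takes already-parsed terrafile entries (a plain dict, as a YAML loader would return) and works out the directory name and the kind of source for each module. The names ModuleEntry, parse_entries and source_kind are hypothetical, and the classification rules are a simplification of Terraform's module source conventions, not the author's implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ModuleEntry:
    key: str                       # short module ID (the YAML key)
    source: str                    # GitHub URL, registry name, or local path
    version: Optional[str] = None
    name: Optional[str] = None     # explicit directory name, if given

    @property
    def directory(self) -> str:
        """Directory under modules/: the explicit name, else the short ID."""
        return self.name or self.key

    @property
    def source_kind(self) -> str:
        """Roughly classify the source (simplified, assumed rules)."""
        if self.source.startswith(("./", "../", "/")):
            return "local"
        if "github.com" in self.source:
            return "github"
        if len(self.source.split("/")) == 3:
            return "registry"      # <namespace>/<name>/<provider>
        return "unknown"


def parse_entries(raw: Dict[str, dict]) -> Dict[str, ModuleEntry]:
    """Turn the parsed YAML mapping into ModuleEntry objects."""
    return {key: ModuleEntry(key=key, **fields) for key, fields in raw.items()}


# Same data as the terrafile.yaml above, as a YAML loader would return it.
raw = {
    "aws-vps": {
        "name": "terraform-aws-modules-vpc",
        "source": "terraform-aws-modules/vpc/aws",
        "version": "2.44.0",
    },
    "aws-ecs": {
        "name": "terraform-aws-modules-ecs",
        "source": "terraform-aws-modules/ecs/aws",
        "version": "2.3.0",
    },
}

entries = parse_entries(raw)
for entry in entries.values():
    print(entry.key, "->", entry.directory, "({})".format(entry.source_kind))
```

An update step would then decide, per entry, whether to download from the registry, clone from GitHub, or leave a local path alone.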
Managing multiple Terraform states
I usually place all top-level Terraform modules (modules with state) into state subdirectories, to distinguish them from the reusable modules located in the modules directory. I try to keep such modules simple, generating them from some configuration. For example, the simple helper class below generates the Terraform configuration and invokes the corresponding Terraform commands in the appropriate subdirectories:
import json
import pathlib
from typing import Iterable

import yaml
from terraformpy import compile  # terraformpy's compile(); shadows the builtin
from terraformy.terrafile import Terrafile

# TerraformConfig is defined in the next section. exec is assumed to be the
# project's own command-running helper (not Python's built-in exec).


class RootModule:
    name: str
    state_dir: str
    terrafile: Terrafile
    config: Iterable[TerraformConfig]

    def __init__(self, name: str, terrafile: Terrafile, config: Iterable[TerraformConfig]):
        self.name = name
        self.config = config
        self.terrafile = terrafile
        self.state_dir = "./state/{}".format(name)

    def insure_dir_exists(self):
        pathlib.Path(self.state_dir).mkdir(parents=True, exist_ok=True)

    def generate_config(self):
        self.insure_dir_exists()
        for conf in self.config:
            # Register the Terraform objects, then dump the input values for reference.
            conf.config()
            with open("{}/{}.yaml".format(self.state_dir, conf.name), 'w') as file:
                yaml.dump(conf.dict(), file)
        print("terraformpy - Writing main.tf.json")
        with open("{}/main.tf.json".format(self.state_dir), "w") as fd:
            json.dump(compile(), fd, indent=4, sort_keys=True)

    def init(self, **kwargs):
        self.insure_dir_exists()
        return exec('terraform', 'init', cwd=self.state_dir, **kwargs)

    def plan(self, **kwargs):
        self.insure_dir_exists()
        return exec('terraform', 'plan', '-out=plan.saved', cwd=self.state_dir, **kwargs)

    def apply(self, **kwargs):
        self.insure_dir_exists()
        return exec('terraform', 'apply', '-auto-approve', cwd=self.state_dir, **kwargs)

    def destroy(self, **kwargs):
        self.insure_dir_exists()
        return exec('terraform', 'destroy', '-auto-approve', cwd=self.state_dir, **kwargs)
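The exec helper called by RootModule is not shown in the post. One possible shape, assuming it simply runs a command in a given working directory and returns the exit code, is sketched below. The name run_command is hypothetical (chosen to avoid shadowing Python's built-in exec), and the demo runs the Python interpreter instead of Terraform so that it works without Terraform installed.

```python
import subprocess
import sys
import tempfile


def run_command(*args: str, cwd: str = ".", **kwargs) -> int:
    """Run a command in `cwd` and return its exit code (hypothetical helper)."""
    completed = subprocess.run(list(args), cwd=cwd, **kwargs)
    return completed.returncode


# Demo: run trivial commands inside a temporary "state" directory.
with tempfile.TemporaryDirectory() as state_dir:
    ok = run_command(sys.executable, "-c", "import os; print(os.getcwd())", cwd=state_dir)
    failed = run_command(sys.executable, "-c", "raise SystemExit(3)", cwd=state_dir)

print(ok, failed)
```

Returning the exit code lets the caller stop an orchestration run as soon as one state fails, instead of continuing to apply dependent states.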
Defining Terraform configuration
To generate a new Terraform configuration, define a class that subclasses BaseConfig:
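import abc

from pydantic import BaseModel
from terraformy.terrafile import Terrafile


class TerraformConfig(abc.ABC):
    name: str

    @abc.abstractmethod
    def config(self):
        pass

    @abc.abstractmethod
    def dict(self):
        pass


class BaseConfig(BaseModel, TerraformConfig):
    terrafile: Terrafile

    def module(self, name):
        # Use the explicit directory name from terrafile.yaml if one is given,
        # otherwise fall back to the short module ID.
        directory = name
        if name in self.terrafile.entries and self.terrafile.entries[name].name:
            directory = self.terrafile.entries[name].name
        return "../../modules/{}".format(directory)

    class Config:
        # Terrafile is not a pydantic type, so allow it as a field.
        arbitrary_types_allowed = True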
BaseConfig above follows the TerraformConfig protocol (the config and dict methods and the name field), adds pydantic validation, and provides helper methods (the module method returns a relative path to a module based on its short ID from terrafile.yaml). For example, a configuration responsible for creating an AWS VPC with the help of the terraform-aws-modules/vpc/aws module could be defined as follows:
from terraformpy import Module, Provider
from terraformy.config import BaseConfig


class MyVpc(BaseConfig):
    name: str
    region: str = "eu-west-2"
    private_number: int = 2
    public_number: int = 2

    def config(self):
        name = self.name
        region = self.region
        private_number = self.private_number
        public_number = self.public_number
        Provider("aws", region=region)
        params = dict(
            source=self.module("aws-vps"),
            name=name,
            cidr="10.0.0.0/16",
            azs=[region + "a", region + "b"],
            private_subnets=[
                "10.0.{}.0/24".format(k) for k in range(1, private_number + 1)
            ],
            public_subnets=[
                "10.0.10{}.0/24".format(k) for k in range(1, public_number + 1)
            ],
            enable_ipv6="true",
            enable_nat_gateway="true",
            single_nat_gateway="true",
            public_subnet_tags={"Name": "overridden-name-public"},
            tags={"Owner": "user", "Environment": "dev"},
            vpc_tags={"Name": name},
        )
        Module(name, **params)
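The subnet lists in MyVpc.config come from plain string formatting, so it can help to check them before handing them to Terraform. The sketch below reproduces the two list comprehensions as standalone functions (private_subnets, public_subnets and check_subnets are hypothetical names) and uses the standard ipaddress module to confirm that every subnet is a valid network inside the VPC CIDR.

```python
import ipaddress
from typing import List


def private_subnets(count: int) -> List[str]:
    """Same pattern as private_subnets in MyVpc.config."""
    return ["10.0.{}.0/24".format(k) for k in range(1, count + 1)]


def public_subnets(count: int) -> List[str]:
    """Same pattern as public_subnets in MyVpc.config."""
    return ["10.0.10{}.0/24".format(k) for k in range(1, count + 1)]


def check_subnets(vpc_cidr: str, subnets: List[str]) -> None:
    """Raise ValueError if a subnet is malformed or outside the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    for cidr in subnets:
        net = ipaddress.ip_network(cidr)  # raises ValueError on bad input
        if not net.subnet_of(vpc):
            raise ValueError("{} is outside {}".format(cidr, vpc_cidr))


print(private_subnets(2))  # ['10.0.1.0/24', '10.0.2.0/24']
print(public_subnets(2))   # ['10.0.101.0/24', '10.0.102.0/24']
check_subnets("10.0.0.0/16", private_subnets(2) + public_subnets(2))
```

Note that the public pattern only yields valid addresses for up to nine subnets: public_subnets(10) would produce 10.0.1010.0/24, which check_subnets rejects. A check like this could live in a pydantic validator on the config class.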
The best of both worlds
By using Python in combination with Terraform modules, we can leverage both open source Terraform modules and the power of Python. We can still use Terraform's declarative HCL definitions, and we have access to everything Python offers: *.yaml and *.json config files, schema validation, tools for building CLIs and/or REST APIs, etc.
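As one illustration of the CLI point, a small argparse front end could map subcommands onto the RootModule methods shown earlier. The sketch below only parses arguments and reports which method would be called; the program name infra and the state argument are hypothetical.

```python
import argparse

# Subcommands mirror the RootModule methods: init, plan, apply, destroy.
COMMANDS = ("init", "plan", "apply", "destroy")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="infra")
    parser.add_argument("command", choices=COMMANDS, help="Terraform action to run")
    parser.add_argument("state", help="name of the state directory under ./state")
    return parser


# Parse a sample command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["plan", "vpc"])
print("would call RootModule.{}() for ./state/{}".format(args.command, args.state))
```

In a real script, the final line would look up the RootModule for the requested state and call the method with getattr.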
Find the source code used in this blog post on GitHub.