Terragrunt is a thin wrapper for Terraform that supports locking and enforces best practices for Terraform state:
- Locking: Terragrunt can use Amazon's DynamoDB as a distributed locking mechanism to ensure that two team members working on the same Terraform state files do not overwrite each other's changes. DynamoDB is part of the AWS free tier, so using it as a locking mechanism should not cost you anything.
- Remote state management: A common mistake when using Terraform is to forget to configure remote state or to configure it incorrectly. Terragrunt can prevent these sorts of errors by automatically configuring remote state for everyone on your team.
- Managing multiple modules: Terragrunt has tools that make it easier to work with multiple Terraform folders, and therefore, multiple state files.
Other types of locking mechanisms and automation for more best practices may be added in the future.
When you use Terraform to provision infrastructure, it records the state of your infrastructure in state files. In order to make changes to your infrastructure, everyone on your team needs access to these state files. You could check the files into version control (not a great idea, as the state files may contain secrets) or use a supported remote state backend to store the state files in a shared location such as S3, Consul, or etcd. All of these options have three problems:
- They do not provide locking. If two team members run terraform apply on the same state files at the same time, they may overwrite each other's changes. The official solution to this problem is to use HashiCorp's Atlas, but that can be a fairly expensive option, and it requires you to use a SaaS platform for all Terraform operations.
- They are error prone. Very often, you do a fresh checkout of a bunch of Terraform templates from version control, forget to enable remote state storage before applying them, and end up creating a bunch of duplicate resources. Sometimes you do remember to enable remote state storage, but you use the wrong configuration (e.g. the wrong S3 bucket name or key) and you end up overwriting the state for a totally different set of templates.
- If you define all of your environments (stage, prod) and components (database, app server) in one set of templates (and therefore one state file), then a mistake anywhere can cause problems everywhere. To isolate different environments and components, you need to define your Terraform code in multiple different folders (see How to manage Terraform state), but this makes it harder to manage state and quickly spin up and tear down environments.
The goal of Terragrunt is to take Terraform, which is a fantastic tool, and make it even better for teams by providing a simple, free locking mechanism, and enforcing best practices around CLI usage and state management.
- Install Terraform and make sure it is in your PATH.
- Install Terragrunt by going to the Releases Page, downloading the binary for your OS, renaming it to terragrunt, and adding it to your PATH.
Go into the folder with your Terraform templates and create a .terragrunt file. This file uses the same HCL syntax as Terraform. Here is an example .terragrunt file that configures Terragrunt to use DynamoDB for locking and to automatically manage remote state for you using the S3 backend:
# Configure Terragrunt to use DynamoDB for locking
lock = {
  backend = "dynamodb"
  config {
    state_file_id = "my-app"
  }
}
# Configure Terragrunt to automatically store tfstate files in an S3 bucket
remote_state = {
  backend = "s3"
  config {
    encrypt = "true"
    bucket = "my-bucket"
    key = "terraform.tfstate"
    region = "us-east-1"
  }
}
Once you check this .terragrunt file into source control, everyone on your team can use Terragrunt to run all the standard Terraform commands:
terragrunt get
terragrunt plan
terragrunt apply
terragrunt output
terragrunt destroy
Terragrunt forwards almost all commands, arguments, and options directly to Terraform, using whatever version of Terraform you already have installed. However, before running Terraform, Terragrunt will ensure your remote state is configured according to the settings in the .terragrunt file. Moreover, for the apply, refresh, and destroy commands, Terragrunt will first try to acquire a lock using DynamoDB:
terragrunt apply
[terragrunt] 2016/05/30 16:55:28 Configuring remote state for the s3 backend
[terragrunt] 2016/05/30 16:55:28 Running command: terraform remote config -backend s3 -backend-config=key=terraform.tfstate -backend-config=region=us-east-1 -backend-config=encrypt=true -backend-config=bucket=my-bucket
Initialized blank state with remote state enabled!
[terragrunt] 2016/05/30 16:55:29 Attempting to acquire lock for state file my-app in DynamoDB
[terragrunt] 2016/05/30 16:55:30 Attempting to create lock item for state file my-app in DynamoDB table terragrunt_locks
[terragrunt] 2016/05/30 16:55:30 Lock acquired!
[terragrunt] 2016/05/30 16:55:30 Running command: terraform apply
terraform apply
aws_instance.example: Creating...
ami: "" => "ami-0d729a60"
instance_type: "" => "t2.micro"
[...]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
[terragrunt] 2016/05/27 00:39:19 Attempting to release lock for state file my-app in DynamoDB
[terragrunt] 2016/05/27 00:39:19 Lock released!
Terragrunt can use Amazon's DynamoDB to acquire and release locks. DynamoDB supports strongly consistent reads as well as conditional writes, which are all the primitives we need for a very basic distributed lock system. It's also part of AWS's free tier, and given the tiny amount of data we are working with and the relatively small number of times per day you're likely to run Terraform, it should be a free option for teams already using AWS. We take no responsibility for any charges you may incur.
To use DynamoDB for locking, you must:
- Set your AWS credentials in the environment using one of the following options:
  - Set your credentials as the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (and also AWS_SESSION_TOKEN if using STS temporary credentials).
  - Run aws configure and fill in the details it asks for.
  - Run Terragrunt on an EC2 instance with an IAM Role.
- Your AWS user must have an IAM policy granting all DynamoDB actions (dynamodb:*) on the table terragrunt_locks (see the DynamoDB locking configuration for how to configure this table name). Here is an example IAM policy that grants the necessary permissions on the terragrunt_locks table in region us-west-2 for an account with account id 1234567890:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadWriteToDynamoDB",
      "Effect": "Allow",
      "Action": "dynamodb:*",
      "Resource": "arn:aws:dynamodb:us-west-2:1234567890:table/terragrunt_locks"
    }
  ]
}
For DynamoDB locking, Terragrunt supports the following settings in .terragrunt:
lock = {
  backend = "dynamodb"
  config {
    state_file_id = "my-app"
    aws_region = "us-east-1"
    table_name = "terragrunt_locks"
    max_lock_retries = 360
  }
}
- state_file_id: (Required) A unique id for the state file for these Terraform templates. Many teams have more than one set of templates, and therefore more than one state file, so this setting is used to disambiguate locks for one state file from another.
- aws_region: (Optional) The AWS region to use. Default: us-east-1.
- table_name: (Optional) The name of the table in DynamoDB to use to store lock information. Default: terragrunt_locks.
- max_lock_retries: (Optional) The maximum number of times to retry acquiring a lock. Terragrunt waits 10 seconds between retries. Default: 360 retries (one hour).
When you run terragrunt apply or terragrunt destroy, Terragrunt does the following:
- Create the terragrunt_locks table if it doesn't already exist.
- Try to write an item to the terragrunt_locks table with StateFileId equal to the state_file_id specified in your .terragrunt file. This item will include useful metadata about the lock, such as who created it (e.g. your username) and when.
- Note that the write is a conditional write that will fail if an item with the same state_file_id already exists.
  - If the write succeeds, it means we have a lock!
  - If the write does not succeed, it means someone else has a lock. Keep retrying every 10 seconds until we get a lock.
- Run terraform apply or terraform destroy.
- When Terraform is done, delete the item from the terragrunt_locks table to release the lock.
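The locking steps above can be sketched in Go as follows. This is only an illustration of the conditional-write idea, not Terragrunt's actual code: lockTable is a hypothetical in-memory stand-in for the terragrunt_locks table, and the names putIfAbsent, remove, and acquire are assumptions made for this example. No real DynamoDB calls are made.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errLockHeld is returned when an item with the same state file id already exists.
var errLockHeld = errors.New("lock already held")

// lockTable is a hypothetical in-memory stand-in for the terragrunt_locks table.
type lockTable struct {
	mu    sync.Mutex
	items map[string]string // state_file_id -> who created the lock
}

func newLockTable() *lockTable {
	return &lockTable{items: make(map[string]string)}
}

// putIfAbsent mimics a conditional write: it fails if the key already exists.
func (t *lockTable) putIfAbsent(stateFileID, owner string) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, exists := t.items[stateFileID]; exists {
		return errLockHeld
	}
	t.items[stateFileID] = owner
	return nil
}

// remove deletes the item, which releases the lock.
func (t *lockTable) remove(stateFileID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.items, stateFileID)
}

// acquire tries the conditional write once, then retries up to maxRetries
// more times, sleeping for wait between attempts.
func acquire(t *lockTable, stateFileID, owner string, maxRetries int, wait time.Duration) error {
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err := t.putIfAbsent(stateFileID, owner); err == nil {
			return nil
		}
		if attempt < maxRetries {
			time.Sleep(wait)
		}
	}
	return fmt.Errorf("could not acquire lock for %q after %d retries", stateFileID, maxRetries)
}
```

With the defaults described earlier (360 retries, 10 seconds apart), a call such as acquire(table, "my-app", "alice", 360, 10*time.Second) would give up after roughly one hour. The lock is only safe because the existence check and the write happen as a single atomic step, which is the property the conditional write provides.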
Occasionally, you may want to lock a set of Terraform files and not allow further changes, perhaps during maintenance work or as a precaution for templates that rarely change. To do that, you can use the acquire-lock command:
terragrunt acquire-lock
Are you sure you want to acquire a long-term lock? (y/n): y
See the next section for how to release this lock.
You can use the release-lock command to manually release a lock. This is useful if you used the acquire-lock command to create a long-term lock or if Terragrunt shut down before it released a lock (e.g. because of CTRL+C or a crash).
terragrunt release-lock
Are you sure you want to forcibly remove the lock for stateFileId "my-app"? (y/n): y
Terragrunt can automatically manage remote state for you, preventing manual errors such as forgetting to enable remote state or using the wrong settings.
Terragrunt works with all backends supported by Terraform. Check out the Terraform remote state docs for the requirements to use a particular remote state backend.
For remote state management, Terragrunt supports the following settings in .terragrunt:
remote_state = {
  backend = "s3"
  config {
    key1 = "value1"
    key2 = "value2"
    key3 = "value3"
  }
}
- backend: (Required) The name of the remote state backend to use (e.g. s3, consul).
- config: (Optional) A map of additional key/value pairs to pass to the backend. Each backend requires different key/value pairs, so consult the Terraform remote state docs for details.
With Terraform, it can be a good idea to store your templates in separate folders (and therefore, separate state files) to provide isolation between different environments, such as stage and prod, and different components, such as a database and an app cluster (for more info, see How to Manage Terraform State). That means you will need a .terragrunt file in each folder:
my-terraform-repo
  └ qa
    └ my-app
      └ main.tf
      └ .terragrunt
  └ stage
    └ my-app
      └ main.tf
      └ .terragrunt
  └ prod
    └ my-app
      └ main.tf
      └ .terragrunt
Most of these .terragrunt files will have almost the same content. For example, qa/my-app/.terragrunt may look like this:
# Configure Terragrunt to use DynamoDB for locking
lock = {
  backend = "dynamodb"
  config {
    state_file_id = "qa/my-app"
  }
}
# Configure Terragrunt to automatically store tfstate files in an S3 bucket
remote_state = {
  backend = "s3"
  config {
    encrypt = "true"
    bucket = "my-bucket"
    key = "qa/my-app/terraform.tfstate"
    region = "us-east-1"
  }
}
And stage/my-app/.terragrunt may look like this:
# Configure Terragrunt to use DynamoDB for locking
lock = {
  backend = "dynamodb"
  config {
    state_file_id = "stage/my-app"
  }
}
# Configure Terragrunt to automatically store tfstate files in an S3 bucket
remote_state = {
  backend = "s3"
  config {
    encrypt = "true"
    bucket = "my-bucket"
    key = "stage/my-app/terraform.tfstate"
    region = "us-east-1"
  }
}
Note how most of the content is copy/pasted, except for the state_file_id and key parameters, which match the path of the .terragrunt file itself. How do you avoid having to manually maintain the contents of all of these similar-looking .terragrunt files? Also, if you want to spin up an entire environment (e.g. stage, prod), how do you do it without having to manually run terragrunt apply in each of the Terraform folders within that environment?
The solution is to use the following features of Terragrunt:
- Includes
- Find parent helper
- Relative path helper
- Overriding included settings
- The spin-up and tear-down commands
- Dependencies between modules
One .terragrunt file can automatically "include" the contents of another .terragrunt file using the include block. For example, imagine you have the following file layout:
my-terraform-repo
  └ .terragrunt
  └ qa
    └ my-app
      └ main.tf
      └ .terragrunt
  └ stage
    └ my-app
      └ main.tf
      └ .terragrunt
  └ prod
    └ my-app
      └ main.tf
      └ .terragrunt
The .terragrunt file in the root folder defines the typical lock and remote_state settings. The .terragrunt files in all the subfolders (e.g. qa/my-app/.terragrunt) can automatically include all the settings from a parent file using the include block:
include = {
  path = "../../.terragrunt"
}
When you run Terragrunt in the qa/my-app folder, it will see the include block in the qa/my-app/.terragrunt file and realize that it should load the contents of the root .terragrunt file instead. It's almost as if you had copy/pasted the contents of the root .terragrunt file into qa/my-app/.terragrunt, but much easier to maintain!
Note: only one level of includes is allowed. If root/qa/my-app/.terragrunt includes root/.terragrunt, then root/.terragrunt may NOT specify an include block.
There are a few problems with the simple approach above, so read on before using it!
- Having to manually manage the file paths to the included .terragrunt file is tedious and error prone. To solve this problem, you can use the find_in_parent_folders() helper.
- If the included .terragrunt file hard-codes the state_file_id and key settings, then every child that includes it would end up using the same lock and write state to the same location. To avoid this problem, you can use the path_relative_to_include() helper.
- Some of the child .terragrunt files may want to override the settings they include. To do this, see the section on overriding included settings.
Each of these items is discussed next.
Terragrunt supports the use of a few helper functions using the same syntax as Terraform: ${some_function()}. One of the supported helper functions is find_in_parent_folders(), which returns the path to the first .terragrunt file it finds in the parent folders above the current .terragrunt file.
Example:
include = {
  path = "${find_in_parent_folders()}"
}
If you ran this in qa/my-app/.terragrunt, this would automatically set path to ../../.terragrunt. You will almost always want to use this function, as it allows you to copy/paste the same .terragrunt file to all child folders with no changes.
find_in_parent_folders() will search up the directory tree until it hits the root folder of your file system, and if no .terragrunt file is found, Terragrunt will exit with an error.
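The upward search described above can be sketched in Go as follows. This is a rough illustration, not Terragrunt's implementation: the function name findInParentFolders, the exists callback, and the fileExists helper are all assumptions introduced for this example. The callback keeps the search logic independent of the real file system.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// findInParentFolders starts in the parent of startDir and walks up the
// directory tree. It returns the path, relative to startDir, of the first
// .terragrunt file for which exists reports true.
func findInParentFolders(startDir string, exists func(path string) bool) (string, error) {
	dir := filepath.Dir(filepath.Clean(startDir))
	for {
		candidate := filepath.Join(dir, ".terragrunt")
		if exists(candidate) {
			return filepath.Rel(startDir, candidate)
		}
		parent := filepath.Dir(dir)
		if parent == dir { // reached the top of the tree without a match
			return "", fmt.Errorf("no .terragrunt file found in any parent folder of %s", startDir)
		}
		dir = parent
	}
}

// fileExists reports whether path exists on disk; pass it as the exists
// callback to search the real file system.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}
```

For a child folder such as /repo/qa/my-app with a config at /repo/.terragrunt (hypothetical paths), findInParentFolders("/repo/qa/my-app", fileExists) would yield ../../.terragrunt, matching the hand-written include path shown earlier.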
Another helper function supported by Terragrunt is path_relative_to_include(), which returns the relative path between the current .terragrunt file and the path specified in its include block. For example, in the root .terragrunt file, you could do the following:
# Configure Terragrunt to use DynamoDB for locking
lock = {
  backend = "dynamodb"
  config {
    state_file_id = "${path_relative_to_include()}"
  }
}
# Configure Terragrunt to automatically store tfstate files in an S3 bucket
remote_state = {
  backend = "s3"
  config {
    encrypt = "true"
    bucket = "my-bucket"
    key = "${path_relative_to_include()}/terraform.tfstate"
    region = "us-east-1"
  }
}
Each child .terragrunt file that references the configuration above in its include block will get a unique path for its state_file_id and key settings. For example, in qa/my-app/.terragrunt, the state_file_id will resolve to qa/my-app and the key will resolve to qa/my-app/terraform.tfstate.
You will almost always want to use this helper too. The only time you may want to specify the state_file_id or key manually is if you moved a child folder. In that case, to ensure it can reuse its old state and lock, you may want to hard-code the state_file_id and key to the old file path. However, a safer approach would be to move the state files themselves to match the new location of the child folder, as that makes things more consistent!
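To make the path resolution concrete, here is a small Go sketch of how such a relative path could be computed. It is an assumption-laden illustration rather than Terragrunt's own code: the function name pathRelativeToInclude and its two parameters are invented for this example.

```go
package main

import "path/filepath"

// pathRelativeToInclude returns the folder of the child config relative to
// the folder of the included config, using forward slashes so the result can
// be used as a lock id or an S3 key.
func pathRelativeToInclude(childConfig, includedConfig string) (string, error) {
	rel, err := filepath.Rel(filepath.Dir(includedConfig), filepath.Dir(childConfig))
	if err != nil {
		return "", err
	}
	return filepath.ToSlash(rel), nil
}
```

With a child at /repo/qa/my-app/.terragrunt and an included file at /repo/.terragrunt (hypothetical paths), the result is qa/my-app, and appending /terraform.tfstate gives the key from the example above.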
Any settings in the child .terragrunt file will override the settings pulled in via an include. For example, imagine if qa/my-app/.terragrunt had the following contents:
include = {
  path = "${find_in_parent_folders()}"
}
remote_state = {
  backend = "s3"
  config {
    encrypt = "true"
    bucket = "some-other-bucket"
    key = "/foo/bar/terraform.tfstate"
    region = "us-west-2"
  }
}
The result is that when you run terragrunt commands in the qa/my-app folder, you get the lock settings from the parent, but the remote_state settings of the child.
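One way to picture this override behavior is the Go sketch below. It assumes that a block defined in the child replaces the parent's block as a whole, which is what the example above shows. Whether individual keys inside a block are merged is not stated here, so the sketch does not attempt it. The types block and config and the function mergeWithParent are hypothetical names.

```go
package main

// block mirrors a lock or remote_state block: a backend name plus settings.
type block struct {
	Backend string
	Config  map[string]string
}

// config holds the two blocks a .terragrunt file may define; nil means the
// file does not define that block.
type config struct {
	Lock        *block
	RemoteState *block
}

// mergeWithParent starts from the parent's settings and replaces every block
// that the child defines itself.
func mergeWithParent(child, parent config) config {
	merged := parent
	if child.Lock != nil {
		merged.Lock = child.Lock
	}
	if child.RemoteState != nil {
		merged.RemoteState = child.RemoteState
	}
	return merged
}
```

A child that defines only remote_state therefore keeps the parent's lock block untouched.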
Let's say you have a single environment (e.g. stage or prod) that has a number of Terraform modules within it:
my-terraform-repo
  └ .terragrunt
  └ stage
    └ frontend-app
      └ main.tf
      └ .terragrunt
    └ backend-app
      └ main.tf
      └ .terragrunt
    └ search-app
      └ main.tf
      └ .terragrunt
    └ mysql
      └ main.tf
      └ .terragrunt
    └ redis
      └ main.tf
      └ .terragrunt
    └ vpc
      └ main.tf
      └ .terragrunt
There is one module to deploy a frontend-app, another to deploy a backend-app, another for the MySQL database, and so on. To deploy such an environment, you'd have to manually run terragrunt apply in each of the subfolders. How do you avoid this tedious and time-consuming process?
The answer is that you can use the spin-up command:
cd my-terraform-repo/stage
terragrunt spin-up
When you run this command, Terragrunt will find all .terragrunt files in the subfolders of the current working directory, and run terragrunt apply in each one concurrently.
Similarly, to undeploy all the Terraform modules, you can use the tear-down command:
cd my-terraform-repo/stage
terragrunt tear-down
Of course, if your modules have dependencies between them (for example, you can't deploy the backend-app until the MySQL database is deployed), you'll need to express those dependencies in your .terragrunt config as explained in the next section.
Consider the following file structure for the stage environment:
my-terraform-repo
  └ .terragrunt
  └ stage
    └ frontend-app
      └ main.tf
      └ .terragrunt
    └ backend-app
      └ main.tf
      └ .terragrunt
    └ search-app
      └ main.tf
      └ .terragrunt
    └ mysql
      └ main.tf
      └ .terragrunt
    └ redis
      └ main.tf
      └ .terragrunt
    └ vpc
      └ main.tf
      └ .terragrunt
Let's assume you have the following dependencies between Terraform modules:
- Every module depends on the VPC being deployed
- The backend-app depends on the MySQL database and Redis
- The frontend-app and search-app depend on the backend-app
You can express these dependencies in your .terragrunt config files using the dependencies block. For example, in stage/backend-app/.terragrunt you would specify:
include = {
  path = "${find_in_parent_folders()}"
}
dependencies = {
  paths = ["../vpc", "../mysql", "../redis"]
}
Similarly, in stage/frontend-app/.terragrunt, you would specify:
include = {
  path = "${find_in_parent_folders()}"
}
dependencies = {
  paths = ["../vpc", "../backend-app"]
}
Once you've specified the dependencies in each .terragrunt file, when you run the terragrunt spin-up and terragrunt tear-down commands, Terragrunt will ensure that the dependencies are applied or destroyed, respectively, in the correct order. For the example at the start of this section, the order for the spin-up command would be:
- Deploy the VPC
- Deploy MySQL and Redis in parallel
- Deploy the backend-app
- Deploy the frontend-app and search-app in parallel
If any of the modules fail to deploy, then Terragrunt will not attempt to deploy the modules that depend on them. Once you've fixed the error, it's usually safe to re-run the spin-up or tear-down command, since it will be a no-op for the modules that already deployed successfully, and should only affect the ones that had an error the last time around.
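The deployment order above can be derived mechanically from the dependency lists. The Go sketch below groups modules into "waves", where every module in a wave depends only on modules from earlier waves and can therefore be deployed in parallel with the rest of its wave. It is a simplified illustration, not how Terragrunt is implemented. The function name deployWaves and the map-based input are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// deployWaves takes a map from module name to the modules it depends on and
// returns the modules grouped into waves that can be deployed one after the
// other. Modules within a wave are sorted by name for stable output.
func deployWaves(deps map[string][]string) ([][]string, error) {
	done := make(map[string]bool)
	var waves [][]string
	for len(done) < len(deps) {
		var wave []string
		for module, requires := range deps {
			if done[module] {
				continue
			}
			ready := true
			for _, r := range requires {
				if !done[r] {
					ready = false
					break
				}
			}
			if ready {
				wave = append(wave, module)
			}
		}
		if len(wave) == 0 {
			// Nothing is deployable: a cycle or a dependency missing from the map.
			return nil, fmt.Errorf("cannot order %d remaining modules", len(deps)-len(done))
		}
		sort.Strings(wave)
		// Mark the wave as done only after it is complete, so modules in the
		// same wave never count as each other's satisfied dependencies.
		for _, m := range wave {
			done[m] = true
		}
		waves = append(waves, wave)
	}
	return waves, nil
}
```

Fed with the dependencies from this section (every module depends on vpc, backend-app on mysql and redis, frontend-app and search-app on backend-app), the waves come out as vpc, then mysql and redis, then backend-app, then frontend-app and search-app.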
Terragrunt forwards all arguments and options to Terraform. The only exceptions are the options that start with the prefix --terragrunt-. The currently available options are:
- --terragrunt-config: A custom path to the .terragrunt file. May also be specified via the TERRAGRUNT_CONFIG environment variable. The default path is .terragrunt in the current directory.
- --terragrunt-non-interactive: Don't show interactive user prompts. This will default the answer for all prompts to 'yes'. Useful if you need to run Terragrunt in an automated setting (e.g. from a script).
- --terragrunt-working-dir: Set the directory where Terragrunt should execute the terraform command. Default is the current working directory. Note that for the spin-up and tear-down commands, this parameter has a different meaning: Terragrunt will apply or destroy all the Terraform modules in the subfolders of the terragrunt-working-dir, running terraform in the root of each module it finds.
To run Terragrunt locally, use the go run command:
go run main.go plan
Note: The tests in the dynamodb folder for Terragrunt run against a real AWS account and will add and remove real data from DynamoDB. DO NOT hit CTRL+C while the tests are running, as this will prevent them from cleaning up temporary tables and data in DynamoDB. We are not responsible for any charges you may incur.
Before running the tests, you must configure your AWS credentials as explained in the DynamoDB locking prerequisites section.
To run all the tests:
go test -v -parallel 128 $(glide novendor)
To run only the tests in a specific package, such as the package remote:
cd remote
go test -v -parallel 128
And to run a specific test, such as TestToTerraformRemoteConfigArgsNoBackendConfigs in package remote:
cd remote
go test -v -parallel 128 -run TestToTerraformRemoteConfigArgsNoBackendConfigs
If you set the TERRAGRUNT_DEBUG environment variable to "true", the stack trace for any error will be printed to stdout when you run the app.
In this project, we try to ensure that:
- Every error has a stacktrace. This makes debugging easier.
- Every error generated by our own code (as opposed to errors from Go built-in functions or errors from 3rd party libraries) has a custom type. This makes error handling more precise, as we can decide to handle different types of errors differently.
To accomplish these two goals, we have created an errors package that has several helper methods, such as errors.WithStackTrace(err error), which wraps the given error in an Error object that contains a stacktrace. Under the hood, the errors package is using the go-errors library, but this may change in the future, so the rest of the code should not depend on go-errors directly.
Here is how the errors package should be used:
- Any time you want to create your own error, create a custom type for it, and when instantiating that type, wrap it with a call to errors.WithStackTrace. That way, any time you call a method defined in the Terragrunt code, you know the error it returns already has a stacktrace and you don't have to wrap it yourself.
- Any time you get back an error object from a function built into Go or a 3rd party library, immediately wrap it with errors.WithStackTrace. This gives us a stacktrace as close to the source as possible.
- If you need to get back the underlying error, you can use the errors.IsError and errors.Unwrap functions.
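The pattern described above might look roughly like the following Go sketch. It does not use Terragrunt's errors package or the go-errors library. Instead, withStackTrace and tracedError are simplified stand-ins built on the standard library, and lockTimeoutError, acquireLock, and isLockTimeout are hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"runtime/debug"
)

// lockTimeoutError is a hypothetical custom error type for our own code.
type lockTimeoutError struct {
	StateFileID string
}

func (e lockTimeoutError) Error() string {
	return fmt.Sprintf("timed out waiting for lock on %s", e.StateFileID)
}

// tracedError pairs an error with the stack captured where it was wrapped.
type tracedError struct {
	Err   error
	Stack []byte
}

func (e *tracedError) Error() string { return e.Err.Error() }
func (e *tracedError) Unwrap() error { return e.Err }

// withStackTrace is a stand-in for the WithStackTrace helper: it attaches a
// stack trace unless the error is nil or already carries one.
func withStackTrace(err error) error {
	if err == nil {
		return nil
	}
	var traced *tracedError
	if errors.As(err, &traced) {
		return err
	}
	return &tracedError{Err: err, Stack: debug.Stack()}
}

// acquireLock shows the first rule: wrap a custom error when creating it.
func acquireLock(stateFileID string) error {
	return withStackTrace(lockTimeoutError{StateFileID: stateFileID})
}

// isLockTimeout shows how a caller can recover the underlying custom type.
func isLockTimeout(err error) bool {
	var target lockTimeoutError
	return errors.As(err, &target)
}
```

Because the custom type survives the wrapping, callers can still branch on the kind of error while the stack trace travels along with it.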
To release a new version, just go to the Releases Page and create a new release. The CircleCI job for this repo has been configured to:
- Automatically detect new tags.
- Build binaries for every OS using that tag as a version number.
- Upload the binaries to the release in GitHub.
See circle.yml and _ci/build-and-push-release-asset.sh for details.
This code is released under the MIT License. See LICENSE.txt.
- Add a check that modules have been downloaded using terraform get.
- Add a check that all local changes have been committed before running terraform apply.
- Consider implementing alternative locking mechanisms, such as using Git instead of DynamoDB.
- Consider embedding the Terraform Go code within Terragrunt instead of calling out to it.
- Add a show-lock command.
- Add a command to automatically set up best-practices remote state storage in a versioned, encrypted, S3 bucket.
- Add a command to list the different versions of state available in a versioned S3 bucket and to diff any two state files.