How to Create New Autoscaling Groups on Launch Configuration Change

Updating an auto scaling group’s (ASG) launch config exposes a little quirk: the old instances are not automatically taken out of service.

Sometimes that’s not a bad thing. If you’re running a distributed database in an ASG, for example, taking all of the old instances out of service at once would be a bad idea.

Other times removing the old servers is exactly what’s desired. Terraform and immutable infrastructure make this scenario really easy.

Creating a Launch Config

First off, we’ll want to describe a launch configuration with a name_prefix and lifecycle block.

resource "aws_launch_configuration" "example" {
  name_prefix = "example-"
  # other config here
  lifecycle {
    create_before_destroy = true
  }
}

create_before_destroy forces the new resource to be created before the old one is destroyed. Launch configs are immutable, so any change forces Terraform to create a new one. The name_prefix (rather than a fixed name) lets Terraform generate a unique name for each new launch config, so the new and old resources can exist side by side during the swap.

Tying the Launch Configuration and ASG Together

Now we’ll tie the name of the launch config to the name of the auto scaling group. This forces a new ASG to be created each time the launch config changes. The ASG also needs the same lifecycle block as the launch config so that the new ASG comes up before the old one is destroyed.

resource "aws_autoscaling_group" "example" {
  name = "${aws_launch_configuration.example.name}"
  launch_configuration = "${aws_launch_configuration.example.name}"
  # other config ...
  lifecycle {
    create_before_destroy = true
  }
}

Practical Example

I use this strategy to update instances tied to an ECS cluster. Whenever a new version of the ECS agent (or Docker) becomes available, the launch config’s AMI changes, which forces a new launch config to be created; a new ASG comes up and the old one is removed. Terraform’s configuration is declarative: it takes a description of the desired state (the configuration above) and figures out how to get there.
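As a rough sketch of how the AMI change might be wired in, a data source can look up the latest ECS-optimized AMI and feed it into the launch config. The filter pattern below assumes the Amazon Linux ECS-optimized AMI naming convention, and the instance type is a placeholder, not a definitive implementation:

data "aws_ami" "ecs" {
  most_recent = true
  owners      = ["amazon"]

  # assumes the Amazon Linux ECS-optimized AMI naming convention
  filter {
    name   = "name"
    values = ["amzn-ami-*-amazon-ecs-optimized"]
  }
}

resource "aws_launch_configuration" "example" {
  name_prefix   = "example-"
  image_id      = "${data.aws_ami.ecs.id}"  # a new AMI means a new launch config
  instance_type = "t2.micro"                # placeholder

  lifecycle {
    create_before_destroy = true
  }
}

When a new AMI is published, the data source resolves to a new id on the next plan and the replacement cascade described above kicks in.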

The downside is that the old cluster instances can come down quickly, and services may briefly be left without any running tasks. Practically, this means serving web traffic may require running more tasks (spread across multiple instances, as sketched below) to ensure no downtime as servers are removed.
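One way to express that spreading in Terraform is a service with a desired count above one and a distinctInstance placement constraint so tasks land on separate instances. This is a minimal sketch; the aws_ecs_cluster.example and aws_ecs_task_definition.web resources it references are hypothetical:

resource "aws_ecs_service" "web" {
  name            = "web"
  cluster         = "${aws_ecs_cluster.example.id}"       # hypothetical cluster
  task_definition = "${aws_ecs_task_definition.web.arn}"  # hypothetical task definition
  desired_count   = 3  # more than one task so some survive instance churn

  # spread tasks across distinct instances so one terminating
  # server doesn't take the whole service down
  placement_constraints {
    type = "distinctInstance"
  }
}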