Announcing CDK Garbage Collection

February 21, 2025 By Mark Otto

The AWS Cloud Development Kit (CDK) is an open source framework that enables developers to define cloud infrastructure using a familiar programming language. Additionally, CDK provides higher-level abstractions (Constructs), which reduce the complexity of defining and integrating AWS services when building on AWS. CDK also provides core functionality like CDK Assets, which gives users the ability to bundle application assets into their CDK applications. These assets can be local files (main.py), directories (python_app/), or Docker images (Dockerfile). CDK Assets are stored in an Amazon Simple Storage Service (Amazon S3) Bucket or Amazon Elastic Container Registry (Amazon ECR) Repository that is created during CDK bootstrapping.

CDK developers who use assets at scale may notice over time that the bootstrapped bucket or repository accumulates old or unused data. Until now, CDK didn’t provide a clear way for users to determine which of this data is safe to delete. To solve this problem, we are excited to announce the preview launch of CDK Garbage Collection, a new feature of the CDK that automatically deletes old assets in your bootstrapped Amazon S3 Bucket and Amazon ECR Repository, saving users time and money. This feature is available starting in AWS CDK version 2.165.0.

We expect CDK Garbage Collection to help AWS CDK customers save on storage costs associated with using the product while not affecting how customers use CDK.

Quickstart

CDK Garbage Collection is exposed as a CDK CLI command named gc. To use CDK Garbage Collection in its default configuration, run the following command in a terminal from your CDK application directory.

cdk gc --unstable=gc

The --unstable flag is meant to acknowledge that CDK Garbage Collection is in preview mode. This indicates that the scope and API of the feature might still change, but otherwise the feature is generally production ready and fully supported.

Walkthrough

CDK Garbage Collection works at the environment level, so it will attempt to delete isolated assets in the AWS account / region that you call it in. For the purposes of this walkthrough, you will be re-bootstrapping the environment with a custom qualifier so that you do not delete isolated assets before you are ready.

cdk bootstrap --qualifier=abcdef --toolkit-stack-name=CDKToolkitDemo

You now have a new bootstrap template under the name CDKToolkitDemo and bootstrap resources associated with it. Next, set up a CDK application with both Amazon S3 and Amazon ECR assets:

mkdir garbage-collection-demo && cd garbage-collection-demo
cdk init -l typescript app

Your next step is to replace the existing code in lib/garbage-collection-demo-stack.ts with the following CDK Stack:

import * as path from 'path';
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';

export class GarbageCollectionDemoStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const fn1 = new lambda.Function(this, 'my-function-s3', {
      code: lambda.Code.fromAsset(path.join(__dirname, '..', 'lambda')),
      runtime: lambda.Runtime.NODEJS_LATEST,
      handler: 'index.handler',
    });

    const fn2 = new lambda.Function(this, 'my-function-ecr', {
      code: lambda.Code.fromAssetImage(path.join(__dirname, '..', 'docker')),
      runtime: lambda.Runtime.FROM_IMAGE,
      handler: lambda.Handler.FROM_IMAGE,
    });
  }
}

This creates two AWS Lambda functions: one that uses an Amazon S3 asset as its source code and one that uses an Amazon ECR image as its source code. You need to add the referenced assets to your CDK application. In lambda/index.js add a simple Lambda function:

exports.handler = async function(event) {
  const response = require('./response.json');
  return response;
};

And in docker/Dockerfile add a simple Docker image:

FROM public.ecr.aws/docker/library/alpine:latest

Now you can run cdk deploy and get your initial CDK application set up in your AWS Account.

cdk deploy \
  --toolkit-stack-name=CDKToolkitDemo \
  --context='@aws-cdk/core:bootstrapQualifier=abcdef'

At this point you can check to make sure that assets have been correctly added into the bootstrapped Amazon S3 bucket and Amazon ECR repository:

Two objects exist in the bootstrapped Amazon S3 Bucket after the initial AWS CDK Deploy.

One image exists in the bootstrapped Amazon ECR Repository after the initial AWS CDK Deploy.

The output shows that you have the data you expect in both bootstrapped resources. The Amazon S3 Bucket also stores the JSON file of the AWS CloudFormation Template that was generated when you ran cdk deploy.

You can now simulate a typical CDK development cycle by updating both assets. Add a small change to the Amazon S3 asset that lives in lambda/index.js:

exports.handler = async function(event) {
  console.log('hello world');
  const response = require('./response.json');
  return response;
};

And do the same in docker/Dockerfile:

FROM public.ecr.aws/docker/library/alpine:latest
CMD echo 'Hello World'

You can now run cdk deploy again, and both assets should be re-uploaded under a new hash.
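To see why the old versions stay behind after a second deploy, it can help to think of the bootstrapped bucket as a content-addressed store: the object key is derived from the asset's content, so changed content lands under a new key next to the old one. The following is a minimal sketch of that idea, not CDK's implementation. The `AssetStore` class and the `<hash>.zip` key layout are hypothetical, and FNV-1a is used only as a small stand-in for the much longer hash that CDK computes.

```typescript
// Illustrative only: FNV-1a (32-bit) is a stand-in for the real asset hash,
// and AssetStore is a hypothetical in-memory model of the bootstrapped bucket.
function fnv1a(content: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  // Render as 8 hex digits, left-padded with zeros.
  return ('00000000' + (hash >>> 0).toString(16)).slice(-8);
}

class AssetStore {
  private readonly objects = new Map<string, string>();

  // Identical content maps to an existing key and is not stored twice;
  // changed content is stored under a new key, and the old key remains.
  upload(content: string): string {
    const key = fnv1a(content) + '.zip';
    if (!this.objects.has(key)) {
      this.objects.set(key, content);
    }
    return key;
  }

  keys(): string[] {
    return Array.from(this.objects.keys());
  }
}

const store = new AssetStore();
const v1 = store.upload("exports.handler = async () => 'v1';");
const again = store.upload("exports.handler = async () => 'v1';");
const v2 = store.upload("exports.handler = async () => 'v2';");

console.log(v1 === again);        // identical content reuses the same key
console.log(v1 !== v2);           // edited content gets a different key
console.log(store.keys().length); // both versions are still in the store
```

Nothing in this model ever removes the first version, which is the accumulation that garbage collection is meant to address.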

Four objects exist in the bootstrapped Amazon S3 Bucket after the second AWS CDK Deploy.

Two images exist in the bootstrapped Amazon ECR Repository after the second AWS CDK Deploy.

This output confirms that the new assets have been added as expected. Because you are using new bootstrapped resources, you can still tell which assets are currently isolated and which are not. Right now, only the zip file prefixed with 50f409b9 is referenced in AWS CloudFormation, and in Amazon ECR, only the image prefixed with a5801b5b is referenced. That means that every other asset (3 objects in Amazon S3 and 1 image in Amazon ECR) is isolated and can be deleted.

Note the additional files in Amazon S3 that are not your local assets: these are AWS CloudFormation templates that are uploaded to Amazon S3 as an intermediary step before being sent to AWS CloudFormation. They are not needed after being copied over and are perfect candidates for deletion via CDK Garbage Collection.
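One way to picture how isolated assets could be identified is a check of each stored asset's hash against the bodies of the deployed stack templates. The sketch below assumes that approach; it is not the CLI's actual code. `StoredAsset`, `assetHash`, and `findIsolated` are hypothetical names, the hash is assumed to be the part of the key before the first dot, and every key except the 50f409b9 prefix from the walkthrough is made up.

```typescript
// Hypothetical shape: the real CLI reads this from S3 / ECR.
interface StoredAsset {
  key: string; // object key or image tag, e.g. "50f409b9.zip"
}

// Assumption: the asset hash is everything before the first dot in the key.
function assetHash(asset: StoredAsset): string {
  return asset.key.split('.')[0];
}

// An asset counts as "in use" if its hash appears in any deployed template
// body; everything else is isolated.
function findIsolated(assets: StoredAsset[], templateBodies: string[]): StoredAsset[] {
  return assets.filter((asset) => {
    const hash = assetHash(asset);
    return !templateBodies.some((body) => body.indexOf(hash) !== -1);
  });
}

// State of the demo bucket after the second deploy (keys shortened).
const bucket: StoredAsset[] = [
  { key: '50f409b9.zip' },  // current Lambda code, referenced by the stack
  { key: '11111111.zip' },  // made-up key: previous Lambda code
  { key: '22222222.json' }, // made-up key: template from the first deploy
  { key: '33333333.json' }, // made-up key: template from the second deploy
];

// A heavily trimmed stand-in for the deployed stack template.
const deployedTemplates = [
  '{"Code":{"S3Bucket":"demo-bucket","S3Key":"50f409b9.zip"}}',
];

const isolated = findIsolated(bucket, deployedTemplates);
console.log(isolated.map((a) => a.key)); // the three unreferenced keys
```

The uploaded template files end up isolated for the same reason as the old zip file: no deployed template refers to them.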

Here is where CDK Garbage Collection comes in. With the right parameters, you are able to clean up the isolated objects while not disturbing the assets that are actively in use.

cdk gc \
  --unstable=gc \
  --bootstrap-stack-name=CDKToolkitDemo \
  --rollback-buffer-days=0 \
  --created-buffer-days=0

Because you want to delete assets immediately, and not tag them for deletion later, set rollback-buffer-days to 0. You also want to delete assets that were just created, so be sure to set created-buffer-days to 0 as well. The default for created-buffer-days is 1.

 ⏳ Garbage Collecting environment aws://912331974472/us-east-1...
Found 3 objects to delete based off of the following criteria:
- objects have been isolated for > 0 days
- objects were created > 0 days ago

Delete this batch (yes/no/delete-all)?

CDK Garbage Collection found three assets to be deleted from Amazon S3, which is to be expected. It prompts you to verify that you want to delete, which you do, so enter yes. You will then get this response:

[100.00%] 4 files scanned: 0 assets (0.00 MiB) tagged, 3 assets (0.02 MiB) deleted.

Followed by:

Found 1 image to delete based off of the following criteria:
- images have been isolated for > 0 days
- images were created > 0 days ago

Delete this batch (yes/no/delete-all)?

Once again, this is to be expected for Amazon ECR, so you enter yes again. You then get the response:

[100.00%] 2 files scanned: 0 assets (0.00 MiB) tagged, 1 assets (3.90 MiB) deleted.

At this point, CDK Garbage Collection is finished.

Details

CDK Garbage Collection exposes some parameters to help you customize the experience to your specific scenario. These options help you determine how aggressive you want your garbage collection to be.

  • rollback-buffer-days: the number of days an asset must have been marked as isolated before it is eligible for deletion.
  • created-buffer-days: the number of days an asset must have existed before it is eligible for deletion.

Rollback Buffer Days should be considered when you are not using cdk deploy and instead use a deployment method that operates on templates only, like a pipeline. If your pipeline can roll back without any involvement of the CDK CLI, this parameter helps ensure that assets are not prematurely deleted. When used, instead of deleting unused objects, cdk gc tags them with the current date. Subsequent runs of cdk gc check this tag and delete the asset only after it has been tagged for longer than the specified buffer days.

Created Buffer Days should be considered if you want to be extra safe about assets that have been recently uploaded. When used, cdk gc filters out any assets that have not persisted for that number of days. Note that this filter may not cover assets that are shared across multiple CDK Apps: CDK reuses assets that are identical, and it's possible that a recent deploy of a CDK App references an asset that was uploaded earlier.

For example, if you want to ensure that only assets that are over a month old and have been isolated for a week are deleted, you can specify:

cdk gc --unstable=gc --rollback-buffer-days=7 --created-buffer-days=30

Decision flow diagram of an asset as it gets audited for garbage collection.
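The way the two buffers interact can also be written down as a small decision function. This is a sketch of the behavior described above under stated assumptions, not the CLI's implementation: `AssetState`, `GcOptions`, and `decide` are hypothetical names, the exact boundary comparisons (`<` and `>`) are guesses, and the dates are invented for illustration.

```typescript
type Decision = 'keep' | 'tag' | 'delete';

// Hypothetical view of one asset at the time cdk gc inspects it.
interface AssetState {
  inUse: boolean;       // referenced by a deployed stack template
  createdAt: Date;      // when the asset was uploaded
  isolatedSince?: Date; // date tag written by an earlier run, if any
}

interface GcOptions {
  rollbackBufferDays: number;
  createdBufferDays: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function daysBetween(from: Date, to: Date): number {
  return (to.getTime() - from.getTime()) / MS_PER_DAY;
}

function decide(asset: AssetState, opts: GcOptions, now: Date): Decision {
  // Assets that a deployed stack still references are never touched.
  if (asset.inUse) return 'keep';
  // Recently created assets are filtered out by created-buffer-days.
  if (daysBetween(asset.createdAt, now) < opts.createdBufferDays) return 'keep';
  // With no rollback buffer, isolated assets are deleted right away.
  if (opts.rollbackBufferDays === 0) return 'delete';
  // First time the asset is seen isolated: tag it with the current date.
  if (asset.isolatedSince === undefined) return 'tag';
  // Delete only once the tag is older than the rollback buffer.
  return daysBetween(asset.isolatedSince, now) > opts.rollbackBufferDays
    ? 'delete'
    : 'keep';
}

// Mirrors --rollback-buffer-days=7 --created-buffer-days=30.
const opts: GcOptions = { rollbackBufferDays: 7, createdBufferDays: 30 };
const now = new Date('2025-03-01T00:00:00Z');
const old = new Date('2025-01-01T00:00:00Z');

const tooNew = decide({ inUse: false, createdAt: new Date('2025-02-20T00:00:00Z') }, opts, now);
const firstSeen = decide({ inUse: false, createdAt: old }, opts, now);
const waiting = decide(
  { inUse: false, createdAt: old, isolatedSince: new Date('2025-02-26T00:00:00Z') }, opts, now);
const expired = decide(
  { inUse: false, createdAt: old, isolatedSince: new Date('2025-02-15T00:00:00Z') }, opts, now);

console.log(tooNew, firstSeen, waiting, expired);
```

With both buffers set to 0, as in the walkthrough, every isolated asset falls through to the immediate delete branch, which is why nothing was tagged there.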

Limitations of CDK Garbage Collection

During CDK Garbage Collection, we collect all stack templates to see what assets are in use. If garbage collection runs between the asset upload and stack deployment, there is a chance that it does not pick up the latest stack deployment, but it does pick up the latest asset. In this scenario, CDK Garbage Collection may delete those assets.

We recommend not deploying stacks while running CDK Garbage Collection. If that is unavoidable, setting --created-buffer-days will help, as garbage collection will avoid deleting assets that were recently created. Finally, if you do experience a failed deployment, the mitigation is to redeploy, as the asset upload step will be able to re-upload the missing asset. In practice, this race condition arises only in a specific edge case and is unlikely to happen. However, we are working on a new method of storing CDK Assets to reduce the risk of this race condition. That work is being tracked in this issue.

Conclusion

CDK Garbage Collection helps users manage the lifecycle of unused CDK Assets in their AWS account. As users continue to scale with the CDK, tools like CDK Garbage Collection will play a crucial role in maintaining clean, efficient, and cost-effective cloud environments. We encourage CDK users to explore this feature, provide feedback, and incorporate it into their workflows to optimize their AWS resource management.