De-risk releases with AWS CodePipeline rollbacks
April 30, 2024It’s an established practice for development teams to build deployment pipelines, with services such as AWS CodePipeline, to increase the quality of application and infrastructure releases through reliable, repeatable and consistent automation.
Automating the deployment process helps build quality into our products by introducing continuous integration to build and test code as early as possible, aiming to catch errors before they reach production, but with all the best will in the world, issues can be missed and not caught until after a production release has been initiated.
In our AWS DevOps Guidance we include a DevOps Sagas indicator to Implement automatic rollbacks for failed deployments:
“Implement an automatic rollback strategy to enhance system reliability and minimize service disruptions. The strategy should be defined as a proactive measure in case of an operational event, which prioritizes customer impact mitigation even before identifying whether the new deployment is the cause of the issue.”
When release problems occur, pressure is put on teams to fix the issue quickly which can be time consuming and stressful. Is the issue related to code in the last change? Has the environment changed? Should we manually fix-forward by urgently fixing the code and re-releasing?
AWS CodePipeline recently added a stage level rollback feature that enables customers to recover from a failed pipeline quickly by processing the source revisions that previously successfully completed the failed stage.
In this blog post I will cover how the rollback feature can be enabled and walk through two scenarios to cover both automatic and manual rollbacks.
AWS CodePipeline
AWS CodePipeline is a fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates.
CodePipeline has three core constructs:
- Pipelines – A pipeline is a workflow construct that describes how software changes go through a release process. Each pipeline is made up of a series of stages.
- Stages – A stage is a logical unit you can use to isolate an environment and to limit the number of concurrent changes in that environment. Each stage contains actions that are performed on the application artifacts.
- Actions – An action is a set of operations performed on application code and configured so that the actions run in the pipeline at a specified point.
A pipeline execution is a set of changes released by a pipeline. Each pipeline execution is unique and has its own ID. An execution corresponds to a set of changes, such as a merged commit or a manual release of the latest commit.
Enable Automatic rollbacks
Automatic rollbacks can be enabled at the stage level within V2 type pipelines. When automatic rollbacks are enabled on a stage that has failed, CodePipeline will automatically select the source revisions from the most recent pipeline execution that successfully completed the stage and initiate the rollback.
Enabling automatic rollbacks can be set on V2 pipeline creation by selecting Configure automatic rollback on stage failure on a given stage, as per Figure 2 below.
Automated rollbacks can be enabled for any stage, except the “Source” stage.
For existing V2 pipelines, Automatic rollbacks can also be toggled by editing a stage and toggling Configure automatic rollback on stage failure, see Figure 3 below.
Enabling automatic rollback on an existing pipeline will change the pipeline configuration which means pipeline executions before this step will not be considered an eligible pipeline execution. Only successful pipeline executions from this point onwards will be able to be considered for a rollback.
Example 1 – Automated Rollbacks
To demonstrate how auto-rollbacks work, I’ve created a simple pipeline based on one of the CodePipeline tutorials: Create a simple pipeline (S3 bucket). When following this tutorial, In Step 4: Create your first pipeline in CodePipeline there are two differences:
- Ensure a V2 pipeline type is selected when creating the pipeline
- When in the add deploy stage, select Configure automatic rollback on stage failure
Once the pipeline has been created, it should complete its first run within a few minutes.
Now I am going to simulate a deployment failure, and this is how:
- Locate and un-zip either SampleApp_Windows.zip or SampleApp_Linux.zip depending on which server instances was selected when following the CodePipeline tutorial.
- Delete the appspec.yml file.
- Re-zip the application and upload to the Amazon S3 bucket created in Step1: Create an S3 bucket for your application.
The pipeline will trigger again after the new archive is uploaded and after a few minutes the deployment will report a stage failure and trigger an automatic rollback on the “Deploy” stage.
It’s possible to observe the Rollback as it re-deploys the last successful pipeline execution, initially showing a status of “In-progress” as per Figure 5 below:
A new pipeline execution ID can be seen in the stage with the source revisions from the previous successfully completed pipeline execution. After a few minutes the rollback will be marked as “Succeeded”:
Note that the rollback started a new Pipeline execution ID but it used the artifacts and variables from the prior successful pipeline execution.
With our production systems back in a healthy state, I can begin the investigations into why the deploy failed, the execution history, see Figure 7 below, shows the failed execution and allows the Execution ID to be inspected for more details on the failure.
Note that the Trigger column also reports the FailedPipelineExecutionId, we can use the provided link to start investigating the failure.
Example 2 – Manual Rollbacks
There may be cases where the pipeline release was successful but it caused application errors such as errors in logs and infrastructure alarms. In these scenarios a manual rollback can be initiated via the Start rollback button.
When manually triggering a rollback, it is possible to select a specific Execution ID and Source revision to re-deploy:
Selecting Start rollback will create a new Pipeline execution in the stage with the selected source revisions.
Clean up
Instructions on how to clean up the resources created in this blog can be found in the last step of the tutorial used to create the pipeline: Step 7: Clean up resources.
Conclusion
The absence of a rollback strategy can lead to prolonged service disruptions and compatibility issues. Introducing an automatic rollback mechanism to stages within a release pipeline can reduce downtime, maintain system reliability and can help return to a stable state in the event of a fault. Having a rollback plan for deployments, without any disruption for our customers, is critical in making a service reliable. Explicitly testing for rollback safety eliminates the need to rely on manual analysis, which can be error-prone.
To learn more about CodePipeline rollbacks, visit https://docs.aws.amazon.com/codepipeline/latest/userguide/stage-rollback.html.
Further reading
- Ensuring rollback safety during deployments
- Implement automatic rollbacks for failed deployments
- My CI/CD pipeline is my release captain: Easy and automatic rollbacks
- Automating safe, hands-off deployments
- Amazon’s approach to high-availability deployment