AWS Cloud Operations & Migrations Blog

Automate RDS Aurora Snapshots for disaster recovery

It is important to have a well-defined proactive disaster recovery strategy for efficient and uninterrupted flow of data across an organization. This applies to all components of your application architecture, including the database layer. While Amazon Aurora database clusters are fault-tolerant and highly available by design, for disaster recovery use cases, customers prefer to keep a snapshot of their Aurora database clusters in an AWS Region different from the primary Region.

In this blog post, we demonstrate how you can leverage AWS Systems Manager to create encrypted snapshots of Amazon RDS Aurora (MySQL or PostgreSQL) clusters. Furthermore, we will use AWS Systems Manager to copy those snapshots to a different AWS Region, for disaster recovery purposes.

Walkthrough

The solution uses the Automation capability of AWS Systems Manager to build a three-step workflow, outlined below and shown in the solution architecture diagram:

  1. Create an Aurora database cluster snapshot using Automation’s aws:executeAwsApi capability and invoking the CreateDBClusterSnapshot API.
  2. Wait for the snapshot to complete, using Automation’s aws:waitForAwsResourceProperty capability and invoking the DescribeDBClusterSnapshots API.
  3. Initiate a snapshot copy to the selected target Region using the aws:executeScript action, which calls the CopyDBClusterSnapshot API from a Python script.

Solution architecture for Automating RDS Aurora Snapshots for disaster recovery
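
For readers who think in SDK calls, the three steps above map roughly onto the following Boto3-style sequence. This is only an illustrative sketch, not the Automation document itself: the function name `snapshot_and_copy` and its parameters are hypothetical, and it assumes `source_rds` and `target_rds` are RDS clients created with `boto3.client('rds', region_name=...)` for the source and target Regions respectively.

```python
def snapshot_and_copy(source_rds, target_rds, cluster_id, execution_id,
                      source_region, kms_target_key):
    """Mirror the three Automation steps with plain Boto3-style calls.

    source_rds / target_rds are assumed to be RDS clients for the source
    and target Regions (hypothetical parameter names).
    """
    # Same naming scheme the Automation document uses for the snapshot.
    snapshot_id = f"{cluster_id}-db-snapshot-{execution_id}"

    # Step 1: CreateDBClusterSnapshot in the source Region.
    created = source_rds.create_db_cluster_snapshot(
        DBClusterSnapshotIdentifier=snapshot_id,
        DBClusterIdentifier=cluster_id,
    )["DBClusterSnapshot"]

    # Step 2: block until DescribeDBClusterSnapshots reports "available".
    source_rds.get_waiter("db_cluster_snapshot_available").wait(
        DBClusterSnapshotIdentifier=snapshot_id,
        DBClusterIdentifier=cluster_id,
    )

    # Step 3: CopyDBClusterSnapshot, issued by the target-Region client.
    copied = target_rds.copy_db_cluster_snapshot(
        SourceDBClusterSnapshotIdentifier=created["DBClusterSnapshotArn"],
        TargetDBClusterSnapshotIdentifier=snapshot_id,
        KmsKeyId=kms_target_key,
        CopyTags=True,
        SourceRegion=source_region,
    )["DBClusterSnapshot"]

    # The copy is asynchronous, so this is only the initial status.
    return copied["Status"]
```

Passing the clients in as arguments keeps the sketch independent of credentials and makes the Region split explicit: steps 1 and 2 talk to the source Region, step 3 to the target Region.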

We have provided an AWS CloudFormation template that deploys this solution in your AWS account. The template takes four parameters: DBClusterIdentifier (the source DB cluster ID), KMSTargetKey (the KMS key ID in the target Region), SourceRegion (the Region where the DB cluster is located), and TargetRegion (the destination Region for the snapshot). Specify the SourceRegion and TargetRegion values in lower case (for example, us-east-1 and us-west-2), and deploy the template in the Region where your Aurora database cluster resides. The template includes the Aurora database cluster identifier in the names of the resources it creates, so you can deploy it separately for each of your Aurora clusters. See Creating a Stack on the AWS CloudFormation Console for more information on creating an AWS CloudFormation stack.
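
If you prefer to deploy the stack from a script rather than the console, the four parameters can be assembled and sanity-checked before calling CloudFormation. The following is a minimal sketch: the helper name `build_stack_parameters` and the Region regular expression are assumptions for illustration (the pattern is only an approximate check for lower-case Region names), while the parameter keys are the ones the template expects.

```python
import re

# Approximate shape of a lower-case Region name such as us-east-1 or
# us-gov-west-1. This pattern is an assumption, not an official rule.
REGION_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def build_stack_parameters(db_cluster_identifier, kms_target_key,
                           source_region, target_region):
    """Return the Parameters list for CloudFormation create_stack."""
    for name, region in (("SourceRegion", source_region),
                         ("TargetRegion", target_region)):
        if not REGION_PATTERN.match(region):
            raise ValueError(
                f"{name} must be a lower-case Region name, got {region!r}"
            )
    values = {
        "DBClusterIdentifier": db_cluster_identifier,
        "KMSTargetKey": kms_target_key,
        "SourceRegion": source_region,
        "TargetRegion": target_region,
    }
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in values.items()
    ]
```

The resulting list can be passed as `Parameters=` to `create_stack` on a CloudFormation client created in the source Region, for example `boto3.client("cloudformation", region_name=source_region)`. Whether the call also needs a `Capabilities` argument depends on the resources in the template, so check that before deploying.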

The AWS CloudFormation template deploys the following AWS Systems Manager Automation document:

description: Aurora RDS Cluster Snapshot and Copy Automation Document
schemaVersion: '0.3'
assumeRole: '<AssumedRoleARN>'
mainSteps:
  - name: CreateSnapshot
    action: 'aws:executeAwsApi'
    inputs:
      Service: rds
      Api: CreateDBClusterSnapshot
      DBClusterSnapshotIdentifier: '<CLUSTER IDENTIFIER>-db-snapshot-{{automation:EXECUTION_ID}}'
      DBClusterIdentifier: <Aurora Cluster Identifier>
    outputs:
      - Name: SnapShotId
        Selector: $.DBClusterSnapshot.DBClusterSnapshotIdentifier
        Type: String
      - Name: DBClusterId
        Selector: $.DBClusterSnapshot.DBClusterIdentifier
        Type: String
      - Name: DBClusterSnapshotArn
        Selector: $.DBClusterSnapshot.DBClusterSnapshotArn
        Type: String
  - name: waitForSnapshotCompletion
    action: 'aws:waitForAwsResourceProperty'
    inputs:
      Service: rds
      Api: DescribeDBClusterSnapshots
      DBClusterSnapshotIdentifier: '<CLUSTER IDENTIFIER>-db-snapshot-{{automation:EXECUTION_ID}}'
      DBClusterIdentifier: <CLUSTER IDENTIFIER>
      PropertySelector: '$.DBClusterSnapshots[0].Status'
      DesiredValues:
        - available
  - name: ExecuteCode
    action: 'aws:executeScript'
    inputs:
      Runtime: python3.7
      Handler: script_handler
      InputPayload:
        snapshotid: '{{CreateSnapshot.SnapShotId}}'
        snapshotarn: '{{CreateSnapshot.DBClusterSnapshotArn}}'
        dbclusterid: '{{CreateSnapshot.DBClusterId}}'
        automationid: '{{automation:EXECUTION_ID}}'
        sourceregion: !Ref SourceRegion
        targetregion: !Ref TargetRegion
        kmstargetkey: !Ref KMSTargetKey
      Script: |-
        def script_handler(event, context):
          import boto3, json, os

          # Input parameters are provided by the SSM document
          snapshotid = event.get("snapshotid")
          snapshotarn = event.get("snapshotarn")
          dbclusterid = event.get("dbclusterid")
          sourceregion = event.get("sourceregion")
          targetregion = event.get("targetregion")
          kmstargetkey = event.get("kmstargetkey")

          # The following API is expected to run in the target Region.
          # Setting region_name to the target Region achieves that.
          client = boto3.client('rds', region_name=targetregion)

          response = client.copy_db_cluster_snapshot(
            SourceDBClusterSnapshotIdentifier=snapshotarn,
            TargetDBClusterSnapshotIdentifier=snapshotid,
            KmsKeyId=kmstargetkey,      # KMS key ID in the target Region
            CopyTags=True,
            SourceRegion=sourceregion   # This attribute automatically generates the presigned URL
          )
          print(response)

          copystatus = response.get("DBClusterSnapshot").get("Status")
          print("Status of Copying of Snapshot:" + str(copystatus))

Copying the snapshot to the target Region with the CopyDBClusterSnapshot API requires a presigned URL (the PreSignedUrl parameter). The aws:executeScript action runs a Python script that invokes this API. The script uses the AWS SDK for Python (Boto3), which generates the PreSignedUrl automatically when you provide the SourceRegion attribute. The Amazon RDS client in the script is initialized for the target Region, so the copy request is sent there, even though the script itself executes in the source Region where the Aurora database cluster exists.
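
Because the copy runs asynchronously, the script above only prints the initial status returned by CopyDBClusterSnapshot. If you want to confirm that the copy has finished, one option is to poll DescribeDBClusterSnapshots in the target Region. The sketch below is an assumption-laden illustration and not part of the published solution: `wait_for_copy`, its parameters, and the polling defaults are hypothetical, and `target_rds` is assumed to be an RDS client created for the target Region.

```python
import time


def wait_for_copy(target_rds, snapshot_id, poll_seconds=30,
                  max_attempts=60, sleep=time.sleep):
    """Poll the target Region until the copied snapshot is available.

    target_rds is assumed to be boto3.client('rds', region_name=<target>).
    Returns the number of describe calls that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        snapshots = target_rds.describe_db_cluster_snapshots(
            DBClusterSnapshotIdentifier=snapshot_id
        )["DBClusterSnapshots"]
        # Same property the Automation wait step checks in the source Region.
        if snapshots[0]["Status"] == "available":
            return attempt
        sleep(poll_seconds)
    raise TimeoutError(
        f"Snapshot {snapshot_id} not available after {max_attempts} checks"
    )
```

The `sleep` argument is injectable so the loop can be exercised without real delays. In practice the target snapshot identifier is the same value the script passes as TargetDBClusterSnapshotIdentifier.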

Conclusion

This blog post presents a solution for implementing disaster recovery for Aurora database clusters by automating cluster snapshot creation and copying the snapshots to a different AWS Region. Based on your Recovery Time Objective (RTO) and Recovery Point Objective (RPO) requirements, you can trigger this process using either AWS Systems Manager Maintenance Windows or an Amazon CloudWatch Events rule that uses the Automation document as a target.
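
As one possible way to schedule the workflow, the sketch below builds the request arguments for a scheduled CloudWatch Events rule that targets the Automation document. Everything named here is an assumption for illustration: the helper `build_schedule_requests`, the target `Id`, and the IAM role that the rule would use to start the automation are not part of the provided template, and the `automation-definition` ARN format reflects our understanding of how Automation documents are addressed as rule targets, so verify it against the current documentation.

```python
def build_schedule_requests(rule_name, schedule_expression, region,
                            account_id, document_name, events_role_arn):
    """Build keyword arguments for events.put_rule and events.put_targets.

    events_role_arn is a hypothetical IAM role that allows the rule to
    start the Automation; it is not created by the template in this post.
    """
    # Assumed ARN format for an Automation document used as a rule target.
    document_arn = (
        f"arn:aws:ssm:{region}:{account_id}:"
        f"automation-definition/{document_name}:$DEFAULT"
    )
    put_rule = {
        "Name": rule_name,
        "ScheduleExpression": schedule_expression,  # e.g. "rate(1 day)"
        "State": "ENABLED",
    }
    put_targets = {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": "aurora-snapshot-automation",  # arbitrary target ID
                "Arn": document_arn,
                "RoleArn": events_role_arn,
            }
        ],
    }
    return put_rule, put_targets
```

With a client such as `boto3.client("events", region_name=region)`, the two dictionaries would be applied as `events.put_rule(**put_rule)` followed by `events.put_targets(**put_targets)`. Choose the schedule expression to match your RPO: a daily rate gives a recovery point up to roughly a day old.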

 

About the Authors

Kapil Shardha is an AWS Solutions Architect and supports enterprise customers with their AWS adoption. He has a background in infrastructure automation and DevOps.

William Torrealba is an AWS Serverless Specialist Solutions Architect supporting customers with their AWS adoption, especially in the use of serverless technologies. He has a background in application development, serverless technologies, highly available distributed systems, automation, and DevOps.