Introduction
Businesses today rely heavily on technology to run their operations and serve their customers efficiently. This dependency brings the risk of unexpected disruptions such as natural disasters, cyberattacks, or system failures. To mitigate these risks and ensure uninterrupted service delivery, companies need robust disaster recovery solutions. A leading AMC therefore sought a reliable disaster recovery solution that could safeguard their operations and meet their stringent audit requirements.
Business Challenge:
- Compliance: The AMC needed to establish a disaster recovery setup to comply with industry regulations and audits.
- Uninterrupted Operations: Ensuring seamless service delivery to millions of customers was paramount, even in the event of unforeseen disruptions.
- Risk Mitigation: Protecting against risks posed by natural disasters, cyberattacks, or system failures was a top priority.
Solution:
Partnering closely with the AMC, Bajaj Technology Services designed and implemented a disaster recovery solution based on the AWS warm standby pattern. Disaster recovery (DR) was already available across two Availability Zones; we extended it across two regions, following an automation-first approach. Warm standby is an active-passive DR approach: all infrastructure and the necessary AWS services in the primary region (Mumbai), which hosts the production environment, are replicated at a smaller scale in the DR region (Hyderabad). In the event of a disaster, production can be recovered quickly in three steps: scaling the DR infrastructure (EC2, RDS) up to production size, failing the database over from the primary region to the DR region, and redirecting production traffic to the DR environment by updating the public DNS configuration in the CDN.
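The failover sequence described above can be sketched as an ordered runbook. This is a minimal illustration, not the project's actual automation: the function names (`run_failover`, `build_warm_standby_steps`) and step names are hypothetical, and each action only records a message where a real runbook would call the relevant AWS APIs.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action takes no arguments.
Step = Tuple[str, Callable[[], None]]


def run_failover(steps: List[Step]) -> List[str]:
    """Run failover steps in order and return the names of completed steps.

    An exception raised by any step stops the run, so later steps
    (for example the traffic switch) never execute after a failure.
    """
    completed: List[str] = []
    for name, action in steps:
        action()
        completed.append(name)
    return completed


def build_warm_standby_steps(log: List[str]) -> List[Step]:
    """Assemble the three failover stages in the order the text describes.

    Hypothetical placeholders: each action only appends a message to `log`.
    """
    def note(message: str) -> Callable[[], None]:
        return lambda: log.append(message)

    return [
        ("scale_dr_infrastructure", note("Scale DR EC2 and RDS to production size")),
        ("fail_over_database", note("Fail the database over from Mumbai to Hyderabad")),
        ("redirect_traffic", note("Point public DNS in the CDN at the DR environment")),
    ]
```

Calling `run_failover(build_warm_standby_steps(audit_log))` with an empty list as `audit_log` runs the three stages in order. Keeping the traffic switch last means users are only redirected once compute and database are ready in the DR region.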
In less than three months, the DR solution was in place and fully validated. The sample architecture diagram can be seen in the figure below.
The project went through the following phases:
- Objective Alignment: Defined the Recovery Time Objective (RTO) and Recovery Point Objective (RPO), ensuring the disaster recovery plan met precise business needs.
- Deployment Strategy: Orchestrated the deployment of 10+ critical AWS services into the disaster recovery (DR) region, spanning ECS, ECR, EC2, S3, RDS, Lambda, ACM, Route 53 hosted zones, VPC, subnets, etc.
- Database Migration: Collaborated on setting up DB instances and executed migration strategies for various databases, choosing the most suitable approach for each database.
- Automated Infrastructure Setup: Employed AWS CloudFormation templates, AWS Backup, and Lambda functions to automate infrastructure setup in the DR region, tailoring configurations to regional requirements.
- Application Services Deployment: Ensured smooth deployment by addressing infrastructure configuration challenges, updating AWS SDK versions, and migrating critical application services to the DR region.
- Testing Excellence: Conducted a meticulous mock DR drill and resolved the issues it identified before the actual DR drill, ensuring the disaster recovery plan's effectiveness.
- Final Disaster Recovery Drill: Conducted the final DR drill, in which the secondary region's infrastructure was up in 24 minutes, followed by the migrations.
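As an illustration of the automated setup phase, the sketch below shows one way the same CloudFormation template could be given smaller sizing in the DR region and production sizing during failover. It is an assumption-laden example rather than the project's templates: the parameter keys, sizing profiles, and instance sizes are hypothetical. The region codes are the AWS identifiers for Mumbai and Hyderabad, and the output follows the `ParameterKey`/`ParameterValue` list shape that CloudFormation stack operations accept.

```python
from typing import Dict, List

PRIMARY_REGION = "ap-south-1"  # AWS Asia Pacific (Mumbai)
DR_REGION = "ap-south-2"       # AWS Asia Pacific (Hyderabad)

# Hypothetical sizing profiles; real values depend on the workload
# and on what is available in each region.
SIZING: Dict[str, Dict[str, str]] = {
    "production": {
        "InstanceType": "m5.xlarge",
        "DesiredCount": "4",
        "DBInstanceClass": "db.r5.xlarge",
    },
    "standby": {
        "InstanceType": "m5.large",
        "DesiredCount": "1",
        "DBInstanceClass": "db.r5.large",
    },
}


def stack_parameters(profile: str) -> List[Dict[str, str]]:
    """Return CloudFormation-style parameters for a sizing profile.

    Keys are sorted so the output is deterministic and easy to diff.
    """
    sizing = SIZING[profile]
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(sizing.items())
    ]
```

Under these assumptions, the DR stack would normally be deployed in `DR_REGION` with `stack_parameters("standby")`, and updated with `stack_parameters("production")` when a failover is declared.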
Impact:
The implementation of the warm standby disaster recovery solution yielded significant benefits:
- Compliance: Adherence to audit requirements through the establishment of a robust DR setup and regular DR drills.
- Risk Mitigation: Protection against natural and technical disasters that affect an entire AWS region.
- Data Loss Prevention: Continuous database replication ensured minimal data loss.
- Improved Efficiency: Automation reduced the risk of errors and improved recovery time.
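To make benefits like these measurable, the results of a DR drill can be compared against the agreed RTO and RPO. The sketch below is illustrative only: the class and function names are hypothetical, and so are all the numbers, since the case study does not state the actual objectives. The 24-minute figure from the final drill covers infrastructure bring-up only, so it is not used here as a total recovery time.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable window of lost data


@dataclass(frozen=True)
class DrillResult:
    recovery_time: timedelta     # disaster declared -> service restored
    data_loss_window: timedelta  # age of the last replicated transaction


def evaluate_drill(objectives: RecoveryObjectives, result: DrillResult) -> Dict[str, bool]:
    """Report whether a drill stayed within each objective."""
    return {
        "rto_met": result.recovery_time <= objectives.rto,
        "rpo_met": result.data_loss_window <= objectives.rpo,
    }


# Hypothetical objectives and drill measurements, for illustration only.
objectives = RecoveryObjectives(rto=timedelta(hours=2), rpo=timedelta(minutes=15))
drill = DrillResult(recovery_time=timedelta(minutes=90),
                    data_loss_window=timedelta(minutes=5))
report = evaluate_drill(objectives, drill)
```

Recording such a report for each drill gives auditors a simple, repeatable record of whether the objectives were met.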
In conclusion, the adoption of a warm standby disaster recovery solution demonstrated a commitment to operational excellence, resilience, and compliance. This strategic investment enabled the AMC to safeguard their business continuity and mitigate risks in today's dynamic business landscape.