How Cirrusgo enabled rapid resolution with Amazon DevOps Guru

May 20, 2023 By Mark Otto

Image of the Cirrusgo company logo.

In this post, we walk through how Cirrusgo used Amazon DevOps Guru for RDS to quickly identify and resolve an operational issue related to database performance and reduce its impact on their business. Amazon DevOps Guru for RDS uses machine learning algorithms to help organizations identify and resolve operational issues in their applications and infrastructure.

Challenge:

Knowledge Beam, one of Cirrusgo’s managed service customers, has an e-learning web application that serves as a mission-critical tool for nearly 90,000 teachers. The application tracks daily activities, including teaching and the evaluation of homework and quizzes submitted by students. Any interruption in the availability of this application causes significant inconvenience to teachers and students, as well as damage to the company’s reputation. Ensuring the continuous and reliable performance of customer workloads is of utmost importance to Cirrusgo.

Identification of operational issues with Amazon DevOps Guru:

To streamline troubleshooting and avoid time-consuming manual analysis of logs, Cirrusgo used Amazon DevOps Guru to monitor Knowledge Beam’s stack. With just a few clicks in the AWS console, Cirrusgo enabled DevOps Guru, which uses machine learning to analyze Amazon CloudWatch metrics, AWS CloudTrail events, and Amazon Relational Database Service (Amazon RDS) Performance Insights data. This lets it quickly identify behaviors that deviate from standard operating patterns and pinpoint the root cause of operational issues.
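To make "behaviors that deviate from standard operating patterns" concrete, here is a minimal sketch of baseline-deviation detection on a metric series. It is only an illustration: it uses a simple z-score rule, not DevOps Guru's actual machine learning models, and the function name, threshold, and sample replica-lag values are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(baseline, recent, threshold=3.0):
    """Return (index, value) pairs from `recent` that deviate from `baseline`.

    A point is flagged when it lies more than `threshold` sample standard
    deviations from the baseline mean. This is an illustrative stand-in,
    not the algorithm DevOps Guru uses.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value counts as a deviation.
        return [(i, v) for i, v in enumerate(recent) if v != mu]
    return [
        (i, v)
        for i, v in enumerate(recent)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical replica-lag samples in seconds (not data from the incident).
baseline_lag = [0.8, 1.1, 0.9, 1.0, 1.2, 0.9, 1.0, 1.1]
recent_lag = [1.0, 1.3, 45.0, 120.0]

anomalies = find_anomalies(baseline_lag, recent_lag)
print(anomalies)
```

With these made-up samples, only the last two readings are flagged, because the first two stay within the normal spread of the baseline.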

When users reported difficulty submitting assignments via the e-learning portal, Cirrusgo’s team launched an investigation. The team discovered 4xx and 5xx Elastic Load Balancing errors in the CloudWatch metrics, but no additional information was available. While examining the load balancer and application logs, the engineers received Amazon DevOps Guru notifications regarding Amazon RDS replica lag. The team promptly investigated and confirmed the replica lag, then ran commands to stop traffic to the replica instance and shift all traffic to the Amazon RDS primary node. Thanks to DevOps Guru’s insightful recommendations, the team identified and resolved the issue. Knowing the root cause, the team also took additional steps to prevent its recurrence, including creating an Amazon RDS read replica and upgrading the instance type based on the current workload.
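The mitigation described above, sending all traffic to the primary while the replica is lagging, can be sketched as a small routing decision. The post does not say which commands the team ran, so this is an assumption-laden illustration: the function name, the 30-second lag limit, and the endpoint labels are hypothetical.

```python
def route_read_traffic(replica_lag_seconds, max_lag_seconds=30.0):
    """Return the share of read traffic each endpoint should receive.

    When the replica lag is unknown (None) or above `max_lag_seconds`,
    all reads go to the primary so users do not see stale data.
    Otherwise reads stay on the replica. The limit is a hypothetical value.
    """
    if replica_lag_seconds is None or replica_lag_seconds > max_lag_seconds:
        return {"primary": 1.0, "replica": 0.0}
    return {"primary": 0.0, "replica": 1.0}


# Healthy replica: reads stay on the replica.
print(route_read_traffic(1.2))

# Lagging replica: shift every read to the primary.
print(route_read_traffic(120.0))
```

A rule like this only covers the immediate mitigation. Adding a read replica or resizing the instance, as the team did afterwards, addresses the underlying capacity problem.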

Cirrusgo quickly identified and addressed critical operational issues in Knowledge Beam’s application. This enabled them to minimize the immediate impact and improve the future reliability and performance of their customer’s application.

“Amazon DevOps Guru was very beneficial and helped us identify incidents in Amazon RDS. It provided useful insights we previously didn’t have, and it helped reduce our mitigation time. We implemented it in some of the accounts we manage and are taking advantage of it,” says Mohammed Douglas Otaibi, Technical Co-Founder of Cirrusgo.

Conclusion:

This post highlights how Cirrusgo leveraged Amazon DevOps Guru to identify and quickly address anomalous behavior.

Are you looking for a way to improve the monitoring of your Amazon RDS databases? Look no further than Amazon DevOps Guru. With DevOps Guru’s RDS monitoring capabilities, you can gain deep insights into the performance and health of your databases. This includes automatic anomaly detection, proactive recommendations, and alerts for issues that require your attention.

About the authors:

Harish Bannai

Harish Bannai is a Sr. Technical Account Manager at Amazon Web Services. He holds the AWS Solutions Architect Professional, Developer Associate, Analytics Specialty, and Database Specialty certifications. He works with enterprise customers, providing technical assistance on the operational performance of Amazon RDS and AWS Database Migration Service and sharing database best practices.

Adnan Bilwani

Adnan Bilwani is a Senior Specialist at Amazon Web Services. Adnan focuses on improving application quality and availability by leveraging AWS DevOps services and tools.

Lucy Hartung

Lucy Hartung is a Senior Specialist at Amazon Web Services. Lucy focuses on improving application quality and availability by leveraging AWS.