Monitoring AWS ECS Deployment failures

Published at 9/26/2024
Categories: ecs, aws, eventbridge, lambda
Author: nileshprasad137

This post discusses how ECS state change events can be used to monitor deployment failures on ECS. To set the context, I work on a project where we use ECS to deploy containerized applications. Our CircleCI pipeline is responsible for building Docker images, pushing them to AWS ECR, and initiating the ECS deployment using the aws ecs update-service command. Our CircleCI job ends after this command, at which point we considered the deployment successful. However, the deployment isn't truly complete until the new containers are up and running. This left a gap in monitoring, since containers could still fail to start because of failed migrations, incorrect configurations, or resource allocation problems.

Relying solely on the successful execution of aws ecs update-service was misleading, as it didn't account for failures after the deployment was initiated. To address this, we needed to listen to ECS state change events, particularly for failed deployments. These events provide real-time insight into whether containers failed to start, allowing us to catch issues like failed migrations or resource allocation errors and notify the team on Slack for further investigation.

Using EventBridge to Monitor ECS State Changes

Amazon EventBridge is a powerful event bus that can help monitor and respond to events from various AWS services, including ECS deployment state changes. When you deploy containerized applications with ECS, ECS automatically sends the following deployment state change events to EventBridge:

SERVICE_DEPLOYMENT_IN_PROGRESS
The service deployment is in progress. This event is sent for both initial deployments and rollback deployments.

SERVICE_DEPLOYMENT_COMPLETED
The service deployment has completed. This event is sent once a service reaches a steady state after a deployment.

SERVICE_DEPLOYMENT_FAILED
The service deployment has failed. This event is sent for services with deployment circuit breaker logic turned on.

To track failed ECS deployments, we can set up an EventBridge rule that listens for SERVICE_DEPLOYMENT_FAILED events. This captures real-time failure information, allowing us to quickly respond to issues such as failed migrations, configuration errors, or resource limitations. Below is an example of the EventBridge rule used to listen for these failure events. When this rule matches an event, it can trigger AWS Lambda or other services to send alerts to your Slack channel, providing real-time visibility into deployment failures.

{
  "source": ["aws.ecs"],
  "detail-type": ["ECS Deployment State Change"],
  "detail": {
    "eventType": ["ERROR"],
    "eventName": ["SERVICE_DEPLOYMENT_FAILED"]
  }
}
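If you manage the rule from a script instead of the console, the same pattern could be registered with boto3's EventBridge client roughly as follows. This is a minimal sketch, assuming boto3 and configured AWS credentials; the rule name ecs-deployment-failed and the description text are hypothetical placeholders. The client is passed in as a parameter so the logic can be exercised without AWS access.

```python
import json

# Same event pattern as the rule above: only ECS deployment failures match.
ECS_DEPLOYMENT_FAILED_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Deployment State Change"],
    "detail": {
        "eventType": ["ERROR"],
        "eventName": ["SERVICE_DEPLOYMENT_FAILED"],
    },
}


def build_put_rule_kwargs(rule_name: str) -> dict:
    """Build the keyword arguments for EventBridge's PutRule call."""
    return {
        "Name": rule_name,
        # PutRule expects the pattern as a JSON string, not a dict.
        "EventPattern": json.dumps(ECS_DEPLOYMENT_FAILED_PATTERN),
        "State": "ENABLED",
        "Description": "Alert on failed ECS service deployments",  # placeholder text
    }


def create_rule(events_client, rule_name: str = "ecs-deployment-failed") -> str:
    """Create (or update, if the name already exists) the rule; return its ARN.

    `events_client` is expected to be a boto3 EventBridge client,
    e.g. boto3.client("events"). The default rule name is hypothetical.
    """
    response = events_client.put_rule(**build_put_rule_kwargs(rule_name))
    return response["RuleArn"]
```

Usage would be along the lines of rule_arn = create_rule(boto3.client("events")). Since PutRule creates or updates a rule by name, running the script again simply refreshes the pattern.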

Here's an example of a failed deployment event that would trigger this rule. This event indicates that a task failed to start during the ECS deployment, potentially due to issues like incorrect configurations or missing dependencies.

{
  "version": "0",
  "id": "ddca6449-b258-46c0-8653-e0e3aEXAMPLE",
  "detail-type": "ECS Deployment State Change",
  "source": "aws.ecs",
  "account": "111122223333",
  "time": "2020-05-23T12:31:14Z",
  "region": "us-west-2",
  "resources": [
    "arn:aws:ecs:us-west-2:111122223333:service/default/servicetest"
  ],
  "detail": {
    "eventType": "ERROR",
    "eventName": "SERVICE_DEPLOYMENT_FAILED",
    "deploymentId": "ecs-svc/123",
    "updatedAt": "2020-05-23T11:11:11Z",
    "reason": "ECS deployment circuit breaker: task failed to start."
  }
}
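To see why this event matches the rule, it helps to recall how EventBridge compares them: every field named in the pattern must be present in the event with one of the values listed in the pattern's array, nested objects are compared field by field, and event fields the pattern doesn't mention are ignored. The sketch below implements only that exact-match subset (it does not handle prefix, numeric, exists, anything-but or other content filters) and checks it against a trimmed copy of the sample event and a hypothetical SERVICE_DEPLOYMENT_COMPLETED event.

```python
# Simplified EventBridge matcher: exact-value matching only.
# Real EventBridge patterns also support prefix, numeric, exists,
# anything-but and other filters, which this sketch does not handle.

def matches(pattern: dict, event: dict) -> bool:
    for key, expected in pattern.items():
        if key not in event:
            return False  # every field in the pattern must be present
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # Leaf: `expected` lists the allowed values (OR semantics).
            # If the event value is itself an array, any element may match.
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True  # all pattern fields matched (AND semantics)


rule_pattern = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Deployment State Change"],
    "detail": {"eventType": ["ERROR"], "eventName": ["SERVICE_DEPLOYMENT_FAILED"]},
}

# Trimmed copy of the sample failure event above.
failed_event = {
    "detail-type": "ECS Deployment State Change",
    "source": "aws.ecs",
    "region": "us-west-2",
    "detail": {
        "eventType": "ERROR",
        "eventName": "SERVICE_DEPLOYMENT_FAILED",
        "deploymentId": "ecs-svc/123",
        "reason": "ECS deployment circuit breaker: task failed to start.",
    },
}

# Hypothetical successful deployment event, which the rule should ignore.
completed_event = {
    "detail-type": "ECS Deployment State Change",
    "source": "aws.ecs",
    "detail": {"eventType": "INFO", "eventName": "SERVICE_DEPLOYMENT_COMPLETED"},
}

print(matches(rule_pattern, failed_event))     # True
print(matches(rule_pattern, completed_event))  # False
```

Extra fields such as region, deploymentId and reason don't affect the match; they simply travel along to the target, which is what makes them available to the Lambda function.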

At this point, the EventBridge rule captures failed deployment events. Next, we need to set a target: the destination to which EventBridge sends every event that matches the rule's pattern. In our case, the target is an AWS Lambda function that sends Slack alerts through our configured incoming webhook. To read more about setting up a Slack incoming webhook, see Slack's documentation on incoming webhooks.
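Attaching the Lambda function as the rule's target involves two steps: granting EventBridge permission to invoke the function, and registering the function as a target of the rule. The sketch below assumes boto3 clients for EventBridge and Lambda are passed in; the target Id, the statement ID scheme and any ARNs you pass are hypothetical placeholders.

```python
def add_lambda_target(events_client, lambda_client, rule_name: str,
                      rule_arn: str, function_arn: str) -> None:
    """Point an EventBridge rule at a Lambda function.

    Expects boto3 clients, e.g. boto3.client("events") and
    boto3.client("lambda"). All names and ARNs are caller-supplied placeholders.
    """
    # 1. Allow EventBridge (events.amazonaws.com) to invoke the function,
    #    but only for events delivered by this specific rule.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",  # hypothetical ID; must be unique per function
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    # 2. Register the function as a target of the rule.
    response = events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "slack-alert-lambda", "Arn": function_arn}],  # hypothetical Id
    )
    # PutTargets reports per-target failures in the response instead of raising.
    if response.get("FailedEntryCount", 0):
        raise RuntimeError(f"Failed to add target: {response['FailedEntries']}")
```

Running this twice with the same StatementId makes add_permission fail with a ResourceConflictException, so a real setup script would either catch that error or manage the permission through infrastructure-as-code.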

Setting Up AWS Lambda to Post Deployment Failure Alerts to Slack

This Lambda function parses the event, determines whether the failure occurred in staging or production, generates a direct URL to the affected ECS service, and sends the failure details to a specified Slack channel. The full code for this Lambda function is available as a GitHub Gist.
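A handler following the steps described above might look roughly like the sketch below. It is an illustration, not the code from the Gist: the SLACK_WEBHOOK_URL environment variable name, the convention that a cluster name containing "prod" means production, and the ECS console URL layout are all assumptions to adapt to your own setup. It uses only the Python standard library, so it needs no extra dependencies in the Lambda package.

```python
import json
import os
import urllib.request


def parse_service_arn(arn: str) -> tuple:
    """Split an ECS service ARN into (region, cluster, service).

    Assumes the long ARN format:
    arn:aws:ecs:<region>:<account>:service/<cluster>/<service>
    """
    parts = arn.split(":")
    region = parts[3]
    _, cluster, service = parts[5].split("/")
    return region, cluster, service


def build_slack_message(event: dict) -> dict:
    """Turn an ECS Deployment State Change event into a Slack webhook payload."""
    region, cluster, service = parse_service_arn(event["resources"][0])
    # Hypothetical convention: production clusters have "prod" in their name.
    environment = "production" if "prod" in cluster.lower() else "staging"
    # Assumed URL layout of the ECS console; verify it against your console.
    console_url = (
        f"https://{region}.console.aws.amazon.com/ecs/v2/clusters/"
        f"{cluster}/services/{service}?region={region}"
    )
    detail = event["detail"]
    text = (
        f":rotating_light: ECS deployment failed in *{environment}*\n"
        f"Service: <{console_url}|{cluster}/{service}>\n"
        f"Deployment: {detail.get('deploymentId', 'unknown')}\n"
        f"Reason: {detail.get('reason', 'not provided')}"
    )
    return {"text": text}


def lambda_handler(event, context):
    """Entry point invoked by the EventBridge rule."""
    payload = json.dumps(build_slack_message(event)).encode("utf-8")
    request = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],  # hypothetical environment variable
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return {"statusCode": response.status}
```

Keeping the message construction in build_slack_message, separate from the HTTP call, means the formatting can be tested locally with the sample event from earlier in this post without posting anything to Slack.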

Conclusion and further reading

Monitoring ECS deployments is crucial to ensure that your applications are running smoothly. By using Amazon EventBridge to capture ECS state change events and integrating AWS Lambda with Slack, you can receive real-time notifications whenever a deployment fails.

For further reading, check out the following resources to deepen your understanding of ECS deployment events and the deployment circuit breaker:

These documents will help you gain a more in-depth understanding of the ECS deployment lifecycle and of the circuit breaker feature, which can automatically roll back failed deployments.
