ECS Orchestration Part 4: Monitoring

Published at: 11/13/2024
Categories: aws, ecs, monitoring, metrics
Author: dbanieles

This post is about monitoring an ECS cluster; if you want to learn more about container orchestration with ECS, see Part 1, Part 2, and Part 3. Monitoring an Amazon ECS (Elastic Container Service) cluster is essential for tracking the resource utilization, performance, and health of your containerized applications. In ECS, monitoring focuses on aspects such as CPU and memory utilization, task and container statuses, and network traffic. Amazon CloudWatch is commonly used to monitor ECS clusters, providing metrics, logs, and alarms for observability.

Key ECS Monitoring Components:

  1. Container Insights: A CloudWatch feature that provides granular, task- and service-level metrics and analysis of ECS performance.
  2. CloudWatch Logs: Captures logs from ECS tasks and containers, essential for debugging and tracking application behavior.
  3. CloudWatch Metrics: These are built-in metrics for CPU, memory, and other resources.
  4. CloudWatch Alarms: Alerts based on metrics, allowing proactive responses to scaling or failures.

Setting Up Monitoring for ECS Using Terraform
Now let us see how to configure monitoring for an ECS cluster using Terraform.

Note:
AWS EC2 and AWS Auto Scaling do not natively report memory metrics (such as Memory Utilization); only basic CloudWatch metrics like CPU Utilization and Network In/Out are included. To collect memory metrics, you'll need to install and configure the CloudWatch Agent on your EC2 instances. If you're using an Amazon Machine Image (AMI) that doesn't have the agent pre-installed, you can add it via a user data script in your Auto Scaling Group.

#!/bin/bash
# Amazon Linux 2: the agent is available in the default yum repositories
sudo yum install -y amazon-cloudwatch-agent

# Ubuntu: the agent is not in the default apt repositories, so download
# the .deb package that AWS publishes and install it with dpkg
sudo apt-get update
wget https://amazoncloudwatch-agent.s3.amazonaws.com/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
sudo dpkg -i ./amazon-cloudwatch-agent.deb

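If you manage the container instances with Terraform, the same script can be passed in as user data. The block below is a minimal sketch only, assuming an aws_launch_template resource and an ECS-optimized Amazon Linux 2 AMI; the AMI ID and the SSM parameter holding the agent configuration are placeholders:

resource "aws_launch_template" "ecs_nodes" {
  name_prefix   = "ecs-nodes-"
  image_id      = "ami-xxxxxxxxxxxxxxxxx" # placeholder: ECS-optimized AMI for your region
  instance_type = "t3.medium"

  # Install the CloudWatch Agent on boot and start it with a configuration
  # stored in SSM Parameter Store (the parameter name is a placeholder)
  user_data = base64encode(<<-EOF
    #!/bin/bash
    yum install -y amazon-cloudwatch-agent
    /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
      -a fetch-config -m ec2 \
      -c ssm:AmazonCloudWatch-ecs-config -s
  EOF
  )
}

The launch template would then be referenced by the Auto Scaling Group that provides capacity for the ECS cluster.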

1. Enable ECS Container Insights in Terraform

Container Insights in ECS provides metrics such as memory and CPU utilization at both the cluster and service levels. You can enable Container Insights directly when creating the ECS cluster in Terraform.

resource "aws_ecs_cluster" "ecs_cluste" {
  name = "my-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

Once enabled, you can view memory usage per container/task and set CloudWatch Alarms based on Container Insights metrics. This can provide insights into container resource usage and help set thresholds for scaling policies.
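
As a sketch of what that can look like (assuming Container Insights is enabled as above and a service named "my-service", which is a placeholder), an alarm on the ECS/ContainerInsights RunningTaskCount metric can flag a service whose tasks have stopped:

resource "aws_cloudwatch_metric_alarm" "running_tasks_alarm" {
  alarm_name          = "low_running_task_count"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 2
  metric_name         = "RunningTaskCount"
  namespace           = "ECS/ContainerInsights"
  period              = 60
  statistic           = "Average"
  threshold           = 1
  alarm_description   = "Triggered when the service has no running tasks"

  dimensions = {
    ClusterName = aws_ecs_cluster.ecs_cluster.name
    ServiceName = "my-service" # placeholder service name
  }
}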

2. Configure CloudWatch Logs for ECS Tasks

To capture logs from ECS tasks, create a CloudWatch log group to which each container writes its logs, then configure the ECS task definition to send logs to that group.

resource "aws_cloudwatch_log_group" "ecs_task_logs" {
  name              = "/ecs/my-task"
  retention_in_days = 7
}

resource "aws_ecs_task_definition" "task_definition" {
  family                   = "my-task"
  network_mode             = "awsvpc"
  container_definitions    = jsonencode([
    {
      name      = "app-container",
      image     = "nginx:latest",
      cpu       = 256,
      memory    = 512,
      essential = true,
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.ecs_task_logs.name
          "awslogs-region"        = "eu-west-1"
          "awslogs-stream-prefix" = "ecs"
        }
      }
    }
  ])
}


This setup creates a log group and configures each ECS task container to send logs to CloudWatch. The log retention period is set to 7 days.
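
To make log searches repeatable, you can also save a CloudWatch Logs Insights query alongside the log group. This is a sketch assuming the AWS provider's aws_cloudwatch_query_definition resource; the query itself is just an example error filter:

resource "aws_cloudwatch_query_definition" "ecs_errors" {
  name            = "ecs/my-task-errors"
  log_group_names = [aws_cloudwatch_log_group.ecs_task_logs.name]

  # Show the 50 most recent log lines mentioning an error or exception
  query_string = <<-EOF
    fields @timestamp, @message
    | filter @message like /(?i)(error|exception)/
    | sort @timestamp desc
    | limit 50
  EOF
}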

3. Create CloudWatch Alarms for ECS Metrics

You can configure CloudWatch alarms on key ECS metrics to trigger notifications or actions based on thresholds. For example, you might set up alarms for high CPU or memory usage in your ECS service.

resource "aws_cloudwatch_metric_alarm" "cpu_alarm" {
  alarm_name          = "high_cpu_alarm"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "Triggered when CPU utilization exceeds 80%"

  dimensions = {
    ClusterName = aws_ecs_cluster.ecs_cluster.name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}

resource "aws_cloudwatch_metric_alarm" "memory_alarm" {
  alarm_name          = "high_memory_alarm"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "MemoryUtilization"
  namespace           = "AWS/ECS"
  period              = 60
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "Triggered when CPU utilization exceeds 80%"

  dimensions = {
    ClusterName = aws_ecs_cluster.ecs_cluster.name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}

resource "aws_sns_topic" "alerts" {
  name = "ecs_alerts"
}

resource "aws_sns_topic_subscription" "alert_subscription" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "[email protected]"
}



In this example, the CloudWatch alarms monitor CPU and memory utilization on the ECS cluster and trigger if usage stays above 80% for two consecutive 60-second periods. Each alarm sends a notification to an SNS topic configured to deliver email alerts.
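
The same alarms can also drive scaling rather than just notifications. The sketch below assumes an ECS service named "my-service" (not defined in this post) registered as an Application Auto Scaling target; adding the policy ARN to the CPU alarm's alarm_actions would add one task each time the alarm fires:

resource "aws_appautoscaling_target" "ecs_service" {
  min_capacity       = 1
  max_capacity       = 4
  resource_id        = "service/${aws_ecs_cluster.ecs_cluster.name}/my-service" # placeholder service
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "scale_out" {
  name               = "ecs-cpu-scale-out"
  policy_type        = "StepScaling"
  resource_id        = aws_appautoscaling_target.ecs_service.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_service.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_service.service_namespace

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    cooldown                = 60
    metric_aggregation_type = "Average"

    # Add one task whenever the metric breaches the alarm threshold
    step_adjustment {
      metric_interval_lower_bound = 0
      scaling_adjustment          = 1
    }
  }
}

# e.g. alarm_actions = [aws_sns_topic.alerts.arn, aws_appautoscaling_policy.scale_out.arn]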

4. Set Up Detailed ECS Monitoring with CloudWatch Dashboards

You can use CloudWatch Dashboards to visualize metrics for ECS services and clusters. With Terraform, you can define custom dashboards that show CPU and memory metrics for quick, real-time monitoring.

resource "aws_cloudwatch_dashboard" "ecs_dashboard" {
  dashboard_name = "ECS-Dashboard"
  dashboard_body = jsonencode({
    widgets = [
      {
        type = "metric",
        x    = 0,
        y    = 0,
        width = 6,
        height = 6,
        properties = {
          metrics = [
            ["AWS/ECS", "CPUUtilization", "ClusterName", aws_ecs_cluster.my_cluster.name],
            ["AWS/ECS", "MemoryUtilization", "ClusterName", aws_ecs_cluster.my_cluster.name]
          ]
          title = "ECS Cluster CPU and Memory Utilization"
          view = "timeSeries"
          stacked = false
          region = "us-west-2"
          period = 300
          stat = "Average"
        }
      }
    ]
  })
}


This dashboard contains a widget showing CPU and memory utilization for the ECS cluster. You can customize the dashboard to display metrics for specific services, tasks, or additional resources in your ECS cluster.
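
If you also want per-service lines in that widget, the metrics array accepts additional ServiceName dimensions. A small sketch, with "my-service" as a placeholder, that could replace the metrics list above:

locals {
  ecs_dashboard_metrics = [
    ["AWS/ECS", "CPUUtilization", "ClusterName", aws_ecs_cluster.ecs_cluster.name],
    ["AWS/ECS", "CPUUtilization", "ClusterName", aws_ecs_cluster.ecs_cluster.name, "ServiceName", "my-service"],
    ["AWS/ECS", "MemoryUtilization", "ClusterName", aws_ecs_cluster.ecs_cluster.name, "ServiceName", "my-service"]
  ]
}

Referencing local.ecs_dashboard_metrics as the widget's metrics property keeps the dashboard definition readable as more services are added.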

Summary

  1. Enable Container Insights to get granular metrics on your ECS cluster and services.
  2. Set Up CloudWatch Logs to capture ECS task logs and make debugging easier.
  3. Create CloudWatch Alarms for proactive alerts on resource usage, task health, and other custom metrics.
  4. Use CloudWatch Dashboards for real-time visual monitoring of ECS cluster and service performance.

By setting up these components with Terraform, you achieve consistent and automated monitoring, giving you insight into the performance and health of your ECS cluster and services. This configuration is especially useful in production environments where proactive monitoring is essential for maintaining application uptime and resource efficiency.
