Cost optimisation on AWS: Navigating NAT Charges with Private ECS Tasks on Fargate

Published at
11/16/2023
Categories
aws
ecs
natgateway
costoptimization
Author
chayanikaa

While working on a new project recently, I deployed ECS Fargate containers in private subnets, with ingress allowed only through an Application Load Balancer. We chose this configuration primarily for security and firewall configuration reasons. Cost optimization was also an important consideration for this architecture.

The containers also needed egress access to other (non-AWS) services, which is provided through a NAT Gateway.

Architecture

Note: Some parts of the architecture (like the database) are omitted from this post in order to focus on the necessary components.

With this configuration alone, the images would be fetched from ECR (and S3) through the NAT Gateway, which presents the following challenges:

  1. Cost Implications of NAT Gateway Usage: The NAT gateway accrues a per-GB data processing fee in addition to an hourly charge. For instance, in the us-east-1 region at the time of writing, the data processing fee is $0.045 per GB. At first glance, this might seem negligible. But consider this: if your container images are around 400MB, deploying just three containers pulls more than 1GB through the NAT gateway. This can quickly add up, leading to unexpectedly high charges. Instances of such unexpected expenses have been reported (source: Excellent Dev.to article). Furthermore, repeated deployments due to failures can exacerbate this cost, as the image is pulled again on every attempt.

  2. Security Concerns with Data Transit: While this post focuses primarily on cost, it's worth noting that routing traffic over the public internet can pose security risks. For a deeper dive into this aspect, refer to AWS's documentation on VPC Endpoints and ECR.

The Networking Behind Docker Image Retrieval in Private Subnets

ECS interacts with three AWS services behind the scenes when pulling Docker images:

  • ECR DKR: Utilized for Docker Registry APIs. Docker client commands like push and pull engage with this endpoint.
  • ECR API: This endpoint handles calls to the Amazon ECR API, facilitating actions like DescribeImages and CreateRepository.
  • S3: ECR stores the actual layers of Docker images in AWS-managed S3 buckets, whose ARNs typically follow the pattern arn:aws:s3:::prod-<region>-starport-layer-bucket.

ECS also needs access to other services, like ECS telemetry and CloudWatch, but these are not directly involved in the Docker image pull.

Understanding and Mitigating NAT Gateway Traffic

In this section, we'll explore different strategies to minimise NAT gateway traffic and, consequently, its associated costs.

The experiment deploys one container instance in every scenario. With each step, we add the VPC endpoint(s) mentioned in the scenario to evaluate the difference.

The infrastructure is created using Terraform and can be found in this git repository. The project uses community-maintained AWS Terraform modules, which simplify this process. The code examples that follow use the vpc-endpoints module to create the gateway and interface endpoints.

In addition, I created a custom CloudWatch dashboard with a widget showing the sum of BytesOutToSource (the number of bytes sent through the NAT gateway to the clients in your VPC) and BytesOutToDestination (the number of bytes sent out through the NAT gateway to the destination) as an indication of the data processed by the NAT Gateway.

The Docker image used in this scenario is a very simple Node.js image with a size of ~403MB.

That's enough about the setup. Let's dive into the scenarios and results.
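A dashboard widget like the one described could be sketched in Terraform roughly as follows. This is not the author's dashboard definition: the variable names, dashboard name, widget layout, and period are assumptions made for illustration. It uses CloudWatch metric math to add the two AWS/NATGateway metrics named above.

```hcl
# Hypothetical inputs: the NAT gateway to monitor and the region it lives in.
variable "nat_gateway_id" {
  type        = string
  description = "ID of the NAT gateway, e.g. taken from the VPC module outputs"
}

variable "region" {
  type    = string
  default = "us-east-1"
}

resource "aws_cloudwatch_dashboard" "nat_traffic" {
  dashboard_name = "nat-gateway-traffic" # hypothetical name

  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "NAT Gateway total bytes out"
          region = var.region
          view   = "timeSeries"
          stat   = "Sum"
          period = 300 # assumption: 5-minute buckets
          metrics = [
            # The two raw metrics, hidden so that only their sum is plotted
            ["AWS/NATGateway", "BytesOutToSource", "NatGatewayId", var.nat_gateway_id, { id = "m1", visible = false }],
            ["AWS/NATGateway", "BytesOutToDestination", "NatGatewayId", var.nat_gateway_id, { id = "m2", visible = false }],
            # Metric math expression adding both directions
            [{ expression = "m1 + m2", label = "Total bytes out", id = "e1" }],
          ]
        }
      }
    ]
  })
}
```

Hiding the raw metrics keeps the graph to a single line, which makes it easier to compare one deployment against the next.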

1. Only NAT Gateway, no VPC endpoints

As we see in the Total Bytes Out below, all the data (~414MB) for pulling the Docker image flows through the NAT Gateway.

Data processed, only NAT Gateway

2. NAT Gateway + S3 Gateway endpoint

Now let's add an S3 Gateway endpoint to the VPC. Gateway endpoints have no cost associated with them. These are offered for S3 and DynamoDB by AWS.

In this case, we add the S3 endpoint using the vpc-endpoints module:

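s3 = {
  service             = "s3"
  private_dns_enabled = true # private DNS only applies to Interface endpoints
  service_type        = "Gateway"
  tags                = { Name = "S3 Gateway Endpoint" }
  policy              = data.aws_iam_policy_document.s3_endpoint_policy.json # restricts access to the ECR layer bucket
  route_table_ids     = module.vpc.private_route_table_ids # adds S3 routes to the private subnets' route tables
},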

And the corresponding endpoint policy:

data "aws_iam_policy_document" "s3_endpoint_policy" {
  statement {
    effect    = "Allow"
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::prod-${local.region}-starport-layer-bucket/*"] # to access the layer files

    principals {
      type        = "*"
      identifiers = ["*"]
    }
  }
}

Note that an S3 gateway endpoint must be created in the same region as the S3 bucket it is meant to serve.

Data Processed, NAT Gateway + S3 endpoint

As we see here, the data processed by the NAT Gateway drops drastically (to ~245KB), confirming that our image layers are now largely being transferred through the S3 gateway endpoint.

Note: If your containers have existing connections to Amazon S3, their connections might be briefly interrupted when you add the Amazon S3 gateway endpoint. Source

3. NAT Gateway + S3 Gateway endpoint + ECR DKR interface endpoint

In the next step, we add an ECR DKR interface endpoint.

ecr_dkr = {
  service             = "ecr.dkr"
  private_dns_enabled = true
  tags                = { Name = "ECR DKR Interface Endpoint" }
  subnet_ids          = [module.vpc.private_subnets[0]] # Interface endpoints are priced per AZ
  policy              = data.aws_iam_policy_document.generic_endpoint_policy.json
},

See the demo project for details on the endpoint policy.
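An interface endpoint is reached through network interfaces in your subnets, so its security group has to allow inbound HTTPS (port 443) from the tasks that use it. The following is a minimal sketch of such a group, not the demo project's configuration: the variable names and the choice to allow the whole VPC CIDR are assumptions for illustration.

```hcl
# Hypothetical inputs; in practice these would likely come from the VPC module.
variable "vpc_id" {
  type = string
}

variable "vpc_cidr_block" {
  type = string
}

resource "aws_security_group" "vpc_endpoints" {
  name_prefix = "vpc-endpoints-" # hypothetical name prefix
  description = "Allow HTTPS from inside the VPC to interface endpoints"
  vpc_id      = var.vpc_id

  # ECR API and Docker registry calls are made over HTTPS
  ingress {
    description = "HTTPS from the VPC"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr_block]
  }
}
```

The group's ID would then be attached to the interface endpoints. Check the vpc-endpoints module documentation for the exact input it expects. A tighter variant would allow only the security group of the ECS tasks instead of the whole VPC CIDR.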

Note that interface endpoints also incur hourly and data processing fees, but these tend to be lower than NAT gateway charges. Depending on the amount of data processed by the NAT gateway for a particular service, it might make sense to include these for cost optimization reasons.
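To get a feel for when an interface endpoint pays for itself, the trade-off can be written as a small break-even calculation. The sketch below makes several assumptions. The interface endpoint prices ($0.01 per AZ-hour and $0.01 per GB) are assumed us-east-1 list prices and should be checked against the current pricing page. The NAT gateway is assumed to stay in place either way, so its hourly charge is not part of the comparison. A month is taken as 730 hours.

```hcl
locals {
  nat_per_gb           = 0.045 # NAT gateway data processing, as quoted in this post
  endpoint_per_gb      = 0.01  # assumption: interface endpoint data processing price
  endpoint_per_az_hour = 0.01  # assumption: interface endpoint hourly price per AZ
  hours_per_month      = 730
  endpoint_az_count    = 1 # one subnet/AZ per endpoint, as in the snippets in this post

  # Fixed monthly cost of keeping one interface endpoint running
  endpoint_fixed_monthly = local.endpoint_per_az_hour * local.hours_per_month * local.endpoint_az_count

  # Monthly traffic (GB) to a service above which the endpoint is cheaper
  # than sending the same traffic through the NAT gateway
  break_even_gb = local.endpoint_fixed_monthly / (local.nat_per_gb - local.endpoint_per_gb)
}

output "interface_endpoint_break_even_gb_per_month" {
  value = local.break_even_gb
}
```

Under these assumed prices the break-even point is roughly 209GB per month per endpoint. Below that volume, an interface endpoint is justified by security considerations rather than by cost.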

Data Processed, NAT Gateway + S3 Gateway endpoint + ECR DKR interface endpoint

In this instance, the NAT Gateway traffic for a single deployment dropped further to ~33KB.

4. NAT Gateway + S3 Gateway endpoint + ECR DKR and API interface endpoints

Adding the ECR API endpoint:

ecr_api = {
  service             = "ecr.api"
  private_dns_enabled = true
  tags                = { Name = "ECR API Interface Endpoint" }
  subnet_ids          = [module.vpc.private_subnets[0]] # Interface endpoints are priced per AZ
  policy              = data.aws_iam_policy_document.generic_endpoint_policy.json
},

Data processed, NAT Gateway + S3 Gateway endpoint + ECR DKR and API interface endpoint

Comparing the scenarios

The results needed to be plotted on a logarithmic scale for visibility. As we see below, the S3 Gateway endpoint has the biggest impact on the data processed by the NAT gateway.

NAT Gateway bytes processed comparison graph

The cost impact

Considering a scenario similar to the one in the Dev.to article referenced earlier, how much impact could the S3 gateway endpoint have made?

The article mentions that their NAT Gateway processed 16TB of data with a 500MB Docker image, which corresponds to approximately 32,000 deployments (16TB / 500MB). This was due in part to a failing health check, which can happen in real-world scenarios.

Let's simulate the same scenario with our docker image, which is 403MB.

Without the S3 Endpoint, the NAT Gateway processes ~414MB.
With an S3 Gateway endpoint, the NAT Gateway processes ~0.245MB.

If there were 32,000 deployments with the image in our example:

1. Without the S3 Gateway endpoint
Data processed: 414MB * 32,000 = 13,248,000MB = 13,248GB
Cost ($0.045/GB): $596.16

2. With the S3 Gateway endpoint
Data processed: 0.245MB * 32,000 = 7,840MB = 7.84GB
Cost ($0.045/GB): $0.3528
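The same arithmetic can be kept next to the infrastructure code as a small Terraform sketch, so it is easy to rerun with a different image size or deployment count. The local and output names are hypothetical. The figures are the ones measured and quoted in this post, with 1GB taken as 1000MB as in the calculation above.

```hcl
locals {
  deployments      = 32000
  nat_price_per_gb = 0.045 # us-east-1 data processing rate quoted in this post

  # MB processed by the NAT gateway per deployment, as measured above
  mb_per_deployment = {
    nat_only        = 414
    with_s3_gateway = 0.245
  }

  # Data processing cost per scenario in USD: MB -> GB -> dollars
  nat_cost_usd = {
    for scenario, mb in local.mb_per_deployment :
    scenario => mb * local.deployments / 1000 * local.nat_price_per_gb
  }
}

output "nat_cost_usd" {
  value = local.nat_cost_usd
}
```

This needs no provider or cloud resources, so running terraform apply in a directory containing only this file simply prints both figures.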

This could of course be mitigated further with VPC interface endpoints, but since they come with their own costs, it would be worth analysing based on requirements for a specific setup.

Wrapping up

Looking at the data processed by the NAT gateway in different scenarios, I think it's fair to say:

  • Definitely consider creating an S3 gateway endpoint, since these are available at no additional cost and drastically reduce the data processed by the NAT Gateway for this and other scenarios.
  • Depending on the number of deployments and security aspects of your architecture, consider using VPC interface endpoints.

If there are questions or feedback, please feel free to reach out!
