Logo

dev-resources.site

for different kinds of informations.

Saving 90% of our AWS Cost using ECR Lifecycle Rules

Published at
3/16/2024
Categories
aws
savings
ecr
costoptimization
Author
prithvijj
Categories
4 categories in total
aws
open
savings
open
ecr
open
costoptimization
open
Author
9 person written this
prithvijj
open
Saving 90% of our AWS Cost using ECR Lifecycle Rules

Context

For fun, I logged into an AWS account that is used for development purposes. Due to it's purpose, there was a high chance that resources may have been dangling around and not being cleaned up.

Exploring in the AWS Costs Explorer, I took a quick look at the different services, and I came across the ECR Service, which was around $16 per day (around $480 per month). Switching to Dimension: Usage Type seems like most of the cost was for an attribute called TimedStorage-BytesHrs.

I opened random ECR Private Repos that were manually setup before through a bash script. I found one in particular where the ECR Repo had around 9000+ Images, and most of it was prefixed with tag named as dev-*. Just scrolling through each image size was around 600 MB, and existed as far back as 2021. Oh boy, I had to find out what it costed us.

Analysis

I had to get the size of the Repository, and hence used the AWS CLI command to retrieve the information using AWS CloudShell

aws ecr describe-images --repository-name 'repo-a' --query 'imageDetails[*].imageSizeInBytes' | jq '.[]' | awk '{sum+=0} END{print sum}'
Enter fullscreen mode Exit fullscreen mode

Copied over the Bytes Value, and converting into GB

4,236,629,104,429 Bytes/1024 = 4,137,333,109.79 KB
4,137,333,109.79 KB/1024 = 4,040,364.36 MB
4,040,364.36 MB/1024 = 3,945.66 GB
Enter fullscreen mode Exit fullscreen mode

Yeah that's a lotta GBs. Taking a quick look at ECR Pricing, it's $0.10 per GB / month, so doing some quick maths

3,945.66 GB * $0.1 GB per month = $ 394 per month potentially
Enter fullscreen mode Exit fullscreen mode

This was just for a single ECR Private Repository, which seemed super weird that we were paying this for a long while. Oh man, if this is for 1 ECR Repository, what about the others?

Quick ChatGPT queries later, had got the AWS CLI commands to call

 #!/bin/bash

 repositories=$(aws ecr describe-repositories --query 'repositories[*].repositoryName' --output text)

 for repo in $repositories; do
         size=$(aws ecr describe-images --repository-name $repo --query 'imageDetails[*].imageSizeInBytes' | jq 'if length == 0 then 0 else .[] end' | awk '{sum+=$0} END{print sum}')
         leng=$(aws ecr list-images --repository-name $repo | jq '.imageIds | unique_by(.imageDigest) | length')
         echo "Repository: $repo, num-of-images: $leng, Total Size: $size Bytes"
 done
Enter fullscreen mode Exit fullscreen mode

Called this script within AWS CloudShell and piped the output

bash cool.sh | tee cool.txt
Enter fullscreen mode Exit fullscreen mode

From here I wanted to get a list of ECR Private Repos sorted by Size so I used the following

 awk '{sum=$7/1024/1024/1024} {cost=sum*0.1} {printf("%s\tnum-of-images: %s\t%s GB\t\$%s\n",$2,$4,sum,cost)}' cool.txt | column -t | sort -n -k4 > sorted-cool.txt
Enter fullscreen mode Exit fullscreen mode

I was able to get a list of all the ECR Private Repositories, along with the Total size of the given repository, and the potential cost for it

# Rough Numbers

...
repo-d, num-of-images:  3800,  890     GB  $89.0
repo-c, num-of-images:  3100,  1000    GB  $100.0
repo-b, num-of-images:  5000,  1600    GB  $160.0
repo-a, num-of-images:  9800,  3900    GB  $390.0
Enter fullscreen mode Exit fullscreen mode

Where to go from here?

I started with providing visibility, by making a post on the Slack Channel to the Team that owned those ECR Private Repositories and providing them information about what I had found. I also asked for their permission to act on cleaning out those unused resources. This eventually led to a 1 hour meeting where we identified stale images and ECR Private Repos that should definitely be deleted.

Next I gathered screenshots of all the ECR Privates Repos we would intend to purge, to get "evidence" of a before and after comparison.

After all the "evidences" were validated, sent the following message in the Slack Thread.

I intend to start this procedure by 12:45PM
Enter fullscreen mode Exit fullscreen mode

[ℹ️ Turn the Ship Around! is a nice book, learned about I intend to before doing something.]

I went into the 9 ECR Private Repos and manually added the following Lifecycle Rules

1 Untagged    expire | sinceImagePushed(7 days) | untagged
2 dev-*       expire | sinceImagePushed(14 days) | tagged prefix [dev-]
Enter fullscreen mode Exit fullscreen mode

This essential means:

  • Delete any untagged images that are 7 days old
  • Delete any tagged images with prefix dev- that are 14 days old

After adding it in, we waited for a couple of days for the Lifecycle Rules to execute.

Results

We saw some great results

# Before:
# Rough Numbers

...
repo-d, num-of-images:  3800,  890     GB  $89.0
repo-c, num-of-images:  3100,  1000    GB  $100.0
repo-b, num-of-images:  5000,  1600    GB  $160.0
repo-a, num-of-images:  9800,  3900    GB  $390.0
Enter fullscreen mode Exit fullscreen mode
# After
# Rough Numbers

...
repo-d, num-of-images:  420,  90     GB  $9.0
repo-c, num-of-images:  360,  120    GB  $12
repo-b, num-of-images:  480,  150    GB  $15
repo-a, num-of-images:  670,  330    GB  $33
Enter fullscreen mode Exit fullscreen mode

🤑 🎉 Looking at the ECR Costs in Cost Explorer, we saw that the Storage costs of the image in the ECR Private Repos (Attribute: TimedStorage-ByteHrs) dropped to around $1.48 per day ($44.4 per month) compared to the $16 per day (around $480 per month) which was ~90% savings on our AWS bills!

It was good to share these results with the Team.
One thing that I'd look forward to how to is how to implement this at scale. Since that's a lotta savings

Things that make you go hmmmm

Q) How did this flow under the radar?

Ans) I think that it was a mix of :

  • Assumptions that there were checks and balances (like the ECR Lifecycle Rules) were already setup
  • Changing the Docker Tags that were being used across the 4 years
  • No observability in terms of size of the ECR Repository
  • Change in Priorities and Teams might have caused this to flow under the Radar

Q) You mentioned manually adding, Why not automate it using a script?

Ans) I think that it was best to start off with manually adding in the rules to help remove most of the costs, and then acting on how this can be scaled out for all repositories.

Q) You mentioned the ECR Repos manually setup using a bash script? Why not IaC?

Unluckly, when the project initially began, there was a bash script that was provided and run manually to help setup the ECR Private Repos. There is clearly room for improvement in terms of converting it all into IaC that can be managed

References

Featured ones: