Logo

dev-resources.site

for different kinds of informations.

AWS ECS: Breaking out of the observability vendor lock-in with SigNoz

Published at
9/18/2023
Categories
aws
signoz
awsecs
observability
Author
wednesdaysol
Categories
4 categories in total
aws
open
signoz
open
awsecs
open
observability
open
Author
12 person written this
wednesdaysol
open
AWS ECS: Breaking out of the observability vendor lock-in with SigNoz

Image description

In the not-too-distant past, the debate was between on-prem and cloud-native. You’re now faced with the choice of choosing between the different cloud infrastructure providers, and inevitably, someone will throw in the phrase “vendor lock-in”. Not having a response for the famed “vendor-lockin” sometimes leads to building things that are much more complex than required basis the stage that the product is in. In this one, I’d like to talk to you about how you can break out of the cursed vendor-lockin while using AWS ECS to manage your workloads.

AWS ECS is a managed service container orchestration service from AWS. It’s easy to set up, manage, and do the job. You don’t need to configure a bunch of pods, services, deployments, daemon sets, yada yada yada to run an app that responds with “hello world”. Sometimes a few clicks are all you need, and that's just what ECS does. Coming back to vendor lock-ins, through this article I’d like to take you through the architecture and the process through which we leverage the container orchestration capabilities of ECS without depending on AWS for logging, distributed tracing, metrics, alerts, and visualizations. You’re damn right, Christmas came early for you!

Image description

Observability with AWS ECS

If you also use AWS copilot to deploy applications on AWS ECS Fargate, the complete process of deploying your application becomes a walk in the park. Take a look at this article for a quick walk-through. To know how to configure canary deployments for ECS workloads you can go through this article right here. You got that right, anything ECS and Wednesday’s got your back. Depending only on logs to debug issues, identify bottlenecks, and pinpoint potential points of failure is a thing of the past. Having the right tools to monitor the health, behavior, and performance of your application is essential - conventionally with ECS this is done using AWS CloudWatch Logs, Metrics, Alarms & AWS X-Ray. These tools are amazing to get off the ground and get started. However, they are limited in functionality and are increasingly expensive as traffic increases and you need to scale your operations.

Fret not, I don’t come to instill feat, I come bearing gifts.

Image description

In these dark and perilous times, SigNoz decided to come in and save the day. SigNoz is an open-source APM(Application Performance Monitoring) tool. It helps developers monitor their applications & troubleshoot problems. It is open source and is a one-stop solution for developers to view their application’s logs, traces, and metrics.

I wish I was paid for this (nudge-nudge SigNoz). Here is why I like SigNoz over the AWS Services for observability

  1. SigNoz shows you the traces, metrics, and logs of your application in one place. This makes it much easier for you to troubleshoot issues, without switching between different services.
  2. SigNoz is much cheaper than other paid APM alternatives like data dog and is also cheaper than using AWS cloudwatch and AWS x-ray. Here is an article that compares the pricing between datadog and SigNoz.
  3. SigNoz is open-source, so you can host it on your own infrastructure and you can monitor your data without it ever leaving your servers. If you want you can even host your own clickhouse cluster to store your data but deploy your services on AWS.
  4. SigNoz uses open telemetry to collect data so it is language-agnostic and can be used with any stack of your choice.
  5. SigNoz also uses clickhouse, an OLAP database that has significant performance improvement and consumes fewer resources than other log storage solutions. Here is an article about why Cloudflare migrated from elastic search to clickhouse.

You’re not wrong to point out that they are the new kid on the block, however, they are battle-tested, used by companies like Outplay, Wombo, Fi Money, and more along with 12K+ stars on GitHub, and a thriving and extremely helpful community! I speak from first-hand experience, the community really helped me when I was getting this set up configured the first time around.
The rest of the article talks about what the setup would look like, and things that you need to be careful about. But if you’re more of an “enough talking, show me what it’s capable of” kinda person, here’s a link to a Github repo that allows you to get off the ground with a single command.

Hosting SigNoz on AWS ECS

SigNoz Architecture

SigNoz has multiple internal services such as

SigNoz Otel Collector

It is responsible for collecting data from various sources, including applications, infrastructure, and other monitoring tools. The Otel Collector can ingest data in various formats, such as OpenTelemetry, and from various systems such as Jaeger, and Zipkin. This data is then sent and stored in a Clickhouse database.

Query Service

It enables users to query and analyze the data collected by SigNoz. The Query Service is responsible for processing user queries and returning the relevant data in a structured format that can be visualized and analyzed.

Frontend Service

It is the user interface component of the SigNoz observability tool. It provides a web-based dashboard that allows users to visualize and interact with the data collected by SigNoz.

Alert Manager

It enables users to set up alerts based on specific criteria and receive notifications when those alerts are triggered.

In our AWS ECS setup, we’re going to have separate ECS services corresponding to each of the SigNoz internal services. It’ll look something like this

Here are all the 5 services that have been deployed on AWS ECS Fargate

We’ve mounted an EFS to store metadata that the query service requires to authenticate users. This ensures data persistence across restarts and ensures your ECS service doesn’t lose state.

Additionally, you’re going to have to deploy otel & fluentbit sidecars along with each of your application containers for seamless and efficient tracing, logging, and metrics.

You can choose to use Clickhouse’s SAAS offering, host your own on-prem clickhouse servers, or even deploy your clickhouse cluster on EC2 instances. I chose to deploy the cluster on EC2 instances.

Hosting your own clickhouse cluster

The EC2 Clickhouse Architecture will look like this:

Image description

First, we are going to create vpc, then two private subnets and two public subnets inside it. We will host the entirety of our cluster inside the private subnet and host a single bastion host in our public subnet so we can ssh into our servers in the private subnet in case we need to troubleshoot any issues.

You can choose to deploy 1 or more clickhouse shards and zookeepers. We’ve created custom AMIs for configuring our zookeeper and clickhouse cluster that you can directly use. These AMIs are configured to back up clickhouse data to AWS 3 daily.

đź’ˇ Depending on your workload you can choose to use a higher or lower number of instances and instance types.

That’s it, it was that easy. The end setup would look like this:

The system architecture of all the services deployed on AWS

Here are some sample images of how the SigNoz setup looks for my workload

All the services are shown in SigNoz

Showing latency of API calls on our gin API

Graphs showing traces of our API calls<br>

Displaying all the logs of our application

Custom chart showing the CPU load of our application

What next homie?

At Wednesday we migrated a few of our production workloads out of XRay and CloudWatch. This setup largely consisted of AWS CloudWatch Logs, Metrics, and Alerts with slack integration using AWS Chatbot, along with X-Ray for distributed traces. We migrated workloads with no loss of functionality and without skipping a beat. If you’d like to get your hands on the JSON to set up some commonly used dashboards, here is the good stuff (wink wink)

Image description

In conclusion, hosting SigNoz on AWS ECS Fargate provides a powerful and scalable solution for monitoring and troubleshooting distributed systems. With features like distributed tracing, customizable alerting, and a powerful query language, SigNoz is a valuable tool for identifying and resolving performance issues in complex systems. SigNoz is currently working on adding tail-based sampling and an Anomaly detection framework. Overall, the upcoming developments for SigNoz promise to make it even more powerful and valuable, and we're excited to see what the future holds for this amazing open-source tool.

Here is a one-hit setup to create the entire infrastructure (yes yes, I mean it, the entire infrastructure!) with a single command. Go check out ECS SigNoz now!

If you’re looking for a team that can help you with observability with SigNoz write to us at [email protected]

Featured ones: