
Unlock CloudFront's New Logging Potential with Athena Partition Projection

Published at
11/26/2024
Categories
aws
cloudfront
athena
s3
Author
shuvohoque

When I first set out to use Athena for querying CloudFront logs, I thought it would be a breeze. Just set up the Glue database, create a table, and start querying, right? Wrong! The problem hit me when I realized the log files had a flat structure: no partitions, no hierarchy. Every query ended up scanning massive amounts of data, even when I only needed a small slice of the logs.

To make things worse, manually adding partitions for each batch of logs felt like an endless chore. But then came AWS's announcement: Apache Parquet support for CloudFront logs, along with Hive-compatible folder structures in S3. The new logging capabilities provide native log configuration, eliminating the need for custom log processing. That's when it clicked: combining this with Athena Partition Projection would be a total breakthrough.


How CloudFront Logs Work: From Then to Now

Previously, CloudFront delivered logs as plain text. While this format was simple, it wasn't optimized for querying large datasets. On top of that, logs landed in S3 in a flat structure, so organizing them into partitions required manual post-processing.

Flat file-name structure:

s3://cloudfront--logs/E123ABC456DEF-2024-11-25-12-00-00-abcdef0123456789.gz

You can find the distribution ID and date-time in the file name. Extracting that information and restructuring the log files into partitions requires setting up an automation process behind the scenes.

--

Now, CloudFront can deliver logs in Apache Parquet format.

Parquet is a columnar storage format that improves query performance and reduces storage space significantly.

CloudFront also supports Hive-style partitioning when delivering logs to S3.

Hive-style refers to a folder structure where your data will be organized into directories named after partition keys and their values, like key=value/.

This means your logs are stored in a folder structure like this:

s3://cloudfront--logs/year=2024/month=11/day=25/hour=15/

Even better, you can customize the partitioning field to match your needs. For example, partition by year, month, day, or even by DistributionId:

Example: `{DistributionId}/{yyyy}/{MM}/{dd}/{HH}/`

This flexibility makes querying faster and perfectly tailored to your use case.


Why does "Partition Projection" matter?

Partition Projection is a feature in Athena that automatically understands how your data is organized in S3.

Without partition projection, Athena requires partitions to be loaded explicitly, for example with the MSCK REPAIR TABLE command, which can take several minutes when your bucket holds a lot of data. With projection, instead of manually loading partitions before each query, you simply define your data's structure during table creation, and Athena handles the rest automatically.
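For context, this is what the manual partition management that projection replaces looks like. The table name and partition layout below are illustrative, matching the year-based example used later in this post:

```sql
-- Scan the bucket and register any new Hive-style partitions
-- (can take minutes on a large bucket)
MSCK REPAIR TABLE cloudfront_logs;

-- Or register a single partition by hand, for every new batch of logs
ALTER TABLE cloudfront_logs ADD IF NOT EXISTS
  PARTITION (year = '2024')
  LOCATION 's3://cloudfront--logs/year=2024/';
```

With partition projection enabled, neither command is needed: Athena computes the possible partitions from the table properties at query time.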

Besides that, developers who query Athena expect to see the latest logs, but they sometimes forget to load the partitions first, so recent logs appear to be "missing." This creates chaos and confusion inside your team. As a DevOps engineer, you want to provide a platform that is automated and hassle-free for developers, removing manual steps and ensuring they can always access the latest data without worrying about partition management. That's where Partition Projection becomes your new friend.


How does "Partition Projection" work?

When you create a table in Athena, you simply describe how your data is structured in your log bucket. For example, if your logs are partitioned by year, you tell Athena upfront like this:

CREATE EXTERNAL TABLE cloudfront_logs (
  <col_name> <col_type>, ....
)
PARTITIONED BY (year STRING)
STORED AS PARQUET
LOCATION 's3://cloudfront--logs/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.year.type' = 'integer',
  'projection.year.range' = '2020,2030',
  'storage.location.template' = 's3://cloudfront--logs/year=${year}/'
);

Let’s say you’ve got CloudFront logs stored in your bucket like this:

s3://cloudfront--logs/year=2024/
s3://cloudfront--logs/year=2025/

Then, whenever you query the logs for 2024, Athena only looks at the data inside the year=2024 folder and returns the matching logs in the query result.
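A query like the following (the selected column names are illustrative; use the actual columns from your table definition) therefore scans only the objects under year=2024:

```sql
SELECT *
FROM cloudfront_logs
WHERE year = '2024'
LIMIT 100;
```

Because the partition filter appears in the WHERE clause, Athena prunes every other year's folder without listing or reading it, which is where the speed and cost savings come from.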

Let's talk about the setup.

To set up Hive-style partitions in CloudFront: when you enable logging, choose Amazon S3 as the destination. Then enable Hive-compatible prefixes for your log files and choose Parquet as the output format. Here's an example of a suffix path for partitioning CloudFront logs:

{DistributionId}/{yyyy}/{MM}/{dd}/{HH}/
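A table definition covering every key in that suffix path might look like the sketch below. This is an assumption-laden example, not a drop-in config: verify the actual folder names in your bucket and adjust `storage.location.template` accordingly (with Hive-compatible prefixes enabled, the paths use `key=value/` segments). The distribution ID uses Athena's `injected` projection type, which requires every query to pin it with an equality condition in the WHERE clause.

```sql
CREATE EXTERNAL TABLE cloudfront_logs_partitioned (
  <col_name> <col_type>, ....
)
PARTITIONED BY (distributionid STRING, year STRING, month STRING, day STRING, hour STRING)
STORED AS PARQUET
LOCATION 's3://cloudfront--logs/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.distributionid.type' = 'injected',
  'projection.year.type' = 'integer',
  'projection.year.range' = '2020,2030',
  'projection.month.type' = 'integer',
  'projection.month.range' = '1,12',
  'projection.month.digits' = '2',
  'projection.day.type' = 'integer',
  'projection.day.range' = '1,31',
  'projection.day.digits' = '2',
  'projection.hour.type' = 'integer',
  'projection.hour.range' = '0,23',
  'projection.hour.digits' = '2',
  'storage.location.template' = 's3://cloudfront--logs/${distributionid}/${year}/${month}/${day}/${hour}/'
);
```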

To effortlessly set up an Athena database and table with Partition Projection enabled using IaC, check out this GitHub repo.


Wrapping It Up

CloudFront logs just got a lot easier to work with. Whether you're using the new Apache Parquet format with Hive-compatible folders or combining it with Athena Partition Projection, you can now query your logs faster, cheaper, and with far less hassle. It's been a game-changer for me, and I hope it will be for you too. That said, Partition Projection has a few limitations; make sure none of them clash with your use case. You'll find them in the official AWS documentation for Athena.

But that’s not all. You can also deliver CloudFront logs to CloudWatch in JSON or text format, or even use Kinesis Data Firehose to process logs on the fly. AWS has made it super flexible to work with CloudFront logs, so you can choose the setup that works best for you.

Happy logging! 😊
