Serverless NBA Data Lake Application with API Gateway, AWS Lambda, Amazon S3, AWS Glue and Athena Using Terraform

Published at
1/15/2025
Categories
aws
python
terraform
webdev
Author
gbenga700

In sports analytics, the ability to process and analyze large volumes of data quickly has become a game-changer. A serverless architecture lets us ingest, store, and query large datasets of NBA statistics while keeping the scalability and cost-efficiency that serverless services offer.
In this project, we'll explore how to build a Serverless NBA Data Lake Application using API Gateway, AWS Lambda, Amazon S3, AWS Glue, and Amazon Athena, all provisioned with Terraform.

System Architecture Overview
The architecture uses the following components:
• Amazon S3: Serves as the central data lake, storing raw, processed, and curated NBA data in JSON format.
• AWS Lambda: A Lambda function fetches NBA data from sportsdata.io, formats it, and uploads it to Amazon S3.
• Amazon API Gateway: Provides a RESTful API that triggers the Lambda function to fetch NBA data from sportsdata.io and upload it to an S3 bucket.
• AWS Glue: Discovers the data stored in S3 and catalogs its schema using a Glue Data Catalog database and a Glue crawler, so the data can be queried efficiently.
• Amazon Athena: Enables serverless querying of the data lake using standard SQL, so users can retrieve insights from the curated NBA data. Query results are stored in an Amazon S3 bucket.


Prerequisites:
• An AWS account with the access and permissions required to configure services such as Lambda, S3, Glue, API Gateway, and Athena.
• Experience with a programming language supported by AWS Lambda, such as Python.
• Terraform installed on your local machine.
• AWS CLI installed and configured on your local machine.

Define Your Lambda function
We will develop a Python script for our Lambda function that retrieves NBA data from sportsdata.io, processes it, and uploads it to Amazon S3. The complete Python code is available in the repository.
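For readers who want a feel for the shape of such a function before opening the repository, here is a minimal sketch. It is not the repository's code. The environment variable names (NBA_ENDPOINT, SPORTS_DATA_API_KEY, BUCKET_NAME), the raw-data prefix, and the object naming are all assumptions made for illustration, as is passing the key in the Ocp-Apim-Subscription-Key header. The records are written one JSON object per line because Athena's JSON SerDes expect each record on its own line.

```python
import json
import os
import urllib.request
from datetime import datetime, timezone


def fetch_nba_data(url, api_key):
    """Call the sports data API and return the decoded JSON payload."""
    # Assumption: the API key is sent in the Ocp-Apim-Subscription-Key header.
    request = urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": api_key}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def to_json_lines(records):
    """Serialize a list of records as newline-delimited JSON."""
    return "\n".join(json.dumps(record) for record in records)


def build_object_key(prefix, now):
    """Build a dated S3 key under the given prefix (naming is hypothetical)."""
    return f"{prefix}/nba_players_{now:%Y-%m-%d}.json"


def lambda_handler(event, context):
    # boto3 ships with the Lambda Python runtime; importing it here keeps the
    # helper functions above usable without it.
    import boto3

    # Hypothetical environment variables, set on the function by Terraform.
    records = fetch_nba_data(
        os.environ["NBA_ENDPOINT"], os.environ["SPORTS_DATA_API_KEY"]
    )
    key = build_object_key("raw-data", datetime.now(timezone.utc))
    boto3.client("s3").put_object(
        Bucket=os.environ["BUCKET_NAME"],
        Key=key,
        Body=to_json_lines(records).encode("utf-8"),
    )
    # Response shape expected by an API Gateway Lambda proxy integration.
    return {
        "statusCode": 200,
        "body": json.dumps({"uploaded": key, "records": len(records)}),
    }
```

Keeping the fetch, format, and key-building steps in separate functions makes each one easy to test without AWS credentials.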

Terraform Configuration
We will use Terraform modules for this deployment to keep our infrastructure as code modular, reusable, and maintainable. Each folder in the modules directory defines the configuration required to deploy one AWS service:

• API Gateway module: Deploys an API Gateway that triggers the Lambda function to retrieve data from sportsdata.io and upload it to Amazon S3.
• iam_role module: Contains the Terraform code that defines the permissions Lambda needs to retrieve NBA data and upload it to Amazon S3, and the permissions API Gateway needs to invoke the Lambda function.
• Lambda module: Contains the Terraform code that archives the Python code into a zip file and creates the Lambda function that retrieves NBA data from sportsdata.io, processes it, and uploads it to Amazon S3.
• S3 module: Contains the Terraform code that creates the Amazon S3 bucket used to store the data retrieved from sportsdata.io by the Lambda function.
• glue module: Contains the Terraform code that creates the AWS Glue Data Catalog database, Glue crawler, and Glue table, which discover the data stored in S3 and catalog its schema for efficient querying.
• athena module: Contains the Terraform code that creates an Athena workgroup for serverless querying of the sports data lake stored in S3 using standard SQL.
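A Glue crawler only updates the catalog when it runs. Depending on how the crawler in the repository is configured (scheduled or on demand), it may need to be started manually after new data lands in S3. The sketch below shows one way to do that with boto3, assuming an on-demand crawler. The crawler name in the usage note is hypothetical.

```python
import time


def run_crawler(glue_client, crawler_name, poll_seconds=10, max_polls=60):
    """Start a Glue crawler and wait until it returns to the READY state.

    Returns True if the crawler finished within the polling budget,
    False otherwise.
    """
    glue_client.start_crawler(Name=crawler_name)
    for _ in range(max_polls):
        # A crawler reports RUNNING, then STOPPING, then READY when done.
        state = glue_client.get_crawler(Name=crawler_name)["Crawler"]["State"]
        if state == "READY":
            return True
        time.sleep(poll_seconds)
    return False
```

With real credentials it could be called as run_crawler(boto3.client("glue"), "nba-data-crawler"). Passing the client in as a parameter keeps the function testable with a stand-in object.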

The full Terraform configuration is available at the link below:

https://github.com/OjoOluwagbenga700/sport-data-lake.git

Step 1: Clone the Terraform Code
By cloning the Terraform code, we'll have access to the infrastructure-as-code configurations needed for our deployment process.

Clone Repository: Use the git clone command to clone the Terraform code repository to your local machine.

Ensure that you have Git installed and configured on your system.

https://github.com/OjoOluwagbenga700/sport-data-lake.git

Change directory to the sport-data-lake folder.

Update the terraform.tfvars file with your API key from sportsdata.io.

Step 2: Running Terraform Commands

terraform init: Initializes Terraform in the project directory and downloads the required provider plugins and modules.

terraform plan: Generates an execution plan so you can preview the changes Terraform will make to the infrastructure.

terraform apply: Run terraform apply --auto-approve to deploy the infrastructure on AWS.
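If you prefer to script these three commands (for example in a CI job), a small Python wrapper is one option. This is only a sketch: the function names are made up for illustration, and it assumes the terraform binary is on your PATH. It uses the single-dash -auto-approve spelling from the Terraform documentation; the double-dash form shown above is accepted as well.

```python
import subprocess


def terraform_commands(auto_approve=True):
    """Return the init, plan, and apply commands in the order they are run."""
    apply_command = ["terraform", "apply"]
    if auto_approve:
        apply_command.append("-auto-approve")
    return [["terraform", "init"], ["terraform", "plan"], apply_command]


def deploy(workdir, auto_approve=True):
    """Run each command inside workdir, stopping at the first failure."""
    for command in terraform_commands(auto_approve):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, cwd=workdir, check=True)
```

Calling deploy("sport-data-lake") from the parent directory of the cloned repository would run the same sequence as the manual steps.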

Step 3: Confirm resources deployed on AWS

Confirm that the following resources now exist in your AWS account:
• Lambda function
• Glue crawler
• Glue catalog database and table
• S3 bucket (still empty, since no data has been uploaded yet)
• Athena workgroup
• API Gateway

Step 4: Testing the Application

To trigger the Lambda function to retrieve, process, and upload NBA data to S3, we will send a GET request to the API Gateway invoke URL.

Paste the API Gateway invoke URL into your browser, append /dev/data (the API stage and resource path), and press Enter.

https://r3zks22udh.execute-api.us-east-1.amazonaws.com/dev/data
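The same request can be sent from a script instead of a browser. The sketch below uses only the Python standard library. The function names are illustrative, and the dev stage and data path are taken from the URL above. It makes no assumption about what the response body contains, so it returns parsed JSON when possible and raw text otherwise.

```python
import json
import urllib.request


def build_invoke_url(base_url, stage="dev", resource="data"):
    """Join the API Gateway base URL with the stage and resource path."""
    return f"{base_url.rstrip('/')}/{stage}/{resource}"


def trigger_ingestion(base_url, timeout=30):
    """Send a GET request and return (status code, parsed JSON or raw text)."""
    url = build_invoke_url(base_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
        try:
            return response.status, json.loads(body)
        except json.JSONDecodeError:
            return response.status, body
```

For example, trigger_ingestion("https://r3zks22udh.execute-api.us-east-1.amazonaws.com") targets the URL shown above; replace the host with your own invoke URL.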

Once the request succeeds, the NBA data is uploaded to the S3 bucket. You can then preview the data table in Athena and run a simple SQL query against it.

Athena Query Result

Query results are stored in a defined folder in the S3 bucket and can be downloaded from there.
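Queries can also be run programmatically. The following is a minimal sketch using the boto3 Athena client. It assumes the workgroup already has a query result location configured, which is why no output location is passed explicitly. The database, table, and workgroup names in the usage note are hypothetical, so substitute the ones created by the Terraform code.

```python
import time


def run_athena_query(athena, sql, database, workgroup,
                     poll_seconds=2, max_polls=30):
    """Run a query, wait for it to finish, and return (rows, S3 output location)."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,
    )["QueryExecutionId"]

    for _ in range(max_polls):
        execution = athena.get_query_execution(
            QueryExecutionId=query_id
        )["QueryExecution"]
        state = execution["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "")
            raise RuntimeError(f"Athena query {state}: {reason}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("Athena query did not finish in time")

    # First page of results only; for SELECT queries the first row is the header.
    result = athena.get_query_results(QueryExecutionId=query_id)
    rows = [
        [column.get("VarCharValue") for column in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
    return rows, execution["ResultConfiguration"]["OutputLocation"]
```

A call such as run_athena_query(boto3.client("athena"), "SELECT * FROM nba_players LIMIT 10", "nba_database", "nba_workgroup") returns the rows together with the S3 path of the result file, which is the same file you would otherwise download from the bucket.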

Conclusion: Congratulations! We have successfully built a Serverless NBA Data Lake Application using AWS services such as API Gateway, Lambda, S3, Glue, and Athena. Terraform ensures the infrastructure is provisioned consistently and can be replicated or modified with ease. This architecture not only showcases the potential of serverless computing but also opens the door to other domains, such as real-time analytics, machine learning, or personalized user experiences.

To clean up: Run terraform destroy to delete all the infrastructure deployed by the Terraform code.
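One thing to watch for: Terraform cannot delete an S3 bucket that still contains objects unless the bucket resource sets force_destroy = true. If the repository's bucket does not set it, terraform destroy may fail once data and query results have been written. A sketch for emptying the bucket first is shown below. It assumes an unversioned bucket, and the bucket name is whatever your deployment created.

```python
def empty_bucket(s3_client, bucket_name):
    """Delete every object in the bucket and return how many were removed."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        # Each page holds at most 1000 keys, which is also the
        # per-request limit of delete_objects.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket_name, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted
```

Run it as empty_bucket(boto3.client("s3"), "your-bucket-name") before terraform destroy. Versioned buckets would also need their object versions and delete markers removed, which this sketch does not handle.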
