
In-place Serverless Querying AWS S3 Data

Published at: 5/24/2024
Categories: s3, query, serverless, datalake
Author: Asanka Boteju

We often need to query data directly where it sits in S3 buckets, stored in formats such as CSV, JSON, Avro, ORC, and Parquet, either for ad-hoc analysis or as part of a more comprehensive data solution.

Below are two AWS serverless services that you can use to query your S3 data in place.

1. Amazon Athena
Suitable for ad-hoc data discovery and SQL querying. Athena charges you based on the amount of data each query scans.
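As a rough illustration of the points above, the sketch below assembles an Athena query request and estimates its cost from the bytes scanned. The table `access_logs`, the database `weblogs`, the results bucket `s3://example-athena-results/`, and the rate of 5 USD per TB are all assumptions made for the example, not values from this article; check current Athena pricing for your region. The helper names are hypothetical. `run_query` uses the boto3 `start_query_execution` call and needs boto3 plus AWS credentials, so it is defined but not called here.

```python
def build_athena_request(sql, database, output_location):
    """Assemble the keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def estimate_scan_cost(bytes_scanned, usd_per_tb=5.0):
    """Rough cost of one query. usd_per_tb is an assumed rate, not a quoted price."""
    return bytes_scanned / 1024**4 * usd_per_tb


def run_query(request):
    """Submit the query. Requires boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    athena = boto3.client("athena")
    return athena.start_query_execution(**request)["QueryExecutionId"]


# Hypothetical table, database, and results bucket.
request = build_athena_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="weblogs",
    output_location="s3://example-athena-results/",
)

# Estimated cost of a query that scans 200 GiB at the assumed rate.
cost = estimate_scan_cost(200 * 1024**3)
```

Because the charge follows the bytes scanned, columnar formats such as Parquet or ORC and partitioned data generally reduce the cost of a query that reads only a few columns or partitions.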


2. Amazon Redshift Spectrum
Suitable when you need to run more complex queries or support a large user base.


Redshift Spectrum is recommended for the following reasons.

  • Uses Redshift data warehouse SQL syntax, so a single query can span Redshift tables and S3 data lakes.

  • Provides sophisticated query optimization.

  • Distributes queries across multiple nodes for parallel processing.

  • Works with your existing BI tools.
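To make the first point in the list concrete, here is a minimal sketch of one query that joins a local Redshift table with S3 data exposed through an external schema. Every name in it is an assumption for illustration: the schema `spectrum`, the catalog database `sales_lake`, the tables `public.customers` and `spectrum.sales`, the IAM role ARN, and the cluster identifiers. `submit` uses the boto3 Redshift Data API call `execute_statement` and needs boto3 plus AWS credentials, so it is defined but not called here.

```python
def external_schema_sql(schema, catalog_db, iam_role_arn):
    """SQL that maps a Data Catalog database to a Redshift external schema."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema} "
        f"FROM DATA CATALOG DATABASE '{catalog_db}' "
        f"IAM_ROLE '{iam_role_arn}';"
    )


def spanning_query_sql(schema):
    """One query joining a local Redshift table with an external table on S3."""
    return (
        "SELECT c.customer_name, SUM(s.amount) AS total_spent "
        "FROM public.customers AS c "            # table stored in Redshift
        f"JOIN {schema}.sales AS s "             # external table backed by S3 files
        "ON s.customer_id = c.customer_id "
        "GROUP BY c.customer_name;"
    )


def submit(sql, cluster_id, database, db_user):
    """Run a statement via the Redshift Data API. Not called in this sketch."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("redshift-data")
    response = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, DbUser=db_user, Sql=sql
    )
    return response["Id"]


# Hypothetical names throughout; replace with your own resources.
ddl = external_schema_sql(
    "spectrum", "sales_lake", "arn:aws:iam::123456789012:role/ExampleSpectrumRole"
)
query = spanning_query_sql("spectrum")
```

The external schema only holds metadata, so the files stay in S3 and are read at query time, while `public.customers` is read from the cluster as usual.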

Thank you for your time...
