Glue cross-account setup

Published at
1/11/2025
Categories
aws
analytics
cloud
Author
payalgupta4639

This document covers the detailed steps to query an AWS Glue DB catalog from Dremio in a cross-account setup using AWS Lake Formation.

Use-case
Account A - Dremio is deployed here, and AWS Glue_DB_A is created and added as a source in Dremio

Account B - AWS Glue_DB_B is created here, and its data is located in an S3 bucket

The customer wants to share the Glue_DB_B catalog with Glue_DB_A and query the data located in Account B from Dremio

Setup Diagram

[Figure: cross-account setup diagram]

Role of each service in this setup:

  • Lake Formation - To create the data mesh, simplify cross-account data sharing, and create resource links

  • Resource Access Manager - To share resources and view the shared Data Catalog

  • IAM User - To provide cross-account read/write access to the S3 bucket so that queries can run from Dremio

  • Amazon Athena - Only to test whether Lake Formation access is working

Steps

Resource sharing using Lake Formation and Resource Access Manager

First, use Lake Formation and Resource Access Manager to share the Glue catalog from Account B with Account A.

Steps for Account-B:

  1. Create a Glue DB named Glue_DB_B

  2. Create a Glue table in this DB, point it to the S3 location where the data resides, and provide the schema
    OR
    Use a Glue crawler to infer the schema from the data in S3 and add the Glue table for you.

  3. Go to the Lake Formation console -> Data Lake Location -> Register the same S3 location -> Use the default IAM role, AWSServiceRoleForLakeFormationDataAccess

  4. Go to Lake Formation -> Databases -> Select Glue_DB_B -> Actions -> Grant -> Fill in (External Account), enter the AWS Account-A ID -> Choose a specific table

For DB, grant Alter, Create table, Describe
For Table, grant Alter, Delete, Describe, Drop, Insert

  5. Go to the Resource Access Manager console -> Shared by me in the left pane -> Resource Shares. You should be able to view your shared resources.
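
For readers who prefer scripting over the console, the Account-B steps above can be sketched as request parameters for the Lake Formation API. This is a minimal sketch, not the author's method: it assumes boto3 as the client library, and the helper names, the account ID `111122223333`, and the table name `orders` are hypothetical placeholders. The permission lists mirror the grants listed above.

```python
"""Sketch: Lake Formation request parameters for the Account-B steps."""

# Permission names mirror the grants listed in the steps above.
DB_PERMISSIONS = ["ALTER", "CREATE_TABLE", "DESCRIBE"]
TABLE_PERMISSIONS = ["ALTER", "DELETE", "DESCRIBE", "DROP", "INSERT"]


def register_location_params(bucket_name):
    """Parameters for lakeformation.register_resource (step 3)."""
    return {
        "ResourceArn": f"arn:aws:s3:::{bucket_name}",
        # Use the service-linked role AWSServiceRoleForLakeFormationDataAccess.
        "UseServiceLinkedRole": True,
    }


def grant_params(account_a_id, database, table=None):
    """Parameters for lakeformation.grant_permissions (step 4).

    Without `table` the grant targets the database; with `table` it
    targets that specific table. The principal is the external account.
    """
    if table is None:
        resource = {"Database": {"Name": database}}
        permissions = DB_PERMISSIONS
    else:
        resource = {"Table": {"DatabaseName": database, "Name": table}}
        permissions = TABLE_PERMISSIONS
    return {
        "Principal": {"DataLakePrincipalIdentifier": account_a_id},
        "Resource": resource,
        "Permissions": list(permissions),
    }


# Hypothetical usage with Account-B credentials (requires boto3):
#   import boto3
#   lf = boto3.client("lakeformation")
#   lf.register_resource(**register_location_params("<bucket-name>"))
#   lf.grant_permissions(**grant_params("111122223333", "Glue_DB_B"))
#   lf.grant_permissions(**grant_params("111122223333", "Glue_DB_B", "orders"))
```

Keeping the parameters in plain functions makes them easy to inspect before any call is made against a real account.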

Steps for Account-A:

  1. Go to Resource Access Manager -> Shared with me -> Resource Shares -> Accept your Resource Share

  2. Now go to Lake Formation -> Tables -> Your shared table will appear here -> Click on the table -> Actions -> Create resource link

  3. The table will now appear italicized in the Glue DB
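
The Account-A steps above could be scripted along the following lines. This is a sketch under assumptions: boto3 is assumed as the client library, the helper names are hypothetical, and the link name, account ID, and table name in the usage comment are placeholders. A resource link is created through the Glue `create_table` call with a `TargetTable` pointing at the shared table in Account B.

```python
"""Sketch: accept the RAM share and create a resource link in Account A."""


def pending_invitation_arns(invitations):
    """Return ARNs of resource share invitations that are still pending.

    `invitations` is the "resourceShareInvitations" list returned by
    ram.get_resource_share_invitations().
    """
    return [
        inv["resourceShareInvitationArn"]
        for inv in invitations
        if inv.get("status") == "PENDING"
    ]


def resource_link_params(link_name, local_database, account_b_id,
                         shared_database, shared_table):
    """Parameters for glue.create_table that create a resource link."""
    return {
        "DatabaseName": local_database,
        "TableInput": {
            "Name": link_name,
            # TargetTable makes this entry a link to the table in Account B.
            "TargetTable": {
                "CatalogId": account_b_id,
                "DatabaseName": shared_database,
                "Name": shared_table,
            },
        },
    }


# Hypothetical usage with Account-A credentials (requires boto3):
#   import boto3
#   ram = boto3.client("ram")
#   glue = boto3.client("glue")
#   found = ram.get_resource_share_invitations()["resourceShareInvitations"]
#   for arn in pending_invitation_arns(found):
#       ram.accept_resource_share_invitation(resourceShareInvitationArn=arn)
#   glue.create_table(**resource_link_params(
#       "orders_link", "Glue_DB_A", "444455556666", "Glue_DB_B", "orders"))
```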

Provide cross-account read/write access to the S3 bucket

Steps to do so:

  • Go to Account B → S3 console
  • Select your S3 bucket
  • Go to the Permissions tab
  • Edit Bucket Policy and add the following policy (make sure to add the AWS Account-A ID, IAM User name, and bucket name)
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<AccountA-ID>:user/<username>"
            },
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::<bucket-name>/*"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<AccountA-ID>:user/<username>"
            },
            "Action": [
                "s3:GetLifecycleConfiguration",
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::<bucket-name>"
        }
    ]
}
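
To avoid hand-editing the three placeholders, the same policy can be generated from its inputs. The following is a minimal sketch; the function name `build_bucket_policy` and the sample values are hypothetical, and the statements mirror the policy above.

```python
"""Sketch: generate the cross-account bucket policy shown above."""
import json


def build_bucket_policy(account_a_id, username, bucket_name):
    """Return the bucket policy as a dict for the given Account-A IAM user."""
    principal = {"AWS": f"arn:aws:iam::{account_a_id}:user/{username}"}
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access: read and write objects in the bucket.
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"{bucket_arn}/*",
            },
            {
                # Bucket-level access: list keys and read lifecycle config.
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetLifecycleConfiguration", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
        ],
    }


if __name__ == "__main__":
    policy = build_bucket_policy("111122223333", "dremio-user", "example-bucket")
    print(json.dumps(policy, indent=4))
```

With boto3 and Account-B credentials, the result could then be applied with `s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))` instead of pasting it into the console.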

Add Glue catalog as a source in Dremio

The last step is to add Glue_DB_A as a source in Dremio:

  • Go to Add Source
  • Select AWS Glue Data Catalog
  • Fill in the details - Name, Region, Authentication
  • Hit Save

You should now be able to view the datasets from both Glue catalogs and run queries on them.

Or

You can run the query on the Glue source via Athena instead of Dremio.
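
As a rough illustration of the Athena check, the sketch below builds the parameters for a small test query against the resource link. It assumes boto3 as the client library; the helper name, the database and table names, and the S3 output location are all hypothetical placeholders, not values from this setup.

```python
"""Sketch: parameters for a small Athena test query on the shared table."""


def athena_test_query_params(database, table, output_location, limit=10):
    """Parameters for athena.start_query_execution.

    `output_location` is an S3 URI where Athena writes query results.
    """
    query = f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}'
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical usage with Account-A credentials (requires boto3):
#   import boto3
#   athena = boto3.client("athena")
#   resp = athena.start_query_execution(**athena_test_query_params(
#       "glue_db_a", "orders_link", "s3://<results-bucket>/athena/"))
#   print(resp["QueryExecutionId"])
```

If the query returns rows, the Lake Formation share and the resource link are working; a permissions error at this stage points back to the grants made in Account B.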
