How to build your own data platform. Episode 2: authorization layer. Data Warehouse implementation.

Published at
6/4/2023
Categories
datamesh
dataplatform
redshift
lakeformation
Author
gumartinm
Introduction.

This article is the second part of the episode about building an authorization layer for your data platform. You can find the whole list of articles by following this link: https://medium.com/@gu.martinm/list/how-to-build-your-own-data-platform-9e6f85e4ce39

In the previous article we talked about how to implement the authorization layer in the Data Lake. In this second part we will cover the same topic for the Data Warehouse.


Authorization layer.

Image description

This diagram shows the Lakehouse with its metastore and the Data Warehouse. We already talked about the authorization layer for the Lakehouse in the previous article. Now it is the turn of the Data Warehouse.

Because we will be using Amazon Web Services with AWS Redshift, we will be implementing this layer using Lake Formation.

Processing layer.

Image description

Human users and processes will be the ones accessing the stored data through the authorization layer. The processes include tools such as Zeppelin notebooks, AWS Athena for SQL, AWS EMR clusters, Databricks, etc.

The problem with the authorization.

Data engineers, data analysts and data scientists work in different and sometimes isolated teams. They do not want their data to be deleted or changed by tools or people outside their teams.

Data owners are typically in charge of granting access to their data.

Owner-consumer relationship.

Image description

  1. A data consumer requests access to some data owned by a different team in a different domain. For example, a table in a database.

  2. The data owner grants access by approving the access request.

  3. Upon the approval of an access request, a new permission is added to the specific table.

Our authorization layer must be able to provide the above capability if we want to implement a data mesh with success.
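The three steps above can be sketched as a small in-memory model. This is only an illustration of the request, approval and permission flow, not the way Lake Formation stores permissions; the class names, function names and team names (Table, AccessRequest, team_producer, team_consumer) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    """A table owned by one team; permissions maps consumer -> allowed actions."""
    name: str
    owner: str
    permissions: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class AccessRequest:
    """A pending request from a consumer for one action on one table."""
    consumer: str
    table: Table
    action: str = "SELECT"
    approved: bool = False


def request_access(consumer: str, table: Table, action: str = "SELECT") -> AccessRequest:
    # Step 1: the consumer asks for access to a table owned by another team.
    return AccessRequest(consumer=consumer, table=table, action=action)


def approve(request: AccessRequest, approver: str) -> None:
    # Step 2: only the data owner may approve the request.
    if approver != request.table.owner:
        raise PermissionError(f"{approver} does not own {request.table.name}")
    request.approved = True
    # Step 3: the approval adds a new permission to that specific table.
    request.table.permissions.setdefault(request.consumer, set()).add(request.action)


def can(consumer: str, table: Table, action: str) -> bool:
    """Return True if the consumer already holds the permission."""
    return action in table.permissions.get(consumer, set())


if __name__ == "__main__":
    sales = Table(name="sales", owner="team_producer")
    req = request_access("team_consumer", sales)
    print(can("team_consumer", sales, "SELECT"))  # False: not approved yet
    approve(req, approver="team_producer")
    print(can("team_consumer", sales, "SELECT"))  # True: permission was added
```

The point of the sketch is that the permission lives on the table and only the owner can create it, which is the property a data mesh needs from the authorization layer.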

Data Warehouse, AWS Redshift.

Image description

The Data Warehouse is implemented on top of AWS Redshift. A few years ago Amazon released a new generation of Redshift nodes called RA3. What makes RA3 different from the older Redshift node types is that computation and storage are separated. Before RA3, users who needed more storage also had to pay for more computation, even if computation was not a problem. Conversely, users who needed more computation also had to pay for more storage. As a result, Redshift costs were typically high.

We will be using AWS Redshift RA3. The following links explain in more detail what AWS Redshift and AWS Redshift RA3 are:
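The cost argument can be illustrated with a little arithmetic. The sketch below compares how many nodes a cluster needs when each node bundles a fixed amount of storage and compute against the case where only compute decides the node count. The per-node capacities and the workload figures are invented for illustration and are not real Redshift specifications or prices.

```python
import math


def coupled_nodes(storage_tb: float, compute_units: float,
                  tb_per_node: float, units_per_node: float) -> int:
    """Nodes needed when every node bundles fixed storage and compute."""
    by_storage = math.ceil(storage_tb / tb_per_node)
    by_compute = math.ceil(compute_units / units_per_node)
    # The larger requirement dictates the cluster size, so the other
    # resource is over-provisioned and still paid for.
    return max(by_storage, by_compute)


def decoupled_nodes(compute_units: float, units_per_node: float) -> int:
    """Nodes needed when storage is billed separately: only compute counts."""
    return math.ceil(compute_units / units_per_node)


if __name__ == "__main__":
    # Hypothetical storage-heavy workload: 40 TB of data, 4 units of compute.
    # Hypothetical node size: 2 TB of storage and 1 unit of compute per node.
    print(coupled_nodes(40, 4, tb_per_node=2, units_per_node=1))  # 20
    print(decoupled_nodes(4, units_per_node=1))                   # 4
```

In this made-up case the coupled cluster needs five times as many nodes purely to hold the data, which is the kind of over-provisioning that separating storage from computation avoids.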

Data Warehouse, AWS Redshift RA3.

Image description

Amazon Redshift data sharing allows you to securely and easily share data for read purposes across different Amazon Redshift clusters without the complexity and delays associated with data copies and data movement. Data can be shared at many levels, including schemas, tables, views, and user-defined functions, providing fine-grained access controls that can be tailored for different users and businesses that all need access to the data.

Lake Formation can be integrated with data sharing.

For further information visit the following links:

Authorization, Federated Lake Formation.

Using Lake Formation with AWS Redshift RA3, we can manage permissions across different accounts from a single central account in a federated way. We delegate permissions to other accounts, but we keep control of them.

Image description

Authorization, implementation.

To implement federated authorization with AWS Redshift RA3, you can follow these steps:

AWS Redshift RA3, producer account:

  • CREATE DATASHARE producer_sharing;
  • GRANT USAGE ON DATASHARE producer_sharing TO ACCOUNT 'FEDERATED_GOVERNANCE';
  • ALTER DATASHARE producer_sharing ADD SCHEMA producer_schema;

AWS Redshift RA3, consumer account:

  • CREATE DATASHARE consumer_sharing;
  • GRANT USAGE ON DATASHARE consumer_sharing TO ACCOUNT 'FEDERATED_GOVERNANCE';
  • ALTER DATASHARE consumer_sharing ADD SCHEMA consumer_schema;
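Since the producer and consumer accounts run the same three statements with different names, they can be generated from a small helper. The sketch below is only a convenience for rendering the SQL text; the function name and the 12-digit account ID are hypothetical. Redshift's GRANT USAGE ON DATASHARE syntax also accepts an optional VIA DATA CATALOG clause for datashares managed through Lake Formation. The steps above do not say whether it was used, so it is exposed here as an opt-in flag rather than assumed.

```python
def datashare_statements(share: str, schema: str, governance_account: str,
                         via_data_catalog: bool = False) -> list[str]:
    """Render the three datashare statements for one Redshift account.

    governance_account stands in for the 'FEDERATED_GOVERNANCE' placeholder,
    i.e. the AWS account ID of the central governance account.
    """
    grant = f"GRANT USAGE ON DATASHARE {share} TO ACCOUNT '{governance_account}'"
    if via_data_catalog:
        # Optional clause for datashares shared with Lake Formation.
        grant += " VIA DATA CATALOG"
    return [
        f"CREATE DATASHARE {share};",
        grant + ";",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
    ]


if __name__ == "__main__":
    # '123456789012' is a made-up account ID used only for illustration.
    for name in ("producer", "consumer"):
        for stmt in datashare_statements(f"{name}_sharing", f"{name}_schema", "123456789012"):
            print(stmt)
```

The helper uses straight single quotes around the account ID, which is what the SQL parser expects; typographic quotes copied from a web page will not work.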

AWS Redshift RA3, main federated account:

  • Through the Lake Formation console, allow the consumer account to access producer_sharing. A screenshot of this configuration is shown below.

Image description

With the above configuration, a query from the consumer account will see only the column brand_id.

Image description
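The same column-level grant could also be issued programmatically instead of through the console. The sketch below only builds the request parameters, in the shape accepted by the Lake Formation GrantPermissions API (for example boto3's lakeformation client method grant_permissions). The consumer account ID, database name and table name are hypothetical; only the column brand_id comes from the configuration described above.

```python
def column_select_grant(consumer_account: str, database: str, table: str,
                        columns: list[str]) -> dict:
    """Build GrantPermissions parameters for SELECT on specific columns only."""
    return {
        # For cross-account grants the principal can be an AWS account ID.
        "Principal": {"DataLakePrincipalIdentifier": consumer_account},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                # Only the listed columns become visible to the consumer.
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


if __name__ == "__main__":
    # Hypothetical names: account '111122223333', database 'producer_db',
    # table 'products'.
    request = column_select_grant("111122223333", "producer_db", "products", ["brand_id"])
    print(request)
    # With boto3 installed and credentials for the governance account,
    # the request could be sent like this (not executed here):
    #   import boto3
    #   boto3.client("lakeformation").grant_permissions(**request)
```

Building the request separately from sending it keeps the grant easy to review or store in version control before it is applied.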

Conclusion.

In this article we have explained how you can implement an authorization layer using AWS Redshift RA3 and AWS Lake Formation.

With this authorization layer we will be able to resolve the following problems:

  • Producers and consumers from different domains must be able to work in isolation, if they so wish, for a data mesh to succeed.

  • Producers must be able to decide how consumers can access their data. They are the data owners, and they decide how others use their data.

  • Fine-grained permissions can be established at column level and, if we want, even at row level. This is of great interest if we want to be GDPR compliant. How to implement GDPR compliance in your own data platform will be explained in future articles.

Stay tuned for the next article about how to implement your own Data Platform with success.


I hope this article was useful. If you enjoy messing around with Big Data, Microservices, reverse engineering or any other computer stuff and want to share your experiences with me, just follow me.
