dev-resources.site
for different kinds of informations.
Securing Amazon Redshift - Best Practices for Access Control
Introduction
Not long ago, I had the chance to conduct a knowledge transfer session focused on access management in Amazon Redshift for our partner client. As I started researching the topic, I realized something surprising - while there’s plenty of material explaining Redshift’s features, finding clear best practices or structured approaches for managing access was a challenge. Most of the information available felt basic and lacked the depth I was looking for. Yes, I also watched re:Invent videos.
This realization motivated me to dive deeper and build a better understanding of how to effectively secure Redshift. In this blog, I want to share what I’ve learned - not just the technical details, but practical strategies and insights I’ve gained from working hands-on with Redshift access management. Whether you’re setting up a new cluster or refining an existing one, I hope this guide gives you the tools and confidence to tackle access management with clarity and purpose.
Understanding Redshift Access Management
Of course, before we'll be able to apply these settings in Redshift, we need to understand how its security framework works first.
Redshift Built-In Security and Compliance
Let’s review the core components of Redshift security.
For this blog, I’ll skip VPC isolation since, if you’ve been using AWS for a while, you’re likely familiar with VPC-supported services.
1. Authentication
Redshift supports multiple authentication methods to verify user identities:
- Using IAM authentication to generate database user credentials - You can use AWS IAM (Users, Roles) to manage user access.
- Single Sign-On (SSO) - Simplifies user access by leveraging corporate identity providers.
- MFA (Multi-Factor Authentication) - You can enforce MFA for added security.
- Database Credentials - Traditional username and password-based authentication. Make sure to store your superuser or admin credentials in AWS Secrets Manager.
2. Authorization
Once users are authenticated, Redshift manages what they can access through robust authorization features:
- Default Permissions - By default, only the owner of a database object (schema, tables, views) can modify or delete it, ensuring secure defaults. THIS CONCEPT IS IMPORTANT.
- Users and Groups - Permissions can be granted to individual users or groups, allowing for more efficient access control. Groups are always advisable so that you won't have the pain of managing many users.
- Role-Based Access Control (RBAC) - Roles simplify managing permissions by grouping privileges and assigning them to users.
A Review on Authentication vs. Authorization:
Authentication verifies who a user is, ensuring only valid individuals or systems can connect to your Redshift cluster through methods like credentials, MFA, or federated SSO. Authorization, on the other hand, determines what authenticated users can access and do within the system, such as querying specific tables, viewing certain rows, or accessing sensitive columns.
3. Data Encryption
Security isn’t just about controlling access; it’s also about protecting the data itself:
- Data at Rest - Encrypted using AWS Key Management Service (KMS) or custom-managed keys.
- Data in Transit - Protected with SSL to ensure secure communication.
- Load Data Encryption - Ensures sensitive data remains protected even during data loading.
4. Advanced Access Management Features
- Column-Level Security (CLS) - Restricts access to sensitive columns (e.g., SSNs, mobile number, credit cards) while allowing access to non-sensitive data.
- Row-Level Security (RLS) - Ensures users see only the rows they are authorized to access, based on attributes or roles.
- Dynamic Data Masking - Masks sensitive information for users without full permissions, providing an extra layer of data protection.
5. Compliance
Third-party auditors assess the security and compliance of Amazon Redshift as part of multiple AWS compliance programs. These include SOC, PCI, FedRAMP, HIPAA, and others.
Here's a link for the list: Compliance validation for Amazon Redshift
Understanding these foundational components is the first step toward mastering Redshift access management as each of these works together.
Default Permissions
I wanted to expand more about the Default Permissions in Redshift. In my experience, overlooking how these default settings interact with specific roles or use cases can lead to prolonged back-and-forth conversations between users and administrators, especially when working across time zones. This is why it’s essential to understand how these permissions work and how to adjust them as needed.
New users are assigned a set of default permissions that determine their initial access rights. These permissions provide a basic level of access to the database, allowing users to perform common actions while restricting access to sensitive areas.
The Principle of Object Ownership
- When a user creates an object (e.g., a table, view, or schema) in Redshift, they automatically become its owner.
- Key Rule - Only the owner has permission to modify or drop the object unless they explicitly grant those permissions to others.
- This principle ensures that no unauthorized user can tamper with critical objects, even if they have general access to the schema.
GRANT
and ALTER DEFAULT PERMISSIONS
The GRANT
statement provides access to existing objects in Redshift, such as tables and views, for users, groups, or roles.
The ALTER DEFAULT PRIVILEGES
statement is used to manage permissions for future objects that will be created in a schema. This is especially useful in collaborative environments where you want new objects to automatically inherit specific access rules.
Regranting Permissions After Object Modification - If a table or view is recreated (e.g., dropped and re-created), all previously granted permissions are lost and must be reapplied. Use
ALTER DEFAULT PERMISSIONS
statement to solve this or executeGRANT
statement again.
Advanced Access Management Features
One of the requirements of a modern data platform is the ability to provide granular access control. Amazon Redshift delivers this through features such as Column-Level Security (CLS), Row-Level Security (RLS), and Dynamic Data Masking, ensuring users see only what they’re authorized to see. Let's go through them one-by-one.
1. Column-Level Security (CLS)
Column-Level Security restricts access to sensitive columns within a table while allowing users to interact with non-sensitive data. This prevents unnecessary exposure of information such as personal identifiers or payment details.
GRANT SELECT (customer_id, name, address) ON customers TO
GROUP customer_service_representatives;
Creating a view is also an option and can sometimes be a straightforward and effective way to control column-level access.
CREATE VIEW customer_info AS
SELECT customer_id, name, address
FROM customers;
You can also change this into a MATERIALIZED VIEW
.
2. Row-Level Security (RLS)
RLS restricts access to specific rows within a table based on user roles or attributes, ensuring that users see only the data relevant to them.
This feature is essential for scenarios where data segregation is required, such as multi-tenant environments or organizations with hierarchical roles (e.g., regional managers, department leads).
CREATE RLS POLICY region_a_policy
USING (region = 'Region A');
ATTACH RLS POLICY region_a_policy ON sales_data TO ROLE
sales_manager;
ALTER TABLE sales_data ROW LEVEL SECURITY ON;
- The
CREATE RLS POLICY
statement defines the filtering condition (region = 'Region A'). - The policy is attached to the
sales_data
table and applies only to users assigned theregional_manager_a
role. - Row-level security is enabled for the table with
ALTER TABLE
.
Again, you can always use a VIEW
for this.
3. Dynamic Data Masking
Dynamic Data Masking (DDM) is a powerful feature in Amazon Redshift that protects sensitive data by replacing it with masked values when accessed by users without full permissions. Unlike traditional encryption, which hides data entirely, masking obfuscates sensitive fields while maintaining their usability for tasks like reporting and analysis.
How it works:
Data Obfuscation - Sensitive information is replaced with masked values when accessed by users without full permissions.
Conditional Masking - Masking can be applied based on user roles or attributes, ensuring the right users see appropriate levels of data.
The masking is applied at query runtime, ensuring that the underlying data remains unchanged.
How to implement:
1. Create a MASKING POLICY
Define how sensitive data should be masked.
Example: Masking a credit card number to show only the last four digits:
CREATE MASKING POLICY mask_credit_card_full
WITH (credit_card VARCHAR(256))
USING ('000000XXXX0000'::TEXT);
2. Attach the MASKING POLICY
Apply the masking policy to the desired column and restrict access based on ROLES
. You can also attach this to USERS
.
Example: Attaching the policy to the credit_card column:
ATTACH MASKING POLICY mask_credit_card
ON credit_cards(credit_card)
TO ROLE customer_support_role;
Step 3: Test the Policy
When users with the customer_support_role
query the credit_card
column, they'll see masked values:
Result:
****-****-****-1234
Though, users with elevated permissions (e.g., administrators) will see the full credit card number.
Advanced: Custom Masking Logic
In cases where more complex masking logic is needed, you can define custom functions using Python as a language.
CREATE FUNCTION REDACT_CREDIT_CARD(credit_card TEXT)
RETURNS TEXT IMMUTABLE
AS $$
import re
regexp = re.compile("^([0-9]{6})[0-9]{5,6}([0-9]{4})")
match = regexp.search(credit_card)
if match:
first = match.group(1)
last = match.group(2)
else:
first = "000000"
last = "0000"
return f"{first}XXXXX{last}"
$$ LANGUAGE plpythonu;
CREATE MASKING POLICY custom_mask_credit_card
WITH (credit_card VARCHAR(256))
USING (REDACT_CREDIT_CARD(credit_card));
ATTACH MASKING POLICY custom_mask_credit_card
ON credit_cards(credit_card)
TO ROLE data_analyst_role;
Best Practices for Redshift Access Management
While Amazon Redshift provides many options for different use cases, it can sometimes feel overwhelming to decide which one to use. The key is to simplify your approach by focusing on your specific needs and objectives, rather than trying to use every available feature. Remember your KISS principle, people :) - keep it simple, stupid and straightforward.
In this section, let’s use the example of a Sales Report System where different teams, like sales analysts and regional managers, interact with data in various ways.
1. Role-Based Access Control (RBAC)
Instead of assigning permissions to individual users, define roles that are related to their job responsibilities.
CREATE ROLE sales_read_only;
GRANT SELECT ON ALL TABLES IN SCHEMA sales TO ROLE sales_read_only;
GRANT ROLE sales_read_only TO USER sales_analyst1;
RBAC can only be assigned to USERS not GROUPS.
A
USER
can have multipleROLES
2. Least Privilege Principle
If you’ve been in the cloud domain for some time now, you’ve likely heard this phrase countless times - and for good reason. The Least Privilege Principle means granting users only the permissions they need to perform their specific tasks. This approach minimizes risks by limiting access to resources, ensuring users can do their job without accidentally compromising security.
For instance, a Sales Analyst needs access to view the monthly_sales
table but shouldn’t be able to modify it.
3. Use Groups for Database-Level Security
Group users into logical categories to manage permissions more efficiently. If you've worked with IAM in AWS, you'd be familiar with this concept.
4. Secure Authentication
Strengthen how users authenticate to your Redshift cluster to prevent unauthorized access. A secure authentication process is the first line of defense against security breaches.
Store and manage database credentials securely using AWS Secrets Manager.
5. Implement Advanced Access Management Features
Leverage Redshift’s built-in capabilities for fine-grained access control.
Column-Level Security (CLS) - Restrict access to sensitive fields
Row-Level Security (RLS) - Ensure users see only the rows they are authorized to access.
Both of these features are very useful when creating reports, though a key consideration is to whether to apply this within Redshift side vs. BI tools like Power BI, Tableau, QuickSight, and etc. Each approach has its trade-offs and should align with your overall data governance strategy.
- Dynamic Data Masking (DDM) - Mask sensitive fields like credit card numbers for users without full permissions:
6. Use available Access Monitoring with Admin Scripts and Tools
Managing and monitoring access in Amazon Redshift becomes much easier when you have the right tools and scripts in place. To ensure a secure and well-governed environment, it’s a game-changer to use tools like Redshift Utils, which provide a collection of administrative views and scripts for monitoring and managing your Redshift clusters effectively.
Final Thoughts
Managing access in Amazon Redshift is about keeping your data safe while making sure users can do their jobs effectively. Simple practices like creating roles for specific tasks, giving only the necessary permissions, and using features like column and row restrictions have worked well for us and for our partners.
This approach has been effective in our case, but I’m always eager to learn from others. If you have similar practices, ideas, or constructive feedback, I’d love to hear them!
This blog is authored solely by me and reflects my personal opinions and experiences, not those of my employer. All references to products, including names, logos, and trademarks, belong to their respective owners and are used for identification purposes only.
Featured ones: