dev-resources.site
for different kinds of informations.
Real-Time Data Integration Techniques Using PostgreSQL Foreign Data Wrapper (FDW)
Introduction
When operating relational databases on AWS, there may be situations where you need to exchange data between multiple databases. For example, you might have Aurora PostgreSQL as your source database containing a Products
table, and you want to reference it in real time from a PostgreSQL database (the target) located in a different VPC. You then wish to perform a JOIN with the Transactions
table on the target side.
- No need for data duplication or ETL
- Real-time access to the latest source data
- Flexible use for historical aggregation and reporting
These requirements can be met using Foreign Data Wrapper (FDW). In this article, we’ll start by explaining what FDW is, then discuss its advantages and disadvantages, network configurations, and finally walk through the setup steps in order.
What Is Foreign Data Wrapper (FDW)?
Foreign Data Wrapper (FDW) is an extension feature of PostgreSQL that provides a mechanism allowing you to treat external data sources (e.g., tables from other databases) as if they were local tables.
In practice, you install an FDW extension (such as postgres_fdw
), then configure “servers,” “user mappings,” and “foreign tables” to reference the tables located on a remote database.
Advantages and Disadvantages of Using FDW
Advantages
Real-Time Access
You can always retrieve the latest data from the source database. There’s no need for ETL batch processing, making it simple to leverage the most up-to-date information.Improved Development Efficiency
By simply writing SQL JOIN statements, you can seamlessly integrate local and remote tables, simplifying application and data analysis implementations.Reduced Data Duplication
Compared to a batch approach that duplicates data, FDW can help minimize storage costs and prevent unnecessary data redundancy.
Disadvantages
Performance Risks
Because access is remote, query response times can vary depending on network latency and the load on the source database.High Dependency on the Source DB
If the source database experiences failures or heavy load, any queries passing through FDW will be directly affected.Limited Functionality in Some Cases
Certain PostgreSQL features (like transaction management or triggers) may not be available on foreign tables.
Network Configuration Steps (VPC Peering or Transit Gateway)
Even if the source (Aurora PostgreSQL) and the target (PostgreSQL) are in the same account but reside in different VPCs, you need to configure one of the following methods to securely enable network connectivity:
-
VPC Peering
- Create a peering connection between the source VPC and the target VPC.
- Update the route tables in each VPC to include a route pointing to the peering connection for the relevant IP addresses.
- Open port 5432 (PostgreSQL port) in the security groups’ inbound/outbound rules.
-
AWS Transit Gateway
- If you need to connect multiple VPCs or multiple Regions from a single hub, use a Transit Gateway.
- Attach each VPC to the Transit Gateway and configure the necessary routes.
- Similarly, make sure security groups and NACLs allow traffic on the required ports.
In either case, be sure to configure the correct security groups to grant access to Aurora.
Step-by-Step Guide to Setting Up FDW
Below is a set of concrete steps assuming the following configuration:
-
Source: Aurora PostgreSQL (in a separate VPC)
-
Endpoint
:maindb-instance-1.ctqegcgyi6u1.us-east-1.rds.amazonaws.com
- Contains a
Products
table
-
-
Target: PostgreSQL (in a separate VPC)
-
Endpoint
:subdb-instance-1.ctqegcgyi6u1.us-east-1.rds.amazonaws.com
- Contains a
Transactions
table and needs to JOIN withProducts
-
1. Preparation on the Aurora PostgreSQL (Source) Side
-
VPC Peering or Transit Gateway Configuration
- Ensure that the Aurora VPC and the target PostgreSQL VPC can communicate over the network.
- In the Aurora instance’s security group, allow inbound traffic on port 5432 from the target side.
-
Table Definition in the Source Database
CREATE TABLE Products ( product_id UUID PRIMARY KEY, product_name TEXT NOT NULL, category TEXT NOT NULL, price NUMERIC(10, 2) NOT NULL, stock INTEGER NOT NULL );
Assumes data is already populated.
2. Preparation on the PostgreSQL (Target) Side
-
Launch the Target PostgreSQL Instance
- This could be RDS for PostgreSQL, PostgreSQL on EC2, a container-based setup, etc.
- Make sure port 5432 is also open to accept connections.
-
Table Definition in the Target Database
CREATE TABLE Transactions ( transaction_id UUID PRIMARY KEY, product_id UUID NOT NULL, quantity INTEGER NOT NULL, total_price NUMERIC(10, 2) NOT NULL, transaction_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP );
This will be used in the future together with the
foreign_products
table (FDW) for JOIN operations. Installing and Enabling FDW
On the target database (PostgreSQL), make postgres_fdw
available for use:
-- Execute on the target database
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
Note: Depending on your managed service or PostgreSQL version, this extension may be disabled by default. You may need to enable it via the management console or similar interface.
- Creating a Server Connection (CREATE SERVER)
Register the connection information for the source database (Aurora PostgreSQL) in the target database:
CREATE SERVER source_aurora FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (
host 'maindb-instance-1.ctqegcgyi6u1.us-east-1.rds.amazonaws.com',
port '5432',
dbname 'postgres' -- Name of the source database
);
Here, “source_aurora” is simply an identifier.
The “host” and “port” specify the Aurora connection details.
“dbname” is the source database name.
- Creating a User Mapping (CREATE USER MAPPING)
Set up the authentication details needed to access the source via FDW:
CREATE USER MAPPING FOR target_user SERVER source_aurora
OPTIONS (
user 'aurora_user',
password 'aurora_password'
);
Sample:
CREATE USER MAPPING FOR postgres SERVER source_aurora
OPTIONS (
user 'postgres',
password 'xxx'
);
“FOR target_user” refers to a user present on the target PostgreSQL side.
“user” and “password” are valid credentials for Aurora PostgreSQL.
- Creating a Foreign Table (CREATE FOREIGN TABLE)
Define the Products
table from the Aurora source as a “foreign table” on the target:
CREATE FOREIGN TABLE foreign_products (
product_id UUID,
product_name TEXT,
category TEXT,
price NUMERIC(10, 2),
stock INTEGER
) SERVER source_aurora
OPTIONS (
schema_name 'public', -- Schema in the source DB
table_name 'products' -- Table name in the source DB
);
This allows you to run a simple query, such as SELECT * FROM foreign_products;
on the target database, which in turn retrieves data from the Products
table in the remote Aurora database.
Tip: For PostgreSQL 9.5 and later, you can use the IMPORT FOREIGN SCHEMA
feature to import multiple tables from the source schema at once, which is convenient when you need to work with numerous tables.
Confirming the Configuration
You can verify foreign table information using the following query:
SELECT
fs.srvname AS server_name,
fs.srvoptions AS server_options,
um.umuser AS user_id,
um.umoptions AS user_options,
ft.foreign_table_name AS table_name,
ft.foreign_table_schema AS table_schema
FROM
pg_foreign_server fs
LEFT JOIN
pg_user_mappings um
ON
fs.oid = um.srvid -- changed from umserver to srvid
LEFT JOIN
information_schema.foreign_tables ft
ON
fs.srvname = ft.foreign_server_name;
Sample Results
Attribute | Value |
---|---|
server_name | source_aurora |
server_options | {host=maindb-instance-1.ctqegcgyi6u1.us-east-1.rds.amazonaws.com, port=5432, dbname=postgres} |
user_id | 16400 |
user_options | {user=postgres, password=xxx} |
table_name | foreign_products |
table_schema | public |
Example of a JOIN Query
Once the configuration is set up correctly, you can easily join the target’s Transactions
table with the remote foreign_products
:
SELECT
t.transaction_id,
f.product_name,
f.category,
t.quantity,
t.total_price,
t.transaction_date
FROM
transactions t
JOIN
foreign_products f
ON
t.product_id = f.product_id
WHERE
f.category = 'Electronics';
- Fields such as
f.product_name
andf.category
come from the Aurora table in real time. -
t.transaction_date
is a local column in the target (Transactions
) table.
This makes it straightforward to leverage the latest source database information on the target side.
Sample Results (Excerpt)
transaction_id | product_name | category | quantity | total_price | transaction_date
--------------------------------------+--------------+-------------+----------+-------------+---------------------
f8b8b097-75ce-47e6-8fb8-42adbf3456fc | Growth | Electronics | 4 | 421.16 | 2025-01-02 21:57:27
8b8a1390-6f24-4d07-9239-adc5df9ab92e | Lose | Electronics | 6 | 448.08 | 2025-01-03 03:29:54
86386a4b-4eda-4ff4-9854-87b65d58e3aa | Without | Electronics | 7 | 2537.57 | 2025-01-01 20:44:28
203291b1-24af-443e-ad34-092151a99c61 | Wonder | Electronics | 8 | 2098.00 | 2025-01-02 14:48:15
fd75b3fa-f564-4599-9faf-c29dd42f82e3 | How | Electronics | 3 | 1176.63 | 2025-01-02 15:22:55
42a281a5-de30-4bf2-b528-bd3b942e35b6 | Pattern | Electronics | 2 | 186.56 | 2025-01-01 06:54:58
03a7c005-9f1c-46ca-b102-352dd0ed32ae | Win | Electronics | 6 | 2955.60 | 2025-01-01 13:39:38
6bc0bdd8-0d9e-47a8-853a-69ac235ca983 | Successful | Electronics | 2 | 986.18 | 2025-01-02 23:59:29
aaffdb6e-dbd1-4f33-a788-d62bdbf012bd | Every | Electronics | 1 | 184.57 | 2025-01-02 07:48:06
76f0a5db-297a-42fc-b8e6-d0e874031471 | Sing | Electronics | 3 | 1426.11 | 2025-01-03 02:17:25
b1dd0e57-4b49-413c-9514-aeb879200138 | Paper | Electronics | 5 | 481.60 | 2025-01-01 04:53:51
23f60790-6cb8-4c67-8c2c-60afe30289fb | Teacher | Electronics | 6 | 429.30 | 2025-01-02 21:37:51
638f7951-7a70-4da5-b87d-41e681143811 | Society | Electronics | 3 | 302.22 | 2025-01-04 04:47:11
Performance
In PostgreSQL’s Foreign Data Wrapper (FDW), there is no built-in caching mechanism. Each time you access a foreign table, a query is sent to the remote database (the server configured for FDW) to retrieve the latest data.
Below are some performance measurements taken when retrieving data within the same AWS Region. The following query yielded these results:
(5987 rows) Time: 1094.468 ms (00:01.094)
Overall, this performance is quite good and can be considered practical for real-world use.
Tuning 1
PostgreSQL Foreign Data Wrapper (FDW) can push parts of query planning to the remote database to streamline communication. This is often referred to as query pushdown. Thanks to pushdown, not all data must be retrieved every time.
Pushdown Explained
FDW minimizes unnecessary data transfer by sending parts of the query to the remote database. For instance:
- WHERE clauses
- LIMIT
- Some JOIN conditions
- Queries that select only specific columns (SELECT column_name)
Query Optimization Tuning
Sample Query
To illustrate room for improvement, let’s consider a less optimal SQL query:
SELECT
p1.product_name,
SUM(t.quantity) AS total_sold
FROM
transactions t
JOIN (
SELECT
*
FROM
foreign_products ) p1
ON
t.product_id = p1.product_id
WHERE
t.transaction_date >= '2022-01-01'
AND
p1.category = 'Electronics'
AND
p1.product_id
IN (
SELECT
product_id
FROM
foreign_products
WHERE
stock > 10
)
GROUP BY
p1.product_name;
(830 rows) Time: 531.426 ms
Although not terrible, it can be improved. Let’s analyze the query plan via EXPLAIN VERBOSE (sql)
Problem Analysis
-
Double Scanning of the Remote Table
The remote table
foreign_products
is being scanned twice:- First scan:
SELECT product_id, product_name FROM public.products WHERE ((category = 'Electronics'))
- Retrieves 6 rows for `category = 'Electronics'`.
- Second scan:
SELECT product_id FROM public.products WHERE ((stock > 10))
- Retrieves 683 rows for `stock > 10`.
Problem:
- The remote SQL is executed twice, increasing the communication cost with the remote database.
- Data is unnecessarily processed by a local Hash Join.
- Unnecessary Local Processing
- There is a local Hash Join happening.
- Conditions like `category` and `stock` that could be handled remotely are instead processed locally, reducing efficiency.
- Full Table Scan on the Local
transactions
Table
- The local `transactions` table is undergoing a Seq Scan:
`Seq Scan on public.transactions t (cost=0.00..685.00 rows=29997 width=20) Filter: (t.transaction_date >= '2022-01-01')`
- Impact:
- Because there is no index on `transaction_date`, all 29,997 rows are being scanned.
Points for Improvement
-
Perform JOIN and condition filtering remotely
Apply conditions (
category = 'Electronics'
andstock > 10
) on the remote database to reduce unnecessary local joins. For instance:
WITH filtered_products AS (
SELECT
product_id,
product_name
FROM
foreign_products
WHERE
category = 'Electronics'
AND
stock > 10
)
SELECT
p.product_name,
SUM(t.quantity) AS total_sold
FROM
transactions t
JOIN
filtered_products p
ON
t.product_id = p.product_id
WHERE
t.transaction_date >= '2022-01-01'
GROUP BY
p.product_name;
-
Add an Index to the Local Table
Create an index on
transaction_date
in thetransactions
table to avoid a full table scan.
Results
(830 rows) Time: 387.544 ms
This is roughly a 30% reduction in execution time.
From the query plan, you can see that remote access has been consolidated into a single call. While this example is not unique to foreign tables, inefficiencies in the query become more evident when using FDW. Hence, analyzing queries with EXPLAIN
is crucial for avoiding performance issues.
Tuning 2 – Using Materialized Views for Caching
As another method of performance tuning, consider using caching. Since FDW does not provide built-in caching by default, you can use a Materialized View to cache data locally. This approach offloads frequent remote queries and complex processes from the remote database to the local environment, reducing load on the remote data source and further shortening query execution times.
Materialized View
Below is an example of caching the entire foreign_products
table:
CREATE MATERIALIZED VIEW products AS SELECT * FROM foreign_products;
SQL
Next, use the materialized view to filter data. In the following query, only the reference to the table has been changed from foreign_products
to the materialized view named products
.
WITH filtered_products AS (
SELECT
product_id,
product_name
FROM
products
WHERE
category = 'Electronics'
AND
stock > 10
)
SELECT
p.product_name,
SUM(t.quantity) AS total_sold
FROM
transactions t
JOIN
filtered_products p
ON
t.product_id = p.product_id
WHERE
t.transaction_date >= '2022-01-01'
GROUP BY
p.product_name;
(830 rows) Time: 353.511 ms
This yields roughly a 10% reduction in execution time. The exact effect will vary depending on data volume, query complexity, master database location, and how often the view is accessed. In this example, the entire table was cached, but caching just the subset of data you need may improve performance even more.
Because a materialized view will not reflect real-time data updates, its use depends on your requirements. For instance, in a batch workflow, you could refresh the materialized view (to get the latest data) before running the main process, then use the view locally for faster batch processing. This can be very effective.
Operational Points and Considerations
- Performance Monitoring
- Since FDW is a remote connection, network latency may significantly impact queries.
- If you perform frequent JOIN operations, consider indexing and caching strategies on the source side as needed.
- Load on the Source Database
- FDW directly references the latest data on the source. Repeatedly issuing heavy queries can affect the source’s performance.
- Monitor query frequency and complexity, and consider using materialized views if appropriate.
- Permissions and Security
- In addition to VPC peering and security group settings, ensure proper roles and privileges are set at the database level to restrict unwanted access.
- Combining Partial Data Migration
- If certain data does not need to be accessed in real time, using ETL or snapshots to bring it into the target might be faster.
- FDW is not a one-size-fits-all solution; combine various approaches based on the specific use case.
- Dependency on PostgreSQL
- Because this approach depends on PostgreSQL features, it may constrain future system architecture decisions. Conduct sufficient reviews before committing to this design.
Summary
By using Foreign Data Wrapper (FDW), you can JOIN tables across Aurora PostgreSQL (source) and Aurora PostgreSQL (target) even if they reside in separate VPCs, making them appear as if they were part of the same database. Proper network configuration (VPC Peering or Transit Gateway) and basic FDW setup (server, user mapping, and foreign tables) enable real-time data access with minimal effort.
- Advantages: Real-time access, enhanced development efficiency, reduced data duplication
- Disadvantages: Potential performance risks, dependency on the source DB, limited functionality in certain scenarios
The key is to set up your network, security, and source-side optimizations with care. This approach is especially effective for scenarios where on-demand access to the latest data is required alongside seamless local table integration on the target side. Consider introducing it in your next project if it meets your requirements.
Featured ones: