TimescaleDB in 2024: Making Postgres Faster
If I had to summarize 2024 for Timescale, I'd call it the year of Postgres for AI. From game-changing open-source launches like pgvectorscale (a performance booster for large production vector workloads with PostgreSQL + pgvector) to pgai (which integrates Postgres with LLMs for AI app development), we pushed the boundaries of what developers can achieve with Postgres. But AI wasn't the only story. TimescaleDB's time-series capabilities also evolved dramatically during 2024, making Postgres even faster for real-time analytics.
For those of you who donât know, TimescaleDB is a PostgreSQL extension for high-performance real-time analytics on time series and event data. It is available as an open-source extension or fully managed on Timescale Cloud.
I love TimescaleDB (I was a user well before I worked at Timescale ❤️), so in the spirit of sharing that love, here are my top five TimescaleDB features from 2024 and why you should care (or at least why I do 😊).
1. Chunk-Skipping Indexes on Compressed Hypertables
This feature introduces a way to add min/max indexes on compressed hypertable chunks. The planner uses these to perform chunk exclusion on columns other than the default time-based one.
This can be useful in two main cases:
- You've got a column that is correlated with the time-based partition in some way, and you'd like to use it efficiently in queries. The classic example is an order ID: as time increases, so does the order ID, so the min/max values will be tightly bounded per chunk. Adding chunk skipping lets you use the order ID in your WHERE clause and have every chunk but the matching one excluded.
- You have a column in which you want to search for outliers that don't occur often. Say you have a temperature column, and you want to find all rows that exceed a temperature threshold. With chunk skipping enabled, you only have to scan the chunks whose min/max range actually covers the threshold in your WHERE clause.
Let's take a look at how it works:
-- Create the orders hypertable and enable compression
CREATE TABLE orders (ts timestamptz, orderid serial, orderdata JSONB);
SELECT create_hypertable ('orders', by_range ('ts'));
ALTER TABLE orders SET (timescaledb.compress);
-- Insert some data, this will make 52 chunks (one per week)
INSERT INTO orders (ts)
SELECT generate_series(now() - INTERVAL '1 year', now(), '1 minute');
-- Enable chunk skipping for the orders column then compress all the data
SET timescaledb.enable_chunk_skipping = on;
SELECT enable_chunk_skipping('orders', 'orderid');
SELECT compress_chunk(c) FROM show_chunks('orders') c;
-- Query by orderid alone, this would normally hit all 52 chunks :)
EXPLAIN ANALYZE SELECT * FROM orders WHERE orderid=1;
Custom Scan (DecompressChunk) on _hyper_95_10464_chunk (cost=0.03..15.60 rows=560000 width=44) (actual time=0.032..0.033 rows=1 loops=1)
Vectorized Filter: (orderid = 1)
Rows Removed by Filter: 5605
-> Seq Scan on compress_hyper_96_10517_chunk (cost=0.00..15.60 rows=560 width=116) (actual time=0.003..0.004 rows=6 loops=1)
Planning Time: 1.436 ms
Execution Time: 0.064 ms
-- You can see the query above only hit a single chunk!
As with most new features in TimescaleDB, chunk skipping is disabled by default at launch, so remember to set timescaledb.enable_chunk_skipping to true (either globally or per session) so the planner knows you're using it. I'd recommend enabling the feature only for queries that you think will benefit from it.
Performance gains are quite data-dependent, but we have seen speedups of up to 7x when testing typical workloads.
2. Faster Real-Time Continuous Aggregates and Continuous Aggregate Policies
Continuous aggregates (CAggs) are one of my favorite features in TimescaleDB. We already give you lightning-fast analytical queries out of the box, but CAggs take it a step further by taking a SQL query and automatically materializing it in the background. I'm cheating here by squashing two CAgg features into the same item, but I'm sure you can live with that.
Faster real-time continuous aggregates
Real-time continuous aggregates allow results to be computed on the fly for newly inserted data (before the background job has materialized it). That data is then immediately included in queries against the continuous aggregate alongside the already materialized data, giving a full, up-to-date snapshot.
In Timescale 2.14.0, we made a change that increased the performance of real-time continuous aggregates by up to 50,000x (that's not a typo). For those who want the technical details: we constified our internal cagg_watermark function call, which unlocked plan-time chunk exclusion (as opposed to run-time) and made everything faster (read the link above for the full debrief).
If you haven't tried real-time (or even normal) CAggs, I'd encourage you to give them a go. Check out our newly rewritten GitHub README for a quick how-to.
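As a minimal sketch of what this looks like in practice (the metrics table and its columns here are illustrative, not from the post), you define a continuous aggregate as a materialized view, and the materialized_only setting controls whether queries also merge in not-yet-materialized rows on the fly:

```sql
-- Hypothetical hypertable: metrics(ts timestamptz, device_id int, value float)
CREATE MATERIALIZED VIEW daily_avg
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', ts) AS day,
       device_id,
       avg(value) AS avg_value
FROM metrics
GROUP BY day, device_id;

-- Enable real-time mode: queries against daily_avg will combine the
-- materialized data with freshly inserted, not-yet-materialized rows
ALTER MATERIALIZED VIEW daily_avg SET (timescaledb.materialized_only = false);
```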
Faster continuous aggregate policies
In the past, continuous aggregates could sometimes be a little I/O hungry when updates flowed in for older chunks. Before version 2.17.0, we would recalculate and rewrite each chunk with changes, but now we use an efficient merge policy to only write the changes to storage. Overall, continuous aggregate policies are lighter, use fewer system resources in the background, and complete faster.
For anyone building dashboards or running real-time analytics with late-arriving data, this is a game-changer.
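For reference, a refresh policy is what runs these merges in the background; a sketch using the hypothetical daily_avg aggregate from above (the offsets are illustrative values, not recommendations):

```sql
-- Refresh daily_avg every hour, recomputing the window of data
-- between 7 days and 1 hour old to pick up late-arriving rows
SELECT add_continuous_aggregate_policy('daily_avg',
  start_offset      => INTERVAL '7 days',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');
```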
3. Hypercore Introduction and Enhancements
We've had a hybrid row-columnar storage engine for a long time (it powers our compression), but this year we gave it an official name: hypercore. That wasn't enough for the team, so during the year, they also knocked out some great hypercore improvements. My favorites were:
- Optimized defaults: Timescale now recommends the best segment_by and order_by settings based on table configuration and statistics.
- Smarter decompression: The query planner can now evaluate more WHERE conditions before decompressing data, reducing unnecessary I/O.
- Min/max sparse indexes: Columns with btree indexes can now benefit from min/max sparse indexes after compression. This is similar to chunk skipping but operates on individual batches within chunks.
- Vectorized filters: Queries with text equality or LIKE filters are now vectorized for faster execution.
- Compressed tuple filtering: Up to 500x faster updates and deletes by avoiding the need to decompress and materialize irrelevant batches.
- Faster upserts: Improved upsert performance by more than 300x on compressed data by forcing B-tree index scans.
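To make the segment_by and order_by settings mentioned above concrete, here is a sketch of setting them explicitly on a hypothetical metrics hypertable (Timescale can now recommend values for you, but they remain configurable):

```sql
-- Compress the (hypothetical) metrics hypertable, grouping compressed
-- batches by device_id and ordering rows within each batch by time
ALTER TABLE metrics SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_id',
  timescaledb.compress_orderby   = 'ts DESC'
);
```

Choosing a segment_by column that your queries filter or group on is what lets the engine skip whole batches without decompressing them.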
We also continued our work on vectorization for real-time analytics queries, but that deserves its own mention.
4. SIMD Vectorization for Blazing Fast Analytical Queries
Real-time analytics over columnar (compressed) data got a massive performance boost this year thanks to more work on SIMD (Single Instruction, Multiple Data) vectorization. This lets us take advantage of modern CPUs when processing vectors (or arrays) of values, which hypercore can conveniently provide for each column. If you're running queries that group by segment_by columns and use aggregate functions like sum, count, avg, min, or max, you'll notice major speedups.
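For illustration, this is the shape of query that benefits (device_id standing in for a segment_by column on a hypothetical metrics hypertable):

```sql
-- Aggregates grouped by the segment_by column can be executed over
-- compressed column batches with SIMD instructions
SELECT device_id,
       count(*)   AS readings,
       avg(value) AS avg_value,
       max(value) AS max_value
FROM metrics
WHERE ts > now() - INTERVAL '1 day'
GROUP BY device_id;
```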
This is an area we are continuing to invest in heavily; you'll hear a lot more about it in the coming months. If you're after more information right now, have a read of my post on the start of our vectorization journey from last year.
5. Foreign Keys on Hypertables
For years, developers have asked for support for foreign keys on hypertables. I can even remember asking for this shortly after TimescaleDB launched!
This year, we delivered: You can now add foreign keys from regular tables to hypertables, with full support for cascading actions. While it's not going to change the world, it's one of those little developer experience bumps that make your life a little easier.
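A sketch of what this enables, using hypothetical events and annotations tables (note that unique keys on hypertables must include the partition column, so the foreign key references the event ID and timestamp together):

```sql
-- A hypertable that a regular table will reference
CREATE TABLE events (
  ts       timestamptz NOT NULL,
  event_id bigint GENERATED ALWAYS AS IDENTITY,
  payload  jsonb,
  UNIQUE (event_id, ts)  -- must include the partition column (ts)
);
SELECT create_hypertable('events', by_range('ts'));

-- A plain table with a foreign key into the hypertable,
-- including a cascading delete
CREATE TABLE annotations (
  note     text,
  event_id bigint,
  event_ts timestamptz,
  FOREIGN KEY (event_id, event_ts)
    REFERENCES events (event_id, ts) ON DELETE CASCADE
);
```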
Wrap-Up
What a year for TimescaleDB! 🎉 If you're working with time-series data or want to keep enjoying Postgres familiarity but with a significant performance boost, give TimescaleDB a try. With these new features, you can optimize your queries, reduce your storage footprint, and make real-time analytics more efficient. Set it up, test it out, and see how it fits into your projects.
Got questions? Join the Timescale Slack community, where like-minded devs and Timescale team members share their tips and will help you get started.