TimescaleDB in 2024: Making Postgres Faster
If I had to summarize 2024 for Timescale, I'd call it the year of Postgres for AI. From game-changing open-source launches like pgvectorscale (a performance booster for large production vector workloads with PostgreSQL + pgvector) to pgai (which integrates Postgres with LLMs for AI app development), we pushed the boundaries of what developers can achieve with Postgres. But AI wasn't the only story. TimescaleDB's time-series capabilities also evolved dramatically during 2024, making Postgres even faster for real-time analytics.
For those of you who donât know, TimescaleDB is a PostgreSQL extension for high-performance real-time analytics on time series and event data. It is available as an open-source extension or fully managed on Timescale Cloud.
I love TimescaleDB (I was a user well before I worked at Timescale ❤️), so in the spirit of sharing that love, here are my top five TimescaleDB features from 2024 and why you should care (or at least why I do 😊).
1. Chunk-Skipping Indexes on Compressed Hypertables
This feature introduces a way to add min/max indexes on compressed hypertable chunks. The planner uses these to perform chunk exclusion on columns other than the default time-based one.
This can be useful in two main cases:
- You've got a column that is correlated with the time-based partition in some way, and you'd like to use it efficiently in queries. The classic example is an order ID: as time increases, so does the order ID, so the min/max values will be tightly bounded per chunk. Adding chunk skipping lets you use the order ID in your WHERE clause and have every chunk but the matching one excluded.
- You have a column in which you want to search for outliers that don't occur often. Say you have a temperature column, and you want to find all rows that exceed a temperature threshold. With chunk skipping enabled, you only have to scan the chunks whose min/max range actually covers the threshold in your WHERE clause.
Let's take a look at how it works:
-- Create the orders hypertable and enable compression
CREATE TABLE orders (ts timestamptz, orderid serial, orderdata JSONB);
SELECT create_hypertable ('orders', by_range ('ts'));
ALTER TABLE orders SET (timescaledb.compress);
-- Insert some data, this will make 52 chunks (one per week)
INSERT INTO orders (ts)
SELECT generate_series(now() - INTERVAL '1 year', now(), '1 minute');
-- Enable chunk skipping for the orders column then compress all the data
SET timescaledb.enable_chunk_skipping = on;
SELECT enable_chunk_skipping('orders', 'orderid');
SELECT compress_chunk(c) FROM show_chunks('orders') c;
-- Query by orderid alone, this would normally hit all 52 chunks :)
EXPLAIN ANALYZE SELECT * FROM orders WHERE orderid=1;
Custom Scan (DecompressChunk) on _hyper_95_10464_chunk (cost=0.03..15.60 rows=560000 width=44) (actual time=0.032..0.033 rows=1 loops=1)
Vectorized Filter: (orderid = 1)
Rows Removed by Filter: 5605
-> Seq Scan on compress_hyper_96_10517_chunk (cost=0.00..15.60 rows=560 width=116) (actual time=0.003..0.004 rows=6 loops=1)
Planning Time: 1.436 ms
Execution Time: 0.064 ms
-- You can see the query above only hit a single chunk!
As with most new features in TimescaleDB, chunk skipping is disabled by default at launch, so remember to set timescaledb.enable_chunk_skipping to true (either globally or per session) so the planner knows you're using it. I'd recommend enabling the feature only for queries that you think will benefit from it.
Performance gains are quite data-dependent, but we have seen speedups of up to 7x when testing typical workloads.
2. Faster Real-Time Continuous Aggregates and Continuous Aggregate Policies
Continuous aggregates (CAggs) are one of my favorite features in TimescaleDB. We already give you lightning-fast analytical queries out of the box, but CAggs take it a step further by taking a SQL query and automatically materializing it in the background. I'm cheating here by squashing two CAgg features into the same item, but I'm sure you can live with that.
Faster real-time continuous aggregates
Real-time continuous aggregates allow results to be computed on the fly for newly inserted data (before the background job has materialized it). That data is then immediately included in queries against the continuous aggregate alongside the already materialized data, giving a full, up-to-date snapshot.
In Timescale 2.14.0, we made a change that increased the performance of real-time continuous aggregates by up to 50,000x (that's not a typo). For those who want the technical details: we constified our internal cagg_watermark function call, which unlocked plan-time chunk exclusion (as opposed to run-time) and made everything faster (read the link above for the full debrief).
If you haven't tried real-time (or even normal) CAggs, I'd encourage you to give them a go. Check out our newly rewritten GitHub README for a quick how-to.
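As a minimal sketch of what this looks like in practice (the metrics table and its columns here are illustrative, not from the post), you define a continuous aggregate as a materialized view, and the materialized_only setting controls whether queries also merge in not-yet-materialized rows on the fly:

```sql
-- Hypothetical hypertable: metrics(ts timestamptz, device_id int, value float)
CREATE MATERIALIZED VIEW daily_avg
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', ts) AS day,
       device_id,
       avg(value) AS avg_value
FROM metrics
GROUP BY day, device_id;

-- Enable real-time mode: queries against daily_avg will combine the
-- materialized data with freshly inserted, not-yet-materialized rows
ALTER MATERIALIZED VIEW daily_avg SET (timescaledb.materialized_only = false);
```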
Faster continuous aggregate policies
In the past, continuous aggregates could sometimes be a little I/O hungry when updates flowed in for older chunks. Before version 2.17.0, we would recalculate and rewrite each chunk with changes, but now we use an efficient merge policy to only write the changes to storage. Overall, continuous aggregate policies are lighter, use fewer system resources in the background, and complete faster.
For anyone building dashboards or running real-time analytics with late-arriving data, this is a game-changer.
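For reference, a refresh policy is what runs these merges in the background; a sketch using the hypothetical daily_avg aggregate from above (the offsets are illustrative values, not recommendations):

```sql
-- Refresh daily_avg every hour, recomputing the window of data
-- between 7 days and 1 hour old to pick up late-arriving rows
SELECT add_continuous_aggregate_policy('daily_avg',
  start_offset      => INTERVAL '7 days',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');
```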
3. Hypercore Introduction and Enhancements
We've had a hybrid row-columnar storage engine for a long time (it powers our compression), but this year we gave it an official name: hypercore. That wasn't enough for the team, so during the year, they also knocked out some great hypercore improvements. My favorites were:
- Optimized defaults: Timescale now recommends the best segment_by and order_by settings based on table configuration and statistics.
- Smarter decompression: The query planner can now evaluate more WHERE conditions before decompressing data, reducing unnecessary I/O.
- Min/max sparse indexes: Columns with btree indexes can now benefit from min/max sparse indexes after compression. This is similar to chunk skipping but operates on individual batches within chunks.
- Vectorized filters: Queries with text equality or LIKE filters are now vectorized for faster execution.
- Compressed tuple filtering: Up to 500x faster updates and deletes by avoiding the need to decompress and materialize irrelevant batches.
- Faster upserts: Improved upsert performance by more than 300x on compressed data by forcing B-tree index scans.
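To make the segment_by and order_by settings mentioned above concrete, here is a sketch of setting them explicitly on a hypothetical metrics hypertable (Timescale can now recommend values for you, but they remain configurable):

```sql
-- Compress the (hypothetical) metrics hypertable, grouping compressed
-- batches by device_id and ordering rows within each batch by time
ALTER TABLE metrics SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_id',
  timescaledb.compress_orderby   = 'ts DESC'
);
```

Choosing a segment_by column that your queries filter or group on is what lets the engine skip whole batches without decompressing them.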
We also continued our work on vectorization for real-time analytics queries, but that deserves its own mention.
4. SIMD Vectorization for Blazing Fast Analytical Queries
Real-time analytics over columnar (compressed) data got a massive performance boost this year thanks to more work on SIMD (Single Instruction, Multiple Data) vectorization. This lets us take advantage of modern CPUs when processing vectors (or arrays) of values, which hypercore can conveniently provide for each column. If you're running queries that group by segment_by columns and use aggregate functions like sum, count, avg, min, or max, you'll notice major speedups.
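For illustration, this is the shape of query that benefits (device_id standing in for a segment_by column on a hypothetical metrics hypertable):

```sql
-- Aggregates grouped by the segment_by column can be executed over
-- compressed column batches with SIMD instructions
SELECT device_id,
       count(*)   AS readings,
       avg(value) AS avg_value,
       max(value) AS max_value
FROM metrics
WHERE ts > now() - INTERVAL '1 day'
GROUP BY device_id;
```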
This is an area we are continuing to invest in heavily; you'll hear a lot more about it in the coming months. If you're after more information right now, have a read of my post on the start of our vectorization journey from last year.
5. Foreign Keys on Hypertables
For years, developers have asked for support for foreign keys on hypertables. I can even remember asking for this shortly after TimescaleDB launched!
This year, we delivered: You can now add foreign keys from regular tables to hypertables, with full support for cascading actions. While it's not going to change the world, it's one of those little developer experience bumps that make your life a little easier.
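A sketch of what this enables, using hypothetical events and annotations tables (note that unique keys on hypertables must include the partition column, so the foreign key references the event ID and timestamp together):

```sql
-- A hypertable that a regular table will reference
CREATE TABLE events (
  ts       timestamptz NOT NULL,
  event_id bigint GENERATED ALWAYS AS IDENTITY,
  payload  jsonb,
  UNIQUE (event_id, ts)  -- must include the partition column (ts)
);
SELECT create_hypertable('events', by_range('ts'));

-- A plain table with a foreign key into the hypertable,
-- including a cascading delete
CREATE TABLE annotations (
  note     text,
  event_id bigint,
  event_ts timestamptz,
  FOREIGN KEY (event_id, event_ts)
    REFERENCES events (event_id, ts) ON DELETE CASCADE
);
```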
Wrap-Up
What a year for TimescaleDB! 🎉 If you're working with time-series data or want to keep enjoying Postgres familiarity but with a significant performance boost, give TimescaleDB a try. With these new features, you can optimize your queries, reduce your storage footprint, and make real-time analytics more efficient. Set it up, test it out, and see how it fits into your projects.
Got questions? Join the Timescale Slack community, where like-minded devs and Timescale team members share their tips and will help you get started.