dev-resources.site
for different kinds of informations.
Index Filtering in PostgreSQL and YugabyteDB (Index Scan instead of Index Only Scan)
This is inspired by Henrietta Dombrovskaya's recent post (Can we use this index, please? โ Why not?), which reveals an SQL optimization method that requires the user to rewrite the query in PostgreSQL. This is also a tiny counter-example to my general assertion that seeking the fully covering index, the Index Only Scan, isn't always necessary, as I explained in Covering Index nuances: which columns to cover (WHERE, ORDER BY, LIMIT, SELECT)?
In this scenario, with PostgreSQL, an Index-Only Scan was necessary to optimize the filtering, even with an index that includes all columns utilized by the filter condition.
Additionally, I will demonstrate that in some cases, YugabyteDB outperforms by filtering directly on the index entries, even during an Index Scan, and a smaller index can be good enough.
Example with an immutable function (EXTRACT on TIMESTAMP)
I'll start with an example that diverges from Hettie's case. For the demonstration with immutable function predicates, I use TIMESTAMP instead of TIMESTAMPTZ, although TIMESTAMPTZ is often recommended for storing points in time as they relate to a time zone.
PostgreSQL with immutable function
I created the following table where columns "category" and "created_at" are indexed:
create table demo (
id bigserial
,category int
,created_at timestamp
,filler text default lpad('x',9999,'x')
)
;
insert into demo(category,created_at)
select 0, now() + generate_series(0,99999) * interval '1 minute'
;
create index on demo ( category, created_at )
;
vacuum analyze demo
;
This table is stored in 2223 pages of heap table and 388 pages of index, as visible from the query planner statistics and also when scanning them though explain (analyze buffers)
:
postgres=> select relname, relpages, reltuples from pg_class where oid in('demo'::regclass,'demo_category_created_at_idx'::regclass);
relname | relpages | reltuples
------------------------------+----------+-----------
demo | 2223 | 100000
demo_category_created_at_idx | 388 | 100000
(2 rows)
postgres=> explain (analyze, costs off, summary off, buffers)
select * from demo;
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on demo (actual time=0.008..5.409 rows=100000 loops=1)
Buffers: shared hit=2223
postgres=> explain (analyze, costs off, summary off, buffers)
select category, created_at from demo;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Index Only Scan using demo_category_created_at_idx on demo (actual time=0.046..7.013 rows=100000 loops=1)
Heap Fetches: 0
Buffers: shared hit=389
(3 rows)
The following query reads rows for one "category" within a range of "created_at", not very selective, and an additional filter on the hour and minute, which is very selective:
postgres=> explain (analyze, costs off, summary off, buffers)
select demo.* from demo
where category=0
and created_at between '01-01-2025' and '12-31-2025'
and extract (hour from created_at)=16 and extract (minute from created_at)=15
;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using demo_category_created_at_idx on demo (actual time=0.172..25.991 rows=70 loops=1)
Index Cond: ((category = 0) AND (created_at >= '2025-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2025-12-31 00:00:00'::timestamp without time zone))
Filter: ((EXTRACT(hour FROM created_at) = '16'::numeric) AND (EXTRACT(minute FROM created_at) = '15'::numeric))
Rows Removed by Filter: 99930
Buffers: shared hit=2609
The result is 70 rows, and all the WHERE clause predicates can be applied to the index entries, which include "category" and "created_at" in their key. Then, I could expect a maximum of 70 buffers from the table to get the non-indexed columns for the "SELECT *" projection. Seventy rows accessed by CTID should be fast and not require an Index Only Scan. However, this query read most of the table, 2609 pages from the shared buffers.
The reason is visible on the execution plan: the predicate on the hour and minute extracted from "created_at" is in Filter
, which applies after the Index Scan
, after reading the table. Its impact is visible in Rows Removed by Filter
: 99930 rows have been fetched from the table to be discarded afterward when applying this filter.
Note that a bitmap scan execution plan differentiates the access to the index and the heap and it is more visible that the Filter
applies on the Heap Scan
and not the Index Scan
:
postgres=> explain (analyze, costs off, summary off, buffers)
select demo.* from demo
where category=0
and created_at between '01-01-2025' and '12-31-2025'
and extract (hour from created_at)=16 and extract (minute from created_at)=15
;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on demo (actual time=4.271..23.077 rows=70 loops=1)
Recheck Cond: ((category = 0) AND (created_at >= '2025-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2025-12-31 00:00:00'::timestamp without time zone))
Filter: ((EXTRACT(hour FROM created_at) = '16'::numeric) AND (EXTRACT(minute FROM created_at) = '15'::numeric))
Rows Removed by Filter: 99930
Heap Blocks: exact=2223
Buffers: shared hit=2609
-> Bitmap Index Scan on demo_category_created_at_idx (actual time=3.863..3.864 rows=100000 loops=1)
Index Cond: ((category = 0) AND (created_at >= '2025-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2025-12-31 00:00:00'::timestamp without time zone))
Buffers: shared hit=386
To filter after the index access but before the heap access, we can explicitly separate the index and table access, as Hettie exposed in her blog post:
postgres=> explain (analyze, costs off, summary off, buffers)
select demo.* from demo where (category,created_at) in (
select category,created_at from demo
where category=0
and created_at between '01-01-2025' and '12-31-2025'
and extract (hour from created_at)=16 and extract (minute from created_at)=15
);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (actual time=18.120..18.449 rows=70 loops=1)
Buffers: shared hit=669
-> Unique (actual time=18.090..18.103 rows=70 loops=1)
Buffers: shared hit=389
-> Sort (actual time=18.087..18.090 rows=70 loops=1)
Sort Key: demo_1.category, demo_1.created_at
Sort Method: quicksort Memory: 28kB
Buffers: shared hit=389
-> Index Only Scan using demo_category_created_at_idx on demo demo_1 (actual time=0.182..18.063 rows=70 loops=1)
Index Cond: ((category = 0) AND (created_at >= '2025-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2025-12-31 00:00:00'::timestamp without time zone))
Filter: ((EXTRACT(hour FROM created_at) = '16'::numeric) AND (EXTRACT(minute FROM created_at) = '15'::numeric))
Rows Removed by Filter: 99930
Heap Fetches: 0
Buffers: shared hit=389
-> Index Scan using demo_category_created_at_idx on demo (actual time=0.004..0.005 rows=1 loops=70)
Index Cond: ((category = 0) AND (created_at = demo_1.created_at))
Buffers: shared hit=280
The subquery selects only the columns included in the index from an Index Only Scan
, and all filtering happens there. Access to the table is more expensive, with an additional Index Scan on the same index, but it occurs on a limited number of rows in this case (70 rows instead of 70+99930=100000 in the previous case).
YugabyteDB and immutable function (EXTRACT on TIMESTAMP)
I have run the same on YugabyteDB. There's no need for any trick, and the dist
option displays more information about the number of rows scanned from the index and the table:
yugabyte=# explain (analyze, costs off, summary off, dist)
select demo.* from demo
where category=0
and created_at between '01-01-2025' and '12-31-2025'
and extract (hour from created_at)=16 and extract (minute from created_at)=15
;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using demo_category_created_at_idx on demo (actual time=87.445..87.968 rows=70 loops=1)
Index Cond: ((category = 0) AND (created_at >= '2025-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2025-12-31 00:00:00'::timestamp without time zone))
Storage Index Filter: ((EXTRACT(hour FROM created_at) = '16'::numeric) AND (EXTRACT(minute FROM created_at) = '15'::numeric))
Storage Table Read Requests: 1
Storage Table Read Execution Time: 1.630 ms
Storage Table Rows Scanned: 70
Storage Index Read Requests: 1
Storage Index Read Execution Time: 85.479 ms
Storage Index Rows Scanned: 100000
Like in PostgreSQL, it is an Index Scan because the index does not cover all columns selected, but there's a Storage Index Filter
that is applied before the access to the table which would be seen as Storage Table Filter
if pushed down, or Filter
if not.
Like with PostgreSQL, 100000 index entries were read, but the filter applied there meant that only 70 table rows were read, similar to the query with Hettie's rewrite, without changing the query.
This is an optimization by Yugabyte to reduce the network transfer on distributed tables, which benefits this case. However, not all functions can be pushed down, mainly when they depend on server configuration. Then, the behavior reverts to PostgreSQL so that the configuration of the query layer applies. We will look at this later when using TIMESTAMPTZ instead of TIMESTAMP.
First, I would like to show that it is not a limitation of Index Scan vs. Index Only Scan, and partially covering indexes can help.
Example with an index covering the immutable function (EXTRACT on TIMESTAMP)
I used only immutable functions (extract from a TIMESTAMP), and it is possible to cover the predicates without a fully covering Index Only Scan:
create index on demo (
category, created_at
, (extract (hour from created_at))
, (extract (minute from created_at))
)
;
PostgreSQL can apply the filter on the index entries, visible in Index Cond
:
postgres=> explain (analyze, costs off, summary off, buffers)
select demo.* from demo
where category=0
and created_at between '01-01-2025' and '12-31-2025'
and extract (hour from created_at)=16 and extract (minute from created_at)=15
;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using demo_category_created_at_extract_extract1_idx1 on demo (actual time=0.058..4.436 rows=70 loops=1)
Index Cond: ((category = 0) AND (created_at >= '2025-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2025-12-31 00:00:00'::timestamp without time zone) AND (EXTRACT(hour FROM created_at) = '16'::numeric) AND (EXTRACT(minute FROM created_at) = '15'::numeric))
Buffers: shared hit=668
YugabyteDB does the same with additional information with the dist
option of explain
:
yugabyte=# explain (analyze, costs off, summary off, dist)
select demo.* from demo
where category=0
and created_at between '01-01-2025' and '12-31-2025'
and extract (hour from created_at)=16 and extract (minute from created_at)=15
;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using demo_category_created_at_extract_extract1_idx on demo (actual time=101.617..102.129 rows=70 loops=1)
Index Cond: ((category = 0) AND (created_at >= '2025-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2025-12-31 00:00:00'::timestamp without time zone) AND (EXTRACT(hour FROM created_at) = '16'::numeric) AND (EXTRACT(minute FROM created_at) = '15'::numeric))
Storage Table Read Requests: 1
Storage Table Read Execution Time: 1.834 ms
Storage Table Rows Scanned: 70
Storage Index Read Requests: 1
Storage Index Read Execution Time: 99.426 ms
Storage Index Rows Scanned: 70
In both cases, adding the function's result to the index allowed the query planner to push down the condition to the index scan.
However, this is possible only with immutable functions. Extracting the hour and minute from a TIMESTAMP is deterministic. This is not the case when the timestamp includes the timezone because it depends on the database configuration. Changing the default timezone would corrupt the index.
Example with non-immutable function (EXTRACT on TIMESTAMPTZ)
Hettie's case defined "created_at" as TIMESTAMPTZ (which is a good idea to store timestamps that can be compared with each other, but maybe not when extracting the hour and minute):
create table demo (
id bigserial
,category int
,created_at timestamptz
,filler text default lpad('x',9999,'x')
)
;
insert into demo(category,created_at)
select 0, now() + generate_series(0,99999) * interval '1 minute'
;
create index on demo ( category, created_at )
;
vacuum analyze demo
;
Covering the extract
is not possible anymore because it not deterministic:
postgres=> create index on demo (
category, created_at
, (extract (hour from created_at))
, (extract (minute from created_at))
)
;
ERROR: functions in index expression must be marked IMMUTABLE
extract(text, timestamp with time zone)
is stable (deterministic within the scope of one SQL call) but not immutable (deterministic forever):
postgres=> \df+ extract
List of functions
Schema | Name | Result data type | Argument data types | Type | Volatility | Parallel | Owner | Security | Access privileges | Language | Internal name | Description
------------+---------+------------------+-----------------------------------+------+------------+----------+--------------+----------+-------------------+----------+---------------------+---------------------------------------------
pg_catalog | extract | numeric | text, date | func | immutable | safe | zenith_admin | invoker | | internal | extract_date | extract field from date
pg_catalog | extract | numeric | text, interval | func | immutable | safe | zenith_admin | invoker | | internal | extract_interval | extract field from interval
pg_catalog | extract | numeric | text, time with time zone | func | immutable | safe | zenith_admin | invoker | | internal | extract_timetz | extract field from time with time zone
pg_catalog | extract | numeric | text, time without time zone | func | immutable | safe | zenith_admin | invoker | | internal | extract_time | extract field from time
pg_catalog | extract | numeric | text, timestamp with time zone | func | stable | safe | zenith_admin | invoker | | internal | extract_timestamptz | extract field from timestamp with time zone
pg_catalog | extract | numeric | text, timestamp without time zone | func | immutable | safe | zenith_admin | invoker | | internal | extract_timestamp | extract field from timestamp
It's not possible to add this function to the index.
YugabyteDB applied the filter on the index entries because the predicate was pushed down to the storage layer. However, this doesn't happen with this function:
yugabyte=# \d demo
Table "public.demo"
Column | Type | Collation | Nullable | Default
------------+--------------------------+-----------+----------+----------------------------------
id | bigint | | not null | nextval('demo_id_seq'::regclass)
category | integer | | |
created_at | timestamp with time zone | | |
filler | text | | | lpad('x'::text, 9999, 'x'::text)
Indexes:
"demo_category_created_at_idx" lsm (category ASC, created_at ASC)
yugabyte=# explain (analyze, costs off, summary off, dist)
select demo.* from demo
where category=0
and created_at between '01-01-2025' and '12-31-2025'
and extract (hour from created_at)=16 and extract (minute from created_at)=15
;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using demo_category_created_at_idx on demo (actual time=23.135..1925.909 rows=70 loops=1)
Index Cond: ((category = 0) AND (created_at >= '2025-01-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2025-12-31 00:00:00+00'::timestamp with time zone))
Filter: ((EXTRACT(hour FROM created_at) = '16'::numeric) AND (EXTRACT(minute FROM created_at) = '15'::numeric))
Rows Removed by Filter: 99930
Storage Table Read Requests: 98
Storage Table Read Execution Time: 1057.112 ms
Storage Table Rows Scanned: 100000
Storage Index Read Requests: 98
Storage Index Read Execution Time: 14.704 m
With a TIMESTAMPTZ argument, the extract
is not pushed down, even though it is an immutable function. YugabyteDB behavior reverts to what PostgreSQL does: Filter
after the Index Scan
result, which fetches 100000 Table Rows Scanned
from the storage (DocDB, distributed on YugabyteDB nodes) to the query layer (YSQL, the Postgres backend you are connected to) to discard 99930 Rows Removed by Filter
.
This is not efficient. Hettie's trick helps to reduce the table rows scanned:
yugabyte=# explain (analyze, costs off, summary off, dist)
select demo.* from demo where (category,created_at) in (
select category,created_at from demo
where category=0
and created_at between '01-01-2025' and '12-31-2025'
and extract (hour from created_at)=16 and extract (minute from created_at)=15
);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (actual time=107.361..179.929 rows=70 loops=1)
-> HashAggregate (actual time=105.868..105.996 rows=70 loops=1)
Group Key: demo_1.category, demo_1.created_at
Batches: 1 Memory Usage: 32kB
-> Index Only Scan using demo_category_created_at_idx on demo demo_1 (actual time=2.232..105.641 rows=70 loops=1)
Index Cond: ((category = 0) AND (created_at >= '2025-01-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2025-12-31 00:00:00+00'::timestamp with time zone))
Filter: ((EXTRACT(hour FROM created_at) = '16'::numeric) AND (EXTRACT(minute FROM created_at) = '15'::numeric))
Rows Removed by Filter: 99930
Heap Fetches: 0
Storage Index Read Requests: 98
Storage Index Read Execution Time: 55.127 ms
Storage Index Rows Scanned: 100000
-> Index Scan using demo_category_created_at_idx on demo (actual time=1.027..1.031 rows=1 loops=70)
Index Cond: ((category = 0) AND (created_at = demo_1.created_at))
Storage Table Read Requests: 1
Storage Table Read Execution Time: 0.469 ms
Storage Table Rows Scanned: 1
Storage Index Read Requests: 1
Storage Index Read Execution Time: 0.464 ms
Storage Index Rows Scanned: 1
Note that this saves only the table rows scanned, but the predicate with the non-immutable extract
function is still applied as a filter and cannot be pushed down to the storage, resulting in 100000 index entries fetched.
Use an immutable function (TO_CHAR on TIMESTAMPTZ AT TIME ZONE 'GMT' )
One solution is to create an immutable function to extract the hours and minutes without depending on the default timezone:
create function GMT_HH24MI(ts timestamptz) returns text as $SQL$
select to_char(ts at time zone 'GMT','HH24MI')
$SQL$
language sql
immutable -- because it operates always on GMT whatever the default timezone is
;
This function can be indexed:
create index on demo ( category, created_at, GMT_HH24MI(created_at) )
;
The query must be rewritten to use the function:
yugabyte=# explain (analyze, costs off, summary off, dist)
select demo.* from demo
where category=0
and created_at between '01-01-2025' and '12-31-2025'
and GMT_HH24MI(created_at)='1615'
;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using demo_category_created_at_gmthh24mi_idx on demo (actual time=73.116..73.610 rows=70 loops=1)
Index Cond: ((category = 0) AND (created_at >= '2025-01-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2025-12-31 00:00:00+00'::timestamp with time zone) AND (GMT_HH24MI(created_at) = '1615'::text))
Storage Table Read Requests: 1
Storage Table Read Execution Time: 1.915 ms
Storage Table Rows Scanned: 70
Storage Index Read Requests: 1
Storage Index Read Execution Time: 70.931 ms
Storage Index Rows Scanned: 70
Creating an immutable function and adding it to the index key or include clause allows PostgreSQL and YugabyteDB to apply the predicate to the index entries without an Index-Only Scan.
If you use such an immutable function, I recommend that the function's name describes the specific behavior. In this example, my function extracts the time of a particular time zone (GMT). The semantics are different:
- The original example with TIMESTAMPTZ and EXTRACT shows records created when it was 15:16 in the time zone of the user who runs the query later, but it may have been a different clock time for the one who created it.
- My GMT_HH24MI function shows those created when it was 15:16 UTC.
- With a TIMESTAMP without a time zone, EXTRACT shows those created when it was 15:16 in the creation's time zone, which may not match the querying user's time zone.
When specifying 15:16 in the query, which of these three results does the user expect? You cannot know without asking.
๐ All these considerations originate from client/server times. Today, you don't want your stored data to depend on where the application server runs. The application server and the database's default time zone should be identical and probably UTC. TIMESTAMP or AT TIME ZONE 'GMT' provides better optimization possibilities, as their results do not depend on environment variables but rely on the guarantee that those timezone settings are consistent across all servers and do not change.
๐ก Always look at the explain (analyze, buffers)
output in PostgreSQL or explain (analyze, dist)
in YugabyteDB. In general, you should avoid Rows Removed by Filter
on many rows, especially with distributed databases that must fetch them across the network before applying the Filter
condition. The number of buffers or rows scanned should not exceed the scan result (actual rows=
).
Featured ones: