Logo

dev-resources.site

for different kinds of informations.

PostgreSQL Full Text Search Rank by Position

Published at
11/30/2024
Categories
postgressql
tutorial
javascript
sveltekit
Author
kvetoslavnovak
Author
14 person written this
kvetoslavnovak
open
PostgreSQL Full Text Search Rank by Position

Recently, I encountered a problem with full-text search. I am using this feature in my search input, where the backend sends hints of possible matches as you type. The backend database is PostgreSQL. I needed the hints to be ranked by the position of the searched term in the text.

So if you search for "Star Wars" title you will get "Star Wars" post first instead of "How Star Wars 7- 9 changed the world of Star Wars (a fun made documentary about Star Wars)" which may have higher ranking as the term is there 3 times.

Full Text Search in PostgreSQL

Full-text search in PostgreSQL can be achieved quite easily. There are two main tools to use:

  • tsvector - represents a searchable document.
  • tsquery - represents the search query to perform against a document.

Let’s say we want to search the titles of our blog posts. To make them searchable, we can use the following query:

SELECT 
id, 
title 
FROM blogposts
WHERE to_tsquery('JavaScript') @@ to_tsvector(posts.title);
Enter fullscreen mode Exit fullscreen mode

In this case, we are converting the post titles to a tsvector dynamically with every search. However, this transformation takes some time. A better approach is to perform this transformation in advance in the database and also store it as an indexes for the titles for faster search.

Lets create a new collumn of vectors of titles and also index this new column:

ALTER TABLE blogposts ADD COLUMN search_vector tsvector;
UPDATE blogposts SET search_vector = (to_tsvector(posts.title));
CREATE INDEX titles_fts_idx ON blogposts USING gin(search_vector);
Enter fullscreen mode Exit fullscreen mode

Now try the search the term "JavaScript"

SELECT 
id, 
title
FROM blogposts
WHERE to_tsquery('JavaScript') @@ search_vector;
Enter fullscreen mode Exit fullscreen mode

You may also make indexes from ts vectors directly on titles column like this:

CREATE INDEX titles_fts_idx ON blogposts USING GIN (to_tsvector(posts.title));
Enter fullscreen mode Exit fullscreen mode

and use the search like this:

SELECT 
id, 
title
FROM blogposts
WHERE to_tsquery('JavaScript') @@ posts.title;
Enter fullscreen mode Exit fullscreen mode

Now, the full-text search will be blazingly fast, completing in milliseconds.

Ranking the Results

PostgreSQL provides the ts_rank feature, which allows you to score search results and order them based on their ranking. PostgreSQL supports the following ranking options:

  • 0 (the default) ignores the document length
  • 1 divides the rank by 1 + the logarithm of the document length
  • 2 divides the rank by the document length
  • 4 divides the rank by the mean harmonic distance between extents (this is implemented only by ts_rank_cd)
  • 8 divides the rank by the number of unique words in document
  • 16 divides the rank by 1 + the logarithm of the number of unique words in document
  • 32 divides the rank by itself + 1

You can use the ts_rank like this:

SELECT
    ...
ts_rank(search_vector, to_tsquery('JavaScript'), 0) as rank_title
    ...
ORDER BY rank_title DESC NULLS LAST
Enter fullscreen mode Exit fullscreen mode

However, there is no built-in ranking option based on the position of the search term within the string (i.e. title column).

POSITION to the rescue

Fortunatelly there is POSITION function in PostgreSQL. The PostgreSQL POSITION function is used to find the location of a substring within a given string. In our case we can use it like this

...
ts_rank(search_vector, to_tsquery('JavaScript'), 2) + (1.0 / NULLIF(POSITION('JavaScript' IN posts.title), 0) + 0.0001)  as rank
    ...
ORDER BY rank_title DESC NULLS LAST
Enter fullscreen mode Exit fullscreen mode

ts_rank uses normalization integer 2 because 2 divides the rank by the document length
Magick number + 0.0001 is to avoid dividing by the 0 because POSTION function counts from 1 not 0 and returns 0 if the string is not found.

Final code may look like this:

SELECT 
id, 
title,
ts_rank(search_vector, to_tsquery('JavaScript'), 0) as rank_title
FROM blogposts
WHERE to_tsquery('JavaScript') @@ search_vector
ORDER BY rank_title DESC NULLS LAST;
Enter fullscreen mode Exit fullscreen mode

Searching for more terms

One caveat has to be mentioned if you search for more terms at once (like JavaScript and TypeScript).

The arguments for the to_tsquery function may be used with a big fleibility, including logical operators etc. POSITION function on the other hand is "just" a substring in string.

Real World Example

Here is my example from real world endpoint in SvelteKit web application which uses postgres (sql) npm library:

import sql from '../../../utils/sql.js'

export async function POST({request, locals}) {
const {input: searchTerm, limit}  = await request.json()
let queryTerm = searchTerm.split(' ').map(word => `'${word}'`).join(' & ');

let lang = locals.searchingLang
let fts = `fts_${lang}`

if the string is not found 
    let fts_response = await sql `SELECT 
    id, 
    COALESCE(title->>${lang}, title->>'en') as title,
    ts_rank(${ sql(fts)}, ${queryTerm}, 2) + (1.0 / NULLIF(POSITION(${ searchTerm } IN COALESCE(title->>${lang}, title->>'en')), 0) + 0.0001)  as rank
    FROM notices 
    WHERE ${ sql(fts) } @@ to_tsquery('simple', ${queryTerm})
    ORDER BY rank DESC NULLS LAST
    LIMIT ${limit};`

 return  new Response(
      JSON.stringify(fts_response)
    )
}
Enter fullscreen mode Exit fullscreen mode

Here are the links to the documentatio in matter:

sveltekit Article's
30 articles in total
Favicon
Optimize SvelteKit performance with brotli compression
Favicon
SvelteKit VS Astro. laidback side by side
Favicon
Integrating SvelteKit with Storyblok (Using Svelte 5)
Favicon
Make EditorJS work in Svelte(kit) SSR
Favicon
Why Svelte?
Favicon
Nosecone: a library for setting security headers in Next.js, SvelteKit, Node.js, Bun, and Deno
Favicon
Building AI-Powered Apps with SvelteKit: Managing HTTP Streams from Ollama Server
Favicon
NgSysV2-3.4: A Serious Svelte InfoSys: Rules-friendly version
Favicon
NgSysV2-3.3: A Serious Svelte InfoSys: Firebase D/b rules and Login
Favicon
NgSysV2-3.6: A Serious Svelte InfoSys: Deploying to the Google Cloud
Favicon
NgSysV2-3.5: A Serious Svelte InfoSys: Client-Server Version
Favicon
NgSysV2-4.2: SEO (Search Engine Optimisation)
Favicon
NgSysV2-4.3: Automated Svelte Pre-render Builds
Favicon
NgSysV2-4.4: Responsive/Adaptive Design
Favicon
Deploy a Static Sveltekit site to Railway
Favicon
Why You Should Avoid Using `try...catch` in SvelteKit Actions
Favicon
How to integrate shadcn-svelte into the editable.website template
Favicon
PostgreSQL Full Text Search Rank by Position
Favicon
How to Build a Content-Driven Static Site with Markdown, SvelteKit and Fusionable
Favicon
Interview with Prabhu Kiran Konda, Creator of Snail AI!
Favicon
"Helper" Varaibles in Svelte 5
Favicon
Experiences and Caveats of Svelte 5 Migration
Favicon
Running a Function When an #await Block resolves in Svelte(Kit)
Favicon
SanS-UI v0.0.1 Quick Start!
Favicon
Introduction to Svelte: Features and Benefits of the Modern JavaScript Framework
Favicon
Sveltekit + TypeScript + TypeORM + ESM
Favicon
Svelte 5 is out!!!
Favicon
SanS-UI Released v0.0.1
Favicon
How to Integrate Passkeys into SvelteKit
Favicon
AWS Amplify (Gen2) with SvelteKit: authentication on SSR

Featured ones: