Learn how to build a reverse video search system using Mixpeek for video processing & embedding generation, combined with PostgreSQL as a vector database (powered by pgvector and pgvectorscale) hosted on Timescale Cloud.
With this system, you can query video data using both video and text queries to retrieve relevant video segments based on semantic similarity.
Prerequisites
PostgreSQL database (We're using Timescale Cloud)
Python 3.7+
Basic understanding of vector embeddings
Mixpeek API key (sign up at mixpeek.com)
How the Reverse Video Search System Works
Before we jump into the implementation details, let's review the system architecture.
Reverse video search system architecture
Video ingestion process
The ingestion process involves inputting the source video data into a vector database. Here's how it works:
Source video: The process starts with uploading a source video.
Video chunks: The video is split into chunks for optimized embedding creation using Mixpeek’s video indexing tool.
Embeddings: Mixpeek’s indexing tool extracts video features from each video chunk to generate vector embeddings using one of the embedding models integrated with it.
Vector database: The generated vector embeddings are stored in a database that supports vector similarity search—in this case, PostgreSQL with pgvector and pgvectorscale and hosted on Timescale Cloud. This database not only handles vector similarity searches but also enables you to store metadata, such as start and end times, alongside their embeddings.
Video retrieval process
Query: the user submits a text or video query.
Query embeddings: Mixpeek converts the query into embeddings to capture its semantic meaning.
Vector search: these embeddings are then compared against the stored embeddings in the vector database to retrieve the closest matches using vector similarity search.
The Timescale and Mixpeek tech stacks complement each other. Mixpeek generates the vector embeddings, while Timescale’s PostgreSQL database—powered by pgvector and pgvectorscale—ensures optimized storage, management, and retrieval of the video data and its embeddings.
In this section, we create a .env file to store our environment variables. Let's first gather them:
Obtain a Mixpeek API key from your Mixpeek account (see the Mixpeek documentation: https://docs.mixpeek.com/overview/introduction). If you don't have an account yet, you can sign up at mixpeek.com.
Copy your PostgreSQL connection string from your Timescale Cloud service.
To get started, let’s install the required libraries:
%pip install psycopg2-binary python-dotenv requests
psycopg2 enables the connection to PostgreSQL, python-dotenv loads the values stored in your .env file into environment variables, and requests lets you send HTTP requests easily.
Then, you can import the libraries and load the environment variables as follows:
In this section, we define different functions related to video indexing, feature extraction, and retrieving video chunks & their embeddings using Mixpeek’s API. Then, we demonstrate how to get embeddings using a video.
Video indexing and feature extraction
In the index_video_file function, we use Mixpeek’s Index Video Url endpoint to process the source video and divide it into chunks. For each video chunk, this tool does the following:
Reads on-screen text using the video-descriptor-v1 model.
Generates a 1408-dimensional vector embedding with the multimodal-v1 model.
Transcribes the audio in the video chunk using the polyglot-v1 model.
Creates a comprehensive description of the chunk, including the screenplay and sound details.
BASE_URL = "https://api.mixpeek.com"

headers = {
    'Authorization': f'Bearer {MIXPEEK_API_KEY}',
    'Content-Type': 'application/json'
}

def index_video_file(video_url, video_name, chunking_interval):
    payload = json.dumps({
        "url": video_url,
        "collection_id": "mixpeek_timescaledb",
        "metadata": {"name": video_name},
        "video_settings": [
            {
                "interval_sec": chunking_interval,
                "read": {"model_id": "video-descriptor-v1"},
                "embed": {"model_id": "multimodal-v1"},
                "transcribe": {"model_id": "polyglot-v1"},
                "describe": {
                    "model_id": "video-descriptor-v1",
                    "prompt": "Create a holistic description of the video, include sounds and screenplay"
                },
            }
        ]
    })

    indexing_response = requests.post(url=f"{BASE_URL}/index/videos/url", headers=headers, data=payload)
    task_id = indexing_response.json()["task_id"]
    print(f"Indexing started. Task ID: {task_id}")
    return task_id
Let's use the task_id associated with the indexing process to check its status through the Get Task endpoint.
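A hedged sketch of that status check: we assume the Get Task endpoint lives at GET /tasks/{task_id} and returns a "status" field, and that "DONE" and "FAILED" are the terminal status names; check Mixpeek's API reference for the exact values. BASE_URL and headers are defined in the indexing code above.

```python
import time

import requests

def get_task_status(task_id):
    # Assumed endpoint shape for Mixpeek's Get Task endpoint
    response = requests.get(url=f"{BASE_URL}/tasks/{task_id}", headers=headers)
    return response.json()["status"]

def wait_for_task(task_id, poll_interval=5):
    # Poll until the indexing task reaches a terminal state
    while True:
        status = get_task_status(task_id)
        print(f"Task {task_id}: {status}")
        if status in ("DONE", "FAILED"):  # assumed terminal status names
            return status
        time.sleep(poll_interval)
```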
In this part, we access the metadata (start_time and end_time) and feature_ids of the video chunks created using the Get Asset With Features endpoint and the asset_id from the response from the Get Task endpoint.
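A sketch of that retrieval, under two assumptions: that the Get Task response carries the created asset_id, and that each entry under features.video exposes feature_id, start_time, and end_time (the asset-features endpoint path matches the one used later in this tutorial):

```python
import requests

def get_asset_id(task_id):
    # Assumption: the Get Task response includes the asset created by the task
    response = requests.get(url=f"{BASE_URL}/tasks/{task_id}", headers=headers)
    return response.json()["asset_id"]

def get_video_chunks(asset_id):
    # Get Asset With Features endpoint: one entry per indexed video chunk
    response = requests.get(url=f"{BASE_URL}/assets/{asset_id}/features", headers=headers)
    video_features = response.json()["features"]["video"]
    # Field names inside each feature are assumptions
    return [
        {
            "feature_id": feature["feature_id"],
            "start_time": feature["start_time"],
            "end_time": feature["end_time"],
        }
        for feature in video_features
    ]
```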
In this tutorial, we'll use PostgreSQL with the pgvector and pgvectorscale extensions as our vector database. This database instance is hosted on Timescale Cloud.
The pgvectorscale extension builds on top of pgvector, enabling PostgreSQL to efficiently store and query vector embeddings. You might wonder why you should upgrade from PostgreSQL with pgvector to Timescale Cloud’s AI stack (pgai, pgvectorscale, and pgai Vectorizer). Here’s why:
High-performance and scalability for your AI Applications: pgvectorscale boosts PostgreSQL’s ANN capabilities with StreamingDiskANN, a disk-based ANN algorithm that outperforms memory-based indexes like pgvector’s IVFFlat. With no ef_search cutoffs and its streaming model, it enhances query speed and accuracy, continuously retrieving the “next closest” item, potentially even traversing the entire graph!
A simplified AI stack: Timescale’s AI stack integrates vector embeddings, relational data, and time-series data in one place. This consolidation significantly reduces the complexity of infrastructure management and data synchronization, allowing you to focus on building AI applications.
Seamless PostgreSQL compatibility: Since Timescale inherits PostgreSQL’s syntax and robustness, developers with PostgreSQL experience can integrate AI capabilities in their application development without a steep learning curve.
Use the code below to connect to your database service and confirm database access:
Since we are working with embedding data, we need to ensure our PostgreSQL service can support it. Therefore, we install the pgvector and pgvectorscale extensions before creating the table, video_embeddings, that stores information about video segments (or chunks) and their embeddings.
with connect_db() as conn:
    with conn.cursor() as curs:
        # Installs both pgvector and pgvectorscale
        curs.execute("CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;")
    with conn.cursor() as curs:
        curs.execute("""
            CREATE TABLE IF NOT EXISTS video_embeddings (
                id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
                embedding VECTOR(1408),
                start_time NUMERIC,
                end_time NUMERIC
            );
        """)
Here's a breakdown of the columns:
id: a unique identifier for each video chunk.
embedding: a 1408-dimensional vector embedding of the video chunk.
start_time: the starting time of the video chunk. For example, if a video is split into segments, this could be each segment's start time (in seconds or another unit).
end_time: the ending time of the video chunk, indicating when the segment finishes.
Data insertion
Let’s ingest the video chunks and their embeddings into our database.
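A sketch of the ingestion step. It fetches each chunk's embedding from the Get Feature endpoint (the same /features/{feature_id} path with include_vectors=True used elsewhere in this tutorial) and inserts it alongside the chunk's start and end times; the shape of the chunks list matches the retrieval sketch above and is an assumption:

```python
import requests

def insert_video_chunks(chunks):
    # chunks: list of {"feature_id": ..., "start_time": ..., "end_time": ...}
    with connect_db() as conn:
        with conn.cursor() as curs:
            for chunk in chunks:
                # include_vectors=True returns the stored 1408-d embedding
                response = requests.get(
                    url=f"{BASE_URL}/features/{chunk['feature_id']}",
                    headers=headers,
                    params={"include_vectors": True},
                )
                embedding = response.json()["vectors"]["multimodal-v1"]
                curs.execute(
                    """
                    INSERT INTO video_embeddings (embedding, start_time, end_time)
                    VALUES (%s::vector, %s, %s)
                    """,
                    # str(list) yields "[...]", which casts cleanly to vector
                    (str(embedding), chunk["start_time"], chunk["end_time"]),
                )
        conn.commit()
```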
Vector search queries will primarily target the embedding column, so we create an index on this column using StreamingDiskANN. It significantly speeds up vector similarity searches.
with connect_db() as conn:
    with conn.cursor() as curs:
        curs.execute('''
            CREATE INDEX video_embeddings_idx
            ON video_embeddings
            USING diskann (embedding);
        ''')
Search functions
In this section, we demonstrate two search functions for retrieving relevant video chunks: one based on video input and the other based on text query. The idea is to search for similar video chunks stored in the database by comparing embeddings, allowing us to match the content of the video query or find similar scenes based on descriptive text.
For each query, we first generate vector embeddings and then use them to search for the closest video chunks through the source video embeddings, ranking results by cosine distance. Let’s first define a helper function for vector similarity search.
# Helper function for vector similarity search
def retrieve_closest_video_chunks(query_embedding, limit):
    with connect_db() as conn:
        with conn.cursor() as curs:
            # <=> is pgvector's cosine-distance operator
            curs.execute('''
                SELECT start_time, end_time
                FROM video_embeddings
                ORDER BY embedding <=> %s::vector
                LIMIT %s
            ''', (query_embedding['embedding'], limit))
            print("CLOSEST VIDEO CHUNKS:")
            closest_video_chunks = []
            for row in curs.fetchall():
                print(f"start_time: {row[0]}, end_time: {row[1]}")
                closest_video_chunks.append({"start_time": row[0], "end_time": row[1]})
            return closest_video_chunks
In this part, let's use the Index Text endpoint to generate embeddings for the text query and then use them to perform a vector similarity search.
text_query = "two people in a car"

payload = json.dumps({
    "text": text_query,
    "collection_id": "mixpeek_timescale",
    "metadata": {"author": "user"},
    "text_settings": {"embed": {"model_id": "multimodal-v1"}}
})

index_text_response = requests.post(url=f"{BASE_URL}/index/text", headers=headers, data=payload)
task_id = index_text_response.json()["task_id"]
print(f"Indexing started. Task ID: {task_id}")

# Retrieve the feature extracted from the text query
asset_id = get_asset_id(index_text_response.json()["task_id"])
get_asset_response = requests.get(url=f"{BASE_URL}/assets/{asset_id}/features", headers=headers)
text_asset = get_asset_response.json()["features"]["text"]

# Extract the generated text embedding
get_feature_response = requests.get(
    url=f"{BASE_URL}/features/{text_asset[0]['feature_id']}",
    headers=headers,
    params={"include_vectors": True}
)
text_query_embedding = {"embedding": get_feature_response.json()["vectors"]['multimodal-v1']}

retrieve_closest_video_chunks(text_query_embedding, 2)
One of the frames from video segments (start_time: 30.0, end_time: 40.0)
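The video-input search mentioned earlier follows the same pattern: index the query video, pull its embedding, and run the same similarity search. A hedged sketch, reusing the functions defined above; the response fields mirror the text-query example and the idea of using a large chunking interval to get a single embedding is our assumption:

```python
import requests

def search_with_video(query_video_url, limit=2):
    # A large interval keeps the query video as a single chunk (assumption)
    task_id = index_video_file(query_video_url, "query_video", chunking_interval=60)

    # NOTE: assumes the indexing task has completed before fetching the asset
    asset_id = get_asset_id(task_id)
    features_response = requests.get(
        url=f"{BASE_URL}/assets/{asset_id}/features", headers=headers
    )
    video_features = features_response.json()["features"]["video"]

    # Fetch the embedding of the first (here: only) chunk of the query video
    feature_response = requests.get(
        url=f"{BASE_URL}/features/{video_features[0]['feature_id']}",
        headers=headers,
        params={"include_vectors": True},
    )
    query_embedding = {"embedding": feature_response.json()["vectors"]["multimodal-v1"]}

    return retrieve_closest_video_chunks(query_embedding, limit)
```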
This demo uses a single video. However, we can extend the same approach to handle a collection of videos.
Conclusion
In this article, we covered how to build a reverse video search engine using Mixpeek and Timescale Cloud's mature PostgreSQL cloud platform. This stack paves the way for many enhancements in multi-modal video analysis and retrieval. We can add capabilities to the current system, for example, integrating AI-generated sentiment analysis or supporting queries in multiple languages.
AI is still in its early stages. Video search and understanding will continue to evolve. If you're interested in implementing these solutions, check out Mixpeek’s API documentation and Timescale’s AI stack to start building your own advanced video search engine.