Logo

dev-resources.site

for different kinds of informations.

Quick tip: Cosine Similarity revisited in SingleStore

Published at
3/31/2024
Categories
singlestoredb
cosinesimilarity
vectordatabase
Author
veryfatboy
Author
10 person written this
veryfatboy
open
Quick tip: Cosine Similarity revisited in SingleStore

Abstract

In a previous article, we saw how to represent Cosine Similarity in SingleStore by combining the DOT_PRODUCT and SQRT functions. The SingleStore documentation also provides a way to implement a COSINE_SIMILARITY function. In this article, we'll see how.

Create a SingleStoreDB Cloud account

A previous article showed the steps to create a free SingleStoreDB Cloud account. We'll use the following settings:

  • Workspace Group Name: Iris Demo Group
  • Cloud Provider: AWS
  • Region: US East 1 (N. Virginia)
  • Workspace Name: iris-demo
  • Size: S-00

Create a Database and Table

In our SingleStore Cloud account, let's use the SQL Editor to create a new database. Call this iris_db, as follows:

CREATE DATABASE IF NOT EXISTS iris_db;
Enter fullscreen mode Exit fullscreen mode

We'll also create the iris table using the new VECTOR data type, as follows:

USE iris_db;

CREATE TABLE IF NOT EXISTS iris (
    vector VECTOR(4),
    species VARCHAR(20)
);
Enter fullscreen mode Exit fullscreen mode

The Iris data set contains 150 rows of data for three different species of flowers. Each flower has four columns of data: sepal_length, sepal_width, petal_length and petal_width. We can store these four column values together, as follows:

INSERT INTO iris VALUES
('[5.1,3.5,1.4,0.2]','Iris-setosa'),
('[4.9,3,1.4,0.2]','Iris-setosa'),
('[4.7,3.2,1.3,0.2]','Iris-setosa'),
('[4.6,3.1,1.5,0.2]','Iris-setosa'),
('[5,3.6,1.4,0.2]','Iris-setosa'),
...
('[6.7,3,5.2,2.3]','Iris-virginica'),
('[6.3,2.5,5,1.9]','Iris-virginica'),
('[6.5,3,5.2,2]','Iris-virginica'),
('[6.2,3.4,5.4,2.3]','Iris-virginica'),
('[5.9,3,5.1,1.8]','Iris-virginica');
Enter fullscreen mode Exit fullscreen mode

Only the first five and last five rows are shown above. The complete INSERT code listing is available in a GitHub Gist.

Create a Cosine Similarity Function

From the product documentation, we'll now define two functions that we can use:

DELIMITER //
CREATE OR REPLACE FUNCTION NORMALIZE(v VECTOR(4)) RETURNS VECTOR(4) AS
DECLARE
    squares VECTOR(4) = VECTOR_MUL(v, v);
    length FLOAT = SQRT(VECTOR_ELEMENTS_SUM(squares));
BEGIN
    RETURN SCALAR_VECTOR_MUL(1/length, v);
END //
DELIMITER ;
Enter fullscreen mode Exit fullscreen mode

and

DELIMITER //
CREATE OR REPLACE FUNCTION COSINE_SIMILARITY(v1 VECTOR(4), v2 VECTOR(4)) RETURNS FLOAT AS
BEGIN
    RETURN DOT_PRODUCT(NORMALIZE(v1), NORMALIZE(v2));
END //
DELIMITER ;
Enter fullscreen mode Exit fullscreen mode

In the SingleStore notebook environment, the code would be as follows:

CREATE OR REPLACE FUNCTION NORMALIZE(v VECTOR(4)) RETURNS VECTOR(4) AS
DECLARE
    squares VECTOR(4) = VECTOR_MUL(v, v);
    length FLOAT = SQRT(VECTOR_ELEMENTS_SUM(squares));
BEGIN
    RETURN SCALAR_VECTOR_MUL(1/length, v);
END;
Enter fullscreen mode Exit fullscreen mode

and

CREATE OR REPLACE FUNCTION COSINE_SIMILARITY(v1 VECTOR(4), v2 VECTOR(4)) RETURNS FLOAT AS
BEGIN
    RETURN DOT_PRODUCT(NORMALIZE(v1), NORMALIZE(v2));
END;
Enter fullscreen mode Exit fullscreen mode

In other words, remove the references to DELIMITER and //.

Query 1

First, let's try a query where we want to find the name of the flower species using an exact match for the sepal_length, sepal_width, petal_length and petal_width. We'll use the values [5.9,3,5.1,1.8] from the last row of the iris table, shown above.

SELECT species
FROM iris
ORDER BY COSINE_SIMILARITY(vector, '[5.9,3,5.1,1.8]') DESC
LIMIT 1;
Enter fullscreen mode Exit fullscreen mode

The result should be:

+----------------+
| species        |
+----------------+
| Iris-virginica |
+----------------+
Enter fullscreen mode Exit fullscreen mode

Query 2

Now, let's use some fictitious data values [5.2,3.6,1.5,0.3] to make a prediction.

SELECT species
FROM iris
ORDER BY COSINE_SIMILARITY(vector, '[5.2,3.6,1.5,0.3]') DESC
LIMIT 1;
Enter fullscreen mode Exit fullscreen mode

The result should be:

+-------------+
| species     |
+-------------+
| Iris-setosa |
+-------------+
Enter fullscreen mode Exit fullscreen mode

Cleanup:

DROP TABLE IF EXISTS iris;
DROP DATABASE IF EXISTS iris_db;
Enter fullscreen mode Exit fullscreen mode

Summary

SingleStore provides direct support for the DOT_PRODUCT and EUCLIDEAN_DISTANCE functions. We can easily implement our own COSINE_SIMILARITY function as shown in this example.

singlestoredb Article's
30 articles in total
Favicon
Quick tip: Visualising the Air Quality Index (AQI) across Punjab, Pakistan and India
Favicon
Quick tip: Using SingleStore with OpenAI's Swarm
Favicon
Quick tip: Using SingleStore and WebAssembly for Sentiment Analysis of Stack Overflow Comments
Favicon
Quick tip: Building Predictive Analytics for Loan Approvals
Favicon
Quick tip: Build Vector Embeddings for Video via Python Notebook & OpenAI CLIP
Favicon
Quick tip: SingleStore Kai support for MongoDB $vectorSearch
Favicon
Quick tip: Using SingleStore with PyIceberg
Favicon
Quick tip: Using SingleStore for Iceberg Catalog Storage
Favicon
Quick tip: Using picoGPT in the SingleStore portal
Favicon
Quick tip: Ollama + SingleStore - LangChain = :-(
Favicon
Quick tip: How to Build Local LLM Apps with Ollama and SingleStore
Favicon
Quick tip: Using R, OpenAI and SingleStore Notebooks
Favicon
Quick tip: Write numpy arrays directly to the SingleStore VECTOR data type
Favicon
Quick tip: Using R, Rayshader and SingleStore Notebooks
Favicon
Quick tip: Using R with SingleStore Notebooks
Favicon
Quick tip: Using Apache Spark and GraphFrames with SingleStore Notebooks
Favicon
Quick tip: Using Apache Spark Structured Streaming with SingleStore Notebooks
Favicon
Quick tip: Using SingleStore Spark Connector's Query Pushdown with SingleStore Notebooks
Favicon
Quick tip: Using the SingleStore Spark Connector with SingleStore Notebooks
Favicon
Quick tip: Using Apache Spark with SingleStore Notebooks for Fraud Detection
Favicon
Quick tip: Cosine Similarity revisited in SingleStore
Favicon
Quick tip: Using Apache Spark with SingleStore Notebooks
Favicon
Quick tip: Using Approximate Nearest Neighbor (ANN) Search with SingleStoreDB
Favicon
Quick tip: Using the new VECTOR data type and Infix Operators in SingleStoreDB
Favicon
Quick tip: Dot Product, Euclidean Distance and Cosine Similarity in SingleStoreDB
Favicon
Vector Databases & AI Applications for Dummies
Favicon
Quick tip: Analysing Stock Tick Data in SingleStoreDB using LangChain and OpenAI's Whisper
Favicon
Quick tip: Replicating JSON data from MongoDB to SingleStore Kai and creating OpenAI embeddings
Favicon
Quick tip: Streaming data from MongoDB Atlas to SingleStore Kai using Kafka and CDC
Favicon
Quick tip: Using LangChain's SQLDatabaseToolkit with SingleStoreDB

Featured ones: