Generate embeddings with Azure AI Vision multi-modal embeddings API
Welcome to the second part of the “Image similarity search with pgvector” learning series! In the previous article, you learned how to describe vector embeddings and vector similarity search. You also used the multi-modal embeddings APIs of Azure AI Vision for generating embeddings for images and text and calculated the cosine similarity between two vectors.
Introduction
In this learning series, we will create an application that enables users to search for paintings based on either a reference image or a text description. We will use the SemArt Dataset, which contains approximately 21k paintings gathered from the Web Gallery of Art. Each painting comes with various attributes, like a title, description, and the name of the artist.
In this tutorial, you will:
- Prepare the data for further processing.
- Generate vector embeddings for a collection of images of paintings using the Vectorize Image API of Azure AI Vision.
Prerequisites
To proceed with this tutorial, ensure that you have the following prerequisites installed and configured:
- An Azure subscription - Create an Azure free account or an Azure for Students account.
- An Azure AI Vision resource - For instructions on creating an Azure AI Vision resource, see Part 1.
- Python 3.10, Visual Studio Code, Jupyter Notebook, and Jupyter Extension for Visual Studio Code.
Set up your working environment
In this article, you will find instructions on how to generate embeddings for a collection of images using Azure AI Vision. The complete working project can be found in the GitHub repository. If you want to follow along, you can fork the repository and clone it to have it locally available.
Before running the scripts, you should:
- Download the SemArt Dataset into the semart_dataset directory.
- Create a virtual environment and activate it.
- Install the required Python packages using the following command:
pip install -r requirements.txt
Data preprocessing
Note: The code for data preprocessing can be found at data_processing/data_preprocessing.ipynb.
For our application, we'll be working with a subset of the original dataset. Alongside the image files, we aim to retain associated metadata like the title, author's name, and description for each painting. To prepare the data for further processing and eliminate unnecessary information, we will take several steps as outlined in the Jupyter Notebook available on my GitHub repository:
- Clean up the text descriptions by removing special characters to minimize errors related to character encoding.
- Clean up the names of the artists, addressing encoding issues for some artists' names.
- Exclude artists with fewer than 15 paintings from the dataset, along with other data we won't be using.
After these steps, the final dataset will comprise 11,206 images of paintings.
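The cleaning and filtering steps can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's actual code: the AUTHOR key, the MIN_PAINTINGS constant, the character whitelist, and both helper functions are assumptions made for this example, and the notebook itself may rely on pandas instead.

```python
import re
from collections import Counter

MIN_PAINTINGS = 15  # threshold mentioned in the text


def clean_description(text: str) -> str:
    """Drop characters outside a simple whitelist and collapse whitespace."""
    # Hypothetical rule: keep word characters, whitespace, and common punctuation.
    text = re.sub(r"[^\w\s.,;:!?()'-]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def filter_by_artist_count(rows, min_count=MIN_PAINTINGS):
    """Keep only the rows whose artist has at least min_count paintings."""
    counts = Counter(row["AUTHOR"] for row in rows)
    return [row for row in rows if counts[row["AUTHOR"]] >= min_count]


# Toy data: artist "A" has three paintings, artist "B" only one.
rows = [{"AUTHOR": "A"}, {"AUTHOR": "A"}, {"AUTHOR": "B"}, {"AUTHOR": "A"}]
kept = filter_by_artist_count(rows, min_count=2)
cleaned = clean_description("A  view\tof Delft*\n")
```

With the toy data, only the three paintings by artist "A" survive the filter.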
Create vector embeddings with Azure AI Vision
Note: The code for vector embeddings generation can be found at data_processing/generate_embeddings.py.
To generate embeddings for the images, our process can be summarized as follows:
- Retrieve the filenames of the images in the dataset.
- Divide the data into batches, and for each batch, perform the following steps:
- Compute the vector embedding for each image in the batch using the Vectorize Image API of Azure AI Vision.
- Save the vector embeddings of the images along with the filenames into a file.
- Update the dataset by inserting the vector embedding of each image.
In the following sections, we will discuss specific segments of the code.
Compute embeddings for the images in the dataset
As discussed in Part 1, computing the vector embedding of an image involves sending a POST request to the Azure AI Vision retrieval:vectorizeImage API. The binary image data (or a publicly available image URL) is included in the request body, and the response consists of a JSON object containing the vector embedding of the image. In Python, this can be achieved by using the requests library to send a POST request.
def get_image_embedding(image: str) -> list[float] | None:
    """
    Generates a vector embedding for an image using Azure AI Vision 4.0
    (Vectorize Image API).

    :param image: The image filepath.
    :return: The vector embedding of the image.
    """
    with open(image, "rb") as img:
        data = img.read()

    headers = {
        "Content-type": "application/octet-stream",
        "Ocp-Apim-Subscription-Key": vision_key,
    }

    try:
        r = requests.post(vectorize_img_url, data=data, headers=headers)
        if r.status_code == 200:
            image_vector = r.json()["vector"]
            return image_vector
        else:
            print(
                f"An error occurred while processing {image}. "
                f"Error code: {r.status_code}."
            )
    except Exception as e:
        print(f"An error occurred while processing {image}: {e}")

    return None
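The function above relies on two module-level values, vision_key and vectorize_img_url, that are not shown in this excerpt. One way to define them is sketched below. The environment variable names, the build_vectorize_url helper, the placeholder endpoint, and the api-version and modelVersion query parameters are assumptions for this sketch; check the Azure AI Vision documentation for the values that apply to your resource and API version.

```python
import os
from urllib.parse import urlencode


def build_vectorize_url(
    endpoint: str,
    api_version: str = "2023-02-01-preview",  # assumed preview version
    model_version: str = "latest",  # assumed parameter value
) -> str:
    """Build the URL of the retrieval:vectorizeImage operation."""
    query = urlencode({"api-version": api_version, "modelVersion": model_version})
    return f"{endpoint.rstrip('/')}/computervision/retrieval:vectorizeImage?{query}"


# Hypothetical variable names; the fallbacks are placeholders, not real values.
vision_endpoint = os.getenv(
    "VISION_ENDPOINT", "https://<your-resource>.cognitiveservices.azure.com"
)
vision_key = os.getenv("VISION_KEY", "")
vectorize_img_url = build_vectorize_url(vision_endpoint)
```

Reading the key from the environment keeps it out of the source code and the repository.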
The compute_embeddings function computes the vector embeddings for all the images in our dataset. It uses a ThreadPoolExecutor to process the images of each batch concurrently across multiple threads, and the tqdm library to display progress bars for the embedding generation process.
def compute_embeddings(image_names: list[str]) -> None:
    """
    Computes vector embeddings for the provided images and saves the embeddings
    alongside their corresponding image filenames in a CSV file.

    :param image_names: A list containing the filenames of the images.
    """
    image_names_batches = [
        image_names[i:(i + BATCH_SIZE)]
        for i in range(0, len(image_names), BATCH_SIZE)
    ]

    for batch in tqdm(range(len(image_names_batches)), desc="Computing embeddings"):
        images = image_names_batches[batch]
        with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
            embeddings = list(
                tqdm(
                    executor.map(
                        lambda x: get_image_embedding(
                            image=os.path.join(images_folder, x),
                        ),
                        images,
                    ),
                    total=len(images),
                    desc=f"Processing batch {batch+1}",
                    leave=False,
                )
            )

        valid_data = [
            [images[i], str(embeddings[i])] for i in range(len(images))
            if embeddings[i] is not None
        ]
        save_data_to_csv(valid_data)
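Pairing images[i] with embeddings[i] works because executor.map returns results in the order of its inputs, even when the worker threads finish out of order. The toy example below illustrates this together with the batching and the filtering of None values. The fake_embedding and embed_in_batches functions are stand-ins invented for this sketch and do not call the API.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

BATCH_SIZE = 3


def fake_embedding(name: str) -> Optional[list]:
    """Stand-in for get_image_embedding: 'bad' files fail, others take varying time."""
    if name.startswith("bad"):
        return None
    time.sleep(0.01 * (len(name) % 5))  # simulate uneven response times
    return [float(len(name))]


def embed_in_batches(names: list) -> list:
    """Return (filename, embedding) pairs, skipping files that failed."""
    batches = [names[i:i + BATCH_SIZE] for i in range(0, len(names), BATCH_SIZE)]
    results = []
    for images in batches:
        with ThreadPoolExecutor(max_workers=4) as executor:
            # map() yields results in input order, not completion order.
            embeddings = list(executor.map(fake_embedding, images))
        results.extend(
            (name, emb) for name, emb in zip(images, embeddings) if emb is not None
        )
    return results


pairs = embed_in_batches(["a.jpg", "bad.jpg", "ccc.jpg", "dd.jpg"])
```

Each surviving filename stays matched with its own embedding, and the failed file is left out.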
Once the embeddings for all the images in a batch are computed, the data is saved into a CSV file.
def save_data_to_csv(data: list[list[str]]) -> None:
    """
    Appends a list of image filenames and their associated embeddings to
    a CSV file.

    :param data: The data to be appended to the CSV file.
    """
    with open(embeddings_filepath, "a", newline="") as csv_file:
        write = csv.writer(csv_file)
        write.writerows(data)
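Because each embedding is converted with str() before it is written, it is stored in the CSV file as text such as "[0.1, 0.2]" and has to be parsed back into a list of floats when the file is read again. A minimal sketch, assuming the two-column layout written above: parse_embedding_rows is a hypothetical helper, and json.loads is only one possible parser (ast.literal_eval would work as well).

```python
import csv
import io
import json


def parse_embedding_rows(csv_file) -> dict:
    """Map each image filename to its embedding, parsed back into a list of floats."""
    reader = csv.reader(csv_file)
    return {filename: json.loads(vector) for filename, vector in reader}


# Round trip through an in-memory file instead of a file on disk.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(
    [["a.jpg", str([0.1, -0.2])], ["b.jpg", str([1.0, 2.5])]]
)
buffer.seek(0)
embeddings = parse_embedding_rows(buffer)
```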
Azure AI Vision API rate limits
The Azure AI Vision API imposes rate limits on its usage. In the free tier, only 20 transactions per minute are allowed, while the standard tier allows up to 30 transactions per second, depending on the operation (Source: Microsoft Docs). If you exceed the rate limit, you'll receive a 429 HTTP error code.
For our application, it is recommended to use the standard tier during the embeddings generation process and limit the number of requests per second to approximately 10 to avoid potential issues.
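One way to cope with occasional 429 responses is to retry the request after a growing delay. The sketch below is not part of the original script: call_with_retry, its parameters, and the send callable (a stand-in for the HTTP request that returns a status code and a payload) are hypothetical.

```python
import time


def call_with_retry(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """
    Call send() until it succeeds, retrying only on HTTP 429.

    send must return a (status_code, payload) tuple. The delay doubles
    after every 429 response: base_delay, 2 * base_delay, and so on.
    """
    for attempt in range(max_retries + 1):
        status, payload = send()
        if status == 200:
            return payload
        if status != 429 or attempt == max_retries:
            return None  # other error, or retries exhausted
        sleep(base_delay * 2 ** attempt)
    return None


# Simulated service: two rate-limit responses, then success.
responses = iter([(429, None), (429, None), (200, [0.1, 0.2])])
delays = []
result = call_with_retry(lambda: next(responses), sleep=delays.append)
```

In the simulated run, the delays are recorded instead of slept so the example finishes immediately. Pacing the requests themselves could be approximated by lowering MAX_WORKERS or pausing briefly between batches; suitable values depend on your pricing tier.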
Generate the dataset
After computing the vector embeddings for all images in the dataset, we proceed to update our dataset by inserting the vector embedding for each image. In the generate_dataset function, the merge method of pandas.DataFrame combines the dataset with the embeddings using a database-style inner join on the image filename column.
def generate_dataset() -> None:
    """
    Appends the corresponding vector to each row of the original dataset
    and saves the updated dataset as a CSV file.
    """
    dataset_df = pd.read_csv(dataset_filepath, sep="\t", dtype="string")
    embeddings_df = pd.read_csv(
        embeddings_filepath,
        dtype="string",
        names=[IMAGE_FILE_CSV_COLUMN_NAME, EMBEDDINGS_CSV_COLUMN_NAME],
    )
    final_dataset_df = dataset_df.merge(
        embeddings_df, how="inner", on=IMAGE_FILE_CSV_COLUMN_NAME
    )
    final_dataset_df.to_csv(final_dataset_filepath, index=False, sep="\t")
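Note that an inner join keeps only the rows whose image file appears in both tables, so any image for which get_image_embedding returned None is silently left out of the final dataset. A quick check like the one below can list those images so they can be retried. The function name and the IMAGE_FILE column name are assumptions for this sketch.

```python
def find_missing_embeddings(dataset_rows, embedded_files, column="IMAGE_FILE"):
    """Return the image filenames in the dataset that have no embedding yet."""
    embedded = set(embedded_files)
    return [row[column] for row in dataset_rows if row[column] not in embedded]


# Toy data: the request for "b.jpg" failed, so it has no embedding.
dataset_rows = [
    {"IMAGE_FILE": "a.jpg"},
    {"IMAGE_FILE": "b.jpg"},
    {"IMAGE_FILE": "c.jpg"},
]
missing = find_missing_embeddings(dataset_rows, ["a.jpg", "c.jpg"])
```

Passing the resulting list back to compute_embeddings would be one way to fill the gaps before regenerating the dataset.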
Next steps
In this post, we computed vector embeddings for a set of images featuring paintings using the Azure AI Vision Vectorize Image API. The code shared here serves as a reference, and you can customize it to suit your particular use case.
Here are some additional learning resources:
- Azure AI Vision Multi-modal embeddings - Microsoft Docs
- Call the multi-modal embeddings APIs - Microsoft Docs
Hi, I am Foteini Savvidou!
An Electrical and Computer Engineer and Microsoft AI MVP (Most Valuable Professional) from Greece.