Deploying OpenAI's Whisper Large V3 Model on SageMaker Using Hugging Face Libraries

Published at
1/23/2024
Categories
whisper
huggingface
sagemaker
aws
Author
makawtharani

In a recent project, I was using OpenAI's Whisper model for transcription. The sprint goal was to deploy it on SageMaker using the Hugging Face libraries. However, I hit a blocker: a ModelError that puzzled me for a couple of hours.
The error in more detail:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "Wrong index found for \u003c|0.02|\u003e: should be None but found 50366."

After some research, I found a solution discussed in Issue #58 of the OpenAI Whisper Large V3 repository on Hugging Face. The error is caused by a version mismatch in the transformers library, and the fix is to force the use of a more recent version. Note that, at the time of writing, the required version is not yet supported out of the box by the Hugging Face SageMaker integration.

In this blog post, I present a straightforward way to apply this fix, whether you deploy Whisper Large models from a SageMaker domain or a SageMaker notebook.

1. Setting Up Directory and Files

In this step, we create the directory structure and files needed for the Whisper deployment: the whisper-model directory, the inference.py script, and the requirements.txt file.
The inference.py script loads the Whisper model and processor and wraps them in a speech-recognition pipeline, while requirements.txt pins the newer transformers version.

import os

# Directory and file paths
dir_path = './whisper-model'
inference_file_path = os.path.join(dir_path, 'code/inference.py')
requirements_file_path = os.path.join(dir_path, 'code/requirements.txt')

# Create the directory structure
os.makedirs(os.path.dirname(inference_file_path), exist_ok=True)

# Inference.py content
inference_content = '''
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Model and task specifications
model_id = "openai/whisper-large-v3"
task = "automatic-speech-recognition"

# Device configuration
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

def model_fn(model_dir):
    try:
        print(f"Loading model: {model_id}")
        # Load the model
        model = AutoModelForSpeechSeq2Seq.from_pretrained(
            model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
        )
        model.to(device)
        print(f"Model loaded on device: {device}")

        # Load the processor
        processor = AutoProcessor.from_pretrained(model_id)
        print("Processor loaded")

        # Create and return a pipeline for ASR
        asr_pipeline = pipeline(
            task,
            model=model,
            tokenizer=processor.tokenizer,
            feature_extractor=processor.feature_extractor,
            return_timestamps=True,
            torch_dtype=torch_dtype,
            device=device,
        )
        print("Pipeline created")

        return asr_pipeline
    except Exception as e:
        print(f"An error occurred: {e}")
        raise
'''

# Write the inference.py file
with open(inference_file_path, 'w') as file:
    file.write(inference_content)

# Requirements.txt content
requirements_content = '''
transformers==4.38.0
accelerate==0.26.1
'''

# Write the requirements.txt file
with open(requirements_file_path, 'w') as file:
    file.write(requirements_content)
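
Because inference.py is written from a Python string, a typo in that string would only surface when the endpoint container tries to load the script. A minimal local check, assuming the file was written to the path used above, is to parse it with the standard `ast` module and confirm that it defines `model_fn`. The helper name `defined_functions` is illustrative and not part of any SageMaker or Hugging Face API.

```python
import ast
from pathlib import Path


def defined_functions(source):
    """Return the names of top-level functions defined in Python source code."""
    tree = ast.parse(source)  # raises SyntaxError if the script is malformed
    return {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}


# Path assumed from the setup step above; the check is skipped if it is missing.
script_path = Path("whisper-model/code/inference.py")
if script_path.exists():
    names = defined_functions(script_path.read_text())
    print("model_fn defined:", "model_fn" in names)
```

This only verifies syntax and the presence of the function. It does not import torch or transformers, so it runs quickly in the notebook.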

2. Archiving the Directory

In this step, we pack the contents of the whisper-model directory into a compressed archive, whisper-model.tar.gz, using the make_archive function from shutil.
This archive is what gets deployed to SageMaker.

import shutil
shutil.make_archive('./whisper-model', 'gztar', './whisper-model')
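
It can help to confirm that the code directory sits at the root of the archive, since that is where the Hugging Face inference container is expected to look for inference.py and requirements.txt. The sketch below rebuilds the archive in a temporary directory and lists the files it contains. The helper name `archive_members` is hypothetical, and the final call assumes the whisper-model directory from the first step exists.

```python
import os
import posixpath
import shutil
import tarfile
import tempfile


def archive_members(src_dir):
    """Archive src_dir the same way as above and return the file paths inside."""
    with tempfile.TemporaryDirectory() as tmp:
        # Build the archive in a scratch directory so the real one is untouched.
        archive_path = shutil.make_archive(os.path.join(tmp, "check"), "gztar", src_dir)
        with tarfile.open(archive_path, "r:gz") as tar:
            # Member names may carry a leading "./"; normalize them for comparison.
            return sorted(
                posixpath.normpath(m.name) for m in tar.getmembers() if m.isfile()
            )


# Skipped when the directory from the setup step is not present.
if os.path.isdir("whisper-model"):
    print(archive_members("whisper-model"))
```

For the layout created earlier, the listing should contain code/inference.py and code/requirements.txt, with no extra whisper-model/ prefix in front.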

3. Uploading the Model to S3

This step uploads the compressed Whisper model archive to an Amazon S3 bucket.
We use the SageMaker session's upload_data helper, which returns the S3 URI of the uploaded file.

import sagemaker

# Get the SageMaker session and default S3 bucket
sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket() # Change if you want to store in a different bucket
prefix = 'whisper/code'

# Upload the model to S3
s3_path = sagemaker_session.upload_data(
    'whisper-model.tar.gz', 
    bucket=bucket,
    key_prefix=prefix
)

print(f"Model uploaded to {s3_path}")
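
The returned value is later passed as model_data, so it is worth checking that it has the expected shape. Here is a small sketch that splits an S3 URI into bucket and key using only the standard library. The function name `split_s3_uri` and the example URI are hypothetical, and the assumption is that upload_data returns a URI of the form s3://bucket/prefix/filename for a single file.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical URI, used only to illustrate the expected shape.
example_uri = "s3://example-bucket/whisper/code/whisper-model.tar.gz"
bucket_name, key = split_s3_uri(example_uri)
print(bucket_name, key)
```

In the notebook, the same function could be called on s3_path to confirm that the key ends with whisper-model.tar.gz before deploying.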

4. Deploying the Model on SageMaker

Here, we deploy the Whisper model on SageMaker using the HuggingFaceModel class.
We specify the transformers version, PyTorch version, and Python version of the container, along with the instance type, to deploy the model as an inference endpoint.

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    model_data=s3_path,
    role=role,
)
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge"
)
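
Note that the container is selected with transformers 4.26.0, while the fix depends on the newer version pinned in code/requirements.txt being installed when the endpoint starts. The sketch below, which uses hypothetical helper names `parse_pins` and `version_tuple`, compares the two versions so that the override is explicit. It assumes plain `name==x.y.z` pins like the ones written in the first step and is not a general requirements parser.

```python
def parse_pins(requirements_text):
    """Collect 'name==version' pins from requirements.txt content."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def version_tuple(version):
    """Convert a plain dotted version such as '4.38.0' into (4, 38, 0)."""
    return tuple(int(part) for part in version.split("."))


container_version = "4.26.0"  # transformers_version passed to HuggingFaceModel
pins = parse_pins("transformers==4.38.0\naccelerate==0.26.1\n")
overrides = version_tuple(pins["transformers"]) > version_tuple(container_version)
print("requirements.txt upgrades transformers:", overrides)
```

If the comparison were false, the endpoint would run with the container's older transformers version, which is the situation that produced the ModelError described at the start.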

5. Making Predictions with the Deployed Model

In this final step, we configure the predictor to send audio input.
We set the data serializer for audio and use the deployed endpoint to transcribe speech from an audio file.

from sagemaker.serializers import DataSerializer

predictor.serializer = DataSerializer(content_type='audio/x-audio')

# Make sure the input file "sample.wav" exists
with open("sample.wav", "rb") as f:
    data = f.read()
predictor.predict(data)
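
Since the pipeline is created with return_timestamps=True, the response is expected to carry timestamped chunks alongside the full text. The sketch below formats such a response into readable lines. It assumes the response is a dict with "text" and "chunks" keys, where each chunk has a "timestamp" pair and a "text" field. The sample dict is made up for illustration and is not real endpoint output, and `format_chunks` is a hypothetical helper.

```python
def format_chunks(result):
    """Turn a timestamped transcription result into '[start - end] text' lines."""

    def label(seconds):
        # A boundary may be missing (None), for example at the end of the audio.
        return f"{seconds:.2f}" if seconds is not None else "?"

    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        lines.append(f"[{label(start)} - {label(end)}] {chunk['text'].strip()}")
    return lines


# Made-up sample in the assumed response shape, for illustration only.
sample_result = {
    "text": " Hello world. How are you?",
    "chunks": [
        {"timestamp": [0.0, 1.5], "text": " Hello world."},
        {"timestamp": [1.5, None], "text": " How are you?"},
    ],
}
for line in format_chunks(sample_result):
    print(line)
```

In the notebook, the same helper could be applied to the value returned by predictor.predict(data).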

Hope it was helpful!!
