dev-resources.site

Optimise OpenAI Whisper API: Audio Format, Sampling Rate and Quality

Published at
9/17/2023
Categories
openai
whisper
programming
audio
Author
mxro

While experimenting with OpenAI's Whisper model via the API, I noticed that it sometimes felt slow when recognising voice commands sent from a tool I have developed. I originally put this down to an inherent constraint of the API, attributing it to the amount of data that had to be sent to the server - in this instance, somewhat large audio files.

However, tinkering with the format of the audio files sent to OpenAI led me to discover that response times could be improved considerably - by over 50%, in fact. The main culprit for the lag was the size of the files I sent through the API, i.e. the recordings to be transcribed.

By adjusting the sampling rate and quality of the content I sent to the OpenAI API, I managed to cut the file size significantly.
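To make the link between quality settings and file size concrete, here is a minimal sketch. It assumes a constant-bitrate MP3 and ignores headers and tags, so it is only a rough estimate. The function name `estimated_mp3_bytes` and the 2.5-second clip length are illustrative assumptions, not values from my measurements.

```python
def estimated_mp3_bytes(bitrate_kbps: float, duration_s: float) -> int:
    """Approximate size of a constant-bitrate MP3, ignoring headers and tags."""
    bits = bitrate_kbps * 1000 * duration_s  # kbps means 1000 bits per second
    return round(bits / 8)                   # 8 bits per byte


if __name__ == "__main__":
    duration_s = 2.5  # assumed length of a short spoken command
    for kbps in (16, 32, 128):
        size_kb = estimated_mp3_bytes(kbps, duration_s) / 1000
        print(f"{kbps:>3} kbps -> ~{size_kb:.0f} kB")
```

Under these assumptions the size scales linearly with the bit rate, so dropping from 128 kbps to 16 kbps shrinks the upload by roughly a factor of eight.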

TL;DR

It seems you can reduce latency considerably by lowering quality settings without detriment to transcription precision. Here are the settings I suggest:

  • Format: MP3
  • Bit Rate: 16 kbps
  • Sample Rate: 12 kHz
  • Channels: mono

Experiment!

I conducted a series of tests, each time recording the same command, 'Open Firefox'. The table below shows the resulting file sizes as I varied the MP3 bit rate from 16 kbps to 128 kbps and the sampling rate between 12 kHz and 24 kHz.

Recording Sizes

(Note: most microphones sample at 16 kHz)

I generated these recordings using fmedia with variations of the following command:



./bin/fmedia-1.31-windows-x64/fmedia/fmedia.exe --record --overwrite --mpeg-quality=16 --rate=24000 --out=rec_16_24k.mp3
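The variations can be generated rather than typed by hand. The sketch below builds one fmedia command per combination of bit rate and sample rate. It assumes the quality option is spelled `--mpeg-quality`, and the exact set of bit rates (16, 32 and 128 kbps) and the helper name `fmedia_record_cmd` are assumptions for illustration; the output names follow the `rec_<bitrate>_<rate>k.mp3` pattern used in this post.

```python
from itertools import product

# Path taken from the command above; adjust to your installation.
FMEDIA = "./bin/fmedia-1.31-windows-x64/fmedia/fmedia.exe"


def fmedia_record_cmd(bitrate_kbps: int, rate_hz: int) -> list[str]:
    """Build the fmedia recording command for one quality setting."""
    out = f"rec_{bitrate_kbps}_{rate_hz // 1000}k.mp3"
    return [
        FMEDIA, "--record", "--overwrite",
        f"--mpeg-quality={bitrate_kbps}",  # assumed spelling of the option
        f"--rate={rate_hz}",
        f"--out={out}",
    ]


if __name__ == "__main__":
    # Assumed test matrix: three bit rates at two sample rates.
    for kbps, rate in product((16, 32, 128), (12000, 24000)):
        print(" ".join(fmedia_record_cmd(kbps, rate)))
```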



The combined times for transcription and GPT-4 parsing are as follows:

File               Size    Time
rec_16_12k.mp3     5 KB    1.8 s
rec_32_12k.mp3     9 KB    1.8 s
rec_128_24k.mp3    33 KB   2.6 s
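As a quick worked calculation on the three rows above, the sketch below computes the relative reduction in size and time between the largest and smallest recording. For these rows that comes to roughly 85% less data and roughly 31% less time; other runs may well differ. The helper name `percent_reduction` is my own.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# (size in KB, time in seconds) for the rows in the table above
rows = {
    "rec_16_12k.mp3": (5, 1.8),
    "rec_32_12k.mp3": (9, 1.8),
    "rec_128_24k.mp3": (33, 2.6),
}

if __name__ == "__main__":
    big_size, big_time = rows["rec_128_24k.mp3"]
    small_size, small_time = rows["rec_16_12k.mp3"]
    print(f"size reduction: {percent_reduction(big_size, small_size):.1f}%")
    print(f"time reduction: {percent_reduction(big_time, small_time):.1f}%")
```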

I obtained these measurements via this command:



time ./bin/whisper-autohotkey/whisper-autohotkey.exe rec_16_24k.mp3
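A single `time` run is noisy, so it can help to repeat the measurement and look at the median. The sketch below is one way to do that, not what I actually ran; the helper names `time_command` and `summarize` are hypothetical, and the demo uses the Python interpreter as a harmless stand-in for the real executable.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd: list[str], repeats: int = 5) -> list[float]:
    """Run cmd `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: list[float]) -> dict[str, float]:
    """Median, fastest and slowest run."""
    return {
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Stand-in command; replace with the real one, for example
    # ["./bin/whisper-autohotkey/whisper-autohotkey.exe", "rec_16_24k.mp3"]
    runs = time_command([sys.executable, "-c", "pass"], repeats=3)
    print(summarize(runs))
```

The median is less sensitive than the mean to the occasional slow API response.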



Regardless of the quality settings, the accuracy of transcription did not vary.

Conclusion

Even though my experiments are not perfect - the Whisper and ChatGPT APIs are, after all, notoriously unpredictable for consumer-grade accounts - the results suggest that reducing the quality of audio recordings sent to Whisper's API can cut latency significantly without sacrificing transcription accuracy.

There seems to be little to no advantage in reducing the quality below 32 kbps at 12 kHz: the 16 kbps recording was smaller (5 KB versus 9 KB) but took the same 1.8 s. This is possibly because of the characteristics of MP3 compression.
