Generate subtitles with OpenAI Whisper and Eyevinn OSC

Published at: 2/8/2024
Categories: opensource, openai, whisper, api
Author: oscnrd

With AI and LLMs being on (almost) everyone's mind these days, we at Eyevinn thought that it would be interesting to explore the possibilities of using OpenAI's speech-to-text model to generate subtitles for video/audio content.

The initial idea was that it would be cool to generate subtitles on the fly when a user requests them for a movie. They won't be perfect, of course, but they may be good enough to provide basic translations for content that otherwise wouldn't have subtitles available in that language.

With that idea in mind, we have created a small POC that is available on GitHub. It's a very simple API: you provide a link to a video/audio segment, and the response is the transcribed content. We've also added the option of uploading the transcribed file to an S3 bucket.

To transcribe content, send a POST request to the /transcribe endpoint:

{
  "url": "https://example.net/vod-audio_en=128000.aac",
  "language": "en", // ISO 639-1 language code (https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) (optional)
  "format": "vtt" // Supported formats: json, text, srt, verbose_json, or vtt (optional)
}
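
As a rough illustration, the request could be built and sent from Python using only the standard library. This is a sketch under assumptions, not part of the POC: `base_url` stands for wherever your instance is running, the function names are made up for this example, and sending the body with a JSON `Content-Type` header and no authentication is an assumption about the service.

```python
import json
import urllib.request

# Formats listed in the request example above.
SUPPORTED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_transcribe_request(base_url, media_url, language=None, fmt=None):
    """Build (but do not send) a POST request for the /transcribe endpoint.

    base_url is a hypothetical address of a running instance.
    """
    payload = {"url": media_url}
    if language is not None:
        payload["language"] = language  # ISO 639-1 code, optional
    if fmt is not None:
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {fmt}")
        payload["format"] = fmt  # optional
    return urllib.request.Request(
        base_url.rstrip("/") + "/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the service expects a JSON body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(base_url, media_url, language=None, fmt=None, timeout=300):
    """Send the request and return the decoded JSON response."""
    request = build_transcribe_request(base_url, media_url, language, fmt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `transcribe("http://localhost:8000", "https://example.net/vod-audio_en=128000.aac", "en", "vtt")` would then send the payload shown above, with `http://localhost:8000` being a placeholder address. Note that real JSON has no comments, so the payload contains only the three fields.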

The response will then be the transcribed content:

{
  "workerId": "BFabbcCi3IYuWOj6LfsgK",
  "result": "WEBVTT\n\n00:00:00.000 --> 00:00:04.180\nor into transcoding I mean, I could probably add just the keyframe in the start and just\n\n00:00:04.180 --> 00:00:06.920\nskip I-frames and the rest of that.\n\n"
}
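
The subtitle text arrives as a JSON-escaped string in the `result` field, so decoding the response turns the `\n` sequences into real line breaks. A minimal sketch, assuming the response shape shown above and a `vtt` format request (the function names and the output path are illustrative only):

```python
import json
from pathlib import Path


def extract_subtitles(response_body: str) -> str:
    """Return the subtitle text carried in the "result" field of a response."""
    data = json.loads(response_body)
    return data["result"]


def save_subtitles(response_body: str, output_path: str) -> Path:
    """Write the decoded subtitle text to a file, e.g. a .vtt file."""
    path = Path(output_path)
    path.write_text(extract_subtitles(response_body), encoding="utf-8")
    return path
```

For example, `save_subtitles(body, "segment.vtt")` would store the transcription next to the script, where `body` is the raw response text.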

Formatted output:

WEBVTT

00:00:00.000 --> 00:00:01.940
So into transcoding, I mean, I could

00:00:01.940 --> 00:00:03.700
probably add just a keyframe at the start

00:00:03.700 --> 00:00:06.700
and then just skip iFrames in the rest of the scenes.
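
If you want to work with the individual cues, for instance to merge segments or shift timings, output like the above can be split into cues with a few lines of Python. This is a simplified sketch rather than a full WebVTT parser: it assumes `hh:mm:ss.mmm` timestamps as in the sample, ignores cue settings and styling, and the `Cue` class and `parse_vtt` function are names invented for this example.

```python
import re
from dataclasses import dataclass

# Matches timing lines such as "00:00:01.940 --> 00:00:03.700".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3}) --> (\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def _to_ms(hours, minutes, seconds, millis):
    """Convert timestamp parts (as strings) to integer milliseconds."""
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_vtt(vtt: str) -> list:
    """Split VTT text into cues; blocks without a timing line are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                parts = match.groups()
                text = "\n".join(lines[index + 1:]).strip()
                cues.append(Cue(_to_ms(*parts[:4]), _to_ms(*parts[4:]), text))
                break
    return cues
```

Working in integer milliseconds avoids floating-point rounding when cue times are compared or offset.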

The service calls the OpenAI API, so an OpenAI account and API key are required.

The easiest way to try this out is to use our newly released (currently in beta) platform that we call Eyevinn Open Source as a service. It's a cloud platform that makes it easy to prototype, monetize, and take open-source services to production. Signing up is completely free, and once you have an account you can try out and deploy a selection of open-source services with the click of a button. What makes this platform special is that you only pay for the cost of running the services, plus a small community fee that is shared with the maintainers of the open-source services you use.

Take the auto-subtitles API as an example: once you have created an account, you will be presented with a dashboard containing multiple services.

Dashboard view

When creating an instance of auto-subtitles, you'll need to input your OpenAI key (it is only used when making calls to the OpenAI API).

Creating a service view

That's it! You are now up and running and can try it out.

Eyevinn OSaaS is currently in beta, so let us know what you think about it.

As always, if you need assistance with the development and implementation of this, our team of video developers is happy to help you out. If you have any questions or comments, just drop a line in the comments section of this post.
