Comparison of YouTube Free Autogenerated Subtitles vs Paid AI
- TL;DR
- Prerequisites
- Create a process to download and convert an autogenerated subtitle file from YouTube
- Download an audio file from YouTube and import it into Notta
- Comparison of YouTube vs Notta AI
- Conclusion
- Get the Source Code
- What To Do Next
TL;DR
YouTube isn’t just for watching and sharing videos. It’s also for listening to podcasts and reading transcriptions. An open-source application lets you download only the video, audio, or text data, so that you can consume it at your leisure or even autogenerate blog articles.
In this first part of a multi-part series, you’ll use yt-dlp, a command-line interface (CLI) tool, to download YouTube’s free autogenerated subtitles and compare them with paid AI transcription services, such as OpenAI, Notta.ai, and Otter.ai.
Prerequisites
- An open-source CLI yt-dlp to download YouTube files.
- Python 3.
- Notta.ai account (no credit card required).
Create a process to download and convert an autogenerated subtitle file from YouTube
In order to design a workflow to autogenerate blog articles, we must first create a process to download a subtitle file from YouTube.
- Open a terminal and run the following command to check that yt-dlp is installed.
yt-dlp --version
If it is installed, the version is printed.
2023.10.1
- Type the following command to download the autogenerated subtitle file of a YouTube video. Replace [YOUTUBE_URL] with any YouTube link, e.g. https://www.youtube.com/watch?v=x3vnCKivCjs.
yt-dlp --write-auto-sub --skip-download [YOUTUBE_URL]
You should see output similar to the following.
[youtube] Extracting URL: https://www.youtube.com/watch?v=x3vnCKivCjs
[youtube] x3vnCKivCjs: Downloading webpage
[youtube] x3vnCKivCjs: Downloading ios player API JSON
[youtube] x3vnCKivCjs: Downloading android player API JSON
[youtube] x3vnCKivCjs: Downloading m3u8 information
[info] x3vnCKivCjs: Downloading subtitles: en
[info] x3vnCKivCjs: Downloading 1 format(s): 22
[info] Writing video subtitles to: The Fastest Way to Lose Belly Fat [x3vnCKivCjs].en.vtt
[download] Destination: The Fastest Way to Lose Belly Fat [x3vnCKivCjs].en.vtt
[download] 100% of 85.84KiB in 00:00:00 at 879.88KiB/s
The subtitle file name is built from the video title and video ID, e.g. The Fastest Way to Lose Belly Fat [x3vnCKivCjs].en.vtt.
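If you would rather drive this step from Python, the command can be wrapped in a small helper. This is a minimal sketch, not part of the original workflow: the function names `subtitle_command`, `expected_subtitle_name`, and `download_subtitles` are hypothetical, and the file-name helper simply assumes the naming pattern seen in the output above.

```python
import subprocess
from typing import List


def subtitle_command(url: str) -> List[str]:
    """Build the same yt-dlp call as above: auto subtitles only, no video."""
    return ["yt-dlp", "--write-auto-sub", "--skip-download", url]


def expected_subtitle_name(title: str, video_id: str, lang: str = "en") -> str:
    """Guess the subtitle file name from the pattern shown in the output.

    Assumption: default yt-dlp naming. Titles that contain special
    characters may be sanitised and end up slightly different.
    """
    return f"{title} [{video_id}].{lang}.vtt"


def download_subtitles(url: str) -> None:
    """Run yt-dlp. Passing a list (no shell) avoids quoting issues in the URL."""
    subprocess.run(subtitle_command(url), check=True)


print(subtitle_command("https://www.youtube.com/watch?v=x3vnCKivCjs"))
print(expected_subtitle_name("The Fastest Way to Lose Belly Fat", "x3vnCKivCjs"))
```

Calling `download_subtitles(url)` requires yt-dlp to be on your PATH and raises an error if the command fails.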
You need to convert the subtitle file because the vtt file contains metadata that makes it hard to read.
Fortunately, there is a Python script that converts a YouTube subtitle file (vtt) to plain text. Credit to glasslion for making it open-source.
- Download the above Python script, vtt2text.py, into the same folder as your vtt file.
- In your terminal, type the following command.
python vtt2text.py The\ Fastest\ Way\ to\ Lose\ Belly\ Fat\ \[x3vnCKivCjs\].en.vtt
- If successful, you’ll find a new txt file, e.g. The Fastest Way to Lose Belly Fat [x3vnCKivCjs].en.txt, which has readable text but still contains a few metadata lines.
00:00
today I'm going to share with you the absolute fastest way
to lose your belly now you could have the best willpower
the best discipline really want it really bad and never
really see any results because you're missing the technique
you're missing the strategy I'm the perfect example I took
guitar lessons for six years right as a teenager and I
never really progressed or never really went anywhere
because the techniques that were taught to me were just not
that great great the same thing happened with tennis in
college I was never taught the right technique and so I
would use force and I would keep practicing but practice
incorrectly never went anywhere and this really applies
with weight loss too because if you have the right way of
doing something you don't have to put so much effort into
it so what I'm going to show you is the fastest way to
achieve your goal number one when you do the ketogenic diet
it's not differentiated what type of fats you should be
eating in other words keto is really about low carb you
...
Note: You’ll need to tweak the Python script to remove the metadata lines. Alternatively, you can download my enhanced Python script from Get the Source Code.
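One way to handle the leftover metadata is to post-process the txt file instead of editing the converter. The sketch below is an illustration only, not the enhanced script from the repository. It assumes that a metadata line contains nothing but a timestamp such as 00:00, as in the sample output above; the name `strip_timestamp_lines` is hypothetical.

```python
import re

# Assumption: a metadata line holds only a timestamp, e.g. 00:00 or 1:02:03.
TIMESTAMP_LINE = re.compile(r"^\d{1,2}:\d{2}(?::\d{2})?$")


def strip_timestamp_lines(text: str) -> str:
    """Return the text without timestamp-only lines and blank lines."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or TIMESTAMP_LINE.match(stripped):
            continue  # skip blank lines and bare timestamps
        kept.append(stripped)
    return "\n".join(kept)


sample = (
    "00:00\n"
    "today I'm going to share with you the absolute fastest way\n"
    "to lose your belly now you could have the best willpower\n"
)
print(strip_timestamp_lines(sample))
```

Lines that merely mention a time inside a sentence are kept, because the pattern must match the whole line.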
Download an audio file from YouTube and import it into Notta
- Type the following command to download the audio of a YouTube video. Replace [YOUTUBE_URL] with any YouTube link, e.g. https://www.youtube.com/watch?v=x3vnCKivCjs.
yt-dlp -f m4a --quiet [YOUTUBE_URL]
The audio file name is built from the video title and video ID, e.g. The Fastest Way to Lose Belly Fat [x3vnCKivCjs].m4a.
- Open a browser, navigate to Notta.ai, and log in to your account.
- In the Dashboard, click Import files and select the audio file above.
- If successful, you should see a new recording under Recent recordings in your Dashboard.
- Click on the recording for your audio, e.g. The Fastest Way to Lose Belly Fat [x3vnCKivCjs], to see the output transcript.
Note: Notta’s output transcript is split into blocks by speaker.
Today, I'm going to share with you the absolute fastest way to lose your belly. Now, you can have the best willpower, the best discipline, really want it, really bad, and never really see any results because you're missing the technique.
You're missing the strategy. I'm the perfect example. I took guitar lessons for six years as a teenager, and I never really progressed or never really went anywhere because the techniques that were taught to me were just not that great.
The same thing happened with tennis in college. I was never taught the right technique, and so I would use force, and I would keep practicing but practice incorrectly. Never went anywhere. This really applies with weight loss too because if you have the right way of doing something, you don't have to put so much effort into it.
What I'm going to show you is the fastest way to achieve your goal. Number one, when you do the ketogenic diet, it's not differentiated what type of fats you should be eating. In other words, keto is really about low carb.
Comparison of YouTube vs Notta AI
Let’s start with the negatives.
YouTube free autogenerated subtitles | Notta AI transcription services |
---|---|
1. You must be comfortable using the command line. | 1. Paid service, but offers a free tier of 120 mins. |
2. Two-step process of downloading the subtitle and converting it to text. | 2. Two-step process of downloading an audio file and importing it. |
3. The converted text has a few lines of metadata. | 3. Most features are behind a paywall, e.g. you can only view the first 5 mins of a transcript. |
4. The converted text has no punctuation. | 4. No API or SDK to allow the creation of an automated pipeline. |
Now for the positives.
YouTube free autogenerated subtitles | Notta AI transcription services |
---|---|
1. You can create a workflow of automated processes using a pipeline. | 1. The converted text has punctuation and proper sentences. |
2. No subscription cost, only your Internet bandwidth for downloading the subtitles. | 2. You can create a custom template for your converted text. |
Based on the sample output text, YouTube and Notta seem to have about the same accuracy. However, you’ll need a separate tool to measure the accuracy properly.
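As a rough starting point, you could compare the two transcripts word by word. The sketch below is one possible approach, not a method used in this article: it strips case and punctuation, then computes a word-level edit distance divided by the length of the reference. Note that this only measures how much the two transcripts disagree; it becomes an accuracy figure (word error rate) only if the reference is a human-verified transcript. All function names are hypothetical.

```python
import re
from typing import List


def normalise(text: str) -> List[str]:
    """Lower-case, drop punctuation (keeping apostrophes), split into words."""
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"[a-z0-9']+", text)


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance counted in whole words."""
    previous = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        current = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            current.append(min(
                previous[j] + 1,         # word missing from b
                current[j - 1] + 1,      # extra word in b
                previous[j - 1] + cost,  # same or substituted word
            ))
        previous = current
    return previous[-1]


def disagreement_rate(reference: str, candidate: str) -> float:
    """Word edits needed to turn candidate into reference, per reference word."""
    ref, cand = normalise(reference), normalise(candidate)
    if not ref:
        return 0.0 if not cand else 1.0
    return word_edit_distance(ref, cand) / len(ref)


notta = "were just not that great. The same thing happened with tennis in college."
youtube = "were just not that great great the same thing happened with tennis in college"
print(disagreement_rate(notta, youtube))
```

In this excerpt the only difference is the repeated word "great" in the YouTube text, so the rate is one edit over the thirteen reference words.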
Conclusion
The free YouTube autogenerated subtitles are a good option for personal use, such as generating transcriptions for offline reading. However, because they lack polish and customisable templates, paid AI transcription services should be considered for commercial use, which may include autogenerating articles for your blog.
Get the Source Code
You can download the above source code from my GitHub repository dennislwm/playscribe.
What To Do Next
You can further extend your code in several meaningful ways:
- Implement a GitHub Actions workflow that continuously monitors an RSS feed for new YouTube links, processes the links, and returns the transcription results.
- Implement a static site blog that publishes each transcription result as a new post, with an RSS feed so that users can subscribe to new posts.
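For the feed-monitoring idea, the core step is pulling video links out of a feed and skipping the ones already processed. The sketch below is a starting point under stated assumptions: the feed is in Atom format, each entry carries a `link` element whose `href` points at a youtube.com/watch URL, and `SAMPLE_FEED` is a hand-written example rather than real feed output. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# Atom namespace, written in ElementTree's {uri}tag form.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hand-written sample for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example channel</title>
  <entry>
    <title>The Fastest Way to Lose Belly Fat</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=x3vnCKivCjs"/>
  </entry>
</feed>"""


def video_links(feed_xml: str) -> List[str]:
    """Collect YouTube watch URLs from the entries of an Atom feed."""
    root = ET.fromstring(feed_xml)
    links = []
    for entry in root.iter(f"{ATOM}entry"):
        for link in entry.findall(f"{ATOM}link"):
            href = link.get("href", "")
            if "youtube.com/watch" in href:
                links.append(href)
    return links


def new_links(feed_xml: str, seen: Set[str]) -> List[str]:
    """Return only the links that have not been processed before."""
    return [url for url in video_links(feed_xml) if url not in seen]


print(new_links(SAMPLE_FEED, seen=set()))
```

A scheduled job could fetch the feed, call `new_links` with the set of URLs it has stored from earlier runs, and pass each new URL to the download step.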
Was this article useful? Help me to improve by replying in the comments.