
AI powered video summarizer with Amazon Bedrock and Anthropic’s Claude

Published at: 1/3/2024
Categories: bedrock, claude, generativeai, serverless
Author: zied

Photo by [Andy Benham](https://unsplash.com/@benham3160?utm_source=medium&utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&utm_medium=referral)

At times, I find myself wanting to quickly get a summary of a video or capture the key points of a tech talk. Thanks to the capabilities of generative AI, achieving this is entirely possible with minimal effort.

In this article, I’ll walk you through the process of creating a service that summarizes YouTube videos based on their transcripts and generates audio from these summaries.

AI-powered YouTube video summarizer

We’ll leverage Anthropic’s Claude 2.1 foundation model through Amazon Bedrock for summary generation, and Amazon Polly to synthesize speech from these summaries.

Solution overview

I will use AWS Step Functions to orchestrate the different steps involved in the summary and audio generation:

AI-powered YouTube video summarizer architecture

🔍 Let’s break this down:

  • The Get Video Transcript function retrieves the transcript from a specified YouTube video URL. Upon successful retrieval, the transcript is stored in an S3 bucket, ready for processing in the next step.

  • The Generate Model Parameters function retrieves the transcript from the bucket and generates the prompt and inference parameters specific to Anthropic’s Claude v2 model. These parameters are then stored in the bucket for use by the Bedrock API in the subsequent step.

  • Invoking the Bedrock API is achieved through Step Functions’ AWS SDK integration, enabling model inference with the inputs stored in the bucket. This step generates a structured JSON containing the summary.

  • Generate audio from summary relies on Amazon Polly to perform speech synthesis from the summary produced in the previous step. This step returns the final output containing the video summary in text format, as well as a presigned URL for the generated audio file.

  • The bucket serves as state storage shared across all the steps of the state machine. We don’t know the size of the generated video transcript upfront; for lengthy videos it might exceed the Step Functions payload size limit of 256 KB, so states exchange object keys rather than content, as sketched below.
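To make this concrete, each state only needs to pass along the request id and the S3 object keys it produced, never the content itself. A minimal sketch of such a contract (the field names below are illustrative assumptions, not the exact shape used in the repository):

    // Illustrative shape of the payload exchanged between states:
    // the transcript, prompt and audio stay in S3, only their keys travel.
    type SummarizerState = {
      requestId: string;            // Step Functions execution name
      videoUrl: string;             // input YouTube video URL
      transcriptKey?: string;       // e.g. `${requestId}/transcript`
      modelParametersKey?: string;  // e.g. `${requestId}/model-parameters`
      audioKey?: string;            // key of the generated audio file
    };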

On using Anthropic’s Claude 2.1

At the time of writing, the Claude 2.1 model supports a 200K-token context window, roughly 150K words. It also provides good accuracy over long documents, making it well suited for summarizing lengthy video transcripts.

TL;DR

You will find the complete source code here 👇
GitHub - ziedbentahar/yt-video-summarizer-with-bedrock

I will use Node.js, TypeScript, and the AWS CDK for IaC.

Solution details

1- Enabling Anthropic’s Claude v2 in your account

Amazon Bedrock offers a range of foundation models, including Amazon Titan, Anthropic’s Claude, Meta’s Llama 2, and more, which are accessible through the Bedrock APIs. By default, these foundation models are not enabled; they must be enabled through the console before use.

We’ll request access to Anthropic’s Claude models. But first, we’ll need to submit use case details:

Request Anthropic’s Claude access

2- Getting transcripts from YouTube videos

I will rely on this lib for the video transcript extraction (it feels like a cheat code 😉); in fact, this library makes use of an unofficial YouTube API without relying on a headless Chrome solution. For now, it yields good results on several YouTube videos, but I might explore more robust solutions in the future:

The extracted transcript is then stored in the S3 bucket using ${requestId}/transcript as the key.

You can find the code for this lambda function here
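To give an idea of what this function does, here is a minimal sketch of the handler. It assumes the youtube-transcript npm package (the library is not named here in the article), a BUCKET_NAME environment variable, and a simplified event shape; the repository contains the actual implementation.

    import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
    import { YoutubeTranscript } from "youtube-transcript";

    const s3 = new S3Client({});

    // Simplified handler: fetch the transcript and store it under `${requestId}/transcript`.
    // BUCKET_NAME and the event shape are assumptions made for this sketch.
    export const handler = async (event: { videoUrl: string; requestId: string }) => {
      const segments = await YoutubeTranscript.fetchTranscript(event.videoUrl);
      const transcript = segments.map((s) => s.text).join(" ");

      const key = `${event.requestId}/transcript`;
      await s3.send(
        new PutObjectCommand({
          Bucket: process.env.BUCKET_NAME,
          Key: key,
          Body: transcript,
          ContentType: "text/plain",
        })
      );

      return { requestId: event.requestId, transcriptKey: key };
    };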

3- Finding an adequate prompt and generating model inference parameters

At the time of writing, Bedrock only supports Claude’s Text Completions API. Prompts must be wrapped in \n\nHuman: and \n\nAssistant: markers to let Claude understand the conversation context.

Here is the prompt; I find that it produces good results for our use case:

    You are a video transcript summarizer.
    Summarize this transcript in a third person point of view in 10 sentences.
    Identify the speakers and the main topics of the transcript and add them in the output as well.
    Do not add or invent speaker names if you are not able to identify them.
    Please output the summary in JSON format conforming to this JSON schema:
    {
      "type": "object",
      "properties": {
        "speakers": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "topics": {
          "type": "string"
        },
        "summary": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      }
    }

    <transcript>{{transcript}}</transcript>

🤖 Helping Claude produce good results:

  • To clearly mark the transcript to summarize, we use XML tags. Claude will specifically focus on the content encapsulated by these XML tags. I will substitute the {{transcript}} placeholder with the actual video transcript.

  • To assist Claude in generating reliable JSON output, I include in the prompt the JSON schema that needs to be adhered to.

  • Finally, I also need to inform Claude that I want a concise JSON response without unnecessary chattiness, meaning without a preamble or postscript around the JSON payload:

\n\nHuman:{{prompt}}\n\nAssistant:{

Note that the full prompt ends with a trailing {, which puts the first character of the JSON response in Claude’s mouth.
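Putting these pieces together, the prompt assembly could look roughly like the following sketch (the helper name is mine; promptTemplate is the prompt shown above):

    // `promptTemplate` is the prompt shown above, ending with the
    // <transcript>{{transcript}}</transcript> tag.
    const buildPrompt = (promptTemplate: string, transcript: string): string => {
      const prompt = promptTemplate.replace("{{transcript}}", transcript);
      // Wrap with the Human/Assistant markers required by the Text Completions API
      // and end with an opening brace so Claude continues the JSON directly.
      return `\n\nHuman:${prompt}\n\nAssistant:{`;
    };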

As mentioned in the section above, we will store this generated prompt, as well as the model parameters, in the bucket so that they can be used as input to the Bedrock API:

      const modelParameters = {
        prompt,
        max_tokens_to_sample: MAX_TOKENS_TO_SAMPLE,
        top_k: 250,
        top_p: 1,
        temperature: 0.2,
        stop_sequences: ["Human:"],
        anthropic_version: "bedrock-2023-05-31",
      };

You can follow this link for the full code of the generate-model-parameters lambda function.

4- Invoking the Claude model

In this step, we’ll avoid writing a custom Lambda function to invoke the Bedrock API. Instead, we’ll use Step Functions’ direct SDK integration. This state loads from the bucket the model inference parameters that were generated in the previous step:
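As a rough sketch, this task can be expressed in CDK with a CustomState wrapping the Bedrock invokeModel integration, which can read the request body directly from S3. The bucket name, key layout, and state name below are assumptions, and depending on your setup the ModelId may need to be the full foundation model ARN:

    import * as sfn from "aws-cdk-lib/aws-stepfunctions";

    // Sketch only: the model parameters prepared in the previous step are read
    // straight from S3. Bucket name and key layout are assumptions.
    const invokeClaude = new sfn.CustomState(this, "Invoke Claude", {
      stateJson: {
        Type: "Task",
        Resource: "arn:aws:states:::bedrock:invokeModel",
        Parameters: {
          ModelId: "anthropic.claude-v2:1",
          Input: {
            "S3Uri.$":
              "States.Format('s3://<your-bucket-name>/{}/model-parameters', $$.Execution.Name)",
          },
        },
        // The ResultSelector with the intrinsic functions is shown below.
      },
    });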

☝️ Note: Since the prompt ends with an opening {, the completion in the API response is missing that leading {; Claude outputs only the rest of the requested JSON.

We use intrinsic functions in the state’s ResultSelector to add back the missing opening curly brace and to format the state output as a well-formed JSON payload:

    ResultSelector: {
      "id.$": "$$.Execution.Name",
      "summaryTaskResult.$":
        "States.StringToJson(States.Format('\\{{}', $.Body.completion))",
    }

I have to admit, it is not ideal, but it helps us get by without writing a custom Lambda function.

5- Generating audio from video summary

This step is heavily inspired by this previous blog post. Amazon Polly generates the audio from the video summary.

Here are the details of the synthesize function:
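The real code is in the repository; as a minimal sketch using the AWS SDK v3 Polly client, with an assumed voice, engine, and output format, it could look like this:

    import { PollyClient, SynthesizeSpeechCommand } from "@aws-sdk/client-polly";

    const polly = new PollyClient({});

    // Synthesize the summary text into MP3 audio bytes.
    // Voice, engine and language are assumptions for this sketch.
    const synthesize = async (text: string): Promise<Uint8Array> => {
      const { AudioStream } = await polly.send(
        new SynthesizeSpeechCommand({
          Text: text,
          OutputFormat: "mp3",
          VoiceId: "Joanna",
          Engine: "neural",
          LanguageCode: "en-US",
        })
      );

      if (!AudioStream) {
        throw new Error("No audio stream returned by Polly");
      }

      return AudioStream.transformToByteArray();
    };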

Once the audio is generated, we store it in the S3 bucket and generate a presigned URL so it can be downloaded afterwards.
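Storing the audio and producing the presigned URL might then look like the following sketch (bucket name, key layout, and expiry are assumptions):

    import { S3Client, PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";
    import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

    const s3 = new S3Client({});

    // Store the generated audio in the bucket and return a presigned download URL.
    // Bucket name, key layout and expiry are assumptions for this sketch.
    const storeAudioAndGetUrl = async (requestId: string, audio: Uint8Array) => {
      const Bucket = process.env.BUCKET_NAME!;
      const Key = `${requestId}/audio.mp3`;

      await s3.send(
        new PutObjectCommand({ Bucket, Key, Body: audio, ContentType: "audio/mpeg" })
      );

      return getSignedUrl(s3, new GetObjectCommand({ Bucket, Key }), {
        expiresIn: 3600, // one hour
      });
    };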

☝️ On language detection: In this example, I am not performing language detection; by default, I assume that the video is in English. You can find in my previous article how to handle language detection in speech synthesis. Alternatively, we can also leverage Claude’s capabilities to detect the language of the transcript.

6- Defining the state machine

Alright, let’s put it all together and take a look at the CDK definition of the state machine:
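The complete definition is in the repository; here is a condensed sketch of the wiring, assuming the Lambda constructs are defined elsewhere in the stack and reusing the invokeClaude custom state from the previous section:

    import * as lambda from "aws-cdk-lib/aws-lambda";
    import * as sfn from "aws-cdk-lib/aws-stepfunctions";
    import * as tasks from "aws-cdk-lib/aws-stepfunctions-tasks";

    // The Lambda constructs and the `invokeClaude` custom state are assumed to be
    // defined elsewhere in the stack (see the previous sections).
    declare const getVideoTranscriptFn: lambda.IFunction;
    declare const generateModelParametersFn: lambda.IFunction;
    declare const generateAudioFromSummaryFn: lambda.IFunction;
    declare const invokeClaude: sfn.CustomState;

    const getVideoTranscript = new tasks.LambdaInvoke(this, "Get Video Transcript", {
      lambdaFunction: getVideoTranscriptFn,
      outputPath: "$.Payload",
    });

    const generateModelParameters = new tasks.LambdaInvoke(this, "Generate Model Parameters", {
      lambdaFunction: generateModelParametersFn,
      outputPath: "$.Payload",
    });

    const generateAudioFromSummary = new tasks.LambdaInvoke(this, "Generate Audio From Summary", {
      lambdaFunction: generateAudioFromSummaryFn,
      outputPath: "$.Payload",
    });

    const stateMachine = new sfn.StateMachine(this, "VideoSummarizerStateMachine", {
      definitionBody: sfn.DefinitionBody.fromChainable(
        getVideoTranscript
          .next(generateModelParameters)
          .next(invokeClaude)
          .next(generateAudioFromSummary)
      ),
    });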

In order to invoke the Bedrock API, we’ll need to add this policy to the workflow’s role (and it’s important to remember to grant the state machine read and write permissions on the S3 bucket):
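With CDK, that could be expressed roughly as follows; scoping the policy to the Claude 2.1 foundation model ARN is an assumption, and stateMachine and bucket refer to the constructs from the sketches above:

    import * as iam from "aws-cdk-lib/aws-iam";
    import * as s3 from "aws-cdk-lib/aws-s3";

    // `bucket` is the S3 bucket used as state storage, defined elsewhere in the stack.
    declare const bucket: s3.IBucket;

    // Allow the state machine to invoke the Claude model on Bedrock...
    stateMachine.addToRolePolicy(
      new iam.PolicyStatement({
        actions: ["bedrock:InvokeModel"],
        resources: [
          `arn:aws:bedrock:${this.region}::foundation-model/anthropic.claude-v2:1`,
        ],
      })
    );

    // ...and grant it read & write access to the bucket used across the steps.
    bucket.grantReadWrite(stateMachine);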

Wrapping up

I find creating generative AI-based applications to be a fun exercise; I am always impressed by how quickly we can develop such applications by combining serverless and generative AI.

Certainly, there is room for improvement to make this solution production-grade. This workflow can be integrated into a larger process, allowing the video summary to be sent asynchronously to a client, and let’s not forget robust error handling.

Follow this link to get the source code for this article.

Thanks for reading and hope you enjoyed it!

Further readings

Put words in Claude's mouth
Anthropic Claude models
What is Amazon Bedrock?
