Using Node.js Buffers to transcribe an audio file with OpenAI's Whisper service
I had a difficult time trying to do the following:
- Retrieve an audio file from AWS S3
- Send the in-memory audio file to OpenAI's Whisper transcription service, get the transcribed text, and save the result.
I've seen examples where people tried to send a ReadStream instead, but that didn't work for me.
So after a Sunday's worth of effort figuring out how to send this data, I ended up with the following (using this specific version of OpenAI's Node.js API: 4.0.0-beta7):
s3.service.ts
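import { GetObjectCommand, S3Client } from "@aws-sdk/client-s3";
import { Readable } from "stream";

// Assumption: a module-level client. The original snippet used this.awsS3 from a class.
const awsS3 = new S3Client({});

export async function returnBufferFromS3(params: { key: string }): Promise<Buffer> {
  const data = await awsS3.send(new GetObjectCommand({
    Bucket: 'some-bucket',
    Key: params.key
  }));
  // In Node.js the response body is a readable stream.
  const stream = data.Body as Readable;
  // Collect every chunk, then join them into a single Buffer once the stream ends.
  return new Promise<Buffer>((resolve, reject) => {
    const chunks: Buffer[] = [];
    stream.on('data', (chunk) => chunks.push(chunk));
    stream.on('error', (error) => reject(error));
    stream.on('end', () => resolve(Buffer.concat(chunks)));
  });
}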
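The chunk-collecting part above does not depend on S3, so it can be tried on its own. Here is a minimal sketch of the same idea using async iteration instead of event listeners. `streamToBuffer` is a hypothetical helper name that is not part of the original code, and `Readable.from` only stands in for the S3 body. Recent AWS SDK v3 releases also appear to expose a `transformToByteArray()` method on the response body, which may make a helper like this unnecessary, but I have not checked that against the SDK version used here.

```typescript
import { Readable } from "node:stream";

// Hypothetical helper (not from the original post): drains any Readable into one Buffer.
export async function streamToBuffer(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Chunks may arrive as strings or Uint8Arrays; normalise them to Buffers.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Usage with an in-memory stream standing in for the S3 response body.
streamToBuffer(Readable.from([Buffer.from("abc"), Buffer.from("def")]))
  .then((buffer) => console.log(buffer.length, buffer.toString("utf8")));
```

A stream error makes the `for await` loop throw, so the returned promise rejects in the same way the `reject(error)` listener does in the event-based version.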
openAITranscription.ts
import OpenAI from "openai";
import { isUploadable } from "openai/uploads";
import { returnBufferFromS3 } from "./s3.service";
// ...
const openAI = new OpenAI({
  apiKey: 'some-api-key',
  organization: 'some-org-id'
});

export async function returnTranscriptionsBasedOnKey(key: string) {
  // Download the whole audio file into memory.
  const buffer = await returnBufferFromS3({ key });
  // Wrap the bytes in a Blob; 'audio/mp4' is an assumed MIME type for test.mp4.
  const blob = new Blob([buffer], { type: 'audio/mp4' });
  // Add the File-like fields (name, lastModified) so the Blob looks like a File.
  const fileLikeBlob = Object.assign(blob, {
    name: 'test.mp4',
    lastModified: new Date().getTime()
  });
  if (!isUploadable(fileLikeBlob)) {
    throw new Error(`Unable to upload file to OpenAI because the following is not uploadable: ${fileLikeBlob}`);
  }
  const openAIResponse = await openAI.audio.transcriptions.create({
    file: fileLikeBlob,
    model: 'whisper-1'
  });
  return openAIResponse;
}
This seems like an incredibly hacky way of completing this task (Buffer -> Blob -> "FileLikeBlob").
I'm not proud of it, but I didn't enjoy the time I spent trying to make this work either.
If anyone in the comments has a better way to accomplish the same thing, I welcome any critique that improves it and makes this information more readily available to other users.
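One possible simplification of the Buffer -> Blob -> "FileLikeBlob" chain is to build a `File` directly, since a `File` already carries `name` and `lastModified`. The sketch below assumes a Node.js version that exports `File` from `node:buffer` (18.13 or later, as far as I know). `bufferToFile` is a hypothetical helper name, and I have not verified that the resulting object passes `isUploadable` in 4.0.0-beta7. Later releases of the openai package also seem to ship a `toFile` helper for this purpose, which may be the cleaner route if it is available in your version.

```typescript
import { File } from "node:buffer";

// Hypothetical helper (not from the original post): wraps an in-memory Buffer
// as a File with a name and MIME type, without the Object.assign step.
export function bufferToFile(buffer: Buffer, name: string, type: string): File {
  // lastModified defaults to the current time when it is not supplied.
  return new File([buffer], name, { type });
}

// Usage with placeholder bytes standing in for the audio downloaded from S3.
const demoFile = bufferToFile(Buffer.from("fake audio bytes"), "test.mp4", "audio/mp4");
console.log(demoFile.name, demoFile.size, demoFile.type);
```

If this works with your SDK version, `demoFile` would be passed as the `file` field of `openAI.audio.transcriptions.create` in place of `fileLikeBlob`.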