Tales from the Software Graveyard: The Birth and Death of Squawk Market
The Squawk Market Service & Background
Since early March I've been honing my skills as a day trader, both as a complement to my software engineering career and because I want to show my customers at The Wheel Screener that I've got 'skin in the game', so to speak. While trading, I ran into many services that offer audio-based news, so you can keep them on in the background while staying focused on price action and the markets. In a shameless hop onto the AI hype train, which I expected would bring more interest, I realized I could leverage both GPT-4 and a variety of text-to-speech models to build a similar product with lower latency and lower costs (and better reliability!). I posted about Squawk Market on launch day; you can read that here.
If you're interested or just want to see what I mean by similar products, I recommend Financial Juice, which offers a 10-second delayed feed completely free (no affiliation).
Ultimately, however, even after advertising on Reddit and Twitter, I saw not a single trial or subscription, and even now, writing this in mid-June, only two people have approached me about the service.
However, not all is lost! In this post, I'm going to go into all the software aspects of Squawk Market, to turn this failed product into some software-based positives for the future.
Learnings and Takeaways
Go is Fast!
I already knew this, but I was reminded again: Go is very performant. Scraping websites and converting them to speech, as described below, took less than a second.
Text to Speech Models — Largely Varied in Price and Quality
I ended up using three different text-to-speech APIs throughout the creation of Squawk Market:
I started with ElevenLabs, which in my opinion has very impressive voice generation, sounding truly human with varied inflection and tones. Their free tier covers up to 10,000 characters every month, which is only about 6 pages (however, if you only need a fixed set of voice snippets, this could be great for you). The only problem was that even with the $5 per month subscription, I was hitting the 30,000-character limit too quickly.
I then moved on to Google Cloud's Text-to-Speech, but as far as I could tell Google did not offer a free tier that covered my usage (new accounts get up to $300 in credits, but mine are long gone!), and I quickly moved away after racking up $10–15 in charges.
I finally settled on Amazon Polly. Its free tier is super generous: a whopping 5 million characters per month, or 3,000+ pages of text. Even with everything that Squawk Market was parsing, I never got outside of the free tier.
Ideally, I would have liked to stick with ElevenLabs, but with no subscribers (i.e. no revenue), it just didn't make sense financially.
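To make the free-tier arithmetic above a bit more concrete, here is a rough sketch in Go that compares an estimated monthly character count against each allowance. The limits are the figures quoted in this post; the headline volume and length in main are made-up illustration values, and monthlyChars and fitsTier are hypothetical helper names, not part of Squawk Market.

```go
package main

import "fmt"

// Monthly character allowances as quoted in this post. Treat them as
// assumptions and check each provider's current pricing before relying on them.
const (
	elevenLabsFreeChars    = 10_000
	elevenLabsStarterChars = 30_000
	pollyFreeChars         = 5_000_000
)

// monthlyChars estimates how many characters are synthesized per month
// for a given number of headlines per day and average headline length.
func monthlyChars(headlinesPerDay, avgCharsPerHeadline, daysPerMonth int) int {
	return headlinesPerDay * avgCharsPerHeadline * daysPerMonth
}

// fitsTier reports whether the estimated usage stays within a tier limit.
func fitsTier(usage, limit int) bool {
	return usage <= limit
}

func main() {
	// Illustration values only: 200 headlines a day, about 120 characters
	// each, over 21 trading days.
	usage := monthlyChars(200, 120, 21)
	fmt.Println("estimated characters per month:", usage)
	fmt.Println("fits ElevenLabs free tier:", fitsTier(usage, elevenLabsFreeChars))
	fmt.Println("fits ElevenLabs $5 tier:", fitsTier(usage, elevenLabsStarterChars))
	fmt.Println("fits Polly free tier:", fitsTier(usage, pollyFreeChars))
}
```

With numbers in that range, only the Polly allowance leaves any headroom, which matches the experience described above.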
Here's the Go code for interfacing with each of those three services:
ElevenLabs:
func TextToSpeech(text string) []byte {
log.Println("getting voice data...")
// headers for elevenlabs
headers := map[string]string{
"Content-Type": "application/json",
"xi-api-key": os.Getenv("ELEVEN_LABS_API_KEY"),
"Accept": "audio/mpeg",
}
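// marshal the payload (requires encoding/json) so quotes and newlines
// in text are escaped instead of breaking the JSON body
payload, err := json.Marshal(map[string]any{
"text": text,
"voice_settings": map[string]float64{
"stability": 0.15,
"similarity_boost": 1.0,
},
})
if err != nil {
log.Println(err)
return nil
}
body := strings.NewReader(string(payload))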
// elli voice id MF3mGyEYCl7XYWbV9V6O
data, err := http_helper.MakeHTTPRequest(
"https://api.elevenlabs.io/v1/text-to-speech/MF3mGyEYCl7XYWbV9V6O",
"POST",
headers,
nil,
body,
)
if err != nil {
log.Println(err)
}
// return the byte data generated by elevenlabs' boss bots
return data
}
Amazon:
// TextToSpeech converts text to speech using Amazon Polly
func TextToSpeech(text string) ([]byte, error) {
sess := session.Must(session.NewSession(&aws.Config{
Region: aws.String("us-east-1"),
}))
svc := polly.New(sess)
input := &polly.SynthesizeSpeechInput{
OutputFormat: aws.String("mp3"),
Text: aws.String("<speak><prosody rate=\"fast\">" + utils.EscapeForSSML(text) + "</prosody></speak>"),
VoiceId: aws.String("Amy"),
TextType: aws.String("ssml"),
// british language code
LanguageCode: aws.String("en-GB"),
}
output, err := svc.SynthesizeSpeech(input)
if err != nil {
fmt.Println(err)
return nil, err
}
mp3data, err := io.ReadAll(output.AudioStream)
if err != nil {
fmt.Println(err)
return nil, err
}
return mp3data, nil
}
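The Polly snippet above relies on utils.EscapeForSSML, which isn't shown in this post. As a minimal sketch of what such a helper might look like, assuming it only needs to escape the five XML reserved characters before the text is wrapped in SSML tags (escapeForSSML below is a hypothetical stand-in, not the actual Squawk Market implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// ssmlEscaper replaces the five XML reserved characters. A strings.Replacer
// scans the input once, so an "&" in the input is never escaped twice.
var ssmlEscaper = strings.NewReplacer(
	"&", "&amp;",
	"<", "&lt;",
	">", "&gt;",
	`"`, "&quot;",
	"'", "&apos;",
)

// escapeForSSML is a hypothetical stand-in for utils.EscapeForSSML.
func escapeForSSML(text string) string {
	return ssmlEscaper.Replace(text)
}

func main() {
	headline := `AT&T <T> jumps 5% on "strong" Q2`
	fmt.Println("<speak>" + escapeForSSML(headline) + "</speak>")
}
```

Without a step like this, a headline containing an ampersand or angle bracket would produce invalid SSML and the synthesis request would likely be rejected.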
Google:
func TextToSpeech(text string) []byte {
// Instantiates a client.
ctx := context.Background()
client, err := texttospeech.NewClient(ctx)
if err != nil {
log.Fatal(err)
}
defer client.Close()
// Perform the text-to-speech request on the text input with the selected
// voice parameters and audio file type.
req := texttospeechpb.SynthesizeSpeechRequest{
// Set the text input to be synthesized.
Input: &texttospeechpb.SynthesisInput{
InputSource: &texttospeechpb.SynthesisInput_Text{Text: text},
},
// Build the voice request: British English ("en-GB") with a female
// Neural2 voice.
Voice: &texttospeechpb.VoiceSelectionParams{
LanguageCode: "en-GB",
SsmlGender: texttospeechpb.SsmlVoiceGender_FEMALE,
// Name: "en-US-Studio-O", // nice one but expensive
Name: "en-GB-Neural2-A",
},
// Select the type of audio file you want returned.
// TODO: could eventually be client side configurable
AudioConfig: &texttospeechpb.AudioConfig{
AudioEncoding: texttospeechpb.AudioEncoding_MP3,
SpeakingRate: 1.3,
Pitch: 0,
},
}
resp, err := client.SynthesizeSpeech(ctx, &req)
if err != nil {
log.Fatal(err)
}
// The resp's AudioContent is binary.
return resp.AudioContent
}
We Can Also Parse Any Video with Speech to Text in Realtime!
While building Squawk Market, I found a way to convert video streams (m3u8) into text. You provide the ID of a YouTube video, and the text of that video comes out! How does it work under the hood?
First, we need to find an m3u8 file that we can actually parse! If you use a YouTube video (preferably a live one), a tool called yt-dlp can do that for us. I wrote a function called GetStreamUrlFromYoutubeVideoId that calls yt-dlp to extract the m3u8 URL from the YouTube video ID:
func GetStreamUrlFromYoutubeVideoId(youtubeVideoId string) (string, error) {
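// use yt-dlp to get the stream url
// -f 91 selects YouTube's format code 91; -g prints the media URL instead of downloading
cmd := exec.Command("yt-dlp", "-f", "91", "-g", "https://www.youtube.com/watch?v="+youtubeVideoId)
output, err := cmd.Output()
if err != nil {
return "", err
}
// yt-dlp ends its output with a newline, so trim it before handing the URL to ffmpeg
return strings.TrimSpace(string(output)), nil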
}
Then I pass that URL to a function called M3U8VideoStreamToFile, which uses ffmpeg to record a fixed number of seconds of the stream into an audio file:
func M3U8VideoStreamToFile(m3u8VideoStreamURI string, fileName string, fileType string, secondsString string) error {
// record secondsString seconds of the stream into fileName, encoded as fileType;
// sampleRateHertz is a package-level value defined elsewhere
cmd := exec.Command("ffmpeg", "-y", "-i", m3u8VideoStreamURI, "-t", secondsString, "-f", fileType, "-ar", fmt.Sprint(sampleRateHertz), fileName)
// Run starts ffmpeg and waits for it to exit
if err := cmd.Run(); err != nil {
fmt.Println(err)
return err
}
return nil
}
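For readers less familiar with ffmpeg, the sketch below spells out what each argument in that exec.Command call does. buildFFmpegArgs is a hypothetical helper written for illustration, and the 16000 Hz sample rate and example URL are assumptions, since the actual value of sampleRateHertz isn't shown in this post.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildFFmpegArgs is a hypothetical helper that assembles the same
// arguments as the exec.Command call above, one flag per line.
func buildFFmpegArgs(streamURL, fileName, fileType string, seconds, sampleRateHertz int) []string {
	return []string{
		"-y",            // overwrite the output file without prompting
		"-i", streamURL, // input: the m3u8 playlist URL
		"-t", strconv.Itoa(seconds), // stop recording after this many seconds
		"-f", fileType, // force the output format, e.g. "mp3"
		"-ar", strconv.Itoa(sampleRateHertz), // output audio sample rate in Hz
		fileName, // output path
	}
}

func main() {
	// Placeholder URL and an assumed 16 kHz sample rate, for illustration only.
	args := buildFFmpegArgs("https://example.invalid/live.m3u8", "audio.mp3", "mp3", 10, 16000)
	fmt.Println("ffmpeg", args)
}
```

Building the argument slice separately like this also makes the command easy to log or unit test before it is handed to exec.Command.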
Finally, I use OpenAI's Whisper speech-to-text API to convert the audio in the file to text:
func SpeechToText(filePath string) (*string, error) {
c := openai.NewClient(os.Getenv("OPEN_AI_SECRET_KEY"))
ctx := context.Background()
req := openai.AudioRequest{
Model: openai.Whisper1,
FilePath: filePath,
}
resp, err := c.CreateTranscription(ctx, req)
if err != nil {
fmt.Printf("Transcription error: %v\n", err)
return nil, err
}
return &resp.Text, nil
}
Altogether, we can define a function called YoutubeVideoIdToText that looks like this:
func YoutubeVideoIdToText(youtubeVideoId string) error {
// get m3u8 video stream url
videoStreamUrl, err := streamfinder.GetStreamUrlFromYoutubeVideoId(youtubeVideoId)
if err != nil {
return err
}
// filename for now is more or less an intermediate, may want to change later
fileName := "audio.mp3"
// convert video stream to audio file
err = videostreaming.M3U8VideoStreamToFile(videoStreamUrl, fileName, "mp3", "10")
if err != nil {
return err
}
// convert audio file to text
// open ai cost is $0.006 per minute of audio - i.e. $0.36 per hour of audio
text, err := open_ai.SpeechToText(fileName)
if err != nil {
return err
}
// print text to log for now
log.Println(*text)
return nil
}
From there, it gets as interesting as you want. You could take the text as is and display it somewhere, use something like GPT-4 to summarize it, or run sentiment analysis to boil the transcript down to a simple 'BULLISH', 'BEARISH', or 'NEUTRAL' signal. Or chain it to a text-to-speech model like the ones mentioned above and just read it straight out.
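If you go the sentiment route, the model's reply still has to be reduced to one of those three labels. Here is a minimal sketch of that last step, assuming the model answers in free-form English; parseSignal is a hypothetical helper and not part of the Squawk Market codebase, and the call to the language model itself is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSignal maps a free-form model reply onto one of three labels.
// A reply that mentions neither direction, or both, falls back to NEUTRAL.
func parseSignal(reply string) string {
	upper := strings.ToUpper(reply)
	bullish := strings.Contains(upper, "BULLISH")
	bearish := strings.Contains(upper, "BEARISH")
	switch {
	case bullish && !bearish:
		return "BULLISH"
	case bearish && !bullish:
		return "BEARISH"
	default:
		return "NEUTRAL"
	}
}

func main() {
	fmt.Println(parseSignal("Sentiment: bullish."))
	fmt.Println(parseSignal("Hard to say, mixed signals."))
}
```

Defaulting to NEUTRAL is a deliberately conservative choice here: an ambiguous or malformed reply never turns into a directional signal.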
Squawk Market is Now Open Source!
For those out there who would like to learn or see how I put a full stack app together, both the frontend and backend repositories of Squawk Market are open source! Check them out here:
Please note that this post doesn't go into all the details of actually deploying the frontend and backend; you can check out the README of each for how to do that. If you are interested in running an instance yourself or are having trouble, reach out to me! I'm more than happy to consider a community-led effort to get these services back up and running for everyone to share.
Thanks!
I hope you enjoyed this post and learned a thing or two about writing full stack applications with Go!
-Chris 🍻