When I implemented my first speech-synthesis app using the Web Speech API, I was shocked how hard it was to set up and execute with cross-browser support in mind:
Some browsers don't support speech synthesis at all, for instance IE (which at least I don't care about), Opera (which I do care about) and a few more mobile browsers (where I haven't decided yet whether I care or not).
On top of that, each browser implements the API differently or with specific quirks the other browsers don't have.
Just try it yourself: run the MDN speech synthesis example on different browsers and different platforms:
Linux, Windows, MacOS, BSD, Android, iOS
Firefox, Chrome, Chromium, Safari, Opera, Edge, IE, Samsung Browser, Android Webview, Safari on iOS, Opera Mini
You will realize that this example only works on a subset of these platform-browser combinations. Worse: once you start researching, you'll be shocked how quirky and underdeveloped this whole API still is in 2021/2022.
To be fair: it is still labeled as experimental technology. However, it's been almost 10 years since it was first drafted, and it's still not a living standard.
This makes it much harder to leverage in our applications, and I hope this guide will help you get the most out of it on as many browsers as possible.
Minimal example
Let's approach this topic step-by-step and start with a minimal example that all browsers (that generally support speech synthesis) should run:
```js
if ('speechSynthesis' in window) {
  window.speechSynthesis.speak(new SpeechSynthesisUtterance('Hello, world!'))
}
```
You can simply copy that code and execute it in your browser console.
If you have basic support you will hear some "default" voice speaking the text 'Hello, world!' and it may sound natural or not, depending on the default "voice" that is used.
Loading voices
Browsers may detect your current language and select a default voice, if one is installed. However, this may not be the language you'd like the text to be spoken in.
In that case you need to load the list of voices, which are instances of SpeechSynthesisVoice. This is the first major obstacle, where browsers behave quite differently:
Load voices sync-style
```js
const voices = window.speechSynthesis.getVoices()
voices // Array of voices or empty if none are installed
```
Firefox and Safari Desktop just load the voices immediately, sync-style. On Chrome Desktop and Chrome Android, however, this returns an empty array, and it may return an empty array on Firefox Android, too (see next section).
Load voices async-style
```js
window.speechSynthesis.onvoiceschanged = function () {
  const voices = window.speechSynthesis.getVoices()
  voices // Array of voices or empty if none are installed
}
```
This method loads the voices asynchronously, so your code needs a callback or a Promise wrapper. Firefox Desktop does not support this event at all, although onvoiceschanged is defined as a property of window.speechSynthesis, while Safari does not have it at all.
In contrast, Firefox Android loads the voices the first time using this method, and on a refresh has them available via the sync-style method.
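The sync-style and async-style strategies can be combined into a small Promise wrapper. This is just a sketch of mine, not part of the API; the synthesis object is passed in as a parameter so the helper can also be tested outside a browser:

```js
// Sketch (helper name is mine): resolve voices via the sync call
// when possible, otherwise wait for the onvoiceschanged event.
function loadVoicesAsync (synth = globalThis.speechSynthesis) {
  return new Promise((resolve) => {
    const voices = synth.getVoices()
    if (voices.length > 0) {
      // sync-style (Firefox / Safari Desktop)
      return resolve(voices)
    }
    // async-style (Chrome Desktop / Chrome Android)
    synth.onvoiceschanged = () => resolve(synth.getVoices())
  })
}
```

In the browser you would simply call `loadVoicesAsync().then(voices => ...)`. Note that this sketch does not yet cover the older-Safari case described in the next section.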
Load voices using an interval
Some users of older Safari versions have reported that their voices are not available immediately (while onvoiceschanged is not available either). For this case we need to check for the voices at a constant interval:
```js
let timeout = 0
const maxTimeout = 2000
const interval = 250

const loadVoices = (cb) => {
  const voices = speechSynthesis.getVoices()
  if (voices.length > 0) {
    return cb(undefined, voices)
  }
  if (timeout >= maxTimeout) {
    return cb(new Error('loadVoices max timeout exceeded'))
  }
  timeout += interval
  setTimeout(() => loadVoices(cb), interval)
}

loadVoices((err, voices) => {
  if (err) return console.error(err)
  voices // voices loaded and available
})
```
Speaking with a certain voice
There are use-cases where the default voice does not match the language of the text to be spoken. In that case we need to change the voice of the "utterance" to speak.
Step 1: get a voice by a given language
```js
// assume voices are loaded, see previous section
const getVoiceByLang = lang => speechSynthesis
  .getVoices()
  .find(voice => voice.lang.startsWith(lang))

const german = getVoiceByLang('de')
```
Note: voices have standard language codes, like en-GB, en-US or de-DE. However, on Android's Samsung Browser or Android Chrome, voices have underscore-connected codes, like en_GB.
On Firefox Android, voices have three characters before the separator, like deu-DEU-f00 or eng-GBR-f00.
However, they all start with the language code, so passing a two-letter short-code should be sufficient.
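To make that matching explicit, here is a tiny helper of my own (the names are not part of the Web Speech API) that covers all three formats:

```js
// Sketch: match voices by a two-letter language code across the
// formats mentioned above: 'de-DE', 'de_DE' and 'deu-DEU-f00'.
const matchesLang = (voice, lang) =>
  voice.lang.toLowerCase().startsWith(lang.toLowerCase())

// filter a loaded voices array by language short-code
const filterVoicesByLang = (voices, lang) =>
  voices.filter(voice => matchesLang(voice, lang))
```

Lower-casing both sides also guards against browsers that report codes with unexpected casing.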
Step 2: create a new utterance
We can now pass the voice to a new SpeechSynthesisUtterance and, as your precognitive abilities correctly manifest, there are again some browser-specific issues to consider:
```js
const text = 'Guten Tag!'
const utterance = new SpeechSynthesisUtterance(text)

if (utterance.text !== text) {
  // I found no browser yet that does not support text
  // as constructor arg but who knows!?
  utterance.text = text
}

utterance.voice = german
// iOS required
utterance.lang = german.lang
// Android Chrome required; who knows if required elsewhere?
utterance.voiceURI = german.voiceURI
utterance.pitch = 1
utterance.volume = 1
// API allows up to 10 but values > 2 break on all Chrome
utterance.rate = 1
```
We can now pass the utterance to the speak function as a preview:
```js
speechSynthesis.speak(utterance) // speaks 'Guten Tag!' in German
```
Step 3: add events and speak
This is of course just half of it. We actually want deeper insights into what's happening and what's missing, by tapping into some of the utterance's events:
```js
const handler = e => console.debug(e.type)

utterance.onstart = handler
utterance.onend = handler
utterance.onerror = e => console.error(e)

// SSML markup is rarely supported
// See: https://www.w3.org/TR/speech-synthesis/
utterance.onmark = handler

// word boundaries are supported by
// Safari MacOS and on Windows but
// not on Linux and Android browsers
utterance.onboundary = handler

// not supported / fired
// on many browsers somehow
utterance.onpause = handler
utterance.onresume = handler

// finally speak and log all the events
speechSynthesis.speak(utterance)
```
Step 4: Chrome-specific fix
Longer texts on Chrome Desktop are cancelled automatically after about 15 seconds. This can be fixed either by chunking the text or by firing a near-zero-latency pause/resume combination at a constant interval. At the same time, this fix breaks on Android, since Android devices don't implement speechSynthesis.pause() as pause but as cancel:
```js
let timer

utterance.onstart = () => {
  // detection is up to you for this article as
  // this is a huge topic of its own
  if (!isAndroid) {
    resumeInfinity(utterance)
  }
}

const clear = () => {
  clearTimeout(timer)
}

utterance.onerror = clear
utterance.onend = clear

const resumeInfinity = (target) => {
  // prevent memory-leak in case utterance is deleted, while this is ongoing
  if (!target && timer) {
    return clear()
  }
  speechSynthesis.pause()
  speechSynthesis.resume()
  timer = setTimeout(function () {
    resumeInfinity(target)
  }, 5000)
}
```
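The other fix mentioned above, chunking the text, could be sketched like this. Splitting on sentence punctuation is my own simplification; real sentence splitting is a harder problem:

```js
// Sketch: split long text into sentence-sized chunks, so each
// utterance stays well below Chrome's ~15 second cutoff.
// speechSynthesis.speak() queues utterances internally, so the
// chunks can simply be spoken one after another.
const chunkText = text =>
  (text.match(/[^.!?]+[.!?]*/g) || []).map(s => s.trim())
```

For example, `chunkText('Guten Tag. Wie geht es?')` yields `['Guten Tag.', 'Wie geht es?']`, and each chunk can then be wrapped in its own utterance.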
Furthermore, some browsers don't update the speechSynthesis.paused property when speechSynthesis.pause() is executed (even though speech is correctly paused). You then need to manage these states yourself.
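A minimal way to manage that state yourself could look like the following sketch. The wrapper name is mine, and the synthesis object is injected so the logic can be tested without a browser:

```js
// Sketch: track the paused state manually, because
// speechSynthesis.paused is not updated reliably everywhere.
function createPausedTracker (synth = globalThis.speechSynthesis) {
  let paused = false
  return {
    pause () { synth.pause(); paused = true },
    resume () { synth.resume(); paused = false },
    get paused () { return paused }
  }
}
```

Keep in mind that on Android, pause effectively cancels the speech (see below), so this tracker only makes sense on platforms where pause actually pauses.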
Issues that can't be fixed with JavaScript
All of the above fixes rely on JavaScript, but some issues are platform-specific. You need to design your app in a way that avoids these issues, where possible:
All browsers on Android actually do a cancel/stop when calling speechSynthesis.pause(); pause is simply not supported on Android
There are no voices on Chromium-Ubuntu and Ubuntu derivatives unless the browser is started with a specific flag
If, on Chromium Desktop on Ubuntu, the very first page visited wants to load speech synthesis, then no voices are ever loaded until the page is refreshed or a new page is entered. This can be fixed with JavaScript, but auto-refreshing the page can lead to very bad UX
If voices are not installed on the host OS and no voices are loaded from remote by the browser, then there are no voices and thus no speech synthesis
There is no way to just instant-load custom voices from remote and use them as a shim in case there are no voices
If the installed voices are just bad, users have to manually install better voices
Making your life easier with EasySpeech
Now you have seen the worst, and believe me, it takes ages to implement all the potential fixes.
Fortunately, I have already done this and published a package to NPM, with the intent to provide a common API that handles most issues internally and provides the same experience across browsers (that support speechSynthesis):
You should give it a try the next time you implement speech synthesis. It also comes with a demo page, so you can easily test and debug your devices there: https://jankapunkt.github.io/easy-speech/
Let's take a look at how it works:
```js
import EasySpeech from 'easy-speech'

// sync, returns an Object with detected features
EasySpeech.detect()

EasySpeech.init()
  .then(() => {
    EasySpeech.speak({ text: 'Hello, world!' })
  })
  .catch(e => console.error('no speech synthesis:', e.message))
```
It will not only detect which features are available but also load an optimal default voice, based on a few heuristics.