TTS AD: A Viable Solution for Audio Describing Content?

TTS AD stands for text-to-speech audio description. You may already know text to speech and audio description as separate technologies, but combining them is a relatively new idea that is starting to turn heads in the world of accessible media. Speed is the name of the game with TTS AD, as media professionals look for cost- and time-effective ways to make their programming accessible to low-vision audiences.

So let's take a closer look at what TTS AD is, how it is made, and what role it will play in the future of accessible media.

First, can you quickly explain to me what text to speech is?

Text to speech (TTS) converts written text into a synthesised spoken voice.

It is sometimes referred to as "read aloud" technology.

TTS is an increasingly popular accessibility feature on computers and phones for reading out digital text.

You can find TTS in many places today such as internet browsers, smart assistants, e-book readers, and word processors.

And what is audio description?

Audio description for film and television is a form of narration that uses verbal descriptions to provide information on visual aspects of a media production.

In other words, a pre-recorded voice-over track describes what is happening in a video, TV show or film.

The audio description should be neatly intertwined with the production's original dialogue and soundtrack.

So TTS AD is a combination of the two?

That's right.

TTS AD is audio description that is read aloud using synthetic voices through text to speech.

Who is the audio description intended for?

Audio description is primarily intended for blind or visually impaired viewers. Key visual elements are described to help them follow the video.

Why is there a demand for TTS AD?

TTS AD is gaining in popularity due to time and cost factors.

The conventional way of producing audio description is complex and therefore expensive, so production companies often don't have the budget for it.

How is audio description conventionally produced?

The audio description production process can be broken down into the following steps:

Watching the video

The audio describer receives the video for which an audio description is required from the client
The audio describer watches the video to determine the gaps in dialogue where the audio description can be placed

Writing the script

The audio describer writes the time-coded script using a simple text editor, often with multiple viewings of the video
The audio describer sends the script to the client
If necessary, the client sends the script back to the audio describer with corrections
Once approved by the client, the script is exported to the relevant format(s)

Recording the audio description

The script is read out by a professional voice artist in the recording studio
A sound engineer mixes the audio in the recording studio
If necessary, the audio description is sent back to the audio describer or re-recorded in the studio when further corrections are requested by the client

Mixing and conversion

Once approved, the audio description is mixed to create a usable file
The audio file is converted to whichever media are required
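
The first two steps above, finding gaps in the dialogue and writing a time-coded script, can be sketched in a few lines of Python. This is only an illustration: the names Cue, find_gaps, estimated_duration and fits are made up for this example, the dialogue timings are invented, and the speaking rate of 160 words per minute is an assumption rather than a figure from any tool or standard.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One entry in a time-coded audio description script (hypothetical model)."""
    start: float  # seconds from the start of the video
    text: str     # the description to be spoken


def find_gaps(dialogue, video_length, min_gap=2.0):
    """Return (start, end) pauses between dialogue segments.

    `dialogue` is a list of (start, end) times in seconds, sorted by start.
    Pauses shorter than `min_gap` seconds are ignored.
    """
    gaps = []
    cursor = 0.0  # end of the latest dialogue segment seen so far
    for start, end in dialogue:
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if video_length - cursor >= min_gap:
        gaps.append((cursor, video_length))
    return gaps


def estimated_duration(text, words_per_minute=160):
    """Rough speaking time in seconds, assuming a fixed speaking rate."""
    return len(text.split()) * 60.0 / words_per_minute


def fits(cue, gap):
    """True if the cue starts inside the gap and is estimated to end within it."""
    gap_start, gap_end = gap
    return gap_start <= cue.start and cue.start + estimated_duration(cue.text) <= gap_end


# Invented dialogue timings for a 40-second clip.
dialogue = [(0.0, 4.0), (5.0, 12.5), (20.0, 31.0)]
gaps = find_gaps(dialogue, video_length=40.0)
print(gaps)  # [(12.5, 20.0), (31.0, 40.0)]

cue = Cue(start=13.0, text="A woman walks to the window and opens the curtains.")
print(fits(cue, gaps[0]))  # True
```

In practice the describer still decides what is worth describing. A check like this only flags descriptions that are likely to run into the dialogue.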

Complications often arise at the recording and mixing stages, because production depends on the availability of staff and recordings frequently need correcting. Delays and re-recordings bring rising costs and postponed deadlines.

What is the knock-on effect of having a complicated production process?

When costs rise over budget, media companies and content creators simply won't provide audio description for their productions.

This is one of the main reasons why there is a severe lack of audio-described programming.

Some countries have quotas in place, but only between 4 and 11% of programming in the EU is provided with an audio description.

Moral and legal aspects aside, an improved, more cost-effective workflow should encourage media service providers to factor audio description into their budgets.

This is where TTS AD comes in.

How does TTS AD improve the audio description workflow?

TTS reads out the script, so there is no need for recording with a voice artist.

This removes roadblocks associated with scheduling voice talent and sound engineers.

If changes are needed, corrections can be easily made to the script, and the new TTS voice output is ready right away.
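
To see why corrections are cheap with TTS, consider the following sketch. It caches the rendered audio for each line of the script, so that only an edited line is synthesised again. Everything here is hypothetical: fake_synthesize is a stand-in for whatever TTS engine is used, and CueRenderer and the voice name are invented for the example. This is not how Frazier or any particular product is implemented.

```python
import hashlib


def fake_synthesize(text, voice):
    """Stand-in for a real TTS engine call; returns placeholder bytes."""
    return f"[{voice}] {text}".encode("utf-8")


class CueRenderer:
    """Caches rendered audio so only edited cues are synthesised again."""

    def __init__(self, synthesize, voice):
        self.synthesize = synthesize
        self.voice = voice
        self.cache = {}
        self.calls = 0  # number of actual synthesis calls made

    def render(self, text):
        # The cache key depends on both the voice and the exact wording.
        key = hashlib.sha256(f"{self.voice}\n{text}".encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.cache[key] = self.synthesize(text, self.voice)
            self.calls += 1
        return self.cache[key]

    def render_script(self, script):
        """`script` is a list of (start_seconds, text) pairs."""
        return [(start, self.render(text)) for start, text in script]


renderer = CueRenderer(fake_synthesize, voice="en-AU-example")
script = [(13.0, "A woman opens the curtains."), (31.0, "She smiles.")]
renderer.render_script(script)
print(renderer.calls)  # 2

# The client asks for a correction to the second line only.
script[1] = (31.0, "She smiles at the sunrise.")
renderer.render_script(script)
print(renderer.calls)  # 3
```

Only the corrected line triggers a new synthesis call. The unchanged line is served from the cache, which is the equivalent of not having to call the voice artist back into the studio.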

What improvements are there at the mixing stage?

It depends on what features are included in the TTS AD software.

VIDEO TO VOICE has developed Frazier, an audio description production suite that automates mixing and mastering steps.

This way, the mixed audio-described video meets official loudness standards and is ready to broadcast.
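
As a rough illustration of what meeting a loudness standard involves, the sketch below works out the gain needed to bring a mix to a target level. The article does not say which standard applies, so the target of -23 LUFS (the programme loudness target in EBU R 128) is an assumption. The measured value would come from a proper loudness meter and is simply given here, and the function names are made up for the example.

```python
def gain_db(measured_lufs, target_lufs=-23.0):
    """Gain in dB needed to move the measured loudness to the target."""
    return target_lufs - measured_lufs


def linear_gain(db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def apply_gain(samples, db):
    """Scale a list of audio samples by the given gain in dB."""
    factor = linear_gain(db)
    return [sample * factor for sample in samples]


# Assume a loudness meter reported -29 LUFS for the mixed programme.
needed = gain_db(-29.0)
print(needed)                          # 6.0
print(round(linear_gain(needed), 3))   # 1.995
```

A real mastering step also has to keep peaks under a limit and balance the description against the original soundtrack, which this sketch leaves out.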

With Frazier, a user can produce TTS AD on their own?

Exactly.

The audio describer can work on every stage of the process in the browser-based audio description editor – everything from script writing to delivering the broadcast-ready mixed video.

There is no need for expensive recording studios or mixing desks.

This article provides more in-depth analysis of audio description software.

Let's talk about the final product. How has TTS AD been received by its intended audience?

The reaction from blind and low-vision audiences has been generally positive, although they acknowledge that human voices are preferable.

In 2012, a Polish study showed 95% of respondents regarded TTS AD as a viable interim solution.

A 2015 study by the Autonomous University of Barcelona supported these findings: 94% of blind or partially sighted participants found TTS AD to be a suitable solution.

Since these studies were conducted, synthetic voice quality has continued to improve.

Production houses are encouraged to test samples on their audiences before deciding if TTS AD is the right choice.

Do you have any examples of TTS AD with synthetic voices?

Frazier includes human-like synthetic voices, including one in Australian English.

You can listen to further examples in different languages on the VIDEO TO VOICE production page.

How does TTS AD tie in with the future of accessible media?

With more visual media being produced than ever before, we need workable solutions for ensuring as much content as possible is made accessible.

The number of productions without audio description is continually growing, especially given the thousands of hours of footage uploaded to YouTube and other video-sharing platforms every day.

The only way to bridge the accessibility gap is to automate time-consuming and fiddly processes, as mentioned above.

Summary – what have we learnt?

The focus should not be on discussing whether human audio description is preferable to synthetic voices. Instead, we should be looking at the bigger picture: TTS AD provides a viable option for production companies that would otherwise not be able to audio describe their content.
