AI voices lack nuance
Melissa Thom is the founder of Bristol Academy of Voice Acting. Here, she shares how AI is disrupting the world of voiceover.
AI has been around for some time, and has already gone through several hype cycles, but in the last year the technology has finally captured our attention. Whereas previously its use was confined to inscrutable data crunching and opaque social media algorithms, generative AI now provides services that build on human creativity. From writing prose, creating artwork and producing music tracks to generating a completely new performance from a replica of an actor, AI is coming for all of our jobs. Or is it?
There’s no doubt gen AI technology is significantly transforming the voiceover (VO) landscape. AI voice software, such as NaturalReader, Speechify and ElevenLabs, promises to create or recreate natural-sounding human voices using just a text prompt. For producers this might sound like a tempting proposition – no more having to deal with troublesome agents or paying for expensive re-records when the client decides to change the script at the last minute. However, there are reasons to pause before we rush headlong into the future.
Firstly, there are the ethical issues: actors have had their voices reproduced without permission, and celebrities like Jay-Z have had to fight against deepfakes. The recent long-running SAG-AFTRA (Screen Actors Guild – American Federation of Television and Radio Artists) strike in the US is a great illustration of the conflict that has arisen from major media companies trying to own the performances and likenesses of actors.
Trained human experts still necessary
In the UK, the actors’ union, Equity, is developing policies to respond to these issues. In a recent exchange with us at Bristol Academy of Voice Acting (BRAVA), Liam Budd of Equity’s Audio Committee said: “We want to work constructively with the audio industry to ensure that human creativity is protected and the licensing framework for engaging voiceovers for performance cloning work is built around the principle of active consent, limited usage and fair remuneration.”
We should fight for the right to be paid residuals for any future use of content using our voices or likenesses. Regulation is one possible answer, yet despite our pleas it can’t currently keep up with the pace of development. If we cannot make a living from our craft, the next generation of talent will be dissuaded from entering the industry and we’ll be left with AI as the only option.
Which brings me to my second point. These new gen AI platforms are currently being used to deliver the kind of voiceovers that require less human skill. For example, I’ve heard synthetic voices being used to voice products such as newspaper audio articles. Yes, the voiceover is flat, but for blind or visually impaired audiences I can see how necessary these recordings can be, and how quickly and cheaply this content can be generated. Gen AI is also being used in other areas, such as rapid prototyping of character voices in video games, to help reduce development costs.
However, while AI can voice passages of text, modify certain words and translate a newspaper article in a matter of moments, it delivers this content in what I call a ‘neutral’ way. What the technology cannot currently produce is a fully nuanced human performance. When booking a voice actor, producers are looking for someone to interpret and perform scripts to bring a particular story to life. This requires a trained human expert who understands pace, intonation, pitch, subtext and more to convey and elicit an emotional response from the audience.
Licensing your ‘digital twin’
Often a voice actor will provide a performance that the producer or client didn’t know they wanted, but that sounds exactly right once they hear it. AI can deliver a voice that sounds human, but it can’t yet deliver emotion, sentiment or character. In my own experience, clients who have tried gen AI voices have come to me to re-record the content.
Despite the threat of AI to the industry, there are opportunities for voice actors. Some have already started licensing their ‘digital twin’ to AI companies and can make passive income from doing so. WellSaid Labs adopts an ethical stance whereby voice actors receive payment for recording their dataset, as well as a revenue share. AI can help replace words in a recorded performance, or create translated content in the same voice for consistency. The challenge, as always, is to ensure that contracts support this use in a fair way.
Voice actors would benefit from learning to utilise AI tools to expedite their own workflows and monetise their voices in new ways. I can see real benefits for actors who skill up, and many of our talent at BRAVA are already experimenting with AI tools in their day-to-day work.
Moving into the future, the biggest opportunity will be for actors who can deliver fully nuanced performances that differentiate them from AI voices. Voice actors who seriously hone their craft will be the ones who succeed in this constantly shifting landscape.
Melissa Thom is a voice actor and founder of Bristol Academy of Voice Acting (BRAVA).
brava.uk.com/ | melissathom.me
*Bristol Academy of Voice Acting (BRAVA) was set up in 2021 and brings together voice experts from the UK and US to deliver high-quality training and advice about the industry.