Artificial intelligence is quickly getting to grips with human speech patterns. Just a few years ago, most speech synthesis systems sounded like Stephen Hawking's synthesizer, but nowadays your smartphone's AI assistant can closely replicate natural human speech.
Siri, Google Assistant and Amazon Alexa all have pleasant, lifelike voices, if slightly synthetic ones. Yet a new AI start-up, Lyrebird, has taken machine-generated voices to a new level.
The ability to generate natural-sounding speech has long been a core challenge for computer programs that transform text into spoken words.
Artificial intelligence (AI) personal assistants such as Siri, Alexa, Microsoft’s Cortana and the Google Assistant all use text-to-speech software to create a more convenient interface with their users.
Those systems work by cobbling together words and phrases from prerecorded files of one particular voice. Switching to a different voice, such as having Alexa sound like a man, requires a new audio file containing every possible word the device might need to communicate with users.
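The "cobbling together" approach described above can be sketched in a few lines. This is a minimal illustration, not any assistant's actual pipeline: the word-to-clip mapping and the toy waveforms are invented placeholders standing in for real recorded audio files.

```python
def synthesize(text, clips):
    """Concatenate prerecorded audio samples for each word in `text`.

    `clips` maps a word to its recorded waveform (a list of samples),
    all spoken by one voice. A word with no recording raises KeyError,
    which is why switching to a different voice means re-recording
    every word the device might need.
    """
    samples = []
    for word in text.lower().split():
        samples.extend(clips[word])  # look up and append the prerecorded clip
    return samples


# Toy "recordings": short fake waveforms standing in for audio files.
voice_a = {"hello": [0.1, 0.2], "world": [0.3, 0.4]}
audio = synthesize("Hello world", voice_a)  # -> [0.1, 0.2, 0.3, 0.4]
```

Real concatenative systems work at the level of sub-word units and smooth the joins between clips, but the limitation is the same: the output voice is fixed by whoever recorded the clips.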
Lyrebird's system works differently: after learning how to generate speech, it can adapt to any voice based on only a one-minute sample of someone's speech. "Different voices share a lot of information," says Lyrebird co-founder Alexandre de Brébisson, a PhD student at the Montreal Institute for Learning Algorithms laboratory at the University of Montreal. "After having learned several speakers' voices, learning a whole new speaker's voice is much faster. That's why we don't need so much data to learn a completely new voice. More data will still definitely help, yet one minute is enough to capture a lot of the voice 'DNA.'"
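De Brébisson's point, that voices share information, is the key to why one minute suffices. One common way to exploit this (Lyrebird has not published its architecture, so this is a generic illustration with invented numbers, not their method) is a multi-speaker model in which the vast majority of parameters are shared across all speakers, and each individual voice is captured by a small per-speaker embedding. Adapting to a new voice then means estimating only those few numbers.

```python
SHARED_PARAMS = 10_000_000   # learned once, from many speakers' recordings
EMBEDDING_SIZE = 32          # per-speaker parameters to fit for a new voice


def adapt_new_speaker(sample_frames):
    """Estimate a small speaker embedding from a short audio sample.

    `sample_frames` is a list of feature vectors extracted from the
    sample. The "fit" here is just an average over frames -- a toy
    stand-in for the real optimization. The shared parameters are
    never touched, so very little data is needed.
    """
    n = len(sample_frames)
    return [sum(frame[i] for frame in sample_frames) / n
            for i in range(EMBEDDING_SIZE)]


# A one-minute sample yields thousands of feature frames; even a
# handful is enough to pin down 32 numbers in this toy version.
frames = [[float(i % 3)] * EMBEDDING_SIZE for i in range(5)]
embedding = adapt_new_speaker(frames)  # 32 values describing the new voice
```

The asymmetry is the point: roughly ten million shared parameters encode how speech works in general, while a new speaker contributes only a vector of a few dozen values, which is why a short sample can capture so much of the voice "DNA."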