Do you hear an echo? Probably, and it could hold the answer to improving AI speech recognition
- Chinese scientists have discovered that humans are able to hear two separate sound streams – direct speech and echo
- They hope the new findings can now be used in AI to improve the way machines process recordings with echoes
“An intense echo strongly distorts the speech envelope that contains critical cues for speech intelligibility, but human listeners can still reliably recognise echoic speech,” the team wrote in an article published in the peer-reviewed journal PLOS Biology on Friday.
“The current study showed that the auditory system can effectively restore the low-frequency components of the speech envelope that are attenuated or eliminated by an echo, providing a plausible neural basis for reliable speech recognition,” the researchers wrote.
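To see why an echo distorts specific envelope frequencies, note that adding a delayed, attenuated copy of a sound acts like a comb filter on its amplitude envelope: a modulation at frequency f is scaled by |1 + a·e^(-i2πfτ)|, where τ is the echo delay and a its relative level. The sketch below is purely illustrative – the delay, attenuation, and modulation frequencies are made-up values, not figures from the study:

```python
import cmath
import math

def echo_modulation_gain(f_hz, delay_s, attenuation):
    """Gain applied to an envelope modulation at f_hz when a
    delayed, attenuated echo is added to the direct sound."""
    return abs(1 + attenuation * cmath.exp(-2j * math.pi * f_hz * delay_s))

delay = 0.2   # 200 ms echo delay (illustrative)
atten = 1.0   # echo as loud as the direct sound

for f in [1.0, 2.5, 5.0]:
    print(f"{f} Hz modulation -> gain {echo_modulation_gain(f, delay, atten):.2f}")
```

With a 200 ms delay, a 2.5 Hz modulation arrives in antiphase with its echo and is cancelled almost entirely, while a 5 Hz modulation is reinforced – which is why certain slow envelope cues that matter for intelligibility can be attenuated or eliminated, as the quote above describes.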
Echoes can occur anywhere – from online meetings to lectures in large auditoriums – and they can make it harder to understand what is being said.
For the study, the researchers recruited about 50 native Chinese speakers aged between 19 and 33 and played them Chinese-language narrations of a snippet from a novel, with and without an echo.
Participants listened to the audio through headphones in a quiet room while their neural responses were recorded by magnetoencephalography (MEG), a non-invasive technique that measures the magnetic fields generated by electrical currents in the brain to map its activity.
Researchers then asked the participants comprehension questions to check their understanding of the story. Analysis showed that they understood the content with an accuracy above 95 per cent, whether or not an echo was present.
The scientists compared the recorded neural signals against two computational models. The neural activity was better explained by a model that splits the sound into two processing streams – the original speech and its echo – than by one that simulates the brain adapting to the echo.
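The logic of that model comparison can be caricatured in a few lines: generate a toy "neural" response that tracks the direct-speech envelope and the echo envelope with different weights, then fit (a) a single-stream model that only sees the combined echoic envelope and (b) a two-stream model with separate direct and echo regressors. Every signal and weight below is invented for illustration; the study's actual models were fitted to MEG recordings, not toy sinusoids:

```python
import numpy as np

t = np.arange(0, 10, 0.01)                # 10 s sampled at 100 Hz
direct = 1 + np.cos(2 * np.pi * 1.3 * t)  # direct-speech envelope (toy)
echo = np.roll(direct, 25) * 0.6          # echo: 250 ms delay, attenuated
echoic = direct + echo                    # what actually reaches the ear

# Toy "neural" response: tracks the two streams with different weights
neural = 1.0 * direct + 0.3 * echo

def r_squared(design, target):
    """Least-squares fit of target on the design columns; return R^2."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ beta
    return 1 - resid.var() / target.var()

one_stream = r_squared(np.column_stack([echoic]), neural)
two_stream = r_squared(np.column_stack([direct, echo]), neural)
print(f"single-stream R^2 = {one_stream:.3f}, two-stream R^2 = {two_stream:.3f}")
```

Because the toy response weights the two streams differently from how they mix in the ear, only the two-regressor model can fit it exactly – a simplified version of why separately tracked streams explained the MEG data better.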
The team said the ability to segregate auditory streams may be important for both focusing on a specific speaker in a crowded environment, such as having a conversation at a party, and for clearly understanding one speaker in a reverberant space.
Lead author Ding Nai, a research professor at the College of Biomedical Engineering and Instrument Science at Zhejiang University in Hangzhou, said the new findings could be used to improve the way machines process echoic recordings.
Automatic speech recognition technology that converts speech to text has seen rapid development in recent years thanks to advancements in deep learning, he said.
Ding, who researches brain science and artificial intelligence, said: “The discovery points to the potential of developing algorithms that identify and separate acoustic sources in soundtracks, just like how the brain works, to make speech recognition more accurate.
“We can also train machines with more echoic recordings so that they get used to identifying and overcoming related issues in audio.”