Do you hear an echo? Probably, and it could hold the answer to improving AI speech recognition
- Chinese scientists have discovered that humans are able to hear two separate sound streams – direct speech and echo
- They hope the new findings can now be used in AI to improve the way machines process recordings with echoes
“An intense echo strongly distorts the speech envelope that contains critical cues for speech intelligibility, but human listeners can still reliably recognise echoic speech,” the team wrote in an article published in the peer-reviewed journal PLOS Biology on Friday.
“The current study showed that the auditory system can effectively restore the low-frequency components of the speech envelope that are attenuated or eliminated by an echo, providing a plausible neural basis for reliable speech recognition,” the researchers wrote.
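To see why an echo distorts specific envelope frequencies, note that adding a delayed, attenuated copy of a sound acts like a comb filter on its amplitude envelope: a modulation at frequency f is scaled by |1 + a·e^(-i2πfτ)|, where τ is the echo delay and a its relative level. The sketch below is purely illustrative – the delay, attenuation, and modulation frequencies are made-up values, not figures from the study:

```python
import cmath
import math

def echo_modulation_gain(f_hz, delay_s, attenuation):
    """Gain applied to an envelope modulation at f_hz when a
    delayed, attenuated echo is added to the direct sound."""
    return abs(1 + attenuation * cmath.exp(-2j * math.pi * f_hz * delay_s))

delay = 0.2   # 200 ms echo delay (illustrative)
atten = 1.0   # echo as loud as the direct sound

for f in [1.0, 2.5, 5.0]:
    print(f"{f} Hz modulation -> gain {echo_modulation_gain(f, delay, atten):.2f}")
```

With a 200 ms delay, a 2.5 Hz modulation arrives in antiphase with its echo and is cancelled almost entirely, while a 5 Hz modulation is reinforced – which is why certain slow envelope cues that matter for intelligibility can be attenuated or eliminated, as the quote above describes.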
Echoes can occur anywhere – from online meetings to lectures in large auditoriums – and they can make it harder to understand what is being said.
For the study, the researchers recruited about 50 native Chinese speakers aged between 19 and 33 and played them Chinese-language narrations of a snippet from a novel, with and without an echo.
Participants listened to the audio through headphones in a quiet room while their neural responses were recorded by magnetoencephalography (MEG), a non-invasive technique that measures the magnetic fields generated by electrical currents in the brain to map its activity.
Researchers then asked the participants comprehension questions to check their understanding of the story. Analysis showed that they understood the content with an accuracy above 95 per cent, whether or not an echo was present.
The scientists compared the recorded neural signals against two computational models. The neural activity was better explained by a model that splits the sound into two processing streams – the original speech and its echo – than by one that simulates the brain adapting to the echo.
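The logic of that model comparison can be caricatured in a few lines: generate a toy "neural" response that tracks the direct-speech envelope and the echo envelope with different weights, then fit (a) a single-stream model that only sees the combined echoic envelope and (b) a two-stream model with separate direct and echo regressors. Every signal and weight below is invented for illustration; the study's actual models were fitted to MEG recordings, not toy sinusoids:

```python
import numpy as np

t = np.arange(0, 10, 0.01)                # 10 s sampled at 100 Hz
direct = 1 + np.cos(2 * np.pi * 1.3 * t)  # direct-speech envelope (toy)
echo = np.roll(direct, 25) * 0.6          # echo: 250 ms delay, attenuated
echoic = direct + echo                    # what actually reaches the ear

# Toy "neural" response: tracks the two streams with different weights
neural = 1.0 * direct + 0.3 * echo

def r_squared(design, target):
    """Least-squares fit of target on the design columns; return R^2."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ beta
    return 1 - resid.var() / target.var()

one_stream = r_squared(np.column_stack([echoic]), neural)
two_stream = r_squared(np.column_stack([direct, echo]), neural)
print(f"single-stream R^2 = {one_stream:.3f}, two-stream R^2 = {two_stream:.3f}")
```

Because the toy response weights the two streams differently from how they mix in the ear, only the two-regressor model can fit it exactly – a simplified version of why separately tracked streams explained the MEG data better.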
The team said the ability to segregate auditory streams may be important for both focusing on a specific speaker in a crowded environment, such as having a conversation at a party, and for clearly understanding one speaker in a reverberant space.
Lead author Ding Nai, a research professor at the College of Biomedical Engineering and Instrument Science at Zhejiang University in Hangzhou, said the new findings could be used to improve the way machines process echoic recordings.
Automatic speech recognition technology that converts speech to text has seen rapid development in recent years thanks to advancements in deep learning, he said.
Ding, who researches brain science and artificial intelligence, said: “The discovery points to the potential of developing algorithms that identify and separate acoustic sources in soundtracks, just like how the brain works, to make speech recognition more accurate.
“We can also train machines with more echoic recordings so that they get used to identifying and overcoming related issues in audio.”