There has been rapid development in the artificial intelligence (AI) space recently, with researchers coming up with new ways to improve everyday life. Now, they have developed an AI model that can significantly improve audio quality in real-world situations by drawing on how humans perceive speech.
The study, conducted by researchers from Ohio State University, showed that people's subjective ratings of sound quality can be combined with a speech enhancement model to produce better speech quality as measured by objective metrics.
The new model performed better than other standard approaches that aim to minimise noise in the audio – unwanted sounds that may disrupt what the listener actually wants to hear, a press statement from the university explained. Notably, the quality scores the model predicts correlate strongly with the judgments humans would make.
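How strongly a model's predicted quality scores track human judgments is typically measured with a correlation coefficient. The sketch below is purely illustrative – the scores are made-up numbers on a 1–5 mean-opinion-score (MOS) scale, not data from the study – and shows the standard Pearson-correlation check:

```python
import numpy as np

# Hypothetical example: predicted quality scores vs. human MOS ratings
# for five noisy clips (values invented for illustration, 1-5 scale).
predicted = np.array([3.8, 2.1, 4.5, 3.0, 1.9])
human = np.array([3.6, 2.3, 4.4, 3.2, 2.0])

# Pearson correlation coefficient: values near 1.0 mean the model's
# scores rise and fall in step with human judgments.
r = np.corrcoef(predicted, human)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1.0, as this toy data yields, is what "strongly correlated" means in practice.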
“What distinguishes this study from others is that we’re trying to use perception to train the model to remove unwanted sounds,” co-author Donald Williamson said in the statement. “If something about the signal in terms of its quality can be perceived by people, then our model can use that as additional information to learn and better remove noise.”
The findings, published via IEEE Xplore, focused on enhancing speech that comes from a single audio channel, such as one microphone. The researchers trained the model on two datasets from previous research consisting of recordings of people talking.
The model's strong performance stems from a joint-learning method that pairs a specialised speech enhancement module with a prediction model that estimates the mean opinion score (MOS) human listeners might give a noisy signal, the statement adds.
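The core idea of such joint learning can be sketched as a combined training objective: one term rewards reconstructing the clean signal, and a second term keeps the MOS predictor aligned with human opinion scores. The code below is a minimal toy sketch of that idea, not the paper's actual loss function; the function name, the mean-squared-error terms, and the weighting factor `alpha` are all assumptions for illustration:

```python
import numpy as np

def joint_loss(enhanced, clean, predicted_mos, human_mos, alpha=0.5):
    """Toy joint-learning objective (illustrative only).

    Combines a signal-reconstruction term (how close the enhanced audio
    is to the clean reference) with a perceptual term (how close the
    model's predicted MOS is to human ratings), weighted by alpha.
    """
    enhancement_loss = np.mean((enhanced - clean) ** 2)
    perceptual_loss = np.mean((predicted_mos - human_mos) ** 2)
    return enhancement_loss + alpha * perceptual_loss

# Example: a perfect enhancement with a perfect MOS prediction costs nothing.
loss = joint_loss(np.zeros(8), np.zeros(8), np.array([3.5]), np.array([3.5]))
print(loss)
```

Minimising both terms at once is what lets human perception steer the enhancement model, rather than training it on signal fidelity alone.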
However, using human perception also comes with some issues. For instance, judgments of unwanted sound are subjective, depending on a person's hearing capabilities and listening experiences. Moreover, factors such as having a hearing aid or a cochlear implant affect how a person perceives their sound environment, Williamson added in the statement.
To improve the model, the researchers plan to continue using human subjective evaluations to train it to handle more complex audio systems and the changing expectations of human users.
Researchers have been developing AI models for diverse uses. For example, in December 2023, a study published in the journal Nature Computational Science showed that AI models can analyse data on people's residence, education, income, health and working conditions and predict life events with high accuracy.
Another study, presented at the NeurIPS Conference in December 2023, described a portable, non-invasive AI system that can decode silent thoughts and turn them into text.