Institut Mines-Télécom (IMT) has announced that the HI-Audio project, led by Gaël Richard, professor at Télécom Paris, has won a 2021 ERC Advanced Grant. This grant is awarded to experienced, internationally recognized researchers, and funds research projects that open up new opportunities in their field. The winning machine listening project focuses on using new AI models to analyze and understand sound. Target applications include speech and audio scene analysis, music information retrieval, and sound transformation and synthesis.
Part of the Horizon Europe program, these grants fund experienced researchers whose work has already gained worldwide recognition. The “Advanced Grants” awarded by the European Research Council (ERC) target ambitious, high-risk research projects at the frontiers of knowledge that tackle novel scientific challenges. They provide up to €2.5 million in funding over a period of up to five years.
Gaël Richard is a specialist in audio signal processing who won the IMT-Académie des Sciences Grand Prix in 2020. After earning his PhD at Université Paris-Sud in 1994, he began his research career studying singing voice synthesis, followed by speech synthesis. His work in signal processing led him to new methods for decomposing the voice into the constituent elements of the audio signal, in order to better recreate a synthetic voice. He thus developed the principle of decomposing a signal as the product of two nonnegative matrices: one representing the basic components of the sound, and the other indicating the activation of these components over time.
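The two-matrix decomposition described above is commonly known as nonnegative matrix factorization (NMF). As a minimal sketch, and not Gaël Richard's specific method, the classic multiplicative-update algorithm can be applied to a toy magnitude spectrogram like this (all variable names and the toy data are illustrative):

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9):
    """Factor a nonnegative matrix V (freq x time) as W @ H.

    Columns of W are spectral templates (the "basic components of sound");
    rows of H are their activations over time. Uses Lee & Seung's
    multiplicative updates for the Frobenius-norm cost.
    """
    rng = np.random.default_rng(0)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update templates
    return W, H

# Toy "spectrogram": two spectral templates, each active in different frames.
templates = np.array([[1.0, 0.0], [0.5, 0.2], [0.0, 1.0]])  # 3 freq bins x 2 components
activations = np.array([[1.0, 1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 1.0]])              # 2 components x 4 frames
V = templates @ activations
W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # small reconstruction error
```

The multiplicative updates keep both factors nonnegative by construction, which is what makes the learned templates and activations physically interpretable.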
The HI-Audio project – Creating new AI models for sound analysis
Machine listening, or AI for sound, involves audio analysis, understanding and synthesis by a machine. Access to increasingly powerful supercomputers, combined with the availability of huge data repositories (though these are largely unannotated), has fostered the development of purely data-driven machine learning approaches. The field has been quick to focus on end-to-end neural approaches that aim to solve machine listening tasks directly from raw acoustic signals, without giving enough consideration to the nature and structure of the data being processed.
The main consequences of this are that the models:
- are excessively complex, requiring massive amounts of training data and extreme computational power to perform well on their tasks
- remain largely unexplainable and uninterpretable
Gaël Richard’s research seeks to overcome these major shortcomings. “We think that our prior knowledge about the nature of the data processed, their generation processes and perception by humans should be explicitly used in neural-based machine learning.”
The aim of the project awarded the ERC Advanced Grant is therefore to develop deep hybrid approaches that combine parameter-efficient, interpretable signal models, together with musicological and physics-based models, with tailored deep neural architectures. The research directions pursued in HI-Audio will use novel deterministic and statistical models of audio and sound environments with dedicated neural auto-encoders and generative networks. They will target specific applications including speech and audio scene analysis, music information retrieval, and sound transformation and synthesis.
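To illustrate the hybrid principle in the simplest possible terms (this is a toy sketch under assumed settings, not the HI-Audio architecture): instead of a generic neural decoder, the decoder can be an interpretable signal model, here a single sinusoid of known pitch with one learnable, physically meaningful parameter (its amplitude), fitted by gradient descent on a reconstruction loss:

```python
import numpy as np

# Hypothetical setup: sample rate and pitch are assumed known;
# only the amplitude of the sinusoidal "decoder" is learned.
sr, f0 = 8000, 440.0
t = np.arange(256) / sr
target = 0.7 * np.sin(2 * np.pi * f0 * t)   # observed waveform (true amplitude 0.7)

basis = np.sin(2 * np.pi * f0 * t)          # interpretable signal model
a = 0.1                                     # learnable amplitude parameter
lr = 0.1
for _ in range(500):
    recon = a * basis                       # decode with the physical model
    grad = 2 * np.mean((recon - target) * basis)  # d(MSE)/da
    a -= lr * grad

print(round(a, 3))  # recovered amplitude, close to 0.7
```

Because the decoder is a signal model rather than a black box, the learned parameter has a direct physical reading (loudness of a 440 Hz tone), which is the kind of interpretability the hybrid approach aims for at much larger scale.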