In this new paper, we examine the nature of sound representations in the intermediate layers of convolutional DNNs through in silico experiments involving a new sound dataset of materials and actions. Furthermore, using a new methodology based on invertible neural networks, we show that there is a causal relationship between these internal representations and the semantic model output.
Deciphering the Transformation of Sounds into Meaning: Insights from Disentangling Intermediate Representations in Sound-to-Event DNNs. Tim Dick, Alexia Briassouli, Enrique Hortal Quesada and Elia Formisano (2024) Available at SSRN: https://ssrn.com/abstract=4979651 or http://dx.doi.org/10.2139/ssrn.4979651.
In this new article in Scientific Reports (Open Access), we show that deep neural networks (DNNs) mapping sounds to distributed language semantics approximate human listeners’ behavior better than standard DNNs with categorical output:
Esposito, M., Valente, G., Plasencia-Calaña, Y., Dumontier, M., Giordano, B.L., Formisano, E. Bridging auditory perception and natural language processing with semantically informed deep neural networks. Scientific Reports 14, 20994 (2024). https://doi.org/10.1038/s41598-024-71693-9
Read our preprint on arXiv reporting a survey of the many available audio-language datasets. The survey includes a useful overview of the different audio-language tasks and an informative analysis examining the overlap between the datasets.
Audio-Language Datasets of Scenes and Events: A Survey. Gijs Wijngaard, Elia Formisano, Michele Esposito, Michel Dumontier. https://arxiv.org/abs/2407.06947
Read our new preprint in bioRxiv on enhancing sound recognition with semantics. We found that integrating NLP embeddings into CNNs enhances the network’s sound recognition performance and aligns it with human auditory perception:
Bridging Auditory Perception and Natural Language Processing with Semantically Informed Deep Neural Networks. Michele Esposito, Giancarlo Valente, Yenisel Plasencia-Calaña, Michel Dumontier, Bruno L. Giordano, and Elia Formisano.
Read our new preprint on arXiv on a novel metric to evaluate and compare audio captions generated by humans or automated models: Gijs Wijngaard, Elia Formisano, Bruno L. Giordano, and Michel Dumontier. ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds. (2024) arXiv, 2403.18572. https://arxiv.org/abs/2403.18572.
You may also want to check Gijs’s Hugging Face model associated with this article, which classifies tokens of sound captions with auditory semantic tags (Who/What/How/Where): here
Read our new article in Nature Neuroscience (Open Access) on how deep neural networks and other computational models can help us understand how sounds are transformed into meaning in the human brain:
Giordano, B.L., Esposito, M., Valente, G., Formisano, E. Intermediate acoustic-to-semantic representations link behavioral and neural responses to natural sounds. Nat Neurosci (2023). https://doi.org/10.1038/s41593-023-01285-9
Maria de Araújo Nobre Duarte Vitória has joined the Aud2Sem research team as a full-time PhD student. Welcome, Maria!
Do you have a genuine interest in auditory perception? Are you familiar with neuroimaging acquisition and analysis methods? Apply now for a PhD student position in Cognitive Neuroscience and join our team! (closed)
Are you passionate about combining AI research and neuroscience to find out how the brain recognizes sounds? In the coming year, PhD student positions will be available in Maastricht (1 position, 4 years) and Marseille (2 positions, 3 years), giving you the opportunity to join our team. We will post information and links to the official applications here, so stay tuned…