2024-10-21
In this new paper, we examine the nature of sound representations in intermediate layers of convolutional DNNs by means of in silico experiments involving a new sound dataset of materials and actions. Furthermore, by means of a new methodology based on invertible neural networks, we show that there is a causal relationship between these internal representations and the semantic model output. Deciphering the Transformation of Sounds into Meaning: Insights from Disentangling Intermediate Representations in Sound-to-Event DNNs. Tim Dick, Alexia Briassouli, Enrique Hortal Quesada and Elia Formisano (2024). Available at SSRN: https://ssrn.com/abstract=4979651 or http://dx.doi.org/10.2139/ssrn.4979651.
2024-10-21
In this new article in Scientific Reports (Open Access), we show that deep neural networks (DNNs) mapping sounds to distributed language semantics approximate human listeners' behavior better than standard DNNs with categorical output: Esposito, M., Valente, G., Plasencia-Calaña, Y., Dumontier, M., Giordano, B.L., Formisano, E. Bridging auditory perception and natural language processing with semantically informed deep neural networks. Scientific Reports 14, 20994 (2024). https://doi.org/10.1038/s41598-024-71693-9
2024-10-21
Read our preprint on arXiv reporting a survey of the many available audio-language datasets. The survey includes a useful overview of the different audio-language tasks and an informative analysis of the overlap between the different datasets. Audio-Language Datasets of Scenes and Events: A Survey. Gijs Wijngaard, Elia Formisano, Michele Esposito, Michel Dumontier. https://arxiv.org/abs/2407.06947
2024-05-01
Read our new preprint on bioRxiv on enhancing sound recognition with semantics. We found that integrating NLP embeddings into CNNs enhances the network's sound recognition performance and aligns it with human auditory perception: Bridging Auditory Perception and Natural Language Processing with Semantically Informed Deep Neural Networks. Michele Esposito, Giancarlo Valente, Yenisel Plasencia-Calaña, Michel Dumontier, Bruno L. Giordano and Elia Formisano.
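To give a flavor of the general idea, here is a minimal, illustrative PyTorch sketch (not the architecture or training setup of the paper) of a sound-recognition CNN whose output head regresses onto fixed language embeddings of the class labels instead of producing categorical logits; the toy convolutional front end, the 300-dimensional embedding size and the cosine loss are assumptions for illustration only.

```python
# Minimal sketch (not the published model): an audio CNN over log-mel
# spectrograms whose head predicts a language-embedding vector for the sound
# label instead of categorical class logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticAudioCNN(nn.Module):
    def __init__(self, embed_dim=300):
        super().__init__()
        # Toy convolutional front end over (1 x mel x time) spectrograms.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        # Instead of an N-way classification layer, project into the language
        # embedding space (e.g. word embeddings of the sound labels).
        self.to_semantics = nn.Linear(32, embed_dim)

    def forward(self, spec):
        h = self.features(spec).flatten(1)
        return self.to_semantics(h)

def semantic_loss(pred, target_embedding):
    # Pull the predicted vector toward the label's language embedding.
    return 1.0 - F.cosine_similarity(pred, target_embedding, dim=-1).mean()

if __name__ == "__main__":
    model = SemanticAudioCNN()
    spec = torch.randn(8, 1, 64, 128)                   # fake log-mel batch
    labels = F.normalize(torch.randn(8, 300), dim=-1)   # stand-in label embeddings
    loss = semantic_loss(model(spec), labels)
    loss.backward()
    print(f"toy loss: {loss.item():.3f}")
```

At recognition time, the predicted vector would be matched to the label embeddings by nearest-neighbour (e.g. cosine) search rather than by an argmax over class probabilities.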
2024-05-01
Read our new preprint on arXiv on a novel metric to evaluate and compare audio captions generated by humans or automated models: Gijs Wijngaard, Elia Formisano, Bruno L. Giordano and Michel Dumontier, ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds (2024). arXiv:2403.18572. https://arxiv.org/abs/2403.18572. You may also want to check Gijs's Hugging Face model associated with this article, which classifies tokens of sound captions with auditory semantic tags (Who/What/How/Where): here
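As an illustration of how such a tagger could be queried, here is a hedged sketch using the Hugging Face transformers token-classification pipeline; the model identifier below is a placeholder (use the checkpoint linked above instead), and the example caption is invented.

```python
# Sketch only: tagging a sound caption with auditory semantic labels via the
# Hugging Face token-classification pipeline. MODEL_ID is a placeholder, not
# the real checkpoint name; substitute the model linked in the post.
from transformers import pipeline

MODEL_ID = "your-org/sound-caption-tagger"  # placeholder identifier

tagger = pipeline("token-classification", model=MODEL_ID,
                  aggregation_strategy="simple")

for span in tagger("a dog barks loudly in the backyard"):
    # Each span carries the grouped tag (e.g. Who/What/How/Where) and its text.
    print(f'{span["word"]!r} -> {span["entity_group"]} ({span["score"]:.2f})')
```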
2023-03-24
Read our new article in Nature Neuroscience (Open Access) on how deep neural networks and other computational models can help us understand how sounds are transformed into meaning in the human brain: Giordano, B.L., Esposito, M., Valente, G., Formisano, E. Intermediate acoustic-to-semantic representations link behavioral and neural responses to natural sounds. Nat Neurosci (2023). https://doi.org/10.1038/s41593-023-01285-9
2022-07-14
Do you have a genuine interest in auditory perception? Are you familiar with neuroimaging acquisition and analysis methods? Apply now for a PhD student position in Cognitive Neuroscience and join our team! (closed)
2021-10-21
Are you passionate about combining AI research and neuroscience to find out how the brain recognizes sounds? During the coming year, Ph.D. student positions will be available in Maastricht (1 position, 4 years) and Marseille (2 positions, 3 years), giving you the opportunity to join our team. We will post here information and links to the official applications, so stay tuned...
2021-10-14
The NWO-funded project "Aud2Sem: Acoustic to semantic transformations in Human Auditory Cortex", led by E. Formisano, officially started on October 1st, 2021. Find below the project abstract and stay tuned for further information...

A bird chirping, a glass breaking, an ambulance passing by. Listening to sounds helps us recognize events and objects, even when they are out of sight, in the dark or behind a wall, for example. Despite rapid progress in the field of auditory cognition, we know very little about how the brain transforms acoustic sound representations into meaningful source representations. To fill this gap, this project develops a neurobiologically grounded computational model of sound recognition by combining advanced methodologies from information science, artificial intelligence, and cognitive neuroscience. First, we develop "Sounds", an ontology that characterizes a large number of everyday sounds and their taxonomic relations in terms of sound-generating mechanisms and properties of the corresponding sources. Second, we develop novel ontology-based deep neural networks (DNNs) that combine acoustic sound analysis with high-level information about the sound sources and learn to perform sound recognition (SR) tasks at different abstraction levels. In parallel, we measure brain responses with sub-millimeter functional MRI (fMRI) and intracranial electroencephalography (iEEG) in human listeners as they perform the same SR tasks. We expect that the ontology-based DNNs will explain the measured laminar-specific (fMRI) and spectro-temporally resolved (iEEG) neural responses in auditory cortex better than existing acoustic and semantic models. Results of this research will provide mechanistic insights into the acoustic-to-semantic transformations in the human brain. Furthermore, the ontology of everyday sounds and the methods developed for embedding ontological information in DNNs will be of broad academic interest and relevant for the rapidly expanding societal applications of artificial hearing. See the NWO website