New preprint on arXiv: ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds.
2024-05-01
Read our new arXiv preprint on a novel metric for evaluating and comparing audio captions generated by humans or automated models: Gijs Wijngaard, Elia Formisano, Bruno L. Giordano, and Michel Dumontier (2024). ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds. arXiv:2403.18572. https://arxiv.org/abs/2403.18572
You may also want to check out Gijs’s HuggingFace model associated with this article, which classifies the tokens of a sound caption with auditory semantic tags (Who/What/How/Where): here