23 Mar
12:00 - 13:00

UM Data Science Research Seminar

The UM Data Science Research Seminar Series consists of monthly sessions organized by the Institute of Data Science in collaboration with different departments across UM. The aim of these sessions is to bring together scientists from all over Maastricht University to discuss breakthroughs and research topics related to Data Science. This seminar is organized in collaboration with the Faculty of Psychology and Neuroscience (FPN).
In this seminar, the speakers will focus on the project "ARCHIE" (AuditoRy Cognition in Humans and MachInEs).

N.B.: All events are in-person and free of charge. We also offer participants a FREE lunch.

Schedule

 

LECTURE 1

Time: 12:00 - 12:30

Speaker: Elia Formisano

Title: Semantically-informed deep neural networks for sound recognition

Abstract: Deep neural networks (DNNs) for sound recognition learn to categorize a barking sound as a “dog” and a meowing sound as a “cat” but do not exploit information inherent to the semantic relations between classes (e.g., both are animal vocalisations). Cognitive neuroscience research, however, suggests that human listeners automatically exploit higher-level semantic information on the sources besides acoustic information. Inspired by this notion, we introduce here a DNN that learns to recognize sounds and simultaneously learns the semantic relation between the sources (semDNN). Comparison of semDNN with a homologous network trained with categorical labels (catDNN) revealed that semDNN produces semantically more accurate labelling than catDNN in sound recognition tasks and that semDNN-embeddings preserve higher-level semantic relations between sound sources. Importantly, through a model-based analysis of human dissimilarity ratings of natural sounds, we show that semDNN approximates the behaviour of human listeners better than catDNN and several other DNN and NLP comparison models (Esposito et al., in press).

LECTURE 2

Time: 12:30 - 13:00

Speaker: Gijs Wijngaard

Title: Transformer-based automated audio captioning: applications and evaluation metrics (I)

Abstract: Automated Audio Captioning (AAC) is a multimodal task aiming to convert audio content into natural language. AAC systems use an encoder-decoder architecture: the audio and text are encoded into a computer representation, and the decoder is used to generate a caption. To assess how well they perform, AAC systems are evaluated on quantitative metrics applied to the text representations. Previously, researchers have applied metrics from machine translation and image captioning to evaluate a generated audio caption. Here we introduce a novel metric inspired by cognitive neuroscientific ideas on auditory semantics: Audio Captioning Evaluation on Semantics of Sound (ACES). The assessment of automated audio captions obtained with the proposed metric is highly correlated with human judgement.