Speech emotion recognition (SER) models typically rely on costly
human-l...
Advertisement videos (ads) play an integral part in the domain of Intern...
The process of human affect understanding involves the ability to infer
...
Heterogeneous graphs provide a compact, efficient, and scalable way to m...
Audio event detection is a widely studied audio processing task, with
ap...
Longform media such as movies have complex narrative structures, with ev...
Perception of auditory events is inherently multimodal, relying on both a...
This technical report presents the modeling approaches used in our submi...
Large-scale databases with high-quality manual annotations are scarce in...
Societal ideas and trends dictate media narratives and cinematic depicti...
Robust face clustering is a key step towards computational understanding...
Violent content in the media can influence viewers' perception of the
so...
A key objective in multi-view learning is to model the information commo...
An objective understanding of media depictions, such as about inclusive
...
The primary characteristic of robust speaker representations is that the...
In this paper, we address the problem of speaker recognition in challeng...
We propose Deep Multiset Canonical Correlation Analysis (dMCCA) as an
ex...