Data-driven approaches hold promise for audio captioning. However, the d...
Self-supervised learning methods have achieved promising performance for...
Audio super-resolution is a fundamental task that predicts high-frequenc...
Fish feeding intensity assessment (FFIA) aims to evaluate the intensity ...
This survey paper provides a comprehensive overview of the recent advanc...
Sound events in daily life carry rich information about the objective wo...
For learning-based sound event localization and detection (SELD) methods...
Automated audio captioning (AAC), which generates textual descriptions of...
Although deep learning is the mainstream method in unsupervised anomalou...
Existing contrastive learning methods for anomalous sound detection refi...
State-of-the-art audio captioning methods typically use the encoder-deco...
The advancement of audio-language (AL) multimodal learning tasks has bee...
Differentiable particle filters are an emerging class of particle filter...
Deep learning-based methods have achieved significant performance for im...
Automated audio captioning is a cross-modal translation task for describ...
Vision transformers, which were originally developed for natural languag...
This study defines a new evaluation metric for audio tagging tasks to ov...
Audio captioning is the task of generating captions that describe the co...
Most existing deep learning-based acoustic scene classification (ASC) ap...
Audio tagging aims to assign predefined tags to audio clips to indicate ...
Automated audio captioning (AAC) aims to describe the content of an audi...
The audio spectrogram is a time-frequency representation that has been w...
Recently, there has been increasing interest in building efficient audio...
Few-shot audio event detection is a task that detects the occurrence tim...
Few-shot bioacoustic event detection is a task that detects the occurren...
Continuously learning new classes without catastrophic forgetting is a c...
Automated audio captioning is a cross-modal translation task that aims t...
Target sound detection (TSD) aims to detect the target sound from a mixt...
Target sound detection (TSD) aims to detect the target sound from mixtur...
Audio-text retrieval aims at retrieving a target audio clip or caption f...
In this paper, we introduce the task of language-queried audio source se...
Acoustic scene classification (ASC) aims to classify an audio clip based...
Audio captioning aims at using natural language to describe the content ...
Unsupervised anomalous sound detection aims to detect unknown abnormal s...
Automated audio captioning (AAC) aims to describe audio data with captio...
Audio captioning aims at generating natural language descriptions for au...
In a recent study of auditory evoked potential (AEP) based brain-compute...
Single-channel blind source separation (SCBSS) refers to separating multip...
Although the prototypical network (ProtoNet) has proved to be an effective m...
The availability of audio data on sound sharing platforms such as Freeso...
Automated audio captioning aims to use natural language to describe the ...
Deep generative models have recently achieved impressive performance in ...
Automated audio captioning (AAC) is a cross-modal translation task that ...
Audio captioning aims to automatically generate a natural language descr...
This paper proposes a deep learning framework for classification of BBC ...
In this paper, we present SpecAugment++, a novel data augmentation metho...
Speech enhancement aims to obtain speech signals with high intelligibili...
Data augmentation is an inexpensive way to increase training data divers...
Weakly labelled audio tagging aims to predict the classes of sound event...
Polyphonic sound event localization and detection (SELD), which jointly ...