Recent advancements in large language models (LLMs) have transformed the...
Expressive text-to-speech (TTS) can synthesize a new speaking style by i...
Generating sound effects that humans want is an important topic. However...
The past ten years have witnessed the rapid development of text-based in...
Transformer-based models attain excellent results and generalize well wh...
Target sound detection (TSD) aims to detect the target sound from a mixt...
Target sound detection (TSD) aims to detect the target sound from mixtur...
Target sound extraction (TSE) aims to extract the sound part of a target...
Human beings can perceive a target sound that we are interested in from ...
Automated audio captioning (AAC) has developed rapidly in recent years, ...
Although the prototypical network (ProtoNet) has proved to be an effective m...
While Machine Comprehension (MC) has attracted extensive research intere...
It is well known that the mismatch between training (source) and test (t...
Transformer-based self-supervised models are trained as feature extracto...
In this paper, we present SpecAugment++, a novel data augmentation metho...
Weakly labelled audio tagging aims to predict the classes of sound event...
Recently, convolutional neural networks (CNN) have achieved the state-of...
Convolutional neural networks (CNN) are one of the best-performing neura...