Sound event localization and detection (SELD) systems estimate direction...
Restoring degraded music signals is essential to enhance audio quality f...
Generative adversarial network (GAN)-based vocoders have been intensivel...
Semantic communication is poised to play a pivotal role in shaping the l...
This paper summarizes the cinematic demixing (CDX) track of the Sound De...
This paper summarizes the music demixing (MDX) track of the Sound Demixi...
Taking long-term spectral and temporal dependencies into account is esse...
While direction of arrival (DOA) of sound events is generally estimated ...
The emergence of various notions of “consistency” in diffusion models ha...
Diffusion-based speech enhancement (SE) has been investigated recently, ...
This paper presents the crossing scheme (X-scheme) for improving the per...
Audio classification and restoration are among major downstream tasks in...
We have developed a diffusion-based speech refiner that improves the ref...
Sustaining coherent and engaging narratives requires dialogue or storyte...
Image-to-image translation and voice conversion enable the generation of...
Although music is typically multi-label, many works have studied hierarc...
Generative adversarial networks (GANs) learn a target probability distri...
Pre-trained diffusion models have been successfully used as priors in a ...
Recent years have seen progress beyond domain-specific sound separation ...
Removing reverb from reverberant music is a necessary technique to clean...
We propose an end-to-end music mixing style transfer system that convert...
Although deep neural network (DNN)-based speech enhancement (SE) methods...
Understanding rich narratives, such as dialogues and stories, often requ...
Many existing works on singing voice conversion (SVC) require clean reco...
Recent progress in deep generative models has improved the quality of ne...
In this paper we propose a novel generative approach, DiffRoll, to tackl...
Score-based generative models learn a family of noise-conditional score ...
Music mixing traditionally involves recording instruments in the form of...
This report presents the Sony-TAu Realistic Spatial Soundscapes 2022 (ST...
One noted issue of vector-quantized variational autoencoder (VQ-VAE) is ...
Audio effects are an essential element in the context of music productio...
Sound event localization and detection (SELD) involves identifying the d...
A central task of a Disc Jockey (DJ) is to create a mixset of music wit...
Recording and annotating real sound events for a sound event localizatio...
While deep neural network-based music source separation (MSS) is very ef...
Data augmentation methods have shown great importance in diverse supervi...
This paper deals with the problem of informed source separation (ISS), w...
Audio steganography aims at concealing secret information in carrier aud...
Music source separation has been intensively studied in the last decade ...
This report describes our systems submitted to the DCASE2021 challenge t...
Recently, deep neural network (DNN)-based speech enhancement (SE) system...
Variational autoencoders (VAEs) often suffer from posterior collapse, wh...
Conventional singing voice conversion (SVC) methods often suffer from op...
Tasks that involve high-resolution dense prediction require a modeling o...
Neural-network (NN)-based methods show high performance in sound event l...
This paper proposes several improvements for music separation with deep ...
Despite the excellent performance of neural-network-based audio source s...
Music source separation involves a large input field to model a long-ter...
Our systems submitted to the DCASE2020 task 3: Sound Event Localization ...
Despite recent advances in voice separation methods, many challenges rem...