Yuki Mitsufuji

research

∙ 09/17/2023

Zero- and Few-shot Sound Event Localization and Detection

Sound event localization and detection (SELD) systems estimate direction...

0 Kazuki Shimada, et al. ∙

research

∙ 09/13/2023

VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance

Restoring degraded music signals is essential to enhance audio quality f...

0 Carlos Hernandez-Olivan, et al. ∙

research

∙ 09/06/2023

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network

Generative adversarial network (GAN)-based vocoders have been intensivel...

0 Takashi Shibuya, et al. ∙

research

∙ 09/05/2023

Enhancing Semantic Communication with Deep Generative Models – An ICASSP Special Session Overview

Semantic communication is poised to play a pivotal role in shaping the l...

0 Eleonora Grassucci, et al. ∙

research

∙ 08/14/2023

The Sound Demixing Challenge 2023 – Cinematic Demixing Track

This paper summarizes the cinematic demixing (CDX) track of the Sound De...

0 Stefan Uhlich, et al. ∙

research

∙ 08/14/2023

The Sound Demixing Challenge 2023 – Music Demixing Track

This paper summarizes the music demixing (MDX) track of the Sound Demixi...

0 Giorgio Fabbro, et al. ∙

research

∙ 07/10/2023

Automatic Piano Transcription with Hierarchical Frequency-Time Transformer

Taking long-term spectral and temporal dependencies into account is esse...

0 Keisuke Toyama, et al. ∙

research

∙ 06/15/2023

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

While direction of arrival (DOA) of sound events is generally estimated ...

5 Kazuki Shimada, et al. ∙

research

∙ 06/01/2023

On the Equivalence of Consistency-Type Models: Consistency Models, Consistent Diffusion Models, and Fokker-Planck Regularization

The emergence of various notions of “consistency” in diffusion models ha...

0 Chieh-Hsin Lai, et al. ∙

research

∙ 05/18/2023

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders

Diffusion-based speech enhancement (SE) has been investigated recently, ...

0 Hao Shi, et al. ∙

research

∙ 05/13/2023

The Whole Is Greater than the Sum of Its Parts: Improving DNN-based Music Source Separation

This paper presents the crossing scheme (X-scheme) for improving the per...

0 Ryosuke Sawata, et al. ∙

research

∙ 05/11/2023

Extending Audio Masked Autoencoders Toward Audio Restoration

Audio classification and restoration are among major downstream tasks in...

0 Zhi Zhong, et al. ∙

research

∙ 05/10/2023

Diffusion-based Signal Refiner for Speech Separation

We have developed a diffusion-based speech refiner that improves the ref...

0 Masato Hirano, et al. ∙

research

∙ 05/03/2023

PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives

Sustaining coherent and engaging narratives requires dialogue or storyte...

0 Silin Gao, et al. ∙

research

∙ 02/27/2023

Cross-modal Face- and Voice-style Transfer

Image-to-image translation and voice conversion enable the generation of...

2 Naoya Takahashi, et al. ∙

research

∙ 02/16/2023

An Attention-based Approach to Hierarchical Multi-label Music Instrument Classification

Although music is typically multi-label, many works have studied hierarc...

0 Zhi Zhong, et al. ∙

research

∙ 01/30/2023

Adversarially Slicing Generative Networks: Discriminator Slices Feature for One-Dimensional Optimal Transport

Generative adversarial networks (GANs) learn a target probability distri...

0 Yuhta Takida, et al. ∙

research

∙ 01/30/2023

GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration

Pre-trained diffusion models have been successfully used as priors in a ...

0 Naoki Murata, et al. ∙

research

∙ 12/14/2022

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

Recent years have seen progress beyond domain-specific sound separation ...

6 Hao-Wen Dong, et al. ∙

research

∙ 11/08/2022

Unsupervised vocal dereverberation with diffusion-based generative models

Removing reverb from reverberant music is a necessary technique to clean...

0 Koichi Saito, et al. ∙

research

∙ 11/04/2022

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

We propose an end-to-end music mixing style transfer system that convert...

0 Junghyun Koo, et al. ∙

research

∙ 10/27/2022

A Versatile Diffusion-based Generative Refiner for Speech Enhancement

Although deep neural network (DNN)-based speech enhancement (SE) methods...

0 Ryosuke Sawata, et al. ∙

research

∙ 10/23/2022

ComFact: A Benchmark for Linking Contextual Commonsense Knowledge

Understanding rich narratives, such as dialogues and stories, often requ...

0 Silin Gao, et al. ∙

research

∙ 10/20/2022

Robust One-Shot Singing Voice Conversion

Many existing works on singing voice conversion (SVC) require clean reco...

0 Naoya Takahashi, et al. ∙

research

∙ 10/14/2022

Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Recent progress in deep generative models has improved the quality of ne...

5 Naoya Takahashi, et al. ∙

research

∙ 10/11/2022

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability

In this paper we propose a novel generative approach, DiffRoll, to tackl...

16 Kin Wai Cheuk, et al. ∙

research

∙ 10/09/2022

Regularizing Score-based Models with Score Fokker-Planck Equations

Score-based generative models learn a family of noise-conditional score ...

0 Chieh-Hsin Lai, et al. ∙

research

∙ 08/24/2022

Automatic music mixing with deep learning and out-of-domain data

Music mixing traditionally involves recording instruments in the form of...

0 Marco A. Martínez Ramírez, et al. ∙

research

∙ 06/04/2022

STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

This report presents the Sony-TAu Realistic Spatial Soundscapes 2022 (ST...

0 Archontis Politis, et al. ∙

research

∙ 05/16/2022

SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

One noted issue of vector-quantized variational autoencoder (VQ-VAE) is ...

26 Yuhta Takida, et al. ∙

research

∙ 02/03/2022

Removing Distortion Effects in Music Using Deep Neural Networks

Audio effects are an essential element in the context of music productio...

0 Johannes Imort, et al. ∙

research

∙ 10/14/2021

Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training

Sound event localization and detection (SELD) involves identifying the d...

0 Kazuki Shimada, et al. ∙

research

∙ 10/13/2021

Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks

A central task of a Disc Jockey (DJ) is to create a mixset of music wit...

0 Bo-Yu Chen, et al. ∙

research

∙ 10/13/2021

Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection

Recording and annotating real sound events for a sound event localizatio...

0 Yuichiro Koyama, et al. ∙

research

∙ 10/13/2021

Music Source Separation with Deep Equilibrium Models

While deep neural network-based music source separation (MSS) is very ef...

0 Yuichiro Koyama, et al. ∙

research

∙ 10/12/2021

Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection

Data augmentation methods have shown great importance in diverse supervi...

0 Ricardo Falcon-Perez, et al. ∙

research

∙ 10/11/2021

Amicable examples for informed source separation

This paper deals with the problem of informed source separation (ISS), w...

0 Naoya Takahashi, et al. ∙

research

∙ 10/11/2021

Source Mixing and Separation Robust Audio Steganography

Audio steganography aims at concealing secret information in carrier aud...

0 Naoya Takahashi, et al. ∙

research

∙ 08/31/2021

Music Demixing Challenge 2021

Music source separation has been intensively studied in the last decade ...

0 Yuki Mitsufuji, et al. ∙

research

∙ 06/21/2021

Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection

This report describes our systems submitted to the DCASE2021 challenge t...

0 Kazuki Shimada, et al. ∙

research

∙ 05/26/2021

Training Speech Enhancement Systems with Noisy Speech Datasets

Recently, deep neural network (DNN)-based speech enhancement (SE) system...

0 Koichi Saito, et al. ∙

research

∙ 02/17/2021

Preventing Posterior Collapse Induced by Oversmoothing in Gaussian VAE

Variational autoencoders (VAEs) often suffer from posterior collapse, wh...

26 Yuhta Takida, et al. ∙

research

∙ 01/18/2021

Hierarchical disentangled representation learning for singing voice conversion

Conventional singing voice conversion (SVC) methods often suffer from op...

20 Naoya Takahashi, et al. ∙

research

∙ 11/21/2020

Densely connected multidilated convolutional networks for dense prediction tasks

Tasks that involve high-resolution dense prediction require a modeling o...

8 Naoya Takahashi, et al. ∙

research

∙ 10/29/2020

ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection

Neural-network (NN)-based methods show high performance in sound event l...

0 Kazuki Shimada, et al. ∙

research

∙ 10/08/2020

All for One and One for All: Improving Music Separation by Bridging Networks

This paper proposes several improvements for music separation with deep ...

0 Ryosuke Sawata, et al. ∙

research

∙ 10/07/2020

Adversarial attacks on audio source separation

Despite the excellent performance of neural-network-based audio source s...

0 Naoya Takahashi, et al. ∙

research

∙ 10/05/2020

D3Net: Densely connected multidilated DenseNet for music source separation

Music source separation involves a large input field to model a long-ter...

0 Naoya Takahashi, et al. ∙

research

∙ 06/22/2020

Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net

Our systems submitted to the DCASE2020 task 3: Sound Event Localization ...

0 Kazuki Shimada, et al. ∙

research

∙ 11/29/2019

Improving Voice Separation by Incorporating End-to-end Speech Recognition

Despite recent advances in voice separation methods, many challenges rem...

0 Naoya Takahashi, et al. ∙
