In this study, we investigate whether speech symbols, learned through de...
This paper proposes a method for extracting a lightweight subset from a ...
A method for estimating the incident sound field inside a region contain...
A sound field synthesis method enhancing perceptual quality is proposed....
In this paper, we address the multichannel blind source extraction (BSE)...
We propose HumanDiffusion, a diffusion model trained from humans' percep...
In this paper, we propose algorithms for handling non-integer strides in...
A multichannel active noise control (ANC) method with exterior radiation...
We examine the speech modeling potential of generative spoken language m...
We propose ChatGPT-EDSS, an empathetic dialogue speech synthesis (EDSS) ...
We present CALLS, a Japanese speech corpus that considers phone calls in...
We present JNV (Japanese Nonverbal Vocalizations) corpus, a corpus of Ja...
We present a large-scale in-the-wild Japanese laughter corpus and a laug...
A spatial active noise control (ANC) method based on the interpolation o...
An interpolation method for region-to-region acoustic transfer functions...
Pause insertion, also known as phrase break prediction and phrasing, is ...
While neural text-to-speech (TTS) has achieved human-like natural synthe...
We construct a corpus of Japanese a cappella vocal ensembles (jaCappella...
We present a multi-speaker Japanese audiobook text-to-speech (TTS) syste...
This paper proposes a method for selecting training data for text-to-spe...
In this paper, we propose a method for intermediating multiple speakers'...
We propose a training method for spontaneous speech synthesis models tha...
We propose a method for synthesizing environmental sounds from visually ...
We present a comprehensive empirical study for personalized spontaneous ...
We propose a novel training algorithm for a multi-speaker neural text-to...
We propose a method of head-related transfer function (HRTF) interpolati...
A sound field estimation method based on a physics-informed convolutiona...
We present an emotion recognition system for nonverbal vocalizations (NV...
This paper proposes a human-in-the-loop speaker-adaptation method for mu...
We propose an end-to-end empathetic dialogue speech synthesis (DSS) mode...
This paper presents a speaking-rate-controllable HiFi-GAN neural vocoder...
We present the UTokyo-SaruLab mean opinion score (MOS) prediction system...
We present STUDIES, a new speech corpus for developing a voice agent tha...
This paper proposes visual-text to speech (vTTS), a method for synthesiz...
We present a self-supervised speech restoration method without paired sp...
In this paper, we propose a method to generate personalized filled pause...
A spatial active noise control (ANC) method based on the individual kern...
A differentiable digital signal processing (DDSP) autoencoder is a music...
In this paper, we construct a Japanese audiobook speech corpus called "J...
A method of optimizing secondary source placement in sound field synthes...
A method to estimate an acoustic field from discrete microphone measurem...
Incremental text-to-speech (TTS) synthesis generates utterances in small...
A method of binaural rendering from microphone array signals of arbitrar...
Rank-constrained spatial covariance matrix estimation (RCSCME) is a meth...
Independent deeply learned matrix analysis (IDLMA) is one of the state-o...
When we place microphones close to a sound source near other sources in ...
We address the determined audio source separation problem in the time-fr...
Independent deeply learned matrix analysis (IDLMA) is one of the state-o...
Audio source separation is often used as preprocessing of various applic...
Rank-constrained spatial covariance matrix estimation (RCSCME) is a stat...