In this study, we investigate whether speech symbols, learned through de...
This paper proposes a method for extracting a lightweight subset from a ...
A method for estimating the incident sound field inside a region contain...
A sound field synthesis method enhancing perceptual quality is proposed....
In this paper, we address the multichannel blind source extraction (BSE)...
We propose HumanDiffusion, a diffusion model trained from humans' percep...
In this paper, we propose algorithms for handling non-integer strides in...
A multichannel active noise control (ANC) method with exterior radiation...
We examine the speech modeling potential of generative spoken language m...
We propose ChatGPT-EDSS, an empathetic dialogue speech synthesis (EDSS) ...
We present CALLS, a Japanese speech corpus that considers phone calls in...
We present JNV (Japanese Nonverbal Vocalizations) corpus, a corpus of Ja...
We present a large-scale in-the-wild Japanese laughter corpus and a laug...
A spatial active noise control (ANC) method based on the interpolation o...
An interpolation method for region-to-region acoustic transfer functions...
Pause insertion, also known as phrase break prediction and phrasing, is ...
While neural text-to-speech (TTS) has achieved human-like natural synthe...
We construct a corpus of Japanese a cappella vocal ensembles (jaCappella...
We present a multi-speaker Japanese audiobook text-to-speech (TTS) syste...
This paper proposes a method for selecting training data for text-to-spe...
In this paper, we propose a method for intermediating multiple speakers'...
We propose a training method for spontaneous speech synthesis models tha...
We propose a method for synthesizing environmental sounds from visually ...
We present a comprehensive empirical study for personalized spontaneous ...
We propose a novel training algorithm for a multi-speaker neural text-to...
We propose a method of head-related transfer function (HRTF) interpolati...
A sound field estimation method based on a physics-informed convolutiona...
We present an emotion recognition system for nonverbal vocalizations (NV...
This paper proposes a human-in-the-loop speaker-adaptation method for mu...
We propose an end-to-end empathetic dialogue speech synthesis (DSS) mode...
This paper presents a speaking-rate-controllable HiFi-GAN neural vocoder...
We present the UTokyo-SaruLab mean opinion score (MOS) prediction system...
We present STUDIES, a new speech corpus for developing a voice agent tha...
This paper proposes visual-text to speech (vTTS), a method for synthesiz...
We present a self-supervised speech restoration method without paired sp...
In this paper, we propose a method to generate personalized filled pause...
A spatial active noise control (ANC) method based on the individual kern...
A differentiable digital signal processing (DDSP) autoencoder is a music...
In this paper, we construct a Japanese audiobook speech corpus called "J...
A method of optimizing secondary source placement in sound field synthes...
A method to estimate an acoustic field from discrete microphone measurem...
Incremental text-to-speech (TTS) synthesis generates utterances in small...
A method of binaural rendering from microphone array signals of arbitrar...
Rank-constrained spatial covariance matrix estimation (RCSCME) is a meth...
Independent deeply learned matrix analysis (IDLMA) is one of the state-o...
When we place microphones close to a sound source near other sources in ...
We address the determined audio source separation problem in the time-fr...
Independent deeply learned matrix analysis (IDLMA) is one of the state-o...
Audio source separation is often used as preprocessing of various applic...
Rank-constrained spatial covariance matrix estimation (RCSCME) is a stat...