Sound event localization and detection (SELD) systems estimate direction...
This paper tackles the challenging task of evaluating socially situated ...
Recent approaches to empathetic response generation try to incorporate c...
Diffusion-based speech enhancement (SE) has been investigated recently, ...
Time-domain speech enhancement (SE) has recently been intensively invest...
Current Spoken Dialogue Systems (SDSs) often serve as passive listeners ...
As the aging of society continues to accelerate, Alzheimer's Disease (AD...
Connectionist temporal classification (CTC)-based models are attractive...
Connectionist temporal classification (CTC)-based models are attractive...
Conventional automatic speech recognition systems do not produce punctua...
In automatic speech recognition (ASR) rescoring, the hypothesis with the...
This article describes an efficient end-to-end speech translation (E2E-S...
In this work, we propose novel decoding algorithms to enable streaming a...
While attention-based encoder-decoder (AED) models have been successfull...
Over the past year, research in various domains, including Natural Langu...
Following the success of spoken dialogue systems (SDS) in smartphone ass...
A conventional approach to improving the performance of end-to-end speec...
This article describes an efficient training method for online streaming...
Fast inference speed is an important goal towards real-world deployment ...
In open-domain dialogue response generation, a dialogue context can be c...
Attention-based sequence-to-sequence (seq2seq) models have achieved prom...
We investigate monotonic multihead attention (MMA) by extending hard m...
It is important to transcribe and archive speech data of endangered lang...
Monotonic chunkwise attention (MoChA) has been studied for the online st...
Spoken language understanding, which extracts intents and/or semantic co...
An automatic dialogue response evaluator has been proposed as an alternativ...
Ainu is an unwritten language that has been spoken by Ainu people who ar...
In this paper, we propose a simple yet effective framework for multiling...
Acoustic-to-word (A2W) end-to-end automatic speech recognition (ASR) sys...
In dialog studies, we often encode a dialog using a hierarchical encoder...
Various encoder-decoder models have been applied to response generation ...
This paper describes multichannel speech enhancement for improving autom...
This work explores better adaptation methods to low-resource languages u...
This paper presents a statistical method of single-channel speech enhanc...
Detection of engagement during a conversation is an important function o...