Transformer-based end-to-end neural speaker diarization (EEND) models ut...
In acoustic scene classification (ASC) task, an acoustic scene consists ...
The remarkable performance of the pre-trained language model (LM) using
...
Attribution methods calculate attributions that visually explain the
pre...
As for the humanoid robots, the internal noise, which is generated by mo...
Distant speech recognition is a challenge, particularly due to the corru...