We introduce M3-AUDIODEC, an innovative neural spatial audio codec desig...
Audio-visual learning helps to comprehensively understand the world by f...
Multi-channel speech separation using speaker's directional information ...
Recently, frequency domain all-neural beamforming methods have achieved ...
While current deep learning (DL)-based beamforming techniques have been ...
Acoustic echo cancellation (AEC) plays an important role in the full-dup...
In this paper, we present a novel framework that jointly performs speake...
Recently, End-to-End (E2E) frameworks have achieved remarkable results o...
Conversational bilingual speech encompasses three types of utterances: t...
Automatic speech recognition (ASR) of multi-channel multi-speaker overla...
Acoustic echo cancellation (AEC) is a technique used in full-duplex comm...
We present a neural-network-based fast diffuse room impulse response gen...
To date, mainstream target speech separation (TSS) approaches are formul...
Recently, our proposed recurrent neural network (RNN) based all deep lea...
Recently we proposed an all-deep-learning minimum variance distortionles...
Many purely neural network based speech separation approaches have been ...
This paper proposes a new paradigm for handling far-field multi-speaker ...
Speech enhancement and speech separation are two related tasks, whose pu...
Purely neural network (NN) based speech separation and enhancement metho...
Target speech separation refers to extracting a target speaker's voice f...
Hand-crafted spatial features (e.g., inter-channel phase difference, IPD...
Speaker diarization, which is to find the speech segments of specific sp...
Automatic recognition of overlapped speech remains a highly challenging ...
Speech separation refers to extracting each individual speech source in ...
Background noise, interfering speech and room reverberation frequently d...
Speech separation has been studied widely for single-channel close-talk ...
The end-to-end approach for single-channel speech separation has been st...
The cloud-based speech recognition/API provides developers or enterprise...
This paper summarizes several follow-up contributions for improving our ...
Audio-visual multi-modal modeling has been demonstrated to be effective ...
A new type of End-to-End system for text-dependent speaker verification ...