Large Language Models (LLMs) have revolutionized natural language proces...
Existing propositions often rely on logical constants for classification...
Large language models (LLMs) have been successfully adapted for interact...
In this work, we address the challenge of encoding speech captured by a
...
We introduce M3-AUDIODEC, an innovative neural spatial audio codec desig...
Multi-document summarization aims to obtain core information from a
coll...
Mapping two modalities, speech and text, into a shared representation sp...
Automatic speech recognition (ASR) based on transducers is widely used. ...
We consider the problem of eliciting compositional generalization
capabi...
Although large-scale pre-trained language models (PTLMs) are shown to en...
Reasoning in mathematical domains remains a significant challenge for
re...
Recently developed large language models have achieved remarkable succes...
Various applications of voice synthesis have been developed independentl...
We consider the problem of Open-world Information Extraction (Open-world...
Traditional sentence embedding models encode sentences into vector
repre...
Researchers have proposed various information extraction (IE) techniques...
Although large language models demonstrate remarkable question-answering...
Multi-channel speech separation using speaker's directional information ...
Knowledge-aided dialogue response generation aims at augmenting chatbots...
Current self-training methods such as standard self-training, co-trainin...
Expressive text-to-speech (TTS) aims to synthesize different speaking st...
Humans can listen to a target speaker even in challenging acoustic condi...
Aspect or query-based summarization has recently caught more attention, ...
Recently, frequency domain all-neural beamforming methods have achieved
...
Knowledge base completion (KBC) aims to predict the missing links in
kno...
While current deep learning (DL)-based beamforming techniques have been
...
Event extraction (EE) is the task of identifying interested event mentio...
Fully-parametric language models generally require a huge number of mode...
Text segmentation is important for signaling a document's structure. Wit...
In this paper, we propose a comprehensive benchmark to investigate model...
Abstractive summarization models typically learn to capture the salient
...
Large-scale pretrained language models have made significant advances in...
Sequence-to-Sequence (seq2seq) tasks transcribe the input sequence to a
...
Speaker identification, determining which character said each utterance ...
Although large language models have achieved impressive zero-shot abilit...
Self-supervised learning (SSL) has drawn an increased attention in the f...
Generating sound effects that humans want is an important topic. However...
Utterance rewriting aims to recover coreferences and omitted information...
In this paper, we propose a novel unsupervised text-to-speech (UTTS)
fra...
Despite the rapid progress in automatic speech recognition (ASR) researc...
Acoustic echo cancellation (AEC) plays an important role in the full-dup...
Disentangling content and speaking style information is essential for
ze...
Approaches for the stance classification task, an important task for
und...
Recently, Conformer based CTC/AED model has become a mainstream architec...
Traditional studies on voice conversion (VC) have made progress with par...
In automatic speech recognition (ASR) research, discriminative criteria ...
Podcasts have recently shown a rapid rise in popularity. Summarization o...
Comprehending a dialogue requires a model to capture diverse kinds of ke...
We consider the problem of pretraining a two-stage open-domain question
...
Just Noticeable Difference (JND) has many applications in multimedia sig...