End-to-end automatic speech recognition (ASR) systems often struggle to
...
Recently, multi-modal vision-language foundation models have gained
sign...
Personalized text-to-image generation has emerged as a powerful and
soug...
To translate well, machine translation (MT) systems and general-purposed...
Transformer and its variants are a powerful class of architectures for
s...
Collaborative Filtering (CF) has been successfully used to help users
di...
Electronic Bill (E-Bill) is a rucial negotiable instrument in the form o...
Automated log analysis is crucial in modern software-intensive systems f...
In this paper, we develop a generic controlled alternate quantum walk mo...
Creating expressive, diverse and high-quality 3D avatars from highly
cus...
Recovering sharp images from dual-pixel (DP) pairs with disparity-depend...
Graph Neural Networks (GNNs) have achieved impressive performance in
col...
Cross-lingual Machine Translation (MT) quality estimation plays a crucia...
Back Translation (BT) is widely used in the field of machine translation...
Pre-trained speech encoders have been central to pushing state-of-the-ar...
Task planning systems have been developed to help robots use human knowl...
The ability to automatically detect and track surgical instruments in
en...
We present a sequence-to-sequence vision-language model whose parameters...
Classical planning systems have shown great advances in utilizing rule-b...
The ability to incrementally learn new classes from limited samples is
c...
Interactive image segmentation enables annotators to efficiently perform...
Although many recent works have investigated generalizable NeRF-based no...
We propose InCA, a lightweight method for transfer learning that
cross-a...
We propose an approach to estimate the number of samples required for a ...
BERTScore is an effective and robust automatic metric for referencebased...
Recent years have witnessed significant growth of face alignment. Though...
Directly training a document-to-document (Doc2Doc) neural machine transl...
There has been an explosion of interest in designing various Knowledge G...
Generalist models, which are capable of performing diverse multi-modal t...
Transformer-based sequential recommenders are very powerful for capturin...
Graph neural networks have achieved significant success in representatio...
In this work, we present a well-optimized GPU implementation of Dilithiu...
Pre-trained speech Transformers have facilitated great success across va...
Pre-trained speech Transformers in speech translation (ST) have facilita...
Automated task planning algorithms have been developed to help robots
co...
Annotating bounding boxes for object detection is expensive, time-consum...
This paper presents a simple yet effective framework MaskCLIP, which
inc...
Logs are one of the most critical data for service management. It contai...
Prompt tuning has become a new paradigm for model tuning and it has
demo...
Most existing works on few-shot object detection (FSOD) focus on a setti...
3D object detection has achieved remarkable progress by taking point clo...
End-to-end speech-to-text translation models are often initialized with
...
Multimodal fine-grained sentiment analysis has recently attracted increa...
Prompt Learning has recently gained great popularity in bridging the gap...
After a developer submits code, corresponding test cases arise to ensure...
Few-shot relation learning refers to infer facts for relations with a li...
However, current autoregressive approaches suffer from high latency. In ...
This paper aims to address the problem of pre-training for person
re-ide...
We consider the problem of omni-supervised object detection, which can u...
Multimodal sentiment analysis has attracted increasing attention and lot...