Self-supervised molecular representation learning is critical for molecu...
We present Emu, a Transformer-based multimodal foundation model, which c...
Offline-to-online reinforcement learning (RL), by combining the benefits...
Label-free model evaluation aims to predict model performance on...
For semantic segmentation in urban scene understanding, RGB cameras alon...
Effective BEV object detection on infrastructure can greatly improve tra...
Traditional image detail enhancement is local filter-based or global fil...
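For context, a classic local-filter-based instance of this family is unsharp masking: smooth the image, treat the residual as a detail layer, and amplify it. The sketch below is a generic baseline for illustration only; the Gaussian sigma and gain values are arbitrary assumptions, not parameters from this work.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def unsharp_mask(image, sigma=2.0, gain=1.5):
        """Local filter-based detail enhancement: base = smoothed image,
        detail = image - base, output = base + gain * detail."""
        img = image.astype(np.float32)
        base = gaussian_filter(img, sigma=sigma)
        detail = img - base
        return np.clip(base + gain * detail, 0, 255).astype(np.uint8)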
The reward function is essential in reinforcement learning (RL), serving as ...
Patients care about what their teeth will look like after the orthodonti...
In offline reinforcement learning (RL), one detrimental issue to policy ...
Unsupervised domain adaptation (UDA) aims to transfer the knowledge lear...
Most existing video-and-language (VidL) research focuses on a single dat...
With large-scale pre-training, the past two years have witnessed signifi...
Large-scale transformer-based pre-training has recently revolutionized v...
Vision-and-language pre-training has achieved impressive success in lear...
This work concerns video-language pre-training and representation learni...
The Lottery Ticket Hypothesis has attracted keen attention to identifying sparse tr...
Recent advances in computer vision take advantage of adversarial data au...
In this paper, we construct a bijective mapping between a biquadratic sp...
Multimodal pre-training has propelled great advancement in vision-and-la...
Training generative adversarial networks (GANs) with limited data genera...
The canonical approach to video-and-language learning (e.g., video quest...
Deep, heavily overparameterized language models such as BERT, XLNet and ...
The primary goal of knowledge distillation (KD) is to encapsulate the in...
Large-scale pre-trained multimodal transformers, such as ViLBERT and UNI...
In this paper, we propose Cross-Thought, a novel approach to pre-trainin...
Pre-trained neural abstractive summarization systems have dominated extr...
Large-scale language models such as BERT have achieved state-of-the-art ...
Adversarial training is so far the most effective strategy in defending ...
Existing language model compression methods mostly use a simple L2 loss ...
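To make the baseline concrete, the following is a minimal sketch of the kind of L2 (mean-squared-error) objective typically placed between teacher and student hidden states; the tensor shapes and the linear projection are illustrative assumptions, not details from this paper.

    import torch
    import torch.nn as nn

    def l2_distillation_loss(student_hidden, teacher_hidden, proj=None):
        """Plain L2 (MSE) loss between student and teacher hidden states of
        shape (batch, seq_len, dim); `proj` maps the student dimension to the
        teacher dimension when the two differ."""
        if proj is not None:
            student_hidden = proj(student_hidden)
        return nn.functional.mse_loss(student_hidden, teacher_hidden)

    # Illustrative usage with random tensors standing in for real hidden states.
    student = torch.randn(8, 128, 312)
    teacher = torch.randn(8, 128, 768)
    loss = l2_distillation_loss(student, teacher, proj=nn.Linear(312, 768))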
The Transformer has become ubiquitous in the deep learning field. One of the...
Existing approaches to real-time question answering (RTQA) rely on learn...
Large-scale cross-lingual language models (LMs), such as mBERT, Unicoder ...
Cross-domain alignment between two sets of entities (e.g., objects in an...
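As background, one common way to produce a soft alignment between two entity sets is entropic optimal transport solved with Sinkhorn iterations; the sketch below is a generic illustration of that idea (the cost matrix, eps, and iteration count are placeholders) and not necessarily the formulation used in this work.

    import numpy as np

    def sinkhorn_alignment(cost, eps=0.1, n_iters=100):
        """Entropic optimal transport between two uniform distributions.
        cost[i, j] is the mismatch between entity i in one set and entity j
        in the other; returns a soft alignment (transport plan) of the same shape."""
        n, m = cost.shape
        a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
        K = np.exp(-cost / eps)
        u, v = np.ones(n), np.ones(m)
        for _ in range(n_iters):
            u = a / (K @ v)
            v = b / (K.T @ u)
        return np.diag(u) @ K @ np.diag(v)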
Adaptive gradient methods such as RMSProp and Adam use exponential movin...
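To spell out the mechanism this sentence refers to: such methods keep exponential moving averages of the gradient and/or its square and scale each update by them. The following NumPy sketch of a single Adam-style step uses commonly cited default hyperparameters purely for illustration.

    import numpy as np

    def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        """One Adam update: exponential moving averages of the gradient (m)
        and squared gradient (v), bias-corrected, then a scaled parameter step."""
        m = beta1 * m + (1 - beta1) * grad          # first-moment EMA
        v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment EMA
        m_hat = m / (1 - beta1 ** t)                # bias correction
        v_hat = v / (1 - beta2 ** t)
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
        return param, m, v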
We present VILLA, the first known effort on large-scale adversarial trai...
Recent Transformer-based large-scale pre-trained models have revolutioni...
We present HERO, a Hierarchical EncodeR for Omni-representation learning...
We introduce a new task, Contextual Text Style Transfer - translating a ...
Natural language often exhibits inherent hierarchical structure ingraine...
We propose a new task towards more practical applications of image gener...
We introduce a new task, Video-and-Language Inference, for joint multimo...
Unsupervised domain adaptation (UDA) has achieved unprecedented success ...
The Transformer has been successfully applied to many natural language proce...
Large-scale pre-trained language models, such as BERT, have recently achie...
In this paper, we present Hierarchical Graph Network (HGN) for multi-hop...
We present a large, tunable neural conversational response generation mo...
Recently, BERT has been adopted in state-of-the-art text summarization mo...
There are two main lines of research on visual reasoning: neural module ...
Adversarial training, which minimizes the maximal risk for label-preserv...
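Written out, the min-max objective referred to here takes the standard form below, where \theta denotes the model parameters, f_\theta the model, \mathcal{L} the loss, and \epsilon the norm bound on the label-preserving perturbation \delta (notation is generic, not taken from this paper):

    \min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\|\delta\|\le\epsilon}\,\mathcal{L}\big(f_{\theta}(x+\delta),\, y\big)\Big]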