In this work, we present CleanUNet 2, a speech denoising model that comb...
In this paper, we investigate the in-context learning ability of
retriev...
In this work, we introduce GraPhSyM, a Graph Attention Network (GATv2) m...
Large decoder-only language models (LMs) can be largely improved in term...
Pretrained large language models have become indispensable for solving
v...
Augmenting pretrained language models (LMs) with a vision encoder (e.g.,...
We work to create a multilingual speech synthesis system which can gener...
Large-scale diffusion-based generative models have led to breakthroughs ...
Parameter efficient learning methods (PERMs) have recently gained signif...
Closed-book question answering (QA) requires a model to directly answer ...
Despite recent progress in generative adversarial network(GAN)-based
voc...
Pretrained language models (LMs) are susceptible to generate text with
n...
In this work, we present a reinforcement learning (RL) based approach to...
Training large transformer models is one of the most important computati...
Existing knowledge-grounded dialogue systems typically use finetuned ver...
Despite recent advances in generative modeling for text-to-speech synthe...
In this work, we present CleanUNet, a causal speech denoising model on t...
Pre-trained language models (LMs) are shown to easily generate toxic
lan...
Video compression is a central feature of the modern internet powering
t...
Pretrained general-purpose language models can achieve state-of-the-art
...
Detecting social bias in text is challenging due to nuance, subjectivity...
Recent advances in GPU accelerated global and detail placement have redu...
Speech-to-text alignment is a critical component of neural textto-speech...
Transformers have achieved success in both language and vision domains.
...
Large language models have led to state-of-the-art accuracies across a r...
Recent work on training neural retrievers for open-domain question answe...
State-of-the-art conversational agents have advanced significantly in
co...
Existing pre-trained large language models have shown unparalleled gener...
In this work, we propose DiffWave, a versatile Diffusion probabilistic m...
Conventional CNNs for texture synthesis consist of a sequence of
(de)-co...
Multi-scale inference is commonly used to improve the results of semanti...
Non-goal oriented dialog agents (i.e. chatbots) aim to produce varying a...
generative network for text-to-speech synthesis with control over speec...
Question and answer generation is a data augmentation method that aims t...
Unsupervised landmark learning is the task of learning semantic keypoint...
We propose a novel approach for image segmentation that combines Neural
...
This work investigates the use of natural language to enable zero-shot m...
Video-to-video synthesis (vid2vid) aims at converting an input semantic
...
Mellotron is a multispeaker voice synthesis model based on Tacotron 2 GS...
We present GQSAT, a branching heuristic in a Boolean SAT solver trained ...
Recent work in unsupervised language modeling demonstrates that training...
Recent work in unsupervised language modeling demonstrates that training...
Prediction and interpolation for long-range video data involves the comp...
Learning to synthesize high frame rate videos via interpolation requires...
Machine learning (ML) techniques are enjoying rapidly increasing adoptio...
Most scene graph generators use a two-stage pipeline to detect visual
re...
Semantic segmentation requires large amounts of pixel-wise annotations t...
Multi-emotion sentiment classification is a natural language processing ...
In this paper, we present a simple yet effective padding scheme that can...
We propose an efficient and interpretable scene graph generator. We cons...