We propose a method to fuse frozen text-only large language models (LLMs...
We propose an efficient method to ground pretrained text-only language m...
We present the Pathways Autoregressive Text-to-Image (Parti) model, whic...
We study the problem of synthesizing immersive 3D indoor scenes from one...
Pretraining language models with next-token prediction on massive text
c...
People navigating in unfamiliar buildings take advantage of myriad visua...
Learning to predict the long-term future of video frames is notoriously
...
The output of text-to-image synthesis systems should be coherent, clear,...
Localized Narratives is a dataset with detailed natural language descrip...
Fully-automatic execution is the ultimate goal for many Computer Vision
...
Semantic boundary and edge detection aims at simultaneously detecting ob...