3D Open-vocabulary Segmentation with Foundation Models

by Kunhao Liu, et al.

Open-vocabulary segmentation of 3D scenes is a fundamental function of human perception and thus a crucial objective in computer vision research. However, this task is heavily impeded by the lack of large-scale, diverse 3D open-vocabulary segmentation datasets for training robust and generalizable models. Distilling knowledge from pre-trained 2D open-vocabulary segmentation models helps, but it significantly compromises the open-vocabulary capability, because the 2D models are mostly fine-tuned on closed-vocabulary datasets. We tackle these challenges in 3D open-vocabulary segmentation by exploiting the open-vocabulary multimodal knowledge and object reasoning capability of the pre-trained foundation models CLIP and DINO, without any fine-tuning. Specifically, we distill open-vocabulary visual and textual knowledge from CLIP into a neural radiance field (NeRF), which effectively lifts 2D features into view-consistent 3D segmentation. Furthermore, we introduce the Relevancy-Distribution Alignment loss and the Feature-Distribution Alignment loss to, respectively, mitigate the ambiguities of CLIP features and distill precise object boundaries from DINO features, eliminating the need for segmentation annotations during training. Extensive experiments show that our method even outperforms fully supervised models trained with segmentation annotations, suggesting that 3D open-vocabulary segmentation can be effectively learned from 2D images and text-image pairs.
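Open-vocabulary segmentation with CLIP-aligned features generally comes down to comparing each dense feature vector (for example, one rendered from a NeRF for a pixel or 3D point) against CLIP text embeddings of the class prompts. The sketch below illustrates only that generic step under stated assumptions. It does not reproduce the paper's NeRF distillation, its Relevancy-Distribution Alignment loss, or its Feature-Distribution Alignment loss. The function name `class_distribution`, the `temperature=0.01` default, and the toy 3-dimensional vectors are all hypothetical stand-ins; real CLIP embeddings are much higher-dimensional.

```python
import numpy as np

def class_distribution(features: np.ndarray, text_embeds: np.ndarray,
                       temperature: float = 0.01) -> np.ndarray:
    """Hypothetical helper: per-feature probability distribution over text queries.

    features:    (N, D) dense features assumed to live in CLIP's joint embedding space
    text_embeds: (C, D) text embeddings, one per class prompt (assumed CLIP outputs)
    returns:     (N, C) softmax distribution over the C queries for each feature
    """
    # L2-normalise both sides so the dot product equals cosine similarity.
    f = features / np.linalg.norm(features, axis=-1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    # Scale similarities by an (assumed) temperature before the softmax.
    logits = (f @ t.T) / temperature
    logits -= logits.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

# Toy usage: two feature vectors and two text "queries" (made-up numbers).
features = np.array([[1.0, 0.1, 0.0],
                     [0.0, 0.9, 0.2]])
text_embeds = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])
probs = class_distribution(features, text_embeds)
labels = probs.argmax(axis=-1)  # open-vocabulary label = most similar query
print(labels)
```

Because the class set is just a list of text embeddings, new categories can be queried at inference time by embedding new prompts, with no retraining of the feature field; this is what makes the segmentation "open-vocabulary" in this simplified setting.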




