Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention

by Fangfu Liu, et al.

In this paper, we aim to learn a semantic radiance field from multiple scenes that is accurate, efficient, and generalizable. While most existing NeRFs target neural scene rendering, image synthesis, and multi-view reconstruction, a few attempts, such as Semantic-NeRF, explore learning high-level semantic understanding with the NeRF structure. However, Semantic-NeRF simultaneously learns color and semantic labels from a single ray with multiple heads, and a single ray fails to provide rich semantic information. As a result, Semantic-NeRF relies on positional encoding and needs to train one specific model for each scene. To address this, we propose Semantic Ray (S-Ray), which fully exploits semantic information along the ray direction from its multi-view reprojections. Because directly performing dense attention over multi-view reprojected rays would incur a heavy computational cost, we design a Cross-Reprojection Attention module with consecutive intra-view radial and cross-view sparse attentions, which decomposes contextual information along reprojected rays and across multiple views and then collects dense connections by stacking the modules. Experiments show that S-Ray is able to learn from multiple scenes and that it generalizes well to unseen scenes.
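The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `(m, R, d)` feature layout (m views, R samples along each reprojected ray, d channels), the residual connections, and the absence of learned query/key/value projections are all assumptions made for illustration. It shows only the general idea of attending first along each reprojected ray and then across views, and why that involves fewer pairwise interactions than dense attention over all m·R features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Scaled dot-product self-attention over the second-to-last axis.
    # x: (..., n, d). No learned projections (an assumption for brevity).
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x

def cross_reprojection_attention(feats):
    # feats: (m, R, d) = (views, samples per reprojected ray, channels).
    # Hypothetical layout; the paper's actual tensors may differ.

    # Intra-view radial attention: within each view, every sample
    # attends to the other samples on the same reprojected ray.
    radial = feats + self_attention(feats)            # (m, R, d)

    # Cross-view attention: for each sample index, attend across views.
    per_sample = np.swapaxes(radial, 0, 1)            # (R, m, d)
    cross = per_sample + self_attention(per_sample)   # (R, m, d)
    return np.swapaxes(cross, 0, 1)                   # (m, R, d)

def attention_pairs(m, R):
    # Number of query-key pairs for dense vs. decomposed attention.
    dense = (m * R) ** 2
    decomposed = m * R * R + R * m * m
    return dense, decomposed

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 64, 16))
out = cross_reprojection_attention(feats)
print(out.shape)
print(attention_pairs(8, 64))
```

With 8 views and 64 samples per ray, dense attention scores 262,144 query-key pairs, while the two-stage version scores 36,864. After one radial pass followed by one cross-view pass, each output feature has already received information from every input feature indirectly, which is in the spirit of the abstract's remark that stacking the modules recovers dense connections.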




