Abstract

We present NeSF, a method for producing 3D semantic fields from posed RGB images alone. In place of classical 3D representations, our method builds on recent work in neural fields wherein 3D structure is captured by point-wise functions. We leverage this methodology to recover 3D density fields upon which we then train a 3D semantic segmentation model supervised by posed 2D semantic maps. Despite being trained on 2D signals alone, our method is able to generate 3D-consistent semantic maps from novel camera poses and can be queried at arbitrary 3D points. Notably, NeSF is compatible with any method producing a density field. Our empirical analysis demonstrates comparable quality to competitive 2D and 3D semantic segmentation baselines on complex, realistically-rendered scenes and significantly outperforms a comparable neural radiance field-based method on a series of tasks requiring 3D reasoning. Our method is the first to learn semantics by recognizing patterns in the geometry stored within a 3D neural field representation. NeSF is trained using purely 2D signals and requires as few as one labeled image per-scene at train time. No semantic input is required for inference on novel scenes.

NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes

Suhani Vora · Noha Radwan · Klaus Greff · Henning Meyer · Kyle Genova · Mehdi S. M. Sajjadi · Etienne Pot · Andrea Tagliasacchi · Daniel Duckworth

Video

Paper PDF

Abstract