Learning to Reconstruct Shapes from Unseen Classes

We study generalizable single-image 3D reconstruction: recovering the 3D shape of an object whose class was not seen during training. Here we show a table and a bed reconstructed from single RGB images by our model trained on cars, chairs, and airplanes.

Single-image 3D reconstructions of objects within and beyond the training classes. Each row shows, from left to right: the input image, two views from the best-performing baseline for each testing class, two views of GenRe predictions, and the ground truth. All models are trained on the same dataset of cars, chairs, and airplanes.

Single-view completion of non-rigid shapes from depth maps by our model trained on cars, chairs, and airplanes.

Abstract

From a single image, humans are able to perceive the full 3D shape of an object by exploiting learned shape priors from everyday life. Contemporary single-image 3D reconstruction algorithms aim to solve this task in a similar fashion, but often end up with priors that are highly biased by training classes. Here we present an algorithm, Generalizable Reconstruction (GenRe), designed to capture more generic, class-agnostic shape priors. We achieve this with an inference network and training procedure that combine 2.5D representations of visible surfaces (depth and silhouette), spherical shape representations of both visible and non-visible surfaces, and 3D voxel-based representations, in a principled manner that exploits the causal structure of how 3D shapes give rise to 2D images. Experiments demonstrate that GenRe performs well on single-view shape reconstruction, and generalizes to diverse novel objects from categories not seen during training.
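To make the representations named in the abstract more concrete, the following is a minimal sketch of how a 2.5D observation (depth plus silhouette) could be lifted to 3D points and binned into a partial spherical map. It is not the GenRe implementation: the orthographic camera, the `extent` and `cam_z` parameters, the function names, and the choice to store the radius from the object center in each spherical bin are all assumptions made for illustration. Bins left at zero correspond to surfaces not visible in the input view, which is the part a learned model would have to fill in.

```python
import math

def depth_to_points(depth, silhouette, extent=0.5, cam_z=1.0):
    """Back-project a depth map into object-centered 3D points.

    Assumes (hypothetically) an orthographic camera on the plane z = cam_z
    looking toward -z, with the image spanning [-extent, extent] in x and y.
    Requires an image of at least 2 x 2 pixels.
    """
    h, w = len(depth), len(depth[0])
    points = []
    for i in range(h):
        for j in range(w):
            if not silhouette[i][j]:
                continue  # background pixel: no surface observed
            x = -extent + 2 * extent * j / (w - 1)
            y = extent - 2 * extent * i / (h - 1)  # row 0 is the top of the image
            z = cam_z - depth[i][j]
            points.append((x, y, z))
    return points

def points_to_spherical_map(points, n_theta=32, n_phi=64):
    """Bin points by direction and keep the largest radius seen per bin.

    Returns an n_theta x n_phi grid; 0.0 marks directions with no observation.
    """
    sph = [[0.0] * n_phi for _ in range(n_theta)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # direction is undefined at the origin
        theta = math.acos(max(-1.0, min(1.0, y / r)))  # polar angle from +y
        phi = math.atan2(x, z)                         # azimuth in [-pi, pi]
        ti = min(int(theta / math.pi * n_theta), n_theta - 1)
        pj = min(int((phi + math.pi) / (2 * math.pi) * n_phi), n_phi - 1)
        sph[ti][pj] = max(sph[ti][pj], r)
    return sph
```

A call such as `points_to_spherical_map(depth_to_points(depth, silhouette))` yields a spherical map that is populated only on the camera-facing side of the object, which illustrates why a separate step for the non-visible surfaces is needed before producing a full 3D shape.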