Abstract
Given a single image of a 3D object, this paper proposes a novel method (named ConsistNet) that generates multiple images of the same object, as if captured from different viewpoints, while effectively enforcing 3D (multi-view) consistency among the generated images. Central to our method is a lightweight multi-view consistency block that enables information exchange across multiple single-view diffusion processes based on the underlying multi-view geometry principles. ConsistNet is an extension of the standard latent diffusion model and consists of two submodules: (a) a view aggregation module that unprojects multi-view features into global 3D volumes and infers consistency, and (b) a ray aggregation module that samples and aggregates 3D-consistent features back into each view to enforce consistency. Our approach departs from previous methods in multi-view image generation in that it can be easily dropped into pretrained LDMs without requiring explicit pixel correspondences or depth prediction. Experiments show that our method effectively learns 3D consistency over a frozen Zero123-XL backbone and can generate 16 surrounding views of an object within 11 seconds on a single A100 GPU. Our code will be made available at https://github.com/JiayuYANG/ConsistNet.
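The two submodules described above can be illustrated with a toy sketch. The snippet below is a hypothetical, heavily simplified illustration (all names are ours, not the paper's): per-view features are scattered into a shared voxel grid and fused by averaging (view aggregation), then gathered back along each view's pixel-to-voxel correspondences (ray aggregation). The real ConsistNet operates on latent-diffusion features with learned layers and proper camera geometry; here the pixel-to-voxel mapping is simply given as an index array.

```python
import numpy as np

def aggregate_views(view_feats, view_to_voxel, grid_size, feat_dim):
    """View aggregation (toy): scatter each view's per-pixel features
    into a shared 3D volume and fuse by mean."""
    volume = np.zeros((grid_size, feat_dim))
    counts = np.zeros(grid_size)
    for feats, vox_idx in zip(view_feats, view_to_voxel):
        for f, v in zip(feats, vox_idx):
            volume[v] += f
            counts[v] += 1
    counts[counts == 0] = 1.0  # avoid division by zero for empty voxels
    return volume / counts[:, None]

def aggregate_rays(volume, view_to_voxel):
    """Ray aggregation (toy): sample the fused, 3D-consistent features
    back into each view along its pixel rays."""
    return [volume[vox_idx] for vox_idx in view_to_voxel]

# Toy setup: 2 views, 3 pixels each, 4 voxels, 2-dim features.
view_feats = [np.array([[1., 0.], [0., 1.], [2., 2.]]),
              np.array([[3., 0.], [0., 3.], [2., 2.]])]
# Hypothetical pixel -> voxel index maps (in the real method these come
# from unprojecting pixels with known camera poses).
view_to_voxel = [np.array([0, 1, 2]), np.array([0, 1, 3])]

vol = aggregate_views(view_feats, view_to_voxel, grid_size=4, feat_dim=2)
per_view = aggregate_rays(vol, view_to_voxel)
# Pixels from different views that land in the same voxel now carry
# identical (i.e. multi-view consistent) features.
assert np.allclose(per_view[0][0], per_view[1][0])
```

The design point this sketch mirrors is that consistency is enforced purely by exchanging features through a shared 3D volume, so the block can wrap around frozen per-view diffusion processes without explicit pixel correspondences or depth prediction.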
| Original language | English |
|---|---|
| Pages (from-to) | 7079-7088 |
| Number of pages | 10 |
| Journal | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
| Publication status | Published - 2024 |
| Event | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States (16 Jun 2024 - 22 Jun 2024) |
Title: ConsistNet: Enforcing 3D Consistency for Multi-View Images Diffusion