Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane

Han Yan*, Yang Li, Zhennan Wu*, Shenzhou Chen, Weixuan Sun, Taizhang Shang, Weizhe Liu, Tian Chen, Xiaqiang Dai, Chao Ma, Hongdong Li, Pan Ji

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

We present Frankenstein, a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass. Unlike existing methods that output a single, unified 3D shape, Frankenstein simultaneously generates multiple separated shapes, each corresponding to a semantically meaningful part. The 3D scene information is encoded in a single tri-plane tensor, from which multiple Signed Distance Function (SDF) fields can be decoded to represent the compositional shapes. During training, an auto-encoder compresses tri-planes into a latent space, and a denoising diffusion process is then employed to approximate the distribution of the compositional scenes. Frankenstein demonstrates promising results in generating room interiors as well as human avatars with automatically separated parts. The generated scenes facilitate many downstream applications, such as part-wise re-texturing, object rearrangement within a room, and avatar cloth re-targeting.
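
To make the "one tri-plane, several SDF fields" idea concrete, here is a minimal NumPy sketch of how a query point could be decoded into K signed distances at once. It is an illustration only, not the authors' implementation: the abstract does not specify the plane resolution, how the three plane features are combined, or the decoder architecture. The function names (`sample_plane`, `decode_sdfs`), the summation of plane features, the two-layer MLP, and the random weights are all assumptions made for this example.

```python
import numpy as np

def sample_plane(plane, u, v):
    """Bilinearly sample a (C, R, R) feature plane at coordinates u, v in [-1, 1].

    Returns an array of shape (C, N) for N query coordinates.
    """
    _, R, _ = plane.shape
    # Map [-1, 1] to continuous pixel coordinates in [0, R - 1].
    x = (np.clip(u, -1.0, 1.0) + 1.0) * 0.5 * (R - 1)
    y = (np.clip(v, -1.0, 1.0) + 1.0) * 0.5 * (R - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, R - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, R - 2)
    wx, wy = x - x0, y - y0
    # Blend the four neighbouring texels.
    top = plane[:, y0, x0] * (1 - wx) + plane[:, y0, x0 + 1] * wx
    bottom = plane[:, y0 + 1, x0] * (1 - wx) + plane[:, y0 + 1, x0 + 1] * wx
    return top * (1 - wy) + bottom * wy

def decode_sdfs(triplane, points, w1, b1, w2, b2):
    """Decode K SDF values per point from one tri-plane.

    triplane: (3, C, R, R) features for the XY, XZ and YZ planes.
    points:   (N, 3) query positions in [-1, 1]^3.
    Returns:  (N, K) array, one signed distance per semantic part.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Assumption: plane features are aggregated by summation.
    feats = (sample_plane(triplane[0], x, y)
             + sample_plane(triplane[1], x, z)
             + sample_plane(triplane[2], y, z))      # (C, N)
    hidden = np.maximum(feats.T @ w1 + b1, 0.0)      # (N, H), ReLU
    return hidden @ w2 + b2                          # (N, K)

# Hypothetical sizes: 8 channels, 16x16 planes, 32 hidden units, 3 parts.
rng = np.random.default_rng(0)
C, R, H, K = 8, 16, 32, 3
triplane = rng.normal(size=(3, C, R, R))
w1, b1 = rng.normal(size=(C, H)), np.zeros(H)
w2, b2 = rng.normal(size=(H, K)), np.zeros(K)
points = rng.uniform(-1.0, 1.0, size=(5, 3))

sdfs = decode_sdfs(triplane, points, w1, b1, w2, b2)
print(sdfs.shape)  # (5, 3): one SDF value per point and per part
```

Because every part shares the same tri-plane and differs only in its output channel, a surface for each part could be extracted independently (for example by running marching cubes on one column of the output at a time), which is what would make part-wise editing straightforward.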

Original language: English
Title of host publication: Proceedings - SIGGRAPH Asia 2024 Conference Papers, SA 2024
Editors: Stephen N. Spencer
Publisher: Association for Computing Machinery (ACM)
ISBN (Electronic): 9798400711312
DOIs
Publication status: Published - 3 Dec 2024
Event: 2024 SIGGRAPH Asia 2024 Conference Papers, SA 2024 - Tokyo, Japan
Duration: 3 Dec 2024 to 6 Dec 2024

Publication series

Name: Proceedings - SIGGRAPH Asia 2024 Conference Papers, SA 2024

Conference

Conference: 2024 SIGGRAPH Asia 2024 Conference Papers, SA 2024
Country/Territory: Japan
City: Tokyo
Period: 3/12/24 to 6/12/24
