MCMat: Multiview-Consistent and Physically Accurate PBR Material Generation

Shenhao Zhu1,2*, Lingteng Qiu2*, Xiaodong Gu2*, Zhengyi Zhao2*,
Chao Xu2, Yuxiao He1, Zhe Li2,3, Xiaoguang Han5,6, Yao Yao1, Xun Cao1, Siyu Zhu4, Weihao Yuan2,
Zilong Dong2+, Hao Zhu1+

1 Nanjing University  
2 Alibaba Group   3 Huazhong University of Science and Technology  
4 Fudan University   5 SSE, CUHKSZ   6 FNii, CUHKSZ  
* Equal Contribution   + Corresponding Author

Abstract

Existing 2D methods use UNet-based diffusion models to generate multi-view PBR maps but struggle with multi-view inconsistency, while some 3D methods generate UV maps directly and generalize poorly because 3D data is limited. To address these problems, we propose a two-stage approach consisting of multi-view generation and UV material refinement. In the generation stage, we adopt a Diffusion Transformer (DiT) model to generate PBR materials, in which both the specially designed multi-branch DiT and the reference-based DiT blocks use a global attention mechanism to promote feature interaction and fusion across views, thereby improving multi-view consistency. In addition, we adopt a PBR-based diffusion loss to ensure that the generated materials align with physical principles. In the refinement stage, we propose a material-refined DiT that performs inpainting in empty areas and enhances details in UV space. In addition to the normal condition, this refinement also takes the material map from the generation stage as a condition, which reduces the learning difficulty and improves generalization. Extensive experiments show that our method achieves state-of-the-art performance in texturing 3D objects with PBR materials and provides significant advantages for graphics relighting applications.
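To make the idea of global attention across views concrete, the following is a minimal sketch in plain Python. It is not the MCMat architecture: it assumes single-head attention with identity query/key/value projections, and the function names (`attention`, `per_view_attention`, `global_attention`) are hypothetical. It only illustrates the difference between attending within each view and attending over the tokens of all views at once.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(tokens):
    # Single-head self-attention with identity Q/K/V projections
    # (a simplification made for illustration only).
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def per_view_attention(views):
    # Each view attends only to its own tokens: no cross-view interaction.
    return [attention(view) for view in views]

def global_attention(views):
    # Concatenate the tokens of all views into one sequence, attend jointly,
    # then split the result back into views. Assumes equal token counts per view.
    flat = [tok for view in views for tok in view]
    mixed = attention(flat)
    n = len(views[0])
    return [mixed[i * n:(i + 1) * n] for i in range(len(views))]

# Two views with one 2-D token each (toy data).
views = [[[1.0, 0.0]], [[0.0, 1.0]]]
print(per_view_attention(views))  # each view unchanged by the other
print(global_attention(views))    # each view now mixes in the other view's token
```

With per-view attention, changing one view cannot affect the features of another; with the flattened sequence, every token can draw on every view, which is the property the abstract credits for improved multi-view consistency.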

Multi-View Material Generation

Golden Ray-Ban Aviator Sunglasses

Imarf logo representing a Jewish festival

A yellow shield with a handle and brown accents

Colt M191

A red wooden TV stand with two drawers and a shelf

A purple and yellow Power Ranger in a robot suit

Generated Textured Meshes

Relighting Results

Golden Ray-Ban Aviator Sunglasses

Tiki mask model with colorful feathers on it

Colt M191

A purple and yellow Power Ranger in a robot suit

Steampunk Steam Engine Clock

Halo Master Chief helmet model by Daniel Taylor on DeviantArt

Yellow and black DeWalt cordless drill

NASA Voyager spacecraft

An ornate, floral tiara with intricate details

A military tactical vest

A snake head with red eyes, silver and gold armor

A marble fountain and tombstone with a pillar

Video

Methodology

Pipeline Diagram

Our method consists of a generation stage and a refinement stage. In the generation stage, the Multi-View Generation DiT (MG-DiT) model takes surface normal information from the 3D model as the geometric condition, together with reference images and textual descriptions, to generate multi-view-consistent PBR material properties. In the refinement stage, the Material Refinement DiT (MR-DiT) model performs inpainting in void regions and enhances details in UV space, ultimately producing high-quality 2K-resolution textures with precise material information.
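As an illustration of what "void regions" in UV space could mean, here is a minimal sketch in plain Python. The page does not describe how these regions are identified, so this assumes that a texel is void when no generated view covers it after back-projection to the UV map. The names `void_mask` and `void_fraction` and the toy coverage grids are hypothetical.

```python
def void_mask(view_coverage):
    # view_coverage: list of H x W boolean grids, one per view, where True
    # means that view contributes a value to the UV texel (assumed input).
    # Returns an H x W grid that is True where no view covers the texel.
    height = len(view_coverage[0])
    width = len(view_coverage[0][0])
    return [
        [not any(cov[y][x] for cov in view_coverage) for x in range(width)]
        for y in range(height)
    ]

def void_fraction(mask):
    # Share of texels that would need to be filled by inpainting.
    total = sum(len(row) for row in mask)
    return sum(cell for row in mask for cell in row) / total

# Toy 2 x 2 UV map seen by two views.
view_a = [[True, False], [False, False]]
view_b = [[False, True], [False, False]]
mask = void_mask([view_a, view_b])
print(mask)
print(void_fraction(mask))
```

A mask of this kind marks the texels that a refinement model would have to inpaint, while covered texels can be initialized from the material maps produced in the generation stage.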

BibTeX

@misc{zhu2024mcmat,
      title={MCMat: Multiview-Consistent and Physically Accurate PBR Material Generation},
      author={Shenhao Zhu and Lingteng Qiu and Xiaodong Gu and Zhengyi Zhao and Chao Xu and Yuxiao He and Zhe Li and Xiaoguang Han and Yao Yao and Xun Cao and Siyu Zhu and Weihao Yuan and Zilong Dong and Hao Zhu},
      year={2024},
}