Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering

1University of California, Irvine     2Stony Brook University
3Huazhong University of Science and Technology     4University of Florida
Teaser image.

Single-step diffusion models for cycle-consistent forward and inverse rendering.
Upper left: Ouroboros decomposes input images into intrinsic maps (albedo, normal, roughness, metallicity, and irradiance). Given these generated intrinsic maps and textual prompts, our neural forward rendering model synthesizes images that closely match the originals.
Upper right: We extend an end-to-end finetuning technique to diffusion-based neural rendering. The radar plot shows numerical comparisons with RGB↔X on the InteriorVerse dataset.
Bottom: Our method achieves temporally consistent video inverse rendering without any finetuning on video data.


Abstract

While multi-step diffusion models have advanced both forward and inverse rendering, existing approaches often treat these problems independently, leading to cycle inconsistency and slow inference. In this work, we present Ouroboros, a framework composed of two single-step diffusion models that handle forward and inverse rendering with mutual reinforcement. Our approach extends intrinsic decomposition to both indoor and outdoor scenes and introduces a cycle consistency mechanism that ensures coherence between forward and inverse rendering outputs. Experiments demonstrate state-of-the-art performance across diverse scenes with substantially faster inference than other diffusion-based methods. We further show that Ouroboros transfers to video decomposition in a training-free manner, reducing temporal inconsistency in video sequences while maintaining high-quality per-frame inverse rendering.
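
To make the cycle consistency idea concrete, the following is a minimal sketch of how a reconstruction penalty between the two models could be expressed. The model names `inverse_model` and `forward_model`, their signatures, and the L1 objective are illustrative assumptions, not the paper's actual training code or API.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(image, inverse_model, forward_model):
    """Sketch of a cycle-consistency objective (hypothetical interfaces).

    `inverse_model`: single-step network mapping an RGB image to intrinsic
    maps (albedo, normal, roughness, metallicity, irradiance).
    `forward_model`: single-step network re-rendering an RGB image from
    those intrinsic maps.
    """
    # Inverse rendering: image -> intrinsic maps (one denoising step).
    intrinsics = inverse_model(image)

    # Forward rendering: intrinsic maps -> re-synthesized image.
    reconstruction = forward_model(intrinsics)

    # Penalize the gap between the input and its re-rendering so the two
    # models remain mutually consistent.
    return F.l1_loss(reconstruction, image)
```

In such a setup, this loss would be added to the per-model objectives so that errors in one direction are corrected by the other, which is the mutual reinforcement the abstract refers to.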