Selected Publications
✱: Both authors contributed equally.
DA2: Depth Anything in Any Direction
Haodong Li,
Wangguangdong Zheng,
Jing He,
Yuhao Liu,
Xin Lin,
Xin Yang,
Ying-Cong Chen,
Chunchao Guo
arXiv 2025
arXiv
/
Paper
/
Project Page
/
Github
/
Demo
/
Data
/
Slides
Powered by large-scale training data curated with our panoramic data curation engine, and by SphereViT, which addresses spherical distortions in panoramas,
DA2 predicts dense, scale-invariant distance from a single 360° panorama in an end-to-end manner,
with remarkable geometric fidelity and strong zero-shot generalization.
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Jing He✱,
Haodong Li✱,
Wei Yin,
Yixun Liang,
Leheng Li,
Kaiqiang Zhou,
Hongbo Zhang,
Bingbing Liu,
Ying-Cong Chen
ICLR 2025
arXiv
/
Paper
/
Project Page
/
Github
/
Demo (Depth)
/
Demo (Normal)
/
ComfyUI
Lotus is a diffusion-based visual foundation model with a simple yet effective adaptation protocol,
designed to fully leverage the powerful visual priors of pre-trained diffusion models for dense prediction.
With minimal training data, Lotus achieves SoTA performance on two key geometry perception tasks, i.e., zero-shot monocular depth and normal estimation.
DisEnvisioner: Disentangled and Enriched Visual Prompt for Image Customization
Jing He✱,
Haodong Li✱,
Yongzhe Hu,
Guibao Shen,
Yingjie Cai,
Weichao Qiu,
Ying-Cong Chen
ICLR 2025
arXiv
/
Paper
/
Project Page
/
Github
/
Demo
By emphasizing the interpretation of subject-essential attributes, the proposed DisEnvisioner
effectively identifies and enhances subject-essential features while filtering out irrelevant information,
enabling exceptional image customization without cumbersome tuning or reliance on multiple reference images.
DIScene: Object Decoupling and Interaction Modeling for Complex Scene Generation
Xiao-Lei Li✱,
Haodong Li✱,
Hao-Xiang Chen,
Tai-Jiang Mu,
Shi-Min Hu
SIGGRAPH Asia 2024
Paper
/
Video
DIScene generates complex 3D scenes with decoupled objects and clear interactions. Leveraging a learnable Scene Graph and a hybrid Mesh-Gaussian representation, it produces 3D scenes of superior quality. DIScene can also flexibly edit a scene by changing interacting objects or their attributes, benefiting diverse applications.
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Yixun Liang✱,
Xin Yang✱,
Jiantao Lin,
Haodong Li,
Xiaogang Xu,
Ying-Cong Chen
CVPR 2024 Highlight
arXiv
/
Paper
/
Github
/
Demo
/
Video
We present LucidDreamer, a text-to-3D generation framework that distills high-fidelity textures and shapes from pretrained 2D
diffusion models via a novel Interval Score Matching objective and an advanced 3D distillation pipeline.
Together, these achieve superior 3D generation results with photorealistic quality in a short training time.
Academic Service
Reviewer: CVPR 2025, ICLR 2026, CVPR 2026