I am a third-year master's student at the School of Mathematics, Southeast University. My research focuses on 3D computer vision, including multimodal foundation models (VLMs and VLAs for embodied intelligence), 3D generative AI (3D AIGC), and the integration of topological data analysis with machine learning.
I am fortunate to be advised by Prof. Qingshan Liu at the Key Laboratory of Collective Intelligence of Cyberspace, Jiangsu Province. Prior to this, I received a bachelor's degree in Applied Mathematics from Guangdong University of Technology.
I am looking for collaborators and interns interested in 3D Computer Vision, Multimodal Foundation Models, and applying these technologies to tasks involving Discrimination👀, Generation🤔, and Action🤖. I believe deep learning models can effectively leverage multimodal information, such as text, images, point clouds, and topology, to understand complex 3D spatial structures, enhancing their capacity for perception, reasoning, and interaction in both physical and virtual environments. If you are interested in collaborating, please feel free to reach out via email or WeChat.
") does not match the recommended repository name for your site ("
").
", so that your site can be accessed directly at "http://
".
However, if the current repository name is intended, you can ignore this message by removing "{% include widgets/debug_repo_name.html %}
" in index.html
.
",
which does not match the baseurl
("
") configured in _config.yml
.
baseurl
in _config.yml
to "
".
Zechao Guan*, Shuai Du*, Qingshan Liu# (* equal contribution, # corresponding author)
Ongoing 2025
We propose TopoDDPM, a novel topological diffusion model that integrates topological features as shape-informed latent variables. We further introduce a topological loss function to enhance the model's sensitivity to topological variations, encouraging the generation of shapes with more consistent and meaningful topology. Despite having 160× fewer parameters than LION, TopoDDPM achieves superior generation quality and faster inference speed.
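To make the two ingredients concrete, here is a minimal, illustrative sketch rather than the TopoDDPM implementation: a topology embedding is passed to the denoiser as an extra conditioning latent, and an auxiliary loss compares a differentiable topological summary of the reconstructed shape with that of the ground truth. The `topo_summary` proxy, the schematic noising, and all names and shapes are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F


def topo_summary(points: torch.Tensor, k: int = 8) -> torch.Tensor:
    # Toy differentiable proxy for a persistence-based summary:
    # k-nearest-neighbor distances averaged over the cloud. The real
    # model's vectorization of persistent homology is not reproduced here.
    d = torch.cdist(points, points)                      # (B, N, N)
    knn = d.topk(k + 1, largest=False).values[..., 1:]   # drop self-distance
    return knn.mean(dim=1)                               # (B, k)


def topo_ddpm_loss(denoiser, x0, t, topo_latent, lam: float = 0.1):
    # x0: (B, N, 3) clean point clouds; t: (B,) noise scales in (0, 1];
    # topo_latent: the shape-informed latent conditioning the denoiser.
    noise = torch.randn_like(x0)
    xt = x0 + t.view(-1, 1, 1) * noise        # schematic forward noising
    pred = denoiser(xt, t, topo_latent)       # predict the injected noise
    l_ddpm = F.mse_loss(pred, noise)          # standard diffusion term
    x0_hat = xt - t.view(-1, 1, 1) * pred     # schematic clean-shape estimate
    l_topo = F.mse_loss(topo_summary(x0_hat), topo_summary(x0))
    return l_ddpm + lam * l_topo              # topological term weighted by lam
```

The auxiliary term only back-propagates through the estimated clean shape, so it nudges the denoiser toward topologically consistent reconstructions without altering the standard diffusion objective.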
Zechao Guan, Feng Yan, Shuai Du, Lin Ma, Qingshan Liu# (# corresponding author)
Under Review 2025
Recent advancements in Diffusion Transformer (DiT) models have significantly improved 3D point cloud generation. However, existing methods primarily focus on local feature extraction while overlooking global topological information, such as voids, which are crucial for maintaining shape consistency and capturing complex geometries. To address this limitation, we propose TopoDiT-3D, a Topology-Aware Diffusion Transformer with a bottleneck structure for 3D point cloud generation. Specifically, we design the bottleneck structure using a Perceiver Resampler, which not only offers a mechanism for integrating topological information extracted through persistent homology into feature learning, but also adaptively filters out redundant local features to improve training efficiency. Experimental results demonstrate that TopoDiT-3D outperforms state-of-the-art models in visual quality, diversity, and training efficiency. Furthermore, TopoDiT-3D demonstrates the importance of rich topological information for 3D point cloud generation and its synergy with conventional local feature learning.
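As a rough illustration of the bottleneck idea, not the paper's code, the sketch below lets a fixed set of learnable queries cross-attend to the concatenation of local patch tokens and topology tokens, so downstream DiT blocks only process a short, fixed-length sequence. All dimensions, token counts, and names are made up for the example.

```python
import torch
import torch.nn as nn


class ResamplerBottleneck(nn.Module):
    """Perceiver-Resampler-style bottleneck: many tokens in, few tokens out."""

    def __init__(self, dim: int = 384, num_queries: int = 64, heads: int = 6):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, local_tokens: torch.Tensor, topo_tokens: torch.Tensor):
        # local_tokens: (B, N, dim) features from the point cloud backbone
        # topo_tokens:  (B, T, dim) embedded persistent-homology features
        kv = self.norm_kv(torch.cat([local_tokens, topo_tokens], dim=1))
        q = self.norm_q(self.queries).expand(kv.size(0), -1, -1)
        out, _ = self.attn(q, kv, kv)   # queries attend over all input tokens
        return out + q                  # residual connection on the queries


x = ResamplerBottleneck()(torch.randn(2, 1024, 384), torch.randn(2, 16, 384))
print(x.shape)  # torch.Size([2, 64, 384]): 1040 tokens compressed to 64
```

Because the queries, not the inputs, set the output length, the same module both injects topology tokens into feature learning and discards redundant local tokens, which is the training-efficiency argument made above.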
Zechao Guan, Shuai Du, Qingshan Liu# (# corresponding author)
Accepted to ICME 2025
Point clouds are complex 3D data characterized by irregularity and an unordered structure. In contrast to previous efforts aimed at extracting local geometric information with sophisticated techniques, we delve into the rich topological information of point clouds using persistent homology. First, we introduce two vectorization methods, PPDTF and PDTF, to transform topological information into a format suitable for deep neural networks. We then propose TopoLayer, a simple but effective and universal neural network layer that can be seamlessly integrated into existing architectures. Integrating TopoLayer, without architectural modifications, significantly improves established models such as PointMLP and PointNet++. For classification on ModelNet40, the class mean accuracy of PointMLP improves from 91.3% to 91.8%, surpassing the state-of-the-art PointMixer. Additionally, PointNet++ achieves a remarkable gain of 2.7%. For part segmentation on ShapeNetPart, PointMLP achieves new state-of-the-art performance with 85.1% Cls.mIoU, while PointNet++ secures a significant 0.9% increase.
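To make the vectorization step concrete, here is a minimal sketch of one standard way to map a variable-length persistence diagram to a fixed-size vector, using a learnable Gaussian encoding. PPDTF and PDTF themselves are not reproduced here; the class name, sizes, and usage are illustrative assumptions.

```python
import torch
import torch.nn as nn


class DiagramVectorizer(nn.Module):
    """Map a variable-length persistence diagram to a fixed-size vector."""

    def __init__(self, num_centers: int = 64, out_dim: int = 128):
        super().__init__()
        # Learnable 2D centers in (birth, persistence) coordinates.
        self.centers = nn.Parameter(torch.rand(num_centers, 2))
        self.log_sigma = nn.Parameter(torch.zeros(num_centers))
        self.proj = nn.Sequential(
            nn.Linear(num_centers, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, diagram: torch.Tensor) -> torch.Tensor:
        # diagram: (P, 2) points (birth, death); use (birth, persistence).
        birth, death = diagram[:, 0], diagram[:, 1]
        pts = torch.stack([birth, death - birth], dim=-1)           # (P, 2)
        d2 = ((pts[:, None, :] - self.centers[None]) ** 2).sum(-1)  # (P, C)
        weights = torch.exp(-d2 / (2 * self.log_sigma.exp() ** 2))
        # Summing responses over diagram points makes the encoding
        # invariant to the (arbitrary) ordering of diagram points.
        return self.proj(weights.sum(dim=0))                        # (out_dim,)


vec = DiagramVectorizer()(torch.rand(37, 2))  # 37 (birth, death) pairs
print(vec.shape)  # torch.Size([128])
```

A layer of this kind can simply be concatenated with a backbone's global feature, which matches the claim above that the integration requires no architectural modifications to PointMLP or PointNet++.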
Feng Yan, Fanfan Liu, Yiyang Huang, Zechao Guan, Liming Zheng, Yufeng Zhong, Chengjian Feng, Lin Ma# (# corresponding author)
Under Review 2025
In recent years, robotics has advanced significantly through the integration of larger models and large-scale datasets. However, challenges remain in applying these models to 3D spatial interactions and in managing data collection costs. To address these issues, we propose RoboMM, a multimodal robotic manipulation model, along with RoboData, a comprehensive dataset. RoboMM enhances 3D perception through camera parameters and occupancy supervision. Building on OpenFlamingo, it incorporates a Modality-Isolation-Mask and multimodal decoder blocks, improving modality fusion and fine-grained perception. RoboData offers a complete evaluation system by integrating several well-known datasets, achieving the first fusion of multi-view images, camera parameters, depth maps, and actions; its space alignment facilitates comprehensive learning from diverse robotic datasets. Equipped with RoboData and the unified physical space, RoboMM is the first generalist policy that enables simultaneous evaluation across all tasks within multiple datasets, rather than focusing on a limited selection of data or tasks. Its design significantly enhances robotic manipulation performance, increasing the average sequence length on the CALVIN benchmark from 1.7 to 3.3 and ensuring cross-embodiment capabilities, achieving state-of-the-art results across multiple datasets.
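As one way to picture the Modality-Isolation-Mask, the sketch below builds a boolean attention mask from per-token modality ids so that each token attends within its own modality plus a designated set of shared modalities. The actual masking rules in RoboMM may differ; the function name, the modality ids, and the sharing rule are illustrative assumptions.

```python
import torch


def modality_isolation_mask(modality_ids: torch.Tensor,
                            shared: tuple = (0,)) -> torch.Tensor:
    # modality_ids: (L,) integer id per token, e.g. 0=text, 1=rgb, 2=depth.
    # Tokens may attend within their own modality...
    same = modality_ids[:, None] == modality_ids[None, :]      # (L, L)
    # ...and to any token belonging to a shared modality (e.g. language).
    key_is_shared = torch.zeros_like(modality_ids, dtype=torch.bool)
    for m in shared:
        key_is_shared |= modality_ids == m
    allowed = same | key_is_shared[None, :]
    return allowed  # True = attention permitted


ids = torch.tensor([0, 0, 1, 1, 2])  # text, text, rgb, rgb, depth
mask = modality_isolation_mask(ids)
print(mask.int())
# Pass `~mask` as attn_mask to nn.MultiheadAttention, where a True
# entry marks a position that is NOT allowed to attend.
```

Masks like this keep cross-modal interference out of the attention pattern while still letting every modality read the shared instruction tokens, which is one plausible reading of the fine-grained modality fusion described above.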