Visual and Geometric Perception Lab
The Vision and Geometric Perception Laboratory (VGPL) in the School of Computer Science and Technology at Donghua University was established in 2013 by Dr. Shen Cai. It focuses on theoretical and applied research in Computer Vision, Computer Graphics, and Robotics, particularly 3D and robotic tasks that combine geometric constraints with deep learning methods. Key research areas include camera calibration, pose estimation, image-based 3D reconstruction, feature extraction and matching, 3D object recognition, neural implicit reconstruction, concise 3D representations, collision detection, robot navigation, and intelligent grasping. The lab also collaborates with companies on industrial vision inspection, action recognition, object detection/segmentation, AR/VR, 3D registration, and robot manipulation. The lab currently has 14 graduate students and 1 intern; 21 students and 6 interns have graduated so far.
Directions in the field of Computer Vision (CV)
Camera Calibration
Camera calibration aims to estimate a camera's intrinsic (internal) and extrinsic (external) parameters as well as its lens distortion. Research in this direction covers pattern design, feature extraction, homography computation from 2D-2D point correspondences, lens distortion model selection, and the joint optimization of intrinsic and extrinsic parameters. Our lab has conducted theoretical and applied research on many topics, including fast and interpretable homography decomposition, deep homography estimation, feature correspondence, calibration using conic curves and hybrid primitives, rapid multi-camera calibration, robot-based calibration, depth camera calibration, and zoom calibration. We have published a number of academic papers and applied our findings to various company projects.
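To make the homography-computation step concrete, here is a minimal sketch of the standard normalized Direct Linear Transform (DLT), which estimates a 3x3 homography from four or more 2D-2D point correspondences. It is a generic textbook method written in Python with NumPy, not the lab's own decomposition or deep estimation approach; the function names `estimate_homography` and `_normalize` are hypothetical.

```python
import numpy as np


def _normalize(pts):
    """Hartley normalization: move the centroid to the origin and scale the mean distance to sqrt(2)."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return pts_h @ T.T, T


def estimate_homography(src, dst):
    """Estimate H (3x3) such that dst ~ H @ src, from N >= 4 point pairs (N x 2 arrays)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_n, T_src = _normalize(src)
    dst_n, T_dst = _normalize(dst)
    rows = []
    for (x, y, _), (u, v, _) in zip(src_n, dst_n):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    A = np.array(rows)
    # The least-squares solution of A h = 0 with ||h|| = 1 is the right
    # singular vector associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H_n = Vt[-1].reshape(3, 3)
    # Undo the normalization and fix the projective scale so that H[2, 2] == 1.
    H = np.linalg.inv(T_dst) @ H_n @ T_src
    return H / H[2, 2]


if __name__ == "__main__":
    H_true = np.array([[1.2, 0.1, 5.0], [-0.05, 0.9, -3.0], [1e-3, 2e-4, 1.0]])
    src = np.array([[0, 0], [100, 0], [100, 80], [0, 80], [30, 60]], dtype=float)
    dst_h = np.column_stack([src, np.ones(len(src))]) @ H_true.T
    dst = dst_h[:, :2] / dst_h[:, 2:]
    print(np.abs(estimate_homography(src, dst) - H_true).max())
```

Normalizing the points first keeps the linear system well conditioned. In practice the linear estimate is usually refined by nonlinear minimization of the reprojection error, and noisy real detections would call for a robust estimator such as RANSAC.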
Pose Estimation
Pose estimation aims to compute the Euclidean transformation (rotation and translation) between 3D coordinate systems. It typically uses 3D-2D point correspondences to estimate a camera's absolute pose from a single image, or 2D-2D correspondences to estimate the relative pose between two images. Our lab has conducted in-depth research on the perspective-three-point (P3P) problem.
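As a hedged illustration of the 3D-2D relationship behind absolute pose estimation, the sketch below projects world points through a pinhole camera, x ~ K(RX + t), and scores a candidate pose (R, t) by its reprojection error. A P3P solver can return up to four candidate poses from three correspondences, and a common practice is to choose among them (often inside RANSAC) with a check like this. The P3P solver itself is not shown, and all function names here are hypothetical.

```python
import numpy as np


def rotation_from_axis_angle(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about the unit vector along `axis`."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    S = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(angle) * S + (1.0 - np.cos(angle)) * (S @ S)


def project(points_w, R, t, K):
    """Pinhole projection x ~ K (R X + t) of N x 3 world points to N x 2 pixels."""
    points_c = points_w @ R.T + t      # world frame -> camera frame
    uvw = points_c @ K.T               # apply the intrinsic matrix
    return uvw[:, :2] / uvw[:, 2:3]    # perspective division


def reprojection_rmse(points_w, pixels, R, t, K):
    """Root-mean-square pixel error of a candidate pose on 3D-2D correspondences."""
    residuals = project(points_w, R, t, K) - pixels
    return float(np.sqrt((residuals ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
    R_true = rotation_from_axis_angle([0, 1, 0], 0.2)
    t_true = np.array([0.1, -0.05, 4.0])
    X = np.random.default_rng(0).uniform(-1.0, 1.0, size=(10, 3))
    x = project(X, R_true, t_true, K)
    # Pick the best of several candidate poses, as one might after a P3P solver.
    candidates = [(rotation_from_axis_angle([1, 0, 0], 0.3), t_true), (R_true, t_true)]
    best_R, best_t = min(candidates, key=lambda c: reprojection_rmse(X, x, c[0], c[1], K))
    print(np.allclose(best_R, R_true))
```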
Image-based 3D Reconstruction
Image-based 3D reconstruction aims to recover scenes or objects from images or RGB-D data. Research topics include feature extraction, pose estimation, 3D representation, and the joint optimization of extrinsic parameters and 3D points. Our lab has explored multiple directions, such as depth-camera-based fusion reconstruction, multi-view stereo, and robot navigation.
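One basic building block of such pipelines is triangulating a 3D point from its projections in two views whose camera matrices are known. The sketch below shows standard linear (DLT) triangulation in Python with NumPy; it is a generic textbook step offered as an assumption about a typical pipeline, not the lab's method, and the function name `triangulate_point` is hypothetical. The subsequent joint optimization of extrinsics and 3D points (bundle adjustment) is not shown.

```python
import numpy as np


def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices; x1, x2: pixel coordinates (u, v).
    """
    # From x ~ P X: u * (p3 . X) - (p1 . X) = 0 and v * (p3 . X) - (p2 . X) = 0 per view.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]                 # homogeneous solution with ||X_h|| = 1
    return X_h[:3] / X_h[3]      # back to Euclidean coordinates


if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                  # reference camera
    P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])  # 0.5 baseline along x
    X_true = np.array([0.2, -0.1, 3.0])

    def proj(P, X):
        h = P @ np.append(X, 1.0)
        return h[:2] / h[2]

    print(triangulate_point(P1, P2, proj(P1, X_true), proj(P2, X_true)))
```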
Directions in the field of Computer Graphics (CG)
3D Representation
3D representation is a key research topic in graphics. To meet the needs of different tasks, 3D representations often need to be converted or combined. Our lab focuses on using orthogonal distance fields and spherical primitives for a range of 3D representation tasks. These include fast and accurate neural implicit representation based on orthogonal distance fields, a concise inner- and outer-ball (spherical) representation, a concise spherical-node-graph representation, a double-layer spherical-shell representation, and a hierarchical shell representation. We have published several academic papers in these areas.
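The lab's orthogonal-distance-field and ball-based representations are not specified in detail here, so the following is only a hypothetical illustration of how spherical primitives can describe geometry: the signed distance from query points to a shape approximated by a union of spheres, written in Python with NumPy. The function name and the min-over-spheres formulation are assumptions, not the lab's method.

```python
import numpy as np


def union_of_spheres_sdf(points, centers, radii):
    """Approximate signed distance from N x 3 query points to a union of M spheres.

    Negative inside, positive outside. The minimum of the per-sphere distances is
    exact outside the union; inside overlapping spheres its magnitude may
    underestimate the true depth.
    """
    # Pairwise point-to-center distances (N x M) minus each sphere's radius.
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2) - radii[None, :]
    return d.min(axis=1)


if __name__ == "__main__":
    centers = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
    radii = np.array([1.0, 0.5])
    queries = np.array([[3.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
    print(union_of_spheres_sdf(queries, centers, radii))
```

A negative value marks a point inside the shape, which is one reason sphere sets are convenient for fast occupancy and collision queries.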
3D Object Recognition and Segmentation
3D object recognition and segmentation involve classifying and segmenting known 3D models. Our lab has published multiple papers on deep classification networks based on spherical projection, classification networks based on spatial key-sphere representation, classification networks based on spherical-node graphs, and part segmentation based on key spheres.
3D Point Cloud Registration
3D point cloud registration aims to locate a known 3D model in a scene quickly and accurately and to estimate its relative pose from matched 3D-3D point pairs. Our lab has conducted research on 3D feature extraction, random sample consensus (RANSAC), and robotic grasping, and has applied these findings to various corporate projects.
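Estimating the relative pose from matched 3D-3D point pairs has a closed-form least-squares solution via SVD (the Kabsch algorithm). The sketch below, in Python with NumPy, wraps it in a basic RANSAC loop to reject wrong matches. It assumes putative correspondences are already given (for example from 3D feature matching), and the function names, iteration count, and inlier threshold are illustrative assumptions rather than the lab's implementation.

```python
import numpy as np


def rigid_transform_3d(P, Q):
    """Least-squares rotation R and translation t with Q ~ P @ R.T + t (Kabsch algorithm)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                 # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t


def ransac_rigid(P, Q, iters=200, thresh=0.05, seed=0):
    """Robustly fit R, t to putative 3D-3D matches (rows of P and Q) with RANSAC."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)   # minimal sample
        R, t = rigid_transform_3d(P[idx], Q[idx])
        residuals = np.linalg.norm(P @ R.T + t - Q, axis=1)
        inliers = residuals < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("no consistent rigid motion found")
    # Refit on all inliers of the best hypothesis.
    R, t = rigid_transform_3d(P[best_inliers], Q[best_inliers])
    return R, t, best_inliers


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    P = rng.uniform(-1.0, 1.0, size=(50, 3))
    a = 0.7
    R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                       [np.sin(a), np.cos(a), 0.0],
                       [0.0, 0.0, 1.0]])
    t_true = np.array([0.3, -0.2, 1.5])
    Q = P @ R_true.T + t_true
    Q[:10] = rng.uniform(-5.0, 5.0, size=(10, 3))   # simulate 10 wrong matches
    R, t, inliers = ransac_rigid(P, Q)
    print(inliers.sum(), np.allclose(R, R_true), np.allclose(t, t_true))
```

The determinant check keeps the result a proper rotation; without it, noisy or nearly planar data can yield a reflection.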
Directions in the field of Robotics
Robotic Arm Kinematic Solving
Robotic arm kinematic solving is one of the fundamental problems in the field of robotics. Our laboratory focuses on generating precise multi-degree-of-freedom control signals for robotic arm movements using either traditional analytical methods or reinforcement learning algorithms. We have conducted systematic theoretical modeling and experimental verification in this area.
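For a flavor of the traditional analytical approach, here is closed-form inverse kinematics for a hypothetical two-link planar arm, together with forward kinematics to verify the solution. Practical multi-degree-of-freedom arms require considerably more involved analytical or numerical solvers; the link lengths and function names are illustrative assumptions.

```python
import math


def forward_kinematics(theta1, theta2, l1, l2):
    """End-effector (x, y) of a planar two-link arm; angles in radians."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def inverse_kinematics(x, y, l1, l2, elbow=1):
    """Closed-form joint angles placing the end effector at (x, y).

    `elbow` (+1 or -1) selects the sign of sin(theta2), i.e. one of the two
    elbow configurations. Raises ValueError if the target is unreachable.
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)  # law of cosines
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target outside the reachable workspace")
    s2 = elbow * math.sqrt(1.0 - c2 * c2)
    theta2 = math.atan2(s2, c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * s2, l1 + l2 * c2)
    return theta1, theta2


if __name__ == "__main__":
    for elbow in (1, -1):
        t1, t2 = inverse_kinematics(1.2, 0.5, l1=1.0, l2=0.7, elbow=elbow)
        print(elbow, forward_kinematics(t1, t2, 1.0, 0.7))
```

The two signs of sin(theta2) correspond to the two elbow configurations that reach the same target; a controller would typically choose the one closest to the current joint state or within joint limits.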
Robotic Autonomous Navigation
Robotic autonomous navigation uses multi-modal sensors (such as RGB cameras, depth cameras, and LiDAR) to perform Simultaneous Localization and Mapping (SLAM). Building on the resulting map and pose estimate, the system performs global and local path planning to drive the robot autonomously to a target pose. Our lab has deployed and validated this technology on various wheeled mobile robot platforms.
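Global path planning is often run on an occupancy grid derived from the SLAM map. As one possible example (the lab's actual planners are not specified), the sketch below implements A* search on a 4-connected grid in Python; the grid encoding (0 free, 1 occupied) and the function name `astar` are assumptions.

```python
import heapq


def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free, 1 = occupied).

    Returns the list of (row, col) cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # Manhattan distance: admissible for unit-cost 4-connected moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from = {start: None}
    g_cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk parent links back to the start to recover the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > g_cost[cell]:
            continue  # stale heap entry superseded by a cheaper route
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = ng
                    came_from[(nr, nc)] = cell
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None


if __name__ == "__main__":
    grid = [[0, 0, 0, 0],
            [1, 1, 1, 0],
            [0, 0, 0, 0]]
    print(astar(grid, (0, 0), (2, 0)))
```

A local planner would then follow such a path while reacting to obstacles that are not present in the global map.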
Robotic Intelligent Manipulation
Robotic intelligent manipulation employs Vision-Language Models (VLMs) as the high-level "brain" that interprets complex human intents and decomposes them into operational steps. For low-level execution, we combine traditional kinematic solving methods with cutting-edge end-to-end models, such as Vision-Language-Action (VLA) models or World-Action Models (WAMs), to control the robot's manipulation tasks. Our laboratory has conducted in-depth theoretical research in this field and has applied these technologies in multiple robotics competitions and corporate projects.
