A 6D pose estimation method based on cross-modal information fusion
The method fuses RGB and point cloud information across the encoding and decoding stages of an RGB network and a point cloud network, using a geometric context feature aggregation module and a cross-modal attention fusion module. This addresses the accuracy and computational cost problems of RGB-D pose estimation and improves pose estimation performance in occluded scenes.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: XIDIAN UNIV
- Filing Date: 2024-03-20
- Publication Date: 2026-06-26
AI Technical Summary
Existing RGB-D-based 6D pose estimation methods suffer from low accuracy and high computational cost when dealing with weak textures, occlusion, and difficult lighting, and they fail to effectively exploit the global semantic correlation between RGB and depth information.
The proposed approach is based on cross-modal information fusion: an RGB network branch and a point cloud network branch exchange RGB and point cloud features in both the encoding and decoding stages, and a geometric context feature aggregation module and a cross-modal attention fusion module are used to perform 6D pose estimation.
The method improves pose estimation accuracy in occluded scenes, reduces computational cost, and achieves high-performance end-to-end pose estimation, making it suitable for fields such as robot manipulation and autonomous driving.
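The summary names two building blocks without giving their internals, so the following is only a minimal NumPy sketch of how such modules are commonly built, not the patented implementation. It assumes that geometric context aggregation can be approximated by max-pooling point features over each point's k nearest neighbours, and that cross-modal attention fusion can be approximated by single-head scaled dot-product attention in which RGB features query point cloud features. The function names, the weight matrices `w_q`, `w_k`, `w_v`, and all dimensions are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate_geometric_context(points, feats, k=4):
    """Hypothetical stand-in for a geometric context aggregation module.

    points: (N, 3) coordinates, feats: (N, C) per-point features.
    Each point's features are max-pooled over its k nearest neighbours
    (the point itself is included, since its distance to itself is zero).
    """
    diff = points[:, None, :] - points[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)             # (N, N) squared distances
    neighbours = np.argsort(dist2, axis=1)[:, :k]  # (N, k) neighbour indices
    return feats[neighbours].max(axis=1)         # (N, C)


def cross_modal_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention between two modalities.

    query_feats:   (N, Cq) features of the modality being updated (e.g. RGB)
    context_feats: (M, Cc) features of the other modality (e.g. point cloud)
    w_q: (Cq, D), w_k: (Cc, D), w_v: (Cc, Cq) projection matrices
    Returns the fused features (N, Cq) and the attention map (N, M).
    """
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    # Residual connection keeps the original modality's information.
    return query_feats + attn @ v, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, c_rgb, c_pcd, d = 8, 16, 12, 8  # illustrative sizes only
    points = rng.normal(size=(n, 3))
    rgb_feats = rng.normal(size=(n, c_rgb))
    pcd_feats = aggregate_geometric_context(
        points, rng.normal(size=(n, c_pcd)), k=4
    )
    fused, attn = cross_modal_attention(
        rgb_feats,
        pcd_feats,
        rng.normal(size=(c_rgb, d)),
        rng.normal(size=(c_pcd, d)),
        rng.normal(size=(c_pcd, c_rgb)),
    )
    print(fused.shape, attn.shape)
```

In a two-branch design like the one described, the same attention function could presumably be applied in the opposite direction as well (point cloud features querying RGB features), at each encoder and decoder stage; that wiring is likewise an assumption here.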
Figure CN118135553B_ABST