A 6D pose estimation method based on cross-modal information fusion

The method fuses RGB and point cloud features across the encoding and decoding stages of an RGB network branch and a point cloud network branch, combining a geometric context feature aggregation module with a cross-modal attention fusion module. It targets the low accuracy and high computational cost of RGB-D pose estimation and improves pose estimation performance in occluded scenes.

CN118135553B · Active · Publication Date: 2026-06-26 · XIDIAN UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: XIDIAN UNIV
Filing Date: 2024-03-20
Publication Date: 2026-06-26

Application Information

IPC: G06V20/64; G06V10/80; G06V10/26; G06V20/70; G06V10/44; G06V10/46; G06V10/42; G06V10/56; G06V10/82; G06N3/0464; G06N3/0455

AI Tagging

Technology Topics

Point cloud; RGB image

Technical Efficacy Phrases

Efficient aggregation; Aggregation precision

AI Technical Summary

Technical Problem

Existing RGB-D 6D pose estimation methods suffer from low accuracy and high computational cost under weak textures, occlusion, and lighting changes, and they fail to effectively integrate the global semantic correlation between RGB and depth information.

Method used

A cross-modal information fusion-based approach is adopted, which integrates RGB and point cloud features through RGB network branches and point cloud network branches in the encoding and decoding stages, and utilizes a geometric context feature aggregation module and a cross-modal attention fusion module to perform 6D pose estimation.
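The two-branch data flow described above can be sketched as follows. This is only a structural illustration, not the patented implementation: the names `run_two_branch`, `concat_per_point`, `scale`, and `no_fusion` are hypothetical, the layers are trivial stand-ins for real encoder or decoder layers, and the concatenation step assumes one RGB feature vector per point.

```python
from typing import Callable, List, Tuple

Features = List[List[float]]  # one feature vector per pixel or point
Layer = Callable[[Features], Features]
Fuse = Callable[[Features, Features], Tuple[Features, Features]]


def run_two_branch(rgb: Features, pts: Features,
                   rgb_layers: List[Layer], pc_layers: List[Layer],
                   fuse: Fuse) -> Tuple[Features, Features]:
    """Run paired RGB / point cloud layers, fusing after every layer pair."""
    for rgb_layer, pc_layer in zip(rgb_layers, pc_layers):
        rgb = rgb_layer(rgb)        # RGB network branch
        pts = pc_layer(pts)         # point cloud network branch
        rgb, pts = fuse(rgb, pts)   # cross-modal exchange between the branches
    return rgb, pts


def concat_per_point(rgb: Features, pts: Features) -> Features:
    """Join each RGB vector with its point vector (assumes 1:1 correspondence)."""
    return [r + p for r, p in zip(rgb, pts)]


def scale(k: float) -> Layer:
    """Hypothetical stand-in layer: multiplies every feature by k."""
    return lambda feats: [[k * x for x in f] for f in feats]


def no_fusion(rgb: Features, pts: Features) -> Tuple[Features, Features]:
    """Placeholder fusion that leaves both modalities unchanged."""
    return rgb, pts


rgb_in = [[1.0, 2.0], [3.0, 4.0]]
pts_in = [[0.5], [1.5]]
rgb_out, pts_out = run_two_branch(
    rgb_in, pts_in,
    rgb_layers=[scale(2.0), scale(0.5)],
    pc_layers=[scale(1.0), scale(2.0)],
    fuse=no_fusion,
)
pose_input = concat_per_point(rgb_out, pts_out)
```

Passing the fusion step in as a callable keeps the branch loop independent of how fusion is done, so an attention-based module could replace `no_fusion` without changing the loop.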

Benefits of technology

It improves the accuracy of pose estimation in occluded scenarios, reduces computational costs, and achieves high-performance end-to-end pose estimation, making it suitable for fields such as robot manipulation and autonomous driving.

✦ Generated by Eureka AI based on patent content.

Smart Images

Figure CN118135553B_ABST

Patent Text Reader

Abstract

The application discloses a 6D pose estimation method based on cross-modal information fusion. In the encoding stage, the RGB network branch and the point cloud network branch each use an encoder to extract, layer by layer, the RGB features of the RGB image and the point cloud features of the depth image; during point cloud feature extraction, the geometric context feature aggregation module is used in every encoding layer. In the decoding stage, the RGB network branch and the point cloud network branch each use a multilayer decoder to decode the features. Between corresponding encoding layers and corresponding decoding layers of the two branches, the cross-modal attention fusion module fuses the RGB features and the point cloud features, and the fused features are then split back into RGB features and point cloud features according to the order in which they were arranged. In the pose calculation stage, the RGB features output by the first multilayer decoder and the point cloud features output by the second multilayer decoder are concatenated, and 6D pose estimation is performed on the concatenated features. The application improves both feature representation capability and pose estimation performance in occluded scenes.
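The fuse-then-split step in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: both modalities share the same feature dimension, fusion is plain scaled dot-product self-attention over the joined token sequence, and there are no learned projections (queries, keys, and values are the tokens themselves). The names `cross_modal_fuse`, `self_attention`, and `softmax` are hypothetical.

```python
import math
from typing import List, Tuple

Tokens = List[List[float]]  # one feature vector per token


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Tokens) -> Tokens:
    """Scaled dot-product self-attention with Q = K = V = tokens (assumption)."""
    dim = len(tokens[0])
    mixed = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        mixed.append([sum(w * value[j] for w, value in zip(weights, tokens))
                      for j in range(dim)])
    return mixed


def cross_modal_fuse(rgb_tokens: Tokens, pc_tokens: Tokens) -> Tuple[Tokens, Tokens]:
    """Attend over RGB tokens followed by point tokens, then split by that order."""
    n_rgb = len(rgb_tokens)
    fused = self_attention(rgb_tokens + pc_tokens)  # RGB first, points second
    return fused[:n_rgb], fused[n_rgb:]             # undo the arrangement


rgb_tokens = [[1.0, 0.0], [0.0, 1.0]]
pc_tokens = [[1.0, 1.0]]
fused_rgb, fused_pc = cross_modal_fuse(rgb_tokens, pc_tokens)
```

Because the joined sequence keeps RGB tokens first and point tokens second, slicing at `n_rgb` returns each modality with its original token count, and every returned token now mixes in information from the other modality.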

Citation Information

Patent Citations

CN114863573A
CN117218343A
