A target hierarchy tree-based unmanned aerial vehicle visual-language navigation method

This visual-language navigation method for UAVs, built on a target hierarchy tree, uses a large language model and a multimodal encoder to align visual and textual information in complex environments, enabling efficient and accurate navigation decisions and target recognition.

CN119197529B · Active · Publication Date: 2026-06-16 · BEIHANG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
BEIHANG UNIV
Filing Date
2024-09-24
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

In complex environments, UAVs have difficulty accurately identifying and understanding the visual targets referred to in navigation instructions, especially in multi-view and multi-granularity scenarios. Existing methods fail to achieve fine-grained alignment between visual and textual information.

Method used

A UAV visual-language navigation method based on a target hierarchy tree is adopted. A target parsing module uses a large language model to generate a first-order logic program for the targets in the navigation instruction, and a text encoder extracts the instruction's text features. A target localization module then constructs a hierarchy tree and extracts visual features. Finally, a multimodal encoder integrates this navigation information to make the navigation decision.
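The summary does not say how the hierarchy tree is organized or how the first-order logic program is matched against it. The Python sketch below illustrates one possible reading under loudly stated assumptions: the tree has two levels (target category, then detected instances with bounding boxes and attributes), and the logic program arrives already converted into structured constraints rather than as raw LLM output. `Instance`, `HierarchyTree`, `localize`, and the `left_of`/`right_of` predicates are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass, field
from itertools import product

@dataclass
class Instance:
    """One detected object in the front-view image (hypothetical fields)."""
    obj_id: int
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels
    attributes: set = field(default_factory=set)

@dataclass
class HierarchyTree:
    """Assumed two-level tree: target category -> detected instances."""
    categories: dict = field(default_factory=dict)

    def add(self, category, inst):
        self.categories.setdefault(category, []).append(inst)

def center_x(inst):
    return (inst.box[0] + inst.box[2]) / 2

# Hypothetical binary predicates the logic program may reference.
RELATIONS = {
    "left_of": lambda a, b: center_x(a) < center_x(b),
    "right_of": lambda a, b: center_x(a) > center_x(b),
}

def localize(tree, variables, unary, binary, target_var):
    """Return the instances bound to target_var in any satisfying assignment.

    variables: {var: category}, e.g. {"x": "building", "y": "car"}
    unary:     [(var, attribute)], e.g. [("x", "red")]
    binary:    [(relation, var_a, var_b)], e.g. [("left_of", "x", "y")]
    """
    names = list(variables)
    # Prune candidates using the tree's category level and the unary attributes.
    pools = []
    for v in names:
        required = {attr for var, attr in unary if var == v}
        pools.append([inst for inst in tree.categories.get(variables[v], [])
                      if required <= inst.attributes])
    hits = []
    # Existential check: try every combination of the remaining candidates.
    for combo in product(*pools):
        binding = dict(zip(names, combo))
        if all(RELATIONS[r](binding[a], binding[b]) for r, a, b in binary):
            if binding[target_var] not in hits:
                hits.append(binding[target_var])
    return hits

# "Fly to the red building left of the car":
# exists x, y. building(x) & red(x) & car(y) & left_of(x, y)
tree = HierarchyTree()
tree.add("building", Instance(0, (10, 20, 110, 200), {"red"}))
tree.add("building", Instance(1, (300, 20, 400, 200), {"red"}))
tree.add("building", Instance(2, (500, 40, 600, 220), {"white"}))
tree.add("car", Instance(3, (200, 180, 240, 210)))

found = localize(tree, {"x": "building", "y": "car"},
                 [("x", "red")], [("left_of", "x", "y")], "x")
print([inst.obj_id for inst in found])  # [0]
```

Brute-force enumeration keeps the sketch short; a production system would need a smarter search once the tree holds many instances per category.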

Benefits of technology

It improves how accurately UAVs recognize and understand navigation targets in complex environments, improves navigation decision quality and system scalability, and reduces system upgrade and maintenance costs.

Generated by Eureka AI based on patent content.

Figure CN119197529B_ABST (image not reproduced)

Abstract

The application provides a UAV visual-language navigation method based on a target hierarchy tree, comprising the following steps: S1, a target parsing module obtains a first-order logic program for the targets in the navigation instruction, and a text encoder extracts the text features of the instruction; S2, a target localization module constructs a hierarchy tree for each type of target in the front-view image, locates the key target according to the first-order logic program and obtains its visual features, and a visual encoder then extracts the visual features of the front-view image; and S3, a multimodal encoder uses the visual and text features to output the navigation action at each time step. The application improves the quality of navigation decisions.
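Step S3 can be illustrated with a toy per-time-step loop. This is a minimal sketch, not the patent's multimodal encoder: it assumes the text, front-view, and key-target features have already been computed for each step, swaps in plain concatenation followed by a single linear layer and softmax in place of the trained encoder, and uses a hypothetical discrete action set. `ACTIONS`, `fuse`, `decide`, and `navigate` are illustrative names only.

```python
import math

# Hypothetical discrete action set; the abstract does not list one.
ACTIONS = ["forward", "turn_left", "turn_right", "ascend", "descend", "stop"]

def fuse(text_feat, view_feat, target_feat):
    """Stand-in for the multimodal encoder: concatenate the three feature vectors."""
    return list(text_feat) + list(view_feat) + list(target_feat)

def linear(weights, bias, x):
    """One dense layer: one output logit per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decide(text_feat, view_feat, target_feat, weights, bias):
    """Return the most probable action and the full distribution for one time step."""
    probs = softmax(linear(weights, bias, fuse(text_feat, view_feat, target_feat)))
    best = max(range(len(probs)), key=probs.__getitem__)
    return ACTIONS[best], probs

def navigate(text_feat, observations, weights, bias):
    """Apply the S3 decision at each time step until 'stop' or observations run out.

    observations: [(front_view_feat, key_target_feat), ...], one pair per step,
    assumed to come from the visual encoder and target localization of S2.
    """
    trajectory = []
    for view_feat, target_feat in observations:
        action, _ = decide(text_feat, view_feat, target_feat, weights, bias)
        trajectory.append(action)
        if action == "stop":
            break
    return trajectory

# Hand-set weights (not trained): the "forward" logit reads key-target dim 0,
# the "stop" logit reads key-target dim 1; the fused vector has 2 + 2 + 2 = 6 dims.
W = [[0.0] * 6 for _ in ACTIONS]
W[0][4] = 1.0
W[5][5] = 1.0
b = [0.0] * len(ACTIONS)

text_feat = [0.5, -0.5]
observations = [
    ([0.2, 0.1], [1.0, 0.0]),
    ([0.3, 0.1], [1.0, 0.0]),
    ([0.5, 0.2], [0.0, 1.0]),
    ([0.9, 0.9], [1.0, 0.0]),  # never reached: the policy stops at step 3
]
print(navigate(text_feat, observations, W, b))  # ['forward', 'forward', 'stop']
```

The hand-set weights only make the example deterministic; in a real system the mapping from fused features to actions would come from training, which the abstract does not describe.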