Secondary retrieval-based method and apparatus for cross-modal image and text retrieval, device, and medium

A secondary (two-stage) retrieval method fuses the image features returned by a first query with the original retrieval feature. This addresses the lack of image-text interaction in feature-based retrieval and improves retrieval accuracy while keeping retrieval efficient.

WO2026124054A1 · PCT designated stage · Publication Date: 2026-06-18 · SHENZHEN INTELLIFUSION TECHNOLOGIES CO LTD +2

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SHENZHEN INTELLIFUSION TECHNOLOGIES CO LTD
Filing Date
2025-11-05
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

Existing feature-based cross-modal image and text retrieval methods lack interaction between image and text, which results in low retrieval accuracy. The problem to be solved is how to introduce image-text interaction into feature-based retrieval so that retrieval accuracy improves.

Method used

The secondary-retrieval method first obtains a first retrieval feature and uses it to query the database, yielding N first image features. These N image features are fused with the first retrieval feature to produce a second retrieval feature. The second retrieval feature is then used for a second feature query, so that the final result reflects image-text interaction fusion.
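
The two-stage flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the patent text here does not specify the similarity measure or the fusion operator, so cosine similarity and a weighted average (with a hypothetical `alpha` weight) are stand-ins, and all function names, image names, and feature values are made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query(database, feature, k):
    """Return the k (image_id, image_feature) pairs most similar to `feature`.

    `database` maps each image id to its stored image feature, mirroring the
    feature-to-image mapping the method relies on.
    """
    ranked = sorted(
        database.items(),
        key=lambda item: cosine(feature, item[1]),
        reverse=True,
    )
    return ranked[:k]


def fuse(retrieval_feature, image_features, alpha=0.5):
    """Assumed fusion: blend the query with the mean of the N image features."""
    dim = len(retrieval_feature)
    mean = [
        sum(f[i] for f in image_features) / len(image_features)
        for i in range(dim)
    ]
    return [alpha * q + (1 - alpha) * m for q, m in zip(retrieval_feature, mean)]


def secondary_retrieval(database, first_feature, n=2, alpha=0.5):
    """First query -> fuse N hits with the query -> second query."""
    first_hits = query(database, first_feature, n)
    second_feature = fuse(first_feature, [f for _, f in first_hits], alpha)
    target_id, _ = query(database, second_feature, 1)[0]
    return target_id, second_feature


# Toy database: image id -> image feature (values are illustrative only).
database = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.7, 0.6, 0.1],
    "car.jpg": [0.0, 0.2, 0.9],
}
text_feature = [1.0, 0.3, 0.0]  # stand-in for an encoded text query
target, second = secondary_retrieval(database, text_feature, n=2)
print(target, second)
```

Both queries run against the same feature database, so the second pass costs one extra fusion step and one extra lookup compared with single-pass feature retrieval.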

Benefits of technology

By employing a two-stage retrieval method, the impact of differences between image and text modalities is reduced, retrieval accuracy is improved, and high efficiency is maintained.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN2025132684_18062026_PF_FP_ABST
Patent Text Reader

Abstract

The present application relates to the technical field of image retrieval, and in particular to a secondary retrieval-based method and apparatus for cross-modal image and text retrieval, a device, and a medium. The method comprises: obtaining a first retrieval feature and using it to perform feature querying on a database to be queried, to obtain N first image features, where the database to be queried stores a mapping relationship between the image features of at least one image and the corresponding image; fusing the N first image features with the first retrieval feature, to obtain a fusion result that serves as a second retrieval feature; and performing feature querying on the database to be queried by using the second retrieval feature, to obtain a retrieval target image. Performing two retrievals improves retrieval accuracy. In the second retrieval, interactive image-text fusion is performed between the images returned by the first retrieval and the retrieval requirement. Although image and text are different modalities, the fusion reduces the impact of the modality difference, thereby improving the final retrieval accuracy.