
Layered supervision cross-modal image-text retrieval method

A cross-modal image-text retrieval technology, applied in the field of hierarchically supervised cross-modal image-text retrieval, to achieve the effect of improved retrieval accuracy

Pending Publication Date: 2022-03-11
GUILIN UNIV OF ELECTRONIC TECH

AI Technical Summary

Problems solved by technology

[0010] To remedy the above deficiencies in the prior art, the present invention provides a hierarchically supervised cross-modal image-text retrieval method, which realizes hierarchically supervised retrieval of cross-modal data and improves the efficiency of cross-modal retrieval.



Examples


Embodiment 1

[0070] As shown in Figure 1 and Figure 2, a hierarchically supervised cross-modal image-text retrieval method includes the following steps:

[0071] S1: Construct a feature extraction network for extracting image features and text features;

[0072] S2: Use the feature extraction network to extract image and text features, obtaining the preliminary high-dimensional feature values of the image and the text respectively;

[0073] S3: In the feature extraction stage, with the feature extraction network as the generator and a modality adversarial network as the discriminator, input the preliminary high-dimensional feature values of the images and texts generated by the feature extraction network into the modality adversarial network for adversarial learning, so that different modalities with the same semantics are closest in the common space;

[0074] S4: Construct a hash code generation network, and use it to constrain the last fully connected layer of the feature extraction network, so that the preliminary high-dimensional feature values of the image and the text passing through that layer generate optimal hash codes, realizing cross-modal data retrieval.
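The four steps above can be sketched as a minimal two-branch network that maps both modalities into a common space and quantizes the last layer's output into binary codes. This is an illustrative reconstruction only: the layer sizes, the branch names (`mlp_branch`), and the use of plain NumPy with random weights are assumptions, not the patent's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_branch(x, w1, w2):
    """One feature-extraction branch: input -> hidden (ReLU) -> common space."""
    h = np.maximum(x @ w1, 0.0)   # S1/S2: preliminary high-dimensional features
    return h @ w2                 # last fully connected layer (constrained in S4)

# Hypothetical dimensions: 512-d image features, 300-d text features, 64-bit codes.
w_img1, w_img2 = rng.normal(size=(512, 256)), rng.normal(size=(256, 64))
w_txt1, w_txt2 = rng.normal(size=(300, 256)), rng.normal(size=(256, 64))

img = rng.normal(size=(5, 512))   # a small batch of image features
txt = rng.normal(size=(5, 300))   # the paired text features

img_embed = mlp_branch(img, w_img1, w_img2)   # common-space embeddings (input to S3)
txt_embed = mlp_branch(txt, w_txt1, w_txt2)

# S4: the hash code generation step quantizes the last layer's output to {-1, +1}.
img_code = np.sign(img_embed)
txt_code = np.sign(txt_embed)
print(img_code.shape)
```

In a trained system the weights would be learned under the adversarial (S3) and hash (S4) constraints; here random weights only demonstrate the data flow and code shapes.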

Embodiment 2

[0137] A computer system, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, it implements the following method steps:

[0138] S1: Construct a feature extraction network for extracting image features and text features;

[0139] S2: Use the feature extraction network to extract image and text features, obtaining the preliminary high-dimensional feature values of the image and the text respectively;

[0140] S3: In the feature extraction stage, construct a modality adversarial network, and input the preliminary high-dimensional feature values of the images and texts into the modality adversarial network for adversarial learning, so that different modalities with the same semantics are closest in the common space;

[0141] S4: Construct a hash code generation network, and use it to constrain the last fully connected layer of the feature extraction network, so that the preliminary high-dimensional feature values of the image and the text passing through that layer generate optimal hash codes, realizing cross-modal data retrieval.
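Step S3's adversarial scheme can be illustrated with a tiny logistic discriminator that tries to tell image embeddings from text embeddings: the discriminator minimizes a binary cross-entropy loss, while the feature extractor (the generator) is trained to maximize it, pushing the two modalities' distributions together in the common space. All names, sizes, and the linear discriminator form below are illustrative assumptions, not the patent's modality adversarial network.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_loss(img_embed, txt_embed, w):
    """Binary cross-entropy of a linear modality discriminator.

    Label 1 = image, 0 = text. The discriminator minimizes this loss;
    the feature extractor maximizes it, so that embeddings of the two
    modalities become indistinguishable in the common space.
    """
    p_img = sigmoid(img_embed @ w)          # probability "this is an image"
    p_txt = sigmoid(txt_embed @ w)
    eps = 1e-9                              # numerical safety for log()
    return -(np.mean(np.log(p_img + eps))
             + np.mean(np.log(1.0 - p_txt + eps))) / 2.0

d = 64
w = rng.normal(size=d) * 0.01               # near-random discriminator
img_embed = rng.normal(size=(8, d))
txt_embed = rng.normal(size=(8, d))

loss_d = discriminator_loss(img_embed, txt_embed, w)   # discriminator's objective
loss_g = -loss_d                                       # encoder's adversarial term
print(round(loss_d, 4))
```

With a near-random discriminator the predicted probabilities sit near 0.5, so the loss starts close to ln 2 ≈ 0.693; alternating minimization of `loss_d` and `loss_g` is the standard adversarial training loop.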

Embodiment 3

[0143] A computer-readable storage medium, on which a computer program is stored, wherein when the computer program is executed by a processor, it implements the following method steps:

[0144] S1: Construct a feature extraction network for extracting image features and text features;

[0145] S2: Use the feature extraction network to extract image and text features, obtaining the preliminary high-dimensional feature values of the image and the text respectively;

[0146] S3: In the feature extraction stage, construct a modality adversarial network, and input the preliminary high-dimensional feature values of the images and texts into the modality adversarial network for adversarial learning, so that different modalities with the same semantics are closest in the common space;

[0147] S4: Construct a hash code generation network, and use it to constrain the last fully connected layer of the feature extraction network, so that the preliminary high-dimensional feature values of the image and the text passing through that layer generate optimal hash codes, realizing cross-modal data retrieval.



Abstract

The invention discloses a hierarchically supervised cross-modal image-text retrieval method. The method comprises the following steps: S1, constructing a feature extraction network for extracting image features and text features; S2, extracting image and text features with the feature extraction network, obtaining the preliminary high-dimensional feature values of the image and the text respectively; S3, constructing a modality adversarial network, and inputting the preliminary high-dimensional feature values of the image and the text into the modality adversarial network for adversarial learning, so that different modalities with the same semantics are closest in the common space; and S4, constructing a hash code generation network, and using the hash code generation network to constrain the last fully connected layer of the feature extraction network, so that the preliminary high-dimensional feature values of the image and the text passing through that layer generate optimal hash codes, realizing cross-modal data retrieval. The method realizes hierarchically supervised retrieval of cross-modal data and improves cross-modal retrieval efficiency.
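Once both modalities share binary hash codes, cross-modal retrieval reduces to Hamming-distance ranking, which is why the hash step makes retrieval efficient. A minimal sketch with toy 8-bit codes (the data and the `hamming` helper are illustrative, not from the patent):

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between {-1,+1} code matrices a (queries) and b (database)."""
    # For +/-1 codes: distance = (number_of_bits - dot_product) / 2.
    return (a.shape[1] - a @ b.T) // 2

# Toy database of three 8-bit text codes and one image query code.
txt_codes = np.array([
    [ 1,  1, -1, -1,  1, -1,  1, -1],
    [-1, -1,  1,  1, -1,  1, -1,  1],
    [ 1,  1,  1, -1,  1, -1,  1, -1],
])
img_query = np.array([[1, 1, -1, -1, 1, -1, 1, -1]])

dists = hamming(img_query, txt_codes)[0]   # distance to each text code
ranking = np.argsort(dists)                # nearest text first
print(dists.tolist(), ranking.tolist())    # -> [0, 8, 1] [0, 2, 1]
```

Because Hamming distance over packed binary codes can be computed with bitwise XOR and popcount, ranking a large database is far cheaper than comparing real-valued features.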

Description

Technical Field

[0001] The present invention relates to the technical field of cross-modal image-text retrieval, and more specifically to a hierarchically supervised cross-modal image-text retrieval method.

Background Technique

[0002] With the rapid development of the Internet and the Internet of Things, a large amount of valuable multimodal data has been generated. Finding relevant multimodal information quickly and efficiently in massive data is extremely important, which gives cross-modal retrieval both practical application scenarios and research significance.

[0003] Most existing cross-modal retrieval methods target non-hierarchical supervision information and cannot fully mine the rich semantic information of labels. However, in many real-world application scenarios, the label supervision information of cross-modal data often has a hierarchical structure and contains rich semantic information. Therefore, it is extremely important for the field of ...

Claims


Application Information

IPC(8): G06F16/583; G06F16/33; G06N3/04; G06N3/08
CPC: G06F16/583; G06F16/334; G06N3/084; G06N3/048; G06N3/045
Inventors: 陈锐东, 强保华, 陶林, 郑虹, 孙苹苹, 张世豪
Owner: GUILIN UNIV OF ELECTRONIC TECH