Hierarchically supervised cross-modal image-text retrieval method
A cross-modal image-text retrieval technique, applied in the field of hierarchically supervised cross-modal image-text retrieval, that improves retrieval accuracy.
Examples
Embodiment 1
[0070] As shown in Figure 1 and Figure 2, a hierarchically supervised cross-modal image-text retrieval method includes the following steps:
[0071] S1: Construct a feature extraction network for extracting image features and text features;
[0072] S2: Use the feature extraction network to extract image and text features, and obtain the preliminary high-dimensional feature values of the image and text respectively;
[0073] S3: In the feature extraction stage, with the feature extraction network as the generator and the modality adversarial network as the discriminator, input the preliminary high-dimensional feature values of images and texts generated by the feature extraction network into the modality adversarial network for adversarial learning, so that different modalities with the same semantics are as close as possible in the common space;
[0074] S4: Construct a hash code generation network, and use the hash code generation network to constrain the last fully connec...
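The adversarial objective described in S3 can be illustrated with a minimal sketch. This is not the patent's network: the linear discriminator, the function names (`predict_image_prob`, `discriminator_loss`, `generator_loss`), the two-dimensional toy features, and the choice of binary cross-entropy are all assumptions made for illustration. The discriminator tries to tell image features from text features, while the feature extractor (acting as the generator) is rewarded when the discriminator fails.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_image_prob(feature, weights, bias):
    # Hypothetical linear discriminator: probability that a feature
    # vector came from the image branch rather than the text branch.
    z = sum(w * x for w, x in zip(weights, feature)) + bias
    return sigmoid(z)


def discriminator_loss(image_feats, text_feats, weights, bias, eps=1e-12):
    # Binary cross-entropy with images labeled 1 and texts labeled 0
    # (an assumed loss; the source does not specify one in this section).
    total = 0.0
    for f in image_feats:
        total -= math.log(predict_image_prob(f, weights, bias) + eps)
    for f in text_feats:
        total -= math.log(1.0 - predict_image_prob(f, weights, bias) + eps)
    return total / (len(image_feats) + len(text_feats))


def generator_loss(image_feats, text_feats, weights, bias):
    # The feature extractor plays the opposite game: it lowers its own
    # loss by raising the discriminator's loss.
    return -discriminator_loss(image_feats, text_feats, weights, bias)


# Toy discriminator parameters (illustrative values only).
weights, bias = [3.0, 0.0], 0.0

# Image and text features far apart: easy for the discriminator.
separated_loss = discriminator_loss([[2.0, 0.0]], [[-2.0, 0.0]], weights, bias)

# Image and text features identical: the discriminator cannot separate them.
aligned_loss = discriminator_loss([[0.5, 0.5]], [[0.5, 0.5]], weights, bias)

print(separated_loss, aligned_loss)
```

When the features of the two modalities coincide, no discriminator can do better than chance, so its loss stays at or above log 2. That is the state the adversarial learning in S3 pushes toward for image-text pairs with the same semantics.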
Embodiment 2
[0137] A computer system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein, when the processor executes the computer program, the following method steps are performed:
[0138] S1: Construct a feature extraction network for extracting image features and text features;
[0139] S2: Use the feature extraction network to extract image and text features, and obtain the preliminary high-dimensional feature values of the image and text respectively;
[0140] S3: In the feature extraction stage, construct a modality adversarial network, and input the preliminary high-dimensional feature values of images and texts into the modality adversarial network for adversarial learning, so that different modalities with the same semantics are as close as possible in the common space;
[0141] S4: Construct a hash code generation network, and use the hash code generation network to constrain the last...
Embodiment 3
[0143] A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the following method steps are performed:
[0144] S1: Construct a feature extraction network for extracting image features and text features;
[0145] S2: Use the feature extraction network to extract image and text features, and obtain the preliminary high-dimensional feature values of the image and text respectively;
[0146] S3: In the feature extraction stage, construct a modality adversarial network, and input the preliminary high-dimensional feature values of images and texts into the modality adversarial network for adversarial learning, so that different modalities with the same semantics are as close as possible in the common space;
[0147] S4: Construct a hash code generation network, and use the hash code generation network to constrain the last fully connected layer of the feature extraction...
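The text of S4 is truncated here, so the following sketch does not reproduce the patent's hash code generation network. It only illustrates, under stated assumptions, how hash codes are commonly used for cross-modal retrieval: real-valued outputs are binarized by sign, and database items are ranked by Hamming distance to the query code. The function names (`to_hash_code`, `hamming_distance`, `retrieve`), the 4-bit code length, and the toy feature values are hypothetical.

```python
def to_hash_code(features):
    # Assumed binarization: non-negative outputs map to +1, negative to -1.
    return [1 if x >= 0 else -1 for x in features]


def hamming_distance(code_a, code_b):
    # Number of positions at which two equal-length codes differ.
    if len(code_a) != len(code_b):
        raise ValueError("hash codes must have the same length")
    return sum(1 for a, b in zip(code_a, code_b) if a != b)


def retrieve(query_code, database_codes, top_k=3):
    # Rank database indices by Hamming distance to the query, closest first.
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )
    return ranked[:top_k]


# Toy real-valued outputs standing in for a text query and three images.
text_query = [0.9, -0.4, 0.2, -0.7]
image_outputs = [
    [0.8, -0.1, 0.3, -0.9],
    [-0.5, 0.6, -0.2, 0.4],
    [0.7, 0.2, 0.1, -0.3],
]

query_code = to_hash_code(text_query)
image_codes = [to_hash_code(v) for v in image_outputs]
top_matches = retrieve(query_code, image_codes, top_k=2)
print(top_matches)
```

Because images and texts are mapped into the same code space, a text query can be compared directly against image codes, and the comparison reduces to counting differing bits.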