A visual positioning method based on a diverse identification candidate box generation network

A visual positioning (visual grounding) and candidate-box technology in the field of deep neural networks, which addresses the problem of high computational complexity.

Active Publication Date: 2019-05-03
HANGZHOU DIANZI UNIV

AI Technical Summary

Problems solved by technology

In addition, feature fusion models based on the bilinear model have performed very well in many fields, such as...

Embodiment Construction

[0110] The detailed parameters of the present invention are described in further detail below.

[0111] Step (1): training the diversified and discriminative candidate box generation network (Diversified and Discriminative Proposal Networks, DDPN).

[0112] Faster R-CNN (an object detection algorithm) is used, extended with an additional prediction of object attribute values, as shown in Figure 1. The network is trained on the Visual Genome dataset until it converges; the converged network is referred to as the DDPN network.
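
The idea of attaching an attribute branch to a Faster R-CNN style detector can be sketched roughly as follows (a minimal illustration only; the class name, layer sizes, and attribute count are assumptions, not the patent's actual implementation):

    import torch
    import torch.nn as nn

    class AttributeHead(nn.Module):
        # Hypothetical attribute-prediction branch attached to the detector's
        # per-region (ROI) features, alongside the usual classification and
        # box-regression heads of a Faster R-CNN style network.
        def __init__(self, feat_dim=2048, num_attributes=400):
            super().__init__()
            self.fc = nn.Linear(feat_dim, num_attributes)

        def forward(self, roi_feats):
            # roi_feats: (num_rois, feat_dim) pooled region features
            return self.fc(roi_feats)  # attribute logits, (num_rois, num_attributes)

    # Example: score attributes for 100 region features of dimension 2048.
    head = AttributeHead()
    logits = head(torch.randn(100, 2048))  # -> shape (100, 400)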

[0113] Step (2): extracting features from the image with the DDPN network, specifically as follows:

[0114] 2-1. The DDPN network is used to predict 100 candidate boxes in the input image.

[0115] 2-2. The image regions corresponding to the 100 candidate boxes are input into the DDPN network, and the output of the Pool5 layer is extracted as the feature pf of each candidate box; the features corresponding to all the...
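
A rough sketch of steps 2-1 and 2-2, with a generic CNN backbone standing in for the patent's DDPN network (the ResNet, the crop-and-resize pooling, and the use of the final pooled feature as a "Pool5"-style output are simplifying assumptions):

    import torch
    import torch.nn.functional as F
    import torchvision

    # Stand-in backbone: its final pooled 2048-d output plays the role of the
    # "Pool5" feature mentioned in the text (an assumption, not the DDPN network).
    backbone = torchvision.models.resnet101(weights=None)
    backbone.fc = torch.nn.Identity()
    backbone.eval()

    def extract_region_features(image, boxes, out_size=224):
        # image: (3, H, W) float tensor; boxes: (N, 4) tensor of [x1, y1, x2, y2],
        # assumed to lie inside the image and have nonzero area.
        feats = []
        for x1, y1, x2, y2 in boxes.round().long().tolist():
            crop = image[:, y1:y2, x1:x2].unsqueeze(0)              # (1, 3, h, w)
            crop = F.interpolate(crop, size=(out_size, out_size),
                                 mode="bilinear", align_corners=False)
            with torch.no_grad():
                feats.append(backbone(crop))                        # (1, 2048)
        # Concatenate the per-box features into one matrix, e.g. (100, 2048).
        return torch.cat(feats, dim=0)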

Abstract

The invention discloses a visual positioning method based on a diverse identification candidate box generation network. The method comprises the following steps: 1, training a diversified and discriminative candidate box generation network (DDPN); 2, extracting features of the image with the trained DDPN network; 3, extracting text data features; 4, constructing the target vector and the target value of the regression box; 5, constructing a deep neural network; 6, setting the loss function; 7, training the model; 8, calculating the network prediction value. The algorithm provided by the invention, and in particular the DDPN-based image feature extraction, achieves a significant improvement on the image visual grounding task and substantially exceeds all current mainstream methods on that task. In addition, the feature extraction algorithm has very important application value and great potential in other cross-modal fields such as image content question answering and image description.
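
The scoring stage implied by steps 5 through 8 could look roughly like the following (a self-contained toy sketch; the elementwise-product fusion, the hidden sizes, and the GroundingHead name are assumptions, not the architecture disclosed by the patent):

    import torch
    import torch.nn as nn

    class GroundingHead(nn.Module):
        # Schematic scoring module: fuses each region feature with the query
        # feature and outputs a matching score plus box-regression offsets.
        def __init__(self, region_dim=2048, text_dim=1024, hidden=512):
            super().__init__()
            self.proj_r = nn.Linear(region_dim, hidden)
            self.proj_t = nn.Linear(text_dim, hidden)
            self.score = nn.Linear(hidden, 1)
            self.bbox = nn.Linear(hidden, 4)

        def forward(self, region_feats, text_feat):
            # region_feats: (N, region_dim); text_feat: (text_dim,)
            fused = self.proj_r(region_feats) * self.proj_t(text_feat)  # (N, hidden)
            return self.score(fused).squeeze(-1), self.bbox(fused)      # (N,), (N, 4)

    # At inference (step 8), pick the highest-scoring candidate box.
    head = GroundingHead()
    scores, offsets = head(torch.randn(100, 2048), torch.randn(1024))
    best = scores.argmax()  # index of the predicted box among the 100 candidates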

Description

technical field

[0001] The present invention relates to a deep-neural-network-based algorithm for the problem of image visual grounding (Visual Grounding), and in particular to an image feature extraction method based on Diversified and Discriminative Proposal Networks (DDPN) and a deep neural network architecture for the image visual grounding problem.

Background technique

[0002] Visual grounding is a sub-task in the field of "cross-media". "Cross-media" is a research direction at the intersection of computer vision and natural language processing that aims to bridge the "semantic gap" between different media (such as images and text) and to establish a unified semantic representation. Several currently active research directions have been derived from the theory of unified cross-media representation, such as natural description generation (Image Captioning), image-text cross-media retrieval (Image-Text Cross-media Retrieval), automatic question answering...

Claims

Application Information

IPC(8): G06T7/00
Inventor: 俞俊, 余宙, 项晨钞
Owner HANGZHOU DIANZI UNIV