Semantic vision positioning method and device based on multi-modal graph convolutional network

A multi-modal graph convolutional network technique in the computer field. It addresses a shortcoming of existing methods: they do not explore the semantic relationships between noun phrases, which limits them and makes accurate visual positioning difficult to obtain. The technique improves task performance, alleviates the impact of ambiguous semantic elements, and yields more accurate visual positions.

Active Publication Date: 2020-10-16
BEIJING SHENRUI BOLIAN TECH CO LTD +1
Cites: 5; Cited by: 5

AI Technical Summary

Problems solved by technology

Generally speaking, methods that do not model the semantic relationships between noun phrases often fail to achieve satisfactory visual positioning when handling ambiguous or vague semantic elements. Fine-grained semantic relationship modeling of noun phrases is therefore needed, so that it can be used in semantic visual localization guided by semantic structure information.
[0003] Existing solutions mainly focus on fusing visual features with corpus features, reconstructing the corpus from its corresponding visual regions, or coarsely combining semantic context information. They do not explore the semantic relationship information between noun phrases in the corpus, and are therefore limited when handling semantic elements whose visual position must be inferred by combining context, semantic relations, and similar information.


Examples


Embodiment Construction

[0024] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for a more thorough understanding of the present disclosure and to fully convey its scope to those skilled in the art.

[0025] The core of the present invention lies in constructing a semantic structure graph by parsing the corpus, and learning and extracting multi-modal features under the guidance of semantic information, thereby improving the performance of semantic visual positioning tasks. The method parses the corpus input to construct a semantic structure graph based on semantic information, and uses multimodal features that combine visual features, ...
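The paragraph above describes building a semantic structure graph from the parsed corpus and then propagating information over it. The text does not specify the parser, the graph layout, or the propagation rule, so the following is only a minimal sketch under stated assumptions: the parser output is taken to be hypothetical (subject phrase, relation, object phrase) triples, noun phrases become nodes, relation labels are reduced to undirected edges, and the standard symmetric-normalized graph convolution H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) is swapped in as a stand-in propagation rule. All function names are hypothetical.

```python
import math


def build_semantic_graph(triples):
    """Index noun-phrase nodes and collect undirected relation edges.

    `triples` stands in for a hypothetical parser output of
    (subject phrase, relation, object phrase) tuples; relation labels
    are ignored here and only connectivity is kept.
    """
    nodes, index, edges = [], {}, set()
    for subj, _relation, obj in triples:
        for phrase in (subj, obj):
            if phrase not in index:
                index[phrase] = len(nodes)
                nodes.append(phrase)
        i, j = index[subj], index[obj]
        if i != j:
            edges.add((min(i, j), max(i, j)))
    return nodes, sorted(edges)


def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list of lists."""
    a = [[1.0 if r == c else 0.0 for c in range(num_nodes)] for r in range(num_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[r][c] / math.sqrt(deg[r] * deg[c]) for c in range(num_nodes)]
            for r in range(num_nodes)]


def matmul(x, y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y))) for c in range(len(y[0]))]
            for r in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One propagation step: H' = ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


# Toy usage: two relations sharing the phrase "the man".
triples = [("the man", "holding", "an umbrella"), ("the man", "next to", "a dog")]
nodes, edges = build_semantic_graph(triples)
a_hat = normalized_adjacency(len(nodes), edges)
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy per-node multimodal features
w = [[1.0, 0.0], [0.0, 1.0]]              # identity weights, for illustration only
h_next = gcn_layer(a_hat, h, w)
```

Here "the man" becomes node 0 with edges to "an umbrella" and "a dog", so after one step its feature mixes in both neighbors, while each neighbor only mixes in node 0. Stacking several such layers would let relation information travel further along the graph.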


Abstract

The invention provides a semantic visual positioning method and device based on a multi-modal graph convolutional network. The method comprises the following steps: obtaining an input picture and a corpus description; extracting multi-scale visual features of the input picture with a convolutional neural network, and encoding and embedding spatial coordinate information to obtain spatially aware visual features; parsing the corpus description to construct a semantic structure graph, encoding the word vector of each node in the semantic structure graph, and learning graph node semantic features through a multilayer perceptron; fusing the spatially aware visual features with the graph node semantic features to obtain multi-modal features for each node in the semantic structure graph; propagating relationship information between nodes in the semantic structure graph through a graph convolutional network, thereby learning visual semantic relationships under the guidance of semantic relationships; and performing semantic visual position reasoning to obtain the visual position of the semantic information. The method combines contextual semantic information when processing ambiguous semantic elements, and can use semantic relationship information to guide visual positioning.
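The abstract names the steps (spatially aware visual features, MLP node features, fusion, position reasoning) but not their exact form. The sketch below illustrates one plausible reading on a toy grid, and every concrete choice in it is an assumption rather than the patented design: an 8-dimensional normalized cell-coordinate code appended to each L2-normalized CNN cell vector, a small MLP that projects a node's word vector into the same dimension, element-wise product fusion, and scoring each cell by the sum of its fused vector. Function names are hypothetical, and the graph convolution step between fusion and reasoning is omitted here.

```python
import math


def spatial_code(row, col, height, width):
    """Assumed 8-d coordinate code for grid cell (row, col), normalized to [0, 1]:
    x_min, y_min, x_max, y_max, x_center, y_center, 1/width, 1/height."""
    x0, y0 = col / width, row / height
    x1, y1 = (col + 1) / width, (row + 1) / height
    return [x0, y0, x1, y1, (x0 + x1) / 2, (y0 + y1) / 2, 1 / width, 1 / height]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spatial_visual_features(feature_map):
    """Concatenate each L2-normalized CNN cell vector with its coordinate code."""
    height, width = len(feature_map), len(feature_map[0])
    return [[l2_normalize(feature_map[r][c]) + spatial_code(r, c, height, width)
             for c in range(width)] for r in range(height)]


def mlp(x, layers):
    """Small MLP; `layers` is a list of (weights[in][out], bias[out]) pairs."""
    for k, (weights, bias) in enumerate(layers):
        x = [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
             for j in range(len(bias))]
        if k < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers only
    return x


def fuse(visual_cell, node_feature):
    """Element-wise product fusion (an assumed choice for this sketch)."""
    return [v * s for v, s in zip(visual_cell, node_feature)]


def locate(visual, node_feature):
    """Score each cell by the sum of its fused vector; return best box and score."""
    height, width = len(visual), len(visual[0])
    best, best_score = None, -math.inf
    for r in range(height):
        for c in range(width):
            score = sum(fuse(visual[r][c], node_feature))
            if score > best_score:
                best, best_score = (r, c), score
    r, c = best
    return spatial_code(r, c, height, width)[:4], best_score


# Toy usage: a 2x2 grid of 2-d CNN features and one node's 2-d word vector.
feature_map = [[[1.0, 0.0], [0.0, 3.0]], [[1.0, 1.0], [2.0, 0.0]]]
visual = spatial_visual_features(feature_map)  # each cell: 2 + 8 = 10 dims
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # hidden layer (identity weights)
    ([[1.0 if j == i else 0.0 for j in range(10)] for i in range(2)], [0.0] * 10),
]
node_feature = mlp([0.2, 0.9], layers)  # projected to the 10-d visual space
box, score = locate(visual, node_feature)
```

With these toy values the cell in the first row, second column scores highest (0.9), so the predicted normalized box is [0.5, 0.0, 1.0, 0.5]. In the method as the abstract describes it, the fused node features would first pass through graph convolution layers so that related phrases can influence each other's positions before this reasoning step.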

Description

Technical Field

[0001] The present invention relates to the field of computers, and in particular to a semantic visual positioning method and device based on a multimodal graph convolutional network.

Background

[0002] Enabling communication between humans and machines in the real world, so that machines can understand visual scenes described in natural language, is a basic but very challenging problem in artificial intelligence. The foundation of this problem is enabling the machine to locate semantic elements in a visual scene: given a natural language description of a visual scene, the machine must be able to locate the corresponding semantic elements in that scene. In recent years, the task of semantic visual localization has received extensive attention and developed rapidly, achieving excellent performance. However, existing solutions locate noun phrases in the picture one by one and do not model the semantic relationship bet...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F40/289; G06F40/30; G06F40/126; G06K9/62; G06N3/04
CPC: G06F40/289; G06F40/30; G06F40/126; G06N3/045; G06F18/22; G06F18/25
Inventors: 俞益洲 (Yu Yizhou), 史业民 (Shi Yemin), 杨思蓓 (Yang Sibei), 吴子丰 (Wu Zifeng)
Owner BEIJING SHENRUI BOLIAN TECH CO LTD