
A visual positioning method, device, equipment and medium

A visual positioning and normalization technology, applied in image enhancement, image analysis, instruments, and related fields, that addresses the difficulty of finding and locating the described object when the input text contains errors.

Active Publication Date: 2022-07-08
SUZHOU METABRAIN INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

Under normal circumstances, slips of the tongue, subjective deviation when describing objects, ambiguous sentences, and similar causes lead to errors in the text. Such errors are very common in daily life but are easily overlooked during AI algorithm design, which has become an obstacle between existing methods and their practical deployment.
In short, when there are some errors in the input text, it is difficult for existing methods to find and locate the object that the sentence itself wants to describe.



Examples


Embodiment Construction

[0082] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

[0083] In the visual localization task, when there are some errors in the input text, it is difficult for existing methods to find and locate the object that the sentence itself wants to describe.

[0084] To this end, the embodiment of the present application proposes a visual positioning solution, which can avoid the influence of noise caused by human language and text errors on visual positioning, and realize anti-noise visual posi...


Abstract

The application discloses a visual positioning method, device, equipment and medium, and relates to the technical field of artificial intelligence. The method includes: splicing image coding features with text coding features; fusing the spliced coding features to obtain a first fused coding feature; performing noise correction, based on a preset cross-attention mechanism, on the first fused coding feature and on the text coding feature respectively, to obtain a corrected fused feature and a corrected text coding feature; fusing the spliced coding features with the corrected text coding feature to obtain a second fused coding feature; and correcting a preset box feature using a target coding feature determined from the corrected fused feature and the second fused coding feature, so as to predict the region position coordinates of the target visual object. The application thus corrects image and text noise based on a preset cross-attention mechanism, reduces the influence of noise by paying less attention to the noisy part of the text, and achieves anti-noise visual positioning.
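
The data flow in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the excerpt does not specify the fusion operator, the form of the cross-attention correction, how the target coding feature is determined, or the regression head, so the sketch substitutes plain scaled dot-product attention without learned weights, concatenation, and a sigmoid. All function and variable names (`attention`, `residual`, `locate`, `box_query`) are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (output, attention weights)."""
    scale = math.sqrt(len(keys[0]))
    keys_t = [list(col) for col in zip(*keys)]
    scores = matmul(queries, keys_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, values), weights


def residual(x, update):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, update)]


def locate(image_feats, text_feats, box_query):
    # 1. Splice image and text coding features along the token axis.
    spliced = image_feats + text_feats
    # 2. First fusion (assumption: self-attention over the spliced tokens).
    first_fused = residual(spliced, attention(spliced, spliced, spliced)[0])
    # 3. Cross-attention correction of the fused and the text features.
    corrected_fused = residual(
        first_fused, attention(first_fused, text_feats, text_feats)[0])
    corrected_text = residual(
        text_feats, attention(text_feats, first_fused, first_fused)[0])
    # 4. Second fusion of the spliced features with the corrected text.
    second_in = spliced + corrected_text
    second_fused = residual(second_in, attention(second_in, second_in, second_in)[0])
    # 5. Target coding feature (assumption: concatenate both feature sets).
    target = corrected_fused + second_fused
    # 6. Correct the preset box feature by attending to the target feature.
    refined = residual([box_query], attention([box_query], target, target)[0])[0]
    # 7. Stand-in regression head: sigmoid of four dimensions gives
    #    normalized box coordinates in (0, 1).
    return [1.0 / (1.0 + math.exp(-v)) for v in refined[:4]]


random.seed(0)
dim = 8
image_feats = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]
text_feats = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(4)]
box_query = [0.0] * dim  # preset box feature, here simply zeros
box = locate(image_feats, text_feats, box_query)
print(box)
```

In a trained model the attention layers would carry learned projections, so that text tokens judged to be noisy receive small attention weights; the untrained sketch above only shows where those weights enter the pipeline.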

Description

Technical field

[0001] The present invention relates to the technical field of artificial intelligence, and in particular to a visual positioning method, device, equipment and medium.

Background art

[0002] In recent years, multimodality (Multi Modal, MM) has become a very important research direction in the field of artificial intelligence. Because it emphasizes the fusion of visual, text, speech, and other information, multimodal algorithms keep emerging: methods based on Convolutional Neural Networks (CNN) and attention mechanisms have found a wide range of applications and have become the mainstream methods in fields such as Visual Commonsense Reasoning (VCR), Visual Question Answering (VQA), and Visual Grounding (VG).

[0003] The visual localization task is one of the important research directions in the field of multimodal artificial intelligence. The task aims to locate the relevant object in the picture accord...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06T5/00, G06T9/00, G06T7/70
CPC: G06T9/00, G06T7/70, G06T2207/20084, G06T2207/20081, G06T5/70
Inventor: 李晓川, 李仁刚, 赵雅倩, 郭振华, 范宝余
Owner: SUZHOU METABRAIN INTELLIGENT TECH CO LTD