
Visual positioning method, device, equipment and medium

A visual positioning and normalization technology, applicable to image enhancement, image analysis, instrumentation and similar fields, that addresses the difficulty of finding and locating the described object when the input text contains errors.

Active Publication Date: 2022-05-17
SUZHOU LANGCHAO INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

Slips of the tongue, subjective deviation when describing objects, ambiguous descriptive sentences and similar causes commonly lead to errors in the text. Such errors are very common in daily life, but they are easily overlooked when AI algorithms are designed, and this becomes an obstacle between existing methods and practical deployment.
In short, when there are some errors in the input text, it is difficult for existing methods to find and locate the object that the sentence itself wants to describe.

Examples

Embodiment Construction

[0082] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0083] In the visual positioning task, when there are some errors in the input text, it is difficult for existing methods to find and locate the object that the sentence itself wants to describe.

[0084] To this end, the embodiments of the present application propose a visual positioning scheme that avoids the influence on visual positioning of noise caused by errors in human-language text, thereby achieving anti-noise visual positioning.

[0085] The embo...

Abstract

The invention discloses a visual positioning method, device, equipment and medium, and relates to the technical field of artificial intelligence. The method comprises: splicing image coding features with text coding features; performing feature fusion on the spliced coding features to obtain a first fused coding feature; performing noise correction on the first fused coding feature and the text coding feature, based on a preset cross-attention mechanism, to obtain a corrected fused feature and a corrected text coding feature; performing feature fusion on the spliced coding feature and the corrected text coding feature to obtain a second fused coding feature; and correcting a preset frame feature using a target coding feature, determined from the corrected fused feature and the second fused coding feature, to predict the region position coordinates of the target visual object. Image-text noise is thus corrected through the preset cross-attention mechanism, which improves prediction accuracy. Reducing the attention paid to the noisy part of the text weakens the influence of the noise and achieves anti-noise visual positioning.
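The abstract does not spell out how the preset cross-attention mechanism is built, so the following is only a minimal sketch of the general idea, using generic scaled dot-product cross-attention. The feature values, the helper names (`softmax`, `dot`, `cross_attention`) and the final rescaling step are all illustrative assumptions, not the patented method. It shows splicing as concatenation along the token axis, and how a text token that disagrees with the image features receives less attention.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention.

    Each query (from one modality) attends over all keys/values
    (from the other modality). Returns (outputs, attention_weights).
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        weights.append(w)
        outputs.append([sum(wi * v[d] for wi, v in zip(w, values))
                        for d in range(dim)])
    return outputs, weights

# Hypothetical 4-dimensional coding features (toy values, not from the patent).
image_feats = [[1.0, 0.0, 1.0, 0.0],
               [0.9, 0.1, 0.8, 0.0]]
text_feats = [[1.0, 0.0, 0.9, 0.1],   # token consistent with the image
              [0.8, 0.0, 1.0, 0.0],   # token consistent with the image
              [0.0, 1.0, 0.0, 1.0]]   # token standing in for a text error

# "Feature splicing": concatenate image and text tokens into one sequence.
spliced = image_feats + text_feats

# Image features act as queries; text features act as keys and values.
_, weights = cross_attention(image_feats, text_feats, text_feats)

# Average attention each text token receives across all image queries.
token_attention = [sum(w[j] for w in weights) / len(weights)
                   for j in range(len(text_feats))]

# Assumed correction step: rescale each text token by the attention it
# received, so the inconsistent token contributes less downstream.
corrected_text = [[token_attention[j] * x for x in text_feats[j]]
                  for j in range(len(text_feats))]

print(token_attention)
```

In this toy setup the third text token, which points in a different direction from both image features, ends up with the smallest average attention weight, which is the qualitative effect the abstract attributes to the noise correction.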

Description

Technical field

[0001] The present invention relates to the technical field of artificial intelligence, and in particular to a visual positioning method, device, equipment and medium.

Background technique

[0002] In recent years, multimodality (MM) has become a very important research direction in the field of artificial intelligence. Because it emphasizes the fusion of visual, textual, speech and other information, multimodal algorithms have emerged in large numbers. Methods based on convolutional neural networks (CNN) and attention mechanisms each have their own advantages, are widely applicable, and have become mainstream in fields such as Visual Commonsense Reasoning (VCR), Visual Question Answering (VQA), and Visual Grounding (VG).

[0003] The visual localization task is one of the important research directions in the field of multimodal artificial intelligence. This task aims to locate the relevant object in the picture accordi...

Application Information

IPC(8): G06T5/00, G06T9/00, G06T7/70
CPC: G06T9/00, G06T7/70, G06T2207/20084, G06T2207/20081, G06T5/70
Inventor: 李晓川, 李仁刚, 赵雅倩, 郭振华, 范宝余
Owner: SUZHOU LANGCHAO INTELLIGENT TECH CO LTD