Visual positioning method, device, equipment and medium

A visual positioning and normalization technology, used in image enhancement, image analysis, instrumentation, etc., that addresses problems such as errors in the input text and the resulting difficulty of finding and locating the described objects.

Active Publication Date: 2022-05-17

SUZHOU LANGCHAO INTELLIGENT TECH CO LTD

Cites: 2. Cited by: 5.


Problems solved by technology

Human slips of the tongue, subjective deviation when describing objects, ambiguous descriptive sentences, and similar causes commonly lead to errors in the text. Such errors are very common in daily life, but they are easily ignored during AI algorithm design, which becomes an obstacle between existing methods and their practical implementation.

In short, when there are some errors in the input text, it is difficult for existing methods to find and locate the object that the sentence itself wants to describe.

Embodiment Construction

[0082] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0083] In the visual positioning task, when there are some errors in the input text, it is difficult for existing methods to find and locate the object that the sentence itself wants to describe.

[0084] For this reason, the embodiment of the present application proposes a visual positioning scheme, which can avoid the influence of noise generated by human language text errors on visual positioning, and realize anti-noise visual positioning.

[0085] The embo...


Abstract

The invention discloses a visual positioning method, device, equipment and medium, and relates to the technical field of artificial intelligence. The method comprises the steps of: splicing (concatenating) image coding features and text coding features; performing feature fusion on the spliced coding features to obtain a first fused coding feature; performing noise correction on the first fused coding feature and the text coding features based on a preset cross-attention mechanism to obtain a corrected fused feature and corrected text coding features; performing feature fusion on the spliced coding features and the corrected text coding features to obtain a second fused coding feature; and correcting a preset frame feature using a target coding feature, determined from the corrected fused feature and the second fused coding feature, to predict the region position coordinates of the target visual object. Image-text noise is thus corrected by the preset cross-attention mechanism, and prediction accuracy under image-text noise is improved. Reducing the attention paid to the noisy part of the text weakens the influence of the noise and achieves anti-noise visual positioning.
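The splicing and attention-based noise reduction described in the abstract can be illustrated with a minimal sketch. This is not the patented network: the function names (`splice`, `cross_attention_weights`, `correct_text`), the toy feature vectors, and the rule of scaling each text token by the mean attention it receives from the image tokens are all assumptions made for illustration. The sketch only shows how a text token that matches no image region ends up with a small weight.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def splice(image_feats, text_feats):
    # Feature splicing: concatenate the two token sequences into one.
    return image_feats + text_feats

def cross_attention_weights(queries, keys):
    # One row of scaled dot-product attention weights per query token.
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(q_i * k_i for q_i, k_i in zip(q, k)) / scale for k in keys])
        for q in queries
    ]

def correct_text(image_feats, text_feats):
    # Assumption (hypothetical rule): image tokens act as queries over text
    # tokens, and the mean attention a text token receives is treated as its
    # reliability score. Tokens are scaled by that score relative to the best.
    weights = cross_attention_weights(image_feats, text_feats)
    received = [sum(row[j] for row in weights) / len(weights)
                for j in range(len(text_feats))]
    top = max(received)
    corrected = [[x * r / top for x in token]
                 for token, r in zip(text_feats, received)]
    return corrected, received

# Toy data: two image tokens and three text tokens; the last text token is
# orthogonal to every image token, standing in for an erroneous word.
image_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text_feats = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
spliced = splice(image_feats, text_feats)
corrected_text, received = correct_text(image_feats, text_feats)
print(len(spliced), [round(r, 3) for r in received])
```

In this toy setup the third text token receives the least attention, so its corrected feature is scaled down the most. A real model would use learned projections and multi-head attention rather than raw dot products.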

Description

Technical field

[0001] The present invention relates to the technical field of artificial intelligence, in particular to a visual positioning method, device, equipment and medium.

Background art

[0002] In recent years, multimodality (Multi Modal, MM) has become a very important research direction in the field of artificial intelligence. Because it emphasizes the fusion of vision, text, voice and other information, algorithms related to multimodality emerge endlessly: the various methods based on convolutional neural networks (CNN) and attention mechanisms each have their own advantages. With their wide range of applications, they have become mainstream methods in fields such as Visual Commonsense Reasoning (VCR), Visual Question Answering (VQA), and Visual Grounding (VG).

[0003] The visual localization task is one of the important research directions in the field of multimodal artificial intelligence. This task aims to locate the relevant object in the picture accordi...


Application Information


IPC(8): G06T5/00, G06T9/00, G06T7/70

CPC: G06T9/00, G06T7/70, G06T2207/20084, G06T2207/20081, G06T5/70

Inventor: 李晓川, 李仁刚, 赵雅倩, 郭振华, 范宝余

Owner: SUZHOU LANGCHAO INTELLIGENT TECH CO LTD
