Cross-layer multi-model feature fusion and convolutional decoding-based image description method

A feature fusion and image description technology, applicable to neural learning methods, biological neural network models, and still image data retrieval, that addresses problems such as inaccurate description of image content.

Active Publication Date: 2020-10-30

JIANGXI UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] To address the deficiencies of the prior art, the present invention provides an image description method based on cross-layer multi-model feature fusion and convolutional decoding. It solves the problem that existing image description methods produce inaccurate descriptions when the image contains complex information.

Examples

Embodiment 1

[0028] As shown in Figures 1-5, this embodiment of the present invention provides an image description method based on cross-layer multi-model feature fusion and convolutional decoding, comprising the following steps:

[0029] S1. First, in the vision module, low-level and high-level image features are fused across layers within a single model, and the feature maps obtained from multiple visual feature extraction models are then fused by averaging. Each word of the sentence corresponding to the image is mapped into a D_e-dimensional embedding space to obtain the embedding vector sequence, and the final text features are then obtained through 6 layers of causal convolution. Because rich feature information guides the image description results well during visual feature extraction, three VGG16 structures are used as the image visual feature extraction module. At the same time, in order to fuse low-level ...
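The two operations named in S1, average fusion across several visual models and a stack of causal convolutions over the word embeddings, can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the function names, the flattened list representation of feature maps, and the single scalar kernel shared across embedding dimensions are all assumptions made for brevity, and the cross-layer fusion inside each VGG16 is left out because the excerpt does not specify it.

```python
from typing import List

Vector = List[float]


def average_fuse(feature_maps: List[Vector]) -> Vector:
    """Element-wise mean of same-length, flattened feature maps.

    Each entry of feature_maps stands for the output of one visual
    model (hypothetical representation, e.g. one per VGG16 branch).
    """
    n = len(feature_maps)
    return [sum(values) / n for values in zip(*feature_maps)]


def causal_conv1d(seq: List[Vector], kernel: List[float]) -> List[Vector]:
    """Causal 1-D convolution along the time axis.

    The output at step t uses only inputs at steps <= t; positions
    before the start of the sequence are treated as zeros. The same
    scalar kernel is applied to every embedding dimension, which is a
    simplification assumed here, not something stated in the source.
    """
    k = len(kernel)
    dim = len(seq[0])
    out: List[Vector] = []
    for t in range(len(seq)):
        acc = [0.0] * dim
        for j in range(k):
            src = t - (k - 1) + j  # kernel[-1] lines up with step t
            if src < 0:
                continue  # implicit left zero-padding
            for d in range(dim):
                acc[d] += kernel[j] * seq[src][d]
        out.append(acc)
    return out


def stack_causal_layers(seq: List[Vector], kernel: List[float],
                        num_layers: int = 6) -> List[Vector]:
    """Apply causal_conv1d num_layers times (6 in the source text)."""
    for _ in range(num_layers):
        seq = causal_conv1d(seq, kernel)
    return seq
```

Because every layer looks only backwards in time, the text feature at a given word cannot depend on later words, which is what lets a convolutional decoder of this kind generate a sentence one word at a time.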

Embodiment 2

[0046] As shown in Figures 1-7, this embodiment of the present invention provides an image description method based on cross-layer multi-model feature fusion and convolutional decoding. A model is trained using VGG-16 and language-CNN (that is, the language module used in the present invention) and serves as the benchmark model CNN+CNN (Baseline). Then, on the basis of the Baseline, multiple VGG-16 networks are added, cross-layer feature fusion is implemented in each VGG-16, and the model is initialized with the trained benchmark model parameters and retrained. On the MSCOCO dataset, some experimental results are as follows:

[0047] R1: A hamburger and a salad sitting on top of a table.

[0048] R2: A salad and a sandwich wait to be eaten at a restaurant.

R3: An outside dining area with tables and chairs highlighting a salad and sandwich.

[0049] R4: A sandwich and a salad are on a tray on a wooden table.

[0050] R5: A table with a bowl of food, sandwich and wine glass sitting ...

Abstract

The invention provides an image description method based on cross-layer multi-model feature fusion and convolutional decoding, and relates to the fields of computer vision and natural language processing. The method comprises the following steps: S1, obtaining an embedding vector sequence and the final text features; S2, calculating an attention vector that matches and fuses vision and text; S3, fusing the attention vector with the text feature vector by addition; S4, generating a complete description sentence. Cross-layer multi-model feature fusion is used, so the loss of low-level image feature information is effectively compensated. The model can effectively extract and retain semantic information from images with complex backgrounds, can process long word sequences, describes image content more accurately, and expresses richer information.
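Steps S2 and S3 can be illustrated with a small pure-Python sketch. The abstract does not say how the attention scores are computed, so the dot-product scoring below is an assumption, as are the function names and the requirement that text and visual vectors share one dimension. It shows only the general pattern: score each visual region against the text feature, normalize with softmax, take the weighted sum as the attention vector, then add that vector to the text feature.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_and_fuse(text_vec: Vector,
                    regions: List[Vector]) -> Tuple[List[float], Vector, Vector]:
    """Return (weights, attention_vector, fused_vector).

    Scoring is a plain dot product between the text feature and each
    visual region feature (assumed; the source does not specify it).
    """
    scores = [sum(t * r for t, r in zip(text_vec, region))
              for region in regions]
    weights = softmax(scores)
    dim = len(text_vec)
    # S2: attention vector = weighted sum of region features
    attention = [sum(w * region[d] for w, region in zip(weights, regions))
                 for d in range(dim)]
    # S3: additive fusion of the attention vector and the text feature
    fused = [a + t for a, t in zip(attention, text_vec)]
    return weights, attention, fused
```

In a full model the fused vector would feed the prediction layer that produces the next word in S4; that part is omitted here.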

Description

Technical field

[0001] The invention relates to the fields of computer vision and natural language processing, in particular to an image description method based on cross-layer multi-model feature fusion and convolutional decoding.

Background art

[0002] As one of the main carriers of information, images are increasingly shared by people. Getting computers to generate grammatically correct and semantically reasonable natural language sentences from image content is therefore very important. Unlike relatively simple computer vision tasks such as object detection and image classification, image description belongs to higher-level visual understanding. It needs not only to recognize the objects and scenes in the image, but also to express the relationships between objects and between objects and scenes, and the description sentences must meet human standards in both grammar and semantics. The traditional image description met...

Application Information

IPC(8): G06F16/583, G06F16/58, G06F16/55, G06F16/51, G06K9/62, G06N3/04, G06N3/08

CPC: G06F16/583, G06F16/5866, G06F16/55, G06F16/51, G06N3/08, G06N3/045, G06F18/253

Inventor: 罗会兰, 岳亮亮, 陈鸿坤

Owner JIANGXI UNIV OF SCI & TECH
