Image description method based on space-time memory attention

An image description and attention technology, applied to neural learning methods, neural architectures, and biological neural network models. It addresses problems such as the neglect of the temporal nature of language expression, inaccurate acquisition of attention features, and the resulting obstacle to further improving image description quality.

Active Publication Date: 2020-05-12
BEIJING UNIV OF TECH

AI Technical Summary

Problems solved by technology

The model structure based on the attention mechanism is called the attention model. Its introduction gives the decoder the ability to focus on part of the encoded features, i.e., on local regions of the original image, and has driven great progress in image description methods. However, the attention models introduced so far still have problems.

[0005] Language description is generally considered a temporal process, but the attention models currently combined with image description methods only extract spatial image features and ignore the temporal nature of language expression: the attention features obtained by the attention model at each moment are mutually independent across the time series. This differs from the way humans observe things, directly leads to inaccurate acquisition of attention features, and hinders further improvement of the image description effect.
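The per-step independence criticized here can be seen in a minimal sketch of conventional additive soft attention, in which each decoding step recomputes its attention weights from scratch. All names and dimensions below are illustrative assumptions, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_attention(V, h, W_v, W_h, w):
    """Standard additive (soft) attention: scores each of the k image
    regions in V against the decoder state h, independently of any
    previous time step."""
    # V: (k, d) region features, h: (d_h,) decoder hidden state
    e = np.tanh(V @ W_v + h @ W_h) @ w   # (k,) unnormalised scores
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                 # softmax over regions
    return alpha @ V                     # (d,) attended feature

k, d, d_h, d_a = 5, 8, 6, 4
V = rng.normal(size=(k, d))
W_v = rng.normal(size=(d, d_a))
W_h = rng.normal(size=(d_h, d_a))
w = rng.normal(size=(d_a,))

# Two decoding steps: the weights at step t+1 are recomputed from
# scratch -- nothing links them to the attention of step t.
h1, h2 = rng.normal(size=(d_h,)), rng.normal(size=(d_h,))
c1 = soft_attention(V, h1, W_v, W_h, w)
c2 = soft_attention(V, h2, W_v, W_h, w)
```

The only coupling between steps here is the decoder hidden state; the attention computation itself carries no memory, which is the gap the patent's space-time memory attention is designed to close.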




Embodiment Construction

[0025] The following takes the MS COCO image description dataset as an example to illustrate the specific implementation steps of the present invention:

[0026] Step (1) Obtain and preprocess the MS COCO image description dataset:

[0027] Step (1.1) Obtain the MS COCO image description dataset, which contains image data I and its corresponding standard descriptions. The dataset can be downloaded from http://cocodataset.org/#download. It contains 164,062 pictures in total; the training, validation, and test sets contain 82,783, 40,504, and 40,775 pictures, respectively. Except for the test set, each picture also carries at least 5 corresponding standard descriptions as labels; some samples are shown in Figure 1.

[0028] Step (1.2) Preprocess the description data in MS COCO. Set the maximum length of an image description to 16 and replace words with a word frequency of less than 6 with "UNK" to reduce the interference of a few noise words…
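The two preprocessing rules in step (1.2) can be sketched as follows. Whitespace tokenisation and lowercasing are assumptions for illustration; the patent only specifies the length cap of 16 and the frequency threshold of 6:

```python
from collections import Counter

MAX_LEN = 16   # maximum caption length used in the patent
MIN_FREQ = 6   # words with frequency below this become "UNK"

def preprocess(captions):
    """Truncate captions to MAX_LEN tokens and replace rare words
    with "UNK" (a sketch of step (1.2))."""
    tokenised = [c.lower().split()[:MAX_LEN] for c in captions]
    freq = Counter(w for toks in tokenised for w in toks)
    return [[w if freq[w] >= MIN_FREQ else "UNK" for w in toks]
            for toks in tokenised]

# Toy corpus: "cat" is frequent, "xylophone" is rare.
caps = ["a cat sits"] * 6 + ["a cat plays a xylophone"]
out = preprocess(caps)
# The rare words "plays" and "xylophone" are mapped to "UNK".
```

In practice the frequency count would be taken over the training split only, and the surviving words would then be assigned integer indices for the decoder's vocabulary.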



Abstract

The invention discloses an image description method based on spatio-temporal memory attention. The method comprises the steps of: (1) acquiring and preprocessing the MS COCO image description dataset; (2) constructing an encoder model, pre-training it, and encoding the MS COCO image data I to obtain image features V; (3) constructing a decoder and decoding the image features V; and (4) training the model. The model built by the method adopts the gating and memory of the long short-term memory network inside the original attention model. Compared with a traditional attention model, the space-time memory attention model adds a memory matrix used to dynamically store past attention features; the matrix continuously updates itself under the control of an input gate, an output gate, and a forget gate, and finally outputs attention features that are correlated across the time sequence. Based on the STMA model, the method locates image attention positions more accurately, and the image description results are more accurate.
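The abstract describes the STMA update in LSTM terms: a memory holding past attention features, rewritten each step under input, forget, and output gates. The sketch below is a reconstruction of that description, not the patent's exact parameterisation; the weight shapes are illustrative, and the memory is shown as a vector rather than the full memory matrix for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

d = 8  # attention-feature size (illustrative)

# Gate weights acting on the current attention feature a_t and the
# previous memory M -- assumed parameterisation, not from the patent.
W_i, W_f, W_o, W_c = (rng.normal(scale=0.1, size=(2 * d, d))
                      for _ in range(4))

def stma_step(a_t, M):
    """One space-time memory attention update: the memory M stores past
    attention features and is rewritten under input/forget/output
    gates, the way an LSTM cell rewrites its cell state."""
    x = np.concatenate([a_t, M])
    i = sigmoid(x @ W_i)                  # input gate: how much of a_t to write
    f = sigmoid(x @ W_f)                  # forget gate: how much memory to keep
    o = sigmoid(x @ W_o)                  # output gate: what to expose
    M_new = f * M + i * np.tanh(x @ W_c)  # self-updating memory
    return o * np.tanh(M_new), M_new      # temporally linked attention feature

M = np.zeros(d)
att1, M = stma_step(rng.normal(size=d), M)
att2, M = stma_step(rng.normal(size=d), M)  # depends on step-1 memory
```

Unlike the conventional attention model, the feature emitted at step 2 is conditioned on the memory written at step 1, which is what makes the attention features correlated along the time sequence.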

Description

technical field

[0001] The invention belongs to the interdisciplinary research field of computer vision and natural language processing. Specifically, the invention proposes an image description method based on spatio-temporal memory attention.

Background technique

[0002] Image description, in short, is the task of describing the main content of an image in one sentence, which requires the machine both to understand image content and to express that content in a human-like way. Image description is a difficult problem connecting the two research fields of computer vision and natural language processing. Determining the existence, attributes, and relationships of objects in an image is not easy, and describing this information with appropriate sentences makes the task harder still. How to accurately describe image content with fluent sentences is the research goal of the image description field. The research and development of image descr...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/044, G06N3/045, Y02T10/40
Inventors: 徐骋, 冀俊忠, 张晓丹
Owner: BEIJING UNIV OF TECH