Image description method based on space-time memory attention

An image description and attention technology, applied to neural learning methods, neural architectures, and biological neural network models. It addresses problems such as the neglect of the temporal nature of language expression, inaccurate acquisition of attention features, and the resulting obstacle to further improving image description quality.

Active Publication Date: 2020-05-12
BEIJING UNIV OF TECH

AI Technical Summary

Problems solved by technology

The model structure based on the attention mechanism is called the attention model. Its introduction gives the decoder the ability to focus on part of the encoded features, i.e., on local regions of the original image, and has driven great progress in image description methods. However, the attention models introduced so far still have problems.

[0005] Language description is generally considered a temporal process, but the attention models currently combined with image description methods only extract spatial image features and ignore the temporal nature of language expression: the attention features obtained by the attention model at each moment are mutually independent across the time series. This differs from the way humans observe things, directly leads to inaccurate acquisition of attention features, and hinders further improvement of the image description effect.
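The per-step independence criticized here can be seen in a minimal sketch of conventional additive soft attention, in which each decoding step recomputes its attention weights from scratch. All names and dimensions below are illustrative assumptions, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_attention(V, h, W_v, W_h, w):
    """Standard additive (soft) attention: scores each of the k image
    regions in V against the decoder state h, independently of any
    previous time step."""
    # V: (k, d) region features, h: (d_h,) decoder hidden state
    e = np.tanh(V @ W_v + h @ W_h) @ w   # (k,) unnormalised scores
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                 # softmax over regions
    return alpha @ V                     # (d,) attended feature

k, d, d_h, d_a = 5, 8, 6, 4
V = rng.normal(size=(k, d))
W_v = rng.normal(size=(d, d_a))
W_h = rng.normal(size=(d_h, d_a))
w = rng.normal(size=(d_a,))

# Two decoding steps: the weights at step t+1 are recomputed from
# scratch -- nothing links them to the attention of step t.
h1, h2 = rng.normal(size=(d_h,)), rng.normal(size=(d_h,))
c1 = soft_attention(V, h1, W_v, W_h, w)
c2 = soft_attention(V, h2, W_v, W_h, w)
```

The only coupling between steps here is the decoder hidden state; the attention computation itself carries no memory, which is the gap the patent's space-time memory attention is designed to close.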




Embodiment Construction

[0025] The following takes the MS COCO image description dataset as an example to illustrate the specific implementation steps of the present invention:

[0026] Step (1) Obtain and preprocess the MS COCO image description dataset:

[0027] Step (1.1) Obtain the MS COCO image description dataset, which contains image data I and its corresponding standard descriptions. The dataset can be downloaded from http://cocodataset.org/#download. It contains 164,062 pictures in total; the training, validation, and test sets contain 82,783, 40,504, and 40,775 pictures, respectively. Except for the test set, each picture also carries at least 5 corresponding standard descriptions as labels; some samples are shown in Figure 1.

[0028] Step (1.2) Preprocess the description data in MS COCO. Set the maximum length of an image description to 16 and replace words with a word frequency of less than 6 with "UNK" to reduce the interference of a few noise words…
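The two preprocessing rules in step (1.2) can be sketched as follows. Whitespace tokenisation and lowercasing are assumptions for illustration; the patent only specifies the length cap of 16 and the frequency threshold of 6:

```python
from collections import Counter

MAX_LEN = 16   # maximum caption length used in the patent
MIN_FREQ = 6   # words with frequency below this become "UNK"

def preprocess(captions):
    """Truncate captions to MAX_LEN tokens and replace rare words
    with "UNK" (a sketch of step (1.2))."""
    tokenised = [c.lower().split()[:MAX_LEN] for c in captions]
    freq = Counter(w for toks in tokenised for w in toks)
    return [[w if freq[w] >= MIN_FREQ else "UNK" for w in toks]
            for toks in tokenised]

# Toy corpus: "cat" is frequent, "xylophone" is rare.
caps = ["a cat sits"] * 6 + ["a cat plays a xylophone"]
out = preprocess(caps)
# The rare words "plays" and "xylophone" are mapped to "UNK".
```

In practice the frequency count would be taken over the training split only, and the surviving words would then be assigned integer indices for the decoder's vocabulary.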



Abstract

The invention discloses an image description method based on spatio-temporal memory attention. The method comprises the steps of: (1) acquiring and preprocessing the MS COCO image description dataset; (2) constructing an encoder model, pre-training it, and encoding the MS COCO image data I to obtain image features V; (3) constructing a decoder and decoding the image features V; and (4) training the model. The model built by the method adopts the gating and memory of the long short-term memory network inside the original attention model. Compared with a traditional attention model, the space-time memory attention model adds a memory matrix used to dynamically store past attention features; the matrix continuously updates itself under the control of an input gate, an output gate, and a forget gate, and finally outputs attention features that are correlated across the time sequence. Based on the STMA model, the method locates image attention positions more accurately, and the image description results are more accurate.
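The abstract describes the STMA update in LSTM terms: a memory holding past attention features, rewritten each step under input, forget, and output gates. The sketch below is a reconstruction of that description, not the patent's exact parameterisation; the weight shapes are illustrative, and the memory is shown as a vector rather than the full memory matrix for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

d = 8  # attention-feature size (illustrative)

# Gate weights acting on the current attention feature a_t and the
# previous memory M -- assumed parameterisation, not from the patent.
W_i, W_f, W_o, W_c = (rng.normal(scale=0.1, size=(2 * d, d))
                      for _ in range(4))

def stma_step(a_t, M):
    """One space-time memory attention update: the memory M stores past
    attention features and is rewritten under input/forget/output
    gates, the way an LSTM cell rewrites its cell state."""
    x = np.concatenate([a_t, M])
    i = sigmoid(x @ W_i)                  # input gate: how much of a_t to write
    f = sigmoid(x @ W_f)                  # forget gate: how much memory to keep
    o = sigmoid(x @ W_o)                  # output gate: what to expose
    M_new = f * M + i * np.tanh(x @ W_c)  # self-updating memory
    return o * np.tanh(M_new), M_new      # temporally linked attention feature

M = np.zeros(d)
att1, M = stma_step(rng.normal(size=d), M)
att2, M = stma_step(rng.normal(size=d), M)  # depends on step-1 memory
```

Unlike the conventional attention model, the feature emitted at step 2 is conditioned on the memory written at step 1, which is what makes the attention features correlated along the time sequence.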

Description

technical field

[0001] The invention belongs to the interdisciplinary research field of computer vision and natural language processing. Specifically, the invention proposes an image description method based on spatio-temporal memory attention.

Background technique

[0002] Image description, in short, is the task of describing the main content of an image in one sentence, which requires the machine both to understand image content and to express that content in a human-like way. Image description is a difficult problem connecting the two research fields of computer vision and natural language processing. Determining the existence, attributes, and relationships of objects in an image is not easy, and describing this information with appropriate sentences makes the task harder still. How to accurately describe image content with fluent sentences is the research goal of the image description field. The research and development of image descr...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/044, G06N3/045, Y02T10/40
Inventors: 徐骋, 冀俊忠, 张晓丹
Owner: BEIJING UNIV OF TECH