An image description method based on a bidirectional double-attention mechanism

An image description technology based on an attention mechanism, applied in fields such as computer components, instruments, and computing. It addresses problems such as reduced accuracy and achieves high accuracy and good generalization.

Active Publication Date: 2019-06-21
SHANXI UNIV

AI Technical Summary

Problems solved by technology

When the current description is highly correlated with the previous and subsequent information, the model onl



Examples


Example Embodiment

[0028] A recurrent neural network (RNN) is a type of neural network used to process and predict sequence data. Figure 1 shows a typical recurrent neural network. At each time step t, the input x_t and the hidden-layer state h_{t-1} from the previous time step are fed into the recurrent neural network, which produces the output o_t and updates the hidden-layer state to h_t, which is passed on to the next time step. Because the variables and operations in the recurrent neural network are the same at every time step, the network can be regarded as the same neural network copied an unlimited number of times. In Figure 1, A represents all other states in the hidden layer.
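The recurrence in [0028] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tanh activation, the linear output layer, and the weight names `W_xh`, `W_hh`, and `W_ho` are hypothetical choices, not taken from the patent text.

```python
import math


def rnn_step(x_t, h_prev, W_xh, W_hh, W_ho):
    """One step of a plain RNN: consumes x_t and h_{t-1}, returns (o_t, h_t).

    Assumed form (not from the patent):
        h_t = tanh(W_xh x_t + W_hh h_{t-1}),  o_t = W_ho h_t
    """
    h_t = [
        math.tanh(
            sum(w * x for w, x in zip(W_xh[i], x_t))
            + sum(w * h for w, h in zip(W_hh[i], h_prev))
        )
        for i in range(len(h_prev))
    ]
    o_t = [sum(w * h for w, h in zip(row, h_t)) for row in W_ho]
    return o_t, h_t


def run_rnn(xs, h0, W_xh, W_hh, W_ho):
    """Apply the same step (same weights) at every time step of the sequence."""
    h = h0
    outputs = []
    for x_t in xs:
        o_t, h = rnn_step(x_t, h, W_xh, W_hh, W_ho)
        outputs.append(o_t)
    return outputs, h


# Usage: a two-step sequence with scalar input and a single hidden unit.
outputs, h_final = run_rnn(
    xs=[[1.0], [0.0]], h0=[0.0], W_xh=[[1.0]], W_hh=[[1.0]], W_ho=[[2.0]]
)
```

The loop in `run_rnn` reuses one set of weights at every step, which is the sense in which the network is "the same neural network" copied across time.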

[0029] The recurrent neural network has only a "causal" structure: the state at the current time step can obtain information only from past states and the current input. In many application tasks, however, the output is likely to depend on the entire sequence. In order to solve this problem, a Bidirectional Re...
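A minimal sketch of how a bidirectional recurrent network lifts the "causal" restriction described in [0029]: one pass reads the sequence left to right, a second pass reads it right to left, and the two states are concatenated at each time step. A plain tanh cell is used here for brevity instead of an LSTM, and the names `step`, `bidirectional_states`, `fwd`, and `bwd` are hypothetical.

```python
import math


def step(x_t, h_prev, W_x, W_h):
    """Plain tanh recurrent cell (assumed for illustration; not an LSTM)."""
    return [
        math.tanh(
            sum(w * x for w, x in zip(W_x[i], x_t))
            + sum(w * h for w, h in zip(W_h[i], h_prev))
        )
        for i in range(len(h_prev))
    ]


def bidirectional_states(xs, fwd, bwd, hidden_size):
    """Return, for each time step, the forward state followed by the backward state.

    fwd and bwd are (W_x, W_h) weight pairs for the two directions.
    """
    # Forward pass: the state at t summarizes x_0 .. x_t.
    h = [0.0] * hidden_size
    forward = []
    for x_t in xs:
        h = step(x_t, h, *fwd)
        forward.append(h)

    # Backward pass: the state at t summarizes x_t .. x_{T-1}.
    h = [0.0] * hidden_size
    backward = []
    for x_t in reversed(xs):
        h = step(x_t, h, *bwd)
        backward.append(h)
    backward.reverse()  # re-align backward states with forward time order

    return [f + b for f, b in zip(forward, backward)]


# Usage: the only non-zero input is at the last time step.
weights = ([[1.0]], [[1.0]])
states = bidirectional_states([[0.0], [0.0], [1.0]], weights, weights, 1)
```

In the usage example the forward state at the first time step is zero, because nothing informative has been seen yet, while the backward state at the same step is non-zero, because it has already read the final input.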


Abstract

The invention provides an image description method based on a bidirectional double-attention mechanism. The method includes the following steps: image features are extracted from the image by a convolutional neural network; the image features of the last convolutional layer are taken as the input of an attention mechanism and fed into a bidirectional long short-term memory (LSTM) network containing the attention mechanism; the attention mechanism receives the hidden-layer state of the bidirectional LSTM from the previous time step; the bidirectional LSTM predicts the current hidden-layer state from the previous hidden-layer state, the salient image information, and the current input, and then feeds the current hidden-layer state into the attention mechanism to obtain the current salient information; and the bidirectional attention network predicts the description of the image according to the forward hidden-layer state, the salient image information, the backward hidden-layer state, and the salient information.
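The abstract does not state how the attention mechanism scores image regions. As a hedged illustration, the sketch below assumes additive soft attention, a common formulation: each region feature is scored against the previous hidden state, the scores are normalized with a softmax, and the salient image information is the weighted sum of the region features. The function `attend` and the parameters `W_a`, `W_h`, and `v` are hypothetical and are not the patent's definitions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(features, h_prev, W_a, W_h, v):
    """Assumed additive soft attention.

    features: list of region feature vectors from the last convolutional layer
    h_prev:   hidden-layer state from the previous time step
    Returns (context, alpha): the salient image information and the weights.
    """
    h_proj = matvec(W_h, h_prev)
    scores = []
    for a in features:
        a_proj = matvec(W_a, a)
        # score = v . tanh(W_a a + W_h h_prev)
        scores.append(
            sum(vk * math.tanh(p + q) for vk, p, q in zip(v, a_proj, h_proj))
        )
    alpha = softmax(scores)
    dim = len(features[0])
    context = [
        sum(alpha[i] * features[i][d] for i in range(len(features)))
        for d in range(dim)
    ]
    return context, alpha


# Usage: two regions with 2-dimensional features and a 1-dimensional hidden state.
context, alpha = attend(
    features=[[1.0, 0.0], [0.0, 1.0]],
    h_prev=[0.0],
    W_a=[[1.0, 0.0]],
    W_h=[[0.0]],
    v=[1.0],
)
```

In a bidirectional setting such as the one the abstract describes, a function like this would be called once with the forward hidden state and once with the backward hidden state, giving one context vector per direction.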

Description

Technical field

[0001] The invention relates to an image description method.

Background art

[0002] In recent years, a great deal of research on image description has been carried out on the basis of computer vision and natural language processing. Image description works by feeding an image into an "encoding-decoding model" to generate a language description. Encoding converts the input image into a fixed-length vector, and decoding converts that vector into an output language sequence. The encoder commonly used in image description is a Convolutional Neural Network (CNN for short), and the decoder is one of the variants of the Recurrent Neural Network (RNN for short), such as the Long Short-Term Memory network (LSTM for short). In recent years, Kelvin Xu et al. have introduced the attention mechanism to focus on the salient parts of the image when generating descriptions, thereby improving the accuracy...


Application Information

IPC(8): G06K9/62
Inventor: 张丽红, 陶云松
Owner: SHANXI UNIV