A Bidirectional Multimodal Recurrent Network Image Description Method

Image description and multimodal technology, applied in still-image data indexing, biological neural network models, still-image data retrieval, and related fields. It addresses the loss of visual information and the problem of image features staying unchanged while the generated words change; it improves performance and accuracy, provides rich visual information, and is easy to train.

Active Publication Date: 2020-07-31
南通斑马智能科技有限公司
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

[0003] Existing models use image features directly after extracting them, in one of two ways. The first inputs the image features only at the beginning of the model, so visual information is lost at subsequent time steps. The second inputs the image features at every time step; although this preserves the visual information, the image features remain unchanged across time steps while the word generated at each step changes.
In addition, when generating the word at each time step, existing models consider only historical text information and ignore future text information; that is, each word in the generated sentence is derived only from the words before it.
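
The two conventional injection strategies can be contrasted with a toy decoder. The sketch is illustrative only and rests on assumptions not in the source: a single-layer tanh RNN with random weights, and an image feature concatenated to each word vector (replaced by zeros after step 0 in the "first" mode). The function name `run_rnn` is hypothetical.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    """Small random weights drawn from a seeded generator."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def run_rnn(word_vecs, image_vec, mode, hidden=4, seed=0):
    """Toy tanh RNN decoder (hypothetical, for illustration).

    mode="first": the image feature is fed only at step 0 (zeros afterwards).
    mode="every": the same image feature is fed at every step.
    Returns the image input used at each step and the hidden states.
    """
    if mode not in ("first", "every"):
        raise ValueError("mode must be 'first' or 'every'")
    rng = random.Random(seed)
    in_dim = len(word_vecs[0]) + len(image_vec)
    W = rand_matrix(hidden, in_dim, rng)  # input-to-hidden weights
    U = rand_matrix(hidden, hidden, rng)  # hidden-to-hidden weights
    h = [0.0] * hidden
    image_inputs, states = [], []
    # The loop runs left to right, so the state at step t reflects only words 0..t.
    for t, w in enumerate(word_vecs):
        img = list(image_vec) if (mode == "every" or t == 0) else [0.0] * len(image_vec)
        x = list(w) + img
        h = [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, h))]
        image_inputs.append(img)
        states.append(h)
    return image_inputs, states
```

With `mode="first"`, steps after 0 see the image only indirectly through the hidden state; with `mode="every"`, the image input is identical at every step while the word input changes. In both modes the state at step t never reflects later words.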

Examples

Detailed Embodiment

[0018] Referring to Figure 1, an image description method based on a bidirectional multimodal recurrent network comprises the following steps:

[0019] Step 1: Download an image description dataset and obtain its images together with their corresponding description sentences;

[0020] Step 2: Process the sentences in the training set: extract the words that appear in them and build a vocabulary;
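
Step 2 could be sketched as follows, assuming lowercase whitespace tokenization, an optional minimum-frequency threshold, and four special tokens (`<pad>`, `<start>`, `<end>`, `<unk>`). None of these choices come from the patent, and `build_vocab` and `encode` are hypothetical names.

```python
from collections import Counter

SPECIALS = ["<pad>", "<start>", "<end>", "<unk>"]  # assumed special tokens


def build_vocab(sentences, min_count=1):
    """Build index<->word maps from training captions.

    Words occurring fewer than min_count times are left out and will map to <unk>.
    """
    counts = Counter(word for s in sentences for word in s.lower().split())
    # Sort by descending frequency, then alphabetically, for a deterministic order.
    words = sorted((w for w, c in counts.items() if c >= min_count),
                   key=lambda w: (-counts[w], w))
    itos = SPECIALS + words
    stoi = {w: i for i, w in enumerate(itos)}
    return stoi, itos


def encode(sentence, stoi):
    """Map a caption to word indices wrapped in <start> ... <end>."""
    unk = stoi["<unk>"]
    ids = [stoi.get(w, unk) for w in sentence.lower().split()]
    return [stoi["<start>"]] + ids + [stoi["<end>"]]
```

For example, `build_vocab(["A dog runs on the grass", "A cat sits on the mat"])` places the frequent words `a`, `on`, `the` right after the special tokens, and `encode("a dog sits", stoi)` returns `[1, 4, 8, 12, 2]`.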

[0021] Step 3: Use a pre-trained convolutional neural network to extract the features of the images in the dataset;
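
The excerpt does not name the CNN or the layer whose output is used. Assuming the last convolutional layer yields a C × H × W feature map, one common way to reduce it to a single vector per image is global average pooling. The sketch below shows only that reduction, with the feature map given as nested lists and the network itself left out; `global_average_pool` is a hypothetical name.

```python
def global_average_pool(feature_map):
    """Average each channel of a C x H x W feature map (nested lists) over H and W.

    Returns one value per channel, i.e. a C-dimensional image feature vector.
    """
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        if not values:
            raise ValueError("empty channel")
        pooled.append(sum(values) / len(values))
    return pooled
```

Many pre-trained classifiers (ResNet, for example) already end with such a pooling layer, so in practice the vector is often read directly from the layer before the final classifier.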

[0022] Step 4: Build a bidirectional multimodal recurrent network and fuse the extracted image features with the corresponding text features;
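
The excerpt says the image features are fused with the corresponding text features but does not give the fusion operator. A minimal sketch, assuming both are linearly projected into a shared space, added, and passed through tanh; this is a hypothetical choice, and `linear` and `fuse` are illustrative names.

```python
import math


def linear(weights, bias, x):
    """y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def fuse(image_feat, text_feat, W_img, b_img, W_txt, b_txt):
    """Project image and text features into one shared space, add them, apply tanh.

    Hypothetical fusion operator: the source states that fusion happens, not how.
    """
    img = linear(W_img, b_img, image_feat)
    txt = linear(W_txt, b_txt, text_feat)
    if len(img) != len(txt):
        raise ValueError("both projections must have the same dimension")
    return [math.tanh(a + b) for a, b in zip(img, txt)]
```

Concatenating the two projected vectors would be an equally plausible reading; summation keeps the fused vector the same size as each projection, whereas concatenation doubles it.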

[0023] Step 5: The network model takes both historical and future text information into account, combines it with the fused image features, and is trained on the training set until it converges;
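
Using both historical and future text can be sketched with two independent RNNs: a forward one that summarizes the words before position t and a backward one that summarizes the words after it, with the image feature appended. The toy tanh cells, random weights, and the `[history, future, image]` layout are assumptions for illustration; the excerpt does not specify the architecture, and `make_rnn` and `bidirectional_contexts` are hypothetical names.

```python
import math
import random


def make_rnn(in_dim, hidden, seed):
    """Create a toy tanh RNN cell with fixed random weights; returns step(h, x)."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
    U = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]

    def step(h, x):
        return [math.tanh(sum(w * xi for w, xi in zip(wr, x)) +
                          sum(u * hi for u, hi in zip(ur, h)))
                for wr, ur in zip(W, U)]
    return step


def bidirectional_contexts(word_vecs, image_feat, hidden=3):
    """For each position t, return [forward state over words < t,
    backward state over words > t, image feature]."""
    fwd_step = make_rnn(len(word_vecs[0]), hidden, seed=1)
    bwd_step = make_rnn(len(word_vecs[0]), hidden, seed=2)
    zero = [0.0] * hidden
    # fwd[t] summarizes words 0..t-1 (history); fwd[0] is the empty history.
    fwd = [zero]
    for x in word_vecs[:-1]:
        fwd.append(fwd_step(fwd[-1], x))
    # Built from the end; after reversing, bwd[t] summarizes words t+1..T-1 (future).
    bwd = [zero]
    for x in reversed(word_vecs[1:]):
        bwd.append(bwd_step(bwd[-1], x))
    bwd.reverse()
    return [fwd[t] + bwd[t] + list(image_feat) for t in range(len(word_vecs))]
```

Because the context for position t excludes word t itself, a classifier on `ctx[t]` could be trained to predict that word. The arrangement assumes the whole caption is available, as it is during training; this excerpt does not say how future context is obtained when describing a new image.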

[0024] Step 6: Input an image into the trained bidirectional multimodal recurrent network model to obtain the corresp...
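
The text of Step 6 is truncated and does not describe decoding. A common choice for caption generation is left-to-right greedy search, sketched below with a hypothetical `next_word_scores` callable standing in for the trained model; beam search is a frequent alternative.

```python
def greedy_decode(next_word_scores, start="<start>", end="<end>", max_len=20):
    """Left-to-right greedy decoding.

    next_word_scores(prefix) is a hypothetical stand-in for the trained model:
    given the words generated so far (starting with `start`), it returns a
    dict mapping each candidate next word to a score.
    """
    words = [start]
    while len(words) - 1 < max_len:
        scores = next_word_scores(words)
        best = max(scores, key=scores.get)  # pick the highest-scoring word
        if best == end:
            break
        words.append(best)
    return words[1:]  # drop the start token
```

With a toy lookup table such as `{"<start>": {"a": 0.9}, "a": {"dog": 0.8}, "dog": {"runs": 0.6}, "runs": {"<end>": 1.0}}` keyed on the last word, the loop emits `["a", "dog", "runs"]`.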

Abstract

The invention provides an image description method based on a bidirectional multimodal recurrent network, comprising: downloading images as a training set and acquiring the images in the training set together with their corresponding description sentences; extracting the words that appear in the training-set sentences and constructing a vocabulary; using a pre-trained convolutional neural network to extract the features of the images in the dataset; building a bidirectional multimodal recurrent network model and fusing the extracted image features with the corresponding text features; training the bidirectional multimodal recurrent network model; and inputting an image into the trained model to obtain the corresponding description sentence.

Description

Technical Field: [0001] The invention relates to image processing and pattern recognition technology, and in particular to an image description method based on a bidirectional multimodal recurrent network.

Background Art: [0002] With the rapid development of computer vision and natural language processing, increasing attention is being paid to the acquisition of visual information. Describing the content of an image in natural language is a current research focus and a research topic in the field of pattern recognition. In recent years, deep neural network models that use convolutional neural networks to extract image features and exploit the strengths of recurrent neural networks in natural language processing have become the mainstream approach to image description. The principle is that, based on the image features, the recurrent neural network generates one word at each time step as it runs, and these words are combined to fo...

Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F16/58; G06F16/51; G06K9/62; G06N3/04
CPC: G06F16/51; G06F16/5866; G06N3/04; G06F18/253; G06F18/214
Inventors: 唐金辉 (Tang Jinhui), 束炎武 (Shu Yanwu)
Owner: 南通斑马智能科技有限公司