Image description method of bidirectional multi-mode recursive network

An image description method based on multimodal technology, applicable to still image data retrieval, still image data indexing, biological neural network models, and related fields. It addresses two problems in existing models: visual information is lost at later steps, or the image features stay unchanged while the generated words change from step to step. It aims for improved performance and accuracy, richer visual information, and easier training.

Active Publication Date: 2017-11-24
南通斑马智能科技有限公司

AI Technical Summary

Problems solved by technology

[0003] Existing models use the extracted image features directly, in one of two ways. The first inputs the image features only at the initial step of the model, which causes visual information to be lost at subsequent steps. The second inputs the image features at every step; this preserves the visual information, but the image features stay the same at every step while the word the model generates changes from step to step.
In addition, existing models consider only historical text information and ignore future text information when generating the word at each step; that is, each word in the generated sentence is derived only from the words that precede it.
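
The difference between history-only and bidirectional context can be illustrated with a minimal sketch. The function name `context_at` and the example caption are hypothetical and are not taken from the patent; the sketch only shows which words are visible at a given position under each scheme.

```python
def context_at(words, t, bidirectional):
    """Return the (history, future) words a model may condition on at position t.

    Illustrative assumption: a history-only model sees words[:t]; a
    bidirectional model additionally sees words[t + 1:].
    """
    history = words[:t]
    future = words[t + 1:] if bidirectional else []
    return history, future


caption = ["a", "dog", "runs", "on", "grass"]  # hypothetical caption

# History-only model: nothing after position 2 is visible.
one_way = context_at(caption, 2, bidirectional=False)

# Bidirectional model: the words after position 2 are visible as well.
two_way = context_at(caption, 2, bidirectional=True)
```

Under this reading, the word at position 2 is predicted from two words in the history-only case and from four words in the bidirectional case.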



Examples


Embodiment Construction

[0018] With reference to Figure 1, an image description method for a bidirectional multimodal recurrent network comprises the following steps:

[0019] Step 1: download the image description dataset, and obtain the images in the dataset and their corresponding description sentences;

[0020] Step 2: process the sentences in the training set, extract the words that appear in the sentences, and build a vocabulary;

[0021] Step 3: use a pre-trained convolutional neural network to extract the features of the images in the dataset;

[0022] Step 4: build a bidirectional multimodal recurrent network, and fuse the extracted image features with the corresponding text features;

[0023] Step 5: train the network model on the training set until it converges, with the model taking both historical and future text information into account together with the fused image features;

[0024] Step 6: input a picture into the pre-trained bidirectional multimodal recurrent network model to obtain the corresp...
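
Step 2 above can be sketched as follows. This is a minimal illustration, not the patent's procedure: the special tokens, the `min_count` threshold, the lowercase whitespace tokenization, and the example captions are all assumptions introduced here.

```python
from collections import Counter

# Hypothetical special tokens; the source does not name any.
SPECIALS = ["<pad>", "<start>", "<end>", "<unk>"]


def build_vocab(sentences, min_count=1):
    """Map every word seen at least `min_count` times to an integer id."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(SPECIALS + words)}


def encode(sentence, vocab):
    """Convert a sentence to ids, wrapped in start and end markers."""
    unk = vocab["<unk>"]
    ids = [vocab.get(w, unk) for w in sentence.lower().split()]
    return [vocab["<start>"]] + ids + [vocab["<end>"]]


# Hypothetical training captions.
captions = ["a dog runs on grass", "a cat sits on a mat"]
vocab = build_vocab(captions)

# "sofa" is not in the vocabulary, so it maps to the <unk> id.
encoded = encode("a dog sits on a sofa", vocab)
```

Sorting the words before assigning ids keeps the mapping reproducible across runs, which matters when a trained model is later reloaded with the same vocabulary.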



Abstract

The invention provides an image description method of a bidirectional multimodal recurrent network. The method comprises the following steps: downloaded images serve as a training set, and the images in the training set and their corresponding description sentences are obtained; the words appearing in the sentences of the training set are extracted, and a vocabulary is built; a pre-trained convolutional neural network is used to extract the features of the images in the dataset; a bidirectional multimodal recurrent model is built, and the extracted image features are fused with the corresponding text features; the bidirectional multimodal recurrent model is trained; a picture is input into the pre-trained model to obtain the corresponding description sentence.
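
The idea of fusing image features with text features read in both directions can be sketched with a toy model. This excerpt does not specify the network cell or the fusion operator, so everything below is an assumption made for illustration: an elementwise tanh recurrence stands in for the recurrent network, fusion is an elementwise sum, and the weights `w_in` and `w_rec` are fixed scalars rather than learned parameters.

```python
import math


def rnn_pass(inputs, w_in, w_rec):
    """Run a toy elementwise tanh recurrence; return the state at every step."""
    h = [0.0] * len(inputs[0])
    states = []
    for x in inputs:
        h = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h)]
        states.append(h)
    return states


def bidirectional_fuse(word_vecs, image_feat, w_in=0.5, w_rec=0.5):
    """Fuse forward states, backward states, and the image feature per step.

    Assumed fusion: elementwise sum. The patent may use a different operator.
    """
    fwd = rnn_pass(word_vecs, w_in, w_rec)              # history: left to right
    bwd = rnn_pass(word_vecs[::-1], w_in, w_rec)[::-1]  # future: right to left
    return [
        [f + b + v for f, b, v in zip(hf, hb, image_feat)]
        for hf, hb in zip(fwd, bwd)
    ]


# Hypothetical 2-dimensional word vectors and image feature.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
image = [0.2, 0.4]
fused = bidirectional_fuse(words, image)

# Changing only the last word alters the fused vector at the first position,
# because the backward pass carries future text information to earlier steps.
changed = bidirectional_fuse([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], image)
```

In this sketch the same image feature is added at every step, while the forward and backward text states vary by position, so the fused vector differs from step to step.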

Description

Technical field

[0001] The invention relates to image processing and pattern recognition technology, in particular to an image description method of a bidirectional multimodal recurrent network.

Background

[0002] With the rapid development of computer vision and natural language processing, people pay more and more attention to the acquisition of visual information. How to describe the content of an image in natural language is a focus of current research and is also a research topic in the field of pattern recognition. In recent years, deep neural network models, which rely on convolutional neural networks to extract image features and exploit the strengths of recurrent neural networks in natural language processing, have become the mainstream method for image description. The principle is that, based on the image features, the recurrent neural network generates one word at each step as it runs, and these words are combined to fo...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62, G06N3/04
CPC: G06F16/51, G06F16/5866, G06N3/04, G06F18/253, G06F18/214
Inventors: 唐金辉, 束炎武
Owner: 南通斑马智能科技有限公司