Image subtitle generation method and system fusing visual attention and semantic attention

A technology fusing visual attention and semantic attention, applied to neural learning methods, character and pattern recognition, biological neural network models, etc. It addresses problems such as generated subtitles lacking personalization, the original visual information not being fully considered at each time step, and the difficulty for a machine to identify the next attention area.

Active Publication Date: 2018-01-19
CHINA UNIV OF PETROLEUM (EAST CHINA)

AI Technical Summary

Problems solved by technology

However, this method has an obvious shortcoming: the original visual information is not fully considered at each time step, which leads to a lack of personalization in the generated subtitles.



Examples


Embodiment Construction

[0092] It should be pointed out that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

[0093] It should be noted that the terminology used here is only for describing specific implementations and is not intended to limit the exemplary implementations according to the present application. As used herein, unless the context clearly dictates otherwise, the singular is intended to include the plural. It should also be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

[0094] Image captioning is becoming increasingly important in the fields of computer vision and machine learning...



Abstract

The invention discloses an image subtitle generation method and system that fuse visual attention and semantic attention. The method comprises the steps of: extracting an image feature from each image for which a subtitle is to be generated, through a convolutional neural network, to obtain an image feature set; building an LSTM model and feeding the previously labeled text description corresponding to each image into the LSTM model, to obtain time-sequence information; generating a visual attention model by combining the image feature set and the time-sequence information; generating a semantic attention model by combining the image feature set, the time-sequence information, and the word of the previous time step; generating an automatic balance policy model from the visual attention model and the semantic attention model; building a gLSTM model from the image feature set and the text corresponding to the image; generating, by means of an MLP (multilayer perceptron) model, the words corresponding to the image according to the gLSTM model and the automatic balance policy model; and serially combining all the obtained words to generate the subtitle.
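To make the data flow of the abstract concrete, the sketch below gives one plausible PyTorch reading of the pipeline: a visual attention branch scores CNN region features against the decoder state, a semantic attention branch additionally conditions on the previous word, a learned gate (standing in for the "automatic balance policy") mixes the two contexts, and a gLSTM-style decoding step feeds an MLP that predicts the next word. The patent excerpt gives no formulas, so all layer shapes, the gating form, and names such as FusedAttentionCaptioner are illustrative assumptions, not the patented construction.

```python
import torch
import torch.nn as nn

class FusedAttentionCaptioner(nn.Module):
    """Hypothetical sketch of the abstract's pipeline: visual attention and
    semantic attention are fused by an automatic balance gate before a
    (simplified) gLSTM decoding step and an MLP word predictor."""

    def __init__(self, feat_dim=512, embed_dim=256, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Visual attention: scores each CNN region against the decoder state.
        self.vis_att = nn.Linear(feat_dim + hidden_dim, 1)
        # Semantic attention: additionally conditions on the previous word.
        self.sem_att = nn.Linear(feat_dim + hidden_dim + embed_dim, 1)
        # Automatic balance gate: scalar mixing weight between the two contexts.
        self.balance = nn.Linear(hidden_dim, 1)
        # gLSTM-style decoder: the guiding context is concatenated to the input.
        self.decoder = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        # MLP mapping the hidden state to word logits.
        self.mlp = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, vocab_size))

    def step(self, feats, prev_word, h, c):
        # feats: (B, R, feat_dim) region features; prev_word: (B,) word ids.
        B, R, _ = feats.shape
        w = self.embed(prev_word)                               # (B, embed_dim)
        h_exp = h.unsqueeze(1).expand(B, R, h.size(-1))         # (B, R, hidden)
        # Visual attention weights over regions from features + hidden state.
        a_v = torch.softmax(
            self.vis_att(torch.cat([feats, h_exp], -1)).squeeze(-1), -1)
        ctx_v = (a_v.unsqueeze(-1) * feats).sum(1)              # (B, feat_dim)
        # Semantic attention also sees the previous word embedding.
        w_exp = w.unsqueeze(1).expand(B, R, w.size(-1))
        a_s = torch.softmax(
            self.sem_att(torch.cat([feats, h_exp, w_exp], -1)).squeeze(-1), -1)
        ctx_s = (a_s.unsqueeze(-1) * feats).sum(1)
        # Automatic balance policy: learned gate in [0, 1] fuses the contexts.
        beta = torch.sigmoid(self.balance(h))                   # (B, 1)
        ctx = beta * ctx_v + (1 - beta) * ctx_s
        # gLSTM step guided by the fused context; MLP predicts the next word.
        h, c = self.decoder(torch.cat([w, ctx], -1), (h, c))
        return self.mlp(h), h, c
```

At inference, this step would be unrolled from a start token until an end token is emitted, and the predicted words concatenated in order, mirroring the serial-combination step described in the abstract.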

Description

Technical Field

[0001] The invention relates to the technical field of generating subtitles from images, and in particular to a method and system for image subtitle generation that fuses visual attention and semantic attention.

Background Technique

[0002] In the field of computer vision, image captioning has become a very challenging task. Recent attempts have focused on exploiting attention models from machine translation. Attention-based methods for generating image captions are mainly developed from the encoding-decoding framework, which converts visual features encoded by a CNN encoder into subtitles decoded by an RNN. The point of the attention-based model is to highlight the spatial features corresponding to a generated word.

[0003] In the field of image captioning, attention models have been shown to be very effective, but they still face the following two problems:

[0004] On the one hand, they lose track of typical visual information. Th...
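For context, the encoding-decoding framework referenced in [0002] is commonly realized as soft attention over CNN region features (in the style of Show, Attend and Tell). The minimal sketch below shows that baseline attention step; the weight matrices W_f, W_h, and w_a are illustrative placeholders, not parameters named in the patent.

```python
import torch
import torch.nn.functional as F

def soft_attention(feats, hidden, W_f, W_h, w_a):
    """Baseline soft-attention step for encoder-decoder captioning:
    score each CNN region against the RNN decoder state, normalize
    the scores with softmax, and return the weighted visual context.

    feats:  (R, feat_dim) region features from the CNN encoder
    hidden: (hidden_dim,) current decoder state
    W_f:    (feat_dim, att_dim), W_h: (hidden_dim, att_dim), w_a: (att_dim,)
    """
    scores = torch.tanh(feats @ W_f + hidden @ W_h) @ w_a  # (R,) region scores
    alpha = F.softmax(scores, dim=0)                       # attention over regions
    return alpha @ feats                                   # (feat_dim,) context
```

The patent's criticism in [0003]-[0004] targets exactly this construction: because each step attends only through the evolving decoder state, the original visual information can fade from the computation over time.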


Application Information

IPC(8): G06F17/18; G06K9/46; G06N3/04; G06N3/08
Inventors: 吴春雷 (Wu Chunlei), 魏燚伟 (Wei Yiwei), 储晓亮 (Chu Xiaoliang), 王雷全 (Wang Leiquan), 崔学荣 (Cui Xuerong)
Owner: CHINA UNIV OF PETROLEUM (EAST CHINA)