Image subtitle generation method and system fusing visual attention and semantic attention

A technology fusing visual attention and semantic attention, applied to neural learning methods, character and pattern recognition, biological neural network models, etc. It addresses problems such as generated subtitles lacking personalization, the original visual information not being fully considered at each time step, and the difficulty for a machine to identify the next attention area.

Active Publication Date: 2018-01-19
CHINA UNIV OF PETROLEUM (EAST CHINA)

AI Technical Summary

Problems solved by technology

However, this method has an obvious shortcoming: the original visual information is not fully considered at each time step, which leads to a lack of personalization in the generated subtitles.



Examples


Embodiment Construction

[0092] It should be pointed out that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

[0093] It should be noted that the terminology used here is only for describing specific implementations and is not intended to limit the exemplary implementations according to the present application. As used herein, unless the context clearly dictates otherwise, the singular is intended to include the plural. It should also be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

[0094] Image captioning is becoming increasingly important in the fields of computer vision and machine learning...



Abstract

The invention discloses an image subtitle generation method and system that fuse visual attention and semantic attention. The method comprises the steps of: extracting an image feature from each image for which a subtitle is to be generated, through a convolutional neural network, to obtain an image feature set; building an LSTM model and feeding the previously labeled text description corresponding to each image into the LSTM model, to obtain time-sequence information; generating a visual attention model by combining the image feature set and the time-sequence information; generating a semantic attention model by combining the image feature set, the time-sequence information, and the word of the previous time step; generating an automatic balance policy model from the visual attention model and the semantic attention model; building a gLSTM model from the image feature set and the text corresponding to the image; generating, by means of an MLP (multilayer perceptron) model, the words corresponding to the image according to the gLSTM model and the automatic balance policy model; and serially combining all the obtained words to generate the subtitle.
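To make the data flow of the abstract concrete, the sketch below gives one plausible PyTorch reading of the pipeline: a visual attention branch scores CNN region features against the decoder state, a semantic attention branch additionally conditions on the previous word, a learned gate (standing in for the "automatic balance policy") mixes the two contexts, and a gLSTM-style decoding step feeds an MLP that predicts the next word. The patent excerpt gives no formulas, so all layer shapes, the gating form, and names such as FusedAttentionCaptioner are illustrative assumptions, not the patented construction.

```python
import torch
import torch.nn as nn

class FusedAttentionCaptioner(nn.Module):
    """Hypothetical sketch of the abstract's pipeline: visual attention and
    semantic attention are fused by an automatic balance gate before a
    (simplified) gLSTM decoding step and an MLP word predictor."""

    def __init__(self, feat_dim=512, embed_dim=256, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Visual attention: scores each CNN region against the decoder state.
        self.vis_att = nn.Linear(feat_dim + hidden_dim, 1)
        # Semantic attention: additionally conditions on the previous word.
        self.sem_att = nn.Linear(feat_dim + hidden_dim + embed_dim, 1)
        # Automatic balance gate: scalar mixing weight between the two contexts.
        self.balance = nn.Linear(hidden_dim, 1)
        # gLSTM-style decoder: the guiding context is concatenated to the input.
        self.decoder = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        # MLP mapping the hidden state to word logits.
        self.mlp = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, vocab_size))

    def step(self, feats, prev_word, h, c):
        # feats: (B, R, feat_dim) region features; prev_word: (B,) word ids.
        B, R, _ = feats.shape
        w = self.embed(prev_word)                               # (B, embed_dim)
        h_exp = h.unsqueeze(1).expand(B, R, h.size(-1))         # (B, R, hidden)
        # Visual attention weights over regions from features + hidden state.
        a_v = torch.softmax(
            self.vis_att(torch.cat([feats, h_exp], -1)).squeeze(-1), -1)
        ctx_v = (a_v.unsqueeze(-1) * feats).sum(1)              # (B, feat_dim)
        # Semantic attention also sees the previous word embedding.
        w_exp = w.unsqueeze(1).expand(B, R, w.size(-1))
        a_s = torch.softmax(
            self.sem_att(torch.cat([feats, h_exp, w_exp], -1)).squeeze(-1), -1)
        ctx_s = (a_s.unsqueeze(-1) * feats).sum(1)
        # Automatic balance policy: learned gate in [0, 1] fuses the contexts.
        beta = torch.sigmoid(self.balance(h))                   # (B, 1)
        ctx = beta * ctx_v + (1 - beta) * ctx_s
        # gLSTM step guided by the fused context; MLP predicts the next word.
        h, c = self.decoder(torch.cat([w, ctx], -1), (h, c))
        return self.mlp(h), h, c
```

At inference, this step would be unrolled from a start token until an end token is emitted, and the predicted words concatenated in order, mirroring the serial-combination step described in the abstract.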

Description

Technical Field

[0001] The invention relates to the technical field of generating subtitles from images, and in particular to a method and system for image subtitle generation that fuses visual attention and semantic attention.

Background Technique

[0002] In the field of computer vision, image captioning has become a very challenging task. Recent attempts have focused on exploiting attention models from machine translation. Attention-based methods for generating image captions are mainly developed from the encoding-decoding framework, which converts visual features encoded by a CNN encoder into subtitles decoded by an RNN. The point of the attention-based model is to highlight the spatial features corresponding to a generated word.

[0003] In the field of image captioning, attention models have been shown to be very effective, but they still face the following two problems:

[0004] On the one hand, they lose track of typical visual information. Th...
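For context, the encoding-decoding framework referenced in [0002] is commonly realized as soft attention over CNN region features (in the style of Show, Attend and Tell). The minimal sketch below shows that baseline attention step; the weight matrices W_f, W_h, and w_a are illustrative placeholders, not parameters named in the patent.

```python
import torch
import torch.nn.functional as F

def soft_attention(feats, hidden, W_f, W_h, w_a):
    """Baseline soft-attention step for encoder-decoder captioning:
    score each CNN region against the RNN decoder state, normalize
    the scores with softmax, and return the weighted visual context.

    feats:  (R, feat_dim) region features from the CNN encoder
    hidden: (hidden_dim,) current decoder state
    W_f:    (feat_dim, att_dim), W_h: (hidden_dim, att_dim), w_a: (att_dim,)
    """
    scores = torch.tanh(feats @ W_f + hidden @ W_h) @ w_a  # (R,) region scores
    alpha = F.softmax(scores, dim=0)                       # attention over regions
    return alpha @ feats                                   # (feat_dim,) context
```

The patent's criticism in [0003]-[0004] targets exactly this construction: because each step attends only through the evolving decoder state, the original visual information can fade from the computation over time.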


Application Information

IPC(8): G06F17/18; G06K9/46; G06N3/04; G06N3/08
Inventors: 吴春雷 (Wu Chunlei), 魏燚伟 (Wei Yiwei), 储晓亮 (Chu Xiaoliang), 王雷全 (Wang Leiquan), 崔学荣 (Cui Xuerong)
Owner: CHINA UNIV OF PETROLEUM (EAST CHINA)