Video caption generation method based on semantic segmentation and multilayer attention frame

A semantic segmentation and attention technology, applied in character and pattern recognition, selective content distribution, instruments, etc. It addresses the problems of repeated information extraction, cross-interference of modal information, and loss of image structure information, with the effect of improving the utilization rate.

Inactive Publication Date: 2018-05-01
CHINA UNIV OF PETROLEUM (EAST CHINA)
5 Cites, 40 Cited by

AI Technical Summary

Problems solved by technology

For example, a video requires features to be extracted from a large number of frames, and resizing each frame loses its structural information. In addition, although 3D convolution and 2D convolution appear to extract different features, weight sharing in convolution causes a large amount of information to be extracted repeatedly.
At present, although attention mechanisms are used to improve fusion between modalities, applying the same attention operation to every modality ignores the differences between modalities, which leads to cross-interference of information between them.
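
As an illustration of the alternative to a single shared attention operation, the following minimal sketch gives each modality its own query projection before attending over that modality's features. It is not the patented method: the dot-product scoring, the function names, and the `projections` parameters are all assumptions made for illustration, since this summary gives no formulas.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def project(matrix, vec):
    """Multiply a (d_out x d_in) matrix by a vector of length d_in."""
    return [dot(row, vec) for row in matrix]

def attend(features, query):
    """Dot-product attention over T feature vectors of length d.

    Returns (context vector of length d, attention weights of length T).
    """
    weights = softmax([dot(f, query) for f in features])
    d = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(d)]
    return context, weights

def modality_specific_contexts(modalities, projections, hidden):
    """Run one attention pass per modality, each with its own projection.

    modalities:  dict name -> list of feature vectors for that modality
    projections: dict name -> matrix mapping the decoder state into that
                 modality's feature space (hypothetical learned parameters)
    hidden:      decoder hidden state used to build each query
    """
    contexts = {}
    for name, feats in modalities.items():
        query = project(projections[name], hidden)  # modality-specific query
        contexts[name], _ = attend(feats, query)
    return contexts
```

Because each modality has its own projection, image, motion, and audio features may have different dimensions and are scored independently, so the weights computed for one modality do not depend on another's features.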

Examples


Embodiment Construction

[0020] Hereinafter, the implementation of the technical solution will be further described in detail with reference to the accompanying drawings.

[0021] It will be understood by those skilled in the art that although the following description deals with many technical details regarding the embodiments of the present invention, it is by way of example only to illustrate the principles of the present invention and is not meant to be limiting. The present invention can be applied to situations other than the technical details exemplified below, as long as they do not depart from the principle and spirit of the present invention.

[0022] In addition, to keep this specification from becoming redundant, some technical details that can be obtained from prior-art materials may be omitted, simplified, or modified in the description. Persons skilled in the art will understand this, and it does not affect the sufficiency of the discl...

Abstract

The invention relates to a video caption generation method based on semantic segmentation and a multilayer attention framework. The method comprises the following steps: 1, extracting multiple image frames from the video for which a caption is to be generated; 2, using a fully convolutional instance-aware semantic segmentation model to extract the feature information of one deconvolution layer from the video; 3, extracting motion features and audio features of the video; 4, using the fully convolutional instance-aware semantic segmentation model to extract attribute labels from the images extracted in step 1, wherein the attribute labels comprise the object information in each image frame; 5, generating context matrices for the different modalities from the information extracted in the preceding steps, and fusing these context matrices layer by layer to generate a fused context matrix; 6, obtaining the words that make up the caption by means of an LSTM followed by a multilayer perceptron; 7, concatenating all of the obtained words to generate the final caption.
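
Steps 5 and 7 can be sketched roughly as follows. This is one possible reading of the layered fusion, not the patent's actual formulation: it assumes that every per-modality context vector has already been projected to the same length as the decoder state, and the function names and the `<eos>` end token are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_contexts(contexts, hidden):
    """Second attention layer: weight whole modalities against the decoder state.

    contexts: dict name -> context vector (assumed same length as hidden)
    hidden:   decoder (LSTM) hidden state used as the query
    Returns (fused context vector, dict name -> modality weight).
    """
    names = sorted(contexts)  # fixed order so the result is deterministic
    scores = [sum(c * h for c, h in zip(contexts[n], hidden)) for n in names]
    weights = softmax(scores)
    dim = len(hidden)
    fused = [sum(w * contexts[n][i] for w, n in zip(weights, names))
             for i in range(dim)]
    return fused, dict(zip(names, weights))

def assemble_caption(words, end_token="<eos>"):
    """Step 7: join the generated words in order, stopping at the end token."""
    caption = []
    for word in words:
        if word == end_token:
            break
        caption.append(word)
    return " ".join(caption)
```

In a full decoder the fused vector would be fed, together with the previous word, into the LSTM and multilayer perceptron of step 6 at every time step; that part is omitted here because the abstract does not specify it.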

Description

Technical field

[0001] The present invention relates to the technical field of computer vision and natural language processing, in particular to computer-vision-based three-dimensional feature extraction and semantic segmentation and to natural-language-processing-based time-series modeling, and more particularly to fully convolutional semantic segmentation and a multi-layer attention framework for video caption generation.

Background art

[0002] Video caption generation refers to the automatic generation of natural language descriptions for a video. Such research has received increasing attention in the fields of artificial intelligence and computer vision. It has a very wide range of applications today, such as helping blind people in their daily lives and improving the quality of online video retrieval. In addition to related applications, video caption generation technology has played a huge ro...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04N21/234, H04N21/233, H04N21/44, H04N21/439, H04N21/488, G06K9/62, G06K9/00
CPC: H04N21/233, H04N21/23418, H04N21/4394, H04N21/44008, H04N21/4884, G06V20/46, G06F18/2163
Inventor: 吴春雷, 魏燚伟, 王雷全, 褚晓亮, 崔学荣
Owner: CHINA UNIV OF PETROLEUM (EAST CHINA)