Video caption generation method based on semantic segmentation and multilayer attention frame

A semantic segmentation and attention technology, applied to character and pattern recognition, selective content distribution, instruments, and related fields. It addresses repeated information extraction, cross-interference of modal information, and loss of image structure information, with the effect of improving the utilization rate.

Status: Inactive. Publication Date: 2018-05-01

CHINA UNIV OF PETROLEUM (EAST CHINA)

Cites: 5. Cited by: 40.


AI Technical Summary

Problems solved by technology

For example, captioning a video requires extracting features from a large number of frames, and resizing each frame discards its structural information. Features extracted by 3D convolution and by 2D convolution appear to be different, but because convolution shares weights, a large amount of information is extracted repeatedly.

Attention mechanisms are currently used to improve fusion between modalities, but applying the same attention operation to every modality ignores the differences between them, which leads to cross-interference of information between modalities.
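One way to picture the alternative to a shared attention operation is to give each modality its own attention parameters. The following is a minimal sketch under stated assumptions, not the patent's implementation: the additive scoring form, the `ModalityAttention` class, the `random_matrix` helper, and all dimensions and random toy features are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ModalityAttention:
    """Additive attention whose parameters belong to one modality only."""

    def __init__(self, feat_dim, hidden_dim, attn_dim, rng):
        self.w_feat = random_matrix(attn_dim, feat_dim, rng)
        self.w_hid = random_matrix(attn_dim, hidden_dim, rng)
        self.v = [rng.uniform(-0.1, 0.1) for _ in range(attn_dim)]

    def __call__(self, features, hidden):
        """Return (context vector, attention weights) for T feature vectors."""
        h_proj = matvec(self.w_hid, hidden)
        scores = []
        for feat in features:
            f_proj = matvec(self.w_feat, feat)
            scores.append(dot(self.v, [math.tanh(a + b) for a, b in zip(f_proj, h_proj)]))
        weights = softmax(scores)
        dim = len(features[0])
        context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
        return context, weights


rng = random.Random(0)
hidden = [rng.uniform(-1, 1) for _ in range(4)]                       # toy decoder state
visual = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(5)]   # 5 steps, 6 dims
audio = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]    # 8 steps, 3 dims

# Separate parameter sets, so one modality's scoring cannot leak into the other's.
attend_visual = ModalityAttention(feat_dim=6, hidden_dim=4, attn_dim=5, rng=rng)
attend_audio = ModalityAttention(feat_dim=3, hidden_dim=4, attn_dim=5, rng=rng)

visual_ctx, visual_w = attend_visual(visual, hidden)
audio_ctx, audio_w = attend_audio(audio, hidden)
```

Because each modality keeps its own projection matrices, the modalities may also differ in feature dimension and sequence length, which a single shared attention operation would not allow without extra padding or projection.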

Embodiment Construction

[0020] Hereinafter, the implementation of the technical solution will be further described in detail with reference to the accompanying drawings.

[0021] It will be understood by those skilled in the art that although the following description deals with many technical details regarding the embodiments of the present invention, it is by way of example only to illustrate the principles of the present invention and is not meant to be limiting. The present invention can be applied to situations other than the technical details exemplified below, as long as they do not depart from the principle and spirit of the present invention.

[0022] In addition, to keep this specification from becoming redundant, some technical details that are available in prior art materials may be omitted, simplified, or modified in the description. Persons skilled in the art will understand this, and it does not affect the sufficiency of the discl...

Abstract

The invention relates to a video caption generation method based on semantic segmentation and a multilayer attention framework. The method comprises the following steps: 1, extracting multiple image frames from the video for which a caption is to be generated; 2, using a fully convolutional instance-aware semantic segmentation model to extract the feature information of one deconvolution layer from the video; 3, extracting motion features and audio features of the video; 4, using the fully convolutional instance-aware semantic segmentation model to extract attribute labels from the images extracted in step 1, wherein the attribute labels comprise the object information in each image frame; 5, generating a context matrix for each modality from the information extracted in the preceding steps, and fusing the context matrices of the different modalities layer by layer to generate a fused context matrix; 6, obtaining the words that make up the caption by means of an LSTM followed by a multilayer perceptron; 7, concatenating all the obtained words to generate the final caption.
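Steps 5 and 7 of the abstract can be illustrated with a small sketch. The abstract does not say how the layered fusion is computed, so everything below is an assumption made for illustration: per-modality context vectors are projected into a common space and combined by a second attention layer over the modalities. The `fuse_contexts` function, the projection matrices, the dimensions, the random values, and the placeholder words are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(matrix, vector):
    """Map a modality context into the common space, squashed with tanh."""
    return [math.tanh(dot(row, vector)) for row in matrix]


def fuse_contexts(contexts, projections, query):
    """Second-layer attention over modalities (assumed form, see lead-in).

    contexts:    modality name -> context vector (dimensions may differ)
    projections: modality name -> matrix mapping that context to len(query) dims
    query:       decoder state in the common space
    Returns (fused context vector, modality name -> attention weight).
    """
    names = sorted(contexts)
    projected = {n: project(projections[n], contexts[n]) for n in names}
    weights = softmax([dot(projected[n], query) for n in names])
    fused = [
        sum(w * projected[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))


rng = random.Random(1)
common_dim = 5
dims = {"visual": 6, "motion": 4, "audio": 3}  # toy per-modality context sizes

contexts = {n: [rng.uniform(-1, 1) for _ in range(d)] for n, d in dims.items()}
projections = {
    n: [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(common_dim)]
    for n, d in dims.items()
}
query = [rng.uniform(-1, 1) for _ in range(common_dim)]

fused, modality_weights = fuse_contexts(contexts, projections, query)

# Step 7: the words emitted by the decoder are joined in order.
# These words are placeholders, not output of any real model.
words = ["a", "man", "is", "playing", "guitar"]
caption = " ".join(words)
```

In a full system the fused vector would feed the LSTM at each decoding step, and the multilayer perceptron would map the LSTM state to a word; both are omitted here because the abstract gives no further detail about them.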

Description

Technical field

[0001] The present invention relates to the technical field of computer vision and natural language processing, in particular to three-dimensional feature extraction and semantic segmentation technology based on computer vision, and to time series model technology based on natural language processing. More particularly, it relates to fully convolutional semantic segmentation technology and a multi-layer attention framework for video caption generation.

Background

[0002] Video caption generation refers to the automatic generation of natural language descriptions for a video. Such research has received increasing attention in the fields of artificial intelligence and computer vision. It has a very wide range of applications today, such as helping blind people in their daily lives and improving the quality of online video retrieval. In addition to related applications, video caption generation technology has played a huge ro...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): H04N21/234, H04N21/233, H04N21/44, H04N21/439, H04N21/488, G06K9/62, G06K9/00

CPC: H04N21/233, H04N21/23418, H04N21/4394, H04N21/44008, H04N21/4884, G06V20/46, G06F18/2163

Inventor: 吴春雷, 魏燚伟, 王雷全, 褚晓亮, 崔学荣

Owner: CHINA UNIV OF PETROLEUM (EAST CHINA)
