Video content description method and system based on frame selection

A frame-selection-based technique for generating natural language descriptions of video content, which addresses the problems that existing approaches waste computing power by processing every frame and that video-summary annotation data are expensive and subjective.

Inactive Publication Date: 2019-03-01
INST OF COMPUTING TECH CHINESE ACAD OF SCI


Problems solved by technology

Such methods waste a great deal of computing power on the one hand, and are susceptible to noise and unimportant visual content on the other.
[0010] 2. Video content description methods based on the attention mechanism, especially on spatial attention, must first extract features from the video frames and then perform a weighted fusion over them. This is possible only when the video is of limited length and can be observed in its entirety, so such methods are not suitable for real-time video or real-world scenarios.
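The weighted-fusion step the paragraph above criticizes can be illustrated with a minimal numpy sketch. The function name `attention_fuse` and the dot-product scoring are illustrative assumptions, not the patent's formulation; the point is that the softmax requires all T frame features up front, which is exactly why the approach breaks down for real-time video.

```python
import numpy as np

def attention_fuse(frame_feats, query):
    """Soft-attention weighted fusion of frame features (sketch).

    frame_feats: (T, D) array of per-frame features. Note that ALL T
    frames must already be available before fusion can happen -- the
    limitation the text points out for streaming video.
    query: (D,) state vector used to score each frame's relevance.
    """
    scores = frame_feats @ query                  # (T,) relevance scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over frames
    return weights, weights @ frame_feats         # (T,) weights, (D,) fused

T, D = 8, 4
rng = np.random.default_rng(0)
weights, fused = attention_fuse(rng.normal(size=(T, D)), rng.normal(size=D))
```

The fused vector is a convex combination of the frame features, so the attention weights always sum to one.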




Embodiment Construction

[0023] In order to make the purpose, technical solution, and advantages of the present invention clearer, the frame-selection-based video content description method and system proposed by the present invention are described in further detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.

[0024] The present invention adopts a task-driven video frame selection technique. The information content of a video frame can be defined according to the needs of the task, and the technique then selects the required set of high-information video frames according to that task-defined measure, reducing redundant computation while also improving the accuracy of task processing; and the uninterrupted video content description technique based on video fram...
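The task-driven selection idea above can be sketched as follows. The information measure is left pluggable because the patent defines it per task; the feature-norm stand-in for "visual richness" and the top-k selection rule are illustrative assumptions only.

```python
import numpy as np

def select_frames(frame_feats, info_score, k):
    """Task-driven frame selection sketch: keep the k frames whose
    task-defined information content is highest, then restore their
    temporal order. `info_score` maps one frame feature to a scalar."""
    scores = np.array([info_score(f) for f in frame_feats])
    keep = np.argsort(scores)[::-1][:k]     # indices of the top-k frames
    return np.sort(keep)                    # back in temporal order

# Hypothetical information measure: the feature norm as a stand-in for
# "visual richness"; the actual definition is task-dependent.
feats = np.array([[0.1, 0.2], [3.0, 4.0], [0.0, 0.1], [1.0, 1.0]])
idx = select_frames(feats, np.linalg.norm, k=2)
# idx -> array([1, 3]): the two most "informative" frames, in order
```

Only the selected frames are passed to downstream processing, which is where the reduction in redundant computation comes from.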



Abstract

The invention relates to a video content description method based on frame selection, comprising the following steps: a feed-forward neural network is used to construct a screening model, and the screening model screens video frames according to their visual richness and semantic consistency; a description model is constructed for describing the content of the video to be described; the screening model and the description model are trained with training data; description frames are selected from the video to be described through the screening model; and the visual features of the description frames are extracted and input into the description model to obtain the description sentence of the video to be described.
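The screening stage of the abstract can be sketched as a small feed-forward scorer. The layer sizes, ReLU/sigmoid choices, and the 0.5 keep-threshold are assumptions for illustration; the abstract specifies only that the screening model is a feed-forward network producing a per-frame decision.

```python
import numpy as np

rng = np.random.default_rng(42)

def ffn_score(feat, W1, b1, w2, b2):
    """One-hidden-layer feed-forward screening model (sketch): maps a
    frame feature vector to a keep-probability in (0, 1)."""
    h = np.maximum(0.0, feat @ W1 + b1)     # ReLU hidden layer
    z = h @ w2 + b2
    return 1.0 / (1.0 + np.exp(-z))         # sigmoid keep-probability

D, H = 16, 8                                # illustrative dimensions
W1, b1 = rng.normal(size=(D, H)), np.zeros(H)
w2, b2 = rng.normal(size=H), 0.0

video = rng.normal(size=(30, D))            # 30 frames of D-dim features
probs = np.array([ffn_score(f, W1, b1, w2, b2) for f in video])
described = video[probs > 0.5]              # frames handed to the description model
```

In the patented pipeline these kept frames would then be featurized and fed to the trained description model; here the weights are random, so the sketch shows only the data flow.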

Description

Technical field

[0001] The invention relates to technologies in the fields of digital image processing and natural language processing, in particular to a technique for generating natural language descriptions of video content.

Background technique

[0002] Video content description (video captioning) is the task of converting video content into natural language. As early as 2002, Kojima et al. proposed the first video content description system, which described human behavior. Since then, a series of studies on image and video description have followed. Early methods used a bottom-up approach: descriptors were first generated through attribute learning or object detection, and these descriptors were then concatenated into a complete sentence by a language model. With the development of neural networks and deep learning, most modern description systems are based on convolutional neural networks and recurrent neural networks, and adopt an encoder...
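The encoder-decoder scheme mentioned in the background can be illustrated with a toy greedy-decoding loop. Everything here is an assumption for illustration: the weights are random, the recurrence is a plain tanh RNN rather than any particular trained model, and the tiny vocabulary is invented, so only the control flow (encoded video feature initializes the state, words are emitted one at a time and fed back) reflects the scheme.

```python
import numpy as np

def greedy_decode(video_feat, Wxh, Whh, Who, vocab, max_len=5):
    """Toy RNN decoder sketch of encoder-decoder captioning: the encoded
    video feature initializes the hidden state, then words are chosen
    greedily and fed back as one-hot inputs. Illustrative only."""
    h = np.tanh(video_feat)                 # encoder output -> initial state
    x = np.zeros(Wxh.shape[0])              # start token: all-zeros input
    words = []
    for _ in range(max_len):
        h = np.tanh(x @ Wxh + h @ Whh)      # recurrent state update
        w = int(np.argmax(h @ Who))         # greedy word choice
        words.append(vocab[w])
        x = np.eye(Wxh.shape[0])[w]         # feed chosen word back in
    return words

rng = np.random.default_rng(1)
V, H = 4, 6                                 # invented vocab and state sizes
vocab = ["a", "man", "plays", "guitar"]
caption = greedy_decode(rng.normal(size=H), rng.normal(size=(V, H)),
                        rng.normal(size=(H, H)), rng.normal(size=(H, V)), vocab)
```

With random weights the output sequence is meaningless; a real captioner trains these matrices so that the greedy (or beam-search) loop emits a fluent sentence.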


Application Information

IPC(8): G06K9/00; G06K9/62; G06F16/332
CPC: G06V20/46; G06N3/045; G06F18/214
Inventors: 王树徽, 陈扬羽, 黄庆明, 张维刚
Owner INST OF COMPUTING TECH CHINESE ACAD OF SCI