
Method and equipment for generating video content description information

A technology for generating video content description information, applied in the field of computer vision, which can solve problems such as the failure of existing methods to consider the correlation between consecutive video frames, the difficulty machines face in describing video in natural language, and the inability of traditional processing to cope with the demands of multimedia data

Pending Publication Date: 2021-05-04
SHANGHAI INST OF MICROSYSTEM & INFORMATION TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0002] Against the background of the steady development of the Internet and big data, the demand for multimedia information has grown explosively, and traditional information processing technology can no longer meet the needs of multimedia data in tasks such as labeling and description. Describing videos, images and the like in natural language is easy for humans but remains a difficult task for machines.
[0003] At present, there has been extensive research on using convolutional neural networks to process two-dimensional image data, but methods for processing video data still need improvement.
[0004] One existing video understanding method extracts, from the frame-level feature sequence of the video data, a global part-of-speech sequence feature corresponding to natural language and then generates an accurate natural-language description. However, this method does not exclude repeated information between consecutive frames within the same scene, so its redundancy is high. Another existing technique screens the video image sequence for key frames and then feeds the selected key frames to a video-frame description network to generate description text. However, this method considers neither the correlation between consecutive video frames nor the information differences between scenes, so it is not suitable for understanding videos with scene changes, such as videos from non-fixed cameras, or works such as films and television programs that have been edited and spliced together from multiple scenes.
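
To make the redundancy and scene-change problems concrete: the similarity of consecutive frames can be measured directly, and a sharp drop in similarity signals a likely scene cut. The following is a minimal sketch, not taken from the patent; it assumes OpenCV is available and compares HSV color histograms of consecutive frames with the Bhattacharyya distance. The histogram bin counts and the 0.5 threshold are illustrative assumptions.

```python
import cv2


def scene_boundaries(video_path, threshold=0.5):
    """Return frame indices where a scene change is likely.

    Compares HSV color histograms of consecutive frames; a large
    histogram distance suggests a cut between scenes. The threshold
    is a tunable assumption, not a value from the patent.
    """
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Bhattacharyya distance: 0 = identical, 1 = fully disjoint.
            d = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if d > threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```

Within one scene the distance stays small, which is exactly the redundancy the prior keyframe methods extract repeatedly; across a cut it jumps, which is the information difference those methods ignore.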

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
View more



Embodiment Construction

[0041] The following clearly and completely describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.

[0042] It should be noted that the terms "first" and "second" in the description, claims and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can be practiced in orders other than those illustrated or described here.



Abstract

The invention relates to a method and equipment for generating video content description information. The method comprises the following steps: acquiring an image sequence of a target video; dividing the image sequence into a plurality of sub-image sequences, where any two consecutive sub-image sequences correspond to different scenes; for each sub-image sequence, detecting the first frame image of the current sub-image sequence with a trained first detection model to obtain corresponding static scene description information; detecting the images other than the first frame image in the current sub-image sequence with a trained second detection model to obtain corresponding dynamic event description information; and determining the content description information corresponding to the current sub-image sequence from the static scene description information and the dynamic event description information. In this way, the difficulty of video understanding can be reduced, the extraction of redundant information can be reduced, and computational efficiency can be improved.
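
Read procedurally, the abstract describes a per-scene pipeline: split the frame sequence at scene boundaries, describe the first frame of each sub-sequence statically, describe the remaining frames dynamically, then combine the two. The Python sketch below mirrors those steps under stated assumptions: `first_model`, `second_model`, and the plain concatenation used to combine the two descriptions are hypothetical placeholders, since the abstract does not specify the networks or the combination rule.

```python
def split_into_scenes(frames, boundaries):
    """Split the frame sequence at the given scene-boundary indices so
    that any two consecutive sub-sequences correspond to different scenes."""
    cuts = [0] + sorted(boundaries) + [len(frames)]
    return [frames[a:b] for a, b in zip(cuts[:-1], cuts[1:]) if a < b]


def describe_video(frames, boundaries, first_model, second_model):
    """Mirror the abstract's steps for each sub-image sequence.

    first_model:  trained callable, one image -> static scene description
    second_model: trained callable, list of images -> dynamic event description
    Both interfaces are hypothetical placeholders, not the patent's networks.
    """
    descriptions = []
    for sub_seq in split_into_scenes(frames, boundaries):
        static_desc = first_model(sub_seq[0])      # first frame of the scene
        dynamic_desc = second_model(sub_seq[1:])   # all remaining frames
        # The patent text here does not give the combination rule;
        # simple concatenation stands in for it.
        descriptions.append(f"{static_desc} {dynamic_desc}".strip())
    return descriptions
```

Describing the static scene once per sub-sequence, rather than once per frame, is what lets the method avoid re-extracting redundant scene information from every frame while still tracking events across the rest of the scene.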

Description

Technical field

[0001] The present application relates to the technical field of computer vision, and in particular to a method and device for generating video content description information.


Application Information

IPC(8): G06K 9/00, G06K 9/46
CPC: G06V 20/47, G06V 20/48, G06V 10/44
Inventor: 陈南希, 刘李黎, 张睿芃, 李燕北, 王俊翰, 张晓林
Owner: SHANGHAI INST OF MICROSYSTEM & INFORMATION TECH CHINESE ACAD OF SCI