
Method and equipment for generating video content description information

A technology for generating video content description information, applied in the field of computer vision, which can solve problems such as the failure of existing methods to consider the correlation between consecutive video frames, the difficulty machines face in describing video in natural language, and the inability of traditional processing to cope with the demands of multimedia data

Pending Publication Date: 2021-05-04
SHANGHAI INST OF MICROSYSTEM & INFORMATION TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0002] Against the background of the steady development of the Internet and big data, the demand for multimedia information has grown explosively, and traditional information processing technology can no longer meet the needs of multimedia data in tasks such as labeling and description. Describing videos, images and the like in natural language is easy for humans but remains a difficult task for machines.
[0003] At present, there has been extensive research on using convolutional neural networks to process two-dimensional image data, but methods for processing video data still need improvement.
[0004] One existing video understanding method extracts, from the frame-level feature sequence of the video data, a global part-of-speech sequence feature corresponding to natural language and then generates an accurate natural-language description. However, this method does not exclude repeated information between consecutive frames within the same scene, so its redundancy is high. Another existing technique screens the video image sequence for key frames and then feeds the selected key frames to a video-frame description network to generate description text. However, this method considers neither the correlation between consecutive video frames nor the information differences between scenes, so it is not suitable for understanding videos with scene changes, such as videos from non-fixed cameras, or works such as films and television programs that have been edited and spliced together from multiple scenes.
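
To make the redundancy and scene-change problems concrete: the similarity of consecutive frames can be measured directly, and a sharp drop in similarity signals a likely scene cut. The following is a minimal sketch, not taken from the patent; it assumes OpenCV is available and compares HSV color histograms of consecutive frames with the Bhattacharyya distance. The histogram bin counts and the 0.5 threshold are illustrative assumptions.

```python
import cv2


def scene_boundaries(video_path, threshold=0.5):
    """Return frame indices where a scene change is likely.

    Compares HSV color histograms of consecutive frames; a large
    histogram distance suggests a cut between scenes. The threshold
    is a tunable assumption, not a value from the patent.
    """
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Bhattacharyya distance: 0 = identical, 1 = fully disjoint.
            d = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if d > threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```

Within one scene the distance stays small, which is exactly the redundancy the prior keyframe methods extract repeatedly; across a cut it jumps, which is the information difference those methods ignore.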

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
View more



Embodiment Construction

[0041] The following clearly and completely describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.

[0042] It should be noted that the terms "first" and "second" in the description, claims and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can be practiced in orders other than those illustrated or described here.



Abstract

The invention relates to a method and equipment for generating video content description information. The method comprises the following steps: acquiring an image sequence of a target video; dividing the image sequence into a plurality of sub-image sequences, where any two consecutive sub-image sequences correspond to different scenes; for each sub-image sequence, detecting the first frame image of the current sub-image sequence with a trained first detection model to obtain corresponding static scene description information; detecting the images other than the first frame image in the current sub-image sequence with a trained second detection model to obtain corresponding dynamic event description information; and determining the content description information corresponding to the current sub-image sequence from the static scene description information and the dynamic event description information. In this way, the difficulty of video understanding can be reduced, the extraction of redundant information can be reduced, and computational efficiency can be improved.
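
Read procedurally, the abstract describes a per-scene pipeline: split the frame sequence at scene boundaries, describe the first frame of each sub-sequence statically, describe the remaining frames dynamically, then combine the two. The Python sketch below mirrors those steps under stated assumptions: `first_model`, `second_model`, and the plain concatenation used to combine the two descriptions are hypothetical placeholders, since the abstract does not specify the networks or the combination rule.

```python
def split_into_scenes(frames, boundaries):
    """Split the frame sequence at the given scene-boundary indices so
    that any two consecutive sub-sequences correspond to different scenes."""
    cuts = [0] + sorted(boundaries) + [len(frames)]
    return [frames[a:b] for a, b in zip(cuts[:-1], cuts[1:]) if a < b]


def describe_video(frames, boundaries, first_model, second_model):
    """Mirror the abstract's steps for each sub-image sequence.

    first_model:  trained callable, one image -> static scene description
    second_model: trained callable, list of images -> dynamic event description
    Both interfaces are hypothetical placeholders, not the patent's networks.
    """
    descriptions = []
    for sub_seq in split_into_scenes(frames, boundaries):
        static_desc = first_model(sub_seq[0])      # first frame of the scene
        dynamic_desc = second_model(sub_seq[1:])   # all remaining frames
        # The patent text here does not give the combination rule;
        # simple concatenation stands in for it.
        descriptions.append(f"{static_desc} {dynamic_desc}".strip())
    return descriptions
```

Describing the static scene once per sub-sequence, rather than once per frame, is what lets the method avoid re-extracting redundant scene information from every frame while still tracking events across the rest of the scene.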

Description

Technical field

[0001] The present application relates to the technical field of computer vision, and in particular to a method and device for generating video content description information.


Application Information

IPC(8): G06K 9/00, G06K 9/46
CPC: G06V 20/47, G06V 20/48, G06V 10/44
Inventor: 陈南希, 刘李黎, 张睿芃, 李燕北, 王俊翰, 张晓林
Owner: SHANGHAI INST OF MICROSYSTEM & INFORMATION TECH CHINESE ACAD OF SCI