
A multimodal video scene segmentation method based on sound and vision

A technique for video scene segmentation, applied in speech analysis, character and pattern recognition, and instruments, achieving improved segmentation accuracy

Inactive Publication Date: 2019-02-15
SHANGHAI JILIAN NETWORK TECH CO LTD

AI Technical Summary

Problems solved by technology

At present, there is still a lack of an effective multimodal joint modeling method that combines sound and visual information to improve the accuracy of scene segmentation.




Embodiment Construction

[0016] Various details involved in the technical solution will be described in detail below in conjunction with the accompanying drawings. It should be noted that the described embodiments are intended to facilitate the understanding of the present invention, but not to limit it in any way.

[0017] The implementation process of the present invention is shown in Figure 1:

[0018] In the embodiment of the present invention, the temporal boundaries of shots are first determined using the combined features of tracking-flow continuity and the continuity of the global image color distribution, dividing the video into segments composed of shots. Tracking-flow continuity refers to the fact that objects or regions appearing within a single shot move continuously, whereas a sudden change occurs at a shot boundary.
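The color-distribution cue above can be illustrated with a minimal sketch. This is not the patent's actual implementation; the bin count, the L1 histogram distance, and the threshold value are all illustrative assumptions. A shot boundary is flagged wherever the global color histogram changes abruptly between consecutive frames:

```python
import numpy as np

def color_histogram(frame, bins=16):
    """Global color distribution of a frame: a normalized per-channel histogram."""
    hist = np.concatenate([
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(frame.shape[-1])
    ]).astype(float)
    return hist / hist.sum()

def detect_shot_boundaries(frames, threshold=0.4):
    """Flag a boundary between frame i-1 and i when the global color
    distribution changes abruptly (L1 histogram distance above threshold)."""
    boundaries = []
    prev = color_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = color_histogram(frames[i])
        if np.abs(cur - prev).sum() > threshold:
            boundaries.append(i)  # index of the first frame of the new shot
        prev = cur
    return boundaries
```

In practice this cue would be fused with the tracking-flow continuity described above, since histogram continuity alone misses boundaries between similarly colored shots and over-fires on flashes or fast lighting changes.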

[0019] In the embodiment of the present invention, the optical flow field between adjacent frames in the video is calculated to obtain the motion amount between...
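As a rough stand-in for the optical-flow computation described here (the patent text as excerpted does not specify the algorithm; this toy version uses exhaustive block matching, with block size and search range chosen arbitrarily), the inter-frame "motion amount" can be sketched as the mean block displacement magnitude:

```python
import numpy as np

def block_motion(prev, cur, block=8, search=4):
    """Estimate per-block displacement between two grayscale frames by
    exhaustive block matching (a crude stand-in for dense optical flow),
    and return the mean displacement magnitude as the motion amount."""
    h, w = prev.shape
    mags = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            ref = prev[y:y + block, x:x + block].astype(float)
            best, best_dxy = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue  # candidate window falls outside the frame
                    cand = cur[yy:yy + block, xx:xx + block].astype(float)
                    err = np.abs(ref - cand).sum()  # sum of absolute differences
                    if best is None or err < best:
                        best, best_dxy = err, (dx, dy)
            mags.append(np.hypot(*best_dxy))
    return float(np.mean(mags))
```

A real system would use a proper dense optical-flow method; the point of the sketch is only that the aggregate motion between adjacent frames stays small and smooth within a shot and jumps at a cut.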



Abstract

The invention discloses a multimodal video scene segmentation method based on sound and vision. The method comprises the following steps: step S1, the input video is segmented into shots to obtain the individual shot segments; step S2, visual and sound features are extracted from the segmented shot segments to obtain the visual and sound feature vectors corresponding to each shot; step S3, according to the visual and sound features, adjacent shots belonging to the same semantics are merged into the same scene to obtain the scene temporal boundaries.
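Step S3 of the abstract can be sketched as a greedy merge of temporally adjacent shots. This is an illustrative reconstruction, not the claimed method: the fused feature vector, the cosine-similarity measure, and the threshold are all assumptions.

```python
import numpy as np

def merge_shots_into_scenes(features, sim_threshold=0.8):
    """Greedily merge temporally adjacent shots whose fused audio-visual
    feature vectors are similar (cosine similarity above threshold).
    `features` holds one fused vector per shot; returns a scene id per shot."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    if not features:
        return []
    scene_ids = [0]
    for i in range(1, len(features)):
        if cosine(features[i - 1], features[i]) >= sim_threshold:
            scene_ids.append(scene_ids[-1])      # same scene as previous shot
        else:
            scene_ids.append(scene_ids[-1] + 1)  # similarity drop: scene boundary
    return scene_ids
```

Scene temporal boundaries then fall wherever the scene id changes between consecutive shots.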

Description

Technical Field

[0001] The invention relates to a video scene segmentation method, and in particular to a multimodal video scene segmentation method based on sound and vision.

Background

[0002] Video segmentation along the time dimension is a basic step and an important link in video structure analysis. Its purpose is to segment the original video according to its content structure, grouping parts containing the same or similar content into the same segment and separating parts with different content. The video content structure can be divided into shots and scenes according to semantic level. A shot is a video segment captured continuously by a camera in a single take; image changes within a shot are usually caused by camera motion, object motion, and changes in the light source, and are gradual rather than abrupt. A scene is a video clip composed of several semantically related continuous shots that can express commo...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/00; G10L25/57; G10L25/30
CPC: G10L25/30; G10L25/57; G06V20/46; G06V20/49
Inventors: 张奕, 谢锦滨
Owner: SHANGHAI JILIAN NETWORK TECH CO LTD