
A multimodal video scene segmentation method based on sound and vision

A technique for video scene segmentation, applied in speech analysis, character and pattern recognition, and instruments, achieving improved segmentation accuracy

Inactive Publication Date: 2019-02-15
SHANGHAI JILIAN NETWORK TECH CO LTD

AI Technical Summary

Problems solved by technology

At present, there is still a lack of an effective multimodal joint modeling method that combines sound and visual information to improve the accuracy of scene segmentation.




Embodiment Construction

[0016] Various details involved in the technical solution will be described in detail below in conjunction with the accompanying drawings. It should be noted that the described embodiments are intended to facilitate the understanding of the present invention, but not to limit it in any way.

[0017] The implementation process of the present invention is shown in Figure 1:

[0018] In the embodiment of the present invention, the temporal boundaries of shots are first determined using the combined features of tracking-flow continuity and the continuity of the global image color distribution, dividing the video into segments composed of shots. Tracking-flow continuity refers to the fact that objects or regions appearing within a single shot move continuously, whereas a sudden change occurs at a shot boundary.
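The color-distribution cue above can be illustrated with a minimal sketch. This is not the patent's actual implementation; the bin count, the L1 histogram distance, and the threshold value are all illustrative assumptions. A shot boundary is flagged wherever the global color histogram changes abruptly between consecutive frames:

```python
import numpy as np

def color_histogram(frame, bins=16):
    """Global color distribution of a frame: a normalized per-channel histogram."""
    hist = np.concatenate([
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(frame.shape[-1])
    ]).astype(float)
    return hist / hist.sum()

def detect_shot_boundaries(frames, threshold=0.4):
    """Flag a boundary between frame i-1 and i when the global color
    distribution changes abruptly (L1 histogram distance above threshold)."""
    boundaries = []
    prev = color_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = color_histogram(frames[i])
        if np.abs(cur - prev).sum() > threshold:
            boundaries.append(i)  # index of the first frame of the new shot
        prev = cur
    return boundaries
```

In practice this cue would be fused with the tracking-flow continuity described above, since histogram continuity alone misses boundaries between similarly colored shots and over-fires on flashes or fast lighting changes.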

[0019] In the embodiment of the present invention, the optical flow field between adjacent frames in the video is calculated to obtain the motion amount between...
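As a rough stand-in for the optical-flow computation described here (the patent text as excerpted does not specify the algorithm; this toy version uses exhaustive block matching, with block size and search range chosen arbitrarily), the inter-frame "motion amount" can be sketched as the mean block displacement magnitude:

```python
import numpy as np

def block_motion(prev, cur, block=8, search=4):
    """Estimate per-block displacement between two grayscale frames by
    exhaustive block matching (a crude stand-in for dense optical flow),
    and return the mean displacement magnitude as the motion amount."""
    h, w = prev.shape
    mags = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            ref = prev[y:y + block, x:x + block].astype(float)
            best, best_dxy = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue  # candidate window falls outside the frame
                    cand = cur[yy:yy + block, xx:xx + block].astype(float)
                    err = np.abs(ref - cand).sum()  # sum of absolute differences
                    if best is None or err < best:
                        best, best_dxy = err, (dx, dy)
            mags.append(np.hypot(*best_dxy))
    return float(np.mean(mags))
```

A real system would use a proper dense optical-flow method; the point of the sketch is only that the aggregate motion between adjacent frames stays small and smooth within a shot and jumps at a cut.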



Abstract

The invention discloses a multimodal video scene segmentation method based on sound and vision. The method comprises the following steps: step S1, the input video is segmented into shots to obtain the individual shot segments; step S2, visual and sound features are extracted from the segmented shot segments to obtain the visual and sound feature vectors corresponding to each shot; step S3, according to the visual and sound features, adjacent shots belonging to the same semantics are merged into the same scene to obtain the scene temporal boundaries.
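Step S3 of the abstract can be sketched as a greedy merge of temporally adjacent shots. This is an illustrative reconstruction, not the claimed method: the fused feature vector, the cosine-similarity measure, and the threshold are all assumptions.

```python
import numpy as np

def merge_shots_into_scenes(features, sim_threshold=0.8):
    """Greedily merge temporally adjacent shots whose fused audio-visual
    feature vectors are similar (cosine similarity above threshold).
    `features` holds one fused vector per shot; returns a scene id per shot."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    if not features:
        return []
    scene_ids = [0]
    for i in range(1, len(features)):
        if cosine(features[i - 1], features[i]) >= sim_threshold:
            scene_ids.append(scene_ids[-1])      # same scene as previous shot
        else:
            scene_ids.append(scene_ids[-1] + 1)  # similarity drop: scene boundary
    return scene_ids
```

Scene temporal boundaries then fall wherever the scene id changes between consecutive shots.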

Description

Technical Field

[0001] The invention relates to a video scene segmentation method, and in particular to a multimodal video scene segmentation method based on sound and vision.

Background

[0002] Video segmentation along the time dimension is a basic step and an important link in video structure analysis. Its purpose is to segment the original video according to its content structure, grouping parts containing the same or similar content into the same segment and separating parts with different content. The video content structure can be divided into shots and scenes according to semantic level. A shot is a video segment captured continuously by a camera in a single take; image changes within a shot are usually caused by camera motion, object motion, and changes in the light source, and are gradual rather than abrupt. A scene is a video clip composed of several semantically related continuous shots that can express commo...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/00; G10L25/57; G10L25/30
CPC: G10L25/30; G10L25/57; G06V20/46; G06V20/49
Inventors: 张奕, 谢锦滨
Owner: SHANGHAI JILIAN NETWORK TECH CO LTD