
Method of locating sound source from video

A video sound source technology in the field of cross-modal learning. It addresses the problems of blurred localization edges and low localization accuracy, achieving high accuracy and high application value.

Active Publication Date: 2019-04-16
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

[0003] At present, most existing work on video sound source localization operates at the pixel level: convolutional neural networks learn the relationship between the sound and different positions in the image, and a heat map marks the likely sounding regions of the original frame. This approach produces blurred edges and low localization accuracy, and it still emits localization output on video frames where the sound and the picture are not synchronized.
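The pixel-level baseline criticized above can be sketched as follows. This is an illustrative reconstruction, not the patent's formulation: the feature shapes, the cosine-similarity scoring, and the function names are all assumptions.

```python
import numpy as np

def localization_heatmap(visual_feats: np.ndarray, audio_emb: np.ndarray) -> np.ndarray:
    """Baseline sketch: score each spatial position of a CNN visual feature
    map (H, W, D) against a single audio embedding (D,) by cosine similarity,
    then rescale to [0, 1] so the result can be shown as a heat map."""
    v = visual_feats / (np.linalg.norm(visual_feats, axis=-1, keepdims=True) + 1e-8)
    a = audio_emb / (np.linalg.norm(audio_emb) + 1e-8)
    sim = v @ a                  # (H, W) cosine similarities in [-1, 1]
    return (sim + 1.0) / 2.0     # rescale to [0, 1]

rng = np.random.default_rng(0)
heat = localization_heatmap(rng.normal(size=(7, 7, 128)), rng.normal(size=128))
print(heat.shape)  # (7, 7)
```

Because such a per-pixel score is computed independently at every location and on every frame, it yields soft, blurry boundaries and still produces a heat map even when the audio does not match the frame, which is exactly the weakness paragraph [0003] points out.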




Embodiment Construction

[0039] The present invention proposes a method for locating a sound source from a video, which will be further described in detail below in conjunction with specific embodiments.

[0040] The present invention proposes a method for locating a sound source from a video, comprising the following steps:

[0041] (1) Training stage;

[0042] (1-1) Obtain training samples: obtain J video segments from any source as training samples, each 10 seconds long. There is no special requirement on the content of the training videos beyond covering a variety of object categories; the object categories in each training sample video are labeled manually.

[0043] In the present embodiment, the training sample videos are drawn from 10 categories of the AudioSet dataset (car, motorcycle, helicopter, yacht, speech, dog, cat, pig, alarm clock, guitar); this embodiment selects J = 32469 video clips in total...
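The sample-preparation step above can be sketched as follows. The category names and the clip-splitting helper are hypothetical illustrations of the described setup (fixed 10-second clips with a manual category label), not code from the patent.

```python
# The 10 AudioSet categories used in this embodiment, per paragraph [0043].
CATEGORIES = ["car", "motorcycle", "helicopter", "yacht", "speech",
              "dog", "cat", "pig", "alarm clock", "guitar"]

def make_clips(video_length_s: float, clip_len_s: float = 10.0):
    """Split a video into consecutive fixed-length clips, returned as
    (start, end) times in seconds; a trailing remainder shorter than
    clip_len_s is discarded."""
    n = int(video_length_s // clip_len_s)
    return [(i * clip_len_s, (i + 1) * clip_len_s) for i in range(n)]

clips = make_clips(35.0)
print(clips)  # [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0)]
```

Each resulting 10-second clip, paired with its manually assigned category label, is one training sample.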



Abstract

The invention provides a method for locating a sound source from a video, belonging to the field of cross-modal learning. In the training stage, training sample videos are acquired and preprocessed, and a sound source localization neural network, composed of fully connected layers and a localization network, is constructed and trained on the preprocessed samples. In the test stage, a test video is obtained, preprocessed, and fed into the trained sound source localization network; a similarity score is computed and used both to decide whether the sound and the video images are synchronized and, when they are, to localize the sound source, thereby solving the sound source localization problem for asynchronous videos. The method automatically discovers the correspondence between each object in the video frame and the sound, achieves high localization accuracy and position precision, and has high application value.
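The test-stage similarity gate described in the abstract can be sketched as follows. This is a minimal illustration under assumptions: the embeddings, the cosine measure, and the fixed threshold are placeholders for whatever similarity the trained network actually produces.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def synchronized_frames(frame_embs, audio_emb, threshold: float = 0.5):
    """Keep only frames whose audio-visual similarity exceeds the threshold;
    per the abstract, localization is then run only on these synchronized
    frames, so asynchronous frames produce no spurious localization."""
    return [i for i, f in enumerate(frame_embs) if cosine(f, audio_emb) > threshold]

audio = np.array([1.0, 0.0])
frames = [np.array([1.0, 0.1]),   # visually consistent with the audio
          np.array([0.0, 1.0])]   # orthogonal: audio and picture disagree
sync = synchronized_frames(frames, audio)
print(sync)  # [0]
```

The design point is the gate itself: by scoring synchronization first and localizing second, the method avoids the baseline's failure mode of emitting a heat map on frames where sound and picture do not match.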

Description

Technical Field

[0001] The invention proposes a method for locating a sound source from a video, belonging to the field of cross-modal learning.

Background

[0002] In recent years, with the popularity of the Internet and television, people encounter more and more video clips. Videos contain rich sounds and images, and finding connections between them is meaningful in many ways, for example in making human-machine interaction friendlier. Automatically discovering the correspondence between the objects in a video frame and its sounds is becoming increasingly important, as it helps people quickly identify which part of the video is producing sound; a robot can likewise determine a target's location in scenarios such as rescue by localizing the sound source in the video.

[0003] At present, most existing work on video sound source localization operates at the pixel level, using convolutional neural networks to learn the relationsh...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/00; G06N3/04; G06N3/08; G10L25/30
CPC: G06N3/08; G10L25/30; G06V20/41; G06V2201/07; G06N3/045
Inventors: 刘华平 (Liu Huaping), 王峰 (Wang Feng), 郭迪 (Guo Di), 周峻峰 (Zhou Junfeng), 孙富春 (Sun Fuchun)
Owner: TSINGHUA UNIV