Media segment-based speaking detection method and system

A speaking detection method based on media segments, applied in speech analysis, instrumentation, computing, and related fields, which addresses problems such as weak generalization ability and a reduced detection rate.

Active Publication Date: 2016-02-17

博视联(苏州)信息科技有限公司

3 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

Existing techniques usually rely on a supervised learning trainer, whose generalization ability is weak, which lowers the detection rate.

Embodiment Construction

[0047] The technical solution of the present invention will be described in detail below in conjunction with the embodiments and accompanying drawings.

[0048] As shown in Figure 1, the method provided by the embodiment of the present invention comprises the following specific steps:

[0049] Step 1: divide the input media signal S(t) into an audio signal S1(t) and a video signal S2(t), which are processed separately.

[0050] The audio signal S1(t) is processed as follows:

[0051] (1) For the audio signal of the input media file, calculate the harmonic frequency vector in each discrete Fourier transform (DFT) window; the embodiment assumes that a common harmonic frequency is obtained across a plurality of DFT windows.

[0052] (2) Calculate, as the audio feature, the log-likelihood ratio logΛ(t) that each frame contains a harmonic frequency component, where t is the audio frame index.

[0053] In specific implementation, (1) and...
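Audio steps (1) and (2) can be illustrated with a minimal sketch. This excerpt does not give the patent's exact definition of logΛ(t), so the code below uses a stand-in: the log of the mean DFT power at assumed harmonic bins divided by the mean power at the remaining bins. The function names, the `f0_bin` parameter, the number of harmonics, and the test signals are all hypothetical and chosen only for illustration.

```python
import cmath
import math
import random

def dft_magnitudes(frame):
    """Magnitude spectrum of one frame via a direct DFT (bins 0..N/2)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags

def harmonic_log_ratio(frame, f0_bin, n_harmonics=3, eps=1e-12):
    """Stand-in for log Lambda(t), NOT the patent's formula (assumption):
    log of mean power at harmonic bins over mean power at other bins."""
    power = [m * m for m in dft_magnitudes(frame)]
    harmonic_bins = {f0_bin * h for h in range(1, n_harmonics + 1)
                     if 0 < f0_bin * h < len(power)}
    if not harmonic_bins:
        raise ValueError("f0_bin must map to at least one valid DFT bin")
    harm = [p for k, p in enumerate(power) if k in harmonic_bins]
    # Skip the DC bin (k = 0) when estimating the non-harmonic floor.
    rest = [p for k, p in enumerate(power)
            if k not in harmonic_bins and k != 0]
    return math.log((sum(harm) / len(harm) + eps) /
                    (sum(rest) / len(rest) + eps))

# Hypothetical test frames: a harmonic tone and seeded uniform noise.
N = 64
voiced = [math.sin(2 * math.pi * 4 * i / N)
          + 0.5 * math.sin(2 * math.pi * 8 * i / N) for i in range(N)]
rng = random.Random(0)
noise = [rng.uniform(-1, 1) for _ in range(N)]

voiced_score = harmonic_log_ratio(voiced, f0_bin=4)
noise_score = harmonic_log_ratio(noise, f0_bin=4)
```

A frame with energy concentrated at multiples of the assumed fundamental bin scores much higher than the noise frame, which is the property a harmonic-based audio feature needs before it is passed to later stages.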

Abstract

The invention provides a media segment-based speaking detection method and system. The input media signal is divided into an audio signal and a video signal, which are processed separately. For the audio signal, a hidden Markov model is used to calculate per-second conditional probabilities based on the likelihood ratio of harmonic frequencies, and the results are clustered. For the video signal, the face region, the lip portion, and the image energy of the lip region are extracted from each frame of the input media file; clustering is performed according to the image energy, a hidden Markov model is used to calculate per-second conditional probabilities, and clustering is performed again, yielding two clusters. The clustering results of the audio signal are then matched with the clustering results of the video signal to obtain the final speaking detection result. Because detection draws on both audio and video information, the detection rate can be improved.
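The final clustering and matching stage described in the abstract can be sketched as follows. The abstract does not state the clustering algorithm or the matching rule, so this sketch makes two assumptions: each modality's per-second probabilities are split into two clusters with a simple one-dimensional 2-means, and a second counts as "speaking" only when it falls in the higher-mean cluster for both audio and video. The function names and the sample probabilities are hypothetical.

```python
def two_means(values, iters=50):
    """1-D k-means with k = 2. Returns labels, 1 = higher-mean cluster.
    Assumption: the patent's clustering method is not given in this text."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to the nearer of the two centroids.
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        c0 = [v for v, lab in zip(values, labels) if lab == 0]
        c1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo = sum(c0) / len(c0) if c0 else lo
        new_hi = sum(c1) / len(c1) if c1 else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels

def match_clusters(audio_probs, video_probs):
    """Assumed matching rule: speaking only where both modalities
    place the second in their higher-mean cluster."""
    audio_labels = two_means(audio_probs)
    video_labels = two_means(video_probs)
    return [a == 1 and v == 1 for a, v in zip(audio_labels, video_labels)]

# Hypothetical per-second conditional probabilities for six seconds.
audio = [0.9, 0.8, 0.2, 0.1, 0.85, 0.15]
video = [0.7, 0.75, 0.3, 0.2, 0.25, 0.1]
speaking = match_clusters(audio, video)
print(speaking)
```

In this example the fifth second has strong audio evidence but weak lip motion, so the assumed matching rule rejects it. Requiring agreement is one simple way to use both modalities without any labeled training data.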

Description

Technical field

[0001] The invention relates to the technical field of speech detection, and in particular to a speaking detection method and system for media segments.

Background

[0002] With the development of information technology, human-computer interaction, teleconferencing, voiceprint recognition, and related technologies have become active research topics, and speech detection, as an important part of them, has received increasing attention. Speech detection distinguishes whether a person in a media segment is speaking. Traditional speech activity detection relies mainly on either audio information or video information, which makes it less robust. To solve this problem, multi-modal speech detection based on both audio and video information was introduced. However, the existing technology usually uses a supervised learning trainer, and its generalization ability is not strong, resulting in...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06K9/62, G06K9/00, G10L25/78

CPC: G10L25/78, G06V40/172, G06V10/758, G06F18/23

Inventor: 胡瑞敏, 王瑾, 梁超, 王晓晨

Owner: 博视联(苏州)信息科技有限公司
