Media segment-based speaking detection method and system

A detection method based on media technology, applied in speech analysis, instrumentation, computing, and related fields, that addresses problems such as weak generalization ability and a reduced detection rate.

Active Publication Date: 2016-02-17
博视联(苏州)信息科技有限公司
Cites: 3; Cited by: 6

AI Technical Summary

Problems solved by technology

However, existing techniques usually rely on a trainer based on supervised learning, which generalizes poorly and therefore lowers the detection rate.




Detailed Description of Embodiments

[0047] The technical solution of the present invention will be described in detail below in conjunction with the embodiments and accompanying drawings.

[0048] As shown in Figure 1, the method provided by the embodiment of the present invention comprises the following steps:

[0049] Step 1: divide the input media signal $S(t)$ into an audio signal $S_1(t)$ and a video signal $S_2(t)$, which are processed separately.

[0050] The audio signal $S_1(t)$ is processed as follows:

[0051] (1) For the audio signal of the input media file, compute the harmonic frequency vector within each discrete Fourier transform (DFT) window; in this embodiment, a common harmonic frequency is assumed to be obtained across multiple DFT windows.

[0052] (2) For each frame, compute the log-likelihood ratio $\log\Lambda(t)$ that the frame contains a harmonic frequency component and use it as the audio feature, where $t$ is the audio frame index.

[0053] In a specific implementation, (1) and...
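The two audio steps in [0051] and [0052] can be illustrated with the Python sketch below. It is a minimal sketch under stated assumptions, not the patented implementation: the 8 kHz sample rate and 400-sample window in the usage lines, the 80 to 400 Hz fundamental search range, the harmonic-sum rule for choosing harmonics, taking the most frequent per-window fundamental as the common harmonic frequency, and the two Gaussian hypotheses (with placeholder parameters) behind $\log\Lambda(t)$ are all assumptions, and every function name is hypothetical. A naive DFT keeps it dependency-free.

```python
import cmath
import math
from collections import Counter


def dft_power(frame):
    """Naive O(N^2) DFT of one analysis window; power for bins 0..N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def harmonic_bins(power, sample_rate, n, f0_range=(80.0, 400.0), n_harm=5):
    """Step (1), via an ASSUMED harmonic-sum rule: pick the fundamental bin
    whose first n_harm multiples carry the most power and return the bin
    indices of those harmonics (the window's harmonic frequency vector)."""
    bin_hz = sample_rate / n
    k_lo = max(1, math.ceil(f0_range[0] / bin_hz))
    k_hi = min(len(power) - 1, math.floor(f0_range[1] / bin_hz))

    def harmonic_sum(k):
        return sum(power[h * k] for h in range(1, n_harm + 1) if h * k < len(power))

    best = max(range(k_lo, k_hi + 1), key=harmonic_sum)
    return [h * best for h in range(1, n_harm + 1) if h * best < len(power)]


def common_fundamental_hz(frames, sample_rate):
    """Most frequent per-window fundamental across all DFT windows; using
    the mode as the 'common harmonic frequency' is an ASSUMPTION."""
    n = len(frames[0])
    f0s = [harmonic_bins(dft_power(f), sample_rate, n)[0] * sample_rate / n
           for f in frames]
    return Counter(f0s).most_common(1)[0][0]


def gaussian_logpdf(x, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def frame_log_lr(power, bins, h1=(0.6, 0.15), h0=(0.1, 0.1)):
    """Step (2): log Λ(t) = log p(r | H1) - log p(r | H0), where r is the share
    of the frame's power on the harmonic bins, H1 = 'harmonic component
    present', H0 = 'absent'. The Gaussian (mean, std) pairs are PLACEHOLDERS."""
    total = sum(power)
    r = sum(power[b] for b in bins) / total if total > 0 else 0.0
    return gaussian_logpdf(r, *h1) - gaussian_logpdf(r, *h0)


# Usage on synthetic frames: 8 kHz audio, 400-sample windows (20 Hz per bin).
sr, n = 8000, 400
voiced = [sum(math.sin(2 * math.pi * h * 200 * i / sr) for h in range(1, 6)) for i in range(n)]
tone = [math.sin(2 * math.pi * 300 * i / sr) for i in range(n)]
bins = harmonic_bins(dft_power(voiced), sr, n)  # harmonics of 200 Hz
print(common_fundamental_hz([voiced, voiced, tone], sr))  # 200.0
print(frame_log_lr(dft_power(voiced), bins) > 0,
      frame_log_lr(dft_power(tone), bins) > 0)  # True False
```

In practice an FFT routine would replace `dft_power`, and the Gaussian parameters would have to be estimated from data; a positive $\log\Lambda(t)$ indicates that frame $t$ more likely contains a harmonic component.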



Abstract

The invention provides a media segment-based speaking detection method and system. The method comprises the following steps. The input media signal is divided into an audio signal and a video signal, which are processed separately. For the audio signal, a hidden Markov model (HMM) is used to compute per-second conditional probabilities from the likelihood ratio of harmonic frequencies, and the results are clustered. For the video signal, the face region, the lip portion, and the image energy of the lip region are extracted from each frame of the input media file; the frames are clustered according to image energy, an HMM is then used to compute per-second conditional probabilities, and a further clustering yields two clusters. Finally, the clustering results of the audio signal are matched against those of the video signal to obtain the final speaking-detection result. Because detection draws on both audio and video information, the method and system can improve the detection rate.
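The later stages summarized in the abstract (per-second HMM conditional probabilities, two-way clustering, and audio-video matching) can be illustrated with the sketch below. It assumes that per-second feature sequences already exist (for example, the mean $\log\Lambda(t)$ within each second for audio and a per-second summary of lip-region image energy for video), that face and lip localization happen upstream, and that "image energy" means mean squared grayscale intensity. It collapses the two video clustering stages into one and uses a two-state Gaussian HMM filtered with the forward algorithm, a 1-D 2-means clustering, and a simple AND rule as the matching step. All of these choices, the HMM parameters, and the function names are hypothetical rather than taken from the patent.

```python
import math


def lip_energy(lip_roi):
    """ASSUMED definition of lip-region image energy: mean squared grayscale
    value of a 2-D region; face and lip localization happen upstream."""
    pixels = [p for row in lip_roi for p in row]
    return sum(p * p for p in pixels) / len(pixels)


def gaussian_pdf(x, mean, std):
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))


def hmm_speaking_probs(obs, means, stds, stay=0.9):
    """Filtered P(speaking at second s | obs[0..s]) from a two-state HMM
    (0 = silent, 1 = speaking) with Gaussian emissions (forward algorithm).
    means, stds and the self-transition probability 'stay' are placeholders."""
    alpha, probs = [0.5, 0.5], []
    for s, x in enumerate(obs):
        if s > 0:  # propagate the previous belief through the transition matrix
            alpha = [alpha[0] * stay + alpha[1] * (1 - stay),
                     alpha[0] * (1 - stay) + alpha[1] * stay]
        alpha = [alpha[i] * gaussian_pdf(x, means[i], stds[i]) for i in range(2)]
        z = sum(alpha)
        alpha = [a / z for a in alpha] if z > 0 else [0.5, 0.5]  # normalize
        probs.append(alpha[1])
    return probs


def two_means(values, iters=20):
    """1-D k-means with k = 2; label 1 marks the higher-mean cluster."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        c0 = [v for v, l in zip(values, labels) if l == 0]
        c1 = [v for v, l in zip(values, labels) if l == 1]
        lo = sum(c0) / len(c0) if c0 else lo
        hi = sum(c1) / len(c1) if c1 else hi
    return labels


def match_clusters(audio_labels, video_labels):
    """ASSUMED matching rule: a second is 'speaking' only when the audio and
    video clusterings both put it in their speaking cluster."""
    return [a & v for a, v in zip(audio_labels, video_labels)]


# Usage with made-up per-second features (six seconds of media).
audio = two_means(hmm_speaking_probs([-3, -2.5, 4, 5, 3.5, -3], (-3.0, 4.0), (2.0, 2.0)))
video = two_means(hmm_speaking_probs([0.1, 0.2, 0.85, 0.9, 0.15, 0.1], (0.2, 0.8), (0.15, 0.15)))
print(match_clusters(audio, video))  # [0, 0, 1, 1, 0, 0]
```

The AND rule is a conservative choice: a second is reported as speaking only when sound activity and lip activity coincide, which suppresses off-screen voices and silent lip motion. Other matching rules are equally plausible readings of the abstract.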

Description

Technical Field

[0001] The invention relates to the technical field of speech detection, and in particular to a speaking detection method and system based on media segments.

Background

[0002] With the development of information technology, human-computer interaction, teleconferencing, voiceprint recognition, and related technologies have become active research topics, and speech detection, as an important component of them, has received increasing attention. Speech detection distinguishes whether a person in a media segment is speaking. Traditional voice activity detection methods rely on either audio or video information alone and are not robust. To address this, multimodal speech detection based on both audio and video information has been introduced. However, existing techniques usually rely on a trainer based on supervised learning, which generalizes poorly, resulting in...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62; G06K9/00; G10L25/78
CPC: G10L25/78; G06V40/172; G06V10/758; G06F18/23
Inventors: 胡瑞敏 (Hu Ruimin), 王瑾 (Wang Jin), 梁超 (Liang Chao), 王晓晨 (Wang Xiaochen)
Owner: 博视联(苏州)信息科技有限公司