Audio segmentation method based on decision tree and speaker change detection

A speaker change detection technology for use in speech analysis, speech recognition, instruments, and related fields. It addresses the problem of unreliable distance values, saves computation time, and improves accuracy.

Inactive Publication Date: 2009-06-24
ZHEJIANG UNIV
3 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

The KL2-distance-based algorithm computes the distance between speech segments taken from a moving fixed-length window, which makes the distance values unreliable.
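To make the baseline concrete, here is a minimal sketch of a KL2 (symmetric Kullback-Leibler) distance computed over adjacent fixed-length windows. It assumes each window is modelled as a diagonal-covariance Gaussian, which is a common choice but is not stated in the text above; the function names `kl2_distance` and `sliding_kl2`, the window length, and the feature layout (one feature vector per frame) are all illustrative.

```python
from statistics import fmean, pvariance


def kl2_distance(seg_a, seg_b, eps=1e-8):
    """Symmetric KL (KL2) distance between two segments of feature vectors.

    Assumption: each segment is modelled as a diagonal Gaussian, so the
    distance is a sum of per-dimension terms.
    """
    total = 0.0
    # zip(*segment) regroups the frames into one tuple per feature dimension.
    for dim_a, dim_b in zip(zip(*seg_a), zip(*seg_b)):
        mu_a, mu_b = fmean(dim_a), fmean(dim_b)
        var_a = pvariance(dim_a, mu_a) + eps  # eps guards against zero variance
        var_b = pvariance(dim_b, mu_b) + eps
        total += 0.5 * (var_a / var_b + var_b / var_a - 2.0)
        total += 0.5 * (mu_a - mu_b) ** 2 * (1.0 / var_a + 1.0 / var_b)
    return total


def sliding_kl2(features, win):
    """KL2 between the fixed-length windows left and right of each frame t."""
    return [
        kl2_distance(features[t - win:t], features[t:t + win])
        for t in range(win, len(features) - win + 1)
    ]
```

A peak in the `sliding_kl2` curve marks a candidate change point. Because every window has the same fixed length, a window that straddles a change or contains mostly silence yields a poorly estimated Gaussian, which is one way the distance values can become unreliable.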



Embodiment Construction

[0011] The invention is further described below with reference to the accompanying drawings and an embodiment. The audio segmentation method based on a decision tree and speaker change detection comprises six steps:

[0012] Step 1: Audio Preprocessing

[0013] Audio preprocessing is divided into three parts: sampling and quantization; zero-drift removal; and pre-emphasis and windowing.

[0014] 1. Sampling and quantization

[0015] A) Filter the audio signal with a sharp cut-off filter so that its Nyquist frequency F_N is 4 kHz;

[0016] B) Set the audio sampling rate F = 2 F_N;

[0017] C) Sample the audio signal s_a(t) periodically to obtain the amplitude sequence of the digital audio signal, s(n) = s_a(n / F);

[0018] D) s(n) is quantized and coded by p...
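Steps A) to C) can be illustrated with a short sketch. It takes F_N = 4 kHz and F = 2 F_N from the text and evaluates s(n) = s_a(n / F); the 1 kHz test tone and the names `sample` and `tone` are hypothetical, and the quantization of step D) is left out because its description is truncated above.

```python
import math

F_N = 4000.0   # Nyquist frequency in Hz (step A)
F = 2 * F_N    # sampling rate in Hz (step B)


def sample(s_a, duration_s, rate=F):
    """Return the amplitude sequence s(n) = s_a(n / rate) (step C)."""
    n_samples = round(duration_s * rate)
    return [s_a(n / rate) for n in range(n_samples)]


def tone(t):
    """Hypothetical band-limited analog signal: a 1 kHz sine wave."""
    return math.sin(2 * math.pi * 1000.0 * t)


s = sample(tone, 0.01)  # 10 ms of audio at 8 kHz
```

The tone lies below F_N, so sampling at F = 8 kHz represents it without aliasing; a real input would first pass through the cut-off filter of step A).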


Abstract

The audio segmentation method based on a decision tree and speaker change detection proceeds in three stages. First, adaptive silence detection finds the silent portions of the audio, and the audio signal is coarsely segmented at those silences. Next, abrupt-change detection finely segments the audio signal, and a decision tree classifies each audio segment as speech or non-speech. Finally, speaker change points are detected within the speech segments to obtain the final segmentation result. By combining silence detection with abrupt-change detection and using a speech/non-speech decision tree, the invention improves the accuracy of speech detection; by performing speaker change detection only within speech segments, it saves computation time.
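The coarse segmentation stage can be sketched as follows. The abstract does not say how the silence threshold adapts, so this sketch substitutes a simple rule (a fixed fraction of the mean frame energy) purely as an assumption; the names `frame_energies` and `coarse_segments` and the parameters `ratio` and `min_silence` are hypothetical and do not come from the patent.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def coarse_segments(energies, ratio=0.1, min_silence=3):
    """Split at runs of at least min_silence low-energy frames.

    Assumption: a frame is silent when its energy is below
    ratio * mean energy. Returns (start, end) frame index pairs
    of the non-silent segments, with end exclusive.
    """
    if not energies:
        return []
    threshold = ratio * sum(energies) / len(energies)
    segments, start, quiet = [], None, 0
    for i, e in enumerate(energies):
        if e >= threshold:
            if start is None:
                start = i          # a new segment begins
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence:
                # Close the segment at the first frame of the silent run.
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        segments.append((start, len(energies) - quiet))
    return segments
```

Short pauses (fewer than `min_silence` frames) stay inside a segment, so only longer silences produce coarse boundaries; the later stages described above would then refine and classify these segments.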

Description

Technical field

[0001] The invention relates to signal processing and pattern recognition, and in particular to an audio segmentation method based on a decision tree and speaker change detection.

Background

[0002] Speaker retrieval is the technology of locating specific speakers in a large collection of audio documents using signal processing and pattern recognition methods. It must answer two questions: who is speaking, and when. Speaker retrieval commonly uses speaker recognition to determine who is speaking, while determining when they speak requires audio segmentation.

[0003] Commonly used segmentation methods include segmentation based on the Bayesian information criterion and segmentation based on the KL2 distance. The Bayesian information criterion method decides whether to segment by calculating the Bayesian values of two hypotheses, "two audio features obey the same Gaussian distribution" an...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L15/08, G10L15/00, G10L25/87
Inventor: 吴朝晖, 杨莹春, 杨旻
Owner: ZHEJIANG UNIV