Method for estimating speech speed of multiple speakers based on segmentation and clustering of speakers

A speaker-oriented speech and language technology, applied in speech analysis, instruments, and related fields, that addresses the inability of existing methods to produce multi-speaker speech rate estimates, their poor suitability for real-time processing, and their slow speed, and thereby saves computing time.

Status: Inactive
Publication Date: 2012-07-04
SOUTH CHINA UNIV OF TECH
Cites: 4 | Cited by: 58

AI Technical Summary

Problems solved by technology

When the input speech contains the speech of multiple speakers, it is processed as the speech of a single speaker, so speech rate estimates for the individual speakers cannot be obtained.
[0006] (2) Slow speed.
This method needs to train a large number of phoneme models (generally Hidden Mar

Detailed Description of the Embodiments

[0036] A detailed description will be given below in conjunction with specific embodiments and accompanying drawings.

[0037] Figure 1 is a flowchart of a method for estimating the speech rates of multiple speakers according to an embodiment of the present invention. As shown in Figure 1, the voice stream is first read in step 101. The voice stream is audio data recording the voices of multiple speakers and can be a file in any of various formats, such as WAV, RAM, MP3, or VOX.
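As an illustration of step 101, the following is a minimal sketch of reading a voice stream from a WAV file with the Python standard library. The function name `read_wav_mono16` and the restriction to 16-bit mono PCM are assumptions made for this example; the source does not specify how the stream is decoded, and the other formats it lists would need a different decoder.

```python
import struct
import wave


def read_wav_mono16(path):
    """Return (sample_rate, samples) for a 16-bit mono PCM WAV file.

    Hypothetical helper for illustration; not taken from the patent text.
    """
    with wave.open(path, "rb") as wav:
        # Assumption: the sketch only supports one channel of 16-bit samples.
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV PCM data is little-endian; unpack into signed 16-bit integers.
    samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    return sample_rate, samples
```

The returned sample list and sample rate are all that later steps need in order to frame the signal and measure durations.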

[0038] Then, in step 102, a threshold-based silence detection method is used to find the silence segments and the speech segments in the voice stream. The speech segments are spliced in order into one long speech segment, and audio features are extracted from it. Using these features, the similarity between adjacent data windows in the long speech segment is judged according to the Bayesian information criterion to detect speaker change points; fi...
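The two ideas in step 102 can be sketched as follows. This is an illustrative simplification, not the patent's implementation: it assumes frame energy as the silence measure, a one-dimensional feature per frame, and a single univariate Gaussian per window for the BIC comparison, whereas practical systems typically use multivariate features. All function names, the `threshold` value, and `penalty_weight` are hypothetical.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speech_frames(energies, threshold):
    """Indices of frames whose energy exceeds the threshold (treated as speech)."""
    return [i for i, e in enumerate(energies) if e > threshold]


def _ml_variance(values, floor=1e-10):
    """Maximum-likelihood variance, floored so that log() stays finite."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return max(var, floor)


def delta_bic(left, right, penalty_weight=1.0):
    """Delta-BIC between two adjacent windows of 1-D features (lists).

    Positive values favour modelling the windows with two separate
    Gaussians, i.e. a speaker change point between them.
    """
    n1, n2 = len(left), len(right)
    n = n1 + n2
    gain = 0.5 * (
        n * math.log(_ml_variance(left + right))
        - n1 * math.log(_ml_variance(left))
        - n2 * math.log(_ml_variance(right))
    )
    # Splitting adds one mean and one variance: 2 extra parameters.
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty
```

In this sketch a boundary between two windows would be declared a change point when `delta_bic` is greater than zero; `penalty_weight` trades missed changes against false alarms.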

Abstract

The invention discloses a multi-speaker speech rate estimation method based on speaker segmentation and clustering. The method comprises the following steps: first, reading the speech stream; detecting speaker change points in the speech stream and segmenting the stream into a plurality of speech segments at those change points; clustering the speech segments by speaker and splicing the segments of the same speaker in order, which yields the number of speakers and the speech of each speaker; and finally, estimating the duration of each speaker's speech and the number of words it contains, from which the speech rate of each speaker is estimated. Compared with single-speaker speech rate estimation based on speech recognition, the method not only estimates the speech rates of multiple speakers but also runs faster.
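The final step described above can be illustrated with a short sketch: once segments carry a speaker label from clustering, the durations and word counts of each speaker's segments are pooled and divided. The function name `speech_rates`, the tuple layout, and the per-segment word counts are assumptions for this example; how the word count is estimated is not shown in this excerpt, so it is treated here as a given input.

```python
def speech_rates(segments):
    """Per-speaker speech rate in words per second.

    segments: iterable of (start_sec, end_sec, speaker_label, word_count),
    in time order. The labels stand in for clustering output and the word
    counts for the word-number estimate; both are hypothetical inputs.
    """
    totals = {}
    for start, end, speaker, words in segments:
        duration, count = totals.get(speaker, (0.0, 0))
        # Pooling is equivalent to splicing the speaker's segments together.
        totals[speaker] = (duration + (end - start), count + words)
    return {
        speaker: count / duration
        for speaker, (duration, count) in totals.items()
        if duration > 0
    }


# Example: speaker "A" talks in two separate turns, speaker "B" in one.
example = [(0.0, 2.0, "A", 6), (2.0, 5.0, "B", 6), (5.0, 7.0, "A", 10)]
rates = speech_rates(example)  # A: 16 words / 4 s, B: 6 words / 3 s
```

The number of keys in the result also gives the number of speakers found by the clustering step.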

Description

Technical field

[0001] The invention relates to speech signal processing and pattern recognition technology, and in particular to a multi-speaker speech rate estimation method based on speaker segmentation and clustering.

Background

[0002] With the development of speech processing technology, the object of speech processing is gradually shifting from single-speaker speech to multi-speaker speech (such as conference speech and conversational speech). It is becoming increasingly important to adaptively adjust the parameters of speech processing systems, such as speech recognition systems. In addition, during recording in a studio or laboratory, speakers (such as announcers, program hosts, and customer service personnel) gauge their speech rate subjectively, based on experience, which is often not accurate enough. Although manual labeling can be used to estimate a speaker's speech rate after the recording is over, it is very time-consuming, especially w...

Application Information

IPC(8): G10L11/00, G10L25/00
Inventors: 李艳雄, 徐鑫, 贺前华
Owner: SOUTH CHINA UNIV OF TECH