Method for estimating speech speed of multiple speakers based on segmentation and clustering of speakers

A speaker segmentation and clustering technique applied in speech analysis. It addresses three problems of existing methods: they cannot produce speech rate estimates for multiple speakers, they are poorly suited to real-time processing, and they are slow. The claimed effect is reduced computing time.

Inactive Publication Date: 2012-07-04

SOUTH CHINA UNIV OF TECH

4 Cites, 58 Cited by


Problems solved by technology

When the input contains speech from several speakers, it is processed as if it were the speech of a single speaker, so a speech rate estimate for each speaker cannot be obtained.

[0006] (2) slow

This method needs to train a large number of phoneme models (generally Hidden Mar



Embodiment Construction

[0036] A detailed description will be given below in conjunction with specific embodiments and accompanying drawings.

[0037] Figure 1 is a flowchart of a method for estimating the speech rates of multiple speakers according to an embodiment of the present invention. As shown in Figure 1, the speech stream is first read in step 101. The speech stream is audio data that records the voices of multiple speakers and can be a file in any of various formats, such as WAV, RAM, MP3, or VOX.
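As a rough illustration of step 101, the sketch below reads a speech stream with Python's standard `wave` module. It is not the patent's implementation: it assumes 16-bit mono PCM WAV input only (the other listed formats would need a separate decoder), and the helper names `read_wav_mono16` and `make_demo_wav` are hypothetical.

```python
import io
import struct
import wave


def read_wav_mono16(source):
    """Return (sample_rate, samples) from a 16-bit mono PCM WAV.

    `source` may be a file path or a binary file-like object.
    Assumption: only 16-bit mono PCM is supported in this sketch.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV stores 16-bit samples as little-endian signed integers.
    samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    return rate, samples


def make_demo_wav(samples, rate=8000):
    """Build an in-memory WAV so the example runs without any file on disk."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    buf.seek(0)
    return buf


rate, samples = read_wav_mono16(make_demo_wav([0, 1000, -1000, 32767]))
print(rate, samples)
```

In practice the argument to `read_wav_mono16` would be the path of the recorded multi-speaker file rather than the in-memory demo buffer.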

[0038] Then, in step 102, a threshold-based silence detection method locates the silent segments and the speech segments in the speech stream. The speech segments are spliced in order into one long speech segment, and audio features are extracted from it. Using these features, the similarity between adjacent data windows in the long speech segment is judged according to the Bayesian information criterion to detect speaker change points; fi...
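The two ideas in step 102 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's procedure: silence is detected by comparing the mean squared amplitude of fixed-length frames with a caller-supplied threshold, and the Bayesian information criterion test uses the common ΔBIC form with one Gaussian per window and a diagonal covariance. The function names, the frame length, the threshold, the penalty weight, and the `margin` parameter are all hypothetical choices.

```python
import math
from statistics import fmean, pvariance


def speech_frames(samples, frame_len, threshold):
    """Indices of non-overlapping frames whose mean squared amplitude
    exceeds the (assumed) silence threshold."""
    energies = [
        fmean(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    return [i for i, e in enumerate(energies) if e > threshold]


def log_det_diag(frames):
    """Log-determinant of a diagonal covariance fitted to feature frames."""
    dims = len(frames[0])
    # A small floor keeps log() defined when a dimension is constant.
    return sum(
        math.log(pvariance([f[k] for f in frames]) + 1e-9) for k in range(dims)
    )


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for splitting `frames` at `split`; a positive value favours
    two models, which suggests a speaker change at that point."""
    n, dims = len(frames), len(frames[0])
    left, right = frames[:split], frames[split:]
    gain = 0.5 * (
        n * log_det_diag(frames)
        - len(left) * log_det_diag(left)
        - len(right) * log_det_diag(right)
    )
    # Each diagonal Gaussian has 2 * dims free parameters (means, variances).
    penalty = 0.5 * penalty_weight * (2 * dims) * math.log(n)
    return gain - penalty


def detect_change(frames, margin=5, penalty_weight=1.0):
    """Best split index if its Delta-BIC is positive, otherwise None."""
    candidates = range(margin, len(frames) - margin + 1)
    best = max(candidates, key=lambda s: delta_bic(frames, s, penalty_weight))
    return best if delta_bic(frames, best, penalty_weight) > 0 else None


# Toy 1-D features: 50 frames near 0.0 followed by 50 frames near 5.0.
pattern = [0.1 * ((i % 5) - 2) for i in range(50)]
two_speakers = [[p] for p in pattern] + [[5.0 + p] for p in pattern]
one_speaker = [[p] for p in pattern + pattern]
print(detect_change(two_speakers), detect_change(one_speaker))
```

Real systems typically use multi-dimensional spectral features and slide the analysis window along the long speech segment; the toy data here only shows the sign behaviour of ΔBIC.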


Abstract

The invention discloses a method for estimating the speech rates of multiple speakers based on speaker segmentation and clustering. The method comprises the following steps: first, reading the speech stream; detecting the speaker change points in the speech stream and segmenting the stream into a plurality of speech segments at those change points; clustering the speech segments by speaker and splicing the segments of the same speaker in order, which yields the number of speakers and the speech of each speaker; and finally, estimating the duration of each speaker's speech and the number of words it contains in order to estimate each speaker's speech rate. Compared with methods that estimate the speech rate of a single speaker based on speech recognition, this method can estimate the speech rates of multiple speakers and is also faster.
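The final step of the abstract reduces to dividing each speaker's word count by that speaker's total speech duration. The sketch below illustrates only that arithmetic. It assumes the segments have already been labelled by the clustering step and that a word count per segment is available; how the patent estimates word counts is not shown in this excerpt, so the input tuples and the function name `speech_rates` are hypothetical.

```python
from collections import defaultdict


def speech_rates(segments):
    """Words per second for each speaker.

    `segments` is an iterable of (speaker_id, duration_seconds, word_count)
    tuples, one per speech segment, in the order they occur.
    """
    totals = defaultdict(lambda: [0.0, 0])  # speaker -> [seconds, words]
    for speaker, duration, words in segments:
        totals[speaker][0] += duration
        totals[speaker][1] += words
    # Speakers with no measurable speech are skipped to avoid dividing by zero.
    return {
        speaker: words / seconds
        for speaker, (seconds, words) in totals.items()
        if seconds > 0
    }


demo = [("A", 2.0, 6), ("B", 4.0, 8), ("A", 1.0, 3)]
print(speech_rates(demo))  # {'A': 3.0, 'B': 2.0}
```

The number of keys in the returned dictionary corresponds to the number of speakers found by clustering.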

Description

Technical field

[0001] The invention relates to speech signal processing and pattern recognition technology, in particular to a multi-speaker speech rate estimation method based on speaker segmentation and clustering.

Background technique

[0002] With the development of speech processing technology, the object of speech processing is gradually shifting from single-speaker speech to multi-speaker speech (such as conference speech, conversation speech). It is becoming more and more important to adaptively adjust the parameters of speech processing systems such as speech recognition systems. In addition, during the recording process in a recording studio or laboratory, speakers (such as announcers, program hosts, customer service personnel, etc.) measure speech rate subjectively based on experience, which is often not accurate enough. Although manual labeling can be used to estimate the speaker's speech rate after the recording is over, it is very time-consuming, especially w...


Application Information


IPC(8): G10L11/00, G10L25/00

Inventors: 李艳雄 (Li Yanxiong), 徐鑫 (Xu Xin), 贺前华 (He Qianhua)

Owner SOUTH CHINA UNIV OF TECH
