Overlapped voice and single voice distinguishing method based on long time characteristics and short time characteristics

The method combines short-term and long-term features and applies to speech analysis and speech recognition. It addresses the inability of short-term features alone to capture statistical differences, such as differences in mean and median, between overlapping speech and single-person speech.

Inactive Publication Date: 2013-03-13
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

Although frame-level short-term feature parameters describe some differences between overlapping speech and single-person speech well, they cannot capture the statistical differences between the two, such as differences in the mean, maximum, minimum, median, and mean square deviation of the features.
In other words, short-term feature parameters cannot effectively represent the statistical differences between overlapping speech and single-person speech.
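The statistics listed above can be computed over all frames of a sample to form one long-term vector. The following is a minimal sketch, not the patent's implementation: the function name `long_term_features` is hypothetical, the ordering of the statistics is arbitrary, and "mean square deviation" is assumed to mean the population standard deviation.

```python
import statistics


def long_term_features(frames):
    """Summarize frame-level features with per-dimension statistics.

    frames: list of per-frame feature vectors, all of the same length D.
    Returns a flat list of 5 * D values: for each feature dimension, the
    mean, maximum, minimum, median, and population standard deviation
    (assumed here to be the "mean square deviation" named in the text).
    """
    if not frames:
        raise ValueError("need at least one frame")
    stats = []
    # zip(*frames) turns the list of frames into one column per feature.
    for column in zip(*frames):
        stats.extend([
            statistics.fmean(column),
            max(column),
            min(column),
            statistics.median(column),
            statistics.pstdev(column),
        ])
    return stats
```

Under these assumptions, a sample with D short-term features per frame yields a single vector of 5 × D long-term values, regardless of how many frames the sample contains.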

Examples


Embodiment 1

[0189] The experimental data are taken from the Mandarin Chinese natural spoken dialogue corpus (Chinese Annotated Dialogue and Conversation Corpus, CADCC). The speech was recorded by selected speakers of standard Mandarin in a professional recording environment. There are 12 dialogue units in total, each with two speakers. The sampling frequency is 16 kHz with 16-bit quantization, and the recordings are saved as mono WAV files; the corpus totals about 1.6 GB. The training data contain 500 overlapping speech samples and 500 single-person speech samples; the test data contain 427 overlapping speech samples and 505 single-person speech samples. The duration of the overlapping and single-person speech samples ranges from 0.8 to 6 seconds. Each speech sample is divided into frames with a frame length of 40 milliseconds and a frame shift of 20 milliseconds, and features are extracted from each frame. The dimension of the short-term feature matrix is D=28, in which the...
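The framing parameters stated in this embodiment (16 kHz sampling, 40 ms frames, 20 ms shift) can be illustrated with a short sketch. The function name `frame_signal` is hypothetical, and dropping a trailing partial frame is an assumption; the source does not say how the end of a sample is handled.

```python
SAMPLE_RATE_HZ = 16_000   # from the embodiment
FRAME_LENGTH_MS = 40      # from the embodiment
FRAME_SHIFT_MS = 20       # from the embodiment


def frame_signal(samples, sample_rate=SAMPLE_RATE_HZ,
                 frame_ms=FRAME_LENGTH_MS, shift_ms=FRAME_SHIFT_MS):
    """Split a sequence of samples into overlapping fixed-length frames.

    Assumption: a trailing partial frame is dropped rather than padded.
    """
    frame_len = sample_rate * frame_ms // 1000   # 640 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000       # 320 samples at 16 kHz
    if len(samples) < frame_len:
        return []
    n_frames = 1 + (len(samples) - frame_len) // shift
    return [samples[i * shift:i * shift + frame_len] for i in range(n_frames)]
```

With these settings each frame holds 640 samples and consecutive frames overlap by half. Under the drop-partial-frame assumption, the shortest sample (0.8 s, 12,800 samples) yields 39 frames and the longest (6 s, 96,000 samples) yields 299.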


Abstract

The invention discloses a method for distinguishing overlapped speech from single-person speech based on long-term and short-term features. The method comprises the following steps: reading in the speech; pre-processing the speech, including pre-emphasis, framing, and windowing; extracting multiple short-term feature parameters from each frame of speech; extracting long-term feature parameters by calculating statistics of the short-term feature parameters; training four Gaussian mixture models with the expectation-maximization algorithm; and making a model-fusion decision, in which the short-term and long-term feature parameters extracted from the test speech serve as inputs to the short-term feature model and the long-term feature model respectively, the output probabilities of the two models are weighted to obtain a total probability output value, and the test speech is judged to be overlapped speech or single-person speech according to that value. Compared with a method that uses only short-term features, the method distinguishes the two kinds of speech more effectively, improving accuracy by 5.9% on average.
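The fusion step can be sketched as follows. This is an illustration under stated assumptions rather than the patented procedure: the four Gaussian mixture models are assumed to be one short-term and one long-term model per class, the models are assumed to have diagonal covariances, the weighting is applied to log-likelihoods, and the weight `alpha` and all function names are hypothetical.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(v - peak) for v in component_logs))


def classify(short_frames, long_vector, models, alpha=0.5):
    """Fuse short-term and long-term model scores and pick a class.

    models[label]["short"] and models[label]["long"] are
    (weights, means, variances) tuples; label is "overlapped" or "single".
    alpha is a hypothetical fusion weight for the short-term score.
    """
    scores = {}
    for label in ("overlapped", "single"):
        # Average frame-level log-likelihood under the short-term model.
        short_ll = sum(
            gmm_log_likelihood(frame, *models[label]["short"])
            for frame in short_frames
        ) / len(short_frames)
        # One log-likelihood for the whole-sample long-term vector.
        long_ll = gmm_log_likelihood(long_vector, *models[label]["long"])
        scores[label] = alpha * short_ll + (1.0 - alpha) * long_ll
    return max(scores, key=scores.get), scores
```

Averaging the frame-level scores keeps the short-term term on a scale comparable to the single long-term score, so `alpha` is not dominated by the number of frames. This is a design choice of the sketch, not something the abstract specifies.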

Description

Technical field

[0001] The invention relates to speech signal processing and pattern recognition technology, in particular to a method for distinguishing overlapping speech from single-person speech based on long-term features and short-term features.

Background technique

[0002] Overlapped speech (OS) refers to the speech produced when multiple people speak at the same time. Overlapped speech occurs frequently in multi-person conversations. For example, in the ICSI meeting speech database, 6-14% of the speech is overlapped. Because overlapping speech and single-person speech (speech produced by one speaker) have different acoustic characteristics, the presence of overlapping speech sharply degrades the performance of current speech recognition systems and speaker segmentation and clustering systems, which are designed to process single-person speech. Distinguishing overlapping speech from single-person speech is of great significance for improving the performance of ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G10L15/02, G10L15/06, G10L25/03
Inventor: 李艳雄, 陈祝允, 贺前华, 李广隆, 杜佳媛, 吴伟, 王梓里
Owner: SOUTH CHINA UNIV OF TECH