A method of double tone detection in continuous speech stream

A detection method and speech stream technology, applied in speech analysis, speech recognition, instruments, etc. It addresses the problem that the detection accuracy of the system does not meet expectations, improving system robustness and detection performance and reducing false alarm errors.

Active Publication Date: 2020-04-14
INST OF ACOUSTICS CHINESE ACAD OF SCI +1
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

Test results show that the detection accuracy of the system has not yet reached expectations.

Method used

Integrated features from a multi-scale representation of the speech are input into an HMM detector for an initial judgment; a secondary judgment based on non-negative matrix factorization then removes falsely detected non-overlapped segments caused by noise interference.

Examples

Embodiment

[0056] 1. The structure and state space of the HMM

[0057] Single-speaker speech and overlapped speech are each modeled by a chain of three states. For each state chain, a GMM (Gaussian Mixture Model) describes the acoustic mapping from state to observation. For speech, a GMM of 256 Gaussians describes its acoustic variation; for overlapped speech, a GMM of 64 Gaussians is used. To control jumps of the state chain between speech and overlapped speech, a penalty term is introduced. Adjusting this penalty term trades off system detection accuracy against recall.
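The decoding described in [0057] can be sketched as a Viterbi search over two three-state chains with a penalty on jumps between them. This is only an illustrative sketch, not the patent's implementation: the left-to-right topology, the zero-cost within-class transitions, the function names, and the use of precomputed per-state log-likelihoods (which in the text would come from the GMMs) are all assumptions.

```python
import numpy as np

N_PER_CHAIN = 3   # three states per class, as stated in the text
NEG_INF = -1e30   # stands in for log(0) on forbidden transitions


def build_log_transitions(switch_penalty):
    """Log-transition scores for two left-to-right 3-state chains.

    States 0-2 model single-speaker speech, states 3-5 overlapped speech.
    Assumption: every allowed within-class move costs 0; leaving a class
    from its last state costs `switch_penalty`.
    """
    n = 2 * N_PER_CHAIN
    trans = np.full((n, n), NEG_INF)
    for chain in range(2):
        first = chain * N_PER_CHAIN
        last = first + N_PER_CHAIN - 1
        other_first = (1 - chain) * N_PER_CHAIN
        for s in range(first, last + 1):
            trans[s, s] = 0.0              # self-loop
            if s < last:
                trans[s, s + 1] = 0.0      # advance within the chain
        trans[last, first] = 0.0           # re-enter the same class
        trans[last, other_first] = -switch_penalty  # jump to other class
    return trans


def viterbi_labels(frame_loglik, switch_penalty):
    """Return per-frame class labels (0 = speech, 1 = overlapped speech).

    frame_loglik has shape (T, 6): log-likelihood of each frame under
    each state (hypothetical input; GMM scoring is not shown here).
    """
    T, n = frame_loglik.shape
    trans = build_log_transitions(switch_penalty)
    score = np.full(n, NEG_INF)
    score[0] = frame_loglik[0, 0]                      # start in speech chain
    score[N_PER_CHAIN] = frame_loglik[0, N_PER_CHAIN]  # or in overlap chain
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans      # cand[i, j]: move from i to j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n)] + frame_loglik[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path // N_PER_CHAIN
```

In this sketch a larger `switch_penalty` makes the decoder more reluctant to enter the overlapped-speech chain, which would be expected to lower false alarms at the cost of recall.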

[0058] 2. Feature form

[0059] Features differ in robustness and expressive power across scales. Feature parameters are calculated at four scales and denoted MLpR1, MLpR2, MLpR3 and MLpR4.

[0060] MLpR1 is calculated from conventional short-time Fourier analysis, with a frame length of 20 ms, a frame shift of 10 ms, and an FFT of 1024 ...
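The short-time Fourier analysis settings named in [0060] (20 ms frames, 10 ms shift, 1024-point FFT) can be illustrated with a minimal sketch. The 16 kHz sampling rate, the Hamming window, the log-power output, and the function name are assumptions made for the example; the excerpt is truncated and does not say how MLpR1 is derived from the spectrum.

```python
import numpy as np


def log_power_spectrogram(signal, sample_rate=16000, frame_ms=20,
                          shift_ms=10, n_fft=1024):
    """Frame the signal and return a (n_frames, n_fft // 2 + 1) log-power
    spectrogram. sample_rate and the window choice are assumptions."""
    frame_len = int(sample_rate * frame_ms / 1000)   # 320 samples at 16 kHz
    shift = int(sample_rate * shift_ms / 1000)       # 160 samples at 16 kHz
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // shift
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * shift:i * shift + frame_len]
                       for i in range(n_frames)])
    # rfft zero-pads each windowed frame up to n_fft points
    spectrum = np.fft.rfft(frames * window, n=n_fft, axis=1)
    power = np.abs(spectrum) ** 2
    return np.log(power + 1e-10)   # small floor avoids log(0)
```

With these assumed settings, one second of audio yields 99 frames of 513 frequency bins each.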

Abstract

The invention provides a method for detecting overlapped speech in a continuous speech stream. The detection method comprises the following steps:

step 101) an integrated feature is acquired through a multi-scale representation of the speech, and the integrated feature is input into an HMM detector for an initial judgment;

step 102) a secondary judgment of the initial judgment is made using non-negative matrix factorization, removing some falsely detected non-overlapped segments caused by noise interference.

Step 101) further comprises the following steps:

step 101-1) endpoint detection is performed on the speech stream and silent sections are removed;

step 101-2) the integrated features of the speech stream are acquired; they include parameter representations of the spectrogram at four scales and their first-order and second-order differences;

step 101-3) mean and variance normalization is applied to each dimension of the feature vectors contained in the integrated features;

step 101-4) the processing result of step 101-3) is input into the HMM detector for the initial judgment.
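Steps 101-2) and 101-3) can be sketched as follows: append first-order and second-order differences to the static features, then normalize each dimension to zero mean and unit variance. This is a minimal sketch under assumptions: the central-difference formula, the per-utterance normalization scope, and the function names are illustrative choices that the abstract does not specify.

```python
import numpy as np


def delta(features):
    """First-order difference along time (axis 0) using a central
    difference, with edge frames replicated. The exact difference
    formula is an assumption."""
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0


def add_deltas_and_normalize(static):
    """static: (n_frames, n_dims) array of per-frame features.

    Returns (n_frames, 3 * n_dims): static, first-order and second-order
    differences, each dimension normalized to zero mean and unit
    variance over the given frames.
    """
    d1 = delta(static)
    d2 = delta(d1)
    full = np.hstack([static, d1, d2])
    mean = full.mean(axis=0)
    std = full.std(axis=0)
    return (full - mean) / np.maximum(std, 1e-8)   # guard constant dims
```

The normalized matrix is what would then be passed to the HMM detector in step 101-4).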

Description

Technical field

[0001] The invention belongs to the field of speech signal processing and relates to a method for detecting overlapped speech, which can be used on continuous speech streams to automatically find speech segments in which multiple people (two or more) speak at the same time.

Background

[0002] Overlapped speech ("double tone") detection is most commonly found in speaker diarization systems. In such a system, the continuous speech stream is first segmented into speech segments belonging to different speakers; an algorithm is then applied to assign each segment a corresponding speaker identifier. However, this single-speaker labeling is unreasonable when a speech segment contains overlapped speech. Therefore, it is often desirable to detect the overlapped segments of the continuous speech stream in advance and process them separately.

[0003] In the case of a single channel, overlapped speech detection usually u...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L15/14, G10L15/20, G10L25/03, G10L25/54
CPC: G10L15/142, G10L15/20, G10L25/03, G10L25/54
Inventor: 胡琦, 张鹏远, 潘接林, 颜永红
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI