Improved deep-clustering-based voice separation method

A speech separation and clustering technology, applied in speech analysis, instruments, character and pattern recognition, etc., that addresses the problem of unsatisfactory separation performance.

Status: Inactive. Publication Date: 2020-05-01
BEIJING INST OF COMP TECH & APPL
3 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

However, the separation effect of this method is not ideal in the case of low signal-to-noise ratio, so it is necessar



Examples


Embodiment 1

[0040] The voice experiment data used in Embodiment 1 comes from the TIMIT corpus. TIMIT is a classic corpus created by MIT in 1993 and is suitable for speech recognition, speaker classification, and similar tasks. The data set is sampled at 8 kHz and contains 6300 sentences in total: each of 630 speakers from eight major dialect regions of the United States reads 10 given sentences. All sentences are manually segmented and labeled at the phone level. 70% of the speakers are male, and most speakers are adult Caucasians. To test the speech separation task under different interference conditions, the speech of two different speakers is randomly mixed at SNR = -10 dB, -5 dB, 0 dB, and 5 dB to form a training set, a validation set, and a test set. This simulates experimental conditions with both strong and weak interference. Each data set uses mixtures that do not appear in the other data sets, which yields a speaker-independent setting. The ...
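As an illustration of the mixing step described in this paragraph, the following is a minimal sketch of combining two utterances at a chosen SNR. The function names (`power`, `mix_at_snr`, `snr_db_of`) and the convention that the SNR is the power ratio of the first speaker to the second are assumptions made for this example; the source does not spell out the exact mixing procedure.

```python
import math


def power(signal):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in signal) / len(signal)


def mix_at_snr(target, interferer, snr_db):
    """Scale `interferer` so the target-to-interferer power ratio is `snr_db`,
    then add the two signals sample by sample.

    Both signals are truncated to the shorter length (an assumption of this
    sketch). Returns (mixture, truncated target, scaled interferer).
    """
    n = min(len(target), len(interferer))
    target, interferer = list(target[:n]), list(interferer[:n])
    p_target, p_interf = power(target), power(interferer)
    if p_target == 0 or p_interf == 0:
        raise ValueError("both signals must have non-zero power")
    # Solve 10*log10(p_target / (gain^2 * p_interf)) = snr_db for gain.
    gain = math.sqrt(p_target / (p_interf * 10 ** (snr_db / 10)))
    scaled = [gain * v for v in interferer]
    mixture = [t + s for t, s in zip(target, scaled)]
    return mixture, target, scaled


def snr_db_of(target, interferer):
    """SNR in dB between two equal-length signals."""
    return 10 * math.log10(power(target) / power(interferer))
```

Calling `mix_at_snr(a, b, -10)` models strong interference (the second speaker is ten times more powerful), while `mix_at_snr(a, b, 5)` models weak interference.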



Abstract

The invention relates to an improved deep clustering voice separation method comprising the following steps: step 1, mixing the experimental data and extracting log power spectrum features; step 2, building an improved deep clustering voice separation model and training it on a training set; step 3, passing the mixed voice of the test set through the model trained in step 2 to obtain embedding-space vectors v_i, clustering the vectors v_i in the embedding subspace with mean shift clustering, taking the clustering result as the ideal binary mask value of the training target, and computing feature estimates of the two separated voice signals from the ideal binary mask value and the input speech signal features; and step 4, reconstructing the waveform to recover the voice signals. The method improves on the current deep-clustering-based voice separation method so that separation performance improves for mixed voice input with a low signal-to-noise ratio.
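Step 3 of the abstract can be sketched roughly as follows: cluster the per-bin embedding vectors with mean shift, turn the cluster labels into binary masks, and multiply each mask with the mixture features. This is only an illustration under stated assumptions, not the patented implementation: it uses a flat (uniform) kernel, a hand-picked `bandwidth`, and hypothetical helper names (`mean_shift`, `binary_masks`, `apply_mask`); the embeddings would in practice come from the trained network rather than from a literal list.

```python
import math


def mean_shift(points, bandwidth, n_iter=50, tol=1e-6):
    """Flat-kernel mean shift over a list of equal-length vectors.

    Each point is repeatedly moved to the mean of the original points that lie
    within `bandwidth` of it; converged positions closer than `bandwidth` are
    merged into one cluster. Returns (labels, centres).
    """
    modes = [list(p) for p in points]
    for _ in range(n_iter):
        max_move = 0.0
        for i, mode in enumerate(modes):
            neighbours = [p for p in points if math.dist(p, mode) <= bandwidth]
            if not neighbours:  # defensive: leave the mode where it is
                continue
            shifted = [sum(c) / len(neighbours) for c in zip(*neighbours)]
            max_move = max(max_move, math.dist(shifted, mode))
            modes[i] = shifted
        if max_move < tol:
            break
    centres, labels = [], []
    for mode in modes:
        for k, centre in enumerate(centres):
            if math.dist(mode, centre) < bandwidth:
                labels.append(k)
                break
        else:  # no existing centre is close enough: start a new cluster
            centres.append(mode)
            labels.append(len(centres) - 1)
    return labels, centres


def binary_masks(labels, n_clusters):
    """One 0/1 mask per cluster, aligned with the time-frequency bins."""
    return [[1 if lab == k else 0 for lab in labels] for k in range(n_clusters)]


def apply_mask(mask, features):
    """Element-wise product of a binary mask and the mixture features."""
    return [m * f for m, f in zip(mask, features)]
```

Unlike k-means, mean shift does not need the number of clusters up front; the number of speakers falls out of the number of modes found, which depends on the chosen bandwidth.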

Description

Technical field

[0001] The invention relates to the technical field of speech separation, in particular to a speech separation method based on improved deep clustering.

Background technique

[0002] The "cocktail party problem" has always been a difficult problem in speech separation, mainly because it is a speaker-independent separation problem: no prior information about the speakers is available in advance. Building on the ideal-binary-mask-based deep clustering method proposed by Jonathan et al., the present invention improves the clustering method by using a bidirectional long short-term memory network model and mean shift clustering, and experiments were carried out on the TIMIT speech data set. The final separation results show that, at low input signal-to-noise ratios, the separation effect is improved compared with the previous model.

[0003] The term speech separation ori...
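For readers unfamiliar with the ideal binary mask mentioned in the background, here is a minimal sketch of its usual definition for two speakers: each time-frequency bin is assigned to whichever source is stronger in that bin. The function name `ideal_binary_masks`, the nested-list spectrogram layout, and the tie-breaking rule are assumptions of this example rather than details taken from the patent.

```python
def ideal_binary_masks(spec_a, spec_b):
    """Ideal binary masks for two sources.

    `spec_a` and `spec_b` are same-shaped 2-D lists indexed [frame][bin] that
    hold non-negative magnitudes (or powers) of the clean sources. A bin
    belongs to source A when A is strictly stronger; ties go to source B
    (an arbitrary choice made for this sketch).
    """
    mask_a = [
        [1 if a > b else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(spec_a, spec_b)
    ]
    mask_b = [[1 - m for m in row] for row in mask_a]
    return mask_a, mask_b
```

Because the two masks are complementary, every bin of the mixture is attributed to exactly one speaker, which is what lets cluster labels over bins stand in for a mask at test time.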


Application Information

IPC(8): G10L21/0272, G10L21/0224, G10L21/0232, G10L25/18, G10L25/21, G10L25/27, G06K9/62
CPC: G10L21/0272, G10L21/0224, G10L21/0232, G10L25/18, G10L25/21, G10L25/27, G06F18/23
Inventor: 王昕, 蒋志翔, 张杨, 寇金桥, 常新旭, 徐冬冬, 闫帅, 赵晓燕
Owner BEIJING INST OF COMP TECH & APPL