Speech separation method based on improved deep clustering

A speech separation and clustering technology, applied in speech analysis, instruments, character and pattern recognition, etc. It addresses the problem of unsatisfactory separation performance.

Inactive Publication Date: 2020-05-01

BEIJING INST OF COMP TECH & APPL

3 Cites, 2 Cited by

Problems solved by technology

The existing deep clustering method separates speech poorly at low signal-to-noise ratios, so its separation performance on mixed input with a low signal-to-noise ratio needs to be improved.

Examples

Embodiment 1

[0040] The speech data used in Embodiment 1 come from the TIMIT corpus. TIMIT is a classic corpus, developed jointly by Texas Instruments, MIT, and SRI International and published in 1993, that is suitable for speech recognition, speaker classification, and similar tasks. The speech data are sampled at 8 kHz, and the corpus contains 6300 sentences in total: each of 630 speakers from eight major dialect regions of the United States reads 10 given sentences. All sentences are manually segmented and labeled at the phone level. About 70% of the speakers are male, and most are adult Caucasians. To test speech separation under different interference conditions, utterances from two different speakers are randomly mixed at SNR = -10 dB, -5 dB, 0 dB, and 5 dB to form a training set, a validation set, and a test set. This simulates experimental conditions with both strong and weak interference. Each data set uses different mixtures from the other data sets, which yields a speaker-independent setting. The ...
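The mixing step described above can be sketched as follows. This is only an illustration, not the procedure from the patent: the function name `mix_at_snr`, the definition of SNR as the power ratio of the first speaker to the second, the truncation to the shorter utterance, and the random-noise stand-ins for TIMIT utterances are all assumptions.

```python
import numpy as np

def mix_at_snr(target, interferer, snr_db):
    """Scale `interferer` so the target-to-interferer power ratio is snr_db, then add."""
    n = min(len(target), len(interferer))      # assumption: truncate to the shorter signal
    target = target[:n].astype(np.float64)
    interferer = interferer[:n].astype(np.float64)
    p_target = np.mean(target ** 2)
    p_interf = np.mean(interferer ** 2)
    # Choose gain g so that 10*log10(p_target / (g**2 * p_interf)) == snr_db
    gain = np.sqrt(p_target / (p_interf * 10.0 ** (snr_db / 10.0)))
    scaled = gain * interferer
    return target + scaled, target, scaled

rng = np.random.default_rng(0)
speech_a = rng.standard_normal(8000)           # stand-in for 1 s of speech at 8 kHz
speech_b = 0.3 * rng.standard_normal(9000)     # stand-in for a second speaker

for snr_db in (-10, -5, 0, 5):
    mixture, s1, s2 = mix_at_snr(speech_a, speech_b, snr_db)
    measured = 10 * np.log10(np.mean(s1 ** 2) / np.mean(s2 ** 2))
    print(f"requested {snr_db:+d} dB, measured {measured:+.2f} dB")
```

Returning the scaled sources alongside the mixture keeps the reference signals available for building training targets and for scoring the separated output.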

Abstract

The invention relates to an improved deep clustering speech separation method comprising the following steps: step 1, mixing the experimental data and extracting log power spectrum features; step 2, building an improved deep clustering speech separation model and training it on the training set; step 3, passing the mixed speech of the test set through the deep clustering speech separation model trained in step 2 to obtain embedding vectors v_i, clustering the embedding vectors v_i in the embedding subspace with mean shift clustering, taking the clustering result as the ideal binary mask values of the training target, and computing feature estimates of the two separated speech signals from the ideal binary mask values and the input speech signal features; and step 4, reconstructing the waveforms to recover the speech signals. The method improves on the current deep clustering speech separation method so that it performs better on mixed speech input with a low signal-to-noise ratio.
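Step 3 can be illustrated with a small NumPy sketch. It is a simplified stand-in, not the patented implementation: the flat-kernel `mean_shift` function, the bandwidth value, the embedding dimension, and the synthetic embeddings (which replace the output of the trained network) are all assumptions made for illustration.

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=50):
    """Flat-kernel mean shift: repeatedly move each mode to the mean of nearby points."""
    modes = points.copy()
    for _ in range(n_iter):
        # Distances from every current mode to every original point
        dist = np.linalg.norm(modes[:, None, :] - points[None, :, :], axis=2)
        within = (dist <= bandwidth).astype(float)
        modes = within @ points / within.sum(axis=1, keepdims=True)
    # Merge modes that converged to (nearly) the same location
    centers, labels = [], np.empty(len(points), dtype=int)
    for i, mode in enumerate(modes):
        for k, center in enumerate(centers):
            if np.linalg.norm(mode - center) < bandwidth:
                labels[i] = k
                break
        else:
            centers.append(mode)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

rng = np.random.default_rng(0)
n_frames, n_bins, emb_dim = 10, 16, 4          # hypothetical sizes
# Synthetic embeddings v_i: one vector per time-frequency bin, two separated groups
true_labels = rng.integers(0, 2, size=n_frames * n_bins)
group_centers = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
embeddings = group_centers[true_labels] + 0.05 * rng.standard_normal(
    (n_frames * n_bins, emb_dim))

labels, found = mean_shift(embeddings, bandwidth=0.5)
mixture_feat = rng.random((n_frames, n_bins))  # stand-in for mixture spectral features
# One binary mask per cluster, applied to the mixture features
masks = [(labels == k).reshape(n_frames, n_bins).astype(float) for k in range(len(found))]
estimates = [mask * mixture_feat for mask in masks]
print(len(found), "clusters found")
```

Unlike k-means, mean shift does not take the number of clusters as an input; the bandwidth controls how many modes are found, so it would need tuning on real embeddings.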

Description

Technical field

[0001] The invention relates to the technical field of speech separation, in particular to a speech separation method based on improved deep clustering.

Background

[0002] The "cocktail party problem" has long been a difficult problem in speech separation, mainly because it is a speaker-independent separation problem in which no prior information about the speakers is available in advance. Building on the ideal-binary-mask-based deep clustering method proposed by Jonathan et al., the present invention improves the clustering stage by using a bidirectional long short-term memory network model together with mean shift clustering. Experiments were carried out on the TIMIT speech data set. The final separation results show that, at low input signal-to-noise ratios, separation performance improves over the previous model.

[0003] The term speech separation ori...
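To make the notions of log power spectrum features and an ideal binary mask concrete, here is a minimal NumPy sketch. The frame length, hop size, Hann window, and the two pure tones standing in for two speakers are assumptions chosen for illustration; the patent text shown here does not specify them.

```python
import numpy as np

def log_power_spectrum(signal, frame_len=256, hop=64, eps=1e-10):
    """Hann-windowed short-time FFT followed by the log of the squared magnitude."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)     # shape: (n_frames, frame_len // 2 + 1)
    return np.log(np.abs(spectrum) ** 2 + eps)

def ideal_binary_mask(lps_a, lps_b):
    """1 where source A dominates a time-frequency bin, 0 where source B does."""
    return (lps_a > lps_b).astype(np.float32)

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
speaker_a = np.sin(2 * np.pi * 440 * t)        # low tone as a stand-in for speaker A
speaker_b = np.sin(2 * np.pi * 2000 * t)       # high tone as a stand-in for speaker B

lps_a = log_power_spectrum(speaker_a)
lps_b = log_power_spectrum(speaker_b)
mask_a = ideal_binary_mask(lps_a, lps_b)
mask_b = 1.0 - mask_a                          # the two masks partition every bin
print(lps_a.shape, mask_a.mean())
```

The mask is computed from the clean sources, which is why it is only available as a training target; at test time it has to be estimated, for example by clustering embeddings.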

Application Information

IPC(8): G10L21/0272, G10L21/0224, G10L21/0232, G10L25/18, G10L25/21, G10L25/27, G06K9/62

CPC: G10L21/0272, G10L21/0224, G10L21/0232, G10L25/18, G10L25/21, G10L25/27, G06F18/23

Inventor: 王昕, 蒋志翔, 张杨, 寇金桥, 常新旭, 徐冬冬, 闫帅, 赵晓燕

Owner: BEIJING INST OF COMP TECH & APPL
