Multi-channel speaker-independent voice separation method based on deep clustering

A speaker-independent speech separation technology, applied in speech analysis, instruments, and character and pattern recognition, that addresses the poor robustness of existing speech separation methods, improving robustness and reducing nonlinear distortion.

Active Publication Date: 2020-04-07

RES & DEV INST OF NORTHWESTERN POLYTECHNICAL UNIV IN SHENZHEN +1

AI Technical Summary

Problems solved by technology

However, the former has poor robustness and the latter introduces large nonlinear distortion, so neither is ideal and there is still room for improvement.

Embodiment Construction

[0034] The present invention is further described below with reference to the accompanying drawings and embodiments. The invention includes, but is not limited to, the following embodiments.

[0035] As shown in Figure 1, the present invention provides a multi-channel, speaker-independent speech separation method based on deep clustering. First, the speaker's speech signals received by multiple microphones are collected, and the amplitude spectrum features and spatial features of the speech to be processed are extracted. Next, the features of each channel are fed to a bidirectional long short-term memory (BLSTM) network, and K-means clustering is applied to the network output to obtain an ideal binary time-frequency (T-F) mask. The resulting mask is then used to compute the spatial covariance matrices of the speaker's speech and of the interference it receives, from which the coefficients of the MVDR beamformer are calculated. Finally, the separated speaker voice...
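
The clustering and beamforming steps in paragraph [0035] can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the patent's implementation: the BLSTM embeddings are assumed to be already available as an array of shape (T, F, D), the K-means is a plain Lloyd iteration with random initialisation, and the MVDR weights use the common reference-channel form, w = (Phi_n^-1 Phi_s) u / trace(Phi_n^-1 Phi_s), which the source does not specify. The names `kmeans_masks`, `spatial_cov`, and `mvdr_separate`, and all parameters, are hypothetical.

```python
import numpy as np

def kmeans_masks(embeddings, n_speakers, n_iter=20, seed=0):
    """Cluster T-F embeddings (T, F, D) into binary masks (n_speakers, T, F)."""
    T, F, D = embeddings.shape
    points = embeddings.reshape(-1, D)
    rng = np.random.default_rng(seed)
    # Assumption: centres are initialised from randomly chosen embeddings.
    centers = points[rng.choice(len(points), n_speakers, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every centre: (N, K).
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for k in range(n_speakers):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    masks = [(labels == k).reshape(T, F) for k in range(n_speakers)]
    return np.stack(masks).astype(float)

def spatial_cov(specs, mask, eps):
    """Mask-weighted spatial covariance per frequency bin, shape (F, C, C)."""
    cov = np.einsum('tf,ctf,dtf->fcd', mask, specs, specs.conj())
    return cov / (mask.sum(axis=0)[:, None, None] + eps)

def mvdr_separate(specs, masks, ref=0, eps=1e-6):
    """Apply one mask-based MVDR beamformer per speaker.

    specs: complex mixture STFT, shape (C, T, F); masks: (K, T, F).
    Returns the beamformed STFTs, shape (K, T, F).
    """
    n_ch = specs.shape[0]
    outputs = []
    for mask in masks:
        phi_s = spatial_cov(specs, mask, eps)
        # Interference = everything not assigned to this speaker;
        # a small diagonal loading keeps the matrix invertible.
        phi_n = spatial_cov(specs, 1.0 - mask, eps) + eps * np.eye(n_ch)
        ratio = np.linalg.solve(phi_n, phi_s)           # Phi_n^-1 Phi_s
        trace = np.trace(ratio, axis1=-2, axis2=-1)     # (F,)
        weights = ratio[..., ref] / (trace[:, None] + eps)  # (F, C)
        # Beamformer output y = w^H x for every T-F bin.
        outputs.append(np.einsum('fc,ctf->tf', weights.conj(), specs))
    return np.stack(outputs)
```

With an embedding array `emb` and a mixture STFT `specs` of matching T and F, a call such as `mvdr_separate(specs, kmeans_masks(emb, 2))` would return one beamformed spectrogram per speaker. Random initialisation can leave a cluster empty, so a production system would likely use a more careful K-means.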

Abstract

The invention provides a multi-channel, speaker-independent speech separation method based on deep clustering. First, a short-time Fourier transform is applied to the speech signal to extract its amplitude spectrum features, and the cosines of the phase differences between channels are computed as spatial features; the two feature sets are combined as the input for training a deep clustering network. Next, a bidirectional long short-term memory (BLSTM) network is trained and used to estimate masks for the different speakers. Finally, the coefficients of the MVDR beamformer are computed from the spatial covariance matrices, and the mixed speech is multiplied by these coefficients to obtain the separated speaker signals. The method makes better use of the spatial information in the speech signals, estimates high-quality masks with the deep clustering network, can separate the mixed speech of multiple speakers in a reverberant environment, and achieves better speech separation performance.
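
The feature extraction step in the abstract (amplitude spectrum plus cosine of inter-channel phase differences) could look roughly like the sketch below. It is illustrative only: the Hann window, the frame length of 256 with a hop of 128, the choice of microphone 0 as the reference, and the concatenation along the frequency axis are all assumptions that the source does not state, and the function names `stft` and `extract_features` are hypothetical.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Short-time Fourier transform of a 1-D signal -> (n_frames, n_fft//2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def extract_features(signals, ref=0, n_fft=256, hop=128):
    """Build per-frame input features from a multi-channel recording.

    signals: real array of shape (n_channels, n_samples).
    Returns (n_frames, n_freq * n_channels): the reference-channel
    amplitude spectrum followed by cos(phase difference) for every
    other channel relative to the reference.
    """
    specs = np.stack([stft(ch, n_fft, hop) for ch in signals])  # (C, T, F)
    magnitude = np.abs(specs[ref])                              # (T, F)
    phase = np.angle(specs)
    cos_ipd = [np.cos(phase[c] - phase[ref])
               for c in range(len(signals)) if c != ref]
    return np.concatenate([magnitude] + cos_ipd, axis=-1)
```

For a two-microphone recording of 1024 samples, `extract_features` returns 7 frames of 258 values each: 129 magnitude bins followed by 129 cosine phase-difference bins.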

Description

Technical field

[0001] The invention belongs to the technical field of speech signal processing, and in particular relates to a multi-channel, speaker-independent speech separation method based on deep clustering.

Background

[0002] Speech separation is a fundamental task in signal processing and a special case of sound source separation. Its goal is to separate the target speech from background noise. Speech separation has many applications, including hearing prostheses, communications, automatic speech processing, and speaker recognition. Even at a cocktail party, the human auditory system can easily pick out one person's speech from the voices of other people and the surrounding background noise. For this reason, the speech separation problem is often referred to as the "cocktail party problem". But the reason why humans can easily separate speech is that before the sound signal is transmitted to the human auditory center, it ...

Application Information

IPC(8): G10L21/0272, G10L21/0208, G10L21/0216, G10L21/0224, G10L21/0232, G10L25/18, G10L25/27, G06K9/62

CPC: G10L21/0272, G10L21/0208, G10L21/0216, G10L21/0224, G10L21/0232, G10L25/18, G10L25/27, G10L2021/02166, G10L2021/02082, G10L2021/02087, G06F18/23

Inventor: 张晓雷, 杨子叶, 谭旭

Owner: RES & DEV INST OF NORTHWESTERN POLYTECHNICAL UNIV IN SHENZHEN
