Multi-channel speaker-independent voice separation method based on deep clustering

A speaker-independent speech separation technology, applied in speech analysis, instruments, and character and pattern recognition. It addresses the poor robustness of existing methods, with the effects of improving robustness, reducing nonlinear distortion, and solving the speech separation problem.

Active Publication Date: 2020-04-07
RES & DEV INST OF NORTHWESTERN POLYTECHNICAL UNIV IN SHENZHEN +1
Cites: 7 | Cited by: 16

AI Technical Summary

Problems solved by technology

However, the former approach has poor robustness and the latter introduces large nonlinear distortion, so neither is ideal and there is still room for improvement.

Embodiment Construction

[0034] The present invention is further described below with reference to the accompanying drawings and embodiments; the present invention includes but is not limited to the following embodiments.

[0035] As shown in Figure 1, the present invention provides a multi-channel, speaker-independent speech separation method based on deep clustering. First, the speakers' voice signals received by multiple microphones are collected, and the amplitude spectrum features and spatial features of the speech to be processed are extracted. Next, the features of each channel are fed into a bidirectional long short-term memory (BLSTM) network, and K-means clustering is applied to the network output to obtain estimated ideal binary time-frequency (T-F) masks. The resulting masks are then used to compute the spatial covariance matrices of each speaker's speech and of the interference it receives, from which the coefficients of the minimum variance distortionless response (MVDR) beamformer are calculated. Finally, the separated speaker voice...
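
The clustering and beamforming steps of [0035] can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the BLSTM is assumed to output one D-dimensional embedding per T-F bin (the hypothetical `embeddings` array), K-means uses a simple farthest-point initialization, each speaker's interference is taken to be every T-F bin not assigned to that speaker, and the MVDR steering vector is taken as the principal eigenvector of the target covariance matrix, a common choice the excerpt does not specify. All function names are hypothetical.

```python
import numpy as np

def kmeans(x, k, n_iter=50, seed=0):
    """Lloyd's K-means on the rows of x (N, D); returns integer labels (N,)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-point initialization (assumption): start from a random row,
    # then repeatedly add the row farthest from all centers chosen so far.
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (N, k)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def binary_masks(embeddings, n_speakers):
    """embeddings: (T, F, D) per-bin network outputs -> (n_speakers, T, F) 0/1 masks."""
    T, F, D = embeddings.shape
    labels = kmeans(embeddings.reshape(-1, D), n_speakers).reshape(T, F)
    return np.stack([(labels == s).astype(float) for s in range(n_speakers)])

def masked_scm(Y, mask):
    """Mask-weighted spatial covariance: Y (M, T, F) STFT, mask (T, F) -> (F, M, M).
    Phi[f] = sum_t m[t,f] y[t,f] y[t,f]^H / sum_t m[t,f]."""
    num = np.einsum('tf,mtf,ntf->fmn', mask, Y, Y.conj())
    den = np.maximum(mask.sum(axis=0), 1e-8)[:, None, None]
    return num / den

def mvdr_weights(phi_s, phi_n, eps=1e-6):
    """Per-frequency MVDR: w = Phi_n^{-1} d / (d^H Phi_n^{-1} d), with d the
    principal eigenvector of Phi_s (assumed steering-vector estimate)."""
    F, M, _ = phi_s.shape
    w = np.empty((F, M), dtype=complex)
    for f in range(F):
        _, vecs = np.linalg.eigh(phi_s[f])  # eigenvalues in ascending order
        d = vecs[:, -1]
        # Diagonal loading keeps the interference covariance invertible.
        num = np.linalg.solve(phi_n[f] + eps * np.eye(M), d)
        w[f] = num / (d.conj() @ num)
    return w

def apply_beamformer(w, Y):
    """s_hat[t, f] = w[f]^H y[t, f]; returns (T, F)."""
    return np.einsum('fm,mtf->tf', w.conj(), Y)

def separate(Y, embeddings, n_speakers=2):
    """Cluster embeddings into masks, then beamform once per speaker -> (S, T, F)."""
    masks = binary_masks(embeddings, n_speakers)
    outputs = []
    for s in range(n_speakers):
        phi_s = masked_scm(Y, masks[s])
        phi_n = masked_scm(Y, 1.0 - masks[s])  # all other bins = interference
        outputs.append(apply_beamformer(mvdr_weights(phi_s, phi_n), Y))
    return np.stack(outputs)
```

With binary masks, the interference mask for speaker s is simply `1 - masks[s]`; soft masks would instead require summing the other speakers' masks. The beamformed STFTs would still need an inverse STFT to return to the time domain.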

Abstract

The invention provides a multi-channel speaker-independent voice separation method based on deep clustering. The method comprises the following steps: first, a short-time Fourier transform is applied to the speech signal to extract its amplitude spectrum features; the cosines of the phase differences between different channels are then computed as spatial features, and the two kinds of features are combined as the input features for training a deep clustering network. Next, a bidirectional long short-term memory network is trained and used to estimate masks for the different speakers. Finally, the coefficients of the MVDR beamformer are calculated from the spatial covariance matrices, and the mixed speech is multiplied by the resulting beamformer coefficients to obtain the separated speaker voice signals. The method makes better use of the spatial information in the speech signals, estimates high-quality masks with the deep clustering network, can separate mixed speech from multiple speakers in reverberant environments, and achieves better speech separation performance.
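
The input features and training target described in the abstract can be sketched as follows. This is a hedged illustration rather than the patented pipeline: the STFT size and hop, the use of channel 0 as the reference for all phase differences, and the use of the raw (uncompressed) magnitude are assumptions. The training objective shown is the standard deep clustering loss of Hershey et al. (ICASSP 2016), ||V V^T - Y Y^T||_F^2 over unit-norm embeddings V and one-hot dominant-speaker labels Y, which the excerpt does not state explicitly. Function names are hypothetical.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Multichannel STFT: x (M, N) real waveform -> (M, T, n_fft//2 + 1) complex."""
    win = np.hanning(n_fft)
    n_frames = 1 + (x.shape[-1] - n_fft) // hop
    # Row t of idx covers samples [t*hop, t*hop + n_fft)
    idx = hop * np.arange(n_frames)[:, None] + np.arange(n_fft)[None, :]
    frames = x[:, idx] * win  # (M, T, n_fft)
    return np.fft.rfft(frames, axis=-1)

def input_features(x, ref=0, n_fft=512, hop=128):
    """Reference-channel magnitude concatenated with cos(IPD) of every other
    channel relative to the reference (assumed pairing). Returns (T, M*F), Y."""
    Y = stft(x, n_fft, hop)
    mag = np.abs(Y[ref])  # amplitude spectrum features, (T, F)
    others = [m for m in range(Y.shape[0]) if m != ref]
    cos_ipd = np.cos(np.angle(Y[others]) - np.angle(Y[ref]))  # spatial features, (M-1, T, F)
    feats = np.concatenate([mag[None], cos_ipd], axis=0)  # (M, T, F)
    T = Y.shape[1]
    return feats.transpose(1, 0, 2).reshape(T, -1), Y

def dominant_speaker_targets(S):
    """S: (C, T, F) per-speaker magnitudes -> (T*F, C) one-hot ideal binary assignment."""
    C = S.shape[0]
    labels = S.reshape(C, -1).argmax(axis=0)
    return np.eye(C)[labels]

def deep_clustering_loss(V, Ytgt):
    """||V V^T - Y Y^T||_F^2 expanded into D x D, D x C and C x C products,
    so the N x N affinity matrices (N = T*F bins) are never formed."""
    return (np.sum((V.T @ V) ** 2)
            - 2.0 * np.sum((V.T @ Ytgt) ** 2)
            + np.sum((Ytgt.T @ Ytgt) ** 2))
```

The expanded form of the loss matters in practice: for a few seconds of audio N runs to tens of thousands of T-F bins, so the explicit N x N affinity matrices would be far larger than the D x D and D x C products used here.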

Description

Technical field
[0001] The invention belongs to the technical field of speech signal processing, and in particular relates to a multi-channel, speaker-independent speech separation method based on deep clustering.
Background
[0002] Speech separation is a fundamental task in signal processing and a special case of sound source separation. Its goal is to separate the target speech from background noise. Speech separation has many applications, including hearing prostheses, communications, automatic speech processing, and speaker recognition. The human auditory system can easily pick out one person's speech from other people's voices and the surrounding background noise, even at a noisy cocktail party; for this reason, the speech separation problem is often called the "cocktail party problem". But the reason humans can separate speech so easily is that before the sound signal reaches the human auditory center, it ...

Application Information

IPC(8): G10L21/0272, G10L21/0208, G10L21/0216, G10L21/0224, G10L21/0232, G10L25/18, G10L25/27, G06K9/62
CPC: G10L21/0272, G10L21/0208, G10L21/0216, G10L21/0224, G10L21/0232, G10L25/18, G10L25/27, G10L2021/02166, G10L2021/02082, G10L2021/02087, G06F18/23
Inventors: 张晓雷 (Zhang Xiaolei), 杨子叶 (Yang Ziye), 谭旭 (Tan Xu)
Owner RES & DEV INST OF NORTHWESTERN POLYTECHNICAL UNIV IN SHENZHEN