Multi-speaker speech separation method and system based on beam forming

A beamforming and speech separation technology, applied in speech analysis, instruments, etc., that addresses problems such as poor speech intelligibility, poor handling of speakers, and difficult separation.

Active Publication Date: 2019-05-31

PEKING UNIV

Cites: 4; Cited by: 60


AI Technical Summary

Problems solved by technology

In theory, however, the time complexity of the matching calculation is factorial.
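To make the factorial cost concrete, here is a minimal sketch. It assumes the "matching calculation" means exhaustively assigning S separated outputs to S reference speakers, which the excerpt does not spell out. The function name `best_assignment` and the `cost` matrix are hypothetical and chosen only for illustration.

```python
from itertools import permutations
from math import factorial


def best_assignment(cost):
    """Exhaustively match S estimates to S references.

    cost[i][j] is a hypothetical error between estimate i and reference j.
    Every one of the S! permutations is scored, hence the factorial cost.
    """
    n = len(cost)
    best_perm, best_total, tried = None, float("inf"), 0
    for perm in permutations(range(n)):
        tried += 1
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total, tried


# Illustrative 3-speaker cost matrix (made-up numbers).
cost = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.9],
    [0.8, 0.9, 0.3],
]
perm, total, tried = best_assignment(cost)

# Number of permutations an exhaustive search must score for S = 2..8.
growth = {s: factorial(s) for s in range(2, 9)}
```

With three speakers the search scores 6 permutations; with eight it already scores 40320, which is why exhaustive matching scales poorly as the speaker count grows.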

[0007] However, the two ideas, and the multi-channel separation methods based on them, have two problems. First, the more speakers there are, the more difficult separation becomes and the lower the intelligibility of the separated speech. Second, the number of speakers, or the maximum number of speakers, has to be set manually; even with later improved methods, performance is poor when the number of speakers is not known in advance.


Examples


Embodiment Construction

[0030] The preferred embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Figure 1 shows the block diagram of the beamforming-based multi-speaker speech separation proposed by the present invention. The method comprises five steps: multi-channel data acquisition, speaker number acquisition, beam enhancement, PSM estimation, and target speaker speech recovery. Each step is implemented as follows:

[0031] 1. Multi-channel data acquisition

[0032] Design a microphone array. It can be a one-dimensional array such as a line array, or a two-dimensional array such as an equilateral triangular array, T-shaped array, uniform circular array, uniform square array, coaxial circular array, or circular/rectangular array. It may also be a three...
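Once an array geometry is chosen, its steering vectors drive the direction scan. The abstract describes scanning the multi-channel signal for a MUSIC energy spectrum and taking its peaks as beam directions. The following is a rough sketch of that idea, not the patent's implementation. It assumes a uniform line array with half-wavelength spacing, a single narrowband frequency, synthetic sources, and a signal-subspace size `n_sources` that is given up front, whereas the patent obtains the speaker count from the peaks. All function names are hypothetical.

```python
import numpy as np


def steering_vector(angle_deg, n_mics, spacing_wavelengths=0.5):
    """Far-field narrowband steering vector of a uniform line array."""
    k = np.arange(n_mics)
    return np.exp(2j * np.pi * spacing_wavelengths * k * np.sin(np.deg2rad(angle_deg)))


def music_spectrum(snapshots, n_sources, grid_deg):
    """MUSIC pseudo-spectrum from snapshots of shape (n_mics, n_frames)."""
    n_mics = snapshots.shape[0]
    cov = snapshots @ snapshots.conj().T / snapshots.shape[1]
    _, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    noise = eigvecs[:, : n_mics - n_sources]  # noise-subspace eigenvectors
    spectrum = np.empty(len(grid_deg))
    for i, angle in enumerate(grid_deg):
        a = steering_vector(angle, n_mics)
        spectrum[i] = 1.0 / np.linalg.norm(noise.conj().T @ a) ** 2
    return spectrum


def pick_peaks(spectrum, grid_deg, n_peaks):
    """Return the n_peaks largest local maxima, sorted by angle."""
    inner = np.arange(1, len(spectrum) - 1)
    is_peak = (spectrum[inner] > spectrum[inner - 1]) & (spectrum[inner] >= spectrum[inner + 1])
    candidates = inner[is_peak]
    top = candidates[np.argsort(spectrum[candidates])[-n_peaks:]]
    return np.sort(grid_deg[top])


# Synthetic scene (assumption): two uncorrelated sources at -30 and 20 degrees.
rng = np.random.default_rng(0)
true_deg = np.array([-30.0, 20.0])
n_mics, n_frames = 8, 200
A = np.stack([steering_vector(a, n_mics) for a in true_deg], axis=1)
sources = rng.standard_normal((2, n_frames)) + 1j * rng.standard_normal((2, n_frames))
noise = 0.05 * (rng.standard_normal((n_mics, n_frames))
                + 1j * rng.standard_normal((n_mics, n_frames)))
snapshots = A @ sources + noise

grid = np.arange(-90.0, 91.0, 1.0)
spectrum = music_spectrum(snapshots, n_sources=2, grid_deg=grid)
beam_directions = pick_peaks(spectrum, grid, n_peaks=2)
```

Each entry of `beam_directions` would then be the look direction for one enhanced beam. Real speech is broadband, so a practical scan would have to combine many frequency bins, which this sketch leaves out.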


Abstract

The invention discloses a multi-speaker speech separation method and system. The method comprises the following steps: acquiring mixed speech to obtain a multi-channel, multi-speaker mixed speech signal, and scanning that signal to obtain a MUSIC energy spectrum; obtaining S peaks from the MUSIC energy spectrum, each peak corresponding to one beam direction; enhancing each of the S beams to obtain mixed speech in S directions; applying a short-time Fourier transform to the mixed speech of each direction to obtain the short-time Fourier amplitude spectra for the S target speakers, and feeding each amplitude spectrum into a deep neural network to estimate the phase-sensitive mask (PSM) of the corresponding target speaker; and multiplying each target speaker's phase-sensitive mask element by element with the amplitude spectrum of the corresponding mixed speech to obtain the amplitude spectrum of that target speaker, whose time-domain signal is then recovered by an inverse short-time Fourier transform using the phase spectrum of the corresponding mixed speech.
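The last step of the abstract, mask multiplication followed by reuse of the mixture phase, can be sketched on a toy spectrogram. This sketch uses the commonly cited definition of the phase-sensitive mask, |S|/|Y| · cos(θ_S − θ_Y). It computes that mask from a known target (an "oracle" mask) instead of estimating it with a neural network as the patent does, and it stops before the inverse short-time Fourier transform. The helper names and the toy values are hypothetical.

```python
import cmath
import math


def phase_sensitive_mask(target, mixture):
    """Oracle PSM per time-frequency bin: |S|/|Y| * cos(theta_S - theta_Y)."""
    return [
        [abs(s) / abs(y) * math.cos(cmath.phase(s) - cmath.phase(y))
         for s, y in zip(s_row, y_row)]
        for s_row, y_row in zip(target, mixture)
    ]


def apply_mask(mask, mixture):
    """Scale the mixture magnitude by the mask and reuse the mixture phase."""
    return [
        [m * abs(y) * cmath.exp(1j * cmath.phase(y)) for m, y in zip(m_row, y_row)]
        for m_row, y_row in zip(mask, mixture)
    ]


# Toy 2x2 complex spectrograms (made-up values): mixture = target + interference.
target = [[1 + 1j, 2 + 0j], [0.5j, -1 + 0j]]
interference = [[0.5 + 0j, 0 + 1j], [0.5 + 0j, 0.2 - 0.3j]]
mixture = [
    [s + n for s, n in zip(s_row, n_row)]
    for s_row, n_row in zip(target, interference)
]

mask = phase_sensitive_mask(target, mixture)
estimate = apply_mask(mask, mixture)
```

Because the estimate keeps the mixture phase, each bin of `estimate` is the projection of the target onto the mixture's direction in the complex plane. This is why the mask includes the cosine of the phase difference instead of using the magnitude ratio alone.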

Description

Technical field

[0001] The invention belongs to the technical field of speech separation and relates to beamforming and deep neural network models, in particular to a beamforming-based method and system for speech separation.

Background technique

[0002] In a complex acoustic scene with interference such as noise or multiple speakers, picking out the voice of the target speaker has always been a difficult problem in the speech field. This is known as the "cocktail party problem". People with normal hearing rely on their auditory attention mechanism to focus on the target sound within the mixture, which lets them communicate in such complex environments. For machines, however, the "cocktail party problem" remains difficult. Although the recognition rate of automatic speech recognition can approach or even exceed that of ordinary people on clean speech, it drops significantly in speech recognition with mul...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G10L21/0272, G10L25/30, G10L21/0208

Inventor: 曲天书, 吴玺宏, 彭超

Owner: PEKING UNIV
