
Multi-speaker speech separation method and system based on beam forming

A beamforming and speech separation technology, applied in speech analysis, instruments, etc., that addresses problems such as low intelligibility of separated speech and the difficulty of separating multiple speakers.

Active Publication Date: 2019-05-31
PEKING UNIV
4 Cites, 60 Cited by

AI Technical Summary

Problems solved by technology

In theory, however, the time complexity of the matching calculation is factorial in the number of speakers.
[0007] However, both ideas, and the multi-channel separation methods built on them, share two problems. First, the more speakers there are, the harder the separation becomes and the lower the intelligibility of the separated speech. Second, the number of speakers, or the maximum number of speakers, has to be set manually; even with later improved methods, performance is poor when separating an unknown number of speakers.
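The factorial cost of the matching calculation can be illustrated with a minimal sketch. It assumes that "matching" means exhaustively pairing each separated output with each reference speaker and keeping the lowest-error assignment; the passage does not spell this out. The helper name `best_assignment`, the mean-squared-error cost, and the synthetic signals are illustrative choices, not taken from the patent.

```python
import itertools
import math

import numpy as np


def best_assignment(estimates, references):
    """Hypothetical helper: try every output-to-speaker permutation.

    Returns the permutation with the lowest total mean-squared error,
    that error, and the number of permutations that were evaluated.
    """
    num_speakers = len(references)
    # cost[i, j] = MSE between separated output i and reference speaker j.
    cost = np.array(
        [[np.mean((e - r) ** 2) for r in references] for e in estimates]
    )
    best_perm, best_cost, tried = None, np.inf, 0
    for perm in itertools.permutations(range(num_speakers)):
        tried += 1
        total = sum(cost[i, j] for i, j in enumerate(perm))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost, tried


rng = np.random.default_rng(0)
references = rng.standard_normal((4, 16000))  # 4 synthetic "speakers"
shuffled = [2, 0, 3, 1]                       # output i carries speaker shuffled[i]
estimates = references[shuffled] + 0.01 * rng.standard_normal((4, 16000))

perm, cost, tried = best_assignment(estimates, references)
print(perm, tried, math.factorial(4))
```

With S speakers the loop runs S! times, so 4 speakers need 24 evaluations and 8 speakers already need 40320, which is why a method that avoids this matching step scales better with the number of speakers.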



Embodiment Construction

[0030] The preferred embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Figure 1 shows the block diagram of the beamforming-based multi-speaker speech separation proposed by the present invention. The method comprises five steps: multi-channel data acquisition, speaker number acquisition, beam enhancement, PSM (phase-sensitive mask) estimation, and target speaker speech recovery. Each step is implemented as follows:

[0031] 1. Multi-channel data acquisition

[0032] Design a microphone array. It can be a one-dimensional array such as a line array, or a two-dimensional array such as an equilateral triangular array, a T-shaped array, a uniform circular array, a uniform square array, a coaxial circular array, or a circular / rectangular array. It may also be a three...
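As an illustration of two of the array layouts named above, the sketch below computes microphone coordinates for a uniform line array and a uniform circular array, plus the far-field arrival delays that a beamformer would compensate. The microphone counts, spacing, radius, and speed of sound are arbitrary example values, and the function names are hypothetical; the text shown here does not specify any of them.

```python
import numpy as np


def line_array(num_mics, spacing):
    """(x, y) positions in metres of a uniform line array centred on the origin."""
    x = (np.arange(num_mics) - (num_mics - 1) / 2) * spacing
    return np.stack([x, np.zeros(num_mics)], axis=1)


def circular_array(num_mics, radius):
    """(x, y) positions in metres of a uniform circular array centred on the origin."""
    angles = 2 * np.pi * np.arange(num_mics) / num_mics
    return radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)


def far_field_delays(positions, azimuth_deg, speed_of_sound=343.0):
    """Relative arrival delays in seconds for a plane wave from the given azimuth."""
    azimuth = np.radians(azimuth_deg)
    direction = np.array([np.cos(azimuth), np.sin(azimuth)])
    # A microphone further along the source direction hears the wave earlier.
    return -(positions @ direction) / speed_of_sound


line = line_array(num_mics=4, spacing=0.05)        # assumed 5 cm spacing
circle = circular_array(num_mics=6, radius=0.10)   # assumed 10 cm radius

print(line[:, 0])
print(far_field_delays(line, azimuth_deg=0.0))
```

For the line array, a source at 90 degrees (broadside) reaches all microphones at the same time, while a source at 0 degrees (endfire) produces the largest delay differences.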


Abstract

The invention discloses a multi-speaker speech separation method and system. The method comprises the following steps: acquiring mixed speech to obtain multi-channel, multi-speaker mixed speech signals, and scanning these signals to obtain a MUSIC energy spectrum; obtaining S peaks from the MUSIC energy spectrum, where each peak corresponds to a beam direction; enhancing each of the S beams to obtain mixed speech in S directions; performing a short-time Fourier transform on the mixed speech for each direction to obtain the short-time Fourier amplitude spectra for the S target speakers, and feeding each amplitude spectrum into a deep neural network to estimate the phase-sensitive mask of the corresponding target speaker; and multiplying the phase-sensitive mask of each target speaker element by element with the amplitude spectrum of the corresponding mixed speech to obtain the amplitude spectrum of that target speaker, whose time-domain signal is then recovered by an inverse short-time Fourier transform using the phase spectrum of the corresponding mixed speech.
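The final two steps of the abstract (mask the mixture amplitude spectrum, then invert with the mixture phase) can be sketched as follows. This is a minimal illustration under several assumptions: the mask is an oracle computed from a known target rather than estimated by a trained deep neural network, the mask is clipped to [0, 1] (a common choice that the abstract does not state), two sine tones stand in for speech, and the function names and STFT settings are hypothetical.

```python
import numpy as np
from scipy.signal import istft, stft


def phase_sensitive_mask(target_stft, mixture_stft, eps=1e-8):
    """Oracle mask: |S| / |Y| * cos(theta_S - theta_Y), clipped to [0, 1]."""
    ratio = np.abs(target_stft) / (np.abs(mixture_stft) + eps)
    phase_diff = np.angle(target_stft) - np.angle(mixture_stft)
    return np.clip(ratio * np.cos(phase_diff), 0.0, 1.0)


def recover_speech(mask, mixture_stft, fs, nperseg):
    """Mask the mixture amplitude, reuse the mixture phase, invert the STFT."""
    est_mag = mask * np.abs(mixture_stft)  # element-by-element product
    est_stft = est_mag * np.exp(1j * np.angle(mixture_stft))
    _, est = istft(est_stft, fs=fs, nperseg=nperseg)
    return est


fs, nperseg = 16000, 512                         # assumed example settings
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)             # stand-in for the target speaker
interferer = 0.8 * np.sin(2 * np.pi * 1000 * t)  # stand-in for another speaker
mixture = target + interferer                    # stand-in for one enhanced beam

_, _, Y = stft(mixture, fs=fs, nperseg=nperseg)
_, _, S = stft(target, fs=fs, nperseg=nperseg)
mask = phase_sensitive_mask(S, Y)  # in the method, a DNN estimates this from |Y|
estimate = recover_speech(mask, Y, fs, nperseg)

n = min(len(estimate), len(target))
err_before = np.mean((mixture[:n] - target[:n]) ** 2)
err_after = np.mean((estimate[:n] - target[:n]) ** 2)
print(err_before, err_after)
```

Reusing the mixture phase is what makes the cosine term useful: the mask lowers the amplitude wherever the mixture phase is a poor match for the target phase.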

Description

Technical field

[0001] The invention belongs to the technical field of speech separation and relates to beamforming and deep neural network models, in particular to a beamforming-based method and system for speech separation.

Background

[0002] In a complex acoustic scene with interference such as noise or multiple speakers, picking out the voice of the target speaker has long been a difficult problem in the speech field. It is known as the "cocktail party problem". People with normal hearing rely on their auditory attention mechanism to focus on the target sound within the mixture, so they can communicate in such an environment. For machines, however, the "cocktail party problem" remains a difficult task. Although automatic speech recognition can approach or even exceed human accuracy on clean speech, its recognition rate drops significantly in speech recognition with mul...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L21/0272, G10L25/30, G10L21/0208
Inventors: 曲天书, 吴玺宏, 彭超
Owner: PEKING UNIV