
Multi-speaker-voice separation method based on deep learning

A multi-speaker voice separation method, applied in speech analysis, instruments, etc. It addresses problems such as the complexity and instability of speech signals and the inability of traditional separation methods to achieve a good separation effect, and achieves excellent results.

Active Publication Date: 2019-04-05
INST OF ACOUSTICS CHINESE ACAD OF SCI
Cites: 5 · Cited by: 21
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

[0003] Due to the complexity and instability of speech signals, traditional separation methods cannot achieve a good separation effect; moreover, earlier separation approaches estimated only the spectral magnitude of the target signal.



Examples


Embodiment Construction

[0047] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

[0048] Figure 1 is a flow chart of the deep-learning-based multi-speaker voice separation method. As shown in Figure 1, the method includes the following steps:

[0049] Step S101: Framing, windowing, and Fourier transforming the mixed voice signal containing multiple target voice signals received by the microphone to obtain the spectrum of the mixed voice s...
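Step S101 (framing, windowing, Fourier transform) can be sketched as a short-time Fourier transform. The frame length, frame shift, and Hann window below are assumptions for illustration; the visible text of the patent does not specify them.

```python
# Sketch of step S101: frame the mixed signal, apply a window, and take the
# FFT of each frame to obtain the complex spectrum of the mixture.
# Frame length (512), shift (256), and the Hann window are assumed values.
import numpy as np

def stft(x, frame_len=512, frame_shift=256):
    """Complex spectrogram: one row per frame, one column per frequency bin."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    frames = np.stack([x[i * frame_shift : i * frame_shift + frame_len]
                       for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=1)

mixed = np.random.randn(16000)               # placeholder for 1 s of 16 kHz audio
spec = stft(mixed)
magnitude, phase = np.abs(spec), np.angle(spec)
print(spec.shape)                            # (61, 257)
```

The magnitude is what the method feeds to the neural network; the mixture phase is kept for the later phase-retrieval step.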



Abstract

The invention discloses a multi-speaker voice separation method based on deep learning. The method comprises the following steps: performing framing, windowing, and Fourier transform on the multi-speaker mixed voice signal obtained by a microphone to obtain the spectrum of the mixed signal; sending the spectral magnitude of the mixed signal to a neural network to estimate the ideal amplitude mask of each target signal; retrieving the phase of each target signal with an iterative method on the basis of the estimated ideal amplitude mask together with the spectral magnitude and phase of the mixed signal; computing the phase-sensitive mask of each target signal from the retrieved phase, and training the neural network to estimate this phase-sensitive mask; obtaining the spectral magnitude of each target signal from the phase-sensitive mask estimated by the neural network, and reconstructing the spectrum of each target signal in combination with the retrieved phase; and performing the inverse Fourier transform on the reconstructed spectra to obtain the separated time-domain voice signals. The method can effectively improve the speaker separation effect.
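The phase-sensitive masking step in the abstract can be sketched as follows. The mask definition PSM = |S|/|Y| · cos(∠S − ∠Y) and the clipping range are the standard formulation from the speech-enhancement literature, assumed here since the patent body is only partially visible.

```python
# Hedged sketch of the phase-sensitive mask (PSM) used as a training target
# once each target signal's phase has been retrieved. S is the (complex)
# target spectrum, Y the mixture spectrum; both are assumptions for this demo.
import numpy as np

def phase_sensitive_mask(target_spec, mix_spec, eps=1e-8, clip=(0.0, 1.0)):
    """PSM = |S| / |Y| * cos(angle(S) - angle(Y)), clipped for training stability."""
    psm = (np.abs(target_spec) / np.maximum(np.abs(mix_spec), eps)
           * np.cos(np.angle(target_spec) - np.angle(mix_spec)))
    return np.clip(psm, *clip)

# Applying an estimated mask recovers the target magnitude: |S_hat| = PSM * |Y|.
Y = np.array([[3.0 + 4.0j]])                 # toy mixture bin
S = np.array([[3.0 + 4.0j]])                 # target identical to the mixture
print(phase_sensitive_mask(S, Y))            # [[1.]]
```

Because the cosine term penalises phase mismatch, a network trained on this target accounts for phase error instead of matching magnitudes alone, which is the improvement the abstract claims over earlier magnitude-only separation.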

Description

Technical Field

[0001] The present invention relates to the technical field of voice separation, and in particular to a deep-learning-based multi-speaker voice separation method.

Background

[0002] Speaker separation technology extracts the speech signal of each individual speaker from the mixed speech of multiple speakers. This technology is of great significance to target speaker detection, speech recognition, and related tasks.

[0003] Due to the complexity and instability of speech signals, traditional separation methods cannot achieve a good separation effect, and earlier separation approaches estimated only the spectral magnitude of the target signal.

Contents of the Invention

[0004] The purpose of the present invention is to overcome the above defects in the prior art.

[0005] To achieve this object, a deep-learning-based multi-speaker voice separation method includes the steps of:

[0006] framing, windowing, and Fourier-transforming the mixed voice signal containin...
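The description says the phase of each target signal is retrieved "with an iterative method" from the estimated magnitude plus the mixture spectrum. The patent's exact iteration is not visible here, so the sketch below uses the generic Griffin-Lim-style alternation between the time and frequency domains, initialised with the mixture phase as the description suggests; the STFT parameters are assumptions.

```python
# Griffin-Lim-style iterative phase retrieval: hold the estimated target
# magnitude fixed and alternate ISTFT/STFT until the phase is self-consistent.
# FRAME_LEN, SHIFT, the Hann window, and the iteration count are assumptions.
import numpy as np

FRAME_LEN, SHIFT = 512, 256
WIN = np.hanning(FRAME_LEN)

def stft(x):
    n = 1 + (len(x) - FRAME_LEN) // SHIFT
    frames = np.stack([x[i * SHIFT : i * SHIFT + FRAME_LEN] for i in range(n)])
    return np.fft.rfft(frames * WIN, axis=1)

def istft(spec, length):
    frames = np.fft.irfft(spec, axis=1) * WIN    # synthesis windowing
    x, norm = np.zeros(length), np.zeros(length)
    for i, f in enumerate(frames):               # overlap-add with window norm
        x[i * SHIFT : i * SHIFT + FRAME_LEN] += f
        norm[i * SHIFT : i * SHIFT + FRAME_LEN] += WIN ** 2
    return x / np.maximum(norm, 1e-8)

def retrieve_phase(target_mag, mix_phase, length, n_iter=30):
    """Refine the phase so that STFT(ISTFT(.)) matches the target magnitude."""
    phase = mix_phase.copy()                     # initialise with mixture phase
    for _ in range(n_iter):
        x = istft(target_mag * np.exp(1j * phase), length)
        phase = np.angle(stft(x))
    return phase
```

The retrieved phase is then combined with the mask-estimated magnitude to reconstruct each target spectrum before the inverse transform back to the time domain.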

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G10L25/18; G10L25/30; G10L25/45; G10L21/0272
CPC: G10L21/0272; G10L25/18; G10L25/30; G10L25/45
Inventor: Li Junfeng (李军锋), Yin Lu (尹路), Yan Yonghong (颜永红)
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI