
Multi-speaker Speech Separation Method Based on Voiceprint Features and Generative Adversarial Learning

A multi-speaker speech separation technology based on voiceprint features, applied to neural learning methods, speech analysis, biological neural network models, etc. It addresses problems such as poor robustness, the high complexity of deep models, and poor speech separation performance, achieving tracking and identification of the target speaker and improved invariance.

Active Publication Date: 2022-05-13
BEIJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

However, while deep models based on spectral mapping have strong modeling ability, they are also highly complex, and their generalization depends heavily on the data set: if the amount of data is insufficient, the learned spectral mapping relationship is not robust enough. In addition, the selected features are usually generic, so speech separation methods based on spectral mapping fail to effectively combine the auditory selection characteristics of the human ear with the voice characteristics of different speakers, and the speech separation effect is poor.



Examples

Experimental program
Comparison scheme
Effect test

Embodiment 1

[0057] This embodiment proposes a speech separation method based on voiceprint features and generative adversarial learning for multi-speaker speech separation in speech recognition. "Multi-speaker" here refers to a scene in which multiple people speak at the same time, and the goal of separation is to extract the speech of the target speaker. Preferably, such scenes include: removing the voices of unrelated people or background sound in an intelligent conference simultaneous interpretation system; suppressing the voices of non-target speakers on the device side before transmitting the speech signal, improving the speech quality and intelligibility of conference communication; and, in smart-city applications, collecting the target speaker's signal for voice interaction in smart homes, autonomous driving, security monitoring, and other fields.

[0058] Figure 1 shows a schematic...

Embodiment 2

[0099] This embodiment provides a multi-speaker speech separation system based on voiceprint features and generative adversarial learning. Figure 4 shows a schematic structural diagram of this system. As shown in Figure 4, the multi-speaker speech separation system includes: an anchor sample collection module, a hybrid preprocessing module, a voiceprint feature extraction module, at least one discriminator, and at least one generator.

[0100] The anchor sample collection module is connected with the hybrid preprocessing module and the voiceprint feature extraction module. It uses the pure speech of the target speaker (i.e., the anchor samples) as the pure training corpus, and provides this corpus to the hybrid preprocessing module and the voiceprint feature extraction module.

[0101] The hybrid preprocessing module is connected with the voiceprint...
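The module wiring described in paragraphs [0099]–[0100] can be sketched in code. This is a minimal illustration, not the patent's implementation: all class and method names are hypothetical, and the extractor, generator, and discriminator bodies are trivial placeholders standing in for the learned networks.

```python
from dataclasses import dataclass, field

# All names below are hypothetical; the patent specifies the modules
# and their connections, not an API.

@dataclass
class AnchorSampleCollector:
    """Holds the target speaker's clean speech (the anchor samples)."""
    anchors: list = field(default_factory=list)

    def collect(self, clean_utterance):
        self.anchors.append(clean_utterance)
        return clean_utterance

class HybridPreprocessor:
    """Mixes anchor samples with interfering speech/noise to build
    the mixed training corpus."""
    def make_mixture(self, anchor, interference):
        return [a + b for a, b in zip(anchor, interference)]

class VoiceprintExtractor:
    """Derives a voiceprint feature from speech; a trivial average
    stands in for the learned embedding network."""
    def extract(self, utterance):
        return sum(utterance) / len(utterance)

class Generator:
    """Separates the target speaker's speech from a mixture,
    conditioned on the target voiceprint."""
    def separate(self, mixture, voiceprint):
        return mixture  # placeholder for the separation network

class Discriminator:
    """Scores whether a voiceprint matches the target speaker."""
    def score(self, voiceprint, reference):
        return 1.0 if abs(voiceprint - reference) < 1e-9 else 0.0

class SeparationSystem:
    """Wires the modules as in [0099]-[0100]: the collector feeds both
    the preprocessor and the voiceprint extractor."""
    def __init__(self):
        self.collector = AnchorSampleCollector()
        self.preproc = HybridPreprocessor()
        self.vp = VoiceprintExtractor()
        self.gen = Generator()
        self.disc = Discriminator()

    def prepare(self, clean, interference):
        anchor = self.collector.collect(clean)
        mixture = self.preproc.make_mixture(anchor, interference)
        target_vp = self.vp.extract(anchor)
        return mixture, target_vp

system = SeparationSystem()
mixture, vp = system.prepare([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(mixture)   # [1.5, 2.5, 3.5]
print(vp)        # 2.0
```

The key design point the patent emphasizes is that the anchor samples feed two consumers: the preprocessor (to build mixtures) and the extractor (to produce the reference voiceprint the discriminator compares against).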



Abstract

The present invention provides a multi-speaker speech separation method based on voiceprint features and generative adversarial learning, to solve the problem of inaccurate, impure speech separation in the prior art. In this method, the audio data of the target speaker, other unrelated speakers, and noise are mixed to obtain an initial mixed training corpus; voiceprint features are extracted from the target speaker's pure training corpus and from the separation result of the initialized generator, completing the training of the discriminator; after the discriminator's parameters are frozen, the training of the generator is completed; the generator with frozen parameters then separates the target speaker's speech from the speech to be separated through generative adversarial learning. The invention uses generative adversarial learning to generate samples similar to the target and continuously approaches the target output distribution through the generative adversarial network, which reduces the distribution difference between speech data in a multi-speaker interference environment and the real target speaker's training data, and realizes tracking and identification of the target speaker's audio.
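The two-stage procedure in the abstract (train the discriminator on real vs. generated voiceprints, freeze it, then train the generator against the frozen discriminator) can be sketched with toy linear models. This is a hedged illustration under strong simplifying assumptions: the "voiceprint" is a mean feature vector, the generator is a single linear map, and the discriminator is logistic regression; the patent's actual networks are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_voiceprint(frames):
    """Stand-in voiceprint embedding: mean feature vector over frames.
    (The patent uses a learned voiceprint network; this is a placeholder.)"""
    return frames.mean(axis=0)

def mix(target, interferer, noise, snr_scale=0.5):
    """Additive mixture of target speech, an unrelated speaker, and noise,
    forming the initial mixed training corpus."""
    return target + snr_scale * interferer + 0.1 * noise

class Generator:
    """Toy linear separator: estimates target speech from the mixture."""
    def __init__(self, dim):
        self.W = 0.01 * rng.standard_normal((dim, dim))
    def separate(self, mixture):
        return mixture @ self.W

class Discriminator:
    """Logistic score: probability a voiceprint matches the target speaker."""
    def __init__(self, dim):
        self.w = 0.01 * rng.standard_normal(dim)
        self.frozen = False
    def score(self, voiceprint):
        return 1.0 / (1.0 + np.exp(-voiceprint @ self.w))
    def train_step(self, real_vp, fake_vp, lr=0.1):
        assert not self.frozen
        # gradient ascent on log D(real) + log(1 - D(fake))
        self.w += lr * ((1 - self.score(real_vp)) * real_vp
                        - self.score(fake_vp) * fake_vp)

dim, frames = 8, 16
target = rng.standard_normal((frames, dim))
interferer = rng.standard_normal((frames, dim))
noise = rng.standard_normal((frames, dim))
mixture = mix(target, interferer, noise)

G, D = Generator(dim), Discriminator(dim)

# Stage 1: train the discriminator on voiceprints from the pure corpus
# (real) and from the initialized generator's separation result (fake).
for _ in range(50):
    D.train_step(extract_voiceprint(target),
                 extract_voiceprint(G.separate(mixture)))

# Stage 2: freeze ("solidify") the discriminator's parameters, then train
# the generator so its output's voiceprint fools the frozen discriminator.
D.frozen = True
for _ in range(200):
    vp = extract_voiceprint(G.separate(mixture))
    # gradient of log D(vp) w.r.t. G.W, pushed through the mean pooling
    grad_vp = (1 - D.score(vp)) * D.w
    G.W += 0.05 * np.outer(mixture.mean(axis=0), grad_vp)

separated = G.separate(mixture)
print(separated.shape)  # same shape as the mixture
```

The point of the staging is the one the abstract makes: the generator is only updated against a fixed discriminator, so its output distribution is pulled toward the target speaker's voiceprint distribution rather than chasing a moving objective.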

Description

technical field [0001] The invention belongs to the field of speech recognition, and in particular relates to a multi-speaker speech separation method based on voiceprint features and generative adversarial learning. Background technique [0002] Automatic Speech Recognition (ASR) converts the vocabulary content of human speech into computer-readable input, using computers to recognize human language. As a means of communication between humans and computers, it is regarded as a basic means of future technological interaction. When people speak in different environments, there are different kinds of interference; to accurately recognize the language of the target speaker, the collected audio information must be separated. Speech separation includes speech enhancement, multi-speaker separation, and de-reverberation, of which multi-speaker separation is the most common. For example, in a simultaneous interpretation system for intelligent conferences, on the one hand, when th...

Claims


Application Information

Patent Timeline
Patent Type & Authority: Patent (China)
IPC (8): G10L17/00; G10L17/02; G10L17/04; G10L17/06; G10L17/18; G06N3/04; G06N3/08
CPC: G10L17/02; G10L17/04; G10L17/06; G10L17/18; G06N3/088; G06N3/044; G06N3/045
Inventor 明悦傅豪
Owner BEIJING UNIV OF POSTS & TELECOMM