Multi-channel double-speaker separation method and system

A speaker separation technology, applied in speech analysis, instruments, etc., that can solve problems such as degraded speech separation performance, interference from other speakers' voices, and the inability to know the specific location of the target speaker in advance.

Pending Publication Date: 2021-12-31

INST OF ACOUSTICS CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

Studies have shown that in meeting environments the speaker overlap ratio is generally no higher than 20%, so the robustness of speech separation across different low overlap ratios still needs to be improved.

On the other hand, for mixed speech audio with different speaker overlap ratios, the specific position of the target speaker within the speech cannot be known in advance, so the neural network can only be trained with the entire utterance as input.

In this case, if average pooling is used, the interfering speaker's speech and the silent frames seriously corrupt the estimate of the target speaker's position information, which degrades speech separation performance.
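To make the pooling problem concrete, the following Python sketch contrasts plain average pooling with weighted pooling over frame-level position estimates. The abstract states that the sound source position estimation network outputs frame-level Cartesian coordinates together with corresponding weights, but this excerpt does not give the pooling formula; normalizing by the sum of the weights, the function names, and the toy coordinates and weights below are illustrative assumptions, not network outputs.

```python
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def average_pool(coords: Sequence[Coord]) -> Coord:
    """Plain mean over all frames: silent and interferer frames count as much as target frames."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def weighted_pool(coords: Sequence[Coord], weights: Sequence[float]) -> Coord:
    """Weighted mean (assumed form): frames with small weights contribute little to the estimate."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return tuple(sum(w * c[i] for w, c in zip(weights, coords)) / total for i in range(3))


if __name__ == "__main__":
    # Hypothetical frame-level estimates: two target frames near (1, 0, 0),
    # one interferer frame at (0, 1, 0), and one silent frame with a meaningless estimate.
    coords = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]
    weights = [1.0, 1.0, 0.0, 0.0]  # hypothetical weights that ignore the last two frames
    print(average_pool(coords))
    print(weighted_pool(coords, weights))
```

With equal weighting, the interferer and silent frames pull the estimate to about (0.475, 0.275, 0); down-weighting them keeps it near the target direction, at about (0.95, 0.05, 0).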

Examples

Embodiment 1

[0055] Based on the above-mentioned ideal sound source position estimation network and speaker mask estimation network, an embodiment of the present application provides a multi-channel dual-speaker separation method, which comprises the following steps:

[0056] S31. Obtain mixed speech audio containing the voices of two speakers, and apply framing, windowing and a Fourier transform to it to obtain the frequency spectrum of each audio frame.

[0057] In one feasible implementation, step S31 comprises the following steps S311-S313:

[0058] S311. Divide the mixed speech audio to be separated into frames of 25 ms each, with a frame shift of 6.25 ms;

[0059] S312. Apply a window to each frame, using a Hamming window as the window function;

[0060] S313. Perform a 512-point Fourier transform on each frame of audio to obtain a frequency spectrum of each frame of audio.

[0061] S32. Input the frequency spectrum of ea...
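Steps S311-S313 of step S31 can be sketched in dependency-free Python as follows. The excerpt does not state the sampling rate, so 16 kHz is assumed (giving 400-sample frames and a 100-sample shift); zero-padding each frame to 512 points, using the symmetric Hamming form, and keeping only the 257 non-negative-frequency bins are also assumptions. The hand-written radix-2 FFT stands in for whatever FFT routine the authors actually use, and `frame_spectra` is a hypothetical name.

```python
import cmath
import math
from typing import List


def hamming(n: int) -> List[float]:
    """Symmetric Hamming window: 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    if n < 2:
        return [1.0] * n
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def frame_spectra(signal: List[float], sample_rate: int = 16000, frame_ms: float = 25.0,
                  shift_ms: float = 6.25, n_fft: int = 512) -> List[List[complex]]:
    """S311 framing, S312 Hamming windowing, S313 512-point FFT for every frame."""
    frame_len = int(round(sample_rate * frame_ms / 1000))  # 400 samples at 16 kHz
    hop = int(round(sample_rate * shift_ms / 1000))        # 100 samples at 16 kHz
    if frame_len > n_fft:
        raise ValueError("frame is longer than the FFT size")
    win = hamming(frame_len)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], win)]
        padded = [complex(v) for v in frame] + [0j] * (n_fft - frame_len)  # zero-pad to n_fft
        spectra.append(fft(padded)[: n_fft // 2 + 1])  # keep the 257 non-negative-frequency bins
    return spectra


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(1600)]  # 0.1 s, 1 kHz
    spec = frame_spectra(tone, sample_rate=sr)
    mags = [abs(v) for v in spec[0]]
    print(len(spec), len(spec[0]), mags.index(max(mags)))  # 13 257 32
```

For a 0.1 s, 1 kHz tone this yields 13 frames of 257 bins each, with the magnitude peak in bin 32 (1000 Hz divided by the 31.25 Hz bin spacing of a 512-point FFT at 16 kHz).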

Abstract

The invention relates to a multi-channel dual-speaker separation method and system. The method comprises: processing mixed speech audio to obtain the frequency spectrum of each audio frame; obtaining estimated frame-level Cartesian coordinates and corresponding weights from each audio frame and a sound source position estimation network; obtaining a first logarithmic energy spectrum and a first sine-cosine inter-channel phase difference from the frequency spectrum of each audio frame; obtaining a Cartesian coordinate estimate of the target speaker in the mixed speech audio from the estimated frame-level Cartesian coordinates and the corresponding weights; obtaining a first angle feature from the Cartesian coordinates of the target speaker; obtaining the target speaker and a first estimated speaker mask from the first logarithmic energy spectrum, the first sine-cosine inter-channel phase difference, the first angle feature and a speaker mask estimation network; and obtaining the separated speech of the at least two speakers based on the target speaker, the first estimated speaker mask and the mixed speech audio.
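The abstract names three inputs to the speaker mask estimation network but does not define them in this excerpt. The sketch below uses common definitions from the multi-channel separation literature, all of which are assumptions: the log energy spectrum as log(|Y|^2 + eps) of one channel, the sine-cosine inter-channel phase difference as (cos IPD, sin IPD) for a microphone pair, and an angle feature that averages cos(IPD - TPD) over microphone pairs, where TPD is the phase difference a far-field plane wave from the estimated target direction would produce. The microphone geometry, the 343 m/s speed of sound, the phase sign convention, and all function names are hypothetical.

```python
import cmath
import math
from typing import Sequence, Tuple

SPEED_OF_SOUND = 343.0  # m/s; assumed value

Vec3 = Tuple[float, float, float]


def log_energy(y: complex, eps: float = 1e-8) -> float:
    """Log energy of one time-frequency bin of a reference channel."""
    return math.log(abs(y) ** 2 + eps)


def sin_cos_ipd(y_i: complex, y_j: complex) -> Tuple[float, float]:
    """(cos IPD, sin IPD) between channels i and j for one bin."""
    ipd = cmath.phase(y_i) - cmath.phase(y_j)
    return math.cos(ipd), math.sin(ipd)


def target_phase_difference(freq_hz: float, mic_i: Vec3, mic_j: Vec3, target: Vec3) -> float:
    """Phase difference a far-field plane wave from direction `target` would produce between mics i and j."""
    norm = math.sqrt(sum(c * c for c in target))
    direction = [c / norm for c in target]  # unit vector from the array origin toward the source
    projection = sum((a - b) * d for a, b, d in zip(mic_i, mic_j, direction))
    return 2.0 * math.pi * freq_hz * projection / SPEED_OF_SOUND


def angle_feature(bins: Sequence[complex], mics: Sequence[Vec3],
                  pairs: Sequence[Tuple[int, int]], freq_hz: float, target: Vec3) -> float:
    """Mean of cos(IPD - TPD) over mic pairs; close to 1 when the bin matches the target direction."""
    total = 0.0
    for i, j in pairs:
        ipd = cmath.phase(bins[i]) - cmath.phase(bins[j])
        tpd = target_phase_difference(freq_hz, mics[i], mics[j], target)
        total += math.cos(ipd - tpd)
    return total / len(pairs)


if __name__ == "__main__":
    mics = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0)]  # hypothetical two-mic array, 5 cm spacing
    freq = 1000.0
    target = (1.0, 0.0, 0.0)
    tpd = target_phase_difference(freq, mics[1], mics[0], target)
    from_target = [1 + 0j, cmath.exp(1j * tpd)]  # simulated bin values, source at the target direction
    from_side = [1 + 0j, 1 + 0j]                 # simulated bin values, source broadside to the array
    print(angle_feature(from_target, mics, [(1, 0)], freq, target))  # ≈ 1.0
    print(angle_feature(from_side, mics, [(1, 0)], freq, target))    # ≈ 0.61
```

A bin dominated by a source at the estimated target position yields an angle feature near 1, while a bin dominated by a source from another direction yields a smaller value; this is the kind of cue that lets a mask estimator single out the target speaker.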

Description

Technical field

[0001] The embodiments of the present application relate to the field of speech separation, and in particular to a multi-channel dual-speaker separation method and system.

Background art

[0002] The goal of speech separation is to separate the individual speakers in mixed speech audio containing reverberation and noise, so as to obtain clean speech for each speaker. As the front end of speech recognition, speaker diarization and other technologies, speech separation is widely used in environments such as classrooms and meetings.

[0003] Deep clustering is a classic speech separation method. It obtains the separated speech of the target speaker by training against the ideal binary mask of the target speaker on the mixed speech audio. During training, each time-frequency unit is mapped to an embedding vector, and time-frequency units whose embeddings lie close together are clustered. However, for the influence of differe...
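Paragraph [0003] refers to the ideal binary mask (IBM) that deep clustering is trained against. The sketch below shows the standard IBM definition (1 where the target dominates a time-frequency unit, 0 elsewhere) and element-wise application of a mask to the mixture spectrum. It works on a flat list of time-frequency units; the function names and toy values are illustrative. Element-wise masking is also one common way to turn an estimated mask into a separated spectrum, although this excerpt does not confirm that the patent applies its first estimated speaker mask exactly this way.

```python
from typing import List, Sequence


def ideal_binary_mask(target: Sequence[complex], interference: Sequence[complex]) -> List[int]:
    """IBM: 1 where the target's magnitude exceeds the interference's in a TF unit, else 0."""
    return [1 if abs(s) > abs(n) else 0 for s, n in zip(target, interference)]


def apply_mask(mask: Sequence[float], mixture: Sequence[complex]) -> List[complex]:
    """Element-wise masking of the mixture spectrum; the mixture phase is kept unchanged."""
    return [m * y for m, y in zip(mask, mixture)]


if __name__ == "__main__":
    # Toy flattened TF units (hypothetical values).
    target = [3 + 0j, 0.1 + 0j, 2j]
    interference = [1 + 0j, 1 + 0j, 0.5 + 0j]
    mixture = [s + n for s, n in zip(target, interference)]
    mask = ideal_binary_mask(target, interference)
    print(mask)                       # [1, 0, 1]
    print(apply_mask(mask, mixture))  # [(4+0j), 0j, (0.5+2j)]
```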

Application Information

Patent Type & Authority: Applications (China)

IPC (8): G10L21/0272; G10L25/27

CPC: G10L21/0272; G10L25/27; Y02T10/40

Inventors: Zhang Pengyuan (张鹏远), Yang Yi (杨弋), Chen Hangting (陈航艇), Yan Yonghong (颜永红)

Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI
