
Voice separation method and system, mobile terminal and storage medium

A speech separation and audio technology, applied in speech analysis, neural learning methods, instruments, etc. It addresses the problem of poor speech separation quality, achieving improved accuracy, simplified processing steps, and a better user experience.

Active Publication Date: 2020-07-03
XIAMEN KUAISHANGTONG TECH CORP LTD

AI Technical Summary

Problems solved by technology

[0005] The purpose of the embodiments of the present invention is to provide a voice separation method, system, mobile terminal and storage medium, aiming to solve the problem of poor separation quality in existing voice separation processes.

Method used

figure 1 is a flowchart of the speech separation method provided by the first embodiment of the present invention; figure 2 is a flowchart of the speech separation method provided by the second embodiment; image 3 is a schematic structural diagram of the speech separation system provided by the third embodiment


Examples


Embodiment 1

[0044] Referring to figure 1, a flowchart of the speech separation method provided by the first embodiment of the present invention, the method includes the steps:

[0045] Step S10: obtain the left-channel sample audio and the right-channel sample audio, and combine them to obtain the combined sample audio;

[0046] The left-channel sample audio and right-channel sample audio are collected from telephone calls in real scenarios. In this step, the call audio is saved through the phone's recording function; the saved call audio must be two-channel, that is, the left channel carries one speaker's side of the call and the right channel carries the other speaker's side. 10,000 call recordings are collected, each about 2 minutes long;

[0047] Specifically, in this step, all two-channel call audio is merged into single-channel audio, that is, the respective calls o...
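Step S10 above can be sketched as follows. This is a minimal illustration under assumptions, not the patent's code: `merge_channels` and the two toy sine-tone "speakers" are hypothetical, and merging is done by sample-wise addition with a clipping guard.

```python
import numpy as np

def merge_channels(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Mix a left-channel and a right-channel recording into a single
    channel by sample-wise addition, renormalizing if the sum clips."""
    n = min(len(left), len(right))              # align lengths
    mixed = left[:n].astype(np.float64) + right[:n].astype(np.float64)
    peak = np.max(np.abs(mixed))
    if peak > 1.0:                              # keep samples in [-1, 1]
        mixed = mixed / peak
    return mixed

# Two toy "speakers": a 220 Hz and a 440 Hz tone, 1 second at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
left = 0.6 * np.sin(2 * np.pi * 220 * t)    # speaker on the left channel
right = 0.6 * np.sin(2 * np.pi * 440 * t)   # speaker on the right channel

mixture = merge_channels(left, right)
print(mixture.shape)
```

In practice the two channels would come from the recorded two-channel call files described in [0046], and the resulting single-channel mixture is what the separation model is trained on.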

Embodiment 2

[0066] Referring to figure 2, a flowchart of the speech separation method provided by the second embodiment of the present invention, the method includes the steps:

[0067] Step S11: obtain the left-channel sample audio and the right-channel sample audio, and combine them to obtain the combined sample audio;

[0068] Step S21: construct a prenet network, followed by a CBHG network;

[0069] The prenet network includes three fully connected layers. The CBHG network includes, in sequence, a first convolutional layer, a pooling layer, a second convolutional layer, and a third convolutional layer; the second and third convolutional layers are one-dimensional convolutional layers with a filter size of 3 and a stride of 1, and the activation function used by the second...
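The prenet and the convolutional part of the CBHG network described above can be sketched in NumPy. This is a shape-level sketch under assumptions: the feature sizes (80-dim frames, 64 hidden units) are illustrative, only the two 1-D conv layers with filter size 3 and stride 1 are shown, and the first convolutional layer, pooling layer, and any further CBHG components are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    """Fully connected layer with ReLU activation."""
    return np.maximum(x @ w + b, 0.0)

def conv1d_same(x, w):
    """1-D convolution over time with 'same' padding, stride 1.
    x: (T, C_in); w: (k, C_in, C_out) with k = 3 here."""
    T = x.shape[0]
    k, _, c_out = w.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((T, c_out))
    for t in range(T):
        out[t] = np.einsum('kc,kco->o', xp[t:t + k], w)
    return out

# Illustrative sizes: 100 frames of 80-dim spectral features, 64 hidden units.
T, d_in, d_hid = 100, 80, 64
x = rng.standard_normal((T, d_in))

# Prenet: three fully connected layers.
w1, b1 = 0.1 * rng.standard_normal((d_in, d_hid)), np.zeros(d_hid)
w2, b2 = 0.1 * rng.standard_normal((d_hid, d_hid)), np.zeros(d_hid)
w3, b3 = 0.1 * rng.standard_normal((d_hid, d_hid)), np.zeros(d_hid)
h = dense(dense(dense(x, w1, b1), w2, b2), w3, b3)

# Two 1-D conv layers, filter size 3, stride 1, ReLU (standing in for the
# second and third convolutional layers; first conv and pooling omitted).
wc1 = 0.1 * rng.standard_normal((3, d_hid, d_hid))
wc2 = 0.1 * rng.standard_normal((3, d_hid, d_hid))
y = np.maximum(conv1d_same(np.maximum(conv1d_same(h, wc1), 0.0), wc2), 0.0)
print(h.shape, y.shape)
```

With 'same' padding and stride 1, the time dimension is preserved through the stack, which is what lets the network map frame sequences to frame sequences of the same length.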

Embodiment 3

[0093] Referring to image 3, a schematic structural diagram of the speech separation system 100 provided by the third embodiment of the present invention, the system includes: a sample audio acquisition module 10, a feature dimensionality reduction module 11, a feature decoding module 12, an iterative training module 13 and a speech separation module 14, wherein:

[0094] The sample audio acquisition module 10 is configured to acquire the left channel sample audio and the right channel sample audio, and combine the left channel sample audio and the right channel sample audio to obtain combined sample audio.

[0095] The feature dimensionality reduction module 11 is configured to construct an encoding network and input the spectral features of the combined sample audio into the encoding network for dimensionality-reduction encoding to obtain dimensionality-reduced features.

[0096] Wherein, the feature dimensionality reduction module 11 is also used to: construct a prenet network, and co...



Abstract

The invention provides a voice separation method and system, a mobile terminal and a storage medium. The method comprises the steps of: obtaining a left-channel sample audio and a right-channel sample audio, and combining them to obtain a combined sample audio; constructing an encoding network, and inputting the spectral features of the combined sample audio into the encoding network for dimension-reduction encoding to obtain dimension-reduced features; performing attention calculation on the dimension-reduced features with an attention mechanism to obtain attention probability values, and inputting these into the decoding network for decoding to obtain a spectrum decoding result; calculating a loss value between the spectrum decoding result and the spectral features of the sample audio, and iteratively training the encoding and decoding networks according to the loss value to obtain a voice separation model; and inputting the voice to be recognized into the voice separation model for voice separation to obtain a left-channel audio file and a right-channel audio file. By adopting an end-to-end model, the invention achieves voice separation of voice data and improves separation accuracy.
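The attention-and-loss step of the abstract can be illustrated with a minimal sketch. This is an assumption, not the patent's implementation: dot-product attention turns dimension-reduced encoder features into a probability distribution over time steps (the "attention probability values"), and a mean-squared-error loss compares a stand-in decoded frame with a target spectral frame; all shapes and names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, keys, values):
    """Scaled dot-product attention: a probability distribution over
    encoder time steps, then a weighted sum of the values."""
    scores = query @ keys.T / np.sqrt(keys.shape[1])
    probs = softmax(scores, axis=-1)   # the "attention probability values"
    return probs @ values, probs

rng = np.random.default_rng(1)
enc = rng.standard_normal((50, 64))   # 50 dimension-reduced encoder frames
q = rng.standard_normal((1, 64))      # one decoder query step
ctx, probs = attention(q, enc, enc)

# Stand-ins for the decoder output and the target spectral frame.
decoded = ctx
target = rng.standard_normal((1, 64))
loss = float(np.mean((decoded - target) ** 2))  # MSE loss driving training
print(probs.shape, loss >= 0.0)
```

In the patent's pipeline the loss computed this way would be backpropagated through both the encoding and decoding networks during the iterative training step.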

Description

technical field

[0001] The invention belongs to the technical field of voice separation, and in particular relates to a voice separation method, system, mobile terminal and storage medium.

Background technique

[0002] More and more people now communicate by telephone, but the voices of the two parties on a call are usually combined in the same audio channel, so it is necessary to extract each speaker's audio separately from the single channel for convenient subsequent speech recognition and voiceprint recognition.

[0003] The existing voice separation method divides the whole audio into multiple independent segments by splitting on the silent portions of the speech, and then clusters all the segments into two categories. After clustering, the audio clips of each category are spliced into a complete audio, so as to perform speech...
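The segment-and-cluster baseline described in [0003] can be sketched as follows. This is a crude illustration under stated assumptions: an energy threshold stands in for silence detection, zero-crossing rate stands in for a real speaker feature, and a tiny two-class k-means does the clustering; all thresholds, frame sizes, and the toy audio are hypothetical.

```python
import numpy as np

def split_on_silence(signal, frame=160, thresh=0.01):
    """Split a mono signal into voiced segments wherever per-frame
    energy drops below a threshold (a crude silence detector)."""
    n_frames = len(signal) // frame
    energy = np.array([np.mean(signal[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n_frames)])
    voiced = energy > thresh
    segments, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i
        elif not v and start is not None:
            segments.append(signal[start * frame:i * frame])
            start = None
    if start is not None:
        segments.append(signal[start * frame:])
    return segments

def two_means(features, iters=20):
    """Tiny 2-class k-means on scalar features, one per segment."""
    centers = np.array([features.min(), features.max()], dtype=float)
    for _ in range(iters):
        labels = np.argmin(np.abs(features[:, None] - centers[None, :]),
                           axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = features[labels == k].mean()
    return labels

# Toy call: alternating low- and high-pitch bursts separated by silence.
sr = 8000
t = np.arange(sr // 4) / sr
silence = np.zeros(sr // 8)
a = 0.5 * np.sin(2 * np.pi * 200 * t)   # "speaker A" bursts
b = 0.5 * np.sin(2 * np.pi * 800 * t)   # "speaker B" bursts
audio = np.concatenate([a, silence, b, silence, a, silence, b])

segs = split_on_silence(audio)
# One feature per segment: zero-crossing rate, a crude pitch proxy.
zcr = np.array([np.mean(np.abs(np.diff(np.sign(s))) > 0) for s in segs])
labels = two_means(zcr)
print(len(segs), labels.tolist())   # 4 segments with alternating labels
```

The weakness the patent targets is visible here: the baseline only works when speakers rarely overlap and the silence splits are clean, whereas the end-to-end model of the invention learns the separation directly from the mixed signal.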

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G10L25/78, G10L25/18, G10L21/0272, G10L19/008, G06N3/08, G06N3/04
CPC: G10L25/78, G10L19/008, G10L25/18, G10L21/0272, G06N3/084, G06N3/044, G06N3/045
Inventor: 曾志先, 肖龙源, 李稀敏, 蔡振华, 刘晓葳
Owner: XIAMEN KUAISHANGTONG TECH CORP LTD