Voice separation method and system, mobile terminal and storage medium
The invention relates to speech separation in audio technology, applied in speech analysis, neural learning methods, instruments, etc. It addresses the problem of poor speech separation and achieves improved separation accuracy, simplified steps, and a better user experience.
Examples
Embodiment 1
[0044] Referring to Figure 1, a flowchart of the speech separation method provided by the first embodiment of the present invention, the method includes the following steps:
[0045] Step S10, obtaining the sample audio of the left channel and the sample audio of the right channel, and combining the sample audio of the left channel and the sample audio of the right channel to obtain the combined sample audio;
[0046] The left channel sample audio and the right channel sample audio are obtained by collecting telephone audio in real scenarios. In this step, call audio is saved using the phone's recording function. The saved call audio must be two-channel, that is, the left channel carries one party's speech and the right channel carries the other party's speech. A total of 10,000 call recordings are collected, each about 2 minutes long;
[0047] Specifically, in this step, all the two-channel call audios are merged into single-channel audio, that is, the respective calls o...
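The merge described in step S10 can be sketched as follows. This is a minimal illustration and not the patent's implementation: it assumes the two channels are equal-length sample arrays and that merging means equal-weight averaging, a mixing rule the text does not state. The function name `merge_channels` is hypothetical.

```python
import numpy as np


def merge_channels(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Mix two mono channels into one single-channel signal.

    Assumption: equal-weight averaging; the source does not specify the rule.
    """
    if left.shape != right.shape:
        raise ValueError("left and right channels must have the same shape")
    # Convert to float first so int16 PCM input cannot overflow when summed.
    return (left.astype(np.float64) + right.astype(np.float64)) / 2.0


# Toy two-channel "call": each speaker occupies one channel.
left = np.array([0.2, 0.4, 0.0, 0.0])
right = np.array([0.0, 0.0, -0.6, 0.8])
merged = merge_channels(left, right)
```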
Embodiment 2
[0066] Referring to Figure 2, a flowchart of the speech separation method provided by the second embodiment of the present invention, the method includes the following steps:
[0067] Step S11, obtaining the sample audio of the left channel and the sample audio of the right channel, and combining the sample audio of the left channel and the sample audio of the right channel to obtain the combined sample audio;
[0068] Step S21, constructing a prenet network, and constructing a CBHG network after the prenet network;
[0069] The prenet network includes three fully connected layers. The CBHG network includes, in sequence, a first convolutional layer, a pooling layer, a second convolutional layer, and a third convolutional layer. The second and third convolutional layers are one-dimensional convolutional layers with a filter size of 3 and a stride of 1, and the activation function used by the second...
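To make the layer description concrete, the sketch below shows a three-layer fully connected prenet followed by one one-dimensional convolution with filter size 3 and stride 1, written in plain NumPy. It is illustrative only: the layer widths, the ReLU activation, the zero "same" padding, and the random weights are all assumptions, and the first convolutional layer, the pooling layer, and the rest of the CBHG network are omitted. The names `prenet` and `conv1d_same` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def prenet(x: np.ndarray, params) -> np.ndarray:
    """Apply the fully connected layers frame by frame.

    x: (time, features). ReLU after each layer is an assumption.
    """
    for w, b in params:
        x = relu(x @ w + b)
    return x


def conv1d_same(x: np.ndarray, kernel: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1-D convolution over time with stride 1 (cross-correlation, as in
    deep learning frameworks). Zero 'same' padding is an assumption.

    x: (time, in_ch); kernel: (width, in_ch, out_ch); bias: (out_ch,)
    """
    width = kernel.shape[0]
    pad = width // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.stack(
        [np.einsum("kc,kco->o", xp[t:t + width], kernel) for t in range(x.shape[0])]
    )
    return out + bias


# Hypothetical sizes: 257 spectral bins in, three layers of width 256, 128, 128.
dims = [257, 256, 128, 128]
params = [
    (rng.standard_normal((i, o)) * 0.01, np.zeros(o))
    for i, o in zip(dims[:-1], dims[1:])
]

spec = np.abs(rng.standard_normal((50, 257)))  # stand-in for 50 spectral frames
h = prenet(spec, params)                        # (50, 128)
kernel = rng.standard_normal((3, 128, 128)) * 0.01  # filter size 3
y = conv1d_same(h, kernel, np.zeros(128))       # (50, 128): stride 1 keeps length
```

With stride 1 and same padding, the number of frames is unchanged by the convolution, so the output can be aligned frame by frame with the input spectrum.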
Embodiment 3
[0093] Referring to Figure 3, a schematic structural diagram of the speech separation system 100 provided by the third embodiment of the present invention, the system includes: a sample audio acquisition module 10, a feature dimensionality reduction module 11, a feature decoding module 12, an iterative training module 13 and a speech separation module 14, wherein:
[0094] The sample audio acquisition module 10 is configured to acquire the left channel sample audio and the right channel sample audio, and combine the left channel sample audio and the right channel sample audio to obtain combined sample audio.
[0095] The feature dimensionality reduction module 11 is configured to construct an encoding network, and input the spectral features of the combined sample audio into the encoding network for dimensionality reduction encoding to obtain dimensionality reduction features.
[0096] Wherein, the feature dimensionality reduction module 11 is also used to: construct a prenet network, and co...


