Voice conversion method fusing Bi-LSTM (bidirectional long short-term memory recurrent neural network) and WaveNet
A voice conversion technology, applied in speech analysis, instruments, and related fields, that addresses the lack of speech detail information and the low conversion quality of existing methods, achieving good naturalness, high voice similarity, and improved stability.
Examples
Embodiment
[0044] As shown in Figure 1, two trained Bi-LSTM network models and a waveform generation neural network model (WaveNet) are first obtained. One Bi-LSTM network model is used for feature conversion (Bi-LSTM1), and the other is used for post-processing (Bi-LSTM2). The features of the speech to be converted are then extracted, converted by the trained feature-conversion model (Bi-LSTM1), and sent to the waveform generation neural network model (WaveNet) to generate the pre-converted speech. The pre-converted speech is then post-processed by the post-processing model (Bi-LSTM2), and finally the waveform generation neural network model (WaveNet) generates the final converted speech.
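The order of operations in [0044] can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the callables `bilstm1`, `bilstm2`, `wavenet`, and `extract_features` are hypothetical stand-ins for the trained models and the feature analysis step, and re-extracting features from the pre-converted speech before Bi-LSTM2 is an assumption about how the post-processing input is obtained. The toy functions at the bottom only make the data flow checkable; they do not model any real network.

```python
from typing import Callable, List

Features = List[List[float]]  # one feature vector per frame
Waveform = List[float]        # one sample per entry


def convert_voice(
    source_features: Features,
    bilstm1: Callable[[Features], Features],
    bilstm2: Callable[[Features], Features],
    wavenet: Callable[[Features], Waveform],
    extract_features: Callable[[Waveform], Features],
) -> Waveform:
    """Two-pass conversion in the order described in [0044]."""
    # Pass 1: feature conversion, then waveform generation.
    converted = bilstm1(source_features)
    pre_converted_speech = wavenet(converted)
    # Pass 2: post-processing, then final waveform generation.
    # Assumption: Bi-LSTM2 works on features of the pre-converted speech.
    refined = bilstm2(extract_features(pre_converted_speech))
    return wavenet(refined)


# Toy stand-ins (hypothetical) so the sketch runs end to end.
def toy_bilstm1(frames: Features) -> Features:
    return [[2.0 * x for x in frame] for frame in frames]


def toy_bilstm2(frames: Features) -> Features:
    return [[x + 1.0 for x in frame] for frame in frames]


def toy_wavenet(frames: Features) -> Waveform:
    return [sum(frame) for frame in frames]  # one sample per frame


def toy_extract(samples: Waveform) -> Features:
    return [[s] for s in samples]  # one single-value frame per sample


result = convert_voice(
    [[1.0, 2.0], [3.0, 4.0]],
    toy_bilstm1, toy_bilstm2, toy_wavenet, toy_extract,
)
print(result)  # [7.0, 15.0]
```

Passing the models in as callables keeps the sketch independent of any particular deep learning framework; the same WaveNet stand-in is called twice, matching the description's use of one waveform generation model for both the pre-converted and the final speech.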
[0045] As shown in Figure 2, the specific process includes the following steps.
[0046] Step 1. Preprocess the training speech
[0047] From the parallel corpus, the source speech and the target speech are analyzed by ST...


