
Bi-LSTM (Bidirectional-Long Short-Term Memory Recurrent Neural Networks) and WaveNet fused voice conversion method

A voice conversion technology, applied in voice analysis, instruments, etc. It addresses the lack of voice detail information and the low conversion quality of existing methods, and achieves good naturalness, high voice similarity, and improved stability.

Active Publication Date: 2019-05-17
ARMY ENG UNIV OF PLA
Cites: 5; Cited by: 11

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to propose a voice conversion method that combines Bi-LSTM and WaveNet, in order to solve the problem that existing voice conversion methods lack voice detail information and have low conversion quality.



Examples


Embodiment

[0044] As shown in Figure 1, two trained Bi-LSTM network models and one trained waveform generation neural network model (WaveNet) are first obtained. One Bi-LSTM model is used for feature conversion (Bi-LSTM1) and the other for post-processing (Bi-LSTM2). The features of the speech to be converted are then extracted, converted by Bi-LSTM1, and sent to WaveNet, which generates the pre-converted speech. The pre-converted speech is then post-processed by Bi-LSTM2, and WaveNet finally generates the final converted speech.
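The two-pass flow in [0044] can be sketched as plain function composition. This is a minimal illustration under stated assumptions, not the patent's implementation: `bilstm1`, `wavenet`, `extract_feats`, and `bilstm2` are hypothetical stand-ins for the trained models and the feature-analysis front end, and, for brevity, the F0 and aperiodicity inputs that the abstract says the generation network also receives are omitted.

```python
from typing import Callable, List

Frames = List[List[float]]  # one feature vector (e.g. MFCCs) per frame
Wave = List[float]          # waveform samples


def convert_two_pass(
    source_feats: Frames,
    bilstm1: Callable[[Frames], Frames],       # hypothetical feature-conversion model (Bi-LSTM1)
    wavenet: Callable[[Frames], Wave],         # hypothetical vocoder (WaveNet)
    extract_feats: Callable[[Wave], Frames],   # hypothetical analysis front end
    bilstm2: Callable[[Frames], Frames],       # hypothetical post-processing model (Bi-LSTM2)
) -> Wave:
    converted = bilstm1(source_feats)          # pass 1: convert source features
    pre_speech = wavenet(converted)            # generate pre-converted speech
    pre_feats = extract_feats(pre_speech)      # re-analyze the pre-converted speech
    refined = bilstm2(pre_feats)               # pass 2: post-process the features
    return wavenet(refined)                    # generate the final converted speech


# Toy stand-ins so the sketch runs; real trained models would replace these.
def toy_bilstm1(frames: Frames) -> Frames:
    return [[2.0 * v for v in f] for f in frames]


def toy_wavenet(frames: Frames) -> Wave:
    return [v for f in frames for v in f]  # simply flattens frames into "samples"


def toy_extract(wave: Wave, dim: int = 2) -> Frames:
    return [wave[i:i + dim] for i in range(0, len(wave), dim)]


def toy_bilstm2(frames: Frames) -> Frames:
    return [[v + 1.0 for v in f] for f in frames]


result = convert_two_pass([[1.0, 2.0], [3.0, 4.0]],
                          toy_bilstm1, toy_wavenet, toy_extract, toy_bilstm2)
print(result)  # [3.0, 5.0, 7.0, 9.0]
```

The same `wavenet` callable is used in both passes, mirroring the description of a single waveform generation model that is invoked twice.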

[0045] As shown in Figure 2, the specific process comprises the following steps.

[0046] Step 1: "Preprocess" the training speech.

[0047] From the parallel corpus, the source speech and the target speech are analyzed by ST...
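The Step 1 excerpt breaks off before describing how source and target frames are paired. Training a frame-wise conversion network on a parallel corpus usually requires time-aligning each source utterance with its target counterpart, and dynamic time warping (DTW) is a common choice. The sketch below assumes DTW over per-frame feature vectors with Euclidean distance; `dtw_align` is a hypothetical name, and the patent's actual alignment procedure is not shown in this excerpt.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(src: Sequence[Sequence[float]],
              tgt: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return the minimum-cost monotonic path of (src_index, tgt_index) frame pairs."""
    n, m = len(src), len(tgt)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor
    # (ties go to the diagonal move because it is listed first).
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda ij: cost[ij[0]][ij[1]])
    path.reverse()
    return path


src = [[0.0], [1.0], [2.0]]
tgt = [[0.0], [0.0], [1.0], [2.0]]
path = dtw_align(src, tgt)
print(path)  # [(0, 0), (0, 1), (1, 2), (2, 3)]
pairs = [(src[i], tgt[j]) for i, j in path]  # aligned (source, target) training pairs
```

Each aligned pair supplies one source frame as input and one target frame as the training label; a source frame may repeat when the target speaker lingers on a sound, as frame 0 does here.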



Abstract

The invention provides a voice conversion method fusing Bi-LSTM (Bidirectional Long Short-Term Memory Recurrent Neural Networks) and WaveNet, comprising the following steps. First, features of the voice to be converted are extracted, and its Mel-frequency cepstral coefficients (MFCCs) are sent into a feature conversion network to obtain converted MFCCs. Next, the aperiodicity of the voice to be converted, the linearly converted fundamental frequency (F0), and the converted MFCCs are up-sampled and sent into a voice generation network to obtain a pre-generated voice, and the MFCCs of the pre-generated voice are sent into a post-processing network for post-processing. Finally, the post-processed MFCCs, the aperiodicity of the voice to be converted, and the linearly converted fundamental frequency are up-sampled and sent into the voice generation network to generate the final converted voice. The converted voice generated by this method has higher similarity and higher naturalness.
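The abstract mentions a "linearly converted" fundamental frequency and up-sampled conditioning features without giving formulas. The sketch below assumes the widely used log-F0 mean/variance transform (log F0 of voiced frames is shifted and scaled from source-speaker statistics to target-speaker statistics, while unvoiced frames stay at 0) and up-sampling by repeating each frame's conditioning vector once per waveform sample in a frame hop. The names `log_f0_stats`, `convert_f0`, `build_conditioning`, and the `hop` parameter are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def log_f0_stats(f0: Sequence[float]) -> Tuple[float, float]:
    """Mean and std of log F0 over voiced frames (F0 > 0); needs >= 2 distinct voiced values."""
    logs = [math.log(v) for v in f0 if v > 0]
    return mean(logs), pstdev(logs)


def convert_f0(f0_src: Sequence[float],
               src_stats: Tuple[float, float],
               tgt_stats: Tuple[float, float]) -> List[float]:
    """Linear transform in the log-F0 domain; unvoiced frames (0) are left at 0."""
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    return [math.exp(mu_t + (sd_t / sd_s) * (math.log(v) - mu_s)) if v > 0 else 0.0
            for v in f0_src]


def build_conditioning(mfcc: Sequence[Sequence[float]],
                       f0: Sequence[float],
                       ap: Sequence[Sequence[float]],
                       hop: int) -> List[List[float]]:
    """Concatenate per-frame features, then repeat each frame `hop` times (one per sample)."""
    frames = [list(m) + [f] + list(a) for m, f, a in zip(mfcc, f0, ap)]
    return [frame for frame in frames for _ in range(hop)]


src_stats = log_f0_stats([100.0, 200.0])  # toy source-speaker training F0 (Hz)
tgt_stats = log_f0_stats([200.0, 400.0])  # toy target-speaker training F0 (Hz)
f0_conv = convert_f0([100.0, 0.0, 200.0], src_stats, tgt_stats)
print([round(v, 6) for v in f0_conv])  # [200.0, 0.0, 400.0]

cond = build_conditioning([[0.1, 0.2], [0.3, 0.4]], [200.0, 0.0], [[0.5], [0.6]], hop=3)
print(len(cond), cond[0])  # 6 [0.1, 0.2, 200.0, 0.5]
```

Repetition is the simplest up-sampling scheme; a learned up-sampling layer (e.g. transposed convolution) is another common option, and the excerpt does not say which the patent uses.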

Description

Technical Field

[0001] The invention belongs to the field of voice signal processing, and in particular relates to a voice conversion method integrating Bi-LSTM and WaveNet.

Background

[0002] With the rapid development of artificial intelligence technology, its fields of application are becoming increasingly broad. Technologies such as voice interaction, intelligent voice imitation, and personalized voice generation have gradually attracted attention. Voice conversion (VC), an important technique for personalized voice generation, draws on voice signal processing, phonetics, pattern recognition, artificial intelligence, and other disciplines, and is currently one of the most difficult and active research topics in voice processing. Broadly speaking, speech processing technologies that change the speaker characteristics of speech are collectively referred to as voice conversion or voice transformation. In ...


Application Information

IPC (8): G10L21/007; G10L25/24; G10L25/30
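Inventors: 张雄伟 (Zhang Xiongwei), 苗晓孔 (Miao Xiaokong), 孙蒙 (Sun Meng), 曹铁勇 (Cao Tieyong), 郑昌艳 (Zheng Changyan), 李莉 (Li Li), 曾歆 (Zeng Xin)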
Owner ARMY ENG UNIV OF PLA