Chinese and English cross-language speech synthesis method and device, electronic equipment and storage medium
A speech synthesis and cross-language technology, applied in the field of Chinese-English cross-language speech synthesis. It addresses problems such as the high cost of labeling mixed-reading audio, which increases the difficulty of cross-language synthesis tasks.
Examples
Embodiment 1
[0040] This embodiment illustrates the principles and steps by which the present invention solves the technical problems. Figure 1 is a flowchart of the Chinese-English cross-language speech synthesis method according to Embodiment 1 of the present invention. The specific steps are:
[0041] S1. Build the first cross-language acoustic model using sequence-to-sequence modeling in deep learning;
[0042] Further, the first cross-language acoustic model is based on the Tacotron model, including: a CBHG-based encoder, a Gaussian mixture distribution-based GMMv2b attention mechanism module, and a decoder.
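As a rough illustration of the attention module named above, the following is a minimal sketch of one decoder step of a Gaussian-mixture attention in the commonly described "v2b" form: mixture weights from a softmax, step sizes and widths from a softplus, and means that only move forward. The function name `gmm_v2b_step`, its arguments, and the initial bias targets (a step of about 1 and a width of about 10) are assumptions for illustration and are not taken from the patent text.

```python
import math

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed initial biases: raw outputs of zero give delta = 1 and sigma = 10.
DELTA_BIAS = math.log(math.expm1(1.0))   # inverse softplus of 1
SIGMA_BIAS = math.log(math.expm1(10.0))  # inverse softplus of 10

def gmm_v2b_step(prev_mu, w_hat, delta_hat, sigma_hat, num_positions):
    """One attention step; returns (alignment over encoder positions, new means)."""
    w = softmax(w_hat)                                      # mixture weights
    delta = [softplus(d + DELTA_BIAS) for d in delta_hat]   # positive step sizes
    sigma = [softplus(s + SIGMA_BIAS) for s in sigma_hat]   # positive widths
    mu = [m + d for m, d in zip(prev_mu, delta)]            # means only move forward
    alpha = []
    for j in range(num_positions):
        a = 0.0
        for wk, mk, sk in zip(w, mu, sigma):
            z = math.sqrt(2.0 * math.pi * sk * sk)          # Gaussian normaliser
            a += wk / z * math.exp(-((j - mk) ** 2) / (2.0 * sk * sk))
        alpha.append(a)
    return alpha, mu
```

Because every step size is strictly positive, the alignment can only advance along the input sequence, which is the property that makes this family of attention attractive for speech synthesis.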
[0043] In a specific implementation, as shown in Figure 2, the first cross-language acoustic model, CS-Tacotron, includes: an encoder based on CBHG, in which the language embedding is added to the convolutional network after activation by different linear and nonlinear layers, serving as the gate of the highway net...
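One possible reading of the encoder description above is that the language embedding drives the gate of a highway layer inside CBHG. The sketch below shows a standard highway layer, y = T * H(x) + (1 - T) * x, with the gate T computed from the language embedding alone. That wiring, the function name `language_gated_highway`, and the weight matrices are all assumptions for illustration; the source text is truncated and does not confirm the exact layout.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vec):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def language_gated_highway(x, lang_emb, w_h, w_gate):
    """Highway layer whose gate depends on a language embedding (assumed wiring)."""
    # Candidate transform H(x): linear layer followed by ReLU.
    h = [max(0.0, v) for v in matvec(w_h, x)]
    # Gate T in (0, 1): linear projection of the language embedding, then sigmoid.
    t = [sigmoid(v) for v in matvec(w_gate, lang_emb)]
    # Blend the transformed features with the untouched input.
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

With a one-hot language embedding, each language effectively selects its own mix of transformed and pass-through features, which is one way a shared encoder could treat Chinese and English input differently.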
Embodiment 2
[0055] This embodiment fine-tunes the constructed CS-Tacotron model, the first cross-language acoustic model. When the cross-language acoustic model is fine-tuned on Chinese monolingual data, the quality of the synthesized Chinese-English cross-language speech deteriorates. To address this catastrophic forgetting problem in Chinese-English cross-language speech synthesis, the present invention introduces a continual learning method that improves the synthesis quality when Chinese monolingual recording data is used to fine-tune the cross-language CS-Tacotron model.
[0056] Further, the first cross-language acoustic model is fine-tuned with a continual learning method based on experience replay. During fine-tuning, a regularization-based elastic weight consolidation method is used to hold the parameters of the first cross-language acoustic model close to their values before fine-tuning. Within a very small margin of error for th...
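The two ingredients described above can be sketched as follows, assuming the regularization method corresponds to the usual elastic weight consolidation (EWC) penalty, a quadratic term that pulls each parameter toward its value before fine-tuning, and assuming experience replay means mixing stored cross-language samples into each monolingual batch. The names `ewc_penalty` and `replay_batch`, the importance weights `fisher`, and the `replay_ratio` parameter are hypothetical and do not come from the patent.

```python
import random

def ewc_penalty(params, anchor_params, fisher, strength):
    """Quadratic penalty keeping params near their pre-fine-tuning values.

    fisher holds per-parameter importance weights (assumed to be precomputed).
    """
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )

def replay_batch(new_data, replay_buffer, batch_size, replay_ratio, rng):
    """Build a batch mixing new monolingual samples with replayed old samples."""
    n_replay = min(len(replay_buffer), int(round(batch_size * replay_ratio)))
    n_new = batch_size - n_replay
    batch = rng.sample(replay_buffer, n_replay)              # old cross-language data
    batch += [rng.choice(new_data) for _ in range(n_new)]    # new Chinese-only data
    rng.shuffle(batch)
    return batch
```

In a training loop, the total loss would then be the synthesis loss on the mixed batch plus `ewc_penalty(...)`, so that the model adapts to the new recordings without drifting far from the parameters that produced good cross-language speech.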
Embodiment 3
[0058] In this embodiment, to improve the expressiveness of speech synthesized by the CS-Tacotron model, the modeling of prosodic pauses in speech synthesis is studied. Approaches that model prosody hierarchically usually mix prosodic boundaries of different levels into the input sequence as phonemes, and the model learns the corresponding pause durations on its own from the training data. Here, a hierarchical prosodic graph is instead constructed from the hierarchical prosody of the input cross-language text and modeled with a graph neural network, which provides an alternative way of modeling prosodic information. The result is the second, graph-based cross-language acoustic model, the GCS-Tacotron model.
[0059] Further, the Chinese prosodic structure is extended to Chinese-English cross-language texts, and the specific method includes: using English words or single letters as prosodic words in the Chinese four-level prosodic struct...
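To make the idea of a hierarchical prosodic graph concrete, the sketch below builds a small tree from tokens annotated with prosodic boundaries, treating each English word as its own prosodic word as described above. The level names (UTT, IPH, PPH, PW), the boundary encoding, and the function `build_prosody_graph` are assumptions for illustration; the patent's actual graph construction is truncated in the source.

```python
def is_english(token):
    # ASCII letters only, e.g. "Python" or a single letter such as "A".
    return token.isascii() and token.isalpha()

def build_prosody_graph(tokens):
    """tokens: list of (text, boundary) pairs, where boundary is the highest
    level closed after the token: 0 none, 1 prosodic word (PW),
    2 prosodic phrase (PPH), 3 intonational phrase (IPH).
    Returns (nodes, edges): nodes are (kind, text), edges are (parent, child) ids.
    """
    nodes = [("UTT", "")]   # node 0 is the utterance root
    edges = []
    current = {"IPH": None, "PPH": None, "PW": None}

    def add_node(kind, text, parent):
        nodes.append((kind, text))
        edges.append((parent, len(nodes) - 1))
        return len(nodes) - 1

    for text, boundary in tokens:
        if is_english(text):
            current["PW"] = None        # an English word starts its own PW
            boundary = max(boundary, 1)  # and also ends it
        parent = 0
        for level in ("IPH", "PPH", "PW"):
            if current[level] is None:
                current[level] = add_node(level, "", parent)
            parent = current[level]
        add_node("TOKEN", text, parent)
        # Close every level at or below the annotated boundary.
        if boundary >= 1:
            current["PW"] = None
        if boundary >= 2:
            current["PPH"] = None
        if boundary >= 3:
            current["IPH"] = None
    return nodes, edges
```

The resulting edge list is the kind of structure a graph neural network could consume, with messages passed between tokens, prosodic words, and phrases instead of inserting boundary symbols into the phoneme sequence.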



