
Voice synthesis method and system

A speech synthesis technology, applied in the field of speech synthesis methods and systems, that addresses the problems of unnatural speech, error accumulation, and excess consumption of computing resources, thereby saving resources and reducing errors.

Inactive; Publication Date: 2019-05-17
GUANGZHOU DUOYI NETWORK TECH +2
8 Cites, 39 Cited by

AI Technical Summary

Problems solved by technology

The traditional speech synthesis process generally requires combining multiple components, such as a text-processing front end, a speech duration model, an acoustic feature prediction model, and a vocoder synthesis model. Designing these components requires extensive domain expertise, and each component is usually trained individually, so errors accumulate when the components are finally assembled to synthesize speech. This makes design and debugging very difficult for engineering practitioners.
[0004] In addition, the speech synthesis methods proposed so far each target only one language. To synthesize speech in multiple languages across different scenarios, multiple models must be switched to synthesize the speech for each language, and the multilingual speech is finally produced by splicing. This often consumes additional computing resources, and the spliced speech does not sound natural enough.



Examples


Embodiment 1

[0052] See Figure 1, which is a flowchart of a speech synthesis method provided by an embodiment of the present invention. The method includes:

[0053] S1. Convert the multilingual text to be processed into a corresponding mixed phoneme set, and use one-hot encoding mapping to obtain a phoneme mixed sequence;

[0054] S2. Use an encoder to generate a text feature sequence from the phoneme mixed sequence;

[0055] S3. Use a decoder to generate predicted acoustic spectrum features from the text feature sequence;

[0056] S4. Synthesize the predicted acoustic spectrum features into a speech waveform.
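
Step S1 can be illustrated with a minimal sketch. The phoneme inventory, the toy lexicon, and the function names below are hypothetical and are not taken from the patent; the sketch only shows how phonemes from two languages can share one index space and then be one-hot encoded into a phoneme mixed sequence. It assumes the text has already been segmented into words.

```python
from typing import Dict, List

# Hypothetical mixed phoneme set: pinyin-style and ARPAbet-style units
# placed in one shared inventory (illustrative, not from the patent).
MIXED_PHONEMES: List[str] = ["n", "i3", "h", "ao3", "HH", "AH", "L", "OW"]
PHONEME_TO_INDEX: Dict[str, int] = {p: i for i, p in enumerate(MIXED_PHONEMES)}

# Hypothetical toy lexicon mapping already-segmented words to phonemes.
TOY_LEXICON: Dict[str, List[str]] = {
    "你": ["n", "i3"],
    "好": ["h", "ao3"],
    "hello": ["HH", "AH", "L", "OW"],
}


def text_to_phonemes(words: List[str]) -> List[str]:
    """Look up each segmented word and concatenate its phonemes."""
    phonemes: List[str] = []
    for word in words:
        phonemes.extend(TOY_LEXICON[word.lower()])
    return phonemes


def one_hot_encode(phonemes: List[str]) -> List[List[int]]:
    """Map each phoneme to a one-hot vector over the mixed phoneme set."""
    size = len(MIXED_PHONEMES)
    sequence: List[List[int]] = []
    for phoneme in phonemes:
        vector = [0] * size
        vector[PHONEME_TO_INDEX[phoneme]] = 1
        sequence.append(vector)
    return sequence


# Mixed Chinese/English input, already split into words.
phoneme_mixed_sequence = one_hot_encode(text_to_phonemes(["你", "好", "hello"]))
```

Because both languages index into the same inventory, a single downstream model can consume the sequence without switching between per-language models.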

[0057] Specifically, in step S1, before the multilingual text to be processed is converted into a corresponding mixed phoneme set, word segmentation, polyphonic character processing, and punctuation processing must be performed on the multilingual text. In the embodiment of the present invention, the mixed phoneme set is a set of mixed phonemes corresponding to the multilingu...

Embodiment 2

[0092] See Figure 5, which is a structural block diagram of a speech synthesis system provided by an embodiment of the present invention. The system comprises:

[0093] A preprocessing unit 1, configured to convert the multilingual text to be processed into a corresponding mixed phoneme set, and to use one-hot encoding mapping to obtain a phoneme mixed sequence;

[0094] An encoder unit 2, configured to generate a text feature sequence from the phoneme mixed sequence through an encoder;

[0095] A decoder unit 3, configured to generate predicted acoustic spectrum features from the text feature sequence through a decoder;

[0096] A speech waveform synthesizing unit 4, configured to synthesize the predicted acoustic spectrum features into a speech waveform.
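
The way the four units hand data to one another can be sketched as a simple composition. The class name, type aliases, and stub functions below are hypothetical placeholders, not the patent's implementation; in particular, the stubs stand in for the trained encoder, decoder, and waveform synthesizer and do no real signal processing.

```python
from dataclasses import dataclass
from typing import Callable, List

Matrix = List[List[float]]  # a sequence of feature vectors (hypothetical alias)


@dataclass
class SpeechSynthesisSystem:
    preprocess: Callable[[str], Matrix]          # unit 1: text -> phoneme mixed sequence
    encode: Callable[[Matrix], Matrix]           # unit 2: -> text feature sequence
    decode: Callable[[Matrix], Matrix]           # unit 3: -> predicted acoustic spectrum features
    synthesize: Callable[[Matrix], List[float]]  # unit 4: -> speech waveform samples

    def run(self, text: str) -> List[float]:
        phoneme_sequence = self.preprocess(text)
        text_features = self.encode(phoneme_sequence)
        spectrum = self.decode(text_features)
        return self.synthesize(spectrum)


# Placeholder stubs (assumptions): each one only preserves the data shape.
def stub_preprocess(text: str) -> Matrix:
    symbols = sorted(set(text))  # treat each character as a "phoneme"
    return [[1.0 if s == ch else 0.0 for s in symbols] for ch in text]


def stub_encode(sequence: Matrix) -> Matrix:
    return [list(row) for row in sequence]


def stub_decode(features: Matrix) -> Matrix:
    return [list(row) for row in features]


def stub_synthesize(spectrum: Matrix) -> List[float]:
    return [sum(frame) for frame in spectrum]  # one sample per frame


system = SpeechSynthesisSystem(stub_preprocess, stub_encode, stub_decode, stub_synthesize)
waveform = system.run("ni hao")
```

Keeping each unit behind a plain callable means a trained model can replace any stub without changing the surrounding data flow.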

[0097] Preferably, the encoder is obtained by training a neural network, where the neural network includes at least one of a convolutional neural network and a recurrent neural network; the encoder unit 2 is then specifically configured to:

[00...


Abstract

The invention discloses a speech synthesis method comprising the following steps: converting the multilingual text to be processed into a corresponding mixed phoneme set, and using one-hot encoding mapping to obtain a phoneme mixed sequence; generating a text feature sequence from the phoneme mixed sequence by means of an encoder; generating predicted acoustic spectrum features from the text feature sequence by means of a decoder; and synthesizing the predicted acoustic spectrum features into a speech waveform. An embodiment of the invention further discloses a speech synthesis system. With the embodiments of the invention, speech in multiple languages can be synthesized, errors in the synthesized speech are reduced, and resources are saved.

Description

Technical Field

[0001] The invention relates to the technical field of speech, and in particular to a speech synthesis method and system.

Background

[0002] Speech synthesis technology converts input text into natural, fluent speech. It allows machines to speak, expands the modes of human-computer interaction, and makes human-machine communication more convenient. Speech synthesis is an interdisciplinary technology that mainly draws on linguistics, digital signal processing, acoustics, statistics, and computer science. It has been widely used in voice customer service networks, mobile communications, and smart homes.

[0003] Traditional speech synthesis technology generally adopts unit selection and concatenation, stitching together small fragments of pre-recorded speech waveforms to output the speech corresponding to the text. Another method is a statistical parametric speech synth...


Application Information

IPC(8): G10L13/08, G10L13/10
Inventor: 徐波
Owner: GUANGZHOU DUOYI NETWORK TECH