Speech synthesis method and system based on neural network, and storage medium

Neural network and speech synthesis technology, applied to the field of neural network-based speech synthesis, can address the problems of difficult speech synthesis, difficult sample acquisition, and high acquisition cost.

Inactive Publication Date: 2020-02-28
武汉水象电子科技有限公司

AI Technical Summary

Problems solved by technology

[0004] Traditional speech synthesis systems rely on parametric modeling and therefore require a large amount of sample data, which is both costly and difficult to obtain. Moreover, a traditional speech synthesis system needs to extract linguistic information for the back end. For low-quality speech, such as non-professional recordings or speech with varying emotion and recording environments, synthesis is difficult and the results are unsatisfactory.



Examples


Embodiment 1

[0033] As shown in Figure 1, the present invention discloses a neural network-based speech synthesis method, comprising:

[0034] S100, generating a pre-trained neural network model by using the basic speech-text data set in the sample library;

[0035] Specifically, the neural network module acquires the basic data set from the sample library. The basic data set consists of the text data and voice data of a single person, preferably medium-volume and high-quality. Using single-speaker, high-quality speech and text data allows the pre-trained neural network model to reflect the mapping from text to speech. Choosing a medium volume, rather than the large volume used in the prior art, both saves cost and makes the data easier to acquire. The basic data set can be stored in the sample library module in advance, and can also be temporarily imported into th...
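The idea of training on a medium-volume basic data set first, and only then using a small amount of person-specific data, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the one-feature linear model stands in for the neural network, the numeric pairs stand in for text and speech, and reusing the pre-trained parameters as the starting point for the specific-person step (fine-tuning) is one common approach that the source text does not confirm.

```python
from typing import List, Tuple

Pair = Tuple[float, float]    # (input, target): stand-in for (text, speech)
Params = Tuple[float, float]  # (weight, bias) of a toy linear model


def mse(params: Params, data: List[Pair]) -> float:
    """Mean squared error of the toy model on a data set."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params: Params, data: List[Pair], steps: int, lr: float = 0.1) -> Params:
    """Plain gradient descent on mean squared error, starting from `params`."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


# Hypothetical "basic data set": 50 clean pairs from a single source.
basic_set = [(i / 49, 2 * (i / 49) + 1.0) for i in range(50)]
# Hypothetical "specific-person data set": only 3 pairs, slightly shifted.
specific_set = [(x, 2 * x + 1.5) for x in (0.1, 0.5, 0.9)]

pretrained = train((0.0, 0.0), basic_set, steps=2000)  # analogous to S100
adapted = train(pretrained, specific_set, steps=20)    # assumed adaptation step
```

Starting the second call from `pretrained` rather than from zero is what lets three samples and twenty steps be enough to reduce the error on the specific-person data in this toy setting.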

Embodiment 2

[0071] As shown in Figure 3, this embodiment of the present invention provides a neural network-based speech synthesis system comprising: sample library module 1, data processing module 2, neural network module 3, information input module 4, and matching rule module 5, wherein:

[0072] The sample library module 1 stores several sets of corresponding data, each set including at least text data and voice data. Specifically, the sample library module 1 includes a basic data set and specific-person data sets. The basic data set is the text data and voice data of a single person, preferably medium-volume and high-quality. The specific-person data sets are several groups of text data and voice data, each from a specific person; a small amount of low-quality text data and voice data is preferred for them. Low quality here refers to audio obtained from non-professional recordings,...
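The layout of the sample library described above can be sketched as a small data structure. This is a minimal illustration, not the patent's implementation: the class names `Sample` and `SampleLibrary`, the `audio_path` field, and the example file paths are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    """One text/voice pair; `audio_path` is a stand-in for the voice data."""
    text: str
    audio_path: str


@dataclass
class SampleLibrary:
    """Holds one basic data set plus one data set per specific person."""
    basic: List[Sample] = field(default_factory=list)
    specific: Dict[str, List[Sample]] = field(default_factory=dict)

    def add_basic(self, sample: Sample) -> None:
        self.basic.append(sample)

    def add_specific(self, person: str, sample: Sample) -> None:
        # Create the person's data set on first use, then append to it.
        self.specific.setdefault(person, []).append(sample)

    def persons(self) -> List[str]:
        return sorted(self.specific)


library = SampleLibrary()
library.add_basic(Sample("hello", "base/0001.wav"))
library.add_specific("alice", Sample("good morning", "alice/0001.wav"))
library.add_specific("alice", Sample("good night", "alice/0002.wav"))
```

Keeping the basic set separate from the per-person sets mirrors the distinction the paragraph draws between medium-volume, high-quality data and small, low-quality data.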


Abstract

The invention discloses a neural network-based speech synthesis method. The method comprises: generating a pre-trained neural network model and a specific-person speech synthesis model by using a sample library; performing speech and text analysis on specific-person speech-text data, extracting the key speech-text contents in a speech-text classification set, and associating them with set labels; generating a first matching rule according to the set labels and the key speech-text contents, and generating a second matching rule according to the specific person and the specific-person speech synthesis model; and calling the first matching rule and the second matching rule according to a user instruction and outputting synthetic speech. Because the method trains on speech-text data, it can target specific people, requires less data, and can be customized to user needs. Moreover, a corresponding specific-person speech synthesis model is generated for each specific person's speech-text data; when a user instruction contains specific-person information, the corresponding model is called directly, which improves the synthesis result.
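The two matching rules can be pictured as two lookup tables consulted in turn. The sketch below is only one possible reading of the abstract: the dictionary representation, the names `first_rule`, `second_rule`, `make_model`, and `synthesize`, the example labels, and the fallback to a base model when no person is named are all assumptions, and the "models" are stand-in functions that return a string rather than audio.

```python
from typing import Callable, Dict, Optional

Model = Callable[[str], str]

# First matching rule (assumed form): set label -> key speech-text content.
first_rule: Dict[str, str] = {
    "greeting": "good morning",
    "farewell": "good night",
}


def make_model(person: str) -> Model:
    """Build a stand-in synthesis model that just describes its output."""
    return lambda text: f"<{person} voice: {text}>"


# Second matching rule (assumed form): specific person -> that person's model.
second_rule: Dict[str, Model] = {"alice": make_model("alice")}

# Assumed fallback when the instruction names no known person.
default_model: Model = make_model("base")


def synthesize(person: Optional[str], label: str) -> str:
    """Resolve a user instruction through both rules and 'synthesize' it."""
    text = first_rule[label]                         # first matching rule
    model = second_rule.get(person, default_model)   # second matching rule
    return model(text)
```

For example, `synthesize("alice", "greeting")` routes the text "good morning" to the model registered for "alice", while `synthesize(None, "farewell")` falls back to the base model.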

Description

Technical field

[0001] The present invention relates to the technical field of speech synthesis, in particular to a neural network-based speech synthesis method, system and storage medium.

Background art

[0002] Speech synthesis is also known as text-to-speech (TTS). It is the technology of producing artificial voice through mechanical and electronic methods: it converts text information, whether generated by the computer itself or input from outside, into understandable and fluent spoken Chinese output. Speech synthesis is equivalent to giving a computer a human-like "mouth", and it plays a vital role in an intelligent computer system that can "hear and speak".

[0003] A traditional speech synthesis system usually includes two modules, a front end and a back end. The front end analyzes the text and extracts the linguistic information required by the back end, such as word segmentation and prosody....


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/08, G10L13/02, G10L25/30
CPC: G10L13/08, G10L13/02, G10L25/30
Inventor: 柳慧芬, 季业勤, 曹丹风
Owner: 武汉水象电子科技有限公司