Audio data generating method and system for voice synthesis

A technology for audio data and speech synthesis, applied in speech synthesis, speech analysis, instruments, etc. It can solve problems such as the compression of processing time left for the other steps, reduce computing delay, improve speed, and ensure the accuracy of acoustic feature prediction.

Active Publication Date: 2018-12-18
BEIJING GUANGNIAN WUXIAN SCI & TECH
Cites: 6; Cited by: 16

AI Technical Summary

Problems solved by technology

However, at present, the delay of many high-naturalness TTS interfaces is above 150ms, which severely compresses the processing time available to the other two steps (ASR and NLP) and thus limits the complexity and accuracy of their information processing. To improve the human-computer interaction experience, the speed of TTS (speech synthesis) needs to be increased.

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; Figure 2 is a flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; Figure 3 is the parameter map of the yarn covering machine.

Examples

Embodiment Construction

[0035] In order to make the object, technical solution and advantages of the present invention clearer, the embodiments of the present invention will be further described in detail below in conjunction with the accompanying drawings.

[0036] Figure 1 shows a flowchart of a method for generating audio data for speech synthesis according to an embodiment of the present invention.

[0037] As shown in Figure 1, in step S101, text features are extracted from the text data to obtain text feature data. In one embodiment of the present invention, the text features include one or a combination of: phonetic symbols, intonation, sentence-break or prosodic marking, syntactic dependency tree, word segmentation marking, part-of-speech tagging, semantic weight and word vectors.
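As a minimal sketch, assuming a per-phone token representation, the categorical text features listed above can be turned into numeric vectors by one-hot encoding each feature and concatenating the results. The vocabularies, the `TokenFeatures` class and the `encode` helpers are hypothetical and not taken from the patent; continuous features such as semantic weights or word vectors could be appended to each vector in the same way.

```python
from dataclasses import dataclass

# Hypothetical, tiny vocabularies for illustration only; a real system
# would use its own phone set, tone inventory and POS tag set.
PHONES = ["sil", "n", "i", "h", "ao"]
TONES = [0, 1, 2, 3, 4]          # assumed Mandarin-style tones, 0 = neutral
BREAKS = [0, 1, 2, 3]            # assumed prosodic break levels
POS_TAGS = ["n", "v", "r", "x"]  # assumed part-of-speech tags


@dataclass
class TokenFeatures:
    phone: str
    tone: int
    prosodic_break: int
    pos: str


def one_hot(value, vocab):
    """Return a one-hot list marking the position of value in vocab."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(value)] = 1.0  # raises ValueError for unknown values
    return vec


def encode(token: TokenFeatures) -> list:
    """Concatenate the one-hot encodings of each text feature into one vector."""
    return (one_hot(token.phone, PHONES)
            + one_hot(token.tone, TONES)
            + one_hot(token.prosodic_break, BREAKS)
            + one_hot(token.pos, POS_TAGS))


def encode_sequence(tokens):
    """Turn a token sequence into a list of per-token text feature vectors."""
    return [encode(t) for t in tokens]


if __name__ == "__main__":
    tokens = [TokenFeatures("n", 3, 0, "r"), TokenFeatures("i", 3, 1, "r")]
    features = encode_sequence(tokens)
    print(len(features), len(features[0]))  # 2 18
```

Each vector here has 5 + 5 + 4 + 4 = 18 entries; a sequence of such vectors is the kind of "text feature data" that a downstream network could consume.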

[0038] In addition, the text feature data may be obtained with a natural language processing (NLP) algorithm. Natural language processing algorithms can perform word segm...
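Word segmentation, one of the NLP tasks mentioned above, can be illustrated with forward maximum matching, a classic dictionary-based method for Chinese text. This is a swapped-in example technique, not necessarily what the patent uses; the function name, the lexicon and the `max_len` default are hypothetical.

```python
def forward_max_match(text: str, lexicon: set, max_len: int = 4) -> list:
    """Segment text greedily, taking the longest lexicon word at each position.

    Hypothetical helper for illustration. Characters not covered by any
    lexicon word are emitted as single-character tokens.
    """
    max_len = max(1, max_len)  # guard against a non-positive window
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    lexicon = {"语音", "合成", "语音合成", "系统"}
    print(forward_max_match("语音合成系统", lexicon))  # ['语音合成', '系统']
```

Greedy longest-match is fast but can mis-segment ambiguous strings; statistical or neural segmenters are common alternatives.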

Abstract

The invention provides an audio data generating method for voice synthesis. The method comprises the following steps: extracting the text features in the text data to obtain text feature data; performing accelerated conversion processing on the text feature data through a neural network structure, converting the text feature data into acoustic feature data; and performing sound synthesis or selective splicing according to the acoustic feature data to obtain audio data. By adopting a special deconvolution structure, the invention achieves a good voice synthesis effect without any autoregressive structure and with few parameters. The neural network structure reduces computing delay while guaranteeing the prediction accuracy of the acoustic features, lowers the demand for computing resources, increases the number of concurrent requests and the voice synthesis speed, and thereby contributes to a better human-computer interaction experience.
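The patent does not disclose its deconvolution layers, so the following is only a generic sketch, assuming the "anti-convolution" structure is a standard 1-D transposed convolution that upsamples a token-level feature sequence toward frame-level acoustic features. The function name and tensor layout are hypothetical (the layout mirrors the `[in, out, kernel]` weight convention of common deep learning frameworks). The point it illustrates is that every output frame depends only on the inputs, never on previously generated outputs, so all frames can be computed in one parallel pass instead of autoregressively.

```python
def conv_transpose_1d(x, weight, stride):
    """1-D transposed convolution with no padding (hypothetical helper).

    x:      input of shape [C_in][L]          (e.g. text feature channels)
    weight: filters of shape [C_in][C_out][K]
    stride: upsampling factor
    Returns output of shape [C_out][(L - 1) * stride + K].
    """
    c_in, length = len(x), len(x[0])
    c_out, k_size = len(weight[0]), len(weight[0][0])
    out_len = (length - 1) * stride + k_size
    y = [[0.0] * out_len for _ in range(c_out)]
    for ci in range(c_in):
        for co in range(c_out):
            for t in range(length):
                for k in range(k_size):
                    # Each input sample "scatters" a scaled copy of the kernel.
                    # No output value is read here, so there is no
                    # dependence on earlier outputs (non-autoregressive).
                    y[co][t * stride + k] += x[ci][t] * weight[ci][co][k]
    return y


if __name__ == "__main__":
    # One channel, two tokens, upsampled to two frames per token.
    print(conv_transpose_1d([[1.0, 2.0]], [[[1.0, 1.0]]], stride=2))
    # [[1.0, 1.0, 2.0, 2.0]]
```

With learned weights, stacking several such layers can expand a short text-feature sequence into a longer acoustic-feature sequence; in practice a framework layer such as PyTorch's `torch.nn.ConvTranspose1d` would replace these loops.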

Description

Technical field
[0001] The invention relates to the field of artificial intelligence, in particular to a method and system for generating audio data for speech synthesis.

Background art
[0002] For a voice-based real-time human-computer interaction system, the time from the end of the user's utterance to the start of the machine's spoken reply is called the "response time". For most voice human-computer interaction systems, information processing passes through three steps in sequence: automatic speech recognition (ASR), natural language processing (NLP) and text-to-speech (TTS). To achieve the optimal human-computer interaction experience, the total time of these three steps should be around 600ms. However, at present, the delay of many high-naturalness TTS interfaces is above 150ms, which severely compresses the processing time available to the other two steps (ASR and NLP), and thus limits the complexity and accuracy of information processing ...
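The latency budget described above can be written out as a simple sum. Treating the roughly 600ms target as a fixed total is a simplification for illustration; the symbols $T_{\text{ASR}}$, $T_{\text{NLP}}$ and $T_{\text{TTS}}$ are introduced here for the per-step delays.

```latex
T_{\text{ASR}} + T_{\text{NLP}} + T_{\text{TTS}} \approx 600\ \text{ms},
\qquad T_{\text{TTS}} > 150\ \text{ms}
\;\Longrightarrow\;
T_{\text{ASR}} + T_{\text{NLP}} \approx 600\ \text{ms} - T_{\text{TTS}} < 450\ \text{ms}
```

Every millisecond removed from $T_{\text{TTS}}$ is returned to the ASR and NLP steps, which is why the invention targets synthesis speed.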

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/02
CPC: G10L13/02; G10L2013/021
Inventors: 陆羽皓, 马达标
Owner: BEIJING GUANGNIAN WUXIAN SCI & TECH