Voice synthesis method and apparatus

A speech synthesis and speech data technology, applied in speech synthesis, speech analysis, instruments, etc., that improves user experience and reduces the amount of user recording data required

Inactive Publication Date: 2016-01-20
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

The professionalism and complexity of producing speech synthesis data conflict sharply with users' strong desire for personalized voices.



Examples


Embodiment 1

[0020] Figure 2 is a flowchart of a speech synthesis method provided in the first embodiment of the present invention. This embodiment is applicable to personalized acoustic model training. The method is mainly executed by a speech synthesis device in a computer device, and the computer device includes, but is not limited to, at least one of the following: user equipment and network equipment. User equipment includes, but is not limited to, computers, smartphones, and tablets. Network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of computers or network servers for cloud computing. As shown in Figure 2, the method specifically includes the following operations:

[0021] S110: Acquire voice data of the target user;

[0022] The target user voice data includes the voice characteristics of the target user. Generally, the recorded text is pre-design...

Embodiment 2

[0028] Figure 3 is a schematic flowchart of a speech synthesis method provided in the second embodiment of the present invention. As shown in Figure 3, the method specifically includes:

[0029] S210: Acquire voice data of the target user;

[0030] This operation is similar to the operation S110 in the above-mentioned first embodiment, and will not be repeated in this embodiment.

[0031] S220: Perform voice annotation on the voice data of the target user to obtain text context information corresponding to the voice data of the target user.

[0032] The voice annotation of the target user's voice data includes: syllable and phoneme segmentation labeling, accent and intonation labeling, prosodic labeling, and boundary and part-of-speech labeling. In Chinese, one Chinese character represents one syllable, and its initial and final are phonemes. Prosody generally includes three levels: prosodic words, prosodic phrases, and intonation phrases. One or more prosodic words cons...
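The annotation layers listed in [0032] can be pictured as one record per syllable. The following is a minimal sketch only: the class names, field names, and the `group_by_boundary` helper are hypothetical and do not come from the patent, and the pinyin example is invented for illustration. It assumes each syllable carries the strongest prosodic boundary that follows it, so that higher levels (intonation phrase) also close the lower ones (prosodic phrase, prosodic word).

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class ProsodicBoundary(IntEnum):
    """Three prosodic levels named in [0032], ordered from weakest to strongest."""
    NONE = 0
    PROSODIC_WORD = 1
    PROSODIC_PHRASE = 2
    INTONATION_PHRASE = 3


@dataclass
class SyllableLabel:
    """Hypothetical per-syllable annotation record (all field names are assumptions)."""
    syllable: str                     # e.g. a pinyin syllable for one Chinese character
    initial: str                      # initial phoneme ("" if the syllable has none)
    final: str                        # final phoneme
    start_s: float                    # segmentation start time in seconds
    end_s: float                      # segmentation end time in seconds
    stressed: bool                    # accent label
    pos: str                          # part-of-speech tag of the containing word
    boundary_after: ProsodicBoundary  # strongest boundary following this syllable


def group_by_boundary(labels: List[SyllableLabel],
                      level: ProsodicBoundary) -> List[List[str]]:
    """Split syllables into units that end at a boundary of at least `level`."""
    groups: List[List[str]] = []
    current: List[str] = []
    for label in labels:
        current.append(label.syllable)
        if label.boundary_after >= level:
            groups.append(current)
            current = []
    if current:  # trailing syllables with no closing boundary
        groups.append(current)
    return groups


# Invented example utterance: four syllables forming two prosodic words.
EXAMPLE = [
    SyllableLabel("ni", "n", "i", 0.00, 0.18, False, "r", ProsodicBoundary.NONE),
    SyllableLabel("hao", "h", "ao", 0.18, 0.42, True, "a", ProsodicBoundary.PROSODIC_WORD),
    SyllableLabel("shi", "sh", "i", 0.42, 0.60, False, "n", ProsodicBoundary.NONE),
    SyllableLabel("jie", "j", "ie", 0.60, 0.90, True, "n", ProsodicBoundary.INTONATION_PHRASE),
]
```

Calling `group_by_boundary(EXAMPLE, ProsodicBoundary.PROSODIC_WORD)` splits the utterance into its prosodic words, while passing `ProsodicBoundary.INTONATION_PHRASE` returns the whole utterance as a single unit.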

Embodiment 3

[0048] Figure 4 is a schematic flowchart of a speech synthesis method provided in Embodiment 3 of the present invention. As shown in Figure 4, the speech synthesis method specifically includes:

[0049] S310: Acquire voice data of the target user;

[0050] S320: Train the target user's acoustic model according to the target user's voice data and a preset reference acoustic model;

[0051] S330: Acquire text data to be synthesized;

[0052] The text data to be synthesized can be news text, e-books, or text received through mobile phone short messages and instant messaging software.

[0053] S340: Convert the text data to be synthesized into voice data according to the target user acoustic model.

[0054] When there is a voice synthesis requirement, the corresponding target user acoustic model is selected and the text data to be synthesized is converted into voice data; the converted voice data has the voice characteristics of the target user.
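Steps S310 to S340 can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not the patented method: the excerpt does not say how the target user's model is derived from the reference model, so the sketch assumes a simple interpolation in which each reference mean is pulled toward the user's data in proportion to how many user samples exist. The "acoustic model" is reduced to one mean feature value per context label, and all function names, parameters, and numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy stand-in: an "acoustic model" maps a context label to one mean feature value.
AcousticModel = Dict[str, float]


def acquire_voice_data() -> List[Tuple[str, float]]:
    """S310 (stand-in): return (context label, observed feature) pairs."""
    return [("a", 1.2), ("a", 1.4), ("b", 0.4)]


def train_target_model(voice_data: List[Tuple[str, float]],
                       reference: AcousticModel,
                       weight: float = 10.0) -> AcousticModel:
    """S320 (assumed update rule): shift reference means toward the user data.

    `weight` behaves like a count of pseudo-observations from the reference
    model, so labels with few user samples stay close to the reference.
    """
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for label, value in voice_data:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1

    adapted = dict(reference)  # labels absent from the user data keep reference values
    for label, n in counts.items():
        prior = reference.get(label, sums[label] / n)
        adapted[label] = (weight * prior + sums[label]) / (weight + n)
    return adapted


def synthesize(labels: List[str], model: AcousticModel) -> List[float]:
    """S340 (stand-in): look up one feature per label instead of generating audio."""
    return [model[label] for label in labels]


# Hypothetical reference model and text to be synthesized (S330).
REFERENCE: AcousticModel = {"a": 1.0, "b": 0.5, "c": 2.0}
TARGET_MODEL = train_target_model(acquire_voice_data(), REFERENCE)
FEATURES = synthesize(["a", "c", "b"], TARGET_MODEL)
```

Because the reference model already covers every label, only a small amount of user data is needed to shift the labels the user actually recorded, which is the data-scale reduction the abstract describes. Label "c" never appears in the user data, so it simply keeps its reference value.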

[0...


Abstract

The invention discloses a voice synthesis method and apparatus. The voice synthesis method comprises: obtaining target user voice data; and training a target user acoustic model according to the target user voice data and a preset reference acoustic model. The voice synthesis apparatus includes a target user voice data obtaining module and a target user acoustic model training module; the obtaining module is used for obtaining target user voice data, and the training module is used for training a target user acoustic model based on the target user voice data and a preset reference acoustic model. The invention reduces the amount of user recording data required for personalized voice synthesis.

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of text-to-speech conversion, and in particular to a speech synthesis method and device.

Background

[0002] Speech synthesis, also known as text-to-speech technology, can convert any text information into standard, fluent speech read aloud in real time, which is equivalent to giving the machine an artificial mouth. It involves acoustics, linguistics, digital signal processing, computer science, and other disciplines and technologies, and is a cutting-edge technology in the field of Chinese information processing.

[0003] Figure 1 is a schematic flowchart of a speech synthesis method in the prior art. As shown in Figure 1, the processing flow of a speech synthesis system is generally as follows: first, the input text undergoes a series of processing steps such as text preprocessing, word segmentation, part-of-speech tagging, and phoneticization; then the prosody leve...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/02
Inventor: 李秀林, 谢延, 康永国, 关勇
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD