Text-to-voice processing method and computer readable storage medium

A text-processing method applied in speech analysis, speech recognition, and speech synthesis. It addresses two problems: machine-generated speech falls far short of a human reading the text aloud, and emotional color is difficult to simulate. The effect is a more entertaining and more anthropomorphic reading experience.

Pending Publication Date: 2022-03-18
南京星云数字技术有限公司

AI Technical Summary

Problems solved by technology

Text, as the carrier of language, often relies on many context-dependent techniques to express emotional color accurately. When text is converted into speech, the machine performing the conversion cannot understand that emotional color. Therefore, current artificial intelligence technology can convert text into speech, but it has difficulty simulating the emotional color in it. The emotional color of speech is mainly manifested in the reading skills used during playback, including stress, pauses, tone, speech rate, and intonation.
[0003] To solve these problems, deep learning is usually applied to speech data, but this approach is limited to text conversion in certain fields, and the resulting playback is still far from the effect of a human reading the text.


Examples

Embodiment 1

[0044] Embodiment 1: This embodiment provides a text-to-speech processing method. As shown in Figure 1, the method includes:

[0045] S110. Acquire the text to be converted and a conversion control instruction matched with the text to be converted.

[0046] Specifically, the conversion control instruction is edited and configured in advance and is associated with the text to be converted. For example, the text to be converted and the conversion control instruction each carry an identification number, and the text to be converted and its matching conversion control instruction are obtained according to that identification number.
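The identification-number matching described in [0046] can be illustrated with a minimal sketch. It assumes both the texts and the pre-edited instructions are held in dictionaries keyed by the identification number; the names `TextItem`, `ControlInstruction`, `acquire`, and the `payload` format are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TextItem:
    ident: str    # identification number shared with the matching instruction
    content: str  # the text to be converted


@dataclass(frozen=True)
class ControlInstruction:
    ident: str    # same identification number as the text it controls
    payload: str  # pre-edited instruction body (format is an assumption)


def acquire(
    ident: str,
    texts: Dict[str, TextItem],
    instructions: Dict[str, ControlInstruction],
) -> Tuple[TextItem, ControlInstruction]:
    """Return the text and the instruction that share the same identification number."""
    text = texts.get(ident)
    instruction = instructions.get(ident)
    if text is None or instruction is None:
        raise KeyError(f"no matched text/instruction pair for id {ident!r}")
    return text, instruction


# Illustrative data only.
texts = {"t-001": TextItem("t-001", "Hello, world.")}
instructions = {"t-001": ControlInstruction("t-001", "pause after 'Hello,'")}
text, instruction = acquire("t-001", texts, instructions)
```

Raising an error when either half of the pair is missing is a design choice of this sketch; the source does not say how an unmatched identification number is handled.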

[0047] In a preferred embodiment, the conversion control instruction includes at least one of a pause instruction, a stress instruction, a speech rate adjustment instruction, a sentence tone adjustment instruction, and an instruction for adding a verbal habit (a habitual spoken phrase), that is, the conversion contro...
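One way to model the instruction types listed in [0047] is an enumeration with an "at least one" check. This is only a sketch: the class name `InstructionKind`, the string values, and the helper `parse_kinds` are assumptions made for illustration, and the source text is truncated before it specifies how instructions are actually encoded.

```python
from enum import Enum
from typing import Iterable, Set


class InstructionKind(Enum):
    """The five instruction types named in the embodiment (values are hypothetical)."""
    PAUSE = "pause"
    ACCENT = "accent"
    SPEECH_RATE = "speech_rate"
    SENTENCE_TONE = "sentence_tone"
    VERBAL_HABIT = "verbal_habit"  # adding a habitual spoken phrase


def parse_kinds(names: Iterable[str]) -> Set[InstructionKind]:
    """Map names to kinds; unknown names raise ValueError via the Enum lookup."""
    kinds = {InstructionKind(name) for name in names}
    if not kinds:
        # The embodiment requires at least one instruction type.
        raise ValueError("a conversion control instruction needs at least one kind")
    return kinds


kinds = parse_kinds(["pause", "speech_rate"])
```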

Embodiment 2

[0099] Embodiment 2: This embodiment provides a text-to-speech conversion system. As shown in Figure 2, the system includes:

[0100] An acquisition module 210, configured to acquire the text to be converted and the conversion control instruction matched with the text to be converted;

[0101] The processing module 220 is configured to process the text to be converted to obtain a target voice based on the conversion control instruction and preset processing rules.

[0102] In a preferred embodiment, the processing module 220 includes:

[0103] A splitting unit 221, configured to split the text to be converted by morpheme units to obtain a morpheme set;

[0104] The conversion unit 222 is configured to convert each morpheme in the morpheme set to obtain a corresponding audio frame set and generate a corresponding index, the audio frame set includes a first audio frame and a second audio frame, and the first audio frame is the audio frame corresponding to the conversio...
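The split-then-convert flow of units 221 and 222 can be sketched as follows. Several things here are assumptions, not the patent's method: whitespace tokens stand in for morphemes, `placeholder_synthesize` emits dummy frames instead of real audio, and the index is modeled as a map from morpheme position to a frame range in one flat list. Because the source is truncated, the distinction between the first and second audio frame is not modeled.

```python
from typing import Callable, Dict, List, Tuple

Frame = bytes  # stand-in for one audio frame


def split_morphemes(text: str) -> List[str]:
    """Stand-in splitter: whitespace tokens (real morpheme segmentation is language-specific)."""
    return text.split()


def placeholder_synthesize(morpheme: str) -> List[Frame]:
    """Hypothetical synthesizer: one dummy frame per character."""
    return [ch.encode("utf-8") for ch in morpheme]


def convert(
    morphemes: List[str],
    synthesize: Callable[[str], List[Frame]] = placeholder_synthesize,
) -> Tuple[List[Frame], Dict[int, Tuple[int, int]]]:
    """Convert each morpheme to frames and record where its frames sit in the output."""
    frames: List[Frame] = []
    index: Dict[int, Tuple[int, int]] = {}
    for position, morpheme in enumerate(morphemes):
        start = len(frames)
        frames.extend(synthesize(morpheme))
        index[position] = (start, len(frames))  # half-open range [start, end)
    return frames, index


morphemes = split_morphemes("read this aloud")
frames, index = convert(morphemes)
```

An index of this shape would let a later step locate the frames of a given morpheme, for example to insert a pause after it, without re-synthesizing the whole text.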

Embodiment 3

[0130] Embodiment 3: This embodiment provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the following steps are implemented:

[0131] Obtaining the text to be converted and a conversion control instruction matched with the text to be converted;

[0132] Processing the text to be converted based on the conversion control instruction and preset processing rules to obtain a target voice.

[0133] In a preferred implementation of this embodiment, when the processor executes the computer program, the following steps are also implemented:

[0134] Splitting the text to be converted by morpheme units to obtain a morpheme set;

[0135] Converting each morpheme in the morpheme set to obtain a corresponding audio frame set and generating a corresponding index, the audio frame set includes a first audio frame and a second audio frame, and the first audio frame is the conver...


Abstract

The invention discloses a text-to-voice processing method and a computer readable storage medium. The method comprises the steps of obtaining a to-be-converted text and a conversion control instruction matched with the to-be-converted text, and processing the to-be-converted text based on the conversion control instruction and a preset processing rule to obtain target voice. The conversion control instruction controls the speech synthesis process: the corresponding audio operation is carried out according to the instruction during synthesis, and the final audio output is optimized. As a result, the reading experience during playback is more personified, the original intention of the text is better expressed, and entertainment value is improved.

Description

Technical field

[0001] The invention relates to the field of computer data processing, in particular to a text-to-speech processing method and a computer-readable storage medium.

Background

[0002] Language is a great art. Human language is closely tied to the history of civilization: it reflects the cultural characteristics and social forms of its time, and it carries rich emotional color in communication. Text, as the carrier of language, often relies on many context-dependent techniques to express that emotional color accurately. When text is converted into speech, the machine performing the conversion cannot understand the emotional color. Therefore, current artificial intelligence technology can convert text into speech, but it has difficulty simulating the emotional color in it. The emotional color of speech is mainly manifested in the reading skills during playback, including stress,...


Application Information

IPC(8): G10L15/22, G10L15/26, G10L13/02
CPC: G10L15/22, G10L15/26, G10L13/02, G10L2015/223
Inventors: 吴少铎, 戴治波, 王瑞, 吴晨捷, 晁广晗
Owner: 南京星云数字技术有限公司