Speech synthesis method and device and computer storage medium

A speech synthesis technology applied in the computer field. It addresses the problems of incomplete audio content, failure to reflect all the information in the input text, and poor speech synthesis accuracy.

Active Publication Date: 2021-04-20
BEIJING CENTURY TAL EDUCATION TECH CO LTD

AI Technical Summary

Problems solved by technology

However, teaching texts may contain graphic text and graphic formulas, such as printed or handwritten formulas embedded in mathematical texts. Current speech synthesis methods cannot recognize graphic text or graphic formulas, so these are filtered out of the input and only the plain-text content is converted into audio. The synthesized audio is therefore incomplete and does not reflect all the information in the input text, which results in poor speech synthesis accuracy.

Examples

Embodiment 1

[0053] This embodiment of the present application provides a speech synthesis method. Figure 1 is a flowchart of the speech synthesis method provided in Embodiment 1 of the present application. Referring to Figure 1, the speech synthesis method provided in Embodiment 1 includes the following steps:

[0054] Step 101: Obtain a mixed sequence to be synthesized.

[0055] The mixed sequence to be synthesized is the input on which speech synthesis is to be performed; that is, the text information it contains needs to be converted into natural speech. The mixed sequence includes text to be synthesized and graphics to be synthesized. The text to be synthesized is text data in plain-text format, for example Chinese text, English text, or mixed Chinese and English text. A graphic to be synthesized is a picture that contains text information; specifically, it may include graphic text and/or a graphic formula, an...
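
As an illustration of the mixed sequence described in Step 101, the following Python sketch models it as an ordered list of segments. The class names `TextSegment` and `GraphicSegment`, the `kind` labels, and the file name are assumptions introduced here for clarity; the application does not prescribe any particular data structure.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextSegment:
    """Plain-text part of the mixed sequence (Chinese, English, or both)."""
    content: str


@dataclass
class GraphicSegment:
    """Picture that carries text information.

    kind is a hypothetical label: "text" for graphic text,
    "formula" for a graphic formula.
    """
    image_path: str
    kind: str


MixedSequence = List[Union[TextSegment, GraphicSegment]]

# Hypothetical teaching text: plain text with one embedded formula picture.
mixed_sequence: MixedSequence = [
    TextSegment("Solve the equation "),
    GraphicSegment("formula_001.png", kind="formula"),
    TextSegment(" and explain each step."),
]

# The graphics are the parts a text-only synthesizer would drop.
graphics = [seg for seg in mixed_sequence if isinstance(seg, GraphicSegment)]
print(len(mixed_sequence), len(graphics))
```

Keeping the segments in one ordered list preserves where each graphic sits relative to the surrounding text, which the later combination step relies on.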

Embodiment 2

[0070] Building on the speech synthesis method of Embodiment 1, Embodiment 2 of the present application provides a speech synthesis method that describes the method of Embodiment 1 in more detail and can be applied to speech synthesis for teaching texts. Figure 2 is a flowchart of the speech synthesis method provided in Embodiment 2 of the present application. Referring to Figure 2, the speech synthesis method provided in Embodiment 2 includes the following steps:

[0071] Step 201: Use sample images containing text information and mathematical formulas to train a pattern recognition model.

[0072] In this embodiment, before speech synthesis is performed on the mixed sequence, the pattern recognition model must first be trained so that it can recognize text and formulas in pictures. In a possi...

Embodiment 3

[0114] Building on the speech synthesis method of Embodiment 1, Embodiment 3 of the present application provides a speech synthesis method that describes the method of Embodiment 1 in more detail. Figure 3 is a schematic diagram of the process of the speech synthesis method provided in Embodiment 3 of the present application. Referring to Figure 3, the speech synthesis method of Embodiment 3 includes: after the mixed sequence is input, performing text separation to separate the text to be synthesized from the graphics to be synthesized; separating the graphic text and the graphic formulas within the graphics to be synthesized; recognizing the text in the graphic text to obtain the recognized text; recognizing the graphic formulas as LaTeX characters; combining the text to be synthesized, the recognized text, and the LaTeX characters to obtain the text sequence; the text sequence is input into th...
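
The stages listed for Embodiment 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the segment labels (`"text"`, `"graphic_text"`, `"graphic_formula"`), the function names, and the lookup-table stand-ins for the recognition and speech synthesis models are all hypothetical.

```python
from typing import Callable, List, Tuple

# Each item is (kind, payload). The kind labels are hypothetical:
# "text", "graphic_text", or "graphic_formula".
Item = Tuple[str, str]


def separate(mixed: List[Item]):
    """Split the mixed sequence, remembering each item's original position."""
    texts, graphics = [], []
    for pos, (kind, payload) in enumerate(mixed):
        (texts if kind == "text" else graphics).append((pos, kind, payload))
    return texts, graphics


def recognize(graphics, read_text: Callable[[str], str],
              read_formula: Callable[[str], str]):
    """Map graphic text to recognized text and graphic formulas to LaTeX."""
    out = []
    for pos, kind, payload in graphics:
        reader = read_formula if kind == "graphic_formula" else read_text
        out.append((pos, reader(payload)))
    return out


def combine(texts, recognized) -> str:
    """Merge plain text and recognition results in their original order."""
    parts = [(pos, payload) for pos, _, payload in texts] + recognized
    return "".join(text for _, text in sorted(parts))


def synthesize(mixed, read_text, read_formula, tts):
    """Run separation, recognition, combination, then speech synthesis."""
    texts, graphics = separate(mixed)
    text_sequence = combine(texts, recognize(graphics, read_text, read_formula))
    return text_sequence, tts(text_sequence)


# Stand-in models (assumptions): lookup tables instead of trained networks.
fake_ocr = {"note.png": "where x is positive"}.get
fake_formula = {"eq.png": r"x^{2}+1=5"}.get
fake_tts = lambda s: s.encode("utf-8")  # placeholder for audio output

mixed = [
    ("text", "Solve "),
    ("graphic_formula", "eq.png"),
    ("text", ", "),
    ("graphic_text", "note.png"),
]
text_sequence, audio = synthesize(mixed, fake_ocr, fake_formula, fake_tts)
print(text_sequence)
```

Passing the recognizers and the synthesizer in as callables keeps the orchestration independent of any particular model, so trained models could replace the lookup tables without changing the control flow.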

Abstract

Embodiments of the invention provide a speech synthesis method and device and a computer storage medium. The method comprises: obtaining a mixed sequence to be synthesized, which comprises text to be synthesized and graphics to be synthesized, the graphics to be synthesized comprising at least one of graphic text and a graphic formula; separating the text to be synthesized from the graphics to be synthesized in the mixed sequence; inputting the graphics to be synthesized into a graphic recognition model, which recognizes the text contained in the graphic text as recognized text and recognizes the graphic formula as LaTeX characters; combining the text to be synthesized, the recognized text, and the LaTeX characters according to the positions of the graphic text and the graphic formula in the mixed sequence to obtain a text sequence; and inputting the text sequence into a speech synthesis model, which converts the text sequence into audio. This scheme can improve the accuracy of speech synthesis for the mixed sequence.

Description

Technical field

[0001] The present application relates to the field of computer technology, and in particular to a speech synthesis method, a device, and a computer storage medium.

Background

[0002] Speech synthesis converts text information into natural speech output, which meets users' needs for having text read aloud and broadcast. For example, news reading, novel reading, weather broadcasting, SMS broadcasting, e-book reading, and teaching content reading can all be realized through speech synthesis technology, so that users can listen to the relevant information instead of reading the text directly.

[0003] Applying speech synthesis technology to teaching scenarios makes it possible to convert teaching texts into natural speech output and to realize text reading, question reading, and dictation of new words. However, graphic texts and graphic formulas may be mixed in teaching texts, such as printed formulas or handwritten formulas mixed in mathematical tex...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/027, G10L13/08, G10L19/02, G06K9/20, G06K9/46, G06N3/04, G06N3/08
Inventors: 智鹏鹏, 杨嵩
Owner: BEIJING CENTURY TAL EDUCATION TECH CO LTD