
Text rhythm prediction method based on multi-task multi-level model

A multi-level text prosody prediction method, applied in neural network learning methods, biological neural network models, instruments, and related fields. It addresses long sentences that lack prosodic phrase and intonation phrase boundaries, cumbersome training and parameter tuning, and errors in prosodic information. Effects achieved: fewer long sentences without prosodic pauses, fewer bad cases, and better information utilization.

Active Publication Date: 2020-06-26
广州深声科技有限公司 (Guangzhou Shensheng Technology Co., Ltd.)
Cites: 5; Cited by: 6

AI Technical Summary

Problems solved by technology

[0008] To solve the problems of errors in generated prosodic information, the inability of the prosody levels to share information with each other, cumbersome training and parameter tuning, inaccurate prediction of prosodic phrase and intonation phrase boundaries, and the frequent occurrence of long sentences without prosodic phrase or intonation phrase boundaries, the present invention provides a text prosody prediction method based on a multi-task multi-level model.

Examples

Embodiment 1

[0031] Embodiment 1, see Figure 1. This figure mainly covers data processing, data encoding, and model training. The specific implementation includes the following parts:

[0032] Step 101: Obtain training text and apply common text normalization, such as cropping the text length and correcting illegal characters and punctuation marks;
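Step 101 does not say which normalizations are applied. A minimal sketch, assuming Chinese input, a hypothetical half-width to full-width punctuation table, a hypothetical whitelist of legal characters, and a hypothetical crop length `max_len`:

```python
import re

# Hypothetical correction table: half-width punctuation -> full-width Chinese forms.
HALF_TO_FULL = {",": "，", ".": "。", "?": "？", "!": "！", ";": "；", ":": "："}

# Hypothetical whitelist: CJK ideographs, ASCII letters and digits, full-width
# punctuation. Anything else (spaces, symbols, emoji) is treated as illegal.
ILLEGAL = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9，。？！；：]")

def normalize(text: str, max_len: int = 100) -> str:
    """Correct punctuation, drop illegal characters, then crop to max_len."""
    text = "".join(HALF_TO_FULL.get(ch, ch) for ch in text)
    text = ILLEGAL.sub("", text)
    return text[:max_len]

print(normalize("你好, 世界!"))  # 你好，世界！
```

Mapping "." to "。" would corrupt decimals such as "3.5"; a real pipeline would need number handling before this step.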

[0033] Step 102: In prosodic acoustics, full stops, question marks, exclamation marks, and commas mark intonation phrase boundaries. Accordingly, the punctuation marks that form long pauses in text rhythm (commas, full stops, exclamation marks, question marks, and semicolons) are randomly removed from the text, the positions of the removed marks are treated as intonation-phrase-level boundary points, and the resulting texts are added to the training data as extended text. This step also includes splicing two or more short texts together as extended data for prosody training;
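The augmentation in Step 102 can be sketched as follows. The removal probability `p`, the function names, and the convention that a boundary is attached to the character preceding the deleted mark are assumptions for illustration, not details from the patent:

```python
import random

# Marks the patent lists as long-pause punctuation eligible for random removal.
PAUSE_MARKS = set("，。！？；")

def drop_punctuation(text, p=0.5, rng=None):
    """Randomly delete pause marks; return (new_text, l3_positions).

    Each deleted mark becomes an intonation-phrase (L3) boundary label on
    the character that preceded it in new_text (an assumed convention).
    """
    rng = rng or random.Random()
    chars, l3 = [], []
    for ch in text:
        if ch in PAUSE_MARKS and chars and rng.random() < p:
            pos = len(chars) - 1
            if not l3 or l3[-1] != pos:  # collapse runs such as "。。"
                l3.append(pos)
        else:
            chars.append(ch)
    return "".join(chars), l3

def splice(texts):
    """Join two or more short texts into one longer training sample."""
    return "".join(texts)

sample = splice(["今天天气很好。", "我们去公园吧！"])
print(drop_punctuation(sample, p=1.0))  # ('今天天气很好我们去公园吧', [5, 11])
```

Splicing first and then dropping punctuation produces exactly the kind of long, pause-free sentence that the model otherwise rarely sees with gold L3 labels.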

[0034] Step 103: use the character-lev...

Embodiment 2

[0037] Embodiment 2, see Figure 2. This figure mainly shows the multi-task neural network model architecture, and the specific implementation includes the following parts. For clarity and conciseness, descriptions of well-known functions and structures are omitted below, and only the core points are explained:

[0038] As shown in the figure, the input sentence is first encoded when it enters the model, covering both word information and position information. Methods used include, but are not limited to, common one-hot vectors and relative position encoding based on trigonometric functions;
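As an illustration of [0038], the sketch below builds a one-hot character vector and a sinusoidal position vector using the standard Transformer formula, then concatenates them. Concatenation, the position dimension `d_pos`, and the function names are assumptions; the patent does not say how the two encodings are combined.

```python
import math

def one_hot(index, size):
    """Word-information vector: 1.0 at the character's vocabulary index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def sinusoidal_position(pos, d_pos):
    """Transformer-style encoding: sin on even dims, cos on odd dims."""
    vec = []
    for i in range(d_pos):
        angle = pos / (10000 ** ((i - i % 2) / d_pos))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec

def encode_sentence(ids, vocab_size, d_pos=8):
    """Concatenate word and position information per character (assumed combination)."""
    return [one_hot(t, vocab_size) + sinusoidal_position(p, d_pos)
            for p, t in enumerate(ids)]

print(encode_sentence([2, 0], vocab_size=3, d_pos=2)[0])  # [0.0, 0.0, 1.0, 0.0, 1.0]
```

Adding a learned embedding to the position vector, as in the original Transformer, is the more common alternative to concatenation.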

[0039] Multiple multi-head self-attention layers extract semantic and prosodic structure information from the text; the attention weighting algorithm of the multi-head self-attention layers is not restricted here;
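Since [0039] leaves the attention algorithm open, here is a dependency-free sketch of standard scaled dot-product multi-head self-attention, one common choice. Weights are random placeholders, and residual connections, layer normalization, and feed-forward sublayers are omitted:

```python
import math
import random

def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(x, wq, wk, wv):
    """One head: softmax(Q K^T / sqrt(d_k)) V over all positions of x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    return matmul([softmax(row) for row in scores], v)

def multi_head_self_attention(x, heads, wo):
    """Run every head, concatenate outputs per position, then project with wo."""
    outs = [attention_head(x, *h) for h in heads]
    concat = [[val for out in outs for val in out[i]] for i in range(len(x))]
    return matmul(concat, wo)

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Usage: 3 positions, d_model = 4, 2 heads of size 2 (random placeholder weights).
rng = random.Random(0)
x = rand_matrix(3, 4, rng)
heads = [tuple(rand_matrix(4, 2, rng) for _ in range(3)) for _ in range(2)]
wo = rand_matrix(4, 4, rng)
y = multi_head_self_attention(x, heads, wo)
print(len(y), len(y[0]))  # 3 4
```

Stacking several such layers gives the multi-layer encoder whose output is shared by the L1, L2, and L3 prediction tasks.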

[0040] The multi-layer self-attention model in the figure can be pre-trained on a large text corpus, or a mode...

Embodiment 3

[0045] Embodiment 3, shown in Figure 3, explains the mechanism for handling long sentences for which no L2 or L3 boundary is predicted in the prediction stage, namely generating a boundary at the most probable position. Specifically:

[0046] As shown in the figure, suppose that after taking the argmax of the output probability matrix of the L2 layer, every tag is O; that is, the sentence contains no prosodic phrase boundary, only prosodic word boundaries. This situation applies to longer sentences;

[0047] A more reasonable mechanism is then adopted: take the slice of the probability matrix corresponding to the B label, which gives the probability of the B label at every word, and select the position with the highest probability as the B label position.
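The fallback in [0046] and [0047] can be sketched as below. The label column order `("O", "B")` and the hypothetical `min_len` threshold for what counts as a long sentence are assumptions:

```python
LABELS = ("O", "B")  # assumed column order of one level's probability matrix

def decode_with_fallback(probs, min_len=10):
    """Argmax-decode one level; if a long sentence gets no B, force one.

    probs: one row per token, one column per label in LABELS.
    min_len: hypothetical length at which a sentence counts as "long".
    """
    b = LABELS.index("B")
    tags = [max(range(len(row)), key=row.__getitem__) for row in probs]
    if len(probs) >= min_len and b not in tags:
        b_column = [row[b] for row in probs]  # the slice of B probabilities
        tags[b_column.index(max(b_column))] = b
    return [LABELS[t] for t in tags]

probs = [[0.9, 0.1], [0.6, 0.4], [0.8, 0.2]]
print(decode_with_fallback(probs, min_len=3))  # ['O', 'B', 'O']
```

The same function can be applied to the L3 output matrix, since the patent describes the missing-boundary case for both L2 and L3.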

[0048] Example 3, see Figure 3, describes the complete prediction process, specifically:

[0049] Step 401: Obtain the text to be predicted;

[0050] Step 402: Perform character-level encoding of the text to be predicted against the word table, similar to st...
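A minimal sketch of character-level encoding against a word table, assuming hypothetical `<pad>` and `<unk>` special tokens and optional fixed-length padding; the patent's actual table format is not given here:

```python
def build_vocab(corpus, specials=("<pad>", "<unk>")):
    """Assign an integer id to every distinct character, after the special tokens."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for text in corpus:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab

def encode_chars(text, vocab, max_len=None):
    """Map each character to its id; unseen characters map to <unk>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(ch, unk) for ch in text]
    if max_len is not None:
        # Crop long inputs and pad short ones to a fixed length.
        ids = ids[:max_len] + [vocab["<pad>"]] * (max_len - len(ids))
    return ids

vocab = build_vocab(["你好"])
print(encode_chars("你们好", vocab, max_len=5))  # [2, 1, 3, 0, 0]
```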

Abstract

The invention discloses a text prosody prediction method based on a multi-task multi-level model, comprising the following steps: step 401, obtaining the text to be predicted; step 402, performing character-level encoding of the text to be predicted against the word table; step 403, performing sequence prediction with a multi-task model; step 404, determining whether the sentence is a long sentence with no L2 or L3 boundary; and step 405, combining the L1, L2, and L3 outputs, selectively merging overlapping boundary positions according to the priority L3 > L2 > L1, and returning the result. The invention relates to the technical field of text prosody prediction. It solves the problems of errors in generated prosodic information, the inability to share information between levels, cumbersome training and parameter tuning, insufficiently accurate prediction of prosodic phrase and intonation phrase boundaries, and the tendency of long sentences to lack prosodic phrase or intonation phrase boundaries.
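Step 405 can be sketched as follows, assuming each level outputs one B/O tag per character with B marking a boundary after that character; the `#k` output notation is a hypothetical rendering format, not one stated in the patent:

```python
def merge_levels(l1, l2, l3):
    """Per-character boundary level (0 = none); on overlap L3 > L2 > L1."""
    merged = []
    for t1, t2, t3 in zip(l1, l2, l3):
        if t3 == "B":
            merged.append(3)
        elif t2 == "B":
            merged.append(2)
        elif t1 == "B":
            merged.append(1)
        else:
            merged.append(0)
    return merged

def render(text, levels):
    """Hypothetical output: append '#k' after a character carrying a level-k boundary."""
    return "".join(ch + (f"#{lv}" if lv else "") for ch, lv in zip(text, levels))

levels = merge_levels(["B", "B", "O", "B"], ["O", "B", "O", "B"], ["O", "O", "O", "B"])
print(render("我们去吧", levels))  # 我#1们#2去吧#3
```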

Description

Technical field

[0001] The invention relates to the technical field of text prosody prediction, in particular to a text prosody prediction method based on a multi-task multi-level model.

Background

[0002] In recent years, the rapid development of deep learning has brought great breakthroughs in speech synthesis. Prosody plays an important role in the naturalness of synthesized speech: prosodic information extracted from text provides highly effective features for the speech synthesis back end and improves the sense of pause and rhythm in the synthesized speech. More specifically, prosodic acoustics distinguishes three levels: prosodic words, prosodic phrases, and intonation phrases. Pause duration at these boundaries increases in that order, and prosodic phrase and intonation phrase boundaries in particular have a great impact on the perceived pauses and rhythm of speech.

[0003] At present, the most common prosody...
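To make the three levels concrete as three sequence-labeling tasks, the sketch below converts a prosody-annotated sentence into one B/O tag sequence per level. The `#1`/`#2`/`#3` markup (a style used by some Mandarin TTS corpora, assumed here) and the cumulative rule that a higher-level boundary also counts as a boundary at every lower level are assumptions:

```python
import re

def parse_annotation(annotated):
    """Split '我们#1明天#2...' into plain text and per-level B/O tag lists."""
    text, levels = [], []
    # Each match is one character, optionally followed by '#' and a level digit.
    for char, mark in re.findall(r"([^#])(?:#(\d))?", annotated):
        text.append(char)
        levels.append(int(mark) if mark else 0)
    tags = {f"L{k}": ["B" if lv >= k else "O" for lv in levels] for k in (1, 2, 3)}
    return "".join(text), tags

text, tags = parse_annotation("我们#1明天#2去#1公园#3")
print(text)        # 我们明天去公园
print(tags["L2"])  # ['O', 'O', 'O', 'B', 'O', 'O', 'B']
```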

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F40/289, G06F40/30, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/044, G06N3/045, Y02D10/00
Inventors: 周俊明 (Zhou Junming), 刘杰 (Liu Jie), 肖鉴津 (Xiao Jianjin), 黄博贤 (Huang Boxian)
Owner: 广州深声科技有限公司 (Guangzhou Shensheng Technology Co., Ltd.)