Chinese prosodic hierarchy prediction method and system based on self-attention

A prosodic level prediction technology, applied in speech synthesis, speech analysis, instruments, etc. It addresses the large number of Chinese word entries and the continued propagation of erroneous results, so as to avoid error propagation, improve model performance, and avoid the negative effects of word segmentation.

Pending Publication Date: 2020-06-30
INST OF ACOUSTICS CHINESE ACAD OF SCI +1

AI Technical Summary

Problems solved by technology

[0004] However, the above method has the following problems: 1) LSTM, as an RNN structure, needs the output of the previous time step to compute the output at the current time step. This sequential computation hinders parallelization and makes the path length between any two words O(n). 2) Training and running the prosody prediction model at word granularity means that the input text must be segmented first, so the quality of word segmentation directly affects the performance of prosodic level prediction. In addition, the number of Chinese words is huge, and storing their word vectors takes up a lot of storage space and computing resources, which is impractical for offline speech synthesis. 3) Level-by-level prosody prediction propagates erroneous results to the next level, leading to subsequent prediction errors.
[0005] Predicting the prosodic level of text is an essential step in a speech synthesis system, but the current mainstream method uses word-level features and therefore depends on the performance of the word segmentation system, and level-by-level prosody prediction propagates erroneous results.

Examples

Embodiment Construction

[0048] The present invention will be further described below in conjunction with the accompanying drawings.

[0049] The present invention proposes a Chinese prosody prediction method based on self-attention. The method takes character vectors (vectors of individual Chinese characters) as input features, models the dependencies between characters in the text through the self-attention mechanism, and sets an independent output layer for each prosodic level, so that all prosodic levels are predicted simultaneously. This achieves accurate prediction of the prosodic levels of the text while avoiding dependence on a word segmentation system.
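
The idea in [0049] can be illustrated with a minimal sketch: one self-attention layer over per-character input vectors, followed by a separate output layer for each prosodic level. This is not the patent's actual model. The single attention head, the vector size `dim`, the random weights, the two-class (boundary / no boundary) output per level, and all function names are assumptions made only for illustration.

```python
import math
import random

# Prosodic levels named in the patent background; each gets its own output layer.
LEVELS = ("prosodic_word", "prosodic_phrase", "intonation_phrase")

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # W is a list of rows; returns W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (an assumed simplification).

    Every character attends to every other character directly, and each
    output position is computed independently of the others.
    """
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d = len(Q[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * V[j][c] for j in range(len(V))) for c in range(d)])
    return outputs, weights

def predict_levels(X, params):
    """Shared attention layer, then one independent output layer per level."""
    H, attn = self_attention(X, params["Wq"], params["Wk"], params["Wv"])
    preds = {}
    for level in LEVELS:
        W_out = params["heads"][level]  # 2 x dim: [no boundary, boundary]
        preds[level] = [softmax(matvec(W_out, h)) for h in H]
    return preds, attn

rng = random.Random(0)
dim = 8
text = "今天天气很好"
# Hypothetical character vectors: random here, learned from unlabeled text in the patent.
embed = {ch: [rng.uniform(-1, 1) for _ in range(dim)] for ch in sorted(set(text))}
params = {
    "Wq": rand_matrix(dim, dim, rng),
    "Wk": rand_matrix(dim, dim, rng),
    "Wv": rand_matrix(dim, dim, rng),
    "heads": {level: rand_matrix(2, dim, rng) for level in LEVELS},
}
X = [embed[ch] for ch in text]
preds, attn = predict_levels(X, params)
for level in LEVELS:
    print(level, [round(p[1], 2) for p in preds[level]])
```

Because the three output layers read the same attention output, no level consumes another level's prediction, which is how simultaneous prediction avoids passing errors from one level to the next. With untrained random weights the printed probabilities carry no meaning.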

[0050] The present invention proposes a method for constructing a Chinese prosodic level prediction model based on self-attention, including: learning from a large amount of unlabeled text to obtain character vectors for individual characters; ... bit tag sequence; building a prosody prediction model based on the self-attention mechanism, and pre-training the model according to the word...

Abstract

The invention discloses a Chinese prosodic hierarchy prediction method based on self-attention. The method comprises the steps of learning from a large amount of unlabeled text to obtain character vectors for single characters, converting a to-be-predicted text into a character vector sequence using those vectors, inputting the sequence into a trained prosodic level prediction model, and outputting the word positions and prosodic levels of the text. Because the model takes character-granularity features as input, it maintains prediction performance while avoiding dependence on a word segmentation system and its possible negative effects. The model directly models the relation between any two characters in a text through a self-attention mechanism, and its computation can be parallelized. Pre-training on additional data improves model performance, so that every prosodic level of the to-be-processed text can be accurately predicted at the same time and error propagation is avoided.
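
The input and output ends of the pipeline described in the abstract can be sketched as follows: the text is split into characters without any word segmentation, and per-character boundary decisions for the three prosodic levels are merged into one marked-up string. The `#1`/`#2`/`#3` markers, the 0/1 flag format, and the function names are assumptions for illustration; the abstract does not specify the output format, and the flags below are hand-written rather than produced by a model.

```python
def to_characters(text):
    """Split text into characters; no word segmentation step is needed."""
    return [ch for ch in text if not ch.isspace()]

def mark_boundaries(chars, boundaries):
    """Merge per-level boundary flags into one annotated string.

    boundaries maps a level name to a list of 0/1 flags, one per character,
    where 1 means "a boundary of this level follows this character".
    When several levels fire at the same position, the highest level wins.
    """
    markers = (
        ("intonation_phrase", "#3"),
        ("prosodic_phrase", "#2"),
        ("prosodic_word", "#1"),
    )
    pieces = []
    for i, ch in enumerate(chars):
        pieces.append(ch)
        for level, tag in markers:
            if boundaries[level][i]:
                pieces.append(tag)
                break
    return "".join(pieces)

chars = to_characters("今天天气很好")
# Hand-written example flags (hypothetical, not model output).
flags = {
    "prosodic_word":     [0, 1, 0, 1, 0, 1],
    "prosodic_phrase":   [0, 0, 0, 1, 0, 1],
    "intonation_phrase": [0, 0, 0, 0, 0, 1],
}
marked = mark_boundaries(chars, flags)
print(marked)  # 今天#1天气#2很好#3
```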

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, in particular to a method and system for predicting Chinese prosody levels based on self-attention.

Background technique

[0002] In a speech synthesis system, predicting the prosodic hierarchical structure of the input text to be synthesized has always been a crucial step, and the prediction result is used as part of the linguistic features for modeling acoustic features and duration. Therefore, the accuracy of prosodic level prediction largely determines the naturalness of synthesized speech, and accurate prosodic level prediction is of great significance.

[0003] The current mainstream method is to use the bidirectional long short-term memory network (BLSTM) to model different prosodic levels with word vectors as input, that is, to train one model each for prosodic words, prosodic phrases, and intonation phrases, and use the low-level prediction results A lev...

Application Information

IPC(8): G10L13/02, G10L13/10
CPC: G10L13/02, G10L13/10, Y02D10/00
Inventor: 张鹏远, 卢春晖, 颜永红
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI