Chinese Mandarin character-to-pronunciation conversion method based on a self-attention mechanism

A Mandarin character-to-pronunciation conversion technology, applied in speech analysis, speech synthesis, computer components, etc., that can solve problems such as computational difficulty
CN111145718A · Active · Publication Date: 2020-05-12 · INST OF ACOUSTICS CHINESE ACAD OF SCI

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
INST OF ACOUSTICS CHINESE ACAD OF SCI
Publication Date
2020-05-12

Abstract

An embodiment of the invention provides a Chinese Mandarin character-to-pronunciation conversion method based on a self-attention mechanism. The method predicts, directly from a Chinese sentence, the pronunciation after tone sandhi has been applied. It combines multi-task learning and relative position encoding with a self-attention model: the self-attention mechanism captures the dependencies between characters in the input sentence, and part-of-speech tagging and three pinyin attributes are introduced as additional sub-tasks in the multi-task learning. Tone transitions are modeled with a conditional random field (CRF), and the positional information of the sequence is modeled by relative position encoding. Finally, the pronunciation can be taken from the main task's prediction, or determined jointly from the results of the three pinyin-attribute sub-tasks. The method substantially improves the performance of Chinese Mandarin character-to-pronunciation conversion.
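The role a CRF layer plays for tone transitions can be sketched with a small Viterbi decoder. This is an illustrative sketch only, not the patent's implementation: the function name `viterbi_decode`, the two-tone tag set, and all scores are assumptions chosen by hand, whereas a trained model would learn them.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions:   list of dicts, one per position, mapping tag -> score
    transitions: dict mapping (previous_tag, current_tag) -> score;
                 missing pairs are treated as 0.0
    """
    tags = list(emissions[0])
    best = {t: emissions[0][t] for t in tags}  # best score of a path ending in t
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in tags:
            # Pick the previous tag that maximizes path score + transition.
            prev = max(tags, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            new_best[cur] = best[prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hand-picked scores (assumption): each character on its own prefers tone 3,
# but the transition 3 -> 3 is penalized, as third-tone sandhi would suggest.
emissions = [{"2": 0.4, "3": 0.6}, {"2": 0.1, "3": 0.9}]
transitions = {("3", "3"): -1.0}

print(viterbi_decode(emissions, transitions))  # ['2', '3']
print(viterbi_decode(emissions, {}))           # ['3', '3']
```

With the transition penalty the decoder prefers the sequence 2, 3 over 3, 3, which is the kind of dependency between neighboring tones that per-character classification alone cannot express.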

Description

Technical field

[0001] The invention relates to the field of speech synthesis, in particular to a Chinese Mandarin character-to-pronunciation conversion method based on a self-attention mechanism.

Background art

[0002] Text-to-speech (TTS) technology is widely used in e-books, voice assistants, car navigation, voice customer service, and other products. In Chinese speech synthesis, whether the model is parametric or sequence-to-sequence, phoneme-level modeling units are compact enough to be trained effectively. The role of character-to-pronunciation conversion is to map Chinese characters to their pronunciations.

[0003] At the heart of character-to-pronunciation conversion are polyphone disambiguation and tone sandhi, and in some cases the pronunciation is determined by semantics. For example, the character 还 is read "huan2" when it means "to return" and "hai2" when it means "still". Some tone changes also depend on the tonal environment: for example, when two third-tone syllables occur in succession, the former is usually pronou...
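The two phenomena named in this paragraph can be illustrated with a toy sketch. It is not the patent's method (which uses a learned model): the lexicon entries, the function names, and the pairwise sandhi rule are assumptions for illustration. The rule used here is the commonly described third-tone sandhi, in which a third tone directly before another third tone is read as a second tone; it ignores prosodic grouping, which real systems must handle.

```python
# Toy word-level lexicon (hypothetical entries) showing that the reading of
# the polyphonic character 还 depends on the word it appears in.
TOY_LEXICON = {
    "归还": ["gui1", "huan2"],  # "to return"
    "还是": ["hai2", "shi4"],   # "still"
}


def lookup_word(word):
    """Return the toy pinyin reading for a word, or None if it is unknown."""
    return TOY_LEXICON.get(word)


def apply_third_tone_sandhi(syllables):
    """Naive rule: a third-tone syllable directly followed by another
    third-tone syllable is rewritten with tone 2."""
    result = []
    for i, syl in enumerate(syllables):
        next_is_third = i + 1 < len(syllables) and syllables[i + 1].endswith("3")
        if syl.endswith("3") and next_is_third:
            result.append(syl[:-1] + "2")
        else:
            result.append(syl)
    return result


print(lookup_word("归还"))                       # ['gui1', 'huan2']
print(apply_third_tone_sandhi(["ni3", "hao3"]))  # ['ni2', 'hao3']
```

A lookup table and a fixed rule like these break down on unseen words and on longer third-tone runs, which is the motivation for predicting the post-sandhi pronunciation directly from the sentence.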
