Polyphone disambiguation and rhythm control combined method and system and electronic equipment

A joint polyphone disambiguation and prosody control technique, applied in the field of Chinese speech synthesis, that addresses problems such as error accumulation across modules, which degrades the quality of synthesized speech.

Active Publication Date: 2021-07-30
HISENSE VISUAL TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] At present, the general practice is to use two independent polyphone disambiguation models and prosodic prediction models in the front-end processing to realize polyphon


Embodiment Construction

[0054] To make the purposes, technical solutions, and advantages of the exemplary embodiments of the present application clearer, the technical solutions in those embodiments are described clearly and completely below with reference to the accompanying drawings. The described exemplary embodiments are only some of the embodiments of the present application, not all of them.

[0055] Based on the exemplary embodiments shown in this application, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the scope of protection of this application. In addition, although the disclosures in this application are introduced according to one or several exemplary examples, it should be understood that each aspect of these disclosures may also independently constitute a complete technical solution.

[0056] It sho...

Abstract

The invention provides a joint method and system for polyphone disambiguation and prosody control, and an electronic device. The method comprises: obtaining a text to be processed and its parts of speech, converting them into character vectors and part-of-speech vectors, and concatenating these to obtain a spliced vector; training with an alternating training strategy to obtain a joint model, a first group of weights, and a second group of weights, where the joint model comprises a first neural network and a second neural network; encoding the spliced vector with the joint model to obtain a first in-sentence code and a second in-sentence code for each character; computing a polyphone weighted sum from the first group of weights and obtaining the pronunciation probability distribution of each polyphone through a first fully connected layer; using a mask to remove incorrect pronunciations from that distribution and obtain the final pronunciation prediction; and computing a prosody weighted sum from the second group of weights and obtaining the prosodic pause level through a second fully connected layer and a conditional random field. This eliminates the error accumulation caused by pipeline-style processing and increases the computation speed of text-to-speech conversion.
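The concatenation, weighted-sum, and masking steps in the abstract can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the helper names (`concat`, `weighted_sum`, `masked_prediction`), the softmax normalization of the weights, the assumption that the weighted sum combines the two in-sentence codes, and the toy pronunciation inventory are all hypothetical choices made for illustration. The neural networks, fully connected layers, and conditional random field are omitted.

```python
import math


def concat(char_vec, pos_vec):
    """Concatenate ("splice") a character vector with its part-of-speech vector."""
    return list(char_vec) + list(pos_vec)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(encodings, raw_weights):
    """Combine several encodings of one character into a single vector.

    Assumption: the learned weights are normalized with a softmax before use.
    """
    if len(encodings) != len(raw_weights):
        raise ValueError("need exactly one weight per encoding")
    weights = softmax(raw_weights)
    dim = len(encodings[0])
    return [sum(w * enc[i] for w, enc in zip(weights, encodings)) for i in range(dim)]


def masked_prediction(probs, allowed):
    """Zero out pronunciations the character cannot take, then renormalize.

    Returns the index of the most probable remaining pronunciation and the
    renormalized distribution.
    """
    masked = [p if ok else 0.0 for p, ok in zip(probs, allowed)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("mask removed every candidate pronunciation")
    masked = [p / total for p in masked]
    best = max(range(len(masked)), key=masked.__getitem__)
    return best, masked


if __name__ == "__main__":
    # Toy inventory shared by all polyphones (hypothetical).
    inventory = ["hang2", "xing2", "le4", "yue4"]
    # The character 行 can only be read as hang2 or xing2.
    allowed = [True, True, False, False]

    spliced = concat([0.1, 0.2], [1.0])               # character + POS vector
    first_code, second_code = [1.0, 0.0], [0.0, 1.0]  # stand-ins for the two codes
    mixed = weighted_sum([first_code, second_code], [0.0, 0.0])

    # Stand-in for the output of the first fully connected layer.
    probs = softmax([1.0, 2.0, 3.0, 0.5])
    best, masked = masked_prediction(probs, allowed)
    print(spliced, mixed, inventory[best], masked)
```

In this toy run the unmasked distribution favors `le4`, which is not a valid reading of 行. The mask removes it, so the prediction falls back to the best valid candidate, `xing2`.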

Description

Technical Field

[0001] The present application relates to the technical field of Chinese speech synthesis, and in particular to a joint method and system for polyphone disambiguation and prosody control, and to electronic equipment.

Background

[0002] To avoid mispronounced polyphonic characters or overly flat speech in text-to-speech output, and to make the synthesized speech more accurate and "personalized", polyphone disambiguation and prosodic pause control are often added during processing.

[0003] In the traditional approach, text-to-speech consists mainly of two parts: front-end text-to-phoneme conversion and back-end phoneme-to-speech-signal conversion. The back end operates on acoustic features and supports end-to-end training and synthesis, while the front end includes the clause segmentation model, text normalization model, natural tone sandhi model, polyphone di...

Application Information

IPC(8): G06F40/284, G06F40/44, G06N3/04, G06N3/08
CPC: G06F40/284, G06F40/44, G06N3/08, G06N3/044, G06N3/045
Inventor: 马明, 刘宇
Owner: HISENSE VISUAL TECH CO LTD