Prosodic labeling method, device and apparatus, and medium

A prosodic labeling technology, applied in the field of speech synthesis, that addresses problems such as poor prosodic labeling results, difficult system design, and the inability to screen out the correct prosodic information.

Active Publication Date: 2019-11-15
ZHEJIANG TONGHUASHUN INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] When the audio was not read according to the prosody predicted for the text, Solution 1 cannot ultimately screen out the correct prosodic information.
Solution 2 severs the intrinsic connection between speech and text, and therefore cannot achieve good prosodic labeling results.
Moreover, the prosodic labeling process in the existing solutio


Examples

Example Embodiment

[0046] The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.

[0047] Existing prosodic labeling schemes either have low labeling efficiency or break the intrinsic connection between the acoustic features of the prosody to be labeled and the corresponding text features. Moreover, the labeling process comprises multiple processing stages, and the components of each stage are constructed differently. Building them requires extensive domain knowledge, so the overall system is difficult to design and complex to implement....

Abstract

The application discloses a prosodic labeling method, device, apparatus, and medium. The method comprises the following steps: acquiring a first acoustic feature, a first text feature, and a first prosodic labeling result corresponding to a sample audio; training an end-to-end neural network with the first acoustic feature as the input of its encoder, the first text feature as the input of its decoder, and the first prosodic labeling result as its output, to obtain a trained end-to-end neural network; and, when a second acoustic feature and a second text feature of the prosody to be labeled are obtained, directly outputting a second prosodic labeling result using the trained end-to-end neural network. The method effectively fuses the acoustic features with the corresponding text features, thereby improving the accuracy of prosodic labeling.
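The data flow described in the abstract can be illustrated with a minimal, untrained sketch. It is not the applicant's implementation: the label inventory, the dot-product attention used to fuse text and acoustic features, the layer sizes, and every name in the code (`ToyProsodyLabeler`, `LABELS`, and so on) are assumptions made for illustration only. Training against the first prosodic labeling results (for example with a cross-entropy loss) is omitted; only the forward pass from acoustic and text features to one prosodic label per text token is shown.

```python
import math
import random

# Hypothetical label inventory: the source does not define the prosodic label set.
LABELS = ["none", "prosodic_word", "prosodic_phrase", "intonation_phrase"]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights; a real system would learn these during training."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyProsodyLabeler:
    """Untrained encoder-decoder: acoustic frames feed the encoder,
    text features feed the decoder (an illustrative assumption)."""

    def __init__(self, acoustic_dim, text_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.w_enc = random_matrix(hidden_dim, acoustic_dim, rng)
        self.w_dec = random_matrix(hidden_dim, text_dim, rng)
        self.w_out = random_matrix(len(LABELS), 2 * hidden_dim, rng)

    def encode(self, acoustic_frames):
        # One hidden vector per acoustic frame.
        return [[math.tanh(v) for v in matvec(self.w_enc, frame)]
                for frame in acoustic_frames]

    def decode(self, encoded, text_features):
        # Each text token attends over all encoded frames, then is classified.
        scale = math.sqrt(len(encoded[0]))
        distributions = []
        for token in text_features:
            query = [math.tanh(v) for v in matvec(self.w_dec, token)]
            scores = [sum(q * h for q, h in zip(query, frame)) / scale
                      for frame in encoded]
            weights = softmax(scores)
            context = [sum(w * frame[i] for w, frame in zip(weights, encoded))
                       for i in range(len(query))]
            # Fuse the text query with the attended acoustic context.
            distributions.append(softmax(matvec(self.w_out, query + context)))
        return distributions

    def label(self, acoustic_frames, text_features):
        distributions = self.decode(self.encode(acoustic_frames), text_features)
        return [LABELS[max(range(len(LABELS)), key=dist.__getitem__)]
                for dist in distributions]


if __name__ == "__main__":
    rng = random.Random(1)
    frames = [[rng.random() for _ in range(5)] for _ in range(12)]  # 12 frames
    tokens = [[rng.random() for _ in range(3)] for _ in range(4)]   # 4 tokens
    model = ToyProsodyLabeler(acoustic_dim=5, text_dim=3, hidden_dim=8)
    print(model.label(frames, tokens))  # one label per text token
```

Because the weights are random, the printed labels carry no meaning. The sketch only shows that the decoder produces each label from both the text feature and the acoustic frames, rather than from either one alone.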

Description

Technical field

[0001] The present application relates to the technical field of speech synthesis, and in particular to a prosodic labeling method, device, apparatus, and medium.

Background art

[0002] A speech synthesis sound library generally includes a large number of high-quality recorded audio clips, the corresponding transcribed texts, and prosodic annotations made on the transcribed texts according to the prosodic information of the recorded audio clips. How to perform prosodic labeling of a synthesis sound library automatically and accurately by computer has become an important technology in the field of speech synthesis.

[0003] Existing technical solution 1: first use a pre-trained text prosody prediction model to predict the prosodic information of the text, and then use the pre-recorded audio to verify and screen the predicted prosodic information, eliminating the incorrect prosodic information and keeping the correct prosodic information to ...

Application Information

IPC(8): G10L13/10, G10L25/24, G10L25/30
CPC: G10L13/10, G10L25/24, G10L25/30
Inventor: 谌明, 陆健, 徐欣康, 胡新辉
Owner: ZHEJIANG TONGHUASHUN INTELLIGENT TECH CO LTD