A deep learning-based emotional speech synthesis method and device
A technology combining speech synthesis and deep learning, applied to speech synthesis, speech analysis, instruments, and similar fields, to enrich the emotion of conversational speech, improve its naturalness, and improve the experience of human-computer communication.
Active Publication Date: 2022-07-05
SUNING CLOUD COMPUTING CO LTD
AI Technical Summary
Problems solved by technology
[0005] In order to solve the above technical problems, the present invention provides a deep learning-based emotional speech synthesis method and device that can synthesize emotional speech without manually labeling emotions one by one.
Method used
Examples
Embodiment 1
[0078] As shown in Figure 1, this embodiment provides a deep learning-based emotional speech synthesis method, which belongs to the field of speech synthesis. With this method, emotional speech can be synthesized without manually labeling emotions, and the naturalness of the synthesized speech is effectively improved.
[0079] As shown in Figures 1 and 2, the method includes the following steps:
[0080] S1. Extract the text information to be processed and the preceding information of the text information to be processed.
[0081] Specifically, when the processing object is a text object, the preceding information includes preceding text information;
[0082] When the processing object is a speech object or a video object, the preceding information includes preceding text information and preceding speech information.
[0083] It should be noted that, in this step, extracting text information from text objects, extracting text information and vo...
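Step S1 above distinguishes between text objects (preceding text only) and speech or video objects (preceding text plus preceding audio). A minimal sketch of that extraction step, with hypothetical turn and context structures not specified in the patent, might look like:

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """Preceding information for the text to be processed."""
    preceding_text: list = field(default_factory=list)
    preceding_audio: list = field(default_factory=list)  # only filled for speech/video turns

def extract_inputs(dialogue, index, history=2):
    """Step S1 (sketch): return the text to be processed and its preceding information.

    `dialogue` is assumed to be a list of turns, each a dict like
    {"type": "text"|"speech"|"video", "text": "...", "audio": b"..."}.
    For speech/video turns, the preceding information also carries
    the audio of earlier turns, as paragraph [0082] describes.
    """
    current = dialogue[index]["text"]
    ctx = Context()
    # Collect up to `history` preceding turns as context.
    for turn in dialogue[max(0, index - history):index]:
        ctx.preceding_text.append(turn["text"])
        if turn["type"] in ("speech", "video"):
            ctx.preceding_audio.append(turn.get("audio", b""))
    return current, ctx
```

The window size `history` and the turn schema are illustrative assumptions; the patent only specifies which kinds of preceding information accompany each object type.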
Embodiment 2
[0127] In order to implement the deep learning-based emotional speech synthesis method of Embodiment 1, this embodiment provides a deep learning-based emotional speech synthesis device 100.
[0128] Figure 5 is a schematic diagram of the structure of the deep learning-based emotional speech synthesis device 100. As shown in Figure 5, the device 100 includes at least:
[0129] Extraction module 1: used to extract the text information to be processed and the preceding information of the text information to be processed, and the preceding information includes the preceding text information;
[0130] Emotional feature information generation module 2: used to generate emotional feature information through a pre-built first model with the text information to be processed and the preceding information as input;
[0131] Emotional speech synthesis module 3: used for synthesizing emotional speech through the pre-trained second model using the emotional feature informa...
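The three modules form a simple pipeline: extract inputs, generate emotion features with the first model, then condition speech synthesis on those features with the second model. The sketch below shows that wiring with stand-in models (the hashing-based feature model and sine-wave "synthesizer" are placeholders of my own, not the patent's trained networks):

```python
import numpy as np

class EmotionFeatureModel:
    """Stand-in for the pre-built first model: maps the text to be
    processed plus its preceding information to an emotion embedding."""
    def predict(self, text, preceding):
        # Placeholder: a real model would be a trained network; here we
        # hash the inputs into a fixed-size vector purely for illustration.
        seed = abs(hash((text, tuple(preceding)))) % (2 ** 32)
        rng = np.random.default_rng(seed)
        return rng.standard_normal(8)

class SpeechSynthesisModel:
    """Stand-in for the pre-trained second model: conditions synthesis
    on the emotion embedding and returns a waveform array."""
    def synthesize(self, text, emotion):
        t = np.linspace(0, 1, 16000)
        pitch = 220 + 10 * float(emotion[0])  # emotion shifts base pitch
        return np.sin(2 * np.pi * pitch * t)

class EmotionalTTSDevice:
    """Device 100 (sketch): extraction -> emotion features -> emotional speech."""
    def __init__(self):
        self.first_model = EmotionFeatureModel()
        self.second_model = SpeechSynthesisModel()

    def run(self, text, preceding):
        emotion = self.first_model.predict(text, preceding)  # module 2
        return self.second_model.synthesize(text, emotion)   # module 3
```

Only the module interfaces and their order follow the patent text; all internals here are assumptions made to keep the example runnable.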
Abstract
The invention discloses a deep learning-based emotional speech synthesis method and device, belonging to the field of speech synthesis. The method comprises at least the following steps: extracting the text information to be processed and the preceding information of that text, the preceding information including preceding text information; taking the text information to be processed and the preceding information as input and generating emotional feature information with a pre-built first model; and taking the emotional feature information and the text information to be processed as input and synthesizing emotional speech with a pre-trained second model. On the basis of text information alone, emotional speech is synthesized by deep learning without manually labeling each acoustic pronunciation in advance. The method reduces labor costs while further reducing labeling errors, enriches the emotion of dialogue speech, improves the naturalness and fluency of synthesized speech, improves the human-computer communication experience, and is widely applicable.
Description
technical field
[0001] The present invention relates to the field of speech synthesis, and in particular to a deep learning-based emotional speech synthesis method and device.
Background technique
[0002] With current social development, people hope that machines can replace humans in simple and frequent tasks, such as broadcasting and basic customer service, and that natural, harmonious communication with machines can be achieved. Voice, as an important communication method in human society, largely determines whether such natural and harmonious human-machine communication can be realized. Therefore, speech synthesis has important research significance in the fields of affective computing and signal processing, and delicate emotional expression can greatly improve the naturalness of synthesized speech.
[0003] The existing practice is generally based on labeling information, and each acoustic pronunciation in each speech is...
Claims
Application Information
Patent Timeline
Application Date: The date an application was filed.
Publication Date: The date a patent or application was officially published.
First Publication Date: The earliest publication date of a patent with the same application number.
Issue Date: Publication date of the patent grant document.
PCT Entry Date: The entry date of the PCT national phase.
Estimated Expiry Date: The statutory expiry date of a patent right according to the Patent Law; it is the longest term of protection the patent right can achieve, assuming it is not terminated early for other reasons (term extensions have been taken into account).
Invalid Date: The actual expiry date, based on the effective date or publication date of the legal transaction data of an invalidated patent.