A deep learning-based emotional speech synthesis method and device

A speech synthesis and deep learning technology, applied to speech synthesis, speech analysis, and instruments, that enriches the emotion of conversational speech, improves naturalness, and improves the human-computer communication experience

Active Publication Date: 2022-07-05
SUNING CLOUD COMPUTING CO LTD

AI Technical Summary

Problems solved by technology

[0005] To solve the above technical problems, the present invention provides a deep learning-based emotional speech synthesis method and device that can synthesize emotional speech without manually labeling emotions one by one.

Method used



Examples


Embodiment 1

[0078] As shown in Figure 1, this embodiment provides a deep learning-based emotional speech synthesis method, which belongs to the field of speech synthesis. With this method, emotional speech can be synthesized without manually marking emotions, and the naturalness of the emotion in the synthesized speech can be effectively improved.

[0079] As shown in Figures 1 and 2, the method includes the following steps:

[0080] S1. Extract the text information to be processed and the preceding information of the text information to be processed.

[0081] Specifically, when the processing object is a text object, the preceding information includes preceding text information;

[0082] When the processing object is a speech object or a video object, the preceding information includes preceding text information and preceding speech information.

[0083] It should be noted that, in this step, extracting text information from text objects, extracting text information and vo...
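The staged flow described above (extract the text and its preceding context, generate emotional features with a first model, then synthesize speech conditioned on those features with a second model) can be sketched as follows. This is a minimal illustrative sketch only: the class and function names are hypothetical, and the two trained models of the patent are replaced here by trivial stand-in functions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Utterance:
    """Processing object: the text to be processed plus its preceding information."""
    text: str
    preceding_text: List[str] = field(default_factory=list)
    preceding_audio: Optional[bytes] = None  # present only for speech/video objects

def first_model(utt: Utterance) -> dict:
    """Stand-in for the pre-built 'first model': (text, context) -> emotion features.
    A toy keyword heuristic replaces the trained network here."""
    context = " ".join(utt.preceding_text + [utt.text]).lower()
    emotion = "happy" if "great" in context else "neutral"
    return {"emotion": emotion, "intensity": 0.8 if emotion != "neutral" else 0.2}

def second_model(features: dict, text: str) -> str:
    """Stand-in for the pre-trained 'second model': (features, text) -> speech.
    Returns a tagged string in place of an actual waveform."""
    return f"<speech emotion={features['emotion']!r}>{text}</speech>"

def synthesize(utt: Utterance) -> str:
    # S1 (extraction) is assumed done when constructing `utt`.
    features = first_model(utt)               # S2: generate emotional feature information
    return second_model(features, utt.text)   # S3: synthesize emotional speech

wav = synthesize(Utterance("That is great news!", preceding_text=["How was the exam?"]))
print(wav)  # <speech emotion='happy'>That is great news!</speech>
```

Note that the emotion label is inferred from the text and its preceding context rather than read from manual annotations, which is the point of the method.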

Embodiment 2

[0127] To implement the deep learning-based emotional speech synthesis method of Embodiment 1, this embodiment provides a deep learning-based emotional speech synthesis apparatus 100.

[0128] Figure 5 is a schematic diagram of the structure of the deep learning-based emotional speech synthesis device 100. As shown in Figure 5, the device 100 includes at least:

[0129] Extraction module 1: used to extract the text information to be processed and the preceding information of the text information to be processed, and the preceding information includes the preceding text information;

[0130] Emotional feature information generation module 2: used to generate emotional feature information through a pre-built first model with the text information to be processed and the preceding information as input;

[0131] Emotional speech synthesis module 3: used for synthesizing emotional speech through the pre-trained second model using the emotional feature informa...
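The three modules above can be sketched as a simple composition; again, all class names and the trivial per-module logic are hypothetical stand-ins for the patent's trained models, shown only to illustrate how the device chains Modules 1 through 3.

```python
class ExtractionModule:
    """Module 1: split a dialogue into the text to process and its preceding info."""
    def run(self, dialogue):
        *preceding, current = dialogue
        return current, preceding

class EmotionFeatureModule:
    """Module 2: pre-built first model (stubbed with a constant label)."""
    def run(self, text, preceding):
        return {"emotion": "neutral", "context_len": len(preceding)}

class SynthesisModule:
    """Module 3: pre-trained second model (stubbed as a tagged string)."""
    def run(self, features, text):
        return f"[{features['emotion']}] {text}"

class EmotionalTTSDevice:
    """Device 100: composes the three modules in sequence."""
    def __init__(self):
        self.extract = ExtractionModule()
        self.emotion = EmotionFeatureModule()
        self.synth = SynthesisModule()

    def __call__(self, dialogue):
        text, preceding = self.extract.run(dialogue)
        return self.synth.run(self.emotion.run(text, preceding), text)

print(EmotionalTTSDevice()(["Hello", "How are you?", "Fine, thanks."]))
# prints "[neutral] Fine, thanks."
```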



Abstract

The invention discloses a deep learning-based emotional speech synthesis method and device, belonging to the field of speech synthesis. The method comprises at least the following steps: extracting the text information to be processed and the preceding information of that text information, the preceding information including preceding text information; generating emotional feature information through a pre-built first model, with the text information to be processed and the preceding information as input; and synthesizing emotional speech through a pre-trained second model, with the emotional feature information and the text information to be processed as input. On the basis of text information alone, deep learning enables the synthesis of emotional speech without manually labeling each acoustic pronunciation in advance. The method reduces labeling errors and labor costs, enriches the emotion of dialogue speech, improves the naturalness and fluency of the synthesized speech, improves the human-computer communication experience, and has wide applicability.

Description

technical field

[0001] The present invention relates to the field of speech synthesis, and in particular to a deep learning-based emotional speech synthesis method and device.

Background technique

[0002] With current social development, people hope that machines can replace humans in simple and repetitive tasks, such as broadcasting and basic customer service, and that communication with machines can be natural and harmonious. Voice, as an important communication method in human society, largely determines whether this natural and harmonious human-machine communication can be realized. Therefore, speech synthesis has important research significance in the fields of affective computing and signal processing, and delicate emotional expression can greatly improve the naturalness of synthesized speech.

[0003] The existing practice is generally based on labeling information, and each acoustic pronunciation in each speech is...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC (8): G10L13/02; G10L13/08; G10L25/27; G10L25/63
CPC: G10L13/02; G10L13/08; G10L25/27; G10L25/63
Inventor 钟雨崎
Owner SUNING CLOUD COMPUTING CO LTD