
Speech synthesis method and device, electronic equipment and computer readable storage medium

A speech synthesis technology, applicable to speech synthesis methods and devices, electronic equipment and computer-readable storage media. It addresses the poor naturalness and readability of synthesized speech data, and produces synthesized speech with a high degree of naturalness that meets practical application requirements.

Pending Publication Date: 2022-04-15
TENCENT TECH (SHENZHEN) CO LTD
Cites: 0; Cited by: 0

AI Technical Summary

Problems solved by technology

At present, although some existing technologies can synthesize a speaker's voice data, the naturalness and readability of the synthesized voice data are poor. How to improve the quality of synthesized speaker voice remains a problem in urgent need of a solution.



Examples


[0093] Figure 1 shows a schematic flowchart of a speech synthesis method provided by an embodiment of the present application. The method can be executed by a user terminal or a server. As shown in Figure 1, the method provided in this embodiment of the present application may include the following steps S110 to S140.

[0094] Step S110: Obtain a video to be processed that contains at least one target object.

[0095] This embodiment of the present application does not limit the source of the video to be processed. The video to be processed may correspond to one or more objects, that is, it may be a video recorded while one or more objects are speaking. The target object may be every object in the video to be processed, or one or more specified objects in it, that is, the objects of interest. In practical applications, if you only pay attention to the speech content of some objects in the video, you ...
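As a rough illustration of step S110, the sketch below picks the target objects from a video, either every object that appears or only the objects of interest, and groups each target's lip crops into a lip image sequence. It assumes that face and lip detection have already been run so that each frame maps an object ID to its lip-region pixels; the `Frame` record, `select_target_objects`, and `lip_sequences` are hypothetical names, and the source does not describe how objects are detected or tracked.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Frame:
    # Hypothetical pre-annotated frame: object ID -> flattened lip-region pixels.
    lips: Dict[str, List[float]] = field(default_factory=dict)


def select_target_objects(
    frames: Sequence[Frame], of_interest: Optional[Sequence[str]] = None
) -> List[str]:
    """Return every object seen in the video (in order of first appearance),
    or only those listed in `of_interest` when it is given."""
    seen: List[str] = []
    for frame in frames:
        for obj_id in frame.lips:
            if obj_id not in seen:
                seen.append(obj_id)
    if of_interest is None:
        return seen
    wanted = set(of_interest)
    return [obj_id for obj_id in seen if obj_id in wanted]


def lip_sequences(
    frames: Sequence[Frame], targets: Sequence[str]
) -> Dict[str, List[List[float]]]:
    """Collect each target's lip crops, in frame order, into one sequence."""
    return {t: [f.lips[t] for f in frames if t in f.lips] for t in targets}


frames = [
    Frame({"A": [0.1, 0.2], "B": [0.5, 0.5]}),
    Frame({"A": [0.3, 0.1]}),
    Frame({"B": [0.9, 0.0], "C": [0.2, 0.2]}),
]
print(select_target_objects(frames))              # ['A', 'B', 'C']
print(select_target_objects(frames, ["C", "A"]))  # ['A', 'C']
print(lip_sequences(frames, ["A"]))               # {'A': [[0.1, 0.2], [0.3, 0.1]]}
```

Returning objects in order of first appearance keeps the output deterministic; whether the described method orders or tracks objects this way is not stated in the source.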


Abstract

The embodiments of the invention provide a speech synthesis method and device, electronic equipment and a computer-readable storage medium, and relate to the technical fields of artificial intelligence, multimedia, speech synthesis and cloud technology. The method comprises the following steps: performing feature extraction on the lip image sequence of each target object in a video to be processed to obtain the lip language features of that target object; for each target object, predicting the voice content features of the target object from its lip language features through a first voice content prediction network; extracting timbre features from the reference voice data of each target object; and, for each target object, predicting the audio features of the target object through an audio feature prediction network based on its voice content features and timbre features, and obtaining from these audio features the target voice data of the target object corresponding to the video to be processed. The method provided by the embodiments of the invention can generate high-quality voice data from video.
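The per-object data flow in the abstract can be traced with a toy, dependency-free sketch. Every function below is a hypothetical stand-in: the source names a lip feature extractor, a "first voice content prediction network", a timbre extractor and an "audio feature prediction network" but gives no architectures, so simple arithmetic (per-frame mean and motion, one tanh linear layer, averaging, concatenation) takes the place of each trained network. The final step, turning audio features into target voice data, is omitted because the source does not say how it is done.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def extract_lip_features(lip_frames: Sequence[Vector]) -> List[Vector]:
    # Stand-in visual front end: per-frame mean intensity plus change from the previous frame.
    feats: List[Vector] = []
    prev = None
    for frame in lip_frames:
        mean = sum(frame) / len(frame)
        feats.append([mean, 0.0 if prev is None else mean - prev])
        prev = mean
    return feats


def predict_content(lip_feats: Sequence[Vector], weights: Sequence[Vector]) -> List[Vector]:
    # Stand-in for the first voice content prediction network: one tanh linear layer per frame.
    return [[math.tanh(sum(w * x for w, x in zip(row, f))) for row in weights] for f in lip_feats]


def extract_timbre(reference_frames: Sequence[Vector]) -> Vector:
    # Stand-in timbre encoder: average the reference voice frames into one utterance-level vector.
    n = len(reference_frames)
    return [sum(col) / n for col in zip(*reference_frames)]


def predict_audio_features(content: Sequence[Vector], timbre: Vector) -> List[Vector]:
    # Stand-in for the audio feature prediction network: condition every content frame on the timbre.
    return [list(c) + list(timbre) for c in content]


def synthesize_audio_features(
    lip_seqs: Dict[str, List[Vector]],
    references: Dict[str, List[Vector]],
    weights: Sequence[Vector],
) -> Dict[str, List[Vector]]:
    """Run the per-object pipeline for every target object."""
    out: Dict[str, List[Vector]] = {}
    for obj_id, lip_frames in lip_seqs.items():
        content = predict_content(extract_lip_features(lip_frames), weights)
        timbre = extract_timbre(references[obj_id])
        out[obj_id] = predict_audio_features(content, timbre)
    return out


audio = synthesize_audio_features(
    {"A": [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]},  # lip image sequence of object A
    {"A": [[0.2, 0.4], [0.4, 0.8]]},              # reference voice frames of object A
    weights=[[1.0, 0.0], [0.0, 1.0]],
)
print(len(audio["A"]), len(audio["A"][0]))  # 3 4
```

Keeping one timbre vector per object and attaching it to every content frame is one common way to condition on speaker identity; it is an assumption here, not a design the source confirms.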

Description

Technical Field

[0001] The present application relates to the fields of artificial intelligence, multimedia technology, speech synthesis, and cloud technology. Specifically, it relates to a speech synthesis method and device, electronic equipment, and a computer-readable storage medium.

Background

[0002] With breakthroughs and rapid progress in speech technology research, its importance to computing and to social life has become increasingly prominent. Because it is simple and convenient to deploy, speech technology now appears in many everyday scenarios.

[0003] Speech synthesis is a highly practical and important part of speech technology, and generating high-quality synthesized speech has long been an important research topic. At present, although some existing technologies can synthesize a speaker's voice data, the naturalness an...


Application Information

Patent type & authority: Application (China)
IPC(8): G10L13/027, G10L13/08, G10L25/30, G06N3/08
Inventors: 王迪松 (Wang Disong), 阳珊 (Yang Shan), 苏丹 (Su Dan), 俞栋 (Yu Dong)
Owner: TENCENT TECH (SHENZHEN) CO LTD