Personalized voice and video generation system based on phoneme posterior probability

A generation system based on phoneme posterior probability, applied in the field of voice and video. It addresses problems such as unnatural lip shapes, weak overall practicability, and difficulty in guaranteeing generation quality for users, and it reduces the amount of target-speaker video data required.

Pending Publication Date: 2020-03-13

深圳市声希科技有限公司

8 Cites, 15 Cited by

Problems solved by technology

[0004] The current mainstream approach to virtual image generation changes the expression of the virtual image in real time based on facial recognition. This approach suits two-dimensional images, but it struggles to generate a virtual image that resembles a real person.

In recent years, both academia and industry have been researching and developing virtual image generation technology based on modeling real people. The generated results still need improvement: lip shapes look unnatural, the voice sounds stiff, facial movements and voice are inconsistent, and the face, especially the lip region, is rendered at low resolution.

In addition, the technology requires a certain volume of video data from the target speaker. With insufficient data it is difficult to guarantee generation quality, which degrades the user experience. Overall practicability is weak, and the technology is inconvenient for users to operate.

Embodiment Construction

[0024] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0025] Referring to Figures 1-3, an embodiment of the present invention provides a technical solution: a personalized voice and video generation system based on phoneme posterior probability, which mainly includes the following steps:

[0026] S1. First, a speaker-independent automatic speech recognition (SI-ASR) system extracts phoneme posterior probabilities (PPGs) from the speech of the source speaker. Posterior probability-based methods are partly ...
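As an illustration of what step S1 produces, the following minimal sketch turns per-frame scores into phoneme posterior probabilities with a softmax. It is not the SI-ASR system from the source: the phoneme set `PHONEMES`, the `frame_logits` values, and the function names are all hypothetical, and a real system would obtain the scores from a trained acoustic model.

```python
import math

def softmax(logits):
    """Turn one frame's raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def phoneme_posteriors(frame_logits):
    """Return a (frames x phoneme classes) matrix of posterior probabilities."""
    return [softmax(frame) for frame in frame_logits]

# Hypothetical acoustic-model scores: 3 frames over 4 phoneme classes.
PHONEMES = ["sil", "a", "m", "i"]
frame_logits = [
    [4.0, 0.5, 0.1, 0.2],
    [0.3, 3.5, 0.4, 1.0],
    [0.2, 0.6, 3.0, 0.1],
]

ppg = phoneme_posteriors(frame_logits)

# Most likely phoneme per frame, for inspection only.
best = [PHONEMES[row.index(max(row))] for row in ppg]
```

Each row of `ppg` is a probability distribution over phoneme classes for one frame. Because such a representation describes what is being said rather than who says it, it can serve as a speaker-independent input to later stages.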

Abstract

The invention discloses a personalized voice and video generation system based on phoneme posterior probability. The system mainly comprises the following steps: S1, extracting phoneme posterior probabilities through an automatic speech recognition system; S2, training a recurrent neural network to learn the mapping between phoneme posterior probabilities and lip features, so that the network takes audio from any target speaker as input and outputs the corresponding lip features; S3, synthesizing the lip features into corresponding face images through face alignment, image fusion, optical flow, and other techniques; and S4, generating the final video of the speaker talking from the generated face sequence through dynamic programming and other techniques. The invention relates to the technical field of speech synthesis and speech conversion. Because the lip shape is generated from phoneme posterior probabilities, the required volume of target-speaker video data is greatly reduced. In addition, the video of the target speaker can be generated directly from text content, without additionally recording the speaker's audio.
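The abstract does not state what the dynamic programming in S4 optimizes. As one possible reading, labeled here as an assumption, the sketch below picks one candidate face frame per time step so that the sum of a per-step mismatch cost and a penalty for jumping between distant candidates is minimal. The function `select_frames`, the cost matrix, and `jump_penalty` are hypothetical and do not come from the source.

```python
def select_frames(match_cost, jump_penalty=1.0):
    """Pick one candidate index per time step with minimal total cost.

    match_cost[t][k] says how poorly candidate k fits step t (lower is better).
    Moving from candidate j to candidate k between consecutive steps costs
    jump_penalty * abs(k - j), which favors temporally smooth sequences.
    """
    n_cand = len(match_cost[0])
    cost = [list(match_cost[0])]  # best total cost ending at each candidate
    back = []                     # back-pointers for path recovery

    for t in range(1, len(match_cost)):
        row, ptr = [], []
        for k in range(n_cand):
            # Best predecessor for candidate k at step t.
            prev = min(
                range(n_cand),
                key=lambda j: cost[-1][j] + jump_penalty * abs(k - j),
            )
            row.append(cost[-1][prev] + jump_penalty * abs(k - prev)
                       + match_cost[t][k])
            ptr.append(prev)
        cost.append(row)
        back.append(ptr)

    # Trace the cheapest path backwards from the final step.
    k = min(range(n_cand), key=lambda j: cost[-1][j])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return path

# Hypothetical mismatch costs: 3 time steps, 3 candidate frames.
costs = [
    [0.0, 5.0, 5.0],
    [5.0, 0.0, 5.0],
    [0.5, 5.0, 0.0],
]
smooth_path = select_frames(costs, jump_penalty=1.0)
sticky_path = select_frames(costs, jump_penalty=10.0)
```

With a small penalty the path follows the best-matching candidate at each step; with a large penalty it stays on one candidate to avoid jumps. This trade-off between per-frame fit and temporal smoothness is the usual reason to prefer dynamic programming over a greedy per-frame choice.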

Description

Technical field

[0001] The invention relates to the technical field of voice and video, in particular to a personalized voice and video generation system based on phoneme posterior probability.

Background

[0002] With improvements in computing power, the collection of large amounts of Internet data, and breakthroughs in core algorithms, artificial intelligence has entered a new stage of development and is gradually changing the way humans and computers interact. An important part of human-computer interaction is simulating a real person's image to interact with users. The key technology is virtual image generation, which, combined with speech synthesis and speech conversion, can realize personalized speech and video synthesis.

[0003] Speech synthesis is a technology that converts text into speech. Speech conversion can be used to customize the timbre of synthesized speech. With the application of deep learning, the naturalnes...

Application Information

IPC(8): G10L13/08, G10L13/033, G10L25/30, G10L15/25

CPC: G10L13/08, G10L13/033, G10L25/30, G10L15/25

Inventor: 孙立发, 周艺超, 钟静华, 李坤, 胡景强, 刘鹏飞

Owner: 深圳市声希科技有限公司
