Personalized voice and video generation system based on phoneme posterior probability

This technology, which combines phoneme posterior probability with a generation system, is applied in the field of voice and video. It addresses problems such as unnatural lip shapes, poor overall practicability, and difficulty in guaranteeing the quality of user-generated results, and it reduces the amount of target-speaker video data required.

Pending Publication Date: 2020-03-13
深圳市声希科技有限公司

AI Technical Summary

Problems solved by technology

[0004] The current mainstream approach to virtual image generation changes the expression of the virtual image in real time based on facial recognition. This approach suits two-dimensional images, but it has difficulty generating a virtual image that resembles a real person.
In recent years, both academia and industry have been researching and developing virtual image generation technology based on real-life modeling. At present, the generation effect still needs to be fu



Examples


Embodiment Construction

[0024] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0025] Referring to Figures 1-3, an embodiment of the present invention provides the following technical solution: a personalized voice and video generation system based on phoneme posterior probability, which mainly comprises the following steps:

[0026] S1. First, phonetic posteriorgrams (PPGs), that is, frame-level phoneme posterior probabilities, are extracted from the source speaker's speech using a speaker-independent automatic speech recognition (SI-ASR) system. Posterior probability-based methods are partly ...
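As a rough illustration of what step S1 produces, the sketch below turns per-frame phoneme scores into a PPG by applying a softmax to each frame. The logit values, the four phoneme classes, and the function names `softmax` and `logits_to_ppg` are hypothetical; the source does not describe the SI-ASR model, so its output is simply assumed here to be a matrix of frame-level logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one frame's phoneme logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_to_ppg(frame_logits):
    """Convert a (frames x phoneme classes) logit matrix into a PPG.

    Each output row is a probability distribution over phoneme classes
    for one audio frame.
    """
    return [softmax(frame) for frame in frame_logits]

# Assumed stand-in for SI-ASR acoustic-model output:
# 3 frames, 4 phoneme classes (values are made up).
frame_logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.2, 3.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.0],
]

ppg = logits_to_ppg(frame_logits)

# Most likely phoneme class per frame (first index wins on ties).
top_class = [row.index(max(row)) for row in ppg]
```

Because every row is normalized over phoneme classes rather than tied to a voice, the same representation can in principle be computed for any speaker, which is what makes it usable as a speaker-independent input to the later steps.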



Abstract

The invention discloses a personalized voice and video generation system based on phoneme posterior probability. The system mainly comprises the following steps: S1, extracting phoneme posterior probabilities through an automatic speech recognition system; S2, training a recurrent neural network to learn the mapping between phoneme posterior probabilities and lip features, so that audio of any target speaker can be input to the network to output the corresponding lip features; S3, synthesizing the lip features into corresponding face images through face alignment, image fusion, optical flow and other techniques; and S4, generating the final video of the speaker talking from the generated face sequence through dynamic programming and other techniques. The invention relates to the technical field of speech synthesis and speech conversion. Because the lip shape is generated from phoneme posterior probabilities, the amount of target-speaker video data required is greatly reduced. In addition, the video of the target speaker can be generated directly from text content, without additionally recording the speaker's audio.
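To make step S2 more concrete, here is a minimal sketch of the forward pass of a simple (Elman-style) recurrent network that maps a PPG sequence to per-frame lip features. It is only an illustration under stated assumptions: the source does not specify the network architecture, the lip feature definition, or the training procedure, so the layer sizes, the hand-picked weights, and the names `matvec` and `rnn_forward` are all hypothetical, and no training is shown.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def rnn_forward(ppg, w_in, w_rec, w_out, bias):
    """Run a simple recurrent network over a PPG sequence.

    hidden_t = tanh(w_in @ frame_t + w_rec @ hidden_{t-1} + bias)
    lip_t    = w_out @ hidden_t
    """
    hidden = [0.0] * len(bias)
    outputs = []
    for frame in ppg:
        pre = [a + b + c for a, b, c in
               zip(matvec(w_in, frame), matvec(w_rec, hidden), bias)]
        hidden = [math.tanh(p) for p in pre]
        outputs.append(matvec(w_out, hidden))
    return outputs

# Toy sizes (assumed): 3 phoneme classes, 2 hidden units, 2 lip features.
w_in = [[0.5, -0.2, 0.1],
        [0.0, 0.3, -0.4]]
w_rec = [[0.1, 0.0],
         [0.0, 0.1]]
w_out = [[1.0, 0.0],
         [0.0, 1.0]]
bias = [0.0, 0.0]

# Two PPG frames; each row sums to 1.
ppg_frames = [[0.8, 0.1, 0.1],
              [0.1, 0.7, 0.2]]

lip_features = rnn_forward(ppg_frames, w_in, w_rec, w_out, bias)
```

The recurrent term `w_rec @ hidden` is what lets each frame's lip output depend on preceding frames, which is the usual motivation for choosing a recurrent model for a sequence-to-sequence mapping of this kind.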

Description

Technical field

[0001] The invention relates to the technical field of voice and video, in particular to a personalized voice and video generation system based on phoneme posterior probability.

Background technique

[0002] With the improvement of computing power, the collection of a large amount of Internet data, and the breakthrough of core algorithms, artificial intelligence has entered a new stage of development and is gradually changing the way of human-computer interaction. An important part of the human-computer interaction process is to simulate real-life images to interact with users. The key technology is virtual image generation technology, combined with speech synthesis and speech conversion technologies, which can realize personalized speech and video synthesis.

[0003] Speech synthesis is a technology that converts text into speech. Speech conversion can be used to customize timbres for synthesized speech. With the application of deep learning, the naturalnes...


Application Information

IPC(8): G10L13/08, G10L13/033, G10L25/30, G10L15/25
CPC: G10L13/08, G10L13/033, G10L25/30, G10L15/25
Inventor 孙立发, 周艺超, 钟静华, 李坤, 胡景强, 刘鹏飞
Owner 深圳市声希科技有限公司