Speech-driven lip-synchronous face video synthesis algorithm based on cascaded convolutional LSTM

A lip-synchronization and video-synthesis technology, applied in the field of computer vision. It addresses problems such as the under-constrained mapping from speech to video by expanding the receptive field and increasing the depth of the convolutional network.
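The link between network depth and receptive field mentioned above can be illustrated with a small calculation. This is a minimal sketch of the standard receptive-field recurrence for stacked convolutions. The layer configurations are illustrative assumptions, not the layer sizes used in the patent, and `receptive_field` is a hypothetical helper name.

```python
def receptive_field(layers):
    """Return the receptive field (in input samples) of a stack of conv layers.

    Each layer is a (kernel_size, stride, dilation) tuple, applied in order.
    The layer values used below are assumptions chosen for illustration.
    """
    rf = 1    # receptive field of one unit, measured at the input
    jump = 1  # input-space distance between adjacent units of the current layer
    for kernel_size, stride, dilation in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


shallow = [(3, 1, 1)] * 2  # two 3-wide convolutions
deep = [(3, 1, 1)] * 4     # four 3-wide convolutions

print(receptive_field(shallow))  # 5
print(receptive_field(deep))     # 9
```

Under these assumptions, doubling the number of 3-wide layers widens the span of input each output unit can see from 5 to 9 samples, which is the sense in which a deeper network sees more speech context.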

Pending Publication Date: 2019-02-05

ZHEJIANG UNIV

13 Cites, 54 Cited by

AI Technical Summary

Problems solved by technology

It is challenging to recover high-fidelity, high-dimensional, low-frequency video directly from low-dimensional, high-frequency speech audio signals or text-to-speech audio signals. This is a severely under-constrained, ill-posed problem.
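The rate mismatch described above can be made concrete with a short sketch that groups audio feature frames by video frame. The rates (100 feature frames per second, 25 video frames per second), the 40-dimensional features, and the function name `group_features_per_video_frame` are assumptions for illustration. The source does not state them.

```python
import numpy as np


def group_features_per_video_frame(features, feature_rate=100.0, video_fps=25.0):
    """Split a (T, D) audio feature sequence into one chunk per video frame.

    feature_rate and video_fps are assumed values, not taken from the patent.
    Trailing feature frames that do not fill a whole video frame are dropped.
    """
    per_frame = int(round(feature_rate / video_fps))
    n_video_frames = features.shape[0] // per_frame
    trimmed = features[: n_video_frames * per_frame]
    return trimmed.reshape(n_video_frames, per_frame, features.shape[1])


feats = np.zeros((250, 40))  # 2.5 s of 40-dimensional features at the assumed rate
chunks = group_features_per_video_frame(feats)
print(chunks.shape)  # (62, 4, 40)
```

With these assumed rates, each video frame is paired with only four low-dimensional feature vectors, while the target for that frame is a full face image. This is one way to see why the mapping is under-constrained.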

Embodiment Construction

[0036] The technical solutions of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0037] In order to make the object, technical solution and advantages of the present invention clearer, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0038] An embodiment implementing the complete method of the present invention is as follows:

[0039] As shown in Figure 2, the following system modules are used:

[0040] The input module receives the audio signal of the user's input speech or of text-synthesized speech, and then sends it to the cascaded con...

Abstract

The invention discloses a speech-driven lip-synchronous face video synthesis algorithm based on a cascaded convolutional LSTM. A video of the target person speaking is captured as the background video. 3D face reconstruction on its image sequence yields the target's 3D face model and the facial animation vector sequence of the background video. Filter bank speech features are extracted from the audio signal. For training and testing, these features serve as the input of the cascaded convolutional long short-term memory network, and the facial animation vector sequence serves as the output. The facial animation vector sequence predicted from the audio signal then replaces the facial animation vector sequence of the target 3D face model to generate a new 3D face model, from which face images are rendered to synthesize a lip-synchronous face video. The invention retains more voiceprint information, applies a two-dimensional convolutional neural network to the filter bank speech features, expands the receptive field of the convolutional neural network, increases the network depth, and obtains an accurate lip-synchronous face video.
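The filter bank feature extraction step named in the abstract can be sketched as follows. This is a minimal NumPy illustration of log mel filter bank energies, a common form of filter bank speech features. The mel scale, the 16 kHz sample rate, the 25 ms window, the 10 ms hop, the 40 filters, and the function names are all assumptions, since the abstract does not specify them.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):    # rising edge of the triangle
            bank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):   # falling edge of the triangle
            bank[i - 1, k] = (right - k) / (right - center)
    return bank


def log_fbank(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512, n_filters=40):
    """Return log mel filter bank energies, shape (n_frames, n_filters).

    All default parameter values are assumptions, not values from the patent.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)                 # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filter_bank(n_filters, n_fft, sample_rate).T
    return np.log(energies + 1e-10)                              # avoid log(0)


t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz test tone
features = log_fbank(tone)
print(features.shape)  # (98, 40)
```

The result is a two-dimensional time-by-frequency array, which is the kind of input a two-dimensional convolutional network can process directly.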

Description

Technical field

[0001] The present invention relates to the fields of computer vision and audio signal processing, and in particular to a speech-driven lip-synchronous face video algorithm based on a cascaded convolutional long short-term memory network structure (cascaded convolutional LSTM).

Background technique

[0002] After years of exploration and development, computer vision has been applied in many fields such as digital entertainment, medical health, and security monitoring. Synthesizing realistic visual content not only has great commercial value, but is also eagerly anticipated by the industry. Many movie special effects would not be possible without computer-synthesized visual effects. At present, there is already a large amount of artificially synthesized video on the Internet. In addition, speech recognition and text-to-speech technologies have also been widely used in chatbots. The present invention hopes to make the online chat r...

Application Information

IPC(8): G06T13/40, G10L21/10, G10L21/0356

CPC: G06T13/40, G10L21/0356, G10L21/10, G10L2021/105, Y02D10/00

Inventor: 朱建科, 江泽胤子

Owner: ZHEJIANG UNIV
