Speaking face video generation method and device based on convolutional neural network

A convolutional neural network and speaker technology, applied in the field of voice-driven speaking face video generation, which addresses the problem of low authenticity in generated speaking face videos.

Active Publication Date: 2021-09-10
ANHUI UNIVERSITY


Problems solved by technology

[0005] The technical problem to be solved by the present invention is that prior-art voice-driven talking face video generation methods cannot generate a talking face video that is both high in definition and synchronized in lip movement and voice, resulting in low authenticity of the generated speaking face video.

Method used



Examples


Embodiment 1

[0049] As shown in figure 1, a speaking face video generation method based on a convolutional neural network, the method comprising:

[0050] S1: Construct the data set. The specific process is: about 200 hours of raw video data are collected at a frame rate of 25 fps. The MTCNN model is used to detect facial key points in the high-definition news anchor video, obtaining the coordinates of 48 key points, which are then compared against the key points of the target person's face by a similarity measure. The similarity threshold is set to 0.8; when the computed similarity is greater than 0.8, the person in the video and the target person are considered the same person. The positions of video frames with high face similarity in the original video are recorded, and FFMPEG is used to cut the target anchor's video segments out of the original video according to the recorded frame positions. The DLIB model is used to ident...
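The frame-selection step above can be sketched as follows. The excerpt does not specify the similarity measure applied to the 48 key points, so this sketch substitutes a hypothetical inverse-mean-distance score; `select_target_frames` and `keypoint_similarity` are illustrative names, not from the patent:

```python
import math

def keypoint_similarity(a, b):
    """Similarity in [0, 1] between two equal-length lists of (x, y) key points.

    Hypothetical measure: 1 / (1 + mean Euclidean distance) over
    corresponding points; identical key-point sets score exactly 1.0.
    """
    assert len(a) == len(b)
    mean_dist = sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mean_dist)

def select_target_frames(frame_keypoints, target_keypoints, threshold=0.8):
    """Indices of frames whose detected face matches the target person.

    Frames scoring above the threshold (0.8 in the patent) are recorded
    so the matching segments can later be cut from the original video.
    """
    return [i for i, kps in enumerate(frame_keypoints)
            if keypoint_similarity(kps, target_keypoints) > 0.8 or
               keypoint_similarity(kps, target_keypoints) >= threshold]
```

Once the matching frame positions are recorded, the corresponding segments can be extracted losslessly with FFMPEG, e.g. `ffmpeg -i raw.mp4 -ss <start> -to <end> -c copy clip.mp4`.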

Embodiment 2

[0065] Corresponding to Embodiment 1 of the present invention, Embodiment 2 of the present invention also provides a device for generating a talking face video based on a convolutional neural network, the device comprising:

[0066] A data set construction module, used to construct the data set;

[0067] A lip sync module, used to design the lip sync discriminator;

[0068] A first training module, used to train the lip sync discriminator with the data set to obtain a trained lip sync discriminator;

[0069] A speaking face generation network construction module, used to construct the speaking face generation network, the network comprising a speech encoder, a super-resolution module, a face encoder, a face decoder, a face visual discriminator, and the pre-trained lip sync discriminator; speech is input to the speech encoder and the lip sync discriminator, the face picture is input to the super-resolution module and the face visual discriminator, and the super-resolution module reconstructs the face...
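The module wiring described in [0069] can be sketched as follows. Only the data flow is shown; the excerpt truncates before giving any layer details, so each module is a placeholder callable supplied by the caller, and the class name is illustrative:

```python
class SpeakingFaceGenerator:
    """Sketch of the generation network's wiring (module internals omitted).

    Per the device description: speech feeds the speech encoder (and, in
    training, the pre-trained lip sync discriminator); the face picture
    feeds the super-resolution module (and the face visual discriminator);
    the decoder fuses the two feature streams.
    """

    def __init__(self, speech_encoder, super_resolution,
                 face_encoder, face_decoder,
                 lip_sync_discriminator, visual_discriminator):
        self.speech_encoder = speech_encoder
        self.super_resolution = super_resolution
        self.face_encoder = face_encoder
        self.face_decoder = face_decoder
        # Pre-trained and used only as a training-time loss signal.
        self.lip_sync_discriminator = lip_sync_discriminator
        self.visual_discriminator = visual_discriminator

    def generate(self, speech, face_picture):
        speech_feat = self.speech_encoder(speech)
        hires_face = self.super_resolution(face_picture)
        face_feat = self.face_encoder(hires_face)
        return self.face_decoder(speech_feat, face_feat)
```

Keeping the two discriminators out of the `generate` path reflects their usual role as training-time critics rather than inference-time components.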



Abstract

The invention discloses a speaking face video generation method and device based on a convolutional neural network. The method comprises the following steps: constructing a data set; designing a lip sync discriminator; training the lip sync discriminator with the data set to obtain a trained lip sync discriminator; constructing a speaking face generation network; training the speaking face generation network with the data set to obtain a trained speaking face generation network; and inputting the target voice and a face picture of the target person into the trained speaking face generation network to generate a video in which the target person speaks the target voice. The invention has the advantage of generating a speaking face video that is high in definition, synchronous in lip movement and voice, and high in authenticity.
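The six steps of the abstract can be outlined as one pipeline. Every function below is an illustrative stub standing in for the patent's CNN modules; the names are hypothetical, and the training loops are elided:

```python
# Step stubs (placeholders for the patent's actual components).

def construct_dataset():                      # step 1: build the data set
    return ["(speech, face-frame) pairs"]

def design_lip_sync_discriminator():          # step 2: lip sync discriminator
    return lambda speech, lips: 1.0           # placeholder sync scorer

def train(model, dataset):                    # steps 3 and 5: training
    return model                              # training loop omitted

def build_generation_network(lip_sync_disc):  # step 4: generation network
    return lambda speech, face: f"video of {face} saying: {speech}"

def generate_video(target_speech, face_picture):
    """Steps 1-6 of the abstract, run end to end."""
    dataset = construct_dataset()
    disc = train(design_lip_sync_discriminator(), dataset)
    gen = train(build_generation_network(disc), dataset)
    return gen(target_speech, face_picture)   # step 6: inference
```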

Description

Technical field

[0001] The present invention relates to the field of voice-driven speaking face video generation, and more specifically to a method and device for generating a speaking face video based on a convolutional neural network.

Background technique

[0002] Speech-driven talking face generation aims to generate, given any speech, a talking video whose content corresponds to that speech. In recent years, voice-driven talking face video generation has been a hot research topic in deep learning and has been widely applied in animated character synthesis, virtual interaction, movie dubbing, and other fields.

[0003] Speech-driven face generation is a multi-modal generation task that maps audio (auditory) information to visual information. Existing methods achieve good results on low-resolution images, but the definition of faces generated at high resolution is low; in particular, the teeth area appears bl...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/00; G06K9/62; G06N3/04; G06N3/08; G10L21/10
CPC: G06N3/08; G10L21/10; G10L2021/105; G06N3/045; G06F18/22; G06F18/214
Inventor: 李腾, 刘晨然, 王妍
Owner: ANHUI UNIVERSITY