Speaking face video generation method and device based on convolutional neural network

A convolutional neural network and speaker technology, applied in the field of voice-driven speaking face video generation, which addresses the problem of low authenticity in generated speaking face videos.

Active Publication Date: 2021-09-10
ANHUI UNIVERSITY


Problems solved by technology

[0005] The technical problem to be solved by the present invention is that prior-art voice-driven talking face video generation methods cannot generate a talking face video that is both high in definition and synchronized in lip movement and voice, resulting in low authenticity of the generated speaking face video.

Method used



Examples


Embodiment 1

[0049] As shown in figure 1, a speaking face video generation method based on a convolutional neural network, the method comprising:

[0050] S1: Construct the data set. The specific process is: about 200 hours of raw video data are collected at a frame rate of 25 fps. The MTCNN model is used to detect facial key points in the high-definition news anchor video, obtaining the coordinates of 48 key points, which are then compared against the key points of the target person's face by a similarity measure. The similarity threshold is set to 0.8; when the computed similarity is greater than 0.8, the person in the video and the target person are considered the same person. The positions of video frames with high face similarity in the original video are recorded, and FFMPEG is used to cut the target anchor's video segments out of the original video according to the recorded frame positions. The DLIB model is used to ident...
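The frame-selection step above can be sketched as follows. The excerpt does not specify the similarity measure applied to the 48 key points, so this sketch substitutes a hypothetical inverse-mean-distance score; `select_target_frames` and `keypoint_similarity` are illustrative names, not from the patent:

```python
import math

def keypoint_similarity(a, b):
    """Similarity in [0, 1] between two equal-length lists of (x, y) key points.

    Hypothetical measure: 1 / (1 + mean Euclidean distance) over
    corresponding points; identical key-point sets score exactly 1.0.
    """
    assert len(a) == len(b)
    mean_dist = sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mean_dist)

def select_target_frames(frame_keypoints, target_keypoints, threshold=0.8):
    """Indices of frames whose detected face matches the target person.

    Frames scoring above the threshold (0.8 in the patent) are recorded
    so the matching segments can later be cut from the original video.
    """
    return [i for i, kps in enumerate(frame_keypoints)
            if keypoint_similarity(kps, target_keypoints) > 0.8 or
               keypoint_similarity(kps, target_keypoints) >= threshold]
```

Once the matching frame positions are recorded, the corresponding segments can be extracted losslessly with FFMPEG, e.g. `ffmpeg -i raw.mp4 -ss <start> -to <end> -c copy clip.mp4`.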

Embodiment 2

[0065] Corresponding to Embodiment 1 of the present invention, Embodiment 2 of the present invention also provides a device for generating a talking face video based on a convolutional neural network, the device comprising:

[0066] A data set construction module, used to construct the data set;

[0067] A lip sync module, used to design the lip sync discriminator;

[0068] A first training module, used to train the lip sync discriminator with the data set to obtain a trained lip sync discriminator;

[0069] A speaking face generation network construction module, used to construct the speaking face generation network, the network comprising a speech encoder, a super-resolution module, a face encoder, a face decoder, a face visual discriminator, and the pre-trained lip sync discriminator; speech is input to the speech encoder and the lip sync discriminator, the face picture is input to the super-resolution module and the face visual discriminator, and the super-resolution module reconstructs the face...
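The module wiring described in [0069] can be sketched as follows. Only the data flow is shown; the excerpt truncates before giving any layer details, so each module is a placeholder callable supplied by the caller, and the class name is illustrative:

```python
class SpeakingFaceGenerator:
    """Sketch of the generation network's wiring (module internals omitted).

    Per the device description: speech feeds the speech encoder (and, in
    training, the pre-trained lip sync discriminator); the face picture
    feeds the super-resolution module (and the face visual discriminator);
    the decoder fuses the two feature streams.
    """

    def __init__(self, speech_encoder, super_resolution,
                 face_encoder, face_decoder,
                 lip_sync_discriminator, visual_discriminator):
        self.speech_encoder = speech_encoder
        self.super_resolution = super_resolution
        self.face_encoder = face_encoder
        self.face_decoder = face_decoder
        # Pre-trained and used only as a training-time loss signal.
        self.lip_sync_discriminator = lip_sync_discriminator
        self.visual_discriminator = visual_discriminator

    def generate(self, speech, face_picture):
        speech_feat = self.speech_encoder(speech)
        hires_face = self.super_resolution(face_picture)
        face_feat = self.face_encoder(hires_face)
        return self.face_decoder(speech_feat, face_feat)
```

Keeping the two discriminators out of the `generate` path reflects their usual role as training-time critics rather than inference-time components.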



Abstract

The invention discloses a speaking face video generation method and device based on a convolutional neural network. The method comprises the following steps: constructing a data set; designing a lip sync discriminator; training the lip sync discriminator with the data set to obtain a trained lip sync discriminator; constructing a speaking face generation network; training the speaking face generation network with the data set to obtain a trained speaking face generation network; and inputting the target voice and a face picture of the target person into the trained speaking face generation network to generate a video in which the target person speaks the target voice. The invention has the advantage of generating a speaking face video that is high in definition, synchronous in lip movement and voice, and high in authenticity.
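The six steps of the abstract can be outlined as one pipeline. Every function below is an illustrative stub standing in for the patent's CNN modules; the names are hypothetical, and the training loops are elided:

```python
# Step stubs (placeholders for the patent's actual components).

def construct_dataset():                      # step 1: build the data set
    return ["(speech, face-frame) pairs"]

def design_lip_sync_discriminator():          # step 2: lip sync discriminator
    return lambda speech, lips: 1.0           # placeholder sync scorer

def train(model, dataset):                    # steps 3 and 5: training
    return model                              # training loop omitted

def build_generation_network(lip_sync_disc):  # step 4: generation network
    return lambda speech, face: f"video of {face} saying: {speech}"

def generate_video(target_speech, face_picture):
    """Steps 1-6 of the abstract, run end to end."""
    dataset = construct_dataset()
    disc = train(design_lip_sync_discriminator(), dataset)
    gen = train(build_generation_network(disc), dataset)
    return gen(target_speech, face_picture)   # step 6: inference
```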

Description

Technical field

[0001] The present invention relates to the field of voice-driven speaking face video generation, and more specifically to a method and device for generating a speaking face video based on a convolutional neural network.

Background technique

[0002] Speech-driven talking face generation aims to generate, given any speech, a talking video whose content corresponds to that speech. In recent years, voice-driven talking face video generation has been a hot research topic in deep learning and has been widely applied in animated character synthesis, virtual interaction, movie dubbing, and other fields.

[0003] Speech-driven face generation is a multi-modal generation task that maps audio (auditory) information to visual information. Existing methods achieve good results on low-resolution images, but the definition of faces generated at high resolution is low; in particular, the teeth area appears bl...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/00; G06K9/62; G06N3/04; G06N3/08; G10L21/10
CPC: G06N3/08; G10L21/10; G10L2021/105; G06N3/045; G06F18/22; G06F18/214
Inventor: 李腾, 刘晨然, 王妍
Owner: ANHUI UNIVERSITY