Method for generating mouth shape video based on voice

A video and mouth-shape technology applied in the fields of computer vision and graphics processing. It addresses the problems of prior methods that cannot meet the basic requirements of video chat, are helpless when no reference images are available, and cannot guarantee real-time performance, and it achieves the effects of small memory usage, a small amount of computation, and real-time video.

Inactive Publication Date: 2018-09-18
NORTHWESTERN POLYTECHNICAL UNIV

AI Technical Summary

Problems solved by technology

The LSTM-based method selects a target mouth region from a dictionary of stored target frames instead of generating images, so a huge number of video frames of the specific target identity are needed to choose from. This requires large storage space and strong performance from video chat devices, which is difficult to satisfy in real life.
Moreover, this method only compensates the picture by selecting from existing image information, and it is helpless if there is no image information to choose from.
Garrido et al. in the literature "P. Gar




Embodiment Construction

[0015] The present invention will be further described below in conjunction with the accompanying drawings and embodiments; the present invention includes, but is not limited to, the following embodiments.

[0016] The present invention provides a method for generating lip-shape video based on speech. As shown in figure 1, it mainly includes the following steps:

[0017] 1. First, check the fluency of the current video call to determine whether the video picture has frozen. Since the speaker's mouth shape depends only on the currently spoken phoneme (the smallest unit of speech), a 0.35-second audio clip provides enough mouth-shape information. Furthermore, online video generally does not exceed 25 frames per second, and a person's mouth shape cannot change 25 times in one second. Therefore, set the time interval to 0.35 seconds, that is, compare the current picture with the picture from 0.35 seconds earlier. If the two are completely consistent, it can be judged that the video has frozen.
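A minimal sketch of this freeze check, assuming decoded frames arrive as NumPy arrays with timestamps; the class name, the history buffer, and the pixel-difference tolerance are illustrative assumptions, not details from the patent:

```python
# Sketch of the freeze check in step 1: compare the current frame with the
# frame from ~0.35 s earlier and flag a stall when they are (near-)identical.
# Buffer handling, tolerance, and names are illustrative assumptions.
import collections

import cv2
import numpy as np

FREEZE_WINDOW = 0.35  # seconds between the two compared frames


class FreezeDetector:
    def __init__(self, window: float = FREEZE_WINDOW, tol: float = 0.5):
        self.window = window
        self.tol = tol  # mean absolute grayscale difference treated as "identical"
        self.history = collections.deque()  # (timestamp, grayscale frame)

    def is_frozen(self, frame_bgr: np.ndarray, timestamp: float) -> bool:
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        self.history.append((timestamp, gray))

        # Discard frames much older than the comparison window.
        while self.history and timestamp - self.history[0][0] > 2 * self.window:
            self.history.popleft()

        # Pick the stored frame that is at least 0.35 s old and closest to that age.
        past = [(t, f) for t, f in self.history if timestamp - t >= self.window]
        if not past:
            return False  # not enough history yet
        _, old = past[-1]

        diff = np.mean(np.abs(gray.astype(np.float32) - old.astype(np.float32)))
        return diff < self.tol
```

At 25 frames per second this retains roughly 17 frames in memory, in line with the small-memory, small-computation goal stated above.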



Abstract

The invention provides a method for generating a mouth-shape video based on a voice. The method comprises the steps of: first, capturing the stationary picture when a video freeze occurs, detecting the face image in it, and calculating the MFCC coefficient matrix of the accompanying audio; then, processing these inputs with a trained deep network Speech2Vid model; and finally, aligning the generated face image by a similarity transformation and replacing the original face region to obtain a new static image, which is used as the next frame of the video. Because it does not depend on existing speech-video fragments, the method directly generates the stationary image containing the corresponding mouth-shape face as the next picture by learning the relation between the original audio and the mouth-shape changes in the video image. The method can effectively alleviate video freezes and improves people's video calling experience.
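A minimal sketch of this pipeline under stated assumptions: the Speech2Vid generation step is left as a placeholder, the facial landmark points used for the similarity alignment are assumed to come from an external detector, and all function names, paths, and parameters are illustrative rather than taken from the patent:

```python
# Sketch of the abstract's pipeline: MFCC extraction, face detection,
# mouth-frame generation (placeholder), and similarity-transform alignment.
import cv2
import librosa
import numpy as np


def compute_mfcc(audio_clip: np.ndarray, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """MFCC coefficient matrix of the ~0.35 s audio clip (n_mfcc x frames)."""
    return librosa.feature.mfcc(y=audio_clip, sr=sr, n_mfcc=n_mfcc)


def detect_face(frame_bgr: np.ndarray):
    """Bounding box (x, y, w, h) of the face in the frozen frame."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return faces[0]  # assume the first detection is the speaker


def generate_mouth_frame(mfcc: np.ndarray, identity_face: np.ndarray) -> np.ndarray:
    """Placeholder for the trained Speech2Vid-style generator described in
    the abstract; its architecture and weights are outside this excerpt."""
    raise NotImplementedError


def paste_aligned(frame_bgr: np.ndarray,
                  generated_face: np.ndarray,
                  src_pts: np.ndarray,
                  dst_pts: np.ndarray) -> np.ndarray:
    """Align the generated face to the original face position with a
    similarity transform and paste it back to form the new static image."""
    M, _ = cv2.estimateAffinePartial2D(src_pts, dst_pts)
    h, w = frame_bgr.shape[:2]
    warped = cv2.warpAffine(generated_face, M, (w, h))
    mask = warped.sum(axis=2) > 0
    out = frame_bgr.copy()
    out[mask] = warped[mask]
    return out
```

In use, compute_mfcc would run on the most recent ~0.35 s of audio, the placeholder generator would produce the mouth-shape face, and paste_aligned would write it back into the frozen frame to serve as the next video frame.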

Description

Technical field

[0001] The invention belongs to the technical fields of computer vision and graphics processing, and in particular relates to a method for generating lip-shape video based on speech.

Background technique

[0002] As video chat gradually replaces traditional voice communication, while people enjoy the fun of face-to-face conversations, some new problems have also arisen. Restricted by the different network environments in different regions, video chat is unstable under poor network conditions, which gives users a very bad experience. The audio data is small, so a smooth communication experience can be obtained even in a poor network environment, but the video may freeze because its files are relatively large. To solve this problem, the speaker's audio information can be analyzed to generate the corresponding mouth shape, so that the incoherent picture can be compensated and the video chat experience under poor network conditions can be improved.


Application Information

IPC(8): G06T11/00; G06K9/00; G06K9/62; G10L25/03; H04N7/14
CPC: H04N7/147; G06T11/00; G10L25/03; G06V40/161; G06V40/168; G06F18/214
Inventor: 李学龙, 王琦, 李欣
Owner: NORTHWESTERN POLYTECHNICAL UNIV