Method for generating mouth shape video based on voice

A video and mouth-shape technology applied in the fields of computer vision and graphics processing. It addresses the problems of prior methods that cannot meet the basic requirements of video chat, are helpless when no reference images are available, and cannot guarantee real-time performance, and it achieves the effects of small memory usage, a small amount of computation, and real-time video.

Inactive Publication Date: 2018-09-18
NORTHWESTERN POLYTECHNICAL UNIV

AI Technical Summary

Problems solved by technology

The LSTM-based method selects a target mouth region from a dictionary of stored target frames instead of generating images, so a huge number of video frames of the specific target identity are needed to choose from. This requires large storage space and strong performance from video chat devices, which is difficult to satisfy in real life.
Moreover, this method only compensates the picture by selecting from existing image information, and it is helpless if there is no image information to choose from.
Garrido et al. in the literature "P. Gar




Embodiment Construction

[0015] The present invention will be further described below in conjunction with the accompanying drawings and embodiments; the present invention includes, but is not limited to, the following embodiments.

[0016] The present invention provides a method for generating lip-shape video based on speech. As shown in figure 1, it mainly includes the following steps:

[0017] 1. First, check the fluency of the current video call to determine whether the video picture has frozen. Since the speaker's mouth shape depends only on the currently spoken phoneme (the smallest unit of speech), a 0.35-second audio clip provides enough mouth-shape information. Furthermore, online video generally does not exceed 25 frames per second, and a person's mouth shape cannot change 25 times in one second. Therefore, set the time interval to 0.35 seconds, that is, compare the current picture with the picture from 0.35 seconds earlier. If the two are completely consistent, it can be judged that the video has frozen.
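A minimal sketch of this freeze check, assuming decoded frames arrive as NumPy arrays with timestamps; the class name, the history buffer, and the pixel-difference tolerance are illustrative assumptions, not details from the patent:

```python
# Sketch of the freeze check in step 1: compare the current frame with the
# frame from ~0.35 s earlier and flag a stall when they are (near-)identical.
# Buffer handling, tolerance, and names are illustrative assumptions.
import collections

import cv2
import numpy as np

FREEZE_WINDOW = 0.35  # seconds between the two compared frames


class FreezeDetector:
    def __init__(self, window: float = FREEZE_WINDOW, tol: float = 0.5):
        self.window = window
        self.tol = tol  # mean absolute grayscale difference treated as "identical"
        self.history = collections.deque()  # (timestamp, grayscale frame)

    def is_frozen(self, frame_bgr: np.ndarray, timestamp: float) -> bool:
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        self.history.append((timestamp, gray))

        # Discard frames much older than the comparison window.
        while self.history and timestamp - self.history[0][0] > 2 * self.window:
            self.history.popleft()

        # Pick the stored frame that is at least 0.35 s old and closest to that age.
        past = [(t, f) for t, f in self.history if timestamp - t >= self.window]
        if not past:
            return False  # not enough history yet
        _, old = past[-1]

        diff = np.mean(np.abs(gray.astype(np.float32) - old.astype(np.float32)))
        return diff < self.tol
```

At 25 frames per second this retains roughly 17 frames in memory, in line with the small-memory, small-computation goal stated above.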



Abstract

The invention provides a method for generating a mouth-shape video based on a voice. The method comprises the steps of: first, capturing the stationary picture when a video freeze occurs, detecting the face image in it, and calculating the MFCC coefficient matrix of the accompanying audio; then, processing these inputs with a trained deep network Speech2Vid model; and finally, aligning the generated face image by a similarity transformation and replacing the original face region to obtain a new static image, which is used as the next frame of the video. Because it does not depend on existing speech-video fragments, the method directly generates the stationary image containing the corresponding mouth-shape face as the next picture by learning the relation between the original audio and the mouth-shape changes in the video image. The method can effectively alleviate video freezes and improves people's video calling experience.
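A minimal sketch of this pipeline under stated assumptions: the Speech2Vid generation step is left as a placeholder, the facial landmark points used for the similarity alignment are assumed to come from an external detector, and all function names, paths, and parameters are illustrative rather than taken from the patent:

```python
# Sketch of the abstract's pipeline: MFCC extraction, face detection,
# mouth-frame generation (placeholder), and similarity-transform alignment.
import cv2
import librosa
import numpy as np


def compute_mfcc(audio_clip: np.ndarray, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """MFCC coefficient matrix of the ~0.35 s audio clip (n_mfcc x frames)."""
    return librosa.feature.mfcc(y=audio_clip, sr=sr, n_mfcc=n_mfcc)


def detect_face(frame_bgr: np.ndarray):
    """Bounding box (x, y, w, h) of the face in the frozen frame."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return faces[0]  # assume the first detection is the speaker


def generate_mouth_frame(mfcc: np.ndarray, identity_face: np.ndarray) -> np.ndarray:
    """Placeholder for the trained Speech2Vid-style generator described in
    the abstract; its architecture and weights are outside this excerpt."""
    raise NotImplementedError


def paste_aligned(frame_bgr: np.ndarray,
                  generated_face: np.ndarray,
                  src_pts: np.ndarray,
                  dst_pts: np.ndarray) -> np.ndarray:
    """Align the generated face to the original face position with a
    similarity transform and paste it back to form the new static image."""
    M, _ = cv2.estimateAffinePartial2D(src_pts, dst_pts)
    h, w = frame_bgr.shape[:2]
    warped = cv2.warpAffine(generated_face, M, (w, h))
    mask = warped.sum(axis=2) > 0
    out = frame_bgr.copy()
    out[mask] = warped[mask]
    return out
```

In use, compute_mfcc would run on the most recent ~0.35 s of audio, the placeholder generator would produce the mouth-shape face, and paste_aligned would write it back into the frozen frame to serve as the next video frame.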

Description

Technical field

[0001] The invention belongs to the technical fields of computer vision and graphics processing, and in particular relates to a method for generating lip-shape video based on speech.

Background technique

[0002] As video chat gradually replaces traditional voice communication, while people enjoy the fun of face-to-face conversations, some new problems have also arisen. Restricted by the different network environments in different regions, video chat is unstable under poor network conditions, which gives users a very bad experience. The audio data is small, so a smooth communication experience can be obtained even in a poor network environment, but the video may freeze because its files are relatively large. To solve this problem, the speaker's audio information can be analyzed to generate the corresponding mouth shape, so that the incoherent picture can be compensated and the video chat experience under poor network conditions can be improved.


Application Information

IPC(8): G06T11/00; G06K9/00; G06K9/62; G10L25/03; H04N7/14
CPC: H04N7/147; G06T11/00; G10L25/03; G06V40/161; G06V40/168; G06F18/214
Inventor: 李学龙, 王琦, 李欣
Owner: NORTHWESTERN POLYTECHNICAL UNIV