Voice-driven 3D virtual human expression voice and picture synchronization method and system based on deep learning

A deep learning-based audio-video synchronization technology, applied in the fields of speech recognition, computer graphics, computer vision, and speech synthesis, that achieves good scalability.

Pending Publication Date: 2020-11-27

超维视界(北京)传媒科技有限公司

Cites: 0. Cited by: 4.

AI Technical Summary

Problems solved by technology

[0008] Existing avatars lack natural synchronization of facial expressions with audio and video, real-time interaction ability, and the ability to learn in order to improve that synchronization. To overcome these problems, the present invention provides a self-supervised learning method for virtual human audio-video synchronization, which learns from a large amount of face video data to improve the virtual human's lip animation, making it more natural and human-like.

Embodiment Construction

[0047] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below through specific embodiments and accompanying drawings.

[0048] The deep learning-based voice-driven 3D virtual human facial expression audio-video synchronization system of the present invention includes a video analysis module, a parameter extraction module, a speech synthesis module, a speech signal processing module, a parameter prediction module, a parameter filtering module and a rendering module. The modules fall into two groups, used in training mode and working mode respectively. Training mode uses the video analysis module, the parameter extraction module, the speech signal processing module and the parameter prediction module. Working mode uses the speech synthesis module, the speech signal processing module, the parameter prediction module, the parameter filtering modul...
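As a minimal sketch, the module grouping described in [0048] can be written down as two sets. The enum and set names below are illustrative and do not come from the patent. Because the source text is cut off after the parameter filtering module, the working-mode set lists only the modules named before the truncation.

```python
from enum import Enum


class Module(Enum):
    """The seven modules named in [0048]."""
    VIDEO_ANALYSIS = "video analysis"
    PARAMETER_EXTRACTION = "parameter extraction"
    SPEECH_SYNTHESIS = "speech synthesis"
    SPEECH_SIGNAL_PROCESSING = "speech signal processing"
    PARAMETER_PREDICTION = "parameter prediction"
    PARAMETER_FILTERING = "parameter filtering"
    RENDERING = "rendering"


# Modules the text lists for training mode.
TRAINING_MODE = frozenset({
    Module.VIDEO_ANALYSIS,
    Module.PARAMETER_EXTRACTION,
    Module.SPEECH_SIGNAL_PROCESSING,
    Module.PARAMETER_PREDICTION,
})

# Modules the text names for working mode before it is truncated.
# Anything listed after "parameter filtering" is not visible in the source,
# so it is deliberately left out here.
WORKING_MODE_NAMED = frozenset({
    Module.SPEECH_SYNTHESIS,
    Module.SPEECH_SIGNAL_PROCESSING,
    Module.PARAMETER_PREDICTION,
    Module.PARAMETER_FILTERING,
})

# Modules that appear in both lists.
SHARED = TRAINING_MODE & WORKING_MODE_NAMED
```

Intersecting the two sets shows that the speech signal processing module and the parameter prediction module are used in both modes, which matches the idea that the model trained in one mode is the model queried in the other.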

Abstract

The invention relates to a deep learning-based method and system for synchronizing the expressions of a voice-driven 3D virtual human with audio and video. The method comprises the following steps: extracting the logarithmic amplitude spectrum of a voice signal as the voice signal feature; inputting the voice signal feature into a trained parameter prediction model, which outputs expression parameter values, wherein the parameter prediction model is a neural network model trained on the natural label pairs formed by the voice signal and the image signal in video data; filtering the expression parameter values output by the parameter prediction model; and rendering a 3D character model with the filtered expression parameter values to achieve audio-video synchronization of the 3D virtual character's expressions. The system comprises a video analysis module, a parameter extraction module, a voice synthesis module, a voice signal processing module, a parameter prediction module, a parameter filtering module and a rendering module. According to the invention, the lip animation of the virtual human is improved by learning from a large amount of face video data, so that it is more natural and human-like.
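The feature extraction and filtering steps of the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the frame length, hop size, Hann window and naive DFT are choices made here to keep the example dependency-free, and the exponential moving average stands in for the filter, whose type the abstract does not specify. The trained prediction model and the renderer are omitted.

```python
import cmath
import math


def split_frames(signal, frame_len, hop):
    """Cut a 1-D signal into overlapping frames (frame_len and hop are assumed values)."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def log_magnitude_spectrum(frame, eps=1e-8):
    """Return log(|DFT| + eps) for the non-negative frequency bins of one frame."""
    n = len(frame)
    # Hann window to reduce spectral leakage (an assumed choice).
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive O(n^2) DFT; a real system would use an FFT routine instead.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(math.log(abs(acc) + eps))
    return spectrum


def smooth_parameters(params_per_frame, alpha=0.5):
    """Exponential moving average over time, applied to each parameter.

    Assumed stand-in for the filtering step; alpha is a hypothetical setting.
    """
    smoothed = []
    prev = None
    for params in params_per_frame:
        if prev is None:
            prev = list(params)
        else:
            prev = [alpha * p + (1 - alpha) * q for p, q in zip(params, prev)]
        smoothed.append(prev)
    return smoothed


# Usage: a sine wave that completes 4 cycles per 32-sample frame.
signal = [math.sin(2 * math.pi * 4 * i / 32) for i in range(64)]
features = [log_magnitude_spectrum(f) for f in split_frames(signal, 32, 16)]
```

Each entry of `features` is one feature vector per audio frame, which is the form a frame-wise prediction model would consume. Smoothing the predicted parameters over time trades a small amount of latency for less frame-to-frame jitter in the rendered face.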

Description

Technical field

[0001] The invention relates to the fields of computer graphics, computer vision, speech recognition and speech synthesis, among others. Specifically, it relates to a method and system that use a deep neural network to fit the relationship between speech and the Blend Shape values of a 3D model, and thereby synchronize the expressions of a speech-driven 3D virtual human with audio and video.

Background technique

[0002] At present, there are several types of voice-driven methods for generating virtual human facial animation:

[0003] (1) A neural network generates, from speech, the vertex coordinates of a 3D model with a fixed topology, and these vertex coordinates can be displayed as facial animation on the DI4D PRO system.

[0004] (2) Speech drives the avatar through an adversarial network to generate different 2D images, which are views of a 3D model from different angles.

[0005] (3) Speech is split into phonemes, and each phoneme corresponds to an animation cl...
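For readers unfamiliar with Blend Shape values, the sketch below shows the standard linear blend shape formula, in which each output vertex is the neutral vertex plus a weighted sum of per-shape offsets. The function name and the tiny two-vertex mesh with `jaw_open` and `smile` shapes are hypothetical; the text above does not state which shapes the invention uses or how many. In the described approach, the weights would be the values the network predicts from speech.

```python
def apply_blend_shapes(base, targets, weights):
    """Blend a neutral mesh with weighted target shapes.

    base:    list of (x, y, z) vertices of the neutral face
    targets: list of meshes, each with the same vertex order as base
    weights: one Blend Shape value per target, typically in [0, 1]
    """
    result = []
    for v_idx, base_vertex in enumerate(base):
        blended = list(base_vertex)
        for target, weight in zip(targets, weights):
            for axis in range(3):
                # Add the weighted offset of this target from the neutral pose.
                blended[axis] += weight * (target[v_idx][axis] - base_vertex[axis])
        result.append(tuple(blended))
    return result


# Hypothetical two-vertex "mesh" and two illustrative shapes.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
jaw_open = [(0.0, -1.0, 0.0), (1.0, 0.0, 0.0)]  # moves vertex 0 down
smile = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]      # moves vertex 1 outward

posed = apply_blend_shapes(neutral, [jaw_open, smile], [0.5, 1.0])
```

Because the output is a small vector of weights rather than full vertex coordinates, the same predicted values can drive any character model that defines the same set of shapes.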

Application Information

IPC(8): G06T13/40, G06T15/00, G06N3/04, G10L13/02, G11B27/10

CPC: G06T13/40, G06T15/005, G10L13/02, G11B27/10, G06N3/045

Inventors: 梁宏华, 彭超

Owner 超维视界(北京)传媒科技有限公司
