
Adaptive emotion expression speaker facial animation generation method and electronic device

A speaker facial-animation technology applied in the field of human-computer interaction. It addresses the problems of cold, mechanical-looking facial animation, poor expressive effect, and a sense of distance, and achieves diversity, continuity, and naturalness of expression.

Active Publication Date: 2021-11-23
INST OF SOFTWARE - CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

Most existing works do not consider the emotional state of the speaker, so the generated facial animation feels cold, mechanical, and rigid and creates a sense of distance; or, when modeling expressions, they do not adaptively add the speaker's emotion, which makes the generated facial animation insufficiently natural and rich.
At the same time, some methods based on template matching or hidden Markov models are not effective at continuous pronunciation expression when modeling pronunciation.
[0003] Chinese patent application CN201910745062.8 provides a method and related devices for synthesizing speech expressions based on artificial intelligence. First, that method does not jointly model the two-dimensional speaker pronunciation image sequence and the three-dimensional virtual-human head motion sequence. Second, in that invention the text features, acoustic features, and durations of the pronunciation elements are input into an expression model to obtain the corresponding expression features, with no adaptive process for adding expressions.
[0004] Chinese patent application CN201310173929.X provides a method of real-time voice-driven facial animation. The method obtains voice parameters and visual parameters, constructs a training data set, models and trains the conversion from voice parameters to visual parameters, and constructs a set of blendshapes for the face model to establish the transformation from visual parameters to facial animation parameters. However, the expression control in that invention only interpolates between the current face shape and a specified facial expression, which is not an adaptive expression control method, and the method is not effective at continuous pronunciation expression.
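As an illustration of the kind of expression control criticized here, the sketch below shows plain linear interpolation of blendshape weights between the current face shape and a preset target expression; the function and variable names are hypothetical and not taken from the cited application.

```python
import numpy as np

def blend_toward_expression(current_weights: np.ndarray,
                            target_weights: np.ndarray,
                            alpha: float) -> np.ndarray:
    """Linearly interpolate blendshape weights toward a fixed target expression.

    A single scalar alpha pulls every coefficient toward the preset expression,
    regardless of speech content or emotional context, which is why such
    control is not adaptive.
    """
    return (1.0 - alpha) * current_weights + alpha * target_weights

# Example: 52 blendshape coefficients, halfway toward a preset "smile" template.
current = np.zeros(52)
smile_template = np.random.rand(52)   # stand-in for a preset expression shape
blended = blend_toward_expression(current, smile_template, alpha=0.5)
```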
[0005] Chinese patent application CN201611261096.2 provides a method for driving the expression and posture of a character model in real time based on voice. The method acquires voice data; a voice-driving module synchronously receives the voice stream and the emotion label set corresponding to the voice stream, calculates the weight of the basic animation, the weight of the modifying animation, and the weight of the basic lip animation; the voice-driving module analyzes the lip animation of the voice stream and calculates the weights of the basic mouth-shape animations for the basic pronunciations PP, FF, TH, DD, H, CH, SS, NN, RR, AH, EI, IH, OH, and WU; and the voice-driving module combines the basic expression animation, the modifying animation, and the basic lip animation to generate the face model mesh. However, in that invention, adding expressions requires preset, manually specified coefficients, so it is not an adaptive way of adding expressions, and the use of template matching and splicing reduces the continuity of pronunciation expression.
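To make that weight-based scheme concrete, here is a minimal sketch of combining the listed basic mouth shapes by their weights into a per-frame mouth pose; the template data, dimensions, and function name are illustrative assumptions, and the per-frame splicing of templates is what the text says reduces continuity.

```python
import numpy as np

# Basic pronunciations listed in paragraph [0005].
VISEMES = ["PP", "FF", "TH", "DD", "H", "CH", "SS",
           "NN", "RR", "AH", "EI", "IH", "OH", "WU"]

def mouth_shape_from_weights(weights, templates):
    """Weighted sum of per-viseme mouth-shape templates (hypothetical sketch)."""
    shape = np.zeros_like(next(iter(templates.values())))
    for v in VISEMES:
        shape += weights.get(v, 0.0) * templates[v]
    return shape

# Example: random 3D offsets of 68 facial landmarks for each basic mouth shape.
templates = {v: np.random.randn(68, 3) * 0.01 for v in VISEMES}
frame = mouth_shape_from_weights({"AH": 0.7, "NN": 0.3}, templates)
```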
[0006] Chinese patent application CN110874869A discloses a method and device for generating virtual animated expressions. Because it generates expressions frame by frame, the frames lack coherence, so the method is not suitable for generating facial animation of natural speech; it is instead intended to generate a specified facial expression frame by frame. It also requires expression labels to determine the second weight of the expression, and such labels are usually not rich enough.
In addition, the method does not exploit the correlation between speech information and visual information for modeling.

Method used



Examples


Embodiment Construction

[0046] The present invention will be described in detail below in conjunction with the accompanying drawings. It should be noted that the described embodiments are for the purpose of illustration only, and do not limit the scope of the present invention.

[0047] The invention discloses a method for generating a speaker's facial animation with adaptive emotional expression. The structure of the method is shown in Figure 1.

[0048] S100: a method for constructing the continuous-expression pronunciation model.

[0049] The pronunciation model is a phoneme-level pronunciation-fitting model based on a cross-modal nonlinear audio-visual mapping; it establishes the mapping relationship between continuous input audio features and output visual features. Before the pronunciation model is trained, continuous audio features and visual features are obtained from a neutral-pronunciation video dataset. The audio features include phoneme features and spectral features, and the visual ...
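As a rough sketch of what such a cross-modal nonlinear mapping can look like, the toy model below maps per-frame phoneme and spectral features to facial key-point displacements with a recurrent network; the architecture, feature dimensions, and landmark count are assumptions made for illustration and are not the patent's actual model.

```python
import torch
import torch.nn as nn

class AudioToKeypointNet(nn.Module):
    """Toy audio-to-visual mapping: phoneme + spectral features -> key-point motion.

    Assumed sizes: a 40-way phoneme encoding concatenated with an 80-dim
    spectral frame as input, and 2D displacements of 68 face key points as
    output. The recurrent layer gives temporal smoothness across frames.
    """
    def __init__(self, n_phonemes=40, n_spectral=80, n_keypoints=68):
        super().__init__()
        self.rnn = nn.GRU(input_size=n_phonemes + n_spectral,
                          hidden_size=128, num_layers=2, batch_first=True)
        self.head = nn.Linear(128, n_keypoints * 2)

    def forward(self, audio_features):        # (batch, frames, features)
        hidden, _ = self.rnn(audio_features)
        return self.head(hidden)              # (batch, frames, 136)

# Example: one clip of 100 audio frames.
model = AudioToKeypointNet()
audio = torch.randn(1, 100, 120)
keypoint_deltas = model(audio)                # shape (1, 100, 136)
```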



Abstract

The invention discloses a method and an electronic device for generating a speaker's facial animation with adaptive emotional expression, comprising: acquiring the key points of the pronunciation face in the current speaker's neutral facial state and the audio features of the audio sequence; obtaining, from the audio features and the pronunciation face key points, the motion difference containing pronunciation information; obtaining, from the phoneme features in the audio features and the pronunciation face key points, the motion difference containing emotional information; generating the face key-point motion sequence from the pronunciation face key points, the motion difference containing pronunciation information, and the motion difference containing emotional information; and generating the facial animation of the current speaker from the face key-point pronunciation motion sequence. The invention guarantees the naturalness of the emotional expression of the speaker's facial animation through a pronunciation expression dictionary, designs an adaptive emotion-adding method to ensure the diversity of the speaker's expressions, and requires no preset expression labels while still producing pronunciation animation with rich emotion.
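A minimal sketch of the pipeline the abstract describes, with stub functions standing in for the pronunciation model and the adaptive emotion module; all names, shapes, and values here are assumptions for illustration only.

```python
import numpy as np

def pronunciation_motion(audio_features, n_points=68):
    """Stand-in for the pronunciation model: per-frame key-point motion
    difference carrying articulation information (not the patent's model)."""
    return np.random.randn(len(audio_features), n_points, 2) * 0.01

def emotion_motion(phoneme_features, n_points=68):
    """Stand-in for the adaptive emotion module: per-frame key-point motion
    difference carrying emotional information, with no expression label."""
    return np.random.randn(len(phoneme_features), n_points, 2) * 0.005

# Assumed inputs: key points of the neutral-state face and per-frame features.
neutral_keypoints = np.zeros((68, 2))
audio_features = np.random.randn(100, 120)    # 100 frames of audio features
phoneme_features = np.random.randn(100, 40)   # phoneme part of those features

# Compose the two motion differences on top of the neutral face to obtain
# the key-point motion sequence; each frame then drives the face animation.
sequence = (neutral_keypoints[None, :, :]
            + pronunciation_motion(audio_features)
            + emotion_motion(phoneme_features))
print(sequence.shape)                          # (100, 68, 2)
```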

Description

Technical field

[0001] The present invention relates to the technical field of human-computer interaction and the related technical field of computer vision, and in particular to a method and an electronic device for generating a speaker's facial animation with adaptive emotional expression.

Background technique

[0002] In daily communication, the human face is perceived first by the visual organs and can convey the emotional information in a person's heart. In communication between people and computers, if the computer can accurately simulate a person's emotional state, it will greatly shorten the distance between people and computers. Artificial intelligence is the trend of future development, and with the emergence and popularization of various functional intelligent robots, people hope to communicate with computers and robots in the same way as they do with other humans: let the robot receive the user's voice signal, observe the user's facial expression state, and use the lear...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06T13/20, G06T13/40, G06N3/04
CPC: G06T13/205, G06T13/40, G06N3/045
Inventor: 陈辉, 姚乃明, 李博宇, 乔逢春, 白泽琛, 王宏安
Owner: INST OF SOFTWARE - CHINESE ACAD OF SCI