
Phoneme-driven expression synthesis method, device, and computer storage medium

A phoneme-driven expression synthesis technology in the field of image processing, addressing problems such as the inability to obtain realistic expression synthesis video, fixed scenes, and missing backgrounds.

Active Publication Date: 2022-06-17
BEIJING CENTURY TAL EDUCATION TECH CO LTD
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

[0003] Generally speaking, human facial information includes expression information and lip shape (mouth shape) information, and under normal circumstances both change as pronunciation changes. However, current related technologies cannot yet produce realistic expression synthesis videos, and are especially prone to problems such as blurred faces, missing backgrounds, or fixed scenes.



Examples


First Embodiment

[0033] Figure 1 shows a schematic flowchart of the phoneme-driven expression synthesis method according to the first embodiment of the present invention. As shown in Figure 1, the method of this embodiment mainly includes the following steps:

[0034] Step S1: recognize the target speech text according to the pre-built database to obtain a phoneme sequence, and convert the phoneme sequence into a corresponding replacement expression parameter sequence.

[0035] Optionally, the target speech text in this embodiment of the present invention refers to a speech file recorded in text form. It may be, for example, any existing speech text file, or a speech text file generated by converting an audio file with audio-to-text software.

[0036] Optionally, the audio file may be an existing speech resource or one generated by an ad hoc recording. In addition, the audio-to-text software may be any audio conversion software known to...
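As a rough illustration, step S1 can be sketched as a two-stage lookup: a pronunciation lexicon turns the speech text into a phoneme sequence, and a phoneme-to-parameter table turns that sequence into replacement expression parameters. All names and values below (LEXICON, PHONEME_TO_PARAMS, the parameter vectors) are hypothetical placeholders, not the patent's actual pre-built database:

```python
# Hypothetical sketch of step S1: text -> phoneme sequence -> expression parameters.
# The lexicon and parameter table are illustrative stand-ins for the
# patent's "pre-built database".

# Toy pronunciation lexicon mapping words to phonemes.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Toy table mapping each phoneme to a replacement expression parameter vector
# (e.g. mouth-opening and lip-rounding coefficients for a 3D face model).
PHONEME_TO_PARAMS = {
    "HH": [0.1, 0.0], "AH": [0.8, 0.1], "L": [0.3, 0.2],
    "OW": [0.6, 0.9], "W": [0.2, 0.8], "ER": [0.5, 0.4], "D": [0.2, 0.1],
}
SILENCE_PARAMS = [0.0, 0.0]  # parameters for the silent phoneme

def text_to_phonemes(text):
    """Recognize the target speech text into a phoneme sequence."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON.get(word, ["SIL"]))  # unknown words -> silence
    return phonemes

def phonemes_to_params(phonemes):
    """Convert the phoneme sequence into a replacement expression parameter sequence."""
    return [PHONEME_TO_PARAMS.get(p, SILENCE_PARAMS) for p in phonemes]

params = phonemes_to_params(text_to_phonemes("hello world"))
```

A real system would of course use a full pronunciation dictionary and parameter vectors fitted to a 3D face model rather than these toy entries.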

Second Embodiment

[0053] Figure 3 shows a schematic flowchart of the phoneme-driven expression synthesis method according to the second embodiment of the present invention.

[0054] In this embodiment, the above-described step of recognizing the target speech text to obtain a phoneme sequence and converting the phoneme sequence into a replacement expression parameter sequence according to a pre-built database (i.e., step S1) may further include:

[0055] Step S11: establish the correspondence between each phoneme and each replacement expression parameter to generate the pre-built database.

[0056] Optionally, the above step S11 further includes the following processing steps:

[0057] First, step S111 is executed to construct the phoneme data in the pre-built database.

[0058] In the prior art, the extracted phonemes generally include 18 vowel phonemes and 25 consonant phonemes, 43 pronunciation phonemes in total, as shown in Table 1 below; with the silent phoneme added, there are 44 phonemes in total.

[...
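A minimal sketch of how step S11's pre-built database might be laid out, assuming the 18-vowel/25-consonant split plus a silent phoneme described in paragraph [0058]. The phoneme labels (V0…, C0…, SIL) and the 52-dimensional all-zero parameter vectors are illustrative placeholders only:

```python
# Illustrative sketch of step S11: building the pre-built database that maps
# each of the 44 phonemes (43 pronunciation phonemes plus one silent phoneme)
# to a replacement expression parameter vector.

NUM_VOWELS = 18
NUM_CONSONANTS = 25

def build_phoneme_database(param_dim=52):
    """Return a dict: phoneme label -> expression parameter vector."""
    labels = (
        [f"V{i}" for i in range(NUM_VOWELS)]        # vowel phonemes
        + [f"C{i}" for i in range(NUM_CONSONANTS)]  # consonant phonemes
        + ["SIL"]                                   # silent phoneme
    )
    # In practice each vector would be hand-edited or fitted to captured data;
    # here every entry is initialized to a neutral (all-zero) expression.
    return {label: [0.0] * param_dim for label in labels}

db = build_phoneme_database()
assert len(db) == 44  # 18 + 25 + 1
```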

Third Embodiment

[0080] Figure 4 shows a schematic flowchart of the phoneme-driven expression synthesis method according to the third embodiment of the present invention.

[0081] In an optional embodiment, the frame-by-frame rendering of the target two-dimensional image sequence (i.e., step S4) may further include the following processing steps:

[0082] Step S41, acquiring a target two-dimensional image corresponding to the current frame in the target two-dimensional image sequence and performing rendering processing.

[0083] Step S42: repeat step S41, that is, acquire the target two-dimensional image corresponding to the current frame and render it, until the target two-dimensional images corresponding to all frames in the target two-dimensional image sequence have been rendered.

[0084] Referring to Figure 5, in an optional embodiment, the above-described acquiring of a target two-dimensional image ...
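Steps S41 and S42 together amount to a loop over the image sequence. A minimal sketch, with render_frame as a hypothetical stand-in for the actual rendering routine (which the summary does not specify):

```python
# Minimal sketch of steps S41/S42: iterate over the target two-dimensional
# image sequence and render each frame until every frame has been processed.

def render_frame(image):
    """Placeholder rendering: tag the frame as rendered."""
    return {"frame": image, "rendered": True}

def render_sequence(target_images):
    rendered = []
    for image in target_images:      # step S41: take the current frame and render it
        rendered.append(render_frame(image))
    return rendered                  # step S42: stop once all frames are rendered

frames = render_sequence([{"id": i} for i in range(3)])
```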


PUM

No PUM

Abstract

A phoneme-driven expression synthesis method, device, and computer storage medium. The method mainly includes: recognizing a target speech text according to a pre-built database to obtain a phoneme sequence, and converting the phoneme sequence into a replacement expression parameter sequence; extracting the original sub-video data to be replaced from the original video data based on the speech duration of the target speech text; constructing a three-dimensional face model based on the face in the original sub-video data, and extracting its to-be-replaced expression parameters frame by frame to generate a to-be-replaced expression parameter sequence, which is then replaced by the replacement expression parameter sequence; driving the three-dimensional face model with the replaced expression parameter sequence to generate a target two-dimensional image sequence, and rendering that sequence frame by frame; and splicing the rendered target two-dimensional images to generate target sub-video data that replaces the original sub-video data. The present invention can efficiently and accurately obtain more realistic expression synthesis videos.
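The pipeline in the abstract can be sketched at a high level as follows. Every helper and data shape here is an illustrative stub (the summary does not specify the actual recognition, face-fitting, or rendering components); only the control flow mirrors the five claimed steps:

```python
# Hypothetical end-to-end sketch of the claimed pipeline; all helpers are stubs.

def recognize_phonemes(speech_text, database):
    # Step 1a (stub): treat each character of the text as one phoneme.
    return [ch for ch in speech_text if ch in database]

def extract_sub_video(video, num_frames):
    # Step 2 (stub): take the sub-video covering the speech duration.
    return video[:num_frames]

def fit_expression_params(sub_video):
    # Step 3 (stub): per-frame 3D face fitting; yields the "to-be-replaced"
    # expression parameter sequence.
    return [frame["params"] for frame in sub_video]

def render(params):
    # Step 4 (stub): produce a rendered 2D image record for one frame.
    return {"params": params, "rendered": True}

def synthesize(speech_text, video, database):
    phonemes = recognize_phonemes(speech_text, database)       # step 1
    replacement = [database[p] for p in phonemes]
    sub_video = extract_sub_video(video, len(replacement))     # step 2
    _to_replace = fit_expression_params(sub_video)             # step 3 (overridden)
    frames = [render(p) for p in replacement]                  # step 4
    return frames + video[len(frames):]                       # step 5: splice back

database = {"a": [0.8], "b": [0.2]}
video = [{"params": [0.0]} for _ in range(4)]
result = synthesize("ab", video, database)
```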

Description

Technical Field

[0001] Embodiments of the present invention relate to image processing technologies, and in particular to a phoneme-driven expression synthesis method, device, and computer storage medium.

Background

[0002] With the advancement of computer technology, face-based image processing has developed from two dimensions to three dimensions, and three-dimensional image processing has received extensive attention due to its stronger realism.

[0003] Generally speaking, human facial information includes expression information and lip shape (mouth shape) information, and under normal circumstances both change as pronunciation changes. However, current related technologies cannot yet produce realistic expression synthesis videos, and are especially prone to problems such as blurred faces, missing backgrounds, or fixed scenes.

Summary of the Invention

[0004...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06T17/00; G06T15/20; G06T11/60
CPC: G06T17/00; G06T15/205; G06T11/60
Inventor: 王骁, 冀志龙, 刘霄
Owner: BEIJING CENTURY TAL EDUCATION TECH CO LTD