
Phoneme-driven expression synthesis method, device, and computer storage medium

A phoneme-driven expression synthesis technology in the field of image processing, addressing problems such as the inability to obtain realistic expression synthesis video, fixed scenes, and missing backgrounds.

Active Publication Date: 2022-06-17
BEIJING CENTURY TAL EDUCATION TECH CO LTD
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

[0003] Generally speaking, human facial information includes expression information and lip shape (mouth shape) information, and under normal circumstances both change as pronunciation changes. However, current related technologies cannot yet produce realistic expression synthesis videos, and are especially prone to problems such as blurred faces, missing backgrounds, or fixed scenes.



Examples


First Embodiment

[0033] Figure 1 shows a schematic flowchart of the phoneme-driven expression synthesis method according to the first embodiment of the present invention. As shown in Figure 1, the method of this embodiment mainly includes the following steps:

[0034] Step S1: recognize the target speech text according to the pre-built database to obtain a phoneme sequence, and convert the phoneme sequence into a corresponding replacement expression parameter sequence.

[0035] Optionally, the target speech text in this embodiment of the present invention refers to a speech file recorded in text form. It may be, for example, any existing speech text file, or a speech text file generated by converting an audio file with audio-to-text software.

[0036] Optionally, the audio file may be an existing speech resource or one generated by an ad hoc recording. In addition, the audio-to-text software may be any audio conversion software known to...
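As a rough illustration, step S1 can be sketched as a two-stage lookup: a pronunciation lexicon turns the speech text into a phoneme sequence, and a phoneme-to-parameter table turns that sequence into replacement expression parameters. All names and values below (LEXICON, PHONEME_TO_PARAMS, the parameter vectors) are hypothetical placeholders, not the patent's actual pre-built database:

```python
# Hypothetical sketch of step S1: text -> phoneme sequence -> expression parameters.
# The lexicon and parameter table are illustrative stand-ins for the
# patent's "pre-built database".

# Toy pronunciation lexicon mapping words to phonemes.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Toy table mapping each phoneme to a replacement expression parameter vector
# (e.g. mouth-opening and lip-rounding coefficients for a 3D face model).
PHONEME_TO_PARAMS = {
    "HH": [0.1, 0.0], "AH": [0.8, 0.1], "L": [0.3, 0.2],
    "OW": [0.6, 0.9], "W": [0.2, 0.8], "ER": [0.5, 0.4], "D": [0.2, 0.1],
}
SILENCE_PARAMS = [0.0, 0.0]  # parameters for the silent phoneme

def text_to_phonemes(text):
    """Recognize the target speech text into a phoneme sequence."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON.get(word, ["SIL"]))  # unknown words -> silence
    return phonemes

def phonemes_to_params(phonemes):
    """Convert the phoneme sequence into a replacement expression parameter sequence."""
    return [PHONEME_TO_PARAMS.get(p, SILENCE_PARAMS) for p in phonemes]

params = phonemes_to_params(text_to_phonemes("hello world"))
```

A real system would of course use a full pronunciation dictionary and parameter vectors fitted to a 3D face model rather than these toy entries.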

Second Embodiment

[0053] Figure 3 shows a schematic flowchart of the phoneme-driven expression synthesis method according to the second embodiment of the present invention.

[0054] In this embodiment, the above-described step of recognizing the target speech text to obtain a phoneme sequence and converting the phoneme sequence into a replacement expression parameter sequence according to a pre-built database (i.e., step S1) may further include:

[0055] Step S11: establish the correspondence between each phoneme and each replacement expression parameter to generate the pre-built database.

[0056] Optionally, the above step S11 further includes the following processing steps:

[0057] First, step S111 is executed to construct the phoneme data in the pre-built database.

[0058] In the prior art, the extracted phonemes generally include 18 vowel phonemes and 25 consonant phonemes, 43 pronunciation phonemes in total, as shown in Table 1 below; with the silent phoneme added, there are 44 phonemes in total.

[...
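A minimal sketch of how step S11's pre-built database might be laid out, assuming the 18-vowel/25-consonant split plus a silent phoneme described in paragraph [0058]. The phoneme labels (V0…, C0…, SIL) and the 52-dimensional all-zero parameter vectors are illustrative placeholders only:

```python
# Illustrative sketch of step S11: building the pre-built database that maps
# each of the 44 phonemes (43 pronunciation phonemes plus one silent phoneme)
# to a replacement expression parameter vector.

NUM_VOWELS = 18
NUM_CONSONANTS = 25

def build_phoneme_database(param_dim=52):
    """Return a dict: phoneme label -> expression parameter vector."""
    labels = (
        [f"V{i}" for i in range(NUM_VOWELS)]        # vowel phonemes
        + [f"C{i}" for i in range(NUM_CONSONANTS)]  # consonant phonemes
        + ["SIL"]                                   # silent phoneme
    )
    # In practice each vector would be hand-edited or fitted to captured data;
    # here every entry is initialized to a neutral (all-zero) expression.
    return {label: [0.0] * param_dim for label in labels}

db = build_phoneme_database()
assert len(db) == 44  # 18 + 25 + 1
```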

Third Embodiment

[0080] Figure 4 shows a schematic flowchart of the phoneme-driven expression synthesis method according to the third embodiment of the present invention.

[0081] In an optional embodiment, the frame-by-frame rendering of the target two-dimensional image sequence (i.e., step S4) may further include the following processing steps:

[0082] Step S41, acquiring a target two-dimensional image corresponding to the current frame in the target two-dimensional image sequence and performing rendering processing.

[0083] Step S42: repeat step S41, that is, acquire the target two-dimensional image corresponding to the current frame and render it, until the target two-dimensional images corresponding to all frames in the target two-dimensional image sequence have been rendered.

[0084] Referring to Figure 5, in an optional embodiment, the above-described acquiring of a target two-dimensional image ...
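Steps S41 and S42 together amount to a loop over the image sequence. A minimal sketch, with render_frame as a hypothetical stand-in for the actual rendering routine (which the summary does not specify):

```python
# Minimal sketch of steps S41/S42: iterate over the target two-dimensional
# image sequence and render each frame until every frame has been processed.

def render_frame(image):
    """Placeholder rendering: tag the frame as rendered."""
    return {"frame": image, "rendered": True}

def render_sequence(target_images):
    rendered = []
    for image in target_images:      # step S41: take the current frame and render it
        rendered.append(render_frame(image))
    return rendered                  # step S42: stop once all frames are rendered

frames = render_sequence([{"id": i} for i in range(3)])
```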


PUM

No PUM

Abstract

A phoneme-driven expression synthesis method, device, and computer storage medium. The method mainly includes: recognizing a target speech text according to a pre-built database to obtain a phoneme sequence, and converting the phoneme sequence into a replacement expression parameter sequence; extracting the original sub-video data to be replaced from the original video data based on the speech duration of the target speech text; constructing a three-dimensional face model based on the face in the original sub-video data, and extracting its to-be-replaced expression parameters frame by frame to generate a to-be-replaced expression parameter sequence, which is then replaced by the replacement expression parameter sequence; driving the three-dimensional face model with the replaced expression parameter sequence to generate a target two-dimensional image sequence, and rendering that sequence frame by frame; and splicing the rendered target two-dimensional images to generate target sub-video data that replaces the original sub-video data. The present invention can efficiently and accurately obtain more realistic expression synthesis videos.
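The pipeline in the abstract can be sketched at a high level as follows. Every helper and data shape here is an illustrative stub (the summary does not specify the actual recognition, face-fitting, or rendering components); only the control flow mirrors the five claimed steps:

```python
# Hypothetical end-to-end sketch of the claimed pipeline; all helpers are stubs.

def recognize_phonemes(speech_text, database):
    # Step 1a (stub): treat each character of the text as one phoneme.
    return [ch for ch in speech_text if ch in database]

def extract_sub_video(video, num_frames):
    # Step 2 (stub): take the sub-video covering the speech duration.
    return video[:num_frames]

def fit_expression_params(sub_video):
    # Step 3 (stub): per-frame 3D face fitting; yields the "to-be-replaced"
    # expression parameter sequence.
    return [frame["params"] for frame in sub_video]

def render(params):
    # Step 4 (stub): produce a rendered 2D image record for one frame.
    return {"params": params, "rendered": True}

def synthesize(speech_text, video, database):
    phonemes = recognize_phonemes(speech_text, database)       # step 1
    replacement = [database[p] for p in phonemes]
    sub_video = extract_sub_video(video, len(replacement))     # step 2
    _to_replace = fit_expression_params(sub_video)             # step 3 (overridden)
    frames = [render(p) for p in replacement]                  # step 4
    return frames + video[len(frames):]                       # step 5: splice back

database = {"a": [0.8], "b": [0.2]}
video = [{"params": [0.0]} for _ in range(4)]
result = synthesize("ab", video, database)
```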

Description

Technical Field

[0001] Embodiments of the present invention relate to image processing technologies, and in particular to a phoneme-driven expression synthesis method, device, and computer storage medium.

Background

[0002] With the advancement of computer technology, face-based image processing has developed from two dimensions to three dimensions, and three-dimensional image processing has received extensive attention due to its stronger realism.

[0003] Generally speaking, human facial information includes expression information and lip shape (mouth shape) information, and under normal circumstances both change as pronunciation changes. However, current related technologies cannot yet produce realistic expression synthesis videos, and are especially prone to problems such as blurred faces, missing backgrounds, or fixed scenes.

Summary of the Invention

[0004...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06T17/00; G06T15/20; G06T11/60
CPC: G06T17/00; G06T15/205; G06T11/60
Inventor: 王骁, 冀志龙, 刘霄
Owner: BEIJING CENTURY TAL EDUCATION TECH CO LTD