Virtual video customer service robot synthesis method and system based on generative adversarial network

A virtual video synthesis method, applied in the field of face video synthesis, that addresses visible artificial processing traces, the inability to switch the speaker's language, and poor alignment of lip shape and voice, achieving good authenticity and good extensibility.

Pending Publication Date: 2022-07-26

TIANJIN UNIV


Problems solved by technology

[0005] However, most existing virtual video customer service synthesis methods and systems cannot achieve true and reliable all-in-one synthesis from text to video.

Specifically, lip shape and voice cannot be well aligned, the speaker's language cannot be switched according to user needs, and facial expressions and voice intonation cannot be generated to match the emotion of the words and sentences being expressed.

Although these systems provide the basic functions of video customer service, they do not come close to the way real people speak, and the traces of manual processing are obvious.


Examples


Embodiment 1

[0032] An embodiment of the present invention provides a method for synthesizing a virtual video customer service robot based on a generative adversarial network, the method comprising the following steps:

[0033] 101: Use the you-get tool to collect 1,000 CCTV news broadcast videos of different characters as the corresponding Chinese corpus-video dataset, and organize them in the format of the LRS2 dataset.

[0034] Further, extract the audio from the video with the ffmpeg tool, convert the audio file into mel blocks for the network to read using the Python library librosa, and crop the video into MP4 files with a resolution of 256*256 and a duration of 15 seconds, completing the preprocessing of the dataset.
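The preprocessing in [0034] could be sketched roughly as follows. This is only an illustration, not the patent's actual script: the helper names, the 16 kHz sample rate, and the mel parameters (`n_fft=800`, `hop_length=200`, `n_mels=80`) are assumptions not stated in the text, and `scale=256:256` simply resizes the frame rather than cropping around a detected face.

```python
from __future__ import annotations

from pathlib import Path


def ffmpeg_audio_cmd(video: Path, wav: Path, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that drops the video stream (-vn) and writes mono audio."""
    return ["ffmpeg", "-y", "-i", str(video), "-vn",
            "-ac", "1", "-ar", str(sample_rate), str(wav)]


def ffmpeg_clip_cmd(video: Path, out: Path, size: int = 256, seconds: int = 15) -> list[str]:
    """Build an ffmpeg command that keeps the first `seconds` and resizes to size x size."""
    return ["ffmpeg", "-y", "-i", str(video), "-t", str(seconds),
            "-vf", f"scale={size}:{size}", str(out)]


def mel_from_wav(wav: Path, sample_rate: int = 16000, n_fft: int = 800,
                 hop_length: int = 200, n_mels: int = 80):
    """Load a WAV file and return its mel spectrogram (shape: n_mels x time steps)."""
    import librosa  # imported lazily so the command builders work without librosa installed

    y, sr = librosa.load(str(wav), sr=sample_rate)
    # All mel parameters here are assumed values, not taken from the patent text.
    return librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
```

Each command list can be executed with `subprocess.run(cmd, check=True)` once ffmpeg is installed; building the commands as lists keeps file names with spaces safe and makes the steps easy to inspect before running them over all collected videos.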

[0035] 102: Train the Wav2Lip network model on the collected Chinese dataset. The model can extract the mapping relationship between sound and lip shape through the face decoder and audio decoder, generate a synthetic lip shape, and pass the pre-trained li...

Embodiment 2

[0043] The scheme in Embodiment 1 is described in further detail below, in conjunction with specific examples and calculation formulas:

[0044] 1. Data preparation

[0045] The invention uses the you-get tool to collect 1000 CCTV news broadcast videos of different characters as the corresponding Chinese corpus-video data set, and organizes them according to the format of the LRS2 data set. Further, the dataset is preprocessed with ffmpeg and librosa tools.

[0046] The dataset consists of corresponding audio and video. The video part contains the broadcast content of 5 different male anchors and 5 different female anchors; the frame rate is 25 fps, the resolution is cropped to 256*256, the duration is 25 seconds, and the format is MP4. The audio part is the Mel block extracted from the video, which the network uses to obtain sound information directly.
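As a small worked example of the figures in [0046], the sketch below computes how many frames a 25-second clip at 25 fps contains and which mel time step lines up with a given video frame. The 16 kHz sample rate and hop length of 200 samples are assumptions (the text does not state them), and the function names are hypothetical.

```python
def clip_frame_count(fps: int, seconds: int) -> int:
    """Number of video frames in a clip of the given length."""
    return fps * seconds


def mel_steps_per_video_frame(sample_rate: int, hop_length: int, fps: int) -> float:
    """Mel time steps produced per video frame (mel steps per second / frames per second)."""
    return sample_rate / hop_length / fps


def mel_start_index(frame_index: int, sample_rate: int = 16000,
                    hop_length: int = 200, fps: int = 25) -> int:
    """First mel time step that corresponds to the given video frame (integer arithmetic)."""
    return frame_index * sample_rate // (hop_length * fps)


print(clip_frame_count(25, 25))                   # 625 frames per clip
print(mel_steps_per_video_frame(16000, 200, 25))  # 3.2 mel steps per frame
print(mel_start_index(5))                         # 16
```

Under these assumed audio settings, every 5 video frames span exactly 16 mel steps, which is why an audio window can be cut from the Mel block to match any short run of frames.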

[0047] 2. Model training

[0048] The present invention includes four modules: a lip shape gene...

Embodiment 3

[0079] The embodiments of the present invention can be used not only in the generation of virtual video customer service, but also in the following application scenarios.

[0080] For example, historical figures and static pictures can be made to complete specific actions such as singing and delivering holiday wishes. By importing the question corpus in advance, the virtual video customer service robot system can also be applied to campus welcome robots, psychological counseling robots, etc., enabling students and robots to communicate face to face and achieving better human-computer interaction.


Abstract

The invention relates to the technical field of face video synthesis, and discloses a virtual video customer service robot synthesis method and system based on a generative adversarial network. The system comprises a lip shape generator module, an expression generator module, a text sentiment analysis module and a text speech synthesis module. The method and system are innovative in that two schemes for synthesizing the virtual video customer service robot are provided, and the user can choose between them according to requirements. With the synthesis scheme, the user can synthesize in various languages, freely select the customer service image and apply the result in various scenes, and the emotion of the speaker is fused into the video synthesis process, so that the authenticity is good. A Web-based system is also integrated, which lets users log in to the website directly, upload audio and video materials, perform online synthesis and carry out rapid batch production.
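One way the four modules named in the abstract might be wired together is sketched below. The order of the stages, the function signatures and the use of raw `bytes` for audio and video are all assumptions made for illustration; the abstract names the modules but does not specify how they are connected, and the stand-in lambdas are not real models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SynthesisPipeline:
    """Hypothetical wiring of the four modules; each field is a pluggable callable."""
    analyze_sentiment: Callable[[str], str]             # text -> emotion label
    synthesize_speech: Callable[[str, str], bytes]      # (text, emotion) -> audio
    generate_lips: Callable[[bytes, bytes], bytes]      # (face video, audio) -> lip-synced video
    generate_expression: Callable[[bytes, str], bytes]  # (video, emotion) -> final video

    def run(self, text: str, face_video: bytes) -> bytes:
        # Assumed order: emotion first, then speech, lip sync, and expression.
        emotion = self.analyze_sentiment(text)
        audio = self.synthesize_speech(text, emotion)
        synced = self.generate_lips(face_video, audio)
        return self.generate_expression(synced, emotion)


# Stand-in modules that only tag their inputs, to show the data flow.
demo = SynthesisPipeline(
    analyze_sentiment=lambda text: "happy" if "!" in text else "neutral",
    synthesize_speech=lambda text, emotion: f"audio[{emotion}]:{text}".encode(),
    generate_lips=lambda video, audio: video + b"+lips(" + audio + b")",
    generate_expression=lambda video, emotion: video + b"+expr(" + emotion.encode() + b")",
)
```

Calling `demo.run("Welcome!", b"face")` passes the detected emotion to both the speech and expression stages, which mirrors the abstract's statement that the speaker's emotion is fused into the video synthesis process.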

Description

Technical field

[0001] The invention relates to the technical field of face video synthesis, in particular to a virtual video customer service robot synthesis method and system based on a generative adversarial network.

Background art

[0002] Face video synthesis is an emerging and challenging problem in computer vision, and virtual video robots based on this technology are gaining more and more attention. The virtual video customer service robot includes modules such as lip shape generation, expression generation, and speech synthesis. It is expected to faithfully imitate people's lip movements, voices and facial expressions when they speak.

[0003] Inspired by the successful application of deep learning in the field of computer vision, deep learning-based face video synthesis has achieved excellent performance and good visual effects. At present, some important benchmark datasets have been proposed in the field of face video synthesis, such as GRID [1], TIMIT [2...


Application Information


IPC(8): H04N21/81, G06N3/04, G06N3/08

CPC: H04N21/8146, G06N3/08, G06N3/044, G06N3/045

Inventor: 张轩宇, 王逸超, 刘昱麟, 朱鹏飞

Owner: TIANJIN UNIV
