Video synthesis method and device, equipment and storage medium

A video synthesis technology, applied in the field of video processing, that addresses the problem of voice content and lip movements being out of sync in AI-synthesized anchor video, which leads to poor anchor video quality and a degraded user experience. The effect achieved is an improved viewing experience.

Active Publication Date: 2021-05-28
北京中科闻歌科技股份有限公司

AI Technical Summary

Problems solved by technology

[0005] When an AI-synthesized anchor broadcasts news articles, the voice content and lips ...

Examples

Embodiment 1

[0035] Figure 1 is a schematic flowchart of a video synthesis method provided by Embodiment 1 of the present invention. This embodiment is applicable to performing video synthesis based on a text to be synthesized and a video to be synthesized. The method can be executed by a video synthesis device, which can be implemented in software and/or hardware and is generally integrated in a terminal or server. As shown in Figure 1, the method may include the following steps:

[0036] S110. Acquire the text to be synthesized and the video to be synthesized.

[0037] Here, the text to be synthesized is a text file that is to be broadcast by the target object. It may be Chinese text, English text, or text written in another language, and it may include advertisement content, factual content, meeting content, and the like. The video to be synthesized can be a video clip of any ...

Embodiment 2

[0069] Figure 4 is a schematic flowchart of a video synthesis method provided by Embodiment 2 of the present invention. The technical solution of this embodiment is refined on the basis of the above embodiments. Optionally, performing feature conversion on the text to be synthesized to generate an audio stream of the text to be synthesized includes: inputting the text to be synthesized into the trained speech generation model, where the encoder of the speech generation model performs feature extraction on the text to obtain its one-hot vector; the decoder of the speech generation model converts the one-hot vector into a speech signal; and the sequence generation sub-model of the speech generation model inversely transforms the speech signal into a time-domain wave signal, which is used as the audio stream of the text to be synthesized. For the parts not described in detail in...
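As an illustration of the first stage only, the following is a minimal sketch of one-hot encoding a text. It is not the patent's encoder: the excerpt does not state which unit is encoded (characters, phonemes, or something else), so character-level encoding is an assumption, and the helper names `build_vocab` and `one_hot_encode` are hypothetical.

```python
def build_vocab(text):
    """Map each distinct character to an index (sorted so the result is deterministic)."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def one_hot_encode(text, vocab):
    """Return one vector per character, with a single 1 at that character's index.

    Assumption: character-level units; the source does not specify the unit.
    """
    vectors = []
    for ch in text:
        row = [0] * len(vocab)
        row[vocab[ch]] = 1
        vectors.append(row)
    return vectors


vocab = build_vocab("news")
encoded = one_hot_encode("news", vocab)
```

In a real system the decoder and the sequence generation sub-model would consume these vectors; those stages are learned models and are not sketched here.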

Embodiment 3

[0084] Figure 6 is a schematic flowchart of a video synthesis method provided by Embodiment 3 of the present invention. The technical solution of this embodiment is refined on the basis of the foregoing embodiments. Specifically, it refines the process of generating the video features and the lip features. For the parts not described in detail in this method embodiment, please refer to the above embodiments. As shown in Figure 6, the method may include the following steps:

[0085] S310. Acquire the text to be synthesized and the video to be synthesized.

[0086] S320. Perform feature conversion on the text to be synthesized, and generate an audio stream of the text to be synthesized.

[0087] S330. Separate the video stream and the audio stream in the video to be synthesized to obtain an audio-free video stream, and combine the audio-free video stream and the audio stream of the text to be synthesized to generate an initial fusion video.

[0088] S340. Ex...
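Step S330 above (separating the streams, then combining the audio-free video with the generated audio) can be sketched with the ffmpeg command-line tool. The patent excerpt does not name a tool, so using ffmpeg is an assumption, and the file names and the helper functions `strip_audio_cmd` and `mux_cmd` are hypothetical. The functions only build argument lists; nothing is executed here.

```python
def strip_audio_cmd(src, dst):
    """Build an ffmpeg command that copies the video stream and drops all audio (-an)."""
    return ["ffmpeg", "-y", "-i", src, "-an", "-c:v", "copy", dst]


def mux_cmd(video, audio, dst):
    """Build an ffmpeg command that combines a silent video with a separate audio file.

    -map 0:v:0 takes the first video stream of the first input,
    -map 1:a:0 takes the first audio stream of the second input.
    """
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-i", audio,
        "-map", "0:v:0",
        "-map", "1:a:0",
        "-c:v", "copy",
        "-c:a", "aac",
        dst,
    ]


# Hypothetical file names, for illustration only.
silent = strip_audio_cmd("anchor.mp4", "anchor_silent.mp4")
fused = mux_cmd("anchor_silent.mp4", "tts_audio.wav", "initial_fusion.mp4")
```

Each list could be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed. The result corresponds only to the initial fusion video; lip synchronization is handled by the later steps.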

Abstract

The embodiments of the invention disclose a video synthesis method, device, equipment, and storage medium. The text to be synthesized can be in any language, and the video to be synthesized can be a user-selected video clip containing any anchor image. An audio stream is generated automatically from the text to be synthesized. Video features and lip features are generated from the audio-free video stream of the video to be synthesized, and audio features and mouth shape features are generated from the audio stream. A mouth-lip mapping relation is determined from the mouth shape features and the lip features, and a video sequence with consistent mouth and lip movement is determined according to that mapping relation. A target synthetic video is then generated from the fused video sequence, so that face and lip movements remain consistent throughout the target synthetic video. The anchor's lip movement in the target synthetic video stays natural and consistent, and the resulting video matches the user's wishes, which improves the user's viewing experience.
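The abstract does not say how the mouth-lip mapping relation is computed. Purely to make the idea concrete, the sketch below pairs each audio-derived mouth shape feature with the video frame whose lip feature is closest in Euclidean distance. Nearest-neighbor matching is a stand-in chosen for illustration, not the patent's method, and the function names and the toy two-dimensional features are hypothetical.

```python
import math


def nearest_lip_frame(mouth_feature, lip_features):
    """Return the index of the lip feature closest to the given mouth shape feature."""
    return min(
        range(len(lip_features)),
        key=lambda i: math.dist(mouth_feature, lip_features[i]),
    )


def build_mapping(mouth_features, lip_features):
    """For each audio time step, pick the best-matching video frame index.

    Assumption: features are plain numeric vectors of equal length.
    """
    return [nearest_lip_frame(m, lip_features) for m in mouth_features]


# Toy features: one lip feature per video frame, one mouth feature per audio step.
lip_features = [[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]]
mouth_features = [[0.9, 0.1], [0.1, 0.0], [0.4, 0.9]]
mapping = build_mapping(mouth_features, lip_features)
```

The resulting list of frame indices plays the role of the mapping relation: reordering or selecting frames by it would give a sequence whose lips follow the audio, under the assumptions above.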

Description

Technical field

[0001] Embodiments of the present invention relate to video processing technologies, and in particular to a video synthesis method, device, equipment, and storage medium.

Background

[0002] An AI-synthesized anchor is built by extracting features such as voice, lip shape, and facial expressions from news broadcast videos of a real anchor, and then jointly modeling and training on them using voice, lip shape, and facial expression synthesis together with deep learning. The technology automatically generates audio and video streams of the corresponding content from input Chinese or English text, and keeps the audio, facial expressions, and lip movements in the video naturally consistent, so that the result conveys information as effectively as a real anchor.

[0003] At present, existing AI-synthesized anchors include both 2D and 3D virtual anchors. Compared with 2D anchors, 3D anchors can support multi-camera and multi-depth of ...

Application Information

IPC(8): H04N5/265, G10L13/04, G10L13/047, G10L21/10, G10L25/57
CPC: H04N5/265, G10L13/047, G10L21/10, G10L25/57, G10L2021/105
Inventor: 徐楠, 郝艳妮, 罗引, 张西娜, 孔庆超, 吴晓飞, 曲宝玉, 曹家, 王磊
Owner: 北京中科闻歌科技股份有限公司