Rapid dubbing generation method and device

A rapid dubbing technology, applied in speech synthesis, speech analysis, instruments, etc. It addresses the high cost of giving a text-to-speech model a new voice and the inability of existing models to generate speech in an arbitrary voice, and aims at fast, real-time dubbing generation at low computational cost.

Pending Publication Date: 2020-05-19

北京中科深智科技有限公司

Cites: 14; cited by: 11


AI Technical Summary

Problems solved by technology

Providing a new voice to a common text-to-speech model is very expensive, as it requires recording a new dataset and retraining the model.

Furthermore, existing text-to-speech models cannot generate speech in an arbitrary voice, i.e. they lack dubbing generation capability.

[0004] No effective solution to the above problems has been proposed.


Embodiment Construction

[0039] It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other. The present application will be described in detail below with reference to the accompanying drawings and embodiments.

[0040] To enable those skilled in the art to better understand the solution of the present application, the technical solution in the embodiments of the application is described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative effort shall fall within the scope of protection of this application.

[0041] In the implementation of the present invention, provi...

Abstract

The invention discloses a rapid dubbing generation method and device. The method comprises the following steps: constructing a dubbing generation framework that includes a speaker encoder, a synthesizer and a vocoder, where the speaker encoder extracts an embedding from a short utterance of a single speaker, the synthesizer generates a spectrogram from a text conditioned on that embedding, and the vocoder infers and outputs an audio waveform from the spectrogram; training the dubbing generation framework end to end to obtain a trained dubbing generation framework model; and inputting a reference voice and a text into the trained model to achieve rapid dubbing generation. This solves the problems that existing text-to-speech models cannot generate speech in an arbitrary voice and have low data efficiency.
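The data flow in the abstract can be sketched as three chained stages. The following is a minimal illustration under stated assumptions, not the patented implementation: the class names, method names, constants and the toy arithmetic inside each stage are all hypothetical stand-ins for trained neural networks, and the end-to-end training step is not shown. It only demonstrates how a reference voice and a text pass through the speaker encoder (the component that extracts the embedding from a short utterance), the synthesizer and the vocoder.

```python
import math
from typing import List

# All constants below are hypothetical and chosen only to keep the toy small.
EMBED_DIM = 4   # size of the speaker embedding
N_MELS = 3      # channels per spectrogram frame
HOP = 4         # waveform samples produced per spectrogram frame


class SpeakerEncoder:
    """Stand-in for a trained encoder: short utterance -> fixed-size unit vector."""

    def embed(self, waveform: List[float]) -> List[float]:
        # Toy feature: mean absolute amplitude of EMBED_DIM consecutive chunks.
        chunk = max(1, len(waveform) // EMBED_DIM)
        feats = []
        for i in range(EMBED_DIM):
            part = waveform[i * chunk:(i + 1) * chunk] or [0.0]
            feats.append(sum(abs(x) for x in part) / len(part))
        norm = math.sqrt(sum(f * f for f in feats)) or 1.0
        return [f / norm for f in feats]


class Synthesizer:
    """Stand-in for a trained synthesizer: (text, embedding) -> spectrogram."""

    def synthesize(self, text: str, embedding: List[float]) -> List[List[float]]:
        # One frame per character, shifted by a value derived from the embedding
        # so that the output depends on the reference speaker.
        bias = sum(embedding) / len(embedding)
        return [
            [(ord(ch) % 32) / 32.0 + bias * (m + 1) for m in range(N_MELS)]
            for ch in text
        ]


class Vocoder:
    """Stand-in for a trained vocoder: spectrogram -> audio waveform."""

    def infer(self, spectrogram: List[List[float]]) -> List[float]:
        wave: List[float] = []
        for frame in spectrogram:
            energy = sum(frame) / len(frame)
            # Emit HOP samples of one sine period scaled by the frame energy.
            wave.extend(energy * math.sin(2 * math.pi * k / HOP) for k in range(HOP))
        return wave


def generate_dubbing(reference_wave: List[float], text: str) -> List[float]:
    """Chain the three stages: reference voice + text -> waveform."""
    embedding = SpeakerEncoder().embed(reference_wave)
    spectrogram = Synthesizer().synthesize(text, embedding)
    return Vocoder().infer(spectrogram)
```

Calling `generate_dubbing(reference_wave, "hello")` with any list of float samples returns `len("hello") * HOP` samples in this toy. The point of the structure is that the text and the voice enter through separate inputs, so changing the reference utterance changes the output without retraining anything.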

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence, in particular to a method and device for rapid dubbing generation.

Background technique

[0002] In many areas of applied machine learning, deep learning models have become mainstream. Text-to-speech (TTS), the process of synthesizing human speech from text prompts, is no exception. Deep models produce more natural-sounding speech than traditional cascading methods.

[0003] Professionally recorded speech datasets are a scarce resource, and synthesizing a natural voice with correct pronunciation, vivid intonation, and minimal background noise requires training data of the same quality. Secondly, data efficiency is still a core issue of deep learning: training a common text-to-speech model, such as Tacotron, usually requires hundreds of hours of speech. Furthermore, providing such a model with a new voice is very expensive, as it requires recording a new dat...

Application Information

IPC(8): G10L13/08, G10L19/00

CPC: G10L13/08, G10L19/0018

Inventor: not disclosed

Owner: 北京中科深智科技有限公司
