Rapid dubbing generation method and device

A rapid dubbing technology, applied in speech synthesis, speech analysis, instruments, etc., that addresses the high cost of adding a new voice to a text-to-speech model and the inability of existing models to generate speech in an arbitrary voice, achieving fast, real-time dubbing generation at low computational cost.

Pending Publication Date: 2020-05-19
北京中科深智科技有限公司

AI Technical Summary

Problems solved by technology

Providing a new voice to such a model is also very expensive, as it requires recording a new dataset and retraining the model.
Furthermore, existing text-to-speech model


Detailed Description of Embodiments

[0039] It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other. The present application will be described in detail below with reference to the accompanying drawings and embodiments.

[0040] To enable those skilled in the art to better understand the solution of the present application, the technical solution in the embodiments of the application is described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the application, not all of them. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative effort shall fall within the scope of protection of this application.

[0041] In the implementation of the present invention, provi...

Abstract

The invention discloses a rapid dubbing generation method and device. The method comprises the following steps: constructing a dubbing generation framework that includes a speaker encoder, a synthesizer, and a vocoder, where the speaker encoder extracts an embedding from a short utterance of a single speaker, the synthesizer generates a spectrogram from text conditioned on that embedding, and the vocoder infers and outputs an audio waveform from the spectrogram; training the dubbing generation framework end to end to obtain a trained dubbing generation framework model; and inputting a reference voice and a text into the trained model to generate dubbing rapidly. This solves the problems that existing text-to-speech models cannot generate speech in an arbitrary voice and have low data efficiency.
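The data flow among the three components named in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class `DubbingPipeline`, the `toy_*` functions, and the list-based audio and spectrogram types are hypothetical stand-ins, not the patent's trained models or any real library's API. The sketch shows only how a reference utterance and a text pass through the encoder, the synthesizer, and the vocoder in turn; it does not cover the end-to-end training step.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

Waveform = List[float]           # mono audio samples
Embedding = List[float]          # fixed-size voice representation
Spectrogram = List[List[float]]  # frames x frequency bins


@dataclass
class DubbingPipeline:
    """Hypothetical wrapper that chains the three stages from the abstract."""
    speaker_encoder: Callable[[Waveform], Embedding]
    synthesizer: Callable[[str, Embedding], Spectrogram]
    vocoder: Callable[[Spectrogram], Waveform]

    def generate(self, reference_audio: Waveform, text: str) -> Waveform:
        embedding = self.speaker_encoder(reference_audio)  # voice -> embedding
        spectrogram = self.synthesizer(text, embedding)    # text + embedding -> spectrogram
        return self.vocoder(spectrogram)                   # spectrogram -> waveform


# Toy stand-ins below: simple deterministic functions, NOT trained models.

def toy_speaker_encoder(audio: Waveform, dim: int = 4) -> Embedding:
    """Mean absolute amplitude of `dim` consecutive chunks, L2-normalised."""
    chunk = max(1, len(audio) // dim)
    feats = [sum(abs(x) for x in audio[i * chunk:(i + 1) * chunk]) / chunk
             for i in range(dim)]
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0
    return [f / norm for f in feats]


def toy_synthesizer(text: str, embedding: Embedding) -> Spectrogram:
    """One frame per character, scaled by the speaker embedding."""
    return [[(ord(ch) % 32) / 32.0 * e for e in embedding] for ch in text]


def toy_vocoder(spec: Spectrogram, samples_per_frame: int = 8) -> Waveform:
    """Emit a short sine burst per frame, with amplitude set by the frame mean."""
    wave: Waveform = []
    for frame in spec:
        amp = sum(frame) / len(frame)
        wave.extend(amp * math.sin(2 * math.pi * n / samples_per_frame)
                    for n in range(samples_per_frame))
    return wave


pipeline = DubbingPipeline(toy_speaker_encoder, toy_synthesizer, toy_vocoder)
reference = [math.sin(0.1 * n) for n in range(160)]  # placeholder "short utterance"
audio = pipeline.generate(reference, "hi")
```

Because the speaker embedding is computed once from the reference audio and passed to the synthesizer as a conditioning input, a different voice in this sketch only requires a different reference utterance, not a change to the synthesizer or vocoder.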

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence, in particular to a method and device for rapid dubbing generation.

Background

[0002] Deep learning models have become mainstream in many areas of applied machine learning. Text-to-speech (TTS), the process of synthesizing human speech from text prompts, is no exception. Deep models produce more natural-sounding speech than traditional concatenative methods.

[0003] Professionally recorded speech datasets are a scarce resource, and synthesizing a natural voice with correct pronunciation, vivid intonation, and minimal background noise requires training data of the same quality. In addition, data efficiency remains a core issue in deep learning. Training a common text-to-speech model, such as Tacotron, typically requires hundreds of hours of speech. Furthermore, providing such a model with a new voice is very expensive, as it requires recording a new dat...


Application Information

IPC(8): G10L13/08, G10L19/00
CPC: G10L13/08, G10L19/0018
Inventor: not disclosed (不公告发明人)
Owner: 北京中科深智科技有限公司