
Vocoder training method, terminal and storage medium

A vocoder and acoustic model technology, applied in the Internet field, can solve problems such as unrecognizable spectrum data and vocoder mismatch

Pending Publication Date: 2021-09-07
TENCENT MUSIC ENTERTAINMENT TECH SHENZHEN CO LTD

AI Technical Summary

Problems solved by technology

[0004] The vocoder is trained on spectral data obtained from real sounds. In actual use, however, the spectral data fed into the trained vocoder is generated by the acoustic model from phoneme sequences and pause information; it only resembles the spectral data of real sounds and is not itself the spectral data of a real sound. This causes a mismatch between the trained acoustic model and the trained vocoder, so the vocoder may be unable to recognize the spectral data produced by the acoustic model, which leaves a "rustling" noise in the synthetic sound ...



Examples


Detailed Description of Embodiments

[0081] To make the purpose, technical solution and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.

[0082] Figure 1 is a schematic diagram of an implementation environment for a vocoder training method provided in an embodiment of the present application. As shown in Figure 1, the method can be implemented by the terminal 101 or the server 102.

[0083] The terminal 101 may include components such as a processor and a memory. The processor, which may be a CPU (central processing unit) or the like, can be used to obtain the time domain data of the sample audio, determine the first spectral data corresponding to the reference time domain data, input the first spectral data into the self-attention learning module of the trained acoustic model to obtain the second spectral data, the second ...


Abstract

The invention discloses a vocoder training method, a terminal and a storage medium, and belongs to the technical field of the Internet. The method comprises the following steps: acquiring time domain data of a sample audio as reference time domain data; determining first spectral data corresponding to the reference time domain data, and inputting the first spectral data into a self-attention learning module in a trained acoustic model to obtain second spectral data; inputting the second spectral data into a vocoder to obtain predicted time domain data; and training the vocoder based on the predicted time domain data and the reference time domain data. A vocoder trained with this method matches the trained acoustic model more closely than a vocoder trained as in the prior art, and the rustling ("sand") noise in the synthetic sound is reduced to a certain extent.
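The four steps in the abstract can be illustrated with a toy training loop. This is a minimal sketch under loud assumptions, not the applicant's implementation: the naive DFT magnitude, the parameter-free `frozen_self_attention` function, the single-layer `LinearVocoder`, the mean-squared-error loss and the synthetic sine-wave "sample audio" are all hypothetical stand-ins chosen so the example runs with the Python standard library only. The one point it is meant to show is the data flow: the vocoder is trained on the output of the acoustic model's self-attention module rather than on the raw spectrum, and only the vocoder's weights are updated.

```python
import cmath
import math
import random

FRAME_LEN = 8  # samples per frame (toy value, an assumption)


def magnitude_spectrum(frame):
    """Naive DFT magnitude of one frame: stand-in for the 'first spectral data'."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def frozen_self_attention(frames):
    """Hypothetical stand-in for the trained acoustic model's self-attention
    module. Each output frame is a softmax-weighted mix of all input frames.
    It has no trainable parameters, so it stays fixed during vocoder training."""
    out = []
    for query in frames:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))
                  for key in frames]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * f[d] for w, f in zip(weights, frames))
                    for d in range(len(query))])
    return out


class LinearVocoder:
    """Toy vocoder: one linear layer from a spectral frame to a waveform frame."""

    def __init__(self, n_bins, frame_len, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_bins)]
                        for _ in range(frame_len)]

    def predict(self, spectrum):
        return [sum(w * s for w, s in zip(row, spectrum)) for row in self.weights]

    def train_step(self, spectrum, reference, lr):
        """One gradient step on the MSE between predicted and reference frames.
        Returns the loss measured before the update."""
        predicted = self.predict(spectrum)
        errors = [p - r for p, r in zip(predicted, reference)]
        for row, err in zip(self.weights, errors):
            for j, s in enumerate(spectrum):
                row[j] -= lr * 2.0 * err * s / len(reference)
        return sum(e * e for e in errors) / len(errors)


def train_vocoder(epochs=200, lr=0.02):
    # Step 1: reference time domain data (synthetic frames of varying loudness).
    amplitudes = [0.2, 0.4, 0.6, 0.8, 1.0]
    reference_frames = [[a * math.sin(2 * math.pi * t / FRAME_LEN)
                         for t in range(FRAME_LEN)] for a in amplitudes]
    # Step 2: first spectral data, then second spectral data via self-attention.
    first_spectra = [magnitude_spectrum(f) for f in reference_frames]
    second_spectra = frozen_self_attention(first_spectra)
    # Steps 3 and 4: predict time domain data and train against the reference.
    vocoder = LinearVocoder(n_bins=FRAME_LEN // 2 + 1, frame_len=FRAME_LEN)
    history = []
    for _ in range(epochs):
        losses = [vocoder.train_step(s, r, lr)
                  for s, r in zip(second_spectra, reference_frames)]
        history.append(sum(losses) / len(losses))
    return history


history = train_vocoder()
print(f"first-epoch loss {history[0]:.4f}, last-epoch loss {history[-1]:.4f}")
```

Because the vocoder only ever sees spectra that have passed through the attention module, its training inputs have the same character as the inputs it receives at synthesis time, which is the mismatch the application sets out to reduce.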

Description

Technical field

[0001] The present application relates to the technical field of the Internet, in particular to a method for training a vocoder, a terminal and a storage medium.

Background

[0002] With the continuous development of Internet technology, people often have the content of novels read aloud to them by AI models.

[0003] In the related art, the AI model actually consists of a phoneme conversion model, a pause prediction model, an acoustic model, and a vocoder. These models are applied to the target text as follows: the target text is input into the phoneme conversion model and the pause prediction model respectively to obtain a phoneme sequence and pause information, where the pause information includes pause positions and pause durations. The phoneme sequence and pause information are input into the trained acoustic model to obtain spectral data. The spectral data is input into the trained vocoder to obtain the target t...
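The four-model chain described in [0003] can be sketched as a simple composition of callables. Everything below is hypothetical: the class name `ReadAloudPipeline`, the type aliases and the four toy stub functions are placeholders invented for illustration, since the application text here does not specify any model interface. The sketch only shows the order in which data passes from text to phonemes and pauses, then to spectral data, then to a waveform.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Phonemes = List[str]
Pauses = List[Tuple[int, float]]  # (pause position, pause duration in seconds)
Spectrum = List[List[float]]      # one list of frequency bins per frame
Waveform = List[float]


@dataclass
class ReadAloudPipeline:
    """Hypothetical wrapper chaining the four models named in [0003]."""
    to_phonemes: Callable[[str], Phonemes]
    predict_pauses: Callable[[str], Pauses]
    acoustic_model: Callable[[Phonemes, Pauses], Spectrum]
    vocoder: Callable[[Spectrum], Waveform]

    def synthesize(self, text: str) -> Waveform:
        phonemes = self.to_phonemes(text)                 # phoneme conversion model
        pauses = self.predict_pauses(text)                # pause prediction model
        spectrum = self.acoustic_model(phonemes, pauses)  # trained acoustic model
        return self.vocoder(spectrum)                     # trained vocoder


# Toy stubs so the sketch runs; real models would replace each of these.
def toy_phonemes(text: str) -> Phonemes:
    return [ch for ch in text if ch != " "]


def toy_pauses(text: str) -> Pauses:
    return [(i, 0.1) for i, ch in enumerate(text) if ch == " "]


def toy_acoustic_model(phonemes: Phonemes, pauses: Pauses) -> Spectrum:
    frames = [[float(ord(p) % 7)] * 4 for p in phonemes]
    frames += [[0.0] * 4 for _ in pauses]  # silent frames standing in for pauses
    return frames


def toy_vocoder(spectrum: Spectrum) -> Waveform:
    return [sum(frame) / len(frame) for frame in spectrum]


pipeline = ReadAloudPipeline(toy_phonemes, toy_pauses, toy_acoustic_model, toy_vocoder)
waveform = pipeline.synthesize("hi yo")
print(len(waveform), "samples")
```

Note that the vocoder in this chain only ever receives the acoustic model's output, never a spectrum computed from real audio, which is why a vocoder trained solely on real-audio spectra can be mismatched at synthesis time.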


Application Information

IPC(8): G10L19/16, G06N3/08, G06N3/04
CPC: G10L19/16, G06N3/08, G06N3/044, G06N3/045
Inventor: 徐东
Owner: TENCENT MUSIC ENTERTAINMENT TECH SHENZHEN CO LTD