N-element non-autoregressive speech synthesis method, device and electronic equipment

A non-autoregressive speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc. It addresses the high demands that existing schemes place on the complexity and fitting ability of the Mel spectrum decoder, with the effect of reducing the decoder's workload and improving robustness.

Pending Publication Date: 2022-01-11

北京百舸飞驰科技有限公司 (Beijing Baige Feichi Technology Co., Ltd.)



AI Technical Summary

Problems solved by technology

[0006] The present invention aims at least to solve the problem that existing non-autoregressive schemes place high demands on the complexity and fitting ability of the Mel spectrum decoder.


Examples


[0046] Figure 1 is a schematic flow chart of an N-ary non-autoregressive speech synthesis method. As shown in Figure 1, the method includes:

[0047] S101. Acquire the text content to be synthesized and normalize it.

[0048] In this embodiment, the input text often contains non-standard content such as punctuation, digits, and symbols. The N-ary non-autoregressive speech model cannot convert such non-standard text into a Mel spectrum, so the text must first be normalized.
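As a rough illustration of the normalization described in [0048], the following Python sketch rewrites digits and a few symbols as words and drops the remaining punctuation. The word tables, the digit-by-digit reading, and the name `normalize_tokens` are assumptions for illustration only; the patent excerpt does not state its actual normalization rules.

```python
# Hypothetical lookup tables: the patent does not list its normalization rules.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
SYMBOL_WORDS = {"%": "percent", "&": "and", "+": "plus", "=": "equals"}


def normalize_tokens(sentence: str) -> str:
    """Rewrite ASCII digits and known symbols as words; drop other punctuation."""
    out = []
    for ch in sentence:
        if ch in "0123456789":
            # read each digit on its own (a deliberate simplification)
            out.append(f" {DIGIT_WORDS[int(ch)]} ")
        elif ch in SYMBOL_WORDS:
            out.append(f" {SYMBOL_WORDS[ch]} ")
        elif ch.isalnum() or ch.isspace():
            out.append(ch)  # letters (including CJK characters) pass through
        else:
            out.append(" ")  # remaining punctuation has no pronunciation here
    # collapse the runs of spaces introduced above
    return " ".join("".join(out).split())


print(normalize_tokens("Room 42, 50% off!"))
```

A real front end would read "42" as "forty-two" and handle dates, units, and Chinese numerals by context; the digit-by-digit rule only keeps the sketch short.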

[0049] Building on the above solution, normalizing the text to be synthesized proceeds as shown in the flow chart of Figure 2 and includes:

[0050] S1011. Segment the text to be synthesized into short sentences.

[0051] In this embodiment, the input text is usually an entire paragraph, including multiple lon...
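Step S1011 can be sketched as a punctuation-based split. This assumes that both ASCII and full-width Chinese marks, commas included, end a short sentence; `BREAK_MARKS` and `split_short_sentences` are hypothetical names, and the patent's actual splitting rule is cut off in this excerpt.

```python
import re

# Assumed break marks: full-width Chinese punctuation plus ASCII equivalents.
BREAK_MARKS = r"[。！？；，,.!?;]"


def split_short_sentences(paragraph: str) -> list[str]:
    """Split a paragraph at break marks and discard empty pieces."""
    pieces = re.split(BREAK_MARKS, paragraph)
    # trailing or consecutive marks leave empty strings behind; drop them
    return [p.strip() for p in pieces if p.strip()]


print(split_short_sentences("Hello, world. How are you?"))
print(split_short_sentences("今天天气很好，我们去公园。"))
```

Splitting on "." also breaks decimals such as "3.5", so a fuller implementation would need a stricter pattern for numbers.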

Embodiment 1

[0079] S201. Input the text "ZYB".

[0080] S202. Normalize "ZYB".

[0081] For illustration, assume that the input text "ZYB" is unchanged by normalization, so the normalized pronunciation sequence is "ZYB".

[0082] S203. The text feature encoding module receives the pronunciation sequence and outputs the text encoding features "Z', Y', B'".

[0083] S204. The frame number prediction module predicts a frame number for each element of the pronunciation sequence.

[0084] In this embodiment, assume that the precise frame numbers (pfn) of the Mel spectrogram of the final synthesized audio corresponding to "Z", "Y", and "B" are pfn1 = 2, pfn2 = 4, and pfn3 = 4, respectively.
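The excerpt does not say how the frame numbers are applied to the encoder output. One common approach, assumed here, is FastSpeech-style length regulation: each text encoding feature is repeated once per predicted frame. Strings stand in for feature vectors, and `expand_by_frames` is a hypothetical name.

```python
def expand_by_frames(features: list[str], frames: list[int]) -> list[str]:
    """Repeat each encoded feature once per predicted frame (length regulation)."""
    if len(features) != len(frames):
        raise ValueError("one frame count is needed per encoded feature")
    expanded = []
    for feat, count in zip(features, frames):
        expanded.extend([feat] * count)
    return expanded


pfn = [2, 4, 4]                # precise frame numbers assumed in [0084]
encoded = ["Z'", "Y'", "B'"]   # text encoding features from [0082]
frames = expand_by_frames(encoded, pfn)
print(len(frames))  # 10
print(frames)
```

Under this assumption, the three features become 10 frames, the length of Mel spectrogram a conventional (N = 1) non-autoregressive decoder would have to predict for "ZYB".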

[0085] Assuming N = 2, the rough frame numbers (rfn) predicted by the frame number prediction module for the pronunciation sequence "ZYB" are:

[0086]

[0087]...
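The rough frame numbers for N = 2 are missing from this excerpt ([0086] is empty), so the rule below is only an assumption consistent with the abstract's 1/N claim: each rough frame stands for N precise frames, rounded up. It should not be read as the values the patent gives at [0086], and `rough_frame_numbers` is a hypothetical name.

```python
def rough_frame_numbers(pfn: list[int], n: int) -> list[int]:
    """Assumed rule: rfn_i = ceil(pfn_i / n), computed with integer arithmetic."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [(p + n - 1) // n for p in pfn]


pfn = [2, 4, 4]  # precise frame numbers assumed in [0084]
rfn = rough_frame_numbers(pfn, 2)
print(rfn, sum(rfn), sum(pfn))  # [1, 2, 2] 5 10 under the assumed rule
```

Rounding up means that, when a precise frame number is not a multiple of N, the rough frames cover slightly more than needed; under this assumption the surplus frames would have to be trimmed when the refined Mel spectrum is produced.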


Abstract

The invention belongs to the technical field of data information processing and provides an N-ary non-autoregressive speech synthesis method, device, electronic equipment, and recording medium. The method comprises: obtaining the text content to be synthesized and normalizing it; inputting the normalized text into an N-ary non-autoregressive speech model, which outputs a refined Mel spectrum, where N is a natural number; and converting the refined Mel spectrum into a synthesized speech file. The method reduces the prediction task of a conventional non-autoregressive network to 1/N of the original, which greatly lowers the workload of the Mel feature decoder and also improves the robustness of the model.
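The abstract states that the prediction task shrinks to 1/N but does not say how. One familiar way to obtain that ratio, assumed here, is the "reduction factor" used by Tacotron-style decoders: each decoder output step covers N consecutive Mel frames. The sketch shows only that packing and unpacking; `group_frames` and `ungroup_frames` are hypothetical names, and this is not presented as the patent's mechanism for producing the refined Mel spectrum.

```python
Frame = list[float]


def group_frames(mel: list[Frame], n: int) -> list[Frame]:
    """Pack every n consecutive Mel frames into one wider decoder target.

    The last group is zero-padded when len(mel) is not a multiple of n.
    """
    if n < 1 or not mel:
        raise ValueError("need n >= 1 and at least one frame")
    n_mels = len(mel[0])
    padded = mel + [[0.0] * n_mels] * (-len(mel) % n)
    # concatenate each run of n frames into a single vector of n * n_mels values
    return [sum(padded[i:i + n], []) for i in range(0, len(padded), n)]


def ungroup_frames(grouped: list[Frame], n: int, num_frames: int) -> list[Frame]:
    """Inverse of group_frames: split each wide step back into n frames."""
    n_mels = len(grouped[0]) // n
    frames = [step[j * n_mels:(j + 1) * n_mels] for step in grouped for j in range(n)]
    return frames[:num_frames]  # drop the zero padding


mel = [[float(t), t + 0.5, t + 0.25] for t in range(10)]  # toy 10-frame, 3-bin Mel
coarse = group_frames(mel, 2)
print(len(coarse), len(coarse[0]))  # 5 6
```

With N = 2 the decoder would emit 5 wider steps instead of 10 frames, which is the 1/N reduction in prediction steps; each step's output is correspondingly wider.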

Description

Technical field

[0001] The invention belongs to the technical field of data information processing. It is particularly suitable for data information processing in online education services and, more specifically, relates to an N-ary non-autoregressive speech synthesis method, device, and electronic equipment.

Background

[0002] Speech synthesis is a technology that converts text, whether generated by the computer itself or input externally, into speech that humans can understand, and outputs it. Speech synthesis has developed rapidly in recent years, and neural-network-based speech synthesis now produces audio very close to a real human voice. Neural-network-based schemes fall mainly into two types: those based on an autoregressive model and those based on a non-autoregressive model.

[0003] The current scheme based on the autoregressive model is based on the generated speech w...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G10L13/047, G10L13/04, G10L13/08, G10L25/24, G10L25/30, G10L19/04

CPC: G10L13/047, G10L13/04, G10L13/08, G10L25/24, G10L25/30, G10L19/04

Inventors: 付涛 (Fu Tao), 王鹏 (Wang Peng), 王强强 (Wang Qiangqiang), 宋旸 (Song Yang)

Owner: 北京百舸飞驰科技有限公司
