N-element non-autoregressive speech synthesis method and device and electronic equipment

A non-autoregressive speech synthesis technology, applied in speech synthesis, speech analysis, and related fields. It addresses the high demands that existing non-autoregressive schemes place on the complexity and fitting ability of the Mel spectrum decoder, reducing the decoder's workload and improving robustness.

Pending Publication Date: 2022-01-11
北京百舸飞驰科技有限公司

AI Technical Summary

Problems solved by technology

[0006] The present invention aims at least to solve the problem that existing schemes based on non-autoregressive models place high demands on the complexity and fitting ability of the Mel spectrum decoder.


Examples

Detailed description

[0046] Figure 1 is a schematic flow chart of an N-ary non-autoregressive speech synthesis method. As shown in Figure 1, the method includes:

[0047] S101. Acquire text content to be synthesized, and perform standardization processing on the text content to be synthesized.

[0048] In this embodiment, the input text often contains non-standard content such as punctuation, numbers, symbols, and other characters or words. The N-ary non-autoregressive speech model cannot convert such non-standard text into a Mel spectrum, so the text content needs to be normalized.

[0049] On the basis of the above technical solution, the flow of standardizing the text content to be synthesized is shown in Figure 2 and includes:

[0050] S1011. Segment the to-be-synthesized text content into short sentences.

[0051] In this embodiment, the input text content is usually the content of the entire paragraph, including multiple lon...
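As a rough illustration of step S1011, the sketch below splits a paragraph into short sentences at punctuation marks. The function name `split_short_sentences` and the delimiter set are assumptions made for this example; the excerpt does not state which punctuation marks the patent treats as sentence boundaries.

```python
import re
from typing import List

# Hypothetical delimiter set (ASCII and full-width punctuation).
# The patent excerpt does not specify which marks end a short sentence.
_DELIMITERS = re.compile(r"[,.;:!?，。；：！？]+")


def split_short_sentences(text: str) -> List[str]:
    """Split a paragraph into short sentences at punctuation marks.

    Empty fragments and surrounding whitespace are dropped.
    """
    fragments = _DELIMITERS.split(text)
    return [fragment.strip() for fragment in fragments if fragment.strip()]


if __name__ == "__main__":
    print(split_short_sentences("Hello, world. How are you?"))
```

A rule this simple would also split decimals such as "3.5" at the period, so a practical normalizer would need to handle numbers before or during segmentation.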

Embodiment 1

[0079] S201. Input the text "ZYB".

[0080] S202. Standardize "ZYB".

[0081] Here, as an example, it is assumed that the content of the input text “ZYB” remains unchanged before and after normalization, so the pronunciation sequence after normalization is “ZYB”.

[0082] S203. The text feature encoding module receives the pronunciation sequence, and outputs the text encoding features "Z', Y', B'".

[0083] S204. The frame number prediction module predicts the frame number value of the pronunciation sequence.

[0084] In this embodiment, it is assumed that the precise frame numbers (pfn) of the Mel spectrum of the final synthesized audio, one for each element of the input text "ZYB", are: pfn1=2, pfn2=4, pfn3=4.

[0085] Assuming that N is 2, the rough frame numbers (rfn) that the frame number prediction module predicts for the pronunciation sequence "ZYB" are:

[0086]

[0087]...
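The values in [0086] are not available in this excerpt. Purely as an illustration of how an N-ary scheme could shrink the frame prediction task, the sketch below assumes that one rough frame groups N precise frames, so that rfn = ceil(pfn / N). This grouping rule and the function name `rough_frame_numbers` are assumptions for the example, not the formula given in the patent.

```python
from typing import List


def rough_frame_numbers(precise_frames: List[int], n: int) -> List[int]:
    """Map precise frame counts to rough frame counts.

    Assumption (not from the patent text): each rough frame covers
    n precise frames, so the rough count is ceil(pfn / n).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # -(-p // n) is integer ceiling division for non-negative p.
    return [-(-p // n) for p in precise_frames]


if __name__ == "__main__":
    pfn = [2, 4, 4]  # pfn1, pfn2, pfn3 from paragraph [0084]
    print(rough_frame_numbers(pfn, n=2))
```

Under this assumed rule the total number of frames to predict drops from 10 to 5, which is in line with the abstract's statement that the prediction task is reduced to 1/N of the original.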

Abstract

The invention belongs to the technical field of data information processing and provides an N-ary non-autoregressive speech synthesis method and device, electronic equipment, and a recording medium. The method comprises the following steps: obtaining the text content to be synthesized and standardizing it; inputting the standardized text content into a speech model based on N-ary non-autoregression and outputting a refined Mel spectrum, where N is a natural number; and converting the refined Mel spectrum into a speech synthesis file. The method reduces the prediction task of a conventional non-autoregressive network to 1/N of the original, which greatly lowers the demands on the Mel feature decoder and also improves the robustness of the model.

Description

Technical field

[0001] The invention belongs to the technical field of data information processing, and is particularly suitable for data information processing in online education services, and more specifically relates to an N-ary non-autoregressive speech synthesis method, device and electronic equipment.

Background

[0002] Speech synthesis is a technology that converts text information generated by the computer itself or externally input text information into human-understandable speech and outputs it. Speech synthesis technology has developed rapidly in recent years, and the speech synthesis technology based on neural network makes the synthesized audio effect very close to the real human voice. Speech synthesis schemes based on neural networks mainly include two types: one is based on an autoregressive model, and the other is based on a non-autoregressive model.

[0003] The current scheme based on the autoregressive model is based on the generated speech w...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G10L13/047, G10L13/04, G10L13/08, G10L25/24, G10L25/30, G10L19/04
CPC: G10L13/047, G10L13/04, G10L13/08, G10L25/24, G10L25/30, G10L19/04
Inventor: 付涛, 王鹏, 王强强, 宋旸
Owner: 北京百舸飞驰科技有限公司