Dual-stream voice conversion method, device, equipment and storage medium

A dual-stream voice conversion technology, applied in speech synthesis, speech analysis, instruments, etc. It addresses the inability to quickly generate high-quality speech from mel spectrum information, and achieves faster convergence, increased nonlinearity, and improved processing speed.

Pending Publication Date: 2021-09-24
PING AN TECH (SHENZHEN) CO LTD
AI Technical Summary

Problems solved by technology

[0004] The present invention provides a dual-stream speech conversion method, device, equipment and storage medium, which address the problem that high-quality speech cannot be generated quickly from mel spectrum information.

Embodiment Construction

[0067] The embodiments of the present invention provide a dual-stream voice conversion method, device, equipment and storage medium, which address the problem that high-quality speech cannot be generated quickly from mel spectrum information.

[0068] The terms "first", "second", "third", "fourth", etc. (if any) in the description and claims of the present invention and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It is to be understood that the terms so used are interchangeable under appropriate circumstances, such that the embodiments described herein can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device comprising a sequence of steps or elements is not necessarily limited to those expl...

Abstract

The invention relates to the field of artificial intelligence and provides a dual-stream voice conversion method, device, equipment and storage medium, which address the problem that high-quality speech cannot be generated quickly from mel spectrum information. The dual-stream voice conversion method comprises the following steps: sampling mel spectrum information derived from a speech signal through a target dual-stream voice synthesis model to obtain sampled data, wherein the target dual-stream voice synthesis model comprises a vector processing layer, a dual-stream-based affine coupling layer and a normalization layer; performing vector processing and shuffling (out-of-order processing) on the sampled data through the vector processing layer to obtain a to-be-processed vector; performing a dual-stream affine transformation on the to-be-processed vector through the dual-stream-based affine coupling layer to obtain a converted voice feature vector; and performing weighted normalization on the converted voice feature vector through the normalization layer, based on a target learning variable value, to obtain the target voice. In addition, the invention also relates to blockchain technology: the mel spectrum information derived from the speech signal can be stored in a blockchain.
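The three layer types named in the abstract can be illustrated with a small standard-library Python sketch. It is only an illustration under stated assumptions, not the patented model: it shows one generic affine coupling step of the kind used in flow-based vocoders, not the dual-stream variant, whose details are not given in this excerpt. The function names, the toy conditioner, and the reading of "weighted normalization based on a target learning variable value" as weight normalization with a learnable gain `g` are all assumptions made for illustration.

```python
import math

def shuffle_channels(x, perm):
    """Out-of-order processing: reorder the elements by a fixed permutation."""
    return [x[i] for i in perm]

def unshuffle_channels(y, perm):
    """Undo shuffle_channels for the same permutation."""
    out = [0.0] * len(y)
    for dst, src in enumerate(perm):
        out[src] = y[dst]
    return out

def toy_conditioner(xa, mel):
    """Hypothetical stand-in for the network that predicts (log_s, t).

    A real model would use a neural network conditioned on the mel spectrum;
    this closed-form function only keeps the example self-contained.
    """
    ctx = sum(xa) / len(xa) + sum(mel) / len(mel)
    log_s = [math.tanh(ctx + 0.1 * k) for k in range(len(xa))]
    t = [0.5 * ctx - 0.1 * k for k in range(len(xa))]
    return log_s, t

def affine_coupling_forward(x, mel):
    """One coupling step: keep the first half, scale and shift the second half.

    Assumes len(x) is even so both halves have the same length.
    """
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    log_s, t = toy_conditioner(xa, mel)
    yb = [b * math.exp(s) + ti for b, s, ti in zip(xb, log_s, t)]
    return xa + yb

def affine_coupling_inverse(y, mel):
    """Exact inverse: the untouched half reproduces the same (log_s, t)."""
    half = len(y) // 2
    ya, yb = y[:half], y[half:]
    log_s, t = toy_conditioner(ya, mel)
    xb = [(b - ti) * math.exp(-s) for b, s, ti in zip(yb, log_s, t)]
    return ya + xb

def weight_normalize(v, g):
    """Weight normalization: rescale v to length g (g plays the learnable role)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [g * c / norm for c in v]

mel = [0.2, -0.1, 0.4]           # made-up conditioning values
x = [0.5, -1.0, 0.25, 2.0]       # made-up sampled data
perm = [2, 0, 3, 1]              # fixed shuffle order
z = shuffle_channels(x, perm)
y = affine_coupling_forward(z, mel)
recovered = unshuffle_channels(affine_coupling_inverse(y, mel), perm)
```

The point of the coupling design is that half of the vector passes through unchanged, so the scale and shift can be recomputed exactly at inversion time. In the example above, `recovered` matches `x` up to floating-point rounding.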

Description

Technical field

[0001] The invention relates to the field of speech synthesis in artificial intelligence, and in particular to a dual-stream speech conversion method, device, equipment and storage medium.

Background

[0002] Voice conversion, as an important technical means of personalized voice generation, is widely used in various industries. At present, speech conversion methods are generally implemented by learning text-to-speech parameters through a neural network, which is usually divided into two steps: the first step converts the text into time-aligned features, such as the mel spectrum, the fundamental frequency (F0) and other linguistic features, and the second step converts these time-aligned features into audio samples via an autoregressive or non-autoregressive vocoder.

[0003] However, the above-mentioned voice conversion method has the problems that the real-time inference process of voice conversion is complicated and the conversion t...

Application Information

IPC(8): G10L13/08, G10L25/18, G10L25/30
CPC: G10L13/08, G10L25/18, G10L25/30
Inventors: 张旭龙, 王健宗
Owner: PING AN TECH (SHENZHEN) CO LTD