Voice conversion method, device and system, and storage medium

A voice conversion technology, applied in voice analysis, instruments, and related fields, that addresses the delay and training-time problems of existing approaches, thereby avoiding delay, saving network training time, and improving real-time performance.

Pending Publication Date: 2021-05-04
标贝(青岛)科技有限公司
Cites: 0 | Cited by: 12

AI Technical Summary

Problems solved by technology

The endpoint detection network is usually built from a deep learning model, and training it takes time. During endpoint detection, the network often has to wait until part of the voice data has been provided before it can determine the starting position of the effective audio signal, which introduces delay.
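To make the delay concrete: if a detector has to buffer some number of frames before it can decide where speech starts, the added latency is roughly that frame count multiplied by the frame shift. The sketch below is illustrative only; the function name, the 30-frame buffer, and the 10 ms frame shift are assumptions, not values from the patent.

```python
def endpoint_buffer_delay_ms(buffered_frames: int, frame_shift_ms: float) -> float:
    """Amount of audio (in ms) that must arrive before a buffering detector can decide.

    Both parameters are hypothetical illustration values, not taken from the patent.
    """
    if buffered_frames < 0 or frame_shift_ms <= 0:
        raise ValueError("buffered_frames must be >= 0 and frame_shift_ms must be > 0")
    return buffered_frames * frame_shift_ms


# Assumed example: a detector that waits for 30 frames at a 10 ms frame shift.
delay = endpoint_buffer_delay_ms(buffered_frames=30, frame_shift_ms=10.0)
print(f"{delay:.0f} ms of audio needed before the first decision")
```

A detector that decides frame by frame would correspond to `buffered_frames=0` in this simplified model.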



Examples


Embodiment Construction

[0047] In the following description, numerous details are given to provide a thorough understanding of the invention. Those skilled in the art will appreciate, however, that the description merely illustrates preferred embodiments of the invention, and that the invention may be practiced without one or more of these details. In addition, to avoid obscuring the present invention, some technical features known in the art are not described in detail.

[0048] The existing speech conversion scheme based on ASR technology first extracts acoustic features from massive speech training data and obtains the corresponding phoneme state set from the pre-labeled text for that data. A deep learning model is then used to model the relationship between the acoustic features and the phoneme states, and the SI-ASR model is obtained through training. Subsequently, the trained SI-ASR model can be uti...

Abstract

The invention provides a voice conversion method, device and system, and a storage medium. The method comprises the following steps: acquiring the source voice of a source speaker; performing feature extraction on the source voice to obtain source recognition acoustic features; inputting the source recognition acoustic features into a speech recognition model to obtain a speech posterior probability of the source speaker; inputting the posterior probability vectors corresponding to at least some of the multiple time frames into a feature conversion model to obtain target synthetic acoustic features, wherein the target synthetic acoustic features comprise synthetic acoustic feature vectors in one-to-one correspondence with those time frames, and those time frames include all valid time frames among the multiple time frames; and performing speech synthesis based on the effective acoustic features to obtain effective speech of the target speaker. The speech recognition model or the feature conversion model also outputs source audio state information, which is used to determine whether each of the multiple time frames is a valid or an invalid time frame. This joint modeling approach can effectively improve the real-time performance of voice conversion.
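The frame-level flow in the abstract can be sketched as follows. This is a minimal illustration of one variant only (the one in which the recognition model emits the audio state alongside each posterior vector), not the patent's implementation. The names `convert_frames`, `toy_recognize`, `toy_convert`, and `valid_threshold`, and the toy models themselves, are hypothetical stand-ins.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def convert_frames(
    source_features: Sequence[Vector],
    recognize: Callable[[Vector], Tuple[Vector, float]],
    convert: Callable[[Vector], Vector],
    valid_threshold: float = 0.5,  # assumed decision threshold
) -> List[Vector]:
    """Return synthetic acoustic feature vectors for valid frames only."""
    effective: List[Vector] = []
    for frame in source_features:
        # Joint output: posterior vector plus audio state for this frame.
        posterior, valid_prob = recognize(frame)
        if valid_prob < valid_threshold:
            continue  # invalid frame: skip it, no separate endpoint detector
        effective.append(convert(posterior))
    return effective


def toy_recognize(frame: Vector) -> Tuple[Vector, float]:
    """Hypothetical stand-in for a speech recognition model."""
    exps = [math.exp(x) for x in frame]
    total = sum(exps)
    posterior = [e / total for e in exps]  # softmax as a fake posterior
    energy = sum(x * x for x in frame)
    valid_prob = 1.0 if energy > 0.01 else 0.0  # fake audio state
    return posterior, valid_prob


def toy_convert(posterior: Vector) -> Vector:
    """Hypothetical stand-in for a feature conversion model."""
    return [2.0 * p for p in posterior]


frames = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 2.0]]
effective_features = convert_frames(frames, toy_recognize, toy_convert)
print(len(effective_features), "of", len(frames), "frames kept")
```

Because the validity decision comes out of the same forward pass as the posterior vector, each frame can be handled as it arrives. In the other variant the abstract mentions, the feature conversion model would output the state instead, and filtering would happen after conversion.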

Description

Technical field

[0001] The present invention relates to the technical field of voice signal processing, and in particular to a voice conversion method, device, system and storage medium.

Background

[0002] In the field of speech signal processing, speech conversion (that is, speech timbre conversion) is currently an important research direction. Speech conversion aims to modify the timbre of an arbitrary speaker, converting it to that of a fixed speaker, while keeping the spoken content unchanged. It involves front-end signal processing, speech recognition and speech synthesis technologies. A speech conversion system based on Automatic Speech Recognition (ASR) technology can extract speaker-independent features from any source input speech, and then convert them into speech with the timbre of the specified target speaker through the feature conversion model and vocoder.

[0003] The existing voice conversion technology usually inputs the source voi...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L21/013, G10L25/27
CPC: G10L21/013, G10L25/27, G10L2021/0135
Inventor: 武剑桃, 李秀林
Owner: 标贝(青岛)科技有限公司