Voice conversion model training method and device

A voice conversion model training method, applied in voice analysis, voice recognition, instruments, etc. It addresses the problems that users find it hard to record speech with specific content, that parallel corpora are hard to obtain, and that the voice conversion effect suffers as a result, so as to achieve a good conversion effect and ensure accuracy.

Pending Publication Date: 2021-12-07
INST OF ACOUSTICS CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0003] Current voice conversion technology relies on a parallel corpus, that is, recordings of the same content made by different people, but in actual application scenarios, it is difficult

Examples

Example Embodiment

[0062] The technical solutions of the embodiments of this specification are described in detail below with reference to the accompanying drawings.

[0063] The embodiments of this specification disclose a training method and apparatus for a voice conversion model. The application scenario and inventive concept of the training method are described first, as follows:

[0064] Current voice conversion technology depends on a parallel corpus, that is, speech with the same content recorded by different people. In actual application scenarios, however, it is difficult for users to record speech with specific content, so a parallel corpus is hard to obtain in a real environment, and a conversion model trained on a small parallel corpus does not convert voices well enough.

[0065] In view of this, an embodiment of this specification provides a training method for a voice conversion model, as shown in Figure 1. The present specification discloses a s...

Abstract

An embodiment of the invention provides a voice conversion model training method and device. The method comprises: performing feature extraction on a sample audio to obtain a Mel-spectrum feature label and a fundamental frequency sequence; inputting the Mel-spectrum feature label into an encoder to obtain a first content vector; inputting the first content vector into a bottleneck layer to obtain a current codebook vector and a second content vector; determining a first loss value based on the first content vector and the current codebook vector; inputting the first content vector into a perceptron layer to obtain, for the first content vector, the emission probability of each character or of the blank character; determining a second loss value based on the transcribed-text label of the sample audio and the emission probabilities; inputting the normalized fundamental frequency sequence, the second content vector and the speaker label of the sample audio into a decoder to obtain predicted Mel-spectrum features; determining a third loss value based on the Mel-spectrum feature label and the predicted Mel-spectrum features; and training the voice conversion model with the objective of minimizing the loss values.
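
The bottleneck lookup, the fundamental frequency normalization and the combination of the loss values can be illustrated with a minimal sketch. The abstract does not state the exact loss formulas, the normalization scheme or any loss weights, so everything below is an assumption made for illustration: a nearest-neighbour codebook lookup with a squared-distance first loss, a z-score of log-F0 over voiced frames, a mean-absolute-error third loss, and an equally weighted sum. All function names are hypothetical, and the second loss is taken as a precomputed number rather than derived from emission probabilities.

```python
import math


def nearest_code(content_vec, codebook):
    """Return (index, vector) of the codebook entry closest to content_vec."""
    def sq_dist(entry):
        return sum((c - e) ** 2 for c, e in zip(content_vec, entry))
    idx = min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))
    return idx, codebook[idx]


def first_loss(content_vecs, codebook):
    """Assumed form: mean squared distance from each first content vector
    to its nearest (current) codebook vector."""
    total = 0.0
    for vec in content_vecs:
        _, code = nearest_code(vec, codebook)
        total += sum((c - e) ** 2 for c, e in zip(vec, code))
    return total / len(content_vecs)


def normalize_f0(f0):
    """Assumed scheme: z-score of log-F0 over voiced frames (f0 > 0);
    unvoiced frames are mapped to 0."""
    log_f0 = [math.log(v) for v in f0 if v > 0]
    if not log_f0:
        return [0.0] * len(f0)
    mean = sum(log_f0) / len(log_f0)
    std = math.sqrt(sum((v - mean) ** 2 for v in log_f0) / len(log_f0)) or 1.0
    return [(math.log(v) - mean) / std if v > 0 else 0.0 for v in f0]


def third_loss(mel_label, mel_pred):
    """Assumed form: mean absolute error between label and predicted Mel frames."""
    diffs = [abs(a - b)
             for frame_a, frame_b in zip(mel_label, mel_pred)
             for a, b in zip(frame_a, frame_b)]
    return sum(diffs) / len(diffs)


def total_loss(loss_1, loss_2, loss_3, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three loss values; equal weights are an assumption."""
    return weights[0] * loss_1 + weights[1] * loss_2 + weights[2] * loss_3


# Toy usage with made-up numbers (not data from the patent).
codebook = [[0.0, 0.0], [1.0, 1.0]]
content = [[0.9, 1.2], [0.0, 0.0]]
l1 = first_loss(content, codebook)
l3 = third_loss([[1.0, 2.0]], [[1.5, 2.0]])
objective = total_loss(l1, 2.0, l3)  # 2.0 stands in for the second loss
```

In a real system the encoder, perceptron layer and decoder would be neural networks trained by gradient descent on such an objective; the sketch only shows how the quantities named in the abstract could relate to one another.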

Description

Technical field

[0001] This specification relates to the technical field of speech processing, and in particular to a method and device for training a voice conversion model.

Background

[0002] Voice conversion (VC) has been a popular research topic in recent years. It is the process of converting one person's voice into another person's timbre while fully retaining the content information. Voice conversion is a branch of speech synthesis, and because it focuses on converting the identity information in speech, it is one of the challenging research problems in speech signal processing.

[0003] Current voice conversion technology relies on a parallel corpus, that is, recordings of the same content made by different people. In actual application scenarios, however, it is difficult for users to record speech with specific content, so a parallel corpus is hard to obtain, which in turn affects the subsequent voice conversion ef...

Application Information

IPC(8): G10L25/24, G10L25/30, G10L15/26, G10L15/06
CPC: G10L25/24, G10L25/30, G10L15/26, G10L15/063
Inventors: 张鹏远, 陈子毅, 颜永红
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI