Voice conversion model training method and device

A voice conversion model training technology, applied in voice analysis, voice recognition, instruments, etc. It addresses the difficulty users have in recording speech with specific content, the resulting difficulty of obtaining parallel corpora, and the negative effect this has on voice conversion, so as to achieve a good conversion effect and ensure accuracy.

Pending Publication Date: 2021-12-07
INST OF ACOUSTICS CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0003] Current voice conversion technology relies on parallel corpora, that is, speech with the same content recorded by different speakers. In actual application scenarios, however, it is difficult for users to record speech with specific content, so parallel corpora are hard to obtain in real environments, which degrades the subsequent voice conversion effect.



Examples


Detailed Description of Embodiments

[0062] The technical solutions of the embodiments of this specification will be described in detail below in conjunction with the accompanying drawings.

[0063] The embodiments of this specification disclose a training method and device for a voice conversion model. The application scenario and inventive concept of the training method are introduced first, as follows:

[0064] Current voice conversion technology relies on parallel corpora, that is, speech with the same content recorded by different speakers. In actual application scenarios, however, it is difficult for users to record speech with specific content, so parallel corpora are hard to obtain. A voice conversion model trained on a small parallel corpus does not convert well enough.

[0065] In view of this, the embodiments of this specification provide a training method for a voice conversion model. Figure 1 shows a schematic diagr...


Abstract

The embodiments of the invention provide a voice conversion model training method and device. The method comprises: performing feature extraction on a sample audio to obtain a Mel spectrum feature label and a fundamental frequency sequence; inputting the Mel spectrum feature label into an encoder to obtain a first content vector; inputting the first content vector into a bottleneck layer to obtain a current codebook vector and a second content vector; determining a first loss value based on the first content vector and the current codebook vector; inputting the first content vector into a perceptron layer to obtain, for the first content vector, the emission probability of each character or of the blank character; determining a second loss value based on the transcribed text label of the sample audio and the emission probability; inputting the normalized fundamental frequency sequence, the second content vector and the speaker label of the sample audio into a decoder to obtain predicted Mel spectrum features; determining a third loss value based on the Mel spectrum feature label and the predicted Mel spectrum features; and training the voice conversion model with the goal of minimizing the loss values.
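The first and third loss values described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything concrete here is an assumption: the bottleneck is taken to select the nearest codebook vector, the first loss is taken to be a mean squared distance, the third loss a mean absolute error, and the three losses are combined as a weighted sum. All function names are hypothetical. The second loss value (which, given the per-character and blank emission probabilities, resembles a CTC-style loss) is passed in as a number rather than computed.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codebook_vector(content: Vector,
                            codebook: Sequence[Vector]) -> Tuple[int, Vector]:
    """Return the index and entry of the codebook vector closest to `content`.

    Assumption: the bottleneck layer picks the nearest codebook entry.
    """
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(content, codebook[i]))
    return index, codebook[index]


def first_loss(contents: Sequence[Vector], codebook: Sequence[Vector]) -> float:
    """Assumed form of the first loss: mean squared distance between each
    first content vector and the codebook vector selected for it."""
    total = 0.0
    for content in contents:
        _, entry = nearest_codebook_vector(content, codebook)
        total += squared_distance(content, entry)
    return total / len(contents)


def third_loss(mel_label: Sequence[Vector], mel_pred: Sequence[Vector]) -> float:
    """Assumed form of the third loss: mean absolute error over all frames
    and Mel bins between the label and the predicted Mel spectrum."""
    diffs = [abs(t - p)
             for label_frame, pred_frame in zip(mel_label, mel_pred)
             for t, p in zip(label_frame, pred_frame)]
    return sum(diffs) / len(diffs)


def total_loss(l1: float, l2: float, l3: float,
               weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Assumed combination: weighted sum of the three loss values."""
    return weights[0] * l1 + weights[1] * l2 + weights[2] * l3


# Toy data: two 2-dimensional content vectors and a two-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
contents = [[0.1, 0.0], [1.0, 0.5]]
loss_1 = first_loss(contents, codebook)
loss_3 = third_loss([[1.0, 2.0]], [[1.5, 2.0]])
loss_2 = 2.0  # placeholder for the transcription-based second loss value
print(total_loss(loss_1, loss_2, loss_3))
```

In a real training loop these quantities would be computed on tensors by an automatic-differentiation framework so that the encoder, codebook, perceptron layer and decoder can be updated by gradient descent; the plain-Python version only shows how the loss values relate to the vectors named in the abstract.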

Description

Technical field

[0001] This specification relates to the technical field of speech processing, in particular to a method and device for training a voice conversion model.

Background

[0002] Voice conversion (VC) has been a popular research topic in recent years. It is the process of converting one person's voice into another person's timbre while fully preserving the content information. Voice conversion is a branch of speech synthesis, and because it focuses on converting the identity information in speech, it is one of the challenging research problems in speech signal processing.

[0003] Current voice conversion technology relies on parallel corpora, that is, speech with the same content recorded by different speakers. In actual application scenarios, however, it is difficult for users to record speech with specific content, so parallel corpora are hard to obtain, which in turn affects the subsequent voice conversion ef...


Application Information

IPC(8): G10L25/24, G10L25/30, G10L15/26, G10L15/06
CPC: G10L25/24, G10L25/30, G10L15/26, G10L15/063
Inventor: 张鹏远, 陈子毅, 颜永红
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI