Voice conversion model training method and device

A training method for a voice conversion model, applicable to voice analysis, voice recognition, instruments, and related fields. It addresses the difficulty users have in recording speech with specific content, the resulting difficulty of obtaining parallel corpora, and the negative effect this has on voice conversion, so as to achieve a good conversion effect and ensure accuracy.

Pending Publication Date: 2021-12-07

INST OF ACOUSTICS CHINESE ACAD OF SCI

0 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

[0003] Current voice conversion technology relies on parallel corpora, that is, recordings of the same content by different speakers. In practical application scenarios, however, it is difficult for users to record speech with specific content, so parallel corpora are hard to obtain in a real environment, which affects the subsequent voice conversion effect.

Embodiment Construction

[0062] The technical solutions of the embodiments of this specification will be described in detail below in conjunction with the accompanying drawings.

[0063] The embodiments of this specification disclose a training method and device for a voice conversion model. The application scenario and inventive concept of the training method are introduced first, as follows:

[0064] Current voice conversion technology relies on parallel corpora, that is, recordings of the same content by different speakers. In practical application scenarios, however, it is difficult for users to record speech with specific content, so parallel corpora are hard to obtain. A voice conversion model trained on a small parallel corpus does not convert voices well enough.

[0065] In view of this, the embodiments of this specification provide a training method for a voice conversion model. Figure 1 shows a schematic diagr...

Abstract

Embodiments of the invention provide a voice conversion model training method and device. The method comprises: performing feature extraction on a sample audio to obtain a Mel-spectrum feature label and a fundamental frequency sequence; inputting the Mel-spectrum feature label into an encoder to obtain a first content vector; inputting the first content vector into a bottleneck layer to obtain a current codebook vector and a second content vector; determining a first loss value based on the first content vector and the current codebook vector; inputting the first content vector into a perceptron layer to obtain, for the first content vector, the emission probability of each character or of the blank character; determining a second loss value based on the transcription text label of the sample audio and the emission probability; inputting the normalized fundamental frequency sequence, the second content vector, and the speaker label of the sample audio into a decoder to obtain predicted Mel-spectrum features; determining a third loss value based on the Mel-spectrum feature label and the predicted Mel-spectrum features; and training the voice conversion model with the objective of minimizing the loss values.
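
The three loss values named in the abstract can be illustrated with a minimal pure-Python sketch. The abstract does not give the exact form of any loss, so everything below is an assumption made for illustration: the first loss is taken to be a squared distance to the nearest codebook vector (as in vector-quantization bottlenecks), the second is taken to be a CTC-style negative log-likelihood (suggested by the per-character and blank emission probabilities), the third is taken to be a mean absolute error on Mel frames, and the equal weights in `total_loss` are hypothetical. All function names are illustrative and are not taken from the patent.

```python
import math

def nearest_codebook(z, codebook):
    """Return (index, vector) of the codebook entry closest to z (squared L2)."""
    dists = [sum((a - b) ** 2 for a, b in zip(z, c)) for c in codebook]
    idx = min(range(len(codebook)), key=dists.__getitem__)
    return idx, codebook[idx]

def vq_loss(content, codebook):
    """First loss (assumed form): mean squared distance between each frame's
    content vector and the codebook vector selected for it."""
    total, selected = 0.0, []
    for z in content:
        _, c = nearest_codebook(z, codebook)
        selected.append(c)
        total += sum((a - b) ** 2 for a, b in zip(z, c)) / len(z)
    return total / len(content), selected

def ctc_loss(probs, target, blank=0):
    """Second loss (assumed form): CTC negative log-likelihood.
    probs[t][k] is the emission probability of symbol k at frame t."""
    ext = [blank]                      # target interleaved with blanks
    for s in target:
        ext += [s, blank]
    alpha = [0.0] * len(ext)           # forward variables at frame 0
    alpha[0] = probs[0][blank]
    if len(ext) > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * len(ext)
        for s, sym in enumerate(ext):
            a = alpha[s]               # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]      # advance by one position
            if s >= 2 and sym != blank and sym != ext[s - 2]:
                a += alpha[s - 2]      # skip the blank between distinct symbols
            new[s] = a * probs[t][sym]
        alpha = new
    likelihood = alpha[-1] + (alpha[-2] if len(ext) > 1 else 0.0)
    return -math.log(likelihood)

def mel_loss(pred, label):
    """Third loss (assumed form): mean absolute error over all Mel bins."""
    n = sum(len(frame) for frame in label)
    return sum(abs(p - y) for fp, fy in zip(pred, label)
               for p, y in zip(fp, fy)) / n

def total_loss(l1, l2, l3, weights=(1.0, 1.0, 1.0)):
    """Weighted sum to minimize; the equal weights are hypothetical."""
    return weights[0] * l1 + weights[1] * l2 + weights[2] * l3

# Toy usage with made-up numbers (2 frames, 2-dimensional vectors).
l1, selected = vq_loss([[0.9, 1.1], [0.1, -0.1]], [[0.0, 0.0], [1.0, 1.0]])
l2 = ctc_loss([[0.5, 0.5], [0.5, 0.5]], target=[1])
l3 = mel_loss([[1.0, 2.0]], [[1.5, 2.0]])
loss = total_loss(l1, l2, l3)
```

In an actual training setup these quantities would be computed on tensors inside an automatic-differentiation framework so that gradients can flow to the encoder, bottleneck, perceptron layer, and decoder; the sketch only shows how each loss value is derived from its two inputs.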

Description

Technical field

[0001] This specification relates to the technical field of speech processing, and in particular to a method and device for training a voice conversion model.

Background

[0002] Voice conversion (VC) has been a popular research topic in recent years. It converts one person's voice into another person's timbre while retaining the complete content information. Voice conversion is a branch of speech synthesis; because it focuses on converting the identity information in speech, it is one of the challenging research problems in speech signal processing.

[0003] Current voice conversion technology relies on parallel corpora, that is, recordings of the same content by different speakers. In practical application scenarios, however, it is difficult for users to record speech with specific content, so parallel corpora are hard to obtain, which in turn affects the subsequent voice conversion ef...

Application Information

IPC(8): G10L25/24, G10L25/30, G10L15/26, G10L15/06

CPC: G10L25/24, G10L25/30, G10L15/26, G10L15/063

Inventor: 张鹏远, 陈子毅, 颜永红

Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI
