Non-parallel corpus sound conversion data enhancement model training method and device

A voice conversion and speech data technique, applied in speech analysis, speech recognition, and related fields. It addresses problems such as inaccurate alignment between audio and text and the resulting poor audio quality, with the aim of ensuring the accuracy of the trained model and improving the voice conversion result.

Active Publication Date: 2021-11-02

AISPEECH CO LTD

Problems solved by technology

Inaccurate alignment of audio to text resulting in poorly generated audio

Embodiment Construction

[0037] To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in those embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of this application. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of this application.

[0038] It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other.

[0039] This application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program mod...

Abstract

The present application discloses a method for training a data enhancement model for non-parallel corpus voice conversion. The method includes: configuring the data enhancement model with a sequentially connected acoustic attention layer, text attention layer, and decoder module, where the acoustic attention layer includes a first GRU layer and a first attention layer and the text attention layer includes a second GRU layer and a second attention layer; encoding a sample source text sequence as an embedding sequence; inputting a sample target acoustic feature sequence to the first GRU layer; and inputting the embedding sequence to the first attention layer and the second attention layer to train the data enhancement model. The acoustic attention layer and the text attention layer preserve the duration and linguistic context contained in the source speech. This helps ensure the accuracy of the trained data enhancement model, makes it better suited to aligning audio with text, and can help improve the voice conversion result.
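
The data flow described in the abstract can be illustrated with a minimal, forward-pass-only sketch in plain Python. It is not the patented implementation. The abstract does not say how the attention layers score the text, what feeds the second GRU layer, or how the decoder and training loss are built, so the following are assumptions made only for illustration: dot-product attention with the GRU state as the query, the output of the acoustic attention layer as the input to the second GRU layer, random untrained weights, and all names and sizes (`GRUCell`, `attend`, `attention_layer`, `EMB_DIM`, `FEAT_DIM`). The decoder module and the training step are omitted.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """GRU cell with biases omitted for brevity (weights are random, untrained)."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One input/hidden weight pair per gate: update (z), reset (r), candidate (n).
        self.W = {g: rand_matrix(hid_dim, in_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hid_dim, hid_dim, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def attend(query, keys):
    """Dot-product attention (an assumption): softmax weights and weighted sum of keys."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(len(keys[0]))]
    return weights, context


def attention_layer(inputs, text_embeddings, gru):
    """Run a GRU over `inputs`; at each step attend over the text embedding sequence."""
    h = [0.0] * gru.hid_dim
    outputs, alignment = [], []
    for x in inputs:
        h = gru.step(x, h)
        weights, context = attend(h, text_embeddings)
        outputs.append(context)
        alignment.append(weights)
    return outputs, alignment


rng = random.Random(0)
EMB_DIM = 4   # embedding size, reused as GRU hidden size so dot products line up
FEAT_DIM = 6  # size of one acoustic feature frame (hypothetical)

# Stand-ins for the embedded source text and the target acoustic feature sequence.
text_embeddings = [[rng.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(3)]
acoustic_frames = [[rng.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(5)]

acoustic_gru = GRUCell(FEAT_DIM, EMB_DIM, rng)  # "first GRU layer"
text_gru = GRUCell(EMB_DIM, EMB_DIM, rng)       # "second GRU layer"

# Acoustic attention layer, then text attention layer; both attend over the embeddings.
acoustic_out, acoustic_align = attention_layer(acoustic_frames, text_embeddings, acoustic_gru)
text_out, text_align = attention_layer(acoustic_out, text_embeddings, text_gru)

for t, row in enumerate(text_align):
    print(t, [round(w, 3) for w in row])
```

Each row of `text_align` is a distribution over the text tokens for one acoustic frame, which is the kind of frame-to-text alignment the abstract refers to. In a trained model these weights would be learned rather than random.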

Description

Technical field

[0001] The present application relates to the technical field of voice conversion, in particular to a non-parallel corpus sound conversion data enhancement model training method and device.

Background

[0002] Voice conversion (VC) is a technology that transforms the audio of one speaker's speech so that it sounds as if it was spoken by another speaker, without changing the linguistic content. VC has great potential in a variety of tasks, e.g., customized feedback for computer-aided speech training systems, developing personalized teaching assistants for speech-impaired subjects, film dubbing with various human voices, etc.

[0003] There are two main types of VC technology based on data conditions: parallel VC and non-parallel VC. Parallel VC techniques require the availability of parallel utterance pairs of source and target speakers. These techniques focus on developing mapping functions for source and target utterance...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G10L15/06, G10L21/007, G10L21/013, G10L25/30

CPC: G10L15/063, G10L21/007, G10L21/013, G10L25/30, G10L2021/0135, G10L21/003

Inventor: 俞凯, 李沐阳, 陈博, 陈宽, 吴松泽, 刘知峻

Owner: AISPEECH CO LTD
