
Non-parallel corpus sound conversion data enhancement model training method and device

A voice conversion and speech data technology, applied in speech analysis, speech recognition, instruments, etc. It addresses problems such as inaccurate alignment between audio and text and the resulting poor audio quality, with the aim of ensuring alignment accuracy and improving the voice conversion result.

Active Publication Date: 2021-11-02
AISPEECH CO LTD

AI Technical Summary

Problems solved by technology

Inaccurate alignment of audio to text resulting in poorly generated audio



Detailed Description of Embodiments

[0037] To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of this application.

[0038] It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other.

[0039] This application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program mod...


Abstract

The present application discloses a method for training a data enhancement model for non-parallel corpus voice conversion, including: configuring, for the data enhancement model, an acoustic attention layer, a text attention layer and a decoder module connected in sequence, where the acoustic attention layer includes a first GRU layer and a first attention layer, and the text attention layer includes a second GRU layer and a second attention layer; encoding a sample source text sequence into an embedding sequence; inputting a sample target acoustic feature sequence to the first GRU layer; and inputting the embedding sequence to the first attention layer and the second attention layer to train the data enhancement model. Through the acoustic attention layer and the text attention layer, the application preserves the duration and linguistic context contained in the source speech, which ensures the accuracy of the trained data enhancement model, makes it better suited to aligning audio with text, and helps improve the quality of voice conversion.
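The data flow in the abstract can be illustrated with a small forward-pass sketch. This is not the patented implementation: the abstract does not specify layer sizes, the attention scoring function, how the two layers are wired together, or the decoder. The sketch assumes plain dot-product attention, a bias-free GRU cell, equal hidden and embedding dimensions, that the second GRU consumes the context vectors of the first attention layer, and that the decoder would receive the concatenation of the second GRU state and the second context vector. All names (`GRUCell`, `attend`, `forward`) are hypothetical, the weights are random, and no training step is shown.

```python
import math
import random

def rand_matrix(rows, cols, rng):
    """Random matrix as a list of row lists (stand-in for learned weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class GRUCell:
    """Minimal bias-free GRU cell; each gate acts on the concatenation [x; h]."""
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        self.Wz = rand_matrix(hid_dim, in_dim + hid_dim, rng)  # update gate
        self.Wr = rand_matrix(hid_dim, in_dim + hid_dim, rng)  # reset gate
        self.Wn = rand_matrix(hid_dim, in_dim + hid_dim, rng)  # candidate state

    def step(self, x, h):
        z = [sigmoid(v) for v in matvec(self.Wz, x + h)]
        r = [sigmoid(v) for v in matvec(self.Wr, x + h)]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in matvec(self.Wn, x + gated_h)]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, xs):
        """Run over a sequence from a zero state; return every hidden state."""
        h, states = [0.0] * self.hid_dim, []
        for x in xs:
            h = self.step(x, h)
            states.append(h)
        return states

def attend(query, keys):
    """Dot-product attention (an assumption): softmax weights and context vector."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * key[d] for w, key in zip(weights, keys))
               for d in range(len(keys[0]))]
    return weights, context

def forward(embeddings, acoustic_frames, gru1, gru2):
    # Acoustic attention layer: first GRU reads the target acoustic features,
    # first attention layer attends over the text embedding sequence.
    align, ctx1 = [], []
    for h in gru1.run(acoustic_frames):
        w, c = attend(h, embeddings)
        align.append(w)
        ctx1.append(c)
    # Text attention layer (assumed wiring): second GRU reads the first
    # layer's contexts, second attention attends over the same embeddings.
    h2 = gru2.run(ctx1)
    ctx2 = [attend(h, embeddings)[1] for h in h2]
    # Features a decoder module would consume (decoder itself omitted).
    decoder_in = [h + c for h, c in zip(h2, ctx2)]
    return align, decoder_in

rng = random.Random(0)
D, A = 4, 3                                # embedding/hidden size, acoustic size
embeddings = rand_matrix(5, D, rng)        # 5 encoded source-text tokens
frames = rand_matrix(7, A, rng)            # 7 target acoustic feature frames
align, decoder_in = forward(embeddings, frames, GRUCell(A, D, rng), GRUCell(D, D, rng))
print(len(align), len(align[0]), len(decoder_in[0]))  # prints: 7 5 8
```

Each row of `align` is a distribution over text tokens for one acoustic frame, which is the kind of audio-to-text alignment the abstract refers to. In a real system the weights would be learned, for example by minimizing a reconstruction loss on the decoder output.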

Description

Technical field

[0001] The present application relates to the technical field of voice conversion, and in particular to a non-parallel corpus sound conversion data enhancement model training method and device.

Background

[0002] Voice conversion (VC) is a technology that transforms the audio of one speaker's speech so that it sounds as if it were spoken by another speaker, without changing the linguistic content. VC has great potential in a variety of tasks, e.g., customized feedback for computer-aided speech trimming systems, developing personalized teaching assistants for speech-impaired subjects, film dubbing with various human voices, etc.

[0003] Depending on the data conditions, there are two main types of VC technology: parallel VC and non-parallel VC. Parallel VC techniques require parallel utterance pairs from the source and target speakers. These techniques focus on developing mapping functions for source and target utterance...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L15/06, G10L21/007, G10L21/013, G10L25/30
CPC: G10L15/063, G10L21/007, G10L21/013, G10L25/30, G10L2021/0135, G10L21/003
Inventor: 俞凯, 李沐阳, 陈博, 陈宽, 吴松泽, 刘知峻
Owner: AISPEECH CO LTD