A system and method for training cloned timbre and rhythm based on bottleneck features

A feature-based training system and training method, applied in speech recognition, voice cloning, speech synthesis, and the intelligent speech field of artificial intelligence. It addresses production delays that cannot keep pace with market demand, high labor costs, and the difficulty of delivering customized speech synthesis services, with the effect of shortening the production cycle and reducing the amount of corpus required.

Active Publication Date: 2021-08-03

NANJING SILICON INTELLIGENCE TECH CO LTD

AI Technical Summary

Problems solved by technology

[0002] With the rapid development of the telephone robot market, the rapid growth in the volume of intelligent voice services has made customized speech synthesis (TTS) services very difficult to deliver. A single customized TTS service requires nearly 10,000 real recording samples, and the production cycle, from sample collection, data labeling, and data preprocessing to model training and service deployment, takes nearly one month and requires substantial labor. This delay cannot keep pace with market demand.

Embodiment Construction

[0033] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0034] As shown in Figure 1, the present invention provides a system for training cloned timbre and rhythm based on bottleneck features, including:

[0035] (1) a data acquisition module, used to collect the corpus for the speech recognition module (ASR Model), the base TTB model corpus for the prosody module (TTTBModel), the corpus for the multi-speaker acoustic model (Multi-speaker Acoustic Model), and the clone corpus (audio of the target user and the corresponding text);

[0036] (2) an acoustic feature extraction module, which extracts linear predictive coding features (LPC Feature) and Mel-frequency cepstral coefficients (MFCC) as the acoustic features;

[...
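Paragraph [0036] names LPC features but the excerpt gives no LPC order, frame length, or window. The following is a minimal sketch, assuming the common autocorrelation method solved with the Levinson-Durbin recursion, a Hann window, and a hypothetical default order of 16; the function names are illustrative and do not come from the patent. MFCCs are usually taken from an existing library (for example `librosa.feature.mfcc`) and are not reproduced here, and `librosa.lpc` offers an alternative LPC estimate based on Burg's method.

```python
import math
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the LPC normal equations with the Levinson-Durbin recursion.

    Returns (a, err): a = [1, a1, ..., a_p] are the coefficients of the
    prediction-error filter A(z) = 1 + a1 z^-1 + ... + a_p z^-p, and err is
    the remaining prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: keep remaining coefficients at 0
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_features(frame: Sequence[float], order: int = 16) -> List[float]:
    """Hann-window one frame and return its LPC coefficients a1..a_p.

    The order of 16 is a hypothetical default, not a value from the patent.
    """
    n = len(frame)
    if n < 2:
        raise ValueError("frame must contain at least two samples")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    windowed = [x * w for x, w in zip(frame, window)]
    a, _ = levinson_durbin(autocorrelation(windowed, order), order)
    return a[1:]
```

In a real front end `lpc_features` would be applied frame by frame to the recordings in each corpus. For a pure sinusoid of angular frequency w, an order-2 fit should return coefficients close to (-2cos w, 1), which is a quick sanity check of the sign convention.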

Abstract

The present invention relates to the technical fields of speech synthesis, speech recognition, and voice cloning. It combines speech synthesis, speech recognition, and transfer learning to provide a voice cloning scheme based on bottleneck features (linguistic features of the audio), comprising a training system and a training method. Using only a small number of samples, it provides TTS services with high naturalness and high similarity to the target user's voice, thereby addressing the large sample requirements, long production cycle, and high labor cost of speech synthesis services. The training system includes a data acquisition module, an acoustic feature extraction module, a speech recognition module, a prosody module, a multi-speaker acoustic module, and a speech synthesis module. The present invention also provides a training method based on this system, comprising training corpus preparation, acoustic feature extraction, training and fine-tuning of each module, and speech synthesis.
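The abstract describes bottleneck features as linguistic features of the audio obtained with a speech recognition model. A minimal sketch, assuming a plain feed-forward ASR network whose narrow hidden layer (at the hypothetical index `bn_index`) is the bottleneck: each acoustic frame is propagated only up to that layer and its linear activations are returned. The `synthesize` function shows one assumed way the modules could be chained at synthesis time (prosody module predicting bottleneck features from text, multi-speaker acoustic model mapping them to the target speaker's acoustic features, then the speech synthesis module producing audio). All names, the ReLU activation, and this data flow are illustrative assumptions, since the excerpt does not specify them.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights of shape out_dim x in_dim, bias)


def dense(x: Sequence[float], w: Matrix, b: Sequence[float]) -> Vector:
    """Affine map y = W x + b for a single frame."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, x) for x in v]


def bottleneck_features(frames: Sequence[Sequence[float]],
                        layers: Sequence[Layer],
                        bn_index: int) -> List[Vector]:
    """Return the activations of layer `bn_index` (the bottleneck) for each frame.

    Layers after the bottleneck (the ASR output layers) are not evaluated.
    """
    features = []
    for frame in frames:
        h: Vector = list(frame)
        for i, (w, b) in enumerate(layers[: bn_index + 1]):
            h = dense(h, w, b)
            if i < bn_index:  # hidden layers use ReLU; the bottleneck stays linear
                h = relu(h)
        features.append(h)
    return features


def synthesize(text: str,
               text_to_bn: Callable[[str], List[Vector]],
               bn_to_acoustic: Callable[[List[Vector], int], List[Vector]],
               vocoder: Callable[[List[Vector]], List[float]],
               speaker_id: int) -> List[float]:
    """Hypothetical chaining of the modules at synthesis time."""
    bn = text_to_bn(text)                      # prosody module: text -> bottleneck features (assumed)
    acoustic = bn_to_acoustic(bn, speaker_id)  # multi-speaker acoustic model
    return vocoder(acoustic)                   # speech synthesis module: acoustic features -> samples
```

One plausible reading of the abstract's transfer-learning claim is that bottleneck features carry mostly linguistic content, so only the acoustic side needs fine-tuning on the target user's small clone corpus; this is an interpretation, not a statement from the patent.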

Description

Technical field

[0001] The invention relates to the fields of speech synthesis (TTS), speech recognition (ASR), and voice cloning, and belongs to the intelligent speech field of artificial intelligence.

Background

[0002] With the rapid development of the telephone robot market, the rapid growth in the volume of intelligent voice services has made customized speech synthesis (TTS) services very difficult to deliver. A single customized TTS service requires nearly 10,000 real recording samples, and the production cycle, from sample collection, data labeling, and data preprocessing to model training and service deployment, takes nearly one month and requires substantial labor. This delay cannot keep pace with market demand. Currently, TTS mainly follows two technical approaches: staged speech synthesis and end-to-end speech synthesis. The purpose of timbre and rhythm cloning is to synthesize ...

Application Information

Patent Type & Authority: Patents (China)

IPC (IPC 8): G10L13/02, G10L15/02, G10L15/06, G10L15/16, G10L25/03, G10L25/24, G10L25/30, G10L25/12

CPC: G10L13/02, G10L15/02, G10L15/063, G10L15/16, G10L25/03, G10L25/12, G10L25/24, G10L25/30

Inventor: 司马华鹏 (Sima Huapeng), 龚雪飞 (Gong Xuefei)

Owner: NANJING SILICON INTELLIGENCE TECH CO LTD