System and method for training cloned tone and rhythm based on Bottleneck features

A training system and training method, applicable to speech synthesis, speech analysis, and speech recognition, that addresses the high labor cost of customized speech synthesis and a turnaround too slow to meet market demand, shortening the production cycle and reducing the number of samples needed for cloning.

Active Publication Date: 2020-05-29
NANJING SILICON INTELLIGENCE TECH CO LTD

AI Technical Summary

Problems solved by technology

[0002] With the rapid development of the telephone robot market, the rapid growth in intelligent voice service volume has made customized text-to-speech (TTS) services difficult to deliver. A set of customized speech synthesis technology service...



Examples


Embodiment Construction

[0033] To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here serve only to explain the invention, not to limit it.

[0034] As shown in Figure 1, the present invention provides a system for training cloned timbre and rhythm based on Bottleneck features, comprising:

[0035] (1) A data collection module, which collects the corpus for the speech recognition module (ASR Model), the basic corpus for the prosody module (TTB Model), the corpus for the multi-speaker acoustic model (Multi-speaker Acoustic Model), and the clone corpus (the target user's audio and the corresponding text);

[0036] (2) An acoustic feature extraction module, which extracts linear predictive coding features (LPC Feature) and Mel-frequency cepstral coefficients (MFCC) as acoustic fe...
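The LPC features named in item (2) can be illustrated with a short sketch. The source does not say how the coefficients are computed; the code below assumes the widely used autocorrelation method with the Levinson-Durbin recursion, and the function names, frame contents, and prediction order are hypothetical choices made only for illustration.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of frame[n] * frame[n + k]."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + k] for i in range(n - k))
        for k in range(max_lag + 1)
    ]


def lpc(frame, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, err) with a[0] == 1.0, so that the linear predictor is
    x[n] ~= -(a[1] * x[n-1] + ... + a[order] * x[n-order]),
    and err is the energy of the prediction residual.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    if err == 0.0:  # silent frame: nothing to predict
        return a, 0.0
    for i in range(1, order + 1):
        # Reflection coefficient for step i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update coefficients 1..i from the order-(i-1) solution.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    # Illustrative frame only: a decaying exponential, not real speech.
    demo_frame = [0.9 ** n for n in range(200)]
    coeffs, residual = lpc(demo_frame, order=2)
    print(coeffs, residual)
```

In a real pipeline the frame would be a short windowed segment of audio (commonly a few tens of milliseconds), and the resulting coefficients would be computed per frame alongside the MFCCs.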



Abstract

The invention relates to the technical fields of speech synthesis, speech recognition, and voice cloning. By combining speech synthesis, speech recognition, and transfer learning technology, it provides a voice cloning scheme based on Bottleneck features (linguistic features of audio), comprising a training system and a training method. A TTS service with high naturalness and similarity, carrying the target user's characteristics, is provided from a small number of samples, which addresses the large sample size, long production cycle, and high labor cost of speech synthesis services. The training system comprises a data acquisition module, an acoustic feature extraction module, a speech recognition module, a prosody module, a multi-speaker acoustic module, and a speech synthesis module. The training method based on this system comprises the steps of training corpus preparation, acoustic feature extraction, training and fine-tuning of each module, and speech synthesis.
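As a rough illustration of what "Bottleneck features" usually means, the sketch below reads out the activations of a deliberately narrow hidden layer in a small feed-forward network. This is an assumption based on common practice in speech recognition, not a description of the patented model: the class name, layer sizes, and random untrained weights are all hypothetical, and a real system would first train the network on a recognition task.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer y = W x + b for a single input vector (weights is a list of rows)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases; stands in for trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class BottleneckNet:
    """Toy network: input -> hidden -> narrow bottleneck -> output (hypothetical sizes)."""

    def __init__(self, n_in, n_hidden, n_bottleneck, n_out, seed=0):
        rng = random.Random(seed)
        self.hidden = init_layer(n_in, n_hidden, rng)
        self.bottleneck = init_layer(n_hidden, n_bottleneck, rng)
        # The output layer would only be used while training; kept for completeness.
        self.output = init_layer(n_bottleneck, n_out, rng)

    def bottleneck_features(self, frame):
        """Return the bottleneck-layer activations for one acoustic feature frame."""
        h = relu(dense(frame, *self.hidden))
        return dense(h, *self.bottleneck)

    def extract(self, frames):
        """Map a sequence of acoustic frames to a sequence of bottleneck vectors."""
        return [self.bottleneck_features(f) for f in frames]


if __name__ == "__main__":
    net = BottleneckNet(n_in=13, n_hidden=32, n_bottleneck=8, n_out=40)
    frames = [[0.1] * 13, [0.2] * 13]  # placeholder frames, not real MFCCs
    print(len(net.extract(frames)), "frames of", len(net.extract(frames)[0]), "dims")
```

The design point is that the narrow layer forces the network to compress each frame into a compact representation, which can then be passed to downstream models in place of the raw acoustic features.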

Description

Technical field

[0001] The invention relates to speech synthesis technology (TTS), speech recognition technology (ASR), and voice cloning technology, and belongs to the intelligent speech branch of artificial intelligence.

Background

[0002] With the rapid development of the telephone robot market, the rapid growth in intelligent voice service volume has made customized text-to-speech (TTS) services difficult to deliver. A single customized TTS service requires nearly 10,000 real recording samples; the production cycle, from sample collection, data labeling, and data preprocessing through model training to service provision, takes nearly one month and incurs substantial labor costs. This turnaround is too slow to meet market demand. Currently, TTS mainly follows two technical approaches: staged speech synthesis and end-to-end speech synthesis. The purpose of timbre and rhythm cloning is to synthesize ...


Application Information

IPC(8): G10L13/02, G10L15/02, G10L15/06, G10L15/16, G10L25/03, G10L25/24, G10L25/30, G10L25/12
CPC: G10L13/02, G10L15/02, G10L15/063, G10L15/16, G10L25/03, G10L25/12, G10L25/24, G10L25/30
Inventor: 司马华鹏, 龚雪飞
Owner: NANJING SILICON INTELLIGENCE TECH CO LTD