Method and system for providing sound-library hybrid training model

A technology for training a model on a mix of several sound libraries, classified under dot-dash line transmission devices and the like. It addresses the high cost of recording a sound library by lowering the requirements on speakers, simplifying the model training process, and reducing cost.

Active Publication Date: 2012-10-24
BEIJING SINOVOICE TECH CO LTD

AI Technical Summary

Problems solved by technology

Due to the relatively high quality requirements for the speaker's


Examples


Embodiment Construction

[0028] In view of the deficiencies of the prior art, the present invention proposes a method for training a model on mixed sound banks, which can solve some or all of the aforementioned problems and yields a relatively stable model. In this method, several speakers are first selected to record sound banks; when the model is trained, the sound banks are mixed, that is, the sound bank data of the several speakers are pooled and trained together. This has three advantages. First, training with multiple speakers blurs the shortcomings of any single speaker, and the trained model tends toward an average of the speakers, which makes it more stable. Second, each speaker has distinct characteristics, and mixed training can combine their respective strengths. Third, the parameter characteristics of real speakers are not optimal, and training with multiple speakers can significantly optimize the speech synthesis eff...
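The averaging effect described in [0028] can be illustrated with a small sketch. It is not the patent's training procedure: the estimator (a plain mean), the speaker names, and the pitch values are all assumptions made for illustration. It only shows that a statistic estimated from pooled data lies between the per-speaker statistics, so one speaker's unstable sample has less influence.

```python
from statistics import fmean


def pooled_mean(sound_banks):
    """Mean over all samples from all speakers, pooled into one set."""
    pooled = [v for values in sound_banks.values() for v in values]
    return fmean(pooled)


# Hypothetical pitch values (Hz) for the same speech unit, one list per speaker.
sound_banks = {
    "speaker_a": [118.0, 122.0, 120.0, 160.0],  # contains one unstable sample
    "speaker_b": [124.0, 126.0, 125.0, 125.0],
    "speaker_c": [119.0, 121.0, 120.0, 120.0],
}

# Statistic each speaker would give when trained alone.
per_speaker = {name: fmean(values) for name, values in sound_banks.items()}

# Statistic obtained when all sound banks are mixed before training.
mixed = pooled_mean(sound_banks)

print(per_speaker)
print(mixed)
```

With equal sample counts per speaker, the pooled mean equals the average of the per-speaker means; with unequal counts it is weighted by how much data each speaker contributes.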


Abstract

The invention discloses a method for providing a sound-library hybrid training model, comprising the following steps: according to a recording corpus selected as a sample, acquiring the sound signals of at least two speakers to obtain at least two sets of recording data; extracting sound parameter information from each set of recording data, the parameter information comprising at least one of pitch, spectrum, and duration; and performing statistical analysis on the sound parameters to obtain a parameter model. The invention also discloses a corresponding system for providing a sound-library hybrid training model. Building on existing speech synthesis technology, the invention mixes multiple sound libraries during model training, that is, the sound-library data of several speakers are pooled and trained together. The trained model tends toward the average parameters of the speakers or the optimal parameters of a single speaker, which yields a relatively stable model. The method and system lower the requirements on speakers and reduce recording cost; they also make the model training process easier to complete and the synthesized speech more natural.
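The three steps in the abstract (collect recordings from at least two speakers, extract parameters, run statistical analysis) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the `UnitSample` record, the `train_parameter_model` function, the per-unit mean and variance statistics, and all numeric values are hypothetical, and parameter extraction from audio is assumed to have happened already. Spectrum is left out for brevity, since the abstract requires only at least one of pitch, spectrum, and duration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean, pvariance


@dataclass
class UnitSample:
    """One already-extracted parameter record (hypothetical structure)."""
    speaker: str
    unit: str           # speech-unit label, e.g. a phone
    pitch_hz: float
    duration_ms: float


def train_parameter_model(samples):
    """Group samples by unit across ALL speakers, then fit mean and variance."""
    by_unit = defaultdict(list)
    for sample in samples:
        by_unit[sample.unit].append(sample)

    model = {}
    for unit, group in by_unit.items():
        pitches = [s.pitch_hz for s in group]
        durations = [s.duration_ms for s in group]
        model[unit] = {
            "pitch_hz": (fmean(pitches), pvariance(pitches)),
            "duration_ms": (fmean(durations), pvariance(durations)),
            "speakers": sorted({s.speaker for s in group}),
        }
    return model


# Two hypothetical speakers reading the same corpus.
samples = [
    UnitSample("speaker_a", "a", 120.0, 90.0),
    UnitSample("speaker_b", "a", 130.0, 110.0),
    UnitSample("speaker_a", "n", 110.0, 60.0),
    UnitSample("speaker_b", "n", 114.0, 64.0),
]

model = train_parameter_model(samples)
for unit, stats in model.items():
    print(unit, stats)
```

Because grouping is by unit rather than by speaker, every statistic in the resulting model is estimated from the mixed data of all speakers who produced that unit.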

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, and in particular to a method and a system for providing a sound bank mixed training model.

Background art

[0002] Speech synthesis is an important technology for natural and efficient human-computer interaction. Speech synthesis technology is also known as text-to-speech (TTS). Simply put, it lets the computer "speak": the computer converts arbitrary text into a sound file and outputs the sound through a multimedia device, that is, it automatically converts any text into speech and plays it to the user. Two speech synthesis methods are most common today: one is based on unit selection and waveform splicing, and the other is parametric synthesis based on an acoustic statistical model.

[0003] In the traditional unit selection algorithm, the target cost and connection cost are often realized by calculating the difference of the context at...


Application Information

IPC(8): H04L15/06
Inventors: 李健, 郑晓明, 张连毅, 武卫东
Owner: BEIJING SINOVOICE TECH CO LTD