Statistical Enhancement of Speech Output from Statistical Text to Speech Synthesis Systems

A technology involving speech and statistical models, applied in the field of speech synthesis

Status: Inactive
Publication Date: 2016-04-13
INT BUSINESS MACHINES CORP
Cites: 3, Cited by: 0

AI Technical Summary

Problems solved by technology

[0007] The formant blurring effect has been known in the field of speech coding for many years; in HMM-based TTS, however, this effect has a stronger negative impact on the perceived quality of the output.


Examples

Example 1

[0101] For the exponentially corrected homomorphic filter function (9) and the enhancement criterion (14), the optimal enhancement parameter α (7) can be calculated by log-linear regression:

[0102] $$\log \alpha_{opt} = \frac{\sum_n n \cdot \log R(n)}{\sum_n n^2}, \quad R(n) = \ldots$$
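
The log-linear regression can be illustrated with a minimal Python sketch. It assumes the regression is an ordinary least-squares fit of log R(n) against n through the origin, which gives log α = Σ n·log R(n) / Σ n². The definition of R(n) is elided in this excerpt, so the sketch simply takes a sequence of positive values; the function name `fit_log_alpha` and the indexing convention are hypothetical.

```python
import math


def fit_log_alpha(ratios):
    """Least-squares fit of log R(n) ~ n * log(alpha) through the origin.

    ratios[i] is taken to be R(n) for n = i + 1 (a hypothetical convention);
    every value must be strictly positive so that the logarithm is defined.
    """
    num = sum(n * math.log(r) for n, r in enumerate(ratios, start=1))
    den = sum(n * n for n in range(1, len(ratios) + 1))
    return num / den


# Synthetic check: when R(n) = 0.9**n exactly, the fit recovers alpha = 0.9.
ratios = [0.9 ** n for n in range(1, 21)]
alpha_opt = math.exp(fit_log_alpha(ratios))
```

With noisy R(n), the same call returns the α whose exponential curve α^n best matches R(n) in the log domain, in the least-squares sense.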

Example 2

[0105] In the case of the two concatenated exponentials (11) and the enhancement criterion (14), the optimal set of enhancement parameters can be computed as follows. For a fixed junction point γ, the values of α and β can be computed as:

[0106] $$\log \alpha(\gamma) = \frac{\sum_{n \le \gamma} n \cdot \log R(n)}{\sum_{n \le \gamma} n \ldots}$$

Example 3

[0114] In the case of the exponentially corrected homomorphic filter function (9) and the enhancement criterion (15), the optimal value of the exponential base α can be obtained by solving the following equation:

[0115] $$\sum_n \alpha^{2n} \cdot n^2 \cdot H_{syn}(n) = \sum_n n^2 \cdot M_{real}^2(n), \quad \alpha > 0 \qquad (20)$$

[0116] The left-hand side of (20) is an unbounded, monotonically increasing function of α that is smaller than the right-hand side at α = 0. Therefore, ...
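
Because the left-hand side of (20) increases monotonically and without bound in α, a simple bracketing search is one way to find the root numerically. The Python sketch below uses bisection; this is an illustrative choice, not necessarily the procedure the patent goes on to describe. The names `solve_alpha`, `h_syn`, and `m_real_sq` are hypothetical, and the inputs are assumed to be sequences indexed n = 1..N with non-negative entries.

```python
def solve_alpha(h_syn, m_real_sq, iterations=100):
    """Solve sum_n alpha**(2n) * n**2 * h_syn(n) = sum_n n**2 * m_real_sq(n).

    h_syn[i] and m_real_sq[i] hold the values for n = i + 1 (hypothetical
    layout). h_syn must be non-negative with at least one positive entry,
    so the left-hand side is increasing and unbounded in alpha.
    """
    ns = range(1, len(h_syn) + 1)
    target = sum(n * n * m for n, m in zip(ns, m_real_sq))

    def lhs(alpha):
        return sum(alpha ** (2 * n) * n * n * h for n, h in zip(ns, h_syn))

    lo, hi = 0.0, 1.0
    while lhs(hi) < target:      # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(iterations):  # halve the bracket a fixed number of times
        mid = 0.5 * (lo + hi)
        if lhs(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic check: with h_syn(n) = 1 and m_real_sq(n) = 0.8**(2n),
# both sides of the equation agree exactly at alpha = 0.8.
h_syn = [1.0] * 10
m_real_sq = [0.8 ** (2 * n) for n in range(1, 11)]
alpha = solve_alpha(h_syn, m_real_sq)
```

Bisection is used here only because monotonicity guarantees a single sign change inside the bracket; any other bracketing root finder would serve equally well.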

Abstract

A method is described for enhancing speech synthesized by a statistical text-to-speech (TTS) system that employs a parametric representation of speech in a space of acoustic feature vectors. The method comprises: defining a parametric family of corrective transformations that operate in the acoustic feature vector space and depend on a set of enhancement parameters; and defining a distortion indicator for a feature vector or a plurality of feature vectors. The method also includes receiving a feature vector output by the system, and generating an instance of the corrective transformation by: calculating a reference value of the distortion indicator, attributed to the statistical model of the phonetic unit that emitted the feature vector; calculating an actual value of the distortion indicator, attributed to the feature vectors emitted by that statistical model; calculating enhancement parameter values based on the reference value, the actual value of the distortion indicator, and the parametric corrective transformation; and deriving, from the parametric family, the instance of the corrective transformation that corresponds to the enhancement parameter values. The instance of the corrective transformation may be applied to the feature vector to provide an enhanced feature vector.

Description

Background

[0001] The present invention relates to the field of synthesized speech. In particular, it relates to the statistical enhancement of synthesized speech output from a statistical text-to-speech (TTS) synthesis system.

[0002] Synthetic speech is artificially produced human speech generated by computer software or hardware. TTS systems convert written text into speech signals or waveforms suitable for digital-to-analog conversion and playback.

[0003] One form of TTS system uses concatenative synthesis, in which recorded speech segments are selected from a database and concatenated to form a speech signal conveying the input text. Typically, the stored speech segments represent speech units such as sub-phones, phonemes, and diphones that occur in a particular phonetic-linguistic context.

[0004] Another category of speech synthesis (called "statistical TTS") produces synthetic speech signals through statistical modeling of human s...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L13/033
CPC: G10L13/033, G10L13/06, G10L13/02
Inventors: A. Sorin, S. Shechtman
Owner: INT BUSINESS MACHINES CORP