Speech time scale modification method based on short-term continuous nonnegative matrix decomposition

A speech duration adjustment technique based on non-negative matrix decomposition, applied in speech analysis and related fields. It addresses the instability of existing methods, keeps the algorithm simple, removes the dependence on pitch period labeling, and reduces the strongly mechanical sound of the adjusted speech.

Active Publication Date: 2014-08-13
PLA UNIV OF SCI & TECH
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

However, because this method operates on a fixed pitch period length when extending the speech duration, good speech quality is obtained only when the adjustment scale is an integer number of pitch periods. Continuous duration adjustment is therefore unstable.
In addition, the TDPSOLA algorithm relies on accurate pitch period labeling, which is itself difficult to achieve.

Examples

Embodiment

[0047] Figure 6 is a schematic diagram of the duration adjustment process for a segment of male speech (the sentence "the office is equipped with microcomputers"). The duration adjustment ratio α is 2, the sampling rate of the speech is 8 kHz, the frame window length L is 256, and the frame shift R is 64. The discrete Fourier transform of each frame uses K = 256 frequency points. The short-term continuous non-negative matrix decomposition of the amplitude spectrum uses r = 50. The speech waveform is reconstructed from the amplitude spectrum with 30 iterations. The figure shows that after the original speech y(n) is processed by this method, the adjusted speech is twice as long as y(n).
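
As a quick sanity check on these settings, the sketch below converts them to time units and estimates frame counts before and after adjustment. The helper name `frame_stats` and the one-second example signal are illustrative assumptions, not part of the patent. The frame-count formula assumes no padding at the signal edges, and the output frame count assumes that interpolation scales the number of frames by α.

```python
def frame_stats(num_samples, fs=8000, win_len=256, hop=64, alpha=2.0):
    """Derive timing figures from the embodiment's settings (hypothetical helper)."""
    if num_samples < win_len:
        raise ValueError("signal shorter than one frame")
    # Number of full frames, assuming no padding at the signal edges.
    n_frames = 1 + (num_samples - win_len) // hop
    # Assumption: interpolating the encoding matrix scales the frame count by alpha.
    n_frames_out = round(alpha * n_frames)
    # Length of the overlap-added output built from n_frames_out frames at the same hop.
    out_samples = (n_frames_out - 1) * hop + win_len
    return {
        "window_ms": 1000.0 * win_len / fs,
        "hop_ms": 1000.0 * hop / fs,
        "overlap": 1.0 - hop / win_len,
        "frames_in": n_frames,
        "frames_out": n_frames_out,
        "samples_out": out_samples,
    }


# One second of speech at 8 kHz (illustrative input length, not from the patent).
stats = frame_stats(8000)
print(stats)
```

Under these assumptions, L = 256 and R = 64 at 8 kHz correspond to a 32 ms window advanced in 8 ms steps, so consecutive frames overlap by 75%.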

Abstract

The invention discloses a speech time scale modification method based on short-term continuous nonnegative matrix decomposition. The method comprises the following steps: decomposing the speech magnitude spectrum into a basis matrix and an encoding matrix using a short-term continuous nonnegative matrix decomposition algorithm; keeping the basis matrix unchanged and linearly interpolating the encoding matrix according to the time scale modification ratio of the speech; combining the basis matrix with the interpolated encoding matrix to obtain the time-scale-modified speech magnitude spectrum; and finally, reconstructing the time-scale-modified speech waveform from that magnitude spectrum using a waveform estimation method. The method improves the performance of speech time scale modification and the quality of the time-scale-modified speech.
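
The interpolation and recombination steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the matrices `W` and `H` are tiny made-up values standing in for the result of a decomposition, the helper names `interpolate_rows` and `matmul` are hypothetical, and mapping the first and last input columns onto the first and last output columns is one possible alignment choice that the abstract does not specify. The decomposition itself and the waveform estimation are not shown.

```python
def interpolate_rows(H, alpha):
    """Linearly resample each row of H (r x T) to round(alpha * T) columns."""
    T = len(H[0])
    T_new = max(1, round(alpha * T))
    out = []
    for row in H:
        new_row = []
        for j in range(T_new):
            # Map output column j to a fractional input position
            # (assumption: endpoints of input and output are aligned).
            pos = j * (T - 1) / (T_new - 1) if T_new > 1 else 0.0
            lo = int(pos)
            hi = min(lo + 1, T - 1)
            frac = pos - lo
            # A convex combination of two nonnegative values stays nonnegative.
            new_row.append((1.0 - frac) * row[lo] + frac * row[hi])
        out.append(new_row)
    return out


def matmul(W, H):
    """Plain-Python product of W (K x r) and H (r x T)."""
    return [[sum(w * h for w, h in zip(w_row, h_col)) for h_col in zip(*H)]
            for w_row in W]


# Illustrative values only: K = 3 frequency bins, r = 2 components, T = 3 frames.
W = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # basis matrix, kept unchanged
H = [[0.0, 2.0, 4.0], [1.0, 1.0, 3.0]]     # encoding matrix

H_stretched = interpolate_rows(H, 2.0)     # 3 frames become 6 frames
V_stretched = matmul(W, H_stretched)       # modified magnitude spectrum, 3 x 6
print(len(V_stretched), len(V_stretched[0]))
```

Because the basis matrix is untouched, the spectral shapes stay the same and only their activations over time are stretched.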

Description

Technical field

[0001] The invention belongs to the technical field of speech signal processing, and in particular relates to a speech duration adjustment method based on short-term continuous non-negative matrix decomposition.

Background

[0002] Speech duration adjustment technology changes the playback speed of speech while preserving its perceptual characteristics, such as the pitch period and formant structure, so that the processed speech sounds as if the speaker had deliberately changed the speaking rate. According to one survey, the fastest human speaking rate is about 110 to 180 words per minute, while the maximum rate that the human ear can understand is 2 to 3 times that ([1] M. R. Portnoff, "Time-scale modification of speech based on short-time Fourier analysis," PhD thesis, MIT, 1978). Therefore, if speech duration adjustment technology is used to adjust the playback speed of voice data as needed, the auditory potential of the human ear can be maximized, enabl...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L21/04
Inventors: 张雄伟, 吴海佳, 黄建军, 陈卫卫, 赵改华, 李铁南
Owner: PLA UNIV OF SCI & TECH