Estimation method for fundamental frequency of Chinese whispered speech

A fundamental frequency estimation technique for whispered speech, applied in the field of speech signal processing, which addresses the lack of fundamental frequency information in Chinese whispered speech.

Active Publication Date: 2015-02-25
SUZHOU UNIV
Cites: 4; Cited by: 8

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a method for estimating the fundamental frequency of Chinese whispered speech, which overcomes the difficulties caused by the lack of fundamental frequency information in Chinese whispered speech.

Examples

Embodiment 1

[0017] Embodiment 1: as shown in Figure 1, a method for estimating the fundamental frequency of Chinese whispered speech comprises the following steps:

[0018] (1) Build a database of whispered and normal speech with a consistent corpus, so that within the database the speaker, the spoken content, and the word order of the whispered and normal speech are identical;

[0019] (2) Extract the linear predictive cepstral coefficient (LPCC) parameters Lw of the whispered speech, and the LPCC parameters Ln and fundamental frequency parameter F0 of the normal speech, then perform Dynamic Time Warping (DTW) alignment according to Lw and Ln;

[0020] (3) Divide the F0 of the normal speech over the range 100 to 300 Hz at intervals of 5 Hz, giving 40 intervals in total;

[0021] (4) Assign all aligned vectors to the intervals according to the normal-speech F0 value, and train the LPCC vectors of all whispered speech in each interval as a Gaussian ...
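Steps (2) to (4) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `dtw_path`, `f0_interval`, and `group_by_interval` are hypothetical, the DTW uses a plain Euclidean frame distance with the three basic step directions, and the treatment of interval boundaries (half-open intervals, frames outside 100 to 300 Hz discarded) is an assumption, since the source does not specify it.

```python
import math
from collections import defaultdict

def dtw_path(a, b):
    """Align two sequences of feature vectors; return a list of (i, j) index pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance (assumed)
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end of both sequences along the cheapest predecessors.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path

def f0_interval(f0, lo=100.0, hi=300.0, width=5.0):
    """Map an F0 value to one of the 40 intervals, or None if outside [lo, hi)."""
    if not lo <= f0 < hi:
        return None  # e.g. unvoiced frames; boundary handling is an assumption
    return int((f0 - lo) // width)

def group_by_interval(whisper_lpcc, normal_lpcc, normal_f0):
    """Pair aligned whispered frames with normal-speech F0 and bucket them by interval."""
    bins = defaultdict(list)
    for wi, ni in dtw_path(whisper_lpcc, normal_lpcc):
        k = f0_interval(normal_f0[ni])
        if k is not None:
            bins[k].append((whisper_lpcc[wi], normal_f0[ni]))
    return bins
```

Each bucket returned by `group_by_interval` holds the training pairs for one interval: the whispered LPCC vectors alone would feed the per-interval model of step (4), and the pairs with F0 would feed the joint model described in the abstract.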

Embodiment 2

[0023] Embodiment 2: 80 speakers took part in the recording, 40 male and 40 female, ranging in age from children to the elderly with a relatively balanced distribution. Recordings were made in a quiet environment with a handheld microphone, at a sampling rate of 16 kHz with 16-bit quantization. So that children could take part without difficulty, the recording text was drawn from elementary school Chinese textbooks and covers all Chinese tonal syllables formed from the 21 initials and 35 finals of Chinese. The corpus content was screened to ensure a balanced distribution of phonemes.

[0024] Each speaker reads the same corpus in whispered speech and in normal speech. Because whispering is an unusual mode of phonation, some incorrect articulation is unavoidable. All whispered speech data were therefore checked by subjective inspection of the spectrum to ensure...

Abstract

The invention discloses a method for estimating the fundamental frequency of Chinese whispered speech. The method comprises the following steps: a database of whispered and normal speech with a consistent corpus is built; the LPCC parameters Lw of the whispered speech, and the LPCC parameters Ln and fundamental frequency parameters F0 of the normal speech, are extracted, and DTW alignment is performed according to Lw and Ln; the F0 of the normal speech is divided over the range 100-300 Hz at intervals of 5 Hz, giving forty intervals in total; the aligned vectors are assigned to the intervals according to the normal-speech F0 values, all whispered speech LPCC vectors in each interval are trained into a GMM, and the combined vectors formed from the whispered speech LPCC vectors and the normal-speech F0 parameters in the corresponding interval are trained into a GMM to obtain an estimation function, giving forty estimation functions in total; at estimation time, the LPCC parameters of the whispered speech are extracted and matched against all the GMMs to find the best-matching model, and the F0 values of the whispered speech are then estimated with that model's estimation function. The method makes it possible to estimate the fundamental frequency of whispered speech and effectively overcomes the difficulties caused by the loss of fundamental frequency information in Chinese whispered speech.
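The final matching and estimation step can be sketched as follows. The source text does not give the form of the estimation function, so this sketch substitutes the widely used joint-density GMM regression (a posterior-weighted sum of per-component conditional means) and, for brevity, assumes diagonal covariances for the whispered LPCC features. All names (`best_interval`, `estimate_f0`) and the parameter layouts are hypothetical, and the model parameters are assumed to have been trained beforehand.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, gmm):
    """gmm is a list of (weight, mean, var) components (hypothetical layout)."""
    logs = [math.log(w) + log_gauss_diag(x, mu, var) for w, mu, var in gmm]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs))

def best_interval(x, whisper_gmms):
    """Pick the interval whose whispered-speech GMM gives x the highest likelihood."""
    return max(whisper_gmms, key=lambda k: gmm_log_likelihood(x, whisper_gmms[k]))

def estimate_f0(x, joint_gmm):
    """Joint-density GMM regression (an assumed form of the estimation function).

    joint_gmm is a list of (weight, mean_x, var_x, mean_f0, cov_f0_x) components,
    where cov_f0_x holds the covariance of F0 with each LPCC dimension.
    """
    logs = [math.log(w) + log_gauss_diag(x, mx, vx) for w, mx, vx, _, _ in joint_gmm]
    top = max(logs)
    post = [math.exp(v - top) for v in logs]
    total = sum(post)
    f0 = 0.0
    for p, (_, mx, vx, my, cyx) in zip(post, joint_gmm):
        # Conditional mean of F0 given x for this component.
        cond = my + sum(c / v * (xi - m) for c, v, xi, m in zip(cyx, vx, x, mx))
        f0 += p / total * cond
    return f0
```

For a whispered frame `x`, one would call `k = best_interval(x, whisper_gmms)` and then `estimate_f0(x, joint_gmms[k])`, where `joint_gmms` is a hypothetical dictionary holding the joint model of each interval. A full-covariance model would replace the element-wise division by a matrix solve.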

Description

Technical field

[0001] The invention relates to speech signal processing technology, and in particular to a method for estimating the fundamental frequency of Chinese whispered speech.

Background

[0002] Chinese is a tonal language, and the speaker's meaning and emotion are conveyed largely through tone. During whispering, however, the vocal cords do not vibrate, so the most important carrier of tone, the fundamental (pitch) frequency, is lost. Whether whispered speech carries tone, and how that tone is perceived, has therefore become an active research topic. The study of tone perception in whispering is of great significance for whispered speech processing tasks such as enhancement and recognition. In 1972, Abramson summarized two opposing views on tone in whispering. The first view is represented by Panconcelli-Calzia, who held that in tonal languages continuous whispered speech can be understood from context, whereas isolated words cannot. The representative of the second view is ...

Application Information

IPC(8): G10L25/24, G10L25/78, G10L15/06
Inventor: 陈雪勤, 刘正, 赵鹤鸣, 俞一彪
Owner: SUZHOU UNIV