Method of converting whispered voice into normal voice based on a radial basis function neural network

A technology based on neural networks and normal speech, applied in speech analysis, instruments, etc. It addresses problems such as the difficulty of extracting formants from whispered speech, degraded call quality, and distorted speech bands, and its effects are confidential calls, good intelligibility, and easier communication.

Inactive Publication Date: 2009-09-09

SUZHOU UNIV

Cites: 0 | Cited by: 22


AI Technical Summary

Problems solved by technology

However, because of the particular way whispers are produced and the influence of the conversational environment, the resulting voice signal has not only a low signal-to-noise ratio but also poor intelligibility and clarity. Especially when communicating through communication equipment, this degrades conversation quality and easily causes fatigue.

In addition, some voice patients, or people with impaired pronunciation function, can only whisper when communicating, which hampers communication.

[0003] At present, there are few studies on converting whispered speech into normal speech, either in China or abroad. The existing methods are:
1. Reconstructing whispered speech with linear prediction (LPC): partial correlation (PARCOR) coefficients are extracted from the whispered speech to build a lattice synthesis filter. The problems encountered are: firstly, the formants of whispered speech are not easy to extract;
2. Reconstructing whispered speech with the Mixed Excitation Linear Prediction (MELP) model: the speech is divided into five frequency bands, and the four low-frequency bands are used as voiced segments;
3. A whisper reconstruction system that combines homomorphic signal processing with relative-entropy-based phoneme segmentation: after the whispered speech is segmented into phonemes, homomorphic signal processing yields the vocal tract response sequence, and a fundamental frequency is added according to the tone. Because the transfer function of whispered speech differs from that of normal speech, the naturalness of the converted speech remains low even though some post-processing is applied.
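The LPC approach in item 1 rests on two standard building blocks: the Levinson-Durbin recursion, which turns a frame's autocorrelation into both the predictor coefficients and the PARCOR (reflection) coefficients, and an all-pole lattice filter driven by those PARCOR coefficients. The sketch below is a minimal textbook version of both, not the cited prior work's code. The function names are hypothetical, and the sign convention for the reflection coefficients (here a_i^(i) = k_i for A(z) = 1 + sum a_i z^-i) is an assumption, since some texts use the opposite sign.

```python
import numpy as np


def autocorr(frame, order):
    """Biased autocorrelation r[0..order] of one analysis frame."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Levinson-Durbin recursion for the LPC normal equations.

    Returns (a, k, err): a = [1, a1, ..., ap] for A(z) = 1 + sum_i a_i z^-i,
    k = reflection (PARCOR) coefficients with the convention a_i^(i) = k_i,
    err = final prediction-error energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = float(r[0])
    for i in range(1, order + 1):
        # correlation between the order-(i-1) prediction error and lag i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k_i = -acc / err
        k[i - 1] = k_i
        # a_j^(i) = a_j^(i-1) + k_i * a_(i-j)^(i-1), for j = 1..i-1
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k_i * a_prev[i - 1:0:-1]
        a[i] = k_i
        err *= 1.0 - k_i * k_i
    return a, k, err


def lattice_synthesis(excitation, k):
    """All-pole lattice synthesis filter 1/A(z) built from reflection coefficients k."""
    p = len(k)
    b_prev = np.zeros(p)  # backward errors b_0..b_(p-1) from the previous sample
    out = np.zeros(len(excitation))
    for n, e in enumerate(excitation):
        f = float(e)  # forward error entering the top stage, f_p[n]
        b_cur = np.zeros(p)
        for i in range(p, 0, -1):
            f -= k[i - 1] * b_prev[i - 1]  # f_(i-1)[n]
            if i < p:
                b_cur[i] = k[i - 1] * f + b_prev[i - 1]  # b_i[n]
        b_cur[0] = f  # b_0[n] = f_0[n], the output sample
        out[n] = f
        b_prev = b_cur
    return out
```

Because the lattice is built directly from the PARCOR coefficients, it produces the same output as the direct-form recursion y[n] = e[n] - sum a_j y[n-j], and it stays stable whenever every |k_i| < 1.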

Method used



Examples


Embodiment 1

[0039] Embodiment one: see Figures 1 to 4.

[0040] Whispered speech has no pitch period, and its energy is 20 dB lower than that of normal speech, so its signal-to-noise ratio is lower. Such a signal also has poor intelligibility and clarity, which degrades call quality and easily causes fatigue. In this embodiment, an audio file in wav format with a sampling rate of 10 kHz is used; the workflow of each step is described in detail below.

[0041] As shown in Figure 1, the method of this embodiment includes the following steps:

[0042] Step 11: Preprocess the whispered speech. First, pre-emphasis is applied to the whispered speech. The purpose of pre-emphasis is to boost the high-frequency part so that the spectrum of the signal becomes flat and stays flat over the entire band from low to high frequencies, so that the spectrum can be calculated with the same sign...
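Pre-emphasis is conventionally a first-order high-pass filter y[n] = x[n] - alpha * x[n-1]. The paragraph is cut off before it gives the coefficient, so the alpha = 0.9375 below is an assumption (values between 0.9 and 0.97 are common). The helper names and the 20 ms / 10 ms Hamming-windowed framing that typically follows are also assumptions; the excerpt does not specify them.

```python
import numpy as np


def pre_emphasis(x, alpha=0.9375):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] (boosts high frequencies).

    alpha = 0.9375 is an assumed value; the source text is truncated before it.
    """
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])


def frame_signal(x, frame_len=200, hop=100):
    """Cut x into overlapping Hamming-windowed frames (hypothetical framing step).

    At the 10 kHz sampling rate of this embodiment, 200 / 100 samples are
    20 ms frames with a 10 ms hop. Assumes len(x) >= frame_len.
    """
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    # row m holds samples m*hop .. m*hop + frame_len - 1
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx] * np.hamming(frame_len)
```

A typical call chains the two, e.g. `frames = frame_signal(pre_emphasis(whisper))`, and each row of `frames` then goes to LPC/LSP analysis.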

Embodiment 2

[0086] Embodiment two: see Figures 5 to 8.

[0087] The whispered vowels "a, o, e, i, u, v", stored as wav files with a sampling rate of 10 kHz, are each processed in two ways: (1) converted with the linear prediction (LPC) method; (2) converted with the method of the present invention. Figures 5-7 show the waveforms and spectrograms of normal speech and of the vowel "a" converted by each of the two algorithms. The spectrogram of speech converted by the method of the present invention is closer to that of normal speech.


Abstract

The invention discloses a method of converting whispered voice into normal voice based on a radial basis function (RBF) neural network, which comprises two stages: training and conversion. During training, line spectrum pair (LSP) parameters are extracted from both the whispered voice and the normal voice, and the RBF neural network is used to capture the mapping between the spectral envelopes of the whispered and normal voice. During conversion, the whispered voice is preprocessed and its LSP parameters are extracted; these parameters are converted by the trained RBF neural network; finally, an excitation source is generated using the mean fundamental frequency of the voice as its fundamental frequency, and the result is converted into normal voice by an LSP synthesizer. Whispered voice converted by the invention achieves better results in intelligibility and sound quality.
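The abstract names two core operations: converting each frame's LPC polynomial into line spectrum pair (LSP) parameters and back, and learning a whispered-to-normal mapping of those parameters with an RBF network. The sketch below shows one textbook way to do both; it is not the patent's implementation. The helper names, the Gaussian basis, the random choice of centers from the training inputs, the least-squares output layer, and the `n_centers`/`width` defaults are all hypothetical, and the training pairs `X`/`Y` are assumed to be time-aligned line spectral frequency (LSF) frames of whispered and normal speech.

```python
import numpy as np


def lpc_to_lsf(a, tol=1e-6):
    """LPC polynomial a = [1, a1, ..., ap] -> p line spectral frequencies in (0, pi)."""
    a_ext = np.append(np.asarray(a, dtype=float), 0.0)
    # P(z) = A(z) + z^-(p+1) A(1/z) (symmetric), Q(z) = A(z) - z^-(p+1) A(1/z) (antisymmetric)
    p_poly = a_ext + a_ext[::-1]
    q_poly = a_ext - a_ext[::-1]
    angles = np.angle(np.concatenate([np.roots(p_poly), np.roots(q_poly)]))
    # keep one root per conjugate pair and drop the trivial roots at z = +1 and z = -1
    return np.sort(angles[(angles > tol) & (angles < np.pi - tol)])


def lsf_to_lpc(lsf):
    """Inverse of lpc_to_lsf: rebuild A(z) = (P(z) + Q(z)) / 2 from the LSFs."""
    lsf = np.sort(np.asarray(lsf, dtype=float))
    p = len(lsf)
    p_poly, q_poly = np.array([1.0]), np.array([1.0])
    for w in lsf[0::2]:  # P owns the 1st, 3rd, 5th, ... frequencies
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsf[1::2]:  # Q owns the 2nd, 4th, ... frequencies
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(w), 1.0])
    if p % 2 == 0:
        p_poly = np.convolve(p_poly, [1.0, 1.0])   # trivial root at z = -1
        q_poly = np.convolve(q_poly, [1.0, -1.0])  # trivial root at z = +1
    else:
        q_poly = np.convolve(q_poly, [1.0, 0.0, -1.0])  # trivial roots at z = +1 and -1
    return (0.5 * (p_poly + q_poly))[: p + 1]


class RBFMapper:
    """Gaussian RBF network mapping whispered LSF vectors to normal-speech LSF vectors."""

    def __init__(self, n_centers=32, width=1.0, seed=0):
        self.n_centers, self.width, self.seed = n_centers, width, seed

    def _hidden(self, X):
        # Gaussian activation of every center for every input row, plus a bias column
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=-1)
        h = np.exp(-d2 / (2.0 * self.width ** 2))
        return np.hstack([h, np.ones((len(X), 1))])

    def fit(self, X, Y):
        X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
        rng = np.random.default_rng(self.seed)
        idx = rng.choice(len(X), size=min(self.n_centers, len(X)), replace=False)
        self.centers = X[idx]  # centers: a random subset of the training inputs
        # output weights by linear least squares on the hidden-layer activations
        self.weights, *_ = np.linalg.lstsq(self._hidden(X), Y, rcond=None)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.weights
```

At conversion time, each whispered frame's LSFs would go through `predict` and then back to a synthesis polynomial with `lsf_to_lpc`. That function re-sorts its input because an RBF output is not guaranteed to stay ordered, but it cannot repair LSFs that collide, so a real system would likely also enforce a minimum spacing.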

Description

Technical field
[0001] The invention belongs to the technical field of speech signal processing, and in particular relates to the technology of converting whispered speech into normal speech.
Background art
[0002] Whispered speech is a mode of pronunciation that differs from normal speech: it is characterized by low volume and a complete absence of vocal cord vibration. On certain occasions, people whisper so as not to disturb others or to keep a conversation confidential. However, because of the particular way whispers are produced and the influence of the conversational environment, the resulting voice signal has not only a low signal-to-noise ratio but also poor intelligibility and clarity. Especially when communicating through communication equipment, this degrades conversation quality and easily causes fatigue. In addition, some voice patients, or people with impaired pronunciation function, can only whisper when communicating, which hampers communication. ...
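Because whispers involve no vocal cord vibration, converted speech needs an artificial voicing source, and the abstract states that the excitation uses the mean fundamental frequency of the voice. The sketch below assumes the simplest such source, a periodic unit-impulse train at the 10 kHz rate used in Embodiment 1, with the mean F0 taken over the voiced frames of an F0 track from normal training speech. The impulse-train shape, the origin of the F0 track, and the helper names are all assumptions; the source does not describe the excitation in this detail.

```python
import numpy as np

FS = 10_000  # 10 kHz, the sampling rate used in Embodiment 1


def mean_f0(f0_track):
    """Mean fundamental frequency (Hz) over voiced frames, i.e. frames with f0 > 0."""
    f0 = np.asarray(f0_track, dtype=float)
    return float(f0[f0 > 0].mean())


def impulse_train(f0, n_samples, fs=FS):
    """Unit-impulse excitation with a constant pitch period of round(fs / f0) samples."""
    period = int(round(fs / f0))
    x = np.zeros(n_samples)
    x[::period] = 1.0
    return x
```

Using one constant pitch keeps the sketch simple; it also means the output has no intonation, which is a known limitation of a fixed mean-F0 excitation.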

Claims


Application Information


IPC(8): G10L21/02; G10L21/00

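Inventors: 陶智 (Tao Zhi), 赵鹤鸣 (Zhao Heming), 顾济华 (Gu Jihua), 韩韬 (Han Tao), 陈大庆 (Chen Daqing), 许宜申 (Xu Yishen), 吴迪 (Wu Di), 张晓俊 (Zhang Xiaojun)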

Owner: SUZHOU UNIV
