Deep learning-based unusual speech distinguishing method

A deep learning-based abnormal speech technology, applied in speech recognition, speech analysis, instruments, etc., that addresses the problems of interference with speaker information and degraded recognition performance.

Active Publication Date: 2018-11-06
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] Another problem is that traditional speech recognition systems often use linear predictive cepstral coefficients and Mel-frequency cepstral coefficients. These low-level acoustic features mainly carry information about the pronunciation of the text, and the speaker information in them is easily disturbed by that textual information, by the channel, and by noise, which reduces the recognition performance of the system.


Examples

Embodiment 1

[0104] A method for distinguishing abnormal speech based on deep learning, comprising the following steps:

[0105] Step 1: Obtain the input speech and preprocess it by resampling, pre-emphasis, framing, and windowing to obtain the preprocessed speech;
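The framing and windowing named in Step 1 can be illustrated with a minimal sketch. The source does not state the frame length, the hop size, or the window type, so `frame_len`, `hop`, and the choice of a Hamming window are assumptions made only for illustration, and `frame_and_window` is a hypothetical helper name.

```python
import math


def frame_and_window(signal, frame_len, hop):
    """Split `signal` into overlapping frames and apply a Hamming window.

    Assumptions (not from the source): a Hamming window, and caller-chosen
    frame_len and hop given in samples.
    """
    if frame_len < 2 or hop <= 0:
        raise ValueError("frame_len must be >= 2 and hop must be positive")
    # Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    frames = []
    # Only full frames are kept; trailing samples that do not fill a
    # complete frame are dropped.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


if __name__ == "__main__":
    frames = frame_and_window([1.0] * 10, frame_len=4, hop=2)
    print(len(frames), frames[0])
```

Overlapping frames (hop smaller than the frame length) keep the short-time analysis smooth across frame boundaries, and the window tapers each frame's edges to reduce spectral leakage.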

[0106] Resampling: input speech may arrive with different sampling frequencies and encoding formats. To simplify data processing and analysis, the original input speech signal is resampled so that the sampling frequency and encoding format are unified: the sampling frequency is 22.05 kHz and the encoding format is WAV.
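As a rough illustration of unifying the sampling frequency at 22.05 kHz, the sketch below resamples a list of samples by linear interpolation. The source does not say which resampling algorithm is used; linear interpolation is an assumption chosen for brevity, and `resample_linear` is a hypothetical helper. A production resampler would normally apply a low-pass (anti-aliasing) filter before reducing the rate.

```python
def resample_linear(samples, src_rate, dst_rate=22050):
    """Resample `samples` from src_rate to dst_rate by linear interpolation.

    Assumption (not from the source): plain linear interpolation with no
    anti-aliasing filter; only the 22.05 kHz target comes from the text.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional index into the source
        left = int(pos)
        if left >= len(samples) - 1:
            # At or past the last source sample: hold the final value.
            out.append(float(samples[-1]))
            continue
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out


if __name__ == "__main__":
    one_second_at_44k = [0.0] * 44100
    print(len(resample_linear(one_second_at_44k, 44100)))
```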

[0107] Pre-emphasis: the power spectrum of the audio signal decreases as frequency increases, and most of the energy is concentrated in the low-frequency range. To boost the high-frequency part of the original audio signal, the original input audio signal is pre-emphasized with an FIR high-pass filter, its tra...
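The filter definition is cut off in the text above, so the following is only a sketch of the conventional pre-emphasis filter, not necessarily the one in the patent. It assumes a first-order FIR high-pass filter, y[n] = x[n] - alpha * x[n-1], with a hypothetical coefficient alpha = 0.97; both the order and the coefficient are assumptions.

```python
def pre_emphasize(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1] to boost high frequencies.

    Assumptions (not from the source): first-order filter and alpha = 0.97.
    The first sample is passed through unchanged.
    """
    if not samples:
        return []
    out = [float(samples[0])]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


if __name__ == "__main__":
    # A constant (low-frequency) signal is attenuated after the first sample,
    # while a rapidly alternating (high-frequency) signal is amplified.
    print(pre_emphasize([1.0, 1.0, 1.0], alpha=0.5))
    print(pre_emphasize([1.0, -1.0, 1.0], alpha=0.5))
```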

Abstract

The invention discloses a deep learning-based method for distinguishing abnormal speech. The method comprises the following steps: input speech is acquired and preprocessed by resampling, pre-emphasis, framing, and windowing to obtain the preprocessed speech; Mel-frequency cepstral coefficient (MFCC) feature vectors are extracted from the preprocessed speech; speech segments with different numbers of frames are normalized to a fixed number of frames, so that each speech segment has a corresponding MFCC feature vector; a convolutional deep belief network is built; the MFCC feature vectors are input to the convolutional deep belief network for training, and the states of the input speech are classified; and, according to the classification result, a hidden Markov model is called for template matching to obtain the speech recognition result. The multiple nonlinear transform layers of the convolutional deep belief network map the input MFCC features to a higher-dimensional space, and the hidden Markov model then models the different states of speech, which improves speech recognition accuracy.
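The abstract does not say how segments with different frame counts are brought to a fixed number of frames. One possible approach, shown here purely as an assumption, is to interpolate linearly along the time axis between neighbouring feature vectors; `normalize_frame_count` and `target_frames` are hypothetical names.

```python
def normalize_frame_count(features, target_frames):
    """Stretch or shrink a sequence of per-frame feature vectors to a fixed length.

    Assumption (not from the source): linear interpolation along the time
    axis. `features` is a non-empty list of equal-length lists of floats.
    """
    if target_frames <= 0:
        raise ValueError("target_frames must be positive")
    if not features:
        raise ValueError("features must contain at least one frame")
    n = len(features)
    if n == 1:
        # A single frame can only be repeated.
        return [list(features[0]) for _ in range(target_frames)]
    if target_frames == 1:
        return [list(features[0])]
    out = []
    for i in range(target_frames):
        # Map output index i onto the source time axis [0, n - 1].
        pos = i * (n - 1) / (target_frames - 1)
        left = min(int(pos), n - 2)
        frac = pos - left
        out.append([
            a * (1 - frac) + b * frac
            for a, b in zip(features[left], features[left + 1])
        ])
    return out


if __name__ == "__main__":
    mfcc_like = [[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]]
    for frame in normalize_frame_count(mfcc_like, 5):
        print(frame)
```

A fixed frame count gives every segment the same input shape, which a network with a fixed-size input layer requires.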

Description

Technical field

[0001] The invention relates to the field of intelligent speech processing research, in particular to a method for distinguishing abnormal speech based on deep learning.

Background

[0002] Speech is one of the important ways for humans and machines to interact. After decades of research, speech recognition technology has developed greatly and has become part of daily life. However, existing research on speech recognition has the following problems:

[0003] In real life, a speaker's poor health or other causes can shift the input speech from normal speech to abnormal speech and introduce more noise interference. Abnormal speech generally refers to speech with complex background noise, speech in which the speaking manner or habits are intentionally changed, speech affected by lesions of the vocal organs, etc.

[0004] Another problem is that traditional speech recognition systems often use linear predictive cepstral coefficients and...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/06, G10L15/08, G10L15/14, G10L15/16, G10L15/26, G10L25/24, G10L25/30
CPC: G10L15/063, G10L15/08, G10L15/142, G10L15/16, G10L15/26, G10L25/24, G10L25/30
Inventor: 奉小慧, 陈光科, 贺前华, 巫小兰, 李艳雄
Owner: SOUTH CHINA UNIV OF TECH