LSTM and SAE-based telephone voice emotion analysis and recognition method

A technology for emotion analysis and recognition, applied in speech analysis, character and pattern recognition, biological neural network models, etc. It can solve problems such as vanishing gradients, disturbed analysis results, and purely unsupervised feature extraction, and achieves high efficiency, accurate experimental results, and a powerful feature-learning effect.

Active Publication Date: 2019-11-22
GUANGDONG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] In the prior art, Chinese patent CN109785863A discloses a speech emotion recognition method based on a deep belief network, which uses a support vector machine to identify and classify speech signal features. The specific steps are: the speech signal is preprocessed; the preprocessed speech signal is then passed through a deep belief network for unsupervised feature extraction to obtain the speech signal features; finally, a support vector machine recognizes and classifies the speech emotion from those features to obtain the recognition result. The disadvantages of this method are that a DBN (deep belief network) tends to lose information when processing time-related feature sequences, and that a support vector machine is biased towards binary classification, so the sentiment analysis results may contain errors.
[0005] Chinese patent CN109767791A discloses a speech emotion recognition and application system for call-center calls, which extracts voice information and preprocesses it; a voice keyword detection module then recognizes emotion-category keywords and topic keywords in the voice data from the voice feature analysis sub-module, obtaining emotional data and response-question data; an emotion model set module dynamically captures and tracks the caller's emotional state; finally, emotion classification is performed to judge the call under test. The disadvantages of this method are that building the voice keyword retrieval module requires a large amount of data, consuming considerable manpower and material resources; that in terms of efficiency it cannot match an artificial neural network with feature-learning ability; and that using keywords as the basis for classification may introduce large errors and disturb the sentiment analysis results.
That technology is particularly applicable to mobile communication devices such as smartphones, where fact or profile input may come from the use of various features of the device, including online access, text or voice communication, scheduling functions, etc. The disadvantage of this method, however, is that human-computer dialogue input is relatively cumbersome, and errors may occur when matching man-machine input and output. At the same time, whether the adopted emotion classification algorithm is rule-based or based on traditional machine learning, any shortfall in the further extraction of deep speech-signal features will reduce the accuracy of emotion classification.
[0007] Traditional research in the field of speech emotion recognition tends to analyze the acoustic statistical characteristics of speech, and the selected data sets are emotional speech databases with few utterances and simple semantics, so the acoustic models used for sentiment analysis are not universal. Because linear discriminant analysis is often applied to the statistical features, the accuracy of the analysis results is low. Although a method for automatically extracting features with a deep belief network was later proposed, and the prior art has achieved recognition rates of 60%-65% with linear discriminant classification, the k-nearest-neighbor method, and the support vector machine method, the problem remains unresolved.
[0008] Moreover, when the prior art applies a traditional neural network to telephone sentiment analysis, the network is trained as a whole; when the training set is large, this increases the training time and slows down convergence, sometimes even causing vanishing or exploding gradients. For example, initializing the network parameters by random initialization causes the error-correction signal to weaken as the network is updated, and the network falls into local optima.
At the same time, because the speech signal is time-series data, the influence of the time sequence is often ignored when traditional methods extract deep features, so the accuracy of telephone speech emotion classification is low, which affects the analysis results.



Examples


Embodiment Construction

[0052] The present invention will be described in further detail below with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0053] As shown in Figures 1 to 2, a telephone speech emotion analysis and recognition method based on LSTM and SAE includes the following steps:

[0054] Step 1, voice information sampling and quantization;

[0055] First of all, it should be clear that the analysis and processing of voice signals is essentially the discretization and digitization of the original voice signal; therefore, the analog signal is first converted into a digitized voice signal through analog-to-digital conversion. Sampling is carried out at a certain frequency, that is, the analog value of the signal is measured every short period of time; to ensure that the sound is not distorted, the sampling frequency is about 40 kHz, which satisfies the Nyquist samplin...
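As a rough illustration of the sampling and quantization step described above, the following Python sketch discretizes a stand-in waveform at 40 kHz and quantizes it to integer levels. Only the ~40 kHz rate comes from the paragraph above; the 16-bit depth, signal duration, and 440 Hz test tone are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

# Illustrative parameters (only the ~40 kHz rate is taken from the text).
SAMPLE_RATE = 40_000   # Hz, above twice the ~20 kHz audible band (Nyquist)
BIT_DEPTH = 16         # assumed quantization resolution in bits
DURATION = 1.0         # seconds

# Stand-in for the analog telephone voice signal: a 440 Hz tone.
t = np.arange(0, DURATION, 1.0 / SAMPLE_RATE)
analog = 0.8 * np.sin(2 * np.pi * 440 * t)

# Quantization: map the [-1, 1] amplitude range onto signed 16-bit integers.
levels = 2 ** (BIT_DEPTH - 1) - 1
quantized = np.round(analog * levels).astype(np.int16)

print(f"{len(quantized)} samples at {SAMPLE_RATE} Hz, "
      f"range [{quantized.min()}, {quantized.max()}]")
```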



Abstract

The invention discloses an LSTM and SAE-based telephone voice emotion analysis and recognition method. According to the method, voice information is subjected to a series of preprocessing steps such as sampling, quantization, pre-emphasis, framing and windowing, and a fast Fourier transform is then applied to obtain the frequency-domain characteristics of the voice information, from which the MFCC voice feature parameters are extracted. An LSTM+SAE neural network model is established and trained on the extracted MFCC feature parameters to obtain deep feature information of the voice signals; the classification result is obtained by combining a fully connected layer with a softmax regression algorithm, and model training is completed. Finally, the MFCC feature parameters to be tested are input into the trained model, emotion analysis is carried out on the telephone voice, and the emotion of the speaker is judged.
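The abstract describes a pipeline of preprocessing, FFT, MFCC extraction, an LSTM feeding SAE-style dense layers for deep features, and a fully connected layer with softmax for classification. The sketch below is only one plausible wiring of such a model, using librosa and PyTorch as illustrative tools: the number of MFCC coefficients, layer sizes, emotion classes, and telephone-band sampling rate are assumptions, and the layer-wise unsupervised pretraining usually associated with an SAE is omitted here.

```python
import librosa
import torch
import torch.nn as nn

N_MFCC = 13      # assumed number of MFCC coefficients per frame
HIDDEN = 128     # assumed LSTM hidden size
N_CLASSES = 4    # assumed emotion categories (e.g. neutral/happy/angry/sad)

def extract_mfcc(wav_path: str, sr: int = 8000) -> torch.Tensor:
    """Pre-emphasis, framing, windowing and FFT are handled inside librosa's
    MFCC routine; returns a (frames, N_MFCC) tensor."""
    y, sr = librosa.load(wav_path, sr=sr)
    y = librosa.effects.preemphasis(y)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
    return torch.from_numpy(mfcc.T).float()

class LstmSaeClassifier(nn.Module):
    """LSTM front end for temporal modelling, an autoencoder-style bottleneck
    standing in for the SAE deep features, then a fully connected classifier."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(N_MFCC, HIDDEN, batch_first=True)
        self.encoder = nn.Sequential(
            nn.Linear(HIDDEN, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        self.classifier = nn.Linear(32, N_CLASSES)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, N_MFCC)
        _, (h_n, _) = self.lstm(x)        # last hidden state summarizes the utterance
        features = self.encoder(h_n[-1])  # deep feature vector
        return self.classifier(features)  # logits; softmax is applied by the loss

model = LstmSaeClassifier()
dummy = torch.randn(2, 100, N_MFCC)       # 2 utterances, 100 frames each
print(model(dummy).shape)                 # torch.Size([2, 4])
```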

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, in particular to a telephone speech emotion analysis and recognition method based on LSTM and SAE.

Background technique

[0002] With the development of society, speech has become an important medium for people to transmit information and express their feelings. With the breakthroughs in speech recognition and deep learning artificial intelligence technology in recent years, speech signals have, after images, also become a basic and efficient carrier of idea exchange, emotional communication, and human-computer interaction in the information age, for example in our commonly used voice calls and in smart-home human-computer interaction tools such as Tmall Genie. Research on speech emotion recognition has important practical significance for enhancing the intelligence and humanization of computers, developing new human-machine environments, and promoti...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G10L25/63, G10L25/30, G10L25/24, G06N3/04, G06K9/62
CPC: G10L25/63, G10L25/30, G10L25/24, G06N3/045, G06F18/241
Inventor: 李琪, 叶武剑, 刘怡俊, 王峰, 李学易
Owner: GUANGDONG UNIV OF TECH