
A Telephone Speech Emotion Analysis and Recognition Method Based on LSTM and SAE

An emotion analysis and recognition method applied in speech analysis, character and pattern recognition, instruments, etc. It addresses problems such as vanishing gradients and factors that distort or disturb emotion analysis results, achieving accurate experimental results and high efficiency.

Active Publication Date: 2022-02-25
GUANGDONG UNIV OF TECH
AI Technical Summary

Problems solved by technology

[0004] In the prior art, Chinese patent CN109785863A discloses a speech emotion recognition method based on a deep belief network, which uses a support vector machine to identify and classify speech signal features. The specific steps are: preprocess the speech signal; apply a deep belief network to the preprocessed signal to perform unsupervised feature extraction and obtain speech signal features; and finally use a support vector machine to recognize and classify speech emotion from those features, obtaining the recognition result. The disadvantages of this method are that a DBN (Deep Belief Network) tends to lose information when processing time-related feature sequences, and the support vector machine is biased toward binary classification, so the emotion analysis results may contain errors.
[0005] Chinese patent CN109767791A discloses a voice emotion recognition and application system for call-center calls. It extracts and preprocesses the voice information; a voice keyword detection module then recognizes emotion-category keywords and topic keywords in the voice data of the voice feature analysis sub-module, obtaining emotional data and response-question data; an emotion model module dynamically captures and tracks the caller's emotional state; finally, emotion classification is performed to judge the call under detection. The disadvantages of this method are that building the speech keyword retrieval module requires a large amount of data and consumes considerable manpower and material resources, it cannot match an artificial neural network with feature-learning ability in terms of efficiency, and using keywords as the basis for classification may introduce large errors that disturb the emotion analysis results.
[0006] This technology is particularly applicable to mobile communication devices such as smartphones, where fact or profile input may come from the device's various feature sets, including online access, text or voice communication, scheduling functions, etc. The disadvantages of this method are that human-computer dialogue input is relatively cumbersome and errors may occur when matching input and output between human and machine; at the same time, the adopted emotion classification algorithm, whether rule-based or based on traditional machine learning, falls short in further extracting the deep features of the speech signal, which reduces the accuracy of emotion classification.
[0007] Traditional research in the field of speech emotion recognition tends to analyze the acoustic statistical characteristics of speech, and the data sets chosen are emotional speech databases with few entries and simple semantics, so the acoustic models used for emotion analysis are not general. Because linear discriminant analysis is often applied to these statistical features, the accuracy of the analysis results is low. Although methods were later proposed that automatically extract features with a deep belief network and classify them using linear discriminant analysis, the k-nearest-neighbor method, and the support vector machine method, achieving recognition rates of 60%-65%, the problem remains unresolved.
[0008] Furthermore, when the prior art applies a traditional neural network to telephone emotion analysis, the network is trained as a whole; when the training set is large, this increases the training time, slows network convergence, and sometimes even causes vanishing or exploding gradients. For example, initializing the network parameters randomly causes the error-correction signal to weaken as the network is updated, and the network falls into local optima.
At the same time, because the voice signal is time-series data, the influence of the time sequence is often ignored when traditional methods extract deep features, so the accuracy of telephone speech emotion classification is low, which affects the analysis results.

Method used



Examples


Embodiment Construction

[0052] The present invention will be further described in detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0053] As shown in Figures 1 to 2, a telephone speech emotion analysis and recognition method based on LSTM and SAE comprises the following steps:

[0054] Step 1, voice information sampling and quantization;

[0055] First, it should be clear that analyzing and processing the voice signal is essentially discretizing and digitizing the original speech signal; therefore, the analog signal is first converted into a digital voice signal through analog-to-digital conversion. Sampling measures the analog value of the signal at a fixed frequency, i.e., once every short interval of time; to ensure that the sound is not distorted, the sampling frequency is around 40 kHz, which satisfies the Nyquist sampling...
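As a concrete illustration of this step (not part of the patent text), the minimal Python sketch below samples a synthetic "analog" tone at 40 kHz and quantizes it to 16-bit integers; the sine-wave source and the 16-bit depth are assumptions made only for the example.

```python
import numpy as np

# Assumed example: a 1-second synthetic "analog" tone standing in for telephone speech.
fs = 40_000                                  # sampling frequency ~40 kHz, as in the text
t = np.arange(0, 1.0, 1.0 / fs)              # sampling instants: one measurement every 1/fs seconds
analog = 0.6 * np.sin(2 * np.pi * 440 * t)   # hypothetical analog signal in [-1, 1]

# Quantization: map each sample to the nearest 16-bit integer level (bit depth is an assumption).
bits = 16
levels = 2 ** (bits - 1) - 1
digital = np.round(analog * levels).astype(np.int16)

print(digital[:10], digital.dtype)           # discretized, digitized speech samples
```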



Abstract

The invention discloses a telephone voice emotion analysis and recognition method based on LSTM and SAE. First, a series of preprocessing steps such as sampling and quantization, pre-emphasis, framing, and windowing are performed on the voice information; a fast Fourier transform is then applied to obtain its frequency-domain features, and the MFCC speech characteristic parameters are extracted. The invention constructs an LSTM+SAE neural network model and trains it on the extracted MFCC feature parameters to obtain the deep feature information of the speech signal; a fully connected layer combined with a softmax regression algorithm yields the classification accuracy and completes the model training. Finally, the MFCC feature parameters to be tested are input into the trained model, which performs emotion analysis on the telephone voice and judges the speaker's emotion.
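As a minimal sketch of the pipeline the abstract describes (preprocessing, MFCC extraction, an LSTM for deep temporal features, an SAE-style compressed representation, and a fully connected softmax classifier), the Python code below uses librosa and tensorflow.keras. The layer sizes, number of MFCC coefficients, number of emotion classes, sampling rate, and padding length are assumptions; the patent's stacked autoencoder is stood in for here by plain dense layers, without the greedy layer-wise pretraining the full method would use.

```python
import numpy as np
import librosa
from tensorflow.keras import layers, models

N_MFCC, N_CLASSES, MAX_FRAMES = 13, 4, 300   # assumed: 13 MFCCs, 4 emotion classes, 300 frames

def extract_mfcc(path, sr=8000):
    """Load a telephone recording, pre-emphasize it, and return a fixed-length MFCC sequence."""
    y, sr = librosa.load(path, sr=sr)
    y = librosa.effects.preemphasis(y)                         # pre-emphasis, as in the preprocessing
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC).T   # shape: (frames, n_mfcc)
    mfcc = mfcc[:MAX_FRAMES]
    pad = np.zeros((MAX_FRAMES - len(mfcc), N_MFCC))           # zero-pad short utterances
    return np.vstack([mfcc, pad]).astype("float32")

# LSTM captures the temporal structure of the MFCC sequence; the dense bottleneck stands in
# for the stacked autoencoder (SAE); a fully connected softmax layer performs classification.
model = models.Sequential([
    layers.Input(shape=(MAX_FRAMES, N_MFCC)),
    layers.LSTM(64),                        # deep temporal features
    layers.Dense(32, activation="relu"),    # SAE-style compressed representation (assumed size)
    layers.Dense(64, activation="relu"),
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, ...)  # X_train: batch of MFCC sequences, y_train: emotion labels
```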

Description

technical field
[0001] The invention relates to the technical field of voice recognition, in particular to a telephone voice emotion analysis and recognition method based on LSTM and SAE.
Background technique
[0002] With the development of society, voice has become an important medium for people to transmit information and express their feelings. With the breakthroughs in voice recognition and deep-learning artificial intelligence in recent years, voice signals have become, after images, an important medium of the information age. Speech is a basic and efficient way to communicate ideas and emotions between people and to support human-computer interaction, as in everyday voice calls and smart-home tools such as Tmall Genie. Research on speech emotion recognition therefore has important practical significance for making computers more intelligent and humanized, developing new human-machine environments, and promoting the development of ps...

Claims


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G10L25/63; G10L25/30; G10L25/24; G06N3/04; G06K9/62
CPC: G10L25/63; G10L25/30; G10L25/24; G06N3/045; G06F18/241
Inventor: 李琪, 叶武剑, 刘怡俊, 王峰, 李学易
Owner GUANGDONG UNIV OF TECH