
A Telephone Speech Emotion Analysis and Recognition Method Based on LSTM and SAE

An emotion analysis and recognition method applied in speech analysis, character and pattern recognition, instruments, etc. It addresses problems such as vanishing gradients and factors that distort or disturb emotion analysis results, achieving accurate experimental results and high efficiency.

Active Publication Date: 2022-02-25
GUANGDONG UNIV OF TECH
AI Technical Summary

Problems solved by technology

[0004] In the prior art, Chinese patent CN109785863A discloses a speech emotion recognition method based on a deep belief network, which uses a support vector machine to identify and classify speech signal features. The specific steps are: preprocess the speech signal; apply a deep belief network to the preprocessed signal to perform unsupervised feature extraction and obtain speech signal features; and finally use a support vector machine to recognize and classify speech emotion from those features, obtaining the recognition result. The disadvantages of this method are that a DBN (Deep Belief Network) tends to lose information when processing time-related feature sequences, and the support vector machine is biased toward binary classification, so the emotion analysis results may contain errors.
[0005] Chinese patent CN109767791A discloses a voice emotion recognition and application system for call-center calls. It extracts and preprocesses the voice information; a voice keyword detection module then recognizes emotion-category keywords and topic keywords in the voice data of the voice feature analysis sub-module, obtaining emotional data and response-question data; an emotion model module dynamically captures and tracks the caller's emotional state; finally, emotion classification is performed to judge the call under detection. The disadvantages of this method are that building the speech keyword retrieval module requires a large amount of data and consumes considerable manpower and material resources, it cannot match an artificial neural network with feature-learning ability in terms of efficiency, and using keywords as the basis for classification may introduce large errors that disturb the emotion analysis results.
[0006] This technology is particularly applicable to mobile communication devices such as smartphones, where fact or profile input may come from the device's various feature sets, including online access, text or voice communication, scheduling functions, etc. The disadvantages of this method are that human-computer dialogue input is relatively cumbersome and errors may occur when matching input and output between human and machine; at the same time, the adopted emotion classification algorithm, whether rule-based or based on traditional machine learning, falls short in further extracting the deep features of the speech signal, which reduces the accuracy of emotion classification.
[0007] Traditional research in the field of speech emotion recognition tends to analyze the acoustic statistical characteristics of speech, and the data sets chosen are emotional speech databases with few entries and simple semantics, so the acoustic models used for emotion analysis are not general. Because linear discriminant analysis is often applied to these statistical features, the accuracy of the analysis results is low. Although methods were later proposed that automatically extract features with a deep belief network and classify them using linear discriminant analysis, the k-nearest-neighbor method, and the support vector machine method, achieving recognition rates of 60%-65%, the problem remains unresolved.
[0008] Furthermore, when the prior art applies a traditional neural network to telephone emotion analysis, the network is trained as a whole; when the training set is large, this increases the training time, slows network convergence, and sometimes even causes vanishing or exploding gradients. For example, initializing the network parameters randomly causes the error-correction signal to weaken as the network is updated, and the network falls into local optima.
At the same time, because the voice signal is time-series data, the influence of the time sequence is often ignored when traditional methods extract deep features, so the accuracy of telephone speech emotion classification is low, which affects the analysis results.

Method used



Examples


Embodiment Construction

[0052] The present invention will be further described in detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0053] As shown in Figures 1 to 2, a telephone speech emotion analysis and recognition method based on LSTM and SAE comprises the following steps:

[0054] Step 1, voice information sampling and quantization;

[0055] First, it should be clear that analyzing and processing the voice signal is essentially discretizing and digitizing the original speech signal; therefore, the analog signal is first converted into a digital voice signal through analog-to-digital conversion. Sampling measures the analog value of the signal at a fixed frequency, i.e., once every short interval of time; to ensure that the sound is not distorted, the sampling frequency is around 40 kHz, which satisfies the Nyquist sampling...
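As a concrete illustration of this step (not part of the patent text), the minimal Python sketch below samples a synthetic "analog" tone at 40 kHz and quantizes it to 16-bit integers; the sine-wave source and the 16-bit depth are assumptions made only for the example.

```python
import numpy as np

# Assumed example: a 1-second synthetic "analog" tone standing in for telephone speech.
fs = 40_000                                  # sampling frequency ~40 kHz, as in the text
t = np.arange(0, 1.0, 1.0 / fs)              # sampling instants: one measurement every 1/fs seconds
analog = 0.6 * np.sin(2 * np.pi * 440 * t)   # hypothetical analog signal in [-1, 1]

# Quantization: map each sample to the nearest 16-bit integer level (bit depth is an assumption).
bits = 16
levels = 2 ** (bits - 1) - 1
digital = np.round(analog * levels).astype(np.int16)

print(digital[:10], digital.dtype)           # discretized, digitized speech samples
```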



Abstract

The invention discloses a telephone voice emotion analysis and recognition method based on LSTM and SAE. First, a series of preprocessing steps such as sampling and quantization, pre-emphasis, framing, and windowing are performed on the voice information; a fast Fourier transform is then applied to obtain its frequency-domain features, and the MFCC speech characteristic parameters are extracted. The invention constructs an LSTM+SAE neural network model and trains it on the extracted MFCC feature parameters to obtain the deep feature information of the speech signal; a fully connected layer combined with a softmax regression algorithm yields the classification accuracy and completes the model training. Finally, the MFCC feature parameters to be tested are input into the trained model, which performs emotion analysis on the telephone voice and judges the speaker's emotion.
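As a minimal sketch of the pipeline the abstract describes (preprocessing, MFCC extraction, an LSTM for deep temporal features, an SAE-style compressed representation, and a fully connected softmax classifier), the Python code below uses librosa and tensorflow.keras. The layer sizes, number of MFCC coefficients, number of emotion classes, sampling rate, and padding length are assumptions; the patent's stacked autoencoder is stood in for here by plain dense layers, without the greedy layer-wise pretraining the full method would use.

```python
import numpy as np
import librosa
from tensorflow.keras import layers, models

N_MFCC, N_CLASSES, MAX_FRAMES = 13, 4, 300   # assumed: 13 MFCCs, 4 emotion classes, 300 frames

def extract_mfcc(path, sr=8000):
    """Load a telephone recording, pre-emphasize it, and return a fixed-length MFCC sequence."""
    y, sr = librosa.load(path, sr=sr)
    y = librosa.effects.preemphasis(y)                         # pre-emphasis, as in the preprocessing
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC).T   # shape: (frames, n_mfcc)
    mfcc = mfcc[:MAX_FRAMES]
    pad = np.zeros((MAX_FRAMES - len(mfcc), N_MFCC))           # zero-pad short utterances
    return np.vstack([mfcc, pad]).astype("float32")

# LSTM captures the temporal structure of the MFCC sequence; the dense bottleneck stands in
# for the stacked autoencoder (SAE); a fully connected softmax layer performs classification.
model = models.Sequential([
    layers.Input(shape=(MAX_FRAMES, N_MFCC)),
    layers.LSTM(64),                        # deep temporal features
    layers.Dense(32, activation="relu"),    # SAE-style compressed representation (assumed size)
    layers.Dense(64, activation="relu"),
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, ...)  # X_train: batch of MFCC sequences, y_train: emotion labels
```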

Description

technical field
[0001] The invention relates to the technical field of voice recognition, in particular to a telephone voice emotion analysis and recognition method based on LSTM and SAE.
Background technique
[0002] With the development of society, voice has become an important medium for people to transmit information and express their feelings. With the breakthroughs in voice recognition and deep-learning artificial intelligence in recent years, voice signals have become, after images, an important medium of the information age. Speech is a basic and efficient way to communicate ideas and emotions between people and to support human-computer interaction, as in everyday voice calls and smart-home tools such as Tmall Genie. Research on speech emotion recognition therefore has important practical significance for making computers more intelligent and humanized, developing new human-machine environments, and promoting the development of ps...

Claims


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G10L25/63; G10L25/30; G10L25/24; G06N3/04; G06K9/62
CPC: G10L25/63; G10L25/30; G10L25/24; G06N3/045; G06F18/241
Inventor: 李琪, 叶武剑, 刘怡俊, 王峰, 李学易
Owner GUANGDONG UNIV OF TECH