Voice emotion recognition system and voice emotion recognition method

A speech emotion recognition technology, applied in the fields of speech emotion recognition systems and deep neural networks. It addresses problems such as poor generalizability and achieves reduced size, shorter training time, and lower complexity.

Active Publication Date: 2019-12-03
珠海亿智电子科技有限公司

AI Technical Summary

Problems solved by technology

However, these hand-extracted speech features are only suitable for specific tasks and have poor generalizability.


Examples

Embodiment

[0050] The present invention will be further described below in combination with specific embodiments.

[0051] As shown in Figure 1, a speech emotion recognition system comprises an audio preprocessing module, a CNN module, a pyramidal FSMN module, a time-step attention module and an output module, connected in sequence. The CNN module has a convolutional layer, and the pyramidal FSMN module has a pyramid memory block structure.

[0052] The speech emotion recognition system in this embodiment is an end-to-end feed-forward deep neural network. The present invention improves the base network of the classic FSMN and DFSMN structures by adding a convolutional layer to it, which provides lower-level feature extraction.
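
To make the role of an FSMN memory block more concrete, the following is a minimal sketch of a generic scalar, bidirectional FSMN-style memory block: each output frame is a weighted sum of the current frame, a few past frames and a few future frames. It is not the pyramid memory block of this patent, whose details are not given in this excerpt. The function name `fsmn_memory_block` and the tap lists `lookback` and `lookahead` are assumptions made for illustration.

```python
from typing import List


def fsmn_memory_block(
    hidden: List[List[float]],
    lookback: List[float],
    lookahead: List[float],
) -> List[List[float]]:
    """Generic scalar FSMN-style memory block (illustrative, hypothetical names).

    hidden:    T frames, each a list of D hidden activations.
    lookback:  taps applied to frames t, t-1, ..., t-N (first tap is the current frame).
    lookahead: taps applied to frames t+1, ..., t+M.
    Frames outside the range [0, T) are treated as zeros.
    """
    num_frames = len(hidden)
    dim = len(hidden[0]) if num_frames else 0
    out = []
    for t in range(num_frames):
        acc = [0.0] * dim
        # Current and past frames.
        for i, tap in enumerate(lookback):
            if t - i >= 0:
                for d in range(dim):
                    acc[d] += tap * hidden[t - i][d]
        # Future frames.
        for j, tap in enumerate(lookahead, start=1):
            if t + j < num_frames:
                for d in range(dim):
                    acc[d] += tap * hidden[t + j][d]
        out.append(acc)
    return out
```

Because the block is a finite weighted sum over neighbouring frames, it carries context without recurrence, which is what keeps the overall network feed-forward.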

[0053] The audio data used in the present invention is in the mainstream WAV format, with a sampling frequency of 16000 Hz. The original audio data is divided into frames, and a Fourier transform is applied to each frame. The length of each f...
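
The framing and Fourier transform step can be sketched as follows. Only the 16000 Hz sampling rate comes from the text; the frame length and hop are truncated in this excerpt, so any values passed to these helpers (for example a 25 ms frame with a 10 ms hop) are hypothetical. The helper names `ms_to_samples`, `split_frames` and `magnitude_spectrum` are likewise assumptions, and the naive DFT stands in for the FFT a real implementation would use.

```python
import cmath
import math
from typing import List, Sequence

SAMPLE_RATE = 16000  # Hz, as stated in the text


def ms_to_samples(duration_ms: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(sample_rate * duration_ms / 1000)


def split_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Return overlapping frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive O(n^2) DFT magnitudes for bins 0 .. n // 2 (illustration only)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]
```

Stacking the magnitude spectra of successive frames yields a time-frequency map of the kind the later modules operate on; the exact spectral features used by the patent are not recoverable from this excerpt.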


Abstract

The invention discloses a speech emotion recognition system comprising an audio preprocessing module, a CNN module, a pyramid FSMN module, a time-step attention module and an output module, which are connected in sequence. The invention also discloses a speech emotion recognition method applied to this system, comprising the following steps: (1) preprocessing the speech to obtain a spectrogram feature map; (2) operating on the spectrogram feature map to construct a spectrogram feature map containing shallow audio information; (3) further processing the feature map containing shallow audio information to obtain deeper semantic information and context information; (4) processing the feature map with deeper semantic and context information to obtain the feature vector most relevant to the speaker's emotion over the whole utterance; and (5) outputting the emotion category corresponding to the whole utterance. Compared with the prior art, the speech emotion recognition performance of the method is greatly improved.
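
Steps (4) and (5) can be illustrated with a minimal sketch of attention pooling over time steps followed by a linear classifier. This uses generic dot-product attention with a softmax, which is an assumption and not necessarily the formulation of the patent's time-step attention module. The names `attention_pool`, `classify`, `query` and `class_weights`, and any label strings passed in, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(
    features: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Collapse T frame vectors into one vector with softmax attention weights.

    features: T frames, each with D values (output of the earlier modules).
    query:    D-dimensional scoring vector (learned in a real model; assumed here).
    Returns (pooled vector, per-frame weights).
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in features]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, features))
        for d in range(len(query))
    ]
    return pooled, weights


def classify(
    pooled: Sequence[float],
    class_weights: Sequence[Sequence[float]],
    labels: Sequence[str],
) -> str:
    """Return the label whose linear score on the pooled vector is highest."""
    logits = [sum(w * x for w, x in zip(row, pooled)) for row in class_weights]
    return labels[logits.index(max(logits))]
```

Frames that score higher against the query receive larger weights, so the pooled vector is dominated by the time steps judged most relevant, and a single category is then produced for the whole utterance.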

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence and speech recognition, in particular to a speech emotion recognition system and a speech emotion recognition method. It is an end-to-end deep neural network technology that improves on DFSMN.

Background

[0002] With the continuous improvement of speech recognition technology and the wide adoption of speech recognition devices, human-computer interaction is becoming more and more common in daily life. However, most of these devices can only recognize the text-level content of human speech and cannot recognize the emotional state of the speaker. Speech emotion recognition has many useful applications in human-centered services and human-computer interaction, such as intelligent service robots, automated call centers and distance education. It has attracted considerable research attention, and many methods have been proposed. D...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L25/30, G10L25/63, G06N3/04, G06N3/063
CPC: G10L25/30, G10L25/63, G06N3/063, G06N3/045
Inventor: 殷绪成, 曹秒, 杨春
Owner: 珠海亿智电子科技有限公司