Method of using spectrograms and deep convolution neural network (CNN) to identify voice emotion

A speech emotion recognition technology based on convolutional neural networks, applied in speech analysis and instrumentation. It addresses problems such as too few convolutional layers (features extracted too coarsely) and too many fully connected layers (overfitting), and improves speech recognition capability.

Status: Inactive · Publication date: 2018-02-16
BEIJING UNION UNIVERSITY
Cites: 4 · Cited by: 26

AI Technical Summary

Problems solved by technology

A network with fewer convolutional layers extracts image features less finely than one with three convolutional layers.
Fully connected layers preserve the internal relationships between features, but there should not be too many of them, because that easily leads to overfitting.
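
The overfitting concern can be made concrete by counting trainable parameters: a fully connected layer links every input to every output, so it usually holds far more weights than the convolutional layers before it. The sketch below assumes three 3 × 3 convolutional layers (1 → 32 → 64 → 128 channels) feeding a 256-unit dense layer on a flattened 16 × 16 × 128 feature map; these sizes are hypothetical, chosen only for illustration, and are not taken from the patent.

```python
def conv2d_params(k: int, c_in: int, c_out: int) -> int:
    """Weights (k * k * c_in per filter, c_out filters) plus one bias per filter."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    """One weight per input-output pair plus one bias per output unit."""
    return n_in * n_out + n_out


# Hypothetical configuration (not from the patent): three 3x3 conv layers,
# then a dense layer on the flattened 16 x 16 x 128 feature map.
convs = [conv2d_params(3, 1, 32), conv2d_params(3, 32, 64), conv2d_params(3, 64, 128)]
dense = dense_params(16 * 16 * 128, 256)

print("conv parameters:", sum(convs))   # conv parameters: 92672
print("dense parameters:", dense)       # dense parameters: 8388864
```

In this configuration the single dense layer holds roughly 90 times as many parameters as all three convolutional layers combined, which is why stacking several such layers raises the risk of overfitting a limited training set.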

Examples

Embodiment 1

[0040] As shown in Figure 1, step 100 generates a spectrogram from the speech signal; the spectrogram serves as the input data of the deep convolutional neural network model. A spectrogram is a visual representation of how the frequency content of a speech waveform changes over time. It is a two-dimensional graph in which the abscissa represents time and the ordinate represents frequency, and the amplitude of the speech signal at a given time and frequency is represented by the density and color of the corresponding point: dark blue indicates low amplitude and bright red indicates high amplitude. The time-frequency relationship, that is, the spectrogram, is obtained by applying the FFT to the speech signal. To observe the frequency content of the speech signal at a given moment, the signal is divided into multiple blocks, and each b...
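
Dividing the signal into blocks and applying the FFT to each block is a short-time Fourier transform. A minimal NumPy sketch follows, assuming a Hann window, 256-sample blocks with a 128-sample hop, and a dB magnitude scale; the excerpt does not state these parameters, and the function name `spectrogram` is illustrative.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram in dB: rows are frequency bins, columns are time blocks.

    frame_len and hop are assumed values; the patent excerpt does not give them.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one block")
    window = np.hanning(frame_len)  # taper each block to reduce spectral leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])      # (n_frames, frame_len)
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T  # FFT of every block, transposed
    return 20.0 * np.log10(magnitude + 1e-10)           # amplitude in dB


# Usage: a 1 kHz tone sampled at 8 kHz for one second.
fs = 8000
t = np.arange(fs) / fs
spec = spectrogram(np.sin(2 * np.pi * 1000 * t))
print(spec.shape)                        # (129, 61)
print(spec[:, 0].argmax() * fs / 256)    # 1000.0
```

Each column is one block's spectrum, so displaying the array with time on the horizontal axis, frequency on the vertical axis, and a blue-to-red colormap yields the kind of image the paragraph describes.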

Embodiment 2

[0042] As shown in Figure 2, the overall system architecture of the present invention comprises five parts: a speech input module 200, a spectrogram generation module 210, a data preprocessing module 220, a classifier module 230, and an output module 240.
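
The five modules form a linear pipeline in which each stage consumes the previous stage's output. The skeleton below is a hypothetical sketch: the function names, the four-emotion label set, the fixed 64-block width, and the min-max scaling in the preprocessing step are all assumptions, and the classifier is a placeholder where a trained CNN would be plugged in.

```python
from typing import Callable, Sequence

import numpy as np

# Hypothetical label set; the patent excerpt does not list the emotion classes.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def speech_input(samples: Sequence[float]) -> np.ndarray:
    """Module 200: accept raw speech samples."""
    return np.asarray(samples, dtype=float)


def generate_spectrogram(x: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Module 210: split into windowed blocks, FFT each block, log-compress magnitudes."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))).T


def preprocess(spec: np.ndarray, width: int = 64) -> np.ndarray:
    """Module 220: crop or zero-pad to a fixed number of blocks, then scale to [0, 1]."""
    spec = spec[:, :width]
    if spec.shape[1] < width:
        spec = np.pad(spec, ((0, 0), (0, width - spec.shape[1])))
    lo, hi = spec.min(), spec.max()
    return (spec - lo) / (hi - lo) if hi > lo else np.zeros_like(spec)


def classify(image: np.ndarray, model: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Module 230: `model` maps a spectrogram image to class probabilities."""
    return model(image)


def output(probs: np.ndarray) -> str:
    """Module 240: report the most probable emotion."""
    return EMOTIONS[int(np.argmax(probs))]


def run_pipeline(samples, model) -> str:
    return output(classify(preprocess(generate_spectrogram(speech_input(samples))), model))


def placeholder_model(image: np.ndarray) -> np.ndarray:
    """Stand-in for a trained CNN: ignores its input and returns fixed probabilities."""
    return np.array([0.1, 0.2, 0.6, 0.1])


print(run_pipeline(np.random.default_rng(0).normal(size=8000), placeholder_model))  # neutral
```

Keeping each module a plain function makes it easy to swap the placeholder for a real model without touching the other stages.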

[0043] The speech input module 200 receives the input speech data.

[0044] The spectrogram generation module 210 divides the input speech data and generates a spectrogram. It works as follows: the signal is divided into multiple blocks, and each block is transformed by the FFT. The Fourier transform of a non-periodic continuous-time signal X(t) is defined as $F(\omega) = \int_{-\infty}^{\infty} X(t)\,e^{-j\omega t}\,dt$, which gives the continuous frequency spectrum of X(t). In practice, only the discrete samples X(nT) of the continuous signal X(t) are available, so the spectrum of X(t) has to be computed from the discrete signal X(nT). DFT definition...
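
The link between the DFT definition and the FFT can be checked numerically. The sketch evaluates the DFT directly from the standard definition F(k) = Σₙ X(n)·e^(-j2πkn/N), k = 0, 1, ..., N-1, which costs O(N²) operations, and compares it with NumPy's `np.fft.fft`, an FFT that produces the same values in O(N log N). The function name `dft` is illustrative.

```python
import numpy as np


def dft(x):
    """Direct O(N^2) evaluation of F(k) = sum_n X(n) * exp(-j * 2*pi * k * n / N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)                              # column vector of output indices
    return np.exp(-2j * np.pi * k * n / N) @ x        # row k sums over all n


x = np.random.default_rng(0).normal(size=64)
print(np.allclose(dft(x), np.fft.fft(x)))  # True
```

The FFT's lower cost is what makes it practical to compute one spectrum per block for every utterance.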

Embodiment 3

[0049] As shown in Figure 3, the system is further explained in two parts: training and testing. The speech signal 300 is converted into a spectrogram 310 as follows. The signal is divided into multiple blocks, and the FFT is applied to each block. The Fourier transform of a non-periodic continuous-time signal X(t) is defined as $F(\omega) = \int_{-\infty}^{\infty} X(t)\,e^{-j\omega t}\,dt$, which gives the continuous frequency spectrum of X(t). In practice, only the discrete samples X(nT) of the continuous signal X(t) are available, so the spectrum of X(t) has to be computed from the discrete signal X(nT). The DFT of a finite-length discrete signal X(n), n = 0, 1, ..., N-1, is defined as $F(k) = \sum_{n=0}^{N-1} X(n)\,e^{-j 2\pi k n / N}$, k = 0, 1, ..., N-1, where N is the number of sampling points and j is the imaginary unit. Using this method, 5,000 spectrograms are generated and imported into the classifier 302 of the deep ...
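
The excerpt does not give the classifier's layer configuration. As a minimal sketch only, the forward pass below stacks three conv → ReLU → 2 × 2 max-pool stages (echoing the three convolutional layers mentioned in the technical summary) and a single fully connected layer with softmax over a hypothetical four-class emotion set. The 64 × 64 input size, channel counts, and random initialization are assumptions, and training on the spectrograms (backpropagation, optimization) is not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(x, w, b):
    """Valid 2-D cross-correlation. x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)."""
    k = w.shape[-1]
    windows = sliding_window_view(x, (k, k), axis=(1, 2))  # (C_in, H-k+1, W-k+1, k, k)
    return np.einsum("chwij,ocij->ohw", windows, w) + b[:, None, None]


def max_pool2(x):
    """2 x 2 max pooling with stride 2; an odd trailing row or column is dropped."""
    c, h, w = x.shape
    x = x[:, :h - h % 2, :w - w % 2]
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def init_params(rng, channels=(1, 8, 16, 32), n_classes=4, flat_dim=32 * 6 * 6):
    """Random (untrained) weights: three 3x3 conv layers and one dense layer.

    flat_dim = 32 * 6 * 6 matches a 64 x 64 input: 64 -> 62 -> 31 -> 29 -> 14 -> 12 -> 6.
    """
    convs = [(rng.normal(0, 0.1, (c_out, c_in, 3, 3)), np.zeros(c_out))
             for c_in, c_out in zip(channels[:-1], channels[1:])]
    dense = (rng.normal(0, 0.01, (n_classes, flat_dim)), np.zeros(n_classes))
    return convs, dense


def forward(image, params):
    """image: (H, W) spectrogram scaled to [0, 1] -> class probabilities."""
    convs, (w_fc, b_fc) = params
    x = image[None, :, :]  # add a channel axis
    for w, b in convs:
        x = max_pool2(np.maximum(conv2d(x, w, b), 0.0))  # conv -> ReLU -> pool
    return softmax(w_fc @ x.ravel() + b_fc)


rng = np.random.default_rng(0)
params = init_params(rng)
probs = forward(rng.random((64, 64)), params)
print(probs.shape, round(float(probs.sum()), 6))  # (4,) 1.0
```

Because the weights are random, the probabilities themselves are meaningless; the sketch only shows how a spectrogram image flows through the layers to a class distribution.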

Abstract

The present invention provides a method of using spectrograms and a deep convolutional neural network to recognize speech emotion. The method comprises the following steps: generating spectrograms from the speech signals; constructing a deep convolutional neural network model; training and optimizing the model using a large number of spectrograms as input; and testing and optimizing the trained model. The invention uses a new speech emotion recognition method that converts speech signals into images for processing and, combined with a CNN, effectively improves recognition capability.

Description

Technical field

[0001] The invention relates to the technical field of speech signal processing and pattern recognition, and in particular to a method for speech emotion recognition using a spectrogram and a deep convolutional neural network.

Background

[0002] With the continuous development of information technology, society places higher demands on affective computing. In human-computer interaction, for example, a computer with emotional capabilities can acquire, classify, recognize, and respond to human emotions, helping users experience the interaction as efficient and friendly, effectively reducing people's frustration when using computers, and even helping people understand their own and others' emotional worlds. Such technology could also be used to detect whether a driver is focused, how much stress he or she feels, and so on, and to respond accordingly. In addition, affective computing can also be applied in related industr...

Application Information

IPC (8): G10L25/63, G10L25/30
CPC: G10L25/30, G10L25/63
Inventors: 袁家政 (Yuan Jiazheng), 刘宏哲 (Liu Hongzhe), 龚灵杰 (Gong Lingjie)
Owner BEIJING UNION UNIVERSITY