
Speech emotion feature extraction method based on transformer model encoder

A technology concerning emotional features and their extraction, applied to speech analysis, instruments, etc.; it addresses problems such as long-distance gradient vanishing and information loss.

Pending Publication Date: 2021-03-09
XUZHOU NORMAL UNIVERSITY
Cites: 0 · Cited by: 11
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

However, many systems still use neural networks such as RNNs and DNNs to extract learned features. Because of long-distance gradient vanishing, and the information loss incurred when compressing a long sequence into a fixed-length vector, these traditional neural networks cannot yet fully capture the global information of speech emotion; they extract it poorly and are severely limited, even though this global property is very important in the emotional characteristics of speech.
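The contrast drawn above can be made concrete: in self-attention, every frame attends directly to every other frame in one step, whereas an RNN connects distant frames only through a long chain of recurrent updates. Below is a minimal numpy sketch of single-head scaled dot-product self-attention over a frame sequence; the random projection matrices stand in for the learned weights of a real transformer encoder and are not from the patent.

```python
import numpy as np

def self_attention(X):
    """Single-head scaled dot-product self-attention over frames.

    Each output frame is a weighted sum over ALL input frames, so
    any two positions interact in a single step. Toy random
    projections; a real transformer learns Wq, Wk, Wv.
    """
    T, d = X.shape
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d)
                  for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)            # (T, T) pairwise scores
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)        # softmax over frames
    return A @ V, A                          # context-mixed frames, weights

X = np.random.default_rng(1).standard_normal((50, 16))  # 50 frames, dim 16
Y, A = self_attention(X)
```

Row `A[i]` gives the attention that frame `i` pays to every frame in the utterance, which is the mechanism behind the "global context information" claimed for the transformer encoder.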

Method used


Image

  • Speech emotion feature extraction method based on transformer model encoder

Examples

Experimental program
Comparison scheme
Effect test

Embodiment 1

[0072] Features are extracted from the IEMOCAP speech emotion database, which contains 10 emotion classes in total. In the present embodiment four emotions are used: anger, happiness, sadness, and neutral; happiness and excitement are merged into a single happy class to balance the categories, giving 5531 English utterances in total.

[0073] Specifically follow the steps below:

[0074] Step 1: Pre-process the original waveform signal by pre-emphasis, framing with windowing, and endpoint detection to obtain x[n]. Each speech waveform is sampled at 16 kHz with 16-bit quantization, and a Hamming window with a window length of 250 ms and a shift of 10 ms is applied, converting the speech signal into the original speech waveform x[n];
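The pre-emphasis, framing, and windowing of Step 1 can be sketched as follows. The window length (250 ms) and shift (10 ms) are taken from the step above; the pre-emphasis coefficient 0.97 is a common default, not stated in the patent, and endpoint detection is omitted.

```python
import numpy as np

def preprocess(signal, sr=16000, pre_emph=0.97,
               win_len_ms=250, shift_ms=10):
    """Pre-emphasize, frame, and Hamming-window a raw waveform.

    Returns an array of shape (n_frames, samples_per_frame).
    pre_emph=0.97 is an assumed default, not from the patent.
    """
    # Pre-emphasis: x[n] = s[n] - a * s[n-1]
    x = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])

    win = int(sr * win_len_ms / 1000)    # samples per frame (4000)
    hop = int(sr * shift_ms / 1000)      # samples per shift (160)
    n_frames = 1 + max(0, (len(x) - win) // hop)

    frames = np.stack([x[i * hop : i * hop + win]
                       for i in range(n_frames)])
    # Apply a Hamming window to each frame
    return frames * np.hamming(win)

# One second of noise at 16 kHz -> 76 frames of 4000 samples each
frames = preprocess(np.random.randn(16000))
```

The 250 ms window is unusually long by ASR standards (25 ms is typical), but the values above follow the embodiment as written.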

[0075] Step 2: Use the SincNet filter layer to learn a custom filter bank adapted to speech emotion recognition, and convolve x[n] with the SincNet layer g[n, θ] for a preliminary selection of ...
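In SincNet, each filter g[n, θ] is a band-pass kernel parameterized only by its two cutoff frequencies, built as the difference of two low-pass sinc functions and smoothed by a Hamming window. A minimal numpy sketch of one such kernel and the Step 2 convolution is below; the kernel length (251), sample rate, and the fixed cutoffs (300 Hz and 3000 Hz) are illustrative assumptions, since in SincNet the cutoffs are the learnable parameters θ.

```python
import numpy as np

def sinc_bandpass_kernel(f1, f2, kernel_len=251, sr=16000):
    """Band-pass FIR kernel g[n, θ] parameterized by cutoffs (f1, f2).

    In SincNet, f1 and f2 are the learnable parameters; here they
    are fixed for illustration. kernel_len and sr are assumptions.
    """
    n = np.arange(kernel_len) - (kernel_len - 1) / 2
    t = n / sr
    # Difference of two low-pass sinc filters = one band-pass filter
    g = 2 * f2 * np.sinc(2 * f2 * t) - 2 * f1 * np.sinc(2 * f1 * t)
    return g * np.hamming(kernel_len)      # window to smooth band edges

def filter_signal(x, kernel):
    """Convolve x[n] with g[n, θ], as in Step 2."""
    return np.convolve(x, kernel, mode="same")

g = sinc_bandpass_kernel(f1=300.0, f2=3000.0)  # pass band 300-3000 Hz
y = filter_signal(np.random.randn(16000), g)
```

Because each filter costs only two parameters regardless of its length, a bank of such kernels can learn narrow pass bands cheaply, which is what lets the network emphasize the narrow-band emotion features mentioned in the abstract.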



Abstract

The invention discloses a speech emotion feature extraction method based on a transformer model encoder, suitable for the fields of artificial intelligence and speech emotion recognition. The method comprises the following steps: first, low-level speech emotion features are extracted from the original speech waveform with a SincNet filter; these low-level features are then further learned by a multi-layer transformer model encoder. The improved encoder differs from a conventional transformer model encoder in that a SincNet filter layer, i.e. a set of parameterized sinc functions acting as band-pass filters, is added in front of it. The SincNet filter performs the low-level feature extraction on the original speech waveform signal, enabling the network to better capture important narrow-band emotion features and thereby obtain frame-level emotion features containing deeper global context information.

Description

technical field [0001] The invention relates to a speech emotion feature extraction method, in particular to a speech emotion feature extraction method based on a transformer model encoder, for use in the fields of artificial intelligence and speech emotion recognition. Background technique [0002] With the advancement of science and technology, human-computer interaction has become an important research field, and speech emotion recognition technology can make machines more human-like. Speech emotion recognition has now been studied for more than ten years. Its essence is the computer simulation of the human process of perceiving and understanding emotion; the task is to extract effective emotional acoustic features from preprocessed speech signals and to find the mapping relationship between these acoustic features and human emotion. [0003] In research on speech emotion recognition, how to extract the features carrying the most emotional information from speech signals is still ...

Claims


Application Information

Patent Timeline
No application data available
IPC (8): G10L25/03; G10L25/30; G10L25/45; G10L25/63; G10L25/90
CPC: G10L25/03; G10L25/63; G10L25/90; G10L25/30; G10L25/45; Y04S10/50
Inventor: 金赟, 俞佳佳, 马勇, 李世党, 姜芳艽
Owner XUZHOU NORMAL UNIVERSITY