Speech emotion feature extraction method based on transformer model encoder
A technology for extracting emotional features from speech, applied in speech analysis, instruments, etc.; it addresses problems such as vanishing gradients over long distances and information loss.
Examples
Embodiment 1
[0072] Features are extracted from the IEMOCAP speech emotion database, which contains 10 emotion categories in total. In this embodiment, four emotions are used: anger, joy, sadness, and neutral. Joy and excitement are merged into a single happy category to balance the classes, giving a total of 5,531 English audio utterances.
[0073] The procedure is as follows:
[0074] Step 1: Pre-process the original waveform signal with pre-emphasis, windowing and framing, and endpoint detection to obtain x[n]. Each speech waveform is sampled at 16 kHz with 16-bit quantization; framing uses a Hamming window with a window length of 250 ms and a frame shift of 10 ms, converting the speech signal into the framed original speech waveform;
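The pre-emphasis and Hamming-window framing of Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the pre-emphasis coefficient 0.97 is a conventional default (the patent does not state one), endpoint detection is omitted, and the 250 ms / 10 ms window and shift follow the figures given above.

```python
import numpy as np

def preprocess(x, sr=16000, win_ms=250, hop_ms=10, alpha=0.97):
    """Pre-emphasize a waveform and split it into Hamming-windowed frames."""
    # Pre-emphasis: boost high frequencies, y[n] = x[n] - alpha * x[n-1]
    y = np.append(x[0], x[1:] - alpha * x[:-1])
    win = int(sr * win_ms / 1000)   # 4000 samples at 16 kHz, 250 ms
    hop = int(sr * hop_ms / 1000)   # 160 samples at 16 kHz, 10 ms
    n_frames = 1 + max(0, (len(y) - win) // hop)
    frames = np.stack([y[i * hop : i * hop + win] for i in range(n_frames)])
    # Apply a Hamming window to each frame to taper the frame edges
    return frames * np.hamming(win)

# Example: one second of audio at 16 kHz yields 76 overlapping frames
x = np.random.randn(16000)
frames = preprocess(x)  # shape (76, 4000)
```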
[0075] Step 2: Use a SincNet filter layer to learn a custom filter bank tuned for speech emotion recognition, and convolve x[n] with the SincNet layer g[n, θ] to perform a preliminary selection of ...
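In SincNet, each learned filter g[n, θ] is a parameterized band-pass filter: the difference of two sinc low-pass filters with learnable cutoff frequencies f1 < f2. A minimal sketch of one such filter and its convolution with x[n] is below; the kernel size, cutoff values, and Hamming smoothing window are illustrative assumptions (in the actual layer, f1 and f2 are trained parameters).

```python
import numpy as np

def sinc_filter(f1, f2, kernel_size=251, sr=16000):
    """Band-pass kernel g[n] = 2*f2*sinc(2*f2*n) - 2*f1*sinc(2*f1*n), f1 < f2 in Hz."""
    # Symmetric time axis in seconds around n = 0 (linear-phase filter)
    n = np.arange(-(kernel_size // 2), kernel_size // 2 + 1) / sr
    # np.sinc is the normalized sinc, sin(pi*t)/(pi*t)
    g = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # Window the kernel to reduce spectral ripple from truncation
    return g * np.hamming(kernel_size)

# Convolve the pre-processed signal x[n] with one band-pass kernel
# (illustrative 300-3000 Hz band; in SincNet these cutoffs are learned)
x = np.random.randn(16000)
g = sinc_filter(300.0, 3000.0)
out = np.convolve(x, g, mode="same")
```

A full SincNet layer stacks many such kernels, one per output channel, and updates only the cutoff pairs (f1, f2) by gradient descent, which is what lets the filter bank adapt to the emotion-recognition task.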