Speech Emotion Recognition Method Based on Multimodal Feature Extraction and Fusion

A speech emotion recognition and feature extraction technology, applied in neural learning methods, natural language data processing, biological neural network models, etc., that can solve problems such as poor interpretability and the inability of the classification model to discriminate emotion comprehensively, with the effect of improving the ability to express emotion.

Active Publication Date: 2021-08-17
NO 15 INST OF CHINA ELECTRONICS TECH GRP +1
Cites: 7; Cited by: 0

AI Technical Summary

Problems solved by technology

[0006] Existing speech emotion recognition methods extract features from only one aspect of the audio, so the classification model cannot identify emotion comprehensively from the characteristics of the audio. To address this shortcoming, the present invention discloses a speech emotion recognition method based on multimodal feature extraction and fusion. The method first extracts the eGeMAPS acoustic feature set from the input audio; this set defines 88 basic features for speech emotion analysis and standardizes the computation of low-level acoustic features, which addresses the problem of poor interpretability. Second, the invention provides a method for extracting user features and text features of the audio content, and uses hierarchical self-attention to fuse the user, text and acoustic features. This makes full use of the multimodal information in the corpus database and models speech emotion more comprehensively, so that speech emotion is recognized more effectively.



Examples


Embodiment Construction

[0055] To aid understanding of the present invention, an example is given here.

[0056] The invention discloses a speech emotion recognition method based on multimodal feature extraction and fusion. Figure 1 is the overall flowchart of the speech emotion recognition method of the present invention, and its steps include:

[0057] S1, data preprocessing;

[0058] S11, audio file preprocessing. Figure 2 is the detailed flowchart of audio preprocessing, which includes:

[0059] S111, check the validity of the audio file format. Acoustic features can be extracted correctly only from a valid audio format, so a file in an invalid format is converted into a valid format before subsequent processing. Specifically, the suffix of the audio file is checked: if it is in the list of valid suffixes (including '.mp3', '.wav'), the file passes the format check; if it is not in the list, then pyAudio is used to open so...
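The suffix check in S111 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes the comparison is case-insensitive, and it treats conversion as a caller-supplied step. The names `check_audio_format`, `ensure_valid_format` and the `convert` callback are hypothetical, and the pyAudio-based conversion described in the patent is not reproduced here.

```python
import os
from typing import Callable, Optional

# Suffixes listed in S111; the patent says the list "includes" these,
# so a real deployment may accept more.
VALID_SUFFIXES = (".mp3", ".wav")


def check_audio_format(path: str, valid_suffixes=VALID_SUFFIXES) -> bool:
    """Return True if the file suffix is in the valid suffix list.

    Assumption: the comparison is case-insensitive, so '.WAV' also passes.
    """
    suffix = os.path.splitext(path)[1].lower()
    return suffix in valid_suffixes


def ensure_valid_format(path: str,
                        convert: Optional[Callable[[str], str]] = None) -> str:
    """Return a path whose format passes the check.

    `convert` is a hypothetical callback that converts an invalid file into a
    valid format and returns the new path; the patent performs this step with
    pyAudio, which is not shown here.
    """
    if check_audio_format(path):
        return path
    if convert is None:
        raise ValueError(f"unsupported audio format: {path}")
    new_path = convert(path)
    if not check_audio_format(new_path):
        raise ValueError(f"conversion did not produce a valid format: {new_path}")
    return new_path
```

For example, `ensure_valid_format("clip.flac", convert=lambda p: os.path.splitext(p)[0] + ".wav")` returns `"clip.wav"`; the lambda here is only a stand-in that renames the path and does no actual audio conversion.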



Abstract

The invention discloses a speech emotion recognition method based on multimodal feature extraction and fusion. First, the eGeMAPS acoustic feature set is extracted from the input audio; this set defines 88 basic features for speech emotion analysis and standardizes the computation of low-level acoustic features, which addresses the problem of poor interpretability. Second, the invention provides a method for extracting user features and text features of the audio content, and uses hierarchical self-attention to fuse the user and text features with the acoustic features. This makes full use of the multimodal information in the corpus database and models speech emotion more comprehensively, so that speech emotion is recognized more effectively. Finally, the invention uses a hierarchical self-attention mechanism to fuse the modal features in depth and to align the features of different modalities in the same high-dimensional semantic space, which enhances their ability to express emotion and thereby improves the accuracy of speech emotion recognition.
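The abstract does not specify the fusion architecture, so the pure-Python sketch below shows only one plausible reading, under loudly stated assumptions. Each modality is aligned by a linear projection into a shared d-dimensional space (seeded random matrices stand in for learned weights); self-attention is plain scaled dot-product attention that uses the tokens directly as queries, keys and values; and "hierarchical" is taken to mean two levels, first acoustic with text, then that result with the user features. All function names and the 32-dim text and 8-dim user sizes are hypothetical; only the 88-dim acoustic vector comes from the text.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a d_in vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Simplification (assumption): no learned query/key/value projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average a list of equal-length vectors element-wise."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def random_projection(d_out: int, d_in: int, rng: random.Random) -> Matrix:
    """Stand-in for a learned projection into the shared semantic space."""
    scale = 1.0 / math.sqrt(d_in)
    return [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(d_out)]


def hierarchical_fusion(acoustic: Vector, text: Vector, user: Vector,
                        d: int = 16, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    # Alignment: project every modality into the same d-dimensional space.
    a = matvec(random_projection(d, len(acoustic), rng), acoustic)
    t = matvec(random_projection(d, len(text), rng), text)
    u = matvec(random_projection(d, len(user), rng), user)
    # Level 1: fuse the two modalities derived from the utterance itself.
    speech = mean_pool(self_attention([a, t]))
    # Level 2: fuse the utterance-level representation with the user features.
    return mean_pool(self_attention([speech, u]))


# Usage with dummy inputs: an 88-dim eGeMAPS-sized vector plus hypothetical
# 32-dim text and 8-dim user feature vectors.
demo_rng = random.Random(1)
acoustic = [demo_rng.random() for _ in range(88)]
text = [demo_rng.random() for _ in range(32)]
user = [demo_rng.random() for _ in range(8)]
fused = hierarchical_fusion(acoustic, text, user)
print(len(fused))  # 16
```

In a trained model the projections and attention weights would be learned end-to-end together with the emotion classifier; the fixed seed here only makes the sketch reproducible.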

Description

Technical field

[0001] The invention relates to the field of speech recognition, in particular to a speech emotion recognition method based on multimodal feature extraction and fusion.

Background

[0002] With the rapid development of information technology and the spread of smart devices, speech recognition is no longer limited to recognizing single phonemes and sentences from a specific speaker; effective recognition of remote call speech, children's speech, low-resource speech, the speech of speech-impaired persons, and emotional speech has also become a new focus of the field. Speech carries not only the text information the speaker wants to convey but also the speaker's emotional information, and effectively recognizing the emotion in speech can improve its intelligibility. Therefore, some researchers are trying to develop human nature with human thi...


Application Information

Patent Type & Authority Patents(China)
IPC (8): G06F16/33, G06F16/683, G06F40/151, G06F40/279, G06N3/04, G06N3/08
CPC: G06F16/3343, G06F16/3344, G06F16/683, G06F40/151, G06F40/279, G06N3/08, G06N3/044, G06N3/045
Inventors: 任传伦, 郭世泽, 巢文涵, 张先国, 夏建民, 姜鑫, 孙玺晨, 俞赛赛, 刘晓影, 乌吉斯古愣
Owner NO 15 INST OF CHINA ELECTRONICS TECH GRP