Speech Emotion Recognition Method Based on Multimodal Feature Extraction and Fusion

A speech emotion recognition and feature extraction technology, applied in neural network learning methods, natural language data processing, biological neural network models, etc. It addresses problems such as poor interpretability and the inability of the classification model to identify emotion comprehensively, with the effect of enhancing the ability to express emotion.

Active Publication Date: 2021-08-17

NO 15 INST OF CHINA ELECTRONICS TECH GRP +1

7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0006] Existing speech emotion recognition methods extract features from only one aspect of the audio, so the classification model cannot identify emotion comprehensively from the characteristics of the audio. To address this shortcoming, the present invention discloses a speech emotion recognition method based on multimodal feature extraction and fusion. The method first extracts the eGeMAPS acoustic feature set from the input audio; this set defines 88 basic features for speech emotion analysis and standardizes how low-level acoustic features are computed, which addresses the problem of poor interpretability. Second, the invention provides a method for extracting user features and text features of the audio content, and uses hierarchical self-attention to fuse the user, text and acoustic features. This makes full use of the multimodal information in the corpus database and models speech emotion more comprehensively, enabling more effective speech emotion recognition.
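
The text states that eGeMAPS standardizes how low-level acoustic features are computed but does not show the computation. As a rough illustration, eGeMAPS parameters are functionals (fixed summary statistics) taken over frame-level low-level descriptor (LLD) contours such as pitch or loudness. The minimal sketch below, assuming a single contour given as a NumPy array, computes a subset of the functionals eGeMAPS applies to pitch and loudness: arithmetic mean, coefficient of variation, the 20th/50th/80th percentiles and the 20-80 percentile range. The helper name `lld_functionals` and its output keys are hypothetical; the full 88-parameter set is normally produced by a toolkit such as openSMILE rather than by hand.

```python
import numpy as np


def lld_functionals(contour: np.ndarray) -> dict[str, float]:
    """Summarise one frame-level LLD contour with eGeMAPS-style functionals.

    Hypothetical helper: covers only the mean, coefficient of variation and
    percentile statistics, not the slope-based functionals or the other
    parameters that make up the full 88-feature eGeMAPS set.
    """
    x = np.asarray(contour, dtype=float)
    mean = float(np.mean(x))
    std = float(np.std(x))  # population standard deviation
    p20, p50, p80 = (float(v) for v in np.percentile(x, [20, 50, 80]))
    return {
        "mean": mean,
        # Coefficient of variation: standard deviation normalised by the mean.
        "cv": std / mean if mean != 0 else 0.0,
        "p20": p20,
        "p50": p50,
        "p80": p80,
        # Spread of the middle of the distribution.
        "p20_80_range": p80 - p20,
    }
```

For example, `lld_functionals(np.array([100.0, 110.0, 120.0, 130.0, 140.0]))` gives a mean of 120 and a 20-80 percentile range of 24 under NumPy's default linear interpolation. For pitch, eGeMAPS computes these statistics over voiced frames only, so unvoiced frames would need to be dropped before calling the helper.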

Embodiment Construction

[0055] To make the content of the present invention easier to understand, an example is given here.

[0056] The invention discloses a speech emotion recognition method based on multimodal feature extraction and fusion. Figure 1 is the overall flowchart of the method, whose steps include:

[0057] S1, data preprocessing;

[0058] S11, audio file preprocessing. Figure 2 is the detailed flowchart of audio preprocessing, which includes:

[0059] S111, check the validity of the audio file format. Acoustic features can be extracted correctly only from a valid audio format, so a file in an invalid format is converted into a valid format before subsequent processing. Specifically, the suffix of the audio file is checked: if the suffix is in the list of valid suffixes (including '.mp3' and '.wav'), the file passes the format check; if it is not in the list, pyAudio is used to open so...
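
The suffix check in S111 can be sketched with Python's standard library as follows. This is a minimal sketch, not the patent's implementation: the function name `passes_format_check` is hypothetical, the valid-suffix list contains only the two suffixes the text names (the text says the list includes them, so it may be longer), and the case-insensitive comparison is an assumption. The conversion applied to files that fail the check is truncated in the source and is not reproduced here.

```python
import os

# Only the suffixes named in the text; the patent's list "includes" these
# and may contain others.
VALID_SUFFIXES = (".mp3", ".wav")


def passes_format_check(path: str, valid_suffixes: tuple[str, ...] = VALID_SUFFIXES) -> bool:
    """Return True if the file's suffix is in the list of valid audio suffixes.

    Hypothetical helper. Lower-casing the suffix is an assumption; the
    patent does not say whether the check is case-sensitive.
    """
    _, suffix = os.path.splitext(path)  # e.g. "clip.wav" -> ("clip", ".wav")
    return suffix.lower() in valid_suffixes
```

With this sketch, `passes_format_check("clip.wav")` returns True and `passes_format_check("clip.flac")` returns False; a file that fails would then go on to the conversion step.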

Abstract

The invention discloses a speech emotion recognition method based on multimodal feature extraction and fusion. First, the eGeMAPS acoustic feature set is extracted from the input audio. This feature set defines 88 basic features for speech emotion analysis and standardizes how low-level acoustic features are computed, which addresses the problem of poor interpretability. Second, the invention provides a method for extracting user features and text features of the audio content, and uses hierarchical self-attention to fuse the user and text features with the acoustic features. This makes full use of the multimodal information in the corpus database and models speech emotion more comprehensively, so that speech emotion is recognized more effectively. Finally, the hierarchical self-attention mechanism fuses the modal features in depth and aligns the features of different modalities in the same high-dimensional semantic space, which enhances their ability to express emotion and thereby improves the accuracy of speech emotion recognition.
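
The abstract names hierarchical self-attention and a shared high-dimensional semantic space but gives no architecture. One plausible reading, sketched below under stated assumptions, is: project each modality's feature vector into a common d-dimensional space, fuse the user and text features with self-attention at a first level, then fuse that result with the acoustic features at a second level. The grouping of the two levels, the random projection matrices (stand-ins for learned layers), the absence of learned query/key/value weights and the mean pooling are all assumptions for illustration; the function names are hypothetical.

```python
import numpy as np


def self_attention(tokens: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention over the rows of `tokens` (n, d).

    Queries, keys and values are the tokens themselves (no learned Q/K/V
    projections), which keeps the sketch short.
    """
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ tokens


def fuse_level(vectors: list[np.ndarray]) -> np.ndarray:
    """Attend over a set of same-dimension vectors, then mean-pool the result."""
    return self_attention(np.stack(vectors)).mean(axis=0)


def hierarchical_fusion(acoustic: np.ndarray, text: np.ndarray, user: np.ndarray,
                        d: int = 64, seed: int = 0) -> np.ndarray:
    """Hypothetical two-level fusion: (user, text) first, then with acoustic."""
    rng = np.random.default_rng(seed)

    def project(x: np.ndarray) -> np.ndarray:
        # Random matrix standing in for a learned layer that maps one
        # modality into the shared d-dimensional semantic space.
        x = np.asarray(x, dtype=float)
        w = rng.standard_normal((x.shape[-1], d)) / np.sqrt(x.shape[-1])
        return np.tanh(x @ w)

    a, t, u = project(acoustic), project(text), project(user)
    level1 = fuse_level([u, t])     # level 1: user + text features
    return fuse_level([level1, a])  # level 2: combine with acoustic features
```

For instance, `hierarchical_fusion(np.ones(88), np.ones(768), np.ones(16))` returns a 64-dimensional fused vector; the 768-dimensional text vector and 16-dimensional user vector are arbitrary sizes chosen only for illustration, while 88 matches the eGeMAPS feature count. In a trained model the projections and attention weights would be learned jointly with the emotion classifier.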

Description

Technical field
[0001] The invention relates to the field of speech recognition, and in particular to a speech emotion recognition method based on multimodal feature extraction and fusion.

Background
[0002] With the rapid development of information technology and the spread of smart devices, speech recognition is no longer limited to recognizing single phonemes and sentences in the speech of a specific speaker. Effective recognition of remote call speech, children's speech, low-resource speech, the speech of speech-impaired persons, and emotional speech has also become a new focus of the field. Speech carries not only the text information the speaker wants to express but also the speaker's emotional information, and effectively recognizing the emotion in speech can improve its intelligibility. Therefore, some researchers are trying to develop human nature with human thi...

Application Information

Patent type & authority: Patents (China)

IPC(8): G06F16/33, G06F16/683, G06F40/151, G06F40/279, G06N3/04, G06N3/08

CPC: G06F16/3343, G06F16/3344, G06F16/683, G06F40/151, G06F40/279, G06N3/08, G06N3/044, G06N3/045

Inventors: 任传伦, 郭世泽, 巢文涵, 张先国, 夏建民, 姜鑫, 孙玺晨, 俞赛赛, 刘晓影, 乌吉斯古愣

Owner: NO 15 INST OF CHINA ELECTRONICS TECH GRP
