Emotion recognition method and device based on cross-frequency coupling of multi-modal physiological signals
By analyzing the cross-frequency coupling of multimodal physiological signals, we constructed cross-modal low-frequency-high-frequency combination pairs, extracted and filtered the cross-frequency coupling features, and solved the problem of insufficient recognition performance of single-modal signals, achieving higher accuracy and stability in emotion recognition.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- JIANGSU FOOD & PHARMA SCI COLLEGE
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-19
AI Technical Summary
Existing emotion recognition technologies rely on single-modal signals, making it difficult to fully capture the multidimensional information of emotions. Furthermore, recognition performance deteriorates significantly under conditions of signal loss, noise interference, or unstable acquisition. Multimodal physiological signal fusion methods have failed to effectively uncover the intrinsic correlations and dynamic interactions between modalities, resulting in insufficient recognition accuracy and stability.
By analyzing the cross-frequency coupling of multimodal physiological signals, we construct cross-modal low-frequency-high-frequency cross-channel combination pairs, calculate the cross-frequency coupling relationship, extract cross-frequency coupling features, perform discriminative evaluation and redundancy constraint screening, establish a multimodal joint feature vector representation, and input it into the emotion recognition model for training.
It improves the accuracy and robustness of emotion recognition, can provide complementary information in the case of loss or anomaly of a certain modality signal, enhances the expressive power and physiological interpretability of emotion features, reduces redundancy and noise interference in model training, and improves generalization ability.
Figure CN122229451A_ABST
Description
Technical Field
[0001] This invention relates to the fields of signal processing and pattern recognition, and in particular to an emotion recognition method and device based on the cross-frequency coupling of multimodal physiological signals.
Background Technology
[0002] Emotion is a dynamic process that influences social relationships, rational thinking, and decision-making. Emotion recognition, an interdisciplinary field spanning psychology and biology, has received widespread attention in recent years. Emotion involves complex interactions between the brain and the peripheral nervous system. When a person's emotional state changes, the sympathetic and/or parasympathetic systems are activated, changing the functioning of various internal organs. Physiological signals such as electroencephalography (EEG), electrocardiography (ECG), electrodermal activity (EDA), electromyography (EMG), electrooculography (EOG), photoplethysmography (PPG), and respiration (RESP) can objectively reflect emotional changes at the level of both the central and peripheral nervous systems, providing important clues for emotion recognition. For example, heart rate variability (HRV) is a key indicator of the state of the autonomic nervous system and is closely related to an individual's emotional state (such as anxiety, depression, stress, or fear).
[0003] Emotion recognition based on physiological signals requires appropriate methods to extract meaningful information from these signals. Existing technologies suffer from at least the following shortcomings:
[0004] 1. Existing emotion recognition technologies mostly rely on signals from a single modality, making it difficult to fully capture the multidimensional information of emotions. Furthermore, their recognition performance significantly deteriorates and their generalization ability is insufficient when signals are missing, there is noise interference, or the data acquisition is unstable. The dynamic interdependence of various signals from the peripheral or central nervous system may be related to emotional experience. In cases where single-modality data is missing or discontinuous, data from other modalities may provide complementary information.
[0005] 2. Existing multimodal physiological signal fusion methods typically model individual modal signals as independent vector representations, without exploring the intrinsic correlations and dynamic interactions between different modal physiological signals, making it difficult to effectively mine shared emotional features between modalities.
[0006] 3. Existing multimodal physiological signal feature extraction methods mix various heterogeneous high-dimensional features. Directly concatenating or weight-fusing features from different modalities can easily lead to the curse of dimensionality and introduce a large number of redundant or negatively correlated features, thereby significantly reducing the training efficiency, stability, and recognition accuracy of the model.
[0007] Complex modulation and feedback mechanisms exist between the central and peripheral nervous systems, and emotional changes often manifest as cross-frequency coupling between neural oscillations of different frequencies and between different physiological systems. Therefore, a new analytical method and device are needed to address the aforementioned problems of traditional emotion recognition methods and devices.
Summary of the Invention
[0008] Purpose of the invention: To address the shortcomings of existing technologies, this invention provides an emotion recognition method and device based on cross-frequency coupling of multimodal physiological signals. Through cross-frequency coupling analysis, it deeply explores the interaction between different modal physiological signals, providing a more accurate, robust, and physiologically interpretable emotion recognition scheme that can be widely applied in fields such as human-computer interaction, emotion monitoring, and cognitive function assessment.
[0009] Technical Solution: This invention provides an emotion recognition method based on the cross-frequency coupling of multimodal physiological signals, comprising the following steps:
[0010] Step S1: Synchronously acquire the user's multimodal physiological signals, including but not limited to electroencephalogram (EEG), electrocardiogram (ECG), electrodermal activity (EDA), electromyography (EMG), electrooculogram (EOG), photoplethysmography (PPG), and respiratory wave (RESP); preprocess the multimodal physiological signals to generate standard input sample segments;
[0011] Step S2: Perform frequency band decomposition processing on each channel preprocessed signal in step S1 to obtain multiple frequency band sub-signals for each channel, and construct a cross-modal low-frequency-high-frequency cross-channel combination pair, wherein one channel of the low-frequency-high-frequency cross-channel combination pair provides a low-frequency band signal and the other channel provides a high-frequency band signal;
[0012] Step S3: Establish a calculation model for the cross-modal cross-frequency coupling relationship of the low-frequency-high-frequency cross-channel combination pair, calculate at least one cross-frequency coupling relationship between the low-frequency signal in one mode and the high-frequency signal in another mode, and extract the cross-frequency coupling features;
[0013] Step S4: Perform discriminative evaluation and redundancy constraint screening on the cross-frequency coupling feature set to obtain a multimodal joint feature vector representation with low redundancy and high discriminative power;
[0014] Step S5: Input the multimodal joint feature vector into the preset emotion recognition model for training to establish the mapping relationship between multimodal physiological features and emotional states; input the corresponding multimodal joint feature vector of the test sample into the trained emotion recognition model and output the emotion state recognition result of the target object.
[0015] Further, step S1 specifically includes:
[0016] S1.1: Channel A is set to be EEG signal, channel B to be ECG signal, channel C to be skin conductance signal, channel D to be electromyography signal, channel E to be electrooculography signal, channel F to be photoplethysmography signal, and channel G to be respiratory wave signal.
[0017] S1.2: Apply downsampling, re-referencing, bandpass filtering, notch filtering, and independent component analysis (ICA) artifact removal to the EEG signal to obtain the preprocessed EEG signal for channel A; apply downsampling, bandpass filtering, notch filtering, and motion artifact suppression to the ECG, electrodermal activity (EDA), electromyography (EMG), electrooculography (EOG), photoplethysmography (PPG), and respiratory wave signals to obtain the preprocessed signals for channels B through G.
[0018] S1.3: Perform time synchronization alignment, normalization, and sliding time window segmentation on all modal signals to generate standard sample segments.
[0019] Further, step S2 specifically includes:
[0020] S2.1: Let the set of preprocessed channel signals be represented as $X = \{x_{m,c}(t)\}$, where $m$ denotes the modality index, $c$ denotes the channel index within the corresponding modality, and $x_{m,c}(t)$ denotes the time-series signal. Frequency band decomposition is performed on each channel signal to obtain multi-band sub-signals for each channel. The frequency band decomposition methods include, but are not limited to, bandpass filtering, wavelet packet transform, Fourier transform, empirical mode decomposition, and variational mode decomposition.
[0021] S2.2: Construct cross-modal low-frequency-high-frequency cross-channel combination pairs $\left(x_{m_1,c_1}^{(l)}(t),\ x_{m_2,c_2}^{(h)}(t)\right)$ with $m_1 \neq m_2$, where $c_1$ and $c_2$ denote the channel indices within the corresponding modalities, $l$ denotes the low-frequency band index, and $h$ denotes the high-frequency band index.
[0022] Furthermore, the coupling relationships include: phase-phase coupling relationship, amplitude-amplitude coupling relationship, phase-amplitude coupling relationship, and phase-frequency coupling relationship, in order to extract cross-frequency coupling features that can reflect the coordinated dynamic changes of multimodal physiological signals.
[0023] Furthermore, step S3 specifically includes:
[0024] S3.1: Apply the Hilbert transform to each signal of any low-frequency-high-frequency cross-channel combination pair to obtain its analytic signal $z(t) = x(t) + j\,\mathcal{H}[x(t)] = A(t)\,e^{j\varphi(t)}$. This yields the instantaneous phase $\varphi_l(t)$ and instantaneous amplitude $A_l(t)$ of the low-frequency signal, and the instantaneous phase $\varphi_h(t)$ and instantaneous amplitude $A_h(t)$ of the high-frequency signal, where $\mathcal{H}[\cdot]$ represents the Hilbert transform operator, $m$ the modality index, $c$ the channel index within the corresponding modality, $x(t)$ the time-series signal, $l$ the low-frequency band index, and $h$ the high-frequency band index;
[0025] S3.2: Calculate the following cross-modal cross-frequency coupling features:
[0026] (1) Phase-phase coupling:
[0027] ;
[0028] where $N$ denotes the length of the sample segment;
[0029] (2) Amplitude-amplitude coupling:
[0030] ;
[0031] (3) Phase-amplitude coupling:
[0032] ;
[0033] (4) Phase-frequency coupling:
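[0034] A statistical dependency is established between the high-frequency instantaneous frequency $f_h(t)$ and the low-frequency phase $\varphi_l(t)$: $\mathrm{PFC} = I\left(f_h(t);\ \varphi_l(t)\right)$, where $I(\cdot\,;\cdot)$ represents the mutual information function;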
[0035] S3.3: Concatenate the PPC, AAC, PAC, and PFC features corresponding to the cross-modal frequency band combination to form a cross-modal cross-frequency coupling feature vector:
[0036] $\mathbf{v} = \left[\mathrm{PPC},\ \mathrm{AAC},\ \mathrm{PAC},\ \mathrm{PFC}\right]$,
[0037] The above process is repeated for every pair of modalities to construct a unified multimodal cross-frequency coupling tensor $\mathcal{T} \in \mathbb{R}^{P \times Q \times R}$, where $P$ denotes the number of modality combinations, $Q$ the number of frequency band combinations, and $R$ the number of coupling types, specifically four: PPC, AAC, PAC, and PFC. The tensor characterizes the multimodal collaborative dynamic relationship between the central nervous system and the peripheral nervous system, providing the input features for the subsequent emotion recognition model.
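For reference, the four coupling measures of S3.2 are often written as follows in the cross-frequency coupling literature. These forms are an assumption chosen to be consistent with the verbal definitions above, not necessarily the exact formulas of this invention: PPC is taken as a 1:1 phase-locking value, AAC as the Pearson correlation of the two amplitude envelopes, and PAC as a mean-vector-length measure. Here $\varphi_l(t)$, $A_l(t)$ and $\varphi_h(t)$, $A_h(t)$ are the instantaneous phase and amplitude of the low- and high-frequency signals, $\bar{A}_l$ and $\bar{A}_h$ are the segment means of the amplitudes, $f_h(t)$ is the high-frequency instantaneous frequency, $N$ is the segment length, and $I(\cdot\,;\cdot)$ is mutual information.

```latex
\begin{aligned}
\mathrm{PPC} &= \left| \frac{1}{N} \sum_{t=1}^{N} e^{\,j\left(\varphi_l(t) - \varphi_h(t)\right)} \right| \\
\mathrm{AAC} &= \frac{\sum_{t=1}^{N} \left(A_l(t) - \bar{A}_l\right)\left(A_h(t) - \bar{A}_h\right)}
                     {\sqrt{\sum_{t=1}^{N} \left(A_l(t) - \bar{A}_l\right)^2}\,
                      \sqrt{\sum_{t=1}^{N} \left(A_h(t) - \bar{A}_h\right)^2}} \\
\mathrm{PAC} &= \left| \frac{1}{N} \sum_{t=1}^{N} A_h(t)\, e^{\,j\varphi_l(t)} \right| \\
\mathrm{PFC} &= I\left(f_h;\ \varphi_l\right), \qquad
f_h(t) = \frac{1}{2\pi}\,\frac{d\varphi_h(t)}{dt}
\end{aligned}
```

Under these assumed forms, PPC lies in $[0, 1]$, AAC lies in $[-1, 1]$, and PAC scales with the amplitude of the high-frequency signal, which is one reason the normalization in S1.3 matters before coupling values from different modalities are compared.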
[0038] Further, step S4 specifically includes:
[0039] Step S4.1: Expand the cross-frequency coupling feature tensor into a candidate feature set $F = \{f_1, f_2, \dots, f_K\}$, where $f_k$ ($k = 1, \dots, K$) denotes the $k$-th cross-modal cross-frequency coupling feature, and initialize the selected feature set $S = \varnothing$;
[0040] Step S4.2: Measure the discriminative power of each coupling feature for the emotional state by calculating the mutual information $I(f_k; Y)$ between each coupling feature and the emotion label $Y$, which measures the degree of information dependence between cross-modal coupling patterns and emotion categories;
[0041] Step S4.3: Define the redundancy measure between a candidate coupling feature and the already selected features as $\mathrm{Red}(f_k, S)$, where $\mathrm{Red}(f_k, S) = 0$ when $S = \varnothing$;
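[0042] Step S4.4: Construct the feature selection objective function $J(f_k) = \alpha\, I(f_k; Y) - \beta\, \mathrm{Red}(f_k, S)$, where $\alpha$ and $\beta$ are weight coefficients; use a greedy strategy to select the optimal feature $f^{*} = \arg\max_{f_k \in F \setminus S} J(f_k)$ and update $S \leftarrow S \cup \{f^{*}\}$ until the stopping condition is met;
[0043] Step S4.5: Fuse the finally selected optimal feature subset $S$ through a fusion function $\Phi(\cdot)$ to obtain the joint feature vector $z = \Phi(S)$.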
[0044] Furthermore, the fusion function includes, but is not limited to, feature concatenation, weighted fusion, attention fusion, and tensor compression.
[0045] Furthermore, the emotion recognition model preset in step S5 includes, but is not limited to, a machine learning model or a deep learning model, whose parameters are optimized through the objective function $\theta^{*} = \arg\min_{\theta} \sum_{i} \mathcal{L}\left(g(z_i; \theta),\ y_i\right) + \lambda \lVert \theta \rVert^{2}$, where $z_i$ represents the joint feature vector generated in step S4, $y_i$ the corresponding emotion category label or emotion dimension annotation value, $g(\cdot)$ the mapping function of the emotion recognition model, $\theta$ the set of model parameters, $\mathcal{L}$ the loss function, and $\lambda$ the regularization coefficient. The model parameters are updated by minimizing the loss function, and the trained emotion recognition model is used for emotion recognition on the test set.
[0046] This invention also discloses an emotion recognition device based on the cross-frequency coupling of multimodal physiological signals, comprising:
[0047] The multimodal physiological signal acquisition and preprocessing module is used to acquire the user's multimodal physiological signals and preprocess them, and then send the preprocessed signals to the frequency band decomposition and cross-channel combination construction module.
[0048] The frequency band decomposition and cross-channel combination construction module is used to perform frequency band decomposition processing on the preprocessed signal, obtain multiple frequency band sub-signals of each channel, construct cross-modal low-frequency-high-frequency cross-channel combination pairs, and send the combination pairs to the cross-modal cross-frequency coupling modeling module.
[0049] The cross-modal cross-frequency coupling modeling module calculates at least one coupling relationship between a low-frequency signal in one mode and a high-frequency signal in another mode, extracts cross-frequency coupling features, and outputs them to the coupling feature filtering and fusion module.
[0050] The coupling feature filtering and fusion module is used to perform discriminative evaluation and redundancy constraint filtering on the cross-frequency coupling features to obtain a low-redundancy, high-discriminative multimodal joint feature vector representation, which is then sent to the emotion recognition and visualization module.
[0051] The emotion recognition and visualization module inputs the multimodal joint feature vector into a preset emotion recognition model for training or inference processing, and outputs the emotion state recognition result of the target object.
[0052] Beneficial effects
[0053] 1. This invention addresses the incomplete expression of emotional information in single-modal signals by introducing cross-frequency coupling analysis of multimodal physiological signals (EEG, ECG, electrodermal activity, electromyography, electrooculography, photoplethysmography (pulse wave), respiratory wave, etc.), thereby improving the system's ability to perceive complex emotional states. Even in the event of loss or anomaly in a particular modality, other modalities can still provide effective complementary information, ensuring the continuity and robustness of the emotion recognition process and improving accuracy and generalization ability. This invention constructs a unified temporal reference framework through multimodal time synchronization alignment, resampling to unify the time base, sliding time window segmentation, and normalization processing, making different physiological channels comparable and providing reliable input for subsequent cross-modal coupling calculations.
[0054] 2. Compared to traditional multimodal fusion methods that simply splice or independently model the features of each modality, this invention constructs a directional cross-frequency coupling pair by using a cross-frequency coupling mechanism, where one modality provides low frequencies and another provides high frequencies. This fundamentally solves the problems of physiological modulation direction, cross-system control path, and the explosion of ineffective combinations. By deeply exploring the intrinsic correlations and frequency synergistic changes between different modal physiological signals, it can characterize the synergistic change patterns between the central and peripheral nervous systems, effectively revealing the shared emotional representations hidden between modalities, enhancing the expressive power and physiological interpretability of emotional features, providing more discriminative joint feature representations for emotion recognition and even prediction, and reducing the dependence of emotion recognition on the algorithm.
[0055] 3. Single-modal CFC is based on signals from the same source, exhibiting consistent noise structures and similar statistical properties. However, multimodal CFC involves signals from different sources, resulting in variations in noise levels, amplitude scales, and phase stability. Directly applying single-modal CFC methods can easily lead to failure or distortion. This invention constructs a unified tensor representation across modalities, frequency bands, and multiple coupling types (PPC / AAC / PAC / PFC). Its essential contribution lies in transforming "incomparable heterogeneous physiological dynamics" into a "coupling representation space that can be uniformly analyzed."
[0056] 4. To address the issues of high dimensionality and redundancy in the cross-frequency coupling features of multimodal physiological signals, this invention introduces a feature selection mechanism based on correlation constraints and discriminant ability evaluation. It incorporates discriminant ability (MI), redundancy constraints (Red), greedy selection, and weighted fusion to form a dedicated selection mechanism for cross-modal coupling features. This effectively suppresses redundant and negatively correlated features, avoids the curse of dimensionality caused by directly splicing high-dimensional heterogeneous features, and reduces the interference of irrelevant information on model training, thereby improving the efficiency and stability of emotion recognition.
[0057] 5. Compared with single physiological signal processing methods, this invention addresses the significant heterogeneity of multimodal physiological signals in terms of time scale, frequency structure, and statistical characteristics by introducing a cross-modal time synchronization mechanism, a directional low-frequency-high-frequency combination strategy, multi-type cross-frequency coupling modeling, and a discriminant-redundancy joint constraint screening method. This enables the proposed emotion recognition framework to operate stably under multimodal conditions and maintain high discriminative ability, thereby achieving an effective extension from single-modal to multimodal emotion recognition.
Description of the Drawings
[0058] Figure 1 is a flowchart of an emotion recognition method based on the cross-frequency coupling of multimodal physiological signals according to the present invention.
[0059] Figure 2 is a schematic diagram of a multimodal physiological signal synchronous acquisition system in an embodiment of the present invention.
[0060] Figure 3 shows the multimodal physiological signals obtained after preprocessing in an embodiment of the present invention.
[0061] Figure 4 shows the construction, in an embodiment of the present invention, of frequency band combination pairs between different channels across modalities, wherein one channel provides a lower frequency band signal and the other channel provides a higher frequency band signal.
[0062] Figure 5 is a flowchart of the coupling feature screening and fusion process in an embodiment of the present invention.
[0063] Figure 6 is a schematic diagram of the structure of an emotion recognition device based on the cross-frequency coupling of multimodal physiological signals according to the present invention.
Detailed Implementation
[0064] The present invention will now be described in further detail with reference to the accompanying drawings and specific embodiments.
[0065] Referring to Figure 1, this invention provides an emotion recognition method based on the cross-frequency coupling of multimodal physiological signals, the method comprising the following steps:
[0066] Step S1: Synchronously acquire the user's multimodal physiological signal data using physiological signal acquisition equipment (see Figure 2), including but not limited to electroencephalogram (EEG), electrocardiogram (ECG), electrodermal activity (EDA), electromyography (EMG), electrooculogram (EOG), photoplethysmography (PPG), and respiratory wave (RESP) signals. Referring to Figure 3, the multimodal physiological signals are preprocessed to obtain preprocessed signals for each channel. All modal signals are then time-synchronized, normalized, and segmented using a sliding time window to generate standard input sample segments.
[0067] S1.1: Synchronously collect multimodal physiological signal data from the user through physiological signal acquisition devices, including but not limited to electroencephalogram (EEG), electrocardiogram (ECG), electrodermal activity (EDA), electromyography (EMG), electrooculogram (EOG), photoplethysmography (PPG), and respiratory wave (RESP). Among them, 32 channels are EEG signals, 12 channels are ECG signals, 2 channels are EDA signals, 8 channels are EMG signals, 4 channels are EOG signals, 1 channel is a PPG signal, and 1 channel is a respiratory wave signal. The sampling frequency can be set within the range of 100 Hz to 1000 Hz and adjusted according to the actual application scenario.
[0068] S1.2: Preprocess the acquired physiological signals of each modality. Optionally, the EEG signals undergo downsampling, re-referencing, bandpass filtering, notch filtering, and independent component analysis (ICA) artifact removal to obtain 32-channel preprocessed EEG signals. The ECG, EDA, EMG, EOG, PPG, and respiratory wave signals undergo downsampling, bandpass filtering, notch filtering, and motion artifact suppression to obtain 28-channel preprocessed signals.
[0069] S1.3 Perform time synchronization alignment, normalization processing, and sliding time window segmentation on all modal signals. In this embodiment, a 5-second time window with a window step size of 1 second is preferred to generate standard sample segments.
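The normalization and sliding-window segmentation of S1.3 can be sketched as follows. This is a minimal illustration rather than the implementation of this invention: the 128 Hz sampling rate, the per-channel z-score normalization, the synthetic data, and the function names `zscore` and `sliding_windows` are all assumptions, and the filtering and artifact removal of S1.2 are omitted.

```python
import numpy as np

def zscore(x):
    """Normalize each channel (row) to zero mean and unit variance."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return (x - mu) / np.where(sd == 0, 1.0, sd)

def sliding_windows(x, fs, win_s=5.0, step_s=1.0):
    """Cut a (channels, samples) array into overlapping segments.

    Returns an array of shape (n_segments, channels, window_length).
    """
    win = int(round(win_s * fs))
    step = int(round(step_s * fs))
    n = x.shape[1]
    if n < win:
        return np.empty((0, x.shape[0], win))
    starts = range(0, n - win + 1, step)
    return np.stack([x[:, s:s + win] for s in starts])

fs = 128                                         # assumed rate after downsampling
rng = np.random.default_rng(0)
recording = rng.standard_normal((60, 60 * fs))   # 60 channels, 60 s of synthetic data
segments = sliding_windows(zscore(recording), fs)
print(segments.shape)                            # (56, 60, 640)
```

With a 5-second window and a 1-second step, a 60-second recording yields 56 segments, each holding all 60 time-aligned channels.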
[0070] Step S2: Perform frequency band decomposition on each preprocessed channel signal from step S1 to obtain multiple frequency band sub-signals for each channel. Further, referring to Figure 4, construct cross-modal frequency band combination pairs between different channels, in which one channel provides a lower frequency band signal and the other channel provides a higher frequency band signal.
[0071] S2.1: Let the set of 60 channel signals after preprocessing in step S1 be represented as $X = \{x_{m,c}(t)\}$, where $m$ denotes the modality index, $c$ denotes the channel index within the corresponding modality, and $x_{m,c}(t)$ denotes the time-series signal. Frequency band decomposition is performed on each channel signal. In this embodiment, wavelet packet transform is preferred, but bandpass filtering, Fourier transform, empirical mode decomposition, or variational mode decomposition can also be used, to obtain the multi-band sub-signals $x_{m,c}^{(b)}(t)$, $b = 1, \dots, B$, for each channel, where $b$ denotes the frequency band index and $B$ the total number of frequency bands. Preferably, the EEG signal is divided into δ (0.5-4 Hz), θ (4-8 Hz), α (8-13 Hz), β (13-30 Hz), and γ (30-50 Hz).
[0072] S2.2: Construct frequency band combination pairs $\left(x_{m_1,c_1}^{(l)}(t),\ x_{m_2,c_2}^{(h)}(t)\right)$ with $m_1 \neq m_2$ between different channels across modalities, so that one channel provides a lower frequency band signal and the other channel provides a higher frequency band signal, where $c_1$ and $c_2$ denote the channel indices within the corresponding modalities, $l$ denotes the low-frequency band index, and $h$ denotes the high-frequency band index. These pairs provide the input structure for the subsequent cross-frequency coupling analysis.
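A minimal sketch of S2.1 and S2.2 follows. It is illustrative only: it swaps in an FFT-mask band-pass filter for the wavelet packet transform preferred in this embodiment, it applies the EEG band edges to every modality (the text specifies them only for EEG), and the names `fft_bandpass`, `decompose`, `cross_modal_pairs`, and the channel labels are hypothetical.

```python
import numpy as np
from itertools import product

# Band edges in Hz, taken from the EEG bands listed in S2.1.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 50)}

def fft_bandpass(x, fs, lo, hi):
    """Zero-phase band-pass: zero every FFT bin outside [lo, hi) Hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=x.size)

def decompose(x, fs, bands=BANDS):
    """Return one sub-signal per frequency band for a single channel."""
    return {name: fft_bandpass(x, fs, lo, hi) for name, (lo, hi) in bands.items()}

def cross_modal_pairs(channels, bands=BANDS):
    """Enumerate directed (low, high) pairs across different modalities.

    channels: list of (modality, channel_index) tuples.
    A pair is kept only if the low band lies entirely below the high band.
    """
    pairs = []
    for (m1, c1), (m2, c2) in product(channels, repeat=2):
        if m1 == m2:
            continue  # cross-modal only
        for l, h in product(bands, repeat=2):
            if bands[l][1] <= bands[h][0]:
                pairs.append(((m1, c1, l), (m2, c2, h)))
    return pairs

fs = 128
t = np.arange(5 * fs) / fs
x = np.sin(2 * np.pi * 6 * t)            # 6 Hz test tone, inside the theta band
sub = decompose(x, fs)
channels = [("EEG", 0), ("ECG", 0)]      # hypothetical channel labels
pairs = cross_modal_pairs(channels)
print(len(pairs), pairs[0])
```

Because each pair is directed (which modality supplies the low band matters), the two channels above produce pairs in both directions; restricting the low band to lie below the high band keeps the number of combinations from growing with meaningless pairings.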
[0073] Step S3: For the low-frequency-high-frequency cross-channel combination pairs constructed in step S2, establish a calculation model for the cross-modal cross-frequency coupling relationship and calculate at least one cross-frequency coupling relationship between the low-frequency signal in one modality and the high-frequency signal in another modality. The coupling relationships include the phase-phase, amplitude-amplitude, phase-amplitude, and phase-frequency coupling relationships, so as to extract cross-frequency coupling features that reflect the coordinated dynamic changes of multimodal physiological signals.
[0074] S3.1: Apply the Hilbert transform to each signal of any cross-modal frequency band combination pair to obtain its analytic signal $z(t) = x(t) + j\,\mathcal{H}[x(t)] = A(t)\,e^{j\varphi(t)}$. This yields the instantaneous phase $\varphi_l(t)$ and instantaneous amplitude $A_l(t)$ of the low-frequency signal and the instantaneous phase $\varphi_h(t)$ and instantaneous amplitude $A_h(t)$ of the high-frequency signal, where $\mathcal{H}[\cdot]$ represents the Hilbert transform operator.
[0075] S3.2: Calculate the following cross-modal cross-frequency coupling features:
[0076] (1) Phase-phase coupling
[0077]
[0078] This characterizes the degree of synchronization between the low-frequency phase and the high-frequency phase of different modalities, where $N$ denotes the length of the sample segment.
[0079] (2) Amplitude-Amplitude Coupling
[0080]
[0081] This characterizes the correlation between energy fluctuations in the frequency bands of different modalities, where $N$ denotes the length of the sample segment.
[0082] (3) Phase-amplitude coupling
[0083]
[0084] This characterizes how strongly the low-frequency phase modulates the high-frequency amplitude, where $N$ denotes the length of the sample segment;
[0085] (4) Phase-frequency coupling
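[0086] A statistical dependency is established between the high-frequency instantaneous frequency $f_h(t)$ and the low-frequency phase $\varphi_l(t)$: $\mathrm{PFC} = I\left(f_h(t);\ \varphi_l(t)\right)$, where $I(\cdot\,;\cdot)$ represents the mutual information function.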
[0087] The PPC, AAC, PAC, and PFC features corresponding to the cross-modal frequency band combination are concatenated to form a cross-modal cross-frequency coupling feature vector:
[0088] $\mathbf{v} = \left[\mathrm{PPC},\ \mathrm{AAC},\ \mathrm{PAC},\ \mathrm{PFC}\right]$,
[0089] The above process is repeated for every pair of modalities to construct a unified multimodal cross-frequency coupling tensor $\mathcal{T} \in \mathbb{R}^{P \times Q \times R}$, where $P$ denotes the number of modality combinations, $Q$ the number of frequency band combinations, and $R$ the number of coupling types (PPC, AAC, PAC, PFC). The tensor characterizes the multimodal collaborative dynamic relationship between the central nervous system and the peripheral nervous system, providing the input features for the subsequent emotion recognition model.
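The four coupling features of step S3 can be sketched for a single low/high pair as below. This is a hedged illustration under stated assumptions, not the formulas of this invention: PPC is computed as a 1:1 phase-locking value, AAC as a Pearson correlation of amplitude envelopes, PAC as a mean vector length, and PFC as a histogram estimate of mutual information with 8 bins. The helper names and the synthetic test signals are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + j*H[x], built in the frequency domain."""
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def mutual_information(a, b, bins=8):
    """Histogram estimate of I(a; b) in nats."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    outer = p.sum(axis=1, keepdims=True) @ p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / outer[nz])))

def safe_corr(a, b):
    """Pearson correlation, defined as 0 when either input is constant."""
    if a.std() < 1e-12 or b.std() < 1e-12:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])

def coupling_features(x_low, x_high, fs):
    """Return [PPC, AAC, PAC, PFC] for one low/high signal pair."""
    zl, zh = analytic_signal(x_low), analytic_signal(x_high)
    phi_l, a_l = np.angle(zl), np.abs(zl)
    phi_h, a_h = np.angle(zh), np.abs(zh)
    ppc = np.abs(np.mean(np.exp(1j * (phi_l - phi_h))))   # phase-locking value
    aac = safe_corr(a_l, a_h)                             # envelope correlation
    pac = np.abs(np.mean(a_h * np.exp(1j * phi_l)))       # mean vector length
    f_h = np.diff(np.unwrap(phi_h)) * fs / (2 * np.pi)    # instantaneous frequency, Hz
    pfc = mutual_information(f_h, phi_l[:-1])
    return np.array([ppc, aac, pac, pfc])

fs = 128
t = np.arange(5 * fs) / fs
theta = 2 * np.pi * 6 * t
x_low = np.sin(theta)                                      # 6 Hz "low" signal
# 40 Hz carrier whose amplitude and frequency both follow the 6 Hz phase
x_high = (1 + 0.8 * np.sin(theta)) * np.sin(2 * np.pi * 40 * t + 1.5 * np.sin(theta))
x_ctrl = np.sin(2 * np.pi * 40 * t)                        # unmodulated control
coupled = coupling_features(x_low, x_high, fs)
control = coupling_features(x_low, x_ctrl, fs)
print("PAC coupled:", coupled[2], "PAC control:", control[2])
```

In this synthetic case the PAC value is clearly larger for the modulated carrier than for the unmodulated control, which is the behaviour the phase-amplitude measure is meant to capture. Stacking such four-element vectors over all modality pairs and band pairs gives the coupling tensor described above.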
[0090] Step S4: Perform discriminative evaluation and redundancy constraint screening on the high-dimensional cross-frequency coupling feature set of multimodal physiological signals obtained in step S3. Referring to Figure 5, this includes feature discriminative evaluation, redundant feature suppression, optimal feature selection, and joint feature construction, yielding a multimodal joint feature vector representation with low redundancy and high discriminative power and thereby alleviating the curse of dimensionality caused by the high-dimensional feature space.
[0091] Step S4.1: Expand the tensor into a candidate feature set $F = \{f_1, f_2, \dots, f_K\}$, where $f_k$ ($k = 1, \dots, K$) denotes the $k$-th cross-modal cross-frequency coupling feature, and initialize the selected feature set $S = \varnothing$.
[0092] Step S4.2: Measure the discriminative power of each coupling feature for the emotional state by calculating the mutual information between each coupling feature and the emotion label $Y$, where the mutual information $I(f_k; Y)$ is defined as $I(f_k; Y) = \sum_{f_k}\sum_{y} p(f_k, y)\,\log\frac{p(f_k, y)}{p(f_k)\,p(y)}$. This metric measures the degree of information dependence between cross-modal coupling patterns and emotion categories.
[0093] Step S4.3: To reduce information redundancy between different coupling paths, define a redundancy measure between a candidate coupling feature and the already selected features as $\mathrm{Red}(f_k, S)$, where $\mathrm{Red}(f_k, S) = 0$ when $S = \varnothing$. This redundancy term characterizes the degree of overlap in modulation patterns between different modal coupling paths, preventing duplicate information from entering the final feature space.
[0094] Step S4.4: Construct the feature selection objective function $J(f_k) = \alpha\, I(f_k; Y) - \beta\, \mathrm{Red}(f_k, S)$, where $\alpha$ and $\beta$ are weight coefficients; use a greedy strategy to select the optimal feature $f^{*} = \arg\max_{f_k \in F \setminus S} J(f_k)$ and update $S \leftarrow S \cup \{f^{*}\}$ until a stopping condition is met, which may include, but is not limited to, reaching a preset feature dimension, $J(f^{*})$ falling below a preset threshold, or the classification performance converging.
[0095] Step S4.5: Fuse the finally selected optimal feature subset $S$ through a fusion function $\Phi(\cdot)$ to obtain the joint feature vector $z = \Phi(S)$. The fusion function includes, but is not limited to, feature concatenation, weighted fusion, attention fusion, or tensor compression. The resulting joint feature vector $z$ is input to the emotion recognition model in step S5.
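The greedy screening of step S4 can be sketched as follows. Several choices here are assumptions made for illustration only: relevance is a histogram estimate of mutual information with the label, redundancy is the mean absolute Pearson correlation with the already selected features, the score is relevance minus weighted redundancy, and fusion is plain concatenation of the selected columns. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def mi_with_labels(f, y, bins=8):
    """Histogram estimate of I(f; Y) in nats (continuous feature, discrete label)."""
    edges = np.histogram_bin_edges(f, bins=bins)
    fb = np.clip(np.digitize(f, edges[1:-1]), 0, bins - 1)
    classes, yb = np.unique(y, return_inverse=True)
    joint = np.zeros((bins, classes.size))
    np.add.at(joint, (fb, yb), 1.0)
    p = joint / joint.sum()
    outer = p.sum(axis=1, keepdims=True) @ p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / outer[nz])))

def redundancy(X, k, selected):
    """Mean |Pearson r| between feature k and the selected set; 0 if the set is empty."""
    if not selected:
        return 0.0
    return float(np.mean([abs(np.corrcoef(X[:, k], X[:, j])[0, 1]) for j in selected]))

def greedy_select(X, y, n_select, alpha=1.0, beta=1.0, min_score=-np.inf):
    """Greedily add the feature with the best relevance-minus-redundancy score."""
    relevance = [mi_with_labels(X[:, k], y) for k in range(X.shape[1])]
    selected = []
    while len(selected) < n_select:
        remaining = [k for k in range(X.shape[1]) if k not in selected]
        if not remaining:
            break
        scores = [alpha * relevance[k] - beta * redundancy(X, k, selected)
                  for k in remaining]
        best = int(np.argmax(scores))
        if scores[best] < min_score:
            break  # stopping condition: score below the chosen threshold
        selected.append(remaining[best])
    return selected

rng = np.random.default_rng(0)
n = 600
a, b = rng.standard_normal(n), rng.standard_normal(n)
y = (a + b > 0).astype(int)                 # label depends on both a and b
X = np.column_stack([a,                                  # informative
                     a + 0.01 * rng.standard_normal(n),  # near-duplicate of a
                     b,                                  # informative, independent of a
                     rng.standard_normal(n)])            # irrelevant
selected = greedy_select(X, y, n_select=2)
z = X[:, selected]                          # fusion by concatenation
print(selected, z.shape)
```

In this toy setup the near-duplicate column is penalized by the redundancy term once its twin is selected, so the two selected features cover both informative directions instead of repeating one of them.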
[0096] Step S5: Input the multimodal joint feature vector obtained in step S4 into a preset emotion recognition model for training. The emotion recognition model is a machine learning model or a deep learning model to establish a mapping relationship between multimodal physiological features and emotional states. In the testing phase, input the corresponding multimodal joint feature vector of the sample to be tested into the trained emotion recognition model and output the emotion state recognition result of the target object. Optionally, the recognition result may also be visualized.
[0097] S5.1: Construct the training sample set $D = \{(z_i, y_i)\}_{i=1}^{n}$, where $z_i$ represents the joint feature vector generated in step S4 and $y_i$ the corresponding emotion category label or emotion dimension annotation value.
[0098] S5.2: Construct the emotion recognition model mapping function $g(\cdot\,; \theta)$. The emotion recognition model includes, but is not limited to, machine learning models or deep learning models. In this embodiment, a support vector machine is preferably used, and the parameters are optimized through the objective function $\theta^{*} = \arg\min_{\theta} \sum_{i} \mathcal{L}\left(g(z_i; \theta),\ y_i\right) + \lambda \lVert \theta \rVert^{2}$, where $\theta$ represents the set of model parameters, $\mathcal{L}$ the loss function, and $\lambda$ the regularization coefficient. The model parameters are updated by minimizing the loss function, the best-performing model is selected based on validation set evaluation, and the trained emotion recognition model is then used for emotion recognition on the test set.
[0099] S5.3: In the inference phase, the trained model performs prediction on the test sample $z_{\text{test}}$ and outputs the target user's emotion category label or continuous emotion dimension prediction. In this embodiment, the emotion recognition result is "happy". Optionally, the predicted emotion category can be visualized to reflect changes in the target object's emotional state.
[0100] Since emotion regulation involves the synergistic effects of the central and peripheral nervous systems, this invention, through cross-modal cross-frequency coupling modeling, can more fully characterize the dynamic modulation relationships between different physiological systems, thereby improving emotion representation capabilities from a mechanistic perspective. To verify the effectiveness of the method, a publicly available multimodal emotion database is preferably used as the experimental data source for offline experimental verification. In this embodiment, the DEAP multimodal emotion dataset is preferably used. This dataset simultaneously includes electroencephalogram (EEG), peripheral physiological signals (including ECG, EDA, EOG, RESP, EMG, PPG, etc.), and corresponding emotion annotation information, which can meet the research needs of multimodal emotion recognition. It should be understood that the dataset is only used as an example data source and does not constitute a limitation on the application scenario of this invention.
[0101] In the experiment, multimodal cross-frequency coupling features were first extracted according to steps S1~S4 of the present invention, and a joint feature vector was constructed. The obtained features were then input into the emotion recognition model for training and testing. To verify the effectiveness of the method of the present invention, the following comparison methods were set up: (1) single-modal EEG feature method; (2) multimodal feature direct splicing method; (3) multimodal method without cross-frequency coupling modeling.
[0102] Comparative experiments were conducted under the same data partitioning strategy and classifier parameter settings. Experimental results show that, compared to the aforementioned comparative methods, the method of this invention exhibits superior performance in both emotion classification accuracy and F1-score, with an overall recognition accuracy improvement of approximately 3%–8% (with variations among different participants). This indicates that the proposed cross-modal cross-frequency coupling feature can more effectively characterize the collaborative change patterns of multiple physiological systems related to emotion. Further analysis revealed that the method of this invention maintains relatively stable recognition performance even when the quality of some modal signals is degraded or locally missing, verifying the effectiveness of multimodal coupling modeling in improving system robustness.
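The two metrics named above can be computed as in the following sketch. Macro averaging of the F1-score is an assumption, since the text does not state which averaging was used, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Applying both functions to every compared method on the same data partition keeps the comparison consistent with the protocol described above.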
[0103] Referring to Figure 6, as an implementation of the method shown in the above embodiments, another embodiment of the present invention also provides an emotion recognition device based on the cross-frequency coupling of multimodal physiological signals. This device embodiment corresponds to the foregoing method embodiments. For ease of reading, this device embodiment does not repeat the details of the foregoing method embodiments one by one, but it should be clear that the device in this embodiment can implement all the contents of the foregoing method embodiments. Figure 6 illustrates the structure of an emotion recognition device based on the cross-frequency coupling of multimodal physiological signals, according to an embodiment of the present invention. As shown in Figure 6, the device in this embodiment includes the following modules:
[0104] The multimodal physiological signal acquisition and preprocessing module is used to acquire and preprocess the user's multimodal physiological signals. The preprocessing includes noise reduction, filtering, artifact removal, time synchronization alignment and normalization to obtain a standard input sample signal, and then sends the preprocessed signal to the frequency band decomposition and cross-channel combination construction module.
[0105] The frequency band decomposition and cross-channel combination construction module is used to perform frequency band decomposition on the preprocessed signal, obtain multiple frequency band sub-signals for each channel, and construct cross-modal frequency band combination pairs between different channels, so that one channel provides a lower frequency band signal and another channel provides a higher frequency band signal, and to send the combination pairs to the cross-modal cross-frequency coupling modeling module.
[0106] The cross-modal cross-frequency coupling modeling module calculates at least one coupling relationship between a low-frequency signal in one modality and a high-frequency signal in another modality. The coupling relationship includes phase-phase coupling, amplitude-amplitude coupling, phase-amplitude coupling, and phase-frequency coupling. This is used to extract cross-frequency coupling features that reflect the coordinated dynamic changes of multimodal physiological signals and output them to the coupling feature filtering and fusion module.
[0107] The coupling feature filtering and fusion module is used to perform discriminative evaluation and redundancy constraint filtering on the cross-frequency coupling features, including feature discriminative evaluation, redundancy feature suppression, optimal feature filtering, and joint feature construction, to obtain a low-redundancy, high-discriminative multimodal joint feature vector representation, which is then sent to the emotion recognition and visualization module.
[0108] The emotion recognition and visualization module is used to input the multimodal joint feature vector into a preset emotion recognition model for training or inference processing, so as to establish or call the mapping relationship between multimodal physiological features and emotional state, and output the emotion state recognition result of the target object; optionally, it is also used to visualize the recognition result.
[0109] The above modules can be implemented through software, hardware, or a combination of both, and can be deployed in wearable terminals, mobile devices, embedded systems, or cloud servers.
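As one possible software realization of the cross-modal cross-frequency coupling modeling module, the sketch below extracts instantaneous phase and amplitude from an analytic signal and computes a phase-amplitude coupling value for one low-frequency/high-frequency pair. The mean-vector-length estimator is a commonly used PAC measure and is an assumption here, not necessarily the formula of this invention; the inputs are assumed to be already band-limited and time-aligned, and all function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (equivalent to adding i times the Hilbert transform)."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0                                     # keep the DC component
    if n % 2 == 0:
        gain[n // 2] = 1.0                            # keep the Nyquist component
        gain[1:n // 2] = 2.0                          # double positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def phase_and_amplitude(x):
    """Instantaneous phase and instantaneous amplitude of a band-limited signal."""
    z = analytic_signal(x)
    return np.angle(z), np.abs(z)

def pac_mean_vector_length(low, high):
    """Coupling between the phase of `low` (one modality) and the amplitude of `high` (another)."""
    phase_low, _ = phase_and_amplitude(low)
    _, amp_high = phase_and_amplitude(high)
    vector = np.mean(amp_high * np.exp(1j * phase_low))
    return float(np.abs(vector) / np.mean(amp_high))  # normalized to the range 0 .. 1
```

A high-frequency signal whose envelope follows the low-frequency rhythm yields a clearly nonzero value, whereas a constant-amplitude high-frequency signal yields a value near zero.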
[0110] The foregoing detailed description of preferred embodiments of the present invention enables those skilled in the art to understand and utilize the invention effectively. However, the present invention is not limited to the described embodiments. Those skilled in the art can make equivalent modifications or substitutions without departing from the spirit and scope of the invention, and all such equivalent modifications or substitutions are included within the scope defined by the claims of the present invention.
Claims
1. An emotion recognition method based on cross-frequency coupling of multimodal physiological signals, characterized in that it includes the following steps: Step S1: Synchronously acquire the user's multimodal physiological signals, including but not limited to electroencephalogram (EEG), electrocardiogram (ECG), electrodermal activity (EDA), electromyography (EMG), electrooculogram (EOG), photoplethysmography (PPG), and respiratory wave (RESP) signals; preprocess the multimodal physiological signals to generate standard input sample segments; Step S2: Perform frequency band decomposition on the preprocessed signal of each channel from step S1 to obtain multiple frequency band sub-signals for each channel, and construct cross-modal low-frequency-high-frequency cross-channel combination pairs, wherein one channel of each combination pair provides a low-frequency band signal and the other channel provides a high-frequency band signal; Step S3: Establish a calculation model for the cross-modal cross-frequency coupling relationship of the low-frequency-high-frequency cross-channel combination pairs, calculate at least one cross-frequency coupling relationship between the low-frequency signal in one modality and the high-frequency signal in another modality, and extract the cross-frequency coupling features; Step S4: Perform discriminative evaluation and redundancy constraint screening on the cross-frequency coupling feature set to obtain a multimodal joint feature vector representation with low redundancy and high discriminative power; Step S5: Input the multimodal joint feature vector into the preset emotion recognition model for training to establish the mapping relationship between multimodal physiological features and emotional states; input the corresponding multimodal joint feature vector of the test sample into the trained emotion recognition model and output the emotion state recognition result of the target object.
2. The emotion recognition method based on cross-frequency coupling of multimodal physiological signals according to claim 1, characterized in that step S1 specifically includes: S1.1: Channel A is set to be the EEG signal, channel B the ECG signal, channel C the electrodermal activity (EDA) signal, channel D the electromyography signal, channel E the electrooculography signal, channel F the photoplethysmography signal, and channel G the respiratory wave signal. S1.2: Perform the following processes on the EEG signal: downsampling, re-referencing, bandpass filtering, notch filtering, and independent component analysis (ICA) artifact removal, to obtain the preprocessed EEG signal for channel A; and perform the following processes on the ECG, electrodermal activity (EDA), electromyography (EMG), electrooculography (EOG), photoplethysmography (PPG), and respiratory wave signals: downsampling, bandpass filtering, notch filtering, and motion artifact suppression, to obtain the preprocessed signals for channels B, C, D, E, F, and G. S1.3: Perform time synchronization alignment, normalization, and sliding time window segmentation on all modal signals to generate standard sample segments.
3. The emotion recognition method based on cross-frequency coupling of multimodal physiological signals according to claim 1, characterized in that step S2 specifically includes: S2.1: Let the set of preprocessed channel signals be indexed by a modality index and by a channel index under the corresponding modality, each element being a time-series signal. Frequency band decomposition is performed on each channel signal to obtain multi-frequency band sub-signals for each channel. The frequency band decomposition methods include, but are not limited to: bandpass filtering, wavelet packet transform, Fourier transform, empirical mode decomposition, and variational mode decomposition. S2.2: Construct cross-modal low-frequency-high-frequency cross-channel combination pairs, each identified by the channel indices under the corresponding modalities, a low-frequency band index, and a high-frequency band index.
4. The emotion recognition method based on cross-frequency coupling of multimodal physiological signals according to claim 1, characterized in that, The coupling relationships include phase-phase coupling, amplitude-amplitude coupling, phase-amplitude coupling, and phase-frequency coupling, in order to extract cross-frequency coupling features that can reflect the coordinated dynamic changes of multimodal physiological signals.
5. The emotion recognition method based on cross-frequency coupling of multimodal physiological signals according to claim 4, characterized in that step S3 specifically includes: S3.1: Perform a Hilbert transform on each signal of any low-frequency-high-frequency cross-channel combination pair to obtain its analytic signal, and from it obtain the instantaneous phase and instantaneous amplitude of the low-frequency signal and the instantaneous phase and instantaneous amplitude of the high-frequency signal, where the notation comprises the Hilbert transform operator, the modality index, the channel index under the corresponding modality, the time-series signal, the low-frequency band index, and the high-frequency band index; S3.2: Calculate the following cross-modal cross-frequency coupling features: (1) phase-phase coupling (PPC), computed over the length of the sample segment; (2) amplitude-amplitude coupling (AAC); (3) phase-amplitude coupling (PAC); (4) phase-frequency coupling (PFC), which establishes a statistical dependency between the high-frequency instantaneous frequency and the low-frequency phase by means of a mutual information function; S3.3: Concatenate the PPC, AAC, PAC, and PFC features corresponding to the cross-modal frequency band combination to form a cross-modal cross-frequency coupling feature vector. The above process is repeated for each pair of modalities to construct a unified multimodal cross-frequency coupling tensor, whose dimensions are the number of modality combinations, the number of frequency band combinations, and the number of coupling types, the latter being four: PPC, AAC, PAC, and PFC. This tensor characterizes the multimodal collaborative dynamic relationship between the central nervous system and the peripheral nervous system, providing the input features for subsequent emotion recognition models.
6. The emotion recognition method based on cross-frequency coupling of multimodal physiological signals according to claim 1, characterized in that step S4 specifically includes: Step S4.1: Expand the cross-frequency coupling feature tensor into a candidate feature set whose elements are the individual cross-modal cross-frequency coupling features, and initialize the selected feature set; Step S4.2: Measure the discriminative power of each coupling feature for the emotional state by calculating the mutual information between each coupling feature and the emotion label Y, which measures the degree of information dependence between cross-modal coupling patterns and emotion categories; Step S4.3: Define the redundancy measure between coupling features; Step S4.4: Construct the feature selection objective function, in which weight coefficients balance its terms, use a greedy strategy to select the optimal feature at each iteration, and update the selected feature set until the stopping condition is met; Step S4.5: Fuse the finally selected optimal feature subset through a fusion function to obtain the joint feature vector.
7. The emotion recognition method based on cross-frequency coupling of multimodal physiological signals according to claim 6, characterized in that, The fusion functions include, but are not limited to, feature concatenation, weighted fusion, attention fusion, and tensor compression.
8. The emotion recognition method based on cross-frequency coupling of multimodal physiological signals according to claim 1, characterized in that the emotion recognition model preset in step S5 includes, but is not limited to, machine learning models or deep learning models, whose parameters are optimized using an objective function defined in terms of: the joint feature vector generated in step S4; the corresponding emotion category label or emotion dimension annotation value; the mapping function of the emotion recognition model; the set of model parameters; the loss function; and the regularization coefficient. The model parameters are updated by minimizing the loss function, and the trained emotion recognition model is used for emotion recognition on the test set.
9. An emotion recognition device based on cross-frequency coupling of multimodal physiological signals, characterized in that it comprises: a multimodal physiological signal acquisition and preprocessing module, used to acquire the user's multimodal physiological signals, preprocess them, and send the preprocessed signals to the frequency band decomposition and cross-channel combination construction module; a frequency band decomposition and cross-channel combination construction module, used to perform frequency band decomposition on the preprocessed signals, obtain multiple frequency band sub-signals for each channel, construct cross-modal low-frequency-high-frequency cross-channel combination pairs, and send the combination pairs to the cross-modal cross-frequency coupling modeling module; a cross-modal cross-frequency coupling modeling module, used to calculate at least one coupling relationship between a low-frequency signal in one modality and a high-frequency signal in another modality, extract cross-frequency coupling features, and output them to the coupling feature filtering and fusion module; a coupling feature filtering and fusion module, used to perform discriminative evaluation and redundancy constraint filtering on the cross-frequency coupling features to obtain a low-redundancy, high-discriminative multimodal joint feature vector representation, which is then sent to the emotion recognition and visualization module; and an emotion recognition and visualization module, used to input the multimodal joint feature vector into a preset emotion recognition model for training or inference processing and to output the emotion state recognition result of the target object.