Mine emergency rescue life state evaluation system based on multi-modal sign information fusion

By introducing multimodal physiological signal acquisition and hierarchical condition fusion into the mine emergency rescue system, the problem of refined identification of coma status in the assessment of life status in underground mines has been solved, and accurate identification of low arousal status has been achieved, thus improving rescue efficiency.

CN122245767A · Pending · Publication Date: 2026-06-19 · XIAN UNIV OF SCI & TECH +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
XIAN UNIV OF SCI & TECH
Filing Date
2026-03-20
Publication Date
2026-06-19

Figure CN122245767A_ABST

Abstract

A life status assessment system based on multimodal vital sign information fusion for mine emergency rescue includes a multimodal physiological signal acquisition module, a life status feature extraction module, a hierarchical conditional fusion module, and a decision and output module. The acquisition module synchronously acquires three physiological signals (EEG, ECG, and SpO₂), which undergo preliminary filtering, resampling, and data encapsulation via an embedded main control unit. The life status feature extraction module applies dedicated feature extraction methods to EEG, ECG, and SpO₂, extracting discriminative feature vectors highly correlated with consciousness level and life support capacity. The hierarchical conditional fusion module encodes the clinical decision logic of "assessing vital signs first, then assessing consciousness" into a network structure. It first fuses ECG and SpO₂ to generate a "life support context," then uses this context to guide attentional fusion of EEG features to obtain high-dimensional features, achieving multimodal information integration driven by physiological logic. This invention achieves non-contact, rapid, quantitative, and interpretable intelligent discrimination of the life status of trapped personnel.

Description

Technical Field

[0001] This invention belongs to the field of emergency rescue technology in underground mines, and specifically relates to a life status assessment system based on multimodal vital sign information fusion in mine emergency rescue.

Background Technology

[0002] Currently, portable signal acquisition is the hardware foundation for vital sign assessment downhole, while the characterization and fusion-based assessment of bioelectrical signals from different modalities are crucial. For physiological signal processing, deep learning-based feature extraction and fusion methods have become mainstream, achieving key progress in improving signal robustness and uncovering multimodal correlations.

[0003] Signal processing and enhancement technologies have matured, with much research focused on addressing noise and missing data issues in signal acquisition. Existing technologies have proposed multifunctional cardiovascular signal fusion methods based on unified diffusion approaches, capable of denoising, interpolating, and even cross-modal synthesis of signals such as electrocardiograms (ECGs). Other technologies have proposed multimodal biological signal fusion frameworks using heterogeneous metadata, enabling cross-modal representation generation from electroencephalograms (EEGs) to cerebral oxygenation signals, improving signal quality and integrity. These methods offer technological possibilities for acquiring usable data in harsh environments. Deep feature extraction has become mainstream research, with convolutional neural networks (CNNs) becoming the dominant architecture for processing waveform signals such as EEG and ECGs due to their powerful spatial feature extraction capabilities. [7-11] Research shows that over 90% of deep learning-based EEG classification studies employ or incorporate CNN architectures. The BioCross framework developed by the Shanghai Jiao Tong University team also utilizes deep networks to learn unified cross-modal representations from ECG, PPG, and other signals.

[0004] Despite their advanced methods, existing feature extraction models are mostly general designs aimed at maximizing overall performance across various tasks. However, assessing the condition of comatose victims after a mine disaster requires features to be directly correlated with specific physiological crises such as level of consciousness, autonomic nervous system failure, and oxygenation impairment. General features may contain redundant information, while missing subtle dynamic features crucial for distinguishing between "awakenable" and "mildly comatose" individuals.

[0005] The core of multimodal research lies in fusing multimodal information. In decision-level fusion, each modality is processed independently up to the classification stage before the results are integrated. This approach is simple to implement and highly fault-tolerant. However, it completely ignores early intermodal associations and cannot achieve physiologically meaningful complementary enhancement. In feature-level fusion, features from each modality are extracted and then concatenated or weighted. This method preserves modal characteristics and is currently the mainstream approach. However, current research often uses simple concatenation and fails to establish a structured fusion logic based on physiological mechanisms (e.g., "cardiopulmonary function precedes brain function assessment"). In input-level fusion, the original signals are concatenated early and then processed uniformly. Theoretically, this can uncover the most basic associations. However, it is susceptible to interference from differing signal sampling rates and scales, resulting in a heavy model learning burden. To improve on simple fusion, many studies have begun to focus on more refined interaction mechanisms. Some studies use cross-modal attention mechanisms to dynamically calibrate the weights of different modal features or align different modalities in the latent space.

[0006] While existing advanced fusion strategies can improve model performance, their design is inherently data-driven, meaning they rely on the model automatically discovering correlations from the data. However, mine rescue assessment urgently needs a knowledge-guided fusion paradigm: one that simulates the clinical decision-making path in emergency medicine, "first assess vital signs (heart / lungs), then assess consciousness (brain)." [20, 21] Hard-coding this hierarchical and conditional physiological logic into the fusion architecture, rather than relying entirely on the model to discover it on its own, is a challenge that current research has not yet systematically solved.

Summary of the Invention

[0007] To overcome the above technical problems, the present invention aims to provide a vital status assessment system based on multimodal vital sign information fusion for mine emergency rescue. This assessment system features customization for extreme mine environments, deep integration of physiological signal characteristics and clinical decision-making logic, and refined grading of a continuous spectrum of consciousness states. By introducing the "awakenable" state as an independent category of vital status, a four-level assessment system from "near death" to "awakenable" is constructed, achieving non-contact, rapid, quantitative, and interpretable intelligent identification of the vital status of trapped personnel.

[0008] To achieve the above objectives, the technical solution adopted by the present invention is as follows:

[0009] A life status assessment system based on multimodal vital sign information fusion for mine emergency rescue includes a multimodal physiological signal acquisition module, a life status feature extraction module, a hierarchical conditional fusion module, and a decision and output module;

[0010] The multimodal physiological signal acquisition module is used to simultaneously acquire three physiological signals: electroencephalogram (EEG), electrocardiogram (ECG), and blood oxygen (SpO2). The embedded main control unit performs preliminary filtering, resampling, and data encapsulation to provide high-quality raw signal input for subsequent feature extraction.

[0011] The vital state feature extraction module is designed to extract specific features for EEG, ECG, and SpO2, and extracts discriminative feature vectors that are highly correlated with the level of consciousness and life maintenance ability from multiple dimensions such as time frequency, nonlinearity, waveform morphology, autonomic nervous function, and oxygenation trend.

[0012] The hierarchical conditional fusion module is used to encode the clinical decision logic of "assessing vital signs first and then assessing consciousness" into a network structure. It first fuses ECG and SpO2 to generate "life maintenance state context", and then uses this condition to guide the attention fusion of EEG features to obtain high-dimensional features and realize the integration of multimodal information driven by physiological logic.

[0013] The decision and output module is used to input the fused high-dimensional features into the classifier, output the probability distribution of the four life states of the trapped personnel, generate a visual assessment result, and transmit it to the wellhead command center to support rescue decision-making.

[0014] The four modules form an end-to-end processing chain: signal acquisition → feature extraction → knowledge-guided fusion → state determination. The acquisition module provides the underlying support, the feature extraction module transforms the raw signal into computable features, the fusion module achieves knowledge integration of cross-modal information, and the output module completes decision mapping and human-computer interaction. The modules are decoupled through a standardized feature vector interface, facilitating algorithm iteration and hardware adaptation.

[0015] The multimodal physiological signal acquisition module includes an EEG, ECG, and blood oxygen sensor array; an embedded core controller; and a vital signs assessment model.

[0016] The EEG, ECG, and blood oxygen sensor array is responsible for picking up raw physiological signals, including a frontal single-channel EEG electrode, a BMD101 ECG module, and an MKS-SPO2-R blood oxygen module.

[0017] The embedded core controller (CM3588) performs synchronous acquisition of multiple signals, real-time filtering, resampling, data packaging, and edge inference acceleration.

[0018] The life status assessment model is deployed within the embedded core controller. It receives preprocessed signals, performs feature extraction and fusion inference, and outputs the status classification results.

[0019] The EEG, ECG, and blood oxygen sensor arrays are the system's sensors, responsible for converting bioelectrical activity into digital bioelectrical signals. The controller is the hardware carrier of the decision-making function, responsible for controlling the sensor array's acquisition and filtering functions, and sending the bioelectrical signals into the evaluation model to output the vital status. The evaluation model is the main software carrier of the decision-making function, primarily responsible for fusing and evaluating the bioelectrical signals. These three components work together to form an edge-side closed-loop intelligent evaluation unit, enabling real-time vital status determination without relying on the cloud.

[0020] In the life status feature extraction module, the EEG feature extractor adopts a three-path parallel architecture of "time-frequency + nonlinear + morphology" to output feature vectors; the ECG feature extractor outputs features through a dual-branch approach of HRV analysis and waveform morphology; and the SpO2 feature extractor outputs features based on statistical characteristics, sliding trends, and Bi-GRU time series modeling.

[0021] The three components are designed independently, each quantifying the functional state of different physiological systems. Their output features are complementary in both dimension and semantics, providing multi-perspective, low-redundancy, and highly interpretable feature inputs for the subsequent fusion module. The hierarchical conditional fusion module includes an encoder, a life maintenance state context, and a conditional attention mechanism, among other parts.

[0022] The decision and output module includes a classifier, a life state probability distribution, and a four-class classification task.

[0023] The classifier consists of a multilayer perceptron that takes the fused features as input and outputs four-dimensional logits;

[0024] The Softmax layer transforms logits into a probability distribution, corresponding to "awakenable, mild coma, severe coma, and near death";

[0025] The four-category task packages probability distribution, confidence level, recommended rescue priority, etc. into structured data for display by the host computer or call by the command system;

[0026] The classifier maps features to decisions, Softmax normalizes probabilities, and the output of the four-class classification task enables human-computer interaction.
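The logits-to-record path described in [0023] to [0026] can be sketched as follows. This is a minimal illustration, not the patented implementation: the field names and the `PRIORITY` table are hypothetical, since the text only states that a probability distribution, a confidence level, and a recommended rescue priority are packaged together.

```python
import math

STATES = ["awakenable", "mild coma", "severe coma", "near death"]

# Hypothetical priority table: the source only says a "recommended rescue
# priority" is packaged; this particular mapping is an assumption.
PRIORITY = {"near death": 1, "severe coma": 2, "mild coma": 3, "awakenable": 4}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def package_result(logits):
    """Turn four-dimensional logits into a structured assessment record."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    state = STATES[best]
    return {
        "probabilities": dict(zip(STATES, probs)),
        "state": state,
        "confidence": probs[best],
        "rescue_priority": PRIORITY[state],
    }


# Example logits as they might come out of the MLP classifier.
result = package_result([0.2, 2.1, 0.4, -1.0])
print(result["state"], round(result["confidence"], 3))
```

A dictionary like this serializes directly to JSON, which is one plausible way for a host computer or command system to consume it.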

[0027] The operation method of a life status assessment system based on multimodal vital sign information fusion in mine emergency rescue includes the following steps:

[0028] Step 1: Synchronous acquisition of multimodal physiological signals:

[0029] The frontal EEG, single-lead ECG, and fingertip blood oxygen saturation signals of the trapped personnel are simultaneously acquired using the EEG, ECG, and SpO2 sensor arrays. The sampling rate is uniformly set to 125 Hz, and real-time filtering and data encapsulation are completed by the CM3588 main control unit.

[0030] Step 2: Modality-specific feature extraction:

[0031] After wavelet artifact removal and AR model restoration, the EEG signal is input into a three-path parallel feature extractor and outputs (F_{eeg}).

[0032] The ECG signal is processed by Pan-Tompkins to detect the R peak, and HRV time-frequency domain features and 1D-CNN waveform features are extracted to output (F_{ecg} );

[0033] After MAD anomaly removal, linear interpolation, and sliding smoothing, the SpO2 signal is used to extract statistical features, trend features, and Bi-GRU dynamic features, and output (F_{spo2}).
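The SpO2 cleaning chain named in [0033] (MAD anomaly removal, linear interpolation, sliding smoothing) might look like the sketch below. The threshold multiplier `mad_k` and window length `win` are illustrative assumptions; the source gives no values for them.

```python
import numpy as np


def clean_spo2(x, mad_k=3.0, win=5):
    """MAD outlier removal -> linear interpolation -> moving-average smoothing.

    mad_k and win are assumed values; win is expected to be odd.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    # 1.4826 scales the MAD to be comparable to a standard deviation
    # for Gaussian data; guard against a zero MAD on flat signals.
    scale = 1.4826 * mad if mad > 0 else 1e-9
    bad = np.abs(x - med) > mad_k * scale
    idx = np.arange(x.size)
    repaired = x.copy()
    # Fill flagged samples by linear interpolation between valid neighbours.
    repaired[bad] = np.interp(idx[bad], idx[~bad], x[~bad])
    # Edge-padded moving average keeps the output length equal to the input.
    pad = win // 2
    padded = np.pad(repaired, pad, mode="edge")
    smooth = np.convolve(padded, np.ones(win) / win, mode="valid")
    return smooth, bad


# One motion-artifact-like dropout (40 %) in an otherwise stable trace.
spo2 = np.array([97, 97, 96, 97, 40, 97, 96, 96, 97, 96], dtype=float)
smoothed, outliers = clean_spo2(spo2)
print(outliers.nonzero()[0], smoothed.round(2))
```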

[0034] Step 3: Survival trait context generation:

[0035] (F_{ecg}) and (F_{spo2}) are projected to a unified dimension and input to a 4-layer Transformer encoder; after multi-head self-attention interaction, a life-maintenance state context vector is generated;

[0036] Step 4: Conditioned EEG fusion:

[0037] The context vector is replicated and expanded into a conditional sequence (C), which is input together with the EEG features (F_{eeg}) into a conditional Transformer encoding layer; in this layer, the key (K) and value (V) are generated from the sum of (C) and the EEG representation, while the query (Q) is still generated from the EEG representation itself, so that the vital-sign status dynamically modulates the EEG attention weights;
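The conditioning scheme of Step 4 can be illustrated with a single-head NumPy sketch. Several details here are assumptions not stated in the source: one attention head, random projection matrices standing in for learned weights, and a layer normalization applied to the sum of EEG tokens and (C) before the key and value projections.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)


def conditional_attention(eeg, context, Wq, Wk, Wv):
    """Single-head attention: Q from EEG, K and V from (EEG + condition C)."""
    T, d = eeg.shape
    C = np.tile(context, (T, 1))          # replicate context into a length-T sequence
    kv_in = layer_norm(eeg + C)           # assumed normalization of the conditioned input
    q = eeg @ Wq                          # query comes from the EEG alone
    k = kv_in @ Wk                        # key is conditioned on the vital-sign context
    v = kv_in @ Wv                        # value is conditioned as well
    weights = softmax(q @ k.T / np.sqrt(d))
    return weights @ v, weights


rng = np.random.default_rng(0)
T, d = 6, 8                               # 6 EEG tokens, feature width 8 (toy sizes)
eeg_tokens = rng.normal(size=(T, d))
ctx = rng.normal(size=d)                  # stands in for the life-maintenance context
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out, attn = conditional_attention(eeg_tokens, ctx, Wq, Wk, Wv)
fused = out.mean(axis=0)                  # global average pooling, as in Step 5
```

The normalization is a deliberate choice in this sketch: with purely linear projections, adding the same vector to every key shifts each row of scores by a constant, which the softmax cancels, so the context would then act only through the values.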

[0038] Step 5: State determination and output:

[0039] Global average pooling is performed on the conditionally encoded EEG representation to obtain the fused features. The fused features are passed through an MLP classifier and a Softmax layer to output the probability distribution of the four life states. The system packages and outputs the highest-probability class, its confidence level, and the recommended rescue priority for the command center to use in decision-making.

[0040] The beneficial effects of the present invention are as follows:

[0041] This invention addresses the mine rescue scenario and makes vital sign feature extraction more targeted. By designing a multi-path deep feature extractor (time-frequency, nonlinear, morphological) for single-channel EEG and dedicated feature extraction schemes for ECG and SpO2, it ensures that the extracted features are highly correlated with the subtle changes in vital signs in post-disaster coma, providing a scientific information basis for assessment.

[0042] To address the lack of fusion logic, this invention proposes a hierarchical conditional fusion model that structurally embeds the clinical prior knowledge of "survival signs first, then neurological function" into the model. Specifically, ECG and SpO2 features are first fused to generate a "life-maintaining status" context, which is then used to guide the fusion and interpretation of EEG features. This conditional fusion mechanism goes beyond simple feature splicing or data-driven attention allocation, achieving intelligent assessment that aligns with physiological and emergency care logic.

Attached Figure Description

[0043] Figure 1 is a system technical block diagram.

[0044] Figure 2 is a schematic diagram of the robust feature extraction algorithm for single-channel EEG.

[0045] Figure 3 is a schematic diagram of the hierarchical conditional fusion model.

[0046] Figure 4 is a schematic diagram of the Transformer encoder.

[0047] Figure 5 is the accuracy change curve after 100 iterations.

[0048] Figure 6 is the curve showing the change in loss value after 100 iterations.

[0049] Figure 7 is a confusion matrix diagram for different models.

Detailed Implementation

[0050] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0051] This invention discloses a life status assessment system based on multimodal vital sign information fusion for emergency rescue in mines. The overall framework of the system is as follows:

[0052] 1.1 Overall Design

[0053] The technical block diagram of the vital signs assessment system based on the fusion of multimodal information from EEG, ECG, and blood oxygenation is shown in Figure 1. The system mainly consists of two modules, hardware acquisition and intelligent analysis, which correspond to step 1 and steps 2 through 5 of the operation method, respectively.

[0054] Three raw physiological signals are simultaneously acquired using a dedicated sensor array and then preliminarily processed and transmitted by an embedded core controller; this corresponds to step 1 (synchronous acquisition of multimodal physiological signals).

[0055] The core algorithm model is the "Hierarchical Conditional Fusion Model for Life Status Assessment" (HCF-LSA). Its processing flow strictly follows the clinical decision-making logic of "first assessing vital signs, then evaluating the level of consciousness," and includes three steps:

[0056] First, specific features are extracted for each modality of signal (such as time-frequency and nonlinear analysis of EEG, HRV analysis of ECG, and trend modeling of blood oxygenation); this corresponds to step 2 (modality-specific feature extraction).

[0057] Second, hierarchical fusion coding that simulates the clinical reasoning path is applied: the ECG and blood oxygenation features are first fused to assess the "strength of life maintenance", and the result is then used as a condition to guide the fusion and interpretation of EEG features; this corresponds to step 3 (generation of survival signs context) and step 4 (conditional EEG fusion).

[0058] Finally, the system outputs classification results for four vital states: "awakenable," "mild coma," "severe coma," and "near death," corresponding to step 5 (state identification and output). This forms a closed-loop system design from data acquisition to intelligent diagnosis.

[0059] 1.2.1 Main Control Unit

[0060] This invention uses the FriendlyELEC CM3588 (approximately 100mm x 85.47mm x 7mm) high-performance computing module as the main control unit. This module is based on the Rockchip RK3588 processor, uses industrial-grade packaging, and can be equipped with up to 16GB of LPDDR4X memory, providing powerful support for parallel processing of multi-channel data on-site. Its unique big.LITTLE architecture can intelligently allocate the workload between the four Cortex-A76 and four Cortex-A55 cores, ensuring real-time analysis response while keeping power consumption within the range of 3-8W, perfectly adapting to mobile power supply solutions at rescue sites.

[0061] 1.2.2 Sensor Module

[0062] 1) EEG sensor module

[0063] OpenBCI (Open-source Brain-Computer Interface) is an open-source hardware and software platform for acquiring, processing, and analyzing bioelectrical signals (such as EEG, EMG, and EOG). Its core mechanism involves collecting weak bioelectrical signals through electrodes that contact the scalp or skin. These signals are then amplified and converted from analog to digital by a data acquisition board (such as Cyton or Ganglion), and transmitted via wired or wireless means to a computer or embedded control unit for analysis.

[0064] Its principle is based on bioelectric phenomena. The electrical activity of neurons in the cerebral cortex generates potential differences on the scalp. These potential changes are captured by electrodes placed at specific locations on the scalp (following the international 10-20 system), forming an electroencephalogram (EEG). This platform can be used not only for basic brainwave detection (such as alpha waves), but also for identifying other bioelectrical events such as blinking and muscle activity. The signals can also be visualized in real time, filtered (e.g., to remove 50 Hz power-line interference), and shared via software (such as the OpenBCI GUI).

[0065] 2) ECG sensor module

[0066] The BMD101 module is a heart rate acquisition chip that measures heart rate through two electrodes [40]. When the chip is turned on by pressing the two sides of the electrodes with a finger, a microcurrent flows through the finger. The chip reflects the change in human heart rate based on the change in the microcurrent passing through the finger. The chip feeds back real-time heart rate data, which may fluctuate. The real-time heart rate data can be plotted, filtered, and processed by an algorithm to obtain a complete and stable heart rate curve and a baseline average heart rate.

[0067] 3) Blood Oxygen Sensor Module

[0068] The MKS-SPO2-R is a high-performance blood oxygen sensor module primarily used to monitor blood oxygen saturation and pulse rate. This module employs photoplethysmography (PPG), analyzing the oxygen content in the blood by emitting and receiving light of specific wavelengths.

[0069] 1.3 Software Design

[0070] The core function of this system is to assess the vital signs of casualties after a disaster, aiming to improve the efficiency of disaster relief. The system has three data acquisition modules in the collection area, corresponding to EEG, ECG, and pulse oximetry modules respectively. The core task of the software is to achieve efficient integration and real-time assessment of these three types of physiological data to support rapid assessment of vital signs at the disaster site. The software connects data from each sensor to the central processing unit through a stable data link to ensure reliable transmission. Specialized preprocessing is performed for different data characteristics: EEG signals are filtered from 0.1-30Hz and uniformly resampled to 125Hz, extracting effective segments and removing artifacts; ECG data is synchronized to 125Hz and key waveform segments are extracted for processing. The entire system adopts a modular design, connecting various functional parts through standard interfaces, facilitating future expansion with new sensors or optimized algorithms, thus ensuring the system's iterative capabilities while supporting current rescue missions.

[0071] In terms of data processing, the system uses the CM3588 as the core processing unit. This unit is responsible for receiving data from various modules and performing data preprocessing and feature extraction. By analyzing the data transmitted from the sensors in real time, the multimodal fusion-based monitoring and evaluation system can effectively extract relevant feature values. After feature extraction, these data are input into the multimodal fusion model for evaluation, thereby obtaining the assessment results of vital signs.

[0072] The overall design defines the functional module division and data flow of the system; the sensor module is responsible for raw signal acquisition, the main control unit undertakes data processing and model inference tasks, and the software design implements hardware scheduling, algorithm deployment, and interaction logic. Together, these three constitute a closed loop of "hardware-software-algorithm" engineering implementation.

[0073] State assessment model

[0074] The subject of this invention is a "life status assessment system based on multimodal vital sign information fusion in mine emergency rescue", which raises the question of how the system relates to the state assessment model.

[0075] A system is not the same as a model. A system includes all engineering elements such as hardware, software, algorithms, and interactions; the state evaluation model is the core algorithm engine of the system, corresponding to the "intelligent analysis module" in the overall block diagram. This patent protects the complete system including this model, not just the algorithm itself.

[0076] 2.1 Dataset Construction

[0077] To address the problem of inaccurate life risk assessment caused by misjudging low-arousal individuals as comatose during post-disaster rescue, this invention constructs its dataset from a combination of public and self-collected data. Existing public vital sign data largely originates from clearly comatose patients in intensive care or after cardiac arrest, and fails to cover the common but easily overlooked "awakenable" low-arousal state in post-disaster environments. To compensate for this deficiency, this invention introduces the "awakenable" state as an independent vital sign type into the data system and collects corresponding multimodal physiological signal data through independent experiments. It also incorporates publicly available data on comatose patients after cardiac arrest, uniformly reconstructing and reclassifying the original labels. Through the fusion of these two data sources, this invention ultimately constructs a multimodal labeled dataset covering four vital sign states: awakenable, mild coma, severe coma, and near death. This provides a data foundation for refined identification of vital sign states and priority determination in complex post-disaster environments.

[0078] 1) Introduction to Coma Datasets

[0079] The George B. Moody PhysioNet Challenge 2023 database collected data from over 1000 adult cardiac arrest patients from seven academic hospitals in the United States and Europe. This data covered baseline clinical information and continuous multimodal physiological signals during the period when patients remained in a coma and under ICU care after restoration of spontaneous circulation (ROSC). Building upon this, this invention further standardized the data extraction strategy, selecting the first 5 consecutive minutes of EEG, ECG, and SpO2 signals from each patient within 24 hours after cardiac arrest, constructing a total of 2000 samples. This formed a multimodal vital signs database suitable for vital signs identification research, encompassing mild coma, severe coma, and near-death states.

[0080] 2) Construction of the awakenable dataset

[0081] The "awakenable" state proposed in this invention is not a diagnostic term in clinical medicine, but rather a description of a life state characterized by reduced consciousness and weakened external responsiveness, but with intact central nervous system arousal pathways, and the ability to regain wakefulness under continuous external stimulation. From a physiological perspective, this state exhibits high similarity to the early stages of natural sleep or light sleep in terms of EEG rhythm distribution and heart rate regulation characteristics, making it suitable for modeling and identifying the life state of low-arousal individuals in complex post-disaster environments.

[0082] This invention addresses the need for life status modeling in post-disaster low-arousal individuals by systematically constructing an "awakenable" life status dataset. Data collection targeted healthy adults, with a planned recruitment of at least 20 participants. All participants voluntarily participated and signed informed consent forms. During data collection, the sampling rate, channel configuration, and preprocessing procedures were kept consistent with those used for the coma data to ensure comparability across data sources. Data collection was conducted in a low-light, relatively quiet indoor environment, with participants maintaining a semi-reclining or supine resting posture. The experiment was uniformly scheduled to begin around 10:00 PM to use physiological fatigue and circadian rhythms to promote the natural emergence of a low-arousal state. Each participant completed approximately 60 minutes of data collection per experiment, including a baseline wakefulness phase (approximately 5 minutes each with eyes closed and eyes open) and a natural sleep phase. Physiological signals were continuously recorded without any drug intervention, and a mild arousal check was performed approximately every 10 minutes. Ultimately, through rigorous quality control and annotation, 500 valid "awakenable" life status samples with synchronized EEG, ECG, and SpO2 were obtained.

[0083] 2.2 Feature Extraction (corresponding to step 2 (Modality-Specific Feature Extraction))

[0084] In a multimodal vital signs assessment system, key information is distilled from raw physiological signals using feature extractors. Each extractor is constructed based on the physiological generation mechanism and clinical assessment significance of the corresponding signal, aiming to transform high-dimensional, noisy raw waveforms into low-dimensional, robust, and discriminative feature vectors.

[0085] 2.2.1 Robust Feature Extraction for Single-Channel EEG

[0086] While single-channel forehead EEG sensing units offer ease of acquisition, they face challenges such as missing spatial signal information and severe artifact contamination (e.g., electrooculographic and electromyographic artifacts). To address these challenges, this invention designs a robust feature extractor based on parallel multipath analysis, the core architecture of which is shown in Figure 2.

[0087] 1) Adaptive preprocessing:

[0088] The raw signal is first passed through a Butterworth bandpass filter (0.5–45 Hz) to preserve the effective frequency band relevant to conscious activity and suppress baseline drift and high-frequency noise. Subsequently, an algorithm based on wavelet transform modulus maxima is used to detect transient artifacts (such as electrooculographic activity and blinks). For segments marked as artifacts, an autoregressive (AR) model is built from the normal data segments before and after the artifact and used for repair:

[0089] \hat{x}(n) = \sum_{i=1}^{p} a_i \, x(n-i) + \varepsilon(n)    (1)

[0090] where (p) is the order of the AR model, (a_i) are the model coefficients, (\varepsilon(n)) is white noise, and (x(n-i)) are samples from the adjacent normal signal segments.
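As a rough illustration of the AR-based repair, the sketch below fits AR coefficients by least squares and overwrites a flagged segment with forward predictions. It is a simplification under stated assumptions: only the clean data before the artifact is used (the text uses segments on both sides), the noise term is omitted during prediction, and `fit_ar` and `repair_segment` are hypothetical helper names.

```python
import numpy as np


def fit_ar(x, p):
    """Least-squares AR(p) coefficients a_1..a_p for x[n] ~ sum_i a_i * x[n-i]."""
    # Column i-1 holds the lag-i samples aligned with the targets x[p:].
    X = np.column_stack([x[p - i:len(x) - i] for i in range(1, p + 1)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def repair_segment(x, start, stop, p=8):
    """Replace x[start:stop] with forward AR predictions from the preceding clean data."""
    x = np.asarray(x, dtype=float).copy()
    coef = fit_ar(x[:start], p)
    for n in range(start, stop):
        # Pair a_1..a_p with x[n-1], x[n-2], ..., x[n-p].
        x[n] = coef @ x[n - p:n][::-1]
    return x


fs = 125
t = np.arange(600) / fs
clean = np.sin(2 * np.pi * 10 * t)        # toy 10 Hz rhythm
corrupted = clean.copy()
corrupted[300:330] += 5.0                 # simulated blink-like artifact
# A pure sinusoid is exactly an AR(2) process, so p=2 suffices for this toy
# check; real EEG would need a higher order.
repaired = repair_segment(corrupted, 300, 330, p=2)
print(np.max(np.abs(repaired[300:330] - clean[300:330])))
```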

[0091] 2) Parallel multi-path feature extraction:

[0092] To overcome the deficiency of insufficient spatial information in a single channel, this invention extracts features in parallel from three complementary dimensions: time-frequency, nonlinear dynamics, and waveform morphology.

[0093] Path 1: High-resolution time-frequency features

[0094] Continuous wavelet transform (CWT) using complex Morlet wavelets is employed to obtain a high-resolution time-frequency spectrum $W(a, b)$:

[0095] $$W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt \tag{2}$$

[0096] where $\psi$ is the mother wavelet ($\psi^{*}$ is its complex conjugate), $a$ is the scale factor, and $b$ is the translation factor. The following features are extracted from the time-frequency spectrum. Band relative power: the proportion of power in the four classic frequency bands δ (0.5-4 Hz), θ (4-8 Hz), α (8-13 Hz), and β (13-30 Hz) to the total power. Spectral edge frequency (SEF95): the frequency at which the accumulated power reaches 95% of the total power, a sensitive indicator of decreased consciousness level. Instantaneous frequency stability: the variance of the instantaneous frequency of the dominant frequency band.
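The band relative power and SEF95 definitions can be sketched as follows. The sketch assumes a power spectrum has already been obtained (for example by averaging the time-frequency spectrum over time); the δ lower edge of 0.5 Hz, the half-open band edges, and the names `relative_band_power` and `spectral_edge_frequency` are assumptions for illustration, and the flat spectrum is hypothetical data.

```python
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def relative_band_power(freqs, power):
    """Share of total power in each band (lower edge inclusive, upper exclusive)."""
    total = sum(power)
    return {
        name: sum(p for f, p in zip(freqs, power) if lo <= f < hi) / total
        for name, (lo, hi) in BANDS.items()
    }


def spectral_edge_frequency(freqs, power, fraction=0.95):
    """Lowest frequency at which cumulative power reaches `fraction` of the total."""
    target = fraction * sum(power)
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= target:
            return f
    return freqs[-1]


# Hypothetical flat spectrum sampled in 1 Hz steps from 1 to 10 Hz
freqs = [float(f) for f in range(1, 11)]
power = [1.0] * 10
rel = relative_band_power(freqs, power)
sef95 = spectral_edge_frequency(freqs, power)
```

Here "total power" is simply the sum over the bins supplied, so the caller decides which frequency range counts as the total.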

[0097] Path 2: Nonlinear Dynamics

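[0098] Multiscale sample entropy (MSE): first, a coarse-grained sequence is constructed from the original sequence $\{x_i\}$ of length $N$:

[0099] $$y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x_i, \quad 1 \le j \le \lfloor N/\tau \rfloor \tag{3}$$

[0100] where $\tau$ is the scale factor. Then, the sample entropy of the sequence at each scale is calculated:

[0101] $$\mathrm{SampEn}(m, r) = -\ln \frac{A^{m+1}(r)}{B^{m}(r)} \tag{4}$$

[0102] where $m$ is the embedding dimension, $r$ is the similarity tolerance, and $A^{m+1}(r)$ and $B^{m}(r)$ are the numbers of template matches in the $(m+1)$-dimensional and $m$-dimensional spaces, respectively. When consciousness is lost, signal complexity decreases and the entropy falls.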
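Coarse-graining and sample entropy can be sketched in a few lines. This is a minimal, unoptimized illustration assuming the Chebyshev distance, a "less than or equal to $r$" match rule, no self-matches, and an absolute tolerance `r` supplied by the caller; the function names are hypothetical.

```python
import math


def coarse_grain(x, tau):
    """Average consecutive non-overlapping blocks of length tau."""
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(len(x) // tau)]


def sample_entropy(x, m, r):
    """SampEn(m, r) = -ln(A / B), counting template pairs within tolerance r."""
    n_templates = len(x) - m

    def count_matches(length):
        count = 0
        for i in range(n_templates):
            for j in range(i + 1, n_templates):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return math.inf       # entropy is undefined without matches
    return -math.log(a / b)


def multiscale_entropy(x, m, r, scales):
    """Sample entropy of the coarse-grained series at each scale factor."""
    return [sample_entropy(coarse_grain(x, tau), m, r) for tau in scales]


# A perfectly regular alternating series has zero sample entropy
regular = [0.0, 1.0] * 10
mse = multiscale_entropy(regular, m=2, r=0.5, scales=[1, 2])
```

The double loop is quadratic in the series length, which is acceptable for short windows but would need vectorizing for continuous monitoring.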

[0103] Recurrence quantification analysis (RQA) generates recurrence plots through phase-space reconstruction. Features such as determinism (DET) and laminarity (LAM) are extracted from the plots to quantify the degree of determinism in the signal.

[0104] Detrended fluctuation analysis (DFA) calculates the scaling exponent, which reveals the long-range correlation of the signal. A value deviating from 1 (1/f noise; uncorrelated white noise gives 0.5) may indicate a change in physiological regulatory mechanisms.

[0105] Path 3: Transient waveform morphology characteristics:

[0106] Multi-scale one-dimensional convolution: Convolving the signal with three one-dimensional convolution kernels of different widths (50 ms, 100 ms, 250 ms) to capture local waveform patterns associated with specific neural oscillations (such as slow waves).

[0107] Transient slow-wave event statistics: negative deflection waves with an amplitude exceeding the threshold and a duration between 0.5 and 2 seconds are detected, and their average amplitude and occurrence rate (events per minute) are calculated. Increased slow-wave activity is one of the hallmarks of declining consciousness.
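The slow-wave event statistics can be sketched as a simple run-length scan. The sketch assumes a positive amplitude threshold, defines an event as a run of consecutive samples below the negative threshold, and takes the peak of each run as its amplitude; the function name, the 10 Hz sampling rate, and the synthetic signal are hypothetical.

```python
def slow_wave_stats(signal, fs, threshold, min_dur=0.5, max_dur=2.0):
    """Return (mean peak amplitude, events per minute) of negative deflections.

    An event is a run of consecutive samples below -threshold whose duration
    lies between min_dur and max_dur seconds.
    """
    peaks = []
    run_start = None
    for i, v in enumerate(list(signal) + [0.0]):  # sentinel closes a trailing run
        if v < -threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            duration = (i - run_start) / fs
            if min_dur <= duration <= max_dur:
                peaks.append(-min(signal[run_start:i]))
            run_start = None
    minutes = len(signal) / fs / 60.0
    mean_amp = sum(peaks) / len(peaks) if peaks else 0.0
    return mean_amp, len(peaks) / minutes


# One minute of synthetic data at 10 Hz
fs = 10.0
eeg = [0.0] * 600
eeg[100:110] = [-100.0] * 10  # 1.0 s deflection: counted
eeg[300:302] = [-100.0] * 2   # 0.2 s deflection: too short, ignored
mean_amp, rate = slow_wave_stats(eeg, fs, threshold=50.0)
```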

[0108] 3) Feature fusion and dimensionality reduction:

[0109] All scalar features extracted from the three paths are concatenated into a high-dimensional vector. A lightweight feature-level attention layer then assigns learnable weights to the individual features, achieving adaptive weighting. Finally, a fully connected layer performs dimensionality reduction and nonlinear transformation to output the final EEG feature vector $F_{eeg}$.

[0110] 2.2.2 Extraction of ECG autonomic nervous system functional features

[0111] Electrocardiogram (ECG) signals are the gold standard for assessing cardiac autonomic tone. This extractor is designed to accurately quantify heart rate variability (HRV) and heartbeat waveform morphology from a single-lead ECG.

[0112] 1) R-peak detection and waveform segmentation

[0113] The ECG signal is first bandpass filtered from 1 to 40 Hz. The positions of the R-peaks are then detected in real time using the classic Pan-Tompkins algorithm, yielding the sequence of R-peak time points. The time interval between adjacent R-peaks is then calculated to form the RR interval sequence.

[0114] 2) Dual-branch feature extraction

[0115] The two branches refer to heart rate variability (HRV) feature extraction and waveform morphology feature extraction, respectively.

[0116] Branch 1 characterizes HRV at three levels: time domain, frequency domain, and nonlinear. Time-domain features include the standard deviation of all normal RR intervals (SDNN), which reflects overall variability; an index sensitive to parasympathetic activity (in standard HRV analysis, the root mean square of successive RR-interval differences, RMSSD); and the percentage of adjacent RR intervals that differ by more than 50 ms (pNN50). In the frequency domain, because the RR interval sequence is non-uniformly sampled, the Lomb-Scargle periodogram is used for spectral estimation to avoid the distortion caused by resampling. Low-frequency power (LF, 0.04-0.15 Hz) reflects joint regulation by the sympathetic and parasympathetic nervous systems; high-frequency power (HF, 0.15-0.4 Hz) is associated with respiratory sinus arrhythmia and mainly reflects parasympathetic activity; the LF/HF ratio characterizes the sympathetic-vagal balance. Nonlinear features include Poincaré plot analysis and the sample entropy of the RR sequence (calculated in the same way as for EEG) to measure the complexity of the heart rhythm.
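The time-domain and Poincaré features can be sketched directly from an RR interval list. The sketch assumes intervals in milliseconds, uses the sample standard deviation, and takes RMSSD as the parasympathetic-sensitive index; the function names and the RR values are hypothetical.

```python
import math
import statistics


def hrv_time_domain(rr_ms):
    """SDNN, RMSSD and pNN50 from a list of RR intervals in milliseconds."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    sdnn = statistics.stdev(rr_ms)
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
    return {"sdnn": sdnn, "rmssd": rmssd, "pnn50": pnn50}


def poincare_sd(rr_ms):
    """SD1 and SD2 of the Poincare plot (RR[n] plotted against RR[n+1])."""
    pairs = list(zip(rr_ms[:-1], rr_ms[1:]))
    sd1 = statistics.stdev([(b - a) / math.sqrt(2) for a, b in pairs])
    sd2 = statistics.stdev([(b + a) / math.sqrt(2) for a, b in pairs])
    return sd1, sd2


# Hypothetical RR series alternating between 800 ms and 860 ms
rr = [800, 860, 800, 860, 800]
features = hrv_time_domain(rr)
sd1, sd2 = poincare_sd(rr)
```

For this alternating series every successive difference is 60 ms, so RMSSD is 60 ms and pNN50 is 100%, while SD2 is zero because each adjacent pair has the same sum.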

[0117] For extracting morphological features of the heartbeat waveform, each R-peak is used as a reference point to extract a fixed time window (e.g., 100ms before and 400ms after the R-peak), and amplitude and time alignment is performed to generate an average heartbeat waveform. Subsequently, a shallow one-dimensional convolutional neural network (1D CNN) is used to automatically learn the morphological features of the average waveform (e.g., P-wave, T-wave morphology, ST-segment trend, etc.) and output a morphological feature vector.

[0118] 3) Feature fusion and dimensionality reduction

[0119] The time-domain, frequency-domain, and nonlinear feature vectors of HRV are concatenated with the waveform morphology feature vector, and then fused and reduced in dimensionality through a fully connected layer to output the final ECG feature vector $F_{ecg}$.

[0120] 2.2.3 Extraction of SpO2 Oxygenation Trend Features

[0121] Blood oxygen saturation signals reflect the oxygen supply to tissues. This extractor aims to extract statistical and dynamic features reflecting oxygenation stability, declining trends, and potential risks from numerical sequences that may contain motion artifacts and missing values.

[0122] 1) Data cleaning and sequence reconstruction

[0123] The raw SpO2 signal is susceptible to motion artifacts, which produce sudden drops or gaps. First, outliers are identified and removed using the median absolute deviation (MAD) method. Missing or removed data points are then filled by linear interpolation. Finally, a 5-second moving average filter smooths the sequence to suppress high-frequency noise and obtain a high-quality sequence.
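The three cleaning steps can be sketched as follows. The cutoff of 3 scaled MADs, the 1.4826 consistency factor, the edge handling, and the function names are assumptions for illustration; the moving-average window is given in samples, so a 5-second window corresponds to 5 times the sampling rate.

```python
import statistics


def mad_outlier_mask(values, k=3.0):
    """True where |x - median| exceeds k scaled MADs (1.4826 * MAD)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad
    if scale == 0:
        return [False] * len(values)  # degenerate case: flag nothing
    return [abs(v - med) > k * scale for v in values]


def interpolate_gaps(values, mask):
    """Linearly interpolate masked points; edge gaps copy the nearest kept value."""
    good = [i for i, bad in enumerate(mask) if not bad]
    out = list(values)
    if not good:
        return out
    for i, bad in enumerate(mask):
        if not bad:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out


def moving_average(values, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical SpO2 readings with one motion-artifact drop
raw = [97, 98, 99, 60, 98, 97, 99, 98]
mask = mad_outlier_mask(raw)
filled = interpolate_gaps(raw, mask)
smoothed = moving_average(filled, window=5)
```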

[0124] 2) Multi-angle trend feature extraction

[0125] First, global statistical features are calculated. These comprise descriptive statistics of the series covering central tendency and dispersion (mean, standard deviation, and interquartile range (IQR)) and a quantification of hypoxic burden: the percentage of time that blood oxygen saturation is below clinically critical thresholds (such as 92% and 88%) and the cumulative hypoxic area. Together, these two measures reflect the depth and duration of hypoxia.
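The hypoxic burden measures can be sketched as below. The sketch assumes uniformly sampled data and defines the cumulative hypoxic area as the sum of the depth below the threshold multiplied by the sample period, in units of %·s; this definition, the function name, and the readings are assumptions for illustration.

```python
def hypoxic_burden(spo2, fs, threshold):
    """Return (% of time below threshold, cumulative area below it in %*s)."""
    dt = 1.0 / fs
    depths = [threshold - v for v in spo2 if v < threshold]
    pct_time = 100.0 * len(depths) / len(spo2)
    area = sum(depths) * dt
    return pct_time, area


# Hypothetical readings sampled once per second
spo2 = [95, 90, 86, 93]
burden_92 = hypoxic_burden(spo2, fs=1.0, threshold=92)
burden_88 = hypoxic_burden(spo2, fs=1.0, threshold=88)
```

Two readings fall below 92% (by 2 and 6 points), giving 50% of the time and an area of 8 %·s; only one falls below 88%.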

[0126] Second, sliding trend features are extracted. For trend slope analysis, linear regression is performed on the sequence within a 30-second sliding window to obtain the local slope, and the mean and variance of the slopes over all windows are calculated to assess the overall trend and volatility of oxygenation changes. For window statistics, the mean and standard deviation within a 10-second sliding window are calculated to form two new time series; the mean, variance, and other statistics of these two series then characterize the short-term fluctuation pattern of oxygenation status.
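The trend slope analysis can be sketched with an ordinary least-squares slope per window. The 1-second step between windows is an assumption (the text specifies only the 30-second window length), and the function names and the synthetic declining series are hypothetical.

```python
import statistics


def window_slope(y, dt):
    """Least-squares slope of y against time for equally spaced samples."""
    n = len(y)
    t_mean = (n - 1) * dt / 2.0
    y_mean = sum(y) / n
    num = sum((i * dt - t_mean) * (v - y_mean) for i, v in enumerate(y))
    den = sum((i * dt - t_mean) ** 2 for i in range(n))
    return num / den


def sliding_slopes(series, fs, window_s=30.0, step_s=1.0):
    """Local slopes (units per second) over overlapping windows."""
    win = int(window_s * fs)
    step = int(step_s * fs)
    return [
        window_slope(series[s:s + win], 1.0 / fs)
        for s in range(0, len(series) - win + 1, step)
    ]


# Synthetic series falling by 0.1 % per second, sampled at 1 Hz
series = [98.0 - 0.1 * t for t in range(60)]
slopes = sliding_slopes(series, fs=1.0)
slope_mean = statistics.mean(slopes)
slope_var = statistics.pvariance(slopes)
```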

[0127] Third, temporal dynamic features are extracted: the smoothed sequence is fed into a bidirectional gated recurrent unit (Bi-GRU) network. The recurrent structure of the GRU enables it to model long-range dependencies in time series. The bidirectional hidden states at the last time step are concatenated to form a feature vector that encodes the dynamic context of the entire sequence.

[0128] 3) Feature fusion and dimensionality reduction

[0129] The global statistical features, sliding trend features, and GRU dynamic features are concatenated to form a comprehensive feature vector, which is then reduced in dimensionality by a fully connected layer to output the final SpO2 feature vector $F_{spo2}$.

[0130] The three modality-specific feature extractors described above are designed with targeted processing flows and feature sets tailored to the physiological characteristics of their respective signals. They transform the raw signals into high-quality feature vectors, providing reliable information input for hierarchical fusion and state assessment. All feature designs prioritize interpretability and clinical relevance, ensuring the rationality of the model's decision-making basis.

[0131] 2.3 Conditional Hierarchical Fusion

[0132] The hierarchical conditional fusion model aims to simulate the clinical decision-making logic in emergency medicine of "stabilizing vital signs first, then assessing neurological function" [20, 21]. Through structured encoder design and a conditional attention mechanism, the multimodal physiological signals are fused and the state is inferred progressively, from the surface level of vital signs to the higher level of neurological function. This corresponds to steps 3 (survival sign context generation) and 4 (conditional EEG fusion).

[0133] The model's input consists of the outputs of the three modality-specific feature extractors: the EEG feature vector $F_{eeg}$, the ECG feature vector $F_{ecg}$, and the SpO2 feature vector $F_{spo2}$. The overall processing flow is divided into two core stages, as shown in Figure 3. Rescue medicine prioritizes circulation and respiration (the life support systems) and only then assesses the degree of central nervous system injury; the model hard-codes this domain knowledge into its network structure. The first stage (survival sign fusion) fuses the ECG and SpO2 features, which reflect cardiopulmonary function, to generate a comprehensive, low-dimensional "life support status" context vector. The second stage (conditional fusion of neural state) uses this context vector as a global condition to guide the in-depth analysis and fusion of the EEG features, which reflect brain function. The model's "attention" to brain electrical activity is thereby adjusted dynamically according to the life support status.

[0134] 2.3.1 Vital signs fusion (corresponding to step 3)

[0135] As shown in Figure 4, the encoder fuses $F_{ecg}$ and $F_{spo2}$ to generate a context vector that represents the underlying vital status. This consists of two main steps.

[0136] 1) Modality-specific projection and serialization

[0137] The features, which have different dimensions, are projected onto a unified model dimension $d_{model}$:

[0138] $$E_{ecg} = W_{ecg} F_{ecg} + b_{ecg}, \quad E_{spo2} = W_{spo2} F_{spo2} + b_{spo2} \tag{5}$$

[0139] The projected features are treated as a sequence of length 2, $[E_{ecg}; E_{spo2}]$, and a learnable positional encoding $P$ is added to inject sequence order information: $X^{(0)} = [E_{ecg}; E_{spo2}] + P$.

[0140] 2) Transformer encoding

[0141] The sequence $X^{(0)}$ is input into a module consisting of 4 stacked layers of standard Transformer encoders, as shown in the figure. For the $l$-th layer ($l = 1, \dots, 4$), the forward propagation process is as follows:

[0142] a) Multi-head self-attention mechanism

[0143] Each attention head $\mathrm{head}_i$ is calculated as:

[0144] $$\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{(X^{(l-1)} W_i^{Q})(X^{(l-1)} W_i^{K})^{\top}}{\sqrt{d_k}}\right) X^{(l-1)} W_i^{V} \tag{6}$$

[0145] $$\mathrm{MHA}(X^{(l-1)}) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O} \tag{7}$$

[0146] $X^{(l-1)}$: the input of the $l$-th layer encoder, which is also the output of the $(l-1)$-th layer. $h$: the total number of attention heads. $d_k$: the dimension of the key and query vectors for each attention head.

[0147] $W_i^{Q}$, $W_i^{K}$: the learnable weight matrices with which the $i$-th head generates the queries and keys. $W_i^{V}$: the learnable weight matrix with which the $i$-th head generates the values, usually with $d_v = d_k = d_{model}/h$. $W^{O}$: a learnable weight matrix used to project the concatenated outputs of the attention heads back to the model dimension.

[0148] b) Residual connectivity and layer normalization

[0149] $$\tilde{X}^{(l)} = \mathrm{LayerNorm}\!\left(X^{(l-1)} + \mathrm{MHA}(X^{(l-1)})\right) \tag{8}$$

[0150] $\mathrm{LayerNorm}(\cdot)$: the layer normalization operation, which standardizes the input tensor along its last (feature) dimension to a mean of 0 and a variance of 1 and then applies learnable scaling and translation parameters. $\tilde{X}^{(l)}$: the intermediate representation of layer $l$ after multi-head self-attention (MHA), residual connection, and layer normalization, computed from the layer input $X^{(l-1)}$.
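The residual connection followed by layer normalization can be sketched for plain Python lists. The epsilon value of 1e-5 and the default scale of 1 and shift of 0 are common conventions assumed here, and the function names are hypothetical.

```python
import math


def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance, then scale and shift."""
    d = len(x)
    gamma = gamma or [1.0] * d
    beta = beta or [0.0] * d
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d
    return [g * (v - mean) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]


def add_and_norm(x, sublayer_out):
    """LayerNorm(x + Sublayer(x)), applied row by row to a sequence of vectors."""
    return [
        layer_norm([a + b for a, b in zip(row, sub)])
        for row, sub in zip(x, sublayer_out)
    ]


normed = layer_norm([1.0, 2.0, 3.0])
variance = sum(v * v for v in normed) / len(normed)
block_out = add_and_norm([[1.0, 2.0]], [[0.0, 0.0]])
```

The same `add_and_norm` step is reused after the feedforward network, with the feedforward output in place of the attention output.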

[0151] c) Feedforward network and renormalization

[0152] A feedforward network consists of two linear transformations and one nonlinear activation function:

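[0153] $$\mathrm{FFN}(\tilde{X}^{(l)}) = \mathrm{ReLU}\!\left(\tilde{X}^{(l)} W_1^{(l)} + b_1^{(l)}\right) W_2^{(l)} + b_2^{(l)} \tag{9}$$

[0154] Then, residual connections and layer normalization are performed again:

[0155] $$X^{(l)} = \mathrm{LayerNorm}\!\left(\tilde{X}^{(l)} + \mathrm{FFN}(\tilde{X}^{(l)})\right) \tag{10}$$

[0156] $W_1^{(l)}$, $b_1^{(l)}$: the weights and bias of the first linear transformation in the $l$-th layer feedforward network, which is applied to the intermediate representation $\tilde{X}^{(l)}$ and usually has a hidden width of $d_{ff} = 4 d_{model}$. $W_2^{(l)}$, $b_2^{(l)}$: the weights and bias of the second linear transformation in the $l$-th layer feedforward network. $\mathrm{ReLU}$: the rectified linear unit activation function, defined as $\mathrm{ReLU}(x) = \max(0, x)$. $X^{(l)}$: the final output of the Transformer encoder at layer $l$, which is used as the input of layer $l+1$.

[0157] These formulas and definitions together constitute a complete mathematical description of the multi-layer Transformer encoding in the survival sign fusion encoder. The process starts from the projected input sequence and passes information layer by layer, eventually producing a representation rich in cross-modal interaction information, which lays the foundation for the subsequent generation of the context vector.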

[0158] 2.3.2 Conditional Fusion of Neural States (corresponding to step 4)

[0159] This stage is the core innovation of this model: with the life support context vector as the condition, the EEG features are fused in a deep and focused manner.

[0160] 1) EEG Feature Projection and Conditioning Preparation

[0161] The EEG features are projected into the model space. At the same time, the context vector is mapped to the same dimension through a linear layer and replicated $T$ times, where $T$ is the length of the EEG feature sequence (if the EEG feature is a single vector, then $T = 1$), forming the condition sequence $C$.

[0162] 2) Conditional Transformer coding layer

[0163] This is key to achieving "conditional fusion". This invention designs a conditional multi-head self-attention module to replace the standard MHA. Let $H^{(l-1)}$ denote the input of layer $l$ (initially $H^{(0)}$, the projected EEG features).

[0164] Conditional key-value generation: the keys $K$ and values $V$ are no longer generated from $H^{(l-1)}$ alone; instead, they are generated after the life support condition $C$ has been fused with the current EEG representation:

[0165] $$K = (H^{(l-1)} + C)\, W^{K}, \quad V = (H^{(l-1)} + C)\, W^{V} \tag{11}$$

[0166] Query generation: the query $Q = H^{(l-1)} W^{Q}$ is still generated from the original EEG representation $H^{(l-1)}$, which ensures that the core of the attention mechanism remains the EEG signal itself.

[0167] Conditional attention computation:

[0168] $$\mathrm{CondAttn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \tag{12}$$

[0169] When the model calculates the attention between elements within the EEG representation, the "dictionary" it uses to match and aggregate information (the keys $K$ and values $V$) has been rewritten by the condition sequence $C$, that is, by the life support status. For example, when $C$ indicates severe hypoxemia, $K$ and $V$ tend to emphasize EEG pattern features associated with hypoxic brain injury.
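The conditional attention step can be sketched for a single head with plain lists. The sketch assumes that the condition is added to the EEG representation before the key and value projections, as described in the claims, while the query uses the EEG representation alone; the function names, identity weight matrices, and toy vectors are hypothetical.

```python
import math


def matmul(a, b):
    """Matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def conditional_attention(h, c, w_q, w_k, w_v):
    """Single-head attention: queries from h, keys and values from h + c."""
    fused = [[a + b for a, b in zip(hr, cr)] for hr, cr in zip(h, c)]
    q, k, v = matmul(h, w_q), matmul(fused, w_k), matmul(fused, w_v)
    d_k = len(k[0])
    scores = [
        [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        for qi in q
    ]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


identity = [[1.0, 0.0], [0.0, 1.0]]
context = [0.5, -0.5]                # hypothetical life support context vector
h = [[1.0, 0.0], [0.0, 1.0]]         # two EEG tokens
c = [context for _ in h]             # condition replicated to sequence length
out, weights = conditional_attention(h, c, identity, identity, identity)

# Single-token case: the attention weight is 1, so the output is (h + c) W_v
single_out, _ = conditional_attention([[1.0, 2.0]], [[0.5, 0.5]], identity, identity, identity)
```

With a single token the output reduces to $(h + c) W^{V}$, which makes the contribution of the condition easy to check by hand.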

[0170] The subsequent residual connections, layer normalization, and feedforward network steps are the same as in the standard Transformer. After stacking $N$ such conditional coding layers, a high-level fusion representation $H^{(N)}$ is obtained.

[0171] Feature fusion (corresponding to step 5, state discrimination and output): global average pooling is performed over the $T$ sequence positions of the high-level fusion representation $H^{(N)}$ to obtain the final fused feature vector, which contains multimodal information and hierarchical decision logic:

[0172] $$F_{fused} = \frac{1}{T} \sum_{t=1}^{T} H_t^{(N)} \tag{13}$$

[0173] Decision output (corresponding to step 5, state determination and output): at this point, the model has completed the entire inference chain from the original multimodal signals to high-level fused features. Finally, $F_{fused}$ is input into a simple multilayer perceptron classifier:

[0174] $$\hat{y} = \mathrm{Softmax}\!\left(\mathrm{MLP}(F_{fused})\right) \tag{14}$$

[0175] where $\hat{y}$ is the probability distribution over life states predicted by the model, corresponding to four categories: "awakenable", "mild coma", "severe coma", and "near death".
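The final mapping from classifier logits to a life state can be sketched as follows. The sketch assumes the MLP has already produced four logits in the category order listed above; the function name `classify` and the logit values are hypothetical.

```python
import math

STATES = ["awakenable", "mild coma", "severe coma", "near death"]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Map four logits to (most likely state, its probability, full distribution)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return STATES[best], probs[best], dict(zip(STATES, probs))


# Hypothetical logits from the MLP classifier
state, confidence, distribution = classify([2.0, 1.0, 0.0, -1.0])
```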

[0176] Results Analysis

[0177] After dataset preprocessing, the data is input into the model for training. The dataset contains 2500 valid samples, divided into training and test sets in an 8:2 ratio. During offline testing, the Adam optimizer is used with cross-entropy loss as the optimization objective, and the batch size is set to 16. The initial learning rate is set to 0.001, and a dynamic learning rate adjustment mechanism is introduced during training, gradually reducing the learning rate as training iterations progress to enhance the model's convergence stability and generalization ability in later training stages.

[0178] Training ran for a total of 100 epochs, and the final prediction results of the model were obtained through offline testing. Figure 5 and Figure 6 show the trends of accuracy and loss for the different modality combinations over the 100 epochs. As Figure 5 shows, the accuracy of every model rose rapidly in the early stages of training. The trimodal fusion model maintained the highest accuracy throughout training, gradually stabilizing after approximately 40 epochs and eventually settling at around 0.94. The bimodal model (ECG + SpO2) had the second highest accuracy, eventually converging to approximately 0.87. Among the unimodal models, ECG performed better than EEG and SpO2, achieving an accuracy of approximately 0.78, while the EEG and SpO2 unimodal models stabilized at around 0.74 and 0.72, respectively. Overall, the multimodal fusion models significantly outperformed the unimodal models in accuracy. Figure 6 presents the corresponding changes in loss. The loss of all models decreases rapidly in the early stages of training and then gradually stabilizes. The loss of the trimodal fusion model decreases the fastest, converging to a low level after approximately 35 epochs and remaining stable below 0.05 thereafter. The final loss of the bimodal model is approximately 0.10, significantly lower than that of each unimodal model. In contrast, the losses of the unimodal models converge more slowly and settle at higher values: the loss of the SpO2 unimodal model fluctuates the most, eventually stabilizing at around 0.33, while the losses of the EEG and ECG unimodal models are approximately 0.25 and 0.23, respectively.

[0179] Table 1 shows the classification accuracy of different combinations of physiological signal modalities for the four vital states (awakenable, mild coma, severe coma, and near death). The progressive comparison from single-modality to multimodal fusion validates the "hierarchical conditional fusion encoder" model. The average accuracy of a single modality (EEG, ECG, or blood oxygenation) ranges from 72.33% to 78.04%, and each has significant assessment blind spots. For example, using only EEG resulted in the lowest recognition rate for the "near death" state (68.18%), while using only ECG resulted in a weaker recognition rate for "mild coma" (72.56%). The data in the table show a positive correlation between model performance and the richness of modal information, highlighting the inherent limitations of single physiological signals in comprehensively assessing complex vital states.

[0180] When ECG and SpO2, the two core vital sign modalities, were integrated, the model's average accuracy rose to 88.00%, with more balanced performance across the four states (range 84.80%-92.60%). The fusion of surface survival sign information constructs a robust physiological state benchmark, improving the stability of the assessment. The full-modal model integrating EEG, ECG, and SpO2 achieved the highest accuracy, with an average accuracy of 94.90% and the highest values across all subcategories (93.56%-95.85%), empirically demonstrating that the two-stage fusion architecture can improve accuracy. The first-stage (ECG + SpO2) fusion encoder generates a high-quality "vitality maintenance strength" context vector; the second stage uses this vector as a condition to guide attention to and interpretation of the EEG (neurological function) features. This structured fusion mechanism simulates the clinical reasoning path of "first assessing vital signs, then evaluating the level of consciousness," allowing the model to fully utilize the complementarity of multi-source signals and ultimately achieve robust and accurate end-to-end assessment of vital states.

[0181] Table 2 presents the recall performance under different modal combinations, and the trend of recall is consistent with that of accuracy. Under single-modal conditions, the average recall (70.91%-78.55%) of each model is comparable to the accuracy level, but also shows differences in sensitivity to different states. The single blood oxygenation modality has the lowest recall for "severe coma" (66.86%), while the single electrocardiogram modality has the highest recall for "near death" (83.46%). This indicates that different physiological signals have different emphases in pathophysiological significance, and a single signal is difficult to comprehensively capture all state characteristics. The dual-modal fusion of electrocardiogram and blood oxygenation improves the average recall to 87.36%, and the recall of each category is improved to above 82.88%. This marks a substantial enhancement in the overall detection capability of the model, indicating that the fusion of vital signs effectively expands the perceptual boundary of the model and reduces the missed detection of single states (especially critical states).
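Per-class recall and related metrics can be computed from a confusion matrix as sketched below. The sketch assumes rows index the true class and columns the predicted class; the 4×4 matrix is hypothetical illustration data and is not taken from Table 1, Table 2, or Figure 7.

```python
def per_class_recall(confusion):
    """Recall of class i = confusion[i][i] / sum of row i (rows = true class)."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def per_class_precision(confusion):
    """Precision of class i = confusion[i][i] / sum of column i."""
    columns = list(zip(*confusion))
    return [col[i] / sum(col) for i, col in enumerate(columns)]


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / sum(map(sum, confusion))


# Hypothetical counts; order: awakenable, mild coma, severe coma, near death
confusion = [
    [90, 10, 0, 0],
    [5, 85, 10, 0],
    [0, 10, 80, 10],
    [0, 0, 5, 95],
]
recall = per_class_recall(confusion)
precision = per_class_precision(confusion)
accuracy = overall_accuracy(confusion)
```

Low recall for a critical state corresponds to missed detections, which is why recall is reported alongside accuracy for each state.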

[0182] Table 1. Accuracy under different modalities

[0183]

[0184] Table 2 Recall rates under different modalities

[0185]

[0186] The full-modal fusion model achieved an average recall of 94.30%, the highest among all configurations. Importantly, it demonstrated excellent and balanced recall across all four life states (91.86%–96.83%). This result indicates that the hierarchical conditional fusion strategy of this invention not only improves overall accuracy but also enhances the model's generalization ability and reliability across various categories.

[0187] The simultaneous achievement of high recall and high accuracy demonstrates that the fusion framework can generate fusion feature representations with strong discriminative power and good generalization, thus showing significant application potential in scenarios such as mine emergency rescue where both missed and false alarms are extremely sensitive.

[0188] To verify the performance advantages of the method of this invention more fully, the model of this invention was systematically compared, under the same experimental conditions, with mainstream architectures such as MuKAT, CNN-Transformer, and TB-CAT. The confusion matrices of the models are shown in Figure 7. Comparison on objective data shows the improvement in classification accuracy and generalization ability achieved by the method of this invention.

[0189] In the recognition tasks for the four states (categories (1) to (4)), the method of this invention shows a significant advantage in the key evaluation metrics. As can be seen from Figure 7(a), its recognition accuracy is consistently above 93% in all categories, demonstrating high robustness. In particular, in the easily confused categories (2) and (3), the method of this invention achieves accuracies of 93.56% and 94.40%, respectively, which is 8.96 and 6.90 percentage points higher than the second-best model, TB-CAT. Compared with MuKAT and CNN-Transformer, the method of this invention also has significant advantages, overcoming the bottleneck in which those models struggle to exceed 80% accuracy in the intermediate categories and highlighting its overall superiority in multimodal feature extraction and accurate classification.

[0190] Conclusion

[0191] 1) By explicitly defining and introducing the "awakenable" state in a mine rescue scenario, the assessment of vital signs is expanded into a refined, continuous-spectrum classification problem. Based on this, a dedicated feature system highly correlated with EEG, ECG, and blood oxygenation signals is tailored (e.g., nonlinear EEG features to distinguish subtle differences in consciousness, and blood oxygenation trend features to warn of deterioration), rather than applying existing general feature sets. Experimental results confirm that this native feature design tailored to a specific scientific problem is the fundamental reason for the model's high discrimination accuracy (average accuracy of 94.90%).

[0192] 2) Addressing the issue of completely missing spatial information in single-channel forehead EEG, the proposed "deep temporal information mining" framework compensates for the "breadth" of the spatial dimension with the "depth" of the temporal dimension through parallel fusion of time-frequency, nonlinear dynamics, and morphological analysis, achieving a refined characterization of the "orderliness" and complexity of conscious states. Simultaneously, targeted artifact detection and repair mechanisms in signal preprocessing enhance the algorithm's robustness in post-disaster noise environments.

[0193] 3) The design of the feature extractor and the subsequent hierarchical conditional fusion model form an organic whole: feature extraction serves the fusion logic of "survival first, then neural function," while the fusion model relies on the complementary and interpretable physiological information provided by the features. This closed-loop system has demonstrated high accuracy and robustness on standard datasets, providing a directly implementable core algorithm prototype for the intelligent upgrading of mine emergency rescue equipment.

Claims

1. A life status assessment system based on multimodal vital sign information fusion in mine emergency rescue, characterized in that it includes a multimodal physiological signal acquisition module, a vital state feature extraction module, a hierarchical conditional fusion module, and a decision and output module; the multimodal physiological signal acquisition module is used to simultaneously acquire three physiological signals, namely electroencephalogram (EEG), electrocardiogram (ECG), and blood oxygen saturation (SpO2), with an embedded main control unit performing preliminary filtering, resampling, and data encapsulation to provide high-quality raw signal input for subsequent feature extraction; the vital state feature extraction module applies dedicated feature extraction methods to EEG, ECG, and SpO2 and extracts discriminative feature vectors that are highly correlated with the level of consciousness and life maintenance ability from multiple dimensions such as time-frequency, nonlinearity, waveform morphology, autonomic nervous function, and oxygenation trend; the hierarchical conditional fusion module is used to encode the clinical decision-making logic of "assessing vital signs first and then assessing consciousness" into a network structure: it first fuses ECG and SpO2 to generate a "life maintenance state context", and then uses this context as a condition to guide the attentional fusion of EEG features to obtain high-dimensional features and realize the integration of multimodal information driven by physiological logic.

2. The life status assessment system based on multimodal vital sign information fusion in mine emergency rescue according to claim 1, characterized in that the decision and output module is used to input the fused high-dimensional features into the classifier, output the probability distribution of the four life states of the trapped personnel, generate a visual assessment result, and transmit it to the wellhead command center to support rescue decision-making.

3. The life status assessment system based on multimodal vital sign information fusion in mine emergency rescue according to claim 1, characterized in that the multimodal physiological signal acquisition module includes an EEG, ECG, and blood oxygen sensor array; an embedded core controller; and a life status assessment model; the EEG, ECG, and blood oxygen sensor array is responsible for picking up raw physiological signals and includes a frontal single-channel EEG electrode, a BMD101 ECG module, and an MKS-SPO2-R blood oxygen module; the embedded core controller (CM3588) performs synchronous acquisition of multiple signals, real-time filtering, resampling, data packaging, and edge inference acceleration; the life status assessment model is deployed within the embedded core controller, receives preprocessed signals, performs feature extraction and fusion inference, and outputs the status classification results.

4. The life status assessment system based on multimodal vital sign information fusion in mine emergency rescue according to claim 1, characterized in that, in the multimodal physiological signal acquisition module, the EEG feature extractor adopts a three-path parallel architecture of "time-frequency + nonlinear + morphology" and outputs the feature vector $F_{eeg}$; the ECG feature extractor outputs $F_{ecg}$ through a dual-branch approach combining HRV analysis and waveform morphology; and the SpO2 feature extractor outputs $F_{spo2}$ based on statistical characteristics, sliding trends, and Bi-GRU time series modeling.

5. The life status assessment system based on multimodal vital sign information fusion in mine emergency rescue according to claim 1, characterized in that the decision and output module includes a classifier, a life state probability distribution, and a four-class classification task; the classifier consists of a multilayer perceptron that takes the fused features as input and outputs four-dimensional logits; the Softmax layer transforms the logits into a probability distribution corresponding to "awakenable, mild coma, severe coma, near death"; the four-class classification task packages the probability distribution, confidence level, recommended rescue priority, and related information into structured data for display by the host computer or for use by the command system; the classifier completes the mapping from features to decisions, Softmax achieves probability normalization, and the output of the four-class classification task completes human-computer information interaction.

6. A method for operating a life status assessment system based on multimodal vital sign information fusion in mine emergency rescue according to any one of claims 1-5, comprising the following steps: Step 1: simultaneously collect the frontal EEG, single-lead ECG, and fingertip blood oxygen saturation signals of the trapped personnel using the EEG, ECG, and SpO2 sensor array, with the sampling rate uniformly set to 125 Hz and real-time filtering and data encapsulation completed by the CM3588 main control unit; Step 2: after wavelet artifact removal and AR model restoration, the EEG signal is input into the three-path parallel feature extractor, which outputs $F_{eeg}$; the R-peaks of the ECG signal are detected with the Pan-Tompkins algorithm, and HRV time-frequency domain features and 1D-CNN waveform features are extracted to output $F_{ecg}$; after MAD anomaly removal, linear interpolation, and sliding smoothing, statistical features, trend features, and Bi-GRU dynamic features are extracted from the SpO2 signal to output $F_{spo2}$; Step 3: $F_{ecg}$ and $F_{spo2}$ are projected to a unified dimension and input into a 4-layer Transformer encoder, which, after multi-head self-attention interaction, generates a life-sustaining state context vector; Step 4: the context vector is replicated and expanded into the conditional sequence $C$, which is input together with the EEG features into the conditional Transformer coding layer; in this layer, the key $K$ and value $V$ are generated by adding $C$ and the EEG representation, and the query $Q$ is still generated from the EEG representation itself, realizing dynamic modulation of the EEG attention weights by the vital sign status; Step 5: global average pooling is performed on the conditionally encoded EEG representation to obtain the fused features, which are input into an MLP classifier and a Softmax layer to output the probability distribution of the four life states. The system packages and outputs the highest-probability class, its confidence level, and the recommended rescue priority for the command center to use in decision-making.