A multi-modal perception-based emotional interactive intelligent nursing system and method
By using a multimodal sensor array and deep learning model, combined with edge computing and cloud analytics, high-resolution body pressure distribution recognition and personalized health assessment of the nursing bed system have been achieved. This solves the problems of single monitoring dimensions and response delay in existing systems, provides emotional interaction and psychological care, and improves the intelligence level and response speed of the nursing bed.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- UNIV OF ELECTRONICS SCI & TECH OF CHINA
- Filing Date
- 2026-03-23
- Publication Date
- 2026-06-19
Figure CN121900629B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the interdisciplinary field of intelligent medical care and smart elderly care, and specifically relates to an emotional interactive intelligent nursing system and method based on multimodal perception.
Background Technology
[0002] In recent years, with the accelerating aging of the population, the demand for health monitoring and emotional care for elderly people living alone or bedridden has been increasing. Traditional elderly care systems mainly focus on monitoring single physiological parameters, such as heart rate, blood pressure, body temperature, or environmental parameters, and upload and display the data through mobile terminals. Although these systems can achieve a certain degree of remote viewing of health data, their monitoring dimensions are limited, data processing capabilities are weak, and response delays are high, making it difficult to meet the 24/7, continuous, and personalized care needs of the elderly. Currently, most bed-chair monitoring systems on the market mainly focus on bed adjustment and weight-sensing alarm devices. Their technical solutions attempt to monitor the physiological state and sleep posture of the elderly by embedding sensor arrays in the bed or in wearable devices.
[0003] Existing bed-chair monitoring systems are easy to install and relatively inexpensive, and can prevent accidents such as nighttime falls in the elderly to some extent, representing a typical technological approach for current monitoring products. However, these systems still have significant shortcomings: their monitoring dimensions are limited, relying solely on single-point or single-area pressure changes to determine bed exit, lacking high-resolution body pressure distribution recognition; their analysis logic is based on fixed thresholds, unable to combine multimodal physiological signals for intelligent prediction; the systems do not integrate emotion recognition and voice care functions, failing to provide psychological comfort and interactive feedback; communication range is limited to local alarms, preventing real-time remote monitoring by family members; and the lack of edge computing and cloud collaboration mechanisms results in limited data processing capabilities and high response latency. Overall, existing solutions are insufficient in terms of monitoring comprehensiveness, analytical intelligence, and emotional interactivity. Therefore, an intelligent nursing system integrating multimodal perception, AI health analysis, and voice emotion recognition is needed to achieve a more comprehensive and personalized health management and care experience for the elderly.
Summary of the Invention
[0004] The purpose of this invention is to provide a multimodal nursing bed system based on emotional interaction to solve the problems of single function, incomplete health monitoring, and insufficient emotional interaction capabilities in existing nursing beds, and to achieve multimodal integration of real-time physiological data collection, intelligent health assessment, proactive risk warning and emotional care.
[0005] This invention provides an emotionally interactive intelligent nursing system based on multimodal perception. As shown in Figure 1, the system includes: a perception layer, a control layer, a network layer, a cloud analysis layer, and a user interaction layer.
[0006] The layers are interconnected through a data communication module. The perception layer and the control layer use I²C, UART, and ADC interfaces for data transmission; the control layer and the cloud analysis layer use a Wi-Fi module and MQTT communication protocol for data uploading and command issuance; and the cloud analysis layer and the user interaction layer use WebSocket for real-time information synchronization.
[0007] The sensing layer includes a matrix pressure sensor array, a heart rate sensor, a blood oxygen sensor, a urine moisture sensor, and a temperature and humidity sensor. The matrix pressure sensor array is installed inside the nursing mattress and uses a flexible electrode cross structure to form 32 independent pressure sensing units, which can collect high-resolution body pressure distribution information. The sensor array is connected to the sub-control device via a multiplexing circuit and an analog-to-digital converter module to achieve parallel acquisition of multi-point pressure. The heart rate and blood oxygen sensors are installed in the area of contact with the monitored person's body, and the urine moisture and temperature and humidity sensors are installed in the bed and environmental detection unit. Data from all sensors are aggregated to the sub-control device for preliminary filtering and feature extraction.
[0008] The control layer includes a main control device and a sub-control device. The main control device uses an ESP32 microcontroller and is responsible for voice recognition, motor control, voice broadcasting, and communication with the cloud. The sub-control device (ESP32 sub-controller) is responsible for the acquisition and preprocessing of various sensor data.
[0009] The main control device is responsible for voice recognition, motor control, voice broadcasting, and communication with the cloud; it includes an AI voice module, a voice broadcasting module, a motor control module, and a communication module; the AI voice module recognizes the elderly's voice commands through a voice recognition chip and analyzes voice features in conjunction with an emotion judgment algorithm; the voice broadcasting module is used to output soothing voices or health tips; the motor control module is used to drive bed adjustment or posture changes.
[0010] The sub-control device is responsible for sensor data acquisition and edge analysis; it includes a data acquisition module and an edge computing module. The data acquisition module is connected to the sensors through I²C, UART, and ADC interfaces; the edge computing module executes filtering, anomaly detection, and posture recognition algorithms to achieve local decision-making.
[0011] The main control device and the sub-control device of the control layer are connected through a serial port or local area network to form a data sharing and redundancy fault tolerance mechanism.
[0012] The network layer includes a Wi-Fi communication module and an MQTT Broker server. The control layer connects to the MQTT Broker in the network layer via the Wi-Fi module to enable data publishing and subscription. The MQTT Broker, as a message middleware, is responsible for managing data transmission between devices and the cloud, providing reliable message queues, topic management, and state maintenance mechanisms. This layer structure ensures low latency, high reliability, and multi-device synchronous communication capabilities, providing support for subsequent cloud analysis and user interaction.
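The publish/subscribe routing described above relies on MQTT topic filters. The following is a minimal sketch of how a subscription filter is matched against a published topic using the standard MQTT wildcards (`+` for one level, `#` for all remaining levels). The topic names are hypothetical and are not given in this document; a real deployment would leave this matching to the broker.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a subscription filter.

    '+' matches exactly one topic level; '#' matches the parent level and
    any number of child levels and is only valid as the last filter level.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: everything from here on matches.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


# Hypothetical topic layout: nursing/<bed id>/<data stream>
print(topic_matches("nursing/+/pressure", "nursing/bed01/pressure"))  # True
print(topic_matches("nursing/bed01/#", "nursing/bed01/vitals/hr"))    # True
print(topic_matches("nursing/+/pressure", "nursing/bed01/vitals"))    # False
```

With a layout like this, the cloud analysis layer could subscribe once to `nursing/+/pressure` to receive pressure matrices from every bed, while a family terminal subscribes only to the branch of one bed.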
[0013] The cloud-based analysis layer includes a cloud server, a multimodal data preprocessing module, and four core algorithm modules. The cloud server is built on the Python Flask framework and is responsible for receiving multimodal data from the control layer: pressure matrix, heart rate, blood oxygen, urine moisture, and environmental parameters.
[0014] The multimodal data preprocessing module is responsible for data reception, cleaning, time-series alignment, and storage management.
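Because the sensors sample at different rates, time-series alignment is needed before the streams can be fused. The sketch below shows one possible approach, assumed here for illustration only: each pressure frame is paired with the most recent heart rate sample at or before its timestamp (sample-and-hold). The function name and the example data are hypothetical.

```python
from bisect import bisect_right


def align_latest(frame_times, sample_times, sample_values):
    """Pair each frame timestamp with the latest sample at or before it.

    sample_times must be sorted in ascending order. Frames that precede
    the first sample are paired with None.
    """
    aligned = []
    for t in frame_times:
        idx = bisect_right(sample_times, t) - 1
        aligned.append(sample_values[idx] if idx >= 0 else None)
    return aligned


# Pressure frames (fast stream) and heart rate samples (slow stream), in seconds.
frame_times = [0.0, 0.1, 0.9, 1.0, 1.5, 2.05]
hr_times = [0.0, 1.0, 2.0]
hr_values = [72, 74, 73]
print(align_latest(frame_times, hr_times, hr_values))
```

Interpolating between neighbouring samples would be an alternative; sample-and-hold is used here only because it never looks ahead in time.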
[0015] The four core algorithm modules include: a multimodal spatiotemporal feature fusion prediction module, a pressure ulcer risk quantification assessment module, an adaptive dynamic threshold adjustment module, and a personalized health module based on transfer learning.
[0016] Furthermore, the multimodal spatiotemporal feature fusion prediction module adopts a CNN-Transformer hybrid architecture. It extracts the spatial features of pressure distribution through a convolutional neural network (CNN), models the time series data in combination with a Transformer, and achieves adaptive feature fusion through a gated multimodal fusion layer, thereby realizing intelligent assessment of sleep quality, pressure ulcer risk and abnormal body position in the elderly.
[0017] 1) Convolutional Neural Network (CNN) is responsible for spatial feature extraction: After reconstructing the 32×1 pressure matrix into an 8×4×1 tensor, it is processed by three layers of convolution, each layer is configured with ReLU activation and batch normalization, and dimensionality is reduced by 2×2 max pooling to output a 128-dimensional spatial feature vector, capturing local pressure concentration patterns and body position distribution features.
[0018] 2) The Transformer is responsible for temporal feature modeling: the input is the concatenated spatial feature vector, heart rate and blood oxygen time series; an 8-head multi-head self-attention mechanism is used to capture the dependencies at different time scales, and sine-cosine position encoding is combined to preserve temporal information; after processing by two layers of fully connected feedforward network, the output is spatiotemporal fusion features, which model temporal patterns such as body position flipping frequency and pressure peak duration;
[0019] 3) The fusion output layer concatenates spatial features and temporal features, uses a three-layer MLP classifier, and configures Dropout with a value of 0.3; it sets up dual output heads: the pressure ulcer risk score uses Sigmoid activation to output the risk probability of [0,1], and the sleep quality evaluation uses Softmax activation to achieve multi-classification.
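Three building blocks named in items 1) to 3) can be illustrated without a deep learning framework: reshaping the 32 readings into an 8×4 grid, 2×2 max pooling, sine-cosine position encoding, and the Sigmoid and Softmax output activations. This is a sketch under stated assumptions, not the trained model: the row-major sensor ordering and the base of 10000 in the position encoding (the common Transformer convention) are assumed, and no learned weights are involved.

```python
import math


def reshape_8x4(flat):
    """Reshape 32 pressure readings (row-major order assumed) into 8 rows of 4."""
    if len(flat) != 32:
        raise ValueError("expected 32 pressure readings")
    return [flat[r * 4:(r + 1) * 4] for r in range(8)]


def max_pool_2x2(grid):
    """2x2 max pooling with stride 2; both grid dimensions must be even."""
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]), 2)]
        for r in range(0, len(grid), 2)
    ]


def positional_encoding(seq_len, d_model):
    """Sine-cosine encoding: even dimensions use sin, odd dimensions use cos."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def sigmoid(x):
    """Maps a logit to a probability in (0, 1), as used by the risk head."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Numerically stable softmax, as used by the sleep quality head."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


grid = reshape_8x4(list(range(32)))
pooled = max_pool_2x2(grid)
print(len(pooled), len(pooled[0]))  # 4 2
```

One design point follows from the arithmetic: an 8×4 grid can be halved by 2×2 pooling at most twice along its short side (4 → 2 → 1), so a three-layer convolution stack cannot pool after every layer without padding.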
[0020] Furthermore, the pressure ulcer risk quantification assessment module is based on a bidirectional GRU neural network and attention mechanism, combined with the automated enhancement of the Braden scoring system, to achieve quantitative scoring of pressure ulcer risk and location of high-risk areas, providing 24-hour and 48-hour early warning capabilities.
[0021] 1) The pressure ulcer risk quantification assessment module uses 18 comprehensive features as input: six pressure matrix features (maximum pressure, average pressure, pressure standard deviation, high-pressure zone percentage, pressure entropy, and pressure gradient); four physiological parameters (heart rate, blood oxygen, body temperature, and respiratory rate); four temporal features (turning frequency, pressure duration, peak change rate, and nocturnal activity); and four items of individual information (age, BMI, activity level, and history of pressure ulcers).
[0022] 2) The model structure is based on the improved ResNet18: The input layer maps 18-dimensional features to 64-dimensional features, and the feature dimensions are increased by four residual blocks: 64→128→256→512. After global average pooling for dimensionality reduction, the [0,1] risk score is output through Sigmoid activation.
[0023] 3) Establish a five-level risk classification system: 0.0-0.2 is extremely low risk and no intervention is required; 0.2-0.4 is low risk and routine observation is required; 0.4-0.6 is medium risk and the frequency of turning over should be increased; 0.6-0.8 is high risk and the patient should be turned over once every 2 hours; 0.8-1.0 is extremely high risk and immediate intervention should be provided in conjunction with a pressure-reducing mat.
[0024] 4) Generate an 8×4 resolution risk heat map, and backpropagate the risk score to the pressure matrix to intuitively mark high-risk areas such as shoulders, hips, and heels, providing nursing staff with precise turning guidance.
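The six pressure matrix features and the five-level classification listed above can be sketched as follows. The document does not define the individual features, so the definitions here are assumptions: the high-pressure zone is taken as cells above a normalized value of 0.7, entropy is the Shannon entropy of the pressure distribution, and gradient is the mean absolute difference between adjacent cells. Treating each lower bound as inclusive (a score of 0.2 counts as low risk) is also an assumption, since the stated ranges share their endpoints.

```python
import math
from statistics import mean, pstdev

# (upper bound, level, recommended action) taken from the five-level scheme.
RISK_LEVELS = [
    (0.2, "extremely low", "no intervention required"),
    (0.4, "low", "routine observation"),
    (0.6, "medium", "increase turning frequency"),
    (0.8, "high", "turn once every 2 hours"),
    (1.0, "extremely high", "immediate intervention with a pressure-reducing mat"),
]


def risk_level(score):
    """Map a risk score in [0, 1] to (level, action)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be within [0, 1]")
    for upper, level, action in RISK_LEVELS:
        if score < upper:
            return level, action
    return RISK_LEVELS[-1][1:]  # score == 1.0


def pressure_features(matrix, high_threshold=0.7):
    """Compute six summary features of a normalized pressure matrix."""
    cells = [v for row in matrix for v in row]
    total = sum(cells)
    entropy = 0.0
    if total > 0:
        # Shannon entropy of the pressure distribution; empty cells add nothing.
        entropy = -sum((v / total) * math.log2(v / total) for v in cells if v > 0)
    # Absolute differences between horizontal and vertical neighbours.
    diffs = [abs(row[c + 1] - row[c]) for row in matrix for c in range(len(row) - 1)]
    diffs += [abs(matrix[r + 1][c] - matrix[r][c])
              for r in range(len(matrix) - 1) for c in range(len(matrix[0]))]
    return {
        "max": max(cells),
        "mean": mean(cells),
        "std": pstdev(cells),
        "high_zone_pct": 100.0 * sum(v > high_threshold for v in cells) / len(cells),
        "entropy": entropy,
        "gradient": mean(diffs),
    }


print(risk_level(0.65))
```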
[0025] Furthermore, the adaptive dynamic threshold adjustment module uses a deep Q-network algorithm combined with sliding window statistics to achieve adaptive adjustment of the detection threshold, adapting to different individual characteristics and environmental changes, and reducing false alarm rate and false negative rate.
[0026] 1) The 99th percentile is calculated using a sliding window as the baseline, and the anomaly detection range is determined by combining the interquartile range. The parameter n is adjusted by reinforcement learning for adaptive optimization.
[0027] 2) Threshold optimization is achieved using the DQN algorithm: the state space includes current physiological parameters, historical statistical information, and alarm history; the action space is for adjusting the value of n, ranging from 0.5 to 3.0, with a step size of 0.1; the reward function is designed as R = α × (1 - false alarm rate) + β × (1 - false negative rate) + γ × user satisfaction, where α = 0.4, β = 0.5, and γ = 0.1.
[0028] 3) Employ a multi-parameter collaborative threshold strategy: heart rate threshold is determined based on mean and standard deviation; blood oxygen threshold is calculated based on baseline value and interquartile range; pressure threshold is locally adaptively adjusted based on weight and body type.
[0029] 4) It has the ability to adapt to time series and detect data distribution drift; when three consecutive alarms are not confirmed, it is determined to be a new baseline and the statistical benchmark is recalculated to prevent the accumulation of false alarms.
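The statistical part of items 1) to 4) can be sketched as follows. The document does not give the exact threshold formula, so this sketch assumes the alarm threshold is the 99th percentile of the sliding window plus n times the interquartile range; the helper names are hypothetical. The DQN agent that selects n is not implemented here; only its action space, its reward function, and the baseline reset rule are shown.

```python
def percentile(sorted_vals, p):
    """Linearly interpolated percentile (p in [0, 100]) of a pre-sorted list."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


# Action space for n: 0.5 to 3.0 in steps of 0.1 (26 discrete actions).
N_ACTIONS = [round(0.5 + 0.1 * i, 1) for i in range(26)]


def upper_threshold(window, n):
    """Assumed form: 99th percentile baseline plus n times the IQR."""
    vals = sorted(window)
    iqr = percentile(vals, 75) - percentile(vals, 25)
    return percentile(vals, 99) + n * iqr


def reward(false_alarm_rate, false_negative_rate, satisfaction,
           alpha=0.4, beta=0.5, gamma=0.1):
    """R = alpha*(1 - FAR) + beta*(1 - FNR) + gamma*satisfaction."""
    return (alpha * (1.0 - false_alarm_rate)
            + beta * (1.0 - false_negative_rate)
            + gamma * satisfaction)


class BaselineTracker:
    """Signals a baseline recalculation after consecutive unconfirmed alarms."""

    def __init__(self, reset_after=3):
        self.reset_after = reset_after
        self.unconfirmed = 0

    def record_alarm(self, confirmed):
        """Return True when the statistical baseline should be recomputed."""
        self.unconfirmed = 0 if confirmed else self.unconfirmed + 1
        if self.unconfirmed >= self.reset_after:
            self.unconfirmed = 0
            return True
        return False


window = list(range(101))  # stand-in for a sliding window of readings
print(upper_threshold(window, 1.5))
```

Because β is larger than α, a missed event costs more reward than a false alarm, which biases the learned n toward sensitivity.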
[0030] Furthermore, the personalized health module based on transfer learning utilizes publicly available medical datasets for model pre-training, combines online learning mechanisms to achieve personalized model fine-tuning, and adopts a multi-task learning architecture to simultaneously optimize multiple tasks such as pressure ulcer prediction, sleep assessment, and anomaly detection.
[0031] 1) The pre-training stage utilizes the dataset for multi-task learning, including multi-classification of normal/abnormal states, regression prediction of physiological parameters at the next time step, and self-supervised contrastive learning to enhance feature representation capabilities;
[0032] 2) During the fine-tuning phase, individual baseline data is collected, and the model parameters are updated incrementally through continuous learning; the bottom layer remains frozen, and only the subsequent convolutional layers, attention layers, and output layers are fine-tuned, and L2 regularization is introduced to prevent overfitting of small samples; the bottom layer includes: the first two layers of CNN and the embedding layer of Transformer;
[0033] 3) Construct a hierarchical transfer architecture: The first layer transfers edge detection and texture recognition from the pre-trained model to identify the spatial distribution patterns of the pressure matrix; the second layer transfers domain features from the medical dataset to capture physiological signal patterns such as heart rate variability and blood oxygen fluctuations; the third layer trains task-specific layers using only individual data to learn the unique health characteristics of users. For personalized adaptation, the MAML meta-learning algorithm is introduced to train rapid adaptation capabilities on multiple individual samples, enabling new users to achieve practical performance with only a small number of samples. At the same time, a federated learning framework is adopted to achieve multi-user collaborative training, improving generalization ability through model parameter aggregation while ensuring data privacy.
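The model parameter aggregation mentioned for the federated learning framework can be illustrated with a sample-size-weighted average in the style of federated averaging. The document does not name the aggregation rule, so this weighting is an assumption, and the function name and flat parameter lists are simplifications for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local sample count.

    client_weights: list of equal-length parameter lists, one per client.
    client_sizes: number of local training samples for each client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two beds with 1 and 3 local samples respectively.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Only parameters leave each device in this scheme; the raw pressure and physiological data stay local, which is the privacy property the paragraph above refers to.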
[0034] A method for an emotionally interactive intelligent nursing system based on multimodal perception, the method comprising:
[0035] Step 1: Data Acquisition and Edge Preprocessing Stage: Various sensors in the sensing layer acquire data in real time according to differentiated sampling strategies: the pressure matrix continuously acquires body pressure distribution information at a frequency of 10Hz; the heart rate and blood oxygen sensor acquires physiological parameters at a frequency of 1Hz; the urine moisture sensor adopts an event-triggered mode; the ambient temperature and humidity sensor samples periodically at a frequency of 0.1Hz; after receiving the raw sensor data, the sub-control device performs preliminary processing using the built-in edge computing module; Kalman filtering is used to eliminate acquisition noise, and Min-Max normalization is used to map various types of data to a unified dimension, extracting basic features in the time and frequency domains, laying the data foundation for subsequent analysis;
[0036] Step 2: Real-time Edge Detection and Rapid Response Phase: The sub-control device deploys a lightweight anomaly detection algorithm for rapid response in emergencies; the bed-off detection algorithm continuously monitors the total pressure matrix, and determines a bed-off event when the value is below a preset threshold for more than 5 seconds; the urine wetness detection algorithm identifies urinary incontinence by monitoring sudden changes in resistance value; the physiological extreme anomaly detection algorithm compares the normal heart rate range with the safe blood oxygen saturation line in real time, and immediately triggers an alarm if it exceeds the safe range; when an abnormal event is detected, the sub-control device immediately triggers a local alarm mechanism, realizing multimodal alarm through voice prompts, buzzers, and LED indicators. The entire response process delay is controlled within milliseconds to ensure timely handling of emergencies.
[0037] Step 3: Cloud-based Intelligent Analysis and Health Assessment Stage: The sub-control device uploads a complete data packet to the cloud every minute via the MQTT protocol. After receiving the data, the cloud analysis layer initiates parallel processing using three neural networks: the CNN network extracts spatial distribution features from the pressure matrix to identify posture patterns and high-pressure areas; the Transformer network integrates spatial features with time-series data of heart rate and blood oxygen to model the temporal dependencies of physiological parameters; and the EmotionNet network is triggered on demand in voice interaction scenarios to analyze the user's emotional state. The multimodal health assessment model integrates the outputs of the three networks to generate a pressure ulcer risk score and a heat map of high-risk areas, evaluate sleep quality levels, and identify emotional state categories. Simultaneously, a reinforcement learning-based dynamic threshold adjustment algorithm adaptively optimizes various alarm parameters based on individual historical data and real-time status using the DQN algorithm, minimizing both false alarm and false negative rates.
[0038] Step 4: Tiered Early Warning Stage: The cloud platform generates tiered early warning information based on health assessment results and risk levels. When a high-risk situation occurs, the system pushes the early warning information to the control layer in real time via the MQTT protocol, triggering audible and visual alarms and voice broadcasts; simultaneously, it pushes notification messages to the nursing and family terminals via the WebSocket protocol, including risk score values, heat maps of high-risk areas, and targeted nursing suggestions; for different emotional states, the system executes differentiated interaction strategies: when anxiety or sadness is detected, preset soothing music is automatically played and the voice companion function is activated; when anger or fear is identified, nursing staff are prioritized for manual intervention, achieving closed-loop management of emotional care;
[0039] Step 5: Health Record Management and Continuous Model Optimization Phase: The system automatically generates health reports on a daily, weekly, and monthly basis, displaying key indicators such as pressure ulcer risk changes and sleep quality trends through visual charts, providing decision support for medical staff. A cloud database continuously stores all sensor data, model outputs, and user feedback, forming a complete data loop. The AI model employs a periodic retraining mechanism, updating transfer learning parameters based on accumulated data, optimizing dynamic threshold strategies, and expanding personalized health records. Before new models are deployed, A/B testing is used to verify the update effects, ensuring continuous improvement in model performance and enhanced personalization to adapt to dynamic changes in individual health conditions.
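The bed-off rule from Step 2 (total pressure below a preset threshold for more than 5 seconds) can be sketched as a small state machine. The class name, the threshold value, and the timestamps in seconds are illustrative assumptions; the document specifies only the 5-second duration.

```python
class BedExitDetector:
    """Flags a bed-off event when total pressure stays low for too long."""

    def __init__(self, threshold, hold_seconds=5.0):
        self.threshold = threshold        # preset total-pressure threshold
        self.hold_seconds = hold_seconds  # required duration below threshold
        self._below_since = None          # time the total first dropped

    def update(self, timestamp, matrix):
        """Feed one pressure frame; return True once a bed-off event is detected."""
        total = sum(sum(row) for row in matrix)
        if total >= self.threshold:
            self._below_since = None      # occupant present: reset the timer
            return False
        if self._below_since is None:
            self._below_since = timestamp
        return timestamp - self._below_since > self.hold_seconds


empty = [[0.0] * 4 for _ in range(8)]
detector = BedExitDetector(threshold=1.0)
# Frames arrive at 10Hz; the event fires on the first frame past 5 seconds.
events = [detector.update(i / 10, empty) for i in range(61)]
print(events.index(True) / 10)  # 5.1
```

Resetting the timer whenever the total recovers keeps brief posture shifts, which momentarily unload the mattress, from accumulating into a false bed-off alarm.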
[0040] The beneficial effects of this invention are as follows:
[0041] (i) Achieve high-resolution, multi-dimensional health monitoring results;
[0042] This invention integrates a matrix pressure sensor array into a nursing mattress to acquire real-time multi-point pressure distribution of the elderly person's body on the bed surface, achieving high-resolution body pressure mapping. Compared to existing single-point or linear pressure monitoring structures, this invention can accurately identify the spatial location of localized pressure areas, enabling early identification and quantitative assessment of pressure ulcer risk areas. By fusing multimodal sensor data such as pressure matrix, heart rate, blood oxygen, ambient temperature and humidity, and urine wetness, this invention significantly expands the monitoring dimensions, enabling a comprehensive assessment of health status from multiple physiological indicators. Compared to single-parameter monitoring methods, it has a more comprehensive information acquisition capability, effectively improving monitoring accuracy and nursing efficiency.
[0043] (ii) Significantly improve system response speed and emergency handling capabilities;
[0044] This invention introduces an edge computing module into the control layer to achieve local preprocessing and anomaly detection of sensor data. When abnormal body position, wetness, or getting out of bed is detected, the main control device can trigger voice prompts or audible and visual alarms within milliseconds, without relying on cloud-based judgment. This edge-cloud collaborative architecture significantly reduces system latency, substantially improves emergency response speed and system robustness, and ensures timely intervention in critical moments.
[0045] (iii) Achieve intelligent health analysis and early warning capabilities;
[0046] By introducing a deep learning model (a fusion architecture of CNN and Transformer) into the cloud-based analysis layer, this invention can automatically learn the temporal features and spatial correlations between multimodal signals, enabling intelligent prediction of sleep quality, postural changes, and pressure ulcer risk. This invention possesses early warning capabilities, issuing risk alerts hours before pressure ulcers actually occur, allowing caregivers sufficient intervention time. Compared to traditional threshold-based algorithms, this model has adaptive learning and trend analysis capabilities, continuously optimizing prediction results, thereby improving the intelligence and accuracy of health assessments. Clinical validation shows that this system can effectively reduce the incidence of pressure ulcers and improve the quality of care.
[0047] (iv) Achieving personalized prediction and rapid adaptation capabilities;
[0048] This invention employs a transfer learning strategy based on large-scale medical datasets, significantly reducing the number of training samples required for new users. Through the combination of a hierarchical transfer architecture and a meta-learning mechanism, the system achieves high accuracy during the cold start phase and can be further improved to a stable operating level after a short period of data accumulation. This approach demonstrates excellent generalization ability for individuals of different ages, weights, and medical histories, quickly adapting to individual differences and achieving truly personalized health monitoring and prediction.
[0049] (v) Construct an adaptive dynamic threshold adjustment mechanism;
[0050] This invention introduces an adaptive dynamic threshold algorithm based on reinforcement learning, which automatically optimizes various alarm parameters according to individual historical data and real-time status. This mechanism effectively reduces the false alarm rate and false negative rate, significantly reducing the interference of invalid alarms on caregivers and the elderly while ensuring safety. The adaptive threshold can automatically respond to individual differences, diurnal fluctuations, and environmental changes, maintaining stable detection performance without manual parameter tuning. The system's overall evaluation indicators are significantly better than traditional fixed threshold schemes.
[0051] (vi) Construct an integrated mechanism for emotion recognition and psychological care;
[0052] This invention integrates the EmotionNet emotion recognition model in the cloud, enabling emotion recognition from the acoustic features of elderly people's speech and accurately determining various emotional states. Based on the recognition results, the system automatically generates personalized reassuring voices or triggers manual intervention, forming a closed loop of "emotion recognition—intelligent response—human intervention." This mechanism achieves comprehensive care from physiological monitoring to psychological support, effectively improving the emotional state and mental health of the elderly, constructing a dual "physiological-psychological" care system, and enhancing the humanistic care level of intelligent nursing.
[0053] (vii) Achieve the system's self-learning and continuous optimization capabilities;
[0054] By establishing a closed-loop storage mechanism for historical data in the cloud, the system can continuously record perceived data, model output, and user feedback, and periodically update the neural network model parameters, enabling adaptive optimization and long-term learning of the model. This mechanism allows the system to maintain high prediction accuracy and stability under different usage environments and individual differences. The model performance continuously improves over time, evolving automatically without human intervention, possessing continuously evolving intelligent care capabilities, and ensuring the long-term effectiveness and reliability of the system.
Attached Figure Description
[0055] Figure 1 is a five-layer overall architecture diagram of the multimodal nursing bed system in an embodiment of the present invention.
[0056] Figure 2 is a layout diagram of the sensing layer in an embodiment of the present invention.
[0057] Figure 3 is an architecture diagram of the main control device and sub-control devices in the control layer in an embodiment of the present invention.
[0058] Figure 4 is a detailed schematic diagram of the network layer structure in an embodiment of the present invention.
[0059] Figure 5 is a schematic diagram of the four core algorithm modules of the cloud analysis layer in an embodiment of the present invention.
[0060] Figure 6 is a schematic diagram of the CNN-Transformer fusion network architecture in an embodiment of the present invention.
[0061] Figure 7 is a flowchart illustrating the adjustment process of the adaptive dynamic threshold algorithm in an embodiment of the present invention.
[0062] Figure 8 is a schematic diagram of the three-layer architecture of transfer learning in an embodiment of the present invention.
[0063] Figure 9 is a diagram of the fusion architecture of the three neural networks (CNN + Transformer + EmotionNet) in an embodiment of the present invention.
Detailed Implementation
[0064] To better understand the purpose, structure, and function of this invention, the following detailed description of an intelligent elderly care system based on multimodal perception and artificial intelligence analysis is provided in conjunction with the accompanying drawings.
[0065] I. Overall Structure and System Architecture
[0066] As shown in Figure 1, the system of this invention consists of five parts: a perception layer, a control layer, a network layer, a cloud analysis layer, and a user interaction layer. The system uses a multimodal sensor array to collect physiological and environmental data of the elderly in real time, utilizes edge computing and cloud-based deep learning models for analysis and prediction, and ultimately pushes health and emotional results synchronously to both caregivers and family members, unifying intelligent monitoring and emotional care.
[0067] The workflow of the entire system is as follows. As shown in Figure 2, the sensing layer consists of a matrix pressure sensor array, a heart rate sensor, a blood oxygen sensor, a urine moisture sensor, and a temperature and humidity sensor. The matrix pressure sensor array is installed inside the nursing mattress and uses a flexible electrode cross structure to form 32 independent pressure sensing units, capable of acquiring high-resolution body pressure distribution information. The sensor array is connected to the sub-control device via a multiplexing circuit and an analog-to-digital converter module to achieve parallel acquisition of multi-point pressure. The heart rate and blood oxygen sensors are placed in the areas of contact with the monitored person's body, while the urine moisture and temperature and humidity sensors are placed on the bed and in the environmental detection unit. Data from all sensors is aggregated to the sub-control device for preliminary filtering and feature extraction. The sensors in the sensing layer collect data according to a differentiated sampling strategy: 10Hz for the pressure matrix, 1Hz for heart rate and blood oxygen, event-triggered acquisition for urine moisture, and 0.1Hz for environmental parameters. The data is then transmitted to the sub-control device, which performs preprocessing such as Kalman filtering and normalization, as well as lightweight anomaly detection.
Figure 3 shows the architecture of the main control device and sub-control devices in the control layer. The main control device is responsible for voice recognition, motor control, voice broadcasting, and communication with the cloud. It includes an AI voice module, a voice broadcasting module, a motor control module, and a communication module. The AI voice module recognizes the elderly person's voice commands using a voice recognition chip and analyzes voice features using an emotion judgment algorithm. The voice broadcasting module outputs soothing voices or health prompts. The motor control module drives bed adjustments or posture changes. The sub-control devices are responsible for sensor data acquisition and edge computing. They include a data acquisition module and an edge computing module. The data acquisition module connects to sensors via I²C, UART, and ADC interfaces. The edge computing module performs filtering, anomaly detection, and posture recognition algorithms to achieve local decision-making.
[0068] The output data of the control device is transmitted to the cloud through the network layer shown in Figure 4. The network layer includes a Wi-Fi communication module and an MQTT Broker server; the control layer connects to the MQTT Broker in the network layer via the Wi-Fi module to implement data publishing and subscription; the MQTT Broker, as message middleware, is responsible for managing data transmission between the device and the cloud, providing reliable message queues, topic management, and state maintenance mechanisms. When an emergency is detected, a local alarm is immediately triggered, with response latency controlled within milliseconds, and data is simultaneously uploaded to the cloud via the MQTT protocol. As shown in Figure 5, the cloud analysis layer employs a CNN-Transformer fusion network and an EmotionNet emotion recognition network for deep analysis, outputting results such as pressure ulcer risk scores, sleep quality assessments, and emotional states. It also adaptively optimizes dynamic thresholds using the DQN algorithm and, based on the analysis results, generates tiered early warning information. The system continuously accumulates data and periodically updates model parameters, achieving self-learning and continuous optimization. The cloud analysis layer includes a cloud server, a multimodal data preprocessing module, and four core algorithm modules. The cloud server, built on the Python Flask framework, is responsible for receiving multimodal data from the control layer: pressure matrix, heart rate, blood oxygen, urine moisture, and environmental parameters.
[0069] The core technology of the present invention will be described in detail below through specific embodiments.
[0070] Specific implementation of the matrix pressure sensor array
[0071] This embodiment describes in detail the hardware configuration, software implementation, and calibration process of the matrix pressure sensor array.
[0072] Hardware Configuration: This embodiment uses 32 FSR406 flexible thin-film pressure sensors, embedded in the nursing mattress in an 8x4 configuration. The sensor spacing is set to 2.5 cm, covering the main pressure areas of the human body, including the shoulders, back, hips, legs, and heels. The sampling circuit uses a 16-channel analog multiplexer CD74HC4067 in conjunction with a 16-bit analog-to-digital converter ADS1115 to achieve high-precision signal acquisition. The system is powered by 5 volts, and the total power consumption is controlled within 2 watts, meeting the requirements for long-term continuous operation.
[0073] Software Implementation: The system is programmed and controlled using an ESP32-S3 microcontroller. It scans the sensor array at a frequency of 10Hz, completing a full scan of all 32 sensor points every 100 milliseconds. The acquired raw signals are first noise-reduced using a Kalman filter with process noise covariance set to 0.001 and measurement noise covariance to 0.01. The filtered signals are then mapped to the standard range of 0 to 1 using a Min-Max normalization algorithm, with the normalization parameters calibrated based on a maximum load capacity of 150 kg. The processed data is stored in an 8x4 matrix using 32-bit floating-point data type and uploaded to a cloud server via the MQTT protocol.
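The filtering and normalization steps described above can be sketched as follows. This is a minimal illustration, not the firmware itself: it is written in Python rather than ESP32 code, it assumes one independent scalar Kalman filter per sensor point with a constant-state model, and it assumes readings have already been converted to kilogram-equivalent units so that 0 to 150 maps to [0, 1]. The names `ScalarKalman`, `min_max`, and `process_frame` are hypothetical.

```python
class ScalarKalman:
    """One-dimensional Kalman filter for a slowly varying sensor reading."""

    def __init__(self, q=0.001, r=0.01):
        self.q = q        # process noise covariance (value stated in the text)
        self.r = r        # measurement noise covariance (value stated in the text)
        self.x = None     # state estimate, initialised from the first sample
        self.p = 1.0      # estimate covariance (assumed initial value)

    def update(self, z):
        if self.x is None:
            self.x = z
            return self.x
        self.p += self.q                  # predict step: constant-state model
        k = self.p / (self.p + self.r)    # Kalman gain
        self.x += k * (z - self.x)        # correct with the new measurement
        self.p *= (1.0 - k)
        return self.x


def min_max(value, lo, hi):
    """Map value into [0, 1], clamping readings outside the calibrated range."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def process_frame(raw, filters, lo=0.0, hi=150.0):
    """Filter and normalise one 8x4 frame (8 rows of 4 readings)."""
    return [[min_max(filters[i][j].update(raw[i][j]), lo, hi)
             for j in range(4)] for i in range(8)]


# One filter per sensor point, reused across successive 10 Hz frames.
filters = [[ScalarKalman() for _ in range(4)] for _ in range(8)]
frame = [[75.0] * 4 for _ in range(8)]   # illustrative readings only
out = process_frame(frame, filters)
```

Keeping a separate filter state per sensor point lets each channel smooth its own noise without mixing neighbouring readings.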
[0074] Calibration Process: To ensure measurement accuracy, the system employs a four-step calibration process. The first step is no-load calibration, where the zero-point reference is recorded with the mattress unattended to eliminate initial sensor deviation. The second step is weight calibration, using three standard weights (50 kg, 100 kg, and 150 kg) for three-point calibration to establish a linear mapping between the sensor output and actual pressure. The third step is temperature compensation, incorporating a temperature compensation algorithm for linear correction to address the sensor's resistance variation with temperature. The compensation coefficient is determined through testing under different temperature conditions. The fourth step is automatic maintenance, where the system automatically performs a zero-point drift calibration every 24 hours to ensure long-term measurement stability.
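As an illustration of the four calibration steps, the sketch below fits a least-squares line through the three weight points, applies a linear temperature correction, and supports periodic re-zeroing. The raw ADC values, the reference temperature of 25 °C, the form of the temperature correction, and the names `fit_line` and `Calibration` are assumptions made for the example; the text does not specify them.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


class Calibration:
    def __init__(self, zero_raw, raw_points, kg_points,
                 temp_coeff=0.0, ref_temp=25.0):
        self.zero_raw = zero_raw          # step 1: no-load zero reference
        # step 2: three-point fit on zero-corrected readings
        self.gain, self.offset = fit_line(
            [r - zero_raw for r in raw_points], kg_points)
        self.temp_coeff = temp_coeff      # step 3: assumed linear coefficient
        self.ref_temp = ref_temp

    def to_kg(self, raw, temp_c):
        corrected = (raw - self.zero_raw) * (
            1.0 + self.temp_coeff * (temp_c - self.ref_temp))
        return self.gain * corrected + self.offset

    def rezero(self, unloaded_raw):
        """Step 4: periodic zero-drift correction (every 24 hours in the text)."""
        self.zero_raw = unloaded_raw


# Illustrative raw readings for the 50, 100 and 150 kg reference weights.
cal = Calibration(zero_raw=100.0,
                  raw_points=[600.0, 1100.0, 1600.0],
                  kg_points=[50.0, 100.0, 150.0])
```

Because the fit is performed on zero-corrected readings, `rezero` can update the offset later without refitting the gain.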
[0075] Technical benefits: Compared with traditional single-point or linear pressure monitoring methods, the matrix pressure sensor array in this embodiment can acquire a two-dimensional pressure distribution map, accurately locate local high-pressure areas, and provide high-quality spatial feature data for subsequent pressure ulcer risk assessment.
[0076] Training and deployment of the CNN-Transformer fusion network
[0077] As shown in Figure 6, this embodiment details the structural design, training process, and performance evaluation of the CNN-Transformer fusion network.
[0078] Network Structure Design: The fusion network proposed in this embodiment includes three main modules. The first module is a CNN spatial feature extraction network, which reshapes the readings from the 32 sensor points collected in Embodiment 1 into an 8x4 single-channel tensor as input. The input passes through three convolutional layers with 32, 64, and 128 filters respectively, each with a 3x3 kernel and a stride of 1. Each convolutional layer is followed by a ReLU activation function and batch normalization, and dimensionality is then reduced through 2x2 max pooling. The module outputs a 128-dimensional spatial feature vector, which captures local pressure concentration patterns and body position distribution features.
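The text does not state the convolution padding or how many pooling stages are applied, so the following shape trace is only one consistent reading. It assumes a padding of 1 (so a 3x3, stride-1 convolution preserves size), 2x2 pooling after the first two layers only, and a final global pooling step to reach 128 dimensions. On a 4-wide input a third 2x2 pooling stage is not possible, which the sketch makes explicit. The helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of one spatial dimension after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2):
    """Output length after non-overlapping max pooling."""
    if size < kernel:
        raise ValueError("dimension too small for another pooling stage")
    return size // kernel


def trace_shapes(h=8, w=4, filters=(32, 64, 128),
                 pool_after=(True, True, False)):
    """Return (channels, height, width) after each layer."""
    shapes = [(1, h, w)]
    for channels, pool in zip(filters, pool_after):
        h, w = conv_out(h), conv_out(w)
        if pool:
            h, w = pool_out(h), pool_out(w)
        shapes.append((channels, h, w))
    return shapes


shapes = trace_shapes()
# The final feature map has 128 channels; averaging over its remaining
# spatial positions (assumed global pooling) gives a 128-dimensional vector.
```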
[0079] The second module is the Transformer temporal modeling network, whose input is the concatenation of the spatial feature vector with the heart rate and blood oxygen time series. The temporal data is sampled using a 60-second sliding window with a 10-second step size. The network employs an 8-head multi-head self-attention mechanism to capture dependencies at different time scales, combined with sine and cosine positional encoding to preserve temporal information. After processing by a two-layer fully connected feedforward network with a hidden dimension of 512, the module outputs spatiotemporal fusion features that model temporal patterns such as turning-over frequency and peak pressure duration.
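The windowing and positional encoding mentioned above can be sketched as follows. The sampling rate of the vital-sign series is not given in the text, so `rate_hz=1.0` is an assumption, as are the function names; the encoding follows the standard sine and cosine formulation.

```python
import math


def sliding_windows(n_samples, rate_hz=1.0, window_s=60, step_s=10):
    """Return (start, end) sample indices of each full window."""
    win = int(window_s * rate_hz)
    step = int(step_s * rate_hz)
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]


def positional_encoding(length, d_model):
    """Sinusoidal encoding: sine on even indices, cosine on odd indices."""
    pe = [[0.0] * d_model for _ in range(length)]
    for pos in range(length):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


windows = sliding_windows(120)      # two minutes of data at the assumed 1 Hz
pe = positional_encoding(60, 8)     # small d_model chosen for illustration
```

With a 10-second step, consecutive windows overlap by 50 seconds, so each new window adds only 10 seconds of fresh data.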
[0080] The third module is the fusion output layer, which concatenates the spatial features extracted by the CNN with the temporal features modeled by the Transformer. The concatenated features are then processed by a three-layer multilayer perceptron classifier with layer dimensions of 256, 128, and 64 respectively, with a Dropout rate of 0.3 to prevent overfitting. The system uses dual output heads: the pressure ulcer risk head uses Sigmoid activation to output a risk probability ranging from 0 to 1, and the sleep quality head uses Softmax activation to classify sleep into four categories: deep sleep, light sleep, REM sleep, and wakefulness.
[0081] Training Process: The dataset preparation phase combined publicly available medical datasets with self-collected data. Public datasets included the MIMIC-IV and PhysioNet datasets, while self-collected data came from continuous monitoring records of elderly volunteers. All data were annotated by professional nurses, including a five-level pressure ulcer risk rating and sleep quality assessment. The dataset was divided into training, validation, and test sets.
[0082] Model training was implemented using the PyTorch deep learning framework. Training hyperparameter settings included batch size, number of training epochs, and an early stopping mechanism. The optimizer employed the Adam algorithm, with settings for the learning rate and weight decay coefficients. Cosine annealing was used for learning rate scheduling. A multi-task learning framework was used for the loss function: binary cross-entropy loss was used for pressure ulcer risk prediction, and cross-entropy loss was used for sleep quality classification; the total loss was the weighted sum of the two.
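The multi-task loss described above can be written out for a single sample as below. The equal weights of 0.5 are placeholders, since the text does not give the weighting, and the function names are hypothetical; a PyTorch implementation would use the framework's built-in loss functions instead of this plain-Python arithmetic.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and label y."""
    p = min(1.0 - eps, max(eps, p))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def cross_entropy(probs, label, eps=1e-7):
    """Cross-entropy for one softmax output and an integer class label."""
    return -math.log(max(eps, probs[label]))


def total_loss(risk_p, risk_y, sleep_probs, sleep_label,
               w_risk=0.5, w_sleep=0.5):
    """Weighted sum of the two task losses (weights are assumed values)."""
    return (w_risk * bce(risk_p, risk_y)
            + w_sleep * cross_entropy(sleep_probs, sleep_label))


# Uninformative predictions on both heads, for illustration.
loss = total_loss(0.5, 1, [0.25, 0.25, 0.25, 0.25], 2)
```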
[0083] Performance Evaluation: Performance was evaluated on an independent test set, achieving high accuracy in both pressure ulcer risk prediction and sleep quality classification. The model inference speed meets the requirements for real-time analysis. The model files have been quantized and compressed for easy deployment on edge devices.
[0084] Technical Results: The CNN-Transformer fusion network in this embodiment achieves multi-dimensional modeling of health status through the collaborative work of spatial feature extraction and temporal dependency modeling. Compared with solutions using CNN or Transformer alone, the fusion network shows a significant performance improvement. This network can issue a warning of pressure ulcer risk several hours in advance, giving caregivers sufficient time to intervene.
[0085] Practical application of the adaptive dynamic threshold algorithm
[0086] As shown in Figure 7, this embodiment details the implementation process of the adaptive dynamic threshold algorithm based on sliding window statistics and reinforcement learning.
[0087] Initialization Phase: For the first 7 days after a new user starts using the system, the system is in the initialization phase and collects individual baseline data. It periodically collects heart rate, blood oxygen, and pressure information. After data accumulation, the system calculates statistical parameters, including the mean, standard deviation, and percentiles of heart rate; the mean, interquartile range, and percentiles of blood oxygen; and the mean and standard deviation of pressure.
[0088] Initial thresholds are set based on these statistical parameters. Heart rate anomaly detection uses a strategy of adding or subtracting a certain number of standard deviations from the mean; blood oxygen anomaly detection uses a strategy of subtracting a certain number of interquartile ranges from the baseline value; and pressure anomaly detection uses a strategy of adding a certain number of standard deviations to the mean. Initial values are set for the adjustment parameters. Adaptive Adjustment Phase: After the initialization phase is complete, the system starts an adaptive adjustment mechanism based on the Deep Q-Network (DQN) algorithm. The state space of the DQN algorithm includes current physiological parameters, historical statistical information, and alarm history. The action space contains the values of the adjustment parameters. The reward function jointly considers the false alarm rate, false negative rate, and user satisfaction.
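The three initial threshold strategies can be illustrated with the standard library alone. The multipliers `k` are placeholders for the unspecified "certain number", and using the first quartile as the blood oxygen baseline value is an assumption; the function names are hypothetical.

```python
import statistics


def heart_rate_bounds(samples, k=3.0):
    """Symmetric bounds: mean plus or minus k standard deviations."""
    mu = statistics.mean(samples)
    sd = statistics.stdev(samples)
    return mu - k * sd, mu + k * sd


def spo2_lower_bound(samples, k=1.5):
    """Single lower bound: baseline (assumed Q1) minus k interquartile ranges."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return q1 - k * (q3 - q1)


def pressure_upper_bound(samples, k=2.0):
    """Single upper bound: mean plus k standard deviations."""
    return statistics.mean(samples) + k * statistics.stdev(samples)


# Illustrative baseline samples only.
hr_low, hr_high = heart_rate_bounds([60, 70, 80], k=1.0)
spo2_low = spo2_lower_bound([94, 95, 96, 97, 98, 99, 100])
```

In the adaptive phase, the multipliers `k` would be the adjustment parameters that the DQN agent tunes.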
[0089] The system monitors the false positive rate and false negative rate in real time. The DQN algorithm selects actions and adjusts threshold parameters based on the current state. Through continuous optimization, the false positive rate gradually decreases while the false negative rate remains at a low level.
[0090] Multi-parameter collaborative threshold strategy: Considering the differences in characteristics of different physiological parameters, the system adopts an independent threshold strategy. The heart rate threshold uses a symmetrical double boundary, the blood oxygen threshold uses a single boundary focusing only on low abnormal values, and the pressure threshold uses a local adaptive strategy, dynamically adjusting according to individual characteristics such as weight and body type.
[0091] Time-adaptive mechanism: The system can detect drift in the data distribution. When multiple consecutive alarms are not confirmed by nursing staff, the system determines that baseline drift may have occurred and automatically triggers the baseline recalculation process. For example, when a user has a cold with fever and the heart rate remains elevated, the system detects the consecutive false alarms, recalculates the baseline parameters from recent data, and updates the thresholds to avoid further false alarms.
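A minimal sketch of this drift-handling logic is shown below, assuming a fixed count of consecutive unconfirmed alarms as the trigger and a bounded buffer of recent samples for recalculation. The class name, the trigger count, and the buffer size are illustrative assumptions, not values from the text.

```python
from collections import deque
import statistics


class BaselineTracker:
    def __init__(self, baseline, k=3.0, max_unconfirmed=5, recent_size=600):
        self.k = k
        self.max_unconfirmed = max_unconfirmed
        self.recent = deque(baseline, maxlen=recent_size)  # rolling history
        self.unconfirmed = 0
        self._recompute()

    def _recompute(self):
        """Recalculate mean and spread from the most recent samples."""
        self.mean = statistics.mean(self.recent)
        self.std = max(statistics.pstdev(self.recent), 1e-6)
        self.unconfirmed = 0

    def is_alarm(self, value):
        """Record the sample and test it against the current thresholds."""
        self.recent.append(value)
        return abs(value - self.mean) > self.k * self.std

    def report_alarm_outcome(self, confirmed):
        """Return True when the baseline was recalculated."""
        if confirmed:
            self.unconfirmed = 0
            return False
        self.unconfirmed += 1
        if self.unconfirmed >= self.max_unconfirmed:
            self._recompute()
            return True
        return False
```

A confirmed alarm resets the counter, so only an unbroken run of unconfirmed alarms is treated as evidence of drift.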
[0092] Another typical scenario is nighttime blood oxygen fluctuations. When sleep apnea symptoms occur, the system detects periodic fluctuations through a time-series pattern recognition algorithm, identifies it as a sleep apnea pattern, dynamically lowers the blood oxygen threshold, successfully triggers an alarm, and promptly reminds the user to seek medical attention.
[0093] Technical effect: Compared with the traditional fixed threshold scheme, the adaptive dynamic threshold algorithm of this embodiment can automatically adapt to individual differences, changes in physiological state and environmental factors, and significantly reduce invalid alarms while ensuring safety.
[0094] Cold start optimization process based on transfer learning
[0095] As shown in Figure 8, this embodiment describes in detail the application of transfer learning strategies to the rapid adaptation of new users.
[0096] Pre-trained model preparation: This embodiment first trains the base model on a large-scale medical dataset. The MIMIC-IV and PhysioNet datasets are used for pre-training. The pre-training tasks are designed as a multi-task learning framework, including binary classification tasks for normal and abnormal states, regression prediction tasks for future physiological parameters, and self-supervised contrastive learning tasks. The pre-training process employs a distributed training approach.
[0097] Individual data fine-tuning process: The cold start effect is illustrated below using the fine-tuning process of a new user as an example. When a user starts using the system, the system automatically loads the pre-trained model parameters as initialization.
[0098] During the cold start period, the system collects user baseline data. The fine-tuning strategy employs a parameter freezing approach, training only the model's output layer while keeping the parameters of other layers unchanged. The optimizer is set with a small learning rate, using early-stage data as the training set and later-stage data as the validation set. After fine-tuning, the accuracy of pressure ulcer prediction is significantly improved. During the adaptation period, the system unfreezes the model's higher-level parameters, including the later convolutional layers and the Transformer's attention layer. A differentiated learning rate strategy is adopted, setting different learning rates for different layers. Training with accumulated data further improves accuracy.
[0099] During the stabilization period, the system fine-tunes the entire model, with all layer parameters participating in the update. A hierarchical learning rate strategy is employed, with incremental learning rates set for the bottom, middle, and output layers. Ultimately, a high level of accuracy is achieved.
[0100] Hierarchical Transfer Architecture: This embodiment further constructs a hierarchical transfer architecture, dividing the transfer process into three layers. The first layer transfers low-level visual features from a ResNet-18 model pre-trained on ImageNet, including edge detection and texture recognition capabilities, to identify the spatial distribution patterns of the pressure matrix. The second layer transfers domain features of physiological signal processing from medical datasets, including capabilities such as heart rate variability analysis and blood oxygen fluctuation pattern recognition. The third layer trains task-specific layers using only individual data, learning the user's unique health characteristics.
[0101] Personalized Adaptation Mechanism: This embodiment also introduces the meta-learning MAML algorithm to enhance rapid adaptation capabilities. The MAML algorithm learns a good initialization parameter on data from multiple users, enabling the model to quickly adapt to new users with a small number of gradient updates. Furthermore, a federated learning framework is employed to achieve privacy-preserving aggregation of multi-user data. Each user's data is kept locally, and only model parameter updates are uploaded to the central server, protecting user privacy while simultaneously optimizing the model.
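The server-side aggregation step of the federated scheme might look like the sketch below. The text only states that parameter updates are uploaded and aggregated; weighting each client by its sample count, in the style of federated averaging, is an assumption, as is the flat-list representation of the parameters.

```python
def federated_average(client_updates):
    """Aggregate client parameters, weighting each by its sample count.

    client_updates: list of (params, n_samples), where params is a list
    of floats with the same length for every client.
    """
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for params, n in client_updates:
        if len(params) != dim:
            raise ValueError("all clients must send the same parameter shape")
        for i, value in enumerate(params):
            averaged[i] += value * n / total
    return averaged


# Two hypothetical users; raw sensor data never leaves their devices.
global_params = federated_average([([1.0, 2.0], 1), ([3.0, 6.0], 3)])
```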
[0102] Technical Effects: The transfer learning strategy in this embodiment fully utilizes common patterns in large-scale medical datasets, enabling new users to achieve high prediction accuracy in a short time, thus solving the cold start problem for individual data. Compared to training from scratch, the required training samples are significantly reduced, and the training time is greatly shortened. This approach maintains high accuracy for individuals of different ages, weights, and medical histories, and the model can be continuously optimized based on new data.
[0103] Practical application of the three-layer neural network fusion architecture for emotion recognition
[0104] As shown in Figure 9, this embodiment describes in detail the implementation of the three-layer neural network fusion architecture and its application in emotion recognition and interaction.
[0105] Fusion Architecture Design: This embodiment integrates three core neural networks to construct a complete closed-loop system of "perception-analysis-interaction". The first layer is a CNN spatial feature extraction network, responsible for extracting spatial distribution features from the pressure matrix and outputting a 128-dimensional spatial feature vector for identifying body position changes and locating high-pressure areas. The second layer is a Transformer temporal modeling network, responsible for capturing multimodal temporal dependencies and outputting spatiotemporal fusion features to achieve pressure ulcer risk prediction and sleep quality assessment.
[0106] The third layer is the EmotionNet emotion recognition network, which is the core innovation of this embodiment. The input of EmotionNet is the acoustic features of speech, including parameters such as fundamental frequency F0, energy, and formants F1 to F4. In the feature extraction stage, the Mel-frequency cepstral coefficient (MFCC) algorithm is used to convert the original speech signal into an MFCC feature vector, including static features, first-order differences, and second-order differences.
[0107] The EmotionNet network architecture consists of three parts. The first part is a deep CNN, containing multiple convolutional layers, each followed by ReLU activation, batch normalization, and max pooling. The CNN is responsible for extracting local feature patterns in speech. The second part is an LSTM layer, containing multiple bidirectional LSTM layers, responsible for modeling the temporal changes in intonation and capturing the evolution of emotions over time. The third part is a classifier, containing fully connected layers that output the probability distribution of multiple emotion classes through Softmax activation, including calm, joy, anxiety, sadness, anger, and fear.
[0108] Three-network fusion strategy: The three networks are trained jointly using a multi-task learning framework. The CNN and Transformer share the underlying feature extraction module, while EmotionNet is trained independently. The overall loss function is designed as a weighted sum of the pressure-related and emotion-related losses. Through end-to-end backpropagation, the parameters of the three networks are simultaneously optimized, enabling the system to achieve synergistic enhancement in physiological monitoring, health prediction, and emotional care.
[0109] Feedback Mechanism Design: Based on the emotional states identified by EmotionNet, the system implements a differentiated feedback strategy. When anxiety or sadness is detected, the system automatically selects soothing music from the preset music library and plays it, while simultaneously activating the voice companion function to play pre-recorded comforting voice messages. When anger or fear is detected, it is determined that there may be an emergency or serious discomfort, and the system prioritizes pushing an emergency notification to the nursing staff via the MQTT protocol for manual intervention. When joy or calmness is detected, the system provides a simple voice response and records positive emotional events to update the mental health score.
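The differentiated feedback strategy amounts to a small dispatch table, sketched below. The action identifiers are hypothetical labels for the behaviours named in the text; how each action is executed (audio playback, MQTT publishing) is outside the scope of the sketch.

```python
SOOTHE = {"anxiety", "sadness"}
URGENT = {"anger", "fear"}
POSITIVE = {"joy", "calm"}


def plan_feedback(emotion):
    """Map a recognised emotion to an ordered list of actions."""
    if emotion in SOOTHE:
        return ["play_soothing_music", "play_comfort_voice"]
    if emotion in URGENT:
        # Possible emergency or serious discomfort: escalate to staff first.
        return ["notify_caregiver_urgent"]
    if emotion in POSITIVE:
        return ["voice_reply", "log_positive_event"]
    raise ValueError(f"unknown emotion: {emotion}")
```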
[0110] Practical application scenarios:
[0111] Scenario 1 involves handling anxiety late at night. The system picks up the elderly person's call via the bedside microphone. After MFCC feature extraction, the voice signal is input into EmotionNet. The network identifies it as anxiety and immediately triggers a calming mechanism, playing soothing music and a calming voice message. The system pushes a non-emergency notification to the family member, including the time, emotion type, and the response measures already taken. After a period of time, the system picks up the voice signal again, confirms that the emotion has returned to calm, and records the event in the emotion log.
[0112] Scenario 2 is an emergency call for help due to pain. The system receives the elderly person's call, which EmotionNet identifies as a mixed emotion of pain and fear, classifying it as an emergency. The system immediately sends an emergency notification to the caregiver, including the elderly person's location, emotional state, current physiological parameters, and pressure ulcer risk score. A reassuring voice is played through the bedside speaker. The system automatically correlates this with the current pressure ulcer risk heatmap, identifying high-pressure areas and inferring that the pain may be pressure-related. Upon arrival, the caregiver adjusts the elderly person's position and provides care based on the information provided by the system.
[0113] Scenario 3 involves interaction based on positive emotions. The system captures the elderly person's cheerful words, which EmotionNet identifies as a positive emotion, triggering a positive interactive response and playing a voice reply. The system records this positive emotional event in the mental health scoring module, resulting in an increase in the user's mental health score. The system does not need to send notifications to the caregiver; it only reflects the user's good mental state in their daily health report.
[0114] Technical Effects: The three-layer neural network fusion architecture in this embodiment achieves closed-loop management from physiological monitoring to psychological care. EmotionNet achieves high recognition accuracy on a self-built emotional speech dataset. By combining with physiological monitoring functions such as pressure ulcer risk prediction and sleep quality assessment, the system can provide comprehensive care. Clinical trials show that elderly people using this invention experience improved mental health and enhanced nursing efficiency.
[0115] Specific implementation of the software system
[0116] This embodiment describes in detail the software architecture supporting the entire intelligent nursing mattress system and the implementation methods of each module.
[0117] Software Architecture: This embodiment adopts a three-layer "device-edge-cloud" software architecture. The device side is the ESP32 hardware, responsible for sensor data acquisition and actuator control; the edge side is the Flask backend server, responsible for business logic processing and data persistence; and the cloud side is the frontend UI, responsible for user interaction and data visualization. The three layers communicate via the MQTT and WebSocket protocols, forming a complete closed loop.
[0118] The device-side ESP32 hardware module: This module is deployed on the ESP32-S3 development board and serves as the system's interface to the physical world. The initialization process includes NVS flash initialization, Wi-Fi connection configuration, SNTP time synchronization, initialization of hardware resources including the ADC and I²C, and establishing a connection with the MQTT Broker. During data acquisition and preprocessing, the sensors operate at preset frequencies: the pressure sensor array is sampled via the ADC, the heart rate and blood oxygen sensors are read via I²C, the urine moisture sensor is triggered by a GPIO interrupt, and the ambient temperature and humidity sensor is read via a single-bus protocol. The acquired raw data undergoes local preprocessing, including conversion of ADC values to physical quantities, signal filtering, and data normalization. The processed data is formatted as a JSON message and uploaded to the cloud server via the MQTT protocol. The ESP-NOW communication mechanism is used for short-range data exchange between the main ESP32 and the sub-ESP32: initializing ESP-NOW, registering callback functions, and adding peer devices enables low-latency, low-power direct communication. The main ESP32 sends control commands to the sub-ESP32 via ESP-NOW; upon receiving these commands, the sub-ESP32 drives the motor to adjust the bed position. Voice data is processed by a local AI model. The ESP32's integrated array microphone collects voice signals, and a lightweight speech recognition model deployed locally recognizes common commands. An emotion feature extraction model is also deployed to extract features such as fundamental frequency, energy, and formants from the speech; these are packaged and sent to the cloud via MQTT, where EmotionNet performs the full emotion recognition.
[0119] The edge-side backend server module is built on the Python Flask framework. The startup initialization process includes loading the Flask application configuration, configuring MQTT connection parameters, initializing Flask-SocketIO, setting the data file path, and loading historical data from a JSON file into memory. The user management module provides a RESTful API. The registration interface receives user information, hashes and encrypts the password, and stores it in the database. The login interface verifies the username and password and generates a JWT token upon successful login. User types include caregivers, family members, elderly users, and administrators, each with different permissions. The MQTT data receiving and processing module subscribes to multiple topics. On the sensor data topic, it receives data, parses the JSON payload, extracts the device ID, timestamp, and sensor values, and processes them according to business logic, including generating logs, updating device status, triggering anomaly detection, and calling deep learning models for prediction. On the alarm topic, it receives wetness alarms, updates the status, generates alarm logs, and pushes alarm pop-ups via Socket.IO. On the messaging topic, it receives chat messages, analyzes sentiment information, records it in logs, and pushes it to the front end. The WebSocket real-time communication module is implemented with Flask-SocketIO. It establishes persistent connections to process real-time events sent from the front end, including user chat and bed control, and pushes updates to the front end in real time, including new data, logs, and alarm events.
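The payload-parsing step for the sensor data topic could be sketched as follows. The JSON field names (`device_id`, `timestamp`, `sensors`, `pressure`) are assumptions, since the text does not define the message schema; the function would be called from whatever MQTT message callback the backend registers.

```python
import json

REQUIRED_FIELDS = ("device_id", "timestamp", "sensors")  # assumed schema


def parse_sensor_message(payload: bytes):
    """Decode one sensor message and return (device_id, timestamp, sensors)."""
    message = json.loads(payload.decode("utf-8"))
    missing = [key for key in REQUIRED_FIELDS if key not in message]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    pressure = message["sensors"].get("pressure")
    if pressure is not None:
        # The pressure matrix is expected as 8 rows of 4 values.
        if len(pressure) != 8 or any(len(row) != 4 for row in pressure):
            raise ValueError("pressure must be an 8x4 matrix")
    return message["device_id"], message["timestamp"], message["sensors"]
```

Rejecting malformed messages at this boundary keeps later steps such as anomaly detection and model inference free of shape checks.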
[0120] The multimodal data analysis module integrates a neural network model. The model is saved in PyTorch format and loaded into memory upon backend startup. Whenever new sensor data is received, the backend inputs the data into the model for inference, and the inference results are stored in the database and pushed to the frontend. The data persistence module employs an asynchronous batch write strategy. A daemon thread is created at system startup to periodically serialize the data in memory into JSON format and write it in batches to disk files. A dual-file backup strategy is used to prevent data loss.
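One way to realise the dual-file backup strategy is sketched below: each write goes to a temporary file first, the previous data file is rotated to a `.bak` copy, and the temporary file is then moved into place. The file naming and the fallback order on load are assumptions; the periodic daemon thread that would call `save_with_backup` is omitted.

```python
import json
import os
import tempfile


def save_with_backup(data, path):
    """Write data as JSON, keeping the previous version as path + '.bak'."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle)
        handle.flush()
        os.fsync(handle.fileno())       # make sure bytes reach the disk
    if os.path.exists(path):
        os.replace(path, path + ".bak")  # rotate the last good copy
    os.replace(tmp_path, path)           # move the new file into place


def load_with_fallback(path):
    """Load the main file, falling back to the backup if it is unreadable."""
    for candidate in (path, path + ".bak"):
        try:
            with open(candidate, encoding="utf-8") as handle:
                return json.load(handle)
        except (OSError, json.JSONDecodeError):
            continue
    return None
```

Writing to a temporary file first means an interrupted write cannot leave a half-written main file behind.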
[0121] Cloud-based Frontend UI Module: This module adopts a frontend-backend separation architecture, using HTML5, CSS3, and JavaScript to build a single-page application. Key JavaScript libraries include Vue.js, Socket.IO Client, Axios, and ECharts. The interface is built using a responsive design, supporting both desktop and mobile devices. Components are dynamically rendered based on the user's login status and type, displaying different interface content and functional modules for different user types. WebSocket real-time communication establishes a Socket.IO connection with the backend when the page loads. Event listeners are registered to monitor data update events, alarm events, and new message events, updating the interface upon receipt. User actions are triggered via Socket.IO. HTTP API interaction is used for one-time data retrieval and authentication. After user login, the frontend sends a POST request to obtain and store a JWT token; subsequent requests carry this token for authentication.
[0122] The functionality includes historical log viewing with pagination, sorting, and filtering support; real-time pop-up alerts with a tiered alert strategy; dynamic data visualization using the ECharts library to create real-time curves, pressure ulcer risk heatmaps, and sleep quality pie charts; and instant messaging with a chat window, input box, and send button, with message formatting distinguishing the sender. Interaction methods include voice and touch. Voice interaction uses the Web Speech API or a third-party speech recognition service; after the elderly user clicks a button, their voice is captured and sent to the backend for recognition and response. Touch interaction features a simplified interface design for elderly users, with enlarged buttons, larger fonts, and high-contrast color schemes.
[0123] Technical Effects: The software system in this embodiment achieves end-to-end management of data acquisition, transmission, processing, and control. The end-to-end latency of the entire system is controlled within a reasonable range, meeting real-time monitoring requirements. The system supports concurrent access by multiple users and maintains stable performance under stress testing. Data persistence employs an asynchronous batch write strategy, improving system stability and response speed.
[0124] Summary of the overall advantages of the system
[0125] In summary, this invention achieves high-resolution body pressure monitoring through a matrix-type pressure sensor array, intelligent health analysis and early warning through a CNN-Transformer fusion network, handling of individual differences and false alarms through an adaptive dynamic threshold algorithm, rapid adaptation and personalized prediction through transfer learning strategies, combined physiological monitoring and psychological care through a three-layer neural network fusion, and end-to-end management of data acquisition, processing, and control through a "device-edge-cloud" software architecture. These core technologies work together to form a complete intelligent nursing system covering "perception-analysis-early warning-care," significantly improving the intelligence and humanization of elderly care.
[0126] This invention achieves systematic innovations compared to existing technologies in terms of perception structure, data processing architecture, disease prediction, and personalized adaptation, as detailed below:
[0127] (I) Structural Optimization of the Flexible Pressure Sensing Array: This invention employs a flexible electrode cross structure to construct a pressure sensing array, deploying 32 independent sensing units in the key pressure-bearing areas of the mattress in an 8×4 arrangement with a sensing point spacing of 2.5cm. For signal acquisition, a multiplexed circuit (MUX) combined with a 16-bit resolution ADC is used to accurately acquire a two-dimensional body pressure distribution map, with the sampling frequency set to 10Hz to meet the real-time monitoring requirements of body position changes.
[0128] (II) At the signal processing level, a Kalman filter is introduced for noise cancellation, and Hamming window convolution is used to achieve signal smoothing. The Min-Max normalization algorithm is used to map the pressure value to the standard interval [0,1], which is convenient for subsequent neural network processing.
[0129] (III) Multimodal Fusion Deep Learning Analysis Architecture: This invention constructs a multimodal data analysis framework that combines edge computing and deep learning, breaking through the limitations of traditional threshold alarm mechanisms. This framework achieves joint extraction of spatial-temporal features through the collaborative work of CNN and Transformer models.
[0130] 1) The CNN module is responsible for spatial feature extraction: After reconstructing the 32×1 pressure matrix into an 8×4×1 tensor, it is processed by three layers of convolution (32 / 64 / 128 filters, 3×3 convolution kernels, stride 1). Each layer is configured with ReLU activation and batch normalization. Dimensionality is reduced by 2×2 max pooling to output a 128-dimensional spatial feature vector, capturing local pressure concentration patterns and body position distribution features.
[0131] 2) The Transformer module is responsible for temporal feature modeling: the input is the concatenation of the spatial feature vector with the heart rate and blood oxygen time series (60-second window, 10-second step); an 8-head multi-head self-attention mechanism captures dependencies at different time scales, combined with sine-cosine positional encoding to preserve temporal information; after processing by a two-layer fully connected feedforward network (512 hidden dimensions), the output is spatiotemporal fusion features, which model temporal patterns such as turning-over frequency and pressure peak duration.
[0132] 3) The fusion output layer concatenates spatial features and temporal features, and uses a three-layer MLP classifier (256→128→64) with a Dropout of 0.3. Dual output heads are set: the pressure ulcer risk score uses Sigmoid activation to output the risk probability of [0,1], and the sleep quality evaluation uses Softmax activation to achieve multi-classification.
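Two ingredients of the temporal branch, the 60-second window with a 10-second step and the sine-cosine positional encoding, can be sketched as below. This is an illustrative sketch only: it assumes a 1 Hz series (so 60 samples per window), and the function names and the small `d_model` used in the usage note are hypothetical. The encoding formula is the standard one from the original Transformer design; the text does not state which variant is used.

```python
import math

def sliding_windows(series, window, step):
    """Return consecutive windows of `window` samples, advancing by `step` samples."""
    return [series[s:s + window] for s in range(0, len(series) - window + 1, step)]

def positional_encoding(length, d_model):
    """Sine-cosine table: even feature indices use sin, odd indices use cos."""
    table = []
    for pos in range(length):
        row = []
        for i in range(d_model):
            # Each sin/cos pair (2k, 2k+1) shares the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table
```

For example, `sliding_windows(heart_rate, 60, 10)` on two minutes of 1 Hz data yields seven overlapping windows, and `positional_encoding(60, 8)` gives one 8-dimensional code per time step that would be added to the per-step features before self-attention.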
[0133] (iv) Accurate prediction and quantitative assessment system for pressure ulcer risk: This invention integrates a deep learning prediction model with the traditional Braden scoring system, and achieves 24-hour and 48-hour early warning through a bidirectional GRU network combined with an attention mechanism, and can automatically locate high-risk areas to generate personalized care plans.
[0134] 1) The model input adopts 18-dimensional comprehensive features: 6-dimensional pressure matrix features (maximum pressure, average pressure, pressure standard deviation, high-pressure zone proportion, pressure entropy, pressure gradient); 4-dimensional physiological parameters (heart rate, blood oxygen, body temperature, respiratory rate); 4-dimensional temporal features (turning frequency, pressure duration, peak change rate, nocturnal activity); 4-dimensional individual information (age, BMI, activity level, history of pressure ulcers).
[0135] 2) The model structure is based on an improved ResNet18: the input layer maps the 18-dimensional features to 64 dimensions, and four residual blocks increase the feature dimension (64→128→256→512). After global average pooling for dimensionality reduction, a sigmoid activation outputs a risk score in [0,1].
[0136] 3) Establish a five-level risk classification system: 0.0-0.2 is extremely low risk (green) and no intervention is required; 0.2-0.4 is low risk (yellow) and routine observation is required; 0.4-0.6 is medium risk (orange) and the frequency of turning over should be increased; 0.6-0.8 is high risk (red) and the patient should be turned over once every 2 hours; 0.8-1.0 is extremely high risk (dark red) and immediate intervention is required, along with the use of a pressure-reducing mat.
[0137] 4) Grad-CAM is used to generate a risk heat map at 8×4 resolution by back-propagating the risk score to the pressure matrix, visually marking high-risk areas such as the shoulders, hips, and heels and providing nursing staff with precise turning guidance.
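The five-level classification above maps directly onto a lookup table. The sketch below is illustrative: the text lists ranges that share their endpoints (for example 0.2 appears in two ranges), so treating each upper bound as exclusive, with 1.0 included in the top level, is an assumption, as are the names `RISK_LEVELS` and `classify_risk`.

```python
# (upper bound, level, display color, recommended action) taken from the five-level scheme
RISK_LEVELS = [
    (0.2, "extremely low", "green", "no intervention required"),
    (0.4, "low", "yellow", "routine observation"),
    (0.6, "medium", "orange", "increase turning frequency"),
    (0.8, "high", "red", "turn the patient once every 2 hours"),
    (1.0, "extremely high", "dark red", "immediate intervention with a pressure-reducing mat"),
]

def classify_risk(score):
    """Map a risk score in [0, 1] to (level, color, action)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("risk score must lie in [0, 1]")
    for upper, level, color, action in RISK_LEVELS:
        if score < upper:          # assumption: upper bounds are exclusive
            return level, color, action
    return RISK_LEVELS[-1][1:]     # score == 1.0 falls into the top level
```

A score of 0.65, for instance, returns the "high" level with the red display color and the two-hour turning recommendation.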
[0138] (v) Reinforcement learning-driven adaptive threshold adjustment mechanism
[0139] To address the issue of individual differences, this invention proposes an adaptive threshold algorithm based on sliding window statistics and reinforcement learning, which dynamically adjusts the anomaly detection threshold according to individual historical data and real-time status.
[0140] 1) The 99th percentile is calculated using a sliding window as the baseline, and the anomaly detection range is determined by combining the interquartile range. The parameter n is adjusted by reinforcement learning for adaptive optimization.
[0141] 2) Threshold optimization is achieved using the DQN algorithm: the state space includes current physiological parameters, historical statistical information, and alarm history; the action space is the adjustment of the n value (range 0.5-3.0, step size 0.1); the reward function is designed as R = α × (1 - false alarm rate) + β × (1 - false negative rate) + γ × user satisfaction, where α = 0.4, β = 0.5, and γ = 0.1.
[0142] 3) A multi-parameter collaborative threshold strategy is adopted: the heart rate threshold is determined based on the mean and standard deviation; the blood oxygen threshold is calculated based on the baseline value and interquartile range; and the pressure threshold is locally adaptively adjusted according to weight and body type.
[0143] 4) The mechanism is time-adaptive and can detect data distribution drift. When three consecutive alarms are not acknowledged, the current state is treated as a new baseline and the statistical benchmark is recalculated to prevent the accumulation of false alarms.
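The statistical side of this mechanism can be sketched as follows. The text does not give the exact formula linking the 99th-percentile baseline, the interquartile range, and n, so the form `baseline + n × IQR` used here is an assumption, as are the class name, the window size, and the linear-interpolation percentile. The DQN agent itself is omitted; only its action (a step change to n) and the stated reward function are shown.

```python
from collections import deque

def percentile(values, q):
    """Linear-interpolation percentile of a non-empty sequence, q in [0, 100]."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of empty data")
    rank = (len(data) - 1) * q / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (rank - lo)

class AdaptiveThreshold:
    def __init__(self, window_size=600, n=1.5):
        self.window = deque(maxlen=window_size)   # sliding window of recent readings
        self.n = n                                # multiplier tuned by the RL agent

    def update(self, value):
        self.window.append(value)

    def upper_limit(self):
        """Assumed form: 99th-percentile baseline plus n times the interquartile range."""
        baseline = percentile(self.window, 99)
        iqr = percentile(self.window, 75) - percentile(self.window, 25)
        return baseline + self.n * iqr

    def adjust_n(self, delta):
        """Apply an agent action (e.g. +0.1 or -0.1), clipped to the stated range 0.5-3.0."""
        self.n = min(3.0, max(0.5, round(self.n + delta, 1)))

def reward(false_alarm_rate, false_negative_rate, satisfaction,
           alpha=0.4, beta=0.5, gamma=0.1):
    """R = α(1 - false alarm rate) + β(1 - false negative rate) + γ · user satisfaction."""
    return (alpha * (1 - false_alarm_rate)
            + beta * (1 - false_negative_rate)
            + gamma * satisfaction)
```

Because β exceeds α, a missed event costs more reward than a false alarm, which matches the weighting stated in the text.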
[0144] (vi) Personalized model training strategies that integrate transfer learning and federated learning
[0145] This invention employs a two-stage transfer learning strategy to address the problem of scarce individual data. First, a basic model is pre-trained on a large-scale medical dataset (MIMIC-IV, PhysioNet), and then fine-tuned based on individual data to achieve personalized predictions.
[0146] 1) In the pre-training stage, the PhysioNet dataset is used for multi-task learning, including multi-classification of normal / abnormal states, regression prediction of physiological parameters in the next time step, and self-supervised contrastive learning to enhance feature representation capabilities.
[0147] 2) During the fine-tuning phase, individual baseline data is collected, and model parameters are updated incrementally through continuous learning. The bottom layers (the first two layers of the CNN and the embedding layer of the Transformer) are kept frozen, and only the subsequent convolutional layers, attention layers, and output layers are fine-tuned. L2 regularization is introduced to prevent overfitting on small samples.
[0148] 3) A hierarchical transfer architecture is constructed: the first layer transfers low-level visual features such as edge detection and texture recognition from ImageNet pre-trained models to identify spatial distribution patterns of the pressure matrix; the second layer transfers domain features from medical datasets to capture physiological signal patterns such as heart rate variability and blood oxygen fluctuations; the third layer trains task-specific layers using only individual data to learn the unique health characteristics of each user. For personalized adaptation, the MAML meta-learning algorithm is introduced to train rapid adaptation capabilities on multiple individual samples, enabling new users to achieve practical performance with only a small number of samples; simultaneously, a federated learning framework is used to achieve multi-user collaborative training, improving generalization ability through model parameter aggregation while ensuring data privacy.
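The parameter aggregation step of federated learning can be illustrated with a sample-weighted average in the style of FedAvg. The text only says that model parameters are aggregated, so the choice of FedAvg, the function name, and the dictionary-of-lists parameter layout are assumptions made for illustration.

```python
def federated_average(client_params, client_sizes):
    """Weighted mean of client parameters; weights are the clients' local sample counts.

    client_params: list of dicts mapping a layer name to a flat list of floats.
    client_sizes:  list of local sample counts, one per client.
    Only parameters leave each client; raw sensor data stays local.
    """
    total = sum(client_sizes)
    aggregated = {}
    for key in client_params[0]:
        length = len(client_params[0][key])
        aggregated[key] = [
            sum(params[key][i] * size
                for params, size in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated
```

For example, averaging `{"w": [1.0, 2.0]}` from a client with 1 sample and `{"w": [3.0, 6.0]}` from a client with 3 samples gives `{"w": [2.5, 5.0]}`, so clients with more data pull the shared model further toward their parameters.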
[0149] (vii) Three-layer neural network fusion architecture and emotion recognition
[0150] This invention integrates three core neural networks to construct a complete closed-loop system of "perception-analysis-interaction," enabling intelligent health management that ranges from physiological monitoring to emotional care.
[0151] 1) The first layer, the CNN spatial feature extraction network, is responsible for extracting spatial distribution features from the pressure matrix: it receives an 8×4 pressure matrix as input and outputs a 128-dimensional spatial feature vector after multi-layer convolution. This vector is used to identify changes in body position and locate high-pressure areas, providing basic spatial information for subsequent pressure ulcer risk assessment.
[0152] 2) The second-layer Transformer temporal modeling network is responsible for capturing multimodal temporal dependencies: the input includes spatial feature vectors and heart rate and blood oxygen temporal data, and the network models the correlation patterns at different time scales through a multi-head self-attention mechanism, outputting spatiotemporal fusion features to realize the functions of pressure ulcer risk prediction and sleep quality assessment.
[0153] 3) The third layer, EmotionNet, is responsible for speech emotion recognition and interaction. The input is speech acoustic features (including fundamental frequency, energy, formants, etc.). First, features are extracted using MFCC (Mel-frequency cepstral coefficients), then further extracted using a four-layer deep convolutional network. Next, LSTM layers model temporal changes in intonation. Finally, a Softmax classifier outputs six emotion recognition results (calm, joy, anxiety, sadness, anger, and fear) and their corresponding confidence scores. For each identified emotional state, the system employs a differentiated feedback mechanism: when anxiety or sadness is detected, soothing music is automatically played and voice accompaniment is provided; when anger or fear is detected, nursing staff are promptly notified to intervene, achieving closed-loop management of emotional care.
[0154] 4) The three networks are fused and jointly trained under a multi-task learning framework: the CNN provides the foundation for spatial awareness, the Transformer integrates temporal dependencies, and EmotionNet supplements the emotional interaction dimension; the three networks share the underlying feature representation and their parameters are optimized simultaneously through end-to-end backpropagation, so that the system achieves synergistic enhancement in physiological monitoring, health prediction, and emotional care.
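The final stage of EmotionNet, softmax classification over six emotions followed by differentiated feedback, can be sketched as below. The action identifiers such as `play_soothing_music` are hypothetical names, and returning no action for calm or joy is an assumption, since the text only specifies responses for anxiety, sadness, anger, and fear.

```python
import math

EMOTIONS = ["calm", "joy", "anxiety", "sadness", "anger", "fear"]

def softmax(logits):
    """Numerically stable softmax over a list of raw classifier scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_feedback(logits):
    """Return (emotion, confidence, actions) for one classifier output."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    emotion, confidence = EMOTIONS[idx], probs[idx]
    if emotion in ("anxiety", "sadness"):
        actions = ["play_soothing_music", "start_voice_companion"]   # hypothetical action names
    elif emotion in ("anger", "fear"):
        actions = ["notify_nursing_staff"]                           # hypothetical action name
    else:
        actions = []   # assumption: calm and joy need no intervention
    return emotion, confidence, actions
```

A deployment would likely also gate the actions on the confidence score, but the text gives no threshold, so none is applied here.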
Claims
1. An emotionally interactive intelligent nursing system based on multimodal perception, characterized in that, The system comprises: a perception layer, a control layer, a network layer, a cloud analytics layer, and a user interaction layer. The layers are interconnected through a data communication module. The perception layer and the control layer use I²C, UART, and ADC interfaces for data transmission; the control layer and the cloud analysis layer use a Wi-Fi module and MQTT communication protocol for data uploading and command issuance; and the cloud analysis layer and the user interaction layer use WebSocket for real-time information synchronization. The sensing layer includes a matrix pressure sensor array, a heart rate sensor, a blood oxygen sensor, a urine moisture sensor, and a temperature and humidity sensor. The matrix pressure sensor array is installed inside the nursing mattress, employing a flexible electrode cross structure to form 32 independent pressure sensing units, capable of acquiring high-resolution body pressure distribution information. The sensor array is connected to a sub-control device via a multiplexing circuit and an analog-to-digital converter module to achieve parallel acquisition of pressure at multiple points. The heart rate and blood oxygen sensors are installed in the areas of contact with the monitored person's body, while the urine moisture and temperature and humidity sensors are installed on the bed and in the environmental detection unit. Data from all sensors is aggregated to the sub-control device for preliminary filtering and feature extraction. 
The control layer includes a main control device and sub-control devices; the main control device is responsible for voice recognition, motor control, voice broadcasting, and communication with the cloud; it includes an AI voice module, a voice broadcasting module, a motor control module, and a communication module; the AI voice module recognizes the elderly person's voice commands through a voice recognition chip and analyzes voice features in conjunction with an emotion judgment algorithm; the voice broadcasting module is used to output soothing voices or health tips; the motor control module is used to drive bed adjustment or posture changes. The sub-control device is responsible for sensor data acquisition and edge analysis; it includes a data acquisition module and an edge computing module. The data acquisition module is connected to the sensors through I²C, UART, and ADC interfaces; the edge computing module executes filtering, anomaly detection, and posture recognition algorithms to achieve local decision-making. The main control device and the sub-control device of the control layer are connected through a serial port or local area network to form a data sharing and redundancy fault tolerance mechanism. The network layer includes a Wi-Fi communication module and an MQTT Broker server; the control layer connects to the MQTT Broker in the network layer via the Wi-Fi module to enable data publishing and subscription; the MQTT Broker, as a message middleware, is responsible for managing data transmission between the device and the cloud, providing reliable message queues, topic management, and state maintenance mechanisms. The cloud-based analysis layer includes a cloud server, a multimodal data preprocessing module, and an algorithm module. The cloud server is built on the Python Flask framework and is responsible for receiving multimodal data from the control layer: pressure matrix, heart rate, blood oxygen, urine moisture, and environmental parameters. 
The multimodal data preprocessing module is responsible for data reception, cleaning, time-series alignment, and storage management. The algorithm modules include: a multimodal spatiotemporal feature fusion prediction module, a pressure ulcer risk quantification assessment module, an adaptive dynamic threshold adjustment module, and a personalized health module based on transfer learning; The pressure ulcer risk quantification assessment module is based on a bidirectional GRU neural network and attention mechanism, combined with the automated enhancement of the Braden scoring system, to achieve quantitative scoring of pressure ulcer risk and location of high-risk areas, and provides 24-hour and 48-hour early warning capabilities. 1) The pressure ulcer risk quantification assessment module uses 18-dimensional comprehensive features as input: 6-dimensional pressure matrix features: maximum pressure, average pressure, pressure standard deviation, high pressure zone percentage, pressure entropy, and pressure gradient; 4-dimensional physiological parameters: heart rate, blood oxygen, body temperature, and respiratory rate; 4-dimensional temporal features: turning frequency, pressure duration, peak change rate, and nocturnal activity; and 4-dimensional individual information: age, BMI, activity level, and history of pressure ulcers. 2) The model structure is based on the improved ResNet18: The input layer maps 18-dimensional features to 64-dimensional features, and the feature dimensions are increased by four residual blocks: 64→128→256→512. After global average pooling for dimensionality reduction, the [0,1] risk score is output through Sigmoid activation. 
3) Establish a five-level risk classification system: 0.0-0.2 is extremely low risk and no intervention is required; 0.2-0.4 is low risk and routine observation is required; 0.4-0.6 is medium risk and the frequency of turning over should be increased; 0.6-0.8 is high risk and the patient should be turned over once every 2 hours; 0.8-1.0 is extremely high risk and immediate intervention should be provided in conjunction with a pressure-reducing mat. 4) Generate an 8×4 resolution risk heat map, and backpropagate the risk score to the pressure matrix, visually marking the shoulders, hips, and heels, providing nursing staff with precise turning guidance.
2. The emotional interactive intelligent nursing system based on multimodal perception as described in claim 1, characterized in that, The multimodal spatiotemporal feature fusion prediction module adopts a CNN-Transformer hybrid architecture. It extracts the spatial features of pressure distribution through a convolutional neural network (CNN), combines it with a Transformer to model time-series data, and achieves adaptive feature fusion through a gated multimodal fusion layer, thereby realizing intelligent assessment of sleep quality, pressure ulcer risk, and abnormal body position in the elderly. 1) Convolutional Neural Network (CNN) is responsible for spatial feature extraction: After reconstructing the 32×1 pressure matrix into an 8×4×1 tensor, it is processed by three layers of convolution, each layer is configured with ReLU activation and batch normalization, and dimensionality is reduced by 2×2 max pooling to output a 128-dimensional spatial feature vector, capturing local pressure concentration patterns and body position distribution features. 2) The Transformer is responsible for temporal feature modeling: the input is the concatenated spatial feature vector, heart rate and blood oxygen time series; an 8-head multi-head self-attention mechanism is used to capture the dependencies at different time scales, and sine-cosine position encoding is combined to preserve temporal information; after processing by two layers of fully connected feedforward network, the output is spatiotemporal fusion features, which model the body position reversal frequency and the duration of pressure peak. 3) The fusion output layer concatenates spatial features and temporal features, uses a three-layer MLP classifier, and configures Dropout with a value of 0.3; it sets up dual output heads: the pressure ulcer risk score uses Sigmoid activation to output the risk probability of [0,1], and the sleep quality evaluation uses Softmax activation to achieve multi-classification.
3. The emotional interactive intelligent nursing system based on multimodal perception as described in claim 1, characterized in that, The adaptive dynamic threshold adjustment module uses a deep Q-network algorithm combined with sliding window statistics to achieve adaptive adjustment of the detection threshold, adapting to different individual characteristics and environmental changes, and reducing false alarm rate and false negative rate. 1) The 99th percentile is calculated using a sliding window as the baseline, and the anomaly detection range is determined by combining the interquartile range. The parameter n is adaptively optimized by reinforcement learning. 2) Threshold optimization is achieved using the DQN algorithm: the state space includes current physiological parameters, historical statistics, and alarm history; the action space is to adjust the value of n, ranging from 0.5 to 3.0, with a step size of 0.1; the reward function is designed as R = α × (1 - false positive rate) + β × (1 - false negative rate) + γ × user satisfaction, where α = 0.4, β = 0.5, and γ = 0.1. 3) Employ a multi-parameter collaborative threshold strategy: heart rate threshold is determined based on mean and standard deviation; blood oxygen threshold is calculated based on baseline value and interquartile range; pressure threshold is locally adaptively adjusted based on weight and body type. 4) It has the ability to adapt to time series and detect data distribution drift; when three consecutive alarms are not confirmed, it is determined to be a new baseline and the statistical benchmark is recalculated to prevent the accumulation of false alarms.
4. The emotional interactive intelligent nursing system based on multimodal perception as described in claim 1, characterized in that, The personalized health module based on transfer learning utilizes publicly available medical datasets for model pre-training, combines online learning mechanisms to achieve personalized model fine-tuning, and adopts a multi-task learning architecture to simultaneously optimize pressure ulcer prediction, sleep assessment, and abnormality detection. 1) The pre-training stage utilizes the dataset for multi-task learning, including multi-classification of normal / abnormal states, regression prediction of physiological parameters in the next time step, and self-supervised contrastive learning to enhance feature representation capabilities; 2) During the fine-tuning phase, individual baseline data is collected, and the model parameters are updated incrementally through continuous learning; the bottom layers remain frozen, and only the subsequent convolutional layers, attention layers, and output layers are fine-tuned, and L2 regularization is introduced to prevent overfitting on small samples; the bottom layers include: the first two layers of the CNN and the embedding layer of the Transformer; 3) Construct a hierarchical transfer architecture: the first layer transfers edge detection and texture recognition from the pre-trained model to identify the spatial distribution pattern of the pressure matrix; the second layer transfers domain features from the medical dataset to capture heart rate variability and blood oxygen fluctuations; the third layer trains task-specific layers using only individual data to learn the user's unique health characteristics; in terms of personalized adaptation, the MAML meta-learning algorithm is introduced to train rapid adaptation capabilities on multiple individual samples; at the same time, a federated learning framework is adopted to achieve multi-user collaborative training, and the generalization ability is improved by aggregating model parameters while ensuring data privacy.
5. The method of an emotional interactive intelligent nursing system based on multimodal perception as described in claim 1, characterized in that, The method includes: Step 1: Data Acquisition and Edge Preprocessing Stage: Various sensors in the sensing layer acquire data in real time according to differentiated sampling strategies: the pressure matrix continuously acquires body pressure distribution information at a frequency of 10Hz; the heart rate and blood oxygen sensor acquires physiological parameters at a frequency of 1Hz; the urine moisture sensor adopts an event-triggered mode; the ambient temperature and humidity sensor samples periodically at a frequency of 0.1Hz; after receiving the raw sensor data, the sub-control device performs preliminary processing using the built-in edge computing module; Kalman filtering is used to eliminate acquisition noise, and Min-Max normalization is used to map various types of data to a unified dimension, extracting basic features in the time and frequency domains, laying the data foundation for subsequent analysis; Step 2: Real-time Edge Detection and Rapid Response Phase: The sub-control device deploys a lightweight anomaly detection algorithm for rapid response in emergencies; the bed-off detection algorithm continuously monitors the total pressure matrix, and determines a bed-off event when the value is below a preset threshold for more than 5 seconds; the urine wetness detection algorithm identifies urinary incontinence by monitoring sudden changes in resistance value; the physiological extreme anomaly detection algorithm compares the normal heart rate range with the safe blood oxygen saturation line in real time, and immediately triggers an alarm if it exceeds the safe range; when an abnormal event is detected, the sub-control device immediately triggers a local alarm mechanism, realizing multimodal alarm through voice prompts, buzzers, and LED indicators. 
The end-to-end response delay is kept at the millisecond level to ensure timely handling of emergencies. Step 3: Cloud-based Intelligent Analysis and Health Assessment Stage: The sub-control device uploads a complete data packet to the cloud every minute via the MQTT protocol; after receiving the data, the cloud analysis layer initiates parallel processing of a three-layer neural network: the CNN network extracts spatial distribution features from the pressure matrix to identify posture patterns and high-pressure areas; the Transformer network integrates spatial features with time-series data of heart rate and blood oxygen to model the temporal dependence of physiological parameters; the EmotionNet network is triggered on demand in voice interaction scenarios to analyze the user's emotional state; the multimodal health assessment model integrates the output results of the three networks to generate a pressure ulcer risk score and a heat map of high-risk areas, evaluate sleep quality levels, and identify emotional state categories; simultaneously, the dynamic threshold adjustment algorithm based on reinforcement learning adaptively optimizes various alarm parameters according to individual historical data and real-time status through the DQN algorithm, minimizing the false alarm rate while reducing the false negative rate; Step 4: Tiered Early Warning Stage: The cloud generates tiered early warning information based on health assessment results and risk levels. When a high-risk situation occurs, the system pushes the early warning information to the control layer in real time via the MQTT protocol, triggering audible and visual alarms and voice broadcasts. Simultaneously, notification messages are pushed to the nursing and family terminals via the WebSocket protocol. The message content includes risk score values, heat maps of high-risk areas, and targeted nursing suggestions. 
For different emotional states, the system executes differentiated interaction strategies: when anxiety or sadness is detected, preset soothing music is automatically played and the voice companion function is activated; when anger or fear is identified, nursing staff are notified first for manual intervention, achieving closed-loop management of emotional care. Step 5: Health Record Management and Continuous Model Optimization Phase: The system automatically generates health reports on a daily, weekly, and monthly basis, and displays the pressure ulcer risk change curve and sleep quality trend through visual charts to provide decision support for medical staff; the cloud database continuously stores all sensor data, model output results, and user feedback information to form a complete data loop; the AI model adopts a periodic retraining mechanism to update transfer learning parameters based on accumulated data, optimize dynamic threshold strategies, and expand personalized health records.