A Method for Emotion Mapping and Interaction Control of Multimodal Human Habitat Data

By using multimodal data fusion and adaptive interactive control, the problems of single data collection dimensions and inaccurate emotional mapping in smart elderly care have been solved. This has enabled accurate identification of the emotional state of elderly users and dynamic adaptation to the environment, improving living comfort and health levels, and promoting the integration of smart elderly care and the big health industry.

CN122308192A · Pending · Publication Date: 2026-06-30 · TIANJIN XINGHE TECHNOLOGY DEVELOPMENT CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
TIANJIN XINGHE TECHNOLOGY DEVELOPMENT CO LTD
Filing Date
2026-03-30
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies for controlling the living environment in the field of smart elderly care suffer from problems such as limited data collection dimensions, inaccurate emotional mapping, poor interactive adaptability, and lack of integration with the broader health and wellness landscape, thus failing to meet the multidimensional needs of elderly users.

Method used

By integrating multimodal data collection, accurately mapping emotional states, and adaptive interactive control, a method for emotional mapping and interactive control of multimodal human settlement environment data is constructed. This method includes multimodal data collection, feature extraction, emotional mapping, and adaptive environmental adjustment. In combination with the elderly care scenario and the physiological characteristics of elderly users, a convenient interaction method is designed.

Benefits of technology

It achieves dynamic adaptation between the living environment and the emotional state of elderly users, improves living comfort and physical and mental health, and promotes the deep integration of smart elderly care with the silver economy and the big health industry.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122308192A_ABST

Abstract

This invention belongs to the field of adaptive interactive control technology and discloses a method for emotion mapping and interactive control of multimodal living environment data, including the following steps: Step S1, multimodal living environment and user data collection; Step S2, multimodal data fusion and feature extraction; Step S3, emotion mapping of multimodal data; Step S4, emotion-driven interactive control of the living environment; Step S5, data storage and traceability management. This invention employs the above-mentioned method for emotion mapping and interactive control of multimodal living environment data to achieve adaptive interactive control of the living environment through precise mapping of multi-dimensional environmental data with the emotional state of elderly users, thus contributing to the high-quality development of the silver economy and safeguarding the physical and mental health of the elderly.

Description

Technical Field

[0001] This invention relates to the field of adaptive interactive control technology, and in particular to a method for emotion mapping and interactive control of multimodal human living environment data.

Background Technology

[0002] Guided by the concept of holistic health, elderly care services have shifted from basic daily care to comprehensive care encompassing physiological, psychological, and environmental dimensions. As the core setting for the daily activities of the elderly, the comfort and suitability of the living environment directly affect their emotional state and physical and mental health. In turn, changes in emotional state influence the elderly's needs for the living environment, forming a two-way relationship between environment and emotion.

[0003] Current technologies for controlling the living environment in the field of smart elderly care are mostly limited to adjusting environmental parameters in a single dimension (such as temperature and humidity) or to single-modal emotion recognition (such as facial expression recognition). These technologies suffer from the following core shortcomings: First, they rely on a single data collection dimension, failing to achieve multimodal fusion of environmental, physiological, and behavioral data. This makes it impossible to comprehensively capture the real needs and emotional changes of elderly users. For example, temperature regulation alone cannot accommodate the different physical sensations experienced by elderly users due to anxiety. Second, they lack an effective emotion mapping mechanism, failing to establish a link between multimodal data and the emotional states of elderly users (pleasure, calmness, anxiety, loneliness, discomfort, etc.). The precise correspondence between the two is lacking, and the disconnect between environmental adjustment and emotional needs makes it difficult to achieve emotionally adaptive environmental control. Third, the interaction control mode is rigid, mostly based on fixed threshold triggers or manual control, and cannot adaptively adjust according to the dynamic changes in the emotional state of elderly users. Furthermore, it does not take into account the physiological characteristics of the elderly (such as the high incidence of chronic diseases and mobility difficulties) to design convenient interaction methods, resulting in poor adaptability. Fourth, the existing technology has not fully integrated the needs of the silver economy and the big health industry, and has not deeply integrated emotional state monitoring, environmental adaptation, and big health scenarios such as chronic disease management and psychological comfort, thus failing to meet the dual needs of the elderly for both physical and mental health.

[0004] Therefore, developing an interactive control method that can integrate multimodal human living environment data, accurately realize emotional mapping, adapt to the needs of the elderly, and link with big health scenarios has become the key to solving the current pain points of smart elderly care and promoting the upgrading of the silver economy. It also conforms to the big health industry's development concept of prevention first and protection throughout the whole process.

Summary of the Invention

[0005] The purpose of this invention is to provide a method for emotion mapping and interaction control of multimodal human living environment data. By integrating and collecting multimodal data, accurately mapping emotional states, and implementing adaptive interaction control, this method solves the problems of existing technologies, such as single data dimensions, inaccurate emotion mapping, poor interaction adaptability, and lack of linkage with big health scenarios. It achieves dynamic adaptation between the living environment and the emotional state of elderly users, improves the living comfort and physical and mental health of elderly users, and promotes the deep integration of smart elderly care with the silver economy and the big health industry.

[0006] To achieve the above objectives, this invention provides a method for emotion mapping and interaction control of multimodal human settlement environment data, comprising the following steps:
Step S1, Multimodal Human Settlement Environment and User Data Collection: A multimodal data collection terminal is built and deployed to simultaneously collect human settlement environment data, elderly user physiological data, and elderly user behavioral data, forming a multimodal raw dataset.
Step S2, Multimodal Data Fusion and Feature Extraction: A multimodal fusion network fuses the features of the multimodal dataset and extracts features that characterize the relationship between environment, physiology, and behavior.
Step S3, Sentiment Mapping of Multimodal Data: A Transformer-based sentiment mapping model is constructed to map the fused feature vectors to the emotional state of elderly users. Sentiment labels are defined according to the emotional characteristics of elderly users in the elderly care scenario, so that emotional states are accurately identified and quantified.
Step S4, Emotion-Driven Interactive Control of the Living Environment: Based on the emotion mapping results, an adaptive interactive control strategy is constructed that links living environment adjustment equipment, health monitoring equipment, and emotional comfort equipment to achieve dynamic adjustment of, and emotional interaction with, the living environment.
Step S5, Data Storage and Traceability Management.

[0007] Preferably, in step S1, the multimodal human living environment and user data collection process is as follows:
Step S11, Human Settlement Environment Data Acquisition: Environmental parameters are collected in real time through an environmental sensor array at a sampling frequency of once every 30 s. The collected environmental parameters are temperature (T), humidity (H), light intensity (L), noise intensity (N), air quality index (AQI), PM2.5 concentration (P), and CO2 concentration (C). Environmental scene parameters are collected at the same time: the scene type (bedroom, living room, bathroom, or rehabilitation area) and the time period (morning, daytime activities, or nighttime rest). The resulting environmental data subset E is: $E = \{T, H, L, N, \mathrm{AQI}, P, C, S, \mathrm{Tm}\}$, where S is the scene type, coded 1 to 4 for bedroom, living room, bathroom, and rehabilitation area, and Tm is the time period, coded 1 to 3 for morning, daytime activities, and nighttime rest.
Step S12, Physiological Data Collection for Elderly Users: Physiological parameters are collected through wearable devices and non-contact monitoring devices at a sampling frequency of once per minute. The collected physiological parameters are heart rate (HR), systolic blood pressure ($BP_s$), diastolic blood pressure ($BP_d$), body temperature (Temp), respiratory rate (RR), and blood oxygen saturation (SpO2). For elderly users with chronic diseases, additional chronic disease parameters are collected, including blood glucose (GLU) for diabetic users and heart rate variability (HRV) for users with cardiovascular or cerebrovascular disease. The resulting physiological data subset P is: $P = \{HR, BP_s, BP_d, Temp, RR, SpO2, GLU, HRV\}$, where the chronic disease parameters are collected selectively according to the user's situation; if the user has no relevant chronic disease, the parameter is left blank.
Step S13, Behavioral Data Collection for Elderly Users: The behavior and voice characteristics of elderly users are collected through visual acquisition devices, motion sensors, and voice acquisition devices at a sampling frequency of once every 10 s. The collected behavioral parameters are: behavioral actions, coded 1 to 6 for lying down, sitting, walking, eating, rehabilitation training, and abnormal emotional actions; the speech features speech rate ($V_s$), intonation ($V_t$), and voice energy ($E_v$); and facial expression features, extracted from contour features and coded 1 to 5 for joy, calmness, neutrality, anxiety, and sadness. The resulting behavioral data subset B is: $B = \{Act, V_s, V_t, E_v, Exp\}$, where Act is the action code and Exp is the facial expression code.
Step S14, Data Preprocessing: The collected multimodal raw data is cleaned to remove outliers and handle missing values, and then standardized. Min-max standardization maps all parameters to the interval [0, 1] to obtain the standardized multimodal dataset: $D = \{E', P', B'\}$, where E', P', and B' are the standardized environmental, physiological, and behavioral data, respectively. The standardization formula is: $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$, where x is the original value, x' is the standardized value, $x_{\min}$ is the minimum value of the parameter, and $x_{\max}$ is the maximum value of the parameter. Missing values are filled with the mean value for the same time period and the same scene to keep the dataset complete.

[0008] Preferably, in step S2, the multimodal data fusion and feature extraction process is as follows:
Step S21, Single-modal Feature Extraction: Features are extracted separately from the environmental data subset E', the physiological data subset P', and the behavioral data subset B'.
Step S22, Multimodal Feature Fusion: A similarity loss, a difference loss, and a reconstruction loss are introduced to construct a multimodal fusion model that merges the environmental, physiological, and behavioral single-modal feature vectors into a unified fusion feature vector F.

[0009] Preferably, in step S21, the specific process of single-modal feature extraction is as follows:
Step S211, Environmental Feature Extraction: A convolutional neural network extracts the spatial features of the environmental data. The input is the 8-dimensional parameter E'. Features are extracted through two convolutional layers (kernel size 3×3, stride 1) and one pooling layer (kernel size 2×2, stride 2), and the output is the environmental feature vector.
Step S212, Physiological Feature Extraction: A Long Short-Term Memory (LSTM) network extracts the temporal features of the physiological data. The input is the 8-dimensional parameter P'. The network has two hidden layers of 128 neurons each, and the output is the physiological feature vector.
Step S213, Behavioral Feature Extraction: An attention mechanism extracts the key features of the behavioral data. The input is the 5-dimensional parameter B'. Attention weights are allocated across the input, and the output is the behavioral feature vector.

[0010] Preferably, in step S22, the specific process of multimodal feature fusion is as follows:
Step S221, Similarity Loss Calculation: The similarity loss enables information sharing among the modalities and reduces their distribution differences [formula missing in source]. The formula uses the order of the central moments, the k-th order central moment, and the L2 norm.
Step S222, Difference Loss Calculation: The difference loss helps the model distinguish the feature differences corresponding to different emotions [formula missing in source], where n is the number of samples; tr(·) denotes the trace of a matrix; K, M, and N are the kernel matrices corresponding to the environmental, physiological, and behavioral feature vectors, respectively; and L, H, H', and H'' are auxiliary matrices.
Step S223, Reconstruction Loss Calculation: The reconstruction loss ensures that the fused features capture the key details of each modality and avoids trivial representations [formula missing in source]. The formula uses the reconstruction results of the environmental, physiological, and behavioral features, respectively.
Step S224, Feature Fusion Output: A cross-modal attention mechanism interactively enhances the single-modal features, the fusion model is trained with the loss functions above, and the fused feature vector F is output [formula missing in source]. The formula uses a weight matrix for each of the three single-modal feature vectors, the feature dimension, and feature concatenation; the Softmax function normalizes the weights.

[0011] Preferably, in step S3, the sentiment mapping of the multimodal data is carried out as follows:
Step S31, Emotion Label Definition: Based on the physiological and psychological characteristics of the elderly population, five emotional state labels are defined and assigned quantitative values for subsequent interaction control.
Step S32, Sentiment Mapping Model Training: The improved Transformer model is trained with the fused feature vector F as input and the sentiment label quantification value as output. The model contains 4 encoder layers and 2 decoder layers; each encoder layer contains a multi-head attention mechanism and a fully connected layer, and the decoder layers output the sentiment quantification value. Training uses mean squared error as the loss function and the Adam optimizer, with a learning rate of 1e-4 and 100 training rounds, until the model converges. The loss function is: $L_{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$, where N is the number of training samples, $y_i$ is the true sentiment quantification value of the i-th sample, and $\hat{y}_i$ is the sentiment quantification value predicted by the model.
Step S33, Emotional State Output: The fusion feature vector F extracted in real time is input into the trained emotion mapping model, which outputs the real-time emotional quantification value of the elderly user. The sentiment state label is determined from the range in which the value falls, completing the mapping from multimodal data to sentiment state. In addition, the chronic disease parameters are combined with this value to output the correlation R between sentiment state and chronic disease [formula missing in source], where m is the number of chronic disease parameters; the formula also uses the standardized value of the i-th chronic disease parameter, the mean value of the chronic disease parameters, and the mean of the sentiment quantification values; R ∈ [-1, 1].
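The two scalar quantities named in steps S32 and S33 can be sketched in a few lines. The mean squared error follows the loss described above. The correlation is written here as a Pearson coefficient, which is an assumption: the source formula is not reproduced, and Pearson is simply one coefficient that lies in [-1, 1] and is built from the means the text mentions. The function names `mse` and `pearson` are illustrative.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between true and predicted quantification values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(xs, ys):
    """Pearson correlation in [-1, 1] (assumed form of the correlation R)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (norm_x * norm_y)


# Hypothetical values: standardized blood glucose readings against the
# emotion quantification values observed at the same times.
glucose = [0.2, 0.4, 0.6, 0.8]
emotion = [0.9, 0.7, 0.5, 0.3]
r = pearson(glucose, emotion)
```

In this made-up series the emotion value falls as blood glucose rises, so `r` is strongly negative.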

[0012] Preferably, in step S31, five categories of emotional state labels are defined and assigned quantitative values, as follows:
Tag 1: Pleasure, quantitative value S1 = 0.8 to 1.0: the elderly user is physically and mentally comfortable, with positive emotions, normal physiological parameters, and relaxed behavior.
Tag 2: Calm, quantitative value S2 = 0.6 to 0.8: the elderly user has stable emotions without obvious fluctuations, normal physiological parameters, and calm behavior.
Tag 3: Neutral, quantitative value S3 = 0.4 to 0.6: the elderly user shows no obvious emotional tendency, with basically normal physiological parameters and normal behavior.
Tag 4: Anxiety, quantitative value S4 = 0.2 to 0.4: the elderly user is emotionally tense and irritable, with abnormal physiological parameters and restless behavior.
Tag 5: Discomfort, quantitative value S5 = 0.0 to 0.2: the elderly user is physically unwell or in low mood, with obviously abnormal physiological parameters and abnormal behavior.
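The five label ranges can be read as a simple threshold lookup. The sketch below is a minimal illustration, not the patented model; it assigns each shared boundary value (0.2, 0.4, 0.6, 0.8) to the higher band, which matches the half-open ranges given for the adjustment strategies in paragraph [0014]. The names `BANDS` and `emotion_label` are illustrative.

```python
# Lower bound of each band, highest first; the label names follow Tags 1 to 5.
BANDS = [
    (0.8, "pleasure"),
    (0.6, "calm"),
    (0.4, "neutral"),
    (0.2, "anxiety"),
    (0.0, "discomfort"),
]


def emotion_label(value):
    """Map an emotion quantification value in [0, 1] to its label."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("quantification value must lie in [0, 1]")
    for lower, name in BANDS:
        if value >= lower:
            return name


labels = [emotion_label(v) for v in (0.95, 0.8, 0.55, 0.2, 0.05)]
```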

[0013] Preferably, in step S4, the emotion-driven interactive control of the living environment is implemented as follows:
Step S41, Control Parameter Initialization: A basic threshold range is set for each environmental parameter according to the elderly care scenario and time period. The basic thresholds are then adjusted for the individual user's age and chronic disease status to form a basic control parameter set [formula missing in source].
Step S42, Adaptive Environment Adjustment: The adjustment amount of each environmental parameter is calculated from the emotion quantification value and the emotion label obtained from the emotion mapping. The adjustment magnitude is positively correlated with the emotion quantification value [formula missing in source], where k is the adjustment coefficient, ranging from 0.1 to 0.3, and 0.5 is the midpoint of the neutral emotion range. When the emotion quantification value is greater than 0.5, the adjustment amount is positive; when it is less than 0.5, the adjustment amount is negative.
Step S43, Interactive Feedback and Model Optimization: Multimodal data of the elderly user is collected in real time after the adjustment, steps S1 to S3 are repeated to obtain the post-adjustment emotion quantification value, and the degree of emotional improvement is calculated [formula missing in source].
Step S44, Convenient Interaction Supplement: Multimodal convenient interaction methods are provided, including voice interaction, gesture interaction, and one-click call, together with support for remote interaction by family members.
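Step S42 can be illustrated with a small sketch. The source formula is not reproduced, so the form used here is an assumption: the adjustment is taken as k times the distance of the emotion value from 0.5, scaled by the width of the parameter's basic threshold range, and the result is clamped to that range. This reproduces the sign behaviour described above (positive above 0.5, negative below), but the scaling by range width, the clamping, and the names `adjustment` and `apply_adjustment` are all hypothetical.

```python
def adjustment(emotion_value, lower, upper, k=0.2):
    """Signed adjustment amount for one environmental parameter.

    Assumed form: k * (emotion_value - 0.5) * (upper - lower).
    """
    if not 0.1 <= k <= 0.3:
        raise ValueError("adjustment coefficient k must lie in [0.1, 0.3]")
    return k * (emotion_value - 0.5) * (upper - lower)


def apply_adjustment(current, emotion_value, lower, upper, k=0.2):
    """Apply the adjustment and keep the result inside the basic range."""
    target = current + adjustment(emotion_value, lower, upper, k)
    return min(upper, max(lower, target))


# Hypothetical bedroom temperature range of 20 to 26 degrees Celsius.
new_temp = apply_adjustment(24.0, emotion_value=0.3, lower=20.0, upper=26.0)
```

The sketch moves every parameter in the same direction. The strategies in paragraph [0014] lower some parameters and raise others for the same emotional state, so a fuller version would carry a per-parameter sign.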

[0014] Preferably, environmental parameters are adjusted in a targeted way for each emotional state in order to alleviate negative emotions. The specific adjustment strategies, by emotion quantification value, are as follows:
(1) Pleasure, 0.8 ≤ value ≤ 1.0: Fine-tune the environmental parameters to the upper limit of comfort and activate the emotional comfort equipment.
(2) Calm, 0.6 ≤ value < 0.8: Maintain the environmental parameters within the basic threshold range; no additional adjustment is required.
(3) Neutral, 0.4 ≤ value < 0.6: Slightly adjust the environmental parameters toward the middle of the basic threshold range.
(4) Anxiety, 0.2 ≤ value < 0.4: Reduce temperature, noise, and light intensity; increase humidity; play soothing music; and turn off irrelevant noise sources. If the correlation R > 0.7, push chronic disease relief suggestions at the same time.
(5) Discomfort, 0 ≤ value < 0.2: Adjust the environmental parameters accordingly, trigger a health alert, send warning messages to family members and elderly care workers, and activate emergency care equipment.
The degree of improvement in emotional state is evaluated as follows:
(1) Excellent: improvement ≥ 0.2, meaning the emotional state has shifted from negative to calm or pleasant.
(2) Good: 0.1 ≤ improvement < 0.2, meaning the emotional state has improved significantly.
(3) Qualified: 0 < improvement < 0.1, meaning the emotional state has improved slightly.
(4) Unqualified: improvement ≤ 0, meaning the emotional state has not improved or has worsened.
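The strategy list and the grading criteria above amount to a lookup plus a threshold check, sketched below. The action strings paraphrase the listed strategies. Treating the degree of improvement as the post-adjustment value minus the pre-adjustment value is an assumption, since the source formula is not reproduced. The names `STRATEGIES`, `actions_for`, and `improvement_grade` are illustrative.

```python
STRATEGIES = {
    "pleasure": ["fine-tune parameters to the upper comfort limit",
                 "activate emotional comfort equipment"],
    "calm": ["keep parameters within the basic threshold range"],
    "neutral": ["nudge parameters toward the middle of the basic range"],
    "anxiety": ["reduce temperature, noise and light intensity",
                "increase humidity",
                "play soothing music",
                "turn off irrelevant noise sources"],
    "discomfort": ["adjust environmental parameters",
                   "trigger health alert",
                   "notify family members and care workers",
                   "activate emergency care equipment"],
}


def actions_for(label, r=0.0):
    """Return the control actions for an emotion label and correlation R."""
    actions = list(STRATEGIES[label])
    if label == "anxiety" and r > 0.7:
        actions.append("push chronic disease relief suggestions")
    return actions


def improvement_grade(before, after):
    """Grade the change in emotion value (assumed to be after - before)."""
    # Rounding avoids floating-point noise right at the 0.1 and 0.2 cut-offs.
    delta = round(after - before, 6)
    if delta >= 0.2:
        return "excellent"
    if delta >= 0.1:
        return "good"
    if delta > 0:
        return "qualified"
    return "unqualified"


plan = actions_for("anxiety", r=0.82)
grade = improvement_grade(before=0.3, after=0.65)
```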

[0015] Preferably, in step S5, data storage and traceability management specifically includes the following. An encrypted data storage server is built to store the multimodal raw data, standardized data, fusion features, sentiment mapping results, interaction control records, and health warning information. Blockchain technology is used for data traceability, ensuring the security, integrity, and traceability of the data. The data is anonymized to protect the privacy of elderly users. The data storage period is set to 1 year, and the data is backed up regularly for model optimization, elderly care service optimization, and big health data analysis.
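The traceability idea in step S5 can be illustrated with a hash chain, in which each stored record carries the hash of the previous one so that later tampering is detectable. This is a deliberate simplification and not a blockchain: there is no distribution or consensus, and the source does not specify the ledger design. The salted-hash pseudonym is likewise only one possible anonymization approach. All names (`anonymize`, `append_record`, `verify`) are illustrative.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first record


def anonymize(user_id, salt):
    """Replace a user identifier with a salted SHA-256 pseudonym."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:16]


def _digest(prev_hash, payload):
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append a record linked to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "payload": payload,
                  "hash": _digest(prev_hash, payload)})
    return chain


def verify(chain):
    """Recompute every hash; False means a record was altered or reordered."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != _digest(prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
user = anonymize("user-001", salt="demo-salt")
append_record(chain, {"user": user, "emotion": 0.35, "action": "play soothing music"})
append_record(chain, {"user": user, "emotion": 0.62, "action": "none"})
intact = verify(chain)
```

Changing any stored payload afterwards makes `verify` return False, which is the property the traceability requirement relies on.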

[0016] Therefore, the present invention employs the above-mentioned method for emotion mapping and interaction control of multimodal human settlement environment data, and the beneficial effects are as follows: (1) Improve the accuracy of emotion recognition and environmental adaptation: Through multimodal data fusion and improved emotion mapping model, the physiological state, behavioral characteristics and living environment parameters of elderly users are fully captured, and the emotional state of elderly users is accurately identified; at the same time, based on the adaptive environmental adjustment of emotional state, the living environment and the emotional needs of elderly users are dynamically adapted, thereby improving the living comfort and psychological pleasure of elderly users.

[0017] (2) Adapting to the needs of elderly care, silver economy and big health: It is tailored to the physiological characteristics (high incidence of chronic diseases, inconvenience of movement) and psychological characteristics (easily lonely, anxious) of the elderly population, linking functions such as chronic disease management, health early warning, and emotional comfort, promoting the upgrading of smart elderly care services to all dimensions and high quality, helping the large-scale development of the silver economy industry, and providing a new path for the big health industry to link environment-emotion-health.

[0018] (3) Convenient interaction and strong adaptability: A multimodal convenient interaction method is designed around the operating ability of the elderly. It supports voice, gesture, one-click call, and remote care, which reduces operating difficulty and suits elderly users of different ages and physical conditions. In addition, the control strategy is personalized for different elderly care scenarios, time periods, and chronic diseases, so that it fits home care, community care, institutional care, and other settings.

[0019] (4) Data security and traceability with industrial promotion value: The use of encrypted storage and blockchain traceability technology protects the privacy data of elderly users and ensures the security and integrity of the data. At the same time, the stored multimodal data and emotional data are used for the optimization of elderly care services, model iteration and upgrading and big health data analysis, providing technical and data support for enterprises related to the silver economy. It has broad industrial promotion value and can promote the deep integration of smart elderly care with the silver economy and the big health industry.

[0020] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Attached Figure Description

[0021] Figure 1 is a flowchart of the method for emotion mapping and interaction control of multimodal human settlement environment data according to the present invention;
Figure 2 is a schematic diagram of the process of multimodal data fusion and feature extraction in this invention;
Figure 3 is a schematic diagram of the sentiment mapping process for multimodal data in this invention;
Figure 4 is a schematic diagram of the process of emotion-driven interactive control of the living environment according to the present invention.

Detailed Implementation

[0022] The technical solution of the present invention will be further described below with reference to the accompanying drawings and embodiments.

[0023] As shown in Figure 1, the present invention provides a method for emotion mapping and interaction control of multimodal human settlement environment data, comprising the following steps:
Step S1: Multimodal human settlement environment and user data collection.

[0024] A multimodal data collection terminal is built and deployed in home-based, community-based, or institutional elderly care scenarios to simultaneously collect three types of core data: living environment data, elderly user physiological data, and elderly user behavioral data, forming a multimodal raw dataset.

[0025] Step S11, Human Settlement Environment Data Acquisition: Environmental parameters are collected in real time through an environmental sensor array (temperature sensor, humidity sensor, light sensor, noise sensor, air quality sensor, PM2.5 sensor, CO2 sensor), with a sampling frequency of 1 time / 30s.

[0026] The collected environmental parameters are temperature T (°C), humidity H (%RH), light intensity L (lux), noise intensity N (dB), air quality index AQI, PM2.5 concentration P (μg/m³), and CO2 concentration C (ppm). Environmental scene parameters are collected at the same time: the scene type (bedroom, living room, bathroom, or rehabilitation area) and the time period (morning, daytime activities, or nighttime rest). The resulting environmental data subset E is: $E = \{T, H, L, N, \mathrm{AQI}, P, C, S, \mathrm{Tm}\}$, where S is the scene type, coded 1 to 4 for bedroom, living room, bathroom, and rehabilitation area, and Tm is the time period, coded 1 to 3 for morning, daytime activities, and nighttime rest.

[0027] Step S12: Physiological data collection for elderly users: Physiological parameters of elderly users are collected through wearable devices (smart bracelets, heart rate patches, blood pressure monitors) and non-contact monitoring devices (millimeter-wave radar, infrared monitors), with a sampling frequency of 1 time / 1 minute.

[0028] The collected physiological parameters are heart rate HR (beats/min), systolic blood pressure $BP_s$ and diastolic blood pressure $BP_d$ (mmHg), body temperature Temp (℃), respiratory rate RR (breaths/min), and blood oxygen saturation SpO2 (%). For elderly users with chronic diseases, additional chronic disease parameters are collected, including blood glucose GLU (mmol/L) for diabetic users and heart rate variability (HRV) for users with cardiovascular or cerebrovascular disease. The resulting physiological data subset P is: $P = \{HR, BP_s, BP_d, Temp, RR, SpO2, GLU, HRV\}$, where the chronic disease parameters are collected selectively according to the user's situation; if the user has no relevant chronic disease, the parameter is left blank.

[0029] Step S13: Collect elderly user behavior data: Collect the behavior and voice characteristics of elderly users through high-definition visual acquisition equipment, motion sensors, and voice acquisition equipment, with a sampling frequency of 1 time / 10s.

[0030] The collected behavioral parameters are: behavioral actions, coded 1 to 6 for lying down, sitting, walking, eating, rehabilitation training, and abnormal emotional actions; the speech features speech rate $V_s$ (words/min), intonation $V_t$ (Hz), and voice energy $E_v$ (dB); and facial expression features, extracted from contour features and coded 1 to 5 for joy, calm, neutral, anxiety, and sadness. The resulting behavioral data subset B is: $B = \{Act, V_s, V_t, E_v, Exp\}$, where Act is the behavioral action code and Exp is the facial expression code.
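The three data subsets collected in steps S11 to S13 can be sketched as typed records, which makes the coded fields and the optional chronic disease parameters explicit. The field names follow the symbols in the text; the class names, the code tables, and the validation are illustrative assumptions rather than part of the disclosed method.

```python
from dataclasses import dataclass
from typing import Optional

SCENES = {1: "bedroom", 2: "living room", 3: "bathroom", 4: "rehabilitation area"}
PERIODS = {1: "morning", 2: "daytime activities", 3: "nighttime rest"}
ACTIONS = {1: "lying down", 2: "sitting", 3: "walking", 4: "eating",
           5: "rehabilitation training", 6: "abnormal emotional action"}
EXPRESSIONS = {1: "joy", 2: "calm", 3: "neutral", 4: "anxiety", 5: "sadness"}


@dataclass
class EnvSample:
    """One environmental reading (sampled every 30 s)."""
    T: float     # temperature, degrees Celsius
    H: float     # humidity, %RH
    L: float     # light intensity, lux
    N: float     # noise intensity, dB
    AQI: float   # air quality index
    P: float     # PM2.5 concentration, micrograms per cubic metre
    C: float     # CO2 concentration, ppm
    S: int       # scene type code, 1 to 4
    Tm: int      # time period code, 1 to 3

    def __post_init__(self):
        if self.S not in SCENES or self.Tm not in PERIODS:
            raise ValueError("unknown scene or time period code")


@dataclass
class PhysioSample:
    """One physiological reading (sampled every minute)."""
    HR: float
    BPs: float
    BPd: float
    Temp: float
    RR: float
    SpO2: float
    GLU: Optional[float] = None  # left blank unless the user is diabetic
    HRV: Optional[float] = None  # left blank unless cardiovascular risk


@dataclass
class BehaviorSample:
    """One behavioral reading (sampled every 10 s)."""
    Act: int    # action code, 1 to 6
    Vs: float   # speech rate, words/min
    Vt: float   # intonation, Hz
    Ev: float   # voice energy, dB
    Exp: int    # facial expression code, 1 to 5

    def __post_init__(self):
        if self.Act not in ACTIONS or self.Exp not in EXPRESSIONS:
            raise ValueError("unknown action or expression code")


env = EnvSample(T=24.0, H=50.0, L=300.0, N=40.0, AQI=60.0, P=20.0, C=600.0, S=1, Tm=3)
phys = PhysioSample(HR=72.0, BPs=125.0, BPd=80.0, Temp=36.5, RR=16.0, SpO2=97.0)
beh = BehaviorSample(Act=2, Vs=110.0, Vt=180.0, Ev=55.0, Exp=3)
```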

[0031] Step S14, Data Preprocessing: The collected multimodal raw data is cleaned to remove outliers and handle missing values, and then standardized. Min-max standardization maps all parameters to the interval [0, 1] to obtain the standardized multimodal dataset: $D = \{E', P', B'\}$, where E', P', and B' are the standardized environmental, physiological, and behavioral data, respectively. The standardization formula is: $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$, where x is the original value, x' is the standardized value, $x_{\min}$ is the minimum value of the parameter, and $x_{\max}$ is the maximum value of the parameter.

[0032] Missing values are filled with the mean value for the same time period and the same scene to keep the dataset complete.
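The preprocessing in step S14 can be sketched as two small functions: one fills a missing reading with the mean of the readings that share its scene and time period, and one applies min-max standardization. The dictionary-based row layout and the function names are assumptions made for illustration, and returning 0.0 for a constant parameter is a choice made here to avoid division by zero.

```python
from statistics import mean


def fill_missing(rows, key):
    """Fill None in rows[i][key] with the mean for the same (S, Tm) group."""
    groups = {}
    for row in rows:
        if row[key] is not None:
            groups.setdefault((row["S"], row["Tm"]), []).append(row[key])
    filled = []
    for row in rows:
        value = row[key]
        if value is None:
            # Raises KeyError if the group has no observed value at all.
            value = mean(groups[(row["S"], row["Tm"])])
        filled.append({**row, key: value})
    return filled


def min_max(values):
    """Map values to [0, 1] with x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant parameter: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical temperature readings; the second one is missing.
rows = [
    {"S": 1, "Tm": 3, "T": 20.0},
    {"S": 1, "Tm": 3, "T": None},
    {"S": 1, "Tm": 3, "T": 24.0},
    {"S": 2, "Tm": 1, "T": 30.0},
]
filled = fill_missing(rows, "T")
scaled = min_max([row["T"] for row in filled])
```

In practice the minimum and maximum would be fixed from a reference dataset or from sensor limits rather than recomputed per batch, so that standardized values stay comparable over time.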

[0033] Step S2: Multimodal data fusion and feature extraction, as shown in Figure 2.

[0034] A multimodal fusion network is used to perform feature fusion on the standardized multimodal dataset D, extracting core features that can characterize the relationship between environment, physiology and behavior, thus providing support for emotion mapping.

[0035] Step S21, Single-modal feature extraction: Extract features from environmental data subset E', physiological data subset P', and behavioral data subset B' respectively.

[0036] Step S211, Environmental Feature Extraction: A Convolutional Neural Network (CNN) extracts spatial features from the environmental data. The input is the 8-dimensional parameter E'. Features are extracted through two convolutional layers (3×3 kernel, stride 1) and one pooling layer (2×2 pooling kernel, stride 2), and the output is the environmental feature vector.
Step S212, Physiological Feature Extraction: A Long Short-Term Memory (LSTM) network extracts the temporal features of the physiological data. The input is the 8-dimensional parameter P', arranged as a time series. The network has two hidden layers of 128 neurons each, and the output is the physiological feature vector.
Step S213, Behavioral Feature Extraction: A self-attention mechanism extracts the key features of the behavioral data. The input is the 5-dimensional parameter B'. Attention weight allocation highlights key features such as facial expressions and abnormal emotional movements, and the output is the behavioral feature vector.
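The attention weighting in step S213 can be illustrated with scaled dot-product self-attention over a short window of standardized behavior vectors. This is a minimal sketch under stated assumptions: the query, key, and value projections are left as the identity, whereas a trained model would learn them, and treating consecutive 10 s samples as the attended tokens is a choice made here, not something the source specifies.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    tokens: list of equal-length vectors. Returns (outputs, weights).
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * value[i] for w, value in zip(row, tokens))
                        for i in range(dim)])
    return outputs, weights


# Three consecutive standardized B' samples in the order (Act, Vs, Vt, Ev, Exp);
# the last one stands for an abnormal emotional action. Values are made up.
window = [
    [0.2, 0.5, 0.4, 0.3, 0.5],
    [0.2, 0.5, 0.4, 0.3, 0.75],
    [1.0, 0.9, 0.8, 0.7, 1.0],
]
features, attn = self_attention(window)
```

Each row of `attn` sums to 1, and the abnormal sample places most of its attention weight on itself, which is the sense in which attention highlights salient behavior.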

[0037] Step S22, Multimodal Feature Fusion: A similarity loss (CMD), a difference loss (HSIC), and a reconstruction loss are introduced to construct a multimodal fusion model that merges the environmental, physiological, and behavioral single-modal feature vectors into a unified fusion feature vector F. The specific fusion process is as follows:
Step S221, Similarity Loss Calculation: The similarity loss enables information sharing among the modalities and reduces their distribution differences [formula missing in source]. The formula uses the highest order of the central moments (taken as 3), the k-th order central moment, and the L2 norm.
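Central moment discrepancy (CMD) compares two sets of feature vectors by the difference of their means plus the differences of their higher-order central moments. The sketch below implements the commonly used definition for features bounded in [0, 1], up to order 3 as stated above. Whether the patent's formula has exactly this form, and how it combines the three modalities, is not recoverable from the source, so this should be read as the standard two-set CMD only.

```python
import math


def _mean(samples):
    n, dim = len(samples), len(samples[0])
    return [sum(s[j] for s in samples) / n for j in range(dim)]


def _central_moment(samples, mu, order):
    n = len(samples)
    return [sum((s[j] - mu[j]) ** order for s in samples) / n
            for j in range(len(mu))]


def _l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cmd(x, y, max_order=3):
    """Central moment discrepancy between two sets of feature vectors.

    Assumes every feature lies in [0, 1], so the usual 1/|b - a|^k
    scaling factors are all equal to 1.
    """
    mu_x, mu_y = _mean(x), _mean(y)
    total = _l2(mu_x, mu_y)
    for order in range(2, max_order + 1):
        total += _l2(_central_moment(x, mu_x, order),
                     _central_moment(y, mu_y, order))
    return total


# Two one-dimensional feature sets with equal means but different spread.
spread = [[0.0], [1.0]]
flat = [[0.5], [0.5]]
distance = cmd(spread, flat)
```

Here the means match and the third central moments are both zero, so the whole distance (0.25) comes from the variance term. For three modalities, one option is to sum the pairwise CMD values.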

[0038] Step S222, Difference Loss: Computing the difference loss $L_{HSIC}$ helps the model distinguish the feature differences corresponding to different emotions. In the loss formula, n is the number of samples; tr(·) denotes the trace of a matrix; K, M, and N are the kernel matrices corresponding to $F_E$, $F_P$, and $F_B$, respectively; and L, H, H', and H'' are auxiliary matrices.
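For orientation, the standard pairwise empirical Hilbert-Schmidt independence criterion is written out below. How the filing combines the three kernel matrices with its auxiliary matrices cannot be recovered from the text, so the symbols $K_X$, $K_Y$, and $H_c$ are illustrative and are not the filing's notation.

```latex
\mathrm{HSIC}(X, Y) = \frac{1}{(n-1)^{2}}\,
\operatorname{tr}\!\left(K_X H_c K_Y H_c\right),
\qquad
H_c = I_n - \frac{1}{n}\,\mathbf{1}_n \mathbf{1}_n^{\top}
```

Here $K_X$ and $K_Y$ are the n×n kernel matrices of the two feature sets and $H_c$ is the centering matrix. With a suitable kernel, a value near zero indicates that the two feature sets are close to statistically independent.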

[0039] Step S223, Reconstruction Loss: Computing the reconstruction loss $L_{recon}$ ensures that the fused features capture the key details of each modality and avoids trivial representations. The three reconstructed terms in the loss formula are the reconstructions of the environmental, physiological, and behavioral features, respectively.

[0040] Step S224, Feature Fusion Output: A cross-modal attention mechanism interactively enhances the single-modal features, and the fusion model is trained with the combined loss functions to output the final fused feature vector F. In the fusion formula, three weight matrices correspond to $F_E$, $F_P$, and $F_B$, respectively; the formula also uses the feature dimension, a feature concatenation operator, and a Softmax function that normalizes the weights.

[0041] Step S3: Emotion mapping of multimodal data, as shown in Figure 3.

[0042] An improved Transformer-based emotion mapping model is constructed to map the fused feature vector F to the emotional state of elderly users. Taking into account the emotions that occur most often among elderly users in care scenarios (such as loneliness, anxiety, and discomfort), five core emotion labels are defined so that emotional states can be identified and quantified.

[0043] Step S31, Emotion Label Definition: Taking into account the needs of the silver economy and the big health industry, as well as the physiological and psychological characteristics of the elderly population, five emotional state labels are defined and assigned quantitative values for subsequent interaction control, as follows:

Tag 1: Pleasure, quantitative value S1 = 0.8-1.0: the elderly user is physically and mentally comfortable, with positive emotions, normal physiological parameters, and relaxed behavior;

Tag 2: Calm, quantitative value S2 = 0.6-0.8: the elderly user has stable emotions with no obvious fluctuations, normal physiological parameters, and calm behavior;

Tag 3: Neutral, quantitative value S3 = 0.4-0.6: the elderly user shows no obvious emotional tendency, with basically normal physiological parameters and normal behavior;

Tag 4: Anxiety, quantitative value S4 = 0.2-0.4: the elderly user is tense and irritable, with abnormal physiological parameters (such as increased heart rate and high blood pressure) and agitated behavior;

Tag 5: Discomfort, quantitative value S5 = 0.0-0.2: the elderly user is physically unwell or in low spirits, with clearly abnormal physiological parameters (such as elevated body temperature and decreased blood oxygen) and abnormal behavior (such as lying still or abnormal emotional movements).
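The label ranges above share their end points (0.8 appears in both Tag 1 and Tag 2, for example). The sketch below resolves this by treating each lower bound as inclusive, which matches the intervals used in the adjustment strategies of paragraph [0051]. The names `LABELS` and `emotion_label` are hypothetical.

```python
# (inclusive lower bound, tag number, label name), checked from highest to lowest.
LABELS = [
    (0.8, 1, "pleasure"),
    (0.6, 2, "calm"),
    (0.4, 3, "neutral"),
    (0.2, 4, "anxiety"),
    (0.0, 5, "discomfort"),
]


def emotion_label(y_hat):
    """Map an emotion quantification value in [0, 1] to (tag, name)."""
    if not 0.0 <= y_hat <= 1.0:
        raise ValueError("emotion quantification value must lie in [0, 1]")
    for lower, tag, name in LABELS:
        if y_hat >= lower:
            return tag, name
```

For the two embodiments, `emotion_label(0.35)` gives tag 4 (anxiety) and `emotion_label(0.18)` gives tag 5 (discomfort).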

[0044] Step S32, Emotion Mapping Model Training: The improved Transformer model is trained with the fused feature vector F as input and the emotion quantification value as output. The model contains 4 encoder layers and 2 decoder layers. Each encoder layer contains a multi-head attention mechanism (8 heads) and a fully connected layer; the decoder layers output the emotion quantification value. Training uses mean squared error (MSE) as the loss function and the Adam optimizer with a learning rate of 1e-4. The number of iterations is set to 100 rounds, with training continuing until the model converges, that is, until the loss value is below 0.001.

[0045] The loss function is as follows: $L_{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$, where N is the number of training samples, $y_i$ is the true emotion quantification value of the i-th sample, and $\hat{y}_i$ is the emotion quantification value predicted by the model.
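A minimal sketch of the mean squared error and the convergence test from paragraph [0044] follows. The helper names are hypothetical; the 0.001 tolerance is the value given in the text.

```python
def mse_loss(y_true, y_pred):
    """Mean squared error between true and predicted emotion quantification values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def has_converged(loss, tolerance=0.001):
    """Convergence criterion from the text: loss value below 0.001."""
    return loss < tolerance
```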

[0046] Step S33, Emotional State Output: The fused feature vector F extracted in real time is input into the trained emotion mapping model, which outputs the real-time emotion quantification value ŷ of the elderly user. The corresponding emotional state label is determined from the range in which ŷ falls, completing the mapping from multimodal data to emotional state. At the same time, the chronic disease parameters are used to output the correlation R between the emotional state and the chronic disease. In the correlation formula, m is the number of chronic disease parameters, and the remaining terms are the standardized value of the i-th chronic disease parameter, the mean value of that chronic disease parameter, and the mean of the emotion quantification values. R ∈ [-1, 1]; the larger the absolute value of R, the stronger the correlation between the emotional state and the chronic disease (for example, between anxiety and hypertension).
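The exact formula for R is not legible in the text. The sketch below assumes it is a Pearson-type coefficient computed over paired samples of a standardized chronic disease parameter and the emotion quantification value; that pairing and the function name `correlation` are assumptions.

```python
import math


def correlation(x, y):
    """Pearson correlation of two equal-length series; the result lies in [-1, 1]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)
```

Under this convention, rising blood pressure paired with a falling emotion value produces a negative coefficient, which is why the text judges the strength of the link by the absolute value of R.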

[0047] Step S4: Emotion-driven interactive control of the living environment, as shown in Figure 4.

[0048] Based on the results of emotion mapping, and combined with the adaptability of elderly care scenarios and the needs of general health, an adaptive interactive control strategy is constructed to link living environment adjustment equipment, health monitoring equipment, and emotional comfort equipment to achieve dynamic adjustment of the living environment and emotional interaction.

[0049] Step S41, Control Parameter Initialization: Based on the elderly care scenario (bedroom, living room, bathroom, rehabilitation area) and the time period, a basic threshold range is set for each environmental parameter. The basic thresholds are then adjusted individually according to the elderly user's age and chronic disease status to form the basic control parameter set $C_{base}$. The basic thresholds for each scenario are as follows (all can be adjusted individually):

Bedroom (nighttime rest): temperature 20-24 ℃, humidity 40%-60%, light intensity 0-10 lux, noise ≤30 dB, AQI ≤50, PM2.5 ≤10 μg/m³, CO2 ≤800 ppm;

Living room (daytime activities): temperature 22-26 ℃, humidity 45%-65%, light intensity 100-300 lux, noise ≤40 dB, AQI ≤50, PM2.5 ≤10 μg/m³, CO2 ≤800 ppm;

Rehabilitation area (rehabilitation training): temperature 23-27 ℃, humidity 50%-60%, light intensity 200-400 lux, noise ≤35 dB, AQI ≤50, PM2.5 ≤10 μg/m³, CO2 ≤800 ppm;

Bathroom: temperature 24-28 ℃, humidity 50%-70%, light intensity 150-300 lux, noise ≤45 dB, AQI ≤50, PM2.5 ≤10 μg/m³, CO2 ≤1000 ppm.
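The threshold sets above can be held as a small lookup table and used to flag readings that fall outside the basic range. This is a sketch only: the key names, the helper `_scene`, and the use of 0 as the lower bound for upper-limited quantities are assumptions, and per-user adjustment is not modeled.

```python
def _scene(temp, humidity, lux, noise_max, co2_max=800):
    """Build one scene's limits as (low, high) pairs; upper-limited items start at 0."""
    return {
        "T": temp, "H": humidity, "L": lux,
        "N": (0, noise_max), "AQI": (0, 50), "PM25": (0, 10), "CO2": (0, co2_max),
    }


BASE_THRESHOLDS = {
    "bedroom": _scene((20, 24), (40, 60), (0, 10), 30),
    "living_room": _scene((22, 26), (45, 65), (100, 300), 40),
    "rehabilitation": _scene((23, 27), (50, 60), (200, 400), 35),
    "bathroom": _scene((24, 28), (50, 70), (150, 300), 45, co2_max=1000),
}


def out_of_range(scene, readings):
    """Return the names of the readings that violate the scene's basic thresholds."""
    limits = BASE_THRESHOLDS[scene]
    return [
        name for name, (low, high) in limits.items()
        if name in readings and not low <= readings[name] <= high
    ]
```

Applied to the embodiments, the living-room readings of Example 1 are all within range, while the rehabilitation-area readings of Example 2 (28 ℃, 48 %RH, 450 lux, 36 dB) violate the temperature, humidity, light, and noise limits.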

[0050] Step S42, Adaptive Environment Adjustment: The adjustment amount for the environmental parameters is calculated from the emotion quantification value ŷ obtained by the emotion mapping, together with the emotion label. The adjustment amount is positively correlated with the emotion quantification value. In the adjustment formula, k is the adjustment coefficient (set adaptively according to the scenario, ranging from 0.1 to 0.3), and 0.5 is the midpoint of the neutral emotion range. When ŷ > 0.5 (pleasure, calm), the adjustment amount is positive and the environmental parameters are fine-tuned toward greater comfort; when ŷ < 0.5 (anxiety, discomfort), the adjustment amount is negative.

[0051] Targeted adjustments to environmental parameters can alleviate negative emotions. The specific adjustment strategies are as follows:

(1) Pleasure (0.8 ≤ ŷ ≤ 1.0): Fine-tune the environmental parameters toward the upper limit of comfort, and activate emotional comfort devices (such as playing soothing music and broadcasting warm reminders).

(2) Calm (0.6 ≤ ŷ < 0.8): Maintain the environmental parameters within the basic threshold range; no additional adjustment is required.

(3) Neutral (0.4 ≤ ŷ < 0.6): Slightly adjust the environmental parameters toward the middle of the basic threshold range.

(4) Anxiety (0.2 ≤ ŷ < 0.4): Reduce temperature, noise, and light intensity; increase humidity; play soothing music; and turn off irrelevant noise sources. If the correlation R > 0.7 (the anxiety is related to a chronic disease), push chronic disease relief suggestions at the same time.

(5) Discomfort (0 ≤ ŷ < 0.2): Adjust the environmental parameters in a targeted manner (such as lowering the temperature when body temperature rises and improving air quality when blood oxygen drops), trigger a health warning, send warning information (including physiological parameters, emotional state, and environmental parameters) to family members and elderly care workers, and activate emergency care equipment (such as call buttons and emergency buttons).
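The five strategies can be sketched as a simple dispatch on the emotion quantification value and the correlation R. The function name and the action strings are hypothetical placeholders for real device commands; only the intervals and the R > 0.7 rule come from the text.

```python
def control_actions(y_hat, r=0.0):
    """Return the list of actions for an emotion value y_hat and correlation r."""
    if not 0.0 <= y_hat <= 1.0:
        raise ValueError("emotion quantification value must lie in [0, 1]")
    if y_hat >= 0.8:  # pleasure
        return ["tune parameters toward upper comfort limit",
                "start emotional comfort device"]
    if y_hat >= 0.6:  # calm
        return ["hold parameters within base thresholds"]
    if y_hat >= 0.4:  # neutral
        return ["nudge parameters toward threshold midpoint"]
    if y_hat >= 0.2:  # anxiety
        actions = ["lower temperature, noise and light", "raise humidity",
                   "play soothing music", "switch off irrelevant noise sources"]
        if r > 0.7:  # anxiety linked to a chronic disease
            actions.append("push chronic disease relief suggestions")
        return actions
    # discomfort
    return ["targeted parameter adjustment", "trigger health warning",
            "notify family and care workers", "activate emergency care equipment"]
```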

[0052] Step S43, Interactive Feedback and Model Optimization: After the adjustment, multimodal data of the elderly user are collected in real time and steps S1-S3 are repeated to obtain the adjusted emotion quantification value ŷ'. The degree of emotional improvement is calculated as $\Delta S = \hat{y}' - \hat{y}$. If ΔS > 0, the adjustment is effective and the current adjustment strategy is retained; if ΔS ≤ 0, the adjustment coefficient k is changed (increased or decreased by 0.05), the adjustment amount is recalculated, and the adjustment is repeated. At the same time, the adjustment data and emotion feedback data are added to the training set, and the emotion mapping model and the fusion model are fine-tuned regularly (every 7 days) to improve the models' adaptability.
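A minimal sketch of the feedback rule follows. The text does not say when k should be increased rather than decreased, so the `direction` argument is an assumption, as is clamping k to the 0.1-0.3 range given for the adjustment coefficient.

```python
K_MIN, K_MAX, K_STEP = 0.1, 0.3, 0.05


def improvement(y_before, y_after):
    """Degree of emotional improvement: adjusted value minus original value."""
    return y_after - y_before


def next_k(k, delta_s, direction=1):
    """Keep k when the adjustment helped; otherwise step it by 0.05 within [0.1, 0.3]."""
    if delta_s > 0:
        return k
    return min(K_MAX, max(K_MIN, k + direction * K_STEP))
```

In Example 1 the emotion value rises from 0.35 to 0.55, so the improvement is positive and k = 0.2 is kept.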

[0053] The evaluation criteria for the degree of emotional improvement are as follows: Excellent: ΔS ≥ 0.2, the emotional state has shifted from negative to calm or pleasant; Good: 0.1 ≤ ΔS < 0.2, the emotional state has improved significantly; Passed: 0 < ΔS < 0.1, the emotional state has improved slightly; Unqualified: ΔS ≤ 0, the emotional state has not improved or has worsened.
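The grading bands translate directly into a small function. Rounding to six decimals before comparison is an added assumption that keeps floating-point noise (for instance in 0.55 - 0.35) from pushing a value across a band boundary; the function name is hypothetical.

```python
def grade_improvement(delta_s):
    """Grade the degree of emotional improvement using the bands from the text."""
    d = round(delta_s, 6)  # absorb float noise before comparing with band edges
    if d >= 0.2:
        return "excellent"
    if d >= 0.1:
        return "good"
    if d > 0:
        return "passed"
    return "unqualified"
```

Both embodiments fall in the top band: 0.55 - 0.35 = 0.20 in Example 1 and 0.42 - 0.18 = 0.24 in Example 2.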

[0054] Step S44, Convenient Interaction Supplement: Because elderly users often have limited mobility and find devices difficult to operate, a multimodal convenient interaction method is provided. It includes voice interaction (with dialect recognition), gesture interaction (simple gestures to control environmental adjustment), and one-click call (triggering alarm and care in emergencies), and it also supports remote interaction by family members (remotely viewing environmental parameters and emotional status, and remotely adjusting environmental equipment).

[0055] Step S5: Data storage and traceability management.

[0056] An encrypted data storage server is built to store multimodal raw data, standardized data, fusion features, emotion mapping results, interaction control records, and health warning information. Blockchain technology is used to achieve data traceability, ensuring data security, integrity, and traceability. At the same time, the data is anonymized to protect the privacy of elderly users. The data storage cycle is set to 1 year, and the data is backed up regularly for model optimization, elderly care service optimization, and big health data analysis (such as the correlation analysis between chronic diseases and emotional states).
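The traceability idea can be illustrated with a hash-linked record log, which is the core mechanism a blockchain relies on to detect tampering. This is a single-node sketch only: it does not implement distributed consensus, encryption, or anonymization, and the record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first record


def _digest(prev_hash, payload):
    """SHA-256 over the previous hash and the canonical JSON form of the payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append a record that is linked to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "payload": payload,
                  "hash": _digest(prev_hash, payload)})
    return chain


def verify(chain):
    """Recompute every link; any edited payload or broken link fails the check."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev"] != prev_hash or block["hash"] != _digest(prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True
```

Changing any stored emotion value after the fact alters its digest, so `verify` returns `False` from that record onward.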

[0057] Example 1

This embodiment targets a 75-year-old man with hypertension who lives alone in a home-based elderly care setting, and implements emotion mapping and interactive control of multimodal living environment data. The specific implementation process is as follows:

Step S1: Multimodal data acquisition.

[0058] Multimodal data acquisition terminals are deployed in the user's bedroom, living room, and bathroom to collect the following data:

Step S11, Environmental Data: temperature T = 25 ℃, humidity H = 45 %RH, light intensity L = 250 lux, noise intensity N = 38 dB, AQI = 45, PM2.5 concentration P = 8 μg/m³, CO2 concentration C = 750 ppm; scene type S = 2 (living room); time period Tm = 2 (daytime activities). Environmental data subset: E = {25, 45, 250, 38, 45, 8, 750, 2, 2}.

[0059] Step S12, Physiological Data: heart rate HR = 88 beats/min, systolic blood pressure $BP_s$ = 145 mmHg, diastolic blood pressure $BP_d$ = 90 mmHg, body temperature Temp = 36.8 ℃, respiratory rate RR = 18 breaths/min, blood oxygen saturation SpO2 = 96%, heart rate variability HRV = 120 ms (hypertension-related parameter). Physiological data subset: P = {88, 145, 90, 36.8, 18, 96, /, 120}.

[0060] Step S13, Behavioral Data: behavioral action Act = 2 (sitting still), speech rate $V_s$ = 50 words/min, intonation $V_t$ = 180 Hz, voice energy $E_v$ = 55 dB, facial expression Exp = 4 (anxiety). Behavioral data subset: B = {2, 50, 180, 55, 4}.

[0061] Step S14, Data Preprocessing: Using min-max normalization, the above data are mapped to the [0,1] interval and missing values are handled (the blood glucose value GLU is left blank and excluded from the calculation), yielding the normalized dataset D = {E', P', B'}.
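Min-max normalization needs a minimum and maximum for each parameter, which the embodiment does not list. The sketch below therefore uses hypothetical ranges in `ASSUMED_RANGES`, and it passes blank values through as `None`, mirroring how GLU is left out here.

```python
def min_max(x, low, high):
    """Map x from the range [low, high] onto [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (x - low) / (high - low)


# Hypothetical parameter ranges; the source does not state the actual bounds.
ASSUMED_RANGES = {"T": (10.0, 40.0), "H": (0.0, 100.0), "L": (0.0, 1000.0)}


def normalize(readings, ranges=ASSUMED_RANGES):
    """Normalize each reading; blank (None) values are kept blank."""
    return {
        name: None if value is None else min_max(value, *ranges[name])
        for name, value in readings.items()
    }
```

With these assumed ranges, the Example 1 readings T = 25 ℃, H = 45 %RH, and L = 250 lux map to 0.5, 0.45, and 0.25.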

[0062] Step S2: Multimodal data fusion and feature extraction.

[0063] Step S21, Single-modal feature extraction: The environmental feature vector $F_E \in \mathbb{R}^{1\times 64}$ is extracted using the CNN, the physiological feature vector $F_P \in \mathbb{R}^{1\times 128}$ using the LSTM, and the behavioral feature vector $F_B \in \mathbb{R}^{1\times 64}$ using self-attention.

[0064] Step S22, Multimodal Feature Fusion: The similarity loss $L_{CMD}$ = 0.08, the difference loss $L_{HSIC}$ = 0.12, and the reconstruction loss $L_{recon}$ = 0.05 are calculated, and the three single-modal features are fused through the cross-modal attention mechanism to output the fused feature vector $F \in \mathbb{R}^{1\times 256}$.

[0065] Step S3: Emotional mapping.

[0066] The fused feature vector F is input into the trained sentiment mapping model, and the output sentiment quantification value ŷ=0.35, corresponding to sentiment label 4 (anxiety); the correlation between sentiment and hypertension is calculated to be R=0.82 (the correlation is high, indicating that anxiety is related to elevated blood pressure).

[0067] Step S4: Interactive control.

[0068] Step S41, Control parameter initialization: The basic threshold set for the living room (daytime activities) is $C_{base}$ = {22-26 ℃, 45%-65%, 100-300 lux, ≤40 dB, ≤50, ≤10 μg/m³, ≤800 ppm}, with adjustment coefficient k = 0.2.

[0069] Step S42: Calculate the adjustment amount: The adjusted environmental parameters were obtained as follows: temperature T=23℃ (decreased by 2℃), humidity H=50%RH (increased by 5%), light intensity L=200lux (decreased by 50lux), noise intensity N=35dB (decreased by 3dB), while other parameters remained unchanged.

[0070] Step S43, Perform Adjustments: The air conditioner is turned on to adjust the temperature, the humidifier to adjust the humidity, the curtains to adjust the light, and silent mode to reduce noise. At the same time, soothing music is started (volume 30 dB) and hypertension relief advice is pushed ("please remain calm, avoid emotional excitement, and drink an appropriate amount of water").

Step S44, Interactive Feedback: Ten minutes after the adjustment, multimodal data are collected again, giving the emotion quantification value ŷ' = 0.55 and the degree of emotional improvement ΔS = 0.2. The adjustment is effective, and the current adjustment strategy is retained. At the same time, the adjustment data and emotion feedback data are added to the training set for subsequent model optimization.

[0071] Step S45, Convenient Interaction: The user can control the environment with voice commands such as "turn off music" and "increase temperature". In an emergency, the user can press the one-click call button to send warning information to family members and the community elderly care service center.

[0072] Step S5, Data Storage and Traceability: All collected data, emotion mapping results, and adjustment records will be encrypted and stored. Blockchain technology will be used to achieve data traceability. At the same time, user privacy data will be anonymized. The storage period is 1 year and will be used for subsequent model optimization and correlation analysis between hypertension and anxiety.

[0073] Example 2

This embodiment targets an elderly user (68 years old, female, recovering from a stroke) undergoing rehabilitation training in an institutional elderly care setting, and implements emotion mapping and interactive control of multimodal living environment data. The specific implementation process is as follows:

Step S1: Multimodal data acquisition.

[0074] Multimodal data acquisition terminals are deployed in the rehabilitation area of the elderly care facility to collect the following data:

Step S11, Environmental Data: temperature T = 28 ℃, humidity H = 48 %RH, light intensity L = 450 lux, noise intensity N = 36 dB, AQI = 40, PM2.5 concentration P = 6 μg/m³, CO2 concentration C = 780 ppm; scene type S = 4 (rehabilitation area); time period Tm = 2 (daytime activities). Environmental data subset: E = {28, 48, 450, 36, 40, 6, 780, 4, 2}.

[0075] Step S12, Physiological Data: heart rate HR = 92 beats/min, systolic blood pressure $BP_s$ = 135 mmHg, diastolic blood pressure $BP_d$ = 85 mmHg, body temperature Temp = 37.1 ℃, respiratory rate RR = 20 breaths/min, blood oxygen saturation SpO2 = 95%, heart rate variability HRV = 110 ms (parameter related to stroke rehabilitation). Physiological data subset: P = {92, 135, 85, 37.1, 20, 95, /, 110}.

[0076] Step S13, Behavioral Data: behavioral action Act = 5 (rehabilitation training), speech rate $V_s$ = 45 words/min, intonation $V_t$ = 170 Hz, voice energy $E_v$ = 58 dB, facial expression Exp = 5 (discomfort). Behavioral data subset: B = {5, 45, 170, 58, 5}.

[0077] Step S14: Data preprocessing: Using the min-max normalization method, the above data is mapped to the [0,1] interval to obtain the normalized dataset: D={E',P',B'}.

[0078] Step S2: Multimodal data fusion and feature extraction.

[0079] Step S21, Single-modal feature extraction: The environmental, physiological, and behavioral feature vectors $F_E$, $F_P$, and $F_B$ are extracted using the CNN, the LSTM, and self-attention, respectively.

[0080] Step S22, Multimodal Feature Fusion: The similarity loss $L_{CMD}$ = 0.07, the difference loss $L_{HSIC}$ = 0.11, and the reconstruction loss $L_{recon}$ = 0.04 are calculated, the features are fused through the cross-modal attention mechanism, and the fused feature vector $F \in \mathbb{R}^{1\times 256}$ is output.

[0081] Step S3: Emotional mapping.

[0082] The fused feature vector F is input into the emotion mapping model, and the output emotion quantification value ŷ=0.18, corresponding to emotion label 5 (discomfort); the correlation between emotion and stroke rehabilitation is calculated to be R=0.78 (indicating that discomfort is related to the intensity of rehabilitation training and physiological state).

[0083] Step S4: Interactive control.

[0084] Step S41, Control parameter initialization: The basic threshold set for the rehabilitation area is $C_{base}$ = {23-27 ℃, 50%-60%, 200-400 lux, ≤35 dB, ≤50, ≤10 μg/m³, ≤800 ppm}, with adjustment coefficient k = 0.3.

[0085] Step S42: Calculate the adjustment amount: The adjusted environmental parameters were obtained as follows: temperature T=24℃ (decreased by 4℃), humidity H=55%RH (increased by 7%), light intensity L=300lux (decreased by 150lux), and noise intensity N=33dB (decreased by 3dB). Other parameters remained unchanged.

[0086] Step S43, Execute Adjustment: The air conditioner is turned on to cool the room, the humidifier is turned on to raise humidity, the curtains are adjusted to reduce the light, and the volume of background music in the rehabilitation area is lowered. At the same time, a health warning is triggered and warning information (including the user's physiological parameters, emotional state, and environmental parameters) is pushed to the elderly care staff, so that they can check the user's status promptly and adjust the intensity of rehabilitation training.

[0087] Step S44, Interactive Feedback: After 15 minutes of adjustment, multimodal data were collected again, and the emotional quantification value ŷ'=0.42 and the emotional improvement degree ΔS=0.24 were obtained. The adjustment was effective, and the current adjustment strategy was retained.

[0088] Step S45, Convenient Interaction: Nursing staff can view the user's emotional state and environmental parameters through a remote terminal, remotely adjust environmental equipment, and the user can call the nursing staff through gestures (waving).

[0089] Step S5, Data Storage and Traceability: All data is encrypted and stored to achieve blockchain traceability, which is used for correlation analysis between rehabilitation training and emotional state, to optimize rehabilitation training programs and environmental control strategies, and to improve the quality of institutional elderly care services.

[0090] Therefore, this invention adopts the above-mentioned method for emotion mapping and interaction control of multimodal human settlement environment data. Through multimodal data fusion and collection, accurate emotional state mapping, and adaptive interaction control, it solves the problems of single data dimension, inaccurate emotion mapping, poor interaction adaptability, and lack of linkage with big health scenarios in the existing technology. It realizes dynamic adaptation between human settlement environment and emotional state of elderly users, improves the living comfort and physical and mental health level of elderly users, and promotes the deep integration of smart elderly care with the silver economy and big health industry.

[0091] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the technical solutions of the present invention, and these modifications or equivalent substitutions cannot cause the modified technical solutions to deviate from the spirit and scope of the technical solutions of the present invention.

Claims

1. A method for emotion mapping and interaction control of multimodal human settlement environment data, characterized in that it includes the following steps:

Step S1, Multimodal Human Settlement Environment and User Data Collection: Build and deploy a multimodal data collection terminal to simultaneously collect human settlement environment data, elderly user physiological data, and elderly user behavioral data to form a multimodal raw dataset;

Step S2, Multimodal Data Fusion and Feature Extraction: A multimodal fusion network is used to fuse features of the multimodal dataset and extract features that can characterize the relationship between environment, physiology, and behavior;

Step S3, Emotion Mapping of Multimodal Data: Construct a Transformer-based emotion mapping model to map the fused feature vectors to the emotional states of elderly users; combine the emotional characteristics of elderly users in the elderly care scenario, define emotion labels, and achieve accurate identification and quantification of emotional states;

Step S4, Emotion-Driven Interactive Control of the Living Environment: Based on the emotion mapping results, an adaptive interactive control strategy is constructed to link living environment adjustment equipment, health monitoring equipment, and emotional comfort equipment to achieve dynamic adjustment of the living environment and emotional interaction;

Step S5: Data storage and traceability management.

2. The method for emotion mapping and interaction control of multimodal human settlement environment data according to claim 1, characterized in that, in step S1, the multimodal human settlement environment and user data collection process is as follows:

Step S11, Human Settlement Environment Data Acquisition: Environmental parameters are collected in real time through an environmental sensor array, with a sampling frequency of once every 30 s. The collected environmental parameters include: temperature (T), humidity (H), light intensity (L), noise intensity (N), air quality index (AQI), PM2.5 concentration (P), and CO2 concentration (C). The environmental scene parameters collected at the same time include the scene type and the time period; the scene types are bedroom, living room, bathroom, and rehabilitation area; the time periods are morning, daytime activities, and nighttime rest. The resulting environmental data subset is $E = \{T, H, L, N, AQI, P, C, S, Tm\}$, where S represents the scene type, coded from 1 to 4, corresponding to bedroom, living room, bathroom, and rehabilitation area; and Tm represents the time period, coded from 1 to 3, corresponding to morning, daytime activities, and nighttime rest;

Step S12, Physiological Data Collection for Elderly Users: Physiological parameters of elderly users are collected through wearable devices and non-contact monitoring devices, with a sampling frequency of once per minute. The collected physiological parameters include: heart rate HR, systolic blood pressure $BP_s$, diastolic blood pressure $BP_d$, body temperature Temp, respiratory rate RR, and blood oxygen saturation SpO2. For elderly users with chronic diseases, additional parameters related to the chronic disease are collected, including blood glucose (GLU) for diabetic users and heart rate variability (HRV) for users with cardiovascular and cerebrovascular disease. The resulting physiological data subset is $P = \{HR, BP_s, BP_d, Temp, RR, SpO2, GLU, HRV\}$, where the chronic disease parameters are collected selectively according to the user's situation; if there is no relevant chronic disease, the parameter is left blank;

Step S13, Behavioral Data Collection for Elderly Users: The behavior and voice characteristics of elderly users are collected through visual acquisition devices, motion sensors, and voice acquisition devices, with a sampling frequency of once every 10 s. The collected behavioral parameters include: behavioral actions, coded as 1-6, corresponding to lying down, sitting, walking, eating, rehabilitation training, and abnormal emotional actions; speech features, namely speech rate $V_s$, intonation $V_t$, and voice energy $E_v$; and facial expression features, extracted through contour features and coded as 1-5, corresponding to joy, calmness, neutrality, anxiety, and sadness, respectively. The resulting behavioral data subset is $B = \{Act, V_s, V_t, E_v, Exp\}$, where Act represents the action code and Exp represents the facial expression code;

Step S14, Data Preprocessing: The collected multimodal raw data are cleaned and standardized to handle outliers and missing values. The min-max standardization method is used to map all parameters to the [0,1] interval to obtain the standardized multimodal dataset $D = \{E', P', B'\}$, where E', P', and B' are the standardized results of the environmental data, physiological data, and behavioral data, respectively. The standardization formula is $x' = \frac{x - x_{min}}{x_{max} - x_{min}}$, where x represents the original data; x' represents the standardized data; $x_{min}$ is the minimum value of the parameter; and $x_{max}$ is the maximum value of the parameter. Missing values are filled with the mean value for the same time period and the same scene to ensure the integrity of the dataset.

3. The method for emotion mapping and interaction control of multimodal human settlement environment data according to claim 2, characterized in that, in step S2, the multimodal data fusion and feature extraction are performed as follows:

Step S21, Single-modal feature extraction: Extract features from the environmental data subset E', the physiological data subset P', and the behavioral data subset B' respectively;

Step S22, Multimodal Feature Fusion: A similarity loss, a difference loss, and a reconstruction loss are introduced to construct a multimodal fusion model that merges the single-modal feature vectors $F_E$, $F_P$, and $F_B$ into a unified fused feature vector $F$.

4. The method for emotion mapping and interaction control of multimodal human settlement environment data according to claim 3, characterized in that, in step S21, the specific process of single-modal feature extraction is as follows:

Step S211, Environmental Feature Extraction: A convolutional neural network is used to extract the spatial features of the environmental data. The input is the 8-dimensional parameter E'. Features are extracted through two convolutional layers with a kernel size of 3×3 and a stride of 1, and one pooling layer with a kernel size of 2×2 and a stride of 2. The output is the environmental feature vector $F_E$;

Step S212, Physiological Feature Extraction: A long short-term memory (LSTM) network is used to extract the temporal features of the physiological data. The input is the 8-dimensional parameter P', with two hidden layers, each containing 128 neurons. The output is the physiological feature vector $F_P$;

Step S213, Behavioral Feature Extraction: Key features of the behavioral data are extracted using an attention mechanism. The input is the 5-dimensional parameter B'. Through attention weight allocation, the behavioral feature vector $F_B$ is output.

5. The method for emotion mapping and interaction control of multimodal human settlement environment data according to claim 4, characterized in that, in step S22, the specific process of multimodal feature fusion is as follows:

Step S221, Similarity Loss: Computing the similarity loss $L_{CMD}$ enables information sharing among the modalities and reduces the differences between their distributions. In the loss formula, K is the order of the central moments, $C_k$ denotes the k-th order central moment, and $\|\cdot\|_2$ denotes the L2 norm;

Step S222, Difference Loss: Computing the difference loss $L_{HSIC}$ helps the model distinguish the feature differences corresponding to different emotions. In the loss formula, n is the number of samples; tr(·) denotes the trace of a matrix; K, M, and N are the kernel matrices corresponding to $F_E$, $F_P$, and $F_B$, respectively; and L, H, H', and H'' are auxiliary matrices;

Step S223, Reconstruction Loss: Computing the reconstruction loss $L_{recon}$ ensures that the fused features capture the key details of each modality and avoids trivial representations. The three reconstructed terms in the loss formula are the reconstructions of the environmental, physiological, and behavioral features, respectively;

Step S224, Feature Fusion Output: A cross-modal attention mechanism interactively enhances the single-modal features, and the fusion model is trained with the combined loss functions to output the final fused feature vector F. In the fusion formula, three weight matrices correspond to $F_E$, $F_P$, and $F_B$, respectively; the formula also uses the feature dimension, a feature concatenation operator, and a Softmax function that normalizes the weights.

6. The method for emotion mapping and interaction control of multimodal human settlement environment data according to claim 5, characterized in that, in step S3, the emotion mapping of multimodal data is carried out as follows:

Step S31, Emotion Label Definition: Based on the physiological and psychological characteristics of the elderly population, define five emotional state labels and assign quantitative values for subsequent interaction control;

Step S32, Emotion Mapping Model Training: The improved Transformer model is trained with the fused feature vector F as input and the emotion quantification value as output. The model contains 4 encoder layers and 2 decoder layers. Each encoder layer contains a multi-head attention mechanism and a fully connected layer; the decoder layers output the emotion quantification value. Training uses mean squared error as the loss function and the Adam optimizer with a learning rate of 1e-4, and the number of iterations is set to 100 rounds, with training continuing until the model converges. The loss function is $L_{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$, where N is the number of training samples, $y_i$ is the true emotion quantification value of the i-th sample, and $\hat{y}_i$ is the emotion quantification value predicted by the model;

Step S33, Emotional State Output: The fused feature vector F extracted in real time is input into the trained emotion mapping model, which outputs the real-time emotion quantification value ŷ of the elderly user. The corresponding emotional state label is determined from the range in which ŷ falls, completing the mapping from multimodal data to emotional state. At the same time, the chronic disease parameters are used to output the correlation R between the emotional state and the chronic disease. In the correlation formula, m is the number of chronic disease parameters, and the remaining terms are the standardized value of the i-th chronic disease parameter, the mean value of that chronic disease parameter, and the mean of the emotion quantification values; the correlation R ∈ [-1, 1].

7. The method for emotion mapping and interaction control of multimodal human settlement environment data according to claim 6, characterized in that, in step S31, five categories of emotional state labels are defined and assigned quantitative values, as follows:
Label 1: Pleasure, quantitative value S1 = 0.8-1.0: corresponds to elderly users with physical and mental comfort, positive emotions, normal physiological parameters, and relaxed behavior;
Label 2: Calm, quantitative value S2 = 0.6-0.8: corresponds to elderly users with stable emotions, no obvious fluctuations, normal physiological parameters, and calm behavior;
Label 3: Neutral, quantitative value S3 = 0.4-0.6: corresponds to elderly users with no obvious emotional tendency, basically normal physiological parameters, and normal behavior;
Label 4: Anxiety, quantitative value S4 = 0.2-0.4: corresponds to elderly users with emotional tension, irritability, abnormal physiological parameters, and restless behavior;
Label 5: Discomfort, quantitative value S5 = 0.0-0.2: corresponds to elderly users with physical discomfort or low mood, obviously abnormal physiological parameters, and abnormal behavior.
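The label ranges above can be read as a lookup from an emotion quantification value to a label. The sketch below is one possible reading: because the stated ranges share their endpoints, it assigns each boundary value to the higher label, following the half-open intervals used in claim 9. The names `LABEL_LOWER_BOUNDS` and `emotion_label` are hypothetical.

```python
# Lower bound of each label's range, highest first (values from claim 7).
LABEL_LOWER_BOUNDS = [
    (0.8, "Pleasure"),
    (0.6, "Calm"),
    (0.4, "Neutral"),
    (0.2, "Anxiety"),
    (0.0, "Discomfort"),
]


def emotion_label(value: float) -> str:
    """Map an emotion quantification value in [0, 1] to its label."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("emotion quantification value must lie in [0, 1]")
    for lower, name in LABEL_LOWER_BOUNDS:
        if value >= lower:
            return name
    raise AssertionError("unreachable: 0.0 is the lowest bound")


for v in (0.95, 0.8, 0.5, 0.25, 0.0):
    print(v, emotion_label(v))
```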

8. The method for emotion mapping and interaction control of multimodal human settlement environment data according to claim 6, characterized in that, in step S4, the emotion-driven human settlement environment interaction control process is as follows:
Step S41, control parameter initialization: based on different elderly care scenarios and time periods, set the basic threshold range for each environmental parameter; then adjust the basic thresholds on a per-user basis according to the elderly user's age and chronic disease status, forming a basic control parameter set, as shown below: [formula not reproduced in the source text];
Step S42, adaptive environment adjustment: based on the emotion quantification value $S$ and the emotion label obtained from emotion mapping, calculate the adjustment amount of the environmental parameters. The adjustment amount is positively correlated with the emotion quantification value, as shown below: [formula not reproduced in the source text]; where $k$ is the adjustment coefficient, ranging from 0.1 to 0.3; 0.5 is the midpoint of the neutral emotion quantification range; when $S > 0.5$, the adjustment amount is positive; when $S < 0.5$, the adjustment amount is negative;
Step S43, interactive feedback and model optimization: collect multimodal data of the elderly user in real time after the adjustment, repeat steps S1-S3 to obtain the adjusted emotion quantification value $S'$, and calculate the degree of improvement in emotional state $\Delta S$, as shown below:
$$\Delta S = S' - S$$
Step S44, supplementary convenient interaction: design multimodal convenient interaction methods, including voice interaction, gesture interaction, and one-click call, and support remote interaction by family members.
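The adjustment formula of step S42 is not legible in this text, so the following is only a sketch of one form that satisfies the stated constraints: the amount is proportional to the distance of the emotion quantification value from 0.5, scaled by a coefficient k in [0.1, 0.3]. The scale factor (here the width of a parameter's basic threshold range), the clamping step, and all function names are assumptions, as is reading the degree of improvement as the adjusted value minus the previous value.

```python
def adjustment_amount(emotion_value: float, k: float, scale: float) -> float:
    """Assumed form: k * (emotion_value - 0.5) * scale; sign follows emotion_value - 0.5."""
    if not 0.0 <= emotion_value <= 1.0:
        raise ValueError("emotion quantification value must lie in [0, 1]")
    if not 0.1 <= k <= 0.3:
        raise ValueError("adjustment coefficient k must lie in [0.1, 0.3]")
    return k * (emotion_value - 0.5) * scale


def apply_adjustment(current: float, delta: float, low: float, high: float) -> float:
    """Apply the adjustment but keep the parameter inside its basic threshold range."""
    return max(low, min(high, current + delta))


def improvement(before: float, after: float) -> float:
    """Degree of improvement, read here as adjusted value minus previous value."""
    return after - before


# Illustrative values only (assumed): temperature thresholds of 22 to 26 degrees C.
low, high = 22.0, 26.0
delta = adjustment_amount(emotion_value=0.3, k=0.2, scale=high - low)
new_temperature = apply_adjustment(25.0, delta, low, high)
print(f"adjustment {delta:+.2f}, new setpoint {new_temperature:.2f}")
```

With an anxious reading of 0.3 the adjustment is negative, which matches the claim 9 strategy of lowering temperature for that state. A parameter that should move in the opposite direction, such as humidity, would need its own sign convention.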

9. The method for emotion mapping and interaction control of multimodal human settlement environment data according to claim 1, characterized in that negative emotions are alleviated through targeted adjustment of environmental parameters. The specific adjustment strategies, by emotion quantification value $S$, are as follows:
(1) Pleasure, $0.8 \le S \le 1.0$: fine-tune the environmental parameters to the upper limit of comfort, and at the same time activate the emotional comfort equipment;
(2) Calm, $0.6 \le S < 0.8$: maintain the environmental parameters within the basic threshold range, with no additional adjustment;
(3) Neutral, $0.4 \le S < 0.6$: slightly adjust the environmental parameters toward the middle value of the basic threshold range;
(4) Anxiety, $0.2 \le S < 0.4$: reduce temperature, noise, and light intensity; increase humidity; play soothing music; and turn off irrelevant noise sources. If the correlation coefficient $R > 0.7$, push chronic disease relief suggestions at the same time;
(5) Discomfort, $0 \le S < 0.2$: adjust the environmental parameters accordingly, trigger a health alert, send warning messages to family members and elderly care workers, and activate emergency care equipment.
The evaluation criteria for the degree of improvement in emotional state $\Delta S$ are as follows:
(1) Excellent: $\Delta S \ge 0.2$, the emotional state shifts from negative to calm or pleasant;
(2) Good: $0.1 \le \Delta S < 0.2$, the emotional state is significantly improved;
(3) Qualified: $0 < \Delta S < 0.1$, the emotional state is slightly improved;
(4) Unqualified: $\Delta S \le 0$, the emotional state shows no improvement or has worsened.
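The strategy table and the evaluation criteria above amount to two threshold lookups. The sketch below shows one way to encode them; the function names and the plain-text action strings are hypothetical stand-ins for whatever device commands a real system would issue.

```python
from typing import List


def select_actions(emotion_value: float, correlation: float = 0.0) -> List[str]:
    """Return the claim 9 actions for an emotion value and correlation coefficient."""
    if not 0.0 <= emotion_value <= 1.0:
        raise ValueError("emotion quantification value must lie in [0, 1]")
    if emotion_value >= 0.8:  # Pleasure
        return ["fine-tune parameters to the upper comfort limit",
                "activate emotional comfort equipment"]
    if emotion_value >= 0.6:  # Calm: no additional adjustment
        return []
    if emotion_value >= 0.4:  # Neutral
        return ["adjust parameters toward the middle of the basic threshold"]
    if emotion_value >= 0.2:  # Anxiety
        actions = ["reduce temperature, noise, and light intensity",
                   "increase humidity",
                   "play soothing music",
                   "turn off irrelevant noise sources"]
        if correlation > 0.7:
            actions.append("push chronic disease relief suggestions")
        return actions
    # Discomfort
    return ["adjust environmental parameters",
            "trigger health alert",
            "notify family members and care workers",
            "activate emergency care equipment"]


def grade_improvement(delta: float) -> str:
    """Grade the degree of improvement using the four bands in claim 9."""
    if delta >= 0.2:
        return "Excellent"
    if delta >= 0.1:
        return "Good"
    if delta > 0:
        return "Qualified"
    return "Unqualified"


print(select_actions(0.3, correlation=0.8))
print(grade_improvement(0.15))
```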

10. The method for emotion mapping and interaction control of multimodal human settlement environment data according to claim 1, characterized in that, in step S5, data storage and traceability management specifically includes: building an encrypted data storage server to store the multimodal raw data, standardized data, fused features, emotion mapping results, interaction control records, and health warning information; using blockchain technology to achieve data traceability, ensuring the security, integrity, and traceability of the data; and anonymizing the data to protect the privacy of elderly users. The data storage period is set to 1 year, and the data is backed up regularly for model optimization, elderly care service optimization, and big health data analysis.
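The claim does not specify which blockchain platform or anonymization scheme is used. As a simplified illustration of the traceability idea only, the sketch below links records in a single-process hash chain (not a distributed ledger) and replaces the user identifier with a keyed hash, which is pseudonymization and not full anonymization. All names, the key, and the sample records are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # placeholder hash preceding the first record


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Replace a user identifier with a keyed SHA-256 digest."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def _record_hash(prev_hash: str, payload: Dict) -> str:
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain: List[Dict], payload: Dict) -> Dict:
    """Append a record whose hash covers the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    record = {"prev_hash": prev_hash, "payload": payload,
              "hash": _record_hash(prev_hash, payload)}
    chain.append(record)
    return record


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every hash; any edited payload or broken link fails the check."""
    prev_hash = GENESIS_HASH
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _record_hash(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True


# Illustrative records only (assumed values).
chain: List[Dict] = []
uid = pseudonymize("user-001", b"demo-key")
append_record(chain, {"user": uid, "emotion_value": 0.35, "label": "Anxiety"})
append_record(chain, {"user": uid, "emotion_value": 0.62, "label": "Calm"})
print(verify_chain(chain))
```

Editing any stored payload after the fact changes its recomputed hash, so `verify_chain` reports the tampering. Retention (the 1-year storage period) and encryption at rest would be handled separately.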