A health state evaluation system based on multi-modal image feature analysis

By using multimodal image feature analysis and dynamic evaluation mechanisms, the problem of insufficient differentiation of physiological event types and severity in existing systems has been solved, enabling personalized health status assessment and early anomaly detection.

CN122291014A · Pending · Publication Date: 2026-06-26 · HUNAN CHUNSO VIENTIANE TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: HUNAN CHUNSO VIENTIANE TECH CO LTD
Filing Date: 2026-03-30
Publication Date: 2026-06-26

Application Information

Patent Timeline

30 Mar 2026: Application
26 Jun 2026: Publication (CN122291014A)

IPC: G16H50/30; G16H50/20; G16H50/70; G06V40/16; G06F18/241; G06F18/15; G06F18/2433; G06V10/44; G06F18/213; G06V10/54; G06F18/25; G06F18/27; G06V10/25

AI Tagging

Technology Topics: Medical treatment; Physics

AI Technical Summary

Technical Problem

Existing multimodal health status assessment systems cannot effectively distinguish physiological events by type and severity. As a result, they adapt poorly to individual subjects and fail to assess either the severity of the events themselves or the synergy between modalities.

Method used

The system uses multimodal image feature analysis. A chaos quantization module calculates the chaos index of each modality, and a synchronous detection module evaluates the real-time difference between modalities. When this difference crosses a dynamic threshold, the system switches assessment mode and adjusts the time window and fusion algorithm, achieving personalized monitoring.

Benefits of technology

It characterizes physiological states at a more fundamental level and identifies anomalies early, improves the sensitivity and specificity of anomaly detection, adapts to physiological differences between individuals, and responds promptly to sudden anomalies.

✦ Generated by Eureka AI based on patent content.

Figure CN122291014A_ABST

Abstract

This invention discloses a health status assessment system based on multimodal image feature analysis, in the field of medical and health monitoring technology. Based on multimodal image data and physiological signal time-series data, the system sequentially extracts features to generate feature time series, calculates the chaos index of each modality, calculates the real-time difference between the chaos indices, and compares that difference with a dynamic threshold. When the real-time difference exceeds the threshold, the system switches from steady-state monitoring to crisis early warning mode, adjusts the assessment time window, calls the corresponding fusion algorithm for comprehensive analysis, and outputs the health status assessment result. The invention uses a chaos quantification module to describe physiological states from a deeper, nonlinear dynamics perspective, characterizing physiological behavior more accurately; it uses a synchronous detection module to quantify the real-time difference between multimodal chaos indices, reflecting the overall degree of coordination disorder among the modalities; and by adaptively adjusting the assessment time window and fusion algorithm, it improves both the timeliness of the response to sudden anomalies and the accuracy of the assessment.

Description

Technical Field

[0001] This invention relates to the field of medical and health monitoring technology, and more specifically, to a health status assessment system based on multimodal image feature analysis.

Background Technology

[0002] With the accelerating aging of the population and the rising prevalence of chronic diseases, continuous health monitoring technologies have received widespread attention. Multimodal physiological monitoring systems, by integrating different types of data sources, can reflect the physiological state of the human body from multiple dimensions, providing decision support for early detection of abnormalities and clinical intervention. Currently, image-based heart rate monitoring technology, wearable physiological signal acquisition devices, and multimodal information fusion analysis technology are gradually being applied in the field of health management.

[0003] Existing multimodal health status assessment systems typically collect data within fixed time windows, analyze it using predefined fusion models, and output periodic assessment results. To handle sudden anomalies, some systems have introduced dynamic switching mechanisms. For example, when multiple premature ventricular contractions (PVCs) are detected in the electrocardiogram (ECG) signal, or when the blood pressure waveform fluctuates drastically, the system automatically switches to a high-frequency acquisition mode or triggers an alarm. This event-count-based switching improves responsiveness to some extent. However, existing dynamic switching mechanisms have the following limitations. Existing systems typically use fixed event-count thresholds as the basis for switching, such as "3 premature ventricular contractions within 2 minutes" or "more than 5 blood pressure fluctuations within 1 minute". Such a uniform-threshold rule ignores the significant effect of individual differences on the clinical significance of events: the same event frequency carries very different warning significance for a heart failure patient and for a healthy athlete, so the system adapts poorly to different individuals.

[0004] In addition, existing systems simply count abnormal events, treating events of different types and severities identically and failing to assess either the severity of the events themselves or the synergy between modalities. For example, a fatal ventricular fibrillation and a harmless premature contraction are each recorded as one event. This superficial approach cannot distinguish the clinical significance of events and may lead to missed serious events or false alarms for harmless ones. Therefore, this invention proposes a health status assessment system based on multimodal image feature analysis to address these problems.

Summary of the Invention

[0005] To achieve the above objectives, the present invention provides the following technical solution. A health status assessment system based on multimodal image feature analysis includes: a multimodal acquisition module, used to simultaneously acquire multimodal image data and the corresponding physiological signal time-series data of the subject; a feature extraction module, used to extract image features from the multimodal image data and signal features from the physiological signal time-series data to generate a feature time series for each modality; a chaos quantization module, used to calculate the chaos index of each modality's feature time series to quantify that modality's degree of chaos; a synchronous detection module, used to calculate the real-time difference between the chaos indices of the modalities and compare it with a preset dynamic threshold; and a dynamic assessment module, used to switch the assessment paradigm from steady-state monitoring mode to crisis early warning mode when the real-time difference exceeds the dynamic threshold, adjust the assessment time window, and call the corresponding preset fusion algorithm to comprehensively analyze the current multimodal data and generate and output the health status assessment result.

[0006] In a preferred embodiment, the multimodal acquisition module includes: an image acquisition unit, used to acquire at least one of a facial video image, an infrared thermal image, or a dermoscopy image of the subject; a physiological signal acquisition unit, used to synchronously acquire at least one of an electrocardiogram signal, a blood pressure waveform signal, or a photoplethysmography signal corresponding to the image data; and a synchronization control unit, connected to both the image acquisition unit and the physiological signal acquisition unit, used to generate a unified time reference signal so that the multimodal image data and the physiological signal time-series data are strictly aligned on the time axis.

[0007] In a preferred embodiment, the feature extraction module includes: an image feature extraction unit, used to perform spatiotemporal domain analysis on the multimodal image data acquired by the image acquisition unit, extract time-series image features including blood flow pulse features, skin texture features, or heat distribution features, and generate an image feature time series; and a physiological signal feature extraction unit, used to perform nonlinear dynamic analysis on the physiological signal time-series data acquired by the physiological signal acquisition unit, extract deep signal features including sample entropy features, correlation dimension features, or Lyapunov exponent features, and generate a signal feature time series.

[0008] In a preferred embodiment, spatiotemporal domain analysis means the following. For facial video images, a blood flow pulse signal, namely the time-varying mean pixel value of a facial region of interest, is extracted on the principle of photoplethysmography, generating an image feature time series that includes heart rate and heart rate variability. For infrared thermal images, a thermal distribution signal, namely the time-varying mean temperature of the region of interest, is extracted, generating a thermal image feature time series that includes temperature fluctuation features. For dermoscopy images, a signal tracking the gray-level co-occurrence matrix (GLCM) feature parameters of skin texture across consecutive frames is extracted, generating a skin image feature time series that includes texture evolution features.
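A minimal sketch of the three image signals in [0008], written in plain Python so it runs without imaging libraries. It assumes each frame has already been reduced to a single channel (the green channel for facial video, temperature for thermal frames) stored as a list of rows, that the region of interest is a fixed rectangle, and that dermoscopy frames are quantized to integer gray levels. The helper names (`roi_mean_series`, `glcm_contrast`) are hypothetical, and contrast at a one-pixel horizontal offset is only one of the GLCM parameters the text could mean.

```python
from typing import List, Sequence, Tuple

Frame = List[List[float]]          # one channel of one frame, indexed [row][col]
ROI = Tuple[int, int, int, int]    # (row_start, row_end, col_start, col_end), ends exclusive


def roi_mean(frame: Frame, roi: ROI) -> float:
    """Mean value of the pixels inside a rectangular region of interest."""
    r0, r1, c0, c1 = roi
    values = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(values) / len(values)


def roi_mean_series(frames: Sequence[Frame], roi: ROI) -> List[float]:
    """One ROI mean per frame, in acquisition order.

    On green-channel facial video this is the raw blood flow pulse waveform;
    on thermal frames it is the mean-temperature series.
    """
    return [roi_mean(f, roi) for f in frames]


def glcm_contrast(gray: List[List[int]], levels: int) -> float:
    """GLCM contrast for the horizontal neighbour offset (0, 1).

    Counts co-occurring gray-level pairs (i, j) between each pixel and its
    right-hand neighbour, normalises the counts to probabilities p(i, j) and
    returns the sum of p(i, j) * (i - j) ** 2.
    """
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for row in gray:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            total += 1
    return sum(counts[i][j] * (i - j) ** 2
               for i in range(levels) for j in range(levels)) / total
```

Applying `glcm_contrast` to each successive dermoscopy frame yields a texture evolution series in the same form as the ROI mean series, so all three image modalities reach the chaos analysis as plain numeric time series.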

[0009] In a preferred embodiment, the nonlinear dynamic analysis is implemented as follows. When the physiological signal time-series data is an electrocardiogram signal, the probability distribution of the distances between vector points in the reconstructed phase space is calculated, and the sample entropy feature is calculated from the cumulative sum of that distribution. When the data is a blood pressure waveform signal, the reconstructed phase space is divided into grids, the distribution probability of phase-space trajectory points is calculated at different scales, and the correlation dimension feature is taken as the slope of a linear fit to the logarithm of the distribution probability against the logarithm of the scale. When the data is a photoplethysmography signal, both the sample entropy feature and the correlation dimension feature are calculated and combined by weighted fusion to generate a feature time series containing signal complexity information.
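The sketch below illustrates the two features named in [0009] using common textbook definitions rather than the exact procedures described: standard sample entropy with tolerance r·σ, and the Grassberger-Procaccia correlation sum in place of the grid-based counting the text mentions. The fusion weight `w`, the embedding parameters, the sliding-window helper, and the radii (which assume a signal scaled to roughly [0, 1]) are illustrative assumptions.

```python
import math
import statistics
from typing import Callable, List, Sequence


def sample_entropy(x: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """Sample entropy -ln(A/B) with tolerance r * (population) standard deviation.

    B counts pairs of length-m templates whose Chebyshev distance is within the
    tolerance (self-matches excluded); A does the same for length m + 1.
    Both use the first len(x) - m start positions so the counts are comparable.
    """
    tol = r * statistics.pstdev(x)
    starts = range(len(x) - m)

    def matches(length: int) -> int:
        t = [list(x[i:i + length]) for i in starts]
        return sum(
            1
            for i in range(len(t))
            for j in range(i + 1, len(t))
            if max(abs(a - b) for a, b in zip(t[i], t[j])) <= tol
        )

    b, a = matches(m), matches(m + 1)
    return -math.log(a / b) if a and b else math.inf


def correlation_dimension(x: Sequence[float], dim: int = 2, tau: int = 1,
                          radii: Sequence[float] = (0.02, 0.04, 0.08, 0.16)) -> float:
    """Grassberger-Procaccia estimate: slope of ln C(eps) against ln eps.

    C(eps) is the fraction of pairs of delay vectors closer than eps
    (Chebyshev distance). The radii assume a signal scaled to about [0, 1].
    """
    n = len(x) - (dim - 1) * tau
    pts = [[x[i + k * tau] for k in range(dim)] for i in range(n)]
    dists = [
        max(abs(a - b) for a, b in zip(pts[i], pts[j]))
        for i in range(n) for j in range(i + 1, n)
    ]
    logs = []
    for eps in radii:
        c = sum(d < eps for d in dists) / len(dists)
        if c > 0:
            logs.append((math.log(eps), math.log(c)))
    if len(logs) < 2:
        return math.nan                      # not enough non-empty radii to fit a slope
    mx = statistics.mean(u for u, _ in logs)
    my = statistics.mean(v for _, v in logs)
    num = sum((u - mx) * (v - my) for u, v in logs)
    return num / sum((u - mx) ** 2 for u, _ in logs)


def fused_complexity(window: Sequence[float], w: float = 0.5) -> float:
    """PPG case: weighted fusion of the two features (weight w is an assumption)."""
    return w * sample_entropy(window) + (1 - w) * correlation_dimension(window)


def feature_series(signal: Sequence[float], win: int, step: int,
                   feature: Callable[[Sequence[float]], float]) -> List[float]:
    """Slide a window along the signal to turn a per-window feature into a time series."""
    return [feature(signal[s:s + win]) for s in range(0, len(signal) - win + 1, step)]
```

A regular signal gives low sample entropy and an irregular one gives high sample entropy, which is why the text treats it as a complexity feature. Both estimators are quadratic in window length, so short windows suit a real-time loop.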

[0010] In a preferred embodiment, the chaos quantization module includes: an image chaos calculation unit, connected to the image feature extraction unit, which receives the image feature time series, reconstructs its phase space, analyzes the rate at which the phase trajectory evolves in the reconstructed phase space using a Kolmogorov entropy calculation, and takes the Kolmogorov entropy of the image modality as that modality's chaos index; a signal chaos calculation unit, connected to the physiological signal feature extraction unit, which receives the signal feature time series, reconstructs its phase space, and calculates its maximum Lyapunov exponent using the small-data-set method, which serves as the chaos index of the signal modality; and a chaos index output unit, connected to both calculation units, which synchronously outputs the Kolmogorov entropy of the image modality and the maximum Lyapunov exponent of the signal modality to the synchronization detection module.
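For [0010], a minimal sketch of both chaos indices, assuming delay embedding with hypothetical parameters `dim`, `tau`, and `theiler`. The largest Lyapunov exponent follows the Rosenstein small-data-set approach: pair each reconstructed point with its nearest temporally separated neighbour and fit the slope of the mean log separation over the next few samples. For the image modality, the sketch substitutes the K2 correlation entropy of Grassberger and Procaccia, a standard computable lower bound on Kolmogorov entropy; the text does not say which Kolmogorov entropy estimator is used.

```python
import math
from typing import List, Sequence


def embed(x: Sequence[float], dim: int, tau: int) -> List[List[float]]:
    """Delay-coordinate reconstruction: point i is (x[i], x[i+tau], ...)."""
    n = len(x) - (dim - 1) * tau
    return [[x[i + k * tau] for k in range(dim)] for i in range(n)]


def cheb(a: Sequence[float], b: Sequence[float]) -> float:
    """Chebyshev (max-norm) distance between two reconstructed points."""
    return max(abs(p - q) for p, q in zip(a, b))


def slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    return num / sum((u - mx) ** 2 for u in xs)


def largest_lyapunov(x, dim=2, tau=1, theiler=10, steps=5, dt=1.0) -> float:
    """Small-data-set (Rosenstein-style) estimate of the largest Lyapunov exponent."""
    pts = embed(x, dim, tau)
    n = len(pts) - steps                 # points whose future is known for `steps` samples
    log_sum = [0.0] * (steps + 1)
    count = [0] * (steps + 1)
    for j in range(n):
        # Nearest neighbour, excluding temporally close points (Theiler window).
        best, nn = math.inf, -1
        for k in range(n):
            if abs(j - k) > theiler:
                d = cheb(pts[j], pts[k])
                if 0.0 < d < best:
                    best, nn = d, k
        if nn < 0:
            continue
        # Track how the pair separates over the next `steps` samples.
        for i in range(steps + 1):
            d = cheb(pts[j + i], pts[nn + i])
            if d > 0.0:
                log_sum[i] += math.log(d)
                count[i] += 1
    ts = [i * dt for i in range(steps + 1) if count[i]]
    mean_log = [log_sum[i] / count[i] for i in range(steps + 1) if count[i]]
    return slope(ts, mean_log)


def correlation_entropy(x, dim=2, tau=1, eps=0.05, dt=1.0) -> float:
    """K2 entropy: ln(C_dim(eps) / C_dim+1(eps)) / (tau * dt), a lower bound on K."""
    n = len(x) - dim * tau               # same number of points for both dimensions

    def corr_sum(m: int) -> float:
        pts = embed(x, m, tau)[:n]
        close = sum(
            cheb(pts[i], pts[j]) < eps
            for i in range(n) for j in range(i + 1, n)
        )
        return close / (n * (n - 1) / 2)

    c_m, c_next = corr_sum(dim), corr_sum(dim + 1)
    return math.log(c_m / c_next) / (tau * dt) if c_next > 0 else math.inf
```

On a chaotic logistic-map series the Lyapunov estimate should come out clearly positive, while on a pure sine wave it should stay near zero; that contrast between irregular and regular dynamics is what the chaos index is meant to capture.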

[0011] In a preferred embodiment, the synchronization detection module includes the following units. An index normalization unit, connected to the chaos index output unit, receives the Kolmogorov entropies corresponding to each image feature time series in the image modality set and the maximum Lyapunov exponents corresponding to each signal feature time series in the signal modality set, and normalizes each value using the baseline value of its own modality to generate a set of image chaos normalized values and a set of signal chaos normalized values. A cross-modal pairing unit, connected to the index normalization unit, pairs each value in the image set with each value in the signal set to generate a number of cross-modal normalized value pairs. A difference calculation unit, connected to the cross-modal pairing unit, calculates the absolute difference between the two normalized values in each pair and takes a weighted average of all absolute differences to generate a chaotic synchronization difference that characterizes the overall synchronization state. A timing generation unit, connected to the difference calculation unit, arranges the chaotic synchronization differences calculated within a continuous time window in chronological order to generate a chaotic synchronization difference sequence. A threshold dynamic setting unit acquires the subject's historical chaotic synchronization difference sequence data recorded in a healthy state, calculates its mean and standard deviation, sets the dynamic threshold to the mean plus a preset multiple of the standard deviation, and updates it in real time. A comparison output unit, connected to both the timing generation unit and the threshold dynamic setting unit, compares the current value of the chaotic synchronization difference sequence with the dynamic threshold and, when the current value exceeds the threshold, outputs a trigger signal to the dynamic evaluation module.
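The units of [0011] can be sketched as a small detector. Several details are assumptions not fixed by the text: normalization divides each index by its modality's baseline, the standard deviation is the sample standard deviation, and the preset multiple `k` defaults to 3. `SyncDetector` and its methods are hypothetical names, and "updating in real time" is modelled simply by appending newly confirmed healthy-state differences to the history.

```python
import statistics
from typing import List, Sequence


def normalize(indices: Sequence[float], baselines: Sequence[float]) -> List[float]:
    """Express each chaos index relative to its own modality baseline (ratio form assumed)."""
    return [v / b for v, b in zip(indices, baselines)]


def sync_difference(img_norm: Sequence[float], sig_norm: Sequence[float],
                    pair_w: Sequence[Sequence[float]]) -> float:
    """Weighted mean of |image value - signal value| over every cross-modal pair.

    pair_w[i][j] is the weight of the pair (image modality i, signal modality j).
    """
    num = den = 0.0
    for i, a in enumerate(img_norm):
        for j, b in enumerate(sig_norm):
            num += pair_w[i][j] * abs(a - b)
            den += pair_w[i][j]
    return num / den


def dynamic_threshold(healthy_history: Sequence[float], k: float = 3.0) -> float:
    """Mean plus k sample standard deviations of the healthy-state differences."""
    return statistics.mean(healthy_history) + k * statistics.stdev(healthy_history)


class SyncDetector:
    """Keeps the chaotic synchronization difference sequence and raises a trigger."""

    def __init__(self, healthy_history: Sequence[float], k: float = 3.0):
        self.healthy = list(healthy_history)
        self.k = k
        self.sequence: List[float] = []

    def add_healthy(self, d: float) -> None:
        """Fold a newly confirmed healthy-state difference into the threshold history."""
        self.healthy.append(d)

    def update(self, img_idx, img_base, sig_idx, sig_base, pair_w) -> bool:
        """Append the current difference; return True if it exceeds the threshold."""
        d = sync_difference(normalize(img_idx, img_base),
                            normalize(sig_idx, sig_base), pair_w)
        self.sequence.append(d)
        return d > dynamic_threshold(self.healthy, self.k)
```

Because the threshold is derived from each subject's own healthy history, the same absolute difference can trigger for one person and not for another, which is the individual adaptation the text emphasizes.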

[0012] In a preferred embodiment, the weights of the weighted average are preset as follows. Obtain the standard dataset for the current health assessment task; it contains sets of multimodal chaos normalized values from historical subjects together with their true health status labels. Calculate the mutual information between each image modality's chaos normalized value and the health status label, and likewise for each signal modality. Mutual information is the relative entropy (Kullback-Leibler divergence) between the joint probability distribution of the two variables and the product of their marginal distributions, and it measures the nonlinear correlation between each modality and the health status. Divide the mutual information of each image modality by the sum over all image modalities to obtain that modality's weight, so that the image modality weights sum to 1; do the same for the signal modalities, so that the signal modality weights also sum to 1. For each cross-modal pair of an image modality and a signal modality, multiply the two modality weights to obtain the weight of the pair; because each set of modality weights sums to 1, the weights of all cross-modal pairs also sum to 1. The resulting pair weights are stored for use by the difference calculation unit during weighted averaging.
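A sketch of the weight derivation in [0012], assuming the continuous normalized values are discretized into equal-width bins so that mutual information can be estimated from counts; the text does not specify an estimator. The equal-weight fallback when every mutual information value is zero is also an assumption.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def discretize(values: Sequence[float], bins: int = 4) -> List[int]:
    """Equal-width binning so mutual information can be estimated from counts."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X;Y) = KL(p(x, y) || p(x) p(y)) in nats, from empirical counts."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def modality_weights(columns: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """MI of each modality's normalized chaos values with the label, scaled to sum to 1."""
    mi = [mutual_information(discretize(col), labels) for col in columns]
    total = sum(mi)
    if total == 0:                       # assumption: fall back to equal weights
        return [1 / len(mi)] * len(mi)
    return [m / total for m in mi]


def pair_weights(img_w: Sequence[float], sig_w: Sequence[float]) -> List[List[float]]:
    """Cross-modal pair weight = image weight * signal weight; all pairs sum to 1."""
    return [[a * b for b in sig_w] for a in img_w]
```

A modality whose chaos values carry no information about the label receives zero weight, so every pair involving it drops out of the weighted average.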

[0013] In a preferred embodiment, after receiving the trigger signal, the dynamic evaluation module adjusts the evaluation time window and calls the corresponding fusion algorithm as follows. Obtain the over-threshold amplitude, defined as the ratio of the current chaotic synchronization difference to the dynamic threshold, minus one. Obtain the first derivative of the chaotic synchronization difference sequence over the most recent preset time period as the rate of difference change. Compute an exponential with the natural constant e as its base, whose exponent is the over-threshold amplitude multiplied by a first adjustment coefficient plus the rate of difference change multiplied by a second adjustment coefficient; the result is used as the overall adjustment coefficient. Divide the base time window length of the steady-state monitoring mode by this coefficient to obtain an intermediate time window length. Compare the intermediate length with a preset minimum time window length and take the larger of the two as the final adjusted evaluation time window length.
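Writing D for the current chaotic synchronization difference, T for the dynamic threshold, r for the rate of difference change, k1 and k2 for the two adjustment coefficients, W0 for the base window and Wmin for the minimum window (symbols chosen here for illustration), [0013] amounts to A = D/T - 1, c = exp(k1·A + k2·r) and W = max(W0/c, Wmin). The sketch below implements that, approximating the derivative with an end-to-end finite difference (an assumption; a least-squares slope would fit the description equally well). The coefficient defaults are placeholders.

```python
import math
from typing import Sequence


def over_threshold_amplitude(d_now: float, threshold: float) -> float:
    """A = D / T - 1: how far, relatively, the difference exceeds the threshold."""
    return d_now / threshold - 1.0


def difference_rate(recent: Sequence[float], dt: float) -> float:
    """First derivative of the recent difference sequence (end-to-end finite difference)."""
    return (recent[-1] - recent[0]) / (dt * (len(recent) - 1))


def adjusted_window(d_now: float, threshold: float, rate: float,
                    base_window: float, min_window: float,
                    k1: float = 1.0, k2: float = 1.0) -> float:
    """Compressed evaluation window: max(W0 / exp(k1*A + k2*r), Wmin)."""
    a = over_threshold_amplitude(d_now, threshold)
    coeff = math.exp(k1 * a + k2 * rate)
    return max(base_window / coeff, min_window)
```

With a 60 s base window, a difference 50% above threshold and a flat trend, the window shrinks to 60/e^0.5, roughly 36 s; a much larger exceedance drives it down to the minimum window instead of towards zero.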

[0014] The preset fusion algorithms include a fast fusion algorithm based on logistic regression, a balanced fusion algorithm based on support vector machines, and a precise fusion algorithm based on deep neural networks. According to the preset interval in which the over-threshold amplitude falls, the corresponding fusion algorithm is selected from the pre-stored algorithms to comprehensively analyze the multimodal data within the current time window, and the resulting health status assessment is output.
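[0014] does not give the interval boundaries, so the sketch below uses hypothetical cut points of 0.2 and 0.5 on the over-threshold amplitude and returns placeholder names in place of trained models.

```python
import bisect
from typing import Tuple

# Hypothetical placeholders for the three pre-stored fusion models.
FUSION_ALGORITHMS: Tuple[str, ...] = (
    "logistic_regression_fast",   # mild exceedance: cheapest model
    "svm_balanced",               # moderate exceedance
    "dnn_precise",                # severe exceedance: most accurate, most costly
)
# Assumed interval boundaries on the over-threshold amplitude A = D / T - 1.
AMPLITUDE_BOUNDS: Tuple[float, ...] = (0.2, 0.5)


def select_fusion(over_threshold: float) -> str:
    """Pick the fusion algorithm whose interval contains the over-threshold amplitude.

    [0, 0.2) -> fast, [0.2, 0.5) -> balanced, [0.5, inf) -> precise.
    """
    return FUSION_ALGORITHMS[bisect.bisect_right(AMPLITUDE_BOUNDS, over_threshold)]
```

In a full system the returned key would index a registry of trained models; keeping the bounds in a sorted tuple means adding a tier only requires one new boundary and one new name.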

[0015] The technical effects and advantages of this invention are as follows. The chaos quantification module performs nonlinear dynamic analysis on the feature time series of each modality and calculates chaos indices that quantify each modality's degree of chaos, characterizing the intrinsic complexity and evolution patterns of physiological activity at a deeper dynamic level. This elevates the raw multimodal data from a time-domain description to a nonlinear dynamic description and reveals physiological characteristics that are not visible in the raw waveforms. For example, Kolmogorov entropy quantifies the information generation rate of the image feature sequences, and the maximum Lyapunov exponent quantifies the trajectory divergence rate of the signal feature sequences. This avoids the limitation of existing technologies that merely count abnormal events without distinguishing their severity: physiological events of different types and severities that an event counter records identically produce markedly different chaos indices. For instance, a single ventricular fibrillation changes the chaos index far more than a harmless premature contraction does. The system can therefore distinguish the intrinsic severity of events at the dynamic level, describe physiological states more fundamentally, and provide a solid data foundation for the subsequent synchronization analysis and dynamic regulation.

[0016] The synchronous detection module calculates the real-time difference between the chaos indices of different modalities and compares it with a dynamic threshold, condensing the degree of synchronization and coordination of the multimodal system at the dynamic level into a single index. This real-time difference is not a simple accumulation of event counts; it reflects the coupling between modalities at the level of chaotic dynamics. When the physiological systems measured by the different modalities operate in coordination, their chaos indices fluctuate at similar levels and the real-time difference is small. When an anomaly in one modality changes its dynamic characteristics, that modality's chaos index deviates from the others and the real-time difference increases. This mechanism therefore assesses both the nature of the events themselves and the synergy between events in different modalities. For example, when the blood flow pulse features of the image modality become abnormal and the electrocardiogram features of the signal modality change accordingly, the chaos indices of both deviate together; the real-time difference can capture this synergistic multimodal anomaly, whereas single-modality analysis or simple event counting can hardly identify such cross-modal coupling. Because the dynamic threshold is set from statistics of the subject's chaotic synchronization differences in historical healthy states, the system can identify early signs of decoupling or imbalance in the multimodal system sooner and more accurately, avoids the rigidity of fixed rule-based thresholds, significantly improves the sensitivity and specificity of anomaly detection, and adapts to physiological differences between individuals.

[0017] The dynamic evaluation module switches the evaluation paradigm from steady-state monitoring to crisis early warning mode when the real-time difference exceeds the dynamic threshold, adjusts the evaluation time window, and invokes the corresponding preset fusion algorithm, so that the evaluation strategy adapts to the rate at which the physiological state evolves. A chaotic synchronization difference above the dynamic threshold indicates that the chaotic synchronization state of the multimodal system has deviated significantly from the individual's normal fluctuation range; the system therefore judges that the physiological state may be evolving rapidly and requires finer monitoring and more accurate analysis. The dynamic evaluation module compresses the evaluation time window according to the over-threshold amplitude and the rate of difference change: the larger the amplitude and the faster the change, the more the window is compressed. When the physiological state deteriorates rapidly, the system thus automatically shortens the evaluation window to capture transient changes. At the same time, it invokes fusion algorithms of different complexity according to the interval in which the over-threshold amplitude falls: a fast algorithm for mild anomalies to save resources, and a precise algorithm for severe anomalies to ensure accuracy. This adaptive mechanism lets the system move smoothly between working modes, from routine monitoring to crisis early warning, and avoids the limitations of existing technologies, such as fixed assessment cycles and a single fusion algorithm, that cannot follow the non-uniform evolution of physiological states. This significantly improves the timeliness of the response to sudden abnormalities and the accuracy of the assessment, gaining valuable time for clinical intervention.

Attached Figure Description

[0018] To facilitate understanding by those skilled in the art, the present invention is further described below with reference to the accompanying drawings. Figure 1 is a schematic diagram of a health status assessment system based on multimodal image feature analysis according to the present invention.

Detailed Implementation

[0019] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.

[0020] Referring to Figure 1, Example 1 provides a health status assessment system based on multimodal image feature analysis, comprising the following modules. The multimodal acquisition module simultaneously acquires multimodal image data and the corresponding physiological signal time-series data of the subject. The multimodal image data contains visual information reflecting the physiological activity of the body surface and superficial tissues, while the physiological signal time-series data contains continuous waveform information reflecting the electrophysiological activity or hemodynamics of internal organs. Synchronous acquisition allows subsequent analysis to process physiological information from different sources against the same time reference. The feature extraction module extracts image features from the multimodal image data and signal features from the physiological signal time-series data, generating a feature time series for each modality. Image features reflect how pixel values, textures, or temperatures in the image data change over time, and signal features reflect how waveform complexity or variability in the physiological signals changes over time; this transforms the raw data into feature sequences that are physiologically meaningful and suitable for chaos analysis. The chaos quantification module calculates the chaos index of each modality's feature time series to quantify its degree of chaos. The chaos index reflects, at the dynamic level, the inherent randomness of each modality's physiological activity, its sensitivity to initial conditions, and the complexity of the system, raising the feature time series from a time-domain description to a nonlinear dynamic description. The synchronization detection module calculates the real-time difference between the chaos indices of the modalities and compares it with the preset dynamic threshold. The real-time difference reflects how well the physiological systems measured by the different modalities are synchronized and coordinated at the level of chaotic dynamics: the smaller the difference, the tighter their coupling; the larger the difference, the more they tend toward decoupling or disorder. The dynamic assessment module switches the assessment paradigm from steady-state monitoring mode to crisis early warning mode when the real-time difference exceeds the dynamic threshold, adjusts the assessment time window, and calls the corresponding preset fusion algorithm to comprehensively analyze the current multimodal data and generate and output the health status assessment result. The steady-state monitoring mode uses a longer time window and a stable fusion algorithm suited to routine monitoring, while the crisis early warning mode uses a shorter time window and a more sensitive fusion algorithm to capture transient anomalies and respond quickly.

[0021] The multimodal acquisition module includes the following units. The image acquisition unit acquires at least one of a facial video image, an infrared thermal image, or a dermoscopy image of the subject. The facial video image records how the intensity of light reflected from the facial skin changes over time, from which subtle heartbeat-related color changes can be extracted. The infrared thermal image records how the radiation from the body-surface temperature distribution changes over time, from which temperature fluctuations related to local metabolic activity and blood perfusion can be extracted. The dermoscopy image records magnified images of the skin-surface microstructure over time, from which the evolution of skin texture, blood vessel morphology, and pigment distribution can be extracted. The physiological signal acquisition unit synchronously acquires at least one of the following signals corresponding to the image data: an electrocardiogram (ECG) signal, a blood pressure waveform signal, or a photoplethysmography (PPG) signal. The ECG signal records the body-surface potential changes produced by the heart's electrical activity, reflecting the electrophysiological characteristics and rhythm of the myocardium. The blood pressure waveform signal records how arterial pressure fluctuates over the cardiac cycle, reflecting the heart's pumping function and vascular elasticity. The PPG signal records fluctuations in light absorption as peripheral blood volume changes with each heartbeat, reflecting peripheral circulation and autonomic nervous regulation. Because acquisition is synchronous, the surface phenomena captured in the image data and the internal processes captured in the physiological signals can be cross-checked against each other for the same physiological event. The synchronization control unit, connected to both the image acquisition unit and the physiological signal acquisition unit, generates a unified time reference signal so that the multimodal image data and the physiological signal time-series data are strictly aligned on the time axis. The time reference signal is a synchronization pulse generated by a high-precision clock source. Each frame from the image acquisition unit and each sampling point from the physiological signal acquisition unit is timestamped against this common reference, so subsequent analysis can establish the time correspondence between the modalities with millisecond-level accuracy, providing a reliable data foundation for cross-modal synchronization analysis based on chaotic dynamics.
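The shared clock in [0021] is what makes cross-modal alignment possible, but it does not by itself put a video stream and a faster-sampled physiological signal on the same grid. One simple way to use the common timestamps, offered here as an assumption rather than as the patent's procedure, is to linearly interpolate each physiological signal at the video frame timestamps. The function name `align_to_frames` is hypothetical.

```python
import bisect
from typing import List, Sequence


def align_to_frames(frame_ts: Sequence[float], sig_ts: Sequence[float],
                    sig_vals: Sequence[float]) -> List[float]:
    """Resample a physiological signal at the video frame timestamps.

    Both timestamp lists come from the same clock and are sorted ascending.
    Values between samples are linearly interpolated; frames outside the
    signal's time span take the nearest end value.
    """
    out = []
    for t in frame_ts:
        k = bisect.bisect_left(sig_ts, t)     # first sample at or after t
        if k == 0:
            out.append(float(sig_vals[0]))
        elif k == len(sig_ts):
            out.append(float(sig_vals[-1]))
        else:
            t0, t1 = sig_ts[k - 1], sig_ts[k]
            v0, v1 = sig_vals[k - 1], sig_vals[k]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out
```

Interpolating the denser signal onto the sparser frame times avoids inventing image data between frames; the reverse direction would also work but would upsample the video features.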

[0022] The feature extraction module includes: an image feature extraction unit, used to perform spatiotemporal domain analysis on the multimodal image data acquired by the image acquisition unit, extract time-series image features including blood flow pulse features, skin texture features, or heat distribution features, and generate an image feature time series; and a physiological signal feature extraction unit, used to perform nonlinear dynamic analysis on the physiological signal time-series data acquired by the physiological signal acquisition unit, extract deep signal features including sample entropy features, correlation dimension features, or Lyapunov exponent features, and generate a signal feature time series.

[0023] Spatiotemporal domain analysis is carried out as follows. For facial video images, a blood flow pulse signal is extracted from the change over time of the mean pixel value in the facial region of interest (ROI), based on the photoplethysmography (PPG) principle. This signal is used to generate an image feature time series that includes heart rate and heart rate variability. The ROI is a skin region rich in blood flow signals selected in the facial video image, typically the forehead, the cheeks, or the entire face. These regions have abundant subcutaneous blood vessels and are relatively little affected by facial expressions, so they are well suited to extracting the subtle heartbeat-related color changes. The PPG principle relies on the fact that periodic changes in the blood volume of facial vessels during cardiac contraction and relaxation alter how the skin absorbs and reflects light. When ambient light illuminates the skin surface, the light intensity entering the image acquisition unit therefore fluctuates periodically with blood volume. These fluctuations are subtle, but they can be captured through the spatiotemporal variation of pixel values in the video. The blood flow pulse signal is extracted as follows. In each frame of the facial video, the mean of all pixels in the selected ROI is calculated in the red or green channel. Because hemoglobin absorbs green light strongly, the green channel usually carries the most pronounced pulse fluctuation. The per-frame means are arranged in chronological order to form the raw blood flow pulse waveform. This waveform is then band-pass filtered to remove low-frequency interference, such as respiratory movement and head displacement, as well as high-frequency noise. The filtered waveform is the clean blood flow pulse signal.
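A minimal Python sketch of this step follows. It assumes that RGB frames are already loaded as a NumPy array of shape (T, H, W, 3) and that the ROI is a rectangle given as pixel bounds; both conventions are hypothetical. The patent does not specify the band-pass filter. An ideal FFT mask over 0.7 to 4 Hz (about 42 to 240 beats/min) is used here purely as an illustrative assumption. A practical system would more likely use a Butterworth or similar IIR filter.

```python
import numpy as np

# Illustrative sketch: helper names, ROI convention and the 0.7-4 Hz band are assumptions.
def roi_green_mean(frames, roi):
    """Mean green-channel value of a rectangular ROI in each frame.

    frames: array (T, H, W, 3) in RGB order; roi: (row_start, row_stop, col_start, col_stop).
    """
    r0, r1, c0, c1 = roi
    return frames[:, r0:r1, c0:c1, 1].astype(float).mean(axis=(1, 2))

def fft_bandpass(signal, fs, low_hz=0.7, high_hz=4.0):
    """Ideal band-pass: zero every FFT bin outside [low_hz, high_hz], then invert."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x - x.mean())          # remove DC before filtering
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))
```

For example, `fft_bandpass(roi_green_mean(frames, (40, 80, 60, 120)), fs=30.0)` would give the filtered pulse waveform for a forehead box at 30 frames/s (the coordinates are arbitrary).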
The image feature time series containing heart rate and heart rate variability is generated as follows. Peak detection is performed on the filtered blood flow pulse signal to locate the peak of each pulse wave. The time interval between adjacent peaks is the instantaneous cardiac period (inter-beat interval). Each instantaneous cardiac period T, in seconds, is converted into an instantaneous heart rate of 60/T beats per minute. All instantaneous heart rate values are arranged in chronological order to obtain the heart rate time series. In parallel, the differences between consecutive instantaneous cardiac periods are analyzed to obtain time-domain heart rate variability indices. The heart rate time series and the heart rate variability indices together form the image feature time series.
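The peak-detection and heart rate step can be sketched as follows. Some choices here are assumptions made for illustration: the greedy peak picker, its 0.33 s minimum gap between peaks, and the use of SDNN and RMSSD as the time-domain HRV indices. The patent only says that peaks are detected and that differences between consecutive cardiac periods are analyzed. `detect_peaks` and `heart_rate_features` are hypothetical names.

```python
import numpy as np

# Illustrative sketch: peak picker, 0.33 s gap and SDNN/RMSSD choice are assumptions.
def detect_peaks(pulse, fs, min_interval_s=0.33):
    """Local maxima of the pulse signal, kept at least min_interval_s apart."""
    x = np.asarray(pulse, dtype=float)
    # candidate: higher than the left neighbour and not lower than the right one
    cand = np.flatnonzero((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:])) + 1
    min_gap = int(round(min_interval_s * fs))
    kept = []
    # visit candidates from tallest to shortest, skipping any too close to a kept peak
    for i in cand[np.argsort(x[cand])[::-1]]:
        if all(abs(int(i) - k) >= min_gap for k in kept):
            kept.append(int(i))
    return np.array(sorted(kept), dtype=int)

def heart_rate_features(pulse, fs):
    """Instantaneous heart rate series plus two common time-domain HRV indices."""
    peaks = detect_peaks(pulse, fs)
    ibi = np.diff(peaks) / fs                      # instantaneous cardiac periods, s
    hr = 60.0 / ibi                                # instantaneous heart rate, beats/min
    return {
        "hr_series": hr,
        "sdnn_ms": 1000.0 * np.std(ibi, ddof=1),                   # spread of periods
        "rmssd_ms": 1000.0 * np.sqrt(np.mean(np.diff(ibi) ** 2)),  # beat-to-beat change
    }
```

A perfectly regular 1.2 Hz pulse gives a constant 72 beats/min series and zero RMSSD. Real recordings would need at least three detected beats for both indices to be defined.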

[0024] For infrared thermal imaging images, a thermal distribution signal is extracted. This signal is the change over time of the mean temperature in the region of interest (ROI), and it is used to generate a thermal image feature time series containing temperature fluctuation characteristics. Infrared thermal imaging records the intensity of the infrared radiation emitted by the subject's body surface. This intensity increases monotonically with skin surface temperature and, over the narrow physiological temperature range, approximately linearly. The higher the temperature, the stronger the radiation and the larger the pixel value in the image. The ROI is an anatomical region closely related to physiological state, selected in the thermal image, typically the tip of the nose, the forehead, the periorbital region, or the fingers. Temperature changes in these areas are directly affected by local blood perfusion, metabolic activity, and autonomic nervous system regulation, and they reflect the dynamic evolution of deeper physiological states. The thermal distribution signal is extracted as follows. In each thermal frame, the mean of all pixels in the selected ROI is calculated; this mean is linearly related to the mean temperature of the region. The per-frame means are arranged in chronological order to form the raw thermal distribution waveform. This waveform is then detrended to remove slow ambient temperature drift, leaving a clean temperature fluctuation signal.
The thermal image feature time series containing temperature fluctuation characteristics is generated as follows. Power spectral density analysis is performed on the temperature fluctuation signal. The energy in the frequency bands corresponding to the respiratory rate and the heart rate is extracted as feature parameters, and statistical indicators such as the mean amplitude, variance, and peak frequency of the temperature fluctuation are calculated. These feature parameters are arranged in chronological order, individually or in combination, to form the thermal image feature time series.
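The thermal pipeline (ROI mean, detrending, band energies, summary statistics) can be sketched as below. The following are illustrative assumptions, not values from the patent:

- "detrending" is taken as removing a least-squares straight line;
- the respiratory band is 0.1 to 0.5 Hz and the cardiac band 0.7 to 3.0 Hz;
- "mean amplitude" is read as the mean absolute value;
- the band energies come from an unnormalized periodogram, which is adequate for comparing bands.

The cardiac band is only resolvable if the thermal camera samples faster than about 6 Hz. Helper names are hypothetical.

```python
import numpy as np

# Illustrative sketch: linear detrend, band limits and helper names are assumptions.
def thermal_roi_signal(frames, roi):
    """Per-frame ROI mean of a (T, H, W) thermal stack with a fitted straight line removed."""
    r0, r1, c0, c1 = roi
    raw = frames[:, r0:r1, c0:c1].astype(float).mean(axis=(1, 2))
    t = np.arange(len(raw), dtype=float)
    slope, intercept = np.polyfit(t, raw, 1)      # slow ambient drift modelled as a line
    return raw - (slope * t + intercept)

def periodogram(signal, fs):
    """Frequencies and unnormalised periodogram |FFT|^2 / n for non-negative frequencies."""
    spec = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    return np.fft.rfftfreq(len(signal), d=1.0 / fs), spec

def thermal_features(signal, fs, resp_band=(0.1, 0.5), cardiac_band=(0.7, 3.0)):
    freqs, spec = periodogram(signal, fs)

    def band(lo, hi):
        return float(spec[(freqs >= lo) & (freqs <= hi)].sum())

    return {
        "resp_energy": band(*resp_band),
        "cardiac_energy": band(*cardiac_band),
        "mean_abs_amplitude": float(np.mean(np.abs(signal))),
        "variance": float(np.var(signal)),
        "peak_freq_hz": float(freqs[1:][np.argmax(spec[1:])]),   # skip the DC bin
    }
```

In continuous monitoring these features would be computed for each sliding window, and the per-window dictionaries stacked into the thermal image feature time series.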

[0025] For dermoscopic images, the gray-level co-occurrence matrix (GLCM) texture parameters of the skin are extracted from consecutive frames as signals over time. These signals form a skin image feature time series containing texture evolution characteristics. Dermoscopic images are magnified images of the skin surface acquired with a dermoscope. They clearly show fine structures such as texture direction, the morphology of skin furrows and ridges, hair follicle openings, and superficial vascular networks. The GLCM is a second-order statistical method for describing image texture. It computes the probability that two pixels in a given spatial relationship take a given pair of gray values, and thereby reflects the spatial dependence of gray levels in terms of direction, distance, and amplitude of variation. The GLCM parameters are extracted over time as follows. In each dermoscopic frame, the color image is first converted to a grayscale image, and a region of interest containing representative skin texture is selected. GLCMs are computed for this region in four directions at a preset step size. From each GLCM, four typical texture parameters are extracted: contrast, correlation, energy, and homogeneity. Contrast reflects the sharpness of the texture and the depth of the furrows. Correlation reflects the directional consistency of the texture. Energy reflects its uniformity and coarseness. Homogeneity reflects the smoothness of local variation. The four parameters are averaged over the four directions to give the texture feature vector of the frame.
The skin image feature time series containing texture evolution characteristics is generated as follows. The texture feature vectors of consecutive frames are arranged in chronological order. The contrast sequence reflects how the skin texture gradually deepens or fades. The correlation sequence reflects how the directional consistency of the texture fluctuates with skin state. The energy sequence reflects the evolution of skin surface roughness. The homogeneity sequence reflects changes in the stability of the skin microstructure. These sequences are used individually or in combination as skin image feature time series for the subsequent chaos quantification analysis.
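The following is a dependency-free NumPy sketch of the per-frame GLCM features and the resulting time series. Several choices are assumptions not stated in the patent:

- the four directions are taken as 0°, 45°, 90°, and 135° at distance 1;
- grayscale conversion uses BT.601 luma weights;
- gray levels are quantized to 16;
- matrices are made symmetric and normalized;
- "energy" is computed as the angular second moment. Some libraries report its square root instead; scikit-image's `graycoprops`, for example, does so for `'energy'`.

All helper names are hypothetical.

```python
import numpy as np

# Illustrative sketch: directions, 16 levels, BT.601 weights and helper names are assumptions.
def _offsets(d):
    """(row, col) offsets for 0, 45, 90 and 135 degrees at distance d."""
    return [(0, d), (-d, d), (-d, 0), (-d, -d)]

def glcm(gray, levels, dr, dc):
    """Symmetric, normalised gray-level co-occurrence matrix for one offset."""
    h, w = gray.shape
    r0, r1 = max(0, -dr), h - max(0, dr)
    c0, c1 = max(0, -dc), w - max(0, dc)
    a = gray[r0:r1, c0:c1]                        # reference pixels
    b = gray[r0 + dr:r1 + dr, c0 + dc:c1 + dc]    # neighbours at the offset
    P = np.zeros((levels, levels))
    np.add.at(P, (a.ravel(), b.ravel()), 1.0)
    P = P + P.T                                   # count (i, j) and (j, i)
    return P / P.sum()

def glcm_features(P):
    """(contrast, correlation, energy as ASM, homogeneity) of one GLCM."""
    L = P.shape[0]
    i, j = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * P))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * P))
    if sd_i > 0 and sd_j > 0:
        corr = np.sum((i - mu_i) * (j - mu_j) * P) / (sd_i * sd_j)
    else:
        corr = 1.0                                # convention for a constant patch
    return np.array([
        np.sum((i - j) ** 2 * P),                 # contrast
        corr,                                     # correlation
        np.sum(P ** 2),                           # energy (angular second moment)
        np.sum(P / (1.0 + (i - j) ** 2)),         # homogeneity
    ])

def frame_texture_vector(rgb, roi, levels=16, d=1):
    """Direction-averaged GLCM features of one RGB frame (values 0..255)."""
    r0, r1, c0, c1 = roi
    gray = rgb[r0:r1, c0:c1].astype(float) @ np.array([0.299, 0.587, 0.114])
    q = np.clip((gray / 256.0 * levels).astype(int), 0, levels - 1)
    return np.mean([glcm_features(glcm(q, levels, dr, dc)) for dr, dc in _offsets(d)], axis=0)

def texture_time_series(frames, roi, **kw):
    """Rows are frames; columns are contrast, correlation, energy, homogeneity."""
    return np.stack([frame_texture_vector(f, roi, **kw) for f in frames])
```

Each column of the returned array is one of the four texture sequences described in the text.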

[0026] The nonlinear dynamic analysis is performed as follows. When the physiological signal time-series data is an electrocardiogram (ECG) signal, the probability distribution of the distances between vector points in the reconstructed phase space is calculated. The sample entropy feature is then calculated from the cumulative sum of this distribution. The reconstructed phase space is obtained by mapping the one-dimensional ECG time series into a high-dimensional space with time-delay embedding, which reveals the dynamic evolution hidden in the series. Specifically, an embedding dimension m and a time delay τ are selected to construct the vector sequence $X(i) = [x(i), x(i+\tau), \ldots, x(i+(m-1)\tau)]$, where $i = 1, \ldots, N-(m-1)\tau$ and N is the length of the original signal. Each $X(i)$ is the position of the system's state point in the m-dimensional phase space at one moment. For each pair of reconstructed vectors $X(i)$ and $X(j)$ with $i \neq j$, the Chebyshev distance is calculated, that is, the maximum absolute difference between corresponding components of the two vectors. This distance measures the similarity between the two system states.

[0027] The probability distribution is obtained as follows. Given a similarity tolerance r, count for each $X(i)$ the number of vectors $X(j)$ whose distance to it is less than r. Dividing this count by the number of vectors, $N-m$, gives a conditional probability, and averaging over all i gives $B^{m}(r)$. To obtain the sample entropy, the embedding dimension is increased to m+1 and the process is repeated to obtain $B^{m+1}(r)$. Sample entropy is defined as $\mathrm{SampEn}(m, r) = -\ln\left[B^{m+1}(r) / B^{m}(r)\right]$. It is the negative logarithm of the conditional probability that vectors which are neighbors in m dimensions remain neighbors when the embedding dimension increases to m+1. A larger value indicates a more complex and unpredictable sequence; a smaller value indicates a more regular and predictable one. In practice, m is usually 2, and r is 0.1 to 0.25 times the standard deviation of the original ECG signal. This range is chosen from the statistical characteristics of physiological signals. It separates noise from genuine dynamics while leaving enough matching pairs for a stable calculation.
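The sample entropy computation can be sketched in NumPy as below; the helper names are hypothetical. The sketch follows the common Richman-Moorman convention, which differs slightly from the text:

- τ = 1;
- the same N − m templates are used for dimensions m and m + 1;
- self-matches are excluded, so each count is divided by N − m − 1;
- a match means a Chebyshev distance of at most r.

The all-pairs distance array is O(N²) in memory, so this form suits windows of at most a few thousand samples.

```python
import numpy as np

# Illustrative sketch (Richman-Moorman convention); helper names are hypothetical.
def delay_embed(x, m, tau=1):
    """Rows are X(i) = [x(i), x(i+tau), ..., x(i+(m-1)tau)]."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * tau
    if n_vec <= 0:
        raise ValueError("series too short for this (m, tau)")
    return np.stack([x[k * tau:k * tau + n_vec] for k in range(m)], axis=1)

def _match_probability(x, dim, n_templates, r):
    """B^dim(r): average fraction of other templates within Chebyshev distance r."""
    emb = delay_embed(x, dim)[:n_templates]
    dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)  # Chebyshev
    np.fill_diagonal(dist, np.inf)                                    # drop self-matches
    counts = np.sum(dist <= r, axis=1)
    return float(np.mean(counts / (n_templates - 1)))

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn(m, r) = -ln(B^{m+1}(r) / B^m(r)) with r = r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n_templates = len(x) - m              # same templates for dimensions m and m + 1
    b_m = _match_probability(x, m, n_templates, r)
    b_m1 = _match_probability(x, m + 1, n_templates, r)
    if b_m == 0.0 or b_m1 == 0.0:
        return float("inf")               # undefined: no matches at one dimension
    return float(-np.log(b_m1 / b_m))
```

On a pure sine the result is close to zero; on white noise it is much larger. This reproduces the regular-versus-unpredictable ordering described above.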

[0028] When the physiological signal time-series data is a blood pressure waveform signal, the reconstructed phase space is divided into a grid, and the distribution probability of phase space trajectory points is calculated at different scales. The correlation dimension feature is obtained as the slope fitted to the log-linear relationship between this probability and the scale. Gridding divides the reconstructed high-dimensional phase space into hypercube cells of side length r that cover the whole phase space region, so that the spatial density of trajectory points can be counted. The phase space trajectory points are the state points arranged in chronological order in the reconstructed phase space. Together they form the trajectory of the system's dynamic evolution and reflect how the blood pressure system moves in phase space. To calculate the distribution probability at different scales, the cell size r is varied. For each r, the number of trajectory points in each cell is counted. The number of point pairs that share a cell, summed over all cells, is then divided by the total number of point pairs, giving the correlation integral $C(r)$. This function approximates the probability that the distance between two arbitrary points in the phase space is less than r.

[0029] The log-linear relationship means that $\ln C(r)$ is approximately a linear function of $\ln r$ over a suitable range of scales. This relationship stems from the self-similarity of chaotic systems: within a certain range of scales, the geometric structure of the system is scale-invariant. The correlation dimension feature is the slope of the least-squares line fitted to the linear region of the $\ln C(r)$ versus $\ln r$ curve. It reflects the geometric complexity of the attractor of the blood pressure system. A larger correlation dimension indicates more degrees of freedom and more complex dynamics. A non-integer correlation dimension indicates a fractal attractor, which is consistent with chaotic behavior. In practice, the embedding dimension m is increased gradually from a small value until the correlation dimension estimate converges. The time delay τ is usually determined with the autocorrelation function method or the mutual information method. The scale r spans the interval from the minimum to the maximum distance between point pairs in phase space. The number of r values sampled is generally no less than 20, to keep the linear fit reliable.
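A NumPy sketch of the grid-counting variant described above follows; the helper names are hypothetical. Points are binned into hypercubes of side r, pairs that share a cell are counted, and the slope of ln C(r) against ln r is fitted by least squares. The classic Grassberger-Procaccia estimator instead counts pairs whose distance is below r; the grid count is a cheaper approximation with the same power-law scaling. In the patent the points would be the delay-embedded blood pressure waveform. The usage below uses synthetic point sets of known dimension (1 for a line, 2 for a filled square) so that the output can be checked.

```python
import numpy as np

# Illustrative sketch of grid-based correlation dimension; helper names are hypothetical.
def grid_correlation_integral(points, r):
    """C(r): fraction of point pairs falling into the same hypercube of side r."""
    cells = np.floor(points / r).astype(np.int64)
    _, counts = np.unique(cells, axis=0, return_counts=True)
    n = len(points)
    return np.sum(counts * (counts - 1)) / (n * (n - 1))

def correlation_dimension(points, r_values):
    """Least-squares slope of ln C(r) versus ln r."""
    r_values = np.asarray(r_values, dtype=float)
    c = np.array([grid_correlation_integral(points, r) for r in r_values])
    keep = c > 0                          # ln C(r) undefined where no pair shares a cell
    slope, _ = np.polyfit(np.log(r_values[keep]), np.log(c[keep]), 1)
    return slope

rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, 2000)
line = np.column_stack([t, t])                    # 1-D set embedded in 2-D
square = rng.uniform(0.0, 1.0, (2000, 2))         # 2-D set
r_values = np.logspace(-2, np.log10(0.2), 20)     # 20 scales, as the text suggests
```

With these inputs, `correlation_dimension(line, r_values)` comes out close to 1 and `correlation_dimension(square, r_values)` close to 2. Partial cells at the domain edge cause small deviations.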

[0030] When the physiological signal time-series data is a photoplethysmography (PPG) signal, the sample entropy feature and the correlation dimension feature described above are calculated separately. They are then fused by weighting to generate a feature time series containing signal complexity information. A PPG signal contains both periodic components driven by the heartbeat and nonlinear components arising from physiological mechanisms such as autonomic nervous system regulation and peripheral circulation. A single complexity index cannot fully characterize these features, so two indices with different properties, sample entropy and correlation dimension, are fused. Sample entropy is sensitive to the randomness and irregularity of the signal and can capture subtle changes in heart rate variability. The correlation dimension is sensitive to the deterministic chaotic structure of the signal and reflects the overall dynamic complexity of the cardiovascular system. The weighted fusion is a linear combination of the sample entropy and correlation dimension values calculated within the same time window. The fusion coefficients are set from prior clinical knowledge and statistical analysis, according to how well and how stably each index discriminates between physiological states.

[0031] In resting-state monitoring, the fusion coefficients of sample entropy and correlation dimension are typically set to 0.6 and 0.4. Sample entropy receives slightly more weight to emphasize sensitivity to fluctuations in autonomic nervous system regulation. In stress or exercise-load scenarios, the coefficients are adjusted to 0.5 and 0.5. This balances the two indicators to reflect the interplay between the sympathetic and parasympathetic nervous systems. In pathological screening, such as heart failure or atrial fibrillation detection, the coefficients are set to 0.4 and 0.6. The correlation dimension then receives more weight, to emphasize the reduced dynamic complexity caused by disease. The coefficients can also be optimized on labeled datasets with machine learning methods to maximize discrimination between health states. Arranging the fused values from consecutive time windows in chronological order yields a feature time series containing signal complexity information. This series reflects the nonlinear dynamics of the PPG signal at different levels and serves as input to the subsequent chaos quantization module.
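The scenario-dependent fusion can be expressed as a small preset table. The scenario keys and the helper name are hypothetical; the coefficient pairs are the ones given in the text. The text does not say whether sample entropy and correlation dimension are rescaled before fusion, so this sketch combines the raw per-window values. That is an assumption. In practice, the two indices have different numerical ranges and may need normalizing first.

```python
import numpy as np

# Hypothetical preset table: (weight on sample entropy, weight on correlation dimension).
FUSION_PRESETS = {
    "resting": (0.6, 0.4),
    "stress_or_exercise": (0.5, 0.5),
    "pathology_screening": (0.4, 0.6),
}

def fused_complexity_series(sampen_per_window, corrdim_per_window, scenario="resting"):
    """Linear fusion of per-window sample entropy and correlation dimension values."""
    w_se, w_cd = FUSION_PRESETS[scenario]
    se = np.asarray(sampen_per_window, dtype=float)
    cd = np.asarray(corrdim_per_window, dtype=float)
    if se.shape != cd.shape:
        raise ValueError("both indices must be computed on the same windows")
    return w_se * se + w_cd * cd
```

A learned coefficient pair, as the text also allows, could simply be added to the table under a new key.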

[0032] The chaos quantization module includes the following. The image chaos calculation unit, connected to the image feature extraction unit, receives the image feature time series and performs phase space reconstruction on it. It then applies the Kolmogorov entropy calculation method to analyze how fast phase trajectories evolve in the reconstructed phase space, yielding the Kolmogorov entropy of the image modality as its chaos index. Phase space reconstruction maps the one-dimensional image feature time series into a high-dimensional phase space by time-delay embedding, revealing the dynamic evolution hidden in it. Specifically, an embedding dimension m and a time delay τ are selected to construct the vector sequence $X(i) = [x(i), x(i+\tau), \ldots, x(i+(m-1)\tau)]$, where $i = 1, \ldots, N-(m-1)\tau$ and N is the length of the original feature sequence. Each $X(i)$ is the position of the image feature system's state point in the m-dimensional phase space at one moment.

[0033] The Kolmogorov entropy calculation method quantifies the information generation rate of a system by analyzing how phase trajectories in the reconstructed phase space evolve over time. The evolution rate of the phase trajectories is analyzed as follows:

1. Randomly select a pair of phase points in the reconstructed phase space whose initial distance is less than a preset threshold.
2. Follow the two trajectories that start from these points step by step, and record the number of evolution steps needed for their distance to grow from below the threshold to above it. This number of steps reflects how fast the trajectories diverge.
3. Repeat the procedure for a large number of phase point pairs to obtain the statistical distribution of the divergence step counts.
4. Denote the mean step count by $\bar{b}$. The Kolmogorov entropy per sampling interval is estimated as $K \approx -\ln(1 - 1/\bar{b})$, which approaches the reciprocal of the mean, $1/\bar{b}$, when $\bar{b}$ is large.

This estimate is used as the chaos index of the image modality, that is, as a quantitative indicator of the degree of chaos in the image feature time series. A larger positive Kolmogorov entropy indicates a higher degree of chaos in the image feature system, a faster information generation rate, and lower predictability. A Kolmogorov entropy approaching zero indicates that the system tends toward periodic or deterministic motion, and a Kolmogorov entropy of zero indicates that the system is completely regular.
In specific calculations, the embedding dimension m typically ranges from 5 to 10, the time delay τ is determined using the mutual information method to ensure reconstruction quality, the distance threshold is taken as 5% to 10% of the phase space diameter, and the number of phase point pairs sampled is no less than 500 pairs to ensure statistical stability. These parameter values are determined based on the sampling rate and dynamic characteristics of the image feature sequence, and can effectively capture the chaotic dynamic characteristics of the image modality.
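A NumPy sketch of the divergence-time estimate is given below. It uses the maximum-likelihood form −ln(1 − 1/b̄) of Schouten, Takens and van den Bleek, which stays positive and approaches 1/b̄ for a large mean step count b̄. Helper names are hypothetical, and several choices go beyond the text:

- distances are Euclidean;
- the bounding-box diagonal of the embedded points serves as the "phase space diameter";
- a Theiler window stops temporally adjacent points from being chosen as neighbors;
- pairs that never separate before the data run out are censored.

The result is in nats per sampling step. The demonstration uses a logistic map (chaotic) and a pure sine (periodic) rather than image features, so that the expected behavior is known: positive entropy for the first and zero for the second.

```python
import numpy as np

# Illustrative sketch; metric, diameter definition and Theiler window are assumptions.
def delay_embed(x, m, tau):
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * tau
    return np.stack([x[k * tau:k * tau + n_vec] for k in range(m)], axis=1)

def kolmogorov_entropy(x, m=5, tau=1, eps_frac=0.1, n_pairs=500, theiler=10, seed=0):
    """Divergence-time (maximum-likelihood) Kolmogorov entropy, nats per step."""
    X = delay_embed(x, m, tau)
    n = len(X)
    diameter = np.linalg.norm(np.ptp(X, axis=0))  # bounding-box diagonal as "diameter"
    eps = eps_frac * diameter
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    steps = []
    for i in rng.integers(0, n, size=n_pairs):
        d0 = np.linalg.norm(X - X[i], axis=1)
        cand = np.flatnonzero((d0 < eps) & (np.abs(idx - i) > theiler))
        if cand.size == 0:
            continue
        j = int(rng.choice(cand))                 # a random neighbour within eps
        span = n - max(i, j)                      # steps available before data run out
        sep = np.linalg.norm(X[i:i + span] - X[j:j + span], axis=1)
        out = np.flatnonzero(sep > eps)
        if out.size:                              # pairs that never separate are censored
            steps.append(out[0])
    if not steps:
        return 0.0                                # no divergence observed: regular motion
    b_mean = float(np.mean(steps))
    return float(-np.log(1.0 - 1.0 / b_mean)) if b_mean > 1.0 else float("inf")
```

The demonstration parameters (m = 2, chosen for the low-dimensional test signals) differ from the 5 to 10 range that the text recommends for image feature sequences.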

[0034] The signal chaos calculation unit, connected to the physiological signal feature extraction unit, receives the signal feature time series and performs phase space reconstruction on it. It then calculates the maximum Lyapunov exponent of the series with the small-data method (Rosenstein's method), and this value serves as the chaos index of the signal modality. Phase space reconstruction maps the one-dimensional signal feature time series into a high-dimensional phase space by time-delay embedding, in the same way as in the image chaos calculation unit: an embedding dimension m and a time delay τ are selected to construct the phase space vector sequence.

[0035] The small-data method estimates the maximum Lyapunov exponent, that is, the system's sensitivity to initial conditions, from how pairs of nearest neighbors evolve in the reconstructed phase space. The calculation steps are as follows:

1. For each phase point $Y(i)$ in the reconstructed phase space, find its nearest neighbor $Y(j)$, subject to $|i-j|$ being greater than the mean period. This excludes pseudo-neighbors that are close only because they are adjacent in time.
2. For each nearest-neighbor pair, calculate the distance after k evolution steps, $d_i(k) = \|Y(i+k) - Y(j+k)\|$.
3. Average $\ln d_i(k)$ over all i to obtain the mean logarithmic divergence $y(k) = \langle \ln d_i(k) \rangle$ as a function of the number of evolution steps k.
4. Fit a least-squares line to the linear region of the $y(k)$ curve. Its slope is the maximum Lyapunov exponent per evolution step; dividing by the sampling interval $\Delta t$ expresses it per unit time.

[0036] The slope obtained from this fit is used as the maximum Lyapunov exponent of the signal feature time series, that is, as a quantitative indicator of its degree of chaos. A positive maximum Lyapunov exponent indicates that the system has chaotic characteristics. The larger the value, the faster neighboring phase trajectories diverge, the more sensitive the system is to initial conditions, and the poorer its long-term predictability. A negative maximum Lyapunov exponent indicates that the system tends toward a stable fixed point or periodic motion, and a value of zero indicates periodic or quasi-periodic motion. The small-data method suits medium-length data such as physiological signal feature sequences, because it makes full use of a limited number of data points to obtain a stable estimate. In practice, the embedding dimension m is increased gradually from a small value until the estimate converges. The time delay τ is determined by the autocorrelation function method or the average mutual information method, and the mean period is obtained by power spectral density analysis or peak detection. The linear fitting region is usually chosen between one-third and one-half of the way from the minimum to the maximum value of k. This choice avoids the initial transient while leaving enough points for the fit.
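A compact NumPy sketch of the small-data (Rosenstein) procedure follows; helper names are hypothetical. Several choices are assumptions:

- distances are Euclidean;
- the caller supplies the mean period (the text obtains it from the power spectrum or from peak detection);
- the number of evolution steps is fixed;
- a caller-chosen fit range stands in for the "linear region".

The returned slope is per step; divide by the sampling interval to convert it to per unit time. The demonstration uses the logistic map at r = 4, whose largest Lyapunov exponent is known to be ln 2 ≈ 0.693 per iteration.

```python
import numpy as np

# Illustrative sketch of Rosenstein's method; parameters and helper names are assumptions.
def delay_embed(x, m, tau):
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * tau
    return np.stack([x[k * tau:k * tau + n_vec] for k in range(m)], axis=1)

def max_lyapunov_rosenstein(x, m, tau, mean_period, max_k=10, fit_range=(0, 5)):
    """Largest Lyapunov exponent (per step) from the mean log-divergence y(k)."""
    X = delay_embed(x, m, tau)
    n = len(X) - max_k                     # reference points that can be followed max_k steps
    idx = np.arange(n)
    log_sum = np.zeros(max_k + 1)
    count = np.zeros(max_k + 1)
    for i in range(n):
        d = np.linalg.norm(X[:n] - X[i], axis=1)
        d[np.abs(idx - i) <= mean_period] = np.inf   # exclude temporally close points
        d[d == 0.0] = np.inf                          # exclude exact duplicates (log 0)
        j = int(np.argmin(d))
        if not np.isfinite(d[j]):
            continue
        sep = np.linalg.norm(X[i:i + max_k + 1] - X[j:j + max_k + 1], axis=1)
        ok = sep > 0.0
        log_sum[ok] += np.log(sep[ok])
        count[ok] += 1
    y = log_sum / np.maximum(count, 1)                # y(k) = <ln d_i(k)>
    k = np.arange(max_k + 1)
    lo, hi = fit_range
    slope = np.polyfit(k[lo:hi + 1], y[lo:hi + 1], 1)[0]
    return slope, y

x = np.empty(2000)
x[0] = 0.1234
for n_ in range(1, 2000):
    x[n_] = 4.0 * x[n_ - 1] * (1.0 - x[n_ - 1])       # logistic map, r = 4
```

On this series, `max_lyapunov_rosenstein(x, m=2, tau=1, mean_period=1)` returns a slope close to ln 2. Inspecting the returned `y` curve is the practical way to choose the linear fitting region.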

[0037] The chaos index output unit, connected to both the image chaos calculation unit and the signal chaos calculation unit, synchronously outputs the Kolmogorov entropy of the image modality and the maximum Lyapunov exponent of the signal modality to the synchronization detection module. Both are quantitative indicators of chaos, but they characterize system dynamics from different perspectives. The Kolmogorov entropy reflects the information generation rate and unpredictability of the system. The maximum Lyapunov exponent reflects its sensitivity to initial conditions and the divergence rate of its trajectories. Synchronous output means that both chaos indices are sent to the next module against the same time reference, preserving their correspondence with the original acquisition timestamps. The synchronization detection module can then compare the chaos indices of different modalities on a single time axis. When the image feature extraction unit generates several image feature time series, the image chaos calculation unit calculates a Kolmogorov entropy for each, forming the set of chaos indices of the image modality. When the physiological signal feature extraction unit generates several signal feature time series, the signal chaos calculation unit calculates a maximum Lyapunov exponent for each, forming the set of chaos indices of the signal modality. The chaos index output unit packages the two sets together, along with the timestamp and modality identifier of each index. This provides a complete data foundation for the many-to-many cross-modal comparisons in the synchronization detection module.

[0038] The synchronous detection module includes the following. The index normalization unit, connected to the chaos index output unit, receives two sets of indices: the Kolmogorov entropies corresponding to each image feature time series in the image modality set, and the maximum Lyapunov exponents corresponding to each signal feature time series in the signal modality set. It normalizes each index by the baseline value of its own modality, generating a set of image chaotic normalized values and a set of signal chaotic normalized values. The baseline value is the reference level of a modality's chaos index under stable health conditions. It can be obtained as the mean over a period of anomaly-free monitoring data recorded when the subject is first enrolled. Alternatively, it can be taken from statistical reference values for a healthy population of the same age and sex. Normalization removes the incomparability of chaos indices across modalities caused by differences in dimension, numerical range, and individual characteristics, so that cross-modal comparisons can be made on a common scale. There are two normalization options. The first divides the current index by the corresponding baseline value, giving a multiple of the baseline level. The second divides the difference between the current value and the baseline by the baseline, giving the relative deviation. Both methods map the chaos indices of different modalities onto a comparable range centered on the baseline.
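Both normalization variants can be sketched as follows; the helper names are hypothetical. The baseline is taken as the per-index mean over an anomaly-free enrollment window, which is one of the two options in the text. Dividing by a baseline is numerically unstable when the baseline is near zero or negative, which can happen for maximum Lyapunov exponents. The sketch therefore rejects a zero baseline outright.

```python
import numpy as np

# Illustrative sketch; helper names are hypothetical.
def baseline_from_history(history):
    """Per-index baseline: mean over anomaly-free enrolment windows (rows = windows)."""
    return np.mean(np.asarray(history, dtype=float), axis=0)

def normalize_to_baseline(current, baseline, method="ratio"):
    """Map chaos indices onto a baseline-centred scale.

    "ratio":     current / baseline              (1.0 means "at baseline")
    "deviation": (current - baseline) / baseline (0.0 means "at baseline")
    """
    current = np.asarray(current, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    if np.any(baseline == 0):
        raise ValueError("baseline values must be non-zero")
    if method == "ratio":
        return current / baseline
    if method == "deviation":
        return (current - baseline) / baseline
    raise ValueError(f"unknown method: {method}")
```

Whichever method is chosen, image and signal indices must use the same one, or the later absolute differences would compare values on different scales.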

[0039] The cross-modal pairing unit, connected to the index normalization unit, pairs each value in the image chaotic normalized value set with each value in the signal chaotic normalized value set, generating a number of cross-modal normalized value pairs. The image modality can contain several feature time series, for example a heart rate variability sequence and a temperature fluctuation sequence, and there is then one Kolmogorov entropy for each. Likewise, the signal modality may contain several feature time series, such as an ECG sample entropy sequence and a blood pressure correlation dimension sequence, with one maximum Lyapunov exponent for each. Pairing every value in the image set with every value in the signal set forms all possible image-signal combinations. If the image set has M values and the signal set has N values, M×N cross-modal pairs are generated. This exhaustive pairing captures the synchronization relationships between all image features and all signal features. It avoids omitting any combination that may reflect abnormal coupling in the system.

[0040] The difference calculation unit, connected to the cross-modal pairing unit, calculates the absolute difference between the two normalized values in each cross-modal pair. It then takes a weighted average of all absolute differences, producing a chaotic synchronization difference that characterizes the overall synchronization state. Each absolute difference measures how far the chaos levels of one image modality and one signal modality deviate from each other. A larger difference indicates greater asynchrony between the dynamics of the two modalities. In the weighted average, the weight of each difference reflects the importance of the corresponding image-signal pair to the current health assessment task, and these weights are determined in advance using mutual information. The result is a single value that integrates the synchronization information of all cross-modal pairs. It quantifies, at the overall level, how well the chaos levels of the multimodal system are coordinated.
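The M×N pairing and the weighted averaging take only a few lines; the helper name is hypothetical. The text says that the pair weights come from mutual information, but it defines weights only per image modality and per signal modality. Forming each pair's weight as the product of its image weight and its signal weight is therefore an assumption made for illustration. With uniform weights, the result reduces to the plain mean of the M×N absolute differences.

```python
import itertools
import numpy as np

# Illustrative sketch: the product-of-modality-weights pair weighting is an assumption.
def chaos_sync_difference(img_norm, sig_norm, img_w=None, sig_w=None):
    """Weighted mean of |image - signal| over all M x N cross-modal pairs."""
    img_norm = np.asarray(img_norm, dtype=float)
    sig_norm = np.asarray(sig_norm, dtype=float)
    M, N = len(img_norm), len(sig_norm)
    img_w = np.full(M, 1.0 / M) if img_w is None else np.asarray(img_w, dtype=float)
    sig_w = np.full(N, 1.0 / N) if sig_w is None else np.asarray(sig_w, dtype=float)
    total, weight_sum = 0.0, 0.0
    for a, b in itertools.product(range(M), range(N)):   # every image-signal pair
        w = img_w[a] * sig_w[b]                           # assumed pair weight
        total += w * abs(img_norm[a] - sig_norm[b])
        weight_sum += w
    return total / weight_sum
```

For example, image values [1.0, 1.2] and signal values [1.0, 0.7, 1.1] with uniform weights give (0 + 0.3 + 0.1 + 0.2 + 0.5 + 0.1) / 6 = 0.2.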

[0041] The time-series generation unit, connected to the difference calculation unit, is used to arrange multiple chaotic synchronization differences calculated within a continuous time window in chronological order to generate a chaotic synchronization difference sequence. The system repeats the process from feature extraction to chaotic synchronization difference calculation in a fixed time sliding window (e.g., every 5 seconds or every 30 seconds). Each window generates one chaotic synchronization difference. Connecting these differences in chronological order forms a time series. This series reflects the trajectory of the chaotic synchronization state of the subject's multimodal system evolving over time, providing a basis for subsequent dynamic threshold comparison and trend analysis.

[0042] The dynamic threshold setting unit acquires historical chaotic synchronization difference sequence data recorded while the subject was in a healthy state and calculates their mean and standard deviation. It sets the dynamic threshold to the mean plus a preset multiple of the standard deviation, and the threshold is updated in real time. The historical healthy-state data can come from the initial calibration phase after system deployment, or from monitoring data labeled as normal over a past period. The mean reflects the individual's average chaotic synchronization difference in a healthy state, and the standard deviation reflects its normal range of fluctuation. The preset multiple is set according to the sensitivity and specificity required for clinical monitoring. If the chaotic synchronization difference approximately follows a normal distribution, about 95% of normal values lie within two standard deviations of the mean and about 99.7% within three. Because only the upper bound is used as the threshold, normal values exceed the mean plus two standard deviations about 2.3% of the time, and the mean plus three standard deviations about 0.13% of the time. The preset multiple is therefore usually between 2 and 3. A smaller multiple makes the system more sensitive to abnormalities but may increase false positives. A larger multiple increases specificity but may reduce the ability to detect early abnormalities. The threshold is updated in real time as healthy-state data accumulate. When new normal data are added, or when the subject's health status changes over a long period, the mean and standard deviation are recalculated, and the threshold adapts automatically to changes in the individual's physiological characteristics.

[0043] The comparison output unit, connected to both the time-series generation unit and the dynamic threshold setting unit, compares the current value of the chaotic synchronization difference sequence with the dynamic threshold. When the current value exceeds the threshold, it outputs a trigger signal to the dynamic evaluation module. The current value is the chaotic synchronization difference calculated in the most recent time window. A value above the dynamic threshold indicates that the chaotic synchronization state of the multimodal system has deviated significantly from the individual's normal fluctuation range. This may indicate a coupling disorder of the physiological systems or a potential health risk, and in that case a trigger signal is output to activate the crisis warning mode. If the threshold is not exceeded, the steady-state monitoring mode is maintained.
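The threshold logic of the two preceding paragraphs can be sketched as a small class; the names are hypothetical. It assumes the sample standard deviation (ddof = 1) and a default multiple of 2.5, taken from the 2 to 3 range given in the text. `check` returns a label rather than driving the downstream modules. In a deployment, the chaotic synchronization difference of each sliding window would be passed to `check`, and windows later confirmed as normal would be fed back through `add_normal`.

```python
import numpy as np

# Illustrative sketch: class/method names, ddof=1 and k=2.5 default are assumptions.
class DynamicThreshold:
    """Threshold = mean + k * std of healthy-state chaotic synchronization differences."""

    def __init__(self, healthy_history, k=2.5):
        self.k = k
        self.history = [float(v) for v in healthy_history]

    @property
    def value(self):
        h = np.asarray(self.history)
        return float(h.mean() + self.k * h.std(ddof=1))

    def add_normal(self, v):
        """Append a window confirmed as normal; the threshold adapts automatically."""
        self.history.append(float(v))

    def check(self, current):
        """'trigger' starts the crisis warning mode; 'steady' keeps steady-state monitoring."""
        return "trigger" if current > self.value else "steady"
```

Recomputing the mean and standard deviation over the full history on every call is simple but grows with the history. An incremental (Welford-style) update or a fixed-length history would bound the cost.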

[0044] The weights of the weighted average are preset in the following way: Obtain a standard dataset corresponding to the current health assessment task. This standard dataset contains a set of multimodal chaotic normalized values from historical subjects, along with corresponding real-world health status labels. The standard dataset is representative population data pre-collected before system deployment or for specific clinical tasks. The set of multimodal chaotic normalized values refers to the image and signal chaotic normalized values calculated from historical subject data, including the normalization results for each image and signal modality. The real-world health status labels refer to the subject's health status classification determined by gold standard diagnostic methods or clinical expert assessment, such as normal, hypertension risk, arrhythmia risk, and heart failure. This dataset provides quantifiable learning samples for weight calculation, enabling the weights to reflect the discriminative ability of different modalities for different health states.

[0045] The mutual information between the chaotic normalized value of each image modality and the health status label, and between the chaotic normalized value of each signal modality and the health status label, is calculated separately. In information theory, mutual information measures the degree of interdependence between two variables. Unlike linear correlation coefficients, which capture only linear relationships, mutual information captures any type of statistical dependency, including nonlinear and non-monotonic associations. This matters because the association between physiological signals and health status is complex and often nonlinear. The mutual information is computed from the joint and marginal probability distributions of the two variables. Specifically, the chaotic normalized value of an image modality is treated as a continuous random variable X and the health status label as a discrete random variable Y. X is first either discretized or given a probability density estimate by kernel density estimation. The distribution of X under each health state is then tallied, yielding the joint probability distribution p(x,y) and the marginal distributions p(x) and p(y). The mutual information is defined as the relative entropy (Kullback-Leibler divergence) between the joint distribution p(x,y) and the product of the marginals p(x)p(y), that is, the expectation under p(x,y) of the logarithm of the ratio p(x,y)/(p(x)p(y)). A larger value indicates a stronger dependence between the image modality and the health status label, i.e., a greater contribution of that modality to distinguishing health states. The same procedure is repeated for each signal modality to obtain the mutual information values of all signal modalities.
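A minimal sketch of the discretization route for the mutual information, assuming equal-width histogram bins. The bin count, the function names, and the toy data are hypothetical, and the kernel density estimation option mentioned in the text is not shown. Values are in nats (natural logarithm).

```python
import math
from collections import Counter

def discretize(values, n_bins):
    """Equal-width binning of a continuous variable into integer bin indices."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min() keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x_values, labels, n_bins=10):
    """I(X;Y) = sum over (x, y) of p(x,y) * log(p(x,y) / (p(x) * p(y))), in nats."""
    if len(x_values) != len(labels) or not labels:
        raise ValueError("need equally long, non-empty inputs")
    xs = discretize(x_values, n_bins)
    n = len(xs)
    joint = Counter(zip(xs, labels))  # empirical joint counts
    px = Counter(xs)                  # empirical marginal counts of X
    py = Counter(labels)              # empirical marginal counts of Y
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical toy data: the normalized value separates the two labels perfectly.
x = [0.1, 0.2, 0.9, 1.0]
y = ["normal", "normal", "arrhythmia_risk", "arrhythmia_risk"]
print(mutual_information(x, y, n_bins=2))
```

Histogram estimates of mutual information are biased upward on small samples, so the bin count should stay modest relative to the size of the standard dataset.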

[0046] The weight of each image modality is obtained by dividing its mutual information value by the sum of the mutual information values of all image modalities, so that the weights of all image modalities sum to 1. Likewise, the weight of each signal modality is its mutual information value divided by the sum over all signal modalities, so that the signal modality weights also sum to 1. This normalization turns the weights into relative importance indicators, removes the influence of the absolute magnitude of the mutual information values, makes the weights comparable among the image modalities and among the signal modalities, and, because each set sums to 1, lays the foundation for the subsequent cross-modal weight normalization.

[0047] For each cross-modal pair consisting of an image modality and a signal modality, the weight of the image modality is multiplied by the weight of the signal modality to obtain the weight of the pair. The rationale for this product form is that a pair's contribution to the overall synchronization state should depend on the importance of both of its modalities: when both modalities matter to the health assessment task, so does their combination; when either modality matters little, the pair's weight decreases accordingly. The product form also guarantees that the weights of all cross-modal pairs sum to 1, because the sum over all pairs factorizes into the sum of the image modality weights multiplied by the sum of the signal modality weights, each of which is 1. This satisfies the probability normalization requirement and keeps the subsequent weighted average within a reasonable range. The scheme implicitly assumes that the importance of the image modality and that of the signal modality to the task are independent of each other. This is usually reasonable in physiological multimodal systems, because image features and signal features arise from different physical measurement principles, and their links to health status are relatively independent.
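The normalization of [0046] and the product rule of [0047] can be illustrated with a short sketch. The function names, modality names, and mutual information values below are hypothetical placeholders, not data from the application.

```python
def normalize(mi_by_modality):
    """Divide each modality's mutual information by the sum over its group."""
    total = sum(mi_by_modality.values())
    if total <= 0:
        raise ValueError("mutual information values must have a positive sum")
    return {name: value / total for name, value in mi_by_modality.items()}

def cross_modal_weights(image_mi, signal_mi):
    """Weight of each (image, signal) pair = image weight * signal weight."""
    w_image = normalize(image_mi)
    w_signal = normalize(signal_mi)
    return {(i, s): w_image[i] * w_signal[s] for i in w_image for s in w_signal}

# Hypothetical mutual information values for illustration only.
weights = cross_modal_weights({"facial_video": 0.30, "thermal": 0.10},
                              {"ecg": 0.20, "ppg": 0.20})
print(weights, sum(weights.values()))
```

Because each group of weights sums to 1, the pair weights sum to 1 as well, up to floating-point rounding.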

[0048] The calculated weights for each cross-modal pair are stored for use by the difference calculation unit during weighted averaging. These weights are pre-calculated and stored in the system memory, and are directly read and used by the difference calculation unit during real-time monitoring, avoiding the computational latency and resource consumption caused by online calculation of mutual information values and weights. When the health assessment task changes, such as switching from cardiovascular risk assessment to neurological function assessment, simply replace the corresponding standard dataset and re-execute the above weight calculation process to update the stored cross-modal pair weights, thus enabling the system to quickly adapt to different clinical tasks.
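The weighted averaging that the difference calculation unit performs with these stored weights (claim 7: a weighted average of the absolute differences between the two normalized values of each cross-modal pair) might look like the sketch below. The dictionary layout of the stored weights and the modality names are assumptions, and the baseline normalization is taken as already done.

```python
def chaotic_sync_difference(image_norm, signal_norm, pair_weights):
    """Weighted average of |image value - signal value| over all cross-modal pairs.

    pair_weights maps (image_modality, signal_modality) -> weight (hypothetical layout).
    """
    total_weight = sum(pair_weights.values())
    if total_weight <= 0:
        raise ValueError("pair weights must have a positive sum")
    weighted = sum(w * abs(image_norm[i] - signal_norm[s])
                   for (i, s), w in pair_weights.items())
    # Dividing by total_weight is a no-op for weights that sum to 1, but it
    # guards against small rounding drift in the stored values.
    return weighted / total_weight

# Hypothetical normalized chaotic indexes and stored pair weights.
image_norm = {"facial_video": 1.0, "thermal": 0.5}
signal_norm = {"ecg": 0.8}
pair_weights = {("facial_video", "ecg"): 0.6, ("thermal", "ecg"): 0.4}
print(chaotic_sync_difference(image_norm, signal_norm, pair_weights))
```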

[0049] After receiving the trigger signal, the dynamic evaluation module adjusts the evaluation time window as follows. First, the over-threshold amplitude is calculated as the ratio of the current chaotic synchronization difference to the dynamic threshold, minus one. It quantifies how far the chaotic synchronization state of the multimodal system deviates from the individual's normal fluctuation range: a larger value indicates more severe synchronization disorder, a greater deviation of the physiological system from steady state, and a higher potential risk of crisis. For example, an over-threshold amplitude of 0.5 means the current chaotic synchronization difference is 1.5 times the dynamic threshold, and an amplitude of 1.0 means it is twice the threshold. As one of the core inputs for adjusting the time window, the over-threshold amplitude lets the system scale monitoring precision to the severity of the anomaly: the more severe the anomaly, the shorter the time window, so that rapidly evolving physiological changes are captured.

[0050] The first derivative of the chaotic synchronization difference sequence over the most recent preset period, for example the 30 or 60 seconds before the current time, is taken as the rate of change of the difference. Its sign gives the trend: a positive value indicates that the synchronization disorder is worsening, and a negative value that the synchronization state is recovering; its absolute value gives the speed of change. The derivative can be estimated by the central difference method or by the slope of a least-squares linear fit. This rate captures the dynamic evolution of the synchronization disorder: even when the current over-threshold amplitude is still low, a large rate of change indicates rapid deterioration, and the time window needs to be shortened for close monitoring.
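One way to estimate the rate of change is the least-squares slope named above, applied to the samples inside the most recent preset period. In this sketch the 30-second default, the function names, and the assumption that timestamps are in seconds (so the rate is per second) are illustrative choices.

```python
def least_squares_slope(times, values):
    """Slope of the ordinary least-squares line through (time, value) points."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den

def rate_of_change(times, values, period_s=30.0):
    """Estimate d(difference)/dt over the last period_s seconds of the sequence."""
    t_now = times[-1]
    recent = [(t, v) for t, v in zip(times, values) if t >= t_now - period_s]
    ts = [t for t, _ in recent]
    vs = [v for _, v in recent]
    return least_squares_slope(ts, vs)

# Hypothetical difference sequence sampled every 10 s, rising by 0.1 per step.
times = [0, 10, 20, 30, 40]
diffs = [0.10, 0.20, 0.30, 0.40, 0.50]
print(rate_of_change(times, diffs))
```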

[0051] From the over-threshold amplitude A and the rate of change of the difference R, the adjustment coefficient K is calculated as an exponential with the natural constant e as its base, whose exponent is the first adjustment coefficient α1 multiplied by A plus the second adjustment coefficient α2 multiplied by R, i.e., K = exp(α1·A + α2·R). With positive α1 and α2, the exponential form makes K increase monotonically with both A and R and keeps it strictly positive. It also behaves smoothly and nonlinearly: when A and R are small, K is close to 1; when both are large, K grows rapidly, so the time window is compressed quickly. The first adjustment coefficient controls how strongly the over-threshold amplitude contributes to K, and the second controls how strongly the rate of change of the difference contributes.
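The over-threshold amplitude of [0049] and the exponential adjustment coefficient can be written as two small functions. The function names are illustrative, and the default values of the first and second adjustment coefficients are hypothetical picks from the ranges given in [0052].

```python
import math

def over_threshold_amplitude(current_diff, threshold):
    """A = D / T - 1: relative excess of the difference over the dynamic threshold."""
    if threshold <= 0:
        raise ValueError("dynamic threshold must be positive")
    return current_diff / threshold - 1.0

def adjustment_coefficient(amplitude, rate, alpha1=0.3, alpha2=0.2):
    """K = exp(alpha1 * A + alpha2 * R).

    The alpha defaults are hypothetical values inside the 0.2-0.5 and 0.1-0.3 ranges.
    """
    return math.exp(alpha1 * amplitude + alpha2 * rate)

a = over_threshold_amplitude(1.5, 1.0)  # difference is 1.5 times the threshold
k = adjustment_coefficient(a, rate=0.1)
print(a, k)
```

Because K is exponential, raising A by 0.5 multiplies K by exp(0.5·α1), about 1.105 for α1 = 0.2 and about 1.284 for α1 = 0.5, which is consistent with the 10% to 30% figures in [0052]; likewise, raising R by 0.1 multiplies K by exp(0.1·α2), about 1.01 to 1.03 for α2 between 0.1 and 0.3.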

[0052] The first and second adjustment coefficients are set to balance the sensitivity and stability of the physiological monitoring system. In practice, they are typically set according to the following criteria. First, starting from the acceptable false positive rate in clinical monitoring, simulations are run on monitoring data recorded during subjects' normal workdays, and the frequency with which the time window is wrongly shortened by random fluctuations in the absence of abnormalities is counted. On this basis, the first adjustment coefficient is set in the range 0.2 to 0.5, so that each 0.5 increase in the over-threshold amplitude raises the adjustment coefficient by approximately 10% to 30%, and the second adjustment coefficient is set in the range 0.1 to 0.3, so that each 0.1 increase in the rate of change of the difference raises the adjustment coefficient by approximately 1% to 3%. Second, the coefficients can be fine-tuned to the clinical department's requirements for timely warning: for emergency departments or intensive care units that need more sensitive responses, both coefficients can be increased; for general wards or home monitoring, they can be decreased to reduce unnecessary mode switching. Third, the coefficients can be optimized by retrospective analysis of historical abnormal events: monitoring data from clinical events that have already occurred are selected, and a grid search is run with the goal of warning as early as possible before each event. This ensures that the adjusted time window is neither switched too early, wasting resources, nor too late, missing the best opportunity for intervention.
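The grid search mentioned above can be sketched as an exhaustive scan over the coefficient ranges given in this paragraph. The objective function is a hypothetical placeholder: in the retrospective setting it would score a coefficient pair on historical data (for example, combining the false shortening rate on normal data with how early each known event is flagged), but the application does not define it, so here the caller supplies it. The step size of 0.05 is also an assumption.

```python
def grid_values(lo, hi, step):
    """Evenly spaced values from lo to hi inclusive, rounded to avoid float drift."""
    n = round((hi - lo) / step)
    return [round(lo + i * step, 10) for i in range(n + 1)]

def grid_search(objective, alpha1_range=(0.2, 0.5), alpha2_range=(0.1, 0.3), step=0.05):
    """Return (alpha1, alpha2, score) minimizing a caller-supplied objective."""
    best = None
    for a1 in grid_values(*alpha1_range, step):
        for a2 in grid_values(*alpha2_range, step):
            score = objective(a1, a2)
            if best is None or score < best[2]:
                best = (a1, a2, score)
    return best

def toy_objective(a1, a2):
    """Hypothetical stand-in for a retrospective score (lower is better)."""
    return (a1 - 0.35) ** 2 + (a2 - 0.2) ** 2

print(grid_search(toy_objective))
```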

[0053] The intermediate time window length is obtained by dividing the base time window length of steady-state monitoring mode by the adjustment coefficient. The base time window length is the window the system uses to calculate chaotic synchronization differences in normal steady-state monitoring, and it is set from the minimum data requirements of the physiological signals and the temporal resolution required by clinical monitoring. For electrocardiogram (ECG) and photoplethysmography (PPG) signals, heart rate variability analysis typically requires at least 30 to 60 seconds of data to obtain stable time- and frequency-domain indicators; for blood pressure waveform signals, reliable estimation of blood pressure variability typically requires at least 1 minute of data; for image feature sequences, heart rate extraction based on the PPG principle also requires at least 30 seconds of data for accurate peak detection. Taking the minimum requirements of all modalities into account, the base time window length is usually set to 60 seconds, which keeps feature extraction stable for every modality while meeting the temporal resolution needs of daily health monitoring. When the adjustment coefficient is greater than 1, dividing the base time window length by it shortens the window and makes monitoring finer-grained; when it is less than 1, the window is lengthened. In crisis warning mode the over-threshold amplitude is positive, so whenever the rate of change of the difference is also non-negative, the adjustment coefficient is at least 1 and the intermediate time window is no longer than the base time window. This reflects the shorten-only adjustment logic of the crisis warning mode.

[0054] The intermediate time window length is compared with a preset minimum time window length, and the larger of the two is taken as the final adjusted evaluation time window length. The minimum time window length is the lower bound on the evaluation time window allowed by the system, set from the minimum number of data points required for multimodal analysis and the non-stationary characteristics of physiological signals. From a signal processing perspective, any feature extraction algorithm needs enough data points for statistical stability; for example, sample entropy calculation usually requires a data length of at least 10^m, i.e., at least 100 to 1000 data points for an embedding dimension m of 2 to 3. From a physiological perspective, frequency-domain heart rate variability analysis of ECG signals requires at least 30 cardiac cycles for meaningful indicators, equivalent to 30 seconds of data at a normal heart rate of 60 beats per minute. Balancing the minimum data required for feature extraction in each modality against the response speed required by real-time monitoring, the minimum time window length is usually set to 10 to 30 seconds. The minimum window must also account for computational overhead and resource consumption: an excessively short window causes feature extraction and chaotic index calculation to run too often, increasing processor load and energy consumption, which matters particularly for wearable devices. This lower-bound protection ensures that, even under extreme abnormal conditions, the evaluation window still collects enough data for the fusion algorithm, avoiding evaluation failure or algorithm instability due to insufficient data.
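Putting [0053] and [0054] together, the final evaluation window is the base window divided by the adjustment coefficient, clamped from below by the minimum window. The 60-second base window follows the text; the 20-second minimum is a hypothetical value inside the 10 to 30 second range given above, and the function name is illustrative.

```python
def adjusted_window(base_window_s, k, min_window_s):
    """Final window = max(base window / K, minimum window)."""
    if k <= 0:
        raise ValueError("adjustment coefficient must be positive")
    intermediate = base_window_s / k
    return max(intermediate, min_window_s)

BASE_WINDOW_S = 60.0  # typical base window length from [0053]
MIN_WINDOW_S = 20.0   # hypothetical choice within the 10-30 s range of [0054]

for k in (1.0, 1.5, 4.0):
    print(k, adjusted_window(BASE_WINDOW_S, k, MIN_WINDOW_S))
```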

[0055] The preset fusion algorithms include a fast fusion algorithm based on logistic regression, a balanced fusion algorithm based on support vector machines, and a precise fusion algorithm based on deep neural networks. According to the preset interval in which the over-threshold amplitude falls, the corresponding fusion algorithm is selected from the pre-stored algorithms to comprehensively analyze the multimodal data within the current time window, and the resulting health status assessment is output.

[0056] The fast fusion algorithm based on logistic regression uses a mature linear classification model from machine learning. It computes a weighted sum of the input multimodal feature vector using the weight parameters, then maps the sum through the logistic function to a value between 0 and 1, representing the probability that the subject is in a given health state. The model is trained before system deployment on a large amount of labeled historical multimodal data, with the weight parameters optimized by maximizing the likelihood function, and the trained parameters are stored in system memory. Logistic regression has low computational complexity: inference involves only a weighted sum (a matrix-vector product) and one nonlinear function, making it the fastest of the three algorithms. It is suited to a low over-threshold amplitude, i.e., a mild anomaly, where it can return results quickly with basic accuracy while reducing resource consumption and response latency.
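Inference with the logistic regression model amounts to the following sketch. The function names, weights, bias, and feature values are hypothetical stand-ins for parameters that would come from the training described above; training itself is not shown.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_predict(features, weights, bias):
    """Probability of the target health state: sigmoid(w . x + b)."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors differ in length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical trained parameters for a 3-feature input.
p = logistic_predict([0.4, 1.2, 0.1], weights=[1.5, -0.8, 2.0], bias=0.2)
print(p)
```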

[0057] The balanced fusion algorithm based on a support vector machine (SVM) is a classification algorithm grounded in statistical learning theory. Its core idea is to map the input features into a high-dimensional feature space through a kernel function and find, in that space, the optimal separating hyperplane that maximizes the class margin. The kernel is typically a radial basis function (RBF), which can model nonlinear relationships between the features and health status. In training, the SVM obtains the support vectors and decision function parameters by solving a convex quadratic programming problem, and the trained model is stored as the support vectors with their coefficients. At prediction time, the kernel values between the input features and each support vector are computed, weighted by the corresponding coefficients, and summed with a bias term to give the classification decision value. The algorithm balances computational speed and classification accuracy: it fits nonlinear relationships better than logistic regression and costs less to compute than a deep neural network. It is suited to a moderate over-threshold amplitude, i.e., a more clearly defined anomaly, providing better discrimination than the fast algorithm at an acceptable computational cost.
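For a binary decision, the stored support vectors and coefficients could be used as below. This is a sketch assuming an RBF kernel K(x, z) = exp(-γ‖x - z‖²) and coefficients that already fold in the class labels (c_i = α_i·y_i); the γ value, the function names, and the toy model are hypothetical, and multi-class handling (e.g., one-vs-one voting) is not shown.

```python
import math

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """f(x) = sum_i c_i * K(x, sv_i) + b, with c_i = alpha_i * y_i.

    In this binary sketch a positive value is read as the positive class.
    """
    return bias + sum(c * rbf_kernel(x, sv, gamma)
                      for sv, c in zip(support_vectors, dual_coefs))

# Hypothetical trained model with two support vectors.
svs = [[0.0, 0.0], [1.0, 1.0]]
coefs = [1.0, -1.0]
print(svm_decision([0.1, 0.0], svs, coefs, bias=0.0, gamma=1.0))
```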

[0058] The precise fusion algorithm based on a deep neural network uses an artificial neural network with multiple hidden layers, typically a multilayer perceptron. Each layer consists of several neurons, and the layers are connected by weights and nonlinear activation functions. The network can automatically learn high-order nonlinear interactions among multimodal features and has the strongest fitting ability and the highest classification accuracy of the three algorithms. In training, the network weights are optimized by backpropagation on a large amount of labeled data, possibly with careful hyperparameter tuning and regularization, and the trained network structure and weights are then stored. In forward computation, the input features propagate layer by layer through several nonlinear transformations, and the network outputs the probability of each health state. This algorithm has the highest computational complexity and demands substantial processor resources and energy. It is suited to a high over-threshold amplitude, indicating a suspected serious abnormality, where the system must judge with the highest accuracy, even at some cost in speed and energy, to ensure reliable results and avoid the clinical risks of false negatives or false positives.
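A forward pass of a small multilayer perceptron is sketched below, assuming ReLU hidden activations and a softmax output over health states. The application does not specify the activation functions or layer sizes, so these, the function names, and the weights are hypothetical.

```python
import math

def dense(v, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def mlp_forward(x, layers):
    """layers: list of (weights, biases); ReLU on hidden layers, softmax at the output."""
    h = x
    for i, (w, b) in enumerate(layers):
        h = dense(h, w, b)
        if i < len(layers) - 1:
            h = relu(h)
    return softmax(h)

# Hypothetical 2-input, 2-hidden-unit, 3-class network (e.g. normal / mild / severe).
layers = [
    ([[0.5, -0.2], [0.3, 0.8]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.2, 0.4], [-0.5, 0.9]], [0.0, 0.0, 0.0]),
]
probs = mlp_forward([1.0, 0.5], layers)
print(probs)
```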

[0059] The fusion algorithm is selected from the preset interval of the over-threshold amplitude as follows. Two limits are predefined that divide the over-threshold amplitude into three contiguous intervals, each mapped to one fusion algorithm. The limits are set from the performance of the three algorithms at different abnormality severity levels, determined through cross-validation or backtesting on clinical data, for example with receiver operating characteristic (ROC) curve analysis using maximization of the Youden index as the criterion. Below the first limit, the accuracy difference between the logistic regression and support vector machine algorithms is within a preset tolerance while logistic regression is several times faster, so the first limit serves as the upper bound of the low-amplitude interval. Between the first and second limits, the support vector machine is significantly more accurate than logistic regression and close in accuracy to the deep neural network, so the second limit serves as the upper bound of the medium-amplitude interval. Above the second limit, the deep neural network is significantly more accurate than the support vector machine, so the precise fusion algorithm is selected to ensure the highest discriminative ability. The specific values of the two limits can be adjusted for different clinical scenarios.
For example, in intensive care the limits can be lowered so that the high-precision algorithms are enabled earlier, while in general screening they can be raised to prioritize operational efficiency.
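The interval lookup itself is simple. The sketch below assumes half-open intervals, since the text does not say which algorithm applies when the amplitude equals a limit exactly; the limit values and algorithm labels are hypothetical.

```python
def select_fusion_algorithm(amplitude, first_limit, second_limit):
    """Map the over-threshold amplitude to one of three fusion algorithms.

    Intervals are taken as half-open: (-inf, first), [first, second), [second, inf).
    """
    if not 0 <= first_limit < second_limit:
        raise ValueError("require 0 <= first_limit < second_limit")
    if amplitude < first_limit:
        return "fast_logistic_regression"
    if amplitude < second_limit:
        return "balanced_svm"
    return "precise_dnn"

# Hypothetical limits; the source leaves their values to the clinical scenario.
for a in (0.2, 0.7, 1.5):
    print(a, select_fusion_algorithm(a, first_limit=0.5, second_limit=1.0))
```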

[0060] In operation, once the current over-threshold amplitude is obtained, the dynamic evaluation module first determines which preset interval it falls into. It then loads the corresponding algorithm's model file and parameters from system memory, arranges the multimodal feature time series within the adjusted time window into the input format the algorithm requires, runs the algorithm's forward computation, and obtains the health status assessment result. The result format depends on the task: binary (e.g., normal or abnormal), multi-class (e.g., normal, mildly abnormal, moderately abnormal, critically ill), or a continuous risk score (e.g., a risk index from 0 to 100). Finally, the result is output to a display terminal, mobile application, or medical information system for reference by clinical staff or the subject. Through this algorithm-level invocation mechanism, the system adaptively balances assessment accuracy against computational efficiency across different degrees of abnormality, allocating resources efficiently.

[0061] The above models and formulas are dimensionless and are used for numerical calculation. They are obtained by software simulation on a large amount of collected data so as to approximate actual conditions as closely as possible, and the preset parameters in them are set by those skilled in the art according to the actual situation.

[0062] It should be understood that in the various embodiments of this application, the order of the above-mentioned processes does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

[0063] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0064] Those skilled in the art will clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, devices, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

[0065] The above are merely specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A health status assessment system based on multi-modal image feature analysis, characterized in that it comprises: a multi-modal acquisition module for synchronously acquiring multi-modal image data and corresponding physiological signal time series data of a subject; a feature extraction module for extracting image features from the multi-modal image data and signal features from the physiological signal time series data, and generating a feature time series for each modality; a chaotic quantification module for calculating a chaotic index of each modality's feature time series to quantify the degree of chaos of each modality; a synchronous detection module for calculating a real-time difference between the chaotic indexes of the modalities and comparing the real-time difference with a preset dynamic threshold; and a dynamic evaluation module for, when the real-time difference exceeds the dynamic threshold, switching the evaluation paradigm from a steady-state monitoring mode to a crisis warning mode, adjusting an evaluation time window, calling a corresponding preset fusion algorithm to comprehensively analyze the current multi-modal data, and generating and outputting a health status evaluation result.
2. The health status assessment system based on multi-modal image feature analysis of claim 1, wherein the multi-modal acquisition module comprises: an image acquisition unit for acquiring at least one of a facial video image, an infrared thermal imaging image, or a dermoscope image of a subject; a physiological signal acquisition unit for synchronously acquiring at least one of an electrocardiogram signal, a blood pressure waveform signal, or a photoplethysmogram signal corresponding to the image data; and a synchronous control unit, connected to both the image acquisition unit and the physiological signal acquisition unit, for generating a unified time reference signal to ensure that the multi-modal image data and the physiological signal time series data are strictly aligned on the time axis.
3. The health status assessment system based on multi-modal image feature analysis of claim 2, characterized in that the feature extraction module comprises: an image feature extraction unit for performing spatiotemporal domain analysis on the multi-modal image data acquired by the image acquisition unit, extracting time series image features including blood flow pulse features, skin texture features, or thermal distribution features, and generating image feature time series; and a physiological signal feature extraction unit for performing nonlinear dynamics analysis on the physiological signal time series data acquired by the physiological signal acquisition unit, extracting deep signal features including sample entropy features, correlation dimension features, or Lyapunov exponent features, and generating signal feature time series.
4. The health status assessment system based on multi-modal image feature analysis of claim 3, characterized in that the spatiotemporal domain analysis comprises: for a facial video image, extracting a blood flow pulse signal from the variation over time of the mean pixel value in the facial region of interest, based on the photoplethysmography principle, and generating image feature time series containing heart rate and heart rate variability; for an infrared thermal imaging image, extracting a thermal distribution signal from the variation over time of the mean temperature of the region of interest, and generating thermal image feature time series containing temperature fluctuation features; for a dermoscope image, extracting a signal from the variation over time of the gray-level co-occurrence matrix feature parameters of the skin texture in consecutive frames, and generating skin image feature time series containing texture evolution features.
5. The health status assessment system based on multi-modal image feature analysis of claim 3, wherein the nonlinear dynamics analysis is performed as follows: when the physiological signal time series data is an electrocardiogram signal, the distribution probability of the distances between vector points in the reconstructed phase space is calculated, and the sample entropy feature is calculated from the cumulative sum of the distribution probability; when the physiological signal time series data is a blood pressure waveform signal, the reconstructed phase space is divided into grids, the distribution probability of the phase-space trajectory points at different scales is calculated, and the correlation dimension feature is calculated from the log-linear relationship between the distribution probability and the scale; when the physiological signal time series data is a photoplethysmography signal, the sample entropy feature and the correlation dimension feature are both calculated and then fused by weighting to generate a feature time series containing signal complexity information.
6. The health status assessment system based on multi-modal image feature analysis of claim 5, characterized in that the chaotic quantification module comprises: an image chaotic calculation unit, connected to the image feature extraction unit and configured to receive the image feature time series, reconstruct a phase space for the image feature time series, analyze the evolution rate of the phase trajectory in the reconstructed phase space by a Kolmogorov entropy calculation method, and calculate the Kolmogorov entropy of the image modality as the chaotic index of the image modality; a signal chaotic calculation unit, connected to the physiological signal feature extraction unit and configured to receive the signal feature time series, reconstruct a phase space for the signal feature time series, and calculate the maximum Lyapunov exponent of the signal feature time series by the small-data method as the chaotic index of the signal modality; and a chaotic index output unit, connected to both the image chaotic calculation unit and the signal chaotic calculation unit and configured to synchronously output the Kolmogorov entropy of the image modality and the maximum Lyapunov exponent of the signal modality to the synchronous detection module.
7.The health status evaluation system based on multi-modal image feature analysis of claim 6, wherein, The synchronous detection module comprises: The index normalization unit is connected with the chaotic index output unit and is configured to receive a plurality of Kolmogorov entropies corresponding to each image feature time series in the image mode set and a plurality of maximum Lyapunov exponents corresponding to each signal feature time series in the signal mode set, and perform normalization processing on the plurality of Kolmogorov entropies and the plurality of maximum Lyapunov exponents respectively by using baseline values of the respective modes to generate an image chaotic normalized value set and a signal chaotic normalized value set. The cross-mode pairing unit is connected with the index normalization unit and is configured to pair each normalized value in the image chaotic normalized value set with each normalized value in the signal chaotic normalized value set to generate a plurality of cross-mode normalized value pairs. The difference value calculation unit is connected with the cross-mode pairing unit and is configured to calculate an absolute difference value between two normalized values in each cross-mode normalized value pair and perform weighted averaging on all the absolute difference values to generate a chaotic synchronization difference value representing an overall synchronization state. The time series generation unit is connected with the difference value calculation unit and is configured to arrange a plurality of chaotic synchronization difference values calculated in a continuous time window in chronological order to generate a chaotic synchronization difference value sequence. 
The threshold dynamic setting unit is configured to obtain historical chaotic synchronization difference value sequence data recorded while the subject is in a healthy state, calculate the mean and standard deviation of these historical data, set the mean plus a preset multiple of the standard deviation as the dynamic threshold, and update the dynamic threshold in real time.
The comparison output unit is connected to both the time series generation unit and the threshold dynamic setting unit and is configured to compare the current value of the chaotic synchronization difference value sequence with the dynamic threshold and to output a trigger signal to the dynamic evaluation module when the current value exceeds the dynamic threshold.
8. The health status evaluation system based on multi-modal image feature analysis according to claim 7, wherein the weights used in the weighted averaging are preset as follows: a standard data set corresponding to the current health assessment task is obtained, the standard data set containing the multi-modal chaotic normalized value sets of historical subjects and the corresponding ground-truth health status labels.
The mutual information between the normalized chaotic value of each image modality and the health status label, and between the normalized chaotic value of each signal modality and the health status label, is calculated. The mutual information value of each image modality is divided by the sum of the mutual information values of all image modalities to obtain the weight of that image modality, so that the weights of all image modalities sum to 1; likewise, the mutual information value of each signal modality is divided by the sum of the mutual information values of all signal modalities to obtain the weight of that signal modality, so that the weights of all signal modalities sum to 1. For each cross-modal pair consisting of one image modality and one signal modality, the image modality weight is multiplied by the signal modality weight to obtain the weight of the cross-modal pair; because the sum over all pairs equals the product of the two per-modality sums (1 × 1), the weights of all cross-modal pairs also sum to 1. The calculated weight of each cross-modal pair is stored for use by the difference value calculation unit when performing the weighted averaging.
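Claims 7 and 8 together define how one chaotic synchronization difference value is produced: mutual-information weights per modality, their products as cross-modal pair weights, and a weighted mean of absolute differences between baseline-normalized indices. The Python sketch below assumes that "normalization against the baseline" means dividing each index by its modality's baseline. It estimates mutual information by equal-width binning of the continuous normalized values, since the claims do not say how continuous values are handled. All function names and the `bins` parameter are hypothetical.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Equal-width binning so mutual information can be estimated from counts."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def modality_weights(columns, labels, bins=4):
    """Each modality's MI with the label divided by the MI sum, so weights sum to 1."""
    mis = [mutual_information(discretize(col, bins), labels) for col in columns]
    total = sum(mis)
    if total == 0.0:  # assumption: equal weights if no modality is informative
        return [1.0 / len(mis)] * len(mis)
    return [m / total for m in mis]


def pair_weights(image_columns, signal_columns, labels, bins=4):
    """Cross-modal pair weight = image modality weight x signal modality weight."""
    wi = modality_weights(image_columns, labels, bins)
    ws = modality_weights(signal_columns, labels, bins)
    return {(i, j): a * b for i, a in enumerate(wi) for j, b in enumerate(ws)}


def chaos_sync_difference(image_values, image_baselines,
                          signal_values, signal_baselines, weights):
    """Weighted mean |image - signal| over every cross-modal pair."""
    # Assumption: normalization divides each index by its modality's baseline.
    img = [v / b for v, b in zip(image_values, image_baselines)]
    sig = [v / b for v, b in zip(signal_values, signal_baselines)]
    total = weight_sum = 0.0
    for i, a in enumerate(img):
        for j, b in enumerate(sig):
            total += weights[(i, j)] * abs(a - b)
            weight_sum += weights[(i, j)]
    return total / weight_sum
```

Applying `chaos_sync_difference` to each consecutive time window in turn, with weights from `pair_weights`, yields the chronological difference value sequence of claim 7. Because the pair weights already sum to 1, the division by `weight_sum` only matters if other weights are supplied.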
9. The health status evaluation system based on multi-modal image feature analysis according to claim 8, wherein, after the dynamic evaluation module receives the trigger signal, the evaluation time window is adjusted as follows: obtain the super-threshold amplitude by which the current chaotic synchronization difference value exceeds the dynamic threshold, defined as the ratio of the current chaotic synchronization difference value to the dynamic threshold, minus one; obtain the first derivative of the chaotic synchronization difference value sequence over the most recent preset duration as the difference change rate; compute the adjustment coefficient as the exponential with the natural constant e as base, whose exponent is the sum of the super-threshold amplitude multiplied by a first adjustment coefficient and the difference change rate multiplied by a second adjustment coefficient; divide the base time window length of the steady-state monitoring mode by the adjustment coefficient to obtain an intermediate time window length; and compare the intermediate time window length with a preset minimum time window length, taking the larger of the two as the final adjusted evaluation time window length.
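The trigger logic of claim 7 and the window adjustment of claim 9 can be written compactly. Let D be the current difference value, T the dynamic threshold, A = D/T - 1 the super-threshold amplitude, and R the change rate. The adjustment coefficient is then coeff = exp(c1·A + c2·R) and the adjusted window is max(W_base / coeff, W_min). The Python sketch below makes three assumptions the claims leave open: a rolling history for the "real-time" threshold update, the population standard deviation, and a least-squares slope for the first derivative. The class, function, and parameter names (`k`, `history_len`, `c1`, `c2`) are hypothetical.

```python
import math
import statistics
from collections import deque


class DynamicThreshold:
    """Threshold = mean + k * std of healthy-state difference values."""

    def __init__(self, k=3.0, history_len=500):
        self.k = k  # the "preset multiple"; 3.0 is an assumed default
        self.history = deque(maxlen=history_len)  # assumed rolling window

    def update(self, healthy_value):
        self.history.append(healthy_value)

    @property
    def value(self):
        return statistics.fmean(self.history) + self.k * statistics.pstdev(self.history)

    def triggered(self, current):
        return current > self.value


def change_rate(recent, dt=1.0):
    """First derivative estimated as the least-squares slope of recent values."""
    n = len(recent)
    t_bar = (n - 1) * dt / 2
    y_bar = sum(recent) / n
    num = sum((i * dt - t_bar) * (y - y_bar) for i, y in enumerate(recent))
    den = sum((i * dt - t_bar) ** 2 for i in range(n))
    return num / den


def adjusted_window(current, threshold, recent, base_window, min_window,
                    c1=1.0, c2=1.0, dt=1.0):
    """Shrink the evaluation window as the exceedance grows, down to a floor."""
    amplitude = current / threshold - 1.0         # super-threshold amplitude
    rate = change_rate(recent, dt)                # difference change rate
    coeff = math.exp(c1 * amplitude + c2 * rate)  # adjustment coefficient
    return max(base_window / coeff, min_window)
```

Because the coefficient grows exponentially, a large or rapidly rising exceedance shortens the window quickly, and the `min_window` floor keeps it from collapsing.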