An artificial intelligence-based digital evaluation method, system, medium and product

By constructing an individualized ritual model and combining real-time behavioral data and environmental data, the problem of insufficient behavioral context recognition in existing technologies has been solved, and a highly reliable assessment of patients with neuropsychiatric disorders has been achieved.

CN122201747A · Pending · Publication Date: 2026-06-12 · SUZHOU CHENLING INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SUZHOU CHENLING INFORMATION TECHNOLOGY CO LTD
Filing Date
2026-01-28
Publication Date
2026-06-12

Abstract

An artificial intelligence-based digital evaluation method, system, medium and product, wherein the method comprises: obtaining historical action sequence data of a target patient interacting with a target object; constructing an individualized ritual model of the target patient; when real-time environmental data meets a core trigger context, extracting a real-time action sequence from a real-time behavior data stream; calculating the behavior similarity of the real-time action sequence and the core ritual behavior; when the behavior similarity is less than or equal to a preset similarity threshold, determining that an individual ritual interruption event occurs in the target patient, and generating a context activation signal; generating a coupling feature vector of the target patient; inputting the coupling feature vector into a preset emotional stress assessment model to obtain an evaluation result of the target patient. The present application can improve the reliability of digital evaluation.

Description

Technical Field

[0001] This application relates to the field of digital evaluation technology, specifically to a digital evaluation method, system, medium, and product based on artificial intelligence.

Background Technology

[0002] In the auxiliary diagnosis and intervention assessment of neuropsychiatric and mental disorders, the dynamic monitoring and analysis of individual behavioral characteristics and physiological parameters has become an important direction in clinical research and digital therapy. For patient groups characterized by stereotyped behaviors and impaired expression of emotional stress, such as those with autism spectrum disorder (ASD), the early identification and real-time assessment of abnormal behavioral states by digital means has become a crucial technical approach to improving intervention effectiveness and personalized treatment response.

[0003] In existing technologies, emotional states or behavioral patterns are identified and classified by collecting patients' motor behavior data or physiological data, such as heart rate, electrodermal activity, and electromyography signals, and combining them with machine learning methods. However, both behavior classification based on kinematic data and emotion recognition based on physiological signals typically operate in a context-blind mode. For example, a physiological monitoring system might label a sudden increase in heart rate as an emotional stress event, but it cannot distinguish whether the increase is due to a conflict between the patient and another person or simply to joyful running or jumping. Similarly, a behavior recognition system can identify the occurrence of stereotyped behaviors, but cannot determine whether the behavior is a steady rhythm in a calm state or a disordered acceleration in a stress state. This lack of key behavioral context (e.g., whether a planned routine behavior was successfully completed or unexpectedly interrupted) prevents the system from effectively associating physiological and behavioral changes with meaningful external or internal events, so that a large number of non-specific fluctuations are misjudged as abnormal and the reliability of digital assessments is reduced.

Summary of the Invention

[0004] This application provides an artificial intelligence-based digital evaluation method, system, medium, and product to address the technical problem of missing key behavioral contexts, which prevents the system from effectively associating physiological and behavioral changes with meaningful external or internal events, resulting in a large number of non-specific fluctuations being misjudged as abnormal, thereby improving the reliability of digital evaluation.

[0005] The first aspect of this application provides an artificial intelligence-based digital evaluation method, which includes: acquiring historical action sequence data of the target patient's interaction with the target object; performing unsupervised temporal modeling on the historical action sequence data to obtain the individualized ritual model of the target patient, where the individualized ritual model is used to characterize the ritualized behavior pattern of the target patient and includes core ritual behaviors and core triggering contexts corresponding to the core ritual behaviors; acquiring a real-time behavioral data stream of the target patient and monitoring the real-time environmental data of the target patient; when the real-time environmental data meets the core triggering context, extracting a real-time action sequence from the real-time behavioral data stream; calculating the behavioral similarity between the real-time action sequence and the core ritual behavior; when the behavioral similarity is less than or equal to a preset similarity threshold, determining that the target patient has experienced an individual ritual interruption event, and generating a context activation signal; based on the context activation signal, generating a coupling feature vector of the target patient, where the coupling feature vector is used to characterize the behavioral-physiological joint state of the target patient after the occurrence of the individual ritual interruption event; and inputting the coupling feature vector into a preset emotional stress assessment model to obtain the assessment result of the target patient, where the assessment result is used to characterize the emotional stress level of the target patient.

[0006] Optionally, unsupervised temporal modeling is performed on the historical action sequence data to obtain an individualized ritual model for the target patient, specifically including: Extract the duration of actions and the duration of action intervals from the historical action sequence data, and construct the temporal dimension behavioral features of the target patient based on the duration of actions and the duration of action intervals; Based on the time-dimensional behavioral characteristics, multiple repetitive action combinations of the target patient are identified, and the occurrence frequency and time interval distribution entropy value of each repetitive action combination are calculated. When the occurrence frequency of the repetitive action combination is greater than a preset frequency threshold and the time interval distribution entropy value is less than a preset entropy threshold, the repetitive action combination is identified as a candidate ritual behavior. The candidate ritual behaviors are screened for environmental triggers to obtain the individualized ritual model.

[0007] Optionally, the candidate ritual behaviors are screened for environmental triggers to obtain the individualized ritual model, specifically including: Obtain multiple sets of environmental triggers associated with each of the candidate ritual behaviors. The environmental triggers include multiple environmental state information of the target patient's environment within a preset time window before the candidate ritual behavior occurs. One candidate ritual behavior corresponds to multiple sets of environmental triggers. Calculate the similarity among multiple sets of environmental triggers for the candidate ritual behavior to obtain a contextual consistency score; When the context consistency score is greater than the preset consistency threshold, the candidate ritual behavior is determined as the core ritual behavior; In the set of multiple environmental triggering factors corresponding to the core ritual behavior, multiple target environmental features are extracted to obtain the core triggering situation. The target environmental features are environmental features that appear more frequently than a preset frequency threshold in the set of multiple environmental triggering factors. The core ritual behavior and the core triggering context are combined to form the individualized ritual model.
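The environmental trigger screening described above can be illustrated with a minimal sketch. The source does not specify how similarity among trigger sets is computed; this example assumes each trigger set is a set of string labels and uses mean pairwise Jaccard similarity as the contextual consistency score. The function names, the example labels, and the threshold value are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def context_consistency(trigger_sets: list) -> float:
    """Mean pairwise Jaccard similarity across all trigger sets (assumed metric)."""
    pairs = list(combinations(trigger_sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def core_trigger_context(trigger_sets: list, feature_freq_threshold: float) -> set:
    """Environmental features whose share of trigger sets exceeds the threshold."""
    counts = Counter(f for s in trigger_sets for f in s)
    total = len(trigger_sets)
    return {f for f, c in counts.items() if c / total > feature_freq_threshold}


# Hypothetical trigger sets recorded before three occurrences of one candidate behavior.
trigger_sets = [
    {"evening", "living_room", "tv_on"},
    {"evening", "living_room"},
    {"evening", "living_room", "lights_dim"},
]
score = context_consistency(trigger_sets)
context = core_trigger_context(trigger_sets, feature_freq_threshold=0.6)
```

A candidate whose `score` exceeds the preset consistency threshold would be promoted to a core ritual behavior, with `context` serving as its core triggering context.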

[0008] Optionally, calculating the behavioral similarity between the real-time action sequence and the core ritual behavior specifically includes: The core ritual behavior is decomposed into at least one anchor action subsequence and at least one transition action subsequence. The anchor action subsequence is an action segment common to all target candidate ritual behaviors. The target candidate ritual behavior is any one of the core ritual behaviors. The transition action subsequence is an action segment in the core ritual behavior other than the anchor action subsequence that connects different anchor action subsequences. The behavior similarity is calculated based on the real-time action sequence, the anchor action subsequence, and the transition action subsequence.

[0009] Optionally, the behavior similarity is calculated based on the real-time action sequence, the anchor action sub-sequence, and the transition action sub-sequence, specifically including: The real-time action sequence is matched and analyzed with the anchor action sub-sequence in the core ritual behavior to obtain the anchor completion degree. The anchor completion degree is used to characterize the degree to which the real-time action sequence completes the anchor action sub-sequence in the core ritual behavior. When the completion degree of the anchor point is greater than the preset completion degree threshold, the transition behavior deviation degree between the transition action subsequence in the core ritual behavior and the real-time action sequence is calculated. The transition behavior deviation degree is used to characterize the degree of difference between the transition actions in the real-time action sequence and the transition action subsequence in the core ritual behavior. The behavior similarity is obtained by using a preset function based on the anchor point completion degree and the transition behavior deviation degree.
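The two-stage similarity calculation can be sketched as follows. The source leaves the matching method and the "preset function" unspecified, so this example makes several assumptions: subsequences are matched as contiguous runs of action labels, the transition deviation is the share of expected transition subsequences that are absent, the preset function is `completion * (1 - deviation)`, and when the anchor completion degree does not exceed the threshold the completion degree itself is returned. All names and values are hypothetical.

```python
def contains(seq: list, sub: list) -> bool:
    """True if `sub` occurs as a contiguous run inside `seq`."""
    n, m = len(seq), len(sub)
    return any(seq[i:i + m] == sub for i in range(n - m + 1))


def anchor_completion(realtime: list, anchors: list) -> float:
    """Share of anchor subsequences found in the real-time sequence."""
    return sum(contains(realtime, a) for a in anchors) / len(anchors)


def transition_deviation(realtime: list, transitions: list) -> float:
    """Share of expected transition subsequences absent from the real-time sequence."""
    return sum(not contains(realtime, t) for t in transitions) / len(transitions)


def behavior_similarity(realtime, anchors, transitions, completion_threshold=0.5):
    completion = anchor_completion(realtime, anchors)
    if completion <= completion_threshold:
        # Assumption: too few anchors were completed, so transitions are not
        # evaluated and the completion degree is used as the similarity.
        return completion
    deviation = transition_deviation(realtime, transitions)
    # Assumed preset function: penalize completion by the transition deviation.
    return completion * (1.0 - deviation)


# Hypothetical core ritual: two anchors joined by two transitions.
anchors = [["pick_cup", "fill_cup"], ["drink"]]
transitions = [["walk_to_sink"], ["sit_down"]]
realtime = ["walk_to_sink", "pick_cup", "fill_cup", "pace", "drink"]
similarity = behavior_similarity(realtime, anchors, transitions)
```

Here both anchors are completed but one of the two transitions is missing, so the sketch yields a reduced similarity that could then be compared with the preset similarity threshold.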

[0010] Optionally, generating the coupling feature vector of the target patient based on the context activation signal specifically includes: determining the data sampling frequency of the target patient based on the interruption type carried by the context activation signal; taking the timestamp of the context activation signal as the starting point, acquiring kinematic feature sequences and physiological index sequences within a preset duration at the data sampling frequency, where the kinematic feature sequences are used to characterize the stereotyped behavioral characteristics of the target patient and the physiological index sequences are used to characterize the physiological state characteristics of the target patient; performing signal processing on the kinematic feature sequences to obtain their rhythmic stability feature, and performing trend analysis on the physiological index sequences to obtain their dynamic rate-of-change feature; and combining the rhythmic stability feature and the dynamic rate-of-change feature into the coupling feature vector.

[0011] Optionally, the interruption types include core action omission interruptions and transition behavior abnormality interruptions. Determining the data sampling frequency of the target patient based on the interruption type specifically includes: when the interruption type is the core action omission interruption, setting the data sampling frequency to a first preset frequency; when the interruption type is the transition behavior abnormality interruption, setting the data sampling frequency to a second preset frequency, where the first preset frequency is greater than the second preset frequency.
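As a minimal sketch of this rule, the mapping from interruption type to sampling frequency might look like the following. The concrete frequencies (100 Hz and 50 Hz), the enum, and the helper names are hypothetical; the source only requires that the first preset frequency be greater than the second.

```python
from enum import Enum


class InterruptionType(Enum):
    CORE_ACTION_OMISSION = "core_action_omission"
    TRANSITION_ABNORMALITY = "transition_abnormality"


# Hypothetical preset values; the only stated constraint is FIRST > SECOND.
FIRST_PRESET_HZ = 100.0
SECOND_PRESET_HZ = 50.0


def sampling_frequency(interruption: InterruptionType) -> float:
    """Pick the data sampling frequency for the given interruption type."""
    if interruption is InterruptionType.CORE_ACTION_OMISSION:
        return FIRST_PRESET_HZ
    return SECOND_PRESET_HZ


def sample_times(timestamp: float, duration_s: float, hz: float) -> list:
    """Sample instants starting at the signal timestamp for a preset duration."""
    count = int(duration_s * hz)
    return [timestamp + k / hz for k in range(count)]


hz = sampling_frequency(InterruptionType.TRANSITION_ABNORMALITY)
times = sample_times(timestamp=10.0, duration_s=2.0, hz=hz)
```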

[0012] In a second aspect, embodiments of this application provide an artificial intelligence-based digital evaluation system, which includes one or more processors and a memory; the memory is coupled to the one or more processors and is used to store computer program code, which includes computer instructions, and the one or more processors call the computer instructions to cause the artificial intelligence-based digital evaluation system to perform the method described in the first aspect and any possible implementation thereof.

[0013] In a third aspect, embodiments of this application provide a computer-readable storage medium including instructions that, when executed on an artificial intelligence-based digital evaluation system, cause the artificial intelligence-based digital evaluation system to perform the method described in the first aspect and any possible implementation thereof.

[0014] In a fourth aspect, embodiments of this application provide a computer program product containing instructions that, when the computer program product is run on an artificial intelligence-based digital evaluation system, cause the artificial intelligence-based digital evaluation system to perform the method described in the first aspect and any possible implementation thereof.

[0015] In summary, one or more technical solutions provided in this application have at least the following technical effects or advantages: 1. By acquiring historical action sequence data of the target patient's interaction with the target object, and constructing an individualized ritual model using unsupervised temporal modeling, the system can autonomously extract the patient's unique core ritual behaviors and their triggering contexts, thereby establishing a personalized behavioral baseline. Further, by combining real-time behavioral data streams and environmental data, when a core context trigger is detected, the system extracts the current action sequence and calculates its similarity with the core ritual behavior. When the similarity is below a preset threshold, it automatically identifies individual ritual interruption events and generates contextual activation signals with timestamps and types, effectively associating behavioral changes with specific contexts. Subsequently, based on this activation signal, a coupled feature vector is generated, fusing behavior and physiological state, and input into the emotional stress assessment model to obtain more context-aware individual emotional stress assessment results. This solves the problem of high misjudgment rates caused by the lack of contextual judgment in the identification of behavioral or physiological abnormalities in existing technologies, improving the reliability of digital assessment.

[0016] 2. In constructing the individualized ritual model for the target patient, a time-dimensional behavioral feature extraction based on the duration and interval of the action was further introduced. Based on this feature, multiple repetitive action combinations were identified. By jointly screening the occurrence frequency and the entropy value of the time interval distribution, candidate ritual behaviors with regularity and repetition can be effectively identified, thereby improving the accuracy of behavioral modeling at the time structure level. On this basis, environmental trigger factor analysis was further introduced for each candidate ritual behavior. By collecting environmental state information within its preceding time window and constructing multiple sets of environmental trigger factors, the similarity between the sets of environmental trigger factors was combined to obtain a contextual consistency score. When the score is higher than a preset threshold, the candidate behavior can be confirmed as a core ritual behavior. The core trigger context is extracted from the high-frequency environmental features, thereby constructing an individualized ritual model together with the core ritual behavior and its highly consistent trigger context. This modeling process not only enhances the temporal stability and repeatability of behavior recognition, but also introduces the analysis of behavior occurrence conditions in the environmental context dimension. It effectively solves the problem that the lack of behavior triggering context recognition in existing behavior modeling leads to behavior generalization and misidentification. It achieves accurate modeling of typical ritual behaviors of target patients under specific environmental conditions and highly relevant behavior context matching.

[0017] 3. In the process of behavioral similarity calculation, a structured decomposition of the core ritual behavior is introduced, dividing it into anchor action subsequences and transition action subsequences. Stable behavioral segments are identified based on the common features of the anchor action subsequences, and the individual differences in the execution process are captured by combining them with the transition action subsequences. This ensures that similarity assessment not only focuses on the consistency of the actions themselves but also considers the coherence of the structural relationships between actions. Furthermore, by matching and analyzing the real-time action sequence with the anchor action subsequences, the anchor completion rate is obtained. Under the condition that the anchor completion rate meets a preset threshold, the deviation of the transition action subsequence is evaluated. Thus, through a joint function of anchor completion rate and transition behavior deviation, the behavioral similarity between the real-time action sequence and the core ritual behavior is accurately calculated. This approach considers the combined characteristics of stability and variability in ritual behavior, solving the problem of insufficient sensitivity or high false positive rates in existing technologies that rely solely on overall action or local feature matching for behavioral similarity assessment. It provides a more structure-aware and tolerance-controlled similarity assessment for individualized ritual behaviors within complex behavioral sequences.

Attached Figure Description

[0018] Figure 1 is a flowchart illustrating an artificial intelligence-based digital evaluation method in an embodiment of this application; Figure 2 is a schematic diagram of the process for constructing an individualized ritual model in an embodiment of this application; Figure 3 is a schematic diagram of the structure of the electronic device in an embodiment of this application.

[0019] Explanation of reference numerals in the attached drawings: 301, Central Processing Unit; 302, Read-Only Memory; 303, Random Access Memory; 304, Bus; 305, Input / Output Interface; 306, Input Section; 307, Output Section; 308, Storage Section; 309, Communication Section; 310, Driver; 311, Removable Media.

Detailed Implementation

[0020] To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments.

[0021] In the description of the embodiments of this application, the words "for example" or "for instance" are used to indicate examples, illustrations, or explanations. Any embodiment or design that is described as "for example" or "for instance" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design options. Rather, the use of the words "for example" or "for instance" is intended to present the relevant concepts in a specific manner.

[0022] In the description of the embodiments of this application, the term "multiple" means two or more. For example, multiple systems means two or more systems, and multiple screen terminals means two or more screen terminals. Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. The terms "comprising," "including," "having," and variations thereof all mean "including but not limited to," unless otherwise specifically emphasized.

[0023] Figure 1 is a flowchart illustrating an artificial intelligence-based digital evaluation method in an embodiment of this application.

[0024] Referring to Figure 1, this application provides an artificial intelligence-based digital evaluation method, which includes: S101. Obtain historical action sequence data of the interaction between the target patient and the target object. In this embodiment, to achieve accurate modeling of the individualized ritual behavior patterns of the target patient, historical action sequence data of the target patient's interactions with target objects is acquired. Action sequence data refers to the trajectory data of the target patient's movements when interacting with specific objects (such as everyday objects, rehabilitation training equipment, or interactive digital devices) during natural interactions or guided tasks. This data mainly includes temporal motion parameters such as the start time, end time, action type, spatial displacement, and velocity changes. The purpose of acquiring this type of historical action sequence data is to provide sufficient behavioral observation samples for the subsequent unsupervised temporal modeling module, enabling the extraction of behavioral patterns with repeatability and temporal consistency.

[0025] To acquire the aforementioned motion sequence data, in practical applications, motion tracking can be performed using wearable motion capture devices (such as sensors based on inertial measurement units, IMUs) or image recognition modules based on computer vision. Taking an IMU as an example, this device typically integrates a three-axis accelerometer, a three-axis gyroscope, and a magnetometer, enabling real-time acquisition of acceleration, angular velocity, and directional changes in the target patient's hand, wrist, or torso in three-dimensional space. By fixing multiple sensors to key joints of the target patient (such as the wrist, elbow, and shoulder) and combining them with a time synchronization mechanism, a multi-sensor fused motion timing data stream can be constructed. Another approach is to deploy RGB-D cameras or a multi-view camera system, using pose recognition algorithms (such as OpenPose or MediaPipe) to extract the three-dimensional coordinate changes of key points on the target patient's body, thereby reconstructing the complete motion sequence.

[0026] During the data collection process, the system sets the interaction object, i.e. the target object, which is usually a physical object or virtual trigger that the patient interacts with frequently in a specific task or life situation, such as toy blocks, books, drinking cups or specific image cards. By placing the object in a controlled environment and monitoring its state changes of being touched, moved or used with visual detection or sensors, it can determine whether the interaction behavior between the target patient and the target object has occurred, and extract the action sequence within the corresponding time period as a valid sample.

[0027] The historical action sequence data obtained through the above methods possesses high temporal integrity and contextual relevance, providing a data foundation for subsequent modeling of individualized patient ritual behaviors based on unsupervised learning algorithms (such as variational autoencoders (VAEs) or self-attention mechanisms like Transformers). Compared to traditional static feature extraction methods, this approach can realistically reflect the rhythmicity, repetitiveness, and individual differences of patients' actions in natural interaction scenarios, thus laying the foundation for building highly reliable and personalized ritual models and achieving accurate identification of ritual behaviors and improved contextual understanding.

[0028] S102. Perform unsupervised temporal modeling on the historical action sequence data to obtain the individualized ritual model of the target patient. The individualized ritual model is used to characterize the ritualized behavior pattern of the target patient. The individualized ritual model includes core ritual behaviors and core triggering situations corresponding to the core ritual behaviors. In step S102, to achieve accurate modeling of the target patient's ritualistic behavior patterns, unsupervised temporal modeling is required on the previously acquired historical action sequence data. This modeling process aims to mine repetitive behavioral structures with individual characteristics from the patient's long-term behavioral data and identify their typical occurrence patterns in specific contexts, thereby constructing an individualized ritual model that can characterize the target patient's specific behavioral rhythms and their contextual dependencies. This model includes not only the core ritualistic behaviors that repeatedly occur in the target patient's natural interactions but also the core triggering contexts stably associated with these behaviors, supporting subsequent identification and evaluation of key behavioral states. To achieve the above modeling objectives, this embodiment further refines the processing flow of historical action sequence data, specifically including extracting temporal-dimensional behavioral features for modeling, identifying highly repetitive action combinations, screening candidate ritualistic behaviors with temporal stability, and further screening core ritualistic behaviors and their core triggering contexts that are highly associated with specific contexts, combined with environmental factors, ultimately constructing an individualized ritual model for the target patient. 
Figure 2 is a flowchart illustrating the process of constructing an individualized ritual model in an embodiment of this application. Step S102 is described in detail below with reference to Figure 2.

[0029] S1021. Extract the duration of the action and the duration of the action interval from the historical action sequence data, and construct the time-dimensional behavioral features of the target patient based on the duration of the action and the duration of the action interval. To effectively model the temporal structure of ritualized behavior patterns in target patients, it is necessary to extract action duration and action interval duration from historical action sequence data, and construct temporal-dimensional behavioral characteristics of the target patients based on these two types of temporal parameters. Action duration refers to the time span during which the target patient continuously performs a specific action, used to measure the stability and rhythmicity of the action's execution; action interval duration refers to the time interval between two adjacent actions, used to reflect the structural relationships and behavioral rhythms among the actions in the entire action sequence. These two types of temporal parameters are the foundation for analyzing the rhythmicity and repetitiveness of individual behavior and possess crucial discriminative power in behavioral sequence modeling.

[0030] In its implementation, the system segments and parses historical action sequence data. This data consists of timestamped action streams extracted from the target patient's past interactions by the action recognition module. Each identified action unit includes its start and end times, as well as its action category identifier. The system calculates the timestamp difference between adjacent action units to obtain the duration of each action unit and the interval between adjacent actions. To ensure the accuracy of the extraction results, the system preprocesses the data, including action label denoising, invalid action filtering, and timestamp alignment, to construct a high-quality temporal-dimensional behavior dataset.

[0031] The system assembles each action, its corresponding duration, and the preceding and following intervals into temporal feature triples, i.e., temporal behavioral features. These triples are then extracted segment by segment from the entire historical sequence using a sliding window mechanism, forming multidimensional time series data. To eliminate absolute duration shifts caused by individual differences, the system normalizes all time values, converting them into relative unit time proportions or Z-score standardized values, thereby enhancing the comparability of temporal features between different patients.
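The extraction of durations and intervals and their Z-score normalization can be sketched as follows. This is an illustrative example only: the `ActionUnit` structure, the treatment of the first and last action (boundary intervals set to 0.0), and the use of the population standard deviation are assumptions, and the sliding-window segmentation is omitted for brevity. The action label is kept on the unit itself, so each numeric triple holds (duration, preceding interval, following interval).

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class ActionUnit:
    label: str
    start: float  # seconds
    end: float    # seconds


def temporal_triples(units: list) -> list:
    """(duration, preceding interval, following interval) for each action unit."""
    triples = []
    for i, u in enumerate(units):
        duration = u.end - u.start
        # Assumption: boundary actions have no neighbor, so the interval is 0.0.
        prev_gap = u.start - units[i - 1].end if i > 0 else 0.0
        next_gap = units[i + 1].start - u.end if i < len(units) - 1 else 0.0
        triples.append((duration, prev_gap, next_gap))
    return triples


def zscore_columns(triples: list) -> list:
    """Z-score each of the three columns independently."""
    normalized_cols = []
    for col in zip(*triples):
        mu, sigma = mean(col), pstdev(col)
        normalized_cols.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in col])
    return list(zip(*normalized_cols))


# Hypothetical three-action sequence.
units = [
    ActionUnit("reach", 0.0, 1.0),
    ActionUnit("grasp", 1.5, 2.5),
    ActionUnit("place", 3.0, 5.0),
]
triples = temporal_triples(units)
normalized = zscore_columns(triples)
```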

[0032] Temporal behavioral features can comprehensively characterize the temporal structure of target patients' actions in long-term real-world interactive environments, helping to identify highly repetitive and rhythmic action combinations, thus laying a data foundation for screening candidate ritual behaviors. Compared to static recognition methods based solely on action categories, introducing temporal behavioral features can significantly improve the sensitivity to individual differences and the ability to express behavioral sequences in behavioral modeling, thereby achieving more accurate construction of individualized ritual models.

[0033] S1022. Based on the time-dimensional behavioral characteristics, identify multiple repetitive action combinations of the target patient, and calculate the occurrence frequency and time interval distribution entropy value of each repetitive action combination. In step S1022, in order to further explore the regular behavioral structure of the target patient in the historical action sequence, it is necessary to identify multiple action combinations with repetitive features based on the previously extracted time-dimensional behavioral features, and calculate the occurrence frequency and time interval distribution entropy value of each repetitive action combination to support the subsequent screening of candidate ritual behaviors.

[0034] To this end, the system first constructs a standardized temporal feature vector sequence based on the duration and interval of each action unit in the historical action sequence. Each temporal feature vector describes the temporal behavior characteristics of an action segment, including the duration of the action and the time interval between it and the previous action. Subsequently, the system uses a sliding window mechanism to traverse the entire temporal feature vector sequence and clusters and encodes the temporal patterns within the window. Specifically, the system uses dynamic temporal pattern mining algorithms (such as dynamic time warping (DTW) or shapelet-based sequence similarity comparison) to perform similarity matching between multiple time segments, thereby identifying action combinations with highly similar temporal structures.
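To illustrate the similarity matching between time segments, the following is a minimal sketch of classic dynamic time warping over sequences of (duration, interval) feature vectors. It assumes Euclidean distance as the local cost and no warping-window constraint; the segment values are hypothetical, and a production system would likely use an optimized library implementation.

```python
import math


def dtw_distance(a: list, b: list) -> float:
    """Classic O(len(a) * len(b)) DTW with Euclidean local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical windows of (duration, preceding interval) vectors.
seg_a = [(1.0, 0.5), (1.0, 0.5), (2.0, 0.5)]
seg_b = [(1.0, 0.5), (2.0, 0.5)]   # same pattern, one repeat fewer
seg_c = [(3.0, 2.0), (0.2, 0.1)]   # different temporal structure

d_ab = dtw_distance(seg_a, seg_b)
d_ac = dtw_distance(seg_a, seg_c)
```

Segments whose DTW distance falls below a chosen cutoff (a hypothetical parameter, not given in the source) could be grouped as occurrences of the same repetitive action combination.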

[0035] After identifying a set of action combinations with consistent temporal structure characteristics, the system performs frequency statistics on each type of repetitive action combination, recording the number of times it appears in the entire historical action sequence, which is defined as the frequency of occurrence of the combination. Frequency is used to measure the representativeness of the combination in the target patient's behavioral pattern; the higher the frequency, the more likely the combination is to be a core component of the patient's daily behavior.

[0036] Meanwhile, to further evaluate the rhythmicity and regularity of these action combinations over time, the system calculates the time interval distribution entropy value for each repetitive action combination. This entropy value characterizes the dispersion of the time intervals between successive occurrences of the same action combination. The calculation is as follows: first, extract the sequence of time intervals between successive occurrences and construct a probability distribution histogram for it; then apply the Shannon entropy formula:

$$H = -\sum_{i=1}^{n} p_i \log p_i$$

Here, $H$ is the entropy value of the time interval distribution and $n$ is the number of distribution intervals. The time intervals between occurrences of the action combination are divided into $n$ intervals (or "buckets") to construct the probability distribution; this division can use equidistant intervals or adaptive partitioning based on the data distribution. $p_i$ is the probability that the time interval between two consecutive occurrences of a repetitive action combination falls within the pre-divided $i$-th interval, i.e., the proportion of all observed time intervals that belong to this interval. The probabilities satisfy:

$$\sum_{i=1}^{n} p_i = 1$$

The entropy value of the time interval distribution represents the degree of uncertainty in the distribution of the time intervals of an action combination. The smaller the entropy value, the more regular the behavior; the larger the entropy value, the more random the behavior. The entropy value can objectively distinguish highly regular ritualistic behaviors from disordered behaviors, making it suitable for capturing patient-specific differences in behavioral rhythm and improving the accuracy of screening candidate ritualistic behaviors.
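As an illustration of the entropy calculation in [0036], the following Python sketch bins the intervals between successive occurrences into equal-width buckets and applies the Shannon formula. The function name, the bucket range `[lo, hi)`, and the base-2 logarithm are assumptions made for the example; the application does not fix the logarithm base or the partitioning scheme.

```python
from math import log2

def interval_entropy(occurrence_times, n_bins, lo, hi):
    """Shannon entropy (in bits) of the gaps between successive occurrences.

    occurrence_times: sorted timestamps of one repetitive action combination.
    n_bins, lo, hi: equal-width buckets over [lo, hi); gaps outside the range
    are clamped into the first or last bucket (an assumption).
    """
    gaps = [t2 - t1 for t1, t2 in zip(occurrence_times, occurrence_times[1:])]
    if not gaps:
        return 0.0
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for g in gaps:
        idx = min(max(int((g - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    total = len(gaps)
    # Empty buckets are skipped, i.e. they contribute zero to the sum.
    return -sum((c / total) * log2(c / total) for c in counts if c)
```

Perfectly periodic occurrences put every gap in one bucket and yield zero entropy, while gaps spread evenly over the buckets yield the maximum value of `log2(n_bins)`.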

[0037] S1023. When the occurrence frequency of the repetitive action combination is greater than a preset frequency threshold and the time interval distribution entropy value is less than a preset entropy threshold, the repetitive action combination is determined as a candidate ritual behavior. To select the most representative candidate ritual behaviors from multiple identified repetitive action combinations, two quantitative indicators—occurrence frequency and time interval distribution entropy—are needed as criteria, along with corresponding threshold conditions, to achieve high-reliability identification of the target patient's ritualistic behaviors. The purpose of this screening mechanism is to eliminate action combinations from historical behavior that, while repetitive, lack rhythmicity or behavioral stability, ensuring that the behaviors ultimately included in the individualized ritual model exhibit high regularity and consistency with individual characteristics.

[0038] In the specific implementation process, the system first reads the occurrence frequency and time interval distribution entropy value of each repetitive action combination identified in step S1022 as the judgment input. The occurrence frequency is the total number of non-overlapping occurrences of the action combination in the historical action sequence and is used to measure its behavioral representativeness; the time interval distribution entropy value reflects the dispersion of the time intervals between occurrences of the combination across different time periods and is used to evaluate its behavioral rhythmicity and stability.

[0039] The system presets a set of judgment thresholds, namely a frequency threshold and an entropy threshold. The frequency threshold is adaptively set by the system based on the total sample duration, the number of action categories, and the patient's behavioral activity level, and is usually set to the upper quartile of the frequency distribution of all combinations; the entropy threshold is set as an empirical constant based on historical training data or expert experience, and is used to exclude behavioral combinations with highly random temporal distribution.

[0040] When the frequency of a repetitive action combination is greater than the frequency threshold and the entropy value of its time interval distribution is less than the entropy threshold, the system determines that the combination not only has a significant probability of occurrence in the target patient's behavior, but also that its occurrence time is highly predictable and rhythmic. Therefore, the combination is marked as a candidate ritual behavior, and its structural features, frequency information and time pattern are recorded as input parameters for subsequent modeling.
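The two-threshold screening in [0039] and [0040] might look like the following Python sketch. It assumes the frequency threshold is taken as the upper quartile of all observed frequencies, as the text suggests is usual, and that the entropy threshold is supplied as a constant. The function name and the input layout are hypothetical.

```python
from statistics import quantiles

def select_candidates(combos, entropy_threshold):
    """Return names of combinations that pass both screening conditions.

    combos: dict mapping a combination name to (frequency, entropy).
    Needs at least two combinations so that quartiles are defined.
    """
    freqs = [freq for freq, _ in combos.values()]
    freq_threshold = quantiles(freqs, n=4)[2]  # upper quartile (Q3)
    return [
        name
        for name, (freq, entropy) in combos.items()
        if freq > freq_threshold and entropy < entropy_threshold
    ]
```

Both comparisons are strict, matching the "greater than" and "less than" wording of step S1023.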

[0041] Through the aforementioned screening mechanism, the system can automatically filter out combinations of behaviors with low frequency and unstable distribution from a large number of potential behavioral fragments, retaining only the key action sequences that truly reflect the individual patient's ritualistic behavioral rhythms. This provides high-quality sample input for constructing accurate, personalized, and context-relevant ritual behavior models. This method significantly improves the model's sensitivity to behavioral repeatability and temporal regularity, enhancing the clinical interpretability and operability of behavioral recognition results in digital assessment systems.

[0042] S1024. Screen the candidate ritual behaviors for environmental triggering factors to obtain the individualized ritual model.

[0043] To further improve the accuracy and context adaptability of individualized ritual models, an environmental triggering factor screening mechanism needs to be introduced based on the identified candidate ritual behaviors. This mechanism identifies core ritual behaviors that exhibit high consistency and stability under specific environmental conditions. Since ritualistic behaviors often depend on specific external environmental stimuli or contextual backgrounds, relying solely on the repetitive characteristics of the behavior itself is insufficient to fully characterize its occurrence patterns. Therefore, the system analyzes the environmental state information corresponding to candidate ritual behaviors in historical data, constructs multiple sets of environmental triggering factors, and calculates the consistency scores between these contexts. This allows for the screening of core ritual behaviors with high contextual stability, and further extraction of their common environmental features to determine the core triggering contexts. Finally, the core ritual behaviors are paired and integrated with their corresponding core triggering contexts to form an individualized ritual model with contextual awareness capabilities. Specifically, this may include the following steps:
Obtain multiple sets of environmental triggers associated with each of the candidate ritual behaviors. The environmental triggers include multiple environmental state information of the target patient's environment within a preset time window before the candidate ritual behavior occurs. One candidate ritual behavior corresponds to multiple sets of environmental triggers.
Calculate the similarity among multiple sets of environmental triggers for the candidate ritual behavior to obtain a context consistency score.
When the context consistency score is greater than the preset consistency threshold, the candidate ritual behavior is determined as the core ritual behavior.
In the set of multiple environmental triggering factors corresponding to the core ritual behavior, multiple target environmental features are extracted to obtain the core triggering situation. The target environmental features are environmental features that appear more frequently than a preset frequency threshold in the set of multiple environmental triggering factors.
The core ritual behavior and the core triggering context are combined to form the individualized ritual model.

[0044] After identifying candidate ritual behaviors, to verify whether these behaviors exhibit stable and predictable contextual dependence, it is necessary to further acquire multiple sets of environmental triggers associated with each candidate ritual behavior. The environmental trigger set refers to the collection of records of the target patient's environmental state within a preset time window prior to each candidate ritual behavior. This time window is typically set based on the duration of the behavior and the environmental response delay, for example, 5 to 10 minutes. Environmental state information includes, but is not limited to, light intensity, sound type, temperature and humidity levels, spatial location, and surrounding population density, which can be sensed in real time by a multimodal sensor system and aligned with the behavior sequence via timestamps.

[0045] By traversing the occurrence times of candidate ritual behaviors, the system automatically extracts corresponding environmental data fragments, forming multiple sets of environmental triggering factors to characterize the typical external context in which the behavior occurs. The purpose of this process is to construct the correlation data between candidate ritual behaviors and their environments, providing a foundational input for subsequent contextual consistency analysis.
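The timestamp alignment described in [0044] and [0045] can be sketched in Python as below. The sketch assumes the environmental log is a list of (timestamp, state) records sorted by timestamp and uses a 300-second window by default, which sits in the 5 to 10 minute range mentioned above. The function name and the record layout are hypothetical.

```python
from bisect import bisect_left

def extract_trigger_sets(env_log, behavior_times, window_s=300):
    """For each behavior start time, collect the environment states recorded
    in the half-open window [start - window_s, start).

    env_log: list of (timestamp_seconds, state) sorted by timestamp.
    behavior_times: start times of one candidate ritual behavior.
    """
    stamps = [t for t, _ in env_log]
    trigger_sets = []
    for start in behavior_times:
        lo = bisect_left(stamps, start - window_s)
        hi = bisect_left(stamps, start)
        trigger_sets.append([state for _, state in env_log[lo:hi]])
    return trigger_sets
```

Each returned list corresponds to one set of environmental triggers for one occurrence of the behavior.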

[0046] To further determine whether candidate ritual behaviors are highly context-dependent, i.e., whether they always occur in similar environments, the system needs to calculate the similarity of multiple sets of environmental triggering factors corresponding to the candidate ritual behavior and obtain a context consistency score accordingly. The similarity calculation is based on the feature vector representation of environmental state information. The system first standardizes the various environmental features in each set of environmental triggering factors, mapping them uniformly into a vector space. Then, it uses metrics such as cosine similarity, or a similarity derived from Euclidean distance or Mahalanobis distance, to calculate the pairwise similarity between the sets of environmental triggering factors, and uses the average similarity or minimum similarity as the context consistency score for the candidate behavior. This score reflects whether the behavior always occurs under similar environmental conditions; a higher score indicates a more stable environmental triggering mechanism behind the behavior. By setting a consistency threshold, such as 0.8, the system can exclude candidate behaviors with weak context dependence and strong randomness in environmental triggering factors, retaining only behaviors that occur stably under highly consistent environmental conditions.
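A minimal Python sketch of the context consistency score in [0046], assuming each set of environmental triggering factors has already been standardized into a numeric vector. Cosine similarity is used here as one of the metrics the text lists; the function names and the `mode` switch between average and minimum similarity are hypothetical.

```python
from itertools import combinations
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def consistency_score(trigger_vectors, mode="mean"):
    """Average (or minimum) pairwise similarity across all trigger vectors."""
    sims = [cosine_similarity(u, v) for u, v in combinations(trigger_vectors, 2)]
    if not sims:
        return 0.0
    return min(sims) if mode == "min" else sum(sims) / len(sims)
```

A candidate behavior would be kept as a core ritual behavior when `consistency_score(...)` exceeds the preset consistency threshold, for example 0.8.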

[0047] After determining that a candidate ritual behavior's contextual consistency score exceeds a preset consistency threshold, the system identifies it as a core ritual behavior. This determination aims to ensure that the behaviors included in the individualized ritual model are not only rhythmic in the temporal dimension but also highly predictable in terms of spatial and environmental triggers. At this stage, the system labels and manages the core ritual behavior and records all its corresponding environmental trigger factor sets as input for subsequent extraction of core trigger contexts. This operation ensures that the final model can achieve closed-loop prediction from environmental state to behavioral state, possessing a true contextual awareness capability.

[0048] After obtaining the core ritualistic behavior and its corresponding sets of environmental triggers, the system further extracts the target environmental features that repeatedly appear before occurrences of the behavior to construct the core triggering context of the core ritualistic behavior. Target environmental features are features whose frequency of occurrence across all sets of environmental triggers exceeds a preset frequency threshold, for example, features that appear in more than 70% of the trigger sets. To achieve this, the system counts how often each type of environmental feature (such as light intensity, sound type, and temperature range) occurs across all sets of environmental triggers and selects the features that meet the frequency criterion as core trigger features. The principle behind this process is to identify stable external variables that accompany the behavior through frequency analysis, thereby defining the typical context in which the behavior occurs. For example, if the environmental conditions include "light intensity <150 lux and quiet environment" in more than 80% of the occurrences of a patient's "repeatedly cleaning the table" behavior, then "low light" and "silence" can be considered components of the core triggering context for this behavior.
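The frequency-based feature selection in [0048] can be illustrated with the following Python sketch. It assumes each trigger set has already been discretized into feature labels such as "low_light" or "quiet"; the labels, the function name, and the default share of 0.7 are illustrative assumptions.

```python
from collections import Counter

def core_trigger_context(trigger_sets, min_share=0.7):
    """Features present in more than `min_share` of the trigger sets.

    trigger_sets: list of iterables of discretized feature labels,
    one iterable per occurrence of the core ritual behavior.
    """
    # set(s) ensures a feature is counted once per trigger set.
    counts = Counter(f for s in trigger_sets for f in set(s))
    total = len(trigger_sets)
    return {f for f, c in counts.items() if c / total > min_share}
```

For instance, with five trigger sets in which "low_light" appears five times, "quiet" four times, and "warm" twice, the function returns the first two labels.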

[0049] It should be noted that the context mentioned in this invention is a broad concept, aiming to comprehensively describe the integrated context in which a behavior occurs. In embodiments of this invention, the factors constituting the context may include, but are not limited to: environmental factors: such as physical spatial information such as the room where the target patient is located (e.g., bedroom, study) and the layout of key objects in the room (e.g., desk, bed); time factors: such as a specific time of day (e.g., after 10 pm) and a specific date type (e.g., weekday or weekend); and antecedent event factors: such as other signature actions performed by the patient in the period before the ritual behavior occurs (e.g., washing up, turning off the lights). Therefore, the context consistency score calculated by this invention measures the comprehensive similarity of the above-mentioned multi-dimensional factors when candidate ritual behaviors occur, rather than just the similarity of the physical environment, thereby enabling more accurate identification of true ritualized behavior patterns.

[0050] Ultimately, the system pairs and integrates the selected core ritual behaviors with their core triggering contexts to form a complete individualized ritual model. This model is stored as "behavior-context" pairs and describes both the specific behavioral patterns of the target patient and the environments that trigger them. It can serve as an a priori behavioral model for subsequent real-time monitoring and ritual interruption detection. During subsequent system operation, when the real-time environmental state matches the core triggering context, the system can initiate behavioral consistency monitoring of the core ritual behavior and generate a context activation signal when an interruption occurs. The construction of this model significantly enhances the system's accuracy in behavior recognition and its contextual sensitivity in predicting emotional stress in natural interactive environments, exhibiting high individual adaptability and clinical interpretability.

[0051] S103. Obtain the real-time behavior data stream of the target patient and monitor the real-time environmental data of the target patient. When the real-time environmental data meets the core triggering context, extract the real-time action sequence from the real-time behavior data stream. In step S103, to monitor the target patient's execution of ritualistic behavior in real time and to identify individual ritual interruptions, the system dynamically acquires the target patient's real-time behavior data stream while running and simultaneously collects real-time environmental data of the patient's environment. The real-time environmental state is then compared against the core triggering context predefined in the individualized ritual model. When the current environmental state is detected to meet the characteristics of the core triggering context, the system activates the behavior monitoring module to extract the real-time action sequence within the current time period from the real-time behavior data stream. The core objective of this process is to ensure that the system observes the patient's behavior continuously, with high frequency and high precision, when the patient is actually in a critical behavioral context, thereby providing real-time input for the subsequent behavior similarity calculation and ritual interruption judgment.

[0052] In practice, the system synchronously perceives the target patient's behavior and surrounding environment through multimodal sensing devices. Real-time behavioral data streams are collected and generated by the behavior recognition subsystem, typically relying on visual sensors (such as RGB-D cameras), inertial measurement units (IMUs), motion capture devices, or pressure sensors, continuously outputting timestamped sequences of action categories and their corresponding spatial and temporal characteristics. Real-time environmental data is collected through environmental perception nodes, including dimensions such as ambient light intensity, sound decibel levels, temperature and humidity values, spatial location information, crowd density, and object states, all reported in real-time to the system's environmental perception engine in the form of structured environmental state vectors.

[0053] After receiving real-time environmental data, the system invokes the core triggering context matching module. This module compares the current environmental state vector based on a predefined set of core triggering context features in the individualized ritual model. Core triggering contexts are high-frequency environmental patterns extracted from recurring environmental features in historical behavioral data through statistical analysis. They are typically stored in the form of Boolean combinations, interval constraints, or weighted distributions, such as "light intensity below 100 lux, ambient noise below 45 dB, and temperature between 22–24°C." The matching process is achieved through feature threshold comparison and logical rule determination. If the current environmental state satisfies all or most of the core triggering features, the system determines that the current moment is a potential activation period for ritual behavior and immediately activates the real-time action extraction module.
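The threshold comparison in [0053] can be sketched in Python as follows, assuming the core triggering context is stored as interval constraints. The feature keys, the inclusive bounds, and the `min_fraction` parameter (which models "all or most" of the features) are hypothetical choices for illustration.

```python
def matches_core_context(state, constraints, min_fraction=1.0):
    """Check a real-time environment state against interval constraints.

    state: dict of feature name -> current numeric reading.
    constraints: dict of feature name -> (low, high), bounds inclusive.
    min_fraction: share of constraints that must hold (1.0 means all).
    """
    if not constraints:
        return False
    hits = sum(
        1
        for name, (low, high) in constraints.items()
        if name in state and low <= state[name] <= high
    )
    return hits / len(constraints) >= min_fraction

# Constraints loosely following the example in the text.
CONTEXT = {"lux": (0, 100), "noise_db": (0, 45), "temp_c": (22, 24)}
```

A call such as `matches_core_context({"lux": 80, "noise_db": 40, "temp_c": 23}, CONTEXT)` would then gate activation of the real-time action extraction module.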

[0054] After successful environment matching, the real-time action extraction module initiates a high-frequency action sampling mechanism, continuously extracting action sequences from the real-time behavior data stream within a set time window (e.g., 30 or 60 seconds) to form structured real-time action sequence data. This sequence serves as input for subsequent behavior similarity calculations, determining whether the current behavior aligns with the core ritual behavior. Through this environment-driven behavior data extraction strategy, the system significantly reduces the overhead of processing invalid data, focusing on the analysis of key behavior periods, while simultaneously improving the contextual relevance of behavior recognition and data utilization efficiency.

[0055] For example, if the core ritual behavior in an individualized ritual model is "repeatedly tidying the desktop," and its core triggering context is "a quiet room with soft lighting and no one else nearby," when the system detects that real-time environmental data meets these characteristics, it triggers behavioral data extraction. This involves extracting the patient's hand movements and body position change sequences from the current behavioral data stream to determine whether the core ritual behavior is being performed, thus supporting subsequent interruption monitoring and stress assessment processes. This approach not only enhances the system's real-time performance and accuracy but also improves the stability and individual adaptability of digital assessment in natural interaction scenarios.

[0056] S104. Calculate the behavioral similarity between the real-time action sequence and the core ritual behavior. In step S104, to judge whether the target patient's current behavioral state is consistent with the core ritual behavior in their individualized ritual model, the system calculates the behavioral similarity between the action sequence extracted in real time and the core ritual behavior. Ritualized behaviors typically possess a certain degree of structural stability: action sequences that recur in different contexts often contain several key action segments, and these key actions play a structural anchoring role during the behavior's occurrence. Therefore, to improve the robustness and interpretability of the similarity calculation, the system performs a structured analysis of the core ritual behavior before behavior matching, dividing it into multiple anchor action subsequences and transitional action subsequences. The anchor action subsequences capture the core action segments that recur in all highly consistent candidate ritual behaviors, while the transitional action subsequences reflect the connection methods and behavioral transition characteristics between these anchors. By matching the real-time action sequence against the anchor action subsequences and the transitional action subsequences separately, the system can more accurately measure the structural similarity between the current behavior and the core ritual behavior, providing a high-confidence criterion for the subsequent judgment of individual ritual interruption events. The calculation of behavioral similarity based on this structural decomposition specifically includes the following steps:
The core ritual behavior is decomposed into at least one anchor action subsequence and at least one transition action subsequence. The anchor action subsequence is an action segment common to all target candidate ritual behaviors. The target candidate ritual behavior is any one of the core ritual behaviors. The transition action subsequence is an action segment in the core ritual behavior, other than the anchor action subsequence, that connects different anchor action subsequences.
The behavior similarity is calculated based on the real-time action sequence, the anchor action subsequence, and the transition action subsequence.

[0057] To improve the accuracy and adaptability to individual differences in subsequent behavior similarity matching, core ritualistic behaviors need to be divided into two components: anchor action subsequences and transition action subsequences. This division strategy is based on the fact that ritualistic behaviors are typically not completely identical repetitions of actions, but rather consist of several key action segments (i.e., anchor actions) that are highly consistent and stable across multiple behavior instances, along with transitional action segments connecting these anchors. By structurally decomposing core ritualistic behaviors in this way, the system can more accurately distinguish between the core stable components and variable structures within the behavior, improving the robustness and interpretability of behavior similarity assessment, and is particularly suitable for situations where natural behavioral variations exist in real-world scenarios.

[0058] In practice, the system first selects several target candidate ritual behaviors from among the candidate ritual behaviors, whose contextual consistency scores are higher than a preset consistency threshold, as the basic dataset for constructing the core ritual behaviors. Based on the action sequences of these target candidate ritual behaviors, the system performs multi-sequence alignment operations, employing algorithms such as Dynamic Time Warping (DTW) or Longest Common Subsequence (LCS) to identify action segments that recur and have relatively stable order positions among all target candidate ritual behaviors, defining them as anchor action subsequences. Anchor action subsequences reflect the structural core of this type of behavior and are the most discriminative feature units in subsequent behavior recognition processes.
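One way to approximate the anchor extraction in [0058] is to fold a pairwise Longest Common Subsequence over all target candidate ritual behaviors, as in the Python sketch below. Folding pairwise LCS is a heuristic and is not guaranteed to find the longest subsequence common to all instances; the function names and action labels are hypothetical.

```python
from functools import reduce

def lcs(a, b):
    """Longest common subsequence of two action-label sequences."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover one optimal subsequence.
    out, i, j = [], n, m
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

def anchor_actions(instances):
    """Ordered actions shared by every instance (pairwise-LCS heuristic)."""
    return reduce(lcs, instances) if instances else []
```

Whatever lies between two consecutive anchors in each instance would then be treated as a transitional action segment.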

[0059] After identifying the anchor point action subsequences, the system further analyzes the intermediate action segments between every two adjacent anchor points, defining these behavioral processes connecting different anchor points as transitional action subsequences. Transitional action subsequences may vary to some extent in different instances, such as different action durations or slight differences in details, but their overall structure and function are usually consistent, thus they can be modeled as behavioral segments with high tolerance. The system performs cluster analysis on the action segments between each anchor point in the target candidate ritual behavior, extracting their common features and constructing typical transitional behavior templates to support subsequent fuzzy matching.

[0060] The structural decomposition of core ritual behaviors achieved through the above method allows the system to employ a segmented comparison strategy when performing real-time action sequence matching. This involves rigorous matching of anchor action subsequences and flexible matching of transitional action subsequences, significantly enhancing the robustness of similarity calculation. For example, in a core ritual behavior like "handwashing," anchor actions might include "reaching for the faucet," "rubbing hands together," and "turning off the faucet," while transitional actions include steps with significant individual variations, such as "adjusting the water temperature" and "getting hand sanitizer." By structurally modeling these behaviors, the system can accurately identify whether the anchor structure is continuously missing or shifted during the patient's actual behavior, and combine this with the matching degree of transitional actions for a comprehensive evaluation, achieving sensitive judgment of behavioral integrity and interruption detection. This structural decomposition strategy not only improves the stability of behavior recognition but also provides a more interpretable behavioral basis for subsequent emotional stress assessment.

[0061] When calculating behavioral similarity based on the real-time action sequence, the anchor action subsequences, and the transitional action subsequences, the system not only matches the overall action structure but also conducts a joint analysis along two dimensions, namely the completion degree of core actions and the degree of deviation of connecting actions, in order to improve the resolution and individual adaptability of the behavioral assessment. Since core ritual behaviors often have clearly defined key action anchors in their structure, these anchor actions carry high weight in discriminating the behavior. Therefore, the system first analyzes whether these anchor actions are fully executed in the real-time action sequence, obtaining a quantitative index of the degree of anchor action execution, namely the anchor completion degree. After the anchor actions have been matched, if the anchor completion degree meets the preset requirement, the system further analyzes the differences between the connecting behaviors between anchors in the real-time action sequence and the transitional action subsequences in the core ritual behavior, calculating the transition behavior deviation degree to reflect the degree of subtle variation in the behavioral structure. Finally, based on the anchor completion degree and the transition behavior deviation degree, combined with a preset function model, the system generates a comprehensive quantitative result representing behavioral similarity, providing an accurate basis for judging individual ritual interruptions. Specifically, this may include the following steps:
The real-time action sequence is matched and analyzed with the anchor action subsequence in the core ritual behavior to obtain the anchor completion degree. The anchor completion degree is used to characterize the degree to which the real-time action sequence completes the anchor action subsequence in the core ritual behavior.
When the anchor completion degree is greater than the preset completion degree threshold, the transition behavior deviation degree between the transition action subsequence in the core ritual behavior and the real-time action sequence is calculated. The transition behavior deviation degree is used to characterize the degree of difference between the transition actions in the real-time action sequence and the transition action subsequence in the core ritual behavior.
The behavior similarity is obtained by using a preset function based on the anchor completion degree and the transition behavior deviation degree.

[0062] To accurately assess the structural consistency between the target patient's real-time behavior and the core ritual behavior, the system matches the real-time action sequence against the anchor action subsequences in the core ritual behavior to obtain the anchor completion degree. The anchor completion degree quantifies whether the key action units defined in the core ritual behavior are executed completely and accurately in the real-time action sequence; its core function is to determine whether the target patient's current behavior retains the core structural features. The system decodes the real-time action sequence with an action recognition model, dividing it into basic action units, and performs sequence-level matching against the predefined anchor action subsequences in the core ritual behavior using the Dynamic Time Warping (DTW) algorithm or a sliding window sequence alignment strategy. During matching, weighted evaluations are performed on dimensions such as action category, action execution order, and duration. Finally, the anchor completion degree is calculated as the ratio of the number of anchors hit to the total number of anchors. For example, if the core ritual behavior contains 5 anchor actions and the real-time action sequence accurately matches 4 of them, the anchor completion degree is 0.8. This indicator provides a structural basis for deciding whether to continue with the transitional behavior analysis.
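The ratio in [0062] can be sketched in Python as below. This simplified version counts an anchor as hit only when its label appears in the real-time sequence in the expected order, which it determines from the length of the longest common subsequence; it ignores the duration weighting and the DTW alignment mentioned in the text. The function name and labels are hypothetical.

```python
def anchor_completion(anchors, realtime):
    """Share of anchor actions found, in order, in the real-time sequence.

    anchors: ordered list of anchor action labels of the core ritual behavior.
    realtime: ordered list of action labels decoded from the live stream.
    """
    if not anchors:
        return 0.0
    # Rolling-row LCS length: the largest number of anchors matched in order.
    prev = [0] * (len(realtime) + 1)
    for anchor in anchors:
        cur = [0]
        for j, action in enumerate(realtime, 1):
            if anchor == action:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / len(anchors)
```

With five anchors of which four appear in order, the function reproduces the 0.8 value from the example above.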

[0063] After obtaining the anchor completion degree, if the value is higher than the completion degree threshold set by the system (e.g., 0.7), the target patient has completed most of the core action structure. The system then proceeds to the next stage, analyzing the non-anchor parts of the real-time action sequence and calculating the transition behavior deviation degree. The transition behavior deviation degree measures the difference between the actions connecting anchors in the real-time action sequence and the corresponding transition action subsequences in the core ritual behavior. Its evaluation dimensions include action type offset, execution order variation, duration fluctuation, and action amplitude deviation. The system first locates the action segments between anchors based on timestamps, aligns these segments with the corresponding transition action subsequences in the core ritual model, and extracts high-dimensional feature vectors using an action embedding model (such as a Transformer-based or LSTM-based action sequence encoder). It then calculates the distance between corresponding actions using Euclidean distance or cosine distance, and aggregates the offsets of the multiple transition actions to obtain the overall transition behavior deviation degree. The closer an observed transition segment is to the corresponding transition segment in the core ritual behavior, the smaller the behavioral variation and the lower the deviation degree. Taking "washing hands" as an example, if the anchors are "turning on the tap" and "rubbing hands" and the transition segment is "getting hand sanitizer", but in the real-time sequence this action is replaced by "rubbing hands without sanitizer", then the action has shifted at the semantic level and the deviation degree increases accordingly.
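The aggregation step in [0063] might be sketched in Python as follows. The sketch assumes the transition segments have already been aligned pairwise and embedded as fixed-length vectors by some encoder, which is not shown. Averaging Euclidean distances is only one possible aggregation, and the function name is hypothetical.

```python
from math import dist

def transition_deviation(observed, templates):
    """Mean Euclidean distance between aligned transition embeddings.

    observed: embeddings of the transition segments in the live sequence.
    templates: embeddings of the matching transition action subsequences
               in the core ritual behavior, in the same order.
    """
    pairs = list(zip(observed, templates))
    if not pairs:
        return 0.0
    return sum(dist(o, t) for o, t in pairs) / len(pairs)
```

A value of zero means every observed transition coincides with its template; larger values indicate greater variation in the connecting actions.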

[0064] After obtaining the anchor point completion rate and the transition behavior deviation rate, the system integrates the two indicators through a preset fusion function to generate the final behavioral similarity value. This function is typically implemented as a weighted linear combination, a nonlinear mapping, or fuzzy logic reasoning. Its core objective is to model the structural matching result and the deviations in behavioral detail together, and to output a single similarity index that can be used to judge behavioral consistency. For example, behavioral similarity can be set as α × anchor point completion rate - β × transition behavior deviation rate, where α and β are empirical weighting coefficients reflecting how much attention the system pays to behavioral structure and to behavioral details, respectively. If the behavioral similarity is less than or equal to the preset similarity threshold, the event can be identified as an individual ritual interruption event. In this way, the system can not only identify whether the behavior has been interrupted, but also analyze at which structural stage the interruption occurred, thus providing more structurally interpretable input for subsequent stress state modeling and intervention strategies. For example, if a patient's anchor point completion rate is 0.9 during a "medication" ritual but the transition behavior deviation rate is large, the main structure of the behavior is retained while its details have deviated, which may suggest distraction or emotional interference and is clinically informative.
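The weighted linear variant of the fusion function is simple enough to write out. In the sketch below the weights α = 1.0 and β = 0.5 and the similarity threshold 0.65 are hypothetical placeholders, since the text only says that they are empirical; the "less than or equal to" comparison follows step S105.

```python
def behavioral_similarity(completion_rate: float, deviation_rate: float,
                          alpha: float = 1.0, beta: float = 0.5) -> float:
    """Weighted linear fusion: alpha * completion - beta * deviation.

    alpha and beta are hypothetical defaults, not values from the application.
    """
    return alpha * completion_rate - beta * deviation_rate


def is_ritual_interruption(similarity: float, threshold: float = 0.65) -> bool:
    """An interruption is flagged when similarity <= threshold (hypothetical 0.65)."""
    return similarity <= threshold


# Anchors mostly completed (0.9) but transitions deviate strongly (0.6).
similarity = behavioral_similarity(0.9, 0.6)      # 0.9 - 0.5 * 0.6
interrupted = is_ritual_interruption(similarity)
```

With these placeholder values the similarity is about 0.6, so the event is flagged even though the anchor point completion rate is high, which mirrors the "medication" example in the paragraph above.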

[0065] S105. When the behavioral similarity is less than or equal to a preset similarity threshold, it is determined that the target patient has experienced an individual ritual interruption event, and a context activation signal is generated. In step S105, the system determines whether an individual ritual interruption event has occurred based on the behavioral similarity of the target patient when performing the current ritual behavior, and generates a contextual activation signal containing the interruption time and type. The core purpose of this step is to achieve real-time monitoring and accurate classification of abnormalities in the execution of individualized ritual behaviors, so as to correlate with subsequent physiological and emotional state analysis and provide a timely and targeted basis for intervention strategies. In specific implementation, the system first compares the behavioral similarity calculated in the previous steps with a preset similarity threshold. This threshold is a boundary value set based on the distribution of historical behavioral characteristics, behavioral complexity, and the system's sensitivity requirements for abnormalities. When the behavioral similarity is less than or equal to the threshold, the system determines that the target patient's current behavior has significantly deviated from the core ritual behavior pattern defined in their individualized ritual model, and thus determines that an individual ritual interruption event has occurred.

[0066] When the interruption event is determined, the system further identifies the interruption type based on two key indicators from the structural matching process: the anchor point completion rate and the transition behavior deviation rate. When the anchor point completion rate is less than a preset completion rate threshold, the interruption type is determined to be a core action omission type; when the anchor point completion rate is greater than or equal to the preset completion rate threshold and the transition behavior deviation rate is greater than a preset deviation rate threshold, the interruption type is determined to be a transition behavior abnormality type. A core action omission type interruption means that key steps of the ritual behavior were not executed; a transition behavior abnormality type interruption means that the key steps were executed, but the connection between steps or the manner of execution is abnormal.

[0067] In implementation, the system first constructs an individualized ritual model for the target patient based on historical behavioral data, and extracts anchor action subsequences and transition action subsequences from multiple highly consistent behavioral instances. Then, over multiple historical execution samples recorded under undisturbed conditions, it analyzes the distribution of the anchor point completion rate across executions. The mean and standard deviation of this indicator are calculated, and the preset completion rate threshold is set with reference to the lower bound of the 95% confidence interval, for example, 0.75. This value indicates that, under normal circumstances, the target patient typically completes more than 75% of the core anchor actions. The transition behavior deviation rate is analyzed in the same way to determine its average level under normal execution conditions. The value two standard deviations above the mean is used as the preset deviation rate threshold, for example, 0.4, so that deviation rates exceeding it are identified as atypical, abnormal action transitions.
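A minimal sketch of this threshold calibration follows. It makes one loudly stated assumption: the "lower bound of the 95% confidence interval" is read here as mean minus 1.96 sample standard deviations (a normal approximation of the spread of individual executions), which is only one possible reading of the text. The function name and the history values are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def preset_thresholds(completion_history: Sequence[float],
                      deviation_history: Sequence[float]) -> Tuple[float, float]:
    """Derive (completion threshold, deviation threshold) from undisturbed runs.

    Assumption: completion threshold = mean - 1.96 * std (one reading of the
    95% lower bound); deviation threshold = mean + 2 * std, as in the text.
    Each history needs at least two samples for the sample standard deviation.
    """
    c_mean = statistics.mean(completion_history)
    c_std = statistics.stdev(completion_history)
    d_mean = statistics.mean(deviation_history)
    d_std = statistics.stdev(deviation_history)
    return c_mean - 1.96 * c_std, d_mean + 2.0 * d_std


# Hypothetical histories for one patient and one ritual.
completion_thr, deviation_thr = preset_thresholds(
    [0.8, 1.0, 0.9, 1.0, 0.8],   # mean 0.9, sample std 0.1
    [0.1, 0.2, 0.3],             # mean 0.2, sample std 0.1
)
```

For these toy histories the deviation threshold comes out at 0.4, the same order as the example value in the paragraph above; the completion threshold depends on which reading of the confidence bound is adopted.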

[0068] During the real-time behavior analysis phase, the system first performs structural matching on the target patient's current action sequence and calculates the anchor point completion rate. When this value is less than a preset completion rate threshold (e.g., below 0.75), it indicates that the patient failed to complete the key actions required in the ritual behavior, and the system classifies the interruption type as a core action omission interruption. This situation usually reflects a structural interruption of behavior caused by cognitive impairment, attention deficit, or executive dysfunction, and has strong clinical indicative value. If the anchor point completion rate is higher than or equal to the preset completion rate threshold, the system further calculates the transition behavior deviation rate. If this value is greater than the preset deviation rate threshold (e.g., above 0.4), it indicates that although the core action has been completed, there are structural connection abnormalities in the execution of the behavior, such as reversed sequence, redundant actions, or disordered behavior rhythm. In this case, the system classifies the interruption type as a transition behavior abnormality interruption.

[0069] For example, in the handwashing ritual, the core anchor points include "turning on the tap," "rubbing hands," and "turning off the tap," while the transitional actions include "taking hand sanitizer" and "adjusting the water temperature." In one instance, the system detected that the target patient completed only the "turning on the tap" and "turning off the tap" anchor points, and the "rubbing hands" action was not detected. The anchor point completion rate was 2/3, or about 0.67, below the threshold of 0.75, so the system classified the event as a core action omission interruption. In another scenario, the patient completed all anchor point actions, giving an anchor point completion rate of 1.0. However, several irrelevant actions, such as repeated turning and hand hovering, were detected between "taking hand sanitizer" and "rubbing hands," resulting in a transitional behavior deviation rate of 0.52, which exceeds the threshold of 0.4. The system therefore classified this event as a transitional behavior abnormality interruption.
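The two-stage classification rule and the two handwashing cases can be sketched as below. The enum and function names are hypothetical; the default thresholds 0.75 and 0.4 are the example values from the text. Returning `None` when neither condition holds is an assumption of this sketch, since the text does not name a third type.

```python
from enum import Enum
from typing import Optional


class InterruptionType(Enum):
    CORE_ACTION_OMISSION = "core_action_omission"
    TRANSITION_ABNORMALITY = "transition_behavior_abnormality"


def classify_interruption(completion_rate: float,
                          deviation_rate: Optional[float],
                          completion_threshold: float = 0.75,
                          deviation_threshold: float = 0.4) -> Optional[InterruptionType]:
    """Apply the completion check first, then the deviation check."""
    if completion_rate < completion_threshold:
        return InterruptionType.CORE_ACTION_OMISSION
    if deviation_rate is not None and deviation_rate > deviation_threshold:
        return InterruptionType.TRANSITION_ABNORMALITY
    return None  # assumption: neither rule fires, so no type is assigned


# Case 1: "rubbing hands" missing, 2 of 3 anchors hit.
case_1 = classify_interruption(2 / 3, None)
# Case 2: all anchors hit, but the transition deviation rate is 0.52.
case_2 = classify_interruption(1.0, 0.52)
```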

[0070] The design principle of this interruption classification mechanism lies in the fact that ritualistic behavior in cognitive behavioral science typically includes two dimensions: the completion of key behavioral nodes and the smoothness of transitions between behaviors. These two dimensions respectively reflect the structural integrity and execution coherence of the behavior. By distinguishing between the structural or transitional sources of interruption, the system can more effectively guide subsequent stress detection and intervention decisions.

[0071] After identifying the interruption type, the system timestamps the current behavioral data stream and environmental data stream, extracts the precise time point of the interruption, and encapsulates this timestamp and interruption type together into a context activation signal. The context activation signal is a key input used to trigger subsequent behavioral-physiological joint modeling and emotional stress assessment. Its structure includes metadata such as the time of the event, the event type, and the index information of the behavioral segment.
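One way to picture the context activation signal is as a small immutable record. The field names below are hypothetical; the text only states that the signal carries the event time, the event type, and index information for the behavioral segment.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ContextActivationSignal:
    """Hypothetical layout of the metadata listed in the text."""
    timestamp: float          # time of the interruption, seconds since epoch
    interruption_type: str    # e.g. "core_action_omission"
    segment_start: int        # index of the first frame of the behavioral segment
    segment_end: int          # index one past the last frame of the segment


signal = ContextActivationSignal(
    timestamp=1_700_000_000.0,
    interruption_type="core_action_omission",
    segment_start=1200,
    segment_end=1450,
)
payload = asdict(signal)  # plain dict, ready to hand to downstream modules
```

Freezing the dataclass reflects the role of the signal as a trigger: downstream modules read it but should not modify the recorded time or type.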

[0072] S106. Generate a coupling feature vector of the target patient based on the context activation signal. The coupling feature vector is used to characterize the behavioral-physiological joint state of the target patient after the occurrence of the individual ritual interruption event. In step S106, to achieve a comprehensive representation of the target patient's state after an individual ritual interruption event, the system further constructs a coupled feature vector that jointly reflects behavioral characteristics and physiological state based on the generated contextual activation signal. This coupled feature vector not only preserves the individual's behavioral dynamics during the interruption event but also integrates their physiological response trends, thus providing a multi-dimensional and time-series-based input foundation for subsequent assessment of emotional stress levels. To ensure that the construction of the coupled feature vector is individual-sensitive and event-dependent, the system needs to dynamically adjust the data sampling frequency according to the interruption type and simultaneously acquire high-resolution kinematic feature sequences and physiological indicator sequences within a preset duration. Based on this, through signal processing and trend analysis methods, rhythmic stability features and dynamic rate of change features are extracted respectively, and the two are jointly encoded into a unified coupled feature vector, forming a key input for subsequent stress assessment modeling. Specifically, this may include the following steps: Based on the interruption type, the data sampling frequency of the target patient is determined. Taking the timestamp as the starting point, kinematic feature sequences and physiological index sequences are acquired within a preset duration at the data sampling frequency. The kinematic feature sequences are used to characterize the stereotyped behavioral characteristics of the target patient, and the physiological index sequences are used to characterize the physiological state characteristics of the target patient. Signal processing is performed on the kinematic feature sequence to obtain the rhythmic stability characteristics of the kinematic feature sequence, and trend analysis is performed on the physiological index sequence to obtain the dynamic rate of change characteristics of the physiological index sequence. The rhythm stability feature and the dynamic rate of change feature are combined into a coupled feature vector.

[0073] After acquiring the contextual activation signal of an individual ritual interruption event, in order to comprehensively capture the behavioral-physiological combined response characteristics of the target patient in the short period following the event, the system needs to dynamically set the data sampling frequency based on the interruption type, and simultaneously collect kinematic feature sequences and physiological indicator sequences within a set time window, starting from the timestamp of the interruption event. Dynamically setting the data sampling frequency based on the interruption type can include the following steps: when the interruption type is a core action omission type interruption, the data sampling frequency is set to a first preset frequency; when the interruption type is a transitional behavior abnormality type interruption, the data sampling frequency is set to a second preset frequency, where the first preset frequency is greater than the second preset frequency. The reason for this setting is that core action omission type interruptions usually signify a structural collapse of ritual behavior, which may trigger a stronger and more sudden instantaneous emotional stress response in the target patient. Therefore, using a higher first preset frequency can more accurately capture the rapid dynamic changes in behavior and physiological indicators. Transitional behavior abnormality type interruptions, on the other hand, are more often characterized by disorder or hesitation during the execution of behavior, and their corresponding stress responses may be relatively mild or show a continuous accumulating trend. Using a relatively lower second preset frequency is sufficient to capture their changing characteristics, while also saving computational resources.

[0074] During this process, the system determines whether the interruption is a "core action omission interruption" or a "transitional behavior abnormality interruption," and applies the first preset frequency or the second preset frequency accordingly. The first preset frequency is higher than the second preset frequency in order to capture the potentially sudden stress responses caused by the omission of core actions. The system samples synchronously through body posture sensors (such as inertial measurement units, IMUs) and wearable physiological sensors (such as heart rate straps and skin conductance meters), collecting high-frequency raw signals such as acceleration, angular velocity, heart rate variability, and skin conductance over a window of 10 to 30 seconds to build a complete time-series input. For example, when the system detects that the target patient has missed the crucial action of "using utensils" during the "eating" ritual, it classifies the event as a core action omission interruption and immediately starts the data acquisition module at a high frequency of 50 Hz to capture the critical state window after the event.
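The frequency selection can be sketched as a lookup plus a sample-count calculation. The 50 Hz value comes from the example above; the 25 Hz second preset frequency is a hypothetical placeholder, since the text only requires it to be lower than the first. The type labels and function name are also hypothetical.

```python
from typing import Tuple

FIRST_PRESET_HZ = 50    # core action omission (value from the example in the text)
SECOND_PRESET_HZ = 25   # transitional behavior abnormality (hypothetical value)


def sampling_plan(interruption_type: str, window_s: float = 10.0) -> Tuple[int, int]:
    """Return (sampling frequency in Hz, number of samples in the window)."""
    if not 10.0 <= window_s <= 30.0:
        raise ValueError("the text describes a 10 to 30 second window")
    if interruption_type == "core_action_omission":
        freq = FIRST_PRESET_HZ
    elif interruption_type == "transition_behavior_abnormality":
        freq = SECOND_PRESET_HZ
    else:
        raise ValueError(f"unknown interruption type: {interruption_type}")
    return freq, int(freq * window_s)


plan_omission = sampling_plan("core_action_omission", 10.0)
plan_transition = sampling_plan("transition_behavior_abnormality", 30.0)
```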

[0075] After data acquisition, the system performs signal processing on the acquired kinematic feature sequences to extract stability features reflecting the repetitiveness and rhythmicity of movements. Since stereotyped behaviors often manifest as highly repetitive and rhythmic movement patterns, the system applies frequency-domain analysis methods such as the Fast Fourier Transform (FFT) or the wavelet transform to the acceleration and angular velocity signals to analyze their periodicity, extracting features such as the dominant frequency, a frequency stability index, and a rhythmic fluctuation coefficient. The system can also combine time-domain indicators such as the zero-crossing rate and short-term energy changes to improve its ability to identify stereotyped movements in non-stationary signals. This processing not only captures the patient's potential behavioral activation patterns after an interruption event but also provides a behavioral explanation for individualized stress responses. For example, if a target patient exhibits stereotyped behavior such as repetitive hand swinging shortly after an interruption of the "handwashing" ritual is detected, the dominant frequency and amplitude variations in the rhythmic stability features will increase significantly, indicating a behavioral state of stress.
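Two of the features named above, the dominant frequency and the zero-crossing rate, can be sketched with the standard library alone. To stay dependency-free the sketch uses a naive discrete Fourier transform rather than an FFT; the result is the same for short windows, only slower. The function names and the synthetic 2 Hz signal are hypothetical.

```python
import cmath
import math
from typing import Sequence


def dominant_frequency(signal: Sequence[float], fs: float) -> float:
    """Frequency (Hz) of the largest non-DC bin of a naive DFT."""
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * fs / n


def zero_crossing_rate(signal: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose mean-centered sign differs."""
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    crossings = sum(1 for a, b in zip(centered, centered[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (n - 1)


# Synthetic 2 Hz hand-swing acceleration, sampled at 50 Hz for 2 seconds.
fs = 50.0
accel = [math.sin(2 * math.pi * 2.0 * i / fs) for i in range(100)]
dom = dominant_frequency(accel, fs)
zcr = zero_crossing_rate(accel)
```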

[0076] In parallel with kinematic data processing, the system performs trend analysis on the physiological indicator sequences and extracts dynamic rate-of-change features that reflect how quickly the physiological state is changing. Specifically, the system performs sliding-window fitting on physiological signals such as heart rate, skin conductance, and skin temperature, and quantifies the intensity and direction of the physiological response by calculating the first derivative of the signal, the acceleration of its change (second derivative), or the slope of an exponential fit within the time window. Trend analysis methods such as locally weighted regression (LOESS) or Kalman filtering can be used to smooth the signal and improve robustness, so that the physiological change trajectory triggered by a sudden stress response is extracted more accurately. For example, if the target patient's heart rate increases by more than 15% within 10 seconds and skin conductance rises significantly after the "medication" behavior is interrupted, the corresponding dynamic rate-of-change features will indicate enhanced sympathetic nerve activity, reflecting a physiological response to a high-stress state.
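A minimal sketch of the sliding-window slope (a first-derivative estimate) is given below. It uses an ordinary least-squares line fit per window in place of LOESS or Kalman filtering, and it adds a simple relative-change helper for the "more than 15% within 10 seconds" example. The function names and the synthetic heart-rate series are hypothetical.

```python
from typing import List, Sequence


def window_slopes(values: Sequence[float], fs: float,
                  window: int, step: int = 1) -> List[float]:
    """Least-squares slope (units per second) of each sliding window."""
    if window < 2 or window > len(values) or step < 1:
        raise ValueError("invalid window or step")
    t = [i / fs for i in range(window)]        # time axis within one window
    t_mean = sum(t) / window
    denom = sum((ti - t_mean) ** 2 for ti in t)
    slopes = []
    for start in range(0, len(values) - window + 1, step):
        seg = values[start:start + window]
        y_mean = sum(seg) / window
        num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, seg))
        slopes.append(num / denom)
    return slopes


def relative_change(values: Sequence[float]) -> float:
    """(last - first) / first over the whole series."""
    return (values[-1] - values[0]) / values[0]


# Synthetic heart rate: 70 bpm rising 1.2 bpm per second for 10 s at 1 Hz.
heart_rate = [70 + 1.2 * i for i in range(11)]
slopes = window_slopes(heart_rate, fs=1.0, window=5)
rise = relative_change(heart_rate)  # 12 / 70, which exceeds 0.15
```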

[0077] After extracting the rhythmic stability features and the dynamic rate of change features, the system jointly encodes them to construct a unified coupled feature vector, which represents the behavioral-physiological joint state of the target patient in the short period following an individual ritual interruption event. This vector not only preserves the rhythmic structure of the behavioral sequence but also integrates the dynamic trend of the physiological response, exhibiting high temporal consistency and individual sensitivity. Through feature standardization and embedding layer mapping, the system maps the features of both dimensions into a common high-dimensional representation, forming the final coupled feature vector, which is then input into the emotional stress assessment model. For example, after a "tidying clothes" ritual interruption event, the behavioral dimension of the coupled feature vector shows high-amplitude rhythmic fluctuations, while the physiological dimension shows a rapid increase in skin conductance. Based on this, the system can determine that the patient may be in a mild to moderate stress state and prompt the subsequent intervention module to provide appropriate situational guidance or behavioral cues. Through this process, the system forms a closed loop from interruption event identification to cross-modal state modeling, providing reliable support for personalized mental health assessment.
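The standardization-and-join step can be sketched as follows. The sketch covers only z-score standardization and concatenation; the embedding layer mentioned in the text is omitted. The reference means and standard deviations are assumed to come from the patient's own baseline, and all names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

Stats = Tuple[Sequence[float], Sequence[float]]  # (means, standard deviations)


def standardize(values: Sequence[float], means: Sequence[float],
                stds: Sequence[float]) -> List[float]:
    """Z-score each feature against its baseline mean and standard deviation."""
    if not (len(values) == len(means) == len(stds)):
        raise ValueError("values, means and stds must have equal length")
    return [(v - m) / s for v, m, s in zip(values, means, stds)]


def coupled_feature_vector(rhythm: Sequence[float], physio: Sequence[float],
                           rhythm_ref: Stats, physio_ref: Stats) -> List[float]:
    """Concatenate standardized behavioral and physiological features."""
    return standardize(rhythm, *rhythm_ref) + standardize(physio, *physio_ref)


# Hypothetical features: [dominant frequency, zero-crossing rate] and [HR slope].
vector = coupled_feature_vector(
    rhythm=[2.0, 0.5],
    physio=[1.5],
    rhythm_ref=([1.0, 0.5], [0.5, 0.1]),
    physio_ref=([0.5], [0.5]),
)
```

Standardizing before concatenation keeps features with large raw units, such as heart rate in beats per minute, from dominating features with small units.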

[0078] S107. Input the coupled feature vector into the preset emotional stress assessment model to obtain the assessment result of the target patient. The assessment result is used to characterize the emotional stress level of the target patient.

[0079] In step S107, to accurately quantify the emotional stress level of the target patient after an individual ritual interruption event, the system needs to input the coupled feature vector constructed in step S106 into a preset emotional stress assessment model, and generate an assessment result corresponding to the current state based on the model's output. The coupled feature vector is a multi-dimensional vector structure that integrates patient behavioral and physiological state information. Its initial construction process has uniformly encoded the rhythmic stability features in kinematic features and the dynamic rate of change features in physiological indicators, exhibiting high temporal correlation and physiological behavioral coupling. Therefore, this vector can serve as an ideal input carrier for modeling emotional stress states, thereby achieving multimodal modeling and intelligent recognition of individual stress responses.

[0080] The emotional stress assessment model used is a multimodal emotion recognition model built on a supervised learning strategy. The model's training process is based on a large amount of real-world behavioral-physiological joint data and manually labeled emotional stress responses. The model can employ a dual-stream convolutional neural network (Dual-Stream CNN) structure or a Transformer architecture with an attention fusion mechanism to extract features and perform temporal modeling of rhythmic stability and dynamic rate of change features, respectively. Cross-modal attention fusion is implemented in the intermediate layer to improve the model's ability to discriminate complex stress responses. During the training phase, developers collect kinematic and physiological data of the target group in multiple real-world scenarios and combine this with subjective reports from clinical experts or subjects to label their current stress levels, thus constructing a multi-label training set. After parameter optimization on this training set, the model can learn the mapping relationship between different coupled feature patterns and stress levels.

[0081] During implementation, the system uses the timestamp carried by the context activation signal as the starting point to extract the coupled feature vector, which is then fed into the evaluation model. Based on its internal structure, the model performs feature decoding, hierarchical mapping, and state classification on this vector, and finally outputs a multi-dimensional stress state vector or a single stress level label. The output can be presented in graded form, such as low stress, moderate stress, and high stress, or as a continuous stress score (e.g., a decimal value between 0 and 1) to support emotional state feedback mechanisms with varying degrees of precision. For example, if a target patient exhibits a typical increase in stereotyped movements and a rapid rise in skin conductance after the "medication" behavior is interrupted, the constructed coupled feature vector will show high rhythmic stability and a high dynamic rate of change. Based on this, the evaluation model will determine that the patient is in a high stress state and output a stress score of 0.82.
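The relationship between the continuous score and the graded output can be sketched as a simple binning function. The cut points 0.33 and 0.66 are hypothetical; the text names the three grades but does not say where the boundaries lie.

```python
def stress_level(score: float, low_cut: float = 0.33, high_cut: float = 0.66) -> str:
    """Map a continuous stress score in [0, 1] to one of three grades.

    low_cut and high_cut are hypothetical boundaries, not values from the text.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score < low_cut:
        return "low stress"
    if score < high_cut:
        return "moderate stress"
    return "high stress"


label = stress_level(0.82)  # the example score from the text
```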

[0082] By inputting coupled feature vectors into a pre-trained emotional stress assessment model, not only can the individual's stress state after a current interruption event be quantified in real time, but it can also provide strong data support for subsequent intervention strategies. For example, the system can automatically trigger voice reassurance commands, guiding behavioral prompts, or remotely notify caregivers based on the assessment results, thus forming a closed-loop mechanism of "interruption detection—state assessment—intelligent response," which greatly improves the intelligent level of identifying behavioral abnormalities and managing emotions in special populations (such as patients with cognitive impairment and autism). Overall, this step, through a model-driven emotional stress recognition mechanism, extends the digital assessment system from behavioral recognition to psychological state modeling, constructing a cross-modal and cross-level intelligent assessment system, providing reliable technical support for clinical auxiliary diagnosis and rehabilitation intervention.

[0083] Please refer to Figure 3, which is a schematic diagram of the structure of an electronic device for an artificial intelligence-based digital evaluation system in an embodiment of this application.

[0084] It should be noted that the structure of the AI-based digital evaluation system shown in Figure 3 is merely an example and should not impose any limitations on the functionality and scope of use of the embodiments of the present invention.

[0085] As shown in Figure 3, an artificial intelligence-based digital evaluation system includes a central processing unit 301, which can perform various appropriate actions and processes according to a program stored in a read-only memory 302 or a program loaded from a storage section 308 into a random access memory 303, such as executing the methods described in the above embodiments. The random access memory 303 also stores various programs and data required for system operation. The central processing unit 301, the read-only memory 302, and the random access memory 303 are interconnected via a bus 304. An input / output interface 305 is also connected to the bus 304.

[0086] The following components are connected to the input / output interface 305: an input section 306 including audio input devices, push-button switches, etc.; an output section 307 including an LCD display, audio output devices, indicator lights, etc.; a storage section 308 including a hard disk, etc.; and a communication section 309 including a network interface card such as a LAN (Local Area Network) card, modem, etc. The communication section 309 performs communication processing via a network such as the Internet. A drive 310 is also connected to the input / output interface 305 as needed. A removable medium 311, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on the drive 310 as needed so that computer programs read from it can be installed into the storage section 308 as needed.

[0087] In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via the communication section 309, and / or installed from the removable medium 311. When the computer program is executed by the central processing unit 301, it performs the various functions defined in the present invention.

[0088] It should be noted that specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory, read-only memory, erasable programmable read-only memory, flash memory, optical fiber, portable compact disk read-only memory, optical storage devices, magnetic storage devices, or any suitable combination thereof. In this invention, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

[0089] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. Each block in a flowchart or block diagram may represent a module, segment, or portion of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those shown in the drawings.

[0090] Specifically, an artificial intelligence-based digital evaluation system according to this embodiment includes a processor and a memory. The memory stores a computer program. When the computer program is executed by the processor, it implements an artificial intelligence-based digital evaluation method provided in the above embodiment.

[0091] In another aspect, the present invention also provides a computer-readable storage medium, which may be included in the AI-based digital evaluation system described in the above embodiments, or may exist independently without being incorporated into the AI-based digital evaluation system. The storage medium carries one or more computer programs which, when executed by a processor of the AI-based digital evaluation system, cause the AI-based digital evaluation system to implement the AI-based digital evaluation method provided in the above embodiments.

Claims

1. A digital evaluation method based on artificial intelligence, characterized in that the method includes: acquiring historical action sequence data of a target patient interacting with a target object; performing unsupervised temporal modeling on the historical action sequence data to obtain an individualized ritual model of the target patient, wherein the individualized ritual model is used to characterize the ritualized behavior pattern of the target patient and includes core ritual behaviors and core triggering contexts corresponding to the core ritual behaviors; acquiring a real-time behavioral data stream of the target patient and monitoring real-time environmental data of the target patient; when the real-time environmental data meets the core triggering context, extracting a real-time action sequence from the real-time behavioral data stream; calculating the behavioral similarity between the real-time action sequence and the core ritual behavior; when the behavioral similarity is less than or equal to a preset similarity threshold, determining that the target patient has experienced an individual ritual interruption event, and generating a context activation signal; generating a coupling feature vector of the target patient based on the context activation signal, wherein the coupling feature vector is used to characterize the behavioral-physiological joint state of the target patient after the occurrence of the individual ritual interruption event; and inputting the coupling feature vector into a preset emotional stress assessment model to obtain an assessment result of the target patient, wherein the assessment result is used to characterize the emotional stress level of the target patient.

2. The method according to claim 1, characterized in that performing unsupervised temporal modeling on the historical action sequence data to obtain the individualized ritual model of the target patient specifically includes: extracting the duration of actions and the duration of action intervals from the historical action sequence data, and constructing the temporal dimension behavioral features of the target patient based on the duration of actions and the duration of action intervals; identifying, based on the temporal dimension behavioral features, multiple repetitive action combinations of the target patient, and calculating the occurrence frequency and the time interval distribution entropy value of each repetitive action combination; when the occurrence frequency of a repetitive action combination is greater than a preset frequency threshold and its time interval distribution entropy value is less than a preset entropy threshold, identifying the repetitive action combination as a candidate ritual behavior; and screening the candidate ritual behaviors for environmental triggers to obtain the individualized ritual model.

3. The method according to claim 2, characterized in that screening the candidate ritual behaviors for environmental triggers to obtain the individualized ritual model specifically includes: obtaining multiple sets of environmental triggers associated with each of the candidate ritual behaviors, wherein the environmental triggers include multiple items of environmental state information of the target patient's environment within a preset time window before the candidate ritual behavior occurs, and one candidate ritual behavior corresponds to multiple sets of environmental triggers; calculating the similarity among the multiple sets of environmental triggers of the candidate ritual behavior to obtain a context consistency score; when the context consistency score is greater than a preset consistency threshold, determining the candidate ritual behavior as the core ritual behavior; extracting multiple target environmental features from the multiple sets of environmental triggers corresponding to the core ritual behavior to obtain the core triggering context, wherein the target environmental features are environmental features whose frequency of appearance in the multiple sets of environmental triggers exceeds the preset frequency threshold; and combining the core ritual behavior and the core triggering context to form the individualized ritual model.

4. The method according to claim 3, characterized in that calculating the behavioral similarity between the real-time action sequence and the core ritual behavior specifically includes: decomposing the core ritual behavior into at least one anchor action subsequence and at least one transition action subsequence, wherein the anchor action subsequence is an action segment common to all target candidate ritual behaviors, the target candidate ritual behavior is any one of the core ritual behaviors, and the transition action subsequence is an action segment in the core ritual behavior, other than the anchor action subsequence, that connects different anchor action subsequences; and calculating the behavioral similarity based on the real-time action sequence, the anchor action subsequence, and the transition action subsequence.

5. The method according to claim 4, characterized in that calculating the behavior similarity based on the real-time action sequence, the anchor action subsequence, and the transition action subsequence specifically includes: matching the real-time action sequence against the anchor action subsequence in the core ritual behavior to obtain an anchor completion degree, wherein the anchor completion degree characterizes the degree to which the real-time action sequence completes the anchor action subsequence in the core ritual behavior; when the anchor completion degree is greater than a preset completion degree threshold, calculating a transition behavior deviation degree between the transition action subsequence in the core ritual behavior and the real-time action sequence, wherein the transition behavior deviation degree characterizes the degree of difference between the transition actions in the real-time action sequence and the transition action subsequence in the core ritual behavior; and obtaining the behavior similarity from the anchor completion degree and the transition behavior deviation degree by using a preset function.
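
As a non-limiting illustration, the two-stage calculation of claim 5 might be sketched as follows. The sketch assumes that the k-th transition subsequence lies between the k-th and (k+1)-th anchor subsequences, that the deviation is measured with `difflib.SequenceMatcher` from the Python standard library, that the preset function is `completion * (1 - deviation)`, and that the similarity is 0 when the completion threshold is not exceeded. The claim fixes none of these choices, and all names and values are hypothetical.

```python
from difflib import SequenceMatcher


def find_sub(seq, sub, start):
    """Index of the first contiguous occurrence of sub in seq at or after start, else -1."""
    for i in range(start, len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1


def behavior_similarity(realtime, anchors, transitions, completion_threshold=0.6):
    """Return (anchor completion degree, transition deviation degree, similarity)."""
    if not anchors:
        raise ValueError("at least one anchor action subsequence is required")
    # Stage 1: locate each anchor in order within the real-time sequence.
    spans, cursor = [], 0
    for anchor in anchors:
        i = find_sub(realtime, anchor, cursor)
        if i < 0:
            spans.append(None)
        else:
            spans.append((i, i + len(anchor)))
            cursor = i + len(anchor)
    completion = sum(s is not None for s in spans) / len(anchors)
    if completion <= completion_threshold:
        return completion, None, 0.0  # assumed fallback: deviation is not computed
    # Stage 2: compare what was observed between anchors with the expected transition.
    deviations = []
    for k, expected in enumerate(transitions[:len(anchors) - 1]):
        left, right = spans[k], spans[k + 1]
        if left is None or right is None:
            deviations.append(1.0)  # the gap cannot be located, treat as maximal deviation
            continue
        observed = realtime[left[1]:right[0]]
        deviations.append(1.0 - SequenceMatcher(None, observed, expected).ratio())
    deviation = sum(deviations) / len(deviations) if deviations else 0.0
    return completion, deviation, completion * (1.0 - deviation)


anchors = [["pick_cup", "rinse"], ["place_cup"], ["align_handle"]]
transitions = [["wipe"], ["tap"]]
exact = ["pick_cup", "rinse", "wipe", "place_cup", "tap", "align_handle"]
altered = ["pick_cup", "rinse", "wipe", "place_cup", "shake", "align_handle"]
partial = ["pick_cup", "rinse", "wipe"]
```

With these inputs, `exact` completes every anchor with no transition deviation, `altered` completes every anchor but replaces one transition, and `partial` fails the completion threshold so the deviation stage is skipped.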

6. The method according to claim 1, characterized in that the context activation signal includes the timestamp and the interruption type of the individual ritual interruption event, and generating the coupling feature vector of the target patient based on the context activation signal specifically includes: determining a data sampling frequency for the target patient based on the interruption type; taking the timestamp as the starting point, acquiring a kinematic feature sequence and a physiological index sequence within a preset duration at the data sampling frequency, wherein the kinematic feature sequence characterizes the stereotyped behavioral characteristics of the target patient and the physiological index sequence characterizes the physiological state characteristics of the target patient; performing signal processing on the kinematic feature sequence to obtain a rhythm stability feature of the kinematic feature sequence, and performing trend analysis on the physiological index sequence to obtain a dynamic rate-of-change feature of the physiological index sequence; and combining the rhythm stability feature and the dynamic rate-of-change feature into the coupling feature vector.
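
As a non-limiting illustration, the feature extraction of claim 6 might be sketched as follows. The sketch assumes that the kinematic feature sequence is a uniformly sampled movement-intensity signal, that rhythm stability is derived from the coefficient of variation of the intervals between detected peaks, and that the dynamic rate of change is the least-squares slope of the physiological index over time. The claim does not specify these techniques, and all names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def peak_times(signal, sampling_hz):
    """Times (s) of local maxima that rise above the signal mean."""
    m = mean(signal)
    return [i / sampling_hz for i in range(1, len(signal) - 1)
            if signal[i] > m and signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]]


def rhythm_stability(signal, sampling_hz):
    """1 / (1 + CV) of inter-peak intervals: 1.0 for a perfectly regular rhythm."""
    times = peak_times(signal, sampling_hz)
    intervals = [b - a for a, b in zip(times, times[1:])]
    if len(intervals) < 2:
        return 0.0  # too few repetitions to assess rhythm
    return 1.0 / (1.0 + pstdev(intervals) / mean(intervals))


def rate_of_change(values, sampling_hz):
    """Least-squares slope of the physiological index, in units per second."""
    times = [i / sampling_hz for i in range(len(values))]
    t_bar, v_bar = mean(times), mean(values)
    denom = sum((t - t_bar) ** 2 for t in times)
    return sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values)) / denom


def coupling_feature_vector(kinematic, physiological, sampling_hz):
    """Combine the two features into one vector: [rhythm stability, rate of change]."""
    return [rhythm_stability(kinematic, sampling_hz),
            rate_of_change(physiological, sampling_hz)]


# Hypothetical window: regular rocking movement and a steadily rising heart rate.
rocking = [0, 1, 0, 1, 0, 1, 0, 1, 0]
heart_rate = [70, 72, 74, 76, 78, 80, 82, 84, 86]
vector = coupling_feature_vector(rocking, heart_rate, sampling_hz=2.0)
```

Pairing a movement-rhythm feature with a physiological trend in one vector lets the downstream assessment model see both signals over the same post-interruption window.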

7. The method according to claim 6, characterized in that the interruption types include a core action omission interruption and a transition behavior abnormality interruption, and determining the data sampling frequency for the target patient based on the interruption type specifically includes: when the interruption type is the core action omission interruption, setting the data sampling frequency to a first preset frequency; and when the interruption type is the transition behavior abnormality interruption, setting the data sampling frequency to a second preset frequency, wherein the first preset frequency is greater than the second preset frequency.
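
As a non-limiting illustration, the frequency selection of claim 7 might be sketched as follows. The enum, the function name, and the two frequency values (50 Hz and 10 Hz) are hypothetical; the claim only requires that the first preset frequency be greater than the second.

```python
from enum import Enum


class InterruptionType(Enum):
    CORE_ACTION_OMISSION = "core_action_omission"
    TRANSITION_BEHAVIOR_ABNORMALITY = "transition_behavior_abnormality"


# Assumed example values; only the ordering FIRST > SECOND comes from the claim.
FIRST_PRESET_HZ = 50.0
SECOND_PRESET_HZ = 10.0

_FREQUENCY_BY_TYPE = {
    InterruptionType.CORE_ACTION_OMISSION: FIRST_PRESET_HZ,
    InterruptionType.TRANSITION_BEHAVIOR_ABNORMALITY: SECOND_PRESET_HZ,
}


def data_sampling_frequency(interruption_type):
    """Select the data sampling frequency (Hz) for the given interruption type."""
    return _FREQUENCY_BY_TYPE[interruption_type]


omission_hz = data_sampling_frequency(InterruptionType.CORE_ACTION_OMISSION)
abnormal_hz = data_sampling_frequency(InterruptionType.TRANSITION_BEHAVIOR_ABNORMALITY)
```

Sampling faster after an omitted core action gives the later feature extraction a denser window for the interruption type treated as more significant.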

8. A digital evaluation system based on artificial intelligence, characterized in that the AI-based digital evaluation system includes one or more processors and a memory, wherein the memory is coupled to the one or more processors and is used to store computer program code, the computer program code includes computer instructions, and the one or more processors call the computer instructions to cause the AI-based digital evaluation system to perform the method as described in any one of claims 1-7.

9. A computer-readable storage medium comprising instructions, characterized in that, when the instructions are executed on an AI-based digital evaluation system, the AI-based digital evaluation system performs the method as described in any one of claims 1-7.

10. A computer program product, characterized in that, when the computer program product is run on an AI-based digital evaluation system, the AI-based digital evaluation system performs the method as described in any one of claims 1-7.