Medical online examination management system based on multi-modal behavior perception
By collecting and analyzing candidate data through a multimodal behavior perception system, personalized digital twins are constructed and anomaly detection is performed, which solves the problem of cheating in online assessments and improves the fairness and credibility of the assessment.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGZHOU JUHAI SOFTWARE TECH CO LTD
- Filing Date
- 2026-02-10
- Publication Date
- 2026-06-12
AI Technical Summary
Existing online assessment management systems lack multimodal behavior perception and monitoring methods, which makes the assessment process prone to cheating and results in low credibility. In particular, they cannot effectively identify proxy test-taking and collaborative cheating behaviors in large-scale online examinations.
The medical online assessment management system adopts multimodal behavior perception. A data acquisition module acquires candidates' facial video streams, environmental audio streams, and operation event sequences. A twin modeling module then constructs personalized digital twins, a behavior analysis module performs multimodal feature fusion and anomaly detection, an identity verification module implements silent continuous authentication, a cheating detection module analyzes spatial correlations, a parallel review module performs semantic association analysis, and finally a dynamic evaluation module generates a comprehensive evaluation report.
It improves the ability to prevent fraud in the assessment process and the fairness and credibility of the assessment results. Through multimodal data analysis and dynamic model optimization, it reduces false alarms and omissions, and achieves accurate monitoring and evaluation of candidates' behavior.
Figure CN122199209A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of online examination anti-cheating monitoring and medical assessment management technology, and in particular to a medical online assessment management system based on multimodal behavior perception.
Background Technology
[0002] Multimodal behavior perception technology integrates multiple data sources such as vision, hearing, and biosignals to collect and analyze human behavioral characteristics in order to identify behavioral patterns and intentions. In online medical assessment management, this system applies multimodal behavior perception technology to monitor candidates' facial expressions, voice interactions, and operational behaviors in a remote examination environment in real time. This allows the system to infer the candidates' concentration and operational standardization, thereby improving the objectivity and completeness of medical assessments and supporting refined evaluation of clinical skills and professional competence.
[0003] Existing online assessment management technologies have the following technical shortcomings. Online assessment systems rely primarily on basic anti-cheating measures such as fixed IP addresses and randomized question order. When regional health management agencies organize large-scale online examinations for numerous medical institutions and personnel within their jurisdiction, the lack of multimodal behavior perception and monitoring methods, such as real-time identity verification, abnormal operation detection, or environmental behavior analysis, makes it difficult for the system to identify proxy test-taking, collaborative cheating, or unauthorized assistance. For example, candidates may transmit answers through communication tools on multiple terminals, or someone else may impersonate them to take the exam. Existing technologies can only restrict login IPs and cannot verify the identity and compliance of the actual examinee, which increases the risk of cheating during the assessment and casts doubt on the fairness and credibility of the results.
Summary of the Invention
[0004] To address the shortcomings of existing technologies, this invention provides a medical online assessment management system based on multimodal behavior perception. This invention solves the technical problem that the lack of multimodal behavior perception and monitoring methods in online assessments leads to easy cheating and low reliability of assessment results.
[0005] To solve the above technical problems, the present invention provides a medical online assessment management system based on multimodal behavior perception, which includes the following modules.
The data acquisition module is configured to acquire facial video streams, environmental audio streams, and operation event sequences from the examinee's terminal, and to output synchronized multimodal raw data streams.
The twin modeling module is configured to receive the synchronous multimodal raw data streams output by the data acquisition module, construct personalized digital twins of examinees based on historical data, aggregate and update the global behavioral representation model in the cloud through a federated averaging algorithm, and output dynamic behavioral baselines.
The behavior analysis module is configured to receive the synchronous multimodal raw data stream output by the data acquisition module and the dynamic behavior baseline output by the twin modeling module. It uses a multi-head attention mechanism to fuse multimodal features, compares the real-time features with the dynamic behavior baseline, and outputs abnormal segments.
The identity verification module is configured to extract facial video streams and operation event sequences from the synchronous multimodal raw data stream output by the data acquisition module, perform silent continuous authentication, and output an identity confidence curve.
The cheating detection module is configured to receive the abnormal segments output by the behavior analysis module and the low-confidence intervals in the identity confidence curve output by the identity verification module. It constructs an incomplete information game model, uses a spatiotemporal graph convolutional network to analyze the spatial correlation of the behaviors of multiple candidates in the examination room, and outputs a preliminary risk score.
The parallel review module is configured to receive the abnormal segments output by the behavior analysis module, the low-confidence intervals in the identity confidence curve output by the identity verification module, and the preliminary risk score output by the cheating detection module. It performs semantic association analysis through a medical examination knowledge graph and generates correction vectors using a counterfactual causal reasoning framework.
The dynamic evaluation module is configured to receive the preliminary risk score output by the cheating detection module and the correction vector output by the parallel review module, integrate them through a confidence-weighted fusion algorithm, use a Markov logic network to reason about contradictory evidence, and output a comprehensive evaluation report.
The correction vector output by the parallel review module is also transmitted to the cheating detection module, the behavior analysis module, and the twin modeling module.
[0006] Furthermore, in the multimodal behavior perception-based online medical assessment management system of the present invention, the data acquisition module is configured as follows: The candidate's terminal camera is used to capture raw video frames at a fixed frame rate to obtain video data; the microphone is used to capture ambient audio waveforms at a specific sampling rate to obtain audio data; and the system's underlying events are monitored to capture keyboard keystrokes, mouse coordinates and click events, and browser tab activation status change events to obtain operation event data. The acquired video data, audio data, and operation event data are stamped with microsecond-level timestamps obtained from a high-precision network time protocol server, and hardware synchronization pulse signals are sent to each sensor driver layer before transmission to align the acquisition clock start point. Face detection and region of interest cropping are performed on the stamped video data to compress the image size. Speech activity detection is applied to the stamped audio data to label segments including human voices. The stamped operation event data are aggregated into an event sequence sorted by time. The compressed video data, tagged audio data, and aggregated event sequences, along with their corresponding timestamps, are encapsulated into a data packet of a unified format. This encapsulated data packet is then streamed to the message queue of the backend processing server via a latency-resistant network protocol.
[0007] Furthermore, in the multimodal behavior perception-based online medical assessment management system of the present invention, the twin modeling module is configured as follows: The system receives synchronous multimodal raw data streams from the data acquisition module, retrieves candidates' historical training data, extracts multimodal feature sequences from the historical training data and real-time multimodal raw data streams, and uses the extracted feature sequences to train a gated recurrent unit network to predict the behavioral feature vector of the next time period based on the behavioral characteristics of the previous time period, thus forming an initial digital twin of individual behavior. During the assessment, the system receives real-time synchronized multimodal raw data streams from the data acquisition module, performs time alignment processing on the data streams to obtain time-aligned multimodal feature sequences, inputs the time-aligned multimodal feature sequences into the gated recurrent unit network, and outputs the predicted values of the behavioral feature vectors within a short time window. The system receives real-time behavioral feature vectors from the behavior analysis module, calculates the difference loss between the predicted value output by the gated recurrent unit network and the true value provided by the behavior analysis module, and adjusts the weights of some layers of the gated recurrent unit network locally using the backpropagation algorithm. Based on the weight adjustment results, an encrypted weight update is generated and uploaded to the central server. The global average update is calculated using a federated averaging algorithm and then securely distributed to each candidate's terminal to update their local individual gated recurrent unit network model.
[0008] Furthermore, in the multimodal behavior perception-based online medical assessment management system of the present invention, the behavior analysis module is configured as follows: The system receives a synchronous multimodal raw data stream from the data acquisition module and extracts in parallel the head pose, eye opening and gaze direction of the video stream, the Mel frequency cepstral coefficients and fundamental frequency profile of the audio stream, and the keystroke interval time, average mouse movement speed and page focus dwell time of the operation event sequence. The extracted features, including head pose, eye opening and closing, gaze direction, Mel frequency cepstral coefficients, fundamental frequency profile, keystroke interval time, average mouse movement speed, and page focus dwell time, are concatenated into a high-dimensional vector. This high-dimensional vector is then input into a multi-head attention network to learn the importance weights of different feature dimensions in the current exam context and output a weighted unified context feature vector. The system receives a dynamic behavior baseline from the twin modeling module, obtains the predicted behavior feature vector at the corresponding time point from the dynamic behavior baseline, calculates the cosine similarity between the real-time context feature vector output by the multi-head attention network and the predicted feature vector, and obtains a similarity sequence. The similarity sequence is segmented into fixed-length sliding windows. The sliding window data is input into a pre-trained isolation forest model. By calculating the path length required for data points to be isolated, windows with path lengths significantly shorter than the average level are identified as anomalous behavior segments.
[0009] Furthermore, in the multimodal behavior perception-based online medical assessment management system of the present invention, the identity verification module is configured as follows: At the start of the exam, candidates are instructed to perform specific actions; the facial video stream is extracted from the synchronous multimodal raw data stream output by the data acquisition module, and high-quality frames are obtained from it. After liveness detection, three-dimensional facial feature point clouds are extracted from the high-quality frames and compared with the registered template to complete the initial verification. During the assessment, the system continuously receives synchronous multimodal raw data streams from the data acquisition module and extracts facial video streams and operation event sequences from them. It periodically captures frames from the video stream to extract lightweight facial feature descriptors and extracts keyboard dynamic features from the operation events. The extracted lightweight facial feature descriptors and keyboard dynamic features are matched with the registered template to obtain independent scores. The independent scores are then fused and smoothed using a Kalman filter to estimate the comprehensive identity confidence score and its variance. The system monitors network latency and video illumination uniformity in real time, estimates the authentication quality factor from them, dynamically adjusts the identity anomaly detection threshold based on the authentication quality factor and the variance estimated by the Kalman filter, and outputs an identity confidence curve based on the adjusted identity anomaly detection threshold and the comprehensive identity confidence score. The identity confidence curve includes a timestamp sequence, a real-time identity confidence score, and the identity anomaly detection threshold.
[0010] Furthermore, in the multimodal behavior perception-based online medical assessment management system of the present invention, the cheating detection module is configured as follows: The system receives abnormal segments from the behavior analysis module and identity confidence curves from the identity verification module. It extracts low-confidence intervals from the identity confidence curves and transforms the abnormal segments and low-confidence intervals into risk events with timestamps and intensity values. Metadata of all online candidates is obtained from the candidate information database. The metadata includes login IP geographic mapping and answer timestamp sequences. Using candidates as nodes, spatial proximity edges are constructed based on IP range or virtual examination room number, and behavioral similarity edges are constructed based on the similarity of answer times and answer options, forming a dynamic spatiotemporal graph. Risk events are injected as node attributes into the dynamic spatiotemporal graph. A spatiotemporal graph convolutional network is used to learn the risk propagation patterns in the dynamic spatiotemporal graph, identify abnormal clustering patterns of risk events in specific subgraph structures, and output a list of potential collaborative cheating groups and their confidence levels. The risk events and the potential collaborative cheating groups, along with their confidence levels, are input into the incomplete information game model as payoff signals to update the system's monitoring strategy and output the preliminary cheating risk probability for each candidate.
[0011] Furthermore, in the multimodal behavior perception-based online medical assessment management system of the present invention, the parallel review module is configured as follows: The system receives a preliminary risk score from the cheating detection module. When the preliminary risk score exceeds a preset threshold, it retrieves from the behavior analysis module the complete operation sequence of the corresponding candidate during the risk period associated with that score. The operation sequence is then mapped to a pre-constructed medical examination knowledge graph to analyze whether the behavior is consistent with the knowledge logic and to output the logical contradiction coefficient. Abnormal segments are received from the behavior analysis module, and counterfactual hypotheses are constructed for them. The candidate's personal digital twin model in the twin modeling module is used to simulate the subsequent answering behavior and results under normal conditions. The simulated results are compared with the actual records to quantify the treatment effect of the abnormal behavior on the change in answering accuracy, which gives the causal effect strength. The logical contradiction coefficient and the causal effect strength are combined to generate a correction vector. The correction vector is sent to the cheating detection module to adjust the risk weights of the incomplete information game model, to the behavior analysis module to fine-tune the decision boundary of the isolation forest model, and to the twin modeling module as a regularization constraint for model updates.
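The causal effect strength described above can be illustrated with a very small sketch. It assumes, purely for illustration, that the effect is measured as the difference between the recorded answering accuracy and the accuracy the digital twin simulates for normal behavior; the function names and the sample outcomes are hypothetical and do not come from the patent text.

```python
from typing import Sequence


def accuracy(correct_flags: Sequence[bool]) -> float:
    """Share of questions answered correctly."""
    return sum(correct_flags) / len(correct_flags)


def causal_effect_strength(actual: Sequence[bool], simulated: Sequence[bool]) -> float:
    """Actual accuracy minus the accuracy the twin predicts for normal behavior."""
    return accuracy(actual) - accuracy(simulated)


# Hypothetical per-question outcomes over one flagged period.
actual_outcomes = [True, True, True, True]        # recorded during the anomaly
simulated_outcomes = [True, False, True, False]   # twin's counterfactual rollout
effect = causal_effect_strength(actual_outcomes, simulated_outcomes)
print(effect)  # 0.5
```

A large positive value here would suggest that the abnormal behavior coincided with accuracy well above what the candidate's own baseline predicts; how this value is scaled into the correction vector is not specified in the text.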
[0012] Furthermore, in the multimodal behavior perception-based online medical assessment management system of the present invention, the parallel review module is further configured as follows: The answer sequence and modification traces of the examinee are extracted from the complete operation sequence received from the behavior analysis module. The video stream is extracted from the synchronous multimodal raw data stream output by the data acquisition module, and the gaze movement trajectory is extracted from it. Semantic association analysis is performed on the answer sequence, modification traces, and gaze movement trajectory together with the difficulty and knowledge points of the test questions. Based on the results, the system calculates how well the pattern of the candidate's gaze lingering on the relevant reference material area, when answering consecutive questions on related knowledge points, matches expected learning and cognitive patterns, in order to identify logical contradictions.
[0013] Furthermore, in the multimodal behavior perception-based online medical assessment management system of the present invention, the dynamic evaluation module is configured as follows: The system receives a preliminary risk score from the cheating detection module and, from the parallel review module, a correction vector generated by combining the logical contradiction coefficient and the causal effect strength. The system calculates the comprehensive risk score using a weighted average formula, where the weight of the correction vector is based on a preset confidence factor. The system then establishes a Markov logic network, defines soft rules, inputs the preliminary risk score, logical contradiction coefficient, and causal effect strength into the network as observation nodes, and outputs the comprehensive cheating probability through probabilistic reasoning. Based on the comprehensive cheating probability, combined with the preliminary risk score received from the cheating detection module, the correction vector received from the parallel review module, the abnormal segments received from the behavior analysis module, and the identity confidence curve received from the identity verification module, a structured report is generated, which displays the time of abnormal behavior, related collaborating candidates, knowledge graph contradictions, and counterfactual reasoning results.
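The weighted average step can be sketched as follows. This is only one plausible reading: the text does not say how the correction vector is reduced to a single value, so the sketch assumes the mean of its two components, and the function name, parameter names, and default confidence factor are hypothetical. The Markov logic network stage is not covered.

```python
def fuse_risk(preliminary: float, contradiction: float, causal_effect: float,
              confidence: float = 0.4) -> float:
    """Confidence-weighted average of the preliminary score and the correction.

    `confidence` plays the role of the preset confidence factor: it is the
    weight given to the correction vector, and the preliminary risk score
    receives the remaining weight.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    # Assumption: collapse the two-component correction vector to its mean.
    correction = (contradiction + causal_effect) / 2.0
    return (1.0 - confidence) * preliminary + confidence * correction


score = fuse_risk(preliminary=0.5, contradiction=1.0, causal_effect=0.5,
                  confidence=0.5)
print(score)  # 0.625
```

With `confidence=0.0` the function returns the preliminary score unchanged, which makes it easy to check how much the review stage shifts a given case.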
[0014] Furthermore, in the multimodal behavior perception-based online medical assessment management system of the present invention, transmitting the correction vector output by the parallel review module to the cheating detection module, behavior analysis module, and twin modeling module includes the following: The parallel review module transmits the generated correction vector to the cheating detection module, which uses the logical contradiction coefficient in the correction vector to adjust the weight parameters of different types of risks in the incomplete information game model. The parallel review module transmits the correction vector to the behavior analysis module, which uses the causal effect strength in the correction vector to adjust the boundary threshold for judging abnormal behavior segments in the isolation forest algorithm. The parallel review module transmits the correction vector to the twin modeling module, which uses the logical contradiction coefficient in the correction vector as a regularization constraint when training the gated recurrent unit network model, so as to dynamically update the baseline parameters of individual behavior. This directed transmission of correction vectors to the cheating detection module, behavior analysis module, and twin modeling module, together with the resulting parameter adjustments, forms a system-level adaptive optimization loop.
[0015] Beneficial effects of this invention: This invention employs a data acquisition module to collect facial video streams, environmental audio streams, and operation event sequences from the examinee's terminal and to output synchronized multimodal raw data streams, providing a high-quality, strictly time-aligned data foundation for subsequent analysis. A twin modeling module constructs a personalized digital twin of the examinee based on historical data and outputs a dynamic behavioral baseline, enabling accurate prediction and continuous optimization of individual behavioral patterns. A behavior analysis module uses a multi-head attention mechanism to fuse multimodal features and compares real-time features with the dynamic behavioral baseline to output abnormal segments, improving the accuracy of abnormal behavior identification. An identity verification module implements silent continuous authentication and outputs an identity confidence curve, helping to prevent proxy test-taking. A cheating detection module constructs an incomplete information game model, uses a spatiotemporal graph convolutional network to analyze the spatial correlation of behaviors within the examination room, and outputs a preliminary risk score, enhancing the ability to detect collaborative cheating. The parallel review module performs semantic association analysis through a medical examination knowledge graph and generates correction vectors using a counterfactual causal reasoning framework to verify the consistency of behavioral logic. The dynamic evaluation module integrates evidence through a confidence-weighted fusion algorithm and outputs a comprehensive evaluation report using Markov logic network reasoning, improving the reliability of decision-making. The correction vector is transmitted from the parallel review module to the cheating detection module, behavior analysis module, and twin modeling module to form an adaptive optimization loop, enabling the system to dynamically adjust parameters to reduce false alarms and false negatives, thereby improving the anti-cheating capability, fairness, and credibility of online medical examinations.
Attached Figure Description
[0016] To more clearly illustrate the technical solution of the present invention, the drawings used in the embodiments will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on the drawings without creative effort.
[0017] Figure 1 is a system architecture diagram of the medical online assessment management system based on multimodal behavior perception according to the present invention.
Detailed Implementation
[0018] To make the technical solution of the present invention clearer, the present invention is described below clearly and completely with reference to specific embodiments and the corresponding drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
[0019] Referring to Figure 1, the medical online assessment management system based on multimodal behavior perception provided by the present invention includes the following modules.
The data acquisition module is configured to acquire facial video streams, environmental audio streams, and operation event sequences from the examinee's terminal, and to output synchronized multimodal raw data streams.
The twin modeling module is configured to receive the synchronous multimodal raw data streams output by the data acquisition module, construct personalized digital twins of examinees based on historical data, aggregate and update the global behavioral representation model in the cloud through a federated averaging algorithm, and output dynamic behavioral baselines.
The behavior analysis module is configured to receive the synchronous multimodal raw data stream output by the data acquisition module and the dynamic behavior baseline output by the twin modeling module. It uses a multi-head attention mechanism to fuse multimodal features, compares the real-time features with the dynamic behavior baseline, and outputs abnormal segments.
The identity verification module is configured to extract facial video streams and operation event sequences from the synchronous multimodal raw data stream output by the data acquisition module, perform silent continuous authentication, and output an identity confidence curve.
The cheating detection module is configured to receive the abnormal segments output by the behavior analysis module and the low-confidence intervals in the identity confidence curve output by the identity verification module. It constructs an incomplete information game model, uses a spatiotemporal graph convolutional network to analyze the spatial correlation of the behaviors of multiple candidates in the examination room, and outputs a preliminary risk score.
The parallel review module is configured to receive the abnormal segments output by the behavior analysis module, the low-confidence intervals in the identity confidence curve output by the identity verification module, and the preliminary risk score output by the cheating detection module. It performs semantic association analysis through a medical examination knowledge graph and generates correction vectors using a counterfactual causal reasoning framework.
The dynamic evaluation module is configured to receive the preliminary risk score output by the cheating detection module and the correction vector output by the parallel review module, integrate them through a confidence-weighted fusion algorithm, use a Markov logic network to reason about contradictory evidence, and output a comprehensive evaluation report.
The correction vector output by the parallel review module is also transmitted to the cheating detection module, the behavior analysis module, and the twin modeling module.
[0020] The medical online assessment and management system based on multimodal behavior perception achieves comprehensive monitoring and fraud prevention of the assessment process through collaborative data processing of multiple modules. The data acquisition module, as the system's data source, is responsible for collecting facial video streams, environmental audio streams, and operation event sequences from the candidate's terminal. The module uses the terminal's camera to capture raw video frames at a fixed frame rate, uses the microphone to collect environmental audio waveforms at a specific sampling rate, and simultaneously listens for underlying system events to obtain keyboard keystrokes, mouse coordinates and click events, and changes in browser tab activation status. The acquired video, audio, and operation event data are stamped with microsecond-level timestamps obtained from a high-precision network time protocol server, and hardware synchronization pulse signals are sent to each sensor driver layer before transmission to align with the acquisition clock start point, thus ensuring strict synchronization of multimodal data. The video data undergoes face detection and region-of-interest (ROI) cropping for size compression; the audio data is labeled with voice activity detection tags including human voice segments; and the operation event data is aggregated into a time-ordered event sequence. The compressed video data, tagged audio data, and aggregated event sequences, along with their corresponding timestamps, are encapsulated into a unified format data packet and streamed to the message queue of the backend processing server via a latency-resistant network protocol, providing a high-quality synchronous data stream for subsequent modules.
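The packaging step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Packet` dataclass, its field names, and the `build_packet` helper are hypothetical, and timestamps are passed in as plain microsecond integers rather than obtained from a network time protocol server.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical event record: (timestamp in microseconds, event type, payload).
Event = Tuple[int, str, str]


@dataclass
class Packet:
    """Unified container for one time slice of multimodal data (assumed layout)."""
    start_us: int                          # start of the slice, in microseconds
    video_roi: bytes                       # cropped face region, already compressed
    voiced_spans: List[Tuple[int, int]]    # (start_us, end_us) spans flagged by VAD
    events: List[Event] = field(default_factory=list)


def build_packet(start_us, video_roi, voiced_spans, raw_events) -> Packet:
    """Sort operation events by timestamp and wrap all modalities in one packet."""
    ordered = sorted(raw_events, key=lambda e: e[0])
    return Packet(start_us, video_roi, list(voiced_spans), ordered)


packet = build_packet(
    start_us=1_000_000,
    video_roi=b"\x00\x01",
    voiced_spans=[(1_200_000, 1_450_000)],
    raw_events=[(1_300_000, "click", "x=10,y=20"), (1_100_000, "key", "a")],
)
print([e[1] for e in packet.events])  # ['key', 'click']
```

Keeping every modality in one timestamped container is what lets downstream modules align video, audio, and operation events without consulting separate clocks.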
[0021] The twin modeling module receives synchronous multimodal raw data streams from the data acquisition module and constructs personalized digital twins of examinees based on historical data. The module retrieves examinees' historical training data, extracts multimodal feature sequences from historical data and real-time data streams, and uses these feature sequences to train a gated recurrent unit network (GRU). The network learns to predict behavioral feature vectors for the next time period based on behavioral characteristics from the previous time period, forming an initial personal behavioral digital twin. During the assessment, the module performs time alignment processing on the real-time received multimodal raw data streams to obtain time-aligned multimodal feature sequences, which are then input into the GRU to predict the behavioral feature vectors within a short time window. The module receives real-time behavioral feature vectors from the behavior analysis module, calculates the difference loss between the predicted and actual values, and locally adjusts the weights of some layers of the GRU using a backpropagation algorithm. Based on the weight adjustment results, an encrypted weight update is generated, uploaded to the central server, and then the global average update is calculated using a federated averaging algorithm. The global average update is securely distributed to each examinee's terminal to update their local personal GRU model, achieving dynamic optimization and privacy protection of the behavioral representation model.
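The aggregation step on the central server can be sketched as follows. The text only names a federated averaging algorithm, so this sketch assumes the common variant in which each terminal's update is weighted by its number of local samples; encryption and secure distribution are omitted, and the parameter name `gru.w` and the sample counts are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def federated_average(updates: List[Weights], sample_counts: List[int]) -> Weights:
    """Sample-weighted mean of per-terminal weight updates (aggregation step)."""
    if not updates or len(updates) != len(sample_counts):
        raise ValueError("need exactly one sample count per update")
    total = sum(sample_counts)
    averaged: Weights = {}
    for name in updates[0]:
        size = len(updates[0][name])
        averaged[name] = [
            sum(u[name][i] * n for u, n in zip(updates, sample_counts)) / total
            for i in range(size)
        ]
    return averaged


# Two hypothetical terminals reporting updates for the same GRU parameter.
terminal_a = {"gru.w": [1.0, 2.0]}
terminal_b = {"gru.w": [3.0, 6.0]}
global_update = federated_average([terminal_a, terminal_b], sample_counts=[1, 3])
print(global_update)  # {'gru.w': [2.5, 5.0]}
```

Only weight updates leave the terminal in this scheme, which is the basis for the privacy protection the paragraph mentions.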
[0022] The behavior analysis module receives the synchronous multimodal raw data stream from the data acquisition module and the dynamic behavior baseline output by the twin modeling module. From the synchronous multimodal raw data stream, the module extracts in parallel the head pose, eye opening, and gaze direction from the video stream, the Mel-frequency cepstral coefficients and fundamental frequency profile from the audio stream, and the keystroke interval time, average mouse movement speed, and page focus dwell time from the operation event sequence. The extracted features are concatenated into a high-dimensional vector. This high-dimensional vector is input into a multi-head attention network, which learns the importance weights of different feature dimensions in the current exam context and outputs a weighted unified context feature vector. The module receives the dynamic behavior baseline from the twin modeling module, obtains the predicted behavior feature vector at the corresponding time point, and calculates the cosine similarity between the real-time context feature vector output by the multi-head attention network and the predicted feature vector to obtain a similarity sequence. The similarity sequence is divided into fixed-length sliding windows, and the sliding window data is input into a pre-trained isolation forest model. The model calculates the path length required to isolate each data point, identifies windows whose path lengths are significantly shorter than the average, and marks them as anomalous behavior segments.
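The similarity and windowing steps that precede the isolation forest can be sketched as below. The feature vectors are toy two-dimensional values chosen for illustration, and the function names and window size are hypothetical; the multi-head attention network and the isolation forest model themselves are not implemented here.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sliding_windows(seq: Sequence[float], size: int, step: int = 1) -> List[List[float]]:
    """Split a sequence into fixed-length windows advanced by `step` samples."""
    return [list(seq[i:i + size]) for i in range(0, len(seq) - size + 1, step)]


# Toy real-time context vectors and the twin's predictions for the same instants.
observed = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
predicted = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 0.0]]

similarity = [cosine_similarity(o, p) for o, p in zip(observed, predicted)]
windows = sliding_windows(similarity, size=2)
print(len(windows))  # 3
```

Each window would then be passed as one sample to the pre-trained isolation forest; windows containing the low-similarity third instant are the ones such a model would be expected to isolate quickly.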
[0023] The identity verification module extracts the facial video stream and the operation event sequence from the synchronous multimodal raw data stream output by the data acquisition module, and performs silent continuous authentication. At the start of the exam, the module instructs the examinee to perform specific actions, acquires high-quality frames from the facial video stream, extracts 3D facial feature point clouds after liveness detection, and compares the point clouds with the registered template to complete the initial verification. During the assessment, the module continuously receives the synchronous multimodal raw data stream, periodically captures frames from the video stream to extract lightweight facial feature descriptors, and extracts keyboard dynamics features from the operation events. The lightweight facial feature descriptors and keyboard dynamics features are matched with the registered template to obtain independent scores. The independent scores are fused and smoothed using a Kalman filter to estimate the comprehensive identity confidence score and its variance. The module monitors network latency and video illumination uniformity in real time, estimates the authentication quality factor from them, dynamically adjusts the identity anomaly detection threshold based on the authentication quality factor and the variance estimated by the Kalman filter, and outputs an identity confidence curve based on the adjusted threshold and the comprehensive identity confidence score. The curve includes the timestamp sequence, the real-time identity confidence score, and the identity anomaly detection threshold.
[0024] The fraud detection module receives abnormal segments from the behavior analysis module and low-confidence intervals from the identity confidence curve output by the identity verification module, and transforms them into risk events with timestamps and intensity values. The module retrieves metadata for all online candidates from the candidate information database, including login IP geographic mappings and answer timestamp sequences. Using candidates as nodes, it constructs spatial proximity edges based on IP ranges or virtual exam room numbers, and behavioral similarity edges based on the similarity of answer times and answer options, forming a dynamic spatiotemporal graph. Risk events are injected into the graph as node attributes. A spatiotemporal graph convolutional network learns risk propagation patterns in the graph, identifies abnormal clustering patterns of risk events in specific subgraph structures, and outputs a list of potential collaborative cheating groups and their confidence levels. The risk events, the list of potential collaborative cheating groups, and their confidence levels are input as payoff signals into an incomplete information game model, which updates the system monitoring strategy and outputs the initial cheating risk probability for each candidate.
[0025] The parallel review module receives the abnormal segments output by the behavior analysis module, the low-confidence intervals of the identity confidence curve output by the identity verification module, and the preliminary risk scores output by the fraud detection module. When a preliminary risk score exceeds a preset threshold, the module retrieves from the behavior analysis module the complete operation sequence of the corresponding candidate during the associated risk period. It maps the operation sequence to a pre-constructed medical assessment knowledge graph, analyzes whether the behavior is consistent with the knowledge logic, and outputs a logical contradiction coefficient. The module extracts the candidate's answering order and modification traces from the complete operation sequence received from the behavior analysis module, and extracts the gaze movement trajectory from the video stream in the synchronous multimodal raw data stream output by the data acquisition module. It performs semantic association analysis on the answering order, modification traces, and gaze movement trajectory together with question difficulty and knowledge points. Based on the results, it calculates the degree of match between learning and cognitive patterns and the candidate's pattern of gaze lingering in the relevant reference material area while consecutively answering questions on related knowledge points, in order to identify logical contradictions. The module also constructs counterfactual hypotheses for the abnormal segments received from the behavior analysis module and uses the candidate's individual digital twin model in the twin modeling module to simulate and extrapolate subsequent answering behaviors and results under normal conditions. It compares these simulations with the actual records to quantify the impact of the abnormal behavior on answer accuracy, obtaining the causal effect strength. The logical contradiction coefficient and the causal effect strength are combined to generate a correction vector. This correction vector is sent to the fraud detection module to adjust the risk weights of the incomplete information game model, to the behavior analysis module to fine-tune the decision boundary of the isolation forest model, and to the twin modeling module as a regularization constraint for model updates.
[0026] The dynamic assessment module receives the initial risk score from the fraud detection module and the correction vector from the parallel review module. The correction vector includes a logical contradiction coefficient and a causal effect strength. The module calculates the comprehensive risk score using a weighted average formula, where the weights of the correction vector are based on preset confidence factors. The module establishes a Markov logic network, defines soft rules, and inputs the initial risk score, logical contradiction coefficient, and causal effect strength as observation nodes into the Markov logic network. Through probabilistic reasoning, it outputs a comprehensive fraud probability. Based on the comprehensive fraud probability, the module combines the initial risk score received from the fraud detection module, the correction vector received from the parallel review module, the abnormal segments received from the behavior analysis module, and the identity confidence curve received from the identity verification module to generate a structured report. The report displays the time of abnormal behavior, associated collaborating candidates, knowledge graph contradictions, and counterfactual reasoning results.
[0027] The system transmits the correction vector output by the parallel review module to the fraud detection module, the behavior analysis module, and the twin modeling module, forming a feedback loop. The fraud detection module uses the logical contradiction coefficient in the correction vector to adjust the weight parameters of the different risk types in the incomplete information game model; the behavior analysis module uses the causal effect strength in the correction vector to adjust the boundary threshold used by the isolation forest algorithm to judge abnormal behavior segments; and the twin modeling module uses the logical contradiction coefficient in the correction vector as a regularization constraint term when training the gated recurrent unit network, dynamically updating the baseline parameters of individual behavior. This targeted transmission and parameter adjustment achieves system-level adaptive optimization, improving the accuracy and reliability of assessment and monitoring.
[0028] Specifically, in the multimodal behavior perception-based online medical assessment management system of the present invention, the data acquisition module is configured as follows: The candidate's terminal camera is used to capture raw video frames at a fixed frame rate to obtain video data; the microphone is used to capture ambient audio waveforms at a specific sampling rate to obtain audio data; and the system's underlying events are monitored to capture keyboard keystrokes, mouse coordinates and click events, and browser tab activation status change events to obtain operation event data. The acquired video data, audio data, and operation event data are stamped with microsecond-level timestamps obtained from a high-precision network time protocol server, and hardware synchronization pulse signals are sent to each sensor driver layer before transmission to align the acquisition clock start point. Face detection and region of interest cropping are performed on the stamped video data to compress the image size. Speech activity detection is applied to the stamped audio data to label segments including human voices. The stamped operation event data are aggregated into an event sequence sorted by time. The compressed video data, tagged audio data, and aggregated event sequences, along with their corresponding timestamps, are encapsulated into a data packet of a unified format. This encapsulated data packet is then streamed to the message queue of the backend processing server via a latency-resistant network protocol.
[0029] The data acquisition module constructs a synchronous multimodal raw data stream from multiple source sensors working together. The module uses the examinee's terminal camera to capture RGB video frames at a fixed frame rate, obtaining video data that includes the examinee's face and upper body. Simultaneously, it activates a microphone array to acquire environmental audio waveforms at a specific sampling rate, capturing ambient sound and potential voice interaction signals from the examination room. The system's underlying event listener records keystrokes, mouse coordinates, click events, and browser tab activation status changes in real time, forming an operation event data stream. These three data streams are temporally correlated at the acquisition end, providing a foundation for subsequent multimodal behavior analysis.
[0030] The time synchronization mechanism employs a layered time synchronization strategy. Video data, audio data, and operation event data are tagged with microsecond-level timestamps obtained from a high-precision network time protocol server during generation. Before data transmission, hardware synchronization pulse signals are sent to each sensor driver layer to uniformly calibrate the start points of the acquisition clocks of the cameras, microphones, and input devices. This hardware-level synchronization scheme effectively overcomes the cumulative errors of software timestamps, ensuring strict alignment of data from different modalities in the time dimension and providing a reliable time reference for subsequent cross-modal feature fusion.
[0031] The data preprocessing stage is optimized for different modal characteristics. For video data, face detection algorithms are used to locate the examinee's facial region, and regions of interest are cropped, compressing the image size to 30% of its original size while preserving key visual information. For audio data, energy threshold-based speech activity detection technology is applied to identify valid audio segments including human voices, filtering out silent segments and environmental noise. Discrete operation events are aggregated into a structured event sequence in chronological order, recording the timestamp, type, and parameters of each operation. This preprocessing significantly reduces transmission bandwidth and storage pressure while maintaining data quality.
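The energy threshold-based speech activity detection described above can be illustrated with a minimal Python sketch. The frame length, the threshold value, and the function names are assumptions made for illustration; the source does not specify them.

```python
import math

def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames and return the RMS energy of each."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return energies

def voiced_segments(samples, frame_len=160, threshold=0.1):
    """Return (start_sample, end_sample) spans whose frame energy exceeds the threshold.

    frame_len and threshold are hypothetical values, not taken from the source.
    """
    segments = []
    seg_start = None
    energies = frame_energies(samples, frame_len)
    for i, energy in enumerate(energies):
        if energy > threshold and seg_start is None:
            seg_start = i * frame_len          # a voiced span begins
        elif energy <= threshold and seg_start is not None:
            segments.append((seg_start, i * frame_len))  # the span ends
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, len(energies) * frame_len))
    return segments
```

Frames below the threshold are treated as silence or noise and dropped, which is what reduces the audio payload before transmission.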
[0032] Data encapsulation and transmission employ a streaming processing architecture. Compressed video data, tagged audio data, and aggregated event sequences, along with their corresponding timestamps, are encapsulated into container format data packets conforming to ISO standards. The packet header includes metadata such as version number and timestamp checksum. The encapsulated data packets are streamed via a UDP-based, latency-resistant network protocol. Forward error correction is used during transmission to ensure data integrity, and the data is ultimately persisted to the Kafka message queue on the backend processing server, providing high-throughput, low-latency data access services to downstream modules.
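As a rough illustration of a packet header carrying a version number and a timestamp checksum, the following Python sketch packs and validates such a header. The 13-byte layout, the use of CRC32, and the function names are assumptions for illustration only; this is not the ISO container format referred to above.

```python
import struct
import zlib

# Hypothetical layout: version (1 byte), timestamp in microseconds (8 bytes),
# CRC32 of the timestamp bytes (4 bytes), network byte order, no padding.
HEADER_FORMAT = "!BQI"

def pack_packet(version, timestamp_us, payload):
    """Prefix the payload with a header that includes a checksum of the timestamp."""
    checksum = zlib.crc32(struct.pack("!Q", timestamp_us))
    header = struct.pack(HEADER_FORMAT, version, timestamp_us, checksum)
    return header + payload

def unpack_packet(packet):
    """Parse a packet and reject it if the timestamp checksum does not match."""
    header_size = struct.calcsize(HEADER_FORMAT)
    version, timestamp_us, checksum = struct.unpack(HEADER_FORMAT, packet[:header_size])
    if zlib.crc32(struct.pack("!Q", timestamp_us)) != checksum:
        raise ValueError("timestamp checksum mismatch")
    return version, timestamp_us, packet[header_size:]
```

A receiver can use the checksum to discard packets whose timestamps were corrupted in transit, so that misdated frames never reach the time alignment step.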
[0033] Specifically, in the multimodal behavior perception-based online medical assessment management system of the present invention, the twin modeling module is configured as follows: The system receives synchronous multimodal raw data streams from the data acquisition module, retrieves candidates' historical training data, extracts multimodal feature sequences from the historical training data and real-time multimodal raw data streams, and uses the extracted feature sequences to train a gated recurrent unit network to predict the behavioral feature vector of the next time period based on the behavioral characteristics of the previous time period, thus forming an initial digital twin of individual behavior. During the assessment, the system receives real-time synchronized multimodal raw data streams from the data acquisition module, performs time alignment processing on the data streams to obtain time-aligned multimodal feature sequences, inputs the time-aligned multimodal feature sequences into the gated recurrent unit network, and outputs the predicted values of the behavioral feature vectors within a short time window. The system receives real-time behavioral feature vectors from the behavior analysis module, calculates the difference loss between the predicted value output by the gated recurrent unit network and the true value provided by the behavior analysis module, and adjusts the weights of some layers of the gated recurrent unit network locally using the backpropagation algorithm. Based on the weight adjustment results, an encrypted weight update is generated and uploaded to the central server. The global average update is calculated using a federated averaging algorithm and then securely distributed to each candidate's terminal to update their local individual gated recurrent unit network model.
[0034] The twin modeling module receives synchronous multimodal raw data streams from the data acquisition module and combines them with the examinee's historical training data to construct a personalized digital twin of the examinee. The module extracts multimodal feature sequences from the historical training data and the real-time multimodal raw data stream. These feature sequences include facial expression changes from the facial video stream, speech rhythm features from the environmental audio stream, and behavioral interval features from the operation event sequence. The extracted multimodal feature sequences are used to train a gated recurrent unit network. The network learns to predict the behavioral feature vector for the next time period from the behavioral features of the previous time period, forming an initial digital twin of the examinee's behavior and thereby establishing a baseline model of the examinee's individual behavioral patterns.
[0035] During the assessment, the module continuously receives real-time synchronous multimodal raw data streams transmitted from the data acquisition module. The data streams undergo time alignment processing to eliminate time discrepancies between different sensors, resulting in time-aligned multimodal feature sequences. These time-aligned multimodal feature sequences are then input into a gated recurrent unit network (GRU). The network outputs predicted values of behavioral feature vectors within a short time window, representing the expected normal behavior of the examinee under undisturbed conditions.
[0036] The module receives real-time behavioral feature vectors from the behavior analysis module as true values, compares them with the predicted values output by the gated recurrent unit network, and calculates the difference loss between the predicted and true values. The difference loss is applied locally to adjust the weights of some hidden layers of the gated recurrent unit network using the backpropagation algorithm, focusing on optimizing the connection weights of neurons in the network that are sensitive to time-series features.
[0037] Based on the weight adjustment results, encrypted weight update values are generated, and homomorphic encryption is used to encrypt the weight parameters. These encrypted weight update values are uploaded to the central server, which aggregates the encrypted weight update values from all candidate terminals using a federated averaging algorithm to calculate the global average update value. The global average update value is then distributed to each candidate terminal via a secure transmission protocol. Each terminal uses the decrypted update value to update its local individual gated recurrent unit network model, achieving distributed collaborative optimization of model parameters.
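The aggregation step can be sketched in Python as a plain federated average. This sketch operates on unencrypted lists of floats and omits the homomorphic encryption and secure transport described above; the function names and the optional per-terminal weights are assumptions for illustration.

```python
def federated_average(client_updates, client_weights=None):
    """Return the element-wise weighted mean of per-terminal weight updates.

    client_updates: list of equal-length lists of floats, one per terminal.
    client_weights: optional relative weights (hypothetical, e.g. sample counts).
    """
    if not client_updates:
        raise ValueError("no updates to aggregate")
    if client_weights is None:
        client_weights = [1.0] * len(client_updates)
    total = sum(client_weights)
    dim = len(client_updates[0])
    return [
        sum(w * update[i] for w, update in zip(client_weights, client_updates)) / total
        for i in range(dim)
    ]

def apply_update(local_weights, global_update, lr=1.0):
    """Apply the global average update to a terminal's local model weights."""
    return [w + lr * g for w, g in zip(local_weights, global_update)]
```

For example, `federated_average([[1.0, 2.0], [3.0, 4.0]])` yields `[2.0, 3.0]`, which each terminal would then pass to `apply_update`.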
[0038] The module uses a federated learning framework to continuously evolve the global behavioral representation model while ensuring that raw candidate data never leaves the local terminal. The digital twin is automatically updated after each assessment, gradually adapting to changes in the candidate's behavior patterns and improving the accuracy of behavioral baseline predictions. Differential privacy techniques add random noise during model updates to prevent individual data from being reverse-engineered during aggregation, further strengthening privacy protection.
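One common way to add noise to a model update is to clip the update to a maximum norm and then add Gaussian noise. The Python sketch below shows that pattern. The clipping bound and noise scale are hypothetical and are not calibrated to any particular privacy budget, and the source does not state which noise mechanism is used.

```python
import math
import random

def clip_update(update, max_norm):
    """Scale the update down so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [u * scale for u in update]

def privatize_update(update, max_norm=1.0, noise_std=0.1, rng=None):
    """Clip the update, then add zero-mean Gaussian noise to every coordinate.

    max_norm and noise_std are assumed values chosen only for illustration.
    """
    if rng is None:
        rng = random.Random()
    clipped = clip_update(update, max_norm)
    return [u + rng.gauss(0.0, noise_std) for u in clipped]
```

Clipping bounds how much any single terminal can influence the aggregate, and the noise masks what remains of that influence.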
[0039] Specifically, in the multimodal behavior perception-based online medical assessment management system of the present invention, the behavior analysis module is configured as follows: The system receives a synchronous multimodal raw data stream from the data acquisition module and extracts in parallel the head pose, eye opening and gaze direction of the video stream, the Mel frequency cepstral coefficients and fundamental frequency profile of the audio stream, and the keystroke interval time, average mouse movement speed and page focus dwell time of the operation event sequence. The extracted features, including head pose, eye opening and closing, gaze direction, Mel frequency cepstral coefficients, fundamental frequency profile, keystroke interval time, average mouse movement speed, and page focus dwell time, are concatenated into a high-dimensional vector. This high-dimensional vector is then input into a multi-head attention network to learn the importance weights of different feature dimensions in the current exam context and output a weighted unified context feature vector. The system receives a dynamic behavior baseline from the twin modeling module, obtains the predicted behavior feature vector at the corresponding time point from the dynamic behavior baseline, calculates the cosine similarity between the real-time context feature vector output by the multi-head attention network and the predicted feature vector, and obtains a similarity sequence. The similarity sequence is segmented into fixed-length sliding windows. The sliding window data is input into a pre-trained isolation forest model. By calculating the path length required for data points to be isolated, windows with path lengths significantly shorter than the average level are identified as anomalous behavior segments.
[0040] The behavior analysis module identifies behavioral deviations in examinees through multimodal feature fusion and anomaly detection. The module receives synchronized multimodal raw data streams from the data acquisition module, including strictly time-aligned facial video streams, environmental audio streams, and operation event sequences. The module runs three feature extraction channels in parallel: the video processing channel uses the OpenPose algorithm to extract pose parameters such as head yaw and pitch angles from the video stream, calculates eye opening using the eye aspect ratio, and obtains the gaze direction vector through a gaze estimation algorithm; the audio processing channel uses a Mel filter bank to extract Mel frequency cepstral coefficients and calculates the fundamental frequency profile using an autocorrelation function; the operation behavior analysis channel computes the mean keystroke time interval, calculates the standard deviation of mouse movement speed, and records the percentile of page focus dwell time. The feature extraction processes of the three modalities share a unified time reference, ensuring temporal consistency among features.
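The eye aspect ratio mentioned above is commonly computed from six eye landmarks as the sum of the two vertical eyelid distances divided by twice the horizontal eye width. The Python sketch below assumes that six-landmark convention (p1 and p4 at the eye corners, p2 and p3 on the upper lid, p6 and p5 on the lower lid); the source does not state which landmark layout is used.

```python
import math

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """Compute the eye aspect ratio from six (x, y) eye landmarks.

    p1, p4: horizontal eye corners; p2, p3: upper lid; p6, p5: lower lid.
    The value shrinks toward 0 as the eye closes.
    """
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)
```

With corners at (0, 0) and (4, 0) and lids one unit above and below the axis, the ratio is 0.5; when the lids meet on the axis, it is 0.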
[0041] The feature fusion stage maps heterogeneous features to a unified vector space. The extracted 28-dimensional features, including head pose parameters, eye opening / closing values, gaze direction vector, Mel frequency cepstral coefficient sequence, fundamental frequency contour value, keystroke interval statistics, average mouse movement speed, and page focus dwell time, are Z-score standardized and concatenated into a high-dimensional feature vector. This high-dimensional vector is input into a multi-head attention network with eight attention heads. The network uses a query-key-value matching mechanism to calculate the attention weights of visual, auditory, and operational features within a specific exam context (e.g., during multiple-choice question answering). The network outputs a weighted contextual feature vector, where features highly relevant to the current task (e.g., gaze movement features during calculation questions) receive higher weights.
[0042] The behavior comparison process incorporates a dynamic behavior baseline provided by the twin modeling module as a personalized benchmark. The module receives this dynamic behavior baseline, generated based on candidates' historical data, and extracts predicted behavior feature vectors for corresponding time points. By calculating the cosine similarity between the real-time context feature vector and the predicted feature vector, a similarity curve in time series form is generated. The similarity calculation process incorporates a dynamic time warping algorithm to eliminate differences in individual behavioral rhythms and avoid misjudgments caused by varying answering speeds.
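The cosine similarity comparison between observed and predicted feature vectors can be sketched in Python as follows. The function names are illustrative, the zero-vector handling is an assumption, and the dynamic time warping step mentioned above is not included.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_a * norm_b)

def similarity_sequence(observed, predicted):
    """Pair observed and predicted vectors by time step and score each pair."""
    return [cosine_similarity(o, p) for o, p in zip(observed, predicted)]
```

Values near 1 indicate behavior close to the examinee's predicted baseline, while values near 0 or below indicate a deviation.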
[0043] Anomaly detection combines sliding window analysis with the isolation forest algorithm. The similarity sequence is divided into sliding windows of 5 seconds each with a step size of 1 second, and each window contains 50 sampling points. The window data is input into a pre-trained isolation forest model. The model constructs isolation trees by randomly selecting features and calculates the path length required to isolate each data point. The path length reflects how far a data point deviates from the mainstream distribution. When the average path length of the data points within a window is more than two standard deviations below the overall mean, that window is marked as an anomalous behavior segment. The system records the start and end times, main anomalous modes, and degree of deviation of the anomalous segments, providing quantitative evidence for subsequent fraud analysis.
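The windowing and the two-standard-deviation rule can be sketched in Python as follows. A 5-second window with 50 samples implies 10 samples per second, so a 1-second step corresponds to 10 samples. The sketch assumes the per-window average path lengths have already been produced by an isolation forest and does not implement the forest itself; the function names are illustrative.

```python
import statistics

def sliding_windows(sequence, size=50, step=10):
    """Split a sequence into overlapping windows (5 s windows, 1 s step at 10 Hz)."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, step)]

def flag_short_paths(window_path_lengths, k=2.0):
    """Return indices of windows whose mean path length is more than k standard
    deviations below the overall mean.

    window_path_lengths: one average isolation path length per window, assumed
    to come from a separately trained isolation forest.
    """
    mean = statistics.fmean(window_path_lengths)
    std = statistics.pstdev(window_path_lengths)
    cutoff = mean - k * std
    return [i for i, value in enumerate(window_path_lengths) if value < cutoff]
```

Short path lengths mean a window was easy to isolate, which is the isolation forest's signal that it differs from the bulk of the data.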
[0044] Specifically, in the multimodal behavior perception-based online medical assessment management system of the present invention, the identity verification module is configured as follows: At the start of the exam, candidates are instructed to perform specific actions; the facial video stream is extracted from the synchronous multimodal raw data stream output by the data acquisition module, and high-quality frames are obtained from it. After liveness detection, three-dimensional facial feature point clouds are extracted from the high-quality frames and compared with the registered template to complete the initial verification. During the assessment, the system continuously receives synchronous multimodal raw data streams from the data acquisition module and extracts facial video streams and operation event sequences from them. It periodically captures frames from the video stream to extract lightweight facial feature descriptors and extracts keyboard dynamics features from the operation events. The extracted lightweight facial feature descriptors and keyboard dynamics features are matched with the registered template to obtain independent scores. The independent scores are then fused and smoothed using a Kalman filter to estimate the comprehensive identity confidence score and its variance. The system monitors network latency and video illumination uniformity in real time, estimates the authentication quality factor from them, dynamically adjusts the identity anomaly detection threshold based on the authentication quality factor and the variance estimated by the Kalman filter, and outputs an identity confidence curve based on the adjusted identity anomaly detection threshold and the comprehensive identity confidence score. The identity confidence curve includes a timestamp sequence, a real-time identity confidence score, and the identity anomaly detection threshold.
[0045] The identity verification module continuously verifies the candidate's identity through a multi-stage authentication mechanism. At the start of the exam, the module instructs the candidate to complete a specific sequence of actions, such as continuous blinking or slow head turning, and extracts the facial video stream in real time from the synchronous multimodal raw data stream provided by the data acquisition module. The system applies 3D structured light-based liveness detection and selects high-quality frames from the video stream, excluding frames that are occluded, blurred, or poorly lit. A dense facial keypoint detection algorithm extracts 3D point cloud data, including feature points such as the tip of the nose, corners of the eyes, and corners of the mouth, from the high-quality frames. The point cloud data is then iteratively matched with the candidate's pre-registered 3D facial template using a nearest-neighbor algorithm. A similarity score is derived from the Hausdorff distance between the point clouds, and initial identity verification is completed when the score exceeds a preset threshold.
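The Hausdorff distance between two point clouds can be computed with the brute-force Python sketch below. It assumes the clouds are already registered to each other and small enough for an all-pairs comparison. Because a smaller distance means a closer match, converting it into a similarity score that must exceed a threshold requires a mapping that the source does not specify, so none is shown here.

```python
import math

def directed_hausdorff(cloud_a, cloud_b):
    """Largest distance from any point in cloud_a to its nearest point in cloud_b."""
    return max(min(math.dist(p, q) for q in cloud_b) for p in cloud_a)

def hausdorff_distance(cloud_a, cloud_b):
    """Symmetric Hausdorff distance between two lists of (x, y, z) points."""
    return max(directed_hausdorff(cloud_a, cloud_b),
               directed_hausdorff(cloud_b, cloud_a))
```

Identical clouds give a distance of 0, and a single stray point dominates the result, which makes the measure sensitive to outliers in the extracted keypoints.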
[0046] During the assessment, the module enters a silent continuous authentication mode. It continuously acquires synchronous multimodal raw data streams from the data acquisition module, separating the facial video stream and the operation event sequence in real time. The system captures image frames from the video stream at a rate of 2 frames per second and extracts 128-dimensional facial feature descriptors using a lightweight convolutional neural network. Simultaneously, it extracts keyboard dynamics features from the operation event sequence, including the standard deviation of keystroke interval time, key pressure variation patterns, and input rhythm regularity. Feature extraction uses a sliding window mechanism to balance real-time performance against computational cost.
[0047] The identity feature fusion stage employs a multi-source score adaptive weighting strategy. Lightweight facial feature descriptors and keyboard dynamics features are each compared with the registered template to calculate similarity. Facial features are matched using a cosine similarity algorithm, while keyboard features are matched using a dynamic time warping algorithm, resulting in two independent scores. These two score sequences are then input into an extended Kalman filter for fusion. The filter estimates the current comprehensive identity confidence score and its variance using a state-space model, with filter parameters dynamically adjusted based on historical score fluctuations to accommodate natural changes in examinee behavior.
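Score fusion of this kind can be illustrated with a scalar Kalman filter in Python. This sketch is a simplification: it uses a linear filter with a random-walk state rather than the extended Kalman filter named above, and the process noise, the measurement noise values, and the initial state are assumed for illustration.

```python
def kalman_fuse(face_scores, key_scores, q=1e-3, r_face=0.01, r_key=0.04,
                x0=0.5, p0=1.0):
    """Fuse two score sequences into a smoothed confidence and its variance.

    q: process noise variance; r_face, r_key: measurement noise variances;
    x0, p0: initial confidence and variance. All are hypothetical values.
    Returns a list of (confidence, variance) tuples, one per time step.
    """
    x, p = x0, p0
    track = []
    for z_face, z_key in zip(face_scores, key_scores):
        p += q  # predict: confidence follows a random walk
        # update sequentially with each independent measurement
        for z, r in ((z_face, r_face), (z_key, r_key)):
            gain = p / (p + r)
            x += gain * (z - x)
            p *= 1.0 - gain
        track.append((x, p))
    return track
```

Because the assumed facial measurement noise is lower than the keyboard noise, the facial score pulls the estimate more strongly, and the returned variance is the quantity that later drives threshold adjustment.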
[0048] The authentication quality assessment and threshold adjustment mechanism enables dynamic fault tolerance. The module monitors network transmission latency and the illumination uniformity index of the video stream in real time. When network latency exceeds 100 milliseconds or illumination uniformity falls below 0.7, the authentication quality factor is automatically lowered. Combining the variance value estimated by the Kalman filter, a fuzzy logic controller dynamically calculates the identity anomaly detection threshold. When the variance increases, the threshold is appropriately relaxed to avoid false alarms. The final output identity confidence curve includes a millisecond-level timestamp, a smoothed real-time confidence score, and a dynamically changing detection threshold, forming a complete authentication trajectory record.
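The threshold adjustment can be sketched in Python with simple rules. The 100 millisecond and 0.7 limits come from the description above. The penalty sizes, the linear relaxation rule, and the floor are assumptions that stand in for the fuzzy logic controller, which the source does not detail.

```python
def quality_factor(latency_ms, illumination_uniformity):
    """Lower the authentication quality factor when conditions degrade.

    The 0.25 penalty per degraded condition is a hypothetical value.
    """
    quality = 1.0
    if latency_ms > 100.0:
        quality -= 0.25
    if illumination_uniformity < 0.7:
        quality -= 0.25
    return quality

def adjusted_threshold(base_threshold, quality, variance,
                       variance_gain=1.0, floor=0.3):
    """Relax (lower) the confidence threshold as quality drops or variance grows.

    A simple linear rule substitutes for the fuzzy logic controller here.
    """
    relaxed = base_threshold * quality - variance_gain * variance
    return max(floor, relaxed)
```

A confidence score is flagged only when it falls below the adjusted threshold, so lowering the threshold under poor conditions is what reduces false alarms.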
[0049] The module enhances system robustness through multimodal feature complementarity. When facial features become temporarily unreliable due to changes in lighting, keyboard dynamics features can maintain basic authentication capabilities. This design addresses common challenges in remote examinations, such as network fluctuations and lighting changes, minimizing interference for examinees while ensuring security.
[0050] Specifically, in the multimodal behavior perception-based online medical assessment management system of the present invention, the fraud detection module is configured as follows: The system receives abnormal fragments from the behavior analysis module and identity confidence curves from the identity verification module. It extracts low-confidence intervals from the identity confidence curves and transforms the abnormal fragments and low-confidence intervals into risk events with timestamps and intensity values. Metadata of all online candidates is obtained from the candidate information database. The metadata includes login IP geographic mapping and answer timestamp sequence. Using candidates as nodes, spatial proximity edges are constructed based on IP range or virtual examination room number. Behavioral similarity edges are constructed based on the similarity of answer time and answer options, forming a dynamic spatiotemporal graph. Risk events are injected as node attributes into the dynamic spatiotemporal graph. Spatiotemporal graph convolutional networks are used to learn the risk propagation patterns in the dynamic spatiotemporal graph, identify the abnormal clustering patterns of risk events in specific subgraph structures, and output a list of potential collaborative cheating groups and their confidence levels. The risk events and potential collusion cheating groups, along with their confidence levels, are input into the incomplete information game model as payoff signals to update the system's monitoring strategy and output the initial cheating risk probability for each candidate.
[0051] The fraud detection module identifies potential fraudulent behaviors through multi-source risk signal fusion and spatiotemporal correlation analysis. The module receives anomalous segments from the behavior analysis module, including time-stamped deviations in candidate behavior, and identity confidence curves from the identity verification module. The system extracts low-confidence intervals from the identity confidence curves using an adaptive threshold algorithm, with the threshold dynamically adjusted based on historical confidence distribution. Anomalous segments and low-confidence intervals are transformed into risk events in a standardized format. Each event includes a millisecond-level timestamp, an event type identifier, and an intensity value calculated based on the degree of deviation or confidence gap, forming a standardized risk input stream.
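Converting low-confidence intervals into standardized risk events can be sketched in Python as follows. The `RiskEvent` fields mirror the timestamp, type identifier, and intensity value described above, but the class name, the event type string, and the choice of the largest confidence gap as the intensity are assumptions.

```python
from dataclasses import dataclass

@dataclass
class RiskEvent:
    timestamp_ms: int   # start of the low-confidence interval
    event_type: str     # hypothetical type identifier
    intensity: float    # largest gap between threshold and score in the interval

def low_confidence_events(timestamps_ms, scores, thresholds):
    """Emit one RiskEvent per contiguous run where the score is below the threshold."""
    events = []
    start = None
    worst_gap = 0.0
    for t, score, threshold in zip(timestamps_ms, scores, thresholds):
        gap = threshold - score
        if gap > 0:
            if start is None:
                start, worst_gap = t, gap
            else:
                worst_gap = max(worst_gap, gap)
        elif start is not None:
            events.append(RiskEvent(start, "identity_low_confidence", worst_gap))
            start = None
    if start is not None:
        events.append(RiskEvent(start, "identity_low_confidence", worst_gap))
    return events
```

Anomalous behavior segments could be converted the same way, with the degree of deviation used as the intensity.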
[0052] The module retrieves real-time updated online candidate metadata from the candidate information database, including the geographic coordinates of the login IP address and millisecond-level answer timestamp sequences. Using each candidate as a graph node, spatial proximity edges are constructed based on whether the IP address range belongs to the same subnet or the virtual examination room number; edge weights reflect the degree of geographical or logical proximity. Behavioral similarity edges are constructed based on the similarity between the answer time sequence and answer options. A dynamic time warping algorithm is used to calculate the similarity of answer rhythm, and the Jaccard similarity coefficient is used to calculate the overlap of answer options. The edge weights combine temporal and content similarity. Spatial proximity edges and behavioral similarity edges together constitute a dynamic spatiotemporal graph, whose structure is updated in real-time as the examination progresses.
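The behavioral similarity edge weight can be sketched in Python by combining a dynamic time warping distance over answer timing with a Jaccard coefficient over chosen options. The mapping from distance to similarity, the mixing weight `alpha`, and the representation of answers as (question, option) pairs are assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def dtw_distance(x, y):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def behavior_edge_weight(times_a, times_b, answers_a, answers_b, alpha=0.5):
    """Blend answer-rhythm similarity and answer-option overlap into one weight.

    alpha and the 1 / (1 + distance) mapping are hypothetical choices.
    """
    rhythm_similarity = 1.0 / (1.0 + dtw_distance(times_a, times_b))
    option_similarity = jaccard(answers_a, answers_b)
    return alpha * rhythm_similarity + (1.0 - alpha) * option_similarity
```

Two candidates with identical timing and identical answers receive a weight of 1.0, and the weight falls as either their rhythm or their choices diverge.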
[0053] Risk events are injected as node attributes into the dynamic spatiotemporal graph, with each candidate node carrying the timestamp and intensity value of its latest risk event. The spatiotemporal graph convolutional network employs a hierarchical aggregation mechanism. The bottom graph convolutional layers capture local spatial features and learn how risk diffuses among physically or logically adjacent candidates by propagating information between neighboring nodes. Temporal convolutional layers handle changes along the time dimension and use dilated convolutions to capture long-term dependencies. The network uses an attention mechanism to aggregate features from different time steps with learned weights, identifying patterns of anomalous clustering of risk events in specific subgraph structures, for example multiple candidates within the same IP segment exhibiting similar anomalous behavior at the same time. The network outputs a list of potential collaborative cheating groups, each with a confidence score based on cluster density.
[0054] Risk events and identified cheating groups are input into an incomplete information game model as payoff signals. The model treats the system as one player and the examinees as the other. The system's strategy space includes actions such as adjusting monitoring intensity and triggering review mechanisms. The game model uses the intensity of risk events as the immediate payoff signal and the confidence level of cheating groups as a potential threat indicator. The system dynamically adjusts its prior probability estimates of various cheating behaviors through Bayesian update rules and updates the policy value function using a Q-learning algorithm. The model outputs a preliminary cheating risk probability for each examinee, which integrates evidence of individual abnormal behavior and group-related risk, providing a quantitative assessment basis for subsequent modules.
[0055] The module uses spatiotemporal graph structure analysis to correlate isolated risk events into group behavior patterns, overcoming the limitations of detecting individual examinees. Dynamic spatiotemporal graph modeling allows the system to capture the dynamics of risk propagation within the examination room in real time, while the incomplete information game framework enables the system to adapt to changes in cheating strategies. This multi-dimensional analysis mechanism effectively improves the detection capability of complex cheating behaviors such as collaborative cheating, while retaining uncertainty metrics through probabilistic output to support progressive analysis by subsequent modules.
[0056] Specifically, in the multimodal behavior perception-based online medical assessment management system of the present invention, the parallel review module is configured as follows: The module receives a preliminary risk score from the fraud detection module. When the preliminary risk score exceeds a preset threshold, it retrieves from the behavior analysis module the complete operation sequence of the corresponding candidate during the risk period associated with that score. The operation sequence is mapped to a pre-constructed medical examination knowledge graph to analyze whether the behavior is consistent with the knowledge logic, and a logical contradiction coefficient is output. Abnormal segments are received from the behavior analysis module, and counterfactual hypotheses are constructed for them. The candidate's personal digital twin model in the twin modeling module is used to simulate the subsequent answering behavior and results under normal conditions. The simulated results are compared with the actual records to quantify the treatment effect of the abnormal behavior on the change in answering accuracy, yielding the causal effect strength. The logical contradiction coefficient and the causal effect strength are combined to generate a correction vector. The correction vector is sent to the fraud detection module to adjust the risk weights of the incomplete information game model, to the behavior analysis module to fine-tune the decision boundary of the isolation forest model, and to the twin modeling module as a regularization constraint for model updates.
[0057] The parallel review module generates a system correction vector through multi-dimensional evidence verification and counterfactual reasoning. The module receives a preliminary risk score from the fraud detection module; when the score exceeds a dynamically adjusted threshold, a deep review process is triggered. The module retrieves from the behavior analysis module the complete operation sequence of the corresponding examinee during the marked risk period, including timestamps, operation types, and parameter details for all interaction events. The operation sequence is matched against a pre-constructed medical assessment knowledge graph, which encodes medical logical relationships such as disease classification trees, symptom association networks, and treatment principle reasoning chains. By analyzing the semantic correlation between the examinee's answer path and the knowledge graph nodes, the module calculates how far the behavioral pattern deviates from medical cognitive logic and outputs a quantified logical contradiction coefficient. For example, if an examinee repeatedly jumps to completely unrelated anatomical reference areas while answering a cardiology question, this is marked as a logical inconsistency.
[0058] The module receives abnormal segments from the behavior analysis module, which triggers counterfactual causal reasoning. For each abnormal segment, a contrastive "no abnormal factors" scenario is constructed, and the personal digital twin in the twin modeling module is used to simulate the behavior expected under normal cognitive conditions. The digital twin builds a personalized cognitive model from the examinee's historical behavioral data and simulates a typical answer trajectory that conforms to medical thinking patterns for the same test questions. The simulated answer sequence is compared with the actual record to quantify the treatment effect of the abnormal behavior on answer accuracy, time distribution, and knowledge retrieval patterns. The difference between actual performance and the counterfactual result is estimated with a difference-in-differences model to obtain a causal effect strength index, which reflects the actual impact of the abnormal behavior on the assessment results.
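The difference-in-differences comparison can be illustrated with a small sketch. It assumes that answer accuracy is measured before and after the abnormal segment, both for the actual record and for the trajectory simulated by the digital twin; the function names and the data are hypothetical.

```python
def accuracy(results):
    # results: list of booleans, True for a correct answer.
    return sum(results) / len(results) if results else 0.0

def causal_effect_strength(actual_pre, actual_post, twin_pre, twin_post):
    """Difference-in-differences on answer accuracy.

    The actual record is the 'treated' series and the digital twin
    simulation is the counterfactual 'control' series.
    """
    actual_change = accuracy(actual_post) - accuracy(actual_pre)
    twin_change = accuracy(twin_post) - accuracy(twin_pre)
    return actual_change - twin_change

effect = causal_effect_strength(
    actual_pre=[True, False, False, True],   # 50% before the segment
    actual_post=[True, True, True, True],    # 100% after the segment
    twin_pre=[True, False, False, True],     # twin matches history: 50%
    twin_post=[True, False, True, False],    # twin expects 50% to continue
)
```

Because each accuracy lies in [0, 1], the result lies in [-2, 2]; mapping it onto the [-1, 1] range that the description gives for the causal effect strength would need an additional scaling step that the source does not specify.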
[0059] A structured correction vector is generated by combining the logical contradiction coefficient and the causal effect strength. The correction vector is encapsulated as a multidimensional tensor that includes a time dimension label, an anomaly type encoding, and a correction strength value. In the fraud detection module, it adjusts the weight parameters for different risk types in the incomplete information game model, for example by increasing the weight coefficient for knowledge-logic contradiction risks. In the behavior analysis module, it fine-tunes the decision boundary threshold of the isolation forest model, adaptively relaxing or tightening the detection sensitivity for specific types of anomaly patterns. In the twin modeling module, it serves as a regularization constraint for training the gated recurrent unit network, introducing logical consistency constraints during model updates to prevent the model from learning anomalous behavior patterns.
[0060] The module employs a feedback mechanism to adaptively optimize system parameters. When correction vectors repeatedly indicate a certain type of false alarm, the system automatically reduces the sensitivity of the corresponding detection mode; when they repeatedly reveal cheating patterns that were missed, the corresponding detection threshold is gradually tightened. This dynamic adjustment mechanism based on evidence accumulation enables the system to learn continuously, maintaining detection accuracy while minimizing interference with normal examinations. All correction operations are recorded in the audit log, forming a complete chain of decision-making traceability and providing explanatory evidence for disputes regarding assessment results.
[0061] Specifically, in the multimodal behavior perception-based online medical assessment management system of the present invention, the parallel review module is further configured as follows: The answer sequence and modification traces of the examinee are extracted from the complete operation sequence received from the behavior analysis module. The video stream is separated from the synchronized multimodal raw data stream output by the data acquisition module, and the gaze movement trajectory is extracted from it. The answer sequence, modification traces, and gaze movement trajectory are semantically correlated with the difficulty and knowledge points of the test questions. Based on the semantic association analysis results, the module calculates how well the candidate's pattern of gaze dwelling on the relevant reference material area, when consecutively answering questions on related knowledge points, matches learning and cognitive patterns, in order to identify logical contradictions.
[0062] The parallel review module verifies the logical consistency of assessment behavior through multi-source behavioral trajectory analysis. The module parses the candidate's answer sequence from the complete operation sequence received from the behavior analysis module, recording the jump paths and time intervals between questions, and extracts answer modification traces, including the number of modifications, modification times, and patterns of content changes. Interface interaction events in the operation sequence are reconstructed into an answer flow timeline that marks the start time and duration of each question and its sequence of modification operations. Modification trace analysis focuses on how answers change, such as repeated switching of options in multiple-choice questions or large additions and deletions of text in short-answer questions. These traces, combined with timestamps, form a behavioral modification trajectory.
[0063] The module separates the video stream from the synchronized multimodal raw data stream output by the data acquisition module and applies an eye-tracking algorithm to extract the gaze movement trajectory from the video frame sequence. The gaze tracking process uses the pupil-corneal reflection technique to locate the gaze point coordinates, generating gaze movement sequence data with millisecond resolution. The trajectory data is smoothed to eliminate physiological eye movement noise, and meaningful fixation clusters and saccade paths are extracted. The gaze trajectory includes features such as the gaze point coordinate sequence, dwell time, and movement speed, all strictly aligned to the operation event timestamps.
[0064] The system performs multi-dimensional association mapping between three types of data streams—answer order, modification traces, and gaze movement trajectory—and the test question metadata. Test question difficulty information comes from a pre-defined difficulty coefficient database, while knowledge point information is obtained from a medical assessment knowledge graph, which includes semantic relationship networks such as disease classification, pathological mechanisms, and treatment principles. Semantic association analysis employs a graph neural network model, mapping behavioral trajectory features to corresponding nodes in the knowledge graph and calculating the semantic similarity between behavioral patterns and knowledge point associations. For example, the system analyzes the correlation between the duration of a candidate's gaze lingering in the electrocardiogram analysis area and cardiovascular physiology knowledge points when answering questions about cardiovascular diseases.
[0065] Based on semantic association analysis, the module calculates behavioral consistency indices when continuously answering questions related to relevant knowledge points. The system defines a learning and cognitive pattern model, built upon expert experience, describing typical gaze patterns that should appear in key areas of the reference materials during normal question-answering. Actual gaze sequences are dynamically time-warped and matched with the pattern model, calculating similarity scores between sequences. When test-takers rapidly skip key reference areas or abnormally linger in irrelevant areas on difficult knowledge point questions, the system identifies this as a logical contradiction. This contradiction detection, combined with abnormal answering order (such as skipping questions in reverse) and abnormal modification traces (such as repeatedly modifying answers to basic questions), constitutes a chain of evidence for abnormal behavioral logic.
[0066] The module effectively identifies abnormal patterns that violate the principles of medical learning through cross-modal correlation analysis of behavioral trajectories and knowledge logic. For example, when continuously assessing relevant pathological mechanisms, normal cognition should present a progressively deeper gaze movement pattern, while abnormal behavior may show skipping references or fixed-area staring. This fine-grained analysis compensates for the shortcomings of simple behavioral anomaly detection, improves the accuracy of fraud detection from the cognitive logic level, and provides the system with a deeper basis for decision-making.
[0067] Specifically, in the multimodal behavior perception-based online medical assessment management system of the present invention, the dynamic evaluation module is configured as follows: The module receives a preliminary risk score from the fraud detection module and, from the parallel review module, a correction vector generated by combining the logical contradiction coefficient and the causal effect strength. It calculates the comprehensive risk score using a weighted average formula, in which the weights of the correction vector components are based on a preset confidence factor. A Markov logic network is established and soft rules are defined; the preliminary risk score, logical contradiction coefficient, and causal effect strength are input into the network as observation nodes, and the comprehensive fraud probability is output through probabilistic reasoning. Based on the comprehensive fraud probability, combined with the preliminary risk score received from the fraud detection module, the correction vector received from the parallel review module, the abnormal segments received from the behavior analysis module, and the identity confidence curve received from the identity verification module, a structured report is generated that displays the time of abnormal behavior, related collaborating candidates, knowledge graph contradictions, and counterfactual reasoning results.
[0068] The dynamic evaluation module generates a comprehensive assessment result through multi-source evidence fusion and probabilistic reasoning. The module receives a preliminary risk score from the fraud detection module, which is derived from spatiotemporal correlation analysis of abnormal segments and low-confidence identity intervals. It also receives a correction vector from the parallel review module. The correction vector includes a logical contradiction coefficient and a causal effect strength: the logical contradiction coefficient reflects the degree of deviation between the behavior and the logic of medical knowledge, while the causal effect strength quantifies the impact of abnormal behavior on the accuracy of answers. The module uses a weighted average to calculate the comprehensive risk score, where the weights of the correction vector components are adjusted based on a confidence factor calculated from historical verification accuracy. The logical contradiction coefficient has a higher weight than the causal effect strength because logical contradictions in knowledge have higher discriminative value in medical assessments. The weighting process incorporates variance normalization to remove differences in scale between indicators and outputs a standardized comprehensive risk score.
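As a worked illustration of the weighted average, the sketch below uses the fixed weights that the description states later for the formula (0.6 for the preliminary risk score, 0.3 for the logical contradiction coefficient, 0.1 for the causal effect strength). The variance normalization step and the dynamic weight adjustment are omitted, and the function and key names are hypothetical.

```python
# Weights taken from the description's weighted average formula.
WEIGHTS = {"risk": 0.6, "contradiction": 0.3, "effect": 0.1}

def composite_risk(risk, contradiction, effect, weights=WEIGHTS):
    """Weighted average of the three indicators.

    risk and contradiction are expected in [0, 1], effect in [-1, 1].
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if not (0.0 <= risk <= 1.0 and 0.0 <= contradiction <= 1.0
            and -1.0 <= effect <= 1.0):
        raise ValueError("input outside its documented range")
    return (weights["risk"] * risk
            + weights["contradiction"] * contradiction
            + weights["effect"] * effect)

# 0.6 * 0.8 + 0.3 * 0.5 + 0.1 * 0.2 = 0.48 + 0.15 + 0.02 = 0.65
score = composite_risk(0.8, 0.5, 0.2)
```

Because the causal effect strength can be negative, a strongly negative effect lowers the composite score slightly rather than leaving it unchanged.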
[0069] The module establishes a Markov logic network model whose structure includes an observation layer, a hidden layer, and an output layer. The observation layer defines three observation nodes: a preliminary risk score node, a logical contradiction coefficient node, and a causal effect strength node. The hidden layer defines soft rules, expressed as weighted first-order logic formulas, such as "if the logical contradiction coefficient is high and the causal effect strength is strong, then the probability of cheating is high." Each rule has an adjustable weight parameter. After the observations are input into the network, probabilistic inference is performed using Gibbs sampling or belief propagation to calculate the joint probability distribution over all possible world states. The inference process accounts for rule weights and conflicting evidence: when multiple rules point to the same conclusion, their contributions reinforce one another; when the evidence is contradictory, they offset one another. The final output is a comprehensive cheating probability value, representing the posterior probability that the examinee has engaged in cheating.
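How weighted soft rules turn into a posterior probability can be shown with a toy example. This sketch assumes the three observations have already been thresholded into booleans, uses three made-up rules with made-up weights, and computes the posterior by exact enumeration over the two possible values of the query atom rather than by Gibbs sampling or belief propagation.

```python
import math

def implies(p, q):
    return (not p) or q

def cheat_probability(high_risk, high_contradiction, strong_effect,
                      w_risk=1.5, w_logic=2.0, w_prior=1.0):
    """Posterior P(cheat | evidence) in a tiny log-linear (MLN-style) model.

    Hypothetical soft rules:
      w_risk : high_risk => cheat
      w_logic: high_contradiction and strong_effect => cheat
      w_prior: not cheat   (candidates are presumed honest)
    """
    def world_score(cheat):
        # Sum of the weights of the rules satisfied in this world.
        return (w_risk * implies(high_risk, cheat)
                + w_logic * implies(high_contradiction and strong_effect, cheat)
                + w_prior * (not cheat))

    e_true = math.exp(world_score(True))
    e_false = math.exp(world_score(False))
    return e_true / (e_true + e_false)  # normalize over both worlds

p_all = cheat_probability(True, True, True)
p_none = cheat_probability(False, False, False)
```

With all evidence present, both implication rules are violated in the "no cheating" world, so that world loses their weight and the posterior rises; with no evidence, only the honesty prior distinguishes the two worlds.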
[0070] Based on the overall fraud probability, the module integrates output data from multiple modules to generate a structured report. The report engine extracts time-series data of the initial risk score from the fraud detection module, obtains detailed calculations of the logical contradiction coefficient and causal effect strength in the correction vector from the parallel review module, retrieves the start and end times and feature descriptions of abnormal segments from the behavior analysis module, and imports fluctuation records of the identity confidence curve from the identity verification module. The data is assembled into a structured document using a template engine. The document includes an abnormal behavior timeline, marking the start time, duration, and triggering module of each abnormal event; a list of associated collaborating candidates, listing potential cheating group members identified in the spatiotemporal graph analysis and their association strength; a knowledge graph contradiction summary, showcasing the details of logical conflicts in the mapping between behavior and medical knowledge points; and counterfactual reasoning results, presenting a comparison between the normal behavior simulated by the digital twin and the actual records. The report is visualized in various forms, including timelines, correlation diagrams, and data tables, supporting assessment administrators in quickly locating risk points.
[0071] The module balances the contributions of direct and indirect evidence through probabilistic fusion and rule-based reasoning. Weighted averaging ensures the linear superposition of quantitative indicators, while Markov logic networks handle non-linear logical relationships. This hybrid approach effectively overcomes the limitations of a single algorithm. The structured report not only provides decision-making conclusions but also offers explanatory support for controversial outcomes through multi-dimensional evidence tracing, meeting the high requirements of fairness and transparency in medical assessments.
[0072] Specifically, in the multimodal behavior perception-based online medical assessment management system of the present invention, the transmission of the correction vector output by the parallel review module to the fraud detection module, the behavior analysis module, and the twin modeling module includes the following: The parallel review module transmits the generated correction vector to the fraud detection module, which uses the logical contradiction coefficient in the correction vector to adjust the weight parameters of different types of risks in the incomplete information game model. The parallel review module transmits the correction vector to the behavior analysis module, which uses the causal effect strength in the correction vector to adjust the boundary threshold for judging abnormal behavior segments in the isolation forest algorithm. The parallel review module transmits the correction vector to the twin modeling module, which uses the logical contradiction coefficient in the correction vector as a regularization constraint in the training of the gated recurrent unit network model to dynamically update the baseline parameters of individual behavior. The directed transmission of correction vectors to the fraud detection module, behavior analysis module, and twin modeling module, together with the resulting parameter adjustments, forms a system-level adaptive optimization loop.
[0073] The correction vectors generated by the parallel review module are transmitted to the three core processing modules of the system via a secure data transmission channel, forming a closed-loop optimization mechanism. The correction vectors are encapsulated in an encrypted data packet format, including a timestamp, vector version identifier, and integrity check code, and asynchronous communication between modules is achieved through a message queue middleware. The transmission process follows the principle of least privilege; each module can only access the part of the vector relevant to its own function. For example, the fraud detection module can only read the logical contradiction coefficient, and the behavior analysis module can only obtain the causal effect strength. This design ensures data security and module independence.
[0074] After receiving the correction vector, the fraud detection module extracts the logical contradiction coefficient to optimize the incomplete information game model. The logical contradiction coefficient reflects the degree of deviation between the candidate's behavior and the logic of medical knowledge. The module maps this coefficient to the weight parameters of different types of risks in the model. For example, for knowledge-logic contradiction risks, the weight adjustment factor is positively correlated with the logical contradiction coefficient, and the weight matrix is dynamically updated using a gradient descent algorithm. A momentum term is introduced during the adjustment process to prevent oscillations and ensure a smooth transition in weight changes. The model also records historical adjustment trajectories. When the logical contradiction coefficient indicates a high false alarm rate for a specific risk type over several consecutive periods, the sensitivity weight for that type of risk is automatically reduced, achieving parameter adaptation based on evidence accumulation.
[0075] The behavior analysis module optimizes the decision boundary of the isolation forest algorithm using the causal effect strength in the correction vector. The causal effect strength quantifies the actual impact of abnormal behavior on the accuracy of answering questions, and the module converts it into an adjustment of the decision threshold. A linear mapping is established between the causal effect strength and the threshold offset, so that the decision boundary for abnormal segments is tightened as the causal effect strength increases. The threshold adjustment employs a sliding window mechanism, calculating the threshold trend from recent anomaly detection results to avoid over-adjustment caused by single fluctuations. The algorithm also monitors the balance between precision and recall of the adjusted model to ensure that detection performance remains within the optimal range.
[0076] The twin modeling module uses the logical contradiction coefficient in the correction vector as a regularization constraint for training the gated recurrent unit network. During backpropagation, the logical contradiction coefficient is combined with the loss function to form a new optimization objective that minimizes prediction error while controlling model complexity. The regularization coefficient is dynamically adjusted based on the logical contradiction coefficient; higher coefficients strengthen the constraint to prevent overfitting to anomalous behavior patterns. Early stopping is employed during network training: training terminates when the validation set loss no longer decreases, preserving the model parameters with the best generalization ability. Updated baseline parameters for individual behavior are perturbed with noise using differential privacy techniques, balancing model optimization against privacy protection.
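The linear mapping with a sliding window can be sketched as below. It assumes that segments whose anomaly score exceeds the threshold are flagged, so "tightening" means lowering the threshold; the base threshold, gain, window length, and clamping bounds are all illustrative values, and the class name is hypothetical.

```python
from collections import deque

class ThresholdAdjuster:
    """Maps recent causal effect strengths to an anomaly score threshold."""

    def __init__(self, base=0.60, gain=0.10, window=5, lower=0.45, upper=0.75):
        self.base, self.gain = base, gain
        self.lower, self.upper = lower, upper
        self.recent = deque(maxlen=window)  # sliding window of recent strengths

    def update(self, effect_strength):
        self.recent.append(effect_strength)
        # Averaging over the window damps single fluctuations.
        trend = sum(self.recent) / len(self.recent)
        # Linear mapping: stronger effect -> lower (tighter) threshold.
        threshold = self.base - self.gain * trend
        return max(self.lower, min(self.upper, threshold))

adjuster = ThresholdAdjuster()
t1 = adjuster.update(0.0)   # no measured effect: threshold stays at the base
t2 = adjuster.update(1.0)   # window mean is 0.5: threshold drops to 0.55
```

Clamping keeps the threshold inside a fixed band, which is one simple way to stop a run of extreme values from pushing detection sensitivity to an unusable level.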
[0077] The directional transfer and parameter adjustment of the correction vector among the three modules form a system-level adaptive optimization loop. After each assessment, the system evaluates the impact of the parameter adjustments on detection accuracy and updates the transfer strategy through reinforcement learning. The loop optimization process is recorded in the system log, forming a complete debugging trajectory and providing data support for subsequent algorithm improvements. This closed-loop design enables the system to continuously evolve, gradually improving its ability to identify complex cheating patterns while minimizing interference with normal examinations.
[0078] The twin modeling module uses a gated recurrent unit (GRU) network for time series prediction. The GRU network controls the update of the hidden state through an update gate and a reset gate. Given the input feature vector $x_t$ at time step $t$ and the hidden state $h_{t-1}$ of the previous time step, the network computes the update gate $z_t$, the reset gate $r_t$, and the candidate hidden state $\tilde{h}_t$, and finally outputs the current hidden state $h_t$. The formulas, written in the standard GRU form, are as follows:
[0079] $$z_t = \sigma\left(W_z [h_{t-1}, x_t] + b_z\right), \qquad r_t = \sigma\left(W_r [h_{t-1}, x_t] + b_r\right)$$
[0080] $$\tilde{h}_t = \tanh\left(W_h [r_t \odot h_{t-1}, x_t] + b_h\right)$$
[0081] $$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
[0082] where $\sigma$ is the sigmoid activation function (a non-linear function that maps the input to the interval [0, 1]); $\tanh$ is the hyperbolic tangent activation function (a non-linear function that maps the input to the interval [-1, 1]); $\odot$ denotes element-wise multiplication (multiplying corresponding elements of two vectors or matrices); $[\cdot, \cdot]$ denotes vector concatenation (joining two vectors in order into a longer vector); $W_z$ is the weight matrix of the update gate (its dimension is hidden layer dimension × (hidden layer dimension + input feature dimension)); $W_r$ is the weight matrix of the reset gate (same dimensions as $W_z$); $W_h$ is the weight matrix of the candidate hidden state (same dimensions as $W_z$); $b_z$ is the bias vector of the update gate (its dimension equals the hidden layer dimension); $b_r$ is the bias vector of the reset gate (same dimension as $b_z$); $b_h$ is the bias vector of the candidate hidden state (same dimension as $b_z$); and $h_t$ is the hidden state at time step $t$, used as the predicted behavioral feature vector for time step $t$. During network training, the predicted value $h_t$ is compared with the true value $y_t$, the loss is computed with the mean squared error as the loss function, and gradients are obtained by backpropagation: $$L = \frac{1}{T} \sum_{t=1}^{T} \lVert h_t - y_t \rVert_2^2$$
[0083] where $L$ is the mean squared error loss value (computed from the squared differences between the predicted and actual values); $T$ is the time series length, taken as a continuous 10-minute behavioral feature sequence during the assessment, that is, 600 seconds with one data point per second, so $T = 600$; $h_t$ is the predicted behavioral feature vector output by the gated recurrent unit network at time step $t$; $y_t$ is the actual behavioral feature vector provided by the behavior analysis module; and $\lVert \cdot \rVert_2$ is the L2 norm (the square root of the sum of the squares of the elements of a vector). In the data processing path, the multimodal feature sequences are first normalized and then input into the gated recurrent unit network one time window at a time. After the predicted sequence is output, the loss is calculated by comparing it with the true sequence, and the network parameters are updated using gradient descent.
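A single GRU time step and the mean squared error loss can be sketched in plain Python. The sketch assumes the common convention in which the update gate interpolates as `(1 - z) * h_prev + z * h_candidate` and in which each weight matrix acts on the concatenation of the previous hidden state and the input; the parameter dictionary layout and function names are hypothetical.

```python
import math

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def affine(W, x, b):
    # Computes W @ x + b for a list-of-rows matrix W.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def gru_step(x_t, h_prev, p):
    """One GRU step; each W has shape d_h x (d_h + d_x)."""
    concat = h_prev + x_t                                  # [h_{t-1}, x_t]
    z = sigmoid(affine(p["W_z"], concat, p["b_z"]))        # update gate
    r = sigmoid(affine(p["W_r"], concat, p["b_r"]))        # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t   # [r * h_{t-1}, x_t]
    h_cand = [math.tanh(a) for a in affine(p["W_h"], gated, p["b_h"])]
    # Assumed convention: z weights the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_cand)]

def mse_loss(preds, targets):
    # Mean over time steps of the squared L2 distance.
    total = sum(sum((a - y) ** 2 for a, y in zip(pv, yv))
                for pv, yv in zip(preds, targets))
    return total / len(preds)

d_h, d_x = 2, 1
zeros = [[0.0] * (d_h + d_x) for _ in range(d_h)]
params = {"W_z": zeros, "W_r": zeros, "W_h": zeros,
          "b_z": [0.0] * d_h, "b_r": [0.0] * d_h, "b_h": [0.0] * d_h}
h1 = gru_step([0.3], [1.0, -2.0], params)   # all-zero parameters: gates are 0.5
```

With all-zero parameters both gates equal 0.5 and the candidate state is zero, so the step simply halves the previous hidden state, which makes the arithmetic easy to check by hand.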
[0084] The behavior analysis module employs a multi-head attention mechanism to fuse multimodal features. For the input high-dimensional feature matrix $X \in \mathbb{R}^{T \times d}$ (where $d$ is the feature dimension and $T$ is the number of time steps), the multi-head attention mechanism projects the query matrix $Q$, key matrix $K$, and value matrix $V$ onto multiple subspaces, and attention weights are calculated for each head. The output of the $i$-th head is: $$\mathrm{head}_i = \mathrm{Attention}\left(Q W_i^Q,\; K W_i^K,\; V W_i^V\right)$$
[0085] where $W_i^Q$ is the query projection matrix of the $i$-th attention head (dimension $d \times d_k$), $W_i^K$ is the key projection matrix of the $i$-th attention head (dimension $d \times d_k$), $W_i^V$ is the value projection matrix of the $i$-th attention head (dimension $d \times d_k$), and $\mathrm{Attention}$ is the attention function, calculated as follows: $$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$
[0086] where $d_k$ is the dimension of the key vectors (the number of elements in each key vector, usually taken as $d / h$, with $h$ the number of heads), and the softmax function is applied along the row direction (normalizing the elements of each row to a probability distribution). The multi-head outputs are concatenated and then subjected to a linear transformation to obtain a weighted feature vector. Cosine similarity is calculated to compare the real-time feature vector $a$ with the predicted feature vector $b$: $$\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$
[0087] where $a \cdot b$ is the dot product of vectors $a$ and $b$ (the sum of the products of corresponding elements), and $\lVert a \rVert$ and $\lVert b \rVert$ are the L2 norms of vectors $a$ and $b$, respectively. The similarity sequence is input into the isolation forest model, which randomly selects features to partition the data points. Anomaly scores are calculated based on path length; the shorter the path length, the higher the probability of an anomaly.
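The attention function and the cosine similarity can be illustrated with a single-head, pure-Python sketch. The per-head projection matrices, the concatenation of heads, and the isolation forest stage are left out, and the function names are hypothetical.

```python
import math

def softmax(row):
    m = max(row)                        # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention; Q, K, V are lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)       # one probability distribution per query row
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# A zero query attends equally to both keys, so the output is the mean of V.
fused = attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 4.0]])
sim = cosine_similarity([1.0, 2.0], [2.0, 4.0])
```

A similarity close to 1 means the real-time behavior matches the prediction; a sequence of such similarities is what the text feeds into the isolation forest.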
[0088] The identity verification module uses a Kalman filter to fuse the score sequences. The Kalman filter consists of two steps: prediction and update. The state vector $x_t$ represents the identity confidence at time $t$ (it contains the identity confidence score and the rate of change of the confidence score, with a dimension of 2). The observation vector $z_t$ represents the matching scores of the lightweight facial features and the keyboard dynamics features (dimension 2, containing the facial feature matching score and the keyboard feature matching score).
[0089] The prediction step is:
[0090] $$\hat{x}_{t|t-1} = F \hat{x}_{t-1|t-1} + B u_t, \qquad P_{t|t-1} = F P_{t-1|t-1} F^{\top} + Q$$
[0091] where $F$ is the state transition matrix (describing how the state evolves over time, with value [[1, 0.1], [0, 1]], assuming the rate of change of the confidence is constant); $u_t$ is the control input (noise, dimension 1); $B$ is the control matrix (which maps the control input to the state space, with value [[0], [0.1]]); $P$ is the error covariance matrix (which describes the uncertainty of the state estimate and has a dimension of 2×2); and $Q$ is the process noise covariance (describing the intensity of process noise, with value [[0.01, 0], [0, 0.001]]).
[0092] The update steps are as follows:
[0093] $$K_t = P_{t|t-1} H^{\top} \left(H P_{t|t-1} H^{\top} + R\right)^{-1}$$
[0094] $$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t \left(z_t - H \hat{x}_{t|t-1}\right), \qquad P_{t|t} = (I - K_t H) P_{t|t-1}$$
[0095] where $H$ is the observation matrix (mapping the state vector to the observation space, with value [[1, 0], [0, 1]]), $R$ is the observation noise covariance (describing the intensity of the observation noise, with value [[0.02, 0], [0, 0.02]]), and $K_t$ is the Kalman gain (balancing the weights of predictions and observations, with a dimension of 2×2). The filter output combines the identity confidence score and its variance, which are used to dynamically adjust the threshold.
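One predict-and-update cycle with the matrix values given in the text can be sketched in plain Python. The 2×2 helper functions, the initial state, the initial covariance, and the sample observation are assumptions for illustration; the standard Kalman equations are used throughout.

```python
F = [[1.0, 0.1], [0.0, 1.0]]       # state transition (from the description)
B = [[0.0], [0.1]]                 # control matrix
Q = [[0.01, 0.0], [0.0, 0.001]]    # process noise covariance
H = [[1.0, 0.0], [0.0, 1.0]]       # observation matrix
R = [[0.02, 0.0], [0.0, 0.02]]     # observation noise covariance
I2 = [[1.0, 0.0], [0.0, 1.0]]

def matmul(A, M):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*M)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, M):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, M)]

def sub(A, M):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, M)]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def kalman_step(x, P, z, u=0.0):
    """x and z are 2x1 column vectors, P is 2x2, u is a scalar control input."""
    # Prediction step.
    x_pred = add(matmul(F, x), [[row[0] * u] for row in B])
    P_pred = add(matmul(matmul(F, P), transpose(F)), Q)
    # Update step.
    S = add(matmul(matmul(H, P_pred), transpose(H)), R)
    K = matmul(matmul(P_pred, transpose(H)), inv2(S))
    x_new = add(x_pred, matmul(K, sub(z, matmul(H, x_pred))))
    P_new = matmul(sub(I2, matmul(K, H)), P_pred)
    return x_new, P_new

# Hypothetical start: confidence 0.5, zero rate of change, unit covariance.
x1, P1 = kalman_step([[0.5], [0.0]], I2, z=[[0.9], [0.0]])
```

Starting from a large initial covariance, the filter trusts the first observation heavily, so the fused confidence moves most of the way from 0.5 toward 0.9 and the variance shrinks sharply.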
[0096] The fraud detection module uses a spatiotemporal graph convolutional network to analyze risk propagation. The graph convolution operation is based on the graph Laplacian matrix. For the graph signal $X$ (a matrix of node attributes containing timestamps and intensity values of risk events) and the convolution kernel $g_\theta$, graph convolution is defined as: $$g_\theta \star X = U\,g_\theta(\Lambda)\,U^{\top} X$$
[0097] where $U$ represents the eigenvector matrix of the graph Laplacian matrix (obtained from its eigenvalue decomposition), $\Lambda$ represents the eigenvalue diagonal matrix (a diagonal matrix composed of the eigenvalues of the graph Laplacian matrix), and $\star$ represents the graph convolution operation. To simplify computation, a Chebyshev polynomial approximation is used: $$g_\theta \star X \approx \sum_{k=0}^{K} \theta_k\,T_k(\tilde{L})\,X$$
[0098] where $K$ represents the order of the polynomial (with a value of 2), $\theta_k$ represents the polynomial coefficients (learnable parameters), $T_k$ represents the Chebyshev polynomials (orthogonal polynomials used to approximate the graph convolution kernel), and $\tilde{L}$ represents the scaled Laplacian matrix (with values...), in whose definition $I$ represents the identity matrix, $D$ represents the degree matrix, and $A$ represents the adjacency matrix. The spatiotemporal graph convolutional network combines temporal convolution to capture dynamic features and outputs risk clustering probabilities.
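For a polynomial order of 2, the Chebyshev approximation can be written out term by term using the standard Chebyshev recurrence. The symbols $\theta_k$ (coefficients), $\tilde{L}$ (scaled Laplacian matrix) and $X$ (graph signal) are notation assumed for this worked expansion:

```latex
\begin{aligned}
T_0(\tilde{L}) &= I, \qquad T_1(\tilde{L}) = \tilde{L}, \qquad
T_k(\tilde{L}) = 2\tilde{L}\,T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L}) \\
T_2(\tilde{L}) &= 2\tilde{L}^{2} - I \\
\sum_{k=0}^{2} \theta_k\,T_k(\tilde{L})\,X
  &= \theta_0 X + \theta_1 \tilde{L} X + \theta_2\left(2\tilde{L}^{2} - I\right) X
\end{aligned}
```

Each node therefore aggregates information from at most its 2-hop neighborhood, and no eigenvalue decomposition is needed at inference time.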
[0099] The dynamic evaluation module uses a weighted average formula to calculate the comprehensive risk score. Given a preliminary risk score $S_0$ (range [0, 1], derived from spatiotemporal correlation analysis of abnormal behavior segments and low-confidence intervals of identity) and the correction vector components, namely the logical contradiction coefficient $C$ (range [0, 1], reflecting the degree of deviation between behavior and medical knowledge logic) and the causal effect strength $E$ (range [-1, 1], quantifying the impact of abnormal behavior on answer accuracy), the weighted average is: $$S = \alpha S_0 + \beta C + \gamma E$$
[0100] where $\alpha$ represents the weight of the preliminary risk score (with a value of 0.6, set based on a historical validation accuracy of 85%), $\beta$ represents the weight of the logical contradiction coefficient (with a value of 0.3, set based on a historical validation accuracy of 90%), and $\gamma$ represents the weight of the causal effect strength (with a value of 0.1, set based on a historical validation accuracy of 70%), satisfying $\alpha + \beta + \gamma = 1$. The Markov logic network uses soft rules for probabilistic reasoning, with rules of the form $(w_i, F_i)$, where $w_i$ represents the weight of the $i$-th soft rule (a learnable parameter) and $F_i$ represents a logical formula (e.g., "If the logical contradiction coefficient is high and the causal effect is strong, then the probability of fraud is high"). The probability distribution is defined by the potential function: $$P(X = x) = \frac{1}{Z}\exp\left(\sum_i w_i\,n_i(x)\right)$$
[0101] where $P(X = x)$ represents the probability distribution of the Markov logic network (the probability of each possible world state $x$), $Z$ represents the partition function (a normalization constant that ensures the probabilities of all states sum to 1), $\exp$ represents the natural exponential function (exponentiation with base $e$), $\sum_i$ represents the summation over all soft rules, $w_i$ represents the weight of the $i$-th soft rule, and $n_i(x)$ represents the number of atomic formulas of the $i$-th soft rule satisfied in state $x$ (the number of conditions under which the logical formula is true). Inference outputs the overall probability of fraud through maximum a posteriori estimation or sampling methods.
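The weighted average and the log-linear form of the Markov logic network distribution can be illustrated with a short sketch. The function names are hypothetical, and the explicit enumeration of world states is a simplification that only works for a handful of states; it stands in for the MAP or sampling inference mentioned above.

```python
import math

# Weights stated in the description: preliminary risk, logical contradiction, causal effect.
W_RISK, W_LOGIC, W_CAUSAL = 0.6, 0.3, 0.1


def comprehensive_risk(preliminary_risk, logic_contradiction, causal_effect):
    """Weighted average of the preliminary score and the two correction components."""
    if not 0.0 <= preliminary_risk <= 1.0:
        raise ValueError("preliminary_risk must be in [0, 1]")
    if not 0.0 <= logic_contradiction <= 1.0:
        raise ValueError("logic_contradiction must be in [0, 1]")
    if not -1.0 <= causal_effect <= 1.0:
        raise ValueError("causal_effect must be in [-1, 1]")
    return (W_RISK * preliminary_risk
            + W_LOGIC * logic_contradiction
            + W_CAUSAL * causal_effect)


def mln_distribution(rule_weights, counts_per_world):
    """P(x) = exp(sum_i w_i * n_i(x)) / Z over an explicitly enumerated set of worlds.

    counts_per_world[x][i] is n_i(x), the satisfied-formula count of rule i in world x.
    """
    potentials = [math.exp(sum(w * n for w, n in zip(rule_weights, counts)))
                  for counts in counts_per_world]
    z = sum(potentials)  # partition function
    return [p / z for p in potentials]


score = comprehensive_risk(0.5, 1.0, 0.2)  # 0.6*0.5 + 0.3*1.0 + 0.1*0.2
probabilities = mln_distribution([1.5, 0.5], [[0, 0], [1, 0], [1, 1]])
```

Worlds that satisfy more heavily weighted rules receive more probability mass, which is how contradictory evidence is softened rather than treated as a hard constraint.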
[0102] In this invention, all input data undergoes standardization during computation, such as Z-score standardization, to ensure a feature mean of 0 and a variance of 1. Time-series data is segmented using a sliding window, with the window length and step size set according to the application scenario. Model parameters are optimized using gradient descent, and the learning rate is adjusted using an adaptive algorithm. This invention achieves a balance between real-time performance and accuracy in its data processing path, with modules communicating data via message queues to avoid blocking.
[0103] The multimodal behavior perception-based online medical examination management system achieves comprehensive monitoring and fraud prevention in remote examination environments through the collaborative operation of multiple modules. The system is applied to scenarios where regional health management agencies organize large-scale online medical examinations. Addressing the characteristics of candidates being dispersed across different medical institutions and participating in the examinations via the internet, the system employs multimodal behavior perception technology to collect and analyze candidate behavior data in real time. Upon system startup, the data acquisition module uses the candidate's terminal camera to capture facial video streams at a fixed frame rate, and the microphone to collect ambient audio streams at a specific sampling rate. Simultaneously, it listens for underlying system events to obtain keystrokes, mouse movements, and browser operation sequences. The collected multimodal data is timestamped at the microsecond level, compressed, and streamed to the backend processing platform, providing a high-quality synchronous data stream for subsequent analysis.
[0104] The twin modeling module receives synchronous multimodal raw data streams from the data acquisition module and combines them with candidates' historical training data to construct personalized behavior models. The module extracts typical behavioral feature sequences from historical data, trains a gated recurrent unit network to learn individual behavioral patterns, and forms an initial digital twin. During the examination, the module processes incoming video, audio, and operational data in real time, obtains coordinated multimodal feature sequences through time alignment, and inputs them into the prediction model to output a short-term behavioral baseline. When behavioral deviations are detected, the module updates model parameters through local weight adjustments and a federated learning mechanism to dynamically optimize the behavioral baseline while protecting candidate data privacy.
[0105] The behavior analysis module processes, in parallel, the real-time data streams and the dynamic baselines provided by the twin modeling module. The module extracts head pose, eye state, and gaze direction features from the video stream, calculates acoustic feature parameters from the audio stream, and computes interaction behavior metrics from the operation events. The features are fused into a unified context vector using a multi-head attention mechanism and then compared with the predicted baseline through a similarity calculation. The similarity sequence is then segmented by a sliding window and input into an isolation forest model to identify anomalous segments that deviate from the normal pattern. For example, in a clinical case analysis question, if the system detects that a candidate's gaze stays away from the case data area for a prolonged period while the candidate answers quickly, the segment is marked as abnormal behavior.
[0106] The identity verification module implements a silent continuous authentication mechanism. Initial identity verification is completed at the start of the exam via liveness detection. During the assessment, facial features and keyboard dynamics data are captured periodically. Lightweight facial feature descriptors and keyboard input patterns are compared with registered templates, and the resulting score sequences are fused using Kalman filtering to generate an identity confidence curve. The system monitors network status and lighting conditions and adaptively adjusts the authentication threshold. An alert is triggered when the identity confidence level consistently falls below the threshold.
[0107] The fraud detection module integrates abnormal behavior segments and low-confidence intervals of identity to construct a spatiotemporal graph model that analyzes risk propagation within the examination room. The module establishes a spatial correlation graph based on the geographical distribution of examinee IP addresses and answer time series, injecting risk events into the graph as node attributes. A spatiotemporal graph convolutional network identifies risk clustering patterns and outputs a list of potential collaborative cheating groups. An incomplete information game model integrates individual risk and group correlation to calculate the initial probability of cheating, providing a basis for subsequent decision-making.
[0108] The parallel review module initiates in-depth analysis for high-risk events. The module retrieves the candidate's complete operation sequence, performs semantic mapping against the medical knowledge graph, and analyzes the consistency between behavior and medical logic. It simulates normal answer trajectories through counterfactual reasoning to quantify the causal effects of abnormal behavior. The generated correction vector dynamically adjusts system detection parameters, for example by optimizing the game model weights and fine-tuning the anomaly judgment thresholds, forming a closed-loop optimization mechanism. For example, in a diagnostics assessment, if the system detects that a candidate skips key differential diagnosis steps and answers directly, this is marked as a logical contradiction.
[0109] The dynamic evaluation module integrates multi-source evidence to generate a final report. The module employs a weighted average algorithm to combine initial risk scores and correction vectors, and uses a Markov logic network to process contradictory evidence, outputting a comprehensive probability of fraud. The structured report details abnormal time points, associated candidates, knowledge contradictions, and counterfactual analysis results, supporting precise decision-making by examination staff. The system continuously learns to gradually improve detection accuracy, minimizing interference with the normal examination while ensuring fairness.
[0110] Each module achieves data exchange and functional collaboration through standardized interfaces, and the directional transmission of correction vectors forms a system-level adaptive loop. The system adopts a distributed architecture deployment, supporting large-scale concurrent examination scenarios, while ensuring data security through encrypted communication and differential privacy technology. No additional hardware is required during implementation; comprehensive monitoring can be achieved using existing terminal sensors, providing reliable technical support for online medical assessments.
[0111] In a specific implementation, the parameters of each sensor in the data acquisition module and the synchronization mechanism are as follows. The candidate's terminal camera captures a facial video stream at 25 frames per second; this frame rate was determined through pre-experimentation to balance the accuracy of facial feature point detection against computational resource consumption. The microphone acquires environmental audio waveforms at a sampling rate of 16kHz, which meets the requirement of voice activity detection covering the speech frequency range of 300Hz to 3400Hz. The system's underlying event listener captures keyboard keystrokes, mouse coordinates and click events, and browser tab activation status changes by registering Windows system API hooks; the event capture latency was tested to be less than 1 millisecond. The hardware synchronization pulse signal is a square wave with a period of 1 millisecond. It is sent by the system main process to the camera driver, microphone driver, and input device driver when acquisition starts. After receiving the signal, the driver layer aligns its internal clock counter so that the deviation between the video frame timestamp, audio sampling timestamp, and operation event timestamp is kept within ±0.5 microseconds. The video data is processed using the MTCNN model for face detection. The input image size is 640×480 pixels, and the output consists of the coordinates of 106 facial key points. After the facial region is located, it is cropped to a 224×224 pixel region of interest. The compression ratio is the cropped area divided by the original area: for an original size of 640×480 pixels and a cropped size of 224×224 pixels, this is 50,176 / 307,200, or approximately 16.3% of the original size. The audio data is processed using a WebRTC-based voice activity detection algorithm. The energy threshold is set to 0.02, where the energy is the mean square of the normalized audio signal. 
Continuous segments with energy exceeding this threshold are identified as human voice segments, while environmental noise segments with energy below the threshold, such as keyboard typing and paper rustling, are filtered out. Operation event data is aggregated into a JSON-formatted event sequence in ascending order of timestamps. Each event includes three fields: timestamp, event type, and event parameters. Event type values include keystroke, mouse movement, mouse click, tab activation, and tab deactivation. Event parameters depend on the event type: the keycode value for keystroke events, the coordinate value for mouse movement events, the coordinates and button type for click events, and the tab ID for tab events. The encapsulated data packets use the MP4 container format. The header includes a version number (currently V1.0), a timestamp checksum, and the data length. The timestamp checksum is a CRC32 checksum computed over all timestamps within the data packet. The data length field records the total number of bytes in the data packet. The data packets are transmitted over the QUIC protocol, a UDP-based low-latency network protocol, with forward error correction enabled. The error correction code uses Reed-Solomon encoding at a 1:2 ratio. During streaming transmission, a retransmission request is made every 10 data packets to ensure data integrity. Finally, the data is persisted to the `exam_raw_data` topic in the backend Kafka message queue, with the candidate ID as the partition key to ensure that data for the same candidate is stored sequentially.
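The event ordering and timestamp checksum described above can be sketched as follows. The JSON field names (`timestamp`, `type`, `params`), the 10-byte header layout, and the packing of timestamps as 64-bit integers are assumptions made for illustration; the sketch does not attempt to produce an MP4 container.

```python
import json
import struct
import zlib

# Hypothetical header layout: major version, minor version, CRC32, total packet length.
HEADER = struct.Struct(">BBII")


def order_events(events):
    """Return the events sorted in ascending order of timestamp."""
    return sorted(events, key=lambda e: e["timestamp"])


def timestamp_checksum(events):
    """CRC32 over all timestamps, each packed as a big-endian unsigned 64-bit integer."""
    payload = b"".join(struct.pack(">Q", e["timestamp"]) for e in events)
    return zlib.crc32(payload) & 0xFFFFFFFF


def encode_packet(events):
    """Serialize the ordered events behind a header with version 1.0, checksum and length."""
    ordered = order_events(events)
    body = json.dumps(ordered, separators=(",", ":")).encode("utf-8")
    total_length = HEADER.size + len(body)
    return HEADER.pack(1, 0, timestamp_checksum(ordered), total_length) + body


sample_events = [
    {"timestamp": 1_700_000_000_250, "type": "mouse_click", "params": {"x": 10, "y": 20, "button": "left"}},
    {"timestamp": 1_700_000_000_100, "type": "keystroke", "params": {"keycode": 65}},
]
packet = encode_packet(sample_events)
```

A receiver can unpack the header with `HEADER.unpack(packet[:HEADER.size])`, recompute the checksum from the decoded events, and reject the packet when the two values differ.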
[0112] The twin modeling module extracts multimodal feature sequences from historical training data, including facial expression change features from facial video streams, speech rhythm features from environmental audio streams, and behavioral interval features from operation event sequences. Facial expression change features are extracted using FACS encoding to obtain the intensity values of 48 facial action units, with a sampling interval of 1 second. Speech rhythm features calculate the speech rate of each sentence, i.e., the number of words spoken per minute, with a sampling interval at the end of each sentence. Behavioral interval features calculate the average time interval between two consecutive keystrokes and the average time interval between two consecutive mouse clicks, with a sampling interval of 10 seconds. The gated recurrent unit network has an input layer dimension of 128, obtained by mapping 48 dimensions of facial expression change features, 1 dimension of speech rhythm features, and 2 dimensions of behavioral interval features through a fully connected layer; the hidden layer dimension is 64; and the output layer dimension is 64, i.e., the behavioral feature vector dimension. The training process uses the Adam optimizer with an initial learning rate of 0.001, a learning rate decay strategy of decreasing to 0.9 times the original value every 10 epochs, a batch size of 32, 50 training epochs, and mean squared error as the loss function.
[0113] $$L_{\mathrm{MSE}} = \frac{1}{T}\sum_{t=1}^{T}\left\| \hat{y}_t - y_t \right\|_2^2$$
[0114] where $L_{\mathrm{MSE}}$ represents the mean squared error loss function value; $T$ represents the length of the time series, taken as the length of a continuous 10-minute behavioral feature sequence during the assessment process, which is 600 seconds (1 point per second); $\hat{y}_t$ represents the output of the gated recurrent unit network at time step $t$, i.e., the predicted behavior feature vector; $y_t$ represents the actual behavioral feature vector provided by the behavior analysis module; and $\|\cdot\|_2$ represents the L2 norm. During the assessment, the time alignment process uses a dynamic time warping algorithm with an alignment window size of 5 seconds and a step size of 1 second. The time resolution of the aligned multimodal feature sequences is uniformly 1 second per point. The local weight adjustment only updates the hidden layer weights of the gated recurrent unit network. After each backpropagation, the hidden layer weights are adjusted by 10% of the current value (i.e., the update ratio is 0.1) to avoid overfitting to abnormal behavior in a single assessment. The weight update is encrypted using the Paillier homomorphic encryption algorithm with a key length of 2048 bits. Before encryption, the weight update is normalized, mapping the weight values to the range of -1 to 1. After the encrypted updates are uploaded to the central server, the server calculates the global average update using a federated averaging algorithm. The averaging weights are the proportions of data volume for each terminal, where the data volume is the length of the multimodal feature sequences collected by the terminal. The global average update is distributed to each terminal via the secure transmission protocol TLS 1.3. After decryption, the terminal adjusts its local model weights by the update ratio of 0.1. Differential privacy technology is used during model updates to prevent individual data from being reverse-engineered. The scale parameter of the added Laplace noise is set to 0.01, calculated from a privacy budget ε equal to 1: the noise scale b equals Δf divided by ε, where Δf is the maximum change in the weight update, which was tested to be 0.01.
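The data-volume-weighted averaging and the Laplace noise scale can be sketched as below. Paillier encryption is deliberately omitted, the function names are hypothetical, and reading "adjusts its local model weights by the update ratio of 0.1" as adding 0.1 times the global update is an assumption.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Noise scale b = delta_f / epsilon."""
    return sensitivity / epsilon


def sample_laplace(scale, rng):
    """Laplace(0, scale) sample: the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def federated_average(updates, data_volumes):
    """Average the per-terminal updates, weighted by each terminal's share of data volume."""
    total = sum(data_volumes)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, data_volumes)) / total
            for i in range(dim)]


def apply_update(local_weights, global_update, ratio=0.1):
    """Assumed interpretation: move local weights by ratio * global update."""
    return [w + ratio * g for w, g in zip(local_weights, global_update)]


rng = random.Random(0)
b = laplace_scale(0.01, 1.0)  # delta_f = 0.01, epsilon = 1
noisy_updates = [[u + sample_laplace(b, rng) for u in update]
                 for update in [[0.2, -0.1], [0.4, 0.1]]]
global_update = federated_average(noisy_updates, data_volumes=[600, 200])
new_weights = apply_update([0.5, 0.5], global_update)
```

A terminal that contributed 600 seconds of features influences the global update three times as much as one that contributed 200 seconds.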
[0115] The behavior analysis module employs the OpenPose algorithm in the video processing channel, using the BODY_25 model. The input image size is 224×224 pixels, and the output is the coordinates of 25 body key points. Head posture is calculated using the 3D coordinates of the nose tip, left eye corner, and right eye corner to determine yaw and pitch angles, with a calculation error of less than 2 degrees. The eye aspect ratio algorithm takes a 60×20 pixel image as input and calculates the aspect ratio of 6 eye key points, with a threshold of 0.2; values below this value are considered as closed eyes. The gaze estimation algorithm uses the Tobii gaze estimation model, taking eye and facial images as input and outputting a gaze direction vector with a dimension of 3, representing the direction of the gaze in the screen coordinate system. The audio processing channel uses a Mel filter bank of 40 filters, ranging from 0 to 8000Hz. Mel frequency cepstral coefficients are extracted, and the first 13 coefficients are used. The fundamental frequency profile is calculated using an autocorrelation function with a window size of 25ms and an overlap of 10ms, covering the main frequency range of adult speech from 80Hz to 300Hz. The operation behavior analysis channel statistically analyzes the average keystroke interval time, with a sampling interval of 10 keystrokes; the average mouse movement speed, with a sampling interval of 1 second, calculates the ratio of mouse movement distance to time within 1 second; and the percentile of page focus dwell time, taking the 50th percentile as the median. 
The input vector of the multi-head attention network has a dimension of 28, comprising head pose (3D), eye opening (1D), gaze direction (3D), Mel frequency cepstral coefficients (13D), fundamental frequency profile (1D), average keystroke interval time (1D), average mouse movement speed (1D), and percentile of page focus dwell time (1D), concatenated after Z-score normalization. There are 8 attention heads, each with a dimension of 4 (28 dimensions divided by 8 heads is 3.5, rounded up to 4); the projection dimensions of the query matrix, key matrix, and value matrix are all 4. The cosine similarity calculation incorporates a dynamic time warping algorithm. The alignment window size is 5 seconds, with a step size of 1 second. The temporal resolution of the aligned real-time context feature vector and the predicted feature vector is unified to 1 point per second. The cosine similarity ranges from -1 to 1, with a mean of 0.8, representing the average similarity under normal behavior, and a standard deviation of 0.1. The isolation forest model is pre-trained on historical normal exam behavior data (1000 examinees, 600 seconds per sequence). 100 isolation trees are constructed for each examinee, with a sampling size of 256 data points per tree. The anomaly score threshold is 0.6; scores higher than this value are considered abnormal. The sliding window length is 5 seconds, covering 5 similarity data points, with a step size of 1 second. Windows whose average path length is lower than the overall mean minus two standard deviations are marked as abnormal behavior segments. The overall mean is 10 and the standard deviation is 2, so an average path length of less than 10 - 2 × 2 = 6 is considered abnormal.
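The sliding-window rule on average path length can be sketched as follows, assuming the per-second path lengths have already been produced by the isolation forest. The function name and the half-open `(start, end)` index convention are hypothetical.

```python
def flag_anomalous_windows(path_lengths, window=5, step=1, mean=10.0, std=2.0):
    """Return (start, end) index pairs of windows whose mean path length is below mean - 2*std."""
    threshold = mean - 2.0 * std  # 10 - 2 * 2 = 6 with the stated values
    flagged = []
    for start in range(0, len(path_lengths) - window + 1, step):
        segment = path_lengths[start:start + window]
        if sum(segment) / window < threshold:
            flagged.append((start, start + window))
    return flagged


# One value per second; the dip in the middle corresponds to an easily isolated segment.
lengths = [10, 11, 9, 10, 10, 4, 5, 5, 6, 4, 10, 10]
segments = flag_anomalous_windows(lengths)
```

Here only the window covering seconds 5 to 9 averages below 6 (its mean is 4.8), so `segments` contains the single pair `(5, 10)`; windows that average exactly 6 are not flagged.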
[0116] The identity verification module instructs candidates to perform the following specific action sequence at the start of the exam: blinking three times consecutively, each blink lasting less than 0.5 seconds with an interval of less than 1 second; slowly turning the head from left to right through an angle greater than 30 degrees, taking more than 2 seconds; and opening and closing the mouth once, with the mouth open for less than 1 second. High-quality frames are selected based on the following criteria: facial keypoint detection confidence greater than 0.9 (using the confidence score output by the MTCNN model); image sharpness score greater than 0.8 (using the Laplacian operator to calculate image variance, with a variance greater than 1000 indicating sharpness); and no occlusion, meaning the occluded area is less than 5% of the facial region. Liveness detection employs a 3D structured light-based algorithm on an Intel RealSense D435i device with a depth map resolution of 640×480 pixels and a depth measurement range of 0.3 meters to 3 meters. The continuity of the facial depth map is checked: the depth difference between adjacent pixels must be less than 0.1 meters, which excludes photo or video attacks. The 3D facial feature point cloud extraction uses the iterative closest point (ICP) algorithm with 50 iterations and a convergence threshold of 0.001 meters. The Hausdorff distance between point clouds is calculated, with a threshold of 0.05 meters; values less than this are considered successful matches. During the assessment, the frequency of periodic frame capture is 2 frames per second, or 1 frame every 500 milliseconds. The lightweight convolutional neural network uses the MobileNetV2 model, with an input image size of 112×112 pixels and an output of a 128-dimensional facial feature descriptor. 
Keyboard dynamics features include the standard deviation of keystroke interval time, sampled every 10 keystroke intervals; key pressure variation patterns, using the keyboard's built-in pressure sensor at a sampling frequency of 100Hz and extracting the mean and variance of the pressure values; and input rhythm regularity, calculated as the coefficient of variation of adjacent keystroke time intervals, i.e., the ratio of the standard deviation to the mean. Facial feature matching uses a cosine similarity algorithm with a threshold of 0.85; a match is considered successful if the value is greater than this. Keyboard feature matching uses a dynamic time warping algorithm with a window size of 5 keystrokes, a step size of 1 keystroke, and a distance threshold of 0.1. The state vector of the Kalman filter consists of the identity confidence and the confidence rate of change, with a dimension of 2; the observation vector consists of the facial feature matching score and the keyboard feature matching score, with a dimension of 2; the state transition matrix F is [[1, 0.1], [0, 1]], assuming the confidence rate of change is constant; the control matrix B is [[0], [0.1]], with noise as the control input; the process noise covariance Q is [[0.01, 0], [0, 0.001]]; the observation matrix H is [[1, 0], [0, 1]]; and the observation noise covariance R is [[0.02, 0], [0, 0.02]], calculated based on historical score fluctuations. The authentication quality factor is calculated based on network latency and illumination uniformity. When the network latency exceeds 100 milliseconds, the quality factor decreases by 0.1. Illumination uniformity is calculated from the coefficient of variation of the image grayscale values, with 0.7 as the uniformity threshold; when the illumination is judged non-uniform against this threshold, the quality factor decreases by 0.2. The initial value of the quality factor is 1.0, with a minimum of 0.5. 
The inputs to the fuzzy logic controller are the authentication quality factor (universe of discourse 0.5 to 1.0) and the variance estimated by the Kalman filter (universe of discourse 0 to 0.1). The output is the identity anomaly determination threshold (universe of discourse 0.6 to 0.9). The fuzzy rules are as follows: if the quality factor is high (greater than 0.8) and the variance is low (less than 0.05), the threshold is set to 0.9; if the quality factor is medium (0.6 to 0.8) and the variance is medium (0.05 to 0.1), the threshold is set to 0.75; if the quality factor is low (less than 0.6) and the variance is high (greater than 0.1), the threshold is set to 0.6. The real-time confidence score of the identity confidence curve is obtained by using the identity confidence level output by the Kalman filter, the dynamic judgment threshold is obtained by using the threshold output by the fuzzy logic controller, and the timestamp sequence is synchronized with the timestamp of the data acquisition module with millisecond precision.
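The quality factor and the three stated rules can be approximated with a crisp lookup. This is a simplified sketch, not a full fuzzy inference system with membership functions: the function names are hypothetical, the illumination penalty is assumed to apply when the uniformity value is below 0.7, and the fallback threshold of 0.75 for input combinations the three rules do not cover is an assumption.

```python
def quality_factor(latency_ms, illumination_uniformity):
    """Start at 1.0, subtract the stated penalties, and clamp to the stated minimum of 0.5."""
    q = 1.0
    if latency_ms > 100:
        q -= 0.1
    if illumination_uniformity < 0.7:  # assumed direction of the uniformity test
        q -= 0.2
    return max(q, 0.5)


def quality_level(q):
    if q > 0.8:
        return "high"
    if q >= 0.6:
        return "medium"
    return "low"


def variance_level(v):
    if v < 0.05:
        return "low"
    if v <= 0.1:
        return "medium"
    return "high"


# The three rules given in the description, keyed by (quality level, variance level).
RULES = {("high", "low"): 0.9, ("medium", "medium"): 0.75, ("low", "high"): 0.6}


def anomaly_threshold(q, variance, default=0.75):
    """Look up the identity anomaly threshold; `default` covers combinations with no rule."""
    return RULES.get((quality_level(q), variance_level(variance)), default)


threshold = anomaly_threshold(quality_factor(150, 0.9), 0.01)
```

In the example, a latency of 150 milliseconds lowers the quality factor to 0.9, which still counts as high, so with a low filter variance the threshold stays at 0.9.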
[0117] The fraud detection module employs an adaptive threshold algorithm based on the Otsu thresholding method applied to the historical confidence distribution. It calculates a histogram of the identity confidence curve with 10 bins and takes the threshold that maximizes the inter-class variance as the judgment threshold for low-confidence intervals. The historical confidence distribution is derived from identity confidence data from 100 exams. The length of the identity confidence sequence for each candidate in each exam is equal to the exam duration, with an average duration of 120 minutes. For calculating the intensity value of risk events, the intensity value of an anomaly segment is the minimum value of the similarity sequence within the segment, ranging from -1 to 1; the smaller the value, the stronger the intensity. The intensity value of a low-confidence interval is 1 minus the average identity confidence within the interval; when the average identity confidence is below 0.6, the intensity value is greater than 0.4. The metadata of the candidate information database includes the login IP geolocation, obtained as latitude and longitude coordinates by querying the GeoIP database with the IP address, with city-level precision; and the answer timestamp sequences, accurate to milliseconds, recording the start and submission times of each candidate's answers. Construction of spatial proximity edges: if the IPs belong to the same /24 subnet (subnet mask 255.255.255.0) or the same virtual examination room number, the edge weight is set to 1.0; if the IPs belong to the same /16 subnet (subnet mask 255.255.0.0) but different /24 subnets, the edge weight is set to 0.5; otherwise, there is no edge connection. Construction of behavioral similarity edges: the answer rhythm similarity adopts the dynamic time warping algorithm, with an alignment window size of 10 questions, a step size of 1 question, and a path cumulative distance threshold of 5 seconds. 
If the distance is less than 5 seconds, the rhythms are considered similar, and the rhythm similarity weight is set to 0.5. The answer option similarity adopts the Jaccard similarity coefficient, calculated as the ratio of the intersection to the union of the answer options of two candidates on the same question. The similarity coefficient threshold is set to 0.5; when the ratio is greater than or equal to this value, the content is considered similar, and the content similarity weight is set to 0.5. The weight of the behavioral similarity edge is the rhythm similarity weight multiplied by the rhythm similarity score plus the content similarity weight multiplied by the content similarity score, with each score ranging from 0 to 1. The spatiotemporal graph convolutional network uses a 2-layer Chebyshev polynomial approximation in its bottom graph convolutional layers, where $K$ equals 2; the scaled Laplacian matrix $\tilde{L}$ is defined as ..., where $I$ represents the identity matrix, $D$ represents the degree matrix, and $A$ represents the adjacency matrix. The temporal convolutional layer uses two one-dimensional convolutions with a kernel size of 3, a stride of 1, padding of 1, and the ReLU activation function. The attention mechanism uses self-attention, with an input dimension of 64 and an output dimension of 64, and the attention weights are calculated as $\mathrm{softmax}\left(QK^{\top} / \sqrt{d_k}\right)$, where $d_k$ is the dimension of the key vector and $Q$ and $K$ are linear projections of the input features. 
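The edge construction rules can be sketched with the standard library `ipaddress` module. The function names are hypothetical, IPv4 addresses are assumed, and the rhythm and content similarity scores are taken as precomputed inputs rather than derived with dynamic time warping.

```python
import ipaddress


def spatial_edge_weight(ip_a, ip_b, room_a=None, room_b=None):
    """1.0 for the same /24 subnet or exam room, 0.5 for the same /16 subnet, else None."""
    if room_a is not None and room_a == room_b:
        return 1.0
    addr_b = ipaddress.ip_address(ip_b)
    if addr_b in ipaddress.ip_network(f"{ip_a}/24", strict=False):
        return 1.0
    if addr_b in ipaddress.ip_network(f"{ip_a}/16", strict=False):
        return 0.5
    return None  # no edge


def jaccard(options_a, options_b):
    """Intersection over union of the options two candidates selected on the same question."""
    union = options_a | options_b
    if not union:
        return 0.0
    return len(options_a & options_b) / len(union)


def behavioral_edge_weight(rhythm_score, content_score, w_rhythm=0.5, w_content=0.5):
    """Weighted sum of the two similarity scores, each expected in [0, 1]."""
    return w_rhythm * rhythm_score + w_content * content_score


same_lan = spatial_edge_weight("10.1.2.3", "10.1.2.200")
same_site = spatial_edge_weight("10.1.2.3", "10.1.9.7")
unrelated = spatial_edge_weight("10.1.2.3", "10.2.0.1")
```

Passing `strict=False` lets `ip_network` accept a host address and mask it down to the enclosing subnet, which avoids hand-written bit arithmetic.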
The system strategy space of the incomplete information game model includes adjusting the monitoring intensity, with actions of strengthening, maintaining, and weakening, corresponding to adjusting the anomaly detection threshold of the behavior analysis module; triggering the review mechanism, with actions of triggering and not triggering, corresponding to calling the parallel review module; the candidate strategy space includes normal examination, with actions of normal answering and normal operation; cheating behavior, with actions of collaborative cheating, proxy examination, and using auxiliary tools; the information set is defined as the current risk event intensity, with values of low, medium, and high, and the collaborative cheating confidence, with values of low, medium, and high; in the payoff function, the immediate payoff for a high risk event intensity is -0.5, the system payoff is negative, and the immediate payoff for a high collaborative cheating confidence is -0.5. The benefit is -1.0; in the Bayesian update rule, the initial prior probabilities are 0.95 for the normal exam probability and 0.05 for the cheating probability; the Q-learning algorithm has a learning rate of 0.1, a discount factor of 0.9, and an initial exploration rate of 0.3, which decays to 0.9 times the original rate every 100 iterations; the state is defined as the current risk event intensity, discretized into 3 classes, and the collaborative cheating confidence, also discretized into 3 classes, for a total of 9 states; the action is defined as adjusting the monitoring intensity (3 types: increase, maintain, decrease) and triggering the review mechanism (2 types: trigger, do not trigger), for a total of 5 actions; the Q-value table is initialized to 0, and the optimal strategy for 9×5=45 state-action combinations is learned through iterative updates. 
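The tabular Q-learning setup described above (9 states, 5 actions, learning rate 0.1, discount factor 0.9, exploration rate decaying from 0.3) can be sketched as follows. The action labels and function names are hypothetical, and the reward is treated as an input because the payoff function is only partially specified in the text.

```python
import random

LEVELS = ("low", "medium", "high")
# State = (risk event intensity, collaborative cheating confidence): 3 x 3 = 9 states.
STATES = [(risk, conf) for risk in LEVELS for conf in LEVELS]
# 3 monitoring-intensity actions plus 2 review actions = 5 actions.
ACTIONS = ("strengthen", "maintain", "weaken", "trigger_review", "no_review")
ALPHA, GAMMA = 0.1, 0.9


def new_q_table():
    """Q-values initialized to 0 for all 45 state-action combinations."""
    return {s: {a: 0.0 for a in ACTIONS} for s in STATES}


def exploration_rate(iteration, initial=0.3, decay=0.9, every=100):
    """Exploration rate multiplied by 0.9 after every 100 iterations."""
    return initial * decay ** (iteration // every)


def choose_action(q_table, state, iteration, rng):
    """Epsilon-greedy action selection."""
    if rng.random() < exploration_rate(iteration):
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[state][a])


def q_update(q_table, state, action, reward, next_state):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[next_state].values())
    td_error = reward + GAMMA * best_next - q_table[state][action]
    q_table[state][action] += ALPHA * td_error
    return q_table[state][action]


q = new_q_table()
first_value = q_update(q, ("high", "high"), "maintain", -0.5, ("medium", "high"))
```

Starting from an all-zero table, a single update with reward -0.5 yields a Q-value of -0.05, so repeated negative payoffs gradually steer the policy away from that action in that state.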
For the parallel review module, the medical assessment knowledge graph includes: entity types, namely diseases, symptoms, signs, examination items, treatment methods, and drugs; attributes, namely the ICD-10 code of the disease, the severity of the symptoms, and the applicable scenarios of the examination items; and relationships, namely disease-manifests-symptom, symptom-suggests-disease, disease-requires-examination item, examination item-aids diagnosis of-disease, disease-recommends-treatment method, and treatment method-uses-drug. The knowledge graph is constructed using the Neo4j graph database, with 10,000 nodes and 50,000 relationships; the data sources are internal medicine and surgery textbooks and clinical guidelines. The mapping between operation sequences and the knowledge graph works as follows. The candidate's answer sequence (question ID sequence) is matched with the knowledge point nodes in the knowledge graph (knowledge point IDs associated with question IDs) to obtain the knowledge point jump path. Modification traces (modified question IDs, answers before modification, and answers after modification) are matched with the knowledge point IDs associated with the modified question IDs in the knowledge graph to analyze whether the modification conforms to the logical relationship of the knowledge points, for example whether changing an answer from incorrect to correct is consistent with a correct understanding of the knowledge point. Eye movement trajectories (gaze point coordinate sequences) are matched with the reference material areas in the knowledge graph (e.g., electrocardiogram area, imaging image area) to obtain the gaze duration in each reference material area and its relevance to the knowledge point, with a duration of more than 10 seconds indicating attention to the knowledge point. 
The logical contradiction coefficient is calculated using the cosine similarity algorithm, which measures the similarity between the knowledge point jump path and the typical cognitive path in the knowledge graph. The typical cognitive path is the normal answer path annotated by experts. When the similarity is less than 0.7, the logical contradiction coefficient is set to 1.0, indicating a high degree of contradiction; when the similarity is between 0.7 and 0.9, it is set to 0.5; when the similarity is greater than 0.9, it is set to 0.0. The matching degree between the gaze lingering pattern and the learning cognitive law is calculated using the dynamic time warping algorithm, which aligns the actual gaze lingering sequence with the gaze lingering sequence of the typical cognitive law. The typical sequence is annotated by experts. For example, when answering cardiovascular disease questions continuously, the gaze lingering time sequence in the electrocardiogram area is 15 seconds, 20 seconds, and 18 seconds. When the similarity after alignment is less than 0.6, a logical contradiction is judged to exist, and the logical contradiction coefficient is increased by 0.3. The counterfactual hypothesis is a hypothetical scenario constructed from the start time t0 of the abnormal segment, in which the candidate answers questions according to the behavioral feature vector predicted by their personal digital twin model. The simulated normal state includes subsequent answering behaviors and results such as the answering path (question jump sequence), time distribution (answering time for each question), and knowledge retrieval pattern (duration of gaze in the reference material area). A difference-in-differences model is used to compare the simulated state with the actual records.
The formula is: the treatment effect strength equals (the actual accuracy rate minus the simulated accuracy rate) minus (the actual accuracy rate of the control group minus the simulated accuracy rate of the control group). The control group consists of candidates who did not experience the abnormal segment; accuracy data from 100 normal candidates are selected. The causal effect strength ranges from -1 to 1; a positive value indicates that the abnormal behavior increased the accuracy rate, possibly indicating cheating; a negative value indicates that the accuracy rate decreased, possibly indicating operational error. The correction vector takes the form of a multidimensional tensor with the following dimensions: Dimension 1, the time dimension, is the start timestamp of the abnormal segment, in milliseconds; Dimension 2, the abnormality type encoding, is 1 for logical contradiction, 2 for causal effect, and 3 for visual contradiction; Dimension 3, the correction strength value, ranges from 0 to 1 for the logical contradiction coefficient, from -1 to 1 for the causal effect strength, and from 0 to 1 for the visual contradiction coefficient. The transmission channel uses a RabbitMQ message queue. The exchange type is direct, and the routing key is the module name: fraud_detection for the fraud detection module, behavior_analysis for the behavior analysis module, and twin_modeling for the twin modeling module. Queue persistence is set to true, and the message confirmation mechanism is publisher confirms. The fraud detection module adjusts the risk weight parameters of the incomplete information game model. The mapping between the logical contradiction coefficient and the risk weight parameters is that the weight adjustment factor equals the logical contradiction coefficient multiplied by 0.5. For example, when the logical contradiction coefficient is 1.0, the weight adjustment factor is 0.5.
The gradient descent algorithm is used to update the weight matrix with a learning rate of 0.01 and a momentum term of 0.9 to prevent weight oscillation. The historical adjustment trajectory records the weight values of the most recent 100 adjustments. When the false alarm rate of a certain type of risk indicated by the logical contradiction coefficient, such as knowledge logical contradiction, exceeds 0.2 for 10 consecutive times, where the false alarm rate equals the number of false alarms divided by the total number of detections, the sensitivity weight of this type of risk is automatically reduced by 0.8 times the current weight. The behavior analysis module adjusts the decision boundary threshold of the isolation forest algorithm. The linear mapping between the causal effect strength and the threshold offset is that the threshold offset equals the causal effect strength multiplied by 0.1. For example, when the causal effect strength is -0.5, the threshold offset is -0.05. The threshold adjustment adopts a sliding window mechanism with a window size of 10 anomaly segments and a step size of 1 segment. The threshold change trend within the window is calculated using a linear regression slope. A slope of less than -0.01 indicates that the threshold is continuing to decrease, and the adjustment is paused. The precision and recall of the adjusted model are monitored. Precision equals the number of correct detections divided by the total number of detections, and recall equals the number of correct detections divided by the actual number of anomalies. When the precision is lower than 0.8 or the recall is lower than 0.7, the threshold is rolled back to the original threshold. The twin modeling module adds a regularization constraint term, which is a regularization coefficient derived from the logical contradiction coefficient multiplied by L2, where L2 is the L2 norm of the weight matrix of the gated recurrent unit network.
The regularization coefficient is equal to the logical contradiction coefficient multiplied by 0.01. For example, if the logical contradiction coefficient is 1.0, the regularization coefficient is 0.01. During backpropagation, the loss function is the mean squared error loss plus the regularization term; that is, the total loss equals the mean squared error loss plus the regularization coefficient multiplied by L2. In the early stopping strategy, training terminates when the validation set loss does not decrease for 10 consecutive rounds. Each round's validation set includes the behavioral feature sequences of 100 examinees. A decrease of less than 0.001 is considered no decrease, and the model parameters with the minimum validation set loss are retained. The scale parameter of the Laplace noise added by the differential privacy technique is set to 0.005, and the privacy budget is set to 2. Noise is added before the weight update is uploaded to ensure that the aggregated global model parameters cannot be used to infer individual data.
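The regularized loss and the early stopping rule described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function names are hypothetical, the L2 term is assumed to be the Frobenius norm of a weight matrix given as nested lists, and the "minimum validation loss" is tracked using the same 0.001 improvement tolerance.

```python
import math

def total_loss(mse, weight_matrix, contradiction_coeff):
    """Total loss = MSE + (contradiction coefficient * 0.01) * L2 norm of the weights."""
    reg_coeff = contradiction_coeff * 0.01
    l2_norm = math.sqrt(sum(w * w for row in weight_matrix for w in row))
    return mse + reg_coeff * l2_norm

def best_round_with_early_stopping(val_losses, patience=10, min_delta=0.001):
    """Return the index of the round whose parameters would be retained.

    A decrease smaller than min_delta counts as "no decrease"; training stops
    after `patience` consecutive rounds without a decrease.
    """
    best_loss = float("inf")
    best_round = -1
    stale_rounds = 0
    for round_index, loss in enumerate(val_losses):
        if best_loss - loss >= min_delta:
            best_loss = loss
            best_round = round_index
            stale_rounds = 0
        else:
            stale_rounds += 1
            if stale_rounds >= patience:
                break  # 10 consecutive rounds without improvement
    return best_round
```

For example, with a logical contradiction coefficient of 1.0 the regularization coefficient is 0.01, so a weight matrix with norm 5 adds 0.05 to the loss.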
[0118] For the dynamic evaluation module, in the weighted average formula, the confidence factor is calculated based on historical verification accuracy. The confidence factor for the primary risk score is 0.6, with a historical verification accuracy of 85%; the confidence factor for the logical contradiction coefficient is 0.3, with a historical verification accuracy of 90%; and the confidence factor for the causal effect strength is 0.1, with a historical verification accuracy of 70%. The weights sum to 0.6 + 0.3 + 0.1 = 1.0. Variance normalization uses Z-score standardization, calculating the mean and standard deviation for each indicator. The mean of the primary risk score is 0.5, and the standard deviation is 0.2; the mean of the logical contradiction coefficient is 0.3, and the standard deviation is 0.3; the mean of the causal effect strength is 0.0, and the standard deviation is 0.5. The normalized value is (the original value minus the mean) divided by the standard deviation. The observation layer nodes of the Markov logic network are the primary risk score nodes, with values ranging from 0 to 1; the logical contradiction coefficient nodes, with values ranging from 0 to 1; and the causal effect strength nodes, with values ranging from -1 to 1. The hidden layer soft rules include rule 1, with a weight w1 equal to 0.8, where a high probability of fraud is indicated if the logical contradiction coefficient is higher than 0.7 and the causal effect strength is higher than 0.5; and rule 2, with a weight w2 equal to 0.6, where a medium probability of fraud is indicated if the primary risk score is higher than 0.8 and the logical contradiction coefficient is between 0.3 and 0.7. The potential function is defined as:
[0119]
$$P(X = x) = \frac{1}{Z} \exp\left(\sum_{i} w_i \, n_i(x)\right)$$
where $P(X = x)$ represents the probability distribution of the Markov logic network; $Z$ represents the partition function, used to normalize probability values; $\exp$ represents the natural exponential function; $\sum_{i}$ represents the summation over all soft rules; $w_i$ indicates the weight of the $i$-th soft rule; and $n_i(x)$ indicates the number of atomic formulas that satisfy the $i$-th soft rule in state $x$. Reasoning is performed using Gibbs sampling, with 1000 samplings. The average probability is calculated from the last 500 samplings, and the overall fraud probability value is output. When generating the structured report, the abnormal behavior timeline marks the start time, duration, and triggering module of each abnormal event; the data comes from the abnormal segment records of the behavior analysis module. The list of associated collaborating candidates lists the members of potential cheating groups identified in the spatiotemporal graph analysis together with the strength of association, where the association strength is the weight of the behavioral similarity edges. The knowledge graph contradiction summary shows the details of logical conflicts in the mapping between behavior and medical knowledge points; the data comes from the logical contradiction coefficient calculation results of the parallel review module. The counterfactual reasoning results present the differences between the normal behavior simulated by the digital twin and the actual records; the data comes from the causal effect strength calculation results of the parallel review module. The report is visualized in the form of a timeline, an association graph, and a data table. The timeline uses a line graph to show the temporal distribution of identity confidence and abnormal segments. The association graph uses a network graph to show the association edges and weights of collaborating cheating candidates.
The data table lists the detailed parameters of abnormal events, supporting assessment administrators to quickly locate risk points.
[0120] Embodiment 1 of the present invention: Implementation of the data acquisition module in a multi-hospital joint examination; In the provincial-level physician qualification examination, the system was deployed in the examination centers of 23 tertiary hospitals across the province. The data acquisition module simultaneously connected to 317 examination terminals. Each terminal's camera captured facial video of the examinee at 25 frames per second, and the microphone captured ambient audio at a sampling rate of 16kHz. The module detected that examinee A frequently switched browser tabs during the question-answering process, and the system's underlying event listener recorded a sequence of abnormal page focus changes. The video stream showed that the examinee's gaze repeatedly deviated from the main area of the screen, and the audio stream detected whispers in the background environment. All data was stamped with microsecond-level timestamps obtained from the BeiDou time synchronization system, and the acquisition timing was aligned using hardware synchronization signals. After face region cropping and voice activity detection, the data packets were transmitted via a 5G network to the processing server of the Provincial Health Information Center, providing a strictly synchronized multimodal data source for subsequent analysis.
[0121] Embodiment 2 of the present invention: Application of the twin modeling module in clinical reasoning assessment; In the standardized residency training assessment, the system retrieved 136 training records of candidate B from the past three months to construct a personal digital twin. The module extracted typical case analysis rhythm characteristics from the historical data to train a gated recurrent unit network model. During real-time assessment, when candidate B handled an acute chest pain case, the model predicted that under normal circumstances, they should spend approximately 45 seconds on the ECG interpretation step. Actual monitoring showed that the candidate passed over the crucial interpretation step in only 12 seconds, and the system detected a behavioral deviation. The model updated its parameters through federated learning, incorporating this abnormal pattern into the global behavior database while keeping the original training data on the local terminal.
[0122] Embodiment 3 of the present invention: Monitoring by the behavior analysis module in skills operation assessment; In the laparoscopic simulation assessment, the system simultaneously analyzed candidate C's eye movement trajectory, operational force, and instrument movement path. The multi-head attention network revealed that during key tissue dissection steps, the candidate's gaze lingered on the anatomical reference image for only 30% of the average time, while the instrument movement speed increased abnormally. Comparison with the individual baseline showed that the cosine similarity between the real-time operational characteristics and the predicted values dropped below 0.3. The isolation forest model marked five consecutive time windows as anomalies, with these anomalous periods corresponding precisely to the delicate operational steps required by the assessment. The system generated a detailed anomaly report, indicating significant deviations from the required operational rhythm and difficulty.
[0123] Embodiment 4 of the present invention: Implementation of the identity verification module in remote proctoring; In a cross-regional medical practitioner qualification examination, candidate D took the exam at a mountain clinic. The identity verification module detected network latency fluctuations between 80-150ms and light intensity at only 40% of the standard value. The system automatically adjusted the authentication threshold and increased the weight of keyboard dynamics features. Monitoring revealed that although the candidate's facial feature matching accuracy fluctuated significantly, the input rhythm of their unique medical practitioner qualification certificate number remained stable. After integrating multi-source evidence using a Kalman filter, the system determined that the identity confidence level remained within the reliable range. The system generated an authentication quality report, recommending that the local examination center improve lighting conditions to ensure authentication reliability.
[0124] Embodiment 5 of the present invention: Operation of the fraud detection module in collaborative fraud identification; During the nursing qualification examination, the system detected similar abnormal patterns among seven candidates at the same test center during the same time period. Spatiotemporal graph convolutional network analysis revealed that the candidates' IP addresses belonged to the same network segment, and their error patterns were highly correlated. Three candidates made the same spelling errors in pharmacology questions, and the error pattern was completely consistent with printing errors in a certain preparation material. The dynamic spatiotemporal graph constructed by the system showed that the abnormal behavior was clustered in physical space, identifying potential collaborative cheating groups. The cheating detection module fused group risk signals with individual abnormal evidence, outputting a high-risk warning.
[0125] Embodiment 6 of the present invention: Application of the parallel review module in arbitration of disputed results; During the review of candidate E's associate chief physician professional title examination results, the parallel review module discovered inconsistencies in the candidate's answers to the radiology interpretation questions. The system retrieved the candidate's complete operation record and found that the candidate repeatedly revised their answers when dealing with normal chest X-rays, but quickly settled on diagnoses when analyzing difficult cases. Medical knowledge graph mapping showed that this behavior violated the principles of radiological diagnostic thinking. Counterfactual reasoning simulation indicated that, under normal circumstances, the candidate should have spent more time on difficult cases. The correction vector instructed the system to adjust the detection parameters, resulting in fewer false alarms of this kind in subsequent similar examinations. Based on the logical contradiction analysis report provided by the system, the arbitration committee ultimately confirmed the validity of the examination results.
[0126] Federated averaging is a distributed machine learning technique used to collaboratively train models while protecting data privacy. This algorithm allows multiple clients to train models locally, uploading only the updated model parameters to a central server. The server then aggregates these updates to generate a global model without sharing the original data. In this invention, federated averaging is applied to the twin modeling module to aggregate the weight updates of the local gated recurrent unit network uploaded from each candidate's terminal. The specific data processing path is as follows: each candidate's terminal trains the gated recurrent unit network using local multimodal data, generates encrypted weight updates, and uploads them to a cloud server; the server performs a weighted average calculation on all weight updates, where the weights are assigned based on the amount of data or confidence factors from each terminal; the resulting global average update is securely distributed to each terminal to update the local model, achieving collaborative optimization of the global behavioral representation model while ensuring that candidate data does not leave the local terminal.
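The server-side weighted average described above can be sketched as follows. This is a minimal illustration under stated assumptions: weight updates are flattened into plain lists of floats, the aggregation weights are each terminal's sample count, and encryption and secure distribution are omitted. The function name is hypothetical.

```python
def federated_average(weight_updates, sample_counts):
    """Weighted average of per-terminal updates, weighted by local data volume."""
    if not weight_updates or len(weight_updates) != len(sample_counts):
        raise ValueError("need one sample count per weight update")
    total = sum(sample_counts)
    dim = len(weight_updates[0])
    return [
        sum(update[i] * count for update, count in zip(weight_updates, sample_counts)) / total
        for i in range(dim)
    ]

# Two terminals: the second has three times as much local data.
global_update = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Weighting by confidence factors instead of data volume, which the description also allows, only changes the values passed as `sample_counts`.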
[0127] Gated recurrent unit (GRU) networks are variants of recurrent neural networks that control information flow through update and reset gate mechanisms, effectively capturing long-term dependencies in time series. The network structure includes hidden states; at each time step, it receives input, updates the hidden states, and outputs predicted values. In this invention, the GRU network is used in the twin modeling module to construct a personalized digital twin of the examinee. The specific data processing path is as follows: the module extracts feature sequences from the multimodal raw data stream and inputs them into the GRU network; the network predicts the feature vector for the next time step based on the behavioral features of the previous time step, forming an initial behavioral baseline; during the assessment, real-time feature sequences are input into the network, which outputs short-window predicted values that are compared with the true values to generate a loss; the network updates its weights through the backpropagation algorithm, gradually optimizing prediction accuracy.
[0128] Backpropagation is a core method for training neural networks. It calculates the gradient of the loss function with respect to the network weights using the chain rule and updates the weights using gradient descent to minimize the loss. The algorithm consists of three steps: forward propagation to calculate the output, backpropagation to calculate the gradient, and weight update. In this invention, the backpropagation algorithm is applied to the twin modeling module and the behavior analysis module to optimize the gated recurrent unit network and the multi-head attention network. The specific data processing path is as follows: the module calculates the difference loss between the network output and the true value, and calculates the gradient using the backpropagation algorithm; the gradient is used to adjust the network weights, focusing on optimizing layers sensitive to time series or feature importance; the weight update process may incorporate regularization constraints to prevent overfitting and improve the model's generalization ability.
[0129] Multi-head attention is a variant of the attention mechanism that enhances the model's representational ability by capturing different aspects of the input data in parallel through multiple attention heads. Each head learns independent query, key, and value mappings; the weighted sums output by the heads are concatenated and subjected to a linear transformation. In this invention, the multi-head attention mechanism is used in the behavior analysis module to fuse multimodal features. The specific data processing path is as follows: the module extracts features from video, audio, and operational data and concatenates them into a high-dimensional vector; the vector is input into the multi-head attention network, which calculates the attention weights for different feature dimensions and outputs a weighted unified contextual feature vector; this vector is compared with the behavior baseline to identify anomalies.
[0130] Cosine similarity is a vector similarity measure, calculated by taking the cosine of the angle between two vectors. Its value ranges from -1 to 1, with larger values indicating greater similarity. This measure is insensitive to vector magnitude and is suitable for measuring directional similarity. In this invention, cosine similarity is used in the behavior analysis module and the twin modeling module to compare real-time features with predicted features. The specific data processing path is as follows: the module calculates the cosine similarity between the real-time context feature vector and the predicted feature vector, obtaining a similarity sequence; the sequence is used to assess the degree of behavioral deviation, with low similarity indicating anomalies; the similarity calculation may be combined with dynamic time warping to eliminate the influence of time offset.
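The measure described above can be sketched as follows. This is a minimal illustration; the function name is hypothetical, and returning 0.0 for a zero-length vector is an assumption made here because the cosine is undefined in that case.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat an all-zero vector as "no similarity"
    return dot / (norm_a * norm_b)
```

Because only direction matters, a real-time feature vector that is a scaled copy of the predicted vector still scores close to 1.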
[0131] The Kalman filter is a recursive state estimation algorithm that optimally estimates the state of a dynamic system through two steps: prediction and update. The algorithm assumes the system state follows a linear Gaussian model, calculates the posterior state estimate using prior states and current observations, and minimizes the covariance of the estimation error. In this invention, the Kalman filter is applied to the authentication module to fuse matching score sequences of lightweight facial feature descriptors and keyboard kinetic features. The specific data processing path is as follows: the module periodically captures frames from the video stream to extract facial feature descriptors and extracts keyboard kinetic features from operation events, matching them with the registration template to obtain independent scores; these scores are input as observations into the Kalman filter, which uses identity confidence as a state variable. The filter calculates the prior state estimate and covariance through the prediction step and adjusts the posterior state in conjunction with the observations in the update step; the filter outputs a smoothed comprehensive identity confidence score and variance, used to dynamically adjust the authentication threshold, achieving stability and adaptability in continuous authentication.
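The predict and update steps described above can be sketched for a scalar state as follows. This is a simplified illustration, not the filter of the invention: it assumes the identity confidence follows a random-walk model, that the face and keyboard scores are independent observations of that confidence, and that the process and observation noise variances (`process_var`, and the second element of each observation pair) are hypothetical tuning values.

```python
def kalman_step(confidence, variance, observations, process_var=1e-3):
    """One predict/update cycle for a scalar identity-confidence state.

    observations: list of (score, noise_variance) pairs, e.g. the face match
    score and the keyboard dynamics match score for the current period.
    """
    # Prediction: the random-walk model keeps the estimate, uncertainty grows.
    variance = variance + process_var
    # Update: fold in each independent observation sequentially.
    for score, noise_var in observations:
        gain = variance / (variance + noise_var)
        confidence = confidence + gain * (score - confidence)
        variance = (1.0 - gain) * variance
    return confidence, variance

# Poor lighting: give the face score a larger noise variance than the keyboard score.
conf, var = kalman_step(0.8, 0.05, [(0.55, 0.20), (0.90, 0.02)])
```

Raising the noise variance of a degraded modality is one way to express the weight shift toward keyboard dynamics described in Embodiment 4.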
[0132] Spatiotemporal graph convolutional networks (ST-GCNs) are deep learning models that combine graph convolution and temporal convolution to process dynamic graph-structured data. The network learns spatiotemporal evolution patterns by aggregating neighbor node information in the spatial dimension and capturing sequence dependencies in the temporal dimension. In this invention, the ST-GCN is applied to the cheating detection module to analyze the spatial correlation of examinee behavior within an examination room. The specific data processing path is as follows: the module constructs a dynamic spatiotemporal graph with examinees as nodes. Node attributes include the timestamp and intensity value of risk events. Spatial proximity edges are constructed based on IP ranges or virtual examination room numbers, and behavioral similarity edges are constructed based on the similarity of answer time series. The ST-GCN performs hierarchical convolution operations on the dynamic spatiotemporal graph. The spatial convolutional layer uses graph convolution with a Chebyshev polynomial approximation to aggregate neighbor node features, while the temporal convolutional layer uses one-dimensional convolution to capture short-term dependencies in risk event sequences. The network outputs node-level risk clustering probabilities, identifying risk patterns that abnormally cluster in specific subgraph structures, such as examinees in the same physical area exhibiting similar abnormal behaviors at the same time.
[0133] The counterfactual causal reasoning framework is a causal inference method that quantifies the causal effect of intervention measures on outcomes by constructing counterfactual scenarios and comparing them with actual observations. The framework is based on a potential outcome model, assuming possible outcomes without intervention and comparing them with actual results. In this invention, the counterfactual causal reasoning framework is applied to the parallel review module to generate correction vectors. The specific data processing path is as follows: the module constructs a counterfactual hypothesis of "if there were no abnormal factors" for the abnormal segments output by the behavior analysis module; it uses the candidate's personal digital twin model in the twin modeling module to simulate the subsequent answering behavior sequence under normal cognitive conditions, including the answering path, time distribution, and accuracy rate; it compares the simulation with the actual records and calculates the treatment effect size of the abnormal behavior on the change in answering accuracy rate using a difference-in-differences model; it outputs a causal effect strength index, which reflects the net impact of abnormal behavior on the assessment results and is used to adjust the system detection parameters.
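The difference-in-differences comparison can be sketched as follows. This is an illustration only: it assumes the formula stated earlier in this description is read in the standard form (actual minus simulated for the candidate) minus (actual minus simulated for the control group), and the function and parameter names are hypothetical.

```python
def causal_effect_strength(actual_acc, simulated_acc,
                           control_actual_acc, control_simulated_acc):
    """Difference-in-differences on accuracy rates (each in the range 0 to 1)."""
    candidate_gap = actual_acc - simulated_acc
    control_gap = control_actual_acc - control_simulated_acc
    return candidate_gap - control_gap

# Candidate scored 0.90 where the digital twin predicted 0.60;
# the control group averaged 0.72 against a simulated 0.70.
effect = causal_effect_strength(0.90, 0.60, 0.72, 0.70)
```

A positive result, as in this example, means the accuracy rose more than the twin and the control group would explain, which the description treats as a possible sign of cheating.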
[0134] The medical examination knowledge graph is a semantic network that represents the logical relationships between knowledge points in the medical field, such as disease classification, symptom association, and treatment principles. The knowledge graph consists of entities, attributes, and relationships, supporting semantic queries and reasoning. In this invention, the medical examination knowledge graph is applied to the parallel review module to analyze the rationality of behavior and knowledge logic. The specific data processing path is as follows: the module retrieves the candidate's complete operation sequence from the behavior analysis module, including the answering order, modification traces, and eye movement trajectory; the operation sequence is mapped to the pre-constructed medical examination knowledge graph, and the semantic correlation between the behavior trajectory and knowledge point nodes is calculated using a graph neural network; the module analyzes whether the candidate's behavior pattern violates medical cognitive laws when continuously answering related knowledge point questions, such as skipping key diagnostic steps and answering directly; and outputs a logical contradiction coefficient to quantify the degree of deviation between behavior and knowledge logic, used to identify logical inconsistencies.
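The logical contradiction coefficient can be sketched from the thresholds given earlier in this description (path similarity below 0.7, between 0.7 and 0.9, above 0.9; plus 0.3 when the aligned gaze similarity is below 0.6). The function name is hypothetical, and clamping the result to 1.0 is an assumption made here so that the value stays within the stated range of 0 to 1.

```python
def logical_contradiction_coefficient(path_similarity, gaze_similarity=None):
    """Map path and gaze similarities to a contradiction coefficient in [0, 1]."""
    if path_similarity < 0.7:
        coeff = 1.0   # high degree of contradiction
    elif path_similarity <= 0.9:
        coeff = 0.5
    else:
        coeff = 0.0
    # Gaze lingering pattern that does not match the typical cognitive sequence.
    if gaze_similarity is not None and gaze_similarity < 0.6:
        coeff += 0.3
    return min(coeff, 1.0)  # assumption: clamp to the stated 0-to-1 range
```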
[0135] The confidence-weighted fusion algorithm is an information fusion technique that assigns weights based on the reliability of different pieces of evidence and obtains a comprehensive estimate after weighted averaging. The algorithm typically dynamically adjusts the weights based on a confidence factor to improve the robustness of the fusion result. In this invention, the confidence-weighted fusion algorithm is applied to the dynamic evaluation module, integrating the initial risk score and the correction vector. The specific data processing path is as follows: the module receives the initial risk score from the fraud detection module and the correction vector (including logical contradiction coefficient and causal effect strength) from the parallel review module; weights are assigned to each input based on a preset confidence factor, which is calculated based on historical accuracy, with the logical contradiction coefficient typically having a higher weight than the causal effect strength; a weighted average formula is used to calculate the comprehensive risk score, with variance normalization introduced to eliminate dimensional differences; a standardized comprehensive score is output, providing input for subsequent probabilistic inference.
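The weighted fusion with Z-score normalization can be sketched as follows, using the confidence factors, means, and standard deviations given earlier in this description. The dictionary keys and the function name are hypothetical, and combining the normalized values by a plain weighted sum is an assumed reading of the weighted average formula.

```python
# Confidence factors and (mean, standard deviation) per indicator, as stated in the description.
WEIGHTS = {"risk_score": 0.6, "contradiction": 0.3, "causal_effect": 0.1}
STATS = {"risk_score": (0.5, 0.2), "contradiction": (0.3, 0.3), "causal_effect": (0.0, 0.5)}

def fused_risk_score(values):
    """Confidence-weighted sum of Z-score normalized indicators."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        mean, std = STATS[name]
        z = (values[name] - mean) / std
        total += weight * z
    return total

score = fused_risk_score({"risk_score": 0.9, "contradiction": 0.6, "causal_effect": 0.5})
```

In this example the three indicators normalize to 2.0, 1.0, and 1.0, so the fused score is 0.6 × 2.0 + 0.3 × 1.0 + 0.1 × 1.0 = 1.6.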
[0136] Markov logic networks (MLNs) are statistical relational learning models that combine first-order logic and Markov networks to handle uncertain and relational data. The network defines probabilistic dependencies between variables through soft rules, supporting probabilistic inference. In this invention, the MLN is applied to the dynamic evaluation module to reason about contradictory evidence. The specific data processing path is as follows: the module defines a set of soft rules, such as "a high logical contradiction coefficient and a high causal effect strength lead to a high probability of cheating," with each rule accompanied by learnable weights; the initial risk score, logical contradiction coefficient, and causal effect strength are input into the network as observation nodes, and probabilistic inference is performed using Gibbs sampling or belief propagation algorithms; the joint probability distribution of all possible world states is calculated, and a comprehensive cheating probability value is output. This probability represents the posterior probability that the examinee has engaged in cheating, and is used to generate the final evaluation report.
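The log-linear form of a Markov logic network can be illustrated with a toy model. This sketch makes several simplifying assumptions that are not in the description: each soft rule is treated as a crisp implication "condition implies fraud", there is a single hidden Boolean variable `fraud`, and the posterior is computed by exact enumeration of the two possible worlds rather than by Gibbs sampling. The rule weights 0.8 and 0.6 and the thresholds are the ones stated earlier; all names are hypothetical.

```python
import math

# (weight, rule) pairs; a rule returns True when the world satisfies it.
RULES = [
    (0.8, lambda e, fraud: not (e["contradiction"] > 0.7 and e["causal_effect"] > 0.5) or fraud),
    (0.6, lambda e, fraud: not (e["risk_score"] > 0.8 and 0.3 <= e["contradiction"] <= 0.7) or fraud),
]

def unnormalized_weight(evidence, fraud):
    """exp(sum of w_i * n_i(x)), where n_i is 1 if rule i is satisfied, else 0."""
    return math.exp(sum(w for w, rule in RULES if rule(evidence, fraud)))

def fraud_probability(evidence):
    """Posterior P(fraud | evidence), normalizing over the two possible worlds."""
    w_true = unnormalized_weight(evidence, True)
    w_false = unnormalized_weight(evidence, False)
    return w_true / (w_true + w_false)  # the denominator plays the role of Z

p = fraud_probability({"risk_score": 0.5, "contradiction": 0.9, "causal_effect": 0.6})
```

When only rule 1 fires, the posterior reduces to the logistic function of its weight, 1 / (1 + e^-0.8), about 0.69; when no rule fires, the two worlds are equally weighted and the result is 0.5.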
[0137] Dynamic time warping (DTW) is a time series alignment technique used to compare sequences of different lengths by warping the time axis to find the path with the minimum distance. The algorithm compensates for distortion of time series along the time axis and is suitable for similarity calculation. In this invention, DTW is applied to the cheating detection module to construct behavioral similarity edges. The specific data processing path is as follows: the module obtains the answer timestamp sequence from the candidate information database and calculates the similarity of different candidates' answering rhythms; it uses DTW to align the time series, finds the optimal warping path, and calculates the cumulative path distance as a similarity measure; combined with the Jaccard similarity coefficient of the answer options, it constructs behavioral similarity edges, with edge weights reflecting the degree of similarity in candidates' answering patterns; this is used to form a dynamic spatiotemporal graph, supporting spatiotemporal graph convolutional network analysis of risk propagation.
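The cumulative path distance can be sketched with the classic dynamic programming formulation. This is a minimal illustration: the function name is hypothetical, the local cost is assumed to be the absolute difference between two values, and no warping window constraint is applied.

```python
def dtw_distance(seq_a, seq_b):
    """Cumulative distance along the optimal warping path between two sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = minimal cumulative distance aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch seq_b
                                     cost[i][j - 1],      # stretch seq_a
                                     cost[i - 1][j - 1])  # step both
    return cost[n][m]

# Per-question answering times (seconds) of two candidates with the same rhythm,
# where the second candidate has one extra question at the same pace.
distance = dtw_distance([15, 20, 18], [15, 20, 20, 18])
```

A smaller distance means a more similar answering rhythm; converting the distance into an edge weight (for example by normalization) is left open here because the description does not specify it.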
[0138] The Jaccard similarity coefficient is a statistical measure used to measure the similarity between two sets. It is obtained by calculating the ratio of the intersection size to the union size, and the value ranges from 0 to 1, with a higher value indicating a higher similarity. This coefficient is suitable for comparing the degree of overlap of discrete elements. In this invention, the Jaccard similarity coefficient is applied to the cheating detection module to construct behavioral similarity edges. The specific data processing path is as follows: the module obtains the answer timestamp sequence and answer option set of all online candidates from the candidate information database; for any two candidates, it extracts their answer option sets from the same time period and calculates the ratio of the number of elements in the intersection to the number of elements in the union of the two sets; the obtained Jaccard similarity coefficient is used as the weight of the behavioral similarity edge, reflecting the degree of similarity of the answer content between candidates; combined with the answer rhythm similarity calculated by the dynamic time warping algorithm, it constitutes the comprehensive weight of the behavioral similarity edge, which is used to construct a dynamic spatiotemporal graph to analyze the risk propagation pattern.
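The ratio described above can be sketched as follows. This is a minimal illustration: the function name is hypothetical, each answer is assumed to be represented as a (question ID, option) pair, and returning 0.0 when both sets are empty is an assumption because the ratio is undefined in that case.

```python
def jaccard_similarity(answers_a, answers_b):
    """Size of the intersection divided by the size of the union of two answer sets."""
    set_a, set_b = set(answers_a), set(answers_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: no answers in the period means no evidence of overlap
    return len(set_a & set_b) / len(union)

# Answers given by two candidates during the same time period.
sim = jaccard_similarity({("q1", "A"), ("q2", "C"), ("q3", "B")},
                         {("q1", "A"), ("q2", "C"), ("q3", "D")})
```

Here two of the four distinct (question, option) pairs are shared, so the coefficient is 0.5.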
[0139] Differential privacy is a privacy protection framework that adds carefully designed random noise to data query results, controlling the impact of individual records on the output within a quantifiable range, thereby preventing the inference of individual data from aggregated information. The level of privacy protection is controlled by the ε parameter; a smaller ε value indicates a stronger level of privacy protection. In this invention, differential privacy is applied to the twin modeling module to protect the privacy of candidate data during the federated learning process. The specific data processing path is as follows: the module generates weight updates after training the gated recurrent unit network locally; before uploading the weight updates to the central server, the scale parameter of Laplace noise or Gaussian noise is calculated according to the preset privacy budget ε; random noise that meets the differential privacy requirements is added to each dimension of the weight updates; the noisy weight updates are uploaded through a secure channel, and the server aggregates all noisy updates to calculate the global average update; the noise addition ensures that attackers cannot infer any candidate's original data from the aggregated results, achieving a balance between privacy protection and model utility.
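The noise-addition and aggregation steps can be sketched as follows, under stated assumptions: the text does not specify how sensitivity is bounded, so this example clips each update to an L1 norm of `clip_norm` and uses Laplace noise with scale 2·clip_norm/ε (two clipped vectors differ by at most 2·clip_norm in L1 norm). All function names are hypothetical, and encryption and the secure channel are omitted.

```python
import random

def clip_l1(update, clip_norm):
    """Scale the update so its L1 norm does not exceed clip_norm."""
    norm = sum(abs(w) for w in update)
    if norm <= clip_norm or norm == 0:
        return list(update)
    factor = clip_norm / norm
    return [w * factor for w in update]

def laplace_scale(clip_norm, epsilon):
    """Laplace scale parameter; a smaller epsilon gives a larger scale (more noise)."""
    return 2.0 * clip_norm / epsilon

def privatize(update, clip_norm, epsilon, rng):
    """Clip the weight update, then add Laplace noise to every dimension."""
    b = laplace_scale(clip_norm, epsilon)
    clipped = clip_l1(update, clip_norm)
    # The difference of two exponential draws with mean b is Laplace(0, b).
    return [w + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b) for w in clipped]

def federated_average(updates):
    """Server side: element-wise mean of the uploaded (noisy) updates."""
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]

rng = random.Random(0)
local_updates = [[0.3, -0.1, 0.2], [0.1, 0.0, -0.2], [0.2, 0.1, 0.1]]
noisy_updates = [privatize(u, clip_norm=1.0, epsilon=1.0, rng=rng) for u in local_updates]
global_update = federated_average(noisy_updates)
```

Averaging over many candidates reduces the effect of the zero-mean noise on the global update, which is how the trade-off between privacy and utility arises in this kind of scheme.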
[0140] Q-learning is a model-free reinforcement learning algorithm that learns the optimal policy by iteratively updating the action-value function Q(s,a). The algorithm applies the Bellman equation to progressively refine the Q-values, ultimately enabling the agent to choose the action that yields the maximum cumulative reward in a given state. In this invention, Q-learning is applied to the incomplete information game model of the fraud detection module to optimize the system's monitoring strategy. The specific data processing path is as follows: the module treats the system as the agent, defining the system state s as the current risk event distribution and confidence level, and the actions a as adjusting monitoring intensity and triggering a review mechanism; after the system executes an action, the environment transitions to a new state s', and the system receives an immediate reward r composed of the risk event intensity and the confidence level of collaborative cheating; the Q-value table is updated using the Q-learning update rule Q(s,a) ← Q(s,a) + α[r + γ·max_{a'} Q(s',a') - Q(s,a)], where α is the learning rate, γ is the discount factor, and the maximum is taken over all actions a' available in state s'; through multiple iterations, the optimal monitoring strategy under different risk states is learned, enabling the system to adaptively adjust its response to fraudulent behavior.
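A minimal tabular sketch of this update rule is given below. The state labels, action names, and reward values are hypothetical placeholders; the text does not define how states are discretized or how the reward is computed.

```python
from collections import defaultdict

# Hypothetical action set for the monitoring agent.
ACTIONS = ("keep_monitoring", "raise_monitoring", "trigger_review")

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0

# Two illustrative transitions with made-up rewards.
first = q_update(q_table, "high_risk", "trigger_review", reward=1.0, next_state="low_risk")
second = q_update(q_table, "low_risk", "keep_monitoring", reward=0.0, next_state="high_risk")
```

The second update already picks up value from the first one through the max term, which is how reward information propagates backward through the state space over repeated iterations.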
[0141] Gradient descent is an optimization algorithm that minimizes a loss function by calculating the gradient of the loss with respect to the model parameters and updating the parameters along the negative gradient direction. Variants include batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. In this invention, gradient descent is used for model training in multiple modules. The specific data processing path is as follows: in the twin modeling module, the algorithm calculates the mean squared error loss between the predicted and actual values of the gated recurrent unit network, calculates the gradient of the loss with respect to the network weights through backpropagation, uses the learning rate to control the update step size, and iteratively optimizes the network parameters; in the behavior analysis module, the algorithm optimizes the weight parameters of the multi-head attention network, minimizing the difference loss between the context feature vector and the baseline features; during training, it is often combined with momentum terms or adaptive learning rate algorithms to accelerate convergence and reduce the risk of getting trapped in poor local optima.
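The parameter update itself can be shown on a deliberately small stand-in problem. The sketch below runs batch gradient descent on a mean squared error loss for a two-parameter linear model; it is not the gated recurrent unit network or attention network of the invention, and the data, learning rate, and step count are illustrative assumptions.

```python
def fit_linear_mse(xs, ys, lr=0.1, steps=2000):
    """Batch gradient descent on mean squared error for the model y = w*x + b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step along the negative gradient; lr controls the step size.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 2x + 1.
w, b = fit_linear_mse([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```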
[0142] Early stopping is a regularization technique that prevents overfitting in machine learning models. It decides when to terminate training by monitoring performance metrics on a validation set: training stops when the validation loss has not decreased for several consecutive rounds, and the model parameters with the best validation performance are retained. In this invention, the early stopping strategy is applied to the model training process of the twin modeling module and the behavior analysis module. The specific data processing path is as follows: the module divides the dataset into training and validation sets and periodically evaluates the model on the validation set during training; it records the loss value or accuracy metric on the validation set and triggers the early stopping mechanism when the validation loss has not decreased for a preset number of rounds (e.g., 10 rounds); the system rolls back to the snapshot of the model parameters with the best validation performance and terminates the training process. This strategy helps prevent the model from overfitting the training data, improves generalization ability, and works in conjunction with differential privacy technology in the twin modeling module to achieve a balance between privacy protection and model utility.
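The stopping rule can be sketched independently of any particular model. In the example below, a list of validation losses stands in for a real training loop, and the function name and return values are assumptions; a real trainer would save a parameter snapshot wherever the comment indicates.

```python
def early_stopping(val_losses, patience=10):
    """Return (best_round, best_loss, last_round_run) for a stream of validation losses."""
    best_loss = float("inf")
    best_round = None
    stale_rounds = 0
    last_round = -1
    for last_round, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember this round (a real trainer would snapshot parameters).
            best_loss, best_round, stale_rounds = loss, last_round, 0
        else:
            stale_rounds += 1
            if stale_rounds >= patience:
                break  # no improvement for `patience` consecutive rounds
    return best_round, best_loss, last_round

# Hypothetical validation losses per round.
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9, 0.5]
best_round, best_loss, stopped_at = early_stopping(losses, patience=3)
```

With a patience of 3, training in this example stops at round 5 and rolls back to round 2, so the later loss of 0.5 is never reached. This illustrates the trade-off involved in choosing the patience value.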
[0143] Z-score standardization is a data preprocessing technique that rescales raw data to a mean of 0 and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation; it does not change the shape of the distribution. The formula is z = (x - μ) / σ, where x is the original value, μ is the mean, and σ is the standard deviation. In this invention, Z-score standardization is applied in the data preprocessing stages of multiple modules. The specific data processing path is as follows: in the behavior analysis module, the mean and standard deviation of each of the 28 features extracted from the multimodal raw data stream, such as head pose, eye opening, and gaze direction, are calculated on the training set; the real-time incoming feature values are standardized using the corresponding mean and standard deviation to eliminate differences in the units and numerical ranges of different features; the standardized feature vectors are input into the multi-head attention network to improve training stability and convergence speed; similarly, in the identity verification module, facial feature descriptors and keyboard dynamics features are also Z-score standardized so that the numerical ranges of features from different modalities are consistent, thereby improving the effect of subsequent fusion algorithms.
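A short sketch of the fit-then-transform pattern described above: statistics are computed once on the training set and then reused for real-time values. Two features stand in for the 28 mentioned in the text, the sample values are invented, and replacing a zero standard deviation with 1.0 is an assumption added to avoid division by zero.

```python
from statistics import fmean, pstdev

def fit_zscore(training_rows):
    """Per-feature mean and standard deviation, computed on the training set only."""
    columns = list(zip(*training_rows))
    means = [fmean(c) for c in columns]
    # Assumption: a constant feature gets std 1.0 so that division stays defined.
    stds = [pstdev(c) or 1.0 for c in columns]
    return means, stds

def transform(row, means, stds):
    """Apply z = (x - mean) / std to each feature of one incoming vector."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

# Hypothetical training rows: [head yaw in degrees, eye opening ratio].
train = [[10.0, 0.2], [20.0, 0.4], [30.0, 0.6]]
means, stds = fit_zscore(train)

standardized_train = [transform(r, means, stds) for r in train]
realtime_z = transform([25.0, 0.5], means, stds)
```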
[0144] The isolation forest model is an unsupervised anomaly detection algorithm that constructs isolation trees by randomly selecting features and split values. Anomalies are more easily isolated because their feature values differ significantly from those of normal points, so they require shorter path lengths. The model calculates anomaly scores from the average path length across multiple isolation trees; a higher score (a shorter average path) indicates a greater likelihood of an anomaly. In this invention, the isolation forest model is applied in the behavior analysis module to identify anomalous segments in examinee behavior data. The model construction process is as follows: before system deployment, the model is pre-trained using historical normal exam behavior data, which includes similarity values of multimodal feature sequences. Multiple isolation trees are constructed through random subsampling and feature selection to form an isolation forest. In real-time processing, the model receives sliding window data of similarity sequences and calculates the path length from the root node to a leaf node for each data point. The path length reflects how easily a data point is isolated. When the average path length of the data points within a window is significantly lower than the average path length of historical normal data, the window is marked as an anomalous behavior segment. The specific data processing path is as follows: the behavior analysis module receives a dynamic behavior baseline from the twin modeling module and a real-time multimodal feature sequence from the data acquisition module. It calculates the cosine similarity between the real-time features and the baseline features, generating a similarity sequence in time series form. The similarity sequence is divided into sliding windows of fixed length (e.g., 5 seconds) with a step size of 1 second. Each window includes multiple sampling points. The sliding window data is input into the pre-trained isolation forest model. The model calculates anomaly scores by integrating the path lengths of multiple isolation trees. The scores are compared against a preset or adaptive threshold to identify abnormal windows and output the start and end times and anomaly intensity indicators of the abnormal segments.
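The pre-processing that feeds the anomaly detector (cosine similarity followed by sliding-window segmentation) can be sketched as below. The isolation forest itself is not implemented here. The sampling rate of one similarity value per second and the example vectors are assumptions chosen so that a 5-second window contains 5 points.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def sliding_windows(sequence, window, step):
    """Fixed-length windows over the sequence; incomplete trailing windows are dropped."""
    return [sequence[i:i + window] for i in range(0, len(sequence) - window + 1, step)]

# Hypothetical baseline (predicted) and real-time feature vectors, one pair per second.
baseline = [[1.0, 0.0]] * 8
realtime = [[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 3   # behavior departs from baseline at t = 5 s

similarity_sequence = [cosine_similarity(r, b) for r, b in zip(realtime, baseline)]
windows = sliding_windows(similarity_sequence, window=5, step=1)
```

Each element of `windows` would then be passed to the pre-trained model as one sample. Windows covering the last three seconds contain low similarity values and are the ones such a model would be expected to isolate quickly.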
[0145] The incomplete information game model is a game theory framework used to simulate the strategic interactions of participants under information asymmetry, where one party cannot fully know the other party's private information. The model defines the participants, strategy spaces, information set, and payoff function, and optimizes strategies using Bayesian updates and reinforcement learning algorithms. In this invention, the incomplete information game model is applied in the cheating detection module to assess the risk of candidate cheating. The model construction process is as follows: the system is defined as one party in the game, and the candidate as the other; the system's strategy space includes actions such as adjusting monitoring intensity and triggering review mechanisms; the candidate's strategy space includes normal examination behavior or cheating behavior; the information set is based on partially observable states, such as risk events and confidence levels of collaborative cheating; the payoff function is defined based on the intensity of risk events and collaborative cheating threat indicators. The model initializes prior probabilities using historical data and updates the strategy value function using the Q-learning algorithm. 
The specific data processing path is as follows: The fraud detection module receives abnormal fragments from the behavior analysis module and low-confidence intervals from the identity confidence curve from the identity verification module, transforming them into risk events with timestamps and intensity values; the module obtains metadata from the candidate information database, constructs a dynamic spatiotemporal graph, and uses a spatiotemporal graph convolutional network to identify potential collaborative cheating groups and their confidence levels; the risk events and collaborative cheating group information are used as payoff signals input into the incomplete information game model, and the model adjusts the prior probability estimates of various fraudulent behaviors through Bayesian update rules, and uses the Q-learning algorithm to update the state-action value function; the model outputs the initial fraud risk probability for each candidate, which integrates individual abnormal evidence and group association risk, and is used for decision support in subsequent modules.
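The Bayesian update of the prior cheating probability can be illustrated with a two-hypothesis sketch. The likelihood values are invented for the example; the text does not say how the likelihood of a risk event under each hypothesis is obtained, so both the function signature and the numbers are assumptions.

```python
def bayes_update(prior, p_event_given_cheating, p_event_given_honest):
    """Posterior probability of cheating after observing one risk event."""
    numerator = p_event_given_cheating * prior
    evidence = numerator + p_event_given_honest * (1.0 - prior)
    if evidence == 0:
        return prior  # the event is impossible under both hypotheses; keep the prior
    return numerator / evidence

# Hypothetical values: 5% prior, and a risk event that is six times
# more likely for a cheating candidate than for an honest one.
posterior = bayes_update(prior=0.05, p_event_given_cheating=0.6, p_event_given_honest=0.1)

# A second independent risk event raises the estimate further.
posterior_2 = bayes_update(posterior, 0.6, 0.1)
```

In this example a single risk event moves the estimate from 5% to 24%, which shows why the likelihoods need careful calibration to keep false alarms low.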
[0146] The terminal devices involved in this invention include a computer terminal, camera, microphone, keyboard, and mouse used by the examinee. The computer terminal runs online assessment system client software, providing a user interface and handling local data acquisition tasks. The camera is connected to the computer terminal and captures a video stream of the examinee's face at a fixed frame rate. The video stream includes image data of the examinee's face and upper body, used for subsequent face detection, liveness verification, and gaze tracking analysis. The microphone is connected to the computer terminal and collects an ambient audio stream at a specific sampling rate. The audio stream includes ambient sounds from the examination room and potential voice interaction signals, used for voice activity detection and acoustic feature extraction. The keyboard and mouse, as input devices, have their operation events captured by the system's underlying event listener, recording keystroke codes, mouse coordinate trajectories, click events, and changes in browser tab activation status. The multimodal data collected by the terminal devices is stamped with a microsecond-level timestamp obtained from a high-precision network time protocol server during generation, and aligned with the acquisition clock start point through a hardware synchronization pulse signal, forming a synchronized multimodal raw data stream. The synchronized data is compressed and encapsulated, then streamed to the backend processing server via a latency-resistant network protocol, providing a high-quality input data source for the twin modeling module, behavior analysis module, and authentication module. The configuration of the terminal devices ensures the synchronization and integrity of multimodal data acquisition, supporting the accurate execution of subsequent behavior perception and fraud detection algorithms.
Claims
1. A medical online assessment management system based on multimodal behavior perception, characterized in that it comprises: a data acquisition module configured to acquire facial video streams, environmental audio streams, and operation event sequences from the examinee's terminal, and output synchronized multimodal raw data streams; a twin modeling module configured to receive the synchronized multimodal raw data streams, construct personalized digital twins of examinees based on historical data, aggregate and update the global behavioral representation model in the cloud through a federated averaging algorithm, and output a dynamic behavioral baseline; a behavior analysis module configured to receive the synchronized multimodal raw data streams and the dynamic behavioral baseline, use a multi-head attention mechanism to fuse multimodal features, compare real-time features with the dynamic behavioral baseline, and output abnormal segments; an identity verification module configured to extract facial video streams and operation event sequences from the synchronized multimodal raw data streams, implement silent continuous authentication, and output an identity confidence curve; a cheating detection module configured to receive the abnormal segments and the low-confidence intervals in the identity confidence curve, construct an incomplete information game model, use a spatiotemporal graph convolutional network to analyze the spatial correlation of the behaviors of multiple candidates in the examination room, and output a preliminary risk score; a parallel review module configured to receive the abnormal segments, the low-confidence intervals in the identity confidence curve output by the identity verification module, and the preliminary risk score output by the cheating detection module, perform semantic association analysis through a medical examination knowledge graph, and generate correction vectors using a counterfactual causal reasoning framework; and a dynamic assessment module configured to receive the preliminary risk score and the correction vectors output by the parallel review module, integrate them through a confidence-weighted fusion algorithm, use a Markov logic network to reason about contradictory evidence, and output a comprehensive assessment report.
2. The medical online assessment and management system based on multimodal behavior perception according to claim 1, characterized in that, The data acquisition module is configured as follows: The candidate's terminal camera is used to capture raw video frames at a fixed frame rate to obtain video data; the microphone is used to capture ambient audio waveforms at a specific sampling rate to obtain audio data; and the system's underlying events are monitored to capture keyboard keystrokes, mouse coordinates and click events, and browser tab activation status change events to obtain operation event data. The acquired video data, audio data, and operation event data are stamped with microsecond-level timestamps obtained from a high-precision network time protocol server, and hardware synchronization pulse signals are sent to each sensor driver layer before transmission to align the acquisition clock start point. Face detection and region of interest cropping are performed on the stamped video data to compress the image size. Speech activity detection is applied to the stamped audio data to label segments including human voices. The stamped operation event data are aggregated into an event sequence sorted by time. The compressed video data, tagged audio data, and aggregated event sequences, along with their corresponding timestamps, are encapsulated into a data packet of a unified format. This encapsulated data packet is then streamed to the message queue of the backend processing server via a latency-resistant network protocol.
3. The medical online assessment and management system based on multimodal behavior perception according to claim 2, characterized in that, The twin modeling module is configured as follows: The system receives synchronous multimodal raw data streams from the data acquisition module, retrieves candidates' historical training data, extracts multimodal feature sequences from the historical training data and real-time multimodal raw data streams, and uses the extracted feature sequences to train a gated recurrent unit network to predict the behavioral feature vector of the next time period based on the behavioral characteristics of the previous time period, thus forming an initial digital twin of individual behavior. During the assessment, the system receives real-time synchronized multimodal raw data streams from the data acquisition module, performs time alignment processing on the data streams to obtain time-aligned multimodal feature sequences, inputs the time-aligned multimodal feature sequences into the gated recurrent unit network, and outputs the predicted values of the behavioral feature vectors within a short time window. The system receives real-time behavioral feature vectors from the behavior analysis module, calculates the difference loss between the predicted value output by the gated recurrent unit network and the true value provided by the behavior analysis module, and adjusts the weights of some layers of the gated recurrent unit network locally using the backpropagation algorithm. Based on the weight adjustment results, an encrypted weight update is generated and uploaded to the central server. The global average update is calculated using a federated averaging algorithm and then securely distributed to each candidate's terminal to update their local individual gated recurrent unit network model.
4. The medical online assessment and management system based on multimodal behavior perception according to claim 3, characterized in that, The behavior analysis module is configured as follows: The system receives a synchronous multimodal raw data stream from the data acquisition module and extracts in parallel the head pose, eye opening and gaze direction of the video stream, the Mel frequency cepstral coefficients and fundamental frequency profile of the audio stream, and the keystroke interval time, average mouse movement speed and page focus dwell time of the operation event sequence. The extracted features, including head pose, eye opening and closing, gaze direction, Mel frequency cepstral coefficients, fundamental frequency profile, keystroke interval time, average mouse movement speed, and page focus dwell time, are concatenated into a high-dimensional vector. This high-dimensional vector is then input into a multi-head attention network to learn the importance weights of different feature dimensions in the current exam context and output a weighted unified context feature vector. The system receives a dynamic behavior baseline from the twin modeling module, obtains the predicted behavior feature vector at the corresponding time point from the dynamic behavior baseline, calculates the cosine similarity between the real-time context feature vector output by the multi-head attention network and the predicted feature vector, and obtains a similarity sequence. The similarity sequence is segmented into fixed-length sliding windows. The sliding window data is input into a pre-trained isolation forest model. By calculating the path length required for data points to be isolated, windows with path lengths significantly shorter than the average level are identified as anomalous behavior segments.
5. The medical online assessment and management system based on multimodal behavior perception according to claim 4, characterized in that, The authentication module is configured as follows: At the start of the exam, candidates are instructed to perform specific actions to extract facial video streams and obtain high-quality frames from the synchronous multimodal raw data stream output by the data acquisition module. After liveness detection, three-dimensional facial feature point clouds are extracted from the high-quality frames, and the three-dimensional facial feature point clouds are compared with the registered template to complete the initial verification. During the assessment, the system continuously receives synchronous multimodal raw data streams from the data acquisition module and extracts facial video streams and operation event sequences from them. It periodically captures frames from the video stream to extract lightweight facial feature descriptors and extracts keyboard dynamic features from the operation events. The extracted lightweight facial feature descriptors and keyboard dynamic features are matched with the registration template to obtain independent scores. The independent scores are then fused and smoothed using a Kalman filter to estimate the overall identity confidence score and variance. The system monitors network latency and video illumination uniformity in real time, estimates the authentication quality factor based on these factors, dynamically adjusts the identity anomaly detection threshold based on the variance estimated by the authentication quality factor and the Kalman filter, and outputs an identity confidence curve based on the adjusted identity anomaly detection threshold and the comprehensive identity confidence score. The identity confidence curve includes a timestamp sequence, a real-time identity confidence score, and the identity anomaly detection threshold.
6. The medical online assessment and management system based on multimodal behavior perception according to claim 5, characterized in that, The fraud detection module is configured as follows: The system receives abnormal fragments from the behavior analysis module and identity confidence curves from the identity verification module. It extracts low-confidence intervals from the identity confidence curves and transforms the abnormal fragments and low-confidence intervals into risk events with timestamps and intensity values. Metadata of all online candidates is obtained from the candidate information database. The metadata includes login IP geographic mapping and answer timestamp sequence. Using candidates as nodes, spatial proximity edges are constructed based on IP range or virtual examination room number. Behavioral similarity edges are constructed based on the similarity of answer time and answer options, forming a dynamic spatiotemporal graph. Risk events are injected as node attributes into the dynamic spatiotemporal graph. Spatiotemporal graph convolutional networks are used to learn the risk propagation patterns in the dynamic spatiotemporal graph, identify the abnormal clustering patterns of risk events in specific subgraph structures, and output a list of potential collaborative cheating groups and their confidence levels. The risk events and potential collusion cheating groups, along with their confidence levels, are input into the incomplete information game model as payoff signals to update the system's monitoring strategy and output the initial cheating risk probability for each candidate.
7. The medical online assessment and management system based on multimodal behavior perception according to claim 6, characterized in that, The parallel review module is configured as follows: The system receives a preliminary risk score from the cheating detection module. When the preliminary risk score exceeds a preset threshold, it retrieves from the behavior analysis module the complete operation sequence of the corresponding candidate during the risk period associated with that score. The operation sequence is then mapped to a pre-constructed medical examination knowledge graph to analyze whether the behavior is consistent with the knowledge logic, and a logical contradiction coefficient is output. Abnormal segments are received from the behavior analysis module. Counterfactual hypotheses are constructed for the abnormal segments. The candidate's personal digital twin model in the twin modeling module is used to simulate and deduce the subsequent answering behavior and results under normal conditions. The simulated results are compared with the actual records to quantify the treatment effect of the abnormal behavior on the change in answering accuracy and obtain the strength of the causal effect. The logical contradiction coefficient and the causal effect strength are combined to generate a correction vector. The correction vector is sent to the fraud detection module to adjust the risk weights of the incomplete information game model, sent to the behavior analysis module to fine-tune the decision boundary of the isolation forest model, and sent to the twin modeling module as a regularization constraint for model updates.
8. The medical online assessment and management system based on multimodal behavior perception according to claim 7, characterized in that, The parallel review module is also configured as follows: The answer sequence and modification traces of the examinee are extracted from the complete operation sequence received from the behavior analysis module. The video stream is extracted from the synchronous multimodal raw data stream output by the data acquisition module, and the gaze movement trajectory is extracted from it. The answer sequence, modification traces, and gaze movement trajectory are semantically associated with the difficulty and knowledge points of the test questions. Based on the semantic association analysis results, the degree to which the pattern of the candidate's gaze lingering on the relevant reference material area, when answering consecutive questions on related knowledge points, matches learning and cognitive patterns is calculated in order to identify logical contradictions.
9. The medical online assessment and management system based on multimodal behavior perception according to claim 8, characterized in that, The dynamic evaluation module is configured as follows: The system receives a preliminary risk score from the fraud detection module and a correction vector, generated by combining the logical contradiction coefficient and the causal effect strength, from the parallel review module. The system calculates the comprehensive risk score using a weighted average formula, where the weights of the correction vector are based on preset confidence factors. A Markov logic network is established and soft rules are defined; the preliminary risk score, logical contradiction coefficient, and causal effect strength are input into the Markov logic network as observation nodes, and the comprehensive fraud probability is output through probabilistic reasoning. Based on the comprehensive fraud probability, combined with the preliminary risk score received from the fraud detection module, the correction vector received from the parallel review module, the abnormal fragments received from the behavior analysis module, and the identity confidence curve received from the identity verification module, a structured report is generated, which displays the time of abnormal behavior, related collaborating candidates, knowledge graph contradictions, and counterfactual reasoning results.
10. The medical online assessment management system based on multimodal behavior perception according to claim 9, characterized in that it further comprises: The parallel review module transmits the generated correction vector to the fraud detection module. The fraud detection module uses the logical contradiction coefficient in the correction vector to adjust the weight parameters of different types of risks in the incomplete information game model. The parallel review module transmits the correction vector to the behavior analysis module, which uses the causal effect strength in the correction vector to adjust the boundary threshold for judging abnormal behavior segments in the isolation forest algorithm. The parallel review module transmits the correction vector to the twin modeling module. The twin modeling module uses the logical contradiction coefficient in the correction vector as a regularization constraint in the training process of the gated recurrent unit network model to dynamically update the baseline parameters of individual behavior. Through the directed transmission of correction vectors to the fraud detection module, behavior analysis module, and twin modeling module, and the corresponding parameter adjustments, a system-level adaptive optimization loop is formed.