A method and system for analyzing violations based on AR glasses

By collecting multi-source time-series data through AR glasses, synchronizing and aligning the data, and segmenting it into action event nodes, the system utilizes time-series action recognition and causal graph models to analyze violations. This solves the problem of existing technologies being unable to explain the causes of violations, enabling accurate identification and quantitative analysis of violations, and improving the level of precision in safety management of live-line working scenarios.

CN122223601A · Pending · Publication Date: 2026-06-16 · ELECTRIC POWER RES INST OF GUANGXI POWER GRID CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: ELECTRIC POWER RES INST OF GUANGXI POWER GRID CO LTD
Filing Date: 2026-02-25
Publication Date: 2026-06-16

Patent Timeline

25 Feb 2026: Application

16 Jun 2026: Publication (CN122223601A)

IPC: G06V20/20; G06V20/40; G06V40/20; G06V10/62; G06V10/764; G06V10/82; G06N3/042; G06N3/045; G06N3/0464; G06N5/04; G06N5/045

AI Tagging

Application Domain: Character and pattern recognition; Biological models




Figure CN122223601A_ABST


Abstract

The application belongs to the technical field of live-line work safety management and control, and specifically discloses a method and system for analyzing rule violations based on AR glasses. The method comprises the following steps: collecting multi-source time-series data, such as video streams, IMU data, GPS data, and environmental parameters, through AR glasses; after synchronous alignment, segmenting the work trajectory into action event nodes containing multiple attributes using an action boundary detection algorithm; outputting the action category, violation probability, and feature vector of each node using a temporal action recognition model that combines a lightweight video understanding model with an ST-GCN; constructing a time-aware causal graph model containing cross-time causal relationships and using the DoWhy framework for causal effect estimation and counterfactual reasoning, so that the influence of violation factors is analyzed quantitatively; and finally generating a violation cause explanation report containing primary and secondary causes, causal chains, and counterfactual conclusions. This solves the problem that the prior art identifies only the result of a violation and not its cause, and improves the precision of work safety management and control.


Description

Technical Field

[0001] This invention relates to the field of safety management and control technology for live-line work, and in particular to a violation analysis method and system based on AR glasses.

Background

[0002] In high-risk work scenarios such as live-line work, standardized management of work behavior is crucial for ensuring personnel safety and work quality. Existing work monitoring systems mostly rely on terminals such as AR glasses to collect video and posture data, and use algorithms to identify violations. However, such systems can only complete "result-level" violation detection, that is, they only indicate "whether a violation has occurred," without explaining the underlying causes of the violation.

[0003] Under this black-box monitoring model, managers find it difficult to tell whether a violation was caused by environmental interference, operating habits, or an emergency, which makes responsibility hard to determine; training and optimization lack focus because skill gaps cannot be identified accurately; and similar hidden hazards cannot be prevented at the root, only rectified passively after the fact.

[0004] In addition, existing technologies do not fully link the temporal causal relationships across multi-source data, nor do they provide visual tracing of the violation process. As a result, violation analysis lacks completeness and intuitiveness and struggles to meet the practical needs of refined safety management.

[0005] Therefore, a method and system for analyzing violations based on AR glasses is needed.

Summary of the Invention

[0006] To address this issue, the present invention provides a method and system for analyzing violations based on AR glasses, thereby solving the aforementioned technical problems.

[0007] This invention provides a violation analysis method based on AR glasses, comprising the following steps. Multi-source time-series data is collected during the operation using augmented reality (AR) glasses; the multi-source time-series data includes at least video streams, inertial measurement unit (IMU) data, GPS data, and environmental parameters. The multi-source time-series data is synchronized and time-aligned, and the aligned data is divided into multiple action event nodes by an action boundary detection algorithm. Each action event node represents one action of the operator and includes a start and end time, a participant identifier, a spatial coordinate sequence, a posture curve, environmental snapshot information, and tool usage status information. Each action event node is processed by a temporal action recognition model, which outputs the corresponding action category label, violation probability label, and action feature vector; the violation probability label indicates whether the action corresponding to the action event node is a violation. A causal graph model is constructed whose nodes include event nodes corresponding to the action event nodes and attribute feature nodes corresponding to the attribute features of the action event nodes. The attribute feature nodes include posture feature nodes, motion dynamics feature nodes, ambient lighting feature nodes, ambient noise feature nodes, and tool contact mark nodes. The edges of the causal graph model include edges between attribute feature nodes of the same event node and edges between preceding and subsequent event nodes at adjacent times.
For a target action event node identified as a violation, a causal inference algorithm is applied to the causal graph model to estimate the causal effect of one or more candidate factors on triggering the violation, and counterfactual inference is performed to quantitatively estimate how the probability of the violation would change if the candidate factors had not occurred. Based on the causal effect and the counterfactual inference results, a report explaining the causes of the violation is generated and output.

[0008] Preferably, the action boundary detection algorithm segments events based on abrupt changes in one or more of the following signals: peaks in the operator's movement speed, sudden changes in body posture angle, and changes in the contact state between the hand and the tool.

[0009] Preferably, the temporal action recognition model is a combination of a lightweight video understanding model and a spatiotemporal graph convolutional network, and its output includes action embedding vectors and key pose time series for downstream causal analysis.

[0010] Preferably, when the time-aware causal graph model is constructed, the types of the k consecutive historical event nodes preceding the target event node are introduced as contextual features, so as to capture the influence of preceding operations on subsequent violations.

[0011] Preferably, the causal reasoning algorithm adopts the DoWhy framework, and its process includes: formally defining the causal problem, identifying causal effects, selecting an estimator for estimation, and performing sensitivity testing.

[0012] Preferably, the estimator is any one of propensity score matching, double machine learning, or structural equation modeling, and is used to estimate the average or individual treatment effect of the candidate factors on the violation.

[0013] Preferably, the counterfactual reasoning proceeds as follows: within the time window in which the candidate factor occurred, event nodes that share a similar context but in which the candidate factor did not occur are selected as a control group; by comparing the two groups, a quantitative conclusion is obtained of the form "if the candidate factor had not occurred, the probability of violation would have been reduced by a certain percentage."
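A minimal sketch of the control-group comparison in [0013], assuming each event has already been reduced to a hypothetical `EventRecord` carrying a timestamp, a context key (for example, the type of the preceding action), a factor-present flag, and a violation flag. The text does not say whether "a certain percentage" is an absolute or relative reduction; this sketch assumes a difference in percentage points.

```python
from dataclasses import dataclass


@dataclass
class EventRecord:
    """Hypothetical flattened view of one action event node."""
    timestamp_s: float    # event midpoint on the unified timeline
    context: str          # assumed context key, e.g. the preceding action type
    factor_present: bool  # whether the candidate factor occurred in this event
    violation: bool       # whether the event was judged a violation


def counterfactual_reduction(events, window, context):
    """Drop in violation rate (percentage points) when the factor is absent.

    Treated group: events inside the window with the given context and the factor.
    Control group: same window and context, but without the factor.
    Returns None when either group is empty.
    """
    start, end = window
    in_scope = [e for e in events
                if start <= e.timestamp_s <= end and e.context == context]
    treated = [e for e in in_scope if e.factor_present]
    control = [e for e in in_scope if not e.factor_present]
    if not treated or not control:
        return None
    p_treated = sum(e.violation for e in treated) / len(treated)
    p_control = sum(e.violation for e in control) / len(control)
    return 100.0 * (p_treated - p_control)
```

With 3 of 4 treated events and 1 of 4 control events in violation, this returns 50.0, which would read as "without the factor, the violation probability drops by 50 percentage points." A production version would match on richer context than a single key, which is what the estimators in [0012] are for.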

[0014] Preferably, the report includes at least a ranked list of the primary and secondary causes that triggered the violation, a causal chain describing how those causes led to the violation, and quantitative counterfactual conclusions.

[0015] Preferably, after outputting the explanation report of the cause of the violation, a dynamic visualization timeline interface can also be generated. The interface integrates and displays the action event nodes, the causal relationship edges in the causal chain, and environmental snapshots at key time points in chronological order.

[0016] In another aspect, this application also provides a violation analysis system based on AR glasses, comprising the following modules. The time-series data acquisition module is used to collect multi-source time-series data during the operation through augmented reality (AR) glasses; the multi-source time-series data includes at least video streams, inertial measurement unit (IMU) data, GPS data, and environmental parameters. The action event node acquisition module is used to synchronize and time-align the multi-source time-series data and to divide the aligned data into multiple action event nodes using the action boundary detection algorithm. Each action event node represents one action of the operator and includes a start and end time, a participant identifier, a spatial coordinate sequence, a posture curve, environmental snapshot information, and tool usage status information. The temporal action recognition result output module is used to process each action event node with the temporal action recognition model and output the corresponding action category label, violation probability label, and action feature vector; the violation probability label indicates whether the action corresponding to the action event node is a violation. The causal graph model construction module is used to construct a causal graph model whose nodes include event nodes corresponding to the action event nodes and attribute feature nodes corresponding to the attribute features of the action event nodes. The attribute feature nodes include posture feature nodes, motion dynamics feature nodes, ambient lighting feature nodes, ambient noise feature nodes, and tool contact mark nodes. The edges of the causal graph model include edges between attribute feature nodes of the same event node and edges between preceding and subsequent event nodes at adjacent times.
The causal reasoning module is used, for a target action event node identified as a violation, to estimate the causal effect of one or more candidate factors on triggering the violation based on the causal graph model and to perform counterfactual reasoning, which quantitatively estimates how the probability of the violation would change if the candidate factors had not occurred. The report generation module is used to generate and output a report explaining the causes of the violation based on the causal effects and the counterfactual reasoning results.

[0017] This disclosure also provides an electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the AR glasses-based violation analysis method as described above.

[0018] In another aspect, this disclosure provides a computer-readable storage medium having stored thereon computer program instructions that can be executed by a processor to implement the AR glasses-based violation analysis method as described above.

[0019] In another aspect of this disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the aforementioned violation analysis method based on AR glasses.

[0020] This invention collects multi-source time-series data through AR glasses and applies synchronization, action event segmentation, temporal action recognition, time-aware causal graph construction, and causal reasoning. It not only identifies violations accurately but also quantifies their primary and secondary causes and causal chains, producing a structured explanation report. In addition, the dynamic visual timeline presents event nodes, causal relationships, and environmental snapshots intuitively, overcoming the limitation of existing systems that recognize results but not causes. This supports accurate determination of responsibility, targeted training, and root-cause prevention of hidden hazards, thereby improving the precision and efficiency of safety management in scenarios such as live-line work.

Brief Description of the Drawings

[0021] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the accompanying drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. In all the drawings, similar elements or parts are generally identified by similar reference numerals. In the drawings, the elements or parts are not necessarily drawn to scale.

[0022] Figure 1 is a flowchart of a violation analysis method based on AR glasses provided in an embodiment of the present invention;
Figure 2 is a schematic diagram of the temporal action recognition model architecture provided in an embodiment of the present invention;
Figure 3 is a schematic diagram of the causal graph structure construction process provided in an embodiment of the present invention;
Figure 4 is a schematic diagram of the causal effect estimation process provided in an embodiment of the present invention;
Figure 5 is a schematic diagram of a violation analysis system based on AR glasses provided in an embodiment of the present invention;
Figure 6 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.

Detailed Description of the Embodiments

[0023] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0024] It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.

[0025] It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise.

[0026] It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any one of, and all possible combinations of, one or more of the associated listed items, and includes such combinations.

[0027] As shown in Figure 1, this embodiment of the invention discloses a violation analysis method 100 based on AR glasses, including the following steps:
S1. Collect multi-source time-series data during the operation through augmented reality (AR) glasses; the multi-source time-series data includes at least video streams, inertial measurement unit (IMU) data, GPS data, and environmental parameters.
S2. Synchronize and time-align the multi-source time-series data, and divide the aligned data into multiple action event nodes using the action boundary detection algorithm. Each action event node represents one action of the operator and includes a start and end time, a participant identifier, a spatial coordinate sequence, a posture curve, environmental snapshot information, and tool usage status information.
S3. Process each action event node with a temporal action recognition model and output the corresponding action category label, violation probability label, and action feature vector; the violation probability label indicates whether the action corresponding to the action event node is a violation.
S4. Construct a causal graph model. Its nodes include event nodes corresponding to the action event nodes and attribute feature nodes corresponding to the attribute features of the action event nodes; the attribute feature nodes include posture feature nodes, motion dynamics feature nodes, ambient lighting feature nodes, ambient noise feature nodes, and tool contact mark nodes. Its edges include edges between attribute feature nodes of the same event node and edges between preceding and subsequent event nodes at adjacent times.
S5. For a target action event node identified as a violation, use the causal reasoning algorithm on the causal graph model to estimate the causal effect of one or more candidate factors on triggering the violation, and perform counterfactual reasoning to quantitatively estimate how the probability of the violation would change if the candidate factors had not occurred.
S6. Based on the causal effect and the counterfactual reasoning result, generate and output a report explaining the cause of the violation.

[0028] In some embodiments, for step S1, multi-dimensional time-series data is acquired in real time during the operation through an AR glasses terminal that integrates multiple sensors and supports synchronous acquisition.

[0029] The video stream can be captured by the RGB camera built into the AR glasses. For example, the RGB camera captures the video stream of the work scene at a frame rate of 30fps-60fps and a resolution of 1080P. Each frame of the video carries a precise timestamp to ensure temporal alignment with other data. The video stream is used for subsequent action sequence analysis and pose extraction.

[0030] Inertial Measurement Unit (IMU) data can be acquired through IMU sensors. For example, using the six-axis or nine-axis IMU sensors built into AR glasses, acceleration and angular velocity data can be acquired at a sampling rate of 100Hz-200Hz to generate motion dynamics time series, which can be used to capture the operator's limb movement trajectory and posture changes.

[0031] Global Positioning System (GPS) data can be collected via a GPS module. For example, spatial coordinate data, including longitude, latitude, and altitude, can be collected at a sampling rate of 1Hz-10Hz using a GPS module integrated into AR glasses, forming the operator's spatial movement trajectory.

[0032] Environmental parameters can be collected through environmental sensors. For example, if the AR glasses have built-in environmental sensors, they can directly collect ambient illuminance, ambient noise, and ambient temperature; if the AR glasses do not have built-in sensors, they can communicate wirelessly with environmental monitoring nodes deployed in the work area to obtain the above environmental parameters in real time and generate environmental snapshot information.

[0033] The tool usage status can be collected through computer vision recognition. For example, the computer vision module of the AR glasses can identify the tool outline and whether the tool is being held using an object detection algorithm; alternatively, near field communication (NFC) or radio frequency identification (RFID) tags built into the tool can interact with the near field communication module of the AR glasses to determine whether the tool is held, in contact with the work surface, or idle. A tool contact mark is then generated, where 0 indicates no contact and 1 indicates contact.

[0034] In some embodiments, for step S2, for the multi-source time-series data collected by AR glasses, the data timing deviation is eliminated by synchronization alignment processing, and then the semantic segmentation of the worker's action trajectory is realized based on the action boundary detection algorithm to generate action event nodes containing complete attribute information.

[0035] Specifically, using the AR glasses system clock as a reference, a unified timeline is established based on the timestamps of each data source. Linear interpolation is used to time-align data with inconsistent sampling frequencies to ensure the correspondence between multiple data sources at the same point in time.
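The alignment in [0035] can be sketched as resampling every stream onto one reference timeline by linear interpolation. This is a minimal illustration, assuming the reference timestamps come from the AR glasses system clock (for example, video frame times) and that samples outside a stream's recorded range are clamped to its nearest endpoint; `interp_at` and `align_streams` are hypothetical names.

```python
from bisect import bisect_left


def interp_at(times, values, t):
    """Linearly interpolate a sampled signal at time t (times ascending).

    Times before the first or after the last sample are clamped to the endpoints.
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)  # first index with times[i] >= t
    if times[i] == t:
        return values[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def align_streams(reference_times, streams):
    """Resample each stream onto the reference timeline.

    streams maps a stream name to a (timestamps, values) pair, for example
    1 to 10 Hz GPS or 100 to 200 Hz IMU channels.
    """
    return {
        name: [interp_at(ts, vs, t) for t in reference_times]
        for name, (ts, vs) in streams.items()
    }
```

For instance, a 1 Hz GPS channel sampled at 0, 1, 2 s with values 0, 10, 20 resamples to 5.0 at 0.5 s and 15.0 at 1.5 s.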

[0036] Optionally, for data with packet loss or outliers, a sliding window filter or the 3σ criterion can be used to remove outliers to ensure data integrity and accuracy.
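One way to combine the sliding window with the 3σ criterion from [0036] is sketched below. Assumptions not stated in the text: missing samples (packet loss) arrive as `None`, the window statistics exclude the sample being tested so a single large spike cannot inflate its own σ, and flagged or missing samples are replaced by the local mean rather than dropped, which keeps the aligned timeline intact.

```python
from statistics import mean, pstdev


def clean_signal(values, half_window=5):
    """Fill gaps and replace 3-sigma outliers using a sliding window.

    values: list of floats, with None marking a lost sample.
    Each sample is compared with the mean and population std of its
    neighbours (up to half_window on each side, excluding itself).
    """
    cleaned = list(values)
    for i, v in enumerate(values):
        lo, hi = max(0, i - half_window), i + half_window + 1
        neighbors = [x for x in values[lo:i] + values[i + 1:hi] if x is not None]
        if len(neighbors) < 2:
            continue  # not enough context to judge this sample
        mu, sd = mean(neighbors), pstdev(neighbors)
        if v is None or abs(v - mu) > 3 * sd:
            cleaned[i] = mu
    return cleaned
```

A spike of 50.0 inside a run of values near 1.0 is replaced by roughly 1.0, while ordinary samples pass through unchanged.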

[0037] In one embodiment, three key signals—motion velocity signal, attitude angle change signal, and tool contact state signal—are selected from multi-source time-series data as inputs for boundary detection, covering three dimensions: motion, attitude, and tool interaction, to ensure the semantic accuracy of boundary recognition.

[0038] Among them, the motion velocity of the operator's hand, or of the position where the AR glasses are worn, can be obtained by numerically integrating the aligned IMU acceleration data, and the magnitude of the velocity vector is taken as the motion velocity signal.

[0039] Among them, the absolute value of the attitude angle difference between two adjacent target time points can be calculated based on the pitch angle, roll angle and yaw angle obtained by IMU data fusion, and used as the attitude angle change signal.
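The two IMU-derived boundary signals in [0038] and [0039] can be sketched as follows. This assumes gravity has already been removed from the acceleration and that all samples share one fixed frame; trapezoidal integration is one reasonable choice of integration rule, not necessarily the one the applicant uses. Angle differences are wrapped into [-180°, 180°) so that a yaw step from 359° to 1° counts as 2°, which the text does not address explicitly.

```python
import math


def speed_signal(times, accel_xyz, v0=(0.0, 0.0, 0.0)):
    """Velocity magnitude from aligned acceleration via trapezoidal integration.

    Assumes gravity-compensated acceleration in a fixed frame; real IMU
    pipelines also need drift correction, which is omitted here.
    """
    vx, vy, vz = v0
    speeds = [math.hypot(vx, vy, vz)]
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        (ax0, ay0, az0), (ax1, ay1, az1) = accel_xyz[k - 1], accel_xyz[k]
        vx += 0.5 * (ax0 + ax1) * dt
        vy += 0.5 * (ay0 + ay1) * dt
        vz += 0.5 * (az0 + az1) * dt
        speeds.append(math.hypot(vx, vy, vz))
    return speeds


def attitude_change_signal(angles):
    """Per-axis |difference| of (pitch, roll, yaw) between adjacent time points, in degrees."""
    return [
        tuple(abs((b - a + 180.0) % 360.0 - 180.0) for a, b in zip(prev, cur))
        for prev, cur in zip(angles, angles[1:])
    ]
```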

[0040] Specifically, the computer vision recognition system using AR glasses can detect the outline of a tool and the position of the operator's hand in real time, calculate the minimum distance between the hand and the tool, and determine contact when the distance is less than 5cm, marking the tool contact status signal as 1; when the distance is greater than or equal to 5cm, it is determined as "non-contact" and marked as 0, forming a binary tool contact status signal. This computer vision recognition can be based on the YOLOv8-tiny target detection algorithm.
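The 5 cm contact rule in [0040] reduces to a minimum point-to-box distance. The sketch assumes the detector's tool bounding box and the hand keypoints have already been converted into a common metric frame in centimetres (the text does not describe that conversion), and models the tool as an axis-aligned box; both are simplifying assumptions.

```python
import math


def point_to_box_distance(p, box):
    """Euclidean distance from point p=(x, y) to box (xmin, ymin, xmax, ymax); 0 if inside."""
    x, y = p
    xmin, ymin, xmax, ymax = box
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return math.hypot(dx, dy)


def contact_flag(hand_points, tool_box, threshold_cm=5.0):
    """1 if any hand point is closer than threshold_cm to the tool, else 0."""
    d = min(point_to_box_distance(p, tool_box) for p in hand_points)
    return 1 if d < threshold_cm else 0
```

Note that a distance of exactly 5 cm yields 0, matching the "greater than or equal to 5 cm is non-contact" wording.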

[0041] For the three types of input signals, boundary trigger thresholds are set respectively. When any condition is met, it is determined to be the start boundary or end boundary of the action. The specific thresholds are determined by statistical analysis of multiple sets of operation sample data.

[0042] For example, when the motion speed signal exceeds twice the average speed of the previous 5 time points and this state persists for at least 3 time points, an action start boundary is determined; when the motion speed signal falls below 0.5 times the average speed of the previous 5 time points and this state persists for at least 3 time points, an action end boundary is determined.
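A minimal sketch of the speed rule in [0042]. The text leaves two details open, so this sketch assumes them: the "previous 5 time points" baseline is taken just before the candidate onset and held fixed while checking the 3-point persistence, and a start boundary is only sought when no action is open (an end boundary only when one is). Indices are returned; mapping them to timestamps uses the unified timeline.

```python
def speed_boundaries(speed, lookback=5, persist=3, up=2.0, down=0.5):
    """Return (index, "start"|"end") boundaries from a motion speed signal."""
    boundaries = []
    in_action = False
    i = lookback
    while i + persist <= len(speed):
        base = sum(speed[i - lookback:i]) / lookback  # mean of previous points
        window = speed[i:i + persist]
        if not in_action and all(v > up * base for v in window):
            boundaries.append((i, "start"))
            in_action = True
            i += persist
            continue
        if in_action and all(v < down * base for v in window):
            boundaries.append((i, "end"))
            in_action = False
            i += persist
            continue
        i += 1
    return boundaries
```

On a signal of five 1.0 values, seven 5.0 values, and four 1.0 values, the sketch reports a start at index 5 and an end at index 12.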

[0043] For example, when the absolute difference of any attitude angle between adjacent time points (that is, the attitude angle change signal) is greater than 10° per 5 ms, and the change causes the current attitude angle to deviate from the average attitude angle of the previous 10 target time points by more than 15°, an action boundary is determined.
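The attitude rule in [0043] can be sketched as below, assuming the angles are sampled every 5 ms (200 Hz, within the IMU range in [0030]) so that one sample step corresponds to the "per 5 ms" condition, and that both conditions are checked on the same axis. Yaw wrap-around is ignored here for brevity.

```python
def attitude_boundaries(angles, step_thresh=10.0, dev_thresh=15.0, lookback=10):
    """Indices where any axis jumps > step_thresh degrees in one step and
    deviates > dev_thresh degrees from the mean of the previous lookback samples.

    angles: list of (pitch, roll, yaw) tuples in degrees, one per 5 ms (assumed).
    """
    out = []
    for i in range(lookback, len(angles)):
        prev, cur = angles[i - 1], angles[i]
        for axis in range(3):
            step = abs(cur[axis] - prev[axis])
            base = sum(a[axis] for a in angles[i - lookback:i]) / lookback
            if step > step_thresh and abs(cur[axis] - base) > dev_thresh:
                out.append(i)
                break  # one boundary per time point is enough
    return out
```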

[0044] For example, when the tool contact status signal changes from 0 to 1 or from 1 to 0, it is directly determined as an action boundary, and this boundary usually corresponds to the start or end of tool use.

[0045] Optionally, since different signals may trigger multiple boundaries within the same time period, such as when the peak motion speed and the change in tool contact state occur simultaneously, boundary deduplication and merging are required to avoid generating redundant event nodes.

[0046] Specifically, if the time difference between two trigger boundaries is less than 50ms, they are determined to be the same action boundary, and the boundary time point that was triggered first is retained; when the time difference between the boundary triggered by the change in tool contact state and other boundaries is greater than 50ms, the boundary triggered by the change in tool contact state is given priority as the start or end boundary of the action. It can be understood that this is because the change in tool contact state directly corresponds to the core interaction node of the operation action, and the semantics are clearer.
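Tool-contact flips ([0044]) and the 50 ms merge rule ([0046]) can be sketched as follows. Candidates are `(time_ms, source)` pairs; any boundary within 50 ms of the first boundary in a cluster is folded into it, the earliest time is kept, and the triggering sources are recorded so that a downstream step can tell when a tool-contact change was involved. How the tool-contact priority rule interacts with merging is only loosely specified in the text, so treat this as one possible reading.

```python
def tool_contact_boundaries(times_ms, flags):
    """Boundaries wherever the binary tool contact signal changes (0->1 or 1->0)."""
    return [(times_ms[k], "tool") for k in range(1, len(flags)) if flags[k] != flags[k - 1]]


def merge_boundaries(candidates, gap_ms=50.0):
    """Merge boundaries closer than gap_ms, keeping the earliest time.

    candidates: list of (time_ms, source) pairs from the speed, attitude,
    and tool-contact detectors. Returns a list of (time_ms, set_of_sources).
    """
    merged = []
    for t, src in sorted(candidates):
        if merged and t - merged[-1][0] < gap_ms:
            merged[-1][1].add(src)  # same boundary: keep first-triggered time
        else:
            merged.append((t, {src}))
    return merged
```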

[0047] Subsequently, based on the start-end time pairs obtained from action boundary detection, the continuous operation trajectory represented by multi-source time-series data is divided into discrete action event nodes. Each action event node includes start and end time, participant identifier, spatial coordinate sequence, posture curve, environmental snapshot information, and tool usage status information.

[0048] Specifically, the event duration can be obtained by calculating the time difference based on the system timestamps of the recorded start and end boundaries of the action; the unique ID of the operator bound to the AR glasses is linked to the enterprise's employee management system to ensure that the participants are traceable.

[0049] Specifically, the GPS coordinates of all target time points within the start and end time range of the action event node can be extracted to form a spatial coordinate sequence, which is used to reconstruct the spatial movement trajectory of the operator; the attitude angle data of all target time points within the start and end time range can be extracted, and an attitude curve can be generated with time as the horizontal axis and attitude angle as the vertical axis. At the same time, the original attitude angle time series can be stored for attitude feature extraction in subsequent behavior recognition; the environmental parameter data corresponding to the midpoint of the action event node can be selected as an environmental snapshot, including illumination value, noise value, temperature value, humidity value and illumination change.

[0050] Specifically, it can also count the percentage of time during which the tool contact status signal is 1 within the start and end time range of the action event node, and mark it as the tool contact percentage; at the same time, it can record the number of times the tool contact status changes during the event, in order to distinguish between action types of continuous tool use and multiple intermittent tool use.
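The node attributes described in [0047] to [0050] can be sketched as a small record built by slicing the aligned streams to one start/end pair. Field and function names are illustrative. The sketch assumes uniform sampling, so the share of samples with contact flag 1 equals the share of time, and it takes the environmental snapshot from the sample nearest the event midpoint.

```python
from dataclasses import dataclass


@dataclass
class ActionEventNode:
    """Illustrative container for one action event node."""
    event_id: str
    operator_id: str
    start_ms: float
    end_ms: float
    gps_seq: list            # spatial coordinate sequence
    attitude_seq: list       # posture curve samples (pitch, roll, yaw)
    env_snapshot: dict       # environmental parameters at the event midpoint
    tool_contact_ratio: float
    tool_contact_changes: int

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def build_event(event_id, operator_id, start_ms, end_ms, times_ms, gps, attitude, env, contact):
    """Slice aligned streams to [start_ms, end_ms] and derive node attributes.

    Assumes at least one sample falls inside the interval.
    """
    idx = [k for k, t in enumerate(times_ms) if start_ms <= t <= end_ms]
    mid = (start_ms + end_ms) / 2
    k_mid = min(idx, key=lambda k: abs(times_ms[k] - mid))
    flags = [contact[k] for k in idx]
    changes = sum(1 for a, b in zip(flags, flags[1:]) if a != b)
    return ActionEventNode(
        event_id, operator_id, start_ms, end_ms,
        gps_seq=[gps[k] for k in idx],
        attitude_seq=[attitude[k] for k in idx],
        env_snapshot=env[k_mid],
        tool_contact_ratio=sum(flags) / len(flags),
        tool_contact_changes=changes,
    )
```

A high change count with a moderate contact ratio suggests intermittent tool use; a high ratio with few changes suggests continuous use, which is the distinction [0050] draws.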

[0051] In some embodiments, for step S3, the temporal action recognition model is a combination architecture based on a lightweight video understanding model and a spatio-temporal graph convolutional network (ST-GCN). It performs structured analysis on each action event node and outputs action category labels, violation probability labels, action feature vectors, and key pose point time series, providing behavioral-level data for downstream causal inference.

[0052] Figure 2 is a schematic diagram of the temporal action recognition model architecture. Considering the computing power constraints of edge computing on AR glasses, while still meeting the accuracy requirements of action recognition, the temporal action recognition model uses a dual-branch feature extraction and fusion inference architecture comprising a visual feature branch, a pose temporal branch, and a feature fusion and inference layer.

[0053] The visual feature branch is responsible for extracting global visual features, such as tool shape and work scene background, from the video frame sequence of each action event node. For example, the visual feature branch can use VideoMAE-Tiny as the backbone network with no more than 50M parameters, which supports efficient extraction of frame-level features and suits the NPU acceleration capabilities of AR glasses.

[0054] The pose temporal branch is responsible for extracting local dynamic features, such as limb movement trajectories and joint angle changes, from the time series of key pose points. For example, a lightweight 3-layer ST-GCN structure is adopted, including an input layer, a hidden layer, and an output layer; the number of nodes in the hidden layer is 128, which can reduce the amount of computation while preserving the temporal correlation of pose.

[0055] The feature fusion and inference layer is used to concatenate the frame-level feature vector output by the visual feature branch with the pose feature vector output by the pose temporal branch to obtain a fused feature vector, which is then input into a two-layer fully connected network to finally output action recognition results. The complementary features of global vision and local pose improve the recognition accuracy.
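A minimal pure-Python sketch of the fusion and inference layer in [0055]: concatenate the two branch outputs, pass them through two fully connected layers, and normalize with Softmax ([0059]). The ReLU between the layers and the weight layout (one row per output unit) are assumptions; the text does not name an activation. The concatenated vector is also returned because [0061] reuses it as the action feature vector.

```python
import math


def linear(x, W, b):
    """Dense layer: one row of W per output unit."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fusion_head(visual_feat, pose_feat, W1, b1, W2, b2):
    """Concatenate branch features, apply two FC layers, return (fused, probs)."""
    fused = visual_feat + pose_feat                         # list concatenation
    hidden = [max(0.0, h) for h in linear(fused, W1, b1)]  # ReLU (assumed)
    logits = linear(hidden, W2, b2)
    return fused, softmax(logits)
```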

[0056] In one embodiment, the model input data is first preprocessed. This can be done by first extracting keyframes from the video frame sequence of the action event nodes at a frequency of 10fps to ensure that the complete action process is covered; then scaling the size of each keyframe to 224×224 pixels, mapping the pixel values to the range of [-1,1], and adding Gaussian noise with a mean of 0 and a variance of 0.1 to enhance robustness, forming a visual input tensor, which is then input to the visual feature branch.
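Two details of [0056] are easy to get wrong and are sketched here: sampling keyframes at 10 fps from a 30 to 60 fps stream, and adding noise with variance 0.1, which means a standard deviation of about 0.316 rather than 0.1. Resizing to 224×224 is left to an image library and omitted. Pixels are assumed to be 8-bit values in [0, 255]; the added noise can push values slightly outside [-1, 1], which the text does not address.

```python
import math
import random


def keyframe_indices(n_frames, src_fps, target_fps=10.0):
    """Frame indices that sample a src_fps clip at roughly target_fps."""
    step = max(src_fps / target_fps, 1.0)  # never upsample
    indices, k = [], 0
    while int(k * step) < n_frames:
        indices.append(int(k * step))
        k += 1
    return indices


def normalize_pixels(pixels, noise_var=0.1, rng=None):
    """Map 8-bit pixels to [-1, 1] and add zero-mean Gaussian noise of variance noise_var."""
    rng = rng or random.Random()
    sigma = math.sqrt(noise_var)  # random.gauss takes a standard deviation
    return [p / 127.5 - 1.0 + rng.gauss(0.0, sigma) for p in pixels]
```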

[0057] Then, the coordinate time series of 17 key pose points are extracted from the pose curve of the action event node, and the original data is retained at a sampling rate of 200Hz. The coordinate data is normalized, for example, the midpoint of the human shoulder is taken as the origin, and the corresponding coordinates are mapped to the interval [-1,1] to remove the influence of human scale differences. The final pose input tensor is input to the ST-GCN branch.
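The pose normalization in [0057] can be sketched as: center each frame on the shoulder midpoint, then scale the whole sequence into [-1, 1]. The text does not specify the keypoint order; this sketch assumes the common COCO-17 layout, where indices 5 and 6 are the left and right shoulders. Using one scale factor for the whole sequence (rather than per frame) is a design choice that preserves relative motion amplitude across frames.

```python
LEFT_SHOULDER, RIGHT_SHOULDER = 5, 6  # COCO-17 keypoint order (assumption)


def normalize_pose_sequence(frames):
    """Center frames on the shoulder midpoint and scale the sequence into [-1, 1].

    frames: list of frames, each a list of 17 (x, y) keypoints.
    """
    centered = []
    for kps in frames:
        (lx, ly), (rx, ry) = kps[LEFT_SHOULDER], kps[RIGHT_SHOULDER]
        cx, cy = (lx + rx) / 2, (ly + ry) / 2
        centered.append([(x - cx, y - cy) for x, y in kps])
    scale = max(max(abs(x), abs(y)) for kps in centered for x, y in kps) or 1.0
    return [[(x / scale, y / scale) for x, y in kps] for kps in centered]
```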

[0058] After processing each action event node, the model outputs four types of structured data, directly serving downstream causal analysis. Specifically, the output of this time-series action recognition model includes action category labels, violation probability labels, action feature vectors, and time series of key pose points.

[0059] Specifically, the Softmax function can be used to normalize the probability distribution of action categories output by the model, take the category with the highest probability as the final label, and output the confidence score of the category. When the confidence score is less than 0.7, it is marked as a suspected action and requires subsequent manual review.
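The labelling rule can be sketched as a numerically stable softmax followed by an arg-max and the 0.7 review threshold. The class names below are hypothetical placeholders.

```python
import numpy as np


def label_action(logits, class_names, min_conf=0.7):
    """Softmax over class scores, pick the top class, flag low-confidence results for review."""
    z = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    best = int(np.argmax(probs))
    conf = float(probs[best])
    return {"label": class_names[best], "confidence": conf, "suspected": conf < min_conf}


names = ["wiring", "tool_retrieval", "equipment_inspection"]  # hypothetical categories
clear = label_action(np.array([2.0, 0.5, 0.1]), names)
ambiguous = label_action(np.array([1.0, 0.9, 0.8]), names)
```

Here `ambiguous` has nearly uniform probabilities, so it is marked as a suspected action that needs manual review.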

[0060] Specifically, the model outputs the probability that an action event is a violation. When the probability is greater than or equal to 0.8, the event is determined to be a violation event, which triggers the subsequent causal analysis process; when the probability is less than 0.5, it is determined to be a "compliant event", and only the recognition result is stored without entering causal analysis.
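The two thresholds translate into a small triage function. The text does not say how probabilities between 0.5 and 0.8 are handled, so the sketch returns a hypothetical "undetermined" label for that band.

```python
def triage(violation_prob, violation_threshold=0.8, compliant_threshold=0.5):
    """Route an action event according to its violation probability."""
    if violation_prob >= violation_threshold:
        return "violation"     # triggers the causal analysis
    if violation_prob < compliant_threshold:
        return "compliant"     # store the recognition result only
    return "undetermined"      # 0.5 <= p < 0.8: handling not specified in the text
```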

[0061] Specifically, the fused feature vector output by the feature fusion and inference layer can be extracted as the action feature vector. This vector combines visual and pose information and is used to assign attribute values to the corresponding nodes in the downstream causal graph model, supporting causal relationship calculation.

[0062] Specifically, the model also outputs the processed time series of key pose points, which preserves the details of dynamic pose changes and is used in the subsequent causal analysis to identify potential violation triggers such as pose stability and action deviation. For example, pose stability can be judged by calculating the variance of the pose point coordinates.
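One way to turn "variance of the pose point coordinates" into a single stability statistic is to average the per-joint, per-axis variance over time, as sketched here; the exact aggregation is an assumption. A lower value indicates a steadier pose.

```python
import numpy as np


def pose_stability(seq):
    """seq: (T, 17, 2) normalised key points. Mean over joints/axes of the variance over time."""
    return float(seq.var(axis=0).mean())


rng = np.random.default_rng(0)
steady = np.zeros((100, 17, 2))
shaky = rng.normal(scale=0.2, size=(100, 17, 2))
```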

[0063] In some embodiments, for step S4, the nodes of the causal graph model are divided into two categories, forming a two-level "event-attribute" node structure.

[0064] Event nodes: each event node corresponds one-to-one with an action event node generated in S2 and has the same ID as that action event. Its core associated information includes the event start and end times and the participant identifier, and it serves as the aggregation carrier of its attribute feature nodes. Attribute feature nodes: each event node is associated with a set of attribute feature nodes.

[0065] The attribute feature nodes include posture feature nodes, motion dynamics feature nodes, ambient lighting feature nodes, ambient noise feature nodes, and tool contact marker nodes.

[0066] Specifically, the posture feature node is associated with the time series of key pose points output by S3 and with pose stability statistics; the motion dynamics feature node is associated with statistical features derived from the IMU data, such as mean acceleration, peak angular velocity, and rate of change of motion velocity; the ambient lighting feature node is associated with the illumination value and its rate of change in the environmental snapshot; the ambient noise feature node is associated with the noise value and noise duration in the environmental snapshot; and the tool contact marker node is associated with the proportion of tool contact states and the number of contact-state changes.

[0067] In one embodiment, the edges of the causal graph model are used to represent potential causal relationships, including edges between different attribute feature nodes of the same event node, which represent the direct causal influence between different attributes in the same event; and edges between preceding and subsequent event nodes at adjacent times, which represent the induced influence of the preceding event on the subsequent event.
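The two-level node structure and the two edge families can be sketched with plain tuples. The `belongs_to` association edge, the chosen intra-event pair, and the use of posture as the only cross-time attribute are illustrative assumptions.

```python
ATTRIBUTE_TYPES = ["posture", "motion_dynamics", "ambient_lighting", "ambient_noise", "tool_contact"]


def build_graph(event_ids, intra_pairs, cross_attrs):
    """Return (nodes, edges) for an event-attribute causal graph.

    event_ids: chronologically ordered action event IDs.
    intra_pairs: (cause_attr, effect_attr) pairs linked inside every event.
    cross_attrs: attributes linked from event t-1 to the same attribute in event t.
    """
    nodes, edges = [], []
    for eid in event_ids:
        nodes.append(("event", eid))
        for attr in ATTRIBUTE_TYPES:
            nodes.append(("attr", eid, attr))
            edges.append((("attr", eid, attr), ("event", eid), "belongs_to"))
        for cause, effect in intra_pairs:  # direct influence within one event
            edges.append((("attr", eid, cause), ("attr", eid, effect), "intra_event"))
    for prev, curr in zip(event_ids, event_ids[1:]):  # adjacent events only
        edges.append((("event", prev), ("event", curr), "cross_time"))
        for attr in cross_attrs:
            edges.append((("attr", prev, attr), ("attr", curr, attr), "cross_time"))
    return nodes, edges


nodes, edges = build_graph(["E1", "E2", "E3"], [("ambient_lighting", "posture")], ["posture"])
```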

[0068] Preferably, to capture the impact of preceding operations on subsequent violations, contextual features are introduced into the model construction. Specifically, the types of the k consecutive historical event nodes preceding the target action event node corresponding to the violation can be obtained as contextual features, where k is a positive integer greater than or equal to 1. These contextual features are then used as attributes of the target event node corresponding to the target action event node to construct the causal graph model. For example, the default value is k=3, which can be adjusted within the range 2 to 5 depending on the complexity of the task scenario, so as to cover the potential impact of recent operation sequences on the current event.

[0069] For example, the format of the context feature is a k-dimensional action category label sequence. For instance, if the action category labels of the three historical events preceding the target event are coded as tool retrieval action (2), equipment inspection action (5), and wiring action (8), then the context feature is represented as [2,5,8], which is directly associated with the attribute set of the target event node.

[0070] In this way, a context feature attribute node can be added to each event node to store the corresponding k-dimensional historical action category label sequence. This node is directly connected to the target event node and establishes potential association edges with other attribute feature nodes of the target event.
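The context feature is simply the window of the k preceding action-category codes. In the sketch, a target with fewer than k predecessors yields a shorter sequence; how the applicant pads such cases is not stated, so that behaviour is an assumption.

```python
def context_feature(labels, target_index, k=3):
    """Action-category codes of the k events before target_index, oldest first."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return labels[max(0, target_index - k):target_index]


# Codes 2, 5 and 8 reproduce the example in [0069]: tool retrieval, equipment inspection, wiring.
event_labels = [7, 2, 5, 8, 3]
target_node = {"event_id": "E5", "context": context_feature(event_labels, 4)}
```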

[0071] In one embodiment, as shown in Figure 3, which illustrates the process of constructing the causal graph structure, the process includes the following steps. S301: Construct an initial causal graph based on prior knowledge, combining operational safety specifications, ergonomic principles, and domain expert experience to predefine clear causal edges that form the skeleton of the initial causal graph.

[0072] For example, the causal edges between different attributes of the same event include the edge between the ambient lighting feature node and the posture feature node. The underlying prior knowledge is that when the illumination value is below 200 lux or the illumination change rate exceeds 500 lux/s, the operator's posture stability decreases.

[0073] For example, a cross-time causal edge can connect the posture feature node of a preceding event to the posture feature node of the following event, representing the temporal persistence of posture anomalies.

[0074] At the same time, an initial confidence value is added to the initial causal edge. This initial confidence value can be assigned based on expert experience and labeled with an expert-defined identifier.
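The expert skeleton can be held as a dictionary of edges carrying a confidence and an "expert" source tag. The lighting rule mirrors the thresholds in [0072]; treating the change rate as an absolute value and the specific confidence numbers are assumptions.

```python
def lighting_degrades_posture(lux, lux_rate, min_lux=200.0, max_rate=500.0):
    """Prior rule: below 200 lux or a change faster than 500 lux/s lowers posture stability."""
    return lux < min_lux or abs(lux_rate) > max_rate  # abs(): either direction counts (assumption)


def initial_skeleton(expert_edges):
    """expert_edges: iterable of (cause, effect, confidence) -> dict keyed by (cause, effect)."""
    return {
        (cause, effect): {"confidence": conf, "source": "expert", "status": "active"}
        for cause, effect, conf in expert_edges
    }


skeleton = initial_skeleton([
    ("ambient_lighting", "posture", 0.8),  # hypothetical expert confidence values
    ("posture@t-1", "posture@t", 0.75),
])
```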

[0075] S302: After accumulating a certain number of action event samples, the initial causal graph is optimized using a causal discovery algorithm.

[0076] Specifically, the attribute feature values, context feature sequences, and violation labels of all event nodes can be extracted to construct a standardized dataset in which the feature values are normalized to the [0,1] interval. A constraint-based causal discovery algorithm is then used to test conditional independence between variables and to add potential causal edges not covered by the initial causal graph.
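Constraint-based discovery (for example the PC algorithm) is driven by conditional independence tests. The text does not name the test, so the sketch uses the Fisher-z partial-correlation test, one common choice that assumes roughly linear-Gaussian relations, together with min-max scaling to [0,1].

```python
import math

import numpy as np


def minmax(col):
    """Scale a feature column to [0, 1]; constant columns map to 0."""
    lo, hi = col.min(), col.max()
    return np.zeros_like(col, dtype=float) if hi == lo else (col - lo) / (hi - lo)


def fisher_z_ci_test(x, y, z=None):
    """Two-sided p-value for H0: x independent of y given z, via partial correlation."""
    n = len(x)
    k = 0
    if z is not None and z.size:
        design = np.column_stack([np.ones(n), z])
        # Residualise x and y on the conditioning set.
        x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
        y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
        k = design.shape[1] - 1
    r = float(np.corrcoef(x, y)[0, 1])
    r = min(max(r, -0.999999), 0.999999)
    stat = math.sqrt(n - k - 3) * 0.5 * math.log((1 + r) / (1 - r))
    return math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))


rng = np.random.default_rng(0)
z = rng.normal(size=500)      # common cause
x = z + rng.normal(size=500)
y = z + rng.normal(size=500)
p_marginal = fisher_z_ci_test(x, y)    # dependent: tiny p-value
p_given_z = fisher_z_ci_test(x, y, z)  # independent once z is controlled for
```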

[0077] For the attribute feature nodes of the same event, potential associations are examined between environmental noise features and motion dynamics features, and between posture features and tool contact markers. If the p-value of the conditional independence test is less than or equal to 0.05, a causal relationship is determined and a new causal edge is added between the corresponding attribute feature nodes, with an initial confidence of 0.7.

[0078] For cross-time event nodes, examine the long-term correlation between the event features at time t-2 and the event features at time t. If the p-value is less than or equal to 0.05, add a cross-time causal edge with an initial confidence value of 0.6. For the causal edges defined by experts in the initial causal graph, their conditional independence is verified through data-driven testing. If the p-value is greater than 0.1, it is determined that the causal relationship has no statistical significance in the actual data, and its confidence level is lowered to below 0.3. It is marked as pending verification and will be verified again after accumulating more samples. If there is still no statistical support, the edge is deleted.
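The edge-revision rules of [0077] and [0078] can be expressed as one update pass over test results. The demoted confidence of 0.25 ("below 0.3") and the decision to delete a pending edge on its next failed re-test are interpretations; the text leaves the re-test schedule open.

```python
def update_edges(edges, tests, alpha=0.05, reject_alpha=0.1, demoted_conf=0.25):
    """Apply the data-driven edge rules.

    edges: dict (cause, effect) -> {"confidence", "source", "status"}.
    tests: (cause, effect, kind, p_value) tuples; kind is "intra" or "lag2" for candidate
           new edges, or "expert" for re-testing an expert-defined edge.
    """
    for cause, effect, kind, p in tests:
        key = (cause, effect)
        if kind in ("intra", "lag2") and key not in edges and p <= alpha:
            conf = 0.7 if kind == "intra" else 0.6  # same-event vs. t-2 -> t edge
            edges[key] = {"confidence": conf, "source": "discovered", "status": "active"}
        elif kind == "expert" and key in edges and p > reject_alpha:
            if edges[key]["status"] == "pending":
                del edges[key]  # still no statistical support after re-test
            else:
                edges[key]["confidence"] = min(edges[key]["confidence"], demoted_conf)
                edges[key]["status"] = "pending"
    return edges
```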

[0079] S303: Refine the cross-time causal relationships, improving their accuracy through time-window constraints and correlation-strength calculations.

[0080] Cross-time causal edges may only connect the target event node with its k preceding historical event nodes, where k is the same value used to compute the context features (default 3). In other words, cross-time causal edges span at most the combined duration of the k consecutive events preceding the start time of the target event, which avoids interference from weak associations over excessively long time spans.
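The window constraint amounts to slicing the chronological event list. The event dictionary fields and times in seconds are assumed for illustration.

```python
def allowed_cross_time_sources(events, target_index, k=3):
    """IDs of events allowed as sources of cross-time edges into events[target_index].

    events: chronologically ordered dicts with "id", "start", "end" (seconds).
    Returns (eligible_ids, window_start, total_duration).
    """
    history = events[max(0, target_index - k):target_index]
    window_start = history[0]["start"] if history else events[target_index]["start"]
    total_duration = sum(e["end"] - e["start"] for e in history)
    return [e["id"] for e in history], window_start, total_duration


events = [{"id": f"E{i}", "start": 10 * i, "end": 10 * i + 8} for i in range(5)]
eligible, window_start, duration = allowed_cross_time_sources(events, 4)
```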

[0081] Mutual information values can be used to quantify the correlation strength of causal edges across time; a higher mutual information value indicates a stronger causal relationship.

[0082] Specifically, for an attribute feature of the preceding event and an attribute feature of the subsequent event, their mutual information is calculated and normalized to the interval [0,1], and the normalized value serves as the confidence update value of the cross-time causal edge. If the original confidence value differs from the mutual information value by more than 0.2, the confidence is updated to the mutual information value, so that the confidence of the causal edge conforms to the patterns observed in the actual data.
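The text does not say how mutual information is normalised. The sketch assumes equal-width binning and division by sqrt(H(X)·H(Y)), which maps identical variables to 1 and independent ones close to 0, followed by the 0.2 update rule.

```python
import numpy as np


def normalized_mutual_info(x, y, bins=10):
    """Binned mutual information scaled to [0, 1] by sqrt(H(X) * H(Y))."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    mi = float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())
    hx = -float((px[px > 0] * np.log(px[px > 0])).sum())
    hy = -float((py[py > 0] * np.log(py[py > 0])).sum())
    return float(mi / np.sqrt(hx * hy)) if hx > 0 and hy > 0 else 0.0


def update_confidence(current, nmi, tolerance=0.2):
    """Adopt the NMI as the edge confidence when the two disagree by more than the tolerance."""
    return nmi if abs(current - nmi) > tolerance else current


rng = np.random.default_rng(0)
prev_feat = rng.uniform(size=1000)
unrelated = rng.uniform(size=1000)
```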

[0083] Optionally, a graph database is used to store the causal graph model, with each node and edge associated with structured attributes. Event nodes store ID, start and end times, participant identifiers, and contextual feature sequences; attribute feature nodes store corresponding feature values and statistical indicators; causal edges store confidence scores, association strengths, construction methods, and update timestamps to ensure model traceability.
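The stored attributes can be mirrored as record types before they are written to whatever graph database is used. The dataclasses below are a database-agnostic sketch; field names are hypothetical and no particular graph-database API is implied.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class EventNode:
    event_id: str
    start: float
    end: float
    participant_id: str
    context: List[int] = field(default_factory=list)  # k-dimensional label sequence


@dataclass
class AttributeNode:
    event_id: str
    attr_type: str  # e.g. "posture", "ambient_lighting"
    value: float
    stats: Dict[str, float] = field(default_factory=dict)


@dataclass
class CausalEdge:
    cause: str
    effect: str
    confidence: float
    strength: float   # association strength, e.g. normalised mutual information
    method: str       # how the edge was built: "expert", "ci_test", ...
    updated_at: str   # ISO-8601 timestamp for traceability


event = EventNode("E5", 40.0, 48.0, "worker-07", [2, 5, 8])
edge = CausalEdge("ambient_lighting", "posture", 0.8, 0.42, "expert", "2026-02-25T10:00:00")
record = asdict(edge)  # plain dict, ready to serialise as edge properties
```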

[0084] In some embodiments, for S5, based on the causal graph model, causal reasoning is carried out through the DoWhy framework to estimate the causal effects of candidate factors and perform counterfactual quantitative analysis for the target event node corresponding to the violation, thereby clarifying the core influencing factors that trigger the violation.

[0085] First, target action event nodes identified as violations can be filtered from the S3 output results. The filtering criteria are: the probability label of the violation is greater than 0.8 and the confidence of the action category label is greater than 0.7, ensuring that the violation attributes of the event are clear and unambiguous. A unique violation event ID is assigned to the target action event node, and it is associated with its corresponding node in the causal graph model and all attribute features and context features.
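The filtering rule and ID assignment can be sketched as below. Note that [0085] uses a strict "greater than 0.8", whereas [0060] uses "greater than or equal to 0.8"; the sketch follows [0085]. The ID format is hypothetical.

```python
def select_targets(results, min_violation=0.8, min_conf=0.7, first_id=1):
    """results: dicts with "event_id", "violation_prob", "confidence"."""
    selected = [r for r in results
                if r["violation_prob"] > min_violation and r["confidence"] > min_conf]
    return [{**r, "violation_id": f"V{i:05d}"} for i, r in enumerate(selected, start=first_id)]


results = [
    {"event_id": "E1", "violation_prob": 0.90, "confidence": 0.80},
    {"event_id": "E2", "violation_prob": 0.80, "confidence": 0.90},  # not strictly > 0.8
    {"event_id": "E3", "violation_prob": 0.95, "confidence": 0.60},  # label too uncertain
]
targets = select_targets(results)
```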

[0086] Then, based on the target event node and causal edge information associated with the target action event node in the causal graph model, candidate factors, namely potential violation triggering factors, are extracted.

[0087] This process filters the attribute feature nodes that have direct or indirect causal relationships with the target event node and retains only the attribute features that are abnormal in the target event. It also incorporates high-risk preceding actions contained in the context features, such as labels like "unchecked equipment" or "unauthorized handling of tools" among the k preceding historical events. The result is a set of candidate factors, each labeled with its attribute type and an anomaly description.

[0088] Then, causal effect estimation is performed based on the DoWhy framework. As shown in Figure 4, which is a schematic diagram of the causal effect estimation process, the estimation specifically includes the following steps. S401: Based on observational data, estimate the causal effect of the intervention variable X on the outcome variable Y, that is, whether an abnormal occurrence of X leads to a significant increase in Y, and the quantitative size of that effect.

[0089] The intervention variable X is a single candidate factor. Discrete factors take the value 1 (abnormal) or 0 (normal); continuous factors use their normalized values directly. The outcome variable Y is the violation probability in the violation probability label of the target event. The confounding variables Z are attribute feature nodes, selected from the causal graph model, that affect both the intervention variable X and the outcome variable Y, such as the worker's experience level, the work period, and the environmental noise value, so that potential confounders are covered.

[0090] S402: Identify the causal effect using the backdoor criterion of the DoWhy framework.

[0091] This involves traversing all paths between X and Y in the causal graph model and identifying the backdoor paths, i.e., paths between X and Y that begin with an edge pointing into X, such as X←worker experience level→Y.

[0092] Then, check whether the set of confounding variables Z satisfies the condition of blocking all backdoor paths and not containing descendant nodes of X. If it satisfies the condition, the causal effect is determined to be identifiable; if it does not satisfy the condition, supplement the missing confounding variables from the causal graph until the backdoor criterion is met.
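DoWhy performs this identification itself (for example through `CausalModel.identify_effect()`). To show what the check involves, the pure-Python sketch below tests the backdoor criterion directly: Z must contain no descendant of X, and X and Y must be d-separated by Z once X's outgoing edges are removed. d-separation is tested with the standard moralised-ancestral-graph method. The graph encoding (dict of child sets) is an assumption.

```python
def descendants(graph, node):
    """graph: dict node -> set of children. All descendants of node."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def ancestral_set(graph, nodes):
    """Ancestors of `nodes` (inclusive) plus a child -> parents map."""
    parents = {}
    for u, children in graph.items():
        for v in children:
            parents.setdefault(v, set()).add(u)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen, parents


def d_separated(graph, x, y, z):
    """True if x and y are d-separated given z (moralised ancestral graph test)."""
    keep, parents = ancestral_set(graph, {x, y} | set(z))
    undirected = {n: set() for n in keep}
    for v in keep:
        ps = list(parents.get(v, ()))
        for p in ps:
            undirected[p].add(v)
            undirected[v].add(p)
        for i, a in enumerate(ps):  # "marry" parents of a common child
            for b in ps[i + 1:]:
                undirected[a].add(b)
                undirected[b].add(a)
    seen, stack = {x}, [x]  # search from x while skipping conditioned nodes
    while stack:
        for nb in undirected[stack.pop()]:
            if nb not in seen and nb not in z:
                seen.add(nb)
                stack.append(nb)
    return y not in seen


def satisfies_backdoor(graph, x, y, z):
    if set(z) & descendants(graph, x):
        return False
    # With X's outgoing edges removed, any remaining X-Y connection is a backdoor path.
    cut = {u: (set() if u == x else set(vs)) for u, vs in graph.items()}
    return d_separated(cut, x, y, set(z))


G = {"experience": {"X", "Y"}, "X": {"Y"}, "noise": {"X"}}
```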

[0093] S403: Select an appropriate estimator based on the type of candidate factor X, and calculate the average treatment effect (ATE) or individual treatment effect (ITE).

[0094] For discrete intervention variables with a small number of confounding variables (e.g., 5 or fewer), propensity score matching (PSM) estimators can be used. For continuous intervention variables, or scenarios with a large number of confounding variables and complex nonlinear relationships, double machine learning (DML) estimators can be used. For scenarios with clear causal relationships and direct / indirect paths in the causal graph, structural equation modeling (SEM) can be used as an estimator, which can simultaneously estimate direct and indirect effects.

[0095] For propensity score matching, the propensity score of each event sample, i.e., the probability that the intervention variable X = 1 (abnormal), can be calculated with a logistic regression model on the confounding variables Z. Control samples whose propensity score differs from that of the target event by no more than 0.05 and whose contextual feature similarity exceeds 90% are matched to the target event, and the difference between the Y value of the target event and the average Y value of the matched control samples is calculated as the ATE.
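A NumPy sketch of this matching step follows, with logistic regression fitted by plain gradient descent so the example has no extra dependencies. The synthetic data (true effect 0.4), the choice of target, and the all-ones context similarity are hypothetical.

```python
import numpy as np


def fit_logistic(z, x, lr=0.1, steps=2000):
    """Gradient-descent logistic regression of binary X on confounders Z; returns a scorer."""
    design = np.column_stack([np.ones(len(z)), z])
    w = np.zeros(design.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-design @ w))
        w -= lr * design.T @ (p - x) / len(x)
    return lambda zz: 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(zz)), zz]) @ w))


def psm_effect(target_idx, z, x, y, context_sim, caliper=0.05, min_sim=0.9):
    """Target Y minus the mean Y of matched normal (X=0) controls, as in [0095]."""
    score = fit_logistic(z, x)(z)
    controls = (x == 0) & (np.abs(score - score[target_idx]) <= caliper) & (context_sim > min_sim)
    if not controls.any():
        return None  # no admissible control sample
    return float(y[target_idx] - y[controls].mean())


rng = np.random.default_rng(0)
n = 400
z = rng.normal(size=n)                                   # confounder
x = (z + rng.normal(size=n) > 0).astype(float)           # 1 = factor abnormal
y = 0.3 + 0.4 * x + 0.1 * z + rng.normal(0, 0.05, n)     # synthetic violation probability
target = int(np.argmin(np.where(x == 1, np.abs(z), np.inf)))  # an abnormal event with z near 0
effect = psm_effect(target, z, x, y, context_sim=np.ones(n))
```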

[0096] For double machine learning, the dataset can be split into a training set and a validation set. Gradient boosting tree models can be used to fit the relationship between X and Z and between Y and Z, respectively, yielding two sets of residuals: X residual = X - predicted X, and Y residual = Y - predicted Y. The Y residual is then regressed linearly on the X residual, and the regression coefficient is the causal effect. For example, a coefficient of -0.002 means that for every 100 lux increase in illumination, the probability of violation decreases by 0.2.
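The residual-on-residual procedure can be sketched with 2-fold cross-fitting. Ordinary least squares stands in for the gradient-boosting nuisance models so the example runs with NumPy alone; this substitution is an assumption and only suits roughly linear relations. The synthetic data encode the -0.002 per lux effect from the example, with illumination confounded by Z.

```python
import numpy as np


def ols_fit(a, b):
    """Fit b ~ 1 + a by least squares; return a predictor."""
    design = np.column_stack([np.ones(len(a)), a])
    coef = np.linalg.lstsq(design, b, rcond=None)[0]
    return lambda aa: np.column_stack([np.ones(len(aa)), aa]) @ coef


def dml_effect(x, y, z, seed=0):
    """Partialling-out double machine learning with 2-fold cross-fitting."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = [idx[: len(x) // 2], idx[len(x) // 2:]]
    x_res, y_res = np.empty(len(x)), np.empty(len(y))
    for train, test in (folds, folds[::-1]):
        x_res[test] = x[test] - ols_fit(z[train], x[train])(z[test])  # X residual
        y_res[test] = y[test] - ols_fit(z[train], y[train])(z[test])  # Y residual
    return float((x_res @ y_res) / (x_res @ x_res))  # slope of Y residual on X residual


rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)                                     # confounder, e.g. work period
lux = 300 + 100 * z + rng.normal(0, 50, n)                 # illumination in lux
p = 0.6 - 0.002 * lux + 0.3 * z + rng.normal(0, 0.05, n)   # synthetic, unbounded "probability"
theta = dml_effect(lux, p, z)
naive = float(np.polyfit(lux, p, 1)[0])                    # ignores z, so it is biased
```

Because `z` raises both illumination and the outcome, the naive slope is pulled away from -0.002, while the residualised estimate recovers it.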

[0097] For structural equation modeling, structural equations can be constructed based on the path relationships in the causal graph; the maximum likelihood estimation method can be used to solve the equation parameters to obtain the direct effect, indirect effect and total effect of X on Y.
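For the simplest linear path model (X → M → Y plus X → Y, with independent Gaussian errors), maximum likelihood estimates of the path coefficients coincide with per-equation least squares, so the effects can be sketched as below. The mediator interpretation (e.g. posture deviation) and the synthetic coefficients are assumptions.

```python
import numpy as np


def linear_path_effects(x, m, y):
    """Direct, indirect and total effect of X on Y in the path model X -> M -> Y, X -> Y."""
    ones = np.ones(len(x))
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]  # X -> M
    coef = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0]
    direct, b = coef[1], coef[2]                                          # X -> Y, M -> Y
    return {"direct": float(direct), "indirect": float(a * b), "total": float(direct + a * b)}


rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                          # e.g. lighting anomaly score
m = 0.5 * x + rng.normal(0, 0.3, n)             # e.g. posture deviation
y = 0.2 * x + 0.6 * m + rng.normal(0, 0.1, n)   # e.g. violation probability
effects = linear_path_effects(x, m, y)
```

With least squares and intercepts included, the total effect equals the simple regression slope of Y on X exactly, which is a handy sanity check.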

[0098] S404: Use the E-value method to test the robustness of the causal effects against interference from unobserved confounding variables.

[0099] The E-value is calculated to measure the strength of association required for unobserved confounding variables to overturn the current causal effect conclusion. If the E-value is greater than or equal to 1.5, the causal effect is considered robust and not significantly affected by unobserved confounding variables; if 1.0 ≤ E-value < 1.5, it is marked as moderately robust and requires further validation with more samples; if the E-value is less than 1.0, the causal effect is considered unrobust and the candidate factor is removed.
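The E-value for a risk ratio is commonly computed with the VanderWeele-Ding formula RR + sqrt(RR·(RR - 1)), inverting protective ratios first. Applying it to a ratio of violation probabilities is an assumption about how the effect would be expressed. This formula never returns a value below 1, so the lowest tier applies only if another sensitivity measure is used.

```python
import math


def e_value(rr):
    """E-value for a risk ratio; ratios below 1 are inverted first."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    rr = 1.0 / rr if rr < 1 else rr
    return rr + math.sqrt(rr * (rr - 1.0))


def robustness(e):
    """Tiering described in [0099]."""
    if e >= 1.5:
        return "robust"
    if e >= 1.0:
        return "moderately robust"  # validate again with more samples
    return "not robust"             # remove the candidate factor
```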

[0100] In one embodiment, counterfactual quantitative analysis is achieved by constructing a control sample based on the causal effect estimation results.

[0101] Specifically, control event samples can be selected within the time window of the target event. These control event samples must have a contextual similarity of more than 90% with the target event, differ by less than 10% from the target event in the values of the confounding variables Z, and show no abnormality in the candidate factor X. Optionally, at least 5 control samples are required. If there are fewer, the time window is expanded to the preceding 2 hours; if the samples are still insufficient, the case is marked as having insufficient control samples, and the counterfactual conclusion is derived from the causal effect coefficient.

[0102] Then, quantitative conclusions can be calculated. When the control group sample is sufficient, the average violation probability Y_control of the control group sample is calculated, and then the percentage reduction in violation probability is calculated as (Y_treatment - Y_control) / Y_treatment × 100%, where Y_treatment is the violation probability of the target event.

[0103] For example, if the target event Y_treatment=0.95 and the control group average Y_control=0.25, then the counterfactual conclusion is that if the candidate factor X does not show any abnormality, the probability of this violation will decrease from 95% to 25%, a reduction of 73.7%.

[0104] When control samples are insufficient, the conclusion is derived from the causal effect (the ATE or regression coefficient). For example, if ATE = 0.7, i.e., an abnormality in X raises the violation probability by 70%, the counterfactual conclusion is that if the candidate factor X were not abnormal, the probability of violation would be reduced by about 70%.
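Both branches can be sketched in one function. In the fallback branch the sketch assumes the ATE is an absolute change in probability and subtracts it from Y_treatment; the text states the conclusion only approximately, so this reading is an assumption.

```python
def counterfactual(y_treatment, control_ys=(), ate=None, min_controls=5):
    """Return (counterfactual probability, relative reduction, basis)."""
    if len(control_ys) >= min_controls:
        y_cf = sum(control_ys) / len(control_ys)  # Y_control
        basis = f"sample size of the control group: {len(control_ys)}"
    elif ate is not None:
        y_cf = max(y_treatment - ate, 0.0)  # assumption: ATE is an absolute probability change
        basis = "based on ATE derivation"
    else:
        raise ValueError("insufficient control samples and no causal-effect estimate")
    return y_cf, (y_treatment - y_cf) / y_treatment, basis


# Example from [0103]: Y_treatment = 0.95, controls averaging 0.25.
y_cf, reduction, basis = counterfactual(0.95, [0.2, 0.25, 0.3, 0.25, 0.25])
```

Here `reduction` is (0.95 - 0.25) / 0.95, about 0.737, matching the 73.7% in [0103].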

[0105] Optionally, the reliability of the conclusions can be ensured by comparing the consistency between the counterfactual conclusions and the causal effect estimation results. If they are consistent, the conclusions are retained; if they are inconsistent, the control group samples are re-screened or the estimator is adjusted.

[0106] In one embodiment, the causal effect value, robustness, and counterfactual conclusion of each candidate factor are integrated and sorted from largest to smallest by the absolute value of the causal effect to form a priority list of candidate factors.

[0107] In some embodiments, for step S6, based on the causal effect estimation and counterfactual reasoning results, a report explaining the causes of violations is generated, which includes the core causes, causal chains and quantitative conclusions. A dynamic visualization timeline interface is also constructed to intuitively present the process of violation formation, providing support for management decisions and training optimization.

[0108] The core ranking indicator is the absolute value of the causal effect output by S5, which is then combined with the E-value from the sensitivity test for weight adjustment to finally calculate the overall impact score. Specifically, the overall score = absolute value of causal effect × E-value.

[0109] Then, the top N candidate factors in the overall score ranking can be taken as the primary and secondary factors. The primary factor must have an overall score greater than 0.5 and an E-value greater than 1.5, and the secondary factor must have an overall score greater than 0.3 and an E-value greater than 1.0.
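Scoring and tiering can be sketched as follows. The factor values reuse the ATEs from [0110], while the E-values and the third factor are hypothetical.

```python
def rank_factors(factors, top_n=3):
    """factors: dicts with "name", "effect" (ATE or coefficient) and "e_value"."""
    scored = [{**f, "score": abs(f["effect"]) * f["e_value"]} for f in factors]
    ranked = sorted(scored, key=lambda f: f["score"], reverse=True)[:top_n]
    primary = [f for f in ranked if f["score"] > 0.5 and f["e_value"] > 1.5]
    secondary = [f for f in ranked
                 if f not in primary and f["score"] > 0.3 and f["e_value"] > 1.0]
    return primary, secondary


factors = [
    {"name": "abnormal ambient light", "effect": 0.72, "e_value": 2.0},   # E-values assumed
    {"name": "unchecked equipment", "effect": 0.35, "e_value": 1.3},
    {"name": "ambient noise", "effect": 0.10, "e_value": 1.2},
]
primary, secondary = rank_factors(factors)
```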

[0110] Each factor includes a factor name, attribute type, causal effect value, robustness level, and anomaly description. For example, a primary cause is: abnormal ambient light value (environment class, ATE=0.72, high robustness, light value 50 lux < threshold 200 lux); a secondary cause is: preceding action includes unchecked equipment (preceding action class, ATE=0.35, moderate robustness, the action label of the event at time t-1 is 'unchecked equipment').

[0111] Based on the causal edges of the causal graph model, the complete logical path from the primary and secondary causes to the violation outcome can be traced. Causal edges with a confidence of at least 0.7 are given priority to ensure that the chain is sound. Natural-language text is then generated describing the chain from the primary and secondary causes through the intermediate influences to the outcome, clarifying the relationship and quantitative impact of each link.
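Chain tracing can be sketched as a depth-first search over sufficiently confident edges. Keeping only edges with confidence of at least 0.7 is a simplification of "given priority"; a fallback to a lower threshold when no chain exists would be a further assumption. Node names are hypothetical.

```python
def trace_chains(edges, source, target, min_conf=0.7):
    """All simple paths source -> target using edges with confidence >= min_conf.

    edges: dict (cause, effect) -> confidence.
    """
    out = {}
    for (u, v), conf in edges.items():
        if conf >= min_conf:
            out.setdefault(u, []).append(v)
    paths, stack = [], [[source]]
    while stack:
        path = stack.pop()
        if path[-1] == target:
            paths.append(path)
            continue
        for nxt in out.get(path[-1], []):
            if nxt not in path:  # keep paths simple (no cycles)
                stack.append(path + [nxt])
    return paths


edges = {
    ("unchecked_equipment", "tool_contact_deviation"): 0.80,
    ("tool_contact_deviation", "posture_deviation"): 0.75,
    ("abnormal_lighting", "posture_deviation"): 0.90,
    ("posture_deviation", "violation"): 0.85,
    ("ambient_noise", "violation"): 0.40,
}
chain = trace_chains(edges, "unchecked_equipment", "violation")
sentence = " -> ".join(chain[0])
```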

[0112] Optionally, if the primary and secondary causes have a synergistic effect, the interaction relationship needs to be clearly defined. For example, the secondary cause of failing to check the equipment in the preceding action leads to tool contact deviation, which, combined with the visual interference caused by the primary cause of abnormal ambient lighting, results in a deviation in operating posture, ultimately triggering a violation.

[0113] For quantitative counterfactual conclusions, directly use the quantitative results of S5 counterfactual reasoning, clarify the change in the probability of violation when the candidate factor does not occur, and indicate the sample size of the control group or the basis for the derivation to ensure credibility.

[0114] The quantitative result can be described using a uniform format such as "If no abnormality occurs in the [candidate factor], the probability of violation will be reduced from the [original probability] to the [predicted probability], with a reduction of [X%] (sample size of the control group: N / based on ATE derivation)".

[0115] Optionally, for the counterfactual case in which neither the primary nor the secondary cause is abnormal, the combined reduction can be calculated by superposing their causal effects: combined reduction = 1 - (1 - reduction of the primary cause) × (1 - reduction of the secondary cause). A description can then be generated, for example: "If the ambient lighting is normal and the equipment inspection has been completed in the preceding actions, the probability of this violation will be reduced by approximately 83.3%."
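The combination formula and the sentence template of [0114] can be sketched as below. The reductions of 2/3 and 1/2 and the control-group size of 12 are illustrative values, not figures from the text.

```python
def combined_reduction(*reductions):
    """1 - prod(1 - r_i), the superposition formula in [0115]."""
    remaining = 1.0
    for r in reductions:
        remaining *= 1.0 - r
    return 1.0 - remaining


def describe(factor, p_orig, p_pred, basis):
    """Fill the uniform counterfactual template from [0114]."""
    reduction = (p_orig - p_pred) / p_orig
    return (f"If no abnormality occurs in the {factor}, the probability of violation will be "
            f"reduced from {p_orig:.0%} to {p_pred:.0%}, with a reduction of {reduction:.1%} "
            f"({basis})")


text = describe("ambient lighting", 0.95, 0.25, "sample size of the control group: 12")
joint = combined_reduction(2 / 3, 1 / 2)  # hypothetical primary and secondary reductions
```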

[0116] In one embodiment, the report uses a combination of text descriptions and data charts. The text portion consists of structured paragraphs, including chapters on main causes and secondary causes, causal chains, and counterfactual conclusions. The data charts include bar charts of causal effects and line charts comparing the probability of violations, which visually display key data.

[0117] Preferably, a dynamic visualization timeline interface can be generated, which integrates and displays the action event nodes, the causal relationship edges in the causal chain, and environmental snapshots at key time points in chronological order.

[0118] Specifically, the start and end times of the target event node can be used as the core interval, extending backward to the start times of the previous k historical event nodes and forward by 30 minutes. Then, action event nodes are marked on the timeline in chronological order. Violation target event nodes are marked with solid red blocks, and the block size is positively correlated with the event duration; previous historical event nodes are marked with hollow blue blocks, and the event ID and action category label are marked below the node.

[0119] Simultaneously, arrowed line segments connect related nodes in the causal chain, such as preceding and subsequent event nodes; the segment color corresponds to the factor type, and the segment thickness is positively correlated with the confidence of the causal edge. Key time points in the causal chain are selected as snapshot collection points, such as the moment an environmental anomaly occurs, the moment the pose changes abruptly, and the moment the violation is triggered. Each snapshot point is associated with the corresponding environmental parameter data and video frame screenshot.

[0120] A snapshot icon is then placed at the corresponding time point on the timeline. When the mouse hovers over or selects the icon, a pop-up window displays the environmental parameter values and the video frame screenshot, with key areas in the screenshot marked by red boxes.

[0121] Figure 5 shows an AR glasses-based rule violation analysis system 500. The system corresponds to the method embodiment shown in Figure 1 and includes the following modules. The time-series data acquisition module 501 is used to collect multi-source time-series data during the operation through augmented reality (AR) glasses; the multi-source time-series data includes at least video streams, inertial measurement unit (IMU) data, GPS data, and environmental parameters. The action event node acquisition module 502 is used to synchronize and align the multi-source time-series data and to divide the aligned data into multiple action event nodes based on an action boundary detection algorithm; each action event node indicates an action of the operator and includes the start and end time, participant identifier, spatial coordinate sequence, posture curve, environmental snapshot information, and tool usage status information. The temporal action recognition result output module 503 is used to process the action event nodes with the temporal action recognition model and output the corresponding action category label, violation probability label, and action feature vector, where the violation probability label indicates whether the action corresponding to the action event node is a violation. The causal graph model construction module 504 is used to construct a causal graph model whose nodes include event nodes corresponding to the action event nodes and attribute feature nodes corresponding to the attribute features of the action event nodes; the attribute feature nodes include posture feature nodes, motion dynamics feature nodes, ambient lighting feature nodes, ambient noise feature nodes, and tool contact marker nodes.
The edges of the causal graph model include edges between attribute feature nodes of the same event node and edges between preceding and subsequent event nodes at adjacent times. The causal reasoning module 505 is used to estimate, for a target action event node identified as a violation, the causal effect of one or more candidate factors on triggering the violation based on the causal graph model, and to perform counterfactual reasoning, which quantitatively estimates the change in the probability of the violation when the candidate factors do not occur. The report generation module 506 is used to generate and output a report explaining the cause of the violation based on the causal effects and the counterfactual reasoning results.

[0122] Those skilled in the art will clearly understand that the technical solutions of the embodiments of this application can be implemented by means of software and / or hardware. In this specification, "unit" and "module" refer to software and / or hardware that can independently complete or cooperate with other components to complete a specific function, wherein the hardware may be, for example, a field-programmable gate array (FPGA), an integrated circuit (IC), etc.

[0123] Each processing unit and / or module in the embodiments of this application can be implemented by an analog circuit that implements the functions described in the embodiments of this application, or by software that executes the functions described in the embodiments of this application.

[0124] Figure 6 is a schematic structural diagram of an electronic device according to an embodiment of this application, which can be used to implement the method of the embodiment shown in Figure 1. As shown in Figure 6, the electronic device 600 may include at least one processor 601, at least one network interface 604, a user interface 603, a memory 605, and at least one communication bus 602. The communication bus 602 is used to enable connection and communication between these components. The user interface 603 may include buttons and, optionally, a standard wired or wireless interface. The network interface 604 may include, but is not limited to, a Bluetooth module, an NFC module, a Wi-Fi module, etc.

[0125] The processor 601 may include one or more processing cores and connect to various parts within the device 600 via various interfaces and lines. It implements the various functions and data processing of the device 600 by running or executing instructions, programs, code sets, or instruction sets stored in the memory 605, and by accessing data in the memory 605. Optionally, the processor 601 may be implemented using at least one hardware form of DSP, FPGA, or PLA. The processor 601 may also integrate one or more combinations of CPU, GPU, and modem. The CPU is mainly used to handle the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content required for display; and the modem is used for wireless communication. It is understood that the modem may not be integrated into the processor 601, but may be implemented through a separate chip.

[0126] Memory 605 may include random access memory (RAM) or read-only memory (ROM). Optionally, memory 605 includes a non-transitory computer-readable medium for storing instructions, programs, code, code sets, or instruction sets. Memory 605 may be divided into a program storage area and a data storage area, wherein the program storage area may be used to store instructions for implementing an operating system, instructions for implementing at least one function (such as touch functionality, audio playback functionality, image playback functionality, etc.), and instructions for implementing the foregoing method embodiments; the data storage area may be used to store data involved in the relevant method embodiments. Memory 605 may also be at least one storage device located remotely from processor 601. As shown in Figure 6, the memory 605, serving as a computer storage medium, may contain an operating system, a network communication module, a user interface module, and program instructions.

[0127] In particular, the methods and / or embodiments in this application can be implemented as computer software programs. For example, the embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowchart. When the computer program is executed by processor 601, it performs the functions defined in the methods of this application.

[0128] Another embodiment of this application provides a computer-readable storage medium having computer program instructions stored thereon, which can be executed by a processor to implement the methods and / or technical solutions of any one or more embodiments of this application described above.

[0129] The computer-readable storage medium may include, but is not limited to, any type of disk, including floppy disks, optical disks, DVDs, CD-ROMs, microdrives, as well as magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic cards or optical cards, nanosystems (including molecular memory ICs), or any type of medium or device suitable for storing instructions and / or data.

[0130] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.

[0131] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A method for analyzing rule violations based on AR glasses, characterized by comprising: collecting multi-source time-series data during the operation using augmented reality (AR) glasses, the multi-source time-series data including at least video streams, inertial measurement unit (IMU) data, GPS data, and environmental parameters; synchronizing and time-aligning the multi-source time-series data, and dividing the synchronized and aligned data into multiple action event nodes based on an action boundary detection algorithm, wherein each action event node indicates an action of the operator and includes a start and end time, a participant identifier, a spatial coordinate sequence, a posture curve, environmental snapshot information, and tool usage status information; processing the action event nodes using a temporal action recognition model and outputting the corresponding action category label, violation probability label, and action feature vector, wherein the violation probability label indicates whether the action corresponding to the action event node is a violation; constructing a causal graph model, wherein the nodes of the causal graph model include event nodes corresponding to the action event nodes and attribute feature nodes corresponding to the attribute features of the action event nodes, the attribute feature nodes including posture feature nodes, motion dynamics feature nodes, ambient lighting feature nodes, ambient noise feature nodes, and tool contact marker nodes, and the edges of the causal graph model include edges between attribute feature nodes of the same event node and edges between preceding and subsequent event nodes at adjacent times;
for a target action event node identified as a violation, estimating, based on the causal graph model and using a causal inference algorithm, the causal effect of one or more candidate factors on triggering the violation, and performing counterfactual reasoning, wherein the counterfactual reasoning quantitatively estimates the change in the probability of the violation when the candidate factors do not occur; and generating and outputting, based on the causal effect and the results of the counterfactual reasoning, a report explaining the causes of the violation.
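The graph structure recited in claim 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the node naming scheme, the hypothetical `ATTRIBUTE_ORDER` used to orient edges among the attributes of one event node, and the extra edge from each attribute node to its own event node are assumptions made only so that the example yields a directed graph.

```python
from dataclasses import dataclass, field

# Attribute feature node types listed in claim 1. The list order is a
# HYPOTHETICAL causal ordering used only to orient intra-event edges.
ATTRIBUTE_ORDER = [
    "ambient_lighting",
    "ambient_noise",
    "posture",
    "motion_dynamics",
    "tool_contact",
]


@dataclass
class CausalGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # directed (source, target) pairs


def build_causal_graph(event_ids):
    """Build a time-aware causal graph from time-ordered event node ids."""
    graph = CausalGraph()
    for event_id in event_ids:
        graph.nodes.append(event_id)
        attrs = [f"{event_id}.{name}" for name in ATTRIBUTE_ORDER]
        graph.nodes.extend(attrs)
        # Assumption: each attribute feature node also points to its event node.
        graph.edges.extend((attr, event_id) for attr in attrs)
        # Edges between attribute feature nodes of the same event node,
        # oriented from earlier to later in ATTRIBUTE_ORDER (assumption).
        for i, source in enumerate(attrs):
            graph.edges.extend((source, target) for target in attrs[i + 1:])
    # Cross-time edges: preceding event node -> subsequent event node.
    graph.edges.extend(zip(event_ids, event_ids[1:]))
    return graph


graph = build_causal_graph(["e1", "e2", "e3"])
print(len(graph.nodes), len(graph.edges))  # → 18 47
```

Plain lists keep the sketch dependency-free; a graph library could hold the same nodes and edges and check that the result is acyclic.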

2. The method for analyzing rule violations based on AR glasses according to claim 1, characterized in that the action boundary detection algorithm segments the action event nodes based on abrupt changes in one or more of the following signals: peaks in the operator's movement speed, abrupt changes in body posture angle, and changes in the contact state between the hand and the tool.
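A minimal sketch of the three boundary cues in claim 2, assuming per-frame signals that have already been synchronized (movement speed, posture angle in degrees, hand-tool contact flag). The thresholds `speed_peak_min` and `angle_jump_min` and the function names are hypothetical; the claim does not specify how peaks or abrupt changes are detected.

```python
def detect_boundaries(speed, angle, contact, speed_peak_min=1.0, angle_jump_min=30.0):
    """Return sorted frame indices at which a new action event node starts.

    speed:   operator movement speed per frame
    angle:   body posture angle per frame, in degrees
    contact: hand-tool contact flag per frame
    The threshold defaults are illustrative, not values from the application.
    """
    n = len(speed)
    cuts = set()
    for t in range(1, n):
        # Cue 1: the hand-tool contact state changes.
        if contact[t] != contact[t - 1]:
            cuts.add(t)
        # Cue 2: the posture angle jumps between consecutive frames.
        if abs(angle[t] - angle[t - 1]) >= angle_jump_min:
            cuts.add(t)
        # Cue 3: the movement speed has a local peak above a minimum height.
        if (t < n - 1 and speed[t] >= speed_peak_min
                and speed[t] > speed[t - 1] and speed[t] >= speed[t + 1]):
            cuts.add(t)
    return sorted(cuts)


def to_segments(n_frames, cuts):
    """Convert boundary indices into half-open (start, end) frame ranges."""
    bounds = [0] + [c for c in cuts if 0 < c < n_frames] + [n_frames]
    return list(zip(bounds, bounds[1:]))


speed = [0.1, 0.2, 1.5, 0.3, 0.2, 0.2, 0.1, 0.1]
angle = [0, 0, 0, 0, 45, 45, 45, 45]
contact = [False] * 5 + [True] * 3
cuts = detect_boundaries(speed, angle, contact)
print(to_segments(len(speed), cuts))  # → [(0, 2), (2, 4), (4, 5), (5, 8)]
```

In practice the raw signals would usually be smoothed before thresholding so that sensor noise does not produce spurious boundaries.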

3. The method for analyzing rule violations based on AR glasses according to claim 1, characterized in that the temporal action recognition model is a combination of a lightweight video understanding model and a spatiotemporal graph convolutional network (ST-GCN), and its output further includes a time series of key pose points for downstream causal analysis.
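A spatiotemporal graph convolutional network alternates graph convolutions over the skeleton joints with convolutions along time. The pure-Python sketch below shows one spatial step, ReLU(Â X W) with Â = D^{-1/2}(A + I)D^{-1/2}, followed by one temporal step. It deliberately omits the partitioning strategies, learned edge weights, padding and batching of a full ST-GCN, the three-joint skeleton and weights are hypothetical, and it is not the model used in the application.

```python
import math


def normalized_adjacency(num_joints, bones):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in bones:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def spatial_graph_conv(features, a_hat, weight):
    """One spatial step for a single frame: ReLU(A_hat @ X @ W).

    features: J x C_in joint features; weight: C_in x C_out."""
    out = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


def temporal_conv(sequence, kernel):
    """'Valid' 1-D convolution along time for every joint and channel.

    sequence: T x J x C; kernel: K scalar weights shared by all joints."""
    k = len(kernel)
    joints, channels = len(sequence[0]), len(sequence[0][0])
    return [[[sum(kernel[m] * sequence[t + m][j][c] for m in range(k))
              for c in range(channels)] for j in range(joints)]
            for t in range(len(sequence) - k + 1)]


# Hypothetical 3-joint chain (e.g. shoulder-elbow-wrist), 1 input channel.
a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
frames = [[[float(t)], [1.0], [2.0]] for t in range(4)]            # T=4, J=3, C=1
weight = [[1.0, -1.0]]                                              # C_in=1, C_out=2
spatial = [spatial_graph_conv(x, a_hat, weight) for x in frames]    # T x J x 2
output = temporal_conv(spatial, [0.5, 0.5])                         # T=3, J=3, C=2
```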

4. The method for analyzing rule violations based on AR glasses according to claim 1, characterized in that constructing the causal graph model includes: obtaining, as context features, the types of the k consecutive historical event nodes preceding the target action event node, where k is an integer greater than or equal to 1; and using the context features as attributes of the target event node corresponding to the target action event node to construct the causal graph model.
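The context features of claim 4 amount to a sliding look-back over the sequence of recognized action types. A minimal sketch follows; the action type names, the `<none>` padding token used when fewer than k earlier events exist, and the `ctx_1..ctx_k` attribute names are hypothetical.

```python
def context_features(event_types, target_index, k, pad="<none>"):
    """Types of the k event nodes immediately preceding the target, oldest first.

    When fewer than k earlier events exist, the front is filled with `pad`."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    history = event_types[max(0, target_index - k):target_index]
    return [pad] * (k - len(history)) + history


def with_context(target_attrs, event_types, target_index, k):
    """Copy of the target node's attributes extended with ctx_1..ctx_k."""
    ctx = context_features(event_types, target_index, k)
    return {**target_attrs, **{f"ctx_{i + 1}": t for i, t in enumerate(ctx)}}


types = ["climb_pole", "fasten_belt", "approach_line", "touch_conductor"]
print(context_features(types, 3, 2))  # → ['fasten_belt', 'approach_line']
print(context_features(types, 1, 3))  # → ['<none>', '<none>', 'climb_pole']
```

Fixing the length at k keeps the attribute set identical for every target node, which simplifies using the context as covariates in the later causal estimation.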

5. The method for analyzing rule violations based on AR glasses according to claim 1, characterized in that the causal inference algorithm adopts the DoWhy framework, and its process includes: formally defining the causal problem, identifying the causal effect, selecting an estimator to estimate the effect, and performing sensitivity testing.

6. The method for analyzing rule violations based on AR glasses according to claim 5, characterized in that the estimator is any one of propensity score matching, double machine learning, or structural equation modeling, and is used to estimate the average or individual treatment effect of the candidate factors on the violation.
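In DoWhy, the four steps of claim 5 correspond to building a `CausalModel`, calling `identify_effect()`, calling `estimate_effect()` with a `method_name` such as `"backdoor.propensity_score_matching"`, and calling `refute_estimate()`. To stay dependency-free, the sketch below uses neither DoWhy nor any of the three estimators named in claim 6; it substitutes the simplest backdoor estimator, stratification on one discrete confounder, to show what an average treatment effect of a candidate factor on violations means. The field names (`dim_light`, `task_phase`, `violation`) and the data are hypothetical.

```python
from collections import defaultdict


def stratified_ate(records, treatment, outcome, confounder):
    """Backdoor-adjusted average treatment effect of a binary factor.

    ATE = sum_z w_z * (E[Y | T=1, Z=z] - E[Y | T=0, Z=z]), where w_z is the
    share of records in stratum z. Strata with no treated or no untreated
    records are skipped and the weights renormalized over the remaining ones.
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for r in records:
        strata[r[confounder]][int(r[treatment])].append(r[outcome])
    weighted, used = 0.0, 0
    for groups in strata.values():
        if not groups[0] or not groups[1]:
            continue  # no overlap in this stratum
        size = len(groups[0]) + len(groups[1])
        effect = sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])
        weighted += size * effect
        used += size
    return weighted / used if used else float("nan")


def make(phase, dim_light, outcomes):
    """Hypothetical helper: one record per observed violation outcome."""
    return [{"task_phase": phase, "dim_light": dim_light, "violation": y} for y in outcomes]


data = (make("setup", 1, [1, 1, 0, 1]) + make("setup", 0, [0, 0, 1, 0])
        + make("live", 1, [1, 0]) + make("live", 0, [0, 0, 0, 0]))
print(stratified_ate(data, "dim_light", "violation", "task_phase"))  # → 0.5
```

Here dim lighting raises the violation probability by 0.5 within both task phases, so the adjusted effect is 0.5; the claimed estimators address the same quantity when there are many, possibly continuous, confounders.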

7. The method for analyzing rule violations based on AR glasses according to claim 1, characterized in that the counterfactual reasoning specifically comprises: within the time window in which the candidate factor occurred, constructing, as a control group, event nodes that have a similar context but in which the candidate factor did not occur; and, through comparative analysis, calculating a quantitative conclusion that, had the candidate factor not occurred, the probability of the violation would have been reduced by a certain percentage.
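A minimal sketch of the comparison in claim 7. It assumes that each event node carries a context label (for example, the preceding event types of claim 4), that "similar context" is approximated by an exact label match, and that "reduced by a certain percentage" is read as a relative reduction; the field names are hypothetical. A plain rate comparison like this does not adjust for other confounders.

```python
def counterfactual_reduction(events, factor, window):
    """Compare violation rates with and without a candidate factor.

    events: dicts with 'time', 'context', a boolean field named by `factor`,
            and a boolean 'violation'.
    window: inclusive (start, end) time window in which the factor occurred.
    Returns (p_with, p_without, relative_reduction_pct), or None when either
    group is empty or no violation occurred with the factor present.
    """
    start, end = window
    in_window = [e for e in events if start <= e["time"] <= end]
    treated = [e for e in in_window if e[factor]]
    # Control group: same context label, candidate factor absent (assumption:
    # "similar context" is approximated by an exact context match).
    contexts = {e["context"] for e in treated}
    control = [e for e in in_window if not e[factor] and e["context"] in contexts]
    if not treated or not control:
        return None
    p_with = sum(e["violation"] for e in treated) / len(treated)
    p_without = sum(e["violation"] for e in control) / len(control)
    if p_with == 0:
        return None
    return p_with, p_without, 100.0 * (p_with - p_without) / p_with


events = [
    {"time": t, "context": "ladder", "dim_light": True, "violation": v}
    for t, v in [(1, True), (2, True), (3, True), (4, False)]
] + [
    {"time": t, "context": "ladder", "dim_light": False, "violation": v}
    for t, v in [(5, True), (6, False), (7, False), (8, False)]
] + [
    {"time": 6, "context": "bucket", "dim_light": False, "violation": True},   # other context
    {"time": 20, "context": "ladder", "dim_light": False, "violation": True},  # outside window
]
result = counterfactual_reduction(events, "dim_light", (0, 10))
print(result[0], result[1], round(result[2], 1))  # → 0.75 0.25 66.7
```

With these numbers the violation probability falls from 0.75 to 0.25: an absolute drop of 50 percentage points, or a relative reduction of about 66.7%. A report should state which of the two readings it uses.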

8. A system for analyzing rule violations based on AR glasses, characterized by comprising: a time-series data acquisition module, configured to collect multi-source time-series data during the operation through augmented reality (AR) glasses, the multi-source time-series data including at least video streams, inertial measurement unit (IMU) data, GPS data, and environmental parameters; an action event node acquisition module, configured to synchronize and time-align the multi-source time-series data and to divide the synchronized and aligned data into multiple action event nodes based on an action boundary detection algorithm, wherein each action event node indicates an action of the operator and includes a start and end time, a participant identifier, a spatial coordinate sequence, a posture curve, environmental snapshot information, and tool usage status information; a temporal action recognition result output module, configured to process the action event nodes using a temporal action recognition model and to output the corresponding action category label, violation probability label, and action feature vector, wherein the violation probability label indicates whether the action corresponding to the action event node is a violation; a causal graph model construction module, configured to construct a causal graph model, wherein the nodes of the causal graph model include event nodes corresponding to the action event nodes and attribute feature nodes corresponding to the attribute features of the action event nodes, the attribute feature nodes including posture feature nodes, motion dynamics feature nodes, ambient lighting feature nodes, ambient noise feature nodes, and tool contact marker nodes, and the edges of the causal graph model include edges between attribute feature nodes of the same event node and edges between preceding and subsequent event nodes at adjacent times;
a causal reasoning module, configured to estimate, for a target action event node identified as a violation and based on the causal graph model, the causal effect of one or more candidate factors on triggering the violation, and to perform counterfactual reasoning, wherein the counterfactual reasoning quantitatively estimates the change in the probability of the violation when the candidate factors do not occur; and a report generation module, configured to generate and output, based on the causal effect and the results of the counterfactual reasoning, a report explaining the causes of the violation.

9. An electronic device, characterized by comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

10. A computer-readable medium having computer program instructions stored thereon, characterized in that the computer program instructions can be executed by a processor to implement the method of any one of claims 1-7.