AR-glasses-based method and system for detecting compliance in wearing insulating protective equipment

By constructing a human-protective gear topology map using AR glasses and utilizing graph neural networks to identify the wearing status of insulating protective gear, the problem of the inability to identify non-standard wearing in existing technologies has been solved, thereby improving the accuracy and real-time performance of protective gear detection in high-risk operations.

CN122244742A · Pending · Publication Date: 2026-06-19 · ELECTRIC POWER RES INST OF GUANGXI POWER GRID CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ELECTRIC POWER RES INST OF GUANGXI POWER GRID CO LTD
Filing Date
2026-02-25
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing methods for detecting insulating protective gear cannot identify non-standard wearing conditions, posing safety hazards. Furthermore, they lack portability and real-time interactive capabilities, making it difficult to adapt to the needs of complex and ever-changing high-risk work sites.

Method used

An AR glasses-based method for detecting compliance of insulated protective gear wearing is adopted. By obtaining key points of the human body and protective gear detection box information through pose estimation and target detection, a human-protective gear topology graph is constructed, and a graph neural network is used to learn the spatial topological relationship to output compliance confidence and generate alarm information.

Benefits of technology

It enables accurate identification of the wearing status of insulating protective gear, improves the accuracy and real-time performance of detection, and ensures the safety of workers.

✦ Generated by Eureka AI based on patent content.

[Figure CN122244742A_ABST]

Abstract

This invention belongs to the field of image processing technology, specifically disclosing a method and system for detecting the compliance of insulating protective gear wearing based on AR glasses. The method includes: acquiring video sequences of workers using AR glasses; processing the images frame by frame to obtain the coordinates of key human body points and protective gear detection frames; constructing a human body topology graph based on the key human body points, with key points as nodes and physiological connections as edges; converting the protective gear detection frames into feature vectors and attaching them to relevant nodes to form a human body-protective gear topology graph; inputting the graph into a pre-trained graph neural network model; learning the spatial topological relationships through message passing and feature aggregation; and outputting a compliance confidence score containing the probability distribution of multiple wearing states; judging the compliance of the protective gear accordingly, and generating an alarm when non-compliant. This method can distinguish between standard and non-standard wearing, improve the accuracy and real-time performance of protective gear detection in high-risk operations, and ensure the safety of workers.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and in particular to a method and system for detecting compliance of wearing insulating protective gear based on AR glasses.

Background Technology

[0002] In high-risk work scenarios such as live-line work, the compliance of wearing insulating protective gear is the core line of defense for ensuring the personal safety of workers. Traditional protective gear compliance inspections rely on manual inspections, which suffer from low efficiency, high missed inspection rates, and poor real-time performance, making it difficult to meet the dynamic safety management needs of high-risk operations.

[0003] With the application of computer vision technology, existing protective gear detection solutions mostly use positional overlap or distance threshold methods to determine the wearing status of protective gear. That is, compliance is determined by whether the overlap ratio between the protective gear target box and the human body key point region, and the distance between the center of the protective gear and the key point, meet their thresholds. However, these methods consider only geometric proximity and cannot identify non-standard wearing states, such as insulating gloves worn only halfway, tilted goggles, or protective gear that has slipped or loosened. Such states may satisfy the geometric thresholds while still posing serious safety hazards, so the protection offered by traditional solutions is insufficient.
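The limitation described above can be illustrated with a minimal sketch of an overlap-and-distance rule. The function names, the 0.3 overlap ratio, the 40-pixel distance, and the sample boxes are illustrative assumptions, not values from the source, and the two checks are combined with a logical AND as one possible reading of the prior-art rule.

```python
import math

def overlap_ratio(gear_box, region_box):
    """Fraction of the gear box area that lies inside the body-region box."""
    gx1, gy1, gx2, gy2 = gear_box
    rx1, ry1, rx2, ry2 = region_box
    iw = max(0.0, min(gx2, rx2) - max(gx1, rx1))
    ih = max(0.0, min(gy2, ry2) - max(gy1, ry1))
    gear_area = (gx2 - gx1) * (gy2 - gy1)
    return (iw * ih) / gear_area if gear_area > 0 else 0.0

def geometric_rule(gear_box, region_box, keypoint, min_overlap=0.3, max_dist=40.0):
    """Assumed prior-art style check: overlap ratio and center distance only."""
    cx = (gear_box[0] + gear_box[2]) / 2
    cy = (gear_box[1] + gear_box[3]) / 2
    dist = math.hypot(cx - keypoint[0], cy - keypoint[1])
    return overlap_ratio(gear_box, region_box) >= min_overlap and dist <= max_dist

# Hypothetical pixel data: a hand region, a wrist key point, and two glove boxes.
hand_region = (80, 170, 140, 250)
wrist = (110, 200)
glove_fully_worn = (90, 180, 130, 240)
glove_half_worn = (92, 200, 128, 250)  # glove pulled only partway onto the hand

full_ok = geometric_rule(glove_fully_worn, hand_region, wrist)
half_ok = geometric_rule(glove_half_worn, hand_region, wrist)
```

With these sample values both gloves pass the rule, which is the failure mode the paragraph describes: the half-worn glove is geometrically close enough to the wrist to look compliant.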

[0004] Meanwhile, existing solutions rely heavily on fixed equipment to collect data, lacking portability and real-time interactive capabilities. When facing multi-person operation scenarios, the matching of protective gear and human body is easily affected by obstruction and environmental interference, leading to misjudgment. Furthermore, the lack of dynamic adjustment of judgment criteria in conjunction with the operation stage further reduces the detection accuracy and practicality, making it difficult to adapt to the needs of complex and ever-changing high-risk operation sites.

[0005] Therefore, there is a need for a method and system for detecting the compliance of wearing insulating protective gear based on AR glasses.

Summary of the Invention

[0006] To address this issue, the present invention provides a method and system for detecting compliance of wearing insulating protective gear based on AR glasses, thereby solving the aforementioned technical problems.

[0007] This invention provides a method for detecting the compliance of wearing insulating protective gear based on AR glasses, comprising the following steps: acquiring a video sequence containing at least one worker; processing each frame of the video sequence to obtain the coordinate information of human body key points through a pose estimation model and the detection box information of protective gear through a target detection model; constructing a human body topology graph based on the human body key points, wherein each key point is a graph node and the physiological connections between adjacent key points are graph edges; converting the detection box information of the protective gear into a protective gear feature vector and attaching that vector to the most relevant human body key point node in the human body topology graph, forming a human body-protective gear topology graph that incorporates the protective gear information; inputting the human body-protective gear topology graph into a pre-trained graph neural network model, which performs message passing and feature aggregation to learn the spatial topological relationship between the human body key points and the protective gear and outputs a compliance confidence score characterizing the wearing status, the compliance confidence score including a probability distribution over multiple wearing status categories; and determining, based on the compliance confidence score, whether the protective gear is worn compliantly, and generating an alarm message when it is not. The spatial topological relationship refers to the nonlinear, part-dependent relative positional pattern of the protective gear with respect to the relevant set of human body key points, which is used to distinguish standard from non-standard wearing states.

[0008] Preferably, the graph neural network model is a graph attention network; message passing and feature aggregation include at least three rounds of message passing to propagate structural context information between nodes of the graph.
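Why at least three rounds matter can be shown with a small sketch. It is not a trained graph attention network: it uses plain dot-product attention with no learned weights as a stand-in, and the four-node chain, the two-dimensional features, and all names are assumptions chosen for illustration.

```python
import math

def attention_round(features, neighbors):
    """One round: each node becomes the attention-weighted mean of itself and its neighbors."""
    updated = {}
    for node, h in features.items():
        group = [node] + neighbors[node]
        # Unlearned dot-product scores stand in for GAT's learned attention.
        scores = [sum(a * b for a, b in zip(h, features[n])) for n in group]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        updated[node] = [
            sum(w / total * features[n][k] for w, n in zip(weights, group))
            for k in range(len(h))
        ]
    return updated

def propagate(features, neighbors, rounds=3):
    """Apply several rounds of message passing and return the final node features."""
    for _ in range(rounds):
        features = attention_round(features, neighbors)
    return features

# Hypothetical chain neck - shoulder - elbow - wrist; the second feature
# component marks protective gear information attached to the wrist node.
neighbors = {
    "neck": ["shoulder"],
    "shoulder": ["neck", "elbow"],
    "elbow": ["shoulder", "wrist"],
    "wrist": ["elbow"],
}
features = {
    "neck": [1.0, 0.0],
    "shoulder": [1.0, 0.0],
    "elbow": [1.0, 0.0],
    "wrist": [1.0, 1.0],
}

after_two = propagate(features, neighbors, rounds=2)
after_three = propagate(features, neighbors, rounds=3)
```

In this chain the gear signal attached at the wrist needs three hops to reach the neck, so the neck's gear component is still zero after two rounds and becomes positive only after the third.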

[0009] Preferably, the protective gear feature vector includes at least one of the following: normalized coordinates of the center of the protective gear detection frame relative to the key point of the attached human body, normalized size of the protective gear detection frame, one-hot encoding of the protective gear category, orientation angle of the protective gear, and visual feature embedding extracted from the protective gear image region.
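A minimal sketch of such a feature vector follows, covering three of the listed components (relative center, size, and one-hot category). Normalizing by the person's body height in pixels is an assumption, as are the function name and the three-class list; the source does not state the normalization reference.

```python
# Assumed class order; the source gives [1, 0, 0] for insulating gloves.
GEAR_CLASSES = ["insulating_gloves", "goggles", "insulating_boots"]

def gear_feature_vector(box, category, keypoint, body_height):
    """Build [dx, dy, w, h, one-hot...] for one protective gear detection box."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Center offset relative to the attached key point, scaled by body height.
    offset = [(cx - keypoint[0]) / body_height, (cy - keypoint[1]) / body_height]
    size = [(x2 - x1) / body_height, (y2 - y1) / body_height]
    one_hot = [1.0 if c == category else 0.0 for c in GEAR_CLASSES]
    return offset + size + one_hot

vec = gear_feature_vector((90, 180, 130, 240), "insulating_gloves", (110, 200), 400.0)
```

Orientation angle and visual embeddings, also listed in the paragraph, would simply be appended to the same list if available.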

[0010] Preferably, the wearing status categories include at least: correct wearing, partial wearing, tilted wearing, slippage, and missing.
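As a sketch of how a probability distribution over these categories could drive the compliance decision, the snippet below applies a softmax to raw model scores and compares the "correct" probability with a threshold. The label strings, the 0.7 threshold, and the sample scores are assumptions for illustration.

```python
import math

WEARING_STATES = ["correct", "partial", "tilted", "slipped", "missing"]

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def judge(logits, threshold=0.7):
    """Return the distribution, the most likely state, and a compliance flag."""
    probs = dict(zip(WEARING_STATES, softmax(logits)))
    state = max(probs, key=probs.get)
    return probs, state, probs["correct"] >= threshold

probs, state, compliant = judge([0.2, 2.5, 0.1, 0.0, -1.0])
```

Keeping the full distribution, rather than only the top label, lets a later stage report which non-standard state was most likely when compliance fails.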

[0011] Preferably, after the compliance confidence score is output, a temporal module further processes the compliance confidence scores and human body key point information of multiple consecutive frames; a temporal consistency score is generated from them, and the threshold used to judge compliance is dynamically adjusted according to the current operation stage identified by an action recognition module.

[0012] Preferably, the alarm information is generated under the following conditions: within a preset time window, the probability value representing the compliance category in the compliance confidence score for the same protective gear is continuously lower than the dynamically adjusted threshold for a predetermined number of frames, so as to avoid instantaneous false alarms.
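The alarm condition can be sketched as a small debouncer that counts the most recent run of below-threshold frames inside a sliding window. The class name, the window length, and the required run length are assumptions; the source only says both are preset.

```python
from collections import deque

class AlarmDebouncer:
    """Raise an alarm only after enough consecutive below-threshold frames."""

    def __init__(self, window=30, min_consecutive=10):
        self.min_consecutive = min_consecutive
        # Only the most recent `window` frames are remembered.
        self.history = deque(maxlen=window)

    def update(self, compliant_prob, threshold):
        """Record one frame and report whether the alarm condition holds."""
        self.history.append(compliant_prob < threshold)
        run = 0
        for below in reversed(self.history):
            if not below:
                break
            run += 1
        return run >= self.min_consecutive

debouncer = AlarmDebouncer(window=5, min_consecutive=3)
frame_probs = [0.9, 0.5, 0.9, 0.5, 0.6, 0.4]
alarms = [debouncer.update(p, threshold=0.7) for p in frame_probs]
```

The threshold is passed in per frame so that a dynamically adjusted value can be supplied by whatever component tracks the operation stage.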

[0013] Preferably, in scenarios with multiple workers, the method further includes assigning a unique identifier (ID) to each worker and tracking their movement trajectory; matching the detected protective gear with the human body topology map of each worker, with the matching conditions considering spatial overlap, appearance feature similarity, and graph neural network context consistency score.

[0014] Preferably, the loss function used in the training process of the graph neural network model is a composite loss function, which includes at least: protective gear detection loss, graph neural network classification loss, and temporal consistency loss.

[0015] Preferably, the training process also introduces a contrastive learning loss term, which increases the representation distance in the feature space by constructing sample pairs of correct and non-standard wearing.

[0016] In another aspect, this application also provides a compliance detection system for wearing insulating protective gear based on AR glasses, comprising: an acquisition module, used to acquire a video sequence containing at least one operator; a detection box information acquisition module, used to process each frame of the video sequence, obtain the coordinate information of human body key points through the pose estimation model, and obtain the detection box information of the protective gear through the target detection model; a human body topology graph construction module, used to construct a human body topology graph based on the human body key points, wherein each human body key point is a graph node and the physiological connections between adjacent key points are graph edges; a fusion topology graph construction module, used to convert the detection box information of the protective gear into a protective gear feature vector and attach the protective gear feature vector to the most relevant human body key point node in the human body topology graph, so as to form a human body-protective gear topology graph that integrates the protective gear information; a compliance output module, used to input the human body-protective gear topology graph into a pre-trained graph neural network model, perform message passing and feature aggregation through the graph neural network model, learn the spatial topological relationship between human body key points and protective gear, and output a compliance confidence score characterizing the protective gear wearing state, the compliance confidence score including a probability distribution over multiple wearing state categories; and an alarm module, used to determine whether the protective gear is worn compliantly based on the compliance confidence score, and to generate alarm information when it is determined to be non-compliant. The spatial topological relationship refers to the nonlinear, part-dependent relative positional pattern of the protective gear with respect to the relevant set of human body key points, which is used to distinguish standard from non-standard wearing states.

[0017] This disclosure also provides an electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the AR glasses-based insulating protective gear wearing compliance detection method as described above.

[0018] In another aspect, this disclosure provides a computer-readable storage medium having stored thereon computer program instructions that can be executed by a processor to implement the AR glasses-based method for detecting compliance of wearing insulating protective gear as described above.

[0019] In another aspect of this disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the above-described method for detecting compliance of wearing insulating protective gear based on AR glasses.

[0020] This invention enables portable real-time data acquisition through AR glasses, and accurately acquires human body key points and protective gear detection boxes by combining pose estimation and target detection. It constructs a human body topology graph and integrates protective gear feature vectors to form a human body-protective gear topology graph. By using a graph attention network to learn spatial topological relationships, it can distinguish standard from non-standard wearing of protective gear, improving the accuracy, real-time performance, and practicality of compliance detection for insulating protective gear in high-risk operations such as live-line work, and helping to ensure the safety of workers.

Brief Description of the Drawings

[0021] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the accompanying drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. In all the drawings, similar elements or parts are generally identified by similar reference numerals. In the drawings, the elements or parts are not necessarily drawn to scale.

[0022] Figure 1 is a flowchart of a method for detecting the compliance of wearing insulating protective gear based on AR glasses, provided in an embodiment of the present invention;
Figure 2 is a schematic diagram of the process for obtaining detection box information of protective gear according to an embodiment of the present invention;
Figure 3 is a schematic diagram illustrating the process of fusing protective gear feature vectors with human body key point nodes according to an embodiment of the present invention;
Figure 4 is a schematic diagram of the compliance threshold adjustment process provided in an embodiment of the present invention;
Figure 5 is a schematic diagram of an AR glasses-based compliance detection system for wearing insulating protective gear, provided in an embodiment of the present invention;
Figure 6 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0023] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0024] It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

[0025] It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise.

[0026] It should also be further understood that the term "and / or" as used in this specification and the appended claims refers to any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.

[0027] As shown in Figure 1, this invention discloses a method 100 for detecting compliance of wearing insulating protective gear based on AR glasses, which integrates human pose estimation and spatial topology learning. Applied to AR glasses, the method 100 includes the following steps:
S1, acquire a video sequence containing at least one operator;
S2, process each frame of the video sequence, obtain the coordinate information of human body key points through the pose estimation model, and obtain the detection box information of the protective gear through the target detection model;
S3, construct a human body topology graph based on the human body key points, wherein each human body key point is a graph node and the physiological connection between adjacent key points is a graph edge;
S4, convert the detection box information of the protective gear into a protective gear feature vector, and attach the protective gear feature vector to the most relevant human body key point node in the human body topology graph to form a human body-protective gear topology graph that integrates the protective gear information;
S5, input the human body-protective gear topology graph into a pre-trained graph neural network model, which performs message passing and feature aggregation to learn the spatial topological relationship between human body key points and protective gear and outputs a compliance confidence score characterizing the protective gear wearing state, the compliance confidence score including a probability distribution over multiple wearing state categories;
S6, based on the compliance confidence score, determine whether the protective gear is worn compliantly, and generate an alarm message when it is determined to be non-compliant.
The spatial topological relationship refers to the nonlinear, part-dependent relative positional pattern of the protective gear with respect to the relevant set of human body key points, which is used to distinguish standard from non-standard wearing states.

[0028] In some embodiments, parameters are adapted through the AR glasses interaction function before the operation. For example, the resolution is adjusted according to the distance between the person and the glasses. When the distance is less than 1.5m, a 720P resolution is used, and when the distance is greater than 1.5m, a 1080P resolution is used. Regions of Interest (ROI) are preset according to the type of protective gear. The ROI of gloves can correspond to the lower middle part of the gloves, and the ROI of goggles can correspond to the upper middle part of the goggles. Invalid backgrounds are cropped.
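The distance-based resolution rule can be written as a one-line selector. The function name is hypothetical, and assigning exactly 1.5 m to the higher resolution is an assumption, since the paragraph does not specify the boundary case.

```python
def select_resolution(distance_m, near_limit_m=1.5):
    """Return (width, height): 720P for close subjects, 1080P otherwise."""
    # Exactly 1.5 m falls into the 1080P branch here (boundary case assumed).
    return (1280, 720) if distance_m < near_limit_m else (1920, 1080)

near = select_resolution(1.0)
far = select_resolution(2.4)
```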

[0029] After the acquisition is triggered, the AR glasses’ image sensor captures frames according to preset parameters. Its processor converts the acquired image data into RGB format and normalizes the pixels to [0,1]. Then, when the acquired video frame rate is less than 15fps, the resolution is reduced or unnecessary functions are turned off. An alarm is triggered when the lens is blocked or a person is lost. The processed video frames are temporarily stored in the buffer and flow to the subsequent steps in sequence.

[0030] When there are multiple people in an image, the AR glasses can locate them using the lightweight YOLOv8-tiny and prompt the wearer to adjust the viewing angle to ensure full coverage; when people are scattered, the system automatically collects data area by area and marks their locations; when multiple AR glasses are collecting data, data conflicts can be avoided by synchronizing the time and embedding device identifiers.

[0031] In some embodiments, for step S2, the video sequence acquired in S1 is processed frame by frame. Human body key point coordinates are extracted using a pose estimation model, and protective gear detection box information is obtained using a target detection model. This provides core data for subsequent human body topology construction and protective gear embedding. For example, Figure 2 shows the process for obtaining the detection box information of the protective gear. The specific implementation process is as follows: S201, video frame preprocessing: lightweight preprocessing is performed on each frame temporarily stored in the AR glasses buffer in S1 to eliminate environmental interference and match the model input requirements. The preprocessing includes noise suppression, size normalization, pixel normalization, and data format conversion.

[0032] Specifically, noise suppression can be achieved by using Gaussian filtering to smooth out image noise caused by dust and water mist at the work site, thus preventing noise from affecting subsequent model feature extraction.

[0033] Specifically, size normalization can be achieved by scaling the image to a fixed resolution according to the input size requirements of the pose estimation model and the protective gear detection model. The scaling process uses bilinear interpolation to ensure that the spatial ratio between the human body and the protective gear remains unchanged. The pose estimation model can be a lightweight version of RTMPose or ViTPose, and the protective gear detection model can be a YOLOv8-small model or a lightweight version of RT-DETR.

[0034] Pixel normalization can be achieved by mapping the RGB channel pixel values of the image from [0,255] to [0,1] and subtracting the mean of the dataset used during model training to eliminate the interference of pixel brightness differences on the model output.
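A per-pixel sketch of this normalization is shown below. The mean values are placeholders (the commonly used ImageNet channel means); the source does not say which training-set mean is used, and the function name is hypothetical.

```python
# Placeholder per-channel means in the [0, 1] range (an assumption, not from the source).
DATASET_MEAN = (0.485, 0.456, 0.406)

def normalize_pixel(rgb, mean=DATASET_MEAN):
    """Map one 8-bit RGB pixel to [0, 1] and subtract the per-channel mean."""
    return tuple(channel / 255.0 - m for channel, m in zip(rgb, mean))

white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
```

On a device the same arithmetic would normally be applied to the whole image tensor at once rather than pixel by pixel.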

[0035] Among them, data format conversion can be used to convert the preprocessed image into a tensor format supported by the model, so that the model can call and calculate it.

[0036] S202, obtains the coordinates of key points on the human body based on the pose estimation model.

[0037] For example, a lightweight version of RTMPose or a variant of ViTPose is used as the pose estimation model, which runs in real time on the AR glasses processor to extract the coordinate information of key human points. The specific process may include: The pre-trained lightweight pose estimation model is loaded into the AR glasses processor. The pre-processed single-frame image is input, and the model extracts image features through the backbone network, then fuses multi-scale features through the neck network, and finally outputs the prediction results of 17 or 25 human key points through the head network.

[0038] Then, the model outputs the two-dimensional coordinates and confidence score of each key point. The two-dimensional coordinates are pixel coordinates in the scaled image, and the confidence score represents the reliability of the key point prediction, with a value range of [0,1]. Kalman filtering is then performed on the key point coordinates of consecutive frames. The coordinate values are corrected through prediction-update iteration to suppress key point jumps caused by jitter in the operator's movements and to ensure the temporal continuity of the coordinate sequence.
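The prediction-update iteration can be sketched with a scalar Kalman filter applied to one coordinate of one key point. The constant-position motion model and the two noise variances are assumptions chosen for illustration; the source does not specify the filter design.

```python
class ScalarKalman:
    """One-dimensional Kalman filter with a constant-position motion model."""

    def __init__(self, initial, process_var=1.0, measurement_var=4.0):
        self.x = initial          # current coordinate estimate
        self.p = 1.0              # variance of the estimate
        self.q = process_var      # assumed motion noise per frame
        self.r = measurement_var  # assumed key point measurement noise

    def step(self, measurement):
        # Predict: the position is assumed unchanged, uncertainty grows.
        self.p += self.q
        # Update: blend prediction and measurement using the Kalman gain.
        gain = self.p / (self.p + self.r)
        self.x += gain * (measurement - self.x)
        self.p *= 1.0 - gain
        return self.x

wrist_x = ScalarKalman(initial=100.0)
first = wrist_x.step(110.0)  # a sudden 10-pixel jump in the raw key point
smoothed = [first] + [wrist_x.step(z) for z in [100.0, 110.0, 100.0]]
```

One such filter would be kept per coordinate and per key point; a constant-velocity model is a common alternative when operators move quickly.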

[0039] If the confidence of a key point is lower than a preset threshold, the key point is treated as a prediction anomaly and is filled in with the coordinates of the same key point from the previous frame, so that a single missing key point does not disrupt the subsequent construction of the human body topology graph.
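A minimal sketch of this fallback follows. The dictionary layout, the function name, and the 0.5 default are assumptions (the paragraph only says the threshold is preset).

```python
def fill_low_confidence(current, previous, min_conf=0.5):
    """Replace low-confidence key points with last frame's coordinates.

    Both arguments map key point names to (x, y, confidence) tuples.
    """
    filled = {}
    for name, (x, y, conf) in current.items():
        if conf < min_conf and name in previous:
            prev_x, prev_y, _ = previous[name]
            # Keep the low confidence value so later stages can still see it.
            filled[name] = (prev_x, prev_y, conf)
        else:
            filled[name] = (x, y, conf)
    return filled

previous_frame = {"wrist": (150.0, 100.0, 0.9), "elbow": (100.0, 100.0, 0.9)}
current_frame = {"wrist": (400.0, 20.0, 0.2), "elbow": (101.0, 99.0, 0.8)}
result = fill_low_confidence(current_frame, previous_frame)
```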

[0040] S203, Obtain protective gear detection frame information based on target detection model.

[0041] For example, YOLOv8-small or a lightweight version of RT-DETR can be used as the protective gear detection model. It processes the same preprocessed image in parallel and outputs the protective gear detection boxes and associated information. The specific process may include the following: the AR glasses load a pre-trained lightweight protective gear detection model; after a preprocessed image is input, the model generates candidate protective gear detection boxes through an anchor-based or anchor-free mechanism, and at the same time predicts the protective gear category and category confidence for each candidate box.

[0042] Then, non-maximum suppression is performed on the candidate detection boxes output by the model. The IOU threshold is set to 0.3, and redundant detection boxes with an overlap higher than the threshold are removed. The detection box with the highest class confidence is retained as the final result.
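The suppression step can be sketched in plain Python using the 0.3 IoU threshold stated above. Restricting suppression to boxes of the same category is an assumption, as are the dictionary keys and sample detections.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.3):
    """Keep the highest-scoring box among same-category boxes that overlap too much."""
    ordered = sorted(detections, key=lambda d: d["score"], reverse=True)
    kept = []
    for det in ordered:
        clashes = any(
            k["category"] == det["category"]
            and iou(det["box"], k["box"]) > iou_threshold
            for k in kept
        )
        if not clashes:
            kept.append(det)
    return kept

candidates = [
    {"box": (10, 10, 50, 50), "category": "insulating_gloves", "score": 0.9},
    {"box": (12, 12, 52, 52), "category": "insulating_gloves", "score": 0.7},
    {"box": (200, 20, 240, 40), "category": "goggles", "score": 0.8},
]
final = nms(candidates)
```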

[0043] Then, the detection box information for each protective gear is output, including the pixel coordinates of the upper left and lower right corners of the detection box (x1, y1, x2, y2), the protective gear category, and the category confidence. The protective gear category can be represented by one-hot encoding, such as [1, 0, 0] for insulating gloves.

[0044] If the model supports segmentation, it can simultaneously output the segmentation mask of the protective gear to represent the pixel region of the protective gear in the image, providing more refined morphological information for the subsequent construction of the protective gear feature vector; if the category confidence of a protective gear detection box is lower than the preset threshold, it is judged as a suspected false detection and the detection box is directly removed to avoid invalid data from entering the subsequent process.

[0045] Finally, the coordinate information of human key points in a single frame image is associated with the protective gear detection box information by frame index and stored in the structured data cache of the AR glasses. At the same time, the timestamp of each frame is recorded to ensure the temporal correspondence of continuous frame data, providing a data association basis for subsequent multi-target tracking and human-protective gear topology map construction.

[0046] In some embodiments, for step S3, based on the human body key point coordinate information obtained in S2, a human body topology graph is constructed with key points as nodes and physiological connections as edges, providing a structured graph model for subsequent protective gear feature embedding and spatial topology learning.

[0047] Specifically, the first step is to perform validity screening and node attribute initialization on the human body key points output by S2, including: (1) Based on the confidence of key points output by S2, key points with confidence ≥ 0.5 are retained as candidates for graph nodes; if only one of the two key points corresponding to a certain key physiological connection meets the confidence requirement, the coordinates of the key point at the same position in the previous frame are used to fill in the missing key point to ensure the integrity of the physiological connection and avoid the breakage of the topological structure.

[0048] (2) Construct a feature vector for each selected key point node. The feature dimensions include:
Coordinate normalization feature: the key point pixel coordinates (x, y) are normalized relative to the top-left vertex of the bounding rectangle of the human body to obtain (x_norm, y_norm), eliminating the influence of image size and of the person's distance on the coordinates; the bounding rectangle is generated by fitting all key points.
Confidence feature: the key point confidence taken directly from the S2 output, characterizing the reliability of the node coordinates.
Motion feature: the coordinate difference between the key point and the same key point in the previous frame, giving a short-term velocity vector (vx, vy) that reflects the motion trend of the key point.
Joint angle feature: for key points with parent nodes (for example, the elbow's parent node is the shoulder and the wrist's parent node is the elbow), the angle between the line connecting the key point to its parent node and the horizontal direction, quantifying the joint posture.
Body proportion feature: the ratio of the Euclidean distance between this key point and the head key point to the total height of the human body, incorporating the proportion constraints of human physiological structure.
(3) Node identification: assign a unique identifier to each node and associate it with the corresponding human body part information to facilitate subsequent protective gear attachment and topological relationship matching.
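The five node features can be sketched as follows. Several details are assumptions: coordinates are divided by the bounding rectangle's width and height after subtracting its top-left corner, body height is approximated by the rectangle height, and the key point names and sample values are hypothetical.

```python
import math

def bounding_rect(keypoints):
    """Axis-aligned rectangle fitted to all key points: (left, top, right, bottom)."""
    xs = [x for x, _, _ in keypoints.values()]
    ys = [y for _, y, _ in keypoints.values()]
    return min(xs), min(ys), max(xs), max(ys)

def node_feature(name, keypoints, previous, parents, head="head"):
    """Return [x_norm, y_norm, conf, vx, vy, joint_angle, body_proportion]."""
    x, y, conf = keypoints[name]
    left, top, right, bottom = bounding_rect(keypoints)
    width = max(right - left, 1e-6)
    height = max(bottom - top, 1e-6)   # stands in for total body height
    prev_x, prev_y, _ = previous.get(name, (x, y, conf))
    parent = parents.get(name)
    if parent is None:
        angle = 0.0
    else:
        parent_x, parent_y, _ = keypoints[parent]
        # Angle of the parent-to-child segment against the horizontal axis.
        angle = math.atan2(y - parent_y, x - parent_x)
    head_x, head_y, _ = keypoints[head]
    proportion = math.hypot(x - head_x, y - head_y) / height
    return [(x - left) / width, (y - top) / height, conf,
            x - prev_x, y - prev_y, angle, proportion]

keypoints = {
    "head": (100.0, 0.0, 0.9),
    "shoulder": (100.0, 50.0, 0.9),
    "elbow": (100.0, 100.0, 0.9),
    "wrist": (150.0, 100.0, 0.8),
}
previous = {"wrist": (148.0, 99.0, 0.8)}
parents = {"elbow": "shoulder", "wrist": "elbow"}
wrist_feature = node_feature("wrist", keypoints, previous, parents)
```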

[0049] Subsequently, based on the constraints of human physiological structure, the connection relationship between adjacent key points is determined and the edges of the graph are constructed. At the same time, edge features are defined. The connection between the aforementioned adjacent key points covers the key parts associated with the insulating protective gear, such as the wrist corresponding to insulating gloves, the head corresponding to goggles, and the ankle corresponding to insulating boots, to ensure that the topology graph can reflect the human structural relationships related to the protective gear.
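The part-to-gear correspondence named above (wrist for insulating gloves, head for goggles, ankle for insulating boots) can be written as a lookup table that a later attachment step might use. Treating the nearest candidate key point as the "most relevant" node is an assumption, and the left/right key point names are hypothetical.

```python
import math

# Correspondence taken from the text; the left/right naming is assumed.
GEAR_TO_PARTS = {
    "insulating_gloves": ["left_wrist", "right_wrist"],
    "goggles": ["head"],
    "insulating_boots": ["left_ankle", "right_ankle"],
}

def attach_node(gear_center, category, keypoints):
    """Pick the candidate key point closest to the gear box center, or None."""
    candidates = [p for p in GEAR_TO_PARTS.get(category, []) if p in keypoints]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda p: math.hypot(gear_center[0] - keypoints[p][0],
                                 gear_center[1] - keypoints[p][1]),
    )

body = {"head": (100, 10), "left_wrist": (60, 120), "right_wrist": (140, 120)}
target = attach_node((138, 125), "insulating_gloves", body)
```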

[0050] Then, a feature vector can be constructed for each physiological connection edge. The feature dimensions include bone length ratio feature, angle constraint confidence, and edge stability feature. Among them, the bone length ratio feature is the proportion of the Euclidean distance between the two key points corresponding to the edge to the total height of the human body; the angle constraint confidence is the matching degree between the current joint angle and the normal range of motion based on the range of motion of the human joint, which represents the rationality of the physiological connection; the edge stability feature is the rate of change of the edge length within 3 consecutive frames. The smaller the rate of change, the higher the stability feature value, reflecting the temporal continuity of the physiological connection.
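Two of these edge features can be sketched directly. The source only says that a smaller rate of change gives a higher stability value, so the mapping 1 / (1 + rate) and the use of the relative length range over the last three frames are assumptions; the function names are hypothetical.

```python
import math

def bone_length_ratio(point_a, point_b, body_height):
    """Edge length divided by the total body height (both in pixels)."""
    return math.hypot(point_a[0] - point_b[0], point_a[1] - point_b[1]) / body_height

def edge_stability(lengths):
    """Stability in (0, 1] from the edge lengths of the last three frames."""
    recent = lengths[-3:]
    mean = sum(recent) / len(recent)
    if mean == 0:
        return 0.0
    # Relative spread of the edge length across the recent frames.
    rate = (max(recent) - min(recent)) / mean
    return 1.0 / (1.0 + rate)

ratio = bone_length_ratio((0.0, 0.0), (30.0, 40.0), body_height=200.0)
steady = edge_stability([50.0, 50.0, 50.0])
jittery = edge_stability([50.0, 60.0, 40.0])
```

The angle constraint confidence is omitted here because it depends on joint range-of-motion tables that the paragraph does not give.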

[0051] After constructing the nodes and edges, AR glasses can generate a structured human body topology graph and perform validity checks. For example, the human body topology graph can be stored in an adjacency list format, recording the identifier and feature vector of each node, as well as the edge identifier, edge feature vector, and neighboring node identifiers associated with that node, facilitating subsequent message passing and feature aggregation in the graph neural network.

[0052] For example, it can check whether physiological connections such as shoulder-elbow-wrist and hip-knee-ankle are complete. If there are any gaps, it can use a preset human body standard topology to fill in the features of the missing edges, ensuring that the topology map covers the key structures associated with the protective gear.
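The adjacency-list storage and the completeness check described in the two paragraphs above might look like the sketch below. The node names ignore left and right sides for brevity, the feature lists are placeholders, and filling gaps from a standard topology is left out because the source does not give that template.

```python
REQUIRED_CHAINS = [("shoulder", "elbow", "wrist"), ("hip", "knee", "ankle")]

def build_adjacency(nodes, edges):
    """nodes: {node_id: features}; edges: [(node_u, node_v, edge_features)]."""
    graph = {node_id: {"features": feat, "edges": []} for node_id, feat in nodes.items()}
    for edge_id, (u, v, feat) in enumerate(edges):
        # Store each undirected edge on both endpoints.
        graph[u]["edges"].append({"id": edge_id, "to": v, "features": feat})
        graph[v]["edges"].append({"id": edge_id, "to": u, "features": feat})
    return graph

def missing_links(graph):
    """List the required physiological connections that the graph lacks."""
    gaps = []
    for chain in REQUIRED_CHAINS:
        for u, v in zip(chain, chain[1:]):
            linked = u in graph and any(e["to"] == v for e in graph[u]["edges"])
            if not linked:
                gaps.append((u, v))
    return gaps

nodes = {name: [0.0] for name in ["shoulder", "elbow", "wrist", "hip", "knee", "ankle"]}
edges = [
    ("shoulder", "elbow", [0.18]),
    ("elbow", "wrist", [0.16]),
    ("hip", "knee", [0.25]),
]
graph = build_adjacency(nodes, edges)
gaps = missing_links(graph)
```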

[0053] Then, the constructed human body topology map is associated with and stored with the timestamps and personnel IDs of the corresponding video frames, forming structured data of "timestamp-person ID-human body topology map", which lays the foundation for subsequent integration of protective gear information and realization of topology learning in multi-person scenarios.

[0054] In some embodiments, for step S4, based on the protective gear detection box information obtained in S2 and the human body topology map constructed in S3, the protective gear feature vector is transformed and attached to the key point nodes of the human body, and finally a human body-protective gear topology map with integrated protective gear information is formed, providing a complete structured input for subsequent graph neural network learning.

[0055] Specifically, Figure 3 illustrates the process of fusing protective gear feature vectors with key human body nodes. The specific implementation includes: S301, preprocessing and screening of the protective gear detection box information.

[0056] First, the protective gear detection box information output by S2 is preprocessed to ensure the validity of the protective gear data. Specifically, the pixel coordinates of each detection box output by S2, i.e., the coordinates of its top-left and bottom-right corners, are converted into the center coordinates of the box, and the width and height of the box are calculated at the same time.

[0057] Then, the detection boxes with a confidence level of 0.6 or higher for the protective gear category are retained, while suspected false detection boxes with a confidence level lower than this threshold are removed to avoid invalid data interfering with subsequent feature construction. At the same time, if there are multiple overlapping detection boxes for the same protective gear, the detection box with the highest category confidence level is retained as valid data to eliminate the impact of duplicate detection on node attachment.
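The preprocessing in S301 can be sketched as below. This is an illustrative sketch under stated assumptions: the dictionary layout of a detection and the overlap threshold of 0.5 used to decide that two boxes are duplicates are hypothetical, as the text does not specify them. The 0.6 confidence threshold comes from the text.

```python
def to_center_format(box):
    """Convert (x1, y1, x2, y2) corners to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def filter_detections(dets, conf_thr=0.6, overlap_thr=0.5):
    """Keep confident detections; among overlapping boxes of one class keep the best.

    dets: list of dicts with 'box', 'cls' and 'conf' keys (assumed layout).
    """
    confident = [d for d in dets if d["conf"] >= conf_thr]
    kept = []
    for d in sorted(confident, key=lambda d: d["conf"], reverse=True):
        duplicate = any(
            k["cls"] == d["cls"] and iou(k["box"], d["box"]) >= overlap_thr
            for k in kept
        )
        if not duplicate:
            kept.append(d)
    return kept
```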

[0058] S302, protective gear feature vector construction: for each valid protective gear detection box remaining after preprocessing, a protective gear feature vector is constructed according to preset dimensions. The vector contains one or more of the following items, and the construction method for each dimension is as follows: (1) The relative normalized coordinates of the center of the protective gear detection box.

[0059] Using the coordinates (x_node, y_node) of the key points of the human body to be attached as a reference, the coordinates (cx, cy) of the center of the protective gear detection frame are normalized. First, the relative coordinate differences Δx = cx - x_node and Δy = cy - y_node are calculated. Then, using the width W_body and height H_body of the bounding rectangle of the human body as normalization factors, the relative normalized coordinates Δx_norm = Δx / W_body and Δy_norm = Δy / H_body are obtained, eliminating the influence of personnel distance and image size on the coordinate difference.

[0060] (2) Normalized dimensions of the protective gear detection box.

[0061] Divide the protective gear detection frame dimensions (w, h) by the width W_body and height H_body of the circumscribed rectangle of the human body, respectively, to obtain normalized dimensions w_norm=w / W_body and h_norm=h / H_body. These are used to quantify the size ratio of the protective gear relative to the human body and reflect the coverage of the protective gear, such as whether insulating gloves only cover part of the hands.
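Items (1) and (2) reduce to four divisions. The following sketch uses the symbols defined above; the function name and the dictionary keys are hypothetical.

```python
def relative_geometry(center, size, node, body_size):
    """Normalize a gear box relative to its attachment keypoint and the body box.

    center: (cx, cy) of the gear box; size: (w, h) of the gear box;
    node: (x_node, y_node) of the keypoint; body_size: (W_body, H_body).
    """
    cx, cy = center
    w, h = size
    x_node, y_node = node
    w_body, h_body = body_size
    return {
        "dx_norm": (cx - x_node) / w_body,
        "dy_norm": (cy - y_node) / h_body,
        "w_norm": w / w_body,
        "h_norm": h / h_body,
    }
```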

[0062] (3) One-hot encoding of the protective gear category.

[0063] One-hot encoded vectors are constructed based on the protective gear category. A preset set of protective gear categories is used, such as {insulating gloves: 0, goggles: 1, insulating boots: 2}. If the protective gear is insulating gloves, it is encoded as [1,0,0]; if it is goggles, it is encoded as [0,1,0], and so on, so that the model can quickly identify the type of protective gear and match it with the corresponding key parts of the human body.
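As a small illustration of the encoding, using the example category set from the text (the function name is hypothetical):

```python
CATEGORIES = {"insulating gloves": 0, "goggles": 1, "insulating boots": 2}

def one_hot(category):
    """Return a one-hot list whose length equals the number of gear categories."""
    vec = [0] * len(CATEGORIES)
    vec[CATEGORIES[category]] = 1
    return vec
```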

[0064] (4) The orientation angle of the protective gear.

[0065] The orientation angle is calculated based on the geometry of the protective gear detection box. If the protective gear detection box is a regular rectangle, the angle between the long side of the detection box and the horizontal direction of the image is calculated. If a protective gear segmentation mask exists, the direction of the long side is determined by the minimum bounding rectangle of the mask area, and the angle θ is calculated. This angle can reflect the protective gear wearing posture, such as whether the goggles are tilted.

[0066] (5) Embedding of visual features of protective gear.

[0067] The image region corresponding to the protective gear detection box is cropped, scaled to a fixed size, and input into a pre-trained lightweight feature extraction network. The output is a 256-dimensional or 512-dimensional visual feature vector, which can represent the texture, color, local shape and other details of the protective gear, thereby improving the model's ability to recognize non-standard wearing conditions.

[0068] S303, Matching the attachment of protective gear feature vectors to key human body nodes: Based on the correlation between protective gear type and key human body nodes, determine the target node for attaching the protective gear feature vector.

[0069] Among them, fixed attachment rules can be preset based on the correspondence between insulating protective gear and human body parts. For example, insulating gloves can be attached to key points on the wrist, goggles can be attached to key points on the head or the midpoint of the two key points on the eyes, and insulating boots can be attached to key points on the ankle.

[0070] This ensures that the protective gear is properly attached to the most relevant key points on the human body, conforming to the physiological function correspondence between the protective gear and the body parts.

[0071] Then, the validity of the preset attachment node can be verified. If the confidence of the node is ≥0.5, the protective gear feature vector is directly attached to the node. If the confidence of the node is <0.5, its neighboring node is selected as a temporary attachment node, and a node validity mark is added to the feature vector. For example, 0 indicates that the target node is valid, and 1 indicates that it is a temporary node. The mark will be used as the basis for weight adjustment during subsequent graph neural network learning.

[0072] If multiple protective gear feature vectors need to be attached to the same human body key point node, they are arranged in descending order of the category confidence of the protective gear detection box, and the top two high-confidence protective gear feature vectors are retained and attached to the node in parallel. Each vector is individually labeled with a sequence number. Subsequently, the GNN will select the optimal matching protective gear through feature comparison.
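The node validity check and the top-two rule can be combined in one routine. The sketch below is illustrative only: the data layout is assumed, and because the text does not say which neighboring node is chosen as the temporary node, the caller passes it in explicitly.

```python
def attach_gear(gear_vectors, target, neighbor, node_conf, conf_thr=0.5, max_per_node=2):
    """Choose the attachment node and keep the highest-confidence gear vectors.

    gear_vectors: list of dicts with 'features' (list) and 'conf' (class confidence).
    node_conf: dict mapping node name to keypoint confidence.
    Returns (node, attached) where each attached vector ends with a validity flag.
    """
    use_target = node_conf[target] >= conf_thr
    node = target if use_target else neighbor
    flag = 0 if use_target else 1  # 0 = target node valid, 1 = temporary node
    ranked = sorted(gear_vectors, key=lambda g: g["conf"], reverse=True)
    attached = [
        {"seq": i + 1, "features": g["features"] + [flag]}
        for i, g in enumerate(ranked[:max_per_node])
    ]
    return node, attached
```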

[0073] S304: After completing the attachment of the protective gear feature vectors, information is supplemented to the human body topology map constructed in S3 to form the final human body-protective gear topology map.

[0074] In this process, the feature vector of the attached protective gear can be used as an additional attribute and integrated into the original feature vector of the corresponding human body key point node to form an expanded node feature vector, ensuring that the node contains both human body structure information and protective gear information.

[0075] Specifically, in the adjacency list of the human body topology graph, protective gear association markers can be added to nodes that are attached with protective gear, recording the protective gear type, protective gear feature vector dimension, and attachment timestamp. This facilitates the subsequent rapid location of nodes containing protective gear information by the GNN, improving message transmission efficiency. Furthermore, the human body-protective gear topology graph can be associated with the timestamp and personnel ID of the corresponding video frame, stored in a graph structure data format to ensure a one-to-one correspondence between the timestamp, personnel ID, and human body-protective gear topology graph.

[0076] In some embodiments, for step S5, based on the human-protective gear topology graph formed in S4, message passing and feature aggregation are completed through a pre-trained graph attention network (GAT), the compliance confidence of the protective gear wearing state is output, and the compliance threshold is dynamically adjusted in combination with the temporal module and the action recognition module.

[0077] Before inputting the human-protective gear topology map into the GAT model, the AR glasses can first perform data adaptation processing to ensure that it matches the model input format.

[0078] First, a graph structure format conversion can be performed. For example, the GraphML format human-protective gear topology graph stored in S4 can be converted into an adjacency matrix and node feature matrix format supported by the GAT model. The adjacency matrix has a dimension of [V×V], where V is the total number of human keypoint nodes, such as 17 or 25; matrix elements are 0 or 1, where 1 indicates a physiological connection between two nodes, and 0 indicates no connection. The node feature matrix has a dimension of [V×F], where F is the total dimension of the expanded node features, containing the original features of human keypoints and the feature vectors of the protective gear; the matrix elements are normalized feature values to eliminate the interference of feature magnitude differences on model learning.

[0079] Next, feature dimensions are unified. If the feature vector dimensions of different protective gear types differ, a linear mapping layer is used to uniformly map all protective gear feature vectors to 64 dimensions, ensuring that the feature dimensions of each node in the node feature matrix are consistent and meeting the input requirements of the GAT model.

[0080] Then, invalid nodes are masked. For example, for low-reliability human keypoint nodes with a confidence level of less than 0.3 in S3, invalid markers are added to the node feature matrix, such as setting all elements of the node's feature vector to 0. During the GAT model's learning process, attention weights on the information of that node will be automatically reduced, thus preventing invalid nodes from interfering with the learning of spatial topological relationships.
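The adjacency matrix construction and the zero-masking of low-confidence nodes can be sketched with plain lists. This is a simplified illustration; the function names are hypothetical, and a real pipeline would likely use tensors rather than nested lists.

```python
def build_adjacency(num_nodes, edges):
    """Build a symmetric V x V 0/1 matrix from a list of (i, j) physiological edges."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj

def mask_low_confidence(features, confidences, thr=0.3):
    """Zero out the feature row of every node whose keypoint confidence is below thr."""
    return [
        [0.0] * len(row) if conf < thr else list(row)
        for row, conf in zip(features, confidences)
    ]
```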

[0081] Then, the pre-trained GAT model is loaded, and at least three rounds of message passing and feature aggregation are performed on the input human-protective gear topology graph. For example, the specific process may include: The system employs a 3-layer GAT architecture with 4 attention heads per layer, 128 hidden dimensions, and a dropout probability of 0.2. The first layer receives the preprocessed node feature matrix and adjacency matrix, and each layer calculates the attention weights between nodes through a multi-head attention mechanism.

[0082] In the first round of message passing, the focus is on the interaction between the node's own features and the features of its directly adjacent nodes. For each node containing protective gear information, GAT calculates attention weights, aggregates the human body structure features of its directly adjacent nodes, initially fuses the correlation information between protective gear and local human body structure, and outputs the node feature matrix after the first round of feature updates.

[0083] In the second round of message passing, the focus is on the interaction between the node's own features and the features of its indirectly adjacent nodes. For each node containing protective gear information, based on the node features updated in the first round, GAT further aggregates the features of indirectly adjacent nodes, learns the topological relationship between the protective gear and the larger structure of the human body, corrects the attention weights, and highlights the structural features related to the protective gear wearing status.

[0084] In the third round of message passing, the focus is on the interaction between the node's own features and the features of all neighboring nodes. For each node containing protective gear information, GAT performs global aggregation of all node features, combining the features from the first two rounds. It emphasizes the matching information between protective gear features and human body topological constraints, such as the angle matching between goggles and the head-shoulder structure, and the position matching between insulating boots and the hip-knee-ankle structure. The final node feature matrix is output, which incorporates global spatial topological relationship information.

[0085] Then, the node features of the attached protective gear in the final node feature matrix can be compressed and classified using a two-layer multilayer perceptron, outputting the feature vector corresponding to each protective gear, which provides a basis for subsequent compliance confidence calculation.

[0086] Then, based on the features aggregated by the GAT model, the probability distribution of each protective gear for the preset wearing state category is calculated and output.

[0087] Five wear status categories can be preset: correct wear, partial wear, tilted wear, slippage, and missing, covering both "non-standard wear" and normal wear scenarios.

[0088] The protective gear feature vector output by GAT can be input into a softmax activation function to obtain the probability distribution of the protective gear across the five wearing status categories. The sum of the probability values is 1, where a higher probability value for a particular category indicates a higher confidence level that the protective gear belongs to that wearing status. The system then stores the compliance confidence results, including the personnel ID, protective gear type, timestamp, and probability distribution, for example, "ID1-Insulating Gloves-00:01:23-[0.92,0.03,0.02,0.02,0.01]". Simultaneously, a visualized compliance confidence heatmap is generated, which uses human body contours as a basis and color depth to represent the correct wearing probability of each protective gear, providing intuitive data support for subsequent time-series processing and alarms.

[0089] If a certain type of protective gear should be detected but is not, the probability distribution of that protective gear will be output as [0,0,0,0,1], that is, the probability of the "missing" category is 1, ensuring that all protective gear that should be monitored has a compliant confidence output.
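The softmax step and the fallback for undetected gear can be sketched as follows. The state order follows the five categories listed above; the function names and the use of `None` to signal undetected gear are assumptions made for illustration.

```python
import math

STATES = ["correct", "partial", "tilted", "slipped", "missing"]

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def wearing_distribution(logits=None):
    """Return the 5-state distribution; undetected gear maps entirely to 'missing'."""
    if logits is None:
        return [0.0, 0.0, 0.0, 0.0, 1.0]
    return softmax(logits)
```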

[0090] Preferably, after the compliance confidence is output, a time-series module processes the data of multiple consecutive frames, and the compliance threshold is adjusted in conjunction with an action recognition module. Figure 4 illustrates the compliance threshold adjustment process. The specific process is as follows:

S401, time-series data acquisition: the compliance confidence of the protective gear and the coordinates of the human body key points are collected over T consecutive seconds to form a time-series data sequence. The sampling frequency is synchronized with the video frame rate, and T ranges from 1.5 to 3 seconds.

S402, temporal consistency score calculation: a temporal convolutional network (Temporal ConvNet) processes the time-series data sequence and calculates the temporal consistency score. The convolution kernels capture the trend of the compliance confidence across consecutive frames and output a temporal consistency score in the interval [0, 1]. The higher the score, the more stable the protective gear wearing status.

S403, operation stage recognition: a lightweight SlowFast action recognition module is called with the human key point sequence of T consecutive seconds as input to identify the current operation stage. For example, the "contact with electrical appliances" stage is determined to be a critical operation stage, and the "preparation" stage is determined to be a non-critical stage.

S404, dynamic adjustment of the compliance threshold: a base compliance threshold is set according to the operation stage and then adjusted according to the temporal consistency score. For example, when the consistency score is greater than or equal to 0.8, the threshold is lowered by 5%; when the consistency score is less than 0.5, the threshold is raised by 5%. The result is a dynamic compliance threshold that adapts to different work scenarios and states.

S405, associated storage of threshold and confidence: the dynamically adjusted compliance threshold is stored together with the corresponding timestamp, operation stage, and temporal consistency score, providing the threshold basis for the subsequent judgment of protective gear wearing compliance.
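The threshold adjustment rule of S404 can be sketched as below. The base thresholds per stage are hypothetical values chosen for illustration, and the 5% adjustment is assumed to be relative (multiplicative) rather than an absolute offset, since the text does not say which.

```python
# Hypothetical base thresholds; the source does not give concrete values.
BASE_THRESHOLDS = {"critical": 0.85, "non_critical": 0.70}

def dynamic_threshold(phase, consistency):
    """Adjust the stage-specific base threshold by the temporal consistency score."""
    th = BASE_THRESHOLDS[phase]
    if consistency >= 0.8:
        th *= 0.95  # stable wearing state: lower the threshold by 5%
    elif consistency < 0.5:
        th *= 1.05  # unstable wearing state: raise the threshold by 5%
    return th
```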

[0091] Optionally, for the graph neural network (GAT) model, a composite loss function is constructed that includes protective gear detection loss, GNN classification loss, and temporal consistency loss, and a contrastive learning loss term is introduced to optimize training, ensuring that the model accurately learns the spatial topological relationship between the human body and protective gear.

[0092] Specifically, before calculating the loss function, the training dataset is preprocessed to provide suitable samples for each loss term. This includes, for example, dataset partitioning: the labeled training data is divided into training, validation, and test sets in a ratio of 7:2:1, ensuring that the samples cover all wearing states (correct wearing, partial wearing, tilted wearing, slippage, and missing) and include complex scenes such as strong light, occlusion, and multiple overlapping people. The training data comprises more than 10k frames, and each frame carries labels for human body key points, protective gear detection boxes, and wearing status.

[0093] After acquiring the training data, the human body key points and protective gear detection box information of each frame sample can be converted into graph structure data for input to the GAT model. At the same time, continuous T-second temporal sample sequences are extracted to provide data for calculating the temporal consistency loss. Then, contrastive learning sample pairs of correctly worn samples and non-standard worn samples are constructed at a ratio of 1:3. The correctly worn samples are human body-protective gear topology graphs labeled "correct", and the non-standard worn samples include samples labeled "partial", "tilted", and "slipped". Each pair of samples must come from the same protective gear type and similar working scenarios, so that the contrastive learning focuses on differences in wearing status rather than differences in scenario or protective gear type.

[0094] Optionally, the general formula for the composite loss function is L = L_det + α·L_gnn_cls + β·L_temp + γ·L_contrast, where α, β, and γ are weighting coefficients, set to α=1.2, β=0.8, and γ=0.5 using 5-fold cross-validation. The calculation methods for each component are as follows: (1) Protective gear detection loss (L_det): L_det is the output error of the protective gear detection model and comprises a bounding box regression loss, a category classification loss, and a target existence loss (BCEWithLogitsLoss).

[0095] Among them, the bounding box regression loss is derived from the CIoU between the model-predicted protective gear detection box and the labeled box, which reflects how well the box position, size, and aspect ratio match. The closer the CIoU value is to 1, the smaller the loss. The category classification loss is the cross-entropy between the predicted probability of the protective gear category and the one-hot encoding of the label, which penalizes category prediction errors. The target existence loss is the binary cross-entropy between the predicted probability that protective gear exists in the box and the binary label, which suppresses false detection boxes.

[0096] The three losses are summed with weights of 1:1:0.5 to obtain L_det, which is used to quantify the overall error of protective gear detection.
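The two weighted sums, L_det and the composite loss L, can be written directly from the stated coefficients. The sketch below takes the individual loss terms as already computed scalars; the function names are hypothetical.

```python
def detection_loss(box_loss, cls_loss, obj_loss):
    """Weighted sum of the three detection terms with weights 1 : 1 : 0.5."""
    return 1.0 * box_loss + 1.0 * cls_loss + 0.5 * obj_loss

def composite_loss(l_det, l_gnn_cls, l_temp, l_contrast,
                   alpha=1.2, beta=0.8, gamma=0.5):
    """L = L_det + alpha * L_gnn_cls + beta * L_temp + gamma * L_contrast."""
    return l_det + alpha * l_gnn_cls + beta * l_temp + gamma * l_contrast
```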

[0097] (2) GNN classification loss (L_gnn_cls): L_gnn_cls is calculated on the wearing state classification results output by the GAT model to optimize the model's ability to distinguish wearing states. A multi-class cross-entropy loss compares the probability distribution over the 5 wearing states output by GAT with the ground-truth labels.

[0098] Optionally, for non-standard wearing samples such as "partial wearing" and "slippage" with a small sample size, a weighting coefficient of 1.5 is applied when calculating the loss to solve the sample imbalance problem and improve the model's sensitivity to identifying abnormal states. Finally, by iterating through all protective gear samples, the average cross-entropy loss is calculated to obtain L_gnn_cls, which reflects the deviation between the GAT model classification results and the actual wearing status.
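A class-weighted cross-entropy of this kind can be sketched as follows. The class indices assume the state order [correct, partial, tilted, slipped, missing], and treating only "partial" and "slipped" as the minority classes is an assumption based on the two examples named in the text.

```python
import math

MINORITY_WEIGHT = 1.5
MINORITY_CLASSES = {1, 3}  # assumed indices of "partial" and "slipped"

def weighted_cross_entropy(probs_batch, labels):
    """Average cross-entropy with a 1.5x weight on minority wearing states.

    probs_batch: list of 5-element probability lists; labels: list of class indices.
    """
    total = 0.0
    for probs, y in zip(probs_batch, labels):
        weight = MINORITY_WEIGHT if y in MINORITY_CLASSES else 1.0
        total += -weight * math.log(max(probs[y], 1e-12))  # clamp to avoid log(0)
    return total / len(labels)
```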

[0099] (3) Temporal consistency loss (L_temp): To address the stability of protective gear features across consecutive frames, L_temp is calculated to suppress misjudgments caused by frame jumps. Specifically, mean squared error (MSE) loss is used, calculating the squared difference of the GAT output feature vectors for the same protective gear in two consecutive frames. Within a time window of length T seconds, the MSE loss of all consecutive frame pairs is averaged. Simultaneously, samples with abrupt changes in inter-frame features (i.e., MSE values greater than 0.3) are multiplied by a penalty coefficient of 2.0 to strengthen temporal stability constraints. Finally, L_temp is obtained, quantifying the fluctuation degree of protective gear features across consecutive frames; a smaller loss indicates better temporal consistency.
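The temporal consistency loss can be sketched on plain lists of feature vectors. This is an illustrative version with a hypothetical function name; it applies the penalty to each frame pair whose MSE exceeds 0.3 before averaging over the window.

```python
def temporal_consistency_loss(frames, jump_thr=0.3, penalty=2.0):
    """Average MSE between consecutive feature vectors, penalizing abrupt jumps.

    frames: list of per-frame feature vectors for the same protective gear.
    """
    losses = []
    for prev, cur in zip(frames, frames[1:]):
        mse = sum((a - b) ** 2 for a, b in zip(prev, cur)) / len(cur)
        losses.append(mse * penalty if mse > jump_thr else mse)
    return sum(losses) / len(losses) if losses else 0.0
```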

[0100] (4) Contrastive learning loss (L_contrast): By constructing positive and negative sample pairs to optimize the feature space distribution, the model's ability to distinguish wearing states is improved.

[0101] Among them, sample pairs include positive sample pairs, which are "correctly worn" samples and "correctly worn" samples, or "partially worn" samples and "partially worn" samples of the same protective gear in similar scenarios; negative sample pairs are "correctly worn" samples and "non-standard worn" samples of the same protective gear.

[0102] Optionally, the InfoNCE loss is used, with the feature similarity of positive sample pairs as the numerator and the sum of the feature similarities of negative sample pairs as the denominator, and a temperature coefficient τ of 0.1 is added to calculate the logarithmic loss. During training, negative samples that are difficult to distinguish are dynamically screened, that is, negative samples whose feature similarity is close to that of positive sample pairs. Their loss is multiplied by a weight of 1.8 to further increase the distance between positive and negative samples in the feature space. Then, the loss of all constructed sample pairs is averaged to obtain L_contrast, ensuring that the features of samples with the same wearing state are clustered and the features of samples with different states are separated.

[0103] Optionally, by using 5-fold cross-validation, the values of α, β, and γ are adjusted on the validation set, and finally α=1.2, β=0.8, and γ=0.5 are determined to achieve the optimal balance between classification accuracy, temporal stability, and feature discrimination of the model.

[0104] The AdamW optimizer is used with the composite loss function L as the optimization objective. The model is iteratively trained on the training set for 30-50 epochs. If the composite loss L on the validation set does not decrease for 3 consecutive epochs, training is stopped and the current optimal model weights are saved to avoid overfitting. After training, the classification accuracy of protective gear wearing status and the temporal consistency error are calculated on the test set to ensure that the model meets the detection requirements of actual operation scenarios.

[0105] The AdamW optimizer can use an initial learning rate of 1e-4, which is multiplied by a decay factor of 0.8 every 5 epochs.
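The learning rate schedule and the early stopping rule can be sketched without a training framework. The sketch assumes the decay is multiplicative (the rate is multiplied by 0.8 every 5 epochs) and that "does not decrease" is measured against the best validation loss seen so far; both function names are hypothetical.

```python
def learning_rate(epoch, base_lr=1e-4, factor=0.8, step=5):
    """Step decay: multiply the base rate by `factor` once every `step` epochs."""
    return base_lr * factor ** (epoch // step)

def early_stop_epoch(val_losses, patience=3):
    """Return the epoch index at which training stops, or None if it never stops.

    Training stops once the validation loss has failed to improve on the best
    value for `patience` consecutive epochs.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None
```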

[0106] Preferably, based on the aforementioned protective gear compliance confidence level and dynamically adjusted compliance threshold, compliance judgment is completed in conjunction with time-series window constraints, and alarm information is generated when preset conditions are met.

[0107] Specifically, before making a compliance judgment, the basic data should be linked and organized to ensure the integrity of the evidence chain.

[0108] Specifically, the compliance confidence level, dynamic compliance threshold, and time-series window parameters corresponding to the same "Personnel ID - Protective Gear Type" can be associated to form a single protective gear compliance judgment dataset, avoiding data confusion between different personnel and different protective gears. From the probability distribution of compliance confidence, the probability value of the "correct wearing" category is extracted and denoted as P_correct, which directly represents the compliance level of the current wearing status of the protective gear. If the probability P_missing of a certain protective gear corresponding to the "missing" category is ≥ 0.9, then its P_correct value is directly assigned to 0, simplifying the judgment logic for extreme cases. The validity of the dynamically adjusted compliance threshold is verified. If Th exceeds the reasonable range of [0.6, 0.95] due to abnormal time-series consistency score, it is automatically corrected to the range boundary value, taking 0.95 when it is greater than the upper limit and 0.6 when it is less than the lower limit, avoiding misjudgment or missed judgment due to abnormal threshold.
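The two normalization rules in this step, forcing P_correct to 0 for near-certain missing gear and clamping the dynamic threshold Th to [0.6, 0.95], can be sketched as follows. The function names are hypothetical, and the distribution order is assumed to be [correct, partial, tilted, slipped, missing].

```python
def effective_p_correct(dist, missing_thr=0.9):
    """Return P_correct, or 0 when P_missing is at least `missing_thr`."""
    p_correct, p_missing = dist[0], dist[4]
    return 0.0 if p_missing >= missing_thr else p_correct

def clamp_threshold(th, low=0.6, high=0.95):
    """Clamp a dynamic compliance threshold to the range [low, high]."""
    return min(max(th, low), high)
```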

[0109] Preferably, a judgment method based on threshold comparison of consecutive frames within a time window is adopted to avoid misjudgment caused by instantaneous fluctuations in a single frame.

[0110] Specifically, a sliding time-series window with a preset number of frames N can be constructed with the current processing frame as the window endpoint, and the P_correct and dynamic threshold of each frame within the window can be recorded. If there are missing frame data within the window, the missing values are filled in by linear interpolation of P_correct between the previous and next frames to ensure the integrity of the window data.

[0111] The process involves iterating through each frame of data within the sliding window and comparing the frame's P_correct with the corresponding dynamic threshold. If P_correct is less than the dynamic threshold, it is recorded as a low-compliance frame. Starting from the window's initial frame, the number C of consecutive low-compliance frames is counted. If a frame with P_correct ≥ the dynamic threshold appears midway, the count C of low-compliance frames is reset to 0. If the counted C is greater than or equal to the corresponding threshold C0, the protective gear is determined to be non-compliant. If C is less than C0, the current wear is determined to be compliant or in a state of fluctuating compliance. For "missing" protective gear, it is directly determined to be "severely non-compliant" without waiting for the consecutive frame count.
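The window logic, interpolation of missing frames followed by counting consecutive low-compliance frames, can be sketched as below. The function names are hypothetical. Missing values are represented as `None`, and the sketch assumes the window begins and ends with known values, since interpolation needs a neighbor on both sides.

```python
def fill_missing(values):
    """Linearly interpolate None entries that lie between two known values."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for lo, hi in zip(known, known[1:]):
        for i in range(lo + 1, hi):
            t = (i - lo) / (hi - lo)
            out[i] = out[lo] + t * (out[hi] - out[lo])
    return out

def is_non_compliant(p_correct, thresholds, c0):
    """True if at least c0 consecutive frames have P_correct below their threshold."""
    run = 0
    for p, th in zip(p_correct, thresholds):
        run = run + 1 if p < th else 0  # a compliant frame resets the count
        if run >= c0:
            return True
    return False
```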

[0112] Alarm information will be generated and triggered according to preset rules only when the compliance judgment result is "non-compliant" or "seriously non-compliant".

[0113] For example, alarm levels may include Level 1 alarms and Level 2 alarms, corresponding to serious non-compliance and general non-compliance, respectively.

[0114] Level 1 alarms apply to scenarios where protective gear is missing, or where protective gear has slipped or is partially worn while the work phase is a critical stage such as "contact with electrical appliances," indicating a high safety risk. Level 2 alarms apply to scenarios where protective gear is tilted or partially worn while the work phase is non-critical, indicating a medium to low safety risk.

[0115] The alarm information can include structured key information in the format of "[alarm level]-[personnel ID]-[protective gear type]-[non-compliance status]-[trigger window frame number]-[current operation stage]", such as "Level 1 alarm-ID2-insulating gloves-missing-50 consecutive frames-contact with electrical appliances stage", and an alarm trigger timestamp is attached for easy traceability.
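Alarm level selection and message formatting can be sketched as follows. The status labels and function names are hypothetical, and combinations the text does not cover (for example, tilted gear during a critical stage) fall back to Level 2 here as an assumption.

```python
CRITICAL_PHASES = {"contact with electrical appliances"}

def alarm_level(status, phase):
    """Return 1 for serious non-compliance, otherwise 2."""
    if status == "missing":
        return 1
    if status in {"slipped", "partial"} and phase in CRITICAL_PHASES:
        return 1
    return 2  # assumed default for all remaining non-compliant cases

def format_alarm(person_id, gear, status, frames, phase):
    """Build the structured alarm string in the field order given in the text."""
    level = alarm_level(status, phase)
    return (f"Level {level} alarm-{person_id}-{gear}-{status}-"
            f"{frames} consecutive frames-{phase} stage")
```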

[0116] In one embodiment, preferably, in a scenario with multiple workers, a unique identifier ID is assigned to each worker and their movement trajectory is tracked; the detected protective gear is matched with the human body topology map of each worker, and the matching conditions simultaneously consider spatial overlap, appearance feature similarity, and graph neural network context consistency score.

[0117] Specifically, in the first frame of a video sequence, or whenever a new person enters the acquisition range, a unique ID is assigned to each worker to ensure the uniqueness and continuity of personnel identification.

[0118] Then, the video sequence is processed frame by frame. A lightweight object detection model is used to identify the bounding rectangles of human bodies in the image. The intersection over union (IoU) of each human body bounding box in the current frame with all ID-assigned human body bounding boxes in the previous frame is calculated. If the IoU of a human body bounding box with every existing ID-assigned bounding box is less than 0.3, the person is determined to be newly entered, and the ID allocation process is triggered.

[0119] The AR glasses can generate an ID using a combination of timestamp and personnel number, in the format "TS-XX". TS is the timestamp of the frame in which the new person is first detected, accurate to milliseconds; XX is the personnel number of the new person under that timestamp, starting from 01 and incrementing to ensure that the ID is not repeated in different times and different scenarios.
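New-person detection and ID generation can be sketched as below. Representing the timestamp as an integer number of milliseconds is an assumption; the text only requires millisecond precision. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def is_new_person(box, tracked_boxes, thr=0.3):
    """A box is a new person if its IoU with every tracked box is below thr."""
    return all(iou(box, t) < thr for t in tracked_boxes)

def make_id(timestamp_ms, index):
    """Format an ID as 'TS-XX', with XX a two-digit number starting from 01."""
    return f"{timestamp_ms}-{index:02d}"
```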

[0120] Then, the newly assigned ID is associated with initial data, including the coordinates of the bounding box of the human body detected in the first frame, the set of human body key points, and the initial human body topology map, and stored in the "Personnel ID-Topology Map" associated database to provide a foundation for subsequent trajectory tracking and protective gear matching.

[0121] Next, a multi-target tracking algorithm combined with appearance features is used to continuously track the movement trajectory of workers with each ID, avoiding ID switching or loss.

[0122] The ByteTrack multi-target tracking algorithm can be used, combined with a lightweight appearance feature extraction model. For example, the algorithm parameters are set as follows: a detection box matching IoU threshold of 0.5; a maximum track retention of 10 frames after the trajectory disappears (i.e., the ID is retained for up to 10 frames after a person is briefly occluded); and an appearance feature similarity matching threshold of 0.6.

[0123] For each human bounding box detected in the current frame, a preliminary matching is first performed against the trajectory-predicted bounding boxes of each ID from the previous frame, based on the intersection over union (IoU) of the boxes: if the IoU is greater than or equal to 0.5, the box is directly associated with the corresponding ID; if the IoU is less than 0.5, the appearance features of the current human bounding box are extracted and their cosine similarity with the appearance features of each ID in the previous frame is calculated. If the similarity is greater than or equal to 0.6, the box is associated with the corresponding ID.
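The two-stage association (IoU first, appearance similarity as a fallback) can be sketched as follows. This is not ByteTrack itself, only an illustration of the matching rule described above; the track data layout is hypothetical, and choosing the best-scoring candidate at each stage is an assumption.

```python
import math

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0

def associate(det_box, det_feat, tracks, iou_thr=0.5, sim_thr=0.6):
    """Match a detection to a track id, or return None if neither stage matches.

    tracks: {track_id: {"box": predicted box, "feat": appearance vector}}.
    """
    if not tracks:
        return None
    by_iou = max(tracks, key=lambda t: iou(det_box, tracks[t]["box"]))
    if iou(det_box, tracks[by_iou]["box"]) >= iou_thr:
        return by_iou
    by_sim = max(tracks, key=lambda t: cosine(det_feat, tracks[t]["feat"]))
    if cosine(det_feat, tracks[by_sim]["feat"]) >= sim_thr:
        return by_sim
    return None
```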

[0124] After successful association, the trajectory information of the ID is updated, including the bounding rectangle of the human body in the current frame, the coordinates of the human body key points, and the human body topology map. The trajectory coordinate sequence is recorded, and the center coordinates of the human body bounding box are saved in each frame. If an ID fails to match a human body bounding box for 10 consecutive frames, it is determined that the person has left the collection range, the ID is marked as invalid, and trajectory updates are stopped. If a human body bounding box matches multiple IDs in the same frame, the ID with the highest appearance feature similarity is selected for association, and the remaining IDs are still marked as invalid.
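
The trajectory bookkeeping in this paragraph can be illustrated with a small sketch. The Track class and its field names are hypothetical; only the 10-frame limit, the per-frame center recording, and the invalidation rule come from the text.

```python
from dataclasses import dataclass, field

MAX_MISSED_FRAMES = 10  # consecutive unmatched frames before the ID is invalidated


@dataclass
class Track:
    track_id: str
    centers: list = field(default_factory=list)  # trajectory coordinate sequence
    missed: int = 0                              # consecutive frames without a match
    valid: bool = True

    def update(self, box=None):
        """Record one frame; box is (x1, y1, x2, y2), or None if unmatched."""
        if not self.valid:
            return  # invalid IDs receive no further trajectory updates
        if box is None:
            self.missed += 1
            if self.missed >= MAX_MISSED_FRAMES:
                self.valid = False  # person is considered to have left the scene
            return
        self.missed = 0
        x1, y1, x2, y2 = box
        self.centers.append(((x1 + x2) / 2, (y1 + y2) / 2))


track = Track("20260225083015123-01")
track.update((0, 0, 10, 20))
for _ in range(9):
    track.update(None)  # brief occlusion: the ID is still retained
print(track.valid, track.centers)
```

Resetting the counter on every successful match is what makes the limit apply to consecutive misses rather than to the total number of misses.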

[0125] Subsequently, the protective gear detected in S2 is matched with the human body topology map of the worker with each ID, based on three categories of conditions (spatial, appearance, and topological), to determine the ownership of the protective gear.

[0126] First, the matching conditions are calculated, including the spatial overlap, the appearance feature similarity, and the GNN context consistency score.

[0127] Specifically, for each protective gear detection frame and the bounding rectangle of a person with a certain ID, the IOU between the two is calculated and denoted as the spatial overlap IOU_spatial. The larger the value, the stronger the spatial correlation between the protective gear and the person. If the protective gear detection frame and the person frame do not overlap, the ID is directly excluded.

[0128] Specifically, the appearance features of the image region corresponding to the protective gear detection box and the appearance features of the human body with the corresponding ID are extracted. The similarity between the two is calculated using the cosine similarity formula and denoted as the appearance feature similarity Sim_appear. The larger the value, the stronger the appearance correlation between the protective gear and the human body.

[0129] In this process, the feature vector of the protective gear is temporarily attached to the candidate node of the human body topology graph of the ID, and the resulting graph is input into the pre-trained GAT model, which outputs the context consistency score Score_gnn between the protective gear and the human body topology. The larger the value, the higher the degree of matching between the protective gear and the human body topology.

[0130] Next, weights are assigned to the three matching conditions, and the overall matching score of each protective gear with a given ID is calculated: Score_total = ω1 × IOU_spatial + ω2 × Sim_appear + ω3 × Score_gnn. Through cross-validation, the weights can be determined as ω1 = 0.3, ω2 = 0.2, and ω3 = 0.5.
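
A minimal sketch of the weighted score, using the example weights above. The function name total_score and the use of None to signal an excluded ID are assumptions; the exclusion itself follows the rule in [0127] that non-overlapping boxes rule out the ID.

```python
W_SPATIAL, W_APPEAR, W_GNN = 0.3, 0.2, 0.5  # example weights from the text


def total_score(iou_spatial, sim_appear, score_gnn):
    """Return Score_total, or None when the ID is excluded outright."""
    if iou_spatial <= 0.0:
        return None  # gear box and person box do not overlap
    return W_SPATIAL * iou_spatial + W_APPEAR * sim_appear + W_GNN * score_gnn


# Worked example: 0.3 * 0.4 + 0.2 * 0.5 + 0.5 * 0.9 = 0.12 + 0.10 + 0.45
print(round(total_score(0.4, 0.5, 0.9), 2))
```

With these weights the GNN context score contributes half of the total, so a gear with modest spatial overlap can still be matched when its position fits the body topology well.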

[0131] For each protective gear, its overall matching score with every valid ID is calculated, and the ID with the highest overall matching score, provided that score is greater than 0.5, is selected as the ID to which the protective gear belongs. If the Score_total for all IDs is less than 0.5, the protective gear is determined not to match any human body and is marked as a suspected false-positive detection, which is excluded from the subsequent compliance judgment. If an ID matches multiple protective gears of the same type at the same time, the protective gear with the highest Score_total is kept as the valid match, and the remaining protective gears are marked as "redundant protective gear" and removed.
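
The ownership decision in this paragraph can be sketched as follows. The function assign_gear and its input layout are hypothetical, and treating a score of exactly 0.5 as unmatched is an assumption, since the text only covers scores above and below 0.5.

```python
MATCH_THRESHOLD = 0.5


def assign_gear(scores, gear_types):
    """Decide gear ownership.

    scores: {gear_id: {person_id: Score_total}} for valid IDs only.
    gear_types: {gear_id: type name}.
    Returns (assignment, false_positives, redundant).
    """
    owners, false_positives = {}, []
    for gear, per_person in scores.items():
        best = max(per_person.items(), key=lambda kv: kv[1], default=None)
        # Assumption: a score of exactly 0.5 is treated as "not matched".
        if best is None or best[1] <= MATCH_THRESHOLD:
            false_positives.append(gear)  # suspected false-positive detection
        else:
            owners[gear] = best  # (person_id, score)

    kept, redundant = {}, []  # kept: (person_id, type) -> (gear_id, score)
    for gear, (person, score) in owners.items():
        key = (person, gear_types[gear])
        if key not in kept:
            kept[key] = (gear, score)
        elif score > kept[key][1]:
            redundant.append(kept[key][0])  # displaced by a better match
            kept[key] = (gear, score)
        else:
            redundant.append(gear)

    assignment = {gear: person for (person, _), (gear, _) in kept.items()}
    return assignment, false_positives, redundant


scores = {
    "glove1": {"P1": 0.8, "P2": 0.3},
    "glove2": {"P1": 0.6},
    "helmet1": {"P1": 0.4, "P2": 0.45},
}
types = {"glove1": "glove", "glove2": "glove", "helmet1": "helmet"}
print(assign_gear(scores, types))
```

In this example glove2 is dropped as redundant because P1 already owns a glove with a higher score, and helmet1 is set aside because no ID exceeds the threshold.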

[0132] Then, the protective gear-ID attribution relationship is linked to the "Personnel ID-Protective Gear-Topology" database, and the human-protective gear topology of the corresponding ID is updated to ensure that the protective gear can be accurately associated with the topology of the attributing personnel during subsequent GAT model processing.

[0133] Figure 5 shows an AR glasses-based insulated protective gear wearing compliance detection system 500. The system corresponds to the method embodiment illustrated in Figure 1 and specifically includes:
The acquisition module 501, used to acquire a video sequence containing at least one operator;
The detection box information acquisition module 502, used to process each frame of the video sequence, obtain the coordinate information of human body key points through the pose estimation model, and obtain the detection box information of the protective gear through the target detection model;
The human body topology graph construction module 503, used to construct a human body topology graph based on the human body key points, wherein each human body key point is used as a graph node and the physiological connection between adjacent key points is used as an edge of the graph;
The fusion topology graph construction module 504, used to convert the detection box information of the protective gear into a protective gear feature vector, and to attach the protective gear feature vector to the human body key point node most relevant to it in the human body topology graph, so as to form a human body-protective gear topology graph that integrates the protective gear information;
The compliance output module 505, used to input the human body-protective gear topology graph into a pre-trained graph neural network model, perform message passing and feature aggregation through the graph neural network model, learn the spatial topological relationship between human body key points and protective gear, and output a compliance confidence score characterizing the protective gear wearing state, wherein the compliance confidence score includes the probability distribution over multiple wearing state categories;
The alarm module 506, used to determine whether the wearing of protective gear is compliant based on the compliance confidence score, and to generate alarm information when it is determined to be non-compliant.
The spatial topological relationship refers to the nonlinear, part-dependent relative positional pattern of the protective gear relative to the relevant set of key points on the human body, which is used to distinguish between standard and non-standard wearing states.

[0134] Those skilled in the art will clearly understand that the technical solutions of the embodiments of this application can be implemented by means of software and/or hardware. In this specification, "unit" and "module" refer to software and/or hardware that can complete a specific function either independently or in cooperation with other components, wherein the hardware may be, for example, a field-programmable gate array (FPGA), an integrated circuit (IC), etc.

[0135] Each processing unit and/or module in the embodiments of this application can be implemented by an analog circuit that implements the functions described in the embodiments of this application, or by software that executes the functions described in the embodiments of this application.

[0136] Figure 6 shows a schematic structural diagram of an electronic device according to an embodiment of this application, which can be used to implement the method of the embodiment illustrated in Figure 1. As shown in Figure 6, the electronic device 600 may include at least one processor 601, at least one network interface 604, a user interface 603, a memory 605, and at least one communication bus 602. The communication bus 602 is used to enable connection and communication between the components. The user interface 603 may include buttons, and optionally a standard wired or wireless interface. The network interface 604 may include, but is not limited to, a Bluetooth module, an NFC module, a Wi-Fi module, etc.

[0137] The processor 601 may include one or more processing cores and connects to the various parts of the device 600 via various interfaces and lines. It implements the functions and data processing of the device 600 by running or executing instructions, programs, code sets, or instruction sets stored in the memory 605, and by accessing data in the memory 605. Optionally, the processor 601 may be implemented in at least one hardware form among DSP, FPGA, and PLA. The processor 601 may also integrate one or a combination of a CPU, a GPU, and a modem. The CPU mainly handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content required for display; and the modem is used for wireless communication. It is understood that the modem may alternatively not be integrated into the processor 601 and may instead be implemented as a separate chip.

[0138] The memory 605 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 605 includes a non-transitory computer-readable medium for storing instructions, programs, code, code sets, or instruction sets. The memory 605 may be divided into a program storage area and a data storage area, wherein the program storage area may be used to store instructions for implementing an operating system, instructions for implementing at least one function (such as touch functionality, audio playback functionality, image playback functionality, etc.), and instructions for implementing the foregoing method embodiments; the data storage area may be used to store data involved in the relevant method embodiments. The memory 605 may also be at least one storage device located remotely from the processor 601. As shown in Figure 6, the memory 605, which serves as a computer storage medium, may contain an operating system, a network communication module, a user interface module, and program instructions.

[0139] In particular, the methods and / or embodiments in this application can be implemented as computer software programs. For example, the embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowchart. When the computer program is executed by processor 601, it performs the functions defined in the methods of this application.

[0140] Another embodiment of this application provides a computer-readable storage medium having computer program instructions stored thereon, which can be executed by a processor to implement the methods and / or technical solutions of any one or more embodiments of this application described above.

[0141] The computer-readable storage medium may include, but is not limited to, any type of disk, including floppy disks, optical disks, DVDs, CD-ROMs, microdrives, as well as magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic cards or optical cards, nanosystems (including molecular memory ICs), or any type of medium or device suitable for storing instructions and / or data.

[0142] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.

[0143] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A method for detecting the compliance of wearing insulating protective gear based on AR glasses, characterized in that the method is applied to AR glasses and includes the following steps:
acquiring a video sequence containing at least one operator;
processing each frame of the video sequence to obtain the coordinate information of human body key points through a pose estimation model and to obtain the detection box information of the protective gear through a target detection model;
constructing a human body topology graph based on the human body key points, wherein each human body key point is used as a graph node, and the physiological connections between adjacent key points are used as edges of the graph;
converting the detection box information of the protective gear into a protective gear feature vector, and attaching the protective gear feature vector to the human body key point node most relevant to it in the human body topology graph, to form a human body-protective gear topology graph that integrates the protective gear information;
inputting the human body-protective gear topology graph into a pre-trained graph neural network model, wherein the graph neural network model performs message passing and feature aggregation to learn the spatial topological relationship between human body key points and protective gear, and outputs a compliance confidence score characterizing the protective gear wearing state, the compliance confidence score including the probability distribution over multiple wearing state categories;
determining, based on the compliance confidence score, whether the protective gear is worn in compliance, and generating alarm information when it is determined to be non-compliant;
wherein the spatial topological relationship refers to the nonlinear, part-dependent relative positional pattern of the protective gear relative to the relevant set of key points on the human body, which is used to distinguish between standard and non-standard wearing states.

2. The method according to claim 1, characterized in that the graph neural network model is a graph attention network, and the message passing and feature aggregation involve at least three rounds of message passing to propagate structural context information between nodes in the graph.

3. The method according to claim 1, characterized in that the protective gear feature vector includes at least one of the following: normalized coordinates of the center of the protective gear detection box relative to the human body key point to which it is attached, normalized size of the protective gear detection box, one-hot encoding of the protective gear category, orientation angle of the protective gear, and a visual feature embedding extracted from the protective gear image region.

4. The method according to claim 1, characterized in that the wearing state categories include: correctly worn, partially worn, worn at an angle, slipped, and missing.

5. The method according to claim 1, characterized in that generating alarm information when non-compliance is determined includes: within a preset time window, if the probability value of the compliant category in the compliance confidence score of the same protective gear remains below a threshold for a predetermined number of frames, generating the alarm information.

6. The method according to claim 1, characterized in that the loss function used in the training process of the graph neural network model is a composite loss function, which includes at least: a protective gear detection loss, a graph neural network classification loss, and a temporal consistency loss.

7. The method according to claim 6, characterized in that the loss function also includes a contrastive learning loss term, which is used to maximize the representation distance between correctly worn and non-standard worn sample pairs in the feature space of the graph neural network.

8. A compliance detection system for wearing insulating protective gear based on AR glasses, characterized in that the system includes:
an acquisition module, used to acquire a video sequence containing at least one operator;
a detection box information acquisition module, used to process each frame of the video sequence, obtain the coordinate information of human body key points through a pose estimation model, and obtain the detection box information of the protective gear through a target detection model;
a human body topology graph construction module, used to construct a human body topology graph based on the human body key points, wherein each human body key point is used as a graph node and the physiological connections between adjacent key points are used as graph edges;
a fusion topology graph construction module, used to convert the detection box information of the protective gear into a protective gear feature vector, and to attach the protective gear feature vector to the human body key point node most relevant to it in the human body topology graph, so as to form a human body-protective gear topology graph that integrates the protective gear information;
a compliance output module, used to input the human body-protective gear topology graph into a pre-trained graph neural network model, perform message passing and feature aggregation through the graph neural network model, learn the spatial topological relationship between human body key points and protective gear, and output a compliance confidence score characterizing the protective gear wearing state, the compliance confidence score including the probability distribution over multiple wearing state categories;
an alarm module, used to determine whether the wearing of protective gear is compliant based on the compliance confidence score, and to generate alarm information when it is determined to be non-compliant;
wherein the spatial topological relationship refers to the nonlinear, part-dependent relative positional pattern of the protective gear relative to the relevant set of key points on the human body, which is used to distinguish between standard and non-standard wearing states.

9. An electronic device, characterized in that it includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions that can be executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

10. A computer-readable medium having computer program instructions stored thereon, characterized in that the computer program instructions can be executed by a processor to implement the method of any one of claims 1-7.