Method and system for analyzing construction site gate passage violations based on a large model

By synchronously collecting and associating identity, physical-status, and visual data at construction site gates with a large-model approach, the method identifies violations automatically. This removes the reliance of existing systems on manual review to determine violations, and improves identification accuracy and efficiency.

CN122244786A · Pending · Publication Date: 2026-06-19 · UNIVERSAL UBIQUITOUS TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: UNIVERSAL UBIQUITOUS TECH CO LTD
Filing Date: 2026-03-10
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing construction site gate systems cannot accurately link identity authentication data, physical status data, and video data. Violations must therefore be determined by after-the-fact manual review, which is inefficient, prone to omissions, and fails to meet high safety management standards.

Method used

Using a large model approach, identity authentication data, physical status data, and visual observation data of the gate passage area are collected simultaneously. Video segments are extracted by using changes in the gate's state as segmentation points, and multi-target detection and identity association are performed to establish a mapping relationship between authenticated identity and visual targets, thereby determining the compliance of passage.

Benefits of technology

It achieves automatic and accurate identification of violations, reduces manual intervention, improves the accuracy and real-time performance of violation judgment, and meets the high standards required for safety management of smart construction sites.

✦ Generated by Eureka AI based on patent content.

Figure CN122244786A_ABST

Abstract

This invention provides a method and system, based on a large model, for analyzing violations at construction site turnstiles, and relates to the field of behavior analysis technology. The method simultaneously acquires multi-source data from the turnstile area and segments video clips according to the opening and closing status of the turnstile gate. Multi-target detection and identity association are performed on each clip to establish a mapping relationship between authenticated identities and visual targets. Access compliance is verified based on this mapping, so that violations such as tailgating, false passage, and detours are identified automatically. The invention achieves accurate, automatic identification of turnstile access violations and improves the efficiency of security management at construction site entrances and exits.

Description

Technical Field

[0001] This invention relates to the field of behavior analysis technology, and in particular to a method and system for analyzing illegal passage behavior at construction site gates based on a large model.

Background Technology

[0002] In the field of smart construction site safety management, turnstiles are the core facility for controlling personnel access. Currently, to ensure safe passage and prevent violations, the conventional approach is to deploy an automated monitoring system integrating multiple sensors. This system typically relies on the collaborative work of an identity authentication module, turnstile physical status sensors, and video surveillance equipment. The identity authentication module verifies the legitimate identity of personnel, such as through card swiping, facial recognition, or QR code scanning. The turnstile physical status sensors monitor the opening and closing status of the gate and the blocking status of infrared beam detectors in real time. The video surveillance equipment continuously records the area around the turnstile for later review or real-time viewing.

[0003] However, this conventional integrated monitoring approach reveals significant shortcomings in practical applications. Existing methods typically treat identity authentication data, physical status data, and video data as independent streams, lacking a mechanism for precise correlation and fusion analysis across time and event dimensions. For example, the system may record a successful identity authentication and a turnstile opening action, but it cannot automatically and accurately correlate this authentication with the passage of a specific person in the video. This leads to a heavy reliance on manual review of recordings for determining typical violations such as "tailgating" (one person authenticates, multiple people pass through), "false passage" (authentication without actual passage), and "detour" (not using the turnstile channel), resulting in low efficiency and a high risk of omissions. Furthermore, due to the lack of structured video analysis based on complete passage events, the system struggles to automatically calculate the precise dwell time and movement trajectory of individuals within the channel, making it impossible to meet high standards of security management in terms of accuracy and real-time performance in violation detection.

Summary of the Invention

[0004] This invention provides a method and system for analyzing illegal passage behavior at construction site gates based on a large model, which can solve the problems in the prior art.

[0005] A first aspect of this invention provides a method for analyzing illegal access behavior at construction site gates based on a large model, comprising: simultaneously collecting identity authentication data, physical status data, and visual observation data of the turnstile passage area; taking the moment the turnstile gate changes from closed to open as the segmentation start point and the moment it returns to closed as the segmentation end point, and extracting the video segment within each segmentation period; for each video segment, performing multi-target detection and identity association to identify all human targets appearing in the segment, calculating the dwell time of each human target in the gate channel area, determining whether each human target has completed the position migration from the front side of the gate to the back side of the gate, and matching the authentication success identifier with the human targets in the segment to establish a mapping relationship between authenticated identities and visual targets; and verifying passage compliance based on the mapping relationship. When multiple human targets that have completed position migration are detected within one gate opening cycle but only one authentication success identifier exists, a tailgating violation is determined. When an authentication success identifier exists but the corresponding human target has not completed position migration, a false passage violation is determined. When a human target has completed position migration, but the infrared beam trigger signal in the physical status data has no trigger record and the human target's movement trajectory does not pass through the gate channel area, a detour violation is determined.

[0006] Simultaneously collecting identity authentication data, physical status data, and visual observation data of the turnstile passage area, taking the moment the turnstile gate changes from closed to open as the segmentation start point and the moment the gate returns to closed as the segmentation end point, and extracting the video segment within each segmentation period includes the following. The identity authentication data includes an authentication success identifier and an authentication timestamp; the physical status data includes the gate's opening/closing status and the infrared beam trigger signal; and the visual observation data is a continuous video frame sequence. The gate opening/closing status signal in the physical status data is monitored in real time. When the signal is detected to jump from a first state value representing the closed state to a second state value representing the open state, the current time is recorded as the first timestamp and marked as the segmentation start point, and the data capture process of the video buffer is triggered at the same time. The signal continues to be monitored; when it is detected to return from the second state value to the first state value, the current time is recorded as the second timestamp and marked as the segmentation end point. The interval between the first and second timestamps is calculated as the gate opening duration. A continuous video frame sequence whose time range covers the first timestamp to the second timestamp is extracted from the video buffer and encapsulated into an independent video segment; the segment is assigned a unique identifier, and the first timestamp, the second timestamp, and the gate opening duration are associated with it as metadata.

[0007] Performing multi-target detection and identity association on each video segment, identifying all human targets appearing in the segment, calculating the dwell time of each human target within the turnstile channel area, and determining whether each human target has completed the migration from the front side of the turnstile to the rear side includes the following. Human target detection is performed on each video frame in the segment, outputting the bounding box coordinates and confidence score of each detected human target in each frame. An appearance feature vector is extracted for each human target, and feature matching based on these vectors is performed between adjacent video frames; detection results belonging to the same human target are associated into a continuous trajectory, and each trajectory is assigned a unique trajectory identifier. Based on preset gate channel area boundary coordinates, it is determined whether the bounding box coordinates in each trajectory fall within the spatial range defined by those boundary coordinates; the number of video frames in which each trajectory lies within that range is counted and converted into dwell time using the frame rate of the video segment. Boundary coordinates are also defined for the area in front of the gate and the area behind the gate. The starting and ending bounding box coordinates of each trajectory are extracted, and it is determined whether the starting bounding box lies within the area in front of the gate and whether the ending bounding box lies within the area behind the gate. When both determinations are affirmative, the human target corresponding to the trajectory is determined to have completed the position migration.
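The position-migration test can be sketched in Python as follows. This is an illustration only, under assumptions the method does not fix: each trajectory is a list of (x, y, w, h) boxes in frame order, a box counts as lying "within" an area when its center point does (mirroring the center-point rule given for the channel area in [0028]), and the front and back areas are axis-aligned rectangles given as (left, top, right, bottom). The names `center`, `in_rect`, `completed_migration`, `FRONT`, and `BACK` are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x, y, w, h): top-left corner, width, height
Rect = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def center(box: Box) -> Tuple[float, float]:
    """Center point of an (x, y, w, h) bounding box."""
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0


def in_rect(point: Tuple[float, float], rect: Rect) -> bool:
    """Inclusive point-in-rectangle test."""
    px, py = point
    left, top, right, bottom = rect
    return left <= px <= right and top <= py <= bottom


def completed_migration(track: Sequence[Box], front: Rect, back: Rect) -> bool:
    """True when the first box lies in the front area and the last box in the back area."""
    if not track:
        return False
    return in_rect(center(track[0]), front) and in_rect(center(track[-1]), back)


# Hypothetical layout: front (waiting) area at the bottom of the image, back area at the top.
FRONT = (0.0, 600.0, 1920.0, 1080.0)
BACK = (0.0, 0.0, 1920.0, 300.0)
print(completed_migration([(900, 700, 100, 200), (900, 400, 100, 200), (900, 50, 100, 200)], FRONT, BACK))  # True
```

Only the endpoints of the trajectory matter here, which matches the text: a person who wanders but starts in front and ends behind the gate counts as migrated.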

[0008] Performing feature matching between adjacent video frames based on appearance feature vectors, and associating detection results belonging to the same human target into continuous trajectories, includes the following. For each human target detected in the current video frame, a multi-level feature representation is extracted from the image region defined by its bounding box coordinates; this representation includes color histogram features, texture gradient features, and deep semantic features, which are concatenated and fused into a fixed-dimensional appearance feature vector. All active trajectories, that is, trajectories that have a detection result in the previous video frame, are obtained from the established trajectory set, and the appearance feature vector of each active trajectory in the previous frame is extracted. The feature similarity between the appearance feature vector of each human target in the current frame and that of each active trajectory is calculated. A bipartite graph matching model is then constructed: its first vertex set corresponds to the human targets in the current frame, its second vertex set corresponds to the active trajectories, and each edge weight is a weighted combination of the feature similarity and the spatial distance to the trajectory's motion-predicted position. Solving this model yields the association pairs between human targets and active trajectories in the current frame. Each associated human target is added to the trajectory record of its matched active trajectory. New trajectories, with new trajectory identifiers, are initialized for human targets that fail to match, and active trajectories that fail to match are marked as temporarily interrupted. When the temporary interruption lasts longer than a preset frame threshold, the trajectory is terminated.
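A minimal sketch of the bipartite association step follows. It assumes a specific edge weight that the method leaves open: `ALPHA` times the cosine similarity plus `(1 - ALPHA)` times a proximity term that falls linearly to zero at `MAX_DIST` pixels from a constant-velocity predicted position. `ALPHA`, `MAX_DIST`, `MIN_SCORE`, and `MAX_GAP` are hypothetical values, and the classes are illustrative, not the applicant's implementation. For clarity, the maximum-weight matching is found by exhaustive search over one-to-one assignments, which is only practical for the handful of people in one gate frame; a production tracker would more likely use the Hungarian algorithm. Interrupted trajectories are kept as match candidates so they can resume, which is also an assumption.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical parameters; the method leaves the weighting and thresholds unspecified.
ALPHA = 0.7        # share of the edge weight given to appearance similarity
MAX_DIST = 200.0   # pixels; proximity falls linearly to 0 at this distance
MIN_SCORE = 0.5    # edges below this weight are not accepted as associations
MAX_GAP = 30       # frames a trajectory may stay temporarily interrupted


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class Track:
    def __init__(self, tid: int, feat: Sequence[float], pos: Point):
        self.tid, self.feat, self.history, self.missed = tid, list(feat), [pos], 0

    def predicted(self) -> Point:
        """Constant-velocity prediction from the last two positions."""
        if len(self.history) < 2:
            return self.history[-1]
        (x0, y0), (x1, y1) = self.history[-2], self.history[-1]
        return 2 * x1 - x0, 2 * y1 - y0


def edge_weight(feat: Sequence[float], pos: Point, track: Track) -> float:
    px, py = track.predicted()
    proximity = 1.0 - min(math.hypot(pos[0] - px, pos[1] - py) / MAX_DIST, 1.0)
    return ALPHA * cosine(feat, track.feat) + (1 - ALPHA) * proximity


def best_assignment(w: List[List[float]]) -> List[Tuple[int, int]]:
    """Maximum-weight one-to-one matching by exhaustive search (small inputs only)."""
    n_det, n_trk = len(w), (len(w[0]) if w else 0)
    if n_det <= n_trk:
        candidates = (list(enumerate(p)) for p in permutations(range(n_trk), n_det))
    else:
        candidates = ([(d, t) for t, d in enumerate(p)] for p in permutations(range(n_det), n_trk))
    best_total, best_pairs = 0.0, []
    for pairs in candidates:
        kept = [(d, t) for d, t in pairs if w[d][t] >= MIN_SCORE]
        total = sum(w[d][t] for d, t in kept)
        if total > best_total:
            best_total, best_pairs = total, kept
    return best_pairs


class Tracker:
    def __init__(self):
        self.active: List[Track] = []
        self.finished: List[Track] = []
        self.next_id = 1

    def update(self, detections: List[Tuple[Sequence[float], Point]]) -> None:
        """detections: (appearance feature, box center) pairs for one frame."""
        w = [[edge_weight(f, p, t) for t in self.active] for f, p in detections]
        pairs = best_assignment(w)
        matched_trk = {t for _, t in pairs}
        for d, t in pairs:  # extend matched trajectories
            trk = self.active[t]
            trk.feat, trk.missed = list(detections[d][0]), 0
            trk.history.append(detections[d][1])
        for i, trk in enumerate(self.active):  # unmatched: temporarily interrupted
            if i not in matched_trk:
                trk.missed += 1
        self.finished += [t for t in self.active if t.missed > MAX_GAP]
        self.active = [t for t in self.active if t.missed <= MAX_GAP]
        matched_det = {d for d, _ in pairs}
        for d, (f, p) in enumerate(detections):  # unmatched detections start new trajectories
            if d not in matched_det:
                self.active.append(Track(self.next_id, f, p))
                self.next_id += 1
```

Summing weights over the whole assignment, rather than matching each detection greedily, is what distinguishes the bipartite formulation: two people crossing close together are resolved jointly.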

[0009] Matching the authentication success identifier with the human targets in the video segment to establish a mapping relationship between the authenticated identity and the visual target includes the following. The facial image features corresponding to the authentication success identifier are extracted; these are the features of the facial image captured by the identity authentication module when the person passed identity verification, encoded by a feature extraction network. Face detection is performed on each video frame in the segment to locate all face regions, and feature extraction on each detected face region produces a corresponding visual face feature vector. Each visual face feature vector is spatiotemporally associated with the established human target trajectories to determine which trajectory identifier it belongs to. The feature similarity between the facial image features and the visual face feature vector of each human target trajectory in the segment is calculated, and the trajectory whose similarity is highest and exceeds a preset similarity threshold is selected as the candidate matching trajectory. The candidate is then checked for time consistency, that is, whether its start time falls within the time window around the authentication timestamp in the identity authentication data. When this check succeeds, a mapping relationship is established between the authentication success identifier and the trajectory identifier of the candidate matching trajectory.
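The identity-mapping step might look like the following sketch. Assumptions: features are plain lists of floats compared by cosine similarity, each trajectory has already been reduced to a single visual face vector, the time window is symmetric around the authentication timestamp, and `SIM_THRESHOLD` and `TIME_WINDOW` are hypothetical values, since the method only describes them as preset.

```python
import math
from typing import Dict, Optional, Sequence

SIM_THRESHOLD = 0.6  # hypothetical "preset similarity threshold"
TIME_WINDOW = 3.0    # hypothetical half-width (s) of the window around the auth timestamp


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_identity(auth_face: Sequence[float], auth_ts: float,
                   track_faces: Dict[int, Sequence[float]],
                   track_start: Dict[int, float]) -> Optional[int]:
    """Return the trajectory id to map to an authentication success identifier, or None.

    track_faces: trajectory id -> visual face feature vector for that trajectory
    track_start: trajectory id -> trajectory start time (same clock as auth_ts)
    """
    best_id, best_sim = None, SIM_THRESHOLD
    for tid, face in track_faces.items():
        sim = cosine(auth_face, face)
        if sim > best_sim:  # highest similarity that also exceeds the threshold
            best_id, best_sim = tid, sim
    if best_id is None:
        return None
    # Time-consistency check on the candidate matching trajectory.
    if abs(track_start[best_id] - auth_ts) <= TIME_WINDOW:
        return best_id
    return None
```

Running the similarity check before the time check mirrors the order in the text; a candidate that looks right but appears at the wrong time is rejected rather than replaced by the runner-up.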

[0010] Verifying passage compliance based on the mapping relationship includes the following. For each gate opening cycle, the number of human targets in the video segment that complete the migration from the front to the back of the gate is counted, and the number of authentication success identifiers whose timestamps fall within the time range of the video segment is obtained by querying the identity authentication data. The two counts are compared. When the number of human targets exceeds the number of authentication success identifiers, some human targets have completed position migration without identity authentication. The mapping relationship is then checked, the human target trajectories not mapped to any authentication success identifier are identified and marked as tailgating violation trajectories, and tailgating violation records are generated. When the number of authentication success identifiers is greater than zero but the human target trajectory in the corresponding mapping relationship has not completed position migration, identity authentication succeeded but the gate was not actually passed. The ending bounding box coordinates of that trajectory are checked; when they still lie within the spatial range defined by the boundary coordinates of the area in front of the gate, a false passage violation is determined and a false passage violation record is generated. When a human target trajectory has completed position migration but the infrared beam trigger signal records in the physical status data contain no trigger within the time period of that trajectory, the trajectory's movement path is extracted and checked against the spatial range defined by the boundary coordinates of the gate channel area. If the path does not pass through that range, a detour violation is determined and a detour violation record is generated.
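The three compliance rules can be expressed compactly as below. This is a sketch, not the claimed implementation. It assumes each trajectory has been summarized by a hypothetical `TrackSummary` record (migration flag, center-point path, start and end time), that `auth_to_track` maps each authentication success identifier in the segment's time range to its mapped trajectory identifier (or `None`), and that infrared triggers are given as timestamps. The rules are evaluated independently, as the text describes them, so one trajectory can receive more than one record (for example, a person who both detours and did not authenticate).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


def in_rect(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


@dataclass
class TrackSummary:
    tid: int
    migrated: bool       # completed front-to-back position migration
    path: List[Point]    # bounding-box centers in frame order
    t_start: float
    t_end: float


def check_compliance(tracks: Sequence[TrackSummary],
                     auth_to_track: Dict[str, Optional[int]],
                     ir_triggers: Sequence[float],
                     front: Rect, channel: Rect) -> List[Tuple[str, int]]:
    """Return (violation type, trajectory id) records for one gate opening cycle."""
    by_id = {t.tid: t for t in tracks}
    migrated = [t for t in tracks if t.migrated]
    mapped = sorted({tid for tid in auth_to_track.values() if tid is not None})
    records: List[Tuple[str, int]] = []

    # Tailgating: more migrated people than successful authentications.
    if len(migrated) > len(auth_to_track):
        records += [("tailgating", t.tid) for t in migrated if t.tid not in mapped]

    # False passage: authenticated trajectory that ends still in front of the gate.
    for tid in mapped:
        t = by_id.get(tid)
        if t is not None and not t.migrated and t.path and in_rect(t.path[-1], front):
            records.append(("false_passage", tid))

    # Detour: migrated, no infrared trigger during the trajectory, path avoids the channel.
    for t in migrated:
        beam_hit = any(t.t_start <= ts <= t.t_end for ts in ir_triggers)
        if not beam_hit and not any(in_rect(p, channel) for p in t.path):
            records.append(("detour", t.tid))
    return records
```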

[0011] A second aspect of the present invention provides a construction site gate access violation analysis system based on a large model, comprising: a data acquisition unit configured to simultaneously collect identity authentication data, physical status data, and visual observation data of the gate passage area, take the moment the gate body changes from the closed state to the open state as the segmentation start point and the moment it returns from the open state to the closed state as the segmentation end point, and extract the video segment within each segmentation period; an identity association unit configured to perform multi-target detection and identity association on each video segment, identify all human targets appearing in the segment, calculate the dwell time of each human target in the gate channel area, determine whether each human target has completed the position migration from the front side of the gate to the back side of the gate, and match the authentication success identifier with the human targets in the segment to establish a mapping relationship between authenticated identities and visual targets; and a compliance verification unit configured to verify passage compliance based on the mapping relationship. When multiple human targets that have completed position migration are detected within one gate opening cycle but only one authentication success identifier exists, a tailgating violation is determined. When an authentication success identifier exists but the corresponding human target has not completed position migration, a false passage violation is determined. When a human target has completed position migration, but the infrared beam trigger signal in the physical status data has no trigger record and the human target's movement trajectory does not pass through the gate channel area, a detour violation is determined.

[0012] A third aspect of the present invention provides an electronic device, comprising: a processor; and a memory configured to store processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to execute the aforementioned method.

[0013] A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the aforementioned method.

[0014] This method segments the video according to the physical opening and closing of the turnstile gate, so that each analysis cycle corresponds strictly to one actual passage event. This avoids the cycle misalignment that can arise when segmentation relies solely on elapsed time or on event signals, and gives subsequent fine-grained behavior analysis an accurate, unambiguous time window.

[0015] Within a single passage cycle, the method combines multi-target detection, tracking, and identity association. It identifies every human target appearing in the video and characterizes each target's passage intention and completion status by calculating its dwell time and determining whether it migrated from the front to the back of the gate. It also matches the authentication success identifier against the visual targets in the video, establishing a reliable mapping between the "authenticated identity" and the "visual entity". This step is what allows compliant passage to be distinguished from identity theft, tailgating, and other violations.

[0016] Based on the established mapping relationship, the system executes multi-dimensional compliance verification logic. When multiple independent human targets are detected to have completed the crossing but the system records only one successful authentication, a tailgating violation can be clearly determined. When a successful authentication is recorded but the associated visual target has not actually crossed the gate channel, a false passage or identity theft violation is determined. In addition, by comparing each human movement trajectory with the gate channel area and the infrared beam trigger status, the method can identify unauthorized entry such as bypassing the normal channel or circumventing the physical detection equipment.

Attached Figure Description

[0017] Figure 1 is a flowchart illustrating a method for analyzing violations at construction site gates based on a large model.

Detailed Implementation

[0018] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0019] The technical solution of the present invention will be described in detail below with reference to specific embodiments. These specific embodiments can be combined with each other, and the same or similar concepts or processes may not be described again in some embodiments.

[0020] Figure 1 is a flowchart illustrating a method for analyzing violations at construction site gates based on a large model. As shown in Figure 1, the method includes: simultaneously collecting identity authentication data, physical status data, and visual observation data of the turnstile passage area; taking the moment the turnstile gate changes from closed to open as the segmentation start point and the moment it returns to closed as the segmentation end point, and extracting the video segment within each segmentation period; for each video segment, performing multi-target detection and identity association to identify all human targets appearing in the segment, calculating the dwell time of each human target in the gate channel area, determining whether each human target has completed the position migration from the front side of the gate to the back side of the gate, and matching the authentication success identifier with the human targets in the segment to establish a mapping relationship between authenticated identities and visual targets; and verifying passage compliance based on the mapping relationship. When multiple human targets that have completed position migration are detected within one gate opening cycle but only one authentication success identifier exists, a tailgating violation is determined. When an authentication success identifier exists but the corresponding human target has not completed position migration, a false passage violation is determined. When a human target has completed position migration, but the infrared beam trigger signal in the physical status data has no trigger record and the human target's movement trajectory does not pass through the gate channel area, a detour violation is determined.

[0021] For example, synchronously collecting identity authentication data, physical status data, and visual observation data of the turnstile passage area, taking the moment the turnstile gate changes from a closed state to an open state as the segmentation start point and the moment it returns from the open state to the closed state as the segmentation end point, and extracting the video segment within each segmentation period includes the following. The identity authentication data includes an authentication success identifier and an authentication timestamp; the physical status data includes the gate's opening/closing status and the infrared beam trigger signal; and the visual observation data is a continuous video frame sequence. The gate opening/closing status signal in the physical status data is monitored in real time. When the signal is detected to jump from a first state value representing the closed state to a second state value representing the open state, the current time is recorded as the first timestamp and marked as the segmentation start point, and the data capture process of the video buffer is triggered at the same time. The signal continues to be monitored; when it is detected to return from the second state value to the first state value, the current time is recorded as the second timestamp and marked as the segmentation end point. The interval between the first and second timestamps is calculated as the gate opening duration. A continuous video frame sequence whose time range covers the first timestamp to the second timestamp is extracted from the video buffer and encapsulated into an independent video segment; the segment is assigned a unique identifier, and the first timestamp, the second timestamp, and the gate opening duration are associated with it as metadata.

[0022] The system in this application acquires three types of data synchronously through a data acquisition module. Identity authentication data is read in real time from the authentication interface of the gate controller; after each successful card swipe or face recognition, a record containing the employee number, authentication timestamp, and authentication result is generated. Physical status data is acquired by reading the gate body's limit switch status through GPIO pins, where a level signal of 0 or 1 indicates that the gate body is closed or open, respectively; at the same time, signals are acquired from the infrared beam sensors installed on the posts on both sides of the gate channel, and a trigger record is generated whenever a human body blocks an infrared beam. Visual observation data is captured by a network camera mounted above the gate at 25 frames per second with a resolution of 1920×1080 pixels. The camera is tilted downward at 45 degrees so that the waiting area in front of the gate, the gate channel area, and the passage area behind the gate are all covered simultaneously.
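The record types above might be modeled as in the following sketch. The field names, the `GATE_CLOSED`/`GATE_OPEN` constants, and the helper functions are hypothetical. The sketch also assumes that a trigger record is emitted once at the onset of each beam blockage rather than on every blocked sample, and offers an optional `lead` margin when selecting authentications for a gate cycle, on the assumption that authentication usually happens just before the gate opens.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

GATE_CLOSED, GATE_OPEN = 0, 1       # limit-switch levels read from the GPIO pin
FPS, RESOLUTION = 25, (1920, 1080)  # camera settings given in the text


@dataclass
class AuthRecord:
    employee_id: str
    timestamp: float   # Unix seconds
    success: bool


def ir_trigger_records(samples: Sequence[Tuple[float, bool]]) -> List[float]:
    """One trigger record (timestamp) per onset of a beam blockage.

    samples: (timestamp, beam_blocked) pairs in time order.
    """
    records, prev_blocked = [], False
    for t, blocked in samples:
        if blocked and not prev_blocked:
            records.append(t)
        prev_blocked = blocked
    return records


def successful_auths_in(records: Sequence[AuthRecord], t_open: float, t_close: float,
                        lead: float = 0.0) -> List[AuthRecord]:
    """Successful authentications with timestamps in [t_open - lead, t_close]."""
    return [r for r in records if r.success and t_open - lead <= r.timestamp <= t_close]
```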

[0023] The data acquisition module maintains the video buffer as a circular queue with a capacity of 20 seconds of video frames. A real-time monitoring thread polls the gate's opening/closing status signal every 10 milliseconds. When it detects the signal transitioning from low level (0), indicating closed, to high level (1), indicating open, it immediately calls the system clock function to obtain the current Unix timestamp, records it as T_open, and marks it as the segmentation start point. At the same time, it sends a data lock command to the video buffer to prevent the buffered data from being overwritten. The monitoring thread continues to track the signal; when the signal returns from high level (1) to low level (0), it obtains the current timestamp, records it as T_close, marks it as the segmentation end point, and calculates the time difference as the gate opening duration.
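The polling and locking behavior can be sketched as follows. At 25 frames per second, a 20-second buffer holds 500 frames. Instead of reading a real GPIO pin every 10 ms, the sketch is fed timestamped level samples through `poll`, which keeps it testable; a real deployment would read the pin and the clock in a loop at that interval. How the lock treats new frames is an assumption: frames at or after `T_open` are never evicted, older frames still may be, and if the buffer fills entirely with protected frames, new frames are dropped and counted. `FrameRing`, `GateMonitor`, and their attributes are hypothetical names.

```python
from collections import deque
from typing import Any, List, Optional, Tuple

FPS = 25
BUFFER_SECONDS = 20
POLL_INTERVAL = 0.010  # seconds between polls of the gate status signal


class FrameRing:
    """Circular video buffer holding (pts, frame) pairs."""

    def __init__(self, capacity: int = FPS * BUFFER_SECONDS):  # 500 frames
        self.capacity = capacity
        self.frames: deque = deque()
        self.lock_from: Optional[float] = None  # frames with pts >= this are protected
        self.dropped = 0

    def push(self, pts: float, frame: Any) -> None:
        if len(self.frames) == self.capacity:
            if self.lock_from is not None and self.frames[0][0] >= self.lock_from:
                self.dropped += 1  # evicting would overwrite locked segment data
                return
            self.frames.popleft()
        self.frames.append((pts, frame))

    def unlock(self) -> None:
        self.lock_from = None


class GateMonitor:
    """Turns polled gate levels (0 = closed, 1 = open) into (T_open, T_close, duration)."""

    def __init__(self, ring: FrameRing):
        self.ring = ring
        self.prev_level = 0
        self.t_open: Optional[float] = None
        self.segments: List[Tuple[float, float, float]] = []

    def poll(self, t: float, level: int) -> None:
        if self.prev_level == 0 and level == 1:  # closed -> open: segmentation start
            self.t_open = t
            self.ring.lock_from = t  # data lock command
        elif self.prev_level == 1 and level == 0 and self.t_open is not None:
            self.segments.append((self.t_open, t, t - self.t_open))  # segmentation end
            self.t_open = None
        self.prev_level = level


ring = FrameRing()
monitor = GateMonitor(ring)
levels = [0] * 10 + [1] * 150 + [0] * 10  # simulated polls: closed, open for 1.5 s, closed
for i, level in enumerate(levels):
    monitor.poll(i * POLL_INTERVAL, level)
# monitor.segments now holds one segment lasting roughly 1.5 s
```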

[0024] The video extraction module locates the corresponding video frame sequence in the video buffer using the two timestamps T_open and T_close. Specifically, it finds the first frame by comparing the PTS (presentation timestamp) of each video frame with T_open and the last frame by comparing the PTS with T_close, and then extracts all consecutive frames between them in their original order. The extracted frame sequence is encoded with H.264, encapsulated in an MP4 container file, and named "gate_event_number_timestamp.mp4". To ensure traceability in subsequent processing, the three time parameters (T_open, T_close, and their difference) are written in JSON format to the metadata section of the video file. A unique identifier (UUID) of 32 hexadecimal digits, composed of a timestamp and a random number, is also generated and embedded in the metadata. The encapsulated video segment is then sent to the processing queue for the subsequent object detection and behavior analysis modules, after which the original buffer data is unlocked and normal cyclic writing resumes.
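A sketch of the frame selection and metadata packaging follows. It assumes frames are held as (PTS in seconds, frame) pairs in capture order, so filtering on PTS yields exactly the consecutive run between the first and last matching frames. It also reads "gate_event_number_timestamp.mp4" as `gate_{event number}_{integer Unix seconds}.mp4` and builds the 32-hex-digit identifier from 16 hex digits of the millisecond timestamp plus 64 random bits; all of these are assumptions, and the function names are hypothetical. H.264 encoding and writing the MP4 container are left out, since they depend on the encoder library in use.

```python
import json
import secrets
from typing import Any, List, Sequence, Tuple


def extract_segment(frames: Sequence[Tuple[float, Any]], t_open: float,
                    t_close: float) -> List[Tuple[float, Any]]:
    """Consecutive frames whose PTS lies in [t_open, t_close], in original order."""
    return [(pts, f) for pts, f in frames if t_open <= pts <= t_close]


def segment_metadata(t_open: float, t_close: float, event_no: int) -> Tuple[str, str]:
    """Return a file name and the JSON metadata for one gate event."""
    # Hypothetical ID layout: 16 hex digits of ms timestamp + 16 hex digits of randomness.
    uid = f"{int(t_open * 1000):016x}{secrets.randbits(64):016x}"
    meta = {"uuid": uid, "T_open": t_open, "T_close": t_close,
            "duration": t_close - t_open}
    name = f"gate_{event_no}_{int(t_open)}.mp4"  # assumed reading of the naming pattern
    return name, json.dumps(meta)
```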

[0025] For example, performing multi-target detection and identity association on each video segment, identifying all human targets appearing in the segment, calculating the dwell time of each human target within the turnstile channel area, and determining whether each human target has completed the migration from the front side of the turnstile to the rear side includes the following. Human target detection is performed on each video frame in the segment, outputting the bounding box coordinates and confidence score of each detected human target in each frame. An appearance feature vector is extracted for each human target, and feature matching based on these vectors is performed between adjacent video frames; detection results belonging to the same human target are associated into a continuous trajectory, and each trajectory is assigned a unique trajectory identifier. Based on preset gate channel area boundary coordinates, it is determined whether the bounding box coordinates in each trajectory fall within the spatial range defined by those boundary coordinates; the number of video frames in which each trajectory lies within that range is counted and converted into dwell time using the frame rate of the video segment. Boundary coordinates are also defined for the area in front of the gate and the area behind the gate. The starting and ending bounding box coordinates of each trajectory are extracted, and it is determined whether the starting bounding box lies within the area in front of the gate and whether the ending bounding box lies within the area behind the gate. When both determinations are affirmative, the human target corresponding to the trajectory is determined to have completed the position migration.

[0026] For video segment processing, a deep-learning-based multi-object detection algorithm analyzes each video frame in turn. Given a single frame, the algorithm extracts image features with a convolutional neural network and predicts the locations of human targets on the feature map. For each detected human target, it outputs bounding box coordinates consisting of four values (the x- and y-coordinates of the top-left corner, and the width and height of the box) together with a confidence score between 0 and 1, where values closer to 1 indicate higher reliability. The confidence threshold is set to 0.6, and only detections with a confidence score greater than this threshold are retained for subsequent processing.
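Only the post-processing of the detector output is illustrated below, since the detector itself is specified only as CNN-based. The sketch shows the (x, y, w, h) box format and the strict "greater than 0.6" filter; `Detection`, `keep_confident`, and `box_center` are hypothetical names.

```python
from typing import List, NamedTuple, Sequence, Tuple

CONF_THRESHOLD = 0.6


class Detection(NamedTuple):
    x: float      # top-left corner x
    y: float      # top-left corner y
    w: float      # box width
    h: float      # box height
    score: float  # confidence in [0, 1]


def keep_confident(dets: Sequence[Detection], threshold: float = CONF_THRESHOLD) -> List[Detection]:
    """Keep detections whose confidence is strictly greater than the threshold."""
    return [d for d in dets if d.score > threshold]


def box_center(d: Detection) -> Tuple[float, float]:
    """Center point used by the later channel and migration checks."""
    return d.x + d.w / 2.0, d.y + d.h / 2.0
```

Note that a detection scoring exactly 0.6 is discarded, because the text retains only scores greater than the threshold.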

[0027] To establish cross-frame target association, a 512-dimensional appearance feature vector is extracted for each detected human target: the bounding box region of the human target is cropped, and the cropped image patch is fed into the feature extraction network, which produces a fixed-dimensional feature representation through multiple convolution and pooling operations. When performing feature matching between adjacent video frames, the cosine similarity between the feature vector of each target detected in the current frame and that of every existing trajectory in the previous frame is calculated; the cosine similarity is the dot product of the two feature vectors divided by the product of their magnitudes. When the similarity is greater than 0.75 and the Euclidean distance between the detection box center points in the current frame and the previous frame is less than 100 pixels, the two are determined to be the same target, and the current-frame detection is associated with the corresponding existing trajectory. For detections that cannot be matched with an existing trajectory, a new trajectory is created and assigned a unique integer identifier, starting from 1 and incrementing. When a trajectory receives no matching detection for 30 consecutive frames, the trajectory is considered terminated.
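A minimal sketch of the per-frame association rule, assuming features and box centers are plain Python sequences. The text does not say how ties among several qualifying trajectories are resolved; picking the highest-similarity unused trajectory (a greedy choice) is an assumption here, as are the function names. The 30-frame termination rule is handled separately by the trajectory lifecycle.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

SIM_THRESHOLD = 0.75    # cosine-similarity threshold stated in [0027]
DIST_THRESHOLD = 100.0  # center-point distance threshold in pixels stated in [0027]

Point = Tuple[float, float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the two vector magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def associate_frame(
    detections: List[Tuple[Sequence[float], Point]],
    tracks: Dict[int, Tuple[Sequence[float], Point]],
    next_id: int,
) -> Tuple[Dict[int, int], int]:
    """Map each detection index to a track ID; unmatched detections get new IDs.

    detections: (feature, box_center) pairs from the current frame.
    tracks: track_id -> (feature, box_center) from the previous frame.
    """
    assigned: Dict[int, int] = {}
    used = set()
    for d_idx, (feat, center) in enumerate(detections):
        best_id: Optional[int] = None
        best_sim = -1.0
        for t_id, (t_feat, t_center) in tracks.items():
            if t_id in used:
                continue  # each existing track absorbs at most one detection
            sim = cosine_similarity(feat, t_feat)
            close = math.dist(center, t_center) < DIST_THRESHOLD
            if sim > SIM_THRESHOLD and close and sim > best_sim:
                best_id, best_sim = t_id, sim
        if best_id is None:
            best_id = next_id  # start a new trajectory with the next integer ID
            next_id += 1
        else:
            used.add(best_id)
        assigned[d_idx] = best_id
    return assigned, next_id
```

Starting `next_id` at 1 for the first frame reproduces the "starting from 1 and incrementing" numbering.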

[0028] The boundary coordinates of the turnstile channel area are defined by marking the pixel coordinates of four vertices on the monitoring image, forming a rectangular area. For each trajectory, the center point of the bounding box in each frame is checked against the rectangular area of the turnstile channel. Specifically, the x-coordinate of the center point must be greater than or equal to the x-coordinate of the left boundary of the turnstile channel and less than or equal to the x-coordinate of the right boundary; at the same time, the y-coordinate of the center point must be greater than or equal to the y-coordinate of the upper boundary and less than or equal to the y-coordinate of the lower boundary. The number of video frames meeting these conditions is counted and recorded as the number of frames the trajectory resides within the channel area. Dividing this number by the frame rate of the video segment (typically 25 frames per second) yields the dwell time in seconds.
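The dwell-time rule reduces to an inclusive rectangle test followed by a division. A minimal sketch, assuming the channel rectangle is axis-aligned and stored as a hypothetical `(left, top, right, bottom)` tuple in image pixels (image y grows downward, so "top" is the upper boundary):

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom) in image pixels


def center_in_rect(cx: float, cy: float, rect: Rect) -> bool:
    """Inclusive containment test, matching the >= / <= comparisons in [0028]."""
    left, top, right, bottom = rect
    return left <= cx <= right and top <= cy <= bottom


def dwell_time_seconds(centers: Iterable[Tuple[float, float]], channel: Rect, fps: float = 25.0) -> float:
    """Count frames whose box center lies in the channel rectangle, then divide by fps."""
    frames_inside = sum(1 for cx, cy in centers if center_in_rect(cx, cy, channel))
    return frames_inside / fps
```

For example, a trajectory whose center is inside the channel for 50 frames of a 25 fps segment has a dwell time of 2.0 seconds.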

[0029] Determining position migration requires pre-defining the boundary coordinates of the areas in front of and behind the turnstile. The area in front of the turnstile is set within 1 to 3 meters in front of the turnstile gate, and the area behind the turnstile is set within 1 to 3 meters behind the turnstile gate; both areas are marked as rectangles on the monitoring image. For each trajectory, the starting bounding box coordinates (the center point of the bounding box when the trajectory first appears) and the ending bounding box coordinates (the center point of the bounding box when the trajectory last appears) are extracted. It is then determined whether the center point of the starting bounding box lies within the rectangle of the area in front of the turnstile and whether the center point of the ending bounding box lies within the rectangle of the area behind the turnstile. Only when both conditions are met is the human target corresponding to the trajectory confirmed to have completed a full position migration from the front to the back of the turnstile. The trajectory is then marked as a completed-migration trajectory, and its trajectory identifier is recorded for subsequent violation determination.
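A minimal sketch of the migration test, assuming the 1 to 3 meter zones have already been projected onto the image as axis-aligned pixel rectangles (the camera calibration step is not described in the text and is not shown). Only the first and last box centers of the trajectory matter.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom) in image pixels


def center_in_rect(cx: float, cy: float, rect: Rect) -> bool:
    left, top, right, bottom = rect
    return left <= cx <= right and top <= cy <= bottom


def completed_migration(centers: List[Tuple[float, float]], front: Rect, back: Rect) -> bool:
    """True when the first box center is in the front zone and the last is in the back zone."""
    if not centers:
        return False
    (sx, sy), (ex, ey) = centers[0], centers[-1]
    return center_in_rect(sx, sy, front) and center_in_rect(ex, ey, back)
```

A person walking the wrong way (back to front) fails the test, because the start and end conditions are checked separately.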

[0030] For example, performing feature matching between adjacent video frames based on the appearance feature vectors and associating detection results belonging to the same human target into a continuous trajectory includes: for each human target detected in the current video frame, extracting a multi-level feature representation from the image region defined by its bounding box coordinates, the multi-level feature representation comprising color histogram features, texture gradient features, and deep semantic features, and concatenating and fusing these into a fixed-dimensional appearance feature vector; obtaining all active trajectories from the established trajectory set, an active trajectory being one that has a detection result in the previous video frame, and extracting the appearance feature vector of each active trajectory in the previous video frame; calculating the feature similarity between the appearance feature vector of each human target in the current video frame and that of each active trajectory; constructing a bipartite graph matching model whose first vertex set corresponds to the human targets in the current video frame, whose second vertex set corresponds to the active trajectories, and whose edge weights are obtained by weighting the feature similarity and the spatial distance to each trajectory's motion-predicted position, thereby obtaining association pairs between the human targets in the current video frame and the active trajectories; appending each associated human target to the trajectory record of the corresponding active trajectory; initializing new trajectories, with new trajectory identifiers, for human targets that fail to match; and marking active trajectories that fail to match as temporarily interrupted. When the temporary interruption lasts longer than a preset frame threshold, the trajectory is terminated.

[0031] In the process of establishing the mapping relationship between authentication identity and visual target, it is necessary to perform feature matching between adjacent video frames through appearance feature vectors, and associate the detection results belonging to the same human target as a continuous trajectory.

[0032] For each human target detected in the current video frame, multi-level feature representations are extracted from the image region defined by its bounding box coordinates. Color histogram features are obtained by quantizing the image region in the HSV color space, with hue, saturation, and value (brightness) divided into 16, 8, and 8 quantization levels respectively, forming a 1024-dimensional (16 × 8 × 8) color distribution vector. Texture gradient features use the histogram of oriented gradients algorithm: the image region is divided into 8×8-pixel cells, and the gradient orientation distribution within each cell is computed and normalized over 9 orientation bins, forming a gradient vector that describes local shape. Deep semantic features are extracted with a pre-trained convolutional neural network: the image region is fed into the network, and the output of the layer preceding the fully connected layer is taken as a 512-dimensional high-level semantic representation. The three types of features are concatenated and then reduced to 256 dimensions by principal component analysis to obtain a fixed-dimensional appearance feature vector.
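The color-histogram and fusion steps can be sketched with NumPy as below. This is a hedged illustration, assuming the HSV crop is a float array with every channel already rescaled to [0, 1]; the HOG and CNN features are taken as given inputs, and the function names are hypothetical. PCA is fitted here by SVD on a batch of fused vectors; reaching the stated 256 output dimensions requires fitting on at least 256 samples.

```python
import numpy as np


def hsv_histogram(hsv: np.ndarray, bins=(16, 8, 8)) -> np.ndarray:
    """Normalized joint H/S/V histogram; 16 * 8 * 8 = 1024 bins by default.

    hsv: float array of shape (height, width, 3), every channel scaled to [0, 1].
    """
    pixels = hsv.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=bins, range=((0, 1), (0, 1), (0, 1)))
    hist = hist.ravel()
    total = hist.sum()
    return hist / total if total > 0 else hist


def fuse_features(color: np.ndarray, hog: np.ndarray, deep: np.ndarray) -> np.ndarray:
    """Concatenate the three feature types into one vector."""
    return np.concatenate([color, hog, deep])


def fit_pca(samples: np.ndarray, n_components: int):
    """Fit PCA on row-wise samples via SVD; returns (mean, components)."""
    mean = samples.mean(axis=0)
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(vectors: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Reduce fused vectors to the fitted principal subspace."""
    return (vectors - mean) @ components.T
```

In a deployment, `fit_pca` would typically run once offline on a collection of fused appearance vectors, with only `project` applied per detection.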

[0033] All active trajectories are retrieved from the established trajectory set, and the appearance feature vector of each active trajectory in the previous video frame is extracted. The cosine similarity between the appearance feature vector of each human target in the current video frame and that of each active trajectory is calculated and denoted $s_{ij}$, where $i$ is the index of the human target in the current frame and $j$ is the index of the active trajectory. At the same time, a Kalman filter predicts each active trajectory's position in the current frame from its historical motion state; the Euclidean distance between the predicted position and the center of the detection box in the current frame is calculated, normalized, and denoted the spatial distance cost.
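The text does not specify the Kalman state model or the normalization. The sketch below assumes a constant-velocity model with state `[cx, cy, vx, vy]`, a one-frame time step, an isotropic process noise `q`, and normalization by the image diagonal clipped to [0, 1]; all of these are illustrative choices. Only the predict step is shown; the update step would run after matching, when the trajectory's motion state is refreshed.

```python
import numpy as np

# Constant-velocity model, state = [cx, cy, vx, vy], time step = one frame (assumed).
F = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])


def kalman_predict(x: np.ndarray, P: np.ndarray, q: float = 1.0):
    """Kalman predict step: x' = F x, P' = F P F^T + Q, with Q = q * I (assumed noise level)."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + q * np.eye(4)
    return x_pred, P_pred


def spatial_distance_cost(pred_xy, det_center, img_w: float, img_h: float) -> float:
    """Euclidean distance normalized by the image diagonal and clipped to [0, 1]."""
    d = float(np.hypot(det_center[0] - pred_xy[0], det_center[1] - pred_xy[1]))
    return min(d / float(np.hypot(img_w, img_h)), 1.0)
```

A track at (100, 200) moving at (+5, -2) pixels per frame is predicted at (105, 198) in the next frame; a detection centered exactly there has zero spatial cost.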

[0034] A bipartite graph matching model is constructed. The first vertex set contains all human targets detected in the current video frame, and the second vertex set contains all active trajectories. Edge weights are determined by fusing feature similarity and spatial distance cost. Specifically, feature similarity is converted into a similarity score, which is then weighted and fused with the spatial distance cost at weights of 0.6 and 0.4, respectively. The Hungarian algorithm is used to solve for the maximum weighted matching in the bipartite graph, retaining only matching pairs with a weight greater than 0.5 as valid association pairs.
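A hedged sketch of the edge-weight fusion and maximum-weight matching. Two assumptions are labeled in the code: cosine similarity in [-1, 1] is rescaled to a [0, 1] score, and the distance cost enters as `(1 - cost)` so that closer predictions raise the weight. To stay dependency-light, an exhaustive search over assignments stands in for the Hungarian algorithm; it returns the same optimal matching and is practical for the handful of people in one gate cycle, while `scipy.optimize.linear_sum_assignment` (with `maximize=True`) would be the usual choice at scale.

```python
from itertools import permutations
from typing import List, Tuple

import numpy as np

W_SIM, W_DIST, MIN_WEIGHT = 0.6, 0.4, 0.5  # values stated in [0034]


def edge_weights(sim, dist_cost) -> np.ndarray:
    """Fuse cosine similarity (rows: targets, cols: tracks) with normalized distance cost.

    Assumptions: similarity in [-1, 1] is rescaled to [0, 1], and a cost in [0, 1]
    contributes as (1 - cost) so that nearer predictions raise the weight.
    """
    score = (1.0 + np.asarray(sim, dtype=float)) / 2.0
    return W_SIM * score + W_DIST * (1.0 - np.asarray(dist_cost, dtype=float))


def max_weight_assignment(w: np.ndarray) -> List[Tuple[int, int]]:
    """Exhaustive maximum-weight one-to-one matching (same optimum as the Hungarian method)."""
    transposed = w.shape[0] > w.shape[1]
    m = w.T if transposed else w  # enumerate over the smaller side
    n_rows, n_cols = m.shape
    best_total, best_pairs = -np.inf, []
    for cols in permutations(range(n_cols), n_rows):
        total = sum(m[r, c] for r, c in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return [(c, r) for r, c in best_pairs] if transposed else best_pairs


def associate(sim, dist_cost) -> List[Tuple[int, int]]:
    """Return (target_index, track_index) pairs whose fused weight exceeds MIN_WEIGHT."""
    w = edge_weights(sim, dist_cost)
    return [(i, j) for i, j in max_weight_assignment(w) if w[i, j] > MIN_WEIGHT]
```

When there are more targets than active trajectories, at least one target is left unmatched and would seed a new trajectory.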

[0035] The association pairs of successfully matched human targets and active trajectories are appended to the trajectory record of the corresponding active trajectory. The appearance feature vector of the trajectory is updated to the feature vector of the detection result of the current frame, and the motion state parameters of the trajectory are also updated. For unmatched human targets, a new trajectory is initialized, a unique trajectory identifier number is assigned, and the detection result of the current frame is used as the first node of the trajectory. Unmatched active trajectories are marked as temporarily interrupted, and the interruption start frame number is recorded. When the temporary interruption lasts for more than 30 frames, the trajectory is determined to have ended, removed from the active trajectory set, and stored in the historical trajectory database. Through the above processing, continuous tracking of human targets across frames is achieved, providing trajectory data support for subsequent determination of whether the human target has completed position migration.
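The trajectory lifecycle can be sketched as a small manager class. This is an illustration under stated assumptions: `Track` and `TrackManager` are hypothetical names, `history` is an in-memory stand-in for the historical trajectory database, and the Kalman motion-state update is omitted. A trajectory is archived once its count of consecutive unmatched frames exceeds 30.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

MAX_INTERRUPT_FRAMES = 30  # interruption limit stated in [0035]

Box = Tuple[float, float, float, float]


@dataclass
class Track:
    track_id: int
    nodes: List[Tuple[int, Box]] = field(default_factory=list)  # (frame_index, bbox)
    feature: Optional[list] = None
    interrupt_start: Optional[int] = None  # first frame of the current interruption
    missed: int = 0                        # consecutive unmatched frames


class TrackManager:
    def __init__(self) -> None:
        self.active: Dict[int, Track] = {}
        self.history: Dict[int, Track] = {}  # stand-in for the historical trajectory database
        self.next_id = 1

    def update(self, frame: int, matches: Dict[int, Tuple[Box, list]],
               unmatched: List[Tuple[Box, list]]) -> None:
        """Apply one frame of association results."""
        for tid, (box, feat) in matches.items():
            t = self.active[tid]
            t.nodes.append((frame, box))
            t.feature = feat  # appearance refreshed to the current detection
            t.interrupt_start, t.missed = None, 0
        for tid in list(self.active):
            if tid in matches:
                continue
            t = self.active[tid]
            if t.interrupt_start is None:
                t.interrupt_start = frame  # mark as temporarily interrupted
            t.missed += 1
            if t.missed > MAX_INTERRUPT_FRAMES:
                self.history[tid] = self.active.pop(tid)
        # New tracks are created after the sweep so they are not counted as missed.
        for box, feat in unmatched:
            t = Track(self.next_id, [(frame, box)], feat)
            self.active[t.track_id] = t
            self.next_id += 1
```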

[0036] For example, performing feature matching between the authentication success identifier and the human targets in the video segment to establish a mapping relationship between the authenticated identity and the visual target includes: extracting the facial image features corresponding to the authentication success identifier, the facial image features being the feature representation, encoded by the feature extraction network, of the facial image captured by the identity authentication module when the person passes identity verification; performing face detection on each video frame in the video segment to locate all face regions appearing in the frame; performing feature extraction on the detected face regions to generate corresponding visual face feature vectors; spatiotemporally associating the visual face feature vectors with the established human target trajectories to determine the trajectory identifier to which each visual face feature vector belongs; calculating the feature similarity between the facial image features and the visual face feature vector corresponding to each human target trajectory in the video segment, and selecting the trajectory with the highest feature similarity that also exceeds the preset similarity threshold as the candidate matching trajectory; and verifying the temporal consistency of the candidate matching trajectory by determining whether its start time falls within the time window around the authentication timestamp of the identity authentication data, and, when the temporal consistency verification succeeds, establishing a mapping relationship between the authentication success identifier and the trajectory identifier of the candidate matching trajectory.

[0037] In turnstile access scenarios, successful authentication typically includes structured information such as employee ID and name, as well as a facial image captured by the authentication module at the moment of verification. This facial image is processed by a pre-trained feature extraction network to generate a 512-dimensional or 1024-dimensional facial feature vector. This feature vector possesses high discriminative power and stability, effectively representing the facial features of a specific individual. The feature extraction network can employ ArcFace or CosFace architectures and is trained on a large-scale facial dataset.

[0038] For the video segment to be analyzed, face detection is performed frame by frame at a sampling frequency of 30 frames per second or 15 frames per second. MTCNN or RetinaFace detectors are used to locate the face regions in each frame and obtain the coordinates of the face bounding boxes. When the width and height of the detected face bounding box are both greater than 64 pixels, the face is considered to have sufficient resolution for feature extraction. An affine transformation is performed on each valid face region to align the face to a standard pose, and then the result is input into the same feature extraction network as the identity authentication module to generate the corresponding visual face feature vector.

[0039] When associating visual face feature vectors with human target trajectories, the principle of spatiotemporal consistency must be followed. For a face detected in a frame, the center point of its bounding box is compared with all human target detection boxes in the same frame. When the face center point falls inside a human detection box, or its vertical distance to a human detection box is less than 0.3 times the height of that detection box, the visual face feature vector is associated with the corresponding human target trajectory identifier. A human target may produce multiple visual face feature vectors over the entire video segment; the mean or median of these vectors is taken as the representative face feature of the trajectory.
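A minimal sketch of the face-to-body association, assuming boxes are `(x, y, w, h)` tuples in pixels. The text gives only the "inside, or vertical distance below 0.3 × height" rule; requiring the face center to lie horizontally within the body box, and choosing the closest body when several qualify, are assumptions. The 64-pixel size check from [0038] is included for completeness.

```python
from typing import Dict, List, Optional, Sequence, Tuple

import numpy as np

MIN_FACE_SIDE = 64  # pixels; width and height must both exceed this ([0038])
GAP_RATIO = 0.3     # vertical-gap tolerance relative to body-box height ([0039])

Box = Tuple[float, float, float, float]  # (x, y, w, h), top-left origin


def usable_face(face: Box) -> bool:
    return face[2] > MIN_FACE_SIDE and face[3] > MIN_FACE_SIDE


def vertical_gap(face: Box, body: Box) -> Optional[float]:
    """Vertical gap from the face center to the body box; None if horizontally outside (assumed)."""
    cx, cy = face[0] + face[2] / 2.0, face[1] + face[3] / 2.0
    bx, by, bw, bh = body
    if not bx <= cx <= bx + bw:
        return None
    if cy < by:
        return by - cy
    if cy > by + bh:
        return cy - (by + bh)
    return 0.0  # center lies inside the body box


def assign_face(face: Box, bodies: Dict[int, Box]) -> Optional[int]:
    """Return the track ID of the closest qualifying body box, if any."""
    best_id, best_gap = None, None
    for tid, body in bodies.items():
        gap = vertical_gap(face, body)
        if gap is not None and gap < GAP_RATIO * body[3] and (best_gap is None or gap < best_gap):
            best_id, best_gap = tid, gap
    return best_id


def representative_feature(vectors: List[Sequence[float]], use_median: bool = False) -> np.ndarray:
    """Mean (default) or median of all face vectors collected for one trajectory."""
    stacked = np.asarray(vectors, dtype=float)
    return np.median(stacked, axis=0) if use_median else stacked.mean(axis=0)
```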

[0040] Cosine similarity is used to calculate feature similarity. The feature vector of a face image is denoted as vector A, and the representative face feature vector of a human target trajectory is denoted as vector B. The dot product of the two vectors is calculated and divided by the product of their respective magnitudes. The similarity value range is [-1, 1], with values closer to 1 indicating greater feature similarity. A preset similarity threshold of 0.65 is set; when the calculated similarity exceeds this threshold, the feature match is considered reliable. All human target trajectories are traversed, and the trajectory with the highest similarity is selected as the candidate matching trajectory.
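The candidate selection can be sketched as follows, assuming the authentication feature and each trajectory's representative feature are plain numeric sequences keyed by a trajectory ID (names are hypothetical). The best-scoring trajectory is returned only if its similarity exceeds 0.65.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

FACE_SIM_THRESHOLD = 0.65  # threshold stated in [0040]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of A and B divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def select_candidate(auth_feature: Sequence[float],
                     track_features: Dict[int, Sequence[float]]) -> Optional[Tuple[int, float]]:
    """Traverse all trajectories; return (track_id, similarity) of the best one above threshold."""
    if not track_features:
        return None
    sims = {tid: cosine_similarity(auth_feature, f) for tid, f in track_features.items()}
    best = max(sims, key=sims.get)
    return (best, sims[best]) if sims[best] > FACE_SIM_THRESHOLD else None
```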

[0041] Temporal consistency verification is used to exclude matches that violate temporal logic. The identity authentication data includes an authentication timestamp marking the moment authentication was completed. Each candidate matching trajectory has a start time, corresponding to the time of the video frame in which the human target was first detected. The time window is defined as the interval from 2 seconds before to 5 seconds after the authentication timestamp. When the start time of a candidate matching trajectory falls within this window, the temporal consistency verification succeeds. If a human target entered the gate area more than 10 seconds before authentication, no mapping relationship is established even if the facial feature similarity is high, which avoids misjudging a bystander as the authenticated person. After successful verification, a key-value mapping is established between the authentication success identifier (employee number or other unique identifier) and the trajectory identifier of the candidate matching trajectory, and stored in an association table for subsequent compliance judgment.
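The window check and the association table can be sketched as below, assuming timestamps are expressed in seconds on a shared clock and the association table is a plain dictionary keyed by a hypothetical employee-number string.

```python
from typing import Dict

WINDOW_BEFORE_S = 2.0  # seconds before the authentication timestamp ([0041])
WINDOW_AFTER_S = 5.0   # seconds after the authentication timestamp ([0041])


def within_auth_window(track_start_s: float, auth_ts_s: float) -> bool:
    return auth_ts_s - WINDOW_BEFORE_S <= track_start_s <= auth_ts_s + WINDOW_AFTER_S


def bind_identity(mapping: Dict[str, int], auth_id: str, auth_ts_s: float,
                  track_id: int, track_start_s: float) -> bool:
    """Record auth_id -> track_id only if the trajectory started inside the window."""
    if not within_auth_window(track_start_s, auth_ts_s):
        return False
    mapping[auth_id] = track_id
    return True
```

A trajectory that began 11 seconds before the authentication falls outside the window, so it is never bound regardless of face similarity.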

[0042] For example, verifying compliance based on the mapping relationship includes: for each gate opening cycle, counting the number of human targets in the video segment that complete the migration from the front to the back of the gate; querying the identity authentication data to obtain the number of authentication success identifiers whose time overlaps the time range of the video segment; and comparing the two numbers. When the number of human targets is greater than the number of authentication success identifiers, some human targets have completed position migration without identity authentication; the mapping relationship is then checked, the human target trajectories not mapped to any authentication success identifier are identified and marked as tailgating violation trajectories, and tailgating violation records are generated. When the number of authentication success identifiers is greater than zero but a human target trajectory in the corresponding mapping relationship has not completed position migration, identity authentication succeeded but the gate was not actually passed; the ending bounding box coordinates of that trajectory are checked, and when they still lie within the spatial range defined by the boundary coordinates of the area in front of the gate, a false passage violation is determined and a false passage violation record is generated. When a human target trajectory has completed position migration but the infrared beam trigger signal records in the physical state data contain no trigger within the time period corresponding to that trajectory, the movement path of the trajectory is extracted, and it is determined whether the path passes through the spatial range defined by the boundary coordinates of the gate channel area. If the movement path does not pass through that range, a detour violation is determined and a detour violation record is generated.

[0043] After the mapping relationship between authenticated identities and visual targets is established, the system verifies violations for the video segment corresponding to each gate opening cycle. Specifically, it iterates through all extracted video segments and determines the timestamp range of each segment from the gate's opening and closing times. Within this time range, the total number of human targets marked as having completed position migration in the video detection results is counted and denoted $N_{pass}$. At the same time, the identity authentication database is queried for all successful authentication records that overlap the time range of the video segment, and their number is denoted $N_{auth}$.

[0044] When comparing the two counts, if $N_{pass}$ is greater than $N_{auth}$, an unauthenticated person has passed through the turnstile. The mapping table is then searched for all human target trajectories not associated with an authentication success identifier. For each unassociated trajectory, its complete bounding box coordinate sequence is extracted, and it is verified that the center of the starting bounding box lies within the area in front of the turnstile and the center of the ending bounding box lies within the area behind the turnstile. Trajectories meeting these conditions are marked as tailgating violation trajectories, and the system generates a tailgating violation record containing the video segment timestamp, the coordinate sequence of the violating trajectory, a feature image of the violator, and the authentication information of the person who was tailgated.
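The count comparison and the unmapped-trajectory lookup can be sketched as follows. This assumes the migrated trajectory IDs for the cycle have already been computed with the front/back zone test, so the re-verification in [0044] is not repeated here; record generation (timestamps, images) is left out, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence


def find_tailgaters(migrated_track_ids: Sequence[int],
                    auth_ids_in_window: Sequence[str],
                    auth_to_track: Dict[str, int]) -> List[int]:
    """Return migrated trajectories with no mapped authentication when N_pass > N_auth."""
    n_pass, n_auth = len(migrated_track_ids), len(auth_ids_in_window)
    if n_pass <= n_auth:
        return []
    mapped = {auth_to_track[a] for a in auth_ids_in_window if a in auth_to_track}
    return [tid for tid in migrated_track_ids if tid not in mapped]
```

Two people passing on one authentication yields one flagged trajectory: the one that was not bound to the authenticated identity.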

[0045] When $N_{auth}$ is greater than zero, the trajectory of the human target corresponding to each authentication success identifier in the mapping relationship is checked. For each trajectory, the bounding box coordinates of the detection in its last frame are read, and the center point of that bounding box is calculated. The spatial boundary coordinates of the area in front of the gate are then obtained; these are usually determined by the installation position of the gate, with the center line of the gate body as the boundary and the front area defined as the range from 0 to 1.5 meters from the center line. If the center point of the trajectory's ending bounding box is still within the front area, the person did not actually pass through the gate after successful authentication, and a false passage violation is determined. A false passage violation record is generated, containing the authentication timestamp, the authenticated identity information, the person's dwell time, and the final position coordinates.
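A minimal sketch of the false-passage check, assuming the 0 to 1.5 meter front area has been projected to a pixel rectangle `(left, top, right, bottom)` and that the last box center of each mapped trajectory is available in a dictionary (both assumptions). An empty mapping, which covers the $N_{auth} = 0$ case, produces no violations.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom) in image pixels


def center_in_rect(cx: float, cy: float, rect: Rect) -> bool:
    left, top, right, bottom = rect
    return left <= cx <= right and top <= cy <= bottom


def false_passages(auth_to_track: Dict[str, int],
                   last_centers: Dict[int, Tuple[float, float]],
                   front_zone: Rect) -> List[str]:
    """Authentication IDs whose mapped trajectory ends while still inside the front zone."""
    return [auth_id for auth_id, tid in auth_to_track.items()
            if tid in last_centers and center_in_rect(*last_centers[tid], front_zone)]
```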

[0046] For detecting detour violations, the system iterates through all human target trajectories that have completed position migration, reading the start and end times of the corresponding time period for each trajectory. Based on this time period, it queries the infrared beam trigger signal record table in the physical state data to check for any trigger records within that time period. Infrared beam sensors are installed on both sides of the turnstile channel, and a signal is always triggered during normal passage. When the query result shows no trigger record, the coordinates of the center points of all bounding boxes of the human target trajectory are extracted and connected in chronological order to form the movement path. The boundary coordinates of the turnstile channel area are obtained; this area is typically defined as a rectangular area extending 0.3 meters on each side of the turnstile gate. It is determined whether any coordinate point on the movement path falls within this rectangular area. If all coordinate points are outside this area, it is confirmed that the person has completed position migration by bypassing the turnstile channel, and this is judged as a detour violation. A detour violation record is generated, containing the complete movement trajectory coordinates, the violation time period, the person's characteristic image, and a visual annotation of the detour path. All violation records are pushed to the management platform in real time and trigger on-site audible and visual alarm devices.
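The detour rule combines a time-range lookup on beam triggers with a path-versus-rectangle test. The sketch below assumes trigger times and trajectory start/end times are in seconds on a shared clock, that the 0.3 meter channel band has been projected to a pixel rectangle, and that the input path belongs to a trajectory that has already completed position migration.

```python
from typing import Iterable, List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom) in image pixels


def center_in_rect(cx: float, cy: float, rect: Rect) -> bool:
    left, top, right, bottom = rect
    return left <= cx <= right and top <= cy <= bottom


def beam_triggered(trigger_times: Iterable[float], start_s: float, end_s: float) -> bool:
    """True if any infrared-beam trigger falls inside the trajectory's time period."""
    return any(start_s <= t <= end_s for t in trigger_times)


def is_detour(path: List[Tuple[float, float]], trigger_times: Iterable[float],
              start_s: float, end_s: float, channel: Rect) -> bool:
    """Detour: no beam trigger during the trajectory AND no path point inside the channel."""
    if beam_triggered(trigger_times, start_s, end_s):
        return False
    return not any(center_in_rect(x, y, channel) for x, y in path)
```

Both conditions are required, so a missed beam trigger alone (for example, a faulty sensor while the person walked through the channel) does not raise a detour violation.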

[0047] A second aspect of the present invention provides a construction site gate access violation analysis system based on a large model, comprising: The data acquisition unit is used to simultaneously collect identity authentication data, physical status data and visual observation data of the gate passage area. The moment when the gate body changes from the closed state to the open state is used as the segmentation start point, and the moment when the gate body returns from the open state to the closed state is used as the segmentation end point, and video segments within each segmentation period are extracted. The identity association unit is used to perform multi-target detection and identity association on each video segment, identify all human targets appearing in the video segment, calculate the dwell time of each human target in the gate channel area, determine whether each human target has completed the position migration from the front side of the gate to the back side of the gate, match the authentication success mark with the human targets in the video segment, and establish a mapping relationship between the authentication identity and the visual target. The compliance verification unit is used to verify the compliance of passage based on the mapping relationship. When multiple human targets that have completed position migration are detected within a gate opening cycle, but only one authentication success mark exists, it is determined to be a tailgating violation. When there is an authentication success mark but the corresponding human target has not completed position migration, it is determined to be a false passage violation. When a human target is detected to have completed position migration but the infrared beam trigger signal of the physical state data does not generate a trigger record and the human target's movement trajectory does not pass through the gate channel area, it is determined to be a detour violation.

[0048] A third aspect of the present invention provides an electronic device, comprising: a processor; and a memory configured to store processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to execute the aforementioned method.

[0049] A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the aforementioned method.

[0050] This invention can be a method, apparatus, system, and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for performing various aspects of the invention.

[0051] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for analyzing violations of access rules at construction site gates based on a large model, characterized in that it comprises: simultaneously collecting identity authentication data, physical status data, and visual observation data of the turnstile passage area; taking the moment the turnstile gate changes from a closed state to an open state as the segmentation start point and the moment the gate returns to the closed state as the segmentation end point, and extracting video segments within each segmentation period; for each video segment, performing multi-target detection and identity association to identify all human targets appearing in the video segment, calculating the dwell time of each human target in the gate channel area, determining whether each human target has completed the position migration from the front side of the gate to the back side of the gate, matching the authentication success identifier with the human targets in the video segment, and establishing a mapping relationship between authentication identity and visual targets; and verifying passage compliance based on the mapping relationship, wherein: when multiple human targets that have completed position migration are detected within a gate opening cycle but only one authentication success identifier exists, a tailgating violation is determined; when an authentication success identifier exists but the corresponding human target has not completed position migration, a false passage violation is determined; and when a human target is detected to have completed position migration but the infrared beam trigger signal in the physical state data has no trigger record and the human target's movement trajectory does not pass through the gate channel area, a detour violation is determined.

2. The method according to claim 1, characterized in that simultaneously collecting identity authentication data, physical status data, and visual observation data of the turnstile passage area, taking the moment the turnstile gate changes from closed to open as the segmentation start point and the moment the gate returns to closed as the segmentation end point, and extracting video segments within each segmentation period comprises: the identity authentication data includes an authentication success identifier and an authentication timestamp; the physical status data includes the gate opening/closing status and the infrared beam trigger signal; and the visual observation data is a continuous video frame sequence. The gate opening/closing status signal in the physical status data is monitored in real time; when the signal is detected to jump from a first state value representing the closed state to a second state value representing the open state, the current time is recorded as a first timestamp and marked as the segmentation start point, and data capture by the video buffer is triggered at the same time. The status signal continues to be monitored; when it is detected to return from the second state value to the first state value, the current time is recorded as a second timestamp and marked as the segmentation end point, and the interval between the first timestamp and the second timestamp is calculated as the gate opening duration. A continuous video frame sequence whose time range covers the first timestamp to the second timestamp is extracted from the video buffer and encapsulated into an independent video segment, which is assigned a unique identifier and associated with the first timestamp, the second timestamp, and the gate opening duration as metadata.
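The segmentation described in claim 2 can be sketched as an edge detector over the gate status stream. The numeric encodings `CLOSED = 0` and `OPEN = 1` for the first and second state values are hypothetical, as is representing the video buffer as a list of `(timestamp, frame)` pairs. A gate that is already open when monitoring starts does not open a segment, because no closed-to-open transition is observed.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

CLOSED, OPEN = 0, 1  # hypothetical encodings of the first and second state values


def segment_by_gate_state(samples: Iterable[Tuple[float, int]]) -> List[Dict[str, float]]:
    """Split on CLOSED->OPEN (start point) and OPEN->CLOSED (end point) transitions."""
    segments: List[Dict[str, float]] = []
    prev, start = None, None
    for ts, state in samples:
        if prev == CLOSED and state == OPEN:
            start = ts  # first timestamp: segmentation start point
        elif prev == OPEN and state == CLOSED and start is not None:
            segments.append({"segment_id": len(segments) + 1, "start": start,
                             "end": ts, "open_duration": ts - start})
            start = None
        prev = state
    return segments


def clip_frames(frames: Sequence[Tuple[float, object]], seg: Dict[str, float]) -> list:
    """Frames from the buffer whose timestamps fall within [start, end]."""
    return [(ts, f) for ts, f in frames if seg["start"] <= ts <= seg["end"]]
```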

3. The method according to claim 1, characterized in that performing multi-target detection and identity association on each video segment, identifying all human targets appearing in the video segment, calculating the dwell time of each human target within the turnstile channel area, and determining whether each human target has completed the migration from the front side of the turnstile to the rear side of the turnstile comprises: performing human target detection on each video frame in the video segment and outputting the bounding box coordinates and confidence score of each human target detected in each frame; extracting an appearance feature vector for each human target; performing feature matching between adjacent video frames based on the appearance feature vectors, associating the detection results belonging to the same human target into a continuous trajectory, and assigning a unique trajectory identifier to each trajectory; based on the preset boundary coordinates of the gate channel area, determining whether the bounding box coordinates in each trajectory fall within the spatial range defined by those boundary coordinates, counting the number of video frames in which each trajectory lies within that range, and converting the frame count into a dwell time using the frame rate of the video segment; defining the boundary coordinates of the area in front of the gate and the area behind the gate; extracting the starting and ending bounding box coordinates of each trajectory; and determining whether the starting bounding box coordinates fall within the area in front of the gate and whether the ending bounding box coordinates fall within the area behind the gate. When both determinations are affirmative, the human target corresponding to the trajectory is determined to have completed the position migration.

4. The method according to claim 3, characterized in that performing feature matching between adjacent video frames based on appearance feature vectors, and associating the detection results belonging to the same human target into continuous trajectories, comprises: for each human target detected in the current video frame, extracting a multi-level feature representation from the image region defined by its bounding box coordinates, the multi-level feature representation comprising color histogram features, texture gradient features, and deep semantic features, and concatenating and fusing the multi-level feature representation into an appearance feature vector of fixed dimension; obtaining all active trajectories from the established trajectory set, an active trajectory being one that has a detection result in the previous video frame, and extracting the appearance feature vector of each active trajectory in the previous video frame; calculating the feature similarity between the appearance feature vector of each human target in the current video frame and the appearance feature vector of each active trajectory; constructing a bipartite graph matching model whose first vertex set corresponds to the human targets in the current video frame and whose second vertex set corresponds to the active trajectories, each edge weight being a weighted combination of the feature similarity and the spatial distance to the position predicted by the trajectory's motion, thereby obtaining association pairs between the human targets in the current video frame and the active trajectories; adding each associated human target to the trajectory record of the corresponding active trajectory; and initializing a new trajectory, with a newly assigned trajectory identifier, for each human target that fails to match.
Each active trajectory that fails to match is marked as temporarily interrupted, and the trajectory is terminated when the temporary interruption lasts longer than a preset frame threshold.
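The association step of claim 4 can be illustrated with a minimal Python sketch. It assumes cosine similarity as the feature similarity, L2 normalization after concatenation, a constant-velocity model for the motion-predicted position, and hypothetical weights and thresholds (`ALPHA`, `BETA`, `DIST_SCALE`, `MIN_WEIGHT`, `MAX_MISSED`); none of these names or values come from the claim. The maximum-weight bipartite matching is solved by exhaustive search, which is only practical for the few people in a turnstile lane; a Hungarian-algorithm solver is the usual substitute. Unlike the claim's strict wording, the sketch also lets temporarily interrupted trajectories re-enter matching until they are terminated, which is an assumption.

```python
import itertools
import math
from dataclasses import dataclass, field

# Hypothetical parameters; the claim does not specify any values.
ALPHA = 0.7          # weight on appearance similarity
BETA = 0.3           # weight on spatial proximity to the predicted position
DIST_SCALE = 100.0   # pixel distance at which proximity drops to 0.5
MIN_WEIGHT = 0.3     # edges weaker than this are never accepted as matches
MAX_MISSED = 5       # stands in for the "preset frame threshold"


def fuse_features(color_hist, texture_grad, deep_sem):
    """Concatenate the three feature levels into one L2-normalised vector."""
    v = list(color_hist) + list(texture_grad) + list(deep_sem)
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Track:
    track_id: int
    feature: list                                # appearance vector from the last match
    history: list = field(default_factory=list)  # matched centre points (x, y)
    missed: int = 0                              # consecutive unmatched frames


def predict(track):
    """Constant-velocity guess of the track's centre in the current frame."""
    h = track.history
    if len(h) < 2:
        return h[-1]
    (x1, y1), (x2, y2) = h[-2], h[-1]
    return (2 * x2 - x1, 2 * y2 - y1)


def edge_weight(feat, center, track):
    sim = cosine(feat, track.feature)
    proximity = 1.0 / (1.0 + math.dist(center, predict(track)) / DIST_SCALE)
    return ALPHA * sim + BETA * proximity


def best_assignment(weights):
    """Maximum-weight bipartite matching by exhaustive search (small scenes only)."""
    n_det = len(weights)
    n_trk = len(weights[0]) if weights else 0
    k = min(n_det, n_trk)
    best_total, best_pairs = -1.0, []
    for dets in itertools.combinations(range(n_det), k):
        for trks in itertools.permutations(range(n_trk), k):
            pairs = [(d, t) for d, t in zip(dets, trks) if weights[d][t] >= MIN_WEIGHT]
            total = sum(weights[d][t] for d, t in pairs)
            if total > best_total:
                best_total, best_pairs = total, pairs
    return best_pairs


def update(tracks, detections, next_id):
    """Advance tracking by one frame; detections are (feature, centre) pairs."""
    weights = [[edge_weight(f, c, t) for t in tracks] for f, c in detections]
    pairs = best_assignment(weights)
    matched_dets = {d for d, _ in pairs}
    matched_trks = {t for _, t in pairs}
    for d, t in pairs:
        feat, center = detections[d]
        tracks[t].feature = feat
        tracks[t].history.append(center)
        tracks[t].missed = 0
    for i, trk in enumerate(tracks):
        if i not in matched_trks:
            trk.missed += 1            # temporarily interrupted
    for d, (feat, center) in enumerate(detections):
        if d not in matched_dets:      # unmatched detection starts a new trajectory
            tracks.append(Track(next_id, feat, [center]))
            next_id += 1
    # Terminate trajectories interrupted for longer than the frame threshold.
    return [t for t in tracks if t.missed <= MAX_MISSED], next_id
```

In use, `update` is called once per frame with the fused appearance vector and box centre of every detected person, and the returned `next_id` is passed into the next call.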

5. The method according to claim 1, characterized in that matching the authentication success identifier with the human targets in the video segment to establish a mapping relationship between the authenticated identity and the visual target comprises: extracting the facial image features corresponding to the authentication success identifier, the facial image features being the representation, encoded by a feature extraction network, of the facial image captured by the identity authentication module when the person passes identity verification; performing face detection on each video frame in the video segment to locate all face regions appearing in that frame; extracting features from each detected face region to generate a corresponding visual face feature vector; spatiotemporally associating the visual face feature vectors with the established human target trajectories to determine the trajectory identifier to which each visual face feature vector belongs; calculating the feature similarity between the facial image features and the visual face feature vectors of each human target trajectory in the video segment, and selecting as the candidate matching trajectory the trajectory with the highest feature similarity, provided that this similarity exceeds a preset similarity threshold; and verifying the temporal consistency of the candidate matching trajectory by determining whether its start time falls within a time window around the authentication timestamp in the identity authentication data, and, when the verification succeeds, establishing a mapping relationship between the authentication success identifier and the trajectory identifier of the candidate matching trajectory.
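The selection and time-consistency steps of claim 5 can be sketched as follows. This is a minimal illustration, assuming cosine similarity between face vectors, a symmetric time window around the authentication timestamp, and that a trajectory carrying several face vectors is scored by its best one; `SIM_THRESHOLD`, `TIME_WINDOW_S`, `FaceTrack`, and `map_auth_to_track` are hypothetical names, and face detection and feature extraction are assumed to have already run.

```python
import math
from dataclasses import dataclass

# Hypothetical parameters; the claim only says the threshold and window are "preset".
SIM_THRESHOLD = 0.6    # minimum cosine similarity for a candidate match
TIME_WINDOW_S = 3.0    # allowed |track start - authentication time|, in seconds


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class FaceTrack:
    track_id: int
    start_time: float      # when the trajectory first appears, in seconds
    face_features: list    # visual face vectors already associated with this track


def map_auth_to_track(auth_feature, auth_time, tracks):
    """Return the trajectory id for one authentication-success record, or None."""
    best_id, best_sim, best_start = None, -1.0, 0.0
    for trk in tracks:
        if not trk.face_features:
            continue  # no face seen on this track, nothing to compare
        # A track may carry several face vectors; keep its best one (assumption).
        sim = max(cosine(auth_feature, f) for f in trk.face_features)
        if sim > best_sim:
            best_id, best_sim, best_start = trk.track_id, sim, trk.start_time
    if best_id is None or best_sim <= SIM_THRESHOLD:
        return None  # the highest similarity does not exceed the threshold
    if abs(best_start - auth_time) > TIME_WINDOW_S:
        return None  # fails the time-consistency check
    return best_id
```

Rejecting the candidate outright when the time check fails, rather than falling back to the second-best trajectory, follows the claim's single-candidate wording.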

6. The method according to claim 1, characterized in that verifying passage compliance based on the mapping relationship comprises: for each gate opening cycle, counting the human targets in the video segment that complete the migration from the front side to the back side of the gate, querying the identity authentication data for the number of authentication success identifiers whose authentication times fall within the time range of the video segment, and comparing the two counts; when the number of human targets exceeds the number of authentication success identifiers, which indicates that at least one human target has completed position migration without identity authentication, further checking the mapping relationship, identifying the human target trajectories that are not mapped to any authentication success identifier, marking these trajectories as tailgating violation trajectories, and generating tailgating violation records; and when the number of authentication success identifiers is greater than zero but the human target trajectory in the corresponding mapping relationship has not completed position migration, which indicates that identity authentication succeeded but the person did not actually pass through the gate, checking the coordinates of the terminal bounding box of that trajectory and, when those coordinates still lie within the spatial range defined by the boundary coordinates of the area in front of the gate, determining a false passage violation and generating a false passage violation record.
When a human target trajectory has completed position migration but the infrared beam trigger signal records in the physical state data contain no trigger within the time period of that trajectory, the movement path of the trajectory is extracted and it is determined whether the path passes through the spatial range defined by the boundary coordinates of the gate channel area; if it does not, a detour violation is determined and a detour violation record is generated.
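The three checks in claim 6 can be combined into one per-cycle routine. The Python below is a minimal sketch, assuming axis-aligned rectangles for the front area and gate channel area, a trajectory path given as sampled centre points (so "passes through" is approximated by "some sampled point lies inside"), and a terminal box that must lie entirely within the front area; `TrackSummary`, `check_cycle`, and the record format are hypothetical.

```python
from dataclasses import dataclass

# Rectangles are (x1, y1, x2, y2) in image coordinates.


@dataclass
class TrackSummary:
    track_id: int
    migrated: bool     # crossed from the front side to the back side of the gate
    end_box: tuple     # terminal bounding box (x1, y1, x2, y2)
    path: list         # sampled centre points (x, y) along the trajectory
    t_start: float
    t_end: float


def point_in(pt, rect):
    x, y = pt
    return rect[0] <= x <= rect[2] and rect[1] <= y <= rect[3]


def box_in(box, rect):
    return box[0] >= rect[0] and box[1] >= rect[1] and box[2] <= rect[2] and box[3] <= rect[3]


def check_cycle(tracks, mapping, auth_ids, beam_times, front_area, channel_area):
    """Return (violation_type, track_id) records for one gate opening cycle.

    mapping: authentication-success id -> trajectory id
    auth_ids: authentication-success ids within the segment's time range
    beam_times: timestamps of infrared beam trigger records
    """
    records = []
    by_id = {t.track_id: t for t in tracks}
    migrated = [t for t in tracks if t.migrated]
    mapped = set(mapping.values())

    # Tailgating: more migrated people than authentications.
    if len(migrated) > len(auth_ids):
        records += [("tailgating", t.track_id) for t in migrated if t.track_id not in mapped]

    # False passage: an authenticated person whose trajectory ends in the front area.
    for auth in auth_ids:
        t = by_id.get(mapping.get(auth))
        if t is not None and not t.migrated and box_in(t.end_box, front_area):
            records.append(("false_passage", t.track_id))

    # Detour: migrated with no beam trigger and no point inside the channel area.
    for t in migrated:
        beam_hit = any(t.t_start <= ts <= t.t_end for ts in beam_times)
        if not beam_hit and not any(point_in(p, channel_area) for p in t.path):
            records.append(("detour", t.track_id))
    return records
```

Checking only sampled points can miss a path segment that clips a corner of the channel between samples; a segment-rectangle intersection test would close that gap at the cost of a few more lines.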

7. A large-model-based construction site gate access violation analysis system for implementing the method according to any one of claims 1 to 6, characterized by comprising: a data acquisition unit, configured to synchronously collect identity authentication data, physical state data, and visual observation data of the gate passage area, to take the moment at which the gate body changes from the closed state to the open state as the segmentation start point and the moment at which it returns from the open state to the closed state as the segmentation end point, and to extract the video segment within each segmentation period; an identity association unit, configured to perform multi-target detection and identity association on each video segment, identify all human targets appearing in the video segment, calculate the dwell time of each human target in the gate channel area, determine whether each human target has completed the position migration from the front side to the back side of the gate, and match the authentication success identifiers with the human targets in the video segment to establish a mapping relationship between authenticated identities and visual targets; and a compliance verification unit, configured to verify passage compliance based on the mapping relationship, determining a tailgating violation when multiple human targets that have completed position migration are detected within one gate opening cycle but only one authentication success identifier exists, and a false passage violation when an authentication success identifier exists but the corresponding human target has not completed position migration.
The compliance verification unit further determines a detour violation when a human target is detected to have completed position migration but the infrared beam trigger signal in the physical state data produces no trigger record and the human target's movement trajectory does not pass through the gate channel area.
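The segmentation rule of the data acquisition unit (a cycle starts when the gate goes from closed to open and ends when it returns to closed) can be sketched in a few lines. This minimal Python version assumes the gate state arrives as a time-ordered list of `(timestamp, state)` samples with states `"open"` and `"closed"`, that the gate starts closed, and that a cycle still open at the end of the data is discarded; `segment_cycles` and `clip_frames` are hypothetical names.

```python
def segment_cycles(gate_events):
    """Turn time-ordered (timestamp, state) samples into (start, end) opening cycles."""
    cycles, start, prev = [], None, "closed"   # assume the gate starts closed
    for ts, state in gate_events:
        if prev == "closed" and state == "open":
            start = ts                          # closed -> open: segmentation start point
        elif prev == "open" and state == "closed" and start is not None:
            cycles.append((start, ts))          # open -> closed: segmentation end point
            start = None
        prev = state
    return cycles                               # an unfinished cycle is dropped


def clip_frames(frames, cycles):
    """Group (timestamp, frame) pairs into one video segment per opening cycle."""
    return [[f for ts, f in frames if s <= ts <= e] for s, e in cycles]
```

Each returned segment is then handed to the identity association unit, so detection and compliance checks always operate on exactly one gate opening cycle.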

8. An electronic device, characterized by comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to execute the method according to any one of claims 1 to 6.

9. A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 6.