An AI-powered real-time video behavior analysis system for police body cameras
By constructing and dynamically reconstructing a human body topology graph and combining it with a graph convolutional network, the system addresses the difficulty police body cameras have in identifying and locating abnormal behavior in complex scenarios, achieving accurate identification and real-time localization of abnormal behavior in police body camera videos.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HUIZHOU RUITONG COMMUNICATION TECHNOLOGY CO LTD
- Filing Date
- 2026-03-17
- Publication Date
- 2026-06-30
AI Technical Summary
Existing police body cameras struggle to accurately track individual limb movements in complex scenarios, making it difficult to perform fine-grained analysis and localization of abnormal behavior. In particular, the models have difficulty accurately identifying abnormal behavior in occluded or multi-person interaction scenarios.
By constructing an initial human body topology map, dynamic reconstruction is performed using a graph convolutional network with sparse constraints on topological connections. Local motion pattern features are extracted, and the local topology is reconstructed in potential abnormal regions. Key point location information is updated to generate behavioral anomaly analysis results.
It enhances the ability to respond to dynamic changes in complex behavioral structures, improves the accuracy of abnormal area discrimination and the reliability of abnormal pattern matching under background interference and partial occlusion conditions, and realizes accurate identification and real-time positioning of abnormal behaviors in police law enforcement recorder videos.
Figure CN122313352A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of video behavior analysis technology, and specifically to an AI video behavior real-time analysis system for police law enforcement recorders.
Background Technology
[0002] With the widespread deployment of police body cameras in real-world scenarios such as street patrols, incident handling, and law enforcement evidence collection, the on-site video data they collect has played an irreplaceable role in behavior judgment, conflict reconstruction, and assessment of law enforcement effectiveness.
[0003] Existing systems typically rely on convolutional neural networks to extract static frame-level features from video images and classify and analyze overall behavior categories through human detection and action recognition models. Some methods also introduce temporal modeling to enhance the ability to recognize continuous actions. In clear scenes with no obstructions and stable behavior amplitudes, basic behavior recognition and early warning support can already be achieved.
[0004] However, the aforementioned systems generally ignore the hierarchical information of the human body's internal structure and the dynamic relationship between key points, and fail to construct a structural representation with topological constraints. As a result, when key parts are partially occluded, there is interference from multiple people interacting in the scene, or the rate of behavioral mutation is high, the model has difficulty accurately tracking the individual's limb movement state, leading to serious deviations in the prediction of abnormal behavior or frequent missed detections.
[0005] Furthermore, existing systems generally lack a mechanism for modeling differences in localized areas of motion, making it impossible to perform fine-grained analysis and localization of behavioral anomalies with sudden and covert characteristics.
Summary of the Invention
[0006] The purpose of this invention is to provide an AI video behavior real-time analysis system for police law enforcement recorders, the system comprising:
Topology construction module S101: used to collect spatial location information of key points of the human body based on real-time video, and to construct an initial human body topology diagram using the spatial location information;
Topology reconstruction module S102: used to dynamically reconstruct the topology of the initial human body topology map using a graph convolutional network that incorporates sparse constraints on topological connections, thereby obtaining a dynamic topology feature map;
Feature extraction module S103: used to extract local motion pattern features of human behavior based on the dynamic topological feature map, and to use the local motion pattern features to determine potential areas of abnormal behavior;
Anomaly analysis module S104: used to perform local structural reconstruction of the spatial location information of human body key points based on the potential region, and to generate behavioral anomaly analysis results using the spatial location information of human body key points after local structural reconstruction.
[0007] Furthermore, the construction of the initial human body topology diagram includes:
S101.1: Perform multi-scale convolution feature extraction processing on the current frame image of the real-time video to obtain image features at different scales;
S101.2: After fusing the image features at different scales, obtain a closed region containing the complete outer contour boundary of the human body;
S101.3: Based on the complete outer contour boundary of the human body, divide the closed area into several local areas corresponding to key points of the human body;
S101.4: Extract the two-dimensional coordinates of key human body points for each local region, and use the two-dimensional coordinates to construct a complete initial human body topology diagram.
[0008] Furthermore, the multi-scale convolutional feature extraction process for the current frame image of the real-time video includes:
S101.1.1: By sliding the convolution kernel, perform feature extraction processing on the current frame image to obtain multiple initial convolution feature maps with different resolutions;
S101.1.2: Perform feature fusion processing on the initial convolutional feature maps to obtain a fused multi-scale feature map;
S101.1.3: Perform human target region detection processing based on the fused multi-scale feature map to obtain preliminary detection results of the human target region;
S101.1.4: Using the preliminary detection results of the human target region, generate image features of different scales to construct a closed region of the complete outer contour boundary of the human body.
[0009] Furthermore, the model structure of the graph convolutional network with sparse constraints on topological connections includes:
a topology node sensitivity calculation module, used to calculate the node sensitivity value based on the differences in node characteristics of the initial human body topology diagram;
a topology connection filtering module, used to filter redundant topology connections with low sensitivity based on the node sensitivity value, so as to obtain a set of topology connections to be sparsified;
a topology connection sparsification module, used to perform connection deletion operations on the set of topology connections to be sparsified, so as to obtain a sparsified topology connection structure;
a node feature aggregation module, used to perform node feature aggregation processing based on the sparse topological connection structure to generate the dynamic topological feature map.
[0010] Furthermore, the topology node sensitivity calculation module includes:
a node feature difference calculation unit, used to calculate the feature difference values between adjacent nodes in the initial human body topology diagram;
a node sensitivity assessment unit, used to assess the sensitivity of each node based on the feature difference values of the nodes, and to obtain the sensitivity assessment value of each node;
a high-sensitivity node region selection unit, used to select nodes with higher sensitivity assessment values to determine the high-sensitivity node region;
a node sensitivity value output unit, used to output the node sensitivity values of regions containing highly sensitive nodes, which serve as the basis for the topology connection filtering module to filter redundant topology connections with lower sensitivity.
[0011] Furthermore, the topology connection sparsification module includes:
a sparse candidate connection determination unit, used to determine redundant topological connection regions with low sensitivity based on the node sensitivity value, forming a set of connections to be sparsified;
a connection region deletion unit, used to perform a connection deletion operation on the set of connections to be sparsified, and to obtain the connection structure after deletion;
a topology connection reconstruction unit, used to redetermine the connection relationship between nodes based on the connection structure after deletion, forming a sparse connection structure;
a sparse connection structure output unit, used to output the sparse connection structure as the basis for the node feature aggregation module to perform node feature aggregation processing and generate the dynamic topology feature map.
[0012] Furthermore, identifying potential regions of abnormal behavior includes:
S103.1: Perform local motion trajectory feature extraction processing on each node in the dynamic topology feature graph to obtain the motion trajectory features of each node;
S103.2: Based on the connection relationship of the human body topology, aggregate the motion trajectory features into multiple local motion pattern features;
S103.3: Based on the difference between each local motion pattern feature and the pre-stored behavioral pattern features, identify local motion pattern features that differ significantly from normal behavior;
S103.4: Mark the node regions corresponding to the identified local motion pattern features with large differences as potential regions of behavioral abnormalities, and output them as the basis for local structural reorganization based on potential regions.
[0013] Furthermore, the local motion trajectory feature extraction process for each node in the dynamic topology feature graph includes:
S103.1.1: Track the spatial position information of each node in the dynamic topology feature map along consecutive video frames to obtain the displacement sequence of the node in consecutive video frames;
S103.1.2: Perform motion direction change processing on the displacement sequence to extract the motion direction change features of the nodes in consecutive video frames;
S103.1.3: Evaluate the node motion stability based on the motion direction change features to obtain the node motion stability features;
S103.1.4: Combine the displacement sequence and motion stability features of the node to output the motion trajectory features of the node, which can then be aggregated into multiple local motion pattern features.
[0014] Furthermore, generating the behavioral anomaly analysis results includes:
S104.1: Construct a local topological neighborhood centered on key human body points within the potential region of the aforementioned behavioral abnormality;
S104.2: Perform topological relationship reconstruction processing on the human body key points in the local topological neighborhood to obtain the local reconstructed topological structure;
S104.3: Update the spatial location information of human body key points in the local region according to the local reconstructed topology to obtain the reconstructed human body key point location information;
S104.4: Based on the reconstructed human body key point location information, perform abnormal behavior pattern matching processing and output the behavior anomaly analysis results.
[0015] The technical effects and advantages provided by the present invention in the above technical solution are as follows: This invention constructs an initial human topology map based on the spatial location information of key human points acquired in real-time video, and achieves dynamic reconstruction of the topology by combining topological connection sparsity constraints. This enables stable representation of the spatiotemporal correlation features of local motion patterns in human behavior, thereby improving the response capability to dynamic changes within complex behavioral structures and enabling early capture and localization of potential abnormal behaviors in real-time video. Furthermore, this invention also filters redundant connections in the topology based on node sensitivity values and performs sparsification processing, and aggregates and analyzes node motion trajectory features and motion stability features to achieve differentiated detection of key motion regions, thereby improving the accuracy of distinguishing abnormal regions under background interference and local occlusion conditions. Furthermore, this invention reconstructs the local topology within the potential area of behavioral abnormalities and updates the spatial location information of key human body points to achieve accurate restoration of the movement trend of abnormal areas, thereby improving the spatial consistency of abnormal pattern matching and enhancing the reliability of behavioral abnormality analysis results. This invention constructs a human key point map that integrates dynamic reconstruction of topological structure and analysis of local motion patterns, in order to achieve accurate identification and real-time positioning of abnormal behavior in police body camera videos.
Attached Figure Description
[0016] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this invention. For those skilled in the art, other drawings can be obtained based on these drawings.
[0017] Figure 1 is a block diagram of an AI video behavior real-time analysis system for a police law enforcement recorder according to the present invention.
Detailed Implementation
[0018] Exemplary embodiments will now be described more fully with reference to the accompanying drawings. However, these exemplary embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, they are provided to make the description of this application more complete and comprehensive, and to fully convey the concept of the exemplary embodiments to those skilled in the art. The drawings are merely schematic illustrations of this application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and therefore repeated descriptions of them will be omitted.
[0019] Furthermore, the described features, structures, or characteristics can be combined in any suitable manner in one or more exemplary embodiments. Numerous specific details are provided in the following description to give a full understanding of the exemplary embodiments disclosed in this application. However, those skilled in the art will recognize that the technical solutions disclosed in this application can be practiced with one or more specific details omitted, or other methods, components, steps, etc., can be employed. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring various aspects of the disclosure of this application.
[0020] Example 1
[0021] As shown in Figure 1, this embodiment discloses an AI video behavior real-time analysis system for police law enforcement recorders, including:
Topology construction module S101: used to collect spatial location information of key points of the human body based on real-time video, and to construct an initial human body topology diagram using the spatial location information;
In a specific implementation, constructing the initial human body topology diagram includes:
S101.1: Perform multi-scale convolution feature extraction processing on the current frame image of the real-time video to obtain image features at different scales;
It should be noted that this embodiment constructs a parallel convolutional branch structure with multiple convolutional kernels of different sizes to extract multi-scale features of the current frame image of the real-time video, thereby ensuring the robustness and precision of the features.
[0022] The step of performing multi-scale convolutional feature extraction on the current frame image of the real-time video includes:
S101.1.1: By sliding the convolution kernel, perform feature extraction processing on the current frame image to obtain multiple initial convolution feature maps with different resolutions;
In practice, the multiple initial convolutional feature maps are obtained with fixed convolution kernel sizes. For example, this embodiment constructs three parallel convolutional branches, using kernel sizes of 3×3, 5×5, and 7×7 respectively, where:
the 3×3 convolution kernel is a small-scale kernel, used to extract fine-grained features with a small spatial range;
the 5×5 convolution kernel is a medium-scale kernel, used to capture medium-granularity features of local human body regions;
the 7×7 convolution kernel is a large-scale kernel, used to obtain coarse-grained features of the global human body contour boundary.
[0023] Understandably, the specific choice of convolution kernel size is not arbitrary, but rather based on typical optimal size combinations determined by previous experimental analysis. The optimal size combinations include 3×3, 5×5, and 7×7. This combination can achieve optimal coverage and supplementation of information at different scales in actual video detection, thereby improving the accuracy of human key point extraction.
[0024] Furthermore, each parallel convolution branch uses a standard two-dimensional convolution, expressed as:
$$F_k(i,j) = \sum_{m=0}^{K-1} \sum_{n=0}^{K-1} W_k(m,n)\, I(i+m,\, j+n) + b_k$$
In the formula: $F_k(i,j)$ is the convolution output feature value of the k-th branch at image position (i,j); $K$ is the kernel size used by the current convolution branch (preferably, but not limited to, 3, 5, or 7); $I(i+m,\, j+n)$ is the grayscale or RGB channel value at the corresponding pixel position in the current frame image; $W_k(m,n)$ are the weight parameters of the convolution kernel of the k-th convolutional branch; and $b_k$ is the bias term of the k-th convolutional branch.
[0025] It should be understood that the above convolution operations are implemented using a standard CNN structure, and all convolution parameters are obtained through subsequent supervised training.
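The three parallel branches described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `conv2d_same` and `multi_scale_branches` are hypothetical, the kernels are simple averaging filters rather than trained weights, and zero padding is an assumption made here so that the three outputs share one spatial size.

```python
import numpy as np

def conv2d_same(image, kernel, bias=0.0):
    """Sliding-window 2D convolution (cross-correlation form, as used in CNNs).

    Zero padding is an assumption of this sketch; it keeps the output the
    same height and width as the input.
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad, mode="constant")
    out = np.empty(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Weighted sum over the K x K window plus the branch bias.
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel) + bias
    return out

def multi_scale_branches(image, kernels, biases):
    """Apply one convolution per branch and return the list of feature maps."""
    return [conv2d_same(image, w, b) for w, b in zip(kernels, biases)]

# Placeholder data: a random single-channel frame and averaging kernels.
rng = np.random.default_rng(0)
frame = rng.random((16, 16))
kernels = [np.full((k, k), 1.0 / (k * k)) for k in (3, 5, 7)]
feature_maps = multi_scale_branches(frame, kernels, [0.0, 0.0, 0.0])
```

In a trained network the kernel weights and biases would come from supervised training, as the text states; the averaging kernels here only make the arithmetic easy to check.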
[0026] S101.1.2: Perform feature fusion processing on the initial convolutional feature maps to obtain a fused multi-scale feature map;
It should be noted that the feature fusion processing described in this embodiment is defined as channel-dimension concatenation followed by convolutional mapping. The specific steps are as follows:
First, the initial convolutional feature maps obtained from the three convolutional branches of 3×3, 5×5, and 7×7 are denoted as follows: small-scale feature map $F_1$; medium-scale feature map $F_2$; large-scale feature map $F_3$;
Secondly, the three initial convolutional feature maps at different scales are concatenated along the channel dimension, resulting in:
$$F_{\text{cat}} = \mathrm{Concat}(F_1, F_2, F_3)$$
Here, Concat represents stacking along the feature channel dimension to form a fused feature map with an expanded channel count.
[0027] Finally, to further reduce channel redundancy in the concatenated feature map $F_{\text{cat}}$ and enhance the fusion of features at different scales, a 1×1 convolutional kernel is used for channel-dimension mapping and fusion. The convolutional fusion is calculated as follows:
$$F_{\text{fuse}}^{(c)}(i,j) = \sum_{c'=1}^{C} w_{c,c'}\, F_{\text{cat}}^{(c')}(i,j) + b_c$$
In the formula: $F_{\text{fuse}}^{(c)}(i,j)$ is the feature value of the c-th channel of the fused feature map at position (i,j); $C$ is the total number of channels in the concatenated feature map; $w_{c,c'}$ are the weight parameters used for channel mapping in the 1×1 convolution kernel; and $b_c$ is a bias term.
[0028] It should be understood that by using 1×1 convolution kernels for fusion operations, the compression and fusion of features at multiple scales can be effectively achieved, reducing parameter redundancy and further improving the computational efficiency and accuracy of subsequent human target detection.
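Channel concatenation followed by a 1×1 convolution reduces to a per-pixel matrix multiplication over channels, which the following sketch illustrates. The function name `concat_and_fuse` and the averaging weights are assumptions made for illustration; trained weights would be used in practice.

```python
import numpy as np

def concat_and_fuse(feature_maps, weights, bias):
    """Stack single-channel maps along a channel axis, then apply a 1x1 convolution.

    feature_maps: list of (H, W) arrays with identical shapes.
    weights:      (C_out, C) matrix of 1x1 kernel weights.
    bias:         (C_out,) bias vector.
    """
    stacked = np.stack(feature_maps, axis=-1)   # shape (H, W, C)
    fused = stacked @ weights.T + bias          # shape (H, W, C_out)
    return stacked, fused

# Placeholder maps with constant values so the result is easy to verify.
maps = [np.full((4, 4), v) for v in (1.0, 2.0, 3.0)]
weights = np.full((1, 3), 1.0 / 3.0)            # one output channel: the mean
bias = np.zeros(1)
stacked, fused = concat_and_fuse(maps, weights, bias)
```

With these weights every fused pixel equals the mean of the three inputs, which shows how the 1×1 kernel mixes channels without touching spatial neighbors.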
[0029] S101.1.3: Perform human target region detection processing based on the fused multi-scale feature map to obtain preliminary detection results of the human target region; In a specific implementation, this embodiment uses a detection structure based on a Region Proposal Network (RPN) for human target detection: First, a sliding window of a fixed size (e.g., 3×3) is set on the fused feature map, and anchor boxes with different aspect ratios and scales are generated for each window position. Specifically, the scale of the anchor boxes is set to 32×32, 64×64 and 128×128 pixels, and the aspect ratios are 1:2, 1:1 and 2:1. Then, the network calculates the intersection-over-union (IoU) between each anchor box and the real human body region to determine the target category (human body or background). An IoU greater than 0.7 is considered a positive sample (human body target region), less than 0.3 is considered a negative sample (background region), and anchor boxes between 0.3 and 0.7 are excluded from training. Finally, redundant detection boxes are removed using the non-maximum suppression (NMS) method to obtain preliminary human target region detection results.
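The anchor generation, IoU-based labeling, and non-maximum suppression steps can be sketched in plain Python. The IoU thresholds 0.7 and 0.3, the three scales, and the three aspect ratios come from the text; treating the ratio as width:height, keeping the anchor area equal to the square of the scale, and the NMS overlap threshold of 0.5 are assumptions of this sketch, as are all function names.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def make_anchors(cx, cy, scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Nine anchors centred at (cx, cy); ratio is width/height (assumed)."""
    anchors = []
    for s in scales:
        for r in ratios:
            w, h = s * r ** 0.5, s / r ** 0.5   # area stays s * s
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def label_anchor(anchor, gt_boxes, pos=0.7, neg=0.3):
    """Return 1 (human), 0 (background), or -1 (excluded from training)."""
    best = max((iou(anchor, g) for g in gt_boxes), default=0.0)
    if best > pos:
        return 1
    if best < neg:
        return 0
    return -1

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression; thresh is an assumed value."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep

anchors = make_anchors(100, 100)
kept = nms([(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)], [0.9, 0.8, 0.7])
```

Here the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box is kept.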
[0030] S101.1.4: Using the preliminary detection results of the human target region, generate image features of different scales to construct a closed region of the complete outer contour boundary of the human body;
In practice, the fused feature map is cropped to the initially detected human target region (such as a rectangular box) to obtain cropped features that accurately correspond to the human target region. For example, if the target region is a rectangle $R = (x_1, y_1, x_2, y_2)$, the features obtained by cropping are:
$$F_{\text{crop}}(i,j) = F_{\text{fuse}}(i,j), \quad x_1 \le i \le x_2,\ y_1 \le j \le y_2$$
The cropped features are then used to construct the closed outer contour boundary, avoiding missing or incomplete local features during human body keypoint detection.
[0031] S101.2: After fusing the image features at different scales, obtain a closed region containing the complete outer contour boundary of the human body;
To obtain a complete and closed outer contour region of the human body, this embodiment further refines the contour boundary of the cropped feature map obtained above. The specific method is as follows:
First, pixel-level edge detection is performed on the cropped feature map using an improved Canny operator, with the edge detection thresholds defined as:
$$T_{\text{low}} = 0.66 \cdot \operatorname{median}(G), \qquad T_{\text{high}} = 1.33 \cdot \operatorname{median}(G)$$
where $G$ denotes the pixel grayscale gradient values in the fused feature map, the lower threshold $T_{\text{low}}$ is 0.66 times their median, and the upper threshold $T_{\text{high}}$ is 1.33 times their median.
[0032] The above thresholds were determined by experimental statistics, which can ensure that the edge detection results are stable and the edges are closed.
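A minimal sketch of the median-based threshold rule follows. It computes only the two thresholds; the edge detector itself and the nature of the "improved" Canny operator are not specified in the text and are not reproduced here. The function name `canny_thresholds` and the use of central-difference gradients are assumptions.

```python
import numpy as np

def canny_thresholds(gray, low_factor=0.66, high_factor=1.33):
    """Derive lower and upper Canny thresholds from the median gradient magnitude."""
    gy, gx = np.gradient(gray.astype(float))   # gradients along rows, then columns
    magnitude = np.hypot(gx, gy)
    med = float(np.median(magnitude))
    return low_factor * med, high_factor * med

# Placeholder input: a horizontal ramp whose gradient magnitude is 2 everywhere.
ramp = np.tile(2.0 * np.arange(8), (6, 1))
t_low, t_high = canny_thresholds(ramp)
```

Because the thresholds scale with the median gradient, they adapt to the overall contrast of each cropped region instead of relying on fixed absolute values.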
[0033] Furthermore, edge closure processing is applied to ensure that the outer contour is a closed area. The specific process is as follows:
First, a morphological closing operation is applied to the initially detected edges to smooth them and enhance edge continuity. The structuring element (kernel) is a circle with a radius of 3 pixels (a diameter of 6 pixels). The formula is:
$$E_{\text{closed}} = (E \oplus S) \ominus S$$
In the formula, $E$ is the initial edge detection result of the Canny operator; $S$ is the circular structuring element; and $\oplus$ and $\ominus$ denote the dilation and erosion operations, respectively.
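Morphological closing (dilation followed by erosion) with a radius-3 disk can be sketched with NumPy alone. The helper names are hypothetical, and the border handling (background outside the image for dilation, foreground for erosion) is a choice made for this sketch, not something the text specifies.

```python
import numpy as np

def disk(radius):
    """Boolean disk-shaped structuring element of the given pixel radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return (x * x + y * y) <= radius * radius

def dilate(mask, selem):
    r = selem.shape[0] // 2
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    # OR together shifted copies of the mask, one per element of the disk.
    for dy, dx in zip(*np.nonzero(selem)):
        out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def erode(mask, selem):
    r = selem.shape[0] // 2
    padded = np.pad(mask, r, mode="constant", constant_values=True)
    out = np.ones_like(mask, dtype=bool)
    # AND together shifted copies, so a pixel survives only if the disk fits.
    for dy, dx in zip(*np.nonzero(selem)):
        out &= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def closing(mask, selem):
    return erode(dilate(mask, selem), selem)

# Placeholder mask: a filled square with a one-pixel hole in the middle.
mask = np.zeros((21, 21), dtype=bool)
mask[6:15, 6:15] = True
mask[10, 10] = False
closed = closing(mask, disk(3))
```

The one-pixel hole is filled because no radius-3 disk fits inside it, while every pixel of the original mask is preserved.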
[0034] Then, a contour filling algorithm is used to fill the closed boundary region to obtain a defined closed region image, which serves as the closed region of the complete outer contour boundary of the human body.
[0035] It should be noted that the contour filling algorithm can use the scanline filling algorithm;
S101.3: Based on the complete outer contour boundary of the human body, divide the closed area into several local areas corresponding to key points of the human body;
To define each local region unambiguously, this embodiment first defines a standard set of 15 human body key points: top of the head, neck, center of the chest, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles.
[0036] Subsequently, the specific local area is divided through the following implementation steps: First, based on the aforementioned closed outer contour region, an affine transformation and mapping of a predefined standard human body template is performed according to the aspect ratio of the target region size: Affine transformation is used to scale, rotate, and translate a predefined standard human body template and map it onto the closed area to determine the approximate position of the standard human body key points in the actual video frame.
[0037] Next, using the transformed standard human body keypoint positions as centers, the local region of each keypoint is defined within the closed area. For example, the local region of each key point is a square centered at the key point, with a side length equal to 5% of the average of the length and width of the human body template. The calculation is as follows:
$$R_p = \{(x, y) : |x - x_p| \le d,\ |y - y_p| \le d\}$$
In the formula, $(x_p, y_p)$ are the coordinates of the p-th key point, and $d$ is half the side length of the local region, defined as 2.5% of the average size of the human body template (i.e., half of the 5% side length).
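The square local region can be computed directly from the keypoint and the template size. This is a small sketch with hypothetical function names; coordinates are treated as continuous pixel positions.

```python
def local_region(keypoint, template_width, template_height, fraction=0.05):
    """Square region centred on a keypoint.

    The side length is `fraction` times the mean of the template width and
    height, so the half side length d is half of that.
    """
    x, y = keypoint
    mean_size = (template_width + template_height) / 2.0
    d = 0.5 * fraction * mean_size
    return (x - d, y - d, x + d, y + d)

def in_region(point, region):
    """True if the point lies inside the (x1, y1, x2, y2) region, borders included."""
    x, y = point
    return region[0] <= x <= region[2] and region[1] <= y <= region[3]

# Example: a 100 x 300 template gives a mean size of 200 and d = 5.
region = local_region((50.0, 60.0), 100.0, 300.0)
```

For a 100 × 300 template the region around (50, 60) spans from (45, 55) to (55, 65), that is, a 10-pixel square.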
[0038] S101.4: Extract the two-dimensional coordinates of key human body points for each local region, and use the two-dimensional coordinates to construct a complete initial human body topology map;
This embodiment uses a key point localization method based on heatmaps. The specific implementation steps are as follows:
For each local region, a two-dimensional keypoint confidence heatmap is generated using a separate small-scale convolutional network, with the following constraints:
the small-scale convolutional network includes two consecutive convolutional layers and one sigmoid activation layer, wherein the convolutional kernel size is 3×3 and the number of output channels is 1;
the output heatmap pixel values are between 0 and 1, representing the probability that a key point is present.
[0039] The initial 2D coordinates of each keypoint are taken as the location of the maximum pixel value in the heatmap. For example, on the local heatmap $H_p$, the keypoint coordinates $(x_p, y_p)$ are calculated as follows:
$$(x_p, y_p) = \arg\max_{(x,y)} H_p(x, y)$$
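Taking the heatmap maximum as the keypoint position can be sketched as below. The function name and the `origin` parameter, which shifts local heatmap coordinates back into frame coordinates, are assumptions added for illustration.

```python
import numpy as np

def keypoint_from_heatmap(heatmap, origin=(0, 0)):
    """Return ((x, y), confidence) at the heatmap maximum.

    heatmap: 2D array of values in [0, 1], indexed as [row, column].
    origin:  (x0, y0) of the local region's top-left corner in the frame.
    """
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    x0, y0 = origin
    return (x0 + int(col), y0 + int(row)), float(heatmap[row, col])

# Placeholder heatmap with a single clear peak.
heat = np.zeros((5, 7))
heat[2, 4] = 0.9
coords, confidence = keypoint_from_heatmap(heat, origin=(10, 20))
```

The peak at row 2, column 4 of the local heatmap maps to frame coordinates (14, 22) once the region offset is added.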
[0040] Using the initial two-dimensional coordinates of the key points, the initial human body topology diagram is constructed:
The initial human body topology diagram is defined as a graph structure containing the 15 key nodes, with the connection relationships between nodes determined by the human skeletal structure. The node connections are restricted, for example: the top-of-head node is connected only to the neck; the neck is connected to the top of the head, the chest, and the left and right shoulders; and so on;
The initial topological graph is represented by the following mathematical set:
$$G = (V, E), \qquad V = \{v_1, v_2, \dots, v_{15}\}$$
where $v_i$ represents the i-th key node, and $E$ represents the set of skeletal topological connecting edges.
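One possible encoding of the 15-node skeleton graph is shown below. Only the head and neck connections are stated in the text; the remaining edges (shoulder to elbow, elbow to wrist, chest to hips, hip to knee, knee to ankle) are assumed here from ordinary skeletal structure, and all identifiers are hypothetical.

```python
KEYPOINTS = [
    "head", "neck", "chest",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]

# Edges stated in the text: head-neck, neck-chest, neck-shoulders.
# The rest are assumptions based on a typical skeleton layout.
EDGES = [
    ("head", "neck"), ("neck", "chest"),
    ("neck", "left_shoulder"), ("neck", "right_shoulder"),
    ("left_shoulder", "left_elbow"), ("right_shoulder", "right_elbow"),
    ("left_elbow", "left_wrist"), ("right_elbow", "right_wrist"),
    ("chest", "left_hip"), ("chest", "right_hip"),
    ("left_hip", "left_knee"), ("right_hip", "right_knee"),
    ("left_knee", "left_ankle"), ("right_knee", "right_ankle"),
]

def build_topology(coords):
    """Return (nodes, adjacency) for a dict mapping keypoint name -> (x, y)."""
    nodes = {name: coords[name] for name in KEYPOINTS}
    adjacency = {name: set() for name in KEYPOINTS}
    for a, b in EDGES:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return nodes, adjacency

# Placeholder coordinates: every keypoint at a distinct dummy position.
dummy = {name: (float(i), float(i)) for i, name in enumerate(KEYPOINTS)}
nodes, adjacency = build_topology(dummy)
```

Storing adjacency as sets keeps later edge deletion cheap, which suits the sparsification step that operates on this graph.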
[0041] Topology reconstruction module S102: used to dynamically reconstruct the topology of the initial human body topology map using a graph convolutional network that incorporates sparse constraints on topological connections, thereby obtaining a dynamic topology feature map;
In a specific implementation, the model structure of the graph convolutional network that integrates topological connectivity sparse constraints includes:
a topology node sensitivity calculation module, used to calculate the node sensitivity value based on the differences in node characteristics of the initial human body topology diagram.
The function of the topology node sensitivity calculation module is to calculate the sensitivity value of each node in the initial human body topology graph. The sensitivity of a node is defined as the degree to which its features respond to changes in human behavioral patterns, and is used to determine whether to retain or delete topology connections.
[0042] The topology node sensitivity calculation module includes:
a node feature difference calculation unit, used to calculate the feature difference values between adjacent nodes in the initial human body topology diagram.
In this embodiment, the node feature difference calculation unit calculates the feature difference value between each node and its neighboring nodes in the initial human body topology diagram. The calculation process is as follows:
First, the feature vector of each node is defined as the spatial coordinates of the corresponding human keypoint at the current moment together with its displacement vector between adjacent frames. For example, if the spatial coordinates of node $v_i$ are $(x_i, y_i)$, its feature vector is represented as:
$$f_i = (x_i,\ y_i,\ \Delta x_i,\ \Delta y_i)$$
where $\Delta x_i$ and $\Delta y_i$ are the horizontal and vertical components of the displacement of the node position between the previous frame and the current frame.
[0043] Secondly, the node feature difference is defined as the Euclidean distance between feature vectors; that is, the feature difference value between node $v_i$ and a neighboring node $v_j$ is calculated as follows:
$$D_{ij} = \lVert f_i - f_j \rVert_2 = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (\Delta x_i - \Delta x_j)^2 + (\Delta y_i - \Delta y_j)^2}$$
The magnitude of the node feature difference value directly reflects the similarity of behavioral features between nodes: the larger the difference value, the more the behavioral patterns of the two nodes differ, and the smaller the difference value, the more similar they are.
[0044] The node sensitivity assessment unit is used to assess the sensitivity of each node based on the feature difference values of the nodes, and to obtain the sensitivity assessment value of each node. That is, it converts node feature difference values into node sensitivity values. In this embodiment, the sensitivity assessment value of a node is the average of the feature difference values between that node and all its neighboring nodes:
$$S_i = \frac{1}{|N(i)|} \sum_{j \in N(i)} D_{ij}$$
where $N(i)$ is the set of all nodes adjacent to node $v_i$, $|N(i)|$ is the number of nodes in that set, and $D_{ij}$ is the feature difference value between node $v_i$ and its neighbor $v_j$.
[0045] It should be noted that the above sensitivity assessment value calculation method can effectively reflect the overall sensitivity of each node to changes in neighborhood characteristics, providing an accurate basis for subsequent topology sparsification.
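The feature difference and the neighborhood-averaged sensitivity can be sketched in a few lines. Each node feature is assumed to be a 4-tuple of position and inter-frame displacement, as described above; the function names and the convention of assigning zero sensitivity to an isolated node are assumptions.

```python
import math

def feature_difference(f_i, f_j):
    """Euclidean distance between two node feature vectors (x, y, dx, dy)."""
    return math.dist(f_i, f_j)

def node_sensitivity(features, adjacency):
    """Mean feature difference between each node and its neighbours."""
    sensitivity = {}
    for node, neighbours in adjacency.items():
        if not neighbours:
            sensitivity[node] = 0.0   # assumed convention for isolated nodes
            continue
        total = sum(feature_difference(features[node], features[n]) for n in neighbours)
        sensitivity[node] = total / len(neighbours)
    return sensitivity

# Placeholder chain a - b - c, where b and c share the same feature vector.
features = {"a": (0, 0, 0, 0), "b": (3, 4, 0, 0), "c": (3, 4, 0, 0)}
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
sens = node_sensitivity(features, adjacency)
```

Node `b` sits between a distant neighbor and an identical one, so its sensitivity is the mean of 5 and 0, namely 2.5.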
[0046] A high-sensitivity node region selection unit selects the nodes with higher sensitivity assessment values to determine the high-sensitivity node region. The node selection method and threshold are defined as follows. First, all node sensitivity assessment values are sorted in ascending order, giving the sequence $s_{(1)} \le s_{(2)} \le \dots \le s_{(n)}$. This embodiment then defines a sensitivity threshold $\theta_s = s_{(\lceil 0.75 n \rceil)}$; that is, the threshold is the sensitivity value at the 75th percentile of the sorted sequence, and it serves as the boundary between high and low sensitivity: a node whose sensitivity assessment value is greater than or equal to $\theta_s$ is identified as a high-sensitivity node; otherwise, it is a low-sensitivity node.
[0047] It should be noted that the specific threshold selection method mentioned above was determined through a large number of experiments, effectively distinguishing between nodes with high and low sensitivity.
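A minimal sketch of the percentile-based selection follows. The rank convention (the value at rank ceil(0.75 n) of the ascending sequence) is an assumption, since several percentile definitions exist; the function names and the sample values are hypothetical.

```python
def sensitivity_threshold(values, percent=75):
    """Value at the given percentile of the ascending-sorted sequence.

    Uses the 1-based rank ceil(percent * n / 100), computed with
    integer arithmetic to avoid floating-point rounding.
    """
    ordered = sorted(values)
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def split_by_sensitivity(sensitivity, percent=75):
    """Partition node ids into high- and low-sensitivity sets."""
    threshold = sensitivity_threshold(sensitivity.values(), percent)
    high = {node for node, s in sensitivity.items() if s >= threshold}
    low = set(sensitivity) - high
    return threshold, high, low

# Hypothetical sensitivity values for eight keypoints
sens = {0: 0.2, 1: 0.9, 2: 0.4, 3: 0.7, 4: 0.1, 5: 0.5, 6: 0.3, 7: 0.8}
threshold, high, low = split_by_sensitivity(sens)
```

With eight nodes the rank is 6, so the three nodes at or above the sixth-smallest value form the high-sensitivity region.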
[0048] The node sensitivity value output unit is used to output the node sensitivity value of the region containing high-sensitivity nodes, which serves as the basis for the topology connection filtering module to filter redundant topology connections with lower sensitivity based on the node sensitivity value. It is understood that the node sensitivity value output unit in this embodiment is used to output the sensitivity value of each node and the corresponding high-sensitivity node region, so as to provide the sensitivity value basis for the subsequent topology connection screening module.
[0049] In practice, the output data structure is a set of node numbers and corresponding sensitivity values: $S = \{(i, s_i) \mid v_i \in V\}$, where $V$ is the set of all nodes in the initial human body topology diagram and $s_i$ is the sensitivity value of node $v_i$. This data structure is the input to the subsequent modules, ensuring that the information is passed on completely and accurately.
[0050] The topology connection filtering module filters out redundant topology connections with low sensitivity based on the node sensitivity values, yielding the set of topology connections to be sparsified. The filtering criterion in this embodiment is the sum of the sensitivity values of the two endpoint nodes of a connection. First, for any connecting edge $e_{ij} = (v_i, v_j)$ in the initial topological connection set $E$, the edge sensitivity weight is calculated as $w_{ij} = s_i + s_j$, where $s_i$ and $s_j$ are the sensitivity values of the endpoint nodes. Then the edge sensitivity weight threshold $\theta_w$ is determined: for example, the sensitivity weights of all connecting edges are sorted from smallest to largest and the value at the 30% quantile is taken as $\theta_w$. Connections whose edge sensitivity weight is less than $\theta_w$ are identified as redundant topological connections with low sensitivity and form the set of connections to be sparsified, $E_s = \{ e_{ij} \in E \mid w_{ij} < \theta_w \}$.
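The edge filtering step can be sketched as follows, assuming the same rank-based quantile convention as a plain sorted list lookup; the function name, the 30 percent default, and the sample graph are illustrative assumptions.

```python
def edges_to_sparsify(edges, sensitivity, percent=30):
    """Select low-sensitivity edges as candidates for removal.

    The weight of an edge is the sum of the sensitivity values of its
    two endpoints. The threshold is the value at rank
    ceil(percent * m / 100) of the ascending-sorted weights (an assumed
    quantile convention); edges strictly below it are returned.
    """
    weights = {(i, j): sensitivity[i] + sensitivity[j] for i, j in edges}
    ordered = sorted(weights.values())
    rank = max(1, (percent * len(ordered) + 99) // 100)
    threshold = ordered[rank - 1]
    candidates = {edge for edge, w in weights.items() if w < threshold}
    return weights, threshold, candidates

# Hypothetical five-node graph with ten edges
sensitivity = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0}
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2),
         (1, 3), (0, 4), (2, 4), (1, 4), (0, 3)]
weights, threshold, candidates = edges_to_sparsify(edges, sensitivity)
```

Because the comparison is strict, ties at the threshold are kept, so the candidate set can hold fewer than 30 percent of the edges.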
[0051] The topology connection sparsification module is used to perform connection deletion operations on the set of topology connections to be sparsified, so as to obtain a sparsified topology connection structure. In a specific implementation, the topology connection sparsification module described in this embodiment performs a connection deletion operation on the set of connections to be sparsified obtained by the aforementioned topology connection filtering module to obtain a sparsified topology connection structure, thereby achieving dynamic optimization of the human body topology diagram.
[0052] The topology connection sparsification module includes: The sparse candidate connection determination unit is used to determine redundant topological connection regions with low sensitivity based on the node sensitivity value, forming a set of connections to be sparsified. In specific implementation, this embodiment limits the function of the sparse candidate connection determination unit to further determine the candidate connection regions that need to be sparsified based on the set of connections to be sparsified output by the topology connection filtering module.
[0053] The rules for determining the final sparse connection region are further defined as follows. For the set of connections to be sparsified, the sensitivity of the nodes associated with each connecting edge is re-verified. Connections with extremely low sensitivity (for example, edges whose sensitivity weight is less than 0.5 times the edge sensitivity weight threshold) are forcibly included in the set of candidate sparse connections, to ensure the effective removal of redundant connections.
[0054] For example, the specific method for calculating the final set of sparse candidate connections in this embodiment is as follows: It should be noted that the above-mentioned specific review and threshold limiting process can ensure that the selection of sparse connection regions is more accurate, avoid the erroneous deletion of critical connections with high sensitivity, and thus achieve effective sparse optimization.
[0055] A connection region deletion unit performs the connection deletion operation on the set of connections to be sparsified and obtains the post-deletion connection structure. That is, the unit removes the connections in the candidate sparse connection set determined above and generates the post-deletion connection structure.
[0056] In this embodiment, the deletion method is to remove all connecting edges in the candidate sparse connection set, denoted $E_c$, directly from the connection set $E$ of the initial human body topology diagram: $E' = E \setminus E_c$, where $E'$ is the post-deletion connection structure set.
[0057] The topology connection reconstruction unit redetermines the connection relationships between nodes based on the post-deletion connection structure, forming the sparsified connection structure. That is, the unit redefines the connection relationships between nodes from the post-deletion connection structure set to obtain the sparsified connection structure.
[0058] The topology connection reconstruction unit in this embodiment works as follows. First, the neighbor set of each node is updated after the connections are deleted. The reconstructed neighborhood set is $N'(i) = \{ v_j \mid (v_i, v_j) \in E' \}$, where $E'$ is the post-deletion connection set and $N'(i)$ is the set of nodes adjacent to node $v_i$ in the sparsified connection structure. Secondly, the graph structure is rebuilt with the new connection relationships to obtain the sparsified topological connection structure $G' = (V, E')$.
[0059] A sparse connection structure output unit is used to output the sparse connection structure as the basis for the node feature aggregation module to perform node feature aggregation processing based on the sparse connection structure and generate the dynamic topology feature map. Furthermore, in this embodiment, the function of the sparse connection structure output unit is defined as outputting the sparsed topology connection structure. This serves as the basis for the node feature aggregation module to perform feature aggregation processing.
[0060] In this embodiment, the output sparsified topological connection structure is represented as a sparse adjacency matrix $A' \in \{0, 1\}^{n \times n}$, where $n$ is the number of nodes. For example, its elements are defined as $A'_{ij} = 1$ if nodes $v_i$ and $v_j$ are connected in the sparsified structure and $A'_{ij} = 0$ otherwise. The adjacency matrix gives a clear and concise output that is convenient for the subsequent node feature aggregation.
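The deletion, neighborhood update, and adjacency matrix output described for the sparsification module can be sketched together. Undirected edges, integer node ids from 0 to n-1, and the function name are assumptions of this illustration.

```python
def sparsify(num_nodes, edges, candidates):
    """Remove candidate edges and rebuild neighbor sets and adjacency matrix.

    Edges are treated as undirected, so (2, 0) removes (0, 2) as well.
    Returns the kept edge list, the neighbor sets, and a symmetric
    0/1 adjacency matrix as a list of lists.
    """
    drop = {frozenset(edge) for edge in candidates}
    kept = [edge for edge in edges if frozenset(edge) not in drop]
    neighbors = {i: set() for i in range(num_nodes)}
    adjacency = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in kept:
        neighbors[i].add(j)
        neighbors[j].add(i)
        adjacency[i][j] = 1
        adjacency[j][i] = 1
    return kept, neighbors, adjacency

# Hypothetical four-node graph; the diagonal edge (0, 2) is removed
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
kept, neighbors, adjacency = sparsify(4, edges, candidates={(2, 0)})
```

A dense list-of-lists matrix is used only for readability; for larger graphs a sparse representation would be the more natural choice.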
[0061] The node feature aggregation module performs node feature aggregation based on the sparsified topological connection structure to generate the dynamic topological feature map. In this embodiment, the module applies node feature aggregation to the sparsified topological connection structure output by the preceding unit.
[0062] This embodiment implements node feature aggregation as the standard convolution operation of a graph convolutional network (GCN). First, the node feature matrix is $X = [f_1, f_2, \dots, f_n]^{\top}$, where $f_i$ is the feature vector of node $v_i$ and $n$ is the total number of nodes.
[0063] Secondly, the single-layer graph convolution aggregation is defined as $H = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X W + b\right)$, where $X$ is the node feature matrix; $\tilde{A} = A' + I$ is the sparse adjacency matrix $A'$ with self-connections added; $\tilde{D}$ is the degree matrix, defined by $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$; $W$ and $b$ are the convolution weight and bias parameters to be trained; and $\sigma$ is an activation function, such as ReLU.
[0064] The feature matrix $H$ produced by node feature aggregation through the graph convolutional network is the dynamic topology feature map output in this embodiment.
[0065] It should be further explained that the parameters W, b, and other weights of the above-mentioned node feature aggregation network are obtained through subsequent supervised training to ensure that the accuracy of feature aggregation and the quality of aggregated features meet the requirements of real-time abnormal behavior detection and analysis.
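One graph convolution layer of the kind described above can be sketched in plain Python. The symmetric normalization is the common GCN form and is assumed here; the weights in the example are fixed by hand for illustration rather than trained, and the function name is hypothetical.

```python
import math

def gcn_layer(adjacency, features, weight, bias):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W + b).

    adjacency: n x n 0/1 matrix, features: n x d_in,
    weight: d_in x d_out, bias: length d_out (all lists of floats).
    """
    n = len(adjacency)
    d_in, d_out = len(weight), len(weight[0])
    # Add self-connections so each node keeps its own feature
    a_tilde = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)]
               for i in range(n)]
    degree = [sum(row) for row in a_tilde]
    # Symmetric normalization by the square roots of both endpoint degrees
    a_hat = [[a_tilde[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
             for i in range(n)]
    # Aggregate neighbor features
    agg = [[sum(a_hat[i][k] * features[k][c] for k in range(n)) for c in range(d_in)]
           for i in range(n)]
    # Linear transform, bias, and ReLU activation
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in)) + bias[o])
             for o in range(d_out)]
            for i in range(n)]

# Two connected nodes, identity weights, zero bias (illustrative values)
adjacency = [[0, 1], [1, 0]]
features = [[1.0, 2.0], [3.0, -4.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
out = gcn_layer(adjacency, features, weight, bias)
```

Here each node averages its own feature with its neighbor's, and ReLU clips the negative second component to zero.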
[0066] Feature extraction module S103: extracts local motion pattern features of human behavior from the dynamic topological feature map and uses them to determine potential areas of abnormal behavior. The dynamic topological feature map is used to analyze local motion pattern features of human behavior so that potential areas of abnormal behavior can be identified accurately. In a specific implementation, determining the potential area of abnormal behavior includes: S103.1: Perform local motion trajectory feature extraction on each node in the dynamic topology feature graph to obtain the motion trajectory features of each node. This step includes: S103.1.1: Track the spatial position of each node in the dynamic topology feature map along consecutive video frames to obtain the displacement sequence of the node over those frames. This embodiment tracks nodes by optical flow (such as the Lucas-Kanade method), as follows. First, the spatial coordinates of the node in the current frame are used as the initial position of the feature point. Secondly, in the next frame, a 15×15 pixel search window centered on the feature point is defined, the position transformation is estimated by the least squares method, and the displacement vector of the node from the current frame to the next frame is calculated. Finally, these steps are repeated over several consecutive frames to obtain the displacement sequence of the node in the time dimension, represented as $\{\Delta p_i^{1}, \Delta p_i^{2}, \dots, \Delta p_i^{T-1}\}$, where $\Delta p_i^{t}$ is the displacement vector of node $v_i$ from frame $t$ to frame $t+1$ and $T$ is the number of tracked frames.
[0067] S103.1.2: Perform motion direction change processing on the displacement sequence to extract the motion direction change features of the node over consecutive video frames. This embodiment uses the change in the angle between successive displacement vectors as the motion direction change feature. The displacement vectors at two adjacent moments in the displacement sequence are normalized, and the motion direction change feature is calculated as $\theta_i^{t} = \arccos\left( \frac{\Delta p_i^{t} \cdot \Delta p_i^{t+1}}{\lVert \Delta p_i^{t} \rVert \, \lVert \Delta p_i^{t+1} \rVert} \right)$, where $\Delta p_i^{t}$ is the displacement vector of node $v_i$ from time $t$ to time $t+1$ and $\arccos(\cdot)$ is the inverse cosine function.
[0068] S103.1.3: The node motion stability is evaluated from the motion direction change features to obtain the node motion stability feature. In this embodiment, the motion stability feature is the variance of the node's motion direction change features: $\sigma_i^2 = \frac{1}{T-2} \sum_{t=1}^{T-2} \left( \theta_i^{t} - \bar{\theta}_i \right)^2$, where $\theta_i^{t}$ is the motion direction change feature of node $v_i$ at time $t$, $T$ is the number of continuously tracked video frames (which yields $T-2$ direction change values), and $\bar{\theta}_i$ is the average of the motion direction change features of node $v_i$.
[0069] It should be noted that the above motion stability characteristic values represent the smoothness of the change in the node's motion direction. Larger values indicate drastic changes in the node's motion trajectory, while smaller values indicate smoother node motion.
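The direction change and stability features can be illustrated with a short sketch. Treating a zero-length displacement as contributing no turn, and using the population variance over the available angles, are assumptions of this example; the function names and trajectories are hypothetical.

```python
import math

def direction_changes(displacements):
    """Angles (radians) between each pair of successive displacement vectors."""
    angles = []
    for (ax, ay), (bx, by) in zip(displacements, displacements[1:]):
        norm = math.hypot(ax, ay) * math.hypot(bx, by)
        if norm == 0.0:
            angles.append(0.0)  # assumption: a stationary step adds no turn
            continue
        # Clamp to [-1, 1] to guard against rounding before acos
        cosine = max(-1.0, min(1.0, (ax * bx + ay * by) / norm))
        angles.append(math.acos(cosine))
    return angles

def motion_stability(angles):
    """Population variance of the direction change angles."""
    if not angles:
        return 0.0
    mean = sum(angles) / len(angles)
    return sum((a - mean) ** 2 for a in angles) / len(angles)

# Hypothetical displacement sequences for one keypoint
straight = [(1.0, 0.0)] * 4
zigzag = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 0.0)]
straight_var = motion_stability(direction_changes(straight))
zigzag_var = motion_stability(direction_changes(zigzag))
```

A node moving in a straight line has zero variance, while the zigzag trajectory produces a clearly larger value, matching the interpretation given in the text.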
[0070] S103.1.4: Combine the displacement sequence and the motion stability feature of the node to output the motion trajectory feature of the node, which is aggregated into local motion pattern features in the subsequent step. In this embodiment, the motion trajectory feature of a node is the combination of the average displacement vector of its displacement sequence and its motion stability feature, for example $m_i = \left( \overline{\Delta x}_i, \overline{\Delta y}_i, \sigma_i^2 \right)$, where $\overline{\Delta x}_i$ and $\overline{\Delta y}_i$ are the average displacements of the node's displacement sequence in the horizontal and vertical directions, respectively, and $\sigma_i^2$ is the motion stability feature.
[0071] S103.2: Based on the connection relationships of the human body topology, the motion trajectory features are aggregated into multiple local motion pattern features; in addition, the components of the motion trajectory features can optionally be normalized to balance the influence of their different dimensions. In this embodiment, the aggregation is an average of node features over the neighborhood of each node in the dynamic topology: $g_i = \frac{1}{|N'(i)|} \sum_{v_j \in N'(i)} m_j$, where $m_j$ is the motion trajectory feature of node $v_j$ and $N'(i)$ is the set of nodes adjacent to node $v_i$ in the dynamic topological feature graph.
[0072] S103.3: Based on the difference between each local motion pattern feature and the pre-stored behavioral pattern features, identify the local motion pattern features that differ significantly from normal behavior. This embodiment calculates the difference between a local motion pattern feature and the pre-stored normal behavior pattern feature as a Euclidean distance: $\delta_i = \lVert g_i - g_i^{\mathrm{norm}} \rVert_2$,
[0073] where $g_i$ is the local motion pattern feature of node $v_i$ and $g_i^{\mathrm{norm}}$ is the feature corresponding to node $v_i$ in the pre-stored normal behavior pattern features.
[0074] The difference threshold $\theta_\delta$ was determined, through statistical analysis of a large amount of experimental data, to be twice the average feature difference of normal behavioral patterns.
[0075] When $\delta_i > \theta_\delta$, the local motion pattern corresponding to the node is identified as abnormal.
[0076] S103.4: Mark the node regions corresponding to the identified local motion pattern features with large differences as potential regions of behavioral abnormality, and output them as the basis for the local structural reorganization. This embodiment outputs the set of regions with abnormal behavior as $R = \{ v_i \mid \delta_i > \theta_\delta \}$, where $\delta_i$ is the difference value of node $v_i$ and $\theta_\delta$ is the difference threshold. This set is the input to the subsequent local structural reorganization, which further confirms and analyzes the behavioral anomalies and outputs the results.
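Steps S103.2 to S103.4 can be sketched as neighborhood averaging followed by a distance test. Averaging over neighbors only (excluding the node itself), the fallback for nodes without neighbors, and all names and numbers below are assumptions made for illustration.

```python
import math

def aggregate_local_patterns(trajectory, neighbors):
    """Average the trajectory features of each node's neighbors.

    A node without neighbors falls back to its own feature
    (an assumption of this sketch).
    """
    patterns = {}
    for node, adjacent in neighbors.items():
        group = [trajectory[j] for j in adjacent] or [trajectory[node]]
        dims = len(trajectory[node])
        patterns[node] = tuple(
            sum(vec[d] for vec in group) / len(group) for d in range(dims)
        )
    return patterns

def abnormal_nodes(patterns, normal_patterns, threshold):
    """Nodes whose pattern is farther than threshold from the stored normal one."""
    return {
        node for node, pattern in patterns.items()
        if math.dist(pattern, normal_patterns[node]) > threshold
    }

# Hypothetical features: (mean dx, mean dy, stability variance)
trajectory = {0: (0.0, 0.0, 0.0), 1: (0.0, 0.0, 0.0), 2: (4.0, 0.0, 2.0)}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
normal = {node: (0.0, 0.0, 0.0) for node in trajectory}

patterns = aggregate_local_patterns(trajectory, neighbors)
mean_normal_difference = 0.5            # assumed statistic from normal data
threshold = 2.0 * mean_normal_difference
flagged = abnormal_nodes(patterns, normal, threshold)
```

The flagged set plays the role of the potential abnormal region that is passed on to the local structural reorganization.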
[0077] Anomaly analysis module S104: performs local structural reconstruction of the spatial location information of human body key points based on the potential region, and generates behavioral anomaly analysis results from the reconstructed spatial location information. In a specific implementation, generating the analysis results includes: S104.1: Construct a local topological neighborhood centered on the human body key points within the potential region of behavioral abnormality. The local topological neighborhood is constructed as follows. First, each key node $v_c$ within a region marked as a potential behavioral abnormality is taken as the center node of a topological neighborhood. Secondly, the neighborhood is expanded along the node topology connections. In this embodiment the expansion depth is limited to one level of adjacent nodes, that is, nodes directly connected to the central node: $\mathcal{N}_L(v_c) = \{v_c\} \cup \{ v_j \mid (v_c, v_j) \in E' \}$, where $E'$ is the connection set of the sparsified topological connection structure.
[0078] It should be noted that the topology expansion depth is limited to first-level nodes to ensure the accuracy and compactness of the local topology structure in abnormal areas and to avoid introducing irrelevant nodes due to over-expansion.
[0079] S104.2: Perform topological relationship reconstruction on the human body key points in the local topological neighborhood to obtain the local reconstructed topology. The reconstruction comprises the following sub-steps. S104.2.1: Within the local topological neighborhood, the local connectivity is initially updated based on the spatial distance between key points, forming a preliminary local connectivity. The spatial distance is the two-dimensional Euclidean distance $d_{ab} = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2}$, where $(x_a, y_a)$ and $(x_b, y_b)$ are the current position coordinates of any two key nodes $v_a$ and $v_b$ within the local topological neighborhood. The threshold for the spatial distance between nodes, $\theta_d$, is defined in terms of $W$, the width of the current closed outer contour region. If the spatial distance between two nodes is less than $\theta_d$ and no connection exists between them yet, a connection is established between the pair, forming the preliminary connection set $E_L^{\mathrm{pre}}$. S104.2.2: Based on the relative motion trend between key nodes, the preliminary local connectivity is adjusted to generate an updated local connectivity. In this embodiment, the motion trend of a node is its displacement vector from the previous frame to the current frame, that is, $\Delta p_a = p_a^{t} - p_a^{t-1}$. The similarity of motion trends between two nodes is measured by cosine similarity: $\mathrm{sim}_{ab} = \frac{\Delta p_a \cdot \Delta p_b}{\lVert \Delta p_a \rVert \, \lVert \Delta p_b \rVert}$. For example, the similarity threshold is set to 0.5: if $\mathrm{sim}_{ab} \ge 0.5$, the connection is retained or newly added; if $\mathrm{sim}_{ab} < 0.5$, the corresponding connection is deleted. This yields the updated local connection set $E_L^{\mathrm{upd}}$.
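The two connectivity updates in S104.2.1 and S104.2.2 can be sketched as follows. The distance threshold is passed in as a plain number because its dependence on the contour width is not reproduced here, and applying the similarity test only to pairs already in the preliminary set is one possible reading; both are assumptions, as are all names and values.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two 2D vectors (0.0 if either is zero)."""
    norm = math.hypot(*u) * math.hypot(*v)
    return 0.0 if norm == 0.0 else (u[0] * v[0] + u[1] * v[1]) / norm

def update_local_connections(positions, displacements, edges,
                             dist_threshold, sim_threshold=0.5):
    """Add close pairs, then keep only pairs with similar motion trends."""
    nodes = sorted(positions)
    current = {frozenset(edge) for edge in edges}
    # Step 1: connect any unconnected pair closer than the distance threshold
    for index, a in enumerate(nodes):
        for b in nodes[index + 1:]:
            if math.dist(positions[a], positions[b]) < dist_threshold:
                current.add(frozenset((a, b)))
    # Step 2: drop connections whose endpoints move in dissimilar directions
    updated = set()
    for pair in current:
        a, b = sorted(pair)
        if cosine_similarity(displacements[a], displacements[b]) >= sim_threshold:
            updated.add((a, b))
    return updated

# Hypothetical local neighborhood of three keypoints
positions = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 0.0)}
displacements = {0: (1.0, 0.0), 1: (1.0, 0.1), 2: (-1.0, 0.0)}
result = update_local_connections(positions, displacements,
                                  edges=[(1, 2)], dist_threshold=2.0)
```

In the example, nodes 0 and 1 are close and move together, so they become connected, while the existing edge between nodes 1 and 2 is removed because they move in opposite directions.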
[0080] S104.2.3: Remove redundant or invalid connecting edges from the updated local connectivity to obtain a simplified local topology. The criteria are as follows. For closed loops formed within the local topological neighborhood (such as three or more fully connected nodes), the longest connecting edge in the loop is deleted to eliminate the redundant structure. Connectivity analysis is then performed on the local topology, and invalid connecting edges that leave the topology disconnected or leave nodes isolated are deleted, to ensure the effectiveness and integrity of the local topology. Finally, a simplified local topological connection set $E_L$ is obtained.
[0081] S104.2.4: Based on the simplified local topology, output the local reconstructed topology used to update the spatial location information of the human body key points. The locally reconstructed topology in this embodiment is represented as $G_L = (V_L, E_L)$, where $V_L$ is the set of key nodes in the local topological neighborhood and $E_L$ is the simplified local connection set; it is output as an adjacency matrix or adjacency list for the subsequent update of key point positions.
[0082] S104.3: Update the spatial location information of the human body key points in the local region according to the local reconstructed topology to obtain the reconstructed key point location information. The updated position of each key node is the weighted average of its original position and the positions of its topological neighbors, that is, $p_i' = w_{ii}\, p_i + \sum_{v_j \in N_L(i)} w_{ij}\, p_j$, where $p_i$ is the original position of node $v_i$, $N_L(i)$ is the set of its neighbors in the local reconstructed topology, and the preset weights sum to 1. This update improves the accuracy and stability of key point locations in the local area, reflects the trend of local structural changes more realistically, and improves the accuracy of the subsequent abnormal behavior identification.
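The position update can be sketched as a blend of each key point with the mean of its neighbors. Splitting the weight into a single self weight and equal neighbor weights is an assumption made for this illustration, and the 0.5 value is hypothetical.

```python
def update_positions(positions, neighbors, self_weight=0.5):
    """Blend each key point with the mean position of its neighbors.

    All new positions are computed from the original ones, so the
    result does not depend on iteration order. Nodes without
    neighbors keep their position.
    """
    updated = {}
    for node, (x, y) in positions.items():
        adjacent = neighbors.get(node, [])
        if not adjacent:
            updated[node] = (x, y)
            continue
        mean_x = sum(positions[j][0] for j in adjacent) / len(adjacent)
        mean_y = sum(positions[j][1] for j in adjacent) / len(adjacent)
        updated[node] = (
            self_weight * x + (1.0 - self_weight) * mean_x,
            self_weight * y + (1.0 - self_weight) * mean_y,
        )
    return updated

# Hypothetical three-keypoint chain with the middle point displaced upward
positions = {0: (0.0, 0.0), 1: (2.0, 2.0), 2: (4.0, 0.0)}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
new_positions = update_positions(positions, neighbors)
```

The update pulls the displaced middle point toward its neighbors and the neighbors toward it, which is the smoothing effect the text attributes to the weighted average.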
[0083] S104.4: Based on the reconstructed human body key point location information, perform abnormal behavior pattern matching and output the behavioral anomaly analysis results. The matching method is as follows. The reconstructed key point location feature vector $z$ is compared with each pattern vector $c_k$ in the pre-stored abnormal behavior pattern feature library by cosine similarity: $\mathrm{sim}_k = \frac{z \cdot c_k}{\lVert z \rVert \, \lVert c_k \rVert}$. The matching threshold is 0.7: when $\mathrm{sim}_k \ge 0.7$, the corresponding abnormal behavior pattern $k$ is output as the final behavioral anomaly analysis result; if no pattern meets the threshold, an unknown abnormal behavior pattern is output.
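The pattern matching step can be sketched as a best-match search over a small library. Returning only the highest-scoring pattern when several exceed the threshold is an assumption of this sketch; the library labels and vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 0.0 if norm == 0.0 else sum(a * b for a, b in zip(u, v)) / norm

def match_abnormal_pattern(vector, library, threshold=0.7):
    """Return (label, score) of the best match, or ("unknown", score)."""
    best_label, best_score = None, -1.0
    for label, pattern in library.items():
        score = cosine_similarity(vector, pattern)
        if score > best_score:
            best_label, best_score = label, score
    if best_label is not None and best_score >= threshold:
        return best_label, best_score
    return "unknown", best_score

# Hypothetical pattern library with two abnormal behavior templates
library = {"fall": (1.0, 0.0, 0.0), "strike": (0.0, 1.0, 0.0)}
label_a, score_a = match_abnormal_pattern((0.9, 0.1, 0.0), library)
label_b, score_b = match_abnormal_pattern((1.0, 1.0, 1.0), library)
```

The first vector lies close to one template and is matched; the second is equally far from both and falls below the threshold, so it is reported as unknown.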
[0084] The above formulas are all dimensionless calculations. They were derived through software simulation on a large amount of collected data so as to approximate real-world conditions as closely as possible. The preset parameters, weights, and thresholds in the formulas are set by those skilled in the art according to the actual situation.
[0085] The foregoing has only described certain exemplary embodiments of the present invention by way of illustration. Undoubtedly, those skilled in the art can modify the described embodiments in various ways without departing from the spirit and scope of the present invention. Therefore, the foregoing drawings and descriptions are illustrative in nature and should not be construed as limiting the scope of protection of the claims of the present invention.
Claims
1. An AI video behavior real-time analysis system for police law enforcement recorders, characterized in that, The system includes: Topology construction module S101: used to collect spatial location information of key points of the human body based on real-time video, and to construct an initial human body topology diagram using the spatial location information; Topology Reconstruction Module S102: Used to dynamically reconstruct the topology of the initial human body topology map using a graph convolutional network that incorporates sparse constraints on topological connections, thereby obtaining a dynamic topology feature map; Feature extraction module S103: used to extract local motion pattern features of human behavior based on the dynamic topological feature map, and use the local motion pattern features to determine potential areas of abnormal behavior; Anomaly analysis module S104: used to perform local structural reconstruction of the spatial location information of human body key points based on the potential region, and to generate behavioral anomaly analysis results using the spatial location information of human body key points after local structural reconstruction.
2. The AI video behavior real-time analysis system for police law enforcement recorders according to claim 1, characterized in that, The construction of the initial human body topology diagram includes: S101.1: Perform multi-scale convolution feature extraction processing on the current frame image of the real-time video to obtain image features at different scales; S101.2: After fusing the image features at different scales, a closed region containing the complete outer contour boundary of the human body is obtained; S101.3: Based on the complete outer contour boundary of the human body, the closed area is divided into several local areas corresponding to key points of the human body; S101.4: Extract the two-dimensional coordinates of key human body points for each local region, and use the two-dimensional coordinates to construct a complete initial human body topology diagram.
3. The AI video behavior real-time analysis system for police law enforcement recorders according to claim 2, characterized in that, The multi-scale convolutional feature extraction process for the current frame image of the real-time video includes: S101.1.1: By sliding the convolution kernel, feature extraction processing is performed on the current frame image to obtain multiple initial convolution feature maps with different resolutions; S101.1.2: Perform feature fusion processing on the initial convolutional feature map to obtain a fused multi-scale feature map; S101.1.3: Perform human target region detection processing based on the fused multi-scale feature map to obtain preliminary detection results of the human target region; S101.1.4: Using the preliminary detection results of the human target region, image features of different scales are generated to construct a closed region of the complete outer contour boundary of the human body.
4. The AI video behavior real-time analysis system for police law enforcement recorders according to claim 3, characterized in that, The model structure of the graph convolutional network with sparse constraints on topological connections includes: The topology node sensitivity calculation module is used to calculate the node sensitivity value based on the differences in node characteristics of the initial human body topology diagram. The topology connection filtering module is used to filter redundant topology connections with low sensitivity based on the node sensitivity value, so as to obtain a set of topology connections to be sparsified. The topology connection sparsification module is used to perform a connection deletion operation on the set of topology connections to be sparsified, so as to obtain a sparsified topology connection structure. The node feature aggregation module is used to perform node feature aggregation processing based on the sparse topological connection structure to generate the dynamic topological feature map.
5. The AI video behavior real-time analysis system for police law enforcement recorders according to claim 4, characterized in that, The topology node sensitivity calculation module includes: The node feature difference calculation unit is used to calculate the feature difference values between adjacent nodes in the initial human body topology diagram. The node sensitivity assessment unit is used to assess the sensitivity of each node based on the characteristic difference values of the nodes, and obtain the sensitivity assessment value of each node. A high-sensitivity node region selection unit is used to select nodes with higher sensitivity evaluation values from the sensitivity evaluation values to determine the high-sensitivity node region. The node sensitivity value output unit is used to output the node sensitivity values of regions containing highly sensitive nodes, which serve as the basis for the topology connection filtering module to filter redundant topology connections with lower sensitivity based on the node sensitivity values.
6. The AI video behavior real-time analysis system for police law enforcement recorders according to claim 5, characterized in that, The topology connection sparsification module includes: The sparse candidate connection determination unit is used to determine redundant topological connection regions with low sensitivity based on the node sensitivity value, forming a set of connections to be sparsified. A connection region deletion unit is used to perform a connection deletion operation from the set of connections to be sparsified, and obtain the connection structure after deletion; The topology connection reconstruction unit is used to redetermine the connection relationship between nodes based on the deleted connection structure, and form a sparse connection structure after connection deletion. The sparse connection structure output unit is used to output the sparse connection structure as the basis for the node feature aggregation module to perform node feature aggregation processing based on the sparse connection structure and generate the dynamic topology feature map.
7. The AI video behavior real-time analysis system for police law enforcement recorders according to claim 6, characterized in that, The determination of potential regions of abnormal behavior includes: S103.1: Perform local motion trajectory feature extraction processing on each node in the dynamic topology feature graph to obtain the motion trajectory features of each node; S103.2: Based on the connection relationship of the human body topology, the motion trajectory features are aggregated and processed into multiple local motion pattern features; S103.3: Based on the difference between each local motion pattern feature and the pre-stored behavioral pattern features, identify local motion pattern features that differ significantly from normal behavior; S103.4: Mark the node regions corresponding to the identified local motion pattern features with large differences as potential regions of behavioral abnormalities, and output them as the basis for local structural reorganization based on potential regions.
8. The AI video behavior real-time analysis system for police law enforcement recorders according to claim 7, characterized in that, The process of extracting local motion trajectory features from each node in the dynamic topology feature graph includes: S103.1.1: Track the spatial position information of each node in the dynamic topology feature map along consecutive video frames to obtain the displacement sequence of the node in consecutive video frames; S103.1.2: Perform motion direction change processing on the displacement sequence to extract the motion direction change features of the nodes in consecutive video frames; S103.1.3: The node motion stability is evaluated based on the motion direction change characteristics to obtain the node motion stability characteristics; S103.1.4: Combine the displacement sequence and motion stability features of the node to output the motion trajectory features of the node, which can then be aggregated into multiple local motion mode features.
9. The AI video behavior real-time analysis system for police law enforcement recorders according to claim 8, characterized in that, The generated behavioral anomaly analysis results include: S104.1: Construct a local topological neighborhood centered on key human body points within the potential region of the aforementioned behavioral abnormality; S104.2: Perform topological relationship reconstruction processing on the human body key points in the local topological neighborhood to obtain the local reconstructed topological structure; S104.3: Update the spatial location information of human body key points in the local region according to the local reconstruction topology to obtain the reconstructed human body key point location information; S104.4: Based on the recombined human body key point location information, perform abnormal behavior pattern matching processing and output the behavior anomaly analysis results.