An AI video analysis and AR fusion-based visual global security management system
The visualized full-domain security management system, which integrates AI video analytics and AR, solves the problems of low data processing efficiency and limited display of abnormal information in building security management systems. It achieves efficient data collection, accurate anomaly identification, and collaborative device linkage, thus meeting the security management needs of modern buildings.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI WANYU INFORMATION TECHNOLOGY CO LTD
- Filing Date
- 2026-03-13
- Publication Date
- 2026-06-12
AI Technical Summary
Existing building security management systems are inadequate in terms of data management, visualization, and coordinated control, failing to meet the needs of modern buildings for comprehensive, precise, and efficient security management, especially in terms of low data processing efficiency, limited display of abnormal information, and inefficient risk assessment.
The visualized full-domain security management system, which integrates AI video analysis and AR, acquires multi-source video stream data through the data acquisition unit, performs full-domain partitioning and feature extraction, and combines adaptive quantization coding and cross-modal feature indexing to realize spatiotemporal mapping and visual plotting of abnormal data. It also uses the graph-linked collaborative intelligent driving model to conduct risk assessment and equipment collaborative scheduling.
It achieves efficient and accurate data collection and processing, improves the efficiency of identifying and analyzing abnormal events, ensures an intuitive presentation of the security situation and coordinated response of equipment, and meets the needs of modern buildings for comprehensive, accurate and efficient security management.
Figure CN122200543A_ABST
Description
Technical Field
[0001] This invention belongs to the field of intelligent security technology, and specifically relates to a visualized full-domain security management system based on the fusion of AI video analysis and AR.
Background Technology
[0002] The current visualized, full-domain intelligent security management system still has the following areas for improvement: As buildings become larger, more complex, and more intelligent, people's requirements for the security, timeliness, and precision of building security are constantly increasing. The technical architecture and application model of traditional building security management systems are no longer suitable for the needs of comprehensive security management in modern buildings, and have exposed many technical defects and application shortcomings in practical applications.
[0003] Security data management lacks a standardized and structured processing system, resulting in low efficiency in data retrieval and correlation analysis. Various types of security data, such as video stream data and sensor data from different security areas within a building, are not stored in a targeted regional classification manner. Multi-format video stream data exhibits poor compatibility and adaptability, and unstructured and structured data are stored together. Furthermore, a unified data index and retrieval mechanism has not been established. When security anomalies occur, managers cannot quickly locate relevant data in the abnormal areas, and it is difficult to perform correlation analysis between security data and video stream data, significantly reducing the efficiency of tracing and judging security incidents.
[0004] The display of security anomaly information is limited in form and has low visualization, making it difficult for managers to intuitively grasp the overall security situation of the building. Traditional security anomaly alarms are mostly text prompts, audible and visual alarms, or pop-ups on single monitoring screens. They cannot integrate key information such as the type of abnormal behavior, risk level, and spatiotemporal location with the building's physical space model. Managers cannot intuitively and quickly grasp the specific location of abnormal events and the overall security situation from a global perspective. When faced with concurrent anomalies in multiple areas, it is difficult to achieve efficient overall assessment and dispatching.
[0005] In summary, existing building security management systems have significant shortcomings in intelligent analysis, standardized data management, visual display, and coordinated control, and can no longer meet the needs of modern buildings for comprehensive, accurate, and efficient security management. There is an urgent need to develop a building security management system that integrates AI video analysis and AR visualization technologies to solve the pain points of existing technologies.
Summary of the Invention
[0006] To address the aforementioned problems in the existing technology, this invention provides a visualized full-domain security management system based on the fusion of AI video analysis and AR. The objective of this invention is achieved through the following technical solution. The system comprises a data acquisition unit, a data processing unit, a plotting unit, and a security control unit. The data acquisition unit acquires multi-source video stream data from building security scene devices. The data processing unit performs full-domain partitioning and screening on the multi-source video stream data to obtain a security dataset; extracts basic features of personnel and physical scenes and performs adaptive quantization encoding on the basic features; and binds the security index of cross-modal features through a unique identifier coding mechanism to obtain a multi-dimensional decoupled set of personnel violation identification actions and a set of physical scene violation features. The plotting unit extracts the spatiotemporal features of the abnormal data based on the action set and the feature set; performs global mapping on the spatial points of the abnormal data to generate abnormal spatiotemporal points; selects a reference identifier through feature point topology matching; performs spatiotemporal registration against the coordinate system of the building AR real-scene security base; and visually plots the abnormal data. The security control unit uses the real-scene visualization plotting results and the graph-linked collaborative intelligent driving model to conduct anomaly risk assessment of the entire building area, and generates collaborative scheduling instructions for building zone security equipment based on the assessment results.
[0007] Specifically, the multi-source video stream data of building security scene devices is acquired as follows: based on the deployment topology map of the building security equipment, a multi-protocol compatible acquisition mechanism acquires the raw heterogeneous video stream data of monitoring cameras and sensor terminals; the format is unified by an adaptive decoding conversion matrix to obtain homogenized video stream data; and a multi-source video stream dataset is obtained by applying a spatiotemporal frame deduplication algorithm and electromagnetic noise filtering.
[0008] Specifically, the process of performing full-domain partitioning and screening includes: extracting device spatial identifiers and regional division features based on the multi-source video stream dataset to obtain a set of regional association labels; performing a full-domain partitioning and classification operation on the video stream data with an intelligent partitioning mapping algorithm; and, based on the partitioning and classification results, storing the data in structured form using spatiotemporal indexing to obtain a domain-classified smart storage set of security data.
[0009] Specifically, the process of extracting the basic features of personnel and physical scenes includes: extracting the body contours and posture key point features of personnel, and the spatial topology and visual texture features of the physical scene; optimizing data quality through a feature dimension calibration mechanism to obtain a feature map; and, based on the feature map, achieving spatiotemporal synchronization through cross-modal feature alignment to obtain a personnel-scene collaborative feature map.
[0010] Specifically, the process of adaptive quantization encoding includes: applying a differentiated quantization encoding matrix according to the characteristics of each data type to obtain an encoded feature set; and compressing the data dimensions of the encoded feature set to obtain a standard encoded feature dataset.
[0011] Specifically, the process of binding the security index of cross-modal features through the unique identifier coding mechanism includes: generating a unique identifier for the feature data based on the standard encoded feature dataset; using a cross-modal feature association algorithm to establish a mapping relationship between personnel violation actions and physical scene violation features; and, based on the feature mapping relationship, constructing a multi-level security feature index map to obtain a multi-dimensional decoupled set of personnel violation identification actions and a set of physical scene violation features.
[0012] Specifically, the process of extracting the spatiotemporal features of the abnormal data includes: detecting abnormal data through an abnormal feature detection mechanism based on the set of personnel violation identification actions and the set of physical scene violation features; extracting the relevant parameters from the abnormal data to obtain the original spatiotemporal feature set; and performing parameter calibration using a spatiotemporal feature fusion algorithm to obtain a set of spatiotemporal feature parameters for the abnormal data.
[0013] Specifically, the process of globally mapping the spatial points of the abnormal data includes: mapping the local coordinates of the device to the global coordinate system of the building through device coordinate transformation; correcting the coordinate deviation through spatial point calibration to obtain the abnormal spatial points; and, based on the abnormal spatial points and the duration of the abnormality, performing global mapping to obtain the spatiotemporally associated abnormal points.
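The device-to-building coordinate mapping described above can be illustrated with a rigid transform. This is only a minimal sketch under stated assumptions: the function name `local_to_global`, the use of a single yaw angle about the vertical axis, and the `device_origin` parameter are all hypothetical, and the source does not specify the actual transformation or the subsequent calibration step.

```python
import numpy as np

def local_to_global(p_local, yaw_deg, device_origin):
    """Map a point from a device's local frame to the building's global frame.

    Assumption (not from the source): the device frame differs from the
    building frame by a rotation about the vertical (z) axis plus a translation.
    """
    yaw = np.radians(yaw_deg)
    rotation = np.array([
        [np.cos(yaw), -np.sin(yaw), 0.0],
        [np.sin(yaw),  np.cos(yaw), 0.0],
        [0.0,          0.0,         1.0],
    ])
    return rotation @ np.asarray(p_local, dtype=float) + np.asarray(device_origin, dtype=float)

# A point 1 m in front of a device that is rotated 90 degrees and mounted
# at (10, 20, 3) in building coordinates.
p_global = local_to_global([1.0, 0.0, 0.0], 90.0, [10.0, 20.0, 3.0])
```

A real deployment would likely need a full 3D orientation (pitch and roll as well as yaw) and the deviation correction the text attributes to spatial point calibration.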
[0014] Specifically, the process of performing spatiotemporal registration against the coordinate system of the building AR real-scene security base includes: extracting the coordinates of feature points of the building's fixed signage; constructing a mapping relationship between the abnormal points and the AR base coordinates using a topology matching algorithm; and correcting the coordinate deviation of the AR base with a spatiotemporal dynamic calibration engine so that the abnormal points are accurately registered with the AR scene.
[0015] Specifically, the process of visualizing and plotting abnormal behavior events includes: transforming abnormal features into visual identifiers through an anomaly type mapping mechanism, based on the spatiotemporal registration results; plotting the identifiers at the corresponding positions on the AR real-scene security base using an AR overlay rendering engine; and adjusting the label hierarchy and layout through an anti-occlusion intelligent optimization algorithm to obtain the real-scene visualization plotting data.
[0016] Specifically, the process of using the graph-linked collaborative intelligent driving model to assess the abnormal risks of the entire building includes: extracting a risk assessment parameter set based on the real-scene visualization plotting data; conducting a quantitative risk assessment with the graph-linked collaborative intelligent driving model, integrating historical risk data with real-time scenario information; and calibrating the quantitative assessment results through a risk level dynamic correction engine to obtain the abnormal risk assessment results for the entire building.
[0017] Specifically, the process of generating collaborative scheduling instructions for building zone security equipment includes: based on the building abnormal risk assessment results, extracting risk level and abnormal area information through risk area analysis; adapting management strategies to the abnormal scenarios through a comprehensive security management rule matching mechanism and scenario-based instruction generation; and, based on the adaptation results, applying structured encoding and grouping to the execution instructions to obtain the linkage control instruction set for building area access control.
[0018] The beneficial effects of this invention are as follows: Through multi-dimensional technological innovation, this invention constructs an efficient and accurate visualized full-domain security management system, with its core advantage lying in the standardized upgrade of data acquisition and processing. In the data acquisition stage, a multi-protocol compatibility mechanism is adopted to overcome barriers between equipment manufacturers and models. Combined with adaptive decoding and conversion, it achieves unified processing of heterogeneous video streams. Furthermore, it undergoes dual protection through spatiotemporal frame deduplication and electromagnetic noise filtering, providing a high-fidelity data source for subsequent analysis. In the data processing stage, full-domain partitioning and screening establish a precise binding between video streams and physical spaces. Utilizing multi-level feature extraction and cross-modal spatiotemporal synchronization, it forms a feature system with deep correlation between personnel and scenes. Then, through differentiated quantization encoding and dimensional compression, it achieves efficient storage and retrieval of feature data, solving the pain points of traditional systems such as chaotic data management, poor compatibility, and inaccurate feature extraction.
[0019] In terms of anomaly visualization and precise registration, this invention achieves intuitive presentation of security situation and precise spatial positioning. Through anomaly spatiotemporal feature extraction and global mapping, scattered anomaly locations are transformed into spatiotemporal trajectories in the building's global coordinate system. Combined with topological matching and dynamic calibration of fixed building marker feature points, high-precision registration between anomaly locations and AR real-world bases is achieved. In the visualization and plotting stage, the visual transformation of anomaly types and risk levels, coupled with AR overlay rendering and anti-occlusion optimization, allows managers to intuitively grasp key information such as the type, location, and risk level of anomaly events from a global perspective. This completely changes the traditional system's single alarm format and weak global situational awareness, significantly improving the efficiency of anomaly event identification and analysis.
[0020] In terms of intelligent decision-making and coordinated control, this invention constructs a closed-loop system from risk assessment to command execution. Relying on a graph-linked collaborative intelligent driving model that integrates historical data and real-time scenario information, it achieves precise quantitative assessment of abnormal risks through multi-dimensional parameter extraction and a dynamic correction engine. Based on the assessment results, it generates scenario-based and differentiated collaborative scheduling commands through risk area analysis and intelligent matching of control rules. These commands are then structured and grouped to ensure coordinated response from multiple types of devices, including access control, monitoring, and alarm systems. This system not only transforms security management from passive alarm to proactive prediction but also improves the efficiency of handling abnormal events through coordinated device scheduling, effectively reducing security risks and meeting the needs of modern buildings for comprehensive, precise, and efficient security management.
Attached Figure Description
[0021] To facilitate understanding by those skilled in the art, the present invention will be further described below with reference to the accompanying drawings.
[0022] Figure 1 is a flowchart of the visualized full-domain security management system based on the fusion of AI video analysis and AR according to the present invention. Figure 2 is a flowchart of the building mapping process in this invention.
Detailed Implementation
[0023] To further illustrate the technical means and effects of the present invention in achieving its intended purpose, the following detailed description of the specific implementation methods, structures, features, and effects of the present invention, in conjunction with the accompanying drawings and preferred embodiments, is provided.
[0024] Referring to Figures 1-2, a visualized full-domain security management system based on the fusion of AI video analysis and AR includes a data acquisition unit, a data processing unit, a plotting unit, and a security control unit. The data acquisition unit acquires multi-source video stream data from building security scene devices. The data processing unit performs full-domain partitioning and screening on the multi-source video stream data to obtain a security dataset; extracts basic features of personnel and physical scenes and performs adaptive quantization encoding on the basic features; and binds the security index of cross-modal features through a unique identifier coding mechanism to obtain a multi-dimensional decoupled set of personnel violation identification actions and a set of physical scene violation features. The plotting unit extracts the spatiotemporal features of the abnormal data based on the action set and the feature set; performs global mapping on the spatial points of the abnormal data to generate abnormal spatiotemporal points; selects a reference identifier through feature point topology matching; performs spatiotemporal registration against the coordinate system of the building AR real-scene security base; and visually plots the abnormal data. The security control unit uses the real-scene visualization plotting results and the graph-linked collaborative intelligent driving model to conduct anomaly risk assessment of the entire building area, and generates collaborative scheduling instructions for building zone security equipment based on the assessment results.
[0025] Specifically, the multi-source video stream data of building security scene devices is acquired as follows: based on the deployment topology map of the building security equipment, a multi-protocol compatible acquisition mechanism acquires the raw heterogeneous video stream data of monitoring cameras and sensor terminals; the format is unified by an adaptive decoding conversion matrix to obtain homogenized video stream data; and a multi-source video stream dataset is obtained by applying a spatiotemporal frame deduplication algorithm and electromagnetic noise filtering.
[0026] In this embodiment, a comprehensive security equipment deployment topology map is drawn from the building's architectural drawings, the security equipment installation list, and on-site survey results. The map records equipment location coordinates, communication interfaces, protocol types, and data transmission paths, and clarifies the distribution logic and relationships of devices such as surveillance cameras (including fixed bullet cameras, PTZ cameras, and panoramic cameras), infrared sensor terminals, vibration sensor terminals, and sound sensor terminals. The data acquisition unit employs a multi-protocol compatible acquisition mechanism with built-in parsing modules for mainstream security communication protocols such as ONVIF, RTSP, GB/T 28181, and HTTP. It simultaneously acquires the raw heterogeneous video streams output by devices from different manufacturers and models, including H.264/H.265/MJPEG encoded video from surveillance cameras (at 720P, 1080P, and 4K resolutions), night vision video streams associated with infrared sensor terminals, and video clips associated with events triggered by vibration sensors, forming a multi-format, multi-resolution set of raw heterogeneous video stream data. Subsequently, the adaptive decoding conversion matrix uses deep learning algorithms to automatically identify parameters such as the encoding format, resolution, and frame rate of each video stream, matches the optimal decoding scheme to complete the decoding, and then converts all decoded video streams into homogenized video stream data with H.265 encoding, 1080P resolution, and 25 frames per second, ensuring the compatibility and efficiency of subsequent data processing. Finally, the spatiotemporal frame deduplication algorithm performs frame sequence analysis on the homogenized video stream: by calculating the pixel difference, motion vector similarity, and timestamp continuity of consecutive frames, it identifies and removes duplicate frames caused by device lag, network latency, and signal interference. At the same time, an electromagnetic noise filtering module applies a wavelet threshold denoising algorithm to filter electromagnetic interference noise and environmental clutter from the video stream while retaining effective detail in the video frames, and outputs a high-fidelity, non-redundant, uniformly formatted multi-source video stream dataset.
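The frame deduplication step can be sketched roughly as follows. This is an illustrative simplification, not the patented algorithm: it uses only the pixel difference and timestamp continuity cues and omits motion vector similarity, and the function name `deduplicate_frames` and the thresholds `pixel_tol` and `max_gap` are hypothetical values chosen for the example.

```python
import numpy as np

def deduplicate_frames(frames, timestamps, pixel_tol=1.0, max_gap=0.2):
    """Return the indices of frames to keep.

    A frame is treated as a duplicate when its mean absolute pixel
    difference from the last kept frame is below pixel_tol AND it was
    captured within max_gap seconds of that frame.
    (Both thresholds are illustrative assumptions.)
    """
    if len(frames) == 0:
        return []
    keep = [0]
    for i in range(1, len(frames)):
        last = keep[-1]
        # Cast to float so uint8 subtraction cannot wrap around.
        diff = np.mean(np.abs(frames[i].astype(np.float32) - frames[last].astype(np.float32)))
        close_in_time = (timestamps[i] - timestamps[last]) <= max_gap
        if diff < pixel_tol and close_in_time:
            continue  # near-identical frame inside the time window: drop it
        keep.append(i)
    return keep
```

Comparing against the last kept frame, rather than the immediately preceding one, prevents a slow drift across many near-identical frames from being discarded entirely.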
[0027] Specifically, the process of performing full-domain partitioning and screening includes: extracting device spatial identifiers and regional division features based on the multi-source video stream dataset to obtain a set of regional association labels; performing a full-domain partitioning and classification operation on the video stream data with an intelligent partitioning mapping algorithm; and, based on the partitioning and classification results, storing the data in structured form using spatiotemporal indexing to obtain a domain-classified smart storage set of security data.
[0028] In this embodiment, the device metadata corresponding to each video stream is parsed from the multi-source video stream dataset. Device spatial identifiers (including the 3D coordinates of the device installation location, floor, area number, installation height, etc.) and area division features (including functional area types such as office area/server room/corridor/parking lot, security levels such as ordinary area/restricted area, area boundary coordinates, etc.) are extracted. These features are then bound to the device ID and acquisition timestamp of the video stream data to form a set of regional association labels containing device ID, spatial identifier, area attribute, and timestamp, ensuring that each video stream has clear spatial attribution information. Based on this label set, the intelligent partitioning mapping algorithm adopts a hybrid strategy combining K-means clustering and rule matching: it first performs primary partitioning by floor, then secondary partitioning by functional area type, and finally tertiary partitioning by security level, thereby partitioning and classifying all video stream data across the full domain. For example, video streams with different attributes, such as 1F-commercial area-ordinary area and B2-parking lot-restricted area, are assigned to their corresponding partitions. The partitioned and classified video stream data is then stored using a spatiotemporally indexed, structured storage scheme: multi-dimensional index information is added to each video stream, including a time index (acquisition start time, end time, time slice index), a spatial index (region code, device coordinates, coverage area), and a content index (video type such as normal/night vision/event triggered, key feature summary). The indexed video stream data is then stored by partition in a distributed time-series database, forming a domain-classified smart storage set of security data that is clearly indexed by region and can be quickly retrieved.
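The three-level partitioning and time-indexed retrieval described above might look like the following in miniature. It is a sketch under assumptions: only the rule-matching half of the hybrid strategy is shown (the K-means part is omitted), an in-memory dictionary stands in for the distributed time-series database, and `StreamRecord`, `build_partitions`, and `query` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class StreamRecord:
    device_id: str
    floor: str            # primary partition level
    function_type: str    # secondary partition level
    security_level: str   # tertiary partition level
    start_time: float     # acquisition start (seconds)
    end_time: float       # acquisition end (seconds)

def partition_key(rec):
    """Floor -> functional area type -> security level, as one tuple key."""
    return (rec.floor, rec.function_type, rec.security_level)

def build_partitions(records):
    """Group records by partition key and sort each group by start time."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[partition_key(rec)].append(rec)
    for recs in partitions.values():
        recs.sort(key=lambda r: r.start_time)
    return dict(partitions)

def query(partitions, key, t0, t1):
    """Return records in one partition whose time span overlaps [t0, t1]."""
    return [r for r in partitions.get(key, [])
            if r.start_time <= t1 and r.end_time >= t0]

records = [
    StreamRecord("cam01", "1F", "commercial area", "ordinary area", 0.0, 60.0),
    StreamRecord("cam02", "B2", "parking lot", "restricted area", 30.0, 90.0),
]
partitions = build_partitions(records)
hits = query(partitions, ("B2", "parking lot", "restricted area"), 25.0, 40.0)
```

Keying on the full (floor, function, security level) tuple means a lookup for an abnormal area touches only that partition's records rather than scanning every stream.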
[0029] Specifically, the process of extracting the basic features of personnel and physical scenes includes: extracting the body contours and posture key point features of personnel, and the spatial topology and visual texture features of the physical scene; optimizing data quality through a feature dimension calibration mechanism to obtain a feature map; and, based on the feature map, achieving spatiotemporal synchronization through cross-modal feature alignment to obtain a personnel-scene collaborative feature map.
[0030] In this embodiment, features are extracted from the video stream frames in the smart storage set of security data. For personnel features, a semantic segmentation algorithm separates personnel from the background, and morphological features such as the aspect ratio of the bounding rectangle of the body contour, contour complexity, and center of gravity position are extracted. At the same time, 17 key posture joints (such as head, neck, shoulder, elbow, wrist, hip, knee, and ankle) are identified to obtain dynamic features such as the three-dimensional coordinates, joint angles, and motion trajectories of the joints. For physical scene features, the SIFT algorithm is used to extract the spatial topological features of objects in the scene (such as door and window positions, wall orientation, passage width, and object distribution density), and visual texture features (such as wall texture, ground material, sign pattern, and object surface color distribution) are also extracted. The extracted raw features are input into a feature dimension calibration mechanism. First, statistical filtering removes abnormal features caused by image blurring, occlusion, or sudden changes in lighting (such as contour features exceeding normal human proportions and key point features with abrupt coordinate changes). Then, Min-Max normalization maps feature values of different dimensions to the [0,1] interval to eliminate the influence of differing scales. Finally, redundant features (such as repetitive texture features with correlation ≥0.95) are removed by mutual information entropy calculation, resulting in a well-structured and informative feature map. Based on this feature map, spatiotemporal synchronization is achieved through a cross-modal feature alignment algorithm: using the timestamp of the video frame as the time reference, the spatial coordinates of personnel features and the corresponding scene features at the same timestamp are calibrated (ensuring that the position of personnel features within the scene features is consistent with the real physical position), and the feature weights of the personnel-scene interaction area are then strengthened through an attention mechanism (for example, increasing the scene feature weights of door and window areas when personnel approach doors and windows), forming a personnel-scene collaborative feature map with a deep correlation between personnel actions and the scene environment.
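Two of the calibration steps above, Min-Max normalization and removal of redundant features at the 0.95 threshold, can be sketched as below. Note one substitution: the text names a mutual information entropy calculation, whereas this sketch uses absolute Pearson correlation as a simpler stand-in. The function names are hypothetical.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each feature column to the [0, 1] interval."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (X - lo) / span

def drop_redundant(X, threshold=0.95):
    """Greedily keep columns whose |correlation| with every kept column
    is below the threshold. Pearson correlation is an assumed stand-in
    for the mutual-information criterion named in the text."""
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    corr = np.nan_to_num(corr, nan=0.0)  # constant columns yield NaN; treat as uncorrelated
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return X[:, keep], keep
```

The greedy pass keeps the first of any group of highly correlated columns, so the result depends on column order; a production system might instead rank features by informativeness first.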
[0031] Specifically, the process of adaptive quantization encoding includes: applying a differentiated quantization encoding matrix according to the characteristics of each data type to obtain an encoded feature set; and compressing the data dimensions of the encoded feature set to obtain a standard encoded feature dataset.
[0032] In this embodiment, the feature data in the personnel-scene collaborative feature map is divided into three core feature types: numerical features (continuous values such as joint coordinates, contour aspect ratio, passage width, and brightness), categorical features (discrete categories such as personnel posture type, scene function type, and object category), and temporal features (time-series data such as joint movement trajectory, personnel movement path, and scene illumination changes). Each type is encoded with a differentiated quantization coding matrix. For numerical features, linear quantization coding maps the normalized values in the [0,1] interval to 8-bit binary codes while preserving the relative magnitude of the values. For categorical features, one-hot coding is combined with hash mapping: the category labels are first converted into one-hot vectors and then mapped to fixed-length binary codes through a hash function, to ensure the uniqueness of the encoding for different categories of features. For temporal features, temporal-dimensional correlation features are extracted to convert the variable-length temporal data into fixed-dimensional feature vectors, which are then quantization-coded to preserve the temporal variation patterns. This differentiated coding yields an encoded feature set containing multiple types of encoded features. The Principal Component Analysis (PCA) algorithm is then used to compress the dimensionality of the encoded feature set: the covariance matrix of the encoded feature set is calculated, its eigenvalues and eigenvectors are solved, and the high-dimensional encoded feature set is mapped to a K-dimensional low-dimensional space. While retaining the core information, this greatly reduces the dimensionality and computational complexity of the feature data, yielding a standard encoded feature dataset that has optimized dimensionality, retains complete information, and can be used directly in subsequent processing.
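As a rough illustration of the numerical branch and the compression step, the sketch below quantizes [0,1] values to 8-bit codes and reduces dimensionality with PCA via the covariance eigendecomposition, as the text describes. The function names are hypothetical, and the choice of `k` is left to the caller because the source does not state a value for K.

```python
import numpy as np

def quantize_8bit(values):
    """Linearly map values in [0, 1] to integer codes 0..255.
    The mapping is monotonic, so relative magnitude is preserved."""
    v = np.clip(np.asarray(values, dtype=float), 0.0, 1.0)
    return np.round(v * 255).astype(np.uint8)

def pca_reduce(X, k):
    """Project rows of X onto the k principal components with the
    largest eigenvalues of the covariance matrix."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: covariance is symmetric
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    return centered @ eigvecs[:, order]

codes = quantize_8bit([0.0, 0.2, 1.0])
rng = np.random.default_rng(0)
reduced = pca_reduce(rng.normal(size=(50, 6)), k=2)
```

`numpy.linalg.eigh` returns eigenvalues in ascending order, which is why the indices are reversed before selecting the top `k`.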
[0033] Specifically, the process of binding the security index of cross-modal features through the unique identifier coding mechanism includes: generating a unique identifier for the feature data based on the standard encoded feature dataset; using a cross-modal feature association algorithm to establish a mapping relationship between personnel violation actions and physical scene violation features; and, based on the feature mapping relationship, constructing a multi-level security feature index map to obtain a multi-dimensional decoupled set of personnel violation identification actions and a set of physical scene violation features.
[0034] In this embodiment, based on the standard encoded feature dataset, a unique identifier (UUID) for the feature data is generated using a composite coding rule of device ID, region code, timestamp, feature type, and random sequence. The device ID and region code associate the spatial attribution of the feature data; the timestamp ensures the temporal uniqueness of the feature data; the feature type distinguishes between personnel features and scene features; and the random sequence avoids identifier conflicts. This unique identifier enables full lifecycle traceability of each piece of feature data. A cross-modal feature association algorithm is used to establish the mapping relationship, with the following formula:
$$\mathrm{Sim}(F_p, F_s) = \alpha \cdot \cos(F_p, F_s) + (1 - \alpha) \cdot \mathrm{MI}(F_p, F_s)$$
where Sim(F_p, F_s) is the comprehensive correlation similarity between the personnel features F_p and the scene features F_s (value range [0,1]; the closer to 1, the stronger the correlation); F_p is the feature vector of personnel violation actions (e.g., the encoding vector of actions such as climbing and breaking in); F_s is the physical scene violation feature vector (e.g., the encoding vector for restricted area fences, equipment rooms, etc.); cos(F_p, F_s) is the cosine similarity, representing the directional consistency between the two feature vectors; MI(F_p, F_s) is the mutual information value, representing the strength of the statistical dependency between the two features; and α is a weighting coefficient (range [0.3, 0.7]) that balances the contribution ratio of cosine similarity and mutual information and can be adaptively adjusted according to the actual scenario.
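A small sketch of the similarity score follows. It assumes the score is a weighted sum of the form α·cos + (1−α)·MI, which is one reading of the symbol definitions above and not a confirmed formula. It further assumes that the mutual information value is supplied already normalized to [0,1] and that the encoded vectors are non-negative, so the cosine term also stays in [0,1]. The function names are hypothetical.

```python
import numpy as np

def cosine_similarity(fp, fs):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    fp = np.asarray(fp, dtype=float)
    fs = np.asarray(fs, dtype=float)
    denom = np.linalg.norm(fp) * np.linalg.norm(fs)
    return 0.0 if denom == 0 else float(fp @ fs / denom)

def combined_similarity(fp, fs, mi, alpha=0.5):
    """Weighted sum of cosine similarity and a pre-normalized mutual
    information value. The weighted-sum form is an assumption."""
    if not 0.3 <= alpha <= 0.7:
        raise ValueError("alpha is expected in [0.3, 0.7]")
    if not 0.0 <= mi <= 1.0:
        raise ValueError("mi must be normalized to [0, 1]")
    return alpha * cosine_similarity(fp, fs) + (1.0 - alpha) * mi

# Hypothetical encoded vectors for a 'climbing' action and a 'fence' scene feature.
score = combined_similarity([1.0, 1.0, 0.0], [1.0, 0.0, 0.0], mi=0.5, alpha=0.5)
```

Estimating the mutual information itself (for example, from discretized co-occurrence counts) is outside this sketch.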
[0035] To analyze the correlation between personnel and scene features, indicators such as cosine similarity and mutual information are used to measure the strength of their association. The focus is on establishing a mapping relationship between personnel violation actions and physical scene violation features. For example, the associations between personnel climbing and restricted area fence features, between personnel breaking into equipment rooms at night and equipment room features, and between personnel tailgating and corridor access control features are all recorded, together with the confidence threshold of each association. Based on this feature mapping, a multi-level security feature index map is constructed with four levels from top to bottom: the first level is a global feature index, the second level is a regional feature index, the third level is a feature type index (personnel features / scene features), and the fourth level is a specific feature index (such as violation action types / scene violation types). Each index node is associated with the unique identifier of the corresponding feature data and a pointer to its related features. This index map enables rapid bidirectional retrieval and correlation queries between personnel violation features and scene violation features. 
Ultimately, it yields a multi-dimensional decoupled set of personnel violation identification actions (including more than 20 types of violation actions such as climbing, vaulting, intrusion, tailgating, and overstaying, each action containing feature codes, associated scene types, confidence thresholds, and other information) and a set of physical scene violation features (including more than 15 types of scene violation features such as restricted area intrusion, personnel appearing in equipment malfunction areas, and unauthorized loitering in an area, each feature containing feature codes, associated action types, area security levels, and other information).
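As a rough illustration of the composite identifier and the four-level index map, the following Python sketch stores feature identifiers under region, feature type, and specific type, and keeps bidirectional association pointers. The field separator, the length of the random sequence, and the class and method names are hypothetical choices made for this example only.

```python
import secrets
from collections import defaultdict


def make_feature_id(device_id, region_code, timestamp_ms, feature_type):
    """Composite rule: device ID, region code, timestamp, feature type, random sequence."""
    return f"{device_id}-{region_code}-{timestamp_ms}-{feature_type}-{secrets.token_hex(4)}"


class SecurityFeatureIndex:
    """Level 1 (global) is this object; levels 2 to 4 are nested dictionaries."""

    def __init__(self):
        # region -> feature type -> specific type -> list of feature identifiers
        self._tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
        # feature identifier -> list of (associated identifier, confidence)
        self._links = defaultdict(list)

    def add(self, feature_id, region, feature_type, specific_type):
        self._tree[region][feature_type][specific_type].append(feature_id)

    def link(self, person_id, scene_id, confidence):
        """Record the association in both directions for bidirectional retrieval."""
        self._links[person_id].append((scene_id, confidence))
        self._links[scene_id].append((person_id, confidence))

    def lookup(self, region, feature_type, specific_type):
        return list(self._tree.get(region, {}).get(feature_type, {}).get(specific_type, []))

    def associated(self, feature_id):
        return list(self._links.get(feature_id, []))


index = SecurityFeatureIndex()
person = make_feature_id("cam07", "B2", 1700000000000, "person")
scene = make_feature_id("cam07", "B2", 1700000000000, "scene")
index.add(person, "B2", "person", "climbing")
index.add(scene, "B2", "scene", "restricted_area_fence")
index.link(person, scene, confidence=0.8)
print(index.associated(scene))
```

Starting from either a personnel feature or a scene feature, `associated` returns the counterpart identifiers, which corresponds to the bidirectional query the embodiment describes.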
[0036] Specifically, the process of extracting the spatiotemporal features of the abnormal data includes: detecting abnormal data through an abnormal feature detection mechanism based on the set of personnel violation identification actions and the set of physical scene violation features; extracting the relevant parameters from the abnormal data to obtain the original spatiotemporal feature set; and performing parameter calibration using a spatiotemporal feature fusion algorithm to obtain a set of spatiotemporal feature parameters for the abnormal data.
[0037] In this embodiment, the set of personnel violation identification actions and the set of physical scene violation features are input into an abnormal feature detection mechanism. This mechanism comprises a two-layer architecture: a rule base and an intelligent detection model. The rule base pre-defines various anomaly criteria, such as: a person's joint movement trajectory matching climbing characteristics + associated scene being a restricted area fence → judged as a climbing anomaly; a person's stay in a restricted area exceeding 30 minutes + no authorized access record → judged as a loitering anomaly. The intelligent detection model employs an LSTM-based anomaly detection network, using historical normal feature data as training samples to learn the distribution patterns of normal features. When the deviation between the input real-time features and the normal distribution exceeds a preset threshold, the input is judged to be abnormal data. Through dual verification by rule base matching and intelligent model detection, abnormal data is accurately identified from the video stream features. 
For the detected abnormal data, the corresponding spatiotemporal feature parameters are extracted. The time parameters include the start timestamp of the abnormal behavior (accurate to the millisecond), its duration, its temporal trend (e.g., increasing / stable / declining), and the type of time period (e.g., working hours / non-working hours / early morning). The spatial parameters include the three-dimensional coordinates of the device location where the abnormal behavior occurred, the area code, the area security level, the boundary coordinates of the abnormal coverage area, the spatial movement trajectory of the abnormal behavior (e.g., the latitude and longitude sequence of the personnel movement path), and the distance to surrounding key facilities (e.g., the distance to access control / camera / computer room). These time and spatial parameters are integrated to form the original spatiotemporal feature set. Subsequently, a spatiotemporal feature fusion algorithm is used for parameter calibration. For time calibration, the timestamp deviation between devices is corrected via the NTP time synchronization protocol so that all timestamps are unified into the building standard time system, and missing time parameters are supplemented by linear interpolation. For spatial calibration, the deviation between the local and global coordinates of the devices is corrected based on the transformation matrix of the building coordinate system, and Kalman filtering is used to smooth noise in the spatial movement trajectory. Finally, the spatiotemporal features are fused by an attention mechanism that strengthens the feature weights of the time of anomaly occurrence and the core location, yielding a standardized, high-precision set of spatiotemporal feature parameters for the abnormal data.
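The dual verification described in this embodiment can be sketched in Python as follows. The two rules are taken from the examples in the text. The deviation check is a deliberately simple z-score stand-in for the LSTM-based detection network, and requiring both layers to agree is an assumption, since the text does not state how the two results are combined. All field names are hypothetical.

```python
from statistics import mean, pstdev


def match_rules(event):
    """Rule base layer: return an anomaly type or None."""
    if event.get("trajectory_matches_climbing") and event.get("scene") == "restricted_area_fence":
        return "climbing"
    if (event.get("in_restricted_area")
            and event.get("stay_minutes", 0) > 30
            and not event.get("authorized")):
        return "loitering"
    return None


def deviates_from_normal(value, normal_samples, threshold=3.0):
    """Stand-in for the learned model: flag values far from the normal distribution."""
    mu = mean(normal_samples)
    sigma = pstdev(normal_samples)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


def detect(event, normal_samples):
    """Assumed combination: report an anomaly only when both layers agree."""
    rule_hit = match_rules(event)
    model_hit = deviates_from_normal(event["feature_score"], normal_samples)
    return rule_hit if rule_hit is not None and model_hit else None


history = [1.0, 1.1, 0.9, 1.0, 1.05]  # hypothetical scores of normal behavior
event = {"in_restricted_area": True, "stay_minutes": 45,
         "authorized": False, "feature_score": 5.0}
print(detect(event, history))
```

In a deployment the `deviates_from_normal` function would be replaced by the trained network's reconstruction or prediction error compared against its preset threshold.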
[0038] Specifically, the process of globally mapping the spatial locations of abnormal data includes: mapping the local coordinates of the device to the global coordinate system of the building through device coordinate transformation; correcting the coordinate deviation by spatial point calibration to obtain the abnormal spatial points; and, based on the abnormal spatial points and the duration of the abnormality, performing a global mapping of the spatiotemporally related abnormal points.
[0039] In this embodiment, a building-wide global coordinate system is established. Using a fixed marker at the building's ground floor entrance as the origin, a three-dimensional Cartesian coordinate system is created. The units of measurement and directions for the X-axis (horizontal), Y-axis (horizontal), and Z-axis (vertical) are defined. Simultaneously, the coordinates of each security device's installation location within this global coordinate system are measured, and a transformation matrix (including translation and rotation matrices) is established between the device's local coordinates and global coordinates. Through a device coordinate transformation algorithm, the device's local coordinates (i.e., the coordinates of the location where the abnormal behavior occurred within the device's own coordinate system) recorded in the spatiotemporal characteristic parameters of the abnormal data are substituted into the transformation matrix to calculate the initial coordinate data of that location within the building's global coordinate system. This achieves a preliminary mapping of the abnormal spatial location from local to global coordinates. For the converted global coordinate data, a spatial point calibration algorithm is used to correct the deviation: First, the impact of equipment installation errors (e.g., installation angle deviation, height measurement deviation) and environmental interference (e.g., coordinate drift caused by temperature changes) on coordinate accuracy is analyzed, and an error compensation model is established; then, the global coordinates of fixed landmarks within the building (e.g., corners, columns, ground marking lines) are extracted as a reference, and the deviation between the initial mapped coordinates and the reference coordinates is calculated; finally, the deviation is corrected using the error compensation model to obtain the abnormal spatial points. 
Based on the corrected abnormal spatial points and the duration of the anomaly, a global mapping is performed on the spatiotemporally related abnormal points: all abnormal spatial points during the duration of the anomaly are connected in series according to the time sequence to construct the spatiotemporal trajectory curve of the abnormal behavior. At the same time, combined with the boundary coordinates of the anomaly coverage area, the spatial influence area of the abnormal behavior (e.g., circular / rectangular coverage area) is drawn in the building's global coordinate system, clearly showing the location distribution, movement trajectory, and coverage area of the abnormal behavior in the building's global space, thus completing the global mapping of the spatiotemporally related abnormal points.
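A minimal Python sketch of the local-to-global mapping, the landmark-based correction, and the time-ordered trajectory is given below. It assumes the device rotation is a single yaw angle about the vertical Z-axis and that the error compensation reduces to subtracting the offset observed at one fixed landmark. Both simplifications, and all names, are assumptions for illustration.

```python
import math


def local_to_global(point, yaw_deg, translation):
    """Rotate a device-local point about Z, then translate to building coordinates."""
    x, y, z = point
    t = math.radians(yaw_deg)
    gx = math.cos(t) * x - math.sin(t) * y + translation[0]
    gy = math.sin(t) * x + math.cos(t) * y + translation[1]
    gz = z + translation[2]
    return (gx, gy, gz)


def calibrate(point, landmark_mapped, landmark_reference):
    """Subtract the deviation between a landmark's mapped and reference coordinates."""
    offset = [m - r for m, r in zip(landmark_mapped, landmark_reference)]
    return tuple(p - o for p, o in zip(point, offset))


def build_trajectory(samples, yaw_deg, translation):
    """samples: (timestamp, local_point) pairs; returns global points in time order."""
    ordered = sorted(samples, key=lambda s: s[0])
    return [(ts, local_to_global(p, yaw_deg, translation)) for ts, p in ordered]


# Hypothetical camera mounted 10 m east, 20 m north, 3 m up, rotated 90 degrees.
samples = [(2, (2.0, 0.0, 0.0)), (1, (1.0, 0.0, 0.0))]
for ts, pt in build_trajectory(samples, yaw_deg=90, translation=(10.0, 20.0, 3.0)):
    print(ts, tuple(round(c, 3) for c in pt))
```

A full implementation would use a complete 3D rotation matrix and an error compensation model fitted to several landmarks rather than a single offset.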
[0040] Specifically, the process of performing spatiotemporal registration of the coordinate system of the building AR real-scene security base includes: extracting the feature point coordinates of the building's fixed markers; constructing a mapping relationship between the abnormal points and the AR base coordinates using a topology matching algorithm; and correcting the coordinate deviation of the AR base through the spatiotemporal dynamic calibration engine so that the abnormal points are accurately registered with the AR scene.
[0041] In this embodiment, the building AR real-scene security base is constructed based on BIM technology, including a detailed model of the building's three-dimensional architectural structure, equipment deployment locations, and area divisions. First, the feature point coordinates of fixed markers are extracted from this AR base model. Objects that are not easily changed and have high recognizability, such as wall corners, column apexes, equipment base centers, and intersections of ground marking lines, are selected as fixed markers. Their true three-dimensional coordinates are measured using a high-precision laser rangefinder, serving as the reference feature point coordinate set for registration, ensuring the stability and accuracy of the reference. A topology matching algorithm is used to construct the mapping relationship: First, spatial topology parameters such as Euclidean distance, angular relationship, and topological connectivity between abnormal spatial points and each reference feature point are calculated. Then, based on these topology parameters, the RANSAC algorithm (Random Sample Consensus Algorithm) is used to eliminate outlier interference, selecting the subset of reference feature points with the highest matching degree. Finally, the mapping matrix between abnormal points and the AR base coordinate system is solved using the least squares method, constructing a spatial topology mapping relationship model between the two. The AR base coordinate deviation is corrected through a spatiotemporal dynamic calibration engine: The operating status of the AR base model is monitored in real time, and attitude changes of the device are sensed through sensors such as gyroscopes and accelerometers. Simultaneously, the real coordinates of reference feature points are periodically compared with the model coordinates in the AR base to detect coordinate drift. 
When the detected drift deviation exceeds a threshold, a dynamic calibration process is initiated to correct the AR base model coordinates based on the real coordinates of the reference feature points, eliminating drift errors. Based on the corrected AR base coordinate system and the constructed mapping relationship model, abnormal spatial points are substituted into the mapping matrix to calculate their corresponding positions in the AR base coordinate system, achieving accurate registration between abnormal points and the AR scene, ensuring that the displayed position of abnormal behavior in the AR scene is completely consistent with its real physical position.
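The drift check performed by the spatiotemporal dynamic calibration engine can be sketched in Python as below. The sketch assumes drift is measured as the largest distance between the real and model coordinates of the reference feature points, and that correction is a pure translation by the mean offset. The threshold value and function names are hypothetical; the embodiment does not specify them.

```python
import math


def max_drift(real_points, model_points):
    """Largest Euclidean distance between matching reference feature points."""
    return max(math.dist(r, m) for r, m in zip(real_points, model_points))


def correct_if_drifted(real_points, model_points, threshold=0.05):
    """Return (model coordinates, calibrated flag); translate only past the threshold."""
    if max_drift(real_points, model_points) <= threshold:
        return list(model_points), False
    n = len(real_points)
    # Mean offset per axis between real and model coordinates (assumed correction model).
    offset = [sum(r[i] - m[i] for r, m in zip(real_points, model_points)) / n
              for i in range(3)]
    corrected = [tuple(m[i] + offset[i] for i in range(3)) for m in model_points]
    return corrected, True


real = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]    # laser-measured reference points
model = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]   # same points as held in the AR base
corrected, calibrated = correct_if_drifted(real, model)
print(calibrated, corrected)
```

Rotational drift sensed by the gyroscope would require solving for a rotation as well, for example with the least squares fit mentioned for the mapping matrix.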
[0042] Specifically, the process of visualizing and plotting abnormal behavior events includes: transforming abnormal features into visual identifiers based on the spatiotemporal registration results through an anomaly type mapping mechanism; using the AR overlay rendering engine to plot the identifiers onto the corresponding positions of the AR real-scene security base; and adjusting the identifier hierarchy and layout through an anti-occlusion intelligent optimization algorithm to obtain real-scene visualization plotting data.
[0043] In this embodiment, based on the precise spatiotemporal registration results and combined with the core characteristics of abnormal data (including anomaly type, risk level, duration, and scope of impact), a visual identifier transformation is performed through an anomaly type mapping mechanism: For anomaly type mapping, a unique visual icon is designed for each anomaly type, such as: climbing anomaly corresponds to a climbing human figure + fence combination icon; trespassing into a restricted area corresponds to a red prohibition symbol + area marker icon; and overstaying corresponds to a clock + human figure icon. The icon style uses vector graphics to ensure no distortion during scaling. For risk level mapping, a color-coded system is used to classify risk levels: general risk (blue), relatively high risk (yellow), high risk (orange), and extremely high risk (red). The higher the risk level, the thicker the icon border and the faster the flashing frequency. For duration and scope of impact mapping, a numerical label is added below the icon to display the duration, and a semi-transparent color block is overlaid to display the scope of impact. The transparency of the color block decreases as the risk level increases. The AR overlay rendering engine uses the OpenGL ES graphics rendering framework. It calls the transformed visual identifiers and, based on the registration coordinates of the abnormal points in the AR base, accurately overlays and renders the visual identifiers to the corresponding positions in the AR real-world scene. First, the two-dimensional pixel coordinates of the identifiers are converted into three-dimensional world coordinates of the AR scene to ensure the depth adaptation between the identifiers and the AR scene. Then, the identifiers are blended with the AR real-world scene through hybrid rendering technology, making the identifiers look as if they actually exist in physical space. 
Finally, visual effects such as shadows and halos are added to enhance the sense of three-dimensionality and recognizability. To address marker occlusion when multiple anomaly points are plotted simultaneously, the anti-occlusion intelligent optimization algorithm employs a greedy algorithm combined with a priority sorting strategy. First, it sets marker display priorities according to risk level: extremely high risk > high risk > relatively high risk > general risk. Then, it calculates the bounding box coordinates of each marker element in real time to detect overlapping areas. For overlapping markers, it retains the display position of the high-priority marker and shifts the low-priority marker along the edge of the overlapping area, reduces the size of the low-priority marker, or adjusts the display layer of the markers (high-priority markers are displayed on the top layer). For overlapping markers of equal priority, it adjusts the layout using a uniform distribution principle. This ensures that all marker elements are clearly visible and do not obstruct each other, thereby yielding intuitive, clear, and hierarchically distinct real-scene visualization plotting data.
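The greedy, priority-ordered placement can be sketched in Python as follows. The sketch covers only one of the listed remedies, namely shifting a lower-priority marker horizontally to the edge of the marker it overlaps. Screen-space boxes are given as (x, y, width, height), and the numeric priorities and function names are assumptions for illustration.

```python
PRIORITY = {"extremely high": 3, "high": 2, "relatively high": 1, "general": 0}


def overlaps(a, b):
    """Axis-aligned bounding box intersection test; touching edges do not overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def layout(markers):
    """Place markers from highest to lowest priority, shifting later ones if needed."""
    placed = {}
    for marker in sorted(markers, key=lambda m: -PRIORITY[m["risk"]]):
        box = marker["box"]
        moved = True
        while moved:  # repeat until the box is clear of every placed marker
            moved = False
            for other in placed.values():
                if overlaps(box, other):
                    # Shift right to the edge of the higher-priority marker.
                    box = (other[0] + other[2], box[1], box[2], box[3])
                    moved = True
        placed[marker["id"]] = box
    return placed


markers = [
    {"id": "B", "risk": "general", "box": (5, 0, 10, 10)},
    {"id": "A", "risk": "high", "box": (0, 0, 10, 10)},
]
print(layout(markers))
```

Marker A keeps its registered position because it has the higher risk level, while marker B is moved to the right edge of A.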
[0044] Specifically, the process of using the graph-linked collaborative intelligent driving model to assess the abnormal risks of the entire building includes: extracting a risk assessment parameter set based on the real-scene visualization plotting data; using the graph-linked collaborative intelligent driving model to integrate historical risk data with real-time scene information and conduct a quantitative risk assessment; and, based on the quantitative assessment results, calibrating the judgment results through a risk level dynamic correction engine to obtain the building-wide abnormal risk assessment results.
[0045] In this embodiment, the core parameter set required for risk assessment is extracted from the real-scene visualization plotting data, covering five categories of parameters: basic anomaly parameters (anomaly type, location of occurrence, duration, scope of impact, initial risk level), personnel-related parameters (number of people involved, personnel behavior characteristics, whether they are carrying suspicious items, whether they have authorized access records), scene-related parameters (security level of the area, importance of equipment in the area, population density in the area, distribution of emergency passages in the area), equipment-related parameters (operating status of associated security equipment, completeness of monitoring coverage, response status of alarm equipment), and environment-related parameters (real-time lighting conditions, weather conditions, whether it is a holiday / night or other special period). The graph-linked collaborative intelligent driving model is built upon Graph Neural Networks (GNNs) and collaborative decision-making algorithms. The model's input layer receives the aforementioned risk assessment parameter set and simultaneously accesses relevant data from a historical risk database via a data interface (including the handling results of similar abnormal events over the past three years, risk diffusion patterns, the extent of losses caused, and the degree of match between historical risk levels and actual impacts). It also integrates real-time scene information from the current building (including the load status of all security equipment, the on-duty distribution of security personnel, the configuration of emergency resources, and the total flow of people in the building). 
The model uses GNNs to uncover the relationships between parameters and employs a collaborative decision-making algorithm to fuse historical data and real-time information for quantitative risk assessment: weights are assigned to each parameter and an initial risk score is calculated; the initial score is then adjusted using risk correction coefficients derived from historical data; the score is mapped to a risk quantification value in [0, 100] using a Sigmoid function; finally, risk levels are classified based on the quantification value to obtain a preliminary quantitative risk assessment result. The risk level dynamic correction engine then calibrates this result against the real-time security situation: if multiple areas experience concurrent anomalies, the risk level of each anomaly is raised by one level; if the anomaly occurs in a core critical area (such as a computer room / power distribution room), the risk level is raised by one level; if security resources are sufficient in the real-time scenario (such as on-duty security personnel nearby and emergency equipment that can respond quickly), the risk level is lowered by one level; and if the duration of the anomaly exceeds a preset threshold, the risk level is raised by one level for each further preset interval. The engine outputs building-wide anomaly risk assessment results that reflect actual conditions, including the final risk level, scope of impact, urgency, associated areas, and recommended handling priority for each anomaly event.
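The scoring and correction steps can be illustrated with a short Python sketch. It leaves out the GNN entirely and shows only the arithmetic: a weighted initial score, a historical correction coefficient, a Sigmoid mapping scaled to [0, 100], level classification, and the dynamic correction rules. The parameter names, weights, Sigmoid midpoint and steepness, and the level cutoffs of 25, 50, and 75 are all hypothetical values chosen for this example.

```python
import math

LEVELS = ["general", "relatively high", "high", "extremely high"]


def quantify(params, weights, history_coeff=1.0, midpoint=0.5, steepness=8.0):
    """Weighted score, adjusted by a historical coefficient, mapped to [0, 100]."""
    initial = sum(weights[name] * params[name] for name in weights)
    adjusted = initial * history_coeff
    return 100.0 / (1.0 + math.exp(-steepness * (adjusted - midpoint)))


def classify(score, cutoffs=(25.0, 50.0, 75.0)):
    """Return an index into LEVELS by counting the cutoffs the score reaches."""
    return sum(score >= c for c in cutoffs)


def dynamic_correction(level, concurrent_areas, core_area,
                       resources_sufficient, extra_intervals=0):
    """Apply the correction rules, then clamp to the valid level range."""
    if concurrent_areas:
        level += 1
    if core_area:
        level += 1
    if resources_sufficient:
        level -= 1
    level += extra_intervals  # one level per preset interval beyond the threshold
    return max(0, min(level, len(LEVELS) - 1))


params = {"area_security": 0.9, "people_involved": 0.4}   # normalized to [0, 1]
weights = {"area_security": 0.6, "people_involved": 0.4}
score = quantify(params, weights, history_coeff=1.1)
level = dynamic_correction(classify(score), concurrent_areas=False,
                           core_area=True, resources_sufficient=False)
print(round(score, 1), LEVELS[level])
```

In the described system the weights would come from the relationships learned by the model, and the correction coefficient from the historical risk database.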
[0046] Specifically, the process of generating collaborative scheduling instructions for building zone security equipment includes: based on the building-wide anomaly risk assessment results, extracting risk level and anomaly area information through risk area analysis; using a comprehensive security control rule matching mechanism and scenario-based instruction generation to adapt control strategies to the abnormal scenario; and, based on the adaptation results, performing structured encoding and grouped integration of the execution instructions to obtain the linkage control instruction set for building area access control.
[0047] In this embodiment, based on the results of the building-wide anomaly risk assessment, the risk area analysis module uses spatial analysis technology of Geographic Information System (GIS) to accurately extract the core information of each anomaly: risk level (general / relatively high / high / extremely high), anomaly area information (area code, three-dimensional boundary coordinates, function type, security level, and a list of associated security devices such as the ID and location of access control / camera / alarm devices), and impact diffusion prediction (based on historical data to predict the possible diffusion area and diffusion time). This information is structured and output as a triplet of risk level-area information-diffusion prediction data to clarify the key targets and scope of control. The comprehensive security control rule matching mechanism has a built-in rule library covering more than 50 control strategies. The rule library is indexed in three dimensions: risk level, area type, and anomaly type. For example, the control strategy for "extremely high risk - restricted area - intrusion" is to close all access control in the area and within 10 meters, activate the sound and light alarm in the area, adjust the focus of the cameras in the area to the maximum and track the target, send an emergency alarm to the security command center, and dispatch the nearest on-duty security personnel to the scene. The control strategy for "relatively high risk - office area - overstay" is to "restrict the entry of new personnel into the area, send tracking instructions to the cameras in the area, and send a notification to the security personnel". Based on the risk area analysis results, the rule matching mechanism quickly matches the corresponding control strategies through a three-dimensional index. 
Then, combined with the specific details of the abnormal scenario (such as the number of people involved, whether they are carrying suspicious items, and real-time personnel density), it generates scenario-based instructions: For access control devices, it generates switch control instructions (e.g., close access control IDs: 101-105, prohibit authorized access for access control IDs: 203-205) and permission adjustment instructions (e.g., temporarily freeze the area access permissions for user IDs: 5001-5010); For monitoring devices, it generates parameter adjustment instructions (e.g., adjust the focal length of camera IDs: 301-305 to 20x, turn on infrared mode, and track moving targets) and recording control instructions (e.g., "start continuous recording of camera IDs: 301-305 and mark it as video associated with abnormal events"); For alarm devices, it generates alarm trigger instructions (e.g., activate the audible and visual alarm of alarm IDs: 401-403 and turn the alarm volume to maximum) and notification sending instructions (e.g., send a notification containing the abnormal location and risk level to security personnel terminal IDs: 601-605). Based on the original control commands generated by the adaptation, the execution commands are structured and encoded: a unified format of command header-device type-device ID-operation type-parameter-checksum-command tail is adopted. 
The command header and command tail are used for frame synchronization, the checksum is used to ensure the integrity of command transmission, and the parameter part adopts differentiated encoding according to different device types (e.g., access control device parameters are on / off status / permission list, camera device parameters are focal length / mode / tracking target coordinates). Finally, the instructions are grouped and integrated according to the abnormal area and device type: the instructions are divided into different area instruction groups, and each area instruction group is further divided into subgroups according to device type (access control / monitoring / alarm). At the same time, an execution priority identifier and execution sequence requirements are added to each group of instructions (e.g., execute the access control closing instruction first, then the alarm trigger instruction, and finally the camera tracking instruction). This results in a well-structured and directly executable building area access control linkage control instruction set, ensuring coordinated action and accurate response of all devices.
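The unified frame layout (command header, device type, device ID, operation type, parameter, checksum, command tail) and the grouped execution order can be sketched in Python as follows. The byte values of the header and tail, the field widths, the added parameter-length byte, the one-byte additive checksum, and the numeric device and operation codes are all assumptions, since the embodiment names the fields but not their encoding.

```python
import struct

HEADER, TAIL = b"\xAA\x55", b"\x0D\x0A"            # hypothetical sync bytes
DEVICE_TYPES = {"access": 1, "camera": 2, "alarm": 3}
EXEC_ORDER = {"access": 0, "alarm": 1, "camera": 2}  # order given in the text


def checksum(body: bytes) -> int:
    """Assumed integrity check: sum of body bytes modulo 256."""
    return sum(body) & 0xFF


def encode(device_type: str, device_id: int, op: int, params: bytes) -> bytes:
    """Build one frame: header, type, ID, operation, length, params, checksum, tail."""
    body = struct.pack(">BHBB", DEVICE_TYPES[device_type], device_id, op, len(params)) + params
    return HEADER + body + bytes([checksum(body)]) + TAIL


def decode(frame: bytes):
    """Validate framing and checksum, then return (type code, ID, operation, params)."""
    if frame[:2] != HEADER or frame[-2:] != TAIL:
        raise ValueError("bad frame delimiters")
    body, received = frame[2:-3], frame[-3]
    if checksum(body) != received:
        raise ValueError("checksum mismatch")
    dtype, device_id, op, n = struct.unpack(">BHBB", body[:5])
    return dtype, device_id, op, body[5:5 + n]


def group_by_area(commands):
    """commands: (area, device_type, frame) tuples; order each area's frames for execution."""
    groups = {}
    for area, device_type, frame in commands:
        groups.setdefault(area, []).append((EXEC_ORDER[device_type], frame))
    return {area: [f for _, f in sorted(items, key=lambda it: it[0])]
            for area, items in groups.items()}


close_door = encode("access", 101, op=0, params=b"\x00")    # hypothetical "close" command
track = encode("camera", 301, op=2, params=b"\x14")         # hypothetical 20x zoom and track
plan = group_by_area([("zone-A", "camera", track), ("zone-A", "access", close_door)])
print([decode(f)[:2] for f in plan["zone-A"]])
```

Within each area group the access control frame is ordered ahead of the camera frame, matching the execution sequence example in the text.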
[0048] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the present invention has been disclosed above with reference to preferred embodiments, it is not intended to limit the present invention. Any person skilled in the art can make some modifications or alterations to the above-disclosed technical content to create equivalent embodiments without departing from the scope of the present invention. Any simple modifications, equivalent changes and alterations made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the scope of the present invention.
Claims
1. A visualized full-domain security management system based on the fusion of AI video analytics and AR, characterized in that it comprises: a data acquisition unit, a data processing unit, a plotting unit, and a security control unit; the data acquisition unit is used to acquire multi-source video stream data from building security scene devices; the data processing unit performs full-domain partitioning and screening on the multi-source video stream data to obtain a security dataset, extracts basic features of personnel and physical scenes, performs adaptive quantization encoding on the basic features, and binds the security index of cross-modal features through a unique identifier coding mechanism to obtain a multi-dimensional decoupled set of personnel violation identification actions and a set of physical scene violation features; the plotting unit extracts the spatiotemporal features of the abnormal data based on the action set and the feature set, performs global mapping on the spatial points of the abnormal data to generate abnormal spatiotemporal points, selects reference markers by feature point topology matching, performs spatiotemporal registration with the coordinate system of the building AR real-scene security base, and visualizes and plots the abnormal data; the security control unit uses the real-scene visualization plotting results and the graph-linked collaborative intelligent driving model to conduct anomaly risk assessment of the entire building area, and generates collaborative scheduling instructions for building zone security equipment based on the assessment results.
2. The system according to claim 1, characterized in that the acquisition of multi-source video stream data from building security scene devices comprises: based on the deployment topology map of building security equipment, using a multi-protocol compatible acquisition mechanism to obtain the raw data of heterogeneous video streams from surveillance cameras and sensor terminals; unifying the format by an adaptive decoding conversion matrix to obtain homogenized video stream data; and obtaining a multi-source video stream dataset by using a spatiotemporal frame deduplication algorithm and electromagnetic noise filtering.
3. The system according to claim 1, characterized in that the specific process of performing the full-domain partitioning and screening comprises: based on the multi-source video stream dataset, extracting device spatial identifiers and region division features to obtain an intelligent set of region association tags; using an intelligent partitioning mapping algorithm to perform a full-domain partitioning and classification operation on the video stream data; and, based on the partitioning and classification results, storing the data in a structured manner using spatiotemporal indexing to obtain a smart storage set of security data categorized by domain.
4. The system according to claim 1, characterized in that the specific process for extracting the basic features of personnel and physical scenes comprises: extracting the body contours and posture key features of people, as well as the spatial topology and visual texture features of the physical scene; optimizing data quality through a feature dimension calibration mechanism to obtain a feature map; and, based on the feature map, achieving spatiotemporal synchronization through cross-modal feature alignment to obtain a personnel-scene collaborative feature map.
5. The system according to claim 1, characterized in that the specific process of performing adaptive quantization encoding comprises: based on the characteristics of the corresponding data type, using a differentiated quantization coding matrix to obtain an encoded feature set; and compressing the data dimensions of the encoded feature set to obtain a standard coded feature dataset.
6. The system according to claim 1, characterized in that the specific process of binding the security index of cross-modal features through a unique identifier coding mechanism comprises: based on the standard coded feature dataset, generating a unique identifier for the feature data; using a cross-modal feature association algorithm to establish a mapping relationship between personnel violation actions and physical scene violation features; and, based on the feature mapping relationship, constructing a multi-level security feature index map to obtain a multi-dimensional decoupled set of personnel violation identification actions and a set of physical scene violation features.
7. The system according to claim 1, characterized in that the specific process for extracting the spatiotemporal features of abnormal data comprises: based on the set of personnel violation identification actions and the set of physical scene violation features, detecting abnormal data through an abnormal feature detection mechanism; extracting the relevant parameters from the abnormal data to obtain the original spatiotemporal feature set; and performing parameter calibration using a spatiotemporal feature fusion algorithm to obtain a set of spatiotemporal feature parameters for the abnormal data.
8. The system according to claim 1, characterized in that the specific process of performing global mapping of the spatial points of abnormal data comprises: mapping the device's local coordinates to the building's global coordinate system through device coordinate transformation; correcting the coordinate deviation by spatial point calibration to obtain the abnormal spatial points; and, based on the abnormal spatial points and the duration of the abnormality, performing a global mapping of the spatiotemporally related abnormal points.
9. The system according to claim 1, characterized in that the specific process of spatiotemporal registration with the coordinate system of the building AR real-scene security base comprises: extracting the feature point coordinates of the building's fixed markers; constructing a mapping relationship between the abnormal points and the AR base coordinates using a topology matching algorithm; and correcting the coordinate deviation of the AR base through the spatiotemporal dynamic calibration engine so that the abnormal points are accurately registered with the AR scene.
10. The system according to claim 1, characterized in that the specific process of visualizing and plotting abnormal behavior events comprises: based on the spatiotemporal registration results, transforming anomaly features into visual identifiers through an anomaly type mapping mechanism; using the AR overlay rendering engine to plot the identifiers onto the corresponding positions of the AR real-scene security base; and adjusting the identifier hierarchy and layout through an anti-occlusion intelligent optimization algorithm to obtain real-scene visualization plotting data.
11. The system according to claim 1, characterized in that the specific process of using the graph-linked collaborative intelligent driving model to assess anomaly risks across the entire building area comprises: based on the real-scene visualization plotting data, extracting a risk assessment parameter set; using the graph-linked collaborative intelligent driving model to integrate historical risk data with real-time scene information and conduct a quantitative risk assessment; and, based on the quantitative assessment results, calibrating the judgment results through a risk level dynamic correction engine to obtain the building-wide abnormal risk assessment results.
12. The system according to claim 1, characterized in that the specific process of generating collaborative scheduling instructions for building zone security equipment comprises: based on the building-wide anomaly risk assessment results, extracting risk level and anomaly area information through risk area analysis; using a comprehensive security control rule matching mechanism and scenario-based instruction generation to adapt control strategies to the abnormal scenario; and, based on the adaptation results, performing structured encoding and grouped integration of the execution instructions to obtain the linkage control instruction set for building area access control.