A campus safety analysis and early warning method and device based on edge pre-screening
A multi-scenario data acquisition system and edge computing nodes perform initial screening on campus, and a cloud analysis platform applies multimodal data fusion and cross-modal semantic understanding. This addresses the isolation of multi-source data in campus security analysis and enables accurate, timely security risk warnings.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: WUHAN ID TECH CO LTD
- Filing Date: 2026-03-06
- Publication Date: 2026-06-16
AI Technical Summary
Existing campus security analysis and early warning methods suffer from low accuracy because they lack a unified data collaboration and semantic association mechanism among various systems, making it difficult to effectively associate and integrate multi-source data and identify complex risks.
The method constructs a multi-scenario data acquisition system, uses edge computing nodes for initial screening to output risk scenario data to be analyzed, and performs multimodal data fusion and cross-modal semantic understanding on a cloud analysis platform, where a dedicated large-scale campus safety model and a risk reasoning and decision-making engine generate campus risk warning levels and decision instructions.
It enables unified analysis and associated early warning of security risks in multiple campus scenarios, improves the accuracy of security risk identification and the timeliness of early warning in complex scenarios, and enhances the ability to understand typical security scenarios and their risk semantics and the reliability of decision-making instructions.
Figure CN122223928A_ABST
Description
Technical Field
[0001] This application relates to the field of edge computing, and in particular to a campus security analysis and early warning method and device based on edge pre-screening.
Background Technology
[0002] The campus environment is characterized by dense population, diverse activities, and complex spatial functions. Furthermore, various safety risks are sudden and interconnected in both time and space. Therefore, how to conduct comprehensive, timely, and accurate analysis and early warning of campus safety has become an important technical issue in the construction of smart campuses.
[0003] Existing campus safety analysis and early warning methods suffer from a lack of unified data collaboration and semantic association mechanisms between various systems. This leads to independent operation of systems such as fire prevention, security, and personnel management, resulting in distinct scenario isolation characteristics across different safety scenarios and hindering effective correlation and fusion of multi-source data. For example, it is impossible to jointly analyze abnormal crowd gatherings in a certain area with simultaneous abnormal heat source changes in the vicinity, easily missing early warning opportunities for complex risks such as stampedes and fires. Therefore, existing technologies generally suffer from low accuracy in campus safety identification when facing complex campus scenarios with concurrent multi-source information and overlapping risks.
[0004] Therefore, there is an urgent need for a campus security analysis and early warning method and device based on edge pre-screening.
Summary of the Invention
[0005] This application provides a campus security analysis and early warning method and device based on edge pre-screening, which solves the problem of low campus security identification accuracy that is common in existing technologies when facing complex campus scenarios with concurrent multi-source information and overlapping risks.
[0006] The first aspect of this application provides a campus security analysis and early warning method based on edge pre-screening. The method includes: acquiring multi-scenario data within the campus and sending the multi-scenario data to an edge computing node; performing preliminary screening of the multi-scenario data based on the edge computing node and outputting multiple risk scenario data to be analyzed; inputting the multiple risk scenario data to be analyzed into a cloud analysis platform and outputting campus security risk early warning results based on the cloud analysis platform.
[0007] Optionally, acquiring multi-scenario data within the campus specifically includes: constructing a multi-scenario data acquisition system, which includes data acquisition devices deployed in various areas of the campus; the data acquisition devices include network cameras, infrared thermal imagers, IoT sensors, and access control card swiping systems; acquiring multi-scenario data within the campus based on the multi-scenario data acquisition system; the multi-scenario data includes real-time video stream data, heat map data, campus environmental data, personnel identity information data, and personnel location information data.
[0008] Optionally, performing initial screening of the multi-scenario data based on the edge computing nodes and outputting multiple risk scenario data to be analyzed specifically includes: constructing multiple edge computing nodes within a preset range of the multi-scenario data acquisition system; each edge computing node includes a lightweight risk screening model and a data preprocessing module; performing frame extraction, compression, noise reduction, and formatting on the multi-scenario data based on the lightweight risk screening model to generate a standardized data representation adapted to the lightweight risk screening model; performing risk correlation screening on the standardized multi-scenario data based on the data preprocessing module, and obtaining multiple risk scenario data to be analyzed; the risk scenario data to be analyzed includes key video frame data containing potential risk characteristics, continuous video segment data, and sensor anomaly data segments that have a spatiotemporal correlation with the key video frame data or the continuous video segment data; the potential risk characteristics include abnormal behavior characteristics, abnormal heat distribution characteristics, and abnormal environmental parameter fluctuation characteristics.
[0009] Optionally, the cloud analysis platform includes a multimodal data fusion module, a large-scale campus safety model, a risk reasoning and decision-making engine, and an early warning information generation and distribution module. Inputting the multiple risk scenario data to be analyzed into the cloud analysis platform specifically includes: acquiring multiple risk scenario data from different edge nodes, aligning and fusing the multiple risk scenario data in the time and space dimensions to obtain a unified scenario description tensor; extracting cross-modal semantic information from the scenario description tensor based on the trained large-scale campus safety model; inputting the cross-modal semantic information into the risk reasoning and decision-making engine and outputting the campus risk early warning level; and inputting the campus risk early warning level into the early warning information generation and distribution module and outputting target decision instructions.
[0010] Optionally, a large-scale model for campus safety can be constructed, specifically including: training the large-scale model for campus safety based on campus security scenario data; the campus security scenario data is labeled with multi-category semantic tags, including fire risk tags, campus bullying risk tags, abnormal intrusion risk tags, and abnormal gathering of people risk tags.
[0011] Optionally, a risk reasoning and decision-making engine is constructed, specifically including: constructing a scenario-rule knowledge graph, where the nodes of the scenario-rule knowledge graph are cross-modal semantic elements, which are generated by cross-modal semantic information through structured abstract representation, and the edges of the scenario-rule knowledge graph are the semantic relationships between cross-modal semantic elements; and introducing the scenario-rule knowledge graph to construct the risk reasoning and decision-making engine.
[0012] Optionally, the campus risk warning level is input into the warning information generation and distribution module, and target decision instructions are output. Specifically, this includes: generating structured warning information based on the campus risk warning level, which includes risk type, risk location, risk level, on-site risk image, and risk handling suggestions; constructing target decision instructions based on the structured warning information, and sending the target decision instructions to the corresponding processing terminal based on a preset distribution strategy; the processing terminal includes the security center's large screen, the on-duty personnel's mobile phone, and a multi-system linkage control interface.
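As an illustration of the structured warning information and preset distribution strategy described above, the following is a minimal Python sketch, not the applicant's implementation. The class name `WarningInfo`, the three-level scale, the terminal identifiers, and the `DISTRIBUTION` table are all hypothetical; the source only names the five fields and the three kinds of processing terminal.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WarningInfo:
    """Structured warning information with the five fields named in the text."""
    risk_type: str
    location: str
    level: int          # assumed scale: 1 (lowest) to 3 (highest)
    image_ref: str      # reference to the on-site risk image
    suggestion: str     # risk handling suggestion


# Hypothetical preset distribution strategy: warning level -> terminals.
DISTRIBUTION: Dict[int, List[str]] = {
    1: ["security_center_screen"],
    2: ["security_center_screen", "duty_phone"],
    3: ["security_center_screen", "duty_phone", "linkage_control_interface"],
}


def build_instructions(info: WarningInfo) -> List[dict]:
    """Create one decision instruction per terminal selected for this level."""
    terminals = DISTRIBUTION.get(info.level)
    if terminals is None:
        raise ValueError(f"unknown warning level: {info.level}")
    return [
        {
            "terminal": terminal,
            "risk_type": info.risk_type,
            "location": info.location,
            "level": info.level,
            "image_ref": info.image_ref,
            "suggestion": info.suggestion,
        }
        for terminal in terminals
    ]
```

In this sketch a higher level simply fans out to more terminals; a real distribution strategy could also depend on risk type or location.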
[0013] A second aspect of this application provides a campus security analysis and early warning device based on edge pre-screening. The device includes an acquisition module and a processing module, wherein... The acquisition module is used to acquire multi-scenario data within the campus and send the multi-scenario data to the edge computing node.
[0014] The processing module is used to perform initial screening of multi-scenario data based on edge computing nodes and output multiple risk scenario data to be analyzed; input multiple risk scenario data to be analyzed into the cloud analysis platform and output the campus security risk warning results based on the cloud analysis platform.
[0015] A third aspect of this application provides an electronic device including a processor, a memory, a user interface, and a network interface. The memory is used to store instructions, the user interface and the network interface are used to communicate with other devices, and the processor is used to execute the instructions stored in the memory to cause the electronic device to perform the method as described above.
[0016] A fourth aspect of this application provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs any of the methods described above.
[0017] One or more technical solutions provided in the embodiments of this application have at least the following technical effects or advantages: 1. Multi-scenario data within the campus is acquired and sent to edge computing nodes; the multi-scenario data is initially screened by the edge computing nodes, which output multiple risk scenario data to be analyzed; the multiple risk scenario data to be analyzed are input into the cloud analysis platform, which outputs the campus security risk warning results, thereby realizing unified analysis and correlated warning of campus security risks across multiple scenarios, and improving the accuracy of campus security risk identification and the timeliness of warning in complex scenarios.
[0018] 2. The large-scale campus safety model is trained on campus security scenario data labeled with multiple categories of semantic tags, such as fire risk tags, campus bullying risk tags, abnormal intrusion risk tags, and abnormal gathering of people risk tags. This gives the model a targeted understanding of typical campus security scenarios and their risk semantics, thereby improving the accuracy of identifying and distinguishing security risks in complex campus environments.
[0019] 3. Using cross-modal semantic elements as nodes and the semantic relationships between cross-modal semantic elements as edges, a scenario-rule knowledge graph is introduced to construct a risk reasoning and decision-making engine, thereby realizing the structured organization and associative reasoning of cross-modal semantic information, and improving the accuracy of campus safety risk level determination and the reliability of decision instruction generation.
Attached Figure Description
[0020] Figure 1 is a flowchart illustrating a campus security analysis and early warning method based on edge pre-screening provided in an embodiment of this application; Figure 2 is a schematic diagram of a campus security analysis and early warning device based on edge pre-screening provided in an embodiment of this application; Figure 3 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
[0021] Explanation of reference numerals in the attached figures: 21. Acquisition module; 22. Processing module; 301. Processor; 302. Communication bus; 303. User interface; 304. Network interface; 305. Memory.
Detailed Implementation
[0022] To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments.
[0023] The terminology used in the following embodiments of this application is for the purpose of describing particular embodiments only and is not intended to be limiting of this application. As used in the specification of this application, the singular expressions “a,” “an,” “the,” “said,” and “this” are intended to include the plural expressions as well, unless the context clearly indicates otherwise. It should also be understood that the term “and/or” as used in this application refers to and includes any or all possible combinations of one or more of the listed items.
[0024] Hereinafter, the terms "first" and "second" are used for descriptive purposes only and should not be construed as implying or suggesting relative importance or implicitly indicating the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature, and in the description of the embodiments of this application, unless otherwise stated, "multiple" means two or more.
[0025] To enable those skilled in the art to better understand the technical solution of the present invention, the present invention will be further described in detail below with reference to the accompanying drawings.
[0026] Please refer to Figure 1, which illustrates a flowchart of a campus security analysis and early warning method based on edge pre-screening provided in an embodiment of this application. The flowchart mainly includes the following steps: S101 to S103.
[0027] Step S101: Obtain multi-scenario data within the campus and send the multi-scenario data to the edge computing node.
[0028] Specifically, campuses are typical scenarios characterized by dense populations, complex behavioral patterns, and diverse security risks. Different areas exhibit significant differences in spatial function, population composition, and risk patterns. For example, teaching buildings focus on risks related to population gathering and order, laboratories on fire and hazard source risks, and entrance and exit areas on risks related to unauthorized entry and identity verification. Existing technologies often monitor specific risks through single video surveillance systems or independent sensor systems. This approach typically focuses on analyzing a single scenario or risk type, lacking unified collection and collaborative perception of multi-scenario, multi-source information within the campus. It is difficult to achieve overall perception and correlation judgment of potential security issues occurring in different areas and at different times, thus failing to support the early identification of complex and evolving risks.
[0029] Therefore, in order to solve the technical problems of fragmented campus security perception, scenario separation, and insufficient risk correlation capabilities in existing technologies, this application first takes the perspective of the overall operation of the campus, uniformly acquires multi-scenario data covering multiple areas and multiple types of risks, and sends the multi-scenario data to edge computing nodes.
[0030] In one possible implementation, step S101 further includes: constructing a multi-scenario data acquisition system, which includes data acquisition devices deployed in various areas of the campus; the data acquisition devices include network cameras, infrared thermal imagers, IoT sensors, and access control card swiping systems; acquiring multi-scenario data within the campus based on the multi-scenario data acquisition system; the multi-scenario data includes real-time video stream data, heat map data, campus environmental data, personnel identity information data, and personnel location information data.
[0031] Specifically, the multi-scenario data acquisition system is deployed in zones according to the functional division of campus space. The configuration of data acquisition devices and the focus of data collection differ in different areas to adapt to the different potential risk types in each area. For teaching buildings and dormitory areas, network cameras are deployed at fixed angles or in a pan-tilt configuration to continuously acquire real-time video stream data covering corridors, entrances and exits, and public activity areas. The real-time video stream data reflects the density of people, the direction of people's flow, and changes in behavior. At the same time, the access control card swiping system records people's card swiping behavior at building entrances and key passages to form personnel identity information data and personnel location information data that correspond to the real-time video stream data in time and space, which is used to characterize the patterns of people entering and exiting and abnormal stays.
[0032] For example, in high-risk areas such as laboratories, power distribution rooms, and equipment rooms, infrared thermal imagers are deployed in conjunction with environmental IoT sensors. The infrared thermal imagers periodically or continuously acquire thermal map data of the area to characterize the surface temperature distribution of equipment and abnormal changes in environmental thermal activity. The IoT sensors, including smoke sensors, temperature and humidity sensors, and combustible gas sensors, collect environmental parameters in real time to form campus environmental data, reflecting abnormal environmental conditions that may lead to fires or safety accidents. Both thermal map data and campus environmental data are bound to corresponding area identifiers and timestamps during acquisition to ensure subsequent correlation analysis with video data. For playgrounds, squares, and open public areas, network cameras focus on covering the range of human activity to acquire real-time video stream data of large-scale human distribution and gathering. Acoustic IoT sensors simultaneously collect environmental acoustic characteristics to form campus environmental data reflecting abnormal noise, conflicts, or emergencies. By mapping real-time video stream data and environmental acoustic data along a time dimension, the correlation between changes in human behavior and environmental anomalies can be further characterized.
[0033] In each of the aforementioned regions, the multi-scenario data acquisition system uniformly encapsulates the acquired real-time video stream data, heatmap data, campus environment data, personnel identity information data, and personnel location information data, and adds source identifiers, time identifiers, and spatial identifiers to each type of data, thereby forming a structured multi-scenario data set. This multi-scenario data set is then transmitted to the corresponding edge computing nodes via wired or wireless networks, enabling the edge computing nodes to uniformly receive and process data from different regions and of different types while maintaining spatiotemporal consistency.
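The encapsulation step above can be sketched in Python as follows. This is a minimal illustration under assumptions: the record type `ScenarioRecord`, its field names, and the modality labels are hypothetical stand-ins for the source, time, and spatial identifiers the text describes, and the payloads are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List

# Modality labels are assumptions that mirror the five data types in the text.
MODALITIES = {"video", "heatmap", "environment", "identity", "location"}


@dataclass(frozen=True)
class ScenarioRecord:
    source_id: str    # source identifier: which device produced the data
    area_id: str      # spatial identifier: which campus area it covers
    timestamp: float  # time identifier, e.g. seconds on a unified clock
    modality: str
    payload: Any


def encapsulate(source_id: str, area_id: str, timestamp: float,
                modality: str, payload: Any) -> ScenarioRecord:
    """Wrap one raw reading with its source, time, and spatial identifiers."""
    if modality not in MODALITIES:
        raise ValueError(f"unsupported modality: {modality}")
    return ScenarioRecord(source_id, area_id, timestamp, modality, payload)


def group_by_area(records: Iterable[ScenarioRecord]) -> Dict[str, List[ScenarioRecord]]:
    """Build a per-area, time-ordered data set ready to send to an edge node."""
    grouped: Dict[str, List[ScenarioRecord]] = {}
    for record in sorted(records, key=lambda r: r.timestamp):
        grouped.setdefault(record.area_id, []).append(record)
    return grouped
```

Keeping the three identifiers on every record is what lets later stages align video, heat map, and sensor data from the same area and time window.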
[0034] Step S102: Perform initial screening of multi-scenario data based on edge computing nodes, and output multiple risk scenario data to be analyzed.
[0035] Specifically, the process involves initial screening of multi-scenario data at the edge and outputting multiple risk scenario data sets for analysis. Because multi-scenario data exhibits high density characteristics in terms of temporal continuity, spatial coverage, and data scale, directly uploading all raw data to the cloud for analysis would easily lead to high network transmission pressure, high cloud computing load, and insufficient response timeliness. Therefore, this application introduces edge computing nodes closer to the data source to perform rapid preliminary analysis of multi-scenario data. By judging whether the data contains potential risk characteristics, normal data that is clearly unrelated to security risks is filtered out, retaining only risk candidate data that may reflect abnormal behavior, abnormal environmental changes, or abnormal heat distribution. This forms the risk scenario data set for analysis, which is then output to the cloud analysis platform. This provides a high-value data foundation for subsequent deep semantic understanding and risk inference, while improving the real-time performance and processing efficiency of the overall early warning process.
[0036] In one possible implementation, step S102 further includes: constructing multiple edge computing nodes within a preset range of the multi-scenario data acquisition system; each edge computing node includes a lightweight risk screening model and a data preprocessing module; performing frame extraction, compression, noise reduction, and formatting on the multi-scenario data based on the lightweight risk screening model to generate a standardized data representation adapted to the lightweight risk screening model; performing risk association screening on the standardized multi-scenario data based on the data preprocessing module, and obtaining multiple risk scenario data to be analyzed; the risk scenario data to be analyzed includes key video frame data containing potential risk characteristics, continuous video segment data, and sensor anomaly data segments that have a spatiotemporal correlation with the key video frame data or the continuous video segment data; the potential risk characteristics include abnormal behavior characteristics, abnormal heat distribution characteristics, and abnormal environmental parameter fluctuation characteristics.
[0037] Specifically, when constructing multiple edge computing nodes within the preset range of the multi-scenario data acquisition system, based on the spatial division and link load distribution of the campus area, data acquisition devices in different areas are mapped to the corresponding edge computing nodes according to the principle of proximity access. Each edge computing node is responsible for receiving multi-scenario data within its coverage area and performing local initial screening. At the same time, each multi-scenario data is attached with an area identifier, device identifier, and unified timestamp to ensure that the spatiotemporal correlation of key video frame data, continuous video segment data, and sensor abnormal data segments can be consistently represented. The risk scenario data with initial screening value is then output to the cloud analysis platform.
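One way to picture the proximity-access mapping is the Python sketch below. It is only an assumption-laden illustration: devices and nodes are given hypothetical 2D coordinates, link load is reduced to a simple per-node device cap (`capacity`), and the function name `assign_devices` does not come from the source.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def assign_devices(devices: Dict[str, Point], nodes: Dict[str, Point],
                   capacity: int) -> Dict[str, str]:
    """Map each device to the nearest edge node that still has spare capacity."""
    load = {node_id: 0 for node_id in nodes}
    assignment: Dict[str, str] = {}
    for device_id, position in sorted(devices.items()):
        # Nearest node first; ties are broken by node id for determinism.
        ranked = sorted(nodes, key=lambda n: (math.dist(position, nodes[n]), n))
        for node_id in ranked:
            if load[node_id] < capacity:
                assignment[device_id] = node_id
                load[node_id] += 1
                break
        else:
            raise RuntimeError("no edge node has spare capacity")
    return assignment
```

A greedy pass like this is easy to reason about; a deployment that weighs actual link bandwidth would need a richer load model.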
[0038] When each edge computing node includes a lightweight risk screening model and a data preprocessing module, the data preprocessing module first performs frame extraction, compression, denoising, and formatting on the real-time video stream data to generate a standardized data representation adapted to the lightweight risk screening model. The frame extraction process does not use a single strategy with a fixed frame rate, but dynamically determines the candidate frame set by combining the intensity of image changes and risk sensitivity, so that the subsequent lightweight risk screening model can prioritize moments with more likely potential risk features under limited computing power. The compression process reduces the bit rate and generates a fast-decoding bitstream structure while ensuring that potential risk targets are identifiable. The denoising process suppresses imaging noise and compression artifacts to reduce false detections. The formatting process converts video frames, heatmaps, and campus environment data into tensor representations with time and spatial indices, thereby improving the operability of multi-source alignment.
[0039] To achieve dynamic frame extraction processing, the intensity of video change and information complexity are first calculated for each candidate time point, and the frame extraction weight is determined accordingly. The formula for calculating the frame extraction weight is as follows:

$$w_t = \phi\left(\alpha_1 \frac{M_t - \mu_M}{\sigma_M + \epsilon} + \alpha_2 \frac{H_t - \mu_H}{\sigma_H + \epsilon} + \alpha_3 \frac{D_t - \mu_D}{\sigma_D + \epsilon}\right)$$

where $w_t$ indicates the weight with which time point $t$ is selected as a candidate frame, with values ranging from 0 to 1; $\phi(\cdot)$ represents a compression function used to map the linear combination to the range 0 to 1; $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the weight coefficients of the different discrimination terms, obtained by offline calibration or online adaptive update at the edge computing node; $M_t$ indicates the video change intensity at time point $t$, obtained by calculating the energy difference between adjacent frames; $\mu_M$ and $\sigma_M$ respectively represent the mean and dispersion of $M_t$ within the preset time window; $H_t$ indicates the image information complexity at time point $t$, obtained by measuring the entropy of the intra-frame grayscale or feature distribution; $\mu_H$ and $\sigma_H$ represent the mean and dispersion of $H_t$; $D_t$ indicates the potential risk target density proxy at time point $t$, which can be obtained from the number of candidate boxes or candidate key points output by the lightweight detection head; $\mu_D$ and $\sigma_D$ represent the mean and dispersion of $D_t$; and $\epsilon$ indicates a tiny positive number that prevents the denominator from being zero. By simultaneously measuring motion change, information complexity, and candidate density, this formula makes frame extraction more likely to retain key moments that are more likely to contain anomalous behavioral characteristics, thereby reducing invalid uploads and improving initial screening efficiency.
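The weighting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's code: the compression function is taken to be a logistic sigmoid, each term is normalized as a z-score over a trailing window, and the function names, window length, coefficients, and threshold are all hypothetical.

```python
import math
from statistics import mean, pstdev

EPS = 1e-6  # tiny positive number that keeps the denominator non-zero


def sigmoid(x: float) -> float:
    """Assumed compression function mapping a linear combination into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def zscore(value: float, window: list) -> float:
    """Deviation of value from the window mean, scaled by the window dispersion."""
    return (value - mean(window)) / (pstdev(window) + EPS)


def frame_weight(motion, entropy, density, t, window=5, alphas=(1.0, 0.5, 1.0)):
    """Weight for selecting time point t as a candidate frame."""
    lo = max(0, t - window + 1)
    terms = [zscore(series[t], series[lo:t + 1])
             for series in (motion, entropy, density)]
    return sigmoid(sum(a * z for a, z in zip(alphas, terms)))


def select_frames(motion, entropy, density, threshold=0.6, **kwargs):
    """Keep the time points whose weight exceeds an assumed threshold."""
    return [t for t in range(len(motion))
            if frame_weight(motion, entropy, density, t, **kwargs) > threshold]
```

With flat inputs every term is zero and the weight sits at 0.5, so only time points that stand out from their recent window cross the threshold.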
[0040] Compression and denoising are performed collaboratively at the edge to ensure that key video frame data and continuous video segment data can still support the lightweight risk screening model in extracting abnormal behavior features and abnormal heat distribution features. The denoising intensity of each frame adopts adaptive control related to risk sensitivity, and the calculation formula of the control quantity is as follows:

$$\lambda_t = \lambda_{\min} + (\lambda_{\max} - \lambda_{\min})\,\phi\left(\beta_1 w_t + \beta_2 \frac{N_t - \mu_N}{\sigma_N} + \beta_3 \frac{Q_t - \mu_Q}{\sigma_Q}\right)$$

where $\lambda_t$ indicates the noise reduction intensity control value at time point $t$, with its value limited to the range from $\lambda_{\min}$ to $\lambda_{\max}$; $\lambda_{\min}$ and $\lambda_{\max}$ represent the boundary values for the lowest and highest denoising intensities, respectively, which are preset by the edge computing nodes based on the device noise level; $w_t$ indicates the frame extraction weight; $\phi(\cdot)$ represents the compression function; $\beta_1$, $\beta_2$, and $\beta_3$ represent the weighting coefficients of the control terms; $N_t$ indicates the noise estimate at time point $t$, which can be obtained from the high-frequency energy or the variance of the flat region; $\mu_N$ and $\sigma_N$ represent the mean and dispersion of $N_t$ within a preset time window; $Q_t$ represents the compression quality proxy, whose value can be obtained from the bitrate, quantization parameters, or block effect index; and $\mu_Q$ and $\sigma_Q$ represent the mean and dispersion of $Q_t$. This formula enables frames with high risk weights, high noise levels, and strong compression artifacts to receive stronger denoising, thereby reducing the false detection probability of the lightweight risk screening model in complex scenes.
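A minimal Python sketch of this adaptive control follows. It assumes a logistic sigmoid as the compression function and takes the noise and compression-artifact terms as already normalized deviations from their window statistics; the function name, the bounds 0.1 and 0.9, and the coefficients are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Assumed compression function mapping a linear combination into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def denoise_strength(weight: float, noise_z: float, artifact_z: float,
                     lam_min: float = 0.1, lam_max: float = 0.9,
                     betas=(1.0, 0.5, 0.5)) -> float:
    """Denoising intensity bounded between lam_min and lam_max.

    weight     -- frame extraction weight of the frame, in [0, 1]
    noise_z    -- normalized noise estimate (deviation from the window mean)
    artifact_z -- normalized compression artifact proxy
    """
    score = betas[0] * weight + betas[1] * noise_z + betas[2] * artifact_z
    return lam_min + (lam_max - lam_min) * sigmoid(score)
```

Because the sigmoid output is interpolated between the two bounds, the control value can never leave the range preset for the device, whatever the inputs.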
[0041] When formatting unifies multi-source data into a standardized data representation, candidate frames from real-time video stream data, heatmap data generated by infrared thermal imagers, and campus environmental data generated by IoT sensors are jointly mapped into a multimodal tensor with a time index, and a consistent spatial index is generated for each time point. The formula for constructing the standardized data representation is as follows:

$$X_t = \mathcal{N}_v\left(F^{v}_t;\theta_v\right) \oplus \mathcal{N}_h\left(F^{h}_t;\theta_h\right) \oplus \mathcal{N}_s\left(F^{s}_t;\theta_s\right)$$

where $X_t$ indicates the standardized data representation at time point $t$; $F^{v}_t$ indicates the video feature tensor at time point $t$, obtained from candidate frames of real-time video stream data through a lightweight feature extractor; $F^{h}_t$ indicates the heatmap feature tensor at time point $t$, obtained by normalizing and aggregating the heatmap data; $F^{s}_t$ indicates the sensor feature vector at time point $t$, obtained by encoding campus environmental data, personnel identity information data, and personnel location information data according to preset fields; $\mathcal{N}_v$, $\mathcal{N}_h$, and $\mathcal{N}_s$ represent the standardization operators for the three types of data; $\theta_v$, $\theta_h$, and $\theta_s$ represent the corresponding operator parameters, whose values are obtained through statistical calibration or online updating; and $\oplus$ indicates a feature splicing or alignment fusion operation. Through standardization and splicing, this formula enables the lightweight risk screening model to perceive behavior, heat distribution, and environmental fluctuations within the same representation space, thus providing a consistent data foundation for subsequent risk association screening.
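The standardize-then-splice idea can be sketched in Python as below. This is an assumption-heavy illustration: features are flattened to plain lists, each standardization operator is taken to be a shift-and-scale with a calibrated (mean, dispersion) pair, splicing is plain concatenation, and the names `build_representation` and `params` are hypothetical.

```python
from typing import Dict, List, Tuple

EPS = 1e-6  # keeps the scale factor non-zero


def standardize(values: List[float], mu: float, sigma: float) -> List[float]:
    """Assumed standardization operator: shift by mu, scale by sigma."""
    return [(v - mu) / (sigma + EPS) for v in values]


def build_representation(t: float, area_id: str,
                         video_feat: List[float],
                         heat_feat: List[float],
                         sensor_feat: List[float],
                         params: Dict[str, Tuple[float, float]]) -> dict:
    """Standardize each modality, then splice into one time/space-indexed vector."""
    spliced: List[float] = []
    for name, feat in (("video", video_feat),
                       ("heat", heat_feat),
                       ("sensor", sensor_feat)):
        mu, sigma = params[name]  # calibrated operator parameters per modality
        spliced.extend(standardize(feat, mu, sigma))
    return {"t": t, "area": area_id, "x": spliced}
```

Putting all three modalities on a comparable scale before concatenation keeps one modality's raw units (for example degrees versus pixel energies) from dominating the screening model's input.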
[0042] When performing risk association screening on standardized multi-scenario data based on the data preprocessing module, abnormal behavior features, abnormal heat distribution features, and abnormal environmental parameter fluctuation features are first extracted from the standardized data representation. Risk candidate scores are formed based on the principle of consistency of multiple features, thereby screening and outputting the risk scenario data to be analyzed. Among them, abnormal behavior features can be composed of personnel motion vector field, relative distance changes between personnel, and posture conflict indicators; abnormal heat distribution features can be composed of high temperature connectivity intensity, gradient mutation, and persistence indicators of heat map; and abnormal environmental parameter fluctuation features can be composed of mutation amplitude, duration, and abnormal frequency of sensors such as smoke, acoustics, temperature, and humidity.
[0043] To quantify the characteristics of the three potential risk categories and perform joint screening, a multimodal risk score is constructed, and its calculation formula is as follows:

$$R_t = \phi\left(\gamma_1 S^{b}_t + \gamma_2 S^{h}_t + \gamma_3 S^{e}_t + \gamma_4 C^{bh}_t + \gamma_5 C^{be}_t + \gamma_6 C^{he}_t\right)$$

where $R_t$ indicates the multimodal risk score at time point $t$, with values ranging from 0 to 1; $\phi(\cdot)$ represents the compression function; $\gamma_1$ to $\gamma_6$ represent the weight coefficients of the scoring terms, whose values are obtained by edge calibration or adaptive adjustment in combination with historical false alarm rates; $S^{b}_t$ represents the score of abnormal behavior features extracted from the video feature tensor, obtained by weighting indicators such as motion intensity, probability of adversarial posture, and sudden changes in personnel density; $S^{h}_t$ represents the score of abnormal heat distribution characteristics extracted from the heatmap feature tensor, obtained by combining the area of the high-temperature connected domain, the peak temperature rise proxy, and the persistence; $S^{e}_t$ represents the score of abnormal environmental parameter fluctuation characteristics extracted from the sensor feature vector, obtained by combining the deviation of each sensor reading with the frequency of sudden changes; and $C^{bh}_t$, $C^{be}_t$, and $C^{he}_t$ represent cross-modal consistency scores, used to characterize the intensity of the co-occurrence of behavioral anomalies and thermal anomalies, behavioral anomalies and environmental fluctuations, and thermal anomalies and environmental fluctuations within the same time window. By incorporating both single-modal anomalies and cross-modal consistency, this formula enables risk association screening to more reliably distinguish between pseudo-anomalies caused by single-source noise and genuine risk indicators pointed to by multi-source consistency.
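A minimal Python sketch of such a score follows, assuming a logistic sigmoid as the compression function and six non-negative input scores. The function name and the coefficient values are hypothetical; the source says only that the coefficients are calibrated at the edge.

```python
import math


def multimodal_risk(s_behavior: float, s_heat: float, s_env: float,
                    c_bh: float, c_be: float, c_he: float,
                    gammas=(1.0, 1.0, 1.0, 1.5, 1.5, 1.5)) -> float:
    """Combine three single-modal scores and three cross-modal consistency scores.

    The consistency terms get larger assumed coefficients so that agreement
    between modalities counts for more than a single-source anomaly.
    """
    terms = (s_behavior, s_heat, s_env, c_bh, c_be, c_he)
    linear = sum(g * x for g, x in zip(gammas, terms))
    return 1.0 / (1.0 + math.exp(-linear))  # squashed into (0, 1)
```

For example, a behavior anomaly alone (`s_behavior=0.8`) scores lower than the same anomaly accompanied by an environmental fluctuation and a high behavior-environment consistency, which is the distinction the paragraph above draws between single-source noise and multi-source agreement.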
[0044] The cross-modal consistency score is obtained through time window alignment and similarity aggregation. Taking the consistency between behavioral anomalies and environmental fluctuations as an example, the quantities in its calculation formula are defined as follows: the abnormal behavior feature vector at a given time point is extracted from the video feature tensor; the abnormal environmental parameter fluctuation feature vector at a given time point is extracted from the sensor feature vector; a transformation matrix maps sensor features into a space comparable to that of behavioral features, and its values are obtained from a small number of labeled samples at the edge or through self-supervised alignment learning; the similarity is computed from the vector dot product and the vector norms; the allowable alignment time offset range is determined jointly by the sensor response delay and the video sampling period; the alignment offset is searched within that range; the time offset attenuation coefficient controls how strongly offsets are penalized, with a smaller value placing stronger emphasis on synchronous occurrence; and a small positive number is included in the formula. By finding the position of highest consistency within the allowable offset range and attenuating it according to the offset, the formula allows the edge side to tolerate minor misalignments caused by asynchronous sampling across devices while still emphasizing the near-synchronous linkage that characterizes a single risk event.
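As a non-limiting sketch of this consistency score, the following assumes cosine similarity, an exponential attenuation of the form exp(-|offset| / decay), integer sample offsets, and clamping of negative similarity to zero. These choices and all names are assumptions for illustration only.

```python
import math


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def cosine(u, v, eps=1e-8):
    """Cosine similarity; eps is the small positive number guarding the denominator."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def cross_modal_consistency(behavior_seq, sensor_seq, t, transform,
                            max_offset, decay, eps=1e-8):
    """Best attenuated similarity between behavior at t and sensor data near t.

    A smaller `decay` attenuates offsets faster, i.e. emphasizes synchrony.
    The result starts at 0.0, so negative similarities are clamped to zero.
    """
    best = 0.0
    for offset in range(-max_offset, max_offset + 1):
        j = t + offset
        if not 0 <= j < len(sensor_seq):
            continue  # offset falls outside the recorded sensor window
        mapped = matvec(transform, sensor_seq[j])
        similarity = cosine(behavior_seq[t], mapped, eps)
        best = max(best, similarity * math.exp(-abs(offset) / decay))
    return best


identity = [[1.0, 0.0], [0.0, 1.0]]
behavior = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
sensor = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
shifted = cross_modal_consistency(behavior, sensor, 1, identity, max_offset=1, decay=2.0)
strict = cross_modal_consistency(behavior, sensor, 1, identity, max_offset=0, decay=2.0)
```

In this toy input the sensor response is shifted by one sample, so allowing an offset of one recovers an attenuated match, while a zero offset range finds no consistency.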
[0045] When the risk candidate conditions are met, key video frame data and continuous video segment data are generated, and sensor anomaly data segments with spatiotemporal correlation are extracted at the same time. The selection of key video frames is constrained by the local peak and the stability of the risk score. In the discriminant formula: the criticality score is computed for the key video frame data selected at a given time point; the multimodal risk score is as defined above; the window mean is the average risk score within a preset window centered on that time point; the window dispersion is the degree of dispersion of the risk score within the window; the window variance is the variance of the risk score over the window centered on that time point; the baseline variance, used for normalization, is obtained from statistics of historical normal periods; the weighting coefficients balance the individual terms; the heat map abnormal gradient intensity proxy is obtained from the proportion of high-gradient regions or the gradient peak value in the heat map, and is normalized by its own mean and dispersion; and a small positive number is included in the formula. By jointly requiring a high risk score, a significant rise relative to the window mean, little noise from local fluctuation, and a more pronounced thermal anomaly, the formula makes the selected key video frames more representative of the critical moments of a risk event.
[0046] Continuous video segment data is obtained by segmenting the risk score with a hysteresis threshold, which avoids the segment fragmentation caused by the risk score jittering near a single threshold. The segmentation criterion is as follows: the segment state is entered when the risk score satisfies the entry condition defined by a first preset threshold, and is exited when the risk score satisfies the exit condition defined by a second preset threshold, with a fixed relation between the two thresholds. Both preset thresholds are determined jointly by the false alarm rate constraint and the expected recall constraint at the edge node, so that the continuous video segment data covers the onset, development, and mitigation stages of the potential risk features.
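The hysteresis segmentation can be sketched as follows, assuming the usual convention that the entry threshold is not lower than the exit threshold, that a segment is entered when the score reaches the entry threshold, and that it is exited when the score falls below the exit threshold. The function and parameter names are hypothetical.

```python
def hysteresis_segments(scores, enter_threshold, exit_threshold):
    """Return inclusive (start, end) index pairs of segments in `scores`.

    A segment opens when a score reaches `enter_threshold` and closes at the
    last index before a score drops below `exit_threshold`.
    """
    if exit_threshold > enter_threshold:
        raise ValueError("exit_threshold must not exceed enter_threshold")
    segments = []
    start = None  # index where the current segment began, or None if idle
    for i, score in enumerate(scores):
        if start is None:
            if score >= enter_threshold:
                start = i
        elif score < exit_threshold:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1))  # still open at end of data
    return segments


scores = [0.1, 0.7, 0.55, 0.65, 0.5, 0.3, 0.2, 0.8]
with_hysteresis = hysteresis_segments(scores, 0.6, 0.4)
single_threshold = hysteresis_segments(scores, 0.6, 0.6)
```

With the two thresholds set apart, the jitter between 0.5 and 0.7 stays inside one segment covering indices 1 to 4; collapsing both thresholds to 0.6 splits the same event into separate fragments.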
[0047] Sensor anomaly data segments that have a spatiotemporal correlation with the key video frame data or the continuous video segment data are extracted according to a spatiotemporal correlation degree. In its calculation formula: the spatiotemporal correlation degree is defined between a video time point and a sensor time point; the cross-modal consistency score is as defined above; the video spatial location identifier corresponding to the video time point is obtained by mapping the area covered by the camera or the image region of the risk target; the sensor spatial location identifier corresponding to the sensor time point is determined by the sensor deployment point and its coverage area; the distance or adjacency cost between the two spatial location identifiers is calculated from the campus topological distance, the planar distance, or a regional adjacency matrix; the spatial attenuation coefficient controls the spatial penalty, with a smaller value placing stronger emphasis on correlation within the same or neighboring regions; the sensor anomaly intensity at the sensor time point is determined by a comprehensive score of the abnormal environmental parameter fluctuation features and is normalized by its mean and dispersion within a preset window; and a small positive number is included in the formula. Through the joint constraints of cross-modal consistency, spatial proximity, and sensor anomaly intensity, the formula ensures that the extracted sensor anomaly data segments are aligned in time with the video risk segments and in space with the area where the risk occurs, thus forming more interpretable risk scenario data to be analyzed.
[0048] Based on the above risk association screening process, the edge computing node finally outputs multiple risk scenario data to be analyzed. Among them, key video frame data is used to quickly present instantaneous evidence of risk, continuous video segment data is used to cover the risk evolution process, and sensor anomaly data segments are used to provide evidence of abnormal heat distribution characteristics and abnormal environmental parameter fluctuation characteristics. The consistency of the three in the same risk event is maintained through spatiotemporal correlation, so that the cloud analysis platform can further perform cross-modal semantic understanding and risk reasoning.
[0049] Step S103: Input multiple risk scenario data to be analyzed into the cloud analysis platform, and output the campus security risk warning results based on the cloud analysis platform.
[0050] Specifically, multiple risk scenario data output from the edge are aggregated to the cloud analysis platform, where a comprehensive assessment of the overall campus security situation is completed. By performing multimodal fusion and semantic understanding on the risk scenario data, the cloud analysis platform can identify different risk scenarios and their relationships within a unified spatiotemporal semantic framework. Combining risk reasoning and decision-making logic, it outputs corresponding campus security risk warning results, thereby achieving a transformation from local anomaly perception to global risk warning.
[0051] In one possible implementation, step S103 further includes the following. The cloud analysis platform includes a multimodal data fusion module, a campus safety-specific large model, a risk reasoning and decision-making engine, and an early warning information generation and distribution module. Inputting the multiple risk scenario data to be analyzed into the cloud analysis platform specifically includes: acquiring the multiple risk scenario data to be analyzed from different edge nodes, and aligning and fusing them in the time and space dimensions to obtain a unified scenario description tensor. Constructing the campus safety-specific large model specifically includes: training the campus safety-specific large model on campus security scenario data; labeling the campus security scenario data with multi-category semantic labels, which include fire risk labels, campus bullying risk labels, abnormal intrusion risk labels, and abnormal personnel gathering risk labels; and extracting cross-modal semantic information from the scenario description tensor based on the trained campus safety-specific large model. Constructing the risk reasoning and decision-making engine specifically includes: constructing a scenario-rule knowledge graph whose nodes are cross-modal semantic elements generated from the cross-modal semantic information through structured abstract representation, and whose edges represent the semantic relationships between those elements; and building the risk reasoning and decision-making engine on this graph. The cross-modal semantic information is input into the engine, which outputs a campus risk warning level. The warning level is then input into the early warning information generation and distribution module, which outputs target decision instructions.
Outputting the target decision instructions specifically includes: generating structured warning information based on the warning level, the structured warning information including the risk type, risk location, risk level, on-site imagery, and risk handling suggestions; and constructing target decision instructions from the structured warning information and sending them to the corresponding processing terminals according to a preset distribution strategy. The processing terminals include a security center screen, the mobile phones of on-duty personnel, and multi-system linkage control interfaces.
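A minimal sketch of the structured warning information and its distribution is given below. The field names, the level names, the terminal identifiers, and the level-to-terminal mapping standing in for the preset distribution strategy are all assumptions for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class StructuredWarning:
    """Structured warning information; field names are hypothetical."""
    risk_type: str
    location: str
    level: str
    scene_media: list = field(default_factory=list)  # key frames or clip references
    handling_suggestion: str = ""


# Hypothetical preset distribution strategy: warning level -> processing terminals.
DISTRIBUTION_STRATEGY = {
    "low": ["security_center_screen"],
    "medium": ["security_center_screen", "duty_phone"],
    "high": ["security_center_screen", "duty_phone", "linkage_control_interface"],
}


def build_decision_instructions(warning):
    """Build one target decision instruction per terminal for this warning."""
    terminals = DISTRIBUTION_STRATEGY.get(warning.level)
    if terminals is None:
        raise ValueError(f"no distribution strategy for level {warning.level!r}")
    payload = asdict(warning)
    return [{"terminal": terminal, "payload": payload} for terminal in terminals]


warning = StructuredWarning(
    risk_type="fire",
    location="laboratory area",
    level="high",
    scene_media=["keyframe_0001"],
    handling_suggestion="start evacuation and link the fire protection system",
)
instructions = build_decision_instructions(warning)
```

In this sketch a high-level warning reaches all three kinds of processing terminal, while lower levels reach fewer; an actual strategy could equally route by risk type or location.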
[0052] Specifically, when the multiple risk scenario data to be analyzed are acquired from different edge nodes, the key video frame data, continuous video segment data, and sensor anomaly data segments are uniformly encapsulated into event units that the cloud can parse, while the timestamps, spatial location identifiers, and multimodal risk scores output from the edge side are retained as metadata. This metadata enables the cloud analysis platform to organize multi-source segments under the same timeline and spatial semantic reference. To reduce alignment errors caused by cross-node clock drift and link jitter, offset estimation and correction are first performed on the timestamps of each edge node, and sliding matching is then performed on segments from different sources on the corrected timeline, thereby providing a consistent indexing system for subsequent alignment and fusion in the time and space dimensions.
[0053] When the multiple risk scenario data to be analyzed are aligned in the time and space dimensions, a time offset and a spatial mapping matrix are introduced to jointly determine the cross-source correspondence, and the optimal alignment offset is selected by the criterion of maximum consistency. In the formula for determining the alignment offset: the correction offset is that of a given edge node relative to the cloud reference time axis; the maximum alignment offset range allowed for the search is set according to the upper bound of edge node clock drift and the upper bound of network latency; the set of cloud reference times consists of candidate time points within a preset time window in the cloud, each element being one cloud reference time point; the original timestamp is that of the segment uploaded by the given edge node; the time decay coefficient controls how strongly time differences are penalized, with a smaller value placing stricter emphasis on synchronization; the abnormal behavior feature vector corresponds to the key video frame data or continuous video segment data; the abnormal environmental parameter fluctuation feature vector corresponds to the sensor anomaly data segments; the transformation matrix maps sensor features into a space comparable to that of behavioral features, is obtained through alignment training on the cloud or edge side, and is consistent with the definition given above; the similarity is computed from the inner product and the vector norms; and a small positive number prevents the denominator from being zero. Within the allowed offset range, the formula maximizes the accumulated cross-modal similarity weighted by temporal proximity, so that segments from different edge nodes form more stable same-event alignments in the cloud.
[0054] After alignment, when fusion is performed to obtain the unified scene description tensor, the standardized representations from different sources are first mapped into the same semantic embedding space, and the scene description tensor is then generated through spatiotemporal attention fusion. In the construction formula of the scene description tensor: the scene description tensor corresponds to a given time point; the modality index takes values corresponding to video, heat map, and sensor data, respectively; the set of adjacent or related time indices for that time point on the aligned timeline is determined by the corrected matching result; the attention weight of a given modality at a given time index with respect to the time point ranges from 0 to 1 and is normalized over all candidates; the linear mapping matrix of each modality maps the features of that modality into a space of the same dimension, and its values are learned through training in the cloud; the video feature vector is extracted from the key video frame data or continuous video segment data and obtained through formatting; the heat map feature vector is extracted from the corresponding segment of the heat map data; and the sensor feature vector is extracted from the sensor anomaly data segments.
[0055] The attention weight is driven by both cross-modal consistency and edge-side risk intensity, so that fusion focuses more on high-risk and cross-modally consistent segments. In the weighting formula: the similarity function is obtained from the normalized inner product or cosine similarity; the query vector at a given time point is generated from the previous-layer representation or from an initialization prior; the key vector of a given modality at a given time index is obtained through a nonlinear transformation; the weighting coefficients are learned through cloud training or tuned according to false alarm constraints; the multimodal risk score output from the edge side is defined as above; the cross-modal consistency score between a given modality and its complementary modality at a given time index is derived from, and consistent with, the cross-modal consistency score defined above; and the denominator is a normalization term that makes the weights of all candidate contributions sum to 1. By incorporating semantic relevance, edge-side risk intensity, and cross-modal consistency into the attention allocation at the same time, the formula enables the scene description tensor to represent real risk events more stably and suppresses the influence of single-source noise segments.
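As a non-limiting sketch, the attention weights can be computed as a softmax over logits that add the query-key similarity, the edge-side risk score, and the cross-modal consistency score. The additive logit form, the softmax normalization, and the names `beta_risk` and `beta_consistency` are assumptions; the paragraph only states that the three factors are combined and normalized to sum to 1.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def attention_weights(query, candidates, beta_risk, beta_consistency):
    """Softmax weights over candidates.

    Each candidate is a tuple (key_vector, edge_risk_score, consistency_score),
    one per (modality, time index) pair that contributes to the fused tensor.
    """
    if not candidates:
        return []
    logits = [
        cosine(query, key) + beta_risk * risk + beta_consistency * consistency
        for key, risk, consistency in candidates
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


query = [1.0, 0.0]
candidates = [
    ([1.0, 0.0], 0.9, 0.8),  # relevant, high risk, cross-modally consistent
    ([1.0, 0.0], 0.2, 0.1),  # equally relevant but weak edge-side evidence
    ([0.0, 1.0], 0.2, 0.1),  # semantically unrelated to the query
]
weights = attention_weights(query, candidates, beta_risk=2.0, beta_consistency=2.0)
```

With equal semantic relevance, the candidate carrying stronger edge-side risk and consistency evidence receives the larger weight, which is how the fusion comes to favor high-risk, consistent segments.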
[0056] When the campus safety-specific large model is trained on campus security scenario data, the scene description tensor and the multi-category semantic labels are used together for supervised learning, enabling the model to learn risk semantic boundaries from the fused multimodal representation. The training objective is a weighted binary cross-entropy over the multi-category semantic labels, with a cross-modal contrastive constraint superimposed to strengthen semantic alignment. In the expression for the training loss: the total training loss is the sum of the two parts; the set of multi-category semantic labels includes fire risk labels, campus bullying risk labels, abnormal intrusion risk labels, and abnormal personnel gathering risk labels; the category weight is set according to the degree of class sample imbalance or business priority; the label of a given category at a given time point takes the value 0 or 1; the predicted probability output by the campus safety-specific large model for that category ranges from 0 to 1; the weight of the contrastive constraint balances classification supervision against the alignment constraint; the set of positive sample pairs consists of pairs that represent the same risk event at different times or at different edge nodes; the semantic embedding vector is generated by the campus safety-specific large model for the corresponding input; the set of candidate embeddings is taken within the current batch; and the temperature coefficient controls separation, with a smaller value placing more emphasis on pulling close samples closer and pushing distant samples farther apart.
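For orientation only, one form consistent with the definitions above is a weighted binary cross-entropy plus an InfoNCE-style contrastive term. The symbols below ($\mathcal{C}$, $w_c$, $y_{t,c}$, $p_{t,c}$, $\lambda$, $\mathcal{P}$, $e_i$, $\mathcal{B}$, $\tau$) and the choice of the InfoNCE form are assumptions of this sketch and are not taken from the original formula.

```latex
\mathcal{L}
  = -\sum_{t}\sum_{c \in \mathcal{C}} w_c
      \Bigl[\, y_{t,c}\,\log p_{t,c} + (1 - y_{t,c})\,\log\bigl(1 - p_{t,c}\bigr) \Bigr]
    \;-\; \lambda \sum_{(i,j) \in \mathcal{P}}
      \log \frac{\exp\bigl(\operatorname{sim}(e_i, e_j)/\tau\bigr)}
                {\sum_{k \in \mathcal{B}} \exp\bigl(\operatorname{sim}(e_i, e_k)/\tau\bigr)}
```

Here $\mathcal{C}$ is the label set, $w_c$ the category weight, $y_{t,c} \in \{0, 1\}$ the label, $p_{t,c} \in (0, 1)$ the predicted probability, $\lambda$ the contrastive weight, $\mathcal{P}$ the set of positive pairs, $e_i$ a semantic embedding, $\mathcal{B}$ the in-batch candidate set, and $\tau$ the temperature coefficient.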
[0057] When cross-modal semantic information is extracted from the scene description tensor based on the trained campus safety-specific large model, the scene description tensor is mapped to a cross-modal semantic information vector, and the semantic confidence corresponding to the multi-category semantic labels is output. In the expression for generating the cross-modal semantic information: the cross-modal semantic information vector corresponds to a given time point; the encoding transformation is that applied by the campus safety-specific large model to a continuous scene description tensor sequence; the scene description tensor sequence runs from a start time point to an end time point, determined by the time span of the continuous video segment data; the pooling aggregation operator aggregates the sequence information and may be attention pooling or mean pooling; and the output projection matrix maps the aggregated representation into the cross-modal semantic information space, its values being obtained through training.
[0058] When the risk reasoning and decision-making engine is built, the cross-modal semantic information serves as the unified semantic carrier of the scene description tensor in the cloud. After semantic understanding is completed, it contains both the semantic confidence associated with the multi-category semantic labels and a sequential representation of risk evolution. To ensure that the subsequent knowledge graph can be stably reused and does not depend on the original vector dimension of any particular input, the cross-modal semantic information is first given a structured abstract representation: the continuous semantic embedding sequence is compressed into a finite set of cross-modal semantic elements, and each cross-modal semantic element is assigned a set of inferable attributes so that it forms a consistent evidence loop with the multimodal risk score, the cross-modal consistency score, and the attention weight, thereby achieving an interpretable mapping of "semantic elements, relationships, inference outputs" at the graph level.
[0059] When cross-modal semantic elements are generated from the cross-modal semantic information through structured abstract representation, the cross-modal semantic information is discretized in both the category semantic space and the evidence space to first form a set of semantic candidates, and the set of semantic candidates is then aggregated into the set of cross-modal semantic elements. The set of semantic candidates is generated by applying multi-head projection to the cross-modal semantic information and matching it by similarity against the prototypes of the multi-category semantic labels. In the expression: the semantic activation value of a given category at a given time point ranges from 0 to 1; a compression function is applied; the semantic prototype vector of a category is obtained by aggregating the embeddings of same-category samples in the training set or is learned as a model parameter; the category bias is learned during training; and the category index ranges over the multi-category semantic label set. By mapping the cross-modal semantic information onto the category semantic directions and outputting activation values, the formula enables the abstract representation to align directly with fire risk labels, campus bullying risk labels, abnormal intrusion risk labels, and abnormal personnel gathering risk labels without introducing new feature names.
[0060] To enable cross-modal semantic elements to carry three types of information at the same time, namely category semantics, strength of evidence, and consistency, evidence indicators from both the edge side and the fusion side are incorporated into the element attributes, and an attribute vector is constructed for each element. In its expression: the attribute vector of a cross-modal semantic element is jointly indexed by a time point and a category; a concatenation operation joins its components; the components are the cross-modal semantic information vector, the category semantic activation, the multimodal risk score (whose definition and value range remain as described above), the cross-modal consistency aggregate at the time point, which can be obtained as a weighted average of the consistency terms of the same type over the modality set, and the attention weight aggregate at the time point, which can be obtained as the maximum value or an entropy measure of the attention weights over the modality set and reflects how concentrated the fusion is. With the cross-modal semantic information vector and the category semantic activation as the semantic core, and the multimodal risk score, the consistency aggregate, and the attention aggregate as the evidence core, the cross-modal semantic elements maintain a direct association with the cross-modal semantic information in claim 4 and can be stably referenced by graph reasoning.
[0061] When the structured abstract representation generates the set of cross-modal semantic elements, the attribute vectors are clustered for consistency within a time window to form representative elements, which avoids instability caused by momentary jitter. The clustering objective minimizes intra-cluster dispersion and maximizes inter-cluster separability. In its expression: the representative attribute vector is that of a given cross-modal semantic element; the index set allocated to a given cluster is determined iteratively by the principle of maximizing similarity within the window; the feature metric matrix is used to emphasize the key dimensions of the attribute vector and can be set by training or by experience; a norm measures distances; the weight of the inter-cluster separation regularization term balances intra-cluster compactness against inter-cluster separation; and a small positive number is included in the formula. By compressing similar semantic and evidence patterns within a time window into stable elements, the formula makes the nodes of the subsequent scenario-rule knowledge graph reusable rather than transient data points.
[0062] When the scenario-rule knowledge graph is constructed, cross-modal semantic elements are used as nodes and organized into a graph whose node set is composed of the cross-modal semantic element set, with the representative attribute vectors taken as node features. The edges of the scenario-rule knowledge graph represent semantic relationships between cross-modal semantic elements. Their generation considers three constraints simultaneously: semantic similarity, temporal causality, and spatial consistency. This ensures that edges express not only "similarity" but also "coupling within the same event, evolutionary chain, or region." The edge weights of these semantic relationships can be determined in a uniform form. In that formula: the strength of the semantic association between two nodes ranges from 0 to 1; a compression function is applied; the weighting coefficients are obtained by calibration against historical early warning performance or by training; the inner product is taken between the representative attribute vectors of the two cross-modal semantic elements; the evidence-strength consistency gain for the risks corresponding to the two elements can be constructed as a weighted average or minimum of their respective risk scores, so as to emphasize jointly high risk; the cross-modal consistency gain is constructed as a weighted average or minimum of their respective consistency aggregates; and the time interval between the two elements on the aligned timeline is calculated as the absolute value of the difference of their timestamps.
[0063] When the semantic relationships need to reflect spatial constraints, a spatial consistency term is incorporated into the edge weights without changing the overall definition of the edges: the time interval term is expanded into a spatiotemporal composite cost. In its expression: the spatiotemporal composite cost combines a time term and a space term; the time term is the time interval; the space term is the distance or adjacency cost between the spatial location identifiers corresponding to the two elements, with the same meaning as the distance or adjacency cost described above except that it acts on element-level spatial identifiers; and a trade-off factor between the time cost and the space cost converts spatial inconsistency into a penalty that can be compared with the time interval on the same scale.
[0064] When the scenario-rule knowledge graph is introduced to build the risk reasoning and decision-making engine, the graph serves as the reasoning carrier. The cross-modal semantic information is mapped to node activation vectors on the graph, which are then propagated and aggregated on the graph to output the campus risk warning level. The node activation vector can be generated jointly from the semantic and evidence components of the element attributes, forming a starting point for reasoning that includes both semantic category bias and strength of evidence. In its initialization formula: the initial activation value of a node ranges from 0 to 1; a compression function is applied; the inference readout vector is learned through training to emphasize the element attribute dimensions that contribute to the early warning; the bias term is learned through training; and the representative attribute vector is that of the node.
[0065] When risk reasoning is performed on the graph, weighted propagation is carried out according to the edge weights of the semantic relationships, so that highly activated and strongly correlated elements form a risk evidence chain and raise the confidence of the corresponding risk level. The propagation update can take the form of gated weighted message passing. In its expression: the hidden representation of a node at a given layer can be initialized from the initial activation or from the representative attribute vector; the updated hidden representation is that of the next layer; a self-loop transformation matrix and a neighbor message transformation matrix are applied; the neighbor set of a node is determined by the semantic associations; the edge weight is as defined above; the gating network parameters, obtained through training, determine whether neighbor messages should be absorbed; a concatenation operation joins the representations fed to the gate; and a nonlinear compression function is applied. Through "edge weighting, gating screening, and hierarchical propagation," the formula aggregates local element evidence into a higher-level risk semantic structure, so that, for example, laboratory-related thermal anomalies and smoke-related semantic elements form stronger co-activations on the graph and drive higher-risk outputs, without specific rule statements having to be listed at the claim level.
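A single propagation layer of this kind can be sketched as follows. The sketch assumes a scalar sigmoid gate over the concatenation of the two node representations, tanh as the nonlinear compression function, and a directed neighbor list; these choices and all names are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def gated_propagation_step(hidden, neighbors, edge_weight,
                           w_self, w_neighbor, gate_w, gate_b):
    """One layer of gated, edge-weighted message passing.

    hidden:      dict node -> list of floats (current representation)
    neighbors:   dict node -> iterable of neighbor nodes
    edge_weight: dict (node, neighbor) -> semantic association strength in [0, 1]
    gate_w:      gate weights over the concatenation [h_node ; h_neighbor]
    """
    updated = {}
    for node, h_node in hidden.items():
        total = matvec(w_self, h_node)  # self-loop contribution
        for nb in neighbors.get(node, ()):
            h_nb = hidden[nb]
            gate_input = h_node + h_nb  # list concatenation
            gate = sigmoid(sum(w * x for w, x in zip(gate_w, gate_input)) + gate_b)
            scale = gate * edge_weight[(node, nb)]
            message = matvec(w_neighbor, h_nb)
            total = [t + scale * m for t, m in zip(total, message)]
        updated[node] = [math.tanh(t) for t in total]
    return updated


identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = {"heat": [1.0, 0.0], "smoke": [0.0, 1.0]}
neighbors = {"heat": ["smoke"]}
edge_weight = {("heat", "smoke"): 0.5}
out = gated_propagation_step(hidden, neighbors, edge_weight,
                             identity, identity, [0.0] * 4, 0.0)
```

In this toy graph the "heat" node absorbs part of the "smoke" node's representation, scaled by the gate value and the edge weight, while the "smoke" node, which has no listed neighbors, is only passed through the self-loop transformation.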
[0066] When the campus risk warning level is output, the propagated node representations are aggregated at the graph level and mapped to the warning level space. In the expression for the probability vector of the campus risk warning level: the probability vector covers all warning levels; a normalization function makes the probabilities of the levels sum to 1; the level mapping matrix and the bias term are obtained through training; the number of propagation layers is set according to the graph size and the inference depth; the attention weight of each node for graph-level output ranges from 0 to 1, is normalized over all nodes, and emphasizes the more critical cross-modal semantic elements; and the final-layer node representation is that obtained after the last propagation layer. Through "graph-level aggregation and level mapping," the formula compresses the network of cross-modal semantic elements into a single warning level output, so that the risk reasoning and decision-making engine stably generates campus risk warning levels under the constraints of the graph structure and naturally inherits the chain of evidence formed on the edge side and the fusion side by the aforementioned scores and consistency metrics.
[0067] To remain consistent with the strength of evidence on the edge side, the node attention weights may incorporate, at the same time, a term carrying the edge-side risk intensity, without changing the existing symbol definitions. In the expression: the node attention weight is the quantity being defined; the trade-off coefficient is obtained through training; the initial activation is that of the node; the attention readout vector is learned through training to highlight the directions of nodes that are more sensitive to level determination; and the final-layer node representation is as described above. By incorporating both the "initial activation" and the "post-propagation semantic structure contribution" into the weight normalization, the inference engine respects the high-risk evidence already formed on the edge side and the fusion side, and also uses graph propagation to discover implicit association chains between cross-modal semantic elements, thereby completing the structured inference from cross-modal semantic information to the campus risk warning level.
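The graph-level readout of paragraphs [0066] and [0067] can be sketched together. The sketch assumes softmax for both normalizations and an additive logit that combines the initial activation with a readout over the final-layer representation; the names `gamma`, `readout`, `level_matrix`, and `level_bias` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax; the outputs sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def node_attention(initial_activation, final_hidden, readout, gamma):
    """Node weights mixing initial activation and post-propagation structure."""
    logits = [gamma * a0 + dot(readout, h)
              for a0, h in zip(initial_activation, final_hidden)]
    return softmax(logits)


def warning_level_probabilities(final_hidden, alphas, level_matrix, level_bias):
    """Pool nodes with attention weights, then map to warning-level probabilities."""
    dim = len(final_hidden[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, final_hidden)) for d in range(dim)]
    logits = [dot(row, pooled) + b for row, b in zip(level_matrix, level_bias)]
    return softmax(logits)


final_hidden = [[0.9, 0.1], [0.2, 0.3]]   # two nodes, two hidden dimensions
initial_activation = [0.8, 0.1]           # edge-side evidence per node
alphas = node_attention(initial_activation, final_hidden, readout=[1.0, 0.0], gamma=2.0)
level_matrix = [[-2.0, 0.0], [0.0, 0.0], [2.0, 0.0]]  # rows: low, medium, high
probabilities = warning_level_probabilities(final_hidden, alphas, level_matrix, [0.0] * 3)
```

In this toy setting the strongly activated node dominates the pooled representation, and the assumed level mapping turns that into the largest probability for the highest level.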
[0068] After extracting cross-modal semantic information, this information is fed into the risk reasoning and decision-making engine as high-level semantic input. The engine uses a scenario-rule knowledge graph to perform associative reasoning and comprehensive judgment on the cross-modal semantic information, thereby outputting the corresponding campus risk warning level. This campus risk warning level characterizes the comprehensive judgment result of the current risk event in terms of its severity, development stage, and urgency of handling. Subsequently, the campus risk warning level is input into the warning information generation and distribution module. Under the constraints of the risk reasoning and decision-making engine's output, this module generates structured warning information. This structured warning information maps the risk reasoning results into content that can be directly understood and executed by people and the system. This includes risk types characterizing the risk semantic category, risk locations indicating the risk occurrence area, risk levels quantifying the severity of the risk, visual representations of the risk scene, and risk handling suggestions to guide subsequent responses. Based on this, target decision instructions are constructed using structured early warning information and sent to the corresponding processing terminals according to a preset distribution strategy. This enables different roles and systems to respond collaboratively based on the same risk perception framework. The processing terminals include the security center's large screen, the on-duty personnel's mobile phones, and multi-system linkage control interfaces.
[0069] Taking a safety risk in a laboratory area as an example, the risk scenario data to be analyzed output by the edge computing node is aligned and fused by the multimodal data fusion module in the cloud to form a scenario description tensor. The campus safety-specific large model performs semantic understanding on the scenario description tensor and outputs cross-modal semantic information containing the semantic tendency of fire risk. The risk reasoning and decision engine receives this cross-modal semantic information and combines it with the semantic association relationship related to the laboratory scenario in the scenario-rule knowledge graph to perform reasoning and output a higher level of campus risk warning. The warning information generation and distribution module generates structured warning information based on the campus risk warning level. The risk type corresponding to the structured warning information is fire risk, the risk location is the area where the laboratory is located, the risk level is high, the risk scene image is the corresponding key video frame or continuous video clip, and the risk handling suggestion is to start evacuation and link the fire protection system. Based on this, a target decision instruction is generated and sent to the security center's large screen for centralized display of the risk situation and pushed to the mobile phones of on-duty personnel for prompting manual handling. At the same time, the linkage response of relevant safety facilities is triggered through the multi-system linkage control interface, thereby realizing a complete closed loop from multi-scenario data collection, model analysis to risk warning and handling execution.
[0070] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with relevant regulations.
[0071] Referring to Figure 2, which shows a schematic diagram of a campus security analysis and early warning device based on edge pre-screening provided in an embodiment of this application, the device includes an acquisition module 21 and a processing module 22, wherein the acquisition module 21 is used to acquire multi-scenario data within the campus and send the multi-scenario data to the edge computing node.
[0072] The processing module 22 is used to perform initial screening of multi-scenario data based on edge computing nodes and output multiple risk scenario data to be analyzed; input the multiple risk scenario data to be analyzed into the cloud analysis platform and output the campus security risk warning results based on the cloud analysis platform.
[0073] In one possible implementation, the acquisition module 21 is used to acquire multi-scenario data within the campus, specifically including: constructing a multi-scenario data acquisition system, which includes data acquisition devices deployed in various areas of the campus; the data acquisition devices include network cameras, infrared thermal imagers, IoT sensors, and access control card swiping systems; acquiring multi-scenario data within the campus based on the multi-scenario data acquisition system; the multi-scenario data includes real-time video stream data, heat map data, campus environmental data, personnel identity information data, and personnel location information data.
[0074] In one possible implementation, the processing module 22 is used to perform initial screening of multi-scenario data based on edge computing nodes and output multiple risk scenario data to be analyzed, specifically including: constructing multiple edge computing nodes within a preset range of the multi-scenario data acquisition system; each edge computing node includes a lightweight risk screening model and a data preprocessing module; performing frame extraction, compression, noise reduction, and formatting on the multi-scenario data based on the lightweight risk screening model to generate a standardized data representation adapted to the lightweight risk screening model; performing risk correlation screening on the standardized multi-scenario data based on the data preprocessing module and obtaining multiple risk scenario data to be analyzed; the risk scenario data to be analyzed includes key video frame data containing potential risk characteristics, continuous video segment data, and sensor anomaly data segments that have a spatiotemporal correlation with the key video frame data or the continuous video segment data; the potential risk characteristics include abnormal behavior characteristics, abnormal heat distribution characteristics, and abnormal environmental parameter fluctuation characteristics.
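The risk correlation screening step can be illustrated with a minimal sketch. It assumes that the lightweight risk screening model has already assigned each key frame a risk score in [0, 1], and it treats a sensor anomaly segment as spatiotemporally correlated when it comes from the same area and falls within a time window around a retained frame. The class names, the score threshold, and the 30-second window are hypothetical and are not taken from this application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class KeyFrame:
    area: str          # campus area where the frame was captured
    timestamp: float   # capture time in seconds
    risk_score: float  # assumed output of the lightweight screening model, in [0, 1]


@dataclass
class SensorSegment:
    area: str    # campus area of the sensor
    start: float # segment start time in seconds
    end: float   # segment end time in seconds
    kind: str    # e.g. "temperature", "smoke"


def screen(
    frames: List[KeyFrame],
    segments: List[SensorSegment],
    score_threshold: float = 0.6,  # assumed threshold
    window_s: float = 30.0,        # assumed correlation window
) -> Tuple[List[KeyFrame], List[SensorSegment]]:
    """Keep risky frames and the sensor segments correlated with them."""
    kept_frames = [f for f in frames if f.risk_score >= score_threshold]
    kept_segments = [
        s for s in segments
        if any(
            s.area == f.area
            and s.start - window_s <= f.timestamp <= s.end + window_s
            for f in kept_frames
        )
    ]
    return kept_frames, kept_segments
```

Only the retained frames and their correlated segments would be uploaded, which is how pre-screening at the edge reduces the volume of data sent to the cloud analysis platform.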
[0075] In one possible implementation, the cloud analysis platform includes a multimodal data fusion module, a campus safety-specific large model, a risk reasoning and decision engine, and an early warning information generation and distribution module. The processing module 22 is used to input multiple risk scenario data to be analyzed into the cloud analysis platform, specifically including: acquiring multiple risk scenario data to be analyzed from different edge computing nodes, and aligning and fusing the multiple risk scenario data to be analyzed in the time and space dimensions to obtain a unified scenario description tensor; extracting cross-modal semantic information from the scenario description tensor based on the trained campus safety-specific large model; inputting the cross-modal semantic information into the risk reasoning and decision engine and outputting the campus risk early warning level; and inputting the campus risk early warning level into the early warning information generation and distribution module and outputting target decision instructions.
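One simple way to picture alignment and fusion in the time and space dimensions is to bin records from different edge nodes onto a shared grid of time bins, areas, and modalities. The sketch below is an assumption-laden illustration: the modality list, the fixed-width time bins, the averaging of values within a cell, and the nested-list tensor layout are all choices made for the example, not details disclosed in this application.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

# Assumed modality axis of the scenario description tensor.
MODALITIES = ("video", "thermal", "environment")

# One record: (timestamp in seconds, area name, modality name, scalar feature value).
Record = Tuple[float, str, str, float]


def fuse(
    records: Sequence[Record],
    areas: List[str],
    t0: float,
    bin_s: float,
    n_bins: int,
) -> List[List[List[float]]]:
    """Return tensor[time_bin][area][modality] holding the mean value per cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, area, modality, value in records:
        b = int((ts - t0) // bin_s)  # temporal alignment: map timestamp to a bin
        if 0 <= b < n_bins and area in areas and modality in MODALITIES:
            key = (b, areas.index(area), MODALITIES.index(modality))
            sums[key] += value
            counts[key] += 1
    tensor = []
    for b in range(n_bins):
        plane = []
        for a in range(len(areas)):
            cell = []
            for m in range(len(MODALITIES)):
                n = counts.get((b, a, m), 0)
                cell.append(sums[(b, a, m)] / n if n else 0.0)  # 0.0 marks an empty cell
            plane.append(cell)
        tensor.append(plane)
    return tensor
```

A practical system would presumably carry feature vectors rather than scalars in each cell; the scalar form is used here only to keep the alignment logic visible.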
[0076] In one possible implementation, the processing module 22 is used to construct a large-scale model specifically for campus security, which specifically includes: training the large-scale model for campus security based on campus security scenario data; the campus security scenario data is labeled with multi-category semantic tags, including fire risk tags, campus bullying risk tags, abnormal intrusion risk tags, and abnormal gathering of people risk tags.
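Because a single scene may carry more than one risk category, the multi-category semantic tags could be encoded as multi-hot vectors for training. The sketch below assumes this encoding and uses hypothetical short tag names for the four categories listed above; this application does not specify the label format.

```python
from typing import Iterable, List

# Hypothetical short names for the four tag categories named in the text.
LABELS = ("fire", "bullying", "intrusion", "crowd_gathering")


def encode_tags(tags: Iterable[str]) -> List[int]:
    """Convert a set of semantic tags into a multi-hot vector ordered as LABELS."""
    tag_set = set(tags)
    unknown = tag_set - set(LABELS)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return [1 if label in tag_set else 0 for label in LABELS]
```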
[0077] In one possible implementation, the processing module 22 is used to construct a risk reasoning and decision engine, specifically including: constructing a scenario-rule knowledge graph, wherein the nodes of the scenario-rule knowledge graph are cross-modal semantic elements, the cross-modal semantic elements are generated by cross-modal semantic information through structured abstract representation, and the edges of the scenario-rule knowledge graph are the semantic associations between cross-modal semantic elements; and introducing the scenario-rule knowledge graph to construct the risk reasoning and decision engine.
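A scenario-rule knowledge graph of this kind can be sketched as a set of weighted edges from cross-modal semantic elements to risk nodes, with the warning level derived from the associations that the observed elements activate. The element names, edge weights, additive scoring, and level thresholds below are assumptions chosen for illustration; this application does not disclose a specific reasoning rule.

```python
from typing import Dict, Set, Tuple

# Edges of the graph: (source element, target element) -> association weight.
# All names and weights are assumed values for the example.
EDGES: Dict[Tuple[str, str], float] = {
    ("open_flame", "fire_risk"): 0.6,
    ("abnormal_heat", "fire_risk"): 0.3,
    ("laboratory_scene", "fire_risk"): 0.2,
    ("crowd_gathering", "bullying_risk"): 0.3,
}


def infer_level(observed: Set[str], risk_node: str) -> Tuple[float, str]:
    """Sum the weights of edges from observed elements into risk_node and map to a level."""
    score = sum(
        weight
        for (src, dst), weight in EDGES.items()
        if dst == risk_node and src in observed
    )
    score = min(score, 1.0)  # cap the aggregated score at 1.0
    if score >= 0.8:         # assumed thresholds
        level = "high"
    elif score >= 0.4:
        level = "medium"
    else:
        level = "low"
    return score, level
```

In this sketch the laboratory scene element raises the fire risk score on its own edge, which mirrors how scenario context in the graph can lift the warning level for the same observed anomaly.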
[0078] In one possible implementation, the processing module 22 is used to input the campus risk warning level into the warning information generation and distribution module and output the target decision instruction. Specifically, it includes: generating structured warning information based on the campus risk warning level, the structured warning information including risk type, risk location, risk level, on-site risk image, and risk handling suggestions; constructing the target decision instruction based on the structured warning information and sending the target decision instruction to the corresponding processing terminal based on a preset distribution strategy; the processing terminal includes a security center screen, on-duty personnel's mobile phone, and a multi-system linkage control interface.
[0079] It should be noted that the above embodiments of the apparatus are only illustrated by the division of the above functional modules. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided in the above embodiments belong to the same concept, and the specific implementation process can be found in the method embodiments, which will not be repeated here.
[0080] This application also provides an electronic device. Referring to Figure 3, Figure 3 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. The electronic device may include: at least one processor 301, at least one communication bus 302, a user interface 303, at least one network interface 304, and a memory 305.
[0081] The communication bus 302 is used to enable communication between these components.
[0082] The user interface 303 may include a display screen and a camera. Optionally, the user interface 303 may also include a standard wired interface and a wireless interface.
[0083] The network interface 304 may optionally include a standard wired interface or a wireless interface (such as a Wi-Fi interface).
[0084] The processor 301 may include one or more processing cores. The processor 301 connects to various parts of the server using various interfaces and lines, and performs various server functions and processes data by running or executing instructions, programs, code sets, or instruction sets stored in memory 305, and by calling data stored in memory 305. Optionally, the processor 301 may be implemented using at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 301 may integrate one or a combination of several of the following: Central Processing Unit (CPU), Graphics Processing Unit (GPU), and modem. The CPU primarily handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content required for display; and the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 301 and may be implemented as a separate chip.
[0085] The memory 305 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 305 may include a non-transitory computer-readable storage medium. The memory 305 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 305 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as touch function, sound playback function, image playback function, etc.), instructions for implementing the above-described method embodiments, etc.; the data storage area may store data involved in the above-described method embodiments, etc. Optionally, the memory 305 may also be at least one storage device located remotely from the aforementioned processor 301. As shown in Figure 3, the memory 305, which serves as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a campus security analysis and early warning application based on edge pre-screening.
[0086] In the electronic device shown in Figure 3, the user interface 303 is primarily used to provide an input interface for the user and acquire user input data, while the processor 301 can be used to call the campus security analysis and early warning application based on edge pre-screening stored in the memory 305. When the application is executed by one or more processors 301, the electronic device performs one or more of the methods described in the above embodiments. It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, because according to this application, some steps can be performed in other orders or simultaneously. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily essential to this application.
[0087] This application also provides a non-transitory computer-readable storage medium storing instructions. When executed by one or more processors, these instructions cause an electronic device to perform one or more of the methods described in the above embodiments.
[0088] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.
[0089] In the various embodiments provided in this application, it should be understood that the disclosed apparatus can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of units is only a logical functional division, and other divisions are possible in actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between apparatuses or units through some service interface, and may be electrical or take other forms.
[0090] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0091] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0092] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a memory and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned memory includes various media capable of storing program code, such as USB flash drives, portable hard drives, magnetic disks, or optical disks.
[0093] The above description is merely an exemplary embodiment disclosed in this application and should not be construed as limiting the scope of this application. Any equivalent changes and modifications made in accordance with the teachings of this application shall still fall within the scope of this application.
[0094] This application is intended to cover any variations, uses, or adaptations disclosed herein that follow the general principles disclosed herein and include common knowledge or customary technical means in the art that are not described in this application.
Claims
1. A campus security analysis and early warning method based on edge pre-screening, characterized in that, the method includes: acquiring multi-scenario data within the campus and sending the multi-scenario data to the edge computing node; performing initial screening of the multi-scenario data based on the edge computing node and outputting multiple risk scenario data to be analyzed; and inputting the multiple risk scenario data to be analyzed into the cloud analysis platform and outputting the campus security risk warning results based on the cloud analysis platform.
2. The method according to claim 1, characterized in that, the acquisition of multi-scenario data within the campus specifically includes: constructing a multi-scenario data acquisition system, which includes data acquisition devices deployed in various areas of the campus; the data acquisition devices include network cameras, infrared thermal imagers, IoT sensors, and access control card swiping systems; and acquiring multi-scenario data within the campus based on the multi-scenario data acquisition system; the multi-scenario data includes real-time video stream data, heat map data, campus environment data, personnel identity information data, and personnel location information data.
3. The method according to claim 2, characterized in that, the initial screening of the multi-scenario data based on the edge computing node, and the output of multiple risk scenario data to be analyzed, specifically includes: constructing multiple edge computing nodes within a preset range of the multi-scenario data acquisition system; each edge computing node includes a lightweight risk screening model and a data preprocessing module; based on the lightweight risk screening model, performing frame extraction, compression, noise reduction, and formatting on the multi-scenario data to generate a standardized data representation adapted to the lightweight risk screening model; and based on the data preprocessing module, performing risk association screening on the standardized multi-scenario data and obtaining multiple risk scenario data to be analyzed; the risk scenario data to be analyzed includes key video frame data containing potential risk characteristics, continuous video segment data, and sensor abnormal data segments that have a spatiotemporal correlation with the key video frame data or the continuous video segment data; the potential risk characteristics include abnormal behavior characteristics, abnormal heat distribution characteristics, and abnormal environmental parameter fluctuation characteristics.
4. The method according to claim 1, characterized in that, the cloud analysis platform includes a multimodal data fusion module, a campus security-specific large model, a risk reasoning and decision engine, and an early warning information generation and distribution module; the input of multiple risk scenario data to be analyzed into the cloud analysis platform specifically includes: acquiring multiple risk scenario data to be analyzed from different edge computing nodes, and aligning and fusing the multiple risk scenario data to be analyzed in the time and space dimensions to obtain a unified scenario description tensor; based on the trained campus security-specific large model, extracting cross-modal semantic information from the scenario description tensor; inputting the cross-modal semantic information into the risk reasoning and decision engine and outputting the campus risk warning level; and inputting the campus risk warning level into the early warning information generation and distribution module and outputting the target decision instruction.
5. The method according to claim 4, characterized in that, the construction of the campus security-specific large model specifically includes: training the campus security-specific large model based on campus security scenario data; the campus security scenario data is labeled with multi-category semantic tags, including fire risk tags, campus bullying risk tags, abnormal intrusion risk tags, and abnormal gathering of people risk tags.
6. The method according to claim 4, characterized in that, The construction of the risk reasoning and decision-making engine specifically includes: A scenario-rule knowledge graph is constructed, wherein the nodes of the scenario-rule knowledge graph are cross-modal semantic elements, the cross-modal semantic elements are generated by the cross-modal semantic information through structured abstract representation, and the edges of the scenario-rule knowledge graph are the semantic associations between the cross-modal semantic elements; The scenario-rule knowledge graph is introduced to construct the risk reasoning and decision-making engine.
7. The method according to claim 4, characterized in that, The step of inputting the campus risk warning level into the warning information generation and distribution module and outputting the target decision instruction specifically includes: Based on the campus risk warning level, structured warning information is generated, which includes risk type, risk location, risk level, on-site images of the risk, and risk handling suggestions. Based on the structured early warning information, a target decision instruction is constructed, and the target decision instruction is sent to the corresponding processing terminal based on a preset distribution strategy; the processing terminal includes a security center screen, a duty officer's mobile phone, and a multi-system linkage control interface.
8. A campus security analysis and early warning device based on edge pre-screening, characterized in that, the device includes an acquisition module and a processing module, wherein the acquisition module is used to acquire multi-scenario data within the campus and send the multi-scenario data to the edge computing node; and the processing module is used to perform initial screening of the multi-scenario data based on the edge computing node and output multiple risk scenario data to be analyzed, and to input the multiple risk scenario data to be analyzed into the cloud analysis platform and output the campus security risk warning results based on the cloud analysis platform.
9. An electronic device, characterized in that, The device includes a processor, a communication bus, a user interface, a network interface, and a memory. The memory is used to store instructions. The user interface and the network interface are used to communicate with other devices. The processor is used to execute the instructions stored in the memory to cause the electronic device to perform the method as described in any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium, characterized in that, The non-transitory computer-readable storage medium stores instructions that, when executed, perform the method as described in any one of claims 1 to 7.