Construction dynamic risk real-time early warning method and system based on artificial intelligence multi-source perception
By integrating multi-source sensing data and heterogeneous graph neural networks, the problems of single sensing dimension and delayed early warning in construction site safety monitoring systems have been solved, enabling all-weather, high-precision dynamic risk early warning for construction sites and improving the safety management capabilities of construction sites.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HANGZHOU SHANGCHENG DISTRICT URBAN CONSTRUCTION INVESTMENT GROUP CO LTD
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-19
AI Technical Summary
Existing construction site safety monitoring systems suffer from problems such as limited perception dimensions, insufficient dynamic interaction and understanding, and delayed early warnings, making it difficult to achieve intelligent safety management across the entire process and all elements.
An AI-based multi-source perception method is adopted to construct a four-dimensional spatiotemporal data cube by acquiring two-dimensional visual images, three-dimensional laser point clouds and IoT environmental sensing data. The semantic features and spatial location features of the construction site entities are fused using a heterogeneous graph neural network model to calculate the dynamic risk potential field distribution and generate multi-level early warning instructions.
It achieves all-weather, high-precision 3D perception, deeply understands the complex interactive relationships at the construction site, and can proactively provide dynamic risk warnings, reduce false alarm rates, and improve the safety management level of the construction site.
Figure CN122243200A_ABST
Description
Technical Field
[0001] This invention relates to the field of artificial intelligence and smart building safety technology, and in particular to a method and system for real-time early warning of dynamic construction risks based on multi-source perception using artificial intelligence.
Background Technology
[0002] With the continuous expansion of modern construction projects and the increasing complexity of construction technologies, safety management at construction sites faces unprecedented challenges. Construction sites are typically highly dynamic and unstructured environments, characterized by high personnel mobility, frequent cross-operation of large machinery, and constantly changing temporary facilities, coupled with environmental disturbances such as noise, dust, and inclement weather. Statistics show that the accident rate in the construction industry has long ranked among the highest of all industrial sectors, with falls from heights, falling objects, machinery injuries, and collapses being particularly prominent. How to achieve intelligent safety monitoring of the entire construction process and all its elements has become a core issue that urgently needs to be addressed in the construction of smart construction sites.
[0003] Traditional construction safety monitoring mainly relies on manual inspections and fixed-point video surveillance. However, these traditional methods have significant limitations: First, manual inspections have significant blind spots and time lags. Limited by manpower and inspection frequency, managers cannot achieve real-time, all-weather, and comprehensive supervision. Furthermore, the quality of inspections is easily affected by the subjective experience, sense of responsibility, and fatigue of managers, making it difficult to promptly detect latent and sudden safety hazards.
[0004] Secondly, traditional video surveillance systems are mostly "passive" recording systems, primarily used for post-event review rather than pre-event warning. Monitoring footage relies on human monitoring, but human attention cannot maintain high concentration for extended periods, making it easy to miss critical risk signals. Although specific behavior recognition technologies based on computer vision have emerged in recent years, such as helmet detection and restricted area intrusion detection, these technologies are typically based on a single visual modality and are susceptible to adverse environmental conditions such as changes in lighting, object obstruction, and rain or fog, resulting in high rates of missed or false alarms and significantly reducing their effectiveness in practical engineering applications.
[0005] Third, existing technologies lack a comprehensive understanding of construction sites and are deficient in deep semantic analysis capabilities. Current safety early warning systems are often discrete and based on simple, pre-defined rules, such as triggering an alarm when a person enters an electronic fence. This "black and white" logic fails to capture the dynamic risks arising from the complex interactions between people, machines, and the environment at construction sites. For example, a worker standing within the coverage area of a tower crane may face low risk if the crane is stationary, but the risk increases dramatically if it is lifting heavy objects and moving towards the worker. Similarly, a worker at the edge of a deep foundation pit may face manageable risks while walking normally, but the risk can be drastically different if ground subsidence occurs or the worker becomes unconscious. Existing technologies struggle to capture these spatiotemporal dynamic correlations and multi-factor coupling effects, making it impossible to accurately predict and quantify potential risks such as collisions, object strikes, and collapses.
[0006] Fourth, the degree of multi-source data fusion is low, and the data silo effect is severe. While existing construction sites may simultaneously deploy various devices such as video surveillance, environmental sensors, and positioning tags, these systems often operate independently, lacking effective correlation and collaborative analysis between the data. Visual data provides semantic information but lacks precise 3D location; laser point clouds provide high-precision geometric information but lack semantic labels; and environmental sensors provide physical parameters but cannot be associated with specific entities. This fragmented data situation severely restricts the system's ability to comprehensively assess complex and risky scenarios.
[0007] In summary, there is an urgent need for a technical solution that can integrate multi-source sensing data, deeply understand the semantics of construction scenarios, and quantify and predict dynamic interactive risks in real time, so as to overcome the limitations of existing technologies and improve the inherent safety level and intelligent management capabilities of construction sites.
Summary of the Invention
[0008] In order to overcome the above-mentioned defects of the prior art, the present invention provides a method and system for real-time early warning of construction dynamic risks based on artificial intelligence multi-source perception, so as to solve the problems existing in the background art.
[0009] This invention provides the following technical solution: a method for real-time early warning of construction dynamic risks based on artificial intelligence multi-source perception, comprising the following steps:
S100: Acquire multi-source heterogeneous sensing data from the construction site, including two-dimensional visual image data, three-dimensional laser point cloud data, and IoT environmental sensing data;
S200: Perform time synchronization and spatial registration processing on the multi-source heterogeneous sensing data to construct a unified four-dimensional spatiotemporal data cube;
S300: Input the four-dimensional spatiotemporal data cube into the heterogeneous graph neural network model to extract the semantic features and spatial location features of each construction entity in the scene and construct a dynamic semantic scene graph;
S400: Based on the dynamic semantic scene graph, fuse the historical trajectory features and current state features of each construction entity using a multi-head attention mechanism to calculate the dynamic risk potential field distribution at the construction site;
S500: Based on the dynamic risk potential field distribution, calculate the real-time risk value of each construction entity through the risk quantification function, and generate corresponding multi-level early warning instructions according to the real-time risk value.
[0010] Furthermore, the step of performing time synchronization and spatial registration processing on the multi-source heterogeneous sensing data includes: The Kalman filter algorithm is used to perform time interpolation on sensor data with different acquisition frequencies to achieve microsecond-level timestamp synchronization. A global world coordinate system is constructed, and a calibration matrix is used to uniformly map the pixel coordinates of the two-dimensional visual image data and the point cloud coordinates of the three-dimensional laser point cloud data to the global world coordinate system. The mapping formula is as follows:
$$\begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} = R^{-1}\left( Z_c K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} - T \right)$$
where $(X_w, Y_w, Z_w)$ represents the global world coordinates, $(u, v)$ represents the pixel coordinates, $Z_c$ is the depth information, $K$ is the camera intrinsic parameter matrix, and $R$ and $T$ are the rotation matrix and the translation vector, respectively.
[0011] Furthermore, in the step of constructing the dynamic semantic scene graph, the dynamic semantic scene graph is defined as $G_t = (V_t, E_t)$, where $V_t$ is the set of construction entity nodes at time $t$ and $E_t$ is the set of interaction edges between entities. The node feature update formula of the heterogeneous graph neural network model is:
$$h_i^{(l+1)} = \sigma\left( \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right)$$
where $h_i^{(l)}$ is the feature vector of node $i$ at layer $l$, $\mathcal{R}$ is the set of relation types, $\mathcal{N}_i^r$ is the set of neighboring nodes of node $i$ under relation $r$, $c_{i,r}$ is the normalization constant, $W_r^{(l)}$ and $W_0^{(l)}$ are learnable weight matrices, and $\sigma$ is the activation function.
[0012] Furthermore, the step of calculating the dynamic risk potential field distribution at the construction site includes: defining a static risk potential field $U_s$ and a dynamic risk potential field $U_d$. The static risk potential field is generated by unprotected edges and openings, electrified areas, and restricted areas; the dynamic risk potential field is generated by mobile machinery, equipment, and personnel. The calculation formula for the total risk potential field $U_{total}$ is: [equation image not preserved in source] where $p$ is a spatial location point, $p_i$ is the location of the $i$-th static hazard source, $k_i$ is its hazard level coefficient, $p_j(t)$ is the location of the $j$-th dynamic entity at time $t$, $v_j$ is its velocity vector, $\theta_j$ is the angle between the velocity direction and the position vector difference, and the remaining coefficients are field strength modulation coefficients.
[0013] Furthermore, the risk quantification function integrates the collision risk probability and the unsafe state of the environment, and its expression is:
$$R = \omega_1 P_{col} + \omega_2 E_{env} + \omega_3 B_{act}$$
where $P_{col}$ is the collision probability based on trajectory prediction, $E_{env}$ is the environmental insecurity index based on environmental sensor data, $B_{act}$ is the confidence level for unsafe behaviors based on skeletal keypoint recognition, and $\omega_1$, $\omega_2$, $\omega_3$ are dynamically adaptive weighting coefficients.
[0014] Furthermore, the collision probability based on trajectory prediction, $P_{col}$, is computed from trajectories that a Long Short-Term Memory (LSTM) network derives for the future $K$ time steps. The calculation formula includes:
$$P_{col} = \frac{1}{1 + \exp\left(-\lambda \left(d_{safe} - d_{ij}(t+k)\right)\right)}$$
where $d_{ij}(t+k)$ is the Euclidean distance between entity $i$ and entity $j$ at the predicted time $t+k$, $d_{safe}$ is the safe distance threshold, and $\lambda$ is the sensitivity coefficient.
[0015] A real-time early warning system for dynamic construction risks based on artificial intelligence and multi-source perception includes:
- a multi-source sensing module, used to collect visual images, laser point clouds, and environmental parameter data at the construction site;
- a data fusion engine, used to map the multi-source data to the global coordinate system using a spatiotemporal alignment algorithm, and to perform data cleaning and completion;
- a scene graph construction module, used to identify construction entities using heterogeneous graph neural networks and construct a dynamic semantic scene graph that includes entity attributes and spatial relationships;
- a risk calculation and decision-making module, used to calculate the total risk potential field based on the dynamic semantic scene graph and output the warning level through the risk quantification function;
- a real-time interactive terminal, used to receive the warning level and provide immediate feedback via an audible and visual alarm or a handheld terminal.
[0016] Furthermore, the scene graph construction module specifically includes:
- a 3D target detection unit, used to process laser point cloud data and generate three-dimensional bounding boxes of objects;
- a 2D semantic segmentation unit, used to process visual images and extract pixel-level category information of objects;
- a feature association unit, used to project 3D bounding boxes onto 2D images and fuse visual and spatial features through the intersection-over-union (IoU) matching algorithm to generate semantically rich node feature vectors.
[0017] An electronic device includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the construction dynamic risk real-time early warning method based on artificial intelligence multi-source perception as described in any one of the above.
[0018] A computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the real-time early warning method for construction dynamic risks based on multi-source perception using artificial intelligence, as described in any one of the preceding descriptions.
[0019] The technical effects and advantages of this invention are as follows: Multi-dimensional perception fusion: It breaks through the limitations of traditional single visual monitoring, which is greatly affected by ambient light and obstruction. Through the complementarity of LiDAR and environmental sensors, it realizes all-weather, high-precision three-dimensional perception.
[0020] Deep semantic understanding: By introducing heterogeneous graph neural networks, the recognition of isolated objects is improved to the understanding of the interaction relationships between entities in the scene, which can identify complex logical risks such as "human-machine mixed operation" and "unauthorized entry into dangerous areas".
[0021] Forward-looking dynamic early warning: Based on the artificial potential field method and trajectory prediction model, it realizes the leap from "alarm" to "early warning", which can judge the trend of collision or accident several seconds in advance, and buy valuable time for on-site personnel to avoid danger.
[0022] Adaptive risk quantification: The risk calculation formula with dynamic weights enables the system to adapt to the safety management needs of different construction stages and different weather conditions, reducing the false alarm rate and improving the system's practicality.
Attached Figure Description
[0023] Figure 1 is a flowchart illustrating a method for real-time early warning of construction dynamic risks based on multi-source perception using artificial intelligence, provided in an embodiment of this application;
Figure 2 is a schematic diagram illustrating the principle of multi-source data spatiotemporal alignment and fusion, provided in an embodiment of this application;
Figure 3 is a diagram illustrating the architecture for constructing dynamic scene graphs based on heterogeneous graph neural networks, provided in an embodiment of this application;
Figure 4 is a structural block diagram of the construction dynamic risk real-time early warning system based on artificial intelligence multi-source perception, provided in an embodiment of this application.
Detailed Implementation
[0024] The technical solution of the present invention will be described in detail below. This embodiment is only used to explain the present invention and is not intended to limit the scope of protection of the present invention.
[0025] Firstly, this application provides a method for real-time early warning of dynamic construction risks based on multi-source perception using artificial intelligence, including:
S100: Acquire multi-source heterogeneous sensing data from the construction site, including two-dimensional visual image data, three-dimensional laser point cloud data, and IoT environmental sensing data;
S200: Perform time synchronization and spatial registration processing on the multi-source heterogeneous sensing data to construct a unified four-dimensional spatiotemporal data cube;
S300: Input the four-dimensional spatiotemporal data cube into the heterogeneous graph neural network model to extract the semantic features and spatial location features of each construction entity in the scene and construct a dynamic semantic scene graph;
S400: Based on the dynamic semantic scene graph, use a multi-head attention mechanism to fuse the historical trajectory features and current state features of each construction entity to calculate the dynamic risk potential field distribution at the construction site;
S500: Based on the dynamic risk potential field distribution, calculate the real-time risk value of each construction entity through a risk quantification function, and generate corresponding multi-level early warning instructions based on the real-time risk value.
[0026] In the above implementation process, the concepts of "four-dimensional spatiotemporal data cube" and "dynamic semantic scene graph" were introduced. By deeply fusing the rich texture information of vision, the high-precision spatial geometric information of LiDAR, and the physical state information of environmental sensors, the limitations of single modality are overcome. Heterogeneous Graph Neural Network (HGNN) can explicitly model the complex topological relationships between different types of entities (such as workers, excavators, and deep foundation pits) in a construction site, allowing the system not only to "see" objects but also to "understand" the interactions between them.
[0027] Furthermore, the steps for time synchronization and spatial registration of multi-source heterogeneous sensing data include: The Kalman filter algorithm is used to perform time interpolation on sensor data with different acquisition frequencies to achieve microsecond-level timestamp synchronization. A global world coordinate system is constructed, and a calibration matrix is used to uniformly map the pixel coordinates of the 2D visual image data and the point cloud coordinates of the 3D laser point cloud data to the global world coordinate system. The mapping formula is as follows:
$$\begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} = R^{-1}\left( Z_c K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} - T \right)$$
where $(X_w, Y_w, Z_w)$ represents the global world coordinates, $(u, v)$ represents the pixel coordinates, $Z_c$ is the depth information, $K$ is the camera intrinsic parameter matrix, and $R$ and $T$ are the rotation matrix and the translation vector, respectively.
[0028] In the above implementation process, accurate spatiotemporal registration is the foundation of multi-source fusion. Because the sampling frequencies of the camera and LiDAR are inconsistent (e.g., camera 25Hz, LiDAR 10Hz), direct fusion would lead to data misalignment. Kalman filtering is used for temporal interpolation and smoothing, ensuring strict correspondence of data at the same moment. Simultaneously, a rigorous coordinate mapping formula ensures that pixels in the image accurately correspond to three-dimensional coordinates in physical space, providing a geometric benchmark for subsequent distance calculations and spatial relationship inference.
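The pixel-to-world mapping described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the applicant's implementation: it assumes the extrinsics map world coordinates into the camera frame ($P_{cam} = R P_w + T$), and the function names and calibration values are hypothetical.

```python
import numpy as np

def pixel_to_world(u, v, z_c, K, R, T):
    """Back-project pixel (u, v) with depth z_c into world coordinates.

    Assumption: extrinsics map world to camera, P_cam = R @ P_world + T.
    """
    pixel_h = np.array([u, v, 1.0])
    p_cam = z_c * np.linalg.solve(K, pixel_h)  # Z_c * K^-1 [u, v, 1]^T
    return R.T @ (p_cam - T)                   # R^-1 equals R^T for a rotation

def world_to_pixel(p_world, K, R, T):
    """Forward projection, used here only to check the round trip."""
    p_cam = R @ p_world + T
    uvw = K @ p_cam
    return uvw[0] / uvw[2], uvw[1] / uvw[2], p_cam[2]

# Hypothetical calibration: 1000 px focal length, principal point (640, 360),
# camera axes aligned with the world, world origin 2 m in front of the camera.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([0.0, 0.0, 2.0])

p_world = pixel_to_world(740.0, 360.0, 10.0, K, R, T)
u_back, v_back, depth_back = world_to_pixel(p_world, K, R, T)
```

Projecting the recovered world point back through the same calibration returns the original pixel and depth, which is a quick sanity check on any calibration pair before it is used for distance calculations.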
[0029] Furthermore, in the step of constructing a dynamic semantic scene graph, the dynamic semantic scene graph is defined as $G_t = (V_t, E_t)$, where $V_t$ is the set of construction entity nodes at time $t$ and $E_t$ is the set of interaction edges between entities. The node feature update formula for the heterogeneous graph neural network model is:
$$h_i^{(l+1)} = \sigma\left( \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right)$$
where $h_i^{(l)}$ is the feature vector of node $i$ at layer $l$, $\mathcal{R}$ is the set of relation types, $\mathcal{N}_i^r$ is the set of neighboring nodes of node $i$ under relation $r$, $c_{i,r}$ is the normalization constant, $W_r^{(l)}$ and $W_0^{(l)}$ are learnable weight matrices, and $\sigma$ is the activation function.
[0030] In the above implementation process, traditional convolutional neural networks (CNNs) excel at processing Euclidean space data (such as images), but perform poorly when processing unstructured relational data. A construction site is essentially a complex relational network composed of people, machines, and objects. This formula updates the features of the current node by aggregating information from neighboring nodes, enabling it to capture high-order semantic risk features such as "workers approaching a running excavator."
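The neighbor aggregation described above can be illustrated with one relation-aware message-passing step. The sketch below is an assumption-laden toy: it takes the normalization constant to be the neighbor count per relation, uses ReLU as the activation, and the relation names, weights, and node features are hypothetical.

```python
import numpy as np

def relational_update(h, edges_by_relation, W_rel, W_self):
    """One relation-aware message-passing step over a small scene graph.

    h: (N, D_in) node features.
    edges_by_relation: dict relation -> list of (source j, target i) pairs.
    W_rel: dict relation -> (D_out, D_in) weight matrix.
    W_self: (D_out, D_in) self-connection weight matrix.
    """
    n = h.shape[0]
    out = h @ W_self.T                       # self term W_0 h_i
    for rel, edges in edges_by_relation.items():
        counts = np.zeros(n)                 # assumed c_{i,r}: neighbor count
        for _, dst in edges:
            counts[dst] += 1
        for src, dst in edges:
            out[dst] += (W_rel[rel] @ h[src]) / counts[dst]
    return np.maximum(out, 0.0)              # ReLU, an assumed activation

# Hypothetical graph: node 0 = worker, node 1 = excavator, node 2 = pit.
h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
edges = {"near": [(1, 0), (2, 0)],           # excavator and pit are near the worker
         "operates": [(0, 1)]}               # worker operates the excavator
W_rel = {"near": 2.0 * np.eye(2), "operates": np.eye(2)}
W_self = np.eye(2)

h_next = relational_update(h, edges, W_rel, W_self)
```

After one step the worker node has absorbed features from both the excavator and the pit, which is the mechanism by which a node can come to encode "worker approaching a running excavator" rather than just "worker".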
[0031] Furthermore, define a static risk potential field $U_s$ and a dynamic risk potential field $U_d$. The static risk potential field is generated by unprotected edges and openings, electrified areas, and restricted areas; the dynamic risk potential field is generated by mobile machinery, equipment, and personnel. The calculation formula for the total risk potential field $U_{total}$ is: [equation image not preserved in source] where $p$ is a spatial location point, $p_i$ is the location of the $i$-th static hazard source, $k_i$ is its hazard level coefficient, $p_j(t)$ is the location of the $j$-th dynamic entity at time $t$, $v_j$ is its velocity vector, $\theta_j$ is the angle between the velocity direction and the position vector difference, and the remaining coefficients are field strength modulation coefficients.
[0032] In the above implementation process, the concept of "Artificial Potential Field" is introduced to quantify the risk field. Unlike traditional binary alarms (dangerous / not dangerous), the potential field method constructs a continuously changing risk gradient field. The static potential field reflects inherent hazards (such as deep pits), while the dynamic potential field reflects the risks brought by mobile equipment. Furthermore, the formula specifically considers the angle θ between the direction of the velocity vector and the position vector difference, which means that if the mechanical equipment is moving at high speed toward a certain area, the potential field strength in that area will increase sharply, thus achieving a forward-looking trend warning.
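To make the potential field idea concrete, the sketch below evaluates a total field as the sum of a static and a dynamic part. The functional forms are assumptions chosen only to match the qualitative description (decay with distance, growth with speed, exponential dependence on the heading angle); they are not the patent's formula, and `alpha`, `beta`, `eta` are hypothetical modulation coefficients.

```python
import numpy as np

def total_potential(p, static_sources, dynamic_entities,
                    alpha=0.5, beta=1.0, eta=1.0):
    """Evaluate an illustrative risk potential at point p.

    static_sources: list of (position, hazard_coefficient k_i).
    dynamic_entities: list of (position, velocity vector).
    Assumed forms: static term k_i * exp(-alpha * distance);
    dynamic term eta * speed * exp(beta * cos(theta)) * exp(-alpha * distance).
    """
    p = np.asarray(p, dtype=float)
    u_static = 0.0
    for p_i, k_i in static_sources:
        dist = np.linalg.norm(p - np.asarray(p_i, dtype=float))
        u_static += k_i * np.exp(-alpha * dist)
    u_dynamic = 0.0
    for p_j, v_j in dynamic_entities:
        v_j = np.asarray(v_j, dtype=float)
        diff = p - np.asarray(p_j, dtype=float)
        dist = np.linalg.norm(diff)
        speed = np.linalg.norm(v_j)
        if dist == 0.0 or speed == 0.0:
            cos_theta = 0.0                  # heading undefined, no directional gain
        else:
            cos_theta = float(np.dot(v_j, diff) / (speed * dist))
        u_dynamic += eta * speed * np.exp(beta * cos_theta) * np.exp(-alpha * dist)
    return u_static + u_dynamic

# Hypothetical scene: a pit edge at (0, 10) and an excavator at the origin
# moving along +x at 2 m/s.
pit = [((0.0, 10.0), 1.0)]
excavator = [((0.0, 0.0), (2.0, 0.0))]

u_ahead = total_potential((5.0, 0.0), pit, excavator)    # in the path of travel
u_behind = total_potential((-5.0, 0.0), pit, excavator)  # same distance, behind
u_near_pit = total_potential((0.0, 9.0), pit, [])
u_far_pit = total_potential((0.0, 5.0), pit, [])
```

Two points at the same distance from the excavator receive different potentials depending on whether the machine is heading toward or away from them, which is the forward-looking behaviour the paragraph describes. Evaluating the function over a grid would yield the heat-map style overlay.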
[0033] Furthermore, the risk quantification function integrates the collision risk probability and the unsafe state of the environment, and its expression is:
$$R = \omega_1 P_{col} + \omega_2 E_{env} + \omega_3 B_{act}$$
where $P_{col}$ is the collision probability based on trajectory prediction, $E_{env}$ is the environmental insecurity index based on environmental sensor data, $B_{act}$ is the confidence level for unsafe behaviors based on skeletal keypoint recognition, and $\omega_1$, $\omega_2$, $\omega_3$ are dynamically adaptive weighting coefficients.
[0034] In the above implementation process, the formula realizes a comprehensive risk scoring mechanism. The collision probability term $P_{col}$ addresses the possibility of collisions in physical space. The environmental term $E_{env}$ addresses environmental factors (such as gas concentration and ground vibration). The behavior term $B_{act}$ addresses unsafe human behaviors (such as smoking and not wearing protective equipment). The dynamic adaptive adjustment of weights allows the system to flexibly adjust the early warning strategy according to different construction stages (such as focusing on collapse prevention during the earthwork stage and fire prevention during the decoration stage).
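A weighted combination of the three factors with stage-dependent weights might look like the following. The weight values, the stage names used as dictionary keys, and the normalization step are illustrative assumptions; the source does not specify how the weights are adapted.

```python
def risk_score(p_col, e_env, b_act, weights):
    """Combine collision, environment, and behavior terms into one score.

    weights: three non-negative numbers; they are normalized to sum to 1
    (an assumption made here so scores stay in [0, 1] for inputs in [0, 1]).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    w1, w2, w3 = (w / total for w in weights)
    return w1 * p_col + w2 * e_env + w3 * b_act

# Hypothetical stage profiles: earthwork emphasizes the environment term
# (e.g. pit displacement), decoration emphasizes the behavior term.
STAGE_WEIGHTS = {
    "earthwork": (3, 5, 2),
    "decoration": (2, 3, 5),
}

score = risk_score(0.8, 0.2, 0.1, (5, 3, 2))
earthwork_score = risk_score(0.2, 0.9, 0.1, STAGE_WEIGHTS["earthwork"])
decoration_score = risk_score(0.2, 0.9, 0.1, STAGE_WEIGHTS["decoration"])
```

The same sensor readings produce a higher score under the earthwork profile than under the decoration profile, showing how a stage-aware weighting shifts the early warning strategy without changing the underlying measurements.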
[0035] Furthermore, the collision probability based on trajectory prediction, $P_{col}$, is computed from trajectories that a Long Short-Term Memory (LSTM) network derives for the future $K$ time steps. The calculation formula includes:
$$P_{col} = \frac{1}{1 + \exp\left(-\lambda \left(d_{safe} - d_{ij}(t+k)\right)\right)}$$
where $d_{ij}(t+k)$ is the Euclidean distance between entity $i$ and entity $j$ at the predicted time $t+k$, $d_{safe}$ is the safe distance threshold, and $\lambda$ is the sensitivity coefficient.
[0036] In the above implementation process, combining LSTM for time series prediction gives the system a "predictive" capability. The system not only calculates the current distance but also extrapolates the trajectory within the next few seconds. The Sigmoid function maps the distance difference to a probability value between 0 and 1, and the λ parameter controls the sensitivity of risk to changes in distance, simulating the human psychological perception curve of approaching danger.
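The Sigmoid mapping from predicted distance to collision probability can be sketched as follows. The predicted trajectories are supplied directly as arrays here; in the described system they would come from the LSTM predictor. Taking the maximum over the horizon as the overall probability is an assumption, as are the function name and the numbers.

```python
import numpy as np

def collision_probabilities(traj_i, traj_j, d_safe, lam):
    """Per-step collision probability from two predicted trajectories.

    traj_i, traj_j: (K, 2) predicted positions over K future steps.
    Returns sigmoid(lam * (d_safe - d_ij)) for each step.
    """
    traj_i = np.asarray(traj_i, dtype=float)
    traj_j = np.asarray(traj_j, dtype=float)
    dist = np.linalg.norm(traj_i - traj_j, axis=1)
    return 1.0 / (1.0 + np.exp(-lam * (d_safe - dist)))

# Hypothetical prediction: a stationary worker and a crane hook closing in.
worker = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
hook = [(10.0, 0.0), (6.0, 0.0), (3.0, 0.0)]

p_steps = collision_probabilities(worker, hook, d_safe=3.0, lam=1.0)
p_col = float(p_steps.max())   # assumed aggregation over the horizon
```

When the predicted distance equals the safe threshold the probability is exactly 0.5, and a larger `lam` makes the transition around that threshold steeper.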
[0037] Secondly, this application provides a real-time early warning system for construction dynamic risks based on multi-source perception using artificial intelligence, comprising: a multi-source perception module for collecting visual images, laser point clouds, and environmental parameter data from the construction site; a data fusion engine for mapping multi-source data to a global coordinate system using a spatiotemporal alignment algorithm, and performing data cleaning and completion; a scene graph construction module for identifying construction entities using a heterogeneous graph neural network and constructing a dynamic semantic scene graph containing entity attributes and spatial relationships; a risk calculation and decision-making module for calculating the total risk potential field based on the dynamic semantic scene graph and outputting the early warning level through a risk quantification function; and a real-time interactive terminal for receiving the early warning level and providing immediate feedback through an audible and visual alarm or a handheld terminal.
[0038] Thirdly, this application provides an electronic device comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the method as described in any of the first aspects.
[0039] Fourthly, this application provides a computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the method as described in any of the first aspects.
[0040] The technical solutions in the embodiments of this application will be described in detail below with reference to the accompanying drawings.
[0041] This application addresses the core pain points of existing construction safety monitoring technologies, such as limited perception dimensions, insufficient dynamic interaction understanding, and delayed early warning, by proposing a novel real-time early warning scheme for dynamic construction risks based on artificial intelligence and multi-source perception. This scheme utilizes deep learning and multi-sensor fusion technology to construct a digital twin dynamic risk perception field.
[0042] Please refer to Figure 1, which is a flowchart illustrating a real-time early warning method for construction dynamic risks based on multi-source perception using artificial intelligence, provided in an embodiment of this application. The method includes the following steps:
S100: Acquire multi-source heterogeneous sensing data from the construction site.
[0043] For example, multimodal sensor arrays are deployed at key nodes on the construction site (such as tower crane vantage points, entrances / exits, and the edges of deep foundation pits). These sensor arrays include, but are not limited to:
1. High-resolution industrial cameras: acquire RGB visual images for identifying personnel, worn protective equipment, fine-grained actions, and posted safety signs.
2. 3D LiDAR: acquires high-precision point cloud data for constructing the 3D geometry of the site and accurately measuring distances, volumes, and spatial positions between objects. It is unaffected by lighting conditions, offering significant advantages in nighttime construction scenarios.
3. Internet of Things (IoT) environmental sensors: include PM2.5 / PM10 dust sensors, noise sensors, anemometers, foundation pit displacement sensors, and combustible gas detectors.
4. Ultra-wideband (UWB) positioning tags: integrated into workers' smart safety helmets, providing personnel ID information and auxiliary positioning coordinates.
[0044] S200: Perform time synchronization and spatial registration processing on the multi-source heterogeneous sensing data to construct a unified four-dimensional spatiotemporal data cube.
[0045] For example, the fusion of multi-source data first faces the problem of inconsistent spatiotemporal references.
Time synchronization: IEEE 1588 PTP (Precision Time Protocol) is used to synchronize the clocks of all networked sensors. For non-networked sensors, hard synchronization is achieved using FPGA hardware trigger signals. At the software level, a Kalman filter is used as an interpolator. Assume that data needs to be fused at time $t$, while the LiDAR data was captured at $t_L$ and the camera data at $t_C$. The motion model of the object is then used to predict the state values of each sensor at time $t$, eliminating motion artifacts caused by the time deviation.
Spatial registration: Establish a unified World Coordinate System (WCS) for the construction site. Obtain the extrinsic parameter matrices (rotation matrix $R$ and translation vector $T$) of the camera and LiDAR by calibrating against a target sphere. For any point $P_L = (X_L, Y_L, Z_L)^T$ in the point cloud, its projection coordinates $(u, v)$ on the image plane are calculated using the following formula:
$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \left( R P_L + T \right)$$
where $K$ is the camera intrinsic parameter matrix and $Z_c$ is the depth of the point in the camera frame. Through this hard association, semantic information in the image (such as "this is a worker not wearing a safety harness") can be assigned to the corresponding laser point cloud cluster, thereby obtaining a semantically meaningful 3D point cloud.
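The software-level alignment step, predicting each sensor's state forward to a common fusion time, corresponds to the prediction half of a Kalman filter. The sketch below assumes a one-dimensional constant-velocity motion model with a white-noise acceleration process model; the timestamps, states, and noise level `q` are hypothetical, and the measurement update step is omitted.

```python
import numpy as np

def predict_to_time(x, P, dt, q=0.1):
    """Propagate state [position, velocity] and covariance P forward by dt.

    Constant-velocity model (an assumption); q scales the process noise.
    """
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])
    Q = q * np.array([[dt**3 / 3.0, dt**2 / 2.0],
                      [dt**2 / 2.0, dt]])
    return F @ x, F @ P @ F.T + Q

# Hypothetical case: fuse at t = 1.00 s; last LiDAR frame at 0.95 s,
# last camera frame at 0.98 s, both tracking an object moving at 2 m/s.
t_fuse = 1.00
P0 = 0.01 * np.eye(2)

x_lidar, P_lidar = predict_to_time(np.array([10.00, 2.0]), P0, t_fuse - 0.95)
x_camera, P_camera = predict_to_time(np.array([10.06, 2.0]), P0, t_fuse - 0.98)
```

Both sensors' estimates land on the same position at the fusion time even though they were captured 30 ms apart, and the covariance grows more for the older LiDAR sample, which lets a subsequent fusion step weight the fresher measurement more heavily.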
[0046] S300: Input the four-dimensional spatiotemporal data cube into the heterogeneous graph neural network model to extract the semantic features and spatial location features of each construction entity in the scene and construct a dynamic semantic scene graph.
[0047] For example, this is the core intelligent processing step of this application. Traditional detection algorithms (such as YOLO) can only output isolated detection boxes, while this application constructs a scene graph.
Entity recognition: Point clouds are processed using 3D detection networks (such as PointPillars), and images are processed using 2D detection networks (such as Faster R-CNN). The results are then fused to obtain entity nodes. Node attributes include category (person, crane, pit), 3D dimensions, velocity vector, and attitude information.
Relationship extraction: Edges are constructed between entity nodes. Edge types include geometric relationships ("above", "5 meters away") and semantic interaction relationships ("operating", "gazing", "carrying").
Graph neural network inference: The constructed initial graph is input into a Heterogeneous Graph Neural Network (HGNN). The HGNN allows nodes to exchange information through message passing. For example, if a "worker" node is connected to an "uncovered hole" node and the distance feature on the connecting edge is less than 1 meter, then after aggregation by the graph convolutional layer, the dimension representing the danger level in the feature vector of the "worker" node will significantly increase. In the node feature update formula, the relation-specific weight matrix $W_r^{(l)}$ learns different weights for different relation types (such as human-machine relations and human-environment relations), reflecting a differentiated focus on different risk sources.
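The fusion of 3D and 2D detections into entity nodes can be done by intersection-over-union (IoU) matching once the 3D boxes have been projected onto the image, as the feature association unit describes. The sketch below uses a simple greedy assignment, which is an assumption; the box coordinates and the 0.5 threshold are hypothetical.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def match_boxes(projected_3d, detected_2d, threshold=0.5):
    """Greedily pair each projected 3D box with its best unused 2D box."""
    matches = {}
    used = set()
    for i, box_3d in enumerate(projected_3d):
        best_j, best_score = None, threshold
        for j, box_2d in enumerate(detected_2d):
            if j in used:
                continue
            score = iou(box_3d, box_2d)
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches

# Hypothetical boxes in pixel coordinates.
projected = [(0, 0, 10, 10), (20, 20, 30, 30)]
detected = [(21, 21, 31, 31), (1, 1, 11, 11)]

pairs = match_boxes(projected, detected)
```

Each matched pair lets the 2D detection's semantic label be attached to the 3D box's geometry. A production system would more likely use an optimal assignment such as the Hungarian algorithm instead of this greedy pass.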
[0048] S400: Based on the dynamic semantic scene graph, use a multi-head attention mechanism to fuse the historical trajectory features and current state features of each construction entity to calculate the dynamic risk potential field distribution at the construction site.
[0049] Predicting risk requires considering not only the present state but also past motion and likely future motion. Trajectory encoding: An LSTM or Transformer encoder extracts the trajectory feature vector of each moving entity over the past 5 seconds. Risk potential field construction: This application creatively introduces the concept of a "potential field" from physics. Static potential field: Generated by hazardous areas (such as high-voltage power lines or unprotected edges), it manifests as a repulsive field centered on the hazard source; the closer the distance, the higher the potential energy.
[0050] In the formula, the distance-dependent term describes this spatial decay characteristic. Dynamic potential field: Generated by moving machinery, it depends not only on distance but also on the direction of velocity. The direction term in the formula depends on $\theta$, the angle between the velocity direction and the position vector difference. It indicates that if the velocity of a mechanical arm points toward a certain area ($\cos\theta > 0$), the risk potential in that area increases exponentially; if it points away from the area ($\cos\theta < 0$), the risk potential decreases. The resulting total risk potential field is a scalar field covering the entire site, which can be overlaid on the monitoring screen as a heat map.
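The static and dynamic potential fields of [0049] and [0050] can be sketched as follows. The functional forms are assumptions: exponential decay with distance and an `exp(gamma * cos_theta)` direction factor are chosen only because they match the qualitative behavior described (higher potential nearer the source, higher in the direction of motion). The names `static_potential`, `dynamic_potential`, `total_potential` and the coefficient values are hypothetical.

```python
import numpy as np

def static_potential(p, source, k, alpha):
    """Repulsive potential of a fixed hazard; assumed to decay exponentially."""
    dist = np.linalg.norm(np.asarray(p, float) - np.asarray(source, float))
    return k * np.exp(-alpha * dist)

def dynamic_potential(p, pos, vel, beta, alpha, gamma):
    """Potential of a moving entity; larger where its velocity points."""
    d = np.asarray(p, float) - np.asarray(pos, float)
    vel = np.asarray(vel, float)
    dist, speed = np.linalg.norm(d), np.linalg.norm(vel)
    # cos of the angle between the velocity and the vector from entity to p
    cos_theta = 0.0 if dist == 0.0 or speed == 0.0 else float(d @ vel) / (dist * speed)
    return beta * np.exp(-alpha * dist) * np.exp(gamma * cos_theta)

def total_potential(p, static_sources, dynamic_entities, alpha=0.5, gamma=1.0):
    """Scalar risk potential at point p: sum over all static and dynamic sources."""
    u = sum(static_potential(p, src, k, alpha) for src, k in static_sources)
    u += sum(dynamic_potential(p, pos, vel, beta, alpha, gamma)
             for pos, vel, beta in dynamic_entities)
    return float(u)

hole = ((0.0, 5.0), 2.0)                   # (location, hazard level coefficient)
excavator = ((0.0, 0.0), (1.0, 0.0), 1.0)  # (location, velocity, strength)

ahead = total_potential((2.0, 0.0), [hole], [excavator])    # in the path of motion
behind = total_potential((-2.0, 0.0), [hole], [excavator])  # same distance, behind
```

Both sample points are equally far from the hole and the excavator, so the difference between `ahead` and `behind` comes only from the velocity direction. Evaluating `total_potential` over a grid of points would yield the scalar field that is rendered as a heat map.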
[0051] S500: Based on the dynamic risk potential field distribution, calculate the real-time risk value of each construction entity through a risk quantification function, and generate corresponding multi-level early warning instructions based on the real-time risk value.
[0052] For example, risk quantification is no longer a single-dimensional process.
[0053] The formula $R = w_1 P_{coll} + w_2 E_{env} + w_3 B_{unsafe}$ reflects a comprehensive consideration of multiple factors, where $w_1$, $w_2$, and $w_3$ are weighting coefficients.
$P_{coll}$ (collision probability): The probability that the distance between entities will fall below a safe threshold, calculated from predicted future trajectories. Using the Sigmoid function makes the probability change smoothly around the threshold, which conforms to how people perceive risk.
$E_{env}$ (environmental conditions): For example, when the wind speed sensor detects a wind force greater than level 6, this value spikes, and the risk threshold for tower crane operations is automatically lowered.
$B_{unsafe}$ (behavioral unsafety factor): Through skeletal key point detection (pose estimation), if the worker's posture is detected as "lying down" (potentially fainting) or "climbing over a railing", this term scores high.
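The risk quantification described in [0052] and [0053] can be sketched as a weighted combination of three scores, with the collision score obtained from a Sigmoid of the predicted distance. This is an illustration under assumptions: the weighted-sum form, the fixed weights (the application describes dynamically adaptive weights), the sensitivity value, and the names `collision_probability` and `risk_value` are hypothetical.

```python
import math

def collision_probability(distance, d_safe, lam):
    """Sigmoid of the margin to the safe distance.

    Returns 0.5 exactly at the threshold, approaches 1 as the predicted
    distance falls below d_safe and 0 as it grows above it.
    """
    x = min(lam * (distance - d_safe), 700.0)  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(x))

def risk_value(p_coll, e_env, b_unsafe, weights=(0.5, 0.2, 0.3)):
    """Weighted sum of collision, environment, and behavior scores in [0, 1]."""
    w1, w2, w3 = weights
    return w1 * p_coll + w2 * e_env + w3 * b_unsafe

# Predicted worker-to-machine distance of 1 m against a 3 m safe threshold.
near = collision_probability(1.0, d_safe=3.0, lam=2.0)
far = collision_probability(5.0, d_safe=3.0, lam=2.0)

# Hypothetical scene: close approach, moderate wind, worker climbing a railing.
risk = risk_value(near, e_env=0.2, b_unsafe=0.9)
```

Because the weights sum to 1 and each input lies in [0, 1], the resulting risk value also lies in [0, 1], which makes it straightforward to compare against fixed warning thresholds.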
[0054] Decision-making logic:
Level 1 warning (red): Immediately trigger an audible and visual alarm and automatically cut off power to the relevant equipment (via the PLC interface).
Level 2 warning (orange): Broadcast an on-site warning and push a notification to safety personnel's handheld terminals.
Level 3 warning (yellow): Log the event and report it at the daily morning meeting as safety education material.
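The three-level decision logic of [0054] can be expressed as a simple threshold mapping. The application does not state the numeric thresholds, so the values 0.8, 0.5, and 0.3 below are placeholders, and the action identifiers are hypothetical labels for the responses listed above rather than real device commands.

```python
from enum import Enum

class WarningLevel(Enum):
    NONE = 0
    RED = 1      # Level 1
    ORANGE = 2   # Level 2
    YELLOW = 3   # Level 3

# Hypothetical action identifiers for the responses named in the text.
ACTIONS = {
    WarningLevel.RED: ["audible_visual_alarm", "plc_power_cutoff"],
    WarningLevel.ORANGE: ["site_broadcast", "handheld_push"],
    WarningLevel.YELLOW: ["log_for_morning_meeting"],
    WarningLevel.NONE: [],
}

def classify(risk, red=0.8, orange=0.5, yellow=0.3):
    """Map a real-time risk value in [0, 1] to a warning level.

    Thresholds are illustrative assumptions, checked from most to least severe.
    """
    if risk >= red:
        return WarningLevel.RED
    if risk >= orange:
        return WarningLevel.ORANGE
    if risk >= yellow:
        return WarningLevel.YELLOW
    return WarningLevel.NONE

level = classify(0.86)
actions = ACTIONS[level]
```

Keeping the thresholds as parameters allows them to be lowered under adverse conditions, in line with the earlier remark that the tower crane risk threshold is reduced in high wind.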
[0055] Example 2: System Architecture
Referring to Figure 3, the system architecture provided in this application relies on an edge computing gateway at the hardware level. Because the network environment at construction sites is unstable and the volume of video data is massive, uploading all data to the cloud for processing would cause unacceptable latency. This system therefore adopts a "device-edge-cloud" collaborative architecture:
Device side: Sensors are responsible for data collection.
Edge side: Edge computing boxes based on NVIDIA Jetson or Huawei Atlas run lightweight detection models and part of the graph neural network inference, undertaking the core computing for S100-S400 and keeping early warning latency below 100 ms.
Cloud side: Responsible for model training, updates, storage of historical big data, and macro-level safety situation analysis across construction sites.
[0056] Example 3: Application Scenarios
Scenario 1: Tower crane collision avoidance and blind spot warning. A downward-facing camera and LiDAR are installed on the tower crane's boom. When the hook enters the blind spot, the system constructs a dynamic scene graph and calculates in real time the potential field overlap between the hook (dynamic node) and ground personnel (dynamic node) or buildings (static node). If an intersection is predicted, the system sends a signal directly to the tower crane's cab to lock the slewing or luffing mechanism.
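The "intersection predicted" check in Scenario 1 can be illustrated with a simplified closest-approach test. This is a stand-in, not the method of the application: it replaces the learned trajectory prediction and potential field overlap with constant-velocity extrapolation, and the names `min_predicted_distance` and `should_lock`, the 5-second horizon, and the 2-meter safe distance are assumptions.

```python
import numpy as np

def min_predicted_distance(p_a, v_a, p_b, v_b, horizon):
    """Smallest distance between two entities within `horizon` seconds,
    assuming both keep their current velocity. Returns (distance, time)."""
    dp = np.asarray(p_b, float) - np.asarray(p_a, float)
    dv = np.asarray(v_b, float) - np.asarray(v_a, float)
    denom = float(dv @ dv)
    if denom == 0.0:
        t_star = 0.0  # no relative motion: the distance never changes
    else:
        # unconstrained minimizer of |dp + dv * t|, clipped to [0, horizon]
        t_star = float(np.clip(-float(dp @ dv) / denom, 0.0, horizon))
    return float(np.linalg.norm(dp + dv * t_star)), t_star

def should_lock(p_hook, v_hook, p_worker, v_worker, horizon=5.0, d_safe=2.0):
    """True if the hook is predicted to come within d_safe of the worker."""
    dist, _ = min_predicted_distance(p_hook, v_hook, p_worker, v_worker, horizon)
    return dist < d_safe

# Hook swinging at 1 m/s along x; a stationary worker 5 m ahead, 1 m to the side.
dist, t = min_predicted_distance((0.0, 0.0), (1.0, 0.0), (5.0, 1.0), (0.0, 0.0), 5.0)
lock = should_lock((0.0, 0.0), (1.0, 0.0), (5.0, 1.0), (0.0, 0.0))
```

In this example the hook passes within 1 m of the worker after 5 s, so the check would request locking the slewing or luffing mechanism.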
[0057] Scenario 2: Safety monitoring of deep foundation pit operations. The system integrates displacement sensor data and visual images. If the sensors detect minute deformations and the visual model detects water accumulation or cracks in the support structure at the bottom of the foundation pit, the multimodal fusion engine outputs a high-level geological disaster warning and triggers the broadcast system to notify the workers at the bottom of the pit to evacuate immediately.
[0058] Electronic device and storage medium
This application also provides an electronic device, including a processor, a memory, and a communication bus. The memory stores a computer program executable by the processor, which, when executed, implements the method of any of the above embodiments. This electronic device can be an edge computing server at a construction site or a cloud server cluster.
[0059] This application also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements all the steps of the above-described method.
[0060] The above are merely specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A method for real-time early warning of construction dynamic risks based on multi-source perception using artificial intelligence, characterized in that it comprises the following steps:
S100: Acquire multi-source heterogeneous sensing data from the construction site, including two-dimensional visual image data, three-dimensional laser point cloud data, and IoT environmental sensing data;
S200: Perform time synchronization and spatial registration processing on the multi-source heterogeneous sensing data to construct a unified four-dimensional spatiotemporal data cube;
S300: Input the four-dimensional spatiotemporal data cube into the heterogeneous graph neural network model to extract the semantic features and spatial location features of each construction entity in the scene and construct a dynamic semantic scene graph;
S400: Based on the dynamic semantic scene graph, fuse the historical trajectory features and current state features of each construction entity using a multi-head attention mechanism to calculate the dynamic risk potential field distribution at the construction site;
S500: Based on the dynamic risk potential field distribution, calculate the real-time risk value of each construction entity through the risk quantification function, and generate corresponding multi-level early warning instructions according to the real-time risk value.
2. The method for real-time early warning of construction dynamic risks based on multi-source perception using artificial intelligence as described in claim 1, characterized in that the step of constructing the dynamic semantic scene graph includes:
using the Kalman filter algorithm to perform time interpolation on sensor data with different acquisition frequencies to achieve microsecond-level timestamp synchronization;
constructing a global world coordinate system, and using a calibration matrix to uniformly map the pixel coordinates of the two-dimensional visual image data and the point cloud coordinates of the three-dimensional laser point cloud data to the global world coordinate system, the mapping formula being:
$$\begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} = R \left( Z_c \, K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \right) + T$$
where $(X_w, Y_w, Z_w)$ represents the global world coordinates, $(u, v)$ represents the pixel coordinates, $Z_c$ is the depth information, $K$ is the camera intrinsic parameter matrix, and $R$ and $T$ are the rotation matrix and the translation vector, respectively.
3. The method for real-time early warning of construction dynamic risks based on multi-source perception using artificial intelligence as described in claim 1, characterized in that, in the step of constructing a dynamic semantic scene graph, the dynamic semantic scene graph is defined as $G_t = (V_t, E_t)$, where $V_t$ is the set of construction entity nodes at time $t$ and $E_t$ is the set of interaction edges between entities; the node feature update formula of the heterogeneous graph neural network model is:
$$h_i^{(l+1)} = \sigma\left(\sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right)$$
where $h_i^{(l)}$ is the feature vector of node $i$ at layer $l$, $\mathcal{R}$ is the set of relation types, $\mathcal{N}_i^r$ is the set of neighboring nodes of node $i$ under relation $r$, $c_{i,r}$ is the normalization constant, $W_r^{(l)}$ and $W_0^{(l)}$ are learnable weight matrices, and $\sigma$ is the activation function.
4. The method for real-time early warning of construction dynamic risks based on multi-source perception using artificial intelligence as described in claim 1 or 3, characterized in that the step of calculating the dynamic risk potential field distribution at the construction site includes: defining a static risk potential field and a dynamic risk potential field, wherein the static risk potential field is generated by edge openings, electrified areas, and restricted areas, and the dynamic risk potential field is generated by mobile machinery, equipment, and personnel; the total risk potential field is calculated by a formula [not reproduced in this text] whose terms are: the spatial location point; the location of each static hazard source and its hazard level coefficient; the location of each dynamic entity at the given time and its velocity vector; the angle between the velocity direction and the position vector difference; and the field strength modulation coefficients.
5. The method for real-time early warning of construction dynamic risks based on multi-source perception using artificial intelligence as described in claim 1, characterized in that the risk quantification function integrates the collision risk probability and the unsafe state of the environment, and its expression is:
$$R = w_1 P_{coll} + w_2 E_{env} + w_3 B_{unsafe}$$
where $P_{coll}$ is the collision probability based on trajectory prediction, $E_{env}$ is the environmental insecurity index based on environmental sensor data, $B_{unsafe}$ is the confidence level for unsafe behaviors based on skeletal keypoint recognition, and $w_1$, $w_2$, and $w_3$ are dynamically adaptive weighting coefficients.
6. The method for real-time early warning of construction dynamic risks based on multi-source perception using artificial intelligence as described in claim 5, characterized in that the collision probability $P_{coll}$ based on trajectory prediction is computed by using a Long Short-Term Memory (LSTM) network to derive trajectories over future time steps, and the calculation formula is:
$$P_{coll} = \frac{1}{1 + \exp\left(\lambda \left(d_{ij}(t) - d_{safe}\right)\right)}$$
where $d_{ij}(t)$ is the Euclidean distance between entity $i$ and entity $j$ at the predicted time $t$, $d_{safe}$ is the safe distance threshold, and $\lambda$ is the sensitivity coefficient.
7. A real-time early warning system for construction dynamic risks based on artificial intelligence multi-source perception, characterized in that it comprises:
a multi-source sensing module, used to collect visual images, laser point clouds, and environmental parameter data at the construction site;
a data fusion engine, used to map the multi-source data to the global coordinate system using a spatiotemporal alignment algorithm, and to perform data cleaning and completion;
a scene graph construction module, used to identify construction entities using heterogeneous graph neural networks and construct a dynamic semantic scene graph that includes entity attributes and spatial relationships;
a risk calculation and decision-making module, used to calculate the total risk potential field based on the dynamic semantic scene graph and output the warning level through the risk quantification function;
a real-time interactive terminal, used to receive the warning level and provide immediate feedback via an audible and visual alarm or a handheld terminal.
8. The construction dynamic risk real-time early warning system based on artificial intelligence multi-source perception according to claim 7, characterized in that the scene graph construction module specifically includes:
a 3D target detection unit, used to process laser point cloud data and generate three-dimensional bounding boxes of objects;
a 2D semantic segmentation unit, used to process visual images and extract pixel-level category information of objects;
a feature association unit, used to project the 3D bounding boxes onto 2D images and fuse visual and spatial features through the intersection-over-union (IoU) matching algorithm to generate semantically rich node feature vectors.
9. An electronic device, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the construction dynamic risk real-time early warning method based on artificial intelligence multi-source perception as described in any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions that, when executed on a computer, cause the computer to perform the real-time early warning method for construction dynamic risks based on artificial intelligence multi-source perception as described in any one of claims 1 to 6.