Intelligent management method and system for engineering contract progress and concealed works based on UAV multimodal perception
The method parses construction contracts into structured clause sets and uses UAV multimodal perception to build an automated closed loop from on-site physical data to contract payment decisions. This addresses the disconnect between on-site perception and business decision-making and improves the efficiency and accuracy of engineering construction management.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA THREE GORGES CORPORATION
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-19
AI Technical Summary
In engineering construction and digital contract management, existing technologies suffer from a severe disconnect between on-site perception and business decision-making. This results in a lengthy contract performance verification process, significant subjective disagreements, a lack of irrefutable evidence chains, and an inability to achieve an automated closed loop from on-site physical data collection to contract payment decisions.
The construction contract text is parsed into a structured clause set, from which an initial UAV flight mission is generated. The UAV collects multimodal image data for 3D reconstruction and object-level semantic segmentation; engineering quantity features are extracted, and a lightweight evidence digest hash value is generated and broadcast to the blockchain network for consensus and evidence storage, enabling automatic payment decisions.
It achieves a direct mapping from dynamic changes in three-dimensional space to contract payment nodes, eliminating the reliance on manual semantic conversion, ensuring that the collected data accurately matches the needs of progress assessment and acceptance, improving the effectiveness and efficiency of on-site perception, and eliminating discrepancies and delays caused by subjective verification.
Figure CN122243410A_ABST
Description
Technical Field
[0001] This invention relates to the field of engineering construction and digital contract management technology, and in particular to an intelligent management method and system for engineering contract progress and concealed works based on UAV multimodal perception.
Background Technology
[0002] In existing engineering construction and contract digital management scenarios, although UAV oblique photography, 3D reconstruction, and multi-sensor technologies have been widely adopted for on-site surveying and quality inspection, a significant disconnect between on-site perception and business decision-making remains a common pain point in practical engineering applications. Specifically, the collected engineering surveying data is vast but isolated; the assessment of contract progress and the determination of compliance status of concealed works still heavily rely on offline manual comparison, analysis, and post-event data entry by multiple personnel. This management model leads to a lengthy contract performance verification process, significant subjective disagreements, and a high risk of payment disputes between supervisors and contractors. Furthermore, the acceptance of key milestones lacks an irrefutable chain of evidence.
[0003] Analysis reveals a serious static fragmentation defect in the underlying data structure design and state machine transition mechanism of existing technologies. These technologies incorrectly define "3D point cloud / multimodal imagery" merely as a front-end "visual verification tool" or physical snapshot, failing to establish a structured association between the geometric spatial data model and the legal and economic model. For example, Chinese invention patent application CN115526450A discloses a "construction progress monitoring method, system, and medium based on oblique photography and BIM integration." This scheme uses UAV oblique photography to construct a 3D reality model and uses a Boolean algorithm to match and segment it against the BIM 3D model, ultimately generating a construction progress report. As this cited document shows, the highest level of existing technology only achieves geometric collision comparison between the reality model and the BIM model. It does not establish, at the underlying data structure level, a direct logical mapping and hash association between the 3D geometric coordinates (X, Y, Z) of physical space, object-level semantics, and specific structured contract terms (such as bill of quantities items and milestone payment nodes). The "progress reports" it computes remain essentially geometric. They still rely on human intervention (such as supervisors or owner's engineers) acting as "translators" between the physical visual domain and the contract semantic domain, subjectively verifying the contract terms before deciding whether to trigger payment. This lack of cross-domain mapping means that the system cannot directly understand the business implications of dynamic changes in the physical environment, which completely blocks the automated closed loop from on-site physical data collection to contract payment decision signaling.
Summary of the Invention
[0004] To address the shortcomings of the existing technologies, the technical problem to be solved by this invention is to provide a method and system for intelligent management of engineering contract progress and hidden works based on UAV multimodal perception. This system can construct an automated mapping and trusted state triggering mechanism based on engineering object-level semantics, enabling dynamic change features and thermal anomaly information extracted from three-dimensional space to be directly mapped and hashed to structured contract payment nodes and hidden works acceptance standards. This completely eliminates the dependence of engineering quantity calculation and compliance judgment on manual semantic transformation, and realizes an automated closed-loop flow from the underlying physical geometric quantitative changes to the top-level contract performance decision signaling.
[0005] To solve the above technical problems, the present invention adopts the following technical solution: an intelligent management method for engineering contract progress and concealed works based on UAV multimodal perception, comprising the following steps:
S1. Parse the unstructured construction contract text, and extract and convert it into a structured contract clause set containing spatial location constraints;
S2. Monitor the timeline status and business event signaling of the structured contract clause set, and generate an initial UAV flight mission instruction when the preset triggering conditions are met;
S3. Obtain the multimodal image dataset collected by the UAV executing the initial UAV flight mission instruction, perform 3D reconstruction and object-level semantic segmentation on the multimodal image dataset, and extract and output the completed engineering quantity feature set;
S4. Generate a lightweight core evidence digest hash value from the completed engineering quantity feature set;
S5. Broadcast the lightweight core evidence digest hash value to the blockchain node network for consensus and evidence storage, and obtain a blockchain evidence storage receipt;
S6. When the completed engineering quantity feature set and the blockchain evidence storage receipt are detected to satisfy the preset conditions of the structured contract clause set, issue and push automatic payment decision signaling.
[0006] In a preferred embodiment, parsing the unstructured construction contract text and extracting and converting it into a structured contract clause set containing spatial location constraints includes: calling a pre-trained, deep-learning-based named entity recognition model, combined with a predefined engineering-industry dictionary, to extract key entities; and converting the key entities into the structured contract clause set based on preset regular expression templates and entity mapping rules.
[0007] In a preferred embodiment, after the initial UAV flight mission instruction is generated, the method further includes: importing the 4D Building Information Model, taking the initial UAV flight mission instruction as the initial seed point, and calculating the optimal orthogonal plane of the bounding box of the target component to generate the desired frontal observation viewpoint set; solving the visual-inertial optimal state vector of the UAV based on a visual-inertial navigation system; and using an incremental exploration algorithm together with the desired frontal observation viewpoint set to generate the optimal B-spline flight trajectory.
[0008] In a preferred embodiment, the multimodal image dataset in step S3 is obtained as follows:
S31. Obtain the original image sequence through the edge computing station deployed at the project site, call the Laplacian variance operator to perform motion blur detection, and remove images whose variance values are lower than a dynamically set threshold;
S32. Calculate the overlap between the area actually covered by the UAV flight path and the planned mission area, and trigger an alarm signal when the overlap is lower than the preset overlap threshold;
S33. Fuse the qualified image sequences to generate the multimodal image dataset.
[0009] In a preferred embodiment, performing 3D reconstruction and object-level semantic segmentation on the multimodal image dataset and extracting and outputting the completed engineering quantity feature set proceeds as follows:
S41. Perform 3D reconstruction using a depth-regularized planar Gaussian splatting algorithm;
S42. During multi-view rendering training, use an image gradient filter to calculate the gradient magnitude over the entire image, and generate a weak-texture region mask where the gradient magnitude is lower than the low-frequency texture threshold;
S43. Within the weak-texture region mask, use the monocular depth map feature matrix output by a pre-trained depth estimation network as the ground truth to constrain the Gaussian volume rendering depth, thereby generating the current dense point cloud model;
S44. Perform multi-level feature extraction and pixel-level annotation on the current dense point cloud model to generate object-level semantic segmentation label features;
S45. Extract the known static-occlusion depth map from the initial real-world 3D baseline model and difference it pixel by pixel against the depth map rendered from the current dense point cloud model; when the depth difference of a target object is greater than a set threshold and its semantic category does not belong to the bill of quantities, identify the object as a dynamic occlusion and strip it from the object-level semantic segmentation label features;
S46. Perform a 3D Boolean difference operation between the stripped object-level semantic segmentation label features and the basic engineering ontology map, calculate the volume or area integrals, and output the completed engineering quantity feature set.
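The occlusion-stripping rule of step S45 can be illustrated with a minimal Python sketch. It makes several assumptions that are not in the source: depth maps and label maps are plain nested lists of equal shape, the test is applied per pixel rather than per object, and the names `strip_dynamic_occlusions`, `boq_classes`, and `depth_threshold` are hypothetical.

```python
from typing import List, Optional, Set

def strip_dynamic_occlusions(
    baseline_depth: List[List[float]],
    current_depth: List[List[float]],
    labels: List[List[str]],
    boq_classes: Set[str],
    depth_threshold: float,
) -> List[List[Optional[str]]]:
    """Return a copy of `labels` with dynamic-occlusion pixels set to None.

    A pixel counts as a dynamic occlusion when its depth differs from the
    baseline by more than `depth_threshold` AND its semantic class is not a
    bill-of-quantities class (per-pixel simplification, an assumption).
    """
    stripped = []
    for base_row, cur_row, label_row in zip(baseline_depth, current_depth, labels):
        out_row = []
        for base, cur, label in zip(base_row, cur_row, label_row):
            is_dynamic = abs(base - cur) > depth_threshold and label not in boq_classes
            out_row.append(None if is_dynamic else label)
        stripped.append(out_row)
    return stripped

# Hypothetical one-row example: an excavator stands in front of the scene,
# while a newly built column legitimately changes the depth.
result = strip_dynamic_occlusions(
    baseline_depth=[[10.0, 10.0, 10.0]],
    current_depth=[[10.0, 4.0, 7.0]],
    labels=[["wall", "excavator", "column"]],
    boq_classes={"wall", "column"},
    depth_threshold=0.5,
)
```

In this example the excavator pixel is removed, while the column pixel is kept even though its depth changed, because its class belongs to the bill of quantities.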
[0010] In a preferred embodiment, the consensus and evidence storage of the lightweight core evidence digest hash value proceeds as follows:
S51. Encapsulate the lightweight core evidence digest hash value, the specific payment node ID of the associated structured contract clause set, and the hash value of the previous block into a smart contract transaction request;
S52. Broadcast the smart contract transaction request to the engineering consortium blockchain node network, verify its validity through a practical Byzantine fault tolerance (PBFT) consensus mechanism, and write it into the latest block after more than two-thirds of the nodes have verified it, thereby obtaining the blockchain evidence storage receipt.
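Steps S51 and S52 can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not name a hash function, so SHA-256 over canonical JSON is an assumption, and the function names, field names, and sample values are hypothetical. Only the "more than two-thirds" rule comes from the text; the full PBFT message exchange is not modelled.

```python
import hashlib
import json

def evidence_digest(feature_set: dict) -> str:
    """Hash a feature set deterministically (SHA-256 is an assumed choice)."""
    canonical = json.dumps(feature_set, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def build_transaction(digest: str, payment_node_id: str, prev_block_hash: str) -> dict:
    """Package the three fields listed in S51 into one transaction request."""
    return {
        "evidence_digest": digest,
        "payment_node_id": payment_node_id,
        "prev_block_hash": prev_block_hash,
    }

def quorum_reached(approvals: int, total_nodes: int) -> bool:
    """True only when strictly more than two-thirds of the nodes approved."""
    return 3 * approvals > 2 * total_nodes

# Hypothetical feature set and identifiers, for illustration only.
features = {"item_code": "010501", "completed_volume_m3": 412.5}
tx = build_transaction(evidence_digest(features), "MS-03", "00" * 32)
```

Integer arithmetic in `quorum_reached` avoids floating-point rounding at the two-thirds boundary: 4 of 6 approvals is exactly two-thirds and therefore does not pass.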
[0011] In a preferred embodiment, the steps performed before the automatic payment decision signaling is issued and pushed in step S6, together with the determination of the preset condition, are as follows:
S61. Synchronously render the current dense point cloud model on a WebGL-based engineering digital twin collaboration platform;
S62. In response to a user click on a semantically segmented building component in the 3D view, trigger the reverse spatial index addressing algorithm;
S63. By mapping point cloud coordinates, retrieve from the database the original image, the completed engineering quantity feature set, and the blockchain evidence storage receipt bound to the building component, and display them in a pop-up window;
S64. Determine whether the preset condition is met: the ratio of the measured completed work quantity of all associated progress nodes to the total design work quantity is greater than or equal to 95%, and the digital handover status of all associated concealed works is confirmed.
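The preset condition in S64 can be expressed as a small predicate. The sketch below assumes the 95% ratio is computed in aggregate (sum of measured quantities over sum of design quantities); the text could also be read as a per-node requirement. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ProgressNode:
    measured_quantity: float  # quantity measured from the reconstructed model
    design_quantity: float    # total design quantity from the bill of quantities

def payment_condition_met(
    nodes: List[ProgressNode],
    concealed_works_confirmed: List[bool],
    min_ratio: float = 0.95,
) -> bool:
    """Check the S64 condition under the aggregate-ratio assumption."""
    total_design = sum(n.design_quantity for n in nodes)
    if total_design <= 0:
        return False  # nothing to measure against
    ratio = sum(n.measured_quantity for n in nodes) / total_design
    return ratio >= min_ratio and all(concealed_works_confirmed)

# Hypothetical example: 96% complete, all concealed works handed over.
ok = payment_condition_met(
    [ProgressNode(480.0, 500.0), ProgressNode(480.0, 500.0)], [True, True]
)
```

Both halves of the condition must hold: a single unconfirmed concealed-works handover blocks the payment signal even when the quantity ratio is satisfied.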
[0012] In a preferred embodiment, the present invention also provides an intelligent management system for engineering contract progress and concealed works based on UAV multimodal perception, characterized in that it includes:
a parsing module, used to parse the unstructured construction contract text and extract and convert it into a structured contract clause set containing spatial location constraints;
a scheduling module, used to monitor the timeline status and business event signaling of the structured contract clause set and to generate the initial UAV flight mission instruction when the preset triggering conditions are met;
a processing module, used to obtain the multimodal image dataset collected by the UAV executing the initial UAV flight mission instruction; the processing module is also used to perform 3D reconstruction and object-level semantic segmentation on the multimodal image dataset and to extract and output the completed engineering quantity feature set; the processing module is also used to generate a lightweight core evidence digest hash value from the completed engineering quantity feature set;
an evidence consolidation module, used to broadcast the lightweight core evidence digest hash value to the blockchain node network for consensus and evidence storage and to obtain a blockchain evidence storage receipt;
a decision module, used to issue and push automatic payment decision signaling when the completed engineering quantity feature set and the blockchain evidence storage receipt are detected to satisfy the preset conditions of the structured contract clause set.
[0013] In a preferred embodiment, the present invention also provides a computer device, the computer device including at least one processor coupled to at least one memory, the memory storing at least one computer program or instruction, wherein the computer program or instruction is loaded and executed by the processor to implement the steps of any of the above-mentioned intelligent management methods for engineering contract progress and hidden works based on UAV multimodal perception.
[0014] In a preferred embodiment, the present invention further provides a computer-readable storage medium, wherein a computer program or instructions are stored on the computer-readable storage medium, and when the computer program or instructions are executed by a processor, the steps of any of the above-mentioned intelligent management methods for engineering contract progress and concealed works based on UAV multimodal perception are implemented.
[0015] This invention provides an intelligent management method and system for engineering contract progress and hidden works based on UAV multimodal perception. Through the coordination of the above-mentioned structures, it has the following advantages compared with existing methods: First, an event-driven UAV mission scheduling model was constructed, enabling a leap from passive blind testing to proactive and precise contract performance inspection. Addressing the pain points of existing technologies where engineering surveying data is complex yet isolated and lacks guidance from business objectives, this invention directly binds the UAV deployment logic to specific contractual business events, with flight missions triggered by an event state machine. This transforms the UAV from a simple front-end "flying camera" into an "automatic contract performance inspector," ensuring from the source that the collected multimodal data accurately matches current progress assessment and acceptance requirements, significantly improving the effectiveness and efficiency of on-site perception.
[0016] Secondly, addressing the core issue that existing technologies only focus on the geometric collision dimension of BIM and cannot directly understand the business implications of dynamic changes in the physical environment, this invention establishes a direct logical mapping between physical space three-dimensional coordinates, object-level semantics, and structured specific contract terms at the underlying data structure through a specially optimized AI parsing algorithm oriented towards contract terms. This mechanism ensures that the system output is no longer a simple physical "snapshot of changes," but directly corresponds to the "completed work" in the contract bill of quantities and the "quality status judgment" that conforms to acceptance specifications, thereby eliminating discrepancies and delays caused by subjective human verification.
[0017] Thirdly, addressing the issue that existing technologies are prone to distorting 3D reconstruction and hindering automated judgment in practical engineering applications due to obstructions from on-site personnel, construction machinery, or weakly textured indoor areas, this invention introduces a depth regularization algorithm to achieve high-fidelity reconstruction of weakly textured environments, combined with semantic segmentation to dynamically remove unexpected obstructions. This collaborative processing mechanism effectively filters dynamic noise from the on-site physical environment, ensuring the purity and objectivity of the extracted physical engineering quantities and laying a solid data foundation for accurate intelligent contract clause verification.
Attached Figure Description
[0018] The present invention is further described below with reference to the accompanying drawings and embodiments:
Figure 1 is a flow diagram of the method of the present invention;
Figure 2 is a schematic structural diagram of the computer device of the present invention.
Detailed Implementation
[0019] To better understand the purpose, system architecture, and functional implementation of this embodiment, the embodiments and features described herein can be combined with each other without conflict. The exemplary embodiments disclosed herein will be described below with reference to the accompanying drawings, including specific technical details disclosed to aid understanding; however, these details should be considered exemplary rather than restrictive. Therefore, those skilled in the art should understand that various improvements and adjustments can be made to the embodiments described herein without departing from the scope and core ideas of the invention. Similarly, for clarity, detailed descriptions of well-known technologies, functions, and structures are omitted in the following description.
[0020] Example 1
As shown in Figure 1, this embodiment provides an intelligent management method for engineering contract progress and concealed works based on UAV multimodal perception. The method includes the following sequential technical steps.
[0021] S1. Micro-implementation mechanism of contract clause structured analysis and initial digital baseline model establishment
In the specific implementation of the engineering construction and contract digital management system, step S1 aims to transform unstructured construction contracts into a machine-readable state that can be directly accessed by computer and UAV scheduling systems, and to construct a globally unified spatial mapping base. Its underlying micro-execution logic and data flow are as follows: First, the system's backend server receives the original construction contract text and bill of quantities as input. Then, for this unstructured text data, the system calls a pre-trained, deep-learning-based Named Entity Recognition (NER) model, combined with a predefined engineering-industry dictionary, to perform Natural Language Processing (NLP).
[0022] Specifically, in this process, the Named Entity Recognition (NER) model actively extracts key entities. Preferably, key entities include date, amount, percentage, component name, and whether the concealed works water tightness test is passed.
[0023] Secondly, based on preset regular expression templates in XML / JSON format and entity mapping rules, the system converts the extracted key entities into a structured contract clause set.
[0024] Specifically, the structured contract clause set is encapsulated in the database as an entity object containing multidimensional fields, including bill of quantities nodes, milestone payment nodes with their corresponding spatial location constraint attributes, and quality acceptance specification attributes.
[0025] The bill of quantities nodes include the item code, unit, and total design quantity. Milestone payment nodes include the trigger condition type, required completion percentage, and payment percentage. Preferably, the quality acceptance specification attribute includes a specific technical specification number.
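The conversion from extracted entities to a milestone payment node can be sketched in Python. This is a simplified illustration under stated assumptions: plain regular expressions stand in for the pre-trained NER model, the sample clause and the `CNY` amount pattern are invented for the example, and the mapping rule (first percentage is the required completion, second is the payment share) is hypothetical. The field names follow the attributes listed in the paragraph above.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Assumed patterns for three of the key entity types (date, percentage, amount).
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%")
AMOUNT_RE = re.compile(r"CNY\s*([\d,]+(?:\.\d+)?)")

@dataclass
class MilestonePaymentNode:
    trigger_condition_type: str
    required_completion_pct: float
    payment_pct: float
    due_date: str
    amount: float

def extract_entities(text: str) -> Dict[str, List]:
    """Pull dates, percentages, and amounts out of one clause."""
    return {
        "dates": DATE_RE.findall(text),
        "percentages": [float(p) for p in PERCENT_RE.findall(text)],
        "amounts": [float(a.replace(",", "")) for a in AMOUNT_RE.findall(text)],
    }

def to_milestone(text: str) -> MilestonePaymentNode:
    """Apply a hypothetical entity mapping rule to build one payment node."""
    entities = extract_entities(text)
    return MilestonePaymentNode(
        trigger_condition_type="completion_percentage",
        required_completion_pct=entities["percentages"][0],
        payment_pct=entities["percentages"][1],
        due_date=entities["dates"][0],
        amount=entities["amounts"][0],
    )

clause = (
    "Upon completion of 95% of the basement waterproofing works by 2026-09-30, "
    "the Employer shall pay 20% of the contract price, i.e. CNY 1,200,000."
)
node = to_milestone(clause)
```

A production parser would need the NER model and the domain dictionary to resolve component names and acceptance criteria, which regular expressions alone cannot do reliably.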
[0026] Next, at the start of the project or at a key benchmark node, the system sends instructions to the dispatch center to drive the UAV to conduct the first full-domain high-precision oblique photography.
[0027] Preferably, the key benchmark node is the point at which the infrastructure is completed.
[0028] Specifically, the UAV is equipped with a five-lens oblique camera and acquires an initial multimodal image dataset under specific flight parameters.
[0029] The specific flight parameters are set as follows: flight altitude 80m, ground resolution ≤2cm, forward overlap 80%, and lateral overlap 70%.
[0030] Then, for the acquired initial multimodal image dataset, the cloud computing engine employs a BIM-assisted structure-from-motion (SfM) algorithm, which extracts local invariant feature descriptors from the images and performs feature point matching to generate an initial real-world 3D baseline model containing high-precision geometry and texture.
[0031] Preferably, the locally invariant feature descriptor is a SIFT feature.
[0032] Next, the system registers the three-dimensional geometric coordinate system of the initial real-world 3D baseline model to the global coordinate system through an affine transformation, performs initial spatial alignment with the spatial location constraint attributes of the structured contract clause set, and outputs a basic engineering ontology map with spatial anchor points.
[0033] In addition, the basic engineering ontology map with spatial anchor points serves as the absolute physical reference frame for all subsequent change detection and business mapping.
[0034] S2. Micro-implementation mechanism of intelligent UAV task scheduling based on structured contractual events and 4D models
This step aims to break away from the limitation of traditional UAV operation, which relies on blind inspections guided by human experience. By introducing an event-driven architecture, it upgrades the UAV from a "flying camera" to an "automatic contract performance inspector." Its underlying micro-execution logic and upstream / downstream data flow are as follows: First, an event-driven state machine deployed in the cloud polls the timeline status of the structured contract clause set in real time and synchronously receives business event signaling from the construction party's mobile app.
[0035] Preferably, the business event signaling includes applications for inspection before the concealed works are covered, reports of milestone completion, and requests for nighttime inspections by the supervisor.
[0036] Secondly, when the preset triggering conditions are met, the task scheduling engine combines the weather forecast data obtained from the meteorological interface to automatically generate an initial UAV flight mission instruction for a specific area at a specific level of precision.
[0037] Specifically, weather forecast data is used to avoid severe weather.
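The event-driven trigger described above can be sketched as a single scheduling function. The sketch assumes that either a recognized business event or a reached timeline date counts as a trigger, and that severe weather simply suppresses the mission; the event identifiers, the `FlightMission` fields, and the precision rule are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed identifiers for the three business events listed in the text.
TRIGGER_EVENTS = {
    "concealed_works_inspection_request",
    "milestone_completion_report",
    "night_inspection_request",
}

@dataclass
class FlightMission:
    area_id: str
    precision_level: str

def schedule_mission(
    event_type: Optional[str],
    milestone_due: date,
    today: date,
    severe_weather: bool,
    area_id: str,
) -> Optional[FlightMission]:
    """Return an initial flight mission, or None when nothing should fly."""
    triggered = event_type in TRIGGER_EVENTS or today >= milestone_due
    if not triggered or severe_weather:
        return None
    # Hypothetical rule: concealed-works inspections get the finer survey.
    is_concealed = event_type == "concealed_works_inspection_request"
    return FlightMission(area_id, "high" if is_concealed else "standard")

mission = schedule_mission(
    "milestone_completion_report", date(2026, 12, 31), date(2026, 6, 1), False, "zone-A"
)
```

A real scheduler would likely defer rather than drop a mission blocked by weather; that retry logic is outside this sketch.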
[0038] Then, the system imports the prior geometry and interconnection information of the 4D Building Information Model (4D BIM) into the model-driven viewpoint generator and takes the initial UAV flight mission instruction as the initial seed point for spatial search.
[0039] Next, for the specific engineering component to be observed according to the instruction, the system calculates the optimal orthogonal plane of its bounding box, thereby generating the desired frontal observation viewpoint set.
[0040] In practice, this process uses a selection formula in which the desired frontal observation viewpoint set is the set of observation coordinates that minimizes the angle between the component surface normal and the camera's line of sight while remaining free of static occlusion. The terms of the formula are the normal vector of the target component surface, the camera observation vector at each candidate viewpoint, the set of known static occlusions in the 4D BIM model, and an indicator function that takes the value 1 if the ray from the viewpoint to the center of the target component is not penetrated by a static occlusion, and 0 otherwise.
[0041] The system therefore not only minimizes the angle between the camera's line of sight and the component surface normal, but also runs a line-of-sight ray penetration test against the known static occlusion set in the 4D BIM model to ensure that every viewpoint in the set is physically unobstructed. Preferably, the minimum included angle is used so as to obtain the highest-quality texture and infrared features.
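One form of the viewpoint selection criterion that is consistent with the definitions in [0040] is given below. All symbol names are assumptions, since the source symbols are not reproduced in this text: \(\mathcal{V}^{*}\) is the desired frontal observation viewpoint set, \(\mathcal{P}\) the candidate viewpoints, \(\mathbf{n}\) the outward normal of the target component surface, \(\mathbf{v}(p)\) the camera observation vector at candidate \(p\) (taken here to point from the camera toward the component, hence the sign), \(c\) the center of the target component, \(\mathcal{O}\) the set of known static occlusions, and \(\mathbb{I}\) the ray-penetration indicator.

```latex
\[
\mathcal{V}^{*} \;=\;
\operatorname*{arg\,min}_{\substack{p \,\in\, \mathcal{P} \\ \mathbb{I}(p,\,c,\,\mathcal{O}) \,=\, 1}}
\arccos\!\left(
  \frac{\mathbf{n} \cdot \bigl(-\mathbf{v}(p)\bigr)}
       {\lVert \mathbf{n} \rVert \,\lVert \mathbf{v}(p) \rVert}
\right),
\qquad
\mathbb{I}(p,\,c,\,\mathcal{O}) =
\begin{cases}
1, & \text{the ray from } p \text{ to } c \text{ intersects no element of } \mathcal{O},\\
0, & \text{otherwise.}
\end{cases}
\]
```

The angle is zero when the camera looks straight along the surface normal, which is the frontal view the text aims for.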
[0042] Next, after the drone receives the mission and takes off, it activates its onboard visual inertial navigation system in a complex construction scenario without GPS signal coverage.
[0043] Preferably, the complex construction scenario includes indoor or basement areas. The visual inertial navigation system employs the VINS-Fusion algorithm.
[0044] Secondly, the onboard computing unit acquires, in real time, high-frequency data from the onboard inertial measurement unit and the KLT sparse optical flow features extracted by the stereo camera. Through pre-integration and tightly coupled nonlinear least-squares optimization, it solves for the visual-inertial optimal state vector with extremely low drift error. In the optimization formula, the visual-inertial optimal state vector is the precise three-dimensional translation and rotation pose matrix of the UAV at a given time; the remaining terms are the set of measurement data from the multiple sensors, the feature value actually observed by a sensor at a given time, the sensor observation model function predicted from the current state, the covariance matrix of the sensor observation noise (used for weighting), and the prior residual information generated by marginalizing historical frames.
[0045] Preferably, the onboard computing unit is an Intel NUC mini PC. The onboard inertial measurement unit is a Wheeltec M100 IMU. The stereo camera is an Intel RealSense D435i.
[0046] Specifically, the visual-inertial optimal state vector is the precise three-dimensional translation and rotation pose matrix of the UAV.
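A tightly coupled least-squares objective consistent with the terms defined in [0044] can be written as follows. The symbols are assumptions rather than the source notation: \(\mathcal{X}\) is the state vector and \(\mathcal{X}^{*}\) its optimum, \(\mathcal{Z}\) the set of multi-sensor measurements indexed by sensor \(s\) and time \(t\), \(\mathbf{z}_{s}^{t}\) the observed feature value, \(h_{s}(\cdot)\) the observation model predicted from the current state, \(\boldsymbol{\Sigma}_{s}\) the observation noise covariance, and \(\mathbf{r}_{p}\) the prior residual from marginalizing historical frames.

```latex
\[
\mathcal{X}^{*} \;=\; \operatorname*{arg\,min}_{\mathcal{X}}
\left\{
  \bigl\lVert \mathbf{r}_{p}(\mathcal{X}) \bigr\rVert^{2}
  \;+\;
  \sum_{(s,\,t)\,\in\,\mathcal{Z}}
  \bigl( \mathbf{z}_{s}^{t} - h_{s}(\mathcal{X}) \bigr)^{\top}
  \boldsymbol{\Sigma}_{s}^{-1}
  \bigl( \mathbf{z}_{s}^{t} - h_{s}(\mathcal{X}) \bigr)
\right\}
\]
```

Weighting each residual by the inverse covariance means that noisier sensors contribute less to the estimate, which is the "weight adjustment" role the text assigns to the covariance matrix.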
[0047] Then, using the high-precision self-localization reference provided by the visual-inertial optimal state vector, the system invokes the Fast UAV Exploration (FUEL) algorithm. This algorithm continuously identifies frontiers at the boundary between unknown regions and known free space in an incrementally updated voxel grid map.
[0048] Next, the system injects the desired frontal observation viewpoint set as mandatory nodes, then constructs and solves an asymmetric traveling salesman problem (ATSP) to generate a coarse global path that covers all components to be inspected over the shortest distance while still exploring unknown areas.
[0049] Secondly, the system uses a uniform B-spline curve, which has the convex hull property, to refine the coarse global path locally, thereby solving for the optimal B-spline flight trajectory.
[0050] In practice, the optimization function combines the following terms. The optimal B-spline flight trajectory is parameterized by a series of control points and the knot spacing. An elastic band penalty term ensures the smoothness of the trajectory. A time penalty weight is applied to the total execution time of the trajectory. An obstacle avoidance weight coefficient is applied to a repulsion field penalty term based on the distance to obstacles in the voxel map. Dynamic constraint weight coefficients are applied to the penalty functions for exceeding the UAV's maximum speed and maximum acceleration thresholds, respectively. A viewpoint transition smoothing weight is applied to a penalty term that constrains the UAV gimbal yaw angle to transition smoothly.
[0051] The system thus introduces a voxel-map-based obstacle avoidance repulsion term, dynamic constraints tied to the maximum thrust of the UAV motors (the speed and acceleration penalties), and a smooth yaw angle term for the camera gimbal, so that the optimal B-spline flight trajectory becomes a continuous polynomial trajectory that is minimum-time, collision-free, and dynamically feasible.
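A weighted-sum objective consistent with the terms listed in [0050] is sketched below. The notation is assumed, not taken from the source: \(\mathbf{Q}\) denotes the control points, \(\Delta t\) the knot spacing, \(J_{s}\) the elastic band smoothness term, \(T\) the total execution time, \(J_{c}\) the repulsion field term, \(J_{v}\) and \(J_{a}\) the speed and acceleration limit penalties, \(J_{\psi}\) the gimbal yaw smoothness term, and \(\lambda_{t}, \lambda_{c}, \lambda_{d}, \lambda_{\psi}\) the corresponding weights. Using a single weight \(\lambda_{d}\) for both dynamic penalties is a simplifying assumption.

```latex
\[
\mathcal{T}^{*} \;=\; \operatorname*{arg\,min}_{\mathbf{Q},\,\Delta t}
\Bigl[
  J_{s}
  \;+\; \lambda_{t}\, T
  \;+\; \lambda_{c}\, J_{c}
  \;+\; \lambda_{d}\,\bigl( J_{v} + J_{a} \bigr)
  \;+\; \lambda_{\psi}\, J_{\psi}
\Bigr]
\]
```

Because the speed, acceleration, and collision terms enter as soft penalties, feasibility in such a formulation depends on the weights being large enough for violations to dominate the cost.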
[0052] Next, the UAV's low-level flight controller receives the optimal B-spline flight trajectory and converts it into PWM speed control commands for the motors, precisely driving the UAV to the target location.
[0053] Preferably, the flight controller uses a Pixhawk 6c mini running PX4 autopilot firmware.
[0054] S3. Micro-implementation mechanism for high-frequency acquisition and preprocessing of multimodal data with edge collaboration
This step aims to address the bandwidth bottleneck and redundant cloud computing consumption caused by massive volumes of engineering visual data. By introducing an edge computing layer at the physical site, preliminary cleaning, quality control, and security encryption of multi-source heterogeneous data are completed. The underlying micro-execution logic and data flow are as follows: First, the UAV's low-level flight controller strictly tracks the optimal B-spline flight trajectory. During cruise, the onboard synchronous triggering module triggers the multimodal sensors at high frequency based on spatial displacement distance and flight speed.
[0055] Specifically, the multimodal sensor includes a visible light camera, an infrared thermal imager, and a depth camera containing depth information. Preferably, the visible light camera is a GoPro 11 mini action camera. The depth camera is an Intel RealSense D435i depth camera.
[0056] Secondly, during the data acquisition process, the UAV strictly adheres to the preset image overlap rate specifications to capture the original physical image sequence and point cloud snapshots of the site. Preferably, the image overlap rate specifications are set to 80% for the forward direction and 70% for the side direction.
[0057] The collected raw, complex data stream is then transmitted in real time to the edge computing station deployed at the project site via an onboard high-speed interface. At the edge, the system initiates a series of lightweight quality control algorithms.
[0058] Next, the system calls the Laplacian Variance operator to extract high-frequency edge gradients from the visible light image, achieving motion blur detection and directly discarding unusable images with variance values below a dynamically set threshold locally. Preferably, these unusable images include invalid images generated by severe shaking of the drone gimbal.
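The blur check described above can be sketched in a few lines. This is a minimal illustration, assuming a grayscale image held as a nested list and a 4-neighbour Laplacian kernel; the function names `laplacian_variance` and `is_blurry` and the fixed threshold in the usage lines are hypothetical, whereas the source sets the threshold dynamically.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response over interior pixels.

    gray: 2D list of grayscale intensities (all rows the same length).
    """
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


def is_blurry(gray, threshold):
    """An image is discarded when its Laplacian variance is below the threshold."""
    return laplacian_variance(gray) < threshold


# Illustrative inputs: a checkerboard (strong edges) and a smooth ramp (no edges).
sharp = [[255 if (x + y) % 2 == 0 else 0 for x in range(8)] for y in range(8)]
smooth = [[10 * x for x in range(8)] for y in range(8)]

print(is_blurry(sharp, 100.0), is_blurry(smooth, 100.0))
```

A low variance means few high-frequency edges survive in the frame, which is the signature of motion blur from gimbal shake.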
[0059] Secondly, the system performs an integrity check, analyzing the image POS (Position and Orientation System) data of each image to calculate the overlap between the actual flight path coverage area of the UAV and the planned mission area. If the overlap is detected to be lower than a preset overlap threshold, an alarm signal is immediately triggered to the cloud-based mission scheduling center, and a re-flight is recommended. Preferably, the preset overlap threshold is set to 98%.
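As a rough sketch of this integrity check, the planned mission area and the footprints derived from the POS data can both be rasterized onto a ground grid and compared cell by cell. The grid representation and the names `coverage_ratio` and `needs_reflight` are assumptions made for illustration; only the 98% threshold comes from the text.

```python
def coverage_ratio(planned_cells, covered_cells):
    """Fraction of planned ground-grid cells that were actually covered."""
    planned = set(planned_cells)
    if not planned:
        raise ValueError("planned mission area is empty")
    return len(planned & set(covered_cells)) / len(planned)


def needs_reflight(planned_cells, covered_cells, threshold=0.98):
    """True when coverage falls below the preset overlap threshold."""
    return coverage_ratio(planned_cells, covered_cells) < threshold


# Illustrative 10 x 10 mission grid with three cells missed by the flight.
planned = [(x, y) for x in range(10) for y in range(10)]
covered = planned[3:]
print(coverage_ratio(planned, covered), needs_reflight(planned, covered))
```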
[0060] In one feasible approach, for infrared thermal imaging data, the edge computing station performs non-uniformity correction (NUC) in real time to eliminate fixed-pattern noise of the focal plane array.
[0061] Next, during flight, the system extracts keyframes of the area corresponding to the initial UAV flight mission command from the original high-resolution imagery according to the predetermined extraction logic, and reduces their resolution to generate representative thumbnails for rapid remote inspection. Preferably, the extraction logic extracts one keyframe every 20 seconds or every 50 meters of flight, and the resolution is reduced to 5-10 cm / pixel.
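The "every 20 seconds or every 50 meters" extraction rule can be illustrated as follows. This is a sketch under the assumption that each frame carries a timestamp and planar position from its POS record; the function name `select_keyframes` and the tuple layout are hypothetical.

```python
import math


def select_keyframes(frames, every_s=20.0, every_m=50.0):
    """Return indices of frames kept as keyframes.

    frames: list of (timestamp_s, x_m, y_m) tuples in flight order.
    A frame is kept once either the elapsed time or the distance flown
    since the last keyframe reaches its limit.
    """
    if not frames:
        return []
    kept = [0]
    last_t, prev_x, prev_y = frames[0]
    travelled = 0.0
    for i in range(1, len(frames)):
        t, x, y = frames[i]
        travelled += math.hypot(x - prev_x, y - prev_y)
        prev_x, prev_y = x, y
        if t - last_t >= every_s or travelled >= every_m:
            kept.append(i)
            last_t = t
            travelled = 0.0
    return kept


# 1 Hz frames at 5 m/s: the 50 m rule fires first, every 10 frames.
cruise = [(float(t), 5.0 * t, 0.0) for t in range(31)]
print(select_keyframes(cruise))
```

When the UAV hovers, no distance accumulates and the 20-second rule takes over, so slow inspection passes still produce thumbnails.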
[0062] Secondly, after the aforementioned high-frequency cleaning and dimensionality reduction, the edge computing station performs spatiotemporal alignment and data fusion on the qualified visible light images, corrected infrared thermal images, lightweight thumbnails, and synchronized attitude sensor logs, and formally encapsulates them to generate an edge-preprocessed multimodal dataset.
[0063] Then, the system locally calls the project-specific AES-256 key to symmetrically encrypt the edge-preprocessed multimodal dataset, and transmits the encrypted data back to the cloud at high speed over a secure link established through the TLS 1.3 protocol.
[0064] In addition, if a network signal interruption is detected during this process, the edge gateway will trigger the breakpoint resume mechanism, temporarily storing the encrypted slice in local non-volatile memory, and automatically resuming transmission after the link is restored.
[0065] S4. Micro-implementation mechanism of cloud-based weak-texture high-fidelity reconstruction and object-level semantic mapping based on D-PGSR. This step aims to resolve the holes and noise that traditional multi-view stereo (MVS) produces in weakly textured areas, such as white walls and flat floors, and to accurately convert complex physical point clouds into the engineering quantities payable under the contract. Its underlying micro-execution logic and data flow are as follows: First, a cluster of cloud servers equipped with multiple high-performance GPU accelerator cards receives the edge-preprocessed multimodal dataset from the edge. Then, the cloud-based intelligent analysis engine calls the COLMAP sparse reconstruction module to extract the initial feature point set of the scene, and uses it as the positional prior to initialize the three-dimensional Gaussian attribute set.
[0066] Specifically, this three-dimensional Gaussian attribute set includes the spatial position, the opacity, the spherical harmonic coefficients, and the covariance matrix representing the shape. Preferably, the spherical harmonic coefficients are used to express view-dependent color.
[0067] Next, the system optimizes the covariance matrix by decomposing it into a rotation matrix and a scaling matrix.
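[0068] In practical implementation, the characteristic expression function $G(\mathbf{x})$ of a single three-dimensional Gaussian distribution is defined as:

$$G(\mathbf{x}) = \exp\left(-\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{\top}\,\boldsymbol{\Sigma}^{-1}\,(\mathbf{x}-\boldsymbol{\mu})\right);$$

where $G(\mathbf{x})$ is the probability density effect value at the spatial point $\mathbf{x}$; $\mathbf{x}$ is the input coordinate vector in physical space; $\boldsymbol{\mu}$ is the mean coordinate vector of the center of the current Gaussian primitive; and $\boldsymbol{\Sigma}$ is the positive definite covariance matrix obtained from the decomposition into rotation and scaling matrices.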
[0069] Specifically, the positive definite covariance matrix defines the anisotropic stretch of Gaussian primitives in three-dimensional space. Thus, this step initially transforms discrete two-dimensional image features into a continuous and differentiable three-dimensional probability density representation.
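To make the anisotropic stretch concrete, the sketch below evaluates one Gaussian primitive. It assumes the parameterization commonly used in 3D Gaussian splatting, covariance = R S Sᵀ Rᵀ with a diagonal scaling matrix S; the source only states that the covariance is built from rotation and scaling matrices, so this exact form, and the names `rotation_z` and `gaussian_value`, are assumptions.

```python
import math


def rotation_z(theta):
    """3x3 rotation matrix about the z axis (row-major nested lists)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def gaussian_value(x, mu, R, scales):
    """Evaluate G(x) = exp(-0.5 * d^T Sigma^-1 d) with d = x - mu.

    Assumes Sigma = R S S^T R^T with S = diag(scales). Then
    Sigma^-1 = R S^-2 R^T, so d^T Sigma^-1 d = sum_i ((R^T d)_i / s_i)^2
    and no explicit matrix inverse is needed.
    """
    d = [xi - mi for xi, mi in zip(x, mu)]
    # Coordinates of d expressed in the Gaussian's principal axes: R^T d.
    local = [sum(R[r][c] * d[r] for r in range(3)) for c in range(3)]
    m2 = sum((l / s) ** 2 for l, s in zip(local, scales))
    return math.exp(-0.5 * m2)


# A primitive stretched 2x along its first axis, then rotated 90 degrees
# about z, so the long axis points along world y.
R = rotation_z(math.pi / 2)
print(gaussian_value([0.0, 2.0, 0.0], [0.0, 0.0, 0.0], R, [2.0, 1.0, 1.0]))
print(gaussian_value([2.0, 0.0, 0.0], [0.0, 0.0, 0.0], R, [2.0, 1.0, 1.0]))
```

The same offset of 2 units decays much more slowly along the stretched axis than across it, which is what lets a single primitive hug a flat wall or floor.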
[0070] Secondly, in response to the weak textures that are prevalent on indoor construction sites, the system initiates the backpropagation gradient optimization process of the Depth-Regularized Planar Gaussian Splatting algorithm (D-PGSR).
[0071] Then, in the multi-view rendering training, the system first extracts the pixels of the reference frame and the pixels of the adjacent frames, and applies multi-view geometric and photometric consistency constraints by calculating the homography matrix between them. Then, the system uses the Sobel edge detection operator as the image gradient filter to calculate the gradient magnitude over the entire image.
[0072] Secondly, where the gradient magnitude is lower than the set low-frequency texture threshold, a weak texture region mask is generated. Within this masked region, the system forcibly activates the depth regularization loss function, which pulls the Gaussian rendering depth toward the output features of the monocular depth estimation network. The terms of this loss function are as follows: the summation domain is the set of pixels identified by the mask as having weak texture; the binary mask matrix takes the value 1 at pixel coordinates where the gradient is below the set threshold, and 0 otherwise; the rendered depth is the differentiable absolute depth value obtained for the current three-dimensional Gaussian attribute set by accumulation with the volume rendering formula; the monocular depth is the relative monocular depth map feature matrix predicted by a pre-trained deep neural network under the same viewpoint; and the scale and shift are the dynamic scaling and translation alignment parameters obtained by fitting on the sampled rays.
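The weak-texture mask can be sketched with a plain Sobel filter. The example assumes a grayscale image as a nested list and leaves border pixels unmasked for simplicity; the function name `weak_texture_mask` and the threshold value used in the usage lines are illustrative, not taken from the source.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def weak_texture_mask(gray, threshold):
    """Binary mask: 1 where the Sobel gradient magnitude is below threshold.

    Border pixels have no full 3x3 neighbourhood and are left at 0
    (an assumption of this sketch).
    """
    h, w = len(gray), len(gray[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            if math.hypot(gx, gy) < threshold:
                mask[y][x] = 1
    return mask


# A flat dark wall next to a flat bright wall: only the seam has texture.
wall = [[0] * 4 + [255] * 4 for _ in range(8)]
print(weak_texture_mask(wall, 50.0)[3])
```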
[0073] Preferably, the deep neural network adopts the MiDaS depth estimation network model.
[0074] Next, the system takes the monocular depth map feature matrix output by the deep neural network, aligns it using the dynamic scaling and translation alignment parameters, and uses the result as a pseudo-ground-truth value to constrain the rendering depth of the Gaussian primitives. Finally, after thousands of iterations using the Adam optimizer, a high-precision current-state dense point cloud model is output.
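The alignment and the masked depth penalty can be illustrated as below. This is only one plausible form: it assumes the scale and shift are fitted by ordinary least squares over the sampled pixels and that the penalty is a mean absolute difference over masked pixels. The source does not state the norm or the normalization, and the names `fit_scale_shift` and `masked_depth_loss` are hypothetical.

```python
def fit_scale_shift(mono, rendered):
    """Least-squares s, t minimising sum((s * mono + t - rendered)^2).

    mono, rendered: equal-length lists of depths at the sampled pixels.
    Assumes mono is not constant (otherwise the scale is undefined).
    """
    n = len(mono)
    mean_m = sum(mono) / n
    mean_r = sum(rendered) / n
    var_m = sum((m - mean_m) ** 2 for m in mono)
    cov = sum((m - mean_m) * (r - mean_r) for m, r in zip(mono, rendered))
    s = cov / var_m
    t = mean_r - s * mean_m
    return s, t


def masked_depth_loss(rendered, mono, mask, s, t):
    """Mean absolute gap between rendered depth and the aligned monocular
    depth, taken only over pixels whose mask value is 1."""
    terms = [abs(r - (s * m + t))
             for r, m, k in zip(rendered, mono, mask) if k == 1]
    return sum(terms) / len(terms) if terms else 0.0


# Relative monocular depth versus metric rendered depth (flattened pixels).
mono = [0.0, 1.0, 2.0, 3.0]
rendered = [2.0, 4.0, 6.0, 8.0]
s, t = fit_scale_shift(mono, rendered)
print(s, t, masked_depth_loss([2.0, 4.5, 6.0, 8.0], mono, [1, 1, 0, 0], s, t))
```

The alignment step matters because a monocular network predicts depth only up to an unknown scale and offset, so its output cannot supervise metric rendering depth directly.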
[0075] Secondly, after acquiring the high-precision geometry, the system performs two-dimensional object-level semantic segmentation on the multi-view rendered images projected from the current-state dense point cloud model.
[0076] Specifically, for the visible light channel, the system uses a pre-trained Swin Transformer as the backbone network for multi-level feature extraction, and uses UPerNet as the decoding head, which incorporates a pyramid pooling module (PPM) to capture the global context.
[0077] Next, for the infrared thermal imaging channel of the concealed works, the system constructs a three-segment grayscale mapping function with the overall average temperature and the average temperature of the low-temperature zone as inflection points. After stretching the thermal anomaly contrast, the Otsu algorithm is used to adaptively extract the leakage contour. Preferably, the concealed works include roof waterproofing works.
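A minimal sketch of the thermal channel follows, assuming a flattened list of per-pixel temperatures. The two breakpoints (overall mean and low-zone mean) come from the text; treating the low-temperature zone as the pixels below the overall mean, the output gray levels `out_low` and `out_mid`, and the function names are assumptions made for illustration.

```python
def three_segment_map(temps, out_low=128, out_mid=192):
    """Piecewise-linear map of temperatures to gray levels 0..255.

    Breakpoints: mean of the low-temperature zone and the overall mean.
    Giving the coldest segment half of the gray range stretches the
    contrast of cool (damp) anomalies.
    """
    t_min, t_max = min(temps), max(temps)
    t_mean = sum(temps) / len(temps)
    low = [t for t in temps if t < t_mean]
    t_low = sum(low) / len(low) if low else t_mean

    def seg(t, a, b, ga, gb):
        return ga if b == a else ga + (t - a) * (gb - ga) / (b - a)

    out = []
    for t in temps:
        if t <= t_low:
            g = seg(t, t_min, t_low, 0, out_low)
        elif t <= t_mean:
            g = seg(t, t_low, t_mean, out_low, out_mid)
        else:
            g = seg(t, t_mean, t_max, out_mid, 255)
        out.append(int(round(g)))
    return out


def otsu_threshold(gray):
    """Gray level that maximises the between-class variance (Otsu's method)."""
    hist = [0] * 256
    for g in gray:
        hist[g] += 1
    total = len(gray)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Four cool (damp) pixels on an otherwise warm roof membrane.
temps = [18.0] * 4 + [25.0] * 12
gray = three_segment_map(temps)
thr = otsu_threshold(gray)
leak = [g <= thr for g in gray]
print(thr, sum(leak))
```

Pixels at or below the Otsu threshold are treated here as the candidate leakage region; contour tracing on that binary mask is omitted.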
[0078] Then, the visual features of the two channels are annotated at the pixel level using a softmax classifier. Preferably, the pixel-level annotation is used to distinguish between newly constructed structures, demolished structures, material piles, or waterproofing voids.
[0079] Secondly, the pixel-level annotations are back-projected into 3D space using the known camera intrinsic parameters, and a semantic vector is assigned to each 3D Gaussian point to generate the object-level semantic segmentation label features.
[0080] Next, occlusion culling inference based on depth and ray projection is performed. The system extracts the known static obstacle depth map from the initial real-world 3D baseline model and differences it pixel by pixel against the rendered depth map of the current-state dense point cloud model.
[0081] Then, when the depth difference of an object is detected to be greater than a set threshold and its semantic category does not belong to the contract list, the system defines it as a dynamic occlusion and strips the object directly from the object-level semantic segmentation label features, ensuring the three-dimensional purity of the engineering entity. Preferably, the threshold is set to 0.5 meters, and the objects so removed are those semantically identified as construction workers, excavators, or temporary steel pipe stacks.
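The culling rule is a conjunction of two tests and can be sketched as follows. The dictionary layout, the category strings, and the name `cull_dynamic_occlusions` are hypothetical; `depth_diff_m` is assumed to hold the magnitude of the per-object depth difference.

```python
def cull_dynamic_occlusions(objects, contract_categories, depth_threshold_m=0.5):
    """Split objects into kept and removed ids.

    objects: list of dicts with keys 'id', 'category', 'depth_diff_m'.
    An object is a dynamic occlusion only if BOTH conditions hold:
    its depth differs from the baseline by more than the threshold,
    and its semantic category is not on the contract list.
    """
    kept, removed = [], []
    for obj in objects:
        is_dynamic = (obj["depth_diff_m"] > depth_threshold_m
                      and obj["category"] not in contract_categories)
        (removed if is_dynamic else kept).append(obj["id"])
    return kept, removed


scene = [
    {"id": "wall-07", "category": "new_wall", "depth_diff_m": 2.4},
    {"id": "excavator-1", "category": "excavator", "depth_diff_m": 3.1},
    {"id": "tarp-2", "category": "tarpaulin", "depth_diff_m": 0.1},
]
print(cull_dynamic_occlusions(scene, {"new_wall", "waterproof_layer"}))
```

A newly built wall also changes the depth map by more than the threshold, so the contract-list test is what keeps payable work from being stripped along with the machinery.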
[0082] Secondly, a three-dimensional Boolean difference operation is performed between the purified object-level semantic segmentation label features and the foundation engineering body diagram with spatial anchor points. The system automatically selects newly added or modified three-dimensional objects and performs three-dimensional meshing and closure processing on them to calculate volume or surface area. Preferably, the three-dimensional meshing and closure processing uses Poisson surface reconstruction.
[0083] Next, the system automatically accumulates and maps the three-dimensional volume or area integration results over the physical space domain, and outputs the feature set of completed engineering quantities.
[0084] In practical implementation, the following formula is used:

$$Q_i = \iiint_{\Omega_i} \mathbb{1}\big(L(x,y,z) = C_i\big)\,\mathrm{d}V;$$

where $Q_i$, an element of the feature set of completed engineering quantities, is the cumulative physical completion amount corresponding to the $i$-th item of the contract list; $\Omega_i$ is the physical space bound to this contract item; $\mathbb{1}(\cdot)$ is the semantic matching indicator function; $L(x,y,z)$ is the semantic label value of the point cloud at coordinates $(x,y,z)$; and $C_i$ is the standard building component category code specified in industry standards for this contract item.
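A discretized version of this accumulation can be sketched on a voxel grid: count the voxels inside the bound region whose label matches the component code and multiply by the voxel volume. The voxel representation, the label strings, and the name `completed_quantity` are assumptions; the source computes volume or area on meshes closed by Poisson surface reconstruction rather than on voxels.

```python
def completed_quantity(voxels, category_code, region, voxel_size_m):
    """Approximate completed volume (cubic meters) for one contract item.

    voxels: dict mapping (ix, iy, iz) -> semantic label.
    region: iterable of voxel indices bound to the contract item.
    The generator plays the role of the semantic matching indicator:
    a voxel contributes only when its label equals category_code.
    """
    count = sum(1 for idx in region if voxels.get(idx) == category_code)
    return count * voxel_size_m ** 3


# A 2 x 2 x 2 block of 0.5 m voxels labelled as concrete wall, with one
# extra voxel of stacked material that must not be counted.
grid = {(x, y, z): "WALL_CONCRETE"
        for x in range(2) for y in range(2) for z in range(2)}
grid[(2, 0, 0)] = "MATERIAL_PILE"
print(completed_quantity(grid, "WALL_CONCRETE", grid.keys(), 0.5))
```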
[0085] Then, the system numerically compares the feature set of completed engineering quantities and the extracted thermal anomaly quality status assessment parameters directly against the corresponding thresholds in the structured contract terms set. Preferably, the corresponding thresholds include the percentage of progress in the project and the quality acceptance specifications.
[0086] Next, the system automatically outputs a structured acceptance summary containing a Boolean value indicating whether the project meets or fails to meet the standards. Preferably, this structured acceptance summary is a JSON-formatted summary of the concealed works acceptance and a schedule deviation report.
[0087] S5. Micro-implementation mechanism for lightweight blockchain on-chain evidence consolidation based on the principle of minimum necessary evidence. This step aims to distill the complex physical-domain data generated by front-end acquisition and cloud-based processing into key fingerprints with legal validity. This resolves the conflict between storage cost and network performance degradation that traditional blockchains face when handling massive amounts of engineering 3D data. Its underlying micro-execution logic and data flow are as follows: First, the system strictly adheres to the principle of minimum necessary evidence, performing dimensionality-reduction extraction and evidence packaging on the massive data flowing in from the front end. The system extracts the decision summary file corresponding to the feature set of completed engineering quantities, and simultaneously extracts the process fingerprint of the edge-preprocessed multimodal dataset. The process fingerprint is strictly bound to the drone's take-off and landing timestamps and device serial number, which serve as a physical spatiotemporal anti-counterfeiting identifier.
[0088] Preferably, the decision summary document contains the hash values of PDF documents of the schedule matching report and the concealed works acceptance summary. The process fingerprint contains image features of representative keyframe thumbnails. The device serial number contains GPS track records.
[0089] Next, after strictly serializing and merging the above information according to the preset data structure, the system calls the SHA-256 cryptographic hash algorithm to generate a fixed-length, collision-resistant, lightweight core evidence digest hash value.
[0090] In practical implementation, the following formula is used:

$$H = \mathrm{SHA256}\big(\mathrm{Hash}(I) \,\|\, \mathrm{Hash}(R) \,\|\, T \,\|\, D\big);$$

where $H$ is the lightweight core evidence digest hash value, which serves as the unique digital fingerprint of this business acceptance event; $\|$ is the safe concatenation operation on the underlying strings or byte streams; $\mathrm{Hash}(\cdot)$ is the one-way hash operation on the source data of each submodule; $I$ is the set of key evidence images that can prove the original appearance of the scene; $R$ is the structured acceptance report that includes the conclusion of whether the standard was met; and $T$ and $D$, the take-off and landing timestamps and the device serial number, together anchor the absolute spatiotemporal coordinates of data acquisition and the unique identity of the device.
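The digest construction can be sketched with the standard `hashlib` module. The serialization choices here are assumptions, not the source's preset data structure: the report is serialized as canonical JSON with sorted keys, and "safe concatenation" is implemented as length-prefixed joining so that field boundaries cannot be shifted. The function and field names are hypothetical.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def evidence_digest(keyframe_thumbnails, acceptance_report, timestamps, device_serial):
    """Fold images, report, timestamps and device id into one SHA-256 digest.

    keyframe_thumbnails: list of bytes objects (one per keyframe).
    acceptance_report: JSON-serializable dict.
    timestamps: dict with 'takeoff' and 'landing' strings.
    """
    # Hash each image first, then hash the concatenated fixed-size digests.
    image_part = sha256_hex(
        b"".join(hashlib.sha256(img).digest() for img in keyframe_thumbnails))
    # Canonical JSON: key order in the dict must not change the digest.
    report_bytes = json.dumps(
        acceptance_report, sort_keys=True, separators=(",", ":")).encode("utf-8")
    report_part = sha256_hex(report_bytes)
    fields = [image_part, report_part,
              timestamps["takeoff"], timestamps["landing"], device_serial]
    # Length-prefix every field so "ab" + "c" and "a" + "bc" hash differently.
    payload = b"".join(
        len(f.encode("utf-8")).to_bytes(4, "big") + f.encode("utf-8")
        for f in fields)
    return sha256_hex(payload)


digest = evidence_digest(
    [b"thumb-1", b"thumb-2"],
    {"item": "roof-waterproofing", "passed": True},
    {"takeoff": "2026-03-25T08:00:00Z", "landing": "2026-03-25T08:25:00Z"},
    "UAV-SN-0001",
)
print(len(digest))
```

Only this 64-character digest would need to go on-chain; the images and point clouds stay in off-chain storage and can be re-hashed later to prove they were not altered.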
[0091] Secondly, the system calls the preset smart contract API interface and encapsulates the lightweight core evidence digest hash value, the specific payment node ID in the associated structured contract terms set, and the hash value of the previous block to generate a formatted smart contract transaction request. Preferably, the smart contract API interface uses the submitEvidence function.
[0092] Next, the system broadcasts the transaction request to the engineering consortium blockchain node network via a secure network. Specifically, this engineering consortium blockchain node network is physically deployed by servers belonging to the owner, supervisor, general contractor, and third-party auditing agency.
[0093] Upon receiving the broadcast, the consortium blockchain node cluster initiates the Practical Byzantine Fault Tolerance (PBFT) consensus mechanism to verify the legality of the transaction. During the consensus process, the transaction will only be officially confirmed and written into the latest block of the consortium blockchain if and only if more than two-thirds of the nodes have verified and reached a consensus. Preferably, more than two-thirds of the nodes, i.e., at least three parties, are involved.
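The "more than two-thirds" rule reduces to a small integer computation. The sketch below shows only the vote-counting threshold, not the PBFT message phases; the function names are illustrative.

```python
def quorum_size(n_nodes):
    """Smallest vote count that is strictly more than two-thirds of n_nodes."""
    return (2 * n_nodes) // 3 + 1


def is_confirmed(votes, n_nodes):
    """A transaction is written to the block only once the quorum is reached."""
    return votes >= quorum_size(n_nodes)


# Four consortium parties: owner, supervisor, general contractor, auditor.
print(quorum_size(4), is_confirmed(2, 4), is_confirmed(3, 4))
```

With the four parties named in the text the threshold is three votes, which matches the stated preference of at least three parties and tolerates one faulty or dissenting node.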
[0094] Next, after the proof is established, the blockchain network returns a blockchain-based proof receipt with immutable attributes to the business system. The receipt is formatted and parsed into key-value pairs containing the transaction hash, block height, and timestamp.
[0095] In addition, the massive original images of physical entities, infrared thermal images, and 3D point cloud files are stored in a cloud-based distributed file system with strict access control, thus realizing a data architecture of on-chain evidence storage and off-chain storage.
[0096] S6. Micro-implementation mechanism of the end-to-end closed loop for contract performance twin visualization and intelligent payment decision-making. This step aims to transform the underlying visual point cloud data and hash strings into a 3D visualization interface that engineering managers can interact with directly. Ultimately, it drives the automatic disbursement of commercial funds through a state machine, removing the delays caused by manual review in traditional processes. Its underlying micro-execution logic and data flow are as follows: First, in this embodiment, the system uses a WebGL-based engineering digital twin collaborative platform to synchronously render, at a high frame rate, the current-state dense point cloud model calculated in the preceding steps. The platform provides a unified three-dimensional spatial operation interface for all participating parties.
[0097] Secondly, when a user clicks on any colored building component in the 3D view, the system immediately triggers the reverse spatial indexing addressing algorithm. Specifically, the coloring itself constitutes semantic segmentation. Preferably, the building component is a suspected leaking roof waterproofing layer.
[0098] Next, the reverse spatial indexing addressing algorithm retrieves and displays the complete chain of evidence bound to the physical space from the cloud database via point cloud coordinate mapping. This complete chain of evidence includes the associated original visible light and thermal infrared images, parameters determined by artificial intelligence to be qualified or unqualified, precisely verified volume or area values, and a blockchain-based evidence receipt to prove that the data has not been tampered with. This creates a penetrating digital verification channel.
[0099] Specifically, the raw visible light and thermal infrared images are derived from the edge-preprocessed multimodal dataset. The precisely verified volume or area values are derived from the feature set of completed engineering quantities.
[0100] Then, the contract management smart state machine monitors the notarized transaction events and front-end business processes on the blockchain network in real time. This state machine incorporates Boolean logic gates, and its core decision conditions are drawn directly from the structured contract terms set.
[0101] In practice, the system triggers the release logic for a specific payment node only when both Condition 1 and Condition 2 are met. Condition 1 is that, for all associated progress nodes, the ratio of the measured completed work quantity to the total designed work quantity is greater than or equal to 95%. Condition 2 is that the digital handover status of all associated concealed works packages is confirmed, and the corresponding blockchain-based evidence receipt has been successfully obtained for each package. Preferably, the measured completed work quantity is taken from the feature set of completed engineering quantities, and 95% is a preset, configurable threshold for reaching the target work quantity.
[0102] Next, the system establishes an anomaly interception mechanism. If the work quantity threshold is not met, the system automatically generates an early warning report that lists the discrepancies and pushes it to the construction party. If the progress node meets the target but the corresponding blockchain-based evidence receipt is missing, the system exercises a veto, forcibly blocking subsequent processes and issuing the highest-level alert for an incomplete evidence chain.
[0103] Finally, when all the above judgment conditions are true, the contract management smart state machine automatically triggers the top-level smart contract, and the system issues the automatic payment decision signaling, which contains a summary of the entire cryptographic hash traceability chain and the acceptance conclusion.
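The two release conditions and the two interception branches can be sketched as a single decision function. The data classes, the action labels `WARN`, `BLOCK`, and `RELEASE`, and the function name `decide_payment` are hypothetical; the 95% ratio and the ordering of the checks follow the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressNode:
    measured: float   # measured completed work quantity
    designed: float   # total designed work quantity


@dataclass
class ConcealedPackage:
    handover_confirmed: bool
    receipt_tx_hash: Optional[str]  # None when no on-chain receipt exists


def decide_payment(progress_nodes, packages, ratio_threshold=0.95):
    """Return the action for one payment node.

    Condition 1: every progress node reaches the ratio threshold.
    Condition 2: every concealed works package is handed over AND has a receipt.
    """
    shortfalls = [i for i, n in enumerate(progress_nodes)
                  if n.measured / n.designed < ratio_threshold]
    if shortfalls:
        # Quantity not met: early warning listing the deficient nodes.
        return {"action": "WARN", "shortfall_nodes": shortfalls}
    missing = [i for i, p in enumerate(packages)
               if not (p.handover_confirmed and p.receipt_tx_hash)]
    if missing:
        # Progress met but evidence chain incomplete: veto.
        return {"action": "BLOCK", "missing_evidence": missing}
    return {"action": "RELEASE"}


nodes = [ProgressNode(measured=96.0, designed=100.0)]
print(decide_payment(nodes, [ConcealedPackage(True, "0xabc")]))
print(decide_payment(nodes, [ConcealedPackage(True, None)]))
```

Checking quantity before evidence mirrors the interception order in the text: a shortfall only produces a warning, while a missing receipt on otherwise complete work blocks the flow outright.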
[0104] Next, the automatic payment decision signaling, carrying the blockchain evidence numbers of all relevant nodes as underlying attachments, is pushed directly to the owner-side financial ERP system via a secure RESTful API interface, automatically generating a draft electronic payment certificate and initiating online final review and disbursement. Preferably, the owner-side financial ERP system uses SAP or Oracle financial modules, and the automatic payment decision signaling takes the form of a payment proposal.
[0105] Thus, the entire process is automatically connected through data flow, achieving streamlined closed-loop management.
[0106] Example 2 As mentioned above, with the development of engineering digitization technology, engineering information acquisition and sharing at construction sites can be achieved based on UAV multimodal perception networks. This includes real-world 3D models, project progress, and the status of concealed works, enabling owners and supervisors to grasp information on contract performance progress. This performance progress information includes the 3D geometric coordinates of the physical space, object-level semantic classification, and abnormal material temperature distribution. An intelligent engineering contract management system based on UAV multimodal perception can effectively improve the efficiency and transparency of project progress settlement and reduce manual auditing costs. The reliability of this system heavily relies on airborne multimodal sensors, edge-cloud collaborative communication networks, and underlying legal mapping logic. However, these components may face anomalies in the physical environment or data flow during actual operation. For example, centralized storage nodes are susceptible to tampering or single-point failure attacks, causing delays or damage to the engineering evidence chain. Furthermore, in environments without GPS signal coverage or with weak indoor textures, airborne visual sensor signals may also malfunction due to feature point loss or interference from complex dynamic obstructions (such as construction machinery and personnel), leading to severe distortion in the extracted 3D engineering quantities.
[0107] In the current technological context, the comparison and detection of UAV visual data and construction progress is typically based on manual offline comparison or simple BIM geometric collision algorithms. However, both of these methods heavily rely on the subjective experience of humans; the accuracy of quantity calculations based on conventional 3D geometric collision largely depends on the ideality of the site environment; most critically, existing technologies mistakenly define "3D point cloud" merely as a front-end physical snapshot, failing to break down the structured barrier between the geometric space model and the bill of quantities, thus making it impossible to directly summarize the legal and economic implications of dynamic changes in the physical environment. Therefore, existing technologies inevitably lead to mis-detection and omission of project progress and prolonged disputes in the payment process.
[0108] In view of this, this application provides an intelligent management system for engineering contract progress and hidden works based on UAV multimodal perception, aiming to solve the core technical problem of how to break down the static structural isolation between three-dimensional physical space data and abstract contract terms. This solution constructs an automated mapping and trusted state triggering mechanism based on event-driven and engineering object-level semantics, enabling dynamic features extracted from UAV dynamic planning acquisition, high-precision reconstruction of weak texture environments, and object-level semantic segmentation to be directly mapped and hashed to structured contract payment nodes through a lightweight on-chain architecture. This completely eliminates the dependence of engineering quantity calculation on manual semantic transformation, realizing an end-to-end automated pipeline closed loop from underlying physical geometric quantitative changes to top-level contract performance decision signaling.
[0109] This intelligent management system for engineering contract progress and hidden works based on UAV multimodal perception can include a perception system, a computing platform, and a communication system. The perception system can include one or more airborne sensors that sense information about the surrounding environment of the engineering site. For example, the perception system can include one or more of the following: an airborne positioning system (such as GPS, BeiDou, or other visual inertial positioning systems, VIO), an inertial measurement unit (IMU), lidar, an infrared thermal imager, and a depth camera.
[0110] The intelligent management system for engineering contract progress and hidden works based on UAV multimodal perception is not a single isolated device, but a distributed hardware and software collaborative system composed of a cluster of cloud-based integrated scheduling servers, on-site edge computing gateways, and an airborne heterogeneous computing platform.
[0111] Specifically, the intelligent management system for engineering contract progress and hidden works based on UAV multimodal perception is logically divided into: a parsing module, a scheduling module, a processing module, a verification module, and a decision-making module, to fully cover the automated inspection pipeline for contract performance. From the perspective of underlying physical hardware, the intelligent management system for engineering contract progress and hidden works based on UAV multimodal perception further includes: a multi-core central processing unit (CPU), a graphics accelerator processor (GPU), a neural network processor (NPU), a hardware security module (HSM), an airborne flight control bus interface, and a large-capacity high-speed cache and persistent storage medium.
[0112] The parsing module is used to parse unstructured construction contract text and to extract and convert it into a structured contract terms set that includes spatial location constraint attributes. At the physical hardware level, the natural language processing tasks of the parsing module are executed mainly on the Neural Processing Unit (NPU) or Tensor Processing Unit (TPU) of the cloud server. The NPU integrates a hardware multiply-accumulate (MAC) array optimized for deep learning matrix multiplication, which allows it to load massive amounts of unstructured engineering text into high-speed SRAM with very low latency. It then performs efficient vectorization and entity extraction by calling pre-loaded named entity recognition model weights, and finally writes the parsed structured contract terms set to the persistent storage medium via the system bus for downstream use.
[0113] The scheduling module is the core of the "event-driven model." It monitors the timeline status and business event signaling of the structured contract terms set; when the preset triggering conditions are met, it generates the initial UAV flight mission command. At the implementation level, the scheduling module can be a high-priority state machine logic engine running on a multi-core central processing unit (CPU). The CPU receives business inspection messages from the mobile terminals of on-site construction personnel in real time and polls the contract timeline in conjunction with clock interrupt signals. Once the triggering conditions are met (such as a concealed works inspection request or the arrival of a schedule node), the CPU computes and generates the initial UAV flight mission command and transmits it to the UAV's underlying flight controller via the airborne flight control bus interface. The flight control system then automatically performs precise and efficient data acquisition along the generated trajectory, thereby upgrading the UAV from a passive "flying camera" to an "automatic contract performance inspector".
[0114] The processing module is the core computing hub of this device and the carrier of the "AI parsing algorithm oriented towards contract terms." It acquires the multimodal image datasets collected by drones, performs 3D reconstruction and object-level semantic segmentation on these datasets, and extracts and outputs a feature set of completed engineering quantities that corresponds directly to the contract's bill of quantities. Simultaneously, it generates the lightweight core evidence digest hash value.
[0115] In terms of hardware architecture deployment, the computing power of the processing module is dynamically allocated across two physical layers: the "edge" and the "cloud". (1) At the edge, the processing module relies on the edge computing station (such as an FPGA-based computing unit) deployed at the construction site, and uses hardware-level operator accelerators to process high-definition video streams in real time and in parallel, remove blurry and unusable footage, and generate a lightweight edge-preprocessed multimodal dataset.
[0116] (2) In the cloud, the processing module is mainly mapped to a supercomputing cluster composed of multiple graphics accelerator processors (GPUs). The massive stream processors and video memory inside the GPUs work together to carry out the large-scale backpropagation gradient calculations during multi-view rendering training of the Depth-Regularized Planar Gaussian Splatting algorithm (D-PGSR). At the same time, the GPUs perform feature map convolution operations on the backbone network to quickly achieve object-level semantic segmentation (i.e., to determine the quality status), and complete three-dimensional Boolean difference operations through register accumulation. Finally, the feature set of completed engineering quantities is output in memory. Furthermore, when the processing module generates the lightweight core evidence digest hash value, the cryptographic instruction set of the CPU 106 is invoked to perform high-throughput hash operations.
[0117] Some or all of the system's data flow functions are controlled by the computing platform. In one implementation, the processing module is one or more processors, such as an application processor (AP), an application-specific integrated circuit (ASIC), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, memory, a video codec, a digital signal processor (DSP), a baseband processor, and / or a neural network processing unit (NPU). Different processing units can be independent devices or integrated into one or more processors. The controller can generate operation control signals based on the instruction opcode and timing signals to control instruction fetching and execution. The processor's cache memory can store recently used or repeatedly used instructions or data, reducing processor latency. In another implementation, the processor can implement certain functions through the logical relationships of hardware circuits, such as a field-programmable gate array (FPGA). To support AI-based parsing algorithms for contract terms and high-fidelity 3D reconstruction, hardware circuits designed for deep learning in artificial intelligence can be employed, such as Neural Processing Units (NPUs) and Tensor Processing Units (TPUs). Furthermore, the computing platform may include memory, and some or all of the processors can call instructions from memory to execute the end-to-end automated closed-loop logic proposed in this invention. The system, through a communication system, enables cross-domain information exchange between the airborne platform and cloud scheduling servers, edge computing stations, engineering consortium blockchain nodes, and third-party financial ERP systems.
[0118] The controller can serve as a central nervous system and command center. Based on the instruction opcode and timing signals, the controller generates operation control signals to control instruction fetching and execution. The processor may also include memory for storing instructions and data. In some embodiments, the processor's memory is a cache memory. This memory can store instructions or data that the processor has just used or that is used repeatedly. If the processor needs to reuse the instruction or data, it can directly retrieve it from the memory. This avoids repeated accesses, reduces processor waiting time, and thus improves system efficiency.
[0119] In another implementation, the processing module can achieve certain functions through the logical relationships of hardware circuits. These hardware circuit relationships can be fixed or reconfigurable. For example, the processor can be a hardware circuit implemented using an Application-Specific Integrated Circuit (ASIC) or a Programmable Logic Device (PLD), such as a Field-Programmable Gate Array (FPGA). In reconfigurable hardware circuits, the processor loads a configuration document to configure the hardware circuit. Furthermore, for the complex quadratic programming and semi-positive definite matrix decomposition involved in the trajectory optimization of this application, hardware circuits designed for artificial intelligence or high-performance matrix operations can also be used, such as Neural Processing Units (NPUs), Tensor Processing Units (TPUs), and Deep Learning Processing Units (DPUs). The aforementioned processors can call and execute instructions from memory to quickly construct a safe corridor and output the globally optimal trajectory.
[0120] The evidence consolidation module is the executor that implements the "lightweight on-chain" architecture. It is used to broadcast the lightweight core evidence digest hash value to the blockchain node network for consensus and evidence storage, and to obtain a blockchain evidence storage receipt. At the physical entity level, the evidence consolidation module relies heavily on a Hardware Security Module (HSM). The module uses the HSM within a Trusted Execution Environment (TEE) to digitally sign the transaction payload containing the lightweight core evidence digest hash value, and then broadcasts the signed message to the physical server nodes of each party via a network interface. Because only the core evidence digest is put on the blockchain, the module retains the immutability of the blockchain while avoiding the storage bottleneck that would result from putting massive amounts of engineering point cloud data on-chain.
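The "digest only" idea can be illustrated with a minimal sketch. It assumes SHA-256 over a canonical JSON encoding as the digest function, which this application does not specify. The function names, field names, and sample values are hypothetical. The payload fields follow the encapsulation described in claim 6 (digest, payment node ID, previous block hash). HSM signing and broadcasting are omitted.

```python
import hashlib
import json


def evidence_digest(features: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON encoding of the features.

    SHA-256 and JSON canonicalization are assumptions made for illustration.
    """
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_transaction(digest: str, payment_node_id: str, prev_block_hash: str) -> dict:
    """Bundle only the small on-chain fields; raw point clouds stay off-chain."""
    return {
        "evidence_digest": digest,
        "payment_node_id": payment_node_id,
        "prev_block_hash": prev_block_hash,
    }


# Hypothetical feature set for one building component.
features = {"component": "pier-03", "volume_m3": 128.4}
digest = evidence_digest(features)
tx = build_transaction(digest, "PAY-007", "00" * 32)
```

Sorting the keys before hashing means that the same feature set always yields the same digest regardless of the order in which fields were produced, which matters if several parties recompute the digest independently.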
[0121] The decision-making module is the final output stage that achieves the "end-to-end automated closed loop." It is used to generate and push an automatic payment decision signal when it detects that the completed engineering quantity feature set and the blockchain evidence storage receipt satisfy the preset conditions of the structured contract clause set. At the system interaction level, after the logic unit within the CPU completes the numerical comparison instruction for the contract engineering quantity, the decision module generates the automatic payment decision signal, which is then pushed directly to a third-party financial ERP system (such as SAP or Oracle) through a security gateway layer. The entire process is connected automatically through the data flow, minimizing human intervention and delays.
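A minimal sketch of the numerical comparison is given here, using the preset condition stated in step S64 of claim 7: the ratio of measured completed quantity to total design quantity is at least 95%, and the digital handover of all associated concealed works is confirmed. The class and function names are hypothetical, and reading the ratio as an aggregate over all associated progress nodes is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ProgressNode:
    measured_quantity: float  # quantity measured from the reconstruction
    design_quantity: float    # total design quantity for this node


def payment_approved(nodes, handover_confirmed, receipt, threshold=0.95):
    """Return True when the assumed reading of the S64 condition is met.

    nodes: progress nodes associated with the payment clause
    handover_confirmed: one boolean per associated concealed work
    receipt: blockchain evidence storage receipt (empty or None means missing)
    """
    if not receipt or not nodes:
        return False
    total_design = sum(n.design_quantity for n in nodes)
    if total_design <= 0:
        return False
    ratio = sum(n.measured_quantity for n in nodes) / total_design
    # all() of an empty list is True, so a clause with no concealed works passes.
    return ratio >= threshold and all(handover_confirmed)


nodes = [ProgressNode(96.0, 100.0), ProgressNode(47.0, 50.0)]
approved = payment_approved(nodes, [True, True], receipt="receipt-001")
```

In this sketch the returned boolean stands in for the automatic payment decision signal; pushing it to an ERP system through a security gateway is outside its scope.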
[0122] It is understood that the structures illustrated in the embodiments of this application do not constitute a specific limitation on the intelligent management system for engineering contract progress and concealed works based on UAV multimodal perception. In other embodiments of this application, the system may include more or fewer hardware components than illustrated. The illustrated logic modules can all be implemented through hardware cooperation, with computer-executable program code pre-burned into the storage medium and then loaded and executed by a processor.
[0123] A storage module can be used to store computer executable program code and data, the executable program code including instructions. The storage module may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk storage device or flash drive.
[0124] Furthermore, the logical instructions in the aforementioned memory can be implemented as software functional units and sold or used as independent products, and can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0125] It is understood that the embodiments of this application do not constitute a specific limitation on the intelligent management system for engineering contract progress and hidden works based on UAV multimodal perception. In other embodiments of this application, the device may include more or fewer components than illustrated (e.g., adding a display panel for human-computer interaction, including LCD, OLED, AMOLED, MiniOLED, etc.), or combining certain components, or splitting certain components, or arranging different components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
[0126] Example 3 is further described in conjunction with Example 1, with the structure shown in Figure 2. Figure 2 is a schematic diagram of the structure of a computer device provided in an embodiment of this application. The computer device includes a processor, a memory, a communication bus, and a computer program stored in the memory and executable on the processor.
[0127] When executing the program, the processor can call the computer program in the memory to implement the intelligent management method for engineering contract progress and concealed works based on UAV multimodal perception provided in the above embodiments. The method includes: S1, parsing the unstructured construction contract text, and extracting and converting it into a structured contract clause set containing spatial location constraint attributes; S2, monitoring the timeline status and business event signaling of the structured contract clause set, and generating an initial UAV flight mission command when a preset trigger condition is met; S3, obtaining the multimodal image dataset collected by the UAV while executing the initial UAV flight mission command, performing 3D reconstruction and object-level semantic segmentation on the multimodal image dataset, and extracting and outputting a completed engineering quantity feature set; S4, extracting and generating a lightweight core evidence digest hash value from the completed engineering quantity feature set; S5, broadcasting the lightweight core evidence digest hash value to the blockchain node network for consensus and evidence storage, and obtaining a blockchain evidence storage receipt; S6, when it is detected that the completed engineering quantity feature set and the blockchain evidence storage receipt satisfy the preset condition of the structured contract clause set, generating and pushing an automatic payment decision signal.
[0128] Furthermore, the computer device also includes a communication interface, which is used for communication between the memory and the processor.
[0129] The memory may include high-speed RAM, and may also include non-volatile memory, such as at least one disk drive.
[0130] If the memory, processor, and communication interface are implemented independently, they can be interconnected via a bus to communicate with each other. The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of representation, the bus in Figure 2 is drawn as a single thick line, but this does not mean that there is only one bus or one type of bus.
[0131] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).
[0132] Display devices are used to display images, videos, etc. Display devices may include display panels, which can be liquid crystal displays (LCDs), organic light-emitting diodes (OLEDs), active-matrix organic light-emitting diodes (AMOLEDs), flexible light-emitting diodes (FLEDs), MiniLEDs, MicroLEDs, Micro-OLEDs, quantum dot light-emitting diodes (QLEDs), etc.
[0133] Alternatively, in a specific implementation, if the memory, processor, and communication interface are integrated on a single chip, then the memory, processor, and communication interface can communicate with each other through an internal interface.
[0134] In another aspect, embodiments of this application also provide a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the above-mentioned intelligent management method for engineering contract progress and concealed works based on UAV multimodal perception. The method includes: S1, parsing the unstructured construction contract text, and extracting and converting it into a structured contract clause set containing spatial location constraint attributes; S2, monitoring the timeline status and business event signaling of the structured contract clause set, and generating an initial UAV flight mission command when a preset trigger condition is met; S3, obtaining the multimodal image dataset collected by the UAV while executing the initial UAV flight mission command, performing 3D reconstruction and object-level semantic segmentation on the multimodal image dataset, and extracting and outputting a completed engineering quantity feature set; S4, extracting and generating a lightweight core evidence digest hash value from the completed engineering quantity feature set; S5, broadcasting the lightweight core evidence digest hash value to the blockchain node network for consensus and evidence storage, and obtaining a blockchain evidence storage receipt; S6, when it is detected that the completed engineering quantity feature set and the blockchain evidence storage receipt satisfy the preset condition of the structured contract clause set, generating and pushing an automatic payment decision signal.
[0135] In yet another aspect, this application also provides a computer program product, which includes a computer program that can be stored on a computer-readable storage medium. The computer program includes computer instructions. When the computer program is executed by a processor, the computer can execute the intelligent management method for engineering contract progress and concealed works based on UAV multimodal perception provided in the above embodiments. The method includes: S1, parsing the unstructured construction contract text, and extracting and converting it into a structured contract clause set containing spatial location constraint attributes; S2, monitoring the timeline status and business event signaling of the structured contract clause set, and generating an initial UAV flight mission command when a preset trigger condition is met; S3, obtaining the multimodal image dataset collected by the UAV while executing the initial UAV flight mission command, performing 3D reconstruction and object-level semantic segmentation on the multimodal image dataset, and extracting and outputting a completed engineering quantity feature set; S4, extracting and generating a lightweight core evidence digest hash value from the completed engineering quantity feature set; S5, broadcasting the lightweight core evidence digest hash value to the blockchain node network for consensus and evidence storage, and obtaining a blockchain evidence storage receipt; S6, when it is detected that the completed engineering quantity feature set and the blockchain evidence storage receipt satisfy the preset condition of the structured contract clause set, generating and pushing an automatic payment decision signal.
[0136] The logic and/or steps represented in the flowchart or otherwise described herein may, for example, be considered a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them).
[0137] For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device. More specific examples of computer-readable media (a non-exhaustive list) include: an electrical connection having one or more wires (electronic device), a portable computer disk (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing. Additionally, a computer-readable medium can even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically by optically scanning the paper or other medium, followed by editing, interpreting, or otherwise processing it as necessary, and then stored in a computer memory.
[0138] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.
[0139] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.
[0140] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
[0141] Computer systems can include clients and servers. Clients and servers are generally located far apart and typically interact through communication networks. Client-server relationships are created by computer programs running on the respective computers and having a client-server relationship with each other.
[0142] It should be understood that the various forms of processes shown above can be used with steps reordered, added, or deleted. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed herein can be achieved; no limitation is imposed in this respect.
[0143] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Although embodiments of this application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting this application. Those skilled in the art can make changes, modifications, substitutions, and variations to the above embodiments within the scope of this application.
Claims
1. A method for intelligent management of engineering contract progress and concealed works based on UAV multimodal perception, characterized in that it includes the following steps: S1, parsing the unstructured construction contract text, and extracting and converting it into a structured contract clause set containing spatial location constraint attributes; S2, monitoring the timeline status and business event signaling of the structured contract clause set, and generating an initial UAV flight mission command when a preset trigger condition is met; S3, obtaining the multimodal image dataset collected by the UAV while executing the initial UAV flight mission command, performing 3D reconstruction and object-level semantic segmentation on the multimodal image dataset, and extracting and outputting a completed engineering quantity feature set; S4, extracting and generating a lightweight core evidence digest hash value from the completed engineering quantity feature set; S5, broadcasting the lightweight core evidence digest hash value to the blockchain node network for consensus and evidence storage, and obtaining a blockchain evidence storage receipt; S6, when it is detected that the completed engineering quantity feature set and the blockchain evidence storage receipt satisfy the preset condition of the structured contract clause set, generating and pushing an automatic payment decision signal.
2. The intelligent management method for engineering contract progress and concealed works based on UAV multimodal perception according to claim 1, characterized in that parsing the unstructured construction contract text and extracting and converting it into a structured contract clause set containing spatial location constraint attributes includes: calling a deep-learning-based pre-trained named entity recognition model, combined with a predefined engineering-industry-specific dictionary, to extract key entities; and converting the key entities into the structured contract clause set based on preset regular expression templates and entity mapping rules.
3. The intelligent management method for engineering contract progress and concealed works based on UAV multimodal perception according to claim 1 or 2, characterized in that, after the initial UAV flight mission command is generated, the method further includes: taking the initial UAV flight mission command as an initial seed point, and calculating the optimal orthogonal plane of the bounding box of the target component in the imported 4D Building Information Model to generate a desired set of orthographic observation viewpoints; solving the optimal visual-inertial state vector of the UAV based on a visual-inertial navigation system; and using an incremental exploration algorithm in conjunction with the desired set of orthographic observation viewpoints to generate an optimal B-spline flight trajectory.
4. The intelligent management method for engineering contract progress and concealed works based on UAV multimodal perception according to claim 1, characterized in that, in step S3, the multimodal image dataset is obtained as follows: S31, obtaining the original image sequence through the edge computing station deployed at the project site, calling the Laplacian variance operator to perform motion blur detection, and removing images whose variance is lower than a dynamically set threshold; S32, calculating the overlap between the area actually covered by the UAV flight path and the planned mission area, and triggering an alarm signal when the overlap is lower than a preset overlap threshold; S33, fusing the qualified image sequences to generate the multimodal image dataset.
5. The intelligent management method for engineering contract progress and concealed works based on UAV multimodal perception according to claim 1, characterized in that performing 3D reconstruction and object-level semantic segmentation on the multimodal image dataset, and extracting and outputting the completed engineering quantity feature set, proceeds as follows: S41, using a depth-regularized planar Gaussian splatting algorithm for 3D reconstruction; S42, during multi-view rendering training, using an image gradient filter to calculate the gradient magnitude over the entire image, and generating a weak-texture region mask where the gradient magnitude is lower than the low-frequency texture threshold; S43, within the weak-texture region mask, using the monocular depth map feature matrix output by a pre-trained depth estimation network as ground truth to constrain the Gaussian volume rendering depth, thereby generating a current dense point cloud model; S44, performing multi-level feature extraction and pixel-level annotation on the current dense point cloud model to generate object-level semantic segmentation label features; S45, extracting the known static obstacle depth map of the initial real-scene 3D baseline model and differencing it pixel by pixel against the depth map rendered from the current dense point cloud model; when the depth difference of a target object is detected to be greater than a set threshold and its semantic category does not belong to the bill of quantities, identifying the object as a dynamic occlusion and stripping it from the object-level semantic segmentation label features; S46, performing a three-dimensional Boolean difference operation between the stripped object-level semantic segmentation label features and the basic engineering body map to calculate volume or area integrals, and outputting the completed engineering quantity feature set.
6. The intelligent management method for engineering contract progress and concealed works based on UAV multimodal perception according to claim 1, characterized in that the process of generating the lightweight core evidence digest hash value and consolidating it by consensus is as follows: S51, encapsulating the lightweight core evidence digest hash value, the specific payment node ID of the associated structured contract clause set, and the hash value of the previous block into a smart contract transaction request; S52, broadcasting the smart contract transaction request to the engineering consortium blockchain node network, verifying its legality through a Practical Byzantine Fault Tolerance (PBFT) consensus mechanism, and writing it into the latest block after more than two-thirds of the nodes have verified it, thereby obtaining the blockchain evidence storage receipt.
7. The intelligent management method for engineering contract progress and concealed works based on UAV multimodal perception according to claim 1, characterized in that, in step S6, the process performed before the automatic payment decision signal is generated and pushed, including the determination of the preset condition, is as follows: S61, synchronously rendering the current dense point cloud model on an engineering digital twin collaborative platform based on WebGL technology; S62, in response to the user's click on a semantically segmented building component in the 3D view, triggering a reverse spatial index addressing algorithm; S63, by mapping point cloud coordinates, retrieving from the database and displaying in a pop-up window the original images, the completed engineering quantity feature set, and the blockchain evidence storage receipt bound to the building component; S64, determining whether the preset condition is met: the ratio of the measured completed work volume of all associated progress nodes to the total design work volume is greater than or equal to 95%, and the digital handover status of all associated concealed works is confirmed.
8. An intelligent management system for engineering contract progress and concealed works based on UAV multimodal perception, characterized in that it includes: a parsing module, used to parse the unstructured construction contract text, and to extract and convert it into a structured contract clause set containing spatial location constraint attributes; a scheduling module, used to monitor the timeline status and business event signaling of the structured contract clause set, and to generate an initial UAV flight mission command when a preset trigger condition is met; a processing module, used to obtain the multimodal image dataset collected by the UAV while executing the initial UAV flight mission command; the processing module is also used to perform 3D reconstruction and object-level semantic segmentation on the multimodal image dataset, and to extract and output a completed engineering quantity feature set; the processing module is also used to extract and generate a lightweight core evidence digest hash value from the completed engineering quantity feature set; an evidence consolidation module, used to broadcast the lightweight core evidence digest hash value to the blockchain node network for consensus and evidence storage, and to obtain a blockchain evidence storage receipt; and a decision module, used to generate and push an automatic payment decision signal when it is detected that the completed engineering quantity feature set and the blockchain evidence storage receipt satisfy the preset condition of the structured contract clause set.
9. A computer device comprising at least one processor coupled to at least one memory storing at least one computer program or instruction, characterized in that the computer program or instruction is loaded and executed by the processor to implement the steps of the intelligent management method for engineering contract progress and concealed works based on UAV multimodal perception as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program or instructions which, when executed by a processor, implement the steps of the intelligent management method for engineering contract progress and concealed works based on UAV multimodal perception as described in any one of claims 1 to 7.
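For illustration only, the blur screening of step S31 in claim 4 can be sketched in plain Python. The sketch assumes a 4-neighbour Laplacian kernel and grayscale images given as nested lists of at least 3 x 3 pixels, and it takes the threshold as a parameter, because the claims do not specify how the threshold is set dynamically. The function names are hypothetical.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response over interior pixels.

    gray: 2D list of intensities, at least 3 x 3. A low variance suggests blur.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def keep_sharp(images, threshold):
    """Drop images whose Laplacian variance is below the threshold."""
    return [img for img in images if laplacian_variance(img) >= threshold]


# A flat image has no edges; a checkerboard has strong edges everywhere.
flat = [[10] * 4 for _ in range(4)]
checker = [[255 if (x + y) % 2 == 0 else 0 for x in range(4)] for y in range(4)]
sharp_only = keep_sharp([flat, checker], threshold=100.0)
```

A production pipeline would more likely compute the same statistic with an image-processing library on full-resolution frames; the pure-Python form is used here only to keep the sketch self-contained.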