Construction site dynamic guidance correction method based on augmented reality and real-time BIM
By collecting multimodal data in real time using AR devices and combining it with SLAM and deep learning models, deviations in construction components are identified and correction instructions are generated. This solves the problem of low efficiency in construction quality control in existing technologies, realizes automated and visualized correction guidance on construction sites, and improves construction accuracy and efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- JILIN JIANZHU UNIVERSITY
- Filing Date
- 2025-11-27
- Publication Date
- 2026-06-26
Figure CN121597015B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of intelligent construction technology, and more specifically, relates to a dynamic guidance and correction method for building construction sites based on augmented reality and real-time BIM.
Background Technology
[0002] In modern construction management practices, the requirements for construction precision, efficiency, and quality control are becoming increasingly stringent. The increasing complexity of building structures and the intricate details of design pose significant challenges to traditional methods of on-site guidance and quality inspection based on two-dimensional engineering drawings. Specifically, construction workers must expend considerable effort mentally converting two-dimensional drawings into three-dimensional physical structures. This process is highly susceptible to human error, leading to problems such as component installation deviations from design requirements, conflicts in pipeline layouts, and inconsistencies in structural component dimensions. Consequently, on-site quality inspection and correction often heavily rely on manual comparison using measuring tools. This process is not only inefficient but also fails to provide comprehensive and timely verification of construction results, creating potential for rework, delays, and cost overruns. This severely restricts the overall production efficiency and lean management level of the construction industry.
[0003] To address these challenges, Building Information Modeling (BIM) technology has emerged and developed significantly. BIM provides accurate 3D digital models of buildings and their components, establishing a unified digital benchmark that greatly improves collaborative efficiency during the design phase and conflict detection capabilities during the pre-construction phase. Building on this foundation, some advanced exploratory solutions attempt to combine BIM models with Augmented Reality (AR) technology, aiming to bring precise digital model information to the actual construction site. Through AR devices, virtual BIM models can be accurately overlaid onto the real construction scene, providing on-site workers with intuitive and visual references and guidance. This combination, to some extent, overcomes the comprehension barriers of traditional 2D drawings, enabling construction workers to more directly refer to the 3D digital model for component positioning and installation, thereby improving the accuracy and efficiency of construction guidance at the visual level.
[0004] Therefore, how to construct a technical method that can acquire the actual status of the construction site in real time, compare it with a high-precision BIM model in an automated and quantitative manner, and then identify deviations in real time, generate operable correction instructions, and finally form a closed-loop dynamic guidance and correction mechanism has become a key challenge and a technical problem that needs to be solved by those skilled in the art.
Summary of the Invention
[0005] According to a first aspect of the present invention, the present invention claims protection for a dynamic guidance and correction method for building construction sites based on augmented reality and real-time BIM, comprising the following steps:
[0006] S1: an AR device collects and preprocesses multimodal data of the construction site in real time; a simultaneous localization and mapping (SLAM) algorithm builds a three-dimensional environmental map of the current work area in real time and tracks the pose of the AR device itself.
[0007] S2: load the BIM model and perform initial spatial registration: the BIM data corresponding to the current construction task is loaded into the data processing unit, and the theoretical design pose information acquired in real time by the AR device and the known reference point data of the construction site are analyzed to achieve an initial coarse registration of the BIM model with the AR device coordinate system, thereby establishing a common reference coordinate system.
[0008] S3: identify construction components and estimate their poses in real time: a pre-trained deep learning object detection and instance segmentation model analyzes the multimodal data, identifies the various target construction components in real time, and calculates their three-dimensional spatial position and actual pose relative to the coordinate system of the AR device.
[0009] S4: quantify the deviation between the actual state and the BIM model: the actual pose of the target construction component is compared with the theoretical design pose information in a unified reference coordinate system, and the deviation of the component's actual spatial position and orientation from the theoretical position and orientation in the BIM model is quantified.
[0010] S5: determine whether the deviation exceeds the threshold and generate dynamic correction instructions: each quantified deviation value is compared with the geometric tolerance threshold of the corresponding construction component, as preset in the BIM model or defined in the engineering specifications; if a threshold is exceeded, the current construction component is determined to have a deviation that requires correction, and dynamic correction instructions are automatically generated in combination with the preset component correction strategy.
[0011] S6: the dynamic correction instructions are superimposed in real time, in the form of virtual objects, on the real construction scene in the construction personnel's field of view through the AR device, and the AR device updates the displayed information in real time.
[0012] S7: while the construction personnel perform the component correction operation, the construction site data is continuously re-collected in real time, the component pose is identified, and the post-correction deviation is quantified. When all deviation values converge to within the preset tolerance threshold, the correction is determined to be successful, and a prompt message is displayed on the interface of the AR device. The final corrected component pose, the correction process data, and the related deviation reports are automatically stored in the BIM model database or project management system to obtain a complete construction quality control log.
[0013] Furthermore, the method also includes:
[0014] In S1, the RGB video stream, depth information and point cloud data of the construction site are captured in real time. The AR device performs precise time synchronization of data from different sensors through a synchronization controller, and performs preliminary spatial filtering, noise reduction and distortion correction on the collected raw data.
[0015] The AR device integrates an RGB camera, a depth sensor, an inertial measurement unit (IMU), and a system-on-a-chip (SoC) with a dedicated neural network processing unit (NPU).
[0016] The AR device, based on IMU data and visual odometry information, uses the Simultaneous Localization and Mapping (SLAM) algorithm to build a three-dimensional environmental map of the current work area in real time and tracks the AR device's own pose in the three-dimensional environmental map.
[0017] The BIM data in S2 includes the precise three-dimensional geometric information, semantic information, material properties, and engineering design dimensional tolerance requirements of the components to be constructed.
[0018] For each identified target construction component, the data processing unit further combines the RGB video, the depth information, and the point cloud data generated by SLAM to execute a pose estimation algorithm.
[0019] Furthermore, the real-time acquisition and preprocessing of multimodal data from the construction site in S1 includes:
[0020] The AR device's depth sensor acquires the depth information in real time at a frame rate no lower than a preset frame rate;
[0021] The IMU includes a three-axis accelerometer, a three-axis gyroscope, and a three-axis magnetometer, which measure the linear acceleration, angular velocity, and magnetic field direction of the AR device in real time at a preset sampling frequency.
[0022] The NPU integrated into the SoC is used to locally perform real-time sensor data fusion, partial preprocessing, and preliminary deep learning inference tasks.
[0023] The synchronization controller employs a hardware-level or operating system-level timestamp synchronization mechanism to ensure that all sensor data are aligned on a strict timeline.
[0024] The initial preprocessing includes spatial filtering using median filtering or bilateral filtering algorithms, and noise reduction using statistical outlier removal (SOR) or voxel downsampling methods. The SOR algorithm is configured with a neighborhood number K of 50 and a standard deviation factor of 1.0.
[0025] Distortion correction utilizes the intrinsic parameter matrix and distortion coefficients obtained in advance through camera calibration to eliminate radial and tangential distortions in the RGB video and the depth information.
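As a non-limiting illustration of the statistical outlier removal (SOR) step described above, the following sketch filters a small point cloud using the stated configuration (neighborhood number K of 50, standard deviation factor of 1.0). The function name `statistical_outlier_removal` and the brute-force neighbor search are assumptions made for readability; they are not taken from the original disclosure, and a practical system would likely use a spatial index or an existing point cloud library.

```python
import math


def statistical_outlier_removal(points, k=50, std_factor=1.0):
    """Keep points whose mean distance to their k nearest neighbors is not unusually large.

    points: list of (x, y, z) tuples. Hypothetical helper, brute-force O(n^2) search.
    """
    n = len(points)
    if n <= 1:
        return list(points)
    k = min(k, n - 1)  # cannot use more neighbors than exist

    # Mean distance from each point to its k nearest neighbors.
    mean_dists = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dists.append(sum(dists[:k]) / k)

    # Global statistics of those mean distances.
    mu = sum(mean_dists) / n
    sigma = math.sqrt(sum((d - mu) ** 2 for d in mean_dists) / n)

    # A point is kept if its mean distance is within mu + std_factor * sigma.
    limit = mu + std_factor * sigma
    return [p for p, d in zip(points, mean_dists) if d <= limit]
```

A smaller standard deviation factor removes points more aggressively; the value of 1.0 given above is comparatively strict.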
[0026] Furthermore, the simultaneous localization and mapping (SLAM) algorithm in S1 adopts a visual-inertial odometry (VIO) fusion architecture, combining pre-integrated IMU measurements with visual feature matching or direct image alignment to achieve robust pose tracking and map building.
[0027] The front-end visual odometry of the SLAM system of the AR device estimates the relative motion of the AR device by tracking feature points in the scene;
[0028] The backend of the SLAM system employs a nonlinear optimization method to jointly optimize the pose and map points of all keyframes in order to correct accumulated errors and generate a consistent 3D environment map.
[0029] The three-dimensional environment map is a sparse point cloud, a dense point cloud, or a grid-based representation.
[0030] Furthermore, the data acquisition in S1 also includes acquiring dense three-dimensional point cloud data of the construction site through an external high-precision laser scanner wirelessly connected to the AR device;
[0031] The data processing unit fuses the local point cloud data acquired by the laser scanner with the environmental map constructed by the AR device through SLAM using a point cloud registration algorithm;
[0032] The point cloud registration algorithm includes the Iterative Closest Point (ICP) algorithm and its variants;
[0033] The fused point cloud data is used for component pose estimation in S3 and quantitative calculation of geometric dimension deviation in S4 to achieve deviation detection.
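As a non-limiting illustration of point cloud registration by the Iterative Closest Point (ICP) algorithm, the following sketch aligns a source point set to a target point set. For brevity it is restricted to two dimensions, where the best rigid transform for matched pairs has a closed form; the method described above operates on three-dimensional point clouds, for which the rotation is typically obtained from a singular value decomposition. All function names are hypothetical.

```python
import math


def apply_rigid(points, theta, tx, ty):
    """Rotate 2D points by theta (radians) and translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def best_rigid_2d(src, dst):
    """Closed-form least-squares rigid transform mapping paired points src -> dst."""
    n = len(src)
    cxs = sum(x for x, _ in src) / n
    cys = sum(y for _, y in src) / n
    cxd = sum(x for x, _ in dst) / n
    cyd = sum(y for _, y in dst) / n
    s_dot = s_cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cxs, ys - cys
        bx, by = xd - cxd, yd - cyd
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    tx = cxd - (c * cxs - s * cys)
    ty = cyd - (s * cxs + c * cys)
    return theta, tx, ty


def icp_2d(source, target, max_iter=20, tol=1e-9):
    """Point-to-point ICP: match nearest neighbors, solve, apply, repeat."""
    current = list(source)
    prev_err = float("inf")
    err = prev_err
    for _ in range(max_iter):
        # Correspondence step: nearest target point for every source point.
        matched = [min(target, key=lambda q: math.dist(p, q)) for p in current]
        # Alignment step: best rigid transform for the current correspondences.
        theta, tx, ty = best_rigid_2d(current, matched)
        current = apply_rigid(current, theta, tx, ty)
        err = sum(math.dist(p, q) for p, q in zip(current, matched)) / len(current)
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return current, err
```

ICP converges only to a local optimum, which is consistent with performing a coarse registration first and then refining it.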
[0034] Furthermore, the real-time identification and high-precision pose estimation of construction components in S3 includes:
[0035] The deep learning object detection and instance segmentation model adopts a real-time detection and segmentation network based on the convolutional neural network (CNN) or Transformer architecture.
[0036] The deep learning object detection and instance segmentation model outputs a semantic segmentation mask for the component, and extracts the geometric information of a single component from the point cloud data;
[0037] The pose estimation algorithm fuses feature point matching and optimization, a RANSAC-based PnP algorithm, and the Iterative Closest Point (ICP) algorithm;
[0038] The pose estimation algorithm continuously tracks the target construction component, adapting to the dynamic changes of the component during construction.
[0039] Specific feature extraction strategies are adopted for different types of components. For components with planar features, a plane fitting algorithm is used to extract their surface normal vectors and positions.
[0040] For components with edge or corner features, edge detection and corner detection algorithms are used to extract key geometric features, and depth information is combined to construct a three-dimensional feature point cloud.
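As a non-limiting illustration of extracting the surface normal vector and position of a planar component, the following sketch fits a plane to a set of three-dimensional points by least squares. It assumes the plane can be written as z = a·x + b·y + c, which does not hold for near-vertical surfaces such as walls; in that case the fit would have to be performed in a rotated frame or by a principal component method. The names `det3` and `fit_plane` are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c; returns (unit normal, centroid)."""
    n = len(points)
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)

    # Normal equations of the least-squares problem.
    a_mat = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    rhs = [sxz, syz, sz]
    d = det3(a_mat)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point set: cannot fit z = a*x + b*y + c")

    # Cramer's rule: replace one column at a time with the right-hand side.
    coeffs = []
    for col in range(3):
        m = [row[:] for row in a_mat]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    a, b, _ = coeffs

    norm = math.sqrt(a * a + b * b + 1.0)
    normal = (-a / norm, -b / norm, 1.0 / norm)  # oriented toward +z
    centroid = (sx / n, sy / n, sz / n)          # used as the plane position
    return normal, centroid
```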
[0041] Furthermore, the quantitative calculation of the deviation between the actual state and the BIM model in S4 includes:
[0042] The quantitative calculation of deviations includes translational deviations ΔX, ΔY, and ΔZ, which represent the actual offset distance of the component in the X, Y, and Z axis directions, and rotational deviations ΔRoll, ΔPitch, and ΔYaw, which represent the actual rotational angle differences of the component in the three principal axes.
[0043] The data processing unit calculates the dimensional deviation of the component by extracting the geometric shape parameters of the component from real-time point cloud data based on the geometric definition of the component in the BIM model and comparing them with the theoretical dimensions in the BIM model.
[0044] All deviation values are output in specific numerical form, and confidence intervals can be provided.
[0045] When comparing component poses, geometric topology analysis is performed on the components in the BIM model to identify the local coordinate system and the transformation relationship between the local coordinate system and the global coordinate system. The calculation of translation and rotation deviations is performed in the local coordinate system of the component.
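As a non-limiting illustration of the deviation quantification described above, the following sketch computes the translational deviations ΔX, ΔY, ΔZ and the rotational deviations ΔRoll, ΔPitch, ΔYaw of an actual pose relative to a design pose, expressed in the local coordinate system of the component. The Z-Y-X (yaw-pitch-roll) Euler angle convention and all function names are assumptions; the original text does not specify an angle convention.

```python
import math


def rotation_matrix(roll, pitch, yaw):
    """Rotation R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def pose_deviation(design_pos, design_rpy, actual_pos, actual_rpy):
    """Deviation of the actual pose from the design pose in the design (local) frame."""
    r_design_t = transpose(rotation_matrix(*design_rpy))
    r_actual = rotation_matrix(*actual_rpy)

    # Translation offset expressed along the component's own axes.
    offset = [a - d for a, d in zip(actual_pos, design_pos)]
    dx, dy, dz = matvec(r_design_t, offset)

    # Relative rotation, then Euler angle extraction (Z-Y-X convention).
    r_rel = matmul(r_design_t, r_actual)
    d_pitch = -math.asin(max(-1.0, min(1.0, r_rel[2][0])))
    d_roll = math.atan2(r_rel[2][1], r_rel[2][2])
    d_yaw = math.atan2(r_rel[1][0], r_rel[0][0])
    return {"dx": dx, "dy": dy, "dz": dz,
            "d_roll": d_roll, "d_pitch": d_pitch, "d_yaw": d_yaw}
```

Expressing the offset in the local frame means that a deviation along the component's own axis is reported as such, regardless of how the component is oriented in the global coordinate system.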
[0046] Furthermore, the threshold judgment and dynamic correction instruction generation in S5 include:
[0047] The dynamic correction command includes the correction direction, the specific movement distance, or the precise adjustment angle;
[0048] The inverse kinematics algorithm or geometric transformation algorithm based on the optimization principle is used to generate dynamic correction instructions, and the shortest path and minimum adjustment amount required to adjust the actual component pose to the theoretical pose of the BIM model are calculated.
[0049] The instructions also depend on the type of the component and its physical adjustable range;
[0050] The dynamic correction instruction generation is also based on the operability constraints of the construction site. If the deviation of the component exceeds the reasonable range of manual correction, it will automatically prompt that auxiliary mechanical equipment is needed for correction and generate guidance information for the operators of these mechanical equipment.
[0051] The correction strategy has multiple preset priorities, prioritizing translational adjustments before rotational adjustments;
[0052] The system employs an instruction generation module, which includes a predefined knowledge base that stores calibration experience and operating procedures for different component types, materials, and installation methods.
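As a non-limiting illustration of the threshold judgment and instruction generation described above, the following sketch compares each deviation with its tolerance, emits translational adjustments before rotational adjustments, and flags the need for auxiliary mechanical equipment when a translational deviation exceeds a manual correction limit. The data layout, the name `generate_correction`, and the default limit of 50 (in the same unit as the translational deviations) are hypothetical; the knowledge base and the optimization-based planning mentioned above are not modeled.

```python
# Order encodes the preset priority: translation first, then rotation.
AXES = [
    ("dx", "translate", "X"),
    ("dy", "translate", "Y"),
    ("dz", "translate", "Z"),
    ("d_roll", "rotate", "roll"),
    ("d_pitch", "rotate", "pitch"),
    ("d_yaw", "rotate", "yaw"),
]


def generate_correction(deviation, tolerance, manual_limit=50.0):
    """Build an ordered list of correction steps for out-of-tolerance deviations.

    deviation, tolerance: dicts keyed by "dx", "dy", "dz", "d_roll", "d_pitch", "d_yaw".
    manual_limit: assumed largest translational deviation correctable by hand.
    """
    steps = []
    for key, action, axis in AXES:
        value = deviation[key]
        if abs(value) > tolerance[key]:
            # The correction is the opposite of the measured deviation.
            steps.append({"action": action, "axis": axis, "amount": -value})

    needs_machine = any(
        abs(deviation[key]) > manual_limit
        for key, action, _ in AXES
        if action == "translate"
    )
    return {
        "within_tolerance": not steps,
        "steps": steps,
        "needs_machine": needs_machine,
    }
```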
[0053] Furthermore, the augmented reality visualization guidance and real-time feedback in S6 include:
[0054] The virtual objects include directional arrows, numerical indicators, virtual models of target locations, color-coded prompts, and text prompts.
[0055] The directional arrow dynamically displays the direction of correction, and its length or color intensity is adjusted according to the magnitude of the deviation;
[0056] The target location virtual model renders the final target location of the component in a semi-transparent BIM model to contrast with the actual component, or displays it in the form of wireframe or ghost model overlay.
[0057] The color coding prompts are highlighted in red or orange when the component is in an out-of-threshold deviation state, and switch to yellow or green when the component is close to or reaches the tolerance range.
[0058] The virtual object also includes audio auxiliary commands that play voice prompts through the built-in speaker of the AR device;
[0059] The content displayed on the AR device is customized according to the construction personnel's professional level and current task.
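As a non-limiting illustration of the color-coded prompts and the deviation-dependent arrow length described above, the following sketch maps a deviation to a display color and an arrow length. The ratio boundaries (1.5 and 3.0 times the tolerance) and the length scaling values are assumptions chosen for the example; the original text states only which colors correspond to out-of-threshold and near-tolerance states.

```python
def deviation_color(deviation, tolerance, near_factor=1.5, severe_factor=3.0):
    """Map a deviation to a prompt color based on its ratio to the tolerance."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    ratio = abs(deviation) / tolerance
    if ratio <= 1.0:
        return "green"   # within tolerance
    if ratio <= near_factor:
        return "yellow"  # close to the tolerance range (assumed boundary)
    if ratio <= severe_factor:
        return "orange"  # clearly out of threshold (assumed boundary)
    return "red"         # severely out of threshold


def arrow_length(deviation, tolerance, base=0.05, max_length=0.5):
    """Arrow length grows with the deviation and is capped at max_length."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    return min(base * abs(deviation) / tolerance, max_length)
```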
[0060] Furthermore, the closed-loop verification and recording in S7 includes:
[0061] The correction effect is evaluated in real time, and the guidance information on the display interface of the AR device is dynamically updated.
[0062] When all deviation values converge to within the preset tolerance threshold, the correction is deemed successful, and a prompt message indicating that the correction is complete or meets the requirements is displayed on the interface of the AR device.
[0063] If the deviation still exceeds the threshold after the correction operation, S5 and S6 will continue to be executed in a loop until the component accuracy meets the requirements or the preset maximum number of iterations is reached, forming a continuous and adaptive closed-loop correction feedback mechanism.
[0064] The data processing unit automatically updates the corrected component information into the BIM model, thereby achieving real-time synchronization of the BIM model's as-built status.
[0065] The update includes marking or replacing the actual pose, dimensions, and any corrections of the components in the BIM model;
[0066] The recording module supports the generation of exportable, traceable quality reports in PDF or CSV format, which include construction date, time, operator ID, original deviation, correction amount, final deviation, and compliance assessment with the design tolerances of the BIM model.
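As a non-limiting illustration of the closed-loop verification and recording described above, the following sketch repeats measurement and correction until the deviation is within tolerance or a preset maximum number of iterations is reached, and then serializes the log as CSV. A single scalar deviation is used for brevity, and the callbacks `measure` and `apply_correction` are hypothetical stand-ins for re-acquisition by the AR device and the manual adjustment by the construction personnel. The report fields shown are a reduced subset of those listed above.

```python
import csv
import io


def closed_loop_correction(measure, apply_correction, tolerance, max_iterations=10):
    """Iterate measure -> judge -> correct; returns (success, log)."""
    log = []
    for iteration in range(1, max_iterations + 1):
        deviation = measure()
        compliant = abs(deviation) <= tolerance
        log.append({"iteration": iteration, "deviation": deviation, "compliant": compliant})
        if compliant:
            return True, log
        # Request a correction opposite to the measured deviation.
        apply_correction(-deviation)
    return False, log


def log_to_csv(log):
    """Serialize the correction log as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["iteration", "deviation", "compliant"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(log)
    return buffer.getvalue()
```

The iteration cap prevents the loop from running indefinitely when a component cannot be brought within tolerance by the available adjustments.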
[0067] This invention discloses a dynamic guidance and correction method for building construction sites based on augmented reality and real-time BIM, belonging to the field of intelligent construction technology. It involves real-time acquisition and preprocessing of multimodal data from the construction site using AR devices, constructing an environmental map using SLAM algorithms and tracking equipment poses, loading a BIM model and performing initial registration with the real space, identifying construction components using a deep learning model and estimating their actual poses, quantifying the deviation between the actual state and the theoretical BIM model, automatically generating dynamic correction instructions when the deviation exceeds a threshold, and displaying these instructions as virtual objects overlaid on the real scene using AR devices to guide construction personnel in on-site operations. Finally, closed-loop verification and recording are performed to form a complete construction quality control log. This invention achieves full automation and visualization from deviation detection and guidance correction to verification and recording, significantly improving construction accuracy and efficiency.
Attached Figure Description
[0068] Figure 1 is a flowchart of the dynamic guidance and correction method for building construction sites based on augmented reality and real-time BIM according to an embodiment of the present invention;
[0069] Figure 2 is a second workflow diagram of the dynamic guidance and correction method for building construction sites based on augmented reality and real-time BIM according to an embodiment of the present invention;
[0070] Figure 3 is a third workflow diagram of the dynamic guidance and correction method for building construction sites based on augmented reality and real-time BIM according to an embodiment of the present invention;
[0071] Figure 4 is a fourth workflow diagram of the dynamic guidance and correction method for building construction sites based on augmented reality and real-time BIM according to an embodiment of the present invention.
Detailed Implementation
[0072] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of the embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of this application.
[0073] The terms "first," "second," and "third" in this application are for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of those features. In the description of this application, "multiple" means at least two, such as two, three, etc., unless otherwise explicitly specified. All directional indications (such as up, down, left, right, front, back, etc.) in the embodiments of this application are only used to explain the relative positional relationships and movements between components in a specific orientation (as shown in the figures). If the specific orientation changes, the directional indications also change accordingly. Furthermore, the terms "including" and "having," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or devices.
[0074] References to embodiments herein mean that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a mutually exclusive, independent, or alternative embodiment. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0075] With the acceleration of industrialization and digital transformation in the construction industry, and the increasingly stringent requirements for construction accuracy, real-time response, and closed-loop control in application scenarios, the inherent limitations of the aforementioned BIM-AR integration solutions based on static visualization comparison are gradually becoming apparent. The core of existing AR-BIM technology lies in spatially registering a preset BIM model and then overlaying it onto the real world as virtual guidance information. Essentially, this is a one-way, passive information output mode that functions primarily as static guidance or reference. The reason is that such solutions lack a crucial dynamic correction mechanism: the system cannot actively identify, in real time, deviations between the physical building structure and the digital BIM model, nor can it generate targeted and actionable correction instructions based on these deviations to guide construction personnel in on-site adjustments. Specifically, when the actual position or orientation of components on the construction site differs from the BIM model, existing AR devices can visually display the differences between the virtual model and the actual object. However, this merely provides a visual comparison between what the structure should look like and what it looks like now; it does not give the system the ability to automatically identify what is wrong or to guide how to correct it.
[0076] A deeper contradiction lies in the gaps in the environmental perception and intelligent decision-making chains of existing AR-BIM systems. Although AR devices can capture real-time video streams and perform self-positioning and tracking, these capabilities primarily serve the accurate overlay of virtual models, rather than real-time, high-precision geometric and semantic analysis of actual construction components. In other words, the system lacks the ability to understand the physical objects being constructed on-site and cannot automatically and quantitatively identify the precise numerical deviations between the actual spatial location, size, or orientation of a component and its corresponding element in the BIM model. Therefore, even minute deviations, imperceptible to the naked eye but potentially having a cascading impact on subsequent construction, cannot be automatically detected and reported by the system. Once a deviation is detected, quantifying its degree, determining the correction direction, and calculating the correction amount still heavily rely on time-consuming comparisons and subjective judgments using auxiliary measurement tools. This passivity prevents errors during construction from being detected and accurately quantified in a timely manner, thus delaying correction opportunities and failing to fully realize the real-time and accuracy potential promised by AR technology. More importantly, this open-loop working model, lacking an automated, data-driven feedback loop between problem identification and problem-solving guidance, leads to bottlenecks in construction quality control efficiency and makes it difficult to achieve continuous, iterative improvements in accuracy. 
In the complex, ever-changing, and highly precision-demanding modern construction sites, the guiding value and efficiency-enhancing effects of this AR application model, which only displays information without providing correction, are insufficient to fully meet the industry's growing demand for refined management.
[0077] According to the first embodiment of the present invention, referring to Figure 1, the present invention claims protection for a dynamic guidance and correction method for building construction sites based on augmented reality and real-time BIM, comprising the following steps:
[0078] S1: an AR device collects and preprocesses multimodal data of the construction site in real time; a simultaneous localization and mapping (SLAM) algorithm builds a three-dimensional environmental map of the current work area in real time and tracks the pose of the AR device itself.
[0079] S2: load the BIM model and perform initial spatial registration: the BIM data corresponding to the current construction task is loaded into the data processing unit, and the theoretical design pose information obtained in real time by the AR device and the known reference point data of the construction site are analyzed to achieve an initial coarse registration of the BIM model with the AR device coordinate system, thereby establishing a common reference coordinate system.
[0080] S3: identify construction components and estimate their poses in real time: a pre-trained deep learning object detection and instance segmentation model analyzes the multimodal data, identifies the various target construction components in real time, and calculates their three-dimensional spatial position and actual pose relative to the AR device coordinate system.
[0081] S4: quantify the deviation between the actual state and the BIM model: the actual pose of the target construction component is compared with the theoretical design pose information in a unified reference coordinate system, and the deviation of the component's actual spatial position and orientation from the theoretical position and orientation in the BIM model is quantified.
[0082] S5: determine whether the deviation exceeds the threshold and generate dynamic correction instructions: each quantified deviation value is compared with the geometric tolerance threshold of the corresponding construction component, as preset in the BIM model or defined in the engineering specifications; if a threshold is exceeded, the current construction component is determined to have a deviation that requires correction, and dynamic correction instructions are automatically generated in combination with the preset component correction strategy.
[0083] S6: the dynamic correction instructions are displayed in real time, as virtual objects overlaid on the real construction scene in the construction personnel's field of view, through the AR device, and the AR device updates the displayed information in real time.
[0084] S7: while the construction personnel perform the component correction operation, the construction site data is continuously re-collected in real time, the component pose is identified, and the post-correction deviation is quantified. When all deviation values converge to within the preset tolerance threshold, the correction is deemed successful, and a prompt message is displayed on the AR device interface. The final corrected component pose, the correction process data, and the related deviation reports are automatically stored in the BIM model database or project management system to obtain a complete construction quality control log.
[0085] Furthermore, the method also includes:
[0086] In S1, the RGB video stream, depth information and point cloud data of the construction site are captured in real time. The AR device performs precise time synchronization of data from different sensors through a synchronization controller, and performs preliminary spatial filtering, noise reduction and distortion correction on the collected raw data.
[0087] The AR device integrates an RGB camera, a depth sensor, an inertial measurement unit (IMU), and a system-on-a-chip (SoC) with a dedicated neural network processing unit (NPU).
[0088] The AR device, based on IMU data and visual odometry information, uses the Simultaneous Localization and Mapping (SLAM) algorithm to build a three-dimensional environmental map of the current work area in real time and tracks the AR device's own pose in the three-dimensional environmental map.
[0089] The BIM data in S2 includes precise three-dimensional geometric information, semantic information, material properties, and engineering design dimensional tolerance requirements of the components to be constructed.
[0090] For each identified target construction component, the data processing unit further combines the RGB video, the depth information, and the point cloud data generated by SLAM to execute a pose estimation algorithm.
[0091] Furthermore, the real-time acquisition and preprocessing of multimodal data from the construction site in S1 includes:
[0092] The AR device's depth sensor acquires the depth information in real time at a frame rate no lower than a preset frame rate;
[0093] The IMU includes a three-axis accelerometer, a three-axis gyroscope, and a three-axis magnetometer, which measure the linear acceleration, angular velocity, and magnetic field direction of the AR device in real time at a preset sampling frequency.
[0094] The NPU integrated into the SoC is used to locally perform real-time sensor data fusion, partial preprocessing, and preliminary deep learning inference tasks.
[0095] The synchronization controller employs a hardware-level or operating system-level timestamp synchronization mechanism to ensure that all sensor data are aligned on a strict timeline.
[0096] The initial preprocessing includes spatial filtering using median filtering or bilateral filtering algorithms, and noise reduction using statistical outlier removal (SOR) or voxel downsampling methods. The SOR algorithm is configured with a neighborhood number K of 50 and a standard deviation factor of 1.0.
[0097] Distortion correction utilizes the intrinsic parameter matrix and distortion coefficients obtained in advance through camera calibration to eliminate radial and tangential distortions in the RGB video and the depth information.
[0098] Furthermore, the simultaneous localization and mapping (SLAM) algorithm in S1 adopts a visual-inertial odometry (VIO) fusion architecture, combining pre-integrated IMU measurements with visual feature matching or direct image alignment to achieve robust pose tracking and map building.
[0099] The front-end visual odometry of the SLAM system of the AR device estimates the relative motion of the AR device by tracking feature points in the scene;
[0100] The backend of the SLAM system employs a nonlinear optimization method to jointly optimize the pose and map points of all keyframes in order to correct accumulated errors and generate a consistent 3D environment map.
[0101] The three-dimensional environment map is a sparse point cloud, a dense point cloud, or a grid-based representation.
[0102] Furthermore, referring to Figure 2, the data acquisition in S1 also includes:
[0103] Dense 3D point cloud data of the construction site is collected by an external high-precision laser scanner that is wirelessly connected to the AR device.
[0104] The data processing unit fuses the local point cloud data acquired by the laser scanner with the environmental map constructed by the AR device through SLAM using a point cloud registration algorithm. The point cloud registration algorithm includes the Iterative Closest Point (ICP) algorithm and its variants.
[0105] The fused point cloud data is used for component pose estimation in S3 and quantitative calculation of geometric dimension deviation in S4 to achieve deviation detection.
[0106] In this embodiment, the core of this step lies in utilizing an industrial-grade augmented reality (AR) head-mounted display device to capture data about the construction site environment comprehensively and in multiple modalities. The AR device, for example the Magic Leap 2 or Microsoft HoloLens 2, integrates a high-resolution RGB camera, a high-precision depth sensor, an inertial measurement unit (IMU), and a system-on-a-chip (SoC) with a dedicated neural network processing unit (NPU). Specifically, the high-resolution RGB camera, for example with at least 12 million effective pixels and a frame rate of up to 60fps, captures real-time color video streams of the construction site, providing texture and semantic information. The high-precision depth sensor, such as a sensor based on the time-of-flight (ToF) principle or on the structured light principle, acquires three-dimensional depth images of the construction site in real time at a frame rate of at least 30fps, thereby generating high-density point cloud data. This point cloud data provides accurate spatial information for the geometric shape recognition and pose estimation of components. The IMU typically includes a three-axis accelerometer, a three-axis gyroscope, and a three-axis magnetometer, which measure the AR device's linear acceleration, angular velocity, and magnetic field direction in real time at sampling frequencies up to 1000Hz, thus providing high-frequency raw motion data for six-degree-of-freedom pose estimation. The NPU integrated into the SoC locally performs real-time sensor data fusion, some preprocessing, and preliminary deep learning inference tasks to reduce data transmission latency and improve system response speed.
[0107] The synchronization controller inside the AR device, for example using hardware-level or operating system-level timestamp synchronization mechanisms, ensures that all data acquired by the sensors are aligned on a strict timeline, with typical synchronization accuracy down to the microsecond level. The acquired raw data undergoes preliminary preprocessing. Spatial filtering removes outliers and noise from the depth image, using algorithms such as median filtering or bilateral filtering. The kernel size can be adjusted according to the noise characteristics, typically 3x3 or 5x5. Noise reduction effectively reduces point cloud density and eliminates random noise through methods such as statistical outlier removal or voxel downsampling. For example, the SOR algorithm can be configured with a neighborhood number K of 50 and a standard deviation factor of 1.0. Distortion correction uses the intrinsic parameter matrix and distortion coefficients obtained beforehand through camera calibration to perform geometric correction on the RGB and depth images, eliminating radial and tangential distortions and ensuring that straight lines in the image remain straight after correction, thereby improving the accuracy of feature extraction.
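As an illustration of the statistical outlier removal step, the following is a minimal brute-force sketch using the K = 50 and standard deviation factor 1.0 mentioned above as defaults. The function name `statistical_outlier_removal` is hypothetical, and the O(n²) neighbour search is an assumption made for brevity; a practical system would use a spatial index such as a k-d tree.

```python
import math

def statistical_outlier_removal(points, k=50, std_ratio=1.0):
    """Keep points whose mean distance to their k nearest neighbours
    does not exceed (global mean + std_ratio * global standard deviation)."""
    n = len(points)
    k = min(k, n - 1)
    if k < 1:
        return list(points)
    mean_dists = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point, sorted ascending.
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dists.append(sum(d[:k]) / k)
    mu = sum(mean_dists) / n
    sigma = math.sqrt(sum((m - mu) ** 2 for m in mean_dists) / n)
    limit = mu + std_ratio * sigma
    return [p for p, m in zip(points, mean_dists) if m <= limit]
```

A smaller `std_ratio` removes points more aggressively, while a larger `k` makes the per-point statistic less sensitive to local density changes.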
[0108] Based on IMU data and visual odometry (VO) information, the AR device uses a simultaneous localization and mapping (SLAM) algorithm, such as feature-point-based PTAM, direct method-based DSO, or LSD-SLAM, to construct a 3D environmental map of the current working area in real time. The SLAM system typically employs a visual-inertial odometry (VIO) fusion architecture, combining pre-integrated IMU measurements with visual feature matching or direct method image alignment to achieve robust pose tracking and map construction. The front-end visual odometry estimates the relative motion of the AR device by tracking feature points in the scene or directly utilizing changes in image pixel intensity. The back-end uses a nonlinear optimization method to jointly optimize the pose and map points of all keyframes to correct accumulated errors and generate a more consistent 3D environmental map. This 3D environmental map can be a sparse point cloud, a dense point cloud, or a grid-based representation. This SLAM process ensures that the AR device's pose within the 3D environmental map can be tracked with high precision in real time, achieving sub-centimeter accuracy in static environments and maintaining centimeter-level accuracy for short periods in dynamic environments.
[0109] In a preferred embodiment of the present invention, the data acquisition in S1 can also be achieved by acquiring dense 3D point cloud data of the construction site using an external high-precision laser scanner wirelessly connected to the AR device. The external laser scanner, for example, a handheld Leica BLK2GO or a ground-based Faro Focus S350, has a measurement accuracy better than 1 mm under typical working conditions, reaching 0.5 mm within a 10-meter range, and can provide high-density geometric point cloud data at a rate of hundreds of thousands or even millions of points per second. The laser scanner transmits data in real time to the AR device or data processing unit via wireless communication protocols such as Wi-Fi or Bluetooth. The data processing unit fuses the local high-precision point cloud data acquired by the laser scanner with the environmental map constructed by the AR device using SLAM through a point cloud registration algorithm. Commonly used point cloud registration algorithms include the Iterative Closest Point (ICP) algorithm and its variants, and registration algorithms based on the Normal Distributions Transform (NDT). These algorithms calculate the optimal rigid body transformation by minimizing the geometric distance or statistical difference between two point clouds. For example, the NDT algorithm converts the point cloud into a set of normal distributions and uses an optimization algorithm to find the optimal matching pose. The fused point cloud data combines the local high-precision advantage of a laser scanner with the real-time, wide-area coverage capability of an AR device's SLAM system, thus providing higher-precision three-dimensional geometric information for component pose estimation in S3 and quantitative calculation of geometric dimension deviations in S4.
[0110] Furthermore, the real-time identification and high-precision pose estimation of construction components in S3 includes:
[0111] The deep learning object detection and instance segmentation model adopts a real-time detection and segmentation network based on the convolutional neural network (CNN) or Transformer architecture.
[0112] The deep learning object detection and instance segmentation model outputs a semantic segmentation mask for the component, and extracts the geometric information of a single component from the point cloud data;
[0113] The pose estimation algorithm fuses feature point matching and optimization, the RANSAC-based PnP algorithm, and the Iterative Closest Point (ICP) algorithm;
[0114] The pose estimation algorithm continuously tracks the target construction component, adapting to the dynamic changes of the component during construction.
[0115] Specific feature extraction strategies are adopted for different types of components. For components with planar features, a plane fitting algorithm is used to extract their surface normal vectors and positions.
[0116] For components with edge or corner features, edge detection and corner detection algorithms are used to extract key geometric features, and depth information is combined to construct a three-dimensional feature point cloud.
[0117] In this embodiment, the task is to understand and quantify the physical world from a complex on-site environment. This requires transforming the environmental map, created in previous steps and filled with raw geometric data, into an object model rich in semantic information that can be directly manipulated by the program. This process integrates cutting-edge deep learning with classical 3D geometric computation.
[0118] The pre-trained deep learning model has been loaded into the neural network processing unit of the AR device or into a connected mobile computing unit. This model is a complex, hierarchical computation graph that has been trained on massive amounts of building component images and point cloud data, and has learned to map raw pixels and points to meaningful categories and contours.
[0119] When a real-time RGB video stream is input into the model's first stage, the model first performs object detection. It scans the image, identifies all areas that might contain construction components, and marks them with two-dimensional bounding boxes. More importantly, it predicts a category label for each detected object, such as a steel beam, precast slab, pipe, or rebar cage.
[0120] The second stage of the model, instance segmentation, is activated. This is a more refined task than object detection. It classifies each pixel within each detected bounding box, determining whether it belongs to the foreground or background. Ultimately, it generates an accurate mask for each individual component instance. This mask is a binary image where white pixels represent all points belonging to that component, and black pixels represent everything else.
[0121] Since the RGB images and depth maps have undergone strict spatiotemporal synchronization and coordinate alignment during acquisition, the system can directly project the two-dimensional instance segmentation mask onto the corresponding depth map and the three-dimensional environment point cloud generated by SLAM.
[0122] By using mask indexing and employing a binary mask as a filter, all 3D points labeled as belonging to the component are selected from the complete environmental point cloud. These selected points constitute the instance point cloud of the component. This point cloud cluster is spatially isolated from other objects and accurately represents the component's 3D geometry and surface texture in the real world.
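The mask-based selection described above can be sketched as follows. The sketch assumes a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`, a depth map already aligned pixel-for-pixel with the mask, and a depth value of 0 meaning "no measurement"; these conventions and the function name `masked_points` are assumptions for illustration only.

```python
def masked_points(depth, mask, fx, fy, cx, cy):
    """Back-project the depth pixels selected by a binary mask into camera-frame 3D points.

    depth: 2D list of depths in metres (0 means no measurement).
    mask:  2D list of 0/1 values from instance segmentation, same shape as depth.
    """
    points = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, m) in enumerate(zip(depth_row, mask_row)):
            if m and z > 0:
                # Pinhole back-projection of pixel (u, v) at depth z.
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points
```

The returned list plays the role of the instance point cloud; transforming it into the unified reference coordinate system would additionally require the device pose from SLAM.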
[0123] After obtaining the point cloud of the isolated target component, the most crucial step is to calculate its six-degree-of-freedom pose, that is, the component's position (X, Y, Z coordinates) and orientation in three-dimensional space. This is a typical 3D-to-3D registration problem.
[0124] Initial pose estimation requires a coarse initial transformation as the starting point for iterative optimization, achieved through a feature-based approach. The system extracts local geometric features from the instance point cloud of the component and corresponding features from the surface of the component's CAD model in the BIM model. Through feature matching, some corresponding point pairs are found between the two sets of point clouds. Then, using a robust algorithm such as RANSAC combined with PnP, an initial rotation and translation matrix that can align these corresponding points is estimated. This initial estimate may not be precise enough, but it provides a good starting point for subsequent fine-grained optimization, avoiding getting trapped in local optima.
[0125] After obtaining the initial pose, the system initiates a fine-tuning process, the most common method being the Iterative Closest Point (ICP) algorithm and its variants. The core idea of this algorithm is to iteratively perform the following steps:
[0126] Nearest point search: For each point in the instance point cloud, find the nearest corresponding point on the surface of the BIM CAD model.
[0127] Transformation estimation: Based on all found point pairs, calculate the optimal rigid body transformation, consisting of rotation matrix R and translation vector T, that minimizes the average distance between all point pairs.
[0128] Transformation application: Apply the calculated transformation to the instance point cloud, moving it one step closer to the CAD model.
[0129] This process is repeated until the change in the transformation between two successive iterations is less than a preset threshold, or the maximum number of iterations is reached. At this point, the instance point cloud and the BIM CAD model achieve optimal alignment, and the final accumulated transformation matrix describes the transformation from the measured instance point cloud to the BIM CAD model. Its inverse maps the model onto the measured component, which yields the precise six-degree-of-freedom pose of the component in a unified reference coordinate system.
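The three iterated steps can be illustrated with a deliberately simplified sketch. It works in 2D rather than 3D so that the least-squares rigid transformation has a closed form without a linear algebra library, represents the model as a point list instead of a CAD surface, and uses brute-force nearest-neighbour search. All function names (`best_rigid_2d`, `transform_2d`, `icp_2d`) are hypothetical, and this is not presented as the patented implementation.

```python
import math

def best_rigid_2d(src, dst):
    """Closed-form least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    s_cos = s_sin = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def transform_2d(points, theta, tx, ty):
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(instance, model, max_iter=50, tol=1e-9):
    """Align instance onto model; return ((theta, tx, ty), aligned_points)."""
    current = list(instance)
    for _ in range(max_iter):
        # Nearest point search (brute force).
        matched = [min(model, key=lambda q: math.dist(p, q)) for p in current]
        # Transformation estimation.
        theta, tx, ty = best_rigid_2d(current, matched)
        # Transformation application.
        current = transform_2d(current, theta, tx, ty)
        if abs(theta) < tol and math.hypot(tx, ty) < tol:
            break
    # Every step is rigid, so the accumulated transformation can be recovered
    # by aligning the original points with their final positions.
    return best_rigid_2d(instance, current), current
```

As with any ICP variant, the result depends on the quality of the initial pose: if the starting misalignment is large relative to the point spacing, the nearest-point matches are wrong and the iteration can settle in a local optimum.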
[0130] Furthermore, referring to Figure 3, the quantitative calculation of the deviation between the actual state and the BIM model in S4 includes:
[0131] The quantitative calculation of deviations includes translational deviations ΔX, ΔY, and ΔZ, which represent the actual offset distance of the component in the X, Y, and Z axis directions, and rotational deviations ΔRoll, ΔPitch, and ΔYaw, which represent the actual rotational angle differences of the component in the three principal axes.
[0132] The data processing unit calculates the dimensional deviation of the component based on the geometric definition of the component in the BIM model by extracting the geometric shape parameters of the component from real-time point cloud data, comparing them with the theoretical dimensions in the BIM model, and outputting all deviation values in specific numerical form, with confidence intervals also included.
[0133] When comparing component poses, geometric topology analysis is performed on the components in the BIM model to identify the local coordinate system and the transformation relationship between the local coordinate system and the global coordinate system. The calculation of translation and rotation deviations is performed in the local coordinate system of the component.
[0134] In this embodiment, the digital understanding of the real world obtained in the previous step is precisely and quantitatively compared with the idealized design blueprint. The output is no longer an image or point cloud, but an objective and measurable numerical indicator. These indicators are the direct basis for judging whether the construction quality is up to standard, and also the data foundation for generating correction instructions.
[0135] Before performing any calculations, the system ensures that all data are in the unified common reference coordinate system established in step S2.
[0136] The theoretical design pose of each component in the BIM model has been accurately placed in this virtual coordinate system, which strictly corresponds to the real world, through initial registration and refinement.
[0137] The actual pose of each component calculated through step S3 is itself expressed in this unified coordinate system.
[0138] This is similar to comparing the design drawings and the actual product measurements on the same standard coordinate paper, ensuring that the comparison benchmark is absolutely consistent and eliminating systematic errors caused by inconsistent coordinate systems.
[0139] Positional deviations mainly include translational deviations and rotational deviations, which together describe the degree to which a component's position and orientation in space deviate from the design requirements.
[0140] Translational deviation describes the linear displacement of the actual installed center of gravity or feature point of a component relative to its theoretical design position.
[0141] First, it is necessary to determine the benchmark point for comparison. For regular components, this is usually their geometric center or the origin of the local coordinate system defined in the BIM model. The calculation process is mathematically very intuitive: subtract the theoretical position coordinate vector (X_design, Y_design, Z_design) from the actual position coordinate vector (X_actual, Y_actual, Z_actual) of the component, and the resulting difference vector (ΔX, ΔY, ΔZ) is its translational deviation in the three principal axes of X, Y, and Z.
[0142] A positive ΔX value indicates that the component has shifted by that distance in the positive X-axis direction relative to its design position in actual space, while a negative value indicates a shift in the negative direction. The same logic applies to ΔY and ΔZ. The system typically outputs both the deviation of each axis and the overall Euclidean distance deviation, i.e., √(ΔX² + ΔY² + ΔZ²), which is a comprehensive positional deviation index.
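The translational deviation calculation above can be written as a short sketch. The function name `translational_deviation` and the use of millimetres in the example are assumptions; the arithmetic itself follows the description (actual minus design, plus the Euclidean norm).

```python
import math

def translational_deviation(actual, design):
    """Return ((dX, dY, dZ), overall Euclidean deviation) as actual minus design."""
    dx, dy, dz = (a - d for a, d in zip(actual, design))
    return (dx, dy, dz), math.sqrt(dx * dx + dy * dy + dz * dz)
```

For instance, an actual position of (1003, 2000, 496) against a design position of (1000, 2000, 500) gives per-axis deviations of (3, 0, -4) and an overall deviation of 5.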
[0143] Rotational deviation describes the angular difference between the actual orientation of a component during installation and its theoretical design orientation. It is usually expressed in Euler angles or similar three rotational angles about an axis.
[0144] This calculation is relatively complex, involving mathematical operations on the rotation matrix. Step S3 yields the total rotation matrix R_total, which rotates the component from its theoretical attitude to its actual attitude. By decomposing this rotation matrix, the system can solve for three independent rotation angle deviations (ΔYaw, ΔPitch, ΔRoll).
[0145] ΔYaw: Represents the deflection of a component on a horizontal plane, much like the angle at which a ship deviates from its course.
[0146] ΔPitch: Represents the pitch angle deviation of a component about a lateral axis (such as the axis in the left-right direction), much like the pitch angle of an airplane.
[0147] ΔRoll: Represents the roll angle deviation of a component about its longitudinal (front-to-rear) axis, much like the banking of an airplane's wings.
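Decomposing a rotation matrix into the three angle deviations depends on the Euler angle convention, which the description does not specify. The sketch below assumes the common Z-Y-X (yaw, then pitch, then roll) convention, R = Rz(yaw) · Ry(pitch) · Rx(roll), with R given as a 3x3 nested list; the function name `euler_zyx_deg` is hypothetical.

```python
import math

def euler_zyx_deg(R):
    """Decompose R = Rz(yaw) * Ry(pitch) * Rx(roll) into (yaw, pitch, roll) in degrees.

    Valid away from pitch = +/-90 degrees, where yaw and roll become coupled (gimbal lock).
    """
    pitch = math.atan2(-R[2][0], math.hypot(R[0][0], R[1][0]))
    roll = math.atan2(R[2][1], R[2][2])
    yaw = math.atan2(R[1][0], R[0][0])
    return math.degrees(yaw), math.degrees(pitch), math.degrees(roll)
```

Installation deviations are normally small angles, so the singularity near ±90° pitch is unlikely to matter in this use; a different convention would change which matrix entries are read.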
[0148] Besides location and orientation, the dimensions of a component after manufacturing or installation may also differ from the design. The system quantifies this dimensional deviation by analyzing point clouds of component instances extracted from real-world scenarios.
[0149] Geometric shape parameter extraction: Using a 3D point cloud processing algorithm, the basic geometric shape is fitted from the instance point cloud of the component, and its parameters are calculated.
[0150] For beam and column components, the central axis can be extracted using principal component analysis or linear fitting algorithms, and then the actual length of the axis can be calculated.
[0151] For slab and wall components, the dominant plane is obtained through a plane fitting algorithm, and the actual area of the plane is calculated, or its length and width are calculated through point cloud bounding boxes.
[0152] For complex curved surface components, a surface fitting algorithm is used to calculate key parameters such as the radius of curvature.
[0153] The comparison between theoretical and actual values involves extracting geometric parameters from the actual point cloud and comparing them one by one with the precise theoretical dimensions of the component defined in the BIM model. For example, the calculated length deviation equals the actual length minus the design length. This can identify problems such as steel beams being cut too long or too short, or dimensional errors in precast slabs.
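For beam and column components, the axis extraction and length comparison can be sketched as follows. The sketch finds the dominant principal direction of the instance point cloud by power iteration on the 3x3 covariance matrix (one simple way to carry out principal component analysis), then measures the extent of the points along that direction. The function names and the fixed starting vector are assumptions for illustration.

```python
import math

def principal_axis(points, iters=100):
    """Return (centroid, unit direction of largest variance) via power iteration."""
    n = len(points)
    c = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[sum((p[i] - c[i]) * (p[j] - c[j]) for p in points) / n
            for j in range(3)] for i in range(3)]
    # Arbitrary start vector; it must not be orthogonal to the dominant axis.
    v = [1.0, 0.7, 0.3]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return c, v

def axis_length(points):
    """Extent of the point cloud measured along its principal axis."""
    c, v = principal_axis(points)
    proj = [sum((p[i] - c[i]) * v[i] for i in range(3)) for p in points]
    return max(proj) - min(proj)

def length_deviation(points, design_length):
    """Length deviation = actual length minus design length."""
    return axis_length(points) - design_length
```

A positive result indicates a component that is longer than designed. Noise at the beam ends inflates the measured extent, so a real system would likely trim or fit the end faces rather than take the raw maximum and minimum.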
[0154] A mature system not only outputs deviation values but also assesses the reliability of these deviation values by attaching a confidence interval or uncertainty assessment to each calculated deviation value. This assessment is based on:
[0155] Point cloud quality: the density and noise level of the instance point cloud. The sparser the point cloud and the greater the noise, the lower the confidence level.
[0156] Convergence of the pose estimation algorithm: the final energy function value of optimization algorithms such as ICP is an important indicator. A smaller value indicates a better match and higher confidence.
[0157] Acquisition conditions: sensor status, lighting conditions during data acquisition, distance between the sensor and the component, etc.
[0158] For example, the output translation deviation ΔX might be +5.2mm ± 0.8mm, which indicates that the actual deviation is highly likely to be between 4.4mm and 6.0mm. This provides a more scientific basis for quality judgment and avoids overreacting to results with significant measurement noise.
[0159] Finally, step S4 generates a structured deviation data packet for each successfully identified and calculated component. This data packet contains at least:
[0160] Unique identifier for a component;
[0161] Translational deviations (ΔX, ΔY, ΔZ);
[0162] Rotational deviations (ΔRoll, ΔPitch, ΔYaw);
[0163] List of critical dimension deviations;
[0164] Confidence assessment of each deviation;
[0165] Timestamp;
[0166] This complete, quantified, and credibility-assessed deviation data packet will be transmitted in real time to the next decision-making step S5 as authoritative input to determine whether a corrective action needs to be triggered.
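One possible shape for such a deviation data packet is sketched below. The field names, the units, the JSON serialization, and the example identifier `beam-001` are all hypothetical; the description only lists which kinds of information the packet must contain.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Deviation:
    value: float        # signed deviation
    uncertainty: float  # half-width of the confidence interval, same unit

@dataclass
class DeviationPacket:
    component_id: str     # unique identifier of the component
    translation_mm: dict  # keys "dX", "dY", "dZ" -> Deviation
    rotation_deg: dict    # keys "dRoll", "dPitch", "dYaw" -> Deviation
    dimension_mm: dict    # critical dimension name -> Deviation
    timestamp_us: int

    def to_json(self):
        """Serialize the packet, including nested Deviation entries, to JSON."""
        return json.dumps(asdict(self), sort_keys=True)

packet = DeviationPacket(
    component_id="beam-001",
    translation_mm={"dX": Deviation(5.2, 0.8), "dY": Deviation(-0.4, 0.8),
                    "dZ": Deviation(1.1, 0.9)},
    rotation_deg={"dRoll": Deviation(0.0, 0.1), "dPitch": Deviation(0.2, 0.1),
                  "dYaw": Deviation(-0.8, 0.1)},
    dimension_mm={"length": Deviation(1.5, 0.5)},
    timestamp_us=1_700_000_000_000_000,
)
```

Keeping each value together with its uncertainty lets the decision step in S5 read both from one record instead of looking up confidence separately.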
[0167] Step S4 transforms abstract construction errors into concrete, objective, and operable engineering data, making data-driven automated quality control and intelligent decision-making possible.
[0168] Furthermore, the deviation exceeding the threshold judgment and dynamic correction instruction generation in S5 includes:
[0169] The dynamic correction command includes the correction direction, the specific movement distance, or the precise adjustment angle;
[0170] The inverse kinematics algorithm or geometric transformation algorithm based on the optimization principle is used to generate dynamic correction instructions, and the shortest path and minimum adjustment amount required to adjust the actual component pose to the theoretical pose of the BIM model are calculated.
[0171] The instructions also depend on the type of the component and its physical adjustable range;
[0172] The dynamic correction instruction generation is also based on the operability constraints of the construction site. If the deviation of the component exceeds the reasonable range of manual correction, it will automatically prompt that auxiliary mechanical equipment is needed for correction and generate guidance information for the operators of these mechanical equipment.
[0173] The correction strategy has multiple preset priorities, prioritizing translational adjustments before rotational adjustments;
[0174] The system employs an instruction generation module, which includes a predefined knowledge base that stores calibration experience and operating procedures for different component types, materials, and installation methods.
[0175] In this embodiment, the quantified deviation data from step S4 is received and, based on preset rules and intelligent strategies, a key judgment is made as to whether intervention is needed; if so, specific, feasible, and safe operation instructions are generated.
[0176] First, the system extracts the preset geometric tolerances for the current component from the loaded BIM model properties. These tolerances may be stored directly in the component's property set, for example, installation position tolerance: ±5mm, verticality tolerance: 3mm, length tolerance: -1mm to +2mm. If they are not explicitly specified in the BIM model, the system retrieves the tolerance requirements, according to the component's type, material, and function, from an integrated, updatable engineering specification knowledge base that draws on the relevant industry standards or project-specific specifications.
[0177] Each value in the deviation data packet received in step S4 is quickly and automatically compared with its corresponding tolerance threshold. If all deviation values, including translation, rotation, and dimensional deviations, are within their respective tolerance zones, the system determines that the component is installed correctly. This status is recorded and usually does not trigger any correction commands; it may be gently indicated in the AR viewpoint as a green virtual outline or other non-intrusive markers.
[0178] Once any deviation value is found to exceed its preset tolerance threshold, the system immediately determines that the component has a deviation and needs to be corrected. This is the switch that triggers a series of subsequent intelligent decision-making and instruction generation processes.
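The threshold judgment described above amounts to checking each deviation against its tolerance band. The sketch below assumes tolerances are given as (lower, upper) pairs keyed by deviation name, which also covers asymmetric bands such as -1mm to +2mm; the function name `out_of_tolerance` and the key names are hypothetical.

```python
def out_of_tolerance(deviations, tolerances):
    """Return the names of all deviations lying outside their (lower, upper) tolerance band."""
    return [name for name, value in deviations.items()
            if not tolerances[name][0] <= value <= tolerances[name][1]]
```

An empty result corresponds to a correctly installed component; any non-empty result is the trigger for instruction generation. For example, with deviations `{"dX": 5.2, "dY": -1.0, "length": 2.5}` and tolerances `{"dX": (-5, 5), "dY": (-5, 5), "length": (-1, 2)}`, the function reports `dX` and `length`.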
[0179] Once it is determined that correction is needed, the core task becomes generating one or more guidance instructions that can effectively eliminate the deviation.
[0180] A spatial transformation is calculated to move the component from its current incorrect pose to its theoretically correct pose. This process is similar to the inverse kinematics problem in robotics.
[0181] Given:
[0182] Current pose: P_current;
[0183] Target pose: P_target;
[0184] What needs to be solved is the correction transformation matrix T_correct, which transforms P_current to P_target.
[0185] Through mathematical calculation, this transformation can be precisely determined, and the transformation matrix can then be decomposed into easily understandable translation and rotation vectors. The translation vector directly indicates the direction and distance of the correction, while the rotation vector indicates which axis to rotate around and by how many degrees.
[0186] As a priority strategy, a preset correction order is followed, such as translating first and then rotating. This is because, during construction, correcting the position of components first can often naturally and partially correct their angular deviations, or at least provide a more reasonable benchmark for subsequent rotational adjustments.
[0187] Following the principle of minimum adjustment, the system attempts to calculate the shortest travel distance and minimum rotation angle required to achieve pose alignment. It evaluates different adjustment sequences and selects the path with the least overall movement to save time and effort for construction workers.
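One consistent way to read the correction transformation is the following, stated as an assumption rather than as the formula of the original text: if P_current and P_target are rigid poses, each made of a rotation matrix and a translation vector, then requiring T_correct · P_current = P_target gives T_correct = P_target · P_current⁻¹. The sketch below computes this with plain nested lists; all function names are hypothetical.

```python
import math

def transpose(R):
    return [list(row) for row in zip(*R)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def correction_transform(R_cur, t_cur, R_tgt, t_tgt):
    """Return (R_c, t_c) such that applying x -> R_c x + t_c moves the current pose onto the target."""
    # For a rotation matrix the inverse is the transpose.
    R_c = matmul(R_tgt, transpose(R_cur))
    rotated = matvec(R_c, t_cur)
    t_c = [t_tgt[i] - rotated[i] for i in range(3)]
    return R_c, t_c

def rotation_angle_deg(R):
    """Total rotation angle of R in degrees, from the trace (axis-angle representation)."""
    c = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))
```

Note that when a rotation is involved, the translation part `t_c` is expressed relative to the coordinate origin; for on-site guidance, the displacement of the component's own reference point, `t_tgt - t_cur`, is usually the more intuitive quantity to display.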
[0188] Furthermore, the augmented reality visualization guidance and real-time feedback in S6 include:
[0189] The virtual objects include directional arrows, numerical indicators, virtual models of target locations, color-coded prompts, and text prompts.
[0190] The directional arrow dynamically displays the direction of correction, and its length or color intensity is adjusted according to the magnitude of the deviation;
[0191] The target location virtual model renders the final target location of the component in a semi-transparent BIM model to contrast with the actual component, or displays it in the form of wireframe or ghost model overlay.
[0192] The color coding prompts are highlighted in red or orange when the component is in an out-of-threshold deviation state, and switch to yellow or green when the component is close to or reaches the tolerance range.
[0193] The virtual object also includes audio auxiliary commands that play voice prompts through the built-in speaker of the AR device;
[0194] The content displayed on the AR device is customized according to the construction personnel's professional level and current task.
[0195] In this embodiment, this step is the core interface for system-user interaction. It transforms the complex data, analysis results, and decision instructions generated by all previous steps into sensory information that construction personnel can intuitively, unambiguously, and efficiently understand. Through augmented reality technology, it overlays a digital information layer onto the real physical world, achieving a "what you see is what you get" approach to the correction process.
[0196] AR devices utilize optical display systems or video pass-through display systems to precisely blend computer-generated virtual objects with the real-world scene in the user's field of vision, achieving pixel-level accuracy. This visual presentation system is meticulously designed, adhering to principles of information visualization, human factors engineering, and cognitive load management.
[0197] For the dynamic generation and rendering of directional arrows, which are the most direct symbols guiding spatial movement, the system generates intelligent, dynamic 3D models rather than static arrows. A typical translation command generates a conical arrow pointing from the actual position of the component to the target position. The arrow shaft may be designed as tubular or beam-shaped to enhance the sense of three-dimensionality.
[0198] The physical length of the arrow is proportional to the distance to be moved. An arrow indicating a movement of 50 millimeters will be several times longer than an arrow indicating a movement of 10 millimeters, allowing the operator to intuitively predict the range of adjustment.
[0199] Arrow color is an important visual coding technique. Typically, red or bright orange is used to indicate a current error requiring immediate correction. A flashing effect can be used to emphasize critical instructions or instructions that haven't yet been executed, thus attracting the user's attention.
[0200] The direction of the arrow is determined by the correction vector calculated in step S5, accurately indicating the direction of movement in three-dimensional space. For complex non-linear paths, the system may generate a series of short arrows connected end to end, or a light band with directional indication, to guide an optimal movement trajectory.
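The arrow properties described above (direction from the correction vector, length proportional to the remaining distance, color coded by severity) can be sketched as a small function. The yellow band of up to twice the tolerance, the `scale` factor, and the function name `guidance_arrow` are assumptions chosen for illustration; the description does not fix these values.

```python
import math

def guidance_arrow(actual, target, tolerance_mm, scale=1.0):
    """Return a dict describing the arrow from the actual position to the target position."""
    vec = [t - a for a, t in zip(actual, target)]
    dist = math.sqrt(sum(v * v for v in vec))
    direction = tuple(v / dist for v in vec) if dist > 0 else (0.0, 0.0, 0.0)
    if dist <= tolerance_mm:
        color = "green"    # within tolerance
    elif dist <= 2 * tolerance_mm:
        color = "yellow"   # close to tolerance (assumed band)
    else:
        color = "red"      # correction required
    return {"direction": direction, "length": scale * dist, "color": color}
```

Calling this on every tracking update reproduces the dynamic behaviour: as the component approaches its target, the arrow shortens and the color moves from red through yellow to green.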
[0201] Numerical indicators and text prompts:
[0202] Arrows and ghost models provide qualitative guidance, while numerical indicators offer quantitative support. The system displays key data in clear font floating next to the component or near the arrow, such as: ΔX: +22mm, rotation required: -0.8°. This provides a digital benchmark for fine-tuning.
[0203] For complex multi-step operations or matters requiring special attention, the system displays brief text prompts. For example, Step 1: Loosen the bolt on side A; Step 2: Use a jack to lift to the indicated position. These text messages concretize abstract instructions into actionable steps.
[0204] To reduce over-reliance on the visual channel and provide a more natural interactive experience, the system integrates multiple feedback modes.
[0205] Audio-assisted instructions: The AR device's built-in speaker or bone conduction headphones can provide voice guidance. For example, when the system detects that the operator has begun to move, it may announce: "Please move the component slowly in the direction of the arrow." As the component approaches the target, the voice may change to: "Fine-tuning, 2 millimeters away."
[0206] The entire AR display is not a static image, but a dynamic interface that changes in sync with the physical world.
[0207] As construction workers begin moving the component, steps S1 through S4 continue running in the background. The AR device tracks the changes in the component's pose in real time. As a result, the guide arrows will dynamically shorten and change direction; the numerical indicators will count down in real time; and the color will transition from red, through yellow, and finally to green.
[0208] If the operator mistakenly moves the component in the opposite direction, the arrow will immediately reverse and may begin flashing a red warning, while a voice prompt will indicate the incorrect direction. This immediate feedback prevents erroneous operations and ensures that the correction process is carried out efficiently.
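One possible way to detect movement in the wrong direction is to compare the observed displacement of the component with the displacement still required, using the sign of their dot product. The sketch below is an assumption about how such a check could be implemented, not the method fixed by the embodiment; the function name and the 1 mm jitter threshold are hypothetical.

```python
import math
from typing import Sequence


def is_moving_wrong_way(prev_pos: Sequence[float],
                        curr_pos: Sequence[float],
                        target_pos: Sequence[float],
                        min_step_mm: float = 1.0) -> bool:
    """True if the last movement points away from the target position.

    min_step_mm (assumed value) suppresses warnings caused by tracking jitter.
    """
    move = [c - p for p, c in zip(prev_pos, curr_pos)]
    needed = [t - p for p, t in zip(prev_pos, target_pos)]
    step = math.sqrt(sum(m * m for m in move))
    if step < min_step_mm:
        return False  # too small to judge reliably
    # A negative dot product means the angle between the two vectors exceeds 90 degrees.
    return sum(m * n for m, n in zip(move, needed)) < 0


if __name__ == "__main__":
    print(is_moving_wrong_way((0, 0, 0), (-5, 0, 0), (22, 0, 0)))
    print(is_moving_wrong_way((0, 0, 0), (5, 0, 0), (22, 0, 0)))
```

When the function returns true, the display layer could reverse and flash the arrow and trigger the voice warning described above.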
[0209] Through step S6, the system successfully hides the complex algorithms and data analysis behind a simple, user-friendly, and efficient interactive interface. Construction workers do not need to understand SLAM, point cloud registration, or inverse kinematics. They only need to follow the clear virtual guidance in their field of vision to accurately place real components into the ideal positions marked by virtual outlines, just like playing a fill-in-the-blank game. This greatly reduces the technical threshold for high-precision construction and improves the accuracy and efficiency of the operation.
[0210] Furthermore, referring to Figure 4, the closed-loop verification and recording in S7 includes:
[0211] The correction effect is evaluated in real time, and the guidance information on the display interface of the AR device is dynamically updated.
[0212] When all deviation values converge to within the preset tolerance threshold, the correction is deemed successful, and a prompt message indicating that the correction is complete or meets the requirements is displayed on the interface of the AR device.
[0213] If the deviation still exceeds the threshold after the correction operation, S5 and S6 will continue to be executed in a loop until the component accuracy meets the requirements or the preset maximum number of iterations is reached, forming a continuous and adaptive closed-loop correction feedback mechanism.
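The bounded loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: `measure` and `guide` are hypothetical callables standing in for the re-measurement of deviations and for the generation and display of guidance (S5 and S6), and the default of 20 iterations is an assumed value.

```python
from typing import Callable, Dict, Tuple

Deviations = Dict[str, float]


def run_correction_loop(measure: Callable[[], Deviations],
                        guide: Callable[[Deviations], None],
                        tolerances: Deviations,
                        max_iterations: int = 20) -> Tuple[bool, int]:
    """Repeat measure -> guide until all deviations are within tolerance.

    Returns (success, number_of_guidance_rounds_issued).
    """
    rounds = 0
    while True:
        deviations = measure()
        if all(abs(deviations[k]) <= limit for k, limit in tolerances.items()):
            return True, rounds
        if rounds >= max_iterations:
            return False, rounds  # stop and let the caller escalate
        guide(deviations)
        rounds += 1


if __name__ == "__main__":
    # Toy stand-in for a worker who halves the remaining offset each round.
    state = {"dX": 22.0}
    result = run_correction_loop(
        measure=lambda: dict(state),
        guide=lambda d: state.update(dX=state["dX"] * 0.5),
        tolerances={"dX": 2.0},
    )
    print(result)
```

Returning the number of rounds alongside the success flag allows the same value to be written into the quality record.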
[0214] The data processing unit automatically updates the corrected component information into the BIM model to achieve real-time synchronization of the BIM model's as-built status. The update includes marking or replacing the component's actual pose, dimensions, and any corrections in the BIM model.
[0215] The recording module supports the generation of exportable, traceable quality reports in PDF or CSV format, which include construction date, time, operator ID, original deviation, correction amount, final deviation, and compliance assessment with the design tolerances of the BIM model.
[0216] In this embodiment, this step is the final stage of the system's intelligent and automated quality control, and the ultimate guarantee that the construction results fully meet the design requirements. It is not merely a simple termination signal, but a continuous, adaptive process of verification, learning, and recording, ultimately forming a complete digital twin closed loop. This step connects the previous six steps into a self-correcting and self-verifying organic whole, generating an immutable digital quality archive.
[0217] While the construction workers perform the correction operations according to the AR visualization guidance in step S6, the system does not stop; it starts a high-frequency background verification loop. This loop is the core manifestation of the system's intelligence.
[0218] Background monitoring loop startup: Once the system generates the first correction instruction in step S5 and begins displaying it to the user in step S6, a parallel, lightweight monitoring process is immediately activated. This process silently repeats the core tasks of steps S1 to S4 at a very high frequency:
[0219] (S1) Continuously acquire RGB images, depth information and IMU data from the site.
[0220] (S2) Maintain the registration status between the BIM model and reality.
[0221] (S3) Track the pose changes of the target component in real time, but at this time a faster tracking algorithm may be used instead of complete re-identification and segmentation to improve efficiency.
[0222] (S4) Quickly calculate the real-time deviation between the latest pose of the component and the BIM model.
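For illustration, the deviation calculation repeated in (S4) might look like the sketch below, which expresses the translational and rotational deviation in the local coordinate system of the design pose. Several points are assumptions not fixed by the embodiment: poses are given as a position plus a 3x3 local-to-world rotation matrix, deviation is defined as actual minus design, and the angles are extracted in a Z-Y-X (yaw, pitch, roll) convention. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def rot_z(deg: float) -> Matrix:
    """Rotation matrix for a rotation of `deg` degrees about the Z axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def pose_deviation(design_pos: Sequence[float], design_rot: Matrix,
                   actual_pos: Sequence[float], actual_rot: Matrix) -> Dict[str, float]:
    """Deviation of the actual pose from the design pose, in the design-local frame."""
    to_local = transpose(design_rot)  # inverse of a rotation matrix
    offset_world = [a - d for a, d in zip(actual_pos, design_pos)]
    dx, dy, dz = matvec(to_local, offset_world)
    rel = matmul(to_local, actual_rot)  # residual rotation seen from the design frame
    yaw = math.atan2(rel[1][0], rel[0][0])
    pitch = math.atan2(-rel[2][0], math.hypot(rel[2][1], rel[2][2]))
    roll = math.atan2(rel[2][1], rel[2][2])
    return {"dX": dx, "dY": dy, "dZ": dz,
            "dRoll": math.degrees(roll),
            "dPitch": math.degrees(pitch),
            "dYaw": math.degrees(yaw)}


if __name__ == "__main__":
    # Design pose rotated 90 degrees about Z; actual pose offset 22 mm and turned 0.8 degrees.
    print(pose_deviation([1000, 2000, 0], rot_z(90), [1000, 2022, 0], rot_z(90.8)))
```

Working in the local frame means that an offset along the axis of the component is reported as ΔX regardless of how the component is oriented on site.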
[0223] Dynamic feedback and guided updates: the latest deviation value, calculated in real time, is immediately fed back to the AR display interface and command generation logic.
[0224] The virtual objects in the construction workers' field of vision are no longer static. The length of the guide arrows dynamically shortens as the deviation decreases; the numbers in the numerical indicators decrease in real time like a countdown; the color of the components gradually transitions from red, representing exceeding the tolerance, to yellow, representing approaching the target. This immediate visual change provides operators with strong positive motivation and precise fine-tuning guidance.
[0225] Adaptive refinement of instructions: As the component gets closer to the target, the system-generated instructions evolve from coarse to fine adjustments. For example, the initial instruction might be to move 50 millimeters north, but when only 5 millimeters remain, the instruction would evolve into "Please fine-tune, move precisely north to alignment." This adaptive guidance ensures the final precision.
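A minimal sketch of the coarse-to-fine switch is given below. The 10 mm switching threshold and the function name are assumptions made for illustration; the wording of the two messages follows the example above.

```python
def instruction_text(remaining_mm: float, direction: str,
                     fine_threshold_mm: float = 10.0) -> str:
    """Return a coarse or fine instruction depending on the remaining offset.

    fine_threshold_mm is an assumed value, not specified by the embodiment.
    """
    if remaining_mm <= fine_threshold_mm:
        return f"Please fine-tune, move precisely {direction} to alignment."
    return f"Move {remaining_mm:.0f} millimeters {direction}."


if __name__ == "__main__":
    print(instruction_text(50, "north"))
    print(instruction_text(5, "north"))
```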
[0226] Intelligent judgment and positive feedback for successful calibration: The system continuously compares the latest real-time deviation value with the preset tolerance threshold. The system will make a final judgment of successful calibration only when all key deviation items simultaneously fall within their respective tolerance threshold ranges.
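The requirement that all key deviation items fall within their own tolerances at the same time can be illustrated as follows. The dictionary keys and the tolerance values in the usage example are hypothetical; in practice the thresholds would come from the BIM model or the engineering specifications.

```python
from typing import Dict, List, Tuple


def check_tolerances(deviations: Dict[str, float],
                     tolerances: Dict[str, float]) -> Tuple[bool, List[str]]:
    """Return (all_within_tolerance, list_of_failing_items)."""
    failing = [name for name, limit in tolerances.items()
               if abs(deviations[name]) > limit]
    return (not failing), failing


if __name__ == "__main__":
    measured = {"dX": 1.5, "dY": 0.4, "dYaw": 0.3}   # mm, mm, degrees
    limits = {"dX": 2.0, "dY": 2.0, "dYaw": 0.2}     # illustrative thresholds
    print(check_tolerances(measured, limits))
```

Returning the failing items, rather than a bare flag, lets the interface keep highlighting only the axes that still need adjustment.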
[0227] Once the success conditions are met, the AR interface undergoes a significant change. All warning red and guiding arrows and numerical values instantly disappear. The component is surrounded by a striking, green virtual halo, possibly bearing a checkmark icon or the words "Calibration Complete." The entire interface becomes clean and positive.
[0228] Simultaneously, the system will play a short, clear, and positive confirmation tone or voice prompt through the audio channel, such as "Calibration successful!" or "Component installation qualified!". This multi-sensory positive feedback provides construction workers with a clear signal that the task has been completed, improving job satisfaction.
[0229] If the operator's actions fail to bring the deviation to convergence, or if new errors are introduced during the adjustment process, the system continues to run this loop, constantly updating the guidance instructions. To prevent infinite loops, the system has preset safety mechanisms, such as a maximum number of iterations or a maximum operation time. If these limits are exceeded without success, the system prompts for escalation, for example: "The correction was not completed within the predetermined time. Please check for structural obstacles or damage to the components and contact the supervising engineer."
[0230] The system automatically completes all paperwork and generates detailed digital quality records.
[0231] The logging module captures and stores the entire lifecycle data of the correction event, forming a structured log entry. This entry typically contains:
[0232] Component identification information: Component ID, type, and location in the BIM model.
[0233] Time information: time of first problem detection, time of correction start, time of successful correction.
[0234] Personnel information: the ID of the operator who performed the calibration operation and, where applicable, the supervisor ID.
[0235] Deviation history: initial deviation value, key deviation snapshots during the correction process, and final deviation value.
[0236] Operation log: All instruction sequences generated by the system and the general actions performed by the operator.
[0237] Conformity assessment: a clearly stated approval conclusion, together with the conformity status of every deviation item against its tolerance.
[0238] Multimedia attachments: The system may automatically save an AR view screenshot or short video clip of the moment the calibration is successful as the most direct evidence.
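The structured log entry listed above could, for example, be represented as a simple record type. The sketch below is illustrative only: the class name, all field names, and the use of JSON for serialization are assumptions, and the embodiment does not prescribe a storage format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class CorrectionLogEntry:
    # Component identification
    component_id: str
    component_type: str
    bim_location: str
    # Time information (ISO 8601 strings assumed)
    detected_at: str
    correction_started_at: str
    correction_completed_at: Optional[str]
    # Personnel
    operator_id: str
    supervisor_id: Optional[str]
    # Deviation history
    initial_deviation: Dict[str, float]
    final_deviation: Dict[str, float]
    deviation_snapshots: List[Dict[str, float]] = field(default_factory=list)
    # Operation log, conformity result, and attachment paths
    instructions: List[str] = field(default_factory=list)
    conforms: bool = False
    attachments: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the entry for storage or upload."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    entry = CorrectionLogEntry(
        component_id="BEAM-001", component_type="steel beam",
        bim_location="Level 3 / Grid C-4",
        detected_at="2026-01-15T09:30:00", correction_started_at="2026-01-15T09:31:10",
        correction_completed_at="2026-01-15T09:36:45",
        operator_id="OP-07", supervisor_id=None,
        initial_deviation={"dX": 22.0}, final_deviation={"dX": 1.5},
        instructions=["Move 22 millimeters along +X"], conforms=True,
    )
    print(entry.to_json())
```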
[0239] Exportability and Traceability of Reports: All this data can be automatically aggregated into standard-format quality reports based on project needs. The most common formats are PDF inspection batch tables or CSV structured data tables. These reports can be printed, signed, and archived directly, or uploaded to a cloud-based project management system. Every piece of data in the report can be traced back to the original BIM model and sensor data via component ID or timestamp, achieving comprehensive, seamless digital traceability of construction quality throughout the entire process.
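As a sketch of the CSV export, the following example writes report rows with Python's standard `csv` module. The column names, the single-axis deviation values, and the definition of the correction amount as original deviation minus final deviation are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, List

REPORT_FIELDS = ["date", "time", "operator_id", "component_id",
                 "original_deviation_mm", "correction_amount_mm",
                 "final_deviation_mm", "tolerance_mm", "compliant"]


def make_row(date: str, time: str, operator_id: str, component_id: str,
             original_mm: float, final_mm: float, tolerance_mm: float) -> Dict[str, object]:
    """Build one report row; correction amount is assumed to be original minus final."""
    return {
        "date": date, "time": time,
        "operator_id": operator_id, "component_id": component_id,
        "original_deviation_mm": original_mm,
        "correction_amount_mm": original_mm - final_mm,
        "final_deviation_mm": final_mm,
        "tolerance_mm": tolerance_mm,
        "compliant": abs(final_mm) <= tolerance_mm,
    }


def export_csv(rows: List[Dict[str, object]]) -> str:
    """Return the report as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REPORT_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [make_row("2026-01-15", "09:30:00", "OP-07", "COL-12", 22.0, 1.5, 2.0)]
    print(export_csv(rows))
```

Keeping the component ID and timestamp in every row is what allows each entry to be traced back to the BIM model and the sensor data.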
[0240] Through step S7, the system not only completes the one-time calibration task, but more importantly, it constructs a continuous, high-quality closed loop from the physical site to the digital model, and then from the digital model to guide the physical site. It transforms the traditional quality control model, which relies on human experience and post-construction inspection, into a modern intelligent manufacturing model embedded in the construction process, real-time, data-driven, and capable of self-verification and recording. Ultimately, this significantly improves the precision, reliability, and overall value of building products.
[0241] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces, or indirect coupling or communication connection between apparatuses or units, and may be electrical, mechanical, or other forms.
[0242] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated units described above can be implemented in hardware or as software functional units. The above are merely embodiments of this application and do not limit the patent scope of this application. Any equivalent structural or procedural transformations made based on the description and drawings of this application, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.
[0243] The specific embodiments of the invention have been described in detail above, but they are only examples, and this application is not limited to the specific embodiments described above. For those skilled in the art, any equivalent modifications or substitutions to the invention are also within the scope of this application. Therefore, all equivalent changes, modifications, and improvements made without departing from the spirit and principles of this application should be covered within the scope of this application.
Claims
1. A dynamic guidance and correction method for building construction sites based on augmented reality and real-time BIM, characterized in that, Includes the following steps: S1, AR devices are used to collect and preprocess multimodal data of the construction site in real time. Through synchronous positioning and map building SLAM algorithm, a three-dimensional environmental map of the current work area is built in real time, and the pose of the AR device itself is tracked. S2, Load the BIM model and initial registration space, load the BIM data corresponding to the current construction task into the data processing unit, and analyze the theoretical design pose information acquired in real time by the AR device and the known reference point data of the construction site to achieve the initial coarse registration of the BIM model and the AR device coordinate system to establish a common reference coordinate system. S3, real-time identification of construction components and estimation of pose, using a pre-trained deep learning target detection and instance segmentation model to analyze multimodal data, real-time identification of various target construction components and calculation of their three-dimensional spatial position and actual pose relative to the coordinate system of the AR device; S4. Quantify the deviation between the actual state and the BIM model. Compare the actual pose of the target construction component with the theoretical design pose information in a unified reference coordinate system. Quantify the deviation of the actual spatial position and posture of the target construction component from the theoretical position and posture of the BIM model. 
S5, determine if the deviation exceeds the threshold, generate dynamic correction instructions, compare the various quantitative deviation values with the geometric tolerance thresholds of the corresponding construction components preset in the BIM model or defined in the engineering specifications, if they exceed the thresholds, determine that the current construction component has a deviation and needs to be corrected, and automatically generate dynamic correction instructions in combination with the preset component correction strategy. S6, The dynamic correction command is superimposed on the real construction scene in the field of view of the construction personnel in the form of a virtual object through the AR device in real time, and the AR device updates the display information in real time. S7. During the component correction operation by the construction personnel, the construction site data is continuously and in real time re-collected, the component position is identified, and the deviation after correction is quantified and calculated. When all deviation values converge to within the preset tolerance threshold, the correction is determined to be successful, and a prompt message is displayed on the interface of the AR device. The finally corrected component position, correction process data, and related deviation reports are automatically stored in the BIM model database or project management system to obtain a complete construction quality control log. The quantitative calculation of the deviation between the actual state and the BIM model in S4 includes: The quantitative calculation of deviations includes translational deviations ΔX, ΔY, and ΔZ, which represent the actual offset distance of the component in the X, Y, and Z axis directions, and rotational deviations ΔRoll, ΔPitch, and ΔYaw, which represent the actual rotational angle differences of the component in the three principal axes. 
The data processing unit calculates the dimensional deviation of the component by extracting the geometric shape parameters of the component from real-time point cloud data and comparing them with the theoretical dimensions in the BIM model, based on the geometric definition of the component in the BIM model. All deviation values are output in specific numerical form, and confidence intervals can be provided. When comparing component poses, geometric topology analysis is performed on the components in the BIM model to identify the local coordinate system and the transformation relationship between the local coordinate system and the global coordinate system. The calculation of translation and rotation deviations is performed in the local coordinate system of the component.
2. The method according to claim 1, characterized in that, Also includes: In S1, the RGB video stream, depth information and point cloud data of the construction site are captured in real time. The AR device performs precise time synchronization of data from different sensors through a synchronization controller, and performs preliminary spatial filtering, noise reduction and distortion correction on the collected raw data. The AR device integrates an RGB camera, a depth sensor, an inertial measurement unit (IMU), and a system-on-a-chip (SoC) with a dedicated neural network processing unit (NPU). The AR device, based on IMU data and visual odometry information, uses the Simultaneous Localization and Mapping (SLAM) algorithm to build a three-dimensional environmental map of the current work area in real time and tracks the AR device's own pose in the three-dimensional environmental map. The BIM data in S2 includes the precise three-dimensional geometric information, semantic information, material properties, and engineering design dimensional tolerance requirements of the components to be constructed. For each identified target construction component, the data processing unit further combines the RGB video, the depth information, and the point cloud data generated by SLAM to execute a pose estimation algorithm.
3. The method according to claim 2, characterized in that, The real-time acquisition and preprocessing of multimodal data from the construction site in S1 includes: The AR device's RGB camera acquires the depth information in real time at a frame rate no lower than a preset frame rate; The IMU includes a three-axis accelerometer, a three-axis gyroscope, and a three-axis magnetometer, which measures the linear acceleration, angular velocity, and magnetic field direction of the AR device itself in real time at a preset sampling frequency. The NPU integrated into the SoC is used to locally perform real-time sensor data fusion, partial preprocessing, and preliminary deep learning inference tasks. The synchronization controller employs a hardware-level or operating system-level timestamp synchronization mechanism to ensure that all sensor data are aligned on a strict timeline. The initial preprocessing includes spatial filtering using median filtering or bilateral filtering algorithms, and noise reduction using statistical outlier removal (SOR) or voxel downsampling methods. The SOR algorithm is configured with a neighborhood number K of 50 and a standard deviation factor of 1.0. Distortion correction utilizes the intrinsic parameter matrix and distortion coefficients obtained in advance through camera calibration to eliminate radial and tangential distortions in the RGB video and the depth information.
4. The method according to claim 2, characterized in that, The simultaneous localization and mapping (SLAM) algorithm in S1 adopts a visual-inertial odometry (VIO) fusion architecture, which combines pre-integrated IMU measurements with visual feature matching or direct image alignment to achieve robust pose tracking and map building. The front-end visual odometry of the SLAM system of the AR device estimates the relative motion of the AR device by tracking feature points in the scene; The backend of the SLAM system employs a nonlinear optimization method to jointly optimize the pose and map points of all keyframes in order to correct accumulated errors and generate a consistent 3D environment map. The three-dimensional environment map is a sparse point cloud, a dense point cloud, or a grid-based representation.
5. The method according to claim 2, characterized in that, The real-time identification and high-precision pose estimation of construction components in S3 includes: The deep learning object detection and instance segmentation model adopts a real-time detection and segmentation network based on the convolutional neural network (CNN) or Transformer architecture. The deep learning object detection and instance segmentation model outputs a semantic segmentation mask for the component, and extracts the geometric information of a single component from the point cloud data; The pose estimation algorithm adopts a fusion of feature point matching and optimization, RANSAC-based PnP algorithm and Iterative Closest Point ICP algorithm; The pose estimation algorithm continuously tracks the target construction component, adapting to the dynamic changes of the component during construction. Specific feature extraction strategies are adopted for different types of components. For components with planar features, a plane fitting algorithm is used to extract their surface normal vectors and positions. For components with edge or corner features, edge detection and corner detection algorithms are used to extract key geometric features, and depth information is combined to construct a three-dimensional feature point cloud.
6. The method according to claim 1, characterized in that, The deviation exceeding the threshold judgment and dynamic correction instruction generation in S5 includes: The dynamic correction command includes the correction direction, the specific movement distance, or the precise adjustment angle; The inverse kinematics algorithm or geometric transformation algorithm based on the optimization principle is used to generate dynamic correction instructions, and the shortest path and minimum adjustment amount required to adjust the actual component pose to the theoretical pose of the BIM model are calculated. The instructions also depend on the type of the component and its physical adjustable range; The dynamic correction instruction generation is also based on the operability constraints of the construction site. If the deviation of the component exceeds the reasonable range of manual correction, it will automatically prompt that auxiliary mechanical equipment is needed for correction and generate guidance information for the operators of these mechanical equipment. The correction strategy has multiple preset priorities, prioritizing translational adjustments before rotational adjustments; The system employs an instruction generation module, which includes a predefined knowledge base that stores calibration experience and operating procedures for different component types, materials, and installation methods.
7. The method according to claim 1, characterized in that, The augmented reality visualization guidance and real-time feedback in S6 include: The virtual objects include directional arrows, numerical indicators, virtual models of target locations, color-coded prompts, and text prompts. The directional arrow dynamically displays the direction of correction, and its length or color intensity is adjusted according to the magnitude of the deviation; The target location virtual model renders the final target location of the component in a semi-transparent BIM model to contrast with the actual component, or displays it in the form of wireframe or ghost model overlay. The color coding prompts are highlighted in red or orange when the component is in an out-of-threshold deviation state, and switch to yellow or green when the component is close to or reaches the tolerance range. The virtual object also includes audio auxiliary commands that play voice prompts through the built-in speaker of the AR device; The content displayed on the AR device is customized according to the construction personnel's professional level and current task.
8. The method according to claim 7, characterized in that, The closed-loop verification and recording in S7 includes: The correction effect is evaluated in real time, and the guidance information on the display interface of the AR device is dynamically updated. When all deviation values converge to within the preset tolerance threshold, the correction is deemed successful, and a prompt message indicating that the correction is complete or meets the requirements is displayed on the interface of the AR device. If the deviation still exceeds the threshold after the correction operation, S5 and S6 will continue to be executed in a loop until the component accuracy meets the requirements or the preset maximum number of iterations is reached, forming a continuous and adaptive closed-loop correction feedback mechanism. The data processing unit automatically updates the corrected component information into the BIM model, thereby achieving real-time synchronization of the BIM model's as-built status. The update includes marking or replacing the actual pose, dimensions, and any corrections of the components in the BIM model; The recording module supports the generation of exportable, traceable quality reports in PDF or CSV format, which include construction date, time, operator ID, original deviation, correction amount, final deviation, and compliance assessment with the design tolerances of the BIM model.