An intelligent mapping method for house building projects
By performing multi-channel enhancement and fusion of grayscale texture and edge features of architectural images, combined with an improved YOLOv10 model, the problem of feature blurring and distortion caused by environmental interference in traditional surveying and mapping is solved, thereby improving the recognition accuracy and efficiency of surveying and mapping data.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- CHINA TIESIJU CIVIL ENGINEERING GROUP CO LTD
- Filing Date
- 2026-02-26
- Publication Date
- 2026-06-23
AI Technical Summary
In traditional surveying and mapping, image features are blurred, lost, or broken due to interference from the shooting environment, and feature enhancement imbalance and original information distortion are caused by a single image processing method.
The grayscale texture and edge features of the building are extracted and enhanced, then mapped, together with the original image, to the R, G, and B channels. Multi-channel fusion generates an enhanced survey map, which an improved YOLOv10 model recognizes before the results are uploaded.
This effectively avoids the loss and misjudgment of building features, improves the accuracy and efficiency of surveying data identification, and ensures the authenticity and integrity of surveying data.
Figure CN122265701A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of image processing technology, and more specifically to an intelligent surveying method for building construction projects.
Background Technology
[0002] Building surveying technology has evolved from traditional manual measurement to digital instruments, with technologies such as total stations, 3D laser scanning, and photogrammetry gradually being applied. In recent years, the integration of computer vision and artificial intelligence has promoted the development of intelligent surveying, enabling automatic feature extraction through image processing. However, enhancing texture and edge features in complex environments still faces challenges in terms of accuracy and robustness.
[0003] Publication No. CN120495538A discloses a data reconstruction method and system for remote sensing mapping models of buildings. The method acquires three-dimensional point cloud data of a building and performs initial layering based on the local geometric features of the point cloud to obtain target layers. It then uses the geometric structural relationship at the interface between target layers of adjacent building stages to determine whether a target layer contains false point cloud data. If it does, the method determines correction parameters and uses them to correct the point cloud data in that layer. Finally, it uses the corrected point cloud data to determine thinning parameters for each target layer, and uses those parameters to determine the target point cloud model of the building.
[0004] Traditional surveying and mapping has two problems. First, interference from the shooting environment causes image features to be blurred, lost, or broken. Second, relying on a single image processing method leads to imbalanced feature enhancement and distortion of the original information.
Summary of the Invention
[0005] This invention proposes an intelligent surveying method for building construction projects to solve the problems described in the background art: (1) image features that are blurred, lost, or broken by interference from the shooting environment in traditional surveying and mapping; and (2) imbalanced feature enhancement and distortion of the original information caused by a single image processing method.
[0006] A first aspect of this invention provides an intelligent surveying method for building construction projects. The method comprises:
- obtaining the original survey map of the target building, and extracting a grayscale texture matrix and an edge feature matrix from it;
- enhancing the grayscale texture matrix and the edge feature matrix to obtain an enhanced texture matrix and an enhanced edge matrix, respectively;
- mapping the enhanced texture matrix to the R channel to obtain an R-channel feature map, the original survey map to the G channel to obtain a G-channel feature map, and the enhanced edge matrix to the B channel to obtain a B-channel feature map;
- fusing the R-channel, G-channel, and B-channel feature maps to obtain an enhanced survey map;
- identifying the enhanced survey map with the target model to obtain survey data, which is then uploaded to the cloud.
[0007] The grayscale texture and edge features of the original survey map are extracted and enhanced. The enhanced texture, the original image information, and the enhanced edges are then mapped to three separate channels and fused into an enhanced survey map. The target model identifies the survey data from this map and uploads it to the cloud, which improves the efficiency and accuracy of building survey map recognition.
[0008] Optionally, extracting the grayscale texture matrix from the original survey map comprises the following steps:
- denoising the original survey map to obtain a denoised survey map;
- converting the denoised survey map to grayscale to obtain a two-dimensional grayscale image;
- normalizing the two-dimensional grayscale image to obtain a normalized grayscale image;
- calculating a gray-level co-occurrence matrix (GLCM) set from the normalized grayscale image, the set containing GLCMs in four different directions;
- extracting texture features from each GLCM in the set and averaging them to obtain the grayscale texture matrix.

Extracting the edge feature matrix from the original survey map comprises the following steps:
- applying a second Gaussian filter to the normalized grayscale image to obtain a smoothed grayscale image;
- calculating gradients on the smoothed grayscale image to obtain a gradient magnitude map;
- applying non-maximum suppression to the gradient magnitude map to obtain a refined gradient map;
- applying double-threshold segmentation to the refined gradient map to obtain a binarized edge map;
- applying a morphological closing operation to the binarized edge map to obtain an edge map;
- converting the edge map into a two-dimensional numerical matrix to obtain the edge feature matrix.
[0009] The original survey map is denoised, converted to grayscale, and normalized. The grayscale texture matrix is obtained by averaging features from GLCMs in four directions. The edge feature matrix is then obtained through a second Gaussian filtering pass, gradient calculation, non-maximum suppression, double-threshold segmentation, and a morphological closing operation. This removes noise, smooths the image, and extracts stable, clear texture and edge features, laying a solid data foundation for subsequent image enhancement.
[0010] Optionally, enhancing the grayscale texture matrix and the edge feature matrix to obtain the enhanced texture matrix and the enhanced edge matrix, respectively, comprises the following steps:
- traversing the row indices of a target matrix to obtain a row autocorrelation matrix, and traversing its column indices to obtain a column autocorrelation matrix. The target matrix is either the grayscale texture matrix or the edge feature matrix.
- applying preset processing to the row and column autocorrelation matrices to obtain an optimized row autocorrelation matrix and an optimized column autocorrelation matrix. The preset processing includes outlier removal and grayscale compression.
- fusing the target matrix with the optimized row and column autocorrelation matrices. This yields the enhanced texture matrix when the target matrix is the grayscale texture matrix, or the enhanced edge matrix when it is the edge feature matrix.
[0011] Row and column autocorrelation matrices are obtained by traversing the row and column indices of the grayscale texture matrix and the edge feature matrix. After outlier removal and grayscale compression, they are fused with the original matrix to obtain the enhanced texture and edge matrices. This enhances feature continuity and recognizability, suppresses interfering information, and improves the stability and expressive accuracy of the feature matrices.
[0012] Optionally, the three channel feature maps are obtained as follows:
- R channel: determine the elements of the enhanced texture matrix, determine a pixel value for each element, and replace each element with its pixel value to obtain the R-channel feature map.
- G channel: separate the original survey map into R, G, and B channels, and use the image corresponding to the G channel as the G-channel feature map.
- B channel: determine the elements of the enhanced edge matrix, determine a pixel value for each element, and replace each element with its pixel value to obtain the B-channel feature map.
[0013] The enhanced texture matrix, the original survey map, and the enhanced edge matrix are mapped to the R, G, and B channels, respectively, to generate the corresponding feature maps. Texture, original information, and edge features are therefore each carried by an independent channel. Key features are highlighted while the original survey information is preserved, which provides clear, standardized single-channel data for subsequent image fusion.
[0014] Optionally, the target model is an improved YOLOv10 model, obtained by replacing the backbone network of YOLOv10 with an improved backbone network. The improved backbone network works as follows:
- The original image passes sequentially through the adaptive gamma correction module, the visual focusing noise modulation module, and two Conv modules to give the first image.
- The first image is input into the C2f module to give the second image. The second image is input into the improved CBAM module to give the third image.
- The second and third images are concatenated to give the fourth image. The fourth image is input into a Conv module to give the fifth image, which is the first output of the improved backbone network.
- The fifth image passes sequentially through the C2f module and the improved CBAM module to give the sixth image. The fifth and sixth images are concatenated to give the seventh image, which is downsampled to give the eighth image. The eighth image is the second output of the improved backbone network.
- The eighth image is processed by the C2f module and downsampled to give the ninth image. The ninth image passes sequentially through the C2fCIB, SPPF, and PSA modules to give the tenth image, which is the third output of the improved backbone network.
[0015] Optionally, the workflow of the improved CBAM module includes: The original feature map is obtained, and then input into the channel attention module to obtain the first feature map. The first feature map is fused with the original feature map to obtain the second feature map. The second feature map is input into the spatial attention module to obtain the third feature map. The third feature map is fused with the second feature map to obtain the fourth feature map. The fourth feature map is concatenated with the original feature map to obtain the fifth feature map. The fifth feature map is used as the output of the improved CBAM module.
[0016] Optionally, the adaptive gamma correction module is defined by a mathematical expression with the following quantities:
- $I'$ is the output of the adaptive gamma correction module.
- $I_r$, $I_g$, and $I_b$ are the normalized red, green, and blue single-channel images, respectively.
- $\gamma$ is the gamma value.
- $\Delta\gamma_r$, $\Delta\gamma_g$, and $\Delta\gamma_b$ are the deviation values for the red, green, and blue channels, respectively.
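The paragraph above lists the symbols of the adaptive gamma correction module but does not give the expression that combines them. The Python sketch below assumes the simplest per-channel reading, $I'_c = I_c^{\gamma + \Delta\gamma_c}$ for $c \in \{r, g, b\}$, with 8-bit input normalized to [0, 1]. The function names are hypothetical. The rule used to choose γ, which maps the mean brightness to 0.5, is a common heuristic swapped in for illustration and is not the patented method.

```python
import numpy as np


def estimate_gamma(norm_gray):
    """Hypothetical adaptive rule: choose gamma so that mean ** gamma == 0.5.

    This heuristic is an assumption for illustration, not taken from the source.
    """
    mean = float(np.clip(np.mean(norm_gray), 1e-6, 1.0 - 1e-6))
    return float(np.log(0.5) / np.log(mean))


def adaptive_gamma_correction(image_rgb, gamma, delta_gamma=(0.0, 0.0, 0.0)):
    """Apply I'_c = I_c ** (gamma + delta_gamma_c) to each of the R, G, B channels.

    ASSUMPTION: the per-channel exponent form is inferred from the symbol list;
    the source does not give the exact expression.
    """
    img = np.asarray(image_rgb, dtype=np.float64)
    if img.ndim != 3 or img.shape[-1] != 3:
        raise ValueError("expected an H x W x 3 array in R, G, B order")
    norm = np.clip(img / 255.0, 0.0, 1.0)  # normalized single-channel images I_r, I_g, I_b
    exponents = gamma + np.asarray(delta_gamma, dtype=np.float64)
    if np.any(exponents <= 0):
        raise ValueError("gamma + channel deviation must be positive")
    return norm ** exponents  # broadcasting applies one exponent per channel
```

For a dark image (mean below 0.5), `estimate_gamma` returns a value below 1, which brightens the image. The per-channel deviations can then nudge individual channels toward a more neutral color balance.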
[0017] Optionally, the visual focusing noise modulation module operates as follows:
- An original training image is obtained, and target recognition is performed on it to obtain bounding boxes.
- The bounding boxes are added to the original training image to obtain a labeled training image, and the target density of each bounding box is calculated. The target density is the density of spatial positions within the bounding box, where a spatial position is a pixel used for target recognition.
- Each target density is normalized, the coordinates of each normalized target density are determined, and these are combined to obtain a visual focus map.
- Basic noise is obtained, and adaptive noise is generated from the visual focus map and the basic noise.
- The image channels of the original training image are determined. If they belong to a preset channel set, the adaptive noise is added to the original training image to obtain a noise-modulated image, which is the output of the visual focusing noise modulation module.
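The source does not specify how the target density, the visual focus map, or the adaptive noise are computed. The Python sketch below is one hypothetical reading, with these assumptions:
- A box's target density is the fraction of its pixels flagged in a recognition mask.
- Densities are normalized by their maximum, and each box region of the focus map takes its normalized density.
- Adaptive noise is Gaussian basic noise scaled by the focus map.
- The preset-channel check is applied per channel, so noise is added only to channels named in the preset set.

All function and parameter names are illustrative.

```python
import numpy as np


def visual_focus_map(shape, boxes, recog_mask):
    """Build an H x W visual focus map from bounding boxes.

    ASSUMPTIONS: target density = fraction of box pixels flagged in recog_mask
    (pixels used for target recognition); densities are normalized by their
    maximum; overlapping boxes keep the larger value.
    boxes: (x0, y0, x1, y1) tuples with exclusive x1, y1.
    """
    h, w = shape
    mask = np.asarray(recog_mask, dtype=bool)
    densities = []
    for x0, y0, x1, y1 in boxes:
        region = mask[y0:y1, x0:x1]
        densities.append(float(region.mean()) if region.size else 0.0)
    focus = np.zeros((h, w), dtype=np.float64)
    peak = max(densities, default=0.0)
    if peak <= 0.0:
        return focus  # no recognized pixels: the focus map stays empty
    for (x0, y0, x1, y1), d in zip(boxes, densities):
        focus[y0:y1, x0:x1] = np.maximum(focus[y0:y1, x0:x1], d / peak)
    return focus


def modulate_noise(image, focus, channel_names, preset_channels=("R", "G", "B"),
                   sigma=0.05, seed=0):
    """Add focus-weighted Gaussian noise to the channels listed in preset_channels.

    ASSUMPTION: adaptive noise = basic noise * focus map. image: float H x W x C
    in [0, 1]; channel_names labels the C channels, e.g. ["R", "G", "B"].
    """
    img = np.asarray(image, dtype=np.float64)
    rng = np.random.default_rng(seed)
    basic = rng.normal(0.0, sigma, size=img.shape)  # basic noise
    out = img.copy()
    for c, name in enumerate(channel_names):
        if name in preset_channels:
            out[..., c] += basic[..., c] * focus  # adaptive noise for this channel
    return np.clip(out, 0.0, 1.0)
```

Under this reading, regions with dense recognized targets receive the strongest noise. Background regions are left untouched, so the perturbation stays focused on the objects the detector must learn to handle robustly.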
[0018] The beneficial effects of this invention are as follows. The proposed method extracts and enhances building texture and edge features, and uses multi-channel fusion to highlight details while retaining the original information. This avoids the loss and misjudgment of building features, improves the accuracy and efficiency with which the model recognizes survey data, and ensures the authenticity and integrity of the survey data.
Attached Figure Description
[0019] Figure 1 is a flowchart of the intelligent surveying method for building construction projects provided in an embodiment of the present invention.
Figure 2 is a schematic diagram of the model structure of the intelligent surveying method for building construction projects provided in an embodiment of the present invention.
Detailed Implementation
[0020] To further illustrate the technical means adopted by the present invention and their effects, the specific implementation, structure, features, and effects of the invention are described in detail below with reference to the accompanying drawings and preferred embodiments.
[0021] This invention provides an intelligent surveying method for building construction projects. Figure 1 is a flowchart of the method provided in this embodiment. The method includes the following steps:
S101: obtain the original survey map of the target building, and extract the grayscale texture matrix and edge feature matrix from it;
S102: enhance the grayscale texture matrix and edge feature matrix to obtain the enhanced texture matrix and enhanced edge matrix, respectively;
S103: map the enhanced texture matrix to the R channel to obtain the R-channel feature map, map the original survey map to the G channel to obtain the G-channel feature map, and map the enhanced edge matrix to the B channel to obtain the B-channel feature map;
S104: fuse the R-channel, G-channel, and B-channel feature maps to obtain the enhanced survey map;
S105: identify the enhanced survey map with the target model to obtain survey data, and upload the survey data to the cloud.
[0022] The method provided by this invention extracts and enhances building texture and edge features, and uses multi-channel fusion to highlight details while retaining the original information. This avoids the loss and misjudgment of building features, improves the accuracy and efficiency with which the model recognizes survey data, and ensures the authenticity and integrity of the survey data.
[0023] In one implementation, the original survey map of the target building is obtained by using a drone to collect images and videos of the target building.
[0024] In one implementation, the grayscale texture matrix and edge feature matrix of the building are extracted and then enhanced by autocorrelation. This fills texture gaps and reconnects broken edges, improving the clarity of key features such as wall materials, window frames, and fine structures. It also avoids feature loss caused by shooting distance and environmental interference, and gives subsequent recognition a highly recognizable feature foundation.
[0025] In one implementation, the R channel carries enhanced texture, the G channel retains the original color and geometry, and the B channel enhances the contour. This design not only fully preserves the true color and spatial relationship of the original survey map, but also achieves precise highlighting of texture details and structural contours through feature separation enhancement. This avoids information imbalance caused by single enhancement and ensures the authenticity and integrity of the survey data.
[0026] In one implementation, the accuracy and efficiency of model recognition are improved and the risk of misjudgment is reduced: the three-channel fused image integrates multi-dimensional enhanced features, which improves the inter-class differentiation of building components. Combined with target model recognition, it reduces the probability of the model misjudging similar features. At the same time, the enhanced image features are easier for the model to extract, shortening the recognition calculation time and providing technical support for quickly obtaining accurate surveying and mapping data.
[0027] In one embodiment, extracting the grayscale texture matrix from the original survey map comprises the following steps:
- denoising the original survey map to obtain a denoised survey map;
- converting the denoised survey map to grayscale to obtain a two-dimensional grayscale image;
- normalizing the two-dimensional grayscale image to obtain a normalized grayscale image;
- calculating a GLCM set from the normalized grayscale image, the set containing GLCMs in four different directions;
- extracting texture features from each GLCM in the set and averaging them to obtain the grayscale texture matrix.

Extracting the edge feature matrix from the original survey map comprises the following steps:
- applying a second Gaussian filter to the normalized grayscale image to obtain a smoothed grayscale image;
- calculating gradients on the smoothed grayscale image to obtain a gradient magnitude map;
- applying non-maximum suppression to the gradient magnitude map to obtain a refined gradient map;
- applying double-threshold segmentation to the refined gradient map to obtain a binarized edge map;
- applying a morphological closing operation to the binarized edge map to obtain an edge map;
- converting the edge map into a two-dimensional numerical matrix to obtain the edge feature matrix.
[0028] In one implementation, the grayscale texture matrix is computed as follows:
- Denoising: the original survey map is denoised with a Gaussian filter (kernel size 3×3, standard deviation σ=1.0) to suppress noise caused by atmospheric scattering and uneven illumination. This yields a denoised RGB image (the denoised survey map).
- Grayscale conversion: the denoised survey map is converted by the weighted average method, grayscale value = 0.299×R + 0.587×G + 0.114×B. This turns the three-channel RGB image into a two-dimensional grayscale image with pixel values in [0, 255].
- Normalization: the grayscale image is linearly scaled to [0, 1] using normalized value = (original pixel value − minimum pixel value) / (maximum pixel value − minimum pixel value). This removes the influence of pixel-value scale differences on subsequent calculations and yields the normalized grayscale image.
- GLCM calculation: GLCMs are computed on the normalized grayscale image with distance d=1, angles θ=0°, 45°, 90°, and 135°, and 16 quantization levels. This captures the spatial distribution of the building's surface texture and gives GLCMs in four directions.
- Feature extraction: the core texture features (contrast, correlation, energy, and entropy) are extracted from each of the four GLCMs. The feature values are averaged over the four directions to form a two-dimensional matrix of the same size as the normalized grayscale image, which is the grayscale texture matrix.
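The preprocessing and GLCM settings above can be sketched in NumPy as follows. The sketch assumes the GLCM is made symmetric and normalized to probabilities, and that "energy" is the angular second moment (the sum of squared probabilities; some libraries report its square root). It returns one averaged feature vector per image. A per-pixel grayscale texture matrix of the image's size would additionally require computing these features in a sliding window, whose size the source does not give.

```python
import numpy as np

# (row, col) neighbour offsets for distance d = 1 at 0°, 45°, 90°, 135°
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]


def gaussian_kernel(size, sigma):
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def filter2d(img, kernel):
    """Same-size 2-D correlation with edge-replicated borders."""
    kh, kw = kernel.shape
    p = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out


def preprocess(rgb):
    """3x3 Gaussian denoising (sigma=1.0), weighted grayscale, min-max normalization."""
    rgb = np.asarray(rgb, dtype=np.float64)
    k = gaussian_kernel(3, 1.0)
    den = np.stack([filter2d(rgb[..., c], k) for c in range(3)], axis=-1)
    gray = 0.299 * den[..., 0] + 0.587 * den[..., 1] + 0.114 * den[..., 2]
    lo, hi = gray.min(), gray.max()
    return (gray - lo) / (hi - lo) if hi > lo else np.zeros_like(gray)


def glcm(norm_gray, dy, dx, levels=16):
    """Symmetric, normalized GLCM for one offset after quantizing to `levels`."""
    q = np.minimum((norm_gray * levels).astype(int), levels - 1)
    h, w = q.shape
    y0, y1 = max(0, -dy), h - max(0, dy)
    x0, x1 = max(0, -dx), w - max(0, dx)
    a = q[y0:y1, x0:x1]                         # reference pixels
    b = q[y0 + dy:y1 + dy, x0 + dx:x1 + dx]     # their neighbours at offset (dy, dx)
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1.0)
    m = m + m.T
    return m / m.sum()


def glcm_features(p):
    """Contrast, correlation, energy (ASM) and entropy of a normalized GLCM."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    if sd_i > 0 and sd_j > 0:
        corr = np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    else:
        corr = 1.0  # constant patch: correlation undefined, set to 1 by convention
    return np.array([contrast, corr, energy, entropy])


def texture_features(norm_gray):
    """Average the four GLCM feature vectors over 0°, 45°, 90° and 135°."""
    return np.mean([glcm_features(glcm(norm_gray, dy, dx)) for dy, dx in OFFSETS], axis=0)
```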
[0029] In one implementation, a second Gaussian filter (kernel size 5×5, standard deviation σ=1.5) is applied to the normalized grayscale image. This further smooths noise while preserving key edge information such as building outlines and window frames, yielding the smoothed grayscale image. The Sobel operator is then applied to the smoothed grayscale image to compute the gradient $G_x$ in the x direction and the gradient $G_y$ in the y direction. The gradient magnitude map, which captures edge strength, is obtained from $G = \sqrt{G_x^2 + G_y^2}$.
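The second smoothing pass and the Sobel gradient step can be sketched in NumPy as follows. The edge-replicated border handling and the orientation of the y axis (rows increasing downward) are assumptions of this sketch.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T  # responds to intensity increasing downward (row index)


def gaussian_kernel(size, sigma):
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def filter2d(img, kernel):
    """Same-size 2-D correlation with edge-replicated borders."""
    kh, kw = kernel.shape
    p = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out


def gradient_magnitude(norm_gray):
    """Second Gaussian pass (5x5, sigma=1.5), then Sobel G_x, G_y and G = sqrt(G_x^2 + G_y^2)."""
    smooth = filter2d(np.asarray(norm_gray, dtype=np.float64), gaussian_kernel(5, 1.5))
    gx = filter2d(smooth, SOBEL_X)
    gy = filter2d(smooth, SOBEL_Y)
    return smooth, gx, gy, np.hypot(gx, gy)
```

`np.hypot` evaluates the square root of the sum of squares without intermediate overflow. Returning `gx` and `gy` alongside the magnitude keeps the gradient direction available for the non-maximum suppression step that follows.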
[0030] In one implementation, the edge feature matrix is completed as follows:
- Non-maximum suppression (NMS): the gradient magnitude map is traversed along the gradient direction. Only pixels whose gradient magnitude is a local maximum are retained, and non-edge pixels are discarded. This thins the edge contours and gives the refined gradient map.
- Double-threshold segmentation: a high threshold of 0.2 and a low threshold of 0.05 are applied to the normalized gradient magnitude, classifying pixels as strong edges, weak edges, or non-edges. Strong edges are retained, and weak edges that meet the connection condition are linked to them. The result is a binarized edge map in which 0 denotes a non-edge pixel and 1 an edge pixel.
- Morphological closing: dilation followed by erosion, with a 3×3 rectangular structuring element, is applied to the binarized edge map. This fills broken edges, such as window frames rendered discontinuous by long-distance photography, and yields a complete edge map.
- Conversion: the complete edge map is converted into a two-dimensional numerical matrix (pixel values 1 and 0 unchanged), which is the edge feature matrix.
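The thinning, double-threshold, and closing steps above can be sketched in NumPy as follows. The sketch makes these assumptions:
- Gradient directions are quantized to four bins, with rows increasing downward.
- The gradient magnitude is normalized by its maximum.
- "Weak edges that meet the connection condition" are weak pixels 8-connected to a strong edge, directly or through other weak pixels (standard hysteresis).
- Borders are padded so the 3×3 closing does not erode them.

```python
from collections import deque

import numpy as np


def non_max_suppression(mag, gx, gy):
    """Keep pixels whose magnitude is a local maximum along the gradient direction."""
    h, w = mag.shape
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # rows increase downward
    out = np.zeros_like(mag, dtype=np.float64)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = angle[y, x]
            if a < 22.5 or a >= 157.5:   # ~0°: compare left / right
                n1, n2 = mag[y, x - 1], mag[y, x + 1]
            elif a < 67.5:               # ~45°: down-right / up-left
                n1, n2 = mag[y + 1, x + 1], mag[y - 1, x - 1]
            elif a < 112.5:              # ~90°: compare up / down
                n1, n2 = mag[y - 1, x], mag[y + 1, x]
            else:                        # ~135°: down-left / up-right
                n1, n2 = mag[y + 1, x - 1], mag[y - 1, x + 1]
            if mag[y, x] >= n1 and mag[y, x] >= n2:
                out[y, x] = mag[y, x]
    return out


def double_threshold(thin, high=0.2, low=0.05):
    """Classify on the normalized magnitude, then link weak pixels to strong ones."""
    peak = thin.max()
    norm = thin / peak if peak > 0 else thin
    strong = norm >= high
    weak = (norm >= low) & ~strong
    edges = strong.copy()
    h, w = norm.shape
    queue = deque(zip(*np.nonzero(strong)))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not edges[ny, nx]:
                    edges[ny, nx] = True   # weak pixel connected to an edge
                    queue.append((ny, nx))
    return edges.astype(np.uint8)          # 1 = edge, 0 = non-edge


def closing3x3(binary):
    """Morphological closing (dilation then erosion) with a 3x3 square element."""
    b = np.asarray(binary, dtype=np.uint8)
    h, w = b.shape

    def shifts(p):
        return [p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]

    dilated = np.max(shifts(np.pad(b, 1, constant_values=0)), axis=0)
    return np.min(shifts(np.pad(dilated, 1, constant_values=1)), axis=0)
```

The closing fills one-pixel gaps in an edge without thickening it. Dilation bridges the gap and widens the line, and erosion then trims the line back to its original width.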
[0031] In one embodiment, enhancing the grayscale texture matrix and the edge feature matrix to obtain the enhanced texture matrix and the enhanced edge matrix, respectively, comprises the following steps:
- traversing the row indices of a target matrix to obtain a row autocorrelation matrix, and traversing its column indices to obtain a column autocorrelation matrix. The target matrix can be either the grayscale texture matrix or the edge feature matrix.
- applying preset processing to the row and column autocorrelation matrices to obtain the optimized row and column autocorrelation matrices. The preset processing includes outlier removal and grayscale compression.
- fusing the target matrix with the optimized row and column autocorrelation matrices to obtain the enhanced texture matrix or the enhanced edge matrix.
[0032] In one implementation, the row autocorrelation matrix is obtained by traversing all row index pairs (i, j) of the target matrix and filling in each element. This gives the complete row autocorrelation matrix (C×C, element values ∈ [0, 1]). The column autocorrelation matrix is obtained in the same way by traversing all column index pairs (i, j), giving the complete column autocorrelation matrix (C×C, element values ∈ [0, 1]).
[0033] In one implementation, the preset processing of the row and column autocorrelation matrices consists of two steps:
- Outlier removal: elements lying outside the mean ± 3 standard deviations (the 3σ criterion) are removed and replaced with the matrix mean.
- Grayscale compression: the element values of the optimized matrices are kept within [0, 1] to avoid numerical overflow during subsequent fusion.

The target matrix and the optimized row and column autocorrelation matrices are then fused by weighted summation to obtain the enhanced texture matrix or the enhanced edge matrix. The autocorrelation matrices receive a weight of 0.4 and the original matrix a weight of 0.2.
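The source gives neither the element formula for the autocorrelation matrices nor how a C×C matrix is fused with the target matrix. The following NumPy sketch fills both gaps with labeled assumptions:
- Element (i, j) is the cosine similarity of rows (or columns) i and j, which lies in [0, 1] for a non-negative target matrix.
- Each optimized autocorrelation matrix is normalized so that its rows (or columns) sum to 1, and is used to average similar rows (or columns) of the target matrix.
- Each of the two autocorrelation terms gets weight 0.4 and the original gets 0.2, so the weights sum to 1.
- "Grayscale compression" is read as clipping to [0, 1].

```python
import numpy as np


def autocorrelation(x):
    """Row-wise similarity matrix (rows x rows); pass x.T for the column version.

    ASSUMPTION: element (i, j) is the cosine similarity of rows i and j, and the
    diagonal is fixed to 1 (every row is fully similar to itself).
    """
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = np.divide(x, norms, out=np.zeros_like(x), where=norms > 0)
    sim = np.clip(unit @ unit.T, 0.0, 1.0)
    np.fill_diagonal(sim, 1.0)
    return sim


def preset_processing(m):
    """3-sigma outlier removal (replace with the mean), then clip to [0, 1]."""
    mu, sd = m.mean(), m.std()
    out = m.copy()
    out[np.abs(m - mu) > 3.0 * sd] = mu
    return np.clip(out, 0.0, 1.0)  # 'grayscale compression' read as clipping


def enhance(target, w_auto=0.4, w_orig=0.2):
    """Fuse the target matrix with its optimized row and column autocorrelations."""
    x = np.asarray(target, dtype=np.float64)
    r = preset_processing(autocorrelation(x))      # optimized row matrix
    c = preset_processing(autocorrelation(x.T))    # optimized column matrix
    row_term = (r / r.sum(axis=1, keepdims=True)) @ x   # average of similar rows
    col_term = x @ (c / c.sum(axis=0, keepdims=True))   # average of similar columns
    return w_auto * row_term + w_auto * col_term + w_orig * x
```

Under these assumptions, each term is a convex combination of the target's values. An input in [0, 1] therefore stays in [0, 1], consistent with the stated goal of avoiding overflow during fusion.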
[0034] In one embodiment, the three channel feature maps are obtained as follows:
- R channel: determine the elements of the enhanced texture matrix, determine a pixel value for each element, and replace each element with its pixel value to obtain the R-channel feature map.
- G channel: separate the original survey map into R, G, and B channels, and use the image corresponding to the G channel as the G-channel feature map.
- B channel: determine the elements of the enhanced edge matrix, determine a pixel value for each element, and replace each element with its pixel value to obtain the B-channel feature map.
[0035] In one implementation, the enhanced texture matrix is mapped to pixel values to obtain an R-channel feature map that highlights texture details. The G channel of the original survey map is reused directly to preserve true color and geometric information. The enhanced edge matrix is mapped to obtain a B-channel feature map that enhances contours. Together, these enhance texture and contour features while preserving the original core information. This improves detail recognition and feature differentiation in architectural survey images, and provides high-precision image data for subsequent surveying applications such as semantic segmentation and 3D modeling.
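The channel mapping and fusion can be sketched in NumPy as follows. The linear scaling of [0, 1] values to 0 to 255 is an assumption, since the source only says that a pixel value is determined from each element. The original survey map is assumed to be an H × W × 3 array. Index 1 is the green channel in both R, G, B and B, G, R order (OpenCV, for example, loads images as B, G, R).

```python
import numpy as np


def to_pixels(matrix):
    """Map a [0, 1] feature matrix to 8-bit pixel values (assumed linear scaling)."""
    return np.round(np.clip(matrix, 0.0, 1.0) * 255.0).astype(np.uint8)


def fuse_channels(enhanced_texture, original_rgb, enhanced_edge):
    """Stack R = enhanced texture, G = original green channel, B = enhanced edges."""
    original = np.asarray(original_rgb)
    if original.ndim != 3 or original.shape[-1] != 3:
        raise ValueError("original survey map must be H x W x 3")
    r = to_pixels(enhanced_texture)          # R-channel feature map
    g = original[..., 1].astype(np.uint8)    # G-channel feature map, reused as-is
    b = to_pixels(enhanced_edge)             # B-channel feature map
    if not (r.shape == g.shape == b.shape):
        raise ValueError("channel maps must share the same H x W size")
    return np.stack([r, g, b], axis=-1)      # enhanced survey map
```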
[0036] In one embodiment, the target model is an improved YOLOv10 model, obtained by replacing the backbone network of YOLOv10 with an improved backbone network. The improved backbone network works as follows:
- The original image passes sequentially through the adaptive gamma correction module, the visual focusing noise modulation module, and two Conv modules to give the first image.
- The first image is input into the C2f module to give the second image. The second image is input into the improved CBAM module to give the third image.
- The second and third images are concatenated to give the fourth image. The fourth image is input into a Conv module to give the fifth image, which is the first output of the improved backbone network.
- The fifth image passes sequentially through the C2f module and the improved CBAM module to give the sixth image. The fifth and sixth images are concatenated to give the seventh image, which is downsampled to give the eighth image. The eighth image is the second output of the improved backbone network.
- The eighth image is processed by the C2f module and downsampled to give the ninth image. The ninth image passes sequentially through the C2fCIB, SPPF, and PSA modules to give the tenth image, which is the third output of the improved backbone network.
[0037] In one implementation, the C2fCIB module, the PSA module, and the SCDown downsampling module are all standard components of the original YOLOv10 architecture. They are responsible for efficient feature extraction, partial self-attention, and downsampling, respectively.
[0038] In one implementation, the backbone network is rebuilt by adding three modules: an adaptive gamma correction module, a visual focusing noise modulation module, and an improved CBAM module. These are combined with original components such as C2f and Conv. Through multi-stage feature processing, repeated feature concatenation, and progressive downsampling, the rebuilt backbone replaces the original YOLOv10s backbone structure.
[0039] In one implementation, the backbone processes an image in the following stages:
- The original image undergoes illumination-adaptive correction in the adaptive gamma correction module, noise modulation in the visual focusing noise modulation module, and basic feature extraction by two Conv modules, producing the first image. The C2f module refines this into the second image.
- The improved CBAM module (referred to as R-CBAM) enhances the attention features. The resulting third image is concatenated with the second image and integrated by a Conv module to give the fifth image, which is the first output.
- The fifth image undergoes C2f feature extraction and R-CBAM attention enhancement. After concatenation and fusion, it is downsampled to give the eighth image, which is the second output.
- The eighth image is processed by the C2f module, downsampled, and passed through the C2fCIB, SPPF, and PSA modules for deep feature extraction. This gives the tenth image, which is the third output.

The backbone thus produces multi-scale, multi-level feature outputs.
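The data flow above can be traced with a wiring sketch in which each module is a placeholder callable. The module internals, their channel counts and strides, and the dictionary keys used here are hypothetical. "Stitching" is read as concatenation along the channel axis of C × H × W feature maps.

```python
from typing import Callable, Dict, Tuple

import numpy as np

Module = Callable[[np.ndarray], np.ndarray]


def concat(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """'Stitching' read as concatenation along the channel axis (C x H x W maps)."""
    return np.concatenate([a, b], axis=0)


def improved_backbone(x: np.ndarray, m: Dict[str, Module]) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Wire the modules in the described order and return the three outputs."""
    first = m["conv_2"](m["conv_1"](m["focus_noise"](m["gamma"](x))))
    second = m["c2f_1"](first)
    third = m["cbam_1"](second)
    fifth = m["conv_3"](concat(second, third))        # fourth image -> Conv: output 1
    sixth = m["cbam_2"](m["c2f_2"](fifth))
    eighth = m["down_1"](concat(fifth, sixth))        # seventh image -> downsample: output 2
    ninth = m["down_2"](m["c2f_3"](eighth))
    tenth = m["psa"](m["sppf"](m["c2fcib"](ninth)))   # output 3
    return fifth, eighth, tenth
```

Passing the modules in as callables keeps the sketch focused on the ordering and the two concatenation points. In a real model, each entry would be a trained layer, and the three returned tensors would feed the YOLOv10 neck at three scales.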
[0040] In one implementation, the adaptive gamma correction module and the visual focusing noise modulation module first address variable lighting and complex backgrounds in the shooting scene, removing interference before feature extraction. The improved CBAM module then strengthens attention on key crack features. Repeated feature concatenation retains the basic features extracted in the early stages, avoiding the information loss caused by single-path feature enhancement. Progressive downsampling combined with multi-stage feature output lets the network detect cracks at different scales. The improved backbone thus represents fine-grained crack features better while keeping the computational efficiency of the original backbone, making it both robust and practical.
[0041] In one embodiment, the workflow of the improved CBAM module includes: obtaining the original feature map and inputting it into the channel attention module to obtain the first feature map; fusing the first feature map with the original feature map to obtain the second feature map; inputting the second feature map into the spatial attention module to obtain the third feature map; fusing the third feature map with the second feature map to obtain the fourth feature map; concatenating the fourth feature map with the original feature map to obtain the fifth feature map; and using the fifth feature map as the output of the improved CBAM module.
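A minimal NumPy sketch of this workflow for a single (C, H, W) feature map follows. It assumes the standard CBAM formulation for the two attention modules (a shared two-layer MLP over average- and max-pooled channel descriptors, and a k×k convolution over channel-wise mean and max maps), that each attention module returns its input multiplied by the attention weights, and that "fusion" means element-wise addition; the patent does not specify these details. The weights `w1`, `w2`, and `kernel` are hypothetical placeholders for learned parameters.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(f, w1, w2):
    # f: (C, H, W); w1: (C // r, C) and w2: (C, C // r) form a shared MLP with ReLU.
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)
    mc = sigmoid(mlp(f.mean(axis=(1, 2))) + mlp(f.max(axis=(1, 2))))  # (C,) channel weights
    return f * mc[:, None, None]


def spatial_attention(f, kernel):
    # kernel: (2, k, k) weights of a k x k convolution over [mean; max] maps, zero "same" padding.
    desc = np.stack([f.mean(axis=0), f.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    h, w = f.shape[1:]
    logits = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            # Weighted sum of the two descriptor maps at this kernel offset.
            logits += np.tensordot(kernel[:, i, j], padded[:, i:i + h, j:j + w], axes=1)
    return f * sigmoid(logits)[None, :, :]


def improved_cbam(f, w1, w2, kernel):
    first = channel_attention(f, w1, w2)
    second = first + f                              # fusion with the original map
    third = spatial_attention(second, kernel)
    fourth = third + second                         # fusion with the second map
    return np.concatenate([fourth, f], axis=0)      # fifth map: concat with the original


rng = np.random.default_rng(0)
c, r = 8, 4
f = rng.standard_normal((c, 16, 16))
out = improved_cbam(f, rng.standard_normal((c // r, c)), rng.standard_normal((c, c // r)),
                    rng.standard_normal((2, 7, 7)))
print(out.shape)  # (16, 16, 16)
```

The final concatenation doubles the channel count, and its second half is the untouched input, which is the residual-style path that [0043] credits with preserving the original information.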
[0042] In one implementation, the module enhances road-crack features by applying channel attention and spatial attention step by step, combined with multi-round feature fusion and a final feature concatenation, so that the original feature information is preserved to the greatest extent. The processing follows a fixed sequence: the original feature map is input into the channel attention module to obtain the first feature map; the first feature map is fused with the original feature map to generate the second feature map; the second feature map is sent to the spatial attention module to obtain the third feature map; the third feature map is fused with the second feature map to obtain the fourth feature map; and the fourth feature map is concatenated with the original feature map, the resulting fifth feature map being the final output of the module.
[0043] In one implementation, multi-round feature fusion addresses the feature redundancy and distortion that easily arise when conventional CBAM modules are stacked, keeping the attention-enhanced features associated with the preceding features and reducing the loss of feature information. The final concatenation with the original feature map follows the residual-connection design concept, alleviating vanishing gradients during training and reducing the risk of overfitting. Compared with the original YOLOv10, the design adds only a small number of parameters and little computation: it improves the model's ability to represent key features such as fine cracks and details while keeping the module lightweight enough to meet the real-time deployment requirements of architectural surveying.
[0044] In one embodiment, the adaptive gamma correction module is expressed mathematically as:
$$I' = \mathrm{Concat}\left(I_r^{\gamma+\Delta\gamma_r},\ I_g^{\gamma+\Delta\gamma_g},\ I_b^{\gamma+\Delta\gamma_b}\right)$$
where I' is the output of the adaptive gamma correction module; I_r, I_g, and I_b are the normalized red, green, and blue single-channel images; γ is the gamma value; and Δγ_r, Δγ_g, and Δγ_b are the deviation values for the red, green, and blue channels, respectively.
[0045] In one implementation, Concat denotes concatenating the three corrected channel images, and γ is the gamma value used to adjust brightness. When γ>1, dark areas are compressed, the image becomes darker overall, and details in bright areas are enhanced; when γ<1, bright areas are compressed, the image becomes brighter overall, and details in dark areas are enhanced. The gamma value is obtained by randomly sampling the brightness of the original image I and taking the average value; it is randomly sampled from the interval [0.45, 2.2], covering the typical lighting conditions of road scenes, including underexposure (low γ), normal lighting, and overexposure (high γ).
[0046] In one implementation, the original image I is separated into its R, G, and B channels, and a gamma transform is designed for each channel instead of a single global value. A small random deviation is added to the base gamma value of each channel to simulate real-world lighting fluctuations such as sunlight reflection, road shadows, and camera exposure deviations. Each deviation Δγ_c (c ∈ {r, g, b}) is sampled independently from the interval [−0.1γ, 0.1γ]. Because the deviation amplitude is proportional to γ, it avoids excessive perturbation when γ is small and insufficient perturbation when γ is large, balancing augmentation diversity against training stability.
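A minimal NumPy sketch of this per-channel gamma augmentation follows. It assumes an (H, W, 3) RGB image already normalized to [0, 1] and a power-law transform I_c^(γ+Δγ_c) per channel; it samples γ uniformly from [0.45, 2.2] and each Δγ_c uniformly from [−0.1γ, 0.1γ], then restacks the channels (the Concat step). The function name and the optional `gamma` override are hypothetical conveniences for testing.

```python
import numpy as np


def adaptive_gamma(img, rng, gamma_range=(0.45, 2.2), rel_dev=0.1, gamma=None):
    """Per-channel gamma correction with random channel deviations.

    img: (H, W, 3) float array in [0, 1]. Returns (corrected image, gamma, deviations).
    """
    if gamma is None:
        gamma = rng.uniform(*gamma_range)                           # base gamma
    dev = rng.uniform(-rel_dev * gamma, rel_dev * gamma, size=3)    # deviations for r, g, b
    channels = [np.power(img[..., c], gamma + dev[c]) for c in range(3)]
    return np.stack(channels, axis=-1), gamma, dev                  # Concat of corrected channels


rng = np.random.default_rng(42)
img = rng.uniform(0.0, 1.0, size=(4, 4, 3))
out, g, d = adaptive_gamma(img, rng)
print(round(g, 3), np.round(d, 3))
```

As a worked check of [0045], a mid-grey value of 0.5 maps to about 0.218 at γ = 2.2 (darker) and to about 0.732 at γ = 0.45 (brighter). Because the deviation bound scales with γ, γ = 0.5 allows each channel to drift by at most ±0.05, while γ = 2.0 allows up to ±0.2.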
[0047] In one embodiment, the visual focusing noise modulation module operates as follows. An original training image is obtained, and target recognition is performed on it to obtain bounding boxes; the bounding boxes are added to the original training image to obtain a labeled training image, and the target density of each bounding box in the labeled training image is calculated. The target density is the density of spatial locations in each bounding box, where the spatial locations are the pixels used for target recognition. Each target density is normalized, the coordinates of each normalized target density are determined, and these are combined to obtain a visual focus map. Basic noise is then obtained, and adaptive noise is generated from the visual focus map and the basic noise. The image channels are determined from the original training image; if they belong to a preset channel set, the adaptive noise is added to the original training image to obtain a noise-modulated image, which is used as the output of the visual focusing noise modulation module.
[0048] In one implementation, targets in the original training image, such as building cracks, potholes, and geometric features, are identified to obtain bounding boxes, which are rectangular borders. Adding these bounding boxes to the original training image yields the labeled training image.
[0049] In one implementation, the target density of each bounding box in the labeled training image is calculated, where a spatial location is the pixel position of a target in the image. The target density is calculated as:
$$T(x,y) = \sum_{i=1}^{n} \mathbb{1}\left((x,y) \in B_i\right)$$
where T(x,y) is the target density at spatial location (x,y), B_i is the i-th bounding box, n is the total number of bounding boxes, and 𝟙(·) is the indicator function, which equals 1 if the spatial location is inside the box and 0 if it is outside. The higher the value of T(x,y), the denser the targets at that location.
[0050] In one implementation, each target density is normalized to obtain a normalized target density, and the coordinates of the normalized target densities are determined and combined into a visual focus map. Normalizing T(x,y) removes differences in density values between images and between batches, so the resulting visual focus map is comparable across samples. The visual focus map T'(x,y) takes values in [0,1]; the higher the value, the higher the priority of the core target region.
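The density and normalization steps can be sketched in NumPy as below. The sketch assumes axis-aligned boxes given as hypothetical (x1, y1, x2, y2) pixel coordinates with exclusive upper bounds, and it uses min-max normalization to map T(x, y) into [0, 1]; the text states only that the normalized map lies in [0, 1], so the exact normalization is an assumption.

```python
import numpy as np


def target_density(h, w, boxes):
    """T(x, y): number of bounding boxes that contain each pixel."""
    t = np.zeros((h, w), dtype=np.int64)
    for x1, y1, x2, y2 in boxes:
        t[y1:y2, x1:x2] += 1  # indicator 1[(x, y) in B_i], accumulated over all boxes
    return t


def visual_focus_map(t):
    """T'(x, y) in [0, 1]; min-max normalization is an assumption."""
    t = t.astype(np.float64)
    lo, hi = t.min(), t.max()
    if hi == lo:
        return np.zeros_like(t)  # no contrast between locations: no focus region
    return (t - lo) / (hi - lo)


t = target_density(8, 8, [(0, 0, 4, 4), (2, 2, 6, 6)])
focus = visual_focus_map(t)
print(t[3, 3], focus[3, 3], focus[0, 0], focus[7, 7])  # 2 1.0 0.5 0.0
```

Pixels where the two boxes overlap receive the highest density and therefore the maximum focus value of 1.0.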
[0051] In one implementation, Gaussian noise following a normal distribution, with exactly the same size as the input image, is generated as the basic noise source, and adaptive noise is generated from the visual focus map and the basic noise, where N1(x,y) is the adaptive noise, T'(x,y) is the visual focus map, and N0(x,y) is the basic noise.
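The remaining steps of the visual focusing noise modulation module can be sketched as follows. The combination formula for N1 is not reproduced in this text, so the sketch assumes an element-wise product N1(x, y) = T'(x, y)·N0(x, y), which concentrates noise in high-focus regions; it also reads the channel check as testing the image's channel count against a preset set. The standard deviation `sigma` and the set `allowed_channels` are hypothetical parameters.

```python
import numpy as np


def noise_modulate(img, focus, rng, sigma=0.05, allowed_channels=(1, 3)):
    """img: (H, W, C) float image; focus: (H, W) visual focus map T' in [0, 1]."""
    if img.shape[-1] not in allowed_channels:     # channel check against the preset set
        return img.copy()
    n0 = rng.normal(0.0, sigma, size=img.shape)   # basic Gaussian noise, same size as the image
    n1 = focus[..., None] * n0                    # adaptive noise (assumed N1 = T' * N0)
    return img + n1                               # noise-modulated image


rng = np.random.default_rng(0)
img = np.full((4, 4, 3), 0.5)
focus = np.zeros((4, 4))
focus[1:3, 1:3] = 1.0
out = noise_modulate(img, focus, rng)
```

Pixels with a focus value of 0 are left unchanged, and the noise amplitude grows with the focus value; an inverse weighting such as (1 − T') would instead spare the target regions.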
[0052] The above are merely preferred embodiments of the present invention and are not intended to limit the present invention in any way. Although the present invention has been disclosed above with reference to preferred embodiments, it is not intended to limit the present invention. Any person skilled in the art can make some modifications or alterations to the above-disclosed technical content to create equivalent embodiments without departing from the scope of the present invention. Any simple modifications, equivalent changes and alterations made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention should still fall within the scope of the claims of the present invention.
Claims
1. A method for intelligent surveying of building construction projects, characterized in that the method includes: obtaining the original survey map of the target building, and extracting a grayscale texture matrix and an edge feature matrix from the original survey map; enhancing the grayscale texture matrix and the edge feature matrix to obtain an enhanced texture matrix and an enhanced edge matrix, respectively; mapping the enhanced texture matrix to the R channel to obtain an R-channel feature map, mapping the original survey map to the G channel to obtain a G-channel feature map, and mapping the enhanced edge matrix to the B channel to obtain a B-channel feature map; fusing the R-channel, G-channel, and B-channel feature maps to obtain an enhanced survey map; and recognizing the enhanced survey map with the target model to obtain survey data, which is then uploaded to the cloud.
2. The intelligent surveying method for building construction projects according to claim 1, characterized in that extracting the grayscale texture matrix from the original survey map includes: denoising the original survey map to obtain a denoised survey map; converting the denoised survey map to grayscale to obtain a two-dimensional grayscale image; normalizing the two-dimensional grayscale image to obtain a normalized grayscale image; calculating a gray-level co-occurrence matrix set from the normalized grayscale image, the set containing gray-level co-occurrence matrices in four different directions; and extracting texture features from each gray-level co-occurrence matrix in the set and taking their average to obtain the grayscale texture matrix; and extracting the edge feature matrix from the original survey map includes: applying a second Gaussian filtering to the normalized grayscale image to obtain a smoothed grayscale image; performing gradient calculation on the smoothed grayscale image to obtain a gradient magnitude image; performing non-maximum suppression on the gradient magnitude image to obtain a refined gradient image; applying double-threshold segmentation to the refined gradient image to obtain a binarized edge map; performing a morphological closing operation on the binarized edge map to obtain an edge map; and converting the edge map into a two-dimensional numerical matrix to obtain the edge feature matrix.
3. The intelligent surveying method for building construction projects according to claim 1, characterized in that enhancing the grayscale texture matrix and the edge feature matrix to obtain an enhanced texture matrix and an enhanced edge matrix, respectively, includes: traversing the row indices of a target matrix to obtain a row autocorrelation matrix, and traversing the column indices of the target matrix to obtain a column autocorrelation matrix, the target matrix being either the grayscale texture matrix or the edge feature matrix; applying preset processing to the row autocorrelation matrix and the column autocorrelation matrix to obtain an optimized row autocorrelation matrix and an optimized column autocorrelation matrix; and fusing the target matrix, the optimized row autocorrelation matrix, and the optimized column autocorrelation matrix to obtain the corresponding enhanced texture matrix or enhanced edge matrix; wherein the preset processing includes outlier removal and grayscale compression.
4. The intelligent surveying method for building construction projects according to claim 1, characterized in that mapping the enhanced texture matrix to the R channel to obtain the R-channel feature map includes: determining the elements of the enhanced texture matrix, determining pixel values based on the elements, and replacing the elements of the enhanced texture matrix with the pixel values to obtain the R-channel feature map; mapping the original survey map to the G channel to obtain the G-channel feature map includes: separating the original survey map into R, G, and B channels and using the image corresponding to the G channel as the G-channel feature map; and mapping the enhanced edge matrix to the B channel to obtain the B-channel feature map includes: determining the elements of the enhanced edge matrix, determining pixel values based on the elements, and replacing the elements of the enhanced edge matrix with the pixel values to obtain the B-channel feature map.
5. The intelligent surveying method for building construction projects according to claim 1, characterized in that the target model improves on YOLOv10 and is obtained by replacing the backbone network of the YOLOv10 model with an improved backbone network, whose working principle includes: acquiring the original image and inputting it sequentially into the adaptive gamma correction module, the visual focusing noise modulation module, a Conv module, and a further Conv module to obtain a first image; inputting the first image into a C2f module to obtain a second image; inputting the second image into the improved CBAM module to obtain a third image; concatenating the second image and the third image to obtain a fourth image; inputting the fourth image into a Conv module to obtain a fifth image, which is used as a first output of the improved backbone network; inputting the fifth image sequentially into a C2f module and the improved CBAM module to obtain a sixth image; concatenating the fifth image and the sixth image to obtain a seventh image; downsampling the seventh image to obtain an eighth image, which is used as a second output of the improved backbone network; inputting the eighth image into a C2f module and downsampling the result to obtain a ninth image; and inputting the ninth image sequentially into a C2fCIB module, an SPPF module, and a PSA module to obtain a tenth image, which is used as a third output of the improved backbone network.
6. The intelligent surveying method for building construction projects according to claim 5, characterized in that, The workflow of the improved CBAM module includes: The original feature map is obtained, and then input into the channel attention module to obtain the first feature map. The first feature map is fused with the original feature map to obtain the second feature map. The second feature map is input into the spatial attention module to obtain the third feature map. The third feature map is fused with the second feature map to obtain the fourth feature map. The fourth feature map is concatenated with the original feature map to obtain the fifth feature map. The fifth feature map is used as the output of the improved CBAM module.
7. The intelligent surveying method for building construction projects according to claim 5, characterized in that the mathematical expression of the adaptive gamma correction module is:
$$I' = \mathrm{Concat}\left(I_r^{\gamma+\Delta\gamma_r},\ I_g^{\gamma+\Delta\gamma_g},\ I_b^{\gamma+\Delta\gamma_b}\right)$$
where I' is the output of the adaptive gamma correction module; I_r, I_g, and I_b are the normalized red, green, and blue single-channel images; γ is the gamma value; and Δγ_r, Δγ_g, and Δγ_b are the deviation values for the red, green, and blue channels, respectively.
8. The intelligent surveying method for building construction projects according to claim 5, characterized in that the visual focusing noise modulation module includes: obtaining an original training image, performing target recognition on it to obtain bounding boxes, adding the bounding boxes to the original training image to obtain a labeled training image, and calculating the target density of each bounding box in the labeled training image, the target density being the density of spatial positions in each bounding box and the spatial positions being the pixels used for target recognition; normalizing each target density to obtain a normalized target density, determining the coordinates of each normalized target density, and combining them to obtain a visual focus map; and obtaining basic noise, generating adaptive noise from the visual focus map and the basic noise, determining the image channels from the original training image, and, if the image channels belong to a preset channel set, adding the adaptive noise to the original training image to obtain a noise-modulated image, which is used as the output of the visual focusing noise modulation module.