A multi-view based ancient building crack marking method and device
By using a multi-view geometric method to obtain spatial location information of cracks in ancient buildings, the problem of lacking a unified reference benchmark in existing technologies is solved, and the automatic extraction and quantitative representation of cracks are realized, thereby improving the practicality and engineering application value of ancient building inspection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XIAN UNIV OF TECH
- Filing Date
- 2026-03-26
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies cannot directly obtain the specific spatial location information of cracks in ancient buildings from point cloud models reconstructed in 3D. There is a lack of unified reference benchmarks, making it difficult to achieve automatic extraction and quantitative representation of cracks.
Using a multi-view geometric method, we acquire close-up and distant image sequences, perform affine invariant feature matching and 3D reconstruction, construct a unified reference coordinate system, fit the crack and ground plane normal vectors, perform cross product operations, establish an orthogonal coordinate system, and extract the spatial location information of the crack.
It enables the quantitative expression of the spatial location of cracks, provides key parameters to support repair plans, improves the automation and universality of detection, and is applicable to ancient buildings of different materials and structures.
Figure CN122265604A_ABST
Description
Technical Field
[0001] This invention belongs to the field of image recognition, specifically relating to a method and apparatus for marking cracks in ancient buildings based on multiple views.
Background Technology
[0002] Ancient buildings, as an important component of immovable cultural relics, are receiving increasing attention for their protection. Cracks, as a key indicator affecting the structural safety of ancient buildings, are crucial for assessing their health status and developing restoration plans.
[0003] Current technologies for detecting cracks in ancient buildings mainly fall into two categories: two-dimensional detection methods based on close-up images and holistic digital methods based on three-dimensional reconstruction. Close-up image detection, through digital image processing or deep learning algorithms, can achieve high-precision crack identification and parameter extraction, with detection accuracy reaching sub-millimeter levels, meeting engineering inspection requirements. However, close-up images have a limited field of view, only reflecting local details and failing to present the spatial relationship between cracks and the overall building. Three-dimensional reconstruction technologies, such as multi-view geometric methods, can construct a holistic three-dimensional point cloud model of the ancient building based on a sequence of distant images, enabling digital archiving of the structure. However, cracks account for a smaller percentage of pixels in distant images, making it difficult to guarantee crack detection accuracy when directly relying on distant images, and potentially leading to missed detections.
[0004] To address the aforementioned issues, crack detection methods that fuse near-field and far-field images have emerged in recent years. These methods map high-precision crack information extracted from near-field images onto far-field images, and then use 3D reconstruction to obtain a point cloud model with crack markings. This approach preserves crack details while enabling the visualization of cracks within the 3D model.
[0005] However, in existing methods, the initial point cloud model obtained from 3D reconstruction is located in a random camera coordinate system. The coordinate systems of point cloud models vary depending on the building and shooting conditions, lacking a unified measurement benchmark. The location information of cracks in the point cloud model is only represented as relative coordinates, making it difficult to directly convert into spatially meaningful locations with actual physical significance, such as crack height above the ground or horizontal offset. In the practice of ancient building conservation, maintenance personnel need to know the specific spatial location of cracks to formulate repair plans and set up work platforms, but existing technologies cannot automatically obtain this crucial information directly from the point cloud model.
[0006] Therefore, how to establish a unified reference benchmark based on obtaining a three-dimensional point cloud model with crack markings, and realize the automatic extraction and quantitative expression of crack spatial information, has become a technical problem that urgently needs to be solved in the field of ancient building crack detection technology.
Summary of the Invention
[0007] To address the challenge of establishing a unified reference benchmark based on a 3D point cloud model with crack markings, thereby enabling the automatic extraction and quantification of crack spatial information, this invention provides a method and apparatus for marking cracks in ancient buildings based on multiple views.
[0008] To achieve the above objectives, the present invention provides the following technical solution: a method for marking cracks in ancient buildings based on multiple views, the method comprising: acquiring a close-up image containing cracks and a sequence of multiple distant images containing the whole or part of the ancient building; extracting crack information from the close-up image to obtain crack pixel coordinates; performing affine invariant feature matching between the close-up image and each distant image to map the crack pixel coordinates in the close-up image to the distant image, forming a new distant image sequence; performing multi-view geometric 3D reconstruction based on the new distant image sequence to generate a 3D point cloud model of the ancient building with crack markings; in the 3D point cloud model, fitting the plane where the crack is located and the ground plane respectively to obtain the corresponding first normal vector and second normal vector, wherein the fitting is implemented by singular value decomposition based on the least squares method, and the right singular vector corresponding to the minimum singular value (equivalently, the eigenvector of the point covariance matrix with the smallest eigenvalue) is used as the normal vector of the plane; performing a cross product operation on the first normal vector and the second normal vector to obtain a third vector parallel to the ground; using the first normal vector, the second normal vector and the third vector as the basis axes of a new coordinate system to construct a unified reference coordinate system; and, in the unified reference coordinate system, extracting the spatial location information of the cracks based on the crack markings in the 3D point cloud model, thereby realizing the spatial positioning and marking of the cracks.
[0009] Optionally, performing affine invariant feature matching between the close-up image and each distant image to map the crack pixel coordinates in the close-up image to the distant image includes: extracting affine invariant feature points from the close-up and distant images with the ASIFT algorithm and performing feature matching; removing mismatched point pairs with the RANSAC algorithm to obtain an optimized matching point set; solving the homography matrix between the close-up image and the distant image from the optimized matching point set using the eight-point method or singular value decomposition; and, based on the homography matrix, mapping the crack pixel coordinates in the close-up image to the distant image as follows: $x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}$; $y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}$; where $x$ and $y$ are the coordinates of the crack pixels in the close-up image, $x'$ and $y'$ are the pixel coordinates of the crack in the mapped distant image, and $H = (h_{ij})$ is the homography matrix.
[0010] Optionally, performing multi-view geometric 3D reconstruction based on the new distant image sequence to generate a 3D point cloud model of the ancient building with crack markings includes: performing sparse point cloud reconstruction on the new distant image sequence with an incremental structure-from-motion (SfM) algorithm to obtain the camera poses and sparse 3D points; and performing dense matching on the sparse point cloud with a patch-based multi-view stereo algorithm to generate a dense 3D point cloud model of the ancient building with crack markings; wherein, during the dense matching on the sparse point cloud, bundle adjustment is used to jointly optimize the camera parameters and the 3D point coordinates so as to minimize the reprojection error.
[0011] Optionally, the extraction of the spatial location information of the crack includes: Crack point clouds are segmented from a 3D point cloud model based on color features; The maximum and minimum coordinates of the crack point cloud in the direction perpendicular to the ground are statistically analyzed and used as the distances from the top and bottom of the crack to the ground, thereby realizing the spatial positioning of the crack.
[0012] Optionally, extracting crack information from the close-up image includes: preprocessing the close-up image by grayscale conversion, filtering for noise reduction, and contrast enhancement; segmenting the cracks in the preprocessed image with a region growing algorithm that combines the local Otsu method and the local mean to obtain a binarized crack image; skeletonizing the binarized crack image and removing burrs; and collecting the pixel coordinates of the crack skeleton to obtain the crack pixel coordinates.
[0013] A multi-view-based crack marking device for ancient buildings, the device comprising: an acquisition module, used to acquire a close-up image containing cracks and a sequence of multiple distant images containing the whole or part of the ancient building, and to extract crack information from the close-up image to obtain the crack pixel coordinates; a generation module, used to perform affine invariant feature matching between the close-up image and each distant image, and to map the crack pixel coordinates in the close-up image to the distant image to form a new distant image sequence; a reconstruction module, used to perform multi-view geometric 3D reconstruction based on the new distant image sequence to generate a 3D point cloud model of the ancient building with crack markings; a construction module, used to fit the plane where the crack is located and the ground plane in the 3D point cloud model, respectively, to obtain the corresponding first normal vector and second normal vector, wherein the fitting is implemented by singular value decomposition based on the least squares method and the right singular vector corresponding to the minimum singular value is used as the normal vector of the plane, to perform a cross product operation on the first normal vector and the second normal vector to obtain a third vector parallel to the ground, and to construct the unified reference coordinate system using the first normal vector, the second normal vector and the third vector as the basis axes of the new coordinate system; and a marking module, used to extract the spatial location information of the cracks based on the crack markings in the 3D point cloud model within the unified reference coordinate system, thereby realizing the spatial positioning and marking of the cracks.
[0014] The multi-view-based method for marking cracks in ancient buildings provided by this invention has the following beneficial effects: by fitting the plane containing the crack and the ground plane, extracting their normal vectors, and constructing an orthogonal coordinate system whose axes point perpendicular to the ground, parallel to the ground, and perpendicular to the wall, this invention transforms point cloud data from a random camera coordinate system into a spatial reference with clear physical meaning. This coordinate system construction method allows the spatial location of the crack to be expressed quantitatively, and key parameters such as the crack's centroid coordinates and height above the ground can be extracted automatically, providing direct data support for the formulation of ancient building restoration plans. At the same time, the unified coordinate system allows data collected at different times to be compared under the same reference, creating conditions for long-term monitoring of crack evolution. Furthermore, the method is constructed adaptively from the point cloud's own geometric features and requires no external measurement equipment, so it is highly automated and applies to ancient buildings of different materials and structures, improving the practicality and engineering application value of crack detection in ancient buildings.
Attached Figure Description
[0015] To more clearly illustrate the embodiments and design schemes of the present invention, the accompanying drawings required for this embodiment will be briefly described below. The drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0016] Figure 1 is a flowchart illustrating a multi-view-based method for marking cracks in ancient buildings according to an exemplary embodiment of the present invention.
[0017] Figure 2 is a schematic diagram of a crack spatial marking and positioning process provided by the present invention according to an exemplary embodiment.
[0018] Figure 3 is a schematic diagram of an affine camera model provided by the present invention according to an exemplary embodiment.
[0019] Figure 4 is a schematic diagram of the ASIFT spatial sampling points provided by the present invention according to an exemplary embodiment, wherein (a) is a front view and (b) is a top view.
[0020] Figure 5 is a schematic diagram of a Gaussian pyramid model provided by the present invention according to an exemplary embodiment.
[0021] Figure 6 is a schematic diagram of close-up to distant crack mapping provided by the present invention according to an exemplary embodiment.
[0022] Figure 7 is a schematic diagram of the main process of multi-view geometric three-dimensional reconstruction according to an exemplary embodiment of the present invention.
[0023] Figure 8 is a schematic diagram of the main process of an incremental SfM algorithm provided by the present invention according to an exemplary embodiment.
[0024] Figure 9 is a schematic diagram of an image connectivity graph according to an exemplary embodiment of the present invention.
[0025] Figure 10 is a schematic diagram of a CMVS algorithm flow provided by the present invention according to an exemplary embodiment.
[0026] Figure 11 is a block diagram of a multi-view-based crack marking device for ancient buildings provided by the present invention according to an exemplary embodiment.
Detailed Implementation
[0027] To enable those skilled in the art to better understand and implement the technical solutions of the present invention, the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments. The following embodiments are only used to more clearly illustrate the technical solutions of the present invention and should not be construed as limiting the scope of protection of the present invention.
[0028] The technical solutions provided by the various embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
[0029] First, this invention provides a method for marking cracks in ancient buildings based on multiple views. As shown in Figure 1, the method includes the following steps: S101. Obtain a close-up image containing cracks and a sequence of multiple distant images containing the whole or part of the ancient building; extract crack information from the close-up image and obtain the crack pixel coordinates.
[0030] In this step, when extracting crack information, the close-up image is first preprocessed by grayscale conversion, filtering for noise reduction, and contrast enhancement. Then, a region growing algorithm combining the local Otsu method and the local mean is used to segment the crack in the preprocessed image to obtain a binarized crack image. Finally, the binarized crack image is skeletonized and burrs are removed. The pixel coordinates of the crack skeleton are collected to obtain the crack pixel coordinates.
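As a rough illustration of the thresholding idea behind this step, the sketch below computes a plain global Otsu threshold with NumPy. It is a simplified stand-in, not the combined local Otsu / local mean region growing algorithm of the invention, and it omits skeletonization and burr removal. The function name `otsu_threshold` and the 8x8 test patch are hypothetical.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold t of a uint8 image; pixels <= t form the dark class."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # cumulative probability of the dark class
    mu = np.cumsum(prob * np.arange(256))    # cumulative first moment
    denom = omega * (1.0 - omega)
    between = np.zeros(256)                  # between-class variance per candidate t
    valid = denom > 0
    between[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(between))

# Hypothetical patch: bright masonry (200) with a dark vertical crack (30).
patch = np.full((8, 8), 200, dtype=np.uint8)
patch[:, 3] = 30
t = otsu_threshold(patch)
crack_mask = patch <= t                      # binarized crack image
crack_pixels = np.argwhere(crack_mask)       # (row, col) coordinates of crack pixels
```

A local variant would apply the same function to sliding windows around seed pixels instead of to the whole image.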
[0031] S102. Perform affine invariant feature matching on the close-up image and each distant image, and map the crack pixel coordinates in the close-up image to the distant image to form a new distant image sequence.
[0032] In this step, the ASIFT algorithm is used to extract affine invariant feature points from the near-field and far-field images and perform feature matching; the RANSAC algorithm is used to remove mismatched point pairs to obtain an optimized matching point set; based on the optimized matching point set, the homography matrix between the near-field and far-field images is solved using the eight-point method or singular value decomposition.
[0033] Based on this homography matrix, the formula for mapping the crack pixel coordinates in the close-up image to those in the distant image is as follows: $x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}$; $y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}$; where $x$ and $y$ are the coordinates of the crack pixels in the close-up image, $x'$ and $y'$ are the pixel coordinates of the crack in the mapped distant image, and $H = (h_{ij})$ is the homography matrix.
[0034] S103. Based on the new distant image sequence, perform multi-view geometric 3D reconstruction to generate a 3D point cloud model of the ancient building with crack markings.
[0035] In this step, based on the incremental structure-from-motion (SfM) algorithm, sparse point cloud reconstruction is performed on the new distant image sequence to obtain the camera poses and sparse 3D points. A patch-based multi-view stereo algorithm is used to perform dense matching on the sparse point cloud to generate a dense 3D point cloud model of the ancient building with crack markings. In the process of dense matching on the sparse point cloud, bundle adjustment is used to jointly optimize the camera parameters and 3D point coordinates to minimize the reprojection error.
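The quantity that bundle adjustment minimizes can be made concrete with a short sketch. The code below only evaluates the reprojection error for one pinhole camera; it does not perform the optimization itself. The intrinsic matrix `K`, the pose `R`, `t`, the sample points, and the function name `reprojection_rmse` are all illustrative assumptions.

```python
import numpy as np

def reprojection_rmse(K, R, t, points_3d, observed_px):
    """Root-mean-square pixel distance between projected 3D points and observations."""
    X = np.asarray(points_3d, dtype=float)
    cam = X @ R.T + t                    # world -> camera coordinates
    proj = cam @ K.T                     # camera -> homogeneous pixel coordinates
    px = proj[:, :2] / proj[:, 2:3]      # perspective division
    residual = px - np.asarray(observed_px, dtype=float)
    return float(np.sqrt((residual ** 2).sum(axis=1).mean()))

# Hypothetical camera and points (not taken from the patent).
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
X = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.0], [0.0, 1.0, 4.0]])
exact = np.array([[320.0, 240.0], [520.0, 240.0], [320.0, 490.0]])
rmse_shifted = reprojection_rmse(K, R, t, X, exact + np.array([3.0, 4.0]))
```

Bundle adjustment would adjust `K`, `R`, `t` and `X` jointly over all cameras so that this error, summed over all observations, becomes as small as possible.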
[0036] S104. In the three-dimensional point cloud model, fit the plane where the crack is located and the ground plane respectively to obtain the corresponding first normal vector and second normal vector; perform a cross product operation on the first normal vector and the second normal vector to obtain a third vector parallel to the ground; use the first normal vector, the second normal vector and the third vector as the basis axes of the new coordinate system to construct the unified reference coordinate system.
[0037] The fitting of the plane where the crack is located and of the ground plane is achieved by singular value decomposition based on the least squares method, and the right singular vector corresponding to the minimum singular value (equivalently, the eigenvector of the point covariance matrix with the smallest eigenvalue) is used as the normal vector of the plane.
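Step S104 can be sketched in a few lines of NumPy. The sketch below fits each plane by SVD of the centered points and builds the three axes with cross products. Two details go beyond the source and are assumptions: the ground normal is flipped so that the wall lies on its positive side, and the wall axis is re-orthogonalized in case the fitted planes are not exactly perpendicular. The function names are hypothetical.

```python
import numpy as np

def fit_plane_normal(points):
    """Least-squares plane fit: normal = right singular vector of the smallest singular value."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    return vt[-1], centroid

def build_reference_axes(wall_pts, ground_pts):
    """Return (axes, origin); rows of axes are along-wall, wall-normal, ground-normal."""
    n_wall, wall_centroid = fit_plane_normal(wall_pts)
    n_ground, ground_centroid = fit_plane_normal(ground_pts)
    # Assumption: orient the ground normal towards the wall so heights are positive.
    if np.dot(wall_centroid - ground_centroid, n_ground) < 0:
        n_ground = -n_ground
    along = np.cross(n_wall, n_ground)       # third vector, parallel to the ground
    along /= np.linalg.norm(along)
    n_wall = np.cross(n_ground, along)       # assumption: re-orthogonalized wall normal
    return np.vstack([along, n_wall, n_ground]), ground_centroid

# Synthetic check: wall in the plane x = 0, ground in the plane z = 0.
g = np.arange(5.0)
wall = np.array([[0.0, y, z] for y in g for z in g[:4]])
ground = np.array([[x, y, 0.0] for x in g for y in g])
axes, origin = build_reference_axes(wall, ground)
```

The sign of an SVD normal is arbitrary, which is why some orientation rule like the one above is needed before heights can be read off.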
[0038] S105. In a unified reference coordinate system, based on the crack markers in the three-dimensional point cloud model, extract the spatial location information of the cracks to realize the spatial positioning and marking of the cracks.
[0039] The transformation relationship for converting the original 3D point cloud coordinates to this unified reference coordinate system is as follows: $P' = RP + T$; where $P$ is any vector in the original coordinate system, $R$ is the rotation matrix, $T$ is the translation vector, and $P'$ is the vector corresponding to $P$ in the reference coordinate system.
[0040] Finally, the crack point cloud is segmented from the 3D point cloud model based on color features; the maximum and minimum coordinates of the crack point cloud in the direction perpendicular to the ground are calculated and used as the distances to the top and bottom of the crack relative to the ground, thus realizing the spatial positioning of the crack.
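The transformation and the height extraction can be sketched together as follows. The sketch assumes that the crack points are marked in red, that the rows of the rotation matrix `R` are the reference axes with the ground normal last, and that the translation is `T = -R @ origin` for a point `origin` on the ground. The color thresholds and function names are hypothetical.

```python
import numpy as np

def segment_crack_points(xyz, rgb, r_min=200, gb_max=60):
    """Keep points whose color is close to pure red (thresholds are assumptions)."""
    xyz = np.asarray(xyz, dtype=float)
    rgb = np.asarray(rgb)
    mask = (rgb[:, 0] >= r_min) & (rgb[:, 1] <= gb_max) & (rgb[:, 2] <= gb_max)
    return xyz[mask]

def to_reference_frame(points, R, T):
    """Apply P' = R P + T to every row of points."""
    return np.asarray(points, dtype=float) @ R.T + T

def crack_height_range(crack_xyz, R, T):
    """Bottom and top height of the crack along the ground-normal axis (third coordinate)."""
    heights = to_reference_frame(crack_xyz, R, T)[:, 2]
    return float(heights.min()), float(heights.max())

# Hypothetical colored point cloud: two red crack points, two masonry points.
xyz = [[0.0, 1.0, 1.0], [0.0, 1.0, 2.0], [0.0, 2.0, 0.2], [1.0, 1.0, 0.0]]
rgb = [[255, 0, 0], [250, 10, 5], [120, 110, 100], [90, 90, 90]]
R = np.eye(3)                              # axes already aligned in this toy case
origin = np.array([0.0, 0.0, -0.5])        # a point on the ground plane
T = -R @ origin
bottom, top = crack_height_range(segment_crack_points(xyz, rgb), R, T)
```

Heights are expressed in the units of the reconstruction; converting them to metres needs a known scale, which this sketch does not address.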
[0041] With the above method, by fitting the plane where the crack is located and the ground plane, extracting their normal vectors, and constructing an orthogonal coordinate system whose axes point perpendicular to the ground, parallel to the ground, and perpendicular to the wall, the point cloud data in the random camera coordinate system is transformed into a spatial reference with clear physical meaning. This coordinate system construction method allows the spatial location of the crack to be expressed quantitatively, and key parameters such as the crack centroid coordinates and height above the ground can be extracted automatically, providing direct data support for the formulation of ancient building restoration plans. At the same time, the unified coordinate system allows data collected at different times to be compared under the same reference, creating conditions for long-term monitoring of crack evolution. Furthermore, the method is constructed adaptively from the point cloud's own geometric features and requires no external measurement equipment, so it is highly automated and applies to ancient buildings of different materials and structures, improving the practicality and engineering application value of ancient building crack detection.
[0042] Based on the above steps, this invention also provides an embodiment. Image matching is a crucial step in mapping cracks in ancient buildings: the close-up crack image, in which the crack has already been identified, must be matched with the distant image. If cracks are extracted directly from the distant image, their accuracy cannot be guaranteed because crack pixels make up only a small share of the image, so the positional relationship between the crack and the image cannot be determined accurately. Furthermore, multi-view geometric 3D reconstruction requires multiple distant views of the ancient building in order to reconstruct its 3D model. Therefore, this invention proposes the crack spatial marking and localization process illustrated in Figure 2. Multi-view geometric 3D reconstruction is based on a series of 2D images. If the same crack is identified and marked in these images, spatial marking of the crack is achieved through 3D reconstruction, and the crack's spatial information is obtained. This embodiment takes the close-up crack image as its basis and uses the ASIFT algorithm to establish the relationship between the close-up and distant images. Cracks are mapped onto the distant image sequence using a homography matrix, and multi-view geometric 3D reconstruction is then used to achieve crack spatial marking. Finally, a reference coordinate system is established to obtain the crack's spatial information, enabling its spatial localization.
[0043] The distant view sequence is the foundation of 3D reconstruction and a prerequisite for spatial labeling and localization of cracks. Obtaining accurate crack spatial information requires mapping high-resolution near-view cracks onto the distant view, which necessitates finding the correspondence between the two images, i.e., image matching. This step uses the ASIFT algorithm to find the correspondence between the near-view and distant views, i.e., feature point extraction and matching, and then uses a homography matrix to map the cracks onto the distant view.
[0044] In feature point extraction and matching, the SIFT algorithm is a locally invariant feature method that is not easily affected by noise and illumination, and has good matching ability and robustness. The main idea of this algorithm is to find scale- and rotation-invariant feature points in scale space, and then define orientation parameters for the found key points to generate 128-dimensional feature point descriptors.
[0045] When significant affine distortion exists between two images, the number of feature points that the SIFT algorithm can extract and match drops sharply. As the viewing angle between the images increases, the number of feature points decreases and mismatches become more likely. The ASIFT algorithm, based on SIFT, adds the tilt (latitude) angle $\theta$ between the camera optical axis and the image normal and the rotation (longitude) angle $\varphi$. This achieves full invariance to the camera's latitude and longitude angles, and can be used to simulate all possible views of the original image. The model is shown in Figure 3. Normalized rotation and translation are then performed to simulate the entire affine space. Finally, the SIFT algorithm is used to extract feature points for all simulated cases.
[0046] In the model, $u$ is the imaging plane simulated by the camera, $\psi$ simulates the rotation of the camera about its optical axis, and $\lambda$ simulates changes in the camera focal length.
[0047] By changing these parameters, one can simulate the changes that may occur when a camera captures an object from different angles. The affine simulation is expressed as:
[0048] $A = \lambda \begin{bmatrix} \cos\psi & -\sin\psi \\ \sin\psi & \cos\psi \end{bmatrix} \begin{bmatrix} t & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{bmatrix}$; $t = \frac{1}{\cos\theta}$; where $\lambda > 0$, $\psi$ is the camera rotation parameter, $\theta$ is the degree of inclination, and $t$ is the absolute tilt, representing the tilt between the front view and the tilted view of the camera. In practice, $\theta$ and $\varphi$ must be sampled to achieve invariance to arbitrary affine transformations. Experiments show that the sampled tilt can reach an angle of about 80°, which means that distant crack images acquired at an 80° tilt angle can also be mapped. A larger acquisition angle means that, during image acquisition, the camera's optical axis can make a larger angle with the normal vector of the building surface onto which the crack is mapped. This enables close-up to distant crack mapping at large tilt angles, increases the number of distant images with crack markings, and yields a 3D model with clear and accurate location markings.
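To make the affine simulation concrete, the sketch below composes the matrix under the assumption that it follows the standard ASIFT decomposition $A = \lambda R(\psi) T_t R(\varphi)$ with absolute tilt $t = 1/\cos\theta$. The function names are hypothetical, and the sampling schedule for $\theta$ and $\varphi$ is deliberately left out because the source does not state it.

```python
import numpy as np

def rotation_2d(angle):
    """2x2 rotation matrix for an angle in radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def affine_simulation(lam, psi, t, phi):
    """A = lam * R(psi) @ diag(t, 1) @ R(phi); lam > 0 is zoom, t >= 1 is absolute tilt."""
    return lam * rotation_2d(psi) @ np.diag([t, 1.0]) @ rotation_2d(phi)

def tilt_from_latitude(theta_deg):
    """Absolute tilt t = 1 / cos(theta) for a latitude angle given in degrees."""
    return 1.0 / np.cos(np.radians(theta_deg))

# A latitude of 80 degrees corresponds to a tilt of roughly 5.76.
t_80 = tilt_from_latitude(80.0)
A_80 = affine_simulation(1.0, 0.0, t_80, np.radians(30.0))
```

Each sampled pair of tilt and longitude gives one simulated view of the image, on which SIFT features are then extracted.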
[0049] Once these sampling intervals are known, all transformed images can be simulated, as shown in Figure 4, where (a) is the front view and (b) is the top view. Feature points are then extracted and matched across all of these images to determine the relationship between the close-up and distant images. Experiments demonstrate that the ASIFT algorithm effectively increases the number of feature points while maintaining matching accuracy.
[0050] By constructing a multi-scale space for the image, the Gaussian pyramid shown in Figure 5 is obtained. Next, the Gaussian images at adjacent scales are subtracted to obtain the difference-of-Gaussians (DoG) multi-scale space. Finally, feature points are located in the difference space and described with 128-dimensional descriptors.
[0051] In this step, the SIFT algorithm operates in scale space, which is constructed as follows: $L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$; $G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}$; where $G(x, y, \sigma)$ is a two-dimensional Gaussian function with variable scale. Convolving it with the original image $I(x, y)$ gives the scale space $L(x, y, \sigma)$. The blur factor $\sigma$ determines the degree of blur in an image; a larger value indicates a more blurred image. For an image $I$, its images at different values of $\sigma$ must be computed.
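A minimal NumPy sketch of the scale space and its difference-of-Gaussians layers follows. It builds one octave only, without the downsampling of a full pyramid. The base scale 1.6 and the scale step of the square root of 2 are conventional SIFT choices assumed here, not values stated in the source, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma):
    """L = G * I computed as two separable 1D convolutions with edge padding."""
    k = gaussian_kernel_1d(sigma)
    pad = len(k) // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def dog_stack(image, sigma0=1.6, step=2 ** 0.5, levels=4):
    """Difference-of-Gaussians images between adjacent scales of one octave."""
    blurred = [gaussian_blur(image, sigma0 * step ** i) for i in range(levels)]
    return [b - a for a, b in zip(blurred[:-1], blurred[1:])]

# Hypothetical 16x16 image: bright wall with a dark one-pixel line.
image = np.full((16, 16), 200.0)
image[:, 8] = 30.0
dog = dog_stack(image)
```

Keypoint candidates are the local extrema of these layers across position and scale, a search this sketch leaves out.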
[0052] Feature point matching between two images involves finding similar 128-dimensional feature descriptors between them. First, the Euclidean distance between the descriptors of the two images is calculated to find the initial matching points: $d(a, b) = \sqrt{\sum_{i=1}^{128} (a_i - b_i)^2}$; $\frac{d_1}{d_2} < \tau$; where $d(a, b)$ measures the similarity between descriptors $a$ and $b$ of the two images, $d_1$ is the Euclidean distance to the nearest neighbor, $d_2$ is the Euclidean distance to the second nearest neighbor, and $\tau$ is the distance-ratio threshold that determines whether a matching point pair is valid. Even then, the matching point pairs may still contain outliers, so the RANSAC (random sample consensus) algorithm is used to iteratively estimate the optimal model between the two views, remove the mismatched points, and obtain the final matching point pairs.
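The nearest-neighbor distance-ratio check can be sketched as below. The ratio threshold 0.8 is a commonly used value assumed here, since the source does not give one, and the two-dimensional descriptors merely stand in for 128-dimensional SIFT descriptors. The function name is hypothetical, and RANSAC filtering is not included.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_b[j] is a clearly best match for desc_a[i].

    Requires at least two descriptors in desc_b.
    """
    a = np.asarray(desc_a, dtype=float)
    b = np.asarray(desc_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    matches = []
    for i, (j1, j2) in enumerate(order[:, :2]):
        if dists[i, j1] < ratio * dists[i, j2]:   # d1 / d2 < ratio
            matches.append((i, int(j1)))
    return matches

# Toy descriptors: the third query is equally close to two candidates and is rejected.
desc_near = [[1.0, 0.0], [0.0, 1.0], [0.45, 0.55]]
desc_far = [[0.9, 0.0], [0.0, 1.1], [5.0, 5.0]]
matches = ratio_test_matches(desc_near, desc_far)
```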
[0053] After obtaining multiple matching point pairs, the homography matrix between the close-up and distant images is calculated using the eight-point method. Then, this transformation matrix is used to map the crack pixel coordinates onto the distant image, resulting in a labeled image sequence and achieving crack labeling. A schematic diagram of the crack mapping process is shown in Figure 6.
[0054] Assume that the close-up crack image and the distant image have been matched, that the correspondence between the two images is the homography matrix $H$, that the correspondence between pixels is $(x, y) \leftrightarrow (x', y')$, and that the valid matching point pairs obtained with the ASIFT and RANSAC algorithms are $\{(x_i, y_i) \leftrightarrow (x_i', y_i')\}_{i=1}^{N}$.
[0055] The homography matrix is calculated using the 8-point method, which involves 4 pairs of points. From the definition of homography matrix, we know that:
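[0056] $s \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$; where $(x, y)$ and $(x', y')$ are the normalized matching point pairs obtained from feature point extraction and matching, $s$ is a nonzero scale factor, and $H$ is the homography matrix, which has 8 degrees of freedom.
[0057] Expanding the above equation gives: $x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}$; $y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}$. After rearranging: $h_{11}x + h_{12}y + h_{13} - h_{31}xx' - h_{32}yx' - h_{33}x' = 0$; $h_{21}x + h_{22}y + h_{23} - h_{31}xy' - h_{32}yy' - h_{33}y' = 0$.
[0058] To facilitate solving for the homography matrix, the equations are rewritten in the form $Ah = 0$: $a_x^T h = 0$; $a_y^T h = 0$; where: $a_x = (x, y, 1, 0, 0, 0, -xx', -yx', -x')^T$; $a_y = (0, 0, 0, x, y, 1, -xy', -yy', -y')^T$; $h = (h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32}, h_{33})^T$.
[0059] Substituting the $N$ obtained matching point pairs into the form $Ah = 0$ gives: $Ah = 0$; where: $A = (a_{x_1}, a_{y_1}, \dots, a_{x_N}, a_{y_N})^T$.
[0060] A normalization constraint such as $\|h\| = 1$ is imposed on the equations. Each pair of matching points yields two equations, and because the homography matrix has only 8 degrees of freedom, at least 4 pairs of matching points are needed to solve for it. The method using 4 pairs of points is called the 8-point method here. Singular value decomposition (SVD) is used to solve the equation: the required vector $h$ is the right singular vector of matrix $A$ with the minimum singular value, i.e. the last column of $V$ in $A = U \Sigma V^T$. If only 4 pairs of points are used, the result will be inaccurate when any of them is erroneous. Therefore, to ensure the accuracy of crack mapping, more points are used to calculate the homography matrix and verify its correctness.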
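The SVD solution can be sketched with NumPy as follows. Each point pair $(x, y) \leftrightarrow (x', y')$ contributes the two rows $(x, y, 1, 0, 0, 0, -x'x, -x'y, -x')$ and $(0, 0, 0, x, y, 1, -y'x, -y'y, -y')$ to $A$, and $h$ is taken as the right singular vector of the smallest singular value. The function name, the test homography and the sample points are hypothetical, and the final division by $h_{33}$ assumes that entry is nonzero.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct linear solution of A h = 0 from N >= 4 point pairs (src -> dst)."""
    rows = []
    for (x, y), (xp, yp) in zip(np.asarray(src, float), np.asarray(dst, float)):
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    A = np.array(rows)
    _, _, vt = np.linalg.svd(A)        # rows of vt are the right singular vectors
    H = vt[-1].reshape(3, 3)           # smallest singular value -> last row of vt
    return H / H[2, 2]                 # assumption: h33 != 0, fix the free scale

def apply_homography(H, pts):
    """Map (N, 2) points through H with perspective division."""
    pts = np.asarray(pts, dtype=float)
    mapped = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Synthetic check with a known homography and five normalized points.
H_true = np.array([[1.2, 0.1, 0.3],
                   [-0.05, 0.9, 0.2],
                   [0.01, 0.02, 1.0]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.25]])
dst = apply_homography(H_true, src)
H_est = estimate_homography(src, dst)
```

With noisy matches, the same function can be called on all RANSAC inliers, which is the "more points" case mentioned above.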
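[0061] The pixel coordinates of the crack in the close-up image are denoted as $(x_c, y_c)$. Applying the homography matrix solved from the close-up and distant matching points maps the crack onto the distant image, where the crack pixel coordinates are denoted as $(x_c', y_c')$. The close-up crack mapping formula is: $x_c' = \frac{h_{11}x_c + h_{12}y_c + h_{13}}{h_{31}x_c + h_{32}y_c + h_{33}}$; $y_c' = \frac{h_{21}x_c + h_{22}y_c + h_{23}}{h_{31}x_c + h_{32}y_c + h_{33}}$; where $h_{ij}$ are the entries of the homography matrix $H$.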
[0062] By extracting cracks through digital image processing, the pixel coordinates of the cracks can be statistically determined, and the crack pixels can be mapped onto the distant image using the mapping formula mentioned above.
[0063] After processing, the cracks in the distant view are displayed in more prominent red, and only one close-up crack image is needed to map the cracks in multiple distant images. These labeled images can then be used as part of the image sequence for 3D reconstruction, thereby constructing a crack point cloud model with spatial information.
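The mapping and red marking can be sketched as below, assuming the distant image is an RGB array indexed as `[row, column]` and the crack pixels are given as `(x, y)` pairs. The homography `H_demo`, the image size and the function name `mark_cracks` are hypothetical.

```python
import numpy as np

def mark_cracks(far_image, H, crack_xy, color=(255, 0, 0)):
    """Map close-up crack pixels through H and paint them on a copy of the distant image."""
    pts = np.asarray(crack_xy, dtype=float)
    mapped = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    uv = np.rint(mapped[:, :2] / mapped[:, 2:3]).astype(int)   # nearest pixel (x', y')
    h, w = far_image.shape[:2]
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    marked = far_image.copy()
    marked[uv[inside, 1], uv[inside, 0]] = color               # rows are y, columns are x
    return marked, uv[inside]

# Hypothetical homography: scale by 0.5 and shift by (100, 50) pixels.
H_demo = np.array([[0.5, 0.0, 100.0],
                   [0.0, 0.5, 50.0],
                   [0.0, 0.0, 1.0]])
far = np.zeros((300, 400, 3), dtype=np.uint8)
crack_near = [[200, 400], [202, 410], [1000, 1000]]            # last point maps outside
marked, kept = mark_cracks(far, H_demo, crack_near)
```

One homography per distant image is needed, so the same close-up crack pixels are reused with a different `H` for each view.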
[0064] As a representative image-based visual 3D reconstruction method, multi-view geometric 3D reconstruction uses multiple digital images to reconstruct the 3D surface of a scene. The main process is shown in Figure 7.
[0065] The two-dimensional image sequence is the distant view sequence obtained by the present invention. It includes the distant views with crack markings after mapping and the original images. Next, the SIFT algorithm is used for feature point extraction and matching, the SfM (structure from motion) algorithm is used to obtain the camera's spatial positions and the sparse point cloud of the ancient building, and then the CMVS/PMVS (clustering views for multi-view stereo / patch-based multi-view stereo) algorithm is used to obtain the dense point cloud of the ancient building. This dense point cloud includes the marked crack information. Finally, the initial coordinate system of the point cloud is transformed to the reference coordinate system, and the crack spatial information is extracted based on the color features of the crack point cloud to achieve localization.
[0066] Feature points that represent image information are the foundation for establishing correspondences between pixels in a two-dimensional image sequence, and are also the foundation of multi-view 3D reconstruction. Taking both feature point extraction and matching efficiency into account, the SIFT algorithm is used to extract and match feature points in each view of the distant image sequence, and the RANSAC algorithm is used to remove mismatched points.
[0067] SFM (structure from motion) is used in multi-view geometric 3D reconstruction to solve for the camera's spatial position and obtain the 3D structure of the scene. It requires that the camera motion during image capture include translation; rotation alone does not allow the spatial 3D points to be solved. The incremental SFM algorithm performs a bundle adjustment (BA) optimization after each new viewpoint is added, adjusting the camera poses and the spatial positions of the 3D points, so it is robust and insensitive to initial values. This invention uses the incremental SFM algorithm; the main steps are shown in Figure 8.
[0068] Feature point extraction and matching are first performed on the input new distant image sequence. The number of matching points is used to describe the overlap between images, from which an image connectivity graph is constructed, as shown in Figure 9. The connectivity graph directly determines the order in which new viewpoints are added. During initialization, the initial viewpoint pair must have a sufficiently large overlap area and a sufficient number of successfully triangulated matching pairs. Track reconstruction is then performed to obtain the initial 3D points and camera positions, and points at infinity and points with large reprojection errors are filtered out. Next, bundle adjustment (BA) optimization is performed to adjust the 3D points and camera poses, completing the initialization. The selection of new viewpoints is crucial for the rest of the reconstruction; therefore, starting from the third image, the viewpoint with the most overlapping points is selected using the image connectivity graph and added to the 3D reconstruction as the optimal viewpoint. The camera pose of this viewpoint is solved with the PnP algorithm and then adjusted. Track reconstruction is performed again to obtain new 3D points that have not yet been reconstructed, followed by filtering and global BA optimization. This process of reconstructing a new viewpoint is repeated until no viewpoints remain to be added, completing the reconstruction of the sparse point cloud.
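The view-ordering logic described here can be sketched as follows. This is a simplified illustration, not the patented procedure: the match counts are hypothetical, the initial pair is chosen by match count alone (the triangulation check is omitted), and pose estimation, triangulation, and bundle adjustment are left out entirely.

```python
def select_next_view(match_counts, registered, remaining):
    """Pick the unregistered view that shares the most matches with registered views."""
    def overlap(view):
        return sum(match_counts.get(frozenset((view, r)), 0) for r in registered)
    # sorted() makes tie-breaking deterministic (lowest image index wins).
    return max(sorted(remaining), key=overlap)

# Hypothetical edges of the image connectivity graph: image pair -> number of matches.
match_counts = {
    frozenset((0, 1)): 850,
    frozenset((0, 2)): 120,
    frozenset((1, 2)): 400,
    frozenset((1, 3)): 610,
    frozenset((2, 3)): 300,
}

initial_pair = max(match_counts, key=match_counts.get)   # pair with the largest overlap
registered = set(initial_pair)
remaining = {0, 1, 2, 3} - registered
order = sorted(registered)

while remaining:
    nxt = select_next_view(match_counts, registered, remaining)
    registered.add(nxt)
    remaining.remove(nxt)
    order.append(nxt)

print(order)
```

In a full pipeline each pass through the loop would also solve the new camera pose, triangulate new tracks, filter outliers, and run bundle adjustment before the next view is chosen.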
[0069] BA optimization ensures the stability of the entire 3D reconstruction process. It employs joint nonlinear optimization of the camera parameters and 3D points to minimize the reprojection error. The specific formula is as follows: $\min_{C, X} \sum_{i} \sum_{j} w_{ij} \lVert q_{ij} - P(C_i, X_j) \rVert^2$; where $X_j$ are the three-dimensional points obtained after reconstruction, $C_i$ are the camera parameters, $q_{ij}$ is the observed image point of $X_j$ in view $i$, and $P(C_i, X_j)$ is the projection of point $X_j$ in view $i$. $w_{ij}$ is a constant: $w_{ij} = 1$ when point $X_j$ is visible in view $i$, and $w_{ij} = 0$ otherwise.
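To make the objective concrete, the sketch below evaluates a visibility-weighted sum of squared reprojection errors for a pinhole camera. It only evaluates the cost; an actual bundle adjustment would minimize it with a nonlinear least-squares solver. The camera model (intrinsics `K`, rotation `R`, translation `t`) and all numeric values are assumptions chosen for illustration.

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of (N, 3) world points into one camera, returning (N, 2) pixels."""
    Xc = X @ R.T + t              # world coordinates -> camera coordinates
    uv = Xc @ K.T                 # apply the intrinsic matrix
    return uv[:, :2] / uv[:, 2:3]

def reprojection_cost(cameras, points, observations, visibility):
    """Sum over cameras i and points j of w_ij * ||q_ij - P(C_i, X_j)||^2."""
    cost = 0.0
    for i, (K, R, t) in enumerate(cameras):
        residual = observations[i] - project(K, R, t, points)    # shape (N, 2)
        cost += np.sum(visibility[i][:, None] * residual ** 2)   # w_ij masks unseen points
    return float(cost)

# Hypothetical single-camera example.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
cameras = [(K, np.eye(3), np.zeros(3))]
points = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.0]])
exact = project(K, np.eye(3), np.zeros(3), points)
observations = [exact + np.array([[1.0, 0.0], [0.0, 2.0]])]   # observations with pixel error
visibility = [np.array([1.0, 0.0])]                           # second point not visible
cost = reprojection_cost(cameras, points, observations, visibility)
print(cost)
```

Only the first point contributes here, because the weight of the second point is zero.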
[0070] The incremental SFM algorithm yields too few 3D points and contains redundant data, so it cannot fully represent the 3D model information of the target, especially the crack information. Subsequent processing is therefore needed to increase the number of 3D points. This invention uses CMVS / PMVS to perform dense matching after sparse point cloud reconstruction. This algorithm can be used for 3D reconstruction from multi-view images and does not depend on prior conditions such as bounding boxes or depth maps.
[0071] The CMVS algorithm groups the image set into multiple view clusters based on the principle of high overlap between adjacent viewpoints. While ensuring that each 3D point in the target point cloud can be reconstructed from at least one view cluster, this reduces memory consumption and improves reconstruction efficiency. The PMVS algorithm then processes each view cluster to generate 3D points, completing the reconstruction of the dense point cloud.
[0072] The partitioning into view clusters $C_k$ must satisfy the following constraints: compactness (removing redundant images from view clusters); size (ensuring each view cluster is small enough to allow for reconstruction); and coverage (ensuring the reconstruction results of the view clusters preserve the image information with minimal loss). Assume that the accuracy with which a 3D point $P_j$ of the sparse point cloud can be reconstructed from view cluster $C_k$ is $f(P_j, C_k)$. The condition for the point to be reconstructable from at least one view cluster is:
[0073] $\max_k f(P_j, C_k) \ge \lambda \, f(P_j, V_j)$; where $V_j$ is the set of all images in which point $P_j$ is visible, and $\lambda$ is a constant, typically taking the value of 0.7.
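The coverage condition can be illustrated with a small check. The accuracy measure is not specified in the text above, so this sketch substitutes a deliberately crude placeholder (the number of images in a set that see the point); the function names and the example clusters are likewise hypothetical.

```python
def accuracy(visible_images, image_set):
    """Placeholder for the accuracy measure: how many images in image_set see the point."""
    return len(visible_images & image_set)

def is_covered(visible_images, clusters, lam=0.7):
    """True if the best cluster reaches at least lam times the accuracy of all visible images."""
    best = max(accuracy(visible_images, c) for c in clusters)
    return best >= lam * accuracy(visible_images, visible_images)

# Hypothetical view clusters over images 0..6.
clusters = [{0, 1, 2, 3}, {3, 4, 5, 6}]

covered_a = is_covered({0, 1, 2}, clusters)      # all three views lie in the first cluster
covered_b = is_covered({1, 2, 4, 5}, clusters)   # views are split evenly across clusters
print(covered_a, covered_b)
```

With this placeholder, a point whose visible images are split across clusters fails the test, which is the situation the image-addition step is meant to repair.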
[0074] If a sparse point $P$ is visible in both image $I_l$ and image $I_m$, the two images are adjacent. If an image in one image set is adjacent to an image in another image set, the two image sets are adjacent. If the visible image set $V_i$ of a sparse point $P_i$ and the visible image set $V_j$ of a point $P_j$ are adjacent, and the projected positions of the two points differ within a certain range, then these two points are considered visible neighbors. Based on the above view clustering principles, the basic process of the CMVS algorithm is as follows:
[0075] Step 1, SFM filtering. A search is performed in the neighborhood of a sparse point, and the average coordinates of points with the same visibility in the neighborhood are used to replace that point, thereby reducing the number of input points.
[0076] Step 2, image selection. Based on the coverage constraint, images that do not meet the conditions are removed. The process iterates from low-resolution to high-resolution images, prioritizing the removal of low-resolution images to ensure image quality.
[0077] Step 3, view cluster division. With the coverage constraint applied, view clusters that do not meet the size constraint are divided into smaller view clusters that do.
[0078] Step 4, image addition. (Use...) For each sparse point $P_j$ that is not yet covered, the optimal view cluster $C_k$ that could cover it is determined, together with the image $I$ whose addition to $C_k$ would cover the point, and the corresponding efficiency value is calculated. The image with the highest efficiency value is then added to $C_k$. In this way, the view clusters achieve greater coverage.
[0079] Steps 3 and 4 are repeated until both the size constraint and the coverage constraint are satisfied. The CMVS algorithm flow is shown in Figure 10.
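Step 1 of the process above (SFM filtering) can be sketched as below. The neighborhood search is approximated here by a uniform grid, which is an assumption of this sketch rather than something stated in the text; the cell size and the sample points are also hypothetical.

```python
import numpy as np
from collections import defaultdict

def sfm_filter(points, visibility, cell=0.5):
    """Merge sparse points that fall in the same grid cell and share the same visible images."""
    groups = defaultdict(list)
    for p, vis in zip(np.asarray(points, dtype=float), visibility):
        cell_index = tuple(np.floor(p / cell).astype(int))   # grid cell acts as the neighborhood
        groups[(cell_index, frozenset(vis))].append(p)
    # Each group is replaced by the average position of its members.
    return [(np.mean(ps, axis=0), set(key[1])) for key, ps in groups.items()]

points = [[0.1, 0.1, 0.1], [0.3, 0.1, 0.1], [0.2, 0.2, 0.2], [2.0, 2.0, 2.0]]
visibility = [{0, 1}, {0, 1}, {1, 2}, {0, 1}]
merged = sfm_filter(points, visibility)
print(len(merged), merged[0][0])
```

The first two points share both a cell and a visibility set, so they collapse into one averaged point, reducing the input from four points to three.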
[0080] After each optimal view cluster $C_k$ is obtained, the PMVS algorithm reconstructs each $C_k$ separately. Under the constraints of local photometric consistency and global visibility consistency, dense 3D points carrying complete target information are obtained through matching, expansion, and filtering, which completes the dense point cloud reconstruction.
[0081] After 3D reconstruction, the point cloud data includes the crack marking information from the distant view sequence. It appears as crack points with distinct color features, providing a basis for the subsequent spatial localization of the cracks.
[0082] Establishing a coordinate system at a specific location in three-dimensional space helps determine the exact location of spatial cracks. Therefore, coordinate system transformation is necessary, converting the initial coordinate system to a reference coordinate system according to certain rules. In the densely reconstructed point cloud data, there are crack marker point clouds with obvious color characteristics. Therefore, crack point cloud data can be extracted based on color features, and relevant data can be statistically analyzed to obtain three-dimensional coordinates, thus achieving spatial localization of the cracks.
[0083] First, using the initial coordinate system as a reference, plane fitting is performed on the plane where the crack is located and on the ground to obtain the normal vectors of the two planes. Then the third vector, which is parallel to the ground, is calculated from the cross product of these two normal vectors. Finally, the initial coordinate system is transformed based on these three vectors and a point in space.
[0084] The plane normal vector is calculated by the least squares method. Assume the coordinates of $N$ points on the plane are $(x_i, y_i, z_i)$, $i = 1, \dots, N$, and the calculated centroid is $(\bar{x}, \bar{y}, \bar{z})$. The fitted plane equation is:
[0085] $ax + by + cz + d = 0$; Substituting the centroid into the above plane equation and subtracting the result from the equation, we get: $a(x - \bar{x}) + b(y - \bar{y}) + c(z - \bar{z}) = 0$; Substituting all points into the above equation and converting it to matrix form, we get: $A X = 0$, where row $i$ of the $N \times 3$ matrix $A$ is $(x_i - \bar{x},\ y_i - \bar{y},\ z_i - \bar{z})$ and $X = (a, b, c)^T$.
[0086] Ideally, all points would satisfy the above equation. In reality, not all points lie on the plane, so the sum of squared distances from the points to the plane is minimized instead, i.e., $\lVert A X \rVert$ is minimized under the constraint $\lVert X \rVert = 1$. Performing singular value decomposition on the matrix $A$ gives $A = U \Sigma V^T$. The required plane normal vector is the right singular vector corresponding to the minimum singular value (equivalently, the eigenvector of $A^T A$ with the smallest eigenvalue), i.e., the last column of $V$, which after normalization gives the normal vector $\mathbf{n} = (a, b, c)^T$.
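The plane fitting just described can be sketched with NumPy as follows. The function name and the sample points are assumptions for illustration. Note that the sign of the returned normal is arbitrary, so a real pipeline would need an extra rule to orient it (for example, pointing the ground normal upward).

```python
import numpy as np

def fit_plane_normal(points):
    """Least-squares plane fit: return (unit normal, centroid) of an (N, 3) point set."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    A = pts - centroid                                   # centered coordinates
    _, _, vt = np.linalg.svd(A, full_matrices=False)     # rows of vt are right singular vectors
    normal = vt[-1]                                      # smallest singular value comes last
    return normal / np.linalg.norm(normal), centroid

# Hypothetical noisy samples of the plane z = 2.
rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(200, 2))
z = 2.0 + rng.normal(0.0, 0.001, size=200)
normal, centroid = fit_plane_normal(np.column_stack([xy, z]))
print(normal, centroid)
```

`np.linalg.svd` returns singular values in descending order, which is why the last row of `vt` is the direction of least variance.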
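[0087] This method is used to estimate the normal vectors of the crack surface and the ground, denoted $\mathbf{n}_1$ and $\mathbf{n}_2$ respectively. The third vector is obtained by the cross product, that is: $\mathbf{n}_3 = \mathbf{n}_1 \times \mathbf{n}_2$.
[0088] At this point, provided that the crack surface is perpendicular to the ground, all three vectors are unit vectors and mutually perpendicular, and $\mathbf{n}_3$ is parallel to the ground. These three vectors are taken as the axes of the new coordinate system. Assume that after the transformation they become $\mathbf{e}_1$, $\mathbf{e}_2$, $\mathbf{e}_3$, and that any vector $\mathbf{p}$ in the original coordinate system becomes $\mathbf{p}'$. Based on the transformation relationship between the two coordinate systems, we have:
[0089] $[\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3] = R\,[\mathbf{n}_1\ \mathbf{n}_2\ \mathbf{n}_3]$.
[0090] Because $\mathbf{e}_1$, $\mathbf{e}_2$, $\mathbf{e}_3$ are the unit vectors along the new coordinate axes, $[\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3]$ is the 3×3 identity matrix $I$; substituting it into the above transformation relation yields: $R\,[\mathbf{n}_1\ \mathbf{n}_2\ \mathbf{n}_3] = I$.
[0091] The rotation matrix is therefore $R = [\mathbf{n}_1\ \mathbf{n}_2\ \mathbf{n}_3]^{-1}$. Since $[\mathbf{n}_1\ \mathbf{n}_2\ \mathbf{n}_3]$ is an orthogonal matrix with a determinant of 1, its inverse equals its transpose, so $R = [\mathbf{n}_1\ \mathbf{n}_2\ \mathbf{n}_3]^T$, and the rotational transformation relationship is: $\mathbf{p}' = R\,\mathbf{p}$.
[0092] After the rotation is complete, let the specified origin, expressed in the rotated coordinates, be $\mathbf{p}_0$. The coordinate system is then translated by a vector $\mathbf{t}$ to the specified location, i.e., $\mathbf{t} = -\mathbf{p}_0$. The final transformation relationship is: $\mathbf{p}' = R\,\mathbf{p} + \mathbf{t}$; converted to homogeneous coordinates, which are easy to compute with, and expressed as a matrix: $\begin{bmatrix} \mathbf{p}' \\ 1 \end{bmatrix} = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} \mathbf{p} \\ 1 \end{bmatrix}$; where $R$ is a 3×3 matrix composed of the two plane normal vectors and the computed third vector, and $\mathbf{t}$ is a 3×1 matrix holding the negated coordinates of the reference point.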
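The construction of the reference coordinate system can be sketched as below. The function name, the perpendicularity tolerance, and the example vectors are assumptions of this sketch. The origin is supplied in the initial coordinate system, so the translation is obtained by rotating it first and then negating it. The sketch rejects inputs whose two normals are not close to perpendicular, since the rotation matrix is only orthogonal in that case.

```python
import numpy as np

def build_reference_transform(n_crack, n_ground, origin, tol=1e-3):
    """Return the 4x4 homogeneous transform from the initial frame to the reference frame."""
    n1 = np.asarray(n_crack, dtype=float)
    n2 = np.asarray(n_ground, dtype=float)
    n1 = n1 / np.linalg.norm(n1)
    n2 = n2 / np.linalg.norm(n2)
    if abs(n1 @ n2) > tol:
        raise ValueError("crack-plane and ground normals must be perpendicular")
    n3 = np.cross(n1, n2)                  # third axis, parallel to the ground
    R = np.vstack([n1, n2, n3])            # rows are the new axes, so R @ n_k = e_k
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = -R @ np.asarray(origin, dtype=float)   # moves the chosen origin to (0, 0, 0)
    return T

# Hypothetical inputs: crack plane normal along y, ground normal along z.
T = build_reference_transform([0.0, 1.0, 0.0], [0.0, 0.0, 1.0], origin=[1.0, 2.0, 3.0])
p_new = T @ np.array([1.0, 3.0, 3.0, 1.0])   # a point one unit along the crack normal
print(p_new)
```

In this example the point lies one unit from the origin along the crack-plane normal, so it lands on the first axis of the reference frame.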
[0093] After the 3D reconstruction is completed, plane fitting is performed on the crack surfaces of the wall and bridge and on the point cloud of the ground, and the normal vectors are calculated. A cross product is then performed to obtain the third vector, and the final coordinate system is obtained according to the homogeneous transformation relationship above. Since different targets may have different initial coordinate systems, for the convenience of calculation, the final coordinate system is rotated around the coordinate axes to unify the orientation of the three coordinate axes: the first axis points inward into the crack surface, the second axis is perpendicular to the ground and points upward, and the third axis is parallel to the ground.
[0094] After the coordinate system is unified, the crack point cloud is segmented based on the crack color features, and its centroid coordinates are calculated. The maximum and minimum values of the crack point cloud coordinates along the axis perpendicular to the ground are then determined, which gives the spatial location of the crack and its distance from the ground.
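The color-based extraction and height statistics can be sketched as follows. The red-color thresholds, the choice of the second coordinate as the vertical axis, and the sample data are all assumptions made for this sketch; real thresholds would depend on the marking color and on how colors are blended during dense reconstruction.

```python
import numpy as np

def crack_extent(xyz, rgb, up_axis=1, red_min=200, other_max=80):
    """Segment red-marked crack points and report centroid, top height, and bottom height."""
    xyz = np.asarray(xyz, dtype=float)
    rgb = np.asarray(rgb)
    # A point counts as a crack point if it is strongly red and weakly green and blue.
    mask = (rgb[:, 0] >= red_min) & (rgb[:, 1] <= other_max) & (rgb[:, 2] <= other_max)
    crack = xyz[mask]
    if len(crack) == 0:
        raise ValueError("no crack-colored points found")
    heights = crack[:, up_axis]            # coordinate along the axis perpendicular to the ground
    return {"centroid": crack.mean(axis=0),
            "top": float(heights.max()),
            "bottom": float(heights.min())}

# Hypothetical point cloud already expressed in the reference coordinate system.
xyz = [[0.0, 1.0, 0.0], [0.0, 1.5, 0.1], [0.0, 2.0, 0.0], [1.0, 0.2, 0.0]]
rgb = [[255, 0, 0], [250, 10, 5], [240, 20, 20], [120, 120, 120]]
extent = crack_extent(xyz, rgb)
print(extent)
```

Because the origin of the reference system is placed on the ground, the `top` and `bottom` values can be read directly as the distances of the crack ends from the ground.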
[0095] Secondly, the present invention also provides a multi-view-based crack marking device for ancient buildings, as shown in Figure 11, which includes: an acquisition module 201, used to acquire a close-up image containing cracks and a sequence of multiple distant images containing the whole or part of the ancient building, and to extract crack information from the close-up image to obtain the crack pixel coordinates.
[0096] The generation module 202 is used to perform affine invariant feature matching between the close-up image and each distant image, and to map the crack pixel coordinates in the close-up image to the distant image to form a new distant image sequence.
[0097] Reconstruction module 203 is used to perform multi-view geometric 3D reconstruction based on the new distant image sequence to generate a 3D point cloud model of the ancient building with crack markings.
[0098] The construction module 204 is used to fit the plane where the crack is located and the ground plane in the 3D point cloud model respectively, and obtain the corresponding first normal vector and second normal vector. The fitting of the plane where the crack is located and of the ground plane is implemented by singular value decomposition based on the least squares method, and the eigenvector corresponding to the minimum singular value is used as the normal vector of the plane. The cross product operation is performed on the first normal vector and the second normal vector to obtain a third vector parallel to the ground. The first normal vector, the second normal vector, and the third vector are used as the basis axes of the new coordinate system to construct the unified reference coordinate system.
[0099] The marking module 205 is used to extract the spatial location information of cracks based on crack markings in a three-dimensional point cloud model in a unified reference coordinate system, thereby realizing the spatial positioning marking of cracks.
[0100] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0101] This invention is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and / or block diagrams, as well as combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.
[0102] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.
[0103] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.
[0104] It should be noted that the specific embodiments described above enable those skilled in the art to more fully understand the present invention, but do not limit the present invention in any way. Therefore, although the present invention has been described in detail in this specification, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the present invention; and all technical solutions and improvements that do not depart from the spirit and scope of the present invention are covered within the protection scope of the patent of the present invention. No reference numerals in the claims should be construed as limiting the scope of the claims.
Claims
1. A method for marking cracks in ancient buildings based on multiple views, characterized in that the method includes: acquiring a close-up image containing cracks and a sequence of multiple distant images containing the whole or part of the ancient building; extracting crack information from the close-up image to obtain crack pixel coordinates; performing affine invariant feature matching between the close-up image and each distant image, and mapping the crack pixel coordinates in the close-up image to the distant image to form a new distant image sequence; performing multi-view geometric 3D reconstruction based on the new distant image sequence to generate a 3D point cloud model of the ancient building with crack markings; fitting, in the 3D point cloud model, the plane where the crack is located and the ground plane respectively to obtain the corresponding first normal vector and second normal vector, wherein the fitting of the plane where the crack is located and of the ground plane is implemented by singular value decomposition based on the least squares method, and the eigenvector corresponding to the minimum singular value is used as the normal vector of the plane; performing a cross product operation on the first normal vector and the second normal vector to obtain a third vector parallel to the ground, and using the first normal vector, the second normal vector, and the third vector as the basis axes of a new coordinate system to construct a unified reference coordinate system; and extracting, in the unified reference coordinate system, the spatial location information of the cracks based on the crack markings in the 3D point cloud model, thereby realizing the spatial positioning and marking of the cracks.
2. The method according to claim 1, characterized in that performing affine invariant feature matching between the close-up image and each distant image and mapping the crack pixel coordinates in the close-up image to the distant image includes: using the ASIFT algorithm to extract affine invariant feature points from the close-up image and the distant image and performing feature matching; using the RANSAC algorithm to remove mismatched point pairs, resulting in an optimized set of matching points; solving the homography matrix between the close-up image and the distant image from the optimized matching point set using the eight-point method or singular value decomposition; and, based on the homography matrix, mapping the crack pixel coordinates in the close-up image to those in the distant image by the following formula: $x' = \dfrac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}$; $y' = \dfrac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}$; where $x$ and $y$ are the pixel coordinates of the crack in the close-up image, $x'$ and $y'$ are the pixel coordinates of the crack in the mapped distant image, and $H = (h_{ij})$ is the homography matrix.
3. The method according to claim 1, characterized in that performing multi-view geometric 3D reconstruction based on the new distant image sequence to generate a 3D point cloud model of the ancient building with crack markings includes: performing sparse point cloud reconstruction on the new distant image sequence based on the incremental structure-from-motion algorithm to obtain the camera poses and sparse 3D points; and performing dense matching on the sparse point cloud using a patch-based multi-view stereo algorithm to generate a dense 3D point cloud model of the ancient building with crack markings; wherein, in the process of dense matching on the sparse point cloud, bundle adjustment is used to jointly optimize the camera parameters and 3D point coordinates to minimize the reprojection error.
4. The method according to claim 1, characterized in that, The extraction of the spatial location information of the crack includes: Crack point clouds are segmented from a 3D point cloud model based on color features; The maximum and minimum coordinates of the crack point cloud in the direction perpendicular to the ground are statistically analyzed and used as the distances from the top and bottom of the crack to the ground, thereby realizing the spatial positioning of the crack.
5. The method according to claim 1, characterized in that extracting crack information from the close-up image includes: preprocessing the close-up image, including grayscale conversion, filtering for noise reduction, and contrast enhancement; segmenting cracks in the preprocessed image using a region growing algorithm that combines the local Otsu method and the local mean, to obtain a binarized crack image; skeletonizing the binarized crack image and removing burrs; and collecting the pixel coordinates of the crack skeleton to obtain the crack pixel coordinates.
6. A multi-view-based crack marking device for ancient buildings, characterized in that the device includes: an acquisition module, used to acquire a close-up image containing cracks and a sequence of multiple distant images containing the whole or part of the ancient building, and to extract crack information from the close-up image to obtain the crack pixel coordinates; a generation module, used to perform affine invariant feature matching between the close-up image and each distant image, and to map the crack pixel coordinates in the close-up image to the distant image to form a new distant image sequence; a reconstruction module, used to perform multi-view geometric 3D reconstruction based on the new distant image sequence to generate a 3D point cloud model of the ancient building with crack markings; a construction module, used to fit the plane where the crack is located and the ground plane in the 3D point cloud model respectively to obtain the corresponding first normal vector and second normal vector, wherein the fitting of the plane where the crack is located and of the ground plane is implemented by singular value decomposition based on the least squares method, the eigenvector corresponding to the minimum singular value is used as the normal vector of the plane, a cross product operation is performed on the first normal vector and the second normal vector to obtain a third vector parallel to the ground, and the unified reference coordinate system is constructed using the first normal vector, the second normal vector, and the third vector as the basis axes of the new coordinate system; and a marking module, used to extract the spatial location information of cracks based on the crack markings in the 3D point cloud model within the unified reference coordinate system, thereby realizing the spatial positioning and marking of cracks.