Building unit boundary extraction method and system for inclined three-dimensional model under semantic guidance
By combining semantic guidance with deep instance segmentation and graph cut algorithms, the problem of balancing automation and high precision in building unit boundary extraction in real-world 3D modeling is solved. This achieves efficient and accurate building unit boundary extraction and enhances anti-interference capabilities for complex scenes.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- WUDA GEOINFORMATICS CO LTD
- Filing Date
- 2026-05-21
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to combine automation with high precision when extracting the boundaries of individual buildings in real-world 3D modeling; in large-scale urban scenarios in particular, they are inefficient and costly. Furthermore, existing methods suffer from incomplete boundaries, overgrowth, semantic confusion, and weak anti-interference capabilities.
A semantically guided approach is adopted, combining a cascaded semantic segmentation network and a deep instance segmentation neural network. Through point cloud classification, instance segmentation and digital surface model constraints, combined with a multi-label graph cut algorithm, global optimization segmentation is performed to generate high-precision building unit boundaries.
It achieves efficient and accurate building boundary extraction in complex urban scenarios, improves automation and robustness, ensures boundary integrity and geometric accuracy, and enhances anti-interference capabilities in complex scenarios.
Figure CN122244367A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of real-scene 3D modeling, and more specifically, to a semantically guided method and system for extracting the boundaries of individual buildings in a tilted 3D model.
Background Technology
[0002] In the field of realistic 3D modeling, the accurate identification and contour segmentation of individual buildings is the core foundation for models to move from visualization to data-driven applications. "Building entity segmentation" refers to the automated separation and extraction of individual building entities with independent topological structures and geometric boundaries from the mesh model and orthophoto generated from the realistic 3D model. The value of this technology lies in constructing computer-recognizable digital objects for each building entity, supporting subsequent core applications such as attribute attachment, spatial analysis, and batch modeling. The automated and high-precision extraction of the two-dimensional projected contours of building entities has become a crucial step in producing compliant 3D white models and advancing realistic 3D construction. However, current mainstream building entity segmentation technologies mostly rely on manual surveying or semi-automatic screen digitization, which, while offering controllable accuracy, are inefficient and costly. Automation methods also primarily rely on low-level feature methods such as region growing, making it difficult to balance efficiency and accuracy, and failing to meet the needs of large-scale, high-efficiency data production at the city or even regional level. Therefore, developing a technical solution capable of fully automated and high-precision extraction of building entity vector boundaries from realistic 3D models has become a critical technical problem urgently needing to be solved in this field.
[0003] For the technical objective of extracting building unit boundaries, existing technologies have evolved from low-level feature-driven to high-level data-driven approaches, which can be divided into two core technical paths. One type is low-level feature-driven methods that rely on seed points, represented by region growing and watershed algorithms. These methods extract boundaries based on thresholds such as pixel grayscale difference and elevation difference, but they are highly dependent on the quality of seed points and growth rules, easily leading to incomplete or overgrown boundaries. Furthermore, they lack semantic constraints and have weak anti-interference capabilities. Compared to graph cut algorithms with global optimality, these methods lack a global constraint mechanism; simply optimizing growth rules or threshold parameters cannot fundamentally solve the problems of semantic confusion and manual dependence, making them unsuitable for the automated production needs of large-scale urban scenarios. The other type is high-level methods driven by data or with global optimization, encompassing deep instance segmentation and graph cut algorithms. Deep instance segmentation has a high degree of automation, but its output boundaries are prone to breakage, often resulting in "semantically coarse boundaries." Graph cut algorithms possess global optimality characteristics, avoiding the seed point sensitivity problem of region growing methods and adapting to large-scale scenarios. However, when applied alone, they easily confuse feature categories due to the lack of semantic guidance.
[0004] In summary, existing technologies have not yet achieved an organic integration of the high-level semantic recognition capabilities of deep instance segmentation and the global optimization advantages of graph cut algorithms. They also fail to effectively avoid the inherent defects of region growing methods and cannot simultaneously meet the technical requirements of automation, high precision, and strong robustness, making it difficult to meet the current engineering application standards for real-world 3D construction.
Summary of the Invention
[0005] This invention addresses the technical problems existing in the prior art by providing a semantically guided method and system for extracting the boundaries of individual buildings in a tilted 3D model, thus solving the problem of balancing automation and high precision in the process of extracting the boundaries of individual buildings in a tilted 3D model.
[0006] According to a first aspect of the present invention, a method for extracting the boundary of a single building unit in a semantically guided tilted 3D model is provided, comprising:
[0007] Step S1: Uniformly sample point cloud data from the tilted 3D model, and perform semantic classification on the point cloud data based on a cascaded semantic segmentation network to obtain point cloud semantic classification results;
[0008] Step S2: Based on the point cloud semantic classification results, extract the initial building point cloud set, and generate a digital elevation model (DEM) describing the ground undulations and a digital surface model (DSM) describing the undulations of the surface above the ground (including building tops);
[0009] Step S3: Perform orthophoto rendering on the tilted 3D model to generate an orthophoto image of the study area. Perform pixel-level instance segmentation on the orthophoto image based on a deep instance segmentation neural network to obtain the initial area of the building instances at the pixel level, thus forming an initial set of two-dimensional building areas.
[0010] Step S4: Perform spatial cross-validation on the initial building point cloud set and the initial two-dimensional building region set to obtain the two-dimensional building region set to be processed;
[0011] Step S5: Based on the Digital Elevation Model (DEM) and the Digital Surface Model (DSM), construct a multi-label graph cut energy function for each two-dimensional building region to be processed. With minimizing the multi-label graph cut energy function as the optimization objective, perform global optimization segmentation on each two-dimensional building region to be processed to generate a refined two-dimensional building unit region.
[0012] Step S6: Perform contour vectorization and geometric regularization processing on each of the two-dimensional building unit areas to output a regular building vector boundary that conforms to the specifications.
[0013] According to a second aspect of the present invention, a semantically guided system for extracting the boundary of a tilted 3D model building unit is provided, comprising:
[0014] The semantic classification module is used to uniformly sample point cloud data from the tilted 3D model and perform semantic classification on the point cloud data based on the cascaded semantic segmentation network to obtain the point cloud semantic classification result.
[0015] The generation module is used to extract an initial set of building point clouds based on the point cloud semantic classification results, and to generate a digital elevation model (DEM) describing the ground undulations and a digital surface model (DSM) describing the undulations of the surface above the ground (including building tops) based on the point cloud semantic classification results.
[0016] The instance segmentation module is used to perform orthophoto rendering on the tilted 3D model to generate an orthophoto image of the study area, and to perform pixel-level instance segmentation on the orthophoto image based on a deep instance segmentation neural network to obtain the initial pixel-level regions of the building instances, thus forming an initial set of two-dimensional building areas.
[0017] The cross-validation module is used to perform spatial cross-validation on the initial building point cloud set and the initial two-dimensional building region set to obtain the two-dimensional building region set to be processed.
[0018] The global optimization module is used to construct a multi-label graph cut energy function for each two-dimensional building region to be processed based on the digital elevation model (DEM) and the digital surface model (DSM). With minimizing the multi-label graph cut energy function as the optimization objective, the module performs global optimization segmentation on each two-dimensional building region to be processed, generating a refined two-dimensional building unit region.
[0019] The regularization module is used to perform contour vectorization and geometric regularization processing on each of the two-dimensional building unit areas, and output regularized building vector boundaries that conform to the specifications.
[0020] This invention provides a semantically guided method and system for extracting the boundaries of individual buildings in a tilted 3D model. It utilizes deep instance segmentation combined with cross-validation of 2D and 3D data, using the semantically complete regions of instance segmentation to constrain the graph cut optimization process, supplemented by geometric regularization processing. This ensures that the final output boundary of the individual building is both completely closed and possesses a regular geometric shape that conforms to architectural characteristics, meeting the data requirements for high-precision 3D modeling. Through the organic integration of deep instance segmentation and graph cut algorithms, supplemented by multi-source data constraints, it overcomes the limitations of existing technologies in terms of automation, boundary accuracy, anti-interference ability, and semantic-geometric coordination, providing an efficient, accurate, and reliable automated boundary delineation solution for real-world 3D construction.
Attached Figure Description
[0021] Figure 1 is a flowchart of a semantically guided method for extracting the boundary of a tilted 3D model building unit, as provided in one embodiment of the present invention;
[0022] Figure 2 is a schematic diagram of the point cloud sampling and classification results for a tilted 3D model;
[0023] Figure 3 is a schematic diagram of the DEM and DSM data generated from the classified point cloud;
[0024] Figure 4 is a schematic diagram of the digital orthophoto map (DOM) result;
[0025] Figure 5 is a schematic diagram of the building instance segmentation results generated by a deep instance segmentation network;
[0026] Figure 6 is a schematic diagram of the optimized initial seed pixel set distribution;
[0027] Figure 7 is a schematic diagram of the results of dividing individual buildings;
[0028] Figure 8 is a schematic diagram of the regularization result of a single building;
[0029] Figure 9 is a schematic diagram of the architecture of a semantically guided tilted 3D model building boundary extraction system provided in one embodiment of the present invention.
Detailed Implementation
[0030] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention. In addition, the technical features of the various embodiments or individual embodiments provided by the present invention can be arbitrarily combined with each other to form feasible technical solutions. Such combinations are not constrained by the order of steps and / or structural composition patterns, but must be based on the ability of those skilled in the art to implement them. When the combination of technical solutions is contradictory or cannot be implemented, it should be considered that such a combination of technical solutions does not exist and is not within the scope of protection claimed by the present invention.
[0031] This invention proposes an automated method for extracting building unit boundaries based on oblique 3D model (OSGB) data, integrating AI instance segmentation and optimized graph cut algorithms. This method employs a collaborative technical approach of "3D semantic initial screening, 2D instance localization, and 2D / 3D fusion segmentation." Through progressive coupling of point cloud classification, instance segmentation, and the geometric constraints of the Digital Surface Model (DSM), it achieves fully automated and high-precision extraction of building unit vector contours from oblique 3D models of complex urban scenes. This invention effectively solves the problems of existing pure 2D methods, such as adhesion in dense areas, rough boundaries, and weak anti-interference capabilities, avoiding the limitations of a single data source, and significantly improving the automation, geometric accuracy, and scene robustness of the building unit extraction process.
[0032] The implementation steps and technical principles of this invention are described in detail below. The overall process is shown in Figure 1, and the method includes the following steps:
[0033] Step S1: Uniformly sample point cloud data from the tilted 3D model, and perform semantic classification on the point cloud data based on a cascaded semantic segmentation network to obtain the point cloud semantic classification result.
[0034] Understandably, this step is used to generate semantically labeled point cloud data from the input tilted 3D model. Its purpose is to pre-identify the building structure in the 3D spatial domain, providing geometric and semantic priors for subsequent processing.
[0035] Specifically, point cloud sampling is first performed on the 3D triangular mesh model (the tilted 3D model). A uniform sampling algorithm is used: by setting a sampling interval, a uniformly distributed 3D point cloud dataset is generated on each triangular facet of the mesh model. The preferred sampling interval is 0.1 to 0.5 meters, a range determined experimentally from the common resolutions (0.1 to 0.5 meters) of urban building 3D models. The specific value can be adjusted according to the native resolution of the tilted 3D model and the target data size. This uniform sampling method is computationally efficient and generates discrete point cloud data of a density suitable for deep learning processing while preserving the main geometry of the model.
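The facet sampling described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not say how points are placed within a facet, so the barycentric grid used here, the function names `sample_triangle` and `sample_mesh`, and the rounding used to merge duplicate points on shared edges are all hypothetical choices.

```python
import math


def sample_triangle(a, b, c, interval):
    """Sample a triangle on a barycentric grid whose spacing does not exceed `interval`."""
    longest = max(math.dist(a, b), math.dist(b, c), math.dist(c, a))
    n = max(1, math.ceil(longest / interval))  # subdivisions per edge
    points = []
    for i in range(n + 1):
        for j in range(n + 1 - i):
            u, v = i / n, j / n
            w = 1.0 - u - v
            points.append(tuple(u * a[k] + v * b[k] + w * c[k] for k in range(3)))
    return points


def sample_mesh(vertices, faces, interval=0.2):
    """Sample every facet and merge duplicates that arise on shared edges."""
    cloud = set()
    for ia, ib, ic in faces:
        for p in sample_triangle(vertices[ia], vertices[ib], vertices[ic], interval):
            cloud.add(tuple(round(x, 6) for x in p))  # assumed merge tolerance
    return sorted(cloud)


if __name__ == "__main__":
    verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
    tris = [(0, 1, 2), (1, 3, 2)]
    print(len(sample_mesh(verts, tris, interval=0.5)))
```

A production pipeline would more likely use an array library and area-weighted sampling; the grid variant is shown only because it makes the role of the sampling interval easy to see.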
[0036] Subsequently, the sampled point cloud data is input into a cascaded semantic segmentation network specifically designed for large-scale scene processing to achieve efficient and accurate semantic classification of ground features. The cascaded semantic segmentation network is a cascaded architecture designed to address the characteristics of oblique photogrammetry 3D model point clouds (uneven density, complex scenes, and large data volume). It integrates the core abstraction ideas of the existing PointNet++ network with the efficient local aggregation architecture of the RandLA-Net network, forming a two-stage processing flow of coarse-grained followed by fine-grained processing. Specifically, the cascaded semantic segmentation network consists of sequentially connected coarse-grained feature extraction modules and fine-grained semantic segmentation modules:
[0037] (1) Coarse-grained feature extraction module: This module adopts the hierarchical point set abstraction idea in the PointNet++ network. Its network structure contains multiple layers of abstract units connected in sequence. Each unit performs the operation of "farthest point sampling - local neighborhood construction based on ball query - multilayer perceptron feature extraction" in sequence. Through this structure, the input point cloud is downsampled layer by layer. At the same time, the perceptron of each layer extracts and outputs the local geometric features of the corresponding scale, realizing the initial fusion of multi-scale features. After multi-level processing, this module outputs a subset of candidate points with a significantly reduced number of points but with fused multi-scale features. The technical effect of this process is to quickly compress the data scale and initially focus on potential building areas, laying an efficient data foundation for subsequent fine processing.
[0038] (2) Fine-grained semantic segmentation module: This module is built based on the efficient local feature aggregation architecture of the RandLA-Net network. The network stacks multiple local feature aggregation units. Each unit performs the following operations sequentially on the input candidate point subset: "random sampling to preserve the point set structure - Local spatial encoding (LocSE) to capture relative geometric relationships - attention pooling for weighted feature aggregation", gradually fusing local to global contextual information. Finally, through a shared multilayer perceptron classification head, it outputs the semantic category and corresponding classification confidence of each point cloud. The technical effect of this process is that it achieves high-precision point-by-point classification on the compressed candidate set, balancing classification efficiency and accuracy, and ensuring the reliability and accuracy of the initial positioning of the building body.
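The "farthest point sampling" and "ball query" operations named for the coarse-grained module can be illustrated with a minimal pure-Python sketch. It shows only these two generic building blocks, not the cascaded network itself; the function names and the brute-force search are assumptions made for readability.

```python
import math


def farthest_point_sampling(points, m, start=0):
    """Return m indices; each new index is the point farthest from those already chosen."""
    chosen = [start]
    # Distance from every point to its nearest chosen point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < min(m, len(points)):
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def ball_query(points, center, radius):
    """Return the indices of all points within `radius` of `center`."""
    return [i for i, p in enumerate(points) if math.dist(p, center) <= radius]


if __name__ == "__main__":
    pts = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (10, 1, 0), (5, 0, 0)]
    centers = farthest_point_sampling(pts, 3)
    print(centers, [ball_query(pts, pts[c], 1.5) for c in centers])
```

In a network, each neighbourhood returned by the ball query would then be passed through a shared multilayer perceptron; that part is omitted here.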
[0039] Step S2: Based on the point cloud semantic classification results, extract the initial building point cloud set, and generate a digital elevation model (DEM) describing the ground undulations and a digital surface model (DSM) describing the undulations of the surface above the ground (including building tops).
[0040] Understandably, based on the classification results output by the fine-grained semantic segmentation module, building points with high classification confidence are selected to form an initial building point cloud set. This initial building point cloud set serves as the core output of this step, providing reliable initial positioning information for the main building in 3D space. This provides a crucial data source for the subsequent generation of the digital surface model (DSM) and a 3D spatial reference for verification and geometric correction of the instance segmentation results based on 2D images in subsequent steps. The colored point cloud sampling and point cloud classification results are shown in Figure 2.
[0041] In addition, this step generates a digital elevation model (DEM) describing the ground undulations and a digital surface model (DSM) describing the undulations of the ground surface (including building tops) based on the point cloud semantic classification results. These two models will provide core geometric and elevation constraints for subsequent boundary growth.
[0042] First, from the overall point cloud semantic classification results completed in step S1, all points classified as ground are extracted to form a candidate ground point cloud set. To remove outliers introduced by classification errors in this set, a σ-threshold filtering method based on local statistics is used. Specifically, each candidate ground point is projected onto the horizontal plane, and the plane is divided using a square grid with a side length of 1.0 meter. For each grid cell, the arithmetic mean μ and the standard deviation σ of the elevation values of all candidate ground points falling within it are calculated. Subsequently, a filtering threshold is set according to the "μ + t·σ" criterion, where t is the elevation deviation coefficient, ranging from 1.5 to 2.5. This range has been verified through testing on 100 sets of urban terrain point cloud data: it removes more than 95% of non-ground outliers while retaining more than 97% of real ground points. All candidate ground points in the current grid cell whose elevation exceeds the threshold are filtered out. After all grid cells have been traversed, the remaining points constitute a clean ground point cloud set.
[0043] Here, the principle of σ-threshold filtering is that, within a local horizontal neighborhood, the elevation distribution of real terrain surface points follows a normal distribution centered on the terrain trend surface. Misclassified non-ground points are statistically significant high outliers, which can be robustly removed by the adaptive statistical threshold.
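The per-cell statistical filter can be sketched as below. This is a minimal illustration, assuming the threshold takes the form mean plus a coefficient times the standard deviation, that the population standard deviation is used, and that points are plain `(x, y, z)` tuples; the function name `filter_ground_points` and the parameter name `t` are hypothetical.

```python
import math
from collections import defaultdict
from statistics import fmean, pstdev


def filter_ground_points(points, cell=1.0, t=2.0):
    """Drop candidate ground points lying above mean + t * std within their grid cell."""
    cells = defaultdict(list)
    for x, y, z in points:
        # Horizontal projection onto a square grid with side length `cell`.
        cells[(math.floor(x / cell), math.floor(y / cell))].append((x, y, z))
    kept = []
    for members in cells.values():
        heights = [p[2] for p in members]
        threshold = fmean(heights) + t * pstdev(heights)
        kept.extend(p for p in members if p[2] <= threshold)
    return kept


if __name__ == "__main__":
    pts = [(0.1 * i, 0.5, 10.0) for i in range(9)] + [(0.95, 0.5, 15.0)]
    print(len(filter_ground_points(pts)))
```

Only high outliers are removed, which matches the one-sided rule in the text; a cell containing a single point always keeps it because its standard deviation is zero.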
[0044] Using a clean ground point cloud set as the spatial observation sample, Kriging interpolation is performed on a regular grid with a resolution of 0.2 meters (consistent with the resolution of the subsequent orthophoto DOM) to generate a reference digital elevation model (DEM). To construct an elevation surface model that suppresses interference from non-building features, the initial building point cloud set from step S1 is merged with the clean ground point cloud set obtained in this step to form a building-ground fused point cloud set. This building-ground fused point cloud set explicitly excludes point clouds semantically classified as vegetation, vehicles, etc. Furthermore, Kriging interpolation is also applied to this building-ground fused point cloud set on a grid with the same geographic reference and resolution as the aforementioned DEM to generate a building digital surface model (DSM). The reference digital elevation model (DEM) and the building digital surface model (DSM) are shown in Figure 3.
[0045] Step S3: Perform orthophoto rendering on the tilted 3D model to generate an orthophoto image of the study area. Perform pixel-level instance segmentation on the orthophoto image based on a deep instance segmentation neural network to obtain the initial area of the building instances at the pixel level, thus forming an initial set of 2D building areas.
[0046] Understandably, based on the input real-world 3D triangular mesh model, an orthophoto of the study area is generated using orthographic projection rendering. Specifically, based on the 3D surface geometry and attached texture defined by the triangular mesh model, and according to the specified output plane coordinate system, ground sampling distance (kept the same as the DEM resolution defined in step S2, i.e., 0.2 meters), and output range, the surface texture of the 3D model is projected vertically onto a 2D plane, generating an orthophoto with uniform resolution and no perspective distortion. The generated digital orthophoto map (DOM) is shown in Figure 4.
[0047] Subsequently, the orthophotos are input into a pre-trained deep instance segmentation neural network to obtain initial pixel-level building instance regions. The deep instance segmentation neural network employs a Mask R-CNN architecture based on region proposal and mask prediction. This architecture is configured to simultaneously perform building target detection, classification, and pixel-level mask prediction. Its output is a series of building instances, each containing a bounding box, a "building" category label, and a corresponding initial binary mask. The Mask R-CNN architecture was chosen because its region proposal network and feature pyramid network work together to effectively adapt to the multi-scale characteristics of building targets in remote sensing imagery; simultaneously, its parallel mask prediction branch can generate relatively fine pixel-level segmentation results for each detected instance. The network is trained on a large dataset of multi-sample, multi-scene image building annotations to ensure its generalization ability.
[0048] After network inference is completed, the initial binary masks of all building instances collectively constitute an "initial 2D building region set." This "initial 2D building region set," as the core output of this step, has the following technical effect: in the 2D image domain, it provides each building target with an initial region containing semantic category identification, spatial location, and preliminary shape range. This region will serve as high-level semantic guidance information and will be used by subsequent semantic fusion and boundary refinement growth modules. The building instance segmentation result is shown in Figure 5.
[0049] Step S4: Perform spatial cross-validation on the initial building point cloud set and the initial two-dimensional building region set to obtain the two-dimensional building region set to be processed.
[0050] Understandably, this step aims to eliminate false detections, complete missed detections, and generate optimized seed regions with reliable location and pure semantics by spatially registering and logically cross-validating the semantic information of 3D point clouds with the segmentation results of 2D image instances.
[0051] First, a precise spatial correspondence is established between the "initial set of two-dimensional building regions" (from step S3) and the "initial set of building point clouds" in three-dimensional space (from step S1). Based on the coordinate system and resolution consistency ensured in steps S2 and S3, the pixel coordinates of each initial two-dimensional building region are mapped to three-dimensional geographic coordinates through the geographic transformation parameters of the orthophoto. Simultaneously, the three-dimensional building point clouds are projected vertically onto the horizontal plane. Based on this spatial mapping, a two-way semantic cross-validation is executed: the 3D data verifies the 2D results (removing false detections), and the 2D results are checked against the 3D data for omissions (filling in missed detections). Specifically, for the 3D verification of 2D results, the building point ratio (the number of 3D points classified as building in the region divided by the total number of points in the region) is calculated for each region in the "initial set of two-dimensional building regions". A ratio threshold of 70% is set (determined from 10 sets of measured urban scene data; this threshold eliminates more than 90% of the false detection regions in the image). If the ratio is below 70%, the two-dimensional region is judged to be a false detection and is removed from the subsequent seed candidate set. For the detection of omissions, the "initial building point cloud set" is searched for continuous point cloud clusters that are dense, cover an area exceeding a threshold, and are not covered by any "initial 2D building region" on the horizontal projection plane. Such clusters indicate buildings that were missed by the instance segmentation model. Based on the horizontal projection range of each such cluster, a corresponding supplementary segmentation region is generated and added to the set of regions to be processed.
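The false-detection check (the 3D verification of 2D regions) can be sketched as follows. The sketch assumes a north-up raster whose top-left corner is `(origin_x, origin_y)`, regions stored as sets of `(row, col)` pixels, and points stored as `(x, y, label)` tuples; all names are hypothetical, and treating a region with no points as a false detection is an assumption not stated in the source. The second direction (adding regions for missed buildings) is not shown.

```python
import math


def to_pixel(x, y, origin_x, origin_y, gsd):
    """Map planar coordinates to (row, col) in a north-up raster."""
    return (math.floor((origin_y - y) / gsd), math.floor((x - origin_x) / gsd))


def building_ratio(region_pixels, points, origin_x, origin_y, gsd=0.2):
    """Share of points falling inside the region that are classified as building."""
    total = building = 0
    for x, y, label in points:
        if to_pixel(x, y, origin_x, origin_y, gsd) in region_pixels:
            total += 1
            building += label == "building"
    return building / total if total else 0.0  # assumption: empty region counts as 0


def validate_regions(regions, points, origin_x, origin_y, gsd=0.2, min_ratio=0.7):
    """Keep only regions whose building point ratio reaches `min_ratio`."""
    return {rid: px for rid, px in regions.items()
            if building_ratio(px, points, origin_x, origin_y, gsd) >= min_ratio}


if __name__ == "__main__":
    regions = {"a": {(0, 0), (0, 1)}, "b": {(5, 5)}}
    points = [(0.5, 9.5, "building")] * 3 + [(1.5, 9.5, "vegetation"),
              (5.5, 4.5, "building"), (5.5, 4.6, "ground"), (5.2, 4.5, "vegetation")]
    print(sorted(validate_regions(regions, points, 0.0, 10.0, gsd=1.0)))
```

Looping over all points for every region is quadratic; a real implementation would rasterize the point labels once and then read counts per mask.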
[0052] Subsequently, a region skeleton extraction algorithm is employed: for each building region corresponding to a binary mask, its internal spatial distance feature map is calculated. For each foreground (building) pixel within the mask, the Euclidean distance to the nearest background (non-building) pixel is calculated, giving a distance value in pixels; the distance values of all pixels together constitute the spatial distance feature map. Based on this map, the local maxima are selected as building feature points. These points lie in the "central skeleton" or main body of the building region, far from the edges, and have the strongest spatial representativeness. All these local maxima are defined as the "optimized initial seed pixel set" corresponding to the building region. A schematic diagram of the optimized initial seed pixel set distribution is shown in Figure 6.
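The seed selection from the distance feature map can be illustrated with the brute-force sketch below. It assumes a small binary mask given as a list of lists that contains at least one background pixel, an 8-neighbourhood for the local-maximum test, and a hypothetical `min_distance` cut-off that keeps seeds away from the edge; none of these details are specified in the source.

```python
import math


def distance_map(mask):
    """Euclidean distance from each foreground pixel to the nearest background pixel."""
    rows, cols = len(mask), len(mask[0])
    background = [(r, c) for r in range(rows) for c in range(cols) if not mask[r][c]]
    dist = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                dist[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return dist


def seed_pixels(mask, min_distance=1.0):
    """Return foreground pixels that are local maxima of the distance map."""
    dist = distance_map(mask)
    rows, cols = len(mask), len(mask[0])
    seeds = []
    for r in range(rows):
        for c in range(cols):
            d = dist[r][c]
            if d < min_distance:
                continue
            neighbours = [dist[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c)]
            if all(d >= n for n in neighbours):
                seeds.append((r, c))
    return seeds


if __name__ == "__main__":
    mask = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)] for r in range(7)]
    print(seed_pixels(mask, min_distance=2.0))
```

The brute-force distance computation is quadratic in the number of pixels and is meant only to make the definition concrete; image-processing libraries provide much faster exact distance transforms.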
[0053] Step S5: Based on the Digital Elevation Model (DEM) and the Digital Surface Model (DSM), construct a multi-label graph cut energy function for each two-dimensional building region to be processed. With minimizing the multi-label graph cut energy function as the optimization objective, perform global optimization segmentation on each two-dimensional building region to be processed to generate a refined two-dimensional building unit region.
[0054] Understandably, this step aims to achieve globally optimal fine segmentation of two-dimensional building unit regions by constructing an improved multi-label graph cut energy function that integrates multi-source data constraints and instance semantic guidance, ensuring that each building unit is independently separable.
[0055] First, based on the instance-discriminative optimized seed pixel sets obtained in the previous steps, a unique foreground label $l_k$ ($k = 1, 2, \ldots, N$) is assigned to each individual building candidate region, where $k$ is the building instance number, and a single label $l_0$ is assigned uniformly to all non-building background areas. This formalizes the segmentation problem as a multi-label assignment optimization problem.
[0056] Secondly, based on the Digital Elevation Model (DEM) and the Digital Surface Model (DSM), a global optimization energy function E is constructed for each independent building candidate region. The global optimization energy function E consists of a data term, a smoothing term, and an optional instance integrity constraint term. The data term undertakes the core functions of multi-source evidence fusion and adaptive evaluation; the smoothing term ensures the spatial consistency and boundary fidelity of the segmentation results; and the optional instance integrity constraint term injects higher-order semantic continuity priors:
$$E(L) = \sum_{p \in P} D_p(l_p) + \lambda_1 \sum_{(p,q) \in \mathcal{N}} V_{p,q}(l_p, l_q) + \lambda_2\, C(L)$$
[0057] where $D_p(l_p)$ is the data term, $V_{p,q}(l_p, l_q)$ is the smoothing term, and $C(L)$ is the instance integrity constraint term; $P$ is the set of pixels within the two-dimensional building area to be processed; $L$ denotes the label assignment scheme for all pixels $p \in P$; $l_p$ and $l_q$ are the labels assigned to pixels $p$ and $q$, with $\mathcal{N}$ the set of neighbouring pixel pairs; and $\lambda_1$ and $\lambda_2$ are adjustable, non-negative weighting coefficients. Each term is detailed below.
[0058] (1) Data term: measures the cost of assigning label L_p to pixel p.
[0059] When L_p is the background label, the cost of the data term is jointly determined by the elevation difference and the semantic prior.
[0060] In this cost, the height above ground (ground clearance) of pixel p is calculated using the Digital Elevation Model (DEM) and the Digital Surface Model (DSM). The DEM represents the elevation of all ground points, while the DSM represents the elevation of all building points, so the ground clearance of each building point is the elevation difference between the DSM and the DEM. The semantic prior is the confidence score with which the point cloud semantic classification of step S1 assigns the point to the building class. A variance parameter controls the sensitivity of the height constraint, and two adjustable non-negative weighting coefficients balance the height and semantic contributions. For the background label, the variance parameter is the height variance of the background (ground seed) region. Its purpose is to scale the height contribution to a uniform range, improving algorithm stability. It is calculated over the pixels of the background seed region,
[0061] where N is the number of pixels in the corresponding background seed region, and the mean used in the variance is the average ground clearance of the background seed region.
[0062] When L_p is the label of an individual building (the k-th building instance), the cost is primarily driven by appearance similarity and elevation consistency.
[0063] In this cost, the color vector of pixel p in the DOM (Digital Orthophoto Map) is evaluated against the average color, in the DOM, of the seed region belonging to label L_p; the ground clearance of pixel p is evaluated against the average normalized ground clearance of that seed region; the variance is the variance of the ground clearance samples of the k-th building seed region; and two adjustable non-negative weighting coefficients balance the two contributions. The core role of the data term is to construct a highly fault-tolerant cost evaluation model by integrating DOM, DSM, and point cloud data. It weights the evidence from the different data sources and resolves conflicts between them, thereby ensuring the robustness and accuracy of classification decisions.
[0064] (2) Smoothing term V_pq(L_p, L_q): penalizes adjacent pixel pairs (p, q) that are assigned different labels, with weights adaptively adjusted according to the differences between the adjacent pixels in the multi-source data. In its expression, the true heights of pixels p and q are taken from the Digital Surface Model (DSM); an indicator factor takes the value 0 when L_p = L_q and 1 otherwise; and γ_DSM and γ_DOM are the adjustable non-negative weighting coefficients for the Digital Surface Model (DSM) and the orthophoto (DOM), respectively. The penalty is smaller when the cut crosses the edge of a real building (which appears as a significant jump in elevation or color) and larger when it would split a single building surface, so that real building edges are preserved accurately while noise is suppressed.
[0065] (3) Instance integrity constraint term (optional).
[0066] In its expression, the optimized initial seed pixel set belonging to the k-th building instance is used, and δ(⋅) is an indicator function that equals 1 when the condition within the parentheses is true and 0 otherwise. Its purpose is to penalize pixel assignments that deviate from the initial instance region, integrating the instance-level semantic prior knowledge provided in the previous step into pixel-level optimization as a spatial continuity guide, thereby achieving a balance between high-level semantic information and low-level multi-source data.
[0067] The parameters w_h, w_s, w_c, γ_DSM, γ_DOM and μ in the formulas are adjustable non-negative weighting parameters used to balance the relative importance of the various constraints. Finally, the α-expansion algorithm is used to minimize the energy function. By iteratively solving two-label (binary) graph cut problems, the optimal label is assigned to each pixel, and the separated, refined two-dimensional building unit regions are output directly, as shown in Figure 7.
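The structure of such an energy can be illustrated with a minimal sketch that only evaluates the energy of a given labeling. It does not implement α-expansion, and it does not reproduce the expressions of this embodiment: the function name `labeling_energy`, the 4-neighbourhood, and the caller-supplied `data_cost` and `edge_weight` functions are assumptions chosen for illustration. Only the overall form (data term, plus a penalty on neighbouring pixels with different labels, plus a μ-weighted penalty on seed pixels that lose their instance label) follows the description above.

```python
def labeling_energy(labels, data_cost, edge_weight, seed_label=None, mu=0.0):
    """Energy of one labeling on a pixel grid (illustrative form only).

    labels      : dict mapping (row, col) -> label
    data_cost   : function (pixel, label) -> cost of that assignment
    edge_weight : function (p, q) -> penalty if p and q get different labels
    seed_label  : optional dict mapping seed pixels -> their instance label
    mu          : non-negative weight of the instance integrity penalty
    """
    # Data term: cost of the label chosen for every pixel.
    energy = sum(data_cost(p, label) for p, label in labels.items())

    # Smoothing term: each 4-neighbour pair is visited once (down, right).
    for (r, c), label in labels.items():
        for q in ((r + 1, c), (r, c + 1)):
            if q in labels and labels[q] != label:
                energy += edge_weight((r, c), q)

    # Instance integrity term: seed pixels that left their own instance.
    if seed_label:
        energy += mu * sum(1 for p, label in seed_label.items()
                           if labels.get(p) != label)
    return energy
```

With an edge weight that decreases as the height difference between two pixels grows, a labeling whose boundary follows a height jump receives a lower energy than one that splits a flat roof, which is the behaviour the smoothing term is intended to produce.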
[0068] To improve accuracy, the average ground clearance of each candidate region (based on the elevation difference between the building's DSM and DEM) and the area of each building unit are calculated. If the average ground clearance is lower than a preset threshold (2.0 meters) or the building area is less than a threshold (5.0 square meters), it is considered noise and removed. The 2.0-meter threshold refers to the minimum height accuracy requirement for LOD2 level building models in the "Technical Specification for Urban 3D Modeling" (CJJ / T 157-2010), and the 5.0-square-meter threshold is used to filter out non-building features such as vehicles and billboards, which conforms to the conventional requirements of urban 3D modeling engineering practice. The regions that pass the verification constitute a "refined two-dimensional building unit region set".
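The verification rule above can be sketched as follows. The 2.0 m and 5.0 m² thresholds come from the paragraph above; the function name `filter_building_regions`, the representation of a region as a list of (row, column) pixels, and the `pixel_size_m` parameter (ground size of one raster cell) are assumptions for illustration.

```python
MIN_MEAN_HEIGHT_M = 2.0   # minimum average ground clearance, in metres
MIN_AREA_M2 = 5.0         # minimum building footprint, in square metres


def filter_building_regions(regions, dsm, dem, pixel_size_m):
    """Keep only regions that are tall enough and large enough.

    regions      : list of regions, each a list of (row, col) pixels
    dsm, dem     : 2D lists of elevations on the same grid, in metres
    pixel_size_m : edge length of one square pixel on the ground, in metres
    """
    kept = []
    for pixels in regions:
        clearances = [dsm[r][c] - dem[r][c] for r, c in pixels]
        mean_clearance = sum(clearances) / len(clearances)
        area = len(pixels) * pixel_size_m ** 2
        # A region is noise if it fails either threshold.
        if mean_clearance >= MIN_MEAN_HEIGHT_M and area >= MIN_AREA_M2:
            kept.append(pixels)
    return kept
```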
[0069] Step S6: Perform contour vectorization and geometric regularization processing on each of the two-dimensional building unit areas to output a regular building vector boundary that conforms to the specifications.
[0070] Understandably, this step performs vectorization and geometric optimization on the refined 2D building unit area obtained in step S5 to generate building vector boundaries that conform to manual drawing standards. The specific process is as follows:
[0071] (1) Contour Extraction and Vectorization: First, for each binary region in the "refined 2D building unit region set", the pixel coordinate sequence of the outer contour of the region is extracted using a boundary tracing algorithm. The algorithm obtains an ordered and closed set of contour points by starting from any pixel on the edge of the region and searching clockwise or counterclockwise along its eight neighbors until returning to the starting point. Then, based on the georeferenced information of the orthophoto (DOM), the pixel coordinate sequence is converted into an initial vector polygon in the real-world coordinate system.
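The conversion from pixel coordinates to the real-world coordinate system can be sketched as below. The sketch assumes that the georeferencing information of the orthophoto is available as a six-parameter affine transform ordered as in the GDAL convention (origin x, pixel width, row rotation, origin y, column rotation, pixel height); this layout, the half-pixel offset to the pixel centre, and the function names are assumptions, not details given in this embodiment.

```python
def pixel_to_world(row, col, geotransform):
    """Map the centre of pixel (row, col) to world coordinates (x, y)."""
    x0, pixel_w, row_rot, y0, col_rot, pixel_h = geotransform
    c, r = col + 0.5, row + 0.5  # use the pixel centre, not its corner
    x = x0 + c * pixel_w + r * row_rot
    y = y0 + c * col_rot + r * pixel_h
    return x, y


def contour_to_polygon(contour, geotransform):
    """Convert an ordered contour of (row, col) pixels to world vertices."""
    return [pixel_to_world(r, c, geotransform) for r, c in contour]
```

For a north-up orthophoto the two rotation parameters are zero and the pixel height is negative, because row numbers increase southwards.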
[0072] (2) Preliminary contour simplification: To remove the "step-like" jagged edges of the boundary caused by the pixel grid and to provide concise data for subsequent analysis, the Douglas-Peucker algorithm is first used to simplify the initial polygon. This algorithm recursively compares the perpendicular distance from each vertex on the polygon contour to the chord segment connecting its two adjacent vertices. That is, it first finds the two adjacent vertices of the vertex, connects them to form a chord segment, and calculates the perpendicular distance from the vertex to the chord segment.
[0073] The perpendicular distance is compared with a preset global tolerance threshold. If it is less than or equal to the threshold, the point is considered a removable non-feature point and is removed; otherwise, it is retained as a feature point. A relatively lenient first global tolerance threshold is set. This efficiently filters out the large number of redundant vertices generated by pixel boundaries and produces a simple polygon as input for all subsequent geometric analyses.
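A minimal sketch of the simplification is given below. It uses the classic recursive form of the Douglas-Peucker algorithm on an open polyline, in which the distance is measured to the chord between the first and last points of the current span; this is an assumption about the intended variant. The function names are hypothetical, and a closed building contour would first have to be split into open chains (for example at two far-apart vertices) before this function is applied.

```python
import math


def _distance_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:          # degenerate chord: a and b coincide
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Simplify an open polyline; vertices within `tolerance` of the
    chord of their span are dropped, the others are kept."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _distance_to_line(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:        # every interior vertex is redundant
        return [first, last]
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right         # the split vertex appears only once
```

A larger tolerance removes more vertices, which corresponds to the lenient first threshold; a smaller tolerance corresponds to the stricter second threshold used after right-angle processing.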
[0074] (3) Dominant direction detection: For each edge segment of the simplified polygon, the direction angle is calculated using the arctangent function and normalized to the range [0, π) radians, which eliminates the ambiguity between the direction of a line segment and that of its reverse. All normalized direction angles are then clustered within a preset angle tolerance (here set to π/12) to obtain direction clusters, each identified by a cluster number. For each line segment within a direction cluster, a weighted direction vector is constructed from its length and its direction angle.
[0075] The weighted direction vector sum of the cluster is obtained by summing all weighted direction vectors of the cluster. The direction angle of this vector sum is the weighted average direction of the cluster, and its magnitude is the cluster weight. Finally, the weighted average directions of the one or two direction clusters with the largest weights are selected as the dominant directions.
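The dominant direction detection can be sketched as follows. The normalization to [0, π), the π/12 tolerance and the length weighting follow the description; the greedy clustering, the function names, and in particular the use of doubled angles when summing the weighted vectors (so that directions close to 0 and close to π reinforce rather than cancel each other) are assumptions made for this sketch.

```python
import math


def axial_difference(a, b):
    """Smallest angle between two undirected directions, in [0, pi/2]."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def dominant_directions(polygon, tolerance=math.pi / 12, count=2):
    """Length-weighted dominant edge directions of a closed polygon,
    returned as angles in [0, pi), strongest cluster first."""
    clusters = []  # each cluster: [sum L*cos(2*theta), sum L*sin(2*theta)]
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        length = math.hypot(x2 - x1, y2 - y1)
        if length == 0.0:
            continue
        theta = math.atan2(y2 - y1, x2 - x1) % math.pi  # undirected angle
        for cluster in clusters:
            mean = 0.5 * math.atan2(cluster[1], cluster[0])
            if axial_difference(theta, mean) <= tolerance:
                break                      # edge joins this cluster
        else:
            cluster = [0.0, 0.0]           # no match: open a new cluster
            clusters.append(cluster)
        cluster[0] += length * math.cos(2 * theta)
        cluster[1] += length * math.sin(2 * theta)
    # Cluster weight = magnitude of the summed weighted vectors.
    clusters.sort(key=lambda c: math.hypot(c[0], c[1]), reverse=True)
    return [(0.5 * math.atan2(c[1], c[0])) % math.pi
            for c in clusters[:count]]
```

For an axis-aligned 4 x 2 rectangle the two long edges form the heaviest cluster (direction 0) and the two short edges form the second cluster (direction π/2).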
[0076] (4) Boundary right angle processing: Based on the dominant direction detected in the previous step, geometric regularization and right angle processing are performed on the simplified polygon. This processing takes the dominant direction as the reference and adjusts each edge segment of the polygon to the dominant direction with the smallest angle. Then, the vertex coordinates are updated by solving a geometric optimization problem. The geometric optimization problem takes "minimizing the distance error between the adjusted contour and the original simplified polygon" as the core optimization objective and "all edge segments are parallel to the dominant direction" as the constraint condition. The vertex coordinates are gradually corrected through iterative solution so that the adjusted polygon satisfies the condition that all edges are parallel to the dominant direction while its contour is geometrically close to the original polygon to the greatest extent.
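The right-angle processing can be illustrated with a simplified, non-iterative sketch. Instead of the iterative geometric optimization described above, it uses a substitute technique: each edge is replaced by a line through its midpoint oriented along the closest dominant direction, and the new vertices are the intersections of consecutive lines. The function name `rectify_polygon` and the fallback for parallel neighbouring edges are assumptions; a complete solution would merge such edges or insert a connecting edge.

```python
import math


def rectify_polygon(polygon, directions):
    """Snap every edge of a closed polygon to the closest direction in
    `directions` (angles in radians) and rebuild the vertices."""
    n = len(polygon)
    lines = []  # one (midpoint, unit direction) pair per edge
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        theta = math.atan2(y2 - y1, x2 - x1)
        # |sin| of the angle difference measures the included angle
        # between undirected lines; pick the direction minimizing it.
        best = min(directions, key=lambda d: abs(math.sin(theta - d)))
        midpoint = ((x1 + x2) / 2, (y1 + y2) / 2)
        lines.append((midpoint, (math.cos(best), math.sin(best))))

    result = []
    for i in range(n):
        (px, py), (ux, uy) = lines[i - 1]  # edge ending at vertex i
        (qx, qy), (vx, vy) = lines[i]      # edge starting at vertex i
        cross = ux * vy - uy * vx
        if abs(cross) < 1e-9:
            # Parallel neighbours: crude fallback, project the original
            # vertex onto the first line.
            ox, oy = polygon[i]
            t = (ox - px) * ux + (oy - py) * uy
        else:
            # Solve P + t*u = Q + s*v for t.
            t = ((qx - px) * vy - (qy - py) * vx) / cross
        result.append((px + t * ux, py + t * uy))
    return result
```

Anchoring each line at the edge midpoint keeps the adjusted contour close to the simplified polygon, which plays the role of the distance-minimizing objective in a much cruder way.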
[0077] (5) Final polygon simplification: The right-angled polygon is simplified again using the Douglas-Peucker algorithm, this time with a stricter second global tolerance threshold. This removes redundant vertices that were generated or left over during the right-angle processing and do not contribute to the regularized geometry, and outputs a simplified building outline polygon with refined vertices and regular geometry. A schematic diagram of the regularized building unit is shown in Figure 8.
[0078] Referring to Figure 9, a semantically guided system for extracting the boundaries of building units in a tilted 3D model is provided. The system includes:
[0079] The semantic classification module 901 is used to uniformly sample point cloud data from a tilted 3D model and perform semantic classification on the point cloud data based on a cascaded semantic segmentation network to obtain point cloud semantic classification results.
[0080] The generation module 902 is used to extract an initial set of building point clouds based on the point cloud semantic classification results, and to generate a digital elevation model (DEM) describing the ground undulations and a digital surface model (DSM) describing the top undulations of the ground surface.
[0081] The instance segmentation module 903 is used to perform orthophoto projection rendering on the tilted 3D model to generate an orthophoto image of the study area, and perform pixel-level instance segmentation on the orthophoto image based on a deep instance segmentation neural network to obtain the initial area of building instances at the pixel level, thus forming an initial set of two-dimensional building areas.
[0082] Cross-validation module 904 is used to perform spatial cross-validation on the initial building point cloud set and the initial two-dimensional building region set to obtain the two-dimensional building region set to be processed.
[0083] The global optimization module 905 is used to construct a multi-label graph cut energy function for each two-dimensional building region to be processed based on the digital elevation model (DEM) and the digital surface model (DSM), and to perform global optimization segmentation on each two-dimensional building region to be processed with minimizing the multi-label graph cut energy function as the optimization objective, thereby generating a refined two-dimensional building unit region.
[0084] The regularization processing module 906 is used to perform contour vectorization and geometric regularization processing on each of the two-dimensional building unit areas, and output regular building vector boundaries that conform to the specifications.
[0085] It is understood that the semantically guided tilted 3D model building unit boundary extraction system provided by the present invention corresponds to the semantically guided tilted 3D model building unit boundary extraction method provided in the foregoing embodiments. The relevant technical features of the semantically guided tilted 3D model building unit boundary extraction system can be referred to the relevant technical features of the semantically guided tilted 3D model building unit boundary extraction method, and will not be repeated here.
[0086] This invention addresses the challenge of balancing automation and high precision by employing a strategy of automatically acquiring initial seeds through deep instance segmentation and cross-validating with 2D and 3D data. This guides subsequent global optimization segmentation via graph cut, achieving a fully automated process while ensuring boundary geometric accuracy. To address the issue of insufficient boundary integrity and geometric accuracy, the invention constrains the graph cut energy function optimization process with semantically complete regions, supplemented by dominant direction detection and right-angle conversion to ensure topological closure and regular shape of the final result. For the problem of weak anti-interference in complex scenes, the invention innovatively integrates multi-source data such as DSM, DEM, and point cloud classification results to construct graph cut energy function constraints, effectively suppressing interference from shadows, occlusion, and non-building features, thus improving scene adaptability. Finally, to address the contradiction between semantic and geometric separation, a collaborative framework of "semantic guidance and graph cut execution" is constructed, achieving a deep unification of their underlying principles. In summary, this invention, through systematic collaborative innovation, comprehensively improves the boundary accuracy, scene adaptability, and geometric rationality of individual building results, providing an efficient and reliable technical solution for real-world 3D construction.
[0087] It should be noted that the descriptions of each embodiment in the above embodiments have different focuses. For parts that are not described in detail in a certain embodiment, please refer to the relevant descriptions in other embodiments.
[0088] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0089] This invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0090] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0091] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
[0092] Although preferred embodiments of the invention have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including both the preferred embodiments and all changes and modifications falling within the scope of the invention.
[0093] Obviously, those skilled in the art can make various modifications and variations to this invention without departing from its spirit and scope. Therefore, if these modifications and variations fall within the scope of the claims of this invention and their equivalents, this invention also intends to include these modifications and variations.
Claims
1. A semantically guided method for extracting the boundaries of individual buildings in a tilted 3D model, characterized in that, include: Step S1: Uniformly sample point cloud data from the tilted 3D model, and perform semantic classification on the point cloud data based on a cascaded semantic segmentation network to obtain point cloud semantic classification results; Step S2: Based on the point cloud semantic classification results, extract the initial building point cloud set, and generate a digital elevation model (DEM) describing the ground undulations and a digital surface model (DSM) describing the top undulations of the ground according to the point cloud semantic classification results. Step S3: Perform orthophoto rendering on the tilted 3D model to generate an orthophoto image of the study area. Perform pixel-level instance segmentation on the orthophoto image based on a deep instance segmentation neural network to obtain the initial area of the building instances at the pixel level, thus forming an initial set of two-dimensional building areas. Step S4: Perform spatial cross-validation on the initial building point cloud set and the initial two-dimensional building region set to obtain the two-dimensional building region set to be processed; Step S5: Based on the Digital Elevation Model (DEM) and the Digital Surface Model (DSM), construct a multi-label graph cut energy function for each two-dimensional building region to be processed. With minimizing the multi-label graph cut energy function as the optimization objective, perform global optimization segmentation on each two-dimensional building region to be processed to generate a refined two-dimensional building unit region. Step S6: Perform contour vectorization and geometric regularization processing on each of the two-dimensional building unit areas to output a regular building vector boundary that conforms to the specifications.
2. The method according to claim 1, characterized in that, Step S1 involves uniformly sampling point cloud data from the tilted 3D model and performing semantic classification on the point cloud data based on a cascaded semantic segmentation network to obtain point cloud semantic classification results, including: A sampling interval is set, and uniformly distributed three-dimensional point cloud data is generated by sampling on each triangular facet of the tilted three-dimensional model at that interval; The sampled 3D point cloud data is input into a cascaded semantic segmentation network, which outputs point cloud semantic classification results. The point cloud semantic classification results include the semantic category of each point and the corresponding classification confidence.
3. The method according to claim 2, characterized in that, Step S2, based on the point cloud semantic classification results, extracts an initial building point cloud set, and generates a digital elevation model (DEM) describing ground undulations and a digital surface model (DSM) describing the top undulations of the ground surface, including: Based on the point cloud semantic classification results, all point clouds classified as ground are extracted to form a candidate ground point cloud set. The candidate ground point cloud set is filtered out using a threshold filtering method based on local statistics to obtain a clean ground point cloud set. Using the pure ground point cloud set as a spatial observation sample, spatial interpolation is performed on the grid cells of the study area using the Kriging interpolation method to generate a reference digital elevation model (DEM). Based on the semantic classification results of the point clouds, the building point clouds with high classification confidence are selected to form the initial set of building point clouds. The pure ground point cloud set and the initial building point cloud set are merged to form a building-ground fused point cloud set; The building-ground fusion point cloud set is interpolated using the Kriging interpolation method on grid cells with the same geographic reference and resolution as the reference digital elevation model (DEM) to generate a building digital surface model (DSM).
4. The method according to claim 3, characterized in that, The threshold filtering method based on local statistics filters the candidate ground point cloud set to obtain a clean ground point cloud set, including: Project all candidate ground point clouds in the candidate ground point cloud set onto a horizontal plane, and divide the study area into grids; For each grid cell, collect the elevation values of all candidate ground point clouds falling within it, and then calculate the arithmetic mean and the standard deviation of those elevation values; Set a filtering threshold from the arithmetic mean, the standard deviation and an elevation deviation coefficient, and filter out all candidate ground point clouds within the current grid cell whose elevation values are greater than the filtering threshold; Traverse all grid cells to obtain a clean ground point cloud set.
5. The method according to claim 1, characterized in that, Step S4 involves performing spatial cross-validation on the initial building point cloud set and the initial two-dimensional building region set to obtain the two-dimensional building region set to be processed, including: The pixel coordinates of the initial area of each 2D building are mapped to 3D geospatial coordinates through the geographic transformation parameters of the orthophoto, and the 3D point cloud of each building is vertically projected onto the horizontal plane. Execute bidirectional semantic cross-validation logic, which includes three-dimensional verification of two-dimensional data and two-dimensional verification of three-dimensional data, wherein the three-dimensional verification of two-dimensional data includes: For each two-dimensional building region after projection transformation in the initial two-dimensional building region set, calculate the building point cloud classification ratio. If the building point cloud classification ratio is greater than or equal to the ratio threshold, then retain the two-dimensional building region; otherwise, the two-dimensional building region is a false detection and is removed. The two-dimensional verification of the three-dimensional includes: If a cluster of continuous building point clouds with dense point clouds and a coverage area exceeding the area threshold is detected in the initial building point cloud set but is not covered by any "initial two-dimensional building region", then the building point cloud cluster is a building that was missed by the deep instance segmentation neural network. Based on the horizontal projection range of the building point cloud cluster, a corresponding supplementary two-dimensional building segmentation region is generated, forming a set of two-dimensional building regions to be processed.
6. The method according to claim 1, characterized in that, The deep instance segmentation neural network outputs each building instance, and each building instance includes a building bounding box, a building category label, and a corresponding initial binary mask. Step S5 involves constructing a multi-label graph cut energy function for each two-dimensional building region to be processed based on the Digital Elevation Model (DEM) and the Digital Surface Model (DSM), and, with minimizing the multi-label graph cut energy function as the optimization objective, performing global optimization segmentation to generate refined two-dimensional building unit regions, including: Based on the initial binary mask corresponding to each two-dimensional building region to be processed, the spatial distance feature map inside each two-dimensional building region to be processed is calculated. The spatial distance feature map is calculated as follows: for each foreground pixel in the initial binary mask, the Euclidean distance from it to the nearest background pixel is calculated to obtain the distance value in pixels. The distance values of all foreground pixels constitute the spatial distance feature map. Based on the spatial distance feature map, building feature points that are local maxima, located in the main part of the building area, far from the building edge, and with strong spatial representativeness are selected from each two-dimensional building area to be processed, and used as the optimized initial seed pixel set of each two-dimensional building area to be processed. Based on the optimized initial seed pixel set of the two-dimensional building region to be processed, a unique foreground label
or the background label is assigned to each pixel of each two-dimensional building region to be processed, where the foreground label is indexed by the number of the two-dimensional building region to be processed; According to the foreground labels and the background label, and based on the digital elevation model (DEM) and the digital surface model (DSM), a global optimization energy function E is constructed for each two-dimensional building region to be processed. With minimizing the multi-label graph cut energy function as the optimization objective, the multi-label graph cut energy function is iteratively optimized to assign the optimal label to each pixel of each two-dimensional building region to be processed, thereby generating each two-dimensional building unit region.
7. The method according to claim 6, characterized in that, constructing, according to the foreground labels and the background label and based on the digital elevation model (DEM) and the digital surface model (DSM), a global optimization energy function E for each two-dimensional building region to be processed includes: combining a data term, a smoothing term V_pq(L_p, L_q) and an instance integrity constraint term into the energy function E, where L denotes the label assignment scheme for all pixels p in the set of pixels of the two-dimensional building region to be processed, L_p is the label assigned to pixel p, L_q is the label assigned to pixel q, and the weighting coefficients are adjustable and non-negative; The data term measures the cost of assigning label L_p to pixel p, wherein, when L_p is the background label, the data term is expressed in terms of the height of pixel p above the ground, calculated based on the Digital Elevation Model (DEM) and the Digital Surface Model (DSM), the confidence level with which pixel p is classified as a building, a variance parameter that controls the sensitivity of the height constraint, and adjustable non-negative weighting coefficients; when L_p is a building label, the data term is expressed in terms of the color vector of pixel p in the orthophoto, the average color in the orthophoto of the seed region belonging to label L_p, the average normalized ground clearance of the seed region of label L_p, and adjustable non-negative weighting coefficients; The smoothing term V_pq(L_p, L_q) is used to penalize adjacent pixel pairs (p, q) that have been assigned different labels, and is expressed in terms of the true heights of pixels p and q in the Digital Surface Model (DSM), an indicator factor that takes the value 0 when L_p = L_q and 1 otherwise, and adjustable non-negative weighting coefficients for the Digital Surface Model (DSM) and the orthophoto image (DOM);
The instance integrity constraint term is expressed in terms of the optimized initial seed pixel set belonging to the k-th building instance and an indicator function δ(⋅), which has a value of 1 when the condition in parentheses is true and 0 otherwise.
8. The method according to claim 1, characterized in that, Step S6 involves performing contour vectorization and geometric regularization processing on each of the two-dimensional building unit areas to output a regular building vector boundary that conforms to the specifications, including: For each two-dimensional building unit area, a boundary tracing algorithm is used to extract the outer contour pixel coordinate sequence of the binary mask; Based on the georeferenced information of the orthophoto, the sequence of pixel coordinates of the outer contour is converted into an initial vector polygon in the real-world coordinate system; The initial vector polygon is simplified using the Douglas-Peucker algorithm to obtain a simplified polygon; Multiple dominant directions of the simplified polygon are detected; Based on the multiple dominant directions, geometric regularization and right-angle processing are performed on the simplified polygon to obtain a simplified building outline polygon, that is, the regularized building vector boundary.
9. The method according to claim 8, characterized in that:
Extracting the outer contour pixel coordinate sequence of the binary mask using a boundary tracing algorithm for each two-dimensional building unit region comprises: starting from any pixel on the edge of the binary mask of the two-dimensional building unit region, searching clockwise or counterclockwise through its eight-neighborhood until returning to the starting point, to obtain an ordered, closed outer contour pixel coordinate sequence.
Simplifying the initial vector polygon using the Douglas-Peucker algorithm to obtain a simplified polygon comprises: recursively computing the perpendicular distance from each vertex of the initial vector polygon to the chord connecting its two adjacent vertices, and comparing the perpendicular distance with a preset first global tolerance threshold; if the perpendicular distance is less than or equal to the first global tolerance threshold, the vertex is removed; otherwise, the vertex is retained.
Detecting multiple dominant directions of the simplified polygon comprises: for each edge segment $e_i$ of the simplified polygon, computing its direction angle $\theta_i$ using the arctangent function and normalizing it to a fixed interval in radians; clustering all normalized direction angles within a preset angle tolerance range to obtain direction clusters $C_k$, where $k$ is the index of the direction cluster; for each edge segment $e_i$ in direction cluster $C_k$, constructing the weighted direction vector $\mathbf{v}_i$ of the edge segment from its length $l_i$ and direction angle $\theta_i$; summing the weighted direction vectors of all edge segments in direction cluster $C_k$ to obtain the weighted direction vector sum $\mathbf{V}_k = \sum_{e_i \in C_k} \mathbf{v}_i$, where the direction angle of $\mathbf{V}_k$ is the weighted average direction $\bar{\theta}_k$ of the cluster and the magnitude of $\mathbf{V}_k$ is the cluster weight $W_k = \lVert \mathbf{V}_k \rVert$; and selecting the weighted average directions of the several direction clusters with the largest cluster weights as the multiple dominant directions.
Performing geometric regularization and rectilinearization on the simplified polygon based on the multiple dominant directions to obtain the simplified building outline polygon, that is, the regularized building vector boundary, comprises: based on the multiple dominant directions, adjusting each edge segment of the simplified polygon to the dominant direction with which it forms the smallest included angle, to obtain a direction-adjusted first polygon; updating the vertex coordinates of the first polygon by solving a geometric optimization problem to obtain an adjusted second polygon, where the optimization objective is to minimize the distance error between the contour of the adjusted second polygon and the contour of the first polygon, the constraint is that every edge segment of the second polygon is parallel to a dominant direction, and the vertex coordinates of the first polygon are corrected step by step through iterative solution to obtain the rectilinearized second polygon; and, for the rectilinearized second polygon, setting a preset second global tolerance threshold and simplifying the second polygon using the Douglas-Peucker algorithm to obtain the simplified building outline polygon.
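The dominant direction detection described in this claim might be sketched as follows. Several details are assumptions made for illustration because the claim text does not fix them: direction angles are folded into [0, π), clusters are seeded greedily by the longest edges, angles are compared circularly so that near-0 and near-π edges fall in one cluster, each weighted vector is taken as l·(cos θ, sin θ) after aligning θ to the cluster seed, and the tolerance of 10 degrees and `top_k` of 2 are arbitrary. All function names are hypothetical.

```python
import math

def angular_gap(a, b):
    """Smallest difference between two undirected line orientations, in [0, pi/2]."""
    diff = abs(a - b) % math.pi
    return min(diff, math.pi - diff)

def edge_angles_and_lengths(ring):
    """Return (angle, length) for each edge of a closed ring; angle folded by modulo pi."""
    edges = []
    for (x0, y0), (x1, y1) in zip(ring[:-1], ring[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        if length > 0.0:
            edges.append((math.atan2(y1 - y0, x1 - x0) % math.pi, length))
    return edges

def dominant_directions(ring, angle_tol=math.radians(10.0), top_k=2):
    """Cluster edge orientations and return the top_k length-weighted mean directions."""
    clusters = []  # each cluster is a list of (angle, length); the first member is the seed
    for angle, length in sorted(edge_angles_and_lengths(ring), key=lambda e: -e[1]):
        for members in clusters:
            if angular_gap(angle, members[0][0]) <= angle_tol:
                members.append((angle, length))
                break
        else:
            clusters.append([(angle, length)])
    scored = []
    for members in clusters:
        seed = members[0][0]
        sx = sy = 0.0
        for angle, length in members:
            # Shift by pi when needed so the member points the same way as the seed.
            if angle - seed > math.pi / 2:
                angle -= math.pi
            elif seed - angle > math.pi / 2:
                angle += math.pi
            sx += length * math.cos(angle)
            sy += length * math.sin(angle)
        # Magnitude of the vector sum is the cluster weight; its angle is the mean direction.
        scored.append((math.hypot(sx, sy), math.atan2(sy, sx) % math.pi))
    scored.sort(reverse=True)
    return [direction for _, direction in scored[:top_k]]

def nearest_dominant(angle, dominants):
    """Pick the dominant direction forming the smallest included angle with an edge."""
    return min(dominants, key=lambda d: angular_gap(angle, d))

# Hypothetical example: a slightly skewed rectangle in world coordinates.
ring = [(0.0, 0.0), (10.0, 0.2), (10.1, 5.0), (0.0, 5.1), (0.0, 0.0)]
dirs = dominant_directions(ring)
```

Aligning each member to its cluster seed before summing keeps opposite-facing edges of one wall orientation from cancelling each other, which is one possible way to make the length-weighted vector sum meaningful for undirected edges.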
10. A semantically guided system for extracting building unit boundaries from a tilted 3D model, characterized in that it comprises:
The semantic classification module, used to uniformly sample point cloud data from the tilted 3D model and perform semantic classification on the point cloud data based on a cascaded semantic segmentation network to obtain a point cloud semantic classification result;
The generation module, used to extract an initial set of building point clouds based on the point cloud semantic classification result, and to generate, based on the same result, a digital elevation model (DEM) describing the undulations of the ground and a digital surface model (DSM) describing the undulations of the top surfaces of ground objects;
The instance segmentation module, used to perform orthophoto rendering on the tilted 3D model to generate an orthophoto of the study area, and to perform pixel-level segmentation on the orthophoto with a deep instance segmentation neural network to obtain pixel-level initial regions of building instances, forming an initial set of two-dimensional building regions;
The cross-validation module, used to perform spatial cross-validation between the initial set of building point clouds and the initial set of two-dimensional building regions to obtain a set of two-dimensional building regions to be processed;
The global optimization module, used to construct a multi-label graph cut energy function for each two-dimensional building region to be processed based on the digital elevation model (DEM) and the digital surface model (DSM), and, with minimization of the multi-label graph cut energy function as the optimization objective, to perform global optimization segmentation on each two-dimensional building region to be processed, generating refined two-dimensional building unit regions;
The regularization module, used to perform contour vectorization and geometric regularization on each of the two-dimensional building unit areas and to output regularized building vector boundaries that conform to the specifications.