Building extraction method, system, medium and equipment based on spatial profile sampling guidance of airborne lidar point cloud
By using a spatial profile sampling-guided method combined with KNN and self-attention mechanisms to expand the network's receptive field, the efficiency and accuracy issues of building extraction in large-scale airborne point clouds are solved, achieving efficient and high-precision building extraction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 四川省冶金地质勘查局测绘工程大队
- Filing Date
- 2025-07-10
- Publication Date
- 2026-06-23
AI Technical Summary
Existing deep learning methods suffer from low computational efficiency and insufficient accuracy when processing large-scale airborne urban building point cloud data, especially in complex urban environments where it is difficult to extract high-density buildings efficiently and with high precision.
A spatial profile sampling-guided approach is adopted. By training a segmentation network model, combining the KNN algorithm and self-attention mechanism for feature encoding, the receptive field of the network is expanded, local fusion attention is introduced, point cloud features are enhanced and compressed, and the nearest neighbor interpolation method is used to restore the point cloud scale, thereby achieving efficient extraction of buildings.
It significantly improves the classification accuracy and efficiency of buildings in large-scale airborne point clouds, solves the problem of confusion between buildings and surrounding ground features, and reduces time complexity.
Figure CN120876944B_ABST
Description
Technical Field
[0001] This invention belongs to the technical field of three-dimensional point cloud segmentation, specifically relating to a method, system, medium, and equipment for extracting buildings from airborne LiDAR point clouds based on spatial profile sampling guidance.
Background Technology
[0002] With the development of remote sensing technology, airborne LiDAR systems have been widely used in fields such as geographic information acquisition, urban planning, and environmental monitoring. Airborne LiDAR point cloud data, as an important source of high-precision three-dimensional spatial information, can provide large-scale, high-resolution data on ground and building morphology, and therefore has significant application value in urban modeling, building classification, and disaster monitoring. However, how to efficiently process these massive point cloud datasets, and especially how to accurately classify buildings in complex urban environments, remains a challenging research problem.
[0003] Traditional building classification methods primarily rely on rule-based algorithms, such as those based on geometric shape and texture information. While these methods perform well in some simple scenarios, they tend to suffer from low classification accuracy and slow processing speed in complex environments due to the diversity of building shapes and the noise in point cloud data. Furthermore, these traditional methods often require manual rule setting and consume enormous computational resources when processing large-scale point cloud data, resulting in low efficiency. In recent years, with the rapid development of deep learning technology, automated classification methods based on deep learning have gradually become a research hotspot in point cloud data processing. Deep learning methods can be trained on large amounts of labeled data to automatically learn effective feature representations, thereby improving classification accuracy. However, the application of current deep learning-based point cloud classification methods in high-density airborne urban building point cloud data still faces the following challenges:
[0004] (1) Large data scale: Airborne LiDAR point cloud data usually contains millions to hundreds of millions of points. Efficiently storing and computing on data of this scale has become a major bottleneck for deep learning applications;
[0005] (2) Sparsity and irregularity of point cloud data: Unlike image data, point cloud data is sparse and irregularly distributed, which makes it difficult to apply traditional convolutional neural networks (CNNs) directly to point cloud processing. Although some point cloud-based deep learning frameworks (such as PointNet and PointNet++) have made progress in recent years, they still suffer from insufficient accuracy and low computational efficiency in building classification tasks in complex urban environments;
[0006] (3) Diversity and complexity of high-density building extraction: The forms of buildings in cities vary greatly, including high-rise buildings, low-rise buildings, bridges and other structures. Identifying and extracting these high-density buildings from large-scale point cloud data is a challenging problem. Existing deep learning methods often struggle to achieve both high efficiency and high accuracy when dealing with complex forms and large-scale data.
Summary of the Invention
[0007] The purpose of this invention is to provide a method, system, medium, and equipment for extracting buildings from airborne LiDAR point clouds based on spatial profile sampling guidance, with the aim of accurately extracting buildings from large-scale airborne LiDAR point clouds.
[0008] This invention is mainly achieved through the following technical solutions:
[0009] A method for extracting buildings from airborne LiDAR point clouds based on spatial profile sampling includes the following steps:
[0010] Step S1: Acquire the airborne LiDAR point cloud and perform data preprocessing;
[0011] Step S2: Train the segmentation network model, and based on the trained segmentation network model, perform 3D point cloud segmentation and extract buildings;
[0012] Step S21: Read the data stream;
[0013] Step S22: In the encoding layer, random sampling and feature aggregation are performed to compress the point cloud size and enhance the point cloud features;
[0014] Step T1: During feature aggregation, for each point, the nearest neighbor is searched based on the KNN algorithm. Then, spatial location encoding and feature encoding based on the self-attention mechanism are performed sequentially. Finally, the enhanced feature F1 is obtained through attention pooling.
[0015] Step T2: During feature aggregation, for each point, the nearest neighbor point is searched based on the spatial profile algorithm. Then, spatial location encoding and feature encoding based on the self-attention mechanism are performed in sequence. Finally, the enhanced feature F2 is obtained through attention pooling.
[0016] Step T3: Perform feature fusion on the enhanced features F1 and F2; compress the point cloud size to 1 / 256 of the original size, and enhance the point features;
[0017] Step S23: In the decoding layer, the compressed point cloud is upsampled using nearest neighbor interpolation, with skip connections, to restore the point cloud scale, and finally each point is classified.
[0018] To better implement this invention, further, in step T2, the search for nearest neighbor points based on the spatial profile algorithm includes the following steps:
[0019] First, project the point cloud onto the x-axis and the y-axis respectively, sort the projected coordinates, and record the sorted position of each point;
[0020] Based on the sorted positions, a sliding window selects k neighbor indices for each point along each axis. The neighbors of the center point are therefore its nearest neighbors in the ordering of the projected X coordinates and of the projected Y coordinates, respectively, rather than its nearest neighbors in space. Then, the indices are mapped back to the original point cloud order through the inverse index.
[0021] Finally, the neighbor indices of the two axes are concatenated into a 32-dimensional feature, forming a cross-axial local neighborhood description for each point.
[0022] To better implement the present invention, further, in steps T1 and T2, spatial location encoding is implemented based on an MLP network, and the formula is:
[0023] $\mathrm{MLP}\big(p_i \oplus p_i^k \oplus (p_i - p_i^k) \oplus \lVert p_i - p_i^k \rVert\big)$
[0024] Where: $p_i$ is the position coordinates of the i-th point;
[0025] $p_i^k$ is the coordinates of the neighboring points of the i-th point;
[0026] $\lVert p_i - p_i^k \rVert$ is the Euclidean distance between the center point and its neighboring points;
[0027] The formula for feature encoding based on the self-attention mechanism is:
[0028]
[0029] Where: $Y_i$ represents the features after attention aggregation;
[0030] $x_i$ represents the feature of the i-th point in the point cloud;
[0031] $x_j$ represents the features of the neighborhood points of the i-th point in the point cloud;
[0032] β is an MLP layer that maps the features of neighboring points;
[0033] γ, ω, and φ are trainable transformations;
[0034] δ represents the positional encoding, and the specific encoding formula is as follows:
[0035] $\delta = \mathrm{MLP}(p_i - p_j)$;
[0036] Where $p_j$ is the coordinates of the j-th point.
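The formula image for the self-attention feature encoding is not reproduced in this text. One form that would be consistent with the symbols defined above is the vector self-attention of Point Transformer, with β and ω taking the roles that work assigns to its value and key transformations. This is an assumption, not the patent's confirmed formula; in particular ρ (a softmax normalization over the neighborhood) and $\mathcal{N}(i)$ (the neighbor set of point i) are not defined in the source.

```latex
Y_i = \sum_{x_j \in \mathcal{N}(i)}
      \rho\big(\gamma(\varphi(x_i) - \omega(x_j) + \delta)\big)
      \odot \big(\beta(x_j) + \delta\big),
\qquad
\delta = \mathrm{MLP}(p_i - p_j)
```

Under this reading, the positional encoding δ enters twice: once inside the attention weights and once added to the mapped neighbor features.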
[0037] To better implement the present invention, further, in steps T1 and T2, the spatial location encoding and the feature encoding based on the self-attention mechanism are concatenated; then, attention pooling is performed on the point cloud after feature aggregation.
[0038] To better implement the present invention, step S1 further includes the following steps:
[0039] Step S11: Obtain the original 3D point cloud data, and perform data filtering and noise reduction processing based on CloudCompare to obtain the dataset;
[0040] Step S12: Divide the dataset into a training set, a validation set, and a test set;
[0041] Step S13: Perform grid downsampling on the training set and validation set respectively;
[0042] Step A1: First, downsample at a distance of 0.01 and save the point cloud as a ply1 file;
[0043] Step A2: Then, downsample at a distance of 0.06 and save the point cloud as a ply2 file;
[0044] Step A3: Build a KD tree based on the downsampled points in step A2 and save it as a kdtree file;
[0045] Using the generated KD tree, for each point of the point cloud from step A1, find the index of its nearest neighbor in the point cloud from step A2, and save the indices as a proj file.
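The preprocessing in steps A1 to A3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: both downsampling passes are applied to the raw cloud, each grid cell is reduced to the mean of its points, and a brute-force search stands in for the KD tree. The function names `grid_downsample` and `projection_index` are hypothetical.

```python
import math
from collections import defaultdict

def grid_downsample(points, cell):
    """Keep one averaged point per occupied grid cell of edge length `cell`."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / cell) for c in p)
        buckets[key].append(p)
    # Average the points that fall into the same cell.
    return [tuple(sum(axis) / len(group) for axis in zip(*group))
            for group in buckets.values()]

def projection_index(fine, coarse):
    """For each fine point, the index of its nearest coarse point.

    Brute force is used here in place of the KD tree (assumption for brevity).
    """
    return [min(range(len(coarse)), key=lambda j: math.dist(p, coarse[j]))
            for p in fine]

raw = [(0.001, 0.002, 0.0), (0.004, 0.003, 0.0),
       (0.031, 0.0, 0.0), (0.205, 0.203, 0.101)]
ply1 = grid_downsample(raw, 0.01)     # fine cloud (step A1)
ply2 = grid_downsample(raw, 0.06)     # coarse cloud used for training (step A2)
proj = projection_index(ply1, ply2)   # fine -> coarse lookup (proj file)

# Labels predicted on the coarse cloud can be carried back to the fine cloud.
coarse_labels = ["building", "ground"]
restored = [coarse_labels[j] for j in proj]
```

The last two lines show why the proj file is kept: predictions made on the downsampled cloud are copied back to the denser cloud through the stored indices.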
[0046] To better implement the present invention, further, in step S21, reading the data stream includes the following steps:
[0047] Step B1: Load the ply2 file, kdtree file, and proj file from the training set;
[0048] Step B2: Read the point cloud and randomly generate a probability value for each point. Using the point with the lowest probability value as the center, search for the nearest n points as the point data to be input into the model.
[0049] Step B3: Then, calculate the weight $\delta_i$ and update the probability value of each of the n points selected this time to its original probability value plus the weight $\delta_i$;
[0050]
[0051] Where: $\delta_i$ is the weight coefficient corresponding to the i-th point;
[0052] w is the global weight coefficient;
[0053] $d_i$ is the Euclidean distance from the i-th point to the reference point;
[0054] $d_{max}$ is the maximum value among all distances.
[0055] This invention is mainly achieved through the following technical solutions:
[0056] The airborne LiDAR point cloud building extraction system based on spatial profile sampling is based on the aforementioned airborne LiDAR point cloud building extraction method, and includes a data acquisition and processing module, a training module, and an extraction module. The data acquisition and processing module is used to acquire 3D point cloud data and perform preprocessing. The training module is used to train a segmentation network model using the preprocessed data. The extraction module is used to perform 3D point cloud segmentation and extract buildings based on the trained segmentation network model.
[0057] The segmentation network model includes an encoder and a decoder, each with several encoding and decoding layers. Each encoding layer includes a feature aggregation module, which comprises a local spatial encoding module, a spatial profile sampling feature aggregation module, and a feature fusion module arranged in parallel. The local spatial encoding module includes, from front to back, a KNN search module, a spatial location encoding module, an attention mechanism feature encoding module, a feature concatenation layer, and an attention pooling layer. The spatial profile sampling feature aggregation module includes, from front to back, a spatial profile search module, a spatial location encoding module, an attention mechanism feature encoding module, a feature concatenation layer, and an attention pooling layer.
[0058] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the above-described method for extracting airborne LiDAR point cloud buildings based on spatial profile sampling guidance.
[0059] An electronic device is characterized by comprising a memory and a processor; the memory stores a computer program; the processor is configured to execute the computer program in the memory to implement the above-described airborne LiDAR point cloud building extraction method based on spatial profile sampling guidance.
[0060] The beneficial effects of this invention are as follows:
[0061] The main reason for the difficulty in extracting high-density buildings from airborne point clouds is the excessively high point cloud density. Furthermore, modern urban scenes are complex, with diverse building shapes and overlapping or adhering structures between buildings and other ground features. To address this challenge, this invention improves upon the self-attention mechanism by introducing a spatial profile sampling strategy to expand the network's receptive field and focus on global features. While balancing global information with local features, local fusion attention is introduced, effectively solving the problem of large-scale confusion between buildings and their surrounding ground features. Simultaneously, the expanded receptive field and the effective aggregation of features through local fusion attention also resolve the issue of blurred building edges caused by noise. This invention significantly improves the classification accuracy of buildings in large-scale airborne point clouds with low time complexity, achieving high classification accuracy and efficiency.
Attached Figure Description
[0062] Figure 1 is a flowchart of the airborne LiDAR point cloud building extraction method based on spatial profile sampling guidance of the present invention;
[0063] Figure 2 is a flowchart of sample annotation in Example 1;
[0064] Figure 3 is a schematic diagram of the segmentation network model;
[0065] Figure 4 is a block diagram illustrating the principle of the feature aggregation module;
[0066] Figure 5 is a point cloud diagram of the two buildings in Example 1;
[0067] Figure 6 is a schematic diagram of the X-coordinate projection distribution in Example 1;
[0068] Figure 7 is a schematic diagram of the Y-coordinate projection distribution in Example 1;
[0069] Figure 8 is a schematic diagram of building extraction before and after adding spatial profile sampling guidance in Example 1;
[0070] Figure 9 is a schematic diagram of the airborne LiDAR point cloud building extraction system based on spatial profile sampling guidance of the present invention.
Detailed Implementation
[0071] Example 1:
[0072] A method for extracting buildings from airborne LiDAR point clouds based on spatial profile sampling guidance, as shown in Figure 1, includes the following specific steps:
[0073] Step S1: Acquire the airborne LiDAR point cloud and perform data preprocessing;
[0074] Step S11: Use CloudCompare to acquire 3D point cloud data and create a dataset;
[0075] First, import the original 3D point cloud data (such as LAS / LAZ format) into CloudCompare to ensure that the point cloud data can be loaded correctly, and check whether the data contains complete coordinates (X, Y, Z) and possible other attributes (e.g., reflectance intensity, classification label, etc.).
[0076] Then, sample labeling of the dataset is performed, as shown in Figure 2. During the annotation process, the dataset is continuously rotated so that each point is accurately assigned to its category. The mouse wheel zooms the data display interface in and out, and dragging with the left mouse button held down rotates the view. Points around the dataset whose categories are indistinguishable are noise points and can be ignored. Setting Colors to RGB restores the original colors of the dataset. All annotations are lost when CloudCompare is closed, so the annotated data must be saved locally.
[0077] Step S12: Data filtering and noise reduction;
[0078] Use CloudCompare's filtering features (such as Statistical Outlier Removal (SOR) or Voxel Grid Filter) to denoise point cloud data, removing outliers or duplicate points. If the number of data points is too large, consider using downsampling techniques, such as voxel grid filtering, to simplify the point cloud data.
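The idea behind statistical outlier removal can be sketched as below. This is a generic illustration, not CloudCompare's code: the mean distance to the k nearest neighbors is computed for every point, and points whose value exceeds the global mean plus `n_sigma` standard deviations are dropped. The function name, the parameter names, and the brute-force neighbor search are assumptions chosen for brevity.

```python
import math
import statistics

def statistical_outlier_removal(points, k=3, n_sigma=1.0):
    """Drop points whose mean k-nearest-neighbor distance is unusually large."""
    mean_knn = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(sum(d[:k]) / k)
    limit = statistics.mean(mean_knn) + n_sigma * statistics.pstdev(mean_knn)
    return [p for p, m in zip(points, mean_knn) if m <= limit]

# A small planar patch plus one isolated point far away from it.
cloud = [(x * 0.1, y * 0.1, 0.0) for x in range(4) for y in range(4)]
cloud.append((5.0, 5.0, 5.0))
clean = statistical_outlier_removal(cloud)
```

On this toy cloud the isolated point is removed and the 16 patch points are kept.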
[0079] Step S13: Divide the dataset into training, validation, and test sets. Perform data preprocessing on the divided point cloud data. The test set contains no label files, while the training and validation sets contain label files.
[0080] The dataset is divided into three parts: 70%-80% as the training set, 10%-15% as the validation set, and 10%-15% as the test set.
[0081] Step S14: Perform grid downsampling on the training set and validation set respectively.
[0082] (1) Downsample once at a distance of 0.01 and save the point cloud as a ply1 file. The purpose is to reduce the resolution of the data, thereby reducing storage space and improving processing efficiency while preserving the data characteristics.
[0083] (2) Downsample at a distance of 0.06 and save the point cloud as a ply2 file. The purpose is to uniformly select the points that participate in network training.
[0084] (3) Build a KD tree based on the downsampled points in (2).
[0085] In the subsequent point cloud semantic segmentation network, the N nearest points around each point are searched as input samples. The KD tree file is built to facilitate this operation.
[0086] (4) Using the generated KD tree, for each point in (1), find the index of its nearest neighbor in the point cloud in (2), and save the indices as a proj file.
[0087] Since the amount of raw point cloud data collected by the device is very large, it is usually reduced by downsampling without affecting the shape features of the point cloud. Therefore, it is necessary to restore the original point cloud after semantic segmentation of the downsampled point cloud. This step requires a proj file.
[0088] Step S2: Read the data stream, train the segmentation network model, and perform 3D point cloud segmentation based on the trained segmentation network model to extract buildings;
[0089] Step S21: Read the data stream;
[0090] The process of reading a data stream includes initialization, data loading, a data reading generator, preprocessing operations, data augmentation, and data stream initialization.
[0091] (1) Load the downsampled ply2 point cloud file, KD tree file and proj file from the training set to facilitate reading and writing operations on the dataset in subsequent steps.
[0092] (2) Start reading the point cloud according to the set number of points. First, randomly generate a probability value for each point, find the point with the lowest probability value, and, with this point as the center, search for the nearest n (preset) points as the point data input to the model in one pass.
[0093] (3) Then, the weights of these n points are calculated as follows:
[0094]
[0095] Where: $\delta_i$ is the weight coefficient corresponding to the i-th point;
[0096] w is the global weight coefficient;
[0097] $d_i$ is the Euclidean distance from the i-th point to the reference point;
[0098] $d_{max}$ is the maximum value among all distances.
[0099] Subsequently, the probability values of the selected n points are updated by adding the weight $\delta_i$ to the original probability value of each point. This avoids selecting the same points again the next time points are chosen to feed into the network.
[0100] (4) Perform preprocessing operations, data augmentation, data stream initialization, etc.
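The point selection described in (2) and (3) can be sketched as follows. The weight formula is not reproduced in this text, so the form used here, a weight that decays from w at the center to 0 at the farthest selected point, is an assumption, as are the function name `pick_batch` and the brute-force search used in place of the KD tree.

```python
import math
import random

def pick_batch(points, possibility, n, w=1.0):
    """Select n points around the least-visited point and raise their values."""
    center = min(range(len(points)), key=lambda i: possibility[i])
    dists = [math.dist(points[center], p) for p in points]
    chosen = sorted(range(len(points)), key=lambda i: dists[i])[:n]
    d_max = max(dists[i] for i in chosen) or 1.0  # guard against division by zero
    for i in chosen:
        # Assumed weight: largest at the center, zero at the farthest chosen point.
        possibility[i] += w * (1.0 - dists[i] / d_max) ** 2
    return center, chosen

rng = random.Random(0)
points = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
possibility = [rng.random() * 1e-3 for _ in points]

first_center, first = pick_batch(points, possibility, n=9)
second_center, second = pick_batch(points, possibility, n=9)
```

Because the first center now carries a much larger value than any untouched point, the second call starts from a different region of the cloud, which is the behavior paragraph [0099] describes.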
[0101] Step S22: In the encoding layer, random sampling and feature aggregation are performed to compress the point cloud size and enhance the point cloud features;
[0102] Step T1: During feature aggregation, for each point, the nearest neighbor is searched based on the KNN algorithm. Then, spatial location encoding and feature encoding based on the self-attention mechanism are performed sequentially. Finally, the enhanced feature F1 is obtained through attention pooling.
[0103] For each point, integrate the spatial and feature information of its surrounding K points.
[0104] The first step is to find neighboring points by using the KNN algorithm to find the K nearest neighbors of a given point (including the given point).
[0105] The second step is spatial location encoding (Relative Point Encoding), which is implemented through a shared MLP network, as shown in the following formula:
[0106] $\mathrm{MLP}\big(p_i \oplus p_i^k \oplus (p_i - p_i^k) \oplus \lVert p_i - p_i^k \rVert\big)$
[0107] Where: $p_i$ is the position coordinates of the i-th point;
[0108] $p_i^k$ is the coordinates of the neighboring points of the i-th point;
[0109] $p_i - p_i^k$ is the difference between the coordinates of the center point and the coordinates of its neighboring points;
[0110] $\lVert p_i - p_i^k \rVert$ is the Euclidean distance between the center point and its neighboring points.
[0111] The third step is to perform feature aggregation on the point features based on a self-attention mechanism, as shown in the following formula:
[0112]
[0113] Where: $Y_i$ represents the features after self-attention aggregation;
[0114] $x_i$ represents the feature of the i-th point in the point cloud;
[0115] γ, ω, and φ are trainable transformations;
[0116] δ represents the positional encoding, and the specific encoding formula is as follows:
[0117] $\delta = \mathrm{MLP}(p_i - p_j)$;
[0118] Where $p_j$ is the coordinates of the j-th point.
[0119] The spatially encoded features are concatenated with the self-attention aggregated features to obtain the local spatial encoded features.
[0120] Finally, attention pooling is performed on the point cloud after feature aggregation.
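The input to the spatial location encoding in step T1 can be illustrated with a short sketch. It only builds the per-neighbor vector from the four quantities listed in the definitions (center coordinates, neighbor coordinates, their difference, and their Euclidean distance); the shared MLP that would consume this vector is omitted, and joining the four quantities by concatenation is an assumption. The function name is hypothetical.

```python
import math

def relative_position_features(center, neighbors):
    """Build the 10-value vector per neighbor that a shared MLP would encode."""
    rows = []
    for q in neighbors:
        diff = tuple(a - b for a, b in zip(center, q))
        dist = math.sqrt(sum(d * d for d in diff))
        # Concatenate: center (3) + neighbor (3) + difference (3) + distance (1).
        rows.append(center + q + diff + (dist,))
    return rows

center = (1.0, 2.0, 3.0)
neighbors = [(1.0, 2.0, 3.0), (4.0, 6.0, 3.0)]  # the center itself is included
features = relative_position_features(center, neighbors)
```

Each row has ten values, so a neighborhood of K points yields a K x 10 input regardless of how the neighbors were found, which is why the same encoding can be reused in step T2.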
[0121] Step T2: During feature aggregation, for each point, the nearest neighbor point is searched based on the spatial profile algorithm. Then, spatial location encoding and feature encoding based on the self-attention mechanism are performed sequentially. Finally, the enhanced feature F2 is obtained through attention pooling.
[0122] (1) When the neighborhood coverage area is too small, the aggregated features are limited, and expanding the KNN search range will cause the network time complexity to increase sharply. Therefore, the spatial profile algorithm is used to search for nearest neighbors in order to expand the network receptive field.
[0123] As shown in Figure 5, taking the point cloud of two buildings as an example, after a center point is selected, the ordinary KNN algorithm can only select the K nearest neighbors of that center point as its support for aggregating features. For example, when the center point lies on the roof of a building, the nearest neighbors found by the KNN algorithm will all be points near the roof. This suits local fine features, but the receptive field is not large enough to capture global information.
[0124] Spatial profile-based sampling can effectively capture global information that the KNN nearest neighbor search method overlooks. For a selected center point, the spatial profile algorithm can be used to select points over a larger area, thus increasing the network's receptive field. The specific algorithm is as follows:
[0125] First, project the point cloud along the x-axis and y-axis respectively, sort it, and record the sorting position relationship;
[0126] Subsequently, based on the sorted positions, k neighboring point indices are selected for each point (16 for each of the x and y axes). As shown in Figure 6 and Figure 7, the neighbors of the center point along the x-coordinate and along the y-coordinate are no longer its nearest neighbors in spatial location but its nearest neighbors in the ordering of the projected X and Y coordinates. Therefore, when the projections of the X and Y coordinates are used to search for nearest neighbors, these neighbors are distributed throughout the point cloud. Then, the indices are mapped back to the original point cloud order through the inverse index.
[0127] Finally, the neighbor indices of the two axes are concatenated into a 32-dimensional feature, forming a cross-axial local neighborhood description for each point. This invention, through sampling based on spatial profiles, quickly constructs approximate spatial neighborhood relationships with linear complexity, providing the network with efficient local context capture capabilities, balancing computational efficiency and neighborhood coverage scenarios.
[0128] (2) The only difference between this step and the local spatial encoding in step T1 is that spatial profile sampling is used instead of KNN to find nearest neighbors to expand the receptive field. Similarly, this module will perform spatial location encoding and self-attention aggregation of features, and finally concatenate the spatial location encoded features with the self-attention aggregated features.
[0129] (3) Attention pooling is performed on the point cloud after feature aggregation.
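The spatial profile neighbor search of step T2 (project, sort, slide a window, map back) can be sketched as below. Several details are assumptions because the text does not fix them: the window is centered on the point, clamped at both ends of the sorted order, and includes the point itself. The function names are hypothetical, and the cloud must contain at least k points.

```python
def axis_neighbors(values, k):
    """k neighbors of every point in the 1-D sorted order of `values`."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # sorted position -> point index
    rank = [0] * n                                     # point index -> sorted position
    for pos, idx in enumerate(order):
        rank[idx] = pos
    result = []
    for i in range(n):
        # Window of k sorted positions around the point, clamped to the ends.
        start = min(max(rank[i] - k // 2, 0), n - k)
        # Indexing `order` maps sorted positions back to original point indices.
        result.append(order[start:start + k])
    return result

def spatial_profile_neighbors(points, k):
    """Concatenate the x-axis and y-axis neighbor indices for every point."""
    xs = axis_neighbors([p[0] for p in points], k)
    ys = axis_neighbors([p[1] for p in points], k)
    return [x + y for x, y in zip(xs, ys)]

points = [(0.0, 5.0, 1.0), (3.0, 1.0, 9.0), (1.0, 4.0, 2.0),
          (2.0, 0.0, 7.0), (4.0, 3.0, 0.0)]
neighbors = spatial_profile_neighbors(points, k=2)
```

With k = 16 per axis, each point would receive 32 indices, matching the 32-dimensional description in the text. Neighbors close in x may be far away in y and z, which is how the search reaches points across the whole cloud.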
[0130] Attention pooling serves to aggregate neighborhood features. For both steps T1 and T2, attention pooling is required after feature aggregation.
[0131] First, the attention scores are computed. The feature vectors produced by local spatial encoding are used as input. Each feature vector is fed into the MLP network, and the weight corresponding to each feature vector f is then calculated using softmax. Next, a weighted sum is taken: each enhanced feature vector is multiplied by its calculated weight, and the K weighted feature vectors are summed to yield a new feature vector F.
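Attention pooling as described above can be sketched in a few lines. This is an illustration under assumptions: the learned MLP is replaced by a caller-supplied `score_fn` (hypothetical), and the softmax is taken over the K neighbors separately for each feature channel, which the text does not specify.

```python
import math

def attention_pool(features, score_fn):
    """Softmax-weighted sum of K neighbor feature vectors, channel by channel."""
    scores = [score_fn(f) for f in features]  # K score vectors of width D
    dim = len(features[0])
    pooled = []
    for c in range(dim):
        column = [s[c] for s in scores]
        peak = max(column)                    # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        weights = [e / total for e in exps]   # softmax over the K neighbors
        pooled.append(sum(w * f[c] for w, f in zip(weights, features)))
    return pooled

def identity_scores(feature):
    """Stand-in for the learned MLP: use the feature values as scores."""
    return list(feature)

neighbor_features = [[0.0, 1.0], [0.0, 3.0], [0.0, 5.0]]
pooled = attention_pool(neighbor_features, identity_scores)
```

Unlike max or mean pooling, the weights depend on the features themselves, so in the second channel the result is pulled toward the neighbor with the highest score.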
[0132] Step T3: Perform feature fusion on the enhanced features F1 and F2;
[0133] To expand the receptive field, local spatial encoding and attention pooling are performed twice within a dilated residual block. At the same time, borrowing from the residual structure of networks such as ResNet, a shortcut connection sums the input features and the enhanced features. This aims to increase the receptive field, accelerate learning, and improve accuracy.
[0134] Step S23: In the decoding layer, the compressed point cloud is upsampled using nearest neighbor interpolation, with skip connections, to restore the point cloud scale, and finally each point is classified.
[0135] Specifically, the compressed point cloud is upsampled, and multi-scale feature fusion is performed to restore the original point cloud scale. Finally, each point is classified.
[0136] Taking the point cloud data from the publicly available Vaihingen dataset as an example, the models were trained using the same training data, and the resulting model accuracy is shown in Table 1. The comparison shows that the method proposed in this invention can effectively improve the segmentation accuracy of building point clouds. As shown in Figure 8, (a) is a schematic diagram of building extraction without the spatial profile sampling feature aggregation module, and (b) is a schematic diagram of building extraction in this embodiment. The comparison reveals that after adding the spatial profile sampling feature aggregation module, the adhesion of high-density building groups is significantly reduced, and the impact of noise in the point cloud on the classification results is also reduced. This invention has achieved significant progress.
[0137] Table 1
[0138]
[0139] Example 2:
[0140] An airborne LiDAR point cloud building extraction system based on spatial profile sampling guidance, as shown in Figure 9, includes a data acquisition and processing module, a training module, and an extraction module; the data acquisition and processing module is used to acquire 3D point cloud data and perform preprocessing; the training module is used to train a segmentation network model using the preprocessed data; the extraction module is used to perform 3D point cloud segmentation and extract buildings based on the trained segmentation network model.
[0141] As shown in Figure 3 and Figure 4, N represents the number of points in the point cloud, and D represents the feature dimension of the points. The segmentation network model includes an encoder and a decoder, each with several encoding and decoding layers. The encoding layer includes a feature aggregation module, which comprises a local spatial encoding module, a spatial profile sampling feature aggregation module, and a feature fusion module arranged in parallel. The local spatial encoding module includes, from front to back, a KNN search module, a spatial location encoding module, an attention mechanism feature encoding module, a feature concatenation layer, and an attention pooling layer. The spatial profile sampling feature aggregation module includes, from front to back, a spatial profile search module, a spatial location encoding module, an attention mechanism feature encoding module, a feature concatenation layer, and an attention pooling layer.
[0142] As shown in Figure 4, the feature aggregation module searches for nearest neighbors in two ways: one is the conventional KNN search, after which features are encoded through spatial location encoding and the self-attention mechanism, and enhanced features are obtained through attention pooling; the other uses the spatial profile algorithm to search for nearest neighbors, after which features are likewise encoded through spatial location encoding and the self-attention mechanism, and enhanced features are obtained through attention pooling. The enhanced features obtained by the two methods are fused to obtain the output of the feature fusion module.
[0143] The segmentation network of this invention uses the coordinates and attribute features of points as input. In the encoding stage, there are four encoding layers. Each encoding layer has a feature aggregation module and a random downsampling operation. The feature aggregation module is used to increase the feature dimension of the points, and random downsampling is used to reduce the number of points. Only 1 / 4 of the points are retained in each encoding layer, so that after four layers the point cloud is compressed to 1 / 256 of its original size, while the feature dimension of each point continuously increases.
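The compression ratio follows directly from the layer count: keeping 1 / 4 of the points in each of four layers leaves (1 / 4)^4 = 1 / 256 of them. A short sketch, with hypothetical function names and an arbitrary input size, makes the arithmetic concrete.

```python
import random

def encoder_point_counts(n_points, layers=4, keep=0.25):
    """Number of points entering the encoder and leaving each encoding layer."""
    counts = [n_points]
    for _ in range(layers):
        counts.append(int(counts[-1] * keep))
    return counts

def random_downsample(indices, keep=0.25, seed=0):
    """Randomly retain a fraction of the point indices (no replacement)."""
    rng = random.Random(seed)
    return rng.sample(indices, int(len(indices) * keep))

counts = encoder_point_counts(65536)
kept = random_downsample(list(range(1024)))
```

For an input of 65536 points the successive sizes are 65536, 16384, 4096, 1024, and 256. Random sampling costs nothing beyond drawing indices, which is why it is paired with a feature aggregation module that compensates for the points it discards.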
[0144] In the decoding stage, nearest neighbor interpolation is used to upsample the points. After upsampling, an MLP network is used to reduce the feature dimension of the points. At the same time, skip connections are used, and the decoded features are fed into a multi-scale feature fusion module to stack the features from the encoding stage. Finally, the semantic category of each point is predicted through three fully connected layers plus a dropout layer.
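One decoding step can be sketched as nearest neighbor upsampling followed by a skip connection. This is a minimal illustration, assuming that the skip connection concatenates the upsampled features with the encoder features of the same resolution; the function names and the brute-force nearest neighbor search are hypothetical choices, and the MLP that follows is omitted.

```python
import math

def nearest_upsample(dense_points, sparse_points, sparse_features):
    """Give each dense point the feature vector of its nearest sparse point."""
    out = []
    for p in dense_points:
        j = min(range(len(sparse_points)),
                key=lambda s: math.dist(p, sparse_points[s]))
        out.append(sparse_features[j])
    return out

def skip_concat(upsampled, encoder_features):
    """Skip connection: append the encoder features saved at this resolution."""
    return [u + e for u, e in zip(upsampled, encoder_features)]

dense = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (2.1, 0.0, 0.0)]
sparse = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
sparse_features = [[1.0, 1.0], [2.0, 2.0]]
encoder_features = [[0.1], [0.2], [0.3]]

upsampled = nearest_upsample(dense, sparse, sparse_features)
fused = skip_concat(upsampled, encoder_features)
```

Repeating this step once per decoding layer restores the point count removed by the encoder, after which a per-point classifier can be applied.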
[0145] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Any simple modifications or equivalent changes made to the above embodiments based on the technical essence of the present invention shall fall within the protection scope of the present invention.
Claims
1. A method for extracting buildings from airborne LiDAR point clouds based on spatial profile sampling guidance, characterized in that it comprises the following steps:
Step S1: Acquire an airborne LiDAR point cloud and perform data preprocessing;
Step S2: Train a segmentation network model and, based on the trained segmentation network model, perform 3D point cloud segmentation and extract buildings;
Step S21: Read the data stream;
Step S22: In the encoding layers, perform random sampling and feature aggregation to compress the point cloud size and enhance the point cloud features;
Step T1: During feature aggregation, for each point, search for nearest neighbors based on the KNN algorithm; then perform spatial location encoding and feature encoding based on the self-attention mechanism in sequence; finally, obtain the enhanced feature F1 through attention pooling;
Step T2: During feature aggregation, for each point, search for nearest neighbors based on the spatial profile algorithm; then perform spatial location encoding and feature encoding based on the self-attention mechanism in sequence; finally, obtain the enhanced feature F2 through attention pooling; wherein searching for nearest neighbors based on the spatial profile algorithm in step T2 includes the following steps: first, project the point cloud onto the x-axis and the y-axis respectively, sort the projected points, and record the sorted positions; based on the sorted positions, select k nearest-neighbor indices for each point with a sliding window, so that the neighbors of a center point along each axis are determined by the ordering of the projected x-coordinates and y-coordinates respectively; then map the result back to the original point cloud order through the inverse index; finally, concatenate the neighbor indices of the two axes into a 32-dimensional feature, forming a cross-axis local neighborhood description for each point;
Step T3: Perform feature fusion on the enhanced features F1 and F2, compress the point cloud size to 1/256 of the original size, and enhance the point features;
Step S23: In the decoding layers, upsample the compressed point cloud using nearest neighbor interpolation with skip connections to restore the point cloud scale, and finally classify each point.
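The spatial profile neighbor search recited in step T2 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function name `profile_neighbors`, the centered sliding window, and the clamping of the window at the ends of the sorted order are assumptions made for illustration. With k = 16 per axis, the concatenated result has the 32 columns mentioned in the claim.

```python
import numpy as np

def profile_neighbors(points, k=16):
    """Return an (N, 2k) array of neighbor indices per point.

    Hypothetical helper: neighbors are taken from a sliding window over the
    points sorted by x, then over the points sorted by y.
    """
    n = points.shape[0]
    if n < k:
        raise ValueError("need at least k points")
    half = k // 2
    per_axis = []
    for axis in (0, 1):  # 0: projection on x, 1: projection on y
        # order[s] = original index of the point at sorted position s
        order = np.argsort(points[:, axis], kind="stable")
        # rank[i] = sorted position of original point i (the inverse index)
        rank = np.empty(n, dtype=np.int64)
        rank[order] = np.arange(n)
        # Window of k consecutive sorted positions around each point,
        # clamped so it never leaves [0, n).
        start = np.clip(rank - half, 0, n - k)
        window = start[:, None] + np.arange(k)[None, :]
        # Map sorted positions back to original point indices; rows are
        # already in the original point cloud order.
        per_axis.append(order[window])
    return np.concatenate(per_axis, axis=1)
```

Because each axis needs only one sort and a fixed-width window, the search costs O(N log N) per axis and avoids per-point distance queries. Each row contains the point's own index, as a KNN query that includes the query point would.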
2. The method for extracting buildings from airborne LiDAR point clouds based on spatial profile sampling guidance according to claim 1, characterized in that, in steps T1 and T2, the spatial location encoding is implemented based on an MLP network, with the formula as follows: [formula image not reproduced]; where $p_i$ is the position coordinates of the i-th point, and the remaining terms are the coordinates of the neighboring points of the i-th point and the Euclidean distance between the center point and its neighboring points; the formula for feature encoding based on the self-attention mechanism is: [formula image not reproduced]; where $Y_i$ represents the features after attention aggregation; $x_i$ represents the feature of the i-th point in the point cloud, the accompanying term being the features of the neighborhood points of the i-th point in the point cloud; $\beta$ is an MLP layer that maps the features of neighboring points; $\gamma$, $\omega$, and $\varphi$ are trainable transformations; $\delta$ is the positional encoding, computed by the following formula: [formula image not reproduced]; where $P_j$ is the coordinates of the j-th point.
3. The airborne LiDAR point cloud building extraction method based on spatial profile sampling guidance according to claim 2, characterized in that, in steps T1 and T2, the output of the spatial location encoding and the output of the feature encoding based on the self-attention mechanism are concatenated; attention pooling is then performed on the point cloud after feature aggregation.
4. The method for extracting buildings from airborne LiDAR point clouds based on spatial profile sampling guidance according to claim 1, characterized in that step S1 includes the following steps:
Step S11: Obtain the original 3D point cloud data, and perform data filtering and noise reduction with CloudCompare to obtain the dataset;
Step S12: Divide the dataset into a training set, a validation set, and a test set;
Step S13: Perform grid downsampling on the training set and the validation set respectively;
Step A1: First, downsample with a grid spacing of 0.01 and save the point cloud as a ply1 file;
Step A2: Then, downsample with a grid spacing of 0.06 and save the point cloud as a ply2 file;
Step A3: Build a KD tree from the downsampled points of step A2 and save it as a kdtree file; using the generated KD tree, find for each point from step A1 the index of its nearest neighbor in the point cloud from step A2, and save the indices as a proj file.
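Steps A1 to A3 can be sketched as follows. This is a hedged illustration only: the helper names `grid_downsample` and `nearest_index` are hypothetical, the choice of the cell centroid as the representative point is an assumption, and a brute-force nearest-neighbor search stands in for the KD tree of step A3 so that the example needs nothing beyond NumPy. Writing the ply1, ply2, kdtree, and proj files is omitted.

```python
import numpy as np

def grid_downsample(points, cell):
    """Keep one point (the centroid, by assumption) per occupied grid cell."""
    keys = np.floor(points / cell).astype(np.int64)
    _, inverse, counts = np.unique(
        keys, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # flatten for NumPy versions that return 2-D
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)  # accumulate coordinates per cell
    return sums / counts[:, None]

def nearest_index(query, reference):
    """For each query point, index of the closest reference point.

    Brute force for clarity; a KD tree built on `reference` would replace
    this for large clouds, as step A3 describes.
    """
    diff = query[:, None, :] - reference[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)

def preprocess(points):
    fine = grid_downsample(points, 0.01)    # step A1
    coarse = grid_downsample(points, 0.06)  # step A2
    proj = nearest_index(fine, coarse)      # step A3 projection indices
    return fine, coarse, proj
```

The projection indices let labels predicted on the coarse cloud be copied back to the finer cloud: the label of fine point `i` is the label of coarse point `proj[i]`.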
5. The airborne LiDAR point cloud building extraction method based on spatial profile sampling guidance according to claim 4, characterized in that, in step S21, reading the data stream includes the following steps:
Step B1: Load the ply2 file, the kdtree file, and the proj file from the training set;
Step B2: Read the point cloud and randomly generate a probability value for each point; taking the point with the lowest probability value as the center, search for the nearest n points as the point data to be input into the model;
Step B3: Then, calculate the weights $\delta_i$ and update the probability values of the selected n points to the original probability values plus the weights $\delta_i$: [formula image not reproduced]; where $\delta_i$ is the weight coefficient corresponding to the i-th point; $w$ is the global weight coefficient; $d_i$ is the Euclidean distance from the i-th point to the reference point; and $d_{max}$ is the maximum of all the distances.
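The sampling loop of steps B2 and B3 can be sketched as below. The weight formula is not reproduced in this text, so the expression used here, `w * (1 - d_i / d_max) ** 2`, is purely an assumption chosen so that points near the center receive the largest increment; the claim's actual formula may differ. The function name `pick_block` is hypothetical, and a brute-force distance computation stands in for the KD tree query.

```python
import numpy as np

def pick_block(points, possibility, n, w=1.0):
    """Select n points around the least-visited point and update possibilities.

    `possibility` is modified in place. The weight expression is an assumed
    stand-in for the formula of step B3.
    """
    center = int(np.argmin(possibility))                     # step B2: lowest value
    dist = np.linalg.norm(points - points[center], axis=1)   # d_i to the center
    idx = np.argpartition(dist, n - 1)[:n]                   # the n nearest points
    d = dist[idx]
    d_max = d.max()
    ratio = d / d_max if d_max > 0 else np.zeros_like(d)
    delta = w * (1.0 - ratio) ** 2                           # assumed weight form
    possibility[idx] += delta                                # step B3: add weights
    return idx
```

Calling `pick_block` repeatedly raises the values around each chosen center, so later calls tend to pick centers in regions that have not yet been sampled.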
6. An airborne LiDAR point cloud building extraction system guided by spatial profile sampling, comprising the airborne LiDAR point cloud building extraction method guided by spatial profile sampling as described in any one of claims 1-5, characterized in that it includes a data acquisition and processing module, a training module, and an extraction module; the data acquisition and processing module is used to acquire 3D point cloud data and perform preprocessing; the training module is used to train a segmentation network model using the preprocessed data; the extraction module is used to perform 3D point cloud segmentation and extract buildings based on the trained segmentation network model; the segmentation network model includes an encoder and a decoder, which comprise several encoding layers and several decoding layers respectively; each encoding layer includes a feature aggregation module, which comprises a local spatial encoding module, a spatial profile sampling feature aggregation module, and a feature fusion module arranged in parallel; the local spatial encoding module includes, from front to back, a KNN search module, a spatial location encoding module, an attention mechanism feature encoding module, a feature concatenation layer, and an attention pooling layer; the spatial profile sampling feature aggregation module includes, from front to back, a spatial profile search module, a spatial location encoding module, an attention mechanism feature encoding module, a feature concatenation layer, and an attention pooling layer; the spatial profile search module is used to search for nearest neighbors based on the spatial profile algorithm.
7. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the computer program implements the airborne LiDAR point cloud building extraction method based on spatial profile sampling guidance as described in any one of claims 1-5.
8. An electronic device, characterized in that it includes a memory and a processor; the memory stores a computer program; the processor is used to execute the computer program in the memory to implement the airborne LiDAR point cloud building extraction method based on spatial profile sampling guidance as described in any one of claims 1-5.