An aviation laser point cloud semi-supervised semantic segmentation method based on a hybrid structure

By combining a hybrid-structure semi-supervised semantic segmentation method for airborne laser point clouds with multiple loss functions and pseudo-label generation, the method addresses the insufficient use of multi-scale and structural information in point cloud segmentation, achieving accurate and robust semantic segmentation while reducing annotation costs.

CN118015279B · Active · Publication Date: 2026-06-16 · CHONGQING UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHONGQING UNIV
Filing Date
2024-03-05
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing technologies for semi-supervised semantic segmentation of airborne laser point clouds do not fully consider the multi-scale and structural information of point clouds, resulting in insufficient segmentation accuracy and robustness.

Method used

A semi-supervised semantic segmentation method for airborne laser point clouds with a hybrid structure is proposed. This method combines basic loss, global perception loss, multi-scale geometric similarity measurement loss, and multi-level fusion pseudo-label loss. By aligning the global centers of the student model and the teacher model, it extracts and constrains the local structural information of the point cloud, generates high-quality pseudo-labels, and utilizes unlabeled data for semi-supervised learning.

Benefits of technology

In situations where labeled samples are scarce, this method significantly improves the accuracy and robustness of semantic segmentation, reduces labeling costs, and achieves more refined semantic segmentation results.

✦ Generated by Eureka AI based on patent content.


Abstract

The application provides an aviation laser point cloud semi-supervised semantic segmentation method based on a hybrid structure, which comprises the following steps: S1, performing partial labeling on collected target region point clouds; S2, inputting the partially labeled point cloud data and unlabeled point cloud data into a student model and a teacher model of a hybrid structure constraint network constructed in advance to obtain output features of the student model and the teacher model; S3, inputting labeled samples and unlabeled samples in the output features into a basic loss function, a global perception loss function, a multi-scale geometric structure similarity loss function and a multi-level fusion pseudo-label loss function model to obtain output probabilities; S4, setting loss weights, constructing a hybrid loss function to optimize the training of the network, performing backpropagation by using a stochastic gradient descent method, dynamically adjusting model parameters and obtaining robust segmentation results. The method and device can effectively utilize deep features of unlabeled samples, promote information transfer between labeled samples and unlabeled samples, and achieve finer semantic segmentation.

Description

Technical Field

[0001] This invention belongs to the field of computer vision technology and relates to a semi-supervised semantic segmentation method for airborne laser point clouds based on a hybrid structure.

Background Technology

[0002] With the continuous advancement of sensor and communication technologies, Earth observation technologies are developing towards acquiring higher resolution and more diverse data types. Among these, Airborne Laser Scanning (ALS) technology has become one of the most promising technologies due to its unique advantages. ALS-generated point cloud data can meticulously reveal the three-dimensional information of object surfaces, providing strong support for the accurate description of geospatial and ground features, and is widely used in urban planning, environmental monitoring, forestry management, and geological exploration. In these applications, semantic segmentation of ALS point clouds is particularly important.

[0003] In recent years, deep learning has made significant progress in many fields and has also been introduced into the intelligent processing of point cloud data. Deep segmentation methods for point clouds can be summarized into three main paradigms: projection-based, voxel-based, and point-based methods. However, most of these methods rely on a large number of accurately labeled training samples, which is typically tedious and difficult. To reduce the cost of sample labeling, researchers have begun to explore semi-supervised learning methods (SSL), utilizing unlabeled samples to supplement limited labeled data.

[0004] SSL methods can generally be divided into two main categories: pseudo-label-based and consistency-based. Pseudo-label-based SSL utilizes unlabeled data to generate pseudo-labels for model training and iteratively optimizes these pseudo-labels. For example, to address post-processing and label assignment issues in semantic segmentation, some researchers have proposed using dense pseudo-labels for model training. Furthermore, some researchers treat unreliable pseudo-labels as negative classification samples or use contrastive learning and self-training strategies to improve pseudo-label quality. These methods also have important applications in point cloud data, such as combining temporal matching and spatial graph propagation techniques to generate pseudo-labels for unlabeled points, or employing uncertainty-aware pseudo-label generation methods to eliminate noise. However, pseudo-label-based methods may introduce a large number of erroneous pseudo-labels, leading to noise accumulation during training. In contrast, consistency-based methods maintain the consistency of model predictions under specific perturbations, thereby mitigating the interference of erroneous labels. For example, some researchers have proposed perturbation self-distillation frameworks and hybrid contrastive regularization networks. These methods have also achieved good results in point cloud segmentation, but may be too lenient or restrictive, ignoring the structural information of the point cloud. Furthermore, traditional pseudo-label-based methods often ignore the scale effect of point clouds to generate unreliable pseudo-labels, which limits the usability of the generated labels and degrades network performance. Therefore, achieving accurate semantic segmentation in ALS point clouds still faces many challenges.

[0005] In summary, existing technologies for semi-supervised semantic segmentation of airborne laser point clouds do not fully consider the multi-scale and structural information of point clouds. Therefore, this invention proposes a method that improves the accuracy and robustness of ALS point cloud semantic segmentation through multiple constraints based on multi-scale and structural information, which has practical application value.

Summary of the Invention

[0006] In view of this, the purpose of this invention is to provide a semi-supervised semantic segmentation method for airborne laser point clouds based on a hybrid structure. The base loss (BL) ensures the basic performance of the model on labeled data. The global awareness loss (GAL) enhances the overall consistency of features by aligning the global centers of the student and teacher models, thereby improving the model's ability to perceive the global structure. Furthermore, to capture the geometric structure information of point cloud data at different scales, a multi-scale geometric structure similarity metric loss (MLS) is introduced. MLS is used to extract and constrain the consistency of local structural information of point clouds at different scales, further enriching the model's structural representation capabilities. To fully utilize unlabeled data, this invention also designs a multilevel fused pseudo-label loss (PLS). The multilevel fused pseudo-label loss utilizes pseudo-labels generated by integrating feature information from different levels to provide supervision for unlabeled data.

[0007] To achieve the above objectives, the present invention provides the following technical solution:

[0008] A semi-supervised semantic segmentation method for airborne laser point clouds based on a hybrid structure (HSCN) is characterized by the following steps: S1: Partially labeling the collected target region point cloud; S2: Inputting the partially labeled and unlabeled point cloud data into a pre-constructed hybrid structure constraint network's student and teacher models to obtain the output features of the student and teacher models, respectively; S3: Inputting the labeled and unlabeled samples from the output features into a base loss function, a global perception loss function, a multi-scale geometric structure similarity loss function, and a multi-level fusion pseudo-label loss function model, respectively, to obtain the output probabilities; S4: Setting loss weights, constructing a hybrid loss function to optimize network training, using stochastic gradient descent for backpropagation, dynamically adjusting model parameters, and obtaining robust segmentation results. The method described in this invention can effectively utilize the deep features of unlabeled samples, promote information transfer between labeled and unlabeled samples, and achieve more refined semantic segmentation.

[0009] Furthermore, in step S1, the CloudCompare software is used to partially annotate the point cloud data of the acquired target.

[0010] Furthermore, in step S2, the partially labeled point cloud data and the unlabeled point cloud data are input into the pre-constructed hybrid structure constraint network student model and teacher model to obtain the output features of the student model and teacher model, respectively.

[0011] Furthermore, in step S3, the labeled and unlabeled samples in the output features are input into the base loss function, the global perception loss function, the multi-scale geometric structure similarity loss function, and the multi-level fusion pseudo-label loss function model, respectively, to obtain the output probability. Specifically, given a set of labeled training points $P_l = \{(p_i, y_i)\}_{i=1}^{N_l}$ and a set of unlabeled training points $P_u = \{p_i\}_{i=N_l+1}^{N}$, where $N$ is the total number of sample points, $N_l$ and $N_u = N - N_l$ are the numbers of labeled and unlabeled points, respectively, and $y_i$ is the label of the $i$-th training sample, semi-supervised segmentation uses both the large number of unlabeled points and the limited number of labeled points to learn the functional mapping between points and labels. For example, the number of labeled points is set to ...; these labeled points are randomly selected from the entire dataset. This setup simulates a scenario in which labeled data is scarce and the model relies on a large amount of unlabeled data to aid training. Within the HSCN framework, KPConv is chosen as the backbone, and its mathematical expression is as follows:

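[0012] $$(\mathcal{F} * g)(x) = \sum_{x_i \in \mathcal{N}_x} g(x_i - x)\, f_i, \qquad g(y_i) = \sum_{k=1}^{n_k} h(y_i, \tilde{x}_k)\, W_k, \qquad h(y_i, \tilde{x}_k) = \max\left(0,\ 1 - \frac{\lVert y_i - \tilde{x}_k \rVert}{\sigma}\right)$$

[0013] where $x$ is a single point in the point cloud, $\mathcal{N}_x = \{x_i \in P : \lVert x_i - x \rVert \le r\}$ is the neighborhood set of $x$ defined by a fixed radius $r$, $x_i$ is a neighboring point, $f_i$ is the feature vector of the neighboring point, $g$ is the kernel function, $W_k$ is a trainable weight matrix, $y_i = x_i - x$ is the relative position of $x_i$ with respect to $x$, $h$ is the linear correlation function, $\tilde{x}_k$ are the kernel points sampled uniformly in the spherical space, $\sigma$ is the parameter controlling the influence distance of the kernel points, $*$ denotes the convolution operation, and $n_k$ is the number of kernel points set in the KPConv operation.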

[0014] Furthermore, in step S3, to ensure the model's accuracy and generalization ability on labeled data, two types of losses are considered: supervised loss (SL) for labeled samples and point-wise loss (PWL) commonly used in ordinary student-teacher networks. In the initial stage of training, the focus is on training the network using labeled samples. To ensure the network learns effectively from these samples, cross-entropy loss and Dice coefficient loss are used as supervised losses. Cross-entropy loss helps optimize the model's classification performance, while Dice coefficient loss focuses more on improving the accuracy of segmentation regions. The supervised loss is used to constrain the model's classification results on labeled data to be consistent with the labels. The point-wise loss is used to constrain the consistency of the student and teacher models' predictions on unlabeled data. The cross-entropy loss and Dice coefficient loss are shown below:

[0015]

[0016]

[0017] where $C$ is the total number of classes, $y_{i,c}$ is the true label value of the $i$-th labeled sample for the $c$-th class, and $p^{s}_{i,c}$ is the student model's predicted probability that the $i$-th labeled sample belongs to the $c$-th class. The supervised loss of HSCN can be expressed as follows:

[0018]

[0019] In semi-supervised learning, to consider the consistency of unlabeled samples, a common point-wise loss (PWL) is constructed in the teacher-student network. In this loss function, to preserve as much information as possible from the student model's training, an information entropy constraint is introduced into the basic loss to improve the student model's performance on unlabeled data and generate a more consistent probability distribution. Furthermore, to reduce the prediction discrepancy at each point in the teacher-student model, the mean squared error between student and teacher predictions in each mini-batch is considered. Therefore, the calculation method of the PWL loss function is as follows:

[0020]

[0021] where $p^{s}_{i,c}$ is the student model's predicted probability that the $i$-th unlabeled sample belongs to the $c$-th class, $p^{s}_{i}$ is the student model's predicted probability vector for the $i$-th unlabeled sample, and $p^{t}_{i}$ is the teacher model's predicted probability vector for the $i$-th unlabeled sample. The formula for calculating the basic loss function is shown below:

[0022]

[0023] where the two weighting coefficients in the formula are hyperparameters.

[0024] Furthermore, in step S3, because the basic loss only considers point-wise consistency between the predictions of the student and teacher models and neglects the relationships between points, a global perception loss based on unlabeled points is proposed. This loss constrains both networks to maintain a consistent global distribution by minimizing the squared Euclidean distance between the weighted global averages of the student and teacher network features. The global perception loss is calculated as follows:

[0025]

[0026] where $p^{s}_{i}$ and $p^{t}_{i}$ are the predicted probability values of the student model and the teacher model for the $i$-th unlabeled sample, respectively, $w^{s}_{i}$ and $w^{t}_{i}$ are the prediction weights of the student model and the teacher model for the $i$-th unlabeled sample, respectively, and the symbol $\odot$ denotes the Hadamard product.

[0027] Furthermore, in step S3, to constrain the consistency of the local geometric structure of the point cloud at different scales, a multi-scale geometric similarity loss (MLS) is proposed. First, the features of each decoding layer, after the number of points is restored through nearest-neighbor interpolation, are uniformly mapped to 128 dimensions using an MLP. Second, geometric structure features are extracted at each decoding layer using KPConv, with radii of 9 meters, 4.5 meters, 2.25 meters, and 1.125 meters corresponding to the respective scales. Then, a set of anchor points is obtained using voxel grid sampling, and for each anchor point a subgroup is constructed by selecting 40 neighboring points from the original point set P using K-nearest-neighbor search. After the average vector of the point features within each subgroup is calculated, a self-similarity matrix between subgroups is constructed using cosine similarity. MLS is obtained by constraining the consistency of the self-similarity matrices of the student model and the teacher model at each decoding layer and summing the differences over the layers, as shown in the following formula:

[0028]

[0029] where $l$ denotes the $l$-th decoding layer, $M$ is the number of anchor points obtained through voxel grid sampling, and $\bar{f}^{s}_{l,j}$ and $\bar{f}^{t}_{l,j}$ are the average feature vectors, for the student model and the teacher model respectively at the $l$-th decoding layer, of the group of neighboring points centered on the $j$-th anchor point; each group consists of the 40 points obtained for its anchor point through K-nearest-neighbor search.

[0030] Furthermore, in step S3, to generate high-quality pseudo-labels and provide more accurate supervision information for the training of the student model, the designed pseudo-label generator comprises two core components: a Multilevel Fusion Module (MLF) and a Pseudo-label Loss Function (PLS). The Multilevel Fusion Module effectively integrates multi-layer semantic features from the teacher network, achieving a more comprehensive and accurate expression of semantic information, thereby improving the quality of pseudo-label generation. The Pseudo-label Loss Function further ensures the consistency between the generated pseudo-labels and the student model output, enhancing the reliability of the pseudo-labels and their guiding role in network training. The formula for the Multilevel Fusion Module (MLF) is shown below:

[0031]

[0032] where the terms of the formula denote, in turn, a function, the average pooling operation, and the MLP operation. In this way, MLF achieves effective fusion and compression of multi-scale information in the teacher model, providing richer and more accurate feature representations for subsequent model training.

[0033] Then, the compressed features are concatenated to fuse information from different decoding layers, as shown in the following formula:

[0034]

[0035] where the operator in the formula denotes concatenation, the left-hand side is the fused result of the teacher model, and the inputs are the output features of decoding layers 1-4 of the teacher network. The weighted features of each layer are compressed to 16 dimensions using the MLF formula, and the fused 64-dimensional features are then restored to 128 dimensions using the same formula. Based on the compressed features, pseudo-labels are generated according to the following formula:

[0036]

[0037] Based on the generated pseudo-labels, the difference in class mean probabilities between the student model and the teacher model is introduced. This aims to reduce sensitivity to mislabels, enhance the model's robustness, and thus achieve more accurate segmentation performance. The pseudo-label loss function PLS is formulated as follows:

[0038]

[0039] where the quantities in the formula are the number of pseudo-labels, the number of samples carrying the pseudo-label of a given class, the probability of that class output by the MLF module of the teacher model for a sample carrying that pseudo-label, and the corresponding output of the student model.

[0040] Furthermore, in step S4, to train a stable network model, all proposed losses are combined with different weights to optimize the network parameters. The total loss is formulated as follows:

[0041]

[0042] where the three weighting coefficients in the formula are hyperparameters. Network training is divided into two phases: a stable training phase with the supervised loss and a joint training phase with the consistency losses and the pseudo-label loss. In the experiments, both phases were trained for the same number of epochs. This strategy aims to improve the stability and efficiency of model training while utilizing unlabeled data to enhance the model's generalization. In the experiments, two of the loss weights are set to the same value, defined by a Gaussian curve whose argument increases linearly from 0 to 1 during stable training. The other weights are set to zero during stable training and to 0.2 during joint training.

[0043] The beneficial effects of this invention are as follows:

[0044] The global perception loss proposed in this invention significantly enhances the model's ability to capture global feature consistency. Furthermore, this invention proposes a multi-scale geometric similarity metric loss, aiming to achieve feature alignment between point cloud geometries at different scales. To fully utilize information at each level, this invention designs a multi-layer fusion pseudo-label generator, which reliably generates pseudo-labels, thereby assisting the model in effectively learning diverse features from unlabeled samples. This invention proposes a hybrid-structure-based semi-supervised semantic segmentation method for airborne laser point clouds, HSCN. This network focuses on semi-supervised learning with limited labeled samples, exhibiting strong representational capabilities for point clouds with a small number of labeled samples, and significantly reducing the cost of point cloud data labeling. Experimental results on three airborne laser scanning point cloud datasets show that, under semi-supervised conditions with only 0.1% labeled samples, the HSCN method demonstrates superior performance in airborne laser point cloud semantic segmentation, outperforming other current state-of-the-art methods.

[0045] Other advantages, objectives, and features of the invention will be set forth in part in the description which follows, and in part will be apparent to those skilled in the art from the following examination, or may be learned from practice of the invention. The objectives and other advantages of the invention can be realized and obtained through the following description.

Attached Figure Description

[0046] To make the objectives, technical solutions, and advantages of the present invention clearer, the preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings, wherein:

[0047] Figure 1 is a flowchart of the method of the present invention;

[0048] Figure 2 shows the hybrid structure constraint network (HSCN) for semantic segmentation of airborne laser point clouds;

[0049] Figure 3 shows the multi-level fusion module (MLF);

[0050] Figure 4 shows the experimental results of the HSCN network of this invention.

Detailed Implementation

[0051] The technical solution of the present invention will now be described in detail with reference to the accompanying drawings.

[0052] Figure 1 is a flowchart of the method of the present invention. The present invention provides a semi-supervised semantic segmentation method for airborne laser point clouds based on a hybrid structure. As shown in the figure, in the point cloud annotation stage, the CloudCompare software is used to annotate the point cloud data of the target region. The deep learning network used for semantic segmentation is shown in Figure 2; it can learn representative ground-feature characteristics from partially labeled airborne laser point clouds. The network consists of a student-teacher kernel point convolution (KPConv) module, a basic loss function (BL), a global perception loss function (GAL), a multi-scale geometric similarity metric loss function (MLS), and a multi-level fusion pseudo-label loss function (PLS). First, the global perception loss (GAL) enhances the overall consistency of features by aligning the global centers of the student and teacher models, thereby improving the model's ability to perceive global structure. Furthermore, to capture the geometric structure information of point cloud data at different scales, the multi-scale geometric similarity metric loss (MLS) extracts and aligns the local structural information of point clouds at different scales, further enriching the model's structural representation capabilities. To fully utilize unlabeled data, the multi-level fusion pseudo-label loss (PLS) generates high-quality pseudo-labels by integrating feature information from different levels.

[0053] Specifically, the technical solution of the present invention includes the following:

[0054] 1. Data annotation: The point cloud data of the acquired target is partially annotated using the CloudCompare software.

[0055] 2. Input the partially labeled point cloud data and the unlabeled point cloud data into the pre-constructed hybrid structure constraint network student model and teacher model to obtain the output features of the student model and teacher model respectively.

[0056] 3. Input the labeled and unlabeled samples from the output features into the base loss function, global perception loss function, multi-scale geometric structure similarity loss function, and multi-level fusion pseudo-label loss function model, respectively, to obtain the output probability. Specifically, given a set of labeled training points $P_l = \{(p_i, y_i)\}_{i=1}^{N_l}$ and a set of unlabeled training points $P_u = \{p_i\}_{i=N_l+1}^{N}$, where $N$ is the total number of sample points, $N_l$ and $N_u = N - N_l$ are the numbers of labeled and unlabeled points, respectively, and $y_i$ is the label of the $i$-th training sample, semi-supervised segmentation uses both the large number of unlabeled points and the limited number of labeled points to learn the functional mapping between points and labels. For example, the number of labeled points is set to ...; these labeled points are randomly selected from the entire dataset. This setup simulates a scenario in which labeled data is scarce and the model relies on a large amount of unlabeled data to aid training. Within the HSCN framework, KPConv is chosen as the backbone, and its mathematical expression is as follows:

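[0057] $$(\mathcal{F} * g)(x) = \sum_{x_i \in \mathcal{N}_x} g(x_i - x)\, f_i, \qquad g(y_i) = \sum_{k=1}^{n_k} h(y_i, \tilde{x}_k)\, W_k, \qquad h(y_i, \tilde{x}_k) = \max\left(0,\ 1 - \frac{\lVert y_i - \tilde{x}_k \rVert}{\sigma}\right)$$

[0058] where $x$ is a single point in the point cloud, $\mathcal{N}_x = \{x_i \in P : \lVert x_i - x \rVert \le r\}$ is the neighborhood set of $x$ defined by a fixed radius $r$, $x_i$ is a neighboring point, $f_i$ is the feature vector of the neighboring point, $g$ is the kernel function, $W_k$ is a trainable weight matrix, $y_i = x_i - x$ is the relative position of $x_i$ with respect to $x$, $h$ is the linear correlation function, $\tilde{x}_k$ are the kernel points sampled uniformly in the spherical space, $\sigma$ is the parameter controlling the influence distance of the kernel points, $*$ denotes the convolution operation, and $n_k$ is the number of kernel points set in the KPConv operation.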
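As a rough illustration of the KPConv operation described above, the following Python sketch evaluates the convolution at a single query point. It is a minimal sketch and not the invention's implementation: the function names `kpconv_point` and `_dist`, the plain-list data layout, and the brute-force neighbor scan are assumptions made for illustration, and the correlation uses the linear form max(0, 1 - distance / sigma).

```python
import math

def _dist(a, b):
    """Euclidean distance between two coordinate sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def kpconv_point(x, points, feats, kernel_pts, weights, radius, sigma):
    """Evaluate a KPConv-style convolution at one query point x.

    points     : candidate neighbor coordinates x_i
    feats      : input feature vector f_i for each candidate (length D_in)
    kernel_pts : n_k kernel point positions, relative to the query point
    weights    : n_k weight matrices W_k, each D_in x D_out (nested lists)
    """
    d_out = len(weights[0][0])
    out = [0.0] * d_out
    for xi, fi in zip(points, feats):
        if _dist(xi, x) > radius:
            continue  # x_i lies outside the radius-r neighborhood
        yi = [a - b for a, b in zip(xi, x)]  # relative position y_i = x_i - x
        for xk, wk in zip(kernel_pts, weights):
            h = max(0.0, 1.0 - _dist(yi, xk) / sigma)  # linear correlation
            if h == 0.0:
                continue
            for j in range(d_out):
                out[j] += h * sum(fi[d] * wk[d][j] for d in range(len(fi)))
    return out
```

With one kernel point at the origin and a 1 x 1 weight matrix, a neighbor that coincides with the query point contributes its full weighted feature, while a neighbor at half the influence distance contributes half of it.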

[0059] 4. To ensure the model's accuracy and generalization ability on labeled data, two types of loss are considered, as shown in Figure 2: the supervised loss (SL) for labeled samples and the point-wise loss (PWL) commonly used in ordinary student-teacher networks. In the initial stage of the training process, the focus is on training the network using labeled samples. To ensure the network learns effectively from these samples, cross-entropy loss and Dice coefficient loss are used as supervised losses. Cross-entropy loss helps optimize the model's classification performance, while Dice coefficient loss focuses more on improving the accuracy of segmentation regions. The supervised loss is used to constrain the model's classification results on labeled data to be consistent with the labels. The point-wise loss is used to constrain the consistency of the student and teacher models' predictions on unlabeled data. The cross-entropy loss and Dice coefficient loss are shown below:

[0060]

[0061]

[0062] where $y_{i,c}$ is the true label of the $i$-th sample for the $c$-th class, $p^{s}_{i,c}$ is the predicted value of the $i$-th sample for the $c$-th class, and $N_l$ and $C$ are the numbers of labeled samples and classes, respectively. Therefore, the supervised loss of HSCN can be expressed as follows:

[0063]
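The supervised loss described above can be sketched in Python as follows. This is one common formulation rather than the invention's exact formula: it assumes one-hot labels, a cross-entropy averaged over the labeled points, a single global soft Dice term, and an unweighted sum of the two. The function names and the `eps` smoothing constant are illustrative assumptions.

```python
import math

def cross_entropy(y, p, eps=1e-12):
    """Mean cross-entropy over labeled points; y is one-hot, p holds student probabilities."""
    n = len(y)
    return -sum(yc * math.log(pc + eps)
                for yi, pi in zip(y, p) for yc, pc in zip(yi, pi)) / n

def dice_loss(y, p, eps=1e-12):
    """Soft Dice loss computed over all points and classes at once."""
    inter = sum(yc * pc for yi, pi in zip(y, p) for yc, pc in zip(yi, pi))
    total = sum(map(sum, y)) + sum(map(sum, p))
    return 1.0 - 2.0 * inter / (total + eps)

def supervised_loss(y, p):
    """Assumed combination: cross-entropy plus Dice with equal weight."""
    return cross_entropy(y, p) + dice_loss(y, p)
```

For two points and two classes with uniform predictions of 0.5, this gives a cross-entropy of ln 2 and a Dice loss of 0.5; perfect predictions drive both terms to approximately zero.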

[0064] In semi-supervised learning, to consider the consistency of unlabeled samples, a common point-wise loss (PWL) is constructed in the teacher-student network. In this loss function, to preserve as much information as possible from the student model's training, an information entropy constraint is introduced into the basic loss to improve the student model's performance on unlabeled data and generate a more consistent probability distribution. Furthermore, to reduce the prediction discrepancy at each point in the teacher-student model, the mean squared error between student and teacher predictions in each mini-batch is considered. Therefore, the calculation method of the PWL loss function is as follows:

[0065]

[0066] where $p^{s}_{i}$ and $p^{t}_{i}$ are the predicted values of the student model and the teacher model, respectively. Finally, the formula for calculating the basic loss function is shown below:

[0067]

[0068] where the two weighting coefficients in the formula are hyperparameters.
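The point-wise loss described in item 4 combines an entropy constraint on the student predictions with the mean squared error between student and teacher predictions. A minimal Python sketch is given below, assuming that both terms are averaged over the unlabeled points of a mini-batch and added with equal weight; the function name `pointwise_loss`, the normalization, and the `eps` constant are assumptions, and the two hyperparameters of the basic loss are not modeled here.

```python
import math

def pointwise_loss(p_student, p_teacher, eps=1e-12):
    """Entropy of student predictions plus student-teacher squared error.

    p_student, p_teacher : per-point class probability vectors for the
    unlabeled points of one mini-batch (lists of equal shape).
    """
    n = len(p_student)
    # Entropy term: low when the student is confident on unlabeled points.
    entropy = -sum(pc * math.log(pc + eps) for pi in p_student for pc in pi) / n
    # Consistency term: squared distance between the two probability vectors.
    mse = sum((a - b) ** 2
              for ps, pt in zip(p_student, p_teacher)
              for a, b in zip(ps, pt)) / n
    return entropy + mse
```

A confident student that agrees with the teacher yields a loss close to zero, whereas a uniform student prediction against a one-hot teacher prediction is penalized by both terms.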

[0069] 5. Because the basic loss only considers point-wise consistency between the predictions of the student and teacher models and neglects the relationships between points, a global perception loss based on unlabeled points is proposed, as shown in Figure 2. This loss constrains the global distributions of the student and teacher networks to be consistent by minimizing the squared Euclidean distance between their weighted global features. The weights of the unlabeled points, which are used to control for differences and diversity in the prediction results, are defined from the predicted probabilities of the unlabeled points using the Hadamard product. Furthermore, since backpropagation is only performed in the student model, the global entropy of the student model's predictions is also included in the global perception loss to ensure the stability of network training and obtain a more robust segmentation model. The global perception loss is calculated as follows:

[0070]

[0071] where $p^{s}_{i}$ and $p^{t}_{i}$ are the prediction results of the student model and the teacher model for the unlabeled samples, respectively.
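The core of the global perception loss, the squared Euclidean distance between weighted global averages, can be sketched as follows. The definition of the point weights is not recoverable from the text, so this sketch takes them as an input of the same shape as the probabilities and applies them element-wise; the global entropy term mentioned above is omitted. The function names are hypothetical.

```python
def weighted_global_mean(probs, weights):
    """Average of the element-wise product w_i * p_i over all unlabeled points."""
    n, c = len(probs), len(probs[0])
    return [sum(w[j] * p[j] for p, w in zip(probs, weights)) / n for j in range(c)]

def global_perception_loss(p_s, w_s, p_t, w_t):
    """Squared Euclidean distance between student and teacher global centers."""
    g_s = weighted_global_mean(p_s, w_s)
    g_t = weighted_global_mean(p_t, w_t)
    return sum((a - b) ** 2 for a, b in zip(g_s, g_t))
```

Under uniform weights, two models that disagree on every individual point can still share the same global center, which is why this term complements the point-wise loss rather than replacing it.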

[0072] 6. To fully utilize the neighborhood structure of ALS point clouds, capture their inherent geometric and semantic features, and constrain the consistency of local geometric structure across different scales, a multi-scale geometric similarity loss (MLS) is proposed, as shown in Figure 2. First, the features of each decoding layer, after the number of points is restored through nearest-neighbor interpolation, are uniformly mapped to 128 dimensions using an MLP. Second, geometric structure features are extracted at each decoding layer using KPConv, with radii of 9 m, 4.5 m, 2.25 m, and 1.125 m corresponding to the respective scales. Then, voxel grid sampling is used to obtain the anchor point set, and for each anchor point a subgroup is constructed by selecting 40 neighboring points from the original point set P using K-nearest-neighbor search. After the average vector of the point features within each subgroup is calculated, a self-similarity matrix between subgroups is constructed using cosine similarity. MLS is obtained by constraining the consistency of the self-similarity matrices of the student model and the teacher model at each decoding layer and summing the differences over the layers, as shown in the following formula:

[0073]

[0074] where $l$ denotes the $l$-th decoding layer, $M$ is the number of anchor points obtained through voxel grid sampling, and $\bar{f}^{s}_{l,j}$ and $\bar{f}^{t}_{l,j}$ are the average feature vectors, for the student model and the teacher model respectively at the $l$-th decoding layer, of the group of neighboring points centered on the $j$-th anchor point; each group consists of the 40 points obtained for its anchor point through K-nearest-neighbor search.
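The MLS computation for one decoding layer can be sketched in Python as follows: voxel grid sampling for anchors, K-nearest-neighbor grouping, group mean features, cosine self-similarity, and a student-teacher difference. This is a minimal sketch under assumptions: the anchor of each voxel is simply the first point that falls into it, the difference is a mean squared error over matrix entries, and all function names are hypothetical. The text uses 40 neighbors per anchor; the parameter `k` is left free here.

```python
import math

def _dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def voxel_anchors(points, voxel):
    """Indices of the first point found in each occupied voxel cell."""
    seen = {}
    for idx, p in enumerate(points):
        seen.setdefault(tuple(math.floor(v / voxel) for v in p), idx)
    return list(seen.values())

def group_means(points, feats, anchors, k):
    """Mean feature vector of the k nearest neighbors of each anchor."""
    means = []
    for a in anchors:
        idx = sorted(range(len(points)), key=lambda i: _dist(points[i], points[a]))[:k]
        means.append([sum(feats[i][d] for i in idx) / len(idx)
                      for d in range(len(feats[0]))])
    return means

def self_similarity(vectors, eps=1e-12):
    """Cosine similarity between every pair of group mean vectors."""
    norms = [math.sqrt(sum(v * v for v in vec)) + eps for vec in vectors]
    return [[sum(a * b for a, b in zip(u, v)) / (nu * nv)
             for v, nv in zip(vectors, norms)]
            for u, nu in zip(vectors, norms)]

def mls_layer(points, feats_s, feats_t, voxel, k):
    """Mean squared difference between student and teacher self-similarity matrices."""
    anchors = voxel_anchors(points, voxel)
    s = self_similarity(group_means(points, feats_s, anchors, k))
    t = self_similarity(group_means(points, feats_t, anchors, k))
    m = len(anchors)
    return sum((s[i][j] - t[i][j]) ** 2 for i in range(m) for j in range(m)) / (m * m)
```

The full loss would sum `mls_layer` over the decoding layers. Because cosine similarity ignores feature magnitude, a teacher whose features are a scaled copy of the student's produces a loss of approximately zero.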

[0075] 7. To generate high-quality pseudo-labels and provide more accurate supervision information for training the student model, the designed pseudo-label generator comprises two core components: a Multilevel Fusion Module (MLF) and a Pseudo-label Loss Function (PLS). As shown in Figure 3, the Multilevel Fusion Module integrates multi-layer semantic features from the teacher network, achieving a more comprehensive and accurate representation of semantic information and thereby improving the quality of pseudo-label generation. The pseudo-label loss function further ensures the consistency between the generated pseudo-labels and the student model output, enhancing the reliability of the pseudo-labels and their guiding role in network training. The formula for the Multilevel Fusion Module (MLF) is shown below:

[0076]

[0077] In the formula, the three operators denote a function, average pooling, and the MLP operation, respectively. In this way, MLF fuses and compresses the multi-scale information in the teacher model, providing richer and more accurate feature representations for subsequent model training.

[0078] Then, the compressed features are concatenated to fuse information from different decoding layers, as shown in the following formula:

[0079]

[0080] In the formula, the operator denotes concatenation, the result denotes the output of the teacher model, and the four inputs are the output features of decoding layers 1 to 4 of the teacher network. The weighted features of each layer are compressed to 16 dimensions using the MLF formula, and the concatenated 64-dimensional features are then restored to 128 dimensions using the same formula. Based on the compressed features, pseudo-labels are generated. The pseudo-label generation formula is shown below:

[0081]
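The dimension bookkeeping of the fusion step in [0076] to [0080] can be sketched as follows. This is a hedged illustration only: the text names "a function", average pooling, and an MLP, so the sigmoid channel gate, the single bias-free linear layer standing in for the MLP, and the names `mlf_compress`, `fuse_layers`, and `rand_matrix` are all assumptions, and the weights are random rather than trained.

```python
import math
import random

def rand_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def linear(rows, weight):
    """Bias-free linear map applied to every row: (n x d_in) times (d_in x d_out)."""
    d_out = len(weight[0])
    return [[sum(r[i] * weight[i][j] for i in range(len(r))) for j in range(d_out)]
            for r in rows]

def avg_pool(rows):
    """Average over all points, giving one value per feature channel."""
    n = len(rows)
    return [sum(r[c] for r in rows) / n for c in range(len(rows[0]))]

def mlf_compress(rows, gate_w, proj_w):
    """Assumed MLF step: sigmoid gate from pooled features, then a projection."""
    pooled = linear([avg_pool(rows)], gate_w)[0]
    gate = [1.0 / (1.0 + math.exp(-g)) for g in pooled]
    weighted = [[v * g for v, g in zip(r, gate)] for r in rows]
    return linear(weighted, proj_w)

def fuse_layers(layer_feats, gates, projs):
    """Compress each decoding layer, then concatenate the results point by point."""
    compressed = [mlf_compress(f, g, p) for f, g, p in zip(layer_feats, gates, projs)]
    n_points = len(layer_feats[0])
    return [sum((c[i] for c in compressed), []) for i in range(n_points)]
```

With four 128-dimensional layer outputs and 128-to-16 projections, `fuse_layers` returns 64 values per point; applying `mlf_compress` once more with a 64-to-128 projection mirrors the restoration to 128 dimensions mentioned in the text.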

[0082] Based on the generated pseudo-labels, the difference in class mean probabilities between the student model and the teacher model is introduced as a loss term. This reduces sensitivity to incorrect pseudo-labels, improves the robustness of the model, and thus yields more accurate segmentation. The pseudo-label loss function PLS is formulated as follows:

[0083]

[0084] In the formula, the first term is the number of pseudo-labels; the second is the number of samples whose pseudo-label belongs to a given class; the third is the probability, output by the MLF module of the teacher model, of a sample within the pseudo-labels of that class; and the last is the corresponding output of the student model.
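The idea of comparing class mean probabilities, rather than individual points, can be illustrated with a short sketch. Since the PLS formula itself is not reproduced in this text, everything specific here is an assumption: pseudo-labels are taken as the argmax of the teacher probabilities, the per-class difference is squared, and the result is averaged over the classes that occur. The name `pls_loss` is hypothetical.

```python
def argmax(row):
    """Index of the largest entry in a probability vector."""
    return max(range(len(row)), key=row.__getitem__)

def pls_loss(teacher_probs, student_probs):
    """Assumed form: mean over pseudo-label classes of the squared difference
    between the teacher's and the student's class mean probabilities."""
    labels = [argmax(p) for p in teacher_probs]   # pseudo-labels from the teacher
    classes = sorted(set(labels))
    total = 0.0
    for c in classes:
        idx = [i for i, y in enumerate(labels) if y == c]
        t_mean = sum(teacher_probs[i][c] for i in idx) / len(idx)
        s_mean = sum(student_probs[i][c] for i in idx) / len(idx)
        total += (t_mean - s_mean) ** 2
    return total / len(classes)
```

Averaging within each class before taking the difference means that a single wrongly labeled point shifts the class mean only slightly, which is one way to read the stated aim of reducing sensitivity to incorrect pseudo-labels.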

[0085] 8. To train a stable network model, all proposed losses are combined with different weights to optimize the network parameters. The formula for the total loss is shown below:

[0086]

[0087] In the formula, the three weighting coefficients are hyperparameters. Network training is divided into two phases: a stable training phase with supervised loss and a joint training phase with consistency loss and pseudo-label loss. In the experiments, both phases were trained for the same number of epochs. This strategy aims to improve the stability and efficiency of model training while using unlabeled data to improve the generalization of the model. In the experiments, two of the loss weights are set to the same value, defined by a Gaussian curve whose variable increases linearly from 0 to 1 during stable training, while two other weights are set to zero during stable training and to 0.2 during joint training.
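The two-phase weighting described above can be sketched as a simple schedule. The exact Gaussian curve is not reproduced in this text, so the form exp(-5(1 - t)^2), which is the ramp-up commonly used in Mean Teacher training, is an assumption, as are the function names and the dictionary keys `ramped` and `joint`; which loss terms receive which weight is not specified here.

```python
import math

def gaussian_rampup(t):
    """Assumed Gaussian ramp-up exp(-5 * (1 - t)^2), with t clamped to [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    return math.exp(-5.0 * (1.0 - t) ** 2)

def loss_weights(epoch, stable_epochs):
    """Weights for one epoch: t grows linearly from 0 to 1 over the stable phase;
    the joint-phase weights are 0 during stable training and 0.2 afterwards."""
    t = min(epoch / stable_epochs, 1.0)
    joint = 0.0 if epoch < stable_epochs else 0.2
    return {"ramped": gaussian_rampup(t), "joint": joint}
```

For example, with `stable_epochs=10` the ramped weight starts at exp(-5) at epoch 0 and reaches 1 at epoch 10, which is also the epoch at which the joint-phase weights switch from 0 to 0.2.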

[0088] Figure 4 shows the experimental results of the HSCN semantic segmentation network of this invention on LASDU, an open-source airborne urban point cloud dataset. To verify the generality of the proposed HSCN method, it was compared with state-of-the-art fully supervised and semi-supervised methods on the LASDU dataset; the comparison results are shown in Table 1. Among the MLP-based methods (with 0.1% of labeled samples), the method of Xu et al. did not achieve ideal results, while the PSD method performed relatively well. Among the KPConv-based methods, the original KPConv did not perform well. However, by introducing a consistency loss into KPConv, the MT method improved OA and Avg.F1 by 4.1% and 4.4%, respectively. Furthermore, compared with PSD, the best MLP-based method, the MT method showed significant advantages in both OA and Avg.F1.

[0089] Table 1. Comparison of HSCN and other methods on the LASDU dataset.

[0090] In the table, red text represents the optimal result, and green text represents the second-best result.

[0091] Nevertheless, the MT method still has certain limitations. For example, it fails to fully consider the geometric structure information of the point cloud during processing, which may lead to errors. In contrast, the HSCN method exhibits the best performance, especially in key categories such as ground, buildings, trees, and man-made objects, where its performance is 1.6%, 0.1%, 3.9%, and 1.4% higher than the MT method, respectively. This advantage is mainly attributed to the HSCN method's comprehensive consideration of multi-scale and structural information in describing the characteristics of ALS point cloud data.

[0092] With a labeled sample ratio of 1%, the HSCN method significantly outperforms the PSD and KPConv methods, and even rivals the state-of-the-art fully supervised method RRDAN. This result strongly demonstrates that the HSCN method possesses excellent robustness and accuracy, maintaining superior performance across different labeled sample ratios.

[0093] Furthermore, as shown in Table 1, the HSCN method outperforms MLP-based fully supervised methods such as PointNet++, PointCNN, PointConv, and DGCNN in Avg.F1. When the labeled sample ratio is 0.1%, the HSCN method also outperforms MLP-based methods such as PointNet++, PointCNN, PointConv, DGCNN, ShellNet, and PosPool in Avg.F1. Moreover, the HSCN method achieves 96% of the OA and Avg.F1 of KPConv-based fully supervised methods such as RRDAN and MCFN.

[0094] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications can be made to the technical solutions of the present invention without departing from the spirit and scope of the present invention, and all such modifications should be covered within the scope of the claims of the present invention.

Claims

1. A semi-supervised semantic segmentation method for airborne laser point clouds based on a hybrid structure, characterized in that the method comprises the following steps: S1: partially labeling the point cloud of the target area; S2: inputting the partially labeled point cloud data and the unlabeled point cloud data into the student model and the teacher model of a pre-constructed hybrid structure network to obtain the output features of the student model and the teacher model, respectively, wherein the hybrid structure network uses KPConv as the backbone network; S3: inputting the labeled samples and the unlabeled samples in the output features into a basic loss function, a global perception loss function, a multi-scale geometric structure similarity loss function, and a multi-level fusion pseudo-label loss function model, respectively, to obtain the output probability; the basic loss function is a weighted sum of the supervised loss of the labeled samples and the pointwise loss commonly used in ordinary student-teacher networks; the global perception loss, computed on the unlabeled points, ensures that the student network and the teacher network maintain the same global distribution; the goal of this loss is to minimize the squared Euclidean distance between the weighted global averages of the features of the teacher network and the student network; the weights of the unlabeled points, which control the differences and diversity in the prediction results, are defined by an expression in which the operator symbol denotes the Hadamard product and the probability term is the predicted probability of the unlabeled points; the calculation formula for the global perception loss is as follows, where the two prediction terms are the prediction results of the student model and the teacher model for the unlabeled samples, respectively, and N and M are the total number of sample points and the number of labeled sample points, respectively; the multi-scale geometric structure similarity loss function restores the features to the original number of points through nearest-neighbor interpolation and uses a multilayer perceptron to uniformly map the features of each layer to a 128-dimensional space; next, a kernel point convolution (KPConv) operation whose radius matches that of the decoding layer is introduced to adapt to the multi-scale characteristics of the point cloud; to further aggregate global neighborhood information, a voxel grid sampling algorithm is used to obtain the anchor point set; subsequently, for each anchor point, the K-nearest neighbor algorithm selects 40 neighboring points from the original point set P to construct a subgroup; when the adjacency structure information within the subgroups is extracted, the mean of the point features within each subgroup is calculated to capture its overall characteristics; cosine similarity is used to construct the self-similarity matrix between different subgroups, and the difference between the self-similarity matrices of the student model and the teacher model is measured; the formula for the multi-scale geometric structure similarity loss function is shown below; the multi-level fusion pseudo-label loss function comprises two core components: a multi-level fusion module and a pseudo-label loss function; the formula for the multi-level fusion module is shown below, where the three operators denote a function, average pooling, and the multilayer perceptron operation, respectively; then, the compressed features are concatenated to fuse information from different decoding layers, as shown in the following formula, where the operator denotes concatenation, the result denotes the output of the teacher model, and the four inputs are the output features of decoding layers 1 to 4 of the teacher network, respectively; the weighted features of each layer are compressed to 16 dimensions according to the multi-level fusion module formula, and the concatenated 64-dimensional features are then restored to 128 dimensions according to the same formula; based on the compressed features, pseudo-labels are generated, and the pseudo-label generation formula is shown below; the formula for the pseudo-label loss function is shown below, where the first term is the number of pseudo-labels, the second is the number of samples whose pseudo-label belongs to a given class, the third is the probability, output by the MLF module of the teacher model, of a sample within the pseudo-labels of that class, and the last is the corresponding output of the student model; S4: setting loss weights, constructing a hybrid loss function to optimize network training, performing backpropagation using stochastic gradient descent, dynamically adjusting the model parameters, and obtaining robust segmentation results.

2. The semi-supervised semantic segmentation method for airborne laser point clouds based on a hybrid structure according to claim 1, characterized in that: in step S3, the labeled samples and the unlabeled samples in the output features are input into the basic loss function, the global perception loss function, the multi-scale geometric structure similarity loss function, and the multi-level fusion pseudo-label loss function model, respectively, to obtain the output probability. Specifically, a set of labeled training points and a set of unlabeled training points are given, where N and M (N >> M) are the total number of sample points and the number of labeled sample points, respectively, the two sets contain the labeled points and the unlabeled points, respectively, and l_i is the label corresponding to the i-th training sample. Semi-supervised segmentation simultaneously uses a large number of unlabeled points and a limited number of labeled points to learn the functional mapping between points and labels. The number of labeled points is set to M = 0.1% × N, and these points are randomly selected from the entire dataset. In the HSCN framework, KPConv is chosen as the backbone, and the mathematical expression of KPConv is shown below, where N_x is the neighborhood set of point x; for a fixed radius r, N_x = {x_i ∈ P | ||x_i − x|| ≤ r}; for each neighborhood point x_i, its feature vector is taken from the feature matrix, where N is the total number of points in the point cloud and D is the feature dimension; κ(l_i) is the kernel function, W_k is a trainable weight matrix, l_i is the relative position vector, and the kernel points are uniformly sampled in a spherical space.

3. The semi-supervised semantic segmentation method for airborne laser point clouds based on a hybrid structure according to claim 2, characterized in that: in step S3, to ensure the accuracy and generalization ability of the model on labeled data, two types of loss are considered: the supervised loss SL for labeled samples and the pointwise loss PWL commonly used in ordinary student-teacher networks. Cross-entropy loss and Dice coefficient loss are used as supervised losses, as shown below, where the label term is the true label of the i-th sample for the c-th category, the prediction term is the predicted value of the i-th sample for the c-th category, and M and C are the number of labeled samples and the number of categories, respectively. The supervised loss of HSCN can therefore be expressed as follows. The PWL loss function is calculated as follows, where p_t and p′_t are the predicted values of the student model and the teacher model, respectively. The formula for calculating the basic loss function is shown below, where λ_s and λ_p are two hyperparameters.

4. The semi-supervised semantic segmentation method for airborne laser point clouds based on a hybrid structure according to claim 3, characterized in that: in step S4, to train a stable network model, all proposed losses are combined with different weights to optimize the network parameters; the formula for the total loss is shown below, where the three weighting coefficients are hyperparameters; network training is divided into two stages: a stable training stage with supervised loss and a joint training stage with consistency loss and pseudo-label loss; in the experiments, both stages were trained for the same number of epochs; this strategy aims to improve the stability and efficiency of model training while using unlabeled data to improve the generalization of the model; in the experiments, two of the loss weights are set to the same value, defined by a Gaussian curve whose variable increases linearly from 0 to 1 during stable training, while two other weights are set to zero during stable training and to 0.2 during joint training.