A Multispectral Point Cloud Classification Method Based on Spatial-Spectral Self-Supervised Pre-training
By employing self-supervised distance matrix and graph modeling techniques, the general feature learning problem of pre-trained models in multispectral remote sensing scenarios was solved, achieving efficient classification of multispectral point cloud data and improving the model's generalization and robustness.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- KUNMING UNIV OF SCI & TECH
- Filing Date
- 2025-10-23
- Publication Date
- 2026-06-30
AI Technical Summary
Existing self-supervised pre-training methods for point clouds cannot be effectively transferred to multispectral remote sensing scenarios, for two reasons: the diversity and complexity of ground objects in remote sensing scenes hinder the pre-trained model from learning general features, and these methods ignore the spatial-spectral consistency of multispectral point cloud data.
The self-supervised distance matrix of multispectral point cloud voxel blocks in 3D Euclidean space and spectral space is extracted by custom rules and used as a self-supervised signal. The spatial features of the voxel blocks are extracted as residual modules, and the implicit topological relationships between voxel blocks are learned by graph modeling. Combined with global and local feature extraction, a pre-trained model is loaded and a learnable classification head is added for training.
It significantly improves the pre-trained model's ability to capture general features of multispectral remote sensing point clouds, enhances the model's generalization and robustness, reduces dependence on labeled data, and improves the performance of classification tasks.
Abstract
Description
Technical Field
[0001] This invention relates to a multispectral point cloud classification method based on spatial-spectral self-supervised pre-training, belonging to the field of multispectral lidar point cloud processing technology.
Background Technology
[0002] In the field of lidar point cloud processing technology, multispectral point cloud data provides accurate 3D spatial information along with corresponding spectral information. Its spatial-spectral consistency makes it widely used in remote sensing. Furthermore, with the development of deep learning technology, the inherent spatial-spectral consistency of multispectral point cloud data provides models with dual constraints in both the geometric and spectral domains. However, deep learning models require a large amount of precisely labeled data for training, and the multi-band spectral information of multispectral point clouds, coupled with the massive data volume in large-scale remote sensing scenarios, makes labeling for practical applications extremely costly. To alleviate this problem, researchers have introduced self-supervised pre-training of point clouds. By performing self-supervised training on benchmark datasets, the model learns generalizable features, reducing its dependence on training labels and thus lowering costs.
[0003] However, existing self-supervised pre-training methods for point clouds still cannot be directly transferred to remote sensing scenes with multispectral point clouds, mainly due to two factors. First, although existing self-supervised pre-training methods for point clouds have achieved excellent results on indoor point cloud data, the diversity and complexity of ground objects in remote sensing scenes hinder the pre-trained model from learning general features. Second, and more importantly, existing self-supervised pre-training methods for point clouds mostly focus on learning a general representation in three-dimensional Euclidean space, neglecting the inherent spatial-spectral consistency of multispectral point cloud data. Therefore, how to improve the pre-trained model's ability to capture general features of multispectral remote sensing point clouds, effectively utilize the spatial-spectral consistency of multispectral point cloud data, and enhance the generalization and robustness of the pre-trained model are urgent technical problems to be solved.
Summary of the Invention
[0004] The technical problem this invention addresses is to propose a multispectral point cloud classification method based on spatial-spectral self-supervised pre-training. The method aims to improve the pre-trained model's ability to capture general features of multispectral point clouds in complex remote sensing scenarios and to make effective use of the spatial-spectral consistency of multispectral point cloud data. It provides an effective and targeted self-supervised pre-training paradigm for multispectral point cloud classification tasks in remote sensing scenarios. By simultaneously learning the implicit topological relationships between multispectral point cloud voxel blocks in 3D Euclidean space and spectral space, the method enhances the pre-trained model's understanding of complex multispectral remote sensing point cloud scenarios and improves its generalization and robustness.
[0005] The technical solution adopted in this invention is: a multispectral point cloud classification method based on spatial-spectral self-supervised pre-training. This method uses custom rules to preprocess and obtain the distance categories of multispectral point cloud voxel blocks in the three-dimensional Euclidean space domain and the spectral domain, serving as self-supervised signals for subsequent distance reconstruction. Spatial features of the voxel blocks are extracted separately and used as residual modules of the voxel spatial features during pre-training, and residual connections are made with voxel-level features to prevent feature forgetting. Then, using multispectral point cloud data as input, the inter-block distances of voxel blocks in the three-dimensional Euclidean space and spectral space are reconstructed through global and local feature extraction operations and graph modeling, respectively. The pre-trained model is loaded, and a learnable classification head is added for further training on the training set. Finally, a multispectral point cloud classification model is obtained, which takes multispectral point cloud data as input and outputs the multispectral point cloud classification results. The specific steps are as follows:
[0006] Step 1: Voxelize the multispectral point cloud sample to obtain voxel blocks, and preprocess the voxel blocks to obtain the self-supervised distance matrix between voxel blocks in three-dimensional Euclidean space and spectral space.
[0007] Step 2: Extract the voxel block spatial features of the voxelized multispectral point cloud samples;
[0008] Step 3: Extract the global and local features of the voxelized multispectral point cloud samples, perform feature pooling to obtain point-level feature representations, map the point-level feature representations to voxels, and perform residual connection with the voxel block spatial features to obtain point cloud voxel-level features.
[0009] Step 4: Based on the voxel-level features of the point cloud, the implicit topological relationships between voxel blocks in three-dimensional Euclidean space and spectral space are modeled using a graph structure, and the self-supervised distance matrix is used to constrain the model to learn these relationships, yielding the pre-trained model.
[0010] Step 5: Based on the pre-trained model, a learnable classification head is introduced for training to obtain a multispectral point cloud classification model, which is then used to classify multispectral point clouds.
[0011] Optionally, the expression for the voxelized multispectral point cloud sample is:
[0012]
[0013]
[0014]
[0015]
[0016] where the symbols in the expressions above denote, in order: the calculated voxel block index; the number of partitions used during voxelization; the mapping relationship between the multispectral point cloud and the voxel blocks; the minimum values of the three spatial coordinates of the multispectral point cloud, obtained by traversal; and the three-dimensional spatial feature of the current multispectral point.
[0017] Optionally, the preprocessing of the voxel blocks to obtain the self-supervised distance matrix between voxel blocks in three-dimensional Euclidean space and spectral space specifically involves:
[0018] Each voxel block obtained from the partitioning is used as a node in the graph, and the adjacency matrix in three-dimensional Euclidean space and spectral space is obtained through graph modeling.
[0019] In the three-dimensional Euclidean space, two voxel blocks are simultaneously magnified by a preset factor, and the adjacency matrix is obtained by judging whether they overlap.
[0020] In the spectral space, the adjacency matrix is obtained by defining the following expression:
[0021]
[0022] In the formula, the symbols denote, in order: the adjacency matrix elements; the intersection of the color histograms of two voxel blocks; the color histograms of the two voxel blocks in spectral space; and the intersection threshold of the color histograms. When the intersection of the color histograms is greater than 0.5, the two nodes are determined to be connected, which yields the adjacency matrix.
[0023] Based on the obtained adjacency matrix, the shortest path length between nodes in the three-dimensional Euclidean space and the spectral space are calculated respectively, and the shortest path length is used as the self-supervised distance matrix.
[0024] Optionally, Step 2 specifically includes:
[0025] The spatial information of the voxelized multispectral point cloud sample, that is, the 3D spatial features of the multispectral point cloud, is extracted. Based on the mapping from points to voxels, the spatial information of the voxel blocks is calculated, and the spatial features of the voxel blocks are extracted using a multilayer perceptron. The expression is:
[0026]
[0027]
[0028] where the symbols denote, in order: the average pooling of the point features in each voxel block into a voxel feature; the number of points contained in the current voxel block; a point belonging to the current voxel block; the feature set of the points in the current voxel block; the feature representation of a point in the current voxel block; the number of feature channels; the matrix transpose; the feature mapped to the high-dimensional voxel block space; and the multilayer perceptron.
[0029] Optionally, Step 3 specifically includes:
[0030] Step 3.1: Extract the global and local features of the voxelized multispectral point cloud samples. The expression is:
[0031]
[0032]
[0033]
[0034] where the symbols denote, in order: the feature representations of the neighboring points of the current query point; the feature representation of the current query point; the K-nearest-neighbors algorithm; the high-dimensional feature mapping of the global and local features for each center point; and the feature concatenation operation. Finally, feature pooling is performed to obtain the point-level feature representations;
[0035] Step 3.2: Using the mapping relationship between the multispectral point cloud and the voxel blocks, the point-level feature representations are mapped back to voxels and average-pooled into voxel feature representations. These are then connected residually with the voxel block spatial features to enhance the features of the voxel blocks in 3D Euclidean space, ultimately yielding the voxel-level features of the point cloud. The remaining symbol denotes the number of voxel blocks obtained from the division.
[0036] Optionally, Step 4 specifically includes:
[0037] Step 4.1: A dual-path structure is used to reconstruct the distance information of the graph structure in both spectral space and three-dimensional Euclidean space. The expression for graph modeling is:
[0038]
[0039] where the first symbol denotes the calculated edge weight between two nodes, and the remaining symbols denote the features of the current two nodes, obtained by traversal;
[0040] Step 4.2: In the spectral space and the three-dimensional Euclidean space, a multi-head graph attention mechanism is used to learn voxel-level features of the point cloud, and the distance category is reconstructed using the features. Cross-entropy loss is calculated with the self-supervised distance matrix to constrain the model to learn the implicit topological relationships in the spectral space and the three-dimensional Euclidean space at the same time.
[0041] Step 4.3: After completing the distance reconstruction, the voxel-level features of the point cloud obtained after the dual-path structure constraint are summed and depooled into point-level features, which are used as input features for the next layer iteration. After multiple iterations, the pre-trained model is obtained.
[0042] The beneficial effects of this invention are as follows. It uses self-supervised spatial-spectral distance reconstruction as the core supervisory signal to guide the network to simultaneously learn the topological relationships of voxel blocks in 3D Euclidean space and spectral space, significantly improving multi-scale semantic discrimination in complex remote sensing scenes. It models spatial geometry and spectral attributes separately using a dual-branch architecture and preserves fine-grained local geometry through residual channels for the voxel spatial features, enabling stable recognition of both small targets (such as vehicles and overhead lines) and large-scale structures. It uses the shortest path length as a proxy task and introduces adjacency constraints with multi-head graph attention and Gaussian weights, taking into account both nearest-neighbor details and long-range context, which improves boundary segmentation and the discrimination of similar objects across classes. It replaces purely supervised training with a self-supervised pre-training and fine-tuning paradigm, maintaining high accuracy even with a significant reduction in labeled samples and reducing the reliance on, and cost of, manual labeling. Finally, it uses voxel similarity defined by the intersection of spectral color histograms as a spectral prior, fully exploiting the spatial-spectral consistency of multispectral point clouds and improving the robustness and transferability of the feature representations.
Attached Figure Description
[0043] Figure 1 is a flowchart of the steps of the present invention;
[0044] Figure 2 is a visualization of the Harbor of Tobermory dataset, the multispectral point cloud remote sensing dataset used in this invention;
[0045] Figure 3 is a visualization of the multispectral point cloud classification results of the present invention and of the comparison methods.
Detailed Implementation
[0046] To enable those skilled in the art to better understand the present invention, the present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.
[0047] Example 1: As shown in Figure 1, a multispectral point cloud classification method based on spatial-spectral self-supervised pre-training includes the following specific steps:
[0048] Step 1: Voxelize the multispectral point cloud sample to obtain voxel blocks, and preprocess the voxel blocks to obtain the self-supervised distance matrix between voxel blocks in three-dimensional Euclidean space and spectral space.
[0049] Optionally, the expression for the voxelized multispectral point cloud sample is:
[0050]
[0051]
[0052]
[0053]
[0054] where the symbols in the expressions above denote, in order: the calculated voxel block index; the number of partitions used during voxelization, which is 5 in this embodiment, so that each input multispectral point cloud sample is divided into 125 voxel blocks; the mapping relationship between the multispectral point cloud and the voxel blocks; the minimum values of the three spatial coordinates of the multispectral point cloud, obtained by traversal; and the three-dimensional spatial feature of the current multispectral point.
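The voxelization described above can be illustrated with a minimal sketch. The source does not reproduce its expressions, so the details here are assumptions: the bounding box of the sample is split into equal intervals per axis, and the flat index layout `ix * n * n + iy * n + iz` and the function name `voxelize` are hypothetical.

```python
def voxelize(points, n=5):
    """Assign each (x, y, z) point to one of n**3 voxel blocks.

    Assumption: the bounding box is split into n equal intervals per axis
    and the flat index is ix * n * n + iy * n + iz.
    """
    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    # Guard against a zero extent along an axis.
    sizes = [((maxs[a] - mins[a]) / n) or 1.0 for a in range(3)]
    mapping = []
    for p in points:
        # Points on the upper boundary are clamped into the last interval.
        cell = [min(int((p[a] - mins[a]) / sizes[a]), n - 1) for a in range(3)]
        mapping.append(cell[0] * n * n + cell[1] * n + cell[2])
    return mapping


points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.5, 0.5, 0.5), (0.1, 0.9, 0.3)]
mapping = voxelize(points, n=5)  # point-to-voxel mapping, indices in 0..124
print(mapping)
```

With 5 partitions per axis the indices range over the 125 voxel blocks of this embodiment, and the returned list plays the role of the point-to-voxel mapping used in later steps.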
[0055] Optionally, the preprocessing of the voxel blocks to obtain the self-supervised distance matrix between voxel blocks in three-dimensional Euclidean space and spectral space specifically involves:
[0056] Each voxel block obtained from the partitioning is used as a node in the graph, and the adjacency matrix in three-dimensional Euclidean space and spectral space is obtained through graph modeling.
[0057] In three-dimensional Euclidean space, two voxel blocks are simultaneously enlarged by a preset factor, and the adjacency matrix is obtained by determining whether they overlap. In this embodiment, the preset factor is 1.2.
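The overlap test in Euclidean space can be sketched as follows. This is an illustration under assumptions not stated in the source: each voxel block is represented by the axis-aligned bounding box of its points, and the box is enlarged about its own center. The names `scale_box`, `boxes_overlap`, and `euclidean_adjacency` are hypothetical.

```python
def scale_box(box, factor=1.2):
    """Enlarge an axis-aligned box (lo, hi) about its centre (assumed rule)."""
    lo, hi = box
    centre = [(l + h) / 2 for l, h in zip(lo, hi)]
    half = [(h - l) / 2 * factor for l, h in zip(lo, hi)]
    return ([c - s for c, s in zip(centre, half)],
            [c + s for c, s in zip(centre, half)])


def boxes_overlap(a, b):
    """Two axis-aligned boxes overlap if their intervals overlap on every axis."""
    (alo, ahi), (blo, bhi) = a, b
    return all(alo[k] <= bhi[k] and blo[k] <= ahi[k] for k in range(3))


def euclidean_adjacency(boxes, factor=1.2):
    scaled = [scale_box(b, factor) for b in boxes]
    n = len(boxes)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if boxes_overlap(scaled[i], scaled[j]):
                adj[i][j] = adj[j][i] = 1
    return adj


boxes = [
    ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),
    ([1.1, 0.0, 0.0], [2.1, 1.0, 1.0]),  # separated from the first by a 0.1 gap
    ([5.0, 5.0, 5.0], [6.0, 6.0, 6.0]),  # far from both
]
adj = euclidean_adjacency(boxes)
print(adj)
```

In this toy input the first two boxes do not touch at their original size but do overlap once both are enlarged by 1.2, which is the effect the preset factor is meant to have.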
[0058] In the spectral space, the adjacency matrix is obtained by defining the following expression:
[0059]
[0060] In the formula, the symbols denote, in order: the adjacency matrix elements; the intersection of the color histograms of two voxel blocks; the color histograms of the two voxel blocks in spectral space; and the intersection threshold of the color histograms. When the intersection of the color histograms is greater than 0.5, the two nodes are determined to be connected, which yields the adjacency matrix.
[0061] Based on the obtained adjacency matrices, the shortest path length between nodes in 3D Euclidean space and in spectral space is calculated using a shortest path algorithm. In this embodiment, to bound the path length and to unify the distance categories to be reconstructed subsequently, the maximum shortest path length is defined as the number of partitions plus 2. This embodiment therefore uses 5 + 2 = 7 as the maximum shortest path length. The calculated shortest path lengths are saved as two matrices, one for 3D Euclidean space and one for spectral space, which are used as the self-supervised distance matrices for subsequent distance reconstruction.
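The spectral adjacency rule can be sketched with a histogram intersection. The 0.5 threshold comes from the text; everything else is an assumption made for illustration: spectral values are taken to lie in [0, 1], each band is binned separately into a hypothetical 8 bins, and the concatenated histogram is normalized to sum to 1 so that the intersection lies in [0, 1].

```python
def spectral_histogram(spectra, bins=8):
    """Concatenated per-band histogram, normalised to sum to 1 (assumed form)."""
    n_bands = len(spectra[0])
    hist = [0.0] * (bins * n_bands)
    for s in spectra:
        for b, v in enumerate(s):
            idx = min(int(v * bins), bins - 1)  # v is assumed to lie in [0, 1]
            hist[b * bins + idx] += 1.0
    total = sum(hist)
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))


def spectral_adjacency(histograms, threshold=0.5):
    n = len(histograms)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if histogram_intersection(histograms[i], histograms[j]) > threshold:
                adj[i][j] = adj[j][i] = 1
    return adj


# Three toy voxel blocks, each with two points and three spectral bands.
voxel_a = [(0.10, 0.20, 0.30), (0.12, 0.22, 0.31)]
voxel_b = [(0.11, 0.21, 0.33), (0.90, 0.20, 0.30)]
voxel_c = [(0.90, 0.90, 0.90), (0.95, 0.85, 0.90)]
hists = [spectral_histogram(v) for v in (voxel_a, voxel_b, voxel_c)]
spec_adj = spectral_adjacency(hists)
print(spec_adj)
```

Blocks with similar spectral distributions become connected even when they are far apart in space, which is what distinguishes this graph from the Euclidean one.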
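A capped shortest-path computation of this kind can be sketched with breadth-first search on an unweighted graph. The cap of 7 follows the embodiment; assigning the cap value to unreachable node pairs is an assumption, as are the names `shortest_path_matrix` and `chain`.

```python
from collections import deque


def shortest_path_matrix(adj, max_len=7):
    """Hop-count distances between all node pairs, clamped to max_len.

    Assumption: unreachable pairs also receive max_len.
    """
    n = len(adj)
    dist = [[max_len] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        seen = {src}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in seen:
                    seen.add(v)
                    dist[src][v] = min(dist[src][u] + 1, max_len)
                    queue.append(v)
    return dist


def chain(n):
    """Adjacency matrix of a path graph 0-1-...-(n-1), used only as toy input."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1
    return adj


toy = chain(11)
toy[9][10] = toy[10][9] = 0  # isolate the last node
dist = shortest_path_matrix(toy, max_len=7)
print(dist[0])
```

Under this reading the matrix entries take integer values from 0 to 7, so each voxel pair falls into one of eight distance categories, which is what makes a classification loss applicable during reconstruction.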
[0062] Step 2: Extract the voxel block spatial features of the voxelized multispectral point cloud samples;
[0063] Optionally, the spatial information of the voxelized multispectral point cloud sample, that is, the 3D spatial features of the multispectral point cloud, is extracted. Based on the mapping from points to voxels, the spatial information of the voxel blocks is calculated, and the spatial features of the voxel blocks are extracted using a multilayer perceptron. The expression is:
[0064]
[0065]
[0066] where the symbols denote, in order: the average pooling of the point features in each voxel block into a voxel feature; the number of points contained in the current voxel block; a point belonging to the current voxel block; the feature set of the points in the current voxel block; the feature representation of a point in the current voxel block; the number of feature channels, which is 3 in this embodiment; the matrix transpose; the feature mapped to the high-dimensional voxel block space; and the multilayer perceptron.
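The average pooling of point features into voxel features can be sketched as below. Only the pooling is shown; the multilayer perceptron that follows it is omitted. Returning zeros for empty voxel blocks is an assumption, and `voxel_mean_pool` is a hypothetical name.

```python
def voxel_mean_pool(features, mapping, num_voxels):
    """Average the features of all points that map to the same voxel block.

    Assumption: voxel blocks that contain no points get an all-zero feature.
    """
    channels = len(features[0])
    sums = [[0.0] * channels for _ in range(num_voxels)]
    counts = [0] * num_voxels
    for feat, v in zip(features, mapping):
        counts[v] += 1
        for c in range(channels):
            sums[v][c] += feat[c]
    return [
        [s / counts[v] if counts[v] else 0.0 for s in sums[v]]
        for v in range(num_voxels)
    ]


xyz = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0), (1.0, 3.0, 5.0)]  # 3 channels
point_to_voxel = [0, 0, 2]
voxel_xyz = voxel_mean_pool(xyz, point_to_voxel, num_voxels=3)
print(voxel_xyz)
```

The same pooling applies when point-level features are mapped back to voxels in Step 3.2, with the learned features in place of the raw coordinates.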
[0067] Step 3: Extract the global and local features of the voxelized multispectral point cloud samples, perform feature pooling to obtain point-level feature representations, map the point-level feature representations to voxels, and perform residual connection with the voxel block spatial features to obtain point cloud voxel-level features.
[0068] Step 3.1: Extract the global and local features of the voxelized multispectral point cloud samples. The expression is:
[0069]
[0070]
[0071]
[0072] where the symbols denote, in order: the feature representations of the neighboring points of the current query point; the feature representation of the current query point; the K-nearest-neighbors algorithm; the high-dimensional feature mapping of the global and local features for each center point; and the feature concatenation operation. Finally, feature pooling is performed to obtain the point-level feature representations;
[0073] Step 3.2: Using the mapping relationship between the multispectral point cloud and the voxel blocks, the point-level feature representations are mapped back to voxels and average-pooled into voxel feature representations. These are then connected residually with the voxel block spatial features to enhance the features of the voxel blocks in 3D Euclidean space, ultimately yielding the voxel-level features of the point cloud. The remaining symbol denotes the number of voxel blocks obtained from the division.
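The neighbor query and concatenation in Step 3.1 can be illustrated with a brute-force sketch. The source's expressions are not reproduced, so the specific form is an assumption: each point's own feature is concatenated with neighbor-minus-center offsets, and the result is max-pooled over the neighbors. The learned feature mapping is omitted, and `knn_indices` and `local_features` are hypothetical names.

```python
def knn_indices(points, k=20):
    """Indices of the k nearest neighbours of every point (brute force)."""
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        neighbours.append([j for _, j in ranked[:k]])
    return neighbours


def local_features(features, neighbours):
    """Concatenate centre features with neighbour offsets, then max-pool.

    Assumption: this concatenation and pooling choice is illustrative only.
    """
    pooled = []
    for i, nbrs in enumerate(neighbours):
        centre = features[i]
        edges = [
            list(centre) + [features[j][c] - centre[c] for c in range(len(centre))]
            for j in nbrs
        ]
        pooled.append([max(col) for col in zip(*edges)])
    return pooled


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
nbrs = knn_indices(pts, k=2)  # the experiments in the source use k = 20
pooled = local_features(pts, nbrs)
print(nbrs)
print(pooled[0])
```

The first half of each output vector carries the point's own (global-position) feature and the second half summarizes its local neighborhood, which is one common way to combine the two kinds of information.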
[0074] Step 4: Based on the voxel-level features of the point cloud, the implicit topological relationships between voxel blocks in three-dimensional Euclidean space and spectral space are modeled using a graph structure, and the self-supervised distance matrix is used to constrain the model to learn these relationships, yielding the pre-trained model.
[0075] Step 4.1: A dual-path structure is used to reconstruct the distance information of the graph structure in both spectral space and three-dimensional Euclidean space. The expression for graph modeling is:
[0076]
[0077] where the first symbol denotes the calculated edge weight between two nodes, and the remaining symbols denote the features of the current two nodes, obtained by traversal;
[0078] Step 4.2: In the spectral space and the three-dimensional Euclidean space, a multi-head graph attention mechanism is used to learn voxel-level features of the point cloud, and the distance category is reconstructed using the features. Cross-entropy loss is calculated with the self-supervised distance matrix to constrain the model to learn the implicit topological relationships in the spectral space and the three-dimensional Euclidean space at the same time.
[0079] Specifically, this embodiment takes distance reconstruction in spectral space as an example, and the specific process is as follows:
[0080]
[0081]
[0082]
[0083]
[0084] where the symbols denote, in order: the learned features; the multi-head graph attention mechanism; the probability distribution mapping function; and the self-supervised distance matrix in spectral space. The reconstruction of the distance categories in spectral space is supervised by the self-supervised distance matrix obtained in Step 1, and the loss is calculated using the cross-entropy loss function. The two loss terms represent the losses for distance reconstruction in Euclidean space and in spectral space, respectively, and the final term represents the total reconstruction loss of pre-training. Distance reconstruction in 3D Euclidean space is similar to that in spectral space and is not elaborated here.
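The distance-category loss can be sketched as a per-pair cross-entropy. The multi-head graph attention that produces the scores is not shown; the sketch starts from hypothetical per-pair logits over eight distance categories (0 to 7). Combining the two branch losses as an unweighted sum is an assumption, since the source's expression is not reproduced.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distance_ce_loss(logits, target):
    """Mean cross-entropy over all voxel pairs.

    logits[i][j] is a list of class scores for pair (i, j);
    target[i][j] is the integer shortest-path category for that pair.
    """
    total, count = 0.0, 0
    n = len(target)
    for i in range(n):
        for j in range(n):
            probs = softmax(logits[i][j])
            total -= math.log(probs[target[i][j]])
            count += 1
    return total / count


targets = [[0, 1], [1, 0]]  # toy 2-voxel distance matrix
uniform = [[[0.0] * 8 for _ in range(2)] for _ in range(2)]
loss_spectral = distance_ce_loss(uniform, targets)
loss_euclid = distance_ce_loss(uniform, targets)
# Assumption: the total reconstruction loss is the unweighted sum of both branches.
loss_total = loss_euclid + loss_spectral
print(loss_spectral, loss_total)
```

With uniform scores over eight categories each branch loss equals ln 8, which gives a convenient reference value for an untrained model.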
[0085] Step 4.3: After completing the distance reconstruction, the voxel-level features of the point cloud obtained through the dual-path structure are summed and depooled into point-level features, which are used as input features for the next iteration. After multiple iterations, the pre-trained model is obtained.
[0086] Specifically, taking the implementation in spectral space as an example, the constrained voxel-level features of the point cloud are obtained using the self-supervised distance matrix in spectral space as the constraint:
[0087]
[0088] where a Gaussian function with the given parameter is used to obtain the constrained voxel-level features of the point cloud. The operation in three-dimensional Euclidean space is the same as in spectral space and is not elaborated here.
[0089] Optionally, this embodiment employs a 3-layer autoencoder iteration to learn a general representation of multi-scale multispectral point clouds.
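The summation and depooling in Step 4.3 can be sketched as follows, assuming that the two branches output voxel features of equal shape and that depooling copies each voxel's feature to every point mapped to it. The names `sum_branches` and `depool` are hypothetical.

```python
def sum_branches(euclid_voxels, spectral_voxels):
    """Element-wise sum of the voxel features from the two branches."""
    return [
        [a + b for a, b in zip(e, s)]
        for e, s in zip(euclid_voxels, spectral_voxels)
    ]


def depool(voxel_features, mapping):
    """Copy each voxel feature back to every point that belongs to that voxel."""
    return [list(voxel_features[v]) for v in mapping]


euclid = [[1.0, 0.0], [0.0, 2.0]]    # two voxel blocks, two channels
spectral = [[0.5, 0.5], [1.0, 1.0]]
point_to_voxel = [0, 1, 1, 0]        # four points
point_features = depool(sum_branches(euclid, spectral), point_to_voxel)
print(point_features)
```

The resulting point-level features have one row per point, so they can serve directly as the input of the next layer in the iteration.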
[0090] Step 5: Based on the pre-trained model, a learnable classification head is introduced for training to obtain a multispectral point cloud classification model, which is then used to classify multispectral point clouds.
[0091] Specifically, in this embodiment, a pre-trained model is obtained by pre-training on a multispectral point cloud remote sensing benchmark dataset. The pre-trained model is then loaded, and the outputs of the three iterated autoencoder layers are merged, where the point-level features are generated by depooling the voxel features output by each autoencoder layer. Global features are then extracted based on the rotation invariance of the point cloud data, using a data expansion operation and a max pooling operation. Finally, the fused features are used as input to the classification head, and the multispectral point cloud classification model is trained.
[0092] Furthermore, in this embodiment, the total loss used for the gradient update of the multispectral point cloud classification model is defined in terms of two components: the cross-entropy loss calculated between the true labels and the predicted labels, and the total reconstruction loss from pre-training.
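The feature fusion before the classification head can be sketched under stated assumptions, since the source's expressions are not reproduced: the per-layer point-level features are merged by channel-wise concatenation, a global feature is obtained by max pooling over all points, and that global feature is expanded to every point and concatenated. The function names are hypothetical.

```python
def merge_layers(layers):
    """Concatenate the point-level features of all layers channel-wise (assumed)."""
    n_points = len(layers[0])
    return [[x for layer in layers for x in layer[i]] for i in range(n_points)]


def global_max(features):
    """Max pooling over points, one value per channel."""
    return [max(col) for col in zip(*features)]


def fuse(layers):
    merged = merge_layers(layers)
    g = global_max(merged)
    # Expansion: the same global vector is appended to every point.
    return [row + g for row in merged]


# Three layers, two points, one channel per layer (toy sizes).
layers = [
    [[1.0], [2.0]],
    [[0.0], [5.0]],
    [[3.0], [1.0]],
]
fused = fuse(layers)
print(fused)
```

Each fused row then carries both the point's own multi-layer features and a summary of the whole sample.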
[0093] Based on the specific implementation details, the effectiveness of the technical solution of the present invention will be demonstrated through experiments.
[0094] 1. Experimental Data
[0095] Harbor of Tobermory dataset (HT): As shown in Figure 2, this dataset is a real multispectral point cloud remote sensing dataset. HT contains 7,181,982 spatial points collected by the Optech Titan three-channel LiDAR system, which acquires data in three spectral bands (532 nm, 1064 nm, and 1550 nm). The dataset is manually labeled into nine land cover categories: shrubs, buildings, vehicles, grasslands, power lines, roads, boats, trees, and water.
[0096] 2. Experimental Setup
[0097] The experimental network is implemented in PyTorch 2.1.1 (built with CUDA 11.8 support), with hardware acceleration enabled via the NVIDIA 535.98 driver. The same evaluation metrics were used throughout the experiments: overall accuracy (OA), average F1 score (Avg. F1), and mean intersection over union (mIoU). SGD was used as the training optimizer, with an initial learning rate of 0.001, a total of 100 epochs, and a K-nearest-neighbor query parameter of k = 20. The pre-training hyperparameters were the same as those used for training the multispectral classification network.
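The three evaluation metrics can be computed from a confusion matrix using their standard definitions, as in the sketch below. Scoring a class with no true or predicted samples as 0 is an assumption, and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1  # rows: true class, columns: predicted class
    return cm


def classification_metrics(y_true, y_pred, num_classes):
    """Return (overall accuracy, average F1, mean IoU)."""
    cm = confusion_matrix(y_true, y_pred, num_classes)
    total = sum(sum(row) for row in cm)
    oa = sum(cm[c][c] for c in range(num_classes)) / total
    f1s, ious = [], []
    for c in range(num_classes):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(num_classes)) - tp
        fn = sum(cm[c]) - tp
        # Assumption: a class with no samples and no predictions scores 0.
        f1s.append(2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0)
        ious.append(tp / (tp + fp + fn) if (tp + fp + fn) else 0.0)
    return oa, sum(f1s) / num_classes, sum(ious) / num_classes


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
oa, avg_f1, miou = classification_metrics(y_true, y_pred, num_classes=3)
print(oa, avg_f1, miou)
```

Averaging F1 and IoU per class rather than per point means that small categories such as vehicles and power lines weigh as much as large ones, which is why these metrics complement overall accuracy.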
[0098] 3. Experimental Procedure
[0099] S1: The training and test sets are divided according to the distribution of land cover categories to avoid class imbalance. The 3D spatial features of the point cloud are min-max normalized, and the three-band spectral features are scaled to the [0, 1] interval. Multispectral point cloud samples containing 4096 points each are then generated; in samples with fewer than 4096 points, points are randomly repeated. The pre-training method proposed in this invention is used to pre-train on the divided benchmark dataset, with the data flow shown by the dashed arrows in Figure 1. The number of partitions is set to 5, dividing each sample into 125 voxel blocks, and the mapping between points and voxel blocks is recorded. The two self-supervised distance matrices are then calculated according to the custom rules. The adjacency matrix in Euclidean space is obtained by scaling two voxel blocks by a factor of 1.2 and determining whether they overlap. The adjacency matrices calculated in both spaces are treated as graph structures, and the shortest path lengths between any two nodes in each graph are calculated using a shortest path algorithm. Finally, the results are saved as two matrices;
[0100] S2: Extract the voxel block spatial features of the voxelized multispectral point cloud sample;
[0101] S3: Extract the global and local features of the voxelized sample, with the K-nearest-neighbor query parameter set to k = 20. The point-level features are then mapped onto voxels and connected residually with the voxel block spatial features to obtain the voxel-level features of the point cloud;
[0102] S4: Based on the voxel-level features of the point cloud, the implicit topological relationships between voxel blocks in 3D Euclidean space and spectral space are modeled using graph structures, and the self-supervised distance matrix is used to constrain the model to learn these relationships, resulting in a pre-trained model;
[0103] S5: Load the pre-trained model and train it to obtain a multispectral point cloud classification model, with the data flow shown by the arrows in Figure 1. After the pre-trained model is loaded and steps S1-S4 are carried out, the output of each autoencoder layer is depooled into point-level features and fused. Finally, the fused features are input into the classification head, and training is performed under a cross-entropy loss constraint, whose expression is defined in terms of the predicted value of the classification head, the true value, and the number of categories; the total loss is then calculated.
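The sample preparation in S1 can be sketched as follows. Min-max normalization and random repetition up to 4096 points follow the text; truncating samples that already have more than the target number of points is an assumption, since the source only describes the smaller case. The function names and the fixed seed are hypothetical.

```python
import random


def min_max_normalise(points):
    """Scale every coordinate axis to the [0, 1] interval."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    return [
        tuple((p[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
              for d in range(dims))
        for p in points
    ]


def pad_to_size(points, size=4096, seed=0):
    """Randomly repeat points until the sample holds exactly `size` points.

    Assumption: samples that are already large enough are simply truncated.
    """
    rng = random.Random(seed)
    if len(points) >= size:
        return list(points[:size])
    extra = [rng.choice(points) for _ in range(size - len(points))]
    return list(points) + extra


raw = [(0.0, 0.0, 0.0), (2.0, 4.0, 8.0), (1.0, 2.0, 4.0)]
normalised = min_max_normalise(raw)
sample = pad_to_size(normalised, size=4096)
print(normalised, len(sample))
```

Because repeated points are exact copies, they fall into the same voxel block as their originals and do not change the voxel-level averages beyond reweighting.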
[0104] 4. Experimental Results
[0105] Following the above steps, experiments were conducted on the Harbor of Tobermory dataset to verify the predictions. The experimental results are shown in Table 1. The visualizations of the dataset's ground truth labels and of the labels predicted by each method are shown in Figure 3.
[0106] Table 1. Prediction performance of the present invention and the comparison method on real multispectral point cloud datasets.
[0107]
[0108] In Table 1, the best value for each indicator is displayed in bold. The present invention improves the classification performance on the dataset to varying degrees, with an overall accuracy improvement of 14.6%, an average F1 score improvement of 17%, and a mean intersection over union improvement of 20%. This indicates that the present invention captures the complex spatial-spectral features of large-scale multispectral remote sensing point cloud data better than existing technologies. The visualization in Figure 3 compares the prediction results of this invention with those of other classification methods and demonstrates its advantages more intuitively. As shown in Figure 3, this invention not only classifies large-scale categories such as buildings more accurately, but also classifies fine-grained ground features (vehicles, power lines) more accurately. This shows that, compared with existing technologies, the predictions of this invention are closer to the real scene.
[0109] In summary, this invention preprocesses multispectral point cloud voxel blocks using custom rules, extracting their distance categories in 3D space and the spectral domain as self-supervised signals; extracts spatial features of the voxel blocks and uses them as residual modules to prevent feature forgetting; inputs them into an autoencoder, reconstructing the distances of the voxel blocks in spatial and spectral spaces through global and local feature extraction and graph modeling; loads a pre-trained model, adds a classification head, and further trains it; finally, uses the trained model for point cloud classification. Compared with existing technologies, this invention significantly improves the pre-trained model's ability to capture features of multispectral point cloud data, reduces dependence on label data, lowers costs, and significantly improves the performance of classification tasks through excellent generalization and robustness.
[0110] The specific embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the present invention is not limited to the above embodiments. Within the scope of knowledge possessed by those skilled in the art, various changes can be made without departing from the spirit of the present invention.
Claims
1. A multispectral point cloud classification method based on spatial-spectral self-supervised pre-training, characterized in that it comprises the following steps:
Step 1: Voxelize the multispectral point cloud sample to obtain voxel blocks, and preprocess the voxel blocks to obtain the self-supervised distance matrices between voxel blocks in three-dimensional Euclidean space and in spectral space;
Step 2: Extract the voxel block spatial features of the voxelized multispectral point cloud sample;
Step 3: Extract the global and local features of the voxelized multispectral point cloud sample, perform feature pooling to obtain point-level feature representations, map the point-level feature representations to voxels, and apply a residual connection with the voxel block spatial features to obtain the voxel-level features of the point cloud;
Step 4: Based on the voxel-level features of the point cloud, model the implicit topological relationships between voxel blocks in three-dimensional Euclidean space and in spectral space with a graph structure, and use the self-supervised distance matrices to constrain the model to learn these implicit topological relationships, thereby obtaining the pre-trained model;
Step 5: Based on the pre-trained model, introduce a learnable classification head and train it to obtain a multispectral point cloud classification model, which is then used to classify multispectral point clouds.
Step 4 specifically comprises:
Step 4.1: A dual-path structure is used to reconstruct the distance information of the graph structure in both spectral space and three-dimensional Euclidean space. The expression for graph modeling is: [formula not reproduced]; where the terms denote, respectively, the computed edge weight between two nodes and the features of the two nodes currently being traversed;
Step 4.2: In spectral space and in three-dimensional Euclidean space, a multi-head graph attention mechanism is used to learn the voxel-level features of the point cloud, and the distance categories are reconstructed from these features. The cross-entropy loss against the self-supervised distance matrices is calculated to constrain the model to learn the implicit topological relationships in spectral space and in three-dimensional Euclidean space simultaneously;
Step 4.3: After the distance reconstruction is completed, the voxel-level features of the point cloud obtained under the dual-path constraint are summed and unpooled into point-level features, which serve as the input features for the next layer iteration. After multiple iterations, the pre-trained model is obtained.
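The distance-reconstruction constraint of Step 4.2 can be illustrated with a minimal sketch. It assumes that each shortest-path length is used directly as a class index, that the diagonal and unreachable pairs (marked with a negative value) are ignored, and that the losses of the two paths are simply added; none of these details is stated in the claim, and the function names are hypothetical. The per-pair logits would come from the graph attention network, which is not modeled here.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one softmax prediction against an integer class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def distance_reconstruction_loss(pair_logits, distance_matrix):
    """Mean cross-entropy over node pairs.

    pair_logits[i][j] is a list of scores over distance classes;
    distance_matrix[i][j] is the self-supervised target class.
    Assumption: diagonal and negative (unreachable) entries are skipped.
    """
    total, count = 0.0, 0
    n = len(distance_matrix)
    for i in range(n):
        for j in range(n):
            if i == j or distance_matrix[i][j] < 0:
                continue
            total += cross_entropy(pair_logits[i][j], distance_matrix[i][j])
            count += 1
    return total / count if count else 0.0

def dual_path_loss(spatial_logits, spatial_dist, spectral_logits, spectral_dist):
    """Assumption: the spatial and spectral path losses are added with equal weight."""
    return (distance_reconstruction_loss(spatial_logits, spatial_dist)
            + distance_reconstruction_loss(spectral_logits, spectral_dist))
```

With uniform scores over three distance classes, each path contributes a loss of log 3, which is the value an untrained model would be expected to start from.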
2. The multispectral point cloud classification method based on spatial-spectral self-supervised pre-training as described in claim 1, characterized in that the voxelized multispectral point cloud sample is expressed as: [formulas not reproduced]; where the terms denote, respectively: the computed voxel block index; the number of partitions used during voxelization; the mapping between multispectral points and voxel blocks; the minimum values of the three-dimensional spatial features over the points contained in the current multispectral point cloud sample, obtained by traversal; and the three-dimensional spatial feature of the current multispectral point.
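A minimal sketch of one way such a voxelization could be computed is given below. It assumes a uniform grid with the same number of partitions along each axis and a row-major linear voxel index; the claim's own formulas are not available, so the function name `voxelize` and the exact index formula are illustrative assumptions only.

```python
def voxelize(points, n_parts):
    """Map each (x, y, z) point to a voxel block index on an n_parts^3 grid.

    Assumptions (not from the claim): uniform cells spanning the bounding box,
    and a linear index ix + iy * n_parts + iz * n_parts**2.
    """
    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    # Cell edge length per axis; fall back to 1.0 when an axis has zero extent.
    sizes = [((maxs[a] - mins[a]) / n_parts) or 1.0 for a in range(3)]
    mapping = []
    for p in points:
        # Clamp so that points on the upper boundary fall into the last cell.
        cell = [min(int((p[a] - mins[a]) / sizes[a]), n_parts - 1) for a in range(3)]
        mapping.append(cell[0] + cell[1] * n_parts + cell[2] * n_parts * n_parts)
    return mapping

points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.1, 0.1, 0.1), (0.9, 0.0, 0.0)]
point_to_voxel = voxelize(points, 2)  # one voxel index per input point
```

The returned list plays the role of the point-to-voxel mapping that later steps use to pool point features into voxel blocks.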
3. The multispectral point cloud classification method based on spatial-spectral self-supervised pre-training as described in claim 1, characterized in that preprocessing the voxel blocks to obtain the self-supervised distance matrices between voxel blocks in three-dimensional Euclidean space and in spectral space specifically comprises: each voxel block obtained from the partitioning is used as a node in a graph, and the adjacency matrices in three-dimensional Euclidean space and in spectral space are obtained through graph modeling. In three-dimensional Euclidean space, the two voxel blocks are both enlarged by a preset factor, and the adjacency matrix is obtained by determining whether they overlap. In spectral space, the adjacency matrix is defined by the following expression: $A_{ij} = 1$ if $I(H_i, H_j) > \tau$, and $A_{ij} = 0$ otherwise; where $A_{ij}$ are the adjacency matrix elements, $I(H_i, H_j)$ is the intersection of the color histograms of the two voxel blocks, $H_i$ and $H_j$ are the color histograms of the two voxel blocks in spectral space, and $\tau$ is the intersection threshold of the color histograms. When the intersection of the color histograms is greater than 0.5, the two nodes are determined to be connected, which yields the adjacency matrix. Based on the adjacency matrices obtained, the shortest path lengths between nodes in three-dimensional Euclidean space and in spectral space are calculated respectively, and these shortest path lengths are used as the self-supervised distance matrices.
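The preprocessing described in this claim can be sketched as follows. The sketch assumes normalized histograms, takes the histogram intersection to be the sum of bin-wise minima (a common definition that the claim does not spell out), scales each axis-aligned voxel box about its own center, and marks unreachable node pairs with -1. All function names are hypothetical.

```python
from collections import deque

def scaled_boxes_overlap(box_a, box_b, factor):
    """box = (min_xyz, max_xyz). Enlarge both boxes about their centers, then test overlap."""
    def scale(box):
        lo, hi = box
        centers = [(lo[a] + hi[a]) / 2 for a in range(3)]
        halves = [(hi[a] - lo[a]) / 2 * factor for a in range(3)]
        return ([c - h for c, h in zip(centers, halves)],
                [c + h for c, h in zip(centers, halves)])
    (alo, ahi), (blo, bhi) = scale(box_a), scale(box_b)
    return all(alo[a] <= bhi[a] and blo[a] <= ahi[a] for a in range(3))

def histogram_intersection(h1, h2):
    """Assumed definition: sum of bin-wise minima of two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def spectral_adjacency(histograms, threshold=0.5):
    """Connect two voxel blocks when their histogram intersection exceeds the threshold."""
    n = len(histograms)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if histogram_intersection(histograms[i], histograms[j]) > threshold:
                adj[i][j] = adj[j][i] = 1
    return adj

def shortest_path_matrix(adj, unreachable=-1):
    """Breadth-first search from every node; entries are hop counts."""
    n = len(adj)
    dist = [[unreachable] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[s][v] == unreachable:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist
```

The same `shortest_path_matrix` routine can be applied to the Euclidean adjacency matrix built with `scaled_boxes_overlap`, giving one distance matrix per space.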
4. The multispectral point cloud classification method based on spatial-spectral self-supervised pre-training as described in claim 2, characterized in that Step 2 specifically comprises: the spatial information of the voxelized multispectral point cloud sample, that is, the three-dimensional spatial features of the multispectral point cloud, is extracted [symbol not reproduced]. Based on the mapping from points to voxels, the spatial information of the voxel blocks is calculated, and the spatial features of the voxel blocks are extracted using a multilayer perceptron. The expression is: [formulas not reproduced]; where the terms denote, respectively: the operation that average-pools the point features within each voxel block into a voxel feature; the number of points contained in the current voxel block; a point belonging to the current voxel block; the feature set of the points in the current voxel block; the feature representation of a point in the current voxel block; the number of feature channels; the matrix transpose; the feature mapped to the high-dimensional voxel block space; and the multilayer perceptron.
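As a rough illustration of this step, the sketch below average-pools per-point spatial features into voxel blocks and then applies a single linear layer with ReLU as a stand-in for the multilayer perceptron. The weights, the layer count, and the function names are assumptions made for the example; the claim does not specify them.

```python
def mean_pool_to_voxels(point_feats, mapping, n_voxels):
    """Average the features of all points assigned to each voxel block.

    mapping[i] is the voxel index of point i. Empty voxels get zero vectors
    (an assumption; the claim does not say how empty blocks are handled).
    """
    dim = len(point_feats[0])
    sums = [[0.0] * dim for _ in range(n_voxels)]
    counts = [0] * n_voxels
    for feat, v in zip(point_feats, mapping):
        counts[v] += 1
        for c in range(dim):
            sums[v][c] += feat[c]
    return [[s / counts[v] if counts[v] else 0.0 for s in sums[v]]
            for v in range(n_voxels)]

def linear_relu(vec, weights, bias):
    """One perceptron layer: ReLU(W @ vec + b), with W given as a list of rows."""
    return [max(0.0, sum(w * x for w, x in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]

xyz = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0], [4.0, 0.0, 2.0]]
voxel_xyz = mean_pool_to_voxels(xyz, mapping=[0, 0, 1], n_voxels=3)
# Hypothetical 3-to-2 layer weights, for illustration only.
voxel_spatial = [linear_relu(v, [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0]], [0.5, 0.0])
                 for v in voxel_xyz]
```

In a trained model the layer weights would be learned, and several layers would normally be stacked to reach the high-dimensional voxel feature space.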
5. The multispectral point cloud classification method based on spatial-spectral self-supervised pre-training as described in claim 4, characterized in that Step 3 specifically comprises: Step 3.1: Extract the global and local features of the voxelized multispectral point cloud sample. The expression is: [formulas not reproduced]; where the terms denote, respectively: the feature representations of the neighboring points of the current query point; the feature representation of the current query point; the k-nearest neighbors algorithm; the high-dimensional feature mapping of the global and local features for each center point; and the feature concatenation operation. Finally, feature pooling is performed to obtain the point-level feature representations: [formula not reproduced]; Step 3.2: Using the mapping between multispectral points and voxel blocks, the point-level feature representations are mapped back to voxels and average-pooled into voxel feature representations, which are then combined with the voxel block spatial features through a residual connection to enhance the features of the voxel blocks in three-dimensional Euclidean space, ultimately yielding the voxel-level features of the point cloud: [formula not reproduced], where the remaining term denotes the number of voxel blocks obtained from the partitioning.
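Steps 3.1 and 3.2 can be sketched under several assumptions: neighbors are found by brute-force k-nearest-neighbor search on the coordinates, the local feature is the neighbor feature minus the center feature, the center and local features are concatenated, and max pooling is used over the neighbors. The pooling operator and the exact feature construction are not recoverable from the claim, so this is an illustrative stand-in with hypothetical function names, not the claimed formulas.

```python
def knn_indices(points, query_idx, k):
    """Indices of the k points closest to points[query_idx] (excluding itself)."""
    q = points[query_idx]
    others = [i for i in range(len(points)) if i != query_idx]
    others.sort(key=lambda i: sum((points[i][a] - q[a]) ** 2 for a in range(3)))
    return others[:k]

def local_global_feature(points, feats, query_idx, k):
    """Concatenate the center feature with neighbor offsets, then max-pool over neighbors."""
    center = list(feats[query_idx])
    pooled = None
    for j in knn_indices(points, query_idx, k):
        offset = [fj - fc for fj, fc in zip(feats[j], center)]
        edge = center + offset  # feature concatenation
        pooled = edge if pooled is None else [max(a, b) for a, b in zip(pooled, edge)]
    return pooled  # None when the point has no neighbors

def residual_voxel_features(point_feats, mapping, spatial_feats):
    """Average-pool point features per voxel and add the voxel spatial features."""
    n_voxels, dim = len(spatial_feats), len(spatial_feats[0])
    sums = [[0.0] * dim for _ in range(n_voxels)]
    counts = [0] * n_voxels
    for feat, v in zip(point_feats, mapping):
        counts[v] += 1
        for c in range(dim):
            sums[v][c] += feat[c]
    return [[(sums[v][c] / counts[v] if counts[v] else 0.0) + spatial_feats[v][c]
             for c in range(dim)] for v in range(n_voxels)]

points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
feats = [[1.0], [3.0], [10.0]]
point_feature = local_global_feature(points, feats, query_idx=0, k=2)
voxel_features = residual_voxel_features(
    [[1.0, 9.0], [3.0, 1.0], [2.0, 2.0]], [0, 0, 1], [[0.5, 0.5], [1.0, 1.0]])
```

The residual addition requires the pooled point features and the voxel block spatial features to have the same number of channels, which in practice would be arranged by the preceding perceptron layers.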