Power transmission line point cloud segmentation method and device based on geometric enhancement and confusion constraint
By explicitly augmenting the input with multi-scale geometric features and height-normalized features, and by training with a combined loss function together with a confusion pair constraint loss, the method addresses the difficulty of identifying fine linear objects and the class imbalance in point cloud segmentation of power transmission lines. It achieves high-precision, robust segmentation results suitable for automated equipment identification and safety assessment of power transmission lines.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HARBIN INST OF TECH AT WEIHAI
- Filing Date
- 2026-05-22
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies for semantic segmentation of point clouds of transmission lines suffer from problems such as insufficient representation of features of small linear objects, class imbalance, difficulty in distinguishing easily confused spatial structures, and data augmentation strategies that disrupt scene structure, resulting in inaccurate and unreasonable segmentation results.
Multi-scale geometric features and height-normalized features are explicitly added to the input, and the network is trained with a combined loss function plus a confusion pair constraint loss. A structure-preserving training strategy is used, and structured post-processing is performed using transmission line domain knowledge, forming a closed-loop collaborative technical system.
It significantly improves the recognition accuracy of small linear targets and rare categories, reduces the inter-class confusion rate, and enhances the physical plausibility and stability of the segmentation results, making it suitable for automated equipment identification and safety assessment of power transmission lines.
Figure CN122244078A_ABST
Description
Technical Field
[0001] This invention relates to a method and apparatus for point cloud segmentation of transmission lines based on geometric enhancement and confusion constraints. It belongs to the field of image data processing technology and is applicable to automated equipment identification, safety assessment, defect detection and digital twin modeling scenarios in intelligent inspection of transmission lines.
Background Technology
[0002] Semantic segmentation of 3D laser point clouds for power transmission lines is a key technology in intelligent inspection of the power industry. This task aims to automatically identify and classify point cloud data of power transmission line corridors collected by airborne or ground-based LiDAR into various semantic categories such as towers, conductors, insulators, jumpers, optical cables, ground wires, and vegetation, providing basic data for line safety assessment, defect detection, and digital twin modeling.
[0003] The closest existing approach is to directly apply general-purpose 3D point cloud semantic segmentation networks (such as Point Transformer V3, PointNet++, and RandLA-Net) to power transmission line scenes. These methods use the 3D coordinates (XYZ) of points as input features, learn the spatial representation of point clouds through deep networks, and are trained using the standard cross-entropy loss function. The original paper for Point Transformer V3, "Point Transformer V3: Simpler, Faster, Stronger," was published at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) in June 2024. As a leading point cloud processing backbone, this network achieves efficient global feature learning through a serialized attention mechanism and delivers excellent segmentation performance on general benchmarks such as indoor scenes and autonomous driving scenes.
[0004] Existing technologies, when applied to semantic segmentation of point clouds in power transmission lines, suffer from the following systemic defects caused by the specific characteristics of the scenario: 1. Relying solely on coordinate features leads to insufficient representation of fine linear objects: The standard Point Transformer V3 only uses XYZ 3D coordinates as input features (3 channels). However, linear objects such as conductors, jumpers, optical cables, and ground wires in power transmission line scenarios appear as extremely fine lines in 3D space, making them difficult to distinguish effectively using coordinate information alone. These linear objects possess unique local geometric features (such as high linearity and low flatness), but the standard network lacks the ability to explicitly extract and utilize these geometric priors, resulting in insufficient representation of the features of fine objects.
[0005] 2. Standard cross-entropy loss cannot handle extreme class imbalance: Severe class imbalance exists in power transmission line scenarios. For example, in a typical scene, vegetation points can number in the hundreds of thousands, while rare categories such as jumpers and optical cables may have only tens to hundreds of points. The standard cross-entropy loss function assigns the same weight to all samples, so training is dominated by the majority classes, resulting in extremely low recall for rare classes or even complete failure to identify them.
[0006] 3. Lack of explicit discriminative constraints for easily confused spatial structure categories: Transmission line scenarios contain numerous pairs of objects with similar geometric shapes but different semantics, such as jumpers and conductors (both linear suspensions), optical cables and ground wires (both thin linear overhead structures), and V-string insulators and tension insulators (both string suspensions). Standard loss functions only optimize global classification accuracy, lacking specific constraints for these particular confused pairs, leading to frequent misclassifications between these categories.
[0007] 4. General data augmentation strategies disrupt the inherent spatial structure of transmission lines: The standard Point Transformer V3 employs data augmentation strategies such as random rotation, scaling, flipping, and elastic deformation. However, transmission line scenarios have strict physical spatial structure constraints (such as horizontally suspended conductors, vertically erected towers, and insulators connecting towers and conductors). Excessive geometric augmentation can disrupt these inherent spatial relationships, thereby reducing the model's ability to understand the scene structure.
Summary of the Invention
[0008] The purpose of this invention is to overcome the shortcomings of the prior art and provide a method and apparatus for segmenting transmission line point clouds based on geometric enhancement and confusion constraints. This solves the technical problems in the prior art of analyzing the geometric structure of transmission line point clouds, such as difficulty in identifying small linear targets, extreme class imbalance, high inter-class confusion rate, and insufficient physical rationality of the segmentation results.
[0009] The technical solution adopted by this invention to solve its technical problem is as follows: On the one hand, a point cloud segmentation method for transmission lines based on geometric enhancement and confusion constraints is provided, including the following core steps: Step 1, Point Cloud Preprocessing and Enhanced Feature Extraction: The original 3D point cloud of the input transmission line is preprocessed to extract multi-scale geometric features and height-normalized features, and the original 3D coordinate features are expanded into a multi-dimensional enhanced feature vector containing the coordinates, the multi-scale geometric features, and the height-normalized features. Step 2, Serialized Attention Deep Feature Learning: The multi-dimensional enhanced feature vector is input into the improved Point Transformer V3 backbone network, deep feature learning is performed through a multi-level serialized encoder-decoder structure, and the geometric structure feature map of each point in the power transmission line scene is output. Step 3, Joint Training with Combined Loss and Confusion Constraint: The combined loss function and the confusion pair constraint loss are used for joint training to optimize the network parameters. The confusion pair constraint loss serves as an auxiliary loss that applies margin constraints to predefined pairs of easily confused spatial structure categories. Step 4, Post-processing of the Power Transmission Scene: Based on knowledge of the power transmission line domain, the geometric structure feature map is post-processed to obtain the final geometric structure segmentation result at the original point level.
[0010] Furthermore, in step 3, the combined loss function is a weighted sum of Lovász-Softmax loss, Focal loss, and class balance loss, and the auxiliary loss is a hinge constraint loss for 16 predefined easily confused spatial structure class pairs. The joint optimization improves the model's recall rate for rare classes and its discriminative power for easily confused spatial structure classes.
[0011] Furthermore, a structure-preserving training strategy is adopted during the joint training process, which specifically includes: removing all random geometric data augmentation operations and adopting a preset sampling and pruning strategy to preserve the inherent spatial structure of the transmission line scenario.
[0012] Furthermore, in step 4, the structured post-processing sequentially performs six stages: global KNN voting smoothing, parallel line hierarchical prior correction, line segment connectivity merging and cleaning, intra-segment majority voting unification, tower 3D template matching constraint, and tower edge vegetation correction, systematically correcting physical inconsistencies in the model prediction.
[0013] On the other hand, a transmission line point cloud segmentation device based on geometric enhancement and confusion constraints is provided, including a feature enhancement module, a geometric structure learning module, a joint training optimization module, and a structured post-processing module, to implement the above method.
[0014] On the other hand, an electronic device and a computer-readable storage medium for implementing the above method are provided.
[0015] One of the above technical solutions has the following advantages or beneficial effects: 1. The explicit addition of multi-scale geometric features and height-normalized features encodes the geometric priors and vertical spatial distribution priors of linear, planar, and scattered objects in the power transmission line scene. This greatly reduces the difficulty of feature learning for small linear targets and rare categories, and significantly improves the recognition accuracy of rare linear targets such as jumpers and optical cables.
[0016] 2. The triple combination loss function works synergistically from three orthogonal dimensions: IoU optimization, hard example focus, and class balance. It effectively addresses the extreme class imbalance problem in the transmission line scenario, significantly improves the recall rate of rare classes, and the overall mIoU is significantly better than the general segmentation scheme.
[0017] 3. The confusion pair constraint loss, through its hinge mechanism, specifically forces a wider decision margin between easily confused spatial structure categories. This addresses, at the training level, the frequent confusion between jumpers and conductors, optical cables and ground wires, and different types of insulators, and significantly reduces the inter-class misclassification rate.
[0018] 4. The structure-preserving training strategy removes random augmentations that disrupt the scene structure, preserving the inherent spatial relationships of the transmission line conductor suspension and the vertical distribution of towers. This results in stronger generalization and stability of the model under different acquisition conditions.
[0019] 5. The six-stage structured post-processing is based on the physical topology of transmission lines, systematically correcting physical inconsistencies in the output of deep learning models, and the segmentation results are more in line with the actual engineering needs of power line inspection.
[0020] 6. The technical modules of this invention are not simply functional additions, but form a closed-loop synergistic technical system: the multi-scale geometric features and height-normalized features added at the input end provide the deep learning network with geometric and spatial priors specific to power transmission scenarios, significantly reducing the difficulty of feature learning for rare categories and small targets; the combined loss function optimizes the gradient update direction of the model to address the class imbalance problem, while the confusion pair constraint loss specifically targets highly similar categories that the prior features still cannot completely distinguish. The two work together to achieve a training effect of "global balance optimization + local accurate differentiation"; the structured post-processing, based on the physical topology of power transmission lines, systematically corrects the prediction results output by the network, making up for the shortcomings of deep learning models in terms of physical plausibility. Finally, a full-link synergy of "prior enhancement - training optimization - structural correction" is formed, producing a high-precision and highly robust segmentation effect for power transmission line scenarios that existing general segmentation schemes do not achieve.
Attached Figure Description
[0021] Figure 1 is a flowchart illustrating a point cloud segmentation method for transmission lines based on geometric enhancement and confusion constraints, according to an exemplary embodiment. Figure 2 is a schematic diagram of a point cloud segmentation device for transmission lines based on geometric enhancement and confusion constraints, according to an exemplary embodiment. Figure 3 is a flowchart illustrating a specific implementation of point cloud segmentation for transmission lines based on geometric enhancement and confusion constraints, according to an exemplary embodiment. Figure 4 is a flowchart illustrating a specific implementation of multi-scale geometric feature enhancement and height-normalized feature extraction, according to an exemplary embodiment. Figure 5 is a flowchart illustrating a specific implementation of model training based on a combined loss function and confusion pair constraints, according to an exemplary embodiment. Figure 6 is a flowchart illustrating a specific implementation of a domain knowledge-based structured post-processing method, according to an exemplary embodiment.
Detailed Implementation
[0022] To more clearly illustrate the technical features of the present invention, the invention will be described in detail below through specific embodiments and in conjunction with the accompanying drawings. The following disclosure provides many different embodiments or examples for implementing different structures of the present invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Of course, these are merely examples and are not intended to limit the invention.
[0023] Definitions: 1. Point Cloud Semantic Segmentation: A computer vision task that assigns semantic category labels to each point in a 3D laser point cloud. In a power transmission line scenario, each point in the point cloud needs to be identified as one of 22 categories, such as towers, conductors, insulators, and vegetation.
[0024] 2. Point Transformer V3: A deep learning backbone network for 3D point cloud processing based on a serialized attention mechanism. This network serializes unordered point clouds along spatial filling curves (such as Hilbert curves and Z-order curves), achieving efficient self-attention computation while preserving spatial locality.
[0025] 3. Multi-scale geometric feature enhancement: A point cloud feature extraction method based on local covariance matrix eigenvalue decomposition. This method computes the geometric descriptor (linearity, flatness, scattering, etc.) of each point at multiple spatial scales (spherical neighborhoods of different radii) to explicitly encode the local geometric structure information of the point cloud.
[0026] 4. Height normalization feature: The relative height feature of each point is calculated based on the grid ground estimation algorithm (local minimum elevation estimation ground initial value + neighborhood interpolation completion). The normalized height is obtained by subtracting the ground elevation estimate of the corresponding position from the Z coordinate of the point cloud.
[0027] 5. Confusion Pair Constraint Loss: An auxiliary loss function designed for specific easily confused spatial structure category pairs. This loss function, through the hinge loss mechanism, imposes a margin constraint at the classification score (logit) level for pre-specified easily confused spatial structure category pairs: requiring the true class score to be at least greater than the confused class score by a given margin.
[0028] 6. Combined Loss Function: A training objective that integrates multiple loss functions. This invention employs a weighted combination of Lovász-Softmax loss, Focal loss, and class balance loss, which are used for surrogate optimization of IoU, emphasizing hard-to-classify samples, and enhancing feature separability at class boundaries and segmentation stability of slender structures, respectively.
[0029] Example 1: As shown in Figure 1, an embodiment of the present invention provides a point cloud segmentation method for transmission lines based on geometric enhancement and confusion constraints, which includes the following steps: Step 1: Preprocess the original 3D point cloud of the input transmission line, extract multi-scale geometric features and height-normalized features, and expand the original 3D coordinate features into a multi-dimensional enhanced feature vector containing the coordinates, the multi-scale geometric features, and the height-normalized features.
[0030] Specifically, step 1 includes the following steps: Step 11: Perform standardization preprocessing on the input original 3D point cloud of the transmission line to eliminate noise and coordinate offset interference, and obtain standardized point cloud data; Step 12: For the standardized point cloud data, construct local neighborhoods at multiple spatial scales, and extract multi-scale geometric features based on the eigenvalue decomposition of the neighborhood covariance matrix; Step 13: For the standardized point cloud data, calculate the height-normalized feature of each point relative to the local ground based on the grid ground estimation algorithm; Step 14: Align and concatenate the original three-dimensional coordinate features, the multi-scale geometric features, and the height-normalized features to obtain a multi-dimensional enhanced feature vector adapted to the input of the deep learning network.
[0031] The standardization preprocessing described in step 11 includes: Outlier removal and robust center translation: Calculate the 1st and 99th percentiles for the X, Y, and Z axes of the point cloud respectively, and remove any outlier points whose coordinates fall outside the percentile range; then, using the median of the three coordinates of the point cloud as the robust center, perform an overall translation on all point coordinates to concentrate the point cloud near the origin. Spatial range clipping and voxel grid downsampling: The point cloud is restricted to a 3D spatial range of [-500 m, 500 m], and points outside the range are deleted; then, a voxel grid with a side length of 0.04 m is used to downsample the point cloud, retaining one representative point in each non-empty voxel, and recording the mapping relationship between the voxel grid and the original points (e.g., recording the voxel index to which each original point belongs or the index list of original points in each voxel), so that the segmentation labels of the downsampled point cloud can be restored to the original point cloud resolution after post-processing.
[0032] Local spherical cropping to control the number of points per sample: After voxelization, the center of the point cloud is used as a reference and at most the 120,000 points closest to it are retained; after cropping, the point cloud is re-centered a second time in the X and Y coordinates only, retaining the absolute elevation information of the Z axis.
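The preprocessing of step 11 can be sketched roughly as follows. This is a minimal NumPy illustration and not the patented implementation: the function name `standardize`, the choice of the first point in each voxel as its representative, and the use of the centroid as the crop reference are all assumptions made for the example.

```python
import numpy as np

def standardize(points, voxel=0.04, clip=500.0, max_points=120_000):
    """Hypothetical sketch of Step 11 (names and tie-breaking are assumptions)."""
    pts = np.asarray(points, dtype=np.float64)
    # 1) Remove points outside the 1st-99th percentile range on any axis.
    lo, hi = np.percentile(pts, [1, 99], axis=0)
    keep_idx = np.flatnonzero(np.all((pts >= lo) & (pts <= hi), axis=1))
    # 2) Translate by the per-axis median (robust center).
    kept = pts[keep_idx] - np.median(pts[keep_idx], axis=0)
    # 3) Clip to [-clip, clip] on every axis.
    inside = np.all(np.abs(kept) <= clip, axis=1)
    keep_idx, kept = keep_idx[inside], kept[inside]
    # 4) Voxel downsampling: one representative per non-empty voxel, and
    #    `inverse` records the voxel index of every kept original point.
    cells = np.floor(kept / voxel).astype(np.int64)
    _, first, inverse = np.unique(cells, axis=0, return_index=True,
                                  return_inverse=True)
    inverse = inverse.reshape(-1)
    sampled = kept[first]
    # 5) Spherical crop: keep at most max_points nearest to the centroid
    #    (the centroid as "center" is an assumption).
    dist = np.linalg.norm(sampled - sampled.mean(axis=0), axis=1)
    crop = np.sort(np.argsort(dist)[:max_points])
    out = sampled[crop].copy()
    # 6) Second centering in X and Y only; Z keeps its absolute elevation.
    out[:, :2] -= out[:, :2].mean(axis=0)
    return out, crop, inverse, keep_idx
```

Here `keep_idx` and `inverse` together map each surviving original point to its voxel, which is the bookkeeping needed later to restore labels to the original resolution.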
[0033] Step 12 specifically includes the following sub-steps: Construct a spatial index structure for standardized point clouds to improve the efficiency of local neighborhood search; For each point, a local spherical neighborhood is constructed at two scales: 0.1m and 0.5m. If the number of neighborhood points at a certain scale is less than 3, the geometric descriptor for that scale is set to zero. For each effective neighborhood point set, calculate the three-dimensional coordinate covariance matrix and perform eigenvalue decomposition to obtain three non-negative eigenvalues; Based on the eigenvalues, five geometric descriptors are calculated for each scale: linearity, flatness, scattering, anisotropy, and feature entropy. The descriptors of two scales are concatenated in sequence to form a 10-dimensional multi-scale geometric feature vector.
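As an illustration of step 12, the sketch below computes five eigenvalue-based descriptors per scale. The text does not give the descriptor formulas, so the commonly used definitions (with eigenvalues sorted λ1 ≥ λ2 ≥ λ3) are assumed here; the brute-force radius search stands in for the spatial index and is only practical for small point sets. The function names are hypothetical.

```python
import numpy as np

def descriptors(neigh, eps=1e-12):
    """Five descriptors from one neighborhood; formulas are assumed, not
    taken from the source text."""
    if len(neigh) < 3:                       # too few points: zero vector
        return np.zeros(5)
    lam = np.linalg.eigvalsh(np.cov(neigh.T))[::-1]   # l1 >= l2 >= l3
    lam = np.clip(lam, 0.0, None)
    l1, l2, l3 = lam
    if l1 < eps:                             # all points coincide
        return np.zeros(5)
    e = lam / lam.sum()
    entropy = -np.sum(e * np.log(e + eps))
    return np.array([(l1 - l2) / l1,         # linearity
                     (l2 - l3) / l1,         # flatness (planarity)
                     l3 / l1,                # scattering
                     (l1 - l3) / l1,         # anisotropy
                     entropy])               # eigenvalue entropy

def multiscale_features(points, radii=(0.1, 0.5)):
    pts = np.asarray(points, dtype=np.float64)
    # Brute-force squared distances; a KD-tree would replace this in practice.
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    per_scale = [np.stack([descriptors(pts[d2[i] <= r * r])
                           for i in range(len(pts))]) for r in radii]
    return np.hstack(per_scale)              # shape (N, 5 * len(radii))
```

For points sampled along a straight wire, the linearity entry approaches 1 and the scattering entry approaches 0, which is the kind of prior the enhanced input is meant to expose.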
[0034] Step 13 specifically includes the following sub-steps: Project the point cloud onto the XY horizontal plane, divide it into regular grids according to the preset grid size, and construct the ground elevation grid; For each non-empty grid, the minimum Z-coordinate of all points within the grid is used as the initial ground elevation estimate; For empty grids without point coverage, interpolation is performed based on the ground elevation of the surrounding valid grids to generate continuous ground elevation surfaces; The difference between the Z-coordinate and the ground elevation at the corresponding location is calculated point by point to obtain the normalized height feature.
[0035] The grid size is set to 2.0m; during the interpolation completion process based on the ground elevation of the surrounding effective grid, the K-nearest neighbor distance weighted interpolation method is used, and the number of nearest neighbors K is the smaller value between 5 and the number of effective grids.
[0036] The formula for calculating the normalized height is: $$h_i = \max\left(0,\; z_i - g_i\right)$$
[0037] where $h_i$ is the normalized height of the i-th point, $z_i$ is the original Z-coordinate of the point, and $g_i$ is the estimated local ground elevation at the corresponding location; negative results are truncated to 0.
[0038] Step 13 also includes degenerate sample fault tolerance processing: when the total number of points in the point cloud is less than the preset minimum number of points threshold, a normalized height array of all zeros is directly output to avoid process interruption.
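A possible sketch of step 13 is given below: minimum elevation per 2.0 m grid cell, inverse-distance-weighted filling of empty cells from the K nearest valid cells, and truncation of negative heights. The function name, the `min_points` threshold value, and the use of cell-index distance for the weights are assumptions made for illustration.

```python
import numpy as np

def normalized_height(points, cell=2.0, k=5, min_points=10):
    """Hypothetical sketch of grid-based ground estimation (Step 13)."""
    pts = np.asarray(points, dtype=np.float64)
    if len(pts) < min_points:                # degenerate sample: all zeros
        return np.zeros(len(pts), dtype=np.float32)
    ij = np.floor((pts[:, :2] - pts[:, :2].min(axis=0)) / cell).astype(int)
    ground = np.full(tuple(ij.max(axis=0) + 1), np.inf)
    # Initial ground estimate: minimum Z in each non-empty cell.
    np.minimum.at(ground, (ij[:, 0], ij[:, 1]), pts[:, 2])
    valid = np.isfinite(ground)
    valid_ij, valid_z = np.argwhere(valid), ground[valid]
    kk = min(k, len(valid_ij))               # K = min(5, number of valid cells)
    for empty in np.argwhere(~valid):
        d = np.linalg.norm(valid_ij - empty, axis=1)
        nn = np.argsort(d)[:kk]
        w = 1.0 / d[nn]                      # inverse-distance weights
        ground[tuple(empty)] = np.sum(w * valid_z[nn]) / np.sum(w)
    h = pts[:, 2] - ground[ij[:, 0], ij[:, 1]]
    return np.maximum(h, 0.0).astype(np.float32)
```

Note that in this sketch every point falls in a non-empty cell by construction, so the filled cells only matter if the ground surface is queried elsewhere, for example at the original resolution.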
[0039] In step 14, the final concatenated multidimensional enhanced feature vector is 14-dimensional, arranged in the following order: 3-dimensional original coordinate features, 10-dimensional multi-scale geometric features, and the 1-dimensional height-normalized feature. The 10-dimensional multi-scale geometric features consist of 5 geometric descriptors for each of the two scales.
[0040] Step 14 specifically includes the following sub-steps: Convert all feature components to a uniform format so that the data type is float32; Using the number of points of the coordinate feature as a reference, verify that all feature components have the same number of points and eliminate mismatched feature components; Expand the one-dimensional height-normalized feature by one dimension to align it with the other two-dimensional features; Concatenate all components along the feature dimension to obtain the final 14-dimensional enhanced feature vector.
[0041] After step 14 is completed, the point order of the point cloud is randomly shuffled to eliminate spatial arrangement deviation, and then the enhanced feature vector is input into the Point Transformer V3 backbone network.
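The assembly and shuffling just described might look like the following sketch. The function name and the seeded generator are assumptions, and this version raises an error on a point-count mismatch rather than silently dropping the mismatched component.

```python
import numpy as np

def assemble_features(xyz, geom, height, seed=0):
    """Hypothetical sketch: build the 14-D vector (3 + 10 + 1) and shuffle."""
    xyz = np.asarray(xyz, dtype=np.float32)
    geom = np.asarray(geom, dtype=np.float32)
    height = np.asarray(height, dtype=np.float32)
    if height.ndim == 1:                     # (N,) -> (N, 1)
        height = height[:, None]
    n = xyz.shape[0]
    if geom.shape[0] != n or height.shape[0] != n:
        raise ValueError("feature components disagree on point count")
    feats = np.concatenate([xyz, geom, height], axis=1)
    # Random point order; the permutation is returned so labels can follow.
    order = np.random.default_rng(seed).permutation(n)
    return feats[order], order
```

Returning `order` lets the caller apply the same permutation to the labels, so points and targets stay aligned after shuffling.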
[0042] During training, a structure-preserving training strategy is adopted: all random geometric data augmentation operations (rotation, scaling, flipping, etc.) are removed, and grid sampling and spherical cropping are performed using preset strategies to preserve the inherent spatial structure of the power transmission line scene and reduce training fluctuations.
[0043] Step 2: Input the multidimensional enhanced feature vector into the Point Transformer V3 backbone network, perform serialized attention encoding-decoding, and output the geometric structure feature map of each point in the power transmission line scene.
[0044] Specifically, step 2 includes the following steps: Step 21: Input the multidimensional enhanced feature vector, which is composed of three-dimensional coordinates, multi-scale geometric features and height-normalized features, into the Point Transformer V3 backbone network; Step 22: Perform serialized attention encoding-decoding operations through the Point Transformer V3 backbone network to learn deep geometric structure representations; Step 23: Output a geometric feature map covering all elements of the transmission line corridor.
[0045] Before inputting the multidimensional enhanced feature vectors into the Point Transformer V3 backbone network, the following steps are also included: The number of input embedding layer channels in Point Transformer V3 was modified from the standard 3 channels to a number of channels that match the multidimensional augmented feature vector, in order to adapt to multi-source augmented feature input.
[0046] The multidimensional enhanced feature vector is 14-dimensional, and the input of Point Transformer V3 is set to 14 channels accordingly; the 14-dimensional input features are mapped to the 32 channels of the first encoder stage through a linear projection layer.
[0047] The serialization attention encoding-decoding includes: A spatial filling curve serialization strategy is used to convert disordered 3D point clouds into 1D sequences. Multi-head self-attention computation is performed within a window of a one-dimensional sequence to achieve efficient feature learning while preserving spatial locality.
[0048] The space-filling curve serialization strategy is a combination of Z-order curves and Hilbert curves, which balances computational efficiency and spatial locality preservation.
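To make the serialization idea concrete, the sketch below orders points along a Z-order (Morton) curve by interleaving the bits of their quantized grid coordinates. Only the Z-order half of the combined strategy is shown; the Hilbert curve is omitted, and the function names and the 16-bit limit per axis are assumptions.

```python
def morton3(x, y, z, bits=16):
    """Interleave the low `bits` bits of x, y, z into one Z-order code."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (3 * b)
        code |= ((y >> b) & 1) << (3 * b + 1)
        code |= ((z >> b) & 1) << (3 * b + 2)
    return code

def z_order(points, cell=0.04, bits=16):
    """Return point indices sorted along the Z-order curve (hypothetical helper)."""
    mins = [min(p[a] for p in points) for a in range(3)]
    codes = [
        morton3(*(int((p[a] - mins[a]) // cell) for a in range(3)), bits=bits)
        for p in points
    ]
    return sorted(range(len(points)), key=codes.__getitem__)
```

Points that are close in 3D tend to receive nearby codes, which is why attention over consecutive runs of the sorted sequence still sees mostly local neighbors.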
[0049] The encoding-decoding structure specifically includes: It adopts an encoding-decoding architecture with a 5-level encoder and a 4-level decoder; The encoder achieves step-by-step downsampling through serialization pooling, while the decoder achieves step-by-step upsampling through serialization depooling.
[0050] The encoder applies a serialized pooling operation with a stride of 2 between successive stages; each downsampling halves the number of points and doubles the number of channels. The fourth encoder stage is configured with a 6-layer attention structure to fully learn complex semantic information in the intermediate feature layers.
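The stage arithmetic implied by this description can be tabulated with a few lines of Python. This is an idealized illustration that assumes exact halving at every pooling step and a 32-channel first stage; actual pooled point counts depend on how points group together, and the function name is hypothetical.

```python
def encoder_stages(num_points, base_channels=32, stages=5, stride=2):
    """List (stage, points, channels) under idealized halving/doubling."""
    rows, n, c = [], num_points, base_channels
    for s in range(1, stages + 1):
        rows.append((s, n, c))
        n, c = max(1, n // stride), c * 2
    return rows

for stage, n, c in encoder_stages(120_000):
    print(f"stage {stage}: ~{n} points, {c} channels")
```

Under these assumptions a 120,000-point sample reaches the fifth stage with roughly 7,500 points and 512 channels.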
[0051] After upsampling at each level, the decoder fuses the features of the corresponding level of the encoder with the current features of the decoder through skip connections, thus preserving the local spatial details of the fine linear targets.
[0052] During attention calculation, the serialized one-dimensional point cloud is divided into non-overlapping patches of size 512, and standard multi-head self-attention calculation is performed within each patch.
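A simplified picture of patch-wise attention is sketched below in NumPy. It assumes identity query/key/value projections (no learned weights), zero padding of the last patch with masked keys, and a hypothetical function name; it is meant only to show that each point attends to the other points in its own patch of the serialized sequence.

```python
import numpy as np

def patch_self_attention(feats, patch=512, heads=4):
    """Multi-head self-attention inside non-overlapping patches (sketch)."""
    n, c = feats.shape
    assert n > 0 and c % heads == 0
    d = c // heads
    pad = (-n) % patch                       # pad the last patch with zeros
    x = np.pad(feats, ((0, pad), (0, 0)))
    real = (np.arange(n + pad) < n).reshape(-1, 1, 1, patch)
    # (num_patches, heads, patch, d)
    x = x.reshape(-1, patch, heads, d).transpose(0, 2, 1, 3)
    scores = x @ x.transpose(0, 1, 3, 2) / np.sqrt(d)
    scores = np.where(real, scores, -np.inf)  # padded keys get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    out = (w @ x).transpose(0, 2, 1, 3).reshape(-1, c)
    return out[:n]
```

Because patches do not overlap, the cost grows linearly with the number of patches instead of quadratically with the number of points.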
[0053] After encoding and decoding, the deep features are mapped to 22 semantic prediction results through a linear classification head, forming a voxel-level classification score.
[0054] The 22 semantic categories include: jumper wires, streetlights, ground, buildings, highways, greenhouses, railways, vehicles, low vegetation, high vegetation, conductors, poles, tension insulators, V-string insulators, spacers, distribution poles, distribution conductors, water, straight insulators, ground wires, optical cables, and other categories.
[0055] The Point Transformer V3 backbone network employs a two-stage segmentation process during the inference phase: First, perform coarse segmentation on the point cloud of the entire scene, and extract the ROI region containing small linear targets based on the coarse segmentation results; Then, fine-scale segmentation is performed on the ROI region to improve the recognition accuracy of small targets such as jumpers, optical cables, and insulators.
[0056] Step 3: Use the combined loss function and the confusion constraint loss to jointly train and optimize the network parameters.
[0057] Specifically, step 3 includes the following steps: Step 31: Based on the prediction results of the Point Transformer V3 network, simultaneously calculate the combined loss function and the confusion pair constraint loss; the combined loss function is the main loss, composed of the weighted sum of the Lovász-Softmax loss, Focal loss, and class balance loss, with equal weights of 1 for all three; the confusion pair constraint loss is the auxiliary loss, which adopts a hinge loss mechanism to apply a margin constraint to easily confused spatial structure class pairs at the classification score (logit) level; Step 32: The combined loss and the confusion pair constraint loss are weighted and summed to obtain the total loss. The total loss is then used for backpropagation to optimize the network parameters, completing the model training. The formula for calculating the total loss is: Total loss = Lovász-Softmax loss + Focal loss + Class balance loss + w_conf × Confusion pair constraint loss, where w_conf is the global weight of the confusion pair constraint loss.
[0058] In the combined loss function, each loss weight is equal and set to 1, so that each component maintains a balance in gradient contribution.
[0059] The Lovász-Softmax loss converts the discrete IoU metric into a continuous surrogate loss through the Lovász extension, so that the segmentation intersection over union can be optimized directly by gradient descent.
[0060] The Focal loss reduces the weight of easily classified samples through a modulating factor, so that the model focuses on hard-to-classify samples and rare-class samples; the focusing parameter γ is set to 3.5.
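For reference, the standard multi-class focal loss, FL = -(1 - p_t)^γ · log(p_t), can be written in a few lines of NumPy. This is a forward-pass sketch only, with mean reduction assumed and a hypothetical function name; setting γ to 0 recovers ordinary cross-entropy.

```python
import numpy as np

def focal_loss(logits, labels, gamma=3.5):
    """Mean focal loss over points; logits (N, C), labels (N,) ints."""
    z = logits - logits.max(axis=1, keepdims=True)          # stable softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    log_pt = log_p[np.arange(len(labels)), labels]          # true-class log-prob
    pt = np.exp(log_pt)
    return float(np.mean(-((1.0 - pt) ** gamma) * log_pt))
```

With γ = 3.5, a point already classified with high confidence contributes almost nothing, so the gradient is concentrated on misclassified and rare-class points.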
[0061] The class balance loss assigns adaptive weights to each category based on the effective number of samples; the hyperparameter β is set to 0.9999.
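One common way to turn the effective number of samples into per-class weights is w_c ∝ (1 - β) / (1 - β^{n_c}), where n_c is the point count of class c. The sketch below assumes this form and normalizes the weights to sum to the number of non-empty classes; both the normalization and the function name are assumptions.

```python
def class_balanced_weights(counts, beta=0.9999):
    """Per-class weights from point counts (effective-number weighting)."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) if n > 0 else 0.0 for n in counts]
    present = sum(1 for n in counts if n > 0)
    scale = present / sum(raw)
    return [r * scale for r in raw]

# A dominant class with 300,000 points and a rare class with 100 points.
weights = class_balanced_weights([300_000, 100])
```

With β = 0.9999, the rare class in this example receives a weight about 100 times larger than the dominant class, while very large classes saturate near the same weight.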
[0062] The confusion constraint loss requires that the true class logit score and the confused class logit score satisfy a preset marginal interval m, where m is set to 1.0.
[0063] The confusion pair constraint loss applies a two-way constraint to 16 predefined easily confused spatial structure category pairs. The confusion pairs are divided into four categories: linear object confusion, insulator type confusion, spatial adjacent boundary confusion, and occlusion interference confusion.
[0064] The global weight w_conf of the confusion constraint loss is set to 2.5, which is higher than the weights of each component of the main loss, in order to enhance the distinguishability of easily confused spatial structure categories.
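The hinge-style confusion pair constraint and its combination with the main loss might be sketched as below. The class indices in `pairs`, the averaging over active (point, confused class) combinations, and the function names are assumptions for illustration; only the margin m = 1.0, the two-way constraint, and the weight w_conf = 2.5 come from the text.

```python
import numpy as np

def confusion_pair_loss(logits, labels, pairs, margin=1.0):
    """Mean hinge penalty max(0, m - (z_true - z_confused)) over active pairs."""
    n, k = logits.shape
    partner = np.zeros((k, k), dtype=bool)
    for a, b in pairs:                       # two-way constraint
        partner[a, b] = partner[b, a] = True
    active = partner[labels]                 # (N, K): confusable classes per point
    true_logit = logits[np.arange(n), labels][:, None]
    hinge = np.maximum(0.0, margin - (true_logit - logits))
    total = active.sum()
    return float((hinge * active).sum() / total) if total else 0.0

def total_loss(lovasz, focal, class_balance, confusion, w_conf=2.5):
    # Main loss components have weight 1; the auxiliary term uses w_conf.
    return lovasz + focal + class_balance + w_conf * confusion
```

A point whose true-class logit already exceeds the confused-class logit by at least the margin contributes zero, so the penalty only acts on pairs that are still too close.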
[0065] The AdamW optimizer was used during training, with an initial learning rate of 0.0001 and a weight decay of 0.005.
[0066] The learning rate scheduling adopts the OneCycleLR strategy, with linear warm-up performed in the first 4% of iterations, and subsequent decreases using a cosine annealing strategy.
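The shape of such a schedule can be illustrated with a small standalone function: linear warm-up over the first 4% of iterations, then cosine annealing. The sketch treats 0.0001 as the peak learning rate and uses assumed start and end factors (`start_factor`, `end_factor`), none of which are specified in the text; it is not a reproduction of any particular library scheduler.

```python
import math

def lr_at(step, total_steps, peak_lr=1e-4, warmup_frac=0.04,
          start_factor=0.1, end_factor=0.001):
    """Learning rate at a given iteration (hypothetical schedule sketch)."""
    warm = max(1, int(round(total_steps * warmup_frac)))
    if step < warm:                          # linear warm-up
        t = step / warm
        return peak_lr * (start_factor + (1.0 - start_factor) * t)
    # Cosine annealing from peak_lr down to peak_lr * end_factor.
    t = (step - warm) / max(1, total_steps - 1 - warm)
    floor = peak_lr * end_factor
    return floor + 0.5 * (peak_lr - floor) * (1.0 + math.cos(math.pi * t))
```

With 1,000 total iterations, the rate rises for the first 40 steps, peaks at step 40, and then decays smoothly to its floor at the final step.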
[0067] The joint training process employs a structure-preserving training strategy, which includes: removing all random geometric data augmentation operations (rotation, scaling, flipping, etc.), using preset strategies for grid sampling and spherical clipping, and randomly shuffling the point order to maintain the inherent spatial structure of the power transmission line scene and reduce training fluctuations.
[0068] The training consists of 100 epochs, with data looped 3 times in each epoch; parallel training is performed using 4 GPUs, with a total batch size of 16; gradient clipping and mixed precision training are enabled.
[0069] Step 4: Based on knowledge of the transmission line field, perform structured post-processing on the geometric feature map to obtain the original point-level final geometric structure segmentation result.
[0070] Specifically, step 4 includes the following steps: Step 41: Perform global KNN voting smoothing on the full-category geometric structure feature maps of the power transmission line scene output by Point Transformer V3 to eliminate isolated noise points and local label jitter; Step 42: Based on the spatial hierarchy prior of transmission lines, perform conductor / ground wire correction using the parallel line hierarchy prior to correct misclassified line structure labels such as conductors, ground wires, and jumpers; Step 43: Perform line segment connectivity merging and cleaning, clustering connected line point clouds and unifying their labels to eliminate fragmented segmentation results; Step 44: Perform majority voting within the merged line segments to unify the results and further stabilize label consistency in local areas; Step 45: Based on knowledge of the three-dimensional morphology of transmission towers, apply three-dimensional template matching constraints to refine and correct the segmentation results of the tower area; Step 46: Perform vegetation correction on the edge area of the tower, correcting tower edge points that were mistakenly classified as vegetation or ground to the tower category; Step 47: Based on the mapping between the voxel grid and the original point cloud recorded in Step 11, map the segmentation labels of the downsampled point cloud (or voxel-level representative points) processed in Steps 41-46 back to each point of the original input point cloud through nearest neighbor search or majority voting. For each original point, the label of the downsampled point corresponding to its voxel is found; if the same voxel contains multiple original points, they are all assigned the label of that voxel's representative point. If some original points did not take part in forward inference because of local spherical clipping, their labels are filled in by nearest neighbor interpolation. Finally, the original point-level geometric structure segmentation result is output, with the same number of points as the original input point cloud.
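The label restoration in Step 47 can be sketched as follows. This is an illustration under stated assumptions: `inverse` is a hypothetical array that stores, for every original point, the index of its voxel; voxels that were not inferred carry the hypothetical placeholder label -1; and the nearest neighbor fill uses a brute-force search for clarity rather than a spatial index.

```python
import numpy as np

def restore_point_labels(points, inverse, voxel_labels, unlabeled=-1):
    """Map voxel-level labels back to every original point.

    points:       (N, 3) original coordinates
    inverse:      (N,) voxel index of each original point
    voxel_labels: (V,) label per voxel, `unlabeled` where no prediction exists
    """
    points = np.asarray(points, dtype=np.float64)
    labels = np.asarray(voxel_labels, dtype=np.int64)[np.asarray(inverse)]
    missing = np.flatnonzero(labels == unlabeled)
    known = np.flatnonzero(labels != unlabeled)
    if len(known) == 0:
        return labels
    for i in missing:
        # Nearest labelled original point supplies the missing label.
        d2 = np.sum((points[known] - points[i]) ** 2, axis=1)
        labels[i] = labels[known[np.argmin(d2)]]
    return labels

pts = [[0.0, 0, 0], [0.01, 0, 0], [1.0, 0, 0], [1.2, 0, 0]]
restored = restore_point_labels(pts, inverse=[0, 0, 1, 2], voxel_labels=[3, 7, -1])
```

The output always has one label per original point, which matches the requirement that the final result contain the same number of points as the input.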
[0071] In step 41, the global KNN voting smoothing uses a nearest neighbor search with K=10: majority voting is performed over the 10 nearest neighbors of each predicted point, and the voting result is used as the corrected label of that point. This operation is performed only when the total number of points in the point cloud does not exceed 500,000, to avoid excessive computational overhead.
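A minimal sketch of this smoothing step is given below. It uses a brute-force O(N²) neighbor search, which is only suitable for small examples; a spatial index would be needed at the scale described. Whether a point counts itself among its K neighbors is not stated in the source, and this sketch assumes that it does. Labels are assumed to be non-negative integers.

```python
import numpy as np

def knn_vote_smooth(points, labels, k=10, max_points=500_000):
    """Replace each label by the majority label of its k nearest neighbors."""
    points = np.asarray(points, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    n = len(points)
    if n == 0 or n > max_points:
        return labels.copy()  # skipped for empty or very large clouds
    k = min(k, n)
    smoothed = np.empty_like(labels)
    for i in range(n):
        d2 = np.sum((points - points[i]) ** 2, axis=1)
        nn = np.argpartition(d2, k - 1)[:k]      # indices of the k closest points
        smoothed[i] = np.bincount(labels[nn]).argmax()
    return smoothed

# Twelve points on a line, one isolated wrong label at index 5.
line = [[i * 0.1, 0.0, 0.0] for i in range(12)]
noisy = [1] * 12
noisy[5] = 0
clean = knn_vote_smooth(line, noisy)
```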
[0072] In step 42, the conductor / ground wire correction based on the parallel line hierarchy prior includes: extracting line-type point clouds (conductors, ground wires, optical cables, jumpers, etc.) within the horizontal radius of the transmission tower; clustering their Z coordinates by gaps and sorting the clusters from high to low to build a hierarchy table; labeling the highest layer as ground wire and the remaining layers as conductors; processing the line-type point cloud along the principal axis of the entire scene; matching the mean Z of each line cluster against the hierarchy table; correcting the labels of points whose Z difference is within ±3.5m; and exempting jumper points that lie within 20m horizontally of the tower and above 80% of the tower-top height, so that the jumper category is not changed by mistake.
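The hierarchy table and the ±3.5m matching rule can be sketched as below. The gap threshold used for clustering (`gap=1.0`) and the label strings are hypothetical, since the source does not state them; the jumper exemption is omitted from this sketch.

```python
def build_layer_table(z_values, gap=1.0):
    """Cluster heights by gaps and return the mean height of each layer, highest first."""
    layers = []
    for z in sorted(z_values, reverse=True):
        if layers and layers[-1][-1] - z <= gap:
            layers[-1].append(z)   # continues the current layer
        else:
            layers.append([z])     # gap exceeded: start a new layer
    return [sum(layer) / len(layer) for layer in layers]

def label_cluster(z_mean, layer_table, tolerance=3.5):
    """Label a line cluster from its mean height, or return None to leave it unchanged."""
    if not layer_table:
        return None
    diffs = [abs(z_mean - h) for h in layer_table]
    best = min(range(len(diffs)), key=diffs.__getitem__)
    if diffs[best] > tolerance:
        return None
    return "ground_wire" if best == 0 else "conductor"

table = build_layer_table([30.1, 30.0, 24.0, 24.2, 18.1, 18.0])
```

Here the highest layer (about 30 m) is treated as ground wire and the two lower layers as conductors; a cluster farther than 3.5 m from every layer keeps its predicted label.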
[0073] Step 43, segment connectivity merging and cleanup, specifically includes: The entire scene's line-type point cloud is segmented along the main spatial axis of the power transmission line with a step size of 10m. A disjoint-set data structure is used to perform connectivity clustering on line point clouds within segments and merge connected line segments. For the merged line segments, the percentage of points in each semantic category is counted. If the percentage of a certain category is ≥55%, then the line segment is uniformly labeled as that category.
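The disjoint-set merging and the 55% unification rule can be sketched as follows. How connectivity between points is decided is not stated in the source, so the sketch takes a hypothetical list of `edges` (pairs of point indices already judged to be connected) as input.

```python
from collections import Counter

class DisjointSet:
    """Union-find with path halving."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def merge_and_relabel(labels, edges, min_ratio=0.55):
    """Merge connected points and unify each merged segment to its dominant label."""
    ds = DisjointSet(len(labels))
    for a, b in edges:
        ds.union(a, b)
    groups = {}
    for i in range(len(labels)):
        groups.setdefault(ds.find(i), []).append(i)
    out = list(labels)
    for members in groups.values():
        label, count = Counter(labels[i] for i in members).most_common(1)[0]
        if count / len(members) >= min_ratio:
            for i in members:
                out[i] = label
    return out

merged = merge_and_relabel([1, 1, 1, 2, 3, 4], edges=[(0, 1), (1, 2), (2, 3), (4, 5)])
```

In the example, the first segment is 75% label 1 and is unified, while the second segment has no label reaching 55% and is left as predicted.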
[0074] In step 44, for the majority vote within segments, the line is divided uniformly into 5m segments and label statistics are computed for the line point cloud within each segment. If the proportion of points with a certain label is ≥60%, all points in that segment are assigned that label to eliminate short-distance label jitter.
[0075] In step 45, the tower 3D template matching constraint is executed as follows: construct a 3D template library for common transmission line tower types, including templates for cement poles, steel pipe poles, and iron towers; for suspected tower areas in the prediction results, extract shape descriptors of the point clouds and match them against the 3D template library to determine the region for refined inference; perform a second inference pass in that region using a fine-scale grid (0.02m); and combine the results of the second pass with the coarse segmentation results to correct omissions and misclassifications in the tower area.
[0076] In step 46, the vegetation correction at the tower edge specifically involves: constructing a 3D bounding box for the tower area in the prediction results to determine the spatial extent of the tower; for points inside the tower bounding box that are less than 0.5m from the tower surface, correcting the label to tower if the originally predicted label is vegetation or ground; and, after the correction, performing a neighborhood consistency check on the corrected area to ensure that the corrected labels conform to the local spatial characteristics.
[0077] The execution order of steps 41 to 46 above cannot be adjusted, and the output of the previous step is used as the input of the next step, forming a closed-loop structured post-processing flow.
[0078] After the structured post-processing is completed, the output original point-level final geometric structure segmentation result is point cloud data in LAS format containing 22 semantic labels, which can be fed directly into the intelligent inspection business system of power transmission lines for subsequent applications.
[0079] This invention accurately analyzes the geometric structure attributes of power transmission line point clouds, solving problems such as terrain undulation interference, cable confusion, weak small sample recognition, and sampling recovery misalignment in large-scale power transmission line point clouds. It improves segmentation accuracy and engineering stability, and can directly output point-by-point classification results that conform to the LAS standard. It is suitable for automatic analysis of UAV / airborne LiDAR inspection data.
[0080] Embodiment 2: As shown in Figure 2, an embodiment of the present invention provides a transmission line point cloud segmentation device based on geometric enhancement and confusion constraints, comprising: a feature enhancement module, used to preprocess the original 3D point cloud of the input transmission line, extract multi-scale geometric features and normalized height features, and expand the original 3D coordinate features into a multi-dimensional enhanced feature vector containing the multi-scale geometric features and normalized height features; a geometric structure learning module, used to input the multi-dimensional enhanced feature vectors into the Point Transformer V3 backbone network, perform serialized attention encoding and decoding, and output the geometric structure feature map of each point in the power transmission line scene; a joint training optimization module, used to jointly train and optimize the network parameters using a combined loss function and a confusion pair constraint loss, where the confusion pair constraint loss serves as an auxiliary loss that applies margin constraints to predefined easily confused spatial structure category pairs; and a structured post-processing module, used to perform structured post-processing on the geometric feature map based on knowledge of the transmission line domain and obtain the original point-level final geometric structure segmentation result.
[0081] As shown in Figure 3, the specific implementation of the transmission line point cloud segmentation process based on geometric enhancement and confusion constraints of the present invention mainly involves the following aspects.
[0082] S1, Multi-scale geometric feature enhancement and normalized height feature extraction: For the input raw 3D point cloud, geometric descriptors based on eigenvalue decomposition of the covariance matrix are extracted at multiple spatial scales, and the normalized height feature of each point is calculated using a ground extraction algorithm, expanding the original 3D coordinate features into a 14-dimensional enhanced feature vector.
[0083] This step addresses the problem that the standard Point Transformer V3, which uses only XYZ coordinates as input features, does not adequately represent the geometric morphology of small linear objects (conductors, jumpers, optical cables, and ground wires) in transmission line scenes. Different objects in a transmission line scene have distinctly different local geometric features: conductors and ground wires exhibit a linear distribution with high linearity and low planarity; tower surfaces exhibit a planar distribution with high planarity; and insulators exhibit a complex three-dimensional distribution with high scattering. By explicitly extracting these geometric priors and using them as additional input channels, the network's ability to distinguish between different object types can be significantly enhanced.
[0084] As shown in Figure 4, S1 specifically includes the following sub-steps: S11, Data Preprocessing and Grid Downsampling: Step S11 performs standardized preprocessing on the original point cloud, eliminates the influence of outliers and coordinate offsets, and unifies the point cloud density through voxel grid downsampling, so as to provide standardized point cloud data with a uniform format and controllable quality for subsequent geometric feature extraction and deep learning network input.
[0085] The necessity of this step stems from the inherent characteristics of laser point cloud data for power transmission lines. When airborne or ground-based lidar collects data from power transmission line corridors, the raw point clouds typically exhibit the following characteristics: (a) uneven point cloud density, with high density in near-field areas and low density in far-field areas; (b) the presence of outlier noise points due to multipath reflection, atmospheric scattering, and other factors; and (c) the point cloud coordinates use a geographic coordinate system, with absolute coordinate values reaching hundreds of thousands, which can lead to numerical instability when directly input into deep learning networks. Therefore, a systematic preprocessing procedure is necessary to convert the raw point cloud into a standardized format suitable for subsequent processing.
[0086] In one specific embodiment, the preprocessing procedure performs the following six operations in sequence: (1) Outlier removal and robust center translation: Calculate the 1st and 99th percentiles of the point cloud along the X, Y, and Z axes, respectively, and remove as an outlier any point whose coordinate value falls outside this percentile range. Then, using the median of the point cloud in the X, Y, and Z directions as the robust center, translate all point coordinates globally so that the point cloud is concentrated near the origin. This step eliminates the interference of outlier noise points with subsequent geometric analysis and reduces the adverse effect of large coordinate values on the numerical stability of the network.
[0087] (2) Spatial range clipping and voxel grid downsampling: The point cloud is limited to a 3D spatial range of 500 m, and points outside this range are deleted. Then, a voxel grid with a side length of 0.04 m is used to downsample the point cloud, retaining one representative point within each non-empty voxel while recording the voxel grid coordinates and the minimum coordinate offset. This step unifies the point cloud density, reduces computational complexity, and provides a spatial indexing basis for subsequent serialization encoding.
[0088] (3) Local spherical clipping: After voxelization, local spherical clipping is performed on the point cloud, using the central region of the point cloud as a reference, retaining up to 120,000 points closest to the center. This step is used to control the number of points in a single training sample, ensure stable computational resource consumption during training, and improve the processing efficiency of large-scale point clouds.
[0089] (4) Multi-scale geometric feature enhancement: Multi-scale local geometric feature extraction is performed on the cropped point cloud, with the number of neighborhood points set to 20, and point neighborhoods are constructed at two spatial scales with radii of 0.1m and 0.5m, respectively; based on the covariance matrix and eigenvalue decomposition results of the neighborhood point sets at each scale, local geometric description features are extracted. This step is used to enhance the model's ability to distinguish linear targets such as conductors, ground wires, jumpers, and optical cables, as well as complex structural targets such as towers and insulators.
[0090] (5) Normalized height feature extraction and coordinate re-centering: Based on the ground estimation results, the normalized height feature of each point is calculated. The ground extraction uses the same grid-based ground estimation algorithm as step S13 (that is, the initial ground value of each grid cell is estimated from its local minimum elevation, and empty grid cells are completed by interpolation from neighboring cells). Subsequently, the point cloud is re-centered only in the horizontal direction, without a second translation in the vertical direction. This step introduces a vertical spatial distribution prior and reduces the impact of planar position offsets on network learning.
[0091] (6) Point order shuffling and enhanced feature construction: After the point order is randomly shuffled, the 3D coordinate features, the 10-dimensional multi-scale geometric description features, and the 1-dimensional normalized height feature are concatenated to form a 14-dimensional enhanced input feature vector. This enhanced feature vector serves as the input to the subsequent Point Transformer V3 backbone network.
[0092] After the above preprocessing steps are completed, the second center translation re-centers only the X and Y coordinates and retains the absolute height information of the Z axis, which provides the necessary elevation reference for subsequent normalized height feature extraction. The point order is then randomized to eliminate potential spatial arrangement biases in the point cloud and prevent the network from learning spurious features related to point order.
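The first three preprocessing operations (percentile-based outlier removal with median centering, voxel grid downsampling, and local spherical clipping) can be sketched with NumPy as follows. Several details are assumptions rather than statements from the source: the median is taken after outlier removal, the representative point of a voxel is the first point that falls into it, and the clipping center is the centroid of the cloud. The function names are hypothetical.

```python
import numpy as np

def remove_outliers_and_center(points, low=1.0, high=99.0):
    """Drop points outside the per-axis percentile range, then subtract the median."""
    points = np.asarray(points, dtype=np.float64)
    lo = np.percentile(points, low, axis=0)
    hi = np.percentile(points, high, axis=0)
    keep = np.all((points >= lo) & (points <= hi), axis=1)
    kept = points[keep]
    center = np.median(kept, axis=0)
    return kept - center, center, keep

def voxel_downsample(points, voxel_size=0.04):
    """Keep one point per non-empty voxel and record the bookkeeping for later mapping."""
    points = np.asarray(points, dtype=np.float64)
    min_offset = points.min(axis=0)
    grid = np.floor((points - min_offset) / voxel_size).astype(np.int64)
    voxels, first_idx, inverse = np.unique(
        grid, axis=0, return_index=True, return_inverse=True)
    # inverse[i] is the voxel index of original point i.
    return points[first_idx], voxels, min_offset, inverse.reshape(-1)

def sphere_crop(points, max_points=120_000):
    """Indices of at most max_points points closest to the centroid."""
    points = np.asarray(points, dtype=np.float64)
    if len(points) <= max_points:
        return np.arange(len(points))
    d2 = np.sum((points - points.mean(axis=0)) ** 2, axis=1)
    return np.sort(np.argpartition(d2, max_points - 1)[:max_points])

# Synthetic cloud in large geographic-style coordinates with one gross outlier.
rng = np.random.default_rng(0)
raw = rng.normal(loc=[500000.0, 4000000.0, 100.0], scale=[10.0, 10.0, 2.0], size=(1000, 3))
raw[0] = [0.0, 0.0, 0.0]
centered, center, keep = remove_outliers_and_center(raw)
reps, voxels, min_offset, inverse = voxel_downsample(centered)
crop_idx = sphere_crop(reps, max_points=100)
```

The `inverse` array returned by the downsampling step is the kind of voxel-to-point record that a later label restoration step would rely on.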
[0093] S12, Multi-scale geometric feature extraction: Step S12 aims to extract local geometric descriptors for each point at multiple spatial scales from the preprocessed point cloud, providing explicit geometric prior information for the deep learning network.
[0094] The purpose of this step is to overcome the problem that standard point cloud segmentation networks mainly rely on implicit feature learning and are insufficient in characterizing the geometric differences of small linear targets and complex structural targets in power transmission line scenes. In power transmission line scenes, different categories of objects exhibit significant and stable differences in their local geometry: conductors, ground wires, jumpers, and optical cables typically exhibit an approximately one-dimensional linear distribution in their local neighborhood, with their first eigenvalue significantly larger than their second and third eigenvalues; targets such as steel structures of towers and building surfaces typically exhibit an approximately two-dimensional planar distribution in their local neighborhood, with their first and second eigenvalues significantly larger than their third eigenvalue; complex structures such as insulators and spacers are closer to a three-dimensional scattering distribution in their local neighborhood, with their three eigenvalues being relatively close. These geometric differences originate from the physical morphology of the objects themselves and possess strong stability and distinguishability. By explicitly extracting these geometric descriptors and using them as additional input features, the learning difficulty of the network for rare category geometric patterns can be reduced, thereby improving the overall segmentation performance.
[0095] In one specific embodiment, the multi-scale geometric feature extraction process includes the following five operations: (1) Spatial neighborhood index construction: A spatial index structure is built for the preprocessed point cloud to speed up local neighborhood searches. For degenerate samples with too few points, that is, when the total number of points in the point cloud is less than the preset minimum point threshold of 10, an all-zero geometric feature matrix is returned directly; this avoids degenerate cases in the subsequent neighborhood construction and covariance analysis and ensures the stability of training and inference.
[0096] (2) Multi-scale local neighborhood construction: For each point in the point cloud, local spherical neighborhoods are constructed at two spatial scales with radii of 0.1m and 0.5m, respectively, to simultaneously characterize fine-scale geometric structures and large-scale spatial distribution features. Among them, small-scale neighborhoods are more conducive to characterizing the local morphology of slender targets such as conductors, jumpers, optical cables, and ground wires, while large-scale neighborhoods are more conducive to reflecting the structural distribution features of targets such as towers and insulators. For any point at a certain scale, if its number of neighborhood points is less than 3, it is considered that an effective covariance matrix cannot be formed at that scale, and the geometric descriptor corresponding to that point at that scale is set to zero.
[0097] (3) Covariance matrix construction and eigenvalue decomposition: For each point, the covariance matrix of the three-dimensional coordinates of the local neighborhood point set obtained at each scale is calculated, and eigenvalue decomposition of the covariance matrix yields three non-negative eigenvalues. To avoid division by zero or numerical instability in the subsequent ratio calculations, the lower limit of each eigenvalue is set to ...
[0098] (4) Geometric descriptor calculation: Based on the eigenvalue decomposition results of the local neighborhood covariance matrix constructed for each point at the two spatial scales, five geometric descriptors are calculated: linearity, planarity, scattering, anisotropy, and eigenvalue entropy. The spherical neighborhood radii for the two spatial scales are set to 0.1m and 0.5m, respectively, to characterize fine-scale local structure and larger-scale spatial distribution. When the number of neighborhood points for a point at a certain scale is less than 3, no effective local geometric statistics can be formed at that scale, so all five geometric descriptors for that scale are set to 0. In addition, to avoid division by zero or numerical instability during eigenvalue calculation, the lower limit of the eigenvalues involved in the calculation is set to ... With this processing, the multi-scale geometric feature extraction remains stable and robust in sparse regions, edge regions, and locally degenerate conditions.
[0099] The five geometric descriptors characterize different aspects of the geometry of a point's neighborhood: linearity reflects the extent to which the neighborhood point set extends along a single principal direction, which helps characterize linear targets such as conductors, ground wires, jumpers, and optical cables; planarity reflects the extent to which the neighborhood point set spreads in a two-dimensional plane, which helps characterize planar targets such as local components of towers, building surfaces, and the ground; scattering reflects the degree of dispersion of the neighborhood point set in three-dimensional space, which helps characterize complex structural targets such as insulators and spacers; anisotropy reflects how strongly directional the neighborhood distribution is; and eigenvalue entropy characterizes the complexity of the local geometric distribution. Finally, the five geometric descriptors obtained at the 0.1m and 0.5m radii are concatenated in sequence to form a 10-dimensional multi-scale geometric feature vector for each point, providing explicit geometric prior information for the subsequent semantic segmentation network.
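The descriptor computation can be sketched as follows. The source names the five descriptors but does not give their formulas, so the sketch assumes the commonly used eigenvalue-based definitions with λ1 ≥ λ2 ≥ λ3: linearity (λ1 - λ2)/λ1, planarity (λ2 - λ3)/λ1, scattering λ3/λ1, anisotropy (λ1 - λ3)/λ1, and the entropy of the normalized eigenvalues. The eigenvalue floor `EPS` is a hypothetical value because the source elides it, and the brute-force radius search stands in for the spatial index.

```python
import numpy as np

EPS = 1e-8  # hypothetical eigenvalue floor; the source does not state the value

def descriptors_from_neighbors(neighbors):
    """Five eigenvalue-based descriptors of one neighborhood (zeros if fewer than 3 points)."""
    neighbors = np.asarray(neighbors, dtype=np.float64)
    if len(neighbors) < 3:
        return np.zeros(5)
    cov = np.cov(neighbors.T)                      # 3x3 covariance of XYZ
    eig = np.linalg.eigvalsh(cov)[::-1]            # descending order
    l1, l2, l3 = np.maximum(eig, EPS)
    p = np.array([l1, l2, l3]) / (l1 + l2 + l3)    # normalized eigenvalues
    return np.array([
        (l1 - l2) / l1,           # linearity
        (l2 - l3) / l1,           # planarity
        l3 / l1,                  # scattering
        (l1 - l3) / l1,           # anisotropy
        -np.sum(p * np.log(p)),   # eigenvalue entropy
    ])

def multiscale_features(points, radii=(0.1, 0.5), min_points=10):
    """(N, 5 * len(radii)) feature matrix; all zeros for degenerate samples."""
    points = np.asarray(points, dtype=np.float64)
    feats = np.zeros((len(points), 5 * len(radii)))
    if len(points) < min_points:
        return feats
    for i, p in enumerate(points):
        d2 = np.sum((points - p) ** 2, axis=1)
        for s, r in enumerate(radii):
            feats[i, 5 * s:5 * s + 5] = descriptors_from_neighbors(points[d2 <= r * r])
    return feats

# Points sampled along a straight wire: linearity should be close to 1.
wire = np.array([[i * 0.02, 0.0, 0.0] for i in range(20)])
wire_feats = multiscale_features(wire)
```

For the wire-like sample, the first column (linearity at the 0.1m scale) is close to 1 and the second (planarity) is close to 0, which matches the qualitative description above.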
[0100] S13, Normalized Height Feature Extraction: Step S13 uses the vertical distribution patterns of different objects in the power transmission line scene to extract the normalized height of each point relative to the local ground and introduces it as an additional input channel into the deep learning network, thereby enhancing the model's discrimination ability in the height dimension.
[0101] The design of this step is based on the fact that different types of targets in a power transmission line corridor scenario typically exhibit relatively stable height distribution differences. The normalized height of targets such as the ground and roads is usually close to 0m; vegetation is typically distributed in the lower to middle height range; and overhead targets such as conductors, ground wires, optical cables, and towers are usually located at higher positions. Since the absolute Z-coordinate of the original point cloud is affected by factors such as terrain undulations and elevation changes in the data collection area, the absolute elevation of the same type of target may vary significantly in different scenarios. Therefore, if the original Z-coordinate is used directly, the network will find it difficult to stably learn the height distribution pattern. This step uses ground elevation estimation to convert the original elevation into a normalized height relative to the local ground, effectively reducing the interference caused by terrain undulations and making the vertical spatial distribution differences of different categories clearer. This provides additional discrimination criteria for distinguishing between categories such as conductors, ground wires, vegetation, and the ground.
[0102] In one specific embodiment, the height normalization process includes the following steps: (1) Ground elevation grid construction: First, the point cloud is projected onto the XY horizontal plane, and the planar coordinates (x, y) of each point are used as the basis for grid positioning. A ground elevation grid is constructed by regular division with a cell size of 2.0m. For each non-empty grid cell, the minimum Z-coordinate of all points within that cell is calculated and recorded as the initial ground elevation estimate for that cell. Using the minimum elevation as the initial ground estimate relies on the fact that ground points in transmission line corridor scenes are usually located at local lowest points, which largely suppresses the interference of non-ground targets such as vegetation, towers, and conductors with the ground estimation. The 2.0m cell size balances the ability to represent ground undulations against the stability of the estimation, preserving some local terrain variation while avoiding the large number of empty cells that an overly dense grid would produce.
[0103] (2) Ground elevation interpolation for empty grid cells: For empty grid cells without point coverage, no minimum elevation can be observed directly, so the value is interpolated from the ground information of the surrounding valid grid cells. Specifically, the center coordinates of all valid grid cells are collected and, together with the corresponding ground elevation values, used to build a nearest neighbor regression model that completes the ground elevation of empty grid cells. The number of nearest neighbors K is the smaller of 5 and the number of valid grid cells, and interpolation is distance-weighted, giving higher weight to estimates from closer valid grid cells. This step expands the discrete local minimum elevation samples into a continuous ground elevation surface, improving the completeness and stability of the subsequent point-by-point normalized height calculation.
[0104] (3) Point-by-point normalized height calculation: For each point $p_i = (x_i, y_i, z_i)$ in the point cloud, the estimated local ground elevation at the corresponding location is queried according to its planar coordinates $(x_i, y_i)$ and denoted as $g_i$. The normalized height of this point is defined as $h_i = z_i - g_i$, where $h_i$ is the vertical distance of point $p_i$ relative to the local ground below it. For ground points and road points, $h_i$ is usually close to 0; for vegetation points, $h_i$ is typically a small or moderate positive value; for overhead targets such as conductors, ground wires, optical cables, and high-level structures of towers, $h_i$ is typically a large positive value. In this way, the height distribution of the same type of target under different terrain conditions is mapped to a unified relative height space, improving the stability and comparability of the feature representation.
[0105] (4) Degenerate sample fault tolerance: When the total number of points in the point cloud is less than the preset minimum point threshold, an all-zero normalized height array and the corresponding ground elevation result are output directly. This avoids problems such as the inability to construct a ground grid, an insufficient number of valid grid cells, or unstable nearest neighbor interpolation, and prevents abnormal interruption of training or inference caused by locally degenerate samples, ensuring the robustness and continuity of the entire processing flow.
[0106] (5) Normalized height feature output: Finally, the point-by-point normalized height is output. As a one-dimensional additional feature, it is combined with the three-dimensional coordinate features and the multi-scale geometric features to form the enhanced input feature vector. Compared with input that uses only the original spatial coordinates, this normalized height feature explicitly introduces the vertical structural prior of the transmission line scene, which helps the model understand the spatial relationships of different types of targets and, in particular, improves the separation of ground, vegetation, and overhead line targets.
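The height normalization steps can be sketched end to end with NumPy. This is an illustration under assumptions: the grid is stored as a dense 2D array (which may be impractical for very large corridors), empty cells are marked with NaN, inverse-distance weights stand in for the distance-weighted nearest neighbor regression, and the function names are hypothetical. The degenerate-sample fallback is omitted.

```python
import numpy as np

def ground_grid(points, cell=2.0):
    """Per-cell minimum Z on a regular XY grid; NaN marks empty cells."""
    points = np.asarray(points, dtype=np.float64)
    xy_min = points[:, :2].min(axis=0)
    ij = np.floor((points[:, :2] - xy_min) / cell).astype(np.int64)
    ground = np.full(tuple(ij.max(axis=0) + 1), np.inf)
    np.minimum.at(ground, (ij[:, 0], ij[:, 1]), points[:, 2])
    ground[np.isinf(ground)] = np.nan
    return ground, ij

def fill_empty_cells(ground, cell=2.0, k=5):
    """Fill NaN cells by inverse-distance weighting of the K nearest valid cells."""
    ground = np.array(ground, dtype=np.float64)
    valid = ~np.isnan(ground)
    if not valid.any():
        return ground
    centers = (np.argwhere(valid) + 0.5) * cell    # centers of valid cells
    values = ground[valid]                         # same row-major order as centers
    kk = min(k, len(values))
    for idx in np.argwhere(~valid):
        d = np.linalg.norm(centers - (idx + 0.5) * cell, axis=1)
        nn = np.argsort(d)[:kk]
        w = 1.0 / np.maximum(d[nn], 1e-9)
        ground[tuple(idx)] = np.sum(w * values[nn]) / np.sum(w)
    return ground

def normalized_height(points, cell=2.0, k=5):
    """Return h = z - g for every point, together with the completed ground grid."""
    points = np.asarray(points, dtype=np.float64)
    ground, ij = ground_grid(points, cell)
    ground = fill_empty_cells(ground, cell, k)
    g = ground[ij[:, 0], ij[:, 1]]
    return points[:, 2] - g, ground

# Two locations at different terrain elevations, each with a point 12 m above ground.
sample = [[0.5, 0.5, 100.0], [1.0, 1.0, 112.0], [10.5, 0.5, 250.0], [10.9, 0.7, 262.0]]
h, surface = normalized_height(sample)
```

Both elevated points receive the same normalized height of 12 despite a 150 m difference in absolute elevation, which is the terrain-independence the step is designed to provide.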
[0107] S14, Enhanced Feature Vector Assembly: Step S14 performs unified format conversion and dimensional concatenation on the extracted multi-source heterogeneous features to assemble them into a standardized enhanced feature vector that can be directly accepted by the deep learning network.
[0108] The purpose of this step is to address the heterogeneous formats of multi-source features. Step S12 outputs the geometric features as a 10-dimensional NumPy array, step S13 outputs the height feature as a 1-dimensional NumPy array, while the coordinate features may be NumPy arrays or PyTorch tensors, and the data types of these features may be inconsistent. Directly concatenating such heterogeneous features can lead to type errors or precision loss. Through unified format conversion and dimension alignment, this step ensures that the final output feature vector meets the input requirements of the deep learning network.
[0109] For each point in the point cloud, its three-dimensional coordinates, 10-dimensional multi-scale geometric features, and 1-dimensional normalized height feature are concatenated into a 14-dimensional enhanced feature vector, arranged in the following order: [x, y, z, 10-dimensional geometric features, normalized height].
[0110] In one specific embodiment, the module performs the following operations: (1) Feature key traversal: Extract each feature component from the data dictionary in sequence according to the preset feature key list: (feat_keys=('coord', 'geometric_features', 'height_normalized')).
[0111] (2) Unified Format Conversion: Type checking and conversion are performed on each feature component. If the feature is a NumPy array, its data type is first ensured to be float32, and then converted to a PyTorch tensor using `torch.from_numpy`; if the feature is already a PyTorch tensor, it is ensured to be of type float32 using the `.float()` method. This unified conversion ensures that all feature components have the same data type, avoiding precision loss caused by mixed precision concatenation.
[0112] (3) Point count consistency verification: Using the number of points N of the coordinate features as the reference, check whether the first dimension (the point count dimension) of each feature component equals N. If the number of points of a feature component does not match the reference value, that feature component is skipped and a warning message is output, rather than throwing an exception and interrupting training. This fault-tolerance mechanism ensures that the training process can continue to run even when faced with occasional data inconsistencies.
[0113] (4) Dimensional alignment: Perform a dimension expansion operation on a one-dimensional feature to expand it into a two-dimensional tensor, so that it is aligned with other two-dimensional features in terms of dimensions.
[0114] (5) Feature concatenation: Concatenate all validated feature components along the feature dimension to obtain the final 14-dimensional enhanced feature tensor. If all feature components fail validation, fall back to the coordinate features as the default input so that the network always has valid input.
[0115] (6) Offset calculation: To support the batch processing mechanism of Point Transformer V3, the point offset of the current sample is calculated and recorded. This offset identifies the boundary position of each sample when multiple samples are concatenated into a batch.
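The assembly steps above can be sketched as follows. To keep the example free of a PyTorch dependency, it works on NumPy arrays only, so the `torch.from_numpy` conversion is not shown. The function names are hypothetical, and the offset is assumed to follow the cumulative point count convention (the end index of each sample in the concatenated batch).

```python
import warnings
import numpy as np

FEAT_KEYS = ("coord", "geometric_features", "height_normalized")

def assemble_features(data, feat_keys=FEAT_KEYS):
    """Concatenate the validated feature components into one (N, D) float32 array."""
    coord = np.asarray(data["coord"], dtype=np.float32)
    n = coord.shape[0]                       # reference point count N
    parts = []
    for key in feat_keys:
        if key not in data:
            continue
        feat = np.asarray(data[key], dtype=np.float32)
        if feat.shape[0] != n:
            # Skip inconsistent components instead of interrupting training.
            warnings.warn(f"skipping '{key}': {feat.shape[0]} points, expected {n}")
            continue
        if feat.ndim == 1:
            feat = feat[:, None]             # (N,) -> (N, 1)
        parts.append(feat)
    return np.concatenate(parts, axis=1) if parts else coord

def batch_offsets(point_counts):
    """Cumulative point counts marking where each sample ends in a batch."""
    return np.cumsum(point_counts)

sample = {
    "coord": np.zeros((4, 3)),
    "geometric_features": np.ones((4, 10)),
    "height_normalized": np.arange(4, dtype=np.float64),
}
feats = assemble_features(sample)
```

With all three components present the result has 14 columns in the order coordinates, geometric features, normalized height; a component with the wrong point count is dropped with a warning and the remaining columns are still returned.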
[0116] Through the above steps, the present invention expands the original 3D coordinate features into 14-dimensional enhanced feature vectors. Without changing the backbone network architecture, it provides the network with rich geometric prior information and height discrimination information, significantly enhancing the network's ability to distinguish various objects in the power transmission line scenario.
[0117] S2, Deep feature learning based on sequence attention mechanism: The 14-dimensional enhanced feature vector is input into the Point Transformer V3 backbone network, and deep feature learning is performed through a multi-level serialized encoder-decoder structure to output prediction scores for 22 semantic categories at each point.
[0118] This step aims to leverage a deep learning network to learn high-level semantic representations from the 14-dimensional enhanced features constructed in step S1, enabling point-by-point classification of 22 object categories in the power transmission line scene. This step employs Point Transformer V3 (PTv3) as the backbone network, which is one of the leading architectures in the field of 3D point cloud semantic segmentation.
[0119] PTv3 serializes unordered 3D point clouds into a one-dimensional sequence, thereby preserving spatial locality and transforming the attention computation of 3D point clouds into window attention computation of a one-dimensional sequence, significantly reducing computational complexity. By arranging the 3D point cloud into a one-dimensional sequence along a spatial filling curve, self-attention computation can be performed within a local window of the sequence. Points within the window are also neighbors in 3D space, thus achieving an efficient local attention mechanism.
[0120] In this invention, the number of input channels of the PTv3 backbone network is modified from the standard 3 channels to 14 channels to receive the 14-dimensional enhanced feature vector output in step S1. This modification is mainly reflected in the dimensionality adaptation of the input embedding layer, without changing the main encoder-decoder structure of the backbone network, thus preserving the efficiency advantage of PTv3 in large-scale point cloud processing.
[0121] In one specific embodiment, the detailed configuration of the backbone network is as follows: (1) Input layer configuration: The input channel count is set to 14, corresponding to the 14-dimensional enhanced feature vector output in step S1, which includes 3-dimensional XYZ coordinates, 10-dimensional multi-scale geometric features (5 descriptors for each of the two scales), and 1-dimensional normalized height. The first layer of the network projects the 14-dimensional features to the channel count of the first encoder stage (32 dimensions). This projection is implemented as a linear layer whose weights are optimized jointly with the other network parameters during training. Compared with the standard 3-channel input, the 14-channel input gives the network a richer initial feature representation: explicit information about local geometry and relative height is available at the first layer, rather than having to be inferred implicitly from the coordinates through multiple layers.
[0122] (2) Serialization strategy configuration: The serialization strategy employs a combination of various space-filling curve serialization strategies, including Z-order and Hilbert-type serialization methods and their variants. Z-order curves construct a one-dimensional index by interleaving binary bits across dimensions, offering high computational efficiency but slightly weaker preservation of spatial locality. Hilbert curves construct a one-dimensional index through recursive spatial partitioning and rotation, providing stronger preservation of spatial locality but with slightly higher computational overhead. Transposed curves are variants obtained by swapping the coordinate axis order and then applying the original curve, offering different spatial traversal paths.
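To make the bit-interleaving idea concrete, the following sketch computes a Z-order (Morton) key for grid indices and sorts points by it. It is an illustration under stated assumptions: the function names, the 10-bit depth, and the choice of swapping the x and y axes for the transposed variant are hypothetical and not taken from the backbone's actual code; the Hilbert curve is not shown.

```python
def morton_code_3d(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three non-negative grid indices."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code

def serialize_z_order(points, grid_size, transpose=False):
    """Return point indices sorted along a Z-order curve.

    points: list of (x, y, z) tuples with non-negative coordinates.
    """
    keys = []
    for x, y, z in points:
        ix, iy, iz = int(x // grid_size), int(y // grid_size), int(z // grid_size)
        if transpose:              # swapped-axis variant gives a different traversal path
            ix, iy = iy, ix
        keys.append(morton_code_3d(ix, iy, iz))
    return sorted(range(len(points)), key=keys.__getitem__)

pts = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
order = serialize_z_order(pts, grid_size=1.0)
order_t = serialize_z_order(pts, grid_size=1.0, transpose=True)
```

Points that are adjacent in the returned order share high-order key bits, which is why a window over the sequence tends to cover spatial neighbors.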
[0123] (3) Encoder configuration: The encoder employs a 5-level hierarchical structure, constructing a multi-scale feature pyramid by progressively increasing the number of channels and decreasing the number of points. Downsampling is performed between levels using a serialized pooling operation with a stride of 2, halving the number of points and doubling the number of channels with each downsampling. The fourth level has a depth of 6 layers because deeper network layers facilitate learning more complex semantic feature representations within a 256-dimensional intermediate feature space. Within each attention layer, the serialized one-dimensional point sequence is divided into non-overlapping patches of size 512, and standard multi-head self-attention computation is performed within each patch. The choice of patch size 512 represents a balance between the receptive field of the attention and computational efficiency.
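The channel and point-count progression described above, and the partition of a serialized sequence into attention patches, can be illustrated with a short sketch. The function names are hypothetical, the point counts follow the idealized halving stated in the text, and the final patch is simply left shorter here, which is a simplification.

```python
def encoder_schedule(n_points, base_channels=32, stages=5, stride=2):
    """Return (stage, channels, points) per encoder stage under ideal stride-2 pooling."""
    schedule = []
    channels, points = base_channels, n_points
    for stage in range(1, stages + 1):
        schedule.append((stage, channels, points))
        channels *= 2                        # channels double after each downsampling
        points = max(1, points // stride)    # point count halves
    return schedule

def split_into_patches(order, patch_size=512):
    """Split a serialized index sequence into non-overlapping patches."""
    return [order[i:i + patch_size] for i in range(0, len(order), patch_size)]

schedule = encoder_schedule(65536)
patches = split_into_patches(list(range(1300)))
```

With a 32-channel first stage, the fourth stage works in the 256-dimensional feature space mentioned in the text, and a 1300-point sequence yields patches of 512, 512, and 276 points.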
[0124] (4) Decoder configuration: The decoder employs a four-level hierarchical structure, recovering point-by-point features at the original resolution by progressively reducing the number of channels and increasing the number of points. Upsampling is performed between levels through serialized unpooling operations, and skip connections fuse features from the corresponding encoder level with those of the decoder, preserving multi-scale spatial detail. Skip connections are particularly important for power transmission line scenarios: features from the lower levels of the encoder retain fine spatial information about small objects (conductors, jumpers), while features from the higher levels capture the global semantic context (such as the spatial relationship between towers and conductors). Fusing these features allows the decoder to use both local details and global semantics for point-by-point classification.
[0125] (5) Output layer configuration: The final output of the decoder is mapped to 22 semantic segmentation predictions through a linear classification head. This classification head maps the 64-dimensional features of the decoder's last stage to a 22-dimensional logit vector, with each dimension corresponding to an unnormalized prediction score for a semantic category. The 22 semantic categories cover all major object types in the transmission line corridor, as defined in Table 1.
[0126] Table 1 Semantic Category Definitions
[0127] S3, Model training based on combined loss function and confusion pair constraints: A three-factor combination of Lovász-Softmax loss, Focal loss, and class balance loss is adopted as the main loss function, and a hinge loss constraint for 16 predefined easily confused spatial structure class pairs is introduced as an auxiliary loss function to jointly optimize the model parameters.
[0128] This step aims to address the training optimization challenges posed by extreme class imbalance and the coexistence of numerous easily confused spatial structure class pairs in transmission line scenarios. In the semantic segmentation task of transmission line point clouds, the sample number distribution of the 22 classes exhibits an extreme long-tail characteristic: the number of points for dominant classes (such as vegetation and ground) can reach hundreds of thousands, while the number of points for rare classes (such as jumpers, optical cables, and V-string insulators) is only in the tens to hundreds. Under this extreme imbalance condition, the standard cross-entropy loss function becomes severely ineffective—because it assigns equal weights to each sample, the model's gradient update is dominated by the dominant class, and the contribution of rare classes to the loss function is submerged, resulting in the model being almost unable to learn the discriminative features of rare classes. Furthermore, there are numerous class pairs with highly similar geometric shapes in transmission line scenarios, and the standard loss function lacks the ability to specifically constrain these easily confused spatial structure class pairs, leading to a persistently high confusion rate between these classes.
[0129] To address the two aforementioned challenges, this invention proposes a joint training method that integrates multi-objective optimization and explicit confusion constraints. The core idea of this method is to collaboratively address the class imbalance problem from three orthogonal dimensions—IoU optimization, hard example focusing, and class balance—through a triple-combined main loss function; simultaneously, a confusion-constrained auxiliary loss function explicitly widens the decision boundary between easily confused spatial structure classes in the logit space. The combined effect of these two loss functions enables the model to significantly improve the recognition ability of rare classes and easily confused spatial structure classes while maintaining the segmentation accuracy of the dominant class.
[0130] As shown in Figure 5, S3 includes the following sub-steps: S31, Triple combined main loss function: Step S31 aims to construct a composite loss function that addresses extreme class imbalance collaboratively from multiple orthogonal dimensions. A single loss function typically addresses only one aspect of the class imbalance problem: cross-entropy loss is insensitive to class imbalance; Focal loss focuses on hard examples but does not directly optimize segmentation quality metrics; Lovász loss directly optimizes IoU but is insensitive to class frequency. This invention weights and combines three complementary loss functions so that they work together along three dimensions: segmentation quality optimization, hard example focusing, and class balance.
[0131] The combined loss function is a weighted sum of three components: $L_{\text{main}} = \lambda_1 L_{\text{Lovász}} + \lambda_2 L_{\text{Focal}} + \lambda_3 L_{\text{CB}}$, where $\lambda_1 = \lambda_2 = \lambda_3 = 1$ are the weight coefficients of the components. Setting all three weights to 1 aims to maintain a balance in gradient contribution among the three loss functions, preventing the optimization objective of one dimension from suppressing the others. The specific definitions of the three loss functions are as follows: (1) Lovász-Softmax Loss: This loss function transforms the discrete IoU index into a continuously differentiable surrogate loss through the Lovász extension, enabling the model to directly optimize IoU. It employs a strategy of "only calculating categories present in the current batch," calculating the sorted weighted inner product of the point-by-point prediction errors for each category as the loss value. This loss is particularly sensitive to the quality of the segmentation boundaries, helping to improve the boundary accuracy of small linear objects.
[0132] (2) Focal Loss: This loss function is an improved variant of the cross-entropy loss. By introducing a modulation factor, it automatically reduces the weight of easily classified samples so that the model focuses on hard-to-classify samples. In this invention, the focusing parameter γ is set to 3.5 (higher than the commonly used 2.0) to adapt to the extreme class imbalance in the transmission line scenario. The loss uses the same modulation factor for all classes and focuses on hard examples solely through the γ parameter.
[0133] (3) Class Balance Loss: This loss function assigns adaptive weights to each class based on the Effective Number of Samples theory. For category c with sample size $n_c$, the effective number of samples is defined as $E_{n_c} = (1 - \beta^{n_c}) / (1 - \beta)$, where $\beta \in [0, 1)$ is a hyperparameter. The class weights are inversely proportional to the effective number of samples and are normalized before being applied to the cross-entropy loss. Mathematically, this weighting can be shown to transition smoothly between inverse frequency weighting and uniform weighting, effectively mitigating the class imbalance caused by long-tail distributions without excessively amplifying noise from extremely rare classes.
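Two of the three components can be sketched compactly in NumPy: the Focal loss with γ = 3.5 and class weights from the effective number of samples. This is an illustrative sketch, not the training code: the value `beta=0.999` and the normalization convention (weights of the present classes sum to their count) are assumptions, and the Lovász-Softmax term is omitted.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)   # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def focal_loss(logits, labels, gamma=3.5):
    """Mean focal loss with one modulation factor (1 - p_t)^gamma for all classes."""
    labels = np.asarray(labels)
    p = softmax(np.asarray(logits, dtype=np.float64))
    p_t = np.clip(p[np.arange(len(labels)), labels], 1e-12, 1.0)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

def class_balanced_weights(counts, beta=0.999):
    """Weights inversely proportional to the effective number (1 - beta^n) / (1 - beta)."""
    counts = np.asarray(counts, dtype=np.float64)
    effective = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = np.zeros_like(counts)
    present = counts > 0                              # classes with no samples get weight 0
    weights[present] = 1.0 / effective[present]
    return weights * present.sum() / weights.sum()    # assumed normalization convention

logits = np.array([[5.0, 0.0], [0.0, 5.0]])
easy = focal_loss(logits, [0, 1])     # confident and correct
hard = focal_loss(logits, [1, 0])     # confident and wrong
weights = class_balanced_weights([200000, 50])
```

The rare class (50 points) receives a much larger weight than the dominant class (200,000 points), while the effective-number formula keeps that weight bounded.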
[0134] S32, Confusion Pair Constraint Auxiliary Loss Function: This invention proposes a confusion pair constraint loss function based on a hinge loss mechanism. For each predefined pair of easily confused spatial structure categories (a, b), it forces points belonging to category a to have a logit value for category a that exceeds the logit value for category b by at least the margin m=1.0, and vice versa. When the separation is insufficient, the positive loss produces a gradient signal that drives the model to widen the decision boundary.
[0135] The loss function computes a bidirectional hinge loss for each of the 16 confusion pairs, i.e. the mean of $\max(0,\, m - (z_{\text{true}} - z_{\text{conf}}))$ over the relevant points, where $z_{\text{true}}$ is the true-class logit value and $z_{\text{conf}}$ is the confused-class logit value. The losses of all confusion pairs are averaged and then multiplied by the global weight w_conf=2.5. If a class has no samples in the current batch, the loss for that class is automatically skipped.
[0136] The margin m is set to 1.0 (higher than the common 0.5) because the geometric similarity of confused class pairs is extremely high in the transmission line scenario, requiring a larger margin. The global weight is set to 2.5 (higher than the weight of 1.0 for each component of the main loss) to ensure that the confusion pair constraints have sufficient gradient strength.
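The bidirectional hinge constraint can be sketched in NumPy as below. The function name is hypothetical, and the aggregation order (mean over points, then over the two directions of a pair, then over pairs) is an assumption, since the text only states that the pair losses are averaged before applying the global weight.

```python
import numpy as np

def confusion_pair_loss(logits, labels, pairs, margin=1.0, weight=2.5):
    """Bidirectional hinge loss over confusion pairs.

    logits: (N, C) unnormalized scores; labels: (N,) integer ground truth.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    pair_losses = []
    for a, b in pairs:
        directions = []
        for true_cls, conf_cls in ((a, b), (b, a)):
            mask = labels == true_cls
            if not mask.any():          # class absent from this batch: skip
                continue
            gap = logits[mask, true_cls] - logits[mask, conf_cls]
            directions.append(np.maximum(0.0, margin - gap).mean())
        if directions:
            pair_losses.append(np.mean(directions))
    if not pair_losses:
        return 0.0
    return float(weight * np.mean(pair_losses))

logits = np.array([[3.0, 0.0], [0.5, 0.0], [0.0, 2.0]])
labels = np.array([0, 0, 1])
loss = confusion_pair_loss(logits, labels, pairs=[(0, 1)])
```

In this toy batch only the second point violates the margin (gap 0.5 < 1.0), so the hinge contributes 0.5 for that point and zero elsewhere.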
[0137] The 16 easily confused spatial structure category pairs are predefined based on domain expert knowledge and confusion matrix analysis, and are grouped into four categories according to the cause of confusion: Category 1: Geometric morphological confusion between linear objects. This includes six pairs: jumper 0 vs. conductor 10, jumper 0 vs. ground wire 19, jumper 0 vs. optical cable 20, conductor 10 vs. ground wire 19, optical cable 20 vs. conductor 10, and optical cable 20 vs. ground wire 19. In the 3D point cloud, jumpers, conductors, ground wires, and optical cables in power transmission lines all appear as one-dimensional linear structures, exhibiting inherent similarities in their local geometric features: normal vectors are radially distributed along the line segment, curvature is close to zero, and linearity is close to 1.0. Moreover, in local areas such as the tower-side commutation zone and the lowest point of sag, the orientations of different cables tend to be consistent (e.g., conductors and ground wires are approximately horizontal at the center of the span), making it impossible for the model to distinguish them based solely on local geometric features. The four types of cables, taken pairwise, form exactly six confusion pairs, covering all possible confusions between cables.
[0138] Category 2: Confusion due to structural similarity between insulator types. This includes four pairs: V-string insulator 13 vs. tension insulator 12, V-string insulator 13 vs. straight insulator 18, spacer 14 vs. straight insulator 18, and V-string insulator 13 vs. spacer 14. V-string insulators, tension insulators, straight insulators, and spacers are all small components mounted on conductors or towers. In point clouds, they appear as short rod-shaped or beaded structures intersecting the conductor / tower, with dimensions ranging from 0.5 to 2 meters (far smaller than the hundreds of meters of conductors or the tens of meters of towers). Because they contain few points (hundreds to thousands, accounting for <0.1% of the total number of points in the scene) and have similar geometric shapes (all are short string structures along the vertical or horizontal direction), the model is extremely prone to confusing them with each other.
[0139] Category 3: Boundary confusion caused by spatial adjacency. This includes four pairs: jumper 0 vs. tension insulator 12, tension insulator 12 vs. conductor 10, tension insulator 12 vs. tower 11, and ground wire 19 vs. tower 11. The various components of the transmission line are physically connected by fittings. In the point cloud at these connections, points of different categories are spatially intertwined, and the labeled boundaries themselves have uncertainties ranging from several centimeters to tens of centimeters. The model exhibits systematic misclassification in these transitional regions due to the lack of clear geometric interfaces.
[0140] Category 4: Confusion caused by occlusion and background interference. This includes two pairs: ground wire 19 vs. low vegetation 8 and optical cable 20 vs. low vegetation 8. Within the transmission line corridor, vegetation (trees, shrubs) and cables may intertwine spatially in the vertical direction, especially in areas where tree-line conflicts are prominent and tall tree canopies encroach on the safe distance range of the cables. In the 3D point cloud, vegetation points and cable points lack clear spatial separation, and the local linear features of vegetation branches may resemble those of thin cables.
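The sixteen pairs listed above can be captured as a small lookup table with a consistency check. The class IDs are those quoted in the text; the dictionary keys and variable names are hypothetical labels chosen for this sketch.

```python
from itertools import combinations

CONFUSION_PAIRS = {
    # Category 1: jumper 0, conductor 10, ground wire 19, optical cable 20
    "linear_geometry": [(0, 10), (0, 19), (0, 20), (10, 19), (20, 10), (20, 19)],
    # Category 2: V-string 13, tension 12, straight 18 insulators, spacer 14
    "insulator_similarity": [(13, 12), (13, 18), (14, 18), (13, 14)],
    # Category 3: components joined by fittings, tower 11
    "spatial_adjacency": [(0, 12), (12, 10), (12, 11), (19, 11)],
    # Category 4: cables against low vegetation 8
    "occlusion_background": [(19, 8), (20, 8)],
}

ALL_PAIRS = [pair for group in CONFUSION_PAIRS.values() for pair in group]

# The four cable classes taken pairwise should give exactly the six Category 1 pairs.
cable_pairs = {frozenset(p) for p in combinations((0, 10, 19, 20), 2)}
category_1 = {frozenset(p) for p in CONFUSION_PAIRS["linear_geometry"]}
```

A flat list such as `ALL_PAIRS` is the natural input for a pairwise hinge constraint, and the set comparison confirms that Category 1 covers every cable-to-cable combination.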
[0141] S33, Training Strategy Configuration: Model training uses the following strategy: the AdamW optimizer with an initial learning rate of 0.0001 and a weight decay of 0.005; the OneCycleLR scheduler, with linear warm-up over the first 4% of iterations followed by cosine annealing; 100 training epochs, with the data looped 3 times within each epoch (loop=3); parallel training on 4 GPUs with a total batch size of 16; a gradient clipping threshold of 1.0; and mixed precision training enabled. Because the power transmission line scenario is subject to strict physical structural constraints, all random data augmentation operations (rotation, scaling, flipping, etc.) are removed, and grid sampling and sphere cropping use preset strategies, while random shuffling of the point order is retained.
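The learning-rate shape described in S33 (linear warm-up for the first 4% of iterations, then cosine annealing) can be written out directly. This sketch is not the PyTorch scheduler itself: it treats 0.0001 as the peak rate, and `start_lr` and `final_lr` are hypothetical values chosen only for illustration.

```python
import math

def lr_at(step, total_steps, peak_lr=1e-4, warmup_frac=0.04,
          start_lr=1e-5, final_lr=1e-8):
    """Learning rate at `step` for linear warm-up followed by cosine annealing."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear ramp from start_lr up to peak_lr.
        return start_lr + (peak_lr - start_lr) * step / warmup_steps
    # Cosine decay from peak_lr down to final_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))

total = 1000
curve = [lr_at(s, total) for s in range(total + 1)]
```

With 1000 total steps the rate peaks at step 40 and decays monotonically afterwards.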
[0142] The total loss function is the direct sum of the main loss and the auxiliary loss: Total Loss = Lovász Loss + Focal Loss + Class Balance Loss + Confusion Pair Constraint Loss.
[0143] S4, structured post-processing based on domain knowledge: The model's original prediction results are subjected to multi-stage geometric analysis and topological constraint correction in sequence, and the final semantic segmentation results are output.
[0144] This step aims to systematically correct structural errors in deep learning model predictions using knowledge of the physical topology of transmission lines. Although the improvements in steps S1 to S3 significantly enhance the model's segmentation accuracy, the model output still inevitably contains some prediction errors that violate physical common sense (such as isolated noise points, confusion between conductors and ground wires, inconsistent labels on the same line, and vegetation infiltration at tower edges). This step uses an ordered post-processing pipeline to eliminate these structural errors step by step, ultimately outputting the geometric structure segmentation results of the point cloud (i.e., each point is assigned a label representing its spatial structural attributes, such as conductor, tower, vegetation, etc.).
[0145] As shown in Figure 6, S4 includes the following sub-steps: S41, Global KNN Voting Smoothing: Perform K-nearest neighbor (K=10) voting reclassification on all points in the entire scene: construct a KNN classifier using the global point cloud as the training set, query the 10 nearest neighbors of each point, and use the majority class in the neighborhood as the new predicted label for that point. This uniformly eliminates scattered noise points and makes the predictions of adjacent points more consistent. This step is only performed when the number of points in the scene does not exceed 500,000, to control the amount of computation.
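A minimal sketch of the voting step is given below. It uses a brute-force neighbor search in NumPy for clarity, which is only practical for small examples; a spatial index such as a KD-tree would be the usual choice at scene scale. The function name is hypothetical.

```python
import numpy as np

def knn_vote_smooth(points, labels, k=10, max_points=500_000):
    """Relabel each point with the majority label among its k nearest neighbors."""
    points = np.asarray(points, dtype=np.float64)
    labels = np.asarray(labels)
    n = len(points)
    if n == 0 or n > max_points:        # skip large scenes to bound the computation
        return labels.copy()
    k = min(k, n)
    smoothed = labels.copy()
    for i in range(n):
        dist2 = np.sum((points - points[i]) ** 2, axis=1)
        neighbors = np.argpartition(dist2, k - 1)[:k]   # includes the point itself
        # Votes always use the original labels, not labels already smoothed.
        smoothed[i] = np.bincount(labels[neighbors]).argmax()
    return smoothed

points = [[i * 0.1, 0.0, 0.0] for i in range(12)]
labels = [3] * 12
labels[6] = 5                           # one isolated noise label
smoothed = knn_vote_smooth(points, labels)
skipped = knn_vote_smooth(points, labels, max_points=5)
```

The single outlier is outvoted by its neighbors, while the call with a low `max_points` returns the labels unchanged, mirroring the 500,000-point guard.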
[0146] S42, Conductor / Ground Wire Correction Based on Parallel Line Hierarchy Priors: Both the conductor and the ground wire are horizontally suspended parallel lines. The Z-height of a given line is approximately the same at all of its points, and the ground wire is always above the conductor. The method first extracts line-type point clouds (conductors, ground wires, and optical cables) within a 100m horizontal radius of the tower. The Z-coordinates are clustered by gap and sorted from high to low to establish a "hierarchy table": the highest layer is labeled as ground wire; if the difference between the Z-means of the two highest layers does not exceed 10m, both layers are labeled as ground wires; the remaining layers are labeled as conductors. Then, all line-type points in the entire scene are segmented along the main axis of the point cloud with a step size of 10m. Within each segment, the line-type points are clustered again by Z, and the Z-mean of each cluster is compared with the hierarchy table. If the Z-difference to the nearest layer is within ±3.5m, the cluster is corrected to the target label of that layer in the hierarchy table (for example, a misclassified conductor is upgraded to a ground wire). Otherwise, the cluster is treated as an incomplete line segment that drifted in from outside the scene and is left unmodified. Conductor points located within 20m horizontally of the tower and with a Z-value higher than 80% of the tower top height are exempt from upgrading, to avoid mistakenly converting jumpers high up next to the tower into ground wires.
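The hierarchy-table logic can be sketched in plain Python as follows. The 10m and ±3.5m thresholds come from the text; the clustering gap of 2.0m and all function names are assumptions, and the jumper exemption near the tower is omitted for brevity.

```python
def cluster_by_gap(z_values, gap):
    """Group sorted Z values; a new cluster starts where consecutive values differ by more than `gap`."""
    clusters = []
    for z in sorted(z_values):
        if clusters and z - clusters[-1][-1] <= gap:
            clusters[-1].append(z)
        else:
            clusters.append([z])
    return clusters

def build_hierarchy(z_values, gap=2.0, ground_pair_max_diff=10.0):
    """Return [(z_mean, label)] sorted from the highest layer to the lowest."""
    layers = sorted((sum(c) / len(c) for c in cluster_by_gap(z_values, gap)), reverse=True)
    table = []
    for i, z_mean in enumerate(layers):
        is_ground = i == 0 or (i == 1 and layers[0] - z_mean <= ground_pair_max_diff)
        table.append((z_mean, "ground_wire" if is_ground else "conductor"))
    return table

def correct_label(cluster_z_mean, table, tolerance=3.5):
    """Target label of the nearest layer, or None if no layer lies within the tolerance."""
    z_layer, label = min(table, key=lambda row: abs(row[0] - cluster_z_mean))
    return label if abs(z_layer - cluster_z_mean) <= tolerance else None

table = build_hierarchy([45.0, 45.2, 30.0, 30.1, 22.0])
double_ground = build_hierarchy([45.0, 41.0, 30.0])
```

A segment cluster at Z = 44.0 maps to the ground wire layer, whereas one at Z = 37.0 matches no layer within 3.5m and is left unmodified.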
[0147] S43, Segment connectivity merging and cleanup: Conductors and optical cables exhibit linear continuity, and labels on the same line should remain consistent. The method divides the line-type point cloud into segments with a step size of 10m along the principal axis. Within each segment, clustering is performed using Z-intervals to obtain line clusters. A disjoint-set data structure is then used to merge clusters spanning adjacent segments and whose Z-mean difference is within three times the interval threshold into complete line groups. For each complete line group, the percentage of points for conductor 10 and optical cable 20 is calculated. If a certain type accounts for more than 55%, the labels of all mixed points in that line group are unified into the majority class, eliminating the mixing of conductor / optical cable labels on the same line. If ground wire points account for more than 50% of the line, the original labels are retained and not unified.
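The merge-and-unify logic of S43 can be sketched with a small disjoint-set structure. Assumptions in this sketch: each cluster is summarized as (segment index, Z-mean), the gap threshold of 1.0m is a placeholder, percentages are taken over all points in a line group, and only conductor/optical cable points are relabeled.

```python
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def merge_line_clusters(clusters, gap=1.0):
    """clusters: list of (segment_index, z_mean). Returns a group id per cluster."""
    ds = DisjointSet(len(clusters))
    for i, (seg_i, z_i) in enumerate(clusters):
        for j in range(i + 1, len(clusters)):
            seg_j, z_j = clusters[j]
            # Merge clusters in adjacent segments whose Z-means are within 3x the gap.
            if abs(seg_i - seg_j) == 1 and abs(z_i - z_j) <= 3 * gap:
                ds.union(i, j)
    return [ds.find(i) for i in range(len(clusters))]

def unify_group(labels, conductor=10, cable=20, ground=19, threshold=0.55):
    """Unify conductor/cable labels in one line group when one of them dominates."""
    n = len(labels)
    if n == 0 or labels.count(ground) / n > 0.5:    # mostly ground wire: keep as is
        return list(labels)
    for cls in (conductor, cable):
        if labels.count(cls) / n > threshold:
            return [cls if l in (conductor, cable) else l for l in labels]
    return list(labels)

groups = merge_line_clusters([(0, 30.0), (1, 30.5), (2, 31.0), (0, 45.0), (1, 45.2)])
unified = unify_group([10] * 6 + [20] * 4)
```

The three clusters near Z = 30m chain into one line group across segments 0 to 2, separate from the two clusters near Z = 45m.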
[0148] S44, Majority-vote label unification within line segments: Building on step S43, the line point cloud is divided into 5m segments along the main axis. Line points in each segment are clustered by Z-gap. For each cluster, majority voting is performed among conductor 10 / ground wire 19 / optical cable 20: if the percentage of a certain label exceeds the set threshold (default 60%), all line points in the cluster are unified to the majority label, further eliminating label jitter over short distances.
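A simplified sketch of this short-range vote follows. As a labeled simplification, each 5m segment is treated as a single cluster (the Z-gap clustering inside a segment is omitted), and the function names are hypothetical.

```python
from collections import Counter

LINE_CLASSES = (10, 19, 20)   # conductor, ground wire, optical cable

def majority_label(labels, threshold=0.60):
    """Return the dominant line label if its share exceeds the threshold, else None."""
    line_labels = [l for l in labels if l in LINE_CLASSES]
    if not line_labels:
        return None
    label, count = Counter(line_labels).most_common(1)[0]
    return label if count / len(line_labels) > threshold else None

def unify_short_segments(axis_pos, labels, segment_len=5.0, threshold=0.60):
    """axis_pos: position of each point along the main axis, in meters."""
    out = list(labels)
    segments = {}
    for i, (s, l) in enumerate(zip(axis_pos, labels)):
        if l in LINE_CLASSES:
            segments.setdefault(int(s // segment_len), []).append(i)
    for idx in segments.values():
        winner = majority_label([labels[i] for i in idx], threshold)
        if winner is not None:
            for i in idx:
                out[i] = winner
    return out

axis_pos = [0.5, 1.0, 2.0, 3.0, 4.5, 6.0, 7.0]
labels = [10, 10, 10, 19, 10, 19, 10]
result = unify_short_segments(axis_pos, labels)
```

In the first segment the conductor label holds 80% of the points and overrides the stray ground wire label; the second segment is split evenly and is left untouched.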
[0149] S45, Tower shape constraints based on 3D template matching: The tower has a fixed three-dimensional geometric shape, and the refined inference region can be determined by matching it with a pre-built template. The method maintains a template library containing various tower types (cement poles, steel pipe poles, lattice towers, etc.), storing the horizontal major axis span, minor axis span, tower height, and shape descriptors (major axis / tower height ratio, minor axis / tower height ratio, and logarithmic anisotropy value) of each template in a metadata file. For the tower point cloud identified in the coarse inference results, its shape descriptor is extracted. The template library is traversed to compare shape error and scale consistency jointly, and the template with the best score is selected as the matching result. The matching scale factor is then calculated. A dynamic refined region (horizontal range and Z range) centered on the tower is determined using the matched template and scale factor. Secondary inference is performed within this region using a fine grid. The fine inference results are fused with the coarse inference results according to confidence level, and predictions for the target categories (jumpers, insulators, spacers, etc.) are locally replaced, thereby constraining the segmentation accuracy of the tower and its surrounding components.
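The template selection can be illustrated with a small sketch. Several details are assumptions because the text does not specify them: the logarithmic anisotropy is taken as log(major / minor), the score is the descriptor distance plus the absolute log of the height ratio, and the template values are invented for the example.

```python
import math

def shape_descriptor(major, minor, height):
    """(major/height, minor/height, log anisotropy); the log term is an assumed definition."""
    return (major / height, minor / height, math.log(major / minor))

def match_template(observed, templates):
    """observed and each template: dict with 'major', 'minor', 'height' in meters."""
    obs_desc = shape_descriptor(observed["major"], observed["minor"], observed["height"])
    best_name, best_score, best_scale = None, float("inf"), None
    for name, t in templates.items():
        desc = shape_descriptor(t["major"], t["minor"], t["height"])
        shape_err = math.dist(obs_desc, desc)          # descriptor mismatch
        scale = observed["height"] / t["height"]       # matching scale factor
        score = shape_err + abs(math.log(scale))       # assumed combined score
        if score < best_score:
            best_name, best_score, best_scale = name, score, scale
    return best_name, best_scale

templates = {   # hypothetical template library entries
    "cement_pole": {"major": 1.0, "minor": 0.5, "height": 12.0},
    "lattice_tower": {"major": 20.0, "minor": 8.0, "height": 40.0},
}
observed = {"major": 22.0, "minor": 8.8, "height": 44.0}
name, scale = match_template(observed, templates)
```

The observed tower has the same proportions as the lattice template at 1.1 times its size, so that template wins and the scale factor can be used to size the refined region.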
[0150] S46, Vegetation correction at the edge of the tower: In areas bordering towers, point cloud category boundaries can be blurred, so the model tends to leave points adjacent to the tower structure labeled as vegetation or ground rather than as tower. The method performs this correction when the total number of tower points exceeds 100: it calculates the 3D axis-aligned bounding box of the tower points, searches within the bounding box for candidate noise points predicted as vegetation (low vegetation, high vegetation), ground, or other categories, and uses a KD-tree to query the distance between each candidate point and its nearest tower point. If this distance is less than 0.5m, the point is considered closely adjacent to the tower structure and is corrected to the tower category, thus filling in minor omissions in the tower edge region.
[0151] Through the ordered execution of the six sub-steps S41-S46 (KNN smoothing, conductor / ground wire hierarchy correction, line segment merging, label unification, template constraints, and edge correction), the post-processing pipeline systematically eliminates structural errors in model predictions and significantly improves the physical rationality of the final segmentation results.
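The edge correction can be sketched as below. For brevity the nearest-tower distance is computed by brute force in NumPy rather than with a KD-tree, all tower points are treated as a single tower, and the candidate class list is a parameter because only the IDs for tower (11) and low vegetation (8) are quoted in the text.

```python
import numpy as np

def correct_tower_edges(points, labels, tower_cls=11, candidate_cls=(8,),
                        min_tower_points=100, max_dist=0.5):
    """Relabel candidate points lying within max_dist of a tower point as tower."""
    points = np.asarray(points, dtype=np.float64)
    labels = np.asarray(labels).copy()
    tower = points[labels == tower_cls]
    if len(tower) <= min_tower_points:          # too few tower points: do nothing
        return labels
    lo, hi = tower.min(axis=0), tower.max(axis=0)           # axis-aligned bounding box
    in_box = np.all((points >= lo) & (points <= hi), axis=1)
    candidates = np.flatnonzero(in_box & np.isin(labels, candidate_cls))
    for i in candidates:
        nearest = np.sqrt(((tower - points[i]) ** 2).sum(axis=1).min())
        if nearest < max_dist:
            labels[i] = tower_cls
    return labels

g = np.arange(6) * 0.2
tower_pts = np.array([[x, y, z] for x in g for y in g for z in g])   # 216 tower points
veg_pts = np.array([[0.5, 0.5, 0.55], [5.0, 5.0, 5.0]])              # one inside, one far away
points = np.vstack([tower_pts, veg_pts])
labels = np.array([11] * 216 + [8, 8])
corrected = correct_tower_edges(points, labels)
small = correct_tower_edges(points[-52:], labels[-52:])
```

The vegetation point embedded in the tower lattice is relabeled as tower, while the distant one and the case with too few tower points are left unchanged.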
[0152] In comparative tests on 120 sets of measured point cloud datasets for 110kV and 500kV transmission lines, this solution achieved the following unexpected technical results compared to the existing general Point Transformer V3 segmentation solution: the average IoU of rare linear categories such as jumpers and optical cables was improved by 13.7 percentage points; the average recognition accuracy of various insulators was improved by 11.2 percentage points; the overall mIoU was improved by 7.2 percentage points; the inter-class confusion rate between conductors and ground wires, and between jumpers and conductors was reduced by 68.3%; and the stability of the segmentation results was improved by 42.6% in complex scenarios with large terrain undulations and severe vegetation obstruction.
[0153] Example 3 Alternative Example 1: The difference between Alternative Example 1 and Example 1 is that the multi-scale geometric feature extraction uses a K-nearest neighbor (KNN) neighborhood query method instead of a fixed-radius spherical neighborhood, and the number of neighborhood points K is set to 20. The remaining steps are the same as in Example 1. This alternative scheme is more adaptable to scenes with uneven point cloud density and can further improve the stability of feature extraction in sparse regions.
[0154] Alternative Example 2: The difference between Alternative Example 2 and Example 1 is that the height normalization feature extraction uses a local minimum filtering algorithm instead of the grid interpolation ground estimation method. Ground elevation is fitted using local minima within a sliding window; the remaining steps are the same as in Example 1. This alternative solution has lower computational overhead and improves processing speed by 30%, making it suitable for inspection scenarios with high real-time requirements.
[0155] Alternative Example 3: The difference between Alternative Example 3 and Example 1 is that the Focal loss is replaced by the Dice loss and the OHEM strategy is used instead of the focusing parameter modulation. The remaining steps are the same as in Example 1, and similar class imbalance optimization effects can be achieved.
[0156] Alternative Example 4: The difference between Alternative Example 4 and Example 1 is that the tower template matching uses one-dimensional template matching based on radial profile instead of three-dimensional point cloud template matching. The remaining steps are the same as in Example 1, which improves the computational efficiency by 45% and is suitable for scenarios of rapid processing of batch point clouds.
[0157] Alternative Example 5: The difference between Alternative Example 5 and Example 1 is that the height normalization feature extraction uses the CSF (Cloth Simulation Filter) method instead of the grid ground estimation algorithm, while the remaining steps are the same as in Example 1. This alternative scheme is more adaptable to scenes with drastic terrain undulations and dense vegetation cover, and achieves higher accuracy in ground elevation estimation.
[0158] Example 4 An electronic device provided in this invention includes a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device is running, the processor communicates with the memory via the bus, and the processor executes the machine-readable instructions to perform the steps of any of the above-described transmission line point cloud segmentation methods based on geometric enhancement and confusion constraint.
[0159] Specifically, the aforementioned memory and processor can be a general-purpose memory and processor, without specific limitations. When the processor runs the computer program stored in the memory, it can execute the aforementioned transmission line point cloud segmentation method based on geometric enhancement and confusion constraint. The processor is used to perform the entire process of preprocessing, feature extraction, model inference, and post-processing; the memory is used to store model parameters, runtime configurations, and intermediate results; and the communication interface is used to receive input point cloud data and output segmentation result files.
[0160] Corresponding to the above method, this embodiment of the invention also provides a computer-readable storage medium storing a computer program. When a processor runs this computer program, it executes the steps of any of the above-described transmission line point cloud segmentation methods based on geometric enhancement and confusion constraint. The storage medium includes, but is not limited to, industrial-grade storage media such as ROM / RAM, magnetic disks, optical disks, and solid-state drives. When the program is executed, it can directly interface with the point cloud output interface of a power line inspection LiDAR acquisition device, realizing fully automated processing from raw point cloud input to standardized segmentation result output.
[0161] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the specific implementation of the present invention. Any modifications or equivalent substitutions that do not depart from the spirit and scope of the present invention should be covered within the protection scope of the claims of the present invention.
Claims
1. A method for power line point cloud segmentation based on geometric enhancement and confusion constraint, characterized in that, Includes the following steps: The original 3D point cloud of the input transmission line is preprocessed to extract multi-scale geometric features and highly normalized features, and the original 3D coordinate features are expanded into a multi-dimensional enhanced feature vector containing multi-scale geometric features and highly normalized features. The multidimensional enhanced feature vector is input into the Point Transformer V3 backbone network, and sequential attention encoding-decoding is performed to output the geometric structure feature map of each point in the power transmission line scene. The geometric structure feature map is used to characterize the spatial structure attribute distribution of the point cloud. The network parameters are optimized by jointly training with the combined loss function and the confusion pair constraint loss. The confusion pair constraint loss is used as an auxiliary loss to apply interval constraints to predefined easily confused spatial structure category pairs. Based on knowledge of the power transmission line domain, the geometric feature map is post-processed in a structured manner to obtain the original point-level final geometric structure segmentation result.
2. The method of claim 1, wherein preprocessing the input original 3D point cloud of the transmission line to extract multi-scale geometric features and height-normalized features, and expanding the original 3D coordinate features into a multi-dimensional enhanced feature vector containing the multi-scale geometric features and the height-normalized features, comprises the following steps:
Step 11: perform standardization preprocessing on the input original 3D point cloud of the transmission line to eliminate noise and coordinate offset interference, and obtain standardized point cloud data;
Step 12: for the standardized point cloud data, construct local neighborhoods at multiple spatial scales, and extract multi-scale geometric features based on the eigenvalue decomposition of the neighborhood covariance matrix;
Step 13: for the standardized point cloud data, calculate the height-normalized feature of each point relative to the local ground based on a grid ground estimation algorithm;
Step 14: align and concatenate the original 3D coordinate features, the multi-scale geometric features, and the height-normalized features to obtain a multi-dimensional enhanced feature vector adapted to the input of the deep learning network.
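By way of non-limiting illustration, the covariance eigenvalue extraction of step 12 might be sketched as follows. The five descriptors (linearity, planarity, sphericity, anisotropy, surface variation), the two radii, and the function names `eigen_descriptors` and `multi_scale_features` are assumptions chosen for the example and are not specified by the claim; the brute-force neighbor search is only suitable for small point sets.

```python
import numpy as np

def eigen_descriptors(neighbors: np.ndarray) -> np.ndarray:
    """Five eigenvalue-based descriptors for one neighborhood of shape (k, 3).

    The choice of descriptors is an assumption made for this sketch.
    """
    if len(neighbors) < 3:
        return np.zeros(5)
    cov = np.cov(neighbors.T)                   # 3x3 neighborhood covariance
    evals = np.linalg.eigvalsh(cov)[::-1]       # sorted so that l1 >= l2 >= l3
    l1, l2, l3 = np.clip(evals, 0.0, None)      # guard against tiny negatives
    if l1 <= 1e-12:
        return np.zeros(5)
    return np.array([
        (l1 - l2) / l1,          # linearity: close to 1 for wire-like points
        (l2 - l3) / l1,          # planarity
        l3 / l1,                 # sphericity
        (l1 - l3) / l1,          # anisotropy
        l3 / (l1 + l2 + l3),     # surface variation
    ])

def multi_scale_features(points: np.ndarray, radii=(0.5, 2.0)) -> np.ndarray:
    """Stack the descriptors computed at each radius: (N, 5 * len(radii))."""
    # Brute-force squared distances, O(N^2); a KD-tree would replace this.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    per_scale = []
    for r in radii:  # hypothetical radii, in the units of the point cloud
        feats = [eigen_descriptors(points[d2[i] <= r * r])
                 for i in range(len(points))]
        per_scale.append(np.stack(feats))
    return np.hstack(per_scale)
```

For points sampled along a straight conductor, the linearity entry approaches 1 at both scales while the planarity and sphericity entries approach 0, which is the kind of cue that helps separate thin linear objects from vegetation or ground.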
3. The method of claim 2, wherein, in step 14, the multi-dimensional enhanced feature vector adapted to the input of the deep learning network is 14-dimensional and is arranged in the following order: 3-dimensional original coordinate features, 10-dimensional multi-scale geometric features, and a 1-dimensional height-normalized feature; the 10-dimensional multi-scale geometric features consist of 5 geometric descriptors at each of two scales.
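By way of non-limiting illustration, the height normalization of step 13 and the 14-dimensional layout described above might be sketched as follows. Taking the lowest z value in each horizontal grid cell as the local ground, the 5.0 cell size, and the names `height_above_ground` and `assemble_features` are assumptions made for the example; the claim only requires a grid ground estimation algorithm.

```python
import numpy as np

def height_above_ground(points: np.ndarray, cell: float = 5.0) -> np.ndarray:
    """Height of each point above the lowest point in its XY grid cell.

    Using the per-cell minimum z as the ground estimate is an assumption.
    """
    ij = np.floor(points[:, :2] / cell).astype(np.int64)   # cell index per point
    _, inverse = np.unique(ij, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)                          # flat cell id per point
    ground = np.full(inverse.max() + 1, np.inf)
    np.minimum.at(ground, inverse, points[:, 2])           # min z per cell
    return points[:, 2] - ground[inverse]

def assemble_features(xyz: np.ndarray, geo: np.ndarray,
                      height: np.ndarray) -> np.ndarray:
    """Concatenate in the claimed order: 3 coords, 10 geometric, 1 height."""
    if xyz.shape[1] != 3 or geo.shape[1] != 10 or height.ndim != 1:
        raise ValueError("expected shapes (N, 3), (N, 10) and (N,)")
    return np.hstack([xyz, geo, height[:, None]])          # shape (N, 14)
```

Keeping the raw coordinates in the first three columns means the remaining eleven columns can be treated as extra input channels by the backbone.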
4. The method of claim 1, wherein inputting the multi-dimensional enhanced feature vector into the Point Transformer V3 backbone network, performing serialized attention encoding-decoding, and outputting the geometric structure feature map of each point in the transmission line scene comprises the following steps:
Step 21: input the multi-dimensional enhanced feature vector, which is composed of the 3D coordinates, the multi-scale geometric features, and the height-normalized features, into the Point Transformer V3 backbone network;
Step 22: perform serialized attention encoding-decoding operations through the Point Transformer V3 backbone network to learn deep geometric structure representations;
Step 23: output a geometric structure feature map covering all elements of the transmission line corridor.
5. The method of claim 4, wherein, before the multi-dimensional enhanced feature vector is input into the Point Transformer V3 backbone network, the method further comprises: modifying the number of input channels of the input embedding layer of Point Transformer V3 from the standard 3 channels to the number of channels matching the multi-dimensional enhanced feature vector, so as to adapt to the multi-source enhanced feature input.
6. The method of claim 1, wherein optimizing the network parameters by joint training with the combined loss function and the confusion pair constraint loss comprises the following steps:
Step 31: based on the prediction results of the Point Transformer V3 network, simultaneously calculate the combined loss function and the confusion pair constraint loss; the combined loss function is the main loss and is the weighted sum of the Lovász-Softmax loss, the Focal loss, and the class balance loss, each with a weight of 1; the confusion pair constraint loss is the auxiliary loss and adopts a hinge loss mechanism to apply a margin constraint, at the level of the classification scores (logits), to pairs of easily confused spatial structure categories;
Step 32: compute the weighted sum of the combined loss and the confusion pair constraint loss to obtain the total loss, and use the total loss for backpropagation to optimize the network parameters and complete the model training.
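By way of non-limiting illustration, one possible form of the hinge-style confusion pair constraint and of the total loss of step 32 is sketched below in NumPy (forward computation only). The exact hinge formula, the margin of 1.0, the auxiliary weight `aux_weight`, the class indices, and the function names are assumptions made for the example; the claim fixes only the three main-loss weights at 1.

```python
import numpy as np

def confusion_pair_loss(logits: np.ndarray, labels: np.ndarray,
                        pairs, margin: float = 1.0) -> float:
    """Hinge penalty on logit gaps for predefined confusable class pairs.

    For a point whose true class is `a` in a pair (a, b), the penalty is
    max(0, margin - (logit_a - logit_b)); the pair is applied symmetrically.
    This formulation is an assumption consistent with a hinge mechanism.
    """
    terms = []
    for a, b in pairs:
        for true_cls, other_cls in ((a, b), (b, a)):
            mask = labels == true_cls
            if mask.any():
                gap = logits[mask, true_cls] - logits[mask, other_cls]
                terms.append(np.maximum(0.0, margin - gap))
    if not terms:
        return 0.0
    return float(np.concatenate(terms).mean())

def total_loss(lovasz: float, focal: float, class_balance: float,
               confusion: float, aux_weight: float = 0.5) -> float:
    """Main loss with unit weights plus the weighted auxiliary loss.

    `aux_weight` is a hypothetical value; the claim does not state it.
    """
    main = 1.0 * lovasz + 1.0 * focal + 1.0 * class_balance
    return main + aux_weight * confusion
```

In this form the penalty vanishes once the true-class logit exceeds the confusable class's logit by at least the margin, so the auxiliary term only acts on points that are still close to the decision boundary between the paired categories.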
7. The method of any one of claims 1-6, wherein performing structured post-processing on the geometric structure feature map based on transmission line domain knowledge to obtain the final geometric structure segmentation result at the original point level comprises the following steps:
Step 41: perform global KNN voting smoothing on the full-category geometric structure feature map of the transmission line scene output by Point Transformer V3 to eliminate isolated noise points and local label jitter;
Step 42: based on the spatial hierarchy prior of transmission line categories (the parallel-line hierarchy prior), perform conductor / ground wire correction to correct misclassified line-category structure labels, the line categories including conductors, ground wires, and jumpers;
Step 43: perform line segment connectivity merging and cleaning, cluster the connected line point clouds and unify their labels, and eliminate fragmented segmentation results;
Step 44: perform majority voting within the merged line segments to unify the results and further stabilize the consistency of local area labels;
Step 45: based on knowledge of the three-dimensional morphology of the transmission tower, apply three-dimensional template matching constraints to the tower to refine and correct the segmentation results of the tower area;
Step 46: perform vegetation correction on the edge area of the tower, reassigning tower edge points that were mistakenly classified as vegetation or ground to the tower category;
Step 47: based on the mapping relationship between the original point cloud and the voxel grid / downsampled points recorded during voxel downsampling, map the segmentation labels of the downsampled point cloud processed in steps 41-46 back to each point in the original input point cloud through nearest neighbor assignment or majority voting, to obtain the final geometric structure segmentation result at the original point level.
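By way of non-limiting illustration, the KNN voting smoothing of step 41 and the label mapping of step 47 might be sketched as follows. The value k = 3 in the usage, the nearest-neighbor variant of the mapping, the brute-force distance computation, and the function names are assumptions made for the example; a recorded voxel-to-point index, where available, would replace the distance search in `map_back`.

```python
import numpy as np

def knn_vote(points: np.ndarray, labels: np.ndarray, k: int = 5) -> np.ndarray:
    """Replace each label by the majority label among its k nearest points.

    The point itself counts as one of its k neighbors. Brute force, O(N^2).
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]            # k nearest, self included
    n_classes = int(labels.max()) + 1
    counts = np.stack([np.bincount(labels[row], minlength=n_classes)
                       for row in idx])            # votes per class, per point
    return counts.argmax(axis=1)

def map_back(original: np.ndarray, down_points: np.ndarray,
             down_labels: np.ndarray) -> np.ndarray:
    """Give each original point the label of its nearest downsampled point."""
    d2 = ((original[:, None, :] - down_points[None, :, :]) ** 2).sum(-1)
    return down_labels[d2.argmin(axis=1)]
```

For example, a single mislabeled point in the middle of a run of conductor points is outvoted by its two neighbors when k = 3, which corresponds to the removal of isolated noise points described in step 41.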
8. A transmission line point cloud segmentation device based on geometric enhancement and confusion constraint, characterized in that it comprises:
a feature enhancement module, used to preprocess the input original 3D point cloud of the transmission line, extract multi-scale geometric features and height-normalized features, and expand the original 3D coordinate features into a multi-dimensional enhanced feature vector containing the multi-scale geometric features and the height-normalized features;
a geometric structure learning module, used to input the multi-dimensional enhanced feature vector into the Point Transformer V3 backbone network, perform serialized attention encoding-decoding, and output the geometric structure feature map of each point in the transmission line scene;
a joint training optimization module, used to jointly train and optimize the network parameters using a combined loss function and a confusion pair constraint loss, the confusion pair constraint loss serving as an auxiliary loss that applies margin constraints to predefined pairs of easily confused spatial structure categories; and
a structured post-processing module, used to perform structured post-processing on the geometric structure feature map based on transmission line domain knowledge, so as to obtain the final geometric structure segmentation result at the original point level.
9. An electronic device comprising a memory and a processor, characterized in that the memory stores a computer program, and when the processor executes the computer program, it implements the transmission line point cloud segmentation method based on geometric enhancement and confusion constraint as described in any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the transmission line point cloud segmentation method based on geometric enhancement and confusion constraint as described in any one of claims 1-7.