A point cloud data processing method and device based on a non-parametric attention mechanism

By constructing local reference frames to generate rotation-invariant features during point cloud data processing, and by fusing features with a non-parametric attention mechanism, the method addresses the sensitivity of point cloud data processing to rotation transformations, thereby improving the stability and accuracy of point cloud networks.

CN122243764A · Pending · Publication Date: 2026-06-19 · PEKING UNIV SHENZHEN GRADUATE SCHOOL

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
PEKING UNIV SHENZHEN GRADUATE SCHOOL
Filing Date
2026-03-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing point cloud data processing methods are extremely sensitive to rotation transformations, which leads to a significant drop in classification or segmentation performance and reduces the accuracy of prediction results.

Method used

By selecting multiple center points in the original point cloud data to construct a local reference frame, local rotation-invariant features are generated. These features are then combined with global rotation-invariant features and voxel branch features, and a non-parametric attention mechanism is used to fuse the features and generate fused features.

Benefits of technology

Without introducing learnable parameters, this method improves the applicability and stability of point cloud networks in real-world scenarios, maintains low computational resource consumption, solves the performance degradation problem caused by rotation sensitivity, and improves the accuracy of prediction results.

✦ Generated by Eureka AI based on patent content.


Abstract

This application provides a point cloud data processing method and apparatus based on a non-parametric attention mechanism. The method includes: selecting multiple center points in the original point cloud data to construct a local reference frame; determining point pair features within the local reference frame; and generating local rotation-invariant features based on the point pair features; sampling the original point cloud data to generate multiple point triples to determine geometric primitives; performing quantization and statistical analysis on all geometric primitives to determine global rotation-invariant features; performing voxelization and feature extraction processing on the absolute coordinates under voxel branches to obtain voxel branch features; and using a non-parametric attention mechanism to fuse the local rotation-invariant features, global rotation-invariant features, and voxel branch features to obtain fused features. By employing the above-mentioned point cloud data processing method and apparatus based on a non-parametric attention mechanism, the degradation of classification or segmentation performance during point cloud data processing is avoided, and the accuracy of prediction results is improved.

Description

Technical Field

[0001] This application relates to the field of computer vision technology, and more specifically, to a point cloud data processing method and apparatus based on a non-parametric attention mechanism.

Background Technology

[0002] With the widespread adoption of 3D sensing technology, the analysis and processing of point cloud data is becoming increasingly important in fields such as autonomous driving and robot navigation. Among existing point cloud data processing methods, non-parametric networks, represented by Point-NN, have emerged as a leading approach. These networks require no training, offer fast inference, and are lightweight, providing an efficient solution for point cloud data processing.

[0003] However, existing point cloud data processing methods have a significant drawback: they are extremely sensitive to rotation transformations. For example, these methods typically encode features directly based on the absolute coordinates of points. When the input point cloud is rotated, its feature representation changes drastically, leading to a significant drop in classification or segmentation performance and reducing the accuracy of prediction results.

Summary of the Invention

[0004] In view of this, the purpose of this application is to provide a point cloud data processing method and apparatus based on a non-parametric attention mechanism to overcome at least one of the above-mentioned defects.

[0005] In a first aspect, embodiments of this application provide a point cloud data processing method based on a non-parametric attention mechanism, including: selecting multiple center points in the original point cloud data, constructing a local reference frame for each center point, and determining the point pair features between the center point and its neighboring points within the local reference frame, so as to generate local rotation-invariant features based on the point pair features; sampling the original point cloud data to generate multiple point triples, determining geometric primitives that describe the geometric shape of each point triple, and quantizing and statistically analyzing all geometric primitives to determine the global rotation-invariant features; performing voxelization and feature extraction on the absolute coordinates of the original point cloud data in the voxel branch to obtain voxel branch features; and using a non-parametric attention mechanism to fuse the local rotation-invariant features, the global rotation-invariant features, and the voxel branch features to obtain fused features.

[0006] Optionally, a non-parametric attention mechanism is used to fuse local rotation-invariant features, global rotation-invariant features, and voxel branch features to obtain fused features. This step includes: using a non-parametric attention mechanism to concatenate the local rotation-invariant features, global rotation-invariant features, and voxel branch features to obtain a comprehensive feature vector; determining the attention weight of each point based on the comprehensive feature vector; and using the attention weights to perform a weighted summation of the comprehensive features in the neighborhood to obtain the final fused features.

[0007] Optionally, the step of determining the attention weight of each point based on the comprehensive feature vector includes: using a non-parametric attention mechanism to determine the dot product between the comprehensive features of the center point and the comprehensive features of the neighboring points; normalizing the ratio of the dot product to the temperature coefficient to determine the attention weight of each neighboring point relative to the center point.

[0008] Optionally, the step of quantizing and statistically analyzing all geometric primitives to determine the global rotation-invariant features includes: dividing the range of values for the side lengths and angles representing the geometric shape into multiple preset intervals; counting how often the geometric primitive values of each point triple fall into each preset interval to generate a global histogram; and flattening the global histogram to obtain the global rotation-invariant features.

[0009] Optionally, the local rotation-invariant features are generated by applying position encoding to the point-pair features, which expands the low-dimensional geometric features into high-dimensional feature vectors.

[0010] Optionally, the step of selecting multiple center points in the original point cloud data and constructing a local reference frame for each center point includes: using a preset algorithm to search for neighboring points around each center point to construct a local neighborhood of the center point; determining the normal vector of the local neighborhood based on the covariance matrix of all points in the local neighborhood relative to the center point; and using the projection of neighboring points on the principal axis to perform weighted voting to determine the direction of the normal vector and the tangent plane axis to construct a local reference frame.

[0011] Optionally, the step of determining the normal vector of the local neighborhood based on the covariance matrix of all points in the local neighborhood relative to the center point includes: determining the covariance matrix of all points in the local neighborhood relative to the center point; performing eigenvalue decomposition on the covariance matrix to obtain eigenvectors; and sorting the eigenvectors in descending order of eigenvalues to determine the normal vector.

[0012] Optionally, the steps of voxelizing and extracting features from the absolute coordinates of the voxel branches corresponding to the original point cloud data to obtain voxel branch features include: dividing the continuous three-dimensional space into a cubic mesh and mapping the absolute coordinates of the original point cloud data to the voxel index; for each voxel, determining the voxel features based on the feature statistics of all points within the voxel; and using a linear interpolation method to remap the voxel features back to each point based on the relative position of each point in the voxel to obtain voxel branch features.

[0013] Optionally, the point-pair features include at least one of the following: the Euclidean distance between the center point and the neighboring points, the angle between the direction vector of the neighboring point relative to the center point and the normal vector of the center point, the angle between the normal vector of the center point and the normal vector of the neighboring point, and the angle between the direction vector of the neighboring point relative to the center point and the normal vector of the neighboring point.

[0014] In a second aspect, embodiments of this application also provide a point cloud data processing apparatus based on a non-parametric attention mechanism, the apparatus comprising:
a first feature determination module, used to select multiple center points in the original point cloud data, construct a local reference frame for each center point, determine the point pair features between the center point and its neighboring points within the local reference frame, and generate local rotation-invariant features based on the point pair features;
a second feature determination module, used to sample the original point cloud data to generate multiple point triples, determine the geometric primitives used to describe the geometric shape of each point triple, quantize and statistically analyze all geometric primitives, and determine the global rotation-invariant features;
a third feature determination module, used to perform voxelization and feature extraction on the absolute coordinates of the original point cloud data in the voxel branch to obtain voxel branch features; and
a feature fusion module, used to fuse the local rotation-invariant features, the global rotation-invariant features, and the voxel branch features using a non-parametric attention mechanism to obtain fused features.

[0015] The embodiments of this application bring the following beneficial effects: This application provides a point cloud data processing method and apparatus based on a non-parametric attention mechanism. This method can inject locally rotation-invariant and globally rotation-invariant features into a non-parametric pipeline without introducing learnable parameters. It then dynamically fuses multiple features through an attention mechanism to obtain fused features, significantly improving the applicability and stability of non-parametric point cloud networks in real-world scenarios while maintaining extremely low computational resource consumption. Compared to existing point cloud data processing methods based on non-parametric attention mechanisms, this method avoids the problem of decreased classification or segmentation performance during point cloud data processing and improves the accuracy of prediction results.

[0016] To make the above-mentioned objectives, features and advantages of this application more apparent and understandable, preferred embodiments are described below in detail with reference to the accompanying drawings.

Brief Description of the Drawings

[0017] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 shows a flowchart of a point cloud data processing method based on a non-parametric attention mechanism provided in an embodiment of this application;
Figure 2 shows a flowchart of the steps for generating local rotation-invariant features provided in an embodiment of this application;
Figure 3 shows a flowchart of the steps for determining the global rotation-invariant features provided in an embodiment of this application;
Figure 4 shows a flowchart of the steps for determining voxel branch features provided in an embodiment of this application;
Figure 5 shows a flowchart of the steps for obtaining fused features provided in an embodiment of this application;
Figure 6 shows a schematic diagram of the structure of the point cloud data processing device based on a non-parametric attention mechanism provided in an embodiment of this application;
Figure 7 shows a schematic diagram of the structure of the electronic device provided in an embodiment of this application.

Detailed Description

[0019] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. The components of the embodiments of this application described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of this application. Based on the embodiments of this application, every other embodiment obtained by those skilled in the art without inventive effort falls within the scope of protection of this application.

[0020] To facilitate understanding of this embodiment, the following uses the point cloud data processing method based on a non-parametric attention mechanism provided in this application embodiment applied to a terminal device as an example to describe each of the exemplary steps provided in this application embodiment.

[0021] Figure 1 is a flowchart of a point cloud data processing method based on a non-parametric attention mechanism provided in an embodiment of this application. As shown in Figure 1, the method includes:

Step S101: Select multiple center points in the original point cloud data, construct a local reference frame for each center point, determine the point pair features between the center point and its neighboring points within the local reference frame, and generate local rotation-invariant features based on the point pair features.

[0022] Raw point cloud data refers to the point cloud data to be processed. Raw point cloud data can be three-dimensional spatial coordinate information collected by various sensors.

[0023] Local rotation-invariant features refer to relative geometric features that possess rotation invariance. These features are extracted by constructing a stable local reference frame, thereby overcoming the sensitivity to rotational transformations caused by the reliance on the absolute coordinates of the point cloud in traditional methods.

[0024] The process of generating local rotation-invariant features is described below with reference to Figure 2.

[0025] Figure 2 shows a flowchart of the steps for generating local rotation-invariant features provided in an embodiment of this application. As shown in Figure 2, the steps for generating local rotation-invariant features include:

Step S1011: Construct a local neighborhood.

[0026] The first step is to select multiple center points in the original point cloud data.

[0027] The Farthest Point Sampling (FPS) algorithm is used to select multiple representative center points from the original point cloud data.

[0028] The second step is to use a preset algorithm to search for neighboring points around each center point in order to construct the local neighborhood of that center point.

[0029] For each center point, the K-Nearest Neighbors (K-NN) algorithm is used to search for neighboring points around the center point to construct a local neighborhood with a size of 64.
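Step S1011 can be sketched as follows. This is a minimal NumPy illustration rather than the applicant's implementation: the function names `farthest_point_sampling` and `knn_indices`, the choice of 128 centers, the random test cloud, and the brute-force distance search are all assumptions made for the example; only FPS, K-NN, and the neighborhood size of 64 come from the text.

```python
import numpy as np

def farthest_point_sampling(points, num_centers, start_index=0):
    """Greedy FPS: repeatedly pick the point farthest from all centers chosen so far."""
    chosen = np.empty(num_centers, dtype=np.int64)
    chosen[0] = start_index
    # Squared distance from every point to its nearest chosen center.
    min_sq_dist = np.sum((points - points[start_index]) ** 2, axis=1)
    for i in range(1, num_centers):
        chosen[i] = int(np.argmax(min_sq_dist))
        sq_dist = np.sum((points - points[chosen[i]]) ** 2, axis=1)
        min_sq_dist = np.minimum(min_sq_dist, sq_dist)
    return chosen

def knn_indices(points, center_indices, k):
    """Brute-force K-NN: indices of the k points closest to each center."""
    centers = points[center_indices]                                  # (M, 3)
    diff = centers[:, None, :] - points[None, :, :]                   # (M, N, 3)
    sq_dist = np.sum(diff ** 2, axis=2)                               # (M, N)
    return np.argsort(sq_dist, axis=1)[:, :k]                         # (M, k)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))            # stand-in for the original point cloud
center_idx = farthest_point_sampling(cloud, num_centers=128)
neighbor_idx = knn_indices(cloud, center_idx, k=64)
```

Note that in this sketch each neighborhood contains the center point itself as its nearest neighbor; whether the center is counted among the 64 neighbors is not specified in the text.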

[0030] Step S1012: Determine the local reference frame corresponding to the local neighborhood.

[0031] The first step is to determine the normal vector of the local neighborhood based on the covariance matrix of all points in the local neighborhood relative to the center point.

[0032] For each center point, determine the covariance matrix of all points in the local neighborhood of that center point relative to that center point, and perform eigenvalue decomposition on the covariance matrix to obtain the eigenvectors.

[0033] The eigenvectors are sorted in descending order of their eigenvalues to determine the normal vector. For example, the eigenvalues are $\lambda_1 \ge \lambda_2 \ge \lambda_3$, with corresponding eigenvectors $e_1$, $e_2$, and $e_3$. The eigenvector $e_3$ corresponding to the smallest eigenvalue $\lambda_3$ is the normal vector, and the other two eigenvectors $e_1$ and $e_2$ form the orthogonal axes of the tangent plane.

[0034] The covariance matrix is a weighted covariance matrix, which includes weights that characterize the contribution of each neighboring point. The weight of a neighboring point in the weighted covariance matrix is negatively correlated with the distance between the neighboring point and the center point: the closer the neighboring point is to the center point, the higher the weight; the farther the neighboring point is from the center point, the lower the weight.

[0035] The second step involves using the projections of neighboring points onto the principal axis to perform weighted voting, determining the direction of the normal vector and the tangent plane axis, in order to construct a local reference frame.

[0036] The principal axis refers to the main direction within the local neighborhood, that is, the eigenvector corresponding to the largest eigenvalue of the covariance matrix. It represents the direction along which the distribution of neighboring points extends.

[0037] A local reference frame can be called a local reference system. A local reference frame is a unique local reference system that remains unchanged under rigid rotation. The local reference system is determined based on the orientation of the normal vector and the axis of the tangent plane.

[0038] Specifically, each neighboring point in the local neighborhood is projected onto the principal axis to obtain a projection value. The projection values of all neighboring points are aggregated, i.e., a weighted vote is performed, to determine the final direction of the normal vector. For example, if there are two opposite candidate directions for the normal vector, the voting score of each candidate direction is calculated, and the candidate direction with the highest voting score is selected as the final normal vector direction. The orientation of the tangent plane axes is then determined based on the normal vector direction. The voting weight is inversely proportional to the distance from the neighboring point to the center point: the closer the distance, the higher the weight; the farther the distance, the lower the weight.

[0039] In the embodiments of this application, by constructing a local reference frame, the problems of normal vector sign flipping and axial uncertainty that exist in traditional point cloud data processing methods when processing near isotropic neighborhoods can be solved.
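One way to read step S1012 is sketched below in NumPy. Several details are assumptions, not statements from the application: the linear distance falloff used for the weights, the choice to resolve the sign of each axis by a weighted vote over the neighbors' projections onto that axis, the cross product used for the third axis, and the name `local_reference_frame`. The demonstration at the end checks that coordinates expressed in the frame do not change when the neighborhood is rigidly rotated.

```python
import numpy as np

def local_reference_frame(center, neighbors, eps=1e-8):
    """Return a 3x3 matrix whose rows are (principal axis, second tangent axis, normal)."""
    offsets = neighbors - center                          # (K, 3)
    dist = np.linalg.norm(offsets, axis=1)
    # Assumed weighting: linear falloff, so closer neighbors receive higher weights.
    weights = 1.0 - dist / (dist.max() + eps)
    cov = (offsets * weights[:, None]).T @ offsets / (weights.sum() + eps)
    eigvals, eigvecs = np.linalg.eigh(cov)                # eigenvalues in ascending order
    normal = eigvecs[:, 0]                                # smallest eigenvalue
    principal = eigvecs[:, 2]                             # largest eigenvalue
    # Weighted vote: keep the candidate direction that most neighbors lie along.
    if np.sum(weights * (offsets @ normal)) < 0:
        normal = -normal
    if np.sum(weights * (offsets @ principal)) < 0:
        principal = -principal
    second = np.cross(normal, principal)                  # completes a right-handed frame
    return np.stack([principal, second, normal])

rng = np.random.default_rng(0)
center = np.zeros(3)
# Skewed, anisotropic neighborhood so that the axes and their signs are well defined.
neighbors = rng.exponential(size=(64, 3)) * np.array([1.0, 0.5, 0.1])

# Random proper rotation (determinant +1) from a QR decomposition.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] = -q[:, 0]

frame = local_reference_frame(center, neighbors)
frame_rot = local_reference_frame(center @ q.T, neighbors @ q.T)
local = (neighbors - center) @ frame.T
local_rot = (neighbors @ q.T - center @ q.T) @ frame_rot.T
```

For nearly isotropic or nearly symmetric neighborhoods the eigenvectors and the vote become ill-conditioned, so a practical implementation would need a tie-breaking rule that this sketch does not include.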

[0040] Step S1013: Determine the point pair features within the local reference frame.

[0041] Within a local reference frame, the point-pair features between the center point and each neighbor point are calculated. The point-pair features are four-dimensional vectors, wherein the point-pair features include at least one of the following: the Euclidean distance between the center point and the neighbor points, the angle between the direction vector of the neighbor point relative to the center point and the normal vector of the center point, the angle between the normal vector of the center point and the normal vector of the neighbor point, and the angle between the direction vector of the neighbor point relative to the center point and the normal vector of the neighbor point.
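The four point-pair quantities listed above can be computed as in the following sketch. It assumes that a normal vector is available for every neighbor as well as for the center (the text does not say how neighbor normals are obtained), and the names `angle_between` and `point_pair_features` are illustrative.

```python
import numpy as np

def angle_between(u, v, eps=1e-12):
    """Angle in radians between row vectors u and v (broadcast over rows)."""
    norms = np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1)
    cos = np.sum(u * v, axis=-1) / np.maximum(norms, eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def point_pair_features(center, center_normal, neighbors, neighbor_normals):
    """Four-dimensional feature per neighbor: distance plus three angles."""
    direction = neighbors - center                                   # (K, 3)
    dist = np.linalg.norm(direction, axis=1)
    angle_dir_center = angle_between(direction, center_normal[None, :])
    angle_normals = angle_between(center_normal[None, :], neighbor_normals)
    angle_dir_neighbor = angle_between(direction, neighbor_normals)
    return np.stack([dist, angle_dir_center, angle_normals, angle_dir_neighbor], axis=1)

features = point_pair_features(
    center=np.array([0.0, 0.0, 0.0]),
    center_normal=np.array([0.0, 0.0, 1.0]),
    neighbors=np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]),
    neighbor_normals=np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]),
)
```

Because every entry is a length or an angle between vectors that rotate together, the feature vector is unchanged when the whole neighborhood is rotated.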

[0042] Step S1014: Perform position encoding on the point-pair features.

[0043] To enhance the expressive power of point-pair features, these features can be input into the sine and cosine position encoding module.

[0044] In the sine and cosine position encoding module, by using preset frequency parameters, low-dimensional point-pair features can be expanded into high-dimensional feature vectors, generating the final local rotation-invariant features.
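A sine and cosine position encoding of this kind might look like the sketch below. The application does not disclose its preset frequency parameters, so the geometric frequency ladder (`base ** k`), the default of six frequencies, and the name `sincos_position_encoding` are assumptions chosen for illustration.

```python
import numpy as np

def sincos_position_encoding(features, num_freqs=6, base=2.0):
    """Expand each input dimension into 2 * num_freqs sine/cosine channels."""
    freqs = base ** np.arange(num_freqs)                     # (F,) assumed frequency ladder
    scaled = features[..., None] * freqs                     # (..., D, F)
    encoded = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)  # (..., D, 2F)
    return encoded.reshape(*features.shape[:-1], -1)         # (..., D * 2F)

pair_features = np.zeros((64, 4))                            # 64 neighbors, 4-D point-pair features
encoded = sincos_position_encoding(pair_features)            # shape (64, 48)
single = sincos_position_encoding(np.array([[np.pi / 2]]), num_freqs=1)
```

With these defaults a four-dimensional point-pair feature becomes a 48-dimensional vector, and the encoding involves no learnable parameters.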

[0045] Step S102: Sample the original point cloud data to generate multiple point triples, determine the geometric primitives used to describe the geometric shape of each point triple, quantize and statistically analyze all geometric primitives, and determine the global rotation invariant features.

[0046] The process of determining the global rotation-invariant features is described below with reference to Figure 3.

[0047] Figure 3 shows a flowchart of the steps for determining the global rotation-invariant features provided in an embodiment of this application. As shown in Figure 3, the steps for determining the global rotation-invariant features include:

Step S1021: Sample the original point cloud data to generate point triples.

[0048] Large-scale random sampling is performed on the original point cloud data to generate multiple point triples. To balance computational efficiency and shape coverage, the number of point triples obtained from the sampling can be set to 8192.

[0049] Step S1022: Determine the geometric primitives corresponding to the point triplet.

[0050] For each point triplet, it can be regarded as a triangle, and then the geometric primitives describing the geometry of the triangle are calculated.

[0051] The geometric primitive is a six-dimensional primitive vector, which includes the three side lengths of the triangle and the interior angles of the three vertices. Since the side lengths and interior angles are intrinsic properties of the object and do not change with the overall rotation of the object, the geometric primitive has strict rotation invariance.
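Steps S1021 and S1022 can be illustrated with the following sketch, which samples index triples and computes the six-dimensional primitive (three side lengths, three interior angles) via the law of cosines. Sampling with replacement, the vertex ordering of the output, and the names `sample_triplets` and `triangle_primitives` are assumptions. The final lines compare the primitives of a cloud with those of a randomly rotated copy.

```python
import numpy as np

def sample_triplets(num_points, num_triplets, rng):
    """Draw random index triples (with replacement, so rare degenerate triples can occur)."""
    return rng.integers(0, num_points, size=(num_triplets, 3))

def triangle_primitives(points, triplets, eps=1e-12):
    """Return (T, 6): side lengths a, b, c followed by interior angles at p0, p1, p2."""
    p0, p1, p2 = points[triplets[:, 0]], points[triplets[:, 1]], points[triplets[:, 2]]
    a = np.linalg.norm(p1 - p2, axis=1)      # side opposite p0
    b = np.linalg.norm(p0 - p2, axis=1)      # side opposite p1
    c = np.linalg.norm(p0 - p1, axis=1)      # side opposite p2

    def interior_angle(opposite, s1, s2):
        # Law of cosines; the clip guards against rounding just outside [-1, 1].
        cos = (s1 ** 2 + s2 ** 2 - opposite ** 2) / np.maximum(2.0 * s1 * s2, eps)
        return np.arccos(np.clip(cos, -1.0, 1.0))

    return np.stack(
        [a, b, c, interior_angle(a, b, c), interior_angle(b, a, c), interior_angle(c, a, b)],
        axis=1,
    )

# A 3-4-5 right triangle: the angle at the first vertex is 90 degrees.
right_triangle = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
prims = triangle_primitives(right_triangle, np.array([[0, 1, 2]]))

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))
triplets = sample_triplets(len(cloud), 8192, rng)
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))     # random orthogonal matrix
cloud_prims = triangle_primitives(cloud, triplets)
cloud_prims_rot = triangle_primitives(cloud @ q.T, triplets)
```

The primitive depends on the order of the three vertices in this sketch; sorting the side lengths would remove that dependence, but the application does not state whether it does so.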

[0052] Step S1023: Quantize and statistically analyze all geometric primitives to determine the global rotation-invariant features.

[0053] To transform discrete geometric primitives into fixed-length feature vectors, the value range of the geometric primitives can be quantized.

[0054] First, the range of values for the side lengths and angles representing the geometric shape is divided into multiple preset intervals. The number of times the geometric primitive values of each point triple fall into each preset interval is counted to generate a global histogram.

[0055] For example, dividing the side length into three preset intervals (short, medium, and long) and the angle into three preset intervals (acute, right, and obtuse) results in a total of 3^6 = 729 statistical terms (i.e., the bars of the histogram) for a six-dimensional primitive vector. The number of point triples falling into each combination of preset intervals is counted, and the counts are normalized to generate a high-dimensional global histogram.

[0056] Then, the global histogram is flattened to obtain globally rotation-invariant features.

[0057] For example, in order to integrate with local rotation-invariant features, the global histogram can be flattened and the flattened global histogram can be copied so that the number of copied global histograms is consistent with the number of points in the original point cloud data, so as to serve as the global rotation-invariant feature for each point.
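The quantization, flattening, and per-point copying in step S1023 can be sketched as follows. Uniform interval edges are used here in place of the short/medium/long and acute/right/obtuse intervals, since the application does not give the interval boundaries; the randomly generated `primitives` array stands in for real triangle primitives, and `global_histogram_feature` is an illustrative name.

```python
import numpy as np

def global_histogram_feature(primitives, max_side, bins_per_dim=3):
    """Joint histogram over the 6-D primitives, normalized and flattened to a vector."""
    side_edges = np.linspace(0.0, max_side, bins_per_dim + 1)   # assumed uniform intervals
    angle_edges = np.linspace(0.0, np.pi, bins_per_dim + 1)
    edges = [side_edges] * 3 + [angle_edges] * 3
    hist, _ = np.histogramdd(primitives, bins=edges)            # shape (3, 3, 3, 3, 3, 3)
    hist = hist / max(hist.sum(), 1.0)                          # counts -> frequencies
    return hist.reshape(-1)                                     # 3**6 = 729 entries

rng = np.random.default_rng(0)
num_points, num_triplets = 1024, 8192
# Stand-in primitives: three side lengths in [0, 2) and three angles in [0, pi).
primitives = np.column_stack(
    [rng.uniform(0.0, 2.0, (num_triplets, 3)), rng.uniform(0.0, np.pi, (num_triplets, 3))]
)
global_feature = global_histogram_feature(primitives, max_side=2.0)
# Copy the same global descriptor to every point so it can be fused per point.
per_point_global = np.tile(global_feature, (num_points, 1))
```

Every point receives the same 729-dimensional vector, so this branch carries shape-level context rather than point-specific information.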

[0058] Step S103: Voxelize and extract features from the absolute coordinates of the voxel branches corresponding to the original point cloud data to obtain voxel branch features.

[0059] The process of determining voxel branch features is described below with reference to Figure 4.

[0060] Figure 4 shows a flowchart of the steps for determining voxel branch features provided in an embodiment of this application. As shown in Figure 4, the steps for determining voxel branch features include:

Step S1031: Divide the three-dimensional space into a cubic grid and map the absolute coordinates of the original point cloud data to voxel indices.

[0061] The continuous three-dimensional space is divided into a cubic grid made up of multiple cubic cells. Each cell is a voxel, and each voxel is assigned a corresponding voxel index. The absolute coordinates of each point in the original point cloud data can be mapped to a voxel index.

[0062] Step S1032: For each voxel, determine the voxel features based on the feature statistics of all points within that voxel.

[0063] Specifically, the feature statistics can be the mean or maximum value of the original features of all points. The mean or maximum value of the original features of all points falling within the same voxel can be determined as the voxel feature.

[0064] Step S1033: Based on the relative position of each point in the voxel, remap the voxel features back to each point to obtain voxel branch features.

[0065] Using a linear interpolation method, the voxel features are remapped back to the point based on the relative position of each point in the voxel, so that each original point can obtain a voxel branch feature containing information about its surrounding spatial structure.
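Steps S1031 to S1033 can be sketched as below. This is an illustration under stated assumptions: the grid resolution, the use of mean pooling (the text also allows the maximum), trilinear interpolation between voxel centers as the "linear interpolation method", treating empty voxels as zero vectors, and the name `voxel_branch_features` are all choices made for the example.

```python
import numpy as np

def voxel_branch_features(points, feats, resolution=8):
    """Mean-pool point features into cubic voxels, then interpolate back to the points."""
    lo = points.min(axis=0)
    size = (points.max(axis=0) - lo).max() / resolution + 1e-9     # edge length of a voxel
    idx = np.minimum(((points - lo) / size).astype(int), resolution - 1)   # (N, 3)
    flat = np.ravel_multi_index(tuple(idx.T), (resolution,) * 3)

    channels = feats.shape[1]
    sums = np.zeros((resolution ** 3, channels))
    counts = np.zeros(resolution ** 3)
    np.add.at(sums, flat, feats)                                   # scatter-add per voxel
    np.add.at(counts, flat, 1.0)
    grid = (sums / np.maximum(counts, 1.0)[:, None]).reshape(
        resolution, resolution, resolution, channels
    )                                                              # empty voxels stay zero

    # Trilinear interpolation: voxel centers sit at integer positions of `coord`.
    coord = (points - lo) / size - 0.5
    base = np.floor(coord).astype(int)
    frac = coord - base
    out = np.zeros(feats.shape, dtype=float)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                offset = np.array([dx, dy, dz])
                corner = np.clip(base + offset, 0, resolution - 1)
                weight = np.prod(np.where(offset.astype(bool), frac, 1.0 - frac), axis=1)
                out += weight[:, None] * grid[corner[:, 0], corner[:, 1], corner[:, 2]]
    return out

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))
voxel_feats = voxel_branch_features(cloud, cloud, resolution=8)   # raw feature = absolute xyz
single_voxel = voxel_branch_features(cloud, cloud, resolution=1)  # everything in one voxel
```

Because the grid is axis-aligned and built from absolute coordinates, these features change when the cloud is rotated, which is the rotation sensitivity the fusion step is meant to compensate for.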

[0066] Step S104: Using a non-parametric attention mechanism, the local rotation-invariant features, the global rotation-invariant features, and the voxel branch features are fused to obtain the fused features.

[0067] This step aims to automatically adjust the weights of features in each branch based on the rotation state of the input data without training. The multiple branches include locally invariant branches, globally invariant branches, and voxel branches. The features of each branch include locally rotation-invariant features corresponding to locally invariant branches, globally rotation-invariant features corresponding to globally invariant branches, and voxel branch features corresponding to voxel branches.

[0068] The process of obtaining fused features is described below with reference to Figure 5.

[0069] Figure 5 shows a flowchart of the steps for obtaining fused features provided in an embodiment of this application. As shown in Figure 5, the steps for obtaining fused features include:

Step S1041: Using a non-parametric attention mechanism, the local rotation-invariant features, global rotation-invariant features, and voxel branch features are concatenated to obtain a comprehensive feature vector.

[0070] A parameter-free attention mechanism, i.e., a non-parametric attention mechanism, is constructed using cosine similarity. Using this mechanism, for each point in the original point cloud data, L2 normalization is applied separately to the local rotation-invariant features, the global rotation-invariant features, and the voxel branch features to eliminate differences in the magnitude of features from different branches. The three normalized features are then concatenated along the channel dimension to form a comprehensive feature vector.

[0071] Step S1042: Determine the attention weight of each point based on the comprehensive feature vector.

[0072] Using the local neighborhood constructed in step S1011, and employing a non-parametric attention mechanism, the dot product (i.e., cosine similarity) between the comprehensive feature vector of the center point in each local neighborhood and the comprehensive feature vector of each neighbor point is determined.

[0073] To control the sharpness of the attention distribution, a temperature coefficient is introduced to determine the ratio of the dot product to the temperature coefficient. The Softmax function is then applied to normalize the ratio of the dot product to the temperature coefficient for different neighboring points, in order to determine the attention weight of each neighboring point relative to the center point.
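Written out, the attention weight described above takes the standard temperature-scaled Softmax form. The symbols are introduced here for illustration and do not appear in the application: $\mathbf{f}_i$ is the comprehensive feature vector of center point $i$, $\mathcal{N}(i)$ is its local neighborhood, and $\tau > 0$ is the temperature coefficient.

```latex
\alpha_{ij} =
  \frac{\exp\left(\mathbf{f}_i^{\top}\mathbf{f}_j / \tau\right)}
       {\sum_{k \in \mathcal{N}(i)} \exp\left(\mathbf{f}_i^{\top}\mathbf{f}_k / \tau\right)},
  \qquad j \in \mathcal{N}(i)
```

A smaller $\tau$ concentrates the weights on the most similar neighbors, while a larger $\tau$ spreads them more evenly across the neighborhood.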

[0074] Step S1043: Use attention weights to perform weighted summation of the comprehensive features in the neighborhood to obtain the final fused features.

[0075] By using a non-parametric attention mechanism, the calculated attention weights are used to perform a weighted summation of the comprehensive features in the local neighborhood to obtain the final fused features.
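Steps S1041 to S1043 can be put together as in the following sketch. It assumes that all three branches provide one feature vector per point, uses random arrays and random neighbor indices as stand-ins for real features and neighborhoods, and picks a temperature of 0.1 arbitrarily; the names `l2_normalize` and `nonparametric_attention_fusion` are illustrative.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so branches with large magnitudes do not dominate."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def nonparametric_attention_fusion(local_f, global_f, voxel_f,
                                   center_idx, neighbor_idx, temperature=0.1):
    """Concatenate normalized branch features, then attend over each local neighborhood."""
    combined = np.concatenate(
        [l2_normalize(local_f), l2_normalize(global_f), l2_normalize(voxel_f)], axis=1
    )                                                          # (N, C)
    query = combined[center_idx]                               # (M, C)
    keys = combined[neighbor_idx]                              # (M, K, C)
    logits = np.einsum("mc,mkc->mk", query, keys) / temperature
    logits -= logits.max(axis=1, keepdims=True)                # numerically stable Softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)              # (M, K), rows sum to 1
    fused = np.einsum("mk,mkc->mc", weights, keys)             # weighted sum of neighbors
    return fused, weights

rng = np.random.default_rng(0)
num_points, num_centers, k = 1024, 128, 64
local_f = rng.normal(size=(num_points, 48))      # stand-in local rotation-invariant features
global_f = rng.random(size=(num_points, 729))    # stand-in global rotation-invariant features
voxel_f = rng.normal(size=(num_points, 3))       # stand-in voxel branch features
center_idx = np.arange(num_centers)
neighbor_idx = rng.integers(0, num_points, size=(num_centers, k))
fused, weights = nonparametric_attention_fusion(
    local_f, global_f, voxel_f, center_idx, neighbor_idx
)
```

The whole computation uses only normalization, dot products, and a Softmax, so it contains no learnable parameters; the temperature is a fixed hyperparameter.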

[0076] The underlying working mechanism of this application embodiment is as follows: when the original point cloud data is rotated, the voxel branch features change drastically, leading to a decrease in feature similarity within the local neighborhood. However, the local rotation-invariant features and the global rotation-invariant features remain stable, and their similarity remains relatively high. Due to the exponential amplification characteristic of the Softmax function, the attention mechanism automatically suppresses the feature components that become inconsistent due to rotation (i.e., voxel branch features) and allocates high weights to the consistent invariant feature components (i.e., local rotation-invariant features and global rotation-invariant features), thereby significantly improving robustness to rotated data without sacrificing accuracy on aligned data.

[0077] The point cloud data processing method based on a non-parametric attention mechanism provided in this application can inject local rotation-invariant and global rotation-invariant features into the non-parametric pipeline without introducing learnable parameters or sacrificing inference speed, dynamically resolving the conflict between rotation-sensitive and rotation-invariant features. Simultaneously, by dynamically fusing multiple features through an attention mechanism to obtain fused features, it significantly improves the applicability and stability of the non-parametric point cloud network in real-world scenarios while maintaining extremely low computational resource consumption. This solves the problem of the sharp performance degradation of existing non-parametric networks when facing point cloud rotation, and robust feature extraction can be achieved on different datasets without training.

[0078] Based on the same inventive concept, this application also provides a point cloud data processing device based on a non-parametric attention mechanism, which corresponds to the point cloud data processing method based on a non-parametric attention mechanism. Since the principle by which the device solves the problem is similar to that of the method described above, the implementation of the device can refer to the implementation of the method, and the repeated parts will not be described again.

[0079] Referring to Figure 6, Figure 6 is a schematic diagram of a point cloud data processing device based on a non-parametric attention mechanism, provided as an embodiment of this application. As shown in Figure 6, the point cloud data processing device 200 based on the non-parametric attention mechanism includes: a first feature determination module 201, used to select multiple center points in the original point cloud data, construct a local reference frame for each center point, determine the point pair features between the center point and its neighboring points within the local reference frame, and generate local rotation-invariant features based on the point pair features; a second feature determination module 202, used to sample the original point cloud data to generate multiple point triples, determine the geometric primitives used to describe the geometric shape of each point triple, quantize and statistically analyze all geometric primitives, and determine the global rotation-invariant features; a third feature determination module 203, used to perform voxelization and feature extraction on the absolute coordinates of the voxel branch corresponding to the original point cloud data to obtain voxel branch features; and a feature fusion module 204, used to fuse the local rotation-invariant features, the global rotation-invariant features, and the voxel branch features using the non-parametric attention mechanism to obtain fused features.
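As an illustration of what the second feature determination module 202 might compute, the following NumPy sketch samples point triples, takes triangle side lengths and interior angles as the geometric primitives, bins them into preset intervals, and flattens the resulting histogram. The function names, the number of triples and bins, the `max_side` range (which assumes the cloud has been scaled into a unit ball), and the pairing of each angle with its opposite side in a joint 2D histogram are hypothetical choices, not details taken from the application.

```python
import numpy as np

def _angle(u, v):
    """Angle in radians between row vectors u and v."""
    cos = np.sum(u * v, axis=1) / (
        np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def global_triple_histogram(points, n_triples=2048, n_bins=8,
                            max_side=2.0, seed=0):
    """Flattened histogram of triangle side lengths and angles.

    points: (N, 3) array of coordinates. Side lengths and angles depend
    only on relative geometry, so a rigid rotation leaves them unchanged.
    """
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, points.shape[0], size=(n_triples, 3))
    a, b, c = points[idx[:, 0]], points[idx[:, 1]], points[idx[:, 2]]

    ab = np.linalg.norm(b - a, axis=1)
    bc = np.linalg.norm(c - b, axis=1)
    ca = np.linalg.norm(a - c, axis=1)

    # Drop degenerate triples (repeated points) to avoid dividing by zero.
    valid = np.minimum(np.minimum(ab, bc), ca) > 1e-9
    a, b, c = a[valid], b[valid], c[valid]
    ab, bc, ca = ab[valid], bc[valid], ca[valid]

    # Pair the angle at each vertex with the length of the opposite side.
    sides = np.concatenate([bc, ca, ab])
    angles = np.concatenate([_angle(b - a, c - a),
                             _angle(a - b, c - b),
                             _angle(a - c, b - c)])

    # Preset intervals: fixed bin edges over [0, max_side] and [0, pi].
    hist, _, _ = np.histogram2d(
        sides, angles, bins=[n_bins, n_bins],
        range=[[0.0, max_side], [0.0, np.pi]])

    # Normalize to frequencies and flatten into a single feature vector.
    return (hist / max(hist.sum(), 1.0)).ravel()
```

With the same seed, calling the function on a cloud and on a rotated copy samples the same triples. The two feature vectors then agree up to floating-point effects at bin boundaries, which is the sense in which such a descriptor is rotation-invariant.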

[0080] Referring to Figure 7, Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. As shown in Figure 7, the electronic device 300 includes a processor 310, a memory 320, and a bus 330.

[0081] The memory 320 stores machine-readable instructions executable by the processor 310. When the electronic device 300 is running, the processor 310 and the memory 320 communicate via the bus 330. When the machine-readable instructions are executed by the processor 310, they can perform the steps of the point cloud data processing method based on the non-parametric attention mechanism in the method embodiment shown in Figure 1. For the specific implementation, refer to the method embodiment; it will not be repeated here.

[0082] This application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, can perform the steps of the point cloud data processing method based on the non-parametric attention mechanism in the method embodiment shown in Figure 1. For the specific implementation, refer to the method embodiment; it will not be repeated here.

[0083] Those skilled in the art will understand that, for convenience and brevity, the specific working processes of the systems, devices, and units described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0084] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. Furthermore, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Additionally, the shown or discussed mutual couplings, direct couplings, or communication connections may be through some communication interfaces; indirect couplings or communication connections between devices or units may be electrical, mechanical, or other forms.

[0085] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0086] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0087] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a processor-executable, non-volatile, computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0088] Finally, it should be noted that the above-described embodiments are merely specific implementations of this application, used to illustrate its technical solutions and not to limit them. The scope of protection of this application is not limited thereto. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the scope of the technology disclosed in this application, they can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of the technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application, and should all be covered within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A point cloud data processing method based on a non-parametric attention mechanism, characterized in that the method comprises: selecting multiple center points in the original point cloud data, constructing a local reference frame for each center point, and determining point pair features between the center point and its neighboring points within the local reference frame, so as to generate local rotation-invariant features based on the point pair features; sampling the original point cloud data to generate multiple point triples, determining geometric primitives that describe the geometric shape of each point triple, and quantizing and statistically analyzing all geometric primitives to determine global rotation-invariant features; performing voxelization and feature extraction on the absolute coordinates of the voxel branch corresponding to the original point cloud data to obtain voxel branch features; and fusing the local rotation-invariant features, the global rotation-invariant features, and the voxel branch features using the non-parametric attention mechanism to obtain fused features.

2. The method according to claim 1, characterized in that the step of fusing the local rotation-invariant features, the global rotation-invariant features, and the voxel branch features using the non-parametric attention mechanism to obtain fused features comprises: concatenating, using the non-parametric attention mechanism, the local rotation-invariant features, the global rotation-invariant features, and the voxel branch features to obtain a comprehensive feature vector, and determining the attention weight of each point based on the comprehensive feature vector; and performing a weighted summation of the comprehensive features within the neighborhood using the attention weights to obtain the final fused features.

3. The method according to claim 2, characterized in that the step of determining the attention weight of each point based on the comprehensive feature vector comprises: determining, using the non-parametric attention mechanism, the dot product between the comprehensive features of the center point and the comprehensive features of each neighboring point; and normalizing the ratio of the dot product to the temperature coefficient to determine the attention weight of each neighboring point relative to the center point.

4. The method according to claim 1, characterized in that the step of quantizing and statistically analyzing all geometric primitives to determine global rotation-invariant features comprises: dividing the value ranges of the side lengths and angles that characterize the geometric shape into multiple preset intervals; counting the frequency with which the values of the geometric primitives of each point triple fall into each preset interval to generate a global histogram; and flattening the global histogram to obtain the global rotation-invariant features.

5. The method according to claim 1, characterized in that the local rotation-invariant features are generated as follows: position encoding is performed on the point pair features to expand the low-dimensional geometric features into high-dimensional feature vectors, thereby generating the local rotation-invariant features.

6. The method according to claim 1, characterized in that the step of selecting multiple center points in the original point cloud data and constructing a local reference frame for each center point comprises: searching for neighboring points around each center point using a preset algorithm to construct a local neighborhood of that center point; determining the normal vector of the local neighborhood based on the covariance matrix of all points in the local neighborhood relative to the center point; and determining the directions of the normal vector and the tangent plane axis by weighted voting using the projections of the neighboring points onto the principal axis, so as to construct the local reference frame.

7. The method according to claim 6, characterized in that the step of determining the normal vector of the local neighborhood based on the covariance matrix of all points in the local neighborhood relative to the center point comprises: determining the covariance matrix of all points in the local neighborhood relative to the center point; and performing eigendecomposition on the covariance matrix and sorting the eigenvectors in descending order of eigenvalue to determine the normal vector.

8. The method according to claim 1, characterized in that the step of performing voxelization and feature extraction on the absolute coordinates of the voxel branch corresponding to the original point cloud data to obtain voxel branch features comprises: dividing the continuous three-dimensional space into a cubic grid and mapping the absolute coordinates of the original point cloud data to voxel indices; determining, for each voxel, the voxel features based on the feature statistics of all points within that voxel; and remapping the voxel features back to each point by linear interpolation based on the relative position of each point within its voxel, so as to obtain the voxel branch features.

9. The method according to claim 1, characterized in that the point pair features include at least one of the following: the Euclidean distance between the center point and the neighboring point; the angle between the direction vector of the neighboring point relative to the center point and the normal vector of the center point; the angle between the normal vector of the center point and the normal vector of the neighboring point; and the angle between the direction vector of the neighboring point relative to the center point and the normal vector of the neighboring point.

10. A point cloud data processing device based on a non-parametric attention mechanism, characterized in that the device comprises: a first feature determination module, used to select multiple center points in the original point cloud data, construct a local reference frame for each center point, determine the point pair features between the center point and its neighboring points within the local reference frame, and generate local rotation-invariant features based on the point pair features; a second feature determination module, used to sample the original point cloud data to generate multiple point triples, determine the geometric primitives used to describe the geometric shape of each point triple, quantize and statistically analyze all geometric primitives, and determine the global rotation-invariant features; a third feature determination module, used to perform voxelization and feature extraction on the absolute coordinates of the voxel branch corresponding to the original point cloud data to obtain voxel branch features; and a feature fusion module, used to fuse the local rotation-invariant features, the global rotation-invariant features, and the voxel branch features using the non-parametric attention mechanism to obtain fused features.