Point cloud data processing method, apparatus, device, and storage medium
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- GUANGDONG OPPO MOBILE TELECOMMUNICATIONS CORP LTD
- Filing Date
- 2023-12-01
- Publication Date
- 2026-06-19
AI Technical Summary
Existing point cloud sampling methods suffer from insufficient sampling quality and high computational complexity, which limits the quality of the sampled point cloud and the performance of downstream tasks.
The method determines a contribution score for each point in the input point cloud dataset and downsamples according to those scores, yielding a second point cloud dataset with fewer points. The first point cloud dataset is processed by a data processing network comprising a feature extraction module, a cascaded attention module, and a contribution marking module, which preserves the quality of the sampled point cloud and the performance of downstream tasks.
Improves the quality of the sampled point cloud, reduces computational complexity, and strikes a balance between retaining the geometric properties of the input point cloud and optimizing specific downstream tasks, thereby improving the performance of downstream tasks.
Figure CN122249839A_ABST
Abstract
Description
Point cloud data processing method, apparatus, device, and storage medium
Technical Field
[0001] The embodiments of the present application relate to the field of point cloud data processing technology, and in particular to a point cloud data processing method, apparatus, device, and storage medium.
Background Art
[0002] A point cloud is a collection of unordered points that describes the geometry of an object. Because point clouds provide rich information about geometry, shape, and scale in three-dimensional space, they are increasingly being used in a variety of applications, including autonomous driving, virtual reality, augmented reality, and robotics. However, point cloud data is large in size, irregular in format, and sparse, making it difficult to process and transmit. To overcome this challenge, downsampling is often required before downstream tasks can be performed.
[0003] In the related art, many point cloud downsampling methods have been proposed, and they fall into two groups: generative point cloud sampling methods and selective point cloud sampling methods. The former include S-Net, Sample-Net, and PST-Net, while the latter include random sampling, farthest point sampling, and Poisson disk sampling. However, these existing point cloud sampling methods still have shortcomings that limit the quality of the sampled point cloud and reduce the performance of downstream tasks.
[0004] Summary of the Invention
[0005] The embodiments of the present application provide a point cloud data processing method, apparatus, device, and storage medium, which can improve the quality of sampled point clouds and thereby improve the performance of downstream tasks.
[0006] The technical solutions of the embodiments of the present application can be implemented as follows:
[0007] In a first aspect, an embodiment of the present application provides a method for processing point cloud data, the method comprising:
[0008] determining a first point cloud dataset; and
[0009] processing the first point cloud dataset using a data processing network to determine a second point cloud dataset, wherein the first point cloud dataset is input data of the data processing network, the second point cloud dataset is output data of the data processing network, and the number of points included in the second point cloud dataset is less than the number of points in the first point cloud dataset;
[0010] wherein processing the first point cloud dataset using the data processing network comprises:
[0011] determining contribution scores of points in the first point cloud dataset; and
[0012] determining the second point cloud dataset according to the contribution scores.
[0013] In a second aspect, an embodiment of the present application provides a point cloud data processing apparatus, which includes a determining unit and a processing unit, wherein:
[0014] the determining unit is configured to determine a first point cloud dataset;
[0015] the processing unit is configured to process the first point cloud dataset using a data processing network to determine a second point cloud dataset, wherein the first point cloud dataset is input data of the data processing network, the second point cloud dataset is output data of the data processing network, and the number of points included in the second point cloud dataset is less than the number of points in the first point cloud dataset; and
[0016] the processing unit is specifically configured to determine contribution scores of points in the first point cloud dataset, and to determine the second point cloud dataset based on the contribution scores.
[0017] In a third aspect, an embodiment of the present application provides an electronic device, the electronic device including a memory and a processor, wherein:
[0018] the memory is configured to store a computer program capable of running on the processor; and
[0019] the processor is configured to execute the method described in the first aspect when running the computer program.
[0020] In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program. When the computer program is executed by at least one processor, it implements the method described in the first aspect.
[0021] The embodiments of the present application provide a point cloud data processing method, apparatus, device, and storage medium. A first point cloud dataset is determined and processed by a data processing network to determine a second point cloud dataset, where the first point cloud dataset is the input data of the data processing network, the second point cloud dataset is its output data, and the second point cloud dataset contains fewer points than the first point cloud dataset. Processing the first point cloud dataset with the data processing network includes determining contribution scores of the points in the first point cloud dataset and determining the second point cloud dataset based on those scores. Downsampling the first point cloud dataset according to the contribution scores yields a second point cloud dataset with fewer points, which reduces computational complexity. Moreover, because the contribution scores represent the importance of the points relative to the task being processed, the downsampling is task-oriented: the data processing network can balance preserving the geometric properties of the input point cloud against optimizing a specific downstream task, which improves the quality of the sampled point cloud and, consequently, the performance of the downstream task.
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] FIG. 1 is a first flowchart of a point cloud data processing method provided in an embodiment of the present application;
[0023] FIG. 2 is a second flowchart of a point cloud data processing method provided in an embodiment of the present application;
[0024] FIG. 3 is a third flowchart of a point cloud data processing method provided in an embodiment of the present application;
[0025] FIG. 4 is a schematic diagram of the architecture of a data processing network provided in an embodiment of the present application;
[0026] FIG. 5 is a fourth flowchart of a point cloud data processing method provided in an embodiment of the present application;
[0027] FIG. 6 is a schematic diagram of the network structure of a feature extraction module provided in an embodiment of the present application;
[0028] FIG. 7 is a fifth flowchart of a point cloud data processing method provided in an embodiment of the present application;
[0029] FIG. 8 is a schematic diagram of the network structure of a cascaded attention module provided in an embodiment of the present application;
[0030] FIG. 9 is a schematic diagram of an application scenario of an attention module provided in an embodiment of the present application;
[0031] FIG. 10 is a schematic diagram of the architecture of a cascaded attention module and a contribution marking module provided in an embodiment of the present application;
[0032] FIG. 11 is a sixth flowchart of a point cloud data processing method provided in an embodiment of the present application;
[0033] FIG. 12 is a seventh flowchart of a point cloud data processing method provided in an embodiment of the present application;
[0034] FIG. 13 is a schematic diagram of an application framework of a data processing network provided in an embodiment of the present application;
[0035] FIG. 14 is a schematic diagram of a visual comparison between the data processing network provided in an embodiment of the present application and other sampling methods;
[0036] FIG. 15 is a schematic diagram of the contribution score distributions obtained with different loss functions provided in an embodiment of the present application;
[0037] FIG. 16 is a schematic diagram comparing the registration results of the data processing network provided in an embodiment of the present application and other sampling methods;
[0038] FIG. 17 is a schematic diagram comparing the rate-distortion curves of the data processing network provided in an embodiment of the present application and other sampling methods;
[0039] FIG. 18 is a first schematic diagram comparing point cloud compression visualizations of the data processing network provided in an embodiment of the present application and other sampling methods;
[0040] FIG. 19 is a second schematic diagram comparing point cloud compression visualizations of the data processing network provided in an embodiment of the present application and other sampling methods;
[0041] FIG. 20 is a schematic diagram comparing surface reconstruction using the data processing network provided in an embodiment of the present application and other sampling methods;
[0042] FIG. 21 is a schematic diagram of the structure of a point cloud data processing apparatus provided in an embodiment of the present application;
[0043] FIG. 22 is a schematic diagram of a specific hardware structure of an electronic device provided in an embodiment of the present application;
[0044] FIG. 23 is a schematic diagram of a network architecture for point cloud encoding and decoding provided in an embodiment of the present application.
DETAILED DESCRIPTION
[0045] In order to enable a more detailed understanding of the features and technical contents of the embodiments of the present application, the implementation of the embodiments of the present application is described in detail below with reference to the accompanying drawings. The attached drawings are for reference only and are not used to limit the embodiments of the present application.
[0046] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this application pertains. The terms used herein are for the purpose of describing the embodiments of this application only and are not intended to limit this application.
[0047] In the following description, reference is made to “some embodiments”, which describes a subset of all possible embodiments, but it will be understood that “some embodiments” may be the same subset or different subsets of all possible embodiments and may be combined with each other without conflict.
[0048] It should also be noted that the terms "first/second/third" in the embodiments of the present application are only used to distinguish similar objects and do not imply a specific ordering of the objects. It can be understood that, where permitted, the specific order or sequence of "first/second/third" may be interchanged, so that the embodiments of the present application described here can be implemented in an order other than that illustrated or described here.
[0049] Before further explaining the embodiments of the present application in detail, the nouns and terms involved in the embodiments of the present application are explained first. The nouns and terms involved in the embodiments of the present application are subject to the following interpretations:
[0050] Point Cloud Compression (PCC);
[0051] Contribution Mark-based Sampling Network (CMS-Net);
[0052] Density-Adaptive Downsampling Network (DA-Net);
[0053] Point Sampling Transformer Network (PST-Net);
[0054] Matrix Optimization-driven Network (MOPS-Net);
[0055] Attention-Sampling Network (AS-Net);
[0056] Gumbel Subset Sampling (GSS);
[0057] Contribution Marking Module (CMM);
[0058] Cascade Attention Module (CAM);
[0059] Self-Attention (SA);
[0060] Offset Attention (OA);
[0061] Random Sample (RS);
[0062] Straight Sample (SS);
[0063] Farthest Point Sample (FPS);
[0064] Poisson Disk Sample (PDS);
[0065] Chamfer Distance (CD);
[0066] Three Dimension (3D).
[0067] It can be understood that a point cloud is a three-dimensional representation of the surface of an object. Point cloud data of an object's surface can be collected by acquisition equipment such as photoelectric radar, lidar, laser scanners, and multi-view cameras.
[0068] In a two-dimensional image, every pixel carries information and pixels are distributed regularly, so their locations do not need to be recorded. In a point cloud, however, points are distributed randomly and irregularly in three-dimensional space, so the location of each point must be recorded to fully represent the point cloud. As with a two-dimensional image, each acquired location has corresponding attribute information, typically an RGB color value that reflects the object's color. In a point cloud, in addition to color, the attribute information of each point often includes a reflectance value, which reflects the surface texture of the object. A point in a point cloud can therefore include both location information and attribute information. For example, the location information of a point, also referred to as its geometric information, can be its three-dimensional coordinates (x, y, z). The attribute information of a point can include color information (three-dimensional color information) and/or reflectance (one-dimensional reflectance information r). The color information can be expressed in any color space: for example, RGB information, where R, G, and B represent red, green, and blue; or luminance and chrominance (YCbCr, YUV) information, where Y represents luminance (luma), Cb (U) represents the blue color difference, and Cr (V) represents the red color difference.
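As an illustration of the geometry/attribute split described above, the following NumPy sketch stores n points with (x, y, z) coordinates, an RGB color, and a scalar reflectance r. The array names and the structured-record layout are assumptions chosen for illustration; they are not a storage format defined by this application or by G-PCC/V-PCC.

```python
import numpy as np

# Hypothetical layout: one array per component, n points in total.
n = 4
rng = np.random.default_rng(0)

positions = rng.uniform(-1.0, 1.0, size=(n, 3)).astype(np.float32)  # geometry (x, y, z)
colors = rng.integers(0, 256, size=(n, 3), dtype=np.uint8)          # RGB attribute
reflectance = rng.uniform(0.0, 1.0, size=n).astype(np.float32)      # 1-D reflectance r

# The same data packed as a structured array, so that each record holds one
# point's geometry and attributes together.
point_dtype = np.dtype([("xyz", np.float32, (3,)),
                        ("rgb", np.uint8, (3,)),
                        ("r", np.float32)])
cloud = np.zeros(n, dtype=point_dtype)
cloud["xyz"] = positions
cloud["rgb"] = colors
cloud["r"] = reflectance
```

Keeping the geometry in a contiguous (n, 3) float array is convenient for sampling, since the sampling methods discussed in this section operate on coordinates only.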
[0069] For example, a point cloud generated using laser measurement principles can include 3D coordinate information and reflectance. As another example, a point cloud generated using photogrammetry principles can include 3D coordinate information and 3D color information. As a further example, a point cloud generated by combining laser measurement and photogrammetry principles can include 3D coordinate information, reflectance, and 3D color information.
[0070] According to the acquisition method, point clouds can be divided into the following categories:
[0071] Type 1, static point cloud: both the object and the device used to acquire the point cloud are stationary;
[0072] Type 2, dynamic point cloud: the object is moving, but the device that acquires the point cloud is stationary;
[0073] Type 3, dynamically acquired point cloud: the device that acquires the point cloud is moving.
[0074] For example, point clouds can be divided into two categories according to their usage:
[0075] Category 1: Machine perception point cloud, which can be used in scenarios such as autonomous navigation systems, real-time inspection systems, geographic information systems, visual sorting robots, and disaster relief robots;
[0076] Category 2: Human eye perception point cloud, which can be used in point cloud application scenarios such as digital cultural heritage, free viewpoint broadcasting, 3D immersive communication, and 3D immersive interaction.
[0077] Because a point cloud is a collection of a massive number of points, storing it consumes a large amount of memory and makes transmission inconvenient, and the network layer does not have enough bandwidth to transmit point clouds directly without compression. Point clouds therefore need to be compressed. Point cloud coding frameworks that can compress point clouds include the geometry-based Point Cloud Compression (G-PCC) and video-based Point Cloud Compression (V-PCC) codec frameworks provided by the Moving Picture Experts Group (MPEG), and the AVS-PCC codec framework provided by the Audio Video Standard (AVS). Of these, the G-PCC codec framework can be used to compress static point clouds of the first type and dynamically acquired point clouds of the third type, and the V-PCC codec framework can be used to compress dynamic point clouds of the second type.
[0078] It is also understood that point clouds, as an important data representation, are widely used in fields such as autonomous driving, augmented reality, and robotics. Because point cloud data is large in volume, irregular in format, and sparse, it is difficult to process and transmit, so sampling a representative subset of points is a fundamental and important task in 3D computer vision. Point cloud downsampling has therefore become a popular and effective way to simplify point clouds, saving storage space and reducing transmission bandwidth and communication overhead.
[0079] The following introduces the relevant technologies by combining the generative point cloud sampling method and the selective point cloud sampling method.
[0080] 1) Generative point cloud sampling method.
[0081] Dovrat et al. pioneered the generative sampling network S-Net, which generates the output point cloud through fully connected layers. Later, Lin et al. observed that the projection operation of Sample-Net involves overlapping neighborhoods and proposed an improved local adjustment module that better preserves local details and achieves better classification than Sample-Net. Lin et al. also proposed the density-adaptive downsampling network DA-Net, whose density-adaptive downsampling module adjusts the sampling rate of different regions of the point cloud according to the estimated local density. Furthermore, Wang combined S-Net with a transformer and proposed the point sampling transformer (PST) network PST-Net; thanks to the transformer, this method can generate noise-insensitive point clouds. In addition, Tian proposed a general point cloud sampling network that can sample representative points without task-specific fine-tuning.
[0082] 2) Selective point cloud sampling method.
[0083] a. Traditional sampling methods. The most commonly used traditional sampling methods include random sampling, farthest point sampling, and Poisson disk sampling. Random sampling selects a random subset of points from the input point cloud. Farthest point sampling iteratively selects the point farthest from the set of previously selected points, producing a subset of points evenly distributed across the point cloud. Poisson disk sampling rejects any new point that is too close to a previously accepted point, which yields a uniform distribution of points within a given space.
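The three traditional methods above can be sketched in NumPy as follows. This is a minimal illustration, not the reference implementation of any library: `random_sampling`, `farthest_point_sampling`, and `poisson_disk_sampling` are hypothetical helper names, each returns indices into the input array, and the Poisson disk variant assumes a selective, dart-throwing formulation that only accepts candidates drawn from the input points.

```python
import numpy as np

def random_sampling(points, m, rng):
    """Select m distinct point indices uniformly at random."""
    return rng.choice(len(points), size=m, replace=False)

def farthest_point_sampling(points, m, start=0):
    """Greedy FPS: repeatedly add the point farthest from the already selected set."""
    selected = np.empty(m, dtype=np.int64)
    selected[0] = start
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for i in range(1, m):
        selected[i] = np.argmax(min_dist)
        new_dist = np.linalg.norm(points - points[selected[i]], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected

def poisson_disk_sampling(points, radius, rng):
    """Dart throwing over the input points: visit them in random order and
    reject any candidate closer than `radius` to an already accepted point."""
    accepted = []
    for idx in rng.permutation(len(points)):
        if all(np.linalg.norm(points[idx] - points[j]) >= radius for j in accepted):
            accepted.append(int(idx))
    return np.array(accepted, dtype=np.int64)
```

The greedy FPS loop performs O(n·m) distance evaluations, and the naive Poisson disk check grows with the number of accepted points, which illustrates why these traditional methods become expensive on large point clouds.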
[0084] b. Learning-based selective sampling methods select points directly from the input point set based on specific rules or extracted features. A significant issue with these methods is that the point selection process is discrete and non-differentiable, and several strategies have been applied to address it. Qian et al. pioneered a selective sampling method with the matrix optimization-driven network MOPS-Net, which samples point clouds through matrix multiplication; they also proposed replacing each discrete point with a weighted average of several points, making the sampling network differentiable. Building on MOPS-Net, Yang et al. proposed the attention sampling network AS-Net, which introduces an attention module to capture important features and a constrained matching module to ensure that the sampled point cloud is strictly a subset of the input point cloud. Yang et al. proposed Gumbel subset sampling, which addresses the trainability issue by adding Gumbel noise to the binary matrix, together with a channel shuffling module that enables feature interaction between channels without additional parameters. Sun et al. proposed a direct sampling network that trains a hard-sampling network using a gradient estimation strategy. However, selective sampling methods based on matrix multiplication introduce a new problem: duplicate points in the output point set. In addition, Ehsan et al. introduced top-k optimization into point cloud sampling.
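The differentiability workaround mentioned above, relaxing a discrete choice into a weighted average of points, can be illustrated with the generic Gumbel-softmax trick. The sketch below is an assumption-laden illustration and not the published MOPS-Net or Gumbel subset sampling code: `gumbel_softmax_select` and `soft_sample_point` are hypothetical names, `logits` stands for per-point selection scores, and `tau` is a temperature.

```python
import numpy as np

def gumbel_softmax_select(logits, tau, rng):
    """Soft one-hot selection over n candidate points (Gumbel-softmax relaxation).

    logits: (n,) unnormalised selection scores. Returns (n,) weights summing to 1.
    """
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))           # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = y - y.max()                   # numerical stability before exponentiating
    w = np.exp(y)
    return w / w.sum()

def soft_sample_point(points, logits, tau, rng):
    """Weighted average of input points: differentiable, but not an input point."""
    w = gumbel_softmax_select(logits, tau, rng)
    return w @ points                 # (3,) convex combination of the rows of points
```

As `tau` decreases, the weights approach a one-hot vector; for any positive `tau`, however, the output is a convex combination of input points rather than an input point itself, which is the soft-sampling limitation noted for these networks.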
[0085] In the related art, S-Net matches each output point with its nearest point in the input point set to improve the quality of the generated point cloud. Subsequent work, Sample-Net, replaces this matching operation with a differentiable projection operation. Although this approach better approximates the original point cloud, it remains computationally expensive and can lose some local details.
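The nearest-neighbour matching step attributed to S-Net above can be sketched as a brute-force search. `match_to_input` is a hypothetical helper written for illustration; it builds an (m, n) distance matrix, which is simple but memory-hungry for large clouds (a k-d tree would be the usual alternative).

```python
import numpy as np

def match_to_input(generated, input_points):
    """Snap each generated point to its nearest input point (brute force).

    generated: (m, 3); input_points: (n, 3).
    Returns the matched points (m, 3) and their indices into input_points (m,).
    """
    # (m, n) matrix of squared Euclidean distances.
    d2 = ((generated[:, None, :] - input_points[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return input_points[idx], idx
```

Nothing in this matching prevents two generated points from snapping to the same input point, so the matched set can contain duplicates.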
[0086] The point sampling transformer network still relies on the matching operation proposed by S-Net, which is not trainable. Although various methods have been proposed to help the sampled point cloud better approximate the input point cloud, they share a common problem: the generated point cloud is not a strict subset of the input point cloud. These methods therefore cannot maintain the shape consistency of the original point cloud, and their subjective quality is lower than that of traditional sampling methods such as FPS.
[0087] In addition, traditional sampling methods such as FPS and Poisson disk sampling have high computational complexity. A common shortcoming of the matrix optimization-driven network and the attention sampling network is that they do not fully solve the differentiability problem: during training, these networks can only output a set of soft sampling points.
[0088] That is to say, although related technologies have proposed many point cloud downsampling methods, these existing point cloud sampling methods still have some defects, which limit the quality of the sampled point cloud and thus reduce the performance of downstream tasks.
[0089] Based on this, an embodiment of the present application provides a point cloud data processing method. A first point cloud dataset is first determined, and a data processing network then processes it to determine a second point cloud dataset, specifically by determining the contribution scores of the points in the first point cloud dataset and determining the second point cloud dataset according to those scores. The first point cloud dataset is the input data of the data processing network, the second point cloud dataset is its output data, and the second point cloud dataset contains fewer points than the first point cloud dataset. Downsampling the first point cloud dataset based on the contribution scores not only reduces computational complexity; because the contribution scores represent the importance of the points relative to the task to be processed, the downsampling is also task-oriented, so the data processing network can balance preserving the geometric attributes of the input point cloud against optimizing a specific downstream task. This improves the quality of the sampled point cloud and, in turn, the performance of downstream tasks.
[0090] The following will describe each embodiment of the present application in detail with reference to the drawings.
[0091] In an embodiment of the present application, FIG. 1 is a first schematic flowchart of a method for processing point cloud data provided by an embodiment of the present application. As shown in FIG. 1, the method may include:
[0092] S101, determine the first point cloud data set.
[0093] S102, use the data processing network to process the first point cloud data set to determine the second point cloud data set.
[0094] It should be noted that, in the embodiments of the present application, the point cloud data processing method may be a point cloud sampling method, specifically a point cloud downsampling method that uses a data processing network: downsampling the input first point cloud dataset yields a second point cloud dataset with fewer points.
[0095] It should also be noted that in the embodiment of the present application, for the data processing network, the first point cloud data set may be the input data of the data processing network, and the second point cloud data set may be the output data of the data processing network, and the number of points included in the second point cloud data set is less than the number of points in the first point cloud data set.
[0096] For example, if the first point cloud dataset contains n points and the second point cloud dataset contains m points, then m < n, where m and n are both positive integers. That is, the input of the data processing network is a point cloud dataset with n points, and its output is a sparser point cloud dataset with m points.
[0097] In some embodiments, as shown in FIG. 2, processing the first point cloud dataset using the data processing network may include:
[0098] S201, determining the contribution scores of points in a first point cloud dataset.
[0099] S202: Determine a second point cloud data set according to the contribution score.
[0100] It should be noted that, in an embodiment of the present application, determining the contribution scores of points in the first point cloud dataset may include: determining the contribution scores of points in the first point cloud dataset based on the importance of the points in the first point cloud dataset relative to the task to be processed.
[0101] Here, the task to be processed serves as a downstream task of the data processing network, and can be tasks such as point cloud classification, point cloud registration, sampling-based point cloud compression, or point cloud surface reconstruction. In other words, the data processing network is task-oriented, and the contribution score can represent the importance of the points in the first point cloud dataset relative to the task to be processed, so that the data processing network strikes a balance between preserving the geometric properties of the input point cloud and optimizing specific downstream tasks.
[0102] It should also be noted that, in an embodiment of the present application, determining the second point cloud dataset based on the contribution score may include: determining m index numbers of points to be selected based on the contribution score; determining the second point cloud dataset based on the m index numbers of points to be selected and the first point cloud dataset; wherein m is a positive integer.
[0103] In a specific embodiment, determining the index numbers of the m points to be selected based on the contribution scores may include: determining the index numbers corresponding to the first m points with the highest contribution scores as the index numbers of the m points to be selected.
[0104] It should also be noted that in the embodiment of the present application, the higher the contribution score, the more important the corresponding feature, so the points with high contribution scores need to be selected here. In other words, the points in the second point cloud dataset can be composed of a portion of points with the highest contribution scores.
[0105] For example, the index numbers corresponding to the top m points with the highest contribution scores, that is, the index numbers of the m points to be selected, can be selected based on the top-k query method; then, m points are selected from the first point cloud dataset according to the index numbers of the m points to be selected, and the second point cloud dataset can be composed based on these m points.
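The top-k selection and gather step described in [0102] to [0105] can be sketched in a few lines of NumPy. `select_by_contribution` is a hypothetical name; it assumes the contribution scores are already available as a length-n array and uses `np.argpartition`, which finds the top m entries without fully sorting the scores.

```python
import numpy as np

def select_by_contribution(points, scores, m):
    """Return the m points with the highest contribution scores and their indices.

    points: (n, 3) first point cloud dataset; scores: (n,) contribution scores.
    """
    if not 0 < m <= len(points):
        raise ValueError("m must satisfy 0 < m <= n")
    # After partitioning -scores, the first m positions hold the m largest scores.
    idx = np.argpartition(-scores, m - 1)[:m]
    idx = np.sort(idx)                # restore the original point order
    return points[idx], idx
```

With the six example scores listed for FIG. 4 (3.4, 0.8, 4.1, 2.2, 2.6, 1.2) and m = 4, this returns the indices 0, 2, 3, and 4. Because the output rows are gathered from the input array, the second point cloud dataset is always a subset of the first, with no duplicate points.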
[0106] It is understood that in the embodiments of the present application, the data processing network can be a learning-based neural network, specifically a point cloud sampling network, such as a contribution mark-based sampling network (CMS-Net). Here, the number of points contained in the second point cloud dataset is less than the number of points in the first point cloud dataset, so the data processing network can also be called a downsampling network.
[0107] In some embodiments, the data processing network may include a feature extraction module, a cascaded attention module, and a contribution marking module. Accordingly, as shown in FIG. 3, processing the first point cloud dataset using the data processing network to determine the second point cloud dataset may include:
[0108] S301: Perform feature extraction on a first point cloud dataset using a feature extraction module to obtain point feature information of the first point cloud dataset.
[0109] S302: Use the cascade attention module to perform attention analysis on the point feature information of the first point cloud dataset to obtain attention feature information of the first point cloud dataset.
[0110] S303: Use a contribution marking module to perform contribution evaluation on the attention feature information of the first point cloud dataset to determine a second point cloud dataset.
[0111] In the embodiments of the present application, the feature extraction module, which may also be referred to as a feature embedding module (FEM), is mainly used to obtain a point-wise feature map of the first point cloud dataset. Specifically, the feature extraction module is used to capture local features and global features.
[0112] In the embodiments of the present application, the cascaded attention module (CAM) is mainly used to distinguish the importance of features and to identify and capture the most relevant and informative points during training. Specifically, the cascaded attention module emphasizes salient features and suppresses less important ones.
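This section does not detail the internals of the cascaded attention module, but the term list above includes self-attention (SA) and offset attention (OA). As a hedged sketch only, the code below shows a generic single-head self-attention over point features and an offset-attention variant in the style of the Point Cloud Transformer (PCT); whether and how the CAM cascades such layers is an assumption here, and all weight names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(f, wq, wk, wv):
    """Single-head scaled dot-product self-attention over n point features.

    f: (n, c); wq, wk: (c, d); wv: (c, c). Returns (n, c).
    """
    q, k, v = f @ wq, f @ wk, f @ wv
    a = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (n, n): row i weights all points for point i
    return a @ v

def offset_attention(f, wq, wk, wv, wo):
    """PCT-style offset attention: transform (input - self-attention output), add input back.

    wo: (c, c). A plain ReLU stands in for PCT's Linear-BatchNorm-ReLU block.
    """
    offset = f - self_attention(f, wq, wk, wv)
    return np.maximum(offset @ wo, 0.0) + f
```

A cascade would stack several such layers, each with its own weights. PCT's original offset attention also uses a different attention normalization and batch normalization, both omitted here for brevity. Like any softmax self-attention, the layer is permutation-equivariant, which suits unordered point sets.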
[0113] In this embodiment of the present application, the Contribution Marking Module (CMM) primarily maps features into point-by-point labels that represent the importance of each point relative to the downstream task. The label here is the contribution score. The larger the contribution score, the more important the corresponding feature. Therefore, the points in the second point cloud dataset can be composed of the points with the highest contribution scores.
[0114] For example, FIG. 4 is a schematic diagram of the architecture of a data processing network provided in an embodiment of the present application. As shown in FIG. 4, the data processing network includes a feature extraction module 401, a cascaded attention module 402, and a contribution marking module 403.
[0115] For the first point cloud dataset $P_{\text{in}} \in \mathbb{R}^{n\times 3}$, the feature extraction module 401 first captures local features and global features; the cascade attention module 402 then emphasizes salient features and suppresses less important ones; finally, the contribution marking module 403 maps the features into point-wise marks, namely contribution scores, which represent the importance of each point relative to the task to be processed. As shown in FIG4, the contribution scores may be 3.4, 0.8, 4.1, 2.2, 2.6 and 1.2. The m elements with the highest scores are selected (here m = 4: the scores 3.4, 4.1, 2.2 and 2.6), and the point index numbers (Index, Idx) of these elements are obtained, namely 0, 2, 3 and 4. Using these index numbers, the corresponding m sampling points are selected from the input first point cloud dataset, and together they form the sampled second point cloud dataset $P_{\text{sp}} \in \mathbb{R}^{m\times 3}$. Note that the first point cloud dataset $P_{\text{in}}$ contains n points, each with three-dimensional coordinates, so its dimension is n×3; the second point cloud dataset $P_{\text{sp}}$ contains m points, each with three-dimensional coordinates, so its dimension is m×3.
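The selection step in the FIG4 example can be checked with a short sketch. It is a minimal illustration, not the claimed implementation. The helper names `select_top_m` and `gather` are hypothetical. Returning the indices in ascending order is an assumption made to match the listing 0, 2, 3, 4 above.

```python
def select_top_m(scores, m):
    """Return the indices of the m largest scores, listed in ascending index order."""
    if not 0 < m <= len(scores):
        raise ValueError("m must satisfy 0 < m <= number of points")
    # Rank every point index by descending score; each index appears exactly once,
    # so the selected subset can never contain duplicate points.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:m])


def gather(points, idx):
    """Pick the points of the input cloud whose indices are listed in idx."""
    return [points[i] for i in idx]


scores = [3.4, 0.8, 4.1, 2.2, 2.6, 1.2]                        # scores from FIG4
points = [(float(i), 0.0, 0.0) for i in range(len(scores))]    # toy n x 3 cloud
idx = select_top_m(scores, 4)
print(idx)  # [0, 2, 3, 4]
p_sp = gather(points, idx)                                     # m x 3 sampled cloud
```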
[0116] In some embodiments, the feature extraction module may include a grouping module and a pooling module. Accordingly, using the feature extraction module to extract features from the first point cloud dataset to obtain its point feature information, as shown in FIG5, may include:
[0117] S501: Utilize a grouping module to group and jointly process the first point cloud dataset to obtain joint feature information of the first point cloud dataset.
[0118] S502: Using a pooling module, perform feature mapping and pooling operations on the joint feature information of the first point cloud dataset to obtain point feature information of the first point cloud dataset.
[0119] It can be understood that, in the embodiments of the present application, the grouping module may include a grouping layer, a replication layer, and a first joint layer. The grouping layer is mainly used to extract local features of the first point cloud dataset, and the replication layer is mainly used to obtain global features of the first point cloud dataset. Combining the two in the first joint layer yields both the local and the global features of the first point cloud dataset.
[0120] In a specific embodiment, using the grouping module to group and jointly process the first point cloud dataset to obtain its joint feature information may include: grouping the first point cloud dataset using the grouping layer to obtain grouping feature information of the first point cloud dataset; copying the first point cloud dataset a preset number of times using the replication layer to obtain replication feature information of the first point cloud dataset; and combining the grouping feature information and the replication feature information using the first joint layer to obtain the joint feature information of the first point cloud dataset.
[0121] It should be noted that, in the embodiments of the present application, taking a point p in the first point cloud dataset as an example, the grouping operation for p is defined as follows:
$$\mathrm{Group}(p) = \{p_1 - p,\ p_2 - p,\ \dots,\ p_k - p\} \tag{1}$$
[0122] where $p_i$ (i = 1, …, k) are the k neighbor nodes of point p. Thus, if the first point cloud dataset is denoted $P_{\text{in}}$ and p is a point in $P_{\text{in}}$, the grouping layer applied to $P_{\text{in}}$ is defined as follows:
$$F_{\text{group}} = \mathrm{Group}(P_{\text{in}}) \tag{2}$$
[0123] Here, $F_{\text{group}}$ is the grouping feature information.
[0124] It should also be noted that, in the embodiments of the present application, the preset number of times equals k. That is, each point is replicated in the replication layer as many times as the number of neighbor nodes used for it in the grouping process, so that the two outputs match in size. The replication feature information is the result of copying $P_{\text{in}}$ k times.
[0125] It should also be noted that, in the embodiments of the present application, the joint layer can be represented by concat(·). The joint feature information is therefore the concatenation, via concat(·), of the grouping feature information $F_{\text{group}}$ and the replication feature information.
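The grouping, replication and first joint layers can be sketched as follows. This is a minimal illustration under stated assumptions. Neighbors are found by Euclidean k-nearest-neighbor search that excludes the point itself. Each grouped row is concatenated as (relative neighbor, copy of p). The function names are hypothetical, and the text specifies neither the search method nor the concatenation order.

```python
import math


def knn_indices(points, p_idx, k):
    """Indices of the k nearest neighbours of points[p_idx], excluding the point itself."""
    p = points[p_idx]
    others = [j for j in range(len(points)) if j != p_idx]
    others.sort(key=lambda j: math.dist(points[j], p))
    return others[:k]


def group(points, p_idx, k):
    """Equation (1): neighbour coordinates relative to p, i.e. {p_i - p}."""
    p = points[p_idx]
    return [tuple(q - c for q, c in zip(points[j], p)) for j in knn_indices(points, p_idx, k)]


def joint_features(points, k):
    """Return an n x k x 6 nested list: concat(grouped neighbour, replicated point)."""
    out = []
    for i, p in enumerate(points):
        grouped = group(points, i, k)   # k x 3, grouping layer
        replicated = [p] * k            # k x 3, replication layer (p copied k times)
        out.append([g + r for g, r in zip(grouped, replicated)])  # first joint layer
    return out


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
feats = joint_features(pts, k=2)
```

Copying each point k times gives the replication branch the same k×3 layout as the grouping branch, so the two can be concatenated row by row.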
[0126] It can also be understood that, in the embodiments of the present application, the pooling module includes a first multi-layer perception layer, an average pooling layer, a maximum pooling layer, a second joint layer, and a second multi-layer perception layer. Each multi-layer perception layer can be composed of a multi-layer perceptron (MLP). An MLP is a feedforward artificial neural network that maps a set of input vectors to a set of output vectors. It can be regarded as a directed graph of multiple node layers, each fully connected to the next. Both the first and the second multi-layer perception layers can be represented by σ(·), which maps features into a high-dimensional feature space. In addition, introducing pooling operations reduces the computational cost. Maximum pooling is good at finding important features at the expense of discarding the others. Average pooling is good at aggregating features across channels but cannot effectively extract local details. The pooling module combines the complementary advantages of maximum pooling and average pooling.
[0127] In a specific implementation method, using a pooling module to perform feature mapping and pooling operations on the joint feature information of the first point cloud dataset to obtain point feature information of the first point cloud dataset can include: using a first multi-layer perception layer to perform feature mapping on the joint feature information of the first point cloud dataset to obtain first mapping feature information of the first point cloud dataset; using an average pooling layer to perform an average pooling operation on the first mapping feature information of the first point cloud dataset to obtain average pooling information of the first point cloud dataset; using a maximum pooling layer to perform a maximum pooling operation on the first mapping feature information of the first point cloud dataset to obtain maximum pooling information of the first point cloud dataset; using a second joint layer to combine the average pooling information and the maximum pooling information to obtain second mapping feature information of the first point cloud dataset; using a second multi-layer perception layer to perform feature mapping on the second mapping feature information to obtain point feature information of the first point cloud dataset.
[0128] It should be noted that, in the embodiments of the present application, the first multi-layer perception layer σ(·) maps features into a high-dimensional feature space. The first mapping feature information, denoted $F_{\text{combine}}$, is obtained by applying σ(·) to the joint feature information.
[0129] It should also be noted that, in the embodiments of the present application, once $F_{\text{combine}}$ is obtained, pooling operations are introduced to reduce the computational cost while preserving the permutation equivariance of the network. The maximum pooling layer, represented by max(·), is good at finding important features at the expense of discarding the others. The average pooling layer, represented by avg(·), is good at aggregating features across channels but cannot effectively extract local details. Specifically, the average pooling information is $\mathrm{avg}(F_{\text{combine}})$ and the maximum pooling information is $\max(F_{\text{combine}})$. To combine the advantages of both, a spatial pooling layer is proposed, defined as follows:
$$F_{\text{point-wise}} = \sigma\big(\mathrm{concat}(\max(F_{\text{combine}}), \mathrm{avg}(F_{\text{combine}}))\big) \tag{4}$$
[0130] Here, the point feature information is denoted $F_{\text{point-wise}}$, and the second mapping feature information is $\mathrm{concat}(\max(F_{\text{combine}}), \mathrm{avg}(F_{\text{combine}}))$.
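The spatial pooling of equation (4) can be sketched as follows. This is a minimal illustration, not the claimed implementation. It assumes that max(·) and avg(·) pool over each point's k neighbors, which is consistent with the n×c output shape. It also assumes σ is supplied as a function, defaulting to identity as a placeholder. The name `spatial_pool` is hypothetical.

```python
def spatial_pool(f_combine, sigma=lambda v: v):
    """Equation (4) applied to every point.

    f_combine: n x k x c nested list (first mapping feature information).
    Returns n rows of sigma(concat(max over k, avg over k)); with the default
    identity sigma, each row has 2c values.
    """
    out = []
    for rows in f_combine:                           # rows: k x c for one point
        cols = list(zip(*rows))                      # c channels, each of length k
        mx = [max(col) for col in cols]              # maximum pooling layer
        av = [sum(col) / len(col) for col in cols]   # average pooling layer
        out.append(sigma(mx + av))                   # second joint layer, then sigma
    return out


pooled = spatial_pool([[[1.0, -2.0], [3.0, 0.0]]])   # one point, k = 2, c = 2
print(pooled)  # [[3.0, 0.0, 2.0, -1.0]]
```

Both max and mean are symmetric functions of the neighbor rows, so the result does not depend on the order in which neighbors were listed.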
[0131] For example, FIG6 is a schematic diagram of the network structure of a feature extraction module provided in an embodiment of the present application. As shown in FIG6 , the feature extraction module includes a grouping module 601 and a pooling module 602. Specifically, the grouping module 601 may include a grouping layer 611, a replication layer 612, and a first joint layer 613, and the pooling module 602 may include a first MLP layer 621, an average pooling layer 622, a maximum pooling layer 623, a second joint layer 624, and a second MLP layer 625.
[0132] In the grouping module 601, the grouping layer 611 first groups the first point cloud dataset to obtain the grouping feature information $F_{\text{group}}$. The replication layer 612 copies the first point cloud dataset k times to obtain the replication feature information. The first joint layer 613 then combines $F_{\text{group}}$ with the replication feature information to obtain the joint feature information of the first point cloud dataset. In the pooling module 602, the first MLP layer 621 performs feature mapping on the joint feature information to obtain the first mapping feature information $F_{\text{combine}}$. The average pooling layer 622 performs average pooling on $F_{\text{combine}}$ to obtain the average pooling information $\mathrm{avg}(F_{\text{combine}})$. The maximum pooling layer 623 performs maximum pooling on $F_{\text{combine}}$ to obtain the maximum pooling information $\max(F_{\text{combine}})$. The second joint layer 624 combines $\mathrm{avg}(F_{\text{combine}})$ and $\max(F_{\text{combine}})$ into the second mapping feature information $\mathrm{concat}(\max(F_{\text{combine}}), \mathrm{avg}(F_{\text{combine}}))$. Finally, the second MLP layer 625 performs feature mapping on this second mapping feature information to obtain the point feature information $F_{\text{point-wise}}$ of the first point cloud dataset.
[0133] In addition, it should be noted that $F_{\text{point-wise}}$ characterizes the feature information of each point in the first point cloud dataset, so it can also be called point-wise feature information. The first point cloud dataset is $P_{\text{in}} \in \mathbb{R}^{n\times 3}$. After the grouping layer 611, the first multi-layer perception layer 621 and the second multi-layer perception layer 625, the feature dimension of each point changes. Assuming each point has c-dimensional features after this adjustment, the output point-wise feature information is $F_{\text{point-wise}} \in \mathbb{R}^{n\times c}$.
[0134] In some embodiments, the cascade attention module includes at least two attention modules and a third joint layer. Accordingly, using the cascade attention module to perform attention analysis on the point feature information of the first point cloud dataset to obtain attention feature information of the first point cloud dataset may include: using the at least two attention modules to perform attention feature extraction on the point feature information of the first point cloud dataset to obtain at least two pieces of intermediate feature information of the first point cloud dataset; and combining the at least two pieces of intermediate feature information using the third joint layer to obtain the attention feature information of the first point cloud dataset.
[0135] It can be understood that in the embodiment of the present application, the at least two attention modules can be 2 attention modules, or can be 3 attention modules, 4 attention modules, or more attention modules, etc., which is not specifically limited here.
[0136] In a specific embodiment, take three attention modules as an example: the at least two attention modules include a first attention module, a second attention module, and a third attention module. In this case, using the cascade attention module to perform attention analysis on the point feature information of the first point cloud dataset to obtain its attention feature information, as shown in FIG7, may include:
[0137] S701: Use a first attention module to perform attention feature extraction on point feature information of a first point cloud dataset to obtain first intermediate feature information of the first point cloud dataset.
[0138] S702: Use the second attention module to perform attention feature extraction on the first intermediate feature information of the first point cloud data set to obtain second intermediate feature information of the first point cloud data set.
[0139] S703: Use the third attention module to perform attention feature extraction on the second intermediate feature information of the first point cloud dataset to obtain third intermediate feature information of the first point cloud dataset.
[0140] S704: Use the third joint layer to combine the first intermediate feature information, the second intermediate feature information, and the third intermediate feature information to obtain attention feature information of the first point cloud data set.
[0141] It should be noted that, in the embodiments of the present application, the attention feature information, denoted $F_{\text{concat}}$, is obtained by concatenating the first, second, and third intermediate feature information along the feature dimension.
[0142] For example, FIG8 is a schematic diagram of the network structure of a cascade attention module provided in an embodiment of the present application. As shown in FIG8, the cascade attention module includes a first attention module 801, a second attention module 802, a third attention module 803, and a third joint layer 804. The third joint layer 804 combines the first, second, and third intermediate feature information. The resulting attention feature information $F_{\text{concat}}$ is the cascaded feature.
[0143] It should also be noted that, in the embodiments of the present application, each of the at least two attention modules (whether the first, second, or third attention module) can be either a self-attention module or an offset attention module.
[0144] In a specific implementation, if the attention module is a self-attention module, the self-attention module includes a first attention layer, a third multi-layer perception layer, and a first adder. Accordingly, in some embodiments, using the self-attention module to perform attention feature extraction on the point feature information of the first point cloud dataset to obtain first intermediate feature information of the first point cloud dataset may include: using the first attention layer to perform attention feature extraction on the point feature information of the first point cloud dataset to obtain first attention information of the first point cloud dataset; using the third multi-layer perception layer to perform feature mapping on the first attention information of the first point cloud dataset to obtain second attention information of the first point cloud dataset; using the first adder to perform addition operation on the point feature information and the second attention information of the first point cloud dataset to obtain intermediate feature information of the first point cloud dataset.
[0145] It should be noted that, in the embodiments of the present application, if the first attention information is denoted $F_{\text{sa}}$, the second attention information obtained by the third multi-layer perception layer can be expressed as $\gamma(F_{\text{sa}})$. The first adder then adds the point feature information $F_{\text{point-wise}}$ and the second attention information $\gamma(F_{\text{sa}})$. The resulting intermediate feature information, denoted $F_{\text{out}}$, is $F_{\text{out}} = \gamma(F_{\text{sa}}) + F_{\text{point-wise}}$.
[0146] In another specific implementation, if the attention module is an offset attention module, the offset attention module includes a second attention layer, a subtractor, a fourth multi-layer perception layer, and a second adder. Accordingly, in some embodiments, using the offset attention module to perform attention feature extraction on the point feature information of the first point cloud data set to obtain the first intermediate feature information of the first point cloud data set may include: using the second attention layer to perform attention feature extraction on the point feature information of the first point cloud data set to obtain the first attention information of the first point cloud data set; using the subtractor to perform a subtraction operation on the point feature information of the first point cloud data set and the first attention information to obtain the second attention information of the first point cloud data set; using the fourth multi-layer perception layer to perform feature mapping on the second attention information of the first point cloud data set to obtain the third attention information of the first point cloud data set; using the second adder to perform an addition operation on the point feature information of the first point cloud data set and the third attention information to obtain the intermediate feature information of the first point cloud data set.
[0147] It should be noted that, in the embodiments of the present application, if the first attention information is denoted $F_{\text{sa}}$, the subtractor subtracts $F_{\text{sa}}$ from the point feature information $F_{\text{point-wise}}$, giving the second attention information $F_{\text{point-wise}} - F_{\text{sa}}$. The fourth multi-layer perception layer maps this second attention information to the third attention information $\gamma(F_{\text{point-wise}} - F_{\text{sa}})$. The second adder then adds $F_{\text{point-wise}}$ and $\gamma(F_{\text{point-wise}} - F_{\text{sa}})$. The resulting intermediate feature information, denoted $F_{\text{out}}$, is $F_{\text{out}} = \gamma(F_{\text{point-wise}} - F_{\text{sa}}) + F_{\text{point-wise}}$.
[0148] It should also be noted that, in the embodiments of the present application, attention mechanisms are widely used to distinguish the importance of features. Selective sampling chooses a subset of important points from the input first point cloud dataset. An attention mechanism is therefore used here to identify and capture the most relevant and informative points during training. Self-attention (SA) is a commonly used way to implement such a mechanism:
$$(Q, K, V) = F_{\text{point-wise}} \cdot (W_q, W_k, W_v) \tag{6}$$
[0149] where $F_{\text{point-wise}}$ is the input point feature information, Q represents the query vectors (Query), K the key vectors (Key), and V the value vectors (Value); $W_q$, $W_k$ and $W_v$ are shared learnable linear transformations, and $d_k$ is the dimension of the key vectors. The attention feature is
$$F_{\text{sa}} = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V \tag{7}$$
and the SA layer can be written as:
$$F_{\text{out}} = \mathrm{SA}(F_{\text{point-wise}}) = \gamma(F_{\text{sa}}) + F_{\text{point-wise}} \tag{8}$$
[0150] where γ(·) is an MLP operation. However, as the network gets deeper, the SA layer cannot prevent information loss. Offset attention (OA), which takes into account the difference between the attention feature and the input feature, can therefore be used to refine the features instead, as follows:
$$F_{\text{out}} = \mathrm{OA}(F_{\text{point-wise}}) = \gamma(F_{\text{point-wise}} - F_{\text{sa}}) + F_{\text{point-wise}} \tag{9}$$
[0151] For example, FIG9 is a schematic diagram of an application scenario of an attention module provided by an embodiment of the present application. As shown in FIG9, the application scenario may include a transpose operation module (transpose, T), a division operation module (division, DIV), a normalized exponential function module (softmax, S), a multiplication operation module, a switch module, a subtraction operation module, a multi-layer perception module (MLP), and an addition operation module. Passing $F_{\text{point-wise}}$ through the transpose, division, softmax and multiplication operation modules yields $F_{\text{sa}}$. If the switch module is connected to side a, the attention module is a self-attention module, i.e., $F_{\text{out}} = \gamma(F_{\text{sa}}) + F_{\text{point-wise}}$. If the switch module is connected to side b, the attention module is an offset attention module, i.e., $F_{\text{out}} = \gamma(F_{\text{point-wise}} - F_{\text{sa}}) + F_{\text{point-wise}}$.
[0152] It should also be noted that, in the embodiments of the present application, a neural network may fail to retain all relevant information as the network deepens. Important information from earlier layers becomes harder to preserve, which leads to loss of reconstruction and semantic information. A CAM is therefore proposed to combine information from earlier layers with information from later layers. Specifically, the CAM can be composed of three skip-connected OA layers whose outputs are concatenated along the feature dimension, as shown in Figure 10. In Figure 10, the cascade attention module 1001 may include an OA layer 1011, an OA layer 1012, an OA layer 1013, and a third joint layer 1014. OA layer 1011 takes the point feature information $F_{\text{point-wise}}$ as input, OA layer 1012 takes the output of OA layer 1011, and OA layer 1013 takes the output of OA layer 1012. The third joint layer 1014 concatenates the outputs of the three OA layers to obtain the concatenated attention feature information $F_{\text{concat}}$.
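The switchable SA/OA layer can be sketched in plain Python. This is a minimal illustration, not the claimed implementation. It assumes the standard scaled dot-product form of equation (7), which matches the transpose, division, softmax and multiply chain of FIG9. The weight matrices and γ are caller-supplied placeholders. For the residual addition to work, `w_v` must be c×c and γ must preserve the row width. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply an (r x s) nested-list matrix by an (s x t) one."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(a):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def self_attention(f, w_q, w_k, w_v):
    """Equations (6) and (7): returns F_sa for an n x c feature matrix f."""
    q, k, v = matmul(f, w_q), matmul(f, w_k), matmul(f, w_v)
    d_k = len(w_k[0])                                         # key dimension
    k_t = [list(col) for col in zip(*k)]                      # transpose (T)
    logits = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]  # DIV
    return matmul(softmax_rows(logits), v)                    # softmax (S), multiply


def attention_layer(f, w_q, w_k, w_v, gamma, offset):
    """Equation (8) if offset is False (switch side a), equation (9) if True (side b)."""
    f_sa = self_attention(f, w_q, w_k, w_v)
    if offset:
        inner = [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(f, f_sa)]  # subtractor
    else:
        inner = f_sa
    mapped = gamma(inner)                                     # MLP gamma(.)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(mapped, f)]    # adder


f = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
identity = lambda x: x                                        # placeholder for gamma
sa_out = attention_layer(f, eye, eye, eye, identity, offset=False)
oa_out = attention_layer(f, eye, eye, eye, identity, offset=True)
```

With γ set to identity, SA returns $F_{\text{sa}} + F_{\text{point-wise}}$ and OA returns $2F_{\text{point-wise}} - F_{\text{sa}}$, so their sum is exactly $3F_{\text{point-wise}}$. This is a convenient sanity check on the switch logic.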
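The cascade wiring of Figure 10 can be sketched independently of how each OA layer is computed. This is a minimal illustration: each layer consumes the previous layer's output, and every output is kept for the final concatenation. `toy_layer` is a hypothetical stand-in for an OA layer, used only to show the data flow and the resulting width.

```python
def cascade_attention(f_point_wise, layers):
    """Chain the layers and concatenate all of their outputs per point.

    f_point_wise: n x c nested list; layers: callables mapping n x c_in -> n x c_i.
    Returns F_concat with n rows of length c_1 + c_2 + ... (third joint layer).
    """
    outputs = []
    x = f_point_wise
    for layer in layers:
        x = layer(x)              # layer i consumes the output of layer i-1
        outputs.append(x)
    # Concatenate along the feature dimension, keeping the layer order.
    return [[v for out in outputs for v in out[i]] for i in range(len(f_point_wise))]


def toy_layer(x):
    """Hypothetical stand-in for one OA layer (doubles every feature)."""
    return [[2.0 * v for v in row] for row in x]


f = [[1.0, 2.0], [0.5, -1.0]]
f_concat = cascade_attention(f, [toy_layer, toy_layer, toy_layer])
print(f_concat[0])  # [2.0, 4.0, 4.0, 8.0, 8.0, 16.0]
```

Because the first layer's output is carried directly into the concatenation, early-layer information reaches the contribution marking module even if later layers lose it. This is the motivation the text gives for the CAM.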
[0153] In some embodiments, the contribution marking module may include a fifth multi-layer perception layer, a fully connected layer, and a selection module. Accordingly, the contribution marking module is used to perform a contribution evaluation on the attention feature information of the first point cloud dataset to determine the second point cloud dataset, as shown in FIG11 . The method may include:
[0154] S1101: Use the fifth multi-layer perception layer to perform feature mapping on the attention feature information of the first point cloud dataset to obtain fourth intermediate feature information of the first point cloud dataset.
[0155] S1102: Perform full connection processing on the fourth intermediate feature information of the first point cloud dataset using a fully connected layer to determine a contribution score of the point in the first point cloud dataset.
[0156] S1103: Perform selection processing on the first point cloud dataset using the selection module and the contribution scores, determine the index numbers of the m points to be selected, and determine the second point cloud dataset according to these m index numbers.
[0157] It should be noted that, in the embodiment of the present application, m is a positive integer. In addition, the fully connected layer can be composed of linear functions, which can be represented by FC(·).
[0158] It should also be noted that, in the embodiments of the present application, the selection module may use top-k selection to choose sampling points based on the contribution score of each point, which ensures that no duplicate points exist in the second point cloud dataset. In some embodiments, using the selection module and the contribution scores to perform selection processing on the first point cloud dataset and determine the index numbers of the m points to be selected may include: performing a top-k query on the first point cloud dataset based on the contribution scores to determine the index numbers of the m points with the highest contribution scores, thereby obtaining the index numbers of the m points to be selected.
[0159] Accordingly, determining the second point cloud dataset based on the index numbers of the m points to be selected may include: determining candidate points corresponding to the index numbers of the m points to be selected based on the first point cloud dataset, and determining the candidate points corresponding to the index numbers of the m points to be selected as the second point cloud dataset.
[0160] That is, in the embodiments of the present application, the concatenated features are mapped to point-wise contribution marks to evaluate the contribution of each point. The contribution scores $S_{\text{con}}$ are obtained from $F_{\text{concat}}$ through an MLP and a fully connected layer, as follows:
$$S_{\text{con}} = \mathrm{FC}(\rho(F_{\text{concat}})) \tag{10}$$
[0161] where ρ(·) represents the MLP operation and FC(·) represents the fully connected layer. Each element of $S_{\text{con}}$ quantifies the importance of a point relative to the task to be processed: the larger the contribution score, the more important the corresponding point. The m elements with the highest scores in $S_{\text{con}}$ are then selected, and the point index number Idx of each selected element is obtained. Finally, the sampling points corresponding to the selected index numbers Idx are taken from the input first point cloud dataset $P_{\text{in}}$, yielding the sampled point cloud, i.e., the second point cloud dataset $P_{\text{sp}}$:
$$\mathrm{Idx} = \mathrm{top}(S_{\text{con}}, m) \tag{11}$$
$$P_{\text{sp}} = \mathrm{reference}(P_{\text{in}}, \mathrm{Idx}) \tag{12}$$
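Equation (10) can be sketched as a per-point scoring head. This is a minimal illustration, not the claimed architecture. It assumes ρ is a single linear layer followed by ReLU and that FC maps each point's feature vector to one scalar. The weights shown are hypothetical placeholders, not trained values.

```python
def linear(x, w, b):
    """Fully connected map: vector x (length d_in) times w (d_in x d_out) plus b."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]


def relu(v):
    return [max(0.0, t) for t in v]


def contribution_scores(f_concat, w_rho, b_rho, w_fc, b_fc):
    """Equation (10) per point: S_con[i] = FC(rho(F_concat[i])), one scalar per point.

    rho is assumed to be one linear layer plus ReLU; FC has a single output unit.
    """
    return [linear(relu(linear(row, w_rho, b_rho)), w_fc, b_fc)[0] for row in f_concat]


f_concat = [[1.0, -1.0], [2.0, 0.5]]           # n = 2 points, 2 features each
s_con = contribution_scores(
    f_concat,
    w_rho=[[1.0, 0.0], [0.0, 1.0]], b_rho=[0.0, 0.0],   # hypothetical rho weights
    w_fc=[[1.0], [1.0]], b_fc=[0.0],                    # hypothetical FC weights
)
print(s_con)  # [1.0, 2.5]
```

The same weights are applied to every point. This keeps the head independent of point order: permuting the rows of `f_concat` simply permutes `s_con`.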
[0162] For example, still taking FIG10 as an example, the contribution marking module 1002 may include a fifth MLP layer 1021, a fully connected layer 1022, and a selection module 1023. As shown in FIG10, the fifth MLP layer 1021 maps $F_{\text{concat}}$ to $\rho(F_{\text{concat}})$. The fully connected layer 1022 then produces the contribution scores $S_{\text{con}}$, such as 3.4, 0.8, 4.1, 2.2, 2.6 and 1.2. The m highest scores, such as 3.4, 4.1, 2.2 and 2.6, are selected, and their point index numbers Idx, i.e., 0, 2, 3 and 4, are determined. Using these index numbers Idx and the input first point cloud dataset $P_{\text{in}}$, the m selected sampling points are obtained. They form the sampled point cloud, i.e., the second point cloud dataset $P_{\text{sp}} \in \mathbb{R}^{m\times 3}$.
[0163] An embodiment of the present application provides a point cloud data processing method, which first determines a first point cloud dataset, then processes the first point cloud dataset using a data processing network to determine a second point cloud dataset. Specifically, the method comprises: determining the contribution scores of the points in the first point cloud dataset; and determining the second point cloud dataset based on the contribution scores. Here, the first point cloud dataset is the input data of the data processing network, and the second point cloud dataset is the output data of the data processing network, and the number of points contained in the second point cloud dataset is less than the number of points in the first point cloud dataset. In other words, downsampling the first point cloud dataset based on the contribution score to obtain a second point cloud dataset with fewer points not only reduces computational complexity, but also, because the contribution score can characterize the importance of the points in the first point cloud dataset relative to the task to be processed, that is, the downsampling method is task-oriented, the data processing network can also strike a balance between preserving the geometric properties of the input point cloud and optimizing specific downstream tasks, thereby improving the quality of the sampled point cloud and, in turn, the performance of downstream tasks.
[0164] Another embodiment of the present application builds on the point cloud data processing method of the foregoing embodiments. FIG12 is a flow chart of a point cloud data processing method provided by this embodiment. As shown in FIG12, the method may include:
[0165] S1201: Determine a first point cloud dataset.
[0166] S1202: Process the first point cloud dataset using a data processing network to determine a second point cloud dataset.
[0167] S1203: Input the second point cloud data set into the preset task network, and output the execution result corresponding to the task to be processed through the preset task network.
[0168] It should be noted that in an embodiment of the present application, for a data processing network, the first point cloud dataset may be the input data of the data processing network, the second point cloud dataset may be the output data of the data processing network, and the number of points contained in the second point cloud dataset is less than the number of points in the first point cloud dataset.
[0169] It should also be noted that in this embodiment of the present application, the second point cloud dataset is input into the preset task network to obtain the execution result corresponding to the pending task. The pending task here can be downstream tasks such as point cloud classification, point cloud registration, sampling-based point cloud compression, and point cloud surface reconstruction, which are not specifically limited here.
[0170] In some embodiments, the method may include: determining at least two groups of training samples, each group of training samples including a point cloud data sample set and a task sample; performing model training on the initial joint model based on the at least two groups of training samples, and determining the trained model as the target joint model; wherein the target joint model includes a data processing network and a preset task network.
[0171] It should be noted that, in the embodiments of the present application, the data processing network and the preset task network can be trained jointly. When the joint loss function defined below meets the convergence condition, the trained model is determined as the target joint model. At this point, both the data processing network and the preset task network have been trained.
[0172] It should also be noted that in the embodiments of this application, a joint loss function is also defined. In some embodiments, the method may further include: determining a task loss function for the second point cloud dataset for the task to be processed, and determining a sampling loss function between the first point cloud dataset and the second point cloud dataset; and determining a joint loss function corresponding to the target joint model based on the task loss function and the sampling loss function.
[0173] In a specific embodiment, determining the joint loss function corresponding to the target joint model based on the task loss function and the sampling loss function can include: determining a first factor; determining a weighted sampling loss function based on the first factor and the sampling loss function; and determining the joint loss function based on the task loss function and the weighted sampling loss function.
[0174] It should be noted that, in the embodiment of the present application, the first factor can be represented by α, and the task loss function can be represented by L task (P sp ) indicates that the sampling loss function can be expressed as L emd (P in ,P sp ) is represented. Thus, for the joint loss function, it is defined as follows: L total =L task (P sp )+αL emd (P in ,P sp ) (13)
[0175] Here, $L_{\text{task}}(\cdot)$ encourages the model to learn a downsampled point set optimized for a specific downstream task, so different downstream tasks use different $L_{\text{task}}(\cdot)$; $L_{\text{emd}}(\cdot)$ is the Earth Mover's Distance (EMD) loss, which preserves the geometric structure of the downsampled point cloud; and α is a weight factor that balances the two terms. Specifically, $L_{\text{emd}}(\cdot)$ minimizes the distance between $P_{in}$ and $P_{sp}$, ensuring that the two point sets remain similar:

$$L_{\text{emd}}(P_{in}, P_{sp}) = \min_{\phi:\,P_{in}\to P_{sp}} \sum_{x\in P_{in}} \lVert x - \phi(x) \rVert_2 \tag{14}$$

[0176] where φ is a bijective function; the objective is to find the bijection φ that minimizes the distance between corresponding points of $P_{in}$ and $P_{sp}$.
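The following minimal Python sketch illustrates Eq. (13) and Eq. (14) on toy data. It assumes two point sets of equal size, since a bijection only exists in that case, and finds the optimal bijection by brute force over all permutations; this is exact but only feasible for a handful of points, and practical implementations typically use an assignment solver or an approximate EMD instead. The names `emd_loss` and `joint_loss` are hypothetical.

```python
import itertools

import numpy as np


def emd_loss(p_a: np.ndarray, p_b: np.ndarray) -> float:
    """Exact EMD in the sense of Eq. (14): minimum, over bijections phi,
    of the summed Euclidean distances ||x - phi(x)||_2.

    Brute force over all permutations, so only usable for tiny sets."""
    if p_a.shape != p_b.shape:
        raise ValueError("a bijection requires point sets of equal size")
    n = p_a.shape[0]
    # dist[i, j] = ||p_a[i] - p_b[j]||_2
    dist = np.linalg.norm(p_a[:, None, :] - p_b[None, :, :], axis=-1)
    return float(min(
        sum(dist[i, perm[i]] for i in range(n))
        for perm in itertools.permutations(range(n))
    ))


def joint_loss(task_loss: float, sampling_loss: float, alpha: float = 1.0) -> float:
    """Eq. (13): L_total = L_task + alpha * L_emd."""
    return task_loss + alpha * sampling_loss


a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
# Same points, shuffled and shifted by 0.5 along z.
b = a[[2, 0, 1]] + np.array([0.0, 0.0, 0.5])
print(emd_loss(a, b))                   # 1.5
print(joint_loss(0.5, emd_loss(a, b)))  # 2.0
```

The optimal bijection matches each point to its shifted copy, so the loss is 3 × 0.5 regardless of the shuffle; this is what distinguishes EMD from a naive index-by-index distance.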
[0177] It should also be noted that, in the embodiments of the present application, the sampling loss function may be the EMD loss function, or a Chamfer Distance (CD) loss function may be used instead; this is not specifically limited here.
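Because $P_{sp}$ has fewer points than $P_{in}$, a Chamfer Distance is convenient: it does not require a one-to-one correspondence. The sketch below uses one common CD variant (mean squared nearest-neighbour distance in both directions). The source does not say which variant is used, so this choice and the name `chamfer_distance` are assumptions.

```python
import numpy as np


def chamfer_distance(p_a: np.ndarray, p_b: np.ndarray) -> float:
    """Symmetric Chamfer Distance: mean squared distance from each point to
    its nearest neighbour in the other set, summed over both directions.
    Works for sets of different sizes (e.g. n input points, m samples)."""
    d2 = np.sum((p_a[:, None, :] - p_b[None, :, :]) ** 2, axis=-1)  # (n_a, n_b)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


p_in = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
p_sp = p_in[[0, 3]]  # a 2-point subset of the 4 input points
print(chamfer_distance(p_in, p_sp))  # 0.5
```

Every sampled point lies on an input point, so the $P_{sp} \to P_{in}$ term is zero; the $P_{in} \to P_{sp}$ term is what penalizes input points that are far from every sample, i.e. poor coverage.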
[0178] For example, FIG. 13 is a schematic diagram of an application framework of a data processing network provided by an embodiment of the present application. As shown in FIG. 13, the application framework includes a data processing network 1301 and a preset task network 1302. The data processing network 1301 may include a feature extraction module, a cascade attention module, and a contribution marking module. For an input first point cloud dataset $P_{in} \in \mathbb{R}^{n\times 3}$, the feature extraction module first captures local and global features; the cascade attention module then emphasizes salient features and suppresses less important ones; and the contribution marking module maps the features to point-wise tags, namely contribution scores. The m elements with the highest scores are selected (for example, 3.4, 4.1, 2.2 and 2.6), and the indices Idx of these elements are obtained (for example, 0, 2, 3 and 4). Using these indices, the corresponding m sampling points are selected from the input first point cloud dataset $P_{in}$, which yields the sampled second point cloud dataset $P_{sp} \in \mathbb{R}^{m\times 3}$. Finally, $P_{sp}$ is input into the preset task network 1302, which outputs the execution result for the task to be processed.
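A minimal sketch of the selection step in FIG. 13, assuming the contribution scores are already computed. Sorting the selected indices only reproduces the ascending Idx of the example and is not required by the source. `select_top_m` is a hypothetical name, and the scores at indices 1 and 5 are hypothetical filler values.

```python
import numpy as np


def select_top_m(p_in: np.ndarray, scores: np.ndarray, m: int):
    """Keep the m points with the highest contribution scores."""
    idx = np.argsort(-scores, kind="stable")[:m]  # m largest scores, distinct indices
    idx = np.sort(idx)                            # restore the original point order
    return idx, p_in[idx]


# Scores 3.4, 4.1, 2.2, 2.6 at indices 0, 2, 3, 4 follow the example above;
# the scores at indices 1 and 5 are hypothetical filler values.
scores = np.array([3.4, 1.0, 4.1, 2.2, 2.6, 0.5])
p_in = np.arange(18, dtype=float).reshape(6, 3)  # toy P_in with n = 6 points
idx, p_sp = select_top_m(p_in, scores, m=4)
print(idx.tolist())  # [0, 2, 3, 4]
print(p_sp.shape)    # (4, 3)
```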
[0179] The following describes the effectiveness of the proposed method on four common downstream tasks: point cloud classification, point cloud registration, sampling-based point cloud compression, and point cloud surface reconstruction. Throughout the training phase, random rotation and scaling are applied to augment the input point clouds, which improves the robustness of the network.
[0180] (1) Point cloud classification.
[0181] The ModelNet40 dataset is used to evaluate the proposed data processing network combined with a classification network (PointNet) on the classification task. This dataset consists of 12,311 computer-aided design (CAD) models covering 40 categories of man-made objects. For fairness, the default training/test split is kept: 9,843 point clouds for training and 2,468 point clouds for testing. Each input point cloud contains 1,024 points. The proposed data processing network and PointNet are trained jointly, and the task loss is the cross-entropy between the predicted labels and the ground-truth labels. CMS-Net is trained with a batch size of 8 for over 200 epochs with a learning rate of 0.001, and with α = 1, c = 64, and k = 32. Classification accuracy is evaluated over a range of sampling ratios, where the sampling ratio is n/m, the number of points n in the original point cloud divided by the number of points m in the downsampled point cloud. The results are compared with the following state-of-the-art sampling methods: S-Net, Sample-Net, CO-Net, DA-Net, PST-Net, MOPS-Net, and SS-Net, and farthest point sampling (FPS) is used as a baseline. Except for SS-Net, all methods use the same PointNet. To adapt SS-Net to this evaluation, its network structure was modified by removing its fully connected layers and feeding the output point cloud of its direct sampling module into PointNet for classification.
[0182] For example, Table 1 shows the classification accuracy achieved by different methods at different sampling ratios. The results show that the proposed CMS-Net significantly outperforms other methods at different sampling ratios. Of particular note, the advantages of the proposed CMS-Net become increasingly apparent as the sampling ratio increases. This phenomenon can be attributed to the fact that information loss in the downsampled point cloud increases with increasing sampling ratio. In this case, CMS-Net's ability to retain points with key features enables it to maintain superior classification performance.
[0183] Table 1
[0184] Furthermore, Figure 14 is a schematic diagram of a visual comparison between the data processing network provided in an embodiment of the present application and other sampling methods. Bold black dots represent sampled points, and the sampling ratio is 8. Figure 14 visualizes the point clouds produced by FPS, S-Net, Sample-Net, and the CMS-Net proposed in an embodiment of the present application. The visual comparison shows that, compared with the other learning-based sampling methods, CMS-Net captures intricate local details in complex point clouds while retaining the geometric shape of smooth point clouds. It is also worth noting that although FPS produces a more uniform sampling pattern, its classification performance is unsatisfactory at high sampling ratios.
[0185] Figure 15 is a schematic diagram of the contribution score distributions under different loss functions provided in an embodiment of the present application. As shown in Figure 15, heat maps depict the contribution score of each point during training under different loss functions. The black dots in (a) show that, under the cross-entropy loss, the point cloud exhibits more prominent hotspots in edge regions. In contrast, the black dots in (b) show that, under the EMD loss, high scores appear around the main body of the point cloud. This distribution arises because edge points often carry key semantic features that affect classification, whereas points on the main body carry reconstruction features, which are crucial for preserving the overall content of the point cloud. When the cross-entropy and EMD losses are combined, the network highlights points that have both semantic and reconstruction properties, as shown by the black dots in (c). Figure 15 thus shows that the network can simultaneously capture and balance semantic and reconstruction information in point cloud data.
[0186] (2) Point cloud registration.
[0187] For the point cloud registration task, the network is tested with the Iterative Closest Point (ICP) algorithm in two main steps: first, the point cloud is sampled with the proposed CMS-Net; then, the ICP algorithm aligns the rotated point cloud with the sampled point cloud to obtain the registered point cloud. To evaluate the network, the rotation error between the registered point cloud and the sampled point cloud is measured. The network is trained and tested on the ModelNet40 dataset with a batch size of 4 for a total of 300 epochs and a learning rate of 0.001. The values α = 1, c = 64, and k = 32 are retained during training.
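The source does not define how the rotation error is computed. One common choice, assumed here, is the geodesic angle between an estimated rotation $R_{est}$ and a reference rotation $R_{gt}$, $\theta = \arccos\big((\mathrm{tr}(R_{est}^{T} R_{gt}) - 1)/2\big)$. The sketch below implements that formula; the function names are hypothetical.

```python
import numpy as np


def rotation_error_deg(r_est: np.ndarray, r_gt: np.ndarray) -> float:
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    cos_angle = (np.trace(r_est.T @ r_gt) - 1.0) / 2.0
    # Clip to guard against round-off pushing the value slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def rot_z(deg: float) -> np.ndarray:
    """Rotation about the z axis by `deg` degrees."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


print(round(rotation_error_deg(rot_z(10.0), rot_z(40.0)), 6))  # 30.0
```

Averaging this angle over the test set would give a single number per sampling ratio, which is the kind of quantity plotted in Figure 16.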
[0188] Figure 16 is a schematic diagram comparing the alignment results of the data processing network provided in an embodiment of the present application and other sampling methods. As shown in Figure 16, the average rotation error on the test point clouds is reported, where a lower rotation error is better. The horizontal axis represents the sampling ratio (Sample Ratio), and the vertical axis represents the rotation error (Rotation error). Figure 16 shows that the CMS-Net of the embodiment of the present application outperforms the other methods at all tested sampling ratios. This can be attributed to the method's ability to strategically select the most representative and important points, which helps the ICP algorithm find the optimal alignment.
[0189] (3) Point cloud compression.
[0190] For this task, the method of the embodiment of the present application follows a multi-step process. First, the input point cloud is sampled with CMS-Net. Next, the downsampled point cloud is encoded with G-PCC, the geometry-based point cloud compression standard developed by the Moving Picture Experts Group (MPEG). Finally, the reconstructed downsampled point cloud is upsampled with a state-of-the-art upsampling network (PU-Refiner). The performance of the network is evaluated by the distortion between the original point cloud and its upsampled counterpart. To demonstrate the effectiveness of the method, it was tested on the PU1K dataset, which was not seen during training. Because of the limitations of the G-PCC architecture, CMS-Net and PU-Refiner are trained independently, and no additional task loss is introduced when training CMS-Net. Training uses a batch size of 4 for a total of 400 epochs with a learning rate of 0.001, and the parameter values α = 1, c = 64, and k = 32 remain unchanged throughout.
[0191] For ease of processing, during the compression stage, the point cloud is converted to a voxelized representation in a cubic space with a side length of 512 units, resulting in integer geometric coordinates. The initial point cloud contains 2048 points, which is subsequently downsampled to 512 points. After downsampling, the resulting point cloud is encoded and decoded using G-PCC (version 19). Note that due to the lossy compression scheme, the exact number of points in the decoded point cloud may vary. Finally, the decoded point cloud is upsampled to 4 times the number of points in the decoded cloud. Given that point clouds downsampled using other learning-based techniques often do not accurately preserve geometry and exhibit considerable distortion compared to the original input point cloud, the comparison here focuses solely on FPS, which provides the most favorable visualization of the downsampling.
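A minimal sketch of the voxelization step, assuming min-max normalization with a single scale factor for all three axes so the shape is not distorted. The source only states that coordinates become integers in a cube of side 512; the normalization scheme and the name `voxelize` are assumptions.

```python
import numpy as np


def voxelize(points: np.ndarray, side: int = 512) -> np.ndarray:
    """Map float coordinates to integer voxel coordinates in [0, side - 1]^3.

    A single scale factor is used for all axes so the shape is not distorted.
    Several points may land in the same voxel; they are kept here."""
    mins = points.min(axis=0)
    extent = float((points.max(axis=0) - mins).max())  # longest bounding-box edge
    scale = (side - 1) / extent if extent > 0 else 1.0
    return np.round((points - mins) * scale).astype(np.int64)


rng = np.random.default_rng(0)
cloud = rng.random((2048, 3))  # toy point cloud with 2048 points, as in the experiment
vox = voxelize(cloud)
print(vox.shape, vox.dtype)  # (2048, 3) int64
```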
[0192] Figure 17 is a schematic diagram comparing the rate-distortion curves of the data processing network provided in an embodiment of the present application and other sampling methods. As shown in Figure 17, the rate-distortion performance is averaged over the test point clouds. The horizontal axis represents bits per point (bpp), and the vertical axis represents the peak signal-to-noise ratio (PSNR).
[0193] Figure 18 is a schematic diagram of a first visual comparison of point cloud compression between the data processing network provided in an embodiment of the present application and other sampling methods. As shown in Figure 18, the input point cloud is downsampled by different sampling methods (Sample-Net, FPS, and CMS-Net), encoded and decoded with G-PCC, and finally reconstructed by the upsampling network (PU-Refiner). Figure 19 is a schematic diagram of a second visual comparison of the same kind; it follows the same pipeline as Figure 18.
[0194] In addition, Table 2 compares the lossy compression performance of different G-PCC-based compression methods; that is, it evaluates the proposed network and FPS within the proposed sampling-based G-PCC compression framework, with the G-PCC standard itself serving as the baseline. Both sampling methods improve on the baseline when integrated into the proposed framework, especially at low bit rates. Compared with downsampling by FPS, the compression method that uses CMS-Net for downsampling achieves a larger geometric PSNR improvement and better computational efficiency. This advantage can be attributed to the ability of the proposed downsampling mechanism to preserve the complex details and shape features of the original point cloud.
[0195] Table 2
[0196] (4) Point cloud surface reconstruction.
[0197] In the field of surface reconstruction, the methods here may include sampling the point cloud using CMS-Net and then reconstructing the sampled point cloud into a mesh using the screened Poisson surface reconstruction method. The ModelNet40 dataset was used to evaluate and train this method. In the training phase, we selected a batch size of 4 and trained for more than 300 epochs with a fixed learning rate of 0.001. The parameters α = 1, c = 64, and k = 32 remained consistent throughout the training process.
[0198] FIG. 20 is a schematic diagram comparing surface reconstruction using the data processing network provided by an embodiment of the present application and other sampling methods. As shown in FIG. 20, the ground-truth point cloud is downsampled by Sample-Net, FPS, and CMS-Net, and each result is used for surface reconstruction to analyze how well each downsampling method preserves shape. Comparing the method of the embodiment with FPS in FIG. 20 shows that both CMS-Net and FPS retain the structural features of the original point cloud well. In addition, because CMS-Net retains points carrying both reconstruction and semantic information, the mesh reconstructed from its output shows more intricate details than the mesh reconstructed from FPS samples.
[0199] The embodiment of the present application provides a method for processing point cloud data. Through the above embodiments, the specific implementation of the foregoing embodiments is elaborated in detail. It can be seen therefrom that according to the technical solutions of the foregoing embodiments, an end-to-end trainable point cloud sampling network, namely CMS-Net, is proposed here. Points are selected according to their contribution scores to downstream tasks to form a sampled point cloud, which can achieve a balance between retaining the geometric attributes of the input point cloud and optimizing specific downstream tasks. At the same time, an EMD loss function is introduced here, which can effectively adjust the shape of the sampled point cloud, thereby improving the quality of the sampled point cloud and further improving the performance of downstream tasks.
[0200] In another embodiment of the present application, based on the method for processing point cloud data described in the foregoing embodiment, given an input point cloud of n points and a specific downstream task network, the objective is to design a learning-based sampling network whose input is a set of n points and whose output is a set of m points, with m < n; that is, the output sampled point cloud $P_{sp}$ has m points. As mentioned above, learning-based sampling is task-oriented, so the proposed data processing network should balance retaining the geometric attributes of the input point cloud against optimizing for a specific downstream task.
[0201] The application framework of the data processing network is shown in Figure 13. A feature extraction module first captures local and global features. A cascade attention module then emphasizes salient features and suppresses less important ones. A contribution tag module follows the cascade attention module and maps the features to point-wise tags, which represent the importance of each point with respect to the downstream task and the loss function. The tags are sorted, and the m points with the highest scores form the sampled point cloud $P_{sp}$. Finally, $P_{sp}$ is fed into the preset task network for the specific downstream task. Because m can be set arbitrarily in this process, the proposed network can flexibly adapt to various sampling ratios.
[0202] (1) Feature extraction module.
[0203] As shown in Figure 6, the feature extraction module obtains point-wise feature maps of the input point cloud. First, a grouping layer extracts local features of the input point cloud. The grouping operation for a point p is defined as:

$$\mathrm{Group}(p) = \{p_1 - p,\ p_2 - p,\ \ldots,\ p_k - p\} \tag{15}$$

[0204] where p is a point in $P_{in}$ and $p_i$, i = 1, …, k, are the k neighbor nodes of p. The grouping layer applied to the input point cloud is then defined as:

$$F_{group} = \mathrm{Group}(P_{in}) \tag{16}$$

[0205] In the embodiments of the present application, the feature extraction module is intended to extract semantic local features and global reconstruction features simultaneously, so that the network can adapt to various downstream tasks. To capture local details of the input point cloud, the grouping layer uses the relative coordinates of the neighbor nodes; to preserve global features, the input point cloud is copied k times and concatenated with $F_{group}$. A multi-layer perceptron (MLP) σ(·) then maps the features into a high-dimensional feature space. This process can be written as:

$$F_{combine} = \sigma\big(\mathrm{concat}(F_{group},\ \mathrm{Repeat}_k(P_{in}))\big) \tag{17}$$

where $\mathrm{Repeat}_k(\cdot)$ denotes copying the input point cloud k times.
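The grouping step of Eqs. (15)-(17) can be sketched as follows. This is an illustration under assumptions: neighbours come from a brute-force k-nearest-neighbour search in which each point counts as its own nearest neighbour, and σ is reduced to a single linear layer with ReLU. The function names are hypothetical.

```python
import numpy as np


def knn_indices(p: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbours of every point (brute force).
    The point itself is included, since its distance is 0."""
    d2 = np.sum((p[:, None, :] - p[None, :, :]) ** 2, axis=-1)  # (n, n)
    return np.argsort(d2, axis=1)[:, :k]                        # (n, k)


def group_and_combine(p_in, k, w, b):
    """Eqs. (15)-(17): relative neighbour coordinates, concatenated with
    P_in copied k times, then mapped by a one-layer MLP with ReLU (sigma)."""
    idx = knn_indices(p_in, k)
    f_group = p_in[idx] - p_in[:, None, :]                # p_i - p, shape (n, k, 3)
    f_repeat = np.repeat(p_in[:, None, :], k, axis=1)     # P_in copied k times, (n, k, 3)
    f_cat = np.concatenate([f_group, f_repeat], axis=-1)  # (n, k, 6)
    return np.maximum(f_cat @ w + b, 0.0)                 # F_combine, (n, k, c)


rng = np.random.default_rng(0)
n, k, c = 256, 32, 64  # k = 32 and c = 64 follow the training settings in the text
p_in = rng.random((n, 3))
w, b = 0.1 * rng.standard_normal((6, c)), np.zeros(c)
f_combine = group_and_combine(p_in, k, w, b)
print(f_combine.shape)  # (256, 32, 64)
```

The relative coordinates make the local descriptor translation-invariant, while the repeated absolute coordinates keep each point's global position available to the MLP.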
[0206] A pooling operation is then introduced to reduce computational cost and make the network permutation-equivariant. The existing max pooling layer is good at finding important features but discards the others, while the average pooling layer is good at aggregating features but cannot effectively extract local details. A spatial pooling layer is therefore proposed that combines the advantages of max pooling and average pooling:

$$F_{point\text{-}wise} = \sigma\big(\mathrm{concat}(\max(F_{combine}),\ \mathrm{avg}(F_{combine}))\big) \tag{18}$$

[0207] where max(·) denotes the max pooling layer and avg(·) denotes the average pooling layer.
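A minimal sketch of the spatial pooling layer of Eq. (18), assuming pooling runs over the k neighbours of each point and that σ is a single linear layer with ReLU. The name `spatial_pooling` is hypothetical. The example also checks that reordering the neighbours leaves the output unchanged, which is the ordering robustness the pooling is meant to provide.

```python
import numpy as np


def spatial_pooling(f_combine: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Eq. (18): concatenate max- and average-pooled neighbour features,
    then map them with a one-layer MLP + ReLU standing in for sigma."""
    f_max = f_combine.max(axis=1)    # strongest response over the k neighbours, (n, c)
    f_avg = f_combine.mean(axis=1)   # aggregated response over the k neighbours, (n, c)
    f_cat = np.concatenate([f_max, f_avg], axis=-1)  # (n, 2c)
    return np.maximum(f_cat @ w + b, 0.0)            # F_point-wise, (n, c_out)


rng = np.random.default_rng(0)
f_combine = rng.random((256, 32, 64))  # (n, k, c), e.g. the output of the grouping step
w, b = 0.1 * rng.standard_normal((128, 64)), np.zeros(64)
f_pw = spatial_pooling(f_combine, w, b)
# Reordering the k neighbours does not change the pooled result.
perm = rng.permutation(32)
print(f_pw.shape, np.allclose(spatial_pooling(f_combine[:, perm], w, b), f_pw))  # (256, 64) True
```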
[0208] (2) Cascaded attention module.
[0209] Attention mechanisms are widely used to distinguish the importance of features. Since selective sampling selects a subset of important points from the input point cloud, an attention mechanism is used here to identify and capture the most relevant and informative points during training. Self-attention (SA) is a commonly used way to implement it:

$$(Q, K, V) = F_{in}\cdot(W_q, W_k, W_v) \tag{19}$$

$$F_{sa} = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{20}$$

[0210] where $F_{in}$ denotes the input features; Q, K and V denote the query, key and value vectors; $W_q$, $W_k$ and $W_v$ are shared learnable linear transformations; and $d_k$ is the dimension of the key vector K. The self-attention layer can then be written as:

$$F_{out} = \mathrm{SA}(F_{in}) = \gamma(F_{sa}) + F_{in} \tag{21}$$

[0211] where γ(·) is an MLP operation. However, as the network becomes deeper, the SA layer cannot prevent information loss. Considering the difference between the attention features and the input features, offset attention (OA) can be used instead to refine the features:

$$F_{out} = \mathrm{OA}(F_{in}) = \gamma(F_{in} - F_{sa}) + F_{in} \tag{22}$$
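The SA and OA layers of Eqs. (19)-(22) can be sketched in NumPy as below. The sketch assumes standard row-wise softmax attention, a value projection that keeps width c so that $F_{in} - F_{sa}$ is defined, and γ reduced to one linear layer with ReLU; other formulations of offset attention normalize the attention map differently. All function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(f_in, w_q, w_k, w_v):
    """Eqs. (19)-(20): scaled dot-product attention over all n points."""
    q, k, v = f_in @ w_q, f_in @ w_k, f_in @ w_v
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v  # F_sa, (n, c)


def gamma(x, w_g, b_g):
    """Stand-in for the MLP gamma: one linear layer + ReLU."""
    return np.maximum(x @ w_g + b_g, 0.0)


def sa_layer(f_in, w_q, w_k, w_v, w_g, b_g):
    return gamma(attention(f_in, w_q, w_k, w_v), w_g, b_g) + f_in  # Eq. (21)


def oa_layer(f_in, w_q, w_k, w_v, w_g, b_g):
    f_sa = attention(f_in, w_q, w_k, w_v)
    return gamma(f_in - f_sa, w_g, b_g) + f_in                    # Eq. (22)


rng = np.random.default_rng(0)
n, c, d_k = 256, 64, 16
params = (
    0.1 * rng.standard_normal((c, d_k)),  # W_q
    0.1 * rng.standard_normal((c, d_k)),  # W_k
    0.1 * rng.standard_normal((c, c)),    # W_v keeps width c so F_in - F_sa is defined
    0.1 * rng.standard_normal((c, c)),    # gamma weights
    np.zeros(c),                          # gamma bias
)
f_in = rng.random((n, c))
print(sa_layer(f_in, *params).shape, oa_layer(f_in, *params).shape)  # (256, 64) (256, 64)
```

The only difference between the two layers is the argument passed to γ: SA feeds the attention output itself, while OA feeds the offset between the input and the attention output, which is the modification Eq. (22) describes.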
[0212] As shown in Figure 9, when the switch module is connected to side a, the self-attention module is used; when it is connected to side b, the offset attention module is used. As layers deepen, a neural network may fail to retain all relevant information, and it becomes harder to preserve important information from earlier layers, which causes loss of reconstruction and semantic information. A cascaded attention module (CAM) is therefore proposed to combine the information of earlier layers with that of later layers. Specifically, the CAM can consist of three skip-connected OA layers whose outputs are concatenated along the feature dimension:

$$F_{i} = \mathrm{OA}(F_{i-1}),\quad i = 1, 2, 3,\quad F_0 = F_{point\text{-}wise} \tag{23}$$

$$F_{concat} = \mathrm{concat}(F_1, F_2, F_3) \tag{24}$$

[0213] where $F_i$ represents the output feature of the i-th OA layer and $F_{concat}$ is the concatenated feature.
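A minimal sketch of the cascaded attention module of Eqs. (23)-(24), assuming a compact OA layer (row-wise softmax attention, γ as one linear layer with ReLU and no bias) and the three-layer cascade of the embodiment. Names and weight shapes are hypothetical.

```python
import numpy as np


def offset_attention(f_in, w_q, w_k, w_v, w_g):
    """Compact OA layer (Eq. (22)) with row-wise softmax attention and a
    one-layer ReLU MLP standing in for gamma (bias omitted for brevity)."""
    q, k, v = f_in @ w_q, f_in @ w_k, f_in @ w_v
    s = q @ k.T / np.sqrt(k.shape[-1])
    a = np.exp(s - s.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    f_sa = a @ v
    return np.maximum((f_in - f_sa) @ w_g, 0.0) + f_in


def cascaded_attention(f_pw, layer_params):
    """Eqs. (23)-(24): each OA layer refines the previous layer's output,
    and all intermediate outputs are concatenated along the feature axis."""
    outputs, f = [], f_pw
    for p in layer_params:
        f = offset_attention(f, *p)
        outputs.append(f)
    return np.concatenate(outputs, axis=-1)


rng = np.random.default_rng(0)
n, c, d_k = 256, 64, 16


def make_params():
    return (0.1 * rng.standard_normal((c, d_k)), 0.1 * rng.standard_normal((c, d_k)),
            0.1 * rng.standard_normal((c, c)), 0.1 * rng.standard_normal((c, c)))


layers = [make_params() for _ in range(3)]  # three OA layers, as in the embodiment
f_pw = rng.random((n, c))                   # stands in for F_point-wise
f_concat = cascaded_attention(f_pw, layers)
print(f_concat.shape)  # (256, 192)
```

Concatenating every intermediate output, rather than keeping only the last one, is what lets information from the earlier layers reach the contribution marking module directly.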
[0214] (3) Contribution tag module.
[0215] Figure 10 shows the architecture of the CAM and the CMM. After $F_{concat}$ is obtained, the concatenated features are mapped to point-wise contribution labels, which evaluate the contribution score of each point. The score vector $S_{con}$ is obtained from $F_{concat}$ through an MLP and a fully connected layer:

$$S_{con} = \mathrm{FC}(\rho(F_{concat})) \tag{25}$$

[0216] where ρ(·) represents the MLP operation and FC(·) represents the fully connected layer. Each element of $S_{con}$ quantifies the importance of the corresponding point to the downstream task; the larger the label, the more important the corresponding feature. The m elements with the highest scores are then selected, and the point index Idx corresponding to each selected element is obtained. Finally, the sampled point cloud $P_{sp}$ is obtained from the input point cloud $P_{in}$ using the indices Idx of the selected points:

$$\mathrm{Idx} = \mathrm{top}(S_{con}, m) \tag{26}$$

$$P_{sp} = \mathrm{reference}(P_{in}, \mathrm{Idx}) \tag{27}$$
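The contribution marking step of Eqs. (25)-(27) can be sketched as below, assuming ρ is one linear layer with ReLU and FC maps each point's features to a single scalar. `np.argpartition` is used for the top-m selection because it avoids a full sort; the gather in Eq. (27) is plain integer indexing here. Function names and the hidden width 64 are hypothetical.

```python
import numpy as np


def contribution_scores(f_concat, w_rho, b_rho, w_fc, b_fc):
    """Eq. (25): S_con = FC(rho(F_concat)), one score per point.
    rho is a one-layer ReLU MLP here; FC maps each point's features to a scalar."""
    h = np.maximum(f_concat @ w_rho + b_rho, 0.0)  # (n, hidden)
    return h @ w_fc + b_fc                         # (n,)


def top_m_sample(p_in, s_con, m):
    """Eqs. (26)-(27): indices of the m largest scores, then gather those points.
    argpartition returns m distinct indices, so no point is sampled twice."""
    idx = np.argpartition(-s_con, m - 1)[:m]
    return idx, p_in[idx]


rng = np.random.default_rng(0)
n, m, c3 = 1024, 128, 192  # sampling ratio n / m = 8
p_in = rng.random((n, 3))
f_concat = rng.random((n, c3))  # stands in for the CAM output
s_con = contribution_scores(f_concat, 0.1 * rng.standard_normal((c3, 64)), np.zeros(64),
                            0.1 * rng.standard_normal(64), 0.0)
idx, p_sp = top_m_sample(p_in, s_con, m)
print(p_sp.shape, len(set(idx.tolist())))  # (128, 3) 128
```

Because m is only an argument of the selection step, the same trained scorer can be reused at any sampling ratio, as paragraph [0201] notes.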
[0217] (4) Loss function.
[0218] The network proposed in the embodiments of the present application uses a joint loss function:

$$L_{\text{total}} = L_{\text{task}}(P_{sp}) + \alpha\, L_{\text{emd}}(P_{in}, P_{sp}) \tag{28}$$

[0219] where $L_{\text{task}}(\cdot)$ encourages the model to learn a downsampled point set optimized for a specific downstream task, $L_{\text{emd}}(\cdot)$ is the EMD loss that preserves the geometric structure of the downsampled point cloud, and α is the weight factor that balances the two terms. Specifically, $L_{\text{emd}}(\cdot)$ minimizes the distance between $P_{in}$ and $P_{sp}$, ensuring that they remain similar:

$$L_{\text{emd}}(P_{in}, P_{sp}) = \min_{\phi:\,P_{in}\to P_{sp}} \sum_{x\in P_{in}} \lVert x - \phi(x) \rVert_2 \tag{29}$$

[0220] where φ is a bijective function; the objective is to find the bijection φ that minimizes the distance between corresponding points of $P_{in}$ and $P_{sp}$.
[0221] It should be noted that in the embodiment of the present application, the CD loss function can be used here instead of the EMD loss function.
[0222] It should also be noted that, in the embodiments of the present application, SA layers can be used instead of OA layers in the cascaded attention module. In addition, specific values of α, c, and k are set for the point cloud classification task, where α is the weight factor of the joint loss function, c is the point-wise feature dimension, and k is the number of neighbor nodes selected in the grouping operation.
[0223] The embodiments of the present application provide a point cloud data processing method, whose specific implementation has been elaborated in detail through the above embodiments. According to these technical solutions, an end-to-end trainable point cloud sampling network, CMS-Net, is proposed that selects points according to their contribution to downstream tasks. Top-k selection based on the contribution score of each point ensures that the sampled point cloud contains no duplicate points, and an EMD-based loss function effectively adjusts the shape of the sampled point cloud. In this way, a balance is achieved between retaining the geometric properties of the input point cloud and optimizing for specific downstream tasks, which improves the quality of the sampled point cloud and, in turn, the performance of downstream tasks.
[0224] In another embodiment of the present application, based on the same inventive concept as the above embodiment, FIG. 21 is a schematic diagram of the structure of a point cloud data processing device provided in the embodiment of the present application. As shown in FIG. 21, the point cloud data processing device 210 may include a determination unit 2101 and a processing unit 2102; wherein,
[0225] A determining unit 2101 is configured to determine a first point cloud dataset;
[0226] a processing unit 2102 configured to process the first point cloud dataset using a data processing network to determine a second point cloud dataset, wherein the first point cloud dataset is input data of the data processing network, the second point cloud dataset is output data of the data processing network, and the number of points included in the second point cloud dataset is less than the number of points in the first point cloud dataset;
[0227] The processing unit 2102 is specifically configured to determine a contribution score of a point in the first point cloud dataset; and determine a second point cloud dataset based on the contribution score.
[0228] In some embodiments, the determining unit 2101 is further configured to determine the contribution scores of the points in the first point cloud dataset based on the importance of the points in the first point cloud dataset relative to the task to be processed.
[0229] In some embodiments, the determination unit 2101 is further configured to determine m index numbers of points to be selected based on the contribution scores; and determine a second point cloud dataset based on the m index numbers of points to be selected and the first point cloud dataset; wherein m is a positive integer.
[0230] In some embodiments, the determining unit 2101 is further configured to determine the index numbers corresponding to the first m points with the highest contribution scores as the index numbers of the m points to be selected.
[0231] In some embodiments, the data processing network includes a feature extraction module, a cascaded attention module and a contribution marking module; accordingly, the processing unit 2102 is also configured to use the feature extraction module to perform feature extraction on the first point cloud dataset to obtain point feature information of the first point cloud dataset; use the cascaded attention module to perform attention analysis on the point feature information of the first point cloud dataset to obtain attention feature information of the first point cloud dataset; and use the contribution marking module to perform contribution evaluation on the attention feature information of the first point cloud dataset to determine the second point cloud dataset.
[0232] In some embodiments, the feature extraction module includes a grouping module and a pooling module; accordingly, the processing unit 2102 is further configured to use the grouping module to group and jointly process the first point cloud data set to obtain the joint feature information of the first point cloud data set; and use the pooling module to perform feature mapping and pooling operations on the joint feature information of the first point cloud data set to obtain the point feature information of the first point cloud data set.
[0233] In some embodiments, the grouping module includes a grouping layer, a replication layer and a first joint layer; accordingly, the processing unit 2102 is further configured to use the grouping layer to group the first point cloud dataset to obtain grouping feature information of the first point cloud dataset; use the replication layer to replicate the first point cloud dataset a preset number of times to obtain replication feature information of the first point cloud dataset; and use the first joint layer to combine the grouping feature information and the replication feature information to obtain joint feature information of the first point cloud dataset.
[0234] In some embodiments, the pooling module includes a first multi-layer perceptron layer, an average pooling layer, a maximum pooling layer, a second joint layer and a second multi-layer perceptron layer; accordingly, the processing unit 2102 is also configured to use the first multi-layer perceptron layer to perform feature mapping on the joint feature information of the first point cloud dataset to obtain the first mapping feature information of the first point cloud dataset; use the average pooling layer to perform an average pooling operation on the first mapping feature information of the first point cloud dataset to obtain the average pooling information of the first point cloud dataset; use the maximum pooling layer to perform a maximum pooling operation on the first mapping feature information of the first point cloud dataset to obtain the maximum pooling information of the first point cloud dataset; use the second joint layer to combine the average pooling information and the maximum pooling information to obtain the second mapping feature information of the first point cloud dataset; and use the second multi-layer perceptron layer to perform feature mapping on the second mapping feature information to obtain the point feature information of the first point cloud dataset.
[0235] In some embodiments, the cascaded attention module includes at least two attention modules and a third joint layer; accordingly, the processing unit 2102 is also configured to use at least two attention modules to perform attention feature extraction on the point feature information of the first point cloud dataset to obtain at least two intermediate feature information of the first point cloud dataset; and use the third joint layer to combine the at least two intermediate feature information to obtain the attention feature information of the first point cloud dataset.
[0236] In some embodiments, at least two attention modules include a first attention module, a second attention module and a third attention module; accordingly, the processing unit 2102 is further configured to use the first attention module to perform attention feature extraction on the point feature information of the first point cloud dataset to obtain the first intermediate feature information of the first point cloud dataset; use the second attention module to perform attention feature extraction on the first intermediate feature information of the first point cloud dataset to obtain the second intermediate feature information of the first point cloud dataset; use the third attention module to perform attention feature extraction on the second intermediate feature information of the first point cloud dataset to obtain the third intermediate feature information of the first point cloud dataset; and use the third joint layer to combine the first intermediate feature information, the second intermediate feature information and the third intermediate feature information to obtain the attention feature information of the first point cloud dataset.
[0237] In some embodiments, the attention module includes any one of a self-attention module and a bias attention module.
[0238] In some embodiments, when the attention module is a self-attention module, the self-attention module includes a first attention layer, a third multi-layer perception layer and a first adder; accordingly, the processing unit 2102 is also configured to use the first attention layer to perform attention feature extraction on the point feature information of the first point cloud data set to obtain the first attention information of the first point cloud data set; use the third multi-layer perception layer to perform feature mapping on the first attention information of the first point cloud data set to obtain the second attention information of the first point cloud data set; and use the first adder to perform addition operation on the point feature information and the second attention information of the first point cloud data set to obtain the intermediate feature information of the first point cloud data set.
[0239] In some embodiments, when the attention module is an offset attention module, the offset attention module includes a second attention layer, a subtractor, a fourth multi-layer perceptron layer, and a second adder; accordingly, the processing unit 2102 is further configured to: use the second attention layer to perform attention feature extraction on the point feature information of the first point cloud dataset to obtain first attention information of the first point cloud dataset; use the subtractor to subtract the first attention information from the point feature information of the first point cloud dataset to obtain second attention information of the first point cloud dataset; use the fourth multi-layer perceptron layer to perform feature mapping on the second attention information to obtain third attention information of the first point cloud dataset; and use the second adder to add the point feature information and the third attention information of the first point cloud dataset to obtain the intermediate feature information of the first point cloud dataset.
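The offset attention module in [0239] differs from the self-attention module by the subtractor placed in front of the MLP. The sketch below assumes the same attention and MLP forms as in a self-attention module (neither is specified in the text) and uses hypothetical random weights; `make_offset_attention` is an illustrative name.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def make_offset_attention(c, rng):
    s = 1.0 / np.sqrt(c)
    # Hypothetical random weights standing in for learned parameters.
    wq, wk, wv, w1, w2 = (rng.standard_normal((c, c)) * s for _ in range(5))

    def attention_layer(x):
        # Second attention layer: scaled dot-product attention (assumed form).
        q, k, v = x @ wq, x @ wk, x @ wv
        return softmax(q @ k.T / np.sqrt(c)) @ v

    def mlp(x):
        # Fourth MLP layer: Linear, ReLU, Linear (assumed depth).
        return np.maximum(x @ w1, 0.0) @ w2

    def offset_attention(x):
        first_attention = attention_layer(x)      # first attention information
        second_attention = x - first_attention    # subtractor: the "offset"
        third_attention = mlp(second_attention)   # third attention information
        return x + third_attention                # second adder -> intermediate features

    return offset_attention, attention_layer, mlp

rng = np.random.default_rng(2)
x = rng.standard_normal((128, 16))
oa, attn, mlp = make_offset_attention(16, rng)
y = oa(x)
print(y.shape)  # (128, 16)
```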
[0240] In some embodiments, the contribution marking module includes a fifth multi-layer perceptron layer, a fully connected layer, and a selection module; accordingly, the processing unit 2102 is further configured to: use the fifth multi-layer perceptron layer to perform feature mapping on the attention feature information of the first point cloud dataset to obtain fourth intermediate feature information of the first point cloud dataset; use the fully connected layer to perform fully connected processing on the fourth intermediate feature information to determine the contribution scores of the points in the first point cloud dataset; and use the selection module and the contribution scores to perform selection processing on the first point cloud dataset to determine the index numbers of m points to be selected, and determine the second point cloud dataset based on these m index numbers, where m is a positive integer.
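The scoring half of the contribution marking module in [0240] can be sketched as follows, assuming the fifth multi-layer perceptron layer is a single Linear + ReLU and the fully connected layer maps each point's hidden features to one scalar. The function name, the hidden width of 64, and the input width of 192 (matching a three-module cascade of 64 channels each) are illustrative assumptions.

```python
import numpy as np

def contribution_scores(attention_features, w_mlp, b_mlp, w_fc, b_fc):
    """Map per-point attention features (N, F) to one contribution score per point."""
    # Assumed fifth MLP layer (Linear + ReLU) -> fourth intermediate features.
    hidden = np.maximum(attention_features @ w_mlp + b_mlp, 0.0)  # (N, H)
    # Fully connected layer projecting each point's features to a scalar score.
    return (hidden @ w_fc + b_fc)[:, 0]                          # (N,)

rng = np.random.default_rng(3)
feats = rng.standard_normal((1024, 192))
w_mlp, b_mlp = rng.standard_normal((192, 64)) * 0.1, np.zeros(64)
w_fc, b_fc = rng.standard_normal((64, 1)) * 0.1, np.zeros(1)
scores = contribution_scores(feats, w_mlp, b_mlp, w_fc, b_fc)
print(scores.shape)  # (1024,)
```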
[0241] In some embodiments, the determination unit 2101 is further configured to perform top-k query processing on the first point cloud dataset based on the contribution scores and determine the index numbers of the m points with the highest contribution scores as the m index numbers of the points to be selected; and is further configured to take, from the first point cloud dataset, the candidate points corresponding to these m index numbers and determine these candidate points as the second point cloud dataset.
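The top-k query and gather step in [0241] can be illustrated with NumPy. `np.argpartition` finds the m largest scores without a full sort, and the selected indices are then ordered by descending score. The text does not say whether the index numbers are ordered, so that ordering is an assumption, as is the function name `select_top_m`.

```python
import numpy as np

def select_top_m(points, scores, m):
    """Return the index numbers of the m highest-scoring points and those points."""
    if not 0 < m <= len(points):
        raise ValueError("m must satisfy 1 <= m <= N")
    idx = np.argpartition(-scores, m - 1)[:m]  # top-m indices, in arbitrary order
    idx = idx[np.argsort(-scores[idx])]        # order by descending contribution score
    return idx, points[idx]                    # gather -> second point cloud dataset

points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
scores = np.array([0.1, 0.9, 0.4, 0.7, 0.2])
idx, sampled = select_top_m(points, scores, 3)
print(idx)  # [1 3 2]
```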
[0242] In some embodiments, the processing unit 2102 is further configured to input the second point cloud data set into a preset task network, and output an execution result corresponding to the task to be processed through the preset task network.
[0243] In some embodiments, the determination unit 2101 is further configured to determine at least two groups of training samples, each group of training samples including a point cloud data sample set and a task sample; and perform model training on the initial joint model based on the at least two groups of training samples, and determine the trained model as the target joint model; wherein the target joint model includes a data processing network and a preset task network.
[0244] In some embodiments, the determination unit 2101 is further configured to determine the task loss function of the second point cloud dataset for the task to be processed, and to determine the sampling loss function between the first point cloud dataset and the second point cloud dataset; and to determine the joint loss function corresponding to the target joint model based on the task loss function and the sampling loss function.
[0245] In some embodiments, the determination unit 2101 is further configured to determine a first factor; determine a weighted sampling loss function based on the first factor and the sampling loss function; and determine a joint loss function based on the task loss function and the weighted sampling loss function.
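A worked sketch of the joint loss in [0244] and [0245], written as L = L_task + alpha * L_sample, with the first factor taken as the weight alpha. The text does not define the sampling loss; the symmetric Chamfer distance used below is an assumption chosen because it measures how closely the sampled set matches the input, and the task loss is passed in as an already computed scalar. The function names are hypothetical.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both directions."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (len(a), len(b))
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def joint_loss(task_loss, first_points, second_points, alpha):
    """Task loss plus the sampling loss weighted by alpha (the assumed first factor)."""
    sampling_loss = chamfer_distance(second_points, first_points)  # assumed sampling loss
    weighted_sampling_loss = alpha * sampling_loss
    return task_loss + weighted_sampling_loss

first = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])  # input point cloud
second = np.array([[0.0, 0.0, 0.0]])                  # sampled point cloud
loss = joint_loss(task_loss=2.0, first_points=first, second_points=second, alpha=0.1)
```

Here the sampled point coincides with an input point, so only the uncovered input point contributes: the Chamfer term is 0 + (0 + 1) / 2 = 0.5 and the joint loss is 2.0 + 0.1 * 0.5 = 2.05. A larger alpha pushes the sampled set to stay close to the input geometry, while a smaller one favours the downstream task.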
[0246] It is understood that in this embodiment, a "unit" can be a portion of a circuit, a portion of a processor, a portion of a program or software, etc., and can also be a module or a non-modular system. Furthermore, the various components in this embodiment can be integrated into a single processing unit, or each unit can exist physically separately, or two or more units can be integrated into a single unit. The aforementioned integrated units can be implemented in the form of hardware or software functional modules.
[0247] If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment in essence, or the part that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions for causing a computer device (which can be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
[0248] Therefore, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program. When the computer program is executed by at least one processor, the steps of the method in any one of the aforementioned embodiments are implemented.
[0249] Based on the composition of the above point cloud data processing device 210 and the computer-readable storage medium, Figure 22 is a schematic diagram of the specific hardware structure of an electronic device provided in an embodiment of the present application. As shown in Figure 22, the electronic device 220 may include a communication interface 2201, a memory 2202, and a processor 2203, which are coupled together through a bus system 2204. The bus system 2204 implements connection and communication between these components. In addition to a data bus, the bus system 2204 also includes a power bus, a control bus, and a status signal bus. However, for clarity of illustration, all of these buses are labeled as the bus system 2204 in Figure 22. The components are as follows:
[0250] Communication interface 2201, used for sending and receiving signals during the process of sending and receiving information with the power supply equipment;
[0251] Memory 2202, used to store computer programs that can be run on processor 2203;
[0252] The processor 2203 is configured to, when running the computer program, execute:
[0253] Determine a first point cloud dataset; and process the first point cloud dataset using a data processing network to determine a second point cloud dataset, wherein the first point cloud dataset is the input data of the data processing network, the second point cloud dataset is the output data of the data processing network, and the second point cloud dataset includes fewer points than the first point cloud dataset; wherein processing the first point cloud dataset using the data processing network includes: determining contribution scores of the points in the first point cloud dataset; and determining the second point cloud dataset based on the contribution scores.
[0254] It is understood that the memory 2202 in the embodiments of the present application can be a volatile memory or a non-volatile memory, or can include both volatile and non-volatile memories. The non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory can be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous DRAM (DDR SDRAM), enhanced synchronous DRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The memory 2202 of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
[0255] The processor 2203 may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 2203 or by instructions in the form of software. The processor 2203 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well established in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 2202, and the processor 2203 reads the information in the memory 2202 and completes the steps of the above method in combination with its hardware.
[0256] It is understood that the embodiments described herein may be implemented using hardware, software, firmware, middleware, microcode, or a combination thereof. For hardware implementation, the processing unit may be implemented in one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described herein, or a combination thereof.
[0257] For software implementation, the techniques described herein can be implemented by modules (e.g., procedures, functions, etc.) that perform the functions described herein. The software code can be stored in a memory and executed by a processor. The memory can be implemented in the processor or external to the processor.
[0258] In some embodiments, the processor 2203 is further configured to execute the steps of the method described in any one of the aforementioned embodiments when running the computer program.
[0259] In some embodiments, the present application also provides an electronic device, which includes the point cloud data processing device 210 described in any one of the aforementioned embodiments.
[0260] Another embodiment of the present application provides a network architecture of a point cloud encoding and decoding system that uses the point cloud data processing method. FIG. 23 is a schematic diagram of a network architecture for point cloud encoding and decoding provided in an embodiment of the present application. As shown in FIG. 23, the network architecture includes one or more electronic devices 13 to 1N and a communication network 01, wherein the electronic devices 13 to 1N can perform video interaction through the communication network 01. In implementation, the electronic device can be any of various types of devices with point cloud encoding and decoding functions; for example, it can be a mobile phone, a tablet computer, a personal computer, a personal digital assistant, a navigator, a digital phone, a video phone, a television, a sensor device, a server, etc., which is not limited in the embodiments of the present application. The decoder or encoder in the embodiments of the present application can be any of the above electronic devices.
[0261] The electronic device in the embodiments of the present application has point cloud encoding and decoding functions and generally includes a point cloud encoder (i.e., encoder) and a point cloud decoder (i.e., decoder).
[0262] It should be noted that, in the embodiments of the present application, the aforementioned point cloud data processing method can be applied to the encoder, to the decoder, or to both the encoder and the decoder. For example, at the encoder, the input point cloud is first downsampled by the point cloud data processing method into a sparse point cloud with fewer points, encoding and decoding are then performed on the sparse point cloud, and finally the point cloud is reconstructed by an upsampling network (PU-Refiner), thereby improving encoding and decoding efficiency.
[0263] It should be noted that, in this application, the terms "comprises," "includes," or any other variations thereof are intended to encompass non-exclusive inclusion, such that a process, method, article, or apparatus comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such process, method, article, or apparatus. In the absence of further limitations, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus comprising the element.
[0264] The serial numbers of the above embodiments of the present application are for description only and do not represent the advantages or disadvantages of the embodiments.
[0265] The methods disclosed in the several method embodiments provided in this application can be arbitrarily combined without conflict to obtain new method embodiments.
[0266] The features disclosed in the several product embodiments provided in this application can be arbitrarily combined without conflict to obtain new product embodiments.
[0267] The features disclosed in the several method or device embodiments provided in this application can be arbitrarily combined without conflict to obtain new method embodiments or device embodiments.
[0268] The above description is merely a specific embodiment of the present application, but the scope of protection of the present application is not limited thereto. Any changes or substitutions that can be easily conceived by a person skilled in the art within the technical scope disclosed in this application should be included in the scope of protection of this application. Therefore, the scope of protection of this application should be based on the scope of protection of the claims.
Industrial Applicability
[0269] In the embodiments of the present application, a first point cloud dataset is first determined; then, the first point cloud dataset is processed using a data processing network to determine a second point cloud dataset, wherein the first point cloud dataset serves as input data to the data processing network, the second point cloud dataset serves as output data from the data processing network, and the second point cloud dataset contains fewer points than the first point cloud dataset. Here, processing the first point cloud dataset using the data processing network includes: determining contribution scores of the points in the first point cloud dataset; and determining the second point cloud dataset based on the contribution scores. Downsampling the first point cloud dataset according to the contribution scores yields a second point cloud dataset with fewer points, which reduces computational complexity. Moreover, because the contribution scores characterize the importance of the points in the first point cloud dataset relative to the task to be processed, the downsampling is task-oriented, so the data processing network can balance preserving the geometric properties of the input point cloud against optimizing a specific downstream task, thereby improving the quality of the sampled point cloud and, consequently, the performance of the downstream task.
Claims
1. A method for processing point cloud data, the method comprising: determining a first point cloud data set; and processing the first point cloud data set by using a data processing network to determine a second point cloud data set, wherein the first point cloud data set is the input data of the data processing network, the second point cloud data set is the output data of the data processing network, and the number of points included in the second point cloud data set is less than the number of points in the first point cloud data set; wherein the processing the first point cloud data set by using the data processing network includes: determining the contribution scores of the points in the first point cloud data set; and determining the second point cloud data set according to the contribution scores.
2. The method according to claim 1, wherein, the determining the contribution scores of the points in the first point cloud data set includes: determining the contribution scores of the points in the first point cloud data set based on the importance degree of the points in the first point cloud data set relative to the task to be processed.
3. The method according to claim 1, wherein, the determining the second point cloud data set according to the contribution scores includes: determining m to-be-selected point index numbers according to the contribution scores; determining the second point cloud data set according to the m to-be-selected point index numbers and the first point cloud data set; wherein, m is a positive integer.
4. The method according to claim 3, wherein, the determining m to-be-selected point index numbers according to the contribution scores includes: determining the index numbers corresponding to the top m points with the highest contribution scores as the m to-be-selected point index numbers.
5. The method according to claim 1, wherein, the data processing network includes a feature extraction module, a cascaded attention module and a contribution marking module; the processing the first point cloud data set by using the data processing network to determine a second point cloud data set includes: extracting features of the first point cloud data set by using the feature extraction module to obtain the point feature information of the first point cloud data set; performing attention analysis on the point feature information of the first point cloud data set by using the cascaded attention module to obtain the attention feature information of the first point cloud data set; performing contribution evaluation on the attention feature information of the first point cloud data set by using the contribution marking module to determine the second point cloud data set.
6. The method according to claim 5, wherein, the feature extraction module includes a grouping module and a pooling module; the extracting features of the first point cloud data set by using the feature extraction module to obtain the point feature information of the first point cloud data set includes: grouping and jointly processing the first point cloud data set by using the grouping module to obtain the joint feature information of the first point cloud data set; performing feature mapping and pooling operations on the joint feature information of the first point cloud data set by using the pooling module to obtain the point feature information of the first point cloud data set.
7. The method according to claim 6, wherein the grouping module includes a grouping layer, a replication layer, and a first combination layer; and the using the grouping module to group and jointly process the first point cloud data set to obtain the joint feature information of the first point cloud data set includes: using the grouping layer to group the first point cloud data set to obtain grouped feature information of the first point cloud data set; using the replication layer to replicate the first point cloud data set a preset number of times to obtain replicated feature information of the first point cloud data set; and using the first combination layer to combine the grouped feature information and the replicated feature information to obtain the joint feature information of the first point cloud data set.
8. The method according to claim 6, wherein the pooling module includes a first multi-layer perceptron layer, an average pooling layer, a max pooling layer, a second combination layer, and a second multi-layer perceptron layer; and the using the pooling module to perform feature mapping and pooling operations on the joint feature information of the first point cloud data set to obtain the point feature information of the first point cloud data set includes: using the first multi-layer perceptron layer to perform feature mapping on the joint feature information of the first point cloud data set to obtain first mapped feature information of the first point cloud data set; using the average pooling layer to perform an average pooling operation on the first mapped feature information of the first point cloud data set to obtain average pooling information of the first point cloud data set; using the max pooling layer to perform a max pooling operation on the first mapped feature information of the first point cloud data set to obtain max pooling information of the first point cloud data set; using the second combination layer to combine the average pooling information and the max pooling information to obtain second mapped feature information of the first point cloud data set; and using the second multi-layer perceptron layer to perform feature mapping on the second mapped feature information to obtain the point feature information of the first point cloud data set.
9. The method according to claim 5, wherein the cascaded attention module includes at least two attention modules and a third combination layer; and the using the cascaded attention module to perform attention analysis on the point feature information of the first point cloud data set to obtain the attention feature information of the first point cloud data set includes: using the at least two attention modules to perform attention feature extraction on the point feature information of the first point cloud data set to obtain at least two pieces of intermediate feature information of the first point cloud data set; and using the third combination layer to combine the at least two pieces of intermediate feature information to obtain the attention feature information of the first point cloud data set.
10. The method according to claim 9, wherein the at least two attention modules include a first attention module, a second attention module, and a third attention module; the using the at least two attention modules to perform attention feature extraction on the point feature information of the first point cloud data set to obtain at least two pieces of intermediate feature information of the first point cloud data set includes: using the first attention module to perform attention feature extraction on the point feature information of the first point cloud data set to obtain first intermediate feature information of the first point cloud data set; using the second attention module to perform attention feature extraction on the first intermediate feature information of the first point cloud data set to obtain second intermediate feature information of the first point cloud data set; and using the third attention module to perform attention feature extraction on the second intermediate feature information of the first point cloud data set to obtain third intermediate feature information of the first point cloud data set; and the using the third combination layer to combine the at least two pieces of intermediate feature information to obtain the attention feature information of the first point cloud data set includes: using the third combination layer to combine the first intermediate feature information, the second intermediate feature information, and the third intermediate feature information to obtain the attention feature information of the first point cloud data set.
11. The method according to claim 9, wherein the attention module includes any one of a self-attention module and an offset attention module.
12. The method according to claim 11, wherein, when the attention module is a self-attention module, the self-attention module includes a first attention layer, a third multi-layer perceptron layer, and a first adder; and using the self-attention module to perform attention feature extraction on the point feature information of the first point cloud data set to obtain the first intermediate feature information of the first point cloud data set includes: using the first attention layer to perform attention feature extraction on the point feature information of the first point cloud data set to obtain first attention information of the first point cloud data set; using the third multi-layer perceptron layer to perform feature mapping on the first attention information of the first point cloud data set to obtain second attention information of the first point cloud data set; and using the first adder to perform an addition operation on the point feature information and the second attention information of the first point cloud data set to obtain the intermediate feature information of the first point cloud data set.
13. The method according to claim 11, wherein, when the attention module is an offset attention module, the offset attention module includes a second attention layer, a subtractor, a fourth multi-layer perceptron layer, and a second adder; and using the offset attention module to perform attention feature extraction on the point feature information of the first point cloud data set to obtain the first intermediate feature information of the first point cloud data set includes: using the second attention layer to perform attention feature extraction on the point feature information of the first point cloud data set to obtain first attention information of the first point cloud data set; using the subtractor to perform a subtraction operation on the point feature information and the first attention information of the first point cloud data set to obtain second attention information of the first point cloud data set; using the fourth multi-layer perceptron layer to perform feature mapping on the second attention information of the first point cloud data set to obtain third attention information of the first point cloud data set; and using the second adder to perform an addition operation on the point feature information and the third attention information of the first point cloud data set to obtain the intermediate feature information of the first point cloud data set.
14. The method according to claim 5, wherein the contribution marking module includes a fifth multi-layer perceptron layer, a fully connected layer, and a selection module; and the using the contribution marking module to perform contribution evaluation on the attention feature information of the first point cloud data set to determine the second point cloud data set includes: using the fifth multi-layer perceptron layer to perform feature mapping on the attention feature information of the first point cloud data set to obtain fourth intermediate feature information of the first point cloud data set; using the fully connected layer to perform fully connected processing on the fourth intermediate feature information of the first point cloud data set to determine the contribution scores of the points in the first point cloud data set; and using the selection module and the contribution scores to perform selection processing on the first point cloud data set to determine m candidate point index numbers to be selected, and determining the second point cloud data set according to the m candidate point index numbers; where m is a positive integer.
15. The method according to claim 14, wherein the using the selection module and the contribution scores to perform selection processing on the first point cloud data set to determine m candidate point index numbers to be selected includes: performing top-k query processing on the first point cloud data set based on the contribution scores to determine the index numbers corresponding to the top m points with the highest contribution scores, so as to obtain the m candidate point index numbers to be selected; and the determining the second point cloud data set according to the m candidate point index numbers includes: determining, from the first point cloud data set, the candidate points corresponding to the m candidate point index numbers, and determining these candidate points as the second point cloud data set.
16. The method according to any one of claims 2 to 15, wherein the method further includes: inputting the second point cloud data set into a preset task network, and outputting an execution result corresponding to the task to be processed through the preset task network.
17. The method according to claim 16, wherein the method further includes: determining at least two sets of training samples, each set of training samples including a point cloud data sample set and a task sample; and training an initial joint model according to the at least two sets of training samples, and determining the trained model as a target joint model; wherein the target joint model includes the data processing network and the preset task network.
18. The method according to claim 17, wherein the method further includes: determining a task loss function of the second point cloud data set for the task to be processed, and determining a sampling loss function between the first point cloud data set and the second point cloud data set; and determining a joint loss function corresponding to the target joint model according to the task loss function and the sampling loss function.
19. The method according to claim 18, wherein the determining the joint loss function corresponding to the target joint model according to the task loss function and the sampling loss function includes: determining a first factor; determining a weighted sampling loss function according to the first factor and the sampling loss function; and determining the joint loss function according to the task loss function and the weighted sampling loss function.
20. A point cloud data processing device, comprising a determination unit and a processing unit, wherein: the determination unit is configured to determine a first point cloud data set; the processing unit is configured to process the first point cloud data set by using a data processing network to determine a second point cloud data set, wherein the first point cloud data set is the input data of the data processing network, the second point cloud data set is the output data of the data processing network, and the number of points included in the second point cloud data set is less than the number of points in the first point cloud data set; wherein the processing unit is specifically configured to determine the contribution score of the points in the first point cloud data set; and determine the second point cloud data set according to the contribution score.
21. An electronic device, comprising a memory and a processor, wherein: the memory is used for storing a computer program that can run on the processor; the processor is used for executing the method according to any one of claims 1 to 19 when running the computer program.
22. A computer-readable storage medium, wherein, the computer-readable storage medium stores a computer program, and when the computer program is executed by at least one processor, the method according to any one of claims 1 to 19 is implemented.
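Claims 6 to 8 describe the feature extraction module without fixing its details. The sketch below is one possible reading, under stated assumptions: the grouping layer gathers each point's k nearest neighbours, the replication layer repeats each point k times, both combination layers concatenate along the feature axis, the pooling layers reduce over the k neighbours, and every multi-layer perceptron layer is a stack of Linear + ReLU. All function names, k = 16, and the layer widths are hypothetical.

```python
import numpy as np

def knn_indices(points, k):
    # Indices of the k nearest neighbours of every point (the point itself included).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]                    # (N, k)

def relu_mlp(x, weights):
    # Shared per-point MLP: Linear followed by ReLU for every weight matrix.
    for w in weights:
        x = np.maximum(x @ w, 0.0)
    return x

def extract_point_features(points, k, mlp1, mlp2):
    grouped = points[knn_indices(points, k)]                 # grouping layer: (N, k, C)
    replicated = np.repeat(points[:, None, :], k, axis=1)    # replication layer: (N, k, C)
    joint = np.concatenate([grouped, replicated], axis=-1)   # first combination layer: (N, k, 2C)
    mapped = relu_mlp(joint, mlp1)                           # first MLP layer: (N, k, D)
    avg_pool = mapped.mean(axis=1)                           # average pooling over neighbours: (N, D)
    max_pool = mapped.max(axis=1)                            # max pooling over neighbours: (N, D)
    pooled = np.concatenate([avg_pool, max_pool], axis=-1)   # second combination layer: (N, 2D)
    return relu_mlp(pooled, mlp2)                            # second MLP layer: (N, C_out)

rng = np.random.default_rng(0)
pts = rng.standard_normal((512, 3))
mlp1 = [rng.standard_normal((6, 32)) * 0.3, rng.standard_normal((32, 64)) * 0.1]
mlp2 = [rng.standard_normal((128, 64)) * 0.1]
feats = extract_point_features(pts, k=16, mlp1=mlp1, mlp2=mlp2)
print(feats.shape)  # (512, 64)
```

Pooling with both mean and max keeps an averaged description of each local neighbourhood alongside its most salient response; the resulting per-point features would then feed the cascaded attention module of claim 9.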