Semantic label propagation, model training method and device, and unmanned vehicle
By generating dense and semantically rich pseudo-labels in autonomous driving equipment, the problem of network performance degradation in point cloud segmentation in sparse label scenarios is solved, and more efficient semantic segmentation model training and segmentation results are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- JD DIGITS HAIYI INFORMATION TECHNOLOGY CO LTD
- Filing Date
- 2022-09-23
- Publication Date
- 2026-06-16
Figure CN115546759B_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of computer vision technology, particularly to the field of autonomous driving, and specifically to a semantic label propagation method, a model training method, a device, and an unmanned vehicle.
Background Technology
[0002] Currently, autonomous vehicles are used to automatically transport people or goods from one location to another. These vehicles collect environmental information through onboard sensors and complete the automated transport. Autonomous delivery vehicles controlled by autonomous driving technology greatly improve the convenience of production and daily life, and save labor costs.
[0003] 3D point cloud semantic segmentation is a fundamental task for understanding real-world environments and is important for various applications, including autonomous driving, drones, and augmented reality. It aims to assign a semantic category to each point in the point cloud. In recent years, we have witnessed significant performance improvements in 3D point cloud semantic segmentation. We have found that current high-performance methods often heavily rely on large-scale 3D data with point-by-point annotations, which require very time-consuming manual annotation. Therefore, weakly supervised point cloud segmentation emerges as a promising alternative.
[0004] Research on weakly supervised point cloud segmentation is still at an exploratory stage and mainly covers three scenarios: sparse labeling, coarse-grained labeling, and scribble labeling. Mainstream work currently focuses on the sparse labeling scenario, in which only a portion of the point cloud data used to train the model carries label information, while the rest is unlabeled.
Summary of the Invention
[0005] One of the technical problems this disclosure aims to solve is to provide a semantic label propagation method, a model training method, a device, and an unmanned vehicle.
[0006] According to a first aspect of this disclosure, a semantic label propagation method is proposed, comprising: determining the feature representation of point cloud data in the current training batch, wherein the point cloud data in the current training batch includes point cloud points carrying semantic labels and point cloud points not carrying semantic labels; determining the global centroid of each semantic label class in the current training batch based on the feature representation of the point cloud points carrying semantic labels and the global centroid of each semantic label class in the previous training batch; determining the similarity between the point cloud points not carrying semantic labels and the global centroid of each semantic label class in the current training batch; and determining the pseudo-label of the point cloud points not carrying semantic labels based on the similarity.
[0007] In some embodiments, determining the global centroid of each semantic label in the current training batch based on the feature representation of the point cloud points carrying semantic labels and the global centroid of each semantic label in the previous training batch includes: determining the local centroid of each semantic label in the current training batch based on the feature representation of the point cloud points carrying semantic labels; and determining the global centroid of each semantic label in the current training batch based on the local centroid of each semantic label in the current training batch and the global centroid of each semantic label in the previous training batch.
[0008] In some embodiments, determining the local centroid of each semantic label in the current training batch based on the feature representation of the point cloud points carrying each semantic label includes: calculating the average value of the feature representation of the point cloud points carrying each semantic label, and using the average value as the local centroid of that semantic label.
[0009] In some embodiments, determining the global centroid of each semantic label in the current training batch based on the local centroid of each semantic label in the current training batch and the global centroid of each semantic label in the previous training batch includes: weighting the local centroid of each semantic label in the current training batch and its global centroid in the previous training batch, and using the weighted sum as its global centroid in the current training batch.
[0010] In some embodiments, determining the feature representation of the point cloud data in the current training batch includes: inputting the point cloud data of the current training batch into a semantic segmentation backbone network to obtain an intermediate feature representation of the point cloud data, wherein the intermediate feature representation is used to determine the semantic category prediction probability of the point cloud data; and inputting the intermediate feature representation of the point cloud data into a projection network to obtain the feature representation of the point cloud data in the current training batch.
[0011] In some embodiments, determining the pseudo-labels of the point cloud points without semantic labels based on the similarity includes: determining a pseudo-label for each of the point cloud points without semantic labels based on the similarity.
[0012] In some embodiments, determining the similarity between the point cloud points without semantic labels and the global centroids of each semantic label class in the current training batch includes: determining the distance between the point cloud points without semantic labels and the global centroids of each semantic label class in the current training batch; calculating the similarity probability between the point cloud points without semantic labels and the global centroids of each semantic label class in the current training batch based on the distance, and using the similarity probability as the similarity.
[0013] In some embodiments, the distance from the point cloud points without semantic labels to the global centroid of each semantic label in the current training batch is determined based on the cosine distance function.
[0014] In some embodiments, determining the pseudo-label of the point cloud point without semantic label based on the similarity includes: using the similarity probability as the pseudo-label of the point cloud point without semantic label.
[0015] In some embodiments, when the current training batch is the first training batch, the global class centroid of each semantic label in the previous training batch is a preset value.
[0016] In some embodiments, the preset value is a zero vector.
[0017] In some embodiments, determining a pseudo-label for each of the point cloud points without semantic labels based on the similarity includes: using the similarity as a pseudo-label for the point cloud points without semantic labels.
[0018] According to a second aspect of this disclosure, a model training method is proposed, comprising: determining pseudo-labels for point cloud points that do not carry semantic labels in the current training batch according to the semantic label propagation method described above; and training a semantic segmentation model based on the semantic labels carried in the current training batch and the pseudo-labels.
[0019] In some embodiments, training the semantic segmentation model based on the semantic labels carried in the current training batch and the pseudo-labels includes: determining the value of a first loss function based on the semantic category prediction probability of point cloud points carrying semantic labels in the current training batch and the semantic labels; determining the value of a second loss function based on the semantic category prediction probability of point cloud points not carrying semantic labels in the current training batch and the pseudo-labels; determining the overall loss function value based on the value of the first loss function and the value of the second loss function; and updating the semantic segmentation model based on the overall loss function value.
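The two-part loss described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the text does not fix the form of either loss or how they are combined, so cross-entropy for both terms, the weighted sum, and the names `cross_entropy_hard`, `cross_entropy_soft`, `overall_loss`, and `weight` are all hypothetical.

```python
import numpy as np

def cross_entropy_hard(probs, labels, eps=1e-12):
    """First loss (assumed form): cross-entropy against integer semantic labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + eps).mean())

def cross_entropy_soft(probs, soft_targets, eps=1e-12):
    """Second loss (assumed form): cross-entropy against soft pseudo-labels."""
    return float(-(soft_targets * np.log(probs + eps)).sum(axis=1).mean())

def overall_loss(probs_labeled, labels, probs_unlabeled, pseudo_labels, weight=1.0):
    """Overall loss as a weighted sum of the two terms (the weighting is an assumption)."""
    first = cross_entropy_hard(probs_labeled, labels)
    second = cross_entropy_soft(probs_unlabeled, pseudo_labels)
    return first + weight * second

# One labeled point and one unlabeled point, two classes.
probs_labeled = np.array([[0.5, 0.5]])
labels = np.array([0])
probs_unlabeled = np.array([[0.5, 0.5]])
pseudo_labels = np.array([[0.5, 0.5]])
loss = overall_loss(probs_labeled, labels, probs_unlabeled, pseudo_labels)
```

In a real training loop the overall value would be backpropagated through the segmentation model; the NumPy version only shows how the two terms are formed from labeled and unlabeled points.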
[0020] According to a third aspect of this disclosure, a semantic segmentation method is proposed, comprising: performing semantic segmentation on point cloud data to be processed based on a semantic segmentation model trained using the model training method described above.
[0021] According to a fourth aspect of this disclosure, an apparatus is proposed, comprising: a module for performing the semantic label propagation method as described above, or a module for performing the model training method as described above, or a module for performing the semantic segmentation method as described above.
[0022] According to a fifth aspect of this disclosure, an electronic device is proposed, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the semantic label propagation method as described above, the model training method as described above, or the semantic segmentation method as described above.
[0023] According to a sixth aspect of this disclosure, a computer-readable storage medium is proposed that stores computer program instructions thereon, which, when executed by a processor, implement the semantic label propagation method as described above, or the model training method as described above, or the semantic segmentation method as described above.
[0024] According to a seventh aspect of this disclosure, an unmanned vehicle is also proposed, including the apparatus or the electronic device as described above.
[0025] Other features and advantages of this disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Description of the Drawings
[0026] The accompanying drawings, which form part of this specification, illustrate embodiments of this disclosure and, together with the specification, serve to explain the principles of this disclosure.
[0027] This disclosure will become clearer with reference to the accompanying drawings and the following detailed description, wherein:
[0028] Figure 1 is a flowchart of a semantic label propagation method according to some embodiments of the present disclosure;
[0029] Figure 2 is a schematic flowchart of the process of determining the feature representation of point cloud points according to some embodiments of the present disclosure;
[0030] Figure 3 is a flowchart of the process of updating the global class centroid according to some embodiments of the present disclosure;
[0031] Figure 4 is a flowchart of a model training method according to some embodiments of the present disclosure;
[0032] Figure 5 is a schematic diagram of the structure of a semantic label propagation device according to some embodiments of the present disclosure;
[0033] Figure 6 is a schematic diagram of the structure of a model training apparatus according to some embodiments of the present disclosure;
[0034] Figure 7 is a schematic diagram of a semantic segmentation model training framework according to some embodiments of the present disclosure;
[0035] Figure 8 is a schematic diagram of the structure of a semantic segmentation apparatus according to some embodiments of the present disclosure;
[0036] Figure 9 is a schematic diagram of the structure of a computer system according to some embodiments of the present disclosure;
[0037] Figure 10 is a schematic diagram of the structure of an unmanned vehicle according to some embodiments of the present disclosure;
[0038] Figure 11 is a three-dimensional structural diagram of an unmanned vehicle according to some embodiments of the present disclosure.
Detailed Description
[0039] Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specifically stated, the relative arrangement, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present disclosure.
[0040] At the same time, it should be understood that, for ease of description, the dimensions of the various parts shown in the accompanying drawings are not drawn according to actual scale.
[0041] The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit this disclosure or its application or use.
[0042] Techniques, methods, and equipment known to those skilled in the art may not be discussed in detail, but where appropriate, such techniques, methods, and equipment should be considered part of the specification.
[0043] In all examples shown and discussed herein, any specific values should be interpreted as merely exemplary and not as limitations. Therefore, other examples of exemplary embodiments may have different values.
[0044] It should be noted that similar labels and letters in the following figures indicate similar items; therefore, once an item is defined in one figure, it does not need to be discussed further in subsequent figures.
[0045] To make the objectives, technical solutions, and advantages of this disclosure clearer, the following detailed description is provided in conjunction with specific embodiments and the accompanying drawings.
[0046] In related technologies, weakly supervised point cloud segmentation in sparse label scenarios mostly adopts self-training methods, such as contrastive learning. When performing weakly supervised learning, most methods force all labeled points (point cloud points carrying semantic labels) in the input point cloud to be fed to the network at once, which ignores and disrupts the important spatial relationships between points, thus impairing network performance.
[0047] Furthermore, related technologies typically employ sparse propagation strategies to construct pseudo-labels for unlabeled points. For example, during prediction, all point cloud points are scored, and the semantic category predictions of a few high-scoring point cloud points are used as pseudo-labels, which are then used as ground truth values in gradient backpropagation during the next training round. Weakly supervised point cloud semantic segmentation methods in related technologies discard most unlabeled points, and the obtained pseudo-labels are also sparse.
[0048] In view of this, this disclosure proposes a semantic label propagation method, a model training method, a device, and an unmanned vehicle that can solve the technical problems existing in related technologies.
[0049] Figure 1 is a flowchart of a semantic label propagation method according to some embodiments of this disclosure. As shown in Figure 1, the semantic label propagation method of this disclosure includes:
[0050] Step S110: Determine the feature representation of the point cloud data in the current training batch.
[0051] The point cloud data in the current training batch includes point cloud points with semantic labels and point cloud points without semantic labels. For example, the current training batch may be the first training batch, the second training batch, or the third training batch, etc.
[0052] In some embodiments, the feature representation of the point cloud data in the current training batch is determined according to the process shown in Figure 2, so that steps S120 and S130 are executed based on the feature representation determined in this way.
[0053] In some embodiments, feature extraction is performed on the point cloud data of the current training batch based on the semantic segmentation backbone network, so as to execute steps S120 and S130 according to the feature representation of the point cloud data determined therefrom.
[0054] In some embodiments, in the semantic label propagation method, features of the point cloud data are extracted based on a feature extraction network different from the semantic segmentation backbone network. For example, the semantic segmentation backbone network uses a first neural network, and the feature extraction network used for semantic label propagation uses a second neural network. In step S110, the point cloud data of the current training batch is input into the second neural network to obtain the feature representation of the point cloud data of the current training batch.
[0055] Step S120: Determine the global centroid of each semantic label in the current training batch based on the feature representation of the point cloud points carrying semantic labels and the global centroid of each semantic label in the previous training batch.
[0056] A prototype, a concept originally proposed for few-shot learning, typically refers to a class centroid in the feature space. A class centroid is also called a class center, and each semantic category (or semantic label class) has its own class centroid. For example, the class centroid of semantic category 1 is 'a', and the class centroid of semantic category 2 is 'b'. Class centroids are typically determined based on the feature representations of point cloud points carrying the same semantic label class.
[0057] In related technologies, most methods force all labeled points (point cloud points carrying semantic labels) in the input point cloud to be fed to the network at once, which ignores and disrupts the important spatial relationships between point cloud points, thereby impairing network performance.
[0058] In contrast, this disclosure proposes keeping the class centroids consistent globally, i.e., consistent at the training set level, and using momentum updates to maintain the global class centroids.
[0059] In some embodiments, the global centroid of each semantic label in the current training batch is determined according to the process shown in Figure 3.
[0060] Step S130: Determine the similarity between point cloud points without semantic labels and the global centroid of each semantic label in the current training batch.
[0061] In some embodiments, step S130 includes: determining the distance between point cloud points without semantic labels and the global centroid of each semantic label in the current training batch; calculating the similarity probability between point cloud points without semantic labels and the global centroid of each semantic label in the current training batch based on the distance, and using the similarity probability as the similarity between point cloud points without semantic labels and the global centroid of each semantic label in the current training batch.
[0062] In some embodiments, the distance from unlabeled point cloud points to the global centroids of each semantic label in the current training batch is determined based on the cosine distance function. Alternatively, other methods can be used to measure the distance between unlabeled points (i.e., point cloud points without semantic labels) and the global centroids. For example, for an unlabeled point p1, the cosine distance between p1's feature representation and the centroids a, b, and c is calculated, and then the similarity between the unlabeled point p1 and the centroids a, b, and c is calculated based on the cosine distance.
[0063] In some other embodiments, step S130 includes: calculating the cosine distance between the point cloud points without semantic labels and the global centroid of each semantic label in the current training batch, and using the cosine distance as the similarity between the two.
[0064] In some other embodiments, step S130 may also employ other methods to calculate the similarity between point cloud points without semantic labels and the global centroid of each semantic label in the current training batch.
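The cosine-based measure used in step S130 can be sketched as below. This is a minimal illustration, not the disclosed implementation. It assumes that the "cosine distance" in the text is computed as the cosine of the angle between two feature vectors, so that a larger value means a closer match; the function name `cosine_similarity_matrix` and the `eps` guard are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(features, centroids, eps=1e-8):
    """Cosine measure between every unlabeled-point feature and every global class centroid.

    features:  (M, D) feature representations of unlabeled points.
    centroids: (K, D) global class centroids of the current training batch.
    Returns an (M, K) array; entry (i, k) compares point i with centroid k.
    """
    # eps avoids division by zero, e.g. for a centroid still at its zero-vector preset.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    c = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + eps)
    return f @ c.T

# Unlabeled point p1 compared with centroids a, b, c and a zero-vector centroid.
p1 = np.array([[1.0, 0.0]])
centroids = np.array([[1.0, 0.0],    # a: same direction as p1
                      [0.0, 1.0],    # b: orthogonal to p1
                      [-1.0, 0.0],   # c: opposite direction
                      [0.0, 0.0]])   # preset zero vector
sim = cosine_similarity_matrix(p1, centroids)
```

With this convention, the centroid pointing in the same direction as p1 receives the largest value, and a zero-vector centroid contributes a neutral value of 0 rather than an undefined result.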
[0065] Step S140: Based on similarity, determine the pseudo-labels of point cloud points that do not carry semantic labels.
[0066] In some embodiments, step S140 includes: determining a pseudo-label for each of the point cloud points that do not carry semantic labels based on similarity. For example, if there are M unlabeled points in the point cloud data of the current training batch, then a pseudo-label is generated for each of the M unlabeled points.
[0067] In some embodiments, similarity is used as a pseudo-label for unlabeled points. In these embodiments, the resulting pseudo-labels are soft labels, distinct from one-hot labels. For example, when similarity specifically refers to similarity probability, the similarity probability between the unlabeled point and the global centroid of each semantic label class in the current training batch is used as the pseudo-label for the unlabeled point. For instance, assuming there are three global centroids a, b, and c in the current training batch, for the unlabeled point p1, its similarity probability with global centroids a, b, and c is used as the pseudo-label for unlabeled point p1.
[0068] In this embodiment, by assigning pseudo-labels to each unlabeled point based on similarity, the generated pseudo-labels are no longer sparse and can fully utilize the information of unlabeled points during model training. Furthermore, by assigning soft labels to each unlabeled point based on similarity, more informative pseudo-labels can be provided compared to assigning one-hot labels to unlabeled points. Therefore, through the above processing, the model training effect in weakly supervised scenarios is greatly improved.
[0069] In other embodiments, step S140 includes: determining the score of each unlabeled point based on similarity; setting pseudo-labels for unlabeled points with scores greater than a preset threshold; and not setting pseudo-labels for unlabeled points with scores less than or equal to the preset threshold. For example, for each unlabeled point in the current training batch, the maximum similarity between it and the global class centroids in the current training batch is used as the score of the unlabeled point. If the score is greater than the preset threshold, a pseudo-label is set for the unlabeled point.
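The threshold-based variant of step S140 can be sketched as follows. This is an illustration under assumptions: the text specifies the score (the maximum similarity) and the comparison with a preset threshold, but not which pseudo-label is assigned to a retained point, so assigning the class of the most similar centroid, the marker `-1` for points left without a pseudo-label, and the name `select_confident_points` are hypothetical choices.

```python
import numpy as np

def select_confident_points(similarity, threshold):
    """Score each unlabeled point and keep only points scoring above the threshold.

    similarity: (M, K) similarity between M unlabeled points and K global class centroids.
    Returns (scores, keep, pseudo_labels); pseudo_labels is -1 where no label is set.
    """
    scores = similarity.max(axis=1)          # maximum similarity is the score
    keep = scores > threshold                # strictly greater than the preset threshold
    pseudo_labels = np.where(keep, similarity.argmax(axis=1), -1)
    return scores, keep, pseudo_labels

similarity = np.array([[0.8, 0.1, 0.1],
                       [0.4, 0.35, 0.25],
                       [0.5, 0.3, 0.2]])
scores, keep, pseudo_labels = select_confident_points(similarity, threshold=0.5)
```

The third point scores exactly 0.5 and is therefore not given a pseudo-label, matching the "less than or equal to" branch in the text.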
[0070] In this embodiment, a dense label propagation strategy is implemented through the above steps. Specifically, the global class centroid of the current training batch is determined based on the feature representation of the labeled points in the current training batch and the global class centroid of the previous training batch. This allows the class centroid to dynamically evolve during model training and remain within the global scope, i.e., the entire training set. This facilitates consistent measurement between the point features of unlabeled points and the class centroid. Furthermore, by calculating the similarity between unlabeled points and the momentum-updated global class centroid, and setting pseudo-labels for unlabeled points based on this similarity, more accurate and semantically meaningful pseudo-labels can be generated, thereby optimizing the model training effect. Moreover, compared with related technologies, this embodiment does not require forcibly inputting all labeled points at once by disrupting the spatial relationships of point cloud points, effectively mitigating the adverse effects on network performance.
[0071] Figure 2 is a schematic flowchart of the process of determining the feature representation of point cloud points according to some embodiments of this disclosure. As shown in Figure 2, the process for determining the feature representation of point cloud points in this embodiment of the present disclosure includes:
[0072] Step S111: Input the point cloud data of the current training batch into the semantic segmentation backbone network to obtain the intermediate feature representation of the point cloud data.
[0073] The point cloud data in the current training batch includes point cloud points with semantic labels and point cloud points without semantic labels.
[0074] The intermediate feature representations of the point cloud data in the current training batch are used to determine the semantic category prediction probabilities of the point cloud data. For example, the intermediate feature representations of the point cloud data in the current training batch are input into the prediction head of the semantic segmentation network to obtain the semantic category prediction probabilities of the point cloud data.
[0075] In some embodiments, the semantic segmentation backbone network may employ existing backbone networks for semantic segmentation in related technologies.
[0076] Step S112: Input the intermediate feature representation of the point cloud data into the projection network to obtain the feature representation of the point cloud data of the current training batch.
[0077] In some embodiments, the projection network employs a neural network already existing in related technologies. For example, it may employ a multilayer perceptron neural network or other neural networks.
[0078] In this embodiment of the disclosure, setting up a projection network decouples the features used for the global class centroids from the features used for network prediction, so that high-level semantics can be captured better. Furthermore, maintaining the global class centroids with the features extracted by the projection network allows the network to be optimized and adjusted more flexibly, yielding a better metric between points and class centroids.
[0079] In this embodiment of the disclosure, the above steps can better extract the features of point cloud data, which helps to generate pseudo-labels with more semantic meaning, thereby improving the training effect of the semantic segmentation model.
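Steps S111 and S112 can be sketched as a two-branch forward pass. This is a toy illustration only: the one-layer "backbone", the linear prediction head, the two-layer perceptron used as the projection network, the layer sizes, and the class name `SegmentationWithProjection` are all assumptions, and a real system would use an existing point cloud segmentation backbone.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

class SegmentationWithProjection:
    """Hypothetical stand-in: shared backbone, prediction head, and projection network."""

    def __init__(self, in_dim=3, feat_dim=16, proj_dim=8, num_classes=3):
        self.w_backbone = rng.normal(scale=0.1, size=(in_dim, feat_dim))
        self.w_head = rng.normal(scale=0.1, size=(feat_dim, num_classes))
        self.w_proj1 = rng.normal(scale=0.1, size=(feat_dim, feat_dim))
        self.w_proj2 = rng.normal(scale=0.1, size=(feat_dim, proj_dim))

    def forward(self, points):
        # Step S111: backbone yields the intermediate feature representation.
        intermediate = relu(points @ self.w_backbone)
        # The prediction head turns intermediate features into class probabilities.
        class_probs = softmax(intermediate @ self.w_head)
        # Step S112: the projection network yields the features used for class centroids.
        projected = relu(intermediate @ self.w_proj1) @ self.w_proj2
        return class_probs, projected

model = SegmentationWithProjection()
points = rng.normal(size=(5, 3))  # five points with xyz coordinates
class_probs, projected = model.forward(points)
```

The point of the structure is that the centroid features (`projected`) and the class predictions (`class_probs`) share the backbone but leave it through separate branches.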
[0080] Figure 3 is a schematic diagram of the process of updating the global class centroid according to some embodiments of this disclosure. As shown in Figure 3, the process for updating the global class centroid in this embodiment of the disclosure includes:
[0081] Step S121: Determine the local centroid of each semantic label in the current training batch based on the feature representation of the point cloud points carrying each semantic label.
[0082] In some embodiments, step S121 includes: calculating the average value of the feature representation of the point cloud points carrying each type of semantic label, and using the average value as the local centroid of that type of semantic label.
[0083] For example, the local class centroid of each semantic label in the current training batch is determined according to the following formula:
[0084] $$\hat{C}_k = \frac{1}{N_k} \sum_{x_i \in X \wedge y_i = k} g_\phi\left(f_\theta(x_i)\right)$$
[0085] where $\hat{C}_k$ is the local class centroid of the k-th semantic label class in the current training batch; $N_k$ is the number of point cloud points carrying the semantic label of class k in the current training batch; $x_i \in X \wedge y_i = k$ indicates that $x_i$ is a point cloud point carrying the semantic label of class k; and $g_\phi(f_\theta(x_i))$ is the feature representation of point cloud point $x_i$. In some embodiments, $f_\theta(x_i)$ denotes the intermediate feature representation of point cloud point $x_i$ extracted by the semantic segmentation backbone network, and $\theta$ denotes the network parameters that the semantic segmentation backbone network needs to learn; $g_\phi(\cdot)$ denotes the projection network, which processes the intermediate feature representation of $x_i$ to finally obtain the feature representation of $x_i$, and $\phi$ denotes the network parameters that the projection network needs to learn.
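Step S121 amounts to a per-class mean over the labeled points of the batch, which can be sketched as below. The convention that `-1` marks a point without a semantic label, the returned `present` mask, and the function name `local_class_centroids` are assumptions made for the illustration.

```python
import numpy as np

def local_class_centroids(features, labels, num_classes):
    """Mean feature representation per semantic label class over labeled points.

    features: (N, D) feature representations of all points in the batch.
    labels:   (N,) integer labels; -1 marks a point without a semantic label (assumed).
    Returns (centroids, present): centroids is (num_classes, D); present[k] is True
    when class k has at least one labeled point in this batch.
    """
    centroids = np.zeros((num_classes, features.shape[1]), dtype=features.dtype)
    present = np.zeros(num_classes, dtype=bool)
    for k in range(num_classes):
        mask = labels == k
        if mask.any():
            centroids[k] = features[mask].mean(axis=0)
            present[k] = True
    return centroids, present

features = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
labels = np.array([0, 0, 1, -1])   # the last point is unlabeled
centroids, present = local_class_centroids(features, labels, num_classes=3)
```

Class 2 has no labeled point in this batch, so its entry in `present` stays False; the unlabeled point does not contribute to any centroid.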
[0086] Step S122: Determine the global centroid of each semantic label in the current training batch based on the local centroid of each semantic label in the current training batch and the global centroid of each semantic label in the previous training batch.
[0087] In some embodiments, step S122 includes: weighting the local class centroid of each semantic label in the current training batch with its global class centroid in the previous training batch, and using the weighted sum as its global class centroid in the current training batch.
[0088] For example, the global class centroid in the current training batch can be determined using the following formula:
[0089] $$C'_k = \lambda C_k + (1 - \lambda)\,\hat{C}_k$$
[0090] where $C'_k$ is the global class centroid of the k-th semantic label class in the current training batch; $C_k$ is the global class centroid of the k-th semantic label class in the previous training batch; $\lambda$ is a set coefficient; and $\hat{C}_k$ is the local class centroid of the k-th semantic label class in the current training batch. In some embodiments, $\lambda$ is a positive number greater than or equal to 0.9 and less than 1. For example, the value of $\lambda$ is 0.9.
[0091] For example, if the point cloud data in the current training batch carries two semantic label classes L1 and L2, and the global centroids of the previous training batch cover three semantic label classes L1, L2, and L3, then the global centroid corresponding to L1 in the current training batch is determined based on the feature representation of the point cloud points carrying L1 in the current training batch and the global centroid of L1 in the previous training batch; the global centroid corresponding to L2 in the current training batch is determined based on the feature representation of the point cloud points carrying L2 in the current training batch and the global centroid of L2 in the previous training batch; and the global centroid corresponding to L3 in the current training batch is determined based on the global centroid corresponding to L3 in the previous training batch.
[0092] In some embodiments, when the current training batch is the first training batch, the global centroid of each semantic label class in the previous training batch is a preset value. For example, the global centroid of each semantic label class is set to the zero vector.
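The momentum update of step S122, together with the two special cases just described (a class absent from the current batch, and the zero-vector preset for the first batch), can be sketched as follows. The sketch assumes that the coefficient λ weights the previous global centroid and (1 - λ) weights the local centroid, as is usual for a momentum update; the function name `update_global_centroids` and the `present` mask are hypothetical.

```python
import numpy as np

def update_global_centroids(prev_global, local, present, lam=0.9):
    """Weighted sum of previous global centroids and current local centroids.

    prev_global: (K, D) global class centroids from the previous training batch.
    local:       (K, D) local class centroids of the current training batch.
    present:     (K,) bool, True where the class has labeled points in this batch.
    Classes absent from the batch keep their previous global centroid.
    """
    new_global = prev_global.copy()
    new_global[present] = lam * prev_global[present] + (1.0 - lam) * local[present]
    return new_global

# Classes L1 and L2 appear in the batch; L3 does not.
prev_global = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
local = np.array([[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
present = np.array([True, True, False])
new_global = update_global_centroids(prev_global, local, present)

# First training batch: the previous global centroids are the zero-vector preset.
first_global = update_global_centroids(np.zeros((3, 2)), local, present)
```

Because λ is close to 1, each batch moves a centroid only slightly, which is what keeps the centroids consistent across the whole training set rather than tied to a single batch.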
[0093] In this embodiment of the disclosure, the above steps enable the class centroid to dynamically evolve and remain in the global scope, i.e. the entire training set, during the model training process. This is beneficial for generating consistent metrics between the point features of unlabeled points and the class centroid, thereby helping to generate pseudo-labels with more semantic meaning and thus optimizing the training effect of the semantic segmentation model.
[0094] Figure 4 is a flowchart of a model training method according to some embodiments of the present disclosure. As shown in Figure 4, some embodiments of the model training method disclosed herein include:
[0095] Step S410: Determine the feature representation of the point cloud data in the current training batch.
[0096] The point cloud data in the current training batch includes point cloud points with semantic labels and point cloud points without semantic labels.
[0097] In some embodiments, in step S410, the feature representation of the point cloud data in the current training batch is determined according to the process shown in Figure 2.
[0098] Step S420: Determine the global centroid of each semantic label in the current training batch based on the feature representation of the point cloud points carrying semantic labels and the global centroid of each semantic label in the previous training batch.
[0099] In some embodiments, in step S420, the global centroid of each semantic label in the current training batch is determined according to the process shown in Figure 3.
[0100] Step S430: Determine the similarity between point cloud points without semantic labels and the global centroid of each semantic label in the current training batch.
[0101] In some embodiments, step S430 includes: for each unlabeled point (i.e., a point cloud point without semantic labels) in the current training batch, calculating the distance between the feature representation of the unlabeled point and each global class centroid (i.e., the global class centroid of each semantic label in the current training batch), calculating the similarity probability between the unlabeled point and each global class centroid based on the distance, and using the similarity probability as the similarity between the unlabeled point and each global class centroid.
[0102] For example, the similarity between an unlabeled point and each current global class centroid can be calculated using the following formula:
[0103] $$s_i^k = \frac{\exp\left(d(q_i, c_k)\right)}{\sum_{k'} \exp\left(d(q_i, c_{k'})\right)}$$
[0104] where $s_i^k$ denotes the similarity probability between the unlabeled point $x_i$ and the current global centroid of the k-th class; $d(q_i, c_k)$ denotes the distance from the unlabeled point $x_i$ to the current global centroid of the k-th class; $d(q_i, c_{k'})$ denotes the distance from the unlabeled point $x_i$ to any current global centroid; $\exp(d(q_i, c_{k'}))$ denotes the exponential of $d(q_i, c_{k'})$ with the natural constant e as the base; and $\sum_{k'} \exp(d(q_i, c_{k'}))$ denotes the sum of $\exp(d(q_i, c_{k'}))$ over all current global centroids.
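As a non-limiting sketch, the similarity probability of step S430 can be computed as a normalized exponential (softmax) over the per-centroid values d. The function name `similarity_probabilities` is hypothetical. The sketch also assumes that d is a cosine score for which a larger value means the point lies closer to the centroid; if d were a true distance, the sign inside the exponential would need to be reversed.

```python
import numpy as np

def similarity_probabilities(q, centroids, eps=1e-12):
    """Similarity probabilities between unlabeled points and class centroids.

    q:         (M, D) feature representations of the unlabeled points
    centroids: (K, D) current global class centroids
    returns:   (M, K) array whose rows sum to 1
    """
    # Normalize to unit length; eps guards against all-zero centroids
    qn = q / (np.linalg.norm(q, axis=1, keepdims=True) + eps)
    cn = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + eps)
    d = qn @ cn.T                          # assumed cosine form of d(q_i, c_k)
    d = d - d.max(axis=1, keepdims=True)   # stability shift; leaves the ratio unchanged
    e = np.exp(d)
    return e / e.sum(axis=1, keepdims=True)

q = np.array([[1.0, 0.0], [0.0, 2.0]])
centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
sim = similarity_probabilities(q, centroids)
```

Each row of `sim` is a probability distribution over the classes, so it can be used directly as a similarity vector for the corresponding unlabeled point.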
[0105] Step S440: Based on similarity, determine the pseudo-labels of point cloud points that do not carry semantic labels.
[0106] In some embodiments, step S440 includes: determining a pseudo-label for each unlabeled point in the current training batch based on similarity.
[0107] In some embodiments, for each unlabeled point in the current training batch, the similarity between the unlabeled point and each global class centroid is used as the pseudo-label of the unlabeled point.
[0108] For example, when similarity is specifically defined as similarity probability, assuming there are three global centroids a, b, and c, for an unlabeled point p1, its similarity probability with global centroids a, b, and c is used as the pseudo-label of the unlabeled point p1.
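As a purely illustrative calculation with hypothetical values, suppose the values d between the feature representation of p1 and the global centroids a, b, and c are 0.9, 0.1, and −0.5, and that the similarity probability is the normalized exponential of these values:

$$s^a = \frac{e^{0.9}}{e^{0.9} + e^{0.1} + e^{-0.5}} \approx \frac{2.4596}{4.1713} \approx 0.590, \qquad s^b \approx \frac{1.1052}{4.1713} \approx 0.265, \qquad s^c \approx \frac{0.6065}{4.1713} \approx 0.145$$

The pseudo-label of p1 would then be the vector (0.590, 0.265, 0.145). Its entries sum to 1, and it records not only that a is the most likely class but also how much weight remains on b and c.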
[0109] In this embodiment of the disclosure, through steps S410 to S440, denser and more semantically meaningful pseudo-labels can be generated, overcoming the shortcomings of related technologies that cannot utilize a large amount of unlabeled point information.
[0110] Step S450: Train the semantic segmentation model based on the semantic labels and pseudo-labels carried by the point cloud data of the current training batch.
[0111] In some embodiments, step S450 includes: determining the value of a first loss function based on the semantic category prediction probability and semantic label of point cloud points carrying semantic labels in the current training batch; determining the value of a second loss function based on the semantic category prediction probability and pseudo label of point cloud points not carrying semantic labels in the current training batch; determining the overall loss function value based on the value of the first loss function and the value of the second loss function; and updating the semantic segmentation model based on the overall loss function value.
[0112] For example, the overall loss function value can be determined using the following formula:
[0113] $$\mathcal{L} = \frac{1}{N_l} \sum_{x_i \in X_l} \mathcal{L}_1(p_i, y_i) + \alpha \frac{1}{N_u} \sum_{x_i \in X_u} \mathcal{L}_2(p_i, \hat{y}_i)$$
[0114] where $\mathcal{L}$ is the overall loss function value; $N_l$ is the number of labeled points in the point cloud data of the current training batch; $N_u$ is the number of unlabeled points in the current training batch; $X_l$ is the set of labeled points in the current training batch; $X_u$ is the set of unlabeled points in the current training batch; $p_i = h_\sigma(f_\theta(x_i))$ is the network prediction probability for point $x_i$; $h_\sigma$ denotes a linear layer with parameters σ; and α is the loss weight. In the first term, $\mathcal{L}_1$ is the first loss function and $y_i$ is the semantic label of a labeled point; in the second term, $\mathcal{L}_2$ is the second loss function and $\hat{y}_i$ is the pseudo-label of an unlabeled point. α is a preset coefficient; in some embodiments, the value of α is 0.1.
[0115] In some embodiments, the first loss function and the second loss function are both cross-entropy loss functions.
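For illustration only, the overall loss of step S450 can be sketched with cross-entropy used for both terms. The function names, the averaging over the labeled and unlabeled points, and the small constant `eps` are assumptions of this sketch; the weight 0.1 follows the value of α given above.

```python
import numpy as np

def cross_entropy(probs, targets, eps=1e-12):
    """Mean cross-entropy between predictions and hard or soft targets, both (N, K)."""
    return float(-(targets * np.log(probs + eps)).sum(axis=1).mean())

def overall_loss(p_labeled, y_labeled, p_unlabeled, pseudo, alpha=0.1):
    """Combine the loss on labeled points with the loss on unlabeled points.

    p_labeled:   (N_l, K) predicted class probabilities of labeled points
    y_labeled:   (N_l,) integer semantic labels
    p_unlabeled: (N_u, K) predicted class probabilities of unlabeled points
    pseudo:      (N_u, K) soft pseudo-labels (similarity probabilities)
    """
    num_classes = p_labeled.shape[1]
    one_hot = np.eye(num_classes)[y_labeled]
    loss_labeled = cross_entropy(p_labeled, one_hot)     # first loss function
    loss_unlabeled = cross_entropy(p_unlabeled, pseudo)  # second loss function
    return loss_labeled + alpha * loss_unlabeled

p_l = np.array([[0.5, 0.5]])
y_l = np.array([0])
p_u = np.array([[0.5, 0.5]])
pseudo = np.array([[0.7, 0.3]])
total = overall_loss(p_l, y_l, p_u, pseudo)
```

Because the pseudo-labels are soft, the second term compares two full probability vectors instead of a prediction and a single class index.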
[0116] In this embodiment of the disclosure, the above steps can generate denser and more semantically meaningful pseudo-labels, and the semantic segmentation model can be trained based on the generated pseudo-labels and the original semantic labels, thereby greatly improving the performance of the trained semantic segmentation model.
[0117] In some embodiments of this disclosure, a semantic segmentation method is also provided, comprising: performing semantic segmentation on point cloud data to be processed based on a semantic segmentation model trained by the model training method in the foregoing embodiments, so as to obtain a semantic segmentation result.
[0118] In this embodiment of the disclosure, the semantic segmentation model trained using the model training method described in the foregoing embodiments can greatly improve the accuracy of point cloud semantic segmentation results.
[0119] Figure 5 is a schematic diagram of the structure of a semantic label propagation apparatus according to some embodiments of the present disclosure. As shown in Figure 5, the semantic label propagation apparatus 500 of some embodiments of this disclosure includes: a feature extraction module 510, a class centroid maintenance module 520, a similarity determination module 530, and a label determination module 540.
[0120] The feature extraction module 510 is configured to determine the feature representation of the point cloud data in the current training batch.
[0121] The point cloud data in the current training batch includes point cloud points with semantic labels and point cloud points without semantic labels. For example, the current training batch may be the first training batch, the second training batch, or the third training batch, etc.
[0122] In some embodiments, the feature extraction module 510 is configured to input the point cloud data of the current training batch into the semantic segmentation backbone network to obtain the intermediate feature representation of the point cloud data, and input the intermediate feature representation of the point cloud data into the projection network to obtain the feature representation of the point cloud data of the current training batch. The intermediate feature representation of the point cloud data is used to determine the semantic category prediction probability of the point cloud data, while the feature representation of the point cloud data processed by the projection network is used to generate pseudo-labels.
[0123] The class centroid maintenance module 520 is configured to determine the global class centroid of each semantic label in the current training batch based on the feature representation of the point cloud points carrying semantic labels and the global class centroid of each semantic label in the previous training batch.
[0124] In some embodiments, the class centroid maintenance module 520 is configured to: determine the local class centroid of each semantic label in the current training batch based on the feature representation of the point cloud points carrying each semantic label; and determine the global class centroid of each semantic label in the current training batch based on the local class centroid of each semantic label in the current training batch and the global class centroid of each semantic label in the previous training batch.
[0125] In some embodiments, when the current training batch is the first training batch, the global centroid of each semantic label in the previous training batch is a preset value, such as the zero vector.
[0126] The similarity determination module 530 is configured to determine the similarity between point cloud points that do not carry semantic labels and the global class centroid of each semantic label in the current training batch.
[0127] For instructions on how to calculate similarity, please refer to the relevant descriptions in the other embodiments mentioned above.
[0128] The label determination module 540 is configured to determine pseudo-labels for point cloud points that do not carry semantic labels based on similarity.
[0129] In some embodiments, the label determination module 540 is configured to determine a pseudo-label for each unlabeled point in the current training batch based on similarity.
[0130] In some embodiments, the label determination module 540 is configured to use the similarity between the unlabeled point and each global class centroid as the pseudo-label of the unlabeled point.
[0131] In this embodiment, a dense label propagation strategy is implemented using the above-described apparatus. By determining the global class centroid of the current training batch based on the feature representations of the labeled points in the current batch and the global class centroid of the previous training batch, the class centroid dynamically evolves and remains within the global scope during model training. This facilitates consistent measurement between the point features of unlabeled points and the class centroid. Furthermore, by calculating the similarity between unlabeled points and the momentum-updated global class centroid, and setting pseudo-labels for unlabeled points based on this similarity, more accurate and semantically meaningful pseudo-labels can be generated, thereby optimizing the model training effect. Moreover, compared with related technologies, this embodiment does not require all labeled points to be input at once, effectively mitigating the adverse impact on network performance.
[0132] Figure 6 is a schematic diagram of the structure of a model training apparatus according to some embodiments of the present disclosure. As shown in Figure 6, the model training apparatus 600 of some embodiments of this disclosure includes: a feature extraction module 610, a class centroid maintenance module 620, a similarity determination module 630, a label determination module 640, and a training module 650.
[0133] The feature extraction module 610 is configured to determine the feature representation of the point cloud data in the current training batch.
[0134] The point cloud data in the current training batch includes point cloud points with semantic labels and point cloud points without semantic labels.
[0135] The class centroid maintenance module 620 is configured to determine the global class centroid of each semantic label in the current training batch based on the feature representation of the point cloud points carrying semantic labels and the global class centroid of each semantic label in the previous training batch.
[0136] The similarity determination module 630 is configured to determine the similarity between point cloud points that do not carry semantic labels and the global class centroid of each semantic label in the current training batch.
[0137] The label determination module 640 is configured to determine pseudo-labels for the point cloud points that do not carry semantic labels based on similarity.
[0138] Training module 650 is configured to train the semantic segmentation model based on the semantic labels and pseudo-labels carried by the point cloud data of the current training batch.
[0139] In some embodiments, the training module 650 is configured to: determine the value of a first loss function based on the semantic category prediction probability and semantic label of point cloud points carrying semantic labels in the current training batch; determine the value of a second loss function based on the semantic category prediction probability and pseudo label of point cloud points not carrying semantic labels in the current training batch; determine the overall loss function value based on the value of the first loss function and the value of the second loss function; and update the semantic segmentation model based on the overall loss function value.
[0140] In this embodiment of the disclosure, the above-mentioned device can generate denser and more semantically meaningful pseudo-labels, and can train the semantic segmentation model based on the generated pseudo-labels and the original semantic labels, thereby greatly improving the performance of the trained semantic segmentation model.
[0141] In some embodiments of this disclosure, a semantic segmentation apparatus is also provided, comprising: a semantic segmentation model trained based on the model training method in the foregoing embodiments, and a module for semantic segmentation of point cloud data to be processed.
[0142] Figure 7 is a schematic diagram of a model training framework according to some embodiments of the present disclosure.
[0143] This disclosure provides a novel learning framework (i.e., model training framework) for weakly supervised point cloud semantic segmentation, which overcomes the deficiencies in related technologies by generating denser and more semantically meaningful pseudo-labels.
[0144] As shown in Figure 7, the model training framework of this disclosure includes two components: an "adaptive prototyping" component and a "dense label propagation" component. The "adaptive prototyping" component captures global information for each semantic class by maintaining a class centroid for each class. The class centroids evolve dynamically during model training and remain global, i.e., they reflect the entire training set. This facilitates consistent measurement between the features of unlabeled points and the class centroids. Building on the "adaptive prototyping" component, a "dense label propagation" component is further proposed to generate point-wise supervision. To provide informative labels and avoid the difficulty of choosing score thresholds, the "dense label propagation" component assigns soft labels to all unlabeled points; the soft labels are obtained by estimating similarity probabilities from the distances between the features of the unlabeled points and the class centroids. Compared with the one-hot labels widely used in existing methods, this strategy reflects uncertainty and provides more informative labels by using the similarity probabilities as semantic cues.
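The contrast between thresholded one-hot labels and soft labels can be illustrated with a small sketch. The similarity values and the threshold of 0.8 are hypothetical, and the thresholded variant is shown only as a generic example of the one-hot style, not as any specific prior method.

```python
import numpy as np

# Similarity probabilities of two unlabeled points over three classes (hypothetical)
sim = np.array([[0.90, 0.06, 0.04],   # confident point
                [0.40, 0.35, 0.25]])  # ambiguous point

# Thresholded one-hot labeling: only points above the score threshold are supervised
threshold = 0.8
confident = sim.max(axis=1) >= threshold
hard = np.eye(sim.shape[1])[sim.argmax(axis=1)]
hard[~confident] = 0.0  # points below the threshold receive no label

# Soft labeling: every unlabeled point keeps its full similarity vector
soft = sim.copy()
```

Here the ambiguous point receives no supervision under the thresholded scheme, whereas its soft label still carries the relative weights of all three classes and no threshold has to be chosen.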
[0145] The model training process in this disclosure embodiment is described below in conjunction with the "adaptive prototyping" and "dense label propagation" components. In some embodiments, the process includes steps 710 to 740.
[0146] Step 710: Feature extraction.
[0147] In this embodiment, the entire semantic segmentation network consists of a backbone network, a prediction head, and an additional projection network. The backbone network outputs an intermediate feature representation for each 3D point cloud point, which is then input into the prediction head to obtain a semantic category prediction value for each 3D point cloud point. The projection network processes the intermediate feature representation of each 3D point cloud point to obtain the final feature representation used for semantic label propagation.
[0148] In step 710, the point cloud data of the current training batch is input into the semantic segmentation backbone network to obtain the intermediate feature representation of the point cloud data. The intermediate feature representation of the point cloud data is then input into the projection network to obtain the feature representation of the point cloud data of the current training batch.
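The data flow of step 710 can be sketched as follows. This is only a structural illustration: a real backbone would be a point cloud segmentation network, whereas here single per-point linear maps stand in for the backbone, the prediction head, and the projection network, and all sizes and weight names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stability shift
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical sizes: 3-D coordinates, 16-D intermediate features,
# 8-D projected features, 4 semantic classes
W_backbone = rng.normal(size=(3, 16))
W_head = rng.normal(size=(16, 4))
W_proj = rng.normal(size=(16, 8))

def forward(points):
    inter = relu(points @ W_backbone)  # intermediate feature per point (backbone stand-in)
    probs = softmax(inter @ W_head)    # semantic category prediction probability
    feats = inter @ W_proj             # feature representation used for label propagation
    return probs, feats

points = rng.normal(size=(5, 3))
probs, feats = forward(points)
```

The two outputs share the same intermediate features: `probs` feeds the loss computation, while `feats` is what the class centroids and the similarity calculation operate on.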
[0149] The point cloud data in the current training batch includes point cloud points with semantic labels and point cloud points without semantic labels.
[0150] Step 720: Update the class centroid in the "Adaptive Prototype" component.
[0151] In step 720, from the feature representations of the point cloud obtained by the projection network, feature representations of point cloud points with semantic labels are selected, and the corresponding class centroids in the "adaptive prototype" component are updated based on the feature representations of point cloud points with semantic labels. That is, the class centroid of the current training batch is determined based on the feature representations of point cloud points with semantic labels in the current training batch and the class centroids of the previous training batch. The class centroids maintained by the "adaptive prototype" component are the global class centroids.
[0152] In some embodiments, the class centroids in the "Adaptive Prototype" component are updated according to the following formulas:
[0153] $$\hat{C}_k = \frac{1}{N_k} \sum_{x_i \in X \wedge y_i = k} q_i$$
[0154] $$C'_k = \lambda C_k + (1 - \lambda)\,\hat{C}_k$$
[0155] where $\hat{C}_k$ is the local class centroid of the k-th semantic label in the current training batch; $N_k$ is the number of point cloud points carrying the k-th semantic label in the current training batch; $x_i \in X \wedge y_i = k$ indicates that $x_i$ is a point cloud point carrying the k-th semantic label; $q_i$ is the feature representation of point $x_i$; $C'_k$ is the global class centroid of the k-th semantic label in the current training batch; $C_k$ is the global class centroid of the k-th semantic label in the previous training batch; and λ is a preset coefficient that weights the previous global class centroid against the local class centroid. In some embodiments, λ is greater than or equal to 0.9 and less than 1; for example, the value of λ is 0.9.
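As a worked illustration, assume the update takes the weighted-sum form $C'_k = \lambda C_k + (1 - \lambda)\hat{C}_k$ with λ = 0.9, that the global centroid starts from the zero vector, and, hypothetically, that the local centroid of class k equals the same vector m in every batch. Writing $C_k^{(t)}$ for the global centroid after t batches:

$$C_k^{(1)} = \lambda \cdot 0 + (1 - \lambda)\,m = 0.1\,m$$

$$C_k^{(t)} = (1 - \lambda^{t})\,m, \qquad C_k^{(10)} = (1 - 0.9^{10})\,m \approx 0.651\,m$$

Under these assumptions the global centroid points in the direction of m from the first batch onward, and only its magnitude grows toward m over time. A direction-based measure such as a cosine function would therefore not be affected by the zero-vector initialization once a class has been observed.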
[0156] Step 730: Generate pseudo-labels.
[0157] In this step, based on the "dense label propagation" component, the similarity between the feature representation of unlabeled points in the current training batch and the updated class centroids (i.e., the global class centroids in the current training batch) is determined; based on the similarity, pseudo-labels are generated for each unlabeled point in the current training batch.
[0158] In some embodiments, the similarity between the feature representations of unlabeled points in the current training batch and the updated centroids of each class is determined according to the following formula:
[0159] $$s_i^k = \frac{\exp\left(d(q_i, c_k)\right)}{\sum_{k'} \exp\left(d(q_i, c_{k'})\right)}$$
[0160] where $s_i^k$ denotes the similarity probability between the unlabeled point $x_i$ and the global centroid of the k-th class in the current training batch; $d(q_i, c_k)$ denotes the distance from the unlabeled point $x_i$ to the global centroid of the k-th class in the current training batch; $d(q_i, c_{k'})$ denotes the distance from the unlabeled point $x_i$ to any current global centroid; $\exp(d(q_i, c_{k'}))$ denotes the exponential of $d(q_i, c_{k'})$ with the natural constant e as the base; and $\sum_{k'} \exp(d(q_i, c_{k'}))$ denotes the sum of $\exp(d(q_i, c_{k'}))$ over all current global centroids.
[0161] In some embodiments, pseudo-labels are generated for each unlabeled point in the current training batch according to the following formula:
[0162] $$\hat{y}_i = \left[s_i^1, s_i^2, \ldots, s_i^K\right]$$
[0163] where $\hat{y}_i$ denotes the pseudo-label of the unlabeled point $x_i$, namely the vector composed of the similarities $s_i^k$ between the unlabeled point $x_i$ and each of the K global class centroids in the current training batch.
[0164] Step 740: Calculate the loss function and update the semantic segmentation model.
[0165] After the pseudo-labels are generated, the loss function is calculated based on the semantic labels carried by the current training batch, the generated pseudo-labels, and the semantic category prediction probabilities of the point cloud points in the current training batch, yielding the value of the total loss function shown in Figure 7.
[0166] In some embodiments, the overall loss function value is determined according to the following formula:
[0167] $$\mathcal{L} = \frac{1}{N_l} \sum_{x_i \in X_l} \mathcal{L}_1(p_i, y_i) + \alpha \frac{1}{N_u} \sum_{x_i \in X_u} \mathcal{L}_2(p_i, \hat{y}_i)$$
[0168] where $\mathcal{L}$ is the overall loss function value; $N_l$ is the number of labeled points in the point cloud data of the current training batch; $N_u$ is the number of unlabeled points in the current training batch; $X_l$ is the set of labeled points in the current training batch; $X_u$ is the set of unlabeled points in the current training batch; $p_i = h_\sigma(f_\theta(x_i))$ is the network prediction probability for point $x_i$; $h_\sigma$ denotes a linear layer with parameters σ; and α is the loss weight. In the first term, $\mathcal{L}_1$ is the first loss function and $y_i$ is the semantic label of a labeled point; in the second term, $\mathcal{L}_2$ is the second loss function and $\hat{y}_i$ is the pseudo-label of an unlabeled point. α is a preset coefficient; in some embodiments, the value of α is 0.1. In some embodiments, the first loss function and the second loss function can both be cross-entropy loss functions.
[0169] After obtaining the value of the loss function, the semantic segmentation network model is updated based on this value. Through multiple iterations of training, the final trained semantic segmentation network model is obtained.
[0170] Figure 8 is a schematic diagram of the structure of a semantic label propagation apparatus, a model training apparatus, or a semantic segmentation apparatus according to some embodiments of the present disclosure.
[0171] As shown in Figure 8, the semantic label propagation apparatus, model training apparatus, or semantic segmentation apparatus 800 includes a memory 810 and a processor 820 coupled to the memory 810. The memory 810 is used to store instructions for executing the corresponding embodiments of the semantic label propagation method, model training method, or semantic segmentation method. The processor 820 is configured to execute the semantic label propagation method, model training method, or semantic segmentation method in any of the embodiments of this disclosure based on the instructions stored in the memory 810.
[0172] Figure 9 is a schematic diagram of the structure of a computer system according to some embodiments of the present disclosure.
[0173] As shown in Figure 9, the computer system 900 can be represented in the form of a general-purpose computing device. The computer system 900 includes a memory 910, a processor 920, and a bus 930 connecting different system components.
[0174] The memory 910 may include, for example, system memory, non-volatile storage media, etc. The system memory may store, for example, an operating system, application programs, a boot loader, and other programs. The system memory may include volatile storage media, such as random access memory (RAM) and / or cache memory. The non-volatile storage media may store, for example, instructions for a corresponding embodiment of at least one semantic tag propagation method, model training method, or semantic segmentation method being executed. Non-volatile storage media include, but are not limited to, disk storage, optical storage, flash memory, etc.
[0175] The processor 920 can be implemented using a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gates or transistors, or other discrete hardware components. Correspondingly, each module, such as the feature extraction module and the class centroid maintenance module, can be implemented by a central processing unit (CPU) executing instructions stored in memory, or by dedicated circuitry that performs the corresponding steps.
[0176] Bus 930 can use any of a variety of bus architectures. For example, bus architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, and Peripheral Component Interconnect (PCI) bus.
[0177] The interfaces 940, 950, and 960 of the computer system 900, as well as the memory 910 and the processor 920, can be connected via the bus 930. The input/output interface 940 provides a connection interface for input/output devices such as monitors, mice, and keyboards. The network interface 950 provides a connection interface for various networked devices. The storage interface 960 provides a connection interface for external storage devices such as floppy disks, USB flash drives, and SD cards.
[0178] Figure 10 is a schematic diagram of the structure of an unmanned vehicle according to some embodiments of the present disclosure; Figure 11 is a perspective view of an unmanned vehicle according to some embodiments of the present disclosure. The unmanned vehicle provided in the embodiments of this disclosure is described below with reference to Figure 10 and Figure 11.
[0179] As shown in Figure 10, the unmanned vehicle consists of four parts: chassis module 1010, autonomous driving module 1020, remote monitoring streaming module 1030, and cargo container module 1040.
[0180] In some embodiments, the chassis module 1010 mainly includes a battery, a power management device, a chassis controller, a motor driver, and a drive motor. The battery provides power to the entire autonomous vehicle system, and the power management device converts the battery output into different voltage levels usable by various functional modules and controls the power-on and power-off states. The chassis controller receives motion commands from the autonomous driving module and controls the autonomous vehicle's steering, forward movement, reverse movement, braking, etc.
[0181] In some embodiments, the autonomous driving module 1020 includes a core processing unit (Orin or Xavier module), a traffic light recognition camera, front, rear, left, and right surround view cameras, a multi-line LiDAR, a positioning module (such as BeiDou, GPS, etc.), and an inertial navigation unit. The cameras and the autonomous driving module can communicate with each other; to improve transmission speed and reduce wiring, GMSL link communication can be used.
[0182] In some embodiments, the autonomous driving module 1020 includes the semantic label propagation device, model training device, or semantic segmentation device described in the above embodiments.
[0183] In some embodiments, the remote monitoring streaming module 1030 consists of a front monitoring camera, a rear monitoring camera, a left monitoring camera, a right monitoring camera, and a streaming module. This module transmits the video data collected by the monitoring cameras to a backend server for viewing by backend operators. The wireless communication module communicates with the backend server via an antenna, enabling backend operators to remotely control the unmanned vehicle.
[0184] The cargo container module 1040 is the cargo-carrying device for the unmanned vehicle. In some embodiments, the cargo container module 1040 is also equipped with a display and interaction module, which is used for interaction between the unmanned vehicle and the user. The user can perform operations such as picking up items, storing goods, and purchasing goods through the display and interaction module. The type of cargo container can be changed according to actual needs. For example, in a logistics scenario, the cargo container may include multiple sub-containers of different sizes, which can be used to load goods for delivery. In a retail scenario, the cargo container can be set as a transparent container so that users can see the products for sale at a glance.
[0185] The unmanned vehicle of this disclosure can improve the accuracy of point cloud semantic segmentation results, thereby helping to improve the safety of unmanned driving.
[0186] Various aspects of this disclosure are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatus, and computer program products according to embodiments of this disclosure. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations thereof, can be implemented by computer-readable program instructions.
[0187] These computer-readable program instructions are provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable device to produce a machine, such that execution of the instructions by the processor produces means for implementing the functions specified in one or more boxes of the flowchart and / or block diagram.
[0188] These computer-readable program instructions may also be stored in a computer-readable storage medium. These instructions cause a computer to work in a particular manner to produce an article of manufacture, including instructions that implement the functions specified in one or more boxes in a flowchart and / or block diagram.
[0189] This disclosure may take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects.
[0190] Through the semantic label propagation, model training methods, devices, and unmanned vehicles described in the above embodiments, denser and more semantically meaningful pseudo-labels can be generated, improving the training effect of semantic segmentation models in weakly supervised scenarios and optimizing the performance of semantic segmentation models.
[0191] This concludes the detailed description of the semantic label propagation and model training methods, the apparatus, and the unmanned vehicle according to this disclosure. To avoid obscuring the concept of this disclosure, some details known in the art have not been described. Those skilled in the art will fully understand how to implement the technical solutions disclosed herein based on the above description.
Claims
1. A semantic label propagation method, comprising: determining the feature representation of the point cloud data in the current training batch, wherein the point cloud data in the current training batch includes point cloud points with semantic labels and point cloud points without semantic labels; determining the global centroid of each semantic label in the current training batch based on the feature representation of the point cloud points carrying semantic labels and the global centroid of each semantic label in the previous training batch, including: determining the local centroid of each semantic label in the current training batch based on the feature representation of the point cloud points carrying each semantic label, and determining the global centroid of each semantic label in the current training batch based on the local centroid of each semantic label in the current training batch and the global centroid of each semantic label in the previous training batch; determining the similarity between the point cloud points without semantic labels and the global centroid of each semantic label in the current training batch; and determining, based on the similarity, pseudo-labels for the point cloud points without semantic labels.
2. The semantic tag propagation method according to claim 1, wherein, Based on the feature representation of the point cloud points carrying each semantic label, the local centroid of each semantic label in the current training batch is determined as follows: The average value is calculated for the feature representation of the point cloud points carrying each semantic label, and the average value is used as the local centroid of that semantic label.
3. The semantic tag propagation method according to claim 1, wherein, Based on the local centroid of each semantic label class in the current training batch and the global centroid of each semantic label class in the previous training batch, the global centroid of each semantic label class in the current training batch is determined as follows: The local centroid of each semantic label in the current training batch is weighted and combined with its global centroid in the previous training batch, and the weighted sum is used as its global centroid in the current training batch.
4. The semantic tag propagation method according to any one of claims 1 to 3, wherein, The feature representation of the point cloud data in the current training batch includes: The point cloud data of the current training batch is input into the semantic segmentation backbone network to obtain the intermediate feature representation of the point cloud data, wherein the intermediate feature representation is used to determine the semantic category prediction probability of the point cloud data. The intermediate feature representation of the point cloud data is input into the projection network to obtain the feature representation of the point cloud data of the current training batch.
5. The semantic label propagation method according to any one of claims 1 to 3, wherein determining, based on the similarity, the pseudo-labels for the point cloud points not carrying semantic labels comprises: determining, based on the similarity, a pseudo-label for each of the point cloud points not carrying semantic labels.
6. The semantic label propagation method according to any one of claims 1 to 3, wherein determining the similarity between the point cloud points not carrying semantic labels and the global centroid of each semantic label in the current training batch comprises: determining a distance from each point cloud point not carrying a semantic label to the global centroid of each semantic label in the current training batch; and calculating, based on the distance, a similarity probability between that point cloud point and the global centroid of each semantic label in the current training batch, and using the similarity probability as the similarity.
7. The semantic label propagation method according to claim 6, wherein the distance from the point cloud points not carrying semantic labels to the global centroid of each semantic label in the current training batch is determined using a cosine distance function.
8. The semantic label propagation method according to claim 6, wherein determining, based on the similarity, the pseudo-labels for the point cloud points not carrying semantic labels comprises: using the similarity probability as the pseudo-label for the point cloud points not carrying semantic labels.
9. The semantic label propagation method according to any one of claims 1 to 3, wherein, when the current training batch is the first training batch, the global centroid of each semantic label in the previous training batch is a preset value.
10. The semantic label propagation method according to claim 9, wherein the preset value is a zero vector.
11. The semantic label propagation method according to claim 5, wherein determining, based on the similarity, the pseudo-label for each of the point cloud points not carrying semantic labels comprises: using the similarity as the pseudo-label for the point cloud points not carrying semantic labels.
12. A model training method, comprising: determining, according to the semantic label propagation method of any one of claims 1 to 11, pseudo-labels for the point cloud points not carrying semantic labels in the current training batch; and training a semantic segmentation model based on the semantic labels carried in the current training batch and the pseudo-labels.
13. The model training method according to claim 12, wherein training the semantic segmentation model based on the semantic labels carried in the current training batch and the pseudo-labels comprises: determining a value of a first loss function based on the semantic category prediction probability of the point cloud points carrying semantic labels in the current training batch and the semantic labels; determining a value of a second loss function based on the semantic category prediction probability of the point cloud points not carrying semantic labels in the current training batch and the pseudo-labels; determining an overall loss function value based on the values of the first loss function and the second loss function; and updating the semantic segmentation model based on the overall loss function value.
14. A semantic segmentation method, comprising: performing semantic segmentation on point cloud data to be processed using a semantic segmentation model trained by the model training method of claim 12 or 13.
15. An apparatus, comprising: a module for performing the semantic label propagation method of any one of claims 1 to 11, or a module for performing the model training method of claim 12 or 13, or a module for performing the semantic segmentation method of claim 14.
16. An electronic device, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the semantic label propagation method of any one of claims 1 to 11, the model training method of claim 12 or 13, or the semantic segmentation method of claim 14.
17. A computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the semantic label propagation method of any one of claims 1 to 11, the model training method of claim 12 or 13, or the semantic segmentation method of claim 14.
18. An unmanned vehicle, comprising: the apparatus of claim 15, or the electronic device of claim 16.
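
By way of non-limiting illustration only, the steps recited in claims 1 to 3, 6 to 10, and 13 might be sketched in Python as follows. The sketch is not the claimed implementation. The function names, the momentum weight of 0.9, the softmax with a temperature of 0.1, the use of cross-entropy for both loss functions, and the loss weight are all assumptions introduced for the example; the claims specify only an average, a weighted sum, a cosine distance, a similarity probability, and two loss functions combined into an overall loss.

```python
import math

def local_centroids(features, labels):
    """Claim 2: average the features of the labeled points of each class."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        if label is None:  # None marks a point without a semantic label
            continue
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def update_global_centroids(prev_global, local, momentum=0.9):
    """Claim 3: weighted sum of previous global and current local centroid.
    The momentum value is an assumed weight. Classes absent from the batch
    keep their previous centroid (also an assumption)."""
    new_global = {c: list(v) for c, v in prev_global.items()}
    for c, loc in local.items():
        prev = prev_global.get(c, [0.0] * len(loc))  # zero preset, claims 9-10
        new_global[c] = [momentum * p + (1.0 - momentum) * l
                         for p, l in zip(prev, loc)]
    return new_global

def cosine_similarity(a, b):
    """One minus the cosine distance of claim 7; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def soft_pseudo_labels(features, labels, global_centroids, temperature=0.1):
    """Claims 6 and 8: a similarity probability per class, used directly as a
    soft pseudo-label. The softmax and temperature are assumptions."""
    classes = sorted(global_centroids)
    pseudo = {}
    for idx, (feat, label) in enumerate(zip(features, labels)):
        if label is not None:
            continue
        logits = [cosine_similarity(feat, global_centroids[c]) / temperature
                  for c in classes]
        peak = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        pseudo[idx] = [e / total for e in exps]
    return classes, pseudo

def overall_loss(pred_probs, labels, classes, pseudo, weight=1.0):
    """Claim 13: first loss on labeled points, second loss on pseudo-labeled
    points, combined here as first + weight * second (assumed combination)."""
    eps = 1e-12
    sup = [-math.log(pred_probs[i][classes.index(y)] + eps)
           for i, y in enumerate(labels) if y is not None]
    unsup = [-sum(q * math.log(p + eps) for q, p in zip(target, pred_probs[i]))
             for i, target in pseudo.items()]
    first = sum(sup) / len(sup) if sup else 0.0
    second = sum(unsup) / len(unsup) if unsup else 0.0
    return first + weight * second

# Toy batch: three labeled points and two unlabeled points in a 2-D feature space.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2], [0.1, 0.9]]
labels = [0, 0, 1, None, None]
prev_global = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # first batch: zero-vector preset

local = local_centroids(features, labels)
global_centroids = update_global_centroids(prev_global, local)
classes, pseudo = soft_pseudo_labels(features, labels, global_centroids)

# Hypothetical model outputs: a uniform prediction for every point.
pred_probs = [[0.5, 0.5] for _ in features]
loss = overall_loss(pred_probs, labels, classes, pseudo)
```

In this toy batch, the unlabeled point at index 3 lies close to the class 0 centroid and the point at index 4 lies close to the class 1 centroid, so each receives a soft pseudo-label weighted toward the corresponding class. Because cosine similarity is invariant to scale, starting the global centroids from zero vectors shrinks their magnitude in early batches but does not change their direction.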