Edge position labeling method and apparatus
A user marks a coarse bounding box in 3D point cloud data, and the edge position of the obstacle is then detected automatically. This addresses the low efficiency of fully manual annotation in existing technologies and makes the annotation process more efficient.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING SANKUAI ONLINE TECH CO LTD
- Filing Date
- 2022-12-26
- Publication Date
- 2026-06-26
AI Technical Summary
In existing technologies, developers must manually mark the edge positions of obstacles, which makes annotation inefficient.
This application provides an edge position annotation method that displays 3D point cloud data and obtains a coarse bounding box marked by the user. The edge position of the obstacle is then detected from the coarse bounding box and the 3D point cloud data, reducing the need for manual marking.
This improves operational efficiency, removes the need to manually mark obstacle edges, and simplifies the annotation workflow.

Figure CN115965936B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to a method and device for edge location marking.
Background Technology
[0002] With the continuous development of autonomous driving technology, the terminal can detect obstacles in the surrounding environment by collecting three-dimensional point cloud data.
[0003] Specifically, the terminal invokes an obstacle detection model to detect obstacles in the 3D point cloud data to obtain the obstacles in the surrounding environment. Before using the obstacle detection model, it needs to be trained. The sample data used for training requires developers to manually label the edge positions of obstacles in the 3D point cloud data so that the obstacle detection model can be trained based on the 3D point cloud data with labeled edge positions.
[0004] However, this process is inefficient because developers must ensure that each marked edge is precisely aligned with the actual edge of the obstacle.
Summary of the Invention
[0005] This application provides an edge location marking method and device, which overcomes the limitation of requiring users to manually mark the edge locations of obstacles and improves operational efficiency. The technical solution is as follows:
[0006] In one aspect, an edge position annotation method is provided, the method comprising:
[0007] displaying 3D point cloud data of obstacles detected by a target object;
[0008] acquiring, in response to an annotation operation on the 3D point cloud data, a coarse bounding box for the obstacle, wherein the edge of the obstacle is within the coverage area of the coarse bounding box; and
[0009] detecting the obstacle based on the 3D point cloud data and the coarse bounding box to obtain the edge position of the obstacle.
[0010] In another aspect, an edge position marking device is provided, the device comprising:
[0011] The display module is configured to display the 3D point cloud data of obstacles detected by the target object;
[0012] The acquisition module is configured to acquire a coarse bounding box for the obstacle in response to an annotation operation on the 3D point cloud data, wherein the edge of the obstacle is within the coverage area of the coarse bounding box;
[0013] The detection module is configured to detect the obstacle based on the 3D point cloud data and the coarse bounding box to obtain the edge position of the obstacle.
[0014] In one possible implementation, the device further includes:
[0015] The generation module is configured to generate training data based on the edge position of the obstacle and the three-dimensional point cloud data. The training data is used to train an obstacle detection model, which performs obstacle detection.
[0016] In one possible implementation, the detection module includes:
[0017] The determining unit is configured to determine multiple key points corresponding to the coarse bounding box based on the position of each point in the three-dimensional point cloud data and the coarse bounding box;
[0018] The fusion unit is configured to fuse, based on the acquired key point feature of each key point and the position of the coarse bounding box, the key point features of the key points corresponding to the coarse bounding box, so as to obtain the bounding box feature of the coarse bounding box;
[0019] The detection unit is configured to detect the bounding box feature to obtain the edge position of the obstacle corresponding to the coarse bounding box.
[0020] In one possible implementation, the determining unit is configured to:
[0021] Determine, based on the position of each point in the three-dimensional point cloud data, the centroid corresponding to the coarse bounding box, wherein the centroid may be any point corresponding to the coarse bounding box;
[0022] Determine, among the multiple points corresponding to the coarse bounding box, the point farthest from the centroid as a key point;
[0023] For each remaining point, obtain the distance between the point and the centroid and the distance between the point and each determined key point, and take the minimum of these distances as the distance corresponding to the point;
[0024] Determine the remaining point with the largest corresponding distance as the next key point, and repeat until a first preset number of key points are obtained.
[0025] In one possible implementation, the determining unit is configured to:
[0026] Obtain the mean position of the multiple points corresponding to the coarse bounding box, and determine the point at that mean position as the centroid;
[0027] or,
[0028] Determine any one of the multiple points corresponding to the coarse bounding box as the centroid.
[0029] In one possible implementation, the device further includes:
[0030] The partitioning module is configured to divide the three-dimensional point cloud data into a second preset number of two-dimensional regions from a two-dimensional view.
[0031] The feature extraction module is configured to extract features from each of the second preset number of two-dimensional regions to obtain a first feature matrix, wherein each element of the first feature matrix indicates the first feature of the corresponding two-dimensional region.
[0032] The feature extraction module is further configured to extract features from the first feature matrix to obtain a second feature matrix, wherein each element of the second feature matrix indicates the second feature of the corresponding two-dimensional region.
[0033] The cascading module is configured to, for each of the multiple key points, concatenate the first feature of the two-dimensional region to which the key point belongs (from the first feature matrix), the second feature of that region (from the second feature matrix), and the original feature of the key point, to obtain the key point feature of the key point.
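The partitioning, feature extraction, and cascading modules in [0030] to [0033] can be illustrated with a minimal sketch. The source does not specify how the two-dimensional regions or the first and second features are computed, so this sketch assumes a regular top-view grid with a hypothetical cell size `cell`, uses point count and maximum height as a stand-in first feature, and uses a 3x3 neighbourhood average of the first feature as a stand-in second feature. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Grid = List[List[List[float]]]


def cell_of(p: Sequence[float], cell: float, x_min: float, y_min: float) -> Tuple[int, int]:
    """Top-view cell index of a point (the Z coordinate is ignored)."""
    return int(math.floor((p[0] - x_min) / cell)), int(math.floor((p[1] - y_min) / cell))


def first_feature_matrix(points: Sequence[Point], cell: float, x_min: float,
                         y_min: float, nx: int, ny: int) -> Grid:
    """Stand-in first feature per cell: [point count, max height] (height 0 if empty)."""
    count = [[0.0] * ny for _ in range(nx)]
    height = [[0.0] * ny for _ in range(nx)]
    for p in points:
        i, j = cell_of(p, cell, x_min, y_min)
        if 0 <= i < nx and 0 <= j < ny:
            height[i][j] = p[2] if count[i][j] == 0 else max(height[i][j], p[2])
            count[i][j] += 1
    return [[[count[i][j], height[i][j]] for j in range(ny)] for i in range(nx)]


def second_feature_matrix(first: Grid) -> Grid:
    """Stand-in second feature: mean of the first feature over each cell's 3x3 neighbourhood."""
    nx, ny = len(first), len(first[0])
    out: Grid = []
    for i in range(nx):
        row = []
        for j in range(ny):
            neigh = [first[a][b]
                     for a in range(max(0, i - 1), min(nx, i + 2))
                     for b in range(max(0, j - 1), min(ny, j + 2))]
            row.append([sum(f[k] for f in neigh) / len(neigh) for k in range(len(neigh[0]))])
        out.append(row)
    return out


def key_point_feature(kp: Point, first: Grid, second: Grid, cell: float,
                      x_min: float, y_min: float) -> List[float]:
    """Concatenate the region's first feature, second feature, and the key point's coordinates."""
    i, j = cell_of(kp, cell, x_min, y_min)
    return first[i][j] + second[i][j] + list(kp)
```

Hand-crafted statistics are used only to keep the sketch self-contained; the source does not say whether the features are learned. The concatenation in `key_point_feature` is the part that corresponds directly to [0033].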
[0034] In one possible implementation, the detection module is further configured to detect the obstacle based on the 3D point cloud data and the coarse bounding box, and obtain the category of the obstacle.
[0035] In one possible implementation, the detection module is further configured to detect the obstacle based on the three-dimensional point cloud data and the coarse bounding box, and obtain the movement direction angle of the obstacle, wherein the movement direction angle refers to the angle between the movement direction of the obstacle and a preset direction.
[0036] In one possible implementation, the step of detecting the obstacle based on the 3D point cloud data and the coarse bounding box to obtain the edge position of the obstacle is implemented based on an edge detection model. The acquisition module is also used to acquire the sample edge position of the obstacle corresponding to the coarse bounding box in the sample 3D point cloud data.
[0037] The detection module is further configured to detect obstacles based on the edge detection model, the sample 3D point cloud data, and the coarse bounding box, to obtain the predicted edge position of the obstacle corresponding to the coarse bounding box;
[0038] The device further includes a training module for training the edge detection model based on the predicted edge position and the sample edge position.
[0039] In another aspect, a terminal is provided, the terminal including one or more processors and one or more memories, the one or more memories storing at least one piece of program code, the at least one piece of program code being loaded and executed by the one or more processors to perform the operations performed by the edge position annotation method as described in any of the above possible implementations.
[0040] In another aspect, a computer-readable storage medium is provided, which stores at least one piece of program code, which is loaded and executed by a processor to perform the operations performed by the edge position annotation method as described in any of the above possible implementations.
[0041] In another aspect, a computer program or computer program product is provided, the computer program or computer program product comprising computer program code which, when executed by a terminal, causes the terminal to perform the operations performed by the edge position annotation method as described in any of the above possible implementations.
[0042] This application proposes an edge position annotation method in which users specify the location of an obstacle by marking a coarse bounding box in 3D point cloud data, and the edge position of the obstacle is then detected based on the coarse bounding box. Because users only need to mark the rough location of the obstacle rather than its edge, the method removes the need to manually mark edge positions and improves operational efficiency.
Attached Figure Description
[0043] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0044] Figure 1 is a schematic diagram of an implementation environment provided in an embodiment of this application;
[0045] Figure 2 is a flowchart of an edge position annotation method provided in an embodiment of this application;
[0046] Figure 3 is a flowchart of an edge position annotation method provided in an embodiment of this application;
[0047] Figure 4 is a flowchart of another edge position annotation method provided in an embodiment of this application;
[0048] Figure 5 is a schematic diagram illustrating the edge position and category of a detected obstacle, provided in an embodiment of this application;
[0049] Figure 6 is a schematic structural diagram of an edge position marking device provided in an embodiment of this application;
[0050] Figure 7 is a schematic structural diagram of another edge position marking device provided in an embodiment of this application;
[0051] Figure 8 is a schematic structural diagram of a terminal provided in an embodiment of this application;
[0052] Figure 9 is a schematic structural diagram of a server provided in an embodiment of this application.
Detailed Implementation
[0053] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.
[0054] It is understood that the terms "first," "second," etc., used in this application may be used to describe various concepts herein, but unless otherwise specified, these concepts are not limited by these terms. These terms are only used to distinguish one concept from another. For example, without departing from the scope of this application, the first feature matrix may be referred to as the second feature matrix, and the second feature matrix may be referred to as the first feature matrix.
[0055] As used in this application, the terms "at least one", "multiple", "each", and "any" are used in the following ways: at least one includes one, two, or more; multiple includes two or more; each refers to each of the corresponding multiple; and any refers to any one of the multiple. For example, multiple key points include three key points, each refers to each of the three key points, and any refers to any one of the three key points, which can be the first, the second, or the third.
[0056] It should be noted that all information (including but not limited to user device information, user personal information, etc.), data (including but not limited to data used for analysis, stored data, displayed data, etc.), and signals involved in this application have been authorized by the user or fully authorized by all parties, and the collection, use, and processing of related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the location information involved in this application was obtained with full authorization. Furthermore, the aforementioned information and data, after processing, are used in big data application scenarios and cannot identify any natural person or establish a specific association with them.
[0057] In some embodiments, the edge location annotation method provided in this application is executed by a terminal.
[0058] In other embodiments, the edge location annotation method provided in this application is executed by a terminal and a server. The server can be a single server, a server cluster consisting of several servers, or a cloud computing service center.
[0059] It should be noted that the embodiments of this application do not limit the subject that performs the edge position annotation method.
[0060] Figure 1 is a schematic diagram of an implementation environment provided in an embodiment of this application. As shown in Figure 1, the implementation environment includes a terminal 101 and a server 102, which are connected via a wireless or wired network.
[0061] Server 102 is a server that provides autonomous driving-related services to terminal 101. In some embodiments, server 102 is used to provide services to terminal 101.
[0062] In some embodiments, the terminal 101 displays three-dimensional point cloud data of obstacles detected by the target object. If an annotation operation on the displayed three-dimensional point cloud data is detected, the terminal responds to the annotation operation, obtains the rough bounding box of the obstacle, and then sends the three-dimensional point cloud data and the rough bounding box to the server. The server detects the obstacle, obtains the edge position of the obstacle, and then trains an obstacle detection model based on the edge position of the obstacle and the three-dimensional point cloud data.
[0063] Figure 2 is a flowchart of an edge position annotation method provided in an embodiment of this application. This embodiment takes a terminal as the executing entity for illustration and includes the following steps:
[0064] 201. The terminal displays the 3D point cloud data of the obstacles detected by the target object.
[0065] The target object can be any object capable of moving, such as a bicycle, car, or drone; the embodiments of this application do not limit the type of target object. An obstacle is an object that obstructs the movement of the target object. The obstacle can be stationary, moving, or of another type, for example a stationary or moving bicycle, a stationary or moving car, or another object; the embodiments of this application do not limit the type of obstacle.
[0066] While the target object is moving, it can scan obstacles to obtain their three-dimensional point cloud data. The three-dimensional point cloud data is then uploaded to the terminal, so that the terminal can display the three-dimensional point cloud data of the obstacles detected by the target object.
[0067] 202. The terminal responds to the annotation operation on the 3D point cloud data and obtains the rough bounding box of the obstacle. The edge of the obstacle is within the coverage area of the rough bounding box.
[0068] In this embodiment of the application, if the 3D point cloud data includes an obstacle, the user can view the 3D point cloud data and annotate the obstacle so that the coarse bounding box encloses it. Because the coverage area of the coarse bounding box is larger than the obstacle, the coarse bounding box covers the edge of the obstacle.
[0069] 203. The terminal detects obstacles based on 3D point cloud data and rough bounding boxes, and obtains the edge positions of the obstacles.
[0070] In this embodiment, the terminal acquires three-dimensional point cloud data and a rough bounding box, that is, the terminal determines the rough location of the obstacle. Then, based on the three-dimensional point cloud data and the rough bounding box of the obstacle, the obstacle is detected, and the edge position of the obstacle can be obtained.
[0071] This application proposes an edge position annotation method in which users specify the location of an obstacle by marking a coarse bounding box in 3D point cloud data, and the edge position of the obstacle is then detected based on the coarse bounding box. Because users only need to mark the rough location of the obstacle rather than its edge, the method removes the need to manually mark edge positions and improves operational efficiency.
[0072] Figure 3 is a flowchart of an edge position annotation method provided in an embodiment of this application. This embodiment takes a terminal as the executing entity for illustration and includes the following steps:
[0073] 301. The terminal displays the 3D point cloud data of the obstacles detected by the target object.
[0074] In some embodiments, the target object is equipped with a radar that scans the surrounding environment to obtain 3D point cloud data of the obstacles in it. Optionally, a model training application is installed on the terminal, and the 3D point cloud data of the obstacles detected by the target object is displayed in the annotation interface of that application.
[0075] 302. The terminal responds to the annotation operation on the 3D point cloud data and obtains the rough bounding box of the obstacle. The edge of the obstacle is within the coverage area of the rough bounding box.
[0076] In this embodiment, the terminal displays three-dimensional point cloud data that includes the shapes of obstacles. The user can therefore perform a box selection operation on the point cloud of an obstacle based on its displayed shape. This box selection is the annotation operation on the three-dimensional point cloud data, and it makes the coarse bounding box include the edge of the obstacle. The terminal thereby obtains the coarse bounding box marked by the user.
[0077] In some embodiments, the terminal displays a top view of the three-dimensional point cloud data, which can be understood as displaying a two-dimensional projection of the three-dimensional point cloud data. When performing the annotation operation, the user can therefore work with the two-dimensional data and ignore the third dimension.
[0078] For example, 3D point cloud data includes parameters in three dimensions: X, Y, and Z. When the terminal displays a top view of the 3D point cloud data, it will display the X and Y dimensions and ignore the Z dimension. Therefore, when annotating 3D point cloud data, users only need to consider the X and Y dimensions.
[0079] Optionally, when the terminal displays a top view of the 3D point cloud data, obstacles are also displayed in a top view. Therefore, users can annotate obstacles according to the displayed top view so that the rough annotation box includes the edge of the obstacle.
[0080] 303. The terminal determines multiple key points corresponding to the coarse bounding box based on the position of each point in the 3D point cloud data and the coarse bounding box.
[0081] In this embodiment of the application, the three-dimensional point cloud data includes multiple points, each with its own position, and some of these points lie in or near the coarse bounding box. Therefore, the multiple key points corresponding to the coarse bounding box can be determined from the position of each point in the three-dimensional point cloud data and the coarse bounding box.
[0082] In some embodiments, determining the multiple key points corresponding to a coarse bounding box includes: determining, based on the position of each point in the 3D point cloud data, the centroid corresponding to the coarse bounding box, wherein the centroid may be any point corresponding to the coarse bounding box; determining, among the multiple points corresponding to the coarse bounding box, the point farthest from the centroid as a key point; for each remaining point, obtaining the distance between the point and the centroid and the distance between the point and each determined key point, and taking the minimum of these distances as the distance corresponding to the point; and determining the remaining point with the largest corresponding distance as the next key point, repeating until a first preset number of key points are obtained.
[0083] In this embodiment, the 3D point cloud data includes multiple points and the coarse bounding box is located within the 3D point cloud data, so the coarse bounding box also corresponds to multiple points. The centroid of the coarse bounding box is determined from the positions of its corresponding points, and the key points are then determined starting from the centroid. Specifically, the distance between each point corresponding to the coarse bounding box and the centroid is obtained, and the point farthest from the centroid is determined as a key point and removed from the candidate points. Each subsequent key point should be the remaining point farthest from both the centroid and all key points determined so far. Therefore, for each remaining point, the distance to the centroid and the distance to each determined key point are obtained, and the minimum of these distances is taken as the distance corresponding to the point. The remaining point with the largest corresponding distance is determined as the next key point, and this is repeated until a first preset number of key points are obtained.
[0084] For example, suppose the coarse bounding box corresponds to points A, B, C, and D, the first preset number is 2, and the distances from A, B, C, and D to the centroid are 1, 2, 3, and 4 respectively. Point D is farthest from the centroid, so D is determined as the first key point. To determine the next key point: point A is at distance 1 from the centroid and 3 from point D, so its corresponding distance is 1; point B is at distance 2 from the centroid and 2 from point D, so its corresponding distance is 2; point C is at distance 3 from the centroid and 1 from point D, so its corresponding distance is 1. Point B has the largest corresponding distance and is therefore determined as the second key point.
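The key point selection in [0082] to [0084] is a form of farthest point sampling. A minimal Python sketch of the plain (non-partitioned) variant follows, assuming points are coordinate tuples; `mean_centroid` and `farthest_point_sampling` are hypothetical names. Passing one of the box's own points as `centroid` instead of `mean_centroid(...)` corresponds to the alternative of choosing any point as the centroid.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean_centroid(points: Sequence[Point]) -> Point:
    """Centroid as the mean position of the points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_point_sampling(points: Sequence[Point], centroid: Point, k: int) -> List[int]:
    """Return the indices of k key points chosen by farthest point sampling seeded at the centroid."""
    # nearest[i]: distance from point i to the closest reference so far
    # (initially the centroid only).
    nearest = [math.dist(p, centroid) for p in points]
    remaining = list(range(len(points)))
    chosen: List[int] = []
    while remaining and len(chosen) < k:
        # The remaining point farthest from every reference becomes the next key point.
        best = max(remaining, key=lambda i: nearest[i])
        chosen.append(best)
        remaining.remove(best)
        # Keep, for each remaining point, the minimum of its old distance
        # and its distance to the new key point.
        for i in remaining:
            nearest[i] = min(nearest[i], math.dist(points[i], points[best]))
    return chosen


if __name__ == "__main__":
    # A, B, C, D placed on a line at 1, 2, 3, 4 with the centroid at 0,
    # which reproduces the distances in the example above.
    pts = [(1.0,), (2.0,), (3.0,), (4.0,)]
    print(farthest_point_sampling(pts, (0.0,), 2))  # [3, 1], i.e. D then B
```

Keeping a running minimum distance per point means each new key point costs one pass over the remaining points, instead of recomputing distances to every earlier key point.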
[0085] It should be noted that the centroid determined in the embodiments of this application can also be treated as a key point of the coarse bounding box; that is, the centroid determined by the terminal is the first key point, followed by the second key point, the third key point, and so on until the first preset number of key points are determined.
[0086] In some embodiments, the key point determination method in this application can be called the farthest point sampling algorithm; that is, this application can use farthest point sampling to sample a first preset number of key points. Optionally, this application uses a partitioned farthest point sampling algorithm to sample the first preset number of key points.
[0087] The following explains how to determine the centroid of the coarse bounding box:
[0088] In some embodiments, the coarse bounding box corresponds to multiple points, each with its own position. The mean position of these points is obtained, and the point at that mean position is determined as the centroid of the coarse bounding box.
[0089] In some embodiments, the terminal determines any one of the multiple points corresponding to the coarse bounding box as the centroid. For example, if the points corresponding to the coarse bounding box are A, B, C, and D, any one of A, B, C, or D can be determined as the centroid.
[0090] By allowing the centroid to be either the mean of the points corresponding to the coarse bounding box or any one of those points, the solution provided in this application offers more than one way to determine the centroid.
[0091] In some embodiments, the terminal enlarges the coarse bounding box by a preset factor and determines the points included in the enlarged box as the points corresponding to the coarse bounding box. The preset factor can be 1.5, 1.7, or another value, which is not limited in this embodiment. For example, if the terminal enlarges the coarse bounding box by a factor of 1.5, the enlarged box can include more points. Using these points as the points corresponding to the coarse bounding box allows the features of the coarse bounding box to be determined from more points, thereby improving the accuracy of subsequent obstacle detection.
[0092] In some embodiments, the terminal determines the points included in the coarse bounding box itself as the points corresponding to the coarse bounding box. For example, if the coarse bounding box includes points A, B, and C, then points A, B, and C are determined as the points corresponding to the coarse bounding box.
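Gathering the points that correspond to a coarse bounding box, with or without enlargement, might look like the sketch below. It assumes an axis-aligned box given as `(cx, cy, length, width)` in the top view and interprets "enlarging by a preset factor" as scaling the length and width by that factor about the box centre; the box representation and the function names are assumptions, since the source specifies neither.

```python
from typing import List, Sequence, Tuple

# (cx, cy, length, width): an axis-aligned box in the top view (assumption).
Box2D = Tuple[float, float, float, float]


def enlarge_box(box: Box2D, factor: float) -> Box2D:
    """Scale the box's length and width by `factor`, keeping its centre fixed."""
    cx, cy, length, width = box
    return (cx, cy, length * factor, width * factor)


def points_in_box(points: Sequence[Sequence[float]], box: Box2D) -> List[Sequence[float]]:
    """Points whose (x, y) fall inside the box; z is ignored, as in the top view."""
    cx, cy, length, width = box
    return [p for p in points
            if abs(p[0] - cx) <= length / 2 and abs(p[1] - cy) <= width / 2]
```

A rotated box would need each point transformed into the box's own frame before the same half-extent comparison.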
[0093] It should be noted that step 303 in this embodiment is executed by the key point sampling module; that is, the key point sampling module determines the multiple key points corresponding to the coarse bounding box.
[0094] 304. Based on the acquired key point feature of each key point and the position of the coarse bounding box, the terminal fuses the key point features of the key points corresponding to the coarse bounding box to obtain the bounding box feature of the coarse bounding box.
[0095] For each of the multiple key points corresponding to the coarse bounding box, each key point has key point features. Therefore, after fusing the key point features of the multiple key points corresponding to the coarse bounding box, the resulting feature can represent the bounding box feature of the coarse bounding box.
[0096] In some embodiments, the terminal uses a region pooling algorithm to fuse the key point features corresponding to a coarse bounding box into the bounding box feature of the coarse bounding box. Optionally, the region pooling algorithm is Region of Interest (RoI) pooling or another pooling algorithm, which is not limited in this embodiment of the application.
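The fusion step can be sketched with a simplified stand-in for region pooling: element-wise max (or mean) pooling over the features of the key points that fall inside the box in the top view. This is not the full RoI pooling operator, which typically pools over a grid of sub-regions; the box representation `(cx, cy, length, width)`, the `mode` parameter, and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple

Box2D = Tuple[float, float, float, float]  # (cx, cy, length, width), top view (assumption)


def pool_box_feature(key_points: Sequence[Sequence[float]],
                     features: Sequence[Sequence[float]],
                     box: Box2D, mode: str = "max") -> List[float]:
    """Fuse the features of the key points inside `box` into one box feature."""
    cx, cy, length, width = box
    inside = [f for p, f in zip(key_points, features)
              if abs(p[0] - cx) <= length / 2 and abs(p[1] - cy) <= width / 2]
    if not inside:
        # No key point in the box: fall back to a zero vector of the feature width.
        return [0.0] * len(features[0])
    columns = list(zip(*inside))  # one tuple per feature channel
    if mode == "max":
        return [max(c) for c in columns]
    if mode == "mean":
        return [sum(c) / len(c) for c in columns]
    raise ValueError(f"unsupported pooling mode: {mode}")
```

Max pooling keeps the strongest response per channel regardless of how many key points the box contains, which makes the box feature insensitive to point density.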
[0097] It should be noted that step 304 in this embodiment is executed by the coarse bounding box feature extraction module, which means that the coarse bounding box feature extraction module can determine the bounding box features of the coarse bounding box.
[0098] 305. The terminal detects the features of the bounding box and obtains the edge position of the obstacle corresponding to the rough bounding box.
[0099] In this embodiment of the application, the bounding box feature represents the corresponding coarse bounding box. Since the coarse bounding box contains the obstacle, the edge position of the obstacle corresponding to the coarse bounding box can be obtained by detecting the bounding box feature.
[0100] It should be noted that the embodiments in this application take the detection of the edge position of an obstacle as an example. In another embodiment, the size of the obstacle can also be detected; that is, the terminal detects the bounding box feature to obtain both the edge position and the size of the obstacle.
[0101] Optionally, the size of the obstacle can be determined based on the edge positions of the obstacle. For example, if the edge positions of the obstacle form a rectangle, the length and width of the obstacle can be determined based on the edge positions, and then the product of the length and width can be used to determine the size of the obstacle. As another example, if the edge positions of the obstacle form a circle, the diameter of the obstacle can be determined based on the edge positions, and then the size of the obstacle can be determined based on the determined diameter. Alternatively, the obstacle can also be of other shapes, which will not be elaborated upon in this embodiment.
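Following the rectangle and circle examples above, a minimal sketch of deriving size from edge positions is shown below. It assumes that the edge position of a rectangular obstacle is given as its four top-view corners in order and that "size" means footprint area; the corner representation and the function names are assumptions.

```python
import math
from typing import Sequence, Tuple


def rectangle_size(corners: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Length, width, and area of a rectangle given its four corners in order."""
    (x0, y0), (x1, y1), (x2, y2), _ = corners
    length = math.hypot(x1 - x0, y1 - y0)  # first side
    width = math.hypot(x2 - x1, y2 - y1)   # adjacent side
    return length, width, length * width


def circle_size(diameter: float) -> float:
    """Area of a circular footprint computed from its diameter."""
    return math.pi * diameter ** 2 / 4
```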
[0102] In some embodiments, the terminal detects obstacles based on 3D point cloud data and coarse bounding boxes, obtaining not only the edge position of the obstacle but also its category. The category of the obstacle can be a car, bicycle, tree, or other category, which is not limited in this embodiment.
[0103] In some embodiments, the terminal detects obstacles based on the 3D point cloud data and the coarse bounding box, obtaining not only the edge position of the obstacle but also its movement direction angle. Specifically, the terminal detects the features of the bounding box to obtain the movement direction angle of the obstacle corresponding to the coarse bounding box; the movement direction angle is the angle between the obstacle's movement direction and a preset direction.
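The movement direction angle can be computed from a top-view motion direction vector with `atan2`. The sketch assumes the direction is available as a 2D vector and that the preset direction defaults to the +X axis; the source fixes neither, and the function name is hypothetical.

```python
import math
from typing import Tuple


def movement_direction_angle(motion: Tuple[float, float],
                             preset: Tuple[float, float] = (1.0, 0.0)) -> float:
    """Signed angle in degrees from the preset direction to the motion direction, in [-180, 180]."""
    # The 2D cross product gives the sine component and the dot product the cosine
    # component, so atan2 recovers the signed angle between the two vectors.
    cross = preset[0] * motion[1] - preset[1] * motion[0]
    dot = preset[0] * motion[0] + preset[1] * motion[1]
    return math.degrees(math.atan2(cross, dot))
```

A signed angle keeps counter-clockwise and clockwise deviations from the preset direction distinguishable; taking the result modulo 360 gives an unsigned heading if that is preferred.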
[0104] It should be noted that steps 303-305 in this embodiment are one possible implementation of step 203. In some embodiments, step 203 can also be implemented based on a model. That is, the step of detecting obstacles and obtaining the edge positions of obstacles based on 3D point cloud data and coarse bounding boxes is implemented based on an edge detection model. The steps for training the edge detection model are described below:
[0105] Obtain the sample edge position of the obstacle corresponding to the coarse bounding box in the sample 3D point cloud data. Based on the edge detection model, the sample 3D point cloud data and the coarse bounding box, detect the obstacle and obtain the predicted edge position of the obstacle corresponding to the coarse bounding box. Based on the predicted edge position and the sample edge position, train the edge detection model.
[0106] In this embodiment, after obtaining sample 3D point cloud data, the sample 3D point cloud data can be labeled to obtain a coarse bounding box and sample edge position in the sample 3D point cloud data. Then, the edge detection model is called to detect the sample 3D point cloud data and the coarse bounding box to obtain the predicted edge position of the obstacle corresponding to the coarse bounding box. Then, the edge detection model is trained based on the difference between the sample edge position and the predicted edge position so that the edge detection model has the ability to detect the edge position of the obstacle based on 3D point cloud data and coarse bounding box.
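One way to express "the difference between the sample edge position and the predicted edge position" as a training loss is sketched below. It assumes, hypothetically, that an edge position is encoded as a fixed-length vector of box parameters (for example center, size, and heading) and uses a smooth-L1 loss, a common choice for box regression. The document itself does not name a loss function.

```python
def smooth_l1_loss(predicted: list[float], sample: list[float], beta: float = 1.0) -> float:
    """Mean smooth-L1 loss between predicted and sample edge-position parameters.

    Quadratic for small errors (|d| < beta) and linear for large ones, so a few
    badly placed boxes do not dominate training.
    """
    if len(predicted) != len(sample) or not predicted:
        raise ValueError("parameter vectors must be non-empty and equally long")
    if beta <= 0:
        raise ValueError("beta must be positive")
    total = 0.0
    for p, s in zip(predicted, sample):
        d = abs(p - s)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(predicted)

# Hypothetical encoding: (center_x, center_y, center_z, length, width, height, heading)
predicted_edge = [10.2, 3.9, 0.8, 4.4, 1.8, 1.5, 0.05]
sample_edge = [10.0, 4.0, 0.8, 4.6, 1.9, 1.5, 0.00]
print(smooth_l1_loss(predicted_edge, sample_edge))
```

In an actual training loop, a framework's optimizer would minimize this value with respect to the model parameters. The sketch shows only the per-obstacle loss term.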
[0107] In some embodiments, the edge detection model can also detect the category of obstacles. In this case, the sample edge positions and sample categories of the obstacles corresponding to the coarse bounding boxes in the sample 3D point cloud data are obtained. Based on the edge detection model, the sample 3D point cloud data, and the coarse bounding boxes, the obstacles are detected to obtain the predicted edge positions and predicted categories of the obstacles corresponding to the coarse bounding boxes. The edge detection model is then trained based on the predicted edge positions, predicted categories, sample edge positions, and sample categories.
[0108] In this embodiment, after obtaining the sample 3D point cloud data, the sample 3D point cloud data can be labeled to obtain the coarse bounding box, sample edge position, and obstacle sample category in the sample 3D point cloud data. Then, the edge detection model is called to detect the sample 3D point cloud data and the coarse bounding box to obtain the predicted edge position and predicted category of the obstacle corresponding to the coarse bounding box. Then, the edge detection model is trained based on the difference between the sample edge position and the predicted edge position, and the difference between the obstacle sample category and the predicted category, so that the edge detection model has the ability to detect the edge position and obstacle category based on 3D point cloud data and coarse bounding box.
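For the joint training in [0107]-[0108], the two differences can be combined into a single objective. The sketch below assumes a hypothetical smooth-L1 term for the edge position, a softmax cross-entropy term for the category over the example categories named in [0102], and a hypothetical weight `category_weight` that balances the two. None of these choices is specified by the document.

```python
import math

CATEGORIES = ("car", "bicycle", "tree")  # example categories from [0102]

def smooth_l1(predicted, sample, beta=1.0):
    """Mean smooth-L1 difference between predicted and sample edge-position parameters."""
    if len(predicted) != len(sample) or not predicted:
        raise ValueError("parameter vectors must be non-empty and equally long")
    total = 0.0
    for p, s in zip(predicted, sample):
        d = abs(p - s)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(predicted)

def cross_entropy(logits, label_index):
    """Softmax cross-entropy for one obstacle, using log-sum-exp for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label_index]

def edge_and_category_loss(predicted_edge, sample_edge, category_logits,
                           sample_category, category_weight=1.0):
    """Edge-position difference plus a weighted category difference."""
    label_index = CATEGORIES.index(sample_category)
    return (smooth_l1(predicted_edge, sample_edge)
            + category_weight * cross_entropy(category_logits, label_index))

print(edge_and_category_loss([1.0, 2.0], [1.0, 2.5], [2.0, 0.1, -1.0], "car"))
```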
[0109] In some embodiments, the edge detection model also has the ability to detect the movement direction of obstacles. In the process of training the edge detection model, it is necessary to obtain the sample movement direction of obstacles in the sample 3D point cloud data. Based on the edge detection model, the sample 3D point cloud data and the coarse bounding box, the obstacle is detected to obtain the predicted movement direction of the obstacle corresponding to the coarse bounding box. The edge detection model is then trained based on the sample movement direction and the predicted movement direction.
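The movement-direction term in [0109] needs care because angles wrap around: a predicted angle of 179° and a sample angle of -179° are only 2° apart. The following is a minimal sketch that assumes angles in radians and uses a hypothetical 1 - cos(difference) loss, one common wrap-safe choice that the document does not itself name.

```python
import math

def direction_loss(predicted_angle: float, sample_angle: float) -> float:
    """Wrap-safe angle loss: 0 when the angles agree modulo 2*pi, 2 when they are opposite."""
    return 1.0 - math.cos(predicted_angle - sample_angle)

def mean_direction_loss(predicted: list[float], sample: list[float]) -> float:
    """Average direction loss over the obstacles of one training sample."""
    if len(predicted) != len(sample) or not predicted:
        raise ValueError("angle lists must be non-empty and equally long")
    return sum(direction_loss(p, s) for p, s in zip(predicted, sample)) / len(predicted)

# Small, despite the 358-degree raw difference between the two values.
print(direction_loss(math.radians(179), math.radians(-179)))
```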
[0110] It should be noted that the above embodiments start directly from the key point features of the key points. In other embodiments, the key point features are obtained as follows: dividing the 3D point cloud data into a second preset number of 2D regions from a two-dimensional (XY-plane) view; extracting features from each of the second preset number of 2D regions to obtain a first feature matrix, where each element in the first feature matrix indicates the first feature of the corresponding 2D region; extracting features from the first feature matrix to obtain a second feature matrix, where each element in the second feature matrix indicates the second feature of the corresponding 2D region; and, for each of the multiple key points, concatenating the first feature of the 2D region to which the key point belongs in the first feature matrix, the second feature of that region in the second feature matrix, and the original feature of the key point to obtain the key point feature of the key point.
[0111] In some embodiments, the original feature of a key point has four dimensions: the X-axis coordinate, the Y-axis coordinate, the Z-axis coordinate, and the intensity. Intensity is the intensity value recorded for each point (typically the strength of the returned signal) and forms one of the dimensions of the 3D point cloud data.
[0112] In some embodiments, for each of the two-dimensional regions in the second preset number of two-dimensional regions, the PointNet algorithm is used to extract features, thereby obtaining the first feature corresponding to the second preset number of two-dimensional regions. Since feature extraction is performed on each two-dimensional region separately, each element in the obtained first feature matrix corresponds to the first feature of the two-dimensional region.
[0113] In some embodiments, a 2D fully convolutional backbone network is used to extract features from the first feature matrix to obtain a second feature matrix. Since each element in the first feature matrix corresponds to the first feature of a two-dimensional region, after feature extraction from the first feature matrix, each element in the resulting second feature matrix corresponds to the second feature of a two-dimensional region.
[0114] For example, the 3D point cloud data is divided into M*N two-dimensional regions on the XY plane. The regions are not divided along the Z-axis, so each two-dimensional region includes all points in its column along the Z-axis. For each two-dimensional region, the PointNet algorithm is used to extract a D1-dimensional region feature, so the M*N regions together form an M*N*D1 first feature matrix. The first feature matrix is then input into a 2D fully convolutional backbone network to obtain an M*N*D2 second feature matrix. For each keypoint, the first feature of the region to which the keypoint belongs (from the first feature matrix), the second feature of that region (from the second feature matrix), and the original 4-dimensional feature of the keypoint are concatenated to obtain a (D1+D2+4)-dimensional keypoint feature.
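The data flow of [0114] can be traced with a small, dependency-free sketch. It is illustrative only: the grid extent, cell size, and all function names are hypothetical. A single shared linear layer with ReLU followed by per-region max pooling stands in for PointNet, and one zero-padded 3x3 mean filter stands in for the 2D fully convolutional backbone, so here D2 equals D1, whereas a real backbone may change the channel count. What it does reproduce is the shape bookkeeping: regions split only on the XY plane, an M*N*D1 first feature matrix, an M*N*D2 second feature matrix, and a (D1+D2+4)-dimensional keypoint feature.

```python
Point = tuple[float, float, float, float]  # (x, y, z, intensity)

def region_of(p, x_min, y_min, cell, m, n):
    """(i, j) of the XY-plane region containing p; z is ignored, so each region spans all heights."""
    i, j = int((p[0] - x_min) // cell), int((p[1] - y_min) // cell)
    return (i, j) if 0 <= i < m and 0 <= j < n else None

def first_feature_matrix(points, weights, x_min, y_min, cell, m, n):
    """PointNet stand-in: shared linear layer + ReLU per point, then max-pool within each region."""
    d1 = len(weights)
    feats = [[[0.0] * d1 for _ in range(n)] for _ in range(m)]  # empty regions stay zero
    for p in points:
        ij = region_of(p, x_min, y_min, cell, m, n)
        if ij is None:
            continue  # point lies outside the grid
        per_point = [max(0.0, sum(w * v for w, v in zip(row, p))) for row in weights]
        i, j = ij
        # ReLU outputs are >= 0, so starting the max-pool from zeros is safe.
        feats[i][j] = [max(a, b) for a, b in zip(feats[i][j], per_point)]
    return feats

def second_feature_matrix(first):
    """Backbone stand-in: one zero-padded 3x3 mean filter applied per channel."""
    m, n, d = len(first), len(first[0]), len(first[0][0])
    out = [[[0.0] * d for _ in range(n)] for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for a in range(max(0, i - 1), min(m, i + 2)):
                for b in range(max(0, j - 1), min(n, j + 2)):
                    for k in range(d):
                        out[i][j][k] += first[a][b][k] / 9.0
    return out

def keypoint_feature(kp: Point, first, second, x_min, y_min, cell):
    """Concatenate the region's first feature, its second feature, and the raw (x, y, z, intensity)."""
    ij = region_of(kp, x_min, y_min, cell, len(first), len(first[0]))
    if ij is None:
        raise ValueError("keypoint lies outside the grid")
    i, j = ij
    return first[i][j] + second[i][j] + list(kp)

points = [(0.5, 0.5, 1.0, 0.2), (0.6, 0.4, 0.0, 0.9), (2.5, 1.5, 0.3, 0.1)]
weights = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # D1 = 3, hypothetical weights
first = first_feature_matrix(points, weights, 0.0, 0.0, 1.0, 4, 4)
second = second_feature_matrix(first)
print(len(keypoint_feature(points[0], first, second, 0.0, 0.0, 1.0)))  # 3 + 3 + 4 = 10
```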
[0115] It should be noted that the embodiments of this application are illustrated using a two-dimensional region as an example. In another embodiment, the three-dimensional point cloud data is divided into a preset number of voxels. Subsequently, feature extraction is performed on each voxel, and then the key point features of the key points are determined based on the extracted features.
[0116] Optionally, the step of extracting features from a preset number of voxels is performed by a voxel feature extraction module, meaning that this voxel feature extraction module can extract features from voxels. Optionally, the step of extracting keypoint features is performed by a keypoint feature extraction module, meaning that this keypoint feature extraction module can extract keypoint features from keypoints. Optionally, step 305 is performed by a detection head module, meaning that this detection head module can detect the edge position of the obstacle corresponding to the coarse bounding box.
[0117] 306. The terminal generates training data based on the edge position of the obstacle and the 3D point cloud data. The training data is used to train the obstacle detection model, and the obstacle detection model is used to perform obstacle detection.
[0118] In this embodiment of the application, since it is necessary to train an obstacle detection model so that the obstacle detection model can detect obstacles using 3D point cloud data, training data is generated based on the edge position of the obstacle and the 3D point cloud data, and the obstacle detection model is trained using the training data.
[0119] In some embodiments, the training data serves as sample data; that is, the edge positions of obstacles included in the training data are used as sample edge positions. During the training of the obstacle detection model, the obstacle detection model is called to detect the three-dimensional point cloud data in the training data to obtain predicted edge positions of the obstacles. The obstacle detection model is then trained based on the sample edge positions included in the training data and the predicted edge positions, yielding an obstacle detection model capable of performing obstacle detection.
[0120] In some embodiments, the terminal can also detect obstacles based on 3D point cloud data and coarse bounding boxes to obtain the category of the obstacle. Therefore, the terminal generates training data based on the edge position of the obstacle, the category of the obstacle, and the 3D point cloud data. The training data is used to train the obstacle detection model.
[0121] In some embodiments, the training data serves as sample data; that is, the edge positions and categories of obstacles included in the training data are used as sample edge positions and sample categories. During the training of the obstacle detection model, the model is called to detect the 3D point cloud data in the training data to obtain predicted edge positions and predicted categories of obstacles. The obstacle detection model is then trained based on the sample edge positions and categories together with the predicted edge positions and categories, yielding an obstacle detection model capable of detecting obstacles.
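The training data described in [0117]-[0121] pairs the point cloud with the labels detected for it. Below is a minimal sketch of one such record under a hypothetical layout: raw points stored as (x, y, z, intensity), one box-parameter tuple per obstacle for its edge position, and an optional category per obstacle. The document does not prescribe a storage format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrainingSample:
    """One training example for the obstacle detection model (hypothetical layout)."""
    points: list[tuple[float, float, float, float]]   # (x, y, z, intensity)
    edge_positions: list[tuple[float, ...]]           # detected edge position per obstacle
    categories: Optional[list[str]] = None            # e.g. "car", "bicycle", "tree"

    def __post_init__(self):
        if self.categories is not None and len(self.categories) != len(self.edge_positions):
            raise ValueError("each obstacle needs exactly one category")

def make_training_sample(points, detected_edges, detected_categories=None):
    """Build a sample from the point cloud and the edges (and categories) detected for it."""
    return TrainingSample(
        points=list(points),
        edge_positions=[tuple(e) for e in detected_edges],
        categories=None if detected_categories is None else list(detected_categories),
    )

sample = make_training_sample([(0.5, 0.5, 1.0, 0.2)],
                              [(10.0, 4.0, 0.8, 4.6, 1.9, 1.5, 0.0)], ["car"])
print(sample.categories)
```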
[0122] The solution of this application is illustrated below with reference to Figure 4. The terminal acquires 3D point cloud data and a coarse bounding box. Based on the 3D point cloud data and the coarse bounding box, keypoint sampling is performed to obtain a first preset number of keypoints. In addition, the 3D point cloud data is divided into multiple voxels, and voxel feature extraction is performed on these voxels to obtain a first feature matrix. Backbone network feature extraction is then performed on the first feature matrix to obtain a second feature matrix. Keypoint feature extraction is performed based on the sampled keypoints, the first feature matrix, and the second feature matrix to obtain keypoint features. Bounding box features of the coarse bounding box are then extracted from the keypoint features. Finally, the bounding box features are detected to obtain the edge position and category of the obstacle. Referring to Figure 5, the dashed box represents the user's rough marking of the obstacle, while the solid box represents the edge position of the obstacle detected by the method provided in this application; the category of the obstacle is also displayed.
[0123] The edge location annotation method provided in this application allows users to indicate the location of an obstacle by annotating a coarse bounding box in the 3D point cloud data. Bounding box features are then determined from the key point features of the key points in the coarse bounding box; these features represent the obstacle enclosed by that coarse bounding box. The edge location of the obstacle is therefore obtained by detecting the bounding box features, and the resulting edge location can be used to train the obstacle detection model. Because users only need to annotate the rough location of the obstacle rather than its exact edge, this approach removes the need to manually mark obstacle edges and improves operational efficiency.
[0124] Furthermore, this application embodiment divides the three-dimensional point cloud data into two-dimensional regions, and then extracts key point features based on the two-dimensional regions, ensuring that the obtained key point features match the features of obstacles included in the three-dimensional point cloud data, thereby improving the accuracy of obtaining key point features and thus improving the accuracy of obstacle detection.
[0125] Furthermore, the solution provided in this application offers more than one way to determine the centroid: the centroid can be the average of the multiple points corresponding to the coarse bounding box, or any one of those points, which increases flexibility.
[0126] Figure 6 is a schematic diagram of the structure of an edge position marking device provided in an embodiment of this application. Referring to Figure 6, the device includes:
[0127] Display module 601 is used to display the three-dimensional point cloud data of obstacles detected by the target object;
[0128] The acquisition module 602 is used to acquire a rough bounding box for the obstacle in response to the annotation operation of the three-dimensional point cloud data, wherein the edge of the obstacle is within the coverage area of the rough bounding box;
[0129] The detection module 603 is used to detect the obstacle based on the three-dimensional point cloud data and the coarse marker box, and obtain the edge position of the obstacle.
[0130] In one possible implementation, referring to Figure 7, the device further includes:
[0131] The generation module 604 is used to generate training data based on the edge position of the obstacle and the three-dimensional point cloud data. The training data is used to train the obstacle detection model, and the obstacle detection model is used to perform obstacle detection.
[0132] In one possible implementation, the detection module 603 includes:
[0133] The determining unit 6031 is used to determine multiple key points corresponding to the coarse bounding box based on the position of each point in the three-dimensional point cloud data and the coarse bounding box;
[0134] The fusion unit 6032 is used to fuse, based on the key point feature of each key point and the position of the coarse bounding box, the key point features of the key points corresponding to the coarse bounding box to obtain the bounding box feature of the coarse bounding box;
[0135] The detection unit 6033 is used to detect the bounding box feature to obtain the edge position of the obstacle corresponding to the coarse bounding box.
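The document does not specify how the fusion unit 6032 combines key point features. Purely as an assumption, the sketch below shows one plausible option: keep the key points that fall inside the coarse bounding box (treated here as axis-aligned, ignoring any heading) and take the channel-wise maximum of their features, in the spirit of region-of-interest pooling. The function names are hypothetical.

```python
def inside_box(xyz, box_min, box_max) -> bool:
    """True if a 3D position lies inside an axis-aligned box (bounds inclusive)."""
    return all(lo <= v <= hi for v, lo, hi in zip(xyz, box_min, box_max))

def fuse_box_feature(keypoints, keypoint_features, box_min, box_max):
    """Channel-wise max over the features of key points located inside the coarse box."""
    selected = [f for kp, f in zip(keypoints, keypoint_features)
                if inside_box(kp[:3], box_min, box_max)]
    if not selected:
        raise ValueError("no key points fall inside the coarse bounding box")
    return [max(channel) for channel in zip(*selected)]

keypoints = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (5.0, 5.0, 5.0)]
features = [[1.0, 4.0], [3.0, 2.0], [9.0, 9.0]]
print(fuse_box_feature(keypoints, features, (-1.0, -1.0, -1.0), (2.0, 2.0, 2.0)))  # [3.0, 4.0]
```

Max pooling is order-independent, so the fused feature does not depend on the order in which the key points were sampled.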
[0136] In one possible implementation, the determining unit 6031 is configured to:
[0137] Determine, based on the position of each point in the three-dimensional point cloud data, the centroid corresponding to the coarse bounding box, the centroid being a point associated with the coarse bounding box;
[0138] Among the multiple points corresponding to the coarse bounding box, determine the point farthest from the centroid as a key point;
[0139] For each of the remaining points, obtain the distance between the point and the centroid and the distance between the point and each key point determined so far, and take the minimum of these distances as the distance corresponding to the point;
[0140] Determine the remaining point with the largest corresponding distance as the next key point, and repeat until a first preset number of key points is obtained.
[0141] In one possible implementation, the determining unit 6031 is configured to:
[0142] Obtain the average of the multiple points corresponding to the coarse bounding box, and determine the point corresponding to the average as the centroid;
[0143] or,
[0144] Determine any one of the multiple points corresponding to the coarse bounding box as the centroid.
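The key point selection in [0136]-[0144] is a form of farthest point sampling seeded with the centroid. Below is a minimal sketch that assumes the points corresponding to the coarse bounding box are given as (x, y, z) tuples. The function names are hypothetical. It covers both centroid options: the mean of the points, or any one of the points.

```python
import math

Point3 = tuple[float, float, float]

def mean_centroid(points: list[Point3]) -> Point3:
    """Centroid option 1: the average of the points corresponding to the coarse box."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def sample_keypoints(points: list[Point3], num_keypoints: int, centroid: Point3) -> list[Point3]:
    """Farthest point sampling seeded with the centroid.

    Each point's distance is the minimum of its distances to the centroid and to
    every key point chosen so far; the remaining point with the largest such
    distance becomes the next key point.
    """
    if not 0 < num_keypoints <= len(points):
        raise ValueError("num_keypoints must be between 1 and the number of points")
    nearest = [math.dist(p, centroid) for p in points]
    chosen = [False] * len(points)
    keypoints = []
    while len(keypoints) < num_keypoints:
        # max() returns the first maximal index, so ties resolve deterministically.
        best = max((i for i in range(len(points)) if not chosen[i]), key=lambda i: nearest[i])
        chosen[best] = True
        keypoints.append(points[best])
        for i in range(len(points)):
            if not chosen[i]:
                nearest[i] = min(nearest[i], math.dist(points[i], points[best]))
    return keypoints

pts = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0)]
print(sample_keypoints(pts, 3, mean_centroid(pts)))  # [(10, 0, 0), (0, 0, 0), (2, 0, 0)]
print(sample_keypoints(pts, 3, pts[0]))              # [(10, 0, 0), (2, 0, 0), (1, 0, 0)]
```

Seeding with the centroid makes the first key point the one farthest from the obstacle's center. Later key points then spread out to cover the box, which is the usual motivation for farthest point sampling.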
[0145] In one possible implementation, referring to Figure 7, the device further includes:
[0146] The partitioning module 605 is used to divide the three-dimensional point cloud data into a second preset number of two-dimensional regions from a two-dimensional (XY-plane) view.
[0147] Feature extraction module 606 is used to extract features from each of the two-dimensional regions in the second preset number of two-dimensional regions to obtain a first feature matrix, wherein each element in the first feature matrix indicates the first feature of the corresponding two-dimensional region.
[0148] The feature extraction module 606 is further configured to extract features from the first feature matrix to obtain a second feature matrix, wherein each element in the second feature matrix indicates a second feature of the corresponding two-dimensional region.
[0149] The concatenation module 607 is used to, for each of the multiple key points, concatenate the first feature of the two-dimensional region to which the key point belongs in the first feature matrix, the second feature of that region in the second feature matrix, and the original feature of the key point, to obtain the key point feature of the key point.
[0150] In one possible implementation, the detection module 603 is further configured to detect the obstacle based on the three-dimensional point cloud data and the coarse bounding box, and obtain the category of the obstacle.
[0151] In one possible implementation, the detection module 603 is further configured to detect the obstacle based on the three-dimensional point cloud data and the coarse marker box, and obtain the movement direction angle of the obstacle, wherein the movement direction angle refers to the angle between the movement direction of the obstacle and a preset direction.
[0152] In one possible implementation, the step of detecting the obstacle based on the three-dimensional point cloud data and the coarse bounding box to obtain the edge position of the obstacle is implemented based on an edge detection model. The acquisition module 602 is also used to acquire the sample edge position of the obstacle corresponding to the coarse bounding box in the sample three-dimensional point cloud data.
[0153] The detection module 603 is also used to detect obstacles based on the edge detection model, the sample three-dimensional point cloud data and the coarse marker box, and obtain the predicted edge position of the obstacle corresponding to the coarse marker box;
[0154] The device further includes a training module 608, used to train the edge detection model based on the predicted edge position and the sample edge position.
[0155] It should be noted that, when the edge location annotation device provided in the above embodiments generates training data, the division into the functional modules described above is used only as an example. In practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the terminal can be divided into different functional modules to complete all or part of the functions described above. In addition, the edge location annotation device and the edge location annotation method provided in the above embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which will not be repeated here.
[0156] Figure 8 is a schematic diagram of the structure of a terminal provided in an embodiment of this application. The terminal 800 includes a processor 801 and a memory 802.
[0157] Processor 801 may include one or more processing cores, such as a quad-core processor or an octa-core processor. Processor 801 may be implemented using at least one hardware form selected from DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). Processor 801 may also include a main processor and a coprocessor. The main processor, also known as a CPU (Central Processing Unit), is used to process data in the wake-up state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, processor 801 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, processor 801 may also include an AI (Artificial Intelligence) processor, which is used to handle computational operations related to machine learning.
[0158] The memory 802 may include one or more computer-readable storage media, which may be non-transitory. The memory 802 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In some embodiments, the non-transitory computer-readable storage media in the memory 802 are used to store at least one program code, which is executed by the processor 801 to implement the edge location annotation method provided in the method embodiments of this application.
[0159] In some embodiments, the terminal 800 may also optionally include a peripheral device interface 803 and at least one peripheral device. The processor 801, memory 802, and peripheral device interface 803 can be connected via a bus or signal line. Each peripheral device can be connected to the peripheral device interface 803 via a bus, signal line, or circuit board. Specifically, the peripheral device includes at least one of the following: a radio frequency circuit 804, a display screen 805, a camera assembly 806, an audio circuit 807, a positioning component 808, and a power supply 809.
[0160] Peripheral device interface 803 can be used to connect at least one I/O (Input/Output) related peripheral device to processor 801 and memory 802. In some embodiments, processor 801, memory 802 and peripheral device interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of processor 801, memory 802 and peripheral device interface 803 can be implemented on separate chips or circuit boards, which is not limited in this embodiment.
[0161] The radio frequency (RF) circuit 804 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The RF circuit 804 communicates with communication networks and other communication devices via electromagnetic signals. The RF circuit 804 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals back into electrical signals. Optionally, the RF circuit 804 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, etc. The RF circuit 804 can communicate with other terminals through at least one wireless communication protocol. This wireless communication protocol includes, but is not limited to: metropolitan area networks (MANs), various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks (WLANs), and / or WiFi (Wireless Fidelity) networks. In some embodiments, the RF circuit 804 may also include circuitry related to NFC (Near Field Communication), which is not limited in this application.
[0162] Display screen 805 is used to display a UI (User Interface). This UI may include graphics, text, icons, videos, and any combination thereof. When display screen 805 is a touch display screen, it also has the ability to collect touch signals on or above its surface. These touch signals can be input as control signals to processor 801 for processing. In this case, display screen 805 can also be used to provide virtual buttons and / or a virtual keyboard, also known as soft buttons and / or a soft keyboard. In some embodiments, there may be one display screen 805, which serves as the front panel of terminal 800; in other embodiments, there may be at least two display screens, respectively disposed on different surfaces of terminal 800 or in a folded design; in still other embodiments, display screen 805 may be a flexible display screen, disposed on a curved or folded surface of terminal 800. Furthermore, display screen 805 may be configured as a non-rectangular irregular shape, i.e., a non-rectangular screen. Display screen 805 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
[0163] The camera assembly 806 is used to acquire images or videos. Optionally, the camera assembly 806 includes a front-facing camera and a rear-facing camera. The front-facing camera is disposed on the front panel of the terminal, and the rear-facing camera is disposed on the back of the terminal. In some embodiments, there are at least two rear-facing cameras, which are any one of a main camera, a depth-sensing camera, a wide-angle camera, and a telephoto camera, to achieve background blurring by fusion of the main camera and the depth-sensing camera, panoramic shooting by fusion of the main camera and the wide-angle camera, VR (Virtual Reality) shooting, or other fusion shooting functions. In some embodiments, the camera assembly 806 may also include a flash. The flash may be a single-color temperature flash or a dual-color temperature flash. A dual-color temperature flash refers to a combination of a warm light flash and a cool light flash, which can be used for light compensation at different color temperatures.
[0164] The audio circuit 807 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, converting the sound waves into electrical signals that are input to the processor 801 for processing, or input to the radio frequency circuit 804 to achieve voice communication. For stereo sound acquisition or noise reduction purposes, multiple microphones may be used, each located at a different part of the terminal 800. The microphone may also be an array microphone or an omnidirectional microphone. The speaker is used to convert the electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert electrical signals not only into audible sound waves but also into inaudible sound waves for purposes such as distance measurement. In some embodiments, the audio circuit 807 may also include a headphone jack.
[0165] The positioning component 808 is used to determine the current geographic location of the terminal 800 in order to enable navigation or LBS (Location Based Service). The positioning component 808 can be a positioning component based on the US GPS (Global Positioning System), China's BeiDou system, Russia's GLONASS system, or the European Union's Galileo system.
[0166] Power supply 809 is used to supply power to the various components in terminal 800. Power supply 809 can be AC power, DC power, a disposable battery, or a rechargeable battery. When power supply 809 includes a rechargeable battery, the rechargeable battery can support wired or wireless charging. The rechargeable battery can also be used to support fast charging technology.
[0167] In some embodiments, the terminal 800 further includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: an accelerometer 811, a gyroscope 812, a pressure sensor 813, a fingerprint sensor 814, an optical sensor 815, and a proximity sensor 816.
[0168] Accelerometer 811 can detect the magnitude of acceleration on the three coordinate axes of a coordinate system established by terminal 800. For example, accelerometer 811 can be used to detect the components of gravitational acceleration on the three coordinate axes. Processor 801 can control display screen 805 to display the user interface in either a landscape or portrait view based on the gravitational acceleration signal acquired by accelerometer 811. Accelerometer 811 can also be used for games or for acquiring user motion data.
[0169] The gyroscope sensor 812 can detect the orientation and rotation angle of the terminal 800. The gyroscope sensor 812, in conjunction with the accelerometer sensor 811, can collect 3D motion data from the user on the terminal 800. Based on the data collected by the gyroscope sensor 812, the processor 801 can perform the following functions: motion sensing (e.g., changing the UI based on the user's tilt), image stabilization during shooting, game control, and inertial navigation.
[0170] The pressure sensor 813 can be disposed on the side bezel of the terminal 800 and / or on the lower layer of the display screen 805. When the pressure sensor 813 is disposed on the side bezel of the terminal 800, it can detect the user's grip signal on the terminal 800, and the processor 801 can perform left / right hand recognition or quick operation based on the grip signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed on the lower layer of the display screen 805, the processor 801 can control the operable controls on the UI interface based on the user's pressure operation on the display screen 805. The operable controls include at least one of button controls, scroll bar controls, icon controls, and menu controls.
[0171] The fingerprint sensor 814 is used to collect the user's fingerprint. The processor 801 identifies the user's identity based on the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the user's identity based on the collected fingerprint. When the user's identity is identified as trusted, the processor 801 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 814 can be located on the front, back, or side of the terminal 800. When the terminal 800 has physical buttons or a manufacturer's logo, the fingerprint sensor 814 can be integrated with the physical buttons or manufacturer's logo.
[0172] An optical sensor 815 is used to collect ambient light intensity. In one embodiment, the processor 801 can control the display brightness of the display screen 805 based on the ambient light intensity collected by the optical sensor 815. Specifically, when the ambient light intensity is high, the display brightness of the display screen 805 is increased; when the ambient light intensity is low, the display brightness of the display screen 805 is decreased. In another embodiment, the processor 801 can also dynamically adjust the shooting parameters of the camera assembly 806 based on the ambient light intensity collected by the optical sensor 815.
[0173] The proximity sensor 816, also known as a distance sensor, is installed on the front panel of the terminal 800. The proximity sensor 816 is used to detect the distance between the user and the front of the terminal 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front of the terminal 800 is gradually decreasing, the processor 801 controls the display screen 805 to switch from a screen-on state to a screen-off state; when the proximity sensor 816 detects that the distance between the user and the front of the terminal 800 is gradually increasing, the processor 801 controls the display screen 805 to switch from a screen-off state to a screen-on state.
[0174] Those skilled in the art will understand that the structure shown in Figure 8 does not constitute a limitation on terminal 800, which may include more or fewer components than shown, combine certain components, or use a different component arrangement.
[0175] Figure 9 is a schematic diagram of the structure of a server provided in an embodiment of this application. The server 900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 901 and one or more memories 902. The memory 902 stores at least one piece of program code, which is loaded and executed by the processor 901 to implement the methods provided in the above method embodiments. The server may also have wired or wireless network interfaces, a keyboard, and input/output interfaces, as well as other components for implementing device functions, which will not be elaborated here.
[0176] The server 900 is used to execute the steps performed by the server in the above method embodiments.
[0177] In an exemplary embodiment, a computer-readable storage medium is further provided, for example a memory including program code that can be executed by a processor in a computer device to perform the edge position marking method in the above embodiments. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
[0178] In an exemplary embodiment, a computer program or computer program product is further provided, which includes computer program code that, when executed by a computer, causes the computer to implement the edge position marking method described above.
[0179] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.
[0180] The above description is merely an optional embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.
Claims
1. A method for marking edge positions, characterized in that the method includes: displaying three-dimensional point cloud data of obstacles detected by a target object; in response to an annotation operation on the three-dimensional point cloud data, obtaining a rough bounding box of an obstacle, the edge of the obstacle lying within the area covered by the rough bounding box; and detecting the obstacle based on the three-dimensional point cloud data and the rough bounding box to obtain the edge position of the obstacle; wherein detecting the obstacle based on the three-dimensional point cloud data and the rough bounding box to obtain the edge position of the obstacle includes: determining multiple key points corresponding to the rough bounding box based on the position of each point in the three-dimensional point cloud data and the rough bounding box; fusing, based on the key point feature of each key point and the position of the rough bounding box, the key point features of the key points corresponding to the rough bounding box to obtain a bounding box feature of the rough bounding box; and detecting the bounding box feature to obtain the edge position of the obstacle corresponding to the rough bounding box.
2. The method according to claim 1, characterized in that the method further includes: generating training data based on the edge position of the obstacle and the three-dimensional point cloud data, wherein the training data is used to train an obstacle detection model, and the obstacle detection model is used to perform obstacle detection.
3. The method according to claim 1, characterized in that determining multiple key points corresponding to the rough bounding box based on the position of each point in the three-dimensional point cloud data and the rough bounding box includes: determining, based on the position of each point in the three-dimensional point cloud data, a centroid corresponding to the rough bounding box, the centroid being a point corresponding to the rough bounding box; determining, among the multiple points corresponding to the rough bounding box, the point farthest from the centroid as a key point; and, for each of the remaining points, obtaining the distance between the point and the centroid and the distance between the point and each key point already determined, and taking the minimum of these distances as the distance corresponding to the point; and determining the remaining point with the largest corresponding distance as a key point, repeating until a first preset number of key points has been obtained.
4. The method according to claim 3, characterized in that determining the centroid corresponding to the rough bounding box based on the position of each point in the three-dimensional point cloud data includes: obtaining the average position of the multiple points corresponding to the rough bounding box and determining the point at that average position as the centroid; or determining any one of the multiple points corresponding to the rough bounding box as the centroid.
5. The method according to claim 1, characterized in that the method further includes: dividing the three-dimensional point cloud data into a second preset number of two-dimensional regions from a two-dimensional viewing angle; performing feature extraction on each of the second preset number of two-dimensional regions to obtain a first feature matrix, each element of which indicates the first feature of the corresponding two-dimensional region; performing feature extraction on the first feature matrix to obtain a second feature matrix, each element of which indicates the second feature of the corresponding two-dimensional region; and, for each of the multiple key points, concatenating the first feature, in the first feature matrix, of the two-dimensional region to which the key point belongs, the second feature, in the second feature matrix, of that two-dimensional region, and the original feature of the key point, to obtain the key point feature of the key point.
6. The method according to claim 1, characterized in that the method further includes: detecting the obstacle based on the three-dimensional point cloud data and the rough bounding box to obtain the category of the obstacle.
7. The method according to claim 1, characterized in that the method further includes: detecting the obstacle based on the three-dimensional point cloud data and the rough bounding box to obtain the movement direction angle of the obstacle, the movement direction angle being the angle between the movement direction of the obstacle and a preset direction.
8. The method according to any one of claims 1 to 7, characterized in that detecting the obstacle based on the three-dimensional point cloud data and the rough bounding box to obtain the edge position of the obstacle is implemented based on an edge detection model, and the method further includes: obtaining a sample edge position of the obstacle corresponding to a rough bounding box in sample three-dimensional point cloud data; detecting the obstacle based on the edge detection model, the sample three-dimensional point cloud data, and the rough bounding box to obtain a predicted edge position of the obstacle corresponding to the rough bounding box; and training the edge detection model based on the predicted edge position and the sample edge position.
9. A terminal, characterized in that the terminal includes one or more processors and one or more memories, the one or more memories storing at least one piece of program code that is loaded and executed by the one or more processors to perform the operations of the edge position marking method according to any one of claims 1 to 8.
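The key point selection recited in claims 1, 3 and 4 behaves like farthest point sampling seeded from a centroid. The following is a minimal Python sketch under stated assumptions, not the patented implementation. It assumes the rough bounding box is axis-aligned and given by hypothetical `box_min`/`box_max` corners, because the claims specify neither how the box is parameterized nor how points are assigned to it. The names `points_in_box`, `mean_centroid` and `select_key_points` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def points_in_box(cloud: Sequence[Point], box_min: Point, box_max: Point) -> List[Point]:
    """Hypothetical helper: keep the points inside an axis-aligned rough bounding box."""
    return [
        p for p in cloud
        if all(lo <= v <= hi for v, lo, hi in zip(p, box_min, box_max))
    ]


def mean_centroid(points: Sequence[Point]) -> Point:
    """First option of claim 4: the centroid is the average position of the points."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


def select_key_points(points: Sequence[Point], centroid: Point, k: int) -> List[int]:
    """Return the indices of up to k key points, chosen as described in claim 3.

    Each remaining point's distance is the minimum of its distance to the centroid
    and its distances to every key point chosen so far. The point with the largest
    such distance becomes the next key point.
    """
    # Before any key point exists, the centroid is the only reference.
    min_dist = [math.dist(p, centroid) for p in points]
    remaining = set(range(len(points)))
    selected: List[int] = []
    while remaining and len(selected) < k:
        # Ties go to the lower index so the result is deterministic.
        best = max(remaining, key=lambda i: (min_dist[i], -i))
        selected.append(best)
        remaining.discard(best)
        # Fold the new key point into every remaining point's minimum distance.
        for i in remaining:
            min_dist[i] = min(min_dist[i], math.dist(points[i], points[best]))
    return selected
```

To use the second option of claim 4, pass any point in the box (for example `points[0]`) as `centroid` in place of `mean_centroid(points)`. A point used as the centroid has distance 0, so it is selected last, if at all.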
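Claim 5 specifies the concatenation order of a key point feature: the first feature of the key point's region, then the second feature of that region, then the key point's original feature. It does not specify how the regions or feature matrices are computed. The sketch below therefore assumes that the two-dimensional regions are cells of a top-down (x-y) grid with a hypothetical origin `x_min`/`y_min` and square `cell_size`, and it accepts both feature matrices as precomputed inputs. Clamping points outside the grid to the nearest edge cell is also an assumption. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

# [row][col] -> feature vector of that two-dimensional region.
FeatureMatrix = List[List[List[float]]]


def region_of(x: float, y: float, x_min: float, y_min: float,
              cell_size: float, rows: int, cols: int) -> Tuple[int, int]:
    """Map x/y coordinates to the (row, col) of a grid cell, clamped to the grid."""
    row = int((y - y_min) // cell_size)
    col = int((x - x_min) // cell_size)
    return min(max(row, 0), rows - 1), min(max(col, 0), cols - 1)


def key_point_feature(point: Sequence[float], original: Sequence[float],
                      first: FeatureMatrix, second: FeatureMatrix,
                      x_min: float, y_min: float, cell_size: float) -> List[float]:
    """Concatenate [first feature | second feature | original feature] for one key point.

    `first` and `second` must share the same grid shape, since each element of
    either matrix describes the same two-dimensional region.
    """
    rows, cols = len(first), len(first[0])
    row, col = region_of(point[0], point[1], x_min, y_min, cell_size, rows, cols)
    return [*first[row][col], *second[row][col], *original]
```

In this sketch, `original` could hold, for example, the key point's coordinates or reflectance. The claim does not say what the original feature contains.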
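Claim 7 defines the movement direction angle as the angle between the obstacle's movement direction and a preset direction. The minimal sketch below assumes that both directions are 2D vectors in the ground (x-y) plane, that the preset direction defaults to a hypothetical +x axis, and that the angle is signed (counterclockwise positive, in radians). The claim fixes none of these conventions, and `movement_direction_angle` is an illustrative name.

```python
import math
from typing import Tuple

Vector2 = Tuple[float, float]


def movement_direction_angle(motion: Vector2, preset: Vector2 = (1.0, 0.0)) -> float:
    """Signed angle in radians, in [-pi, pi], that rotates `preset` onto `motion`.

    A positive value means `motion` is counterclockwise of `preset`. A zero
    motion vector has no direction, and in that case this sketch returns 0.0.
    """
    mx, my = motion
    px, py = preset
    cross = px * my - py * mx  # sine term, scaled by both vector lengths
    dot = px * mx + py * my    # cosine term, scaled by both vector lengths
    return math.atan2(cross, dot)
```

Reporting the angle in degrees (using `math.degrees`) or in the range [0, 2π) would be an equally valid convention. Pairing `atan2` with the cross and dot products avoids the quadrant ambiguity of `acos`.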