Lane line labeling method and device of high-definition map, electronic equipment and storage medium
By combining image and point cloud semantic segmentation models and fusing lane line location and category information, the problem of high manual cost and low accuracy in lane line annotation in high-precision maps is solved, and efficient and accurate automated lane line annotation is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHIDAO NETWORK TECH (BEIJING) CO LTD
- Filing Date
- 2022-10-12
- Publication Date
- 2026-06-16
AI Technical Summary
Existing high-precision map lane line annotation methods suffer from high manual costs, low accuracy and recall, and unsatisfactory automated annotation based on single data. The lane line positions in image segmentation are not accurate enough, and the lane line categories in point cloud segmentation are not accurate enough.
By combining image semantic segmentation models and point cloud semantic segmentation models, road image data and laser point cloud data are acquired and semantically segmented separately to obtain the position and category information of lane lines. The results are then fused, and feature pyramid networks and point cloud semantic segmentation algorithms such as PointConv are used for feature extraction and 3D point filtering to improve the accuracy and efficiency of lane line labeling.
It improves the accuracy and efficiency of lane line annotation, makes up for the problems of inaccurate image segmentation location and inaccurate point cloud segmentation category, realizes end-to-end high-precision map lane line automatic annotation, reduces labor and time costs, and improves detection speed.
Figure CN115546752B_ABST
Description
Technical Field
[0001] This application relates to the field of high-precision map technology, and in particular to a method and apparatus for lane line marking in high-precision maps, electronic equipment and storage medium.
Background Technology
[0002] With the rapid development of autonomous driving technology, technologies such as perception, control, localization, and high-precision maps applied to autonomous driving are also constantly evolving. Their purpose is to ensure the safety of autonomous vehicles. Among them, high-precision maps play an important role in the autonomous driving process, providing autonomous vehicles with information such as the current road and surrounding traffic signs. They play a crucial role in lane departure and trajectory planning, and at the same time provide important safety guarantees for autonomous vehicles.
[0003] The annotation methods for high-precision maps mainly include manual annotation and a combination of automated annotation and manual repair. With the continuous development of deep learning, deep learning models have also been gradually applied to the automated annotation process of high-precision maps. These models have been trained with data and can learn more semantic information, so they can still segment lane lines even when there are no markers in the image.
[0004] However, the aforementioned solutions based on manual annotation require a significant amount of manual work for high-precision map annotation, resulting in high time and labor costs. Solutions combining automated annotation and manual repair are mostly based on single data sources such as point cloud data or image data, leading to low accuracy and recall rates in the generated high-precision maps. Automated annotation is ineffective, and manual repair also incurs substantial time and labor costs. While data-driven methods using deep learning models can easily extend to changes in road scenes, the results obtained from single data sources such as point cloud data or image data still have limitations. For example, lane line positions segmented from images are not accurate enough, and lane line categories segmented from point clouds are not accurate enough.
Summary of the Invention
[0005] This application provides a method, apparatus, electronic device, and storage medium for lane line annotation in high-precision maps, to improve the accuracy and efficiency of lane line annotation.
[0006] The embodiments of this application adopt the following technical solutions:
[0007] In a first aspect, embodiments of this application provide a method for lane line annotation in a high-precision map, wherein the method includes:
[0008] Acquire the current road image data and the corresponding laser point cloud data;
[0009] The road image data is semantically segmented using a preset image semantic segmentation model to obtain a first image semantic segmentation result, which includes the location and category of discrete points of lane lines in the road image.
[0010] The laser point cloud data is semantically segmented using a preset point cloud semantic segmentation model to obtain point cloud semantic segmentation results, which include the positions of lane line 3D points in 3D space.
[0011] The semantic segmentation results of the first image and the semantic segmentation results of the point cloud are fused to obtain the lane line recognition results, so as to perform lane line annotation in the high-precision map based on the lane line recognition results.
[0012] Optionally, the step of using a preset image semantic segmentation model to perform semantic segmentation on the road image data to obtain a first image semantic segmentation result includes:
[0013] The feature pyramid network in the preset image semantic segmentation model is used to extract features from the road image data to obtain the feature map of the road image.
[0014] The feature map of the road image is divided into grids to obtain a feature map containing multiple grids;
[0015] The prediction network in the preset image semantic segmentation model is used to predict the feature map containing multiple grids to obtain the feature map prediction result;
[0016] The location and category of discrete lane line points in the road image are determined based on the feature map prediction results.
[0017] Optionally, the feature map prediction result includes the category of each grid in the feature map, and determining the location and category of the discrete points of the lane lines in the road image based on the feature map prediction result includes:
[0018] The grid in which the lane line is located is determined based on the category of each grid in the feature map;
[0019] The initial position of the discrete points of the lane line is determined based on the center position of the grid in which the lane line is located.
[0020] The feature map containing multiple grids is convolved to obtain the center position correction value of the grid where the lane line is located.
[0021] The initial positions of the discrete points of the lane lines are corrected by using the center position correction value of the grid where the lane lines are located, thus obtaining the corrected positions of the discrete points of the lane lines.
[0022] Optionally, fusing the semantic segmentation result of the first image and the semantic segmentation result of the point cloud includes:
[0023] The discrete points of the lane lines in the road image are projected into 3D space to obtain the projection points of the discrete points of the lane lines in 3D space.
[0024] Based on the projection points of the discrete points of the lane lines in 3D space, the 3D points of the lane lines in 3D space are filtered.
[0025] The lane line recognition result is determined based on the selected lane line 3D points.
[0026] Optionally, the step of filtering the lane line 3D points in the 3D space based on the projection points of the lane line discrete points in the 3D space includes:
[0027] A sphere of a predetermined size is constructed with the position of the projection point of the discrete points of the lane line in 3D space as the center.
[0028] A 3D point located in a sphere of a preset size in the 3D space is identified as a candidate matching point corresponding to the projection point;
[0029] The 3D point in the 3D space that matches the projection point is determined based on the category score of the candidate matching point;
[0030] The 3D points that match the projection points are used as the filtered lane line 3D points.
[0031] Optionally, determining the lane line recognition result based on the filtered lane line 3D points includes:
[0032] The positions of the filtered lane line 3D points are used as the positions of the final lane line discrete points.
[0033] The category of the filtered lane line 3D points is determined based on the category corresponding to the projection points, and this category is used as the final category of the lane line discrete points.
[0034] The final position and category of the lane line discrete points are used as the lane line recognition result.
[0035] Optionally, the preset image semantic segmentation model is trained in the following manner:
[0036] Acquire training sample images and input the training sample images into a preset image semantic segmentation model to obtain a second image semantic segmentation result;
[0037] The relative positions of each lane line discrete point in the training sample image are determined based on the semantic segmentation results of the second image;
[0038] The loss weights corresponding to each lane line discrete point are determined based on their relative positions in the road image.
[0039] The loss value of the preset image semantic segmentation model is determined based on the loss weights corresponding to the discrete points of each lane line, and the parameters of the preset image semantic segmentation model are updated using the loss value of the preset image semantic segmentation model to obtain the trained preset image semantic segmentation model.
[0040] In a second aspect, embodiments of this application also provide a lane marking device for high-precision maps, wherein the device includes:
[0041] The acquisition unit is used to acquire the current road image data and the corresponding laser point cloud data;
[0042] The first semantic segmentation unit is used to perform semantic segmentation on the road image data using a preset image semantic segmentation model to obtain a first image semantic segmentation result, which includes the location and category of discrete points of lane lines in the road image.
[0043] The second semantic segmentation unit is used to perform semantic segmentation on the laser point cloud data using a preset point cloud semantic segmentation model to obtain point cloud semantic segmentation results, the point cloud semantic segmentation results including the positions of lane line 3D points in 3D space.
[0044] The fusion unit is used to fuse the semantic segmentation result of the first image and the semantic segmentation result of the point cloud to obtain the lane line recognition result, so as to perform lane line annotation in the high-precision map based on the lane line recognition result.
[0045] In a third aspect, embodiments of this application also provide an electronic device, including:
[0046] A processor; and
[0047] A memory configured to store computer-executable instructions, which, when executed, cause the processor to perform any of the methods described above.
[0048] In a fourth aspect, embodiments of this application also provide a computer-readable storage medium that stores one or more programs, which, when executed by an electronic device including multiple applications, cause the electronic device to perform any of the methods described above.
[0049] At least one of the technical solutions adopted in the embodiments of this application can achieve the following beneficial effects: The lane line annotation method for high-precision maps in this application embodiment first acquires the current road image data and the corresponding laser point cloud data; then, it uses a preset image semantic segmentation model to perform semantic segmentation on the road image data to obtain a first image semantic segmentation result, which includes the position and category of discrete lane line points in the road image; then, it uses a preset point cloud semantic segmentation model to perform semantic segmentation on the laser point cloud data to obtain a point cloud semantic segmentation result, which includes the position of 3D lane line points in 3D space; finally, it fuses the first image semantic segmentation result and the point cloud semantic segmentation result to obtain a lane line recognition result, so as to perform lane line annotation in the high-precision map based on the lane line recognition result. By fusing the image semantic segmentation result and the point cloud semantic segmentation result of the lane line, the method compensates for the inaccurate lane line positions obtained from image semantic segmentation and the inaccurate lane line categories obtained from point cloud semantic segmentation, thereby improving the accuracy of lane line annotation. The discretization of lane line points further improves the efficiency of lane line annotation.
Attached Figure Description
[0050] The accompanying drawings, which are included to provide a further understanding of this application and form part of this application, illustrate exemplary embodiments and are used to explain this application, but do not constitute an undue limitation of this application. In the drawings:
[0051] Figure 1 is a flowchart illustrating a lane line marking method for a high-precision map according to an embodiment of this application;
[0052] Figure 2 is a schematic diagram of the structure of a lane marking device for a high-precision map according to an embodiment of this application;
[0053] Figure 3 is a schematic diagram of the structure of an electronic device according to an embodiment of this application.
Detailed Implementation
[0054] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below in conjunction with specific embodiments and corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0055] The technical solutions provided by the various embodiments of this application are described in detail below with reference to the accompanying drawings.
[0056] This application provides a method for lane line annotation in high-precision maps. Figure 1 is a flowchart of the lane line annotation method for a high-precision map according to an embodiment of this application. As shown in Figure 1, the method includes at least the following steps S110 to S140:
[0057] Step S110: Obtain the current road image data and the corresponding laser point cloud data.
[0058] The lane marking method for high-precision maps in this application embodiment can be executed by the vehicle. The vehicle is equipped with a camera and a LiDAR. After adjusting the parameters, data can be collected in real time. When marking lanes in a high-precision map, it is necessary to first obtain the road image data collected by the current camera and the laser point cloud data collected by the LiDAR.
[0059] In addition, since the data acquisition frequencies of cameras and lidar are different, time synchronization processing can be performed on road image data and lidar point cloud data to ensure the accuracy of subsequent data processing.
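The time synchronization mentioned above is not specified further in this application. As one possible illustration only, the sketch below pairs each camera frame with the LiDAR sweep closest in time. The function name `match_nearest`, the use of timestamps in seconds, and the tolerance `max_gap` are assumptions made for this example and do not come from the application.

```python
from bisect import bisect_left


def match_nearest(image_stamps, lidar_stamps, max_gap=0.05):
    """Pair each image timestamp with the closest LiDAR timestamp.

    Both lists are assumed to be sorted and expressed in seconds.
    Pairs further apart than max_gap (a hypothetical tolerance) are dropped.
    Returns a list of (image_index, lidar_index) tuples.
    """
    pairs = []
    if not lidar_stamps:
        return pairs
    for i, t in enumerate(image_stamps):
        k = bisect_left(lidar_stamps, t)
        # Candidates: the sweep just before t and the sweep at or after t.
        candidates = [j for j in (k - 1, k) if 0 <= j < len(lidar_stamps)]
        j = min(candidates, key=lambda c: abs(lidar_stamps[c] - t))
        if abs(lidar_stamps[j] - t) <= max_gap:
            pairs.append((i, j))
    return pairs
```

A camera running faster than the LiDAR will map several frames to the same sweep; whether that is acceptable, or whether interpolation is needed instead, depends on vehicle speed and is left open here.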
[0060] Step S120: Use a preset image semantic segmentation model to perform semantic segmentation on the road image data to obtain a first image semantic segmentation result. The first image semantic segmentation result includes the location and category of discrete points of lane lines in the road image.
[0061] This application embodiment can use a pre-trained preset image semantic segmentation model to perform semantic segmentation on road image data, thereby segmenting lane line related information from the road image, specifically including the location and category of lane line discrete points, where the category indicates whether a point belongs to a lane line.
[0062] Existing image semantic segmentation models produce a continuous set of pixels after semantic segmentation of lane lines in an image. However, high-precision maps do not require all continuous lane line points for lane line annotation, which places high demands on model learning difficulty and segmentation accuracy. Therefore, the preset image semantic segmentation model used in this application outputs discrete lane line points. Connecting these discrete points yields a complete lane line, thus reducing the difficulty of model learning and the amount of data required for subsequent processing.
[0063] Step S130: Use a preset point cloud semantic segmentation model to perform semantic segmentation on the laser point cloud data to obtain point cloud semantic segmentation results. The point cloud semantic segmentation results include the positions of lane line 3D points in 3D space.
[0064] This application embodiment also uses a pre-trained preset point cloud semantic segmentation model to perform semantic segmentation on the laser point cloud data. The preset point cloud semantic segmentation model can be implemented using the PointConv algorithm, which performs convolution on non-uniformly sampled 3D point clouds. The discrete convolution can be regarded as a discrete approximation of a continuous convolution. In 3D space, the weights of the convolution operator can be regarded as a continuous function of the coordinates of a local 3D point relative to the reference 3D point, and this continuous function can be approximated by an MLP (Multilayer Perceptron). Since most existing algorithms do not consider the influence of non-uniform sampling, PointConv re-weights the continuous function learned by the MLP using an inverse density estimate, which corresponds to a Monte Carlo approximation of the continuous convolution. On this basis, the point cloud semantic segmentation results can be obtained, specifically including the positions of the lane line 3D points in 3D space and the classification of foreground and background points.
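To make the inverse-density idea concrete, the following is a simplified NumPy sketch of a density re-weighted aggregation around one reference point. It is not the PointConv implementation and not the model of this application: the Gaussian kernel density estimate, the bandwidth, the tiny random MLP in `make_weight_mlp`, and all function names are assumptions chosen for illustration, and the inverse density is applied directly rather than being passed through a further learned transform.

```python
import numpy as np


def gaussian_density(points, bandwidth=0.5):
    """Kernel density estimate at every point (Gaussian kernel, O(N^2))."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1)


def make_weight_mlp(c_in, c_out, hidden=8, seed=0):
    """Hypothetical two-layer MLP mapping relative coordinates to weights."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(size=(3, hidden))
    w2 = rng.normal(size=(hidden, c_in * c_out))

    def weight_fn(rel):
        h = np.maximum(rel @ w1, 0.0)  # ReLU
        return (h @ w2).reshape(-1, c_in, c_out)

    return weight_fn


def density_weighted_conv(center, neighbors, feats, density, weight_fn):
    """Sum over neighbors of (1 / density) * W(relative position) * feature.

    neighbors: (K, 3), feats: (K, C_in), density: (K,)
    weight_fn: maps (K, 3) relative coordinates to (K, C_in, C_out) weights.
    Returns a (C_out,) feature vector for the reference point.
    """
    rel = neighbors - center
    weights = weight_fn(rel)
    inv_density = 1.0 / density
    return np.einsum("k,kio,ki->o", inv_density, weights, feats)
```

Points in densely sampled regions receive a small `1 / density` factor, so a cluster of nearby returns does not dominate the sum over sparsely sampled regions.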
[0065] Of course, in addition to using the above-mentioned preset point cloud semantic segmentation model, those skilled in the art can also flexibly adopt other types of point cloud semantic segmentation models according to actual needs, without making specific limitations here.
[0066] Step S140: The semantic segmentation result of the first image and the semantic segmentation result of the point cloud are fused to obtain the lane line recognition result, so as to perform lane line annotation in the high-precision map according to the lane line recognition result.
[0067] Since the lane line positions in the image semantic segmentation results are not accurate enough, and the lane line categories in the point cloud semantic segmentation results are not accurate enough, this application adopts a certain fusion strategy to fuse the first image semantic segmentation results and the point cloud semantic segmentation results, thereby ensuring that the finally identified lane lines have both high positional accuracy and high category accuracy. Finally, based on the lane line identification results obtained after fusion, lane lines are labeled in a high-precision map, which improves the accuracy of lane line labeling.
[0068] The lane line labeling method for high-precision maps in this application integrates the image semantic segmentation results and the point cloud semantic segmentation results of lane lines, which makes up for the problems of inaccurate lane line positions obtained based on image semantic segmentation results and inaccurate lane line categories obtained based on point cloud semantic segmentation results, thereby improving the accuracy of lane line labeling. Furthermore, the lane line labeling efficiency is improved through the discretization processing of lane line points.
[0069] In some embodiments of this application, the step of semantically segmenting the road image data using a preset image semantic segmentation model to obtain a first image semantic segmentation result includes: extracting features from the road image data using a feature pyramid network in the preset image semantic segmentation model to obtain a feature map of the road image; dividing the feature map of the road image into grids to obtain a feature map containing multiple grids; predicting the feature map containing multiple grids using a prediction network in the preset image semantic segmentation model to obtain a feature map prediction result; and determining the location and category of discrete points of lane lines in the road image based on the feature map prediction result.
[0070] Since lane lines are relatively small in images and belong to small target detection objects, this embodiment of the application can first use Feature Pyramid Networks (FPN) to extract features from road image data when using a preset image semantic segmentation model to perform semantic segmentation on road image data, thereby obtaining the corresponding feature map. In this way, the information of lane line targets can be fully preserved during the convolution operation.
[0071] In order to obtain sparse lane line discrete points, the feature map of the road image obtained above is divided into a grid, that is, the feature map is divided into several grids of the same size, and convolution processing and prediction are performed on the feature map after grid division to obtain the prediction result of the feature map after grid division. Finally, the position and category of the lane line discrete points in the road image are determined according to the prediction result of the feature map after grid division.
[0072] In some embodiments of this application, the feature map prediction result includes the category of each grid in the feature map, and determining the position and category of the discrete lane line points in the road image based on the feature map prediction result includes: determining the grid where the lane line is located based on the category of each grid in the feature map; determining the initial position of the discrete lane line point based on the center position of the grid where the lane line is located; performing convolution processing on the feature map containing multiple grids to obtain a correction value for the center position of the grid where the lane line is located; and correcting the initial position of the discrete lane line point using the correction value for the center position of the grid where the lane line is located to obtain the corrected position of the discrete lane line point.
[0073] Based on the prediction results of the aforementioned embodiments, it is possible to determine which grids in the feature map after the current grid division are lane line grids, or to determine which grids the predicted lane line points fall within. Since the position range of each grid in the feature map can be determined, the center position of the grid where the lane line is located can be further determined. The center position of each grid where the lane line is located is used as the initial position of the coarse lane line discrete points. This process is equivalent to converting the originally segmented continuous lane line pixels into lane line discrete points based on the size of each grid.
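The grid division and the use of cell centers as coarse point positions can be sketched as follows. This is an illustrative example only: the square cell size `cell`, the per-cell lane score array, and the 0.5 threshold are assumptions, since the application does not state how the per-grid category is represented.

```python
import numpy as np


def grid_centers(height, width, cell):
    """Return the (x, y) center of every grid cell, shape (rows, cols, 2)."""
    rows, cols = height // cell, width // cell
    ys = (np.arange(rows) + 0.5) * cell
    xs = (np.arange(cols) + 0.5) * cell
    gx, gy = np.meshgrid(xs, ys)  # both have shape (rows, cols)
    return np.stack([gx, gy], axis=-1)


def lane_cells(scores, threshold=0.5):
    """(row, col) indices of cells whose lane score exceeds the threshold."""
    return np.argwhere(scores > threshold)


def initial_points(scores, centers, threshold=0.5):
    """Coarse lane line discrete points: centers of the lane cells."""
    idx = lane_cells(scores, threshold)
    return centers[idx[:, 0], idx[:, 1]]
```

For a 4 x 8 feature map with `cell=2`, the map is split into 2 x 4 cells, and each lane cell contributes exactly one point, which is what turns a dense pixel mask into sparse discrete points.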
[0074] Since each grid corresponds to a location range, the position of the discrete lane line point determined based on the center position of each grid has a certain deviation from the actual position of the lane line. Therefore, the initial position of the discrete lane line point can be corrected to improve the position accuracy of the discrete lane line point.
[0075] The initial positions of the discrete lane line points can be corrected as follows: 1) for the grids where a lane line is located, for example 1*n grids, a convolution kernel is generated; 2) this convolution kernel is used to convolve the grid-divided feature map, yielding a new output, such as a 2*n matrix, corresponding to the n grids where the lane line is located, so that the position correction value (Δx, Δy) of each grid is obtained; 3) the position correction value (Δx, Δy) of each grid is used to correct the initial position (x, y) of the discrete lane line point corresponding to that grid, giving the corrected position (x+Δx, y+Δy).
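The final correction step is a per-point addition. A minimal sketch, assuming the correction values arrive as a 2*n array whose first row holds Δx and whose second row holds Δy (the row order is an assumption; the application only states the 2*n shape):

```python
import numpy as np


def refine_points(centers, offsets):
    """Apply per-grid corrections to the initial point positions.

    centers: (n, 2) array of initial (x, y) grid-center positions.
    offsets: (2, n) array; row 0 is assumed to be dx, row 1 dy.
    Returns an (n, 2) array of corrected (x + dx, y + dy) positions.
    """
    centers = np.asarray(centers, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    if offsets.shape != (2, centers.shape[0]):
        raise ValueError("offsets must have shape (2, n)")
    return centers + offsets.T
```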
[0076] In some embodiments of this application, the fusion of the first image semantic segmentation result and the point cloud semantic segmentation result includes: projecting discrete lane line points in the road image into 3D space to obtain the projection points of the discrete lane line points in 3D space; filtering the 3D lane line points in the 3D space based on the projection points of the discrete lane line points in 3D space; and determining the lane line recognition result based on the filtered 3D lane line points.
[0077] The discrete lane line point information in the image semantic segmentation result is two-dimensional information in the image coordinate system, while the point cloud semantic segmentation result corresponds to 3D information of lane lines in three-dimensional space. When fusing the two, the discrete lane line points in the road image can therefore be projected into 3D space through perspective transformation to obtain their projection points in 3D space. The number of lane line 3D points segmented in 3D space is very large, not all of them are needed in the actual annotation of high-precision maps, and the point cloud segmentation result contains some 3D points with inaccurate classification. The discrete lane line points obtained from image semantic segmentation, in contrast, provide accurate lane line point category information. The lane line 3D points segmented in 3D space can therefore be filtered on this basis, and a more accurate lane line recognition result can be determined from the filtered lane line 3D points.
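The application does not detail the perspective transformation. One common way to lift image points into 3D, shown here purely as an assumption, is to intersect each pixel's viewing ray with a flat ground plane at z = 0. The intrinsic matrix `K`, the camera-to-world rotation `R`, the camera position `cam_pos`, and the flat-ground assumption are all hypothetical inputs for this sketch.

```python
import numpy as np


def backproject_to_ground(pixels, K, R, cam_pos):
    """Intersect pixel viewing rays with the plane z = 0.

    pixels:  (n, 2) pixel coordinates (u, v).
    K:       (3, 3) camera intrinsic matrix.
    R:       (3, 3) rotation from camera frame to world frame.
    cam_pos: (3,) camera center in the world frame.
    Returns (n, 3) world points; rows are NaN where the ray does not
    hit the ground in front of the camera.
    """
    pixels = np.asarray(pixels, dtype=float)
    cam_pos = np.asarray(cam_pos, dtype=float)
    K = np.asarray(K, dtype=float)
    R = np.asarray(R, dtype=float)
    homog = np.hstack([pixels, np.ones((len(pixels), 1))])
    rays = (R @ np.linalg.inv(K) @ homog.T).T  # ray directions in world frame
    with np.errstate(divide="ignore", invalid="ignore"):
        scale = -cam_pos[2] / rays[:, 2]
        points = cam_pos + scale[:, None] * rays
    points[~(np.isfinite(scale) & (scale > 0))] = np.nan
    return points
```

Because real roads are not perfectly flat and calibration is imperfect, points obtained this way carry position error, which is consistent with the text's choice to take final positions from the point cloud rather than from the projection.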
[0078] In some embodiments of this application, the step of filtering lane line 3D points in 3D space based on the projection points of the discrete lane line points in 3D space includes: constructing a sphere of a preset size centered on the position of the projection points of the discrete lane line points in 3D space; determining 3D points in 3D space located within the sphere of the preset size as candidate matching points corresponding to the projection points; determining 3D points in 3D space that match the projection points based on the category scores of the candidate matching points; and using the 3D points that match the projection points as the filtered lane line 3D points.
[0079] Since the lane line positions obtained based on image semantic segmentation are not accurate enough compared to those obtained based on point cloud segmentation, this embodiment of the application can first construct a sphere with a preset radius centered on the projection point of the discrete lane line points in 3D space when filtering lane line 3D points in 3D space. Then, all 3D points located within the sphere are regarded as candidate matching points corresponding to the projection point. Since the accuracy of the lane line discrete points has been further improved after the aforementioned embodiment corrects the position of the discrete lane line points, this method can reduce the impact of position deviation to a certain extent.
[0080] Since the semantic segmentation results of point clouds can also obtain the category scores of 3D points, the higher the category score of a 3D point, the more likely the 3D point is to be a lane line 3D point. Therefore, the 3D point with the highest category score in the sphere can be used as the 3D point that is finally matched with the projection point, that is, the 3D point after final screening. In this way, the lane line 3D point that matches the projection point of each lane line discrete point can be obtained.
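The sphere-based matching described above can be sketched as follows. The brute-force distance matrix and the convention of returning -1 for an empty sphere are choices made for this example; a spatial index such as a k-d tree would be the usual replacement for large point clouds.

```python
import numpy as np


def match_in_sphere(proj_points, cloud_xyz, cloud_scores, radius):
    """Pick, for each projection point, the best-scoring cloud point nearby.

    proj_points:  (m, 3) projection points of the image discrete points.
    cloud_xyz:    (n, 3) lane line 3D points from point cloud segmentation.
    cloud_scores: (n,) category score of each 3D point.
    radius:       preset sphere radius.
    Returns an (m,) integer array of cloud indices, -1 where no 3D point
    falls inside the sphere.
    """
    proj_points = np.asarray(proj_points, dtype=float)
    cloud_xyz = np.asarray(cloud_xyz, dtype=float)
    cloud_scores = np.asarray(cloud_scores, dtype=float)
    dist = np.linalg.norm(proj_points[:, None, :] - cloud_xyz[None, :, :], axis=-1)
    # Scores of points outside the sphere are masked out.
    masked = np.where(dist <= radius, cloud_scores[None, :], -np.inf)
    best = masked.argmax(axis=1)
    best[np.isneginf(masked.max(axis=1))] = -1
    return best
```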
[0081] In some embodiments of this application, determining the lane line recognition result based on the filtered lane line 3D points includes: using the position of the filtered lane line 3D points as the position of the final lane line discrete point; determining the category of the filtered lane line 3D points according to the category corresponding to the projection point, and using it as the category of the final lane line discrete point; and using the position of the final lane line discrete point and the category of the final lane line discrete point as the lane line recognition result.
[0082] Based on the aforementioned embodiments, the 3D points of the lane lines that match the projection points of each discrete lane line point can be obtained. Therefore, the filtered 3D points of the lane lines are essentially also discrete 3D points of the lane lines. The category corresponding to the projection point of each discrete lane line point can be directly assigned to the matching 3D points of the lane lines to obtain the category of the final discrete lane line point, and the position of the 3D points of the lane lines can be used as the position of the final discrete lane line point.
[0083] In other words, the lane line positions in the final lane line recognition result essentially originate from the 3D lane line points segmented from the point cloud data, while the lane line categories are derived from the categories of the discrete lane line points segmented from the image. This addresses both the inaccuracy of lane line positions in image semantic segmentation and the inaccuracy of lane line categories in point cloud semantic segmentation. Furthermore, based on the positions and categories of the discrete lane line points segmented from the image, the 3D lane line points are also filtered, significantly reducing the amount of data processed and improving lane line annotation efficiency.
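Assembling the final result then amounts to taking the position from the matched 3D point and the category from the image point. In the sketch below, the dictionary layout, the integer category ids, and the use of -1 to mark an unmatched projection point are assumptions for illustration.

```python
def fuse_results(matches, image_categories, cloud_xyz):
    """Build the lane line recognition result from matched pairs.

    matches:          per image point, index of the matched 3D point or -1.
    image_categories: per image point, category from image segmentation.
    cloud_xyz:        sequence of (x, y, z) lane line 3D points.
    Returns a list of {"position": (x, y, z), "category": c} entries,
    skipping image points without a matched 3D point.
    """
    results = []
    for i, j in enumerate(matches):
        if j < 0:
            continue
        results.append({
            "position": tuple(cloud_xyz[j]),    # position from the point cloud
            "category": image_categories[i],    # category from the image
        })
    return results
```

The output contains at most one 3D point per image discrete point, which is where the reduction in data volume described above comes from.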
[0084] In some embodiments of this application, the preset image semantic segmentation model is trained as follows: a training sample image is acquired and input into the preset image semantic segmentation model to obtain a second image semantic segmentation result; the relative positions of each lane line discrete point in the training sample image are determined based on the second image semantic segmentation result; the loss weights corresponding to each lane line discrete point are determined based on their relative positions in the road image; the loss value of the preset image semantic segmentation model is determined based on the loss weights corresponding to each lane line discrete point, and the parameters of the preset image semantic segmentation model are updated using the loss value of the preset image semantic segmentation model to obtain the trained preset image semantic segmentation model.
[0085] In this embodiment of the application, when training a preset image semantic segmentation model, it is necessary to first collect training sample images, and then input them into the preset image semantic segmentation model to obtain the second image semantic segmentation result, including the position and category of the segmented lane line discrete points. By comparing with the label information of the training sample images, the loss value of each lane line discrete point can be calculated, and the model parameters can be optimized by the magnitude of the loss value. The specific loss function used can be, for example, the cross-entropy loss function or the Focal Loss loss function.
[0086] Considering that the farther the lane line is from the observation point, the greater the detection difficulty, this embodiment of the application can assign different weights to the discrete points of each lane line based on their relative positions in the image. For example, the farther the position of the discrete point of the lane line is from the observation point, that is, from the lower boundary of the image, the greater the detection difficulty, so it can be given a higher weight. Specifically, the normalized result of the Euclidean distance from the position of the discrete point of the lane line to the position of the observation point can be used as the weight of the lane line discrete point loss. Finally, the model parameters are optimized by weighting, thereby improving the accuracy of the image semantic segmentation model for detecting distant targets.
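The distance-based weighting can be sketched as follows. Several details are assumptions, since the text does not fix them: the observation point is taken as the bottom-center pixel of the image, the distance is normalized by the largest possible distance from that point to an image corner, and the weighted per-point losses are averaged. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def distance_weights(points: Sequence[Tuple[float, float]],
                     width: int, height: int) -> List[float]:
    """Normalized Euclidean distance from each point to the observation point."""
    obs_x, obs_y = width / 2.0, float(height)   # assumption: bottom-center pixel
    max_dist = math.hypot(width / 2.0, height)  # distance to a top corner
    return [math.hypot(x - obs_x, y - obs_y) / max_dist for x, y in points]


def weighted_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Average of the per-point losses, each scaled by its distance weight."""
    if not losses:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)
```

Under this normalization a point located exactly at the observation point receives weight 0; an implementation that wants nearby points to keep contributing could add a small constant floor to each weight.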
[0087] In summary, the lane line annotation method for high-precision maps in this application has achieved at least the following technical effects:
[0088] 1) Compared with traditional lane line semantic segmentation models, the output of the image semantic segmentation model in this application is discrete points on the lane line, rather than a continuous set of pixels. These points can be directly connected into lines to serve as the automated annotation results of lane lines in high-precision maps, which reduces the difficulty of model learning.
[0089] 2) This application fuses the image semantic segmentation results and the point cloud semantic segmentation results, which improves the positional accuracy and category accuracy of lane lines;
[0090] 3) The automatic lane line annotation method of this application is end-to-end, which eliminates tedious intermediate steps and reduces labor and time costs; with GPU (Graphics Processing Unit) acceleration, the detection speed can also be raised to meet real-time requirements, thereby improving annotation efficiency;
[0091] 4) This application uses a dynamic weighting method to calculate the training loss value of the image semantic segmentation model, which can improve the detection accuracy of the image semantic segmentation model for distant targets.
[0092] An embodiment of this application also provides a lane marking device 200 for high-precision maps. Figure 2 is a schematic structural diagram of a lane marking device for a high-precision map according to an embodiment of this application. As shown in Figure 2, the device 200 includes: an acquisition unit 210, a first semantic segmentation unit 220, a second semantic segmentation unit 230, and a fusion unit 240, wherein:
[0093] The acquisition unit 210 is used to acquire the current road image data and the corresponding laser point cloud data;
[0094] The first semantic segmentation unit 220 is used to perform semantic segmentation on the road image data using a preset image semantic segmentation model to obtain a first image semantic segmentation result, the first image semantic segmentation result including the location and category of discrete points of lane lines in the road image;
[0095] The second semantic segmentation unit 230 is used to perform semantic segmentation on the laser point cloud data using a preset point cloud semantic segmentation model to obtain point cloud semantic segmentation results, the point cloud semantic segmentation results including the positions of lane line 3D points in 3D space.
[0096] The fusion unit 240 is used to fuse the semantic segmentation result of the first image and the semantic segmentation result of the point cloud to obtain the lane line recognition result, so as to perform lane line annotation in the high-precision map based on the lane line recognition result.
[0097] In some embodiments of this application, the first semantic segmentation unit 220 is specifically used to: extract features from the road image data using the feature pyramid network in the preset image semantic segmentation model to obtain a feature map of the road image; divide the feature map of the road image into grids to obtain a feature map containing multiple grids; predict the feature map containing multiple grids using the prediction network in the preset image semantic segmentation model to obtain a feature map prediction result; and determine the position and category of the discrete points of lane lines in the road image based on the feature map prediction result.
[0098] In some embodiments of this application, the feature map prediction result includes the category of each grid in the feature map, and the first semantic segmentation unit 220 is specifically used to: determine the grid where the lane line is located according to the category of each grid in the feature map; determine the initial position of the discrete point of the lane line according to the center position of the grid where the lane line is located; perform convolution processing on the feature map containing multiple grids to obtain the center position correction value of the grid where the lane line is located; and use the center position correction value of the grid where the lane line is located to correct the initial position of the discrete point of the lane line to obtain the corrected position of the discrete point of the lane line.
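The grid-based decoding performed by the first semantic segmentation unit can be illustrated with the sketch below. It assumes that category 0 denotes a background grid, that the correction values are pixel offsets `(dx, dy)` already produced by the convolution branch, and that all grids are squares of `cell_size` pixels; none of these conventions is stated in the application, and `decode_lane_points` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def decode_lane_points(grid_categories: Sequence[Sequence[int]],
                       offsets: Sequence[Sequence[Point2D]],
                       cell_size: float) -> List[Tuple[Point2D, int]]:
    """Return (corrected position, category) for every grid that contains a lane line."""
    points = []
    for row, categories in enumerate(grid_categories):
        for col, category in enumerate(categories):
            if category == 0:  # assumption: 0 marks a background grid
                continue
            # Initial position: the center of the grid where the lane line is located.
            cx = (col + 0.5) * cell_size
            cy = (row + 0.5) * cell_size
            # Corrected position: the center shifted by the predicted correction value.
            dx, dy = offsets[row][col]
            points.append(((cx + dx, cy + dy), category))
    return points
```

For example, with 8-pixel grids, a lane line grid at row 0, column 1 has center (12, 4); a correction value of (1.5, -2.0) moves the discrete point to (13.5, 2.0).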
[0099] In some embodiments of this application, the fusion unit 240 is specifically used to: project discrete lane line points in the road image into 3D space to obtain projection points of the discrete lane line points in 3D space; filter lane line 3D points in 3D space based on the projection points of the discrete lane line points in 3D space; and determine the lane line recognition result based on the filtered lane line 3D points.
[0100] In some embodiments of this application, the fusion unit 240 is specifically used to: construct a sphere of a preset size centered on the position of the projection point of the discrete lane line point in 3D space; determine the 3D point in the 3D space located in the sphere of the preset size as the candidate matching point corresponding to the projection point; determine the 3D point in the 3D space that matches the projection point according to the category score of the candidate matching point; and use the 3D point that matches the projection point as the filtered lane line 3D point.
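A minimal sketch of the sphere-based filtering follows. It assumes each point cloud point carries a lane line category score from the point cloud segmentation model, and it selects the candidate with the highest score, as recited in claim 1. The brute-force search, the function name `match_in_sphere` and the choice to return `None` when the sphere is empty are illustrative assumptions.

```python
import math
from typing import Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def match_in_sphere(projection: Point3D,
                    cloud_points: Sequence[Tuple[Point3D, float]],
                    radius: float) -> Optional[Point3D]:
    """Return the highest-scoring 3D point inside the sphere around the projection point."""
    best_position = None
    best_score = -math.inf
    for position, score in cloud_points:
        # Candidate matching points are the 3D points located inside the sphere.
        if math.dist(projection, position) <= radius and score > best_score:
            best_position, best_score = position, score
    return best_position
```

For large point clouds, a spatial index such as a k-d tree would normally replace the linear scan; the selection rule stays the same.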
[0101] In some embodiments of this application, the fusion unit 240 is specifically used to: use the position of the filtered lane line 3D points as the position of the final lane line discrete points; determine the category of the filtered lane line 3D points according to the category corresponding to the projection points, and use it as the category of the final lane line discrete points; use the position of the final lane line discrete points and the category of the final lane line discrete points as the lane line recognition result.
[0102] In some embodiments of this application, the preset image semantic segmentation model is trained as follows: a training sample image is acquired and input into the preset image semantic segmentation model to obtain a second image semantic segmentation result; the relative positions of each lane line discrete point in the training sample image are determined based on the second image semantic segmentation result; the loss weights corresponding to each lane line discrete point are determined based on their relative positions in the road image; the loss value of the preset image semantic segmentation model is determined based on the loss weights corresponding to each lane line discrete point, and the parameters of the preset image semantic segmentation model are updated using the loss value of the preset image semantic segmentation model to obtain the trained preset image semantic segmentation model.
[0103] It is understood that the lane marking device for high-precision maps described above can implement each step of the lane marking method for high-precision maps provided in the foregoing embodiments. The relevant explanations of the lane marking method for high-precision maps are applicable to the lane marking device for high-precision maps, and will not be repeated here.
[0104] Figure 3 is a schematic structural diagram of an electronic device according to an embodiment of this application. Referring to Figure 3, at the hardware level, the electronic device includes a processor, and optionally also includes an internal bus, a network interface, and memory. The memory may include main memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk storage device. Of course, the electronic device may also include hardware required for other services.
[0105] The processor, network interface, and memory can be interconnected via an internal bus, which can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only a single double-headed arrow is shown in Figure 3, but this does not mean that there is only one bus or one type of bus.
[0106] Memory is used to store programs. Specifically, programs may include program code, which includes computer operation instructions. Memory may include main memory and non-volatile memory, and provides instructions and data to the processor.
[0107] The processor reads the corresponding computer program from non-volatile memory into main memory and then executes it, forming a lane marking device for a high-precision map at the logical level. The processor executes the program stored in memory and specifically performs the following operations:
[0108] Acquire the current road image data and the corresponding laser point cloud data;
[0109] The road image data is semantically segmented using a preset image semantic segmentation model to obtain a first image semantic segmentation result, which includes the location and category of discrete points of lane lines in the road image.
[0110] The laser point cloud data is semantically segmented using a preset point cloud semantic segmentation model to obtain point cloud semantic segmentation results, which include the positions of lane line 3D points in 3D space.
[0111] The semantic segmentation results of the first image and the semantic segmentation results of the point cloud are fused to obtain the lane line recognition results, so as to perform lane line annotation in the high-precision map based on the lane line recognition results.
[0112] The method executed by the lane marking device for high-precision maps disclosed in the embodiment shown in Figure 1 of this application can be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by integrated logic circuits in the processor's hardware or by instructions in software form. The processor can be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of this application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module can reside in a storage medium well established in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method.
[0113] The electronic device can also perform the method executed by the lane marking device for high-precision maps in Figure 1, and implement the functions of the lane marking device for high-precision maps in the embodiment shown in Figure 1, which are not described in detail here.
[0114] This application also proposes a computer-readable storage medium that stores one or more programs, the programs including instructions that, when executed by an electronic device including multiple applications, enable the electronic device to perform the method executed by the lane marking device for high-precision maps in the embodiment shown in Figure 1, and specifically to perform:
[0115] Acquire the current road image data and the corresponding laser point cloud data;
[0116] The road image data is semantically segmented using a preset image semantic segmentation model to obtain a first image semantic segmentation result, which includes the location and category of discrete points of lane lines in the road image.
[0117] The laser point cloud data is semantically segmented using a preset point cloud semantic segmentation model to obtain point cloud semantic segmentation results, which include the positions of lane line 3D points in 3D space.
[0118] The semantic segmentation results of the first image and the semantic segmentation results of the point cloud are fused to obtain the lane line recognition results, so as to perform lane line annotation in the high-precision map based on the lane line recognition results.
[0119] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0120] This invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each process and/or block of the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0121] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction device, which implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0122] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0123] In a typical configuration, a computing device includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.
[0124] Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.
[0125] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
[0126] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0127] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0128] The above description is merely an embodiment of this application and is not intended to limit the scope of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of the claims of this application.
Claims
1. A method for lane line annotation in high-precision maps, wherein the method includes:
acquiring the current road image data and the corresponding laser point cloud data;
performing semantic segmentation on the road image data using a preset image semantic segmentation model to obtain a first image semantic segmentation result, which includes the location and category of discrete points of lane lines in the road image;
performing semantic segmentation on the laser point cloud data using a preset point cloud semantic segmentation model to obtain point cloud semantic segmentation results, which include the positions of lane line 3D points in 3D space;
fusing the first image semantic segmentation result and the point cloud semantic segmentation results to obtain the lane line recognition result, so as to mark lane lines in the high-precision map based on the lane line recognition result;
wherein the category of the discrete points of the lane lines refers to whether they are lane line points;
the fusing of the first image semantic segmentation result and the point cloud semantic segmentation results includes:
projecting the discrete points of the lane lines in the road image into 3D space to obtain the projection points of the discrete points of the lane lines in 3D space;
filtering the lane line 3D points in 3D space based on the projection points of the discrete points of the lane lines in 3D space;
determining the lane line recognition result based on the filtered lane line 3D points;
the filtering of the lane line 3D points in the 3D space based on the projection points of the lane line discrete points in the 3D space includes:
constructing a sphere of a preset size with the position of the projection point of the discrete points of the lane line in 3D space as the center;
identifying a 3D point located in the sphere of the preset size in the 3D space as a candidate matching point corresponding to the projection point;
determining the 3D point in the 3D space that matches the projection point based on the category score of the candidate matching point;
using the 3D points that match the projection points as the filtered lane line 3D points;
the determining of the 3D point in the 3D space that matches the projection point based on the category score of the candidate matching point includes:
taking the candidate matching point with the highest category score as the 3D point in the 3D space that matches the projection point;
the performing of semantic segmentation on the road image data using the preset image semantic segmentation model to obtain the first image semantic segmentation result includes:
extracting features from the road image data using the feature pyramid network in the preset image semantic segmentation model to obtain the feature map of the road image;
dividing the feature map of the road image into grids to obtain a feature map containing multiple grids;
predicting the feature map containing multiple grids using the prediction network in the preset image semantic segmentation model to obtain the feature map prediction result;
determining the location and category of discrete points of lane lines in the road image based on the feature map prediction result;
the feature map prediction result includes the category of each grid in the feature map, and the determining of the location and category of the discrete points of the lane lines in the road image based on the feature map prediction result includes:
determining the grid in which the lane line is located based on the category of each grid in the feature map;
determining the initial position of the discrete points of the lane line based on the center position of the grid in which the lane line is located;
performing convolution on the feature map containing multiple grids to obtain the center position correction value of the grid where the lane line is located;
correcting the initial positions of the discrete points of the lane lines using the center position correction value of the grid where the lane lines are located, to obtain the corrected positions of the discrete points of the lane lines.
2. The method as described in claim 1, wherein the determining of the lane line recognition result based on the filtered lane line 3D points includes: using the positions of the filtered lane line 3D points as the positions of the final lane line discrete points; determining the category of the filtered lane line 3D points based on the category corresponding to the projection points, and using this category as the category of the final lane line discrete points; and using the positions and categories of the final lane line discrete points as the lane line recognition result.
3. The method as described in claim 1, wherein the preset image semantic segmentation model is trained in the following way: acquiring training sample images and inputting the training sample images into the preset image semantic segmentation model to obtain a second image semantic segmentation result; determining the relative positions of each lane line discrete point in the training sample image based on the second image semantic segmentation result; determining the loss weights corresponding to each lane line discrete point based on their relative positions in the road image; determining the loss value of the preset image semantic segmentation model based on the loss weights corresponding to each lane line discrete point, and updating the parameters of the preset image semantic segmentation model using the loss value of the preset image semantic segmentation model to obtain the trained preset image semantic segmentation model.
4. A lane marking device for a high-precision map, wherein the device includes:
an acquisition unit, used to acquire the current road image data and the corresponding laser point cloud data;
a first semantic segmentation unit, used to perform semantic segmentation on the road image data using a preset image semantic segmentation model to obtain a first image semantic segmentation result, which includes the location and category of discrete points of lane lines in the road image;
a second semantic segmentation unit, used to perform semantic segmentation on the laser point cloud data using a preset point cloud semantic segmentation model to obtain point cloud semantic segmentation results, the point cloud semantic segmentation results including the positions of lane line 3D points in 3D space;
a fusion unit, used to fuse the first image semantic segmentation result and the point cloud semantic segmentation results to obtain the lane line recognition result, so as to perform lane line annotation in the high-precision map based on the lane line recognition result;
wherein the category of the discrete points of the lane lines refers to whether they are lane line points;
the fusion unit is specifically used for:
projecting the discrete points of the lane lines in the road image into 3D space to obtain the projection points of the discrete points of the lane lines in 3D space;
filtering the lane line 3D points in 3D space based on the projection points of the discrete points of the lane lines in 3D space;
determining the lane line recognition result based on the filtered lane line 3D points;
the fusion unit is specifically used for:
constructing a sphere of a preset size with the position of the projection point of the discrete points of the lane line in 3D space as the center;
identifying a 3D point located in the sphere of the preset size in the 3D space as a candidate matching point corresponding to the projection point;
determining the 3D point in the 3D space that matches the projection point based on the category score of the candidate matching point;
using the 3D points that match the projection points as the filtered lane line 3D points;
the fusion unit is specifically used for:
taking the candidate matching point with the highest category score as the 3D point in the 3D space that matches the projection point;
the first semantic segmentation unit is specifically used for:
extracting features from the road image data using the feature pyramid network in the preset image semantic segmentation model to obtain the feature map of the road image;
dividing the feature map of the road image into grids to obtain a feature map containing multiple grids;
predicting the feature map containing multiple grids using the prediction network in the preset image semantic segmentation model to obtain the feature map prediction result;
determining the location and category of discrete points of lane lines in the road image based on the feature map prediction result;
the feature map prediction result includes the category of each grid in the feature map, and the first semantic segmentation unit is specifically used for:
determining the grid in which the lane line is located based on the category of each grid in the feature map;
determining the initial position of the discrete points of the lane line based on the center position of the grid in which the lane line is located;
performing convolution on the feature map containing multiple grids to obtain the center position correction value of the grid where the lane line is located;
correcting the initial positions of the discrete points of the lane lines using the center position correction value of the grid where the lane lines are located, to obtain the corrected positions of the discrete points of the lane lines.
5. An electronic device, comprising: a processor; and a memory configured to store computer-executable instructions, which, when executed, cause the processor to perform the method of any one of claims 1 to 3.
6. A computer-readable storage medium storing one or more programs, which, when executed by an electronic device including a plurality of applications, cause the electronic device to perform the method of any one of claims 1 to 3.