A ground line generation method and device based on a point cloud and an intelligent driving device

By fusing multiple LiDAR point clouds and using neural network semantic segmentation, a two-dimensional grid classification map is constructed and the non-ground points are selected from it. This addresses the insufficient ground line detection accuracy in low-speed scenarios and improves the path planning and perception capabilities of autonomous driving.

CN121982131B · Active · Publication Date: 2026-06-23 · 城市之光(深圳)无人驾驶有限公司

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: 城市之光(深圳)无人驾驶有限公司
Filing Date: 2026-04-08
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Existing ground line detection methods in low-speed scenarios suffer from insufficient 3D positioning accuracy and poor adaptability to dynamic environments. In particular, vision-based methods experience a decrease in accuracy when the slope changes, while LiDAR-based grid mapping methods are limited by grid resolution and cannot fully utilize the high-precision ranging capabilities of LiDAR.

Method used

By fusing multiple lidar point clouds and using neural network semantic segmentation, a two-dimensional raster classification map is constructed to identify non-ground areas. A three-dimensional index mapping between points and rasters is established, and a set of non-ground points is selected to generate ground lines.

Benefits of technology

It achieves higher precision ground line generation, improves the reliability and perception capabilities of vehicle path planning in low-speed autonomous driving scenarios, and enhances driving safety.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a point cloud-based ground line generation method and apparatus, and an intelligent driving device, and relates to the technical field of point cloud processing. The method comprises the following steps: acquiring original point clouds collected by multiple LiDARs of a vehicle; stitching the multiple original point clouds into the vehicle coordinate system through extrinsic parameter transformation to obtain a fused point cloud; inputting the fused point cloud into a pre-trained point cloud segmentation neural network, which performs semantic segmentation on the fused point cloud and outputs a two-dimensional grid map; performing grid index mapping on each three-dimensional point based on the two-dimensional grid map to obtain a non-ground point set; generating a grid index sequence, taking the projection of the coordinate system origin of the corresponding LiDAR on the two-dimensional grid map as the starting point and the projection of each non-ground point on the two-dimensional grid map as the ending point; and selecting ground line points from the non-ground points of all grids in the grid index sequence and generating a ground line from the multiple ground line points.

Description

Technical Field

[0001] This invention relates to the field of point cloud processing technology, and in particular to a method, apparatus and intelligent driving device for generating ground lines based on point clouds.

Background Technology

[0002] In low-speed scenarios of autonomous driving systems (such as parking lots, park roads, or narrow passages), ground lines are often used to indicate the areas where vehicles can pass or road boundary information. Therefore, accurately detecting ground lines and obtaining their spatial positions is of great significance for providing driving constraint data for vehicle planners.

[0003] Currently, existing ground line detection methods in low-speed scenarios mainly fall into two categories. One category is based on visual segmentation, which uses an onboard camera to acquire environmental images and a deep learning network to identify ground line regions, then reconstructs their spatial positions using a 2D-to-3D projection method. The other category is based on LiDAR point cloud grid mapping, which constructs a grid map from the original LiDAR point cloud and extracts ground line features from the grid space.

[0004] However, both of these methods still have certain limitations. Vision-based methods, relying on the conversion from 2D images to 3D space, tend to experience a significant decrease in spatial positioning accuracy in scenes with slope changes or when the ground line is far from the camera's optical center. LiDAR-based grid mapping methods, on the other hand, are limited by grid resolution design, typically achieving only grid-scale detection accuracy, thus restricting the accuracy of ground line detection.

[0005] Therefore, how to improve the detection accuracy of ground lines in low-speed scenarios remains a technical problem that needs to be solved in this field.

Summary of the Invention

[0006] To address the technical problems of low spatial positioning accuracy based on two-dimensional image projection and insufficient ground line detection accuracy due to grid resolution limitations in existing technologies, the present invention aims to provide a ground line generation method and apparatus based on point clouds. By fusing point cloud information from multiple LiDARs of a vehicle and combining it with a point cloud segmentation neural network for non-ground point filtering, high-precision detection and generation of ground lines in low-speed scenarios can be achieved.

[0007] The objective of this invention is achieved through the following technical solution:

[0008] In a first aspect, the present invention provides a method for generating ground lines based on point clouds, comprising the following steps:

[0009] Acquire the raw point cloud data collected by multiple lidar sensors on the vehicle;

[0010] Multiple original point clouds are stitched together into the vehicle coordinate system through extrinsic parameter transformation to obtain a fused point cloud;

[0011] The fused point cloud is input into a pre-trained point cloud segmentation neural network. The point cloud segmentation neural network performs semantic segmentation on the fused point cloud, divides the non-ground region from the three-dimensional space and outputs a two-dimensional grid map. Each grid of the two-dimensional grid map stores a classification label, which is used to identify whether the spatial region corresponding to the grid is a non-ground region.

[0012] Based on the two-dimensional raster map, each three-dimensional point in the fused point cloud is mapped by raster index, and non-ground points located in non-ground areas are filtered out from the three-dimensional points according to the classification labels to obtain a set of non-ground points.

[0013] Starting from the projection of the origin of the corresponding lidar coordinate system onto the two-dimensional grid map, and ending with the projection of each non-ground point in the set of non-ground points onto the two-dimensional grid map, a grid index sequence is generated between the starting point and the ending point.

[0014] Ground line points are selected from the non-ground points of all rasters in the raster index sequence, and ground lines are generated based on the multiple ground line points.

[0015] In one possible implementation, a raster index mapping is performed on each 3D point in the fused point cloud, specifically as follows:

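[0016] For a three-dimensional point $(X, Y, Z)$, calculate its corresponding raster index $(row, col)$ according to the following formula:

[0017] $$row = \left\lfloor \frac{Y - Y_{\min}}{res} \right\rfloor, \qquad col = \left\lfloor \frac{X - X_{\min}}{res} \right\rfloor$$;

[0018] In the formula, $(X, Y, Z)$ are the coordinates of the three-dimensional point in the vehicle coordinate system; $Y_{\min}$ and $X_{\min}$ are the minimum coordinate values of the rectangular region of the two-dimensional raster map in the Y-axis and X-axis directions; $res$ is the spatial resolution of the two-dimensional raster map; and $\lfloor \cdot \rfloor$ denotes rounding down to the nearest integer.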
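The index mapping and the label-based selection of non-ground points can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the row-from-Y and column-from-X ordering, and the convention that label 1 marks a non-ground raster are all assumptions made for illustration.

```python
import math

def raster_index(x, y, x_min, y_min, res):
    """Map a vehicle-frame (x, y) to a (row, col) raster index by flooring.

    Assumption: rows follow the Y axis and columns follow the X axis.
    """
    row = math.floor((y - y_min) / res)
    col = math.floor((x - x_min) / res)
    return row, col

def select_non_ground(points, labels, x_min, y_min, res):
    """Keep the 3D points whose raster is labeled non-ground (assumed label 1)."""
    n_rows, n_cols = len(labels), len(labels[0])
    kept = []
    for x, y, z in points:
        row, col = raster_index(x, y, x_min, y_min, res)
        # Points that project outside the raster map carry no label; skip them.
        if 0 <= row < n_rows and 0 <= col < n_cols and labels[row][col] == 1:
            kept.append((x, y, z))
    return kept

# Hypothetical 2 x 3 label map and three fused points (vehicle frame, meters).
labels = [[0, 0, 1],
          [0, 0, 0]]
points = [(0.25, 0.1, 0.0), (1.3, 0.2, 0.4), (5.0, 5.0, 1.0)]
non_ground = select_non_ground(points, labels, x_min=0.0, y_min=0.0, res=0.5)
```

Because each retained point keeps its original coordinates, the classification is made at raster scale while the selected points stay at the ranging precision of the LiDAR.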

[0019] In one possible implementation, generating the raster index sequence between the start point and the end point specifically includes the following steps:

[0020] The projected coordinates of the starting point and the ending point in the two-dimensional raster map are converted into raster index $(row_s, col_s)$ and raster index $(row_e, col_e)$, respectively;

[0021] Use the two-dimensional Bresenham algorithm to generate all raster indices traversed from raster index $(row_s, col_s)$ to raster index $(row_e, col_e)$;

[0022] Arrange all the traversed raster indices in order to generate the raster index sequence $\{(row_1, col_1), (row_2, col_2), \ldots, (row_n, col_n)\}$, where $n$ is a positive integer.
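As an illustration of the traversal step, the following is a minimal sketch of the standard integer form of the two-dimensional Bresenham line algorithm applied to (row, col) raster indices. The function name and the tuple representation are assumptions. The source does not state which Bresenham variant is used.

```python
def bresenham(start, end):
    """Return every (row, col) raster index on the line from start to end, inclusive."""
    r0, c0 = start
    r1, c1 = end
    d_row, d_col = abs(r1 - r0), abs(c1 - c0)
    step_row = 1 if r1 >= r0 else -1
    step_col = 1 if c1 >= c0 else -1
    err = d_col - d_row  # running error term of the integer algorithm
    cells = []
    r, c = r0, c0
    while True:
        cells.append((r, c))
        if (r, c) == (r1, c1):
            break
        e2 = 2 * err
        if e2 > -d_row:   # step along the column axis
            err -= d_row
            c += step_col
        if e2 < d_col:    # step along the row axis
            err += d_col
            r += step_row
    return cells

# Raster index sequence from a lidar-origin raster to an end-point raster.
sequence = bresenham((0, 0), (2, 5))
```

The returned list is ordered from the starting raster to the ending raster, so traversing it front to back visits rasters in order of increasing distance from the lidar origin.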

[0023] In one possible implementation, selecting ground line points from the non-ground points of all rasters in the raster index sequence specifically includes the following steps:

[0024] The classification label of each grid cell is queried sequentially along the grid index sequence;

[0025] If the classification label of a raster is a non-ground area, then output a non-ground point of that raster as a candidate point;

[0026] Calculate the Euclidean distance between all candidate points in the grid index sequence and the origin of the lidar coordinate system, and output the candidate point with the smallest Euclidean distance as the ground line point.

[0027] In one possible implementation, if none of the rasters in the raster index sequence has a classification label identifying a non-ground area, the non-ground point corresponding to the ending point is output as the ground line point.

[0028] In one possible implementation, a non-ground point of the raster is output as a candidate point, specifically:

[0029] When the grid contains only one non-ground point, output that non-ground point as a candidate point.

[0030] When there are multiple non-ground points in the same grid, calculate the Euclidean distance between each non-ground point and the origin of the coordinate system of the lidar, and output the non-ground point with the smallest Euclidean distance as a candidate point.
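The selection rules above can be sketched as follows. This is an illustrative reading rather than the patented code: the function name, the dictionary from raster index to points, the label value 1 for non-ground, and the assumption that the points and the lidar origin are expressed in the same coordinate frame are all hypothetical.

```python
import math

def pick_ground_line_point(sequence, labels, cell_points, lidar_origin, end_point):
    """Select one ground line point along a raster index sequence."""
    def dist(p):
        # Euclidean distance to the lidar origin (same frame assumed).
        return math.dist(p, lidar_origin)

    candidates = []
    for row, col in sequence:
        pts = cell_points.get((row, col), [])
        if labels[row][col] == 1 and pts:
            # One candidate per non-ground raster: its point nearest the lidar.
            candidates.append(min(pts, key=dist))
    if not candidates:
        # No non-ground raster on the sequence: fall back to the end point.
        return end_point
    # The candidate nearest the lidar origin becomes the ground line point.
    return min(candidates, key=dist)

# Hypothetical single-row map: the two rasters farthest from the origin are non-ground.
labels = [[0, 0, 1, 1]]
sequence = [(0, 0), (0, 1), (0, 2), (0, 3)]
cell_points = {
    (0, 2): [(1.2, 0.1, 0.3), (1.1, 0.2, 0.5)],
    (0, 3): [(1.7, 0.1, 0.2)],
}
point = pick_ground_line_point(sequence, labels, cell_points,
                               lidar_origin=(0.0, 0.0, 0.0),
                               end_point=(1.7, 0.1, 0.2))
```

Repeating this for every non-ground end point yields one ground line point per direction, and these points are then combined into the ground line.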

[0031] In one possible implementation, inputting the fused point cloud into a pre-trained point cloud segmentation neural network further includes preprocessing the fused point cloud to obtain a network input feature tensor, specifically including the following steps:

[0032] The fused point cloud is height-cropped to retain three-dimensional points within a preset height range;

[0033] The retained 3D points are divided according to the preset voxel resolution, and the number of points in each voxel is counted.

[0034] The number of three-dimensional points within a voxel is truncated, and the logarithmic value is calculated to obtain the voxel point count feature.

[0035] Extract the voxel statistical features for each voxel and concatenate them to form a 30-dimensional voxel feature vector;

[0036] The retained 3D points are divided in the XY plane according to Pillar resolution, and the maximum and minimum heights are extracted based on the 3D points in each Pillar to form 2D geometric features.

[0037] For each Pillar, the 30-dimensional voxel feature vector of multiple voxels is subjected to element-wise max pooling to obtain aggregated voxel features, which are then concatenated with the corresponding 2-dimensional geometric features to form a 32-dimensional feature vector.

[0038] Arrange the 32-dimensional feature vectors of each Pillar according to their corresponding grid coordinates to form a three-dimensional feature tensor;

[0039] Generate a binary Confidence Mask with the same size as the three-dimensional feature tensor. Set the Pillar position containing three-dimensional points to 1 and the Pillar position not containing three-dimensional points to 0. Then, concatenate the binary Confidence Mask as an additional channel with the 32-dimensional feature vector to form the network input feature tensor.
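A reduced sketch of part of this preprocessing is given below: height cropping, truncated logarithmic point counts per voxel, max pooling of that feature over the voxels of a Pillar, per-Pillar maximum and minimum height, and the binary Confidence Mask. The composition of the full 30-dimensional voxel feature vector is not specified in the text above and is not reproduced here. All names, the use of `log1p`, and the parameter values are assumptions.

```python
import math
from collections import defaultdict

def pillar_features(points, z_range, pillar_res, voxel_height, max_count):
    """Per-Pillar sketch: (max log-count over voxels, max height, min height)."""
    z_lo, z_hi = z_range
    voxel_counts = defaultdict(int)  # (ix, iy, iz) -> number of points
    heights = defaultdict(list)      # (ix, iy) -> z values inside the Pillar
    for x, y, z in points:
        if not (z_lo <= z <= z_hi):  # height crop
            continue
        ix, iy = math.floor(x / pillar_res), math.floor(y / pillar_res)
        iz = math.floor((z - z_lo) / voxel_height)
        voxel_counts[(ix, iy, iz)] += 1
        heights[(ix, iy)].append(z)

    features = {}
    for (ix, iy, iz), count in voxel_counts.items():
        # Truncate the count, then take a logarithm (log1p is an assumption).
        log_count = math.log1p(min(count, max_count))
        prev = features.get((ix, iy), (0.0,))[0]
        zs = heights[(ix, iy)]
        # Max pooling over the voxels stacked inside one Pillar.
        features[(ix, iy)] = (max(prev, log_count), max(zs), min(zs))
    return features

def confidence_mask(features, n_x, n_y):
    """Binary mask: 1 where a Pillar holds at least one point, otherwise 0."""
    return [[1 if (ix, iy) in features else 0 for iy in range(n_y)]
            for ix in range(n_x)]

# Hypothetical points (meters); the last one is removed by the height crop.
points = [(0.1, 0.1, 0.2), (0.15, 0.12, 0.25), (0.1, 0.1, 1.4),
          (0.7, 0.1, 0.5), (0.2, 0.2, 9.0)]
feats = pillar_features(points, z_range=(0.0, 2.0), pillar_res=0.5,
                        voxel_height=0.5, max_count=1)
mask = confidence_mask(feats, n_x=2, n_y=2)
```

In the described method the mask is concatenated as an extra channel, so the network can distinguish an empty Pillar from a Pillar whose features happen to be zero.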

[0040] In a second aspect, the present invention provides a ground line generation device based on point clouds, comprising:

[0041] The acquisition module is used to acquire the raw point cloud data collected by multiple lidar sensors on the vehicle.

[0042] The stitching module is used to stitch multiple original point clouds to the vehicle coordinate system through extrinsic parameter transformation to obtain a fused point cloud.

[0043] The segmentation module is used to input the fused point cloud into a pre-trained point cloud segmentation neural network. The point cloud segmentation neural network performs semantic segmentation processing on the fused point cloud, divides the non-ground region from the three-dimensional space and outputs a two-dimensional grid map. Each grid of the two-dimensional grid map stores a classification label, which is used to identify whether the spatial region corresponding to the grid is a non-ground region.

[0044] The filtering module is used to perform grid index mapping on each three-dimensional point in the fused point cloud based on the two-dimensional grid map, and filter out non-ground points located in non-ground areas from the three-dimensional points according to the classification labels to obtain a set of non-ground points.

[0045] The indexing module is used to generate a grid index sequence between a starting point and an ending point, taking the projection of the origin of the corresponding lidar coordinate system onto the two-dimensional grid map as the starting point and the projection of each non-ground point in the set of non-ground points onto the two-dimensional grid map as the ending point.

[0046] A generation module is used to select ground line points from the non-ground points of all rasters in the raster index sequence, and generate ground lines based on the plurality of ground line points.

[0047] In a third aspect, the present invention provides an electronic device including a processor coupled to a memory, the memory storing program instructions which, when executed by the processor, implement the point cloud-based ground line generation method of any possible implementation of the first aspect.

[0048] In a fourth aspect, the present invention provides an intelligent driving device, including the point cloud-based ground line generation device described in the second aspect.

[0049] Compared with the prior art, the present invention has at least the following beneficial effects:

[0050] This invention provides a ground line generation method based on point clouds, comprising the following steps: acquiring original point clouds collected by multiple LiDARs of a vehicle; stitching the multiple original point clouds into the vehicle coordinate system through extrinsic parameter transformation to obtain a fused point cloud; inputting the fused point cloud into a pre-trained point cloud segmentation neural network, which performs semantic segmentation processing on the fused point cloud, dividing non-ground regions from three-dimensional space and outputting a two-dimensional raster image, wherein each grid cell of the two-dimensional raster image stores a classification label, which is used to identify whether the spatial region corresponding to the grid cell is a non-ground region; based on the two-dimensional raster image, performing raster index mapping on each three-dimensional point in the fused point cloud, and filtering out non-ground points located in non-ground regions from the three-dimensional points according to the classification labels to obtain a set of non-ground points; and generating a raster index sequence from the starting point to the ending point, using the projection of the origin of the corresponding LiDAR coordinate system onto the two-dimensional raster image as the starting point and the projection of each non-ground point in the set onto the two-dimensional raster image as the ending point. Ground line points are selected from the non-ground points of all rasters in the raster index sequence, and ground lines are generated based on the multiple ground line points.

[0051] This invention utilizes multi-LiDAR point cloud fusion and neural network semantic segmentation to first construct a two-dimensional grid classification map to identify non-ground areas, and then establishes a three-dimensional point-grid index mapping to filter the set of non-ground points. The core innovation lies in generating a radial grid index sequence starting from the LiDAR origin and ending at non-ground points. The sequence is traversed to query the first non-ground point as the ground line point, and finally, multi-directional results are aggregated to generate a complete ground line. Through this technical solution, this invention can fully utilize the spatial information of multi-LiDAR point clouds and combine it with a deep learning point cloud segmentation network for non-ground point filtering, achieving higher accuracy in ground line generation than traditional two-dimensional projection or fixed grid mapping methods. This provides reliable driving constraint information for vehicle path planning in low-speed autonomous driving scenarios, improving the vehicle's perception of complex road environments and driving safety.

Description of the Drawings

[0052] Figure 1 is a schematic diagram of one embodiment of the intelligent driving device of the present invention;

[0053] Figure 2 is a schematic flowchart of a point cloud-based ground line generation method according to the present invention;

[0054] Figure 3 is a schematic diagram of the process of constructing the bounding box of each path point of the initial path according to the present invention;

[0055] Figure 4 is a schematic diagram of a process for calculating obstacle reference points according to the present invention;

[0056] Figure 5 is a schematic diagram of a process for preprocessing the fused point cloud to obtain a network input feature tensor according to the present invention;

[0057] Figure 6 is a schematic diagram of the structure of a point cloud-based ground line generation device according to the present invention;

[0058] Figure 7 is a schematic diagram of the electronic device of the present invention;

[0059] Figure 8 shows the original point cloud of the lidar of the present invention;

[0060] Figure 9 is a schematic diagram of the two-dimensional raster image output by the point cloud segmentation neural network of the present invention;

[0061] Figure 10 is a schematic diagram of the ground point and non-ground point segmentation results of the present invention;

[0062] Figure 11 is a schematic diagram illustrating the effect of the fused ground line according to the present invention.

Detailed Implementation

[0063] To facilitate understanding, the following section first introduces several concepts and terms involved in this application.

[0064] 3D point cloud, also known as laser point cloud, refers to a discrete set of three-dimensional points obtained by scanning a target environment using a lidar sensor in a unified spatial coordinate system. Each 3D point typically contains spatial coordinate information (x, y, z), used to characterize the spatial distribution and geometric structure of objects in the environment.

[0065] A LiDAR coordinate system is a three-dimensional spatial coordinate system established with reference to a single LiDAR sensor. It is used to describe the spatial positional relationships of the point cloud data collected by the LiDAR. Typically, the optical center or geometric center of the LiDAR is used as the origin of the coordinate system.

[0066] Vehicle coordinate system: This is a three-dimensional coordinate system established with the vehicle itself as a reference. It typically uses the vehicle's geometric center or a fixed point as the origin, with the x-axis pointing forward, the y-axis pointing left, and the z-axis pointing upward. This coordinate system is used to uniformly describe the spatial information of the vehicle's surrounding environment.

[0067] Two-dimensional raster map: a regular grid structure formed by projecting point cloud data from three-dimensional space onto a two-dimensional plane (usually the xy-plane) and dividing that plane according to a preset resolution. Each raster corresponds to a certain spatial range and is used to store classification labels or statistical information.

[0068] Raster resolution refers to the actual spatial size (e.g., 0.05m × 0.05m) corresponding to each raster cell in a two-dimensional raster image. The choice of resolution directly affects the level of detail and computational complexity of the raster image.

[0069] Raster index mapping refers to the process of mapping each point in a 3D point cloud to a corresponding 2D raster cell based on its projected coordinates on a 2D plane. This mapping relationship enables the association between point-level data and raster-level semantic labels.

[0070] Ground line: A spatial curve or boundary line formed by connecting or fitting multiple ground line points according to certain rules, used to represent the boundary between ground areas and non-ground areas.

[0071] To facilitate understanding, a brief introduction to the background technology involved in this application will be given first below.

[0072] In low-speed operation scenarios of autonomous driving systems, such as parking lots, park roads, and narrow passages, ground lines are typically used to represent the boundaries or road constraints of drivable areas. The spatial accuracy of ground lines directly affects the path planning module's judgment of drivable areas. Therefore, high-precision detection and reconstruction of ground lines are of great significance for improving the safety and stability of autonomous driving systems.

[0073] In the existing technology, ground line detection methods for low-speed scenes mainly include two categories: visual segmentation-based methods and LiDAR point cloud grid mapping methods.

[0074] The first category is ground line detection methods based on visual segmentation. This method typically acquires environmental images using an onboard camera and employs a semantic segmentation neural network to identify ground line regions within the images. Subsequently, based on the camera's imaging model, the segmentation results from the 2D image are back-projected into 3D space to recover the spatial location information of the ground line. However, this type of method inherently relies on a geometric inversion process from 2D to 3D, and its accuracy is significantly affected by factors such as the camera model, calibration errors, and viewpoint distribution. In practical applications, when there are changes in road slope, vehicle attitude (such as pitch angle), or when the ground line region is far from the camera's optical center, the back-projection error is significantly amplified, leading to a marked decrease in the 3D positioning accuracy of the ground line, making it difficult to meet the high-precision constraint information requirements of downstream planning modules.

[0075] The second category is ground line detection methods based on LiDAR grid mapping. This method accumulates the raw point cloud collected by LiDAR in the vehicle coordinate system and constructs an occupancy grid map according to a preset spatial resolution (e.g., 0.1 meters). Ground lines are then extracted in the grid space based on features such as height information and reflection intensity. However, this type of method is essentially a grid-level representation, and its detection accuracy is limited by the grid resolution. It can usually only achieve positioning accuracy at the grid scale (e.g., 10 centimeters), and cannot fully utilize the high-precision ranging capability of the raw LiDAR point cloud. In addition, in scenarios with dynamic obstacles, accumulating multiple frames of point clouds can easily produce ghosting artifacts in the grid map, which interfere with the ground line extraction process. Removing them usually requires additional dynamic target detection and tracking algorithms, further increasing system complexity and computational cost.

[0076] In summary, existing technologies for ground line detection in low-speed scenarios still suffer from insufficient 3D positioning accuracy and poor adaptability to dynamic environments. Therefore, how to improve ground line detection accuracy while maintaining computational efficiency and reducing sensitivity to dynamic obstacles remains a pressing technical challenge in this field.

[0077] To address this, this invention provides a point cloud-based method for generating ground lines. This method utilizes multi-LiDAR point cloud fusion and neural network semantic segmentation to first construct a two-dimensional grid classification map to identify non-ground regions. Then, it establishes a three-dimensional point-to-grid index mapping to filter the set of non-ground points. The core innovation lies in generating a radial grid index sequence starting from the LiDAR origin and ending at non-ground points. The sequence is traversed to query the first non-ground point as the ground line point, and finally, the results from multiple directions are aggregated to generate a complete ground line. Through this technical solution, this invention can fully utilize the spatial information of multi-LiDAR point clouds and combine it with a deep learning point cloud segmentation network for non-ground point filtering, achieving higher accuracy ground line generation than traditional two-dimensional projection or fixed grid mapping methods. This provides reliable driving constraint information for vehicle path planning in low-speed autonomous driving scenarios, improving the vehicle's perception of complex road environments and enhancing driving safety.

[0078] The technical solutions in the embodiments of this application will now be described with reference to the accompanying drawings.

[0079] This application can be applied to intelligent driving devices. These intelligent driving devices can include land vehicles, water vehicles, air vehicles, industrial equipment, agricultural equipment, or entertainment equipment. For example, an intelligent driving device can be a vehicle, which is a broad concept and can include transportation vehicles (such as commercial vehicles, passenger cars, motorcycles, etc.), industrial vehicles (such as forklifts, trailers, tractors, etc.), engineering vehicles (such as excavators, bulldozers, cranes, etc.), agricultural equipment (such as lawnmowers, harvesters, etc.). This application does not specifically limit the type of vehicle.

[0080] The following describes an application scenario of this application by way of example, with reference to Figure 1 and taking a vehicle as an example.

[0081] Figure 1 is a functional block diagram of a vehicle 100 provided in an embodiment of this application. It should be understood that Figure 1 and the related description are merely examples, and actual vehicles may vary. Vehicle 100 may be a manually driven vehicle, or it may be configured for fully or partially automated driving. As shown in Figure 1, the vehicle 100 may include a perception system 110 and a computing platform 120.

[0082] The perception system 110 may include one or more sensors for sensing information about the environment surrounding the vehicle 100, and the perception system 110 includes at least a lidar. For example, the perception system 110 may include a positioning system, which may be a Global Positioning System (GPS), a BeiDou system, or another positioning system. The perception system 110 may also include one or more of the following: an inertial measurement unit (IMU), lidar, millimeter-wave radar, ultrasonic radar, and a camera device.

[0083] Some or all of the functions of vehicle 100 can be controlled by computing platform 120. Computing platform 120 may include one or more processors, such as processors 121 to 12n (n being a positive integer). A processor is a circuit with signal processing capabilities. In one implementation, the processor can be a circuit with instruction read and execute capabilities, such as a central processing unit (CPU), microprocessor, graphics processing unit (GPU) (which can be understood as a type of microprocessor), or digital signal processor (DSP). In another implementation, the processor can implement certain functions through the logical relationships of hardware circuits. These logical relationships are fixed or reconfigurable. For example, the processor may be a hardware circuit implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), such as a field-programmable gate array (FPGA). In reconfigurable hardware circuits, the process of the processor loading a configuration document and configuring the hardware circuit can be understood as the process of the processor loading instructions to implement some or all of the functions of the aforementioned units. Furthermore, the processor can also be a hardware circuit designed for artificial intelligence, which can be understood as an ASIC, such as a neural network processing unit (NPU), tensor processing unit (TPU), deep learning processing unit (DPU), etc. In addition, the computing platform 120 may also include a memory for storing instructions. Some or all of the processors 121 to 12n can call the instructions in the memory to implement the corresponding functions.

[0084] Figure 2 is a schematic flowchart of a point cloud-based ground line generation method provided in an embodiment of this application. The method can be performed by the vehicle 100 shown in Figure 1, or by the computing platform 120, or by a system consisting of the computing platform 120 and sensors, or by a system-on-chip (SoC) in the computing platform 120, or by a processor in the computing platform 120. The ground line generation method may include steps S210 to S260, which are described in detail below.

[0085] S210: Acquire the raw point cloud data collected by multiple lidar sensors on the vehicle.

[0086] In this application, multiple lidar sensors mounted on the vehicle scan the surrounding environment and collect raw point cloud data. The data collected by each lidar sensor includes the three-dimensional coordinates of various points in space and information such as reflection intensity. The data from multiple lidar sensors can cover different areas around the vehicle to ensure the integrity of the scene point cloud.

[0087] In some embodiments, the acquired raw point cloud can be labeled according to timestamps or radar numbers for subsequent processing. The raw point cloud may contain ground points, obstacle points, and other environmental points, providing a data foundation for subsequent ground segmentation.

[0088] In one specific implementation, as shown in Figure 9, point cloud data can be acquired in real time through the network port of the vehicle-mounted LiDAR. Each data packet contains the range r measured by time-of-flight (TOF) ranging, as well as the elevation and azimuth angles of the laser beam. The acquired raw data is in polar coordinate form, reflecting the spatial ranging information of the LiDAR relative to its own coordinate system.

[0089] To facilitate subsequent 3D point cloud fusion and processing, the polar coordinates of each laser point are converted into 3D spatial coordinates (x, y, z) in the lidar coordinate system. The specific conversion formula is as follows:

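[0090] $$x = r\cos(\text{elevation})\cos(\text{azimuth}), \qquad y = r\cos(\text{elevation})\sin(\text{azimuth}), \qquad z = r\sin(\text{elevation})$$;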

[0091] Where r is the measured distance, elevation is the laser beam's pitch angle, azimuth is the laser beam's azimuth angle, and x, y, z represent the spatial position of the point in the lidar coordinate system. Using the above formula, the polar coordinate information of all laser points can be converted into a 3D point cloud in the lidar coordinate system, providing basic data for subsequent point cloud fusion, segmentation, and ground line generation.
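The conversion can be sketched in Python as follows. The sketch assumes the common convention in which the elevation angle is measured from the xy-plane and the azimuth angle is measured from the x-axis within that plane, with both angles in radians. Actual LiDAR models may define the angles differently, so the function name and the convention are assumptions.

```python
import math

def polar_to_cartesian(r, elevation, azimuth):
    """Convert one lidar return (range, elevation, azimuth) to (x, y, z).

    Assumed convention: elevation from the xy-plane, azimuth from the
    x-axis, both in radians, result in the lidar coordinate system.
    """
    horizontal = r * math.cos(elevation)  # range projected onto the xy-plane
    x = horizontal * math.cos(azimuth)
    y = horizontal * math.sin(azimuth)
    z = r * math.sin(elevation)
    return x, y, z

# A return 2 m away, 30 degrees above the horizontal plane, 90 degrees to the side.
x, y, z = polar_to_cartesian(2.0, math.radians(30.0), math.radians(90.0))
```

Whatever convention a sensor uses, the conversion preserves the measured range, so `math.hypot(x, y, z)` should equal `r` up to floating-point error. This makes a convenient sanity check.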

[0092] In some embodiments, the converted 3D point cloud can be preliminarily filtered, or its outliers can be removed, to improve the stability and accuracy of subsequent processing. For example, points with abnormal range or height values can be removed to reduce the impact of noise on ground line detection.

[0093] S220: Multiple original point clouds are stitched together into the vehicle coordinate system through extrinsic parameter transformation to obtain a fused point cloud.

[0094] In this application, the original point clouds collected by each lidar are stitched together into the vehicle coordinate system through extrinsic parameter transformation to obtain a fused point cloud. The extrinsic parameter transformation includes rotation and translation matrices, used to map the point clouds in each lidar coordinate system to a unified vehicle coordinate system. The fused point cloud can eliminate coordinate differences between lidars and achieve a holistic representation of the point cloud around the vehicle.

[0095] In some embodiments, transforming a 3D point (x, y, z) in the lidar coordinate system into a 3D point (X, Y, Z) in the vehicle coordinate system requires the calibrated extrinsic parameters of the lidar relative to the origin of the vehicle coordinate system. These extrinsic parameters comprise the rotations (roll, pitch, yaw) about the three coordinate axes and the translations (x1, y1, z1) along the three coordinate axes, which together fully describe the position and orientation of the lidar relative to the vehicle coordinate system.

[0096] For example, the conversion formula is as follows:

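[0097] $$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = R_z(\text{yaw})\, R_y(\text{pitch})\, R_x(\text{roll}) \begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix};$$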

[0098] The three rotation matrices represent the rotation relationships along each axis:

[0099] Rotation matrix around the Z-axis (yaw):

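[0100] $$R_z(\text{yaw}) = \begin{bmatrix} \cos(\text{yaw}) & -\sin(\text{yaw}) & 0 \\ \sin(\text{yaw}) & \cos(\text{yaw}) & 0 \\ 0 & 0 & 1 \end{bmatrix};$$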

[0101] Rotation matrix around the Y-axis (pitch):

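[0102] $$R_y(\text{pitch}) = \begin{bmatrix} \cos(\text{pitch}) & 0 & \sin(\text{pitch}) \\ 0 & 1 & 0 \\ -\sin(\text{pitch}) & 0 & \cos(\text{pitch}) \end{bmatrix};$$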

[0103] Rotation matrix around the X-axis (roll):

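[0104] $$R_x(\text{roll}) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(\text{roll}) & -\sin(\text{roll}) \\ 0 & \sin(\text{roll}) & \cos(\text{roll}) \end{bmatrix};$$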

[0105] The above formula allows for precise mapping of point cloud coordinates from the lidar coordinate system to the vehicle coordinate system, providing a unified coordinate basis for subsequent point cloud fusion, 2D raster generation, and ground line detection. In some embodiments, translation (x1, y1, z1) can be used to compensate for radar installation deviations, thereby improving point cloud registration accuracy.
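The extrinsic transformation can be sketched in Python as follows. This is only an illustration under stated assumptions: the rotation is composed in the order Rz(yaw)·Ry(pitch)·Rx(roll), angles are in radians, and the function names `rotation_zyx` and `lidar_to_vehicle` are hypothetical. A calibration that uses a different rotation order would need a different composition.

```python
import math


def rotation_zyx(roll, pitch, yaw):
    """Return R = Rz(yaw) * Ry(pitch) * Rx(roll) as a 3x3 nested list.

    The composition order is an assumption made for this sketch.
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def lidar_to_vehicle(point, rotation, translation):
    """Map a lidar-frame point (x, y, z) to the vehicle frame (X, Y, Z)."""
    return tuple(
        sum(rotation[row][col] * point[col] for col in range(3)) + translation[row]
        for row in range(3)
    )


def fuse_point_clouds(clouds, extrinsics):
    """Stitch per-lidar clouds into one vehicle-frame list of points.

    `extrinsics[k]` is the (roll, pitch, yaw, (x1, y1, z1)) tuple of lidar k.
    """
    fused = []
    for cloud, (roll, pitch, yaw, translation) in zip(clouds, extrinsics):
        rotation = rotation_zyx(roll, pitch, yaw)
        fused.extend(lidar_to_vehicle(p, rotation, translation) for p in cloud)
    return fused
```

For instance, a lidar yawed by 90 degrees and mounted at (1, 2, 3) maps its own point (1, 0, 0) to approximately (1, 3, 3) in the vehicle coordinate system.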

[0106] S230: The fused point cloud is input into a pre-trained point cloud segmentation neural network. The point cloud segmentation neural network performs semantic segmentation on the fused point cloud, divides the non-ground region from the three-dimensional space and outputs a two-dimensional grid map. Each grid cell of the two-dimensional grid map stores a classification label. The classification label is used to identify whether the spatial region corresponding to the grid cell is a non-ground region.

[0107] In this application, the fused point cloud is input into a pre-trained point cloud segmentation neural network, which outputs a two-dimensional grid map. The two-dimensional grid map covers the planar area around the vehicle; each grid cell corresponds to a small spatial region in the vehicle coordinate system and stores a classification label. The classification label identifies whether the grid cell is a non-ground area, i.e., it distinguishes obstacles from passable areas.

[0108] In this embodiment of the application, the pre-trained point cloud segmentation neural network (i.e., pre-segmentation model) is used to perform semantic segmentation processing on the fused point cloud, dividing the three-dimensional space into ground area and non-ground area, thereby providing basic data support for subsequent ground line generation.

[0109] In one implementation, the point cloud segmentation neural network takes a pre-processed three-dimensional feature tensor as input to classify the spatial region in the vehicle coordinate system grid by grid and outputs a two-dimensional grid label map.

[0110] For example, each grid corresponds to a region in the actual space, which is classified into one of the following two categories:

[0111] Ground area: This indicates that the space area corresponding to this grid is mainly composed of the ground, including road surfaces, parking areas, or other passable surfaces;

[0112] Non-ground area: This indicates that the spatial region corresponding to this grid contains points that do not belong to the ground, including but not limited to vehicles, pedestrians, curbs, guardrails, walls, obstacles, and ground boundary structures.

[0113] Furthermore, the non-ground area not only includes significant obstacle points, but also boundary points located at the junction of the ground and obstacles. These points spatially reflect the location of the ground edge and are an important basis for subsequent ground line extraction.

[0114] In terms of output format, the point cloud segmentation neural network generates a two-dimensional label map with the same size as the input raster, where the output of each raster is the label value of the corresponding category. Based on this output, the three-dimensional points in the fused point cloud can be mapped to the corresponding raster, and the point cloud belonging to the non-ground region can be filtered out according to the raster label to form a set of non-ground points.

[0115] For example, as shown in Figure 10, the fused point cloud surrounding the vehicle is input into a pre-trained point cloud segmentation neural network, which outputs a two-dimensional raster label map. The two-dimensional raster map covers the planar area surrounding the vehicle, and each raster (or pillar) stores a classification label to identify whether the spatial area corresponding to that raster is a non-ground area. The data structure of this two-dimensional raster label map is defined as follows:

[0116] (1) Raster map resolution and spatial coverage

[0117] Resolution: On the XY plane, the spatial size of each grid is 0.05 m × 0.05 m (5 cm). This resolution was selected after balancing computing power and ground line accuracy, so that the accuracy of the final generated ground line is close to the accuracy of the original point cloud ranging.

[0118] (2) Spatial coverage area: A rectangular area centered on the origin of the vehicle coordinate system (usually the center of the rear axle). In the example, the X direction covers [-50m, 50m], and the Y direction covers [-50m, 50m]. The specific range can be adjusted according to the actual scenario, but it must be kept consistent during network training and inference.

[0119] (3) Array dimensions: Based on the coverage and resolution, the number of raster rows H and the number of columns W are calculated as $W = \lceil (X_{\max} - X_{\min}) / \Delta \rceil$ and $H = \lceil (Y_{\max} - Y_{\min}) / \Delta \rceil$, where $\Delta$ is the grid resolution. For example, when $X_{\min} = -50$ and $X_{\max} = 50$, W = ⌈(50 - (-50)) / 0.05⌉ = 2000; similarly, H = 2000. Therefore, a two-dimensional raster map can be represented as an H×W matrix.
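The array-dimension arithmetic can be checked with a short Python sketch. The function name `grid_shape` and the small tolerance `eps` are assumptions added here; the tolerance only guards the ceiling operation against floating-point round-off when the coverage is an exact multiple of the resolution.

```python
import math


def grid_shape(x_min, x_max, y_min, y_max, resolution, eps=1e-9):
    """Return (H, W) for a raster covering [x_min, x_max] x [y_min, y_max]."""
    width = math.ceil((x_max - x_min) / resolution - eps)   # columns follow X
    height = math.ceil((y_max - y_min) / resolution - eps)  # rows follow Y
    return height, width


# Example coverage from the text: 100 m x 100 m at 0.05 m per cell.
H, W = grid_shape(-50.0, 50.0, -50.0, 50.0, 0.05)
```

With the example values this gives a 2000 × 2000 map, i.e. 4,000,000 grid cells, each storing one integer label.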

[0120] Specifically, each grid cell stores an integer label value to identify whether the grid cell belongs to the ground area:

[0121] 0: Indicates that the grid is classified as ground (mainly containing ground areas, passable).

[0122] 1: This indicates that the grid is classified as a non-ground point (including obstacles, vehicles, pedestrians, etc., or empty). Ground lines are usually located at the edge of non-ground areas, so non-ground points are retained for subsequent generation of ground lines.

[0123] If needed, it can be expanded to multiple categories, such as ground, low obstacle, and high obstacle, but this embodiment requires at least ground / non-ground binary classification.

[0124] In this embodiment, the output of the point cloud segmentation neural network is a two-dimensional grid label map based on the XY plane, without grid discretization in the Z direction. It should be noted that this two-dimensional grid map does not only contain planar information, but implicitly incorporates three-dimensional spatial information through feature construction.

[0125] Specifically, during the network input stage, the distribution information of the point cloud in the Z direction (including maximum height, minimum height, variance, etc.) is encoded into the feature vector corresponding to each grid through voxel statistical features and Pillar geometric features, so that the two-dimensional grid has a planar structure in terms of expression form, but still has the ability to perceive three-dimensional space in terms of semantics.

[0126] Furthermore, the reason why this application uses a two-dimensional grid instead of a three-dimensional voxel grid is that: the ground line is essentially the projection of the contact boundary between the ground and the obstacle onto the horizontal plane, and its geometry is mainly reflected in the XY plane; at the same time, if a three-dimensional grid is used for path search, it will significantly increase the computational complexity, which is not conducive to the real-time performance of engineering. Therefore, by using the method of "two-dimensional grid representation + three-dimensional feature encoding", necessary spatial information is preserved while ensuring computational efficiency, thus achieving a balance between accuracy and efficiency.

[0127] S240: Based on the two-dimensional raster map, perform raster index mapping on each three-dimensional point in the fused point cloud, and filter out non-ground points located in non-ground areas from the three-dimensional points according to the classification labels to obtain a set of non-ground points.

[0128] In this application, as shown in Figure 11, raster index mapping is performed on each three-dimensional point in the fused point cloud based on the two-dimensional raster map, mapping each point to its corresponding raster. Then, based on the raster's classification label, three-dimensional points located in non-ground areas are selected from the fused point cloud to obtain a set of non-ground points.

[0129] In some embodiments, raster index mapping of each 3D point in the fused point cloud can be implemented as follows: First, obtain the spatial coordinates (X, Y, Z) of the 3D point in the vehicle coordinate system. Then, based on the coverage area and spatial resolution of the 2D raster map, map the 3D point to the corresponding raster. Specifically:

[0130] For a 3D point (X, Y, Z), its raster index (i, j) is calculated using the following formulas: $$i = \left\lfloor \frac{Y - Y_{\min}}{\Delta} \right\rfloor;$$

[0131] $$j = \left\lfloor \frac{X - X_{\min}}{\Delta} \right\rfloor;$$

[0132] Where X and Y are the coordinates of the 3D point in the vehicle coordinate system, $Y_{\min}$ and $X_{\min}$ are the minimum coordinate values of the rectangular area of the 2D grid map in the Y-axis and X-axis directions, $\Delta$ is the spatial resolution of the 2D grid map, and $\lfloor \cdot \rfloor$ represents rounding down.

[0133] Through the above mapping, each 3D point can correspond to a unique grid position in the 2D grid map, achieving an accurate mapping from 3D points to 2D grids. This mapping not only provides a basis for subsequent non-ground point screening but also ensures the correctness and continuity of the grid index sequence during the ground line generation process. In some embodiments, boundary checks can also be performed on the grid index mapping results to avoid the situation where the point cloud exceeds the coverage area of the 2D grid map.

[0134] It should be noted that since the grid index starts from 0, the calculated index should satisfy 0 ≤ i < H and 0 ≤ j < W. If the point coordinates fall on the boundary (such as Y = Y_max), rounding down may result in i = H, and boundary truncation processing should be performed in this case.

[0135] In a preferred implementation, if the calculated index i or j falls outside the range [0, H - 1] or [0, W - 1], the point lies outside the network inference coverage area. This embodiment discards such points directly, and they do not participate in subsequent steps, which ensures the reliability of the output. In practical applications, assigning default labels or expanding the coverage area can also be used, but this embodiment adopts the discard strategy.

[0136] On the other hand, during the grid lookup process, if multiple original points are mapped to the same grid and the grid label is "non-ground", then all these points will be retained and used separately in subsequent ground line generation steps (such as the Bresenham line drawing algorithm). This "full retention" strategy maintains the point cloud density information, helps to generate a more continuous and refined ground line, and improves the accuracy of ground line extraction.

[0137] Specifically, based on the 2D grid label map, the 3D points in the fused point cloud are mapped to the corresponding grids, and the non-ground point set is extracted according to the classification labels. The function of this step is to screen out the candidate points that may be near the ground boundary from the original point cloud, and its output is an unordered point set.
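The index mapping, the discard strategy for out-of-range points, and the "full retention" of points sharing a non-ground grid can be sketched together in Python. The names `point_to_cell` and `select_non_ground`, the nested-list label map, and the dictionary keyed by (i, j) are illustrative assumptions, not structures specified by this application.

```python
import math

GROUND, NON_GROUND = 0, 1  # label values as defined for the raster map


def point_to_cell(x, y, x_min, y_min, resolution, height, width):
    """Return the raster index (i, j) of a point, or None if it is off the map."""
    i = math.floor((y - y_min) / resolution)  # row index follows Y
    j = math.floor((x - x_min) / resolution)  # column index follows X
    if 0 <= i < height and 0 <= j < width:
        return i, j
    return None  # discard strategy: point lies outside the inference coverage


def select_non_ground(points, label_map, x_min, y_min, resolution):
    """Group all points that fall in NON_GROUND cells by their cell index.

    Every point in a non-ground cell is kept ("full retention").
    """
    height, width = len(label_map), len(label_map[0])
    cells = {}
    for p in points:
        cell = point_to_cell(p[0], p[1], x_min, y_min, resolution, height, width)
        if cell is None:
            continue
        i, j = cell
        if label_map[i][j] == NON_GROUND:
            cells.setdefault(cell, []).append(p)
    return cells
```

Grouping by cell rather than returning a flat list is a design choice of this sketch; it makes the later per-grid lookup along a raster index sequence a single dictionary access.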

[0138] S250: Starting from the projection of the origin of the corresponding lidar coordinate system onto the two-dimensional grid map, and ending with the projection of each non-ground point in the set of non-ground points onto the two-dimensional grid map, generate a grid index sequence between the starting point and the ending point.

[0139] In this application, a grid index sequence is generated along the direction from the origin of the corresponding lidar coordinate system projected onto a two-dimensional grid map as the starting point, and the projections of each non-ground point in the non-ground point set onto the two-dimensional grid map as the ending point. The grid index sequence represents the spatial path from the lidar origin to the non-ground points and can be used to determine the location of potential ground boundaries.

[0140] For example, for a multi-lidar system, the original point cloud generated by each radar needs to be processed starting from the origin of its coordinate system. In the subsequent Bresenham line drawing process, for non-ground points obtained from a specific radar scan, the corresponding radar origin is used as a reference.

[0141] The endpoint is any selected non-ground point. The Bresenham line drawing algorithm is run on the projections of the start point and the end point onto the XY plane, ignoring height differences, so that the horizontal position distribution is what is captured.

[0142] In some embodiments, the raster index sequence can be implemented using the Bresenham line algorithm or a similar raster path generation method, ensuring that the sequence covers all rasters from the start point to the end point.

[0143] This step, based on the S240 filtering results, introduces the geometric relationships of lidar observations. Using the lidar origin as the starting point and non-ground points as the ending points, a two-dimensional grid path generation algorithm (such as the Bresenham algorithm) is used to construct a grid index sequence from the starting point to the ending point. The purpose of this step is to transform the originally disordered set of non-ground points into a path structure with spatial order along the direction of the observation ray, thereby establishing the spatial relationships between points.

[0144] Furthermore, the grid index sequence can be used to determine the "first observed non-ground point" along each ray path, thus providing a basis for the subsequent determination of ground boundary points. Therefore, S240 solves the "candidate point acquisition problem," while S250 solves the "spatial order and path relationship construction problem," and the two work together.

[0145] S260: Select ground line points from the non-ground points of all rasters in the raster index sequence, and generate a ground line based on the multiple ground line points.

[0146] In this application, based on the raster index sequence, non-ground points in each raster cell of the sequence are queried, and a ground line point is selected from them. The final ground line is generated by aggregating the ground line points of multiple raster index sequences.

[0147] It should be noted that this application searches for non-ground points along the path direction in a grid index sequence starting from the origin of the lidar coordinate system, and selects the non-ground point closest to the origin as the ground boundary point corresponding to the path, and uses this ground boundary point as the ground line point. This ground line point spatially corresponds to the position where the lidar ray first contacts the obstacle, and its horizontal projection reflects the boundary position between the ground and the obstacle.

[0148] Specifically, in the point cloud segmentation results, each grid cell is labeled as "ground" or "non-ground". Non-ground regions typically correspond to obstacles (such as vehicles, pedestrians, curbs, guardrails, etc.) or structures above the ground, while ground regions correspond to passable road surfaces. Geometrically, the ground line represents the boundary contour between the ground and the aforementioned non-ground structures.

[0149] In summary, the ground line points in this application are ground boundary points determined by a ray search mechanism based on a set of non-ground points. This definition transforms the generation of ground lines from a "ground fitting problem" to a "boundary detection problem," resulting in greater stability and applicability in complex environments. Physically, the ground line point corresponds to the position where the lidar ray first contacts an obstacle; its projection onto the horizontal plane (XY plane) constitutes the boundary between the ground and the obstacle. Therefore, this point is neither the top of the obstacle nor an arbitrary non-ground point, but rather a boundary point with the property of "first occlusion," stably reflecting the ground edge position.

[0150] As shown in Figure 3, in one specific implementation, generating the raster index sequence between the start point and the end point specifically includes the following steps:

[0151] S310: Convert the projected coordinates of the starting point and the ending point in the two-dimensional raster map into a start raster index and an end raster index, respectively.

[0152] First, the projection of the origin of the LiDAR coordinate system onto the 2D grid map is taken as the starting point, and the projection of each non-ground point in the non-ground point set onto the 2D grid map is taken as an ending point. Then, the projected coordinates of the starting point and the ending point are converted into the corresponding start raster index and end raster index. This conversion follows the aforementioned mapping formula from point coordinates to raster index, ensuring that each projected point falls into a unique raster.

[0153] S320: Use the 2D Bresenham algorithm to generate all raster indices traversed from the start raster index to the end raster index.

[0154] To obtain a continuous raster sequence from the start point to the end point, a two-dimensional Bresenham algorithm is employed on the two-dimensional raster map. Given the start raster index and the end raster index, the algorithm generates the indices of all grid cells along the straight path between them. The Bresenham algorithm ensures that the path grid sequence runs continuously from the start point to the end point, thus providing a complete candidate set for subsequent ground line point selection.

[0155] The Bresenham algorithm is a classic rasterized line generation algorithm, primarily used to draw discrete point sequences that approximate continuous straight lines in two-dimensional discrete grids (such as the two-dimensional raster map in this application). This algorithm computes incrementally, avoiding floating-point operations, resulting in high efficiency and generating smooth, continuous paths, making it highly suitable for use in point cloud projection, raster path generation, or map rasterization.

[0156] S330: Arrange all the traversed raster indices in order to generate a raster index sequence of n raster indices, where n is a positive integer.

[0157] Specifically, the grid index sequence fully describes the two-dimensional grid path from the radar origin to non-ground points, providing a reference for subsequent selection of ground line points from the grid index sequence.
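Steps S310 to S330 can be illustrated with a minimal Python sketch of the integer Bresenham line algorithm on raster indices. The function name `bresenham` and the tuple representation of an index are assumptions of this sketch; the start and end indices are taken to have been computed already by the index mapping formula.

```python
def bresenham(start, end):
    """Return the ordered raster indices on the line from start to end.

    Both arguments are (i, j) integer raster indices. Only integer
    additions and comparisons are used, in all octants.
    """
    i0, j0 = start
    i1, j1 = end
    di, dj = abs(i1 - i0), abs(j1 - j0)
    si = 1 if i1 >= i0 else -1  # step direction along rows
    sj = 1 if j1 >= j0 else -1  # step direction along columns
    err = di - dj               # running error term
    cells = []
    i, j = i0, j0
    while True:
        cells.append((i, j))
        if (i, j) == (i1, j1):
            break
        e2 = 2 * err
        if e2 > -dj:            # advance along the row axis
            err -= dj
            i += si
        if e2 < di:             # advance along the column axis
            err += di
            j += sj
    return cells
```

The returned list starts at the lidar-origin grid and ends at the grid of the non-ground point, and its length is max(|Δi|, |Δj|) + 1. Standard Bresenham emits one grid per step along the dominant axis, so a grid that the ideal line only clips at a corner is not included.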

[0158] In some embodiments, the height information, density, or weight of non-ground points within a grid can be combined to prioritize the grids in the sequence, thereby further optimizing the accuracy and continuity of ground line point selection.

[0159] As shown in Figure 4, in one specific implementation, selecting ground line points from the non-ground points of all rasters in the raster index sequence specifically includes the following steps:

[0160] S410: Query the classification label of each grid sequentially along the grid index sequence.

[0161] In practice, the classification label of each grid cell is queried sequentially along the generated raster index sequence. The classification label is derived from the 2D grid label map and indicates whether the grid cell is a non-ground area.

[0162] S420: If the classification label of a raster is a non-ground area, then output a non-ground point of the raster as a candidate point.

[0163] In practice, if a grid cell is classified as a non-ground region, a non-ground point is selected from that grid cell as a candidate point. This candidate point retains the spatial information of the original point cloud and can be used for further calculations of ground line points. For example, the most suitable candidate point within the grid cell can be selected by combining the height information, density, or other attributes of the non-ground points to improve the accuracy of the ground lines.

[0164] In one specific implementation, a non-ground point of the raster is output as a candidate point, specifically as follows:

[0165] S421: When the grid contains only one non-ground point, output that non-ground point as a candidate point.

[0166] S422: When there are multiple non-ground points in the same grid, calculate the Euclidean distance between each non-ground point and the origin of the coordinate system of the lidar, and output the non-ground point with the smallest Euclidean distance as a candidate point.

[0167] Understandably, when the grid contains only one non-ground point, that non-ground point is directly output as a candidate point. In this case, no additional filtering is required, reducing computational overhead while ensuring the uniqueness of the candidate point.

[0168] When multiple non-ground points exist within the grid, these points need to be filtered to determine the optimal candidate point. Specifically, the Euclidean distance between each non-ground point and the origin of the corresponding lidar coordinate system is calculated, and the non-ground point with the smallest Euclidean distance is selected as the candidate point for output. This method prioritizes the non-ground point that is first encountered along the radar observation direction, thus more accurately reflecting the actual boundary between the ground and obstacles.

[0169] In some embodiments, the selection strategy based on minimum Euclidean distance described above can be combined with directional constraints of the raster index sequence to ensure that the selection of candidate points satisfies both spatial continuity and geometric consistency, thereby further improving the stability and accuracy of ground line extraction.

[0170] S430: If none of the rasters in the raster index sequence has a classification label indicating a non-ground area, the non-ground point corresponding to the endpoint is output as the ground line point.

[0171] Understandably, if no non-ground points are found after traversing the entire raster sequence L (i.e., all rasters are empty or ground rasters), a fallback is adopted: the endpoint P_end is retained directly and output as the ground line point, which preserves recall for sparse point clouds or isolated endpoints. If necessary, time-domain filtering or clustering methods can be used in subsequent processing to remove abnormal isolated points, but the endpoint is retained in the basic algorithm.

[0172] S440: Calculate the Euclidean distance between all candidate points in the grid index sequence and the origin of the lidar coordinate system, and output the candidate point with the smallest Euclidean distance as the ground line point.

[0173] In practice, for all candidate points in the grid index sequence, the Euclidean distance from each point to the origin of the LiDAR coordinate system is calculated. The candidate point with the smallest Euclidean distance is output as the ground line point. In this way, it is ensured that the selected ground line point is located at the non-ground edge position closest to the radar origin, thereby effectively identifying the boundary between the vehicle-accessible area and non-ground obstacles.

[0174] In some embodiments, the above steps can be repeated for multiple grid index sequences to generate a continuous ground line, further improving the accuracy and continuity of ground line extraction and providing a reliable boundary reference for low-speed autonomous driving or path planning of vehicles.
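Steps S410 to S440 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `non_ground_cells` is a hypothetical dictionary that maps a raster index to the non-ground points falling in it (so a grid present in the dictionary is taken to carry the non-ground label), points are (x, y, z) tuples in the same coordinate system as `lidar_origin`, and the function name is hypothetical.

```python
import math


def pick_ground_line_point(cell_sequence, non_ground_cells, lidar_origin, end_point):
    """Select the ground line point for one raster index sequence."""

    def dist(p):
        return math.dist(p, lidar_origin)  # 3D Euclidean distance

    candidates = []
    for cell in cell_sequence:                      # S410: walk the sequence in order
        pts = non_ground_cells.get(cell)
        if pts:                                     # S420: non-ground grid with points
            candidates.append(min(pts, key=dist))   # S421 / S422: nearest point in grid
    if not candidates:                              # S430: fallback to the endpoint
        return end_point
    return min(candidates, key=dist)                # S440: nearest candidate overall
```

Running this once per non-ground endpoint and collecting the returned points gives the set from which the ground line is assembled.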

[0175] If none of the rasters in the raster index sequence has a classification label indicating a non-ground area, the non-ground point corresponding to the endpoint is output as the ground line point.

[0176] This application also provides a specific implementation of a point cloud segmentation neural network. Specifically, the point cloud segmentation neural network is used to perform two-dimensional grid segmentation on fused point clouds to distinguish between ground and non-ground regions. The network includes encoder (En_1 to En_5) and decoder (De_1 to De_6) structures, which are symmetrical encoder-decoder structures, supplemented by skip connections and multi-scale supervision branches to ensure high-precision spatial segmentation.

[0177] (1) Encoder function

[0178] The encoder is responsible for downsampling and extracting high-dimensional semantic features step by step. The functions of each layer are as follows:

[0179] En_1: Input layer, performs preliminary convolution on the original 33-dimensional point cloud features (32-dimensional fused features + 1-dimensional confidence mask), outputs 64 channels, and extracts shallow features such as edges and corners.

[0180] En_2: First downsampling, feature map size halved, 128 channels, extracting local shape features.

[0181] En_3: Second downsampling, size halved, 256 channels, using dilated convolution (Dilation Rate=2) to expand the receptive field and extract coarse semantic information.

[0182] En_4: Third downsampling, size halved, 512 channels, dilated convolution rate of 4, to obtain global context information.

[0183] En_5: The deepest layer, smallest size, 512 channels, dilated convolution rate of 8, extracts the most abstract semantic features, and distinguishes complex scenes.

[0184] Downsampling in the encoder is achieved through convolutions with a stride of 2 (3×3 kernel, Stride=2, Padding=1), eliminating the need for max pooling and combining feature extraction with size reduction. Each convolutional layer uses the ReLU activation function to introduce non-linearity.

[0185] (2) Decoder function

[0186] The decoder recovers spatial resolution through step-by-step upsampling and refines it by combining encoder features:

[0187] De_6: Upsamples the output of En_5, doubles the size, and outputs 256 channels.

[0188] De_5: Fusion of De_6 upsampling results with En_4 features (skip connections), upsampling, size doubled, 128 channels.

[0189] De_4: Integrates En_3 features, upsamples, 64 channels.

[0190] De_3: Integrates En_2 features, upsamples, 32 channels.

[0191] De_2: Fuse En_1 features, upsample, 16 channels.

[0192] De_1: The last layer, which upsamples the feature map to the input resolution (H×W) and outputs the final segmentation result through convolution.

[0193] Upsampling in the decoder is achieved through transposed convolution (ConvTranspose, 4×4 kernel, Stride=2, padding calculated based on output size), restoring spatial dimensions while preserving feature information. Skip connections are concatenated, and the number of channels is adjusted by 1×1 convolution after concatenation, reducing computation and promoting feature fusion.
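The size changes produced by the stride-2 convolutions and the transposed convolutions can be checked with the standard output-size relations, sketched here in Python. The function names are hypothetical, and `padding=1` for the transposed convolution is an assumption (the text only says that padding is calculated from the output size); with a 4×4 kernel and stride 2 it yields an exact doubling.

```python
def conv_out(size, kernel=3, stride=2, padding=1, dilation=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def deconv_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# A 2000-wide input halves with each stride-2 convolution ...
sizes = [2000]
for _ in range(3):
    sizes.append(conv_out(sizes[-1]))

# ... and one transposed convolution doubles the width again.
restored = deconv_out(sizes[-1])
```

For a dilated 3×3 convolution with stride 1, choosing the padding equal to the dilation rate keeps the feature map size unchanged, for example `conv_out(250, stride=1, padding=2, dilation=2)` returns 250.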

[0194] (3) Multi-scale supervision branch

[0195] In the intermediate layers of the decoder (such as De_4 and De_3), auxiliary supervision branches are set up. Each branch contains a 1×1 convolution mapping to the number of classes (2 classes) and calculates the segmentation loss. The auxiliary loss is weighted and summed with the main output loss, which helps to accelerate convergence and improve accuracy.
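The weighted summation of the main loss and the auxiliary losses can be written compactly as below. The symbols are introduced here for illustration only: $\mathcal{L}_{\text{main}}$ is the segmentation loss of the final output, $\mathcal{L}_{\text{aux}}^{(k)}$ is the loss of the k-th auxiliary branch (for example the branches at De_4 and De_3), and $\lambda_k$ are hypothetical weights whose values the text does not specify.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{main}}
  + \sum_{k} \lambda_k \, \mathcal{L}_{\text{aux}}^{(k)},
  \qquad \lambda_k \ge 0
```

Because each auxiliary branch predicts at a lower resolution than the final output, its loss would typically be computed against a label map resampled to that branch's resolution; this detail is an assumption and is not stated in the text.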

[0196] (4) Network input and output

[0197] Input: Point cloud feature matrix, a three-dimensional tensor in the format (C, H, W), where C = 33 is the feature dimension and H×W is the number of grids (e.g., 2000×2000).

[0198] Output: Ground / non-ground probability map for each grid cell, with dimensions (2, H, W); can be converted into a binary label map (0 / 1) by thresholding, where 0 represents ground and 1 represents non-ground.
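Converting the probability map into the binary label map can be sketched in Python as follows. The function name, the nested-list layout, and the threshold of 0.5 are assumptions of this sketch; the input is taken to be the non-ground channel of the (2, H, W) output.

```python
def probs_to_labels(non_ground_prob, threshold=0.5):
    """Threshold an H x W non-ground probability map into 0 / 1 labels.

    0 represents ground and 1 represents non-ground.
    """
    return [
        [1 if p >= threshold else 0 for p in row]
        for row in non_ground_prob
    ]


# Tiny 2 x 3 example map.
labels = probs_to_labels([[0.1, 0.7, 0.5], [0.49, 0.99, 0.0]])
```

Lowering the threshold labels more grids as non-ground, which trades a more conservative boundary for more false obstacle grids.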

[0199] (5) Convolution and dilated convolution configurations

[0200] All standard convolutional layers use 3×3 convolutions with Stride=1, Padding=1 (Same Padding), and the activation function is ReLU.

[0201] The deep layers of the encoder (En_3 to En_5) introduce dilated convolutions with dilation rates of 2, 4, and 8, respectively, to expand the receptive field and capture global semantic information.

[0202] (6) Network collaboration mechanism

[0203] The encoder extracts rich semantic features, while the decoder recovers spatial details step by step. Through skip connections, the decoder fuses features from the corresponding encoder layer, enabling the simultaneous use of edge and semantic information to generate a high-precision two-dimensional raster label map, providing an accurate set of non-ground points for subsequent ground line reconstruction.

[0204] Further, referring to Figure 5 and based on the aforementioned point cloud segmentation neural network, in this embodiment of the application, inputting the fused point cloud into the pre-trained point cloud segmentation neural network further includes preprocessing the fused point cloud to obtain the network input feature tensor, specifically including the following steps:

[0205] S510: Perform height clipping on the fused point cloud, retaining three-dimensional points within a preset height range.

[0206] The fused point cloud is height-cropped, retaining only 3D points within a preset height range and removing outliers that are above the vehicle's driving height or below ground level. This reduces redundant points and improves the effectiveness of the network's input features, aiding in subsequent ground-to-non-ground classification.

[0207] For example, the fused point cloud is cropped according to the Z-axis height, retaining only 3D points with a height ranging from -1 meter to 2 meters. This cropping removes irrelevant high-altitude or underground points, ensuring the network focuses on ground and obstacle areas.

[0208] S520: Divide the retained 3D points according to the preset voxel resolution and count the number of points in each voxel.

[0209] In practice, the clipped 3D points are divided into spaces according to a preset voxel resolution. The number of points within each voxel is counted, and the point count is truncated to a maximum of 64 and then logarithmically processed to obtain the voxel point count feature. This feature reflects the spatial density of the point cloud and can help the network identify sparse obstacles and continuous ground areas.

[0210] For example, the retained point cloud is divided into voxels at a resolution of X×Y×Z = 0.05 m × 0.05 m × 0.1 m, and the number of points within each voxel is counted. Voxels containing no points are initialized to 0. The point count of each voxel is then truncated (clipped) to a maximum value, and a logarithmic transformation is applied to smooth the distribution.
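
The voxel division and counting of step S520 can be sketched as follows. This is an illustrative sketch only: the function names, the grid origin `(0.0, 0.0, -1.0)`, and the use of a `Counter` keyed by integer voxel indices are assumptions, while the voxel size of 0.05 m × 0.05 m × 0.1 m follows the example above. A `Counter` returns 0 for voxels that received no points, which matches the initialization described above.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def voxel_index(p: Point,
                origin: Point = (0.0, 0.0, -1.0),       # hypothetical grid origin
                size: Point = (0.05, 0.05, 0.1)) -> Tuple[int, ...]:
    """Integer voxel index (ix, iy, iz) of a point, by flooring each axis."""
    return tuple(math.floor((c - o) / s) for c, o, s in zip(p, origin, size))


def count_points_per_voxel(points: Iterable[Point],
                           origin: Point = (0.0, 0.0, -1.0),
                           size: Point = (0.05, 0.05, 0.1)) -> Counter:
    """Count how many points fall into each voxel; empty voxels read as 0."""
    return Counter(voxel_index(p, origin, size) for p in points)


counts = count_points_per_voxel([(0.01, 0.01, -0.95),
                                 (0.02, 0.03, -0.93),
                                 (0.07, 0.01, -0.95)])
print(counts[(0, 0, 0)], counts[(1, 0, 0)], counts[(5, 5, 5)])
```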

[0211] S530: The number of three-dimensional points within a voxel is truncated, and the logarithmic value is calculated to obtain the voxel point count feature.

[0213] In one specific implementation, the number of 3D points within each voxel is truncated, and the logarithmic value is calculated to obtain the voxel point count feature. To ensure the algorithm's adaptability under different point cloud densities and hardware conditions, this invention represents the maximum value of this truncation as M, rather than a fixed specific value.

[0214] Specifically, the value of M depends on the number of lines of the LiDAR, the point cloud resolution, and the hardware computing power. For example, in autonomous driving or robotics scenarios, a LiDAR with more lines produces more points and a denser, higher-resolution point cloud, so a larger M can be selected to make full use of the point cloud information and improve model performance. In general, M is taken as 32 or 64.

[0215] In indoor or near-field high-precision scenarios, where the point cloud is dense but the spatial range is small, M can be set to 16–24; in large-scene, long-range scanning scenarios, M can be set to 48–64 to balance point cloud sparsity against computational efficiency; in extremely sparse point cloud scenarios (such as a 4-line radar), M can be set to 8–12.

[0216] In an embodiment of the present invention, M is set to 64 to accommodate the resolution of the LiDAR and the computing power of the hardware. During truncation, the point count of each voxel is capped at M: any count exceeding M is set to M. The voxel point count feature is then generated through logarithmic transformation, thereby balancing point cloud information representation and computational load, and improving the stability and efficiency of network training and inference.
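
The truncation and logarithmic transformation of step S530 can be sketched as below. The exact form of the logarithm is not reproduced in this text, so the use of log(1 + n), which maps an empty voxel to 0, is an assumption made for illustration, as is the function name `voxel_count_feature`. The default M = 64 follows the embodiment above.

```python
import math


def voxel_count_feature(count: int, max_count: int = 64) -> float:
    """Clip the per-voxel point count to max_count (M), then take a logarithm.

    Assumption: log(1 + n) is used so that a count of 0 yields a feature of 0.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    clipped = min(count, max_count)
    return math.log1p(clipped)


for n in (0, 1, 64, 500):
    print(n, round(voxel_count_feature(n), 4))
```

Counts of 64 and 500 give the same feature value in this sketch, since both are capped at M before the logarithm is applied.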

[0217] S540: Extract the voxel statistical features for each voxel and concatenate them to form a 30-dimensional voxel feature vector.

[0218] In practical implementation, the 30-dimensional voxel feature vector can express the geometric shape and distribution characteristics of the point cloud at a local scale, providing a foundation for the network to extract local semantics. For example, point cloud statistics are performed on each non-empty voxel to extract a 30-dimensional feature vector, including:

[0219] Point cloud mean coordinates (3-dimensional): the X, Y, and Z mean values of all points within a voxel, representing the local geometric position.

[0220] Point cloud center offset (3-dimensional): the offset of each point relative to the geometric center of the voxel, reflecting the distribution of points within the voxel.

[0221] Reflection intensity statistics (4-dimensional): the mean, maximum, minimum and variance of the intensity of points within a voxel, used to reflect the reflection characteristics of the point.

[0222] Point cloud distribution characteristics (4-dimensional): X, Y, and Z variances of points within a voxel and the number of points (log), describing point cloud density and local distribution.

[0223] Spatial location encoding (16-dimensional): a normalized index of voxels within a 3D raster, providing spatial location information.

[0224] Through the above statistics, the network can simultaneously acquire local geometric structure, reflection characteristics, and statistical distribution information, thereby improving the accuracy of ground / non-ground classification.
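
The composition of the 30-dimensional vector (3 + 3 + 4 + 4 + 16) can be sketched as follows. Several details are assumptions made for illustration: the center offset is taken as the mean point position minus the voxel center, population variance is used, the count feature uses log(1 + n) with truncation at 64, and the 16-dimensional spatial location encoding, whose construction is not detailed here, is passed in by the caller as `position_code`. All names are hypothetical.

```python
import math
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple

PointI = Tuple[float, float, float, float]  # (x, y, z, intensity)


def voxel_features(points: Sequence[PointI],
                   voxel_center: Tuple[float, float, float],
                   position_code: Sequence[float],
                   max_count: int = 64) -> List[float]:
    """Build a 30-dimensional feature vector for one non-empty voxel."""
    if not points:
        raise ValueError("features are only defined for non-empty voxels")
    if len(position_code) != 16:
        raise ValueError("position_code must have 16 entries")
    xs, ys, zs, intensities = (list(col) for col in zip(*points))
    mean_xyz = [fmean(xs), fmean(ys), fmean(zs)]                   # 3 dims
    offset = [m - c for m, c in zip(mean_xyz, voxel_center)]       # 3 dims
    intensity_stats = [fmean(intensities), max(intensities),
                       min(intensities), pvariance(intensities)]   # 4 dims
    distribution = [pvariance(xs), pvariance(ys), pvariance(zs),
                    math.log1p(min(len(points), max_count))]       # 4 dims
    return mean_xyz + offset + intensity_stats + distribution + list(position_code)


pts = [(0.01, 0.02, 0.03, 10.0), (0.03, 0.04, 0.07, 30.0)]
vec = voxel_features(pts, (0.025, 0.025, 0.05), [0.0] * 16)
print(len(vec))
```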

[0225] S550: Divide the retained 3D points into the XY plane according to Pillar resolution, and extract the maximum and minimum heights of the 3D points in each Pillar to form 2D geometric features.

[0226] In practice, the retained 3D points are divided in the XY plane at the Pillar resolution (0.05 m × 0.05 m). The maximum and minimum heights within each Pillar are extracted as geometric features to form a 2-dimensional feature. This feature describes local terrain undulations, enabling the network to recover 3D structural information on a 2D grid plane.
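
A minimal sketch of step S550 is given below, assuming a Pillar grid that starts at a hypothetical origin `(x_min, y_min)` and is indexed as `(row, col)` with rows along Y and columns along X. The function name and the `(max height, min height)` ordering of the returned pair are assumptions for illustration.

```python
import math
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int]  # (row along Y, column along X)


def pillar_height_features(points: Iterable[Point],
                           x_min: float = 0.0,
                           y_min: float = 0.0,
                           res: float = 0.05) -> Dict[Cell, Tuple[float, float]]:
    """Return {pillar: (max height, min height)} for every occupied Pillar."""
    feats: Dict[Cell, Tuple[float, float]] = {}
    for x, y, z in points:
        key = (math.floor((y - y_min) / res), math.floor((x - x_min) / res))
        if key in feats:
            z_hi, z_lo = feats[key]
            feats[key] = (max(z_hi, z), min(z_lo, z))
        else:
            feats[key] = (z, z)  # first point in this Pillar
    return feats


feats = pillar_height_features([(0.01, 0.01, 0.2),
                                (0.02, 0.03, 1.5),
                                (0.12, 0.01, -0.3)])
print(feats)
```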

[0227] S560: For the 30-dimensional voxel feature vectors of multiple voxels covered by each Pillar, perform element-wise max pooling to obtain aggregated voxel features, and then concatenate them with the corresponding 2-dimensional geometric features to form a 32-dimensional feature vector.

[0228] In practice, element-wise max pooling is performed on the 30-dimensional feature vectors of multiple voxels covered by the same Pillar to obtain aggregated voxel features, which are then concatenated with 2-dimensional geometric features to form a 32-dimensional Pillar feature vector. This aggregation method can retain the most significant geometric information within the Pillar while reducing data dimensionality and computational cost.
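
The aggregation of step S560 can be sketched as an element-wise maximum over the voxel vectors stacked in one Pillar, followed by concatenation with the 2-dimensional height feature. The function name `pool_pillar` and the list-based representation are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def pool_pillar(voxel_vectors: Sequence[Sequence[float]],
                height_feature: Tuple[float, float]) -> List[float]:
    """Element-wise max over the voxel vectors of one Pillar, then append
    the (max height, min height) pair to obtain the Pillar feature vector."""
    if not voxel_vectors:
        raise ValueError("a Pillar needs at least one voxel vector")
    pooled = [max(column) for column in zip(*voxel_vectors)]
    return pooled + list(height_feature)


v1 = [float(i) for i in range(30)]        # two toy 30-dimensional vectors
v2 = [float(29 - i) for i in range(30)]
pillar_vec = pool_pillar([v1, v2], (1.5, 0.2))
print(len(pillar_vec))
```

With 30-dimensional inputs the result has 32 entries: the 30 pooled values followed by the two height values.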

[0229] S570: Arrange the 32-dimensional feature vectors of each Pillar according to their corresponding grid coordinates to form a three-dimensional feature tensor.

[0230] In practice, the 32-dimensional feature vectors of each Pillar are arranged according to their corresponding grid coordinates to form a three-dimensional feature tensor (C, H, W), where C = 32, and H and W correspond to the number of grid cells in the Y and X directions, respectively. This tensor provides a regularized two-dimensional grid input for the convolutional neural network while preserving three-dimensional spatial feature information.
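
The arrangement of step S570 can be sketched with nested lists standing in for a (C, H, W) tensor. In practice an array library would normally be used; plain lists are used here only to keep the sketch self-contained. The function name, the zero fill for empty Pillars, and the skipping of out-of-range cells are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row along Y, column along X)


def build_feature_tensor(pillars: Dict[Cell, List[float]],
                         height: int,
                         width: int,
                         channels: int = 32) -> List[List[List[float]]]:
    """Scatter per-Pillar vectors into a tensor indexed as [c][row][col]."""
    tensor = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for (row, col), vec in pillars.items():
        if len(vec) != channels:
            raise ValueError("each Pillar vector must have `channels` entries")
        if not (0 <= row < height and 0 <= col < width):
            continue  # assumption: Pillars outside the grid are ignored
        for c, value in enumerate(vec):
            tensor[c][row][col] = value
    return tensor


tensor = build_feature_tensor({(1, 2): [float(i) for i in range(32)]}, height=3, width=4)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))
```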

[0231] S580: Generate a binary Confidence Mask with the same size as the three-dimensional feature tensor. Set the Pillar position containing three-dimensional points to 1 and the Pillar position not containing three-dimensional points to 0. Then, concatenate the binary Confidence Mask as an additional channel with the 32-dimensional feature vector to form the network input feature tensor.

[0232] In practice, a binary Confidence Mask with the same size as the feature tensor is generated. Pillar positions containing at least one 3D point are marked as 1, and Pillar positions without points are marked as 0. This Mask is then concatenated with the 32-dimensional features as additional channels to form the final input feature tensor (33, H, W). This explicitly tells the network which grids contain valid points, improving the network's robustness in predicting sparse or empty regions.
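
Step S580 can be sketched as building an H × W mask and appending it as one extra channel. As before, nested lists stand in for a tensor, and the function name and the `occupied` set of `(row, col)` Pillar indices are assumptions for illustration.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]
Tensor = List[List[List[float]]]  # indexed as [channel][row][col]


def add_confidence_mask(tensor: Tensor, occupied: Set[Cell]) -> Tensor:
    """Append a binary mask channel: 1.0 where a Pillar holds at least one
    point, 0.0 elsewhere. The input tensor is left unmodified."""
    height, width = len(tensor[0]), len(tensor[0][0])
    mask = [[1.0 if (r, c) in occupied else 0.0 for c in range(width)]
            for r in range(height)]
    return tensor + [mask]


features = [[[0.0] * 3 for _ in range(2)] for _ in range(32)]  # toy (32, 2, 3)
network_input = add_confidence_mask(features, {(0, 1)})
print(len(network_input))
```

For a 32-channel input the result has 33 channels, matching the (33, H, W) shape described above.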

[0233] This application constructs a multi-stage strategy combination of "segmentation-filtering-indexing-generation," and through the functional coupling between each stage, achieves an effective transformation from point cloud semantic information to ground geometry. Firstly, regarding the collaborative relationship between segmentation and filtering, this application does not directly use the point cloud segmentation results for ground extraction or obstacle recognition. Instead, it transforms the segmentation results into a two-dimensional raster label map and extracts a set of non-ground points through a filtering step. This processing transforms the segmentation results, originally intended for semantic understanding, into "constraint boundaries" for subsequent geometric calculations, realizing the functional transformation from semantic information to geometric constraints.
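
The filtering stage can be sketched as follows: each 3D point is mapped to a grid cell, and the point is kept only if that cell is labeled as non-ground. This is an illustrative sketch under stated assumptions: the row index is derived from Y and the column index from X (consistent with H and W corresponding to the Y and X directions), the label value 1 denotes non-ground, points outside the map are dropped, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
NON_GROUND = 1  # assumption: label value used for non-ground cells


def grid_index(x: float, y: float,
               x_min: float, y_min: float, res: float) -> Tuple[int, int]:
    """Map vehicle-frame (x, y) to a (row, col) cell of the 2D grid map."""
    return math.floor((y - y_min) / res), math.floor((x - x_min) / res)


def select_non_ground(points: Sequence[Point],
                      labels: Sequence[Sequence[int]],
                      x_min: float, y_min: float, res: float) -> List[Point]:
    """Keep the points whose grid cell carries the non-ground label."""
    height, width = len(labels), len(labels[0])
    kept = []
    for p in points:
        row, col = grid_index(p[0], p[1], x_min, y_min, res)
        if 0 <= row < height and 0 <= col < width and labels[row][col] == NON_GROUND:
            kept.append(p)
    return kept


label_map = [[0, 1],
             [0, 0]]
pts = [(0.5, 0.5, 0.1), (1.5, 0.5, 0.8), (5.0, 5.0, 1.0)]
print(select_non_ground(pts, label_map, x_min=0.0, y_min=0.0, res=1.0))
```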

[0234] Secondly, regarding the combination of filtering and indexing, this application does not merely perform spatial filtering on non-ground points, but further introduces a two-dimensional grid indexing mechanism based on the lidar origin. The Bresenham algorithm is used to construct discrete path sequences from the lidar to each non-ground point. This step transforms the originally disordered set of non-ground points into a grid sequence with spatial order along the ray direction, thereby establishing a correlation between point cloud data and sensor observation geometry. This structured processing, "from set to ordered path," is crucial for subsequent boundary determination.
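
The construction of the discrete path can be sketched with the standard integer form of the two-dimensional Bresenham line algorithm. The function name and the `(row, col)` indexing are assumptions for illustration; the start cell is the projection of the lidar origin and the end cell is the projection of a non-ground point.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def bresenham(r0: int, c0: int, r1: int, c1: int) -> List[Cell]:
    """Return the grid cells from (r0, c0) to (r1, c1), endpoints included,
    in the order they are traversed along the ray."""
    cells = []
    dr, dc = abs(r1 - r0), abs(c1 - c0)
    step_r = 1 if r0 < r1 else -1
    step_c = 1 if c0 < c1 else -1
    err = dr - dc
    r, c = r0, c0
    while True:
        cells.append((r, c))
        if (r, c) == (r1, c1):
            break
        e2 = 2 * err
        if e2 > -dc:      # advance along the row axis
            err -= dc
            r += step_r
        if e2 < dr:       # advance along the column axis
            err += dr
            c += step_c
    return cells


print(bresenham(0, 0, 0, 3))
print(bresenham(0, 0, 2, 2))
```

Because the cells are emitted in traversal order, the first cell in the list is always the lidar cell and the last is the cell of the non-ground point, which gives the ordered sequence the subsequent boundary determination relies on.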

[0235] Furthermore, regarding the collaborative mechanism between indexing and generation, this application employs a "nearest point optimization" strategy on the grid path to select the point observed first along the ray direction from multiple candidate non-ground points as the ground boundary point. This mechanism essentially uses obstacle points to infer the ground boundary position, transforming the generation of ground lines from a traditional method relying on ground point fitting to a reverse derivation method based on the spatial distribution of non-ground points. This allows for stable extraction of ground lines even in cases of sparse point clouds, severe occlusion, or discontinuous ground surfaces.
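
The selection along one grid path can be sketched as follows. Each non-ground cell on the path contributes its point nearest to the lidar origin as a candidate, the nearest candidate overall becomes the ground line point, and the end point is returned when no cell on the path is labeled non-ground. The label value 1 for non-ground, the use of 3D Euclidean distance, the `points_by_cell` dictionary, and all names are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int]
NON_GROUND = 1  # assumption: label value used for non-ground cells


def select_ground_line_point(path: Sequence[Cell],
                             labels: Sequence[Sequence[int]],
                             points_by_cell: Dict[Cell, List[Point]],
                             lidar_origin: Point,
                             end_point: Point) -> Point:
    """Pick the ground line point for one lidar-to-point grid path."""
    def distance(p: Point) -> float:
        return math.dist(p, lidar_origin)

    candidates = []
    for row, col in path:
        if labels[row][col] != NON_GROUND:
            continue
        cell_points = points_by_cell.get((row, col), [])
        if cell_points:
            candidates.append(min(cell_points, key=distance))
    if not candidates:
        return end_point  # no non-ground cell on the path
    return min(candidates, key=distance)


path = [(0, 0), (0, 1), (0, 2)]
labels = [[0, 1, 1]]
cells = {(0, 1): [(1.6, 0.5, 0.3), (1.2, 0.5, 0.4)], (0, 2): [(2.5, 0.5, 0.2)]}
print(select_ground_line_point(path, labels, cells, (0.0, 0.0, 0.0), (2.5, 0.5, 0.2)))
```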

[0236] Therefore, this application, through the above strategy combination, forms a closed-loop structure in the processing flow: "semantic constraint → spatial filtering → path modeling → boundary generation." This not only changes the way point cloud segmentation results are used but also reconstructs the technical path for ground line generation. This combination establishes new data dependencies and functional synergies between the various steps.

[0237] Referring to Figure 6, Figure 6 is a schematic diagram of a point cloud-based ground line generation device provided in an embodiment of this application. The ground line generation device can implement the above-described ground line generation method entirely or partially through software, hardware, firmware, or any combination thereof. In a specific implementation, the ground line generation device includes:

[0238] The acquisition module is used to acquire the raw point cloud data collected by multiple lidar sensors on the vehicle.

[0239] The stitching module is used to stitch multiple original point clouds to the vehicle coordinate system through extrinsic parameter transformation to obtain a fused point cloud.

[0240] The segmentation module is used to input the fused point cloud into a pre-trained point cloud segmentation neural network. The point cloud segmentation neural network performs semantic segmentation processing on the fused point cloud, divides the non-ground region from the three-dimensional space and outputs a two-dimensional grid map. Each grid of the two-dimensional grid map stores a classification label, which is used to identify whether the spatial region corresponding to the grid is a non-ground region.

[0241] The filtering module is used to perform grid index mapping on each three-dimensional point in the fused point cloud based on the two-dimensional grid map, and to select, according to the classification labels, the three-dimensional points located in non-ground areas, thereby obtaining a set of non-ground points.

[0242] The indexing module is used to generate a grid index sequence, taking the projection of the origin of the corresponding lidar coordinate system onto the two-dimensional grid map as the starting point and the projection of each non-ground point in the set of non-ground points onto the two-dimensional grid map as the ending point.

[0243] The generation module is used to select ground line points from the non-ground points of all grids in the grid index sequence, and to generate ground lines based on the plurality of ground line points.
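
The extrinsic parameter transformation performed by the stitching module can be sketched as applying one 4 × 4 homogeneous transform per lidar and merging the results. The representation of the extrinsics as row-major nested lists, the function names, and the example matrices are assumptions for illustration; real extrinsics would come from sensor calibration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]  # 4x4 homogeneous transform, row-major


def transform_point(T: Matrix, p: Point) -> Point:
    """Apply a lidar-to-vehicle transform (rotation plus translation)."""
    x, y, z = p
    return tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3]
                 for i in range(3))


def fuse_point_clouds(clouds: Sequence[Sequence[Point]],
                      extrinsics: Sequence[Matrix]) -> List[Point]:
    """Transform every lidar's points into the vehicle frame and merge them."""
    if len(clouds) != len(extrinsics):
        raise ValueError("one extrinsic matrix is required per point cloud")
    fused: List[Point] = []
    for cloud, T in zip(clouds, extrinsics):
        fused.extend(transform_point(T, p) for p in cloud)
    return fused


# Hypothetical extrinsics: a pure translation and a 90-degree yaw rotation.
T_front = [[1, 0, 0, 1.0], [0, 1, 0, 0.0], [0, 0, 1, 0.5], [0, 0, 0, 1]]
T_side = [[0, -1, 0, 0.0], [1, 0, 0, 0.0], [0, 0, 1, 0.0], [0, 0, 0, 1]]
print(fuse_point_clouds([[(1.0, 2.0, 3.0)], [(1.0, 0.0, 0.0)]], [T_front, T_side]))
```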

[0244] This application also provides an electronic device, including at least one processor, a memory, and a communication interface, wherein the processor is used to execute the ground line generation method shown in Figures 2 to 5.

[0245] Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.

[0246] As shown in Figure 7, the electronic device 700 includes at least one processor 701, a memory 703, and a communication interface 702. The processor 701, memory 703, and communication interface 702 are communicatively connected, or can communicate via wireless transmission or other means. The communication interface 702 is used to receive 3D point cloud data sent by a sensing module (e.g., LiDAR); the memory 703 stores computer instructions, and the processor 701 executes these computer instructions to perform the ground line generation method described in the aforementioned method embodiment.

[0247] It should be understood that, in the embodiments of this application, the processor 701 may be a central processing unit (CPU), or it may be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor.

[0248] The memory 703 may include read-only memory and random access memory, and provides instructions and data to the processor 701. The memory 703 may also include non-volatile random access memory.

[0249] The memory 703 can be volatile memory or non-volatile memory, or it can include both. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous linked dynamic random access memory (SLDRAM), and direct rambus RAM (DR RAM).

[0250] It should be understood that the electronic device 700 according to the embodiments of this application can perform the ground line generation method shown in Figures 2 to 5 of the embodiments of this application. A detailed description of this method is provided above and will not be repeated here for the sake of brevity.

[0251] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0252] The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein can be implemented in hardware, processor-executed software modules, or a combination of both. The software modules can be located in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the art.

[0253] This application also provides an intelligent driving device. With reference to the description of the intelligent driving device in Figure 1 in the above embodiments, the intelligent driving device having the vehicle structure shown in Figure 1 can be deployed with the ground line generation device described in the embodiment corresponding to Figure 6, so as to implement the ground line generation method in the embodiments corresponding to Figures 2 to 5.

[0254] In another implementation, the electronic device described in the embodiment corresponding to Figure 7 can be deployed on the intelligent driving device to implement the ground line generation function in the embodiments corresponding to Figures 2 to 5.

[0255] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of this application. It should be understood that the above description is only a specific embodiment of this application and is not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.

Claims

1. A method for generating ground lines based on point clouds, characterized in that it includes the following steps: acquiring the raw point clouds collected by multiple lidar sensors on the vehicle; stitching the multiple raw point clouds into the vehicle coordinate system through extrinsic parameter transformation to obtain a fused point cloud; inputting the fused point cloud into a pre-trained point cloud segmentation neural network, wherein the point cloud segmentation neural network performs semantic segmentation on the fused point cloud, divides the non-ground region from the three-dimensional space, and outputs a two-dimensional grid map, each grid of the two-dimensional grid map storing a classification label used to identify whether the spatial region corresponding to the grid is a non-ground region; performing grid index mapping on each three-dimensional point in the fused point cloud based on the two-dimensional grid map, and selecting, according to the classification labels, the three-dimensional points located in non-ground regions to obtain a set of non-ground points; taking the projection of the origin of the corresponding lidar coordinate system onto the two-dimensional grid map as the starting point and the projection of each non-ground point in the set of non-ground points onto the two-dimensional grid map as the ending point, and generating a grid index sequence between the starting point and the ending point; and selecting ground line points from the non-ground points of all grids in the grid index sequence, and generating a ground line based on the multiple ground line points; wherein selecting ground line points from the non-ground points of all grids in the grid index sequence specifically includes the following steps: querying the classification label of each grid sequentially along the grid index sequence; if the classification label of a grid identifies a non-ground region, outputting a non-ground point of that grid as a candidate point; if none of the classification labels of the grids in the grid index sequence identifies a non-ground region, outputting the non-ground point corresponding to the ending point as the ground line point; and calculating the Euclidean distance between each candidate point in the grid index sequence and the origin of the lidar coordinate system, and outputting the candidate point with the smallest Euclidean distance as the ground line point.

2. The method for generating ground lines based on point clouds according to claim 1, characterized in that performing grid index mapping on each three-dimensional point in the fused point cloud specifically includes: for a three-dimensional point $(x, y, z)$, calculating its corresponding grid index $(i, j)$ according to the following formulas: $i = \lfloor (y - y_{\min}) / r \rfloor$; $j = \lfloor (x - x_{\min}) / r \rfloor$; where $(x, y, z)$ are the coordinates of the three-dimensional point in the vehicle coordinate system, $y_{\min}$ and $x_{\min}$ are the minimum coordinate values of the rectangular region of the two-dimensional grid map in the Y-axis and X-axis directions, respectively, $r$ is the spatial resolution of the two-dimensional grid map, and $\lfloor \cdot \rfloor$ denotes rounding down to the nearest integer.

3. The method for generating ground lines based on point clouds according to claim 1, characterized in that generating the grid index sequence between the starting point and the ending point specifically includes the following steps: converting the projected coordinates of the starting point and the ending point in the two-dimensional grid map into a starting grid index and an ending grid index; generating, using the two-dimensional Bresenham algorithm, all grid indices traversed from the starting grid index to the ending grid index; and arranging all traversed grid indices in order to form a grid index sequence of n grid indices, where n is a positive integer.

4. The method for generating ground lines based on point clouds according to claim 1, characterized in that outputting a non-ground point of the grid as a candidate point specifically includes: when the grid contains only one non-ground point, outputting that non-ground point as the candidate point; and when the same grid contains multiple non-ground points, calculating the Euclidean distance between each non-ground point and the origin of the lidar coordinate system, and outputting the non-ground point with the smallest Euclidean distance as the candidate point.

5. The method for generating ground lines based on point clouds according to any one of claims 1 to 4, characterized in that inputting the fused point cloud into the pre-trained point cloud segmentation neural network further includes preprocessing the fused point cloud to obtain a network input feature tensor, which specifically includes the following steps: performing height cropping on the fused point cloud to retain three-dimensional points within a preset height range; dividing the retained three-dimensional points according to a preset voxel resolution and counting the number of points in each voxel; truncating the number of three-dimensional points within each voxel and calculating its logarithm to obtain the voxel point count feature; extracting the voxel statistical features of each voxel and concatenating them to form a 30-dimensional voxel feature vector; dividing the retained three-dimensional points in the XY plane according to the Pillar resolution, and extracting the maximum and minimum heights of the three-dimensional points in each Pillar to form a 2-dimensional geometric feature; performing element-wise max pooling on the 30-dimensional voxel feature vectors of the multiple voxels covered by each Pillar to obtain an aggregated voxel feature, and concatenating it with the corresponding 2-dimensional geometric feature to form a 32-dimensional feature vector; arranging the 32-dimensional feature vectors of the Pillars according to their corresponding grid coordinates to form a three-dimensional feature tensor; and generating a binary Confidence Mask with the same size as the three-dimensional feature tensor, setting Pillar positions that contain three-dimensional points to 1 and Pillar positions that do not contain three-dimensional points to 0, and concatenating the binary Confidence Mask as an additional channel with the 32-dimensional feature vectors to form the network input feature tensor.

6. A ground line generation device based on point clouds, characterized in that it includes: an acquisition module, used to acquire the raw point clouds collected by multiple lidar sensors on the vehicle; a stitching module, used to stitch the multiple raw point clouds into the vehicle coordinate system through extrinsic parameter transformation to obtain a fused point cloud; a segmentation module, used to input the fused point cloud into a pre-trained point cloud segmentation neural network, wherein the point cloud segmentation neural network performs semantic segmentation on the fused point cloud, divides the non-ground region from the three-dimensional space, and outputs a two-dimensional grid map, each grid of the two-dimensional grid map storing a classification label used to identify whether the spatial region corresponding to the grid is a non-ground region; a filtering module, used to perform grid index mapping on each three-dimensional point in the fused point cloud based on the two-dimensional grid map, and to select, according to the classification labels, the three-dimensional points located in non-ground regions to obtain a set of non-ground points; an indexing module, used to generate a grid index sequence, taking the projection of the origin of the corresponding lidar coordinate system onto the two-dimensional grid map as the starting point and the projection of each non-ground point in the set of non-ground points onto the two-dimensional grid map as the ending point; and a generation module, used to select ground line points from the non-ground points of all grids in the grid index sequence, and to generate a ground line based on the multiple ground line points; wherein the generation module selecting ground line points from the non-ground points of all grids in the grid index sequence specifically includes the following steps: querying the classification label of each grid sequentially along the grid index sequence; if the classification label of a grid identifies a non-ground region, outputting a non-ground point of that grid as a candidate point; if none of the classification labels of the grids in the grid index sequence identifies a non-ground region, outputting the non-ground point corresponding to the ending point as the ground line point; and calculating the Euclidean distance between each candidate point in the grid index sequence and the origin of the lidar coordinate system, and outputting the candidate point with the smallest Euclidean distance as the ground line point.

7. An electronic device, characterized in that the device includes a processor coupled to a memory, the memory storing program instructions which, when executed by the processor, implement the method for generating ground lines based on point clouds according to any one of claims 1 to 5.

8. An intelligent driving device, characterized in that it includes the ground line generation device based on point clouds according to claim 6.