A method and apparatus for processing point cloud features
By extracting and fusing features from LiDAR point clouds using 2D bird's-eye view and front view projections, the problems of high computational cost and long time consumption in point cloud feature extraction are solved, achieving more efficient feature extraction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SUZHOU QINGZHOU ZHIHANG INTELLIGENT TECH CO LTD
- Filing Date
- 2022-08-29
- Publication Date
- 2026-06-16
AI Technical Summary
In existing technologies, point cloud feature extraction methods involve large computational loads and excessive time consumption, which can easily lead to computation timeouts.
The feature extraction method using two-dimensional bird's-eye view and front view projection is adopted. First, bird's-eye view features and front view features are extracted from the lidar point cloud. Then, the front view features are fused into the bird's-eye view features to form three-dimensional fused features, which replaces the traditional three-dimensional voxel network feature extraction method.
It reduces the computational cost of feature extraction, shortens the computation time, and solves the computation timeout problem of traditional 3D voxel network feature extraction methods.
Figure CN115457357B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data processing technology, and in particular to a method and apparatus for processing point cloud features.
Background Technology
[0002] In the field of autonomous driving, LiDAR is one of the main sensors used to acquire environmental perception data. The perception module of an autonomous driving system uses a point cloud target detection model to perform point cloud feature extraction and target detection on LiDAR point clouds. Common point cloud target detection models use voxel feature extraction by default when extracting point cloud features from LiDAR point clouds: the point cloud space is divided into a voxel network composed of multiple three-dimensional grids, and features are calculated from the voxel network. However, in practical applications, we have found that this conventional processing method has a high computational load and is too time-consuming, easily leading to computation timeouts.
Summary of the Invention
[0003] The purpose of this invention is to address the shortcomings of existing technologies by providing a method, apparatus, electronic device, and computer-readable storage medium for processing point cloud features. After obtaining a lidar point cloud, a two-dimensional bird's-eye view (BEV) and front view (FV) projection is first performed on the lidar point cloud. Then, features are extracted from the two-dimensional bird's-eye view and front view to obtain corresponding two-dimensional bird's-eye view and front view features. Finally, the front view features are fused into the bird's-eye view features to obtain a three-dimensional fused feature. This invention replaces the conventional three-dimensional voxel network feature extraction method with a two-dimensional bird's-eye view + front view feature fusion method, which reduces the computational load and shortens the feature extraction time, thus solving the computation timeout problem of conventional three-dimensional voxel network feature extraction methods.
[0004] To achieve the above objectives, a first aspect of the present invention provides a method for processing point cloud features, the method comprising:
[0005] Obtain the lidar point cloud as the corresponding first point cloud;
[0006] Bird's-eye view features are extracted from the first point cloud to generate the corresponding first bird's-eye view feature tensor;
[0007] Front view features are extracted from the first point cloud to generate the corresponding first front view feature tensor;
[0008] The first bird's-eye view feature tensor and the first front view feature tensor are fused to generate the corresponding first three-dimensional feature tensor.
[0009] Preferably, the step of extracting bird's-eye view features from the first point cloud to generate a corresponding first bird's-eye view feature tensor specifically includes:
[0010] A bird's-eye view projection is performed on the first point cloud to generate a first bird's-eye view with a graphic size of H1×W1; and based on the preset bird's-eye view grid size △h1×△w1, the first bird's-eye view is divided into grids to obtain a first bird's-eye view network composed of X1×Y1 first bird's-eye view grids; H1 and W1 are the height and width of the first bird's-eye view, respectively; X1=int(H1 / △h1), Y1=int(W1 / △w1), and int() is the floor function;
[0011] Based on a preset bird's-eye view feature extraction network, the first bird's-eye view is processed by extracting features from each first bird's-eye view grid as a feature extraction unit to generate the corresponding first bird's-eye view feature tensor; the shape of the first bird's-eye view feature tensor is X1×Y1×C, where C is the preset number of feature channels.
[0012] Preferably, the step of extracting front view features from the first point cloud to generate a corresponding first front view feature tensor specifically includes:
[0013] Projecting the first point cloud onto the front view generates a first front view with a graphic size of H2×W2; and based on the preset front view grid size △h2×△w2, dividing the first front view into a grid to obtain a first front view network composed of Z1×X2 first front view grids; H2 and W2 are the height and width of the first front view, respectively; Z1=int(H2 / △h2), X2=int(W2 / △w2);
[0014] Based on a preset front view feature extraction network, each of the first front view grids is used as a feature extraction unit to perform feature extraction processing on the first front view to generate the corresponding first front view feature tensor; the shape of the first front view feature tensor is Z1×X2×C.
[0015] Preferably, the step of fusing the first bird's-eye view feature tensor and the first front view feature tensor to generate the corresponding first three-dimensional feature tensor specifically includes:
[0016] The first bird's-eye view feature tensor is decomposed, cell by cell, into X1×Y1 first feature tensors $A_{i,j}$ of shape 1×1×C; 1≤i≤X1, 1≤j≤Y1;
[0017] The first front view feature tensor is decomposed, column by column, into X2 second feature tensors $B_k$ of shape Z1×1×C; 1≤k≤X2;
[0018] From all the second feature tensors $B_k$, the second feature tensor that matches each first feature tensor $A_{i,j}$ is selected as the corresponding matching feature tensor $B^*$;
[0019] Feature fusion is performed on each first feature tensor $A_{i,j}$ and the corresponding matching feature tensor $B^*$ to generate a third feature tensor $D_{i,j}$ of shape 1×1×Z1×C;
[0020] The X1×Y1 third feature tensors $D_{i,j}$ thus obtained form the corresponding first three-dimensional feature tensor; the shape of the first three-dimensional feature tensor is X1×Y1×Z1×C.
[0021] Furthermore, the step of selecting, from all the second feature tensors $B_k$, the second feature tensor that matches each first feature tensor $A_{i,j}$ as the corresponding matching feature tensor $B^*$ specifically includes:
[0022] Traversing each first feature tensor $A_{i,j}$; during the traversal, taking the first feature tensor currently being traversed as the current feature tensor $A_{i,j}$; extracting the subscript i of the current feature tensor $A_{i,j}$ as the current subscript index; extracting the subscript k of each second feature tensor $B_k$ as the corresponding second subscript index; calculating the absolute difference between the current subscript index and each second subscript index to generate the corresponding first absolute difference; selecting the minimum of the X2 first absolute differences as the corresponding minimum absolute difference; and taking the second feature tensor $B_k$ corresponding to the minimum absolute difference as the matching feature tensor $B^*$ corresponding to the current feature tensor $A_{i,j}$.
[0023] Furthermore, the step of performing feature fusion on the first feature tensor $A_{i,j}$ and the corresponding matching feature tensor $B^*$ to generate a third feature tensor $D_{i,j}$ of shape 1×1×Z1×C specifically includes:
[0024] Decomposing the matching feature tensor $B^*$ of shape Z1×1×C into Z1 fourth feature tensors $b_g$ of shape 1×1×C; 1≤g≤Z1;
[0025] Performing a tensor cross product calculation on the first feature tensor $A_{i,j}$ of shape 1×1×C and each fourth feature tensor $b_g$ of shape 1×1×C to generate the corresponding fifth feature tensor $d_g$; the shape of the fifth feature tensor $d_g$ is 1×1×C;
[0026] Forming the corresponding third feature tensor $D_{i,j}$ from the Z1 fifth feature tensors $d_g$ obtained; the shape of the third feature tensor $D_{i,j}$ is 1×1×Z1×C.
[0027] A second aspect of the present invention provides an apparatus for implementing the point cloud feature processing method described in the first aspect above, the apparatus comprising: an acquisition module, a bird's-eye view feature processing module, a front view feature processing module, and a feature fusion processing module;
[0028] The acquisition module is used to acquire the lidar point cloud as the corresponding first point cloud;
[0029] The bird's-eye view feature processing module is used to extract bird's-eye view features from the first point cloud to generate a corresponding first bird's-eye view feature tensor;
[0030] The front view feature processing module is used to extract front view features from the first point cloud to generate a corresponding first front view feature tensor;
[0031] The feature fusion processing module is used to perform feature fusion on the first bird's-eye view feature tensor and the first front view feature tensor to generate a corresponding first three-dimensional feature tensor.
[0032] A third aspect of the present invention provides an electronic device, including: a memory, a processor, and a transceiver;
[0033] The processor is used to couple with the memory, read and execute instructions in the memory to implement the steps of the method described in the first aspect above;
[0034] The transceiver is coupled to the processor, and the processor controls the transceiver to send and receive messages.
[0035] A fourth aspect of the present invention provides a computer-readable storage medium storing computer instructions that, when executed by a computer, cause the computer to perform the steps of the method described in the first aspect above.
[0036] This invention provides a method, apparatus, electronic device, and computer-readable storage medium for processing point cloud features. After obtaining a lidar point cloud, a two-dimensional bird's-eye view projection and a front view projection are first performed on the lidar point cloud. Then, features are extracted from the two-dimensional bird's-eye view and the front view to obtain corresponding two-dimensional bird's-eye view and front view features. Finally, the front view features are fused into the bird's-eye view features to obtain a three-dimensional fused feature. This invention replaces the conventional three-dimensional voxel network feature extraction method with a two-dimensional bird's-eye view + front view feature fusion method, reducing the computational load and shortening the computation time for feature extraction, thus solving the computation timeout problem of the conventional three-dimensional voxel network feature extraction method.
Attached Figure Description
[0037] Figure 1 is a schematic diagram of a point cloud feature processing method provided in Embodiment 1 of the present invention;
[0038] Figure 2 is a module structure diagram of a point cloud feature processing device provided in Embodiment 2 of the present invention;
[0039] Figure 3 is a schematic diagram of the structure of an electronic device provided in Embodiment 3 of the present invention.
Detailed Implementation
[0040] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of this invention, and not all embodiments. Based on the embodiments of this invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this invention.
[0041] Embodiment 1 of the present invention provides a method for processing point cloud features. As shown in Figure 1, which is a schematic diagram of the point cloud feature processing method provided in Embodiment 1 of the present invention, the method mainly includes the following steps:
[0042] Step 1: Obtain the lidar point cloud as the corresponding first point cloud.
[0043] Here, the perception module of the autonomous driving system acquires the real-time LiDAR point cloud, i.e., the first point cloud, from the onboard LiDAR. The first point cloud consists of multiple radar scan points, each corresponding to a three-dimensional coordinate and a radar reflection intensity. The coordinate system of this three-dimensional coordinate is the LiDAR coordinate system by default. It should be noted that, after obtaining the first point cloud, this embodiment of the invention will further crop the first point cloud based on a pre-set point cloud spatial size; and then filter outlier scan points from the cropped first point cloud using a preset statistical filter.
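The cropping and outlier-filtering pre-processing described above can be illustrated with a minimal NumPy sketch. The patent does not specify the crop bounds or the statistical filter, so the box limits, the neighbour count `k`, and the threshold `std_ratio` below are hypothetical, and the filter shown is one common form of statistical outlier removal (mean distance to the k nearest neighbours), not necessarily the one the embodiment uses.

```python
import numpy as np

def crop_point_cloud(points, bounds):
    """Keep points whose (x, y, z) lie inside an axis-aligned box.

    points: (N, 4) array of x, y, z, intensity.
    bounds: ((x_lo, x_hi), (y_lo, y_hi), (z_lo, z_hi)), a hypothetical preset.
    """
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    mask = np.all((points[:, :3] >= lo) & (points[:, :3] <= hi), axis=1)
    return points[mask]

def statistical_outlier_filter(points, k=4, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours is more
    than std_ratio standard deviations above the cloud-wide mean."""
    xyz = points[:, :3]
    # Brute-force pairwise distances: fine for a sketch, O(N^2) memory.
    dists = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=2)
    # After sorting, column 0 is the zero distance from a point to itself.
    knn_mean = np.sort(dists, axis=1)[:, 1:k + 1].mean(axis=1)
    threshold = knn_mean.mean() + std_ratio * knn_mean.std()
    return points[knn_mean <= threshold]

rng = np.random.default_rng(0)
cloud = np.hstack([rng.uniform(0.0, 1.0, (50, 3)), rng.uniform(0.0, 1.0, (50, 1))])
# Add one point outside the crop box and one isolated point inside it.
cloud = np.vstack([cloud, [[5.0, 5.0, 5.0, 0.3]], [[100.0, 0.0, 0.0, 0.9]]])
cropped = crop_point_cloud(cloud, ((-10, 10), (-10, 10), (-10, 10)))
filtered = statistical_outlier_filter(cropped)
```

In this toy input the crop removes the point at x = 100 and the filter removes the isolated point at (5, 5, 5); a production pipeline would likely use a spatial index rather than the brute-force distance matrix.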
[0044] Step 2: Extract bird's-eye view features from the first point cloud to generate the corresponding first bird's-eye view feature tensor;
[0045] Specifically, it includes: Step 21, performing a bird's-eye view projection on the first point cloud to generate a first bird's-eye view with a graphic size of H1×W1; and based on the preset bird's-eye view grid size △h1×△w1, dividing the first bird's-eye view into a grid to obtain a first bird's-eye view network composed of X1×Y1 first bird's-eye view grids;
[0046] Where H1 and W1 are the height and width of the first bird's-eye view, respectively; △h1 and △w1 are the height and width of the bird's-eye view grid, respectively; X1 = int(H1 / △h1), Y1 = int(W1 / △w1), and int() is the floor function;
[0047] Specifically, it includes: Step 211, performing a bird's-eye view projection on the first point cloud to generate a first bird's-eye view with a graphic size of H1×W1;
[0048] Specifically, it includes: step 2111, performing coordinate transformation from the lidar coordinate system to the vehicle coordinate system on the three-dimensional coordinates of each scanning point in the first point cloud to obtain the corresponding second point cloud;
[0049] Here, each scan point of the second point cloud corresponds one-to-one with each scan point of the first point cloud; each scan point of the second point cloud also corresponds to a three-dimensional coordinate and a radar reflection intensity, except that the three-dimensional coordinates of each scan point of the second point cloud are based on the three-dimensional coordinates (x, y, z) of the vehicle coordinate system.
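The lidar-to-vehicle coordinate transformation of step 2111 is a rigid transform of each point's three-dimensional coordinate. The sketch below assumes the extrinsic calibration is available as a rotation matrix and a translation vector; the function name and the numeric extrinsics are hypothetical and only illustrate the one-to-one mapping from the first point cloud to the second.

```python
import numpy as np

def lidar_to_vehicle(points, rotation, translation):
    """Apply p_vehicle = R @ p_lidar + t to the xyz columns; keep intensity."""
    out = points.copy()
    out[:, :3] = points[:, :3] @ rotation.T + translation
    return out

# Hypothetical extrinsics: lidar yawed by 90 degrees and mounted 1.5 m above
# the vehicle origin.
yaw = np.pi / 2
R = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
              [np.sin(yaw),  np.cos(yaw), 0.0],
              [0.0,          0.0,         1.0]])
t = np.array([0.0, 0.0, 1.5])

first_cloud = np.array([[1.0, 0.0, 0.0, 0.8]])   # x, y, z, intensity
second_cloud = lidar_to_vehicle(first_cloud, R, t)
```

Each output row corresponds to the input row at the same index, and the reflection intensity column is passed through unchanged.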
[0050] Step 2112: Extract the extreme coordinates of the second point cloud on the xy coordinate plane as the corresponding $x_{\min}$, $x_{\max}$, $y_{\min}$, and $y_{\max}$; in the xy coordinate plane, draw two lines perpendicular to the x-axis through $x_{\min}$ and $x_{\max}$ and two lines perpendicular to the y-axis through $y_{\min}$ and $y_{\max}$; and use the rectangular plane enclosed by the four perpendicular lines as the bird's-eye view projection plane;
[0051] Step 2113: Record the xy-axis coordinate components of each scanning point in the second point cloud on the bird's-eye view projection plane as the corresponding bird's-eye view projection point coordinates (x, y); and form the corresponding bird's-eye view projection point feature by the z-axis coordinate of the scanning point with the highest z-axis height corresponding to each bird's-eye view projection point coordinate (x, y) and the radar reflection intensity.
[0052] Here, in the second point cloud, multiple scanning points at different heights may share the same bird's-eye view projection point coordinates (x, y). In this embodiment of the invention, the height information (the z-axis coordinate) and the reflection intensity information (the radar reflection intensity) of the highest scanning point are selected by default as the corresponding bird's-eye view projection point feature.
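The "keep the highest point per projected coordinate" rule can be sketched as follows. The patent compares projected (x, y) coordinates directly; this sketch assumes they are first quantised to a hypothetical `resolution` so that nearby points count as sharing a coordinate, and it returns a plain dictionary rather than an image.

```python
import numpy as np

def bev_projection_features(points, resolution=0.5):
    """Return {(ix, iy): (z, intensity)}, keeping the highest point per cell.

    points: (N, 4) array of x, y, z, intensity in the vehicle frame.
    resolution: hypothetical cell edge length used to decide when two
    points share the same projected (x, y) coordinate.
    """
    x_min, y_min = points[:, 0].min(), points[:, 1].min()
    ix = np.floor((points[:, 0] - x_min) / resolution).astype(int)
    iy = np.floor((points[:, 1] - y_min) / resolution).astype(int)
    features = {}
    for cx, cy, z, intensity in zip(ix, iy, points[:, 2], points[:, 3]):
        key = (int(cx), int(cy))
        # Replace the stored feature only when this point is higher.
        if key not in features or z > features[key][0]:
            features[key] = (float(z), float(intensity))
    return features

second_cloud = np.array([
    [0.00, 0.00, 0.2, 0.5],
    [0.01, 0.02, 1.7, 0.9],   # same cell as the first point, but higher
    [1.00, 0.50, 0.4, 0.1],
])
bev = bev_projection_features(second_cloud)
```

Here the first two points fall into the same cell, so only the height and intensity of the higher one are kept for that cell.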
[0053] Step 2114: Construct a first bird's-eye view with a graphic size of H1×W1 based on the bird's-eye view projection plane; and set the pixel features of the first bird's-eye view to consist of height features and reflection intensity features; and record the pixels on the first bird's-eye view corresponding to the coordinates (x,y) of each bird's-eye view projection point as first projected pixels, and record all pixels on the first bird's-eye view other than the first projected pixels as first extended pixels; and set the height features and reflection intensity features of each first projected pixel based on the corresponding bird's-eye view projection point features; and use bilinear interpolation to set the height features and reflection intensity features of the first extended pixels around each first projected pixel based on the height features and reflection intensity features of each first projected pixel.
[0054] Here, the height H1 of the first bird's-eye view is proportional to the absolute difference between $x_{\max}$ and $x_{\min}$, and the width W1 is proportional to the absolute difference between $y_{\max}$ and $y_{\min}$; the feature dimension of the first bird's-eye view is 2, comprising a height feature and a reflection intensity feature; a pixel on the first bird's-eye view that corresponds to a projection point of the second point cloud is a first projected pixel, and all other pixels are first extended pixels; the features of a first projected pixel come from the corresponding bird's-eye view projection point feature; the features of a first extended pixel are smoothly predicted from the features of the surrounding first projected pixels using bilinear interpolation.
[0055] Step 212: Based on the preset bird's-eye view grid size △h1×△w1, the first bird's-eye view is divided into grids to obtain a first bird's-eye view network consisting of X1×Y1 first bird's-eye view grids; X1=int(H1 / △h1), Y1=int(W1 / △w1), int() is the floor function;
[0056] Here, X1 is the total number of rows in the first bird's-eye view network, and Y1 is the total number of columns in the first bird's-eye view network;
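As a small worked example of the grid division, the row and column counts follow directly from the floor formula X1 = int(H1 / △h1), Y1 = int(W1 / △w1). The image and cell sizes below are hypothetical.

```python
import math

def grid_shape(height, width, cell_h, cell_w):
    """Number of grid rows and columns, using floor as in X1 = int(H1 / △h1)."""
    return math.floor(height / cell_h), math.floor(width / cell_w)

# Hypothetical sizes: a 1000 x 800 bird's-eye view with 4 x 4 cells.
X1, Y1 = grid_shape(1000, 800, 4, 4)

# When the size is not an exact multiple, the partial cell at the edge is dropped.
X1_partial, Y1_partial = grid_shape(1002, 801, 4, 4)
```

With these numbers both calls give 250 rows and 200 columns, because the floor function discards the incomplete cells at the border.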
[0057] Step 22: Based on the preset bird's-eye view feature extraction network, the first bird's-eye view is processed by feature extraction using each first bird's-eye view grid as a feature extraction unit to generate the corresponding first bird's-eye view feature tensor; wherein, the shape of the first bird's-eye view feature tensor is X1×Y1×C, and C is the preset number of feature channels.
[0058] Here, the bird's-eye view feature extraction network pre-selected in this embodiment of the invention includes a first feature extraction network and a first upsampling network. The first feature extraction network is composed of a multi-layer convolutional neural network. In this embodiment of the invention, when performing feature extraction, the first bird's-eye view is tensor-transformed according to the shape X1×Y1 of the first bird's-eye view network to obtain a first input feature tensor with a shape of X1×Y1×2. The first input feature tensor is then fed into the first feature extraction network of the bird's-eye view feature extraction network for convolution calculation to obtain the corresponding first output feature tensor. The first output feature tensor is then input into the first upsampling network of the bird's-eye view feature extraction network for upsampling processing to obtain the final first bird's-eye view feature tensor. The size of the first bird's-eye view feature tensor is the same as the size of the first input feature tensor, both being X1×Y1. The feature channel dimension, i.e., the number of feature channels C, of the first bird's-eye view feature tensor is determined by the network parameters of the bird's-eye view feature extraction network.
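The shape bookkeeping of "feature extraction followed by upsampling" can be illustrated with a toy NumPy stand-in. This is not the patent's network: the layer structure, the random weights, and the function name are assumptions, and a real implementation would use a trained multi-layer convolutional network. The sketch only shows how an X1×Y1×2 input can be reduced in resolution, expanded to C channels, and upsampled back to X1×Y1×C.

```python
import numpy as np

def toy_feature_network(grid_input, num_channels, rng):
    """Stand-in for 'convolutional extractor + upsampling' with random weights.

    grid_input: (X, Y, F) tensor, X and Y assumed even.
    Returns an (X, Y, num_channels) tensor.
    """
    X, Y, F = grid_input.shape
    # Stride-2 convolution with a 2x2 kernel: gather each 2x2 patch ...
    patches = grid_input.reshape(X // 2, 2, Y // 2, 2, F).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(X // 2, Y // 2, 4 * F)
    # ... then apply one shared linear map followed by ReLU.
    weights = rng.standard_normal((4 * F, num_channels))
    coarse = np.maximum(patches @ weights, 0.0)           # (X/2, Y/2, C)
    # Nearest-neighbour upsampling restores the input grid size.
    return coarse.repeat(2, axis=0).repeat(2, axis=1)     # (X, Y, C)

rng = np.random.default_rng(0)
first_input = rng.random((8, 6, 2))     # hypothetical X1=8, Y1=6, 2 input features
bev_feature_tensor = toy_feature_network(first_input, num_channels=16, rng=rng)
```

The output keeps the X1×Y1 grid size of the input while the channel dimension grows from 2 to the chosen C, which mirrors the size relationship stated in the paragraph above.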
[0059] Step 3: Extract front view features from the first point cloud to generate the corresponding first front view feature tensor;
[0060] Specifically, it includes: Step 31, projecting the first point cloud into a front view to generate a first front view with a graphic size of H2×W2; and based on the preset front view grid size △h2×△w2, dividing the first front view into a grid to obtain a first front view network composed of Z1×X2 first front view grids.
[0061] Where H2 and W2 are the height and width of the first front view, respectively; △h2 and △w2 are the height and width of the front view grid, respectively; Z1 = int(H2 / △h2), X2 = int(W2 / △w2), and int() is the floor function;
[0062] Specifically, it includes: Step 311, projecting the first point cloud onto the front view to generate a first front view with a graphic size of H2×W2;
[0063] Specifically, it includes: Step 3111, performing coordinate transformation from the lidar coordinate system to the vehicle coordinate system on the three-dimensional coordinates of each scanning point in the first point cloud to obtain the corresponding third point cloud;
[0064] Here, each scan point of the obtained third point cloud corresponds one-to-one with each scan point of the first point cloud; each scan point of the third point cloud also corresponds to a three-dimensional coordinate and a radar reflection intensity, except that the three-dimensional coordinates of each scan point of the third point cloud are based on the three-dimensional coordinates (x, y, z) of the vehicle coordinate system.
[0065] Step 3112: Extract the extreme coordinates of the third point cloud on the xz coordinate plane as the corresponding $x_{\min}$, $x_{\max}$, $z_{\min}$, and $z_{\max}$; in the xz coordinate plane, draw two lines perpendicular to the x-axis through $x_{\min}$ and $x_{\max}$ and two lines perpendicular to the z-axis through $z_{\min}$ and $z_{\max}$; and use the rectangular plane enclosed by the four perpendicular lines as the front view projection plane;
[0066] Step 3113: Record the xz axis coordinate components of each scanning point in the third point cloud on the front view projection plane as the corresponding front view projection point coordinates (x,z); and form the corresponding front view projection point feature by the y-axis coordinate of the scanning point with the deepest y-axis depth corresponding to each front view projection point coordinate (x,z) and the radar reflection intensity.
[0067] Here, in the third point cloud, multiple scanning points at different depths may share the same front view projection point coordinates (x, z). In this embodiment of the invention, the depth information (the y-axis coordinate) and the reflection intensity information (the radar reflection intensity) of the deepest scanning point are selected by default as the corresponding front view projection point feature.
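A vectorised sketch of the front view projection is given below. It assumes that "deepest" means the largest y value, that image rows index z and columns index x (matching the Z1×X2 grid layout), and that coordinates are quantised with a hypothetical `resolution`; pixels that receive no point are left at zero, and the bilinear interpolation of the extended pixels is not shown.

```python
import numpy as np

def front_view_image(points, resolution=0.5):
    """Rasterise to a (rows, cols, 2) image of (depth y, intensity).

    points: (N, 4) array of x, y, z, intensity in the vehicle frame.
    Rows index z and columns index x; empty pixels stay at zero.
    """
    x, y, z, inten = points.T
    col = np.floor((x - x.min()) / resolution).astype(int)
    row = np.floor((z - z.min()) / resolution).astype(int)
    image = np.zeros((row.max() + 1, col.max() + 1, 2))
    # Sort by row, then column, then depth, so that within each pixel
    # the deepest point comes last.
    order = np.lexsort((y, col, row))
    row, col, y, inten = row[order], col[order], y[order], inten[order]
    # A point is the last of its pixel if the next point is in another pixel.
    last = np.ones(len(row), dtype=bool)
    last[:-1] = (row[1:] != row[:-1]) | (col[1:] != col[:-1])
    image[row[last], col[last], 0] = y[last]
    image[row[last], col[last], 1] = inten[last]
    return image

third_cloud = np.array([
    [0.0, 2.0, 0.0, 0.3],
    [0.1, 9.0, 0.2, 0.7],   # same (x, z) pixel as the first point, but deeper
    [1.0, 4.0, 1.0, 0.5],
])
fv = front_view_image(third_cloud)
```

Selecting one point per pixel before writing avoids assigning the same pixel twice, so the result does not depend on assignment order.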
[0068] Step 3114: Construct a first front view with a graphic size of H2×W2 based on the front view projection plane; and set the pixel features of the first front view to consist of depth features and reflection intensity features; and record the pixels on the first front view corresponding to the coordinates (x,z) of each front view projection point as second projection pixels, and record all pixels on the first front view other than the second projection pixels as second extended pixels; and set the depth features and reflection intensity features of each second projection pixel based on the corresponding front view projection point features; and use bilinear interpolation to set the depth features and reflection intensity features of the second extended pixels around each second projection pixel according to the depth features and reflection intensity features of each second projection pixel.
[0069] Here, the height H2 of the first front view is proportional to the absolute difference between $x_{\max}$ and $x_{\min}$, and the width W2 is proportional to the absolute difference between $z_{\max}$ and $z_{\min}$; the feature dimension of the first front view is 2, comprising a depth feature and a reflection intensity feature; a pixel on the first front view that corresponds to a projection point of the third point cloud is a second projection pixel, and all other pixels are second extended pixels; the features of a second projection pixel come from the corresponding front view projection point feature; the features of a second extended pixel are smoothly predicted from the features of the surrounding second projection pixels using bilinear interpolation.
[0070] Step 312: Based on the preset front view grid size △h2×△w2, the first front view is divided into grids to obtain a first front view network consisting of Z1×X2 first front view grids; Z1=int(H2 / △h2), X2=int(W2 / △w2), and int() is the floor function;
[0071] Here, Z1 is the total number of rows in the first front view network, and X2 is the total number of columns in the first front view network;
[0072] Step 32: Based on the preset front view feature extraction network, the first front view is processed by feature extraction using each first front view grid as a feature extraction unit to generate the corresponding first front view feature tensor.
[0073] The shape of the first front view feature tensor is Z1×X2×C.
[0074] Here, the pre-selected front view feature extraction network in this embodiment of the invention includes a second feature extraction network and a second upsampling network. The second feature extraction network is composed of a multi-layer convolutional neural network. In this embodiment of the invention, when performing feature extraction, the first front view is tensor-transformed according to the shape Z1×X2 of the first front view network to obtain a second input feature tensor with a shape of Z1×X2×2. The second input feature tensor is then fed into the second feature extraction network of the front view feature extraction network for convolution calculation to obtain the corresponding second output feature tensor. The second output feature tensor is then input into the second upsampling network of the front view feature extraction network for upsampling processing to obtain the final first front view feature tensor. The size of the first front view feature tensor is the same as the size of the second input feature tensor, both being Z1×X2. The feature channel dimension of the first front view feature tensor is determined by the network parameters of the front view feature extraction network. In this embodiment of the invention, the pre-defined front view feature extraction network and the bird's-eye view feature extraction network pre-agree to output the same feature channel dimension, that is, the feature channel dimension of the first front view feature tensor is the pre-defined number of feature channels C.
[0075] Step 4: Perform feature fusion on the first bird's-eye view feature tensor and the first front view feature tensor to generate the corresponding first three-dimensional feature tensor;
[0076] Specifically, this includes: Step 41, decomposing the first bird's-eye view feature tensor, cell by cell, into X1×Y1 first feature tensors $A_{i,j}$ of shape 1×1×C;
[0077] Where 1≤i≤X1, 1≤j≤Y1;
[0078] Here, each first feature tensor $A_{i,j}$ corresponds to one first bird's-eye view grid in the first bird's-eye view network and can be regarded as the unit grid feature of that grid;
[0079] Step 42: Decompose the first front view feature tensor, column by column, into X2 second feature tensors $B_k$ of shape Z1×1×C;
[0080] Where 1≤k≤X2;
[0081] Here, each second feature tensor $B_k$ corresponds to one column of first front view grids in the first front view network and can be regarded as the column grid feature of that column;
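The two decompositions in steps 41 and 42 amount to slicing the feature tensors. The sketch below uses hypothetical sizes and random values, and the helper names are illustrative; the only point is the shapes of a single cell feature (1×1×C) and a single column feature (Z1×1×C) under the 1-based indices used in the text.

```python
import numpy as np

X1, Y1, Z1, X2, C = 4, 3, 5, 4, 8          # hypothetical sizes
rng = np.random.default_rng(0)
bev_features = rng.random((X1, Y1, C))     # first bird's-eye view feature tensor
fv_features = rng.random((Z1, X2, C))      # first front view feature tensor

def cell_feature(bev, i, j):
    """First feature tensor for 1-based grid indices i, j: shape 1x1xC."""
    return bev[i - 1:i, j - 1:j, :]

def column_feature(fv, k):
    """Second feature tensor for 1-based column index k: shape Z1x1xC."""
    return fv[:, k - 1:k, :]

A_23 = cell_feature(bev_features, 2, 3)
B_4 = column_feature(fv_features, 4)
```

Slicing with ranges such as `i - 1:i` instead of a single index keeps the singleton axes, so the slices have exactly the shapes stated in steps 41 and 42.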
[0082] Step 43: From all second feature tensors $B_k$, select the second feature tensor that matches each first feature tensor $A_{i,j}$ as the corresponding matching feature tensor $B^*$;
[0083] Specifically, this includes: traversing each first feature tensor $A_{i,j}$; during the traversal, taking the first feature tensor currently being traversed as the current feature tensor $A_{i,j}$; extracting the subscript i of the current feature tensor as the current subscript index; extracting the subscript k of each second feature tensor $B_k$ as the corresponding second subscript index; calculating the absolute difference between the current subscript index and each second subscript index to generate the corresponding first absolute difference; selecting the minimum of the X2 first absolute differences as the corresponding minimum absolute difference; and taking the second feature tensor $B_k$ corresponding to the minimum absolute difference as the matching feature tensor $B^*$ corresponding to the current feature tensor $A_{i,j}$;
[0084] Here, as will be seen from the subsequent steps, the feature fusion of this embodiment fuses one column grid feature of the first front view network with one unit grid feature of the first bird's-eye view network. However, the first bird's-eye view network and the first front view network are not necessarily perfectly aligned, so one first bird's-eye view grid may intersect multiple columns of first front view grids. Therefore, before feature fusion is performed, the best-matching column grid feature, i.e., the matching feature tensor $B^*$, must be selected for each unit grid feature of the first bird's-eye view network, i.e., each first feature tensor $A_{i,j}$;
[0085] In this embodiment of the invention, during the selection, the subscript i of the current feature tensor $A_{i,j}$ is first extracted to obtain the x-axis grid index of $A_{i,j}$ in the first bird's-eye view network, which is the current subscript index; the subscript k of each second feature tensor $B_k$ is extracted to obtain the x-axis grid index of each $B_k$ in the first front view network, which is the second subscript index; then the absolute value $|i-k|$, i.e., the absolute difference between the current subscript index and each second subscript index, is calculated to obtain the x-axis spacing between the grid corresponding to $A_{i,j}$ and the column corresponding to each $B_k$, which is the first absolute difference; because there are X2 second feature tensors $B_k$, X2 x-axis spacings, i.e., X2 first absolute differences, are obtained; finally, the second feature tensor $B_k$ with the smallest x-axis spacing, i.e., the one corresponding to the minimum absolute difference, is taken as the selection result, namely the matching feature tensor $B^*$;
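The index-matching rule can be written in a few lines. The sketch below follows the text literally: for a given row index i it computes the X2 absolute differences against every column index k and returns the k with the minimum difference. The function name is illustrative.

```python
import numpy as np

def match_column_index(i, num_columns):
    """Return the 1-based k in 1..num_columns that minimises |i - k|."""
    k_values = np.arange(1, num_columns + 1)
    abs_diffs = np.abs(i - k_values)      # the X2 first absolute differences
    return int(k_values[np.argmin(abs_diffs)])

k_inside = match_column_index(3, 5)       # i lies within 1..X2
k_beyond = match_column_index(7, 5)       # i exceeds X2
```

For integer indices the rule reduces to k = i when i ≤ X2 and k = X2 otherwise, so the explicit search mainly matters if the two grids are later indexed in a common physical unit rather than by raw subscript.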
[0086] Step 44: Perform feature fusion on each first feature tensor $A_{i,j}$ and the corresponding matching feature tensor $B^*$ to generate a third feature tensor $D_{i,j}$ of shape 1×1×Z1×C;
[0087] Here, the feature fusion of this embodiment fuses a column grid feature of the first front view network, namely the matching feature tensor $B^*$, with a unit grid feature of the first bird's-eye view network, namely the first feature tensor $A_{i,j}$, to obtain a three-dimensional grid feature, namely the third feature tensor $D_{i,j}$, which is based on the bird's-eye view grid feature and also incorporates the front view height features;
[0088] Specifically, this includes: Step 441, decomposing the matching feature tensor $B^*$ of shape Z1×1×C into Z1 fourth feature tensors $b_g$ of shape 1×1×C;
[0089] Where 1≤g≤Z1;
[0090] Step 442: Perform a tensor cross product calculation on the first feature tensor $A_{i,j}$ of shape 1×1×C and each fourth feature tensor $b_g$ of shape 1×1×C to generate the corresponding fifth feature tensor $d_g$;
[0091] Where the shape of the fifth feature tensor $d_g$ is 1×1×C;
[0092] Step 443: Form the corresponding third feature tensor $D_{i,j}$ from the Z1 fifth feature tensors $d_g$ obtained; the shape of the third feature tensor $D_{i,j}$ is 1×1×Z1×C;
[0093] Here, the third feature tensor $D_{i,j}$ is formed by concatenating the Z1 fifth feature tensors $d_g$;
[0094] Step 45, obtaining X1×Y1 third feature tensors D i,j The corresponding first three-dimensional feature tensor is formed; the shape of the first three-dimensional feature tensor is X1×Y1×Z1×C.
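Steps 43 to 45 can be sketched in NumPy under one explicit assumption: the "tensor cross product" of two 1×1×C tensors is read here as an element-wise product, since the stated result shape is again 1×1×C. The embodiment does not define the operation further, so this reading, the function name `fuse_bev_front`, and the 0-based indexing are illustrative assumptions rather than the patented implementation.

```python
import numpy as np


def fuse_bev_front(bev: np.ndarray, front: np.ndarray) -> np.ndarray:
    """Fuse a bird's-eye-view tensor (X1, Y1, C) with a front-view tensor
    (Z1, X2, C) into a 3D feature tensor (X1, Y1, Z1, C).

    Assumption: the 'tensor cross product' is treated as an element-wise
    product over the C channels.
    """
    x1, _, c = bev.shape
    _, x2, c_front = front.shape
    if c != c_front:
        raise ValueError("both views must share the channel count C")

    # Step 43: for each x index i, the front-view column k minimising |i - k|
    # (0-based here).
    k_star = np.abs(np.arange(x1)[:, None] - np.arange(x2)[None, :]).argmin(axis=1)

    # Matched columns B*: (Z1, X1, C) -> (X1, Z1, C), one column per x index.
    matched = front[:, k_star, :].transpose(1, 0, 2)

    # Steps 441-443 and 45 in one broadcast:
    # A_{i,j} as (X1, Y1, 1, C) times b_g as (X1, 1, Z1, C).
    return bev[:, :, None, :] * matched[:, None, :, :]


bev = np.arange(12, dtype=float).reshape(3, 2, 2)    # X1=3, Y1=2, C=2
front = np.arange(16, dtype=float).reshape(4, 2, 2)  # Z1=4, X2=2, C=2
fused = fuse_bev_front(bev, front)
print(fused.shape)  # (3, 2, 4, 2)
```

Broadcasting avoids explicit loops over i, j, and g. A per-cell loop that builds each D_{i,j} and stacks the results would yield the same tensor.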
[0095] In summary, this embodiment of the invention provides a novel 3D point cloud feature extraction method through steps 1-4. Specifically, it first extracts 2D bird's-eye view features and 2D front view features from the point cloud separately, and then fuses the two sets of 2D features to obtain the 3D features of the point cloud. Compared with the traditional 3D voxel network feature extraction method, this processing method reduces the computational load and shortens the computation time of point cloud feature extraction, thereby addressing the computation timeout problem of conventional processing methods.
[0096] Figure 2 is a module structure diagram of a point cloud feature processing device provided in Embodiment 2 of the present invention. This device can be a terminal device or server implementing the aforementioned method embodiments, or it can be a device that enables the aforementioned terminal device or server to implement the aforementioned method embodiments. For example, the device can be a device or chip system of the aforementioned terminal device or server. As shown in Figure 2, the device includes: an acquisition module 201, a bird's-eye view feature processing module 202, a forward-looking feature processing module 203, and a feature fusion processing module 204.
[0097] The acquisition module 201 is used to acquire the lidar point cloud as the corresponding first point cloud.
[0098] The bird's-eye view feature processing module 202 is used to extract bird's-eye view features from the first point cloud and generate the corresponding first bird's-eye view feature tensor.
[0099] The forward-looking feature processing module 203 is used to extract front view features from the first point cloud to generate the corresponding first front view feature tensor.
[0100] The feature fusion processing module 204 is used to perform feature fusion on the first bird's-eye view feature tensor and the first front view feature tensor to generate the corresponding first three-dimensional feature tensor.
[0101] The point cloud feature processing device provided in this embodiment of the invention can execute the method steps in the above method embodiment. Its implementation principle and technical effect are similar, and will not be repeated here.
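The module division of paragraphs [0096] to [0100] can be pictured as a small composition of four callables. This is only a structural sketch: the class name `PointCloudFeatureDevice`, the field names, and the stand-in lambdas in the usage example are hypothetical, and the actual feature extraction networks are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

Array = np.ndarray


@dataclass
class PointCloudFeatureDevice:
    """Hypothetical wiring of modules 201-204; all names are illustrative."""
    acquire: Callable[[], Array]              # acquisition module 201
    bev_features: Callable[[Array], Array]    # bird's-eye view module 202
    front_features: Callable[[Array], Array]  # forward-looking module 203
    fuse: Callable[[Array, Array], Array]     # feature fusion module 204

    def run(self) -> Array:
        cloud = self.acquire()                # first point cloud
        bev = self.bev_features(cloud)        # (X1, Y1, C)
        front = self.front_features(cloud)    # (Z1, X2, C)
        return self.fuse(bev, front)          # (X1, Y1, Z1, C)


# Usage with stand-in modules that only produce tensors of the stated shapes.
device = PointCloudFeatureDevice(
    acquire=lambda: np.zeros((10, 4)),
    bev_features=lambda cloud: np.ones((3, 2, 5)),
    front_features=lambda cloud: np.ones((4, 3, 5)),
    fuse=lambda a, b: a[:, :, None, :] * b.transpose(1, 0, 2)[:, None, :, :],
)
out = device.run()
print(out.shape)  # (3, 2, 4, 5)
```

Keeping each module behind a plain callable mirrors the statement that the modules are a logical division and may be realized in software, hardware, or a mix of both.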
[0102] It should be noted that the division of the various modules in the above device is merely a logical functional division. In actual implementation, they can be fully or partially integrated into a single physical entity, or they can be physically separated. Furthermore, these modules can be implemented entirely in software via processing element calls; they can be fully implemented in hardware; or some modules can be implemented by processing element calls to software, while others are implemented in hardware. For example, the acquisition module can be a separate processing element, or it can be integrated into a chip in the above device. Alternatively, it can be stored as program code in the memory of the above device, and called and executed by a processing element of the device. The implementation of other modules is similar. Moreover, these modules can be fully or partially integrated together, or they can be implemented independently. The processing element described here can be an integrated circuit with signal processing capabilities. In the implementation process, each step of the above method or each of the above modules can be completed through integrated logic circuits in the hardware of the processor element or through software instructions.
[0103] For example, these modules can be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). As another example, when a module is implemented in the form of program code scheduled by a processing element, the processing element can be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. Furthermore, these modules can be integrated together and implemented as a System-on-a-Chip (SOC).
[0104] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. This computer program product includes one or more computer instructions. When these computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the foregoing method embodiments are generated. The computer described above can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The aforementioned computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the aforementioned computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, Bluetooth, microwave, etc.) means. The aforementioned computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The aforementioned available media can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state disks (SSDs)).
[0105] Figure 3 is a schematic diagram of an electronic device provided in Embodiment 3 of the present invention. This electronic device can be the aforementioned terminal device or server, or it can be a terminal device or server connected to the aforementioned terminal device or server that implements the method of the embodiments of the present invention. As shown in Figure 3, the electronic device may include: a processor 301 (e.g., CPU), a memory 302, and a transceiver 303; the transceiver 303 is coupled to the processor 301, and the processor 301 controls the transmission and reception operations of the transceiver 303. The memory 302 may store various instructions for performing various processing functions and implementing the processing steps described in the foregoing method embodiments. Preferably, the electronic device involved in the embodiments of the present invention further includes: a power supply 304, a system bus 305, and a communication port 306. The system bus 305 is used to realize communication connections between components. The communication port 306 is used for communication between the electronic device and other peripherals.
[0106] The system bus 305 mentioned in Figure 3 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. This system bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, it is represented by only one thick line in the figure, but this does not indicate that there is only one bus or one type of bus. The communication interface is used to enable communication between the database access device and other devices (e.g., clients, read-write libraries, and read-only libraries). The memory may include Random Access Memory (RAM) and may also include non-volatile memory, such as at least one disk storage device.
[0107] The processors mentioned above can be general-purpose processors, including central processing units (CPUs), network processors (NPs), graphics processing units (GPUs), etc.; they can also be digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
[0108] It should be noted that the embodiments of the present invention also provide a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the methods and processes provided in the above embodiments.
[0109] This invention also provides a chip for executing instructions, which is used to perform the processing steps described in the foregoing method embodiments.
[0110] This invention provides a method, apparatus, electronic device, and computer-readable storage medium for processing point cloud features. After obtaining a lidar point cloud, a two-dimensional bird's-eye view and a front view projection are first performed on the lidar point cloud. Then, features are extracted from the two-dimensional bird's-eye view and the front view to obtain corresponding two-dimensional bird's-eye view and front view features. Finally, the front view features are fused into the bird's-eye view features to obtain a three-dimensional fused feature. This invention replaces the conventional three-dimensional voxel network feature extraction method with a two-dimensional bird's-eye view + front view feature fusion method, reducing the computational load and shortening the computation time for feature extraction, thus solving the computation timeout problem of the conventional three-dimensional voxel network feature extraction method.
[0111] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.
[0112] The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein can be implemented in hardware, a software module executed by a processor, or a combination of both. The software module can be located in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.
[0113] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above description is only a specific embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for processing point cloud features, characterized in that the method includes:
obtaining a lidar point cloud as the corresponding first point cloud;
performing bird's-eye view feature extraction on the first point cloud to generate the corresponding first bird's-eye view feature tensor;
performing front view feature extraction on the first point cloud to generate the corresponding first front view feature tensor;
performing feature fusion on the first bird's-eye view feature tensor and the first front view feature tensor to generate the corresponding first three-dimensional feature tensor;
wherein the step of performing bird's-eye view feature extraction on the first point cloud to generate the corresponding first bird's-eye view feature tensor specifically includes: performing bird's-eye view projection on the first point cloud to generate a first bird's-eye view with a graphic size of H1×W1; dividing the first bird's-eye view into a grid based on a preset bird's-eye view grid size Δh1×Δw1 to obtain a first bird's-eye view network composed of X1×Y1 first bird's-eye view grids, where H1 and W1 are the height and width of the first bird's-eye view, respectively, X1=int(H1/Δh1), Y1=int(W1/Δw1), and int() is a round-up (ceiling) function; and performing feature extraction processing on the first bird's-eye view, using each first bird's-eye view grid as a feature extraction unit, based on a preset bird's-eye view feature extraction network to generate the corresponding first bird's-eye view feature tensor; the shape of the first bird's-eye view feature tensor is X1×Y1×C, where C is a preset number of feature channels;
the step of performing front view feature extraction on the first point cloud to generate the corresponding first front view feature tensor specifically includes: performing front view projection on the first point cloud to generate a first front view with a graphic size of H2×W2; dividing the first front view into a grid based on a preset front view grid size Δh2×Δw2 to obtain a first front view network composed of Z1×X2 first front view grids, where H2 and W2 are the height and width of the first front view, respectively, Z1=int(H2/Δh2), and X2=int(W2/Δw2); and performing feature extraction processing on the first front view, using each first front view grid as a feature extraction unit, based on a preset front view feature extraction network to generate the corresponding first front view feature tensor; the shape of the first front view feature tensor is Z1×X2×C;
the step of performing feature fusion on the first bird's-eye view feature tensor and the first front view feature tensor to generate the corresponding first three-dimensional feature tensor specifically includes: decomposing the first bird's-eye view feature tensor, unit grid by unit grid, into X1×Y1 first feature tensors A_{i,j} of shape 1×1×C, where 1≤i≤X1 and 1≤j≤Y1; decomposing the first front view feature tensor, column by column, into X2 second feature tensors B_k of shape Z1×1×C, where 1≤k≤X2; selecting, from all the second feature tensors B_k, the second feature tensor that matches each first feature tensor A_{i,j} as the corresponding matching feature tensor B*; performing feature fusion on each first feature tensor A_{i,j} and the corresponding matching feature tensor B* to generate a third feature tensor D_{i,j} of shape 1×1×Z1×C; and forming the corresponding first three-dimensional feature tensor from the obtained X1×Y1 third feature tensors D_{i,j}; the shape of the first three-dimensional feature tensor is X1×Y1×Z1×C;
the step of performing feature fusion on each first feature tensor A_{i,j} and the corresponding matching feature tensor B* to generate a third feature tensor D_{i,j} of shape 1×1×Z1×C specifically includes: decomposing the matching feature tensor B* of shape Z1×1×C into Z1 fourth feature tensors b_g of shape 1×1×C, where 1≤g≤Z1; performing a tensor cross product calculation on the first feature tensor A_{i,j} of shape 1×1×C and each fourth feature tensor b_g of shape 1×1×C to generate the corresponding fifth feature tensor d_g, the shape of the fifth feature tensor d_g being 1×1×C; and forming the corresponding third feature tensor D_{i,j} from the obtained Z1 fifth feature tensors d_g, the shape of the third feature tensor D_{i,j} being 1×1×Z1×C;
the step of performing bird's-eye view projection on the first point cloud to generate a first bird's-eye view with a graphic size of H1×W1 specifically includes: performing a coordinate transformation from the lidar coordinate system to the vehicle coordinate system on the three-dimensional coordinates of each scanning point in the first point cloud to obtain the corresponding second point cloud; extracting the extreme coordinates of the second point cloud on the xy coordinate plane as the corresponding x_min, x_max, y_min and y_max; drawing, in the xy coordinate plane, two lines perpendicular to the x-axis based on x_min and x_max and two lines perpendicular to the y-axis based on y_min and y_max, and using the rectangular plane enclosed by the four perpendicular lines as the bird's-eye view projection plane; recording the x-axis and y-axis coordinate components of each scanning point of the second point cloud on the bird's-eye view projection plane as the corresponding bird's-eye view projection point coordinates (x, y); forming the corresponding bird's-eye view projection point feature from the z-axis coordinate and the radar reflection intensity of the scanning point with the highest z-axis height corresponding to each bird's-eye view projection point coordinate (x, y); constructing a first bird's-eye view with a graphic size of H1×W1 based on the bird's-eye view projection plane; setting the pixel features of the first bird's-eye view to consist of a height feature and a reflection intensity feature; recording the pixels on the first bird's-eye view corresponding to each bird's-eye view projection point coordinate (x, y) as first projected pixels, and recording all pixels on the first bird's-eye view other than the first projected pixels as first extended pixels; setting the height feature and reflection intensity feature of each first projected pixel based on the corresponding bird's-eye view projection point feature; and setting, by bilinear interpolation, the height features and reflection intensity features of the first extended pixels surrounding each first projected pixel according to the height feature and reflection intensity feature of that first projected pixel;
the step of performing front view projection on the first point cloud to generate a first front view with a graphic size of H2×W2 specifically includes: performing a coordinate transformation from the lidar coordinate system to the vehicle coordinate system on the three-dimensional coordinates of each scanning point in the first point cloud to obtain the corresponding third point cloud; extracting the extreme coordinates of the third point cloud on the xz coordinate plane as the corresponding x_min, x_max, z_min and z_max; drawing, in the xz coordinate plane, two lines perpendicular to the x-axis based on x_min and x_max and two lines perpendicular to the z-axis based on z_min and z_max, and using the rectangular plane enclosed by the four perpendicular lines as the front view projection plane; recording the x-axis and z-axis coordinate components of each scanning point of the third point cloud on the front view projection plane as the corresponding front view projection point coordinates (x, z); forming the corresponding front view projection point feature from the y-axis coordinate and the radar reflection intensity of the scanning point with the greatest y-axis depth corresponding to each front view projection point coordinate (x, z); constructing a first front view with a graphic size of H2×W2 based on the front view projection plane; setting the pixel features of the first front view to consist of a depth feature and a reflection intensity feature; recording the pixels on the first front view corresponding to each front view projection point coordinate (x, z) as second projected pixels, and recording all pixels on the first front view other than the second projected pixels as second extended pixels; setting the depth feature and reflection intensity feature of each second projected pixel based on the corresponding front view projection point feature; and setting, by bilinear interpolation, the depth features and reflection intensity features of the second extended pixels surrounding each second projected pixel according to the depth feature and reflection intensity feature of that second projected pixel;
the step of selecting, from all the second feature tensors B_k, the second feature tensor that matches each first feature tensor A_{i,j} as the corresponding matching feature tensor B* specifically includes: traversing each first feature tensor A_{i,j}; during the traversal, taking the first feature tensor A_{i,j} currently being traversed as the current feature tensor A_{i,j}; extracting the subscript i of the current feature tensor A_{i,j} as the current subscript index; extracting the subscript k of each second feature tensor B_k as the corresponding second subscript index; calculating the absolute difference between the current subscript index and each second subscript index to generate the corresponding first absolute difference; selecting the minimum value from the obtained X2 first absolute differences as the corresponding minimum absolute difference; and taking the second feature tensor B_k corresponding to the minimum absolute difference as the matching feature tensor B* corresponding to the current feature tensor A_{i,j}.
2. An apparatus for performing the point cloud feature processing method according to claim 1, characterized in that the device includes: an acquisition module, a bird's-eye view feature processing module, a forward-looking feature processing module, and a feature fusion processing module; the acquisition module is used to acquire the lidar point cloud as the corresponding first point cloud; the bird's-eye view feature processing module is used to extract bird's-eye view features from the first point cloud to generate a corresponding first bird's-eye view feature tensor; the forward-looking feature processing module is used to extract front view features from the first point cloud to generate a corresponding first front view feature tensor; the feature fusion processing module is used to perform feature fusion on the first bird's-eye view feature tensor and the first front view feature tensor to generate a corresponding first three-dimensional feature tensor.
3. An electronic device, characterized in that it includes: a memory, a processor, and a transceiver; the processor is configured to be coupled to the memory and to read and execute instructions in the memory to implement the method of claim 1; the transceiver is coupled to the processor, and the processor controls the transceiver to send and receive messages.
4. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions that, when executed by a computer, cause the computer to perform the method of claim 1.
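For illustration only, two computations recited in claim 1 can be sketched in Python: the round-up grid counts X1 and Y1, and the bird's-eye view projection point features (height and reflection intensity of the highest scanning point at each (x, y)). The function names, the (N, 4) point layout of x, y, z, intensity, and the grouping of points by exact (x, y) equality are assumptions. The claim does not specify how coordinates map to pixels, and the bilinear interpolation of extended pixels is not shown.

```python
import math

import numpy as np


def grid_counts(h: float, w: float, dh: float, dw: float) -> tuple[int, int]:
    """X1 = ceil(H1 / dh1), Y1 = ceil(W1 / dw1), with int() read as rounding up."""
    return math.ceil(h / dh), math.ceil(w / dw)


def bev_projection_features(
    points: np.ndarray,
) -> dict[tuple[float, float], tuple[float, float]]:
    """Map each projected (x, y) to the (z, intensity) of its highest point.

    Assumptions: `points` has shape (N, 4) with columns x, y, z, intensity in
    the vehicle coordinate system, and points are grouped by exact (x, y).
    """
    features: dict[tuple[float, float], tuple[float, float]] = {}
    for x, y, z, intensity in points:
        key = (float(x), float(y))
        # Keep only the scanning point with the highest z for this (x, y).
        if key not in features or z > features[key][0]:
            features[key] = (float(z), float(intensity))
    return features


points = np.array([
    [1.0, 2.0, 0.5, 10.0],
    [1.0, 2.0, 1.5, 20.0],  # same (x, y), higher z: this one is kept
    [3.0, 4.0, 0.1, 5.0],
])
feats = bev_projection_features(points)
print(feats[(1.0, 2.0)])         # (1.5, 20.0)
print(grid_counts(10, 7, 3, 2))  # (4, 4)
```

The front view projection follows the same pattern with the (x, z) plane as key and the greatest y-axis depth as the selection criterion.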