Night scene oriented autonomous driving perception planning cooperation method and system
By using a multi-task perception model to evaluate nighttime near-infrared images and laser point cloud data from multiple dimensions, a candidate trajectory set is generated and optimized. This solves the problem of insufficient coordination between perception and planning modules in nighttime autonomous driving systems, thereby improving the safety and reliability of the system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANDONG UNIV
- Filing Date
- 2026-04-14
- Publication Date
- 2026-06-23
AI Technical Summary
Existing autonomous driving systems have technical limitations in their perception modules at night or in low-light conditions. The perception and planning modules lack effective coordination, leading to unstable decision-making and affecting safety and smoothness.
An autonomous driving multi-task perception model is adopted to perform multi-task perception on nighttime near-infrared images and laser point cloud data, generate candidate trajectory sets and conduct multi-dimensional evaluation, forming a complete decision-making link of perception, planning and control, and improve the robustness of the perception model by utilizing cross-modal attention fusion and gating mechanisms.
It achieves highly robust autonomous driving in complex nighttime scenarios, enhances vehicle safety, smoothness and reliability, and improves the closed-loop decision-making capability of perception data at the path planning level.
Abstract figure: CN122009247B_ABST
Description
TECHNICAL FIELD
[0001] The present application belongs to the technical field of automatic driving, and particularly relates to an automatic driving perception planning coordination method and system for night scenes.
BACKGROUND
[0002] The statements in this section merely provide background information related to the present application and do not necessarily constitute prior art.
[0003] Existing automatic driving systems face severe challenges in operating reliably at night or in low-light environments. The core reasons are the inherent technical limitations of the perception module at night and the lack of an effective coordination architecture between the perception and planning modules. At the perception level, multi-modal schemes that rely on visible-light cameras and lidar have fundamental defects at night. The image quality of visible-light cameras deteriorates severely under complex interference such as low illumination, glare, and reflection, which significantly degrades texture-dependent semantic perception. Lidar is not affected by lighting, but the sparsity of its point cloud limits the detection of small distant targets, and it cannot complete semantic understanding on its own. Existing multi-modal fusion algorithms are mostly designed for daytime data and struggle to balance the feature imbalance between sharply degraded night images and sparse point clouds. In addition, the conflict between the feature demands of different tasks in a multi-task network intensifies in night scenes, often causing serious negative transfer and making the overall performance of the model unstable and unreliable.
[0004] This degradation of perception ability is further amplified in existing architectures by the separation of the perception and planning modules. Current systems usually pass the results of each perception task to the planning module as independent, discrete data structures. Without a unified environment model representation, the planning layer has difficulty accurately understanding and using the inherent uncertainty in the perception results. Because it cannot quantify and evaluate this perception uncertainty, the planning algorithm may make overly aggressive or overly conservative decisions at night, which degrades system-level safety and smoothness.
SUMMARY
[0005] To solve the above technical problems, the present application provides an automatic driving perception planning coordination method and system for night scenes, which can realize closed-loop decision-making based on uncertainty evaluation, ultimately enhancing the overall safety, smoothness, and reliability of automatic driving vehicles in complex night scenes.
[0006] To achieve the above-mentioned purpose, the present application adopts the following technical solutions:
[0007] The first aspect of the present application provides an automatic driving perception planning coordination method for night scenes.
[0008] In one or more embodiments, an autonomous driving perception planning cooperative method for nighttime scenarios is provided, including:
[0009] Acquire nighttime near-infrared images and laser point cloud data, use the trained autonomous driving multi-task perception model to perform multi-task perception, and uniformly transform the multi-task perception results to the vehicle coordinate system.
[0010] Based on the vehicle's current motion state and navigation target path, multiple candidate trajectories are generated and sampled to form a candidate trajectory set.
[0011] Safety, smoothness, and efficiency are evaluated for each candidate trajectory in the candidate trajectory set. The evaluation values are normalized and then weighted and summed to obtain the comprehensive value. The trajectory with the smallest comprehensive value is selected as the optimal trajectory.
[0012] The selected optimal trajectory is used as a vehicle control command and transmitted to the vehicle control actuator, forming a complete autonomous driving decision-making chain of perception, planning and control.
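The selection step in [0011] can be illustrated with a minimal sketch. The text does not state which normalization is used, so min-max normalization is an assumption here, and the function names, cost values, and weights are hypothetical.

```python
from typing import Dict, List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_optimal_trajectory(
    costs: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> int:
    """Return the index of the candidate with the smallest weighted cost.

    costs[name][i] is the raw cost of candidate i for one criterion
    (lower is better); weights[name] is that criterion's weight.
    """
    names = list(costs)
    n = len(costs[names[0]])
    total = [0.0] * n
    for name in names:
        # Normalize each criterion separately so that units do not matter.
        for i, c in enumerate(min_max_normalize(costs[name])):
            total[i] += weights[name] * c
    return min(range(n), key=total.__getitem__)


# Hypothetical raw costs for three candidate trajectories.
costs = {
    "safety": [0.9, 0.2, 0.4],
    "smoothness": [0.1, 0.3, 0.8],
    "efficiency": [5.0, 6.0, 4.0],
}
weights = {"safety": 0.6, "smoothness": 0.2, "efficiency": 0.2}
best = select_optimal_trajectory(costs, weights)
```

Normalizing each criterion before weighting keeps a criterion with large raw values, such as travel time, from dominating the sum.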
[0013] As one implementation method, the multi-task perception results include: target detection results, drivable area segmentation mask, lane line detection results, and material classification results; the target detection results are converted into obstacle information, the drivable area segmentation mask is converted into feasible domain boundary parameters, the lane line detection results are converted into parameterized lane centerlines, and the material classification results are converted into road surface friction coefficients.
[0014] In one implementation, the candidate trajectory set $\mathcal{T}$ is characterized as follows:
[0015] $\mathcal{T} = \{\tau_i \mid i = 1, 2, \dots, N\}$;
[0016] $\tau_i = \{p_i(s),\ v_i(s),\ \kappa_i(s),\ a_i(s)\},\quad s \in [0, S]$;
[0017] where $\tau_i$ is the $i$-th candidate trajectory and $N$ is the number of candidate trajectories; $p_i(s)$ is the position sequence of the $i$-th candidate trajectory, $v_i(s)$ is its velocity sequence, $\kappa_i(s)$ is its curvature sequence, and $a_i(s)$ is its acceleration sequence; $s$ is the arc length along the reference line, and $S$ is the planning horizon length.
[0018] As one implementation method, the process of training an autonomous driving multi-task perception model is as follows:
[0019] The original nighttime near-infrared image is enhanced and then stitched together with the original nighttime near-infrared image to form a variable light image group;
[0020] After spatially aligning the laser point cloud data with the variable light image group, point cloud features and image features are extracted.
[0021] Using point cloud features as queries and image features as keys and values, cross-modal attention fusion is performed on point cloud features and image features to obtain enhanced fusion features and extract shared features from them;
[0022] Extract the initial semantic features corresponding to the parallel decoding task, and fuse the shared features and each initial semantic feature through a gating mechanism to obtain the final fused features corresponding to the parallel decoding task;
[0023] Parallel decoding is performed on each final fused feature to obtain the corresponding decoding result. Then, the pre-built autonomous driving multi-task perception model is trained by combining the multi-task loss function.
[0024] In one implementation, a deep curve estimation network is used to enhance the original nighttime near-infrared image and generate multi-exposure images, which are then concatenated with the original nighttime near-infrared image along the channel dimension to form a variable-light image group; the multi-exposure images are generated as follows:
[0025] $I_k = F_k(I_0;\ \theta_k)$;
[0026] where $I_k$ is the $k$-th multi-exposure image; $F_k$ is the $k$-th exposure transformation function; $I_0$ is the original nighttime near-infrared image; and $\theta_k$ are parameters learned by the network that control brightness stretching and contrast balance under different exposures.
[0027] In one implementation, when the shared features and each initial semantic feature are fused through the gating mechanism, the gating mechanism generates a fusion weight $g_t$ for the shared features and each initial semantic feature:
[0028] $g_t = \sigma\left(W_g\,[F_s;\ F_{init}^{t}] + b_g\right)$;
[0029] where $g_t$ is the fusion weight; $W_g$ is the weight matrix of the gated linear transformation and $b_g$ is the bias term of the gating mechanism, both of which are learnable parameters; $F_s$ is the shared feature; $F_{init}^{t}$ is the initial semantic feature; $[\cdot\,;\cdot]$ denotes feature concatenation; and $\sigma$ is the Sigmoid function.
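[0030] In one implementation, the multi-task loss function is:
[0031] $L_{total} = \frac{1}{2\sigma_1^2} L_{det} + \frac{1}{2\sigma_2^2} L_{seg} + \frac{1}{2\sigma_3^2} L_{mat} + \log\sigma_1 + \log\sigma_2 + \log\sigma_3$;
[0032] where $L_{total}$ is the total loss function; $L_{det}$ is the target detection loss; $L_{seg}$ is the segmentation Dice loss; $L_{mat}$ is the cross-entropy loss for material classification; and $\sigma_1$, $\sigma_2$, $\sigma_3$ are the learnable task uncertainty parameters.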
[0033] A second aspect of the present invention provides an autonomous driving perception planning and coordination system for nighttime scenarios.
[0034] In one or more embodiments, an autonomous driving perception planning cooperative system for nighttime scenarios includes:
[0035] The multi-task perception and conversion module is used to acquire nighttime near-infrared images and laser point cloud data, use the trained autonomous driving multi-task perception model to perform multi-task perception, and uniformly convert the multi-task perception results to the vehicle coordinate system.
[0036] The candidate trajectory set formation module is used to generate multiple candidate trajectories based on the vehicle's current motion state and navigation target path, and then sample them to form a candidate trajectory set;
[0037] The optimal trajectory selection module is used to evaluate the safety, smoothness, and efficiency of each candidate trajectory in the candidate trajectory set. After normalizing the evaluation cost value, it performs a weighted sum to obtain the comprehensive cost value, and selects the trajectory with the smallest comprehensive cost value as the optimal trajectory.
[0038] The autonomous driving decision link formation module is used to take the selected optimal trajectory as a vehicle control command and transmit it to the vehicle control actuator, forming a complete autonomous driving decision link of perception, planning and control.
[0039] A third aspect of the present invention provides a computer-readable storage medium.
[0040] A computer-readable storage medium having a computer program stored thereon that, when executed by a processor, implements the steps in the autonomous driving perception planning cooperative method for nighttime scenarios as described above.
[0041] A fourth aspect of the present invention provides an electronic device.
[0042] An electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps in the autonomous driving perception planning and coordination method for nighttime scenarios described above.
[0043] Compared with the prior art, the beneficial effects of the present invention are:
[0044] This invention utilizes an autonomous driving multi-task perception model to perform multi-task perception on near-infrared images and laser point cloud data at night. Based on the vehicle's current motion state and navigation target path, multiple candidate trajectories are generated and sampled to form a candidate trajectory set. Finally, each candidate trajectory in the candidate trajectory set is evaluated in multiple dimensions, and the selected optimal trajectory is used as the vehicle control command to form a complete autonomous driving decision-making link of perception, planning, and control. This achieves highly robust autonomous driving perception in nighttime scenarios, and then transmits the perception data to the path planning layer to realize closed-loop decision-making based on uncertainty assessment at the system level. Ultimately, this enhances the overall safety, smoothness, and reliability of autonomous vehicles in complex nighttime scenarios.
BRIEF DESCRIPTION OF THE DRAWINGS
[0045] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.
[0046] Figure 1 is a flowchart of an autonomous driving perception planning and coordination method for nighttime scenarios according to an embodiment of the present invention;
[0047] Figure 2 is a flowchart of the variable-light image group generation process according to an embodiment of the present invention;
[0048] Figure 3 is a flowchart of the enhanced fusion feature generation process according to an embodiment of the present invention;
[0049] Figure 4 is a schematic diagram illustrating the training principle of the autonomous driving multi-task perception model according to an embodiment of the present invention;
[0050] Figure 5 is a flowchart of the perception planning and coordination process according to an embodiment of the present invention;
[0051] Figure 6 is a comparison of the overall performance of the present invention and the baseline model on the test set in an embodiment of the present invention;
[0052] Figure 7 is a schematic diagram of the structure of the autonomous driving perception planning and coordination system for nighttime scenarios according to an embodiment of the present invention;
[0053] Figure 8 is a schematic diagram of an electronic device according to an embodiment of the present invention.
DETAILED DESCRIPTION
[0054] The present invention will be further described below with reference to the accompanying drawings and embodiments.
[0055] It should be noted that the following detailed description is illustrative and intended to provide further explanation of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
[0056] It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of exemplary embodiments according to the invention. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. Furthermore, it should be understood that when the terms "comprising" and / or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and / or combinations thereof.
[0057] Figure 1 is a schematic flowchart of the autonomous driving perception planning and coordination method for nighttime scenarios according to an embodiment of the present invention. Referring to Figure 1 and Figure 5, the method in this embodiment may include the following steps S101 to S104.
[0058] The specific implementation process of steps S101 to S104 is as follows:
[0059] Step S101: Acquire nighttime near-infrared images and laser point cloud data, use the trained autonomous driving multi-task perception model to perform multi-task perception, and uniformly convert the multi-task perception results to the vehicle coordinate system.
[0060] In this embodiment, the multi-task perception results include: target detection results, drivable area segmentation mask, lane line detection results, and material classification results. The target detection results are converted into obstacle information, the drivable area segmentation mask is converted into feasible domain boundary parameters, the lane line detection results are converted into parameterized lane centerlines, and the material classification results are converted into road surface friction coefficients. This provides an environmental constraint basis for subsequent trajectory planning.
[0061] The position coordinates of each obstacle are obtained by transforming the center point of its detection box output by the autonomous driving multi-task perception model. The transformation formula is as follows:
[0062] $\begin{bmatrix} P_i^{veh} \\ 1 \end{bmatrix} = T_{cam}^{veh} \begin{bmatrix} d_i\, K^{-1}\, [u_i,\ v_i,\ 1]^{\top} \\ 1 \end{bmatrix}$;
[0063] where $i$ is the obstacle index; $P_i^{veh}$ is the three-dimensional position of the obstacle in the vehicle coordinate system; $T_{cam}^{veh}$ is the transformation matrix from the infrared camera coordinate system to the vehicle coordinate system; $K$ is the intrinsic parameter matrix of the infrared camera; $(u_i, v_i)$ are the pixel coordinates of the detection box center in the image; and $d_i$ is the distance information obtained from the point cloud or from depth estimation.
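The conversion from a detection box center to a vehicle-frame position can be sketched as follows. This is a minimal illustration that assumes a pinhole camera without skew or lens distortion and treats the distance as depth along the optical axis; the function name, intrinsic values, and mounting transform are hypothetical.

```python
from typing import Sequence, Tuple


def pixel_to_vehicle(
    u: float, v: float, depth: float,
    fx: float, fy: float, cx: float, cy: float,
    T_cam_to_veh: Sequence[Sequence[float]],
) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) at the given depth into the vehicle frame.

    depth is measured along the camera optical axis (z), in metres.
    T_cam_to_veh is a 4x4 homogeneous transform given as nested lists.
    """
    # For a skew-free pinhole model this equals depth * K^-1 * [u, v, 1]^T.
    p_cam = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth, 1.0]
    # Apply the first three rows of the homogeneous transform.
    p_veh = [sum(T_cam_to_veh[r][c] * p_cam[c] for c in range(4)) for r in range(3)]
    return (p_veh[0], p_veh[1], p_veh[2])


# Hypothetical mounting: camera z (forward) -> vehicle x, camera x (right) ->
# vehicle -y, camera y (down) -> vehicle -z, mounted 1.5 m ahead and 1.2 m up.
T = [
    [0.0, 0.0, 1.0, 1.5],
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 1.2],
    [0.0, 0.0, 0.0, 1.0],
]
center = pixel_to_vehicle(640.0, 360.0, 10.0, 1000.0, 1000.0, 640.0, 360.0, T)
right = pixel_to_vehicle(740.0, 360.0, 10.0, 1000.0, 1000.0, 640.0, 360.0, T)
```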
[0064] The feasible region boundary parameters are expressed as follows:
[0065] $\mathcal{B} = \{\, b_j = (x_j, y_j) \mid b_j \in \partial M,\ j = 1, 2, \dots, N_b \,\}$;
[0066] where $\mathcal{B}$ is the set of boundary points of the drivable area; $b_j$ is the two-dimensional coordinate of the $j$-th boundary point in the vehicle coordinate system; $M$ is the segmentation mask of the drivable area and $\partial M$ is its boundary; and $N_b$ is the total number of boundary points.
[0067] The lane centerline is represented as: $C = \{\, (x(s), y(s)) \mid x(s) = \sum_{k} a_k s^{k},\ y(s) = \sum_{k} b_k s^{k},\ s \in [0, S] \,\}$;
[0068] where $C$ is the lane centerline; $(x(s), y(s))$ are the coordinates of the centerline in the vehicle coordinate system at arc length $s$; $s$ is the arc length along the reference line; $S$ is the planning horizon length; and $a_k$ and $b_k$ are the lane line polynomial coefficients.
[0069] The road surface friction coefficient is: $\mu(x, y) = \sum_{k=1}^{K} p_k(x, y)\, \mu_k$;
[0070] where $p_k(x, y)$ is the probability that position $(x, y)$ belongs to the $k$-th material type; $\mu_k$ is the reference standard friction coefficient of the $k$-th material type; and $K$ is the total number of material categories.
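As a minimal sketch of this conversion, the friction coefficient at one location can be computed as the probability-weighted average of per-material reference values. The material classes and reference coefficients below are illustrative placeholders, not values from this application.

```python
from typing import Sequence


def expected_friction(probs: Sequence[float], mu_ref: Sequence[float]) -> float:
    """Probability-weighted friction coefficient at one road location."""
    if len(probs) != len(mu_ref):
        raise ValueError("probs and mu_ref must have the same length")
    return sum(p * mu for p, mu in zip(probs, mu_ref))


# Hypothetical classes: dry asphalt, wet asphalt, ice (illustrative values only).
mu_ref = [0.8, 0.5, 0.1]
probs = [0.7, 0.2, 0.1]
mu = expected_friction(probs, mu_ref)
```

Using the expectation rather than the most likely class lets an uncertain classification lower the assumed friction gradually instead of switching abruptly.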
[0071] In the specific implementation process, the training of the autonomous driving multi-task perception model includes steps a to e.
[0072] Step a: Enhance the original nighttime near-infrared image and then stitch it together with the original nighttime near-infrared image to form a variable light image group.
[0073] In this embodiment, a deep curve estimation network is used to enhance the original nighttime near-infrared image and generate multi-exposure images, which are then concatenated with the original nighttime near-infrared image along the channel dimension to form a variable-light image group, as shown in Figure 2.
[0074] The output layer of the deep curve estimation network is configured to simultaneously generate five different sets of curve parameters $\theta_k$ ($k = 1, \dots, 5$), each corresponding to one exposure level and therefore to one multi-exposure image $I_k$. For the input original nighttime near-infrared image $I_0$, each set of curve parameters is applied through the following transformation to generate the multi-exposure images:
[0075] $I_k = F_k(I_0;\ \theta_k)$;
[0076] where $I_k$ is the $k$-th multi-exposure image; $F_k$ is the $k$-th exposure transformation function; $I_0$ is the original nighttime near-infrared image; and $\theta_k$ are parameters learned by the network that control brightness stretching and contrast balance under different exposures. Because all exposures are derived from a single captured frame, this method avoids the motion artifact problem of traditional multi-exposure HDR technology, and it optimizes the exposure parameters directly through end-to-end learning.
[0077] This embodiment utilizes a deep curve estimation network to enhance nighttime near-infrared images, generating a multi-exposure image group with different exposure levels. This transforms a single, low-quality image into a group of variable-light images rich in detail, thereby directly improving the information richness and quality of the network input and laying a reliable data foundation for subsequent perception.
[0078] The generated multi-exposure images are concatenated with the original image along the channel dimension to form the variable-light image group $I_{VL}$:
[0079] $I_{VL} = \mathrm{Concat}\left(I_0,\ \{I_k\}_{k=1}^{5}\right)$;
[0080] where $I_{VL}$ is the variable-light image group; Concat is the concatenation operation; and $\{I_k\}_{k=1}^{5}$ is the multi-exposure enhanced image sequence.
[0081] This operation transforms the dynamic lighting adaptation problem into a feature selection and weighted fusion task within the network. By stitching images with different exposure levels in the channel dimension, the network's bottom convolutional kernels can explicitly learn feature extractors for different brightness ranges, thereby ensuring that the trained model can adaptively handle inputs under various lighting conditions.
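The construction of the variable-light image group can be sketched as below. The exact exposure transformation is not given in the text, so this example assumes a quadratic enhancement curve of the form x + alpha * x * (1 - x), as used in curve-estimation methods such as Zero-DCE, with one scalar alpha per exposure. In the described method the curve parameters are predicted by the network, and the alpha values here are hypothetical.

```python
from typing import List

Image = List[List[float]]  # H x W, intensities in [0, 1]


def apply_curve(image: Image, alpha: float, iterations: int = 1) -> Image:
    """Apply the quadratic curve x + alpha * x * (1 - x) to every pixel."""
    out = [row[:] for row in image]
    for _ in range(iterations):
        out = [[x + alpha * x * (1.0 - x) for x in row] for row in out]
    return out


def build_variable_light_group(image: Image, alphas: List[float]) -> List[Image]:
    """Stack the original image and one enhanced image per alpha as channels."""
    return [image] + [apply_curve(image, a) for a in alphas]


raw = [[0.1, 0.5], [0.2, 0.05]]
# Hypothetical curve parameters; negative values darken, positive values brighten.
alphas = [-0.4, -0.2, 0.2, 0.4, 0.8]
group = build_variable_light_group(raw, alphas)  # 6 channels: original + 5 exposures
```

For alpha in [-1, 1] the curve maps [0, 1] onto itself, so the enhanced channels stay in the valid intensity range without clipping.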
[0082] Step b: After spatially aligning the laser point cloud data with the variable light image group, extract the point cloud features and image features.
[0083] To fuse point cloud features and image features accurately, rigorous joint sensor calibration is necessary to obtain the extrinsic parameter matrix $[R \mid t]$, which comprises a rotation matrix $R$ and a translation vector $t$. After joint calibration of the sensors, any three-dimensional lidar point $P = (X, Y, Z)^{\top}$ can be projected onto the image pixel coordinate system as follows, achieving initial spatial alignment with the visual data:
[0084] $z_c\, [u,\ v,\ 1]^{\top} = K\, [R \mid t]\, [X,\ Y,\ Z,\ 1]^{\top}$;
[0085] where $[u, v, 1]^{\top}$ are the homogeneous coordinates of the image pixel and $z_c$ is the depth of the point in the camera coordinate system; $K$ is the intrinsic parameter matrix of the infrared camera, which includes parameters such as the focal length and principal point coordinates; $R$ is the rotation matrix and $t$ is the translation vector, both of which are components of the extrinsic parameter matrix $[R \mid t]$; and $P = (X, Y, Z)^{\top}$ is any three-dimensional lidar point.
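The projection of a lidar point onto the image plane can be sketched as follows, assuming a skew-free pinhole model without lens distortion. The function name and the calibration values in the example are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def project_lidar_point(
    point: Sequence[float],
    R: Sequence[Sequence[float]],
    t: Sequence[float],
    fx: float, fy: float, cx: float, cy: float,
) -> Optional[Tuple[float, float]]:
    """Project a 3D lidar point to pixel coordinates (u, v).

    R (3x3) and t (3,) map lidar coordinates into the camera frame.
    Returns None for points at or behind the camera plane.
    """
    # Extrinsic step: p_cam = R * point + t
    p_cam = [sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3)]
    z = p_cam[2]
    if z <= 0.0:
        return None
    # Intrinsic step: perspective division followed by focal scaling and offset.
    return (fx * p_cam[0] / z + cx, fy * p_cam[1] / z + cy)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_lidar_point([1.0, 0.5, 10.0], identity, [0.0, 0.0, 0.0],
                            1000.0, 1000.0, 640.0, 360.0)
behind = project_lidar_point([1.0, 0.5, -2.0], identity, [0.0, 0.0, 0.0],
                             1000.0, 1000.0, 640.0, 360.0)
```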
[0086] After spatial alignment of the laser point cloud data with the variable light image group, a projected point cloud density map is formed. A shared convolutional encoder is used to extract high-level semantic features from the multi-exposure infrared image and the projected point cloud density map, respectively, to obtain image features and point cloud features.
[0087] After spatial alignment, to achieve deep fusion with the image features, the sparse and irregular 3D point cloud needs to be converted into a 2D representation that matches the shape of the image features. To this end, a point cloud density map $D \in \mathbb{R}^{H \times W}$ is generated using a two-dimensional Gaussian kernel density estimation method with adaptive bandwidth, where $H$ and $W$ are the height and width of the point cloud density map, respectively. Specifically, the contribution of each laser point $p_i$ projected onto the image plane to any position $(u, v)$ in the density map is calculated using the two-dimensional Gaussian distribution function:
[0088] $D(u, v) = \sum_{i} G\left((u, v);\ p_i,\ \Sigma_i\right)$;
[0089] where $D$ is the point cloud density map; $G$ is the two-dimensional Gaussian distribution function; $(u, v)$ is any pixel coordinate on the density map; $p_i$ is the coordinate of the $i$-th lidar point projected onto the image; and $\Sigma_i$ is the covariance matrix of the $i$-th point. This preprocessing step ensures that the training data of different modalities are accurately aligned in space and time before being input into the model, providing a reliable data foundation for subsequent attention-based cross-modal fusion training and enabling the model to effectively learn the correlation between geometric and texture information during training.
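A minimal sketch of the density map computation is given below. For brevity it assumes an isotropic Gaussian with one fixed bandwidth `sigma` for all points, whereas the text describes an adaptive bandwidth with a per-point covariance matrix; the function name and sizes are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def density_map(
    points: Sequence[Tuple[float, float]],
    height: int,
    width: int,
    sigma: float = 1.0,
) -> List[List[float]]:
    """Sum an isotropic 2D Gaussian centred on each projected point (u, v)."""
    d = [[0.0] * width for _ in range(height)]
    norm = 1.0 / (2.0 * math.pi * sigma * sigma)
    for pu, pv in points:
        for row in range(height):
            for col in range(width):
                # Squared pixel distance to the projected point (u = col, v = row).
                r2 = (col - pu) ** 2 + (row - pv) ** 2
                d[row][col] += norm * math.exp(-r2 / (2.0 * sigma * sigma))
    return d


# One projected lidar point at u = 2, v = 1 on a 3 x 5 map.
dmap = density_map([(2.0, 1.0)], height=3, width=5)
```

The nested loops favour readability; a practical implementation would vectorize the computation and restrict each point to a local window.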
[0090] A shared convolutional encoder is used to extract high-level semantic features from both the multi-exposure infrared images and the projected point cloud density map, yielding image features $F_{img} \in \mathbb{R}^{C \times H \times W}$ and point cloud features $F_{pc} \in \mathbb{R}^{D_p \times H \times W}$. The feature dimension $D_p$ of the lidar point cloud features is the length of the feature vector extracted from the projected point cloud density map at each spatial location, and $C$ is the number of channels of the image features. The height and width of the image features and the point cloud features correspond to the height $H$ and width $W$ of the point cloud density map, respectively.
[0091] Step c: Using point cloud features as the query and image features as the key and value, perform cross-modal attention fusion on the point cloud features and image features to obtain enhanced fusion features and extract shared features from them.
[0092] As shown in Figure 3, a cross-modal feature alignment module based on a query-key-value attention model is used for deep fusion. With the point cloud features $F_{pc}$ serving as the query and the image features $F_{img}$ serving as both keys and values, a cross-modal attention weight map $A$ is obtained by calculating the similarity between the point cloud features (query) and the image features (key):
[0093] $A = \mathrm{Softmax}\left(\dfrac{(F_{pc} W_Q)(F_{img} W_K)^{\top}}{\sqrt{d_k}}\right)$;
[0094] where $A$ is the cross-modal attention weight map; Softmax is the normalized exponential function; $W_Q$ and $W_K$ are learnable linear transformation weight matrices; $F_{pc}$ is the point cloud feature; $F_{img}$ is the image feature; and $d_k$ is the dimension of the key vectors, used to scale the dot product result. Spatially, the attention weight map $A$ highlights regions that have both reliable laser point cloud projection and significant semantic features in the infrared images, thereby effectively suppressing the interference of image background noise without point cloud support on the perception task.
[0095] The image features (values) are weighted and fused using the attention weights and then added to the original point cloud features to obtain the enhanced fusion features $F_{enh}$:
[0096] $F_{enh} = A\,(F_{img} W_V) + F_{pc}$;
[0097] where $F_{enh}$ is the enhanced fusion feature and $W_V$ is another linear transformation matrix.
[0098] This embodiment uses a query-key attention mechanism-based cross-modal feature alignment method. Point cloud features are used as queries, and image features are used as keys and values. Attention weight maps are dynamically calculated to achieve adaptive weighted fusion of image features and 3D depth information at the feature level. This method can automatically focus on and enhance the features of target regions that have both reliable geometric structure and clear texture, suppress interference caused by shadows, reflections, etc., and improve the robustness of fused features.
[0099] This mechanism uses the attention weight map to weight the image features, reducing the contribution of image features whose spatial locations do not match the laser point cloud, so that the network adaptively focuses on target region features that have both rich texture information and reliable geometric structure.
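The cross-modal attention fusion described above can be sketched in plain Python as follows. This is a single-head illustration on flattened features (one row per spatial location), with hypothetical projection matrices; it assumes the value projection has the same width as the point cloud features so that the residual addition is well defined.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_modal_attention(
    f_pc: Matrix, f_img: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix
) -> Tuple[Matrix, Matrix]:
    """Point cloud features query the image features (keys and values)."""
    q = matmul(f_pc, w_q)
    k = matmul(f_img, w_k)
    v = matmul(f_img, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    # Scaled dot-product similarity, normalized over image locations.
    attn = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    fused = matmul(attn, v)
    # Residual connection: add the original point cloud features back.
    enhanced = [[a + b for a, b in zip(fr, pr)] for fr, pr in zip(fused, f_pc)]
    return attn, enhanced


eye = [[1.0, 0.0], [0.0, 1.0]]
f_pc = [[1.0, 0.0], [0.0, 1.0]]
f_img = [[1.0, 0.0], [0.0, 1.0]]
attn, enhanced = cross_modal_attention(f_pc, f_img, eye, eye, eye)
```

In this toy example each point cloud location attends most strongly to the image location with the matching feature, which is the behaviour the attention weight map is meant to provide.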
[0100] In one or more embodiments, a shared encoder is employed to extract shared features from the enhanced fusion features; the shared encoder includes a parallelized patch-aware attention (PPA) module and an efficient channel attention (ECA) module.
[0101] The PPA module contains three parallel convolutional branches: the first branch uses a 3×3 standard convolutional layer, the second branch uses a 5×5 standard convolutional layer, and the third branch uses a 3×3 dilated convolutional layer with a dilation rate of 2. The output feature maps of the three branches are concatenated along the channel dimension, and then a 1×1 convolutional layer is used for channel fusion and dimensionality reduction to finally output an enhanced feature map:
[0102] $F_{PPA} = \mathrm{Conv}_{1 \times 1}\left(\mathrm{Concat}_{i=1}^{n}\left[\, w_i \cdot \mathrm{Conv}_{k_i \times k_i}(X) \,\right]\right)$;
[0103] where $F_{PPA}$ is the output feature of the PPA module; $w_i$ is the adaptive learnable weight coefficient of the $i$-th branch; $\mathrm{Conv}_{k_i \times k_i}$ denotes a convolution operation with kernel size $k_i$; and $X$ is the input feature map, that is, the enhanced fusion feature. This design effectively preserves the edge and texture information of small targets such as distant vehicles and pedestrians. In this embodiment, the number of convolutional branches is $n = 3$.
[0104] The ECA module contains a shared convolutional encoder and three independent linear transformation matrices, and includes a Softmax function and a weighted summation unit. It performs optimization at the channel level, avoids the dimensionality reduction operations of traditional SE attention, and captures cross-channel interactions directly through one-dimensional convolution:
[0105] $\omega = \sigma\left(\mathrm{Conv1D}_k\left(\mathrm{GAP}(F_{PPA})\right)\right)$;
[0106] $k = \left|\dfrac{\log_2 C}{\gamma} + \dfrac{b}{\gamma}\right|_{odd}$;
[0107] where $\sigma$ is the Sigmoid activation function; $\mathrm{Conv1D}_k$ is a one-dimensional convolution; GAP is global average pooling; $k$ is the adaptively selected convolution kernel size, determined by the number of channels $C$ of the input feature map; $|\cdot|_{odd}$ denotes the nearest odd number; $\gamma$ is a scaling factor; $b$ is the offset; and $\omega_c$ is the attention weight of the $c$-th channel, $c = 1, \dots, C$.
[0108] The final output $\omega \odot F_{PPA}$ is taken as the shared feature $F_s$, which enhances the ability to focus on key channel features.
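The channel attention step can be sketched as follows. The kernel-size rule follows the convention of the public ECA-Net reference implementation, and the defaults gamma = 2 and b = 1 are the values commonly used there, not values stated in this application. The one-dimensional kernel is passed in explicitly because in practice it is learned.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[float]]  # one channel, H x W


def eca_kernel_size(channels: int, gamma: float = 2.0, b: float = 1.0) -> int:
    """Kernel size derived from the channel count, forced to be odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca_channel_weights(feature: Sequence[FeatureMap], kernel: Sequence[float]) -> List[float]:
    """Global average pooling, 1D convolution across channels, then Sigmoid."""
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad  # zero padding keeps C outputs
    weights = []
    for c in range(len(pooled)):
        z = sum(kernel[j] * padded[c + j] for j in range(len(kernel)))
        weights.append(1.0 / (1.0 + math.exp(-z)))
    return weights


def apply_channel_weights(feature: Sequence[FeatureMap], weights: Sequence[float]) -> List[FeatureMap]:
    """Rescale every channel by its attention weight."""
    return [[[w * x for x in row] for row in ch] for ch, w in zip(feature, weights)]


# Four 1 x 1 channels and a hypothetical kernel that passes the centre value through.
feature = [[[0.0]], [[1.0]], [[2.0]], [[3.0]]]
w = eca_channel_weights(feature, [0.0, 1.0, 0.0])
shared = apply_channel_weights(feature, w)
```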
[0109] Step d: Extract the initial semantic features corresponding to each parallel decoding task, and fuse the shared features $F_s$ with each initial semantic feature $F_{init}^{t}$ through a gating control module (GCA) to obtain the final fused features corresponding to each parallel decoding task.
[0110] It should be noted here that the initial semantic features $F_{init}^{t}$ specifically include the initial semantic features of the object detection decoder, the initial semantic features of the drivable area segmentation decoder, and the initial semantic features of the lane line detection decoder. These initial semantic features respectively carry the preliminary task semantics for object localization, road surface region segmentation, and lane line recognition.
[0111] The fusion weights are generated through a gating mechanism:
[0112] $g_t = \sigma\left(W_g\,[F_s;\ F_{init}^{t}] + b_g\right)$;
[0113] where $g_t$ is the fusion weight; $W_g$ is the weight matrix of the gated linear transformation and $b_g$ is the bias term of the gating mechanism, both of which are learnable parameters; $F_s$ is the shared feature; $F_{init}^{t}$ is the initial semantic feature; $[\cdot\,;\cdot]$ denotes feature concatenation; and $\sigma$ is the Sigmoid function.
[0114] The final fused features are:
[0115] $F_{final}^{t} = g_t \odot F_s + (1 - g_t) \odot F_{init}^{t}$;
[0116] where $F_{final}^{t}$ is the final fused feature.
[0117] This embodiment uses a gating mechanism to fuse shared features and initial semantic features to obtain the final fused features. It dynamically adjusts the fusion ratio between shared encoder features and initial semantic features, enabling the extraction of useful information from shared features according to its own needs while retaining task specificity. This achieves efficient collaborative optimization, alleviates performance conflicts between tasks, and finally realizes highly robust autonomous driving perception in nighttime scenarios.
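A minimal sketch of the gated fusion for a single feature vector is given below. The gate is computed from the concatenated shared and task features as described; combining the two features as a convex mixture, gate * shared + (1 - gate) * task, is an assumption of this sketch, and all parameter values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gated_fusion(
    f_shared: Sequence[float],
    f_task: Sequence[float],
    w_g: Sequence[Sequence[float]],
    b_g: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (gate, fused) for one location.

    w_g has one row per output element and 2 * len(f_shared) columns,
    matching the concatenated input [f_shared; f_task].
    """
    concat = list(f_shared) + list(f_task)
    gate = [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, concat)) + b)))
        for row, b in zip(w_g, b_g)
    ]
    # Assumed mixing rule: the gate sets the share taken from the shared feature.
    fused = [g * s + (1.0 - g) * t for g, s, t in zip(gate, f_shared, f_task)]
    return gate, fused


f_shared = [1.0, 3.0]
f_task = [3.0, 1.0]
zero_w = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
gate, fused = gated_fusion(f_shared, f_task, zero_w, [0.0, 0.0])
_, mostly_shared = gated_fusion(f_shared, f_task, zero_w, [50.0, 50.0])
```

With zero weights and bias the gate is 0.5 and the result is the average of the two features; a large positive bias drives the output toward the shared feature.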
[0118] Step e: Perform parallel decoding on each final fused feature to obtain the corresponding decoding result, and then train the pre-built autonomous driving multi-task perception model in combination with the multi-task loss function, as shown in Figure 4.
[0119] To address the different feature scale requirements of the various tasks, the decoder introduces a feature selection weighting mechanism that optimizes the feature sources according to task needs. For each task $t$, the optimal output features are generated through scale-adaptive weight calculation:
[0120] $F_{final}^{t} = \sum_{l \in \{3, 4, 5\}} w_{t,l} \cdot \mathrm{Up}(P_l)$;
[0121] where $F_{final}^{t}$ is the final fused feature of task $t$, that is, the optimized input feature of the decoder corresponding to task $t$; $P_3$, $P_4$, and $P_5$ are the feature maps of three different levels in the feature pyramid; $w_{t,l}$ is the learnable weight of task $t$ at scale $l$, normalized using Softmax; Up is the upsampling operation; and $P_l$ is the feature map corresponding to scale $l$. This design makes lane line detection rely more on high-resolution features (P3), while drivable area segmentation prioritizes deep features rich in semantic information (P5).
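The scale-adaptive weighting can be sketched for single-channel feature maps as follows. The sketch assumes that P4 and P5 are upsampled by factors of 2 and 4 to the resolution of P3 using nearest-neighbour interpolation; these factors, the interpolation method, and the logits are illustrative assumptions.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalize per-scale logits into weights that sum to one."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Nearest-neighbour upsampling by an integer factor."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[r // factor][c // factor] for c in range(w * factor)]
            for r in range(h * factor)]


def fuse_scales(p3: FeatureMap, p4: FeatureMap, p5: FeatureMap,
                logits: Sequence[float]) -> FeatureMap:
    """Weighted sum of three pyramid levels at the resolution of p3."""
    weights = softmax(logits)
    maps = [p3, upsample_nearest(p4, 2), upsample_nearest(p5, 4)]
    h, w = len(p3), len(p3[0])
    return [[sum(wt * m[r][c] for wt, m in zip(weights, maps)) for c in range(w)]
            for r in range(h)]


p3 = [[1.0] * 4 for _ in range(4)]
p4 = [[2.0] * 2 for _ in range(2)]
p5 = [[4.0]]
uniform = fuse_scales(p3, p4, p5, [0.0, 0.0, 0.0])
# Hypothetical logits for a task that favours the high-resolution level.
lane_like = fuse_scales(p3, p4, p5, [5.0, 0.0, 0.0])
```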
[0122] To improve the semantic understanding and reliability of perception models in nighttime scenes, this invention integrates a material perception branch into the training process. The core innovation of this branch lies in enabling the model to autonomously learn the complex mapping relationship between near-infrared spectral reflectance characteristics and object surface material properties through the training process, thereby embedding high-precision material recognition capabilities into the trained model.
[0123] During the training phase, the material-aware branch is designed as an optimizable material classifier. The input to the material-aware branch includes not only the fused high-level semantic features $F_{high}$ but also low-level features $F_{low}$, which contain more detailed textures and are drawn specifically from the early layers of the backbone network. The branch outputs the material probability distribution:
[0124] $p_m = \mathrm{Softmax}\left(W_m \cdot \mathrm{MLP}([F_{high}; F_{low}]) + b_m\right)$;
[0125] where $p_m$ is the material probability distribution; $W_m$ and $b_m$ are the classifier parameters; and $\mathrm{MLP}(\cdot)$ is a multi-layer perceptron. This branch can distinguish materials such as metal, fabric, and glass, and, combined with the target detection results, it can effectively reduce the false alarm rate for non-threatening heat sources such as manhole covers and billboards.
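A minimal sketch of such a material classification head is given below. It assumes that the high-level and low-level features are concatenated, passed through one hidden layer with ReLU activation, and mapped to class probabilities with Softmax; the layer sizes, the activation, and all names are illustrative assumptions.

```python
import math


def linear(x, weight, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def material_head(high, low, w1, b1, w2, b2):
    """Return a material probability distribution.

    high, low: high-level semantic and low-level texture feature vectors.
    w1, b1: hidden layer parameters (assumed single hidden layer).
    w2, b2: classifier parameters, one row per material class.
    """
    x = high + low  # concatenate the two feature sources
    hidden = relu(linear(x, w1, b1))
    return softmax(linear(hidden, w2, b2))
```

The index of the largest probability can then be mapped to a material label such as metal, fabric, or glass.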
[0126] The multi-task loss function integrates the objectives of the individual tasks and adjusts their weights automatically based on homoscedastic uncertainty:
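[0127] $L_{total} = \frac{1}{2\sigma_1^2} L_{det} + \frac{1}{2\sigma_2^2} L_{seg} + \frac{1}{2\sigma_3^2} L_{mat} + \log\sigma_1 + \log\sigma_2 + \log\sigma_3$;
[0128] where $L_{total}$ is the total loss function; $L_{det}$ is the target detection loss; $L_{seg}$ is the segmentation Dice loss; $L_{mat}$ is the material classification cross-entropy loss; and $\sigma_1$, $\sigma_2$, $\sigma_3$ are the learnable task uncertainty parameters. The specific formulas are as follows:
[0129] $L_{det} = \lambda_{box} L_{box} + \lambda_{obj} L_{obj} + \lambda_{cls} L_{cls}$;
[0130] $L_{seg} = 1 - \frac{2\sum_{i=1}^{N} p_i g_i + \epsilon}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i + \epsilon}$;
[0131] $L_{mat} = -\sum_{c=1}^{C} y_c \log(p_c)$;
[0132] where $\lambda_{box}$, $\lambda_{obj}$, $\lambda_{cls}$ are the manually set weighting coefficients for the bounding box regression loss, the target confidence loss, and the classification loss, respectively; $L_{box}$ is the bounding box regression loss; $L_{obj}$ is the confidence prediction loss; $L_{cls}$ is the category prediction loss; $p_i$ is the probability predicted by the model that the $i$-th pixel belongs to the drivable area; $g_i$ is the ground-truth drivable area label, which is 1 for drivable areas and 0 otherwise; $N$ is the total number of pixels in an image; $\epsilon$ is a smoothing coefficient that prevents the denominator from being zero; $C$ is the total number of material categories; $y_c$ is the one-hot encoded ground-truth material category label; and $p_c$ is the probability predicted by the model that the object belongs to category $c$, that is, the $c$-th component of the material probability distribution above.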
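The loss terms described above can be sketched as follows. The sketch assumes the commonly used homoscedastic-uncertainty weighting in which each task loss is scaled by $1/(2\sigma^2)$ and regularized by $\log\sigma$, parameterized here through $\log\sigma$ for numerical stability; this weighting form, the default smoothing value, and the function names are assumptions made for illustration.

```python
import math


def detection_loss(l_box, l_obj, l_cls, lam_box=1.0, lam_obj=1.0, lam_cls=1.0):
    """Weighted sum of box regression, confidence, and classification losses."""
    return lam_box * l_box + lam_obj * l_obj + lam_cls * l_cls


def dice_loss(pred, target, eps=1.0):
    """Dice loss over per-pixel drivable-area probabilities and 0/1 labels."""
    inter = sum(p * g for p, g in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def cross_entropy(probs, label):
    """Cross-entropy for a one-hot label given as a class index."""
    return -math.log(probs[label])


def uncertainty_weighted_total(losses, log_sigmas):
    """Assumed form: sum_i exp(-2*s_i)/2 * L_i + s_i with s_i = log(sigma_i)."""
    return sum(0.5 * math.exp(-2.0 * s) * l + s for l, s in zip(losses, log_sigmas))
```

During training, `log_sigmas` would be optimized together with the network weights, so that a task with a noisy loss is automatically down-weighted while the $\log\sigma$ term keeps the uncertainty from growing without bound.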
[0133] The aforementioned multi-task loss function enables the network to dynamically balance the learning intensity of each task during training, preventing any one task from dominating gradient updates. Finally, all perception results are integrated with the output and decision layer to generate a comprehensive environmental perception summary. This structured perception information is transmitted in real-time to the decision-making and planning module of the autonomous driving system for vehicle control or to provide warnings to the driver through a human-machine interface, thus completing the closed loop from raw data to driving action.
[0134] Step S102: Based on the vehicle's current motion state and navigation target path, generate multiple candidate trajectories and sample them to form a candidate trajectory set.
[0135] Multiple candidate trajectories are generated based on the vehicle's current motion state and the navigation target path. A fifth-order polynomial is used to generate the lateral offset curve of each candidate trajectory, which defines the shape of the vehicle's path:
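[0136] $l(s) = c_0 + c_1 s + c_2 s^2 + c_3 s^3 + c_4 s^4 + c_5 s^5,\quad s \in [0, S]$;
[0137] where $l(s)$ is the lateral offset; $s$ is the arc length along the reference line; $S$ is the planning horizon length; and the coefficients $c_0, \dots, c_5$ are determined by the boundary conditions.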
[0138] To cover different driving strategies, the boundary conditions are sampled to generate multiple sets of different polynomial coefficients, thereby planning multiple candidate trajectories. All these trajectories constitute a candidate trajectory set, providing diverse options for subsequent evaluation and decision-making.
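The sampling of boundary conditions can be illustrated with a short sketch. It assumes that each candidate starts from the current lateral state (offset and its first and second derivatives) and ends at a sampled lateral offset with zero first and second derivatives; the closed-form coefficients follow from these six boundary conditions. The function names and the choice to sample only the end offset are illustrative assumptions.

```python
def quintic_coeffs(l0, dl0, ddl0, l1, dl1, ddl1, S):
    """Coefficients c0..c5 of l(s) matching offset, slope, and second
    derivative at s = 0 and s = S."""
    c0, c1, c2 = l0, dl0, ddl0 / 2.0
    c3 = (20.0 * (l1 - l0) - (8.0 * dl1 + 12.0 * dl0) * S
          - (3.0 * ddl0 - ddl1) * S ** 2) / (2.0 * S ** 3)
    c4 = (30.0 * (l0 - l1) + (14.0 * dl1 + 16.0 * dl0) * S
          + (3.0 * ddl0 - 2.0 * ddl1) * S ** 2) / (2.0 * S ** 4)
    c5 = (12.0 * (l1 - l0) - 6.0 * (dl1 + dl0) * S
          - (ddl0 - ddl1) * S ** 2) / (2.0 * S ** 5)
    return [c0, c1, c2, c3, c4, c5]


def lateral_offset(coeffs, s):
    """Evaluate the polynomial with Horner's scheme."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * s + c
    return value


def sample_candidates(l0, dl0, ddl0, end_offsets, S):
    """One quintic per sampled end offset (end slope and curvature set to 0)."""
    return [quintic_coeffs(l0, dl0, ddl0, l1, 0.0, 0.0, S) for l1 in end_offsets]
```

For example, `sample_candidates(0.0, 0.0, 0.0, [-1.5, 0.0, 1.5], 40.0)` would yield three candidates corresponding to a shift to the right, lane keeping, and a shift to the left under these assumptions.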
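[0139] The candidate trajectory set is characterized as $\mathcal{T}$:
[0140] $\mathcal{T} = \{\tau_1, \tau_2, \dots, \tau_K\}$;
[0141] $\tau_k = \{(x_k(s), y_k(s)),\ v_k(s),\ \kappa_k(s),\ a_k(s)\},\quad s \in [0, S]$;
[0142] where $\tau_k$ is the $k$-th candidate trajectory; $K$ is the number of candidate trajectories; $(x_k(s), y_k(s))$ is the position sequence of the $k$-th candidate trajectory; $v_k(s)$ is the velocity sequence; $\kappa_k(s)$ is the curvature sequence; $a_k(s)$ is the acceleration sequence; $s$ is the arc length along the reference line; and $S$ is the planning horizon length.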
[0143] Step S103: Perform safety evaluation, smoothness evaluation, and efficiency evaluation on each candidate trajectory in the candidate trajectory set. After normalizing the evaluation cost values, compute their weighted sum to obtain the comprehensive cost value, and select the trajectory with the smallest comprehensive cost value as the optimal trajectory.
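[0144] $J_{safe}^{(k)} = \int_0^S w_s(s)\,\exp\left(-\frac{d_k(s)}{d_{safe}}\right) ds$;
[0145] $J_{smooth}^{(k)} = \int_0^S \left( w_\kappa\,\kappa_k(s)^2 + w_{\dot\kappa}\,\dot\kappa_k(s)^2 + w_a\,a_k(s)^2 \right) ds$;
[0146] $J_{eff}^{(k)} = \int_0^S \left( v_{ref}(s) - v_k(s) \right)^2 ds + w_T\,T_k$;
[0147] where $J_{safe}^{(k)}$ is the safety cost, $J_{smooth}^{(k)}$ is the smoothness cost, and $J_{eff}^{(k)}$ is the efficiency cost; $w_s(s)$ is the safety weighting function; $d_k(s)$ is the distance from the $k$-th trajectory at arc length $s$ to the nearest obstacle; $d_{safe}$ is the safe distance parameter, which is increased in nighttime scenarios; $\kappa_k(s)$ is the curvature, $\dot\kappa_k(s)$ is the rate of change of curvature, and $a_k(s)$ is the acceleration; $w_\kappa$, $w_{\dot\kappa}$, $w_a$ are the corresponding weighting coefficients; $v_{ref}(s)$ is the desired speed curve; $v_k(s)$ is the speed curve of the $k$-th trajectory; $T_k$ is the estimated travel time; and $w_T$ is the time weighting coefficient.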
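The three evaluations can be sketched on trajectories discretized at a fixed arc-length step. The functional forms below (an exponential distance penalty for safety, squared curvature, curvature rate, and acceleration for smoothness, and squared speed deviation plus weighted travel time for efficiency) are assumptions chosen to match the quantities named above, and all function and parameter names are hypothetical.

```python
import math


def safety_cost(dists, d_safe, ds, weight=None):
    """Penalize proximity to the nearest obstacle; a larger d_safe
    (for example at night) yields a larger cost for the same distance."""
    if weight is None:
        weight = [1.0] * len(dists)
    return sum(w * math.exp(-d / d_safe) for w, d in zip(weight, dists)) * ds


def smoothness_cost(kappa, accel, ds, w_k=1.0, w_dk=1.0, w_a=1.0):
    """Penalize curvature, its rate of change (forward difference), and acceleration."""
    dkappa = [(kappa[i + 1] - kappa[i]) / ds for i in range(len(kappa) - 1)] + [0.0]
    return sum(w_k * k * k + w_dk * dk * dk + w_a * a * a
               for k, dk, a in zip(kappa, dkappa, accel)) * ds


def efficiency_cost(v, v_ref, ds, w_t=1.0):
    """Penalize deviation from the desired speed and the estimated travel time."""
    tracking = sum((vr - vi) ** 2 for vi, vr in zip(v, v_ref)) * ds
    travel_time = sum(ds / max(vi, 1e-3) for vi in v)  # guard against zero speed
    return tracking + w_t * travel_time
```

Raising `d_safe` for nighttime operation makes the planner more conservative without changing the rest of the evaluation.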
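[0148] $J^{(k)} = \alpha\,\hat{J}_{safe}^{(k)} + \beta\,\hat{J}_{smooth}^{(k)} + \gamma\,\hat{J}_{eff}^{(k)}$;
[0149] where $J^{(k)}$ is the comprehensive cost value of the $k$-th candidate trajectory; $\hat{J}_{safe}^{(k)}$, $\hat{J}_{smooth}^{(k)}$, and $\hat{J}_{eff}^{(k)}$ are the normalized safety, smoothness, and efficiency cost values, respectively; and the weight coefficients satisfy $\alpha + \beta + \gamma = 1$. The trajectory with the lowest comprehensive cost is selected as the optimal trajectory output.
[0150] Specifically, the system compares the comprehensive cost values of all trajectories in the candidate trajectory set and selects the candidate trajectory with the minimum comprehensive cost value as the initial optimal trajectory: $\tau^{*} = \arg\min_{\tau_k} J^{(k)}$, where $\tau_k$ ranges over the candidate trajectory set and $\arg\min$ is the operator that returns the argument minimizing the cost.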
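The selection step can be sketched as follows. The sketch assumes min-max normalization of each cost across the candidate set and weights that sum to 1; the normalization method, the default weights, and the function names are illustrative assumptions.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; all-equal inputs map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def select_optimal(safety, smooth, eff, weights=(0.5, 0.3, 0.2)):
    """Return (index of the best candidate, list of comprehensive costs).

    safety, smooth, eff: per-candidate raw cost values.
    weights: (alpha, beta, gamma), assumed to sum to 1.
    """
    n_safe, n_smooth, n_eff = (min_max_normalize(c) for c in (safety, smooth, eff))
    alpha, beta, gamma = weights
    total = [alpha * a + beta * b + gamma * c
             for a, b, c in zip(n_safe, n_smooth, n_eff)]
    best = min(range(len(total)), key=total.__getitem__)  # argmin over candidates
    return best, total
```

Normalizing before weighting keeps a cost with a large numeric range, such as travel time, from dominating the weighted sum.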
[0151] Step S104: The selected optimal trajectory is used as a vehicle control command and transmitted to the vehicle control actuator to form a complete autonomous driving decision-making chain of perception, planning and control.
[0152] To verify the effectiveness of this invention, systematic experiments were conducted on nighttime autonomous driving datasets such as exDARK and CULane-Night. The evaluation metrics were mean average precision (mAP) for object detection, intersection-over-union (IoU) for drivable area segmentation, F1 score for lane line recognition, and accuracy for material classification. To ensure a fair comparison, all experiments were conducted under identical conditions. Figure 6 shows the overall performance of the present invention and the baseline model on the test set. The line graph uses method category as the x-axis and performance value as the y-axis; the two lines represent the performance of the visible light baseline method and of the present invention, respectively. Since the visible light baseline method cannot handle the material classification task, its accuracy for this task is null. As can be seen from Figure 6, the method of the present invention outperforms the visible light baseline method in all perception tasks, demonstrating the effectiveness and feasibility of the method.
[0153] As shown in Figure 7, the autonomous driving perception planning and coordination system for nighttime scenarios provided in this embodiment of the invention can be implemented in software. The system includes the following software modules:
[0154] The multi-task perception and conversion module 701 is used to acquire nighttime near-infrared images and laser point cloud data, perform multi-task perception using the trained autonomous driving multi-task perception model, and uniformly convert the multi-task perception results to the vehicle coordinate system.
[0155] The candidate trajectory set forming module 702 is used to generate multiple candidate trajectories based on the current motion state of the vehicle and the navigation target path, and to sample them to form a candidate trajectory set;
[0156] The optimal trajectory selection module 703 is used to perform safety evaluation, smoothness evaluation and efficiency evaluation on each candidate trajectory in the candidate trajectory set. After normalizing the evaluation cost value, it performs a weighted sum to obtain the comprehensive cost value, and selects the trajectory with the smallest comprehensive cost value as the optimal trajectory.
[0157] The autonomous driving decision link formation module 704 is used to take the selected optimal trajectory as a vehicle control command and transmit it to the vehicle control actuator to form a complete autonomous driving decision link of perception, planning and control.
[0158] It should be noted that each module in the autonomous driving perception planning and coordination system for nighttime scenarios in this embodiment corresponds one-to-one with each step in the autonomous driving perception planning and coordination method for nighttime scenarios in the above embodiments, and their specific implementation processes are the same, so they will not be repeated here.
[0159] The structure of the electronic device according to an embodiment of the present invention is described in detail below. Figure 8 is a schematic diagram of the composition structure of an electronic device provided in an embodiment of the present invention. It can be understood that Figure 8 shows only an exemplary structure of the electronic device, not the entire structure, and that some or all of the structures shown may be implemented as needed.
[0160] The electronic device provided in this embodiment of the invention includes: at least one processor 801, a memory 802, a user interface 803, and at least one network interface 804. The various components in the autonomous driving perception planning and coordination system for nighttime scenarios are coupled together via a bus system 805. It can be understood that the bus system 805 is used to realize the connection and communication between these components. In addition to a data bus, the bus system 805 also includes a power bus, a control bus, and a status signal bus. However, for clarity, all buses are labeled as bus system 805 in Figure 8.
[0161] The user interface 803 may include a monitor, keyboard, mouse, trackball, click wheel, buttons, touchpad, or touch screen.
[0162] It is understood that memory 802 can be volatile memory or non-volatile memory, or both. In this embodiment of the invention, memory 802 is capable of storing data to support the operation of the terminal. Examples of this data include any computer programs used to operate on the terminal, such as operating systems and applications. The operating system includes various system programs, such as framework layers, core library layers, driver layers, etc., used to implement various basic services and handle hardware-based tasks. Applications can include various applications.
[0163] In some embodiments, the autonomous driving perception planning and coordination system for nighttime scenarios provided by this invention can be implemented using a combination of hardware and software. As an example, the autonomous driving perception planning and coordination system for nighttime scenarios provided by this invention can be a processor in the form of a hardware decoding processor, programmed to execute the autonomous driving perception planning and coordination method for nighttime scenarios provided by this invention. For example, the processor in the form of a hardware decoding processor can employ one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.
[0164] As an example, processor 801 can be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., wherein the general-purpose processor can be a microprocessor or any conventional processor, etc.
[0165] As an example of the hardware implementation of the autonomous driving perception planning and coordination system for nighttime scenarios provided in this embodiment of the invention, the device provided in this embodiment of the invention can be directly executed by a processor 801 in the form of a hardware decoding processor. For example, it can be executed by one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components to implement the autonomous driving perception planning and coordination method for nighttime scenarios provided in this embodiment of the invention.
[0166] The memory 802 in this embodiment of the invention is used to store various types of data to support the operation of the autonomous driving perception planning and coordination system for nighttime scenarios, or to store the program code for executing the method shown in Figure 1. Examples of this data include any executable instructions for operating on the autonomous driving perception planning and coordination system for nighttime scenarios; the program implementing the autonomous driving perception planning and coordination method for nighttime scenarios according to embodiments of the present invention may be included in these executable instructions.
[0167] Specifically, according to embodiments of this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program including program code for executing the method shown in Figure 1. In such an embodiment, the computer program can be downloaded and installed from a network via a communication component, and / or installed from a removable medium. When the computer program is executed by the central processing unit, it performs the various functions defined in the apparatus of this application.
[0168] This invention is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and / or block diagrams, as well as combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more processes of the flowchart of Figure 1 and / or one or more blocks of the block diagram.
[0169] The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A collaborative perception and planning method for autonomous driving in nighttime scenarios, characterized in that it comprises: acquiring nighttime near-infrared images and laser point cloud data, performing multi-task perception using a trained autonomous driving multi-task perception model, and uniformly transforming the multi-task perception results into the vehicle coordinate system; generating multiple candidate trajectories based on the vehicle's current motion state and navigation target path, and sampling them to form a candidate trajectory set; performing safety evaluation, smoothness evaluation, and efficiency evaluation on each candidate trajectory in the candidate trajectory set, normalizing the evaluation cost values and computing their weighted sum to obtain a comprehensive cost value, and selecting the trajectory with the smallest comprehensive cost value as the optimal trajectory; using the selected optimal trajectory as a vehicle control command and transmitting it to the vehicle control actuator, forming a complete autonomous driving decision-making chain of perception, planning, and control; the candidate trajectory set is characterized as $\mathcal{T}$: $\mathcal{T} = \{\tau_1, \tau_2, \dots, \tau_K\}$; $\tau_k = \{(x_k(s), y_k(s)),\ v_k(s),\ \kappa_k(s),\ a_k(s)\},\ s \in [0, S]$; where $\tau_k$ is the $k$-th candidate trajectory, $K$ is the number of candidate trajectories, $(x_k(s), y_k(s))$ is the position sequence of the $k$-th candidate trajectory, $v_k(s)$ is the velocity sequence, $\kappa_k(s)$ is the curvature sequence, $a_k(s)$ is the acceleration sequence, $s$ is the arc length along the reference line, and $S$ is the planning horizon length; the process of training the autonomous driving multi-task perception model is as follows: the original nighttime near-infrared image is enhanced and then concatenated with the original nighttime near-infrared image to form a variable-light image group; after spatially aligning the laser point cloud data with the variable-light image group, point cloud features and image features are extracted;
using the point cloud features as queries and the image features as keys and values, cross-modal attention fusion is performed on the point cloud features and image features to obtain enhanced fusion features, from which shared features are extracted; the initial semantic features corresponding to each parallel decoding task are extracted, and the shared features and each initial semantic feature are fused through a gating mechanism to obtain the final fused features corresponding to each parallel decoding task; parallel decoding is performed on each final fused feature to obtain the corresponding decoding result, and the pre-built autonomous driving multi-task perception model is then trained with the multi-task loss function; the multi-task loss function is: $L_{total} = \frac{1}{2\sigma_1^2} L_{det} + \frac{1}{2\sigma_2^2} L_{seg} + \frac{1}{2\sigma_3^2} L_{mat} + \log\sigma_1 + \log\sigma_2 + \log\sigma_3$; where $L_{total}$ is the total loss function; $L_{det}$ is the target detection loss; $L_{seg}$ is the segmentation Dice loss; $L_{mat}$ is the material classification cross-entropy loss; and $\sigma_1$, $\sigma_2$, $\sigma_3$ are the learnable task uncertainty parameters.
2. The autonomous driving perception planning and collaborative method for nighttime scenarios as described in claim 1, characterized in that the multi-task perception results include: target detection results, a drivable area segmentation mask, lane line detection results, and material classification results; the target detection results are converted into obstacle information, the drivable area segmentation mask is converted into feasible domain boundary parameters, the lane line detection results are converted into parameterized lane centerlines, and the material classification results are converted into road surface friction coefficients.
3. The autonomous driving perception planning and collaborative method for nighttime scenarios as described in claim 1, characterized in that a deep curve estimation network is used to enhance the original nighttime near-infrared image and generate multi-exposure images, which are then concatenated with the original nighttime near-infrared image along the channel dimension to form the variable-light image group. The multi-exposure images are generated as follows: $I_i = E_i(I_0;\ \theta_i),\ i = 1, \dots, n$; where $I_i$ is the $i$-th multi-exposure image; $E_i$ is the $i$-th exposure transformation function; $I_0$ is the original nighttime near-infrared image; and $\theta_i$ are parameters learned by the network that control brightness stretching and contrast balance under the different exposures.
4. The autonomous driving perception planning and collaborative method for nighttime scenarios as described in claim 1, characterized in that, in the process of fusing the shared features and the initial semantic features through the gating mechanism, the gating mechanism generates the fusion weights for the shared features and the initial semantic features: $g_t = \sigma\left(W_g\,[F_s;\ F_t^{init}] + b_g\right)$; $F_t^{final} = g_t \odot F_s + (1 - g_t) \odot F_t^{init}$; where $g_t$ is the fusion weight; $W_g$ is the weight matrix of the gated linear transformation and $b_g$ is the bias term of the gating mechanism, both of which are learnable parameters; $F_s$ is the shared feature; $F_t^{init}$ is the initial semantic feature; $F_t^{final}$ is the final fused feature; $[\cdot\,;\cdot]$ denotes feature concatenation; and $\sigma(\cdot)$ denotes the Sigmoid function.
5. An autonomous driving perception, planning, and collaborative system for nighttime scenarios, characterized in that, based on the autonomous driving perception planning cooperative method for nighttime scenarios of any one of claims 1-4, the system comprises: a multi-task perception and conversion module, used to acquire nighttime near-infrared images and laser point cloud data, perform multi-task perception using the trained autonomous driving multi-task perception model, and uniformly convert the multi-task perception results to the vehicle coordinate system; a candidate trajectory set formation module, used to generate multiple candidate trajectories based on the vehicle's current motion state and navigation target path, and to sample them to form a candidate trajectory set; an optimal trajectory selection module, used to perform safety evaluation, smoothness evaluation, and efficiency evaluation on each candidate trajectory in the candidate trajectory set, normalize the evaluation cost values, compute their weighted sum to obtain the comprehensive cost value, and select the trajectory with the smallest comprehensive cost value as the optimal trajectory; and an autonomous driving decision link formation module, used to take the selected optimal trajectory as a vehicle control command and transmit it to the vehicle control actuator, forming a complete autonomous driving decision link of perception, planning, and control.
6. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the steps in the autonomous driving perception planning cooperative method for nighttime scenarios as described in any one of claims 1-4.
7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps in the autonomous driving perception planning and coordination method for nighttime scenarios as described in any one of claims 1-4.