An intelligent driving behavior method based on machine learning
By employing a machine learning-based intelligent driving behavior approach, utilizing real-world road data and reinforcement learning algorithms, the system addresses the shortcomings of traditional autonomous driving systems in terms of flexibility and adaptability in complex environments, enabling efficient decision-making and response in urban environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- DALIAN UNIV OF TECH
- Filing Date
- 2025-03-14
- Publication Date
- 2026-06-19
AI Technical Summary
Traditional autonomous driving systems struggle to match the flexibility and adaptability of human drivers in dynamically changing real-world scenarios. In particular, in complex urban driving environments, AI systems often fail to effectively identify and handle various situations, and their algorithms lack accuracy and response speed.
We employ a machine learning-based intelligent driving behavior approach. By acquiring driving behavior data from real road scenarios, we perform image and point cloud data augmentation to construct feature vectors. Then, we utilize reinforcement learning algorithms to select and generate driving behaviors, including feature extraction, policy updates, and model optimization.
It improves the adaptability and reliability of intelligent driving systems in complex environments, optimizes resource allocation, and enhances the accuracy and response speed of driving tasks.
Figure CN120246009B_ABST
Description
Technical Field
[0001] This invention belongs to the field of information technology, specifically a machine learning-based intelligent driving behavior method.
Background Technology
[0002] Against the backdrop of the rapid development of the automotive industry, intelligent driving technology is receiving increasing attention. With the continuous advancement of artificial intelligence (AI) technology, particularly breakthroughs in computer vision, deep learning, and sensor technology, automakers, technology companies, and research institutions are racing to develop intelligent vehicles with autonomous driving capabilities. These technological advancements enable vehicles to make real-time decisions in complex driving environments, improving driving safety and efficiency. However, despite the promising future of intelligent driving technology, many challenges remain in its practical application.
[0003] The complexity of driving scenarios is one of the major challenges facing intelligent driving technology. Drivers need to pay attention to various environmental factors while driving, including road conditions, traffic signs, pedestrians, other vehicles, and unexpected situations. These factors pose a severe test to the accuracy and timeliness of driving decisions. While traditional autonomous driving systems have made some progress in perceiving and understanding their surroundings, they still struggle to match the flexibility and adaptability of human drivers in dynamically changing real-world scenarios. In complex urban driving environments, AI systems must be able to effectively identify and process multiple situations, which places even higher demands on the accuracy and response speed of their algorithms.
[0004] Therefore, this invention addresses the above problems by proposing a more general intelligent driving behavior learning method based on machine learning algorithms, aiming to optimize the environmental perception and decision-making capabilities of intelligent driving systems, thereby improving their adaptability and reliability in real driving scenarios.
[0005] The neural network used in this method follows Bacon, P.-L., Harb, J., & Precup, D. (2017). The Option-Critic Architecture. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://arxiv.org/abs/1609.05140.
Summary of the Invention
[0006] The purpose of this invention is to address the problem that traditional autonomous driving systems struggle to achieve the flexibility and adaptability of human drivers in dynamically changing real-world scenarios. In complex urban driving environments, AI systems must be able to effectively identify and handle various situations, placing higher demands on the accuracy and response speed of algorithms.
[0007] To address the aforementioned problems, this invention proposes a machine learning-based intelligent driving behavior method, comprising:
[0008] S1: Obtain basic data;
[0009] The basic data includes driving behavior data in real road scenarios, which contains 1,000 driving scenarios, each lasting 20 seconds, and each driving scenario includes sensor data and labeled data;
[0010] S2: Calculate the feature vector of the driving scenario based on the basic data;
[0011] S2-1: Load data;
[0012] Load image data, load point cloud data, load vehicle position;
[0013] S2-2: Image data augmentation;
[0014] Random cropping, color jitter;
[0015] S2-3: Point cloud data augmentation;
[0016] Random rotation, random translation;
[0017] S2-4: Constructing feature vectors;
[0018] Image feature extraction, point cloud feature extraction;
[0019] S2-5: Feature vector representation;
[0020] The image features, point cloud features, vehicle state, and surrounding object state are concatenated into the final feature vector s;
[0021] S3: Select driving scenarios based on feature vectors using reinforcement learning algorithms;
[0022] S4: Based on feature vectors, driving behavior in driving scenarios is generated using reinforcement learning algorithms;
[0023] S5: Update the policies to obtain the optimal model.
[0024] In the preferred embodiment, it includes:
[0025] S1: Obtain basic data;
[0026] The basic data includes driving behavior data in real road scenarios, which contains 1,000 driving scenarios, each lasting 20 seconds, and each driving scenario includes sensor data and labeled data;
[0027] Sensor data includes:
[0028] Cameras: Six cameras in total (front, rear, left, right, left front, right front), providing RGB images $I \in \mathbb{R}^{H \times W \times 3}$, where $\mathbb{R}^{H \times W \times 3}$ denotes the set of three-dimensional tensors of size $H \times W \times 3$, with height $H = 900$ and width $W = 1600$;
[0029] LiDAR: One 32-line LiDAR provides point cloud data $P \in \mathbb{R}^{N \times 3}$, where $N$ is the number of points and $\mathbb{R}^{N \times 3}$ denotes the set of tensors of size $N \times 3$. Each point is (x, y, z, intensity, ring), where x, y, and z are the coordinates of the point in three-dimensional space, intensity is the strength of the signal reflected back after the laser beam emitted by the LiDAR hits an object, and ring is the line number of the LiDAR;
[0030] Radar: Five radars (front, rear, left, right, and rear left) provide target detection data;
[0031] GPS and IMU: Provide the vehicle position $p = [p_x, p_y, p_z]$ and velocity $v = [v_x, v_y, v_z]$, that is, the components of the position $p$ and velocity $v$ along the x, y, and z axes of three-dimensional space;
[0032] The labeled data includes:
[0033] 3D bounding box: 3D annotation of two types of objects. Each object is represented as b=[x,y,z,w,l,h,θ], where x,y,z are the three-dimensional spatial coordinates of the center point of the 3D annotation, w represents the width of the bounding box, l represents the length of the bounding box, h represents the height of the bounding box, and θ represents the orientation angle.
[0034] Attributes: The motion state S of an object (stationary or moving) and its behavior (stopping or turning);
[0035] Trajectory: The trajectory of an object's motion $T = [p_1, p_2, \ldots, p_i]$, where $p_i = [x_i, y_i, z_i]$ represents the three-dimensional spatial coordinates of the object's position at time $i$;
[0036] Map information: High-precision map data, including lanes and traffic signs;
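The sensor and label fields listed in S1 can be pictured as one record per time step. The following is only a minimal sketch of such a container: the class name `SceneFrame`, its field names, and the choice of NumPy arrays are illustrative assumptions, not part of the described method.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SceneFrame:
    """Hypothetical container for one time step of a driving scenario."""

    images: np.ndarray    # (6, H, W, 3) RGB images, one per camera
    points: np.ndarray    # (N, 5) LiDAR points: x, y, z, intensity, ring
    position: np.ndarray  # (3,) vehicle position p = [p_x, p_y, p_z]
    velocity: np.ndarray  # (3,) vehicle velocity v = [v_x, v_y, v_z]
    boxes: np.ndarray     # (M, 7) 3D boxes b = [x, y, z, w, l, h, theta]

    def shapes_ok(self) -> bool:
        """Check that every field has the layout described in S1."""
        return (
            self.images.ndim == 4
            and self.images.shape[0] == 6
            and self.images.shape[3] == 3
            and self.points.ndim == 2
            and self.points.shape[1] == 5
            and self.position.shape == (3,)
            and self.velocity.shape == (3,)
            and self.boxes.ndim == 2
            and self.boxes.shape[1] == 7
        )
```

Radar detections, object attributes, trajectories, and map data are left out of the sketch to keep it short; they would be further fields of the same record.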
[0037] S2: Calculate the feature vector of the driving scenario based on the basic data;
[0038] S2-1: Load data;
[0039] Load image data: Obtain image $I$ from the camera, with shape $(H, W, 3)$;
[0040] Load point cloud data: Obtain point cloud data P from LiDAR, with shape (N, 5), and each point is (x, y, z, intensity, ring).
[0041] Load vehicle position: Vehicle position $p = [p_x, p_y, p_z]$ and velocity $v = [v_x, v_y, v_z]$;
[0042] S2-2: Image data augmentation;
[0043] Random cropping: randomly crop a region $I_{\mathrm{crop}}$ of size $(H_C, W_C)$ from image $I$. The formula is:
[0044] $I_{\mathrm{crop}}(x, y) = I(x + \Delta x,\; y + \Delta y)$
[0045] $\Delta x \sim u(0,\; W - W_C)$
[0046] $\Delta y \sim u(0,\; H - H_C)$
[0047] where $\Delta x$ and $\Delta y$ are the coordinate offsets of the upper-left corner of the cropping region, randomly sampled from a uniform distribution; $u$ denotes the uniform distribution, and $W_C$ and $H_C$ are the width and height of the cropped image;
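The cropping rule can be illustrated with a few lines of NumPy. This is a sketch under stated assumptions: the function name `random_crop` is hypothetical, and the offsets are drawn as integers (pixel indices) rather than from a continuous uniform distribution.

```python
import numpy as np


def random_crop(image, crop_h, crop_w, rng):
    """Crop a (crop_h, crop_w) window whose top-left corner is uniformly sampled.

    Returns the crop and the sampled offset (dx, dy).
    """
    h, w = image.shape[:2]
    if crop_h > h or crop_w > w:
        raise ValueError("crop size must not exceed image size")
    # Integer offsets in [0, W - W_C] and [0, H - H_C]; the upper bound of
    # rng.integers is exclusive, hence the +1.
    dx = int(rng.integers(0, w - crop_w + 1))
    dy = int(rng.integers(0, h - crop_h + 1))
    # I_crop(x, y) = I(x + dx, y + dy); rows index y, columns index x.
    return image[dy:dy + crop_h, dx:dx + crop_w], (dx, dy)
```

Returning the offset alongside the crop makes it possible to apply the same shift to any image-space labels.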
[0048] Color jittering: randomly adjust the brightness, contrast, and saturation of image $I$ to obtain the augmented image data $I_{\mathrm{jitter}}$. The formula is:
[0049] $I_{\mathrm{jitter}} = T_{\mathrm{color}}(I_{\mathrm{crop}})$
[0050] where $T_{\mathrm{color}}$ is a color transformation function comprising adjustments to brightness, contrast, and saturation. The specific derivation process includes:
[0051]
[0052] where $\alpha_{\mathrm{brightness}}$, $\alpha_{\mathrm{contrast}}$, and $\alpha_{\mathrm{saturation}}$ are uniformly distributed random samples, $\beta_{\mathrm{brightness}}$, $\beta_{\mathrm{contrast}}$, and $\beta_{\mathrm{saturation}}$ are the adjustment ranges for brightness, contrast, and saturation, and HSV2RGB and RGB2HSV are the conversion functions between the RGB and HSV color spaces.
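One possible shape of a color transformation of this kind is sketched below. Several details are assumptions rather than the text's own derivation: each $\alpha$ is drawn from $u(-\beta, \beta)$ and applied as a factor $(1 + \alpha)$, and saturation is adjusted by blending with a grayscale image instead of the RGB/HSV round trip named in the text. The function name `color_jitter` is hypothetical.

```python
import numpy as np


def color_jitter(image, beta_b, beta_c, beta_s, rng):
    """Randomly perturb brightness, contrast, and saturation of a uint8 RGB image."""
    img = image.astype(np.float64) / 255.0
    # Assumed sampling rule: alpha ~ u(-beta, beta) for each adjustment.
    a_b = rng.uniform(-beta_b, beta_b)
    a_c = rng.uniform(-beta_c, beta_c)
    a_s = rng.uniform(-beta_s, beta_s)
    # Brightness: scale all channels.
    img = img * (1.0 + a_b)
    # Contrast: scale deviations from the global mean intensity.
    mean = img.mean()
    img = mean + (img - mean) * (1.0 + a_c)
    # Saturation: scale deviations from the per-pixel luma
    # (a stand-in for editing the S channel in HSV space).
    gray = (img @ np.array([0.299, 0.587, 0.114]))[..., None]
    img = gray + (img - gray) * (1.0 + a_s)
    return (np.clip(img, 0.0, 1.0) * 255.0).round().astype(np.uint8)
```

With all three ranges set to zero the function returns the input unchanged, which is a convenient sanity check.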
[0053] S2-3: Point cloud data augmentation;
[0054] Random rotation: randomly rotate the point cloud data $P$ by an angle $\theta \sim u(-\theta_{\max},\, \theta_{\max})$ to obtain $P_{\mathrm{rot}}$. The formula is:
[0055]
[0056] Random translation: randomly translate the point cloud data $P_{\mathrm{rot}}$ to obtain the augmented point cloud data $P_{\mathrm{trans}}$. The formula is:
[0057] $P_{\mathrm{trans}} = P_{\mathrm{rot}} + \Delta t$
[0058] $\Delta t = [\Delta t_x, \Delta t_y, \Delta t_z]$
[0059] where $\Delta t \sim u(-t_{\max},\, t_{\max})$ is the translation amount, $t_{\max}$ is the maximum range of the translation operation, and $\Delta t_x$, $\Delta t_y$, $\Delta t_z$ are the translation amounts along the three axes of three-dimensional space.
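The two point cloud operations can be sketched as follows. The rotation axis is not stated in the text; rotating about the vertical (z) axis is an assumption made here because it is the usual choice for LiDAR data. The function name `augment_point_cloud` is hypothetical, and only the x, y, z columns are transformed.

```python
import numpy as np


def augment_point_cloud(points, theta_max, t_max, rng):
    """Rotate (N, 5) points about the z-axis, then translate them.

    Columns are x, y, z, intensity, ring; intensity and ring are left untouched.
    """
    theta = rng.uniform(-theta_max, theta_max)
    c, s = np.cos(theta), np.sin(theta)
    # Assumed rotation axis: z (yaw rotation).
    rot = np.array([[c, -s, 0.0],
                    [s, c, 0.0],
                    [0.0, 0.0, 1.0]])
    # Delta t = [dt_x, dt_y, dt_z], each component drawn from u(-t_max, t_max).
    delta_t = rng.uniform(-t_max, t_max, size=3)
    out = points.astype(np.float64).copy()
    # P_trans = P_rot + delta_t, where P_rot = P @ rot.T for row-vector points.
    out[:, :3] = out[:, :3] @ rot.T + delta_t
    return out
```

Because both operations are rigid, distances between points are preserved, so the same transform would also have to be applied to the 3D bounding boxes to keep the labels aligned.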
[0060] S2-4: Constructing feature vectors;
[0061] Image feature extraction: extract the image features $f_{\mathrm{image}}$ using a convolutional neural network. The formula is:
[0062] $f_{\mathrm{image}} = \mathrm{CNN}(I_{\mathrm{jitter}})$
[0063] Point cloud feature extraction: extract the point cloud features $f_{\mathrm{pointcloud}}$ using PointNet. The formula is:
[0064] $f_{\mathrm{pointcloud}} = \mathrm{PointNet}(P_{\mathrm{trans}})$
[0065] Vehicle state $s_{\mathrm{ego}}$, comprising the position $p$ and velocity $v$. The formula is:
[0066] $s_{\mathrm{ego}} = [p_x, p_y, p_z, v_x, v_y, v_z]$
[0067] Surrounding object state $s_{\mathrm{obj}}$, comprising each object's 3D bounding box and attributes. The formula is:
[0068] $s_{\mathrm{obj}} = [b_1, b_2, \ldots, b_M]$
[0069] $b_i = [x_i, y_i, z_i, w_i, l_i, h_i, \theta_i, v_{x_i}, v_{y_i}]$
[0070] where $M$ is the number of objects in the surrounding space;
[0071] S2-5: Feature vector representation;
[0072] The image features, point cloud features, vehicle state, and surrounding object state are concatenated to form the final feature vector $s$, as shown in the formula:
[0073] $s = [f_{\mathrm{image}}, f_{\mathrm{pointcloud}}, s_{\mathrm{ego}}, s_{\mathrm{obj}}]$
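The concatenation in S2-5 can be sketched as below. The CNN and PointNet encoders are not reproduced; their outputs are taken as given vectors. Since the number of objects $M$ varies between frames, the sketch pads or truncates $s_{\mathrm{obj}}$ to a fixed `max_objects` rows so that $s$ has a constant length; that padding scheme, like the function name `build_state`, is an assumption and is not described in the text.

```python
import numpy as np


def build_state(f_image, f_pointcloud, s_ego, boxes, max_objects):
    """Concatenate s = [f_image, f_pointcloud, s_ego, s_obj] into one 1-D vector.

    boxes has shape (M, D), one row b_i per surrounding object.
    """
    # Assumed handling of variable M: zero-pad or truncate to max_objects rows.
    s_obj = np.zeros((max_objects, boxes.shape[1]))
    m = min(len(boxes), max_objects)
    s_obj[:m] = boxes[:m]
    return np.concatenate([f_image, f_pointcloud, s_ego, s_obj.ravel()])
```

For example, an 8-dimensional image feature, a 4-dimensional point cloud feature, the 6-dimensional ego state, and three slots of 9-dimensional object rows give a state of length 8 + 4 + 6 + 27 = 45.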
[0074] S3: Select driving scenarios based on feature vectors using reinforcement learning algorithms;
[0075] S3-1: Define a driving scenario o∈{o1, o2, o3, o4}, where o1 is an urban road, o2 is a highway, o3 is a rural road, and o4 is a tunnel;
[0076] The upper-level policy $\pi_{\mathrm{option}}$ selects the driving scenario $o$ best suited to the current state. The policy is parameterized by a neural network $\pi_{\mathrm{option}}(o \mid s; \theta_{\mathrm{option}})$, where $\theta_{\mathrm{option}}$ denotes the neural network parameters of the upper-level policy. The training objective of the neural network is to maximize the cumulative reward, as shown in the formula:
[0077]
[0078] where $J_{\mathrm{option}}$ represents the cumulative reward, $E$ the expected value, $t$ the time step, $T$ the maximum time step, $\gamma$ the discount factor, and $R$ the reward function. The specific formula for the reward function $R$ is:
[0079]
[0080] where $w_1$, $w_2$, and $w_3$ are weighting coefficients; $R_{\mathrm{safety}}$ is the safety reward, $M$ is the number of objects around the vehicle, $d_i$ is the distance between the vehicle and the $i$-th object, and $d_{\min}$ is the safe distance threshold; $R_{\mathrm{efficiency}}$ is the efficiency reward, $v$ is the current vehicle speed, and $v_{\mathrm{target}}$ is the target speed; $R_{\mathrm{comfort}}$ is the comfort reward, and acceleration is the current acceleration.
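The symbol list suggests a weighted sum of three terms, $R = w_1 R_{\mathrm{safety}} + w_2 R_{\mathrm{efficiency}} + w_3 R_{\mathrm{comfort}}$. The sketch below shows one reward of that shape. The individual term definitions are assumptions chosen only to use the listed symbols: a penalty for objects closer than $d_{\min}$, a penalty for deviating from $v_{\mathrm{target}}$, and a penalty on the magnitude of the acceleration. The function name `driving_reward` is hypothetical.

```python
import numpy as np


def driving_reward(distances, d_min, v, v_target, acceleration,
                   weights=(1.0, 1.0, 1.0)):
    """Weighted sum of assumed safety, efficiency, and comfort terms."""
    d = np.asarray(distances, dtype=float)
    # Assumed safety term: mean shortfall below the safe distance, negated.
    r_safety = -float(np.mean(np.maximum(0.0, d_min - d))) if d.size else 0.0
    # Assumed efficiency term: penalize deviation from the target speed.
    r_efficiency = -abs(v - v_target)
    # Assumed comfort term: penalize large accelerations.
    r_comfort = -abs(acceleration)
    w1, w2, w3 = weights
    return w1 * r_safety + w2 * r_efficiency + w3 * r_comfort
```

Under these assumptions the reward is zero when every object is at least $d_{\min}$ away, the vehicle travels at the target speed, and the acceleration is zero, and it is negative otherwise.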
[0081] S4: Based on feature vectors, driving behavior in driving scenarios is generated using reinforcement learning algorithms;
[0082] Define driving behavior a∈{a1,a2,a3,a4,a5}, where a1 represents acceleration, a2 represents braking force, a3 represents throttle opening, a4 represents steering angle, and a5 represents steering speed;
[0083] The goal of the lower-level policy $\pi_{\mathrm{action}}$ is to select the optimal driving behavior $a$ in the current driving scenario $o$. The policy is parameterized by a neural network $\pi_{\mathrm{action}}(a \mid s, o; \theta_{\mathrm{action}})$ and outputs the optimal driving behavior $a$ as the actual driving behavior for intelligent driving, where $\theta_{\mathrm{action}}$ denotes the neural network parameters of the lower-level policy. The training objective of the neural network is to maximize the cumulative reward, as shown in the formula:
[0084]
[0085] where $J_{\mathrm{action}}$ represents the cumulative reward, $E$ the expected value, $t$ the time step, $T$ the maximum time step, $\gamma$ the discount factor, and $R$ the reward function. The termination function $\beta$ determines whether the current driving scenario $o$ terminates. Its output, termination, is the basis for that decision: if it is True, the scenario terminates; if it is False, the current driving scenario continues. The termination function is parameterized by a neural network $\beta(\mathrm{termination} \mid s; \theta_{\beta})$, where $\theta_{\beta}$ denotes its neural network parameters. Its training objective is to maximize the cumulative reward, formulated as follows:
[0086]
[0087] where $J_{\beta}$ represents the cumulative reward, $E$ the expected value, $t$ the time step, $T$ the maximum time step, $\gamma$ the discount factor, and $R$ the reward function.
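The three objectives share the same symbol list (expected value, time step, maximum time step, discount factor, reward). A discounted-return form consistent with those definitions is written out below as an assumption: the summation starting at $t = 0$ and the per-step notation $R_t$ are choices made here for illustration, and the expectation is taken over trajectories generated with the respective parameters.

```latex
\begin{aligned}
J_{\mathrm{option}}(\theta_{\mathrm{option}}) &= E\!\left[\sum_{t=0}^{T} \gamma^{t} R_t\right], \\
J_{\mathrm{action}}(\theta_{\mathrm{action}}) &= E\!\left[\sum_{t=0}^{T} \gamma^{t} R_t\right], \\
J_{\beta}(\theta_{\beta}) &= E\!\left[\sum_{t=0}^{T} \gamma^{t} R_t\right].
\end{aligned}
```

Written this way, the three objectives are the same quantity viewed as a function of different parameter sets, which is why each component can be improved by following its own gradient.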
[0088] S5: Policy update;
[0089] Based on the feature vector s, driving scenario o, driving behavior a, and reward R, the parameters of the upper-level policy, lower-level policy, and termination function are updated using the gradient ascent method, with the following formula:
[0090]
[0091] where $\alpha$ is the learning rate, $\nabla_{\theta_{\mathrm{option}}} J_{\mathrm{option}}$ is the gradient of the upper-level policy's objective function with respect to its parameters, $\nabla_{\theta_{\mathrm{action}}} J_{\mathrm{action}}$ is the gradient of the lower-level policy's objective function with respect to its parameters, and $\nabla_{\theta_{\beta}} J_{\beta}$ is the gradient of the termination function's objective function with respect to its parameters. Data sampling and policy learning are repeated until the model converges, yielding the optimal model.
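Gradient ascent moves each parameter set in the direction of its gradient, $\theta \leftarrow \theta + \alpha \nabla_{\theta} J$. The toy sketch below applies one such step to the upper-level policy only. Two simplifications are assumptions: the policy is a linear softmax over the four scenarios rather than a neural network, and the gradient is estimated with the score-function (REINFORCE) rule, $\nabla_{\theta} J \approx G \, \nabla_{\theta} \log \pi(o \mid s)$ for a sampled return $G$. The cited Option-Critic architecture derives its own gradient estimators, which are not reproduced here; the function names are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def gradient_ascent_step(theta, s, option, ret, alpha):
    """One update theta <- theta + alpha * G * grad log pi(option | s).

    theta: (num_options, dim) weights of an assumed linear-softmax policy.
    s:     (dim,) feature vector; option: index of the chosen scenario.
    ret:   sampled discounted return G; alpha: learning rate.
    """
    probs = softmax(theta @ s)
    # For a linear softmax, d log pi(o|s) / d theta_k = (1[k == o] - pi_k) * s.
    grad_log = -np.outer(probs, s)
    grad_log[option] += s
    return theta + alpha * ret * grad_log
```

A positive return raises the probability of the option that was taken and a negative return lowers it; the lower-level policy and the termination function would be updated analogously with their own gradients.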
[0092] In the preferred method, the two types of objects include: vehicles, pedestrians, and bicycles.
[0093] The beneficial effects of this invention are as follows: By analyzing driving scenarios, the intelligent driving task can be decomposed, which is conducive to the optimized allocation of intelligent driving resources, precise product deployment, and improved customer satisfaction; by using hierarchical reinforcement learning to decompose the intelligent driving scenario into multiple intelligent driving sub-tasks, the upper-level policy is used to select the driving task for the current scenario, the lower-level policy is used to solve the driving task, and the basic data is used to assist the training process, thereby improving the efficiency of data utilization and the training effect of the model.
Attached Figure Description
[0094] Figure 1 is a schematic diagram of the workflow of the present invention.
Detailed Implementation
[0095] Example 1:
[0096] A machine learning-based intelligent driving behavior method includes:
[0097] S1: Obtain basic data;
[0098] The basic data includes driving behavior data in real road scenarios, which contains 1,000 driving scenarios, each lasting 20 seconds, and each driving scenario includes sensor data and labeled data;
[0099] S2: Calculate the feature vector of the driving scenario based on the basic data;
[0100] S2-1: Load data;
[0101] Load image data, load point cloud data, load vehicle position;
[0102] S2-2: Image data augmentation;
[0103] Random cropping, color jitter;
[0104] S2-3: Point cloud data augmentation;
[0105] Random rotation, random translation;
[0106] S2-4: Constructing feature vectors;
[0107] Image feature extraction, point cloud feature extraction;
[0108] S2-5: Feature vector representation;
[0109] The image features, point cloud features, vehicle state, and surrounding object state are concatenated into the final feature vector s;
[0110] S3: Select driving scenarios based on feature vectors using reinforcement learning algorithms;
[0111] S4: Based on feature vectors, driving behavior in driving scenarios is generated using reinforcement learning algorithms;
[0112] S5: Update the policies to obtain the optimal model.
[0113] Specifically, the method includes:
[0114] S1: Obtain basic data;
[0115] The basic data includes driving behavior data in real road scenarios, which contains 1,000 driving scenarios, each lasting 20 seconds, and each driving scenario includes sensor data and labeled data;
[0116] Sensor data includes:
[0117] Cameras: Six cameras in total (front, rear, left, right, left front, right front), providing RGB images $I \in \mathbb{R}^{H \times W \times 3}$, where $\mathbb{R}^{H \times W \times 3}$ denotes the set of three-dimensional tensors of size $H \times W \times 3$, with height $H = 900$ and width $W = 1600$;
[0118] LiDAR: One 32-line LiDAR provides point cloud data $P \in \mathbb{R}^{N \times 3}$, where $N$ is the number of points and $\mathbb{R}^{N \times 3}$ denotes the set of tensors of size $N \times 3$. Each point is (x, y, z, intensity, ring), where x, y, and z are the coordinates of the point in three-dimensional space, intensity is the strength of the signal reflected back after the laser beam emitted by the LiDAR hits an object, and ring is the line number of the LiDAR;
[0119] Radar: Five radars (front, rear, left, right, and rear left) provide target detection data;
[0120] GPS and IMU: Provide the vehicle position $p = [p_x, p_y, p_z]$ and velocity $v = [v_x, v_y, v_z]$, that is, the components of the position $p$ and velocity $v$ along the x, y, and z axes of three-dimensional space;
[0121] The labeled data includes:
[0122] 3D bounding box: 3D annotation of two types of objects. Each object is represented as b=[x,y,z,w,l,h,θ], where x,y,z are the three-dimensional spatial coordinates of the center point of the 3D annotation, w represents the width of the bounding box, l represents the length of the bounding box, h represents the height of the bounding box, and θ represents the orientation angle.
[0123] Attributes: The motion state S of an object (stationary or moving) and its behavior (stopping or turning);
[0124] Trajectory: The trajectory of an object's motion $T = [p_1, p_2, \ldots, p_i]$, where $p_i = [x_i, y_i, z_i]$ represents the three-dimensional spatial coordinates of the object's position at time $i$;
[0125] Map information: High-precision map data, including lanes and traffic signs;
[0126] S2: Calculate the feature vector of the driving scenario based on the basic data;
[0127] S2-1: Load data;
[0128] Load image data: Obtain image $I$ from the camera, with shape $(H, W, 3)$;
[0129] Load point cloud data: Obtain point cloud data P from LiDAR, with shape (N, 5), and each point is (x, y, z, intensity, ring).
[0130] Load vehicle position: Vehicle position $p = [p_x, p_y, p_z]$ and velocity $v = [v_x, v_y, v_z]$;
[0131] S2-2: Image data augmentation;
[0132] Random cropping: randomly crop a region $I_{\mathrm{crop}}$ of size $(H_C, W_C)$ from image $I$. The formula is:
[0133] $I_{\mathrm{crop}}(x, y) = I(x + \Delta x,\; y + \Delta y)$
[0134] $\Delta x \sim u(0,\; W - W_C)$
[0135] $\Delta y \sim u(0,\; H - H_C)$
[0136] where $\Delta x$ and $\Delta y$ are the coordinate offsets of the upper-left corner of the cropping region, randomly sampled from a uniform distribution; $u$ denotes the uniform distribution, and $W_C$ and $H_C$ are the width and height of the cropped image;
[0137] Color jittering: randomly adjust the brightness, contrast, and saturation of image $I$ to obtain the augmented image data $I_{\mathrm{jitter}}$. The formula is:
[0138] $I_{\mathrm{jitter}} = T_{\mathrm{color}}(I_{\mathrm{crop}})$
[0139] where $T_{\mathrm{color}}$ is a color transformation function comprising adjustments to brightness, contrast, and saturation. The specific derivation process includes:
[0140]
[0141] where $\alpha_{\mathrm{brightness}}$, $\alpha_{\mathrm{contrast}}$, and $\alpha_{\mathrm{saturation}}$ are uniformly distributed random samples, $\beta_{\mathrm{brightness}}$, $\beta_{\mathrm{contrast}}$, and $\beta_{\mathrm{saturation}}$ are the adjustment ranges for brightness, contrast, and saturation, and HSV2RGB and RGB2HSV are the conversion functions between the RGB and HSV color spaces.
[0142] S2-3: Point cloud data augmentation;
[0143] Random rotation: randomly rotate the point cloud data $P$ by an angle $\theta \sim u(-\theta_{\max},\, \theta_{\max})$ to obtain $P_{\mathrm{rot}}$. The formula is:
[0144]
[0145] Random translation: randomly translate the point cloud data $P_{\mathrm{rot}}$ to obtain the augmented point cloud data $P_{\mathrm{trans}}$. The formula is:
[0146] $P_{\mathrm{trans}} = P_{\mathrm{rot}} + \Delta t$
[0147] $\Delta t = [\Delta t_x, \Delta t_y, \Delta t_z]$
[0148] where $\Delta t \sim u(-t_{\max},\, t_{\max})$ is the translation amount, $t_{\max}$ is the maximum range of the translation operation, and $\Delta t_x$, $\Delta t_y$, $\Delta t_z$ are the translation amounts along the three axes of three-dimensional space.
[0149] S2-4: Constructing feature vectors;
[0150] Image feature extraction: extract the image features $f_{\mathrm{image}}$ using a convolutional neural network. The formula is:
[0151] $f_{\mathrm{image}} = \mathrm{CNN}(I_{\mathrm{jitter}})$
[0152] Point cloud feature extraction: extract the point cloud features $f_{\mathrm{pointcloud}}$ using PointNet. The formula is:
[0153] $f_{\mathrm{pointcloud}} = \mathrm{PointNet}(P_{\mathrm{trans}})$
[0154] Vehicle state $s_{\mathrm{ego}}$, comprising the position $p$ and velocity $v$. The formula is:
[0155] $s_{\mathrm{ego}} = [p_x, p_y, p_z, v_x, v_y, v_z]$
[0156] Surrounding object state $s_{\mathrm{obj}}$, comprising each object's 3D bounding box and attributes. The formula is:
[0157] $s_{\mathrm{obj}} = [b_1, b_2, \ldots, b_M]$
[0158] $b_i = [x_i, y_i, z_i, w_i, l_i, h_i, \theta_i, v_{x_i}, v_{y_i}]$
[0159] where $M$ is the number of objects in the surrounding space;
[0160] S2-5: Feature vector representation;
[0161] The image features, point cloud features, vehicle state, and surrounding object state are concatenated to form the final feature vector $s$, as shown in the formula:
[0162] $s = [f_{\mathrm{image}}, f_{\mathrm{pointcloud}}, s_{\mathrm{ego}}, s_{\mathrm{obj}}]$
[0163] S3: Select driving scenarios based on feature vectors using reinforcement learning algorithms;
[0164] S3-1: Define a driving scenario o∈{o1, o2, o3, o4}, where o1 is an urban road, o2 is a highway, o3 is a rural road, and o4 is a tunnel;
[0165] The upper-level policy $\pi_{\mathrm{option}}$ selects the driving scenario $o$ best suited to the current state. The policy is parameterized by a neural network $\pi_{\mathrm{option}}(o \mid s; \theta_{\mathrm{option}})$, where $\theta_{\mathrm{option}}$ denotes the neural network parameters of the upper-level policy. The training objective of the neural network is to maximize the cumulative reward, as shown in the formula:
[0166]
[0167] where $J_{\mathrm{option}}$ represents the cumulative reward, $E$ the expected value, $t$ the time step, $T$ the maximum time step, $\gamma$ the discount factor, and $R$ the reward function. The specific formula for the reward function $R$ is:
[0168]
[0169] where $w_1$, $w_2$, and $w_3$ are weighting coefficients; $R_{\mathrm{safety}}$ is the safety reward, $M$ is the number of objects around the vehicle, $d_i$ is the distance between the vehicle and the $i$-th object, and $d_{\min}$ is the safe distance threshold; $R_{\mathrm{efficiency}}$ is the efficiency reward, $v$ is the current vehicle speed, and $v_{\mathrm{target}}$ is the target speed; $R_{\mathrm{comfort}}$ is the comfort reward, and acceleration is the current acceleration.
[0170] S4: Based on feature vectors, driving behavior in driving scenarios is generated using reinforcement learning algorithms;
[0171] Define driving behavior a∈{a1,a2,a3,a4,a5}, where a1 represents acceleration, a2 represents braking force, a3 represents throttle opening, a4 represents steering angle, and a5 represents steering speed;
[0172] The goal of the lower-level policy $\pi_{\mathrm{action}}$ is to select the optimal driving behavior $a$ in the current driving scenario $o$. The policy is parameterized by a neural network $\pi_{\mathrm{action}}(a \mid s, o; \theta_{\mathrm{action}})$ and outputs the optimal driving behavior $a$ as the actual driving behavior for intelligent driving, where $\theta_{\mathrm{action}}$ denotes the neural network parameters of the lower-level policy. The training objective of the neural network is to maximize the cumulative reward, as shown in the formula:
[0173]
[0174] where $J_{\mathrm{action}}$ represents the cumulative reward, $E$ the expected value, $t$ the time step, $T$ the maximum time step, $\gamma$ the discount factor, and $R$ the reward function. The termination function $\beta$ determines whether the current driving scenario $o$ terminates. Its output, termination, is the basis for that decision: if it is True, the scenario terminates; if it is False, the current driving scenario continues. The termination function is parameterized by a neural network $\beta(\mathrm{termination} \mid s; \theta_{\beta})$, where $\theta_{\beta}$ denotes its neural network parameters. Its training objective is to maximize the cumulative reward, formulated as follows:
[0175]
[0176] where $J_{\beta}$ represents the cumulative reward, $E$ the expected value, $t$ the time step, $T$ the maximum time step, $\gamma$ the discount factor, and $R$ the reward function.
[0177] S5: Policy update;
[0178] Based on the feature vector s, driving scenario o, driving behavior a, and reward R, the parameters of the upper-level policy, lower-level policy, and termination function are updated using the gradient ascent method, with the following formula:
[0179]
[0180] where $\alpha$ is the learning rate, $\nabla_{\theta_{\mathrm{option}}} J_{\mathrm{option}}$ is the gradient of the upper-level policy's objective function with respect to its parameters, $\nabla_{\theta_{\mathrm{action}}} J_{\mathrm{action}}$ is the gradient of the lower-level policy's objective function with respect to its parameters, and $\nabla_{\theta_{\beta}} J_{\beta}$ is the gradient of the termination function's objective function with respect to its parameters. Data sampling and policy learning are repeated until the model converges, yielding the optimal model. Two object classes are included: vehicles, pedestrians, and bicycles.
[0181] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are merely illustrative of the principles of the present invention. Various changes and modifications can be made to the present invention without departing from its spirit and scope. All such changes and modifications fall within the scope of the present invention as claimed, which is defined by the appended claims and their equivalents.
Claims
1. A method for intelligent driving behavior based on machine learning, characterized in that it includes:
S1: Obtain basic data;
The basic data includes driving behavior data in real road scenarios, which contains 1,000 driving scenarios, each lasting 20 seconds, and each driving scenario includes sensor data and labeled data;
Sensor data includes:
Cameras: Six cameras in total (front, rear, left, right, front left, and front right), providing RGB images $I \in \mathbb{R}^{H \times W \times 3}$, where $\mathbb{R}^{H \times W \times 3}$ denotes the set of three-dimensional tensors of size $H \times W \times 3$, with height $H = 900$ and width $W = 1600$;
LiDAR: One 32-line LiDAR provides point cloud data $P \in \mathbb{R}^{N \times 3}$, where $N$ is the number of points and $\mathbb{R}^{N \times 3}$ denotes the set of tensors of size $N \times 3$; each point is (x, y, z, intensity, ring), where x, y, z are the coordinates of the point in three-dimensional space, intensity is the strength of the signal reflected back after the laser beam emitted by the LiDAR hits an object, and ring is the line number of the LiDAR;
Radar: Five radars (front, rear, left, right, and rear left) provide target detection data;
GPS and IMU: Provide the vehicle's position $p = [p_x, p_y, p_z]$ and velocity $v = [v_x, v_y, v_z]$, that is, the components of the position $p$ and velocity $v$ along the x, y, and z axes of three-dimensional space;
The labeled data includes:
3D bounding box: 3D annotation of two types of objects. Each object is represented as b=[x,y,z,w,l,h,θ], where x,y,z are the three-dimensional spatial coordinates of the center point of the 3D annotation, w represents the width of the bounding box, l represents the length of the bounding box, h represents the height of the bounding box, and θ represents the orientation angle;
Attributes: The motion state S of an object (stationary or moving) and its behavior (stopping or turning);
Trajectory: The trajectory of an object's motion $T = [p_1, p_2, \ldots, p_i]$, where $p_i = [x_i, y_i, z_i]$ represents the three-dimensional spatial coordinates of the object's position at time $i$;
Map information: High-precision map data, including lanes and traffic signs;
S2: Calculate the feature vector of the driving scenario based on the basic data;
S2-1: Load data;
Load image data: Obtain image $I$ from the camera, with shape $(H, W, 3)$;
Load point cloud data: Obtain point cloud data P from LiDAR, with shape (N, 5), and each point is (x, y, z, intensity, ring);
Load the vehicle's position: the vehicle's position $p = [p_x, p_y, p_z]$ and velocity $v = [v_x, v_y, v_z]$;
S2-2: Image data augmentation;
Random cropping: randomly crop a region $I_{\mathrm{crop}}$ of size $(H_C, W_C)$ from image $I$, with the formula: $I_{\mathrm{crop}}(x, y) = I(x + \Delta x,\; y + \Delta y)$, $\Delta x \sim u(0,\; W - W_C)$, $\Delta y \sim u(0,\; H - H_C)$, where $\Delta x$ and $\Delta y$ are the coordinate offsets of the upper-left corner of the cropping region, randomly sampled from a uniform distribution, $u$ denotes the uniform distribution, and $W_C$ and $H_C$ are the width and height of the cropped image;
Color jittering: randomly adjust the brightness, contrast, and saturation of image $I$ to obtain the augmented image data $I_{\mathrm{jitter}}$, with the formula: $I_{\mathrm{jitter}} = T_{\mathrm{color}}(I_{\mathrm{crop}})$, where $T_{\mathrm{color}}$ is a color transformation function comprising adjustments to brightness, contrast, and saturation, in whose specific derivation $\alpha_{\mathrm{brightness}}$, $\alpha_{\mathrm{contrast}}$, and $\alpha_{\mathrm{saturation}}$ are uniformly distributed random samples, $\beta_{\mathrm{brightness}}$, $\beta_{\mathrm{contrast}}$, and $\beta_{\mathrm{saturation}}$ are the adjustment ranges for brightness, contrast, and saturation, and HSV2RGB and RGB2HSV are the conversion functions between the RGB and HSV color spaces;
S2-3: Point cloud data augmentation;
Random rotation: randomly rotate the point cloud data $P$ by a rotation angle $\theta \sim u(-\theta_{\max},\, \theta_{\max})$ to obtain $P_{\mathrm{rot}}$;
Random translation: randomly translate the point cloud data $P_{\mathrm{rot}}$ to obtain the augmented point cloud data $P_{\mathrm{trans}}$, with the formula: $P_{\mathrm{trans}} = P_{\mathrm{rot}} + \Delta t$, $\Delta t = [\Delta t_x, \Delta t_y, \Delta t_z]$, where $\Delta t \sim u(-t_{\max},\, t_{\max})$ is the translation amount, $t_{\max}$ is the maximum range of the translation operation, and $\Delta t_x$, $\Delta t_y$, $\Delta t_z$ are the translation amounts along the three axes of three-dimensional space;
S2-4: Constructing feature vectors;
Image feature extraction: extract the image features $f_{\mathrm{image}}$ using a convolutional neural network, with the formula: $f_{\mathrm{image}} = \mathrm{CNN}(I_{\mathrm{jitter}})$;
Point cloud feature extraction: extract the point cloud features $f_{\mathrm{pointcloud}}$ using PointNet, with the formula: $f_{\mathrm{pointcloud}} = \mathrm{PointNet}(P_{\mathrm{trans}})$;
Vehicle state $s_{\mathrm{ego}}$, comprising the position $p$ and velocity $v$, with the formula: $s_{\mathrm{ego}} = [p_x, p_y, p_z, v_x, v_y, v_z]$;
Surrounding object state $s_{\mathrm{obj}}$, comprising each object's 3D bounding box and attributes, with the formula: $s_{\mathrm{obj}} = [b_1, b_2, \ldots, b_M]$, $b_i = [x_i, y_i, z_i, w_i, l_i, h_i, \theta_i, v_{x_i}, v_{y_i}]$, where $M$ is the number of objects in the surrounding space;
S2-5: Feature vector representation;
The image features, point cloud features, vehicle state, and surrounding object state are concatenated to form the final feature vector $s$, as shown in the formula: $s = [f_{\mathrm{image}}, f_{\mathrm{pointcloud}}, s_{\mathrm{ego}}, s_{\mathrm{obj}}]$;
S3: Select driving scenarios based on feature vectors using reinforcement learning algorithms;
S3-1: Define a driving scenario $o \in \{o_1, o_2, o_3, o_4\}$, where $o_1$ is an urban road, $o_2$ is a highway, $o_3$ is a rural road, and $o_4$ is a tunnel;
The upper-level policy $\pi_{\mathrm{option}}$ selects the driving scenario $o$ best suited to the current state; the policy is parameterized by a neural network $\pi_{\mathrm{option}}(o \mid s; \theta_{\mathrm{option}})$, where $\theta_{\mathrm{option}}$ denotes the neural network parameters of the upper-level policy; the training objective of the neural network is to maximize the cumulative reward $J_{\mathrm{option}}$, where $E$ represents the expected value, $t$ the time step, $T$ the maximum time step, $\gamma$ the discount factor, and $R$ the reward function; in the reward function $R$, $w_1$, $w_2$, and $w_3$ are weighting coefficients, $R_{\mathrm{safety}}$ is the safety reward, $M$ is the number of objects around the vehicle, $d_i$ is the distance between the vehicle and the $i$-th object, $d_{\min}$ is the safe distance threshold, $R_{\mathrm{efficiency}}$ is the efficiency reward, $v$ is the current vehicle speed, $v_{\mathrm{target}}$ is the target speed, $R_{\mathrm{comfort}}$ is the comfort reward, and acceleration is the current acceleration;
S4: Based on feature vectors, driving behavior in driving scenarios is generated using reinforcement learning algorithms;
Define driving behavior $a \in \{a_1, a_2, a_3, a_4, a_5\}$, where $a_1$ represents acceleration, $a_2$ represents braking force, $a_3$ represents throttle opening, $a_4$ represents steering angle, and $a_5$ represents steering speed;
The goal of the lower-level policy $\pi_{\mathrm{action}}$ is to select the optimal driving behavior $a$ in the current driving scenario $o$; the policy is parameterized by a neural network $\pi_{\mathrm{action}}(a \mid s, o; \theta_{\mathrm{action}})$ and outputs the optimal driving behavior $a$ as the actual driving behavior for intelligent driving, where $\theta_{\mathrm{action}}$ denotes the neural network parameters of the lower-level policy; the training objective of the neural network is to maximize the cumulative reward $J_{\mathrm{action}}$, where $E$ represents the expected value, $t$ the time step, $T$ the maximum time step, $\gamma$ the discount factor, and $R$ the reward function;
The termination function $\beta$ determines whether the current driving scenario $o$ terminates; its output, termination, is the basis for that decision: if it is True, the scenario terminates; if it is False, the current driving scenario continues; the termination function is parameterized by a neural network $\beta(\mathrm{termination} \mid s; \theta_{\beta})$, where $\theta_{\beta}$ denotes its neural network parameters, with the training objective of maximizing the cumulative reward $J_{\beta}$, where $E$ represents the expected value, $t$ the time step, $T$ the maximum time step, $\gamma$ the discount factor, and $R$ the reward function;
S5: Update the policies to obtain the optimal model;
Based on the feature vector s, driving scenario o, driving behavior a, and reward R, the parameters of the upper-level policy, lower-level policy, and termination function are updated using the gradient ascent method, where $\alpha$ is the learning rate, $\nabla_{\theta_{\mathrm{option}}} J_{\mathrm{option}}$ is the gradient of the upper-level policy's objective function with respect to its parameters, $\nabla_{\theta_{\mathrm{action}}} J_{\mathrm{action}}$ is the gradient of the lower-level policy's objective function with respect to its parameters, and $\nabla_{\theta_{\beta}} J_{\beta}$ is the gradient of the termination function's objective function with respect to its parameters; data sampling and policy learning are repeated until the model converges, yielding the optimal model.
2. The intelligent driving behavior method based on machine learning according to claim 1, characterized in that the two categories of objects include: vehicles, pedestrians, and bicycles.