An automatic driving multi-task coordination decision-making method based on hierarchical rolling optimization

By using a hierarchical rolling optimization method, the autonomous driving task is decomposed into multiple driving tasks. A reinforcement learning decision framework is used to coordinate multiple tasks, which solves the problems of transferability and interpretability of reinforcement learning in autonomous driving, and realizes safer and more stable driving decisions and control in complex scenarios.

CN116552563BActive Publication Date: 2026-06-23FUZHOU UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
FUZHOU UNIV
Filing Date
2023-04-14
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing reinforcement learning methods suffer from poor transferability, insufficient interpretability, and problems with decision-making and control in complex scenarios in autonomous driving, making it difficult to apply them effectively outside of the training scenario.

Method used

A hierarchical rolling optimization method is adopted to plan the driving objective into multiple driving tasks. A reinforcement learning decision framework is used to coordinate multiple tasks. Environmental information is acquired through visual sensors, represented in low dimension, and the decision is optimized in a rolling manner. Only the control action of the first driving task is executed, and the planning and execution are carried out in a loop.

Benefits of technology

It improves the decision-making ability of autonomous driving in complex scenarios, achieves safer, more stable and efficient driving control, reduces the gap between the virtual environment and the real environment, and is similar to the planning and control behavior of human drivers.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN116552563B_ABST
    Figure CN116552563B_ABST
Patent Text Reader

Abstract

The application relates to an automatic driving multitask coordination decision-making method based on hierarchical rolling optimization. The automatic driving vehicle environment state information is used as the input of a reinforcement learning decision-making framework, the driving target is planned as the connection of multiple driving tasks, the coordination among the multiple tasks is realized, each driving task is specifically realized as a control action, and only the control action of the first driving task is executed. Then, rolling forward to the next time step, based on the updated automatic driving vehicle environment state information, planning and control action execution are carried out again. The planning is repeatedly carried out, and rolling optimization decision-making is realized. The method can realize multitask coordination and is suitable for long-term decision-making of automatic driving in a complex driving scene.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This invention relates to the field of autonomous driving technology, and specifically to an autonomous driving multi-task coordination decision-making method based on hierarchical rolling optimization. Background Technology

[0002] Intelligent driving is developing rapidly, with various control and decision-making methods emerging one after another. Classical control methods have been widely used due to their stability and mature practical experience, and are constantly improving themselves to adapt to the ever-increasing technological demands. However, while classical methods cover most control scenarios, the remaining extreme scenarios are areas that classical methods cannot solve. At this point, reinforcement learning demonstrates its powerful vitality and is constantly replacing the original classical control methods in various scenarios and functions.

[0003] While reinforcement learning is booming, it also faces the practical problem of being difficult to apply in real-world situations. Reinforcement learning or deep learning is difficult to transfer to training scenarios or unexpected data conditions, which limits its widespread application. Furthermore, the requirements for training scenarios and training data are quite stringent, relying heavily on human selection and labeling of the data.

[0004] Patent CN114170488A discloses an autonomous driving method based on conditional imitation learning and reinforcement learning, which solves the problem of low exploration efficiency in reinforcement learning caused by random initialization. However, the application of imitation learning does not address the algorithm's dependence on label data. Patent CN115629608A discloses an autonomous vehicle control method based on deep prediction networks and deep reinforcement learning, further considering the impact of vehicle interactions on vehicle trajectory prediction. However, more complex driving scenarios require long-term decision-making, and end-to-end control sacrifices interpretability. Patent CN114013443B discloses an autonomous vehicle lane-changing decision-making control method based on hierarchical reinforcement learning, dividing the entire autonomous driving task into two layers: decision-making and control. Phased driving task planning makes autonomous driving control interpretable; however, the transferability of this scheme needs further discussion and verification. Summary of the Invention

[0005] The purpose of this invention is to provide an autonomous driving multi-task coordination and decision-making method based on hierarchical rolling optimization. This method can coordinate multiple driving tasks and is suitable for long-term autonomous driving decisions in complex driving scenarios.

[0006] To achieve the above objectives, the technical solution adopted by this invention is: a multi-task coordination decision-making method for autonomous driving based on hierarchical rolling optimization. This method utilizes the environmental state information of the autonomous vehicle as input to a reinforcement learning decision framework, plans the driving objective as a series of interconnected driving tasks to achieve coordination between tasks. Each driving task is specified as a control action, but only the control action of the first driving task is executed. Then, the process rolls forward to the next time step, and based on the updated environmental state information of the autonomous vehicle, planning and execution of control actions are performed again. This process is repeated to achieve rolling optimization decision-making.

[0007] Furthermore, the method includes the following steps:

[0008] Step S1: Obtain the original images around the autonomous vehicle using a visual sensor, process the images and obtain information about surrounding vehicles and obstacles; combine the information about surrounding vehicles and obstacles with the vehicle information and map information to form the environmental state information of the autonomous vehicle and represent it as a high-dimensional environmental state bird's-eye view; input the high-dimensional environmental state bird's-eye view into the neural network framework, extract the low-dimensional representation of the environmental state information, and obtain the low-dimensional environmental state information to simplify the input information of the reinforcement learning decision framework.

[0009] Step S2: Input the low-dimensional environmental state information into the reinforcement learning decision framework, perform multi-task decision-making through the reinforcement learning decision framework, and obtain decision actions, that is, a combination of multiple driving tasks in sequence; on this basis, decide the control action sequence of each driving task, that is, a combination of control actions in sequence; the control actions are composed of action units in the action unit library.

[0010] Step S3: After the decision-making action and control action sequence is planned, only the action unit sequence corresponding to the first driving task is executed;

[0011] Step S4: Roll forward to the next time step, repeat steps S1-S3, update the environmental state information of the autonomous vehicle, and make decisions again based on this and execute the action unit sequence of the first driving task after the decision update; repeat this planning process to achieve rolling optimization decision.

[0012] Furthermore, the vehicle information includes the vehicle's location and status information; the map information comes from existing high-precision maps or semantic maps obtained by the recognition module, and includes road path information; the road path information is global path information, including a series of path points from the starting point to the end point, represented by broken lines in the bird's-eye view; the high-dimensional environment status bird's-eye view is 256*256 pixels, which is processed and adjusted to 64*64 pixels, and the viewpoint is always aligned with the vehicle view, with the vehicle located at a fixed position in the view.

[0013] Furthermore, in order to model the vehicle's behavior and use it as input to the reinforcement learning decision-making framework, the dimensions of the vehicle and surrounding vehicles are modeled as variable units with collision risk, which change dynamically according to the vehicle's driving behavior.

[0014] Furthermore, the vehicle dimensions are modeled as variable elements, as follows:

[0015] For a vehicle traveling at a constant speed, its front and rear dimensions are defined as follows:

[0016]

[0017] Among them, L head For the variable unit, the dimension is the extension of the front of the original vehicle dimensions, L ttc The variable element is an extension of the vehicle's original dimensions, where ΔT is the time constant required to maintain minimum clearance with the vehicle in front, and V... HV V is the vehicle speed. front V represents the speed of the vehicle in front. rear The speed of the vehicle behind;

[0018] For vehicles that are accelerating and decelerating, the front and rear dimensions of the vehicle are increased respectively; defined as follows:

[0019]

[0020] Where Δt represents the image acquisition interval and Δv represents the relative velocity;

[0021] For stationary obstacles, extend their dimensions backward to the safe braking distance;

[0022] For vehicles changing lanes, extend the dimension of the direction of lane change, with the extension dimension based on the lane dimension;

[0023] For large vehicles, their front and rear dimensions are fixedly extended, with the rear dimension being extended more than the front dimension;

[0024] The variable unit is used to determine if an accident has occurred. If the variable units of the two vehicles overlap, then an accident between the two vehicles is determined.

[0025] Furthermore, a low-dimensional representation of the environmental state information is extracted, specifically as follows:

[0026] The low-dimensional representation of the environmental state information is obtained through a variational autoencoder. The encoding network encodes the original high-dimensional environmental state information into a low-dimensional state representation. To obtain the specific parameters in the network, the objective function is set as follows:

[0027]

[0028] Among them, L VAE D represents loss.KL Denotes KL divergence, Let μ(s) represent the prior probability distribution of a multivariate Gaussian distribution. t ),σ(s t ) represent the mean and standard deviation of the low-dimensional state representation, respectively. The reconstruction loss is used to measure how close the predicted frame is to the original frame.

[0029] Furthermore, the variational autoencoder comprises four 3×3 kernel-sized convolutional layers with 32, 64, 128, and 256 channels respectively; each convolutional layer is followed by a ReLU activation function; then a latent space layer of size 64 is fully connected to the last convolutional layer and trained using the Adam optimizer.

[0030] The variational autoencoder is pre-trained, and the resulting network is integrated into a reinforcement learning decision framework as a visual encoding layer; the parameters of the visual encoding layer no longer change with the change of the reinforcement learning decision framework.

[0031] The variational autoencoder is trained in the Prescan environment. A vehicle model is built using Carsim, and the original images are acquired and processed into a bird's-eye view format for training the variational autoencoder.

[0032] Furthermore, the reinforcement learning decision framework is based on a phased design to determine driving tasks over a future period; after outputting a combination sequence of driving tasks, the reinforcement learning decision framework further determines a combination sequence of control actions.

[0033] The combination sequence of control actions consists of action units from the action unit library. The action units contain vehicle speed and turning angle information and are obtained in a virtual environment by a pure tracking algorithm and a PID algorithm.

[0034] Furthermore, the design of the action unit includes the following steps;

[0035] Step 1: Determine the start and end points of any task; based on the vehicle's variable units, select the distance between the variable units of this vehicle and the vehicle in front to determine the action at the start of the task; the end point is defined as the center line of the target lane, the target speed, or both simultaneously.

[0036] Step 2: Select feasible action units between the start point and the end point; generate feasible paths using a path planner, the feasible paths consisting of a series of points containing vehicle state information; obtain lateral and longitudinal control commands using a pure tracking algorithm and a PID algorithm; based on this, obtain a set of action units available for reinforcement learning.

[0037] Furthermore, the implementation method of the reinforcement learning decision framework is as follows:

[0038] The reinforcement learning decision framework employs a SAC network, which includes two Q-networks: a value network and a policy network. The value network consists of a visual encoding layer followed by five dense layers with hidden units ranging from 256 to 32. The policy network has the same structure as the value network, except that its last layer is split into two branches. The first branch represents the mean of the action, and the second branch represents its variance. All SAC networks are trained using the Adam optimizer.

[0039] The decision-making process relies on the weighting of factors in the reward function, ensuring that both driver and passenger comfort and satisfaction are considered; the reward function is designed as follows:

[0040] R = R v +R α +R c +R o +k

[0041] Each item represents a speed bonus, steering smoothness bonus, collision bonus, lane departure bonus, and vehicle stationary penalty;

[0042] The reinforcement learning decision framework outputs a sequence of control actions for driving tasks. For each driving task, the Q-learning algorithm is used to select an action unit sequence from the action unit library and execute the action unit sequence corresponding to the first driving task to complete a multi-task decision and control in a complex scenario. Based on this, decision control is performed cyclically to drive the vehicle to the target location.

[0043] Compared with existing technologies, this invention has the following advantages: This invention proposes a multi-task coordinated decision-making method for autonomous driving based on hierarchical rolling optimization, solving the problems of transferability, interpretability, and decision-making and control in complex scenarios when reinforcement learning is applied to autonomous driving. Preprocessing of reinforcement learning input information enables the decision controller to be trained in a virtual environment and quickly transferred to the real environment, narrowing the gap with actual application scenarios. Multi-task serialization and action serialization enable the controller to perform long-term planning, greatly improving the autonomous driving decision-making ability in complex scenarios. Hierarchical planning control also overcomes the drawbacks of end-to-end approaches, similar to human driving behavior where a driving decision is made first, followed by detailed control. Furthermore, similar to human drivers planning driving over a period of time, but ultimately only executing the next step and then replanning based on the new state, this invention only applies and executes the action sequence corresponding to the first task in the driving task serialization, completing one round of decision control, and then proceeding to the next round of decision control. Based on this, the multi-task coordinated decision-making method based on reinforcement learning proposed in this invention can achieve safer, more stable, and more efficient decision-making. The controller can continuously learn and adapt to new environments, rather than being limited to virtual or trained scenarios. Attached Figure Description

[0044] Figure 1 This is a schematic diagram illustrating the implementation principle of the method in an embodiment of the present invention.

[0045] Figure 2 This is a schematic diagram illustrating the process of converting environmental state information into a bird's-eye view in an embodiment of the present invention.

[0046] Figure 3 This is a schematic diagram of the variable unit in an embodiment of the present invention.

[0047] Figure 4 This is a schematic diagram of the multi-task coordination decision based on reinforcement learning in an embodiment of the present invention. Detailed Implementation

[0048] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0049] It should be noted that the following detailed descriptions are exemplary and intended to provide further explanation of this application. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains.

[0050] It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to this application. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. Furthermore, it should be understood that when the terms "comprising" and / or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and / or combinations thereof.

[0051] This embodiment provides a hierarchical rolling optimization-based multi-task coordination decision-making method for autonomous driving. It utilizes the environmental state information of the autonomous vehicle as input to a reinforcement learning decision framework, planning the driving objective as a series of interconnected driving tasks to achieve coordination between them. Each driving task is concretized into control actions, but only the control actions of the first driving task are executed. Then, it rolls forward to the next time step, planning and executing control actions again based on updated autonomous vehicle environmental state information. This process is repeated to achieve rolling optimization decision-making.

[0052] like Figure 1 As shown, the method specifically includes the following steps:

[0053] Step S1: Obtain raw images of the surroundings of the autonomous vehicle using a visual sensor, process the images and obtain information on surrounding vehicles and obstacles; combine the surrounding vehicle and obstacle information with the vehicle information and map information to form the environmental state information of the autonomous vehicle and represent it as a high-dimensional environmental state bird's-eye view; input the high-dimensional environmental state bird's-eye view into the neural network framework, extract the low-dimensional representation of the environmental state information, and obtain low-dimensional environmental state information to simplify the input information of the reinforcement learning decision framework.

[0054] Step S2: Input the low-dimensional environmental state information into the reinforcement learning decision framework, perform multi-task decision-making through the reinforcement learning decision framework, and obtain decision actions, that is, a combination of multiple driving tasks in sequence; on this basis, decide the control action sequence of each driving task, that is, a combination of control actions in sequence; the control actions are composed of action units in the action unit library.

[0055] Step S3: After the decision-making action and control action sequence is planned, only the action unit sequence corresponding to the first driving task is executed.

[0056] Step S4: Roll forward to the next time step, repeat steps S1-S3, update the environmental state information of the autonomous vehicle, and make decisions again based on this and execute the action unit sequence of the first driving task after the decision update; repeat this planning process to achieve rolling optimization decision.

[0057] In this embodiment, under the Prescan simulation environment, the software's own visual sensor acquires the original image, processing the surrounding vehicles and obstacles, map information, and the vehicle itself into a bird's-eye view. Further, the surrounding vehicles and the vehicle are represented as square cells with dynamic dimensions; in an improved version, this could be represented as regions with different collision probabilities. The bird's-eye view is further processed by a variational autoencoder into low-dimensional latent states, serving as input to a reinforcement learning decision framework. This framework first outputs a combination sequence of driving tasks to determine the vehicle's actions over a future period. Further, for each driving task, the framework selects a suitable sequence of actions from a library of action cells between appropriate start and end points to complete a full driving task. Based on this, the framework outputs a series of control actions, which include speed and steering angle information. Furthermore, the vehicle state information included in the actions can be refined to achieve smoother and more accurate control. The actual control actions used are the first driving task and its specific action sequence. After execution, a decision control is completed, and the current original image is reacquired to begin the next planning decision.

[0058] In this embodiment, the vehicle information includes the vehicle's location and status information; the map information comes from an existing high-precision map or a semantic map obtained by a recognition module, and includes road path information. The road path information is global, containing a series of path points from the starting point to the destination, represented by a broken line in the bird's-eye view. In this embodiment, drivable areas can be represented in gray, and other non-drivable areas in black. The road path information is calculated by a path planner, containing a series of path points, and represented by a broken line in the bird's-eye view. The high-dimensional environmental state bird's-eye view is 256*256 pixels, processed and adjusted to 64*64 pixels, and the viewpoint is always aligned with the vehicle's view, with the vehicle located at a fixed position in the view. The process of converting environmental state information into a bird's-eye view is as follows: Figure 2 As shown.

[0059] In order to model vehicle behavior and use it as input to a reinforcement learning decision framework, the dimensions of the vehicle and surrounding vehicles are modeled as variable units with collision risk, which change dynamically according to the vehicle's driving behavior, so that rich vehicle driving behaviors can be controlled through datasets or controllers.

[0060] like Figure 3 As shown, the vehicle dimensions are modeled as variable elements, specifically as follows:

[0061] For a vehicle traveling at a constant speed, its front and rear dimensions are defined as follows:

[0062]

[0063] Among them, L headFor the variable unit, the dimension is the extension of the front of the original vehicle dimensions, L ttc The variable element is an extension of the vehicle's original dimensions, where ΔT is the time constant required to maintain minimum clearance with the vehicle in front, and V... HV V is the vehicle speed. front V represents the speed of the vehicle in front. rear This represents the speed of the vehicle behind.

[0064] For vehicles that are accelerating and decelerating, the front and rear dimensions of the vehicle are increased respectively; defined as follows:

[0065]

[0066] Where Δt represents the image acquisition interval and Δv represents the relative velocity;

[0067] For stationary obstacles, extend their dimensions backward to the safe braking distance;

[0068] For vehicles changing lanes, extend the dimension of the direction of lane change, with the extension dimension based on the lane dimension;

[0069] For large vehicles such as trucks, their front and rear dimensions are fixedly extended, with the rear dimension being extended more than the front dimension.

[0070] The variable unit is used to determine if an accident has occurred. If the variable units of the two vehicles overlap, then an accident between the two vehicles is determined.

[0071] In this embodiment, the low-dimensional representation of the extracted environmental state information is specifically as follows:

[0072] The low-dimensional representation of the environmental state information is obtained through a variational autoencoder. The encoding network encodes the original high-dimensional environmental state information into a low-dimensional state representation, and the decoding network decodes the low-dimensional state representation back into the original high-dimensional state information. To obtain the specific parameters in the network, the objective function is set as follows:

[0073]

[0074] Among them, L VAE D represents loss. KL Denotes KL divergence, Let μ(s) represent the prior probability distribution of a multivariate Gaussian distribution. t ),σ(s t ) represent the mean and standard deviation of the low-dimensional state representation, respectively. The reconstruction loss is used to measure how close the predicted frame is to the original frame.

[0075] The image obtained by the low-dimensional representation retains core information, including road geometry and obstacles, compared to the original image.

[0076] In this embodiment, the variational autoencoder comprises four 3×3 kernel-sized convolutional layers with 32, 64, 128, and 256 channels respectively; each convolutional layer is followed by a ReLU activation function; then a latent spatial layer of size 64 is fully connected to the last convolutional layer, and the system is trained using the Adam optimizer.

[0077] The variational autoencoder is pre-trained, and the resulting network is integrated into a reinforcement learning decision framework as a visual encoding layer; the parameters of the visual encoding layer no longer change with the changes in the reinforcement learning decision framework.

[0078] The variational autoencoder is trained in the Prescan environment. A vehicle model is built using Carsim, and the original images are acquired and processed into a bird's-eye view format for training the variational autoencoder.

[0079] like Figure 4 As shown, the reinforcement learning decision-making framework takes a sequence of action units as output and a low-dimensional state representation as input. Based on a phased design, the framework determines driving tasks over a future period, such as lane keeping, following the vehicle in front, or overtaking. Driving tasks can be customized, including but not limited to the tasks described above. After outputting a sequence of driving tasks, the framework further determines a sequence of control actions, such as acceleration, deceleration, and steering.

[0080] The combination sequence of control actions consists of action units from the action unit library. The action units contain vehicle speed and turning angle information and are obtained in a virtual environment by a pure tracking algorithm and a PID algorithm.

[0081] The design of the action unit includes the following steps;

[0082] Step 1: Determine the start and end points of any task; based on the vehicle's variable units, select the distance between the variable units of this vehicle and the vehicle in front to determine the action at the start of the task; the end point is defined as the center line of the target lane, the target speed, or both.

[0083] Step 2: Select feasible action units between the start point and the end point; generate feasible paths using a path planner, the feasible paths consisting of a series of points containing vehicle state information; obtain lateral and longitudinal control commands using a pure tracking algorithm and a PID algorithm; based on this, obtain a set of action units available for reinforcement learning.

[0084] The implementation method of the reinforcement learning decision framework is as follows:

[0085] The reinforcement learning decision framework employs a SAC network, which includes two Q-networks: a value network and a policy network. The value network consists of a visual encoding layer followed by five dense layers with hidden units ranging from 256 to 32. The policy network has the same structure as the value network, except that its last layer is split into two branches. The first branch represents the mean of the action, and the second branch represents its variance. All SAC networks are trained using the Adam optimizer.

[0086] The decision-making process relies on the weighting of factors in the reward function, ensuring that both driver and passenger comfort and satisfaction are considered; the reward function is designed as follows:

[0087] R = R v +R α +R c +R o +k

[0088] The items represent speed bonuses, steering smoothness bonuses, collision bonuses, lane departure bonuses, and vehicle stationary penalty bonuses, respectively.

[0089] The reinforcement learning decision framework outputs a sequence of control actions for driving tasks. For each driving task, the Q-learning algorithm is used to select an action unit sequence from the action unit library and execute the action unit sequence corresponding to the first driving task to complete a multi-task decision and control in a complex scenario. Based on this, decision control is performed cyclically to drive the vehicle to the target location.

[0090] The reinforcement learning decision-making framework is trained within the Prescan framework. The driving scenario design includes a three-lane straight section, a curved section, an obstacle section, and a roundabout section. Traffic participants include stationary obstacles, trucks, cars driving at different speeds, and cars randomly changing lanes or accelerating / decelerating.

[0091] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0092] This application is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, generate instructions for implementing the flowchart... Figure 1 One or more processes and / or boxes Figure 1 A device that provides the functions specified in one or more boxes.

[0093] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means, which are implemented in a process Figure 1 One or more processes and / or boxes Figure 1 The function specified in one or more boxes.

[0094] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, thereby providing instructions that execute on the computer or other programmable equipment for implementing the process. Figure 1 One or more processes and / or boxes Figure 1 The steps of the function specified in one or more boxes.

[0095] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention in any other way. Any person skilled in the art may make changes or modifications to the above-disclosed technical content to create equivalent embodiments. However, any simple modifications, equivalent changes, and modifications made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the protection scope of the present invention.

Claims

1. A multi-task coordination decision-making method for autonomous driving based on hierarchical rolling optimization, characterized in that, By using the environmental state information of autonomous vehicles as input to the reinforcement learning decision framework, the driving objective is planned as a series of driving tasks to achieve coordination between multiple tasks. Each driving task is specified as a control action, but only the control action of the first driving task is executed. Then, the process is rolled forward to the next time step, and the planning and execution of control actions are carried out again based on the updated environmental state information of the autonomous vehicles. This process is repeated to achieve rolling optimization decision-making. The method includes the following steps: Step S1: Obtain the original images around the autonomous vehicle using a visual sensor, process the images and obtain information about surrounding vehicles and obstacles; combine the information about surrounding vehicles and obstacles with the vehicle information and map information to form the environmental state information of the autonomous vehicle and represent it as a high-dimensional environmental state bird's-eye view; input the high-dimensional environmental state bird's-eye view into the neural network framework, extract the low-dimensional representation of the environmental state information, and obtain the low-dimensional environmental state information to simplify the input information of the reinforcement learning decision framework. Step S2: Input the low-dimensional environmental state information into the reinforcement learning decision framework, perform multi-task decision-making through the reinforcement learning decision framework, and obtain decision actions, that is, a combination of multiple driving tasks in sequence; on this basis, decide the control action sequence of each driving task, that is, a combination of control actions in sequence; the control actions are composed of action units in the action unit library. Step S3: After the decision-making action and control action sequence is planned, only the action unit sequence corresponding to the first driving task is executed; Step S4: Roll forward to the next time step, repeat steps S1-S3, update the environmental state information of the autonomous vehicle, and make decisions again based on this and execute the action unit sequence of the first driving task after the decision update; repeat this planning process to achieve rolling optimization decision. In order to model vehicle behavior and use it as input to a reinforcement learning decision framework, the dimensions of the vehicle and surrounding vehicles are modeled as variable units with collision risk, which change dynamically according to the vehicle's driving behavior. The vehicle dimensions are modeled as variable elements, as follows: For a vehicle traveling at a constant speed, its front and rear dimensions are defined as follows: in, The variable unit is an extension of the front dimension based on the original dimensions of the vehicle. The variable unit is an extension of the original vehicle dimensions. The time constant required to maintain minimum distance from the vehicle in front. For the vehicle's speed, The speed of the vehicle in front. The speed of the vehicle behind; For vehicles that are accelerating and decelerating, the front and rear dimensions of the vehicle are increased respectively; defined as follows: in, Indicates the image acquisition interval. Represents relative velocity; For stationary obstacles, extend their dimensions backward to the safe braking distance; For vehicles changing lanes, extend the dimension of the direction of lane change, with the extension dimension based on the lane dimension; For large vehicles, their front and rear dimensions are fixedly extended, with the rear dimension being extended more than the front dimension; The variable unit is used to determine if an accident has occurred. If the variable units of the two vehicles overlap, then an accident between the two vehicles is determined.

2. The autonomous driving multi-task coordination decision-making method based on hierarchical rolling optimization according to claim 1, characterized in that, The vehicle information includes the vehicle's location and status information; the map information comes from existing high-precision maps or semantic maps obtained by the recognition module, and includes road path information; the road path information is global path information, including a series of path points from the starting point to the end point, which is represented by broken lines in the bird's-eye view; the high-dimensional environment status bird's-eye view is 256*256 pixels, which is processed and adjusted to 64*64 pixels, and the viewpoint is always aligned with the vehicle view, with the vehicle located in a fixed position in the view.

3. The autonomous driving multi-task coordination decision-making method based on hierarchical rolling optimization according to claim 1, characterized in that, The low-dimensional representation of the extracted environmental state information is as follows: The low-dimensional representation of the environmental state information is obtained through a variational autoencoder. The encoding network encodes the original high-dimensional environmental state information into a low-dimensional state representation. To obtain the specific parameters in the network, the objective function is set as follows: in, Indicates loss, Denotes KL divergence, Let represent the prior probability distribution of a multivariate Gaussian distribution, where Let these represent the mean and standard deviation of the low-dimensional state representation, respectively. The reconstruction loss is used to measure how close the predicted frame is to the original frame.

4. The autonomous driving multi-task coordination decision-making method based on hierarchical rolling optimization according to claim 3, characterized in that, The variational autoencoder consists of four 3×3 kernel-sized convolutional layers with 32, 64, 128, and 256 channels respectively; each convolutional layer is followed by a ReLU activation function; then a latent spatial layer of size 64 is fully connected to the last convolutional layer and trained using the Adam optimizer. The variational autoencoder is pre-trained, and the resulting network is integrated into a reinforcement learning decision framework as a visual encoding layer; the parameters of the visual encoding layer no longer change with the change of the reinforcement learning decision framework. The variational autoencoder is trained in the Prescan environment. A vehicle model is built using Carsim, and the original images are acquired and processed into a bird's-eye view format for training the variational autoencoder.

5. The autonomous driving multi-task coordination decision-making method based on hierarchical rolling optimization according to claim 4, characterized in that, The reinforcement learning decision-making framework is based on a phased design and is used to determine driving tasks over a future period of time. After the reinforcement learning decision framework outputs a combination of driving tasks, it further decides on a combination of control actions. The combination sequence of control actions consists of action units from the action unit library. The action units contain vehicle speed and turning angle information and are obtained in a virtual environment by a pure tracking algorithm and a PID algorithm.

6. The autonomous driving multi-task coordination decision-making method based on hierarchical rolling optimization according to claim 5, characterized in that, The design of the action unit includes the following steps; Step 1: Determine the start and end points of any task; based on the vehicle's variable units, select the distance between the variable units of this vehicle and the vehicle in front to determine the action at the start of the task; the end point is defined as the center line of the target lane, the target speed, or both simultaneously. Step 2: Select feasible action units between the start point and the end point; generate feasible paths using a path planner, the feasible paths consisting of a series of points containing vehicle state information; obtain lateral and longitudinal control commands using a pure tracking algorithm and a PID algorithm; based on this, obtain a set of action units available for reinforcement learning.

7. The autonomous driving multi-task coordination decision-making method based on hierarchical rolling optimization according to claim 5, characterized in that, The implementation method of the reinforcement learning decision framework is as follows: The reinforcement learning decision framework employs a SAC network, which includes two Q-networks: a value network and a policy network. The value network consists of a visual encoding layer followed by five dense layers with hidden units ranging from 256 to 32. The policy network has the same structure as the value network, except that its last layer is split into two branches. The first branch represents the mean of the action, and the second branch represents its variance. All SAC networks are trained using the Adam optimizer. The decision-making process relies on the weighting of factors in the reward function, ensuring that both driver and passenger comfort and satisfaction are considered; the reward function is designed as follows: Each item represents a speed bonus, steering smoothness bonus, collision bonus, lane departure bonus, and vehicle stationary penalty; The reinforcement learning decision framework outputs a sequence of control actions for driving tasks. For each driving task, the Q-learning algorithm is used to select an action unit sequence from the action unit library and execute the action unit sequence corresponding to the first driving task to complete a multi-task decision and control in a complex scenario. Based on this, decision control is performed cyclically to drive the vehicle to the target location.