A vehicle-road cooperation automatic driving decision method based on hierarchical reinforcement learning
By deploying a local safety decision-making model in each vehicle intelligent agent and generating verifiable credentials, and by combining safety contribution assessment with the federated server's weighted aggregation of model parameters, the method addresses the problems of safety evolution and data privacy protection in vehicle-road cooperative autonomous driving and achieves safe and reliable vehicle-road cooperative autonomous driving decision-making.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 盐城原子智能科技有限责任公司
- Filing Date
- 2026-03-13
- Publication Date
- 2026-06-19
Figure CN122245136A_ABST
Description
Technical Field
[0001] This invention relates to the field of autonomous driving technology, and in particular to a vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning.
Background Technology
[0002] With the development of autonomous driving technology, decision-making systems are evolving from single-vehicle intelligence to integrated vehicle-road-cloud collaboration. Through information sharing, vehicle-road collaboration is expected to overcome the limitations of single-vehicle perception and achieve safer and more efficient global decision-making. Against this backdrop, machine learning-based decision-making methods, especially reinforcement learning and hierarchical reinforcement learning, have become a research hotspot because of their ability to handle complex scenarios and make sequential decisions. However, combining such data-driven methods with practical vehicle-road collaborative systems, and deploying them safely, reliably, and at scale, still faces serious challenges, and existing technical solutions have significant shortcomings:
1. Limitations of single-vehicle intelligence and centralized learning: Early autonomous driving decisions relied on single-vehicle sensors and computing units, which suffered from blind spots and insufficient handling of long-tail scenarios. The subsequent centralized cloud learning approach can pool massive fleet data to train better models, but it has two major bottlenecks: first, uploading raw data puts heavy pressure on transmission bandwidth and risks leaking user privacy; second, once the updated model is distributed, it is difficult to adapt it to the personalized driving needs and road conditions of different vehicles and regions.
2. New problems arising from the introduction of federated learning: To protect data privacy, federated learning (FL) has been introduced into vehicle-cloud collaborative frameworks. Vehicles train models locally and upload only model parameters rather than raw data; the parameters are then aggregated in the cloud and distributed globally. However, existing vehicle-road cooperative solutions based on federated learning mostly follow the traditional FL paradigm directly, ignoring the primacy of safety and the heterogeneity of data value that are specific to autonomous driving.
Therefore, there is an urgent need in this field for a vehicle-road cooperative autonomous driving decision-making method that can guide vehicle swarm intelligence to evolve continuously and reliably in a safer direction while protecting data privacy, distinguish the value of different data, prioritize the absorption of experience with high safety contributions, and properly balance global safety consensus with local personalized driving strategies.
Summary of the Invention
[0003] The purpose of this section is to outline some aspects of embodiments of the present invention and to briefly describe some preferred embodiments. Simplifications or omissions may be made in this section, as well as in the abstract and title of this application, to avoid obscuring the purpose of these documents; however, such simplifications or omissions should not be construed as limiting the scope of the invention.
[0004] In view of the problems existing in the vehicle-road cooperative autonomous driving decision-making methods based on hierarchical reinforcement learning, this invention is proposed.
[0005] Therefore, the purpose of this invention is to provide a vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning. It systematically solves the core problems of vehicle-road cooperative autonomous driving decision-making in terms of safety evolution, reliable cooperation, data value utilization, and personalized balance, and provides a practical and feasible technical path for moving towards large-scale, high-level, and reliable swarm intelligence autonomous driving.
[0006] To address the aforementioned technical problems, this invention provides the following technical solution: a vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning, wherein the method is executed in an architecture comprising multiple vehicle intelligent agents and a federated server, each vehicle intelligent agent deploying a local safety decision-making model, and the method includes the following steps:
Step S1: Local safety decision-making and event recording. The vehicle intelligent agent uses its local safety decision-making model to process real-time vehicle-road cooperative perception data and generate driving decisions. The local safety decision-making model includes a safety kernel based on formal rules and an elastic decision layer based on hierarchical reinforcement learning. When the vehicle enters a preset safety-critical scenario, the verification data of the safety kernel and the decision data of the elastic decision layer are recorded to generate a local safety event data packet.
Step S2: Safety contribution assessment and credential generation. Based on the local safety event data packet, the decision safety contribution of the vehicle intelligent agent in the safety-critical scenario is assessed, and a verifiable credential is generated based on the safety contribution.
Step S3: Federated aggregation request. The vehicle intelligent agent uploads the credential together with the updated model parameters of its local safety decision-making model to the federated server.
Step S4: Trusted federated aggregation. The federated server verifies each received credential and calculates the aggregation weight of the corresponding vehicle intelligent agent based on the verified credential. Based on the aggregation weights, the model parameter updates from multiple vehicle intelligent agents are weighted and aggregated to generate a global safety decision model.
Step S5: Secure model distribution and deployment. The global safety decision model is distributed to each vehicle intelligent agent, and the vehicle intelligent agent fuses it with local personalized parameters to update its local safety decision-making model.
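The weighted aggregation in step S4 can be illustrated with a minimal Python sketch. It assumes that parameters are stored as named lists of floats, that each vehicle uploads an update (delta) relative to the current global model, and that the weights are renormalized to sum to 1 before use; the function name `aggregate_updates` and all of these representation choices are illustrative assumptions, not details given in the source.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # hypothetical representation: layer name -> flat values


def aggregate_updates(global_params: Params,
                      updates: List[Params],
                      weights: List[float]) -> Params:
    """Add the weighted sum of per-vehicle updates to the global parameters."""
    if len(updates) != len(weights):
        raise ValueError("one weight is required per update")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Assumption: weights are renormalized so that they sum to 1.
    normalized = [w / total for w in weights]

    new_params: Params = {}
    for name, base in global_params.items():
        merged = list(base)
        for update, w in zip(updates, normalized):
            for k, delta in enumerate(update[name]):
                merged[k] += w * delta
        new_params[name] = merged
    return new_params


if __name__ == "__main__":
    current = {"layer": [1.0, 2.0]}
    deltas = [{"layer": [1.0, 0.0]}, {"layer": [0.0, 2.0]}]
    print(aggregate_updates(current, deltas, [3.0, 1.0]))
```

A vehicle whose credential yields a larger weight therefore pulls the global model further toward its own update, which is the behavior step S4 describes.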
[0007] As a preferred embodiment of the vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning described in this invention, in step S1 the safety kernel based on formal rules is specifically used to calculate a dynamic safety behavior envelope from real-time perception data and vehicle dynamics using a responsibility-sensitive safety (RSS) model; the driving decisions output by the elastic decision layer must be constrained within this dynamic safety behavior envelope.
[0008] As a preferred embodiment of the vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning described in this invention, generating the local safety event data packet in step S1 specifically includes: recording the real-time boundary values of the dynamic safety behavior envelope, the distance of the elastic decision layer's output decision from the boundary, and the vehicle's final actual driving trajectory; when a decision of the elastic decision layer triggers safety kernel intervention or lies in an edge state near the boundary, the record is marked as valid safety event data.
The preset safety-critical scenario is triggered when at least one of the following conditions is met:
(a) the ratio between the theoretical minimum safe distance determined by the real-time dynamic safety behavior envelope and the actual inter-vehicle distance is less than or equal to a first threshold T1;
(b) the minimum relative margin between the output decision of the elastic decision layer and the boundary of the dynamic safety behavior envelope is less than or equal to a second threshold T2;
(c) the probability of a future spatiotemporal conflict, predicted from vehicle-road cooperative perception, is greater than or equal to a third threshold T3.
T1, T2, and T3 are preset constants or parameters that are dynamically adjusted according to the performance of the global model.
[0009] As a preferred embodiment of the vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning described in this invention, the safety contribution is evaluated based on the following metrics: the recorded actual driving trajectory of the vehicle is compared with the dynamic safety behavior envelope calculated in advance by the safety kernel to obtain the trajectory safety margin. The safety contribution is positively correlated with the trajectory safety margin, and a basic positive contribution is awarded when the actual driving trajectory lies completely within the dynamic safety behavior envelope. If the actual driving trajectory is closer to the envelope boundary than the historical average trajectory without triggering safety kernel intervention, an additional efficiency optimization contribution is awarded.
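The scoring rules above can be sketched in Python as follows. The source text does not preserve the actual formulas, so everything numeric here is an assumption made for illustration: margins are taken to be relative values where a negative number means the trajectory left the envelope, the margin term and the efficiency bonus are assumed to be linear, and `base_score`, `margin_gain`, and `efficiency_reward` are hypothetical constants.

```python
from typing import Sequence


def safety_contribution(margins: Sequence[float],
                        kernel_intervened: bool,
                        historical_avg_margin: float,
                        base_score: float = 50.0,        # hypothetical constant
                        margin_gain: float = 40.0,       # hypothetical constant
                        efficiency_reward: float = 20.0  # hypothetical reward coefficient
                        ) -> float:
    """Score one safety event from per-point relative trajectory margins."""
    if not margins:
        raise ValueError("at least one trajectory margin is required")
    # No basic contribution unless the whole trajectory stays inside the envelope.
    if any(m < 0 for m in margins):
        return 0.0

    avg_margin = sum(margins) / len(margins)
    # Basic contribution, positively correlated with the average margin.
    score = base_score + margin_gain * avg_margin

    # Efficiency bonus: closer to the boundary than the historical average,
    # and the safety kernel did not have to intervene.
    if not kernel_intervened and avg_margin < historical_avg_margin:
        score += efficiency_reward * (historical_avg_margin - avg_margin)
    return score


if __name__ == "__main__":
    print(safety_contribution([0.4, 0.6], kernel_intervened=False,
                              historical_avg_margin=0.7))
```

The two terms pull in different directions by design: a larger margin raises the basic score, while a margin tighter than the historical average earns the efficiency bonus only when the kernel never had to step in.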
[0010] As a preferred embodiment of the vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning described in this invention, in step S4 the aggregation weight is calculated from the credential as follows:
$$w_i = \frac{C_i}{\sum_j C_j} \cdot \alpha_i$$
where $C_i$ is the safety contribution score of vehicle intelligent agent $i$ decoded from its credential; $\sum_j C_j$ is the sum of the safety contribution scores of all vehicle intelligent agents participating in this round of federated aggregation; and $\alpha_i$ is the local data diversity adjustment factor of vehicle intelligent agent $i$, which is positively correlated with the degree of difference between the local data distribution and the global data distribution. The adjustment factor $\alpha_i$ is calculated as follows:
$$\alpha_i = 1 + \lambda \cdot D_{KL}\left(P_i \,\|\, P_{global}\right)$$
where $D_{KL}(P_i \,\|\, P_{global})$ is the KL divergence between the local scene data distribution $P_i$ of vehicle intelligent agent $i$ and the global scene data distribution $P_{global}$ maintained in the cloud, and $\lambda$ is a preset coefficient that encourages diversity. The global scene data distribution $P_{global}$ is statistically maintained by the federated server based on the anonymized scene feature vectors uploaded by each vehicle in each round of federated aggregation.
[0011] As a preferred embodiment of the vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning described in this invention, in step S5 the vehicle intelligent agent fuses the global safety decision model with local personalized parameters, specifically including: the local parameters that are strongly related to safety are replaced with the corresponding parameters from the global safety decision model; for the personalized parameter layer in the elastic decision layer that is related to driving style and local road conditions, a weighted average is used to merge the global model parameters with the existing local parameters, so as to retain personalized driving strategies while ensuring safety consensus.
[0012] As a preferred embodiment of the vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning described in this invention, the method further includes a model effectiveness continuous monitoring step S6: After the updated local safety decision model is deployed in the vehicle intelligent agent, its performance indicators are continuously monitored in common and safety-critical scenarios. If the performance drops below the threshold, the system automatically rolls back to the previous version of the model and uploads this event as negative feedback data to the federated server to adjust subsequent federated aggregation strategies.
[0013] A safe and reliable federated evolution system for vehicle-road cooperative autonomous driving decision-making, the system comprising: multiple vehicle intelligent agent terminals, each terminal including at least a local computing module for performing local safety decisions and recording safety events, and a credential generation module for generating credentials; a federated server including at least a credential verification and weight calculation module for verifying credentials and calculating aggregation weights, and a federated aggregation engine for performing weighted aggregation; and a permissioned blockchain network communicatively connected to the credential generation module and to the credential verification and weight calculation module, for storing and verifying the credentials.
[0014] The beneficial effects of this invention are as follows. The invention tightly integrates a formal safety kernel with a data-driven elastic decision layer. The safety kernel sets dynamic safety boundaries that no decision may cross, ensuring the safety of each vehicle's local decisions. Within these boundaries, the elastic layer optimizes for comfort and efficiency. Crucially, because safety contribution is the core criterion for the federated aggregation weights, the cloud server prioritizes model updates from vehicles that perform well in safety-critical scenarios. This steers the collective intelligence of the entire fleet toward "how to drive more safely and effectively in various complex scenarios," so that the overall safety level of the system evolves continuously and autonomously.
The invention also introduces a permissioned blockchain network to store safety contribution credentials. The contribution evaluation result corresponding to each high-value safety event is recorded in hash form on an immutable distributed ledger, generating a globally verifiable credential. The federated server must verify the authenticity of this credential before aggregation. This mechanism brings two core advantages: first, trustworthy contributions, since nodes cannot falsely report their contributions; second, resistance to poisoning attacks, since any malicious node attempting to upload harmful updates will be rejected by the server or given an extremely low weight because it cannot provide a genuine, consensus-verified safety contribution credential that matches the on-chain record. This fundamentally improves the robustness of the federated learning system.
Brief Description of the Drawings
[0015] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort. Figure 1 is a schematic diagram of the overall steps of the vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning of the present invention.
Detailed Implementation
[0016] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
[0017] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.
[0018] Referring to Figure 1, this embodiment provides a vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning. The method is executed in an architecture containing multiple vehicle intelligent agents and a federated server. Each vehicle intelligent agent deploys a local safety decision-making model. The method includes the following steps:
Step S1: Local safety decision-making and event recording. The vehicle intelligent agent uses its local safety decision-making model to process real-time vehicle-road cooperative perception data and generate driving decisions. The local safety decision-making model includes a safety kernel based on formal rules and an elastic decision layer based on hierarchical reinforcement learning. When the vehicle enters a preset safety-critical scenario, the verification data of the safety kernel and the decision data of the elastic decision layer are recorded to generate a local safety event data packet.
Step S2: Safety contribution assessment and credential generation. Based on the local safety event data packet, the decision safety contribution of the vehicle intelligent agent in the safety-critical scenario is assessed, and a verifiable credential is generated based on the safety contribution.
Step S3: Federated aggregation request. The vehicle intelligent agent uploads the credential together with the updated model parameters of its local safety decision-making model to the federated server.
Step S4: Trusted federated aggregation. The federated server verifies each received credential and calculates the aggregation weight of the corresponding vehicle intelligent agent based on the verified credential. Based on the aggregation weights, the model parameter updates from multiple vehicle intelligent agents are weighted and aggregated to generate a global safety decision model.
Step S5: Secure model distribution and deployment. The global safety decision model is distributed to each vehicle intelligent agent, and the vehicle intelligent agent fuses it with local personalized parameters to update its local safety decision-making model.
In step S1, the safety kernel based on formal rules is specifically used to calculate a dynamic safety behavior envelope from real-time perception data and vehicle dynamics using a responsibility-sensitive safety (RSS) model; the driving decisions output by the elastic decision layer must be constrained within this dynamic safety behavior envelope.
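As a concrete illustration of the safety kernel, the sketch below computes the minimum longitudinal following gap from the published Responsibility-Sensitive Safety (RSS) rule and uses it as a one-dimensional safety envelope on the acceleration command. The source only states that an RSS model is used; reducing the envelope to a single longitudinal dimension, the function names, and the override policy in `constrain_acceleration` are simplifying assumptions for this example.

```python
from typing import Tuple


def rss_min_longitudinal_gap(v_rear: float, v_front: float, response_time: float,
                             a_max_accel: float, b_min_brake: float,
                             b_max_brake: float) -> float:
    """Minimum safe following gap in metres under the RSS longitudinal rule.

    The rear vehicle may accelerate at a_max_accel during response_time and then
    brakes at b_min_brake, while the front vehicle brakes at b_max_brake.
    """
    v_after_response = v_rear + response_time * a_max_accel
    gap = (v_rear * response_time
           + 0.5 * a_max_accel * response_time ** 2
           + v_after_response ** 2 / (2.0 * b_min_brake)
           - v_front ** 2 / (2.0 * b_max_brake))
    return max(gap, 0.0)


def constrain_acceleration(requested: float, actual_gap: float, min_gap: float,
                           a_max_accel: float, b_min_brake: float) -> Tuple[float, bool]:
    """Clamp the elastic layer's requested acceleration to the envelope.

    Returns (command, intervened). Assumed policy: when the gap is below the RSS
    minimum, the kernel enforces braking of at least b_min_brake.
    """
    if actual_gap < min_gap:
        command = min(requested, -b_min_brake)
    else:
        command = min(requested, a_max_accel)
    return command, command != requested


if __name__ == "__main__":
    d_min = rss_min_longitudinal_gap(20.0, 20.0, 1.0, 2.0, 4.0, 8.0)
    print(d_min, constrain_acceleration(1.0, 40.0, d_min, 2.0, 4.0))
```

The `intervened` flag corresponds to the "safety kernel intervention" that the method records in the local safety event data packet.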
[0019] Specifically, generating the local safety event data packet in step S1 includes: recording the real-time boundary values of the dynamic safety behavior envelope, the distance of the elastic decision layer's output decision from the boundary, and the vehicle's final actual driving trajectory; when a decision of the elastic decision layer triggers safety kernel intervention or lies in an edge state near the boundary, the record is marked as valid safety event data.
The preset safety-critical scenario is triggered when at least one of the following conditions is met:
(a) the ratio between the theoretical minimum safe distance determined by the real-time dynamic safety behavior envelope and the actual inter-vehicle distance is less than or equal to a first threshold T1;
(b) the minimum relative margin between the output decision of the elastic decision layer and the boundary of the dynamic safety behavior envelope is less than or equal to a second threshold T2;
(c) the probability of a future spatiotemporal conflict, predicted from vehicle-road cooperative perception, is greater than or equal to a third threshold T3.
T1, T2, and T3 are preset constants or parameters that are dynamically adjusted according to the performance of the global model.
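The three trigger conditions can be expressed as a small Python check. This is a sketch only: the distance ratio, decision margin, and conflict probability are taken as already-computed inputs, T1 = 1.2 and T3 = 0.3 are the values used in the worked example later in this description, and the T2 default is a hypothetical placeholder because the source gives no value for it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TriggerThresholds:
    t1: float = 1.2  # distance-ratio threshold (value from the worked example)
    t2: float = 0.1  # decision-margin threshold (hypothetical placeholder)
    t3: float = 0.3  # conflict-probability threshold (value from the worked example)


def triggered_conditions(distance_ratio: float,
                         decision_margin: float,
                         conflict_probability: float,
                         thresholds: TriggerThresholds = TriggerThresholds()) -> List[str]:
    """Return the labels of the trigger conditions that are met."""
    met = []
    if distance_ratio <= thresholds.t1:
        met.append("a")
    if decision_margin <= thresholds.t2:
        met.append("b")
    if conflict_probability >= thresholds.t3:
        met.append("c")
    return met


def is_safety_critical(distance_ratio: float, decision_margin: float,
                       conflict_probability: float) -> bool:
    """A scenario is safety-critical when at least one condition is met."""
    return bool(triggered_conditions(distance_ratio, decision_margin,
                                     conflict_probability))


if __name__ == "__main__":
    print(triggered_conditions(1.125, 0.5, 0.35))
```

Returning the list of met conditions, rather than a single boolean, keeps the reason for the trigger available for the event data packet.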
[0020] Specifically, the decision margin is $M = \min_k \left( d_k / W_k \right)$, where $d_k$ is the closest distance between the trajectory points planned by the elastic layer and the safety boundary in the $k$-th dimension (such as lateral position or longitudinal velocity), and $W_k$ is the width of the safety envelope in that dimension.
The assessment of safety contribution is based on the following metrics: the recorded actual driving trajectory of the vehicle is compared with the dynamic safety behavior envelope calculated in advance by the safety kernel to obtain the trajectory safety margin. The safety contribution is positively correlated with the trajectory safety margin, and a basic positive contribution is awarded when the actual driving trajectory lies completely within the dynamic safety behavior envelope. If the actual driving trajectory is closer to the envelope boundary than the historical average trajectory without triggering safety kernel intervention, an additional efficiency optimization contribution is awarded.
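A minimal Python sketch of the decision margin follows. It assumes the margin is the smallest ratio, over all dimensions, of the distance to the safety boundary divided by the envelope width in that dimension; this reading is inferred from the definitions in the text, and the function name is illustrative.

```python
from typing import Sequence


def decision_margin(boundary_distances: Sequence[float],
                    envelope_widths: Sequence[float]) -> float:
    """Minimum relative margin across dimensions (smaller means closer to the boundary)."""
    if not boundary_distances or len(boundary_distances) != len(envelope_widths):
        raise ValueError("one envelope width is required per dimension")
    if any(w <= 0 for w in envelope_widths):
        raise ValueError("envelope widths must be positive")
    return min(d / w for d, w in zip(boundary_distances, envelope_widths))


if __name__ == "__main__":
    # Two dimensions, e.g. lateral position and longitudinal velocity.
    print(decision_margin([0.5, 2.0], [2.0, 4.0]))
```

Normalizing by the envelope width makes margins in different physical units comparable, which is why a single threshold T2 can be applied to the result.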
[0021] In step S4, the aggregation weight is calculated from the credential as follows:
$$w_i = \frac{C_i}{\sum_j C_j} \cdot \alpha_i$$
where $C_i$ is the safety contribution score of vehicle intelligent agent $i$ decoded from its credential; $\sum_j C_j$ is the sum of the safety contribution scores of all vehicle intelligent agents participating in this round of federated aggregation; and $\alpha_i$ is the local data diversity adjustment factor of vehicle intelligent agent $i$, which is positively correlated with the degree of difference between the local data distribution and the global data distribution. The adjustment factor $\alpha_i$ is calculated as follows:
$$\alpha_i = 1 + \lambda \cdot D_{KL}\left(P_i \,\|\, P_{global}\right)$$
where $D_{KL}(P_i \,\|\, P_{global})$ is the KL divergence between the local scene data distribution $P_i$ of vehicle intelligent agent $i$ and the global scene data distribution $P_{global}$ maintained in the cloud, and $\lambda$ is a preset coefficient that encourages diversity. The global scene data distribution $P_{global}$ is statistically maintained by the federated server based on the anonymized scene feature vectors uploaded by each vehicle in each round of federated aggregation.
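The weight calculation can be checked numerically with a short Python sketch. It assumes the adjustment factor has the form 1 + coefficient × KL divergence, which reproduces the figures in the worked example later in this description (KL divergence 0.8 and coefficient 0.5 give a factor of 1.4), and it assumes scene distributions are discrete probability vectors; the function names are illustrative.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL divergence D(p || q) for discrete distributions over the same scene bins."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    # Bins with p_i == 0 contribute nothing; q_i is floored at eps to avoid log(0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def aggregation_weight(score: float, total_score: float,
                       kl: float, diversity_coeff: float) -> float:
    """Contribution share scaled by the data-diversity adjustment factor."""
    if total_score <= 0:
        raise ValueError("total contribution score must be positive")
    adjustment = 1.0 + diversity_coeff * kl  # assumed form of the adjustment factor
    return score / total_score * adjustment


if __name__ == "__main__":
    # Figures from the worked example: score 85 of 15000, KL = 0.8, coefficient = 0.5.
    print(round(aggregation_weight(85.0, 15000.0, 0.8, 0.5), 5))
```

Because the adjustment factor is at least 1, weights computed this way do not sum to 1 across vehicles; whether the server renormalizes them before aggregation is not stated in the source.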
[0022] In step S5, the vehicle intelligent agent fuses the global safety decision model with local personalized parameters, specifically including: the local parameters that are strongly related to safety are replaced with the corresponding parameters from the global safety decision model; for the personalized parameter layer in the elastic decision layer that is related to driving style and local road conditions, a weighted average is used to merge the global model parameters with the existing local parameters, so as to retain personalized driving strategies while ensuring safety consensus.
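The differentiated fusion can be sketched in Python as below. The sketch assumes parameters are stored as named lists of floats, that the set of safety-related parameter names is known in advance, and that the weighted average keeps a fraction `retention` of the local value and takes the rest from the global model, as in the worked example later in this description (a factor of 0.7 keeps 70% of the local style). The names `fuse_parameters` and `safety_keys` are illustrative.

```python
from typing import Dict, List, Set

Params = Dict[str, List[float]]  # hypothetical representation: layer name -> flat values


def fuse_parameters(local: Params, global_params: Params,
                    safety_keys: Set[str], retention: float) -> Params:
    """Replace safety-related layers with global values; blend personalized layers."""
    if not 0.0 < retention < 1.0:
        raise ValueError("retention factor must lie strictly between 0 and 1")
    fused: Params = {}
    for name, global_values in global_params.items():
        if name in safety_keys or name not in local:
            # Safety consensus: take the global parameters unchanged.
            fused[name] = list(global_values)
        else:
            # Personalized layer: weighted average of local and global values.
            fused[name] = [retention * lv + (1.0 - retention) * gv
                           for lv, gv in zip(local[name], global_values)]
    return fused


if __name__ == "__main__":
    local_model = {"danger_classifier": [1.0], "driving_style": [10.0]}
    global_model = {"danger_classifier": [2.0], "driving_style": [0.0]}
    print(fuse_parameters(local_model, global_model, {"danger_classifier"}, 0.7))
```

Keeping the safety layers out of the blend means a user-chosen retention factor can never dilute the fleet-wide safety baseline.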
[0023] Specifically, the contribution in the above process is calculated as follows:
Basic safety contribution: the actual trajectory is compared with the dynamic safety envelope; if the entire trajectory lies within the envelope, a basic score is awarded. The average trajectory margin can further be calculated, and the contribution is positively correlated with it.
Efficiency optimization contribution: if the margin of the actual trajectory in this event is lower than the historical average margin for similar scenarios and no safety kernel intervention was triggered, the decision is considered more efficient while remaining safe, and an efficiency contribution scaled by a reward coefficient is awarded.
The final safety contribution score for a single event is then obtained from these components.
Credential generation with blockchain evidence storage: the vehicle terminal generates an event digest from the scene ID (hash value), the contribution score, a timestamp, and related fields, and signs it with its private key. The signed digest is sent to a smart contract on the permissioned blockchain network. After the contract verifies the signature, it records the hash of the digest on the blockchain.
[0024] Once the blockchain network reaches consensus, it generates a credential containing a transaction hash (TxHash), a block number (Block#), and a timestamp, which is returned to the vehicle terminal. This credential cannot be forged or altered and can be verified across the entire chain.
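The digest-and-sign flow can be illustrated with Python's standard library. This is a sketch under stated assumptions, not the patented implementation: the digest is a SHA-256 hash over a canonical JSON encoding of the event fields, an HMAC with a shared key stands in for the vehicle's private-key signature (a real deployment would use an asymmetric signature scheme), and a plain Python set stands in for the on-chain record kept by the smart contract. All function and field names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Set


def build_event_digest(scene_id_hash: str, contribution_score: float,
                       timestamp: int) -> str:
    """Hash a canonical encoding of the event summary fields."""
    payload = json.dumps(
        {"scene_id": scene_id_hash, "score": contribution_score, "ts": timestamp},
        sort_keys=True, separators=(",", ":"),
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def sign_digest(digest_hex: str, key: bytes) -> str:
    """Stand-in signature: HMAC-SHA256 instead of a private-key signature."""
    return hmac.new(key, digest_hex.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_credential(digest_hex: str, signature: str, key: bytes,
                      ledger: Set[str]) -> bool:
    """Accept only if the signature matches and the digest is recorded on the ledger."""
    expected = sign_digest(digest_hex, key)
    return hmac.compare_digest(expected, signature) and digest_hex in ledger


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical key material for the example
    digest = build_event_digest("scene-hash-placeholder", 85.0, 1700000000)
    ledger = {digest}  # stand-in for the blockchain record
    print(verify_credential(digest, sign_digest(digest, key), key, ledger))
```

The server-side check mirrors the two properties the text relies on: the digest must be authentic (signature) and must match what the ledger recorded (immutability).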
[0025] More specifically:
Model distribution: the federated server encrypts the updated global model parameters and distributes them to all participating vehicles.
Local integration: after receiving the global model parameters, the vehicle terminal does not simply replace its model but performs a differentiated integration:
- Full replacement of safety-related parameters: for parts of the model that are strongly related to the safety kernel function (e.g., the parameters of neural network layers that determine whether a state is dangerous), the local parameters are directly replaced with the corresponding parameters of the global model. This ensures that the entire fleet shares a unified and up-to-date "safety baseline."
- Weighted averaging of personalized parameters: for parameters in the elastic decision layer that are related to driving style and regional road conditions (such as frequently encountered special ramps), a weighted-average fusion is used: $\theta_{new} = \gamma\,\theta_{local} + (1-\gamma)\,\theta_{global}$, where $\gamma$ is the personalized retention factor ($0 < \gamma < 1$), which can be set by the user or adjusted adaptively based on the amount of local data. This allows the vehicle to retain a personalized driving experience while adhering to the safety consensus.
Furthermore, the method also includes a continuous model effectiveness monitoring step S6: after the updated local safety decision model is deployed in the vehicle intelligent agent, its performance indicators are continuously monitored in common and safety-critical scenarios. If performance drops below the threshold, the system automatically rolls back to the previous version of the model and uploads this event as negative feedback data to the federated server to adjust subsequent federated aggregation strategies.
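The rollback rule in step S6 can be sketched as a simple comparison between the previous and the newly deployed model. The 50% limit on the increase in safety kernel intervention frequency is the example value given in this description; the comfort metric is assumed to be a score where higher is better, and the 20% comfort-drop limit is a hypothetical placeholder because the source only says comfort "significantly decreases".

```python
def should_roll_back(prev_interventions_per_hour: float,
                     new_interventions_per_hour: float,
                     prev_comfort: float,
                     new_comfort: float,
                     max_intervention_increase: float = 0.5,  # 50%, example value from the description
                     max_comfort_drop: float = 0.2            # hypothetical placeholder
                     ) -> bool:
    """Decide whether the newly deployed model has degraded enough to roll back."""
    if prev_interventions_per_hour > 0:
        increase = ((new_interventions_per_hour - prev_interventions_per_hour)
                    / prev_interventions_per_hour)
    else:
        # No interventions before: any intervention now counts as an unbounded increase.
        increase = float("inf") if new_interventions_per_hour > 0 else 0.0

    comfort_drop = ((prev_comfort - new_comfort) / prev_comfort
                    if prev_comfort > 0 else 0.0)
    return increase > max_intervention_increase or comfort_drop > max_comfort_drop


if __name__ == "__main__":
    # 75% more interventions than the previous version: roll back.
    print(should_roll_back(2.0, 3.5, 1.0, 1.0))
```

In a full system the `True` result would also trigger the upload of the rollback event as negative feedback, which this sketch leaves out.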
[0026] Also provided is a safe and reliable federated evolution system for vehicle-road cooperative autonomous driving decision-making, which includes: multiple vehicle intelligent agent terminals, each terminal including at least a local computing module for performing local safety decisions and recording safety events, and a credential generation module for generating credentials; a federated server including at least a credential verification and weight calculation module for verifying credentials and calculating aggregation weights, and a federated aggregation engine for performing weighted aggregation; and a permissioned blockchain network communicatively connected to the credential generation module and to the credential verification and weight calculation module, used for storing and verifying credentials.
[0027] Specifically, after the vehicle deploys a new model, a monitoring period is started. The system tracks performance metrics in common scenarios (such as smooth following) and safety-critical scenarios, such as: comfort (acceleration change rate), traffic efficiency (average speed), and safety kernel intervention frequency.
[0028] If, within a fixed period (such as 100 hours of driving), the safety kernel intervention frequency increases by more than a certain threshold (such as 50%) compared with the previous model version, or comfort decreases significantly, performance is deemed to have degraded. The system automatically rolls back to the previous version of the local model to ensure safety. At the same time, the rollback event is recorded as a "negative contribution" and uploaded to the server together with the scenario data. In subsequent aggregations, the federated server can, at its discretion, reduce the weight of the data or nodes associated with the failed update, so that the system corrects itself and becomes more robust.
Specific embodiment: Suppose the present invention is implemented in a fleet of 1000 intelligent connected vehicles.
Scenario trigger: Vehicle A is driving on a highway curve in rainy weather (the road friction coefficient is reduced). The preceding Vehicle B suddenly decelerates. Vehicle A's cooperative perception system (combining on-board sensors and roadside information) predicts a high conflict probability that reaches the third threshold T3 (0.3), triggering the recording of a safety-critical scenario.
Decision-making and recording: Vehicle A's safety kernel calculates the current minimum safe distance and the actual inter-vehicle distance; their ratio is 1.125 < T1 (1.2), which further confirms the trigger. The elastic decision layer outputs a strong deceleration, and the actual trajectory is smooth, stable, and within the safety envelope. The system records the complete event data packet.
Contribution evaluation and on-chain recording: Local evaluation shows that the actual trajectory margin in this event is good and that, compared with historical data for similar scenarios, the decision is more decisive (higher efficiency). The contribution score is calculated as 85 points. Vehicle A signs the event digest and sends it to the blockchain network to obtain the credential {TxHash: "0xabc…", Block#: 123456}.
[0030] Federated aggregation: In this round, 200 vehicles contributed. After the federated server verifies all credentials, it reads the contribution scores. Suppose Vehicle A's contribution score is 85 and the total contribution score of all vehicles is 15000. Vehicle A's local data distribution (mostly "rainy highway" scenarios) differs considerably from the global distribution: the KL divergence is 0.8 and the diversity coefficient is 0.5, so the adjustment factor is 1.4. The aggregation weight of Vehicle A is therefore (85 / 15000) × 1.4 ≈ 0.00793, which is higher than the weight it would obtain from its contribution ratio alone (0.00567). The server performs weighted aggregation to generate a new generation of the global model.
Deployment and integration: The new fleet model is distributed. Vehicle A replaces its old local safety parameters with the safety parameters from the new model. For the driving style parameters, Vehicle A's driver prefers a smooth driving style and sets the personalized retention factor to 0.7, so the new local strategy retains 70% of the original smooth style and incorporates 30% of the new knowledge from the global model.
[0031] Monitoring: In the following days, vehicle A's braking performance in the rain was monitored to be smoother and safer, with no abnormal rollback occurring.
[0032] This invention features safety-oriented evolution: by combining formal safety rules with learning models and using safety contribution as the core weight of federated aggregation, it guides vehicle swarm intelligence to continuously evolve in a safer direction, rather than simply pursuing task efficiency.
[0033] In another aspect, the present invention also discloses a computer-readable storage medium storing a computer program, which, when executed by a processor, causes the processor to perform the steps of the method described above.
[0034] In another aspect, the present invention also discloses a computer device, including a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor performs the steps of the method described above.
[0035] In another embodiment provided in this application, a computer program product containing instructions is also provided, which, when run on a computer, causes the computer to execute the vehicle-road cooperative autonomous driving decision-making method of any of the above embodiments.
[0036] It is understood that the systems, devices, and storage media provided in the embodiments of the present invention correspond to the methods provided in the embodiments of the present invention, and the explanations, examples, and beneficial effects of the relevant content can be referred to the corresponding parts of the above methods.
[0037] In the above embodiments, implementation can be achieved entirely or partially through software, hardware, firmware, or any combination thereof. When implemented using software, it can be implemented entirely or partially as a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium that a computer can access or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)).
[0038] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0039] The various embodiments in this specification are described in a related manner, and similar or identical parts of the embodiments may be cross-referenced. Each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are basically similar to the method embodiments, so their description is relatively brief; for the relevant parts, refer to the descriptions of the method embodiments.
[0040] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning, characterized in that the method is executed in an architecture comprising multiple vehicle intelligent agents and a federated server, each vehicle intelligent agent deploying a local safety decision model, and the method includes the following steps:
Step S1: Local safety decision-making and event recording step. The vehicle intelligent agent uses its local safety decision model to process real-time vehicle-road cooperative perception data and generate driving decisions. The local safety decision model includes a safety kernel based on formal rules and an elastic decision layer based on hierarchical reinforcement learning. When the vehicle enters a preset safety-critical scenario, the verification data of the safety kernel and the decision data of the elastic decision layer are recorded to generate a local safety event data packet.
Step S2: Safety contribution assessment and credential generation step. Based on the local safety event data packet, the decision safety contribution of the vehicle intelligent agent in the safety-critical scenario is assessed, and a verifiable credential is generated based on the safety contribution.
Step S3: Federated aggregation request step. The vehicle intelligent agent uploads the credential together with the updated model parameters of its local safety decision model to the federated server.
Step S4: Trusted federated aggregation step. The federated server verifies each received credential and calculates the aggregation weight of the corresponding vehicle intelligent agent based on the verified credential. Based on the aggregation weights, the model parameter updates from the multiple vehicle intelligent agents are weighted and aggregated to generate a global safety decision model.
Step S5: Model secure distribution and deployment step. The global safety decision model is distributed to each vehicle intelligent agent, and the vehicle intelligent agent integrates it with local personalized parameters to update its local safety decision model.
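By way of non-limiting illustration, the credential handling of steps S2 to S4 can be sketched as follows. This is a minimal sketch under a loud assumption: a shared-secret HMAC stands in for the verifiable credential, whereas the claims do not fix the scheme and claim 8 refers to a permissioned blockchain. The function names and the payload fields are hypothetical.

```python
import hashlib
import hmac
import json


def issue_credential(vehicle_id: str, contribution: float, key: bytes) -> dict:
    # Vehicle side (step S2): bind the contribution score to a keyed tag.
    payload = json.dumps(
        {"vehicle_id": vehicle_id, "contribution": contribution}, sort_keys=True
    )
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_credential(credential: dict, key: bytes):
    # Server side (step S4): recompute the tag and compare in constant time.
    expected = hmac.new(
        key, credential["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, credential["tag"]):
        return None  # rejected: this vehicle gets no aggregation weight
    return json.loads(credential["payload"])["contribution"]
```

A credential whose payload is altered after issuance fails verification, so the server only computes aggregation weights from scores it can check.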
2. The vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning according to claim 1, characterized in that in step S1, the safety kernel based on formal rules is specifically used to: calculate a dynamic safety behavior envelope from real-time perception data and vehicle dynamics using a responsibility-sensitive safety model, wherein the driving decisions output by the elastic decision layer must be constrained within this dynamic safety behavior envelope.
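By way of non-limiting illustration, one component of such an envelope can be sketched with the published responsibility-sensitive safety (RSS) minimum longitudinal following distance. Reducing the envelope to this single longitudinal constraint, and the clamping rule in `clamp_to_envelope`, are simplifying assumptions of this sketch; all names and parameter values are hypothetical.

```python
def rss_min_longitudinal_gap(
    v_rear: float,         # speed of the following (ego) vehicle, m/s
    v_front: float,        # speed of the lead vehicle, m/s
    response_time: float,  # ego response time, s
    a_max_accel: float,    # max ego acceleration during the response time, m/s^2
    b_min_brake: float,    # minimum braking the ego is assumed to apply, m/s^2
    b_max_brake: float,    # maximum braking the lead vehicle may apply, m/s^2
) -> float:
    # Worst case: ego accelerates for response_time, then brakes gently,
    # while the lead vehicle brakes as hard as possible.
    v_after = v_rear + response_time * a_max_accel
    gap = (
        v_rear * response_time
        + 0.5 * a_max_accel * response_time ** 2
        + v_after ** 2 / (2.0 * b_min_brake)
        - v_front ** 2 / (2.0 * b_max_brake)
    )
    return max(gap, 0.0)


def clamp_to_envelope(requested_accel: float, actual_gap: float,
                      min_gap: float, b_min_brake: float) -> float:
    # Hypothetical intervention rule: inside the unsafe region, force braking
    # of at least b_min_brake; otherwise pass the learned decision through.
    if actual_gap < min_gap:
        return min(requested_accel, -b_min_brake)
    return requested_accel
```

For example, with both vehicles at 20 m/s, a 1 s response time, and limits of 2, 4 and 8 m/s^2, the minimum gap evaluates to 56.5 m; a requested acceleration at a 40 m gap would then be overridden by braking.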
3. The vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning according to claim 2, characterized in that the specific steps of generating the local safety event data packet in step S1 include: recording the real-time boundary value of the dynamic safety behavior envelope, the distance of the decision output by the elastic decision layer relative to the boundary, and the vehicle's final actual driving trajectory; when the decision of the elastic decision layer triggers safety kernel intervention or is in an edge state near the boundary, the record is marked as valid safety event data.
The preset safety-critical scenario is triggered by at least one of the following conditions:
(a) the ratio between the theoretical minimum safe distance determined by the real-time dynamic safety behavior envelope and the actual inter-vehicle distance is calculated, and the scenario is triggered when this ratio is less than or equal to the first threshold T1;
(b) the minimum relative margin between the decision output by the elastic decision layer and the boundary of the dynamic safety behavior envelope is calculated, and the scenario is triggered when this margin is less than or equal to the second threshold T2;
(c) the probability of a future spatiotemporal conflict is predicted based on vehicle-road cooperative perception, and the scenario is triggered when this probability is greater than or equal to the third threshold T3;
wherein T1, T2, and T3 are preset constants or parameters that are dynamically adjusted according to the performance of the global model.
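By way of non-limiting illustration, the "at least one of" trigger logic of conditions (a) to (c) can be sketched as follows. The three inputs are assumed to have been computed as the claim describes; the class name, function name and threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    t1: float  # first threshold, for the distance ratio of condition (a)
    t2: float  # second threshold, for the relative margin of condition (b)
    t3: float  # third threshold, for the conflict probability of condition (c)


def is_safety_critical(distance_ratio: float, relative_margin: float,
                       conflict_probability: float, th: Thresholds) -> bool:
    # The scenario is triggered when any single condition holds.
    return (
        distance_ratio <= th.t1
        or relative_margin <= th.t2
        or conflict_probability >= th.t3
    )


# Illustrative threshold values only; the claim leaves them preset or adaptive.
th = Thresholds(t1=1.0, t2=0.1, t3=0.6)
print(is_safety_critical(1.5, 0.05, 0.2, th))  # condition (b) holds
```

Keeping the thresholds in one object makes it straightforward to replace them when they are adjusted according to the performance of the global model.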
4. The vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning according to claim 3, characterized in that the assessment of the safety contribution is based on the following metrics: the recorded actual driving trajectory of the vehicle is compared with the dynamic safety behavior envelope calculated in advance by the safety kernel to calculate the trajectory safety margin; the safety contribution is positively correlated with the trajectory safety margin, and a basic positive contribution is obtained when the actual driving trajectory is completely within the dynamic safety behavior envelope; if the actual driving trajectory is closer to the envelope boundary than the historical average trajectory without triggering safety kernel intervention, an additional efficiency optimization contribution is awarded.
5. The vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning according to claim 1, characterized in that in step S4, the aggregation weight is calculated based on the credential as follows: $w_i = \frac{C_i}{\sum_j C_j} \cdot \alpha_i$, where $C_i$ is the safety contribution score of vehicle intelligent agent $i$ obtained by decoding the credential; $\sum_j C_j$ is the sum of the safety contribution scores of all vehicle intelligent agents participating in this round of federated aggregation; and $\alpha_i$ is the local data diversity adjustment factor of vehicle intelligent agent $i$, which is positively correlated with the degree of difference between the local data distribution and the global data distribution. The adjustment factor $\alpha_i$ is calculated as $\alpha_i = 1 + \beta \cdot D_{KL}(P_i \,\|\, P_{global})$, where $D_{KL}(P_i \,\|\, P_{global})$ is the KL divergence between the local scene data distribution $P_i$ of vehicle intelligent agent $i$ and the global scene data distribution $P_{global}$ maintained in the cloud, and $\beta$ is a preset coefficient that encourages diversity. The global scene data distribution $P_{global}$ is statistically maintained by the federated server from the anonymized scene feature vectors uploaded by each vehicle in each round of federated aggregation.
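By way of non-limiting illustration, the weight calculation and the weighted aggregation of step S4 can be sketched as follows. The sketch assumes discrete scene distributions given as probability lists, the adjustment factor form `1 + beta * KL` (consistent with the worked example in the description), and parameter updates given as flat lists of floats; all function names are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for discrete distributions; eps guards against q entries of zero.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def aggregation_weights(scores, local_dists, global_dist, beta):
    # Contribution ratio times the diversity adjustment factor, per vehicle.
    total = sum(scores)
    weights = []
    for score, local_dist in zip(scores, local_dists):
        alpha = 1.0 + beta * kl_divergence(local_dist, global_dist)
        weights.append(score / total * alpha)
    return weights


def weighted_aggregate(updates, weights):
    # Element-wise weighted sum of the per-vehicle parameter updates.
    dim = len(updates[0])
    return [sum(w * u[k] for w, u in zip(weights, updates)) for k in range(dim)]
```

The weights are used here exactly as the formula gives them and are not renormalized to sum to one; whether to renormalize is left open in this sketch.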
6. The vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning according to claim 1, characterized in that in step S5, the vehicle intelligent agent fuses the global safety decision model with local personalized parameters, specifically including: the model parameters in the global safety decision model that are strongly related to safety directly replace the corresponding local parameters; for the personalized parameter layer in the elastic decision layer that is related to driving style and local road conditions, a weighted average is used to merge the global model parameters with the existing local parameters, so as to retain personalized driving strategies while ensuring safety consensus.
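By way of non-limiting illustration, the fusion of step S5 can be sketched as follows. Models are assumed to be dictionaries mapping a parameter name to a list of floats, and the set of safety-related names is assumed to be known in advance; the function name, parameter names and the `keep_local` weight are hypothetical.

```python
def fuse_models(global_params, local_params, safety_keys, keep_local):
    fused = {}
    for name, local_value in local_params.items():
        global_value = global_params[name]
        if name in safety_keys:
            # Safety-related parameters: the global values replace the local ones.
            fused[name] = list(global_value)
        else:
            # Personalized parameters: weighted average of local and global values.
            fused[name] = [
                keep_local * l + (1.0 - keep_local) * g
                for l, g in zip(local_value, global_value)
            ]
    return fused


fused = fuse_models(
    global_params={"safety_layer": [1.0, 2.0], "style_layer": [0.0, 0.0]},
    local_params={"safety_layer": [5.0, 5.0], "style_layer": [1.0, 1.0]},
    safety_keys={"safety_layer"},
    keep_local=0.7,
)
print(fused)
```

With `keep_local=0.7`, the style layer keeps 70% of the local values, while the safety layer is taken entirely from the global model.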
7. The vehicle-road cooperative autonomous driving decision-making method based on hierarchical reinforcement learning according to claim 1, characterized in that the method further includes a continuous model effectiveness monitoring step S6: after the updated local safety decision model is deployed in the vehicle intelligent agent, its performance indicators are continuously monitored in common and safety-critical scenarios; if the performance drops below a preset threshold, the system automatically rolls back to the previous version of the model and uploads this event as negative feedback data to the federated server to adjust subsequent federated aggregation strategies.
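By way of non-limiting illustration, the monitoring and rollback of step S6 can be sketched as follows. The sketch assumes a single scalar performance score and keeps the negative feedback in a local list in place of an upload to the federated server; the class name, its methods and the feedback fields are hypothetical.

```python
class ModelManager:
    def __init__(self, current, threshold: float):
        self.current = current      # model version currently in use
        self.previous = None        # last known-good version, if any
        self.threshold = threshold  # minimum acceptable performance score
        self.feedback = []          # negative feedback queued for the server

    def deploy(self, new_model) -> None:
        # Keep the outgoing version so that a rollback remains possible.
        self.previous = self.current
        self.current = new_model

    def check(self, score: float) -> bool:
        # Roll back and record negative feedback when performance is too low.
        if score < self.threshold and self.previous is not None:
            self.feedback.append(
                {"event": "rollback", "model": self.current, "score": score}
            )
            self.current = self.previous
            self.previous = None
            return True
        return False
```

`check` returns `True` only when a rollback actually happened, so the caller can decide when to send the queued feedback.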
8. A safe and reliable vehicle-road cooperative autonomous driving decision-making federated evolution system, characterized in that the system is used to implement the method according to any one of claims 1 to 7 and comprises: multiple vehicle intelligent agent terminals, each terminal including at least a local computing module for performing local safety decision-making and recording safety events, and a credential generation module for generating credentials; a federated server, including at least a credential verification and weight calculation module for verifying credentials and calculating aggregation weights, and a federated aggregation engine for performing weighted aggregation; and a permissioned blockchain network, communicatively connected to the credential generation module and to the credential verification and weight calculation module, for storing and verifying the credentials.
9. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it causes the processor to perform the steps of the method according to any one of claims 1 to 7.
10. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, when the computer program is executed by the processor, it causes the processor to perform the steps of the method according to any one of claims 1 to 7.