Cooperative control method for unmanned trucks at open-pit mine intersections with dynamic working-condition identification
By deploying on open-pit mine unmanned trucks a collaborative control system that combines onboard edge computing with a cloud-based training and control center, the method corrects physical model parameters in real time and performs dual safety checks. This solves the problem of collaborative intersection passage under dynamic working conditions and improves the safety and energy efficiency of the transportation system.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: CHINA UNIV OF MINING & TECH
- Filing Date: 2026-04-09
- Publication Date: 2026-06-19
AI Technical Summary
Existing unmanned collaborative control methods for open-pit mine intersections are ill-suited to the complex dynamic conditions of mines, leading to frequent rear-end collisions and skidding accidents. Furthermore, purely data-driven AI decision-making lacks physical safety verification and fails to meet the inherent safety requirements of mines, and existing strategies fail to fully utilize regenerative braking during heavy-load downhill driving.
The invention employs a collaborative control system that combines onboard edge computing terminals with a cloud-based training and control center, with multi-vehicle status exchanged over a V2X direct communication network. By integrating data preprocessing, parameter identification and state estimation, collaborative decision-making control, and a dual-loop safety circuit breaker module, it corrects physical model parameters in real time, performs dual safety checks, and optimizes energy efficiency.
It achieves all-weather adaptability, eliminates safety accidents, ensures robust intersection passage and efficient energy management, and meets mine safety requirements.

Figure CN122245141A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to a collaborative control method for unmanned trucks at intersections, and specifically to a collaborative control method for unmanned trucks at open-pit mine intersections that identifies dynamic operating conditions.
Background Art
[0002] With the large-scale application of unmanned driving technology in open-pit mines, collaborative traffic flow at intersections has become a key bottleneck restricting transportation efficiency and safety. However, existing scheduling methods (such as rule-based control or general-purpose reinforcement learning) are mostly built on static environment models and flat-road assumptions, which makes it difficult for them to adapt to the complex dynamic conditions in mines. On the one hand, the adhesion coefficient of mine roads fluctuates drastically due to rain, snow, and water-spraying operations, and the actual vehicle load changes in real time with fuel consumption and ore adhesion; dynamic boundaries calculated from fixed parameters are therefore distorted, which can easily cause rear-end collisions or skidding accidents. On the other hand, purely data-driven AI decision-making lacks deterministic physical safety verification, carries a risk of unexplainable failures, and fails to meet the inherent safety requirements of mines. In addition, existing strategies ignore the regenerative braking characteristics of electric-wheel mining trucks descending under heavy load, and thus fail to fully utilize gravitational potential energy for energy recovery. A collaborative control system integrating dynamic parameter identification, dual-loop safety circuit breaking, and energy efficiency optimization is therefore urgently needed.
Summary of the Invention
[0003] To address the problems existing in the prior art, this invention provides a collaborative control method for unmanned trucks at intersections in open-pit mines that identifies dynamic working conditions. It integrates dynamic parameter identification, dual-loop safety circuit breaking, and energy efficiency optimization, and solves the problem of collaborative passage under complex working conditions by real-time correction of physical model parameters and dual safety verification.
[0004] To achieve the above objectives, the present invention provides the following technical solution: a collaborative control system for unmanned trucks at open-pit mine intersections that identifies dynamic working conditions, comprising on-board edge computing terminals and a cloud-based training and control center. The on-board edge computing terminals interact with each other via a V2X direct communication network. Each on-board edge computing terminal includes a data preprocessing module, a parameter identification and state estimation module, a collaborative decision-making main control module, and a dual-loop safety circuit breaker module. The data preprocessing module is connected to the on-board sensor terminal, the collaborative decision-making main control module, the parameter identification and state estimation module, and the cloud-based training and control center; the parameter identification and state estimation module and the cloud-based training and control center are both connected to the collaborative decision-making main control module; and the collaborative decision-making main control module is connected to the dual-loop safety circuit breaker module. Data preprocessing module: processes the high-noise heterogeneous data uploaded from the on-board sensors; it also records the state transitions, action executions, and reward feedback within a single vehicle scheduling cycle, and constructs a prioritized experience replay matrix that eliminates the temporal correlation of the data, providing standardized samples for model training. Parameter identification and state estimation module: receives the perception data processed by the data preprocessing module, calculates in real time the tire adhesion coefficient μ and the actual load mass m of the vehicle at the current intersection, and constructs the vehicle's current dynamic physical safety envelope based on the slope-load coupling model.
Collaborative decision-making main control module: contains a high-precision 3D map and is equipped with an agent network based on gradient-load-aware multi-agent reinforcement learning, which generates normalized expected acceleration commands for intersection passage from the vehicle state and the physical envelope. Dual-loop safety circuit breaker module: performs physical feasibility verification and collision risk prediction on AI-generated commands; when a command exceeds the physical envelope or the time to collision (TTC) falls below the threshold, it forcibly takes over and issues a minimum risk maneuver (MRM). Cloud-based training and control center: equipped with a Critic network, it draws prioritized experience samples from the experience replay database and combines them with a multi-objective reward function to calculate the temporal-difference error; it then performs centralized evaluation and gradient updates of the policies of all vehicle-side Actor networks and sends the updated policy parameters to the collaborative decision-making main control module.
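The prioritized experience replay matrix named in [0004] can be illustrated with a minimal proportional prioritized replay buffer. This is a sketch under stated assumptions, not the patented implementation: each transition's priority is taken as $(|\delta| + \epsilon)^\alpha$ from its temporal-difference error $\delta$, as in the common proportional scheme, and the names `Transition`, `PrioritizedReplay`, `alpha`, and `eps` are hypothetical. The importance-sampling weights that such schemes usually apply to correct sampling bias are omitted for brevity.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transition:
    """One (state, action, reward, next_state) record from a scheduling cycle."""
    state: Tuple[float, ...]
    action: float
    reward: float
    next_state: Tuple[float, ...]


class PrioritizedReplay:
    """Hypothetical proportional prioritized replay buffer (ring buffer)."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-3):
        self.capacity = capacity
        self.alpha = alpha  # 0 = uniform sampling, 1 = fully proportional to |TD error|
        self.eps = eps      # keeps transitions with zero TD error drawable
        self.data: List[Transition] = []
        self.prios: List[float] = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def _priority(self, td_error: float) -> float:
        return (abs(td_error) + self.eps) ** self.alpha

    def add(self, t: Transition, td_error: float) -> None:
        p = self._priority(td_error)
        if len(self.data) < self.capacity:
            self.data.append(t)
            self.prios.append(p)
        else:  # overwrite the oldest entry
            self.data[self.pos] = t
            self.prios[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, k: int, rng: random.Random) -> List[int]:
        """Draw k indices with probability proportional to priority."""
        return rng.choices(range(len(self.data)), weights=self.prios, k=k)

    def update(self, idx: int, td_error: float) -> None:
        """Refresh a priority after the Critic recomputes its TD error."""
        self.prios[idx] = self._priority(td_error)
```

Sampling in proportion to priority draws transitions out of time order, which gives the decorrelation effect the text describes, while still favoring samples the Critic currently predicts poorly.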
[0005] Furthermore, the on-board sensor includes one or more of the following: wheel speed sensor, differential GPS / IMU integrated navigation unit, hydropneumatic suspension pressure sensor, drive motor torque sensor, and brake temperature sensor.
[0006] Furthermore, the parameter identification and state estimation module includes working-condition identification logic, and the identification method is as follows. Road surface adhesion state assessment: the wheel linear velocity $v_w$ measured by the wheel speed sensor and the vehicle ground speed $v$ measured by the high-precision differential GPS/IMU integrated navigation unit are acquired in real time and used to calculate the tire longitudinal slip ratio $\lambda$. When $\lambda$ exceeds the preset linear-region threshold, the current peak road adhesion coefficient $\mu$ is back-calculated using an extended Kalman filter combined with a tire dynamics model. Load mass state assessment: data from the hydropneumatic suspension pressure sensors are collected and initially calibrated against the vehicle's static weight; during operation, the vehicle mass estimate is corrected in real time by the recursive least squares method using the motor drive torque equation $T_e / r = m a + F_{\text{resistance}}$, eliminating mass deviations caused by fuel consumption and ore adhesion. In the formula, $T_e$ is the torque output by the drive motor; $m$ is the dynamic total mass of the vehicle to be identified; $a$ is the current longitudinal acceleration of the vehicle; $F_{\text{resistance}}$ is the total resistance force currently acting on the vehicle; and $r$ is the effective rolling radius of the drive wheels.
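The adhesion-estimation step in [0006] can be sketched as a scalar extended Kalman filter over the peak adhesion coefficient μ. The following are our assumptions, not details from the patent: the slip ratio uses a unified drive/brake definition; the tire dynamics model is a simplified Magic-Formula curve $F_x = \mu F_z \sin(C \arctan(B\lambda))$ with illustrative $B$ and $C$; the measured longitudinal tire force $F_x$ and normal load $F_z$ are available; and the threshold `lam_lin` and all names are hypothetical.

```python
import math
from dataclasses import dataclass


def slip_ratio(v_wheel: float, v_ground: float, eps: float = 0.1) -> float:
    """Longitudinal slip ratio (assumed unified drive/brake definition)."""
    return (v_wheel - v_ground) / max(abs(v_wheel), abs(v_ground), eps)


def shape(lam: float, b: float = 10.0, c: float = 1.9) -> float:
    """Normalized Magic-Formula shape sin(C*atan(B*lam)); B and C are illustrative."""
    return math.sin(c * math.atan(b * lam))


@dataclass
class MuEKF:
    mu: float = 0.8        # prior peak adhesion coefficient
    p: float = 0.1         # estimate variance
    q: float = 1e-4        # random-walk process noise
    r: float = 1e6         # force-measurement noise variance, N^2
    lam_lin: float = 0.03  # hypothetical linear-region slip threshold

    def step(self, v_wheel: float, v_ground: float, fz: float, fx_meas: float) -> float:
        self.p += self.q                  # predict: mu modeled as a random walk
        lam = slip_ratio(v_wheel, v_ground)
        if abs(lam) <= self.lam_lin:      # linear region: peak mu poorly observable
            return self.mu
        h = fz * shape(lam)               # Jacobian d(F_x)/d(mu)
        k = self.p * h / (h * h * self.p + self.r)
        self.mu += k * (fx_meas - self.mu * h)  # correct with force innovation
        self.p *= 1.0 - k * h
        return self.mu
```

Because this measurement model is linear in μ, the EKF update here coincides with an ordinary Kalman update; a richer tire model in which μ also changes the curve shape would make the Jacobian state-dependent.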
[0007] Furthermore, the collaborative decision-making main control module performs intersection collaborative control of the unmanned trucks as follows. S301: Define the multi-agent state space S and action space A; the state space S contains the normalized position, speed, remaining battery charge, current intersection branch slope θ, identified load m, and adhesion coefficient μ of all vehicles; the action space A consists of continuous acceleration control commands. S302: Construct the reinforcement learning model, consisting of an Actor network deployed on the vehicle and a Critic network deployed in the cloud training and control center; during the distributed execution phase, the Actor network outputs the vehicle's scheduling actions in real time, and during the centralized training phase, the Critic network evaluates the policy performance of the Actor network based on the global state and joint actions. The Actor network uses a hierarchical feature-encoding architecture comprising a basic feature encoding layer and a physical coupling sensing layer. The physical coupling sensing layer introduces a gated modulation mechanism that uses the slope θ and load m to generate a gating signal.
Nonlinear weighting is applied to the basic features through the gating signal $g = \sigma(W_g [m, \theta] + b_g)$. In the formula, $g$ is the generated physical gating signal; $\sigma$ is the Sigmoid nonlinear activation function; $W_g$ is the learnable weight matrix of the gated neural network; $[m, \theta]$ is the physical feature input vector formed by concatenating the dynamic load $m$ output by the parameter identification and state estimation module with the current intersection branch slope $\theta$; and $b_g$ is the learnable bias vector of the gated neural network. S303: Calculate multidimensional adaptive attention weights, processing vehicle interaction information with four parallel attention heads. Spatiotemporal head: calculates the correlation between vehicles from their Euclidean distance and speed difference. Physics head: calculates the correlation arising from the potential-energy difference between vehicles due to gradient and load, $\Delta E_{ij} = \lvert m_i g h_i - m_j g h_j \rvert$. In the formula, $\Delta E_{ij}$ is the absolute difference in gravitational potential energy between vehicle $i$ and vehicle $j$; $m_i$ is the dynamic load mass of vehicle $i$ after real-time correction by the parameter identification and state estimation module; $m_j$ is the dynamic load mass of neighboring vehicle $j$ within the interaction range, broadcast via V2X; $g$ is the gravitational acceleration constant; $h_i$ is the relative elevation of the intersection branch where vehicle $i$ is currently located; and $h_j$ is the relative elevation of the intersection branch where neighboring vehicle $j$ is currently located. Task head: assigns a weight matrix based on vehicle task priority. Risk head: generates a mask matrix from the reciprocal of the estimated time to collision $\mathrm{TTC}_{ij}$, forcing attention onto high-risk vehicles. The final feature representation is the weighted fusion of the four heads, $h' = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \mathrm{head}_3, \mathrm{head}_4)\, W^O$. In the formula, $h'$ is the enhanced feature vector after processing and fusion by the multidimensional adaptive attention mechanism; Concat is the vector concatenation operation; $\mathrm{head}_1$, $\mathrm{head}_2$, $\mathrm{head}_3$, and $\mathrm{head}_4$ are the local feature vectors output by the spatiotemporal, physics, task, and risk heads, respectively; and $W^O$ is the linear transformation weight matrix of the output layer. S304: Generate and constrain actions. The raw action $a_{\text{raw}}$ output by the Actor network is mapped by a state-dependent differentiable mapping function into the physical boundary $[a_{\min}(S), a_{\max}(S)]$ calculated by the parameter identification module, giving the actual control command $a_{\text{real}} = a_{\min}(S) + \frac{a_{\text{raw}} + 1}{2}\bigl(a_{\max}(S) - a_{\min}(S)\bigr)$. In the formula, $a_{\text{real}}$ is the actual control command ultimately output to the vehicle's underlying controller for execution; $a_{\text{raw}}$ is the dimensionless raw action value output directly by the Actor network, restricted to $[-1, 1]$ by the activation function; $a_{\max}(S)$ is the upper boundary, the physically achievable maximum acceleration calculated by the parameter identification and state estimation module from the current vehicle dynamic state $S$; and $a_{\min}(S)$ is the lower boundary, the physically achievable minimum acceleration (i.e., the maximum physical braking deceleration of the vehicle under the current operating conditions) calculated by the same module from $S$.
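Steps S302 and S304 of [0007] can be illustrated in plain Python. This sketch assumes that the gating signal $g = \sigma(W_g[m, \theta] + b_g)$ multiplies the basic feature vector element-wise, and that the state-dependent mapping is the affine map of $[-1, 1]$ onto $[a_{\min}(S), a_{\max}(S)]$; both are plausible readings of the text rather than confirmed details, and the function names are hypothetical. In practice $m$ and θ would be normalized before gating, as S301 normalizes the state.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def physical_gate(features: Sequence[float], m: float, theta: float,
                  w_g: Sequence[Sequence[float]], b_g: Sequence[float]) -> List[float]:
    """g = sigmoid(W_g [m, theta] + b_g), then element-wise g * features."""
    x = (m, theta)
    g = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w_g, b_g)]
    return [gi * fi for gi, fi in zip(g, features)]


def map_action(a_raw: float, a_min: float, a_max: float) -> float:
    """Affine map of a_raw in [-1, 1] onto the envelope [a_min(S), a_max(S)]."""
    a_raw = max(-1.0, min(1.0, a_raw))  # guard only; the Actor output is already bounded
    return a_min + 0.5 * (a_raw + 1.0) * (a_max - a_min)
```

The affine map is differentiable in $a_{\text{raw}}$, so gradients can flow through it during training; the clamp only guards against inputs outside $[-1, 1]$, which a bounded activation such as tanh would not produce.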
[0008] Furthermore, the execution logic of the dual-loop safety circuit breaker module is as follows. First loop: receive the command $a_{\text{real}}$ output by the collaborative decision-making main control module and check again whether it satisfies $a_{\min} \le a_{\text{real}} \le a_{\max}$; if it exceeds these limits, it is truncated directly, where $a_{\min}$ is the maximum physical braking deceleration calculated from the current brake temperature $T_{\text{brake}}$ and the road surface adhesion coefficient μ; if it does not exceed them, the command proceeds to the second loop. Second loop: substitute the command into the vehicle kinematics model to predict the trajectory over the next N seconds; if the minimum distance between the predicted trajectory and other vehicles or obstacles is less than the dynamic safety threshold $D_{\text{safe}}$, the original command is immediately blocked, the emergency braking logic is triggered, and an alarm signal is sent to the roadside base station; otherwise, the original command is executed normally.
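The first loop of [0008] can be sketched as a clamp whose lower bound depends on μ and brake temperature. The patent does not give the $a_{\min}$ formula; here it is assumed to be the adhesion limit $-\mu g$ scaled by a hypothetical linear brake-fade factor between 300 °C and 500 °C, with slope effects ignored. All thresholds and function names are illustrative.

```python
from typing import Tuple

G = 9.81  # gravitational acceleration, m/s^2


def brake_derating(t_brake: float, t_start: float = 300.0, t_max: float = 500.0,
                   k_min: float = 0.5) -> float:
    """Hypothetical fade factor: 1 below t_start, linear down to k_min at t_max (deg C)."""
    if t_brake <= t_start:
        return 1.0
    if t_brake >= t_max:
        return k_min
    return 1.0 - (1.0 - k_min) * (t_brake - t_start) / (t_max - t_start)


def a_min_physical(mu: float, t_brake: float) -> float:
    """Lower bound (most negative acceleration): adhesion limit scaled by brake fade."""
    return -mu * G * brake_derating(t_brake)


def first_loop(a_real: float, mu: float, t_brake: float, a_max: float) -> Tuple[float, bool]:
    """Clamp a_real into [a_min, a_max]; the flag reports whether truncation happened."""
    lo = a_min_physical(mu, t_brake)
    clamped = min(max(a_real, lo), a_max)
    return clamped, clamped != a_real
```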
[0009] Furthermore, the cloud-based training and control center uses a multi-objective reward function to guide training: $R_{\text{total}} = w_1 R_{\text{speed}} + w_2 R_{\text{progress}} + w_3 R_{\text{collision}} + w_4 R_{\text{energy}}$. In the formula, $R_{\text{total}}$ is the total reward obtained by the agent within a single decision step; $w_1$, $w_2$, $w_3$, and $w_4$ are the weight coefficients of the individual reward terms; $R_{\text{speed}}$ is the speed reward; $R_{\text{progress}}$ is the progress reward; $R_{\text{collision}}$ is the collision penalty, a large negative reward given when the distance between the vehicle and other vehicles or obstacles is less than a safe threshold, or when the expected time to collision is less than a safe minimum; and $R_{\text{energy}}$ is the energy management reward: when a heavily loaded vehicle is descending a slope toward an intersection, a positive reward is given if its speed is kept within the motor's high-efficiency regenerative braking range $v_{\text{low}} \le v \le v_{\text{high}}$, and a negative penalty is imposed otherwise.
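The weighted reward of [0009] can be sketched as follows. The patent specifies only the structure of the terms; the individual definitions used here (speed as a ratio to the envelope limit, progress as displacement per step, a fixed crash penalty, and ±1 for the regenerative speed band) and every numeric default are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StepInfo:
    v: float              # current speed, m/s
    v_env_max: float      # max speed allowed by the physical safety envelope, m/s
    progress: float       # displacement toward the target node in this step, m
    min_gap: float        # distance to the nearest vehicle or obstacle, m
    ttc: float            # predicted time to collision, s
    heavy_downhill: bool  # loaded and descending toward the intersection


def total_reward(s: StepInfo,
                 w: Tuple[float, float, float, float] = (1.0, 0.5, 1.0, 0.3),
                 d_safe: float = 15.0, ttc_min: float = 3.0,
                 v_low: float = 6.0, v_high: float = 9.0,
                 crash_penalty: float = -100.0) -> float:
    """R_total = w1*R_speed + w2*R_progress + w3*R_collision + w4*R_energy."""
    r_speed = min(s.v / s.v_env_max, 1.0) if s.v_env_max > 0 else 0.0
    r_progress = s.progress
    r_collision = crash_penalty if (s.min_gap < d_safe or s.ttc < ttc_min) else 0.0
    if s.heavy_downhill:  # energy term only active for a loaded downhill approach
        r_energy = 1.0 if v_low <= s.v <= v_high else -1.0
    else:
        r_energy = 0.0
    w1, w2, w3, w4 = w
    return w1 * r_speed + w2 * r_progress + w3 * r_collision + w4 * r_energy
```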
[0010] A collaborative control method for unmanned trucks at open-pit mine intersections that identifies dynamic operating conditions uses the aforementioned system and includes the following steps. S1: The on-board sensors collect vehicle operation data in real time and transmit it to the data preprocessing module, which uploads it to the cloud training and control center. The data preprocessing module performs timestamp alignment, outlier removal, and dimensionless normalization on the high-noise heterogeneous data uploaded from the on-board sensor terminal; it also records the state transitions, action executions, and reward feedback of the vehicle within a single scheduling cycle and constructs a prioritized experience replay matrix that eliminates the temporal correlation of the data, providing standardized samples for model training. S2: The parameter identification and state estimation module estimates the road adhesion coefficient from the slip ratio, corrects the vehicle mass using the dynamics equation, and combines these with high-precision map slope data to update the dynamic performance envelope of each vehicle in real time. S3: The collaborative decision-making main control module feeds the updated state into the agent network based on gradient-load-aware multi-agent reinforcement learning; after aggregating features through the multidimensional attention mechanism, it outputs an expected acceleration that balances efficiency and energy recovery. For heavily loaded downhill vehicles, it gives priority to a constant-speed command that maintains peak regenerative braking power.
S4: The dual-loop safety circuit breaker module performs dual verification of the physical feasibility and collision risk of the expected acceleration and outputs the final control command. S5: The vehicle's underlying controller executes the final command and asynchronously feeds back the state transition, action, and reward signals of each interaction to the cloud-based experience replay database; the Critic network uses this feedback to train and evaluate the model centrally, continuously iterating and optimizing the Actor network.
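The three preprocessing operations in S1 (timestamp alignment, outlier removal, and dimensionless normalization) can be sketched with the Python standard library. The concrete techniques are assumptions, not the patent's: linear interpolation resamples low-rate GPS data onto the high-rate wheel-speed timestamps, a median/MAD rule replaces outliers, and min-max scaling uses fixed physical bounds. Function names are hypothetical.

```python
import bisect
import statistics
from typing import List, Sequence


def align(t_src: Sequence[float], x_src: Sequence[float], t_dst: Sequence[float]) -> List[float]:
    """Linearly interpolate a low-rate signal onto high-rate timestamps (held at the ends)."""
    out = []
    for t in t_dst:
        i = bisect.bisect_right(t_src, t)  # t_src[i-1] <= t < t_src[i]
        if i == 0:
            out.append(x_src[0])
        elif i == len(t_src):
            out.append(x_src[-1])
        else:
            t0, t1, x0, x1 = t_src[i - 1], t_src[i], x_src[i - 1], x_src[i]
            out.append(x0 + (x1 - x0) * (t - t0) / (t1 - t0))
    return out


def mad_filter(x: Sequence[float], k: float = 3.0) -> List[float]:
    """Replace samples farther than k robust standard deviations from the median."""
    med = statistics.median(x)
    scale = 1.4826 * statistics.median(abs(v - med) for v in x)  # MAD -> sigma
    if scale == 0:
        return list(x)  # degenerate spread: leave data unchanged
    return [med if abs(v - med) > k * scale else v for v in x]


def normalize(x: Sequence[float], lo: float, hi: float) -> List[float]:
    """Dimensionless min-max scaling to [0, 1] against fixed physical bounds."""
    return [min(max((v - lo) / (hi - lo), 0.0), 1.0) for v in x]
```

The MAD rule leaves the data unchanged when more than half of the samples are identical (MAD = 0); a sliding-window filter would suit streaming sensor data better.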
[0011] Compared with the prior art, the present invention has the following advantages. All-weather adaptability: online parameter identification overcomes the failure of static models on rain-, snow-, and mud-affected roads, ensuring robust intersection traffic.
[0012] Intrinsic safety: the dual-loop safety circuit breaker mechanism eliminates safety accidents caused by AI algorithm "hallucinations" and meets mine safety requirements.
[0013] Energy saving and efficiency improvement: energy management is incorporated into the closed control loop, and the speed profile of heavily loaded downhill vehicles is optimized to maximize the recovery of potential energy.
Brief Description of the Drawings
[0014] Figure 1 is a block diagram of the system structure of the present invention; Figure 2 is a flowchart of the control method of the present invention; Figure 3 is a diagram of the gradient-load-aware multi-agent reinforcement learning network structure of the present invention; Figure 4 is the timing diagram of the dual-loop safety circuit breaker module.
Detailed Description of the Embodiments
[0015] The invention will now be further described with reference to the accompanying drawings.
[0016] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0017] As shown in Figure 1, this invention provides a collaborative control system for unmanned trucks at open-pit mine intersections, capable of identifying dynamic working conditions. The system includes on-board edge computing terminals and a cloud-based training and control center. Each unmanned truck is equipped with an on-board edge computing terminal and an on-board sensor terminal, and the on-board edge computing terminals of the trucks interact with each other via a V2X direct communication network. Each on-board edge computing terminal includes a data preprocessing module, a parameter identification and state estimation module, a collaborative decision-making main control module, and a dual-loop safety circuit breaker module. The data preprocessing module is connected to the on-board sensor terminal, the collaborative decision-making main control module, the parameter identification and state estimation module, and the cloud-based training and control center. The parameter identification and state estimation module and the cloud-based training and control center are both connected to the collaborative decision-making main control module, which is in turn connected to the dual-loop safety circuit breaker module.
[0018] The vehicle-mounted sensor includes one or more of the following: wheel speed sensor, high-precision differential GPS / IMU integrated navigation unit, hydro-pneumatic suspension pressure sensor, drive motor torque sensor, and brake temperature sensor.
[0019] The data preprocessing module performs timestamp alignment, outlier removal, and dimensionless normalization on the high-noise heterogeneous data (including high-frequency wheel speed data and low-frequency GPS data) uploaded from the on-board sensor terminal; it also records the state transitions, action executions, and reward feedback of the vehicle within a single scheduling cycle and constructs a prioritized experience replay matrix that eliminates the temporal correlation of the data, providing standardized samples for model training. Parameter identification and state estimation module: receives the perception data that the on-board sensors upload to the data preprocessing module and that this module has processed; calculates in real time the tire adhesion coefficient μ and the actual load mass m of the vehicle at the current intersection; and constructs the vehicle's current dynamic physical safety envelope based on the slope-load coupling model. The module has built-in operating-condition identification logic, and the specific identification method is as follows. Road surface adhesion state determination: the module collects in real time the wheel linear velocity $v_w$ measured by the wheel speed sensor and the vehicle ground speed $v$ measured by the high-precision differential GPS/IMU integrated navigation unit, and uses them to calculate the tire longitudinal slip ratio $\lambda$. When $\lambda$ exceeds the preset linear-region threshold, the current peak road adhesion coefficient $\mu$ is calculated using an extended Kalman filter (EKF) combined with a tire dynamics model.
Load mass state assessment: the parameter identification and state estimation module collects values from the hydropneumatic suspension pressure sensors and performs an initial calibration against the vehicle's static weight; during operation, it uses the motor drive torque equation $T_e / r = m a + F_{\text{resistance}}$ and the recursive least squares (RLS) method to correct the vehicle mass estimate in real time, eliminating mass deviations caused by fuel consumption and ore adhesion. Here, $T_e$ is the torque output by the drive motor; $m$ is the dynamic total mass of the vehicle to be identified; $a$ is the current longitudinal acceleration of the vehicle; $F_{\text{resistance}}$ is the total driving resistance currently acting on the vehicle (including tire rolling resistance, air resistance, and gradient resistance); and $r$ is the effective rolling radius of the drive wheels.
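The mass-correction step of [0019] can be sketched as scalar recursive least squares with a forgetting factor. It assumes the torque balance $T_e / r = m a + F_{\text{resistance}}$ (any transmission ratio and efficiency neglected), so that $y = T_e / r - F_{\text{resistance}}$ is linear in $m$ with regressor $a$. The initial mass would come from the suspension-pressure calibration; the forgetting factor, the low-excitation guard, and the class name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MassRLS:
    m: float                    # current mass estimate, kg (from suspension calibration)
    p: float = 1e6              # estimate covariance
    lam: float = 0.98           # forgetting factor (< 1 tracks slow mass drift)
    a_min_excite: float = 0.05  # hypothetical guard: skip near-zero acceleration samples

    def update(self, t_e: float, r: float, a: float, f_res: float) -> float:
        """One RLS step on y = m * a, where y = T_e / r - F_resistance."""
        if abs(a) < self.a_min_excite:
            return self.m  # too little excitation to identify mass
        y = t_e / r - f_res
        k = self.p * a / (self.lam + a * self.p * a)   # RLS gain
        self.m += k * (y - a * self.m)                 # correct with prediction error
        self.p = (self.p - k * a * self.p) / self.lam  # covariance update with forgetting
        return self.m
```

A forgetting factor slightly below 1 lets the estimate follow slow drifts such as fuel consumption and ore adhesion, at the cost of greater sensitivity to measurement noise.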
[0020] Collaborative decision-making main control module: contains a high-precision 3D map and is equipped with an agent network based on gradient-load-aware multi-agent reinforcement learning (GLA-MA-MADDPG), which generates normalized expected acceleration commands for intersection passage from the vehicle state and the physical envelope. As shown in Figure 3, the collaborative decision-making main control module performs collaborative control of unmanned trucks at intersections as follows. S301: Define the multi-agent state space S and action space A; the state space S contains the normalized position, speed, remaining battery charge (SOC), current intersection branch slope θ, identified load m, and adhesion coefficient μ of all vehicles; the action space A consists of continuous acceleration control commands. S302: Construct the reinforcement learning model, consisting of an Actor network (policy network) deployed on the vehicle and a Critic network (value network) deployed in the cloud training and control center; during the distributed execution phase, the Actor network outputs the vehicle's scheduling actions in real time, and during the centralized training phase, the Critic network evaluates the policy performance of the Actor network based on the global state and joint actions. The Actor network uses a hierarchical feature-encoding architecture comprising a basic feature encoding layer and a physical coupling sensing layer. The physical coupling sensing layer introduces a gated modulation mechanism that uses the slope θ and load m to generate a gating signal.
Nonlinear weighting is applied to the basic features through the gating signal $g = \sigma(W_g [m, \theta] + b_g)$. In the formula, $g$ is the generated physical gating signal (or modulation weight), which adjusts the response intensity of each dimension of the basic feature vector; $\sigma$ is the Sigmoid nonlinear activation function, which maps its output to the interval (0, 1); $W_g$ is the learnable weight matrix of the gated neural network; $[m, \theta]$ is the physical feature input vector formed by concatenating the dynamic load $m$ output by the parameter identification and state estimation module with the current intersection branch slope $\theta$; and $b_g$ is the learnable bias vector of the gated neural network. S303: Calculate multidimensional adaptive attention weights, processing vehicle interaction information with four parallel attention heads. Spatiotemporal head: calculates the correlation between vehicles from their Euclidean distance and speed difference. Physics head: calculates the correlation arising from the potential-energy difference between vehicles due to gradient and load, $\Delta E_{ij} = \lvert m_i g h_i - m_j g h_j \rvert$. In the formula, $\Delta E_{ij}$ is the absolute difference in gravitational potential energy between vehicle $i$ and vehicle $j$, characterizing the asymmetric game relationship between the two vehicles given their physical states on the intersection segment; $m_i$ is the dynamic load mass of vehicle $i$ after real-time correction by the parameter identification and state estimation module; $m_j$ is the dynamic load mass of neighboring vehicle $j$ within the interaction range, broadcast via V2X; $g$ is the gravitational acceleration constant; $h_i$ is the relative elevation of the intersection branch where vehicle $i$ is currently located (which can be calculated from the high-precision map combined with the current slope); and $h_j$ is the relative elevation of the intersection branch where neighboring vehicle $j$ is currently located. Task head: assigns a weight matrix based on vehicle task priority. Risk head: generates a mask matrix from the reciprocal of the estimated time to collision $\mathrm{TTC}_{ij}$, forcing attention onto high-risk vehicles. The final feature representation is the weighted fusion of the four heads, $h' = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \mathrm{head}_3, \mathrm{head}_4)\, W^O$. In the formula, $h'$ is the enhanced feature vector after processing and fusion by the multidimensional adaptive attention mechanism, used as the comprehensive state basis for subsequent acceleration command generation; Concat is the vector concatenation operation, which combines features from different dimensions; $\mathrm{head}_1$, $\mathrm{head}_2$, $\mathrm{head}_3$, and $\mathrm{head}_4$ are the local feature vectors output by the spatiotemporal, physics, task, and risk heads, respectively; and $W^O$ is the linear transformation weight matrix of the output layer, which reduces the dimensionality of the concatenated multidimensional features and combines them linearly, mapping them into the expected action-feature output space. S304: Generate and constrain actions. The raw action $a_{\text{raw}}$ output by the Actor network is mapped by a state-dependent differentiable mapping function into the physical boundary $[a_{\min}(S), a_{\max}(S)]$ calculated by the parameter identification module, giving the actual control command $a_{\text{real}} = a_{\min}(S) + \frac{a_{\text{raw}} + 1}{2}\bigl(a_{\max}(S) - a_{\min}(S)\bigr)$. In the formula, $a_{\text{real}}$ is the actual control command ultimately output to the vehicle's underlying controller for execution, i.e., the physically executable target acceleration; $a_{\text{raw}}$ is the dimensionless raw action value output directly by the Actor network in the collaborative decision-making main control module, restricted to $[-1, 1]$ by the activation function; $a_{\max}(S)$ is the upper boundary, the physically achievable maximum acceleration calculated by the parameter identification and state estimation module from the current vehicle dynamic state $S$ (including the identified road adhesion coefficient, dynamic mass, and slope); and $a_{\min}(S)$ is the lower boundary, the physically achievable minimum acceleration (i.e., the maximum physical braking deceleration of the vehicle under the current operating conditions) calculated by the same module from $S$.
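Two of the four attention heads in S303 and the output fusion $h' = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_4)\, W^O$ can be sketched as below. This is illustrative only: the patent does not say how $\Delta E_{ij}$ or $1/\mathrm{TTC}_{ij}$ become attention weights, so the sketch assumes a softmax over normalized scores, with neighbors whose TTC exceeds a hypothetical `ttc_max` masked out. The spatiotemporal and task heads are omitted, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

G = 9.81  # gravitational acceleration, m/s^2


def softmax(scores: Sequence[float]) -> List[float]:
    """Softmax that treats -inf scores as masked (zero weight)."""
    mx = max(s for s in scores if s != -math.inf)
    ex = [0.0 if s == -math.inf else math.exp(s - mx) for s in scores]
    total = sum(ex)
    return [e / total for e in ex]


def attend(weights: Sequence[float], feats: Sequence[Sequence[float]]) -> List[float]:
    """Weighted sum of neighbour feature vectors."""
    return [sum(w * f[k] for w, f in zip(weights, feats)) for k in range(len(feats[0]))]


def physics_head(m_i: float, h_i: float, ms: Sequence[float], hs: Sequence[float],
                 feats: Sequence[Sequence[float]]) -> List[float]:
    """Score neighbours by Delta E_ij = |m_i g h_i - m_j g h_j| (larger gap, more attention)."""
    de = [abs(m_i * G * h_i - m_j * G * h_j) for m_j, h_j in zip(ms, hs)]
    scale = max(de) or 1.0
    return attend(softmax([d / scale for d in de]), feats)


def risk_head(ttcs: Sequence[float], feats: Sequence[Sequence[float]],
              ttc_max: float = 8.0) -> List[float]:
    """Score by 1/TTC_ij; neighbours with TTC >= ttc_max are masked out."""
    scores = [1.0 / max(t, 1e-3) if t < ttc_max else -math.inf for t in ttcs]
    if all(s == -math.inf for s in scores):
        return [0.0] * len(feats[0])  # no high-risk neighbour in range
    return attend(softmax(scores), feats)


def fuse(heads: Sequence[Sequence[float]], w_o: Sequence[Sequence[float]]) -> List[float]:
    """h' = Concat(head_1, ..., head_4) W^O, with W^O stored row-major as (4d x d_out)."""
    z = [v for h in heads for v in h]
    return [sum(z[r] * w_o[r][c] for r in range(len(z))) for c in range(len(w_o[0]))]
```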
[0021] Dual-loop safety circuit breaker module: connected in series at the output of the collaborative decision-making main control module, it performs physical feasibility verification and collision risk prediction on AI-generated commands; when a command exceeds the physical envelope or the time to collision (TTC) falls below the threshold, it forcibly takes over and issues a minimum risk maneuver (MRM). As shown in Figure 4, the execution logic of the dual-loop safety circuit breaker module is as follows. First loop (physical boundary clamping): receive the command $a_{\text{real}}$ output by the collaborative decision-making main control module and check again whether it satisfies $a_{\min} \le a_{\text{real}} \le a_{\max}$; if it exceeds these limits, it is truncated directly, where $a_{\min}$ is the maximum physical braking deceleration calculated from the current brake temperature $T_{\text{brake}}$ and the road surface adhesion coefficient μ; if it does not exceed them, the command proceeds to the second loop. Second loop (model-prediction circuit breaker): substitute the command into the vehicle kinematics model to predict the trajectory over the next N seconds; if the minimum distance between the predicted trajectory and other vehicles or obstacles is less than the dynamic safety threshold $D_{\text{safe}}$, the original command is immediately blocked, the emergency braking logic is triggered, and an alarm signal is sent to the roadside base station; otherwise, the original command is executed normally.
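The second loop of [0021] can be sketched as a rollout-and-check. The assumptions here are ours: the ego vehicle is a point mass moving along a fixed heading under the commanded acceleration, other vehicles move at constant velocity, and the horizon N, the time step, $D_{\text{safe}}$, and the emergency braking command are supplied by the caller. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def predict_ego(p: Vec, heading: float, v: float, a: float,
                horizon: float, dt: float) -> List[Vec]:
    """Point-mass rollout along a fixed heading under constant acceleration (v floored at 0)."""
    pts: List[Vec] = []
    x, y = p
    for _ in range(int(round(horizon / dt))):
        v = max(v + a * dt, 0.0)
        x += v * math.cos(heading) * dt
        y += v * math.sin(heading) * dt
        pts.append((x, y))
    return pts


def second_loop(ego_p: Vec, heading: float, v: float, a_cmd: float,
                others: Sequence[Tuple[Vec, Vec]], d_safe: float, a_brake: float,
                horizon: float = 5.0, dt: float = 0.1) -> Tuple[float, bool]:
    """Return (command, alarm); others are (position, velocity) pairs at constant velocity."""
    traj = predict_ego(ego_p, heading, v, a_cmd, horizon, dt)
    d_min = math.inf
    for k, (x, y) in enumerate(traj, start=1):
        t = k * dt
        for (ox, oy), (vx, vy) in others:
            d_min = min(d_min, math.hypot(x - (ox + vx * t), y - (oy + vy * t)))
    if d_min < d_safe:
        return a_brake, True  # block command, brake, and raise the roadside alarm
    return a_cmd, False
```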
[0022] Cloud-based training and control center: equipped with a Critic network, it takes prioritized experience samples from the experience replay database and combines them with a multi-objective reward function to calculate the temporal difference (TD) error. It then performs centralized evaluation and gradient updates of the policies of all vehicle-side Actor networks and sends the updated Actor policy parameters over the air (OTA) to the collaborative decision-making main control module. The cloud-based training and control center guides training with the multi-objective reward function
$$R_{\text{total}} = w_1 R_{\text{speed}} + w_2 R_{\text{progress}} + w_3 R_{\text{collision}} + w_4 R_{\text{energy}}$$
where $R_{\text{total}}$ is the total reward obtained by the agent within a single decision step, used to guide the gradient update of the scheduling decision network; $w_1$, $w_2$, $w_3$ and $w_4$ are the weight coefficients of the reward terms, balancing traffic efficiency, safety and energy consumption; $R_{\text{speed}}$ is the speed reward, which measures how close the vehicle's current speed is to the maximum speed allowed within the current physical safety envelope and encourages the vehicle to keep a higher speed inside the physical safety boundary to improve efficiency; $R_{\text{progress}}$ is the traffic progress reward, which measures the vehicle's displacement towards its target node at the intersection and encourages vehicles to keep moving, reducing unnecessary stops and idling; $R_{\text{collision}}$ is the collision penalty, a large negative reward given when the distance between the vehicle and other vehicles or obstacles is less than the safety threshold, or when the expected time to collision is less than the safe minimum; $R_{\text{energy}}$ is the energy management reward: when a heavily loaded vehicle descends a slope towards an intersection, a positive reward is given if its speed stays within the motor's high-efficiency regenerative braking range $v_{\text{low}} \le v \le v_{\text{high}}$, and a negative penalty is given otherwise, encouraging the system to maximize recovery of gravitational potential energy.
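A minimal sketch of the weighted reward follows. The individual term shapes (a linear speed ratio, progress in metres, a fixed -100 collision penalty, a ±1 energy reward) and all function names are assumptions for illustration; the text above fixes only the weighted-sum structure and what each term rewards.

```python
def reward_speed(v, v_envelope_max):
    """Closeness of the current speed to the envelope speed limit, in [0, 1] (hypothetical shaping)."""
    if v_envelope_max <= 0:
        return 0.0
    return max(0.0, min(v / v_envelope_max, 1.0))


def reward_progress(dist_before, dist_after):
    """Displacement toward the target node during the step, m (positive = moved closer)."""
    return dist_before - dist_after


def reward_collision(gap, ttc, d_safe, ttc_min, penalty=-100.0):
    """Large negative reward when the gap or the time to collision falls below its threshold."""
    return penalty if (gap < d_safe or ttc < ttc_min) else 0.0


def reward_energy(heavy_downhill, v, v_low, v_high, bonus=1.0, penalty=-1.0):
    """Regenerative-braking band reward; applies only to heavy-load downhill approaches."""
    if not heavy_downhill:
        return 0.0
    return bonus if v_low <= v <= v_high else penalty


def reward_total(terms, weights):
    """R_total = w1*R_speed + w2*R_progress + w3*R_collision + w4*R_energy."""
    return sum(w * r for w, r in zip(weights, terms))


terms = (
    reward_speed(5.0, 8.0),                                  # 0.625
    reward_progress(50.0, 45.0),                             # 5.0
    reward_collision(20.0, 10.0, d_safe=10.0, ttc_min=3.0),  # 0.0
    reward_energy(True, 5.0, 15 / 3.6, 20 / 3.6),            # 1.0: 5 m/s = 18 km/h lies in 15-20 km/h
)
print(reward_total(terms, weights=(1.0, 0.1, 1.0, 0.5)))  # 1.625
```

Keeping the collision term negative and its weight positive lets all four terms share a single plus-sign sum.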
[0023] As shown in Figure 2, this invention provides a collaborative control method for unmanned trucks at open-pit mine intersections that identifies dynamic operating conditions. S1: The on-board sensors collect vehicle operation data in real time and transmit it to the data preprocessing module, which uploads it to the cloud training and control center via the 5G industrial private network. The data preprocessing module performs timestamp alignment, outlier removal and dimensionless normalization on the high-noise heterogeneous data uploaded by the vehicle sensor terminal (including high-frequency wheel speed data and low-frequency GPS data); it also records the vehicle's state transitions, executed actions and reward feedback within a single scheduling cycle and constructs a prioritized experience replay matrix that removes temporal correlation from the data, providing standardized samples for model training. S2: The parameter identification and state estimation module estimates the road adhesion coefficient from the slip ratio, corrects the vehicle mass using the dynamics equation, and combines these with slope data from the high-precision map to update each vehicle's dynamic performance envelope in real time. S3: The collaborative decision-making main control module feeds the updated state into an agent network based on gradient-load-aware multi-agent reinforcement learning. After aggregating features through a multi-dimensional attention mechanism, it outputs an expected acceleration that balances efficiency and energy recovery; for heavily loaded downhill vehicles, it gives priority to a constant-speed command that keeps regenerative braking power at its peak.
S4: The dual-ring safety circuit breaker module performs dual verification of the physical feasibility and collision risk of the expected acceleration and outputs the final control command. S5: The vehicle's low-level controller executes the final command and asynchronously feeds back the state transition, action and reward signals of each interaction to the cloud-based experience replay database; the cloud-based Critic network uses this feedback data to train and evaluate the model centrally and continuously iterates on the vehicle-side Actor policy network.
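The preprocessing chain of step S1 can be illustrated with NumPy. The sketch assumes linear interpolation for timestamp alignment, a median-absolute-deviation rule for outliers (flagged samples are replaced by the median so the time axis stays aligned), and fixed physical ranges for normalization; none of these specific choices, nor the function names, are taken from the source.

```python
import numpy as np


def align_to(t_target, t_src, x_src):
    """Timestamp alignment: linearly interpolate a low-rate signal onto high-rate timestamps."""
    return np.interp(t_target, t_src, x_src)


def remove_outliers(x, k=3.5):
    """Replace samples whose robust (MAD-based) z-score exceeds k with the median."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0.0:
        return x.copy()
    z = 0.6745 * (x - med) / mad
    out = x.copy()
    out[np.abs(z) > k] = med
    return out


def normalize(x, lo, hi):
    """Dimensionless min-max normalization onto [0, 1] using fixed physical ranges."""
    return np.clip((np.asarray(x, dtype=float) - lo) / (hi - lo), 0.0, 1.0)


t_wheel = np.linspace(0.0, 1.0, 11)  # 10 Hz wheel-speed timestamps, s
wheel_v = np.array([10.0, 10.1, 9.9, 10.0, 50.0, 10.2, 10.1, 10.0, 9.9, 10.0, 10.1])  # m/s, one spike
t_gps, gps_v = np.array([0.0, 1.0]), np.array([10.0, 12.0])  # 1 Hz GPS speed, m/s

gps_on_wheel_clock = align_to(t_wheel, t_gps, gps_v)  # GPS speed resampled on the wheel-speed clock
clean_wheel = remove_outliers(wheel_v)                # the 50 m/s spike is replaced by the median
features = np.stack([normalize(clean_wheel, 0.0, 16.0),
                     normalize(gps_on_wheel_clock, 0.0, 16.0)], axis=1)
```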
[0024] Example of dynamic parameter identification logic: the parameter identification and state estimation module determines the operating condition as follows. Road surface adhesion coefficient identification: the module monitors in real time the drive wheel linear velocity $v_w$ from the wheel speed sensor and the vehicle reference speed $v_{\text{ref}}$ from the differential GPS/IMU integrated navigation unit. When a deviation between $v_w$ and $v_{\text{ref}}$ is detected, it calculates the longitudinal slip ratio and, using a pre-built Burckhardt tire model combined with an extended Kalman filter (EKF), iteratively updates the peak adhesion coefficient μ of the current road surface. If μ is less than 0.3, the system classifies the surface as a "slippery road surface" and automatically increases the safe following distance at intersections. Dynamic load correction: the module reads the pressure value P of the hydropneumatic suspension and obtains the static mass from the calibration curve $m_{\text{static}} = f(P)$. During vehicle acceleration, a recursive least squares (RLS) observer built on the dynamics equation corrects the vehicle mass estimate in real time, eliminating errors caused by fuel consumption and ore adhesion.
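The two identification tasks above can be sketched as follows, under explicit assumptions. The unified slip-ratio definition, the illustrative Burckhardt parameter sets and the 1.5× gap factor are assumptions; the EKF is not reproduced, and the analytic peak of the Burckhardt curve stands in for the filtered estimate. The mass observer is a scalar recursive least squares fit of the torque balance $T_e / r = m a + F_{\text{resistance}}$ stated in claim 3, initialized from the suspension-calibrated static mass. All class and function names are hypothetical.

```python
import math


def slip_ratio(v_wheel, v_ref, eps=0.1):
    """Unified longitudinal slip ratio: > 0 when driving, < 0 when braking (assumed definition)."""
    return (v_wheel - v_ref) / max(abs(v_wheel), abs(v_ref), eps)


def burckhardt_mu(lam, c1, c2, c3):
    """Burckhardt curve: mu(lam) = c1 * (1 - exp(-c2 * lam)) - c3 * lam."""
    lam = abs(lam)
    return c1 * (1.0 - math.exp(-c2 * lam)) - c3 * lam


def burckhardt_peak(c1, c2, c3):
    """Interior maximum, assuming c1 * c2 > c3 > 0."""
    lam_star = math.log(c1 * c2 / c3) / c2  # root of c1*c2*exp(-c2*lam) - c3 = 0
    return lam_star, burckhardt_mu(lam_star, c1, c2, c3)


def following_gap(base_gap, mu_peak, slippery_mu=0.3, factor=1.5):
    """Widen the intersection following gap on a 'slippery' surface (factor is hypothetical)."""
    return base_gap * factor if mu_peak < slippery_mu else base_gap


class MassRLS:
    """Scalar RLS for T_e / r - F_resistance = m * a, with a forgetting factor."""

    def __init__(self, m_static, p0=1e6, forget=0.98, min_accel=0.05):
        self.m, self.P = float(m_static), float(p0)
        self.forget, self.min_accel = forget, min_accel

    def update(self, torque, radius, f_res, accel):
        if abs(accel) < self.min_accel:  # too little excitation: keep the current estimate
            return self.m
        y = torque / radius - f_res  # net longitudinal force attributed to m * a, N
        k = self.P * accel / (self.forget + accel * self.P * accel)
        self.m += k * (y - accel * self.m)
        self.P = (self.P - k * accel * self.P) / self.forget
        return self.m


# Illustrative Burckhardt parameter sets (snow-like and wet-asphalt-like), not identified values.
_, mu_snow = burckhardt_peak(0.1946, 94.129, 0.0646)
_, mu_wet = burckhardt_peak(0.857, 33.822, 0.347)
gap_snow = following_gap(20.0, mu_snow)  # widened, since mu_snow < 0.3

# Noise-free check: true mass 250 t, suspension-calibrated start 220 t, r = 1.5 m, F_res = 30 kN.
est = MassRLS(m_static=2.2e5)
for a in (0.2, 0.5, 0.3, 0.4):
    m_hat = est.update(torque=(2.5e5 * a + 3.0e4) * 1.5, radius=1.5, f_res=3.0e4, accel=a)
```

Skipping updates at near-zero acceleration avoids dividing the force error by a tiny excitation, which is the usual failure mode of mass estimation at cruise.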
[0025] Intersection cooperative control method flow: as shown in Figure 2, the cooperative control method for unmanned trucks is as follows. S1: Define the state space and action space. The state space S includes not only the conventional position and velocity but also the identified adhesion coefficient μ and dynamic load m, as well as the slope θ provided by the high-precision map.
[0026] The action space A is the vehicle's acceleration command, ranging over [-3, 1] m/s².
[0027] S2: Construct a physically coupled sensing network. The reinforcement learning model adopts the GLA-MA-MADDPG architecture, with a physically coupled sensing layer designed into the feature extraction stage.
[0028] This layer introduces a gating mechanism, $g = \sigma(W_g [m, \theta] + b_g)$. The gating signal g acts as a weight that nonlinearly modulates the vehicle's basic features (position, speed). For example, when the vehicle is in a "heavy load + downhill" state, the gating signal automatically amplifies the weight of the speed feature, so that the network pays more attention to speed control.
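The gate and its modulation can be sketched with NumPy. The weights below are hand-set for illustration (a trained network would learn them); load is assumed normalized to a fraction of rated payload, θ is in radians with negative values downhill, and element-wise multiplication is one assumed form of the "nonlinear modulation". Function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def physical_gate(m_norm, theta, W_g, b_g):
    """g = sigmoid(W_g [m, theta] + b_g); one gate value per basic feature."""
    return sigmoid(W_g @ np.array([m_norm, theta]) + b_g)


def gated_features(base, m_norm, theta, W_g, b_g):
    """Element-wise modulation of the basic features [position, speed] by the gate."""
    return physical_gate(m_norm, theta, W_g, b_g) * base


# Hypothetical hand-set parameters: row 0 gates position (ignores physics),
# row 1 gates speed and grows with load and with downhill grade (theta < 0).
W_g = np.array([[0.0, 0.0],
                [2.0, -5.0]])
b_g = np.zeros(2)
base = np.array([0.4, 0.6])  # normalized [position, speed]

heavy_downhill = gated_features(base, 1.0, -0.1, W_g, b_g)  # speed gate = sigmoid(2.5)
light_flat = gated_features(base, 0.2, 0.0, W_g, b_g)       # speed gate = sigmoid(0.4)
```

Because the sigmoid lies in (0, 1), the gate emphasizes speed relative to position rather than scaling it above its raw value.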
[0029] S3: Multi-dimensional attention interaction. The model runs four attention heads in parallel to calculate interaction weights between vehicles. Spatiotemporal head: focuses on vehicles that are close to each other. Physics head: focuses on vehicles with similar or dangerous physical states (e.g., both heavily loaded and descending). Task head: focuses on high-priority vehicles (e.g., fully loaded vehicles are weighted above empty ones). Risk head: focuses on vehicles with a shorter time to collision (TTC). The final feature representation is a weighted fusion of the four heads, $h' = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \mathrm{head}_3, \mathrm{head}_4)\,W^O$.
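A NumPy sketch of the four-head fusion follows. It assumes each head is a softmax over neighbour scores followed by a weighted sum of neighbour features. The score formulas (distance plus speed difference, a scaled negative $\Delta E_{ij} = |m_i g h_i - m_j g h_j|$ as in claim 4, a priority value, and reciprocal TTC used as an additive score instead of a hard mask) are illustrative choices rather than the patented ones, and all names are hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2


def softmax(x):
    z = np.asarray(x, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


def potential_energy_diff(m_i, h_i, m_j, h_j):
    """Physics-head input: Delta E_ij = |m_i g h_i - m_j g h_j|, J."""
    return abs(m_i * G * h_i - m_j * G * h_j)


def head_scores(ego, nbrs):
    """Hypothetical raw scores for the four heads (higher score = more attention)."""
    spatiotemporal = [-(np.hypot(n["x"] - ego["x"], n["y"] - ego["y"]) + abs(n["v"] - ego["v"]))
                      for n in nbrs]
    physics = [-potential_energy_diff(ego["m"], ego["h"], n["m"], n["h"]) / 1e6 for n in nbrs]
    task = [n["priority"] for n in nbrs]
    risk = [1.0 / max(n["ttc"], 1e-3) for n in nbrs]  # reciprocal TTC
    return [spatiotemporal, physics, task, risk]


def fuse(nbr_feats, scores, W_O):
    """h' = Concat(head1, ..., head4) W_O; each head is an attention-weighted sum of neighbour features."""
    heads = [softmax(s) @ nbr_feats for s in scores]
    return np.concatenate(heads) @ W_O


ego = {"x": 0.0, "y": 0.0, "v": 5.0, "m": 3.0e5, "h": 20.0}
nbrs = [
    {"x": 15.0, "y": 5.0, "v": 4.0, "m": 3.0e5, "h": 20.0, "priority": 1.0, "ttc": 2.0},
    {"x": 60.0, "y": -10.0, "v": 6.0, "m": 1.0e5, "h": 0.0, "priority": 0.0, "ttc": 20.0},
]
nbr_feats = np.array([[0.2, 0.8], [0.9, 0.1]])  # encoded neighbour features (d = 2)
W_O = np.full((8, 2), 0.25)                     # hypothetical output projection
h_prime = fuse(nbr_feats, head_scores(ego, nbrs), W_O)
```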
[0030] S4: Generate and constrain actions. The raw action $a_{\text{raw}}$ output by the Actor network is not executed directly; it is first constrained through a differentiable mapping function.
[0031] The system first calculates the physical acceleration bounds $a_{\min}$ and $a_{\max}$ from the currently identified μ and m, where $F_{\text{env}}$, which enters these bounds, is the sum of grade resistance and rolling resistance.
[0032] The mapping formula is then applied to bring $a_{\text{raw}}$ into $[a_{\min}, a_{\max}]$. This step ensures that even if the AI requests hard braking, the output command stays within physical limits when the road surface is too slippery (small μ), preventing skidding.
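The bound computation and mapping can be sketched as below. Because the exact formulas are not reproduced in this text, the sketch assumes adhesion-limited traction and braking forces $\mu m g \cos\theta$, fixed actuator force limits, $F_{\text{env}} = m g \sin\theta + f m g \cos\theta$, and a linear rescaling of $a_{\text{raw}} = \tanh(z)$ onto $[a_{\min}, a_{\max}]$. The force limits, rolling coefficient and function names are hypothetical.

```python
import math

G = 9.81  # m/s^2


def accel_bounds(mu, m, theta, f_roll=0.02, f_drive_max=4.0e5, f_brake_max=1.2e6):
    """Hypothetical physical bounds (a_min, a_max) in m/s^2; theta > 0 means uphill, rad."""
    f_env = m * G * math.sin(theta) + f_roll * m * G * math.cos(theta)  # grade + rolling resistance, N
    f_adh = mu * m * G * math.cos(theta)  # adhesion-limited longitudinal force (all wheels assumed), N
    a_max = (min(f_drive_max, f_adh) - f_env) / m
    a_min = (-min(f_brake_max, f_adh) - f_env) / m
    return a_min, a_max


def map_action(z, a_min, a_max):
    """a_raw = tanh(z) in [-1, 1], then an affine, differentiable map onto [a_min, a_max]."""
    a_raw = math.tanh(z)
    return a_min + 0.5 * (a_raw + 1.0) * (a_max - a_min)


m = 3.0e5  # loaded truck mass, kg (illustrative)
dry = accel_bounds(0.8, m, 0.0)
icy = accel_bounds(0.2, m, 0.0)
# The same "brake as hard as possible" network output yields a gentler command on the slippery surface.
print(round(map_action(-20.0, *dry), 3), round(map_action(-20.0, *icy), 3))  # -4.196 -2.158
```

Because tanh and the affine map are both smooth, gradients flow through the constraint during training, which is what allows the bound to sit inside the Actor rather than after it.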
[0033] S5: Energy management and safety trigger. Energy management: for heavily loaded vehicles descending a slope towards an intersection, the system checks the battery state of charge (SOC). If SOC < 80%, the scheduling algorithm tends to output a constant negative acceleration command to keep the vehicle speed within the motor's highest-efficiency generating range (e.g., 15-20 km/h), using the potential energy for charging. Safety trigger: before a command is issued, the system predicts the trajectory for the next 3 seconds with a vehicle kinematics model. If the predicted distance to another vehicle is less than the safety threshold $D_{\text{safe}}$, the safety circuit breaker module immediately intercepts the command and issues a maximum braking command.
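One physical reading of the energy-management rule is sketched below: to hold a steady speed on a downhill grade, the motor must absorb the net grade force $F = m g(\sin\theta - f\cos\theta)$, and the recovered power is that force times the speed. The 80% SOC threshold and the 15-20 km/h band come from the text; the rolling coefficient f, the choice of the band midpoint as target, and the neglect of aerodynamic drag and drivetrain losses are assumptions, and `regen_plan` is a hypothetical name.

```python
import math

G = 9.81  # m/s^2


def regen_plan(m, grade, soc, loaded, f_roll=0.02, soc_limit=0.8, band_kmh=(15.0, 20.0)):
    """Return (target_speed_mps, regen_force_N, regen_power_W), or None if the rule does not apply.

    grade < 0 means downhill (rad). The force is what the motor must absorb to hold
    the target speed on the grade with rolling resistance only (no aero, no losses).
    """
    if not (loaded and grade < 0.0 and soc < soc_limit):
        return None
    v_target = 0.5 * (band_kmh[0] + band_kmh[1]) / 3.6  # middle of the efficient band, m/s
    slope = -grade
    f_regen = max(m * G * (math.sin(slope) - f_roll * math.cos(slope)), 0.0)
    return v_target, f_regen, f_regen * v_target


plan = regen_plan(3.0e5, -0.08, soc=0.6, loaded=True)  # about 177 kN and 0.86 MW
```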
[0034] S6: Model update. After execution, the system stores the actual state of the vehicle (including energy consumption data and whether a circuit-breaker intervention or takeover occurred) in the database.
[0035] The cloud-based training and control center regularly samples from the database, updates network parameters using a multi-objective reward function (including efficiency, safety, and energy recovery), and sends the new parameters to the vehicle-mounted edge computing terminal via OTA over the 5G network.
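The periodic sampling and TD-error-driven update can be illustrated with the standard proportional variant of prioritized experience replay: priority $|\delta| + \epsilon$, sampling probability proportional to priority$^\alpha$, and importance weights $(N P(i))^{-\beta}$. The source does not give its priority formula, so this variant, its hyperparameters and the class name are assumptions; the Critic network itself is omitted.

```python
import numpy as np


class PrioritizedReplay:
    """Minimal proportional prioritized experience replay (list-based, O(N) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3, seed=0):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.prio = [], []
        self.pos = 0
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        p = max(self.prio, default=1.0)  # new samples start at the current max priority
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.prio.append(p)
        else:  # overwrite the oldest entry (ring buffer)
            self.data[self.pos] = transition
            self.prio[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.prio) ** self.alpha
        probs = p / p.sum()
        idx = self.rng.choice(len(self.data), size=batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)  # importance-sampling correction
        return idx, [self.data[i] for i in idx], weights / weights.max()

    def update(self, idx, td_errors):
        for i, delta in zip(idx, td_errors):
            self.prio[i] = abs(delta) + self.eps


def td_error(r, q_sa, q_next, gamma=0.99, done=False):
    """One-step TD error: r + gamma * Q'(s', a') - Q(s, a)."""
    return r + (0.0 if done else gamma * q_next) - q_sa


buf = PrioritizedReplay(capacity=3)
for k in range(4):  # the 4th transition overwrites slot 0
    buf.add({"step": k})
buf.update([0, 1, 2], [0.0, 0.0, td_error(r=5.0, q_sa=0.0, q_next=10.0)])
idx, batch, w = buf.sample(200)  # slot 2 (large TD error) dominates the batch
```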
[0036] Through the above process, the present invention realizes a complete closed loop from the identification of underlying physical parameters to the upper-level intelligent decision-making, effectively improving the intelligence level of the open-pit mine transportation system.
[0037] It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be considered in all respects as exemplary and non-limiting, and the scope of the invention is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be included within the present invention. No reference numerals in the claims should be construed as limiting the scope of the claims.
[0038] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any minor modifications, equivalent substitutions, and improvements made to the above embodiments based on the technical essence of the present invention should be included within the protection scope of the present invention.
Claims
1. A collaborative control system for unmanned trucks at intersections in open-pit mines, characterized in that it includes in-vehicle edge computing terminals and a cloud-based training and control center; the in-vehicle edge computing terminals interact with each other via a V2X direct communication network. The vehicle-mounted edge computing terminal includes a data preprocessing module, a parameter identification and state estimation module, a collaborative decision-making main control module, and a dual-ring safety circuit breaker module. The data preprocessing module is connected to the vehicle-mounted sensor terminal, the collaborative decision-making main control module, the parameter identification and state estimation module, and the cloud training and control center, respectively. The parameter identification and state estimation module and the cloud training and control center are both connected to the collaborative decision-making main control module, which is connected to the dual-ring safety circuit breaker module. Data preprocessing module: processes the high-noise heterogeneous data uploaded by the vehicle-mounted sensors; it also records the state transitions, action executions and reward feedback within a single vehicle scheduling cycle and constructs a prioritized experience replay matrix to eliminate temporal correlation in the data, providing standardized samples for model training; Parameter identification and state estimation module: receives the perception data processed by the data preprocessing module, calculates in real time the tire adhesion coefficient μ and the actual load mass m of the vehicle at the current intersection, and constructs the vehicle's current dynamic physical safety envelope based on the slope-load coupling model.
Collaborative decision-making main control module: includes a high-precision 3D map and is equipped with an agent network based on gradient-load-aware multi-agent reinforcement learning, which generates normalized expected acceleration commands for intersection passage from the vehicle status and physical envelope; Dual-ring safety circuit breaker module: performs physical feasibility verification and collision risk prediction on AI-generated instructions; when an instruction exceeds the physical envelope or the time to collision is below the threshold, it forcibly takes over and issues the minimum risk strategy; Cloud-based training and control center: equipped with a Critic network, it uses prioritized experience samples from the experience replay database together with a multi-objective reward function to calculate the temporal difference error, then performs centralized evaluation and gradient updates of the policies of all vehicle-side Actor networks and sends the updated policy parameters to the collaborative decision-making main control module.
2. The open-pit mine unmanned truck intersection collaborative control system for identifying dynamic working conditions according to claim 1, characterized in that, The vehicle-mounted sensor terminal includes one or more of the following: wheel speed sensor, differential GPS / IMU integrated navigation unit, hydro-pneumatic suspension pressure sensor, drive motor torque sensor, and brake temperature sensor.
3. The open-pit mine unmanned truck intersection collaborative control system for identifying dynamic working conditions according to claim 2, characterized in that the parameter identification and state estimation module has built-in operating condition identification logic, and the identification method is as follows. Road surface adhesion status assessment: acquire in real time the wheel linear velocity $v_w$ measured by the wheel speed sensor and the vehicle ground speed $v$ measured by the high-precision differential GPS/IMU integrated navigation unit, and calculate the tire longitudinal slip ratio; when the slip ratio exceeds the preset linear-region threshold, back-calculate the current peak road adhesion coefficient μ using an extended Kalman filter combined with a tire dynamics model; Load status assessment: collect the values of the hydro-pneumatic suspension pressure sensor and perform an initial calibration against the vehicle's static weight; during vehicle operation, use the motor driving torque equation
$$\frac{T_e}{r} = m a + F_{\text{resistance}}$$
to correct the vehicle mass estimate in real time by recursive least squares, eliminating mass deviations caused by fuel consumption and ore adhesion; where $T_e$ is the torque output by the drive motor, m is the dynamic total mass of the vehicle to be identified, a is the current longitudinal acceleration of the vehicle, $F_{\text{resistance}}$ is the total resistance force currently acting on the vehicle, and r is the effective rolling radius of the drive wheels.
4. The open-pit mine unmanned truck intersection collaborative control system for identifying dynamic working conditions according to claim 1, characterized in that the collaborative decision-making main control module performs intersection collaborative control of unmanned trucks as follows. S301: Define the multi-agent state space S and action space A; the state space S contains, for all vehicles, the normalized position, speed, remaining battery charge, slope θ of the current intersection branch, identified load m and adhesion coefficient μ; the action space A is a continuous acceleration control command. S302: Construct the reinforcement learning model; the model consists of an Actor network deployed on the vehicle and a Critic network deployed in the cloud training and control center; the Actor network outputs the vehicle's scheduling actions in real time during the distributed execution phase, and the Critic network evaluates the policy performance of the Actor network from the global state and joint actions during the centralized training phase. The Actor network employs a hierarchical feature coding architecture comprising a basic feature coding layer and a physical coupling sensing layer. The physical coupling sensing layer introduces a gated modulation mechanism that uses the slope θ and load m to generate a gating signal.
The gating signal is $g = \sigma(W_g [m, \theta] + b_g)$ and applies nonlinear weighting to the basic features, where g is the generated physical gating signal; σ is the Sigmoid activation function; $W_g$ is the learnable weight matrix of the gating network; $[m, \theta]$ is the physical feature vector formed by concatenating the dynamic load m output by the parameter identification and state estimation module with the slope θ of the current intersection branch; $b_g$ is the learnable bias vector of the gating network; S303: Calculate multi-dimensional adaptive attention weights, processing vehicle interaction information with four parallel attention heads: Spatiotemporal head: calculates the correlation of the Euclidean distance and speed difference between vehicles; Physics head: calculates the correlation arising from the difference in potential energy between vehicles due to slope and load,
$$\Delta E_{ij} = \lvert m_i g h_i - m_j g h_j \rvert$$
where $\Delta E_{ij}$ is the absolute difference in gravitational potential energy between vehicle i and vehicle j; $m_i$ is the dynamic load mass of vehicle i after real-time correction by the parameter identification and state estimation module; $m_j$ is the dynamic load mass of neighboring vehicle j within the interaction range, broadcast via V2X; g is the gravitational acceleration; $h_i$ is the relative elevation of the intersection branch where vehicle i is currently located; $h_j$ is the relative elevation of the intersection branch where neighboring vehicle j is currently located; Task head: assigns a weight matrix based on vehicle task priority; Risk head: generates a mask matrix from the reciprocal of the estimated time to collision $\mathrm{TTC}_{ij}$, forcing attention onto high-risk vehicles; The final feature representation is a weighted fusion of the four heads:
$$h' = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \mathrm{head}_3, \mathrm{head}_4)\,W^O$$
where h' is the enhanced feature vector after processing and fusion by the multi-dimensional adaptive attention mechanism; Concat is the vector concatenation operation; head1, head2, head3 and head4 are the local feature vectors output by the spatiotemporal head, physics head, task head and risk head, respectively; $W^O$ is the linear transformation weight matrix of the output layer; S304: Generate and constrain actions: the Actor network outputs the raw action $a_{\text{raw}}$, which a state-dependent differentiable mapping function maps into the physical boundary $[a_{\min}(S), a_{\max}(S)]$ calculated by the parameter identification module, giving the actual control command $a_{\text{real}}$; in the mapping formula, $a_{\text{real}}$ is the actual control command ultimately output to the vehicle's low-level controller for execution; $a_{\text{raw}}$ is the dimensionless raw action value output directly by the Actor network, restricted to [-1, 1] by the activation function; $a_{\max}(S)$ is the upper boundary of physically achievable acceleration, calculated by the parameter identification and state estimation module from the current vehicle dynamic state S; $a_{\min}(S)$ is the lower boundary of physically achievable acceleration, calculated by the same module from the current vehicle dynamic state S.
5. The collaborative control system for unmanned trucks at intersections in open-pit mines according to claim 4, characterized in that the dual-ring safety circuit breaker module executes the following logic. First step: receive the instruction $a_{\text{real}}$ output by the collaborative decision-making main control module and check whether $a_{\min} \le a_{\text{real}} \le a_{\max}$; if it exceeds this range, truncate it directly, where $a_{\min}$ is the maximum physical braking deceleration calculated from the current brake temperature $T_{\text{brake}}$ and the road surface adhesion coefficient μ; if it does not exceed the range, proceed to the second step. Second step: substitute the instruction into the vehicle kinematics model to predict the trajectory over the next N seconds; if the minimum distance between the predicted trajectory and other vehicles or obstacles is less than the dynamic safety threshold $D_{\text{safe}}$, immediately block the original instruction, trigger the emergency braking logic and send an alarm signal to the roadside base station; otherwise, execute the original instruction normally.
6. The open-pit mine unmanned truck intersection collaborative control system for identifying dynamic working conditions according to claim 1, characterized in that the cloud-based training and control center guides training with the multi-objective reward function
$$R_{\text{total}} = w_1 R_{\text{speed}} + w_2 R_{\text{progress}} + w_3 R_{\text{collision}} + w_4 R_{\text{energy}}$$
where $R_{\text{total}}$ is the total reward obtained by the agent within a single decision step; $w_1$, $w_2$, $w_3$ and $w_4$ are the weight coefficients of the reward terms; $R_{\text{speed}}$ is the speed reward; $R_{\text{progress}}$ is the progress reward; $R_{\text{collision}}$ is the collision penalty, a large negative reward given when the distance between the vehicle and other vehicles or obstacles is less than the safety threshold, or when the expected time to collision is less than the safe minimum; $R_{\text{energy}}$ is the energy management reward: when a heavily loaded vehicle descends a slope towards an intersection, a positive reward is given if its speed is maintained within the motor's high-efficiency regenerative braking range $v_{\text{low}} \le v \le v_{\text{high}}$, and a negative penalty is imposed otherwise.
7. A method for collaborative control of unmanned trucks at intersections in open-pit mines, characterized in that it uses the system according to any one of claims 1-6 and includes the following steps: S1: The on-board sensors collect vehicle operation data in real time and transmit it to the data preprocessing module, which uploads it to the cloud training and control center. The data preprocessing module performs timestamp alignment, outlier removal and dimensionless normalization on the high-noise heterogeneous data uploaded by the vehicle sensor terminal; it also records the vehicle's state transitions, executed actions and reward feedback within a single scheduling cycle and constructs a prioritized experience replay matrix that removes temporal correlation from the data, providing standardized samples for model training. S2: The parameter identification and state estimation module estimates the road adhesion coefficient from the slip ratio, corrects the vehicle mass using the dynamics equation, and combines these with slope data from the high-precision map to update each vehicle's dynamic performance envelope in real time. S3: The collaborative decision-making main control module feeds the updated state into an agent network based on gradient-load-aware multi-agent reinforcement learning. After aggregating features through a multi-dimensional attention mechanism, it outputs an expected acceleration that balances efficiency and energy recovery; for heavily loaded downhill vehicles, it gives priority to a constant-speed command that keeps regenerative braking power at its peak.
S4: The dual-ring safety circuit breaker module performs dual verification of the physical feasibility and collision risk of the expected acceleration and outputs the final control command; S5: The vehicle's low-level controller executes the final command and asynchronously feeds back the state transition, action and reward signals of each interaction to the cloud-based experience replay database; the Critic network uses this feedback data to train and evaluate the model centrally and continuously iterates on the Actor network.