A hybrid model predictive control method for coping with large vehicle control delay
By combining LSTM networks and OSGP models, a hybrid predictive control method was developed to address the control delay problem in large vehicles. This method enables the capture of long time-series delays and real-time disturbance compensation, thereby improving the accuracy and stability of vehicle control.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NANJING UNIV OF SCI & TECH
- Filing Date
- 2026-03-06
- Publication Date
- 2026-06-12
Figure CN122194640A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of intelligent driving control technology, and more specifically to a hybrid model predictive control method for addressing control delays in large vehicles.
Background Technology
[0002] With the rapid development of automation and intelligent technologies, large work vehicles are gradually shifting to intelligent operation to improve work efficiency and safety. However, making this type of equipment intelligent faces a key bottleneck, the delay effect: because of the large vehicle mass and the sluggish response of hydraulic/pneumatic actuators, together with uncertainties such as road disturbances and dynamic load fluctuations in the working environment, there is a significant time lag between control commands and system response. This lag not only causes accuracy problems such as speed control overshoot and trajectory tracking deviation, but may also cause system oscillations or even safety accidents, limiting the achievable performance of intelligent work vehicles.
[0003] Model predictive control (MPC), a classic method in industrial control, effectively handles multivariable constraint problems through a combination of rolling optimization and feedback correction, and has been validated in structured scenarios such as chemical processes and vehicle control. However, when dealing with special objects like large-scale operational vehicles, the limitations of traditional MPC become increasingly apparent. Its framework relies on precise dynamic modeling, but the strong nonlinearity and multi-physics coupling characteristics of large-scale operational vehicles make precise modeling extremely costly. Simultaneously, random disturbances and system variations in actual operations make it difficult for control strategies based on prior physical models to reflect real-time dynamics, leading to prediction inaccuracies and control performance degradation. This contradiction is particularly prominent in unstructured scenarios such as mines and farmland. Against this backdrop, data-driven black-box modeling methods demonstrate unique advantages. By deeply mining the dynamic correlations in system operation data, these methods can bypass complex physical mechanism analysis and directly establish the mapping relationship between control inputs and system responses. Especially for nonlinear dynamic lag phenomena such as delay, data-driven models, through adaptive learning mechanisms, can effectively capture the implicit correlation between delay parameters and environmental disturbances, providing a new technical path to improve the real-time performance and robustness of predictive control.
[0004] MPC requires online prediction of future outputs at each sampling time, while LSTM networks, through their memory-gated structure, can capture long-term dependencies between inputs and states, making them suitable for time-delay-related prediction problems. Therefore, combining LSTM networks with MPC has become an important technical approach for solving delay and nonlinear dynamic systems.
Summary of the Invention
[0005] The purpose of this invention is to provide a hybrid model predictive control method for addressing control delays in large vehicles.
[0006] The technical solution to achieve the objective of this invention is: a hybrid model predictive control method for addressing control delays in large vehicles, comprising the following steps:
[0007] Step 1: Construct a simulation environment based on the vehicle dynamics model to simulate vehicle motion under different driving modes. Design a control command sequence with delay and noise to act on the vehicle dynamics model, and collect the control command sequence and the corresponding vehicle state to form time series data.
[0008] Step 2: Construct a vehicle state prediction model by integrating a self-attention mechanism and a residual multilayer LSTM network. The model comprises a spatiotemporal feature extraction layer, multilayer LSTM units, a self-attention mechanism, and a feature fusion layer. The spatiotemporal feature extraction layer contains a linear transformation layer, ReLU, and layer normalization; it takes vehicle time-series data as input, maps it from low to high dimension, applies activation and normalization, and outputs preliminary high-dimensional spatiotemporal features. The multilayer LSTM units are stacked LSTM subunits with residual connections; they take the high-dimensional spatiotemporal features as input, capture long-term dependencies by adding each layer's output to that layer's input, and output deep encoded time-series features. The self-attention mechanism includes a dimensionality-reduction linear layer, Tanh, and a weight output layer; it takes the deep encoded time-series features as input, computes a weight for each time step, and aggregates the features into a vector focused on the key time steps. The feature fusion layer includes a concatenation operation, a linear transformation layer, ELU, and Dropout; its input is the output of the attention layer and the original features from the last time step, and after concatenation and fusion it outputs the predicted vehicle state at the next time step. The vehicle state prediction model is trained with a dynamic delay-sensitive adaptive kinematic loss function that accounts for position error, heading error, and kinematic constraint deviation.
[0009] Step 3: Construct the residual dataset and the multi-task OSGP real-time disturbance compensation model. With vehicle time-series data as input, the state prediction result is obtained by inference with the vehicle state prediction model. The residual is the difference between the prediction result and the actual vehicle state, and data pairs of control commands and their corresponding residuals form the residual dataset. The multi-task OSGP model consists of an inducing point sharing module, multi-task branches, and an online update unit. The inducing point sharing module selects initial inducing points from the residual dataset, which all tasks share. Each multi-task branch contains its own combined RBF and linear kernel function and its own likelihood function. Taking the control command sequence from the residual dataset as input, the branches jointly model the multi-dimensional residual dependencies and output a predictive probability distribution for each residual as the disturbance compensation result. The online update unit receives new residual data in real time through a sliding window and dynamically optimizes the kernel hyperparameters and inducing point positions.
[0010] Step 4: The vehicle state prediction model and the real-time disturbance compensation model infer from the vehicle time-series data to obtain the state prediction result and disturbance compensation result. Based on this, the vehicle state prediction model and the real-time disturbance compensation model are linearized using the central difference method. Using the current time-series data as a benchmark, small disturbances are applied to the state variables and control variables respectively, and the difference in state changes before and after the disturbance is calculated, thereby obtaining the state Jacobian matrix and the control Jacobian matrix, achieving a linear approximation between the vehicle state prediction model and the real-time disturbance compensation model. Subsequently, an MPC optimization problem is constructed, using the linearized state equation as constraints, including initial state constraints, state transition constraints, control quantity boundary constraints, and control rate of change constraints, and the constraints are transformed to a standardized space. The objective function design comprehensively considers trajectory tracking error, control quantity magnitude, and control smoothness. The optimization problem is solved using a solver to obtain the optimal control sequence.
[0011] Further, in step 1, a simulation environment is constructed based on the vehicle dynamics model to simulate vehicle motion under different driving modes. A control command sequence containing delays and noise is designed to act on the vehicle dynamics model, and time-series data is collected by combining the control command sequence with the corresponding vehicle state. The specific method is as follows:
[0012] Step 11: Based on six-degree-of-freedom state-space equations, establish a vehicle dynamics model to simulate vehicle motion under different driving modes. The vehicle state consists of the longitudinal velocity, lateral velocity, yaw rate, heading angle, and the two position coordinates. The longitudinal acceleration is calculated from the force balance equation, whose terms are the driving force, the braking force, the aerodynamic drag (determined by the air density, air drag coefficient, and windward area), and the rolling resistance (determined by the rolling resistance coefficient and the gravitational acceleration). The vehicle dynamics model also corrects the tire cornering stiffness for the dynamic change in axle load caused by acceleration: the front and rear tire cornering stiffnesses are calculated from the static cornering stiffness and the front and rear axle loads, using the distances from the front and rear axles to the center of gravity.
[0013] Step 12: Design a control command sequence containing delay and noise to act on the vehicle dynamics model, and collect time-series data of the control command sequence and the corresponding vehicle states. Control commands are generated with the acceleration and the front-wheel steering angle as the core reference parameters. Base values of both are first built from sine waves and uniformly distributed random values; the parameters are then adjusted according to the driving mode, and smoothness is improved through filtering and rate-of-change limiting. For smooth driving, the sine-wave frequencies of the steering angle and acceleration are reduced, and the amplitude and disturbance range are adjusted. For emergency braking, the steering angle gradually decays and the acceleration transitions from steady acceleration to the braking target. For slalom maneuvers, continuous steering is simulated by increasing the sine-wave frequency of the steering angle and adjusting its amplitude and phase, while the acceleration is kept steady. Delay is introduced in two ways. In queue delay, commands are stored in a queue with a randomly selected number of delay steps, and a command is retrieved and executed only when the queue length reaches the required number of delay steps. In first-order inertial delay, a transition coefficient is calculated from the time constant and the time step, and the new value is a weighted sum of the current command and the value executed at the previous time step, with the weights set by the transition coefficient; this simulates the actuator's response lag due to inertia. A noisy control sequence is then generated by superimposing normally distributed random values with zero mean and different standard deviations onto the final command. The state sequence is calculated with the vehicle dynamics model, yielding the time-series data of control commands and corresponding vehicle states.
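As a rough illustration of the longitudinal force balance named in Step 11, the sketch below assumes the common textbook form (driving force minus braking force, aerodynamic drag, and rolling resistance, divided by mass). The exact formula is not reproduced in this text, so the function name, the vehicle mass argument, and all default values are hypothetical.

```python
def longitudinal_acceleration(f_drive, f_brake, v_x, mass,
                              rho=1.225, c_d=0.6, area=8.0,
                              f_roll=0.015, g=9.81):
    """Assumed textbook force balance; not the exact formula of the source."""
    drag = 0.5 * rho * c_d * area * v_x ** 2   # aerodynamic drag [N]
    rolling = f_roll * mass * g                # rolling resistance [N]
    return (f_drive - f_brake - drag - rolling) / mass
```

With zero speed and zero rolling resistance, a 1000 N net drive force on a 1000 kg vehicle gives 1 m/s², which is a quick sanity check on the units.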
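The two delay mechanisms and the noise injection of Step 12 can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function names are hypothetical, the queue uses a fixed rather than randomly drawn delay, and the transition coefficient is assumed to be dt / (tau + dt), a form the text does not state.

```python
import random
from collections import deque

def queue_delay(commands, delay_steps, hold=0.0):
    """Release each command only after `delay_steps` newer ones are queued."""
    queue, executed = deque(), []
    for cmd in commands:
        queue.append(cmd)
        if len(queue) > delay_steps:
            hold = queue.popleft()      # the oldest command becomes active
        executed.append(hold)
    return executed

def first_order_lag(commands, tau, dt, start=0.0):
    """Blend each command with the previously executed value."""
    alpha = dt / (tau + dt)             # assumed form of the transition coefficient
    executed, prev = [], start
    for cmd in commands:
        prev = alpha * cmd + (1.0 - alpha) * prev
        executed.append(prev)
    return executed

def add_noise(commands, sigma, rng):
    """Superimpose zero-mean Gaussian noise on the final commands."""
    return [c + rng.gauss(0.0, sigma) for c in commands]
```

Chaining the three, for example `add_noise(first_order_lag(queue_delay(cmds, 3), 0.4, 0.05), 0.02, random.Random(0))`, yields a delayed, lagged, noisy sequence that can be fed to a dynamics model.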
[0014] Further, in step 2, a vehicle state prediction model is constructed by integrating a self-attention mechanism with a residual multilayer LSTM network. The model comprises a spatiotemporal feature extraction layer, multilayer LSTM units, a self-attention mechanism, and a feature fusion layer. The spatiotemporal feature extraction layer contains a linear transformation layer, ReLU, and layer normalization; it takes vehicle time-series data as input, maps it from low to high dimension, applies activation and normalization, and outputs preliminary high-dimensional spatiotemporal features. The multilayer LSTM units are stacked LSTM subunits with residual connections; they take the high-dimensional spatiotemporal features as input, capture long-term dependencies by adding each layer's output to that layer's input, and output deep encoded temporal features. The self-attention mechanism includes a dimensionality-reduction linear layer, Tanh, and a weight output layer; it takes the deep temporal features as input, computes a weight for each time step, and aggregates the features into a vector focused on the key time steps. The feature fusion layer includes concatenation, a linear transformation layer, ELU, and Dropout; its input is the output of the attention layer and the original features from the last time step, and after concatenation and fusion it outputs the predicted vehicle state at the next time step. The vehicle state prediction model is trained with a dynamic delay-sensitive adaptive kinematic loss function that accounts for position error, heading error, and kinematic constraint deviation. The specific method is as follows:
[0015] Step 21: Construct a vehicle state prediction model by integrating a self-attention mechanism with a residual multilayer LSTM network, including a spatiotemporal feature extraction layer, multilayer LSTM units, and a self-attention mechanism and feature fusion layer, specifically:
[0016] (1) Spatiotemporal feature extraction layer: A cascaded structure of linear transformation, ReLU activation, and layer normalization is adopted. The input dimension is the historical control command and state sequence of [batch_size, seq_len, input_size], where batch_size, seq_len, and input_size represent the batch size, sequence length, and number of input features, respectively. First, the features of each time step are mapped through a linear layer to convert the number of input features into hidden dimensions, which are set to 256 during training, and the output dimension becomes [batch_size, seq_len, 256]. Then, the nonlinear expression capability is enhanced through the ReLU activation function, and the output dimension remains unchanged at [batch_size, seq_len, 256]. Finally, the 256-dimensional features of each time step are normalized through layer normalization to eliminate the difference in the scale of different features. The final output dimension is still a uniform scale spatiotemporal feature of [batch_size, seq_len, 256], which provides a foundation for subsequent time series modeling.
[0017] (2) Multi-layer LSTM unit: Four layers of LSTM units with residual connections are stacked. The input and output dimensions of each LSTM layer are 256. The input of the multi-layer LSTM unit is the uniform-scale spatiotemporal features output by the spatiotemporal feature extraction layer. After receiving the input, the first LSTM layer outputs a temporal feature with a dimension of [batch_size, seq_len, 256]. Then, it is added element-wise with the original input of the layer to obtain a feature with residual enhancement, which still has a dimension of [batch_size, seq_len, 256]. This enhanced feature is used as the input of the second LSTM layer. After being processed by the second LSTM layer, it is also added element-wise with the input of the second layer. This process continues. After four LSTM layers, the final output is a deep temporal feature with a dimension of [batch_size, seq_len, 256].
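The residual stacking described above (each layer's output added element-wise to that layer's input, then passed on) can be sketched without any deep learning library. In this illustration each "layer" is a hypothetical callable that keeps the [seq_len, hidden] shape; a real model would use LSTM layers in their place.

```python
def residual_stack(features, layers):
    """Apply shape-preserving layers, adding each layer's input to its output."""
    x = features
    for layer in layers:
        y = layer(x)
        # element-wise residual sum over [seq_len][hidden]
        x = [[a + b for a, b in zip(row_y, row_x)] for row_y, row_x in zip(y, x)]
    return x
```

Because every layer preserves the shape, the output of four stacked layers has the same dimensions as the input, matching the constant 256-wide features in the text.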
[0018] (3) Self-attention mechanism: It consists of a dimensionality-reduction linear layer, Tanh activation, and a weight output layer. Its input is the deep temporal features output by the multilayer LSTM. First, the 256-dimensional features are mapped to 32 dimensions by the dimensionality-reduction linear layer, giving an output of shape [batch_size, seq_len, 32]; the Tanh activation leaves the shape unchanged. The weight output layer then converts the 32-dimensional features into 1-dimensional weight scores of shape [batch_size, seq_len, 1], and squeeze(-1) yields raw weights of shape [batch_size, seq_len]. A Softmax over the sequence-length dimension normalizes them into attention weights of shape [batch_size, seq_len]. Finally, the weights are used for a weighted summation of the temporal features output by the LSTM: the weights, reshaped to [batch_size, 1, seq_len], are multiplied with the LSTM features of shape [batch_size, seq_len, 256], giving an output of shape [batch_size, 1, 256]. After squeeze(1), the global key temporal feature vector of shape [batch_size, 256] is obtained, which focuses dynamically on the moments most relevant to control delay and suppresses irrelevant temporal information.
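The attention pooling above can be traced for a single sequence in plain Python. This is a sketch under simplifying assumptions: biases are omitted, the batch dimension is dropped, and the weight arguments `w_reduce` and `w_score` are hypothetical names for the reduction layer and the weight output layer.

```python
import math

def attention_pool(h, w_reduce, w_score):
    """h: seq_len x d features; w_reduce: d x k matrix; w_score: length-k vector."""
    scores = []
    for step in h:
        reduced = [math.tanh(sum(x * w_reduce[i][j] for i, x in enumerate(step)))
                   for j in range(len(w_score))]
        scores.append(sum(r * w for r, w in zip(reduced, w_score)))
    peak = max(scores)                          # subtract the max for a stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]         # attention weights over time steps
    context = [sum(w * step[i] for w, step in zip(weights, h))
               for i in range(len(h[0]))]       # weighted sum of the features
    return weights, context
```

The returned `context` plays the role of the [batch_size, 256] vector in the text, here for one sample and an arbitrary feature width.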
[0019] (4) Feature Fusion Layer: The global key temporal features with dimensions [batch_size, 256] output by the self-attention mechanism are concatenated with the original features with dimensions [batch_size, input_size] from the last step of the input sequence to obtain the fused feature with dimensions [batch_size, 256 + input_size]. This fused feature is processed sequentially through linear transformation, ELU activation, Dropout, and output linear layer: First, the fused feature is mapped to 64 dimensions through the linear layer, and the output feature has dimensions [batch_size, 64]. After the nonlinearity is enhanced by the ELU activation function, the dimension remains unchanged at [batch_size, 64]. Dropout randomly discards some neurons to prevent overfitting, and the dimension is still [batch_size, 64]. Finally, the 64-dimensional feature is mapped to the output dimension through the output linear layer, and the final output is the predicted vehicle state value at the future time [batch_size, input_size].
[0020] Step 22: Design a dynamically delay-sensitive adaptive kinematic loss function. Specifically, the loss function comprises three parts: position loss, heading loss, and kinematic constraint loss. The weights of each part are dynamically adjusted by a learnable log-variance parameter. The expression is as follows:
[0021] ;
[0022] where the predicted position and heading angle are compared with their actual values; three learnable log-variance parameters dynamically balance the weights of the position and heading losses; the predicted lateral and longitudinal displacement increments enter the kinematic constraint term, in which a small constant is added to keep the denominator from being 0; and the weighting coefficient of the kinematic constraint loss strengthens the model's adherence to the physical laws of vehicle motion by penalizing the deviation between the ratio of the displacement increments and the tangent of the steering angle.
[0023] Step 23: Train the vehicle state prediction model using vehicle time-series data. Use the AdamW optimizer with the loss function designed in Step 22 as the optimization objective to iteratively update the network parameters. During training, clip the gradients to avoid gradient explosion. Calculate the average loss on the training and validation sets every 10 iterations. If the validation loss does not decrease for 10 consecutive epochs, trigger early stopping to end training. Save the final network weights together with the feature normalizer computed on the training set, which is used for feature normalization in subsequent prediction.
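Since the loss expression itself is not reproduced in this text, the following is only one plausible form consistent with the description: squared position and heading errors weighted by exp(-s) with the log-variance s added as a regularizer, plus a weighted kinematic term. The function name, the argument layout, the use of the predicted heading in the tangent, and the default values of `lam` and `eps` are all assumptions.

```python
import math

def adaptive_kinematic_loss(pred, true, prev, log_vars, lam=0.1, eps=1e-6):
    """pred/true/prev: (x, y, heading); log_vars: (s_x, s_y, s_heading). Assumed form."""
    s_x, s_y, s_h = log_vars
    err_x = (pred[0] - true[0]) ** 2
    err_y = (pred[1] - true[1]) ** 2
    diff = pred[2] - true[2]
    err_h = math.atan2(math.sin(diff), math.cos(diff)) ** 2   # wrapped angle error
    dx, dy = pred[0] - prev[0], pred[1] - prev[1]             # displacement increments
    kinematic = (dy / (dx + eps) - math.tan(pred[2])) ** 2    # eps guards the denominator
    return (math.exp(-s_x) * err_x + s_x
            + math.exp(-s_y) * err_y + s_y
            + math.exp(-s_h) * err_h + s_h
            + lam * kinematic)
```

In a trainable version the three log-variance values would be parameters updated by the optimizer alongside the network weights; here they are plain numbers so the arithmetic can be checked by hand.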
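The two training controls named in Step 23, gradient clipping and patience-based early stopping, can be illustrated independently of any framework. Both helpers below are hypothetical sketches: clipping is shown as norm-based rescaling of a flat gradient list, and the clipping threshold is an assumption because the text does not give one.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grads]

def early_stopping_epoch(val_losses, patience=10):
    """Return the epoch index at which training stops, or None if it never does."""
    best, stale = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0      # improvement resets the counter
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None
```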
[0024] Further, in step 3, a residual dataset and a multi-task OSGP real-time disturbance compensation model are constructed. Vehicle time-series data is used as input, and the state prediction result is obtained by inference with the vehicle state prediction model. The residual is the difference between the prediction result and the actual vehicle state, and data pairs of control commands and their corresponding residuals form the residual dataset. The multi-task OSGP real-time disturbance compensation model consists of an inducing point sharing module, multi-task branches, and an online update unit. The inducing point sharing module selects initial inducing points from the residual dataset, which all tasks share. Each multi-task branch contains its own combined RBF and linear kernel function and its own likelihood function. Taking the control command sequence of the residual dataset as input, the branches jointly model the multi-dimensional residual dependencies and output a predictive probability distribution for each residual as the disturbance compensation result. The online update unit receives new residual data in real time through a sliding window and dynamically optimizes the kernel hyperparameters and inducing point positions. The specific method is as follows:
[0025] Step 31, constructing the residual dataset, specifically:
[0026] (1) Data preprocessing: The vehicle time-series data is preprocessed. First, angle continuity is handled: angles exceeding the (-π, π) range are corrected so that angles remain continuous along the time series, avoiding calculation deviations caused by periodicity. Next, the mean and standard deviation saved during the training phase of the vehicle state prediction model are used to standardize the input feature columns and eliminate differences in scale. Finally, features from 10 consecutive time steps of the vehicle time-series data are extracted to form each input sequence, and the corresponding label is the vehicle state at the 11th time step.
[0027] (2) Vehicle state prediction model prediction and residual calculation: Load the trained vehicle state prediction model, input the preprocessed vehicle time series data into the model in batches, output the vehicle state prediction value, calculate the difference between the actual vehicle state and the vehicle state prediction value, and combine the input control command and the corresponding residual into a residual dataset.
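The preprocessing in Step 31 (angle continuity, standardization, and 10-step windows labeled with the 11th step) can be sketched as below. The helper names are hypothetical, and the unwrapping rule shown is the usual "remove 2π jumps between neighbors" approach, which is assumed rather than taken from the text.

```python
import math

def unwrap(angles):
    """Remove 2*pi jumps so a heading series stays continuous in time."""
    out = [angles[0]]
    for a in angles[1:]:
        step = a - out[-1]
        step = (step + math.pi) % (2.0 * math.pi) - math.pi   # wrap step into [-pi, pi)
        out.append(out[-1] + step)
    return out

def standardize(rows, mean, std):
    """Scale each feature column with statistics saved at training time."""
    return [[(v - m) / s for v, m, s in zip(row, mean, std)] for row in rows]

def make_windows(rows, seq_len=10):
    """Pair each run of seq_len rows with the row that immediately follows it."""
    return [(rows[i:i + seq_len], rows[i + seq_len])
            for i in range(len(rows) - seq_len)]
```

For a series of 12 rows and `seq_len=10`, `make_windows` returns two samples, the first labeled with row index 10, that is, the 11th time step.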
[0028] Step 32, Multi-task OSGP model training, specifically:
[0029] (1) Model structure design: Construct a multi-task OSGP real-time disturbance compensation model adapted to joint modeling of the three-dimensional residual. Its structure is as follows. 500 samples are randomly selected from the training samples of the residual dataset as initial inducing points. Each inducing point is a 50-dimensional vector formed from a sequence of length 10 with the five features of the vehicle time-series data. All tasks share this set of inducing points, which reduces computational complexity while preserving feature correlation between tasks. An independent Gaussian process (GP) branch is designed for each residual dimension. Each branch takes the 50-dimensional feature vector as input and outputs the probability distribution of the residual for that dimension; it comprises a mean module and a combined RBF and linear kernel function. The kernel supports automatic relevance determination (per-dimension length scales), adapting to the scale at which each feature dimension influences the residual. Each task is configured with an independent likelihood function, which takes the distribution output by the branch as input and outputs a predictive distribution that includes noise, adapting to the different noise characteristics of the residuals in each dimension.
[0030] (2) Hyperparameter optimization and training control: The Adam optimizer is used with the variational evidence lower bound (ELBO) as the loss. The negative ELBO is calculated for the GP branch of each task and summed, and the model parameters are updated through backpropagation, including the inducing point positions, kernel hyperparameters, and likelihood noise parameters. The sliding window size is set to W=100, and the statistics of the residuals within the window are computed in real time. The hyperparameters are updated by gradient descent to minimize the negative log-likelihood of the residuals within the window, achieving dynamic adaptation to time-varying disturbances. The training loss and validation RMSE are calculated every 10 epochs. When the validation RMSE does not decrease for 10 consecutive epochs, the scheduler halves the learning rate to avoid gradient oscillation. If the validation RMSE does not improve for 10 consecutive epochs, early stopping is triggered, and the model state dictionary, likelihood function state, and configuration parameters are saved to the weight file to preserve the model's generalization.
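A combined RBF and linear kernel with per-dimension length scales, as described for each GP branch, can be written in a few lines. The sketch below is a generic textbook form with hypothetical parameter names (`rbf_var`, `lin_var`); it is not tied to any particular GP library, and the variances would normally be learned rather than fixed.

```python
import math

def rbf_plus_linear(x1, x2, lengthscales, rbf_var=1.0, lin_var=1.0):
    """k(x1, x2) = RBF with per-dimension length scales + scaled dot product."""
    sq_dist = sum(((a - b) / l) ** 2 for a, b, l in zip(x1, x2, lengthscales))
    rbf = rbf_var * math.exp(-0.5 * sq_dist)
    linear = lin_var * sum(a * b for a, b in zip(x1, x2))
    return rbf + linear
```

A large length scale on one dimension makes the RBF part nearly insensitive to that feature, which is how per-dimension length scales express the differing influence of the 50 input features.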
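The sliding-window objective can be illustrated with a deliberately simplified stand-in: a fixed-size window of scalar residuals scored by the average negative log-likelihood under a single Gaussian. The class name and the scalar Gaussian are assumptions made for illustration; the method described above evaluates the likelihood under the GP predictive distribution instead.

```python
import math
from collections import deque

class ResidualWindow:
    """Fixed-size window of the most recent residuals (W = 100 in the text)."""

    def __init__(self, size=100):
        self.values = deque(maxlen=size)

    def push(self, residual):
        self.values.append(residual)   # the oldest value drops out when full

    def gaussian_nll(self, mean, variance):
        """Average negative log-likelihood of the window under N(mean, variance)."""
        n = len(self.values)
        if n == 0:
            raise ValueError("window is empty")
        mean_sq = sum((r - mean) ** 2 for r in self.values) / n
        return 0.5 * (math.log(2.0 * math.pi * variance) + mean_sq / variance)
```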
[0031] Further, in step 4, the vehicle state prediction model and the real-time disturbance compensation model infer the vehicle time-series data to obtain the state prediction result and the disturbance compensation result. Based on this, the vehicle state prediction model and the real-time disturbance compensation model are linearized using the central difference method. Using the current time-series data as a benchmark, small disturbances are applied to the state variables and control variables respectively, and the difference in state changes before and after the disturbance is calculated, thereby obtaining the state Jacobian matrix and the control Jacobian matrix, achieving a linear approximation between the vehicle state prediction model and the real-time disturbance compensation model. Subsequently, an MPC optimization problem is constructed, using the linearized state equation as constraints, including initial state constraints, state transition constraints, control quantity boundary constraints, and control rate of change constraints, and the constraints are transformed to a standardized space. The objective function design comprehensively considers trajectory tracking error, control quantity magnitude, and control smoothness. The optimization problem is solved using a solver to obtain the optimal control sequence. The specific method is as follows:
[0032] Step 41, State Prediction and Processing: Vehicle time-series data is input into the vehicle state prediction model and the real-time disturbance compensation model for prediction. The vehicle time-series data first undergoes angle continuity processing and is then transformed to a standardized space using a feature normalizer. The vehicle state prediction model first performs a preliminary prediction on the time-series data, and then the real-time disturbance compensation model performs disturbance prediction and compensation based on the same vehicle time-series data.
[0033] Step 42: The composite model is linearized online using the central difference method. Small perturbations $\epsilon_x$ and $\epsilon_u$ are applied to the state variables $x$ and the control variables $u$ of the current vehicle time-series data, respectively. The states before and after each perturbation are predicted with the vehicle state prediction model and the real-time disturbance compensation model, written together as $f(x, u)$, and the state Jacobian matrix A and the control Jacobian matrix B are calculated, which describe the influence of the historical state and control on the current state, respectively. Let $e_j$ be the unit vector of dimension $j$; the $j$-th columns of A and B are:
[0034] $A_{:,j} = \dfrac{f(x + \epsilon_x e_j,\, u) - f(x - \epsilon_x e_j,\, u)}{2\epsilon_x}$;
[0035] $B_{:,j} = \dfrac{f(x,\, u + \epsilon_u e_j) - f(x,\, u - \epsilon_u e_j)}{2\epsilon_u}$;
[0036] Linearized state equations are established based on matrices A and B: ;
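Central-difference linearization of a black-box one-step predictor can be sketched as follows. Here `f` is a hypothetical callable standing in for the composite of the state prediction model and the disturbance compensation model, and the perturbation sizes are illustrative defaults rather than values from the text.

```python
def central_difference_jacobians(f, x, u, eps_x=1e-4, eps_u=1e-4):
    """f(x, u) -> next state (list). Returns A (n x n) and B (n x m) as nested lists."""
    def jacobian(vec, eps, call):
        cols = []
        for j in range(len(vec)):
            plus, minus = list(vec), list(vec)
            plus[j] += eps                       # perturb dimension j upward
            minus[j] -= eps                      # and downward
            f_plus, f_minus = call(plus), call(minus)
            cols.append([(p - m) / (2.0 * eps) for p, m in zip(f_plus, f_minus)])
        return [list(row) for row in zip(*cols)]  # columns -> row-major matrix

    a = jacobian(x, eps_x, lambda v: f(v, u))
    b = jacobian(u, eps_u, lambda v: f(x, v))
    return a, b
```

Each Jacobian column costs two model evaluations, so one linearization needs 2(n + m) forward passes; this is worth keeping in mind for the control-cycle time budget when `f` is a neural network plus a GP.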
[0037] Step 43: MPC optimization problem construction. Define the state variable sequence and the control variable sequence over the prediction horizon N, together with the time step. The constraints include initial state constraints, state transition constraints, physical constraints, and control smoothness constraints. The physical constraints bound the control variables in the standardized space: the acceleration limits (in m/s²) and the steering angle limits (in rad) are converted to the standardized space and used as the upper and lower bounds. The control smoothness constraints bound the rate of change of the control variables: the limits of acceleration rate of change ≤ 1.5 m/s³ and steering angle rate of change ≤ 0.08 rad/s are converted to the standardized space and then applied to the difference between adjacent control variables. The objective function is set to minimize trajectory tracking error, control energy, and control variation (for smoothness), and its expression is:
[0038] ;
[0039] ;
[0040] where the reference is the target trajectory at time t; the three weighting terms are the weights for state tracking error, control energy, and control smoothness, respectively; and the remaining two parameters are the allowable relative rate of change and the minimum allowable variation;
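Converting physical limits into the standardized space, as required for the constraints in Step 43, can be illustrated with a short sketch. The rate limits 1.5 m/s³ and 0.08 rad/s come from the text; the helper names, the time step, and the mean and standard deviation used in the usage note are assumptions.

```python
def standardized_bounds(lower, upper, mean, std):
    """Map physical box bounds into the standardized (z-score) space."""
    return (lower - mean) / std, (upper - mean) / std

def standardized_step_limit(rate_limit, dt, std):
    """Largest allowed change between adjacent controls; the mean cancels in a difference."""
    return rate_limit * dt / std

def satisfies_constraints(controls, bounds, step_limit):
    """Check box bounds and the adjacent-difference (smoothness) limit."""
    lo, hi = bounds
    in_box = all(lo <= c <= hi for c in controls)
    smooth = all(abs(b - a) <= step_limit for a, b in zip(controls, controls[1:]))
    return in_box and smooth
```

For example, with an assumed time step of 0.1 s and an assumed acceleration standard deviation of 0.5 m/s², the 1.5 m/s³ rate limit becomes a per-step limit of about 0.3 in standardized units.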
[0041] Step 44: Solve the above optimization problem by calling the MOSEK and ECOS solvers in sequence. If the solution is successful, output the optimal control sequence. If all solvers fail, the current control quantity is used as the initial control sequence. The first instruction of the optimal control sequence is applied to the vehicle, and the control quantity and the actual state of the vehicle are added to the state sequence. The state sequence is updated through a sliding window, and the state of the last 10 moments is retained. In the next control cycle, the above process is repeated based on the updated state sequence and the newly extracted target trajectory segment to achieve rolling optimization and ensure that the vehicle tracks the target trajectory in real time.
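The solver cascade and the 10-step sliding window of Step 44 can be sketched with placeholder callables. The `solvers` list stands in for calls to MOSEK and ECOS, whose actual invocation is not shown here, and holding the current control over the horizon when every solver fails is one interpretation of the fallback described above.

```python
from collections import deque

def solve_with_fallback(problem, solvers, current_control, horizon):
    """Try each solver in order; if all fail, hold the current control."""
    for solve in solvers:
        try:
            sequence = solve(problem)
        except Exception:
            continue                    # this solver failed, try the next one
        if sequence:
            return sequence
    return [current_control] * horizon

def make_history(length=10):
    """Sliding window that keeps only the most recent (control, state) pairs."""
    return deque(maxlen=length)
```

In a control loop, only the first element of the returned sequence is applied, the resulting (control, state) pair is appended to the history, and the problem is rebuilt from the updated window on the next cycle.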
[0042] Step 45: Online parameter update of the real-time disturbance compensation model. After the vehicle executes the control command, vehicle time-series data is collected. Combined with the predicted value of the vehicle state prediction model at that moment, residual data is calculated. At the same time, the control sequence and state sequence of the current moment and the previous 9 moments are extracted to form a vehicle time-series data of length 10. This data is combined with the new residual to form an input feature-residual data pair and stored in the data buffer. When the amount of data in the buffer reaches the threshold, the buffer data is converted into tensors and transmitted to the computing device. The real-time disturbance compensation model is switched to training mode. The model parameters are iteratively optimized using the variational evidence lower bound as the loss function and the Adam optimizer. After optimization, the buffer is cleared. The updated real-time disturbance compensation model is used for disturbance compensation in the next cycle to ensure the model's dynamic adaptability to time-varying disturbances.
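The buffer-triggered update of Step 45 follows a simple pattern: accumulate feature-residual pairs, run an update once a threshold is reached, then clear the buffer. In the sketch below the class name is hypothetical and `update_fn` is a placeholder for the variational training step, which is not shown.

```python
class OnlineUpdateBuffer:
    """Collect (features, residual) pairs and trigger an update at a threshold."""

    def __init__(self, threshold, update_fn):
        self.threshold = threshold
        self.update_fn = update_fn      # placeholder for the model's training step
        self.pairs = []

    def add(self, features, residual):
        """Store one pair; return True if this call triggered an update."""
        self.pairs.append((features, residual))
        if len(self.pairs) >= self.threshold:
            self.update_fn(list(self.pairs))   # hand over a copy of the batch
            self.pairs.clear()
            return True
        return False
```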
[0043] A hybrid model predictive control system for addressing control delay in large vehicles is provided, which implements the hybrid model predictive control method for addressing control delay in large vehicles described above.
[0044] A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the hybrid model predictive control method for addressing control delay in large vehicles.
[0045] A computer-readable storage medium has a computer program stored thereon; when the computer program is executed by a processor, the hybrid model predictive control method for addressing control delay in large vehicles is implemented.
[0046] Compared with the prior art, the significant advantages of this invention are:
[0047] (1) By using the dual-layer mechanism of LSTM global dynamic modeling and OSGP local error compensation, the time-varying delay caused by high inertia and actuator hysteresis can be captured without an accurate physical model, thus avoiding the trajectory deviation caused by the difficulty of modeling in traditional control.
[0048] (2) The attention-residual structure and the dynamically delay-sensitive loss function enhance the capture of long time delays, while the incorporated kinematic constraints avoid physically inconsistent predictions and significantly reduce heading angle and position prediction errors.
[0049] (3) Residuals caused by external disturbances are compensated in real time, and online parameter updates through a sliding window together with inducing point sparsification balance disturbance adaptation against the timeliness of online control.
Attached Figure Description
[0050] Figure 1 is the overall methodological framework of the present invention.
[0051] Figure 2 is a single-layer LSTM unit implemented in this invention.
[0052] Figure 3 is the multi-layer LSTM network framework implemented in this invention.
[0053] Figure 4 is a comparison of predicted trajectories in an embodiment of the present invention.
[0054] Figure 5 is a comparison of the predicted heading angles in an embodiment of the present invention.
[0055] Figure 6 is a comparison of the predicted position errors in an embodiment of the present invention.
[0056] Figure 7 is a comparison of the predicted heading angle errors in an embodiment of the present invention.
[0057] Figure 8 is a comparison of trajectory tracking in an embodiment of the present invention.
[0058] Figure 9 is a comparison of heading angles in an embodiment of the present invention.
[0059] Figure 10 is a comparison of position errors in an embodiment of the present invention.
[0060] Figure 11 is a comparison of heading angle errors in an embodiment of the present invention.
[0061] Figure 12 is a comparison of the control performance indicators in an embodiment of the present invention.
Detailed Implementation
[0062] The present invention will be further described below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are only for explaining this application and are not intended to limit this application. After reading this invention, any modifications of the invention in various equivalent forms by those skilled in the art fall within the scope defined by the appended claims.
[0063] This invention provides a hybrid model predictive control method to address control delays in large vehicles. The overall method flow is shown in Figure 1.
[0064] Step 1: Construct a simulation environment based on the vehicle dynamics model to simulate vehicle motion under different driving modes. Design a control command sequence containing delays and noise to act on the vehicle dynamics model, and collect time-series data of the control command sequence and corresponding vehicle states. This includes the following specific steps:
[0065] Step 11: Based on the six-degree-of-freedom state-space equations, establish a vehicle dynamics model to simulate vehicle motion under different driving modes:
[0066] ;
[0067] The state vector describes the vehicle status and comprises the longitudinal velocity, lateral velocity, yaw rate, heading angle, and position coordinates; the longitudinal acceleration is calculated from the force balance equation:
[0068] ;
[0069] where the terms are the driving force, braking force, air density, air drag coefficient, windward area, rolling resistance coefficient, and gravitational acceleration. The vehicle dynamics model also corrects the tire lateral stiffness for the dynamic changes in axle load caused by acceleration:
[0070] ;
[0071] ;
[0072] ;
[0073] ;
[0074] where the geometric parameters are the distances from the front and rear axles to the center of gravity, and the front and rear tire lateral stiffnesses are calculated from the static lateral stiffnesses and the front and rear axle loads. The vehicle parameters are shown in the table below:
[0075] Table 1 Main parameters of the vehicle dynamics model
[0076]
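As an illustration only, the longitudinal force balance of Step 11 can be sketched as follows. The equation itself is not reproduced in this text, so the sketch assumes the standard form (driving force minus braking force, aerodynamic drag, and rolling resistance, divided by mass); the function name and every numeric value are hypothetical.

```python
def longitudinal_acceleration(f_drive, f_brake, v_x, mass,
                              rho=1.225, c_d=0.8, area=8.0,
                              c_roll=0.015, g=9.81):
    """Assumed standard balance: (F_drive - F_brake - F_aero - F_roll) / m."""
    f_aero = 0.5 * rho * c_d * area * v_x ** 2   # aerodynamic drag
    f_roll = c_roll * mass * g                   # rolling resistance
    return (f_drive - f_brake - f_aero - f_roll) / mass


# Hypothetical 20 t vehicle at 10 m/s with 30 kN of driving force.
a_x = longitudinal_acceleration(30_000.0, 0.0, 10.0, 20_000.0)
```

The axle-load correction of the tire lateral stiffness is not included, because its formulas are not available here.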
[0077] Step 12: Design a control command sequence containing delay and noise to act on the vehicle dynamics model, and collect time-series data of the control command sequence and the corresponding vehicle states. Control commands are generated with the acceleration and the front wheel steering angle as the core benchmark parameters. Base values for both are first constructed using sine waves and uniform distribution functions; the parameters are then adjusted according to the driving mode, and smoothness is improved through filtering and rate-of-change limiting. For smooth driving, the steering angle follows a 0.3 Hz sine wave ranging from -0.18 to 0.18 rad, and the acceleration follows a 0.2 Hz sine wave ranging from -0.5 to 1.5 m/s². For emergency braking, the steering angle starts as a 0.8 Hz sine wave that gradually decays by a factor of 0.9, while the acceleration, initially in the range 0.8 to 1.2, is switched linearly to -4.5 m/s² at a transition rate of ±2.0 m/s². For the serpentine steering maneuver, the steering angle follows a 0.4 Hz sine wave ranging from -0.18 to 0.18 rad with a random phase that varies the steering frequency, and the acceleration is held at 2.0 ± 1.0 m/s², simulating continuous steering. The delay is implemented in two ways. In the queue delay, a delay of 5 to 10 steps is selected at random, and a command is retrieved and executed only once the queue length reaches the delay step count. In the first-order inertial delay, a transition coefficient is calculated from the time constant and the time step, and the current command is blended with the value executed at the previous time step as a weighted sum, simulating the response lag of the actuator caused by inertia. Zero-mean Gaussian noise is then superimposed on the final steering angle (±0.01 rad) and acceleration commands. The state sequence is calculated through the vehicle dynamics model, which finally yields the time-series data of control commands and corresponding vehicle states.
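The two delay mechanisms of Step 12 can be illustrated with a minimal sketch. The class and function names are hypothetical, and the transition coefficient alpha = dt / (tau + dt) is an assumed form, since the formula is not reproduced in this text; the time constant, time step, and noise level are likewise illustrative.

```python
import random
from collections import deque


class QueueDelay:
    """Hold commands in a FIFO; release one only once `steps` are queued."""

    def __init__(self, steps, initial=0.0):
        self.steps = steps
        self.queue = deque()
        self.last = initial          # value applied while the queue fills

    def step(self, command):
        self.queue.append(command)
        if len(self.queue) >= self.steps:
            self.last = self.queue.popleft()
        return self.last


def first_order_lag(command, previous, tau, dt):
    """Blend the new command with the previously executed value.

    alpha = dt / (tau + dt) is an assumed form of the transition coefficient.
    """
    alpha = dt / (tau + dt)
    return alpha * command + (1.0 - alpha) * previous


rng = random.Random(0)
delay = QueueDelay(steps=rng.randint(5, 10))      # 5 to 10 steps, as in the text
executed, applied = 0.0, []
for k in range(30):
    delayed = delay.step(1.0)                     # unit step command
    executed = first_order_lag(delayed, executed, tau=0.3, dt=0.05)
    applied.append(executed + rng.gauss(0.0, 0.01))   # illustrative noise level
```

With a unit step command, `applied` stays near zero while the queue fills and then rises gradually, which is the combined dead-time and lag behavior the dataset is meant to contain.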
[0078] Step 2: Construct a vehicle state prediction model by integrating a self-attention mechanism and a residual multilayer LSTM network. The model comprises a spatiotemporal feature extraction layer, multilayer LSTM units, a self-attention mechanism, and a feature fusion layer. The spatiotemporal feature extraction layer contains a linear transformation layer, ReLU, and layer normalization; it takes vehicle time-series data as input, maps it from low to high dimension, applies activation and normalization, and outputs preliminary high-dimensional spatiotemporal features. The multilayer LSTM units are stacked LSTM subunits with residual connections; they take the high-dimensional spatiotemporal features as input, add the output of each layer to that layer's input as a residual, capture long-term dependencies through multilayer iteration, and output deeply encoded temporal features. The self-attention mechanism includes a dimensionality-reduction linear layer, Tanh, and a weight output layer; it takes the deep temporal features as input, computes time-step weights and aggregates the features with them, and outputs a feature vector focused on key time steps. The feature fusion layer includes a concatenation operation, a linear transformation layer, ELU, and Dropout; its inputs are the output of the attention layer and the original features from the last step, and after concatenation and fusion it outputs the predicted state of the vehicle at the next moment. The vehicle state prediction model is trained with a dynamically delay-sensitive adaptive kinematic loss function that jointly considers position error, heading error, and kinematic constraint deviation. The specific steps include:
[0079] Step 21: Construct a vehicle state prediction model by integrating a self-attention mechanism with a residual multilayer LSTM network, including a spatiotemporal feature extraction layer, multilayer LSTM units, and a self-attention mechanism and feature fusion layer, specifically:
[0080] (1) Spatiotemporal feature extraction layer: As shown in Figure 2, this layer is a cascade of linear transformation, ReLU activation, and layer normalization. The input has dimension [batch_size, seq_len, input_size] and represents the historical control commands and state sequences, where batch_size, seq_len, and input_size are the batch size, sequence length, and number of input features, respectively. First, a linear layer maps the features at each time step from the number of input features to the hidden dimension, set to 256 during training, giving an output dimension of [batch_size, seq_len, 256]. Next, the ReLU activation function adds nonlinear expressive power, with the output dimension unchanged at [batch_size, seq_len, 256]. Finally, layer normalization normalizes the 256-dimensional features at each time step, eliminating scale differences between features. The final output is a uniformly scaled spatiotemporal feature of dimension [batch_size, seq_len, 256], which provides the foundation for subsequent temporal modeling.
[0081] (2) Multilayer LSTM unit: As shown in Figure 3, four layers of LSTM units are stacked with residual connections. Each LSTM layer has an input and output dimension of 256. The input to the stack is the output of the spatiotemporal feature extraction layer, with dimension [batch_size, seq_len, 256]. The first LSTM layer receives this input and outputs a temporal feature of dimension [batch_size, seq_len, 256], which is added element-wise to the layer's own input to obtain a residual-enhanced feature of the same dimension. This enhanced feature is the input to the second LSTM layer, whose output is likewise added element-wise to its input. The process continues through all four LSTM layers, and the final output is a deep temporal feature of dimension [batch_size, seq_len, 256].
[0082] (3) Self-attention mechanism: This consists of a dimensionality-reduction linear layer, Tanh activation, and a weight output layer. Its input is the deep temporal feature output by the multilayer LSTM, with dimension [batch_size, seq_len, 256]. First, the dimensionality-reduction linear layer maps the 256-dimensional feature to 32 dimensions, giving [batch_size, seq_len, 32]; the Tanh activation leaves the dimension unchanged. The weight output layer then converts the 32-dimensional feature into a one-dimensional weight score of dimension [batch_size, seq_len, 1], and squeeze(-1) yields raw weight scores of dimension [batch_size, seq_len]. These scores are normalized along the sequence length dimension with the Softmax function to obtain attention weights of dimension [batch_size, seq_len]. Finally, the weights are used to form a weighted sum of the temporal features output by the LSTM: the weights, reshaped to [batch_size, 1, seq_len], are multiplied by the LSTM features of dimension [batch_size, seq_len, 256], giving [batch_size, 1, 256]. After squeeze(1), a global key temporal feature vector of dimension [batch_size, 256] is obtained, which focuses dynamically on the key moments related to control delay and suppresses irrelevant temporal information.
[0083] (4) Feature Fusion Layer: The global key temporal features with dimensions [batch_size, 256] output by the self-attention mechanism are concatenated with the original features with dimensions [batch_size, input_size] from the last step of the input sequence to obtain the fused feature with dimensions [batch_size, 256 + input_size]. This fused feature is processed sequentially through linear transformation, ELU activation, Dropout, and output linear layer: First, the fused feature is mapped to 64 dimensions through the linear layer, and the output dimension is [batch_size, 64]. After the nonlinearity is enhanced by the ELU activation function, the dimension remains unchanged at [batch_size, 64]. Dropout randomly discards some neurons to prevent overfitting, and the dimension is still [batch_size, 64]. Finally, the 64-dimensional features are mapped to the output dimension through the output linear layer, and the final output is the predicted vehicle state value at the future time [batch_size, input_size].
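The attention pooling of item (3) above can be sketched for a single sample in plain Python. This is a minimal illustration rather than the trained network: the weights are hypothetical, bias terms are omitted, and the toy sizes (hidden dimension 2, reduced dimension 2) stand in for the 256 and 32 used in the text.

```python
import math


def softmax(scores):
    m = max(scores)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, w_reduce, w_score):
    """features: [seq_len][hidden]; w_reduce: [reduced][hidden]; w_score: [reduced]."""
    scores = []
    for h in features:
        # Dimensionality reduction followed by Tanh, then a scalar score.
        reduced = [math.tanh(sum(w * v for w, v in zip(row, h))) for row in w_reduce]
        scores.append(sum(w * r for w, r in zip(w_score, reduced)))
    weights = softmax(scores)                    # normalize over time steps
    hidden = len(features[0])
    context = [sum(weights[t] * features[t][j] for t in range(len(features)))
               for j in range(hidden)]
    return weights, context


features = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]  # toy [seq_len=3, hidden=2]
w_reduce = [[1.0, 0.0], [0.0, 1.0]]              # hypothetical weights
w_score = [1.0, 1.0]
weights, context = attention_pool(features, w_reduce, w_score)
```

The returned `context` plays the role of the [batch_size, 256] vector that the feature fusion layer concatenates with the last-step input features.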
[0084] Step 22: Design a dynamically delay-sensitive adaptive kinematic loss function. Specifically, the loss function comprises three parts: position loss, heading loss, and kinematic constraint loss. The weights of each part are dynamically adjusted by a learnable log-variance parameter. The expression is as follows:
[0085] ;
[0086] where the predicted position and heading angle are compared with the corresponding actual values; three learnable log-variance parameters dynamically balance the weights of the position and heading losses; the kinematic constraint term uses the predicted lateral and longitudinal displacement increments and the front wheel steering angle, with the denominator clamped to a minimum of 1e-6 to avoid division by zero; and the weighting coefficient of the kinematic constraint loss is set to 5.0. By constraining the deviation between the ratio of the displacement increments and the tangent of the steering angle, the loss strengthens the model's adherence to the physical laws of vehicle motion.
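One possible shape of such a loss is sketched below. Because the expression is not reproduced in this text, the form is an assumption: each error term is scaled by exp(-s) and penalized by +s for its log-variance s (a common uncertainty-weighting scheme), and the kinematic term compares the increment ratio with the tangent of the steering angle. The function name and argument layout are hypothetical, and in training the log-variances would be learnable parameters rather than fixed numbers.

```python
import math


def delay_sensitive_loss(pred, true, dx, dy, steer, log_var, lam=5.0, eps=1e-6):
    """pred/true: lists of (X, Y, phi); dx, dy: predicted longitudinal and
    lateral displacement increments; steer: front wheel steering angles;
    log_var: (s_x, s_y, s_phi). All names are illustrative."""
    n = len(pred)
    mse_x = sum((p[0] - t[0]) ** 2 for p, t in zip(pred, true)) / n
    mse_y = sum((p[1] - t[1]) ** 2 for p, t in zip(pred, true)) / n
    # Wrap the heading error so that angles on either side of +/-pi stay close.
    mse_phi = sum(math.atan2(math.sin(p[2] - t[2]), math.cos(p[2] - t[2])) ** 2
                  for p, t in zip(pred, true)) / n
    weighted = sum(math.exp(-s) * m + s
                   for s, m in zip(log_var, (mse_x, mse_y, mse_phi)))
    kin = 0.0
    for a, b, d in zip(dx, dy, steer):
        # Clamp the denominator magnitude to eps to avoid division by zero.
        den = a if abs(a) >= eps else math.copysign(eps, a)
        kin += (b / den - math.tan(d)) ** 2
    return weighted + lam * kin / n


loss = delay_sensitive_loss(pred=[(1.0, 0.0, 0.0)], true=[(0.0, 0.0, 0.0)],
                            dx=[1.0], dy=[0.0], steer=[0.0],
                            log_var=(0.0, 0.0, 0.0))
```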
[0087] Step 23: Train the network on the training set using the AdamW optimizer with an initial learning rate of 1e-2 and a weight decay coefficient of 1e-4, iteratively updating the network parameters with the loss function designed in Step 22 as the optimization objective. During training, gradients are clipped to avoid gradient explosion. The average loss on the training and validation sets is calculated every 10 iterations. If the validation loss does not decrease for 10 consecutive epochs, the early stopping mechanism is triggered and training stops. The final network weights and the feature normalizer computed from the training set are saved for feature normalization in the subsequent prediction process.
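The training controls in Step 23 can be illustrated independently of any deep learning framework. The sketch below shows a patience-based early stopping rule and gradient-norm limiting on a flat list; the class and function names are hypothetical, the validation curve is synthetic, and the choice of norm-based limiting is an assumption, since the text does not state the method.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` checks."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def update(self, val_loss):
        """Return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


def clip_by_norm(grads, max_norm):
    """Scale a flat gradient list so its L2 norm does not exceed max_norm."""
    norm = sum(g * g for g in grads) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grads]


stopper = EarlyStopping(patience=10)
losses = [1.0, 0.8, 0.7] + [0.7] * 12            # synthetic validation curve
stopped_at = next(i for i, v in enumerate(losses) if stopper.update(v))
```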
[0088] Step 3: Construct the residual dataset and the multi-task OSGP real-time disturbance compensation model. With vehicle time-series data as input, the vehicle state prediction model infers the state prediction result; the residual is the difference between the actual vehicle state and this prediction, and data pairs containing the control commands and the corresponding residuals form the residual dataset. The multi-task OSGP real-time disturbance compensation model consists of an inducing point sharing module, multi-task branches, and an online update unit. The inducing point sharing module selects initial inducing points from the residual dataset, which are shared by all tasks. Each multi-task branch contains its own combined RBF and linear kernel function and its own likelihood function; taking the control command sequence from the residual dataset as input, the branches jointly model multi-dimensional residual dependencies and output a probability distribution prediction for each residual as the disturbance compensation result. The online update unit receives new residual data in real time through a sliding window and dynamically optimizes the kernel hyperparameters and inducing point positions. The specific steps are as follows:
[0089] Step 31, constructing the residual dataset, specifically:
[0090] (1) Data preprocessing: The vehicle time-series data is preprocessed. First, angle continuity is handled: angles that fall outside the principal range are corrected so that angles remain continuous over the time series, avoiding calculation deviations caused by periodicity. Next, the mean and standard deviation saved during the training phase of the vehicle state prediction model are used to standardize the input feature columns, eliminating scale differences. Finally, features from 10 consecutive time steps are extracted to form each input sequence, with the vehicle state at the 11th time step as the corresponding label.
[0091] (2) Vehicle state prediction model prediction and residual calculation: Load the trained vehicle state prediction model, input the preprocessed vehicle time series data into the model in batches of batch_size=64, output the vehicle state prediction value, calculate the difference between the actual vehicle state and the vehicle state prediction value, and combine the input control command and the corresponding residual into a residual dataset containing 15,000 valid samples.
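The preprocessing and residual steps of Step 31 can be sketched as follows. The helper names are hypothetical, the unwrapping rule assumes that consecutive headings never truly differ by more than pi, and the sign convention (actual minus predicted) is an assumption.

```python
import math


def unwrap(angles):
    """Remove 2*pi jumps so consecutive angles differ by less than pi."""
    out = [angles[0]]
    for a in angles[1:]:
        step = a - out[-1]
        step -= 2.0 * math.pi * round(step / (2.0 * math.pi))
        out.append(out[-1] + step)
    return out


def standardize(column, mean, std):
    """Apply the mean and standard deviation saved at training time."""
    return [(v - mean) / std for v in column]


def make_windows(rows, seq_len=10):
    """Pair each run of seq_len rows with the row that follows it."""
    return [(rows[i:i + seq_len], rows[i + seq_len])
            for i in range(len(rows) - seq_len)]


def residuals(actual, predicted):
    """Assumed convention: residual = actual state minus predicted state."""
    return [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(actual, predicted)]


heading = unwrap([3.0, 3.1, -3.1, -3.0])         # crosses the +/-pi boundary
rows = [[float(k)] for k in range(12)]
windows = make_windows(rows)                     # 12 rows -> 2 (input, label) pairs
```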
[0092] Step 32, Multi-task OSGP model training, specifically:
[0093] (1) Model structure design: Construct a multi-task OSGP real-time disturbance compensation model for joint modeling of the residuals in the three dimensions X, Y, and φ. The structure is as follows. 500 samples are randomly selected from the training samples of the residual dataset as initial inducing points. Each inducing point is a 50-dimensional vector formed from a sequence of length 10 with 5 features per step of the vehicle time-series data. All tasks share this set of inducing points, which reduces computational complexity while preserving feature correlation between tasks. An independent GP branch is designed for each dimension; the input of each branch is a 50-dimensional feature vector, and the output is the probability distribution of the residual in the corresponding dimension. Each branch includes a mean module and a combined RBF and linear kernel function. The kernel supports automatic relevance determination, with a separate length scale per input dimension, so it can adapt to the scale at which each feature dimension influences the residual. Each task is configured with an independent likelihood function, which takes the distribution output by the branch as input and outputs a noisy predictive distribution, adapting to the noise characteristics of the residuals in different dimensions.
[0094] (2) Hyperparameter optimization and training control: The Adam optimizer is used with an initial learning rate of 0.007, and the variational evidence lower bound (ELBO) is used as the loss function. The negative ELBO is calculated for the GP branch of each task and summed, and the model parameters, including the inducing point positions, kernel hyperparameters, and likelihood noise parameters, are updated through backpropagation. The sliding window size is set to W=100, and the statistical characteristics of the residuals within the window are tracked in real time. The hyperparameters are updated by gradient descent with the learning rate set to 0.007, with the goal of minimizing the negative log-likelihood of the residuals within the window so as to adapt dynamically to time-varying disturbances. The training loss and validation RMSE are calculated every 10 epochs. When the validation RMSE does not decrease for 10 consecutive epochs, a scheduler halves the learning rate to avoid gradient oscillation. If the validation RMSE does not improve for 10 consecutive epochs, the early stopping mechanism is triggered, and the model state dictionary, likelihood function state, and configuration parameters are saved to the weight file to preserve the generalization of the model.
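The kernel used by each GP branch can be illustrated with a small sketch. It assumes the RBF and linear kernels are combined by summation and that the RBF part has one length scale per input dimension; the function names, variances, and toy points are hypothetical, and the sparse variational machinery itself is not shown.

```python
import math


def rbf_ard(x, y, lengthscales, variance=1.0):
    """RBF kernel with one length scale per input dimension."""
    sq = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengthscales))
    return variance * math.exp(-0.5 * sq)


def linear_kernel(x, y, variance=1.0):
    return variance * sum(a * b for a, b in zip(x, y))


def combined_kernel(x, y, lengthscales, rbf_var=1.0, lin_var=0.1):
    """Sum of the RBF and linear kernels; the sum form is an assumption."""
    return rbf_ard(x, y, lengthscales, rbf_var) + linear_kernel(x, y, lin_var)


def gram(points, lengthscales):
    """Kernel matrix over a shared set of points."""
    return [[combined_kernel(p, q, lengthscales) for q in points] for p in points]


# Three toy 2-D points standing in for the shared set of 500 points of 50 dimensions.
shared_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = gram(shared_points, lengthscales=[1.0, 2.0])
```

A larger length scale in one dimension makes the kernel less sensitive to differences along that dimension, which is how the per-dimension scales express feature relevance.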
[0095] Performance comparison analysis of the predictive models: The experiment compared a single LSTM model with the LSTM-OSGP composite model, using a test set of 1500 samples as the benchmark. Evaluation metrics included position RMSE, heading angle RMSE, and maximum error. Key metrics and comparison results are shown in Table 2 and Figures 4-7.
[0096] The LSTM-OSGP composite model, also known as the LOG model, outperforms the single LSTM model significantly: the position RMSE is 1.12m, a reduction of 62.88%, the heading angle RMSE is 0.32°, a reduction of 60.60%, the maximum error under complex conditions is greatly compressed, and the prediction bias in the external disturbance area is reduced by more than 40%.
[0097] In summary, the composite model effectively offsets external disturbances and inherent model biases by dynamically modeling the LSTM prediction residuals using OSGP, significantly improving the accuracy and robustness of large vehicle state prediction.
[0098] Table 2 Comparison of Model Prediction Performance Indicators
[0099]
[0100] Step 4: The vehicle state prediction model and the real-time disturbance compensation model perform inference on the vehicle time-series data to obtain the state prediction result and the disturbance compensation result. On this basis, the two models are linearized using the central difference method: taking the current time-series data as the reference point, small perturbations are applied to the state variables and the control variables separately, and the difference in the predicted state before and after each perturbation is calculated, yielding the state Jacobian matrix and the control Jacobian matrix and thus a linear approximation of the combined vehicle state prediction and real-time disturbance compensation models. An MPC optimization problem is then constructed with the linearized state equation as a constraint. The constraints include initial state constraints, state transition constraints, control quantity boundary constraints, and control rate-of-change constraints, and they are transformed to the standardized space. The objective function jointly considers trajectory tracking error, control quantity magnitude, and control smoothness. The optimization problem is solved with a solver to obtain the optimal control sequence. The specific steps are as follows:
[0101] Step 41, State Prediction and Processing: Vehicle time-series data is input into the vehicle state prediction model and the real-time disturbance compensation model for prediction. The vehicle time-series data first undergoes angle continuity processing and is then transformed to a standardized space using a feature normalizer. The vehicle state prediction model first performs a preliminary prediction on the time-series data, and then the real-time disturbance compensation model performs disturbance prediction and compensation based on the same vehicle time-series data.
[0102] Step 42: The composite model is linearized online using the central difference method. Small perturbations are applied separately to each state variable and each control variable in the current input sequence, and the composite model predicts the state before and after each perturbation. From these differences, the state Jacobian matrix A and the control Jacobian matrix B are calculated, which describe the influence of the state on the next state and the influence of the control on the next state, respectively. The perturbation of dimension j is applied along the unit vector of dimension j.
[0103] ;
[0104] ;
[0105] The linearized state equation is then established from matrices A and B.
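The central-difference linearization of Step 42 can be sketched as follows. The `toy_model` function is a hypothetical stand-in for the composite predictor, the perturbation size is illustrative, and the sketch assumes the linearized model takes the usual form in which the next state is approximated by A times the state plus B times the control.

```python
def jacobians(f, x, u, eps=1e-4):
    """Central-difference Jacobians of x_next = f(x, u) at the point (x, u)."""
    n, m = len(x), len(u)
    A = [[0.0] * n for _ in range(n)]
    B = [[0.0] * m for _ in range(n)]
    for j in range(n):                    # perturb one state dimension at a time
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp, u), f(xm, u)
        for i in range(n):
            A[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    for j in range(m):                    # perturb one control dimension at a time
        up, um = list(u), list(u)
        up[j] += eps
        um[j] -= eps
        fp, fm = f(x, up), f(x, um)
        for i in range(n):
            B[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    return A, B


def toy_model(x, u, dt=0.1):
    """Stand-in predictor: position and speed driven by an acceleration command."""
    pos, vel = x
    return [pos + vel * dt, vel + u[0] * dt]


A, B = jacobians(toy_model, [0.0, 5.0], [1.0])
```

Each Jacobian column costs two model evaluations, so the cost per control cycle grows with the number of state and control dimensions.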
[0106] Step 43: MPC optimization problem construction. The sequences of state variables and control variables over the prediction horizon N are defined, together with the time step. The constraints include initial state constraints, state transition constraints, physical constraints, and control smoothness constraints. Physical constraints bound the control variables: the acceleration limits (in m/s²) and the steering angle limits (in rad) are converted to the standardized space and used as upper and lower bounds. Control smoothness constraints bound the rate of change of the control variables: the acceleration rate of change ≤ 1.5 m/s³ and the steering angle rate of change ≤ 0.08 rad/s are converted to the standardized space and limit the difference between adjacent control variables. The objective function minimizes trajectory tracking error and control energy while promoting control smoothness, and is expressed as:
[0107] ;
[0108] ;
[0109] where the reference term is the target trajectory at time t; three weights are applied to the state tracking error, control energy, and control smoothness, respectively; and the remaining two parameters are the allowable relative rate of change and the minimum allowable variation.
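Converting the physical limits of Step 43 into the standardized space can be illustrated as below. The normalizer statistics, the time step, and the acceleration limits are hypothetical; only the 1.5 m/s³ rate limit is taken from the text. The sketch assumes the normalizer is an affine map (subtract the mean, divide by the standard deviation).

```python
def to_standardized(value, mean, std):
    return (value - mean) / std


def standardized_bounds(lower, upper, mean, std):
    """Map physical control limits into the normalizer's space."""
    return to_standardized(lower, mean, std), to_standardized(upper, mean, std)


def standardized_step_limit(rate_limit, dt, std):
    """Largest allowed change between adjacent controls, in standardized units.

    The mean cancels in a difference of two standardized values,
    so only the standard deviation rescales the limit.
    """
    return rate_limit * dt / std


def satisfies_constraints(controls, lower, upper, step_limit, tol=1e-9):
    in_bounds = all(lower - tol <= c <= upper + tol for c in controls)
    smooth = all(abs(b - a) <= step_limit + tol
                 for a, b in zip(controls, controls[1:]))
    return in_bounds and smooth


# Hypothetical normalizer statistics, limits, and time step for acceleration.
mean, std, dt = 0.5, 1.0, 0.1
lo, hi = standardized_bounds(-4.5, 2.0, mean, std)
step = standardized_step_limit(1.5, dt, std)     # 1.5 m/s^3 from the text
ok = satisfies_constraints([0.0, 0.1, 0.2], lo, hi, step)
```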
[0110] Step 44: Solve the above optimization problem by calling the MOSEK and ECOS solvers in sequence. If the solution is successful, output the optimal control sequence. If all solvers fail, the current control quantity is used as the initial control sequence. The first instruction of the optimal control sequence is applied to the vehicle, and the control quantity and the actual state of the vehicle are added to the state sequence. The state sequence is updated through a sliding window, and the state of the last 10 moments is retained. In the next control cycle, the above process is repeated based on the updated state sequence and the newly extracted target trajectory segment to achieve rolling optimization and ensure that the vehicle tracks the target trajectory in real time.
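The solver cascade and rolling update of Step 44 can be sketched without a real optimizer. The two solver functions below are stand-ins and do not call MOSEK or ECOS, and holding the current control over the whole horizon is one interpretation of the fallback described above.

```python
from collections import deque


def solve_with_fallback(solvers, problem, current_control, horizon):
    """Try each solver in order; fall back to holding the current control."""
    for name, solve in solvers:
        try:
            sequence = solve(problem)
        except Exception:
            continue                      # treat any solver error as a failure
        if sequence is not None:
            return name, sequence
    return "fallback", [current_control] * horizon


def failing_solver(problem):              # stand-in for a solver that errors out
    raise RuntimeError("solver failed")


def working_solver(problem):              # stand-in returning a fixed plan
    return [0.2, 0.1, 0.0]


history = deque(maxlen=10)                # keeps only the last 10 steps
for k in range(12):
    name, plan = solve_with_fallback(
        [("MOSEK", failing_solver), ("ECOS", working_solver)],
        problem=None, current_control=0.0, horizon=3)
    applied = plan[0]                     # rolling optimization: apply the first command only
    history.append((k, applied))
```

Using a bounded deque means the oldest step is discarded automatically once the window holds 10 entries.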
[0111] Step 45: Online parameter update of the real-time disturbance compensation model. After the vehicle executes the control command, vehicle time-series data is collected and the residual is calculated against the value predicted by the vehicle state prediction model for that moment. At the same time, the control and state sequences of the current moment and the previous 9 moments are extracted to form a vehicle time-series sample of length 10. This sample is paired with the new residual to form an input feature-residual data pair, which is stored in the data buffer. When the amount of data in the buffer reaches the threshold of 1000, the buffer data is converted into tensors and transferred to the computing device, and the real-time disturbance compensation model is switched to training mode. The model parameters are iteratively optimized with the Adam optimizer at a learning rate of 0.007, using the variational evidence lower bound as the loss function. After optimization, the buffer is cleared. The updated real-time disturbance compensation model is used for disturbance compensation in the next cycle, ensuring that the model adapts dynamically to time-varying disturbances.
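The buffering logic of Step 45 can be sketched as follows. The class name is hypothetical and `update_fn` is a placeholder for the actual optimization of the compensation model, which is not shown.

```python
class ResidualBuffer:
    """Collect (features, residual) pairs and trigger an update at a threshold."""

    def __init__(self, threshold, update_fn):
        self.threshold = threshold
        self.update_fn = update_fn        # placeholder for the model update
        self.pairs = []
        self.updates = 0

    def add(self, features, residual):
        """Store one pair; return True if this call triggered an update."""
        self.pairs.append((features, residual))
        if len(self.pairs) >= self.threshold:
            self.update_fn(list(self.pairs))
            self.pairs.clear()            # buffer is emptied after each update
            self.updates += 1
            return True
        return False


batch_sizes = []
buffer = ResidualBuffer(threshold=1000,
                        update_fn=lambda batch: batch_sizes.append(len(batch)))
for k in range(2500):
    window = [0.0] * 50                   # 10 steps x 5 features, flattened
    buffer.add(window, residual=(0.0, 0.0, 0.0))
```

Batching updates this way keeps the per-cycle cost low, at the price of the model lagging the newest disturbances by up to one buffer length.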
[0112] Control performance comparison analysis: The experiment compared traditional PID, LSTM-MPC, and the LOG-MPC of this invention. The test scenario covered typical operating trajectories of large vehicles. The core indicators and comparison results are shown in Table 3 and Figures 8-12.
[0113] The chart data shows that LOG-MPC outperforms PID and LSTM-MPC in all aspects: the position RMSE is 7.09 m, which is 28.9% lower than PID and 13.3% lower than LSTM-MPC; the heading angle RMSE is 4.61°, which is 36.0% lower than PID and 38.2% lower than LSTM-MPC. In addition, it shows better trajectory fit and heading stability in continuous curves and disturbance sections, and the maximum error and tracking delay are significantly reduced.
[0114] In summary, LOG-MPC effectively compensates for control delays through the synergy of composite prediction and online updates, improving trajectory tracking accuracy and disturbance resistance, and making it more practical for engineering applications.
[0115] Table 3. Comparison Indicators of Control Performance
[0116]
[0117] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0118] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these modifications and improvements all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.
Claims
1. A hybrid model predictive control method for addressing control delay in large vehicles, characterized in that, Includes the following steps: Step 1: Construct a simulation environment based on the vehicle dynamics model to simulate vehicle motion under different driving modes. Design a control command sequence with delay and noise to act on the vehicle dynamics model, and collect the control command sequence and the corresponding vehicle state to form time series data. Step 2: Construct a vehicle state prediction model by integrating a self-attention mechanism and a residual multilayer LSTM network. The model includes a spatiotemporal feature extraction layer, a multilayer LSTM unit, a self-attention mechanism and feature fusion layer. The spatiotemporal feature extraction layer includes a linear transformation layer, ReLU and layer normalization. The input vehicle time series data is mapped from low dimension to high dimension and then activated and normalized to output preliminary high-dimensional spatiotemporal features. A multi-layer LSTM unit is composed of stacked LSTM sub-units with residual connections. It takes high-dimensional spatiotemporal features as input, adds the output of each layer to the input residual, and captures long-term dependencies through multi-layer iteration, thus outputting deeply encoded temporal features. The self-attention mechanism includes a dimensionality reduction linear layer, Tanh, and a weighted output layer. It takes deep-encoded temporal features as input, calculates and aggregates time-step weights, and outputs a feature vector focusing on key temporal sequences. The feature fusion layer includes a concatenation operation, a linear transformation layer, ELU, and Dropout. Its input is the output of the attention layer and the original features from the last step. After concatenation and fusion, it outputs the predicted state of the vehicle at the next moment. 
The vehicle state prediction model is trained using a dynamically delayed adaptive kinematic loss function, which optimizes performance by combining position error, heading error, and kinematic constraint deviation. Step 3: Construct the residual dataset and the multi-task OSGP real-time disturbance compensation model. Using vehicle time-series data as input, the model infers the state prediction result through vehicle state prediction. The difference between the actual vehicle state and the prediction result yields the residual result, outputting data pairs containing control commands and corresponding residuals to construct the residual dataset. The multi-task OSGP model consists of an induction point sharing module, multi-task branches, and an online update unit. The induction point sharing module selects initial induction points from the residual dataset for all tasks to share. The multi-task branches contain independent RBF and linear combination kernel functions and likelihood functions. Using the control command sequence from the residual dataset as input, it jointly models multi-dimensional residual dependencies and outputs the probability distribution prediction of each residual as the disturbance compensation result. The online update unit receives new residual data in real time through a sliding window and dynamically optimizes the kernel function hyperparameters and induction point positions. Step 4: The vehicle state prediction model and the real-time disturbance compensation model infer from the vehicle time-series data to obtain the state prediction result and the disturbance compensation result. Based on this, the vehicle state prediction model and the real-time disturbance compensation model are linearized using the central difference method. 
Taking the current time-series data as the operating point, small perturbations are applied to the state variables and control variables respectively, and the difference in predicted state before and after the perturbation is calculated to obtain the state Jacobian matrix and the control Jacobian matrix, thus achieving a linear approximation of the combined vehicle state prediction model and real-time disturbance compensation model. Subsequently, an MPC optimization problem is constructed with the linearized state equation as a constraint. The constraints include initial state constraints, state transition constraints, control quantity boundary constraints, and control rate-of-change constraints, and they are transformed to the standardized space. The objective function jointly considers trajectory tracking error, control quantity magnitude, and control smoothness. The optimization problem is solved by a solver to obtain the optimal control sequence.
2. The hybrid model predictive control method for addressing control delay in large vehicles as described in claim 1, characterized in that: Step 1: Construct a simulation environment based on a vehicle dynamics model to simulate vehicle motion under different driving modes. Design a control command sequence containing delay and noise, apply it to the vehicle dynamics model, and collect the control command sequence and the corresponding vehicle states to form vehicle time-series data. The specific method is as follows:
Step 11: Establish a vehicle dynamics model based on six-degree-of-freedom state-space equations to simulate vehicle motion under different driving modes. The vehicle state consists of the longitudinal velocity, lateral velocity, yaw rate, heading angle, and position coordinates. The longitudinal acceleration is calculated from the force balance equation, whose terms are the driving force, braking force, air density, air drag coefficient, frontal area, rolling resistance coefficient, and gravitational acceleration. In addition, the vehicle dynamics model accounts for the correction of tire cornering stiffness caused by the dynamic change in axle load under acceleration. The corresponding equations use the distances from the front and rear axles to the center of gravity and the front and rear tire cornering stiffnesses, which are calculated from the static cornering stiffness and the front and rear axle loads.
Step 12: Design a control command sequence containing delay and noise, apply it to the vehicle dynamics model, and collect the time-series data of the control command sequence and the corresponding vehicle states. Control commands are generated with the acceleration and the front wheel steering angle as the core reference parameters. For both, base values are first constructed using sine waves and uniformly distributed random values. The parameters are then adjusted according to the driving mode.
Smoothness is improved through filtering and rate-of-change limiting. For smooth driving, the sine wave frequencies of the steering angle and acceleration are reduced, and the amplitude and disturbance range are adjusted. During emergency braking, the steering angle gradually decays, and the acceleration transitions from steady acceleration to the braking target. For slalom maneuvers, continuous steering is simulated by increasing the sine wave frequency of the steering angle and adjusting its amplitude and phase, while the acceleration is kept steady. Delay is implemented in two ways. In queue delay, a number of delay steps is selected at random, commands are stored in a queue, and a command is retrieved and executed only when the queue length reaches the required number of delay steps. In first-order inertial delay, a transition coefficient is calculated from the time constant and the time step, and the current command is weighted and merged with the value executed at the previous time step, which simulates the actuator's response lag due to inertia. A noisy control sequence is generated by superimposing normally distributed random values with a mean of 0 and different standard deviations onto the final command. The state sequence is then calculated using the vehicle dynamics model, ultimately forming the time-series data of the control commands and the corresponding vehicle states.
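The two delay mechanisms and the noise step of Step 12 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names are hypothetical, the number of delay steps is passed in rather than drawn at random, and the transition coefficient is assumed to take the common discrete form dt / (tau + dt), since the exact expression is not reproduced in the text above.

```python
from collections import deque
import random


def queue_delay(commands, delay_steps, initial=0.0):
    """Delay a command sequence by holding commands in a FIFO queue."""
    queue = deque()
    executed = []
    last = initial  # value applied while the queue is still filling
    for cmd in commands:
        queue.append(cmd)
        # Release the oldest command once more than `delay_steps` commands
        # are waiting, so each command is executed `delay_steps` steps late.
        if len(queue) > delay_steps:
            last = queue.popleft()
        executed.append(last)
    return executed


def first_order_lag(commands, tau, dt, initial=0.0):
    """Blend each command with the previously executed value."""
    alpha = dt / (tau + dt)  # ASSUMED form of the transition coefficient
    prev = initial
    executed = []
    for cmd in commands:
        prev = alpha * cmd + (1.0 - alpha) * prev
        executed.append(prev)
    return executed


def add_noise(commands, sigma, rng):
    """Superimpose zero-mean Gaussian noise on the final commands."""
    return [c + rng.gauss(0.0, sigma) for c in commands]


if __name__ == "__main__":
    step = [1.0] * 5
    print(queue_delay(step, delay_steps=3))
    print(first_order_lag(step, tau=0.1, dt=0.1))
    print(add_noise(step, sigma=0.05, rng=random.Random(0)))
```

With a time constant of zero the assumed coefficient becomes 1 and the lag disappears, which is a quick sanity check on the blending formula.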
3. The hybrid model predictive control method for addressing control delay in large vehicles as described in claim 1, characterized in that: Step 2: Construct a vehicle state prediction model by integrating a self-attention mechanism with a residual multilayer LSTM network. The model comprises a spatiotemporal feature extraction layer, a multilayer LSTM unit, a self-attention mechanism, and a feature fusion layer. The spatiotemporal feature extraction layer comprises a linear transformation layer, ReLU, and layer normalization; it maps the input vehicle time-series data from a low dimension to a high dimension, then activates and normalizes it to output preliminary high-dimensional spatiotemporal features. The multilayer LSTM unit is composed of stacked LSTM sub-units with residual connections; it takes the high-dimensional spatiotemporal features as input, adds the output of each layer to that layer's input as a residual, and captures long-term dependencies through multilayer iteration, thereby outputting deeply encoded temporal features. The self-attention mechanism comprises a dimensionality reduction linear layer, Tanh, and a weight output layer; it takes the deeply encoded temporal features as input, calculates and aggregates time-step weights, and outputs a feature vector focused on the key time steps. The feature fusion layer comprises a concatenation operation, a linear transformation layer, ELU, and Dropout; its input is the output of the self-attention mechanism together with the original features of the last time step of the input sequence, and after concatenation and fusion it outputs the predicted state of the vehicle at the next time step. The vehicle state prediction model is trained using a dynamic delay-sensitive adaptive kinematic loss function, which optimizes performance by combining position error, heading error, and kinematic constraint deviation.
The specific method is as follows:
Step 21: Construct a vehicle state prediction model by integrating a self-attention mechanism with a residual multilayer LSTM network, comprising a spatiotemporal feature extraction layer, a multilayer LSTM unit, a self-attention mechanism, and a feature fusion layer, specifically:
(1) Spatiotemporal feature extraction layer: A cascaded structure of linear transformation, ReLU activation, and layer normalization is adopted. The input is the historical control command and state sequence with dimensions [batch_size, seq_len, input_size], where batch_size, seq_len, and input_size denote the batch size, sequence length, and number of input features, respectively. First, the features of each time step are mapped through a linear layer from the number of input features to the hidden dimension, which is set to 256 during training, so the output dimensions become [batch_size, seq_len, 256]. Then, the nonlinear expression capability is enhanced through the ReLU activation function, with the dimensions unchanged at [batch_size, seq_len, 256]. Finally, the 256-dimensional features of each time step are normalized through layer normalization to eliminate scale differences between features. The final output is a uniformly scaled spatiotemporal feature with dimensions [batch_size, seq_len, 256], which provides the foundation for subsequent time-series modeling.
(2) Multilayer LSTM unit: Four LSTM layers with residual connections are stacked. The input and output dimensions of each LSTM layer are 256. The input of the multilayer LSTM unit is the uniformly scaled spatiotemporal feature output by the spatiotemporal feature extraction layer. After receiving the input, the first LSTM layer outputs a temporal feature with dimensions [batch_size, seq_len, 256].
This feature is then added element-wise to the original input of that layer to obtain a residual-enhanced feature, still with dimensions [batch_size, seq_len, 256]. The enhanced feature is used as the input of the second LSTM layer, whose output is likewise added element-wise to the input of the second layer. This process is repeated across the four LSTM layers, finally outputting a deep temporal feature with dimensions [batch_size, seq_len, 256].
(3) Self-attention mechanism: It consists of a dimensionality reduction linear layer, Tanh activation, and a weight output layer. The input of the self-attention mechanism is the deep temporal feature output by the multilayer LSTM unit. First, the 256-dimensional features are mapped to 32 dimensions through the dimensionality reduction linear layer, giving dimensions [batch_size, seq_len, 32]. After the Tanh activation function, the dimensions remain [batch_size, seq_len, 32]. Then, the 32-dimensional features are converted into 1-dimensional weight scores through the weight output layer, giving dimensions [batch_size, seq_len, 1]. After squeeze(-1), the raw weights with dimensions [batch_size, seq_len] are obtained. The Softmax function is then applied along the sequence length dimension to obtain the attention weights with dimensions [batch_size, seq_len]. Finally, the weights are used to form a weighted sum of the temporal features output by the LSTM: the weights with dimensions [batch_size, 1, seq_len] are multiplied with the LSTM features with dimensions [batch_size, seq_len, 256], giving an output with dimensions [batch_size, 1, 256]. After squeeze(1), the global key temporal feature vector with dimensions [batch_size, 256] is obtained, which dynamically focuses on the key moments related to control delay and suppresses irrelevant temporal information.
(4) Feature fusion layer: The global key temporal features with dimensions [batch_size, 256] output by the self-attention mechanism are concatenated with the original features with dimensions [batch_size, input_size] from the last time step of the input sequence to obtain fused features with dimensions [batch_size, 256 + input_size]. The fused features are processed in sequence by a linear transformation, ELU activation, Dropout, and an output linear layer. First, the fused features are mapped to 64 dimensions through the linear layer, giving dimensions [batch_size, 64]. After the ELU activation function enhances nonlinearity, the dimensions remain [batch_size, 64]. Dropout then randomly discards some neurons to prevent overfitting, with the dimensions still [batch_size, 64]. Finally, the 64-dimensional features are mapped to the output dimension through the output linear layer, and the final output is the predicted vehicle state at the future time step with dimensions [batch_size, input_size].
Step 22: Design a dynamic delay-sensitive adaptive kinematic loss function. The loss function comprises three parts: position loss, heading loss, and kinematic constraint loss. The position loss and heading loss are computed between the predicted position and heading angle and their actual values, and their weights are dynamically balanced by learnable log-variance parameters. The kinematic constraint loss is computed from the predicted lateral and longitudinal displacement increments, with a small constant added to keep the denominator from being zero, and is scaled by a weighting coefficient. By penalizing the deviation between the ratio of the displacement increments and the tangent of the steering angle, it strengthens the model's adherence to the physical laws of vehicle motion.
Step 23: Train the vehicle state prediction model using the vehicle time-series data. Use the AdamW optimizer with the loss function designed in Step 22 as the optimization objective to iteratively update the network parameters. During training, clip the gradients to avoid gradient explosion. Calculate the average loss on the training and validation sets every 10 iterations. If the validation loss does not decrease for 10 consecutive epochs, trigger the early stopping mechanism and stop training. Save the final network weights and the feature normalizer computed from the training set, which is used for feature normalization in subsequent prediction.
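The attention pooling described in item (3) of Step 21 can be traced on a single sequence with a small pure-Python sketch. It is illustrative only: the function name and the weight arguments are hypothetical, bias terms are omitted, and toy dimensions stand in for the 256 and 32 dimensions given above.

```python
import math


def attention_pool(features, w_reduce, w_score):
    """Pool a sequence of feature vectors into one vector.

    features: seq_len x d list of lists (LSTM outputs for one sample)
    w_reduce: d x r weights of the dimensionality reduction layer
    w_score:  length-r weights of the weight output layer
    Biases are omitted here as a simplifying assumption.
    """
    scores = []
    for h in features:
        # Linear reduction followed by Tanh, one value per reduced unit.
        reduced = [
            math.tanh(sum(h[i] * w_reduce[i][k] for i in range(len(h))))
            for k in range(len(w_score))
        ]
        # Weight output layer: one scalar score per time step.
        scores.append(sum(reduced[k] * w_score[k] for k in range(len(w_score))))
    # Softmax over the sequence dimension (shifted for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the original features over time.
    dim = len(features[0])
    pooled = [
        sum(weights[t] * features[t][i] for t in range(len(features)))
        for i in range(dim)
    ]
    return weights, pooled


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    w, pooled = attention_pool(feats, [[1.0], [0.0]], [1.0])
    print(w, pooled)
```

When all scores are equal the weights become uniform and the pooled vector reduces to the mean over time steps, so the learned scores are what shift the emphasis toward particular moments.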
4. The hybrid model predictive control method for addressing control delay in large vehicles as described in claim 1, characterized in that: Step 3: Construct the residual dataset and the multi-task OSGP real-time disturbance compensation model. Using the vehicle time-series data as input, the state prediction result is inferred by the vehicle state prediction model. The difference between the actual vehicle state and the prediction result gives the residual, and data pairs containing the control commands and the corresponding residuals are output to construct the residual dataset. The multi-task OSGP real-time disturbance compensation model consists of an inducing point sharing module, multi-task branches, and an online update unit. The inducing point sharing module selects initial inducing points from the residual dataset, which are shared by all tasks. Each multi-task branch contains its own kernel function, combining an RBF kernel and a linear kernel, and its own likelihood function. Using the control command sequence from the residual dataset as input, the branches jointly model the multi-dimensional residual dependencies and output a probability distribution prediction for each residual as the disturbance compensation result. The online update unit receives new residual data in real time through a sliding window and dynamically optimizes the kernel function hyperparameters and inducing point positions. The specific method is as follows:
Step 31: Construct the residual dataset, specifically:
(1) Data preprocessing: The vehicle time-series data is preprocessed, first by enforcing angle continuity and then by handling angles that fall outside the specified range.
These angles are corrected back into the range to ensure the continuity of angles in the time series and avoid calculation deviations caused by periodicity. The mean and standard deviation saved during the training phase of the vehicle state prediction model are used to standardize the input feature columns and eliminate dimensional differences. Features from 10 consecutive time steps in the vehicle time-series data are extracted to form the input sequence, with the corresponding label being the vehicle state at the 11th time step.
(2) Prediction and residual calculation: Load the trained vehicle state prediction model, input the preprocessed vehicle time-series data in batches, and output the predicted vehicle state. Calculate the difference between the actual vehicle state and the predicted vehicle state, and combine the input control commands and the corresponding residuals into the residual dataset.
Step 32: Multi-task OSGP model training, specifically:
(1) Model structure design: Construct a multi-task OSGP real-time disturbance compensation model adapted to joint modeling of the three-dimensional residual, with the following structure. 500 samples are randomly selected from the training samples of the residual dataset as initial inducing points. Each inducing point is formed from a sequence of length 10 with five features per time step of the vehicle time-series data, giving a 50-dimensional vector. All tasks share this set of inducing points, which reduces computational complexity while preserving feature correlation between tasks. An independent Gaussian process (GP) branch is designed for each residual dimension. Each branch takes the 50-dimensional feature vector as input and outputs the probability distribution of the residual for that dimension. Each branch includes a mean module and a kernel function combining an RBF kernel and a linear kernel.
The kernel function supports automatic relevance determination (ARD) length scales, adapting to the scale at which each feature dimension influences the residual. Each task is configured with an independent likelihood function, which takes the distribution output by the branch as input and outputs a prediction distribution that includes noise, accommodating the different noise characteristics of the residuals in different dimensions.
(2) Hyperparameter optimization and training control: The Adam optimizer is used, with the variational evidence lower bound (ELBO) as the basis of the loss function. The negative ELBO is calculated for the GP branch of each task and summed. The model parameters, including the inducing point positions, kernel function hyperparameters, and likelihood noise parameters, are updated through backpropagation. The sliding window size is set to W=100, and the statistical characteristics of the residuals within the window are computed in real time. The hyperparameters are updated through gradient descent with the goal of minimizing the negative log-likelihood of the residuals within the window, so that the model adapts dynamically to time-varying disturbances. The training loss and validation RMSE are calculated every 10 epochs. When the validation RMSE does not decrease for 10 consecutive epochs, the scheduler halves the learning rate to avoid gradient oscillation. If the validation RMSE does not improve for 10 consecutive epochs, the early stopping mechanism is triggered, and the model state dictionary, likelihood function state, and configuration parameters are saved to the weight file to preserve the generalization of the model.
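The construction of the residual dataset in Step 31 can be sketched in a few lines of Python. This is a simplified illustration, not the claimed implementation: the function names are hypothetical, the residual sign convention (actual minus predicted) and the choice of which feature columns are the three residual dimensions are assumptions, and a trivial persistence predictor stands in for the trained vehicle state prediction model.

```python
def build_residual_dataset(series, predict, seq_len=10, state_idx=(0, 1, 2)):
    """Slide a window over the series and pair inputs with residuals.

    series:    list of per-step feature rows (five features in the text)
    predict:   callable mapping a window of rows to a predicted state
    state_idx: ASSUMED columns that make up the three residual dimensions
    """
    inputs, residuals = [], []
    for start in range(len(series) - seq_len):
        window = series[start:start + seq_len]
        # The label is the state at the step right after the window.
        actual = [series[start + seq_len][i] for i in state_idx]
        predicted = predict(window)
        # ASSUMED sign convention: residual = actual - predicted.
        residuals.append([a - p for a, p in zip(actual, predicted)])
        # Flatten seq_len x n_features into one vector (10 x 5 = 50).
        inputs.append([value for row in window for value in row])
    return inputs, residuals


def persistence_predictor(window):
    """Hypothetical stand-in model: repeat the last observed state."""
    return list(window[-1][:3])


if __name__ == "__main__":
    data = [[float(t), 2.0 * t, 3.0 * t, 0.1, 0.2] for t in range(15)]
    X, R = build_residual_dataset(data, persistence_predictor)
    print(len(X), len(X[0]), R[0])
```

Each flattened input has the same 50-dimensional layout as the inducing points described in Step 32, so a subset of these rows could serve as the initial inducing set.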
5. The hybrid model predictive control method for addressing control delay in large vehicles as described in claim 1, characterized in that: Step 4: The vehicle state prediction model and the real-time disturbance compensation model perform inference on the vehicle time-series data to obtain the state prediction result and the disturbance compensation result. On this basis, the vehicle state prediction model and the real-time disturbance compensation model are linearized using the central difference method. Taking the current time-series data as the operating point, small perturbations are applied to the state variables and control variables respectively, and the difference in predicted state before and after the perturbation is calculated to obtain the state Jacobian matrix and the control Jacobian matrix, thus achieving a linear approximation of the combined vehicle state prediction model and real-time disturbance compensation model. Subsequently, an MPC optimization problem is constructed with the linearized state equation as a constraint. The constraints include initial state constraints, state transition constraints, control quantity boundary constraints, and control rate-of-change constraints, and they are transformed to the standardized space. The objective function jointly considers trajectory tracking error, control quantity magnitude, and control smoothness. The optimization problem is solved by a solver to obtain the optimal control sequence. The specific method is as follows:
Step 41: State prediction and processing. The vehicle time-series data is input into the vehicle state prediction model and the real-time disturbance compensation model for prediction. The vehicle time-series data must first be processed for angle continuity and transformed into the standardized space by the feature normalizer.
The vehicle state prediction model first makes a preliminary prediction from the time-series data, and the real-time disturbance compensation model then predicts the disturbance compensation from the same vehicle time-series data.
Step 42: The composite model is linearized online using the central difference method. Taking the state variables and control variables in the current vehicle time-series data as the operating point, a small perturbation is applied to each of them in turn along the unit vector of dimension j, and the states before and after the perturbation are predicted using the vehicle state prediction model and the real-time disturbance compensation model. From these, the state Jacobian matrix A and the control Jacobian matrix B are calculated, describing the influence of the state on the state and the influence of the control on the state, respectively. The linearized state equation is established from matrices A and B.
Step 43: MPC optimization problem construction. Define the sequence of state variables and the sequence of control variables over the prediction horizon N, together with the time step. The constraints include initial state constraints, state transition constraints, physical constraints, and control smoothness constraints. The physical constraints bound the control variables, namely the acceleration (in m/s²) and the steering angle (in rad), whose limits are converted to the standardized space and used as upper and lower bounds. The control smoothness constraints limit the rate of change of the control variables: the limits of acceleration rate of change ≤ 1.5 m/s³ and steering angle rate of change ≤ 0.08 rad/s are converted to the standardized space and then used to limit the difference between adjacent control variables.
The objective function is set to minimize trajectory tracking error, control energy, and control variation. Its terms are defined with respect to the target trajectory at time t, with separate weights for the state tracking error, control energy, and control smoothness, together with an allowable relative rate of change and a minimum allowable variation.
Step 44: Solve the above optimization problem by calling the MOSEK and ECOS solvers in sequence. If a solver succeeds, output the optimal control sequence. If all solvers fail, the current control quantity is used as the initial control sequence. The first command of the optimal control sequence is applied to the vehicle, and the control quantity and the actual state of the vehicle are appended to the state sequence. The state sequence is updated through a sliding window that retains the states of the last 10 time steps. In the next control cycle, the above process is repeated with the updated state sequence and the newly extracted target trajectory segment to achieve rolling optimization and ensure that the vehicle tracks the target trajectory in real time.
Step 45: Online parameter update of the real-time disturbance compensation model. After the vehicle executes the control command, vehicle time-series data is collected, and the residual is calculated against the value predicted by the vehicle state prediction model for that time step. At the same time, the control sequence and state sequence of the current time step and the previous 9 time steps are extracted to form vehicle time-series data of length 10. This data is paired with the new residual to form an input feature and residual data pair, which is stored in the data buffer. When the amount of data in the buffer reaches the threshold, the buffer data is converted into tensors and transferred to the computing device, and the real-time disturbance compensation model is switched to training mode.
The model parameters are iteratively optimized using the variational evidence lower bound as the loss function and the Adam optimizer. After optimization, the buffer is cleared. The updated real-time disturbance compensation model is used for disturbance compensation in the next cycle to ensure the model's dynamic adaptability to time-varying disturbances.
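The central-difference linearization of Step 42 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the perturbation size are hypothetical, the one-step model `f` stands in for the combined prediction and compensation models, and the linearized equation is taken to be the usual first-order expansion around the operating point because the exact expression is not reproduced in the text above.

```python
def central_difference_jacobians(f, x0, u0, eps=1e-4):
    """Estimate A = df/dx and B = df/du at (x0, u0) by central differences.

    f: callable (x, u) -> next state, all as lists of floats.
    """
    n, m = len(x0), len(u0)
    A = [[0.0] * n for _ in range(n)]
    B = [[0.0] * m for _ in range(n)]
    for j in range(n):
        # Perturb state dimension j in both directions.
        x_plus, x_minus = list(x0), list(x0)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus, u0), f(x_minus, u0)
        for i in range(n):
            A[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    for j in range(m):
        # Perturb control dimension j in both directions.
        u_plus, u_minus = list(u0), list(u0)
        u_plus[j] += eps
        u_minus[j] -= eps
        f_plus, f_minus = f(x0, u_plus), f(x0, u_minus)
        for i in range(n):
            B[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return A, B


def linear_predict(f, A, B, x0, u0, x, u):
    """ASSUMED linearized form: f(x0,u0) + A (x - x0) + B (u - u0)."""
    base = f(x0, u0)
    return [
        base[i]
        + sum(A[i][j] * (x[j] - x0[j]) for j in range(len(x0)))
        + sum(B[i][j] * (u[j] - u0[j]) for j in range(len(u0)))
        for i in range(len(base))
    ]


def toy_model(x, u):
    """Hypothetical nonlinear stand-in for the composite model."""
    return [x[0] + 0.1 * x[1], x[1] + 0.1 * u[0] + x[0] ** 2]


if __name__ == "__main__":
    A, B = central_difference_jacobians(toy_model, [1.0, 2.0], [0.5])
    print(A, B)
```

Each Jacobian column costs two model evaluations, so one linearization needs 2 × (number of states + number of controls) forward passes per control cycle, which is worth keeping in mind when the model is a neural network plus a GP.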
6. A hybrid model predictive control system for addressing control delays in large vehicles, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, it implements the hybrid model predictive control method for addressing control delays in large vehicles as described in any one of claims 1-5.
7. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, it implements the hybrid model predictive control method for addressing control delays in large vehicles as described in any one of claims 1-5.
8. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, it implements the hybrid model predictive control method for addressing control delay in large vehicles as described in any one of claims 1-5.