A vehicle control parameter self-calibration method and system oriented to driving style preference
By combining reinforcement learning and unsupervised clustering techniques, vehicle control parameters are adjusted in real time, solving the static and personalized adaptability problems of traditional calibration methods. This achieves safe, comfortable, and personalized intelligent chassis control, improving vehicle handling performance and safety.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- JILIN UNIVERSITY
- Filing Date
- 2026-03-23
- Publication Date
- 2026-06-16
AI Technical Summary
Existing vehicle control parameter calibration methods are static, time-consuming, and unable to be personalized to adapt to the driver's driving style preferences, resulting in decreased control accuracy and safety hazards.
By combining reinforcement learning and unsupervised clustering techniques, driving style is perceived in real time. Vehicle control parameters are adjusted adaptively online. An incremental clustering algorithm is used to generate a style index, and a digital twin model is used for safety verification to achieve personalized intelligent chassis control.
It achieves safe, comfortable and personalized intelligent chassis control under different driving styles, solves the problem that traditional calibration methods cannot adapt to dynamic changes and personalized needs, and improves vehicle handling performance and safety.
Figure CN121900385B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of automotive electronic control, and in particular to a method and system for self-calibrating vehicle control parameters based on driving style preferences.
Background Technology
[0002] As the automotive industry evolves towards greater intelligence and connectivity, driver assistance systems and autonomous driving technologies have become core elements for ensuring driving safety and enhancing the passenger experience. In the field of vehicle motion control, to achieve precise lateral stability maintenance and longitudinal trajectory tracking, the industry widely employs various classic control architectures, including Proportional-Integral-Derivative (PID), Linear Quadratic Regulator (LQR), and Model Predictive Control (MPC). These control algorithms rely on a series of key parameters, such as the weight matrices of LQR and MPC, to balance the vehicle's dynamic response speed and system stability. Their performance directly determines the vehicle's handling performance and safety under complex operating conditions.
[0003] However, existing vehicle control parameter calibration processes are generally static and slow. Traditional calibration methods rely primarily on engineers conducting extensive offline testing and manual, iterative adjustments at dedicated test sites to ultimately obtain a fixed set of parameters. This factory-fixed, open-loop calibration strategy struggles to adapt to the highly dynamic changes that occur during actual vehicle operation, such as abrupt changes in the road surface adhesion coefficient, increases or decreases in vehicle load, and drift in dynamic characteristics caused by tire wear. When the vehicle's operating environment deviates from the calibration conditions, controllers with fixed parameters often fail to maintain optimal performance, leading to decreased control accuracy and even system instability under extreme conditions.
[0004] Furthermore, traditional calibration systems largely neglect the personalized driving style preferences of drivers and passengers. Different users have very different expectations of the vehicle's dynamic response: aggressive drivers prefer more sensitive power response and steering feedback, while comfort-oriented drivers prioritize smoothness and low impact. Existing general-purpose calibration schemes typically adopt a compromise strategy and fail to meet the growing demand for customized experiences. Although some existing technologies attempt to introduce simple adaptive laws, they still lack a deep understanding of driving style as a high-dimensional semantic feature and the ability to cluster it in real time.
Summary of the Invention
[0005] Purpose of the invention: To propose a self-calibration method and system for vehicle control parameters based on driving style preferences. By combining reinforcement learning and unsupervised clustering techniques, the system can perceive driving style in real time and adaptively adjust the underlying control parameters online. This effectively solves the problems of traditional control's inability to be personalized and the safety hazards of end-to-end intelligent assistance, and achieves intelligent chassis control that combines safety, comfort and personalization.
[0006] To achieve the above objectives, the present invention provides a self-calibration method for vehicle control parameters based on driving style preferences, the steps of which are as follows:
[0007] Collect vehicle operation data and environmental data;
[0008] Based on the vehicle operation data, vehicle dynamics state and original feature vectors are generated respectively. After normalizing the original feature vectors, an incremental clustering algorithm is used for online clustering. The continuous style index is calculated by combining the clustering results.
[0009] The style index, vehicle dynamics state, and environmental data are used as inputs. The weight adjustment factor of the controller is dynamically output through a reinforcement learning algorithm. The updated controller parameters are calculated based on the weight adjustment factor.
[0010] The updated controller parameters are verified for security using a digital twin model. If the verification passes, the updated controller parameters are sent to the controller to complete the self-calibration of the control parameters. If the verification fails, the controller parameters are not updated, and this parameter adjustment attempt is stored as a failure sample in the experience replay pool.
[0011] As a preferred embodiment, the process for generating the vehicle dynamics state is as follows:
[0012] The parameters characterizing the vehicle's dynamics are extracted from the vehicle's operating data, including lateral position error, heading angle error, longitudinal speed, vehicle center of gravity sideslip angle, yaw rate, longitudinal acceleration, lateral acceleration, and the control output of the previous moment. Combined with the road surface adhesion coefficient in the environmental data, the vehicle's dynamic state is obtained.
[0013] As a preferred embodiment, the process for generating the original feature vector is as follows:
[0014] Physical quantities characterizing driving style are selected, and feature parameters are obtained through sliding window statistics. These feature parameters include the mean absolute value of longitudinal acceleration, the mean absolute value of lateral acceleration, the maximum lateral acceleration within the window, the average headway, and the standard deviation of steering wheel speed.
[0015] As a preferred approach, during the normalization process of the original feature vector, the physical boundary values of each feature obtained from offline driving data statistics are used as normalization parameters.
[0016] As a preferred approach, an incremental clustering algorithm is used for online clustering, with three cluster centers defined as conservative style centers, normal style centers, and radical style centers.
[0017] The clustering process introduces a minimum separation constraint to ensure that the three cluster centers maintain a preset minimum separation distance in the feature space.
[0018] As a preferred embodiment, the calculation process of the style index includes:
[0019] A reference vector is constructed pointing from the conservative style center to the radical style center. The projection ratio of the current normalized feature vector onto the reference vector is calculated. After truncating and smoothing the projection ratio, the continuous style index is obtained. The value range of the style index is [0,1], where 0 represents extremely conservative and 1 represents extremely radical.
[0020] As a preferred embodiment, the reinforcement learning algorithm models the driving style adaptive control problem as a Markov decision process with an infinite horizon.
[0021] In the Markov decision-making process, the state space is the set of vehicle dynamic states and style indices; the action space is the weight adjustment factor of the controller; the reward function is a dynamic weighted instantaneous reward function based on the style index, including handling cost, comfort cost, and safety cost. The weight coefficients of the handling cost, comfort cost, and safety cost are nonlinear functions of the style index, wherein the weight coefficient of the comfort cost is a monotonically decreasing function, and the weight coefficient of the handling cost is a sigmoid-type increasing function.
[0022] As a preferred embodiment, the digital twin model is a three-degree-of-freedom single-track dynamic model; the safety indicators for safety verification include collision risk indicators, road boundary constraints, and dynamic stability envelope.
[0023] The collision risk index is calculated based on obstacle data sensed by radar. If the collision risk index is less than a preset collision time threshold, it is determined to be unsafe.
[0024] The road boundary constraint is used to check whether the lateral position of the predicted trajectory exceeds the lane safety boundary. If it does, it is determined to be unsafe.
[0025] The dynamic stability envelope is used to check whether the predicted yaw rate exceeds the physical limit. If it does, it is determined to be unsafe.
[0026] As a preferred approach, the updated controller parameters are verified for security using a digital twin model. If the verification passes, the controller parameters are updated smoothly using an exponential moving average method.
[0027] Furthermore, this invention proposes a vehicle control parameter self-calibration system oriented towards driving style preferences, which automatically executes the aforementioned vehicle control parameter self-calibration method oriented towards driving style preferences. The system includes:
[0028] The sensing unit is used to collect vehicle operation data and environmental data; the sensing unit includes an inertial measurement unit, wheel speed sensor, steering wheel angle sensor, millimeter-wave radar and forward-looking camera;
[0029] The perception and quantization module generates vehicle dynamics state and original feature vector based on the vehicle operation data collected by the perception unit. After normalizing the original feature vector, it performs online clustering using an incremental clustering algorithm and calculates a continuous style index based on the clustering results.
[0030] The decision module, based on the style index, vehicle dynamics state and environmental data, dynamically outputs the controller's weight adjustment factor through a reinforcement learning algorithm, and calculates the updated controller parameters based on the weight adjustment factor;
[0031] The security gating module verifies the security of the updated controller parameters using a digital twin model. If the verification passes, the updated controller parameters are sent to the controller to complete the self-calibration of the control parameters. If the verification fails, the controller parameters are not updated, and this parameter adjustment attempt is stored as a failure sample in the experience replay pool.
[0032] Beneficial effects: This invention balances comfort and handling through a multi-objective dynamic reward function and introduces a digital twin model with safety gating to pre-simulate the safety of new parameters in a virtual environment. It effectively solves the problems of traditional control systems' inability to be personalized and the safety risks of end-to-end AI, achieving intelligent chassis control that combines safety, comfort, and personalization.
Attached Figure Description
[0033] Figure 1 is a flowchart illustrating the architecture and execution process of the vehicle control parameter self-calibration system for driving style preferences in this embodiment.
Detailed Implementation
[0034] In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to those skilled in the art that the invention can be practiced without one or more of these details. In other instances, certain technical features well-known in the art have not been described in order to avoid obscuring the invention.
[0035] This embodiment discloses a vehicle control parameter self-calibration system oriented towards driving style preferences. As shown in Figure 1, the system adopts a two-layer architecture: the execution layer runs a high-frequency controller to ensure dynamic stability; the decision layer runs a low-frequency RL agent, which dynamically adjusts the controller parameters based on a real-time driving style index generated by K-Means clustering. The system can identify the driving style preferences of the driver or passenger in real time and automatically calibrate the key parameters of the underlying vehicle stability and trajectory tracking controller accordingly. This invention covers technical solutions that apply this method to various classic control architectures, including model predictive control, linear quadratic regulators, and proportional-integral-derivative control, aiming to solve the problems that traditional vehicle control parameter calibration is static and one-size-fits-all and cannot adapt to dynamic driving scenarios and personalized needs.
[0036] The following details the implementation plan from four aspects: S1 system hardware, S2 perception and quantization layer, S3 decision layer design, and S4 security gating.
[0037] S1, System Hardware:
[0038] This system is deployed in the domain controller of the intelligent vehicle, and the sensing units include IMU (Inertial Measurement Unit), wheel speed sensor, steering wheel angle sensor, millimeter-wave radar and forward-looking camera.
[0039] S2, Perception and Quantization Layer:
[0040] This module maps driving data to a continuous style index through unsupervised clustering.
[0041] S2.1, Feature Vectors:
[0042] To construct the original feature vector $X_{raw}$, physical quantities that characterize driving style are selected and sliding-window statistics are computed with a window length of W = 10 s:
[0043] $$X_{raw} = \left[\, \overline{|a_x|},\ \overline{|a_y|},\ a_{y,\max},\ \overline{THW},\ \sigma_{\dot\delta} \,\right]^{T}$$
[0044] In the formula, $\overline{|a_x|}$ is the mean of the absolute values of longitudinal acceleration, representing ride comfort characteristics; $\overline{|a_y|}$ is the mean of the absolute values of lateral acceleration, representing handling characteristics; $a_{y,\max}$ is the maximum lateral acceleration within the window, representing cornering aggressiveness; $\overline{THW}$ is the average headway, indicating how aggressively the driver follows the vehicle ahead: if there are no vehicles ahead, then …; otherwise $THW = d / v_x$, where $d$ is the distance between the vehicle and the vehicle in front and $v_x$ is the longitudinal vehicle speed; $\sigma_{\dot\delta}$ is the standard deviation of the steering wheel rotation speed, representing the frequency characteristic of the driver's operation.
[0045] To eliminate the influence of differing dimensions, Min-Max normalization is applied to $X_{raw}$ to obtain the final feature vector $X$:
[0046] $$X_i = \frac{X_{raw,i} - X_{\min,i}}{X_{\max,i} - X_{\min,i}}$$
[0047] where $X_{\min}$ and $X_{\max}$ are the physical boundary values of each feature, obtained statistically from a large amount of offline driving data;
[0048]
[0049]
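The feature construction and normalization of S2.1 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names, the NaN convention for "no vehicle ahead", the fallback constant `NO_LEAD_THW`, and the final clipping to [0, 1] are all assumptions introduced here (the source does not preserve the no-lead-vehicle value).

```python
import numpy as np

# Hypothetical placeholder: the headway value used when no lead vehicle is
# present is not recoverable from the source text.
NO_LEAD_THW = 5.0

def raw_features(ax, ay, gap, vx, steer_rate):
    """Sliding-window statistics for one window of samples (1-D arrays).

    gap is the distance to the lead vehicle; NaN marks "no vehicle ahead"
    (an assumed convention).
    """
    ax, ay, gap, vx, steer_rate = (
        np.asarray(a, dtype=float) for a in (ax, ay, gap, vx, steer_rate)
    )
    valid = ~np.isnan(gap) & (vx > 0.1)  # skip samples with no lead or near standstill
    thw = float(np.mean(gap[valid] / vx[valid])) if valid.any() else NO_LEAD_THW
    return np.array([
        np.mean(np.abs(ax)),   # mean |longitudinal acceleration|
        np.mean(np.abs(ay)),   # mean |lateral acceleration|
        np.max(np.abs(ay)),    # peak lateral acceleration in the window
        thw,                   # average headway (distance / speed)
        np.std(steer_rate),    # std of steering wheel rotation speed
    ])

def min_max(x_raw, x_min, x_max):
    """Min-Max normalization with offline boundary values, clipped to [0, 1]."""
    x_raw, x_min, x_max = (np.asarray(a, dtype=float) for a in (x_raw, x_min, x_max))
    return np.clip((x_raw - x_min) / (x_max - x_min), 0.0, 1.0)
```

Clipping is added so that an online sample outside the offline boundary values cannot push a feature outside the unit interval; the source does not say how such samples are handled.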
[0050] S2.2 Online Clustering Algorithm:
[0051] An incremental K-Means algorithm is used. K=3 cluster centers are set and initialized as follows:
[0052] $c_1$: conservative style center; $c_2$: normal style center; $c_3$: radical style center.
[0053] When a vehicle stays in a single operating condition for an extended period, the three cluster centers $c_1$, $c_2$, $c_3$ may drift toward one another and become semantically ambiguous, so a minimum center separation constraint is introduced. First, the candidate update value $c_w'$ of the winning cluster center $c_w$ (the center nearest to the current feature vector $X$) is calculated:
[0054] $$c_w' = c_w + \eta\,(X - c_w)$$
[0055] where $\eta$ is the learning rate.
[0056] Then the Euclidean distance between this candidate value and each non-winning center $c_j$, that is, each element of the set $\{c_1, c_2, c_3\}$ with $c_w$ removed, is calculated.
[0057] The actual update is performed only when the candidate value satisfies the minimum separation distance threshold $d_{\min}$; otherwise, the update is abandoned or only a boundary-truncated update is performed.
[0058] $$c_w \leftarrow \begin{cases} c_w', & \min_{j \neq w} \lVert c_w' - c_j \rVert_2 \ge d_{\min} \\ c_w, & \text{otherwise} \end{cases}$$
[0059] In the formula, $d_{\min}$ is a preset semantic separation constant used to force the conservative, normal, and radical styles to remain clearly distinguishable in the feature space.
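One incremental clustering step with the minimum-separation gate might look like the sketch below. It implements only the "abandon the update" branch described above (not the boundary-truncated variant), and the function name and default values for the learning rate and separation threshold are illustrative assumptions.

```python
import numpy as np

def update_centers(centers, x, lr=0.05, d_min=0.2):
    """One incremental K-Means step with a minimum-separation gate.

    centers: (K, D) array of cluster centers; x: (D,) normalized feature vector.
    Returns (new_centers, winner_index, accepted).
    """
    centers = np.asarray(centers, dtype=float)
    x = np.asarray(x, dtype=float)
    # Winning cluster: the center closest to the current sample.
    winner = int(np.argmin(np.linalg.norm(centers - x, axis=1)))
    # Candidate update: move the winner a fraction lr toward the sample.
    candidate = centers[winner] + lr * (x - centers[winner])
    # Distance from the candidate to every non-winning center.
    others = np.delete(centers, winner, axis=0)
    accepted = bool(np.min(np.linalg.norm(others - candidate, axis=1)) >= d_min)
    new_centers = centers.copy()
    if accepted:
        new_centers[winner] = candidate
    return new_centers, winner, accepted
```

Rejecting the update leaves all centers untouched, so a long stretch of uniform driving cannot pull one style center onto another.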
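[0060] S2.3, Style Index Calculation:
[0061] To obtain a continuous style index $I_s$, the projection method is used: a reference vector $V$ is constructed from the conservative center $c_1$ to the radical center $c_3$, that is, $V = c_3 - c_1$.
[0062] Calculate the projection ratio $p$ of the current feature $X$ onto vector $V$:
[0063] $$p = \frac{(X - c_1) \cdot V}{\lVert V \rVert_2^{2}}$$
[0064] $p$ is truncated to the [0,1] interval and then smoothed using a first-order low-pass filter:
[0065] $$I_s(t) = (1 - \lambda)\, I_s(t-1) + \lambda\, \mathrm{clip}\big(p(t),\, 0,\, 1\big)$$
[0066] where $\lambda$ is the smoothing factor; $I_s = 0$ indicates extreme conservatism and $I_s = 1$ indicates extreme radicalism; $\mathrm{clip}(\cdot, 0, 1)$ denotes truncation to the [0,1] interval.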
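The projection, truncation, and smoothing steps can be illustrated with a short sketch. The class and parameter names are hypothetical, and the filter convention used here (the smoothing factor multiplies the new sample) is one common choice that the source does not confirm.

```python
import numpy as np

def projection_ratio(x, c_cons, c_rad):
    """Scalar projection of (x - c_cons) onto the conservative-to-radical axis,
    expressed as a fraction of that axis's length."""
    x, c_cons, c_rad = (np.asarray(a, dtype=float) for a in (x, c_cons, c_rad))
    v = c_rad - c_cons
    return float(np.dot(x - c_cons, v) / np.dot(v, v))

class StyleIndex:
    """Truncate the projection ratio to [0, 1], then low-pass filter it."""

    def __init__(self, smoothing=0.1, initial=0.5):
        self.smoothing = smoothing  # assumed: weight given to the newest sample
        self.value = initial        # assumed neutral starting point

    def update(self, x, c_cons, c_rad):
        p = min(1.0, max(0.0, projection_ratio(x, c_cons, c_rad)))
        self.value = (1.0 - self.smoothing) * self.value + self.smoothing * p
        return self.value
```

Because the ratio is normalized by the squared length of the reference vector, a sample at the conservative center maps to 0 and one at the radical center maps to 1, regardless of how far apart the two centers are.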
[0067] S3, Decision Layer Design:
[0068] This layer employs the Soft Actor-Critic (SAC) algorithm, whose maximum-entropy exploration makes it well suited to continuous action spaces. Reinforcement learning problem modeling: this invention models the driving style adaptive control problem as an infinite-horizon Markov decision process (MDP) defined by the five-tuple $(S, A, P, R, \gamma)$, where S is the state space, representing the vehicle's current dynamic state and environmental information; A is the action space, representing the adjustment of the controller parameters; P is the state transition probability density function, describing the probability distribution of transitioning to the next state $s_{t+1}$ after taking action $a_t$ in the current state $s_t$; R is the reward function, used to evaluate the benefit of performing an action in a given state; and $\gamma$ is the discount factor, used to weigh the importance of immediate rewards against long-term returns.
[0069] S3.1, Optimization of Control Algorithm:
[0070] Model Predictive Control (MPC) solves the following quadratic programming problem over a finite horizon at each time step k:
[0071] $$\min_{u}\ J = \sum_{i=1}^{N_p} \left( x_{k+i|k} - x_{ref} \right)^{T} Q \left( x_{k+i|k} - x_{ref} \right) + \sum_{i=0}^{N_p-1} u_{k+i}^{T} R\, u_{k+i}$$
[0072] The constraints are $x_{k+i+1|k} = A\, x_{k+i|k} + B\, u_{k+i}$ and $u_{\min} \le u_{k+i} \le u_{\max}$. $N_p$ is the prediction horizon length; $x_{k+i|k}$ is the future state vector predicted at the current time k; $x_{ref}$ is the reference value; $u$ is the control input vector; Q is the state weight matrix; R is the control weight matrix; $u_{\min}$ is the minimum control value; $u_{\max}$ is the maximum control value; A is the state matrix; B is the control input matrix.
[0073] The goal of a linear quadratic regulator (LQR) is to find the optimal control law $u_k = -K x_k$ that minimizes the following infinite-horizon cost function:
[0074] $$J = \sum_{k=0}^{\infty} \left( x_k^{T} Q\, x_k + u_k^{T} R\, u_k \right)$$
[0075] where x is the system state, u is the control input, and K is the state feedback matrix.
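To make concrete why the weight matrices are the natural calibration target, the sketch below computes a discrete-time LQR gain by iterating the Riccati recursion and shows that raising the state weight produces a stiffer feedback gain. The discrete-time formulation, the function name, and the scalar test system are assumptions chosen for illustration; the patent does not specify a solver.

```python
import numpy as np

def dlqr_gain(A, B, Q, R, iters=500, tol=1e-10):
    """Discrete-time LQR gain K (control law u = -K x).

    Iterates P <- Q + A^T P (A - B K), with K = (R + B^T P B)^-1 B^T P A,
    until P stops changing.
    """
    A, B, Q, R = (np.atleast_2d(np.asarray(m, dtype=float)) for m in (A, B, Q, R))
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Scalar integrator x[k+1] = x[k] + u[k]: a larger Q gives a larger gain.
K_soft = dlqr_gain([[1.0]], [[1.0]], [[1.0]], [[1.0]])
K_stiff = dlqr_gain([[1.0]], [[1.0]], [[10.0]], [[1.0]])
```

In this toy case the gain rises from about 0.62 to about 0.92 when Q goes from 1 to 10, which is the kind of response change the decision layer exploits when it rescales Q and R.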
[0076] S3.2, State Space and Action Space:
[0077] State space S:
[0078] $$s_t = \left[\, e_y,\ e_\psi,\ v_x,\ \beta,\ r,\ I_s,\ \mu,\ a_x,\ a_y,\ u_{t-1} \,\right]^{T}$$
[0079] where $e_y$ and $e_\psi$ are the lateral position error and the heading angle error, respectively; $v_x$ is the longitudinal speed of the vehicle; $\beta$ is the sideslip angle at the vehicle's center of gravity; $r$ is the yaw rate; $I_s$ is the driving style index; $\mu$ is the road surface adhesion coefficient; $a_x$ and $a_y$ are the longitudinal and lateral accelerations, respectively; and $u_{t-1}$ is the control output from the previous moment.
[0080] Action space A: the action is the weight adjustment factor of the controller. To ensure that the weights are always positive and change smoothly, the network outputs adjustment values in the logarithmic domain:
[0081] $$a_t = \left[\, \Delta_Q,\ \Delta_R \,\right]$$
[0082] where $\Delta_Q$ and $\Delta_R$ are the logarithmic adjustment values of the diagonal elements of the weight matrices of control algorithms such as MPC and LQR.
[0083] The actual weight calculation method is as follows:
[0084] $$Q_{ii} = Q_{0,ii}\, e^{\Delta_{Q,i}}, \qquad R_{jj} = R_{0,jj}\, e^{\Delta_{R,j}}$$
[0085] $Q_0$, $R_0$ are the initial weighting coefficients of the controller; $Q$, $R$ are the updated controller weighting coefficients.
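The log-domain action mapping can be sketched as below. The split of the action vector into a Q part followed by an R part, and the clipping bound `max_log_step`, are assumptions added here for illustration.

```python
import numpy as np

def apply_log_adjustment(q0_diag, r0_diag, action, max_log_step=1.0):
    """Map a log-domain action to positive diagonal weight matrices.

    q0_diag, r0_diag: initial diagonal weights (all positive).
    action: concatenated log adjustments, Q entries first, then R entries.
    max_log_step: assumed bound on each log adjustment.
    """
    q0_diag = np.asarray(q0_diag, dtype=float)
    r0_diag = np.asarray(r0_diag, dtype=float)
    delta = np.clip(np.asarray(action, dtype=float), -max_log_step, max_log_step)
    n = q0_diag.size
    # exp() is always positive, so the weights can never change sign.
    q = q0_diag * np.exp(delta[:n])
    r = r0_diag * np.exp(delta[n:])
    return np.diag(q), np.diag(r)
```

A zero action leaves the factory weights unchanged, and equal positive and negative actions scale a weight up and down by the same factor, which is what makes the logarithmic domain convenient for a symmetric policy output.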
[0086] S3.3 Multi-objective dynamic reward function:
[0087] To achieve differentiated control performance under different driving styles, a dynamically weighted instantaneous reward function based on the style index $I_s$ is designed. The instantaneous reward $r_t$ is defined as:
[0088] $$r_t = -\left( w_h(I_s)\, J_h + w_c(I_s)\, J_c + w_s(I_s)\, J_s \right)$$
[0089] In the formula, $w_h$, $w_c$, $w_s$ are the weighting coefficients for the handling cost $J_h$, the comfort cost $J_c$, and the safety cost $J_s$, respectively.
[0090] The costs are defined as follows:
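[0091] Handling cost:
[0092] $$J_h = k_1\, e_y^{2} + k_2\, e_\psi^{2} + k_3\, (r - r_{des})^{2}$$
[0093] where $e_y$, $e_\psi$, and $r$ are the lateral position error, heading angle error, and yaw rate; $r_{des}$ is the desired yaw rate; $k_1$, $k_2$, $k_3$ are the weighting coefficients for lateral position error, heading angle error, and yaw rate error, respectively.
[0094] Comfort cost:
[0095] $$J_c = k_4\, \dot{a}_x^{2} + k_5\, \dot{a}_y^{2}$$
[0096] In the formula, $a_x$ is the longitudinal acceleration; $a_y$ is the lateral acceleration; $k_4$, $k_5$ are the weighting coefficients for longitudinal jerk and lateral jerk, respectively.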
[0097] Safety cost: a penalty is imposed when the vehicle's state approaches the instability boundary:
[0098] $$J_s = k_\beta\, \mathrm{ReLU}\big( |\beta| - \beta_{\max} \big) + k_y\, \mathrm{ReLU}\big( |e_y| - e_{y,\max} \big)$$
[0099] In the formula, $k_\beta$, $k_y$ are the penalty weights; $\beta$ is the center-of-gravity sideslip angle and $\beta_{\max}$ its maximum allowed deviation; $e_y$ is the lateral displacement deviation and $e_{y,\max}$ its maximum allowed value; $\mathrm{ReLU}(\cdot)$ is the rectified linear function used to implement one-sided safety constraints. When the vehicle state variable is below the safety threshold, the function output is zero and no penalty is applied; only when the state variable exceeds the threshold is the difference between the two output, giving a linearly increasing penalty that forces the control strategy to converge quickly to the safe range.
[0100] S3.4, Dynamic weight function design: the weight coefficients are nonlinear functions of the style index $I_s$:
[0101] The weighting coefficient of the comfort cost, $w_c$, is designed as a monotonically decreasing function:
[0102] $$w_c(I_s) = w_{c,\max}\, e^{-k_c I_s}$$
[0103] In the formula, $w_{c,\max}$ is the maximum (peak) value of the weighting coefficient of the comfort cost; $k_c$ is the decay rate coefficient.
[0104] The weighting coefficient of the handling cost, $w_h$, is designed as a sigmoid growth function:
[0105]
[0106] In the formula, $w_{h,sat}$ is the target saturation value of the weighting coefficient of the handling cost; $k_h$ is the growth slope coefficient.
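A sketch of how style-dependent weights could shape the reward is given below. The specific functional forms are assumptions: an exponential decay for the comfort weight, a logistic curve for the handling weight with a hypothetical midpoint at 0.5, and a constant safety weight (the source states that the safety weight also depends on the style index but does not give its form). All default values are illustrative only.

```python
import math

def comfort_weight(style, w_max=1.0, decay=2.0):
    """Monotonically decreasing in the style index (assumed exponential form)."""
    return w_max * math.exp(-decay * style)

def handling_weight(style, w_sat=1.0, slope=8.0, midpoint=0.5):
    """Sigmoid growth toward w_sat (the midpoint is a hypothetical parameter)."""
    return w_sat / (1.0 + math.exp(-slope * (style - midpoint)))

def relu(z):
    """One-sided penalty: zero below the threshold, linear above it."""
    return max(0.0, z)

def instant_reward(style, j_handling, j_comfort, j_safety, w_safety=5.0):
    """Negative weighted sum of the three costs (w_safety assumed constant)."""
    return -(
        handling_weight(style) * j_handling
        + comfort_weight(style) * j_comfort
        + w_safety * j_safety
    )
```

With these shapes a conservative driver (index near 0) is dominated by the comfort term, while an aggressive driver (index near 1) is dominated by tracking accuracy, and the safety term applies in both cases.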
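[0107] S3.5, Neural Network Update: construct the Actor network $\pi_\phi$ and the Critic networks $Q_{\theta_1}$, $Q_{\theta_2}$:
[0108] The loss function of the Critic network is defined as follows:
[0109] $$L_Q(\theta_i) = \mathbb{E}\left[ \left( Q_{\theta_i}(s_t, a_t) - \left( r_t + \gamma \min_{j=1,2} Q_{\theta_j}(s_{t+1}, a_{t+1}) \right) \right)^{2} \right]$$
[0110] In the formula, $L_Q$ is the Bellman residual loss function of the Critic network; $\mathbb{E}$ is the expectation operator for the Critic network; $Q_{\theta_i}(s_t, a_t)$ is the current prediction of the Critic network; $r_t$ is the instantaneous reward; $\gamma$ is the discount factor; $\min_{j} Q_{\theta_j}$ takes the smaller of the future evaluations of the two Critic networks in the system; $s_{t+1}$ is the state transitioned to at the next moment after the action is performed; $a_{t+1}$ is the action sampled for the next moment.
[0111] The Actor network is updated by maximizing the Q-value and the entropy, with the loss function:
[0112] $$L_\pi(\phi) = \mathbb{E}\left[ \alpha \log \pi_\phi(a_t \mid s_t) - Q_\theta(s_t, a_t) \right]$$
[0113] In the formula, $L_\pi$ is the policy loss function of the Actor network; the $Q_\theta$ term pursues higher rewards; the entropy term pursues exploratory randomness; $\log \pi_\phi(a_t \mid s_t)$ represents the determinism of the policy; $\alpha$ is the exploration coefficient; $\mathbb{E}$ is the expectation operator for the Actor network.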
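The two losses can be illustrated numerically without a deep learning framework. The sketch below uses only the terms listed in this section (twin-critic minimum, discount factor, exploration coefficient); note that the commonly published SAC target additionally subtracts an entropy term from the next-state value, which is omitted here because the section does not list it. Function names are hypothetical and the inputs stand in for network outputs on a sampled batch.

```python
import numpy as np

def critic_target(reward, gamma, q1_next, q2_next):
    """Bellman target using the smaller of the two critics' next-state values."""
    return np.asarray(reward, dtype=float) + gamma * np.minimum(q1_next, q2_next)

def critic_loss(q_pred, target):
    """Mean squared Bellman residual over the batch."""
    return float(np.mean((np.asarray(q_pred, dtype=float) - target) ** 2))

def actor_loss(log_prob, q_value, alpha):
    """Minimizing this maximizes Q plus alpha-weighted policy entropy."""
    log_prob = np.asarray(log_prob, dtype=float)
    q_value = np.asarray(q_value, dtype=float)
    return float(np.mean(alpha * log_prob - q_value))
```

Taking the minimum of two critics is the usual guard against value overestimation, which matters here because an overestimated Q-value would encourage the agent to propose riskier weight adjustments.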
[0114] S4, Security Gating:
[0115] To prevent the exploratory parameters output by reinforcement learning (RL) from causing the vehicle to lose control, this system introduces an online verification process based on digital twins.
[0116] S4.1 Digital Twin Model:
[0117] The system maintains a digital twin model consistent with the physical characteristics of the actual vehicle within an isolated core of the domain controller. This model employs a simplified yet high-precision three-degree-of-freedom single-track dynamics model to extrapolate future states in an extremely short time. The discretized state equation for the virtual extrapolation is:
[0118] $$\hat{x}_{k+1} = \hat{x}_k + \Delta t \cdot f(\hat{x}_k, u_k)$$
[0119] In the formula, $\Delta t$ is the simulation step size; $f(\cdot)$ denotes the differential equations covering the lateral, longitudinal, and yaw motions of the vehicle; $\hat{x}_k$ is the state of the digital twin model.
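The virtual extrapolation amounts to a fixed-step rollout of the model dynamics. The sketch below assumes forward-Euler integration and takes the dynamics as a caller-supplied function `f(x, u)`; the single-track equations themselves are not reproduced here, and the double-integrator example is only a stand-in.

```python
import numpy as np

def rollout(f, x0, controls, dt):
    """Forward-Euler extrapolation: x[k+1] = x[k] + dt * f(x[k], u[k]).

    f: callable returning the state derivative for (state, control).
    Returns an array of shape (len(controls) + 1, state_dim), including x0.
    """
    states = [np.asarray(x0, dtype=float)]
    for u in controls:
        derivative = np.asarray(f(states[-1], u), dtype=float)
        states.append(states[-1] + dt * derivative)
    return np.stack(states)

# Stand-in dynamics (not the vehicle model): position and speed driven
# by a commanded acceleration.
def double_integrator(x, u):
    return np.array([x[1], u])

trajectory = rollout(double_integrator, [0.0, 0.0], [1.0, 1.0], dt=0.1)
```

Keeping the integrator separate from the dynamics function means the same rollout can be reused if the single-track model is later replaced by a higher-fidelity one.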
[0120] S4.2 Verification Process:
[0121] When candidate control weights $Q'$, $R'$ are received, the current state of the vehicle $x_t$ is taken as the initial condition and the control weights $Q'$, $R'$ are injected into the virtual controller to predict the vehicle state sequence $\{\hat{x}_k\}$ over the future prediction horizon $T_p$.
[0122] Safety indicator calculation:
[0123] Collision risk index (time to collision, TTC): TTC is calculated along the virtual trajectory based on obstacle data perceived by radar.
[0124] $$TTC_k = \frac{p_{obs} - \hat{p}_k}{\hat{v}_k - v_{obs}}$$
[0125] If $TTC_k < TTC_{th}$, the candidate is considered unsafe. $TTC_{th}$ is the collision time threshold; $p_{obs}$ is the longitudinal position of the obstacle ahead; $\hat{p}_k$ is the predicted longitudinal position of the vehicle at step k in the digital twin model; $\hat{v}_k$ is the predicted speed of the vehicle at step k in the digital twin model; $v_{obs}$ is the longitudinal velocity of the obstacle ahead.
[0126] Road boundary constraint: check whether the lateral position $\hat{y}_k$ of the predicted trajectory exceeds the lane safety boundary:
[0127] $$|\hat{y}_k| \le \frac{W_{lane}}{2} - d_{safe}$$
[0128] In the formula, $W_{lane}$ is the width of the lane and $d_{safe}$ is the safety margin.
[0129] Dynamic stability envelope: check whether the predicted yaw rate $\hat{r}_k$ exceeds the physical limit:
[0130] $$|\hat{r}_k| \le \kappa\, \frac{\mu\, g}{\hat{v}_k}$$
[0131] In the formula, $\kappa$ is the safety factor, set to 0.85; $\mu$ is the current road surface adhesion coefficient; g is the gravitational acceleration; $\hat{v}_k$ is the predicted vehicle speed.
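The three safety indicators can be evaluated over a predicted trajectory as in the sketch below. Only the safety factor 0.85 comes from the text; the lane width, safety margin, and TTC threshold defaults are illustrative assumptions, as are the function and argument names. The yaw-rate bound uses the familiar friction-limited form (adhesion coefficient times g, divided by speed).

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def check_safety(pos, lat, speed, yaw_rate, obs_pos, obs_speed, mu,
                 lane_width=3.5, margin=0.3, ttc_threshold=2.0,
                 safety_factor=0.85):
    """Evaluate TTC, lane-boundary and yaw-rate checks along a predicted
    trajectory given as per-step arrays. Returns a dict of booleans."""
    pos, lat, speed, yaw_rate, obs_pos = (
        np.asarray(a, dtype=float) for a in (pos, lat, speed, yaw_rate, obs_pos)
    )
    # TTC is only finite while the ego vehicle is closing on the obstacle.
    closing = speed - obs_speed
    gap = obs_pos - pos
    ttc = np.where(closing > 1e-6, gap / np.maximum(closing, 1e-6), np.inf)
    ttc_ok = bool(np.all(ttc >= ttc_threshold))
    # Lateral position must stay inside the lane minus a safety margin.
    lane_ok = bool(np.all(np.abs(lat) <= lane_width / 2.0 - margin))
    # Friction-limited yaw rate, scaled by the safety factor.
    yaw_limit = safety_factor * mu * G / np.maximum(speed, 0.1)
    yaw_ok = bool(np.all(np.abs(yaw_rate) <= yaw_limit))
    return {"ttc": ttc_ok, "lane": lane_ok, "yaw": yaw_ok,
            "safe": ttc_ok and lane_ok and yaw_ok}
```

Returning the individual flags alongside the combined verdict makes it possible to record which constraint a rejected parameter set violated.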
[0132] S4.3, Security Gating Criteria:
[0133] Define the composite gate function G. Candidate parameters are allowed to be issued only when they simultaneously satisfy the safety constraints and the performance optimization requirements.
[0134]
[0135] S4.4, Smooth Parameter Update and Closed-Loop Feedback:
[0136] If G=1, the parameters Q and R of the actual controller are updated. To prevent control oscillations caused by sudden parameter changes, an exponential moving average is used:
[0137] $$Q \leftarrow (1 - \tau)\, Q + \tau\, Q', \qquad R \leftarrow (1 - \tau)\, R + \tau\, R'$$
[0138] where $\tau$ is the update rate and $Q'$, $R'$ are the verified candidate parameters;
[0139] If G=0, the RL agent has output a dangerous action. In this case the controller is not updated, and the attempt is stored as a failure sample in the RL experience replay pool.
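The gated update and failure logging can be sketched as a small stateful helper. The class name, the buffer capacity, and the representation of a transition as an opaque object are assumptions; the update rate default is illustrative.

```python
from collections import deque

import numpy as np

class GatedUpdater:
    """Apply verified weights with an exponential moving average; log rejected
    attempts so the RL agent can learn from them."""

    def __init__(self, q, r, tau=0.2, capacity=10000):
        self.q = np.array(q, dtype=float)
        self.r = np.array(r, dtype=float)
        self.tau = tau                           # update rate
        self.failures = deque(maxlen=capacity)   # failure samples for replay

    def step(self, q_new, r_new, gate, transition):
        """gate: 1 if the digital twin verification passed, else 0.
        transition: the experience record for this attempt (format assumed)."""
        if gate == 1:
            self.q = (1.0 - self.tau) * self.q + self.tau * np.asarray(q_new, dtype=float)
            self.r = (1.0 - self.tau) * self.r + self.tau * np.asarray(r_new, dtype=float)
            return True
        # Rejected: keep the current controller weights and store the sample.
        self.failures.append(transition)
        return False
```

A small update rate means several consecutive verified proposals are needed before the controller weights move far from their previous values, which limits the effect of any single proposal.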
[0140] Embodiments of the present invention may be provided as methods, systems, or computer program products. Therefore, the present invention may take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical memory, etc.) containing computer-usable program code.
[0141] This invention is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.
[0142] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.
[0143] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.
[0144] In a typical configuration, a computing device includes one or more processors, input / output interfaces, network interfaces, and memory.
[0145] Memory may include non-persistent memory in computer-readable media, random access memory, and / or non-volatile memory, such as read-only memory or flash memory. Memory is an example of computer-readable media.
[0146] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information by any method or technology. Information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory, static random access memory, dynamic random access memory, other types of random access memory, read-only memory, electrically erasable programmable read-only memory, flash memory or other memory technologies, compact disc read-only memory, digital versatile disc or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
[0147] As described above, although the invention has been shown and described with reference to specific preferred embodiments, it should not be construed as limiting the invention itself. Various changes in form and detail may be made without departing from the spirit and scope of the invention as defined in the appended claims.
Claims
1. A vehicle control parameter self-calibration method oriented to driving style preference, characterized in that, Includes the following steps: Collect vehicle operation data and environmental data; Based on the vehicle operation data, vehicle dynamics state and original feature vectors are generated. After normalizing the original feature vectors, an incremental clustering algorithm is used for online clustering. Three cluster centers are set: a conservative style center, a normal style center, and an aggressive style center. A minimum separation constraint is introduced during the clustering process to ensure that the three cluster centers maintain a preset minimum separation distance in the feature space. A continuous style index is calculated based on the clustering results. The calculation process of the style index includes: constructing a reference vector from the conservative style center to the aggressive style center; calculating the projection ratio of the current normalized feature vector on the reference vector; truncating and smoothing the projection ratio to obtain the continuous style index. The value range of the style index is [0,1], where 0 represents extremely conservative and 1 represents extremely aggressive. The style index, vehicle dynamics state, and environmental data are used as inputs. The weight adjustment factor of the controller is dynamically output through a reinforcement learning algorithm. The updated controller parameters are calculated based on the weight adjustment factor. The updated controller parameters are verified for security using a digital twin model. If the verification passes, the updated controller parameters are sent to the controller to complete the self-calibration of the control parameters. If the verification fails, the controller parameters are not updated, and this parameter adjustment attempt is stored as a failure sample in the experience replay pool.
2. The vehicle control parameter self-calibration method oriented to driving style preference according to claim 1, characterized in that the vehicle dynamics state is generated as follows: parameters characterizing the vehicle's dynamics are extracted from the vehicle operation data, including lateral position error, heading angle error, longitudinal speed, sideslip angle at the vehicle's center of gravity, yaw rate, longitudinal acceleration, lateral acceleration, and the control output of the previous moment; these parameters are combined with the road surface adhesion coefficient from the environmental data to obtain the vehicle dynamics state.
3. The vehicle control parameter self-calibration method oriented to driving style preference according to claim 1, characterized in that the original feature vector is generated as follows: physical quantities characterizing driving style are selected, and feature parameters are obtained through sliding window statistics; the feature parameters include the mean absolute value of longitudinal acceleration, the mean absolute value of lateral acceleration, the maximum lateral acceleration within the window, the average headway, and the standard deviation of the steering wheel speed.
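A minimal sketch of the sliding window statistics in claim 3 is given below, under stated assumptions: the last two features are read as the average headway and the standard deviation of the steering wheel speed, the maximum lateral acceleration is taken by magnitude, and the population standard deviation is used. The class name `StyleFeatureWindow` and the window size are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class StyleFeatureWindow:
    """Fixed-length window over (ax, ay, headway, steer_rate) samples."""

    def __init__(self, size=100):
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=size)

    def push(self, ax, ay, headway, steer_rate):
        self.samples.append((ax, ay, headway, steer_rate))

    def features(self):
        if not self.samples:
            raise ValueError("window is empty")
        ax, ay, headway, steer_rate = zip(*self.samples)
        return [
            mean(abs(v) for v in ax),   # mean |longitudinal acceleration|
            mean(abs(v) for v in ay),   # mean |lateral acceleration|
            max(abs(v) for v in ay),    # max lateral acceleration (by magnitude, assumed)
            mean(headway),              # average headway
            pstdev(steer_rate),         # std of steering wheel speed (population form, assumed)
        ]
```

Calling `push` once per sample and `features` once per clustering update gives the five-element original feature vector.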
4. The vehicle control parameter self-calibration method oriented to driving style preference according to claim 1 or 3, characterized in that, when normalizing the original feature vector, the physical boundary values of each feature, obtained from statistics of offline driving data, are used as the normalization parameters.
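One plausible reading of the normalization in claim 4 is min-max scaling against per-feature physical bounds. The sketch below assumes that form and additionally clips out-of-range values to [0, 1]; both the clipping and the name `normalize` are assumptions, not taken from the claim.

```python
def normalize(features, lower, upper):
    """Min-max normalize each feature with offline-derived bounds (assumed form)."""
    out = []
    for f, lo, hi in zip(features, lower, upper):
        if hi <= lo:
            raise ValueError("upper bound must exceed lower bound")
        scaled = (f - lo) / (hi - lo)
        # Clip so that samples outside the offline bounds stay in [0, 1].
        out.append(min(1.0, max(0.0, scaled)))
    return out
```

Because the bounds are fixed offline, the normalized feature space does not drift as online data arrives, which keeps the cluster centers comparable over time.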
5. The vehicle control parameter self-calibration method oriented to driving style preference according to claim 1, characterized in that, in the reinforcement learning algorithm, the driving style adaptive control problem is modeled as an infinite-horizon Markov decision process, in which: the state space is the set of vehicle dynamics states and style indices; the action space is the weight adjustment factor of the controller; and the reward function is a real-time reward function dynamically weighted by the style index, comprising a control cost, a comfort cost, and a safety cost. The weight coefficients of the control cost, comfort cost, and safety cost are nonlinear functions of the style index, wherein the weight coefficient of the comfort cost is a monotonically decreasing function and the weight coefficient of the control cost is a sigmoid-type increasing function.
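The style-dependent reward weighting in claim 5 can be illustrated with the following sketch. The specific functional forms (a logistic function for the control weight, an exponential decay for the comfort weight, a constant safety weight) and all constants `k`, `s0`, `decay`, and `w_safe` are assumptions chosen only to satisfy the monotonicity properties stated in the claim.

```python
import math


def reward_weights(s, k=8.0, s0=0.5, decay=2.0, w_safe=1.0):
    """Return (w_ctrl, w_comf, w_safe) for a style index s in [0, 1]."""
    # Sigmoid-type increasing weight for the control cost.
    w_ctrl = 1.0 / (1.0 + math.exp(-k * (s - s0)))
    # Monotonically decreasing weight for the comfort cost (assumed exponential).
    w_comf = math.exp(-decay * s)
    # The claim does not specify the safety weight; a constant is assumed here.
    return w_ctrl, w_comf, w_safe


def reward(s, control_cost, comfort_cost, safety_cost):
    """Negative weighted sum of the three costs."""
    w_ctrl, w_comf, w_safe = reward_weights(s)
    return -(w_ctrl * control_cost + w_comf * comfort_cost + w_safe * safety_cost)
```

With these forms, a more aggressive style index penalizes tracking error more heavily and ride discomfort less heavily, while a conservative index does the opposite.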
6. The vehicle control parameter self-calibration method oriented to driving style preference according to claim 1, characterized in that the digital twin model is a three-degree-of-freedom single-track dynamic model, and the safety indicators for the safety verification include a collision risk index, a road boundary constraint, and a dynamic stability envelope. The collision risk index is calculated based on obstacle data sensed by radar; if the collision risk index is less than a preset collision time threshold, the result is determined to be unsafe. The road boundary constraint checks whether the lateral position of the predicted trajectory exceeds the lane safety boundary; if it does, the result is determined to be unsafe. The dynamic stability envelope checks whether the predicted yaw rate exceeds the physical limit; if it does, the result is determined to be unsafe.
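The three safety checks of claim 6 might be combined as in the sketch below. It assumes the predicted trajectory has already been produced by rolling the digital twin model forward, treats the collision risk index as a time-to-collision value, and uses the friction-limited bound |r| <= mu * g / v as the yaw rate limit. The names `PredictedPoint` and `is_safe`, the lane half-width, and the thresholds are hypothetical.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration, m/s^2


@dataclass
class PredictedPoint:
    lateral_pos: float  # m, offset from lane center
    yaw_rate: float     # rad/s
    speed: float        # m/s
    ttc: float          # s, time to collision; float("inf") if no obstacle


def is_safe(trajectory, mu, lane_half_width=1.5, ttc_min=2.0):
    """Return True only if every predicted point passes all three checks."""
    for p in trajectory:
        # Collision risk: time to collision below the preset threshold.
        if p.ttc < ttc_min:
            return False
        # Road boundary: lateral position outside the lane safety boundary.
        if abs(p.lateral_pos) > lane_half_width:
            return False
        # Stability envelope: yaw rate beyond the friction limit (assumed form).
        if abs(p.yaw_rate) > mu * G / max(p.speed, 0.1):
            return False
    return True
```

A single violating point anywhere along the prediction horizon is enough to reject the candidate parameters.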
7. The vehicle control parameter self-calibration method oriented to driving style preference according to claim 1, characterized in that, when the updated controller parameters pass the safety verification performed with the digital twin model, the controller parameters are updated smoothly using an exponential moving average method.
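The smooth update in claim 7 corresponds to the standard exponential moving average. A minimal sketch follows, in which the function name `ema_update` and the retention factor `beta` are assumptions.

```python
def ema_update(current, candidate, beta=0.9):
    """Blend verified candidate parameters into the current ones.

    beta is the share of the current value that is retained (assumed value).
    """
    if len(current) != len(candidate):
        raise ValueError("parameter vectors must have the same length")
    return [beta * c + (1.0 - beta) * n for c, n in zip(current, candidate)]
```

A larger `beta` makes the controller parameters change more slowly, which avoids abrupt changes in vehicle response when the verified parameters differ sharply from the current ones.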
8. A vehicle control parameter self-calibration system oriented to driving style preference, used to automatically execute the vehicle control parameter self-calibration method oriented to driving style preference according to any one of claims 1 to 7, characterized in that it comprises:
a sensing unit, used to collect vehicle operation data and environmental data, the sensing unit including an inertial measurement unit, a wheel speed sensor, a steering wheel angle sensor, a millimeter-wave radar, and a forward-looking camera;
a perception and quantization module, which generates the vehicle dynamics state and the original feature vector based on the vehicle operation data collected by the sensing unit, normalizes the original feature vector, performs online clustering using an incremental clustering algorithm, and calculates a continuous style index based on the clustering results;
a decision module, which, based on the style index, the vehicle dynamics state, and the environmental data, dynamically outputs the weight adjustment factor of the controller through a reinforcement learning algorithm and calculates the updated controller parameters based on the weight adjustment factor;
a safety gating module, which performs safety verification on the updated controller parameters using a digital twin model; if the verification passes, the updated controller parameters are sent to the controller to complete the self-calibration of the control parameters; if the verification fails, the controller parameters are not updated, and this parameter adjustment attempt is stored as a failure sample in the experience replay pool.