Intelligent chassis extreme motion decision control method and system

An intelligent chassis extreme motion decision control method combines a vision-language-action model with a chassis state compression encoder to plan the trajectory and the execution mode jointly. This addresses the disconnect between decision-making and execution in existing intelligent driving systems under extreme conditions, and improves the safety and executability of the system.

CN122194984A · Pending · Publication Date: 2026-06-12 · TSINGHUA UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
TSINGHUA UNIVERSITY
Filing Date
2026-02-24
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing intelligent driving systems struggle to balance trajectory executability and safety under extreme conditions, exhibiting issues such as a disconnect between decision-making and execution, underutilization of chassis status information, lack of collaborative output capability of multimodal large models, and conservative redundancy strategies.

Method used

A smart chassis extreme motion decision control method is proposed. By fusing chassis state information, a collaborative mode of the execution system paired with the trajectory is generated. The method utilizes a vision-language-action model, a chassis state compression encoder, and an executability constraint attention mechanism to achieve joint planning and control of the trajectory and the execution mode.

Benefits of technology

It maintains safe and high-performance driving capabilities under extreme conditions, improves the feasibility of the trajectory and the control safety of the system, and solves the limitations of existing methods under extreme conditions.



Abstract

The application discloses an intelligent chassis extreme motion decision control method and system. The method acquires comprehensive perception and state information of the vehicle; generates a low-dimensional embedding vector from the current chassis state information through a compression encoder and converts it into semantic prompt words; dynamically adjusts the attention weights of candidate trajectories based on the embedding features of the surrounding environment visual information, historical trajectory, and action sequence, in combination with the low-dimensional embedding vector and the semantic prompt words; and synchronously generates a future trajectory and the corresponding chassis execution mode through a decoder, so that the generated trajectory matches the physical capability of the chassis and maps directly to an execution strategy. The application realizes joint decision-making of trajectory and execution mode under extreme working conditions and, through the attention mechanism with embedded executability constraints, significantly improves the physical feasibility of the trajectory and the real-time adaptability of chassis control.

Description

Technical Field

[0001] This invention relates to the field of active safety technology for automobiles, and in particular to an intelligent chassis extreme motion decision-making and control method and system.

Background Technology

[0002] With the rapid development of intelligent driving and vehicle-to-everything (V2X) technologies, the autonomous perception, decision-making, and control capabilities of vehicles have gradually been applied in urban roads, highways, and some specific scenarios. However, existing intelligent driving systems still have significant limitations when dealing with extreme conditions (such as emergency collision avoidance, extreme cornering, driving on low-friction surfaces, and partial failure of the execution system), making it difficult to achieve human-like driving control that ensures both safety and performance.

[0003] First, regarding the separation of decision-making and execution, most current autonomous driving methods adopt a layered structure. The upper-level decision-making or planning module generates the vehicle's future trajectory, while the lower-level execution module allocates chassis control parameters such as steering, braking, and traction based on the trajectory. This architecture can meet requirements under normal operating conditions, but it often leads to the problem of "trajectory feasible but physically unexecutable" under extreme conditions. For example, a trajectory planner might provide a geometrically smooth collision avoidance trajectory, but the lateral acceleration or force required for this trajectory exceeds the tire adhesion limit, ultimately making it impossible to achieve through the chassis system, thus creating a disconnect between trajectory and execution.
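The "geometrically smooth but physically unexecutable" case above can be illustrated with a small arithmetic check. This is only a sketch under a point-mass assumption: the function names, the friction coefficient, and the example speed and radius are hypothetical values chosen for illustration and do not come from the patent.

```python
G = 9.81  # gravitational acceleration [m/s^2]


def required_lateral_accel(speed_mps: float, curvature_per_m: float) -> float:
    """Steady-state lateral acceleration a_y = v^2 * kappa on a path of curvature kappa."""
    return speed_mps ** 2 * abs(curvature_per_m)


def adhesion_limit(mu: float) -> float:
    """Upper bound on lateral acceleration for friction coefficient mu (point-mass model)."""
    return mu * G


def is_executable(speed_mps: float, curvature_per_m: float, mu: float) -> bool:
    """True when the path's lateral acceleration demand stays within the adhesion limit."""
    return required_lateral_accel(speed_mps, curvature_per_m) <= adhesion_limit(mu)


# Hypothetical case: a smooth 50 m radius avoidance arc at 25 m/s on a wet road (mu = 0.5).
demand = required_lateral_accel(25.0, 1.0 / 50.0)  # about 12.5 m/s^2
limit = adhesion_limit(0.5)                        # about 4.9 m/s^2
print(f"demand {demand:.2f} m/s^2, limit {limit:.2f} m/s^2, "
      f"executable: {is_executable(25.0, 1.0 / 50.0, 0.5)}")
```

In this example the arc is smooth, yet its demand is more than twice what the tires can deliver, so a planner that ignores the chassis would hand the execution layer a trajectory it cannot track.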

[0004] Secondly, regarding the insufficient utilization of chassis status information, most existing systems only perform "post-processing checks" using vehicle dynamics constraints after trajectory generation, rather than actively incorporating chassis status during the decision-making process. This results in the generated trajectory lacking real-time awareness of vehicle limits and actuator health status, making it difficult to make rapid decisions that conform to dynamic constraints in extreme situations. For example, when a steering actuator fails, traditional methods may still output a trajectory dependent on that actuator, requiring additional redundancy mechanisms for correction, which significantly increases response latency.

[0005] Third, regarding the limitations of multimodal large models in driving: in recent years, Vision-Language-Action (VLA) models have been gradually introduced into intelligent driving tasks, demonstrating certain advantages in complex environments due to their powerful perception and reasoning capabilities. However, the outputs of existing VLA models are mostly abstract action sequences or trajectory points, lacking deep coupling with chassis physical constraints and failing to simultaneously generate specific execution system coordination strategies. For example, the model may output a trajectory without specifying whether it should be executed through "normal steering," "steering + differential steering coordination," or "braking + drift mode," making it difficult to implement the decision results directly.

[0006] Furthermore, considering the specific needs of extreme operating conditions, intelligent driving requires not only rapid decision-making in situations such as emergency collision avoidance, extreme cornering, drifting on low-friction surfaces, and system failure, but also that the generated actions be both feasible and safe. Most existing research focuses on trajectory optimization in normal scenarios, while in extreme conditions, it often adopts conservative avoidance or emergency braking strategies, lacking the flexibility and coordination of a human driver in extreme situations. Although this approach offers high safety, it may lead to collision avoidance failure or vehicle instability during high-speed emergency collision avoidance or extreme maneuvering.

[0007] In summary, existing autonomous driving methods suffer from the following pain points: 1. The decision-making layer and the execution layer are disconnected, making it difficult for trajectories to directly correspond to the chassis's executable modes; 2. Chassis state information is not fully utilized in the trajectory generation stage, lacking "executability priors"; 3. Multimodal large models can only output trajectories or actions, lacking the ability to jointly output with the execution system; 4. Under extreme conditions, existing methods often adopt conservative and redundant strategies, lacking the flexibility and stability required for high-performance driving.

[0008] Therefore, there is an urgent need for a new extreme motion decision-making and control method.

Summary of the Invention

[0009] This invention proposes an intelligent chassis extreme motion decision-making and control method that fully integrates chassis state information while generating a trajectory, and outputs a collaborative mode of the execution system paired with the trajectory, thereby realizing joint "trajectory-mode" planning and control. The method builds on the multimodal reasoning capabilities of vision-language-action models and adds a chassis state compression encoder, an executability constraint attention mechanism, and chassis state prompt input to the architecture, enabling the system to maintain safe, stable, and high-performance driving capabilities under extreme conditions, much like an experienced human driver.

[0010] Another objective of this invention is to propose an intelligent chassis extreme motion decision control system.

[0011] To achieve the above objectives, a first aspect of the present invention provides an intelligent chassis extreme motion decision-making and control method, comprising: acquiring comprehensive perception and status information of the vehicle, which includes at least visual information of the environment in front of and around the vehicle, the historical trajectory and action sequence, and the current chassis status information; generating a low-dimensional embedding vector from the current chassis status information through a compression encoder and converting it into semantic prompt words; dynamically adjusting the attention weights of candidate trajectories based on the embedding features of the surrounding visual information, historical trajectory, and action sequence, combined with the low-dimensional embedding vector and the semantic prompt words; and synchronously generating, through a decoder, the future trajectory and the corresponding chassis execution mode, so that the generated trajectory matches the chassis's physical capabilities and is directly mapped to the execution strategy.

[0012] In one embodiment of the present invention, acquiring comprehensive perception and status information of the vehicle includes: The system acquires visual information of the vehicle's front, visual information of the surrounding environment, historical trajectory information, historical action information, and current chassis status information, wherein the current chassis status information includes vehicle dynamic parameters and actuator health status.

[0013] In one embodiment of the present invention, the chassis state information is used to generate a low-dimensional embedding vector through a compression encoder, including: The visual information of the front of the vehicle and the visual information of the surrounding environment are extracted by the front view encoder and the surrounding view encoder, respectively, to generate the front visual embedding and the surrounding environment embedding. Historical trajectory information and historical action information are input into the historical information encoder to generate a timing context embedding, and the current chassis state information is input into the chassis state compression encoder to generate a chassis state embedding.

[0014] In one embodiment of the present invention, dynamically adjusting the attention weights of candidate trajectories based on the embedding features of the surrounding environment visual information, historical trajectories, and action sequences, combined with the low-dimensional embedding vector and the semantic prompt words, includes: introducing an executability constraint attention mechanism; and, based on the front visual embedding, surrounding environment embedding, temporal context embedding, chassis state embedding, and chassis state prompt words, performing multimodal feature fusion with a Transformer structure that incorporates the executability constraint attention mechanism. The executability constraint attention mechanism dynamically adjusts the attention weights according to three ratios: the lateral acceleration demand of the candidate trajectory to the road adhesion limit, the yaw rate demand to the vehicle's maximum yaw capacity, and the steering angle demand to the steering actuator's maximum capacity. Trajectories that exceed the vehicle dynamics constraints are thereby suppressed.
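One way to picture the three ratios is as a penalty subtracted from each candidate's attention logit before the softmax. The sketch below is an assumption, not the patent's formula: the hinge-style penalty, the `sharpness` parameter, the class and function names, and all numeric values are hypothetical, and the scoring is applied to whole candidate trajectories rather than to tokens inside a Transformer layer.

```python
import math
from dataclasses import dataclass


@dataclass
class Limits:
    lat_accel: float    # road adhesion limit on lateral acceleration [m/s^2]
    yaw_rate: float     # maximum yaw rate the vehicle can sustain [rad/s]
    steer_angle: float  # maximum angle the steering actuator can deliver [rad]


@dataclass
class Demand:
    lat_accel: float
    yaw_rate: float
    steer_angle: float


def _ratio(demand: float, limit: float) -> float:
    # A failed actuator (limit near zero) yields a very large but finite ratio.
    return abs(demand) / max(limit, 1e-6)


def executability_penalty(d: Demand, lim: Limits, sharpness: float = 10.0) -> float:
    """Zero while every ratio is at most 1; grows linearly with each excess."""
    ratios = (
        _ratio(d.lat_accel, lim.lat_accel),
        _ratio(d.yaw_rate, lim.yaw_rate),
        _ratio(d.steer_angle, lim.steer_angle),
    )
    return sharpness * sum(max(0.0, r - 1.0) for r in ratios)


def constrained_attention(logits, demands, lim, sharpness=10.0):
    """Softmax over candidate logits after subtracting the executability penalty."""
    adjusted = [l - executability_penalty(d, lim, sharpness)
                for l, d in zip(logits, demands)]
    peak = max(adjusted)  # subtract the maximum for numerical stability
    exps = [math.exp(a - peak) for a in adjusted]
    total = sum(exps)
    return [e / total for e in exps]


# Two candidates with equal raw scores on a low-friction road (mu assumed 0.3).
limits = Limits(lat_accel=0.3 * 9.81, yaw_rate=0.6, steer_angle=0.5)
candidates = [Demand(2.0, 0.3, 0.1), Demand(6.0, 0.5, 0.2)]
weights = constrained_attention([0.0, 0.0], candidates, limits)
```

With equal raw logits, nearly all of the weight moves to the first candidate, because the second demands about twice the available lateral acceleration. Setting `steer_angle` to zero in `Limits` models a failed steering actuator and drives the weight of any candidate that needs steering toward zero.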

[0015] In one embodiment of the present invention, a future trajectory and a corresponding chassis execution mode are synchronously generated through a decoder to ensure that the generated trajectory matches the chassis physical capabilities and is directly mapped to the execution strategy, including: A joint decoder is used to synchronously output the future trajectory increment and the corresponding chassis execution mode classification. The chassis execution mode classification includes normal execution mode, steering and differential steering coordinated mode, and braking and steering drift mode. Excessive jitter in execution modes at adjacent time steps is suppressed by hysteresis consistency terms, ensuring that the trajectory matches the physical feasibility of the execution mode.
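The hysteresis consistency term is not spelled out in this section. A minimal sketch, assuming it penalizes the probability that the predicted execution mode differs between adjacent time steps, could look like the following; the mode labels, function names, and the specific penalty form are hypothetical.

```python
from typing import Sequence

# Hypothetical labels for the three execution modes named above.
MODES = ("normal", "steer_plus_differential", "brake_plus_steer_drift")


def switch_probability(p_prev: Sequence[float], p_next: Sequence[float]) -> float:
    """Probability that independent draws from two adjacent steps pick different modes."""
    return 1.0 - sum(a * b for a, b in zip(p_prev, p_next))


def hysteresis_consistency(mode_probs: Sequence[Sequence[float]], weight: float = 1.0) -> float:
    """Sum of switch probabilities over adjacent time steps, scaled by `weight`."""
    return weight * sum(
        switch_probability(a, b) for a, b in zip(mode_probs, mode_probs[1:])
    )


steady = [[1.0, 0.0, 0.0]] * 4
jitter = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(hysteresis_consistency(steady), hysteresis_consistency(jitter))
```

A sequence that holds one mode incurs no penalty, while a sequence that flips mode at every step is penalized once per transition. Because the term is also nonzero for two identical but uncertain distributions, it additionally nudges the decoder toward confident mode predictions; whether that side effect is desired is a design choice.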

[0016] In one embodiment of the present invention, the vehicle's forward visual information and surrounding environment visual information are respectively extracted using a front view encoder and a surrounding view encoder to generate forward visual embeddings and surrounding environment embeddings, including: The main view encoder uses the visual encoding and embedding structure inherent in the large language model to extract lane line geometric features, road boundary features, static obstacle features, and dynamic target features; The surrounding view encoder uses a Q-former structure to compress features from the side and rear camera inputs, and outputs the surrounding environment feature embedding through redundancy suppression and noise constraint processing.

[0017] In one embodiment of the present invention, acquiring visual information about the area in front of and around the vehicle includes: The forward-facing main camera captures key elements such as road geometry, lane line boundaries, pedestrians, and obstacles directly in front of the vehicle. Surround view cameras are used to provide panoramic information from the sides and rear, compensating for blind spots and the dynamics of neighboring vehicles that cannot be perceived from the forward view.

[0018] In one embodiment of the present invention, the main view image captured at time $t$ is:

$$I_t^{\mathrm{f}} \in \mathbb{R}^{H \times W \times 3}$$

[0019] The input from the surround view cameras is:

$$I_t^{\mathrm{s}}$$

[0020] The trajectory points and action sequence at the past $T$ moments are stored together. Let the historical trajectory be:

$$P_t = \{p_{t-T}, \dots, p_{t-1}\}$$

[0021] The action history is as follows:

$$U_t = \{u_{t-T}, \dots, u_{t-1}\}, \quad u_i = (\delta_i, a_i)$$

[0022] where $\delta_i$ indicates the steering angle command and $a_i$ indicates the longitudinal acceleration. Let the current chassis state be:

$$s_t = [v_x, v_y, r, \beta, a_x, a_y, h]$$

[0023] where $v_x$ and $v_y$ are the longitudinal and lateral velocities, respectively; $r$ is the yaw rate; $\beta$ is the sideslip angle of the center of mass; $a_x$ and $a_y$ are the longitudinal and lateral accelerations, respectively; and $h$ indicates the health status of the steering, braking, and drive actuators. In the data acquisition and input phase, the system ultimately obtains the following raw input set: $\{I_t^{\mathrm{f}}, I_t^{\mathrm{s}}, P_t, U_t, s_t\}$.

[0024] In one embodiment of the present invention, a historical information encoder is used to model $P_t$ and $U_t$; the trajectory points and action sequences are embedded and input into QT-Former, and a temporal context representation is obtained through a time series self-attention mechanism:

$$E_t^{\mathrm{h}} = \mathrm{QT\text{-}Former}\big(\mathrm{Embed}(P_t, U_t)\big)$$

[0025] The chassis state vector $s_t$ is mapped to a low-dimensional embedding through a dedicated compression encoder $f_{\mathrm{c}}$:

$$z_t = f_{\mathrm{c}}(s_t)$$

[0026] The chassis embedding vector is transformed into semantic prompts:

$$w_t = g(z_t)$$

[0027] where $g$ is a mapping function from state to language. Five key embeddings are obtained in the feature encoding stage: front visual features, surrounding environment features, historical temporal features, chassis numerical embeddings, and chassis prompt words.

[0028] To achieve the above objectives, a second aspect of the present invention provides an intelligent chassis extreme motion decision control system, comprising: The information acquisition module is used to acquire the vehicle's comprehensive perception and status information; the comprehensive perception and status information of the vehicle includes at least visual information of the environment in front of and around the vehicle, historical trajectory and action sequence, and current chassis status information. The multimodal feature extraction module is used to generate low-dimensional embedding vectors from the current chassis status information through a compression encoder and convert them into semantic prompt words; The multimodal feature fusion module is used to dynamically adjust the attention weights of candidate trajectories based on the embedded features of the surrounding environment visual information, historical trajectories and action sequences, combined with low-dimensional embedding vectors and semantic cue words. The autoregressive decoding module is used to synchronously generate future trajectories and corresponding chassis execution modes through the decoder, so as to ensure that the generated trajectory matches the chassis physical capabilities and is directly mapped to the execution strategy.

[0029] The intelligent chassis extreme motion decision control method and system of this invention aims to solve the problems of unexecutable trajectory, disconnect between decision and chassis, and insufficient system adaptability in existing autonomous driving technologies under extreme conditions such as emergency collision avoidance, extreme cornering, driving on low-adhesion surfaces, and partial failure of the execution system.

[0030] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Description of the Drawings

[0031] The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:

Figure 1 is a flowchart of an intelligent chassis extreme motion decision control method provided in an embodiment of the present invention;

Figure 2 is an architecture diagram of an intelligent chassis extreme motion decision-making and control method provided in an embodiment of the present invention;

Figure 3 is a schematic diagram illustrating the effect of outputting an executable trajectory to overcome ambient noise, as provided in an embodiment of the present invention;

Figure 4 is a structural diagram of an intelligent chassis extreme motion decision control system provided in an embodiment of the present invention.

Detailed Implementation

[0032] It should be noted that, unless otherwise specified, the embodiments and features described in the present invention can be combined with each other. The present invention will now be described in detail with reference to the accompanying drawings and embodiments.

[0033] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0034] The following description, with reference to the accompanying drawings, describes an intelligent chassis extreme motion decision-making and control method and system according to an embodiment of the present invention.

[0035] Example 1. This embodiment provides an intelligent chassis extreme motion decision-making and control method. As shown in Figure 1, it includes: S1, acquire comprehensive perception and status information of the vehicle; the comprehensive perception and status information of the vehicle includes at least visual information of the front and surrounding environment of the vehicle, historical trajectory and action sequence, and current chassis status information.

[0036] It can be understood that, during the data acquisition phase, the system acquires visual information from the area in front of and around the vehicle, historical trajectories and action sequences, and current chassis status information through multi-source sensor fusion, providing basic input for subsequent multimodal fusion and decision-making control. Specifically, visual information is jointly acquired by the main camera and surround-view cameras. The main camera captures key information such as road geometry features, lane boundaries, pedestrians, and obstacles in front of the vehicle, and its image data is represented as $I_t^{\mathrm{f}} \in \mathbb{R}^{H \times W \times 3}$, where $H$ and $W$ represent the image height and width, respectively, and 3 indicates a three-channel color image. The surround-view cameras provide panoramic information from the sides and rear; their input $I_t^{\mathrm{s}}$ also uses three-channel images, enhancing the system's ability to perceive the dynamics of neighboring vehicles and blind spots.

[0037] This step ensures that the model possesses comprehensive environmental awareness, historical behavior memory, and chassis state constraints when generating trajectories and execution modes. Through the synchronous acquisition and structured processing of multimodal data, the system can achieve joint reasoning of trajectories and execution strategies under extreme operating conditions, significantly improving the executability and safety of decisions.

[0038] S2, generate a low-dimensional embedding vector from the current chassis status information through a compression encoder and convert it into semantic prompt words.

[0039] In this invention, generating low-dimensional embedding vectors from chassis state information using a compression encoder and further transforming them into semantic prompt words is a key step in achieving joint "trajectory-mode" decision-making. This step aims to structurally compress and semantically enhance high-dimensional, dynamically changing chassis state information, thereby introducing executability constraints of vehicle dynamics into the multimodal fusion and trajectory generation process.
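The state-to-language conversion can be pictured with a simple template. The sketch below is a stand-in under stated assumptions: it maps the raw chassis state directly to text rather than going through the learned embedding, and the `ChassisState` fields, thresholds, friction estimate, and wording of the prompt are all hypothetical.

```python
from dataclasses import dataclass, field

G = 9.81  # gravitational acceleration [m/s^2]


@dataclass
class ChassisState:
    v_x: float       # longitudinal velocity [m/s]
    v_y: float       # lateral velocity [m/s]
    yaw_rate: float  # [rad/s]
    sideslip: float  # center-of-mass sideslip angle [rad]
    a_x: float       # longitudinal acceleration [m/s^2]
    a_y: float       # lateral acceleration [m/s^2]
    # Actuator health in [0, 1]; 1.0 means fully healthy (hypothetical scale).
    health: dict = field(
        default_factory=lambda: {"steering": 1.0, "braking": 1.0, "drive": 1.0}
    )


def chassis_prompt(s: ChassisState, mu: float = 1.0, degraded_below: float = 0.8) -> str:
    """Render a short textual summary of grip usage and actuator health."""
    usage = abs(s.a_y) / (mu * G)
    if usage < 0.5:
        grip = "low"
    elif usage < 0.85:
        grip = "moderate"
    else:
        grip = "near-limit"
    degraded = sorted(name for name, h in s.health.items() if h < degraded_below)
    if degraded:
        actuators = "degraded actuators: " + ", ".join(degraded)
    else:
        actuators = "all actuators healthy"
    return f"speed {s.v_x:.1f} m/s; lateral grip usage {grip}; {actuators}"


state = ChassisState(20.0, 0.3, 0.2, 0.02, 0.0, 8.5,
                     {"steering": 0.4, "braking": 1.0, "drive": 1.0})
prompt = chassis_prompt(state, mu=0.9)
```

A prompt of this kind gives the language side of the model an explicit statement of the vehicle's limits, which the numeric embedding carries only implicitly.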

[0040] Specifically, the present invention extracts features from the vehicle's frontal visual information and surrounding environment visual information through the main view encoder and the surrounding view encoder, respectively, to generate frontal visual embedding and surrounding environment embedding; inputs historical trajectory information and historical action information into the historical information encoder to generate temporal context embedding, and inputs the current chassis state information into the chassis state compression encoder to generate chassis state embedding.

[0041] The forward visual information of the vehicle is processed by a front-view encoder to extract depth features. This encoder employs an improved ConvNeXt or Swin Transformer architecture, with input being a sequence of temporal images captured by a forward-facing camera, typically 224×224 or 480×640 pixels. The encoder first extracts low-level visual features such as lane line geometry and topology, obstacle contours and distance estimates, traffic sign semantic categories, and road curvature variations through multi-scale convolutional layers. Subsequently, it fuses information from multiple frames using temporal convolution or a Transformer temporal module to capture the relative motion trends of dynamic targets and collision time prediction, ultimately generating a forward visual embedding vector with dimensions 512 or 768. This embedding integrates high-level semantic information from scene depth estimation, object detection bounding box features, and lane line masks, effectively representing the drivable area and potential risks within a 50-meter radius in front of the vehicle.

[0042] The system extracts omnidirectional features from the surrounding environment visual information using a surrounding view encoder. This encoder processes multi-view images from surround-view fisheye cameras and side cameras, employing a cross-view attention mechanism and geometric projection correction to ensure spatial consistency. First, features are extracted independently for each viewpoint, acquiring the position, speed, and category information of vehicles, pedestrians, cyclists, curbs, and obstacles in adjacent lanes. Then, a bird's-eye view transformation module projects these multi-view features onto a unified BEV coordinate system. Combined with accurate depth priors provided by LiDAR point cloud data, a dense surrounding environment occupancy grid and dynamic target trajectory predictions are generated. The final output surrounding environment embedding dimension is also 512 or 768, but it focuses on encoding the intentions of interactive traffic participants within a 360-degree radius around the vehicle, side and rear blind spot risks, and lane change feasibility assessments.

[0043] Historical trajectory and action information are input into a historical information encoder to generate a temporal context embedding. This encoder receives vehicle pose sequences, velocity curves, acceleration commands, and steering and braking action execution values from the past three to five seconds. A bidirectional LSTM or causal Transformer structure is used to perform temporal modeling of historical states, capturing driver operating habits, vehicle response delay characteristics, and the cumulative effect of recent trajectory tracking errors. Internally, the encoder incorporates action-to-state transition probability modeling to learn the mapping relationship between historical control inputs and the vehicle's actual dynamic response. The generated temporal context embedding has a dimension of 256 or 512. Beyond a simple memory of past states, it implicitly encodes the temporal evolution of vehicle dynamic parameter uncertainties, actuator saturation trends, and external disturbances, providing short-term historical causal relationships for current decision-making.

[0044] The current chassis state information is input into a chassis state compression encoder to generate a chassis state embedding. This encoder receives real-time signals from the CAN bus, including twenty to thirty key state variables such as longitudinal vehicle speed, lateral vehicle speed, yaw rate, center of gravity sideslip angle, body roll and pitch angles, four-wheel wheel speeds, vertical loads on each wheel, actual position and angular velocity of the steering actuators, clamping force feedback from the four electromechanical brakes, and battery system voltage and current. The encoder employs a residual MLP or a lightweight fully connected network to perform nonlinear dimensionality reduction and feature selection on the high-dimensional state space, suppressing sensor noise and extracting key dynamic patterns. Using a pre-trained vehicle dynamics model as a constraint prior, the encoder learns to compress the original state into a 128 or 256-dimensional chassis state embedding. This embedding effectively characterizes the current vehicle stability margin, tire adhesion utilization level, health status and remaining capacity of each actuator, and the linearity of the overall vehicle dynamic response, providing accurate real-time state perception for higher-level decision-making.
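As a rough picture of the residual MLP option, the sketch below maps a vector of chassis signals to a fixed-size embedding. It is an illustration only: the class name, layer sizes, block count, and random untrained weights are assumptions, the input is assumed to be normalized already, and the pre-trained dynamics model used as a constraint prior is not modeled.

```python
import random


def linear(rng: random.Random, n_in: int, n_out: int):
    """Random weight matrix (n_out rows by n_in columns), scaled by 1/sqrt(n_in)."""
    scale = (1.0 / n_in) ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(x):
    return [max(0.0, v) for v in x]


class ChassisStateEncoder:
    """Input projection followed by residual two-layer MLP blocks (untrained sketch)."""

    def __init__(self, n_state: int = 24, d_embed: int = 128,
                 n_blocks: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.n_state = n_state
        self.proj = linear(rng, n_state, d_embed)
        self.blocks = [
            (linear(rng, d_embed, d_embed), linear(rng, d_embed, d_embed))
            for _ in range(n_blocks)
        ]

    def __call__(self, state):
        if len(state) != self.n_state:
            raise ValueError(f"expected {self.n_state} signals, got {len(state)}")
        h = matvec(self.proj, state)
        for w1, w2 in self.blocks:
            update = matvec(w2, relu(matvec(w1, h)))
            h = [a + b for a, b in zip(h, update)]  # residual connection
        return h


encoder = ChassisStateEncoder()
embedding = encoder([0.1] * 24)  # 24 normalized CAN signals -> 128-dimensional embedding
```

The residual connections let each block refine the projected state rather than replace it, which is one common reason to prefer this structure for small tabular inputs such as CAN signals.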

[0045] Furthermore, the visual information of the vehicle's front and the visual information of the surrounding environment are extracted using a front view encoder and a surrounding view encoder, respectively, to generate a front visual embedding and a surrounding environment embedding, including: S21, the main view encoder uses the visual encoding and embedding structure inherent in the large language model to extract lane line geometric features, road boundary features, static obstacle features and dynamic target features.

[0046] Specifically, in the feature encoding stage of this invention, the front view encoder uses the visual encoding and embedding structure inherent in the Large Language Model (LLM) to extract features from the environmental information in front of the vehicle. This encoder, based on the visual encoding module in a pre-trained vision-language model (such as CLIP, ViT, etc.), processes the images captured by the forward-facing camera. The input is fed into a convolutional neural network (CNN) or visual transformer (ViT) structure, which extracts local and global features of the image layer by layer. In some implementations, the encoder may employ a multi-scale feature pyramid structure to take into account both the geometric structure of lane lines (such as straight lines, curves, and width) and the semantic information of obstacles (such as pedestrians, vehicles, and roadblocks).

[0047] Specifically, the main view encoder first extracts multi-layer feature maps of the image through a backbone network (such as ResNet or ViT), then fuses the features with positional encoding through a self-attention mechanism, and finally outputs a high-dimensional semantic embedding vector. This embedding vector not only includes the geometric features of lane lines (such as curvature and offset), but also incorporates semantic information from road boundaries, static obstacles (such as guardrails and curbs), and dynamic targets (such as pedestrians and vehicles). Regarding parameter settings, the embedding dimension $d$ is usually set to 768 or 1024 to match the input dimension of mainstream large language models.

[0048] Furthermore, the encoder maps visual features to a semantic space compatible with the language model through the embedding layer of the pre-trained model, thereby achieving seamless integration with subsequent language modules. In practical applications, this step provides the system with a high-precision environmental perception representation, serving as one of the fundamental inputs for joint decision-making in trajectory generation and execution modes. By extracting multiple types of visual features, the system can accurately identify drivable areas and potential risks in complex road environments, providing crucial geometric and semantic constraints for decision-making under extreme conditions.

[0049] S22, the surrounding view encoder uses a Q-former structure to compress features from the side and rear camera inputs, and outputs the surrounding environment feature embedding through redundancy suppression and noise constraint processing.

[0050] Specifically, in this invention, the surrounding view encoder employs a Q-former structure to compress features from the lateral and rearward camera inputs. Its core objective is to extract highly discriminative surrounding environment features for trajectory decision-making through redundancy suppression and noise constraint processing. The Q-former is a multimodal feature fusion module based on the Transformer architecture, widely used in vision-language models for structured compression and semantic enhancement of image features. In this invention, the Q-former is custom-designed to adapt to the specific characteristics of the vehicle's lateral and rearward camera inputs, including issues such as viewpoint distortion, frequent occlusion, and high dynamic target density.

[0051] In the specific implementation, images are input from the side and rear cameras. First, local feature maps are extracted using a pre-trained visual encoder (such as ViT or ResNet), and then fed into the Q-former module. The Q-former introduces learnable query vectors, which interact with the image features through cross-attention to select the visual information relevant to trajectory decision-making. During the attention calculation process, the system uses an attention masking mechanism to suppress background regions in the lateral image that are unrelated to vehicle motion (such as fixed buildings and non-dynamic obstacles), while imposing constraints on noisy regions (such as low light and motion blur) to reduce their interference with trajectory prediction.
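The masked cross-attention described above can be illustrated with a minimal sketch. It is not the patented implementation: the function name `masked_cross_attention`, the use of image patches directly as both keys and values (no learned projections), and the scalar `noise_penalty` subtracted from the logits are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_cross_attention(queries, patches, keep_mask, noise_penalty):
    """Each learnable query attends over image patch features.

    queries       : list of d-dimensional query vectors (hypothetical learned values)
    patches       : list of d-dimensional patch features (used as keys and values)
    keep_mask     : False marks background patches that are masked out entirely
    noise_penalty : non-negative value subtracted from the logit of a noisy patch
    """
    if not any(keep_mask):
        raise ValueError("at least one patch must remain unmasked")
    d = len(queries[0])
    outputs = []
    for q in queries:
        logits = []
        for p, keep, pen in zip(patches, keep_mask, noise_penalty):
            if not keep:
                logits.append(float("-inf"))  # hard suppression of background
            else:
                score = sum(qi * pi for qi, pi in zip(q, p)) / math.sqrt(d)
                logits.append(score - pen)    # soft suppression of noisy regions
        weights = softmax(logits)
        outputs.append(
            [sum(w * p[j] for w, p in zip(weights, patches)) for j in range(d)]
        )
    return outputs
```

The output has one compressed vector per query, so the number of queries fixes the size of the surrounding-environment embedding regardless of how many patches the cameras produce.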

[0052] Furthermore, the Q-former outputs a surrounding-environment feature embedding whose dimension is typically set to 256 or 512 to match the output dimension of the main view encoder. In the subsequent multimodal fusion stage, this embedding is concatenated with the main view features, historical trajectory embeddings, chassis state embeddings, etc., to serve as the context input for the trajectory query vector.

[0053] This step plays a crucial role in the entire system: through structured compression and attention filtering, the system can effectively reduce redundant information in lateral and rearward images, improving the robustness and accuracy of trajectory prediction. Especially in scenarios such as emergency collision avoidance and extreme cornering, this module can ensure that the model's perception of key information such as the dynamics of neighboring vehicles and obstacles behind is not interfered with by background noise, thereby generating trajectories and execution patterns that conform to physical executability.

[0054] S3 dynamically adjusts the attention weights of candidate trajectories based on the embedding features of surrounding visual information, historical trajectories, and action sequences, combined with low-dimensional embedding vectors and semantic cue words.

[0055] Specifically, in the multimodal fusion stage, this invention performs multimodal feature fusion based on forward visual embedding, surrounding environment embedding, temporal context embedding, chassis state embedding, and chassis state cues, using a Transformer structure with an introduced executability-constrained attention mechanism. This executability-constrained attention mechanism dynamically adjusts attention weights based on the ratio of the candidate trajectory's lateral acceleration requirement to the road adhesion limit, the yaw rate requirement to the vehicle's maximum yaw capacity, and the steering angle requirement to the steering actuator's maximum capacity, to suppress trajectories exceeding vehicle dynamics constraints. The core of this mechanism lies in real-time coupling of chassis state information with the physical feasibility of trajectory candidates, ensuring that the model actively avoids unexecutable paths when generating trajectories.

[0056] First, the forward vision embedding is extracted from road images captured by the forward-facing camera using a convolutional neural network, including spatial structure information such as lane line geometry, obstacle outlines, and traffic sign semantics. The surrounding environment embedding is jointly encoded from point cloud data from surround-view cameras and LiDAR, covering scene context such as the distribution of omnidirectional targets around the vehicle, passable area masks, and dynamic obstacle movement trends. The temporal context embedding accumulates historical trajectory prediction residuals, execution mode sequences, and environmental change flows to capture the evolution and dynamic trends of the driving scene. The chassis state embedding encodes real-time dynamic state variables such as the vehicle's current longitudinal speed, lateral speed, yaw rate, center of gravity sideslip angle, four-wheel speeds, and the actual feedback positions of each actuator. Chassis state cues serve as high-level semantic guidance, describing the current vehicle dynamics margin in natural language, such as good road adhesion conditions and sufficient yaw capability or limited steering execution on low-adhesion surfaces. These cues are encoded by a pre-trained language model and injected into the Transformer, providing prior constraints for attention calculation.

[0057] The executability-constrained attention mechanism embeds a vehicle dynamics feasibility assessment module into the standard self-attention calculation process. For each candidate trajectory, the system calculates three core physical indicators in real time. The first is the ratio of lateral acceleration requirement to road adhesion limit. This is obtained by evaluating the product of trajectory curvature and the square of vehicle speed, and then comparing it with the maximum adhesion capacity provided by the current road surface. When this ratio approaches one, it indicates that the trajectory is close to the adhesion limit. The second is the ratio of yaw rate requirement to the vehicle's maximum yaw capacity. The required yaw rate is calculated based on the trajectory's turning radius and vehicle speed, and then compared with the vehicle's maximum yaw response capacity at the current speed, determined by tire sideslip characteristics and axle load transfer. This ratio reflects the challenge the trajectory poses to vehicle attitude control. The third is the ratio of steering angle requirement to the maximum capacity of the steering actuator. This is achieved by converting trajectory curvature into front wheel steering angle requirement through an inverse dynamics model, considering the equivalent steering angle contribution of differential steering, and finally comparing it with the mechanical travel limits of the steering motor and steer-by-wire system to determine the feasibility of steering execution.
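As a rough illustration of the three indicators, the sketch below computes them for a single candidate trajectory point. It rests on several assumptions that are not in the source: the adhesion limit is modeled as the friction coefficient times gravity, the steering demand uses a kinematic bicycle model rather than the inverse dynamics model with differential-steering contribution described above, and all parameter names are hypothetical.

```python
import math


def executability_ratios(curvature, speed, mu, yaw_rate_max,
                         wheelbase, steer_max, g=9.81):
    """Return (lateral, yaw, steering) demand-to-capacity ratios.

    curvature    : trajectory curvature in 1/m
    speed        : vehicle speed in m/s
    mu           : estimated road friction coefficient (assumed available)
    yaw_rate_max : maximum achievable yaw rate at this speed, rad/s (assumed given)
    wheelbase    : wheelbase in m
    steer_max    : maximum front wheel steering angle in rad
    """
    kappa = abs(curvature)
    lateral_acc = kappa * speed ** 2          # curvature times speed squared
    yaw_rate = kappa * speed                  # speed divided by turning radius
    steer = math.atan(wheelbase * kappa)      # kinematic bicycle approximation
    return (
        lateral_acc / (mu * g),
        yaw_rate / yaw_rate_max,
        steer / steer_max,
    )
```

A ratio near or above one on any axis signals that the candidate is at or beyond the corresponding limit; for example, a curvature of 0.02 1/m at 20 m/s on a surface with a friction coefficient of 0.8 slightly exceeds the lateral adhesion limit.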

[0058] These three ratios constitute an executability constraint vector, which is mapped to attention adjustment coefficients by a gating unit. When all ratios are within a safety margin, the attention weights are calculated normally, allowing the trajectory feature to fully interact with other modalities. When any ratio exceeds a preset warning threshold, the gating unit gradually reduces the attention score of the trajectory and other features, suppressing its contribution to the fusion representation. If the ratio approaches or reaches the limit value, the attention weight corresponding to the trajectory is forcibly decayed to near zero, effectively filtering out unexecutable extreme trajectory candidates during the feature fusion stage. This dynamic adjustment mechanism effectively avoids the delay and inconsistency problems of prediction followed by verification in traditional methods, embedding dynamic constraints in the early stage of attention calculation, ensuring that only physically feasible trajectory features can dominate the subsequent decoding and decision-making process, thereby significantly improving the system's safety and response rationality under extreme conditions.
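One possible shape for the gating unit is sketched below. The source does not specify the mapping, so the piecewise-linear decay, the warning threshold of 0.8, the limit of 1.0, and the floor value are all assumptions chosen only to reproduce the three regimes described: normal, gradual reduction, and forced decay to near zero.

```python
def gate_coefficient(ratios, warn=0.8, limit=1.0, floor=1e-6):
    """Map an executability constraint vector to an attention scaling factor."""
    worst = max(ratios)
    if worst <= warn:
        return 1.0                      # within the safety margin: no change
    if worst >= limit:
        return floor                    # at or beyond the limit: near zero
    frac = (worst - warn) / (limit - warn)
    return max(floor, 1.0 - frac)       # gradual reduction in between


def gated_weights(weights, ratios_per_candidate, **kwargs):
    """Scale each candidate's attention weight by its gate and renormalize."""
    scaled = [
        w * gate_coefficient(r, **kwargs)
        for w, r in zip(weights, ratios_per_candidate)
    ]
    total = sum(scaled)
    return [s / total for s in scaled]
```

Because the floor is strictly positive, the renormalization never divides by zero, even when every candidate is beyond its limit.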

[0059] S4 synchronously generates future trajectories and corresponding chassis execution modes through the decoder to ensure that the generated trajectory matches the chassis's physical capabilities and is directly mapped to the execution strategy.

[0060] Specifically, the present invention employs a joint decoder to synchronously output future trajectory increments and corresponding chassis execution mode classifications. The chassis execution mode classifications include normal execution mode, steering and differential steering coordinated mode, and braking and steering drift mode. Excessive jitter of execution modes at adjacent time moments is suppressed by hysteresis consistency terms to ensure physical feasibility matching between trajectory and execution mode.

[0061] It is understood that this invention employs a multi-task joint decoder architecture, simultaneously outputting future trajectory increment predictions and chassis execution mode classifications through a single neural network, achieving end-to-end collaborative optimization of prediction and decision-making. This joint decoder is built upon a Transformer decoder structure, where the trajectory increment branch outputs a temporal trajectory increment sequence of vehicle coordinates and attitude using continuous vector regression. The execution mode classification branch outputs the probability distribution over the three types of chassis execution modes through an independent multi-layer perceptron (MLP) head.

[0062] The chassis execution mode classification specifically includes: First, the normal operation mode. Suitable for regular driving conditions, it employs a linear coordination control strategy combining front-wheel steering and the torque of the four independent wheel hub motors. In this mode, the response of each actuator follows a preset linear gain relationship, lateral acceleration is kept at a low level, and longitudinal slip ratio is kept within a small range, ensuring both ride comfort and energy economy.

[0063] Second, the steering and differential steering coordination mode. For high-speed cornering or emergency obstacle avoidance, it actively integrates the front wheel steering angle and the torque vector difference between the left and right wheels. By establishing a coordination mapping function, the front wheel steering and torque difference are converted into the overall steering effect, achieving a significant improvement in steering efficiency while maintaining the coordination between yaw rate and center of gravity sideslip angle.

[0064] Third, the braking and steering drift mode. Designed specifically for extreme conditions, it employs active pressure modulation of the electromechanical braking system and dynamic planning of the front wheel steering angle. This mode allows the rear wheel longitudinal slip ratio to enter a controllable drift state. Through model predictive control, it optimizes the braking pressure distribution matrix and front wheel steering sequence in real time, enabling the vehicle to fully utilize the tire's nonlinear adhesion while ensuring trajectory tracking accuracy, thereby minimizing braking distance or maximizing obstacle avoidance capability.

[0065] To ensure smooth mode switching and system stability, this invention introduces a hysteresis consistency regularization term. This regularization term includes a mode difference weighting coefficient and a hysteresis penalty coefficient. When the change in the mode probability vector between adjacent time steps exceeds a preset threshold, a quadratic penalty is triggered, effectively suppressing mode jitter caused by sensor noise or prediction uncertainty and ensuring temporal consistency.
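A minimal sketch of such a regularization term follows. The exact formula is not given in the source, so the L1 measure of change, the default coefficient values, and the interpretation of the extra penalty as a squared excess over the threshold are assumptions.

```python
def hysteresis_penalty(p_prev, p_curr, threshold=0.2,
                       diff_weight=1.0, hyst_coeff=5.0):
    """Penalty on the change of the mode probability vector between steps.

    p_prev, p_curr : mode probability vectors at adjacent time steps
    threshold      : change above which the extra penalty is triggered
    diff_weight    : mode difference weighting coefficient
    hyst_coeff     : hysteresis penalty coefficient
    """
    change = sum(abs(a - b) for a, b in zip(p_prev, p_curr))
    penalty = diff_weight * change
    if change > threshold:
        # Additional penalty grows with the squared excess over the threshold.
        penalty += hyst_coeff * (change - threshold) ** 2
    return penalty
```

Small fluctuations are penalized only linearly, while an abrupt switch from one mode to another incurs a much larger cost, which discourages jitter without forbidding a genuine mode change.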

[0066] In addition, the system has established a dual verification mechanism for physical feasibility.

[0067] Forward verification, based on vehicle dynamics model constraints, calculates a compatibility index between the current execution mode and the predicted trajectory. This index is obtained by comparing the maximum output torque in the current mode with the torque required for trajectory tracking. When the compatibility index falls below a set threshold, mode degradation or trajectory replanning is triggered.
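The forward check can be sketched as follows, assuming the compatibility index is simply the ratio of available to required torque and that a threshold of 1.0 separates compatible from incompatible pairs. The function names, the ratio form, and the threshold are illustrative assumptions; the source only states that the two torques are compared.

```python
def compatibility_index(max_mode_torque, required_torque):
    """Ratio of the torque available in the current mode to the torque needed."""
    if required_torque <= 0.0:
        return float("inf")  # nothing is demanded, so any mode is compatible
    return max_mode_torque / required_torque


def forward_verify(max_mode_torque, required_torque, threshold=1.0):
    """Return the index and the action it triggers."""
    index = compatibility_index(max_mode_torque, required_torque)
    action = "keep" if index >= threshold else "degrade_mode_or_replan"
    return index, action
```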

[0068] Backward verification employs posterior deviation analysis to establish a trajectory-mode matching loss function. This loss function measures the deviation between the actual trajectory generated under a specific mode and the target trajectory, incorporating confidence weights. This loss is directly embedded into the multi-task learning framework of the joint decoder, achieving end-to-end physical consistency optimization through gradient backpropagation.

[0069] Through this tightly coupled joint prediction and hierarchical verification mechanism, the present invention achieves deep collaboration between trajectory planning and underlying execution mode. While retaining the advantages of each mode, it significantly reduces the frequency of mode switching, greatly reduces trajectory tracking error, and improves control robustness and safety under complex working conditions.

[0070] An intelligent chassis extreme motion decision control method according to an embodiment of the present invention realizes integrated decision-making of trajectory generation and chassis execution mode, effectively improving trajectory executability and system control safety under extreme working conditions.

[0071] Embodiment 2: This invention proposes an architecture for an intelligent chassis extreme motion decision-making and control method, as shown in Figure 2. It can be understood that, unlike traditional methods that only output geometric trajectories, this invention proposes a joint output mechanism of trajectory and execution mode. Combined with a surrounding view encoder, a historical information encoder, a chassis state compression encoder, chassis state prompt words, and an executability constraint attention mechanism, it achieves a deep integration of multimodal perception, dynamic constraints, and execution system collaboration, enabling the system to maintain safety and controllability even under extreme conditions.

[0072] In one embodiment of the present invention, a view encoder consists of a main camera and surround-view cameras that acquire information about the environment in front of and around the vehicle. The main view encoder uses a large language model with its own encoding and embedding structure to extract features of the road, obstacles, and lanes ahead. The surround-view encoder uses Q-former to compress features from the side and rear camera inputs, reducing redundancy and suppressing interference from lateral noise on decision-making. The two types of visual features are encoded separately before entering multimodal fusion, enabling the system to have omnidirectional perception capabilities.

[0073] In one embodiment of the present invention, the historical information encoder—comprising historical trajectories, action sequences, and past visual features—is modeled using a QT-Former structure to form a time-series context embedding. This module is equivalent to "human-like driving experience memory," which can improve the predictability and continuity of the system in scenarios such as continuous maneuvering for emergency collision avoidance and multi-stage adjustments during extreme cornering.

[0074] In one embodiment of the present invention, a chassis state compression encoder converts data such as speed, acceleration, yaw rate, sideslip angle, tire force margin, and actuator health status collected by chassis sensors into low-dimensional embedded vectors. This embedding removes redundant information while retaining dynamic characteristics related to vehicle limits, enabling it to play a constraining role in subsequent reasoning.

[0075] In one embodiment of the present invention, the chassis status prompt module further transforms the chassis embedding into semantic prompts, such as "Current vehicle speed is 90 km/h, front wheel steering actuator partially failed, low adhesion margin." The prompts, along with visual features and historical information, are input into a large language model, enabling it to automatically consider vehicle executability constraints during reasoning, thus improving the rationality and feasibility of decision-making.

[0076] In one embodiment of the present invention, an executability-constrained attention mechanism is introduced during the multimodal fusion stage, so that candidate trajectories are influenced by vehicle dynamics constraints during generation. This mechanism dynamically adjusts the attention weights of candidate trajectories based on the ratio of lateral acceleration to road adhesion limits, ensuring that trajectories that do not meet dynamic constraints are weakened during inference, thereby outputting truly executable trajectories.

[0077]

[0078] In one embodiment of the present invention, the trajectory and execution mode joint output module outputs, while the decoder generates the future trajectory, an execution mode paired with the trajectory, including: a normal execution mode (conventional steering + drive + braking coordination); a steering and differential steering coordination mode (used when steering is limited or additional yaw moment is required); and a braking and steering drift mode (used for low-traction surfaces or extreme cornering scenarios). This joint output ensures that the trajectory is not only geometrically reasonable but also directly mapped to the vehicle chassis's execution mode, improving controllability and safety under extreme conditions.

[0079] Furthermore, the intelligent chassis extreme motion decision control method of the present invention will be described in stages below.

[0080] Data Acquisition Phase: In the implementation of this invention, the system first continuously acquires environmental perception information and the vehicle's own dynamic state through onboard sensors and a bus, serving as input for subsequent multimodal reasoning and decision-making control. Visual information mainly comes from two types of cameras: on the one hand, the forward-facing main camera captures key elements such as road geometry, lane boundaries, pedestrians, and obstacles directly in front of the vehicle; on the other hand, surrounding view cameras provide panoramic information from the sides and rear, compensating for blind spots and the dynamics of adjacent vehicles that cannot be perceived by the traditional forward-facing view. At each time step $t$, the main camera provides one main view image.

[0081] At the same time step, the surround view cameras provide a set of side and rear images.

[0082] Both types of image data are three-channel color images. These two types of image data constitute the main input for environmental information.

[0083] In addition to the current frame, the system also incorporates historical information as a supplement. Specifically, the trajectory points and action sequences at each of the past several time steps are stored together, the trajectory points forming the historical trajectory.

[0084] The actions at the same time steps form the action history.

[0085] Each action in the history consists of a steering angle command and a longitudinal acceleration (which can be determined by both driving and braking forces). This historical data ensures that the model can incorporate "temporal context" during inference, thereby avoiding inconsistent decision-making in continuous maneuvering scenarios such as emergency collision avoidance and extreme cornering.

[0086] Meanwhile, the chassis state vector is provided in real time by the onboard speed sensor, inertial measurement unit (IMU), and actuator diagnostic module.

[0087] The current chassis state comprises the longitudinal and lateral velocities, the yaw rate, the sideslip angle at the center of mass, the longitudinal and lateral accelerations, and the health status of the steering, braking, and drive actuators (each of which can be categorized as normal, partially failed, or completely failed). These state variables collectively constitute the core inputs describing the vehicle's dynamic boundaries, determining whether a particular trajectory is physically executable.
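The chassis state described above can be represented as a small data structure. The sketch below is only one possible layout: the class and field names, the integer coding of the health categories, and the ordering of the flattened vector are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Health(IntEnum):
    """Actuator health categories named in the text (integer coding is assumed)."""
    NORMAL = 0
    PARTIAL_FAILURE = 1
    COMPLETE_FAILURE = 2


@dataclass(frozen=True)
class ChassisState:
    v_x: float        # longitudinal velocity, m/s
    v_y: float        # lateral velocity, m/s
    yaw_rate: float   # yaw rate, rad/s
    sideslip: float   # sideslip angle at the center of mass, rad
    a_x: float        # longitudinal acceleration, m/s^2
    a_y: float        # lateral acceleration, m/s^2
    steer_health: Health = Health.NORMAL
    brake_health: Health = Health.NORMAL
    drive_health: Health = Health.NORMAL

    def as_vector(self):
        """Flatten to a 9-element list suitable as encoder input."""
        return [
            self.v_x, self.v_y, self.yaw_rate, self.sideslip,
            self.a_x, self.a_y,
            float(self.steer_health),
            float(self.brake_health),
            float(self.drive_health),
        ]
```

A frozen dataclass keeps each sampled state immutable, which is convenient when states are stored in a history buffer.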

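[0088] Therefore, in the data acquisition and input stage, the original input set finally obtained by the system consists of the main view image, the surround view images, the historical trajectory, the action history, and the current chassis state.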

[0089] Visual information, historical information, and chassis status together provide the foundation for subsequent feature encoding and multimodal reasoning.

[0090] This data acquisition step plays a crucial role in preparing the input for the system, ensuring that the multimodal model can simultaneously perceive the environment, understand historical behavior, and assess the chassis's performance capabilities during the decision-making process. By introducing chassis status cues and an executability constraint attention mechanism, the system possesses the ability to perceive vehicle dynamic limits during the trajectory generation stage, significantly improving the safety and feasibility of decision-making under extreme conditions.

[0091] Feature Encoding Stage: After completing multi-source data acquisition, this invention enters the feature encoding stage to achieve a joint representation of the environment, history, and chassis status. The forward-facing main camera acquires global information on lane geometry, road boundaries, static obstacles, and dynamic targets, while the side and rear cameras provide the dynamic status of adjacent lanes and rear traffic participants. These image inputs first undergo independent feature encoding processing. The main view encoder directly calls the visual encoding and embedding structure built into the large language model to convert the input image into a dense semantic vector. This vector characterizes not only road geometry but also obstacle locations and lane distribution. Simultaneously, the surround view image set is fed into the surrounding view encoder, which uses a Q-Former structure to compress and filter lateral and backward features, outputting an embedding of surrounding environment features. By performing redundancy suppression and noise constraints before attention convergence, this module can effectively reduce the interference of lateral background on trajectory reasoning.

[0092] To incorporate temporal context, this invention employs a historical information encoder to model the historical trajectory and the action history. Specifically, trajectory points and action sequences are embedded and input into QT-Former, where a temporal context representation is obtained through a time-series self-attention mechanism.

[0093] This vector is equivalent to a "memory" of the vehicle's past state and behavior, ensuring that the model can generate smoother and more consistent trajectory predictions in long-term dependent scenarios such as continuous collision avoidance and extreme cornering.

[0094] Meanwhile, the chassis state vector is mapped to a low-dimensional embedding through a dedicated compression encoder.

[0095] This embedding method can remove redundancy while retaining constraint information closely related to dynamic limits, such as whether the sideslip angle is close to the stability boundary, or whether there is partial failure of the steering, braking, or drive actuators. To further enhance the reasoning ability of the language model, this invention also transforms the chassis embedding vector into semantic cue words.

[0096] This transformation is a mapping function from state to language, and its output can be a structured description such as "vehicle speed 90 km/h, front wheel steering failure, low road surface adhesion margin". Cue words and numerical embeddings are input into the large model in parallel, enabling the model to understand the vehicle's physical executable range like a human driver and actively consider these constraints during the inference phase.
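A template-based version of the state-to-language mapping could look like the sketch below. It is a simplification: it maps raw state values rather than the chassis embedding, and the function name, the health labels, and the adhesion-margin thresholds (0.2 and 0.5) are hypothetical.

```python
def chassis_prompt(speed_mps, steer_health, adhesion_margin):
    """Render a structured cue string from a few chassis state values.

    speed_mps       : vehicle speed in m/s
    steer_health    : "normal", "partial_failure", or "complete_failure"
    adhesion_margin : remaining adhesion margin in [0, 1] (assumed normalized)
    """
    health_text = {
        "normal": "normal",
        "partial_failure": "partially failed",
        "complete_failure": "failed",
    }[steer_health]
    if adhesion_margin < 0.2:
        margin_text = "low"
    elif adhesion_margin < 0.5:
        margin_text = "moderate"
    else:
        margin_text = "sufficient"
    return (
        f"vehicle speed {speed_mps * 3.6:.0f} km/h, "
        f"front wheel steering {health_text}, "
        f"{margin_text} road surface adhesion margin"
    )
```

For instance, `chassis_prompt(25.0, "partial_failure", 0.1)` yields "vehicle speed 90 km/h, front wheel steering partially failed, low road surface adhesion margin", which matches the style of the example in the text.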

[0097] After the above processing, the system obtained five key embeddings in the feature encoding stage: front visual features, surrounding environment features, historical time sequence features, chassis numerical embedding, and chassis prompt words.

[0098] It can be understood that this feature encoding step is applicable to extreme driving scenarios such as emergency collision avoidance, extreme cornering, and drifting on low-adhesion surfaces. For example, when a sudden obstacle appears ahead, the main view encoder can quickly identify the obstacle's location and lane boundaries, while the surrounding view encoder can detect the dynamics of adjacent vehicles, providing the system with comprehensive environmental perception input. Even in the event of actuator failure, the features extracted in this step can still retain crucial environmental information, providing a reliable basis for subsequent joint decision-making on trajectory and execution mode.

[0099] The technical advantage of this step lies in the fact that, through structured encoding, the system can transform multi-view visual information into a unified-dimensional embedding vector, providing semantically consistent input for subsequent multimodal fusion and trajectory generation. Simultaneously, by compressing lateral information using Q-Former, the system effectively reduces the impact of redundancy and noise, improving perception accuracy and decision-making stability in complex environments.

[0100] Multimodal fusion and inference stage: The above features are fed into the same Transformer structure for joint modeling. All features are first linearly projected to a unified dimension and then encoded with spatiotemporal location, enabling interaction within the same space. During this process, a trajectory query vector is introduced to represent the parameters of the future trajectory to be generated. Through a cross-attention mechanism, the trajectory query vector interacts with features from different modalities and is updated at each layer.

[0101] The update uses an attention mechanism that introduces executability constraints. In ordinary attention, the weight distribution is determined by the scaled dot product of queries and keys followed by softmax normalization; the present invention adds a dynamic penalty term to the attention logits before normalization.

[0102] The penalty term is built from an executability penalty matrix, defined for each candidate key-value pair, and a learnable scaling term with the same dimensions as the attention logits; the two are combined element-wise. The penalty matrix is jointly determined by the current chassis state and the provisional candidate trajectory given by the decoder at this layer (or above). To avoid introducing explicit modeling related to tire lateral forces, the penalty term is expressed through geometric/kinematic observable indices evaluated at the discrete waypoints of the candidate trajectory: the maximum curvature $\kappa_{\max}$ of the candidate trajectory, the resulting lateral acceleration $a_y$, the required maximum yaw rate $r$, and the required maximum steering angle $\delta$.

[0103] Under a kinematic approximation, these indices can be written as:

$$a_y = \kappa_{\max}\, v_x^{2}, \qquad r = \kappa_{\max}\, v_x, \qquad \delta = \arctan\!\left(L\, \kappa_{\max}\right)$$

[0104] where $v_x$ is the current longitudinal speed and $L$ is the wheelbase. These indices are paired with the available boundaries obtained from chassis state estimation and actuator diagnostics to form the penalty term.

[0105] Each component of the penalty term carries its own weighting coefficient. This construction dynamically weakens all candidates that exceed the current adhesion/attitude/actuator capability boundaries before attention normalization, thereby endogenously injecting "physical executability" into information selection and fusion; simultaneously, the semantic cues derived from the chassis state are incorporated into the same attention map as part of the Keys/Values, enabling the model to explicitly "understand" failures and low-adhesion constraints during cross-modal alignment.
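Since the original formulas did not survive in this text, the following is only one plausible way to write the penalized attention so that it agrees with the verbal description. All symbols are assumptions: $Q$, $K$, $V$ are the usual queries, keys, and values with key dimension $d$; $\Lambda$ is the learnable scaling term; $P$ is the executability penalty matrix; $\alpha$, $\beta$, $\gamma$ are the weighting coefficients; and $a_y^{\max}$, $r^{\max}$, $\delta^{\max}$ are the available boundaries from chassis state estimation and actuator diagnostics.

```latex
\mathrm{Attn}_{\mathrm{exec}}(Q, K, V)
  = \operatorname{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d}} - \Lambda \odot P \right) V

P_{j} = \alpha \,\max\!\left(0,\ \frac{a_{y,j}}{a_y^{\max}} - 1\right)
      + \beta  \,\max\!\left(0,\ \frac{r_{j}}{r^{\max}} - 1\right)
      + \gamma \,\max\!\left(0,\ \frac{\delta_{j}}{\delta^{\max}} - 1\right)
```

In this form a candidate $j$ that stays within all three boundaries receives no penalty, while any excess lowers its logit before the softmax, so its normalized attention weight shrinks relative to feasible candidates.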

[0106] Through such inter-layer iteration, the fused latent variables aggregate environmental geometry, neighborhood interactions, temporal context, and state constraints layer by layer, providing the joint decoder with the basis for generating paired trajectories and execution modes. In practical applications, a simplified approximation can be used to improve real-time performance.

[0107] Decoding and Output Stage: Conditioned on the latent variables generated by multimodal fusion, the system uses an autoregressive approach to progressively generate incremental tokens for the future trajectory. At each time step, the decoder takes the latent variables and the previously generated increments as context and outputs the distribution of the next increment. The trajectory is reconstructed by accumulating the increments starting from the initial pose.
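The reconstruction step can be sketched as a running sum over the decoded increments. The pose layout (x, y, yaw) and the assumption that increments are expressed in the same fixed frame as the initial pose are illustrative choices, not details confirmed by the source.

```python
def reconstruct_trajectory(start_pose, increments):
    """Accumulate (dx, dy, dyaw) increments starting from an initial pose.

    start_pose : (x, y, yaw) of the vehicle at the current time
    increments : iterable of (dx, dy, dyaw) produced step by step by the decoder
    Returns the list of poses, including the start pose.
    """
    x, y, yaw = start_pose
    poses = [(x, y, yaw)]
    for dx, dy, dyaw in increments:
        x += dx
        y += dy
        yaw += dyaw
        poses.append((x, y, yaw))
    return poses
```

Predicting increments rather than absolute positions keeps each output token small in magnitude, and the accumulated poses always start exactly at the current vehicle pose.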

[0108] In parallel with the trajectory increment, the decoder simultaneously outputs the classification distribution over chassis execution modes, whose three classes correspond to normal execution, steering + differential steering coordination, and braking + steering drift, respectively. To avoid excessive jitter between adjacent time steps, a temperature and a hysteresis consistency term are introduced, which apply a mild penalty to deviations from the previous time step's mode. The current mode is then obtained from the adjusted distribution, so that trajectory and mode are decided jointly.
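The mode selection with temperature and hysteresis might be sketched as follows. The source does not give the formula, so subtracting a fixed margin from the logits of every mode other than the previous one, and then taking the most probable mode, is an assumption; the mode names and default values are likewise hypothetical.

```python
import math

MODES = ("normal", "steer_differential", "brake_steer_drift")


def select_mode(logits, prev_mode, temperature=1.0, hysteresis=0.5):
    """Pick the execution mode, mildly favoring the previous step's mode.

    logits      : raw classification scores, one per entry in MODES
    prev_mode   : index of the mode chosen at the previous time step
    temperature : softmax temperature (higher gives a flatter distribution)
    hysteresis  : margin subtracted from every mode that differs from prev_mode
    """
    adjusted = [
        score / temperature - (0.0 if k == prev_mode else hysteresis)
        for k, score in enumerate(logits)
    ]
    m = max(adjusted)
    exps = [math.exp(a - m) for a in adjusted]
    total = sum(exps)
    probs = [e / total for e in exps]
    mode = max(range(len(probs)), key=probs.__getitem__)
    return mode, probs
```

With logits `[1.0, 1.2, 0.0]` and the previous mode at index 0, the small advantage of mode 1 is not enough to overcome the margin, so the mode stays at 0; with logits `[1.0, 3.0, 0.0]` the switch to mode 1 goes through.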

[0109] Based on this, the system generates the raw control quantities that can be issued from the chassis.

[0110] Figure 3 is a schematic diagram illustrating the effect of the executable trajectory output in overcoming ambient noise.

[0111] In summary, the present invention has the following technical effects: 1. Integrated Trajectory and Execution: Trajectory generation and execution mode are bound together, avoiding the problem of trajectory non-executability in traditional methods.

[0112] 2. Dynamic constraint embedding: Chassis executability constraints are introduced during the trajectory generation stage, rather than being filtered afterward.

[0113] 3. Enhanced semantics of chassis status: Chassis status cues improve the model's inference accuracy under extreme conditions.

[0114] 4. Strong adaptability to extreme conditions: In scenarios such as steering failure, low-friction road surfaces, and extreme cornering, the model can output the execution mode that should be switched, which improves the robustness and safety of the system.

[0115] The intelligent chassis extreme motion decision control method of this invention aims to solve the problems of trajectory non-executability, decision-making disconnection from chassis, and insufficient system adaptability in existing autonomous driving technologies under extreme conditions such as emergency collision avoidance, extreme cornering, driving on low-adhesion surfaces, and partial failure of the execution system. It can achieve joint decision-making of trajectory and execution mode under extreme conditions. Through an attention mechanism with embedded executability constraints, it significantly improves the physical feasibility of trajectory and the real-time adaptability of chassis control.

[0116] To implement the methods of the above embodiments, the present invention also provides an intelligent chassis extreme motion decision control system 10. As shown in Figure 4, it includes: the information acquisition module 100, which is used to acquire the comprehensive perception and status information of the vehicle, where the comprehensive perception and status information of the vehicle includes at least visual information of the environment in front of and around the vehicle, historical trajectory and action sequence, and current chassis status information; the multimodal feature extraction module 200, which is used to generate a low-dimensional embedding vector from the current chassis status information through a compression encoder and convert it into semantic prompt words; the multimodal feature fusion module 300, which is used to dynamically adjust the attention weights of candidate trajectories by combining the low-dimensional embedding vector and semantic prompt words with the embedded features of the visual information of the surrounding environment, historical trajectories, and action sequences; and the autoregressive decoding module 400, which is used to synchronously generate future trajectories and corresponding chassis execution modes through the decoder, so as to ensure that the generated trajectory matches the chassis physical capabilities and is directly mapped to the execution strategy.

[0117] The intelligent chassis extreme motion decision control system of this invention aims to solve the problems of unexecutable trajectory, disconnect between decision and chassis, and insufficient system adaptability in existing autonomous driving technology under extreme conditions such as emergency collision avoidance, extreme cornering, driving on low-adhesion surfaces, and partial failure of the execution system. It can realize joint decision-making of trajectory and execution mode under extreme conditions. Through an attention mechanism with embedded executability constraints, it significantly improves the physical feasibility of trajectory and the real-time adaptability of chassis control.

[0118] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

[0119] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples" mean that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present invention. In this specification, illustrative uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, provided there is no contradiction, those skilled in the art can combine the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.

[0120] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this invention, "a plurality of" means at least two, such as two, three, etc., unless otherwise explicitly specified.

Claims

1. An intelligent chassis extreme motion decision control method, characterized in that it comprises: acquiring comprehensive perception and status information of the vehicle, the comprehensive perception and status information including at least visual information of the environment in front of and around the vehicle, the historical trajectory and action sequence, and the current chassis status information; generating a low-dimensional embedding vector from the current chassis status information through a compression encoder and converting it into semantic prompt words; dynamically adjusting the attention weights of candidate trajectories based on the embedding features of the surrounding-environment visual information, the historical trajectory and the action sequence, in combination with the low-dimensional embedding vector and the semantic prompt words; and synchronously generating, through the decoder, the future trajectory and the corresponding chassis execution mode, so as to ensure that the generated trajectory matches the physical capabilities of the chassis and is directly mapped to the execution strategy.

2. The method as described in claim 1, characterized in that acquiring the comprehensive perception and status information of the vehicle includes: acquiring visual information of the area in front of the vehicle, visual information of the surrounding environment, historical trajectory information, historical action information, and current chassis status information, wherein the current chassis status information includes vehicle dynamic parameters and actuator health status.

3. The method as described in claim 2, characterized in that generating the low-dimensional embedding vector from the chassis status information through the compression encoder includes: extracting features from the visual information of the area in front of the vehicle and from the visual information of the surrounding environment using a front view encoder and a surrounding view encoder, respectively, to generate a front visual embedding and a surrounding environment embedding; inputting the historical trajectory information and the historical action information into a historical information encoder to generate a temporal context embedding; and inputting the current chassis status information into the chassis state compression encoder to generate a chassis state embedding.

4. The method as described in claim 3, characterized in that dynamically adjusting the attention weights of candidate trajectories based on the embedding features of the surrounding-environment visual information, the historical trajectory and the action sequence, in combination with the low-dimensional embedding vector and the semantic prompt words, includes: introducing an executability constraint attention mechanism; and performing multimodal feature fusion on the front visual embedding, the surrounding environment embedding, the temporal context embedding, the chassis state embedding, and the chassis state prompt words using a Transformer structure that incorporates the executability constraint attention mechanism. The executability constraint attention mechanism dynamically adjusts the attention weights based on three ratios: the lateral acceleration demand of the candidate trajectory to the road adhesion limit, the yaw rate demand to the vehicle's maximum yaw capacity, and the steering angle demand to the steering actuator's maximum capacity, in order to suppress trajectories that exceed the vehicle dynamics constraints.

5. The method as described in claim 4, characterized in that synchronously generating, through the decoder, the future trajectory and the corresponding chassis execution mode, so as to ensure that the generated trajectory matches the physical capabilities of the chassis and is directly mapped to the execution strategy, includes: using a joint decoder to synchronously output the future trajectory increment and the corresponding chassis execution mode classification, wherein the chassis execution mode classification includes a normal execution mode, a steering and differential steering coordinated mode, and a braking and steering drift mode; and suppressing excessive jitter of the execution mode between adjacent time steps by means of hysteresis consistency terms, thereby ensuring that the trajectory matches the physical feasibility of the execution mode.

6. The method as described in claim 3, characterized in that extracting features from the visual information of the area in front of the vehicle and of the surrounding environment using the front view encoder and the surrounding view encoder, respectively, to generate the front visual embedding and the surrounding environment embedding, includes: the front view encoder using the visual encoding and embedding structure built into the large language model to extract lane line geometric features, road boundary features, static obstacle features, and dynamic target features; and the surrounding view encoder using a Q-Former structure to compress features from the side and rear camera inputs, and outputting the surrounding environment feature embedding after redundancy suppression and noise constraint processing.

7. The method as described in claim 1, characterized in that acquiring the visual information of the area in front of and around the vehicle includes: using the forward-facing main camera to capture key elements directly in front of the vehicle, such as road geometry, lane line boundaries, pedestrians, and obstacles; and using surround view cameras to provide panoramic information from the sides and rear, compensating for blind spots and for the dynamics of neighboring vehicles that cannot be perceived from the forward view.

8. The method as described in claim 7, characterized in that: at the current time, the main view image captured by the forward-facing main camera and the input from the surround view cameras are recorded; the trajectory points and the action sequence over a number of past moments are stored together as the historical trajectory and the action history, where each action comprises a steering angle command and a longitudinal acceleration; the current chassis state comprises the longitudinal and lateral velocities, the yaw rate, the center-of-mass sideslip angle, the longitudinal and lateral accelerations, and the health status of the steering, braking, and drive actuators; and, in the data acquisition and input phase, the system ultimately obtains a raw input set made up of the quantities defined above.

9. The method as described in claim 8, characterized in that: the historical trajectory and the action history are modeled using a historical information encoder; the trajectory points and action sequences are embedded and input into the QT-Former, and a temporal context representation is obtained through a time series self-attention mechanism; the chassis state vector is mapped to a low-dimensional embedding through a dedicated compression encoder; the chassis embedding vector is transformed into semantic prompt words through a mapping function from state to language; and five key embeddings are obtained during the feature encoding stage: front visual features, surrounding environment features, historical time sequence features, chassis numerical embeddings, and chassis prompt words.

10. An intelligent chassis extreme motion decision control system, characterized in that it comprises: an information acquisition module, used to acquire the comprehensive perception and status information of the vehicle, which includes at least visual information of the environment in front of and around the vehicle, the historical trajectory and action sequence, and the current chassis status information; a multimodal feature extraction module, used to generate a low-dimensional embedding vector from the current chassis status information through a compression encoder and to convert it into semantic prompt words; a multimodal feature fusion module, used to dynamically adjust the attention weights of candidate trajectories based on the embedding features of the surrounding-environment visual information, the historical trajectory and the action sequence, in combination with the low-dimensional embedding vector and the semantic prompt words; and an autoregressive decoding module, used to synchronously generate the future trajectory and the corresponding chassis execution mode through the decoder, so as to ensure that the generated trajectory matches the physical capabilities of the chassis and is directly mapped to the execution strategy.
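
By way of illustration only, the jitter suppression of claim 5 can be approximated at inference time with a hysteresis rule. The claim refers to hysteresis consistency terms without giving their form, so the sketch below swaps in a simple margin-based switch: the held execution mode changes only when another mode's score exceeds it by more than `margin`. The mode labels, the function name `select_modes`, and the default margin are assumptions.

```python
from typing import List, Optional, Sequence

# Labels for the three execution modes listed in claim 5 (names assumed).
MODES = ("normal", "steer_differential", "brake_steer_drift")


def select_modes(scores: Sequence[Sequence[float]],
                 margin: float = 0.2) -> List[int]:
    """Pick one mode index per time step with hysteresis.

    The first step takes the highest-scoring mode. Afterwards the held mode
    is replaced only if the best rival beats it by more than `margin`.
    """
    chosen: List[int] = []
    current: Optional[int] = None
    for step in scores:
        best = max(range(len(step)), key=lambda i: step[i])
        if current is None or step[best] - step[current] > margin:
            current = best
        chosen.append(current)
    return chosen
```

With per-step scores of `[0.6, 0.3, 0.1]`, `[0.45, 0.5, 0.05]`, `[0.5, 0.45, 0.05]`, and `[0.2, 0.7, 0.1]`, a zero margin flips between the first two modes at every step, whereas the default margin holds the normal mode until the last step, where the second mode wins clearly.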