Remote driving cooperative control system and method based on TCN-BiLSTM-attention network prediction

By using the TCN-BiLSTM-Attention network prediction method, combined with dynamic polygon masking and asymmetric penalty loss function, the problem of vehicle loss of control and visual blindness caused by sudden degradation of communication network in remote driving systems in high-risk closed scenarios is solved, and safe and stable vehicle control is achieved.

CN122219232A · Pending · Publication Date: 2026-06-16 · DONGFENG COMML VEHICLE CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
DONGFENG COMML VEHICLE CO LTD
Filing Date
2026-03-18
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

In high-risk, enclosed scenarios, existing technologies for remote driving systems suffer from cross-domain cascading failures, such as physical loss of vehicle control and visual blindness caused by sudden degradation of the underlying communication network. These failures lack a unified cross-domain coordination mechanism, making it difficult to effectively capture network degradation characteristics, resulting in control lag and loss of visual information.

Method used

A prediction method based on a TCN-BiLSTM-Attention network is adopted. By constructing a dynamic polygon mask and an asymmetric penalty loss function, synchronous and collaborative intervention between the video information layer and the chassis control layer is achieved. The future network state is predicted and a closed-loop degradation cooperative mechanism is triggered, in which the speed limit is adjusted dynamically according to vehicle load and road surface adhesion coefficient, keeping video bitrate control and vehicle speed control synchronized.

Benefits of technology

It effectively avoids physical disconnection caused by sudden network fluctuations, ensures clear visibility of the road ahead, and constructs an absolute safety braking redundancy that does not rely on fixed parameters, achieving a deep integration of clear visibility and stable driving.

✦ Generated by Eureka AI based on patent content.


Abstract

The application provides a remote driving cooperative control system and method based on TCN-BiLSTM-Attention network prediction, and relates to the field of intelligent connected vehicles. The method first extracts underlying communication data to construct an original input matrix, obtains the vehicle's physical state (current load, spatial coordinates, heading angle and road adhesion coefficient), and retrieves the three-dimensional coordinate set of the centerline of the road ahead. The original input matrix is then preprocessed through multi-scale temporal convolution, bidirectional feature concatenation and attention-based time-step weighting to obtain a global prediction matrix of the communication metrics in a future time window. Next, the prediction matrix, the vehicle state and the environmental information are combined to generate a dynamic polygon mask and perform differentiated encoding. Compared with existing methods, the asymmetric penalty loss function reduces the rate of missed sudden network drops and avoids sudden disconnection; the dynamic polygon mask replaces static partitioning, maintaining the core field of view for remote driving at low bandwidth and ensuring takeover continuity; and the isolation between the information layer and the physical layer is broken, with vehicle load and road adhesion coefficient combined to construct a safety braking redundancy.

Description

Technical Field

[0001] This invention relates to the field of intelligent connected vehicles, and in particular to a remote driving cooperative control system and method based on TCN-BiLSTM-Attention network prediction.

Background Technology

[0002] With the development of intelligent connected vehicle technology, remote driving systems have been widely used in high-risk, enclosed environments such as mining areas and ports. However, in actual engineering operations, the underlying communication network in these high-risk, enclosed environments is prone to sudden degradation, leading to cross-domain cascading failures of the remote driving system, resulting in physical loss of vehicle control and blindness in core vision. For this scenario, existing technologies typically employ passive response control strategies and isolated video and vehicle control mechanisms, which have the following technical drawbacks: First, existing general network timing prediction models struggle to effectively capture sudden degradation characteristics of underlying communication networks, resulting in missed degradation reports. Due to the lack of precise, forward-looking triggers, systems generally employ passive response control strategies. This approach leads to significant hardware and software pipeline delays in actual engineering deployments, increasing the round-trip time (RTT) of control commands. When the onboard control unit receives a delayed command, the vehicle's actual spatial pose often deviates from the initial coordinates at the time the command was issued. This spatial misalignment and temporal lag creates potential risks for subsequent system failures.

[0003] Secondly, the existing video stream control mechanism at the information layer further exacerbates the risk of failure, building upon the control lag caused by passive response. When the underlying communication bandwidth is suddenly limited due to network degradation, existing video encoders typically use static two-dimensional Region of Interest (ROI) partitioning logic when allocating macroblock bitrates. This static ROI partitioning logic is completely independent of the vehicle's actual three-dimensional physical trajectory. When the vehicle turns in complex terrain, a large amount of limited network throughput is consumed on invalid background pixels, causing right-of-way pixel blocks to become blurred due to excessively high quantization parameters. The buffer pool of the remote video decoder is prone to underflow, causing stuttering or black screens at the remote end, resulting in "visual blindness" for the driver under low bandwidth conditions and cutting off the information path for manual intervention.

[0004] Third, while visual blindness occurs at the information layer, the chassis control at the physical layer cannot provide effective safety redundancy due to the lack of a cross-domain coupling mechanism. Existing chassis drive-by-wire degradation logic typically relies solely on a single network latency metric for speed limiting, failing to incorporate underlying physical limits such as real-time vehicle dynamic load and road adhesion coefficient into the safety degradation calculation logic. Since video stream bitrate control and vehicle driving control operate independently as isolated systems, when the vehicle encounters network degradation and visual blindness under heavy load or low-adhesion road conditions, the degraded speed calculated solely based on network latency cannot meet the safe braking distance requirements under actual physical kinematics, easily leading to braking over-limit and inertial collision risks.

[0005] In summary, the existing technologies suffer from a complex interplay of problems, including missed reports by the network prediction model, video coding that is disconnected from physical space, and chassis control that ignores key physical parameters. The lack of a unified cross-domain collaboration mechanism makes it difficult to address the cascading failure problem of remote driving systems during sudden network degradation.

Summary of the Invention

[0006] To address the shortcomings of the existing technologies, the technical problem to be solved by this invention is to provide a remote driving cooperative control system and method based on TCN-BiLSTM-Attention network prediction. This system can forcibly construct and trigger a mechanism for synchronous closed-loop degradation cooperative intervention between the video information layer and the chassis control layer based on the forward-looking judgment results of the underlying communication network optimized by asymmetric penalties. This solves the cross-domain cascading failure problem of "physical loss of vehicle control" and "visual blindness" caused by sudden degradation of the underlying communication network in high-risk closed scenarios.

[0007] To solve the above-mentioned technical problems, the technical solution adopted by the present invention is as follows: the present invention provides a remote driving cooperative control system and method based on TCN-BiLSTM-Attention network prediction, the method comprising the following steps:
S1. Extract underlying communication data to construct the original input matrix; obtain the vehicle's physical state, comprising the current vehicle load, spatial coordinates, heading angle and road surface adhesion coefficient; and retrieve spatial information of the environment ahead to extract the three-dimensional world coordinate set of the road centerline.
S2. Preprocess the original input matrix to obtain a global prediction sequence matrix containing network communication metrics for future time windows. Preprocessing includes multi-scale temporal convolution processing, bidirectional contextual feature concatenation, and time-step feature weighting based on an attention mechanism.
S3. Generate a dynamic polygon mask based on the global prediction sequence matrix, the vehicle's physical state and the environmental spatial information, and use the dynamic polygon mask to perform differentiated directional coding.
S4. Determine the dynamic safety attenuation coefficient from the vehicle's physical state, determine the maximum safe speed threshold from the dynamic safety attenuation coefficient, and apply active torque intervention and speed limiting to the underlying drive-by-wire chassis based on the maximum safe speed threshold.

[0008] In the preferred embodiment, step S2 specifically includes:
S21. For any time step of the original input matrix, smooth the data features using the exponentially weighted moving average algorithm to obtain smoothed feature vectors. Apply the Z-Score normalization algorithm to the smoothed feature vectors to eliminate dimensional differences and obtain standardized feature vectors.
S22. Feed the standardized feature of any time step into a multi-layer temporal convolutional network consisting of stacked one-dimensional causal convolutions and dilated convolutions to perform multi-scale temporal convolution processing. The multi-scale temporal convolution formula [formula not reproduced in this text] is defined in terms of the weight vector of the convolution kernel, the receptive field size, and the dilation factor. The dilation factor increases exponentially along the network depth and can take the values 1, 2, 4, or 8. The high-level feature sequence is determined from the top-level output tensor of the multi-layer temporal convolutional network.
S23. Feed the high-level feature sequence into the bidirectional recurrent network unit to perform bidirectional context feature concatenation. The forward long short-term memory unit calculates the forward hidden state in ascending time order, and the backward long short-term memory unit calculates the backward hidden state in reverse time order. The forward hidden state and the backward hidden state are concatenated along the feature dimension to obtain the joint feature matrix.
S24. Multiply the joint feature matrix by the query weight matrix, the key weight matrix and the value weight matrix respectively to obtain the query matrix, the key matrix and the value matrix. Compute the dot-product similarity of the query matrix and the key matrix to obtain the unnormalized attention score, and process the unnormalized attention score with the Softmax function to obtain the dynamic attention weights. Weight and sum the value vectors in the value matrix according to the dynamic attention weights to obtain the focused context feature tensor.
S25. Feed the focused context feature tensor into the output fully connected layer for a linear transformation: multiply it by the weight matrix of the output fully connected layer and add the bias vector, performing a dimensionality-reduction mapping to obtain the global prediction sequence matrix. Extract the end-to-end delay prediction subsequence, the packet loss rate prediction subsequence and the downlink bandwidth prediction subsequence from the global prediction sequence matrix. Integrate the downlink bandwidth prediction subsequence to obtain the total downlink bandwidth capacity within the future time window.
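
The preprocessing in S21 and the dilated causal convolution in S22 can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the smoothing factor `alpha`, the kernel values, the ReLU between layers, and the zero padding before the start of the sequence are all assumptions made for illustration, and the convolution uses the standard textbook form in which each output is a weighted sum of the current input and earlier inputs spaced by the dilation factor.

```python
import math
from typing import List, Sequence


def ewma(series: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Exponentially weighted moving average (alpha is an assumed smoothing factor)."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def z_score(series: Sequence[float]) -> List[float]:
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    std = math.sqrt(var) or 1.0  # guard against a constant series
    return [(x - mean) / std for x in series]


def dilated_causal_conv(series: Sequence[float], kernel: Sequence[float],
                        dilation: int) -> List[float]:
    """y[t] = sum_k kernel[k] * x[t - dilation * k]; inputs before t = 0 count as zero."""
    out = []
    for t in range(len(series)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - dilation * k
            if idx >= 0:  # causal: never look at future samples
                acc += w * series[idx]
        out.append(acc)
    return out


def tcn_stack(series: Sequence[float], kernel: Sequence[float],
              dilations: Sequence[int] = (1, 2, 4, 8)) -> List[float]:
    """Stack dilated causal layers; the ReLU between layers is an assumption."""
    hidden = list(series)
    for d in dilations:
        hidden = [max(0.0, v) for v in dilated_causal_conv(hidden, kernel, d)]
    return hidden
```

With a two-tap kernel and dilations 1, 2, 4, 8, the top layer of this sketch sees 16 past time steps, which is why exponentially growing dilation gives multi-scale coverage with few layers.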

[0009] In the preferred embodiment, step S2 also includes a model training phase: S26. The TCN-BiLSTM-Attention hybrid network model is trained in a closed loop by backpropagation, driven by the asymmetric penalty loss function. The actual future network latency and the predicted latency are obtained. When the actual future network latency is greater than the predicted latency, an asymmetric penalty coefficient is introduced into the partial derivatives of the asymmetric penalty loss function. The penalty coefficient amplifies the gradient step size during backpropagation, and the weights of the TCN-BiLSTM-Attention hybrid network model are updated using the amplified gradient step.
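
The asymmetric penalty can be sketched as follows. The text does not reproduce the loss formula, so this example assumes a squared-error base loss and a hypothetical penalty coefficient `penalty`; only the asymmetry itself (a larger gradient when the actual latency exceeds the prediction) is taken from the description.

```python
from typing import List, Sequence, Tuple


def asymmetric_loss_and_grad(actual: Sequence[float], predicted: Sequence[float],
                             penalty: float = 4.0) -> Tuple[float, List[float]]:
    """Weighted squared error and its gradient with respect to each prediction.

    Under-predicting latency (actual > predicted) is weighted by `penalty`;
    over-predicting keeps weight 1. Both the base loss and the default
    penalty value are assumptions for illustration.
    """
    n = len(actual)
    loss = 0.0
    grad = []
    for y, y_hat in zip(actual, predicted):
        err = y - y_hat
        weight = penalty if err > 0 else 1.0
        loss += weight * err * err / n
        grad.append(-2.0 * weight * err / n)  # d(loss) / d(y_hat)
    return loss, grad
```

For the same absolute error, the gradient magnitude in the under-prediction case is `penalty` times larger, so training pushes the model toward pessimistic latency estimates rather than missed degradations.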

[0010] In the preferred scheme, step S3 parses the global prediction sequence matrix. When the downlink bandwidth prediction subsequence stays below a preset video connectivity bandwidth threshold for multiple consecutive time steps, or when the packet loss rate prediction subsequence exceeds a preset packet loss limit, the following steps are triggered:
S31. Construct the dynamic pose transformation matrix from the spatial coordinates and heading angle in the vehicle's physical state. Combined with the global dynamic extrinsic parameter matrix and the intrinsic parameter matrix of the vehicle camera, map the three-dimensional world coordinate set of the centerline of the road ahead onto the current video frame to generate the set of centerline pixels.
S32. Apply the least squares method to the set of centerline pixels to perform a polynomial smoothing fit and obtain the equation of the continuous road centerline. Construct a dynamic width expansion equation from the current vehicle speed and the basic viewing-distance width. Introduce the yaw rate to construct asymmetric expansion simultaneous equations, and calculate the widths of the left and right boundaries from them to obtain the edge contour point set. Close the edge contour point set to generate the dynamic polygon region-of-interest (ROI) mask, which defines a two-dimensional pixel coding boundary that matches the physical road direction.
S33. Determine the throughput physical boundary constraint from the total downlink bandwidth capacity within the future time window. Construct a dynamic optimization equation whose objective is to minimize the polygon ROI mask quantization parameter. Initialize the polygon ROI mask quantization parameter, and determine the background region quantization parameter from the polygon ROI mask quantization parameter and the preset quantization differential threshold, generating a trial parameter pair consisting of the two. Determine the predicted total coded data volume under the current parameter combination from the trial parameter pair. When the predicted total coded data volume is greater than the total downlink bandwidth capacity within the future time window, increase the polygon ROI mask quantization parameter step by step in a loop and update the background region quantization parameter at the same time, until a parameter combination is found that satisfies the throughput physical boundary constraint and also satisfies the forced differential lower-bound constraint, namely that the background region quantization parameter be greater than or equal to the sum of the polygon ROI mask quantization parameter and the preset quantization differential threshold. From the matched parameter combination, determine the optimal polygon ROI mask quantization parameter and the optimal background region quantization parameter. Use the optimal polygon ROI mask quantization parameter to perform fine-grained encoding on the region covered by the dynamic polygon ROI mask, and use the optimal background region quantization parameter to perform coarse encoding on the background region outside the mask, outputting a differentiated compressed video stream containing the core region of interest.
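
The iterative search in S33 can be sketched as a simple loop. The rate model below (coded size halving for every increase of 6 in the quantization parameter) is only a common rule of thumb adopted here as an assumption, and all names and default values (`qp_init`, `qp_delta`, `qp_step`, `qp_max`, `bits_at_qp0`) are hypothetical; a real encoder would supply its own rate estimate.

```python
from typing import Optional, Tuple


def predicted_bits(qp: int, area_fraction: float, bits_at_qp0: float) -> float:
    """Toy rate model (assumption): coded size halves for every +6 in QP."""
    return area_fraction * bits_at_qp0 * 2.0 ** (-qp / 6.0)


def search_qp_pair(capacity_bits: float, roi_fraction: float, bits_at_qp0: float,
                   qp_init: int = 20, qp_delta: int = 8, qp_step: int = 1,
                   qp_max: int = 51) -> Optional[Tuple[int, int]]:
    """Return the first (roi_qp, background_qp) pair that fits the capacity.

    The search starts at the lowest ROI QP and steps upward, so the first
    feasible pair also has the smallest ROI QP. The background QP is kept at
    roi_qp + qp_delta, which satisfies the lower-bound constraint
    background_qp >= roi_qp + qp_delta.
    """
    qp_roi = qp_init
    while qp_roi + qp_delta <= qp_max:
        qp_bg = qp_roi + qp_delta
        total = (predicted_bits(qp_roi, roi_fraction, bits_at_qp0)
                 + predicted_bits(qp_bg, 1.0 - roi_fraction, bits_at_qp0))
        if total <= capacity_bits:
            return qp_roi, qp_bg
        qp_roi += qp_step
    return None  # no feasible pair within the allowed QP range
```

Returning `None` when no pair fits leaves the fallback decision (for example, dropping the frame rate) to the caller, which the description does not specify.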

[0011] In the preferred embodiment, step S4 specifically includes:
S41. Compute the arithmetic mean of the end-to-end delay prediction subsequence to obtain the future average latency, which characterizes the macroscopic trend of future network congestion.
S42. Determine the load penalty term from the ratio of the current vehicle load to the maximum rated load, and determine the road surface attenuation term from the road surface adhesion coefficient. Multiply the load penalty term by the first weighting adjustment parameter and the road surface attenuation term by the second weighting adjustment parameter, then linearly superpose the two products with the basic safety factor to obtain the dynamic safety attenuation coefficient [formula not reproduced in this text].
S43. Construct an engineering application formula from the physical limit speed, the dynamic safety attenuation coefficient and the future average latency, and perform a multiplicative decay calculation based on it to obtain the maximum safe vehicle speed threshold [formula not reproduced in this text].
S44. Convert the maximum safe vehicle speed threshold into bus control messages and send them to the chassis drive-by-wire actuator unit to perform speed limiting intervention, so as to maintain a physical collision avoidance safety distance within the network degradation time window.
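
Because the formulas for S42 and S43 are not reproduced in this text, the following sketch uses hypothetical functional forms that only follow the verbal description: a linear superposition for the attenuation coefficient and an exponential as one possible multiplicative decay. The road term `1 - mu`, the exponential form, and every default weight are assumptions, not the patent's formulas.

```python
import math
from typing import Sequence


def dynamic_attenuation(load_kg: float, max_load_kg: float, adhesion: float,
                        base: float = 0.5, w_load: float = 1.0,
                        w_road: float = 1.0) -> float:
    """Linear superposition of load penalty, road attenuation and a base factor.

    Assumed forms: load penalty = load / rated load; road attenuation =
    1 - adhesion coefficient (lower grip gives a larger attenuation).
    """
    load_penalty = load_kg / max_load_kg
    road_term = 1.0 - adhesion
    return base + w_load * load_penalty + w_road * road_term


def max_safe_speed(limit_speed_mps: float, attenuation: float,
                   predicted_delays_s: Sequence[float]) -> float:
    """Assumed multiplicative decay: v = v_limit * exp(-attenuation * mean delay)."""
    avg_delay = sum(predicted_delays_s) / len(predicted_delays_s)
    return limit_speed_mps * math.exp(-attenuation * avg_delay)
```

Under these assumed forms, a heavier load, a lower adhesion coefficient, or a longer predicted delay each lowers the speed threshold, which is the qualitative behaviour the steps describe.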

[0012] The preferred solution includes an adaptive degradation step for the prediction model under hardware-constrained computing power conditions: obtain the graphics processor utilization rate and the core thermistor temperature. When the graphics processor utilization rate continuously exceeds the preset load alarm threshold, or the core thermistor temperature continuously exceeds the preset temperature alarm threshold, generate a model structure degradation instruction. According to the model structure degradation instruction, skip the part of step S23 that obtains the joint feature matrix from the high-level feature sequence, and skip the part of step S24 that obtains the focused context feature tensor from the joint feature matrix. Instead, multiply the high-level feature sequence by the degradation mapping weight matrix and add the degradation bias vector to obtain the degraded network delay prediction matrix, which replaces the global prediction sequence matrix under hardware computing power constraints and triggers the degraded collaborative intervention mechanism.
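
A minimal sketch of this degradation switch is shown below. It interprets "continuously exceeds" as a run of consecutive over-threshold samples; the thresholds, the `patience` count, and the names `DegradationMonitor` and `degraded_head` are hypothetical.

```python
from typing import List, Sequence


class DegradationMonitor:
    """Flags degradation after `patience` consecutive over-threshold samples."""

    def __init__(self, util_threshold: float = 0.90,
                 temp_threshold_c: float = 85.0, patience: int = 3) -> None:
        self.util_threshold = util_threshold
        self.temp_threshold_c = temp_threshold_c
        self.patience = patience
        self._util_streak = 0
        self._temp_streak = 0

    def update(self, gpu_util: float, core_temp_c: float) -> bool:
        """Record one sample; return True when the degraded path should be used."""
        self._util_streak = self._util_streak + 1 if gpu_util > self.util_threshold else 0
        self._temp_streak = self._temp_streak + 1 if core_temp_c > self.temp_threshold_c else 0
        return self._util_streak >= self.patience or self._temp_streak >= self.patience


def degraded_head(features: Sequence[float], weights: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> List[float]:
    """Linear map applied directly to the convolutional features,
    bypassing the bidirectional recurrent and attention stages."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]
```

Requiring a streak rather than a single sample avoids toggling the model structure on momentary load spikes; the streak resets as soon as one sample falls back under the threshold.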

[0013] The preferred solution includes visual fault-tolerant perception and hard braking steps under sensor communication timeout conditions:
Listen for heartbeat messages on the chassis controller's local area network bus. When a communication timeout at the chassis anti-lock braking system node causes the road surface adhesion coefficient to be missing, extract the global brightness histogram feature matrix and the high-frequency edge texture feature matrix of the current video frame image.
Feed the global brightness histogram feature matrix and the high-frequency edge texture feature matrix into a support vector machine classifier to perform classification matching, obtaining a conservative road surface adhesion coefficient.
S53. Determine the load penalty term from the ratio of the current vehicle load to the maximum rated load, and determine the conservative road surface attenuation term from the conservative road surface adhesion coefficient. Multiply the load penalty term by the first weighting adjustment parameter and the conservative road surface attenuation term by the second weighting adjustment parameter, then linearly superpose the two products with the basic safety factor and the extreme-value safety penalty bias term to obtain the degradation dynamic attenuation coefficient [formula not reproduced in this text].
S54. Perform a multiplicative decay calculation based on the physical limit speed, the degradation dynamic attenuation coefficient and the future average physical delay derived from the degraded network delay prediction matrix to obtain the decayed physical limit speed. Take the maximum of the decayed physical limit speed and the minimum creep speed boundary to obtain the degraded maximum safe speed threshold [formula not reproduced in this text]. Based on the degraded maximum safe speed threshold, generate the highest-level interrupt request command and send it to the chassis drive system to execute a physical braking hard lock, so as to maintain the vehicle's physical collision avoidance safety baseline in the event of sensor communication timeout.
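
The timeout check and the final maximum comparison can be sketched as below. The timeout value and all function names are hypothetical, and the conservative adhesion coefficient, which the text obtains from a support vector machine classifier, is simply passed in as an input here.

```python
from typing import Optional


def heartbeat_timed_out(last_heartbeat_s: float, now_s: float,
                        timeout_s: float = 0.1) -> bool:
    """True when no heartbeat has arrived within the (assumed) timeout."""
    return (now_s - last_heartbeat_s) > timeout_s


def select_adhesion(measured: Optional[float], last_heartbeat_s: float, now_s: float,
                    conservative: float, timeout_s: float = 0.1) -> float:
    """Use the measured coefficient while the sensor node is alive; otherwise
    fall back to the conservative estimate (e.g. from an image classifier)."""
    if measured is None or heartbeat_timed_out(last_heartbeat_s, now_s, timeout_s):
        return conservative
    return measured


def degraded_speed_threshold(decayed_speed_mps: float, creep_speed_mps: float) -> float:
    """Maximum comparison from S54: never command less than the creep speed boundary."""
    return max(decayed_speed_mps, creep_speed_mps)
```

The maximum comparison keeps the commanded threshold from collapsing to zero when the decay term is very strong, leaving the minimum creep speed as the floor.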

[0014] In a preferred embodiment, the present invention also provides a remote driving cooperative control system based on TCN-BiLSTM-Attention network prediction, comprising:
The multi-dimensional perception acquisition module, used to extract underlying communication data to construct the original input matrix, obtain the vehicle's physical state (current vehicle load, spatial coordinates, heading angle and road surface adhesion coefficient), and retrieve spatial information of the environment ahead to extract the three-dimensional world coordinate set of the road centerline, so as to perform step S1 as claimed in claim 1;
The network state look-ahead prediction module, used to preprocess the original input matrix to obtain a global prediction sequence matrix containing network communication metrics for future time windows, where the preprocessing includes multi-scale temporal convolution processing, bidirectional context feature concatenation, and time-step feature weighting based on an attention mechanism, so as to perform step S2 as claimed in claim 1;
The cross-domain degradation collaborative intervention triggering module, used to parse the global prediction sequence matrix and, based on it, synchronously trigger the video information control module and the chassis physical control module, so as to execute the synchronously triggered degradation collaborative intervention mechanism as described in claim 1;
The video information control module, used to generate a dynamic polygon mask from the global prediction sequence matrix, the vehicle's physical state and the environmental spatial information, and to perform differentiated directional coding using the dynamic polygon mask, so as to perform step S3 as described in claim 1.

In a preferred embodiment, the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor executes the computer program to implement the steps of the remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction described above.

[0015] In a preferred embodiment, the present invention provides a computer non-transitory readable storage medium storing computer instructions thereon, characterized in that, when the computer instructions are executed by a processor, they implement the steps of the remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction described above.

[0016] In a preferred embodiment, the present invention further provides a computer program product, including computer instructions, characterized in that, when the computer instructions are executed by one or more processors, they implement the steps of the remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction described above.

[0017] This invention provides a remote driving cooperative control system and method based on TCN-BiLSTM-Attention network prediction. Through the coordination of the above structures, it has the following advantages compared to existing methods: First, by introducing an asymmetric penalty loss function to reconstruct the time-series prediction model, missed predictions (false negatives) of sudden network drops are heavily penalized. This provides the cross-domain collaborative degradation mechanism with a forward-looking trigger source that has an extremely low false negative rate, fundamentally avoiding sudden disconnection.

[0018] Secondly, it abandons the traditional static center area division. When network degradation is predicted, it precisely projects the road centerline from the 3D high-precision map onto the pixel coordinate system of the current 2D video frame, generating a dynamic polygon mask that fits the actual road direction. This maintains the core right-of-way view for remote driving with extremely low bandwidth consumption, ensuring the continuity of manual takeover.
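
The projection of the road centerline into the image can be sketched with a pinhole camera model. This is a simplified illustration under stated assumptions: the camera sits at the vehicle origin at height `cam_height`, looks straight along the heading with no pitch or roll, and the left and right half-widths are fixed inputs rather than the speed- and yaw-rate-dependent widths of the method. All names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def world_to_pixel(point: Point3, vehicle_xy: Tuple[float, float], heading_rad: float,
                   cam_height: float, fx: float, fy: float,
                   cx: float, cy: float) -> Optional[Pixel]:
    """Project a world point into the image of a forward-looking pinhole camera."""
    dx, dy = point[0] - vehicle_xy[0], point[1] - vehicle_xy[1]
    c, s = math.cos(heading_rad), math.sin(heading_rad)
    forward = c * dx + s * dy   # distance ahead of the vehicle
    left = -s * dx + c * dy     # lateral offset, positive to the left
    up = point[2] - cam_height  # height relative to the camera
    if forward <= 0.0:
        return None             # behind the camera: not visible
    u = cx - fx * left / forward  # image x grows to the right
    v = cy - fy * up / forward    # image y grows downward
    return (u, v)


def roi_polygon(centerline: Sequence[Point3], vehicle_xy: Tuple[float, float],
                heading_rad: float, left_width: float, right_width: float,
                cam_height: float, fx: float, fy: float,
                cx: float, cy: float) -> List[Pixel]:
    """Closed polygon: left edge near-to-far, then right edge far-to-near."""
    nx, ny = -math.sin(heading_rad), math.cos(heading_rad)  # unit vector to the left
    cam = (vehicle_xy, heading_rad, cam_height, fx, fy, cx, cy)
    left_edge, right_edge = [], []
    for x, y, z in centerline:
        lp = world_to_pixel((x + left_width * nx, y + left_width * ny, z), *cam)
        rp = world_to_pixel((x - right_width * nx, y - right_width * ny, z), *cam)
        if lp is not None and rp is not None:
            left_edge.append(lp)
            right_edge.append(rp)
    return left_edge + right_edge[::-1]
```

Offsetting the centerline in world space before projecting keeps the mask narrower in the distance automatically, because perspective division shrinks far points toward the image centre.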

[0019] Finally, the control isolation between the information layer and the physical layer is broken, and the real-time vehicle load ratio and road adhesion coefficient are introduced into the calculation logic of the safety attenuation coefficient. This makes up for the shortcomings of speed limiting based solely on network delay and constructs an absolute safety braking redundancy that does not rely on fixed parameters.

Attached Figure Description

[0020] The present invention will be further described below with reference to the accompanying drawings and embodiments:
Figure 1 is the overall structural diagram of the process in Embodiment 1;
Figure 2 is the cross-domain instruction interaction diagram based on the cloud-edge collaborative redundancy mechanism in Embodiment 2;
Figure 3 is a schematic diagram of the composite neural network structure based on TCN-BiLSTM-Attention in Embodiment 1;
Figure 4 is a schematic diagram of the gradient sensitivity distribution of the asymmetric penalty loss function in Embodiment 1;
Figure 5 is the video frame pixel feature map generated by the dynamic ROI mask in the network degradation scenario of Embodiment 1;
Figure 6 is the response curve of the dynamic safe vehicle speed limit under different network latency fluctuations in Embodiment 1;
Figure 7 is a diagram of the computer device in Embodiment 3.

Detailed Implementation

[0021] To better understand the purpose, system architecture, and functional implementation of this embodiment, the embodiments and features described herein can be combined with each other without conflict. The exemplary embodiments disclosed herein will be described below with reference to the accompanying drawings, including specific technical details disclosed to aid understanding; however, these details should be considered exemplary rather than restrictive. Therefore, those skilled in the art should understand that various improvements and adjustments can be made to the embodiments described herein without departing from the scope and core ideas of the invention. Similarly, for clarity, detailed descriptions of well-known technologies, functions, and structures are omitted in the following description.

[0022] This invention addresses the cascading failure of remote driving systems in high-risk, enclosed scenarios, in which sudden degradation of the underlying communication network causes physical loss of vehicle control and blindness of core visual information. It resolves the core technical contradiction between remote driving's absolute dependence on a high-performance network and the inherently unpredictable volatility of the network environment. Existing technologies generally employ passive response control strategies, typically a global congestion control algorithm. Because such a strategy first senses network degradation and only then adjusts, it lags by several seconds, causing severe video stream stuttering, black screens, and instantaneous loss of critical visual information. At the same time, high latency prevents timely delivery of control commands, giving the driver a sense of lost control and of separation from the vehicle. Furthermore, existing single prediction models, typically autoregressive integrated moving average (ARIMA) models, recurrent neural networks, or long short-term memory networks, struggle to capture long-term dependencies and suffer from vanishing gradients. Simple temporal convolutional networks lack sensitivity to abrupt changes in a sequence, and video bitrate control and vehicle control often operate as isolated systems.

[0023] This solution provides a remote driving cooperative control method and system based on TCN-BiLSTM-Attention network prediction. By introducing an asymmetric penalty loss function to reconstruct a deep hybrid temporal prediction model, the network state in future time windows is proactively assessed. Based on the same prediction results, a mechanism is forcibly constructed and triggered to synchronously implement closed-loop degradation cooperative intervention between the video information layer and the chassis control layer. Specifically, when the network is predicted to be good, both the video bitrate and vehicle speed are increased; when the network is predicted to deteriorate, both the video bitrate and vehicle speed are significantly reduced; and under moderate predicted network conditions, intelligent encoding of regions of interest based on 3D spatial mapping is initiated.
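
The three-way policy can be sketched as a small classifier over the predicted sequences. The decision rules and thresholds (`bw_low`, `bw_high`, `loss_limit`) are hypothetical; the text only states the three regimes and their associated actions, not how the boundaries are drawn.

```python
from enum import Enum
from typing import Sequence


class Regime(Enum):
    GOOD = "raise video bitrate and vehicle speed"
    MODERATE = "enable region-of-interest encoding"
    DEGRADED = "reduce video bitrate and vehicle speed"


def classify_regime(bandwidth_mbps: Sequence[float], loss_rate: Sequence[float],
                    bw_low: float, bw_high: float, loss_limit: float) -> Regime:
    """Map predicted bandwidth and packet loss to one of three regimes.

    Assumed rules: GOOD when every predicted bandwidth sample is at least
    bw_high and loss never exceeds loss_limit; DEGRADED when the mean
    predicted bandwidth is below bw_low; MODERATE otherwise.
    """
    mean_bw = sum(bandwidth_mbps) / len(bandwidth_mbps)
    if min(bandwidth_mbps) >= bw_high and max(loss_rate) <= loss_limit:
        return Regime.GOOD
    if mean_bw < bw_low:
        return Regime.DEGRADED
    return Regime.MODERATE
```

Because both the video layer and the chassis layer read the same regime value, the two controllers change state together, which is the synchronization property the paragraph describes.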

[0024] Thus, this solution completely breaks down the control silos between the information layer and the physical layer, not only fundamentally avoiding physical disconnection caused by sudden network fluctuations, but also prioritizing the clarity of the road ahead under extremely low bandwidth, maintaining the core right-of-way vision, and constructing an absolute safety physical braking redundancy that does not rely on fixed parameters by combining real-time vehicle load and road surface adhesion coefficient, ultimately achieving a deep binding and collaborative intelligence between clear vision and stable driving.

[0025] Example 1: As shown in Figure 1, in the preferred embodiment, the method provided by the present invention relies on the interaction and cooperation between the vehicle terminal system, the cloud-based collaborative control platform, and the remote cockpit. In a specific implementation, the vehicle terminal system is equipped with lidar, cameras, millimeter-wave radar, and high-precision global positioning system equipment for environmental perception and self-positioning.

[0026] In addition, edge computing nodes are deployed on the vehicle side, with a lightweight version of the prediction model built in. This model is responsible for local real-time prediction and serves as a redundant backup for cloud commands. The vehicle is also modified with drive-by-wire to receive remote control commands to execute steering, throttle, and braking.

[0027] The cloud-based collaborative control platform is deployed in a dedicated data center, housing a complete cluster of high-precision TCN-BiLSTM-Attention prediction models. Leveraging blade server arrays and Tensor Processing Unit (TPU) clusters for overwhelming computing power, this cluster deploys a fully structured hybrid temporal prediction model for high-precision asymmetric loss function training and long-cycle forward inference computation. Simultaneously, utilizing the 5G private network or vehicle-to-everything (V2X) network deployed in the mining area, low-latency, high-bandwidth communication between vehicles and the cloud is achieved. Furthermore, a 1:1 virtual digital twin platform, mirroring the real mining area, is constructed for simulation testing, algorithm training, and remote monitoring.

[0028] In one feasible approach, a remote cockpit serves as the remote control terminal. The remote cockpit is equipped with a multi-screen splicing display array and a physical control panel with a steering wheel and pedals that provide damping force feedback. This provides the safety operator with a 360° real-time three-dimensional surround view. After receiving trajectory prediction data from edge nodes, the predicted trajectory line is rigidly superimposed on the physical light-emitting diode pixel layer of the display array to assist the safety operator in physical takeover.

[0029] Specifically, the display screen overlays the predicted driving trajectory line for the next 5 seconds based on a spatial mapping matrix. The collaborative control method includes the following core steps: S1 Data Acquisition and Multidimensional Perception Stage: S11 First, extract the underlying communication data. Vehicle-side communication nodes trigger sampling synchronously at fixed time intervals to collect the underlying network metric matrix in real time. A sliding time window with time step [missing information] extracts the sequence of core network indicator feature vectors from the past [symbol missing] to [symbol missing] time steps and generates the original input matrix containing the historical communication states. Each feature vector in the matrix represents one independent time step and contains the sampled end-to-end latency, packet loss rate, downlink bandwidth, and latency jitter.

[0030] S12 Second, acquire the vehicle's physical status. The current vehicle load, the high-precision spatial coordinates, and the heading angle are collected via the vehicle's controller area network (CAN) bus, and the road adhesion coefficient is estimated through the chassis anti-lock braking system.

[0031] The aforementioned vehicle physical state, as a deterministic physical scalar, is directly substituted into the downstream steps to participate in the classic dynamic closed-loop calculation, thereby ensuring that the chassis control has an absolute physical safety redundancy baseline. Simultaneously, the system synchronously collects driving state time-series data from skilled human safety drivers in motion for early supervised training, extracting vehicle state time-series data sampled at a frequency of 100 Hz over the past 3 seconds.

[0032] Specifically, the vehicle status timing data includes: vehicle speed, longitudinal acceleration, lateral acceleration, yaw rate, steering wheel angle, throttle opening, and braking pressure.

[0033] S13 Next, retrieve environmental spatial information. The set of three-dimensional world coordinates of the centerline of the road ahead is matched and extracted from the high-precision map library of the cloud-based digital twin system.

[0034] In practice, the cloud-based digital twin platform constructs a virtual environment that is 1:1 scale with the real mining area for simulation testing, algorithm training, and remote monitoring.

[0035] In addition, after fusing the above multi-source perception results, the system further extracts higher-order environmental feature information such as the curvature and slope of the road ahead.

[0036] S14 Next, the multi-source output label data is integrated. As a further extension of this solution, to meet the high reliability required of the dual-track prediction of trajectory and network state, the system performs strict temporal alignment on the collected physical state and environmental spatial information and constructs an output label sequence for supervised training. This label sequence includes the real network latency sequence for future time windows and the vehicle trajectory sequences.

[0037] Specifically, in addition to the label data of the prediction network, the system simultaneously generates a sequence of vehicle future trajectory points and future state sequences for a total of forty time points within the next two seconds.

[0038] The vehicle's future trajectory point sequence includes future latitude and longitude coordinates, and the future state sequence includes future vehicle speed and future heading angle. These are used to minimize the mean square error between the predicted trajectory and the actual trajectory during the supervised training phase, thereby enabling the system to fully control the vehicle's driving physical limits.

[0039] S2 Core Network State Forward Prediction Phase: As shown in Figure 3, in this embodiment a deep hybrid time-series prediction model is constructed to address the time-varying nonlinear characteristics exhibited by vehicular mobile networks.

[0040] In practice, the model includes a data preprocessing module, a feature extraction network module, and an asymmetric optimization training module. The specific steps are as follows: S21 First, the underlying physical communication feature matrix is constructed and preprocessed. Specifically, the vehicle-side communication node performs handshake communication with the base station at a fixed time interval (preferably on the order of tens of milliseconds) to record the underlying physical status messages in the actual communication channel.

[0041] The underlying status message is unpacked by the hardware and mapped into objective physical metrics for four dimensions: end-to-end latency, packet loss rate, downlink bandwidth, and latency jitter.

[0042] Accordingly, a sliding time window with time step [time step] is constructed in the memory buffer of the on-board computing unit. It extracts the sequence of core network indicator feature vectors from the past [symbol missing] to [symbol missing] time steps and generates the original input matrix containing the historical communication states.

[0043] Specifically, each column feature vector in the above matrix represents one independent time step and contains the sampled end-to-end latency, packet loss rate, downlink bandwidth, and latency jitter.

[0044] Here, the index symbol [symbol missing] denotes any discrete time step within the sliding time window, and its value range satisfies [range missing].
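As an illustration of the sliding-window buffer described in S21, the following minimal Python sketch keeps the most recent samples of the four network metrics and returns them as an input matrix. The class name `SlidingWindow`, the feature key names, and the row-per-time-step layout are assumptions made for readability (the description stores one time step per column); the window length is left as a parameter because its value is not given here.

```python
from collections import deque

# Key names are hypothetical; the four metrics themselves come from the description.
FEATURES = ("latency_ms", "loss_rate", "downlink_mbps", "jitter_ms")


class SlidingWindow:
    """Keeps the most recent `window` samples of the four network metrics."""

    def __init__(self, window):
        self.window = window
        self._buf = deque(maxlen=window)  # the oldest sample drops out automatically

    def push(self, sample):
        # `sample` is a dict keyed by FEATURES; store it as one feature vector.
        self._buf.append([float(sample[name]) for name in FEATURES])

    def ready(self):
        return len(self._buf) == self.window

    def matrix(self):
        # Rows are time steps (oldest first), columns are the four metrics.
        return [row[:] for row in self._buf]


win = SlidingWindow(window=3)
for t in range(5):
    win.push({"latency_ms": 20 + t, "loss_rate": 0.01,
              "downlink_mbps": 50 - t, "jitter_ms": 2})
X = win.matrix()  # holds the samples for t = 2, 3, 4
```

Transposing `X` gives the column-per-time-step layout used in the text.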

[0045] In practical implementation, because vehicle-mounted mobile communication is highly susceptible to transient impulse noise caused by physical obstructions, the system applies an exponentially weighted moving average algorithm to smooth the extracted original feature vector sequence. For any time step, the smoothing formula is as follows: [formula missing]; where the smoothing attenuation factor preferably takes a value within [value range missing].
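The smoothing formula itself is not reproduced above, so the sketch below uses the common textbook recursion for an exponentially weighted moving average, s_t = beta * s_(t-1) + (1 - beta) * x_t. The name `beta`, which term carries the factor, and the choice to seed the recursion with the first sample are all assumptions and may differ from the patented formula.

```python
def ewma_smooth(rows, beta):
    """Exponentially weighted moving average applied per feature column.

    rows: list of feature vectors, oldest first.
    beta: smoothing factor in (0, 1); larger values keep more history.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    smoothed = [list(rows[0])]  # the first step has no history to blend with
    for row in rows[1:]:
        prev = smoothed[-1]
        smoothed.append([beta * p + (1.0 - beta) * x for p, x in zip(prev, row)])
    return smoothed


latency = [[20.0], [20.0], [120.0], [20.0]]  # one impulse spike at step 2
out = ewma_smooth(latency, beta=0.5)
```

With `beta=0.5` the 120 ms spike is damped to 70 ms, which is the kind of impulse suppression the paragraph describes.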

[0046] This technical feature automatically filters out random high-frequency jitter noise generated by the underlying radio frequency hardware interface from the data sequence. The resulting effect is to effectively prevent downstream AI prediction models from falling into local noise traps, thus improving the prediction system's anti-interference capability.

[0047] The Z-Score normalization algorithm is then applied to the smoothed matrix to eliminate the dimensional differences between different network metrics. The normalization formula is as follows: [formula missing]; where the two statistics are the mean and standard deviation of each physical feature variable within the sliding time window. This normalization puts quantities with very different units, such as millisecond-level latency and percentage-level packet loss rate, on a common scale, which helps the gradient descent algorithm converge quickly.
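A minimal sketch of per-feature Z-Score normalization over one window follows. It uses the standard definition z = (x - mean) / std; the use of the population standard deviation and the small `eps` guard against constant columns are assumptions not stated in the description.

```python
import math


def zscore_columns(rows, eps=1e-8):
    """Standardize each feature column to zero mean and unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Population standard deviation over the window; eps guards constant columns.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [[(v - m) / (s + eps) for v, m, s in zip(row, means, stds)] for row in rows]


window = [[10.0, 0.00], [20.0, 0.02], [30.0, 0.04]]  # latency in ms, loss rate
Z = zscore_columns(window)
```

After normalization the latency and loss-rate columns take nearly identical values, even though their raw scales differ by three orders of magnitude.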

[0048] In a specific scenario of this step, the vehicle side packages the current and, preferably, the past three seconds of vehicle status data and uploads it to the cloud platform in real time via the 5G network. After receiving the data, the cloud prediction model performs the aforementioned normalization and other preprocessing actions.

[0049] During the model training phase, the data source is the data collected in advance from skilled human safety drivers, and the input features also include: (1) Vehicle status timing data (past 3 seconds, 100Hz): vehicle speed, longitudinal acceleration, lateral acceleration, yaw rate, steering wheel angle, throttle opening, and braking pressure; (2) Positioning information: high-precision GPS coordinates (longitude, latitude), heading angle; (3) Environmental information: curvature and slope of the road ahead after fusion of perception results.

[0050] The output labels are the future trajectory point sequence (latitude and longitude coordinates) and future state sequence (vehicle speed, heading angle) of the vehicle in the next 2 seconds (40 time points).

[0051] The training process uses the collected data to perform supervised training on the TCN-BiLSTM-Attention network to minimize the mean squared error (MSE) between the predicted trajectory and the true trajectory.

[0052] The online prediction workflow is as follows: The vehicle-side packages the current and past 3 seconds of vehicle status data and uploads it to the cloud platform via the 5G network. After receiving the data, the cloud prediction model performs preprocessing such as normalization, and then sequentially passes through the TCN layer to extract multi-scale temporal features, the BiLSTM layer to learn contextual dependencies, and the Attention layer to focus on key historical moments. Finally, it outputs the predicted trajectory and status for the next 2 seconds and sends it to the remote cockpit and control decision module.

[0053] S22 Secondly, multi-scale temporal features are extracted using temporal convolutional network layers.

[0054] In one feasible approach, the standardized features at each time step are fed into a multi-layer temporal convolutional network composed of stacked one-dimensional causal convolutions and dilated convolutions.

[0055] Specifically, for the standardized features at any time step in the input sequence, the dilated causal convolution is expressed as follows: [formula missing]; where the symbols denote the weight vector of the convolution kernel, the receptive field size, and the dilation factor. The dilation factor in the multi-layer temporal convolutional network increases exponentially with network depth, preferably following the sequence 1, 2, 4, 8. While strictly preventing leakage of data from future moments, this setting exponentially expands the historical receptive field along the time dimension at minimal computing cost. The computing unit can thus capture and quantify the long-period network fading patterns that arise when vehicles enter long tunnels or areas of continuous physical obstruction, while preserving the causal ordering of vehicle-side prediction and low-latency forward inference.

[0056] Specifically, the dilation factor in the multi-layer temporal convolutional network increases exponentially with network depth, preferably as 1, 2, 4, and 8, to give the model a very large receptive field.
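To make the dilated causal convolution concrete, the sketch below implements the standard single-channel form y[t] = sum_i w[i] * x[t - d*i] with zero padding on the past side, plus a helper that computes the receptive field of a stack of such layers. The function names, the single-channel simplification, and the assumption of one convolution per dilation level are illustrative choices, not details taken from the description.

```python
def dilated_causal_conv(x, weights, dilation):
    """y[t] = sum_i weights[i] * x[t - dilation * i], with zeros before t = 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t - dilation * i  # only present or past indices are read
            if j >= 0:
                acc += w * x[j]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of present-and-past steps seen by a stack of dilated layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = dilated_causal_conv(x, weights=[1.0, 1.0], dilation=2)
rf = receptive_field(kernel_size=3, dilations=[1, 2, 4, 8])
```

Under these assumptions, four layers with a kernel size of 3 and dilations 1, 2, 4, 8 cover 31 time steps, which shows how the receptive field grows much faster than the layer count.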

[0057] Meanwhile, each temporal convolutional network block contains a residual connection to prevent deep network degradation. The residual connection keeps iterative training of the deep network effective and avoids the degradation problem that long-sequence learning easily induces. The multi-layer temporal convolutional network thus outputs the high-level feature sequence.

[0058] This step strictly guarantees causality: the output at any given time step depends only on that time step and the steps before it. Leakage of data from future moments is therefore ruled out, and long-term network decay patterns are captured accurately.

[0059] S23 Then, a global context-dependent feature space is constructed using bidirectional long short-term memory network layers. The high-level feature sequence is fed into a bidirectional recurrent network unit and processed simultaneously by a forward and a backward long short-term memory network. The forward unit computes its hidden states in ascending time order, and the backward unit computes its hidden states by sliding through the sequence in reverse time order.

[0060] For any time step, the core state update logic within a single unit depends on the nonlinear interaction of the forget gate, the input gate, and the output gate. The state equations are as follows: [six formulas missing]. Here, the weight symbols and bias symbols denote the weight parameter matrix and the bias term of the corresponding gate, respectively, and the product operator is the Hadamard (element-wise) product.
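For reference, the standard LSTM cell is usually written with the six equations below, which match the count of formulas in this paragraph. The notation is an assumption: x_t is the TCN feature at step t, h_t the hidden state, c_t the cell state, \sigma the logistic sigmoid, and \odot the Hadamard product. The patented equations may use different symbols or a variant form.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

The backward unit applies the same equations to the time-reversed sequence, replacing h_{t-1} and c_{t-1} with the states from step t+1.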

[0061] Next, the forward and backward hidden states are concatenated along the feature dimension to generate a joint feature matrix that contains global contextual information. This step enables the system to identify both natural geographical attenuation trends and abrupt link failures in the network signal.

[0062] S24 Next, a scaled dot-product attention layer is introduced to actively focus on key physical events. Three independent learnable projection weight matrices are applied to the joint feature matrix to perform a linear projection.

[0063] In practice, the query weight matrix, key weight matrix, and value weight matrix generate the query matrix, key matrix, and value matrix, respectively. The network can therefore update these weight parameters autonomously during backpropagation training to adapt to physical congestion characteristics.

[0064] The attention weights for the features at each time step are then calculated. The dot product of the query vector with each key vector gives a set of unnormalized attention scores: [formula missing]; where the scaling factor is derived from the key vector dimension. The scores are then normalized with the Softmax function to obtain the dynamic attention weights: [formula missing]; The weights are used to form a weighted sum of the value vectors, generating the final focused context feature tensor: [formula missing]; This setting removes the rigid limitation of a traditional recurrent unit, which distributes weight equally across all historical communication sampling points, and lets the system search for and amplify the influence of historical abnormal nodes. If the historical sequence contains a severe communication cliff event caused by base station handover, the attention mechanism can assign that event a very high response weight, which improves the predictive sensitivity and early-warning lead time of the collaborative control center for sudden network outages.
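The three attention steps (scaled dot-product scores, Softmax normalization, weighted sum of values) can be sketched for a single query vector as follows. This is the standard formulation with a 1/sqrt(d_k) scale; the function names and the toy numbers, including the "handover spike" key, are hypothetical and only illustrate how one historical step can dominate the weights.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context


keys = [[0.1, 0.0], [0.0, 0.1], [3.0, 3.0]]  # last step mimics a handover spike
values = [[1.0], [2.0], [10.0]]
weights, context = attend(query=[1.0, 1.0], keys=keys, values=values)
```

In this toy case the third time step receives more than 90% of the attention weight, so the context vector is pulled toward its value.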

[0065] Therefore, this attention mechanism enables the model to proactively and selectively focus on the key physical events in the historical sequence that have the greatest impact on future predictions. For example, in the vehicle control sequence, the prediction weight assigned to the most recent emergency braking action will be much higher than the weight assigned to constant-speed driving ten seconds earlier.

[0066] S25 Subsequently, a fully connected output layer maps the high-dimensional features into a global prediction sequence matrix. The focused context feature tensor calculated in the preceding steps serves as the direct input to this fully connected layer, which applies a linear transformation.

[0067] The formula is: [formula missing]; where the first symbol is the weight matrix of the fully connected output layer, whose shape is set to [symbol missing] times the model's hidden dimension, and the second is the bias vector of the mapping. This means that the fully connected layer simultaneously learns and maintains [symbol missing] different mapping relationships. Each independent mapping extracts information of a specific dimension from the global features in order to predict the physical state at a specific future point in time.

[0068] The fully connected layer reduces the dimensionality of the context tensor and maps it to the prediction sequence for the future [symbol missing] time steps, outputting the multidimensional matrix of predicted physical values at each of those time steps.

[0069] Specifically, this global prediction sequence matrix contains independent prediction subsequences: the end-to-end latency prediction subsequence, the packet loss rate prediction subsequence, and the downlink bandwidth prediction subsequence. In addition, from the downlink bandwidth prediction subsequence the system calculates, by integral or mean, the total downlink bandwidth capacity within the future time window. This capacity serves as an absolute boundary constraint for the downstream video coding layer.

[0070] In one feasible approach, in addition to outputting the network prediction matrix, the model also simultaneously outputs the predicted physical trajectory and chassis driving status for the next two seconds, and sends them at high frequency to the remote cockpit and the core control decision module.

[0071] This step maps the multi-scale network features extracted by the TCN-BiLSTM-Attention layer to the prediction space, which can accurately output the network indicator sequence for future time windows, significantly reducing the false negative probability of sudden network degradation and providing a reliable trigger source for subsequent video degradation and chassis speed limiting.

[0072] In the preferred scheme, the network state for the next 5 seconds is predicted using a sampling rate of 10Hz, which generates prediction node data for 50 consecutive time steps.

[0073] In the online prediction workflow, the vehicle-side communication node collects the network state metrics for the current and past [symbol missing] time steps in real time and uses them to construct the input feature matrix in the local edge computing unit. The pre-trained TCN-BiLSTM-Attention model receives this feature matrix in real time and performs forward inference to output the predicted network indicator sequence for the future [symbol missing] time steps. The sequence is sent simultaneously to the video stream intelligent control module and the vehicle driving safety control module for closed-loop collaborative control.

[0074] As shown in Figure 4, in one feasible approach, S26 uses an asymmetric penalty loss function to drive the closed-loop backpropagation training of the network model. To establish a strong causal link between the AI architecture and the physical braking safety of the drive-by-wire chassis hardware, the system employs a customized asymmetric loss function for gradient updates.

[0075] A training set containing large-scale real-world network fluctuation data is collected. Specifically, this training set consists of underlying communication status logs collected from the interaction between the vehicle-mounted communication module and the 5G base station in a real network environment. It includes time-series data of core network indicators such as end-to-end latency, packet loss rate, downlink bandwidth, and latency jitter over past preset time windows. From the real future network latency and the predicted latency, an asymmetric loss function is constructed: [formula missing]; where the symbols denote, in order, the total number of samples in the training batch, the sample index, the real future network latency, and the network latency prediction vector obtained from the model's forward inference. The regularization coefficient, used to suppress overfitting, has a value range of [value range missing], with a preferred value of 0.001. The asymmetric penalty coefficient satisfies [condition missing], with a preferred value of 0.8. When the actual latency is greater than the predicted latency, the function applies the larger weight and imposes a heavier penalty.

[0076] The loss also contains a regularization term used to suppress overfitting in high-dimensional feature spaces. This term penalizes the L2 norm of the weights of each layer of the model, which prevents the model from overfitting to noise in the training set and improves its generalization in real vehicle network scenarios. Taking the partial derivative of the loss with respect to the predicted value gives a piecewise gradient update equation: when latency is under-predicted, that is, [condition missing], the partial derivative is [formula missing]. This shows that when the actual physical network deteriorates more than the model expects, the larger weight amplifies the gradient step during backpropagation. Weight updates therefore push the prediction model toward a conservative, defensive strategy of "preferring to overestimate network physical latency rather than underestimate the risk of communication degradation". At the algorithmic level, this suppresses the cascading failures in which a surge of missed network degradation events leads to physical collisions.

[0077] Therefore, when the actual physical network degradation is greater than the model's expectation, that is, [condition missing], the loss function applies the penalty amplified by the larger weight.
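Because the loss formula is not reproduced above, the sketch below shows one plausible form only: a squared error weighted by `alpha` when latency is under-predicted and by `1 - alpha` otherwise, plus an L2 penalty scaled by `lam`. The defaults 0.8 and 0.001 are the preferred values stated in the description; the weighting scheme, the squared-error base, and all function names are assumptions.

```python
def asymmetric_loss(y_true, y_pred, weights, alpha=0.8, lam=0.001):
    """Mean asymmetric squared error plus an L2 weight penalty.

    Under-prediction (y_true > y_pred) is weighted by alpha, over-prediction
    by 1 - alpha. This weighting scheme is an assumption, not the source formula.
    """
    n = len(y_true)
    data_term = 0.0
    for y, p in zip(y_true, y_pred):
        w = alpha if y > p else 1.0 - alpha
        data_term += w * (y - p) ** 2
    l2 = sum(v * v for v in weights)
    return data_term / n + lam * l2


def grad_wrt_prediction(y, p, n, alpha=0.8):
    """Partial derivative of the data term with respect to one prediction."""
    w = alpha if y > p else 1.0 - alpha
    return -2.0 * w * (y - p) / n


under = asymmetric_loss([100.0], [80.0], weights=[])  # latency underestimated by 20 ms
over = asymmetric_loss([80.0], [100.0], weights=[])   # latency overestimated by 20 ms
g_under = grad_wrt_prediction(100.0, 80.0, n=1)
g_over = grad_wrt_prediction(80.0, 100.0, n=1)
```

Under this assumed form, the same 20 ms error costs four times as much, and produces a gradient four times as large, when the model underestimates latency, which is the conservative bias the paragraphs above describe.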

[0078] S3 Spatial Visual Mapping and Video Degradation Control Phase: S31 First, perform spatial visual mapping triggered by a quantization threshold.

[0079] Specifically, the prediction sequence matrix output by the forward prediction stage of the core network state is analyzed in real time. When the predicted downlink bandwidth in the matrix is lower than the preset video connectivity bandwidth threshold for multiple consecutive time steps, or the predicted packet loss rate is higher than the preset extreme packet loss threshold, the spatial visual degradation logic is forcibly triggered.
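The trigger condition in S31 can be sketched as a small predicate over the predicted sequences. The function name, the parameter names, and the choice to treat a single time step above the loss threshold as sufficient are assumptions; the description gives neither the threshold values nor the required run length.

```python
def should_degrade(pred_bw, pred_loss, bw_threshold, loss_threshold, min_consecutive):
    """Return True when the predicted window calls for visual degradation."""
    run = 0
    for bw in pred_bw:
        # Count consecutive steps whose predicted bandwidth is below the threshold.
        run = run + 1 if bw < bw_threshold else 0
        if run >= min_consecutive:
            return True
    # A single step above the extreme loss threshold is treated as sufficient here.
    return any(loss > loss_threshold for loss in pred_loss)


calm = [0.0] * 5
sustained_drop = should_degrade([10, 3, 3, 3, 10], calm, 5, 0.2, 3)
brief_dips = should_degrade([10, 3, 3, 10, 3], calm, 5, 0.2, 3)
loss_spike = should_degrade([10] * 5, [0.0, 0.3, 0.0, 0.0, 0.0], 5, 0.2, 3)
```

Requiring a consecutive run keeps isolated one-step dips in predicted bandwidth from toggling the degradation logic.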

[0080] The high-precision spatial coordinates and heading angle are obtained from the vehicle. Based on these coordinates, the system matches and extracts the set of three-dimensional world coordinates of the centerline of the road ahead from a high-precision map library in the cloud.

[0081] Next, the high-precision spatial coordinates and the heading angle are used to construct the dynamic pose transformation matrix of the vehicle body coordinate system relative to the three-dimensional world coordinate system.

[0082] Here, the dynamic pose transformation matrix is a 4×4 homogeneous transformation matrix composed of a rotation matrix derived from the heading angle and a translation vector derived from the high-precision spatial coordinates, i.e. [formula missing]. The rotation matrix describes the rotation of the vehicle body coordinate system about the Z-axis of the world coordinate system, and the translation vector describes the position of the origin of the vehicle body coordinate system in the world coordinate system. The translation vector is obtained by converting the high-precision spatial coordinates to three-dimensional coordinates in the world coordinate system.

[0083] Furthermore, the static mounting extrinsic parameter matrix of the onboard camera relative to the vehicle body is combined with the pose matrix through matrix multiplication to generate a global dynamic extrinsic parameter matrix [formula missing]. Here, the inverse of the pose transformation matrix transforms coordinates from the world coordinate system to the vehicle body coordinate system, and the static mounting matrix then transforms them to the camera coordinate system, which maps the three-dimensional world coordinates into the camera coordinate system.

[0084] Using this global dynamic extrinsic parameter matrix and the intrinsic parameter matrix of the vehicle camera, the system maps the three-dimensional world coordinates of the road centerline to the two-dimensional pixel coordinate system of the current video frame in real time, generating a set of centerline pixels. In practice, the conversion formula is as follows: [formula missing]; where the remaining symbol is the scaling factor.
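The world-to-pixel chain (inverse vehicle pose, static camera mounting, pinhole intrinsics) can be sketched as below. Several things are assumed for illustration: the body frame is x forward, y left, z up; the camera looks straight along body +x with axes x right, y down, z forward; lens distortion is ignored; and the pose, mounting height, and intrinsic values are made-up numbers. The scaling factor in the conversion formula corresponds to the camera-frame depth `zc` here.

```python
import math


def world_to_body(p_world, pose):
    """Apply the inverse of the vehicle pose (yaw about Z plus translation)."""
    x, y, z, yaw = pose
    dx, dy, dz = p_world[0] - x, p_world[1] - y, p_world[2] - z
    c, s = math.cos(yaw), math.sin(yaw)
    # The transpose of the yaw rotation undoes the heading.
    return (c * dx + s * dy, -s * dx + c * dy, dz)


def body_to_camera(p_body, cam_pos_body):
    """Static mounting: camera axes are x right, y down, z forward (assumed)."""
    bx = p_body[0] - cam_pos_body[0]
    by = p_body[1] - cam_pos_body[1]
    bz = p_body[2] - cam_pos_body[2]
    return (-by, -bz, bx)


def project(p_cam, fx, fy, cx, cy):
    """Pinhole projection; returns None for points behind the image plane."""
    xc, yc, zc = p_cam
    if zc <= 0:
        return None
    return (fx * xc / zc + cx, fy * yc / zc + cy)


pose = (100.0, 50.0, 0.0, math.pi / 2)  # vehicle heading along world +y
centerline = [(100.0, 60.0, 0.0), (100.0, 70.0, 0.0)]
pixels = [project(body_to_camera(world_to_body(p, pose), (0.0, 0.0, 1.5)),
                  fx=1000.0, fy=1000.0, cx=960.0, cy=540.0) for p in centerline]
```

With the camera 1.5 m above the road, the point 10 m ahead lands below the point 20 m ahead in the image, so the projected centerline converges toward the horizon as expected.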

[0085] S32 Next, a dynamic polygonal region-of-interest mask is generated. In practice, the least squares method is applied to the pixel set to perform a polynomial smoothing fit, which yields the equation of a continuous road centerline baseline on the two-dimensional pixel plane. A geometric normal vector is then calculated at each reference point along this baseline, and the baseline is extended to both sides along the normal by a preset pixel width. A basic line-of-sight width and a vehicle speed topology gain coefficient are introduced and combined with the current vehicle speed to construct the dynamic expansion width equation [formula missing].

[0086] The preset pixel width can be adjusted dynamically according to the vehicle's speed: at high speed it is increased to cover a wider field of vision, and at low speed it is reduced to save bandwidth. The edge contour point set is then calculated from these widths.

[0087] In one feasible approach, when extending to the left and right along the normal vector, the dynamic left boundary width and the dynamic right boundary width are derived from the basic line-of-sight width, the vehicle speed topology gain coefficient, and the steering bias gain coefficient: [formula missing]; [formula missing]; In these asymmetric extension equations, the yaw rate is defined as positive during a left turn and negative during a right turn.

[0088] Through the above formulas, the real-time physical state of the chassis is coupled back into the spatial geometric parameters of the video encoder. In the longitudinal speed coupling dimension, when the chassis bus reports a very high vehicle speed, the speed gain term becomes significantly larger. This forces the forward ROI region to become longer and wider in the 2D image, bringing distant road conditions into the low-compression, fine-grained coding region early and gaining reaction time for remote safety drivers. In the lateral angular velocity coupling dimension, when the vehicle performs a sharp turn, the steering bias term takes effect immediately. It breaks the left-right symmetry of the mask and biases it strongly toward the inside of the turn, mimicking a driver's instinct to look toward the center of a curve when entering it.

[0089] After calculating the edge contour point set, the edge contour point set is finally closed and connected to generate a polygonal region of interest mask that conforms to the actual physical road trend.
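The mask construction in S32 can be sketched as two small functions: one for the left and right half-widths and one that offsets the centerline along its normals and closes the outline. The width equations are not reproduced above, so the linear form w0 + k_v * v ± k_w * yaw_rate is an assumption, as are all parameter names and values. The centerline passed in is taken to be already fitted and sampled in pixel coordinates (u to the right, v downward).

```python
import math


def boundary_widths(v, yaw_rate, w0, k_v, k_w):
    """Left/right half-widths in pixels; the linear form is an assumption."""
    base = w0 + k_v * v
    left = max(0.0, base + k_w * yaw_rate)   # left turn (yaw_rate > 0) widens the left side
    right = max(0.0, base - k_w * yaw_rate)
    return left, right


def roi_polygon(centerline, left_w, right_w):
    """Offset a pixel-space centerline along its normals and close the outline."""
    left_pts, right_pts = [], []
    n = len(centerline)
    for i, (u, v) in enumerate(centerline):
        u0, v0 = centerline[max(i - 1, 0)]
        u1, v1 = centerline[min(i + 1, n - 1)]
        tu, tv = u1 - u0, v1 - v0            # finite-difference tangent
        norm = math.hypot(tu, tv) or 1.0
        nu, nv = tv / norm, -tu / norm       # unit normal pointing to the image left
        left_pts.append((u + nu * left_w, v + nv * left_w))
        right_pts.append((u - nu * right_w, v - nv * right_w))
    # Walk up the left edge and back down the right edge to obtain a closed ring.
    return left_pts + right_pts[::-1]


left_w, right_w = boundary_widths(v=10.0, yaw_rate=0.25, w0=100.0, k_v=5.0, k_w=80.0)
poly = roi_polygon([(960.0, 1000.0), (960.0, 800.0), (960.0, 600.0)], left_w, right_w)
```

In this example a mild left turn gives a 170-pixel left margin and a 130-pixel right margin, so the resulting polygon leans toward the inside of the turn.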

[0090] This mechanism completely breaks down the control isolation between the information layer and the physical layer, solving the problem of the disconnect between traditional two-dimensional image ROI segmentation and the actual three-dimensional driving trajectory of vehicles. In terms of visual feature projection, this dynamic mask not only accurately includes the controlled vehicle in the foreground, but also closely conforms to the conventional road baseline, and exhibits a strong sense of perspective with depth, ensuring that the limited underlying communication bandwidth always prioritizes the core right-of-way areas with the most lethal threats on the current physical driving trajectory.

[0091] S33 Then, targeted coding compression is performed. Traditional coding mechanisms that rely on static empirical thresholds risk wasting bandwidth or worsening congestion. To eliminate this risk, this step abandons the simple fixed quantization-difference rule and instead uses a dynamic optimization process with physical boundary constraints, which closes the loop between the look-ahead prediction of the underlying physical link and the video information layer.

[0092] In practice, since the actual amount of compressed data in the current frame cannot be accurately known before actual encoding, the video hardware encoding microchip maintains and updates a dynamic rate-distortion estimation lookup table in real time in its internal cache. This estimation lookup table is automatically generated by the encoder when processing the previous video frame, based on the historical mapping relationship between the spatial texture complexity parameters of each macroblock and its actual output bitrate in that frame, and is updated at fixed intervals as the video stream progresses.

[0093] The video encoding module uses the polygonal region-of-interest mask: it applies a smaller quantization parameter inside the mask for fine-grained encoding and a larger quantization parameter to the background area outside the mask for coarse encoding.

[0094] Specifically, the video hardware encoding microchip has a built-in rate-distortion solver that executes the following optimization logic based on the dynamic polygon region-of-interest mask generated in step S32. First, the system takes the total future downlink bandwidth capacity output in step S25, written here as $B_{pred}$, as the physical boundary of absolute throughput, and constructs a dynamic optimization equation whose goal is to minimize the quantization parameter $QP_{roi}$ used inside the polygonal region-of-interest mask (that is, to achieve the lowest possible compression and the highest clarity): $\min QP_{roi}$. Solving this objective must strictly satisfy the following two hard constraints within the encoding cycle of each frame. The first is the physical boundary constraint on throughput: the sum of the data volume $R_{roi}$ produced by encoding the core region of interest and the data volume $R_{bg}$ produced by encoding the background region outside the mask must be less than or equal to the predicted bandwidth capacity, $R_{roi}(QP_{roi}) + R_{bg}(QP_{bg}) \le B_{pred}$. The second is the forced difference lower bound: to ensure that background image quality is sacrificed first under extremely low bandwidth, the background quantization parameter $QP_{bg}$ must be greater than or equal to the sum of the core-region quantization parameter and a preset difference threshold $\Delta_{th}$, $QP_{bg} \ge QP_{roi} + \Delta_{th}$. When the single-frame optimization starts, the solver initializes the core-region parameter to the lowest value supported by the hardware (representing the highest image quality) and sets the background parameter accordingly to form a trial parameter pair. This trial pair is then looked up in the dynamic rate-distortion estimation table to quickly obtain the predicted total coded data volume for the current parameter combination. If the predicted total exceeds the total downlink bandwidth capacity, the solver iterates forward with a step size of 1, increasing the trial value and synchronously refreshing its paired value, until it reaches the first parameter combination that satisfies both the first and second constraints. This combination is taken as the globally optimal solution.
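The forward search described above can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the function names, the choice to step the core-region parameter while holding the background parameter exactly one threshold above it, the 0 to 51 parameter range, and the toy rate table (which uses the common rule of thumb that bitrate roughly halves for every increase of 6 in the quantization parameter) are not taken from the source.

```python
from typing import Callable, Optional, Tuple


def toy_rate_table(qp_roi: int, qp_bg: int,
                   roi_share: float = 0.3,
                   bits_at_qp0: float = 8_000_000.0) -> float:
    """Hypothetical stand-in for the rate-distortion lookup table.

    Assumes the bit count of each region halves for every +6 in QP.
    """
    roi_bits = roi_share * bits_at_qp0 * 2.0 ** (-qp_roi / 6.0)
    bg_bits = (1.0 - roi_share) * bits_at_qp0 * 2.0 ** (-qp_bg / 6.0)
    return roi_bits + bg_bits


def solve_qp_pair(budget_bits: float,
                  estimate_bits: Callable[[int, int], float],
                  delta_min: int,
                  qp_min: int = 0,
                  qp_max: int = 51) -> Optional[Tuple[int, int]]:
    """Return the first (qp_roi, qp_bg) pair meeting both constraints.

    Starts from the best quality (lowest qp_roi) and steps by 1, so the
    first feasible pair is also the one with the smallest qp_roi.
    Returns None when no pair fits the budget.
    """
    for qp_roi in range(qp_min, qp_max - delta_min + 1):
        qp_bg = qp_roi + delta_min          # forced difference lower bound
        if estimate_bits(qp_roi, qp_bg) <= budget_bits:  # throughput bound
            return qp_roi, qp_bg
    return None


if __name__ == "__main__":
    print(solve_qp_pair(200_000.0, toy_rate_table, delta_min=10))
```

Because the search starts at the highest quality and only ever moves toward coarser quantization, the first feasible pair it meets is the one with the smallest core-region parameter under this sketch's assumptions.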

[0095] By solving this dynamic optimization equation, the video hardware encoding microchip stays within the future available physical bandwidth while coarsely encoding the background region outside the mask with a larger quantization parameter, sacrificing the less important region to protect the essential one. This maximizes the image quality of the core driving trajectory area. The differentiated coding method therefore prioritizes, under bandwidth constraints, the video clarity of the core road area that closely matches the vehicle's actual driving trajectory. It resolves the disconnect between traditional two-dimensional image ROI division and the vehicle's actual three-dimensional driving trajectory, ensures that remote drivers can still clearly see the core right-of-way view when the network deteriorates, and preserves the continuity of manual takeover.

[0096] S4, physical limit coupling and safe speed closed-loop control stage. S41: First, the network latency prediction results are received, and feature quantization and dimensionality reduction are performed. Specifically, the aforementioned prediction sequence matrix is parsed to extract the end-to-end delay prediction subsequence corresponding to each future time step.

[0097] Then, the arithmetic mean of the predicted delay subsequence is calculated to obtain the future average latency, which characterizes the macroscopic trend of future network congestion. Writing the predicted delays as $\hat{\tau}_{1}, \dots, \hat{\tau}_{K}$ over $K$ future time steps, the average is $\bar{\tau} = \frac{1}{K}\sum_{k=1}^{K}\hat{\tau}_{k}$.

[0098] S42: Secondly, the safety attenuation coefficient is dynamically calculated by integrating the real-time dynamic physical state of the vehicle. In the calculation, the basic safety factor preferably takes the value 0.2; the current vehicle load is collected in real time by an onboard weight sensor; the maximum rated load is the value set when the vehicle left the factory; the current road surface adhesion coefficient is estimated in real time by the chassis anti-lock braking system from wheel speed difference and braking pressure; and the two weighting adjustment parameters preferably take the values 0.3 and 0.5.

[0099] Therefore, under heavy loads or on slippery road surfaces, the safety attenuation coefficient increases dynamically, so the system calculates a lower maximum safe vehicle speed for the same expected network latency. This solves the problem that a speed limit based on network latency alone does not account for the physical safe braking distance under heavy load or adverse road conditions, ensures that the vehicle has sufficient braking safety redundancy when the network deteriorates, and avoids the risk of inertial loss of control.
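The behavior described in S42 can be sketched as follows. The exact formula is not reproduced in this text, so the additive form below is an assumption chosen only because it matches the stated behavior (the coefficient grows with load and with decreasing adhesion); the function name, the clamping, and the pairing of 0.3 with the load term and 0.5 with the road term are likewise hypothetical.

```python
def safety_attenuation(load_kg: float,
                       rated_load_kg: float,
                       mu: float,
                       base: float = 0.2,
                       w_load: float = 0.3,
                       w_road: float = 0.5) -> float:
    """Assumed additive form: base + load penalty + road penalty.

    load_kg / rated_load_kg is clamped to [0, 1]; mu is clamped to [0, 1].
    This is an illustrative stand-in, not the formula from the source.
    """
    if rated_load_kg <= 0:
        raise ValueError("rated_load_kg must be positive")
    load_ratio = min(max(load_kg / rated_load_kg, 0.0), 1.0)
    slipperiness = 1.0 - min(max(mu, 0.0), 1.0)
    return base + w_load * load_ratio + w_road * slipperiness


if __name__ == "__main__":
    # Empty truck on a dry road versus a full truck on a wet road.
    print(safety_attenuation(0.0, 40_000.0, 0.8))
    print(safety_attenuation(40_000.0, 40_000.0, 0.3))
```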

[0100] S43: Derivation of the maximum safe vehicle speed threshold. To overcome the defect that existing empirical speed limit formulas lack physical boundary support, and to meet the requirements of full disclosure and implementability in patent examination, this step mathematically couples the communication dead time of the network prediction domain with the physical braking limit of the chassis control domain, deriving a maximum safe vehicle speed that balances absolute safety with microsecond-level computing efficiency.

[0101] In practice, the derivation proceeds as follows. First, the classical braking kinematics equations are introduced. The future average delay $\bar{\tau}$ obtained in step S41 is defined as the communication and control dead time, that is, the response lag from the issuance of a command to the physical locking of the brake caliper, and an absolute safety buffer distance ahead, $D_{safe}$, is set. This distance can be obtained in real time from LiDAR point clouds and high-precision map calculations. To ensure that the vehicle never collides during periods of network degradation, the following braking inequality must hold under extreme conditions: $v\,\bar{\tau} + \frac{v^{2}}{2\mu g} \le D_{safe}$, where $v$ is the vehicle's current instantaneous speed, $g$ is the acceleration due to gravity, and $\mu$ is the road surface adhesion coefficient.

[0102] Secondly, reliable acquisition of the road surface adhesion coefficient in the above formula is the physical foundation for the braking inequality to hold. In practical implementation, to prevent the loss of this parameter when the chassis anti-lock braking system sensors suffer a communication timeout, this step introduces a model based on the visual feature space and a maximum-margin classifier, the support vector machine (SVM), as a fault-tolerant degradation channel. The visual input feature vector is defined from two components: the global brightness histogram feature matrix extracted from the video frame captured by the current forward-looking camera, and the high-frequency edge texture feature matrix. The system constructs a support vector machine classifier with a maximum-margin optimization objective, introducing Lagrange multipliers and slack variables in its standard form, and solves it to obtain the optimal hyperplane weight vector. A mapping function then projects the high-dimensional visual feature space onto a low-dimensional space of discrete physical constants and selects the matching conservative road adhesion coefficient. This conservative value replaces the original coefficient in the braking inequality, so that the equation remains calculable even when the sensor fails outright.

[0103] Then, the theoretical analytical upper bound is reduced to an engineering formula by Taylor expansion. Solving the boundary equation of the above single-variable quadratic inequality with the quadratic formula gives the theoretical analytical upper limit of the speed. To avoid spending excessive microprocessor clock cycles on square-root floating-point operations when an edge computing node is thermally throttled, this step converts the result into a linear engineering formula by mathematical approximation. With the network latency ideally set to zero, the vehicle has a physical limit speed, and the analytical root can be rewritten equivalently in terms of that limit speed. Because the prediction time window is at the microsecond to millisecond level, the second-order term in the delay parameter is negligibly small; applying a first-order Taylor expansion, the square-root term simplifies to the constant 1, which yields a computationally efficient linear decay equation with a basic attenuation coefficient. This, however, is only the theoretical upper limit under an ideal physical model. In real-world remote control, to build a defensive baseline that does not depend on fixed parameters, an inequality safety boundary is introduced: the theoretically constant attenuation slope is replaced with the dynamic safety attenuation coefficient calculated in step S42. Because the dynamic coefficient superimposes the load penalty term and the road surface attenuation term, it is set to satisfy the boundary condition that it is no smaller than the basic attenuation coefficient.
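The chain of steps in this paragraph can be written out as a sketch. The symbols are assumptions introduced for illustration, since the source formulas are not reproduced in this text: $v$ is speed, $\bar{\tau}$ the dead time, $\mu$ the adhesion coefficient, $g$ gravitational acceleration, $D_{safe}$ the safety buffer distance, $V_{0}$ the zero-latency limit speed, $k_{0}$ the basic attenuation coefficient, and $\alpha$ the dynamic safety attenuation coefficient.

```latex
\begin{aligned}
& v\,\bar{\tau} + \frac{v^{2}}{2\mu g} = D_{safe}
  \;\Longrightarrow\;
  v_{max} = -\mu g\,\bar{\tau} + \sqrt{(\mu g\,\bar{\tau})^{2} + 2\mu g\,D_{safe}} \\
& V_{0} = \sqrt{2\mu g\,D_{safe}}
  \;\Longrightarrow\;
  v_{max} = V_{0}\left(\sqrt{1 + \left(\tfrac{\mu g\,\bar{\tau}}{V_{0}}\right)^{2}}
            - \tfrac{\mu g\,\bar{\tau}}{V_{0}}\right) \\
& \left(\tfrac{\mu g\,\bar{\tau}}{V_{0}}\right)^{2} \ll 1
  \;\Longrightarrow\;
  v_{max} \approx V_{0}\,(1 - k_{0}\,\bar{\tau}),
  \qquad k_{0} = \tfrac{\mu g}{V_{0}} \\
& \alpha \ge k_{0}
  \;\Longrightarrow\;
  V_{0}\,(1 - \alpha\,\bar{\tau}) \le V_{0}\,(1 - k_{0}\,\bar{\tau}) \le v_{max}
\end{aligned}
```

Under these assumptions the linear form never exceeds the exact root, because the dropped square-root factor is at least 1, so the approximation errs on the conservative side.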

[0104] Substituting this dynamic parameter into the theoretical equation finally yields a high-performance engineering formula that the vehicle control domain manager can call directly. This scheme not only anchors the empirical speed limit formula to a dynamics-based limit, but also uses the above inequality to guarantee that the maximum speed limit actually issued is always less than or equal to the ideal physical limit. While maintaining microsecond-level closed-loop instruction processing, it establishes a safety redundancy limit for physical collision avoidance grounded in boundary constraints. S44: Active torque intervention is then implemented. The vehicle control domain manager translates the maximum safe speed threshold into a bus control message and executes the highest-priority speed limit command on the drive system to ensure that the vehicle has sufficient physical braking distance.
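A small numerical sketch can make the relationship between the exact braking limit and the linear engineering formula concrete. It assumes the braking model "reaction distance plus friction-limited stopping distance must fit in the safety buffer" and a linear limit of the form zero-latency speed times (1 minus coefficient times delay); both the model and the function names are illustrative assumptions, not the source's formula.

```python
import math
from typing import Optional

G = 9.81  # gravitational acceleration in m/s^2


def exact_limit(mu: float, tau: float, d_safe: float) -> float:
    """Positive root of v*tau + v^2 / (2*mu*g) = d_safe."""
    a = mu * G
    return -a * tau + math.sqrt((a * tau) ** 2 + 2.0 * a * d_safe)


def linear_limit(mu: float, tau: float, d_safe: float,
                 alpha: Optional[float] = None) -> float:
    """Linearized limit v0 * (1 - k * tau), floored at zero.

    k is the basic coefficient k0 = mu*g/v0 unless a larger dynamic
    coefficient alpha is supplied; alpha below k0 is raised to k0 so the
    result never exceeds the theoretical linear limit.
    """
    v0 = math.sqrt(2.0 * mu * G * d_safe)   # limit speed at zero latency
    k0 = mu * G / v0
    k = k0 if alpha is None else max(alpha, k0)
    return max(0.0, v0 * (1.0 - k * tau))


if __name__ == "__main__":
    mu, tau, d_safe = 0.6, 0.2, 30.0
    print(exact_limit(mu, tau, d_safe))
    print(linear_limit(mu, tau, d_safe))
    print(linear_limit(mu, tau, d_safe, alpha=0.8))
```

With a larger coefficient the issued limit only moves downward, which is the ordering the paragraph above relies on.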

[0105] Specifically, in the cross-domain collaborative decision-making and state machine distribution phase, the system constructs, based on the above-mentioned prediction sequence matrix, a cross-domain collaborative control state machine with three independent branches to achieve deep collaboration between the video stream and vehicle control. (1) Steady-state experience-preserving branch: when the predicted downlink bandwidth subsequence stays above the high-quality connectivity threshold and the predicted packet loss rate stays at an extremely low level, the network environment is good. The decision-maker issues a synchronized boost command: the video information layer allocates a very small global quantization parameter to improve image clarity, and the chassis control layer maintains the default baseline vehicle speed $V_{default}$ to ensure efficient passage.

[0106] (2) Moderate trade-off branch: when the prediction indicators are in an intermediate transition state, the intelligent resource optimization mechanism is triggered. The dynamic polygon mask of stage S3 is invoked for degraded encoding and, at the same time, the dynamic attenuation speed limit of stage S4 is calculated, so that under limited bandwidth the core field of view is prioritized and a moderate safe speed is maintained.

[0107] (3) Extreme safety branch: when the predicted downlink bandwidth subsequence stays below the connectivity bandwidth threshold, or the predicted packet loss rate subsequence exceeds the extreme packet loss threshold (the two trigger conditions are deliberately combined with a logical OR), a sudden network degradation is indicated and the physical safety defense is forcibly triggered. The video information layer disregards background image quality and truncates to the maximum quantization parameter to keep the link connected at all costs; the chassis control layer directly applies the physical limit equation to impose a deep speed limit or even emergency braking.
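The three branches above can be sketched as a small selection function. The threshold names and values are hypothetical, and the reading of "continuously" as "every predicted step" for bandwidth and "any predicted step" for extreme packet loss is an assumption made to keep the extreme branch conservative.

```python
from enum import Enum
from typing import Sequence


class Branch(Enum):
    STEADY = "steady-state experience-preserving"
    TRADE_OFF = "moderate trade-off"
    EXTREME = "extreme safety"


def select_branch(bw_pred: Sequence[float],
                  loss_pred: Sequence[float],
                  bw_good: float,
                  bw_floor: float,
                  loss_low: float,
                  loss_extreme: float) -> Branch:
    """Pick a control branch from predicted bandwidth and loss sequences."""
    # Extreme branch is checked first; its two triggers are OR-combined.
    bw_collapsed = all(b < bw_floor for b in bw_pred)
    loss_spiked = any(p > loss_extreme for p in loss_pred)
    if bw_collapsed or loss_spiked:
        return Branch.EXTREME
    # Steady branch needs both indicators healthy over the whole window.
    if all(b > bw_good for b in bw_pred) and all(p < loss_low for p in loss_pred):
        return Branch.STEADY
    # Everything in between falls into the trade-off branch.
    return Branch.TRADE_OFF


if __name__ == "__main__":
    limits = dict(bw_good=30.0, bw_floor=5.0, loss_low=0.01, loss_extreme=0.10)
    print(select_branch([50.0, 48.0], [0.001, 0.002], **limits))
    print(select_branch([20.0, 25.0], [0.02, 0.03], **limits))
    print(select_branch([3.0, 2.0], [0.02, 0.03], **limits))
```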

[0108] In this embodiment, to verify the technical effectiveness of the remote driving cooperative control system based on forward-looking prediction constructed in steps S1 to S4, a real-vehicle road test simulation was conducted in a closed mining area under a dedicated 5G mobile communication network. Specifically, the vehicle-cloud interaction underlying communication status logs and chassis controller local area network bus messages were continuously collected during the vehicle's movement and subjected to rigorous quantitative comparison.

[0109] As shown in Figure 6, in one feasible approach, a highly representative time window of data, spanning instantaneous network congestion through link recovery, is extracted. This window includes not only the data stream of the hybrid time-series prediction model constructed in this invention, but also that of a benchmark model: a traditional Long Short-Term Memory (LSTM) network that has not undergone the asymmetric loss function optimization of step S26. The measured data are summarized in Table 1. Table 1: Measured and Simulated Data of Cooperative Control under Sudden Change Scenarios in Vehicle-Mounted Mobile Networks

[0110] In response to the multidimensional physical state data matrix derived from the real-world scenario, this invention performs a three-level deep analysis path closed loop to rigorously demonstrate the technical problems substantially solved and the resulting technical effects of the technical solution.

[0111] First, we will conduct case studies of typical transient physical mutation scenarios.

[0112] Specifically, the sudden network degradation events at the sampled moments are analyzed independently. At such a moment, the actual physical end-to-end latency, affected by non-line-of-sight obstruction of the base station, increases sharply.

[0113] As shown in Figure 5, in this embodiment the hybrid time-series model deployed on the edge computing nodes accurately outputs the forward-looking prediction value. The safety control decision-maker therefore immediately triggers the dual-branch degradation action: the spatial visual mapping of stage S3 and the physical limit coupling of stage S4.

[0114] In the video degradation branch, the video encoder maintains a small quantization parameter for the polygonal mask region generated in step S32 and strictly executes the forced difference lower bound logic of step S33, so that the quantization parameter of the background region outside the mask quickly rises to 32.

[0115] Furthermore, in the chassis speed limiting branch, the physical limit coupling algorithm module of step S42 simultaneously incorporates the vehicle's current heavy load parameters and the road adhesion coefficient. Using the calculated dynamic safety attenuation coefficient, step S43 lowers the maximum safe speed through active hard intervention.

[0116] This micro-case fully demonstrates that the solution effectively breaks down traditional control silos, prioritizes core visibility, and establishes absolute physical braking redundancy in advance.

[0117] Secondly, conduct a cross-sectional comparative analysis at the feature ablation level.

[0118] In the preferred scheme, the baseline model data in Table 1 that did not use asymmetric optimization is introduced for comparison.

[0119] In practical implementation, when the network's physical state falls into severe deep fading, the actual underlying latency deteriorates sharply. The traditional benchmark model, limited by its smoothing inertia, outputs a predicted value that underestimates this latency. Such a missed detection would directly cause the vehicle to rush into the communication blind spot at a dangerous speed.

[0120] In one feasible embodiment, the hybrid prediction model of the present invention incorporates the asymmetric penalty loss function of step S26, which applies a weighting factor to severely penalize this type of missed-detection error.

[0121] Therefore, at that moment the hybrid model outputs a prediction that follows the principle that it is better to slightly overestimate the delay than to underestimate the risk.
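The idea of penalizing underestimated delay more heavily than overestimated delay can be illustrated with a minimal sketch. The squared-error base, the function name, and the weight of 5.0 are assumptions for illustration; the source does not state the form or the weighting factor of the step S26 loss in this text.

```python
from typing import Sequence


def asymmetric_penalty_loss(y_true: Sequence[float],
                            y_pred: Sequence[float],
                            under_weight: float = 5.0) -> float:
    """Mean squared error with a heavier weight on underestimates.

    An underestimate (prediction below the actual delay) corresponds to
    a missed detection and is multiplied by under_weight; overestimates
    keep weight 1.0.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        err = predicted - actual
        weight = under_weight if err < 0 else 1.0
        total += weight * err * err
    return total / len(y_true)


if __name__ == "__main__":
    actual_ms = [100.0]
    print(asymmetric_penalty_loss(actual_ms, [80.0]))   # underestimate
    print(asymmetric_penalty_loss(actual_ms, [120.0]))  # overestimate
```

A model trained against a loss of this shape is pushed toward predictions that sit slightly above the true delay, which matches the conservative bias the paragraph describes.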

[0122] The comparative data demonstrate, at the level of the algorithm itself, that step S26 of this invention pushes the control center toward a conservative defensive strategy, blocking the cascading failure chain in which a missed detection of a sudden latency increase leads directly to a physical collision.

[0123] Then, a global statistical analysis at a macro scale is performed.

[0124] Specifically, a total physical log set of 500 hours, covering various climates and operating conditions from full load to no load, was extracted to perform statistical performance verification on the aforementioned S1 to S4 full-process architecture.

[0125] Large-scale tensor calculations show that, relying on multi-scale feature extraction and attention focusing from S21 to S25, the absolute false negative rate of the model in predicting sudden cliff events is strictly suppressed to within the confidence threshold of 0.05%.

[0126] Furthermore, within the dynamic region-of-interest encoding interval triggered in stage S3, forcibly widening the quantization parameter difference reduced the average bandwidth consumption of the vehicle-mounted uplink video stream by 42.6%, while the peak signal-to-noise ratio in the core right-of-way area remained above 35.0 dB.

[0127] Next, the chassis overrun test data show that, under full load and an extremely low road surface adhesion coefficient, the forward-looking speed limiting strategy based on the attenuation coefficient dynamically generated in stage S4 reduced the average physical overrun distance caused by network lag to within a safe buffer of 0.15 meters.

[0128] Example 2: As shown in Figure 2, this embodiment constructs an adaptive degradation cooperative control variant for specific extreme scenarios in the harsh working environment of a closed mine, such as thermal throttling of edge computing nodes and frame loss on the chassis sensor communication bus. Specifically, this scheme adaptively replaces and adjusts some core steps of Embodiment 1 and defines the algorithm replacement logic under specific working conditions, to demonstrate the physical robustness of the system. The degradation control scheme includes the following steps. S5, adaptive degradation control phase under limited hardware computing power and data loss. S51: First, the edge computing power status is monitored in real time, and the bidirectional long short-term memory network layer modeling of step S23, the scaled dot-product attention weighting of step S24, and the fully connected layer mapping of step S25 in Example 1 are forcibly replaced and downgraded.

[0129] The vehicle-side daemon polls the edge computing platform's graphics processor utilization and core thermistor temperature at a fixed frequency. When the computing resource load rate continuously exceeds a preset alarm threshold, the system proactively triggers a structural degradation instruction from the hybrid time-series predictive model.

[0130] In specific implementation, the downgrade instruction forces the bypass of the complex global context semantic modeling and key information focusing layer in Example 1, and only retains the one-dimensional causal convolution kernel of the temporal convolutional network layer constructed in step S22 of Example 1 for fast forward inference.

[0131] The calculation of the network latency prediction matrix after downgrading and replacement is reduced to a single linear mapping, $\hat{Y}_{deg} = W_{d}\,H_{tcn} + b_{d}$, where $H_{tcn}$ is the high-level feature sequence output by the temporal convolutional network layer in Example 1, and $W_{d}$ and $b_{d}$ are pre-tuned linear mapping parameters.

[0132] Thus, by sacrificing some of the accuracy of medium- and long-term predictions, the millisecond-level real-time response rate of the prediction system under thermal failure conditions is forcibly preserved.
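The trigger for the structural degradation in S51 can be sketched as a sustained-overload detector. The class name, the thresholds, and the choice of "N consecutive polls" as the meaning of "continuously exceeds" are assumptions for illustration only.

```python
from collections import deque


class DegradeMonitor:
    """Signal degradation after `window` consecutive overloaded polls."""

    def __init__(self, load_threshold: float, temp_threshold_c: float,
                 window: int) -> None:
        self.load_threshold = load_threshold
        self.temp_threshold_c = temp_threshold_c
        # Fixed-length history of the most recent overload flags.
        self._recent = deque(maxlen=window)

    def poll(self, gpu_load: float, core_temp_c: float) -> bool:
        """Record one sample; return True if TCN-only mode should be used."""
        overloaded = (gpu_load > self.load_threshold
                      or core_temp_c > self.temp_threshold_c)
        self._recent.append(overloaded)
        return len(self._recent) == self._recent.maxlen and all(self._recent)


if __name__ == "__main__":
    monitor = DegradeMonitor(load_threshold=0.90, temp_threshold_c=85.0, window=3)
    for load, temp in [(0.95, 70.0), (0.96, 72.0), (0.97, 75.0), (0.50, 60.0)]:
        mode = "TCN-only" if monitor.poll(load, temp) else "full model"
        print(load, temp, mode)
```

Requiring several consecutive overloaded samples avoids switching models on a single transient spike; the window length would be tuned to the polling frequency.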

[0133] S52 Next, the physical sensor signal loss determination is performed, and the sensing scheme for estimating the road surface adhesion coefficient of the chassis anti-lock braking system in step S12 of Example 1 is replaced with a fault-tolerant one.

[0134] The vehicle control domain manager continuously monitors the heartbeat messages on the chassis controller's local area network bus. If a communication timeout occurs in the anti-lock braking system node, resulting in the loss of the core physical parameter, namely the road adhesion coefficient, the system immediately switches to the visual fault-tolerant perception channel.

[0135] In one feasible approach, the system downgrades and calls the vehicle-mounted forward-looking vision perception module to capture the current video frame image sequence. Specifically, the vision perception module extracts the global brightness histogram distribution data and high-frequency edge texture density features of the image, and inputs the extracted brightness distribution and texture features into a pre-trained lightweight support vector machine classifier, thereby completely replacing the real-time estimation action that relies on chassis hardware in Example 1.

[0136] Specifically, based on the aforementioned visual features, the support vector machine classifier discretizes the current weather into enumerated states such as sunny, rainy, and snowy. It then looks up the conservative road adhesion coefficient corresponding to that state in a pre-stored table of discrete physical constants.
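The fallback from the anti-lock braking system estimate to the table lookup can be sketched as below. The classifier itself is out of scope here and is represented only by its output label; the table values, the timeout handling, and the function name are hypothetical illustrations, not values from the source.

```python
from typing import Optional

# Illustrative conservative adhesion coefficients per weather state.
CONSERVATIVE_MU = {"sunny": 0.60, "rainy": 0.35, "snowy": 0.15}


def adhesion_coefficient(abs_mu: Optional[float],
                         abs_age_s: float,
                         timeout_s: float,
                         weather_label: str) -> float:
    """Return the ABS estimate if fresh, else a conservative table value.

    abs_mu is None when no estimate has been received; abs_age_s is the
    time since the last ABS heartbeat. Unknown weather labels fall back
    to the lowest coefficient in the table.
    """
    if abs_mu is not None and abs_age_s <= timeout_s:
        return abs_mu
    return CONSERVATIVE_MU.get(weather_label, min(CONSERVATIVE_MU.values()))


if __name__ == "__main__":
    print(adhesion_coefficient(0.72, 0.01, 0.05, "sunny"))  # fresh ABS value
    print(adhesion_coefficient(0.72, 0.30, 0.05, "rainy"))  # ABS timed out
    print(adhesion_coefficient(None, 0.00, 0.05, "foggy"))  # unknown label
```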

[0137] S53 Then, perform conservative safety attenuation coefficient reconstruction under extreme conditions and replace the dynamic calculation logic of safety attenuation coefficient in step S42 of Example 1.

[0138] Because the physical state is obtained through the aforementioned visual fault-tolerant substitution, the physical limit coupling calculation branch must be tightened further. To this end, the conservative road adhesion coefficient obtained above is forcibly substituted into the reconstruction equation of the safety attenuation coefficient.

[0139] The dynamic attenuation coefficient that replaces the original logic of Example 1 is calculated by this reconstruction equation. In practice, the equation introduces an extreme-value safety penalty bias term to handle missing underlying data, and this term replaces the smoothing weight logic of Example 1, which assumes complete data.

[0140] Therefore, even in the physical blind zone where the key physical feedback signal of the chassis is lost, the system can still synthesize a very large attenuation coefficient with absolute defensive properties by introducing this penalty bias term.

[0141] S54 Next, the highest priority physical braking hard lock is implemented in the degraded state, and the maximum safe vehicle speed calculation in step S43 and the smooth torque intervention scheme in step S44 of Example 1 are hard replaced.

[0142] The maximum safe speed threshold in the downgraded replacement state is then derived by formula.

[0143] In this formula, the future average physical delay is the one output by the replaced degraded prediction model, and the minimum creep speed boundary ensures that the vehicle does not run away.

[0144] Next, the calculated extremely low threshold is encapsulated as a highest-level interrupt request instruction, bypassing the original smooth control closed loop in Example 1, and directly performing physical braking hard lock on the chassis drive system.

[0145] In this embodiment, the conventional collaborative strategy of Embodiment 1 was replaced by an extremely robust degradation and speed limiting logic, successfully transforming the uncontrollable risk of physical collision into a controllable low-speed safe shutdown state.

[0146] Example 3: Based on the same inventive concept as the remote driving cooperative control methods of Embodiments 1 and 2, Embodiment 3 provides a remote driving cooperative control system based on TCN-BiLSTM-Attention network prediction. Logically, the system includes a multi-dimensional perception acquisition module, a network state prospective prediction module, a cross-domain degradation cooperative intervention triggering module, a video information control module, and a chassis physical control module. At the physical deployment level, these modules are distributed across and mapped to the underlying physical hardware devices and a cloud-edge-device collaborative distributed system, as detailed below. S6, distributed hardware architecture and physical device mapping phase. S61: First, a vehicle-side physical perception and edge computing hardware platform is built.

[0147] In this embodiment, the vehicle terminal system is equipped with a multi-source heterogeneous sensor matrix, preferably including a lidar, a high-definition camera, and a high-precision inertial measurement unit (IMU), for real-time capture of the three-dimensional spatial pose of the vehicle body and continuous physical snapshots of the external environment.

[0148] In addition, an onboard edge computing node is deployed in the vehicle. This edge computing node includes a central processing unit (CPU), a graphics processing unit (GPU), and high-speed dynamic random access memory. The GPU deploys the lightweight temporal convolutional network layer and visual fault-tolerant perception algorithm described in Example 1 to ensure local closed-loop inference and collaborative degradation actions under extreme physical network outage conditions. Simultaneously, the vehicle is modified with drive-by-wire technology to receive remote control commands to execute steering, throttle, and braking.

[0149] Secondly, S62 will build a high-performance cloud-based prediction and digital twin server cluster.

[0150] In practice, the cloud control platform is deployed in a separate data center, relying on blade server arrays and Tensor Processing Unit (TPU) clusters to provide absolute computing power. Within this TPU cluster, the fully structured hybrid time-series prediction model described in Example 1 is deployed to perform high-precision asymmetric loss function training and long-cycle forward inference computation. Simultaneously, a 1:1 virtual digital twin environment, mirroring the real mining area, is constructed for simulation testing, algorithm training, and remote monitoring.

[0151] S63 then constructs a remote cockpit and human-machine physical interaction device.

[0152] In one feasible approach, the remote control terminal deploys a multi-screen splicing display array and a physical control panel with a steering wheel and pedals that provide damping force feedback.

[0153] Therefore, this physical interaction device is used to provide the safety officer with a real-time three-dimensional surround view, and after receiving the trajectory prediction data sent by the edge node, it rigidly superimposes the predicted trajectory line onto the physical light-emitting diode pixel layer of the display array to assist the safety officer in physical takeover.

[0154] S64 then establishes the physical link between the in-vehicle and vehicle-to-cloud communication bus.

[0155] In this embodiment, the microsecond-level interaction between in-vehicle components relies on a hybrid physical topology of the Controller Area Network (CAN) bus and the in-vehicle Ethernet.

[0156] Specifically, the long-range communication physical link between the vehicle and the cloud relies on the macro base station of the 5G private network and the radio frequency antenna matrix of the vehicle communication module (T-Box) to carry the concurrent transmission of full-duplex high-frequency control commands.

[0157] As shown in Figure 2, the S7 phase, covering the cloud-edge timing interaction process and the execution of the underlying command transmission mechanism, is as follows. S71: First, multi-source synchronous trigger acquisition and uplink packetization of the underlying physical data are performed.

[0158] In practice, the underlying crystal oscillator of the vehicle communication module uses a physical clock beat of preferably 100 Hz to synchronously trigger data sampling of various sensors in the chassis.

[0159] The collected communication metrics such as network latency and packet loss rate, along with chassis physical parameters such as vehicle load and road surface adhesion coefficient, are encapsulated into unified data packets containing strict timestamps.

[0160] In addition, the data packet is asynchronously pushed to the upper-layer cloud platform through a high-priority queue of the transmission control protocol stack.

[0161] S72 secondly performs asynchronous pipelined predictive inference and heartbeat synchronization for cloud-edge collaboration.

[0162] In this embodiment, after receiving the uplink data packet, the cloud-based tensor processing unit cluster immediately starts the computing container and performs high-performance forward inference, outputting a high-precision network prediction matrix for the future time window.

[0163] Meanwhile, the vehicle-mounted edge computing nodes independently execute the local temporal convolutional network for fast degraded inference, generating a real-time local prediction matrix.

[0164] Specifically, if the vehicle communication module does not receive the message frame sent from the cloud within a preset time period, the edge computing node forcibly overwrites the shared memory area of the collaborative control daemon with the local prediction matrix through the internal high-speed Peripheral Component Interconnect Express (PCIe) bus. The video stream intelligent control module and the vehicle driving safety control module read the matrix from this shared memory area in real time and perform dynamic polygon region-of-interest encoding and dynamic speed limit calculation respectively, ensuring that collaborative degradation control is maintained when cloud communication is interrupted.

[0165] This creates an extremely robust hardware and software redundancy defense, completely eliminating the risk of prediction engine paralysis caused by single-point physical link failure.
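The cloud-to-edge fallback described in S72 reduces to a freshness check on the cloud prediction. The sketch below uses plain Python lists in place of the prediction matrices and a returned tuple in place of the shared memory area; these stand-ins, the function name, and the timeout value are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def select_prediction(cloud_pred: Optional[Matrix],
                      cloud_age_s: float,
                      local_pred: Matrix,
                      timeout_s: float) -> Tuple[Matrix, str]:
    """Choose which prediction matrix the control modules should read.

    The cloud matrix is used only if one has arrived and is no older
    than timeout_s; otherwise the local edge matrix takes its place.
    """
    if cloud_pred is not None and cloud_age_s <= timeout_s:
        return cloud_pred, "cloud"
    return local_pred, "edge"


if __name__ == "__main__":
    cloud = [[42.0, 0.01, 55.0]]   # hypothetical latency, loss, bandwidth row
    local = [[60.0, 0.03, 40.0]]
    print(select_prediction(cloud, 0.02, local, timeout_s=0.10))
    print(select_prediction(cloud, 0.50, local, timeout_s=0.10))
    print(select_prediction(None, 0.00, local, timeout_s=0.10))
```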

[0166] S73 then executes the issuance of cross-domain collaborative instructions and the hard response of multiple physical execution units.

[0167] In one feasible approach, the security control decision unit directly sends the calculated dynamic polygon mask boundary coordinates and quantization parameter instructions to the video hardware encoding microchip via local video memory.

[0168] Meanwhile, the Vehicle Control Domain Manager (VDM) first combines the current vehicle load collected by the chassis sensors with the road adhesion coefficient estimated in real time by the chassis anti-lock braking system to calculate the safety attenuation coefficient.

[0169] Here, the calculation uses the basic safety factor and the maximum rated load, and the two weighting adjustment parameters preferably take the values 0.3 and 0.5.

[0170] Then, the vehicle speed threshold derived from the maximum safe speed formula is converted into hexadecimal low-level control machine code.

[0171] Specifically, this machine code, as a speed limit request frame with the highest interrupt priority, is periodically broadcast to the drive motor controller and chassis wire-controlled brake actuator unit via the controller area network bus. S74 then performs closed-loop physical feedback confirmation and safety link keep-alive monitoring.

[0172] After the chassis-controlled brake actuator completes the physical push rod action of the solenoid valve, it immediately generates a physical torque real decay confirmation message and sends it back to the vehicle control domain manager along the data bus in the reverse direction.

[0173] If the vehicle control domain manager fails to capture the acknowledgment message within three consecutive physical bus cycles, it will directly trigger the physical relay to disconnect the main contactor of the power battery, forcing the entire vehicle to power down and enter an absolute physical shutdown state.
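The keep-alive rule above amounts to a counter of consecutive missed acknowledgments. The following sketch shows that logic only; the class name `KeepAliveMonitor` is hypothetical, and latching the shutdown state until an external reset is an assumption, since the source does not describe recovery.

```python
class KeepAliveMonitor:
    """Counts consecutive bus cycles without a brake acknowledgment."""

    def __init__(self, max_missed_cycles=3):
        self.max_missed_cycles = max_missed_cycles
        self.missed = 0
        self.shutdown = False

    def on_bus_cycle(self, ack_received):
        """Call once per bus cycle; returns True once shutdown is latched."""
        if self.shutdown:
            return True                  # latched (assumed): stays off until reset
        if ack_received:
            self.missed = 0              # any acknowledgment clears the count
        else:
            self.missed += 1
            if self.missed >= self.max_missed_cycles:
                self.shutdown = True     # here the main contactor would be opened
        return self.shutdown
```

Resetting the counter on every acknowledgment means only three misses in a row, not three misses in total, trigger the power-down.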

[0174] Furthermore, through the aforementioned underlying hardware-level link keep-alive monitoring mechanism, the remote driving collaborative control scheme of this invention is completely transformed from abstract algorithmic logic into a physical protection entity with an extreme safety baseline.

[0175] Example 4: This example is further described in conjunction with Example 1 and has the structure shown in Figure 7. Figure 7 is a schematic diagram of the structure of a computer device provided in an embodiment of this application. The computer device includes a processor, a memory, a communication bus, and a computer program stored in the memory and executable on the processor.

[0176] The processor can call the computer program in the memory and, when executing it, implement the remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction provided in the above embodiments. The method includes: S1, extracting low-level communication data to construct the original input matrix; obtaining the vehicle's physical state, comprising the current vehicle load, spatial coordinates, heading angle, and road surface adhesion coefficient; and retrieving the environmental spatial information ahead to extract the three-dimensional world coordinate set of the road centerline; S2, preprocessing the original input matrix to obtain a global prediction sequence matrix containing network communication metrics for a future time window, where the preprocessing includes multi-scale temporal convolution, bidirectional contextual feature concatenation, and attention-based time-step feature weighting; S3, based on the global prediction sequence matrix...; S4, generating a dynamic polygon mask based on the vehicle's physical state and the environmental spatial information, and performing differentiated directional coding using the dynamic polygon mask; and determining the maximum safe speed threshold according to the dynamic safety attenuation coefficient and, based on that threshold, applying active torque intervention and speed limiting to the underlying drive-by-wire chassis.

[0177] Furthermore, the computer device also includes a communication interface, which is used for communication between the memory and the processor.

[0178] The memory may include high-speed RAM, and may also include non-volatile memory, such as at least one disk drive.

[0179] If the memory, processor, and communication interface are implemented independently, they can be interconnected via a bus to communicate with each other. The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of representation, the bus is shown in Figure 7 as a single thick line, but this does not mean that there is only one bus or one type of bus.

[0180] Furthermore, the logical instructions in the aforementioned memory can be implemented as software functional units and sold or used as independent products, and can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0181] A processor may include one or more processing units, such as an application processor (AP), an application-specific integrated circuit (ASIC), a modem processor, a central processing unit (CPU), an image signal processor (ISP), a controller, memory, a video codec, a digital signal processor (DSP), a baseband processor, and / or a neural network processing unit (NPU). Different processing units may be independent devices or integrated into one or more processors. The controller may serve as a central nervous system and command center. The controller generates operation control signals based on instruction opcodes and timing signals to control instruction fetching and execution. The processor may also include memory for storing instructions and data. In some embodiments, the memory in the processor is a cache memory. This memory can store instructions or data that the processor has just used or that is used repeatedly. If the processor needs to reuse the instruction or data, it can directly retrieve it from the memory. This avoids repeated access, reduces processor waiting time, and thus improves system efficiency.

[0182] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0183] Display devices are used to display images, videos, etc. Display devices may include display panels, which can be liquid crystal displays (LCDs), organic light-emitting diodes (OLEDs), active-matrix organic light-emitting diodes (AMOLEDs), flexible light-emitting diodes (FLEDs), MiniLEDs, MicroLEDs, Micro-OLEDs, quantum dot light-emitting diodes (QLEDs), etc.

[0184] Alternatively, in a specific implementation, if the memory, processor, and communication interface are integrated on a single chip, then the memory, processor, and communication interface can communicate with each other through an internal interface.

[0185] On the other hand, embodiments of this application also provide a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements the remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction as described above. The method includes: S1, extracting low-level communication data to construct the original input matrix; obtaining the vehicle's physical state, comprising the current vehicle load, spatial coordinates, heading angle, and road surface adhesion coefficient; and retrieving the environmental spatial information ahead to extract the three-dimensional world coordinate set of the road centerline; S2, preprocessing the original input matrix to obtain a global prediction sequence matrix containing network communication metrics for a future time window, where the preprocessing includes multi-scale temporal convolution, bidirectional contextual feature concatenation, and attention-based time-step feature weighting; S3, based on the global prediction sequence matrix...; S4, generating a dynamic polygon mask based on the vehicle's physical state and the environmental spatial information, and performing differentiated directional coding using the dynamic polygon mask; and determining the maximum safe speed threshold according to the dynamic safety attenuation coefficient and, based on that threshold, applying active torque intervention and speed limiting to the underlying drive-by-wire chassis.

[0186] In another aspect, embodiments of this application also provide a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium. The computer program comprises computer instructions and, when executed by a processor, causes the computer to perform the remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction provided by the embodiments described above. The method includes: S1, extracting low-level communication data to construct the original input matrix; obtaining the vehicle's physical state, comprising the current vehicle load, spatial coordinates, heading angle, and road surface adhesion coefficient; and retrieving the environmental spatial information ahead to extract the three-dimensional world coordinate set of the road centerline; S2, preprocessing the original input matrix to obtain a global prediction sequence matrix containing network communication metrics for a future time window, where the preprocessing includes multi-scale temporal convolution, bidirectional contextual feature concatenation, and attention-based time-step feature weighting; S3, based on the global prediction sequence matrix...; S4, generating a dynamic polygon mask based on the vehicle's physical state and the environmental spatial information, and performing differentiated directional coding using the dynamic polygon mask; and determining the maximum safe speed threshold according to the dynamic safety attenuation coefficient and, based on that threshold, applying active torque intervention and speed limiting to the underlying drive-by-wire chassis.

[0187] The logic and/or steps represented in the flowchart or otherwise described herein can, for example, be considered a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them).

[0188] For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in conjunction with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection having one or more wires (electronic device), a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing. Additionally, a computer-readable medium can even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically by optically scanning the paper or other medium, followed by editing, interpreting, or otherwise processing it as necessary, and then stored in a computer memory.

[0189] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0190] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0191] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

[0192] Computer systems can include clients and servers. Clients and servers are generally located far apart and typically interact through communication networks. Client-server relationships are created by computer programs running on the respective computers and having a client-server relationship with each other.

[0193] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed herein can be achieved; no limitation is imposed in this respect.

[0194] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Although embodiments of this application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting this application. Those skilled in the art can make changes, modifications, substitutions, and variations to the above embodiments within the scope of this application.

Claims

1. A remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction, characterized in that it includes the following steps: S1, extracting underlying communication data to construct an original input matrix; obtaining the vehicle's physical state, comprising the current vehicle load, spatial coordinates, heading angle, and road surface adhesion coefficient; and retrieving the environmental spatial information ahead to extract the three-dimensional world coordinate set of the road centerline; S2, preprocessing the original input matrix to obtain a global prediction sequence matrix containing network communication metrics for a future time window, the preprocessing including multi-scale temporal convolution, bidirectional contextual feature concatenation, and attention-based time-step feature weighting; S3, generating a dynamic polygon mask based on the global prediction sequence matrix, the vehicle's physical state, and the environmental spatial information, and performing differentiated directional coding using the dynamic polygon mask; S4, determining a dynamic safety attenuation coefficient based on the vehicle's physical state, determining a maximum safe speed threshold according to the dynamic safety attenuation coefficient, and applying active torque intervention and speed limiting to the underlying drive-by-wire chassis based on the maximum safe speed threshold.

2. The remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction according to claim 1, characterized in that step S2 specifically includes: S21, for the feature vector of the original input matrix at any time step, performing data feature smoothing using the exponentially weighted moving average algorithm to obtain a smoothed feature vector, and applying the Z-Score normalization algorithm to the smoothed feature vector to eliminate dimensional differences and obtain a standardized feature vector; S22, feeding the standardized feature vector at any time step into a multi-layer temporal convolutional network composed of stacked one-dimensional causal convolutions and dilated convolutions to perform multi-scale temporal convolution, the multi-scale temporal convolution formula being expressed as: [formula not reproduced in the source text], where the terms are the weight vector of the convolution kernel, the receptive field size, and the dilation factor; the dilation factor in the multi-layer temporal convolutional network increases exponentially with network depth and may take the values 1, 2, 4, or 8; and determining a high-level feature sequence from the top-level output tensor of the multi-layer temporal convolutional network; S23, feeding the high-level feature sequence into a bidirectional recurrent network unit for bidirectional contextual feature concatenation, in which a forward long short-term memory unit computes the forward hidden state in ascending time order, a backward long short-term memory unit computes the backward hidden state in descending time order, and the forward hidden state and the backward hidden state are concatenated along the feature dimension to obtain a joint feature matrix; S24, multiplying the joint feature matrix by the query weight matrix, the key weight matrix, and the value weight matrix, respectively, to obtain the query matrix, the key matrix, and the value matrix; computing the dot-product similarity of the query matrix and the key matrix to obtain unnormalized attention scores; processing the unnormalized attention scores with the Softmax function to obtain dynamic attention weights; and using the dynamic attention weights to compute a weighted sum of the value vectors in the value matrix to obtain a focused context feature tensor; S25, feeding the focused context feature tensor into a fully connected output layer for a linear transformation, in which the focused context feature tensor is multiplied by the weight matrix of the fully connected output layer and the bias vector is added, performing a dimensionality-reducing mapping to obtain the global prediction sequence matrix; extracting from the global prediction sequence matrix the end-to-end delay prediction subsequence, the packet loss rate prediction subsequence, and the downlink bandwidth prediction subsequence; and integrating the downlink bandwidth prediction subsequence to obtain the total downlink bandwidth capacity within the future time window.
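As an illustration of steps S21, S22, and S24, the sketch below implements exponentially weighted smoothing, Z-Score normalization, a causal dilated convolution stack with dilations 1, 2, 4, and 8, and dot-product attention in plain Python on scalar sequences. It is a simplified teaching example under stated assumptions: the function names, the smoothing factor, and the shared kernel across layers are hypothetical, and the nonlinearities, the BiLSTM of step S23, and the learned query, key, and value projections are omitted.

```python
import math


def ewma(series, alpha=0.3):
    """Exponentially weighted moving average; alpha is an assumed value."""
    out, s = [], series[0]
    for x in series:
        s = alpha * x + (1 - alpha) * s
        out.append(s)
    return out


def z_score(series):
    """Standardize to zero mean and unit (population) variance."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / std if std > 0 else 0.0 for x in series]


def causal_dilated_conv(series, weights, dilation):
    """y[t] = sum_i weights[i] * x[t - dilation*i], zero-padded on the left."""
    out = []
    for t in range(len(series)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t - dilation * i
            if j >= 0:                   # never reads samples from the future
                acc += w * series[j]
        out.append(acc)
    return out


def tcn_stack(series, weights, dilations=(1, 2, 4, 8)):
    """Stack causal layers with exponentially growing dilation."""
    for d in dilations:
        series = causal_dilated_conv(series, weights, d)
    return series


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Dot-product attention for one query over per-time-step keys/values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim)]
```

Because each layer only looks backward, the output at time step t depends solely on inputs up to t, which is the property that lets the network be used for look-ahead prediction without leaking future samples.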

3. The remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction according to claim 2, characterized in that step S2 also includes a model training phase: S26, driving backpropagation closed-loop training of the TCN-BiLSTM-Attention hybrid network model based on an asymmetric penalty loss function; obtaining the actual future network latency and the predicted latency; when the actual future network latency is greater than the predicted latency, introducing an asymmetric penalty coefficient into the asymmetric penalty loss function when computing its partial derivatives, so that the asymmetric penalty coefficient amplifies the gradient step size during backpropagation; and updating the weights of the TCN-BiLSTM-Attention hybrid network model with the amplified backpropagation gradient step size.
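The asymmetric penalty of step S26 can be illustrated with a weighted squared error. The source does not state the base loss or the coefficient value, so the squared-error form, the default `penalty=3.0`, and both function names below are assumptions chosen only to show how under-prediction of latency receives a larger gradient.

```python
def asymmetric_loss(actual, predicted, penalty=3.0):
    """Mean squared error with under-prediction weighted by `penalty`."""
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        weight = penalty if y > y_hat else 1.0   # actual latency exceeded forecast
        total += weight * (y - y_hat) ** 2
    return total / len(actual)


def asymmetric_grad(actual, predicted, penalty=3.0):
    """Partial derivative of the loss with respect to each prediction."""
    n = len(actual)
    return [
        (penalty if y > y_hat else 1.0) * 2.0 * (y_hat - y) / n
        for y, y_hat in zip(actual, predicted)
    ]


# Under-predicting by 20 ms costs three times as much as over-predicting by 20 ms.
print(asymmetric_loss([100.0], [80.0]), asymmetric_loss([80.0], [100.0]))
```

With this weighting an optimizer is pushed toward forecasts that err on the high side of latency, which is the conservative direction for a safety function.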

4. The remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction according to claim 2, characterized in that the global prediction sequence matrix is parsed in step S3; when the downlink bandwidth prediction subsequence is below a preset video connectivity bandwidth threshold for multiple consecutive time steps, or when the packet loss rate prediction subsequence exceeds a preset packet loss limit, the following steps are triggered: S31, constructing a dynamic pose transformation matrix from the spatial coordinates and heading angle in the vehicle's physical state and, in combination with the global dynamic extrinsic parameter matrix and the intrinsic parameter matrix of the vehicle camera, mapping the three-dimensional world coordinate set of the road centerline ahead onto the current video frame to generate a centerline pixel set; S32, applying the least squares method to the centerline pixel set to perform a polynomial smoothing fit and obtain a continuous road centerline equation; constructing a dynamic expansion width equation from the current vehicle speed and the basic viewing-distance width; introducing the yaw rate to construct asymmetric expansion simultaneous equations, and computing the left and right boundary widths from them to obtain an edge contour point set; and closing the edge contour point set to generate a dynamic polygon region-of-interest mask, thereby defining a two-dimensional pixel coding boundary that matches the physical road trend; S33, determining the physical throughput boundary constraint based on the total downlink bandwidth capacity within the future time window; constructing a dynamic optimization equation whose objective is to minimize the polygon region-of-interest mask quantization parameter; initializing the value of the polygon region-of-interest mask quantization parameter, and determining the background region quantization parameter from the polygon region-of-interest mask quantization parameter and a preset differential threshold, to generate a trial parameter pair containing the polygon region-of-interest mask quantization parameter and the background region quantization parameter; determining, from the trial parameter pair, the predicted total encoded data volume under the current parameter combination; when the predicted total encoded data volume is greater than the total downlink bandwidth capacity within the future time window, increasing the value of the polygon region-of-interest mask quantization parameter through a step-adjustment loop and synchronously updating the value of the background region quantization parameter, until a parameter combination is matched that satisfies the physical throughput boundary constraint while also satisfying the forced lower-bound constraint that the background region quantization parameter be greater than or equal to the sum of the polygon region-of-interest mask quantization parameter and the preset differential threshold; determining the optimal polygon region-of-interest mask quantization parameter and the optimal background region quantization parameter from the matched parameter combination; and using the optimal polygon region-of-interest mask quantization parameter to perform fine-grained encoding on the region corresponding to the dynamic polygon region-of-interest mask and the optimal background region quantization parameter to perform coarse encoding on the background region outside the mask, so as to output a differentiated compressed video stream containing the core region of interest.
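The trial-and-step search of step S33 can be sketched as follows. The size model is purely hypothetical: it assumes an H.264/HEVC-style quantization parameter range of 0 to 51 and the rough rule of thumb that encoded size halves for every increase of 6 in the quantization parameter. The function names, the default gap of 8, and the starting value of 20 are likewise assumptions; a real encoder would supply its own rate estimate.

```python
def predicted_bits(qp_roi, qp_bg, roi_fraction, bits_at_qp0):
    """Hypothetical size model: bits halve for every +6 QP."""
    roi = roi_fraction * bits_at_qp0 * 2.0 ** (-qp_roi / 6.0)
    bg = (1.0 - roi_fraction) * bits_at_qp0 * 2.0 ** (-qp_bg / 6.0)
    return roi + bg


def search_qp(capacity_bits, roi_fraction, bits_at_qp0,
              qp_init=20, qp_gap=8, step=1, qp_max=51):
    """Return the first (qp_roi, qp_bg) pair that fits, or None."""
    qp_roi = qp_init
    while qp_roi + qp_gap <= qp_max:
        qp_bg = qp_roi + qp_gap          # enforces qp_bg >= qp_roi + qp_gap
        size = predicted_bits(qp_roi, qp_bg, roi_fraction, bits_at_qp0)
        if size <= capacity_bits:        # throughput boundary constraint met
            return qp_roi, qp_bg
        qp_roi += step                   # coarsen both regions and retry
    return None


print(search_qp(40, 0.5, 1024, qp_init=18, qp_gap=12, step=6))  # (24, 36)
```

Searching upward from the initial value returns the lowest region-of-interest quantization parameter that fits the capacity, which matches the stated objective of minimizing that parameter.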

5. The remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction according to claim 2, characterized in that step S4 specifically includes: S41, computing the arithmetic mean of the end-to-end delay prediction subsequence to obtain the future average latency, which characterizes the macroscopic trend of future network congestion; S42, determining a load penalty term from the ratio of the current vehicle load to the maximum rated load, and determining a road surface attenuation term from the road surface adhesion coefficient; multiplying the load penalty term by a first weighting adjustment parameter and the road surface attenuation term by a second weighting adjustment parameter; and linearly superposing the two weighted terms with the basic safety factor to obtain the dynamic safety attenuation coefficient, whose calculation formula is expressed as: [formula not reproduced in the source text]; S43, constructing an engineering application formula from the physical limit speed, the dynamic safety attenuation coefficient, and the future average latency, and performing a multiplicative decay calculation according to that formula to obtain the maximum safe speed threshold, the engineering application formula being expressed as: [formula not reproduced in the source text]; S44, converting the maximum safe speed threshold into a bus control message and sending it to the chassis drive-by-wire actuator unit to perform speed-limiting intervention, so as to maintain a physical collision-avoidance safety distance within the network degradation time window.
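Steps S41 to S43 can be illustrated numerically. The exact formulas are not reproduced in this text, so the forms below are assumptions: the road surface term is taken as one minus the adhesion coefficient, the basic safety factor defaults to 1.0, and the multiplicative decay is modeled as an exponential in the product of the coefficient and the average latency. Only the weights 0.3 and 0.5 come from the description; all function names are hypothetical.

```python
import math


def future_average_latency(delay_predictions_s):
    """Arithmetic mean of the predicted end-to-end delays (S41)."""
    return sum(delay_predictions_s) / len(delay_predictions_s)


def safety_attenuation(load_kg, rated_load_kg, adhesion,
                       base=1.0, w_load=0.3, w_road=0.5):
    """Linear superposition of base factor and weighted penalty terms (S42)."""
    load_term = load_kg / rated_load_kg      # load penalty: ratio to rated load
    road_term = 1.0 - adhesion               # assumed form: less grip, larger term
    return base + w_load * load_term + w_road * road_term


def max_safe_speed(v_limit, attenuation, avg_latency_s):
    """Assumed multiplicative decay: v_limit * exp(-attenuation * latency)."""
    return v_limit * math.exp(-attenuation * avg_latency_s)
```

Under these assumptions a heavier load, a more slippery road, or a longer predicted delay each lowers the speed threshold, and zero predicted delay leaves the physical limit speed unchanged.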

6. The remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction according to claim 2, characterized in that it further includes an adaptive degradation step for the prediction model under hardware-constrained computing conditions: obtaining the graphics processor utilization and the core thermistor temperature; when the graphics processor utilization continuously exceeds a preset load alarm threshold, or the core thermistor temperature continuously exceeds a preset temperature alarm threshold, generating a model structure degradation instruction; according to the model structure degradation instruction, skipping the step in S23 of obtaining the joint feature matrix from the high-level feature sequence and the step in S24 of obtaining the focused context feature tensor from the joint feature matrix; multiplying the high-level feature sequence by a degradation mapping weight matrix and adding a degradation bias vector to obtain a degraded network delay prediction matrix; and, under hardware computing power constraints, using the degraded network delay prediction matrix in place of the global prediction sequence matrix to trigger the degradation collaborative intervention mechanism.
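The trigger condition in claim 6 ("continuously exceeds") can be read as requiring every sample in a recent window to breach its threshold. The sketch below takes that reading; the window length, the threshold values, and the class name `DegradeTrigger` are assumptions, since the source gives no numbers.

```python
from collections import deque


class DegradeTrigger:
    """Flags degradation when every recent sample breaches a threshold."""

    def __init__(self, load_limit=0.9, temp_limit_c=85.0, window=5):
        self.load_limit = load_limit         # assumed load alarm threshold
        self.temp_limit_c = temp_limit_c     # assumed temperature alarm threshold
        self.loads = deque(maxlen=window)
        self.temps = deque(maxlen=window)

    def update(self, gpu_load, core_temp_c):
        """Record one sample; return True if degradation should be issued."""
        self.loads.append(gpu_load)
        self.temps.append(core_temp_c)
        full = len(self.loads) == self.loads.maxlen
        load_hot = full and all(x > self.load_limit for x in self.loads)
        temp_hot = full and all(t > self.temp_limit_c for t in self.temps)
        return load_hot or temp_hot
```

Requiring a full window of breaches keeps a single utilization spike from switching the model to its degraded structure.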

7. The remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction according to claim 6, characterized in that it further includes a visual fault-tolerant perception and hard braking step under sensor communication timeout conditions: listening for heartbeat messages on the chassis controller area network bus; when a communication timeout occurs at a node of the chassis anti-lock braking system so that the road surface adhesion coefficient is missing, extracting the global brightness histogram feature matrix and the high-frequency edge texture feature matrix of the current video frame image; feeding the global brightness histogram feature matrix and the high-frequency edge texture feature matrix into a support vector machine classifier to perform classification matching and obtain a conservative road surface adhesion coefficient; S53, determining the load penalty term from the ratio of the current vehicle load to the maximum rated load, and determining a conservative road surface attenuation term from the conservative road surface adhesion coefficient; multiplying the load penalty term by the first weighting adjustment parameter and the conservative road surface attenuation term by the second weighting adjustment parameter; and linearly superposing the two weighted terms with the basic safety factor and an extreme-value safety penalty bias term to obtain the degraded dynamic attenuation coefficient, whose calculation formula is expressed as: [formula not reproduced in the source text]; S54, performing a multiplicative decay calculation based on the physical limit speed, the degraded dynamic attenuation coefficient, and the future average physical delay of the degraded network delay prediction matrix to obtain the decayed physical limit speed; taking the maximum of the decayed physical limit speed and the minimum creep speed boundary to obtain the degraded maximum safe speed threshold, whose calculation formula is expressed as: [formula not reproduced in the source text]; and generating the highest-level interrupt request command based on the degraded maximum safe speed threshold and sending it to the chassis drive system to execute a physical hard brake lock, so as to maintain the vehicle's physical collision-avoidance safety baseline in the event of a sensor communication timeout.

8. A remote driving cooperative control system based on TCN-BiLSTM-Attention network prediction, characterized in that it includes: a multi-dimensional perception acquisition module, used to extract underlying communication data to construct the original input matrix, obtain the vehicle's physical state, comprising the current vehicle load, spatial coordinates, heading angle, and road surface adhesion coefficient, and retrieve the environmental spatial information ahead to extract the three-dimensional world coordinate set of the road centerline, so as to perform step S1 as claimed in claim 1; a network state look-ahead prediction module, used to preprocess the original input matrix to obtain a global prediction sequence matrix containing network communication metrics for a future time window, the preprocessing including multi-scale temporal convolution, bidirectional contextual feature concatenation, and attention-based time-step feature weighting, so as to perform step S2 as claimed in claim 1; a cross-domain degradation collaborative intervention triggering module, used to parse the global prediction sequence matrix and, based on the global prediction sequence matrix, synchronously trigger the video information control module and the chassis physical control module, so as to perform the step of synchronously triggering the degradation collaborative intervention mechanism as described in claim 1; a video information control module, used to generate a dynamic polygon mask based on the global prediction sequence matrix, the vehicle's physical state, and the environmental spatial information, and to perform differentiated directional coding using the dynamic polygon mask, so as to perform step S3 as described in claim 1; and a chassis physical control module, used to determine the dynamic safety attenuation coefficient based on the vehicle's physical state, determine the maximum safe speed threshold according to the dynamic safety attenuation coefficient, and apply active torque intervention and speed limiting to the underlying drive-by-wire chassis based on the maximum safe speed threshold, so as to perform step S4 as described in claim 1.

9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor executes the computer program to implement the steps of the remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction as described in any one of claims 1-7.

10. A computer program product comprising computer instructions, characterized in that, when executed by one or more processors, the computer instructions implement the steps of the remote driving cooperative control method based on TCN-BiLSTM-Attention network prediction as described in any one of claims 1-7.