Deep learning-based internet of vehicles data processing and decision optimization method and device

By using a deep learning-based dual-branch network structure, implicit spatiotemporal representation learning is fused with adaptive learning, which solves the problems of high computational overhead and missing implicit interactions in vehicle-to-everything (V2X) decision-making systems, and achieves real-time and high-precision decision-making in V2X systems.

CN122196886A · Pending · Publication Date: 2026-06-12 · CHONGQING ZHUYUAN AUTOMOBILE TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: CHONGQING ZHUYUAN AUTOMOBILE TECHNOLOGY CO LTD
Filing Date: 2026-03-06
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Existing deep learning-based vehicle networking decision-making systems suffer from high computational update overhead in spatiotemporal feature extraction, making it difficult to meet real-time requirements. Furthermore, explicit graph structures are prone to missing implicit interactions, resulting in insufficient decision-making accuracy.

Method used

A deep learning-based dual-branch network structure is adopted to capture the spatiotemporal correlation of multimodal data and the implicit interaction features between vehicles through implicit spatiotemporal representation learning. Combined with adaptive fusion and cross-modal adaptation, a single-branch multi-task decision optimization model is constructed to generate vehicle-level and roadside-level decision results.

Benefits of technology

It eliminates the need for explicit construction of vehicle interaction graphs, reducing computational overhead, enhancing the expressive power of spatiotemporal fusion features, and ensuring the real-time nature and reliability of decision-making.

✦ Generated by Eureka AI based on patent content.

Abstract

The application relates to the technical field of deep learning and discloses a deep learning-based Internet of Vehicles (IoV) data processing and decision optimization method and device. The method comprises the following steps: acquiring IoV multi-source data, and storing the data in association and preprocessing it to obtain standardized spatiotemporal data; constructing a dual-branch network structure based on deep learning, and performing implicit spatiotemporal representation learning on the standardized spatiotemporal data to obtain spatiotemporal fusion features that integrate multimodal features; constructing a decision optimization model based on the spatiotemporal fusion features to generate vehicle-level and roadside-level decision results in an IoV scenario; jointly training and validating in real time the dual-branch network structure and the decision optimization model, and combining the decision results with scenario data to construct a data closed loop; and iteratively adjusting the network parameters of the dual-branch network structure and the decision optimization model to obtain a parameter-optimized fusion model.

Description

Technical Field

[0001] This invention relates to the field of deep learning technology, and more specifically to a method and apparatus for data processing and decision optimization in vehicle networking based on deep learning.

Background Technology

[0002] The core value of the Internet of Vehicles lies in improving traffic safety and efficiency through the collaborative interaction of "vehicle-road-people". Its data has the typical characteristics of "strong spatiotemporal correlation, strong dynamism, and multi-source heterogeneity": the vehicle status changes dynamically over time and has a strong spatial interaction dependency with surrounding vehicles and the roadside environment.

[0003] Current deep learning-based Internet of Vehicles (IoV) decision-making systems face key bottlenecks in spatiotemporal feature extraction. To capture spatial interactions between vehicles, existing solutions mostly employ dynamic graph neural networks, which require real-time construction of a graph structure with vehicles as nodes and interaction relationships as edges. However, in IoV scenarios vehicles move at high speed, and nodes and edges change dynamically at the millisecond level, so graph construction and updating account for a high proportion of the computational overhead. As a result, the end-to-end latency is too high to meet the real-time requirements of driving decisions. Furthermore, explicit graph structures rely on predefined interaction rules and easily overlook implicit interactions, leading to insufficient decision-making accuracy.

Summary of the Invention

[0004] To address the shortcomings of existing technologies, this invention proposes a deep learning-based method and apparatus for vehicle network data processing and decision optimization, which solves the aforementioned technical problems.

[0005] In a first aspect, a deep learning-based method for Internet of Vehicles (IoV) data processing and decision optimization is provided, including: acquiring IoV multi-source data, and storing the data in association and preprocessing it to obtain standardized spatiotemporal data; constructing a dual-branch network structure based on deep learning, and performing implicit spatiotemporal representation learning on the standardized spatiotemporal data to obtain spatiotemporal fusion features that integrate multimodal features; constructing a decision optimization model based on the spatiotemporal fusion features to generate vehicle-level and roadside-level decision results in the IoV scenario; jointly training and validating in real time the dual-branch network structure and the decision optimization model, and combining the decision results with scenario data to construct a data closed loop; and iteratively adjusting the network parameters of the dual-branch network structure and the decision optimization model to obtain a parameter-optimized fusion model.

[0006] Furthermore, acquiring the IoV multi-source data includes: adopting a two-tier data acquisition architecture in which the vehicle side and the roadside edge collect data collaboratively, with cloud support; the vehicle side collects vehicle status data and uploads structured feature data after preliminary outlier filtering; the roadside edge unit collects vehicle perception data, environmental data, and V2X interaction messages within a local area; and the cloud acquires regional historical traffic data and control information.

[0007] Furthermore, preprocessing the IoV multi-source data to obtain standardized spatiotemporal data includes: adopting a rule-based filtering strategy combined with short-term prediction-based completion to remove outliers caused by sensor failures and to fill in data missing due to packet loss in V2X communication; standardizing the multi-source data of different modalities by mapping the structured vehicle state data to a preset numerical range, converting the perception data into low-dimensional features, and parsing the V2X interaction messages into semantic feature vectors; using a time synchronization protocol and a filtering algorithm to correct time and position deviations in the multi-source data; and having the roadside edge unit automatically configure scene parameters based on the road type, thereby obtaining the standardized spatiotemporal data.

[0008] Furthermore, constructing the dual-branch network structure based on deep learning and performing implicit spatiotemporal representation learning on the standardized spatiotemporal data to obtain spatiotemporal fusion features that integrate multimodal features includes: mapping the standardized spatiotemporal data to a predefined local spatial grid according to spatial location; filling grid cells that contain no data with learnable null-value identifier vectors, and applying layer normalization to the filled grid data to obtain a structured input tensor; constructing an improved ConvLSTM network as a temporal branch, which processes the structured input tensor and fuses long-range early-warning information from the V2X semantic features; constructing a structured-prior local attention convolution module as a spatial branch, which divides a local grid centered on the target vehicle, calculates attention weights based on the relative states between vehicles and the modal credibility, and extracts implicit spatial interaction features using depthwise separable convolution; generating adaptive fusion weights based on vehicle motion features, and fusing the output features of the temporal and spatial branches using a residual connection; and using a modal attention gating module to adapt and fuse the fused features with the original multimodal features to obtain the spatiotemporal fusion features that integrate multimodal features.

[0009] Furthermore, constructing the decision optimization model based on the spatiotemporal fusion features to generate vehicle-level and roadside-level decision results in the IoV scenario includes: constructing a single-branch multi-task decision optimization model and inputting the spatiotemporal fusion features into it; the single-branch multi-task decision optimization model outputs short-term vehicle state prediction results, as well as vehicle-level and roadside-level decision actions in the IoV scenario; designing a multi-objective reward function based on indicators that can be computed locally and instantaneously, with the core reward terms constructed from time to collision, vehicle speed ratio, and rate of change of acceleration; adaptively adjusting the weight coefficient of each reward term based on the scene recognition results; adopting a deterministic uncertainty assessment method based on a single forward pass, with a dual-output branch set in the output layer of the decision optimization model; the two output branches output the mean and the log-variance of the decision action, respectively, and the decision confidence is calculated from the mean and the log-variance; and, in critical scenarios, triggering a lightweight secondary verification process, thereby generating the vehicle-level and roadside-level decision results in the IoV scenario.

[0010] Furthermore, jointly training and validating in real time the dual-branch network structure and the decision optimization model, and combining the decision results with scenario data to construct a data closed loop, includes: constructing a spatiotemporal interaction annotation dataset from real roadside data and simulation scenario data; performing temporal pre-training of the dual-branch network structure; jointly fine-tuning the dual-branch network structure with the decision optimization model and the uncertainty estimation module; validating the model in real time along three dimensions: end-to-end latency, interaction capture accuracy, and decision safety; activating a feature dimensionality reduction strategy when the latency exceeds the limit; switching to rule-based decision-making as a fallback when the interaction capture confidence is below the threshold; forcibly correcting a decision when it exceeds a safety threshold; and combining the decision results with scenario data to construct the data closed loop.

[0011] Furthermore, iteratively adjusting the network parameters of the dual-branch network structure and the decision optimization model to obtain the parameter-optimized fusion model includes: based on the accumulated spatiotemporal interaction data and the uncertainty assessment feedback, the cloud retrains each branch of the dual-branch network structure and the decision optimization model on a regular cycle to achieve a first iteration of the network parameters; the cloud monitors, in real time, the low-confidence decision events and safety verification trigger events reported by the roadside edge units, and performs cluster analysis on frequently occurring new-scene data to trigger emergency model patch training that further optimizes the network parameters; the optimized network parameters obtained from the regular periodic training and the hotspot emergency training are distributed to the roadside edge units; and the local fusion update of the dual-branch network structure and the decision optimization model is completed to obtain the parameter-optimized fusion model.

[0012] In a second aspect, a deep learning-based IoV data processing and decision optimization device is provided, based on any one of the deep learning-based IoV data processing and decision optimization methods described above, including: an acquisition module configured to acquire IoV multi-source data and to store the data in association and preprocess it to obtain standardized spatiotemporal data; a learning module configured to construct a dual-branch network structure based on deep learning and to perform implicit spatiotemporal representation learning on the standardized spatiotemporal data to obtain spatiotemporal fusion features that integrate multimodal features; a construction module configured to build a decision optimization model based on the spatiotemporal fusion features and to generate vehicle-level and roadside-level decision results in the IoV scenario; a verification module configured to jointly train and validate in real time the dual-branch network structure and the decision optimization model, and to construct a data closed loop by combining the decision results with scenario data; and an iteration module configured to iteratively adjust the network parameters of the dual-branch network structure and the decision optimization model to obtain the parameter-optimized fusion model.

[0013] Thirdly, a terminal is provided, including a processor, an input device, an output device, and a memory, wherein the processor, input device, output device, and memory are interconnected, wherein the memory is used to store a computer program, the computer program including program instructions, and the processor is configured to call the program instructions to execute the deep learning-based vehicle networking data processing and decision optimization method as described in any of the preceding claims.

[0014] Fourthly, a computer-readable storage medium is provided, the computer-readable storage medium storing a computer program, the computer program including program instructions, which, when executed by a processor, cause the processor to perform the deep learning-based vehicle networking data processing and decision optimization method as described in any of the preceding claims.

[0015] The invention employing the above technical solution has the following advantages: This invention utilizes a dual-branch network structure built upon deep learning. It eliminates the need for explicit construction of vehicle interaction graphs. Instead, it captures the spatiotemporal correlations of multimodal data and implicit interaction features between vehicles through implicit spatiotemporal representation learning, thus avoiding the high computational overhead associated with explicit graph construction. Furthermore, through adaptive fusion and cross-modal adaptation, it enhances the expressive power of spatiotemporal fusion features, providing high-quality feature support for decision optimization.

[0016] This invention presents a single-branch multi-task decision optimization model based on spatiotemporal fusion features, which can stably generate vehicle-level and roadside-level decision results. Combined with a deterministic uncertainty assessment based on a single forward pass, it ensures the reliability of decision-making while meeting the real-time requirements of IoV decision-making.

Brief Description of the Drawings

[0017] To more clearly illustrate the specific embodiments of the present invention, the accompanying drawings used in the specific embodiments will be briefly described below. In all the drawings, the elements or parts are not necessarily drawn to scale.

[0018] Figure 1 is a flowchart of the deep learning-based vehicle network data processing and decision optimization method of the present invention; Figure 2 is a flowchart of the deep learning-based vehicle network data processing and decision optimization device of the present invention; Figure 3 is a schematic diagram of the terminal structure.

Detailed Implementation

[0019] The embodiments of the technical solution of the present invention will now be described in detail with reference to the accompanying drawings. These embodiments are merely illustrative of the technical solution of the present invention and are therefore not intended to limit the scope of protection of the present invention.

[0020] As shown in Figures 1-3, the present invention provides a deep learning-based method for IoV data processing and decision optimization, comprising: Step S01: acquire IoV multi-source data, and store the data in association and preprocess it to obtain standardized spatiotemporal data; Step S02: construct a dual-branch network structure based on deep learning, and perform implicit spatiotemporal representation learning on the standardized spatiotemporal data to obtain spatiotemporal fusion features that integrate multimodal features; Step S03: construct a decision optimization model based on the spatiotemporal fusion features to generate vehicle-level and roadside-level decision results in the IoV scenario; Step S04: jointly train and validate in real time the dual-branch network structure and the decision optimization model, and construct a data closed loop by combining the decision results with scenario data; Step S05: iteratively adjust the network parameters of the dual-branch network structure and the decision optimization model to obtain the parameter-optimized fusion model.

[0021] In this embodiment, acquiring the IoV multi-source data includes: adopting a two-tier data acquisition architecture in which the vehicle side and the roadside edge collect data collaboratively, with cloud support; the vehicle side collects vehicle status data and uploads structured feature data after preliminary outlier filtering; the roadside edge unit collects vehicle perception data, environmental data, and V2X interaction messages within a local area; and the cloud obtains regional historical traffic data and control information.

[0022] Specifically, a two-tier data acquisition architecture consisting of the vehicle side and the roadside edge, with cloud support, is adopted. It focuses on collecting core data and associating it efficiently, which reduces transmission and storage overhead. The data collection targets and the division of labor across tiers are as follows. Vehicle side: vehicle status data (speed, acceleration, position coordinates, steering angle) is collected through the OBU and on-board sensors at a sampling frequency of 10 Hz; after preliminary outlier filtering (3σ principle) is completed locally, only structured feature data is uploaded. Roadside edge unit: through the millimeter-wave radar, high-definition camera, and weather sensor integrated into the RSU, it collects vehicle position, relative distance, road surface condition, and ambient light intensity within a local area (500 m range, 3 lanes) at a sampling frequency of 15 Hz; it also collects V2X interaction messages (including semantic information such as vehicle lane-change intentions and long-range congestion warnings). Cloud: historical traffic flow data and control information are obtained from the traffic management platform for offline model training and as a reference for macro-level scheduling.
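The vehicle-side 3σ pre-filter mentioned above can be illustrated with a minimal sketch. The function name `three_sigma_filter`, the window length, and the use of a population standard deviation over a sliding window are assumptions made for illustration; the source only names the 3σ principle.

```python
from statistics import mean, pstdev


def three_sigma_filter(samples, k=3.0):
    """Keep only samples within k standard deviations of the window mean.

    Note: a single outlier can only exceed 3 sigma if the window holds
    at least 11 samples, so at 10 Hz a window of about 2 s is assumed.
    """
    if len(samples) < 2:
        return list(samples)
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        # All samples identical: nothing to reject.
        return list(samples)
    return [x for x in samples if abs(x - mu) <= k * sigma]


# Hypothetical 2 s window of speed readings (m/s) with one sensor glitch.
window = [15.0] * 19 + [80.0]
cleaned = three_sigma_filter(window)
```

In this hypothetical window the 80.0 m/s glitch lies more than three standard deviations from the mean and is dropped, while the 19 regular readings are kept.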

[0023] Data association and transmission: using high-precision timestamps and UTM coordinates as dual keys, the roadside edge unit associates the features uploaded by vehicles with the roadside perception data, and transmits only decision results and key features to the cloud; historical data and model parameters are stored in the cloud to support offline training.
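One way to picture the dual-key association is a nearest-match search gated by a time tolerance and a distance tolerance. The sketch below is illustrative only: the record layout (`t`, `e`, `n`, `id`, `track`) and the tolerances `max_dt` and `max_dist` are hypothetical and do not come from the source.

```python
from math import hypot


def associate(vehicle_records, roadside_detections, max_dt=0.05, max_dist=2.0):
    """Pair each vehicle record with the closest roadside detection.

    A detection is a candidate only if its timestamp is within max_dt
    seconds and its UTM position is within max_dist metres.
    """
    pairs = []
    for v in vehicle_records:
        best, best_d = None, max_dist
        for r in roadside_detections:
            if abs(v["t"] - r["t"]) > max_dt:
                continue  # first key: timestamps must agree
            d = hypot(v["e"] - r["e"], v["n"] - r["n"])
            if d <= best_d:  # second key: closest UTM position wins
                best, best_d = r, d
        if best is not None:
            pairs.append((v["id"], best["track"]))
    return pairs


vehicles = [{"id": "A", "t": 10.00, "e": 500000.0, "n": 3270000.0}]
detections = [
    {"track": 7, "t": 10.02, "e": 500000.8, "n": 3270000.5},
    {"track": 8, "t": 10.02, "e": 500030.0, "n": 3270000.0},
    {"track": 9, "t": 11.00, "e": 500000.0, "n": 3270000.0},
]
matches = associate(vehicles, detections)
```

Here track 8 is rejected on distance and track 9 on time, so vehicle "A" is paired with track 7. A production system would likely use a spatial index rather than this quadratic scan.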

[0024] In this embodiment, preprocessing the IoV multi-source data to obtain standardized spatiotemporal data includes: adopting a rule-based filtering strategy combined with short-term prediction-based completion to remove outliers caused by sensor failures and to fill in data missing due to packet loss in V2X communication; standardizing the multi-source data of different modalities by mapping the structured vehicle state data to a preset numerical range, converting the perception data into low-dimensional features, and parsing the V2X interaction messages into semantic feature vectors; using a time synchronization protocol and a filtering algorithm to correct time and position deviations in the multi-source data; and having the roadside edge unit automatically configure scene parameters based on the road type, thereby obtaining the standardized spatiotemporal data.

[0025] Specifically, the roadside edge unit leads the preprocessing, while the cloud only assists by comparing against historical data, to ensure data quality and spatiotemporal consistency. Data cleaning and completion: a strategy of rule-based filtering plus short-term prediction-based completion is adopted to remove outliers caused by sensor failures; for V2X communication packet loss, linear interpolation based on 3 historical frames is used to fill in the data, avoiding the errors that long-term prediction would introduce. Modal normalization: structured data (speed, acceleration, etc.) is mapped to the [0,1] interval using Min-Max normalization to remove differences in units and scale; for perception data (radar point cloud, image), the radar point cloud is converted into low-dimensional pseudo-image features through voxelization, and image data is compressed and normalized; V2X messages are parsed into key-value pairs of vehicle ID, location, time, and interaction intent, from which core semantic features are extracted and converted into semantic feature vectors through a single fully connected layer.
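The completion and normalization steps can be sketched as follows. The source does not spell out how the 3 historical frames are combined, so the sketch assumes one plausible reading: fit a least-squares line through three equally spaced frames and extend it by one step. The function names and the clamping of out-of-range values are likewise assumptions.

```python
def fill_missing(last_three):
    """Estimate a lost frame from the 3 preceding, equally spaced frames.

    Assumption: a least-squares line through the three frames is
    extended one step ahead (slope = (y2 - y0) / 2).
    """
    y0, y1, y2 = last_three
    center = (y0 + y1 + y2) / 3.0   # fitted value at the middle frame
    slope = (y2 - y0) / 2.0         # least-squares slope per frame
    return center + 2.0 * slope     # two steps past the middle frame


def min_max(value, lo, hi):
    """Map value into [0, 1] given preset bounds, clamping outliers."""
    scaled = (value - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))


# Hypothetical speeds (m/s) from the three frames before a dropped packet.
estimate = fill_missing([10.0, 11.0, 12.0])
normalized = min_max(30.0, 0.0, 60.0)
```

Restricting the fit to three frames keeps the estimate local, which matches the stated aim of avoiding long-horizon prediction error.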

[0026] Spatiotemporal calibration: time synchronization between vehicle-side and roadside equipment is achieved through the NTP protocol, and Kalman filtering is used to correct vehicle position deviations, ensuring that multi-source data is consistent within the same spatiotemporal scene. Road scene parameter configuration: the roadside edge unit automatically configures scene parameters according to the road level (highway / urban road / intersection ramp), including the local grid reference size and the long-range interaction perception threshold, to provide an adaptation basis for subsequent feature rasterization and interaction capture.
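As an illustration of the position-correction step, the sketch below runs a scalar Kalman filter on one coordinate axis. The source does not specify the state model; a random-walk model with hypothetical noise parameters `q` (process) and `r` (measurement) is assumed here, whereas a real deployment would more likely use a constant-velocity or richer motion model.

```python
def kalman_1d(measurements, q=0.01, r=1.0, p0=1.0):
    """Smooth one coordinate axis with a scalar random-walk Kalman filter."""
    x = measurements[0]  # initialise the state from the first measurement
    p = p0               # initial estimate variance
    smoothed = []
    for z in measurements:
        p = p + q                # predict: uncertainty grows by process noise
        k = p / (p + r)          # Kalman gain
        x = x + k * (z - x)      # update the state with the innovation
        p = (1.0 - k) * p        # update the estimate variance
        smoothed.append(x)
    return smoothed


# Hypothetical noisy easting offsets (m) of a nearly stationary vehicle.
raw = [10.0, 10.4, 9.6, 10.3, 9.7]
filtered = kalman_1d(raw)
```

With these settings the filtered track varies far less than the raw readings, which is the qualitative behavior the calibration step relies on.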

[0027] In this embodiment, constructing the dual-branch network structure based on deep learning and performing implicit spatiotemporal representation learning on the standardized spatiotemporal data to obtain spatiotemporal fusion features that integrate multimodal features includes: mapping the standardized spatiotemporal data to a predefined local spatial grid according to spatial location; filling grid cells that contain no data with learnable null-value identifier vectors, and applying layer normalization to the filled grid data to obtain a structured input tensor; constructing an improved ConvLSTM network as a temporal branch, which processes the structured input tensor and fuses long-range early-warning information from the V2X semantic features; constructing a structured-prior local attention convolution module as a spatial branch, which divides a local grid centered on the target vehicle, calculates attention weights based on the relative states between vehicles and the modal credibility, and extracts implicit spatial interaction features using depthwise separable convolution; generating adaptive fusion weights based on vehicle motion features, and fusing the output features of the temporal and spatial branches using a residual connection; and using a modal attention gating module to adapt and fuse the fused features with the original multimodal features to obtain the spatiotemporal fusion features that integrate multimodal features.

[0028] Specifically, a dual-branch structure of a temporal ConvLSTM plus a structured-prior local attention convolution is designed. Combined with feature rasterization and adaptive mechanisms, it captures spatiotemporal interaction dependencies through implicit feature associations, without explicitly constructing a vehicle interaction graph. Feature mapping and rasterization: a unified spatial grid tensor is constructed to organize multimodal features by spatial location and to match the ConvLSTM input requirements. Grid initialization: based on the reference size configured in the previous steps, an M×N local grid is constructed with the geometric center of the roadside edge unit's coverage area as the origin, and each grid cell corresponds to a region of physical space. Feature projection: the target vehicle features, neighboring vehicle features, radar pseudo-image features, and V2X semantic features are mapped to the corresponding grid cells according to their respective UTM coordinates, forming an initial grid feature map. Empty cell filling: grid cells with no vehicles or no perception data are filled with learnable null-value identifier vectors (optimized through offline training to avoid zero-value interference). Tensor normalization: layer normalization is applied to the grid feature map, producing a structured input tensor with a uniform format, so that the ConvLSTM can capture spatial information effectively.
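The rasterization steps above can be sketched in plain Python. Several details are assumptions: features that land in the same cell are summed, features outside the grid are dropped, the null vector is passed in as a fixed list (in the source it is learned offline), and layer normalization is applied per cell across the feature dimension.

```python
from math import floor, sqrt


def rasterize(features, origin, cell_size, rows, cols, null_vec):
    """Project (easting, northing, vector) features onto a rows x cols grid.

    The grid is centred on origin; empty cells receive a copy of null_vec.
    """
    grid = [[None] * cols for _ in range(rows)]
    for e, n, vec in features:
        c = floor((e - origin[0]) / cell_size + cols / 2)
        r = floor((n - origin[1]) / cell_size + rows / 2)
        if not (0 <= r < rows and 0 <= c < cols):
            continue  # outside the local grid (assumed: dropped)
        if grid[r][c] is None:
            grid[r][c] = list(vec)
        else:  # assumed: co-located features are summed
            grid[r][c] = [a + b for a, b in zip(grid[r][c], vec)]
    return [[cell if cell is not None else list(null_vec) for cell in row]
            for row in grid]


def layer_norm(vec, eps=1e-5):
    """Normalise one cell's feature vector to zero mean and unit variance."""
    m = sum(vec) / len(vec)
    var = sum((x - m) ** 2 for x in vec) / len(vec)
    return [(x - m) / sqrt(var + eps) for x in vec]


grid = rasterize(
    [(0.0, 0.0, [1.0, 2.0]), (5.0, -5.0, [3.0, 4.0]), (100.0, 0.0, [9.0, 9.0])],
    origin=(0.0, 0.0), cell_size=5.0, rows=3, cols=3, null_vec=[-1.0, -1.0],
)
tensor = [[layer_norm(cell) for cell in row] for row in grid]
```

Using a dedicated null vector rather than zeros lets downstream layers distinguish "no data" from a genuine zero-valued feature, which is the motivation the text gives for the learnable identifier.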

[0029] Temporal long- and short-term dependency capture (temporal branch): an improved ConvLSTM network is used, which combines convolution operations with LSTM to capture temporal dependencies while preserving spatial feature correlations. The formula (not reproduced here) is defined over the following terms: the rasterized structured input tensor; the hidden state at time t (the temporal feature vector); the cell state; the input gate, forget gate, and output gate; the weight matrices; and the bias terms. Adjusting the forget gate threshold, which adapts dynamically to vehicle speed, strengthens long-term dependency capture. At the same time, long-range warning information from the V2X semantic features is integrated to indirectly enhance long-range interaction perception.
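For reference, the widely used baseline ConvLSTM update (without peephole connections) is shown below. This is the textbook formulation, not necessarily the improved variant of this application; in particular, the speed-dependent forget gate adjustment is not specified in the text and is therefore not shown. The symbols are assumed conventions: $X_t$ for the structured input tensor, $H_t$ for the hidden state, $C_t$ for the cell state, $i_t, f_t, o_t$ for the gates, $W$ for weight matrices, $b$ for bias terms, $*$ for convolution, and $\odot$ for the element-wise product.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
\tilde{C}_t &= \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
H_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```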

[0030] Spatial implicit interaction capture (spatial branch): a structured-prior local attention convolution module is constructed on the regular-grid prior given by the road topology. It does not require interaction edges to be defined explicitly; instead, it implicitly associates the features of adjacent vehicles through attention weights, and it supports an adaptive grid size. Dynamic local area division: centered on the target vehicle, a 3×3 local grid is divided according to the reference size configured in the previous steps, so as to cover all potentially interacting vehicles. Spatial attention weight calculation: the attention weight is calculated from the relative speed, distance, azimuth angle, and modal confidence between the target vehicle and the other vehicles within the grid. The formula (not reproduced here) is defined over the following terms: the features of the target vehicle; the features of the other vehicles within the local grid; the relative distance feature; the relative velocity feature; the modal credibility weight; a convolution used for dimensionality reduction and feature fusion; and the Sigmoid activation function. Attention-based convolutional fusion: after the attention weights are multiplied element-wise with the local grid features, spatial interaction features are extracted through a 3×3 depthwise separable convolution to obtain an implicit spatial interaction representation. Indirect reinforcement of long-range interaction: by capturing historical velocity trends through the temporal branch and combining them with the long-range early-warning information in the V2X semantic features, the influence of long-range interactions beyond the local grid can be perceived indirectly.
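Since the attention formula is not available in the text, the following sketch shows only one plausible shape for it: the listed features are concatenated, reduced to a scalar score by a linear map (standing in for the convolution), passed through a Sigmoid, and scaled by the modal credibility weight. The function names, the linear stand-in, and all weight values are hypothetical.

```python
from math import exp


def sigmoid(x):
    """Logistic function mapping a real score into (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def attention_weight(target, neighbor, rel_dist, rel_speed,
                     credibility, weights, bias=0.0):
    """Score one neighbouring cell relative to the target vehicle.

    weights must have one entry per element of the concatenated
    feature vector [target..., neighbor..., rel_dist, rel_speed].
    """
    feats = list(target) + list(neighbor) + [rel_dist, rel_speed]
    score = sum(w * f for w, f in zip(weights, feats)) + bias
    return credibility * sigmoid(score)


# Hypothetical weights that penalise distance only.
w = [0.0, 0.0, 0.0, 0.0, -0.1, 0.0]
near = attention_weight([1.0, 0.0], [0.5, 0.2], 5.0, 1.0, 0.9, w)
far = attention_weight([1.0, 0.0], [0.5, 0.2], 50.0, 1.0, 0.9, w)
```

With these illustrative weights a vehicle 5 m away receives a larger attention weight than one 50 m away. In the described module the resulting weights would then multiply the local grid features before the 3×3 depthwise separable convolution, which is omitted here.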

[0031] Adaptive spatiotemporal feature fusion: adaptive weight generation plus a residual connection is used to fuse the temporal and spatial branch features, with a dynamic weight that is generated adaptively from vehicle motion characteristics rather than set empirically. Adaptive weight generation: a small neural network (2 fully connected layers plus Sigmoid activation) takes instantaneous vehicle motion features (absolute acceleration and velocity variance) as input and outputs the dynamic weight. In the formula (not reproduced here), the weight parameters of the fully connected layers are optimized through offline training, so that the weight adjusts adaptively under different motion states. Spatiotemporal fusion computation: the fusion formula (not reproduced here) combines the temporal branch output, the spatial branch output features, and the original rasterized input features (the residual connection avoids vanishing gradients); two convolution layers then reduce the dimensionality of the fused features, generating a lightweight implicit spatiotemporal representation feature map.
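A minimal sketch of this fusion step follows. The two-layer network with a Sigmoid output matches the description, but the ReLU hidden activation, the convex combination of the two branches, the additive residual, and every parameter value are assumptions, since the formulas are not available in the text.

```python
from math import exp


def sigmoid(x):
    """Logistic function mapping a real score into (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def fusion_weight(abs_accel, speed_var, w1, b1, w2, b2):
    """Two fully connected layers plus Sigmoid, yielding a weight in (0, 1)."""
    hidden = [max(0.0, w[0] * abs_accel + w[1] * speed_var + b)  # assumed ReLU
              for w, b in zip(w1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


def fuse(temporal, spatial, raw, alpha):
    """Assumed form: alpha * temporal + (1 - alpha) * spatial + residual."""
    return [alpha * t + (1.0 - alpha) * s + x
            for t, s, x in zip(temporal, spatial, raw)]


# Hypothetical untrained parameters: all zeros give a neutral weight of 0.5.
alpha = fusion_weight(1.2, 0.4, w1=[[0.0, 0.0], [0.0, 0.0]], b1=[0.0, 0.0],
                      w2=[0.0, 0.0], b2=0.0)
fused = fuse([2.0, 4.0], [0.0, 0.0], [1.0, 1.0], alpha)
```

Generating the weight from motion features lets the balance between temporal and spatial information shift with driving state, for example toward the temporal branch during hard braking, instead of being fixed by hand.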

[0032] Cross-modal feature adaptation: a modal attention gating module is constructed to adapt and fuse the spatiotemporal representation features with the multimodal data (sensors, radar, V2X). The formula (not reproduced here) uses a gating function that dynamically adjusts weights based on the credibility of the modal data, ensuring effective collaboration between the multimodal features and the spatiotemporal representation.

[0033] In this embodiment, constructing the decision optimization model based on the spatiotemporal fusion features to generate vehicle-level and roadside-level decision results in the IoV scenario includes: constructing a single-branch multi-task decision optimization model and inputting the spatiotemporal fusion features into it; the single-branch multi-task decision optimization model outputs short-term vehicle state prediction results, as well as vehicle-level and roadside-level decision actions in the IoV scenario; designing a multi-objective reward function based on indicators that can be computed locally and instantaneously, with the core reward terms constructed from time to collision, vehicle speed ratio, and rate of change of acceleration; adaptively adjusting the weight coefficient of each reward term based on the scene recognition results; adopting a deterministic uncertainty assessment method based on a single forward pass, with a dual-output branch set in the output layer of the decision optimization model; the two output branches output the mean and the log-variance of the decision action, respectively, and the decision confidence is calculated from the mean and the log-variance; and, in critical scenarios, triggering a lightweight secondary verification process, thereby generating the vehicle-level and roadside-level decision results in the IoV scenario.

[0034] Specifically, based on the implicit spatiotemporal representation features, a single-branch multi-task decision-making model is constructed that balances real-time performance, multi-objective optimization, and decision reliability. Model architecture: 3 fully connected layers are used, with the spatiotemporal fusion features as input, and the results of two tasks as output. Short-term state prediction (1-3 s): predicts changes in the speed of the vehicle in front and the level of lane congestion, with MAE as the loss function. Decision action output: vehicle-level actions (acceleration / deceleration / constant speed, following distance adjustment) and roadside-level actions (lane guidance, short-term signal timing suggestions), with cross-entropy as the loss function.

[0035] Multi-objective reward function optimization (eliminating circular dependencies): based on the interaction state reflected by the implicit spatiotemporal features, a reward function that is quantifiable, resistant to gaming, and free of circular dependencies is designed, replacing the parameters that originally depended on prediction results. Safety reward: designed based on the time to collision (TTC) derived from the spatiotemporal features. Efficiency reward: designed based on the ratio of the average vehicle speed in the current local grid to the expected speed of the road segment, avoiding reliance on the congestion probability predicted by the model. Stability reward: designed based on the rate of change of acceleration. Dynamic weights: adjusted based on the scene recognition results in the spatiotemporal features (adaptive weight allocation for different scenarios such as rainy days and congested periods).
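By way of example only, the three reward terms and the scene-dependent weights of paragraph [0035] could take a form such as the following. The description does not give the functional forms, so the normalizations, the reference values (`ttc_safe_s`, `jerk_max`), the scene labels, and the weight table are all hypothetical choices for this sketch.

```python
def safety_reward(gap_m, closing_speed_mps, ttc_safe_s=4.0):
    """TTC-based term in [0, 1]; 1.0 when the gap to the vehicle ahead is not closing."""
    if closing_speed_mps <= 0.0:
        return 1.0
    ttc = gap_m / closing_speed_mps  # time to collision in seconds
    return min(ttc / ttc_safe_s, 1.0)

def efficiency_reward(mean_grid_speed, expected_speed):
    """Ratio of the local-grid average speed to the expected segment speed, capped at 1."""
    return min(mean_grid_speed / expected_speed, 1.0)

def stability_reward(accel_now, accel_prev, dt, jerk_max=5.0):
    """Penalizes the rate of change of acceleration (jerk); 1.0 means perfectly smooth."""
    jerk = abs(accel_now - accel_prev) / dt
    return 1.0 - min(jerk / jerk_max, 1.0)

# Assumed weight table (safety, efficiency, stability) per recognized scene.
SCENE_WEIGHTS = {
    "normal": (0.4, 0.4, 0.2),
    "rain": (0.6, 0.2, 0.2),
    "congested": (0.5, 0.2, 0.3),
}

def total_reward(scene, r_safe, r_eff, r_stab):
    """Weighted sum of the three terms; unknown scenes fall back to the normal weights."""
    w_safe, w_eff, w_stab = SCENE_WEIGHTS.get(scene, SCENE_WEIGHTS["normal"])
    return w_safe * r_safe + w_eff * r_eff + w_stab * r_stab
```

Every input is a locally measured quantity (gap, speeds, accelerations), so no term depends on the model's own predictions, which is the sense in which the circular dependency is removed.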

[0036] Low-overhead uncertainty estimation: a deterministic uncertainty estimation method based on a single forward pass replaces the high-overhead MC Dropout, so that decision confidence is assessed while real-time performance is maintained. Model output extension: the last layer of the decision model is changed to a dual-output branch that outputs the mean and the log-variance of the decision action. The loss function for joint training combines the cross-entropy loss with a Gaussian likelihood loss, and is expressed in terms of a balance coefficient, the label value, the output mean, and the output log-variance. Confidence calculation: the decision confidence is calculated from the output mean and log-variance; when the confidence is below the threshold, the system switches to rule-based decision-making as a fallback. Safety verification trigger: lightweight MC Dropout is activated for secondary verification only when the decision action is close to the safety threshold or when the credibility of the multimodal data is in conflict, avoiding additional overhead during regular inference.
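For illustration, a minimal sketch of the dual-output loss and the confidence-based fallback of paragraph [0036] is given below. The exact loss and confidence formulas are not reproduced in the text above, so this sketch assumes the standard Gaussian negative log-likelihood parameterized by log-variance and a hypothetical confidence mapping that uses only the predicted variance; the names `balance` and `threshold` and their default values are likewise assumptions.

```python
import math

def gaussian_nll(y, mu, log_var):
    """Gaussian negative log-likelihood for one sample (constant term omitted)."""
    return 0.5 * (log_var + (y - mu) ** 2 / math.exp(log_var))

def joint_loss(ce_loss, y, mu, log_var, balance=0.5):
    """Cross-entropy loss plus the Gaussian likelihood loss scaled by a balance coefficient."""
    return ce_loss + balance * gaussian_nll(y, mu, log_var)

def confidence(log_var):
    """Assumed mapping from predicted variance to (0, 1): lower variance, higher confidence."""
    return 1.0 / (1.0 + math.exp(log_var))

def decide(mu, log_var, rule_based_action, threshold=0.6):
    """Use the model's mean action unless confidence falls below the threshold."""
    if confidence(log_var) < threshold:
        return rule_based_action, "rule_fallback"
    return mu, "model"
```

Predicting the log-variance rather than the variance keeps the variance positive without constraining the output layer, and both quantities are available from one forward pass, unlike MC Dropout, which requires repeated stochastic passes.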

[0037] In this embodiment, jointly training and verifying in real time the dual-branch network structure and the decision optimization model, and constructing a data closed loop by combining the decision results with scenario data, includes: constructing a spatiotemporal interaction annotation dataset from real roadside data and simulation scenario data; performing temporal pre-training on the dual-branch network structure; jointly fine-tuning the dual-branch network structure together with the decision optimization model and the uncertainty estimation module; verifying the model in real time along three dimensions, namely end-to-end latency, interaction capture accuracy, and decision safety; activating a feature dimensionality reduction strategy when the latency exceeds the limit; switching to rule-based decision-making as a fallback when the confidence of interaction capture is below the threshold; and forcibly correcting a decision when it exceeds the safety threshold, and constructing the data closed loop by combining the decision results with the scenario data.

[0038] Specifically, an offline pre-training plus edge-side inference mode is adopted to ensure the reliability and real-time performance of decisions. Offline training: the training data comprises real roadside data (covering highways, urban areas, different road conditions, and other scenarios) and simulation scenario data, from which a spatiotemporal interaction annotation dataset is built (annotating the implicit interaction relationships between vehicles and the decision effects). Training strategy: temporal pre-training followed by joint fine-tuning is adopted. First, the spatiotemporal representation branch is trained (based on a spatiotemporal reconstruction loss); then the decision branch and the uncertainty estimation branch are jointly fine-tuned to ensure the coordinated optimization of spatiotemporal features, decision actions, and confidence assessment. Adaptive weight training: the dynamic weight generation network is trained jointly with the overall model, and the parameters of its fully connected layers are optimized through backpropagation so that the weights adapt to different motion states.

[0039] Real-time verification system. Real-time performance verification: the end-to-end latency of the roadside edge unit (acquisition, preprocessing, spatiotemporal learning, decision) is calculated; if the latency exceeds the limit, feature dimensionality reduction is automatically enabled to ensure that the real-time requirement is met. Interaction capture accuracy verification: the accuracy of interaction prediction is calculated by comparison with the interaction trajectories of real vehicles; combined with the confidence assessment from uncertainty estimation, the system switches to rule-based decision-making as a fallback when the confidence is below the threshold. Safety verification: whether the decision action exceeds the safety threshold is checked in real time; if it does, a forced correction is performed.
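The three checks of paragraph [0039] can be illustrated with the following sketch, which applies them in sequence to a single acceleration command. The numeric limits, the flag names, and the choice of clamping as the forced correction are assumptions for illustration; the description does not specify them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Limits:
    max_latency_ms: float = 100.0  # assumed end-to-end latency budget
    min_confidence: float = 0.6    # assumed confidence threshold
    max_abs_accel: float = 3.0     # assumed safety threshold in m/s^2

def verify(action_accel, confidence, latency_ms, rule_action_accel, limits=Limits()):
    """Return (final_accel, flags) after latency, confidence, and safety checks."""
    flags = []
    # 1. Real-time check: request feature dimensionality reduction for later cycles.
    if latency_ms > limits.max_latency_ms:
        flags.append("enable_feature_reduction")
    # 2. Confidence check: fall back to the rule-based action.
    accel = action_accel
    if confidence < limits.min_confidence:
        accel = rule_action_accel
        flags.append("rule_fallback")
    # 3. Safety check: forcibly correct by clamping into the allowed range.
    if abs(accel) > limits.max_abs_accel:
        accel = max(-limits.max_abs_accel, min(accel, limits.max_abs_accel))
        flags.append("forced_correction")
    return accel, flags
```

The safety check runs last so that it also bounds the rule-based fallback; the returned flags are the kind of events an edge unit could report to the cloud for the data closed loop.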

[0040] In this embodiment, iteratively adjusting the network parameters of the dual-branch network structure and the decision optimization model to obtain a parameter-optimized fusion model includes: retraining, by the cloud on a regular cycle, each branch of the dual-branch network structure and of the decision optimization model based on the accumulated spatiotemporal interaction data and the uncertainty assessment feedback, so as to achieve the initial iteration of the network parameters; monitoring, by the cloud in real time, the low-confidence decision events and safety verification trigger events reported by the roadside edge units, and performing cluster analysis on frequently occurring new scene data to trigger emergency model patch training and further optimize the network parameters; distributing the optimized network parameters obtained from the regular periodic training and the hotspot emergency training to the roadside edge units; and completing the local fusion update of the dual-branch network structure and the decision optimization model to obtain the parameter-optimized fusion model.

[0041] Specifically, the outputs include: On-board unit: real-time driving suggestions and interaction risk warnings are pushed via V2X messages, with latency kept within the requirement. Roadside end: lane guidance instructions are pushed to the RSU, and short-term timing fine-tuning schemes are pushed to the traffic signal controllers. Cloud: the spatiotemporal features, decision results, and uncertainty assessment results of the scene dataset are uploaded asynchronously for offline iterative optimization of the model.

[0042] Data closed loop: based on the accumulated spatiotemporal interaction data and uncertainty feedback, the cloud retrains the branches of each model on a regular cycle. At the same time, it monitors in real time the low-confidence decision events and safety verification trigger events reported by the edge units, performs cluster analysis on high-frequency new patterns, and triggers emergency model patch training. Combining regular updates with hotspot emergency updates continuously improves the adaptability of the system. The optimized model is distributed to the roadside edge units to complete the iteration.
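A simplified sketch of the cloud-side scheduling in paragraph [0042] follows. It assumes that each reported event already carries a coarse scene label, and it replaces the cluster analysis with simple counting per label, which is a stand-in and not the clustering method of the description. The event fields, job names, retraining period, and hotspot count are hypothetical.

```python
from collections import Counter

WATCHED_EVENTS = {"low_confidence", "safety_verification"}

def plan_updates(events, days_since_retrain, period_days=7, hotspot_count=3):
    """Return the training jobs to launch given edge-reported events.

    events: list of dicts with a 'type' key and a 'scene' key.
    """
    jobs = []
    # Regular update: retrain on the accumulated data once the period has elapsed.
    if days_since_retrain >= period_days:
        jobs.append(("periodic_retrain", None))
    # Hotspot update: scenes that keep triggering events get an emergency patch.
    counts = Counter(e["scene"] for e in events if e["type"] in WATCHED_EVENTS)
    for scene, n in sorted(counts.items()):
        if n >= hotspot_count:
            jobs.append(("patch_training", scene))
    return jobs
```

For example, three low-confidence reports from the same scene within one monitoring window would schedule a patch job for that scene, independently of whether the periodic retraining is due.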

[0043] In other embodiments, a deep learning-based vehicle networking data processing and decision optimization apparatus is provided, which is based on the deep learning-based vehicle networking data processing and decision optimization method of any of the preceding embodiments and includes: an acquisition module configured to acquire multi-source data from the Internet of Vehicles (IoV) and perform associated storage and preprocessing on the multi-source data to obtain standardized spatiotemporal data; a learning module configured to construct a dual-branch network structure based on deep learning and perform implicit spatiotemporal representation learning on the standardized spatiotemporal data to obtain spatiotemporal fusion features that integrate multimodal features; a construction module configured to construct a decision optimization model based on the spatiotemporal fusion features and generate vehicle-level and roadside-level decision results in the Internet of Vehicles scenario; a verification module configured to jointly train and verify in real time the dual-branch network structure and the decision optimization model, and to construct a data closed loop by combining the decision results with scenario data; and an iteration module configured to iteratively adjust the network parameters of the dual-branch network structure and the decision optimization model to obtain a parameter-optimized fusion model.

[0044] In other embodiments, a terminal is provided, including a processor, an input device, an output device, and a memory, which are interconnected. The memory is used to store a computer program, which includes program instructions. The processor is configured to invoke the program instructions to execute the deep learning-based vehicle networking data processing and decision optimization method as described above.

[0045] In other embodiments, a computer-readable storage medium is provided that stores a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform a deep learning-based vehicle networking data processing and decision optimization method as described above.

[0046] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention, and they should all be covered within the scope of the claims and specification of the present invention.

Claims

1. A deep learning-based method for vehicle networking data processing and decision optimization, characterized by comprising: acquiring multi-source data from the Internet of Vehicles (IoV), and performing associated storage and preprocessing on the IoV multi-source data to obtain standardized spatiotemporal data; constructing a dual-branch network structure based on deep learning, and performing implicit spatiotemporal representation learning on the standardized spatiotemporal data to obtain spatiotemporal fusion features that integrate multimodal features; constructing a decision optimization model based on the spatiotemporal fusion features to generate vehicle-level and roadside-level decision results in the Internet of Vehicles scenario; jointly training and verifying in real time the dual-branch network structure and the decision optimization model, and constructing a data closed loop by combining the decision results with scenario data; and iteratively adjusting the network parameters of the dual-branch network structure and the decision optimization model to obtain a parameter-optimized fusion model.

2. The deep learning-based vehicle networking data processing and decision optimization method according to claim 1, characterized in that acquiring the multi-source data from the Internet of Vehicles includes: adopting a two-level vehicle-side edge-cloud data acquisition architecture; collecting, at the vehicle side, vehicle status data and uploading structured feature data after preliminary outlier filtering; collecting, by the roadside edge unit, vehicle perception data, environmental data, and V2X interaction messages within a local area; and acquiring, by the cloud, regional historical traffic data and control information.

3. The deep learning-based vehicle networking data processing and decision optimization method according to claim 1, characterized in that preprocessing the multi-source data from the Internet of Vehicles to obtain the standardized spatiotemporal data includes: adopting a rule-based filtering strategy combined with short-term prediction-based completion to remove outliers caused by sensor failures and to complete data missing due to packet loss in V2X communication; standardizing the multi-source data of different modalities by mapping the structured vehicle state data to a preset numerical range, converting the perception data into low-dimensional features, and parsing the V2X interaction messages into semantic feature vectors; correcting time and location deviations in the multi-source data using time synchronization protocols and filtering algorithms; and automatically configuring, by the roadside edge unit, scene parameters based on the road type, to obtain the standardized spatiotemporal data.

4. The deep learning-based vehicle networking data processing and decision optimization method according to claim 1, characterized in that constructing the dual-branch network structure based on deep learning and performing implicit spatiotemporal representation learning on the standardized spatiotemporal data to obtain the spatiotemporal fusion features that integrate multimodal features includes: mapping the standardized spatiotemporal data to a predefined local spatial grid according to spatial location; filling the grid cells without data with learnable null-value identifier vectors, and applying layer normalization to the filled grid data to obtain a structured input tensor; constructing an improved ConvLSTM network as a temporal branch, the temporal branch processing the structured input tensor and fusing long-range early warning information from the V2X semantic features; constructing a structured-prior local attention convolution module as a spatial branch, dividing a local grid centered on the target vehicle, calculating attention weights based on the relative states between vehicles and the modal credibility, and extracting implicit spatial interaction features in combination with depthwise separable convolution; generating adaptive fusion weights based on vehicle motion features, and fusing the output features of the temporal branch and the spatial branch using a residual connection; and adaptively fusing, by a modal attention gating module, the fused features with the original multimodal features to obtain the spatiotemporal fusion features that integrate multimodal features.

5. The deep learning-based vehicle networking data processing and decision optimization method according to claim 1, characterized in that constructing the decision optimization model based on the spatiotemporal fusion features to generate the vehicle-level and roadside-level decision results in the Internet of Vehicles scenario includes: constructing a single-branch multi-task decision optimization model and inputting the spatiotemporal fusion features into it; outputting, by the single-branch multi-task decision optimization model, short-term vehicle state prediction results together with vehicle-level decision actions and roadside-level decision actions in the Internet of Vehicles scenario; designing a multi-objective reward function based on indicators that can be computed locally and instantaneously, the core reward terms being constructed from the time to collision (TTC), the vehicle speed ratio, and the rate of change of acceleration; adaptively adjusting the weight coefficient of each reward term based on the scene recognition results; adopting a deterministic uncertainty assessment method based on a single forward pass, with a dual-output branch set in the output layer of the decision optimization model; outputting, by the two output branches, the mean and the log-variance of the decision action, respectively, and calculating the decision confidence from the mean and the log-variance; and, in critical scenarios, triggering a lightweight secondary verification process, so as to generate the vehicle-level and roadside-level decision results in the Internet of Vehicles scenario.

6. The deep learning-based vehicle networking data processing and decision optimization method according to claim 1, characterized in that jointly training and verifying in real time the dual-branch network structure and the decision optimization model, and constructing the data closed loop by combining the decision results with scenario data, includes: constructing a spatiotemporal interaction annotation dataset from real roadside data and simulation scenario data; performing temporal pre-training on the dual-branch network structure; jointly fine-tuning the dual-branch network structure together with the decision optimization model and the uncertainty estimation module; verifying the model in real time along three dimensions, namely end-to-end latency, interaction capture accuracy, and decision safety; activating a feature dimensionality reduction strategy when the latency exceeds the limit; switching to rule-based decision-making as a fallback when the confidence of interaction capture is below the threshold; and forcibly correcting a decision when it exceeds the safety threshold, and constructing the data closed loop by combining the decision results with the scenario data.

7. The deep learning-based vehicle networking data processing and decision optimization method according to claim 1, characterized in that iteratively adjusting the network parameters of the dual-branch network structure and the decision optimization model to obtain the parameter-optimized fusion model includes: retraining, by the cloud on a regular cycle, each branch of the dual-branch network structure and of the decision optimization model based on the accumulated spatiotemporal interaction data and the uncertainty assessment feedback, so as to achieve the initial iteration of the network parameters; monitoring, by the cloud in real time, the low-confidence decision events and safety verification trigger events reported by the roadside edge units, and performing cluster analysis on frequently occurring new scene data to trigger emergency model patch training and further optimize the network parameters; distributing the optimized network parameters obtained from the regular periodic training and the hotspot emergency training to the roadside edge units; and completing the local fusion update of the dual-branch network structure and the decision optimization model to obtain the parameter-optimized fusion model.

8. A deep learning-based vehicle networking data processing and decision optimization device, characterized in that the device is based on the deep learning-based vehicle networking data processing and decision optimization method according to any one of claims 1 to 7 and comprises: an acquisition module configured to acquire multi-source data from the Internet of Vehicles (IoV) and perform associated storage and preprocessing on the multi-source data to obtain standardized spatiotemporal data; a learning module configured to construct a dual-branch network structure based on deep learning and perform implicit spatiotemporal representation learning on the standardized spatiotemporal data to obtain spatiotemporal fusion features that integrate multimodal features; a construction module configured to construct a decision optimization model based on the spatiotemporal fusion features and generate vehicle-level and roadside-level decision results in the Internet of Vehicles scenario; a verification module configured to jointly train and verify in real time the dual-branch network structure and the decision optimization model, and to construct a data closed loop by combining the decision results with scenario data; and an iteration module configured to iteratively adjust the network parameters of the dual-branch network structure and the decision optimization model to obtain a parameter-optimized fusion model.

9. A terminal, characterized in that the terminal includes a processor, an input device, an output device, and a memory, which are interconnected; the memory stores a computer program that includes program instructions; and the processor is configured to invoke the program instructions to execute the deep learning-based vehicle networking data processing and decision optimization method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program including program instructions which, when executed by a processor, cause the processor to perform the deep learning-based vehicle networking data processing and decision optimization method according to any one of claims 1 to 7.