A control algorithm test method based on an intelligent driving simulator
Multi-source data are synchronized using wavelet transform, a spatiotemporal graph structure, and a deep generative model, and parameters are tuned with a swarm optimization algorithm. This addresses inaccurate data fusion and low scene generation efficiency in intelligent driving control algorithm testing, achieving efficient and accurate anomaly detection and scene generation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG ATTC AUTOMOBILE TECH SERVICE CO LTD
- Filing Date
- 2026-03-03
- Publication Date
- 2026-06-23
AI Technical Summary
Existing intelligent driving control algorithm testing methods rely on real vehicle road tests and hardware-in-the-loop simulations, which have problems such as high cost, difficulty in scenario reproduction, safety risks, and inaccurate data fusion, making it difficult to effectively capture the spatiotemporal correlation and subtle abnormal behavior between multi-source data.
Wavelet transform denoising and adaptive normalization are used to process multi-source test data. A spatiotemporal graph structure model is constructed for data synchronization. Anomaly detection is performed using a multimodal feature fusion network and a deep generative model. Parameters are optimized with a swarm optimization algorithm, and the optimized parameters are used to generate interactive test scenarios.
It improves data consistency and testing accuracy, enhances the ability to detect unknown types of anomalies and subtle anomalies in critical scenarios, and improves the automation level and coverage of the testing process.
Figure CN121764050B_ABST
Description
Technical Field
[0001] This invention relates to the field of autonomous driving testing technology, specifically to a control algorithm testing method based on an intelligent driving simulator.
Background Technology
[0002] The testing and verification of current intelligent driving control algorithms heavily rely on real-vehicle road tests and hardware-in-the-loop simulations. Real-vehicle testing is costly, difficult to reproduce scenarios, and poses safety risks. While hardware-in-the-loop simulations can mitigate some of the risks associated with real-vehicle testing, the dynamic elements of the constructed test scenarios are limited, the model fidelity differs from the real world, and it is difficult to simulate complex and ever-changing traffic interactions. During testing, the data streams generated by simulators are multi-source and heterogeneous. For example, vehicle dynamics data, sensor perception data, and control command data often have inconsistent timestamps and sampling frequencies. Conventional processing methods involve filtering and simple timestamp alignment of each source data independently, but this approach is insufficient to effectively capture the deep spatiotemporal correlations between different modalities.
[0003] At the data fusion level, existing technologies mostly employ early fusion or decision-level fusion. Early fusion, by directly splicing features, is prone to introducing noise, while decision-level fusion loses fine-grained interaction information between data points, resulting in insensitivity to subtle abnormal behaviors of control algorithms in critical scenarios. In the parameter optimization stage, the generation of test scenarios and the adjustment of algorithm parameters typically rely on manual traversal based on engineer experience, lacking a systematic automated search mechanism. This manual parameter tuning process is inefficient and struggles to guarantee finding the globally optimal solution, limiting test coverage and depth. Abnormal performance of control algorithms, such as delayed response to specific obstacles or aggressive decisions, is often hidden in complex data interactions, making it difficult for traditional threshold- or rule-based analysis methods to accurately capture and locate these issues. These problems constrain the efficiency and reliability of intelligent driving algorithm testing.
Summary of the Invention
[0004] The purpose of this invention is to provide a control algorithm testing method based on an intelligent driving simulator to solve the problems mentioned in the background art.
[0005] To achieve the above objectives, the present invention provides a control algorithm testing method based on an intelligent driving simulator, the method comprising:
[0006] Collect multi-source test data generated in real time by the intelligent driving simulator, including vehicle motion state data, environmental perception data, and control command data;
[0007] Wavelet transform denoising and adaptive normalization are performed on multi-source test data to generate a normalized data stream;
[0008] A spatiotemporal graph structure model is constructed, which maps the normalized data stream to graph nodes and edges, and performs spatiotemporal alignment operations based on node similarity to generate a synchronized data set;
[0009] The synchronous dataset is input into a multimodal feature fusion network, where cross-modal attention weighting and feature concatenation are performed to output a fused feature map.
[0010] A deep generative model is used to calculate the reconstruction probability of the fused feature map, and abnormal test points are identified based on the probability threshold.
[0011] The optimal parameter configuration is output after convergence by iteratively updating the parameter positions in the parameter search space using a swarm optimization algorithm.
[0012] The system uses optimal parameter configuration to drive the 3D graphics engine to generate interactive test scenarios and establishes a dynamic relationship between parameters and scenario elements.
[0013] Preferably, the wavelet transform denoising and adaptive normalization processing of the multi-source test data includes:
[0014] The system continuously collects vehicle motion status data, environmental perception data, and control command data through the simulator data interface, and converts the data into a time series format.
[0015] Missing values in time series are detected and filled using spline interpolation.
[0016] Applying discrete wavelet transform to the complete time series decomposes it into approximation coefficients and detail coefficients;
[0017] A threshold is set to perform soft thresholding on the detail coefficients to suppress high-frequency noise;
[0018] The denoised signal is reconstructed using inverse wavelet transform;
[0019] Calculate the maximum and minimum values of the reconstructed signal, and normalize the data to the zero-one interval using minimum-maximum scaling;
[0020] Time-domain and frequency-domain features are extracted from normalized data and combined to form a normalized data stream.
[0021] Preferably, the construction of the spatiotemporal graph structure model, which maps the normalized data stream to graph nodes and edges, and performs spatiotemporal alignment operations based on node similarity to generate a synchronized data set, includes:
[0022] Each data point in the normalized data stream is defined as a graph node, and the node attributes include timestamp and spatial coordinates;
[0023] Edge connections are created based on the temporal proximity and spatial correlation of data points, and the edge weights reflect the strength of the data association.
[0024] Select a reference time point and spatial origin as the alignment datum;
[0025] Calculate the feature similarity between each node and the baseline node, using cosine similarity as the metric.
[0026] Adjust node positions and edge weights based on similarity to synchronize data in the spatiotemporal dimension;
[0027] The adjusted nodes and edges are merged to form a synchronized data set.
[0028] Preferably, the step of inputting the synchronized data set into the multimodal feature fusion network, performing cross-modal attention weighting and feature concatenation, and outputting a fused feature map includes:
[0029] Design a multimodal feature fusion network, which includes a modality-specific encoder and a shared encoder;
[0030] Vehicle motion state features, environmental perception features, and control command features are extracted using modality-specific encoders.
[0031] A cross-modal attention mechanism is applied in the shared encoder to calculate the attention score for each modal feature;
[0032] The features are weighted and summed based on the attention scores to highlight the contributions of important modalities;
[0033] The weighted features are concatenated along the feature dimensions to form a high-dimensional feature vector;
[0034] The feature dimension is reduced by using fully connected layers, and a fused feature map is output.
[0035] Preferably, the step of using a deep generative model to calculate the reconstruction probability of the fused feature map includes:
[0036] Construct the deep generative model, wherein the encoder consists of multiple fully connected layers and the decoder consists of symmetrical fully connected layers;
[0037] Prepare a dataset containing fused feature maps in normal mode as training samples;
[0038] Define a loss function that combines reconstruction error and latent space distribution constraints;
[0039] The deep generative model is iteratively trained using training samples until the loss function converges, resulting in a fully trained deep generative model.
[0040] The fused feature map to be detected is input into the encoder of the well-trained deep generative model and mapped to the latent space distribution.
[0041] Latent vectors are sampled from the latent space distribution and input into the decoder of a well-trained deep generative model for data reconstruction.
[0042] Calculate the reconstruction error between the fused feature map to be detected and the reconstructed fused feature map;
[0043] A dynamic probability threshold is set based on the reconstruction error distribution of all training samples;
[0044] Points in the data to be tested whose reconstruction error exceeds the dynamic probability threshold are identified as abnormal test points.
[0045] Preferably, the iterative update of parameter positions in the parameter search space using a swarm optimization algorithm includes:
[0046] Initialize the particle swarm optimization algorithm, set the number of particles, number of iterations and inertia weight, randomly generate the initial positions of the particles, each position represents a set of test parameters, define the fitness function and evaluate the parameter performance;
[0047] Calculate the fitness value of each particle, update the individual optimal position and the global optimal position, and adjust the particle velocity and position according to the particle velocity update formula;
[0048] Repeat the iteration until the convergence condition is met, and output the global optimal position as the optimal parameter configuration.
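The particle swarm steps listed above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `pso`, the acceleration coefficients `c1` and `c2`, the bound clamping, and the convention that a lower fitness value is better are all assumptions, and the quadratic fitness function in the usage note merely stands in for a real test-parameter score.

```python
import random

def pso(fitness, bounds, n_particles=30, n_iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over the box `bounds` with a basic particle swarm.

    w is the inertia weight; c1 and c2 are assumed cognitive/social coefficients.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Each position is one candidate set of test parameters.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal best + pull to global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

For example, `pso(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [(-5.0, 5.0), (-5.0, 5.0)])` searches a two-parameter space whose best configuration lies at (1, -2). A fixed iteration count is used here as the convergence condition; a tolerance on the change in the global best would serve equally well.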
[0049] Preferably, the step of driving the 3D graphics engine based on the optimal parameter configuration to generate an interactive test scene includes:
[0050] Load the optimal parameter configuration, parse it into scene parameter values, initialize the 3D graphics engine, set the scene coordinate system and lighting parameters, and generate terrain mesh, vehicle model and obstacle model based on the scene parameter values;
[0051] Test feature data is bound to model vertex attributes to achieve dynamic rendering, enable user input interface, support viewpoint switching and parameter adjustment, update scene rendering in real time, and maintain the correlation between parameters and scene elements.
[0052] Preferably, the continuous acquisition of vehicle motion state data, environmental perception data, and control command data through the simulator data interface includes:
[0053] Connect to the sensor output port of the intelligent driving simulator to stream vehicle speed, angular velocity, and position data;
[0054] Point cloud data and image data, including obstacle shapes and road markings, are acquired from the environmental simulation subsystem;
[0055] Monitor the output port of the control algorithm and record throttle opening, brake pressure, and steering angle commands;
[0056] The data is packaged into timestamp data packets and transmitted to the processing unit.
[0057] Preferably, the application of the cross-modal attention mechanism, calculating the attention score for each modal feature includes:
[0058] Each modal feature is input into a linear transformation layer to generate a query vector, a key vector, and a value vector. The dot product of the query vector and the key vector is calculated to obtain the attention weights.
[0059] The attention weights are normalized using the softmax function, and the normalized weights are multiplied by the value vector to generate a weighted modal feature representation.
[0060] By aggregating the weighted features of all modalities, a cross-modal attention output is obtained.
[0061] Preferably, setting the dynamic probability threshold based on the reconstruction error distribution of all training samples includes:
[0062] The reconstruction error values of all training samples during the training phase are collected to form an error sequence;
[0063] Perform sliding window analysis on the error sequence and calculate the error statistical characteristics within each window;
[0064] Establish the probability density function of the error distribution based on the statistical characteristics of the error;
[0065] The error value corresponding to the preset quantile is determined based on the probability density function and used as the initial threshold.
[0066] The initial threshold is dynamically adjusted based on the fluctuation of error values in the real-time test data stream;
[0067] The adjusted threshold is used as a dynamic probability threshold for identifying abnormal test points.
[0068] Compared with the prior art, the beneficial effects of the present invention are:
[0069] A spatiotemporal graph structure model is constructed, abstracting entities such as vehicles and environmental objects as graph nodes, and the dynamic interactions between them as edges. Spatiotemporal alignment based on node similarity is then performed on the normalized multi-source data streams. This method overcomes the limitations of traditional timestamp alignment, dynamically capturing and associating implicit spatiotemporal dependencies within the data streams. This enables heterogeneous data from different physical sources with varying sampling rates to be accurately synchronized within a unified graph model, improving data consistency. The improved data synchronization accuracy provides high-quality input for subsequent fusion analysis and reduces the risk of misjudgments due to data misalignment.
[0070] A deep generative model is trained by unsupervised learning on the cross-modal fused feature maps. Anomalies are identified by calculating the probability difference between the input features and the model's reconstructed output. This approach does not rely on predefined anomaly labels or fixed thresholds, but instead learns the intrinsic distribution from normal test data. When the control algorithm exhibits anomalous behavior deviating from normal patterns, its generated data features show a low reconstruction probability and are thus effectively identified. This data-driven, probabilistic approach enhances the detection capability for unknown types of anomalies and subtle anomalies in critical scenarios, improving the depth of insight into potential algorithm failures and the automation of the testing process.
Attached Figure Description
[0071] Figure 1 is a schematic diagram illustrating the working principle of the control algorithm testing method based on an intelligent driving simulator described in this invention;
[0072] Figure 2 is a flowchart of multi-source test data preprocessing;
[0073] Figure 3 is a flowchart of spatiotemporal graph construction and data synchronization;
[0074] Figure 4 is a diagram of deep learning model anomaly detection;
[0075] Figure 5 is a diagram illustrating the convergence process of the particle swarm optimization algorithm.
Detailed Implementation
[0076] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0077] Referring to Figure 1, this invention provides a method for testing control algorithms based on an intelligent driving simulator, the method comprising:
[0078] Multi-source test data generated in real time by an intelligent driving simulator is collected, encompassing vehicle motion state data, environmental perception data, and control command data. Wavelet transform denoising and adaptive normalization are performed on the multi-source test data to generate a normalized data stream. A spatiotemporal graph structure model is constructed, mapping the normalized data stream to graph nodes and edges, and spatiotemporal alignment based on node similarity is performed to generate a synchronized data set. This synchronized data set is input into a multimodal feature fusion network, where cross-modal attention weighting and feature concatenation are performed to output a fused feature map. A deep generative model is used to calculate the reconstruction probability of the fused feature map, and abnormal test points are identified based on probability thresholds. A swarm optimization algorithm iteratively updates parameter positions in the parameter search space, outputting the optimal parameter configuration upon convergence. Based on the optimal parameter configuration, a 3D graphics engine is driven to generate an interactive test scene, establishing a dynamic relationship between parameters and scene elements.
[0079] Example 1: Referring to Figure 2, the system continuously collects vehicle motion state data, environmental perception data, and control command data through the simulator data interface. It connects to the sensor output port of the intelligent driving simulator to stream vehicle speed, angular velocity, and position data. Point cloud data and image data, including obstacle shapes and road markings, are acquired from the environmental simulation subsystem. The system monitors the control algorithm output port, recording throttle opening, brake pressure, and steering angle commands. This data is packaged into timestamped data packets and transmitted to the processing unit. The data is converted to a time series format, and missing values are detected and filled using spline interpolation. Discrete wavelet transform is applied to the complete time series, decomposing it into approximation coefficients and detail coefficients. A threshold is set to perform soft thresholding on the detail coefficients to suppress high-frequency noise. Inverse wavelet transform is used to reconstruct the denoised signal. The maximum and minimum values of the reconstructed signal are calculated, and the data is normalized to the zero-to-one interval using min-max scaling. Time-domain and frequency-domain features are extracted from the normalized data and combined into a normalized data stream.
[0080] In practice, data acquisition is completed through the built-in data interface of the intelligent driving simulator. Connecting to the simulator's sensor output ports, vehicle motion state data is continuously read in a streaming manner. This data includes longitudinal velocity, lateral velocity, yaw rate, and position coordinates in three-dimensional space. Environmental perception data is acquired from the environmental simulation subsystem. This data includes point cloud data generated by simulated LiDAR and image data captured by simulated cameras. The point cloud data describes the three-dimensional contours of surrounding obstacles, while the image data includes texture information of road lane lines, traffic signs, and traffic lights. The output port of the control algorithm module is monitored to record control command data, including accelerator pedal opening commands, brake master cylinder pressure commands, and steering wheel angle commands. All acquired data is packaged into data packets with high-precision timestamps and transmitted via a high-speed bus to the central processing unit for subsequent analysis. The timestamps are used to maintain the temporal relationship between different data sources.
[0081] In practice, after receiving multi-source test data, it is uniformly converted into a time series format arranged with a fixed sampling interval. The system detects whether there are missing data entries in the time series, and for locations with missing data, a cubic spline interpolation algorithm is used to fill in the gaps, thereby generating a continuous and complete time series signal.
[0082] In practice, cubic spline interpolation is applied to fill in time series with detected missing values. This process constructs a smooth curve that passes through all known data points and uses this curve to estimate the value at each missing location. First, the interpolation interval is determined: for each interval with missing values in the time series, the nearest known data points before and after it are located as interpolation boundary points. Next, a piecewise cubic polynomial is constructed: for each subinterval formed by two adjacent known data points, an independent cubic polynomial function is defined, with the form $S_k(t) = a_k + b_k t + c_k t^2 + d_k t^3$, where $S_k(t)$ represents the interpolation function value on the k-th subinterval, $t$ represents the normalized time variable, and $a_k$, $b_k$, $c_k$, $d_k$ are the coefficients of the piecewise polynomial that need to be determined.
[0083] In practice, solving for the polynomial coefficients requires satisfying a series of continuity conditions: the interpolation function value at each internal node must equal the original data point, i.e., the function value must be continuous; the first derivative at each internal node must be continuous to ensure a smooth curve transition; and the second derivative at each internal node must be continuous to further ensure the smoothness of the curvature. For boundary nodes, natural boundary conditions are typically used, i.e., the second derivative value at the boundary point is set to zero. By solving the system of linear equations formed by these conditions, the coefficients of all piecewise cubic polynomials can be uniquely determined. Finally, for the time point t where a missing value is located, the cubic polynomial of the subinterval k containing t is evaluated at t to obtain the interpolated fill value.
[0084] Optionally, clamped boundary conditions can be used instead, i.e., the first derivative values at the boundary points are specified. It will be understood that numerical algorithms such as the Thomas algorithm (the tridiagonal "chasing" method) can be used to improve computational efficiency when solving the system of linear equations.
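The gap-filling procedure described above can be sketched as follows, assuming natural boundary conditions and a tridiagonal (Thomas) solver. The function names and the use of `None` to mark missing samples are illustrative assumptions. The sketch solves for the second derivative at each known point, which is an equivalent way of fixing the piecewise cubic coefficients.

```python
from bisect import bisect_right

def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def natural_second_derivatives(kt, ky):
    """Second derivatives at the known points; zero at both ends (natural spline)."""
    n = len(kt)
    if n < 3:
        return [0.0] * n  # degenerates to linear interpolation
    h = [kt[i + 1] - kt[i] for i in range(n - 1)]
    rows = n - 2
    lower = [h[r] for r in range(rows)]
    diag = [2.0 * (h[r] + h[r + 1]) for r in range(rows)]
    upper = [h[r + 1] for r in range(rows)]
    rhs = [6.0 * ((ky[r + 2] - ky[r + 1]) / h[r + 1] - (ky[r + 1] - ky[r]) / h[r])
           for r in range(rows)]
    return [0.0] + thomas_solve(lower, diag, upper, rhs) + [0.0]

def fill_missing(ts, ys):
    """Replace None entries in ys with natural cubic spline estimates."""
    kt = [t for t, y in zip(ts, ys) if y is not None]
    ky = [y for y in ys if y is not None]
    m = natural_second_derivatives(kt, ky)
    filled = []
    for t, y in zip(ts, ys):
        if y is not None:
            filled.append(y)
            continue
        # Locate the subinterval k; gaps outside the known range reuse the end segment.
        k = min(max(bisect_right(kt, t) - 1, 0), len(kt) - 2)
        h = kt[k + 1] - kt[k]
        a, b = kt[k + 1] - t, t - kt[k]
        filled.append((m[k] * a ** 3 + m[k + 1] * b ** 3) / (6.0 * h)
                      + (ky[k] / h - m[k] * h / 6.0) * a
                      + (ky[k + 1] / h - m[k + 1] * h / 6.0) * b)
    return filled
```

Calling `fill_missing([0, 1, 2, 3, 4, 5], [1, 3, None, 7, None, 11])` fills the two gaps of a linearly increasing signal with 5 and 9, since a natural spline reproduces straight-line data exactly.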
[0085] Discrete wavelet transform (DWT) is applied to the complete vehicle motion state data time series, environmental perception data time series, and control command data time series. DWT decomposes each signal into a series of approximation coefficients and detail coefficients. The approximation coefficients represent the low-frequency, coarse components of the signal, while the detail coefficients represent the high-frequency detail components. A global threshold is set to perform soft thresholding on the detail coefficients, setting those with absolute values below the threshold to zero and shrinking those above the threshold towards zero to suppress high-frequency noise. Inverse wavelet transform is then performed using the approximation coefficients and the thresholded detail coefficients to reconstruct the denoised vehicle motion state data signal, environmental perception data signal, and control command data signal.
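As a minimal sketch of this decompose, threshold, and reconstruct cycle, the following uses a single-level Haar wavelet. The Haar basis, the single decomposition level, and the function names are assumptions chosen to keep the example short; a practical pipeline would more likely use a multi-level transform from a wavelet library.

```python
import math

def haar_dwt(signal):
    """One-level Haar DWT of an even-length signal -> (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def soft_threshold(coeffs, thr):
    """Zero coefficients with |c| <= thr and shrink the rest towards zero by thr."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def denoise(signal, thr):
    """Threshold only the detail coefficients, then reconstruct."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, thr))
```

With a threshold of zero the signal is reconstructed unchanged, which is a quick check that the forward and inverse transforms match. With a large threshold each pair of samples collapses to its mean, so `denoise([1.0, 1.2, 1.0, 0.8], 1.0)` yields approximately `[1.1, 1.1, 0.9, 0.9]`.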
[0086] The maximum and minimum values of the reconstructed vehicle motion state data signal are calculated, and the data is linearly transformed to the zero-to-one interval using a minimum-maximum scaling method. The transformation formula is as follows:
[0087] $x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$;
[0088] Where: $x$ represents a data point in the original signal, $x_{\min}$ represents the minimum value within this signal segment, $x_{\max}$ represents the maximum value within this signal segment, and $x'$ represents the normalized data point. The same adaptive normalization process is performed on both the environmental perception data signal and the control command data signal. Time-domain and frequency-domain features are extracted from the normalized data. Time-domain features include the mean, variance, and zero-crossing rate, while frequency-domain features are obtained by calculating the amplitude of the main frequency components after performing a Fast Fourier Transform on the signal. Finally, all extracted features are combined in chronological order to form a normalized data stream for use by subsequent modules.
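The normalization and feature extraction described here can be sketched as below. Two choices are assumptions rather than details from the text: the zero-crossing rate is computed on the mean-centered signal (a signal scaled to the zero-to-one interval never crosses zero itself), and a direct discrete Fourier transform stands in for the FFT to keep the example dependency-free.

```python
import cmath

def min_max_normalize(x):
    """Scale a signal segment linearly to [0, 1]; a constant segment maps to zeros."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0] * len(x)
    return [(v - lo) / (hi - lo) for v in x]

def time_domain_features(x):
    """Return (mean, variance, zero-crossing rate of the mean-centered signal)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    centered = [v - mean for v in x]
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return mean, var, crossings / (n - 1)

def dominant_component(x):
    """Return (frequency bin, amplitude) of the strongest non-DC component."""
    n = len(x)
    mags = [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))) / n
            for k in range(1, n // 2 + 1)]
    best = max(range(len(mags)), key=mags.__getitem__)
    return best + 1, mags[best]
```

For the alternating signal `[0.0, 1.0, 0.0, 1.0]`, the time-domain features are a mean of 0.5, a variance of 0.25, and a zero-crossing rate of 1.0, and the dominant component is the highest frequency bin.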
[0089] Optionally, the data interface can use Ethernet or CAN bus protocols for communication. It will be understood that the wavelet basis function can be chosen based on signal characteristics, for example from the Daubechies wavelet family.
[0090] Optionally, normalization can be performed independently for each data source to eliminate the influence of different physical dimensions. It can be understood that the combined dimensions of time-domain and frequency-domain features can be adjusted according to the actual application scenario.
[0091] Example 2: Referring to Figure 3, each data point in the normalized data stream is defined as a graph node. Node attributes include timestamp and spatial coordinates. Edges are created based on the temporal proximity and spatial correlation of data points. Edge weights reflect the strength of data association. A reference time point and spatial origin are selected as alignment benchmarks. The feature similarity between each node and the benchmark node is calculated, using cosine similarity as the metric. The node positions and edge weights are adjusted based on the similarity to synchronize the data in the spatiotemporal dimension. The adjusted nodes and edges are merged to form a synchronized data set. A multimodal feature fusion network is designed, comprising modality-specific encoders and a shared encoder. The modality-specific encoders extract vehicle motion state features, environmental perception features, and control command features respectively. A cross-modal attention mechanism is applied in the shared encoder to calculate the attention score for each modality feature. Each modality feature is input into a linear transformation layer to generate a query vector, key vector, and value vector. The dot product of the query vector and the key vector is calculated to obtain the attention weights. The attention weights are normalized using the softmax function. The normalized weights are multiplied by the value vector to generate a weighted modality feature representation. The weighted features of all modalities are aggregated to obtain the cross-modal attention output. The features are weighted and summed according to the attention scores to highlight the contributions of important modalities. The weighted features are concatenated along the feature dimension to form a high-dimensional feature vector. The feature dimension is reduced through a fully connected layer to output a fused feature map.
[0092] In practice, the process of constructing a spatiotemporal graph structure model begins by defining each data point in the normalized data stream as a graph node. Node attributes include precise timestamps and three-dimensional spatial coordinates. Edge connections are created based on the temporal proximity and spatial relevance of data points. Edge weights reflect the strength of data association and are computed from the Euclidean distance between data points and the reciprocal of their time difference. A specific time point and spatial origin in the spatiotemporal graph structure model are selected as alignment benchmarks. The feature similarity between each node and the benchmark node is calculated. Cosine similarity is used to measure the directional consistency between the feature vectors of two nodes, and is computed from the inner product and norms of the node attribute vectors. Node positions and edge weights are adjusted according to the calculated similarity values. Data is synchronized in the spatiotemporal dimension through translation and scaling operations. Finally, all adjusted nodes and edges are merged to form a synchronized data set for subsequent processing. In practice, the synchronized data set is input into the multimodal feature fusion network, which includes a set of modality-specific encoders and a shared encoder. The modality-specific encoders are used to extract vehicle motion state features, environmental perception features, and control command features respectively. Each modality-specific encoder is a multilayer perceptron and is responsible for mapping the original modal data to a high-dimensional feature space. A cross-modal attention mechanism is applied in the shared encoder to calculate the attention score of each modality feature in order to realize information interaction between modalities.
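The graph construction step can be illustrated with the sketch below. The text does not give the exact edge-weight formula, so the product `1 / (1 + distance) * 1 / (1 + time difference)` used here is an assumption, as are the `Node` class, the cutoffs `max_dt` and `max_dist`, and the function names. Only the cosine similarity follows directly from the description (inner product divided by the product of the norms).

```python
import math
from dataclasses import dataclass

@dataclass
class Node:
    timestamp: float   # seconds
    position: tuple    # (x, y, z) spatial coordinates
    features: tuple    # feature vector attached to the data point

def cosine_similarity(u, v):
    """Inner product divided by the product of the norms; 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def edge_weight(a, b):
    """Assumed weighting: closer in space and time gives a weight nearer to 1."""
    dist = math.dist(a.position, b.position)
    dt = abs(a.timestamp - b.timestamp)
    return (1.0 / (1.0 + dist)) * (1.0 / (1.0 + dt))

def build_edges(nodes, max_dt, max_dist):
    """Connect node pairs that are close in both time and space."""
    edges = {}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            a, b = nodes[i], nodes[j]
            if (abs(a.timestamp - b.timestamp) <= max_dt
                    and math.dist(a.position, b.position) <= max_dist):
                edges[(i, j)] = edge_weight(a, b)
    return edges

def similarity_to_reference(nodes, ref_index=0):
    """Cosine similarity of every node's features to the reference node."""
    ref = nodes[ref_index].features
    return [cosine_similarity(n.features, ref) for n in nodes]
```

The similarity values returned by `similarity_to_reference` would then drive the translation and scaling adjustments; that adjustment rule is not specified in enough detail here to sketch without guessing.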
[0093] In practical implementation, when applying the cross-modal attention mechanism, each modal feature is input into an independent linear transformation layer to generate corresponding query vector, key vector, and value vector. The dot product of the query vector and key vector is calculated to obtain unnormalized attention weights. The attention weights are then normalized using the softmax function, ensuring that the sum of all weights is one. The normalized weights are multiplied by the value vector to generate a weighted modal feature representation. The weighted features of all modalities are aggregated, and the cross-modal attention output is obtained through a summation operation. The formula for calculating the cross-modal attention weights is expressed as:
[0094] $\alpha_{ij} = \dfrac{\exp\left(Q_i \cdot K_j / \sqrt{d}\right)}{\sum_{k=1}^{M} \exp\left(Q_i \cdot K_k / \sqrt{d}\right)}$;
[0095] Where: $Q_i$ represents the query vector of the i-th modality; $K_j$ represents the key vector of the j-th modality; $d$ represents the dimension of the vectors; $M$ represents the total number of modalities; $\alpha_{ij}$ represents the attention weight of the i-th modality to the j-th modality; $\exp$ represents the natural exponential function; $K_k$ represents the key vector of the k-th modality; and $k$ is the summation index variable. The features are weighted and summed based on the attention scores to highlight the contribution of important modalities. The weighted vehicle motion state features, environmental perception features, and control command features are concatenated along the feature dimension to form a high-dimensional feature vector. The feature dimension is reduced through a fully connected layer, and the fused feature map is output.
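A minimal sketch of the softmax-normalized dot-product attention across modalities follows. It takes the query, key, and value vectors as already computed, so the linear transformation layers are omitted. Dividing the dot product by the square root of the vector dimension is an assumption consistent with the dimension term in the definitions, and the function names are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax; the outputs sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modal_attention(queries, keys, values):
    """Return (weights, outputs) for M modalities.

    weights[i][j] is the attention of modality i to modality j;
    outputs[i] is the attention-weighted sum of all value vectors.
    """
    d = len(keys[0])
    weights, outputs = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        alpha = softmax(scores)
        weights.append(alpha)
        outputs.append([sum(alpha[j] * values[j][c] for j in range(len(values)))
                        for c in range(len(values[0]))])
    return weights, outputs
```

When all key vectors are identical, every modality receives the same weight and each output is the plain average of the value vectors, which is a convenient sanity check. The per-modality outputs can then be concatenated and passed through a fully connected layer as described above.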
[0096] In some embodiments, the network structure of the modality-specific encoder can be customized according to the characteristics of the input data. In some embodiments, the depth of the shared encoder can be adjusted according to computational resource constraints. Optionally, the linear transformation layer in the attention mechanism can include a bias term to increase model flexibility. Optionally, layer normalization operations can be introduced after feature concatenation to stabilize the training process. It is understood that the cross-modal attention mechanism can dynamically adjust the contribution of different modalities. It is understood that the fused feature map encompasses the joint representation of multi-source data.
[0097] Example 3: Construct a deep generative model in which the encoder consists of multiple fully connected layers and the decoder consists of symmetric fully connected layers. Prepare a dataset containing fused feature maps from the normal mode as training samples. Define a loss function that combines reconstruction error and latent space distribution constraints. Iteratively train the deep generative model using the training samples until the loss function converges, obtaining a fully trained deep generative model. Input the fused feature map to be detected into the encoder of the fully trained deep generative model, mapping it to the latent space distribution. Sample latent vectors from the latent space distribution and input them into the decoder of the fully trained deep generative model for data reconstruction. Calculate the reconstruction error between the fused feature map to be detected and the reconstructed fused feature map. Set a dynamic probability threshold based on the reconstruction error distribution of all training samples, as follows. The reconstruction error values of all training samples during the training phase are collected to form an error sequence. Sliding window analysis is performed on the error sequence to calculate the error statistical features within each window. A probability density function of the error distribution is established based on the error statistical features. The error value corresponding to the preset quantile is determined as the initial threshold based on the probability density function. The initial threshold is dynamically adjusted according to the fluctuation of the error value in the real-time test data stream. The adjusted threshold is used as the dynamic probability threshold for identifying abnormal test points. Points in the data to be detected whose reconstruction error is higher than the dynamic probability threshold are identified as abnormal test points.
[0098] In practical implementation, the process of constructing a deep generative model includes designing the encoder and decoder structures. The encoder consists of multiple fully connected layers, each containing a specific number of neurons and employing an activation function. The decoder consists of symmetrical fully connected layers, with the same number of layers as the encoder but a decreasing number of neurons. A dataset containing fused feature maps from normal modes is prepared as training samples, covering various typical driving scenarios. A loss function is defined that combines reconstruction error and latent space distribution constraints. The form of the loss function can be expressed as:
[0099] $$L = \mathbb{E}\left[\lVert x - \hat{x} \rVert^{2}\right] + \gamma \, D_{KL}\left(q(z \mid x) \,\Vert\, p(z)\right)$$ ;
[0100] Where: $L$ represents the total loss value; $\mathbb{E}$ represents the mathematical expectation operator; $x$ represents the input fused feature map; $\hat{x}$ represents the fused feature map generated by reconstruction; $\gamma$ is a hyperparameter that balances reconstruction error and distribution constraints; $D_{KL}$ represents the Kullback-Leibler divergence; $q(z \mid x)$ is the latent space conditional distribution output by the encoder; and $p(z)$ is a pre-defined prior distribution in the latent space. The deep generative model is iteratively trained using training samples, and the model parameters are optimized through backpropagation until the loss function converges to a stable value, thus obtaining a fully trained deep generative model.
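As a worked illustration of this loss, the following sketch evaluates it for a single sample. It assumes that the encoder outputs a diagonal Gaussian described by a mean vector and a log-variance vector, that the prior is a standard Gaussian (so the Kullback-Leibler term has a closed form), and that the reconstruction error is the mean squared error. The function name `vae_loss` and the numeric values are hypothetical; in training, the expectation would be approximated by averaging over a batch.

```python
import math

def vae_loss(x, x_hat, mu, log_var, gamma=1.0):
    """Single-sample loss: reconstruction error plus gamma times the KL term.

    Assumes q(z|x) = N(mu, diag(exp(log_var))) and p(z) = N(0, I).
    Returns (total, reconstruction_error, kl_divergence).
    """
    # Mean squared error between the input and its reconstruction.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    # Closed-form KL divergence between a diagonal Gaussian and N(0, I).
    kl = 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))
    return recon + gamma * kl, recon, kl

# Hypothetical values: a 2-dimensional feature map and a 2-dimensional latent space.
total, recon, kl = vae_loss(
    x=[1.0, 2.0], x_hat=[1.0, 4.0],
    mu=[1.0, 0.0], log_var=[0.0, 0.0],
    gamma=2.0,
)
```

A larger value of the balancing hyperparameter pulls the latent distribution closer to the prior at the cost of reconstruction accuracy.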
[0101] In practical implementation, when using a well-trained deep generative model for anomaly detection, the fused feature map to be detected is input into the encoder of the well-trained deep generative model. The encoder maps the fused feature map to a latent space distribution, samples latent vectors from the latent space distribution, and inputs the latent vectors into the decoder of the well-trained deep generative model for data reconstruction, generating a reconstructed fused feature map. The reconstruction error between the fused feature map to be detected and the reconstructed fused feature map is calculated, and the reconstruction error is measured using mean squared error. A dynamic probability threshold is set based on the reconstruction error distribution of all training samples. The reconstruction error values of all training samples during the training phase are collected to form an error sequence. A sliding window analysis is performed on the error sequence, with the window size determined according to the data sampling rate. The error statistical characteristics within each window are calculated, including the error mean and error variance. A probability density function of the error distribution is established based on the error statistical characteristics. A non-parametric kernel density estimation method is used. The error value corresponding to the preset quantile is determined based on the probability density function as the initial threshold. The initial threshold is dynamically adjusted according to the fluctuation of the error value in the real-time test data stream. The adjustment method is based on the local statistics of the error sequence. The adjusted threshold is used as the dynamic probability threshold for identifying abnormal test points. Finally, points in the data to be tested whose reconstruction error is higher than the dynamic probability threshold are identified as abnormal test points.
[0102] In some embodiments, the fully connected layers of the encoder may use nonlinear activation functions to enhance representational power. In some embodiments, the prior distribution of the latent space may be a standard Gaussian distribution to simplify computation. Optionally, the hyperparameter γ in the loss function may be optimized using cross-validation. Optionally, the window size in the sliding window analysis may be adjusted according to computational efficiency requirements. It is understood that other distance metrics such as Manhattan distance may be used to calculate the reconstruction error. It is understood that a smoothing factor may be introduced in the dynamic probability threshold adjustment process to reduce the impact of noise.
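The threshold procedure can be sketched as follows. Several details are assumptions made for illustration and are not specified in the text: a Gaussian kernel is used for the kernel density estimate, its bandwidth is derived from the average window variance with a normal reference rule of thumb, and the dynamic adjustment blends the current threshold with a local statistic (mean plus three standard deviations of recent non-anomalous errors) using a smoothing factor. All function names and error values are hypothetical.

```python
import math
from statistics import mean, pstdev, pvariance

def window_stats(errors, window):
    """(mean, variance) of each sliding window over the error sequence."""
    return [(mean(errors[i:i + window]), pvariance(errors[i:i + window]))
            for i in range(len(errors) - window + 1)]

def kde_cdf(x, samples, bandwidth):
    """CDF at x of a Gaussian kernel density estimate built on samples."""
    return sum(0.5 * (1.0 + math.erf((x - s) / (bandwidth * math.sqrt(2.0))))
               for s in samples) / len(samples)

def kde_quantile(samples, q, bandwidth, tol=1e-9):
    """Error value whose KDE cumulative probability equals q (bisection)."""
    lo = min(samples) - 10.0 * bandwidth
    hi = max(samples) + 10.0 * bandwidth
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, bandwidth) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def update_threshold(threshold, recent, smoothing=0.1, k=3.0):
    """Blend the threshold with a local statistic of recent errors (assumed rule)."""
    target = mean(recent) + k * pstdev(recent)
    return (1.0 - smoothing) * threshold + smoothing * target

def detect(stream, threshold, window=5, smoothing=0.1):
    """Flag each streamed error, then adapt the threshold on normal points only."""
    recent, flags = [], []
    for e in stream:
        is_anomaly = e > threshold
        flags.append(is_anomaly)
        if not is_anomaly:
            recent = (recent + [e])[-window:]
            if len(recent) >= 2:
                threshold = update_threshold(threshold, recent, smoothing)
    return flags, threshold

# Hypothetical reconstruction errors collected during training.
train_errors = [0.10, 0.12, 0.11, 0.13, 0.09, 0.10, 0.12, 0.11, 0.14, 0.10]
stats = window_stats(train_errors, window=3)
local_std = math.sqrt(mean(v for _, v in stats))
bandwidth = 1.06 * local_std * len(train_errors) ** (-0.2)
initial_threshold = kde_quantile(train_errors, 0.99, bandwidth)

flags, final_threshold = detect([0.11, 0.12, 0.50, 0.10], initial_threshold)
```

Excluding flagged points from the local statistics is a design choice in this sketch; it keeps a burst of anomalies from raising the threshold that is meant to catch them.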
[0103] See Figure 4. The figure shows the effect of applying the anomaly detection system based on the deep generative model in intelligent driving simulation testing. It presents the reconstruction error distribution of the test samples, with blue dots indicating normal test points, red crosses indicating detected abnormal test points, and green dashed lines representing dynamically calculated probability threshold boundaries. The distribution of data points reflects the model's ability to distinguish between normal and abnormal driving modes, and the level of reconstruction error reflects the degree of deviation between the test data and the normal mode learned during training. The dynamic threshold line adjusts automatically according to the error distribution characteristics of the training data, adapting to the anomaly detection needs of different testing scenarios. The detected abnormal points cluster in the high-error region of the figure, which supports the effectiveness of the deep generative model in capturing abnormal patterns. This visualization provides an intuitive basis for the safety assessment of intelligent driving systems, helping engineers quickly identify potentially abnormal test scenarios.
[0104] Example 4: Initialize the particle swarm optimization algorithm, set the number of particles, number of iterations and inertia weight, randomly generate the initial positions of the particles, each position represents a set of test parameters, define the fitness function, evaluate the performance of the parameters, calculate the fitness value of each particle, update the individual optimal position and the global optimal position, adjust the particle velocity and position according to the particle velocity update formula, repeat the iteration until the convergence condition is met, and output the global optimal position as the optimal parameter configuration.
[0105] In practical implementation, the initialization of the particle swarm optimization algorithm includes setting algorithm parameters such as the number of particles, the number of iterations, and the inertia weight. These parameters are determined in advance based on the complexity of the test scenario. Initial particle positions are randomly generated, with each particle position representing a set of test parameters. These test parameters include gain coefficients or threshold settings of the control algorithm. A fitness function is defined to evaluate parameter performance; it is calculated based on vehicle trajectory tracking errors or collision avoidance success rates in the test scenario. Refer to Table 1 for the parameter settings of the particle swarm optimization algorithm.
[0106] Table 1: Parameter Settings for Particle Swarm Optimization Algorithm
[0107]
[0108] In practice, the fitness value of each particle is calculated by decoding the particle position into test parameters and driving the simulator to run. The individual optimal position and the global optimal position are updated based on a comparison of fitness values. The individual optimal position is the particle's own historical best solution, and the global optimal position is the current best solution for the entire population. The particle velocity and position are adjusted according to the particle velocity update formula, which is expressed as:
[0109] $$v_i^{k+1} = \omega \, v_i^{k} + c_1 r_1 \left(p_i - x_i^{k}\right) + c_2 r_2 \left(g - x_i^{k}\right)$$ ;
[0110] Where: $v_i^{k}$ represents the velocity vector of particle i in the k-th iteration; $\omega$ represents the inertia weight; $c_1$ represents the cognitive learning factor; $c_2$ represents the social learning factor; $r_1$ and $r_2$ represent uniformly distributed random numbers between zero and one; $p_i$ represents the individual optimal position vector of particle i; $g$ represents the global optimal position vector; and $x_i^{k}$ represents the position vector of particle i in the k-th iteration. The iteration process is repeated until the convergence condition is met. The convergence condition can be that the number of iterations reaches the maximum value or that the change in fitness value is less than a threshold. The global optimal position is output as the optimal parameter configuration.
[0111] In some embodiments, the number of particles can be scaled according to the dimension of the parameter search space. In some embodiments, the inertia weight can employ a linearly decreasing strategy to balance exploration and exploitation. Optionally, the fitness function can be a weighted sum of multiple performance metrics. Optionally, the convergence condition can include an early stopping mechanism to prevent overfitting. It is understood that particle position updates must be ensured to be within the boundaries of the parameter search space. It is understood that the random number in the velocity update formula introduces randomness to avoid local optima.
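The optimization loop can be sketched as below. This is a minimal illustration under stated assumptions: the fitness is maximized, a toy fitness function stands in for decoding the particle into test parameters and running the simulator, the inertia weight decreases linearly, positions are clamped to the search bounds, and only the maximum iteration count is used as the stopping condition. The numeric settings (20 particles, 50 iterations, learning factors of 2.0, inertia weight from 0.9 to 0.4) are placeholder values, not the values of Table 1, and the target gains in the toy fitness are hypothetical.

```python
import random

def pso_maximize(fitness, bounds, n_particles=20, n_iters=50,
                 w_start=0.9, w_end=0.4, c1=2.0, c2=2.0, seed=0):
    """Return (global_best_position, global_best_fitness, history)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g_idx = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g_idx][:], pbest_fit[g_idx]
    history = [gbest_fit]
    for k in range(n_iters):
        # Linearly decreasing inertia weight.
        w = w_start - (w_start - w_end) * k / max(1, n_iters - 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive term + social term.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Position update, clamped to the parameter search space.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history

def toy_fitness(params):
    """Stand-in for a simulator run: negative squared distance to assumed gains."""
    kp, kd = params
    return -((kp - 1.5) ** 2 + (kd - 0.3) ** 2)

bounds = [(0.0, 5.0), (0.0, 1.0)]
best, best_fit, history = pso_maximize(toy_fitness, bounds)
```

The `history` list records the global best fitness after each iteration and never decreases, which corresponds to the kind of convergence curve discussed for Figure 5.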
[0112] See Figure 5. The figure shows the convergence characteristics of the particle swarm optimization (PSO) algorithm in the optimization of control parameters for intelligent driving. The thick red line depicts the trajectory of the global optimal fitness as the number of iterations increases, while the thin light blue lines in the background show the evolution of the individual fitness of some particles. The performance of the optimization algorithm is reflected in the continuous improvement of the fitness value, from the initial random search to the later fine-tuning, which shows the algorithm's ability to explore the parameter space efficiently. The smooth upward trend of the convergence curve indicates that the algorithm can stably find better parameter configurations. The PSO process reflects the complexity of parameter tuning in actual engineering, using the collaborative search mechanism of swarm intelligence to find the optimal configuration for vehicle control performance in a multi-dimensional parameter space. This optimization result provides a basis for parameter setting in intelligent driving algorithms.
[0113] Example 5: Load the optimal parameter configuration, parse it into scene parameter values, initialize the 3D graphics engine, set the scene coordinate system and lighting parameters, generate terrain mesh, vehicle model and obstacle model according to the scene parameter values, bind test feature data to model vertex attributes to achieve dynamic rendering, enable user input interface, support viewpoint switching and parameter adjustment, update scene rendering in real time, and maintain the dynamic association between parameters and scene elements.
[0114] In practice, loading the optimal parameter configuration involves reading the parameter set obtained through the population optimization algorithm from a data file or memory. The optimal parameter configuration is stored as key-value pairs and parsed into scene parameter values. These scene parameter values include terrain dimensions, road curvature, obstacle position coordinates, and the initial state of the vehicle. The 3D graphics engine is initialized, the scene coordinate system is set to a right-handed coordinate system, and the lighting parameters are set, including ambient light intensity, directional (parallel) light direction, and point light source position. A terrain mesh is generated from heightmap data according to the scene parameter values, and the mesh vertices contain position coordinates and normal vector information. A vehicle model is generated; it is created in 3D modeling software and imported, and contains mesh data for components such as the vehicle body and wheels. An obstacle model is generated, including geometric data of objects such as cones and guardrails. All models are instantiated in the scene and placed at the parsed positions.
[0115] In practical implementation, test feature data is bound to model vertex attributes. The test feature data comes from real-time acquired vehicle motion states or control commands output by the algorithm, enabling dynamic rendering. Dynamic rendering is achieved by updating the model's transformation matrices and vertex attributes. The transformation matrices include translation, rotation, and scaling matrices, while vertex attributes include color and texture coordinates. A user input interface is enabled, supporting keyboard and mouse event handling, viewpoint switching, and parameter adjustment. Viewpoint switching is achieved by modifying the view and projection matrices, while parameter adjustment is completed by modifying scene parameter values and triggering scene updates. Scene rendering is updated in real time, with the rendering loop running at a fixed frequency to maintain the dynamic association between parameters and scene elements. This dynamic association is maintained through parameter change listeners and scene element update callback functions. The formula for updating the position of scene elements is expressed as:
[0116] $$P' = M \cdot P$$ ;
[0117] Where: $P'$ represents the new position vector of a scene element in 3D space; $M$ represents the composite transformation matrix calculated from the current parameters; and $P$ represents the original position vector of the scene element. The rendering pipeline continuously performs vertex shading and rasterization operations, outputting framebuffer data for interactive testing of the scene.
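The position update can be illustrated with a small worked example. It assumes homogeneous coordinates with 4x4 matrices, a column-vector convention, and a composite matrix built as translation times rotation times scaling, so that scaling is applied first. The helper names and numeric values are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def rotation_z(theta):
    """Rotation about the z axis by theta radians (right-handed)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def scaling(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]

def transform_point(m, p):
    """Apply a 4x4 matrix to a 3D point and return the new 3D position."""
    v = [p[0], p[1], p[2], 1.0]
    out = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    return out[:3]

# Composite matrix: scale by 2, rotate 90 degrees about z, then translate.
M = matmul(translation(10.0, 0.0, 2.0),
           matmul(rotation_z(math.pi / 2), scaling(2.0, 2.0, 2.0)))
P = (1.0, 0.0, 0.0)
P_new = transform_point(M, P)  # approximately (10, 2, 2)
```

Here the point (1, 0, 0) is scaled to (2, 0, 0), rotated to (0, 2, 0), and translated to (10, 2, 2), up to floating-point rounding.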
[0118] In some embodiments, the scene coordinate system may be a left-handed coordinate system to conform to the conventions of the graphics API. In some embodiments, the lighting model may be the Phong lighting model or a more complex physically based rendering model. Optionally, the level of detail of the terrain mesh may be dynamically adjusted based on the viewpoint distance. Optionally, the user input interface may support peripheral input such as game controllers. It is understood that the update of the view matrix is based on the position and orientation of the virtual camera. It is understood that parameter adjustments can be reflected in the visual appearance of scene elements in real time.
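The dynamic association between parameters and scene elements, maintained through parameter change listeners and update callbacks, can be sketched with a simple observer pattern. The class `SceneParameters`, the key `obstacle_x`, and the dictionary standing in for a scene element are hypothetical; a real graphics engine would expose its own scene graph and event interfaces.

```python
class SceneParameters:
    """Key-value store of scene parameters with change listeners."""

    def __init__(self, **values):
        self._values = dict(values)
        self._listeners = {}

    def on_change(self, key, callback):
        """Register a callback invoked with the new value when key changes."""
        self._listeners.setdefault(key, []).append(callback)

    def get(self, key):
        return self._values[key]

    def set(self, key, value):
        """Store the value and notify listeners only if it actually changed."""
        if self._values.get(key) == value:
            return
        self._values[key] = value
        for callback in self._listeners.get(key, []):
            callback(value)

# A dictionary stands in for an obstacle model placed in the scene.
obstacle = {"position": (5.0, 0.0, 0.0)}
params = SceneParameters(obstacle_x=5.0)

# Scene element update callback: move the obstacle when the parameter changes.
params.on_change("obstacle_x", lambda x: obstacle.update(position=(x, 0.0, 0.0)))

# A user adjustment triggers the callback, which updates the scene element.
params.set("obstacle_x", 12.5)
```

After the call to `set`, the obstacle position follows the adjusted parameter, and the next pass of the rendering loop would draw it at the new location.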
[0119] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.
[0120] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A method for testing control algorithms based on an intelligent driving simulator, characterized in that, The method is implemented through the following process: Collect multi-source test data generated in real time by the intelligent driving simulator, including vehicle motion state data, environmental perception data, and control command data; Wavelet transform denoising and adaptive normalization are performed on multi-source test data to generate a normalized data stream; A spatiotemporal graph structure model is constructed, which maps the normalized data stream to graph nodes and edges, and performs spatiotemporal alignment operations based on node similarity to generate a synchronized data set; The synchronous dataset is input into a multimodal feature fusion network, where cross-modal attention weighting and feature concatenation are performed to output a fused feature map. A deep generative model is used to calculate the reconstruction probability of the fused feature map, and abnormal test points are identified based on the probability threshold. The optimal parameter configuration is output after convergence by iteratively updating the parameter positions in the parameter search space using a population optimization algorithm, including: The particle swarm optimization algorithm is initialized by setting the number of particles, the number of iterations, and the inertia weight. Initial positions of particles are randomly generated, with each position representing a set of test parameters. A fitness function is defined to evaluate parameter performance. The fitness value of each particle is calculated, and the individual optimal position and the global optimal position are updated. The particle velocity and position are adjusted according to the particle velocity update formula. The iteration is repeated until the convergence condition is met, and the global optimal position is output as the optimal parameter configuration. 
The fitness function is calculated based on the vehicle trajectory tracking error or collision avoidance success rate in the test scenario. Based on the optimal parameter configuration, the 3D graphics engine generates interactive test scenes and establishes a dynamic relationship between parameters and scene elements. The step of using a deep generative model to calculate the reconstruction probability of the fused feature map includes: The deep generative model is constructed, wherein the encoder consists of multiple fully connected layers and the decoder consists of symmetric fully connected layers; a dataset containing fused feature maps in normal mode is prepared as training samples; a loss function combining reconstruction error and latent space distribution constraints is defined; the deep generative model is iteratively trained using the training samples until the loss function converges, obtaining a fully trained deep generative model; the fused feature map to be detected is input into the encoder of the fully trained deep generative model and mapped to the latent space distribution; latent vectors are sampled from the latent space distribution and input into the decoder of the fully trained deep generative model for data reconstruction; the reconstruction error between the fused feature map to be detected and the reconstructed fused feature map is calculated; a dynamic probability threshold is set based on the reconstruction error distribution of all training samples; points in the data to be detected whose reconstruction error is higher than the dynamic probability threshold are marked as abnormal test points.
2. The control algorithm testing method based on an intelligent driving simulator as described in claim 1, characterized in that, The wavelet transform denoising and adaptive normalization processing of the multi-source test data includes: The system continuously collects vehicle motion status data, environmental perception data, and control command data through the simulator data interface, and converts the data into a time series format. Missing values in time series are detected and filled using spline interpolation. Applying discrete wavelet transform to the complete time series decomposes it into approximation coefficients and detail coefficients; A threshold is set to perform soft thresholding on the detail coefficients to suppress high-frequency noise; The denoised signal is reconstructed using inverse wavelet transform; Calculate the maximum and minimum values of the reconstructed signal, and normalize the data to the zero-one interval using minimum-maximum scaling; Time-domain and frequency-domain features are extracted from normalized data and combined to form a normalized data stream.
3. The control algorithm testing method based on an intelligent driving simulator as described in claim 1, characterized in that, The construction of the spatiotemporal graph structure model maps the normalized data stream to graph nodes and edges, and performs spatiotemporal alignment operations based on node similarity to generate a synchronized data set, including: Each data point in the normalized data stream is defined as a graph node, and the node attributes include timestamp and spatial coordinates; Edge connections are created based on the temporal proximity and spatial correlation of data points, and the edge weights reflect the strength of the data association. Select a reference time point and spatial origin as the alignment datum; Calculate the feature similarity between each node and the baseline node, using cosine similarity as the metric. Adjust node positions and edge weights based on similarity to synchronize data in the spatiotemporal dimension; The adjusted nodes and edges are merged to form a synchronized data set.
4. The control algorithm testing method based on an intelligent driving simulator as described in claim 3, characterized in that, The process of inputting the synchronized data set into the multimodal feature fusion network, performing cross-modal attention weighting and feature concatenation, and outputting a fused feature map includes: Design a multimodal feature fusion network, which includes a modality-specific encoder and a shared encoder; Vehicle motion state features, environmental perception features, and control command features are extracted using modality-specific encoders. A cross-modal attention mechanism is applied in the shared encoder to calculate the attention score for each modal feature; The features are weighted and summed based on the attention scores to highlight the contributions of important modalities; The weighted features are concatenated along the feature dimensions to form a high-dimensional feature vector; The feature dimension is reduced by using fully connected layers, and a fused feature map is output.
5. The control algorithm testing method based on an intelligent driving simulator as described in claim 1, characterized in that, The iterative update of parameter positions in the parameter search space using a swarm optimization algorithm includes: Initialize the particle swarm optimization algorithm, set the number of particles, number of iterations and inertia weight, randomly generate the initial positions of the particles, each position represents a set of test parameters, define the fitness function and evaluate the parameter performance; Calculate the fitness value of each particle, update the individual optimal position and the global optimal position, and adjust the particle velocity and position according to the particle velocity update formula; Repeat the iteration until the convergence condition is met, and output the global optimal position as the optimal parameter configuration.
6. The control algorithm testing method based on an intelligent driving simulator as described in claim 1, characterized in that, The generation of interactive test scenes by the 3D graphics engine based on the optimal parameter configuration includes: Load the optimal parameter configuration, parse it into scene parameter values, initialize the 3D graphics engine, set the scene coordinate system and lighting parameters, and generate the terrain mesh, vehicle model and obstacle model based on the scene parameter values; Bind test feature data to model vertex attributes to achieve dynamic rendering, enable the user input interface, support viewpoint switching and parameter adjustment, update scene rendering in real time, and maintain the correlation between parameters and scene elements.
7. The control algorithm testing method based on an intelligent driving simulator as described in claim 2, characterized in that, The continuous acquisition of vehicle motion state data, environmental perception data, and control command data through the simulator data interface includes: Connect to the sensor output port of the intelligent driving simulator to stream vehicle speed, angular velocity, and position data; Point cloud data and image data, including obstacle shapes and road markings, are acquired from the environmental simulation subsystem; Monitor the output port of the control algorithm and record throttle opening, brake pressure, and steering angle commands; The data is packaged into timestamp data packets and transmitted to the processing unit.
8. The control algorithm testing method based on an intelligent driving simulator as described in claim 4, characterized in that, The application of the cross-modal attention mechanism calculates the attention score for each modal feature, including: Each modal feature is input into a linear transformation layer to generate a query vector, a key vector, and a value vector. The dot product of the query vector and the key vector is calculated to obtain the attention weights. The attention weights are normalized using the softmax function, and the normalized weights are multiplied by the value vector to generate a weighted modal feature representation. By aggregating the weighted features of all modalities, a cross-modal attention output is obtained.
9. The control algorithm testing method based on an intelligent driving simulator as described in claim 8, characterized in that, The dynamic probability threshold set based on the reconstruction error distribution of all training samples includes: The reconstruction error values of all training samples during the training phase are collected to form an error sequence; Perform sliding window analysis on the error sequence and calculate the error statistical characteristics within each window; Establish the probability density function of the error distribution based on the statistical characteristics of the error; The error value corresponding to the preset quantile is determined based on the probability density function and used as the initial threshold. The initial threshold is dynamically adjusted based on the fluctuation of error values in the real-time test data stream; The adjusted threshold is used as a dynamic probability threshold for identifying abnormal test points.