A traffic prediction method fusing spatial additive attention and a gated mixture of experts
By integrating spatial additive attention with a gated mixture of experts, this traffic prediction method addresses the high computational overhead and limited feature representation of existing methods when modeling complex spatiotemporal dependencies. It achieves lightweight, efficient traffic data prediction, improving the practicality and accuracy of the model.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: NANJING UNIV OF POSTS & TELECOMM
- Filing Date: 2026-04-02
- Publication Date: 2026-06-19
Figure CN122245114A_ABST
Description
Technical Field
[0001] This invention relates to a traffic prediction method that integrates spatial additive attention and a gated mixture of experts, belonging to the field of traffic prediction technology.
Background Technology
[0002] As a crucial component of Intelligent Transportation Systems (ITS), traffic forecasting is essential for the efficiency and accuracy of intelligent traffic scheduling and management. Its predictive accuracy is vital not only for improving road safety and alleviating traffic congestion, but also for providing significant support for key application scenarios such as urban traffic scheduling, dynamic route planning, and personalized travel services. The challenges of traffic forecasting stem from the high complexity of its data. The data is tightly coupled in both spatiotemporal dimensions, exhibiting complex dependencies. Spatially, changes in node states rapidly impact surrounding areas, and their spatial dependencies are not simple sequential relationships, making them difficult to capture fully by traditional sequence models. Temporally, traffic data displays multi-scale mixed patterns, including continuity between adjacent times, daily periodicity represented by morning and evening rush hours, and weekly periodicity of differences between weekdays and weekends. These patterns overlap, requiring models to simultaneously identify and model them.
[0003] Extensive research has been conducted on traffic forecasting for decades. Early on, traffic forecasting was directly viewed as a time-series forecasting problem, with researchers using Autoregressive Integrated Moving Average (ARIMA) for prediction. Subsequently, researchers increasingly employed Support Vector Regression (SVR) to develop data-driven methods capable of handling more complex traffic data structures. However, traditional machine learning methods have limitations in effectively modeling the spatiotemporal dependencies of traffic data and are also insufficient in capturing and analyzing complex spatiotemporal patterns. Multilayer Perceptrons (MLPs), as classic feedforward neural networks, have also been widely used in traffic forecasting. Through multilayer fully connected structures, they perform nonlinear fitting and feature mapping on traffic data, enabling the modeling of simple spatiotemporal correlations and laying the foundation for subsequent deep learning models. In recent years, deep learning technology has demonstrated remarkable effectiveness in modeling high-dimensional spatiotemporal data for traffic forecasting, outperforming traditional time-series statistical methods and shallow machine learning methods. The core principle of deep learning-based traffic forecasting methods lies in utilizing diverse deep learning techniques to extract the inherent spatiotemporal features of traffic data.
[0004] In spatial modeling, convolutional neural networks (CNNs) have successfully modeled spatial dependencies, outperforming traditional methods. However, they are well suited to Euclidean data and perform poorly on non-Euclidean data, such as the irregular graph structures of traffic networks. Graph neural networks (GNNs), by passing information along edges and embedding nodes together with their features and local topology, effectively capture the spatial dependencies of traffic data. Many GNN-based traffic prediction methods have been developed and explored; however, the prediction accuracy obtained with manually predefined graph structures is inherently limited by the quality of those graphs. To address this issue, adaptively generated graph structures have been proposed. Furthermore, neural ordinary differential equations model continuous dynamical systems by using a neural network to parameterize the derivative of the hidden state, rather than relying on a predefined discrete sequence of layers.
[0005] In temporal modeling, recurrent neural networks (RNNs) are widely used for traffic prediction because they can capture temporal dependencies in sequential data. Compared to traditional RNNs, Long Short-Term Memory (LSTM) networks employ a gating mechanism that allows information to be selectively stored and discarded, thus handling long-term dependencies more effectively. Gated Recurrent Units (GRUs), a simplified gated variant, also address the problems of long-term dependencies and vanishing gradients.
[0006] While existing methods based on deep learning have made significant breakthroughs in traffic prediction, they still have some significant limitations when dealing with complex traffic data prediction tasks, particularly in the following aspects:
[0007] (1) Reliance on explicit topology modeling results in complex structures and high computational overhead.
[0008] Traffic prediction methods based on graph neural networks (GNNs) model spatial dependencies in traffic networks by performing convolution operations on adjacency matrices and node features. Their modeling effectiveness largely depends on how the adjacency matrix is constructed and is highly dependent on a predefined road network topology. These methods typically require designing graph structure learning modules or relying on prior topological information, leading to increasingly complex implementations and larger model sizes. This results in significant computational overhead and engineering implementation costs, placing high demands on experimental equipment and computing resources. Furthermore, the complex topological dependencies and multi-layered graph convolutional structures also reduce the model's generalization ability and inference efficiency, making it difficult to efficiently complete traffic data prediction tasks in lightweight, low-resource scenarios.
[0009] (2) Ignoring regional spatial heterogeneity, global modeling lacks adaptive differentiation capabilities.
[0010] RNN-based temporal prediction methods typically employ a single, globally shared recurrent unit to process traffic sequences across all spatial nodes, assuming all regions follow a uniform temporal evolution pattern. These methods struggle to effectively characterize the spatial heterogeneity of different urban areas (such as commercial districts, residential areas, transportation hubs, and suburban roads) and cannot adaptively distinguish traffic change patterns across different road segments. Furthermore, traditional models often use uniform feature fusion methods, lacking the ability to adaptively focus on and weight key spatial information, making it difficult to highlight high-impact areas and key features. This results in limited prediction accuracy, insufficient generalization, and inadequate robustness in complex and heterogeneous traffic networks.
[0011] (3) Traditional mixture-of-experts models have a simple structure and limited multimodal feature expression.
[0012] Traditional mixture-of-experts models often employ a fixed feature fusion approach involving all experts, processing all channel features uniformly without differentiated routing or specialized learning. These models struggle to fully characterize the heterogeneous distribution and complex multimodal features of traffic data across the channel dimension, preventing different experts from focusing on specific traffic patterns. Furthermore, without an effective gating mechanism, all experts participate in the entire computation, which easily leads to significant redundant computation and wasted parameters. As model capacity increases, computational cost becomes difficult to control, and the model cannot adaptively focus on the features that are most critical and effective for prediction, ultimately limiting its expressive performance and inference efficiency in complex traffic data prediction tasks.
[0013] (4) Simple linear projection has limited feature representation capabilities.
[0014] Most prediction heads employ simple, fixed, fully connected projection layers that apply a linear mapping directly to high-dimensional hidden features, and so cannot adaptively model the dynamic characteristics of traffic data. Such fixed-structure projection methods have high parameter redundancy, making it difficult to efficiently weight and reconstruct the hidden states. When dealing with complex, non-linear, and non-stationary traffic data, they are prone to feature information loss and insufficient representation, failing to achieve accurate and efficient prediction mapping while maintaining low computational overhead.
Summary of the Invention
[0015] The technical problem to be solved by this invention is to provide a traffic prediction method that integrates spatial additive attention and a gated mixture of experts, which improves prediction accuracy for complex, spatiotemporally heterogeneous traffic data on the basis of a lightweight structure, highlighting its practicality and efficiency as a core component of intelligent transportation systems.
[0016] To solve the above technical problem, the present invention adopts the following technical solution: a traffic prediction method that integrates spatial additive attention and a gated mixture of experts, in which the following Steps A to C are performed to obtain a traffic prediction model. The traffic prediction model is then applied to the historical traffic data of a target number of target traffic monitoring points to predict the traffic data of each target traffic monitoring point after the target time interval.
[0017] Step A. Collect a preset number of traffic data records from each traffic monitoring point. The traffic data collected at one time from at least one traffic monitoring point forms a single sample; construct the sample set from these samples, then proceed to Step B.
[0018] Step B. Connect the temporal dependency refinement (TDR) module, the spatiotemporal embedding (STE) layer, the spatial additive attention MLP (SAM) module, the gated routing channel-wise mixture-of-experts (GR-CMoE) module, and the adaptive low-rank projection (ALRP) layer in series from input to output to construct the network to be trained, then proceed to Step C.
[0019] Step C. Based on the sample set, take the traffic data of each traffic monitoring point in a sample as input and the traffic data of each traffic monitoring point after the target time interval as output, and train the network to be trained to obtain the traffic prediction model.
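As an illustration of Steps A and C, the following minimal sketch builds (input, output) sample pairs from a recorded series with a sliding window. The sliding-window construction, the function name `build_samples`, the parameters `history_len` and `horizon`, and the array layout (time steps, monitoring points, features) are assumptions made for this example and are not stated in the description.

```python
import numpy as np

def build_samples(series, history_len, horizon):
    """series: (total_steps, num_points, num_features).

    Returns inputs of shape (S, history_len, num_points, num_features) and
    targets of shape (S, horizon, num_points, num_features).
    """
    inputs, targets = [], []
    last_start = series.shape[0] - history_len - horizon
    for start in range(last_start + 1):
        split = start + history_len
        inputs.append(series[start:split])            # historical window
        targets.append(series[split:split + horizon]) # window to be predicted
    return np.stack(inputs), np.stack(targets)

# Toy series: 10 time steps, 2 monitoring points, 1 feature (e.g. speed).
series = np.arange(20, dtype=float).reshape(10, 2, 1)
x, y = build_samples(series, history_len=4, horizon=2)
print(x.shape, y.shape)
```

Each target window starts immediately after its input window, so the model is trained to map recent history to the values that follow it.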
[0020] As a preferred technical solution of the present invention: the temporal dependency refinement module consists of a transpose layer, a linear layer, a GELU layer, a linear layer, and a transpose layer connected in series from input to output. The input of the first transpose layer constitutes the input of the temporal dependency refinement module, and the output of the last transpose layer constitutes the output of the temporal dependency refinement module.
[0021] The input of the temporal dependency refinement module receives the input data $X$, whose dimensions are $B$, $N$, $C$, and $T$. The module computes, according to the following formula:
[0022] $X' = \left(W_2\,\mathrm{GELU}\left(W_1 X^{\top} + b_1\right) + b_2\right)^{\top}$;
[0023] to obtain the output $X'$, where $B$ denotes the number of input samples received by the network to be trained in a single iteration; $N$ denotes the target number of traffic monitoring points corresponding to the input data; $C$ denotes the number of feature dimensions contained in the traffic data; $T$ denotes the length of the time series of traffic data from each traffic monitoring point in the input received in a single iteration; $W_1$ and $W_2$ denote the weight matrices for dimension increase and dimension reduction, respectively; $b_1$ and $b_2$ are bias terms; $\mathrm{GELU}(\cdot)$ is the GELU activation function; $(\cdot)^{\top}$ denotes the transposition performed by the transpose layers; and $X'$ denotes the intermediate features enhanced with the temporal dependencies extracted by the nonlinear mapping along the time dimension.
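The transpose, linear, GELU, linear, transpose chain of the temporal dependency refinement module can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the axis order (B, T, N, C), the hidden width `H`, the parameter names, and the tanh approximation of GELU are choices made for the example, not details given in the description.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the GELU activation (an assumption of this sketch).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def temporal_dependency_refinement(x, w_up, b_up, w_down, b_down):
    """x: (B, T, N, C); w_up: (T, H); w_down: (H, T). Returns (B, T, N, C)."""
    xt = np.swapaxes(x, 1, 3)            # transpose layer: (B, C, N, T), time last
    hidden = gelu(xt @ w_up + b_up)      # linear layer (dimension increase) + GELU
    refined = hidden @ w_down + b_down   # linear layer (dimension reduction to T)
    return np.swapaxes(refined, 1, 3)    # transpose layer: back to (B, T, N, C)

rng = np.random.default_rng(0)
B, T, N, C, H = 2, 12, 3, 1, 24
x = rng.normal(size=(B, T, N, C))
params = (rng.normal(size=(T, H)) * 0.1, np.zeros(H),
          rng.normal(size=(H, T)) * 0.1, np.zeros(T))
out = temporal_dependency_refinement(x, *params)
print(out.shape)
```

Because the two linear layers act only on the time axis, each monitoring point and feature channel is refined independently of the others, which keeps the module lightweight.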
[0024] As a preferred embodiment of the present invention: the spatiotemporal embedding layer includes a time feature embedding module, a time-series feature embedding module, an adaptive spatial embedding module, and a concatenation module. Based on the daily feature, obtained by dividing a day into $T_d$ time slices, and the weekly feature, obtained from the $T_w$ time units contained in a week, the daily and weekly features are input to the time feature embedding module, which first retrieves them from index lookup tables according to the following formula:
[0025] $E_d = \mathrm{Emb}_d(i_d), \quad E_w = \mathrm{Emb}_w(i_w)$;
[0026] to obtain the daily time embedding features $E_d \in \mathbb{R}^{N \times D_d}$ and the weekly time embedding features $E_w \in \mathbb{R}^{N \times D_w}$, which are then concatenated as $E_t = E_d \,\Vert\, E_w$ to obtain time embedding features of size $N \times D_t$, where $i_d$ and $i_w$ denote the daily index corresponding to the daily feature and the weekly index corresponding to the weekly feature, respectively; $\mathrm{Emb}_d$ and $\mathrm{Emb}_w$ denote the embedding lookup operations based on the daily and weekly indexes, respectively; $\mathbb{R}$ denotes the real number field; $N$ denotes the target number of traffic monitoring points corresponding to the input data; $D_d$ and $D_w$ denote the daily and weekly time embedding dimensions, respectively; and $D_t = D_d + D_w$ denotes the time embedding dimension;
[0027] The input of the time-series feature embedding module receives the output $X'$ of the temporal dependency refinement module and performs the data embedding operation $E_f = \mathrm{FC}(X')$ to obtain and output time-series embedding features of size $N \times D_f$. The adaptive spatial embedding module introduces and outputs learnable spatial embedding features $E_s$ of size $N \times D_s$; where $\mathrm{FC}(\cdot)$ denotes the fully connected layer function, and $D_f$ and $D_s$ denote the time-series feature embedding dimension and the adaptive spatial embedding dimension, respectively.
[0028] The outputs of the time feature embedding module, the time-series feature embedding module, and the adaptive spatial embedding module are each connected to an input of the concatenation module, and the output of the concatenation module constitutes the output of the spatiotemporal embedding layer. The concatenation module concatenates its three inputs as $H = E_f \,\Vert\, E_t \,\Vert\, E_s$ to obtain and output spatiotemporal embedding features of size $N \times D$, where $D = D_f + D_t + D_s$ denotes the spatiotemporal feature embedding dimension.
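The lookup-and-concatenate structure of the spatiotemporal embedding layer can be sketched as below. The sketch is illustrative only: the function and parameter names, the flattening of each point's history into one vector, the 288 slots per day (5-minute sampling), the 7 units per week, and all embedding widths are assumptions rather than values from the description.

```python
import numpy as np

def spatiotemporal_embedding(x_flat, day_idx, week_idx, p):
    """x_flat: (N, T*C) refined history per point; day_idx, week_idx: (N,) ints."""
    e_series = x_flat @ p["w_fc"] + p["b_fc"]             # (N, D_f) fully connected layer
    e_day = p["day_table"][day_idx]                       # (N, D_d) index lookup
    e_week = p["week_table"][week_idx]                    # (N, D_w) index lookup
    e_time = np.concatenate([e_day, e_week], axis=-1)     # (N, D_d + D_w)
    # Concatenate time-series, time, and learnable spatial embeddings.
    return np.concatenate([e_series, e_time, p["spatial"]], axis=-1)

rng = np.random.default_rng(0)
N, T, C = 4, 12, 1
slots_per_day, units_per_week = 288, 7   # assumed: 5-minute slots, 7 days
D_f, D_d, D_w, D_s = 8, 4, 4, 6
p = {
    "w_fc": rng.normal(size=(T * C, D_f)), "b_fc": np.zeros(D_f),
    "day_table": rng.normal(size=(slots_per_day, D_d)),
    "week_table": rng.normal(size=(units_per_week, D_w)),
    "spatial": rng.normal(size=(N, D_s)),  # one learnable vector per monitoring point
}
day_idx = np.full(N, 100)   # all points share the same time stamp
week_idx = np.full(N, 2)
h = spatiotemporal_embedding(rng.normal(size=(N, T * C)), day_idx, week_idx, p)
print(h.shape)
```

The spatial table is indexed by monitoring point rather than by graph neighbours, which is how this kind of embedding can stand in for explicit road network topology.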
[0029] As a preferred embodiment of the present invention: the spatial additive attention MLP module includes an adaptive grouping discriminator, a multilayer perceptron module, three multiplication modules, and three linear layers. One input of the first multiplication module constitutes the input of the spatial additive attention MLP module and receives the spatiotemporal embedding features $H$ output by the spatiotemporal embedding layer. The other input of the first multiplication module is connected to the output of the adaptive grouping discriminator, which divides the $N$ traffic monitoring points corresponding to the input data into $G$ groups to obtain and output the adaptive grouping matrix $A \in \mathbb{R}^{N \times G}$. The first multiplication module multiplies its two inputs as $Z = A^{\top} H$ to obtain and output group-level aggregation features of size $G \times D$, where $A^{\top}$ denotes the transpose of $A$;
[0030] The output of the first multiplication module is connected to the input of the multilayer perceptron module, and the output of the multilayer perceptron module is connected to the inputs of two of the linear layers. The multilayer perceptron module processes its input as $\tilde{Z} = \mathrm{MLP}(Z)$ to obtain and output enhanced features of size $G \times D$ after intra-group pattern modeling, where $\mathrm{MLP}(\cdot)$ denotes the multilayer perceptron function. One of the two linear layers maps its input $\tilde{Z}$ to the query $Q$, introduces a learnable attention vector $w_a$, obtains the attention weight vector as $\alpha = Q\,w_a / \sqrt{d}$, and further obtains the global query vector as $q = \sum_{i} \alpha_i Q_i$, which is output to one input of the second multiplication module, where $d$ denotes the dimension of the query $Q$, $\alpha_i$ denotes the $i$-th component of the attention weight vector $\alpha$, and $Q_i$ denotes the $i$-th component of the query $Q$. The other linear layer maps its input $\tilde{Z}$ to the key matrix $K$, which is output to the other input of the second multiplication module;
[0031] The output of the second multiplication module is connected to the input of the third linear layer, and the output of the third linear layer is connected to one input of the third multiplication module. The third linear layer also receives the query $Q$. The second multiplication module multiplies its two inputs element-wise, and the third linear layer then computes, according to the following formula:
[0032] $O = \mathrm{Linear}\left(q \odot K\right) + \hat{Q}$;
[0033] to obtain and output the feature vector $O$ of the attention module, where $\odot$ denotes element-wise multiplication, $\hat{Q}$ denotes the normalization result of the query $Q$, and $\mathrm{Linear}(\cdot)$ denotes a linear transformation function;
[0034] The other input of the third multiplication module receives the adaptive grouping matrix $A$. The third multiplication module obtains the attention-enhanced node-level features as $H_a = A\,O$. The output of the third multiplication module is then combined with the spatiotemporal embedding features $H$ output by the spatiotemporal embedding layer as $H_s = H + H_a$ to obtain spatially enhanced embedding features $H_s$ of size $N \times D$, which constitute the output of the spatial additive attention MLP module.
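The data flow of the spatial additive attention MLP module (group, aggregate, apply additive attention over groups, scatter back to nodes) can be sketched as follows. Several pieces are assumptions made for this example because the description does not fix them: the grouping discriminator is taken to be a softmax over a linear score, the MLP is a two-layer ReLU network, the query and key widths equal the embedding width, and the final combination with the input is a residual sum.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_additive_attention(h, p):
    """h: (N, D) spatiotemporal embeddings. Returns (N, D)."""
    a = softmax(h @ p["w_group"], axis=-1)               # (N, G) soft grouping matrix (assumed form)
    z = a.T @ h                                          # (G, D) group-level aggregation
    z = np.maximum(z @ p["w_mlp1"], 0.0) @ p["w_mlp2"]   # (G, D) intra-group MLP
    q = z @ p["w_q"]                                     # (G, D) queries
    k = z @ p["w_k"]                                     # (G, D) keys
    alpha = q @ p["w_a"] / np.sqrt(q.shape[-1])          # (G,) additive attention weights
    q_global = (alpha[:, None] * q).sum(axis=0)          # (D,) global query vector
    q_hat = q / (np.linalg.norm(q, axis=-1, keepdims=True) + 1e-8)  # normalized queries
    o = (k * q_global) @ p["w_t"] + q_hat                # (G, D) attention output
    return h + a @ o                                     # scatter to nodes, add residual (assumed)

rng = np.random.default_rng(0)
N, D, G = 6, 8, 3
p = {
    "w_group": rng.normal(size=(D, G)),
    "w_mlp1": rng.normal(size=(D, 2 * D)) * 0.1,
    "w_mlp2": rng.normal(size=(2 * D, D)) * 0.1,
    "w_q": rng.normal(size=(D, D)), "w_k": rng.normal(size=(D, D)),
    "w_a": rng.normal(size=D), "w_t": rng.normal(size=(D, D)),
}
h = rng.normal(size=(N, D))
out = spatial_additive_attention(h, p)
print(out.shape)
```

Attention is computed over the G groups rather than over all N nodes, and a single global query replaces pairwise query-key products, so the cost grows linearly with the number of groups.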
[0035] As a preferred embodiment of the present invention: the gated routing channel-wise mixture-of-experts module includes an average pooling layer, a sigmoid activation layer, a multiplication module, a temperature scaling module, a softmax module, an MLP feedforward network group, and two linear layers. The input of the average pooling layer and the input of one of the linear layers together constitute the input of the gated routing channel-wise mixture-of-experts module and receive the spatially enhanced embedding features $H_s$ output by the spatial additive attention MLP module. The output of that linear layer is connected to one input of the multiplication module; the linear layer processes the received $H_s$ as $S = H_s W_r$ to obtain and output the unnormalized routing scores $S \in \mathbb{R}^{N \times E}$ of each traffic monitoring point for each expert, where $W_r$ is a learnable weight matrix and $E$ denotes the preset number of experts;
[0036] The output of the average pooling layer is connected to the input of the other linear layer, the output of that linear layer is connected to the input of the sigmoid activation layer, and the output of the sigmoid activation layer is connected to the other input of the multiplication module. The average pooling layer processes the received spatially enhanced embedding features $H_s$ as $\bar{h} = \mathrm{AvgPool}(H_s)$ to obtain and output the average pooling result, where $\mathrm{AvgPool}(\cdot)$ denotes the average pooling function. The linear layer and the sigmoid activation layer then process $\bar{h}$ as $M = \sigma\left(W_g \bar{h} + b_g\right)$ to obtain and output the global modulation coefficient matrix $M$ of the experts, where $W_g$ is a learnable weight matrix, $b_g$ is a bias term, and $\sigma(\cdot)$ denotes the sigmoid activation function;
[0037] The output of the multiplication module is connected to the input of the temperature scaling module, and the output of the temperature scaling module is connected to the input of the Softmax module. The multiplication module and the temperature scaling module process the two received inputs $S$ and $M$ as $\tilde{S} = (S \odot M)/\tau$ to obtain and output $\tilde{S}$, where $\tilde{S}$ denotes the expert selection score matrix after modulation and fusion, and $\tau$ denotes a learnable temperature parameter used to adjust the smoothness of the distribution;
[0038] The output of the Softmax module is connected to the input of the MLP feedforward network group. The Softmax module applies a Top-K selection strategy: for each traffic monitoring point, the $K$ experts with the highest scores are selected and form the selected expert set $\mathcal{E}_n$ of that traffic monitoring point, and the selection scores are normalized as $p_{n,e} = \exp(\tilde{s}_{n,e}) / \sum_{j \in \mathcal{E}_n} \exp(\tilde{s}_{n,j})$ and output, where $\tilde{s}_{n,e}$ denotes the selection score in $\tilde{S}$ of the $e$-th expert in the selected expert set of the $n$-th traffic monitoring point, $\mathcal{E}_n$ denotes the selected expert set of the $n$-th traffic monitoring point, $p_{n,e}$ denotes the normalization result of $\tilde{s}_{n,e}$, and $\exp(\cdot)$ denotes the exponential function;
[0039] The MLP feedforward network group receives the spatially enhanced embedding features $H_s$ and the normalized selection scores of the experts in the selected expert set of each traffic monitoring point, and computes according to the following formula:
[0040] $y_n = \sum_{e \in \mathcal{E}_n} p_{n,e}\,\mathrm{MLP}_e(h_n)$;
[0041] to obtain the spatiotemporal representation features $y_n$ of each traffic monitoring point, enhanced by the gated-routing mixture of experts, which form the matrix $Y$ that serves as the output of the gated routing channel-wise mixture-of-experts module; where $y_n$ denotes the enhanced spatiotemporal representation features of the $n$-th traffic monitoring point, $h_n$ denotes the spatially enhanced embedding features in $H_s$ corresponding to the $n$-th traffic monitoring point, and $\mathrm{MLP}_e$ denotes the MLP network of the $e$-th expert in the selected expert set $\mathcal{E}_n$ of the $n$-th traffic monitoring point.
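The dual gating and Top-K routing described above can be sketched in NumPy as follows. The sketch rests on assumptions that the description does not pin down: average pooling is taken over the monitoring points, each expert is a two-layer ReLU MLP, the temperature is passed as a fixed number rather than learned, and all names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_routing_moe(h, p, top_k=2, tau=1.0):
    """h: (N, D). Returns outputs (N, D) and routing weights (N, E)."""
    scores = h @ p["w_route"]                                    # (N, E) per-point routing scores
    gate = sigmoid(h.mean(axis=0) @ p["w_gate"] + p["b_gate"])   # (E,) global modulation
    s = scores * gate / tau                                      # modulated, temperature-scaled
    top = np.argsort(-s, axis=-1)[:, :top_k]                     # (N, K) selected expert indices
    picked = np.take_along_axis(s, top, axis=-1)
    picked = np.exp(picked - picked.max(axis=-1, keepdims=True))
    picked /= picked.sum(axis=-1, keepdims=True)                 # softmax over selected experts only
    weights = np.zeros_like(s)
    np.put_along_axis(weights, top, picked, axis=-1)
    y = np.zeros_like(h)
    for e, (w1, w2) in enumerate(p["experts"]):
        rows = np.nonzero(weights[:, e])[0]                      # points routed to expert e
        if rows.size:
            expert_out = np.maximum(h[rows] @ w1, 0.0) @ w2
            y[rows] += weights[rows, e][:, None] * expert_out
    return y, weights

rng = np.random.default_rng(0)
N, D, E, hidden = 6, 8, 4, 16
p = {
    "w_route": rng.normal(size=(D, E)) * 0.3,
    "w_gate": rng.normal(size=(D, E)) * 0.3, "b_gate": np.zeros(E),
    "experts": [(rng.normal(size=(D, hidden)) * 0.1, rng.normal(size=(hidden, D)) * 0.1)
                for _ in range(E)],
}
h = rng.normal(size=(N, D))
y, weights = gated_routing_moe(h, p, top_k=2, tau=1.0)
print(y.shape, weights.sum(axis=1))
```

Only the K selected experts are evaluated for each monitoring point, so compute scales with K rather than with the total number of experts.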
[0042] As a preferred embodiment of the present invention: the adaptive low-rank projection layer includes a low-rank adapter, a transpose layer, a multiplication module, and a dimension adjustment module. The output of the low-rank adapter is connected to one input of the multiplication module. The low-rank adapter introduces a learnable low-rank parameter tensor and, according to the following formula:
[0043] ;
[0044] generates the node-adaptive projection weight matrix through a low-rank linear mapping and a nonlinear transformation, and then obtains and outputs the adaptive projection weights after dimension adjustment; the quantities in the formula are the low-rank dimension, the ReLU activation function, a learnable weight matrix, and the length of the prediction period;
[0045] The output of the transpose layer is connected to the other input of the multiplication module. The input of the transpose layer constitutes the input of the adaptive low-rank projection layer; it receives the output of the gated routing channel-wise mixture-of-experts module, transposes it, and outputs the result. The output of the multiplication module is connected to the input of the dimension adjustment module, and the multiplication module computes, according to the following formula:
[0046] ;
[0047] to obtain and output the tensor product, where the operation in the formula is a tensor contraction based on the Einstein summation convention.
[0048] The output of the dimension adjustment module constitutes the output of the adaptive low-rank projection layer, that is, the output of the network to be trained. The dimension adjustment module adjusts the dimensions of the received tensor product to obtain and output the prediction result.
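One way to read the adaptive low-rank projection layer is sketched below. Because the formulas are not recoverable from the text, the specific form is an assumption of this example: per-point low-rank factors `u` of shape (N, r) are mapped by a shared matrix `w_p`, passed through ReLU, and reshaped into one projection matrix per monitoring point, which is then contracted with the hidden features using Einstein summation. The explicit transpose layer is folded into the einsum index string, and all names are hypothetical.

```python
import numpy as np

def adaptive_low_rank_projection(y, u, w_p, horizon):
    """y: (B, N, D); u: (N, r); w_p: (r, D * horizon).

    Returns predictions (B, N, horizon) and the per-point weights (N, D, horizon).
    """
    n, d = y.shape[1], y.shape[2]
    # Low-rank linear mapping + ReLU, then reshape into per-point projection weights.
    w = np.maximum(u @ w_p, 0.0).reshape(n, d, horizon)
    # Einstein-summation contraction over the hidden dimension d.
    pred = np.einsum("bnd,ndt->bnt", y, w)
    return pred, w

rng = np.random.default_rng(0)
B, N, D, r, horizon = 2, 5, 8, 2, 3
y = rng.normal(size=(B, N, D))
u = rng.normal(size=(N, r))
w_p = rng.normal(size=(r, D * horizon))
pred, w = adaptive_low_rank_projection(y, u, w_p, horizon)
print(pred.shape, u.size + w_p.size, w.size)
```

In this form the learnable parameters number N·r + r·D·T' instead of the N·D·T' a full per-point projection would need, which is where the reduction in parameter redundancy comes from.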
[0049] As a preferred embodiment of the present invention: in Step C, the target loss function is constructed from the L1 loss and a load-balancing loss as follows, and the resulting target loss $\mathcal{L}$ is used to train the network to be trained;
[0050] $\mathcal{L} = \mathcal{L}_{1} + \lambda\,\mathcal{L}_{\mathrm{bal}}$;
[0051] ;
[0052] where $\mathcal{L}_{1}$ denotes the L1 loss and $\mathcal{L}_{\mathrm{bal}}$ denotes the load-balancing loss; $E$ denotes the total number of experts and $e$ denotes the expert index; $I_e$ denotes the current importance of the $e$-th expert, which measures the degree to which that expert is selected by the gating mechanism; $g_{n,e}$ denotes the gating weight assigned to the $e$-th expert by the $n$-th traffic monitoring point, generated by the gating network using the softmax function and satisfying $g_{n,e} \ge 0$ and $\sum_{e=1}^{E} g_{n,e} = 1$; and $\lambda$ denotes a non-negative hyperparameter.
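A combined objective of this kind can be sketched as follows. The L1 term and the weighting by a non-negative hyperparameter follow the text; the load-balancing term is not recoverable from it, so this example substitutes a common choice, the squared coefficient of variation of the expert importance (the column sums of the gating weights). That substitution, the function name, and the default value of `lam` are assumptions.

```python
import numpy as np

def target_loss(pred, target, gates, lam=0.01):
    """pred, target: same shape; gates: (N, E) gating weights, rows sum to 1."""
    l1 = np.abs(pred - target).mean()            # L1 prediction loss
    importance = gates.sum(axis=0)               # (E,) how much each expert is used
    # Assumed balance term: squared coefficient of variation of the importance.
    balance = importance.var() / (importance.mean() ** 2 + 1e-10)
    return l1 + lam * balance, l1, balance

target = np.zeros((4, 3))
pred = target + 1.0
uniform_gates = np.full((4, 2), 0.5)                  # both experts used equally
skewed_gates = np.tile(np.array([1.0, 0.0]), (4, 1))  # one expert takes everything

total_u, l1_u, bal_u = target_loss(pred, target, uniform_gates)
total_s, l1_s, bal_s = target_loss(pred, target, skewed_gates)
print(total_u, total_s)
```

With perfectly even expert usage the balance term vanishes and the objective reduces to the L1 loss; the more routing collapses onto a few experts, the larger the penalty.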
[0053] As a preferred embodiment of the present invention, the traffic data includes at least one of traffic speed and traffic flow.
[0054] Compared with existing technologies, the traffic prediction method described in this invention, which integrates spatial additive attention and a gated mixture of experts, has the following technical advantages:
[0055] (1) This invention designs a traffic prediction method that integrates spatial additive attention and a gated mixture of experts. Based on traffic data collected from multiple traffic monitoring points over the two time periods before and after the target time interval, a network is designed and trained to obtain a traffic prediction model for actual prediction. The designed network uses an MLP-based backbone to learn the spatiotemporal dynamic features of traffic data directly, without explicitly introducing road network topology information, which simplifies the model structure and reduces computational and implementation complexity. The spatial additive attention MLP module dynamically groups nodes and adaptively weights features, effectively characterizing regional spatial heterogeneity and compensating for the weakness of traditional MLPs in distinguishing key spatial information. The gated routing channel-wise mixture-of-experts module uses dual gating to route channel features adaptively, allowing different experts to focus on modeling different feature patterns and improving the ability to express channel heterogeneity and complex traffic patterns. The adaptive low-rank projection layer constructs projection weights in low-rank decomposed form and dynamically weights and reconstructs the hidden state, reducing parameter redundancy while achieving efficient and accurate prediction mapping. The overall design improves prediction accuracy for complex, spatiotemporally heterogeneous traffic data on the basis of a lightweight structure, highlighting its practicality and efficiency as a core component of intelligent transportation systems.
[0056] (2) The traffic prediction method designed in this invention has significant advantages in spatiotemporal dependency modeling, computational efficiency, long-term data processing, and adaptability to complex traffic data prediction tasks. Compared with existing technologies, it avoids the strong dependence of traditional graph models on topology, greatly simplifies the model structure, and reduces computational overhead and engineering implementation difficulty. The spatial additive attention MLP (SAM) module performs dynamic region division and adaptive feature weighting of road network nodes, effectively characterizes the spatial heterogeneity of the road network, strengthens the model's ability to perceive and filter key spatial information, and overcomes the limitations of traditional MLPs in spatial representation. The gated routing channel-wise mixture-of-experts (GR-CMoE) module uses a dual gating strategy to achieve dynamic routing and adaptive allocation of feature channels, enabling different expert branches to model differentiated traffic feature patterns. The adaptive low-rank projection (ALRP) layer completes feature mapping and hidden state optimization through low-rank decomposition, significantly reducing parameter redundancy and computational load while improving the model's dynamic extraction of deep spatiotemporal features and its prediction accuracy. This invention also achieves significant improvements in computational efficiency: relying on the MLP backbone, it learns spatiotemporal correlation features directly from the original traffic data, which removes the excessive dependence of traditional graph models on topology, simplifies the model architecture, and reduces computational cost and deployment difficulty.
Attached Figure Description
[0057] Figure 1 is a schematic diagram of the network structure designed in this invention.
Detailed Implementation
[0058] The specific embodiments of the present invention will be further described in detail below with reference to the accompanying drawings.
[0059] This invention designs a traffic prediction method that integrates spatial additive attention and a gated mixture of experts. In practical applications, the following Steps A to C are performed to obtain a traffic prediction model. The traffic prediction model is then applied to the historical traffic data of a target number of target traffic monitoring points to predict the traffic data of each target traffic monitoring point after the target time interval.
[0060] Step A. Collect a preset number of traffic data records from each traffic monitoring point. The traffic data collected at one time from at least one traffic monitoring point forms a single sample; construct the sample set from these samples, then proceed to Step B.
[0061] Step B. As shown in Figure 1, the Temporal Dependency Refinement (TDR) module, the Spatio-Temporal Embedding (STE) layer, the Spatial Additive Attention MLP (SAM) module, the Gated Routing Channel-wise Mixture-of-Experts (GR-CMoE) module, and the Adaptive Low-Rank Projection (ALRP) layer are connected in series from input to output to construct the network to be trained; then proceed to Step C.
[0062] In practical applications, the network to be trained is designed as follows. The Temporal Dependency Refinement (TDR) module, as a preprocessing step before embedding, refines historical dependencies by applying a lightweight nonlinear transformation only along the time dimension. As shown in Figure 1, it consists of a transpose layer, a linear layer, a GELU layer, a linear layer, and a transpose layer connected in series from input to output. The input of the first transpose layer constitutes the input of the TDR module, and the output of the last transpose layer constitutes the output of the TDR module.
[0063] The input of the Temporal Dependency Refinement (TDR) module receives the input data $X$, whose dimensions are $B$, $N$, $C$, and $T$. The TDR module computes, according to the following formula:
[0064] $X' = \left(W_2\,\mathrm{GELU}\left(W_1 X^{\top} + b_1\right) + b_2\right)^{\top}$;
[0065] to obtain the output $X'$, where $B$ denotes the number of input samples received by the network to be trained in a single iteration; $N$ denotes the target number of traffic monitoring points corresponding to the input data; $C$ denotes the number of feature dimensions contained in the traffic data; $T$ denotes the length of the time series of traffic data from each traffic monitoring point in the input received in a single iteration; $W_1$ and $W_2$ denote the weight matrices for dimension increase and dimension reduction, respectively; $b_1$ and $b_2$ are bias terms; $\mathrm{GELU}(\cdot)$ is the GELU activation function; $(\cdot)^{\top}$ denotes the transposition performed by the transpose layers; and $X'$ denotes the intermediate features enhanced with the temporal dependencies extracted by the nonlinear mapping along the time dimension.
[0066] The Spatio-Temporal Embedding (STE) layer integrates traffic data features, temporal information, and spatial structural features to provide subsequent modeling with an information-rich and expressive high-dimensional input. As shown in Figure 1, it includes a time feature embedding module, a time-series feature embedding module, an adaptive spatial embedding module, and a splicing module. To characterize the periodic temporal patterns in traffic data, learnable temporal identity embeddings are introduced to model the time slot within a day and the day within a week. Based on the daily feature obtained by dividing a day into a set number of time slices and the weekly feature obtained by dividing a week into a set number of time units, the daily and weekly features are input into the time feature embedding module, which first retrieves the embeddings from index lookup tables according to the following formula:
[0067] ;
[0068] to obtain the daily time embedding features and the weekly time embedding features, respectively, which are then spliced to obtain the temporal embedding features. In the formula, the symbols denote, in order: the daily index corresponding to the daily feature and the weekly index corresponding to the weekly feature; the embedding lookup operations based on the daily and weekly indexes; the real number field; the target number of traffic monitoring points corresponding to the input data; the daily time embedding dimension and the weekly time embedding dimension; and the temporal embedding dimension.
[0069] The input of the time-series feature embedding module receives the output of the Temporal Dependency Refinement (TDR) module and applies a fully connected layer to it as a data embedding operation, obtaining and outputting the time-series embedding features. To model the inherent characteristics of nodes at different locations in the traffic network, the adaptive spatial embedding module introduces and outputs learnable spatial embedding features. The two associated dimensions are the time-series feature embedding dimension and the adaptive spatial embedding dimension, respectively.
[0070] The outputs of the time feature embedding module, the time-series feature embedding module, and the adaptive spatial embedding module are each connected to an input of the splicing module, and the output of the splicing module constitutes the output of the Spatio-Temporal Embedding (STE) layer. The splicing module concatenates the three received inputs to obtain and output the spatiotemporal embedding features, whose size is determined by the spatiotemporal feature embedding dimension.
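The three embedding branches and the final splicing step can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions: 288 five-minute slots per day and 7 days per week, one time index per sample and node, a flattened `[T * C]` series fed to the fully connected layer, and hypothetical embedding widths. None of these specifics are confirmed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
B, T, N, C = 2, 12, 5, 1                    # hypothetical sizes
slots_per_day, days_per_week = 288, 7       # assumed: 5-minute slices, 7 days
d_day, d_week, d_series, d_space = 4, 4, 8, 6

x = rng.normal(size=(B, T, N, C))           # stand-in for the TDR output
day_idx = rng.integers(0, slots_per_day, size=(B, N))
week_idx = rng.integers(0, days_per_week, size=(B, N))

# Learnable parameters (randomly initialised here)
day_table = rng.normal(size=(slots_per_day, d_day))
week_table = rng.normal(size=(days_per_week, d_week))
w_series = rng.normal(size=(T * C, d_series)) * 0.1
b_series = np.zeros(d_series)
node_emb = rng.normal(size=(N, d_space))

# Time feature embedding: index lookups, then splice day and week parts
e_time = np.concatenate([day_table[day_idx], week_table[week_idx]], axis=-1)

# Time-series feature embedding: fully connected layer over each node's history
flat = x.transpose(0, 2, 1, 3).reshape(B, N, T * C)
e_series = flat @ w_series + b_series

# Adaptive spatial embedding: one learnable vector per node, shared over the batch
e_space = np.broadcast_to(node_emb, (B, N, d_space))

# Splicing module: concatenate the three branches along the feature axis
z = np.concatenate([e_series, e_time, e_space], axis=-1)
print(z.shape)
```

The width of the last axis is the sum of the branch widths, which corresponds to the spatiotemporal feature embedding dimension mentioned in the text.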
[0071] The Spatial Additive Attention MLP (SAM) module uses adaptive grouping and an additive attention mechanism to model node features adaptively, enhancing feature representation while reinforcing spatial heterogeneity. As shown in Figure 1, it includes an adaptive grouping discriminator, a multilayer perceptron module, three multiplication modules, and three linear layers. One input of the first multiplication module forms the input of the SAM module and receives the spatiotemporal embedding features output by the STE layer. The other input of the first multiplication module is connected to the output of the adaptive grouping discriminator, which divides the target number of traffic monitoring points corresponding to the input data into a number of groups and outputs the resulting adaptive grouping matrix. The first multiplication module performs element-wise multiplication on its two received inputs, using the transpose of the adaptive grouping matrix, to obtain and output the group-level aggregated features.
[0072] The output of the first multiplication module is connected to the input of the multilayer perceptron module, and the output of the multilayer perceptron module is connected to the inputs of two of the linear layers. The multilayer perceptron module applies a multilayer perceptron function to its received input to model intra-group patterns, obtaining and outputting the enhanced features.
[0073] Furthermore, an efficient additive attention mechanism is introduced to adaptively aggregate relevant information in the spatial domain, thereby enhancing spatial dependencies. A linear transformation maps the input to query and key representations.
[0074] One of the linear layers connected to the output of the multilayer perceptron module maps its received input to the query vector. A learnable attention vector is introduced to characterize the global importance of the different query positions; from it the attention weight vector is obtained, which is then used to obtain the global query vector, and the global query vector is output to one input of the second multiplication module. In the corresponding formulas, the symbols denote the dimension of the query vector, each component of the attention weight vector, and the corresponding component of the query vector. The other linear layer connected to the output of the multilayer perceptron module maps its received input to the key matrix, which is output to the other input of the second multiplication module.
[0075] Next, element-wise multiplication is used to interact with the global query vector and the key matrix to guide the encoding of contextual information. This process effectively injects global semantic information while maintaining linear computational complexity. Specifically, the output of the second multiplication module connects to the input of the third linear layer, the output of the third linear layer connects to one of the inputs of the third multiplication module, and the input of the third linear layer receives the query vector. The second multiplication module performs element-wise multiplication on the two received inputs, and then the third linear layer processes them according to the following formula:
[0076] ;
[0077] to calculate and output the feature vector of the attention module. In the formula, the symbols denote element-wise multiplication, the normalized query vector, and a linear transformation function.
[0078] The other input of the third multiplication module receives the adaptive grouping matrix, and the third multiplication module uses it to obtain the attention-enhanced node-level features. The output of the third multiplication module is then combined with the spatiotemporal embedding features output by the STE layer to obtain the spatial-information-enhanced embedding features, which constitute the output of the SAM module.
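The grouping, intra-group MLP, additive attention, and redistribution steps of the SAM module can be sketched as below. Several details are assumptions made for illustration, because the original formulas are not available here: the grouping matrix is produced by a softmax over hypothetical node and group embeddings, group aggregation and redistribution are written as matrix products with the grouping matrix, the attention weights are a scaled dot product with the learnable attention vector without further normalization, the query is L2-normalized, and the final combination with the STE features is a residual sum.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def sam(z, assign, w_mlp, w_q, w_k, w_a, w_t):
    """z: [B, N, F] embeddings; assign: [N, P] soft grouping matrix."""
    g = np.einsum("np,bnf->bpf", assign, z)        # group-level aggregation
    g = np.maximum(g @ w_mlp, 0.0)                 # intra-group MLP (one ReLU layer here)
    q, k = g @ w_q, g @ w_k                        # queries and keys, [B, P, F]
    alpha = (q @ w_a) / np.sqrt(q.shape[-1])       # one attention weight per group
    q_glob = np.einsum("bp,bpf->bf", alpha, q)     # global query vector, [B, F]
    q_hat = q / (np.linalg.norm(q, axis=-1, keepdims=True) + 1e-12)
    ctx = (k * q_glob[:, None, :]) @ w_t + q_hat   # element-wise interaction, then linear map
    node = np.einsum("np,bpf->bnf", assign, ctx)   # redistribute group features to nodes
    return z + node                                # assumed residual combination

rng = np.random.default_rng(0)
B, N, F, P = 2, 6, 8, 3                            # hypothetical sizes
z = rng.normal(size=(B, N, F))
node_emb = rng.normal(size=(N, 4))                 # hypothetical discriminator parameters
group_emb = rng.normal(size=(P, 4))
assign = softmax(node_emb @ group_emb.T, axis=-1)  # each node spreads over P groups

w_mlp, w_q, w_k, w_t = (rng.normal(size=(F, F)) * 0.1 for _ in range(4))
w_a = rng.normal(size=F) * 0.1
out = sam(z, assign, w_mlp, w_q, w_k, w_a, w_t)
print(out.shape)
```

In this sketch attention is computed over the P groups rather than over all N nodes, and the global query interacts with the keys element-wise, so the cost grows linearly with the number of groups instead of quadratically with the number of nodes.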
[0079] The Gated Routing Channel-wise Mixture-of-Experts (GR-CMoE) module uses a gated routing mechanism to adaptively select and combine multiple feedforward experts, performing fine-grained modeling and recombination of channel-dimension features and thereby enhancing the diversity and representational power of the features. As shown in Figure 1, it includes an average pooling layer, a sigmoid activation layer, a multiplication module, a temperature scaling module, a Softmax module, an MLP feedforward network group, and two linear layers. The input of the average pooling layer and the input of one of the linear layers together constitute the input of the GR-CMoE module and receive the spatial-information-enhanced embedding features output by the SAM module. The output of that linear layer is connected to one input of the multiplication module. This linear layer maps the received features into the multi-expert space through a linear transformation with a learnable weight matrix, obtaining and outputting the unnormalized routing scores of the traffic monitoring points for each of a preset number of experts.
[0080] The output of the average pooling layer is connected to the input of the other linear layer, the output of that linear layer is connected to the input of the sigmoid activation layer, and the output of the sigmoid activation layer is connected to the other input of the multiplication module. To extract global context information from the node set, the average pooling layer aggregates the received spatial-information-enhanced embedding features with an average pooling function and outputs the resulting global representation. From this result, the linear layer and the sigmoid activation layer compute the global modulation coefficient matrix of the experts, using a learnable weight matrix, a bias term, and the sigmoid activation function, and output it.
[0081] The output of the multiplication module is connected to the input of the temperature scaling module, and the output of the temperature scaling module is connected to the input of the Softmax module. The multiplication module multiplies its two received inputs, namely the unnormalized routing scores and the global modulation coefficient matrix, and the temperature scaling module scales the product to obtain and output the expert selection score matrix after modulation and fusion. The scaling uses a learnable temperature parameter that adjusts the smoothness of the distribution.
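The dual gating path described above (per-node routing scores modulated by a global, pooled gate and then temperature-scaled) can be sketched as follows. The shapes, the pooling over the node axis, and dividing by the temperature are assumptions made for illustration; all names are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def modulated_scores(h, w_route, w_gate, b_gate, tau):
    """h: [B, N, F] SAM output -> expert selection scores [B, N, E]."""
    s = h @ w_route                          # unnormalized routing scores per node
    pooled = h.mean(axis=1)                  # average pooling over nodes, [B, F]
    g = sigmoid(pooled @ w_gate + b_gate)    # global modulation coefficients, [B, E]
    return (s * g[:, None, :]) / tau         # modulate, then temperature scaling

rng = np.random.default_rng(0)
B, N, F, E = 2, 6, 8, 4                      # hypothetical sizes
h = rng.normal(size=(B, N, F))
w_route = rng.normal(size=(F, E)) * 0.1
w_gate = rng.normal(size=(F, E)) * 0.1
b_gate = np.zeros(E)
tau = 1.5                                    # learnable in the described design

scores = modulated_scores(h, w_route, w_gate, b_gate, tau)
print(scores.shape)
```

A larger temperature flattens the scores and therefore the distribution produced by the subsequent Softmax step, while a smaller one sharpens it.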
[0082] The output of the Softmax module is connected to the input of the MLP feedforward network group. To reduce computational complexity and enhance model sparsity, the Softmax module applies a Top-K selection strategy to each traffic monitoring point: the highest-scoring experts are selected and constitute the selected expert set of that traffic monitoring point, and the selection scores of the experts in this set are normalized with an exponential function and output.
[0083] In the normalization formula, the symbols denote the selected expert set of each traffic monitoring point, the selection score of each expert in that set for the traffic monitoring point, the normalized result of that score, and the exponential function.
[0084] The MLP feedforward network group receives the spatial-information-enhanced embedding features together with the normalized selection scores of the experts in the selected expert set of each traffic monitoring point, and computes the following formula:
[0085] ;
[0086] to obtain, for each traffic monitoring point, the spatiotemporal representation features enhanced by gated routing and the mixture of experts; these features form a matrix that serves as the output of the GR-CMoE module. In the formula, the symbols denote the enhanced spatiotemporal representation features of each traffic monitoring point, the spatial-information-enhanced embedding features of that traffic monitoring point, and the MLP network of each expert in the selected expert set of that traffic monitoring point.
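The Top-K selection, the normalization over the selected experts, and the weighted combination of expert outputs can be sketched as below. This is a minimal per-node illustration assuming two-layer ReLU experts and a plain Python loop; a practical implementation would batch the expert calls. All names and sizes are hypothetical.

```python
import numpy as np

def topk_softmax(scores, k):
    """scores: [N, E]. Returns indices and softmax weights of the k best experts."""
    idx = np.argsort(-scores, axis=-1)[:, :k]            # selected expert set per node
    top = np.take_along_axis(scores, idx, axis=-1)
    top = top - top.max(axis=-1, keepdims=True)          # numerical stability
    w = np.exp(top)
    return idx, w / w.sum(axis=-1, keepdims=True)

def expert_mlp(x, w1, w2):
    return np.maximum(x @ w1, 0.0) @ w2                  # assumed two-layer ReLU expert

def moe(h, scores, experts, k):
    """h: [N, F]. Each node is processed only by its k selected experts."""
    idx, w = topk_softmax(scores, k)
    out = np.zeros_like(h)
    for n in range(h.shape[0]):
        for j in range(k):
            w1, w2 = experts[idx[n, j]]
            out[n] += w[n, j] * expert_mlp(h[n], w1, w2)
    return out

rng = np.random.default_rng(0)
N, F, E, K = 6, 8, 4, 2                                  # hypothetical sizes
h = rng.normal(size=(N, F))
scores = rng.normal(size=(N, E))
experts = [(rng.normal(size=(F, 16)) * 0.1, rng.normal(size=(16, F)) * 0.1)
           for _ in range(E)]

out = moe(h, scores, experts, K)
print(out.shape)
```

Only K of the E experts run for each monitoring point, which is where the sparsity and the reduced computation come from.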
[0087] The Adaptive Low-Rank Projection (ALRP) layer realizes an adaptive regression from the hidden state to the predicted value by constructing a learnable low-rank mapping matrix, mapping the high-dimensional spatiotemporal features to the prediction space and outputting the traffic prediction results for future time steps. As shown in Figure 1, it includes a low-rank adapter, a transpose layer, a multiplication module, and a dimension adjustment module. The output of the low-rank adapter is connected to one input of the multiplication module. The low-rank adapter introduces a learnable low-rank parameter tensor and computes the following formula:
[0088] ;
[0089] so that the node-adaptive projection weight matrix is generated through a low-rank linear mapping followed by a nonlinear transformation; the adaptive projection weights are then obtained through dimension adjustment and output. In the formula, the symbols denote the low-rank dimension, the ReLU activation function, a learnable weight matrix, and the length of the prediction horizon.
[0090] The output of the transpose layer is connected to the other input of the multiplication module. The input of the transpose layer forms the input of the ALRP layer; it receives the output of the GR-CMoE module and transposes it before outputting the result. The output of the multiplication module is connected to the input of the dimension adjustment module, and the multiplication module computes the following formula:
[0091] ;
[0092] to obtain and output the tensor product, where the operation in the formula is a tensor contraction based on the Einstein summation convention.
[0093] The output of the dimension adjustment module constitutes the output of the ALRP layer, that is, the output of the network to be trained. The dimension adjustment module adjusts the dimensions of the received tensor product to obtain and output the prediction.
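The low-rank adapter, the Einstein-summation contraction, and the final dimension adjustment can be sketched as follows. The factorisation used here (a per-node low-rank tensor `u` of shape `[N, R]` multiplied by a shared matrix of shape `[R, F * horizon]`, with ReLU applied after the linear map) and the output layout are assumptions chosen to match the prose; they are not confirmed by the original formulas.

```python
import numpy as np

def alrp(h, u, w_proj, horizon):
    """h: [B, N, F]; u: [N, R]; w_proj: [R, F * horizon] -> [B, horizon, N, 1]."""
    b, n, f = h.shape
    # Low-rank adapter: linear map, nonlinearity, reshape to per-node weights
    w_node = np.maximum(u @ w_proj, 0.0).reshape(n, f, horizon)
    h_t = h.transpose(1, 0, 2)                       # transpose layer: [N, B, F]
    y = np.einsum("nbf,nft->nbt", h_t, w_node)       # contraction over the feature axis
    return y.transpose(1, 2, 0)[..., None]           # dimension adjustment

rng = np.random.default_rng(0)
B, N, F, R, horizon = 2, 6, 8, 3, 12                 # hypothetical sizes
h = rng.normal(size=(B, N, F))
u = rng.normal(size=(N, R))
w_proj = rng.normal(size=(R, F * horizon)) * 0.1

pred = alrp(h, u, w_proj, horizon)
print(pred.shape)
```

With this factorisation each node gets its own projection matrix, yet the parameter count is `N * R + R * F * horizon` rather than `N * F * horizon`, which is the redundancy reduction the low-rank form is meant to provide.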
[0094] Step C. Based on the sample set, take the traffic data of each traffic monitoring point in the sample as input and the traffic data of each traffic monitoring point after the target time interval as output, train the network to be trained to obtain the traffic prediction model.
[0095] In practical applications, the target loss function is constructed from the L1 loss and a load-balancing loss as shown below, and the resulting target loss is used to train the network to be trained:
[0096] ;
[0097] ;
[0098] where the load-balancing loss is used to avoid excessive concentration of the expert weights and to ensure that all experts are utilized in a balanced manner. In the formulas, the symbols denote: the total number of experts; the expert index; the importance of each expert, which measures the degree to which that expert is selected by the gating mechanism; the gating weight with which each traffic monitoring point is assigned to each expert; and a non-negative hyperparameter. The gating weights are generated by the gating network using the softmax function and therefore satisfy the two constraints stated with the formulas (softmax outputs are non-negative and sum to one over the experts).
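A combined objective of this kind can be sketched as follows. The exact form of the load-balancing term is not recoverable from the text, so this sketch substitutes a common choice, the squared coefficient of variation of the per-expert importance; that choice, the weight `lam`, and all names are assumptions rather than the formula used in the application.

```python
import numpy as np

def l1_loss(pred, target):
    return float(np.mean(np.abs(pred - target)))

def load_balance_loss(gates):
    """gates: [N, E] softmax gating weights (rows are non-negative and sum to 1)."""
    importance = gates.sum(axis=0)                 # how much each expert is used
    # Assumed form: squared coefficient of variation, zero when usage is uniform
    return float(importance.var() / (importance.mean() ** 2 + 1e-10))

def total_loss(pred, target, gates, lam=0.01):
    # lam is the non-negative hyperparameter weighting the balancing term (assumed role)
    return l1_loss(pred, target) + lam * load_balance_loss(gates)

uniform = np.full((4, 4), 0.25)                            # balanced expert usage
peaked = np.tile(np.array([1.0, 0.0, 0.0, 0.0]), (4, 1))   # all nodes pick expert 0
pred = np.array([1.0, 2.0, 3.0])
target = np.array([1.5, 2.0, 2.0])

print(load_balance_loss(uniform), load_balance_loss(peaked))
print(total_loss(pred, target, peaked))
```

Under this assumed form the penalty vanishes when every expert receives the same total gating weight and grows as the gate concentrates on a few experts.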
[0099] The trained network, i.e., the traffic prediction model, is obtained according to the above design process. When the traffic prediction model is applied to predict the traffic data of each traffic monitoring point after the target time interval, evaluation metrics such as mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) can be used to assess its effectiveness. In practical applications, the traffic data may include at least one of traffic speed and traffic flow; for example, a traffic flow prediction model obtained according to the design method is used to predict traffic flow.
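The three evaluation metrics can be computed as in the following sketch. Masking near-zero ground-truth values in MAPE is a common convention assumed here, not something stated in the text; the sample values are made up for illustration.

```python
import numpy as np

def mae(y, p):
    return float(np.mean(np.abs(y - p)))

def rmse(y, p):
    return float(np.sqrt(np.mean((y - p) ** 2)))

def mape(y, p, eps=1e-8):
    mask = np.abs(y) > eps                 # skip zero targets to avoid division by zero
    return float(np.mean(np.abs((y[mask] - p[mask]) / y[mask])) * 100.0)

y_true = np.array([10.0, 20.0, 40.0])      # illustrative values only
y_pred = np.array([12.0, 18.0, 40.0])

print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

MAE and RMSE are expressed in the units of the traffic data, with RMSE penalising large errors more strongly, while MAPE is a percentage relative to the true value.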
[0100] The above design was applied to a specific embodiment, and the proposed method was evaluated on three real-world datasets from the California Department of Transportation Performance Measurement System (PeMS): PEMS03, PEMS04, and PEMS07. These datasets were collected in real time at 5-minute intervals. Traffic data from the preceding 12 time slices was used to predict the next 12 time slices. The data was divided chronologically, with 60% used for training, 20% for validation, and 20% for testing. Experiments in this embodiment were performed on a server equipped with an NVIDIA Tesla P100 GPU and running CUDA version 10.2. This embodiment was implemented using PyTorch 1.12.1 on Python 3.8.
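The 12-in, 12-out sample construction and the 60/20/20 chronological split can be sketched as follows. Whether the split is applied before or after windowing is not stated, so windowing first is an assumption here; the array layout `[time, nodes, features]` and the synthetic series are also illustrative.

```python
import numpy as np

def make_windows(series, n_in=12, n_out=12):
    """series: [T_total, N, C]. Returns inputs [S, n_in, N, C] and targets [S, n_out, N, C]."""
    xs, ys = [], []
    for t in range(len(series) - n_in - n_out + 1):
        xs.append(series[t:t + n_in])
        ys.append(series[t + n_in:t + n_in + n_out])
    return np.stack(xs), np.stack(ys)

def chrono_split(x, y, train=0.6, val=0.2):
    """Split samples in time order, without shuffling."""
    n = len(x)
    a, b = int(n * train), int(n * (train + val))
    return (x[:a], y[:a]), (x[a:b], y[a:b]), (x[b:], y[b:])

series = np.arange(100, dtype=float).reshape(100, 1, 1)   # synthetic 100-step series
x, y = make_windows(series)
train_set, val_set, test_set = chrono_split(x, y)

print(x.shape, y.shape)
print(len(train_set[0]), len(val_set[0]), len(test_set[0]))
```

Keeping the samples in time order means the validation and test windows come strictly after the training windows, which avoids evaluating on periods the model has already seen.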
[0101] All methods were optimized using Adam with an initial learning rate of 0.002 and a gradual decay strategy. The batch size was 64, and the number of training iterations was 100. The hidden state dimension F was 128, the embedding layer dimension was 32, and the low-rank dimension R in the ALRP layer was 16. For the PEMS03 and PEMS04 datasets, the number of groups P in the SAM module was set to 15, the number of MLP experts E in the GR-CMoE module was set to 4, and the number of SC layers was set to 3. For the PEMS07 dataset, the number of groups P in the SAM module was set to 20, the number of MLP experts E in the GR-CMoE module was set to 6, and the number of SC layers was set to 4.
[0102] This embodiment selected 13 methods for comparison to evaluate the effectiveness of the present invention in traffic prediction. Mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) were used as quantitative indicators to evaluate the models. The quantitative evaluation results are shown in Table 1, which compares the performance of each method on the three datasets.
[0103] Table 1
[0104]
[0105] As shown in Table 1, the experimental results demonstrate that the present invention achieves leading performance, especially on the PEMS03 and PEMS04 datasets.
[0106] In contrast, traditional statistical prediction methods such as ARIMA and machine learning-based SVR primarily capture the temporal correlation of traffic data, making it difficult to model its nonlinear dependencies. Their prediction results are significantly inferior to deep learning-based methods. While the traditional deep learning method LSTM can effectively capture the temporal features of traffic data, its performance still lags behind graph-based methods because it ignores spatial dependencies. STGCN employs a spatiotemporal fusion architecture, demonstrating superior prediction performance compared to methods that rely solely on temporal modeling. GWNet and AGCRN use dynamic adjacency matrices to capture continuously evolving spatial dependencies, achieving significant breakthroughs in prediction performance. STGODE expands the spatial receptive field through graph ordinary differential equations, but its ability to model global spatial dependencies is weak, resulting in performance inferior to AGCRN. SSBTAN introduces a self-supervised learning module, automatically learning and extracting deep and robust feature representations from the data, exhibiting strong adaptability and stable, reliable prediction performance. ASTGNN and STAEformer further improve performance by utilizing multi-head self-attention mechanisms. MHGNet decouples single-modal traffic data into multi-modal traffic data by performing feature mapping on timestamps and node embedding matrices. Compared to M3-Net, which is also a lightweight design, on the PEMS03 dataset, the MAE, RMSE, and MAPE of this invention are reduced from 14.87, 27.07, and 15.35% to 14.63, 25.66, and 14.61%, respectively. RMSE and MAPE are relatively reduced by approximately 5.2% and 4.8%, respectively, indicating significant improvements in both overall error magnitude and relative error control. 
For the PEMS07 dataset, this invention significantly outperforms M3-Net in all three metrics, with MAE, RMSE, and MAPE reduced by approximately 6.0%, 3.0%, and 9.1%, respectively, demonstrating stronger modeling capabilities and generalization performance under conditions of large-scale road networks and highly volatile data. Overall, the superior performance of this invention can be attributed to the following factors: (1) The proposed spatial additive attention MLP module effectively captures the spatial heterogeneity of the road network through dynamic node partitioning and adaptive feature weighting, overcoming the shortcomings of traditional MLP in spatial feature modeling. (2) The proposed gated routing channel hybrid expert module realizes dynamic routing allocation of channel features through a dual gating mechanism, and strengthens the model's ability to represent complex features by combining sparse filtering and load balancing strategies. (3) The introduction of an adaptive low-rank projection module can enhance feature expression capabilities through a more flexible projection structure, thereby effectively improving the modeling effect of complex spatiotemporal dependencies.
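The relative reductions quoted for PEMS03 follow directly from the absolute figures given above and can be checked with a few lines of arithmetic. The dictionary names below are only labels for the two sets of numbers stated in the text.

```python
def rel_reduction(baseline, ours):
    """Relative error reduction in percent."""
    return (baseline - ours) / baseline * 100.0

m3net = {"MAE": 14.87, "RMSE": 27.07, "MAPE": 15.35}   # PEMS03 values from the text
ours = {"MAE": 14.63, "RMSE": 25.66, "MAPE": 14.61}

for name in m3net:
    print(name, round(rel_reduction(m3net[name], ours[name]), 1))
```

The RMSE and MAPE reductions come out at about 5.2% and 4.8%, matching the stated figures, and the same calculation gives roughly 1.6% for MAE.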
[0107] In practical applications, the design scheme of this invention collects multiple groups of traffic data spanning the target time interval from the traffic monitoring points, constructs and trains the network to be trained, obtains the traffic prediction model, and performs actual prediction. The designed network directly learns the spatiotemporal dynamic features of traffic data using an MLP-based backbone network, without explicitly introducing road network topology information. This simplifies the model structure and reduces computational and implementation complexity. Furthermore, the Spatial Additive Attention MLP (SAM) module dynamically partitions nodes and adaptively weights features, effectively characterizing regional spatial heterogeneity and overcoming the shortcomings of traditional MLPs in distinguishing key spatial information. Moreover, the Gated Routing Channel-wise Mixture-of-Experts (GR-CMoE) module achieves adaptive routing of channel features through dual gating, allowing different... This design, based on a lightweight structure, enhances the ability to accurately predict complex spatiotemporally heterogeneous traffic data. The Adaptive Low-Rank Projection (ALRP) layer constructs the projection weights in a low-rank decomposition form and dynamically weights and reconstructs the hidden states, reducing parameter redundancy while achieving an efficient and accurate prediction mapping. The overall design improves the accuracy of predicting complex spatiotemporally heterogeneous traffic data, highlighting its practicality and efficiency as a core component of intelligent transportation systems. 
It addresses issues such as the high dependence on explicit topology in existing traffic prediction models, complex structures and high computational costs, insufficient modeling of regional spatial heterogeneity, weak feature differentiation and adaptive capabilities of traditional hybrid expert models, and limited feature expression and insufficient dynamic modeling capabilities of simple fixed projection layers.
[0108] This invention offers significant advantages in spatiotemporal dependency modeling, computational efficiency, long-term data processing, and adaptability to complex traffic data prediction tasks. Compared to existing technologies, it effectively avoids the strong dependence of traditional graph models on topology, significantly simplifies the model structure, reduces computational overhead, and lowers engineering implementation difficulty. Specifically, the SAM module dynamically partitions road network nodes and adaptively weights features, effectively characterizing the spatial heterogeneity of the road network, enhancing the model's ability to perceive and filter key spatial information, and improving the limitations of traditional MLPs in spatial representation. The GR-CMoE module, through a dual-gating strategy, achieves dynamic routing and adaptive allocation of feature channels, guiding different expert branches to specifically model differentiated traffic feature patterns. The ALRP layer completes feature mapping and hidden state optimization through low-rank decomposition, significantly reducing parameter redundancy and computational load while improving the model's dynamic extraction and prediction accuracy of deep spatiotemporal features. This invention also achieves significant improvements in computational efficiency. Relying on the backbone network of the MLP architecture, it can directly learn spatiotemporal correlation features autonomously from raw traffic data, effectively eliminating the excessive dependence of traditional graph models on topology, significantly simplifying the model architecture, and reducing computational costs and practical deployment difficulty.
[0109] The embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the present invention is not limited to the above embodiments. Within the scope of knowledge possessed by those skilled in the art, various changes can be made without departing from the spirit of the present invention.
Claims
1. A traffic prediction method integrating spatial additive attention and gating hybrid experts, characterized in that steps A through C are performed to obtain a traffic prediction model, and the traffic prediction model is then applied to the historical traffic data of each of a target number of target traffic monitoring points to predict the traffic data of each target traffic monitoring point after a target time interval: Step A. Collect a preset number of traffic data records for each traffic monitoring point; the traffic data collected in a single collection by at least one traffic monitoring point forms a single sample, and the samples form a sample set; then proceed to Step B; Step B. Connect the time-dependent refinement module, the spatiotemporal embedding layer, the spatial additive attention MLP module, the gated routing channel hybrid expert module, and the adaptive low-rank projection layer in sequence from input to output to construct the network to be trained, and then proceed to Step C; Step C. Based on the sample set, take the traffic data of each traffic monitoring point in a sample as input and the traffic data of each traffic monitoring point after the target time interval as output, and train the network to be trained to obtain the traffic prediction model.
2. The traffic prediction method based on spatial additive attention and gating hybrid experts as described in claim 1, characterized in that: the time-dependent refinement module consists of a transpose layer, a linear layer, a GELU layer, a linear layer, and a transpose layer connected in sequence from input to output; the input of the first transpose layer is the input of the time-dependent refinement module, and the output of the last transpose layer is the output of the time-dependent refinement module; the input of the time-dependent refinement module receives the input data of a given size, and the time-dependent refinement module computes the following formula: ; to obtain its output, where the symbols denote, in order: the number of input samples received by the network to be trained in a single iteration; the target number of traffic monitoring points corresponding to the input data; the number of feature dimensions contained in the traffic data; the length of the time series of traffic data from each traffic monitoring point in the input received by the network to be trained in a single iteration; the weight matrices for dimensionality increase and dimensionality reduction, respectively; the two bias terms; the GELU activation function; and the intermediate features, enhanced with the temporal dependencies extracted through the nonlinear mapping along the time dimension.
3. The traffic prediction method based on spatial additive attention and gating hybrid experts according to claim 2, characterized in that: the spatiotemporal embedding layer includes a time feature embedding module, a time-series feature embedding module, an adaptive spatial embedding module, and a splicing module; based on the daily feature obtained by dividing a day into a set number of time slices and the weekly feature obtained by dividing a week into a set number of time units, the daily and weekly features are input into the time feature embedding module, which first retrieves the embeddings from index lookup tables according to the following formula: ; to obtain the daily time embedding features and the weekly time embedding features, respectively, which are then spliced to obtain the temporal embedding features, where the symbols denote, in order: the daily index corresponding to the daily feature and the weekly index corresponding to the weekly feature; the embedding lookup operations based on the daily and weekly indexes; the real number field; the target number of traffic monitoring points corresponding to the input data; the daily time embedding dimension and the weekly time embedding dimension; and the temporal embedding dimension; the input of the time-series feature embedding module receives the output of the time-dependent refinement module and applies a fully connected layer to it as a data embedding operation, obtaining and outputting the time-series embedding features; the adaptive spatial embedding module introduces and outputs learnable spatial embedding features, the two associated dimensions being the time-series feature embedding dimension and the adaptive spatial embedding dimension, respectively; the outputs of the time feature embedding module, the time-series feature embedding module, and the adaptive spatial embedding module are each connected to an input of the splicing module, and the output of the splicing module constitutes the output of the spatiotemporal embedding layer; the splicing module concatenates the three received inputs to obtain and output the spatiotemporal embedding features, whose size is determined by the spatiotemporal feature embedding dimension.
4. The traffic prediction method based on spatial additive attention and gating hybrid experts as described in claim 3, characterized in that: the spatial additive attention MLP module includes an adaptive grouping discriminator, a multilayer perceptron module, three multiplication modules, and three linear layers; one input of the first multiplication module serves as the input of the spatial additive attention MLP module and receives the spatiotemporal embedding features output by the spatiotemporal embedding layer; the other input of the first multiplication module is connected to the output of the adaptive grouping discriminator, which divides the number of traffic monitoring points corresponding to the input data into a number of groups and outputs the resulting adaptive grouping matrix; the first multiplication module performs element-wise multiplication on its two received inputs, using the transpose of the adaptive grouping matrix, to obtain and output the group-level aggregated features; the output of the first multiplication module is connected to the input of the multilayer perceptron module, and the output of the multilayer perceptron module is connected to the inputs of two of the linear layers; the multilayer perceptron module applies a multilayer perceptron function to its received input to model intra-group patterns, obtaining and outputting the enhanced features; one of the linear layers connected to the output of the multilayer perceptron module maps its received input to the query vector, and a learnable attention vector is introduced, from which the attention weight vector is obtained and then used to obtain the global query vector, which is output to one input of the second multiplication module, where the symbols in the corresponding formulas denote the dimension of the query vector, each component of the attention weight vector, and the corresponding component of the query vector; the other linear layer connected to the output of the multilayer perceptron module maps its received input to the key matrix, which is output to the other input of the second multiplication module; the output of the second multiplication module is connected to the input of the third linear layer, the output of the third linear layer is connected to one input of the third multiplication module, and the input of the third linear layer also receives the query vector; the second multiplication module performs element-wise multiplication on its two received inputs, and the third linear layer then processes the result according to the following formula: ; to calculate and output the feature vector of the attention module, where the symbols denote element-wise multiplication, the normalized query vector, and a linear transformation function; the other input of the third multiplication module receives the adaptive grouping matrix, and the third multiplication module uses it to obtain the attention-enhanced node-level features; the output of the third multiplication module is then combined with the spatiotemporal embedding features output by the spatiotemporal embedding layer to obtain the spatial-information-enhanced embedding features, which constitute the output of the spatial additive attention MLP module.
5. The traffic prediction method based on spatial additive attention and gating hybrid experts according to claim 4, characterized in that: the gated routing channel hybrid expert module includes an average pooling layer, a sigmoid activation layer, a multiplication module, a temperature scaling module, a softmax module, an MLP feedforward network group, and two linear layers. The input of the average pooling layer and the input of the first linear layer together constitute the input of the gated routing channel hybrid expert module and receive the spatially enhanced embedded features from the spatial additive attention MLP module. The output of the first linear layer is connected to one input of the multiplication module; this linear layer applies a learnable weight matrix to the received features to obtain and output the unnormalized routing scores of each traffic monitoring point for each of the preset number of experts. The output of the average pooling layer is connected to the input of the second linear layer, the output of the second linear layer is connected to the input of the sigmoid activation layer, and the output of the sigmoid activation layer is connected to the other input of the multiplication module. The average pooling layer applies the average pooling function to the received spatially enhanced embedded features and outputs the pooled result; the second linear layer and the sigmoid activation layer then apply a learnable weight matrix, a bias term, and the sigmoid activation function to the pooled result to obtain and output the global modulation coefficient matrix of the experts. The output of the multiplication module is connected to the input of the temperature scaling module, and the output of the temperature scaling module is connected to the input of the softmax module. The multiplication module multiplies its two inputs, and the temperature scaling module scales the product by a learnable temperature parameter, which adjusts the smoothness of the distribution, to obtain and output the modulated and fused expert selection score matrix. The output of the softmax module is connected to the input of the MLP feedforward network group. For each traffic monitoring point, the softmax module applies a Top-K selection strategy that selects the highest-scoring experts, which constitute the selected expert set of that traffic monitoring point, and then normalizes the selection scores of the experts in that set using the exponential function and outputs the result. The MLP feedforward network group receives the spatially enhanced embedded features together with the normalized selection scores of the experts in the selected expert set of each traffic monitoring point, and computes, for each traffic monitoring point, the spatiotemporal representation features enhanced by the gated routing channel hybrid experts from the embedded features of that point, the MLP networks of the experts in its selected expert set, and their normalized selection scores. The features of all traffic monitoring points form a matrix that serves as the output of the gated routing channel hybrid expert module.
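The routing steps of claim 5 can be illustrated with a short NumPy sketch. It is a hedged reading of the claim rather than the patented formulas, which are not reproduced in this text: it assumes that average pooling runs over the monitoring points, that the modulation coefficients multiply the routing scores element-wise, and that the expert outputs are combined as a score-weighted sum. The names `Wr`, `Wg`, `bg`, `tau`, `make_expert`, and `gated_topk_moe` are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax of a 1-D vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def make_expert(W1, W2):
    """Build one expert as a two-layer ReLU MLP (assumed expert form)."""
    def expert(h):
        return np.maximum(h @ W1, 0.0) @ W2
    return expert

def gated_topk_moe(H, Wr, Wg, bg, tau, k, experts):
    """H: (N, d) features; returns (N, d) outputs and (N, E) routing weights."""
    scores = H @ Wr                               # (N, E) unnormalized routing scores
    pooled = H.mean(axis=0)                       # (d,) average pooling over nodes
    mod = 1.0 / (1.0 + np.exp(-(pooled @ Wg + bg)))  # (E,) global modulation in (0, 1)
    S = scores * mod / tau                        # (N, E) modulated, temperature-scaled
    topk = np.argsort(-S, axis=1)[:, :k]          # indices of the k best experts per node
    Y = np.zeros_like(H)
    weights = np.zeros_like(S)
    for i in range(H.shape[0]):
        sel = topk[i]
        w = softmax(S[i, sel])                    # normalize over the selected set only
        weights[i, sel] = w
        for e, w_e in zip(sel, w):
            Y[i] += w_e * experts[e](H[i])        # score-weighted sum of expert outputs
    return Y, weights

rng = np.random.default_rng(1)
N, d, E, k = 5, 4, 3, 2
H = rng.normal(size=(N, d))
experts = [make_expert(rng.normal(size=(d, 8)), rng.normal(size=(8, d)))
           for _ in range(E)]
Y, W = gated_topk_moe(H, rng.normal(size=(d, E)), rng.normal(size=(d, E)),
                      np.zeros(E), tau=0.5, k=k, experts=experts)
print(Y.shape, W.sum(axis=1))
```

Only k of the E experts are evaluated per monitoring point, so the per-point cost in this sketch depends on k rather than on the total number of experts.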
6. The traffic prediction method based on spatial additive attention and gating hybrid experts according to claim 5, characterized in that: the adaptive low-rank projection layer includes a low-rank adapter, a transpose layer, a multiplication module, and a dimension adjustment module. The output of the low-rank adapter is connected to one input of the multiplication module. The low-rank adapter introduces a learnable low-rank parameter tensor and generates the node-adaptive projection weight matrix through a low-rank linear mapping followed by a nonlinear transformation, using the low-rank dimension, the ReLU activation function, a learnable weight matrix, and the length of the prediction period, and outputs the result. The output of the transpose layer is connected to the other input of the multiplication module. The input of the transpose layer forms the input of the adaptive low-rank projection layer; it receives the output of the gated routing channel hybrid expert module, transposes it, and outputs the result. The output of the multiplication module is connected to the input of the dimension adjustment module; the multiplication module computes the tensor product of its two inputs by a tensor contraction based on the Einstein summation convention and outputs it. The output of the dimension adjustment module constitutes the output of the adaptive low-rank projection layer, that is, the output of the network to be trained; the dimension adjustment module adjusts the dimensions of the received tensor product to obtain and output the prediction result.
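A possible reading of the adaptive low-rank projection in claim 6 is sketched below in NumPy. The exact formulas are not reproduced in this text, so the shapes and the order of operations are assumptions: a per-node low-rank tensor `U` of shape (N, d, r) is mapped by a shared matrix `V` of shape (r, T_out), passed through ReLU to give per-node projection weights, and contracted with the transposed features using `einsum`. The names `U`, `V`, `T_out`, and `adaptive_low_rank_projection` are hypothetical.

```python
import numpy as np

def adaptive_low_rank_projection(H, U, V):
    """H: (B, N, d) expert-module output; U: (N, d, r); V: (r, T_out).

    Returns predictions of shape (B, T_out, N, 1).
    """
    # Low-rank linear mapping followed by ReLU gives one (d, T_out)
    # projection matrix per monitoring point.
    W = np.maximum(U @ V, 0.0)                  # (N, d, T_out)
    H_t = np.transpose(H, (0, 2, 1))            # (B, d, N) transpose layer
    # Einstein-summation contraction over the feature axis d,
    # keeping the node axis n aligned between both operands.
    out = np.einsum("bdn,ndt->btn", H_t, W)     # (B, T_out, N)
    return out[..., np.newaxis]                 # dimension adjustment

rng = np.random.default_rng(2)
B, N, d, r, T_out = 2, 5, 4, 3, 12
H = rng.normal(size=(B, N, d))
U = rng.normal(size=(N, d, r))
V = rng.normal(size=(r, T_out))
pred = adaptive_low_rank_projection(H, U, V)
print(pred.shape)  # (2, 12, 5, 1)
```

In this sketch the learnable projection parameters number N·d·r + r·T_out instead of N·d·T_out for a full per-node projection, which is the usual motivation for a low-rank factorization when r is much smaller than T_out.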
7. The traffic prediction method based on spatial additive attention and gating hybrid experts according to claim 6, characterized in that: in step C, the target loss function is constructed from the L1 loss and the load balancing loss, and the resulting target loss is used to train the network to be trained, where E represents the total number of experts and e represents the expert index; the importance of the e-th expert measures the degree to which that expert is currently selected by the gating mechanism; the gating weight assigned to the e-th expert by each traffic monitoring point is generated by the gating network using the softmax function, so that each gating weight is non-negative and the gating weights of each traffic monitoring point sum to 1 over the experts; and the target loss includes a non-negative hyperparameter.
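The combined objective of claim 7 can be sketched as follows. The loss formulas are not reproduced in this text, so the sketch makes two labeled assumptions: the importance of an expert is the sum of the gating weights it receives over all monitoring points, and the load balancing term is the squared coefficient of variation of those importances, a common choice for mixture-of-experts models that is swapped in here for illustration. The hyperparameter name `lam` and all function names are hypothetical.

```python
import numpy as np

def expert_importance(gates):
    """gates: (N, E) gating weights, rows non-negative and summing to 1."""
    return gates.sum(axis=0)                      # (E,) importance per expert

def load_balance_loss(gates, eps=1e-8):
    """Squared coefficient of variation of expert importance (assumed form)."""
    imp = expert_importance(gates)
    return imp.var() / (imp.mean() ** 2 + eps)

def target_loss(pred, target, gates, lam):
    """L1 prediction loss plus lam times the load balancing loss."""
    l1 = np.abs(pred - target).mean()
    return l1 + lam * load_balance_loss(gates)

# Balanced routing: every expert receives the same total weight.
balanced = np.full((4, 4), 0.25)
# Collapsed routing: every monitoring point sends all weight to expert 0.
collapsed = np.zeros((4, 4))
collapsed[:, 0] = 1.0

pred = np.array([1.0, 2.0, 3.0])
target = np.array([1.0, 1.0, 5.0])
print(target_loss(pred, target, balanced, lam=0.01))
print(target_loss(pred, target, collapsed, lam=0.01))
```

With this assumed form, the balancing term vanishes when all experts are used equally and grows as routing collapses onto a few experts, so a larger `lam` trades prediction error for more even expert utilization.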
8. A traffic prediction method integrating spatial additive attention and gating hybrid experts according to any one of claims 1 to 7, characterized in that: The traffic data includes at least one of traffic speed and traffic flow.