A traffic flow prediction method integrating a spatiotemporally decoupled pre-trained model
By integrating a spatiotemporally decoupled pre-trained model to extract and fuse spatial and temporal features of traffic flow data, the shortcomings of existing traffic flow prediction technologies in characterizing long-range spatiotemporal dependence and spatial heterogeneity are addressed, resulting in more accurate traffic flow prediction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGXI TRANSPORTATION SCI & TECH GRP CO LTD
- Filing Date
- 2026-03-18
- Publication Date
- 2026-06-19
AI Technical Summary
Existing traffic flow prediction technologies have shortcomings in long-term spatiotemporal dependency modeling, spatial heterogeneity characterization, and efficiency in utilizing historical data, making it difficult to accurately predict future traffic flow trends.
A traffic flow prediction method that integrates a spatiotemporally decoupled pre-trained model is adopted. Spatial and temporal features of the traffic flow data are extracted by a pre-trained spatial encoder and a pre-trained temporal encoder, respectively, and then fused to produce the traffic flow prediction. Self-supervised pre-training with masked autoencoders is used to mine long-term spatiotemporal dependencies.
It improves the accuracy and stability of traffic flow prediction, enhances the modeling ability and generalization performance of complex traffic patterns, and makes full use of the spatiotemporal characteristics of large-scale unlabeled historical traffic flow data.
Figure CN122245101A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of traffic prediction technology, and in particular to a traffic flow prediction method that integrates a spatiotemporally decoupled pre-trained model.
Background Technology
[0002] With the continuous advancement of urbanization and the increasing number of motor vehicles in cities, the operational complexity of road traffic systems is constantly increasing. Traffic flow prediction, as a crucial component of intelligent transportation systems, has significant application value in areas such as traffic operation status monitoring, traffic signal control, congestion early warning, and travel services. By analyzing historical traffic flow data and accurately predicting traffic flow trends within a certain future timeframe, traffic management departments can formulate scheduling strategies in advance, improve the efficiency of road resource utilization, and alleviate traffic congestion.
[0003] However, current traffic flow prediction methods are still not accurate enough.
Summary of the Invention
[0004] In view of this, embodiments of this application provide a traffic flow prediction method and related equipment that integrates a spatiotemporally decoupled pre-trained model to improve the accuracy of traffic flow prediction.
[0005] One aspect of this application provides a traffic flow prediction method that integrates a spatiotemporally decoupled pre-trained model, the method comprising the following steps:
[0006] Acquire the spatiotemporal features of the traffic flow data;
[0007] Input the long historical spatiotemporal sequence of the traffic flow data into the pre-trained spatial encoder and the pre-trained temporal encoder to obtain spatial features and temporal features, respectively;
[0008] Fuse the spatiotemporal features, the spatial features, and the temporal features to obtain fused features;
[0009] Obtain the traffic flow prediction result based on the fused features.
[0010] In some embodiments, acquiring the spatiotemporal characteristics of traffic flow data includes the following steps:
[0011] The short historical spatiotemporal sequence of traffic flow data and the adjacency matrix of road network information are input into a graph attention network. The graph attention network is used to dynamically aggregate the node features at each time step to obtain a spatially enhanced sequence.
[0012] The spatially enhanced sequence is input into a long short-term memory network along the time dimension, and the long short-term memory network is used to output the corresponding spatiotemporal features.
[0013] In some embodiments, the spatial encoder and the temporal encoder are pre-trained through the following steps:
[0014] Two independent decoupled masked autoencoders perform mask reconstruction on the long historical spatiotemporal sequence along the spatial and temporal dimensions, respectively;
[0015] Construct the target loss function for the reconstruction task;
[0016] Pre-train the two decoupled masked autoencoders according to the target loss function. Once the loss function is minimized, the two decoupled masked autoencoders serve as the spatial encoder and the temporal encoder, respectively.
[0017] In some embodiments, performing mask reconstruction on the long historical spatiotemporal sequence along the spatial and temporal dimensions using two independent decoupled masked autoencoders includes the following steps:
[0018] Construct the input tensor $X \in \mathbb{R}^{T \times N \times C}$ from the long historical spatiotemporal sequence, where $T$ denotes the number of time steps, $N$ denotes the number of spatial nodes, and $C$ denotes the number of feature channels;
[0019] Divide the long historical spatiotemporal sequence into non-overlapping patches, each patch being of size [size missing], to obtain the number of patches. Map each patch to an embedding vector through a linear projection layer to obtain the original embedding, then superimpose two-dimensional sinusoidal position encodings to obtain a token sequence carrying position information;
[0020] At a given mask ratio, randomly discard a portion of the time patches along the time dimension to generate first mask tokens and first visible tokens. Input the first visible token sequence into the first decoupled masked autoencoder, which computes self-attention along the time dimension to obtain the first encoded unmasked units. Then, in the temporal decoder stage, insert the first mask tokens to restore the complete sequence and input it into the temporal decoder to obtain the temporal features. The first decoupled masked autoencoder serves as the temporal encoder;
[0021] At a given mask ratio, randomly discard a portion of the node patches along the spatial dimension to generate second mask tokens and second visible tokens. Input the second visible token sequence into the second decoupled masked autoencoder, which computes self-attention along the spatial dimension to obtain the second encoded unmasked units. Then, in the spatial decoder stage, insert the second mask tokens to restore the complete sequence and input it into the spatial decoder to obtain the spatial features. The second decoupled masked autoencoder serves as the spatial encoder.
[0022] In some embodiments, constructing the target loss function for the reconstruction task includes the following steps:
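[0023] Compute the mean absolute error over the masked region of the spatial encoder to obtain the spatial reconstruction loss $\mathcal{L}_s = \frac{1}{|M_s|} \sum_{i \in M_s} \left| \hat{x}^{(s)}_i - x^{(s)}_i \right|$, where $x^{(s)}_i$ is the first original node patch, $\hat{x}^{(s)}_i$ is its reconstruction, and $M_s$ is the set of masked node patches;
[0024] Compute the mean absolute error over the masked region of the temporal encoder to obtain the temporal reconstruction loss $\mathcal{L}_t = \frac{1}{|M_t|} \sum_{j \in M_t} \left| \hat{x}^{(t)}_j - x^{(t)}_j \right|$, where $x^{(t)}_j$ is the second original time patch, $\hat{x}^{(t)}_j$ is its reconstruction, and $M_t$ is the set of masked time patches;
[0025] Define the target loss function as $\mathcal{L} = \mathcal{L}_s + \mathcal{L}_t$.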
[0026] In some embodiments, fusing the spatiotemporal features, the spatial features, and the temporal features to obtain fused features includes the following steps:
[0027] The fused feature is computed by the following formula:
[0028] ;
[0029] where the symbols in the formula denote the fused feature, the spatiotemporal features, the spatial features, and the temporal features.
[0030] In some embodiments, the step of predicting traffic flow based on the fused features includes the following steps:
[0031] The fused features are mapped to the prediction target dimension through a fully connected layer to obtain the traffic flow prediction result.
[0032] Another aspect of this application embodiment provides a traffic flow prediction device that integrates a spatiotemporally decoupled pre-trained model, the device comprising:
[0033] The spatiotemporal feature acquisition unit is used to acquire the spatiotemporal features of traffic flow data;
[0034] The feature extraction unit is used to input the long historical spatiotemporal sequence of traffic flow data into the pre-trained spatial encoder and temporal encoder respectively to obtain spatial features and temporal features.
[0035] A feature fusion unit is used to fuse the spatiotemporal features, the spatial features, and the temporal features to obtain fused features;
[0036] The traffic flow prediction unit is used to obtain the traffic flow prediction result based on the fused features.
[0037] Another aspect of this application embodiment provides an electronic device, including a processor and a memory;
[0038] The memory is used to store programs;
[0039] The processor executes the program to implement any of the methods described above.
[0040] Another aspect of this application provides a computer-readable storage medium storing a program that is executed by a processor to implement the method described in any of the above embodiments.
[0041] This application includes at least the following beneficial effects:
[0042] This application acquires the spatiotemporal features of traffic flow data; inputs the long historical spatiotemporal sequence of the traffic flow data into a pre-trained spatial encoder and a pre-trained temporal encoder to obtain spatial features and temporal features, respectively; fuses the spatiotemporal features, spatial features, and temporal features to obtain fused features; and obtains the traffic flow prediction result based on the fused features. By modeling spatial and temporal features separately, this application mines long-term spatiotemporal dependencies from large-scale unlabeled long historical spatiotemporal sequences, thereby enhancing the modeling ability and generalization performance of downstream prediction models for complex traffic patterns.
Attached Figure Description
[0043] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0044] Figure 1 is a flowchart of a traffic flow prediction method that integrates a spatiotemporally decoupled pre-trained model, as provided in an embodiment of this application;
[0045] Figure 2 is a flowchart of the pre-training process of the spatiotemporal encoder provided in an embodiment of this application;
[0046] Figure 3 is an example flowchart of a traffic flow prediction method that integrates a spatiotemporally decoupled pre-trained model, as provided in an embodiment of this application;
[0047] Figure 4 is a structural block diagram of a traffic flow prediction device that integrates a spatiotemporally decoupled pre-trained model, as provided in an embodiment of this application.
Detailed Implementation
[0048] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0049] Before providing a detailed description of the embodiments of this application, some related technologies involved in the embodiments of this application will be described first, as follows:
[0050] In actual traffic operations, traffic flow data typically exhibits significant spatiotemporal correlations. On the one hand, traffic flow displays periodicity, trends, and randomness over time, such as peak hours, differences between weekdays and non-weekdays, and abnormal fluctuations caused by special events. On the other hand, different road segments or intersections are interconnected through road network structures, resulting in continuous spatial propagation and mutual influence among vehicles. Therefore, traffic flow prediction is essentially a typical spatiotemporal sequence prediction problem, placing high demands on the model's ability to model time and characterize spatial dependencies.
[0051] Existing traffic flow prediction technologies mainly include methods based on statistical models and methods based on machine learning or deep learning. Traditional statistical models, such as autoregressive models, moving average models, and autoregressive integrated moving average (ARIMA) models and their extensions, typically predict future trends by fitting a parametric model to historical traffic flow data. These methods have low computational complexity and are practically valuable in scenarios with relatively stable traffic patterns. However, they usually rely on the assumption of stationarity, which makes it difficult to characterize the nonlinear variations in traffic flow data, and they cannot effectively use the spatial structure information of the road network, resulting in limited prediction accuracy in complex road network environments and dynamic traffic scenarios.
[0052] With the improvement of computing power, deep learning-based traffic flow prediction methods have been widely used. Sequence models such as recurrent neural networks, long short-term memory networks, and gated recurrent units can automatically learn nonlinear time dependencies from historical time series, demonstrating strong modeling capabilities in short-term prediction tasks. However, these methods typically treat traffic flow on different road segments as independent time series, ignoring the objective spatial correlations within the traffic system and failing to reflect the propagation characteristics of traffic flow in the road network.
[0053] To address the issue of insufficient spatial dependency modeling, researchers introduced graph neural networks to model traffic networks. Road stops or intersections are abstracted as nodes, and road connections are abstracted as directed edges. Neighbor node information is aggregated through graph convolution and other operations to model the spatial characteristics of traffic flow. This type of spatiotemporal joint modeling method has achieved some success in conventional traffic prediction tasks. However, existing methods mostly employ end-to-end supervised learning, which is highly sensitive to the number of labeled samples and struggles to fully utilize large-scale unlabeled historical traffic flow data. Furthermore, some methods only focus on short-term spatiotemporal relationships within a limited time window, lacking the ability to uncover long-term evolutionary patterns and spatial heterogeneity of traffic flow.
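As a small illustration of the graph abstraction described above, the following sketch builds a directed adjacency matrix from a list of road connections. The node names and edges are hypothetical and chosen only for illustration; they do not come from this application.

```python
def build_adjacency(nodes, directed_edges):
    """Return an N x N 0/1 matrix; adj[i][j] = 1 means there is a directed edge from node i to node j."""
    index = {name: i for i, name in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]
    for src, dst in directed_edges:
        adj[index[src]][index[dst]] = 1
    return adj


# Hypothetical three-intersection network: A -> B -> C, plus C -> A.
nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C"), ("C", "A")]
adjacency = build_adjacency(nodes, edges)
for row in adjacency:
    print(row)
```

A matrix of this kind is the form of road network information that a graph neural network consumes alongside the per-node traffic flow series.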
[0054] In summary, existing traffic flow prediction technologies still have shortcomings in long-range spatiotemporal dependency modeling, spatial heterogeneity characterization, and efficiency of historical data utilization. There is an urgent need for a technical solution that can fully explore the spatiotemporal characteristics of traffic flow and improve prediction accuracy and robustness.
[0055] Objective disadvantages of existing technologies:
[0056] The ability to mine long-term spatiotemporal dependencies in historical traffic flow data is insufficient: Existing deep learning-based traffic flow prediction methods mostly adopt end-to-end supervised learning approaches, typically using only a finite historical time window as model input to predict short-term traffic flow changes. These methods focus more on fitting short-term temporal dependencies during model training, making it difficult to fully explore the evolutionary patterns of traffic flow over longer time spans. Furthermore, because the model training objective is directly geared towards the prediction task, the latent spatiotemporal structural information implicit in historical data is not effectively utilized, thus limiting the model's ability to model complex traffic patterns.
[0057] Based on this, this application proposes a traffic flow prediction scheme that integrates a spatiotemporally decoupled pre-trained model. By modeling spatial and temporal features separately, it mines long-term spatiotemporal dependencies from large-scale unlabeled historical traffic flow data, thereby enhancing the downstream prediction model's ability to model complex traffic patterns and its generalization performance.
[0058] Referring to Figure 1, this application provides a traffic flow prediction method that integrates a spatiotemporally decoupled pre-trained model, specifically including the following steps S100 to S130:
[0059] S100: Acquire the spatiotemporal features of the traffic flow data;
[0060] S110: Input the long historical spatiotemporal sequence of the traffic flow data into the pre-trained spatial encoder and the pre-trained temporal encoder to obtain spatial features and temporal features, respectively;
[0061] S120: Fuse the spatiotemporal features, the spatial features, and the temporal features to obtain fused features;
[0062] S130: Obtain the traffic flow prediction result based on the fused features.
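The data flow of steps S100 to S130 can be sketched as a single function that wires the components together. This is only a structural sketch: the function name `predict_traffic_flow`, its parameter names, and the stub lambdas in the usage example are all hypothetical stand-ins, not the components of this application.

```python
def predict_traffic_flow(short_seq, long_seq, adjacency,
                         downstream_model, spatial_encoder, temporal_encoder,
                         fuse, head):
    """Wire the four steps together; every component is passed in as a callable."""
    # S100: spatiotemporal features from the short history and the road network
    st_feat = downstream_model(short_seq, adjacency)
    # S110: spatial and temporal features from the long history
    spatial_feat = spatial_encoder(long_seq)
    temporal_feat = temporal_encoder(long_seq)
    # S120: fuse the three feature sets
    fused = fuse(st_feat, spatial_feat, temporal_feat)
    # S130: map the fused features to the prediction
    return head(fused)


# Usage with trivial stubs, only to show the call order and data flow.
result = predict_traffic_flow(
    short_seq=[1.0, 2.0],
    long_seq=[1.0, 2.0, 3.0, 4.0],
    adjacency=[[0]],
    downstream_model=lambda x, a: [sum(x)],
    spatial_encoder=lambda x: [max(x)],
    temporal_encoder=lambda x: [min(x)],
    fuse=lambda st, s, t: st + s + t,      # list concatenation as a placeholder
    head=lambda f: sum(f) / len(f),
)
print(result)
```

Passing the components as callables keeps the pre-trained encoders and the downstream model independent of each other, which mirrors the separation between the pre-training stage and the prediction stage.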
[0063] Optionally, acquiring the spatiotemporal characteristics of traffic flow data includes the following steps:
[0064] The short historical spatiotemporal sequence of traffic flow data and the adjacency matrix of road network information are input into a graph attention network. The graph attention network is used to dynamically aggregate the node features at each time step to obtain a spatially enhanced sequence.
[0065] The spatially enhanced sequence is input into a long short-term memory network along the time dimension, and the long short-term memory network is used to output the corresponding spatiotemporal features.
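To make the per-time-step aggregation concrete, the following is a minimal single-head graph attention sketch in plain Python. It follows the commonly used graph attention formulation (a shared projection, a LeakyReLU-scored attention vector, and a softmax over each node's neighbours); the exact variant used in this application is not specified in the text. The parameters `W` and `a` are hypothetical fixed values here, whereas in practice they are learned, and the long short-term memory stage is omitted.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, adjacency, W, a):
    """Single-head graph attention for one time step.

    features:  list of N node feature vectors
    adjacency: N x N 0/1 matrix; adjacency[i][j] = 1 means node j is a neighbour of node i
    W:         shared linear projection (F_out x F_in)
    a:         attention vector of length 2 * F_out
    """
    z = [matvec(W, h) for h in features]
    n = len(z)
    out = []
    for i in range(n):
        # Neighbours of node i, always including node i itself (self-loop).
        nbrs = [j for j in range(n) if adjacency[i][j] or j == i]
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
                  for j in nbrs]
        # Numerically stable softmax over the neighbourhood.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        out.append([sum(al * z[j][k] for al, j in zip(alphas, nbrs))
                    for k in range(len(z[i]))])
    return out


def spatially_enhance(sequence, adjacency, W, a):
    """Apply the attention layer independently at every time step."""
    return [gat_layer(step, adjacency, W, a) for step in sequence]


# Two connected nodes, one feature each; a zero attention vector gives uniform weights.
enhanced = gat_layer([[1.0], [3.0]], [[0, 1], [1, 0]], W=[[1.0]], a=[0.0, 0.0])
print(enhanced)
```

With a zero attention vector every neighbour receives the same weight, so each node's output is the mean of its neighbourhood; learned parameters would instead weight neighbours differently at each time step, which is what makes the aggregation dynamic.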
[0066] Optionally, the spatial encoder and the temporal encoder are pre-trained through the following steps:
[0067] Two independent decoupled masked autoencoders perform mask reconstruction on the long historical spatiotemporal sequence along the spatial and temporal dimensions, respectively;
[0068] Construct the target loss function for the reconstruction task;
[0069] Pre-train the two decoupled masked autoencoders according to the target loss function. Once the loss function is minimized, the two decoupled masked autoencoders serve as the spatial encoder and the temporal encoder, respectively.
[0070] Optionally, performing mask reconstruction on the long historical spatiotemporal sequence along the spatial and temporal dimensions using two independent decoupled masked autoencoders includes the following steps:
[0071] Construct the input tensor $X \in \mathbb{R}^{T \times N \times C}$ from the long historical spatiotemporal sequence, where $T$ denotes the number of time steps, $N$ denotes the number of spatial nodes, and $C$ denotes the number of feature channels;
[0072] Divide the long historical spatiotemporal sequence into non-overlapping patches, each patch being of size [size missing], to obtain the number of patches. Map each patch to an embedding vector through a linear projection layer to obtain the original embedding, then superimpose two-dimensional sinusoidal position encodings to obtain a token sequence carrying position information;
[0073] At a given mask ratio, randomly discard a portion of the time patches along the time dimension to generate first mask tokens and first visible tokens. Input the first visible token sequence into the first decoupled masked autoencoder, which computes self-attention along the time dimension to obtain the first encoded unmasked units. Then, in the temporal decoder stage, insert the first mask tokens to restore the complete sequence and input it into the temporal decoder to obtain the temporal features. The first decoupled masked autoencoder serves as the temporal encoder;
[0074] At a given mask ratio, randomly discard a portion of the node patches along the spatial dimension to generate second mask tokens and second visible tokens. Input the second visible token sequence into the second decoupled masked autoencoder, which computes self-attention along the spatial dimension to obtain the second encoded unmasked units. Then, in the spatial decoder stage, insert the second mask tokens to restore the complete sequence and input it into the spatial decoder to obtain the spatial features. The second decoupled masked autoencoder serves as the spatial encoder.
[0075] Optionally, constructing the objective loss function for the reconstruction task includes the following steps:
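[0076] Compute the mean absolute error over the masked region of the spatial encoder to obtain the spatial reconstruction loss $\mathcal{L}_s = \frac{1}{|M_s|} \sum_{i \in M_s} \left| \hat{x}^{(s)}_i - x^{(s)}_i \right|$, where $x^{(s)}_i$ is the first original node patch, $\hat{x}^{(s)}_i$ is its reconstruction, and $M_s$ is the set of masked node patches;
[0077] Compute the mean absolute error over the masked region of the temporal encoder to obtain the temporal reconstruction loss $\mathcal{L}_t = \frac{1}{|M_t|} \sum_{j \in M_t} \left| \hat{x}^{(t)}_j - x^{(t)}_j \right|$, where $x^{(t)}_j$ is the second original time patch, $\hat{x}^{(t)}_j$ is its reconstruction, and $M_t$ is the set of masked time patches;
[0078] Define the target loss function as $\mathcal{L} = \mathcal{L}_s + \mathcal{L}_t$.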
[0079] Optionally, fusing the spatiotemporal features, the spatial features, and the temporal features to obtain the fused features includes the following steps:
[0080] The fused feature is computed by the following formula:
[0081] ;
[0082] where the symbols in the formula denote the fused feature, the spatiotemporal features, the spatial features, and the temporal features.
[0083] Optionally, the step of predicting the traffic flow based on the fused features includes the following steps:
[0084] The fused features are mapped to the prediction target dimension through a fully connected layer to obtain the traffic flow prediction result.
[0085] The following section will provide a detailed introduction and explanation of the solutions in the embodiments of this application, using specific application examples.
[0086] This embodiment includes two stages: pre-training and model fine-tuning, as detailed below:
[0087] 1. Referring to Figure 2, in the pre-training phase, two independent decoupled masked autoencoders perform mask reconstruction tasks on long historical spatiotemporal sequences along the spatial and temporal dimensions, respectively, to achieve self-supervised pre-training. This phase does not use any labels; it drives the model to learn long-range spatiotemporal context representations solely through the reconstruction loss. The core idea is that the encoder learns rich context representations from the visible tokens only, while the decoder uses the mask tokens to recover the original data. The detailed pre-training process is as follows:
[0088] Step 1: Obtain historical spatiotemporal sequence data and construct the input tensor $X \in \mathbb{R}^{T \times N \times C}$, where $T$ denotes the number of time steps, $N$ denotes the number of spatial nodes, and $C$ denotes the number of feature channels.
[0089] Step 2: Divide the input sequence into non-overlapping patches, each patch being of size [size missing], to obtain the number of patches. Map each patch to an embedding vector through a linear projection layer to obtain the original embedding, then superimpose a two-dimensional sinusoidal position encoding to obtain the token sequence with position information.
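The patching and position encoding in Step 2 can be sketched as follows. Because the patch size is not preserved in this text, the sketch assumes, purely for illustration, that each node's series is cut along the time axis into patches of a hypothetical length `patch_len`. The two-dimensional encoding is built in one common way, by concatenating a standard sinusoidal encoding of the time-patch index with one of the node index; the linear projection layer is omitted.

```python
import math


def split_into_patches(series, patch_len):
    """Split one node's series of length T into T // patch_len non-overlapping patches."""
    if len(series) % patch_len != 0:
        raise ValueError("series length must be a multiple of patch_len")
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]


def sinusoidal_1d(position, dim):
    """Standard sinusoidal encoding of one integer position into `dim` values (dim must be even)."""
    enc = []
    for k in range(0, dim, 2):
        freq = 1.0 / (10000 ** (k / dim))
        enc.extend([math.sin(position * freq), math.cos(position * freq)])
    return enc


def sinusoidal_2d(time_idx, node_idx, dim):
    """Assumed 2D layout: first half encodes the time-patch index, second half the node index.

    `dim` must be divisible by 4 so that each half is even.
    """
    half = dim // 2
    return sinusoidal_1d(time_idx, half) + sinusoidal_1d(node_idx, half)


# Hypothetical example: 12 time steps for one node, patch length 4.
patches = split_into_patches(list(range(12)), patch_len=4)
position = sinusoidal_2d(time_idx=0, node_idx=0, dim=8)
print(len(patches), patches[1])
print(position)
```

In a full model the position vector would be added element-wise to the embedding of the patch at the same (time, node) location, so that self-attention can tell tokens apart by where they sit in the sequence and in the road network.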
[0090] Step 3: At a given mask ratio, randomly discard a portion of the time patches along the time dimension to generate masked tokens and visible tokens. Only the visible portion is retained as input, giving the visible token sequence. The visible tokens are input into the temporal encoder, which computes self-attention along the time dimension to obtain the encoded unmasked units. Then, in the temporal decoder stage, the mask tokens are inserted to restore the complete sequence, which is input into the temporal decoder to obtain the decoder output.
[0091] Step 4: At a given mask ratio, randomly discard a portion of the node patches along the spatial dimension to generate masked tokens and visible tokens. Only the visible portion is retained as input, giving the visible token sequence. The visible tokens are input into the spatial encoder, whose self-attention mechanism operates only along the spatial dimension, to obtain the encoded unmasked units. Then, in the spatial decoder stage, the mask tokens are inserted to restore the complete sequence, which is input into the spatial decoder to obtain the decoder output.
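The masking and re-insertion logic shared by the temporal and spatial branches can be sketched as below. The same routine applies to either axis: the tokens are time patches in the temporal branch and node patches in the spatial branch. The mask ratio of 0.75, the `"[MASK]"` placeholder, and the function names are hypothetical; the mask ratio used in this application is not preserved in the text, and the encoder itself is replaced by an identity pass.

```python
import random


def random_mask(num_tokens, mask_ratio, rng):
    """Return (visible_idx, masked_idx), both sorted, for a sequence of num_tokens tokens."""
    num_masked = int(round(num_tokens * mask_ratio))
    masked = sorted(rng.sample(range(num_tokens), num_masked))
    masked_set = set(masked)
    visible = [i for i in range(num_tokens) if i not in masked_set]
    return visible, masked


def restore_full_sequence(encoded_visible, visible_idx, num_tokens, mask_token):
    """Decoder-side step: put encoded visible tokens back in place, fill the rest with the mask token."""
    full = [mask_token] * num_tokens
    for token, idx in zip(encoded_visible, visible_idx):
        full[idx] = token
    return full


rng = random.Random(0)  # fixed seed so the example is repeatable
tokens = ["p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7"]
visible_idx, masked_idx = random_mask(len(tokens), mask_ratio=0.75, rng=rng)

# The encoder would process only the visible tokens; identity is used here as a stand-in.
encoded_visible = [tokens[i] for i in visible_idx]
restored = restore_full_sequence(encoded_visible, visible_idx, len(tokens), "[MASK]")
print(visible_idx, masked_idx)
print(restored)
```

Feeding only the visible tokens to the encoder is what keeps pre-training on long sequences affordable, since the encoder's self-attention cost depends on the number of visible tokens rather than on the full sequence length.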
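[0092] Step 5: Calculate the reconstruction losses for the spatial masked autoencoder and the temporal masked autoencoder, computing the mean absolute error (MAE) only over their respective masked regions to drive the model to learn rich long-range spatiotemporal context representations. The spatial reconstruction loss is $\mathcal{L}_s = \frac{1}{|M_s|} \sum_{i \in M_s} \left| \hat{x}^{(s)}_i - x^{(s)}_i \right|$, where $x^{(s)}_i$ is the original node patch, $\hat{x}^{(s)}_i$ is its reconstruction, and $M_s$ is the set of masked node patches. Similarly, the temporal reconstruction loss is $\mathcal{L}_t = \frac{1}{|M_t|} \sum_{j \in M_t} \left| \hat{x}^{(t)}_j - x^{(t)}_j \right|$, where $x^{(t)}_j$ is the original time patch, $\hat{x}^{(t)}_j$ is its reconstruction, and $M_t$ is the set of masked time patches. The losses of the two independent branches are added directly to form the overall self-supervised pre-training objective: $\mathcal{L} = \mathcal{L}_s + \mathcal{L}_t$.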
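A minimal sketch of the loss in Step 5: the mean absolute error is averaged over masked patches only, and the two branch losses are summed. The function names and the toy values are hypothetical; patches are represented as plain lists of floats.

```python
def masked_mae(original, reconstructed, masked_idx):
    """Mean absolute error over the masked patches only; visible patches are ignored."""
    total, count = 0.0, 0
    for i in masked_idx:
        for x, x_hat in zip(original[i], reconstructed[i]):
            total += abs(x_hat - x)
            count += 1
    return total / count if count else 0.0


def pretraining_loss(spatial_terms, temporal_terms):
    """Sum of the two branch losses; each argument is (original, reconstructed, masked_idx)."""
    return masked_mae(*spatial_terms) + masked_mae(*temporal_terms)


# Toy example: three patches of two values each; patches 1 and 2 are masked.
original = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reconstructed = [[0.0, 0.0], [3.5, 4.5], [5.0, 8.0]]
branch_loss = masked_mae(original, reconstructed, masked_idx=[1, 2])
total_loss = pretraining_loss((original, reconstructed, [1, 2]),
                              (original, reconstructed, [1, 2]))
print(branch_loss, total_loss)
```

Patch 0 is reconstructed badly in the toy data but does not affect the result, because it is visible rather than masked; restricting the loss to masked positions forces the model to infer missing content from context instead of copying its input.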
[0093] Step 6: Minimize the total loss defined above to obtain two independent encoders: a spatial encoder and a temporal encoder.
[0094] 2. Referring to Figure 3, in the model fine-tuning stage, the rich long-range spatiotemporal representations extracted by the pre-trained spatial and temporal encoders are fused with the output of the downstream spatiotemporal prediction model to enhance the performance of downstream tasks. The specific implementation process is as follows:
[0095] Step 1: Input the short historical spatiotemporal sequence and the road network adjacency matrix into the GAT (Graph Attention Network) module, which uses a multi-head graph attention mechanism to dynamically aggregate the node features at each time step, yielding a spatially enhanced sequence;
[0096] Step 2: Input the spatially enhanced sequence into an LSTM module along the time dimension to capture long-term temporal dependencies and output the spatiotemporal features;
[0097] Step 3: Input the long historical spatiotemporal sequence into the pre-trained spatial encoder and the pre-trained temporal encoder, respectively, to obtain the encoded unmasked units output by the spatial encoder and the encoded unmasked units output by the temporal encoder, and fuse them with the downstream model output, that is:
[0098] ;
[0099] Step 4: Finally, map the fused features to the target dimension through a fully connected layer to obtain the prediction result.
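The fusion and output mapping of Steps 3 and 4 can be sketched as follows. The fusion formula is not preserved in this text, so the sketch assumes concatenation purely for illustration; element-wise addition or a gated combination would be equally consistent with the description. The function names, weights, and feature values are hypothetical.

```python
def fuse_by_concatenation(st_feat, spatial_feat, temporal_feat):
    # ASSUMPTION: concatenation stands in for the fusion formula, which is not preserved in the text.
    return st_feat + spatial_feat + temporal_feat


def fully_connected(x, weights, bias):
    """Map a feature vector to the target dimension: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


# Toy feature vectors for one node.
fused = fuse_by_concatenation([1.0, 2.0], [3.0], [4.0])

# Hypothetical weights mapping 4 fused features to 2 prediction steps.
weights = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 1.0, 1.0]]
bias = [0.0, 1.0]
prediction = fully_connected(fused, weights, bias)
print(fused, prediction)
```

With concatenation the fully connected layer sees each feature source in its own slice of the input, so it can learn how much weight to give the downstream model output relative to the pre-trained spatial and temporal representations.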
[0100] This embodiment includes the following key technical solutions:
[0101] A mask-based self-supervised pre-training mechanism is adopted to mine spatiotemporal features: during the pre-training stage, traffic flow data is masked in both spatial and temporal dimensions, and the masked parts are reconstructed and learned based on the unmasked data information, so as to achieve spatiotemporal feature extraction without manual annotation and improve the utilization efficiency of historical traffic flow data.
[0102] This embodiment introduces a spatiotemporal decoupling pre-training mechanism during the traffic flow prediction model training process and applies the obtained pre-trained model to downstream traffic flow prediction tasks. Compared with traditional end-to-end supervised learning traffic flow prediction schemes, this approach can fully mine the spatiotemporal features contained in historical traffic flow data without relying on a large number of labeled samples. This scheme effectively improves the model's ability to model long-term temporal dependencies and spatial heterogeneous features, thereby improving the accuracy and stability of traffic flow prediction.
[0103] Referring to Figure 4, this application provides a traffic flow prediction device that integrates a spatiotemporally decoupled pre-trained model, comprising:
[0104] The spatiotemporal feature acquisition unit is used to acquire the spatiotemporal features of traffic flow data;
[0105] The feature extraction unit is used to input the long historical spatiotemporal sequence of traffic flow data into the pre-trained spatial encoder and temporal encoder respectively to obtain spatial features and temporal features.
[0106] A feature fusion unit is used to fuse the spatiotemporal features, the spatial features, and the temporal features to obtain fused features;
[0107] The traffic flow prediction unit is used to obtain the traffic flow prediction result based on the fused features.
[0108] It is understood that the content of the above method embodiments is applicable to the present device embodiments. The specific functions implemented by the present device embodiments are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those achieved by the above method embodiments.
[0109] In some alternative embodiments, the functions / operations mentioned in the block diagrams may not occur in the order shown in the operation diagrams. For example, depending on the functions / operations involved, two consecutively shown blocks may actually be executed substantially simultaneously, or the blocks may sometimes be executed in reverse order. Furthermore, the embodiments presented and described in the flowcharts of this application are provided by way of example to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and sub-operations described as part of a larger operation are executed independently.
[0110] Furthermore, although this application is described in the context of functional modules, it should be understood that, unless otherwise stated, one or more of the described functions and / or features may be integrated into a single physical device and / or software module, or one or more functions and / or features may be implemented in a separate physical device or software module. It is also understood that a detailed discussion of the actual implementation of each module is unnecessary for understanding this application. Rather, given the properties, functions, and internal relationships of the various functional modules in the apparatus disclosed herein, the actual implementation of the module will be understood within the scope of conventional technology for an engineer. Therefore, those skilled in the art can implement the application set forth in the claims using ordinary techniques without excessive experimentation. It is also understood that the specific concepts disclosed are merely illustrative and not intended to limit the scope of this application, which is determined by the full scope of the appended claims and their equivalents.
[0111] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0112] The logic and/or steps represented in the flowchart or otherwise described herein can be considered, for example, as a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit programs for use by, or in conjunction with, an instruction execution system, apparatus, or device.
[0113] More specific examples of computer-readable media (a non-exhaustive list) include: electrical connections (electronic devices) having one or more wires, portable computer diskettes (magnetic devices), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM). Furthermore, the computer-readable medium can even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium, followed by editing, interpreting, or otherwise processing it as necessary, and then stored in computer memory.
[0114] It should be understood that various parts of this application can be implemented using hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented using software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.
[0115] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.
[0116] Although embodiments of this application have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of this application, the scope of which is defined by the claims and their equivalents.
[0117] The above is a detailed description of the preferred embodiments of this application, but this application is not limited to the embodiments described. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of this application, and these equivalent modifications or substitutions are all included within the scope defined by the claims of this application.
Claims
1. A traffic flow prediction method integrating a spatiotemporally decoupled pre-trained model, characterized in that the method includes the following steps: acquiring spatiotemporal features of traffic flow data; inputting a long historical spatiotemporal sequence of the traffic flow data into a pre-trained spatial encoder and a pre-trained temporal encoder to obtain spatial features and temporal features, respectively; fusing the spatiotemporal features, the spatial features, and the temporal features to obtain fused features; and obtaining a traffic flow prediction result based on the fused features.
2. The traffic flow prediction method integrating a spatiotemporally decoupled pre-trained model according to claim 1, characterized in that acquiring the spatiotemporal features of the traffic flow data includes the following steps: inputting a short historical spatiotemporal sequence of the traffic flow data and an adjacency matrix of road network information into a graph attention network; using the graph attention network to dynamically aggregate the node features at each time step to obtain a spatially enhanced sequence; and inputting the spatially enhanced sequence into a long short-term memory network along the time dimension, the long short-term memory network outputting the corresponding spatiotemporal features.
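For illustration only, and not as part of the claims, the per-time-step neighbor aggregation described in claim 2 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names (`graph_attention_step`, `dot_score`, `spatially_enhance`) are hypothetical, the attention logit is a plain dot product standing in for the learned scoring function of a real graph attention network, and the subsequent long short-term memory stage is not shown.

```python
import math

def graph_attention_step(features, adjacency, score):
    """Aggregate neighbor features for every node at a single time step.

    features:  list of N node feature vectors (lists of floats)
    adjacency: N x N matrix of 0/1 entries; adjacency[i][j] == 1 means
               node j is a neighbor of node i in the road network
    score:     hypothetical function (h_i, h_j) -> float returning an
               unnormalized attention logit (an assumption of this sketch)
    """
    n = len(features)
    aggregated = []
    for i in range(n):
        # Attend over the node itself plus its road-network neighbors.
        neighbors = [j for j in range(n) if j == i or adjacency[i][j]]
        logits = [score(features[i], features[j]) for j in neighbors]
        # Numerically stable softmax over the neighborhood.
        peak = max(logits)
        weights = [math.exp(v - peak) for v in logits]
        total = sum(weights)
        out = [0.0] * len(features[i])
        for w, j in zip(weights, neighbors):
            for d, value in enumerate(features[j]):
                out[d] += (w / total) * value
        aggregated.append(out)
    return aggregated

def dot_score(h_i, h_j):
    """Stand-in attention logit: dot product of the two feature vectors."""
    return sum(a * b for a, b in zip(h_i, h_j))

def spatially_enhance(sequence, adjacency):
    """Apply the aggregation independently at each time step."""
    return [graph_attention_step(x_t, adjacency, dot_score) for x_t in sequence]
```

The output keeps the shape of the input (time steps by nodes by channels), so it can be passed along the time dimension to a recurrent network as the claim describes.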
3. The traffic flow prediction method integrating a spatiotemporally decoupled pre-trained model according to claim 1, characterized in that the spatial encoder and the temporal encoder are pre-trained through the following steps: using two independent decoupled masked autoencoders to perform mask reconstruction on the long historical spatiotemporal sequence along the spatial dimension and the temporal dimension, respectively; constructing a target loss function for the reconstruction task; and pre-training the two decoupled masked autoencoders according to the target loss function, wherein, when the target loss function is minimized, the two decoupled masked autoencoders serve as the spatial encoder and the temporal encoder, respectively.
4. The traffic flow prediction method integrating a spatiotemporally decoupled pre-trained model according to claim 3, characterized in that performing mask reconstruction on the long historical spatiotemporal sequence along the spatial and temporal dimensions using the two independent decoupled masked autoencoders includes the following steps:
constructing an input tensor [formula missing] based on the long historical spatiotemporal sequence, where [symbol missing] denotes the number of time steps, [symbol missing] denotes the number of spatial nodes, and [symbol missing] denotes the number of feature channels;
dividing the long historical spatiotemporal sequence into non-overlapping patches, each patch being of size [size missing], to obtain the number of patches [formula missing]; mapping each patch to an embedding vector [symbol missing] through a linear projection layer to obtain the original embedding [symbol missing]; and superimposing two-dimensional sinusoidal positional encodings to obtain a token sequence carrying position information;
with a mask ratio [symbol missing], randomly discarding a portion of the time patches along the time dimension to generate first mask tokens and first visible tokens; inputting the first visible tokens [symbol missing] into the first decoupled masked autoencoder and computing along the time dimension through a self-attention mechanism to obtain the first encoded unmasked units; and subsequently, in the temporal decoder stage, inserting the first mask tokens to restore the complete sequence, which is input into the temporal decoder to obtain the temporal features, the first decoupled masked autoencoder serving as the temporal encoder;
with a mask ratio [symbol missing], randomly discarding a portion of the node patches along the spatial dimension to generate second mask tokens and second visible tokens; inputting the second visible token sequence [symbol missing] into the second decoupled masked autoencoder and computing along the spatial dimension through a self-attention mechanism to obtain the second encoded unmasked units; and subsequently, in the spatial decoder stage, inserting the second mask tokens to restore the complete sequence, which is input into the spatial decoder to obtain the spatial features, the second decoupled masked autoencoder serving as the spatial encoder.
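For illustration only, and not as part of the claims, the patch, mask, and refill bookkeeping of claim 4 can be sketched along a single dimension as follows. This is a minimal sketch under stated assumptions: the helper names, the patch length, and the mask ratio value are hypothetical, the encoder and decoder networks are omitted, and the same index handling would apply to node patches along the spatial dimension.

```python
import random

def split_into_patches(series, patch_len):
    """Divide a sequence into non-overlapping patches of equal length."""
    if patch_len <= 0 or len(series) % patch_len != 0:
        raise ValueError("sequence length must be a positive multiple of patch_len")
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]

def random_mask(num_patches, mask_ratio, rng):
    """Randomly choose which patch indices are masked and which stay visible."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked

def restore_full_sequence(encoded_visible, visible_idx, num_patches, mask_token):
    """Reinsert a mask token at every masked position before decoding."""
    full = [mask_token] * num_patches
    for encoded, idx in zip(encoded_visible, visible_idx):
        full[idx] = encoded
    return full

# Usage: 24 time steps, hypothetical patch length 4 and mask ratio 0.5.
patches = split_into_patches(list(range(24)), patch_len=4)
visible_idx, masked_idx = random_mask(len(patches), 0.5, random.Random(0))
visible_patches = [patches[i] for i in visible_idx]  # only these reach the encoder
decoder_input = restore_full_sequence(visible_patches, visible_idx, len(patches), None)
```

Only the visible patches pass through the encoder, which is what keeps masked pre-training inexpensive; the mask tokens are reintroduced just before the decoder so that it can reconstruct the full sequence.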
5. The traffic flow prediction method integrating a spatiotemporally decoupled pre-trained model according to claim 4, characterized in that constructing the target loss function for the reconstruction task includes the following steps: calculating the mean absolute error over the masked region of the spatial encoder to obtain the spatial reconstruction loss function [formula missing], where [symbol missing] is the first original node patch; calculating the mean absolute error over the masked region of the temporal encoder to obtain the temporal reconstruction loss function [formula missing], where [symbol missing] is the second original time patch; and defining the target loss function as [formula missing].
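For illustration only, and not as part of the claims, a mean absolute error restricted to the masked region, as described in claim 5, can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `masked_mae` and the sample values are hypothetical, and the way the spatial and temporal losses are combined into the target loss function is not shown because that formula is not recoverable from the text.

```python
def masked_mae(original, reconstructed, masked_idx):
    """Mean absolute error computed over masked patches only.

    original, reconstructed: lists of patches (each a list of floats)
    masked_idx:              indices of the patches that were masked
    """
    total, count = 0.0, 0
    for i in masked_idx:
        for target, estimate in zip(original[i], reconstructed[i]):
            total += abs(target - estimate)
            count += 1
    if count == 0:
        raise ValueError("masked region is empty")
    return total / count

# Usage with hypothetical values: only patches 1 and 2 contribute to the loss.
original = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reconstructed = [[0.0, 0.0], [3.0, 5.0], [5.0, 8.0]]
loss = masked_mae(original, reconstructed, masked_idx=[1, 2])
```

Visible patches are excluded from the average, so the error on patch 0 in this example does not affect the loss.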
6. The traffic flow prediction method integrating a spatiotemporally decoupled pre-trained model according to claim 1, characterized in that fusing the spatiotemporal features, the spatial features, and the temporal features to obtain the fused features includes the following step: obtaining the fused features by the following formula: [formula missing]; where [symbol missing] denotes the fused features, [symbol missing] denotes the spatiotemporal features, [symbol missing] denotes the spatial features, and [symbol missing] denotes the temporal features.
7. The traffic flow prediction method integrating a spatiotemporally decoupled pre-trained model according to any one of claims 1 to 6, characterized in that obtaining the traffic flow prediction result based on the fused features includes the following step: mapping the fused features to the prediction target dimension through a fully connected layer to obtain the traffic flow prediction result.
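For illustration only, and not as part of the claims, the fully connected mapping of claim 7 amounts to an affine transform from the fused feature dimension to the prediction target dimension. This is a minimal sketch under stated assumptions: the function name `fully_connected` and the weight, bias, and feature values are hypothetical placeholders for parameters that would be learned during training, and the fused feature vector is taken as given.

```python
def fully_connected(fused, weights, bias):
    """Map a fused feature vector to the prediction dimension: y = W f + b.

    fused:   fused feature vector of length D
    weights: H x D matrix, one row per predicted value
    bias:    vector of length H
    """
    if len(weights) != len(bias) or any(len(row) != len(fused) for row in weights):
        raise ValueError("weight, bias, and feature dimensions do not match")
    return [sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(weights, bias)]

# Usage with hypothetical values: 3 fused features mapped to 2 predicted values.
prediction = fully_connected(
    fused=[1.0, 2.0, 3.0],
    weights=[[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
    bias=[0.0, 1.0],
)
```

The number of rows in the weight matrix sets the prediction target dimension, for example the number of future time steps to forecast for a node.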
8. A traffic flow prediction device integrating a spatiotemporally decoupled pre-trained model, characterized in that the device includes: a spatiotemporal feature acquisition unit, used to acquire spatiotemporal features of traffic flow data; a feature extraction unit, used to input a long historical spatiotemporal sequence of the traffic flow data into a pre-trained spatial encoder and a pre-trained temporal encoder to obtain spatial features and temporal features, respectively; a feature fusion unit, used to fuse the spatiotemporal features, the spatial features, and the temporal features to obtain fused features; and a traffic flow prediction unit, used to obtain a traffic flow prediction result based on the fused features.
9. An electronic device, characterized in that the electronic device includes a processor and a memory; the memory is used to store a program; and the processor executes the program to implement the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium stores a program that, when executed by a processor, implements the method according to any one of claims 1 to 7.