A shared spatio-temporal attention convolutional network population quantity prediction method based on mobile phone data

By constructing a road network structure map and a shared spatiotemporal attention convolutional network, the problems of inaccurate spatiotemporal features and insufficient graph structure learning in existing technologies are solved, and higher accuracy population prediction is achieved.

CN115600744B · Active · Publication Date: 2026-06-23 · WUXI BRANCH CHONGQING CITY COMPANY OF CHINA NAT TOBACCO

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
WUXI BRANCH CHONGQING CITY COMPANY OF CHINA NAT TOBACCO
Filing Date
2022-10-20
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing population prediction methods are inaccurate in terms of spatiotemporal features due to changes in road networks when dealing with complex topologies, making it difficult to accurately predict population size. Furthermore, graph structure learning lacks long-distance relationships and has high time overhead, making it difficult to achieve long-term predictions.

Method used

A road network structure map is constructed and processed by a shared spatiotemporal attention convolutional network. Residual connections are made between the shared spatiotemporal convolutional layers and between the blocks. Node features are extracted using the node2vec algorithm. Spatiotemporal associations are adaptively extracted using an optimized graph module and a multi-head attention mechanism. The graph adjacency matrix is optimized to capture dynamic spatial features.

Benefits of technology

It improves the accuracy of population prediction, better captures the spatiotemporal characteristics of data, adaptively extracts temporal and spatial correlation information, and enhances the prediction accuracy of the model.



Abstract

The application belongs to the technical field of population data information analysis, and particularly relates to a shared spatio-temporal attention convolution network population quantity prediction method based on mobile phone data, which comprises the following steps: constructing a road network structure graph; obtaining road network data and mobile phone data, and performing matching mapping on the data to obtain a road network population graph; inputting the road network population graph into a population quantity prediction model to obtain a population quantity prediction result; the population quantity prediction model is a shared spatio-temporal attention convolution network, which is composed of at least two layers of shared spatio-temporal convolution layers, each layer of shared spatio-temporal convolution layer comprises an optimization graph module and a plurality of shared spatio-temporal convolution blocks; residual connection is performed between the shared spatio-temporal convolution layers, and residual connection is performed between the shared spatio-temporal convolution blocks; the application can better capture the spatio-temporal characteristics of data by performing residual connection on a plurality of spatio-temporal convolution modules to form a spatio-temporal convolution layer, and performing residual connection on a plurality of spatio-temporal convolution layers to form a shared spatio-temporal attention convolution network.

Description

Technical Field

[0001] This invention belongs to the field of population data information analysis technology, and specifically relates to a population prediction method based on a shared spatiotemporal attention convolutional network using mobile phone data.

Background Technology

[0002] With continuous social development and accelerated urbanization, the spatial scope of population activity is expanding, and the quantity and speed of population movement are rapidly increasing. This rapid population concentration and disorderly spread have brought about a series of problems, including traffic congestion and urban safety, increasing the difficulty of urban management. Mastering regional active population data and dynamically monitoring population flow distribution can enable early warning of problem areas and provide quantitative data for urban infrastructure provision, effectively assisting in refined urban management.

[0003] Mobile phone data, generated by users during mobile phone use, exhibits strong periodicity and reflects user activity patterns and related information. Therefore, population prediction based on mobile phone data plays a crucial role in fields such as spatiotemporal data mining, resident population analysis, and intelligent transportation systems. Traditional population prediction methods are numerous. They include using models such as the Autoregressive Integrated Moving Average (ARIMA) and Support Vector Machines (SVM) to extract temporal features of population flow; using Long Short-Term Memory (LSTM) networks to predict time-series data; using the ST-ResNet residual structure, based on convolutional neural networks, to extract local spatial correlations between grids; using 3D convolution to simultaneously extract local correlations in the time, spatial, and feature dimensions; and replacing matrix multiplication in LSTM networks with convolution operations to capture long-term and short-term temporal correlations while learning local spatial correlations. While these methods can clearly represent relationships between regions, they cannot handle complex topological structures.

[0004] To address the aforementioned issues, a spectral domain-based graph convolution computation method was proposed. This method uses the Laplacian matrix of the graph to perform a Fourier transform on the graph, and then approximates the solution using spline interpolation to achieve spectral domain-based graph convolution. However, due to the high computational complexity of spectral domain graph convolution, other researchers have proposed using spatial domain graph convolution methods based on adjacency matrices or transition matrices to simplify the propagation and aggregation processes in graph neural networks. Furthermore, gating mechanisms are used to adjust temporal and spatial modules, extracting complex spatiotemporal relationships. For example, the Graph WaveNet network structure proposed by Wu et al. introduces a graph encoding module to improve the spatial relationship representation capability of the graph adjacency matrix. However, all of the above methods suffer from the following problems:

[0005] 1. Since human activities are based on road networks, existing methods use predefined road network maps to obtain the spatiotemporal features of human activities. However, road networks are constantly changing with road traffic, traffic restrictions, etc., which causes human activities to change as well. This results in inaccurate spatiotemporal features extracted by existing methods, leading to low accuracy in prediction results.

[0006] 2. Graph structures require multiple methods to generate, including road network attributes, node autocorrelation, and adaptive graph learning. Among these, road network attributes lack the ability to capture long-distance relationships, and node autocorrelation requires more time. Existing methods for learning graph structures only involve the global relationships of the graph, which fails to fully explore the spatial and temporal features in the data, resulting in the inability to predict population size in the long term.

Summary of the Invention

[0007] To address the problems existing in the prior art, this invention proposes a population prediction method based on mobile phone data using a shared spatiotemporal attention convolutional network. The method includes: constructing a road network structure map; acquiring road traffic data and preprocessing it to obtain mobile phone data and road network data; matching and mapping the road network data and mobile phone data to obtain road network population data, and constructing a road network population map based on the road network population data; inputting the road network population map into a population prediction model to obtain the population prediction result; the population prediction model is a shared spatiotemporal attention convolutional network, which consists of at least two shared spatiotemporal convolutional layers, each of which includes an optimization graph module and multiple shared spatiotemporal convolutional blocks; residual connections are made between the shared spatiotemporal convolutional layers and between the shared spatiotemporal convolutional blocks.

[0008] Preferably, the process of processing the road network population map using a population prediction model includes:

[0009] S1: Set the number of training iterations for the model;

[0010] S2: Use the node2vec algorithm to extract the initial features of each node in the road network population map;

[0011] S3: Construct the adjacency matrix of the road network population map;

[0012] S4: Input the constructed adjacency matrix and the initial features of the nodes into the first shared spatiotemporal convolutional layer to extract spatiotemporal features and obtain the first spatiotemporal feature map of the road network population map;

[0013] S5: Convolve the first spatiotemporal feature map with the adjacency matrix and use it as the input to the next shared spatiotemporal convolutional layer to obtain the nth spatiotemporal feature map; until all shared spatiotemporal convolutional layers have processed the data;

[0014] S6: Calculate the similarity of each node in the last layer of spatiotemporal feature map and construct a similarity matrix;

[0015] S7: Normalize the similarity matrix, aggregate the normalized similarity matrix with the initial features to obtain new initial features, and return to step S4;

[0016] S8: After the model reaches the required number of iterations, all spatiotemporal feature maps are fused to obtain the population prediction result.

[0017] Furthermore, the process by which the shared spatiotemporal convolutional layer processes the input data includes:

[0018] S41: Input the adjacency matrix and initial features into the first shared spatiotemporal convolutional block to extract initial spatiotemporal features and obtain the first initial spatiotemporal feature map;

[0019] S42: Perform residual convolution between the first initial spatiotemporal feature map and the initial features, and use the result of the residual convolution as the input of the next shared spatiotemporal convolution block until all shared spatiotemporal convolution blocks have processed the data;

[0020] S43: Fuse all the initial spatiotemporal feature maps to obtain a spatiotemporal feature map.

[0021] Furthermore, the process by which shared spatiotemporal convolutional blocks process the input data includes:

[0022] S411: Input the adjacency matrix into the optimized graph convolutional network for optimization processing;

[0023] S412: The spatial features of the optimized adjacency matrix are extracted using a shared spatial attention mechanism;

[0024] S413: Perform temporal dilation convolution on the input data;

[0025] S414: A shared temporal attention mechanism is used to extract temporal features from the data after dilated convolution processing to obtain temporal features;

[0026] S415: A shared attention mechanism is used to fuse temporal and spatial features to obtain initial spatiotemporal features.

[0027] Furthermore, the formulas for optimizing the adjacency matrix of a graph structure include:

[0028]

[0029] $T_k(A_o) = 2A_o T_{k-1}(A_o) - T_{k-2}(A_o)$

[0030]

[0031] $T_1 = \mathrm{Softmax}(E)$

[0032] where $g_\theta(\cdot)$ denotes a function of the eigenvalues of L; L represents the normalized graph Laplacian matrix, which is rescaled using its largest eigenvalue and the identity matrix; K represents the order; $\theta_k$ represents the learnable parameters; $T_k(\cdot)$ denotes the Chebyshev polynomial; X represents the input data; $A_o$ represents the optimized graph adjacency matrix; E represents the intermediate state of $T_1$; ReLU represents the activation function; $A_o^{T}$ is the transpose of $A_o$; $T_0$ represents the identity matrix; $T_1$ represents the normalization of E; and Softmax represents the normalization function.

[0033] Furthermore, the shared spatial attention mechanism employs a multi-head attention mechanism, the expression of which includes:

[0034]

[0035]

[0036]

[0037]

[0038] where the attention input represents the sum of the outputs of the optimized graph convolution and the temporally dilated convolution; b represents the current block number; l represents the current layer number in the neural network; GCN represents the optimized graph convolutional neural network; X represents the input data; TDC represents the temporally dilated convolution; query represents the query matrix; cat represents the operation of concatenating multiple matrices; split represents the operation of splitting the input into multiple matrices; key represents the relevance matrix between the queried information and other information; value represents the matrix of queried information; and $W_Q$, $b_Q$, $W_K$, $b_K$, $W_V$, and $b_V$ represent learnable parameters.

[0039] Preferably, the formula for calculating the similarity of nodes in the spatiotemporal feature map is:

[0040]

[0041]

[0042] where $s_{i,j}^{p}$ represents the p-th similarity between node i and node j; CosSimilarity represents the cosine similarity between two vectors; $W_p$ represents the p-th weighted parameter vector; ⊙ represents the Hadamard product; $vec_i$ represents the feature vector of node i; $vec_j$ represents the feature vector of node j; $s_{i,j}$ represents the entry of the similarity matrix for node i and node j; m represents the number of weighted parameter vectors; and p represents the index of the current similarity vector.

[0043] The beneficial effects of this invention are:

[0044] This invention designs a shared spatiotemporal attention convolutional network. The network forms spatiotemporal convolutional layers by residually connecting multiple spatiotemporal convolutional modules, and then residually connects these layers to form the shared spatiotemporal attention convolutional network, which can better capture the spatiotemporal features of data. This invention also designs a shared spatiotemporal attention mechanism that can adaptively extract temporal and spatial correlation information from data, improving the accuracy of population prediction. Furthermore, this invention designs a graph structure optimized through node embedding learning: the node2vec algorithm extracts node embedding features, while a self-attention mechanism learns the correlations between nodes, thus improving the accuracy of model prediction.

Attached Figure Description

[0045] Figure 1 is a flowchart of the population prediction method based on a shared spatiotemporal attention convolutional network using mobile phone data according to the present invention;

[0046] Figure 2 is a schematic diagram of the shared spatiotemporal attention convolutional network structure of the present invention;

[0047] Figure 3 is a schematic diagram of the shared spatiotemporal attention convolutional block structure of the present invention;

[0048] Figure 4 is a structural diagram of the shared attention mechanism of the present invention.

Detailed Implementation

[0049] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0050] As shown in Figure 1, a population prediction method based on a shared spatiotemporal attention convolutional network using mobile phone data includes: constructing a road network structure map; acquiring road traffic data and preprocessing the road traffic data to obtain mobile phone data and road network data; matching and mapping the road network data and mobile phone data to obtain road network population data, and constructing a road network population map based on the road network population data; and inputting the road network population map into a population prediction model to obtain population prediction results. The population prediction model is a shared spatiotemporal attention convolutional network, which includes at least two shared spatiotemporal convolutional layers, each of which includes an optimization graph module and multiple shared spatiotemporal convolutional blocks; residual connections are made between the shared spatiotemporal convolutional layers and between the shared spatiotemporal convolutional blocks.

[0051] Matching and mapping road network data and mobile phone data involves: obtaining the road network structure of the area, which includes road information, information on intersecting road nodes, road traffic data, and surrounding environmental information; and obtaining mobile phone data, which includes basic user information, user locations, and user movement trajectories, etc. Mapping the obtained mobile phone data onto the road network structure yields a road network population data map.
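The matching step can be illustrated with a minimal sketch. The text does not state the matching rule, so the nearest-node assignment by Euclidean distance used here is an assumption, as are the function name `map_phones_to_nodes` and the toy coordinates.

```python
import numpy as np

def map_phones_to_nodes(node_xy, phone_xy):
    """Assign each phone record to its nearest road-network node and count records per node."""
    # Pairwise squared Euclidean distances, shape (num_phones, num_nodes).
    diff = phone_xy[:, None, :] - node_xy[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    nearest = dist2.argmin(axis=1)                      # nearest node index per phone record
    counts = np.bincount(nearest, minlength=len(node_xy))  # population count per node
    return nearest, counts

# Hypothetical coordinates: three road nodes and four phone locations.
node_xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
phone_xy = np.array([[1.0, 1.0], [9.0, 0.5], [0.2, 8.0], [0.5, 0.5]])
nearest, counts = map_phones_to_nodes(node_xy, phone_xy)
```

The per-node counts would then serve as the node signal of the road network population map; a production pipeline would more likely match against road segments and use geodesic distances.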

[0052] Optionally, the data obtained is the publicly available METR-LA and PEMS-BAY datasets from the three major operators, and the traffic data is mapped to the road network population data at a ratio of 1:1.5.

[0053] In this embodiment, a population prediction model is used to process the road network population map. The structure of the population prediction model is shown in Figure 2, and its data processing procedure includes:

[0054] S1: Set the number of training iterations for the model;

[0055] S2: Use the node2vec algorithm to extract the initial features of each node in the road network population map;

[0056] S3: Construct the adjacency matrix of the road network population map;

[0057] S4: Input the constructed adjacency matrix and the initial features of the nodes into the first shared spatiotemporal convolutional layer to extract spatiotemporal features and obtain the first spatiotemporal feature map of the road network population map;

[0058] S41: Input the adjacency matrix and initial features into the first shared spatiotemporal convolutional block to extract initial spatiotemporal features, obtaining the first initial spatiotemporal feature map. As shown in Figure 2, the process by which a shared spatiotemporal convolutional block processes the input data includes:

[0059] S411: Input the adjacency matrix into the optimized graph convolutional network for optimization processing;

[0060] S412: The spatial features of the optimized adjacency matrix are extracted using a shared spatial attention mechanism;

[0061] S413: Perform temporal dilation convolution on the input data;

[0062] S414: A shared temporal attention mechanism is used to extract temporal features from the data after dilated convolution processing to obtain temporal features;

[0063] S415: A shared attention mechanism is used to fuse temporal and spatial features to obtain initial spatiotemporal features. The structure of the shared attention mechanism is shown in Figure 4.

[0064] S42: Perform residual convolution between the first initial spatiotemporal feature map and the initial features, and use the result of the residual convolution as the input of the next shared spatiotemporal convolution block until all shared spatiotemporal convolution blocks have processed the data;

[0065] S43: Fuse all the initial spatiotemporal feature maps to obtain a spatiotemporal feature map.

[0066] S5: Convolve the first spatiotemporal feature map with the adjacency matrix and use it as the input to the next shared spatiotemporal convolutional layer to obtain the nth spatiotemporal feature map; until all shared spatiotemporal convolutional layers have processed the data;

[0067] S6: Calculate the similarity of each node in the last layer of spatiotemporal feature map and construct a similarity matrix;

[0068] S7: Normalize the similarity matrix, aggregate the normalized similarity matrix with the initial features to obtain new initial features, and return to step S4;

[0069] S8: After the model reaches the required number of iterations, all spatiotemporal feature maps are fused to obtain the population prediction result.
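The nesting of blocks inside layers and the two levels of residual connection in steps S4, S41 to S43, S5, and S8 can be sketched as control flow. This is only an outline under stated assumptions: `toy_block` is a hypothetical stand-in for a shared spatiotemporal convolutional block, fusion is taken to be summation (the text does not specify the fusion operator), and the similarity feedback of S6 and S7 is omitted.

```python
import numpy as np

def toy_block(adj, h):
    # Hypothetical stand-in for a shared spatiotemporal convolutional block.
    return np.tanh(adj @ h)

def run_layer(blocks, adj, x):
    outputs, h = [], x
    for block in blocks:
        out = block(adj, h)          # S41: initial spatiotemporal feature map of this block
        outputs.append(out)
        h = out + x                  # S42: residual connection to the layer's initial features
    return np.sum(outputs, axis=0)   # S43: fusion by summation (assumption)

def run_network(layers, adj, x):
    feature_maps, h = [], x
    for blocks in layers:
        fmap = run_layer(blocks, adj, h)   # S4: spatiotemporal feature map of this layer
        feature_maps.append(fmap)
        h = adj @ fmap + h                 # S5: propagate over the graph, plus layer residual
    return np.sum(feature_maps, axis=0)    # S8: fusion by summation (assumption)

adj = np.full((3, 3), 1.0 / 3.0)           # row-normalized toy adjacency matrix
x = np.arange(12, dtype=float).reshape(3, 4)
prediction = run_network([[toy_block, toy_block], [toy_block, toy_block]], adj, x)
```

The sketch uses two layers of two blocks each, matching the minimum of "at least two" shared spatiotemporal convolutional layers stated in the method.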

[0070] In this embodiment, as shown in Figure 3, each shared spatiotemporal attention convolutional block has the same structure, including GCN, temporally dilated convolution, and shared temporal and spatial attention between layers. After temporally dilated convolution, the temporal dimension of the layer input no longer equals the input temporal dimension; therefore, a fully connected layer is used to adjust the dimension.

[0071] A specific implementation of a population prediction method based on a shared spatiotemporal attention convolutional network is disclosed. The overall structure is composed of shared spatiotemporal attention convolutional layers, each layer having shared spatiotemporal attention convolutional blocks and an optimized graph module. The shared spatiotemporal attention convolutional layers take an optimized adjacency matrix and historical traffic data as input, and residual connections are added between layers to ensure that each layer can learn complex spatiotemporal correlations. Each shared spatiotemporal attention convolutional block has different graph convolutional layers and different temporally dilated convolutional layers, where the dilation factor of the dilated convolutional layers is at most 2 to control the receptive field size in the temporal dimension. All blocks in each layer share the same spatial and temporal attention, enabling each layer to learn different spatiotemporal dependency patterns. The adjacency matrix learned by the graph optimization module is used as input to the GCN (Graph Convolutional Neural Network). The key idea of the graph optimization module is to use better node embeddings to learn a better graph structure. To obtain good initial node embeddings, this invention uses the node2vec algorithm to extract node embedding features.

[0072] In this embodiment, since mobile phone data and road network population conditions have strong periodicity, traffic conditions during morning and evening rush hours may be similar on consecutive workdays. Furthermore, lower temperatures or holidays may delay the morning rush hour. The temporal dilation convolution designed in this invention uses four two-dimensional convolutions to extract temporal features at different scales. To further improve the ability to extract global temporal features, a dilation factor is used to sample the input data. Since increasing the dilation factor fills the time dimension of the data and increases the sequence length, this allows a large amount of useless noise data to enter the network for training, reducing model performance. Therefore, in this embodiment, the dilation factor is limited to 1 and 2. The value of the dilation factor is determined based on the current layer number; the formula is:

[0073] dilation factor = number of layers % 2

[0074] Wherein, dilation factor represents the dilation factor, number of layers represents the number of network layers, and % represents the remainder.
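A short sketch of the layer-dependent dilation factor follows. Taken literally, the remainder of the layer number modulo 2 is 0 or 1, whereas the embodiment limits the factor to 1 and 2. The offset of one used below is therefore an assumption made to reconcile the two statements, not something the text specifies, and `dilation_factor` is a hypothetical name.

```python
def dilation_factor(layer_number: int) -> int:
    # Literal formula from the text: layer number modulo 2, which yields 0 or 1.
    remainder = layer_number % 2
    # Assumption: shift by one so the factor falls in {1, 2}, as the embodiment states.
    return remainder + 1

# Factors for layers 1 through 4 under this assumption.
factors = [dilation_factor(layer) for layer in range(1, 5)]
```

Under this reading, odd-numbered layers use a dilation factor of 2 and even-numbered layers use 1.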

[0075] In this embodiment, as the number of layers and the length of the input data sequence increase, the value of the dilation factor changes with the sequence, thereby improving the model's ability to extract global temporal features. Since the four convolutional kernels have different sizes (1x2, 1x3, 1x6, and 1x7), the four outputs are truncated to the same length using the largest kernel as the standard, and then concatenated across channels. The fully connected layer then changes the input sequence length to match the output sequence length of the GCN and adds it to the GCN's input. The expression is:

[0076]

[0077] where TDC represents the temporally dilated convolution; the input is that of the b-th block in the l-th layer; `cat` is a function that concatenates the matrices produced by the preceding calculations; `truncation` is a function that truncates a matrix along a given dimension; the convolution operator denotes the 2D dilated convolution operation; and W and b represent learnable parameters.
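The multi-scale temporal convolution can be sketched as follows, using the four kernel widths named in the text. Several details are assumptions: each branch is a single-channel valid convolution along the time axis, truncation keeps the most recent time steps, the branches are stacked on a new channel axis, and the fully connected layer that restores the sequence length is omitted.

```python
import numpy as np

KERNEL_WIDTHS = (2, 3, 6, 7)  # the 1x2, 1x3, 1x6 and 1x7 kernels named in the text

def dilated_conv1d(x, kernel, dilation):
    """Valid dilated convolution along the last (time) axis of x with shape (nodes, time)."""
    k = len(kernel)
    out_len = x.shape[-1] - dilation * (k - 1)
    out = np.zeros(x.shape[:-1] + (out_len,))
    for i, w in enumerate(kernel):
        # Tap i reads the input shifted by i * dilation time steps.
        out += w * x[..., i * dilation : i * dilation + out_len]
    return out

def temporal_dilated_conv(x, kernels, dilation):
    branches = [dilated_conv1d(x, k, dilation) for k in kernels]
    # The widest kernel yields the shortest output; truncate every branch to that length.
    min_len = min(b.shape[-1] for b in branches)
    truncated = [b[..., -min_len:] for b in branches]
    # Stack the branches along a new channel axis: (channels, nodes, time).
    return np.stack(truncated, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 24))                         # 5 nodes, 24 time steps
kernels = [rng.normal(size=w) for w in KERNEL_WIDTHS]
y = temporal_dilated_conv(x, kernels, dilation=2)
simple = dilated_conv1d(np.arange(6.0)[None, :], np.array([1.0, 1.0]), dilation=2)
```

With 24 input steps and a dilation factor of 2, the width-7 kernel leaves 24 - 2 * 6 = 12 steps, so all four branches are truncated to length 12.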

[0078] In this embodiment, the shared attention consists of shared spatial attention and shared temporal attention, which can further understand the global correlation of each layer of temporal and spatial features and capture the spatiotemporal correlation of traffic data.

[0079] A shared spatial attention mechanism is proposed. Pedestrian traffic on a road segment reflects population size and is influenced by pedestrian traffic on other road segments, with these factors interacting. To model dynamic global spatial features, a shared spatial attention mechanism is designed to adaptively extract correlations between road networks. To improve stability, the shared spatial attention mechanism employs a multi-head attention mechanism, whose expression includes:

[0080]

[0081]

[0082]

[0083]

[0084] where the attention input represents the sum of the outputs of the optimized graph convolution and the temporally dilated convolution; b represents the current block number; l represents the current layer number in the neural network; GCN represents the optimized graph convolutional neural network; X represents the input data; TDC represents the temporally dilated convolution; query represents the query matrix; cat represents the operation of concatenating multiple matrices; split represents the operation of splitting the input into multiple matrices; key represents the relevance matrix between the queried information and other information; value represents the matrix of queried information; and $W_Q$, $b_Q$, $W_K$, $b_K$, $W_V$, and $b_V$ represent learnable parameters.

[0085] The attention score is calculated using the query and key, expressed as follows:

[0086]

[0087]

[0088] where score represents the spatial attention score, matmul represents matrix multiplication, T represents the matrix transpose, the result represents the spatial attention, and value represents the matrix of the information being queried.

[0089] The correlation between vertex $v_i$ and vertex $v_j$ in the graph structure is:

[0090]

[0091]

[0092] Where · represents the vector inner product operation, N represents the number of nodes, d = hid / heads, hid represents the dimension of the hidden state, and heads represents the number of matrices after dividing a matrix into multiple matrices.

[0093] In this embodiment, heads = 8; and a shared attention mechanism is used to further enhance the ability to dynamically model spatiotemporal dependencies, i.e., each layer uses the same structure, but processes different data, and the output of all blocks in each layer will pass through the same shared attention layer. This mechanism enables the network to learn different information patterns in each layer.
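A minimal NumPy sketch of multi-head spatial attention with heads = 8 and d = hid / heads is given below. It follows the standard scaled dot-product form, which is an assumption here: the scaling by the square root of d and the softmax over vertices are not confirmed by the text, the bias terms are omitted, and the weight matrices `w_q`, `w_k`, and `w_v` are hypothetical names.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_spatial_attention(h, w_q, w_k, w_v, heads=8):
    """h: (N, hid) node features. Returns (N, hid) attended features and (heads, N, N) scores."""
    n, hid = h.shape
    d = hid // heads

    def project(w):
        # Linear projection, then split the hidden dimension into heads: (heads, N, d).
        return (h @ w).reshape(n, heads, d).transpose(1, 0, 2)

    query, key, value = project(w_q), project(w_k), project(w_v)
    # Inner products between every pair of vertices, scaled by sqrt(d) (assumption).
    score = softmax(query @ key.transpose(0, 2, 1) / np.sqrt(d), axis=-1)
    out = score @ value                        # (heads, N, d)
    # Concatenate the heads back into one matrix: (N, hid).
    return out.transpose(1, 0, 2).reshape(n, hid), score

rng = np.random.default_rng(0)
N, HID = 6, 64
h = rng.normal(size=(N, HID))
w_q, w_k, w_v = (rng.normal(size=(HID, HID)) * 0.1 for _ in range(3))
attended, score = multi_head_spatial_attention(h, w_q, w_k, w_v, heads=8)
```

Sharing the mechanism across the blocks of a layer would correspond to reusing the same `w_q`, `w_k`, and `w_v` for every block output in that layer.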

[0094] Shared temporal attention: shared temporal attention and shared spatial attention have the same structure; the main difference is that the attention is calculated along the time axis instead of the spatial axis. Therefore, shared temporal attention can extract global temporal features of the traffic road network and then fuse them with the output of shared spatial attention, as shown below:

[0095]

[0096] where $output_{att}$ denotes the initial spatiotemporal features, gelu represents the activation function, and the two inputs denote the spatial features and the temporal features, respectively.

[0097] The optimized graph module adaptively learns and optimizes the graph adjacency matrix from the data, effectively capturing the dynamic spatial features of the road network. Its core idea is to optimize the graph's adjacency matrix using optimized node feature vectors. During model training, node feature vectors are encapsulated as network parameters, thus allowing them to be trained and updated along with other network parameters. Weighted cosine similarity is then used to calculate the similarity between nodes, as shown below:

[0098]

[0099]

[0100] where $s_{i,j}^{p}$ represents the p-th similarity between node i and node j; CosSimilarity represents the cosine similarity between two vectors; $W_p$ represents the p-th weighted parameter vector; ⊙ represents the Hadamard product; $vec_i$ represents the feature vector of node i; $vec_j$ represents the feature vector of node j; $s_{i,j}$ represents the entry of the similarity matrix for node i and node j; m represents the number of weighted parameter vectors; and p represents the index of the current similarity vector.
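The weighted cosine similarity can be sketched in NumPy from the quantities defined above: m weighted parameter vectors, a Hadamard product with each node feature vector, and a cosine similarity per weight vector. Combining the m similarities by averaging is an assumption, since the combining expression is not reproduced in the text; the function name is hypothetical.

```python
import numpy as np

def weighted_cosine_similarity(vec, w, eps=1e-8):
    """vec: (N, d) node feature vectors; w: (m, d) weighted parameter vectors. Returns (N, N)."""
    # Hadamard product of every weight vector with every node vector: (m, N, d).
    weighted = w[:, None, :] * vec[None, :, :]
    norm = np.linalg.norm(weighted, axis=-1, keepdims=True)
    unit = weighted / np.maximum(norm, eps)   # guard against division by zero
    # Cosine similarity for each weight vector p: (m, N, N).
    s_p = unit @ unit.transpose(0, 2, 1)
    # Averaging over the m weight vectors is an assumption.
    return s_p.mean(axis=0)

rng = np.random.default_rng(0)
vec = rng.normal(size=(4, 3))   # 4 nodes with 3-dimensional features
w = np.ones((2, 3))             # m = 2 weight vectors (all ones reduces to plain cosine similarity)
S = weighted_cosine_similarity(vec, w)
```

In training, `vec` and `w` would be learnable parameters updated together with the rest of the network, as the preceding description of the optimized graph module states.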

[0101] To obtain good initial node features, the node2vec algorithm is applied to a predefined adjacency matrix, with the p value of node2vec set to 1 and the q value set to 2. Figure 2 shows that the similarity matrix is fed into a dual-gated aggregation mechanism, which controls how much of the learned graph structure is used to update the current graph. In this way, part of the original graph structure's state is preserved while deep hidden relationships are learned.

[0102]

[0103] $A_o = \mu\left(\lambda A + (1-\lambda)A_s\right) + (1-\mu)A^{(1)}$

[0104] where $A_s$ is obtained by linear normalization of the similarity matrix, A is a predefined graph, and $A^{(1)}$ is calculated during model initialization using cosine similarity and normalization. λ and μ are hyperparameters used to control the proportion of the currently optimized graph structure in the graph adjacency matrix.
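The dual-gated aggregation can be sketched as a direct transcription of the formula, reading the trailing "(1)" as a superscript on the last term, so that the graph computed at initialization is a third input alongside the predefined graph and the normalized similarity matrix. That reading, the min-max interpretation of "linear normalization", and the function names are assumptions.

```python
import numpy as np

def linear_normalize(s):
    # Rescale entries to [0, 1]; one possible reading of "linear normalization".
    lo, hi = s.min(), s.max()
    return (s - lo) / (hi - lo) if hi > lo else np.zeros_like(s)

def aggregate_graph(a_pre, a_sim, a_init, lam, mu):
    """A_o = mu * (lam * A + (1 - lam) * A_s) + (1 - mu) * A^(1)."""
    return mu * (lam * a_pre + (1.0 - lam) * a_sim) + (1.0 - mu) * a_init

a_pre = np.eye(2)            # predefined graph A
a_sim = np.ones((2, 2))      # normalized similarity matrix A_s
a_init = np.zeros((2, 2))    # graph computed at model initialization, A^(1)
a_o = aggregate_graph(a_pre, a_sim, a_init, lam=0.5, mu=0.8)
```

Setting mu to 0 returns the initialization-time graph unchanged, while lam trades off the predefined graph against the learned similarity structure.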

[0105] In this embodiment, the formula for optimizing the adjacency matrix of the graph structure includes:

[0106]

[0107] $T_k(A_o) = 2A_o T_{k-1}(A_o) - T_{k-2}(A_o)$

[0108]

[0109] $T_1 = \mathrm{Softmax}(E)$

[0110] where $g_\theta(\cdot)$ denotes a function of the eigenvalues of L; L represents the normalized graph Laplacian matrix, which is rescaled using its largest eigenvalue and the identity matrix; K represents the order; $\theta_k$ represents the learnable parameters; $T_k(\cdot)$ denotes the Chebyshev polynomial; X represents the input data; $A_o$ represents the optimized graph adjacency matrix; E represents the intermediate state of $T_1$; ReLU represents the activation function; $A_o^{T}$ is the transpose of $A_o$; $T_0$ represents the identity matrix; $T_1$ represents the normalization of E; and Softmax represents the normalization function.
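The Chebyshev recurrence on the optimized adjacency matrix can be sketched as below. The sketch takes the first-order term as an argument, since the text sets it to Softmax(E) without reproducing the expression for E. Starting the recurrence from the identity matrix and using scalar coefficients for the learnable parameters are simplifying assumptions, and the function names are hypothetical.

```python
import numpy as np

def chebyshev_terms(a_o, t1, order):
    """Return [T_0, ..., T_order] with T_k = 2 A_o T_{k-1} - T_{k-2}."""
    terms = [np.eye(a_o.shape[0]), t1]       # T_0 = I (assumption), T_1 supplied by the caller
    for _ in range(2, order + 1):
        terms.append(2.0 * a_o @ terms[-1] - terms[-2])
    return terms[: order + 1]

def cheb_graph_conv(a_o, t1, x, theta):
    """Sum over k of theta_k * T_k @ X, with scalar theta_k (simplification)."""
    terms = chebyshev_terms(a_o, t1, len(theta) - 1)
    return sum(th * (t @ x) for th, t in zip(theta, terms))

a_o = 0.5 * np.eye(2)
terms = chebyshev_terms(a_o, t1=a_o, order=2)
x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = cheb_graph_conv(a_o, a_o, x, theta=[1.0, 0.0, 0.0])
```

For this diagonal example the second-order term equals 2 * 0.25 * I - I = -0.5 * I, which matches the scalar Chebyshev polynomial 2x^2 - 1 evaluated at x = 0.5.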

[0111] One pass of the gradient descent algorithm over all training data is called one round (epoch). Each round updates the model's parameters, with a maximum of 100 rounds. Over these 100 rounds of training, the model that achieves the minimum error on the validation dataset is saved together with its parameters and used to generate evaluation metrics on the test set.
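The model selection rule described here, keeping the parameters with the lowest validation error over at most 100 rounds, can be sketched independently of any framework. The callables `train_step` and `validate` are hypothetical placeholders for one training pass and one validation pass.

```python
import copy

def train_with_best_checkpoint(params, train_step, validate, max_rounds=100):
    """Run up to max_rounds rounds and keep the parameters with the lowest validation error."""
    best_error, best_params = float("inf"), copy.deepcopy(params)
    for _ in range(max_rounds):
        params = train_step(params)          # one pass over the training data
        error = validate(params)             # error on the validation set
        if error < best_error:
            best_error, best_params = error, copy.deepcopy(params)
    return best_params, best_error

# Toy usage: "training" moves a scalar weight toward 3 and then overshoots it.
best, err = train_with_best_checkpoint(
    {"w": 0.0},
    train_step=lambda p: {"w": p["w"] + 0.5},
    validate=lambda p: abs(p["w"] - 3.0),
    max_rounds=100,
)
```

In the toy run the weight keeps growing for all 100 rounds, but the returned parameters are those from the round where the validation error was smallest.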

[0112] Evaluation metrics: Three metrics are used: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE); their expressions are as follows:

[0113]

[0114]

[0115]

[0116] Here, x represents the input data and N represents the number of data points; the output data is denoted by the second symbol in the expressions above.
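For reference, the three named metrics can be computed as in the sketch below. It assumes the widely used standard definitions of MAE, RMSE, and MAPE, with MAPE expressed as a percentage and zero-valued targets skipped to avoid division by zero; these conventions and the function names are assumptions rather than details taken from the expressions above.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: mean of |y_true - y_pred|."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared difference."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred, eps=1e-8):
    """Mean Absolute Percentage Error in percent; near-zero targets are skipped."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mask = np.abs(y_true) > eps
    return float(np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100.0)

# Toy usage: two observed population counts and their predictions.
observed = [100.0, 200.0]
predicted = [110.0, 180.0]
scores = (mae(observed, predicted), rmse(observed, predicted), mape(observed, predicted))
```

MAE and RMSE are in the same units as the population counts, while MAPE is scale-free, which is why the three are usually reported together.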

[0117] The above-described embodiments further illustrate the purpose, technical solution, and advantages of the present invention. It should be understood that the above-described embodiments are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made to the present invention within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A population prediction method based on a shared spatiotemporal attention convolutional network using mobile phone data, characterized in that the method comprises: constructing a road network structure diagram; acquiring road traffic data and preprocessing it to obtain mobile phone data and road network data; matching and mapping the road network data and the mobile phone data to obtain road network population data, and constructing a road network population map based on the road network population data; and inputting the road network population map into a population prediction model to obtain a population prediction result. The population prediction model is a shared spatiotemporal attention convolutional network comprising at least two shared spatiotemporal convolutional layers. Each shared spatiotemporal convolutional layer includes an optimization graph module and multiple shared spatiotemporal convolutional blocks. Residual connections are established between shared spatiotemporal convolutional layers and between shared spatiotemporal convolutional blocks.
The process by which the population prediction model processes the road network population map includes:
S1: Set the number of training iterations for the model;
S2: Use the node2vec algorithm to extract the initial features of each node in the road network population map;
S3: Construct the adjacency matrix of the road network population map;
S4: Input the constructed adjacency matrix and the initial features of the nodes into the first shared spatiotemporal convolutional layer to extract spatiotemporal features and obtain the first spatiotemporal feature map of the road network population map;
S41: Input the adjacency matrix and the initial features into the first shared spatiotemporal convolutional block to extract initial spatiotemporal features and obtain the first initial spatiotemporal feature map;
S42: Perform residual convolution between the first initial spatiotemporal feature map and the initial features, and use the result as the input of the next shared spatiotemporal convolutional block, until all shared spatiotemporal convolutional blocks have processed the data;
S43: Fuse all the initial spatiotemporal feature maps to obtain a spatiotemporal feature map;
S5: Convolve the first spatiotemporal feature map with the adjacency matrix and use the result as the input to the next shared spatiotemporal convolutional layer to obtain the nth spatiotemporal feature map, until all shared spatiotemporal convolutional layers have processed the data;
S6: Calculate the similarity of each node in the spatiotemporal feature map of the last layer and construct a similarity matrix;
S7: Normalize the similarity matrix, aggregate the normalized similarity matrix with the initial features to obtain new initial features, and return to step S4;
S8: After the model reaches the required number of iterations, fuse all spatiotemporal feature maps to obtain the population prediction result.

2. The population prediction method based on shared spatiotemporal attention convolutional networks using mobile phone data according to claim 1, characterized in that the road traffic data includes: road information, road intersection node information, road traffic data, surrounding environment information, and mobile phone data, wherein the mobile phone data includes the user's basic information, the user's location, and the user's movement trajectory.

3. The population prediction method based on shared spatiotemporal attention convolutional networks using mobile phone data according to claim 1, characterized in that the process by which a shared spatiotemporal convolutional block processes input data includes:
S411: Perform dilated convolution processing on the input data;
S412: Input the adjacency matrix after dilated convolution into the optimized graph convolutional network for optimization;
S413: Use a shared spatial attention mechanism to extract spatial features from the optimized adjacency matrix;
S414: Use temporal dilated convolution to process the data after dilated convolution;
S415: Use a shared temporal attention mechanism to extract temporal features from the data after dilated convolution processing;
S416: Use a shared attention mechanism to fuse the temporal and spatial features to obtain initial spatiotemporal features.

4. The population prediction method based on shared spatiotemporal attention convolutional networks using mobile phone data according to claim 3, characterized in that the formulas for optimizing the adjacency matrix of the graph structure include: [formula not reproduced]; $T_k(A_o) = 2 A_o T_{k-1}(A_o) - T_{k-2}(A_o)$; [formula not reproduced]; $T_1 = \mathrm{Softmax}(E)$; wherein $g_\theta(\cdot)$ denotes a function of the eigenvalues of $L$, $L$ is the normalized graph Laplacian matrix, and $K$ is the order; $\theta_k$ denotes the learnable parameters; $T_k(\cdot)$ denotes the Chebyshev polynomial, which is rescaled based on the largest eigenvalue of $L$ and the identity matrix; $X$ denotes the input data; $A_o$ denotes the optimized graph adjacency matrix; $E$ denotes the intermediate state used to compute $T_1$; ReLU denotes the activation function; $A_o^{\top}$ is the transpose of $A_o$; $T_1$ denotes the normalization of $E$; and Softmax denotes the normalization function.

5. The population prediction method based on shared spatiotemporal attention convolutional networks using mobile phone data according to claim 3, characterized in that the shared spatial attention mechanism employs a multi-head attention mechanism, the expressions of which include: [formulas not reproduced]; wherein the symbols denote, in order: the sum of the outputs of the optimized graph convolution and the temporally dilated convolution; an index indicating which block has been passed, together with l, which indicates the current layer of the neural network; the optimized graph convolutional neural network; the input data; the temporally dilated convolution; the query matrix; the operation of concatenating multiple matrices; the operation of splitting the input into multiple matrices; two sets of learnable parameters; a relevance matrix between the queried information and other information; two sets of learnable parameters; a matrix representing the queried information; and two further sets of learnable parameters.

6. The population prediction method based on shared spatiotemporal attention convolutional networks using mobile phone data according to claim 3, characterized in that the formula for fusing temporal attention and spatial attention is: [formula not reproduced]; wherein the symbols denote, in order: the initial spatiotemporal features; the activation function; the spatial features; and the temporal features.

7. The population prediction method based on shared spatiotemporal attention convolutional networks using mobile phone data according to claim 3, characterized in that the process of processing input data using dilated convolution includes: extracting temporal features at different scales using four two-dimensional convolutions, and sampling the input data using a dilation factor; the formula for determining the dilation factor is: [formula not reproduced]; wherein the symbols denote, in order: the dilation factor; the number of network layers; and the remainder (modulo) operation.

8. The population prediction method based on shared spatiotemporal attention convolutional networks using mobile phone data according to claim 1, characterized in that the formulas for calculating the similarity between nodes in the spatiotemporal feature map are: [formulas not reproduced]; wherein the symbols denote, in order: the p-th similarity vector between node i and node j; cosine similarity, which measures the similarity between two vectors; the weighted parameter vector; the Hadamard product; the feature vector of node i; the feature vector of node j; the similarity matrix between node i and node j; the number of weighted parameter vectors; and the index of the similarity vector being used.
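The node similarity described in claim 8 can be illustrated with a weighted cosine similarity. The sketch below is a minimal illustration under two assumptions: the p-th similarity between nodes i and j is the cosine similarity between the Hadamard products of the p-th weight vector with each node's feature vector, and the final similarity matrix is the average over all weight vectors. The function and parameter names are hypothetical.

```python
import numpy as np

def weighted_cosine_similarity(features, weights):
    """Average of per-weight-vector cosine similarities between all node pairs.

    features: array of shape (n_nodes, dim), one feature vector per node.
    weights:  array of shape (m, dim), one weighted parameter vector per view.
    Returns an (n_nodes, n_nodes) similarity matrix.
    """
    n_nodes = features.shape[0]
    total = np.zeros((n_nodes, n_nodes))
    for w in weights:
        scaled = features * w  # Hadamard product of w with every node vector
        norms = np.linalg.norm(scaled, axis=1, keepdims=True)
        unit = scaled / np.clip(norms, 1e-12, None)  # guard against zero vectors
        total += unit @ unit.T  # cosine similarity for this weight vector
    return total / len(weights)  # averaging over the m vectors is assumed

# Toy usage: three nodes, two features, two all-ones weight vectors.
node_features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
weight_vectors = np.ones((2, 2))
similarity = weighted_cosine_similarity(node_features, weight_vectors)
```

With all-ones weight vectors the result reduces to plain cosine similarity; learned weight vectors let each view emphasize different feature dimensions before the views are combined.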