Traffic prediction method and device, and storage medium

By dividing a geographic region into autonomous domains, identifying crowd flow models, and processing the resulting feature data with a pre-trained model, the method addresses data traffic prediction for telecommunications networks that lack historical traffic data, with higher prediction accuracy and interpretability.

CN115269670B · Active · Publication Date: 2026-06-16 · HUAWEI TECH CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HUAWEI TECH CO LTD
Filing Date
2021-04-30
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing technologies cannot effectively predict telecommunications network data traffic in the absence of historical traffic data, and their reliance on high-quality historical traffic data leads to insufficient prediction accuracy.

Method used

Autonomous regions are divided based on geographic information data and population flow data, and the population flow models and population flow characteristics of the resulting sub-regions are determined. A pre-trained traffic prediction model processes the population flow characteristics to generate traffic prediction curves, thereby achieving data traffic prediction.

Benefits of technology

It enables accurate data traffic prediction in geographical areas without historical traffic data, improving prediction accuracy and interpretability and reducing reliance on historical traffic data.

✦ Generated by Eureka AI based on patent content.

Figure CN115269670B_ABST

Abstract

The application relates to a traffic prediction method and device and a storage medium. The method comprises the following steps: performing autonomous domain division on a geographical region to be predicted according to geographical information data and crowd flow data of the geographical region, to obtain a plurality of sub-regions; for any sub-region, determining a crowd flow model of the sub-region according to the geographical information data and the crowd flow data of the sub-region, wherein the crowd flow model indicates a multi-point movement pattern of crowds in the sub-region; determining a crowd flow feature of the sub-region according to the crowd flow model; and predicting data traffic of the sub-region according to the crowd flow feature of the sub-region, to obtain a data traffic prediction result of the sub-region. Because the embodiments of the application predict the data traffic of the sub-region based on the crowd flow model, they can not only predict data traffic for geographical regions without historical traffic data, but also improve the accuracy of data traffic prediction.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a method, apparatus and storage medium for traffic prediction.

Background Technology

[0002] Network traffic forecasting is of great significance in telecommunications network control and management. Long-term network traffic forecasting helps in network traffic planning to better address potential network problems; short-term network traffic forecasting facilitates real-time dynamic planning of various network resources, such as bandwidth allocation, load balancing, and base station energy saving.

[0003] Currently, data traffic prediction for telecommunications networks typically relies on the network's own traffic statistics. For example, it utilizes the temporal patterns of traffic data from wireless cells to predict data traffic by mining the correlation between historical and future data, that is, using historical traffic data time-series curves to predict future traffic data. However, this method usually requires a high-quality accumulation of historical traffic data; without such data, prediction is impossible.

Summary of the Invention

[0004] In view of this, a traffic prediction method, device, and storage medium are proposed.

[0005] In a first aspect, embodiments of this application provide a traffic prediction method, characterized in that the method includes: dividing the geographical region to be predicted into traffic autonomous regions based on geographic information data and population flow data of the geographical region, to obtain multiple sub-regions; for any sub-region, determining a population flow model of the sub-region based on the geographic information data and population flow data of the sub-region, the population flow model being used to indicate the multi-point movement pattern of the population within the sub-region; determining the population flow characteristics of the sub-region based on the population flow model, the population flow characteristics being used to indicate the frequency of occurrence of the population flow model within the sub-region; and predicting the data traffic of the sub-region based on the population flow characteristics of the sub-region, to obtain a data traffic prediction result for the sub-region.
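The first aspect does not specify the criterion used for autonomous domain division. The sketch below is one hypothetical way to illustrate the idea, assuming the region is pre-split into grid cells, that crowd flow data gives people counts moving between cells, and that geographic information data supplies barriers (for example a river or highway) that a sub-region should not cross. Cell pairs are merged with union-find when their two-way flow reaches a threshold; `divide_autonomous_regions`, `min_flow`, and the barrier representation are illustrative assumptions, not the patented procedure.

```python
from collections import defaultdict


def divide_autonomous_regions(cells, flows, barriers, min_flow):
    """Hypothetical autonomous domain division by union-find.

    cells:    iterable of grid-cell ids covering the region to be predicted
    flows:    {(src_cell, dst_cell): number of people moving src -> dst}
    barriers: set of frozenset({a, b}) cell pairs separated by a geographic
              barrier (assumed to come from geographic information data)
    min_flow: two-way flow needed to place two cells in one sub-region
    """
    parent = {cell: cell for cell in cells}

    def find(cell):
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]  # path halving
            cell = parent[cell]
        return cell

    # Total the flow in both directions for each unordered cell pair.
    pair_flow = defaultdict(int)
    for (src, dst), count in flows.items():
        if src != dst:
            pair_flow[frozenset((src, dst))] += count

    # Merge high-flow pairs unless a geographic barrier separates them.
    for pair, total in pair_flow.items():
        if total >= min_flow and pair not in barriers:
            a, b = tuple(pair)
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(set)
    for cell in parent:
        groups[find(cell)].add(cell)
    # Sort sub-regions so the result is deterministic.
    return sorted(groups.values(), key=sorted)
```

A real deployment would likely use a richer criterion (for example community detection on the flow graph constrained by road network or social management grid boundaries); the threshold rule only shows how geographic and crowd flow data can jointly shape the sub-regions.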

[0006] According to embodiments of this application, the geographic region to be predicted can be divided into traffic autonomous regions based on its geographic information data and population flow data, yielding multiple sub-regions. For any sub-region, a population flow model of the sub-region is determined based on the geographic information data and population flow data of the sub-region, and the population flow characteristics of the sub-region are determined based on the population flow model. Then, the data traffic of the sub-region is predicted based on its population flow characteristics, and the data traffic prediction result of the sub-region is obtained. Thus, by constructing a population flow model for each sub-region and predicting the data traffic of the sub-region from the population flow characteristics determined by that model, it is possible not only to predict the data traffic of geographic regions without historical traffic data, making data traffic prediction independent of the historical traffic data of the geographic region to be predicted, but also to improve the accuracy of data traffic prediction.

[0007] According to the first aspect, in a first possible implementation of the traffic prediction method, the data traffic prediction result includes a traffic prediction curve. The step of predicting the data traffic of the sub-region based on the population flow characteristics of the sub-region to obtain the data traffic prediction result of the sub-region includes: processing the population flow characteristics using a pre-trained traffic prediction model to obtain the coefficients of the traffic prediction curve for the sub-region, wherein the traffic prediction curve is a linear combination of traffic category curves corresponding to the traffic prediction model, and the coefficients of the traffic prediction curve are used to indicate the weight of each traffic category curve in the traffic prediction curve; and determining the traffic prediction curve for the sub-region based on the coefficients of the traffic prediction curve for the sub-region and the traffic category curves corresponding to the traffic prediction model.
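The linear combination in [0007] can be illustrated with a minimal sketch. Here `linear_coefficients` is a hypothetical linear stand-in for the pre-trained traffic prediction model (the application does not fix a model family), and every traffic category curve B_k is assumed to be sampled on the same time axis, so the traffic prediction curve is y(t) = Σ_k c_k · B_k(t).

```python
def linear_coefficients(weights, bias, features):
    """Hypothetical stand-in for the pre-trained traffic prediction model:
    maps a crowd flow feature vector to one coefficient per category curve."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def combine_category_curves(coefficients, category_curves):
    """Traffic prediction curve y(t) = sum_k c_k * B_k(t)."""
    if len(coefficients) != len(category_curves):
        raise ValueError("need exactly one coefficient per traffic category curve")
    length = len(category_curves[0])
    if any(len(curve) != length for curve in category_curves):
        raise ValueError("all traffic category curves must share one time axis")
    return [
        sum(c * curve[t] for c, curve in zip(coefficients, category_curves))
        for t in range(length)
    ]


# Usage with two hypothetical category curves over four time slots
curves = [[1, 2, 3, 2], [0, 1, 0, 1]]
coefficients = linear_coefficients([[1.0, 0.0], [0.0, 0.5]], [0.0, 0.0], [2.0, 1.0])
print(combine_category_curves(coefficients, curves))  # [2.0, 4.5, 6.0, 4.5]
```

Because each coefficient is the weight of a named category curve, the prediction can be read as "mostly category 1, partly category 2", which is one reason this representation is more interpretable than a free-form predicted series.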

[0008] In this embodiment, the crowd flow features can be processed by a pre-trained traffic prediction model (i.e., a trained traffic model) to obtain the coefficients of the traffic prediction curve for a sub-region, and the data traffic prediction result (i.e., the traffic prediction curve) for the sub-region is determined from these coefficients and the traffic category curves corresponding to the traffic prediction model. Because the traffic prediction model takes crowd flow features as input, data traffic prediction becomes independent of the historical traffic data of the geographic area to be predicted; and determining the prediction result from the model-derived coefficients and the corresponding traffic category curves also improves the accuracy of data traffic prediction.

[0009] According to the first aspect, in a second possible implementation of the traffic prediction method, determining the crowd flow model of the sub-region based on the geographic information data and crowd flow data of the sub-region includes: determining the location information of key landmarks in the sub-region based on the geographic information data of the sub-region; determining a crowd flow feature map of the sub-region based on the crowd flow data and the location information of the key landmarks, wherein the crowd flow feature map is a directed graph including multiple nodes and lines connecting the nodes, the nodes representing key landmarks, and the lines representing the crowd flow direction between the nodes; and extracting the crowd flow model of the sub-region from the crowd flow feature map.
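A minimal sketch of the steps in [0009], under simplifying assumptions: landmark locations are planar (x, y) points, each crowd trajectory is a list of (x, y) samples, and raw points are snapped to the nearest key landmark. The crowd flow feature map is kept as a `Counter` of directed landmark-to-landmark moves. As a simplification, each individual trajectory is abstracted into a motif by relabeling its landmarks in first-visit order, and the relative frequency of each motif is used as the crowd flow feature described in [0005]. All function names are hypothetical, and the application does not state that motifs are canonicalized this way.

```python
import math
from collections import Counter


def snap_to_landmarks(points, landmarks):
    """Map (x, y) points to the nearest key landmark id, dropping consecutive repeats."""
    visits = []
    for point in points:
        nearest = min(landmarks, key=lambda lid: math.dist(point, landmarks[lid]))
        if not visits or visits[-1] != nearest:
            visits.append(nearest)
    return visits


def build_flow_graph(trajectories, landmarks):
    """Directed crowd flow feature map: (from_landmark, to_landmark) -> move count."""
    edges = Counter()
    for points in trajectories:
        visits = snap_to_landmarks(points, landmarks)
        edges.update(zip(visits, visits[1:]))
    return edges


def motif_of(visits):
    """Abstract a visit sequence by relabeling landmarks in first-visit order,
    so home -> work -> home and work -> home -> work both become ((0, 1), (1, 0))."""
    order = {}
    for landmark in visits:
        order.setdefault(landmark, len(order))
    return tuple(sorted({(order[a], order[b]) for a, b in zip(visits, visits[1:])}))


def motif_frequencies(trajectories, landmarks):
    """Crowd flow features: relative frequency of each motif in the sub-region."""
    counts = Counter(motif_of(snap_to_landmarks(p, landmarks)) for p in trajectories)
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()}
```

Since every motif node is a key landmark before relabeling, each abstract node can be traced back to a concrete place, which reflects the spatial semantics noted in [0010].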

[0010] In this embodiment, the location information of key landmarks in a sub-region is determined, a crowd flow feature map of the sub-region is determined from the crowd flow data and the landmark location information, and a crowd flow motif (i.e., the crowd flow model) is then extracted from the feature map. This improves both the accuracy and the efficiency of extracting the crowd flow motif, and gives each node of the motif spatial semantics, because each node corresponds to a key landmark, thereby improving the interpretability of the crowd flow motif.

[0011] According to the first possible implementation of the first aspect, in the third possible implementation of the traffic prediction method, the method further includes: training the traffic prediction model according to a preset sample set, wherein the sample set includes geographic information data, population flow data and historical traffic data of multiple sample areas.

[0012] In this embodiment, the traffic prediction model is trained using a preset sample set to obtain a trained traffic prediction model, which can improve the accuracy of the traffic prediction model and thus improve the accuracy of data traffic prediction.

[0013] According to the third possible implementation of the first aspect, in the fourth possible implementation of the traffic prediction method, the step of training the traffic prediction model based on a preset sample set includes: determining the population flow characteristics of each sample area based on the geographic information data and population flow data of each sample area in the sample set; determining the traffic category curve based on the historical traffic data of the multiple sample areas; determining the first traffic curve of each sample area based on the traffic category curve, wherein the first traffic curve is a linear combination of the traffic category curves; and training the traffic prediction model by taking the population flow characteristics of each sample area as input and the coefficients of the first traffic curves of each sample area as output.
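The training procedure of [0013] can be sketched as two least-squares problems, assuming the traffic prediction model is linear (the application does not name a model family). First, each sample area's historical traffic curve is projected onto the traffic category curves to obtain the coefficients of its first traffic curve; then one linear map per coefficient is fitted from crowd flow features to those coefficients. The helper names and the small ridge term are illustrative assumptions; pure-Python normal equations are used only to keep the example dependency-free.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, targets, ridge=0.0):
    """Minimize ||rows w - targets||^2 + ridge * ||w||^2 via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def first_curve_coefficients(history, category_curves):
    """Coefficients c such that sum_k c_k * B_k(t) best fits one area's history."""
    rows = [[curve[t] for curve in category_curves] for t in range(len(history))]
    return least_squares(rows, history)


def train_traffic_model(features, histories, category_curves, ridge=1e-6):
    """Fit one linear map per coefficient: c_k ~ w_k . [features, 1].
    The tiny ridge term (also applied to the bias) only guards against singularity."""
    targets = [first_curve_coefficients(h, category_curves) for h in histories]
    rows = [list(f) + [1.0] for f in features]  # append a bias input
    return [least_squares(rows, [t[k] for t in targets], ridge)
            for k in range(len(category_curves))]


def predict_coefficients(model, features):
    """Apply the trained linear maps to a new sub-region's crowd flow features."""
    x = list(features) + [1.0]
    return [sum(w * v for w, v in zip(weights, x)) for weights in model]
```

Only the sample areas need historical traffic data; a new sub-region needs nothing but its crowd flow features at prediction time, which is the property the application relies on for transfer.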

[0014] In this embodiment, the crowd flow features and the first traffic curve of each sample region in the sample set are determined, and the traffic prediction model is trained with the crowd flow features of each sample region as input and the coefficients of the first traffic curve of each sample region as output. This not only improves the accuracy of the traffic prediction model but also makes its input independent of historical traffic data, thereby improving its transferability. Furthermore, restricting the crowd flow features to the sample region also improves the interpretability of the traffic prediction model.

[0015] According to the fourth possible implementation of the first aspect, in the fifth possible implementation of the traffic prediction method, determining the traffic category curve based on the historical traffic data of the plurality of sample areas includes: determining the second traffic curve of each sample area based on the historical traffic data of each sample area; and clustering the second traffic curves of the plurality of sample areas to obtain the traffic category curve.
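The application says only that the second traffic curves are clustered. The sketch below assumes k-means with a deterministic farthest-first initialization, and normalizes each curve by its peak so that clusters group curve shapes (for example daytime versus night-time peaks) rather than absolute volumes; both choices are assumptions. The resulting centroids play the role of the traffic category curves.

```python
def normalize(curve):
    """Scale a traffic curve by its peak so clustering compares shapes."""
    peak = max(curve)
    return [value / peak for value in curve] if peak > 0 else [float(v) for v in curve]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_curves(curves, k, iterations=100):
    """Cluster second traffic curves; the centroids act as traffic category curves."""
    points = [normalize(curve) for curve in curves]
    # Deterministic farthest-first initialization: start from the first curve,
    # then repeatedly add the curve farthest from every chosen centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(values) / len(members) for values in zip(*members)]
    return centroids, labels
```

Because the centroids are peak-normalized shapes, the coefficients fitted against them carry the volume information, which keeps the category curves comparable across sample areas of very different sizes.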

[0016] In this embodiment, the second traffic curve of each sample area is determined from its historical traffic data, and the second traffic curves are clustered to obtain the traffic category curves. This approach is simple, fast, and accurate, improving both processing efficiency and accuracy.

[0017] According to the first aspect or any one of the first to fifth possible implementations of the first aspect, in the sixth possible implementation of the traffic prediction method, the geographic information data includes at least one of the following: a map of the geographic area, a road network, points of interest, regions of interest, building types, or a social management grid; and the crowd flow data includes at least one of the following: online crowd flow big data, crowd trajectory data in minimization of drive tests (MDT) data, or base station handover data related to crowd flow.

[0018] According to the first aspect or one or more of the possible implementations of the first aspect, in the seventh possible implementation of the traffic prediction method, the method is applied to data traffic prediction of a telecommunications network, and the data traffic prediction result includes the data traffic prediction result of the telecommunications network.

[0019] In this embodiment, the traffic prediction method is applied to the data traffic prediction of telecommunications networks. It can predict the data traffic in geographical areas with no historical traffic data or with poor quality historical traffic data, and obtain the data traffic prediction results. The data traffic prediction results can then be used as a reference for telecommunications operators to make decisions such as network planning, bandwidth allocation, load balancing, and base station energy saving.

[0020] In a second aspect, embodiments of this application provide a traffic prediction device, the device comprising: a sub-region division module, configured to divide the geographical region to be predicted into traffic autonomous regions based on geographical information data and population flow data of the geographical region, thereby obtaining multiple sub-regions; a population flow model determination module, configured to determine, for any sub-region, a population flow model of the sub-region based on the geographical information data and population flow data of the sub-region, the population flow model indicating the multi-point movement pattern of the population within the sub-region; a population flow feature determination module, configured to determine the population flow features of the sub-region based on the population flow model, the population flow features indicating the frequency of occurrence of the population flow model within the sub-region; and a traffic prediction module, configured to predict the data traffic of the sub-region based on the population flow features of the sub-region, thereby obtaining a data traffic prediction result for the sub-region.

[0021] According to embodiments of this application, the geographic region to be predicted can be divided into traffic autonomous regions based on its geographic information data and population flow data, yielding multiple sub-regions. For any sub-region, a population flow model of the sub-region is determined based on the geographic information data and population flow data of the sub-region, and the population flow characteristics of the sub-region are determined based on the population flow model. Then, the data traffic of the sub-region is predicted based on its population flow characteristics, and the data traffic prediction result of the sub-region is obtained. Thus, by constructing a population flow model for each sub-region and predicting the data traffic of the sub-region from the population flow characteristics determined by that model, it is possible not only to predict the data traffic of geographic regions without historical traffic data, making data traffic prediction independent of the historical traffic data of the geographic region to be predicted, but also to improve the accuracy of data traffic prediction.

[0022] According to the second aspect, in a first possible implementation of the traffic prediction device, the data traffic prediction result includes a traffic prediction curve, and the traffic prediction module is configured to: process the crowd flow characteristics through a pre-trained traffic prediction model to obtain the coefficients of the traffic prediction curve for the sub-region, wherein the traffic prediction curve is a linear combination of traffic category curves corresponding to the traffic prediction model, and the coefficients of the traffic prediction curve are used to indicate the weight of each traffic category curve in the traffic prediction curve; and determine the traffic prediction curve for the sub-region based on the coefficients of the traffic prediction curve for the sub-region and the traffic category curves corresponding to the traffic prediction model.

[0023] In this embodiment, the crowd flow features can be processed by a pre-trained traffic prediction model (i.e., a trained traffic model) to obtain the coefficients of the traffic prediction curve for the sub-region, and the data traffic prediction result (i.e., the traffic prediction curve) for the sub-region is determined from these coefficients and the traffic category curves corresponding to the traffic prediction model. Because the traffic prediction model takes crowd flow features as input, data traffic prediction becomes independent of the historical traffic data of the geographic area to be predicted; and determining the prediction result from the model-derived coefficients and the corresponding traffic category curves also improves the accuracy of data traffic prediction.

[0024] According to the second aspect, in a second possible implementation of the traffic prediction device, the population flow model determination module is configured to: determine the location information of key landmarks in the sub-region based on the geographic information data of the sub-region; determine a crowd flow feature map of the sub-region based on the crowd flow data of the sub-region and the location information of the key landmarks, wherein the crowd flow feature map is a directed graph including multiple nodes and lines connecting the nodes, the nodes representing key landmarks and the lines representing the crowd flow direction between the nodes; and extract the crowd flow model of the sub-region from the crowd flow feature map.

[0025] In this embodiment, the location information of key landmarks in a sub-region is determined, a crowd flow feature map of the sub-region is determined from the crowd flow data and the landmark location information, and a crowd flow motif (i.e., the crowd flow model) is then extracted from the feature map. This improves both the accuracy and the efficiency of extracting the crowd flow motif, and gives each node of the motif spatial semantics, because each node corresponds to a key landmark, thereby improving the interpretability of the crowd flow motif.

[0026] According to the first possible implementation of the second aspect, in the third possible implementation of the traffic prediction device, the device further includes: a training module, used to train the traffic prediction model according to a preset sample set, wherein the sample set includes geographic information data, population flow data and historical traffic data of multiple sample areas.

[0027] In this embodiment, the traffic prediction model is trained using a preset sample set to obtain a trained traffic prediction model, which can improve the accuracy of the traffic prediction model and thus improve the accuracy of data traffic prediction.

[0028] According to the third possible implementation of the second aspect, in the fourth possible implementation of the traffic prediction device, the training module is configured to: determine the population flow characteristics of each sample area based on the geographical information data and population flow data of each sample area in the sample set; determine the traffic category curve based on the historical traffic data of the multiple sample areas; determine the first traffic curve of each sample area based on the traffic category curve, wherein the first traffic curve is a linear combination of the traffic category curves; and train the traffic prediction model by taking the population flow characteristics of each sample area as input and the coefficients of the first traffic curves of each sample area as output.

[0029] In this embodiment, the crowd flow features and the first traffic curve of each sample region in the sample set are determined, and the traffic prediction model is trained with the crowd flow features of each sample region as input and the coefficients of the first traffic curve of each sample region as output. This not only improves the accuracy of the traffic prediction model but also makes its input independent of historical traffic data, thereby improving its transferability. Furthermore, restricting the crowd flow features to the sample region also improves the interpretability of the traffic prediction model.

[0030] According to the fourth possible implementation of the second aspect, in the fifth possible implementation of the traffic prediction device, determining the traffic category curve based on the historical traffic data of the plurality of sample areas includes: determining the second traffic curve of each sample area based on the historical traffic data of each sample area; and clustering the second traffic curves of the plurality of sample areas to obtain the traffic category curve.

[0031] In this embodiment, the second traffic curve of each sample area is determined from its historical traffic data, and the second traffic curves are clustered to obtain the traffic category curves. This approach is simple, fast, and accurate, improving both processing efficiency and accuracy.

[0032] According to the second aspect or any one of the first to fifth possible implementations of the second aspect, in the sixth possible implementation of the traffic prediction device, the geographic information data includes at least one of the following: a map of the geographic area, a road network, points of interest, regions of interest, building types, or a social management grid; and the crowd flow data includes at least one of the following: online crowd flow big data, crowd trajectory data in minimization of drive tests (MDT) data, or base station handover data related to crowd flow.

[0033] According to the second aspect or one or more of the possible implementations of the second aspect, in the seventh possible implementation of the traffic prediction device, the device is applied to data traffic prediction of a telecommunications network, and the data traffic prediction result includes the data traffic prediction result of the telecommunications network.

[0034] In this embodiment, the traffic prediction device is applied to the data traffic prediction of telecommunications networks. It can predict the data traffic in geographical areas with no historical traffic data or with poor quality historical traffic data, and obtain the data traffic prediction results. The data traffic prediction results can then be used as a reference for telecommunications operators to make decisions such as network planning, bandwidth allocation, load balancing, and base station energy saving.

[0035] In a third aspect, embodiments of this application provide a traffic prediction apparatus, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement, when executing the instructions, the traffic prediction method described in the first aspect or one or more of the possible implementations of the first aspect.

[0036] According to embodiments of this application, the geographic region to be predicted can be divided into traffic autonomous regions based on its geographic information data and population flow data, yielding multiple sub-regions. For any sub-region, a population flow model of the sub-region is determined based on the geographic information data and population flow data of the sub-region, and the population flow characteristics of the sub-region are determined based on the population flow model. Then, the data traffic of the sub-region is predicted based on its population flow characteristics, and the data traffic prediction result of the sub-region is obtained. Thus, by constructing a population flow model for each sub-region and predicting the data traffic of the sub-region from the population flow characteristics determined by that model, it is possible not only to predict the data traffic of geographic regions without historical traffic data, making data traffic prediction independent of the historical traffic data of the geographic region to be predicted, but also to improve the accuracy of data traffic prediction.

[0037] In a fourth aspect, embodiments of this application provide a non-volatile computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the traffic prediction method described in the first aspect or one or more of the possible implementations of the first aspect.

[0038] According to embodiments of this application, the geographic region to be predicted can be divided into traffic autonomous regions based on its geographic information data and population flow data, yielding multiple sub-regions. For any sub-region, a population flow model of the sub-region is determined based on the geographic information data and population flow data of the sub-region, and the population flow characteristics of the sub-region are determined based on the population flow model. Then, the data traffic of the sub-region is predicted based on its population flow characteristics, and the data traffic prediction result of the sub-region is obtained. Thus, by constructing a population flow model for each sub-region and predicting the data traffic of the sub-region from the population flow characteristics determined by that model, it is possible not only to predict the data traffic of geographic regions without historical traffic data, making data traffic prediction independent of the historical traffic data of the geographic region to be predicted, but also to improve the accuracy of data traffic prediction.

[0039] In a fifth aspect, embodiments of this application provide a computer program product including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code. When the computer-readable code runs in an electronic device, a processor in the electronic device executes the traffic prediction method described in the first aspect or one or more of the possible implementations of the first aspect.

[0040] According to embodiments of this application, the geographic region to be predicted can be divided into traffic autonomous regions based on its geographic information data and population flow data, yielding multiple sub-regions. For any sub-region, a population flow model of the sub-region is determined based on the geographic information data and population flow data of the sub-region, and the population flow characteristics of the sub-region are determined based on the population flow model. Then, the data traffic of the sub-region is predicted based on its population flow characteristics, and the data traffic prediction result of the sub-region is obtained. Thus, by constructing a population flow model for each sub-region and predicting the data traffic of the sub-region from the population flow characteristics determined by that model, it is possible not only to predict the data traffic of geographic regions without historical traffic data, making data traffic prediction independent of the historical traffic data of the geographic region to be predicted, but also to improve the accuracy of data traffic prediction.

[0041] These and other aspects of this application will become more apparent in the description of the following embodiments.

Attached Figure Description

[0042] The accompanying drawings, which are included in and form part of this specification, illustrate exemplary embodiments, features, and aspects of this application together with the specification and serve to explain the principles of this application.

[0043] Figure 1 is a schematic diagram of an application scenario of a traffic prediction method according to an embodiment of this application.

[0044] Figure 2 is a schematic diagram of an application scenario of a traffic prediction method according to an embodiment of this application.

[0045] Figure 3 is a flowchart of a traffic prediction method according to an embodiment of this application.

[0046] Figure 4a is a schematic diagram of a crowd flow model according to an embodiment of this application.

[0047] Figure 4b is a schematic diagram of an abstract representation of a crowd flow model according to an embodiment of this application.

[0048] Figure 5 is a flowchart of a traffic prediction method according to an embodiment of this application.

[0049] Figure 6 is a schematic diagram of the migration (transfer) of a traffic prediction model according to an embodiment of this application.

[0050] Figure 7 is a block diagram of a traffic prediction apparatus according to an embodiment of this application.

Detailed Implementation

[0051] Various exemplary embodiments, features, and aspects of this application will now be described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote elements that have the same or similar functions. Although various aspects of the embodiments are shown in the drawings, they are not necessarily drawn to scale unless specifically indicated otherwise.

[0052] The term “exemplary” as used herein means “serving as an example, embodiment, or illustration.” Any embodiment illustrated herein as “exemplary” is not necessarily to be construed as superior to or better than other embodiments.

[0053] Furthermore, to better illustrate this application, numerous specific details are provided in the following detailed embodiments. Those skilled in the art should understand that this application can be implemented without certain specific details. In some instances, methods, means, components, and circuits well-known to those skilled in the art have not been described in detail in order to highlight the main points of this application.

[0054] As 5G networks enter large-scale deployment, telecom operators will further increase their investments, and the pressure of rising capital expenditures (CAPEX) and operating expenses (OPEX) will continue. During the network investment phase, facing the construction of new 5G networks and the expansion of 4G networks, telecom operators need to accurately predict network data traffic growth (e.g., annual growth), identify data traffic hotspots and value areas, and rationally plan network investments to effectively improve average revenue per user (ARPU).

[0055] Currently, common methods for network data traffic forecasting include curve fitting, market share analysis, traditional time series forecasting, baseline analogy, and deep learning-based methods. Curve fitting, which fits historical traffic data and extrapolates the trend using the best-fitting curve, is simple but struggles to match long-term trends, resulting in low prediction accuracy. Market share analysis extrapolates trends from the current terminal situation, consumer preferences, and average data usage per user; it relies on operator user data, which is difficult to obtain. Traditional time series forecasting methods, including the autoregressive integrated moving average (ARIMA) model, linear regression, and Bayesian models, can use only a limited set of data features and have low computational efficiency, so they fail to meet the forecasting requirements of the big data era. Baseline analogy can predict traffic for greenfield networks that lack historical data, but it requires collecting extensive basic information, relies on subjective experience, and has poor accuracy.

[0056] Deep learning-based methods, such as those based on long short-term memory (LSTM) networks and sequence-to-sequence (seq2seq) models, depend on historical traffic data and place high demands on its quality. They are also difficult to transfer (they are effective only for regions with historical traffic data) and have low accuracy in long-term time series prediction.

[0057] To address the foregoing technical problems, this application provides a traffic prediction method. In the method according to the embodiments of this application, the geographic region to be predicted is divided into traffic autonomous zones based on its geographic information data and crowd flow data, yielding multiple sub-regions. For any sub-region, a crowd flow motif, which indicates the multi-point movement pattern of crowds within the sub-region, is determined from the sub-region's geographic information data and crowd flow data, and the crowd flow feature of the sub-region is determined from the motif. The data traffic of the sub-region is then predicted from this feature to obtain a data traffic prediction result. Because the prediction is driven by crowd flow features determined from sub-region motifs, the method can predict data traffic for geographic regions that lack historical traffic data, making the prediction independent of the target region's historical traffic data, and it can also improve prediction accuracy.

[0058] The traffic prediction method of this application can be applied to electronic devices. The electronic device may be, for example, a server, desktop computer, mobile device, or any other type of computing device containing a processor; this application does not limit the specific type of electronic device. The electronic device may include a processor, which can be used to execute the traffic prediction method.

[0059] A processor may include one or more processing units, such as a central processing unit (CPU), an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural network processing unit (NPU). These different processing units may be independent devices or integrated into one or more processors.

[0060] Figure 1 is a schematic diagram of an application scenario of a traffic prediction method according to an embodiment of this application. As shown in Figure 1, the electronic device 10 includes a processor 20 that executes the traffic prediction method. A database 30 connected to the processor 20 is deployed on a device other than the electronic device 10 and stores geographic information data and crowd flow data. The processor 20 can read the geographic information data and crowd flow data of the geographic area to be predicted from the database 30 and predict the data traffic of that area based on the data read. The database 30 may be a single database or multiple databases of the same or different types; this application does not limit this.

[0061] Figure 2 is a schematic diagram of another application scenario of a traffic prediction method according to an embodiment of this application. As shown in Figure 2, the electronic device 10 includes a processor 20 and a database 30. The processor 20 executes the traffic prediction method, and the database 30 stores geographic information data and crowd flow data. The processor 20 can read the geographic information data and crowd flow data of the geographic area to be predicted from the database 30 and predict the data traffic of that area based on the data read.

[0062] In one possible implementation, when there are multiple databases, some can be deployed on the electronic device and others on other devices. Those skilled in the art can configure the database deployment according to actual circumstances; this application does not limit this.

[0063] In one possible implementation, the traffic prediction method can be applied to data traffic prediction in telecommunications networks. The obtained data traffic prediction results can serve as a reference for telecommunications operators when making decisions such as network planning, bandwidth allocation, load balancing, and base station energy saving. The traffic prediction method can also be applied to other scenarios requiring data traffic prediction, such as advertising and market assessment. It should be noted that this application does not limit the specific application scenarios of the traffic prediction method.

[0064] Figure 3 is a flowchart of a traffic prediction method according to an embodiment of this application. As shown in Figure 3, the traffic prediction method includes the following steps:

[0065] Step S310: Divide the geographic area to be predicted into traffic autonomous zones based on its geographic information data and crowd flow data, to obtain multiple sub-regions.

[0066] The geographical area to be predicted can be either a geographical area with or without historical traffic data; this application makes no limitation on this. The geographical area can be an administrative region such as a province, city, district, or county, or a geographical area set up according to actual needs (e.g., the needs of 5G or 4G network deployment). This application makes no limitation on the specific type and size of the geographical area.

[0067] Geographic information data may include at least one of the following: map, road network, point of interest (POI), area of interest (AOI), building type, or social management grid of the geographic area to be predicted. Optionally, geographic information data may also include other spatial semantic data such as land use type and place semantics. Geographic information data may be obtained from a pre-defined geographic information system (GIS) database, or through other means; this application does not impose any restrictions on this.

[0068] Crowd flow data may include at least one of the following: online crowd flow big data (coarse-grained data), crowd trajectory data from minimization of drive tests (MDT), or base station handover (HO) data related to crowd movement. The MDT crowd trajectory data and the crowd-movement-related handover data can be regarded as fine-grained data. Crowd flow data can be obtained from base stations, from terminals performing MDT, and so on, or from big data centers, preset crowd movement trajectory databases, and so on. This application does not limit the source of crowd flow data.

[0069] After the geographic information data and crowd flow data of the geographic area to be predicted are obtained, the area can be divided into traffic autonomous zones (TAZs) based on these data to obtain multiple sub-regions. TAZ division is a spatial partitioning method based on the road network, points of interest (POIs), and the interaction intensity between spatial locations within the geographic area (which can be determined from crowd flow data). The division aims to make the crowd flow patterns within each sub-region highly consistent, so that each sub-region accommodates similar types of crowds and the functional zoning is more rational.

[0070] In one possible implementation, a network community detection algorithm can be used when dividing the geographic area to be predicted into TAZs, so that crowd movement is concentrated within sub-regions and movement between sub-regions is minimized.
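The text does not say which community detection algorithm is used. As one possible choice, the sketch below applies weighted label propagation to a graph whose nodes are small spatial units (for example grid cells) and whose edge weights are crowd flows between them. The input format and the function name `label_propagation` are illustrative assumptions. Units that exchange many people end up sharing a label, and each label group becomes a candidate sub-region.

```python
from collections import defaultdict


def label_propagation(edges, max_iter=100):
    """Weighted label propagation over a crowd flow graph (illustrative sketch).

    edges: dict mapping (u, v) -> crowd flow from spatial unit u to v.
    Direction is ignored: flows u->v and v->u are summed into one weight,
    so heavily interacting units tend to end up in the same community.
    Returns the communities (candidate sub-regions) as a list of sets.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for (u, v), w in edges.items():
        if u != v:  # self-loops carry no information about interaction
            adj[u][v] += w
            adj[v][u] += w
    nodes = sorted(adj)
    label = {v: v for v in nodes}  # every unit starts in its own community
    for _ in range(max_iter):
        changed = False
        for v in nodes:  # fixed visiting order keeps the result deterministic
            score = defaultdict(float)
            for nb, w in adj[v].items():
                score[label[nb]] += w
            best = max(score.values())
            tied = [lab for lab, s in score.items() if s == best]
            # Keep the current label on a tie, otherwise take the smallest.
            new = label[v] if label[v] in tied else min(tied)
            if new != label[v]:
                label[v] = new
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for v, lab in label.items():
        communities[lab].add(v)
    return sorted(communities.values(), key=min)
```

With two tightly connected groups of units joined by a weak link, for example `{("A", "B"): 10, ("B", "C"): 10, ("A", "C"): 10, ("D", "E"): 10, ("E", "F"): 10, ("D", "F"): 10, ("C", "D"): 1}`, the sketch returns the two communities {A, B, C} and {D, E, F}. Modularity-based methods such as Louvain are common alternatives when a quality score for the partition is needed.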

[0071] In one possible implementation, for any sub-region, the theme of the sub-region can be determined from data in its geographic information data such as land use type and place semantics. For example, a theme such as school, residential area, or hospital can be generated for the sub-region from its land use type and place semantics data using a topic model such as latent Dirichlet allocation (LDA). Assigning a theme to each sub-region improves its understandability and interpretability.

[0072] Step S320: For any sub-region, determine the crowd flow motif of the sub-region based on the geographic information data and crowd flow data of the sub-region.

[0073] A motif is a recurring, isomorphic subgraph pattern of a network that appears far more frequently in a real-world network than in random networks with the same numbers of nodes and edges. Motifs characterize specific local interconnection patterns within a network and, from the bottom up, build complex networks with different global structures; they can be regarded as the "building blocks" of real-world networks. In other words, motifs characterize the interaction patterns of a real-world network at the microscopic level and, from the bottom up, constitute its global structure.

[0074] In this embodiment, the crowd flow motif of a sub-region indicates the multi-point movement pattern of crowds within the sub-region and can be determined from the sub-region's geographic information data and crowd flow data. For example, for any sub-region, its crowd flow motifs can be extracted from its geographic information data and crowd flow data using a frequent subgraph mining algorithm such as fast frequent subgraph mining (FFSM).

[0075] Because the crowd flow motifs are tied to sub-regions, the spatial semantics of each motif node are explicit: each node is a point-of-interest area whose meaning can be interpreted from its land use type, such as a subway station, city library, or People's Square. This improves the interpretability of the crowd flow motifs.

[0076] In one possible implementation, step S320 may include: determining the location information of key landmarks in the sub-region based on the sub-region's geographic information data; determining a crowd flow feature map of the sub-region based on the sub-region's crowd flow data and the location information of the key landmarks, where the crowd flow feature map is a directed graph including multiple nodes and lines connecting the nodes, each node representing a key landmark and each line representing the direction of crowd flow between nodes; and extracting the crowd flow motifs of the sub-region from the crowd flow feature map.

[0077] For any sub-region, the location information of key landmarks within the sub-region can be determined based on its geographic information data, such as maps, points of interest (POIs), building types, and land use types. Examples of key landmarks include hospitals, parking lots, and shopping malls. Optionally, based on existing experience, key landmarks (such as gates, school buildings, and cafeterias) can be identified within the sub-region, and their locations can then be determined using the sub-region's geographic information data. Multiple key landmarks may be identified.

[0078] After determining the location information of key landmarks, the flow patterns between key landmarks can be determined based on the location information of key landmarks and the crowd flow data of sub-regions (such as MDT data), thus establishing a crowd flow feature map for the sub-regions. The crowd flow feature map is a directed graph containing multiple nodes and lines connecting them. Nodes represent key landmarks, and lines represent the direction of crowd flow between nodes.
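A minimal sketch of building the crowd flow feature map, assuming each person's MDT trajectory has already been mapped to an ordered list of key landmarks (that mapping step is not shown). The edge weight counts distinct people making each move, in line with W′ being a number of people; the time component of V′ is omitted for brevity. `build_flow_graph` is a hypothetical name.

```python
from collections import Counter


def build_flow_graph(trajectories):
    """Build a directed, weighted crowd flow feature map (sketch).

    trajectories: list of per-person landmark sequences in visit order,
    e.g. ["dorm", "cafeteria", "classroom"]. Consecutive repeats mean the
    person stayed at the same landmark and produce no edge.
    Returns a Counter mapping (from_landmark, to_landmark) to the number
    of distinct people who made that move.
    """
    weights = Counter()
    for path in trajectories:
        moves = {(a, b) for a, b in zip(path, path[1:]) if a != b}
        weights.update(moves)  # each person counted at most once per edge
    return weights


g = build_flow_graph([
    ["dorm", "cafeteria", "classroom", "dorm"],
    ["dorm", "cafeteria", "cafeteria", "library"],
    ["dorm", "cafeteria", "dorm", "cafeteria"],
])
# g[("dorm", "cafeteria")] == 3: all three people made this move
```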

[0079] The crowd flow feature map can be represented as G′ = (V′, E′, W′), where V′ is the set of (key landmark, time) pairs, E′ indicates whether there is interaction (that is, crowd flow) between two key landmarks and its direction, and W′ is the amount of interaction (that is, the number of people) between two key landmarks.

[0080] FFSM can be used to perform frequent subgraph mining on the crowd flow feature map G′ to obtain crowd flow motifs in sub-regions. These crowd flow motifs can be considered as frequently and stably occurring multi-point movement patterns of crowds in the crowd flow feature map, representing the steady-state high-order movement patterns of the crowd. They can reflect the high-order characteristics of crowd movement, thereby enabling the effective utilization of high-order interaction information between multiple locations.

[0081] Assume that a sub-region has m crowd flow motifs. The i-th crowd flow motif of the sub-region can be represented as $M_i = (V_i, E_i)$, where i and m are positive integers and $1 \le i \le m$. $V_i$ is the node set of the i-th crowd flow motif, which is also its set of key landmarks. If the i-th crowd flow motif has n nodes (n a positive integer), then $V_i = \{v_1, v_2, \ldots, v_n\}$, where $v_1$ is the first node, $v_2$ the second node, ..., and $v_n$ the n-th node. $E_i$ represents the directed crowd flows between the nodes of the i-th crowd flow motif, $E_i = (e_{jk})_{n \times n}$, where $e_{jk}$ is the directed crowd flow from the j-th node $v_j$ to the k-th node $v_k$, j and k are positive integers, $1 \le j \le n$, and $1 \le k \le n$.

[0082] Because each key landmark in the sub-region has a clear geographic meaning and an interpretable land use type, each node of the sub-region's crowd flow motifs also carries a clear semantic label, which improves the interpretability of the crowd flow motifs.

[0083] Figure 4a is a schematic diagram of a crowd flow motif according to an embodiment of this application. Assuming the sub-region is a campus and the daily movement pattern of a large number of students on campus is "dormitory → cafeteria → classroom → dormitory", the crowd flow motif extracted for this sub-region is as shown in Figure 4a.

[0084] Figure 4b is a schematic diagram of an abstract representation of a crowd flow motif according to an embodiment of this application (for illustration only). The crowd flow motif in Figure 4a can be abstracted as shown in Figure 4b. As shown in Figure 4b, the crowd flow motif is M = (V, E), where V = (v1, v2, v3), v1 represents the first node (dormitory), v2 the second node (cafeteria), and v3 the third node (classroom); E represents the directed crowd flows between nodes v1, v2, and v3 and can be represented by a matrix, for example...
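One way to hold a motif M = (V, E) in code, assuming E is stored as a binary n×n adjacency matrix with e_jk = 1 when crowds flow from v_j to v_k (weighted entries would work equally well). The `Motif` class and the `motif_from_cycle` helper are illustrative and not part of the described method; the example encodes the campus pattern "dormitory → cafeteria → classroom → dormitory".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Motif:
    """Illustrative container for a crowd flow motif M = (V, E).

    nodes: ordered key-landmark labels v1..vn.
    adj: n x n matrix as nested tuples; adj[j][k] == 1 means crowds flow
    from nodes[j] to nodes[k] (binary entries are an assumption).
    """
    nodes: tuple
    adj: tuple

    def edges(self):
        """Return the directed edges as a set of (from, to) label pairs."""
        n = len(self.nodes)
        return {(self.nodes[j], self.nodes[k])
                for j in range(n) for k in range(n) if self.adj[j][k]}


def motif_from_cycle(labels):
    """Build the motif of a closed tour labels[0] -> ... -> labels[-1] -> labels[0]."""
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for j in range(n):
        adj[j][(j + 1) % n] = 1  # edge from each stop to the next one
    return Motif(tuple(labels), tuple(tuple(row) for row in adj))


campus = motif_from_cycle(["dormitory", "cafeteria", "classroom"])
# campus.adj == ((0, 1, 0), (0, 0, 1), (1, 0, 0))
```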

[0085] Determining the location information of key landmarks in a sub-region, building the sub-region's crowd flow feature map from its crowd flow data and these landmark locations, and then extracting crowd flow motifs from the feature map improves both the accuracy and the efficiency of motif extraction. It also gives each motif node spatial semantics, which improves the interpretability of the crowd flow motifs.

[0086] Step S330: Determine the crowd flow feature of the sub-region based on the crowd flow motif.

[0087] The crowd flow feature indicates the frequency of occurrence of the crowd flow motifs within the sub-region.

[0088] After the crowd flow motifs of a sub-region are identified, the crowd flow feature of that sub-region can be determined from their frequencies of occurrence. For example, if five crowd flow motifs are identified in a sub-region, the frequency of each motif can be determined, where the frequency is the number of people in the sub-region whose movement pattern matches that motif. If the frequencies of the five motifs are d1, d2, d3, d4, and d5, the crowd flow feature of the sub-region can be represented as d = {d1, d2, d3, d4, d5}.
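The text defines a motif's frequency as the number of people whose movement pattern matches it, but leaves the matching rule open. The sketch below assumes a person matches a motif when every directed edge of the motif appears among that person's consecutive moves. Motifs are given as sets of (from, to) landmark pairs, and `motif_frequencies` is a hypothetical name.

```python
def motif_frequencies(trajectories, motifs):
    """Count how many people's movement matches each motif (sketch).

    trajectories: list of per-person landmark sequences in visit order.
    motifs: list of motifs, each a set of directed (from, to) edges.
    Assumed matching rule: a person matches a motif when every motif edge
    occurs among that person's consecutive moves.
    Returns the crowd flow feature d = [d1, d2, ...] in motif order.
    """
    d = [0] * len(motifs)
    for path in trajectories:
        moves = {(a, b) for a, b in zip(path, path[1:]) if a != b}
        for i, edges in enumerate(motifs):
            if edges <= moves:  # all motif edges were traversed
                d[i] += 1
    return d


school_day = {("dorm", "cafeteria"), ("cafeteria", "classroom"), ("classroom", "dorm")}
lunch_loop = {("classroom", "cafeteria"), ("cafeteria", "classroom")}
people = [
    ["dorm", "cafeteria", "classroom", "dorm"],
    ["classroom", "cafeteria", "classroom"],
    ["dorm", "cafeteria", "classroom", "cafeteria", "classroom", "dorm"],
    ["dorm", "library"],
]
print(motif_frequencies(people, [school_day, lunch_loop]))  # prints [2, 2]
```

Counting a person at most once per motif keeps d in units of people, as the text describes; counting every traversal instead would measure repeated visits.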

[0089] In one possible implementation, u types of crowd flow motifs (u a positive integer) can be preset, and the crowd flow feature of a sub-region can be represented as a u-dimensional vector. For any sub-region, its crowd flow motifs can be determined from the u preset motif types together with the sub-region's geographic information data and crowd flow data. The frequency of occurrence of each extracted motif in the sub-region is then determined and written into the corresponding position of the u-dimensional vector to obtain the crowd flow feature of the sub-region. Preset motifs that are not extracted in the sub-region are assigned a frequency of 0.

[0090] For example, six crowd flow motifs are preset: B1, B2, B3, B4, B5, and B6. Three of these motifs, B2, B4, and B5, are identified from sub-region A. When determining the crowd flow characteristics of sub-region A, the frequency of occurrence of crowd flow motifs B2, B4, and B5 within sub-region A can be determined. Assuming these frequencies are 1000, 800, and 1200 respectively, the crowd flow characteristics of sub-region A can be represented as a 6-dimensional vector (0, 1000, 0, 800, 1200, 0). In this vector, the frequency of occurrence of crowd flow motifs B1, B3, and B6, which are not extracted from sub-region A, is set to 0.
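Filling the fixed u-dimensional vector amounts to a lookup with a default of 0. The sketch reproduces the sub-region A example above; `flow_feature_vector` is an illustrative name, and motif IDs are assumed to be strings.

```python
def flow_feature_vector(preset_motifs, extracted):
    """Place extracted motif frequencies into a fixed u-dimensional vector.

    preset_motifs: ordered list of the u preset motif IDs.
    extracted: dict mapping motif ID -> occurrence frequency in the sub-region.
    Preset motifs that were not extracted get frequency 0.
    """
    return [extracted.get(m, 0) for m in preset_motifs]


preset = ["B1", "B2", "B3", "B4", "B5", "B6"]
print(flow_feature_vector(preset, {"B2": 1000, "B4": 800, "B5": 1200}))
# prints [0, 1000, 0, 800, 1200, 0]
```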

[0091] It should be noted that those skilled in the art can set the representation method of crowd flow characteristics according to the actual situation, and can also perform normalization and other processing on the crowd flow characteristics. This application does not impose any restrictions on this.

[0092] Step S340: Predict the data traffic of the sub-region based on its crowd flow feature, to obtain the data traffic prediction result of the sub-region.

[0093] This step rests on two assumptions: the main source of data traffic is people moving within a geographic area, and crowd flow patterns are similar across different geographic areas (for example, different urban areas). Under these assumptions, once the crowd flow feature of a sub-region is determined, its data traffic can be predicted either by comparing its crowd flow feature with those of other geographic areas that have historical traffic data, or by using a pre-trained traffic prediction model.
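For the similarity-based option, the text fixes neither the similarity measure nor how reference curves are combined. A minimal sketch, assuming cosine similarity between crowd flow features and a similarity-weighted average of the traffic curves of the k most similar reference regions (a k-nearest-neighbour scheme). `predict_by_similarity` and its parameters are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_by_similarity(feature, references, k=2):
    """Predict a traffic curve from the k most similar reference regions.

    feature: crowd flow feature of the target sub-region.
    references: list of (feature, traffic_curve) pairs for regions that do
    have historical traffic data; all curves must have the same length.
    Returns the similarity-weighted average of the k nearest curves.
    """
    ranked = sorted(references, key=lambda r: cosine(feature, r[0]), reverse=True)[:k]
    weights = [max(cosine(feature, f), 0.0) for f, _ in ranked]
    total = sum(weights)
    if total == 0:
        raise ValueError("no reference region is similar to the target")
    length = len(ranked[0][1])
    return [sum(w * curve[t] for w, (_, curve) in zip(weights, ranked)) / total
            for t in range(length)]
```

Cosine similarity compares the mix of motifs rather than their absolute counts, so a small and a large campus look alike; if absolute crowd size matters, the weighted curve could be rescaled afterwards.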

[0094] In one possible implementation, the data traffic prediction result can be represented by a time-spectrum curve, a multi-dimensional vector, etc., and this application does not impose any limitations on this. When the traffic prediction method is applied to data traffic prediction in telecommunications networks, the data traffic prediction result may include the data traffic prediction result of the telecommunications network.

[0095] By repeating steps S320, S330, and S340 for each sub-region, the data traffic prediction result of every sub-region of the geographic region to be predicted can be obtained; together, these results form the data traffic prediction result of the geographic region.

[0096] For example, assuming the geographic region to be predicted includes four sub-regions, namely sub-region A1, sub-region A2, sub-region A3 and sub-region A4, the data traffic prediction results of sub-region A1, sub-region A2, sub-region A3 and sub-region A4 can be determined through the above steps S320, S330 and S340 respectively. Then, the data traffic prediction results of sub-region A1, sub-region A2, sub-region A3 and sub-region A4 are determined as the data traffic prediction results of the geographic region.

[0097] According to the embodiments of this application, the geographic region to be predicted can be divided into traffic autonomous zones based on its geographic information data and crowd flow data, yielding multiple sub-regions. For any sub-region, its crowd flow motifs are determined from its geographic information data and crowd flow data, its crowd flow feature is determined from the motifs, and its data traffic is then predicted from the crowd flow feature to obtain its data traffic prediction result. Because the prediction is based on sub-region crowd flow motifs and the crowd flow features derived from them, data traffic can be predicted for geographic regions without historical traffic data, making the prediction independent of the target region's historical traffic data, and prediction accuracy can also be improved.

[0098] Figure 5 is a flowchart of a traffic prediction method according to an embodiment of this application. As shown in Figure 5, the traffic prediction method in this embodiment predicts the data traffic of a sub-region using a traffic prediction model and includes steps S300, S310, S320, S330, S3401, and S3402, where steps S3401 and S3402 are a possible, more detailed implementation of step S340 in the embodiment shown in Figure 3.

[0099] Step S300: Train the traffic prediction model based on the preset sample set.

[0100] The sample set includes geographic information data, population flow data, and historical traffic data for multiple sample areas.

[0101] By training the traffic prediction model using a pre-defined sample set, the accuracy of the traffic prediction model can be improved, thereby enhancing the accuracy of data traffic prediction.

[0102] In one possible implementation, step S300 may include: determining the crowd flow feature of each sample area based on its geographic information data and crowd flow data; determining traffic category curves based on the historical traffic data of the multiple sample areas; determining a first traffic curve of each sample area based on the traffic category curves, where the first traffic curve is a linear combination of the traffic category curves; and training the traffic prediction model with the crowd flow feature of each sample area as input and the coefficients of its first traffic curve as output.

[0103] The sample area in the sample set can be an administrative region such as a province, city, district, or county, or a geographical area designated by operators for management convenience, such as the geographical area designated by operators when deploying 4G / 5G networks. This application does not impose any restrictions on this.

[0104] In one possible implementation, for any sample area in the sample set, the theme of the sample area can be determined from data such as land use type and place semantics in its geographic information data. For example, a theme such as school, residential area, or hospital can be generated for the sample area using a topic model such as latent Dirichlet allocation (LDA). Assigning a theme to each sample area improves its understandability and interpretability.

[0105] In one possible implementation, the crowd flow feature of each sample area can be determined from its geographic information data and crowd flow data. For example, for any sample area, its crowd flow motifs can be determined from its geographic information data and crowd flow data, and its crowd flow feature can then be determined from those motifs. The process is similar to the way the crowd flow features of sub-regions are determined in the embodiment shown in Figure 3 and is not repeated here.

[0106] In one possible implementation, traffic category curves can be determined based on historical traffic data from multiple sample areas. This historical traffic data may include operating parameters, call statistics, and measurement reports (MR). Historical traffic data can be obtained from individual base stations or from the operator's historical traffic database; this application does not restrict the source of the historical traffic data.

[0107] In one possible implementation, the traffic category curve corresponds to a preset duration. For example, when predicting data traffic for the next 7 days (i.e., when the prediction period is 7 days), the preset duration can be set to 7 days. Then, the historical traffic data of each sample area can be divided into 7-day units, and the traffic category curve corresponding to the preset duration (7 days) can be determined based on the divided historical traffic data.

[0108] In one possible implementation, when the traffic category curves are determined, a second traffic curve can first be determined for each sample area based on its historical traffic data.

[0109] For example, for any sample area, statistics can be computed over its historical traffic data. With a preset duration of one day and a preset sampling interval of 10 minutes, the historical traffic data of the sample area can be divided into multiple historical traffic data groups, each covering one 24-hour day. For each group, the traffic of the sample area at each sampling time point is aggregated according to the sampling interval to obtain statistical results, and curve fitting and similar processing are then applied to these results to obtain a second traffic curve, which is a traffic time-spectrum curve. When there are multiple historical traffic data groups, the sample area has multiple second traffic curves.
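The grouping and per-slot statistics can be sketched as follows, assuming traffic records arrive as (minutes since start, traffic volume) pairs and that the statistic is the summed volume per sampling slot. The curve fitting that turns these points into a second traffic curve is left out. `daily_profiles` is a hypothetical name; passing `duration_min=7 * 24 * 60` gives the 7-day grouping mentioned in paragraph [0107].

```python
from collections import defaultdict


def daily_profiles(records, duration_min=24 * 60, interval_min=10):
    """Split traffic records into fixed-duration groups and bin them (sketch).

    records: iterable of (minutes_since_start, traffic) pairs (assumed format).
    Each group covers `duration_min` minutes and is summarized as a list of
    per-interval traffic totals (144 slots for 1 day at 10-minute sampling).
    Returns {group_index: [total traffic per sampling slot]}.
    """
    bins = duration_min // interval_min
    groups = defaultdict(lambda: [0.0] * bins)
    for t, volume in records:
        group, offset = divmod(int(t), duration_min)
        groups[group][offset // interval_min] += volume
    return dict(groups)


p = daily_profiles([(0, 1.0), (5, 2.0), (15, 4.0), (1439, 1.0), (1440, 8.0)])
# p[0][0] == 3.0 (minutes 0 and 5), p[0][143] == 1.0, p[1][0] == 8.0
```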

[0110] It should be noted that the preset duration and sampling interval can be set according to the actual situation such as the quality of historical traffic data and the prediction period, and are not limited to 7 days, 1 day, 24 hours, and 10 minutes as in the above example. That is, this application does not restrict the specific values of the preset duration and sampling interval.

[0111] After the second traffic curves of the sample areas are determined, a clustering algorithm such as k-means clustering (KMeans) or graph-partitioning-based spectral clustering can be used to cluster the second traffic curves of the sample areas in the sample set to obtain the traffic category curves.

[0112] For example, each second traffic curve can be standardized, and each standardized second traffic curve can then be represented, at a preset time interval, as a feature vector x of length D. If a standardized second traffic curve is a 24-hour traffic spectrum curve and the preset time interval is 0.5 hours, then D = 24 ÷ 0.5 = 48, and the feature vector of this curve is x = (x1, x2, ..., x48), where x1 is the first value sampled from the standardized second traffic curve, x2 is the second value, ..., and x48 is the 48th value, with adjacent values 0.5 hours apart.
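A sketch of the standardization and resampling step, assuming z-score standardization (the text only says "standardized") and an input curve sampled every 10 minutes, so that 0.5-hour spacing keeps every third point and a 24-hour curve yields D = 48 values. `feature_vector` is an illustrative name.

```python
import statistics


def feature_vector(curve, points_per_hour=6, step_hours=0.5):
    """Standardize a traffic curve and sample it every `step_hours`.

    curve: traffic values at a fixed resolution (default 6 per hour, i.e.
    10-minute samples, an assumption). Z-score standardization is one
    possible choice; a flat curve maps to all zeros. For a 24 h curve with
    0.5 h spacing the result has D = 24 / 0.5 = 48 entries.
    """
    mean = statistics.fmean(curve)
    std = statistics.pstdev(curve)
    z = [(v - mean) / std if std else 0.0 for v in curve]
    stride = round(points_per_hour * step_hours)  # 3 samples per 0.5 h
    return z[::stride]


x = feature_vector(list(range(144)))  # 144 ten-minute samples = 24 h
# len(x) == 48
```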

[0113] The feature vectors of all standardized second traffic curves are determined in this way. A clustering algorithm such as K-means or graph-partitioning-based spectral clustering can then be applied to these feature vectors to obtain P clusters and their cluster centers, where P is a positive integer. The number of clusters P is taken as the number of traffic category curves, and the curves {l1, l2, ..., lP} corresponding to the centers of the P clusters are determined as the traffic category curves.
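A minimal k-means sketch for obtaining P category curves from the feature vectors. It runs Lloyd's iterations with deterministic farthest-point initialization (an assumption chosen so results are reproducible; k-means++ with random restarts is the more common choice). The returned centers play the role of l1, ..., lP. `kmeans` here is a hand-written helper, not a library API.

```python
def kmeans(vectors, p, iters=100):
    """Lloyd's k-means returning P cluster centers and per-vector labels.

    vectors: equal-length feature vectors (e.g. 48-point curve vectors).
    Centers start with the first vector, then repeatedly add the vector
    farthest from all chosen centers.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [list(vectors[0])]
    while len(centers) < p:
        far = max(vectors, key=lambda v: min(dist2(v, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest center.
        labels = [min(range(p), key=lambda c: dist2(v, centers[c])) for v in vectors]
        new_centers = []
        for c in range(p):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            # Update step: mean of members; keep the old center if empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, labels


centers, labels = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], 2)
# labels == [0, 0, 1, 1]; centers == [[0.0, 0.5], [10.0, 10.5]]
```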

[0114] In one possible implementation, for any cluster, the semantic labels of the traffic category curves corresponding to the cluster can be determined by keyword extraction, semantic analysis, and other methods based on the theme of the sample area corresponding to the second traffic curve in the cluster, such as work type, residential type, transportation type, etc., thereby improving the interpretability of the traffic category curves.

[0115] Determining a second traffic curve for each sample area from historical traffic data and then clustering these curves to obtain the traffic category curves is simple, fast, and accurate, which improves both processing efficiency and accuracy.

[0116] After the traffic category curves are determined, the first traffic curve of each sample area can be determined from them. The first traffic curve is a linear combination of the traffic category curves. For example, the second traffic curve of any sample area can be expressed, using a decomposition method such as Fourier analysis, as a linear combination of the traffic category curves {l1, l2, ..., lP}; the second traffic curve expressed as this linear combination is taken as the first traffic curve of the sample area.

[0117] In one possible implementation, the first traffic curve x′ of any sample area can be represented by the following formula (1):

[0118] $$x' = \sum_{q=1}^{P} c_q l_q + \eta = c_1 l_1 + c_2 l_2 + \cdots + c_P l_P + \eta \qquad (1)$$

[0119] In formula (1), $c_1, c_2, \ldots, c_P$ are the coefficients of the first traffic curve $x'$ and indicate the weights of $l_1, l_2, \ldots, l_P$ in $x'$, respectively; $\eta$ is the residual, which is a white noise sequence; and q is a positive integer with $1 \le q \le P$.
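The text names Fourier analysis as one decomposition method. As a plainly different stand-in, the sketch below estimates the coefficients c1, ..., cP by ordinary least squares, so that x′ ≈ c1·l1 + ... + cP·lP and whatever remains is the residual. It solves the normal equations with Gauss-Jordan elimination to stay dependency-free; `fit_coefficients` is a hypothetical name.

```python
def fit_coefficients(curve, category_curves):
    """Least-squares coefficients c_1..c_P with curve ~ sum_q c_q * l_q.

    curve: the sample area's second traffic curve (length D).
    category_curves: P traffic category curves, each of length D.
    Solves the normal equations (L^T L) c = L^T x by Gauss-Jordan elimination.
    """
    p = len(category_curves)
    # Augmented normal-equation matrix [L^T L | L^T x].
    a = [[sum(li * lj for li, lj in zip(category_curves[i], category_curves[j]))
          for j in range(p)]
         + [sum(li * xi for li, xi in zip(category_curves[i], curve))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("category curves are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:  # eliminate this column from every other row
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


coeffs = fit_coefficients([2, 3, 2, 3], [[1, 0, 1, 0], [0, 1, 0, 1]])
# coeffs == [2.0, 3.0], i.e. the curve is exactly 2*l1 + 3*l2
```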

[0120] After the crowd flow feature and first traffic curve of each sample area are obtained, the traffic prediction model can be trained with the crowd flow feature of each sample area as input and the coefficients of its first traffic curve as output.

[0121] For example, suppose the traffic prediction model includes P sub-models, one for each coefficient, and the number of sample areas is Z. The coefficients of the first traffic curve of the g-th sample area are $c_g = \{c_{g1}, c_{g2}, \ldots, c_{gP}\}$, and the crowd flow feature of the g-th sample area is $d'_g = \{d'_{g1}, d'_{g2}, \ldots, d'_{gu}\}$, where Z and g are positive integers and $1 \le g \le Z$.

[0122] The coefficient c of the first flow curve using the g-th sample region. g and population flow characteristics d′ g When training the traffic prediction model, d′ can be used. g ={d′ g1 ,d′ g2 ,…,d′ gu} as input (i.e., independent variable), c g ={c g1 ,c g2 ,…,c gP The output (i.e., the response variable) is used to train the traffic prediction model. This can be expressed as the following formula (2):

[0123] c gq =f q (d′ g1 ,d′g2 ,…,d′ gu (2)

[0124] In the above formula (2), c gq c g The q-th coefficient, f q This represents the q-th sub-model of the traffic prediction model.

[0125] In other words, the population flow characteristics d′ of the g-th sample region can be used to... g As input, the q-th coefficient c of the first flow curve of the g-th sample region gq As output, the q-th sub-model of the traffic prediction model is trained.

[0126] In one possible implementation, the traffic prediction model can be a machine learning model such as a support vector machine, random forest, or extreme gradient boosting (XGBoost). This application does not limit the specific type of traffic prediction model.

[0127] The following example uses a support vector machine as a traffic prediction model to illustrate its training process.

[0128] The q-th sub-model $f_q$ of the traffic prediction model can be expressed as the following formula (3):

[0129] $f_q(d) = w_q^{\mathsf{T}} \varphi(d) + b_q$ (3)

[0130] In formula (3), d is the input variable, representing the crowd flow features of the input sample area; $w_q$ and $b_q$ are the training parameters of the q-th sub-model; and $\varphi(\cdot)$ is a nonlinear mapping function.

[0131] When the traffic prediction model is a support vector machine, the q-th sub-model $f_q$ can be obtained from the following formula (4):

[0132] $\min_{w_q, b_q} \; \frac{1}{2}\lVert w_q \rVert^2 + C \sum_{g=1}^{Z} l_\epsilon\big(f_q(d'_g) - c_{gq}\big)$, where $l_\epsilon(y) = \max(0, \lvert y \rvert - \epsilon)$ (4)

[0133] In formula (4), C is the regularization constant and $l_\epsilon$ is the ε-insensitive loss function, where $y = f_q(d'_g) - c_{gq}$.

[0134] The q-th sub-model $f_q$ described by formula (4) can be trained (e.g., by adjusting $w_q$ and $b_q$) and tested using the crowd flow features of the Z sample areas and the coefficients of their first traffic curves, yielding the trained q-th sub-model $f_q$. Each sub-model of the traffic prediction model can be trained in the same way, giving the trained traffic prediction model.
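A minimal sketch of formulas (2) to (4), assuming a linear kernel (so $\varphi$ is the identity) and full-batch subgradient descent with a decaying step size in place of the quadratic-programming solvers that SVM libraries normally use. One sub-model is trained per coefficient column, mirroring formula (2). The function names and default hyperparameters (C, ε, learning rate, number of epochs) are hypothetical.

```python
def train_linear_svr(features, targets, C=10.0, eps=0.01, lr=0.001, epochs=2000):
    """Linear epsilon-SVR (phi = identity) fitted by full-batch subgradient descent.

    Minimises 0.5 * ||w||^2 + C * sum_g max(0, |w.d_g + b - c_g| - eps).
    """
    w, b = [0.0] * len(features[0]), 0.0
    for epoch in range(epochs):
        grad_w, grad_b = list(w), 0.0  # the 0.5 * ||w||^2 term contributes w
        for d, target in zip(features, targets):
            y = sum(wi * di for wi, di in zip(w, d)) + b - target  # y = f_q(d'_g) - c_gq
            if abs(y) > eps:  # outside the eps-tube the loss has slope sign(y)
                s = 1.0 if y > 0 else -1.0
                grad_w = [gw + C * s * di for gw, di in zip(grad_w, d)]
                grad_b += C * s
        step = lr / (1.0 + epoch / 100.0)  # decaying step so the iterates settle
        w = [wi - step * gw for wi, gw in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def train_submodels(features, coefficient_rows, **svr_kwargs):
    """Train one sub-model f_q per coefficient column q, as in formula (2)."""
    num_coeffs = len(coefficient_rows[0])
    return [train_linear_svr(features, [row[q] for row in coefficient_rows], **svr_kwargs)
            for q in range(num_coeffs)]
```

For real use, an off-the-shelf ε-SVR implementation, which also supports nonlinear kernels, would likely be preferable to this loop.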

[0135] Determining the crowd flow features and the first traffic curve of each sample area in the sample set, and training the traffic prediction model with the crowd flow features of each sample area as input and the coefficients of its first traffic curve as output, not only improves the accuracy of the traffic prediction model but also confines the crowd flow features to the sample areas, thereby improving the interpretability of the traffic prediction model.

[0136] Step S310: Based on the geographic information data and crowd flow data of the geographic region to be predicted, divide the geographic region into traffic autonomous regions to obtain multiple sub-regions.

[0137] Step S320: For any sub-region, determine the crowd flow motif of the sub-region based on the geographic information data and crowd flow data of the sub-region.

[0138] Step S330: Determine the crowd flow features of the sub-region based on the crowd flow motif.

[0139] Optionally, steps S310, S320, and S330 above are similar to steps S310, S320, and S330 in the embodiment shown in Figure 3, and are not described again here.

[0140] Step S3401: Process the crowd flow features using the traffic prediction model to obtain the coefficients of the traffic prediction curve of the sub-region.

[0141] The traffic prediction model is the traffic prediction model that has been trained in step S300. The traffic prediction curve is a linear combination of the traffic category curves corresponding to the traffic prediction model. The coefficients of the traffic prediction curve are used to indicate the weight of each traffic category curve in the traffic prediction curve.

[0142] For any sub-region, the crowd flow features of the sub-region can be processed using the traffic prediction model. For example, suppose the crowd flow features of the sub-region are $d^* = \{d_1, d_2, \ldots, d_u\}$ and the traffic prediction model consists of P sub-models, one for each coefficient. Then $d^*$ can be input into each sub-model of the traffic prediction model for processing, yielding the P coefficients of the traffic prediction curve of the sub-region.

[0143] In one possible implementation, the q-th coefficient $c^*_q$ of the traffic prediction curve of the sub-region can be determined using the following formula (5):

[0144] $c^*_q = f_q(d_1, d_2, \ldots, d_u), \quad 1 \le q \le P$ (5)
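Applying the trained sub-models to a sub-region reduces to evaluating each $f_q$ on the same feature vector. The sketch assumes linear sub-models stored as hypothetical `(w_q, b_q)` pairs, matching formula (3) with $\varphi$ taken as the identity; the function name is also hypothetical.

```python
def predict_coefficients(submodels, features):
    """Formula (5) sketch: c*_q = f_q(d*) for every trained sub-model q.

    submodels: list of (w_q, b_q) pairs of linear sub-models f_q(d) = w_q . d + b_q.
    features: the sub-region's crowd flow features d* = [d_1, ..., d_u].
    """
    return [sum(wi * di for wi, di in zip(w, features)) + b for w, b in submodels]
```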

[0145] Step S3402: Determine the traffic prediction curve of the sub-region based on the coefficients of the sub-region's traffic prediction curve and the traffic category curves corresponding to the traffic prediction model.

[0146] For any sub-region, the coefficients $c^*_1, c^*_2, \ldots, c^*_P$ of the traffic prediction curve determined in step S3401 can be linearly combined with the traffic category curves $\{l_1, l_2, \ldots, l_P\}$ corresponding to the traffic prediction model to obtain the traffic prediction curve of the sub-region, which can be used as the data traffic prediction result of the sub-region.

[0147] The traffic prediction curve $x^*$ of the sub-region can be expressed by the following formula (6):

[0148] $x^* = c^*_1 l_1 + c^*_2 l_2 + \cdots + c^*_P l_P = \sum_{q=1}^{P} c^*_q l_q$ (6)
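Formula (6) is a pointwise weighted sum of the traffic category curves. The sketch below assumes all category curves share the same time axis; the function name is hypothetical.

```python
def compose_prediction_curve(coefficients, category_curves):
    """Formula (6) sketch: x*(t) = sum_q c*_q * l_q(t), evaluated point by point."""
    if len(coefficients) != len(category_curves):
        raise ValueError("need exactly one coefficient per traffic category curve")
    length = len(category_curves[0])
    return [sum(c * curve[t] for c, curve in zip(coefficients, category_curves))
            for t in range(length)]
```

For coefficients `[2.0, 0.5]` and curves `[[1.0, 0.0, 1.0], [0.0, 2.0, 2.0]]`, it returns `[2.0, 1.0, 3.0]`.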

[0149] By repeatedly executing the above steps S320, S330, S3401 and S3402, the data traffic prediction results for each sub-region of the geographic region to be predicted can be obtained, and the data traffic prediction results for all sub-regions within the geographic region can be determined as the data traffic prediction results for the geographic region.

[0150] In this embodiment, the traffic prediction model can be trained on a preset sample set to obtain a trained traffic prediction model. The geographic region to be predicted is divided into traffic autonomous regions based on its geographic information data and crowd flow data, yielding multiple sub-regions. For any sub-region, the crowd flow motif of the sub-region is determined from its geographic information data and crowd flow data, and the crowd flow features of the sub-region are then determined from the motif. The trained traffic prediction model processes these crowd flow features to obtain the coefficients of the sub-region's traffic prediction curve. Finally, the data traffic prediction result of the sub-region (i.e., the traffic prediction curve) is determined from these coefficients and the traffic category curves corresponding to the traffic prediction model. Because the traffic prediction model takes crowd flow features as input, data traffic prediction does not depend on historical traffic data of the geographic region to be predicted; and because the prediction result is built from the model-predicted coefficients and the model's traffic category curves, the accuracy of data traffic prediction is also improved.

[0151] Figure 6 is a schematic diagram illustrating the transfer of a traffic prediction model according to an embodiment of this application. As shown in Figure 6, the process includes two phases: a training phase 610 and a transfer phase 620.

[0152] In the training phase 610 of the traffic prediction model, a sample set can be established first. Multiple sample geographic regions that have historical traffic data can be selected. For any sample geographic region, based on its geographic information data (e.g., road network and POIs) and crowd flow data 611, a network community discovery algorithm can be used to divide the sample geographic region into traffic autonomous regions (TAZs), yielding multiple sample areas 612. After all the sample geographic regions have been divided, a sample set can be established from the geographic information data and crowd flow data of each sample area (determined from those of the corresponding sample geographic region) and the historical traffic data of each sample area (determined from the historical traffic data 615 of the corresponding sample geographic region).
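The network community discovery algorithm is not specified. One well-known choice is label propagation, sketched below under the assumption that the region has been discretized into hypothetical spatial cells (nodes) connected by crowd flow volumes (weighted edges). Each resulting community would correspond to one traffic autonomous region. The function name is hypothetical, and a fixed update order keeps the sketch deterministic.

```python
from collections import defaultdict


def label_propagation(edges, max_rounds=20):
    """Weighted label propagation: each node repeatedly adopts the label carrying the
    largest total edge weight among its neighbours. Returns {node: community label}."""
    neighbours = defaultdict(dict)
    for a, b, weight in edges:  # treat flows as undirected, summing both directions
        neighbours[a][b] = neighbours[a].get(b, 0.0) + weight
        neighbours[b][a] = neighbours[b].get(a, 0.0) + weight
    labels = {node: node for node in neighbours}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):  # fixed visiting order for reproducibility
            scores = defaultdict(float)
            for other, weight in neighbours[node].items():
                scores[labels[other]] += weight
            # ties go to the smallest label because max() keeps the first maximum
            best = max(sorted(scores), key=lambda lab: scores[lab])
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels
```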

[0153] A frequent subgraph mining algorithm can be used to determine the crowd flow motifs 613 of each sample area in the sample set, and the crowd flow features 614 of each sample area can be determined from its crowd flow motifs 613.
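A full frequent subgraph mining algorithm discovers motif shapes from the data. The simplified sketch below instead counts three hand-picked directed motifs (reciprocal pairs, two-step chains, and three-node cycles) in a crowd flow feature map and returns their relative frequencies as a feature vector. The motif set and the function name are assumptions, and the counts are non-induced (a chain inside a cycle is also counted as a chain).

```python
from itertools import permutations


def motif_frequencies(edges):
    """Relative frequencies of three small directed motifs in a crowd flow feature map.

    edges: iterable of (origin_landmark, destination_landmark) pairs.
    """
    edge_set = {(a, b) for a, b in edges if a != b}
    nodes = sorted({n for edge in edge_set for n in edge})
    counts = {"reciprocal": 0, "chain": 0, "cycle": 0}
    for a, b in edge_set:
        if a < b and (b, a) in edge_set:
            counts["reciprocal"] += 1  # each two-way pair counted once
    for a, b, c in permutations(nodes, 3):
        if (a, b) in edge_set and (b, c) in edge_set:
            counts["chain"] += 1  # every directed two-step path a -> b -> c
        if a < b and a < c and (a, b) in edge_set and (b, c) in edge_set and (c, a) in edge_set:
            counts["cycle"] += 1  # each directed 3-cycle counted once, from its smallest node
    total = sum(counts.values())
    return {name: (n / total if total else 0.0) for name, n in counts.items()}
```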

[0154] Based on the historical traffic data of each sample area, a second traffic curve can be determined for each sample area, and the second traffic curves can be clustered to obtain the traffic category curves 616. Then, using a decomposition method such as Fourier analysis, the second traffic curve of each sample area can be represented as a linear combination of the traffic category curves 616, and the second traffic curve so represented is taken as the first traffic curve 617 of that sample area.

[0155] The traffic prediction model 618 can be trained based on the coefficients of the first traffic curve 617 and the crowd flow features 614 of each sample area. Optionally, the crowd flow features 614 of each sample area are used as input and the coefficients of the first traffic curve 617 of each sample area as output to train the traffic prediction model 618. Training can be terminated when the traffic prediction model 618 meets a preset training termination condition, yielding the trained traffic prediction model 618.

[0156] In the transfer phase 620 of the traffic prediction model, the trained traffic prediction model 618 can be used to predict traffic in geographic regions that have no historical traffic data.

[0157] Based on the geographic information data (such as road network and POIs) and crowd flow data 621 of the geographic region to be predicted (which has no historical traffic data), the region is divided into traffic autonomous regions (TAZs) through a network community discovery algorithm, yielding multiple sub-regions 622. The crowd flow motifs 623 of each sub-region are determined through a frequent subgraph mining algorithm, and the crowd flow features 624 of each sub-region are determined from its crowd flow motifs 623.

[0158] After obtaining the population flow characteristics 624 of each sub-region, for any sub-region, the population flow characteristics 624 of the sub-region can be processed by the traffic prediction model 618 trained in the training phase 610 to obtain the coefficients of the traffic prediction curve of the sub-region. Based on the coefficients of the traffic prediction curve of the sub-region and the traffic category curve corresponding to the traffic prediction model, the traffic prediction curve of the sub-region is determined. The data traffic prediction curves of multiple sub-regions 622 in the geographical region are determined as the data traffic prediction result 625 of the geographical region.

[0159] In this embodiment, the local similarity of different geographic regions can be characterized by sub-regions (i.e., traffic autonomous regions) and crowd flow motifs. Making full use of this local similarity allows the traffic prediction model to be transferred, so that it can predict traffic in geographic regions without historical traffic data.

[0160] Furthermore, the crowd flow features determined from the crowd flow motifs do not rely on personal experience and are subject to fewer restrictions. Compared with existing traffic prediction methods, they can capture more local similarity features, thereby improving the accuracy of traffic prediction.

[0161] The traffic prediction method of this embodiment divides the geographic region to be predicted into traffic autonomous regions to obtain multiple sub-regions. For any sub-region, its crowd flow motif is determined from the geographic information data and crowd flow data of that sub-region. This confines the crowd flow motif to the sub-region and ties the nodes of the motif to real geographic space, thereby enabling spatiotemporal prediction.

[0162] The traffic prediction method of this application does not rely on historical traffic data to extract crowd flow motifs. The traffic prediction model can therefore be trained in geographic regions that have historical traffic data and then transferred to geographic regions without historical traffic data for data traffic prediction.

[0163] Furthermore, crowd movement behavior is unrelated to events such as network adjustments and is not affected by traffic anomalies. Therefore, the traffic prediction method in this application can be applied to non-steady-state scenarios. Non-steady-state scenarios may include scenarios with sudden increases or decreases in traffic caused by major events, network adjustments (such as bulk business migration in and out, equipment maintenance, etc.), promotions (such as Double Eleven), etc.

[0164] Figure 7 is a block diagram of a traffic prediction apparatus according to an embodiment of this application. As shown in Figure 7, the traffic prediction apparatus includes:

[0165] The sub-region division module 710 is used to divide the geographic region into traffic autonomous regions based on the geographic information data and population flow data of the geographic region to be predicted, thereby obtaining multiple sub-regions.

[0166] The crowd flow motif determination module 720 is configured to determine, for any sub-region, the crowd flow motif of the sub-region based on the geographic information data and crowd flow data of the sub-region, where the crowd flow motif indicates the multi-point movement pattern of the crowd in the sub-region;

[0167] The crowd flow feature determination module 730 is configured to determine the crowd flow features of the sub-region based on the crowd flow motif, where the crowd flow features indicate the frequency of occurrence of the crowd flow motif in the sub-region;

[0168] The traffic prediction module 740 is used to predict the data traffic of the sub-region based on the population flow characteristics of the sub-region, and obtain the data traffic prediction result of the sub-region.

[0169] In one possible implementation, the data traffic prediction result includes a traffic prediction curve. The traffic prediction module 740 is configured to: process the crowd flow characteristics using a pre-trained traffic prediction model to obtain the coefficients of the traffic prediction curve for the sub-region, wherein the traffic prediction curve is a linear combination of traffic category curves corresponding to the traffic prediction model, and the coefficients of the traffic prediction curve are used to indicate the weight of each traffic category curve in the traffic prediction curve; and determine the traffic prediction curve for the sub-region based on the coefficients of the traffic prediction curve for the sub-region and the traffic category curves corresponding to the traffic prediction model.

[0170] In one possible implementation, the crowd flow motif determination module 720 is configured to: determine the location information of key landmarks in the sub-region based on the geographic information data of the sub-region; determine a crowd flow feature map of the sub-region based on the crowd flow data of the sub-region and the location information of the key landmarks, where the crowd flow feature map is a directed graph including multiple nodes and lines connecting the nodes, the nodes represent key landmarks, and the lines represent the direction of crowd flow between the nodes; and extract the crowd flow motif of the sub-region from the crowd flow feature map.
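The text does not say how crowd flow data is matched to key landmarks. The sketch below assumes trajectories of projected (x, y) points in metres, snaps each point to the nearest landmark within a hypothetical radius, and counts transitions between consecutive distinct landmarks as directed edges of the crowd flow feature map; edges seen fewer than `min_count` times are dropped. All names and thresholds are hypothetical.

```python
import math
from collections import Counter


def build_flow_feature_map(landmarks, trajectories, radius=200.0, min_count=2):
    """Build a directed crowd flow feature map as {(origin, destination): transition count}.

    landmarks: {name: (x, y)} in metres (projected coordinates).
    trajectories: list of time-ordered point lists [(x, y), ...].
    """
    def nearest_landmark(point):
        name, dist = min(((n, math.dist(point, pos)) for n, pos in landmarks.items()),
                         key=lambda item: item[1])
        return name if dist <= radius else None  # points far from every landmark are ignored

    transitions = Counter()
    for points in trajectories:
        visited = [lm for lm in map(nearest_landmark, points) if lm is not None]
        # collapse consecutive repeats so each hop between landmarks is counted once
        hops = [lm for i, lm in enumerate(visited) if i == 0 or lm != visited[i - 1]]
        transitions.update(zip(hops, hops[1:]))
    return Counter({edge: n for edge, n in transitions.items() if n >= min_count})
```

The resulting edge set could then be passed to a motif-extraction step.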

[0171] In one possible implementation, the device further includes a training module for training the traffic prediction model based on a preset sample set, wherein the sample set includes geographic information data, population flow data, and historical traffic data of multiple sample areas.

[0172] In one possible implementation, the training module is configured to: determine the crowd flow features of each sample area based on the geographic information data and crowd flow data of each sample area in the sample set; determine the traffic category curves based on the historical traffic data of the multiple sample areas; determine the first traffic curve of each sample area based on the traffic category curves, where the first traffic curve is a linear combination of the traffic category curves; and train the traffic prediction model by taking the crowd flow features of each sample area as input and the coefficients of the first traffic curve of each sample area as output.

[0173] In one possible implementation, determining the traffic category curve based on the historical traffic data of the plurality of sample areas includes: determining a second traffic curve for each sample area based on the historical traffic data of each sample area; and clustering the second traffic curves of the plurality of sample areas to obtain the traffic category curve.

[0174] In one possible implementation, the geographic information data includes at least one of the following: a map of the geographic region, a road network, points of interest, areas of interest, building types, or a social management grid; and the crowd flow data includes at least one of the following: online crowd flow big data, crowd trajectory data in minimization of drive tests (MDT) data, or base station handover data related to crowd flow.

[0175] In one possible implementation, the apparatus is applied to data traffic prediction for a telecommunications network, the data traffic prediction result including data traffic prediction results for the telecommunications network.

[0176] Embodiments of this application provide a traffic prediction apparatus, including: a processor and a memory for storing processor-executable instructions; wherein the processor is configured to implement the above-described method when executing the instructions.

[0177] Embodiments of this application provide a non-volatile computer-readable storage medium storing computer program instructions thereon, which, when executed by a processor, implement the above-described method.

[0178] Embodiments of this application provide a computer program product including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code, wherein when the computer-readable code is run in a processor of an electronic device, the processor in the electronic device performs the above-described method.

[0179] A computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing.

[0180] The computer-readable program instructions or code described herein can be downloaded from computer-readable storage media to various computing / processing devices, or downloaded via a network, such as the Internet, local area network, wide area network, and / or wireless network, to an external computer or external storage device. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and / or edge servers. A network adapter card or network interface in each computing / processing device receives the computer-readable program instructions from the network and forwards them to the computer-readable storage media in the respective computing / processing device.

[0181] The computer program instructions used to perform the operations of this application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar languages. The computer-readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider). In some embodiments, electronic circuits such as programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs) are personalized by using state information of the computer-readable program instructions. These electronic circuits can execute the computer-readable program instructions to implement various aspects of this application.

[0182] Various aspects of this application are described herein with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It should be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer-readable program instructions.

[0183] These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine such that, when executed by the processor of the computer or other programmable data processing apparatus, they create means for implementing the functions / actions specified in one or more blocks of the flowchart and / or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium that causes a computer, programmable data processing apparatus, and / or other device to operate in a particular manner; thus, the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions / actions specified in one or more blocks of the flowchart and / or block diagram.

[0184] Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions / actions specified in one or more blocks of the flowchart and / or block diagram.

[0185] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of apparatus, systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of an instruction containing one or more executable instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than those shown in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved.

[0186] It should also be noted that each block in the block diagram and / or flowchart, as well as combinations of blocks in the block diagram and / or flowchart, can be implemented using hardware (such as circuits or ASICs (Application Specific Integrated Circuits)) that performs the corresponding function or action, or using a combination of hardware and software, such as firmware.

[0187] Although the invention has been described herein in conjunction with various embodiments, those skilled in the art will understand and implement other variations of the disclosed embodiments by reviewing the accompanying drawings, disclosure, and appended claims in carrying out the claimed invention. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit can implement several functions listed in the claims. While different dependent claims may recite certain measures, this does not mean that these measures cannot be combined to produce good results.

[0188] The various embodiments of this application have been described above. These descriptions are exemplary and not exhaustive, nor are they limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles, practical application, or improvement of the technology in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.

Claims

1. A traffic prediction method, characterized in that the method includes: dividing, based on the geographic information data and crowd flow data of a geographic region to be predicted, the geographic region into traffic autonomous regions to obtain multiple sub-regions; for any sub-region, determining a crowd flow motif of the sub-region based on the geographic information data and crowd flow data of the sub-region, where the crowd flow motif indicates the multi-point movement pattern of the crowd in the sub-region and is determined by a frequent subgraph mining algorithm; determining, based on the crowd flow motif, crowd flow features of the sub-region, where the crowd flow features indicate the frequency of occurrence of the crowd flow motif in the sub-region; and predicting, based on the crowd flow features of the sub-region, the data traffic of the sub-region to obtain a data traffic prediction result of the sub-region, where the prediction is achieved by comparing the similarity between the crowd flow features of the sub-region and the crowd flow features of other geographic regions, or by using a pre-trained traffic prediction model.

2. The method according to claim 1, characterized in that the data traffic prediction result includes a traffic prediction curve, and the step of predicting the data traffic of the sub-region based on the crowd flow features of the sub-region to obtain the data traffic prediction result of the sub-region includes: processing the crowd flow features using the pre-trained traffic prediction model to obtain the coefficients of the traffic prediction curve of the sub-region, where the traffic prediction curve is a linear combination of the traffic category curves corresponding to the traffic prediction model, and the coefficients of the traffic prediction curve indicate the weight of each traffic category curve in the traffic prediction curve; and determining the traffic prediction curve of the sub-region based on the coefficients of the traffic prediction curve of the sub-region and the traffic category curves corresponding to the traffic prediction model.

3. The method according to claim 1, characterized in that the step of determining the crowd flow motif of the sub-region based on the geographic information data and crowd flow data of the sub-region includes: determining, based on the geographic information data of the sub-region, location information of key landmarks in the sub-region; determining, based on the crowd flow data of the sub-region and the location information of the key landmarks, a crowd flow feature map of the sub-region, where the crowd flow feature map is a directed graph including multiple nodes and lines connecting the nodes, the nodes represent the key landmarks, and the lines represent the direction of crowd flow between the nodes; and extracting the crowd flow motif of the sub-region from the crowd flow feature map.

4. The method according to claim 2, characterized in that the method further includes: training the traffic prediction model based on a preset sample set, where the sample set includes geographic information data, crowd flow data, and historical traffic data of multiple sample areas.

5. The method according to claim 4, characterized in that the step of training the traffic prediction model based on a preset sample set includes: determining, based on the geographic information data and crowd flow data of each sample area in the sample set, the crowd flow features of each sample area; determining traffic category curves based on the historical traffic data of the multiple sample areas; determining, based on the traffic category curves, a first traffic curve for each sample area, where the first traffic curve is a linear combination of the traffic category curves; and training the traffic prediction model by taking the crowd flow features of each sample area as input and the coefficients of the first traffic curve of each sample area as output.

6. The method according to claim 5, characterized in that the step of determining the traffic category curves based on the historical traffic data of the multiple sample areas includes: determining, based on the historical traffic data of each sample area, a second traffic curve for each sample area; and clustering the second traffic curves of the multiple sample areas to obtain the traffic category curves.

7. The method according to any one of claims 1-6, characterized in that the geographic information data includes at least one of the following: a map of the geographic region, a road network, points of interest, areas of interest, building types, or a social management grid; and the crowd flow data includes at least one of the following: online crowd flow big data, crowd trajectory data in minimization of drive tests (MDT) data, or base station handover data related to crowd flow.

8. The method according to any one of claims 1-6, characterized in that the method is applied to data traffic prediction in a telecommunications network, and the data traffic prediction result comprises a data traffic prediction result of the telecommunications network.

9. A traffic prediction device, characterized in that the device comprises: a sub-region division module, configured to divide a geographic region to be predicted into traffic autonomous domains based on geographic information data and crowd flow data of the geographic region, to obtain multiple sub-regions; a crowd flow motif determination module, configured to determine, for any sub-region, a crowd flow motif of the sub-region based on the geographic information data and the crowd flow data of the sub-region, wherein the crowd flow motif indicates a multi-point movement pattern of the crowd within the sub-region and is determined by a frequent subgraph mining algorithm; a crowd flow feature determination module, configured to determine crowd flow features of the sub-region based on the crowd flow motif, wherein the crowd flow features indicate the frequency of occurrence of the crowd flow motif in the sub-region; and a traffic prediction module, configured to predict the data traffic of the sub-region based on the crowd flow features of the sub-region to obtain a data traffic prediction result of the sub-region, wherein the prediction is performed either by comparing the similarity between the crowd flow features of the sub-region and the crowd flow features of other geographic regions, or by using a pre-trained traffic prediction model.
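The similarity-based branch of claim 9 might look like the following sketch. It assumes the crowd flow features are occurrence counts of each motif type in a fixed order, that similarity is measured with cosine similarity, and that the prediction simply reuses the traffic curve of the most similar reference region whose traffic is known; none of these specifics, nor the function names, come from the claims.

```python
import math
from collections import Counter


def motif_count_features(motif_instances, motif_types):
    """Crowd flow features: number of occurrences of each motif type, in a fixed order."""
    counts = Counter(kind for kind, _nodes in motif_instances)
    return [counts[t] for t in motif_types]


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_by_similarity(features, reference_regions):
    """Reuse the known traffic curve of the most similar reference region.

    reference_regions: hypothetical list of (features, traffic_curve) pairs for
    regions whose historical traffic is already known.
    """
    _best_features, best_curve = max(
        reference_regions, key=lambda region: cosine_similarity(features, region[0])
    )
    return best_curve
```

Cosine similarity compares the mix of motifs rather than their absolute counts, so a small sub-region and a large one with the same movement patterns are treated as similar; whether that matches the patent's notion of similarity is an assumption.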

10. The device according to claim 9, characterized in that the data traffic prediction result comprises a traffic prediction curve, and the traffic prediction module is configured to: process the crowd flow features of the sub-region by using the pre-trained traffic prediction model to obtain coefficients of the traffic prediction curve of the sub-region, wherein the traffic prediction curve is a linear combination of the traffic category curves corresponding to the traffic prediction model, and the coefficients of the traffic prediction curve indicate the weight of each traffic category curve in the traffic prediction curve; and determine the traffic prediction curve of the sub-region based on the coefficients of the traffic prediction curve of the sub-region and the traffic category curves corresponding to the traffic prediction model.
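The last step of claim 10, turning predicted coefficients back into a curve, is a weighted sum: at each time step t, the predicted traffic is the sum over j of c_j multiplied by the value of category curve j at t. A minimal sketch, with a hypothetical function name and curves represented as equal-length lists:

```python
def prediction_curve(coefficients, category_curves):
    """Traffic prediction curve as the coefficient-weighted sum of the category curves."""
    if len(coefficients) != len(category_curves):
        raise ValueError("need exactly one coefficient per category curve")
    length = len(category_curves[0])
    return [sum(w * curve[t] for w, curve in zip(coefficients, category_curves))
            for t in range(length)]
```

For example, coefficients [2, 0.5] over category curves [1, 0, 1] and [0, 2, 4] yield the curve [2.0, 1.0, 4.0].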

11. The device according to claim 9, characterized in that the crowd flow motif determination module is configured to: determine location information of key landmarks in the sub-region based on the geographic information data of the sub-region; determine a crowd flow feature map of the sub-region based on the crowd flow data of the sub-region and the location information of the key landmarks, wherein the crowd flow feature map is a directed graph comprising multiple nodes and edges connecting the nodes, the nodes representing the key landmarks and the edges representing the direction of crowd flow between the nodes; and extract the crowd flow motif of the sub-region from the crowd flow feature map.

12. The device according to claim 10, characterized in that the device further comprises: a training module, configured to train the traffic prediction model based on a preset sample set, wherein the sample set comprises geographic information data, crowd flow data, and historical traffic data of multiple sample areas.

13. The device according to claim 12, characterized in that the training module is configured to: determine crowd flow features of each sample area based on the geographic information data and the crowd flow data of that sample area in the sample set; determine traffic category curves based on the historical traffic data of the multiple sample areas; determine a first traffic curve of each sample area based on the traffic category curves, the first traffic curve being a linear combination of the traffic category curves; and train the traffic prediction model by taking the crowd flow features of each sample area as input and the coefficients of the first traffic curve of each sample area as output.

14. The device according to claim 13, characterized in that determining the traffic category curves based on the historical traffic data of the multiple sample areas comprises: determining a second traffic curve of each sample area based on the historical traffic data of that sample area; and clustering the second traffic curves of the multiple sample areas to obtain the traffic category curves.

15. The device according to any one of claims 9-14, characterized in that the geographic information data comprises at least one of the following: a map of the geographic region, a road network, points of interest, areas of interest, building types, or a social management grid; and the crowd flow data comprises at least one of the following: online crowd flow big data, crowd trajectory data in minimization of drive tests (MDT) data, or base station handover data related to crowd flow.

16. The device according to any one of claims 9-14, characterized in that the device is used for data traffic prediction in a telecommunications network, and the data traffic prediction result comprises a data traffic prediction result of the telecommunications network.

17. A traffic prediction device, characterized by comprising: a processor; and a memory configured to store processor-executable instructions; wherein the processor is configured to implement the method according to any one of claims 1-8 when executing the instructions.

18. A non-volatile computer-readable storage medium having computer program instructions stored thereon, characterized in that, when the computer program instructions are executed by a processor, the method according to any one of claims 1-8 is implemented.

19. A computer program product comprising computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code, wherein when the computer-readable code is executed in an electronic device, a processor in the electronic device performs the method of any one of claims 1-8.