A method and system for predicting network traffic
By establishing a weighted calculation of the device's own model and related device models, and utilizing traffic data from devices in the same and cross-regional areas, the problem of insufficient reliability in network traffic prediction and burst traffic identification capability in existing technologies has been solved, achieving more accurate traffic prediction and burst traffic identification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- FIBERHOME TELECOMMUNICATION TECHNOLOGIES CO LTD
- Filing Date
- 2023-06-09
- Publication Date
- 2026-06-23
AI Technical Summary
Existing network traffic prediction models lack reliability and the ability to pre-identify bursts of traffic. They mainly rely on historical data of a single object for prediction and fail to effectively utilize historical data of other related objects.
By establishing a model of the device itself, selecting relevant devices with a traffic similarity greater than a threshold to the predicted target, updating the relevant device models, and combining traffic prediction values and empirical prediction values for weighted calculation, the device's own model is optimized using traffic data from devices in the same region and across regions.
It improves the reliability of network traffic prediction and the ability to identify sudden traffic surges in advance, and enhances the accuracy of judging traffic changes and the ability to analyze multiple scenarios.
Figure CN116668319B_ABST
Description
Technical Field
[0001] This application relates to the field of communication network management, and in particular to a method and system for predicting network traffic.
Background Technology
[0002] With the increasing number of internet users and the surge in online time, network traffic has skyrocketed, making refined, automated, and intelligent network operation and management a new challenge. Accurately predicting network traffic trends can help operators accurately estimate network usage, rationally allocate and efficiently utilize network resources to meet the growing and diverse needs of users. Given the complex temporal and spatial distribution of network traffic, the selection and design of models are crucial for accurate prediction.
[0003] Current prediction models fall into two categories: linear models (AR, MA, ARMA, ARIMA, etc.) and nonlinear models (SVM, neural networks, etc.). However, these models are mainly trained and used to predict a single object from a time-series perspective, assuming that the future traffic of the predicted object is related to its own historical traffic data. The trained models are prone to overfitting and adapt poorly to change, do not consider the reference value that the historical data of other related objects has for the predicted object, and lack both reliability and the ability to identify burst traffic in advance.
Summary of the Invention
[0004] This application provides a method and system for predicting network traffic to address the problems of low reliability and low ability to predict sudden traffic in related technologies.
[0005] In a first aspect, a method for predicting network traffic is provided, including:
[0006] A device-specific model is built based on the historical dataset of each device, and traffic prediction values are obtained through the device-specific model. The historical dataset includes traffic data.
[0007] Other devices with a traffic similarity greater than a preset similarity threshold to the predicted object are selected as relevant devices. The relevant device models are updated periodically based on traffic monitoring of the relevant devices to obtain the empirical prediction values of each relevant device model.
[0008] The network traffic prediction result is obtained by weighting the traffic prediction value and all empirical prediction values.
[0009] In some embodiments, for each selected relevant device, a portion of its similarity to the traffic of the predicted object is used as an empirical weight, and a weighted empirical prediction value is obtained based on the empirical prediction values of all relevant devices and their corresponding empirical weights.
[0010] In some embodiments, the traffic weight of the predicted object is obtained based on the empirical weight of all relevant devices, and the weighted traffic prediction value is obtained through the traffic weight. The prediction result is the sum of the weighted empirical prediction value and the weighted traffic prediction value.
[0011] In some embodiments, the historical dataset includes data of at least one baseline feature. The data changes of the baseline feature of the prediction object are periodically acquired, the device's own model is updated according to the data changes, the device's own model before and after the update is compared, and the device's own model with higher accuracy is selected to obtain the traffic prediction value.
[0012] In some embodiments, the historical dataset includes data of at least one baseline feature. The correlation between the traffic of the predicted object and all baseline features is analyzed periodically, a preset number of baseline features with high correlation coefficients are selected as optimization features, and the device's own model is updated accordingly. The device's own model before and after the update is compared, and the one with higher accuracy is selected to obtain the traffic prediction value.
[0013] In some embodiments, the other devices include devices in the same region and devices across regions, with preset similarity thresholds for the same region and similarity thresholds for across regions, and relevant devices are selected according to the corresponding similarity thresholds.
[0014] In some embodiments, for each similarity threshold, when multiple other devices have a similarity greater than the threshold, they are sorted in descending order of similarity, a corresponding quantity threshold is set, and the top-ranked devices within the quantity threshold are selected as related devices.
[0015] In some embodiments, when calculating the traffic similarity between the predicted object and other devices, the length of the traffic data is selected as the full length of the predicted object's traffic, or a portion of the predicted object's traffic data is truncated from the time point when the last traffic data was generated.
[0016] In some embodiments, the device's own model is one or more combinations of linear and nonlinear machine learning models.
[0017] In a second aspect, a prediction system for the network traffic prediction method is provided, comprising:
[0018] The device-specific model module is used to build a device-specific model based on the historical dataset of each device, and obtain traffic prediction values through the device-specific model.
[0019] The resource monitoring module is used to monitor traffic and data changes of the predicted targets and related devices;
[0020] The related device model module is used to select other devices whose traffic similarity to the predicted object is greater than a preset similarity threshold as related devices. It is also used to build related device models and update the related device models periodically based on traffic monitoring of related devices to obtain the empirical prediction values of each related device model.
[0021] The traffic prediction module is used to calculate the predicted network traffic by weighting the predicted traffic value and all empirical prediction values.
[0022] In some embodiments, it also includes:
[0023] The data acquisition module is used to periodically collect historical data from the device.
[0024] The data preprocessing module is used to preprocess historical data to form a historical dataset;
[0025] The visualization module is used to display data from the data preprocessing module and the traffic prediction module, showing various situational analyses.
[0026] The beneficial effects of the technical solution provided in this application include:
[0027] By combining the traffic prediction value with the empirical prediction values of the relevant devices, the final prediction result for the prediction object is obtained. This solves the problems of one-sidedness and bias in traffic prediction based solely on the prediction object's own historical data, improves the reliability of network traffic prediction, and enhances the ability to identify burst traffic in advance.
[0028] The device's own model can be updated by incorporating data changes in baseline features; it can also obtain optimized features through correlation analysis based on changes in baseline features, and then update the device's own model; this further improves the reliability of network traffic prediction and the ability to identify burst traffic in advance.
[0029] The relevant devices include both same-region and cross-regional devices. Same-region devices record the scene information closest to the predicted object, and their traffic data changes are the most similar, which increases the reliability of the prediction value. Cross-regional devices record traffic data changes in multiple scenarios, providing a basis for analyzing burst traffic from the perspective of big data analysis.
Attached Figure Description
[0030] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0031] Figure 1 is a flowchart of the first embodiment of the network traffic prediction method of this application;
[0032] Figure 2 is a schematic diagram of the baseline features in an embodiment of this application;
[0033] Figure 3 is a schematic diagram illustrating similarity calculation over the entire data length in an embodiment of this application;
[0034] Figure 4 is a schematic diagram illustrating similarity calculation over a portion of the data length in an embodiment of this application;
[0035] Figure 5 is a schematic diagram of the relevant device models and empirical prediction values of this application;
[0036] Figure 6 is a schematic diagram illustrating similarity calculation within the same region in this application;
[0037] Figure 7 is a schematic diagram illustrating cross-regional similarity calculation in this application;
[0038] Figure 8 is a flowchart of the second embodiment of the network traffic prediction method of this application;
[0039] Figure 9 is a flowchart of the third embodiment of the network traffic prediction method of this application;
[0040] Figure 10 is a schematic diagram of the network traffic prediction system in an embodiment of this application.
Detailed Implementation
[0041] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0042] This application provides a method and system for predicting network traffic, which can solve the problems of low reliability in network traffic prediction and low ability to identify sudden traffic in advance.
[0043] As shown in Figure 1, a first embodiment of a method for predicting network traffic includes the following steps:
[0044] S1. Build a device model based on the historical dataset of each device, and obtain traffic prediction values through the device model. The historical dataset includes traffic data.
[0045] S2. Select other devices whose traffic similarity to the predicted object is greater than a preset similarity threshold as related devices, and update the related device models periodically based on traffic monitoring of related devices to obtain the empirical prediction values of each related device model.
[0046] Understandably, relevant device models can be pre-built, meaning that each other device can build a relevant device model based on its corresponding historical data, and the output of the relevant device model is an empirical prediction of traffic.
[0047] S3. The network traffic prediction result is obtained by weighting the traffic prediction value and all empirical prediction values. It can be understood that the corresponding empirical prediction value will be updated after each update of the relevant device model.
[0048] In step S1 above, as shown in Figure 2, the historical data is multi-source heterogeneous data, and the historical dataset includes data on at least one baseline feature. Baseline features fall into several categories, each containing multiple features. For example, long-term correlation features include traffic, voltage, region, current, and rate; periodic/node features include seasons, holidays, and weather; and burst features include trending topics on Weibo and Douyin and the Baidu Index. Historical data is collected periodically, with the collection period set to seconds, minutes, hours, days, weeks, months, or years. The historical dataset must include traffic data, that is, device traffic collected periodically. In addition, other baseline features that influence traffic, such as the corresponding weather, news, and holiday data, can be collected across multiple dimensions.
[0049] Understandably, after historical data is collected, a data preprocessing step is required. This could involve data cleaning, data integration, and data normalization to transform the messy data into ordered basic features that can be used for model training, thereby forming a historical dataset.
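The preprocessing step described above can be sketched as follows. This is a minimal illustration, not the application's preprocessing engine: the forward-fill cleaning rule, the min-max normalization, and the function names `clean` and `min_max_normalize` are all assumptions chosen for brevity.

```python
from typing import Optional


def clean(series: list[Optional[float]]) -> list[float]:
    """Fill missing samples (None) with the previous valid value (forward fill)."""
    cleaned: list[float] = []
    last: Optional[float] = None
    for value in series:
        if value is None:
            if last is None:
                continue  # drop leading gaps: nothing to carry forward yet
            value = last
        cleaned.append(value)
        last = value
    return cleaned


def min_max_normalize(series: list[float]) -> list[float]:
    """Scale values into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


# Hypothetical raw traffic samples with two missing readings.
raw_traffic = [None, 10.0, None, 30.0, 20.0]
dataset = min_max_normalize(clean(raw_traffic))
```

Data integration (merging traffic with weather, news, or holiday sources) is omitted here because it depends on how each source is keyed and sampled.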
[0050] The device's own model is initially built based on historical datasets and can be regarded as the device's initial model. This model can be selected from one or more combinations of linear machine learning models (AR, MA, ARMA, ARIMA, etc.) and non-linear models (SVM, RNN, LSTM, etc.).
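As an illustration of the simplest linear choice listed above, the following sketch fits a first-order autoregressive model, x[t] = c + phi * x[t-1], by ordinary least squares and produces a one-step traffic prediction value. The model order, the fitting method, and the names `fit_ar1` and `predict_next` are assumptions for the example; the application does not prescribe them.

```python
def fit_ar1(series: list[float]) -> tuple[float, float]:
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    if var_prev == 0:
        return mean_curr, 0.0  # flat history: predict the mean
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = cov / var_prev
    return mean_curr - phi * mean_prev, phi


def predict_next(series: list[float], c: float, phi: float) -> float:
    """One-step-ahead prediction from the most recent sample."""
    return c + phi * series[-1]


# Hypothetical history that follows x[t] = 2 + 0.5 * x[t-1] exactly.
history = [8.0, 6.0, 5.0, 4.5, 4.25]
c, phi = fit_ar1(history)
traffic_prediction = predict_next(history, c, phi)
```

A production model would more likely be an ARIMA or LSTM variant trained on several baseline features; the sketch only shows where the traffic prediction value comes from.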
[0051] In step S2 above, the device whose traffic is to be predicted is taken as the prediction object. Within the same network system, network devices other than the prediction object are "other devices," which include devices within the same region and devices across regions. For example, when the prediction object is an ONU (Optical Network Unit), the same-region devices can be selected from the ONUs in the cell where that ONU is located, and the cross-regional devices can be selected from ONUs outside that cell. When the prediction object is a PON (Passive Optical Network) port or uplink port of an OLT (Optical Line Terminal), the same-region devices can be selected from the slot and OLT where the port is located, and the cross-regional devices can be selected from devices outside that OLT. When the prediction object is an OLT, the same-region devices can be selected from the OLTs covered by the geographical area to which the OLT belongs, and the cross-regional devices can be selected from OLTs outside that geographical area.
[0052] When selecting relevant devices, a same-region similarity threshold (range 0-100%) and a cross-regional similarity threshold (range 0-100%) are preset. Devices within the same region are selected based on the same-region similarity threshold, and devices across regions are selected based on the cross-regional similarity threshold. The two similarity thresholds can be set to the same value or to different values. When calculating the traffic similarity between the predicted object and other devices, the length of the traffic data can be the full length of the predicted object's traffic data, as shown in Figure 3; or a portion of the predicted object's traffic data can be taken, counting back from the time point at which the most recent traffic data was generated, as shown in Figure 4.
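The two window choices (full length versus a trailing portion) can be sketched as below. The application does not name a similarity measure, so the use of the Pearson correlation coefficient clipped to [0, 1] is an assumption, as are the function names and the requirement that both series are sampled at the same timestamps.

```python
import math
from typing import Optional


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a flat series carries no shape information
    return cov / (norm_a * norm_b)


def traffic_similarity(target: list[float], other: list[float],
                       window: Optional[int] = None) -> float:
    """Similarity in [0, 1] over the full target length (window=None),
    or over the last `window` samples counted back from the newest one."""
    length = len(target) if window is None else window
    length = min(length, len(target), len(other))
    if length < 2:
        raise ValueError("need at least two aligned samples")
    return max(0.0, pearson(target[-length:], other[-length:]))


# Hypothetical series: the other device only tracks the target recently.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
other = [6.0, 5.0, 4.0, 4.0, 5.0, 6.0]
full_similarity = traffic_similarity(target, other)              # whole history
recent_similarity = traffic_similarity(target, other, window=3)  # last 3 samples
```

In this example the full-length similarity is zero while the three-sample similarity is essentially one, which shows why the window length is worth exposing as a setting.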
[0053] As shown in Figure 5, for each similarity threshold, when multiple other devices have a similarity greater than that threshold, they are sorted from highest to lowest similarity, a corresponding quantity threshold is set, and the top-ranked devices within the quantity threshold are selected as relevant devices. The quantity thresholds for same-region and cross-regional devices can be set to the same or different values. For example, among the other devices whose similarity exceeds the same-region similarity threshold, if the same-region quantity threshold is N, the top N are selected as relevant devices; among the other devices whose similarity exceeds the cross-regional similarity threshold, if the cross-regional quantity threshold is M, the top M are selected as relevant devices.
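The threshold-then-rank selection just described can be sketched as follows. The device identifiers, similarity values, and the function name `select_related` are hypothetical; only the rule itself (keep devices above the threshold, sort in descending order, take the top N or top M) comes from the text.

```python
def select_related(similarities: dict[str, float],
                   threshold: float, top_k: int) -> list[str]:
    """Keep devices whose similarity exceeds `threshold`,
    ranked from highest to lowest, capped at `top_k` devices."""
    above = [(dev, sim) for dev, sim in similarities.items() if sim > threshold]
    above.sort(key=lambda item: item[1], reverse=True)
    return [dev for dev, _ in above[:top_k]]


# Hypothetical similarities of other devices to the prediction object.
same_region = {"onu-a": 0.99, "onu-b": 0.97, "onu-c": 0.80}
cross_region = {"onu-x": 0.95, "onu-y": 0.91, "onu-z": 0.93}

related = (
    select_related(same_region, threshold=0.90, top_k=1)      # N = 1
    + select_related(cross_region, threshold=0.90, top_k=2)   # M = 2
)
```

Keeping the two calls separate lets the same-region and cross-regional thresholds and quantity limits differ, as the text allows.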
[0054] Figure 6 is a schematic diagram of similarity calculation within the same region. Among the other devices in the same region, the top N devices with a similarity greater than the same-region similarity threshold are selected as relevant devices. In this embodiment, the similarity threshold between the predicted object and a relevant device in the same region is 99%.
[0055] Figure 7 is a schematic diagram of cross-regional similarity calculation. Among the other cross-regional devices, the top M devices with a similarity greater than the cross-regional similarity threshold are selected as relevant devices. In this embodiment, the similarity between the predicted object and a cross-regional relevant device is 95%.
[0056] The aforementioned devices within the same region record scenario information most closely related to the current prediction target; their traffic data changes are most similar, increasing the reliability of the prediction. The aforementioned devices across regions record traffic data changes across multiple scenarios, providing a basis for analyzing sudden traffic spikes from a big data analytics perspective. Searching for relevant devices within both the same-region and cross-regional systems avoids the system finding only devices specific to a particular scenario, thus preventing the omission of valuable sudden traffic spikes.
[0057] In step S3 above, the network traffic prediction result is obtained by weighting the traffic prediction value and the empirical prediction values of all relevant devices. For each selected relevant device, a portion of its traffic similarity with the predicted object is used as the corresponding empirical weight. Selecting a portion of the similarity is equivalent to multiplying the similarity by a coefficient between 0 and 1, with the aim of ensuring that the sum of the weights of the predicted object and the relevant devices is 1, achieving a prediction that takes both into account.
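A numeric sketch of this weighting rule follows. It assumes one reading of the paragraph: each empirical weight is the device's similarity multiplied by a shared coefficient between 0 and 1, and the predicted object's own weight is whatever remains so that all weights sum to 1. The function name `combine` and the sample values are hypothetical.

```python
def combine(own_prediction: float,
            related: list[tuple[float, float]],
            coefficient: float) -> float:
    """Weighted prediction from the object's own value and related devices.

    `related` holds (similarity, empirical_prediction) pairs with
    similarities expressed as fractions in [0, 1].
    """
    weights = [coefficient * sim for sim, _ in related]
    own_weight = 1.0 - sum(weights)  # all weights together sum to 1
    if own_weight < 0:
        raise ValueError("coefficient too large: empirical weights exceed 1")
    weighted_empirical = sum(w * x for w, (_, x) in zip(weights, related))
    return own_weight * own_prediction + weighted_empirical


# Own model predicts 100; two related devices predict 120 and 80.
result = combine(100.0, [(0.99, 120.0), (0.95, 80.0)], coefficient=0.25)
```

With these values the empirical weights are 0.2475 and 0.2375, the object's own weight is 0.515, and the combined prediction is about 100.2. The guard makes explicit that the coefficient has to shrink as more related devices are added.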
[0058] In this embodiment, half of the similarity is selected as the empirical weight for each related device. As shown in Figure 5, since related devices are divided into same-region related devices and cross-regional related devices, the weights of the same-region related devices are WT_1, WT_2, ..., WT_N, and their empirical prediction values are XT_1, XT_2, ..., XT_N; the weights of the cross-regional related devices are WK_1, WK_2, ..., WK_M, and their empirical prediction values are XK_1, XK_2, ..., XK_M. Based on the empirical prediction values of all relevant devices and their corresponding empirical weights, a weighted empirical prediction value is obtained, namely:
[0059]
[0060] Where Wr represents the empirical weight of the relevant equipment, and Xr represents the empirical predicted value of the relevant equipment. Then, based on the empirical weights of all relevant equipment, the traffic weight of the predicted object is obtained, i.e.:
[0061]
[0062] Where Wd represents the traffic weight of the predicted object. The final prediction result is the sum of the weighted empirical prediction value and the weighted traffic prediction value, that is:
[0063] y = Wd × Xd + Wr × Xr
[0064] Where Xd is the predicted traffic value of the predicted object.
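The formulas of paragraphs [0059] and [0061] are not reproduced in this text. One reading that is consistent with the surrounding definitions and with the final formula in [0063] is given below as an assumption, not as the application's own wording: the weighted empirical prediction value sums the contributions of the N same-region and M cross-regional related devices, and the traffic weight of the predicted object is the remainder that makes all weights sum to 1.

```latex
W_r X_r = \sum_{i=1}^{N} WT_i \, XT_i + \sum_{j=1}^{M} WK_j \, XK_j

W_d = 1 - \sum_{i=1}^{N} WT_i - \sum_{j=1}^{M} WK_j

y = W_d X_d + W_r X_r
```

Under this reading the coefficient applied to each similarity must be small enough that the empirical weights sum to less than 1; otherwise W_d would be negative.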
[0065] As shown in Figure 8, a second embodiment of a method for predicting network traffic is provided. In this embodiment, the historical dataset of the prediction object includes data on at least one baseline feature. When there is only one baseline feature, it is traffic data; when there is more than one, the dataset may also include weather, current, etc. The data changes of the baseline features of the prediction object are acquired periodically. When there is a data change, the device's own model is updated, the device's own model before and after the update is compared, and the one with higher accuracy is selected to obtain the traffic prediction value. Furthermore, in each period, the device's own model with higher accuracy serves as the basis for the next update, and iterative updates are performed continuously to obtain a better model and traffic prediction value. Based on the traffic prediction value obtained after each period's update, combined with the empirical prediction values of all relevant devices, a weighted calculation is performed to obtain the network traffic prediction result.
[0066] The acquisition period for the baseline feature data of the prediction object can be set according to actual needs. The period can be set to real-time acquisition (seconds, minutes, hours), short-term acquisition (days, weeks), or long-term acquisition (months, years).
[0067] The device's own models before and after the update are compared by the accuracy of their prediction results. Common error measures can be used, such as root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), where s is the number of test samples, and \hat{y}_i and y_i are the predicted result and the actual value, respectively. The root mean square error is calculated using the following formula:
[0068] $$\mathrm{RMSE} = \sqrt{\frac{1}{s}\sum_{i=1}^{s}\left(\hat{y}_i - y_i\right)^2}$$
[0069] The formula for calculating the mean absolute error is:
[0070] $$\mathrm{MAE} = \frac{1}{s}\sum_{i=1}^{s}\left|\hat{y}_i - y_i\right|$$
[0071] The formula for calculating the mean absolute percentage error is:
[0072] $$\mathrm{MAPE} = \frac{100\%}{s}\sum_{i=1}^{s}\left|\frac{\hat{y}_i - y_i}{y_i}\right|$$
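The before/after comparison can be sketched with the three error measures in their standard textbook form. Choosing RMSE as the deciding measure, the sample values, and the variable names are assumptions for the example; any of the three measures could be used in its place.

```python
import math


def rmse(pred: list[float], actual: list[float]) -> float:
    """Root mean square error over s test samples."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def mae(pred: list[float], actual: list[float]) -> float:
    """Mean absolute error over s test samples."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def mape(pred: list[float], actual: list[float]) -> float:
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((p - a) / a) for p, a in zip(pred, actual)) / len(actual)


# Hypothetical held-out traffic and the predictions of the two model versions.
actual = [100.0, 200.0, 400.0]
before_update = [110.0, 180.0, 400.0]
after_update = [100.0, 190.0, 410.0]

# Keep whichever version has the lower error on the same test samples.
kept = "after" if rmse(after_update, actual) < rmse(before_update, actual) else "before"
```

The kept model then becomes the baseline for the next period's update, which gives the iterative improvement described in this embodiment.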
[0073] The other steps in this embodiment are basically the same as those in the first embodiment described above, and will not be repeated here.
[0074] As shown in Figure 9, a third embodiment of a network traffic prediction method is provided. In this embodiment, the changes in traffic data of the prediction object are periodically acquired, the correlation between the traffic of the prediction object and all baseline features is analyzed, a preset number of baseline features with high correlation coefficients are selected as optimization features, the device's own model of the prediction object is updated, the device's own model before and after the update is compared, and the one with higher accuracy is selected to obtain the traffic prediction value.
[0075] For example, the historical dataset of the prediction target includes two baseline features besides traffic flow: weather and news. Using the Pearson coefficient, correlation analysis is performed on traffic flow and all baseline features. A predetermined number of baseline features with high correlation coefficients are selected. That is, feature vectors are extracted from high to low correlation coefficients as optimized features. The number of optimized features is set according to different situations. For example, if calculations show that temperature, time, and traffic flow are more correlated, then the optimized features are traffic flow, temperature, and time. The optimized features are used to update the device's own model. By calculating the accuracy of the prediction results, the device's own model with the highest accuracy is selected to obtain the traffic prediction value. In each cycle, the device's own model with the highest accuracy serves as the basis for the next update, continuously iterating to obtain a better model and traffic prediction value. Based on the traffic prediction value obtained after each cycle update, combined with the empirical prediction values of all relevant devices, a weighted calculation is performed to obtain the network traffic prediction result. The other steps in this embodiment are basically the same as in the second embodiment described above, and will not be repeated here.
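The correlation-based feature selection in this example can be sketched as follows. Ranking by the absolute value of the Pearson coefficient (so that strongly negative correlations also count) is an assumption, as are the feature names, the sample values, and the function name `select_features`.

```python
import math


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a constant feature cannot explain traffic changes
    return cov / (norm_a * norm_b)


def select_features(traffic: list[float],
                    features: dict[str, list[float]],
                    count: int) -> list[str]:
    """Rank features by |Pearson r| against traffic and keep the top `count`."""
    ranked = sorted(features,
                    key=lambda name: abs(pearson(traffic, features[name])),
                    reverse=True)
    return ranked[:count]


# Hypothetical aligned samples of traffic and three baseline features.
traffic = [10.0, 20.0, 30.0, 40.0, 50.0]
features = {
    "temperature": [1.0, 2.0, 3.0, 4.0, 6.0],  # rises with traffic
    "hour": [5.0, 4.0, 3.0, 2.0, 1.0],         # falls as traffic rises
    "news_index": [3.0, 1.0, 3.0, 1.0, 3.0],   # essentially uncorrelated
}
optimized = select_features(traffic, features, count=2)
```

The selected features, together with traffic itself, would then be used to retrain the device's own model before the accuracy comparison.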
[0076] As shown in Figure 10, an embodiment of a network traffic prediction system is provided, which can be used to implement the above method. The system includes a device self-model module, a resource monitoring module, a related device model module, and a traffic prediction module.
[0077] The device-specific model module is used to build a device-specific model based on the historical dataset of each device, and to obtain traffic prediction values through the device-specific model.
[0078] The resource monitoring module is used to monitor traffic and data changes of the predicted objects and related devices.
[0079] The related device model module is used to select other devices whose traffic similarity to the predicted object is greater than a preset similarity threshold as related devices. It is also used to build related device models and update the related device models periodically based on traffic monitoring of related devices, and obtain empirical prediction values through the corresponding related device models.
[0080] The device self-model module, the resource monitoring module, and the related device model module can all be implemented by an AI engine.
[0081] The traffic prediction module is used to calculate the predicted network traffic by weighting the predicted traffic value and the empirical predicted value. The traffic prediction module supports real-time traffic prediction (second-level, minute-level, hour-level), short-term traffic prediction (daily-level, weekly-level), and long-term traffic prediction (monthly-level, year-level) through the prediction engine.
[0082] In some embodiments, the system further includes a data acquisition module, a data preprocessing module, and a visualization module.
[0083] A data acquisition module is used to periodically collect historical data from the devices. In some embodiments, the data acquisition module collects baseline features from multiple devices in a device cluster and stores them in a data resource library.
[0084] The data preprocessing module is used to preprocess historical data to form a historical dataset, which is then used to build the device's own model. In some embodiments, the data preprocessing module can perform preprocessing steps such as data cleaning, data integration, and data normalization on the historical data through a preprocessing engine.
[0085] The visualization module is used to display data from the data preprocessing module and the traffic prediction module, showing various situational analyses. In some embodiments, the visualization module can perform multiple situational analyses through the situational analysis engine, including traffic situational analysis, device hotspot situational analysis, potential user situational analysis, and so on.
[0086] In the second embodiment of the network traffic prediction method, the resource monitoring module is used to periodically acquire changes in the traffic data of the prediction target. The device self-model module is used to update the device self-model of the prediction target and compare the device self-model before and after the update. The resource monitoring module also monitors the traffic of related devices, and the related device model module updates the related device model periodically accordingly.
[0087] In the third embodiment of the network traffic prediction method, in addition to the above functions, the resource monitoring module also monitors all baseline characteristics of the prediction object and related devices to facilitate subsequent correlation analysis between traffic and all baseline characteristics.
[0088] This application combines the traffic prediction value of the prediction object with the empirical prediction values of the related devices to obtain the final prediction result for the prediction object. It solves the problems of one-sidedness and bias in traffic prediction based solely on the prediction object's own historical data, improves the reliability of network traffic prediction, and enhances the ability to identify burst traffic in advance.
[0089] It should be noted that in this application, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0090] The above description is merely a specific embodiment of this application, enabling those skilled in the art to understand or implement this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features claimed herein.
Claims
1. A method for predicting network traffic, characterized in that the method comprises: A device-specific model is built based on the historical dataset of each device, and traffic prediction values are obtained through the device-specific model; the historical dataset includes traffic data. Other devices whose traffic similarity to the predicted object is greater than a preset similarity threshold are selected as relevant devices, and the relevant device models are updated periodically based on traffic monitoring of the relevant devices to obtain the empirical prediction value of each relevant device model. The network traffic prediction result is obtained by weighting the traffic prediction value and all empirical prediction values. The historical dataset includes data of at least one baseline feature. The data changes of the baseline features of the predicted object are acquired periodically, the device's own model is updated according to the data changes, the device's own model before and after the update is compared, and the one with higher accuracy is selected to obtain the traffic prediction value. The correlation between the traffic of the predicted object and all baseline features is analyzed periodically, a preset number of baseline features with high correlation coefficients are selected as optimization features, the device's own model is updated, the device's own model before and after the update is compared, and the one with higher accuracy is selected to obtain the traffic prediction value. The other devices include devices in the same region and devices across regions; a same-region similarity threshold and a cross-regional similarity threshold are preset respectively, and the relevant devices are selected according to the corresponding similarity threshold.
2. The network traffic prediction method as described in claim 1, characterized in that, for each selected relevant device, a portion of its traffic similarity to the predicted object is used as an empirical weight, and a weighted empirical prediction value is obtained based on the empirical prediction values of all relevant devices and their corresponding empirical weights.
3. The network traffic prediction method as described in claim 2, characterized in that the traffic weight of the predicted object is obtained based on the empirical weights of all relevant devices, the weighted traffic prediction value is obtained through the traffic weight, and the prediction result is the sum of the weighted empirical prediction value and the weighted traffic prediction value.
4. The network traffic prediction method as described in claim 1, characterized in that, for each similarity threshold, when multiple other devices have a similarity greater than that threshold, they are sorted in descending order of similarity, a corresponding quantity threshold is set, and the top-ranked devices within the quantity threshold are selected as the relevant devices.
5. The network traffic prediction method as described in claim 1, characterized in that, when calculating the traffic similarity between the predicted object and other devices, the length of the traffic data is either the full length of the predicted object's traffic data or a portion of the predicted object's traffic data taken back from the time point when the last traffic data was generated.
6. The network traffic prediction method as described in claim 1, characterized in that the device's own model is one of, or a combination of more than one of, linear machine learning models and nonlinear machine learning models.
7. A prediction system based on the network traffic prediction method according to any one of claims 1-6, characterized in that the system comprises: The device-specific model module is used to build a device-specific model based on the historical dataset of each device, and obtain traffic prediction values through the device-specific model. The resource monitoring module is used to monitor traffic and data changes of the predicted object and related devices. The related device model module is used to select other devices whose traffic similarity to the predicted object is greater than a preset similarity threshold as related devices; it is also used to build related device models and update the related device models periodically based on traffic monitoring of related devices to obtain the empirical prediction value of each related device model. The traffic prediction module is used to calculate the predicted network traffic by weighting the traffic prediction value and all empirical prediction values.
8. The prediction system as described in claim 7, characterized in that the system further comprises: The data acquisition module is used to periodically collect historical data from the devices. The data preprocessing module is used to preprocess historical data to form a historical dataset. The visualization module is used to display data from the data preprocessing module and the traffic prediction module, showing various situational analyses.