An abnormal traffic data filtering method and device
By using a first-in-first-out queue structure and anomaly measurement indicators in network traffic data analysis, the problem of outlier fluctuations affecting analysis accuracy is solved, and automated and adaptive anomaly filtering is achieved, improving the accuracy of data analysis and prediction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- FIBERHOME TELECOMMUNICATION TECHNOLOGIES CO LTD
- Filing Date
- 2023-07-05
- Publication Date
- 2026-06-26
AI Technical Summary
In network traffic prediction and analysis, the model feature selection is affected by outlier fluctuations, which impacts the accuracy of the analysis.
The system adopts a first-in-first-out (FIFO) queue structure, calculates anomaly metrics when data is enqueued and dequeued, combines the judgment results from enqueuing and dequeuing to make a final anomaly determination, and uses a preset processing strategy to handle abnormal data.
It improves the accuracy of data analysis and prediction, realizes automated processing and adaptive filtering of outliers, adapts to changes in traffic data, and supports filtering of large amounts of data.
Figure CN116846839B_ABST
Description
Technical Field
[0001] This invention relates to the field of intelligent network management and maintenance technology, specifically to a method and device for filtering abnormal traffic data.
Background Technology
[0002] In actual network traffic analysis and prediction, traffic is affected by factors such as network status, user scale, equipment configuration, and device operating environment. Numerous factors can impact the accuracy of network traffic prediction and analysis, particularly outlier fluctuations in traffic data. These outliers can interfere with feature selection during network traffic prediction and analysis. If these outlier fluctuations are identified, filtered, and properly handled during the data analysis preprocessing stage, the accuracy of the prediction and analysis models can be effectively improved.
Summary of the Invention
[0003] The purpose of this invention is to provide an abnormal traffic data filtering method and apparatus, which can identify and process abnormal traffic data in the data analysis preprocessing stage, thereby reducing the impact of abnormal traffic data values on traffic prediction and analysis, automating the abnormal value analysis and processing process of traffic data, and adapting to changes in traffic data.
[0004] To achieve the above objectives, in a first aspect, embodiments of the present invention provide an abnormal traffic data filtering method, the method comprising:
[0005] The traffic data to be analyzed is sequentially entered into a first-in-first-out queue. When each traffic data is enqueued and when it is dequeued, an anomaly metric is calculated for it to judge whether its data value is abnormal. The judgment results from enqueue time and dequeue time are then combined to make a final anomaly determination on the traffic data.
[0006] Traffic data that is ultimately determined to be abnormal will be processed according to the preset processing strategy.
[0007] As a preferred implementation, when each traffic data is queued, anomaly detection is performed by calculating an anomaly metric for that traffic data, specifically including:
[0008] Input the raw data value of the next traffic data to be analyzed, and scale the raw data value by magnitude;
[0009] Update the change factors of each traffic data in the current queue based on the incoming traffic data;
[0010] Update the reference change factor of the current queue based on the updated change factors of each traffic data;
[0011] Based on the updated reference change factor and the change factor corresponding to the traffic data in the current incoming queue, calculate the anomaly measurement index of the traffic data.
[0012] Determine whether the calculated abnormality metric is less than or equal to the preset abnormality threshold; if so, it is determined to be abnormal; otherwise, it is determined to be normal.
[0013] As a preferred implementation, when data is dequeued, anomaly detection is performed by calculating anomaly metrics for each element in the queue. Specifically, this includes:
[0014] Calculate the anomaly measurement index of the current outbound traffic data based on the currently updated reference change factor and the change factor corresponding to the updated outbound traffic data.
[0015] Determine whether the calculated abnormality metric is less than or equal to the preset abnormality threshold; if so, it is determined to be abnormal; otherwise, it is determined to be normal.
[0016] As a preferred implementation, after each traffic data is dequeued, the following operations are also included: updating the change factor of each traffic data in the current queue; and updating the reference change factor of the current queue based on the updated change factor of each traffic data.
[0017] As a preferred implementation, when making a final anomaly determination on the traffic data by combining the anomaly detection results of the data values at the time of enqueueing and dequeueing, a fast determination method is adopted, which specifically includes:
[0018] Obtain the abnormal data value judgment results when the traffic data is enqueued and dequeued; if both abnormal data value judgment results are abnormal, the traffic data is ultimately determined to be abnormal; otherwise, the traffic data is ultimately determined to be normal.
[0019] As a preferred implementation, when making a final anomaly determination on the traffic data by combining the anomaly detection results of the data values at the time of enqueueing and dequeueing, a deep-level determination method is adopted, which specifically includes:
[0020] A supplementary queue is added to the existing queue. The supplementary queue includes at least one level of first-in-first-out queue, and the supplementary queue and the original queue form a multi-level queue combination structure.
[0021] After each traffic data is dequeued from the original queue, it will enter the first-in-first-out queues at each level of the supplementary queue in sequence. For traffic data with different judgment results in the past, when the traffic data leaves the supplementary queue at each level, multiple data value anomaly judgments will be made by calculating the change anomaly measurement index, and a comprehensive judgment will be made by combining the results of multiple data value anomaly judgments.
[0022] In a preferred embodiment, each level of the supplementary queue is longer than the original FIFO queue, and its length is a multiple of the original queue's length.
[0023] As a preferred implementation, the comprehensive evaluation method includes: using the last result of the supplementary queue as the standard; or adopting the seven-point continuous anomaly judgment principle in the field of quality control; or calculating the relative proportion of anomaly judgments.
[0024] As a preferred implementation, the preset processing strategy includes: discard processing or interpolation processing.
[0025] In a second aspect, embodiments of the present invention also provide an abnormal traffic data filtering device based on the method in the first aspect embodiment, the device including an analysis and identification module and an anomaly processing module;
[0026] The analysis and identification module is used to: sequentially enter the traffic data to be analyzed into a first-in-first-out queue, and at the time of each traffic data entering and leaving the queue, calculate the abnormal change measurement index of the traffic data to judge the abnormality of the data value; and combine the abnormality judgment results of the data value at the time of entering and leaving the queue to make a final abnormality judgment of the traffic data.
[0027] The anomaly handling module is used to process traffic data that is ultimately determined to be abnormal according to a preset processing strategy.
[0028] The beneficial effects of this invention are as follows:
[0029] (1) This invention designs an anomaly measurement index to determine whether the current traffic data is abnormal. By introducing this anomaly measurement index, the judgment of data anomalies conforms to the characteristics of data changes, providing a reasonable and optimized anomaly judgment method, thereby helping to improve the accuracy of subsequent data analysis and prediction work; and by using this data anomaly measurement index, it can adapt to changes in traffic data, realizing a dynamic anomaly data filtering method that can change with data trends.
[0030] (2) The present invention also adopts a first-in-first-out queue structure, and performs data value anomaly judgment at both the enqueue time and the dequeue time of the queue, which satisfies the time requirement for anomaly judgment; moreover, by using the first-in-first-out queue structure, the filtering method can support filtering of large amounts of data, and can flexibly support filtering of static data and online dynamic data, making it flexible to use.
[0031] (3) In this invention, a depth judgment method is also designed for making the final anomaly judgment. It realizes a multiple anomaly depth judgment based on a multi-queue combination, which can further improve the accuracy of anomaly judgment and reduce misjudgment.
Attached Figure Description
[0032] Figure 1 is a schematic diagram illustrating how the present invention is applied to data analysis and prediction work.
[0033] Figure 2 is a flowchart of the abnormal traffic data filtering method in an embodiment of the present invention.
[0034] Figure 3 is a schematic diagram illustrating an application scenario of the abnormal traffic data filtering method in this embodiment of the invention.
[0035] Figure 4 is a flowchart illustrating the process of judging abnormal data values when each traffic data is enqueued in an embodiment of the present invention.
[0036] Figure 5 is a flowchart illustrating the process of judging abnormal data values when each traffic data is dequeued in an embodiment of the present invention.
[0037] Figure 6 is a schematic diagram illustrating the combination of the supplementary queue and the original queue to form a multi-level queue in an embodiment of the present invention.
[0038] Figure 7 is a schematic diagram of a dual-queue structure formed by the original queue and the supplementary queue in one example.
[0039] Figure 8 is a flowchart illustrating a method for determining anomalies using depth-based analysis in one example.
Detailed Implementation
[0040] In existing technologies, outlier fluctuations in traffic data can affect the accuracy of network traffic prediction and analysis, leading to interference with feature selection during modeling. This invention aims to provide an outlier traffic data filtering method and apparatus. This method can identify and process outlier traffic data during the data analysis preprocessing stage, thereby reducing the impact of outliers on traffic prediction and analysis. It automates the outlier analysis and processing process and can adapt to changes in traffic data.
[0041] To achieve the aforementioned objectives, the inventors, after diligent research into the technical problems described above, discovered that traffic data anomalies are characterized by fluctuations, which lead to abnormal changes in data trends and affect the accuracy of data analysis and prediction. Furthermore, the determination of traffic data anomalies is related to the overall change pattern of the data over a certain period. Therefore, by setting an adjustable-length first-in-first-out (FIFO) queue and calculating the overall changes in traffic data within the queue, an anomaly measurement index for traffic data can be derived. Calculation of this anomaly measurement index can then determine whether a data value is an anomaly. Moreover, considering that determining the anomaly of a data value requires a comprehensive assessment based on anomaly judgments at different points in time for a single data point, the judgment mechanism can be designed to perform anomaly judgments at both the enqueue and dequeue times of the queue. This allows for anomaly judgments at different points in time for a single data point, thus meeting the time requirements for anomaly value judgment.
[0042] Based on the above design concept, the main technical solution of this invention is as follows:
[0043] The traffic data to be analyzed is sequentially entered into a first-in-first-out (FIFO) queue. When each traffic data is enqueued and when it is dequeued, an anomaly metric is calculated for it to judge whether its data value is abnormal. The judgment results from enqueue time and dequeue time are combined to make a final anomaly determination on the traffic data. Traffic data that is finally determined to be abnormal is processed according to the preset processing strategy.
[0044] By adopting the above technical solution, abnormal traffic data can be identified and processed in the data analysis preprocessing stage, thereby reducing the impact of traffic data anomalies on traffic prediction and analysis, automating the traffic data anomaly analysis and processing process, and adapting to changes in traffic data, thus meeting the needs of practical applications.
[0045] To make the technical problems, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
[0046] However, it should be noted that the examples described below are merely specific examples and are not intended to limit the embodiments of the present invention to the specific steps, values, conditions, data, order, etc. Those skilled in the art can utilize the concept of the present invention to construct more embodiments not mentioned herein by reading this specification.
[0047] Example 1
[0048] The technical solution provided in this embodiment is applicable to the data analysis and prediction process shown in Figure 1. As shown in Figure 1, the technical solution is applied in the data analysis preprocessing stage: abnormal data in the input live network traffic data to be analyzed is identified and processed, and the processed, analyzable traffic data is output to support subsequent traffic analysis and prediction.
[0049] Specifically, as shown in Figure 2, this embodiment provides a method for filtering abnormal traffic data, which includes the following steps:
[0050] A. As shown in Figure 3, the traffic data to be analyzed is sequentially entered into a first-in-first-out queue. When each traffic data is enqueued and when it is dequeued, an anomaly metric is calculated for it to judge whether its data value is abnormal. The judgment results from enqueue time and dequeue time are combined to make the final anomaly determination on the traffic data.
[0051] B. Process the traffic data that is ultimately determined to be abnormal according to the preset processing strategy.
[0052] Understandably, in practical applications, the traffic data to be analyzed can be passed one by one through a first-in-first-out (FIFO) queue of length n, with the n-length data queue serving as the basic unit for calculating the anomaly measurement index of each traffic data point in the queue. For example, as an optional implementation, in step A, the FIFO queue is a data queue of length n, defined as follows:
[0053] queue $C = \{x_1, x_2, \dots, x_n\}$
[0054] where $x_1 \sim x_n$ represent the n traffic data items in the queue. Since queue C is a first-in-first-out queue, $x_1$ is the data at the head of the queue, which is also the data at the exit of the queue, and $x_n$ is the data at the tail of the queue, which is also the data at the entry point of the queue.
[0055] Furthermore, as shown in Figure 4, as an optional implementation, when each traffic data is enqueued, an anomaly metric is calculated to determine whether the data value is abnormal. Specifically, this may include:
[0056] Step S401: Input the raw data value of the next traffic data to be analyzed, and scale the raw data value by magnitude to obtain the scaled traffic data X. For example, in practical applications, logarithmic scaling can be used, i.e., X is the logarithm of the raw data value. After numerical scaling, the order of magnitude of the data changes in subsequent calculations is reduced, which improves the speed of arithmetic calculations.
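The logarithmic scaling of step S401 can be illustrated with a minimal Python sketch. The natural logarithm and the function name `scale_magnitude` are assumptions made for illustration; the text above only says that logarithmic scaling can be used and does not state the base or how non-positive values are handled.

```python
import math


def scale_magnitude(raw_value: float) -> float:
    """Scale a positive raw traffic value by taking its natural logarithm.

    Assumption: natural log; the source does not state the base.
    """
    if raw_value <= 0:
        raise ValueError("logarithmic scaling needs a positive raw value")
    return math.log(raw_value)


# Raw values that differ by orders of magnitude end up evenly spaced.
raw_samples = [1_000.0, 1_000_000.0, 1_000_000_000.0]
scaled = [scale_magnitude(v) for v in raw_samples]
```

After scaling, a thousandfold jump in raw traffic becomes a fixed additive step, which keeps the squared differences used in later steps within a small numeric range.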
[0057] Step S402: Update the change factors of each traffic data in the current queue based on the incoming traffic data.
[0058] It is understood that the original definition formula of the change factor $g_i$ described in this embodiment is:
[0059] $g_i = \sum_{j=1}^{n} (x_i - x_j)^2$
[0060] where $x_i$ represents the traffic data at the i-th position in the queue, and $g_i$ represents the change factor corresponding to the traffic data at the i-th position within the current queue. As can be seen from the original definition formula above, the change factor $g_i$ is the sum of squares of the changes in the traffic data at position i relative to the traffic data at the other positions in the queue.
[0061] Since the change factors of each traffic data in queue C in this embodiment can change dynamically along with the changes in the data values in the queue, it is necessary to update the change factors corresponding to each traffic data in the queue whenever new traffic data is added to the queue. This is to provide support for subsequent calculations of anomaly metrics.
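[0062] In this embodiment, the update formula for the change factor of each traffic data is:
[0063] $g_i = n x_i^2 - 2 x_i S_t + Q_t$;
[0064] where $x_i$ is the traffic data at the i-th position in queue C and n is the number of traffic data in the queue; $S_t = \sum_{j=1}^{n} x_j$ represents the arithmetic sum of all traffic data in queue C at time t; and $Q_t = \sum_{j=1}^{n} x_j^2$ represents the arithmetic sum of squares of all traffic data in queue C at time t.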
[0065] Because this step is an enqueue operation and no element is dequeued from the head of the queue, only the terms involving the newly added data value need to be updated. The update formula above can therefore be simplified to the following form:
[0066] .
[0067] For example, taking the scaled traffic data X passed in step S401: since the newly added data value in the formula corresponds to the traffic data X in step S401, substituting X into the simplified update formula gives the updated change factor of each traffic data in the queue as follows:
[0068]
[0069] It can be seen that, when calculating the updated change factor of each traffic data, this embodiment only needs to apply the simplified update formula above with the updated queue tail data (such as X) as input; no global summation according to the original definition formula is required. This effectively reduces the computational complexity of the change factor and improves computational efficiency.
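A small Python sketch can show why running sums avoid the global summation. It assumes the change factor is the sum of squared differences described in paragraph [0060], and uses the algebraic identity that this sum equals n·x² − 2·x·S + Q, where S is the sum and Q the sum of squares of the queue values. The function names are illustrative, and the exact form of the patent's simplified update formula is not confirmed by the text.

```python
from typing import List


def change_factor_direct(queue: List[float], i: int) -> float:
    """Sum of squared differences between element i and every element."""
    return sum((queue[i] - x) ** 2 for x in queue)


def change_factors_from_sums(
    queue: List[float], total: float, total_sq: float
) -> List[float]:
    """Same quantity via the expansion n*x^2 - 2*x*S + Q."""
    n = len(queue)
    return [n * x * x - 2.0 * x * total + total_sq for x in queue]


queue = [3.0, 3.2, 2.9, 3.1]
total = sum(queue)                    # running sum S
total_sq = sum(x * x for x in queue)  # running sum of squares Q

# Enqueue a new scaled value X: only the two running sums change.
X = 5.0
queue.append(X)
total += X
total_sq += X * X

fast = change_factors_from_sums(queue, total, total_sq)
slow = [change_factor_direct(queue, i) for i in range(len(queue))]
```

Both lists agree up to floating-point rounding, but the running-sum version needs constant work per element instead of a pass over the whole queue.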
[0070] Step S403: Based on the updated change factors of each traffic data, update the reference change factor of the current queue.
[0071] It is understood that the original definition formula of the reference change factor $g_{\min}$ of the current queue described in this embodiment is:
[0072] $g_{\min} = \min(G)$;
[0073] where G represents the set of all change factors in the current queue. From the original definition formula above, it can be seen that the reference change factor $g_{\min}$ is the smallest change factor in the current queue.
[0074] Thus, the reference change factor of the current queue is updated based on the updated change factors of each traffic data, giving the updated reference change factor $g_{\min}$.
[0075] Step S404: Based on the updated reference change factor and the change factor corresponding to the traffic data currently being enqueued, calculate the anomaly measurement index of that traffic data. Writing $g$ for the change factor of the traffic data and $g_{\min}$ for the reference change factor, the index is $r = \sqrt{g_{\min} / g}$; that is, it is the square root of the ratio between the minimum change factor in the current queue and the change factor of the traffic data.
[0076] For example, taking the scaled traffic data X passed in step S401, the corresponding anomaly measurement index is $r_X = \sqrt{g_{\min} / g_X}$, where X represents the newly added traffic data and $g_X$ is the change factor corresponding to its entry into the queue. $g_X$ is still calculated according to the original definition of the change factor, that is:
[0077] $g_X = \sum_{j} (X - x_j)^2$, where the sum runs over the traffic data $x_j$ in the queue.
[0078] It should be noted that this embodiment does not consider the case where the change factor is 0. A change factor of 0 indicates that all elements in the queue have the same constant value, i.e., the queue elements have not changed. In that case, all elements with non-zero change factors can be regarded as anomalous elements. Therefore, when the anomaly measurement index is calculated, the case where the element's change factor is 0 can be excluded.
[0079] Step S405: Determine whether the calculated anomaly measurement index is less than or equal to the preset anomaly threshold. If so, the traffic data is judged abnormal; otherwise, it is judged normal.
[0080] It is understood that in this embodiment the anomaly measurement index takes values between 0 and 1. The closer the value is to 1, the closer the change factor of the traffic data is to the minimum change factor in the queue, and the traffic data can be considered normal. The closer the value is to 0, the larger the change factor of the traffic data is relative to the minimum change factor in the queue, and the traffic data can be considered an anomaly. Therefore, to achieve flexible judgment, this embodiment sets an anomaly threshold: if the anomaly measurement index is less than or equal to the threshold, the current traffic data is determined to be abnormal; otherwise, it is determined to be normal. Preferably, the anomaly threshold is set to a preset value.
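The index and threshold judgment of steps S403 to S405 can be sketched in Python as follows. The sketch assumes the index is the square root of the minimum change factor divided by the element's own change factor, which is the reading consistent with the value range described above. The threshold value 0.5 and all names are hypothetical; the preferred threshold is not stated in the text.

```python
import math
from typing import List, Optional


def anomaly_index(factor: float, reference: float) -> Optional[float]:
    """sqrt(reference / factor); None when factor is 0 (excluded case)."""
    if factor == 0:
        return None
    return math.sqrt(reference / factor)


def is_abnormal(index: Optional[float], threshold: float) -> bool:
    """Abnormal when the index is at or below the threshold."""
    return index is not None and index <= threshold


queue: List[float] = [3.0, 3.2, 2.9, 3.1, 5.0]
# Change factors by the direct definition (sum of squared differences).
factors = [sum((x - y) ** 2 for y in queue) for x in queue]
reference = min(factors)  # reference change factor of the queue
indices = [anomaly_index(g, reference) for g in factors]

THRESHOLD = 0.5  # hypothetical value chosen for this example only
flags = [is_abnormal(r, THRESHOLD) for r in indices]
```

In this example only the value 5.0 falls below the threshold, while the element whose change factor is the minimum has an index of exactly 1.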
[0081] Step S406: Record and save the current judgment result of traffic data X.
[0082] Step S407: Enqueue X. After the queue has completed the dequeue operation, X becomes the last traffic data in the queue, i.e., the tail element.
[0083] Furthermore, as shown in Figure 5, as an optional implementation, when each traffic data is dequeued, an anomaly metric of that traffic data is calculated to determine whether its data value is abnormal. Specifically, this may include:
[0084] Step S501: Based on the currently updated reference change factor and the updated change factor corresponding to the current queue head data, calculate the anomaly measurement index of the current dequeued traffic data (i.e., the current queue head data). The calculation is the same as in step S404: the square root of the ratio between the reference change factor and the change factor of the queue head data.
[0085] It is understandable that, since the change factors and reference change factors related to the calculation of the change anomaly measurement index are already updated when the data at the head of the queue is dequeued, the change anomaly measurement index of the data at the head of the queue can be directly substituted into the formula for calculation without updating the relevant factors before calculation.
[0086] Step S502: Determine whether the calculated anomaly measurement index is less than or equal to the preset anomaly threshold. If so, the traffic data is judged abnormal; otherwise, it is judged normal. For the determination principle, refer to step S405; it is not repeated here.
[0087] Step S503: Record and save the result of this judgment.
[0088] Step S504: Dequeue the data from the head of the queue.
[0089] Understandably, when the head data is dequeued, each subsequent data item in the queue shifts forward by one position. As mentioned earlier, the change factor of each data item in queue C in this embodiment changes dynamically along with the data in the queue. Therefore, each time the head data is dequeued, the change factor corresponding to each data item in the queue and the reference change factor need to be updated. This supports subsequent calculations of the anomaly metric and ensures their accuracy.
[0090] As an optional implementation, this embodiment further includes the following operations after each traffic data is dequeued:
[0091] (1) Update the change factors of each traffic data in the current queue. The update formula for the change factor can be found in step S402 and is not repeated here.
[0092] (2) Update the reference change factor of the current queue based on the updated change factors of each traffic data. The update of the reference change factor is likewise described in step S403 and is not repeated here.
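The enqueue-time and dequeue-time judgments described above can be combined into one short Python sketch of a single FIFO queue. Several points are assumptions rather than details confirmed by the text: the head is dequeued before the new value is judged, the new value is judged against the remaining queue plus itself, change factors are recomputed by the direct definition for readability instead of with running sums, and a zero change factor is treated as normal. The names and the threshold 0.5 are hypothetical.

```python
import math
from collections import deque
from typing import Deque, Iterable, List, Tuple


def judged_abnormal(window: List[float], position: int, threshold: float) -> bool:
    """Judge window[position] using sqrt(min factor / own factor)."""
    factors = [sum((x - y) ** 2 for y in window) for x in window]
    if factors[position] == 0:
        return False  # zero change factor is excluded; treated as normal here
    return math.sqrt(min(factors) / factors[position]) <= threshold


def run_single_queue(
    values: Iterable[float], length: int, threshold: float
) -> List[Tuple[float, bool, bool]]:
    """Return (value, enqueue verdict, dequeue verdict) for each dequeued value."""
    queue: Deque[Tuple[float, bool]] = deque()
    results: List[Tuple[float, bool, bool]] = []
    for x in values:
        if len(queue) == length:
            # Dequeue-time judgment of the head against the full queue.
            window = [v for v, _ in queue]
            dequeue_verdict = judged_abnormal(window, 0, threshold)
            head_value, enqueue_verdict = queue.popleft()
            results.append((head_value, enqueue_verdict, dequeue_verdict))
        # Enqueue-time judgment of the new value, then store it at the tail.
        window = [v for v, _ in queue] + [x]
        queue.append((x, judged_abnormal(window, len(window) - 1, threshold)))
    return results


values = [3.0, 3.2, 3.0, 3.2, 3.0, 3.2, 9.0,
          3.0, 3.2, 3.0, 3.2, 3.0, 3.2, 3.0]
results = run_single_queue(values, length=6, threshold=0.5)
```

Values still inside the queue when the input ends are not reported by this sketch; a real pipeline would need to flush them or keep the queue alive for online data.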
[0093] Furthermore, as an optional implementation, this embodiment employs a rapid determination method when performing the final anomaly determination on the traffic data. This method can comprehensively determine the anomaly by combining the results of two determinations on the observed data, has clear logic, consumes few computing and storage resources, and can quickly reach a conclusion. Specifically, in step A of this method, combining the anomaly determination results of the data values at the time of enqueueing and dequeueing to perform the final anomaly determination on the traffic data, the following operations may be included:
[0094] Obtain the anomaly detection results of the data values when the traffic data is enqueued and dequeued;
[0095] If both data value anomaly judgment results are abnormal, the traffic data is ultimately determined to be abnormal; otherwise, the traffic data is ultimately determined to be normal, as shown in Table 1.
[0096] Table 1
[0097]
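The rapid determination rule stated in paragraph [0095] is a logical AND of the two judgments. A minimal Python sketch, with an illustrative function name, enumerates the four possible combinations:

```python
def final_verdict_fast(enqueue_abnormal: bool, dequeue_abnormal: bool) -> bool:
    """Abnormal only when both the enqueue and dequeue judgments are abnormal."""
    return enqueue_abnormal and dequeue_abnormal


# Enumerate the four possible combinations of the two judgments.
combinations = [
    (enq, deq, final_verdict_fast(enq, deq))
    for enq in (True, False)
    for deq in (True, False)
]
```

Only the first combination, abnormal at both enqueue and dequeue, yields a final abnormal result.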
[0098] Furthermore, as an optional implementation, in step B of this embodiment, the preset processing strategy may include discarding or interpolation. Discarding involves removing traffic data ultimately determined to be abnormal and not outputting it to the analyzable traffic data. Interpolation involves selecting the closest normal traffic data to the traffic data ultimately determined to be abnormal as the reference data for interpolation, and replacing the currently determined abnormal traffic data.
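The two processing strategies can be sketched in Python as follows. The sketch assumes that "closest" means closest by position in the sequence, with ties resolved in favor of the earlier value; the text does not specify either point, and the function names are illustrative.

```python
from typing import List


def discard(values: List[float], abnormal: List[bool]) -> List[float]:
    """Drop every value that was finally judged abnormal."""
    return [v for v, bad in zip(values, abnormal) if not bad]


def interpolate_nearest(values: List[float], abnormal: List[bool]) -> List[float]:
    """Replace each abnormal value with the nearest normal value by position."""
    normal_positions = [i for i, bad in enumerate(abnormal) if not bad]
    if not normal_positions:
        return list(values)  # nothing normal to borrow from
    patched = list(values)
    for i, bad in enumerate(abnormal):
        if bad:
            # Tie between neighbors: prefer the earlier position (assumption).
            nearest = min(normal_positions, key=lambda j: (abs(j - i), j))
            patched[i] = values[nearest]
    return patched


values = [3.0, 3.2, 9.0, 3.1]
abnormal = [False, False, True, False]
kept = discard(values, abnormal)
patched = interpolate_nearest(values, abnormal)
```

Discarding shortens the series, which may matter for models that expect evenly spaced samples, while interpolation keeps the series length unchanged.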
[0099] As can be seen from the above, this embodiment designs an anomaly measurement index to determine whether the current traffic data is abnormal. By introducing this anomaly measurement index, the judgment of data anomalies conforms to the characteristics of data change, providing a reasonable and optimized anomaly judgment method, thereby helping to improve the accuracy of subsequent data analysis and prediction. Furthermore, by using this data anomaly measurement index, it can adaptively adapt to changes in traffic data, realizing a dynamic anomaly data filtering method that changes with data trends.
[0100] Meanwhile, this embodiment also adopts a first-in-first-out (FIFO) queue structure, and performs data value anomaly judgment at both the enqueue and dequeue times of the queue, which meets the time requirement for anomaly judgment; moreover, by using this FIFO queue structure, this filtering method can support filtering of large amounts of data, and can flexibly support filtering of static data and online dynamic data, making it flexible in use.
[0101] To better illustrate the technical effects of this embodiment, the superiority of this embodiment will be verified through specific comparative experiments below.
[0102] As shown in Table 2, traffic data from four ports were collected from the live network, with uplink and downlink bidirectional traffic sampled at 15-minute intervals. Using an LSTM (Long Short-Term Memory) prediction model, the learning performance of the model was compared across three cases: outliers left unfiltered, outliers filtered manually, and outliers filtered automatically with the method of this example. Automatic filtering was verified with queue lengths of 50 and 100. The comparison metric was the prediction RRMSE (Relative Root Mean Square Error) after the prediction model was trained. A smaller RRMSE value indicates a smaller prediction error and higher prediction accuracy.
[0103] Table 2
[0104]
[0105] As can be seen from Table 2, the predicted RRMSE value of the automatic filtering of abnormal data using the method in this example is significantly smaller than the predicted RRMSE value of the manual filtering of abnormal data. This indicates that the error of the automatic filtering of abnormal traffic data using the method in this example is small, the accuracy is high, and it has a good technical effect.
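For readers unfamiliar with the comparison metric, the following Python sketch computes one common form of RRMSE: the root mean square error divided by the mean of the actual values. This normalization is an assumption; the text does not state which definition of RRMSE was used, and the sample numbers below are invented for illustration and are not the experimental data.

```python
import math
from typing import Sequence


def rrmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE divided by the mean of the actual values (one common definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse) / (sum(actual) / len(actual))


# Illustrative numbers only.
actual = [10.0, 12.0, 8.0, 10.0]
close_prediction = [10.0, 11.0, 9.0, 10.0]
far_prediction = [14.0, 8.0, 12.0, 6.0]

close_score = rrmse(actual, close_prediction)
far_score = rrmse(actual, far_prediction)
```

The prediction that tracks the actual series more closely receives the smaller score, matching the statement that a smaller RRMSE indicates higher prediction accuracy.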
[0106] Example 2
[0107] This embodiment provides a method for filtering abnormal traffic data whose basic steps are the same as in Embodiment 1. The difference is that, as a preferred implementation, to further improve the accuracy of anomaly detection and reduce false positives, this embodiment does not use the rapid determination method of the previous embodiment (i.e., a rapid two-step anomaly determination based on a single queue) for the final anomaly determination. Instead, it preferably uses a deep determination method (i.e., multiple deep anomaly determinations based on a combination of multiple queues). Specifically, the deep determination method adds a supplementary queue to the existing queue. The supplementary queue includes at least one level of first-in-first-out (FIFO) queue, and together with the original FIFO queue it forms a multi-level queue combination that constitutes a cascading determination structure, as shown in Figure 6: after traffic data exits from the head of the original queue, it enters the tail of the first-level queue of the supplementary queue; after exiting from the head of the first-level queue, it enters the tail of the second-level queue; and so on, until the traffic data exits from the head of the last-level queue of the supplementary queue. Furthermore, as also shown in Figure 6, for traffic data whose earlier judgment results differ, a further data value anomaly judgment is performed each time the traffic data leaves a level of the supplementary queue, by calculating the anomaly measurement index, and a comprehensive evaluation is made from the results of these multiple judgments.
[0108] For example, as an optional implementation, this embodiment uses a deep-determination method when performing the final anomaly determination, which may specifically include:
[0109] A supplementary queue is added to the existing queue. The supplementary queue includes at least one level of first-in-first-out queue, and the supplementary queue and the original first-in-first-out queue form a multi-level queue combination structure.
[0110] After each traffic data is dequeued from the original queue, it will enter the first-in-first-out queues at each level of the supplementary queue in sequence. For traffic data with different judgment results in the past, when the traffic data leaves the supplementary queue at each level, multiple data value anomaly judgments will be made by calculating the change anomaly measurement index, and a comprehensive judgment will be made by combining the results of multiple data value anomaly judgments.
[0111] Furthermore, in this embodiment, the purpose of introducing a supplementary queue is to determine data anomalies based on a more comprehensive data sample. The size of the data sample is determined by the queue length. Generally, a larger data sample reflects the environmental changes of the data more comprehensively, which helps to determine data anomalies accurately. Therefore, in practical applications, each level of the supplementary queue should be longer than the original FIFO queue, and its length can be a multiple of the length of the original FIFO queue. For example, if the length of the original FIFO queue is n, the length of the first level of the supplementary queue is 2n. However, this multiple usually does not exceed 10, because a larger multiple consumes more memory and computing resources and lowers execution efficiency; considering all factors, a multiple of less than 10 is the better choice. In addition, the lengths of the FIFO queues at each level of the supplementary queue can differ. For example, the first-level FIFO queue can have length 2n and the second-level FIFO queue length 3n, as shown in Figure 6. Preferably, the supplementary queue contains no more than 10 levels of FIFO queues, which avoids reducing decision efficiency through too many levels.
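The cascading structure can be sketched in Python as a chain of bounded FIFO queues. The class name `QueueCascade` and its interface are hypothetical, and the sketch only models how data flows from level to level; the anomaly judgment performed when data leaves each level is omitted.

```python
from collections import deque
from typing import Deque, List, Optional


class QueueCascade:
    """Original queue of length n followed by supplementary levels of length m*n."""

    def __init__(self, base_length: int, multiples: List[int]) -> None:
        self.capacities = [base_length] + [m * base_length for m in multiples]
        self.levels: List[Deque[float]] = [deque() for _ in self.capacities]

    def push(self, value: float) -> Optional[float]:
        """Insert a value; return the value leaving the last level, if any."""
        carried = value
        for level, capacity in zip(self.levels, self.capacities):
            level.append(carried)
            if len(level) <= capacity:
                return None  # this level still has room, nothing moves on
            carried = level.popleft()  # head exits and enters the next level
        return carried


# Original queue of length 2, supplementary levels of length 2n and 3n.
cascade = QueueCascade(base_length=2, multiples=[2, 3])
outputs = [cascade.push(float(v)) for v in range(15)]
```

With capacities 2, 4, and 6, the first value leaves the last level only on the thirteenth push, which shows how much later the deeper judgments see each data point.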
[0112] Furthermore, as an optional implementation, the comprehensive judgment that combines multiple data value anomaly judgment results can adopt any of the following strategies:
[0113] Method 1: Use the last result of the supplementary queue as the standard.
[0114] For example, if the supplementary queue has one level of first-in-first-out queue, then when the first two judgment results of a certain traffic data item are inconsistent (for example, it is judged as normal when entering the original queue and as abnormal when leaving the original queue), the last judgment, made when it leaves the supplementary queue, prevails. If this last judgment is abnormal, the traffic data is finally judged as abnormal; if it is normal, the traffic data is finally judged as normal.
[0115] Method 2: Adopt the 7-point continuous anomaly judgment principle in the field of quality control.
[0116] That is, if the same data is judged as abnormal 7 times in a row, the traffic data is considered to be an outlier.
[0117] Method 3: Based on the relative proportion of anomaly judgments calculated after multiple judgments, the final anomaly judgment result is obtained.
[0118] For example, if the supplementary queue has two levels of first-in-first-out queues, then for traffic data with different judgment results, it will go through a total of four judgments (two judgments when enqueuing and dequeuing in the original queue, and two judgments when dequeuing in the two levels of supplementary queues). If there are three abnormal judgments out of the four judgments, then the relative proportion of abnormal judgments is three-quarters, which is more than half (the judgment standard of relative proportion can be set as needed), and then it is finally judged as abnormal.
[0119] It should be noted that the specific comprehensive evaluation strategy adopted can be selected according to the actual application requirements, and this embodiment does not impose any specific limitations.
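The three strategies above can be sketched as functions over a chronological list of judgments, where `True` means "judged abnormal". The function names and the default threshold of one half are assumptions made for illustration; the text leaves the proportion standard configurable.

```python
def last_result_rule(judgments):
    """Method 1: the most recent judgment (the last supplementary-queue exit) decides."""
    return bool(judgments[-1])


def consecutive_rule(judgments, run_length=7):
    """Method 2: abnormal only if `run_length` judgments in a row are abnormal."""
    streak = 0
    for abnormal in judgments:
        streak = streak + 1 if abnormal else 0
        if streak >= run_length:
            return True
    return False


def proportion_rule(judgments, threshold=0.5):
    """Method 3: abnormal if the share of abnormal judgments exceeds `threshold`."""
    return sum(judgments) / len(judgments) > threshold


# The example from the text: three abnormal judgments out of four.
four_judgments = [False, True, True, True]
print(last_result_rule(four_judgments))  # True
print(consecutive_rule(four_judgments))  # False: only 3 in a row, fewer than 7
print(proportion_rule(four_judgments))   # True: 3/4 is more than half
```

If each item receives two judgments in the original queue and one more per supplementary level, as in the four-judgment example, Method 2 can only fire when the supplementary queue has at least five levels.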
[0120] To better understand the depth-based judgment method in this embodiment, the following takes a double-queue combination structure (the original queue plus one supplementary queue), as shown in Figure 7, as an example to illustrate the specific process of making the final anomaly judgment with the depth-based judgment method.
[0121] As shown in Figure 8, making the final anomaly judgment with the depth-based judgment method specifically includes the following steps:
[0122] Step S801: Obtain the abnormal judgment results of the data values when the traffic data is enqueued and dequeued in the original first-in-first-out queue;
[0123] Step S802: Based on the two abnormal judgment results of the obtained data values, the traffic data is marked. As shown in Table 3, three marking results can be obtained: if both judgment results are normal, it is marked as normal traffic data; if both judgment results are abnormal, it is marked as abnormal traffic data; if different judgment results occur, it is marked as delayed judgment.
[0124] Table 3
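[0125]
Enqueue judgment | Dequeue judgment | Mark
Normal | Normal | Normal traffic data
Abnormal | Abnormal | Abnormal traffic data
Normal | Abnormal | Delayed judgment
Abnormal | Normal | Delayed judgment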
[0126] It is understandable that traffic data marked as delayed judgment means that the traffic data needs to rely on the supplementary queue for further judgment; while other marked traffic data means that the judgment result of the traffic data can be determined and no further judgment is needed. Therefore, the final judgment can be made directly based on its marking. For example, for traffic data marked as normal, the traffic data can be judged to be normal in the end.
[0127] Step S803: After each traffic data item is dequeued from the original queue, it enters the first-level FIFO queue in the supplementary queue in sequence. For traffic data marked as delayed judgment, a third data value anomaly judgment is performed by calculating the change anomaly measurement index when that traffic data leaves the first-level FIFO queue. It is understood that, as described in Embodiment 1, after each enqueue operation and each dequeue operation, the change factor corresponding to each traffic data item in the queue and the reference change factor need to be updated, to support the calculation of the anomaly measurement index and ensure its accuracy.
[0128] Step S804: For traffic data marked as delayed, the final anomaly determination result is obtained by combining the three data value anomaly determination results and using the calculated relative proportion of anomaly determinations.
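Steps S801 to S804 can be sketched as follows. The names `Mark`, `mark_traffic`, and `final_judgment` are hypothetical, the individual judgments are passed in as booleans (`True` means abnormal) because the calculation of the anomaly measurement index is described elsewhere, and a threshold of one half is assumed for the relative proportion.

```python
from enum import Enum


class Mark(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"
    DELAYED = "delayed judgment"


def mark_traffic(enqueue_abnormal, dequeue_abnormal):
    """Step S802: mark an item from its two original-queue judgments."""
    if enqueue_abnormal == dequeue_abnormal:
        return Mark.ABNORMAL if enqueue_abnormal else Mark.NORMAL
    return Mark.DELAYED


def final_judgment(enqueue_abnormal, dequeue_abnormal, third_judgment, threshold=0.5):
    """Steps S803-S804: return True if the item is finally judged abnormal.

    third_judgment is a zero-argument callable that is only invoked for
    delayed items, standing in for the judgment made when the item leaves
    the first-level supplementary queue.
    """
    mark = mark_traffic(enqueue_abnormal, dequeue_abnormal)
    if mark is not Mark.DELAYED:
        return mark is Mark.ABNORMAL  # no further judgment is needed
    results = [enqueue_abnormal, dequeue_abnormal, third_judgment()]
    return sum(results) / len(results) > threshold


print(mark_traffic(False, True))                  # Mark.DELAYED
print(final_judgment(False, True, lambda: True))  # True: 2 of 3 judgments are abnormal
```

With exactly three judgments and a threshold of one half, a delayed item always has one normal and one abnormal judgment from the original queue, so the third judgment decides the outcome, which matches Method 1 for this double-queue structure.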
[0129] Embodiment 3
[0130] Based on the same inventive concept, embodiments of the present invention also provide an abnormal traffic data filtering device, which includes an analysis and identification module and an anomaly processing module.
[0131] The analysis and identification module is used to: sequentially enter the traffic data to be analyzed into a first-in-first-out queue, and at the time of each traffic data entering and leaving the queue, calculate the abnormality measurement index of the traffic data to judge the abnormality of the data value; and combine the abnormality judgment results of the data value at the time of entering and leaving the queue to make a final abnormality judgment on the traffic data.
[0132] The anomaly handling module is used to process traffic data that is ultimately determined to be abnormal according to a preset processing strategy.
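As an illustration of the anomaly handling module, the sketch below applies a preset strategy to items already flagged as abnormal by the analysis and identification module. The function name `handle_anomalies` and the strategy labels are hypothetical. Discarding and interpolation are the strategies named in the claims; the use of linear interpolation between the nearest normal neighbours, with the nearest normal value copied at the ends, is an assumption.

```python
def handle_anomalies(values, abnormal_flags, strategy="discard"):
    """Return the series after applying the preset strategy to flagged items."""
    if len(values) != len(abnormal_flags):
        raise ValueError("values and abnormal_flags must have the same length")
    if strategy == "discard":
        return [v for v, bad in zip(values, abnormal_flags) if not bad]
    if strategy == "interpolate":
        good = [i for i, bad in enumerate(abnormal_flags) if not bad]
        if not good:
            raise ValueError("no normal values to interpolate from")
        result = list(values)
        for i, bad in enumerate(abnormal_flags):
            if not bad:
                continue
            left = max((g for g in good if g < i), default=None)
            right = min((g for g in good if g > i), default=None)
            if left is None:
                result[i] = values[right]  # no normal value before: copy the next one
            elif right is None:
                result[i] = values[left]   # no normal value after: copy the previous one
            else:
                weight = (i - left) / (right - left)
                result[i] = values[left] + weight * (values[right] - values[left])
        return result
    raise ValueError(f"unknown strategy: {strategy}")


series = [10, 12, 500, 14]
flags = [False, False, True, False]
print(handle_anomalies(series, flags, "discard"))      # [10, 12, 14]
print(handle_anomalies(series, flags, "interpolate"))  # [10, 12, 13.0, 14]
```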
[0133] It is understood that the device using this embodiment can identify and process abnormal traffic data in the data analysis preprocessing stage, thereby reducing the impact of abnormal traffic data values on traffic prediction and analysis, automating the abnormal value analysis and processing process of traffic data, and adapting to changes in traffic data, thus meeting the needs of practical applications.
[0134] It should be noted that the various variations and specific examples in the foregoing method embodiments are also applicable to the device in this embodiment. Through the detailed description of the foregoing method, those skilled in the art can clearly understand the implementation method of the device in this embodiment. Therefore, for the sake of brevity, it will not be described in detail here.
[0135] Note: The specific embodiments described above are merely examples and not limitations. Those skilled in the art can combine and integrate some steps and devices from the various embodiments described separately above to achieve the effects of the present invention. Such combined and integrated embodiments are also included in the present invention, but will not be described one by one here.
[0136] The advantages, benefits, and effects mentioned in the embodiments of this invention are merely examples and not limitations. They should not be considered as essential features of each embodiment of this invention. Furthermore, the specific details disclosed in the embodiments of this invention are for illustrative purposes and ease of understanding only and are not limitations. These details do not restrict the embodiments of this invention to being implemented only with these specific details.
[0137] The block diagrams of devices, apparatuses, equipment, and systems involved in the embodiments of this invention are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will recognize, these devices, apparatuses, equipment, and systems can be connected, arranged, and configured in any manner. Words such as "comprising," "including," "having," etc., are open-ended terms meaning "including but not limited to," and are used interchangeably with them. The terms "or" and "and" as used in the embodiments of this invention refer to the term "and / or," and are used interchangeably with it unless the context explicitly indicates otherwise. The term "such as" as used in the embodiments of this invention refers to the phrase "such as but not limited to," and is used interchangeably with it.
[0138] The flowcharts and method descriptions in the embodiments of this invention are merely illustrative examples and are not intended to require or imply that the steps of the various embodiments must be performed in the given order. As those skilled in the art will recognize, the steps in the above embodiments can be performed in any order. Words such as "then," "next," etc., are not intended to limit the order of steps; these words are only used to guide the reader through the description of these methods. Furthermore, any reference to a singular element, such as the use of the articles "a," "one," or "the," is not to be construed as limiting that element to the singular.
[0139] Furthermore, the steps and apparatus in the various embodiments of the present invention are not limited to any one embodiment. In fact, new embodiments can be conceived by combining relevant steps and apparatus in the various embodiments of the present invention with the concepts of the present invention, and these new embodiments are also included within the scope of the present invention.
[0140] The various operations in the embodiments of the present invention can be performed by any suitable means capable of performing the corresponding functions. Such means may include various hardware and / or software components and / or modules, including but not limited to hardware circuits or processors.
[0141] The method of this invention includes one or more actions for implementing the method described above. The methods and / or actions may be interchanged without departing from the scope of the claims. In other words, unless a specific order of actions is specified, the order and / or use of specific actions may be modified without departing from the scope of the claims.
[0142] The functions in the embodiments of the present invention can be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions can be stored as one or more instructions on a physical computer-readable medium. The storage medium can be any available physical medium that can be accessed by a computer. By way of example and not limitation, such a computer-readable medium can include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other physical medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, DVD (Digital Versatile Disc), floppy disk, and Blu-ray disc, wherein a disk reproduces data magnetically, while a disc optically reproduces data using lasers.
[0143] Therefore, a computer program product can perform the operations described herein. For example, such a computer program product can be a computer-readable tangible medium having instructions tangibly stored (and / or encoded) thereon, which can be executed by one or more processors to perform the operations described herein. The computer program product may include packaging materials.
[0144] Other examples and implementations are within the scope and spirit of the embodiments of the present invention and the appended claims. For example, due to the nature of software, the functions described above can be implemented using software executed by a processor, hardware, firmware, hardwired, or any combination thereof. Features implementing the functions can also be physically located in various locations, including being distributed so that portions of the function are implemented at different physical locations.
[0145] Those skilled in the art can make various changes, substitutions, and modifications to the technology described herein without departing from the teachings defined by the appended claims. Furthermore, the scope of the claims of this disclosure is not limited to the specific aspects of the processes, machines, manufactures, events, means, methods, and actions described above. Currently existing or later-developed processes, machines, manufactures, events, means, methods, or actions that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein can be utilized. Therefore, the appended claims include such processes, machines, manufactures, events, means, methods, or actions within their scope.
[0146] The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein can be applied to other aspects without departing from the scope of the invention. Therefore, the invention is not intended to be limited to the aspects shown herein, but rather to be carried out within the widest scope consistent with the principles and novel features disclosed herein.
[0147] The above description has been given for illustrative and descriptive purposes. Furthermore, this description is not intended to limit the embodiments of the invention to the forms disclosed herein. Although numerous exemplary aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, alterations, additions, and sub-combinations therein. Moreover, anything not described in detail in this specification is prior art well known to those skilled in the art.
Claims
1. A method for filtering abnormal traffic data, characterized in that the method includes: sequentially entering the traffic data to be analyzed into a first-in-first-out queue; when each traffic data item is enqueued and when it is dequeued, calculating an anomaly measurement index for the traffic data to judge whether its data value is abnormal; combining the data value anomaly judgment results at enqueue and dequeue to make a final anomaly determination for the traffic data; and processing traffic data that is finally determined to be abnormal according to a preset processing strategy; wherein, when each traffic data item is enqueued, judging the data value anomaly by calculating the anomaly measurement index of the traffic data includes: inputting the raw data value of the next traffic data item to be analyzed, and scaling the raw data value by magnitude; based on the newly enqueued traffic data, updating the change factor of each traffic data item in the current queue, where the change factor of the i-th traffic data item in the queue is the sum of the squares of the changes of the traffic data at the i-th position relative to the traffic data at the other positions in the queue; based on the updated change factors, updating the reference change factor of the current queue, where the updated reference change factor is the smallest of the updated change factors of the current queue; based on the updated reference change factor and the change factor corresponding to the currently enqueued traffic data, calculating the anomaly measurement index of that traffic data; and determining whether the calculated anomaly measurement index is less than or equal to a preset anomaly threshold, and if so, judging the data value as abnormal, otherwise judging it as normal;
wherein, when each traffic data item is dequeued, judging the data value anomaly by calculating the anomaly measurement index of the traffic data includes: calculating the anomaly measurement index of the currently dequeued traffic data based on the currently updated reference change factor and the updated change factor corresponding to the dequeued traffic data; and determining whether the calculated anomaly measurement index is less than or equal to the preset anomaly threshold, and if so, judging the data value as abnormal, otherwise judging it as normal.
2. The abnormal traffic data filtering method as described in claim 1, characterized in that, After each traffic data is dequeued, the following operations are also performed: Update the change factors for each traffic data in the current queue; Update the reference change factor of the current queue based on the updated change factors of each traffic data.
3. The abnormal traffic data filtering method as described in claim 1, characterized in that, Combining the anomaly detection results of the data values at the time of enqueueing and dequeueing, a fast determination method is used when making the final anomaly determination for this traffic data, which specifically includes: Obtain the anomaly detection results of the data values when the traffic data is enqueued and dequeued; If both data value anomaly assessments result in anomaly, the traffic data is ultimately determined to be abnormal; otherwise, the traffic data is ultimately determined to be normal.
4. The abnormal traffic data filtering method as described in claim 1, characterized in that, when combining the data value anomaly judgment results at enqueue and dequeue to make the final anomaly determination for the traffic data, a depth-based judgment method is used, which specifically includes: adding a supplementary queue to the original queue, where the supplementary queue includes at least one level of first-in-first-out queue, and the supplementary queue and the original queue form a multi-level queue combination structure; after each traffic data item is dequeued from the original queue, it enters the first-in-first-out queues at each level of the supplementary queue in sequence; for traffic data whose earlier judgment results were inconsistent, a further data value anomaly judgment is made each time the traffic data leaves a level of the supplementary queue, by calculating the change anomaly measurement index, and a comprehensive judgment is made by combining the results of these multiple data value anomaly judgments.
5. The abnormal traffic data filtering method as described in claim 4, characterized in that: The length of each level of the supplementary queue is longer than the length of the original first-in-first-out queue, and the two lengths are in a multiple relationship.
6. The abnormal traffic data filtering method as described in claim 4, characterized in that the comprehensive judgment is made by: taking the last result of the supplementary queue as the standard; or adopting the seven-point continuous anomaly judgment principle from the field of quality control; or calculating the relative proportion of anomaly judgments.
7. The abnormal traffic data filtering method as described in claim 1, characterized in that, The preset processing strategies include: discard processing or interpolation processing.
8. An abnormal traffic data filtering device based on the method of any one of claims 1 to 7, characterized in that: The device includes an analysis and identification module and an anomaly handling module; The analysis and identification module is used to: sequentially enter the traffic data to be analyzed into a first-in-first-out queue, and at the time of each traffic data entering and leaving the queue, calculate the abnormal change measurement index of the traffic data to judge the abnormality of the data value; and combine the abnormality judgment results of the data value at the time of entering and leaving the queue to make a final abnormality judgment of the traffic data. The anomaly handling module is used to process traffic data that is ultimately determined to be abnormal according to a preset processing strategy.