[0051] Example 1:
[0052] Please refer to Figure 1, which is a flow diagram of the traffic detection method based on an interactive strategy. As shown in Figure 1, the traffic detection method based on an interactive strategy of the present invention includes:
[0053] Pre-processing step S1: the log information reported back from each user touchpoint is obtained, and the traffic log in the reported log information is pre-processed;
[0054] Feature acquisition step S2: single-dimensional data features and multi-dimensional data features are extracted from the pre-processed traffic log;
[0055] Traffic detection step S3: the single-dimensional data features and the multi-dimensional data features are passed to an isolation forest model, from which a traffic prediction result is obtained;
[0056] Comparison step S4: the traffic prediction result is compared with the abnormal traffic identified by the initialization rules to obtain the differing abnormal traffic;
[0057] Ranking and feedback step S5: the single-dimensional data features and the multi-dimensional data features are ranked by importance, and at least the top-ranked data feature and the differing abnormal traffic are fed back;
[0058] Model improvement step S6: the at least one top-ranked data feature and the differing abnormal traffic are analyzed to obtain new multi-dimensional data features, and the isolation forest model undergoes a new round of training and interaction based on the new multi-dimensional data features and the single-dimensional data features.
[0059] In the pre-processing step S1, feature fields are parsed from the traffic log, and missing values in the parsed feature fields are filled.
[0060] Please refer to Figure 2, which is a flow chart of the feature acquisition step S2. As shown in Figure 2, the feature acquisition step S2 includes:
[0061] Single-dimensional data feature extraction step S21: the single-dimensional data features are extracted from the pre-processed traffic log;
[0062] Multi-dimensional data feature extraction step S22: the multi-dimensional data features are extracted from the expert rule base.
[0063] Please refer to Figure 3, which is a flow chart of the single-dimensional data feature extraction step S21. As shown in Figure 3, the single-dimensional data feature extraction step S21 includes:
[0064] Discrete variable encoding step S211: the discrete variables in the traffic log are encoded using one-hot encoding;
[0065] Continuous variable encoding step S212: the continuous variables in the traffic log are encoded by segment mapping, in which values falling in different intervals are mapped to different values.
[0066] Please refer to Figure 4, which is a flow chart of the comparison step S4. As shown in Figure 4, the comparison step S4 includes:
[0067] Prediction step S41: the normal and abnormal traffic in the test set is predicted by the isolation forest model to obtain the prediction result;
[0068] Identification step S42: the prediction result is compared with the abnormal traffic identified by the initialization rules, and the differing abnormal traffic is identified.
[0069] As shown in Figure 5, the specific steps are as follows:
[0070] Pre-processing step: the log information reported back from each user touchpoint is obtained through the advertisement traffic monitoring system, and the traffic log is pre-processed. Feature fields are parsed from the traffic log, and missing values in a field are filled according to the characteristics of that field. For example, the OS field is filled with the mode (most frequent value) or with UNK, and the received-seconds field is filled with the mean.
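The filling rule described above can be sketched as follows. This is a minimal illustration only: the field names `os` and `received_seconds`, the use of `None` to mark a missing value, and the `use_mode` switch are assumptions made for the example and are not specified in the source.

```python
from collections import Counter
from statistics import mean


def fill_missing(records, categorical=("os",), numeric=("received_seconds",),
                 use_mode=True):
    """Return copies of the records with missing (None) fields filled.

    Categorical fields get the mode of the observed values (or "UNK");
    numeric fields get the mean of the observed values.
    """
    filled = [dict(r) for r in records]  # do not mutate the caller's data
    for field in categorical:
        present = [r[field] for r in filled if r.get(field) is not None]
        if use_mode and present:
            fill = Counter(present).most_common(1)[0][0]
        else:
            fill = "UNK"
        for r in filled:
            if r.get(field) is None:
                r[field] = fill
    for field in numeric:
        present = [r[field] for r in filled if r.get(field) is not None]
        fill = mean(present) if present else 0.0  # 0.0 fallback is an assumption
        for r in filled:
            if r.get(field) is None:
                r[field] = fill
    return filled


# Hypothetical parsed log records
logs = [
    {"os": "android", "received_seconds": 2.0},
    {"os": None, "received_seconds": 4.0},
    {"os": "android", "received_seconds": None},
    {"os": "ios", "received_seconds": 6.0},
]
clean = fill_missing(logs)
```

Whether the mode or the UNK placeholder is preferable depends on how informative a missing OS value is; keeping UNK as its own category preserves the fact that the value was absent.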
[0071] Feature engineering construction step: the construction of the feature engineering has a major influence on the final effect of the model. This embodiment uses an interactive strategy to enrich the data features step by step. The features consist of two parts: the first is the single-dimensional data features obtained from log parsing, and the second is the multi-dimensional data features fed back from expert knowledge.
[0072] Single-dimensional data feature extraction step: the parsed fields include discrete variables and continuous variables. Discrete variables, such as the OS, MD and Region features, are one-hot encoded. Continuous variables, such as the received-seconds field, are encoded by segment mapping, which maps values falling in different intervals to different values.
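The two encodings can be illustrated with a short sketch. The vocabulary `OS_VOCAB` and the interval boundaries `SECONDS_BOUNDARIES` are hypothetical values chosen for the example; the source does not give the actual categories or segment limits.

```python
from bisect import bisect_right


def one_hot(value, vocabulary):
    """Encode a discrete value as a 0/1 vector over a fixed vocabulary."""
    return [1 if value == v else 0 for v in vocabulary]


def segment_index(value, boundaries):
    """Map a continuous value to the index of the interval it falls in.

    With boundaries [b0, b1, ...], values below b0 map to 0, values in
    [b0, b1) map to 1, and so on.
    """
    return bisect_right(boundaries, value)


OS_VOCAB = ["android", "ios", "UNK"]       # hypothetical categories
SECONDS_BOUNDARIES = [1.0, 5.0, 30.0]      # hypothetical segment limits


def encode(record):
    """Concatenate the one-hot OS vector with the received-seconds segment."""
    return (one_hot(record["os"], OS_VOCAB)
            + [segment_index(record["received_seconds"], SECONDS_BOUNDARIES)])


vector = encode({"os": "ios", "received_seconds": 4.0})
```

Here `vector` is `[0, 1, 0, 1]`: the OS value selects the second vocabulary slot, and 4.0 seconds falls in the second interval.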
[0073] Multi-dimensional data features fed back from expert knowledge: experts summarize features from the expert rule base, which provides a way to enrich the parsed features with related features. After the model completes its initial training, the experts refine the relevant rules from the TOP-K features ranked by the model, and the features summarized from this expert knowledge are fed back to enrich the feature engineering of the model.
[0074] Model training and testing step: in the actual traffic of the system, the volume of both normal and abnormal traffic is large, abnormal traffic accounts for only a small share of the overall samples, and the features of abnormal traffic differ greatly from those of normal traffic; abnormal traffic is also sparsely distributed and lies far from the high-density groups. This embodiment therefore uses the isolation forest algorithm as the basic model for identifying abnormal traffic. First, n data points are randomly selected from the training data as a subsample and placed in the root node of an isolation tree. A dimension is then specified, and a cut point p is randomly generated within the range of the current node's data, that is, between the minimum and maximum values of the specified dimension. A hyperplane is generated from the cut point, dividing the data space of the current node into two subspaces: data whose value in the specified dimension is less than p are placed in the left branch of the current node, and data whose value is greater than or equal to p are placed in the right branch. This operation is repeated on the left and right branches, continually constructing new leaf nodes, until a leaf node contains only one data point or the tree has grown to the preset height. Finally, the results of all isolation trees are combined. Because the cutting process is random, the cutting is repeated from the beginning several times and the results of the individual runs are averaged until the result converges.
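The tree construction described above can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation of the embodiment: the dimension is chosen at random, a fixed number of trees stands in for "repeat until the result converges", and the path-length normalization and score formula follow the commonly published isolation forest formulation rather than anything given in the source. All function names are hypothetical.

```python
import math
import random


def build_tree(points, height, max_height, rng):
    """Recursively split points until one is isolated or max height is hit."""
    if len(points) <= 1 or height >= max_height:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))          # assumption: random dimension
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                                 # cannot split on this dimension
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)                    # cut point p between min and max
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return {
        "dim": dim,
        "cut": cut,
        "left": build_tree(left, height + 1, max_height, rng),
        "right": build_tree(right, height + 1, max_height, rng),
    }


def avg_path(n):
    """Expected path length for n points, used to normalize scores."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def path_length(point, node, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "dim" not in node:
        return depth + avg_path(node["size"])
    child = node["left"] if point[node["dim"]] < node["cut"] else node["right"]
    return path_length(point, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample."""
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    if size < 2:
        raise ValueError("need at least two data points")
    max_height = math.ceil(math.log2(size))      # preset height limit
    trees = [build_tree(rng.sample(data, size), 0, max_height, rng)
             for _ in range(n_trees)]
    return trees, size


def anomaly_score(point, forest):
    """Average the path length over all trees; shorter paths score higher."""
    trees, size = forest
    mean_depth = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / avg_path(size))


# Synthetic data: a dense cluster plus one far-away point
data_rng = random.Random(42)
normal = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
outlier = (10.0, 10.0)
forest = fit_forest(normal + [outlier])
score_outlier = anomaly_score(outlier, forest)
score_center = anomaly_score((0.0, 0.0), forest)
```

A point far from the dense cluster is separated after few cuts, so its average path is short and its score is higher than that of a point in the middle of the cluster.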
[0075] Result difference comparison and feature importance ranking step: the trained model is used to predict the normal and abnormal traffic in the test set, the prediction is compared with the abnormal traffic identified by the initialization rules, and the differing abnormal traffic is identified. The features used in the model prediction are ranked by importance, and the TOP5 features and the differing traffic are selected and fed back.
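The comparison and ranking can be sketched as set and sort operations. Two assumptions are made here: "differing abnormal traffic" is taken to mean traffic flagged by the model but not by the initialization rules, and the importance scores are supplied as a ready-made mapping, since the source does not state how importance is computed. The identifiers and scores below are invented for the example.

```python
def differing_traffic(model_flagged, rule_flagged):
    """Traffic IDs flagged by the model but not by the initialization rules."""
    return set(model_flagged) - set(rule_flagged)


def top_k_features(importance, k=5):
    """Names of the k features with the highest importance score."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical inputs
model_flagged = {"t1", "t2", "t3", "t4"}
rule_flagged = {"t1", "t2", "t9"}
importance = {
    "os": 0.30,
    "region": 0.08,
    "md": 0.20,
    "received_seconds": 0.25,
    "ip_click_rate": 0.15,
    "hour": 0.02,
}

diff = differing_traffic(model_flagged, rule_flagged)
top5 = top_k_features(importance, k=5)
```

If traffic caught only by the rules is also of interest, a symmetric difference (`set(a) ^ set(b)`) could be used instead.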
[0076] Interactive strategy execution step: using the TOP5 features and the differing traffic produced by the result difference comparison and feature importance ranking, experts analyze the rule features contained in the TOP5 features and the differing traffic, add them to the expert rule base, and use the newly identified rules to enrich the feature engineering of the model. A new round of model training and interaction is then carried out, until the model converges or expert knowledge can no longer make a judgment, that is, until the identified abnormal traffic cannot be described by rules.
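The overall interaction can be pictured as a loop. The sketch below is hypothetical throughout: `train`, `detect` and `ask_expert` are placeholder callables, "the model converges" is approximated as "no differing traffic remains", and "expert knowledge cannot judge" as "the expert returns no new features". A `max_rounds` cap is added as a safety limit and is not part of the source.

```python
def interactive_loop(train, detect, rule_flagged, ask_expert, features,
                     max_rounds=10):
    """Repeat train -> compare -> expert feedback until nothing new is learned.

    Returns the final feature list and the number of rounds executed.
    """
    features = list(features)
    rule_flagged = set(rule_flagged)
    for round_no in range(1, max_rounds + 1):
        model = train(features)
        flagged, top_features = detect(model)
        diff = set(flagged) - rule_flagged
        # The expert turns top features and differing traffic into new features
        proposed = ask_expert(top_features, diff) if diff else []
        new_features = [f for f in proposed if f not in features]
        if not diff or not new_features:
            return features, round_no
        features.extend(new_features)
    return features, max_rounds


# Stub components, for illustration only
def train(feats):
    return tuple(feats)


def detect(model):
    return {"t1", "t2", "t3"}, list(model)[:5]


answers = iter([["ip_click_rate"], []])   # expert has nothing to add in round 2


def ask_expert(top_features, diff):
    return next(answers)


final_features, rounds = interactive_loop(
    train, detect, {"t1"}, ask_expert, ["os", "received_seconds"])
```

With these stubs the loop adds one expert feature in the first round and stops in the second, when the expert returns nothing new.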