Distribution drift data set-based feature selection method
A feature selection method and data set technology, applied in the field of machine learning. It can solve problems such as the failure of feature subsets or feature ranking lists, and improves operating efficiency and effectiveness.
Examples
Embodiment 1
[0042] The present invention is a feature selection method based on distribution drift data sets. It takes a distribution drift data set and a feature candidate set as input. It considers both the degree of correlation between features and labels and the degree to which features drift over time, and from these it obtains the final feature subset and feature ranking list.
[0043] The feature selection method of the present invention is based on a feature evaluation index, the feature generalization ability effectiveness score (FGES). FGES is a new concept proposed by the present invention, and its calculation combines the feature correlation score (FRS) and the feature drift degree score (FSS). FRS measures the degree of correlation, or importance, between a feature and the labels. FSS measures the degree to which the feature's distribution, or the joint distribution of the feature and the label, changes over time.
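The FSS definition above admits two readings. One is drift in the feature's own distribution; the other is drift in the joint distribution of feature and label. The Python sketch below contrasts the two readings for a discrete feature. It assumes that rows are in time order and are split at a hypothetical cut-off `split` into an earlier and a later window. It uses smoothed KL divergence as the drift measure; the excerpt names the KL distance only in Embodiment 2. The helper names and the smoothing constant `eps` are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def kl_divergence(p_counts: Counter, q_counts: Counter, eps: float = 1e-6) -> float:
    """Smoothed KL(P || Q) between two empirical distributions given as counts.

    Adding eps to every count keeps the result finite when a value occurs
    in one window but not in the other.
    """
    support = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + eps * len(support)
    q_total = sum(q_counts.values()) + eps * len(support)
    kl = 0.0
    for value in support:
        p = (p_counts[value] + eps) / p_total
        q = (q_counts[value] + eps) / q_total
        kl += p * math.log(p / q)
    return kl


def drift_scores(
    x: Sequence[Hashable], y: Sequence[Hashable], split: int
) -> Tuple[float, float]:
    """Return (marginal_drift, joint_drift) between rows [:split] and [split:].

    marginal_drift compares P(x) across the two windows; joint_drift compares
    P(x, y), so it also reacts to a changed feature-label relationship.
    Rows are assumed to be in time order (hypothetical windowing).
    """
    marginal = kl_divergence(Counter(x[:split]), Counter(x[split:]))
    joint = kl_divergence(
        Counter(zip(x[:split], y[:split])), Counter(zip(x[split:], y[split:]))
    )
    return marginal, joint
```

Consider a feature whose values stay evenly split between 0 and 1 while its relationship to the label flips between windows. It gets a marginal drift of 0 but a large joint drift, so only the joint reading detects that change.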
[00...
Embodiment 2
[0083] Calculate the feature generalization ability effectiveness score FGES:
[0084] Given a data set D and a feature candidate set F = {A, B, C, D, E, F, G, H, I, J}, the three scores are computed as follows:
- For each feature in F, calculate the feature correlation score (FRS). This embodiment uses the "mutual information between feature and label" method, and the FRS of each feature is listed in the corresponding column of Table 1 below.
- For each feature in F, calculate the feature drift degree score (FSS). This embodiment uses the "feature KL distance" method, and the FSS of each feature is listed in the corresponding column of Table 1 below.
- For each feature in F, fuse the two scores as FGES = log(FRS) / log(FSS). The FGES of each feature is listed in the corresponding column of Table 1 below.
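Embodiment 2 can be sketched in Python as follows:
- FRS is the empirical mutual information between a discrete feature and the label.
- FSS is a smoothed KL divergence between the feature's value distribution in an earlier and a later time window.
- FGES applies the stated fusion, log(FRS) / log(FSS).

Several choices in the sketch are illustrative and are not specified in the excerpt: the windowing (`split`), the direction KL(early || late), the smoothing constant `eps`, and the assumption that features are already discretized. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X; Y), in nats, for discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def kl_distance(
    p_vals: Sequence[Hashable], q_vals: Sequence[Hashable], eps: float = 1e-6
) -> float:
    """Smoothed KL(P || Q) between the value distributions of two windows."""
    p_counts, q_counts = Counter(p_vals), Counter(q_vals)
    support = set(p_counts) | set(q_counts)
    p_total = len(p_vals) + eps * len(support)
    q_total = len(q_vals) + eps * len(support)
    kl = 0.0
    for value in support:
        p = (p_counts[value] + eps) / p_total
        q = (q_counts[value] + eps) / q_total
        kl += p * math.log(p / q)
    return kl


def fges(frs: float, fss: float) -> float:
    """Fuse the two scores as FGES = log(FRS) / log(FSS)."""
    if frs <= 0 or fss <= 0 or fss == 1:
        raise ValueError("log(FRS) / log(FSS) is undefined for these scores")
    return math.log(frs) / math.log(fss)


def score_features(
    columns: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable], split: int
) -> Dict[str, Tuple[float, float, float]]:
    """Return {feature: (FRS, FSS, FGES)}, mirroring the columns of Table 1.

    Rows are assumed to be in time order: rows before `split` form the
    reference window and rows from `split` on form the later window.
    """
    table = {}
    for name, x in columns.items():
        frs = mutual_information(x, labels)
        fss = kl_distance(x[:split], x[split:])
        table[name] = (frs, fss, fges(frs, fss))
    return table
```

The fusion log(FRS) / log(FSS) is undefined or non-finite in three cases: when FRS is 0, when FSS is 0, or when FSS equals 1. In those cases this sketch raises an error, so a feature with no measurable drift is rejected here. This excerpt does not state how the original method handles such cases. It also does not state whether a larger or smaller FGES is preferred.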
[0085] Table 1
[0086] Features
Embodiment 3
[0088] Filter feature selection method for distribution drift data sets:
[0089] (1) Given a data set D, a feature candidate set F, and the number of features to be selected, N. In this embodiment, F = {A, B, C, D, E, F, G, H, I, J} and N = 4.
[0090] (2) Choose a method to calculate the feature correlation score FRS of each feature in the feature candidate set F. In this embodiment, the "mutual information between features and labels" method is used, and the specific values are listed in the corresponding column of Table 1.
[0091] (3) Choose a method to calculate the feature drift degree score FSS of each feature in the feature candidate set F. In this embodiment, the "feature KL distance" method is used, and the specific values are listed in the corresponding column of Table 1.
[0092] (4) Select a method to calculate the feature generalization ability effectiveness score FGES of each feature in the feature candidate set F; in this embodiment, t...
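Step (4) and the remaining steps are truncated in this excerpt. A filter method typically ranks candidates by a model-independent score and keeps the top N. The Python sketch below does that with FGES values (for example, those in Table 1) and N = 4 as in this embodiment. The sort direction is exposed as a hypothetical `higher_is_better` flag, because the excerpt does not say whether a larger or smaller FGES is preferred. `filter_select` is an illustrative name.

```python
from typing import List, Mapping, Tuple


def filter_select(
    fges: Mapping[str, float], n: int, higher_is_better: bool = True
) -> Tuple[List[str], List[str]]:
    """Rank features by FGES and keep the top n.

    Returns (selected_subset, full_ranked_list). The sort direction is a
    hypothetical parameter; ties keep the input order (stable sort).
    """
    if not 0 < n <= len(fges):
        raise ValueError("n must be between 1 and the number of candidates")
    ranked = sorted(fges, key=lambda name: fges[name], reverse=higher_is_better)
    return ranked[:n], ranked
```

The function returns both the top-N subset and the full ranking. This mirrors the two outputs named in Embodiment 1: the final feature subset and the ranked feature list.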