
Feature selection method and device based on conditional mutual information, equipment and storage medium

A feature selection method and device based on conditional mutual information, applied in digital data information retrieval, computer components, and character and pattern recognition, etc., achieving the effect of improving accuracy and efficiency

Pending Publication Date: 2021-12-07
CHINA ELECTRIC POWER RES INST +5
0 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to provide a feature selection method, device, equipment and storage medium based on conditional mutual information, to solve the technical problem of reducing computational complexity without reducing the accuracy of feature selection, so as to adapt to the diversity and high dimensionality of data in big data environments and improve the quality of the overall data set; the method of the present invention balances the importance of the selected feature set and the candidate feature set, and efficiently removes data redundancy.



Examples


Embodiments

[0049] As shown in Figure 1, the present invention provides a feature selection method based on conditional mutual information, comprising the following steps:

[0050] S1. Read the sample data and form the set of candidate features as the candidate feature set F; each sample has a corresponding classification, and the category feature of each sample is taken out separately to form the category attribute C; initialize the feature set S to an empty set; calculate the variance of each feature in the candidate feature set F, remove the features with zero variance from the candidate feature set F, and update the candidate feature set F. The sample data are data generated by a computational example: suppose there are n samples and each sample has a physical quantities; the n samples of a given physical quantity form a matrix with n rows and 1 column, which is the feature vector corresponding to that physical quantity, and the a physical quantities together constitute a large matrix of n rows and a colum...
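A minimal sketch of the zero-variance pre-filter described in step S1, assuming the n-by-a sample matrix is held in a NumPy array; the names X, feature_names and remove_zero_variance are illustrative assumptions, not identifiers from the patent.

    import numpy as np

    def remove_zero_variance(X, feature_names):
        """Drop candidate features whose sample variance is zero (step S1).

        X             : (n, a) matrix, one row per sample, one column per physical quantity
        feature_names : names of the a candidate features in F
        """
        variances = X.var(axis=0)        # variance of each candidate feature
        keep = variances > 0.0           # zero-variance features carry no information
        F = [name for name, flag in zip(feature_names, keep) if flag]
        return X[:, keep], F

    # Usage: X_filtered, F = remove_zero_variance(X, feature_names)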

Embodiment 1

[0063] As shown in Figure 1, the present invention provides a feature selection method based on conditional mutual information, comprising the following steps:

[0064] S1. Read the data and form the set of candidate features as the candidate feature set F; initialize the feature set S as an empty set; calculate the variance of each feature in the candidate feature set F, remove the features with zero variance from the candidate feature set F, and update the candidate feature set F.

[0065] S2. Calculate the mutual information between each candidate feature in the candidate feature set F updated in step S1 and the category attribute C, put the candidate feature (or an empirically determined key feature) with the largest mutual information into the feature set S as the initially selected feature, and delete the selected candidate feature from the candidate feature set F; update the feature set S and the candidate feature set F; set the weight coefficient α, whose value range...
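The excerpt names the quantities involved (mutual information with the category attribute C, conditional mutual information, a weight coefficient α balancing the selected set S against the candidate set F), but the exact scoring formula is truncated. The following is therefore only a plausible sketch of step S2 and the subsequent greedy loop under that assumption: the initial feature is the one with the largest I(f;C), and later features are scored by relevance I(f;C) minus an α-weighted redundancy derived from I(f;C|s) for already-selected s. All function and variable names are assumptions, and the features in Xd are assumed to be discretised (e.g. binned).

    import numpy as np
    from collections import Counter

    def entropy(*cols):
        """Empirical joint entropy (in nats) of one or more discrete columns."""
        joint = list(zip(*cols))
        counts = np.array(list(Counter(joint).values()), dtype=float)
        p = counts / counts.sum()
        return -np.sum(p * np.log(p))

    def mutual_info(x, y):
        return entropy(x) + entropy(y) - entropy(x, y)

    def cond_mutual_info(x, y, z):
        # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
        return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

    def select_features(Xd, C, k, alpha):
        """Greedily pick k feature indices from the discretised matrix Xd (columns = F)."""
        F = list(range(Xd.shape[1]))
        # initial feature: largest mutual information with the category attribute C
        S = [max(F, key=lambda f: mutual_info(Xd[:, f], C))]
        F.remove(S[0])
        while len(S) < k and F:
            def score(f):
                rel = mutual_info(Xd[:, f], C)
                # redundancy of f with the selected set, measured via conditional MI
                red = np.mean([rel - cond_mutual_info(Xd[:, f], C, Xd[:, s]) for s in S])
                return rel - alpha * red
            best = max(F, key=score)
            S.append(best)
            F.remove(best)
        return S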

Embodiment 2

[0077] The present invention takes the standardized IEEE 10-machine, 39-node test system of a certain area as an example to carry out verification experiments; simulation data of the 10-machine, 39-node power system are used as input samples. Active power P, reactive power Q, node voltage magnitude V and node voltage phase angle θ give 170 candidate features. The category attribute C contains two categories, stable and unstable. There are 3000 samples in total, including 1500 stable samples and 1500 unstable samples. During model training and prediction, the samples are randomly divided into 2000 training samples and 1000 prediction samples. In this experiment, the Relief algorithm and the mutual information selection algorithm were used as controls. All three algorithms selected 30 features to predict whether the system was stable, and the accuracies were compared. In order to reduce error, each set of experiments was randomly divided into training and test sets 10 ...
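A hedged sketch of the evaluation protocol described above: 3000 samples, a 2000/1000 train-test split, 30 selected features, and accuracy averaged over 10 random splits. The classifier (an SVM here) and the helper name evaluate are assumptions, since the excerpt does not say which classifier is used.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    def evaluate(X, y, selected, n_runs=10, test_size=1000):
        """Mean prediction accuracy of the selected feature columns over repeated random splits."""
        accs = []
        for seed in range(n_runs):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X[:, selected], y, test_size=test_size, random_state=seed, stratify=y)
            clf = SVC().fit(X_tr, y_tr)   # classifier choice is an assumption
            accs.append(accuracy_score(y_te, clf.predict(X_te)))
        return float(np.mean(accs))

    # With 3000 samples, test_size=1000 reproduces the 2000-train / 1000-test split; the same
    # routine can score the feature subsets chosen by Relief or plain mutual information.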



Abstract

The invention belongs to the technical field of data mining, and discloses a feature selection method and device based on conditional mutual information, equipment and a storage medium. The method comprises the steps of: obtaining a data set to form a candidate feature set F; calculating the mutual information between each candidate feature in the candidate feature set F and the category attribute C, and putting the selected feature into a feature set S; setting a threshold value and entering a loop until the threshold is met; training a model on the selected feature set S with a classifier, predicting the category with the trained model, and calculating the prediction accuracy; and changing the weight coefficient, repeatedly screening the feature set S, calculating the prediction accuracy, and selecting the feature set S with the highest accuracy as the final output feature set. According to the method, feature selection can be performed more efficiently and quickly, and the accuracy and efficiency of data mining are improved.
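The abstract's outer loop (vary the weight coefficient, re-run the screening, and keep the feature set with the highest prediction accuracy) could look roughly like the following; select_features and evaluate refer to the illustrative sketches above, Xd/X/C are assumed to be the discretised features, raw features and class labels, and the α grid is an assumption.

    import numpy as np

    # Assumes the sketches above are in scope: Xd, X, C, select_features(...), evaluate(...).
    best_S, best_acc = None, -1.0
    for alpha in np.linspace(0.0, 1.0, 11):   # candidate weight coefficients (assumed grid)
        S = select_features(Xd, C, k=30, alpha=alpha)
        acc = evaluate(X, C, S)
        if acc > best_acc:
            best_S, best_acc = S, acc
    # best_S is kept as the final output feature set with the highest prediction accuracy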

Description

Technical field
[0001] The invention belongs to the technical field of feature selection in data mining, and in particular relates to a feature selection method, device, equipment and storage medium based on conditional mutual information.
Background technique
[0002] Data mining refers to the process of searching for information hidden in a large amount of data through algorithms. A large amount of widely usable data is produced in production and daily life, and it is urgent to convert these data into useful information and knowledge. Data mining obtains the required information and knowledge by analyzing large amounts of data and looking for the laws within them. It has many applications in power systems, such as transient stability assessment, fault diagnosis, and load forecasting. The data mining process mainly consists of three stages: data preparation and preprocessing, data mining, and result expression and interpretation. Feature selection is a very important data ...


Application Information

IPC(8): G06F16/2458, G06F16/28, G06F16/2453, G06K9/62
CPC: G06F16/2465, G06F16/285, G06F16/2453, G06F18/2411, G06F18/24323, G06F18/214
Inventor 马晓忱孙博吕闫李理石上丘罗雅迪程文帅郑乐冷喜武常乃超吴迪章昊王吉文李端超叶海峰刘辉马金辉胡海琴陈伟李智李顺朱刚刚王维坤樊锐轶高志张秀丽刘志良刘国瑞杨旋余志国李英孙珂周明李杨月汪春燕
Owner CHINA ELECTRIC POWER RES INST