Feature selection method and device of high-dimension data

A feature selection technology for high-dimensional data, applied in the field of feature selection methods and devices for high-dimensional data. It addresses the low accuracy of feature selection and the difficulty of applying existing measures to high-dimensional data sets, so as to improve classification accuracy and overcome the problems of feature relevance and redundancy.

Status: Inactive. Publication date: 2016-09-28
HARBIN UNIV OF SCI & TECH
Citations: 0 cites, cited by 16

AI Technical Summary

Problems solved by technology

Although MIC is very effective at measuring the dependence between variables, it can only measure the relevance and redundancy between pairs of single variables. In high-dimensional data sets it is therefore difficult to apply it to the redundancy between a feature and a feature subset, which leads to low feature selection accuracy. To solve this problem, the invention proposes a new measure, mMIC (the effective value), and applies it to the Markov blanket condition.
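
The pairwise limitation can be illustrated with a small sketch. It is not MIC and not the invention's mMIC: it uses plug-in mutual information as a simple pairwise stand-in for MIC, and the function name `mutual_information` and the toy features `f1`, `f2`, `f3` are hypothetical. Feature `f3` is the XOR of `f1` and `f2`, so each pairwise score reports no dependence, yet `f3` is completely determined by the subset {f1, f2}. A redundancy check built only from pairwise scores would therefore treat `f3` as non-redundant.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(xs, ys):
    """Plug-in mutual information I(X;Y) in bits between two discrete sequences.

    Used here only as a pairwise stand-in for MIC (an assumption for illustration).
    """
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in Counter(zip(xs, ys)).items()
    )


# All four combinations of two binary features; f3 = f1 XOR f2 is fully
# determined by the subset {f1, f2}.
rows = list(product([0, 1], repeat=2))
f1 = [a for a, _ in rows]
f2 = [b for _, b in rows]
f3 = [a ^ b for a, b in rows]

print(mutual_information(f3, f1))                 # 0.0: pairwise, f3 looks unrelated to f1
print(mutual_information(f3, f2))                 # 0.0: and unrelated to f2
print(mutual_information(f3, list(zip(f1, f2))))  # 1.0: but {f1, f2} determines f3
```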



Examples


Embodiment Construction

[0046] To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0047] Figure 1 is a schematic flow chart of a feature selection method for high-dimensional data according to an embodiment of the present invention. Referring to Figure 1, the feature selection method for high-dimensional data includes:

[0048] 110. Obtain an original data set to be processed; the original data set includes a feature s...



Abstract

The invention discloses a feature selection method and device for high-dimensional data. The method comprises the following steps: obtaining an original data set to be processed, wherein the original data set comprises a feature set, a plurality of samples and a category set, and the category set contains the category of each sample; calculating the MIC (Maximal Information Coefficient) between each feature in the feature set and the category set, as well as the redundancy value between each feature and the already-selected feature subset; and obtaining the effective value of each feature from its MIC and redundancy value, then selecting the feature subset from the feature set according to the effective values. MIC is thus introduced into feature selection: each feature is evaluated on the basis of MIC, and features are selected according to the resulting effective values. Compared with the prior art, the method can effectively improve the accuracy of feature selection on high-dimensional data.
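
The selection loop in the abstract can be sketched as follows, under loud assumptions. The abstract does not give the formulas for the redundancy value or the effective value, so this sketch substitutes an mRMR-style rule (effective value = relevance to the category minus the mean pairwise dependence on already-selected features), selects a fixed number `k` of features, and replaces MIC with a normalized mutual information score on discrete features. All function names, the parameter `k`, and the toy data are hypothetical; this is not the patented mMIC/Markov-blanket procedure.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Plug-in Shannon entropy H(X) in bits."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Plug-in mutual information I(X;Y) in bits."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in Counter(zip(xs, ys)).items()
    )


def dependence(xs, ys):
    """Hypothetical stand-in for MIC: I(X;Y) / min(H(X), H(Y)), in [0, 1]."""
    h = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / h if h > 0 else 0.0


def select_features(features, labels, k):
    """Greedily pick k features by effective value = relevance - mean redundancy.

    features: dict mapping feature name -> list of discrete values, one per sample
    labels:   list holding the category of each sample
    """
    relevance = {f: dependence(v, labels) for f, v in features.items()}
    selected = []

    def effective_value(f):
        # Assumed redundancy value: mean pairwise dependence on the selected subset.
        redundancy = (
            sum(dependence(features[f], features[s]) for s in selected) / len(selected)
            if selected
            else 0.0
        )
        return relevance[f] - redundancy

    while len(selected) < min(k, len(features)):
        candidates = [f for f in features if f not in selected]
        selected.append(max(candidates, key=effective_value))
    return selected


y = [0, 0, 1, 1, 2, 2, 3, 3]
data = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],  # exact copy of "a"
    "c": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of y
    "d": [0, 0, 1, 1, 0, 0, 1, 1],
}
print(select_features(data, y, k=2))  # ['a', 'd']
```

The copy `b` of feature `a` is rejected because its redundancy cancels its relevance, while the informative and independent feature `d` is kept. Averaging pairwise scores is the simplest redundancy rule, but it is still a pairwise measure, which is the limitation the invention's mMIC is intended to address.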

Description

Technical field

[0001] The invention relates to the technical field of data mining, in particular to a feature selection method and device for high-dimensional data.

Background art

[0002] The rapidly developing information society generates massive amounts of data every day, and how to quickly discover useful information in these data has become an urgent problem. Researchers have approached this problem from the perspective of machine learning models and have made remarkable progress. However, high-complexity models and high-dimensional feature spaces find it increasingly difficult to meet the demands of big data applications, and feature spaces often contain a great deal of useless information. Only with an appropriate feature selection method can effective features be obtained from massive data, thereby improving the efficiency and accuracy with which machine learning models process data; at the same time, feature selection can also p...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/285, G06F16/2465, G06F16/283
Inventors: 孙广路 (Sun Guanglu), 宋智超 (Song Zhichao), 陈腾 (Chen Teng), 何勇军 (He Yongjun)
Owner: HARBIN UNIV OF SCI & TECH