Feature selection method for high-dimensional data

A feature selection method for high-dimensional data, applied in the fields of instruments, computing, and character and pattern recognition. It addresses three problems: the feature dimension cannot be well determined, the computational cost is high, and high accuracy is difficult to obtain. It achieves a stable feature selection process and a well-determined feature dimension.

Active Publication Date: 2019-05-24
XIAMEN UNIV
17 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

Filter methods achieve fast feature selection, but it is difficult for them to reach high accuracy. Wrapper methods can reach high accuracy, but their computational cost is high and they are not easy to generalize [12].
Embedded methods score the features using the classification algorithm and then perform feature selection, but they cannot determine the feature dimension well.



Examples


Embodiment Construction

[0036] The following embodiments will further illustrate the present invention in conjunction with the accompanying drawings.

[0037] 1) Stability scoring of the features:

[0038] Randomized Logistic Regression (RLR) is a stability selection technique that computes a stability score for each feature of the dataset by sampling multiple times. The present invention needs to perform the scoring calculation on the dataset only once to obtain the feature scores, and does not need to re-evaluate the features afterwards. A feature subset can then be searched for based on the differences in score between features. The feature scoring results are shown in Figure 2: the higher a feature's score, the more important the feature.
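The stability scoring idea can be illustrated with a minimal sketch. It is not the patent's implementation: it assumes that a feature's score is the fraction of random subsamples in which an L1-penalized logistic regression assigns it a nonzero weight, with each feature's penalty randomly enlarged on every run. The function names, the proximal-gradient solver, and all parameter values (`lam`, `weakness`, `sample_fraction`, and so on) are hypothetical choices made for illustration.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, penalty_scale, lam=0.08, lr=0.5, n_iter=200):
    """Fit logistic regression with per-feature L1 penalties lam * penalty_scale[j]
    by proximal gradient descent. Returns the weight vector (bias is not penalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam * penalty_scale[j])
             for j in range(d)]
    return w


def stability_scores(X, y, n_resamples=50, sample_fraction=0.75,
                     weakness=0.5, seed=0):
    """Score each feature by how often it survives L1 selection across
    random subsamples with randomly enlarged penalties (assumed scheme)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(2, int(sample_fraction * n))
    counts = [0] * d
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)  # subsample rows without replacement
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        # Each feature keeps its penalty or has it enlarged by 1 / weakness.
        scale = [1.0 if rng.random() < 0.5 else 1.0 / weakness for _ in range(d)]
        w = fit_l1_logistic(Xs, ys, scale)
        for j in range(d):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_resamples for c in counts]
```

Under these assumptions the scores lie in [0, 1], and a feature that is selected in nearly every resample is treated as stable. The scoring pass runs once, and later steps only read the resulting list.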

[0039] 2) Selection of feature subsets:

[0040] Different features receive different scores. Sort the features in descending order of i...



Abstract

The invention discloses a feature selection method for high-dimensional data that can screen features efficiently and stably. The method comprises the following steps: 1) stability scoring of the features; 2) selection of a feature subset; 3) evaluation of the feature subset; and 4) verification of the feature subset's effectiveness, in which the selected subset is tested with different classifiers to show that it generalizes across them and is representative, which in turn demonstrates the effectiveness of the feature selection method. The method combines the ideas of the wrapper and embedded approaches to obtain a better feature subset. Combined with a greedy strategy, the search step length can be customized, the feature dimension can be well determined, and the feature selection process can be terminated in time.
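The greedy search with a customizable step length can be sketched as follows. This is a sketch under stated assumptions, not the claimed procedure: it assumes that features are ranked by score in descending order, that the candidate subset grows by `step` features at a time, and that the search stops once the evaluation fails to improve for `patience` consecutive steps. The function name, the `evaluate` callback, and the stopping rule are hypothetical.

```python
def greedy_subset_search(scores, evaluate, step=1, patience=1):
    """Grow a feature subset in descending score order, `step` features at a
    time, and stop when `evaluate` stops improving (assumed stopping rule).

    scores   -- one importance score per feature
    evaluate -- callable taking a list of feature indices and returning a
                number where higher is better (e.g. validation accuracy)
    """
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    best_subset, best_value = [], float("-inf")
    stale = 0
    for end in range(step, len(ranked) + step, step):
        candidate = ranked[:end]  # slicing clips at the end of the list
        value = evaluate(candidate)
        if value > best_value:
            best_subset, best_value, stale = candidate, value, 0
        else:
            stale += 1
            if stale >= patience:
                break  # terminate the search early
    return best_subset, best_value


# Toy usage: features 0 and 2 are useful, and each extra feature costs 0.1.
def toy_evaluate(subset):
    useful = {0, 2}
    return len(useful.intersection(subset)) - 0.1 * len(subset)


subset, value = greedy_subset_search([0.9, 0.1, 0.8, 0.3, 0.05], toy_evaluate)
```

In practice `evaluate` would train and validate a classifier on the candidate columns. A larger `step` means fewer evaluations, at the cost of a coarser final feature dimension.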

Description

Technical field

[0001] The invention relates to a feature selection method, in particular to a feature selection method for high-dimensional data that can screen features efficiently and stably.

Background technique

[0002] Feature selection is very important for the classification of high-dimensional data. It selects some of the most important features from a feature set in order to reduce the dimensionality of the feature space [1]. The quality of the feature selection result directly affects the accuracy of the classification result. Feature selection methods are widely applied in bioinformatics [2-4], image processing [5-7], and text processing [8-10], among other fields. Feature selection generally consists of four steps: the feature subset search process, the feature subset evaluation method, the stopping criterion for the subset search, and the verification of the feature subset's validity [11]. Commonly used feature selection methods include filter, wrapper and embedde...


Application Information

IPC(8): G06K9/62
Inventor: 张仲楠, 郑辉辉
Owner: XIAMEN UNIV