Optimal Feature Subset Selection Method Based on Complementarity of Classification-Ability Structure Vectors

A technology concerning classification ability and optimal feature selection, applied in instruments, character and pattern recognition, computer components, etc.; it addresses problems such as features with small scores being overlooked.

Inactive Publication Date: 2018-07-24
TIANJIN NORMAL UNIVERSITY


Problems solved by technology

However, some work has shown that some features with small scores should also be selected, and some combinations of features with higher classification ability values do not always lead to good classification results.



Examples


Embodiment 1

[0049] 1. Read the classification problem dataset.

[0050] Usually a classification problem dataset is a two-dimensional matrix; for example, a classification problem dataset with a given number of features, categories and samples is laid out as shown in figure 1, where each entry gives the value of one feature on one sample and a separate row gives the category of each sample. Table 1 shows the expression values of some characteristic genes of some samples in the breast cancer data set, where the second row is the sample category, the third row is the expression value of the first feature on each sample (and so on for the other rows), and one column represents one sample, i.e. the expression value of each feature and the category for one person. Read all the feature values of each sample in the dataset into a two-dimensional array, and read the category of each sample into a one-dimensional array.

[0051] Table 1 Expression values of some characteristic genes of some samples in the breast cancer data set.
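The reading step above can be sketched as follows. This is a minimal illustration only, assuming a comma-separated layout like Table 1 (a header row, then a category row, then one row per feature with one column per sample); the file format and field names are assumptions, not part of the invention.

```python
# Minimal sketch of step 1 (reading a classification dataset), assuming a
# Table-1-style layout: row 1 is a header, row 2 holds each sample's
# category, and every later row holds one feature's values across samples
# (one column per sample). The layout is an assumption for illustration.

def read_dataset(lines):
    """Parse comma-separated rows into (X, y).

    X[i][j] is the value of feature j on sample i; y[i] is sample i's class.
    """
    rows = [line.strip().split(",") for line in lines if line.strip()]
    y = rows[1][1:]                      # second row: category of each sample
    features = [list(map(float, r[1:])) for r in rows[2:]]  # feature rows
    # transpose so that one row of X corresponds to one sample
    X = [[features[f][s] for f in range(len(features))]
         for s in range(len(y))]
    return X, y

raw = [
    "id,s1,s2,s3",
    "class,A,A,B",
    "gene1,0.5,0.6,1.2",
    "gene2,2.0,1.9,0.3",
]
X, y = read_dataset(raw)
# X → [[0.5, 2.0], [0.6, 1.9], [1.2, 0.3]], y → ['A', 'A', 'B']
```

The transpose puts samples in rows, which matches the convention of reading each sample's feature values into a two-dimensional array and the categories into a one-dimensional array.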

Embodiment 2

[0096] Experimental results and data of the present invention:

[0097] The experimental data set of the present invention, breast cancer (breast), was downloaded from http://www.ccbm.jhu.edu/ in 2007 (see references). The breast dataset contains 5 categories, 9216 features and 54 samples. Traditional objective evaluation indices are used to test the performance of the algorithm, mainly the number of selected features and the accuracy of classification prediction. The number of selected features is the number of features chosen by the feature selection method, and the accuracy of classification prediction is the accuracy obtained when the selected feature subset is used as input to a classifier. To verify the effectiveness of the method proposed in the present invention, it is compared with existing attribute selection methods such as FCBF, CFS, mRMR, and Relief. Because the mRMR and Relief methods only evaluate features and give sorting ...
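The two evaluation indices described above can be illustrated with a toy example. The nearest-centroid classifier, the leave-one-out protocol, and the tiny dataset below are stand-ins chosen for brevity, not the classifier or data used in the experiments.

```python
# Toy illustration of the two evaluation indices: the number of selected
# features (len(subset)) and the classification accuracy obtained when only
# the selected features are fed to a classifier. The nearest-centroid
# classifier and the data here are illustrative stand-ins.

def accuracy_with_subset(X, y, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier on `subset`."""
    correct = 0
    for i in range(len(X)):
        # centroids computed without sample i (leave-one-out)
        centroids = {}
        for c in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == c]
            centroids[c] = [sum(r[f] for r in rows) / len(rows)
                            for f in subset]
        probe = [X[i][f] for f in subset]
        pred = min(centroids,
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(probe, centroids[c])))
        correct += pred == y[i]
    return correct / len(X)

X = [[0.1, 5.0], [0.2, 4.8], [0.9, 5.1], [1.0, 4.9]]
y = ["A", "A", "B", "B"]
subset = [0]           # feature 0 separates the classes; feature 1 does not
print(len(subset), accuracy_with_subset(X, y, subset))  # 1 1.0
```

A feature selection method is considered better when it achieves higher accuracy with fewer selected features, which is exactly the trade-off the comparison in the text measures.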



Abstract

The present invention proposes a new optimal feature subset selection method based on the complementarity of classification-ability structure vectors, addressing the fact that most existing methods evaluate the classification ability of a feature or feature subset with a single value. The method defines, in binary form, the feature classification-ability structure vector and the complementarity of such vectors, and uses a dichotomy method to compute the threshold of a feature's classification and discrimination ability on each sub-category problem. On this basis, it selects the optimal feature subset with a greedy strategy, following the principle of maximizing the structural complementarity among the selected features. The method fully considers each feature's different classification ability on different categories and, by following the complementarity-maximization principle during selection, conforms to the natural law of complementary advantages while maximizing the classification information carried by the features. It thereby obtains a better feature subset, effectively reduces redundant features, and improves the accuracy of classification prediction.
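The greedy, complementarity-maximizing principle in the abstract can be sketched as follows. This is a simplified illustration under the assumption that each feature is already reduced to a binary structure vector (bit k is 1 when the feature's discriminating ability on sub-category problem k exceeds the threshold); the vectors below are made up, whereas the actual method derives them from the data via the dichotomy-based threshold.

```python
# Simplified sketch of greedy selection by complementarity of binary
# classification-ability structure vectors. Bit k of a vector is 1 when the
# feature discriminates sub-category problem k above the threshold. The
# vectors here are hypothetical; the real method computes them from data.

def greedy_complementary_subset(vectors):
    """Greedily add the feature that covers the most uncovered bits."""
    n_bits = len(next(iter(vectors.values())))
    covered = [0] * n_bits
    chosen = []
    remaining = dict(vectors)
    while any(b == 0 for b in covered) and remaining:
        # gain = number of still-uncovered sub-problems the feature handles
        name, vec = max(remaining.items(),
                        key=lambda kv: sum(v and not c
                                           for v, c in zip(kv[1], covered)))
        if sum(v and not c for v, c in zip(vec, covered)) == 0:
            break                      # no remaining feature adds information
        chosen.append(name)
        covered = [c or v for c, v in zip(covered, vec)]
        del remaining[name]
    return chosen, covered

vectors = {                            # hypothetical 4 sub-category problems
    "f1": (1, 1, 0, 0),
    "f2": (1, 0, 0, 0),                # redundant with f1
    "f3": (0, 0, 1, 0),
    "f4": (0, 0, 1, 1),
}
subset, covered = greedy_complementary_subset(vectors)
# greedy picks f1 then f4; f2 and f3 add no new sub-problems
```

Note how f2, despite a nonzero ability score, is never selected: everything it discriminates is already covered, which is the redundancy-reduction effect the abstract describes.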

Description

Technical field [0001] The invention belongs to the technical field of machine learning and pattern recognition, and specifically proposes a reasonable and effective feature subset selection method. Background technique [0002] Feature selection is one of the two main approaches to dimensionality reduction. It plays a vital role in machine learning and pattern recognition, is one of the basic issues studied in these fields, and is a key data-preprocessing step in constructing classifiers. Feature selection uses an evaluation criterion to choose, from the original feature set, a subset of features meaningful for classification, removing irrelevant or redundant features and thereby reducing the original space to m dimensions, far smaller than the original dimension. With the rapid development of the Internet and high-throughput technology, we have entered the era of big data; the data are huge and complex, which also makes the rese...
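The generic filter-style scheme described in the background (score each feature by an evaluation criterion, keep the top m) can be sketched as below. The variance criterion is only a common, simple stand-in for illustration; the invention replaces such single-value scores with classification-ability structure vectors.

```python
# Generic filter-style feature selection: score each feature with an
# evaluation criterion and keep the m highest-scoring ones. Variance is a
# simple stand-in criterion; the invention uses classification-ability
# structure vectors instead of a single score.

def select_top_m(X, m, score):
    """Return indices of the m features with the highest score(values)."""
    n_features = len(X[0])
    cols = [[row[f] for row in X] for f in range(n_features)]
    ranked = sorted(range(n_features), key=lambda f: score(cols[f]),
                    reverse=True)
    return sorted(ranked[:m])

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

X = [[1.0, 0.0, 3.0],
     [1.0, 0.1, 1.0],
     [1.0, 0.2, 5.0]]
kept = select_top_m(X, m=2, score=variance)
# feature 0 is constant (variance 0) and is dropped; kept == [1, 2]
```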

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06K9/62
CPC: G06F18/2113
Inventor 王淑琴
Owner TIANJIN NORMAL UNIVERSITY