Filter characteristic selection method based on subclass problem classification ability measurement

A feature selection technique based on classification ability for subclass problems, applied in medical data mining and special data processing applications, that addresses problems such as useful features with small scores being overlooked.

Inactive Publication Date: 2016-10-12
TIANJIN NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

However, some work has shown that features with small scores should sometimes also be selected, and that combining features with higher classification ability values does not always lead to good classification results.



Examples


Embodiment 1

[0050] (1) Read the classification problem dataset:

[0051] A classification problem dataset is usually a two-dimensional matrix, such as the sample dataset shown in Figure 1, where c_i indicates the category of the i-th sample. Table 1 shows the expression values of some characteristic genes for some samples in the Breast Cancer dataset. The second row gives the sample category, the third row gives the expression value of the first feature on each sample, and so on for the remaining rows. Each column represents one sample, that is, the category and the expression value of each characteristic gene for one person. Read all the feature values of each sample in the dataset into a two-dimensional array.

[0053] Read the category of each sample into a one-dimensional array C = (c_1, c_2, ..., c_n), where n is the number of samples.
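
Step (1) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the tab-separated layout, the presence of a first row of sample identifiers, the row-label column, and the names `read_dataset`, `X` and `names` are all assumptions modelled on the description of Table 1. The choice to store the array as samples by features is likewise an assumption.

```python
import csv
import io


def read_dataset(text, delimiter="\t"):
    """Parse a feature-by-sample table into (X, C, feature_names).

    Assumed (hypothetical) layout, modelled on the description of Table 1:
      row 1   : sample identifiers
      row 2   : category of each sample
      row 3.. : one feature per row, one value per sample
    The first column of every row is treated as a row label.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    category_row, feature_rows = rows[1], rows[2:]
    C = category_row[1:]                      # one-dimensional category array
    feature_names = [r[0] for r in feature_rows]
    # Transpose so that X[i][j] is the value of feature j on sample i.
    X = [[float(r[i + 1]) for r in feature_rows] for i in range(len(C))]
    return X, C, feature_names


# Tiny made-up table: 3 samples, 2 features.
sample = (
    "id\ts1\ts2\ts3\n"
    "class\tA\tB\tA\n"
    "g1\t0.5\t1.5\t0.7\n"
    "g2\t2.0\t0.1\t1.9\n"
)
X, C, names = read_dataset(sample)
```

Here `X` holds one row per sample and `C` holds the matching categories, so `X[i]` and `C[i]` always describe the same sample.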

[0054] (2) Calculate the classification discrimination ability value of each feature for each subcatego...

Embodiment 2

[0090] Experimental results and data of the present invention:

[0091] The experimental datasets of the present invention, breast cancer (Breast), DLBCL and Leukemia3, were downloaded from http://www.ccbm.jhu.edu/ in 2007 (see references). Table 2 shows the number of categories, features and samples in each dataset. Traditional objective evaluation indices are used to test the performance of the algorithm, mainly the number of selected features and the classification prediction accuracy. The number of selected features is the number of features chosen by the feature selection method, and the classification prediction accuracy is the accuracy obtained when the selected feature subset is used as input to the classifier. To verify the effectiveness of the proposed method, the method of the present invention (RRSPFS for short) is compared with existing Filter attribute selection methods...
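
The two evaluation indices can be illustrated with a short Python sketch. The text above does not say which classifier or validation protocol was used, so the nearest-centroid classifier and leave-one-out validation below are assumptions chosen only to keep the example self-contained. The function names and the toy data are hypothetical.

```python
import math


def nearest_centroid_predict(train_X, train_C, x):
    """Predict the category whose mean feature vector is closest to x."""
    sums, counts = {}, {}
    for row, c in zip(train_X, train_C):
        acc = sums.setdefault(c, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[c] = counts.get(c, 0) + 1
    best, best_d = None, math.inf
    for c in sorted(sums):
        centroid = [s / counts[c] for s in sums[c]]
        d = math.dist(centroid, x)
        if d < best_d:
            best, best_d = c, d
    return best


def loo_accuracy(X, C, selected):
    """Leave-one-out accuracy using only the selected feature columns."""
    Xs = [[row[j] for j in selected] for row in X]
    correct = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_C = C[:i] + C[i + 1:]
        if nearest_centroid_predict(train_X, train_C, Xs[i]) == C[i]:
            correct += 1
    return correct / len(Xs)


# Made-up data: feature 0 separates the categories, feature 1 does not.
X = [[1.0, 5.0], [1.2, 0.0], [0.9, 4.0], [3.0, 1.0], [3.1, 5.5], [2.9, 0.5]]
C = ["A", "A", "A", "B", "B", "B"]
selected = [0]
n_selected = len(selected)            # index 1: number of selected features
accuracy = loo_accuracy(X, C, selected)  # index 2: prediction accuracy
```

A smaller `n_selected` at the same or higher `accuracy` indicates a better feature subset under these two indices.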



Abstract

The present invention discloses a Filter feature selection method based on subclass problem classification ability measurement. The main difference from most existing methods is that a single value is not used as the standard for evaluating a feature's classification ability. Instead, the feature's classification ability on each subclass problem and the weighted average of those abilities are both used for measurement, with particular weight given to the ability on the individual subclass problems. The method ensures that features with strong overall classification ability are selected, and also that features with strong classification ability on some subclass problem but weaker overall ability are selected. This yields a more accurate ranking of feature classification abilities and a better feature subset, effectively reduces redundant features, and increases classification prediction accuracy. The method can be used for classification prediction on cancer datasets, where it improves prediction accuracy and can help find cancer markers, thereby promoting early diagnosis of cancers and the development of targeted drugs for treating them.
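
The idea in the abstract can be sketched in Python under stated assumptions. The listing does not give the actual ability measure, the definition of a subclass problem, the weights, or the selection rule, so everything below is hypothetical: subclass problems are taken to be all pairs of categories, the ability measure is a signal-to-noise style score, the weights are proportional to the number of samples in each pair, and a feature is kept if it is in the top `k` overall or in the top `k` for any subclass problem.

```python
from itertools import combinations
from statistics import mean, pstdev


def pair_score(values_a, values_b, eps=1e-12):
    """Assumed ability measure: |mean difference| / (sum of std deviations)."""
    spread = pstdev(values_a) + pstdev(values_b) + eps
    return abs(mean(values_a) - mean(values_b)) / spread


def subproblem_scores(X, C):
    """Return (pairs, per-feature per-pair scores, weighted-average scores)."""
    classes = sorted(set(C))
    pairs = list(combinations(classes, 2))   # assumed: one subproblem per pair
    scores = []
    for j in range(len(X[0])):
        by_class = {c: [row[j] for row, ci in zip(X, C) if ci == c]
                    for c in classes}
        scores.append([pair_score(by_class[a], by_class[b]) for a, b in pairs])
    # Assumed weights: share of samples belonging to each pair of categories.
    raw = [C.count(a) + C.count(b) for a, b in pairs]
    weights = [w / sum(raw) for w in raw]
    overall = [sum(w * s for w, s in zip(weights, row)) for row in scores]
    return pairs, scores, overall


def select_features(scores, overall, k=1):
    """Keep the top-k features overall plus the top-k for every subproblem."""
    chosen = set(sorted(range(len(overall)), key=lambda j: -overall[j])[:k])
    for p in range(len(scores[0])):
        ranked = sorted(range(len(scores)), key=lambda j: -scores[j][p])
        chosen.update(ranked[:k])
    return sorted(chosen)


# Made-up data: feature 0 separates A from B and C, feature 1 separates B
# from C, feature 2 is noise.
X = [[0.0, 1.0, 0.5], [0.2, 1.1, 0.4],
     [5.0, 0.0, 0.6], [5.2, 0.1, 0.5],
     [5.1, 2.0, 0.4], [5.1, 2.1, 0.6]]
C = ["A", "A", "B", "B", "C", "C"]
pairs, scores, overall = subproblem_scores(X, C)
print(select_features(scores, overall, k=1))  # [0, 1]
```

In this toy case feature 1 has a lower weighted-average score than feature 0, yet it is the only feature that separates B from C, so a ranking by the single overall value with `k=1` would miss it while the per-subproblem check keeps it.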

Description

[0001] This invention was supported by Project Nos. 15JCYBJC46600 and 52XB1002.

Technical field

[0002] The invention belongs to the technical field of machine learning and pattern recognition and relates to a reasonable and effective feature subset selection method. More specifically, it is a filter feature selection method based on subclass problem classification ability measurement.

Background technique

[0003] Feature selection is one of the two main approaches to dimensionality reduction. It plays a vital role in machine learning and pattern recognition, is one of the basic problems studied in those fields, and is a key data preprocessing step in constructing classifiers. Feature selection uses some evaluation criterion to select, from the original feature set, a subset of features that are meaningful for classification, removing irrelevant or redundant features and thereby reducing the dimension of the original space to m dimensions that are much smaller than the...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/00
CPC: G16H50/70
Inventors: 王淑琴, 梁颖
Owner: TIANJIN NORMAL UNIVERSITY