Feature gene selection method based on logistic and relevant information entropy

A feature gene selection technique, applied in the field of data processing, that addresses problems such as poor generalization performance and a high risk of overfitting, and that reduces workload, improves data quality, and avoids information loss.

Status: Inactive
Publication Date: 2015-05-06
HENAN NORMAL UNIV
Cites: 0; Cited by: 11

AI Technical Summary

Problems solved by technology

[0004] The current method solves the negative impact of redundancy to a certain extent, but directly uses learning algorithms to evaluate gen


Examples

[0114] The breast cancer data set (Breast) and the gastric cancer data set (Gastric) from the UCI database are used as experimental data. The LIBSVM classifier, with an RBF kernel, is used to select parameters and features simultaneously in order to find the optimal parameter values. Because gene expression data sets contain few samples, the penalty factor can be set relatively high: a larger value places more weight on classifying each individual sample correctly. The penalty factor is therefore set to c = 100, and all other parameters are left at their defaults. Table 1 describes the experimental data, and Table 2 compares the classification performance of the three algorithms.
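
For reference, the role of the penalty factor can be read off the standard soft-margin SVM formulation (the C-SVC form that LIBSVM implements). The block below is a general textbook statement, not a formula taken from the patent text; the symbols w, b, ξ, φ and γ are the usual ones and do not appear in the original. With c = 100, each slack term ξ_i is penalized heavily, so the optimizer tolerates few training errors.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\, w^{\top} w \;+\; C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \dots, n, \\[4pt]
\text{RBF kernel:}\quad & K(x_i, x_j) = \phi(x_i)^{\top} \phi(x_j) = \exp\!\left( -\gamma \, \lVert x_i - x_j \rVert^{2} \right), \qquad \gamma > 0 .
\end{aligned}
```

Here C corresponds to the penalty factor c in the text. The value of γ is not stated in the source; it is assumed to be one of the "default" parameters.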

[0115] Table 1 Description of experimental data set

[0116]

[0117]

[0118] Table 2 Classification performance comparison of three algorithms

[0119]

Abstract

The invention discloses a novel feature gene selection method based on logistic regression and relevant information entropy. The method comprises the following steps: logistic regression is applied to the data set to obtain the gene variables that strongly influence the classification; the Relief algorithm assigns a weight to each of these gene variables and the genes are ranked by weight; the gene with the largest weight is added to an initial feature gene set; and the relevant information entropy is calculated. The method has two advantages. First, it introduces a logistic regression model from machine learning into feature gene selection, which yields a high-quality gene expression profile. Second, it measures the correlation between gene variables with the relevant information entropy and deletes redundant genes, so that a search over the feature gene space yields a feature gene subset with high classification capability and fewer genes.
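
The Relief weighting and ranking step named in the abstract can be sketched as follows. This is a minimal illustration of the classic two-class Relief algorithm, not the patent's own implementation: the extract does not show the exact variant used, and the function names `relief_weights` and `rank_features`, the range-based scaling, and the toy data are all assumptions made for the example.

```python
import random


def relief_weights(X, y, n_iter=None, seed=0):
    """Classic two-class Relief feature weights (illustrative sketch).

    X is a list of numeric feature vectors, y the class labels.
    If n_iter is None, every sample is visited once; otherwise
    n_iter samples are drawn at random with the given seed.
    """
    n, d = len(X), len(X[0])
    # Scale each feature difference by the feature's range so it lies in [0, 1].
    ranges = []
    for j in range(d):
        col = [row[j] for row in X]
        ranges.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / ranges[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    rng = random.Random(seed)
    if n_iter is None:
        picks = list(range(n))
    else:
        picks = [rng.randrange(n) for _ in range(n_iter)]
    m = len(picks)

    w = [0.0] * d
    for i in picks:
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(X[i], X[k]))     # nearest same-class sample
        miss = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest other-class sample
        for j in range(d):
            # Reward features that differ on the miss, penalize those that differ on the hit.
            w[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / m
    return w


def rank_features(weights):
    """Return feature indices sorted from highest to lowest weight."""
    return sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [[0.10, 0.5], [0.20, 0.9], [0.15, 0.1],
     [0.90, 0.4], [0.80, 0.8], [0.95, 0.2]]
y = [0, 0, 0, 1, 1, 1]

weights = relief_weights(X, y)
ranking = rank_features(weights)
print(ranking)
```

In the method described above, the top-ranked gene would seed the initial feature gene set. The subsequent redundancy check based on relevant information entropy is not sketched here, because its definition is not given in this extract.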

Description

Technical field

[0001] The invention relates to the technical field of data processing, and in particular to a feature gene selection method based on logistic regression and relevant information entropy.

Background

[0002] With the development of large-scale gene expression profiling technology, the analysis and modeling of gene expression data has become an important topic in bioinformatics research. Gene expression data are high-dimensional with few samples, which seriously affects the learning of classifiers. It is therefore necessary to use an optimization algorithm to select, from all attributes of the gene expression profile data, the subset of feature genes that best discriminates the disease. The selected gene subset plays an important role in cancer recognition. Due to the characteristics of "high-dimensional small samples", many classifiers in commonly used data mining have a high classification accuracy...

Application Information

IPC(8): G06F19/22
Inventors: 徐久成, 李涛, 孙林, 孟慧丽, 马媛媛, 张倩倩, 徐天贺, 胡玉文, 李晓艳, 冯森
Owner: HENAN NORMAL UNIV