
Gene expression profile feature selection method based on MFA score and redundancy exclusion

A feature selection method for gene expression profiles, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the high feature redundancy of tumor gene expression profiles, which cannot otherwise be excluded and which affects classification results, and it improves classification accuracy while reducing the feature dimension.

Inactive Publication Date: 2014-12-10
BEIJING UNIV OF TECH
3 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

The feature redundancy of tumor gene expression profiles is high, and the existing method cannot eliminate this redundancy, which degrades classification performance to some extent.



Examples


Embodiment

[0031] The lung cancer data set (Lung Cancer) from the website http://www.gems-system.org is used here; its characteristics are listed in the following table:

[0032] Table 1. Lung Cancer data set. Number of genes: 12600

[0033]

[0034]

[0035] The data are randomly divided into two halves. One half is the training set, used for feature selection; the other half is the test set, on which a support vector machine is evaluated to obtain the classification accuracy. If a class has an odd number of samples, the training set receives one more sample than the test set; for example, the Normal class contributes 9 samples to the training set and 8 to the test set. As a result, the training set has 103 samples and the test set has 100 samples.
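The per-class split described in [0035] can be sketched as follows. This is a minimal illustration, not the patent's code: the function name `split_half_per_class`, the seed, and the class label `OtherClass` with its 6 samples are hypothetical, and only the Normal class size (17 samples, split 9/8) comes from the text.

```python
import random
from collections import defaultdict

def split_half_per_class(labels, seed=0):
    """Randomly split sample indices in half within each class.

    When a class has an odd number of samples, the extra sample
    goes to the training set, as described in [0035].
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_train = (len(idx) + 1) // 2  # round up for odd class sizes
        train += idx[:n_train]
        test += idx[n_train:]
    return sorted(train), sorted(test)

# 17 Normal samples (from the text) plus a hypothetical second class.
labels = ["Normal"] * 17 + ["OtherClass"] * 6
train_idx, test_idx = split_half_per_class(labels)
```

Splitting within each class rather than over the whole data set keeps the class proportions of the training and test sets close to each other, which matters when some classes have very few samples.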

[0036] 1. Feature selection:

[0037] 1) Construct the intra-class neighbor matrix Ww and the between-class neighbor matrix Wb.

[0038] The set of 103 samples of the Lung Cancer training set can b...
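Step 1) can be sketched as below. The patent's exact construction is truncated in this extract, so this is an assumption: it uses a common nearest-neighbor variant in which Ww links each sample to its `k_w` nearest same-class samples and Wb links each sample to its `k_b` nearest samples from other classes. The function name, the Euclidean distance, and both neighborhood sizes are hypothetical choices.

```python
import numpy as np

def neighbor_matrices(X, y, k_w=2, k_b=2):
    """Build symmetric 0/1 neighbor matrices (assumed kNN variant).

    X: (n_samples, n_genes) expression matrix; y: (n_samples,) labels.
    Returns (W_w, W_b): intra-class and between-class neighbor matrices.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    W_w = np.zeros((n, n))
    W_b = np.zeros((n, n))
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        diff = np.flatnonzero(y != y[i])
        nn_same = same[np.argsort(D[i, same])[:k_w]]
        nn_diff = diff[np.argsort(D[i, diff])[:k_b]]
        # Symmetrize: i and j are linked if either is a neighbor of the other.
        W_w[i, nn_same] = W_w[nn_same, i] = 1.0
        W_b[i, nn_diff] = W_b[nn_diff, i] = 1.0
    return W_w, W_b

# Toy data: two classes of two samples each, two genes.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_toy = np.array([0, 0, 1, 1])
W_w, W_b = neighbor_matrices(X_toy, y_toy, k_w=1, k_b=1)
```

Because the matrices depend only on local neighborhoods, this kind of construction makes no assumption about the overall shape of each class distribution.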


Abstract

The invention relates to a gene expression profile feature selection method based on MFA score and redundancy exclusion. Feature selection and classification of tumor gene expression profiles facilitate the early diagnosis of tumors and help explain the causes of tumors from the perspective of gene expression. First, the MFA score algorithm constructs a between-class neighbor matrix Wb and an intra-class neighbor matrix Ww, from which a between-class Laplacian matrix Lb and an intra-class Laplacian matrix Lw are obtained, and the genes are then ranked. Because gene expression data are highly redundant, the correlation among genes is measured with Pearson correlation coefficients; highly correlated genes, namely redundant genes, are excluded, and a final gene subset is obtained. The method is suitable for training samples with any spatial distribution, further reduces the feature dimension by excluding redundant genes, has low algorithmic complexity, and achieved high classification accuracy in experiments.
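The ranking and redundancy-exclusion steps summarized in the abstract can be sketched as follows. The scoring formula is not given in this extract, so the ratio used here (intra-class scatter over between-class scatter, smaller is better) is an assumed form, and the function names, the correlation threshold of 0.9, and the toy data are all hypothetical.

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian L = D - W, with D the diagonal degree matrix."""
    return np.diag(W.sum(axis=1)) - W

def mfa_scores(X, W_w, W_b, eps=1e-12):
    """Assumed per-gene score: (f^T Lw f) / (f^T Lb f); smaller is better."""
    L_w, L_b = laplacian(W_w), laplacian(W_b)
    within = np.sum(X * (L_w @ X), axis=0)   # f_r^T Lw f_r for each gene r
    between = np.sum(X * (L_b @ X), axis=0)  # f_r^T Lb f_r for each gene r
    return within / (between + eps)

def exclude_redundant(X, ranked, threshold=0.9):
    """Walk the ranking and keep a gene only if its absolute Pearson
    correlation with every already-kept gene is below the threshold.
    Constant genes give NaN correlations and are dropped by this test."""
    kept = []
    for g in ranked:
        if all(abs(np.corrcoef(X[:, g], X[:, k])[0, 1]) < threshold
               for k in kept):
            kept.append(int(g))
    return kept

# Toy data: 4 samples (classes 0, 0, 1, 1) and 3 genes.
# Gene 1 is a scaled copy of gene 0; gene 2 ignores the class label.
X_toy = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 2.0, 0.0],
                  [1.0, 2.0, 1.0]])
W_w_toy = np.zeros((4, 4))
W_w_toy[0, 1] = W_w_toy[1, 0] = W_w_toy[2, 3] = W_w_toy[3, 2] = 1.0
W_b_toy = np.zeros((4, 4))
W_b_toy[0, 2] = W_b_toy[2, 0] = W_b_toy[1, 3] = W_b_toy[3, 1] = 1.0

scores = mfa_scores(X_toy, W_w_toy, W_b_toy)
ranked = np.argsort(scores, kind="stable")
selected = exclude_redundant(X_toy, ranked)
```

In this toy case genes 0 and 1 rank ahead of gene 2, and gene 1 is then excluded because it is perfectly correlated with gene 0, leaving genes 0 and 2 in the subset.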

Description

Technical field

[0001] The invention relates to the technical field of tumor gene data processing in bioinformatics, and in particular to a feature selection method for tumor gene expression profiles.

Background

[0002] With the continuous development of bioinformatics, a large amount of gene expression data has been obtained, especially gene expression data of tumors. Using machine learning to analyze these data and identify genes that serve as classification features helps the early diagnosis of tumors, and has been a hot spot in bioinformatics research in recent years. Because tumor gene expression data generally have thousands or even tens of thousands of dimensions, this dimensionality reduces the efficiency of machine learning algorithms and can even degrade the learning result. This is the so-called "curse of dimensionality". Selecting the genes with more classification information among the genes not only improves the learning efficiency and learning accuracy, but also has important...


Application Information

IPC(8): G06F19/24
Inventor: 李建更, 苏磊, 逄泽楠, 李晓丹, 张卫
Owner: BEIJING UNIV OF TECH