Redundancy removal feature selection method LLRFC score+ based on LLRFC and correlation analysis

A feature selection method using correlation analysis, applied in the field of tumor classification research in bioinformatics. It addresses the poor interpretability and unclear biological significance of the extracted features and the failure to consider relationships between feature genes, as well as high computational time complexity and high experimental cost.

Status: Inactive
Publication Date: 2016-07-06
BEIJING UNIV OF TECH
3 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

The features extracted by LLRFC have no clear biological meaning and are hard to interpret.
In addition, because gene expression data are complex and the LLRFC algorithm does not consider the interrelationships between feature genes, the selected feature genes still contain redundancy.

Examples

Embodiment

[0040] The 11Tumors data set (eleven different tumor types) from the website http://www.gems-system.org is used for classification verification. The classification accuracy of feature selection methods such as LLRFC score+, LLRFC score, Laplacian score, Fisher score, and t-test is compared on this data set, whose characteristics are listed in the following table:

[0041] Table 1: 11Tumors

[0042] Number of genes: 12533

[0043]

[0044]

[0045] To keep the tumor sample distribution balanced, the data are randomly divided into two equal parts within each category: one half forms the training set, used for feature selection, and the other half forms the test set, used to measure classification accuracy. Because SVM is not sensitive to data dimensionality, it has a clear advantage on small-sample, high-dimensional problems. For the gene expression profile data, the classifier is LIBSVM with a linear kernel function and default parameters....
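The per-category half split described in [0045] can be sketched as follows. This is a minimal illustration, not the patent's code: the function name `stratified_half_split`, the fixed seed, the toy labels, and the choice to put the extra sample of an odd-sized class into the test set are all assumptions. The classification step (LIBSVM with a linear kernel) is not shown.

```python
import random
from collections import defaultdict


def stratified_half_split(labels, seed=0):
    """Split sample indices in half within each class (hypothetical helper).

    Returns (train_idx, test_idx). For a class with an odd number of
    samples, the extra sample goes to the test set (an assumption; the
    source does not say how odd-sized classes are handled).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)            # random assignment within the class
        half = len(idx) // 2
        train_idx.extend(idx[:half])
        test_idx.extend(idx[half:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels standing in for tumor categories (made-up data).
labels = ["A"] * 6 + ["B"] * 4 + ["C"] * 5
train_idx, test_idx = stratified_half_split(labels, seed=42)
```

Feature selection would then be run on the samples in `train_idx` only, and the accuracy of the classifier measured on the samples in `test_idx`.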

Abstract

The invention provides LLRFC score+, a redundancy-removing feature selection method based on LLRFC (Locally Linear Representation Fisher Criterion) and correlation analysis. DNA (deoxyribonucleic acid) microarray technology offers a new direction for clinical tumor diagnosis. Different kinds of tumors show different gene expression patterns; by analyzing tumor gene expression data, researchers can accurately identify tumors and tumor subtypes at the molecular level, which is of great biological significance for tumor diagnosis and treatment. The feature genes in the gene expression data are sorted in descending order by the LLRFC criterion, and this ranking is combined with a dynamic correlation analysis strategy to further eliminate redundant features; the resulting LLRFC score+ algorithm selects the optimal feature gene subset. LLRFC score+ can effectively improve the classification accuracy of a classifier, does not require the sample data set to follow a normal distribution, and is applicable to data of various distribution types. The method can help identify cancer-causing genes and facilitates early diagnosis, tumor staging and typing, prognosis, and treatment of clinical tumor diseases.
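The rank-then-prune idea in the abstract can be illustrated with a short sketch. Several things here are assumptions, because the abstract does not give the details: the per-gene scores are supplied by the caller (the LLRFC criterion itself is not implemented), redundancy is measured by absolute Pearson correlation, and a fixed `threshold` stands in for the "dynamic correlation analysis strategy". The names `pearson` and `select_non_redundant` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_non_redundant(features, scores, threshold=0.9, k=None):
    """Greedy redundancy removal over a score-ranked gene list.

    features[j] holds the expression values of gene j across samples;
    scores[j] is that gene's relevance score (higher is better).
    A gene is kept only if its absolute correlation with every gene
    already kept is below `threshold` (an assumed, fixed cutoff).
    """
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    selected = []
    for j in order:
        if all(abs(pearson(features[j], features[s])) < threshold
               for s in selected):
            selected.append(j)
        if k is not None and len(selected) == k:
            break
    return selected


# Toy data: gene 1 is almost a scaled copy of gene 0 (made-up values).
features = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.0, 8.2],
    [4.0, 1.0, 3.0, 2.0],
]
scores = [0.9, 0.8, 0.5]
selected = select_non_redundant(features, scores, threshold=0.9)
```

In this toy run, gene 1 is dropped because it is nearly collinear with the higher-scoring gene 0, while gene 2 is kept despite its lower score.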

Description

Technical field

[0001] The invention relates to the technical field of tumor classification research in bioinformatics, and is a feature selection method for tumor gene expression profile data.

Background

[0002] In recent years, the development of gene chip technology has made it possible to measure the expression levels of thousands of genes in parallel on a large scale, opening a new path for diagnosing and preventing human diseases at the level of molecular biology. By analyzing differences in gene expression between tissue types (such as normal cells and tumor cells, or different stages of cancer) and classifying the corresponding gene expression data, clinical diagnosis and treatment of tumor diseases, subtype identification, and prognosis analysis can be achieved. At present, cancer morbidity and mortality are on the rise, and cancer has become the number one threat to human health. Therefore, the use of gene chip technology to st...

Application Information

IPC(8): G06F19/24
CPC: G16B40/00
Inventor: 李建更, 李晓丹, 张卫, 王朋飞, 李立杰, 张岩
Owner: BEIJING UNIV OF TECH