Cancer classification and characteristic gene selection method

A cancer classification and characteristic gene selection technique, applied in the field of bioinformatics, that addresses problems such as unstable prediction performance and achieves improved model accuracy and stability together with enhanced interpretability.

Active Publication Date: 2021-09-24
NANCHANG UNIV

AI Technical Summary

Problems solved by technology

For data sets with different feature group information, the traditional sparse group lasso (SGL) method has unstable prediction performance, and its feature selection ability depends on the choice of the mixing parameter α.
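To make the role of α concrete, the commonly used form of an SGL-penalized logistic regression objective can be written as below. This is a sketch of the standard textbook formulation, not a formula quoted from this patent: the number of groups G, the group sizes p_g, and the group sub-vectors β^(g) are symbols assumed here for illustration, and the patent may weight the groups differently.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i\log\pi_i + (1-y_i)\log(1-\pi_i)\Big]
+ \lambda\Big[\alpha\,\lVert\beta\rVert_1
+ (1-\alpha)\sum_{g=1}^{G}\sqrt{p_g}\,\lVert\beta^{(g)}\rVert_2\Big],
\qquad
\pi_i = \frac{1}{1+e^{-(\beta_0 + x_i^{\top}\beta)}}
```

In this form α = 1 reduces the penalty to the lasso (sparsity within groups only) and α = 0 to the group lasso (whole groups kept or dropped), which is why the set of selected genes can change noticeably with α.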


Embodiment Construction

[0054] The present invention is described in detail below with reference to the accompanying drawings and embodiments:

[0055] As can be seen from Figures 1 and 2, a cancer classification and feature gene selection method includes the following steps:

[0056] (1) The establishment of the primary learner:

[0057] For an n × p training set matrix X and sample labels y, establish T logistic regression models as primary learners;

[0058] The sparse group lasso (SGL) regularization term has a mixing parameter α that adjusts the relative weight of the lasso and group lasso penalties. Based on T values of α equally spaced in (0, 1), establish the corresponding T SGL-regularized logistic regression solving models;

[0059] For each α_t, choose the optimal regularization parameter λ_t by cross-validation, and record the predicted probability values of the validation set in each primary learner as an n × T matrix.
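The primary-learner step above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the solver (proximal gradient descent with a fixed step), the penalty weighting (α on the lasso term, 1 − α on the group term scaled by the square root of the group size), the log-loss criterion for choosing λ_t, the λ grid, the fold layout, and all function names are hypothetical choices made here. The toy data are synthetic.

```python
import math
import random

def alpha_grid(T):
    """T equally spaced mixing values strictly inside (0, 1)."""
    return [(t + 1) / (T + 1) for t in range(T)]

def sgl_prox(beta, groups, step, lam, alpha):
    """Prox of step*lam*[alpha*||b||_1 + (1-alpha)*sum_g sqrt(p_g)*||b_g||_2]."""
    out = [0.0] * len(beta)
    for idx in groups:
        # Lasso part: elementwise soft-thresholding.
        s = [math.copysign(max(abs(beta[j]) - step * lam * alpha, 0.0), beta[j])
             for j in idx]
        # Group-lasso part: shrink the whole group towards zero.
        norm = math.sqrt(sum(v * v for v in s))
        thr = step * lam * (1 - alpha) * math.sqrt(len(idx))
        scale = max(1.0 - thr / norm, 0.0) if norm > 0 else 0.0
        for j, v in zip(idx, s):
            out[j] = scale * v
    return out

def sigmoid(z):
    # Numerically stable logistic function.
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def predict_proba(model, X):
    b0, beta = model
    return [sigmoid(b0 + sum(xj * bj for xj, bj in zip(x, beta))) for x in X]

def fit_sgl_logistic(X, y, groups, lam, alpha, step=0.1, iters=200):
    """Proximal gradient descent; the intercept b0 is left unpenalised."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        resid = [q - yi for q, yi in zip(predict_proba((b0, beta), X), y)]
        grad = [sum(r * x[j] for r, x in zip(resid, X)) / n for j in range(p)]
        b0 -= step * sum(resid) / n
        beta = sgl_prox([b - step * g for b, g in zip(beta, grad)],
                        groups, step, lam, alpha)
    return b0, beta

def log_loss(y, prob, eps=1e-12):
    return -sum(yi * math.log(max(q, eps)) + (1 - yi) * math.log(max(1 - q, eps))
                for yi, q in zip(y, prob)) / len(y)

def oof_matrix(X, y, groups, T=3, lambdas=(0.01, 0.05, 0.2), k=3):
    """Return (alphas, chosen lambdas, n x T matrix of out-of-fold probabilities)."""
    n = len(X)
    folds = [list(range(f, n, k)) for f in range(k)]
    alphas, chosen = alpha_grid(T), []
    Z = [[0.0] * T for _ in range(n)]
    for t, a in enumerate(alphas):
        scored = []
        for lam in lambdas:
            prob = [0.0] * n
            for val in folds:
                held = set(val)
                tr = [i for i in range(n) if i not in held]
                model = fit_sgl_logistic([X[i] for i in tr], [y[i] for i in tr],
                                         groups, lam, a)
                for i, q in zip(val, predict_proba(model, [X[i] for i in val])):
                    prob[i] = q
            scored.append((log_loss(y, prob), lam, prob))
        _, lam_t, prob_t = min(scored)  # lowest cross-validated log-loss
        chosen.append(lam_t)
        for i in range(n):
            Z[i][t] = prob_t[i]
    return alphas, chosen, Z

# Synthetic toy data: 40 samples, 6 features in two groups of three.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(6)] for _ in range(40)]
y = [1 if x[0] + x[1] + 0.3 * random.gauss(0, 1) > 0 else 0 for x in X]
groups = [[0, 1, 2], [3, 4, 5]]
alphas, lambdas_t, Z = oof_matrix(X, y, groups)
```

Each column of `Z` holds the held-out probabilities from the learner fitted with one α_t, so no sample's entry comes from a model that saw that sample during training. The fixed step size assumes roughly standardized features; real gene expression matrices with p much larger than n would need a dedicated solver.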

[0060] (2) The establishment of the secondary learner:

[0061] Est...

Abstract

The invention belongs to the field of bioinformatics and discloses a cancer classification and characteristic gene selection method comprising the following steps. Establishment of the primary learner: establish T logistic regression models and the corresponding sparse group lasso (SGL) regularized loss function solving models, and output the secondary learner training set. Establishment of the secondary learner: establish a multi-response regression model and the corresponding L1-regularized loss function solving model, and output the training set prediction result. Prognosis feature selection model: establish a prognosis feature selection SGL model. The method meets the three criteria of prediction, stability and selection: stacking integration improves the accuracy and stability of the model for cancer classification prediction, oncogenes and cancer-related genes are accurately selected, and the interpretability of the model is enhanced. Prior knowledge of genes and gene pathways is fused, improving the accuracy of cancer classification and the effectiveness of feature selection.

Description

Technical field

[0001] The invention relates to the field of bioinformatics, in particular to a cancer classification and characteristic gene selection method.

Background

[0002] Numerous studies have demonstrated that genomics data are useful for classifying many cancers. With the development of sequencing technology, it is now possible to isolate and sequence genetic material from single cells. For such RNA-seq gene expression data, the number of variables p (gene expression levels) is much larger than the sample size n. From a biological perspective, however, only a small subset of genes is strongly indicative of the target disease, whereas most genes are not associated with cancer classification. These irrelevant genes may introduce noise and reduce classification accuracy. Furthermore, from a machine learning perspective, too many genes may lead to overfitting and negatively impact classification performance, and the optimization process is not u...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B35/20, G16B40/00, G16B5/00
CPC: G16B35/20, G16B40/00, G16B5/00
Inventor: 施绍萍, 何欢, 余佳麟
Owner: NANCHANG UNIV