Acute myelogenous leukemia drug sensitivity related gene classifier constructed by machine learning algorithm

A technology combining acute myeloid leukemia research with machine learning, applied in the field of leukemia research. It addresses problems such as the lack of research on the consistency of drug resistance.

Active Publication Date: 2021-10-26
宋洋

AI Technical Summary

Problems solved by technology

Current cell line databases also lack research on whether drug resistance is consistent from the epigenetic level to transcriptional expression.


Examples


Embodiment 1

[0102] A machine learning algorithm is used to construct a drug-sensitivity-related gene classifier for acute myeloid leukemia. The specific algorithm is as follows:

[0103] Cluster analysis of drugs

[0104] The K-means clustering algorithm was used to group the patients by drug sensitivity (K=2). This divides the patients into two groups and provides a data set with classification labels for the subsequent screening of drug-sensitivity genes with supervised learning algorithms.

[0105] The specific steps of the K-Means algorithm

[0106] (1) Randomly select 2 samples, C_1 and C_2, from the processed samples as the initial cluster centers.

[0107] (2) Using the data of each sample, calculate the distance between the sample and each of the two cluster centers, and assign the sample to the class of the cluster center with the smallest distance.

[0108] The distance measure between the sample and the cluster center is Euclidean distance:

[0109] Among them, x represents t...
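The K-means steps above (K=2, Euclidean distance) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `kmeans_two_groups`, the toy sensitivity values, and the center-update and stopping steps (standard K-means, not visible in the truncated text above) are assumptions.

```python
import numpy as np

def kmeans_two_groups(X, n_iter=100, seed=0):
    """Cluster the rows of X into 2 groups with plain K-means (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Step (1): randomly pick 2 distinct samples as the initial cluster centers.
    centers = X[rng.choice(len(X), size=2, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Step (2): Euclidean distance from every sample to each center,
        # then assign each sample to the nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Assumed standard update: move each center to the mean of its members.
        for k in range(2):
            members = X[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels, centers

# Toy drug-sensitivity values for 6 patients and one drug (made-up numbers).
sensitivity = np.array([[0.10], [0.20], [0.15], [5.0], [5.2], [4.9]])
labels, centers = kmeans_two_groups(sensitivity)
print(labels, centers.ravel())
```

The resulting `labels` array plays the role of the two-group classification label that the later supervised screening step consumes.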

Embodiment 2

[0158] In this example, a total of 41 patients with relapsed and refractory AML and newly diagnosed AML were included, and transcriptome RNA-seq sequencing and 850K methylation chip profiling were performed at the same time. The methylation data cover a total of 598,243 probe sites, and the transcriptome covers a total of 23,710 genes. Because the number of samples is limited and the number of gene features is large, modeling with all gene feature data is prone to the problems of high-dimensional features, which reduce the accuracy of model learning. Therefore, differential analysis of the gene features is performed first, and dimensionality reduction is then carried out under the different algorithm modes.

[0159] The ChAMP method was used for differential analysis of the methylation data, and the DESeq2 method for differential analysis of the transcriptome data. Then, based on the differential analysis, the feature dimensionality reduction of the original data is car...
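As a rough illustration of using differential analysis as a first-pass filter before dimensionality reduction, the sketch below keeps only the features that pass two cutoffs. It assumes the per-feature adjusted p-values and effect sizes have already been exported from the differential analysis (for example the `padj` and `log2FoldChange` columns of a DESeq2 result). The function name, the cutoff values, and the toy numbers are all hypothetical; the source does not state the thresholds used.

```python
import numpy as np

def prefilter_features(adj_p, effect, p_cut=0.05, effect_cut=1.0):
    """Return indices of features passing both cutoffs (cutoffs are assumptions)."""
    adj_p = np.asarray(adj_p, dtype=float)
    effect = np.asarray(effect, dtype=float)
    # Keep a feature only if it is significant AND its absolute effect is large enough.
    keep = (adj_p < p_cut) & (np.abs(effect) >= effect_cut)
    return np.flatnonzero(keep)

# Made-up differential-analysis results for 6 genes.
adj_p = [0.001, 0.20, 0.03, 0.04, 0.60, 0.01]
effect = [2.1, 3.0, -1.5, 0.2, -2.2, 1.0]
kept = prefilter_features(adj_p, effect)

# Apply the filter to a samples-by-genes matrix (3 samples x 6 genes here).
expression = np.arange(18, dtype=float).reshape(3, 6)
reduced = expression[:, kept]
print(kept, reduced.shape)
```

Filtering this way shrinks the feature matrix before any model is fitted, which is the motivation given above for a cohort of 41 samples with hundreds of thousands of candidate features.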

Embodiment 3

[0162] Verification of the drug sensitivity prediction accuracy of the screened genes using the GDSC database

[0163] IC50 (half-maximal inhibitory concentration) is the main index used in the GDSC database to evaluate the effect of drugs on cell lines. In this study, the R package pRRophetic (version 0.5) was used to obtain and integrate the GDSC data. The pRRophetic package was developed by Paul Geeleher in 2014. It uses the responses to 138 drugs of more than 700 cell lines included in the Cancer Genome Project (CGP) database, and it provides a drug response prediction algorithm built on the expression matrix of the CGP database. The reliability of the algorithm was verified on the data set. The basic principles and steps are as follows:

[0164] 1) Standardize the CGP database (cell line gene expression matrix, used as the training set) and the expression matrix to be predicted (clinical patient gene expression matrix, used as the test set) separately, and use an empirical Bayesian method to merge data ...


Abstract

The invention discloses an acute myelogenous leukemia drug-sensitivity-related gene classifier constructed by a machine learning algorithm. The classifier comprises sample clustering and gene screening. In sample clustering, a K-means clustering algorithm is used to cluster patients by their sensitivity to each of 24 drugs separately. In gene screening, a feature selection model is used to screen and verify genes in the methylation and transcriptome data for the 24 drugs according to the patient clustering results. The final screening of target genes is carried out with logistic regression, ridge regression, RFECV-SVM and RFECV-RF algorithms. The parameters of logistic regression and ridge regression are optimized using four-fold stratified cross-validation, and the feature selection threshold is set to the mean of the feature weights, that is, features whose weights are larger than the mean are retained. The RFECV algorithm uses two different learning models, SVM and RF, for screening.
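The four screening routes named in the abstract can be sketched with scikit-learn as shown below. This is an illustrative approximation under stated assumptions, not the patented pipeline: the data are synthetic, `RidgeClassifierCV` stands in for "ridge regression" on a two-class label, the hyperparameter grids are arbitrary, and reusing the same four-fold stratified split inside `RFECV` is a choice made here rather than something the source specifies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV, SelectFromModel
from sklearn.linear_model import LogisticRegressionCV, RidgeClassifierCV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for a pre-filtered feature matrix: 41 samples, 20 features,
# with binary labels playing the role of the K-means sensitivity groups.
X, y = make_classification(n_samples=41, n_features=20, n_informative=4,
                           random_state=0)

# Four-fold stratified cross-validation, as named in the abstract.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

selectors = {
    # threshold="mean" keeps features whose absolute weight reaches the mean weight.
    "logistic": SelectFromModel(
        LogisticRegressionCV(Cs=5, cv=cv, max_iter=2000), threshold="mean"),
    "ridge": SelectFromModel(
        RidgeClassifierCV(alphas=(0.1, 1.0, 10.0), cv=cv), threshold="mean"),
    # Recursive feature elimination with cross-validation, using two base learners.
    "rfecv_svm": RFECV(SVC(kernel="linear"), step=1, cv=cv),
    "rfecv_rf": RFECV(
        RandomForestClassifier(n_estimators=25, random_state=0), step=1, cv=cv),
}

selected = {}
for name, selector in selectors.items():
    selector.fit(X, y)
    # Column indices of the features each route retains.
    selected[name] = np.flatnonzero(selector.get_support())
    print(name, selected[name])
```

Comparing the index sets in `selected` across the four routes is one simple way to see which features are retained consistently, although the source does not say how the four results are combined.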

Description

Technical field

[0001] The invention relates to the field of leukemia research, in particular to a drug-sensitivity-related gene classifier for acute myeloid leukemia constructed by a machine learning algorithm.

Background

[0002] Acute myeloid leukemia (AML), a group of highly heterogeneous hematological malignancies, accounts for about 70% of the total incidence of leukemia. Between 20% and 40% of patients have difficulty achieving complete remission (CR); these cases are refractory AML. The basic chemotherapy regimen for AML combines anthracycline / anthraquinone drugs with cytarabine (Ara-C). The domestic first-line treatment drugs also include homoharringtonine (HHT). Depending on the re-induction regimen, 30% to 68% of these patients achieve CR again. With the clinical application of molecular targeted drugs, the curative effect for some patients has improved, but this has solved the problem for only some patients.

[0003] With ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16C20/50, G16C20/70, G06K9/62, G06N20/00, G06N20/10
CPC: G16C20/50, G16C20/70, G06N20/00, G06N20/10, G06F18/23213
Inventors: 宋洋, 秘营昌, 王建祥, 房秋云
Owner: 宋洋