
Method for predicting cancer synthetic lethal gene pairs based on decision tree and linear regression models

Linear regression models and synthetic lethality techniques, applied in genomics, instruments, biological systems, etc., address problems such as high cost, missed synthetic lethal gene pairs, and the difficulty of predicting synthetic lethal gene pairs.

Pending Publication Date: 2019-10-18
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0003] However, predicting synthetic lethal gene pairs remains difficult, and the results of the prediction models published so far differ considerably. For example, the well-known DAISY algorithm (Jerby-Arnon, L., Pfetzer, N., Waldman, Y.Y., Mcgarry, L., James, D., Shanks, E., Seashore-Ludlow, B., Weinstock, A., Geiger, T. and Clemons, P.A., 2014. Predicting Cancer-Specific Vulnerability via Data-Driven Detection of Synthetic Lethality. Cell 158, 1199-1209) applies a variety of data screening criteria, yet it still misses a large number of synthetic lethal gene pairs, and its screening criteria are fixed.
To address this limitation, the MiSL (Mining Synthetic Lethals) algorithm, published by Subarna Sinha et al. in Nature Communications (Sinha, S., Thomas, D., Chan, S., Gao, Y., Brunen, D., Torabi, D., Reinisch, A., Hernandez, D., Chan, A. and Rankin, E.B., 2017. Systematic discovery of mutation-specific synthetic lethals by mining pan-cancer human primary tumor data. Nature Communications 8, 15580), made partial improvements but still has many limitations: the number of predicted synthetic lethal gene pairs is too small, and prediction is limited to 12 cancer types in TCGA.
Therefore, current algorithms for predicting synthetic lethal gene pairs are still immature, and there is considerable room for optimization and improvement. In terms of experimental verification, only a small number of synthetic lethal interactions between gene pairs have been confirmed, and the cost of experimental verification is relatively high. It is therefore necessary to use computational methods to find a reliable, highly accurate algorithm for screening synthetic lethal gene pairs that can then be verified experimentally.


Examples


Embodiment Construction

[0032] The technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0033] The method for predicting cancer synthetic lethal gene pairs based on a decision tree model and a linear regression model specifically comprises the following steps:

[0034] 1) Extract multi-omics data from high-throughput sequencing data and preprocess it into a matrix format that includes gene names, sample names and the corresponding quantitative data. The multi-omics data include gene mutation, mRNA expression, DNA methylation and copy number variation. This medical big data can be analyzed from the perspective of a single cancer type or pan-cancer. Each feature of each cancer type is integrated and processed into a data matrix in which the row names are gene names and the column names are sample numbers.
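
The matrix layout described in step 1) can be illustrated with a minimal Python sketch that pivots long-format records into a gene-by-sample matrix for one feature type. The record layout, the function name `build_matrix`, the sample identifiers and the values are illustrative assumptions and are not taken from the patent.

```python
from collections import defaultdict


def build_matrix(records, missing=None):
    """Pivot (gene, sample, value) records into a gene-by-sample matrix.

    Returns (genes, samples, matrix), where matrix[i][j] holds the value
    for genes[i] in samples[j]. Cells without a record get `missing`.
    """
    values = defaultdict(dict)
    sample_set = set()
    for gene, sample, value in records:
        values[gene][sample] = value
        sample_set.add(sample)
    genes = sorted(values)          # row names: gene names
    samples = sorted(sample_set)    # column names: sample numbers
    matrix = [[values[g].get(s, missing) for s in samples] for g in genes]
    return genes, samples, matrix


# Hypothetical mRNA expression records: (gene name, sample number, value).
expression_records = [
    ("TP53", "S01", 5.2),
    ("TP53", "S02", 4.8),
    ("BRCA1", "S01", 7.1),
    ("PARP1", "S02", 6.3),
]

genes, samples, matrix = build_matrix(expression_records)

# Print the matrix as tab-separated text, with NA for missing cells.
print("gene\t" + "\t".join(samples))
for gene, row in zip(genes, matrix):
    print(gene + "\t" + "\t".join("NA" if v is None else str(v) for v in row))
```

One such matrix would be built per feature type (mutation, mRNA expression, methylation, copy number variation) and per cancer type. How missing cells are handled in the actual method is not stated in the text shown here, so the `missing` placeholder is only a stand-in.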

[0035] 2) Effective data screening is performed based on the multi-omics data in 1), specifically integrating the original data and removing ...


Abstract

The invention relates to the field of gene prediction, and discloses a method for predicting cancer synthetic lethal gene pairs based on a decision tree model and a linear regression model. The method is mainly divided into a data training stage and a synthetic lethal gene pair testing stage. The steps are as follows: first, data containing all mutant gene pair coverage rates, DNA methylation, mRNA expression profiles and copy number variation are extracted from multi-omics data to serve as model feature values, which are clustered, cleared of false positives and normalized, and a decision tree model and a linear regression model are trained; second, the decision tree model and the linear regression model are each used to predict the synthetic lethal gene pairs that may exist in various cancers, yielding distribution maps of the synthetic lethal gene pairs in different cancers; finally, the two models are compared to obtain 508 synthetic lethal gene pairs existing in pan-cancer. The method can accurately predict the synthetic lethal gene pairs that may exist in various cancers, and thus provides a basis for precise cancer treatment.
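
The final comparison step could be sketched in Python as follows. The sketch assumes that "comparing the two models" means keeping the gene pairs that both models predict; the abstract does not spell out the comparison rule. The function names, the 0.5 threshold, the gene pairs and the scores are all hypothetical.

```python
def pair_key(gene_a, gene_b):
    """Order-independent key, so (A, B) and (B, A) count as the same pair."""
    return tuple(sorted((gene_a, gene_b)))


def predicted_pairs(scores, threshold):
    """Collect gene pairs whose model score meets the threshold."""
    return {pair_key(a, b) for (a, b), score in scores.items() if score >= threshold}


def consensus_pairs(tree_scores, regression_scores, threshold=0.5):
    """Return the pairs predicted by both models (assumed comparison rule)."""
    tree = predicted_pairs(tree_scores, threshold)
    regression = predicted_pairs(regression_scores, threshold)
    return sorted(tree & regression)


# Hypothetical per-pair scores from the two trained models.
tree_scores = {
    ("BRCA1", "PARP1"): 0.9,
    ("TP53", "ATM"): 0.7,
    ("KRAS", "MYC"): 0.2,
}
regression_scores = {
    ("PARP1", "BRCA1"): 0.8,   # same pair, listed in the other order
    ("TP53", "ATM"): 0.4,
    ("KRAS", "MYC"): 0.6,
}

consensus = consensus_pairs(tree_scores, regression_scores)
print(consensus)
```

Treating each pair as unordered avoids counting the same two genes twice when the models list them in different orders.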

Description

Technical field

[0001] The invention belongs to the field of gene prediction, and specifically relates to combinations of genes with cancer-related synthetic lethal effects, which are predicted by integrating and analyzing multi-omics high-throughput sequencing data with different model algorithms based on feature values at various molecular levels.

Background technique

[0002] The treatment of cancer has always been a difficult problem in modern medicine. Traditional treatment methods struggle to kill cancer cells completely because cancer cells spread through the blood. In recent years, the development of genome sequencing technology has given rise to new treatments that offer possible cures for cancer. Targeted therapy has attracted widespread attention because of its specificity and its lower damage to normal cells. However, the problem of drug resistance in targeted therapy widely exists in a variety of...


Application Information

IPC(8): G16B20/00, G16B5/00
CPC: G16B20/00, G16B5/00
Inventor: 郭丽, 殷子博, 杨国伟, 钱博文
Owner: NANJING UNIV OF POSTS & TELECOMM