Cancer driver gene identification method based on machine learning and various statistic principles

A machine-learning-based driver gene identification technology, applied at the intersection of bioinformatics and cancer medicine, that addresses the poor balance between sensitivity and specificity in existing tools and their lack of robustness across tumor types, achieving a low false-positive rate and high robustness.

Active Publication Date: 2018-05-29
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0004] However, the above-mentioned bioinformatics tools still have shortcomings. First, these algorithms do not strike a good balance between sensitivity and specificity: some have high sensitivity but low specificity, while others have high specificity but low sensitivity. Second, they lack robustness across tumor types: a method may perform well and find many reliable driver genes for some tumor types, yet perform poorly for others.

Examples

Embodiment Construction

[0029] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments, but the present invention is not limited thereto.

[0030] 1. Experimental materials:

[0031] Experimental sample data: lung squamous cell carcinoma mutation data, downloaded from the TCGA database (http://tcga-data.nci.nih.gov/docs/publications/lusc_2012/);

[0032] Operating system: Linux

[0033] Software: R and Perl, both downloaded from their official websites.

[0034] 2. Experimental methods, as shown in Figure 1:

[0035] (1) Organize the data as follows: the first column is the gene name, the fifth column is the chromosome on which the gene lies, the sixth column is the start position of the mutated sequence, the ninth column is the mutation classification, the eleventh column is the normal reference sequence corresponding to the mutated sequence, the thirteenth column is the mutant sequence, and th...
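The column layout in step (1) can be illustrated with a minimal Python sketch. It is not the patent's own R or Perl code. It assumes the input is a tab-delimited text file with one header line, which the excerpt does not state, and it covers only the six columns named before the text is cut off. The names `COLUMNS` and `read_mutations` are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, TextIO

# 1-based column positions as listed in step (1) of the description.
# Columns mentioned after the truncation point are not covered here.
COLUMNS = {
    "gene": 1,             # gene name
    "chromosome": 5,       # chromosome of the gene
    "start": 6,            # start position of the mutated sequence
    "classification": 9,   # mutation classification
    "reference": 11,       # normal reference sequence
    "mutant": 13,          # mutant sequence
}


def read_mutations(handle: TextIO, has_header: bool = True) -> Iterator[Dict[str, str]]:
    """Yield one dict per mutation row, keeping only the named columns.

    Assumes tab-delimited input (an assumption, not stated in the source).
    Rows with fewer than 13 fields are skipped.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # discard the header line if present
    width = max(COLUMNS.values())
    for row in reader:
        if len(row) < width:
            continue
        yield {name: row[pos - 1] for name, pos in COLUMNS.items()}
```

Passing an open file handle, for example `read_mutations(open(path, newline=""))` with a hypothetical `path`, yields records that later steps could group by gene, sample, or mutation type.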

Abstract

The invention discloses a cancer driver gene identification method based on machine learning and multiple statistical principles. The method includes the following steps: (1) organizing the data into a standard format; (2) calculating the background mutation rate; (3) statistically testing for cancer driver genes; (4) performing Monte Carlo simulation of the distribution of the test statistic; (5) adjusting the P values. The method takes into account the background mutation rate of each sample, gene, and mutation type, as well as the influence of the different mutation types on protein function, and uses a score test to identify driver genes. As a result, it is highly robust and applicable to many cancer types. It balances sensitivity and specificity well, detecting more driver genes while maintaining a low false-positive rate. The method is significant for finding potential target loci for cancer treatment and for developing anticancer drugs.
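Steps (4) and (5) can be illustrated with a minimal Python sketch, under several assumptions. The abstract does not say how the null distribution is simulated or which P-value adjustment is applied. The sketch therefore takes a caller-supplied `simulate` function as a stand-in for the null model, uses the add-one empirical estimator, and substitutes the Benjamini-Hochberg procedure as one common adjustment. None of these choices is confirmed by the source, and all function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def empirical_p_value(
    observed: float,
    simulate: Callable[[random.Random], float],
    n_sim: int = 1000,
    seed: int = 0,
) -> float:
    """Monte Carlo P value: share of simulated statistics >= the observed one.

    `simulate` draws one test statistic under the null model (assumed to be
    supplied by the caller). The add-one correction keeps the estimate
    strictly above zero.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_sim) if simulate(rng) >= observed)
    return (hits + 1) / (n_sim + 1)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order.

    Used here as an assumed example of step (5); the source does not name
    the adjustment method.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch each gene would get one empirical P value from its own observed statistic, and the full list of per-gene P values would then be adjusted together.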

Description

Technical field

[0001] The invention belongs to the interdisciplinary field of bioinformatics and cancer medicine, and relates to a method for identifying cancer driver genes using machine learning and multiple statistical methods.

Background

[0002] Most cancers are diseases caused by mutations in somatic cells. Driver genes directly contribute to the occurrence and development of cancer, whereas passenger genes have no direct relationship with cancer, so driver genes need to be identified. Several major tumor sequencing projects, such as The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), and Therapeutically Applicable Research to Generate Effective Treatments (TARGET), have established comprehensive catalogs of somatic mutations in various cancer types. A major goal of these sequencing projects is to identify the driver genes that cause cancer. Finding cancer driver genes can ...

Application Information

IPC(8): G06F19/22, G06F19/24
CPC: G16B30/00, G16B40/00
Inventors: 刘鹏渊, 韩毅, 陆燕, 周莉媛
Owner: ZHEJIANG UNIV