Method and system for constructing disease risk prediction model based on sequencing and machine learning

A disease risk prediction model technology, applied in the field of biomedicine, that addresses the lack of good molecular markers for distinguishing early and late lesions and achieves high diagnostic sensitivity and specificity and high prediction accuracy.

Pending Publication Date: 2021-06-18
QINGDAO MEDINTELL BIOMEDICAL CO LTD

AI Technical Summary

Problems solved by technology

At present, the clinical distinction between ulcer and cancer can basically be confirmed by colonoscopy combined with biopsy.



Examples


Example Embodiment

[0157] Example 1 Construction of the disease prediction model

[0158] 1. Data acquisition

[0159] Collect sample sequencing data.

[0160] 2. Sequencing data processing

[0161] Use the fastp software for adapter processing and quality control to obtain clean data.

[0162] 3. Sequence alignment

[0163] Use the ICGC software to align the clean data to the human reference genome (version GRCh38.d1.vd1) and obtain a BAM file.

[0164] 4. Building the expression matrix

[0165] Using the HTSeq software together with the annotation file, gene expression is quantified from the aligned BAM files. Matching by gene ID, the expression values of the multiple samples are combined into an m × n gene expression matrix, in which the value in row i and column j represents the expression of the i-th gene in the j-th sample, where 1 ≤ i ≤ m and 1 ≤ j ≤ n; m represents the number of detected genes, n represents the number of sample...
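The matrix construction described in [0165] can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: it assumes each sample has a two-column, tab-separated count file (gene ID, count) of the kind htseq-count writes, with summary rows whose names begin with "__". The sample names, gene IDs, the helper names read_htseq_counts and build_expression_matrix, and the choice to fill a gene missing from a sample with 0 are all assumptions made for the example.

```python
import csv
import io


def read_htseq_counts(handle):
    """Parse one per-sample count table (gene_id <TAB> count) into a dict."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue  # ignore blank or malformed lines
        gene_id, value = row
        # Summary rows such as __no_feature are not genes; skip them.
        if gene_id.startswith("__"):
            continue
        counts[gene_id] = int(value)
    return counts


def build_expression_matrix(per_sample):
    """Return (gene_ids, sample_ids, matrix); matrix[i][j] is gene i in sample j."""
    sample_ids = list(per_sample)
    # Rows are the union of gene IDs over all samples, sorted for a stable order.
    gene_ids = sorted(set().union(*(c.keys() for c in per_sample.values())))
    # Assumption: a gene absent from a sample's table is recorded as 0.
    matrix = [[per_sample[s].get(g, 0) for s in sample_ids] for g in gene_ids]
    return gene_ids, sample_ids, matrix


# Hypothetical toy inputs standing in for two per-sample count files.
sample_a = io.StringIO("GENE1\t10\nGENE2\t0\n__no_feature\t5\n")
sample_b = io.StringIO("GENE1\t7\nGENE2\t3\n__ambiguous\t1\n")
per_sample = {
    "sample_a": read_htseq_counts(sample_a),
    "sample_b": read_htseq_counts(sample_b),
}
gene_ids, sample_ids, matrix = build_expression_matrix(per_sample)
m, n = len(gene_ids), len(sample_ids)
```

Here m is the number of rows (genes) and n the number of columns (samples), matching the indexing convention in the text.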

Example Embodiment

[0193] Example 2 Construction of a diagnostic model of colorectal disease

[0194] 1, data source and acquisition

[0195] All data used to construct the colorectal cancer risk model were downloaded from the TCGA and NCBI-SRA databases: the expression files for colorectal cancer and paracancerous tissue came from the TCGA database, and the raw intestinal polyp data were downloaded from the NCBI-SRA database. The retrieval yielded 443 colorectal cancer cases, 31 intestinal polyp samples, and 72 normal samples; the resulting 546 samples were used for further screening and quality control.

[0196] 2. Raw data processing

[0197] Use the fastp software for adapter processing and quality control to obtain clean data, including:

[0198] a. Adapter removal

[0199] Adapter processing is performed using the automatic adapter detection mode of fastp for paired-end sequences;

[0200] b. Data trimming and quality control

[0201] The N-base number threshold is 5, the minimum read length threshold is 15, the base quality thre...
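As an illustration of the fastp settings named in [0199] and [0201], the sketch below assembles a paired-end fastp command line in Python. It is a hedged example, not the patent's actual invocation: the file names and the helper build_fastp_command are hypothetical, the flags shown (--detect_adapter_for_pe, --n_base_limit, --length_required) are assumed to be the fastp options corresponding to the settings in the text, and the base quality threshold is deliberately left out because its value is cut off in the source.

```python
def build_fastp_command(r1_in, r2_in, r1_out, r2_out,
                        n_base_limit=5, length_required=15):
    """Assemble a fastp argument list for paired-end trimming and QC."""
    return [
        "fastp",
        "-i", r1_in, "-I", r2_in,      # paired-end input reads
        "-o", r1_out, "-O", r2_out,    # cleaned output reads
        "--detect_adapter_for_pe",     # automatic adapter detection for paired-end data
        "--n_base_limit", str(n_base_limit),        # N-base number threshold
        "--length_required", str(length_required),  # minimum read length
    ]


# Hypothetical file names, for illustration only.
cmd = build_fastp_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "sample_R1.clean.fastq.gz", "sample_R2.clean.fastq.gz",
)
print(" ".join(cmd))
```

With fastp installed, a list like this could be passed to subprocess.run(cmd, check=True); building it as a list rather than a shell string avoids quoting problems with unusual file names.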


Abstract

The invention relates to a method and a system for constructing a disease risk prediction model based on sequencing and machine learning. The system is embedded with a disease risk prediction model developed by utilizing machine learning, and the disease risk of a subject is judged according to the risk prediction model by receiving sequencing information from the subject.

Description

Technical field

[0001] The invention belongs to the field of biomedicine and relates to a method and system for constructing a disease risk prediction model based on sequencing and machine learning.

Background

[0002] With the development of sequencing technology and the reduction in its cost, human genome sequencing will become a mainstream trend in the field of human health, and precision medicine will be the ultimate goal of sequencing. Accurately interpreting sequencing results has therefore become a prerequisite for realizing precision medicine.

[0003] Colorectal cancer (CRC) is the third most common cancer worldwide and the fourth most common cause of cancer-related death. Its onset is rapid, its prognosis is poor, and its incidence is increasing year by year. According to statistics, people with a positive family history of colorectal cancer and people over the age of 50 have a significantly increased risk of CRC, and patients with ...


Application Information

IPC(8): G16B40/20, G16H50/30, G06K9/62
CPC: G16B40/20, G16H50/30, G06F18/2135, G06F18/241
Inventors: 杨承刚, 李雨晨
Owner QINGDAO MEDINTELL BIOMEDICAL CO LTD