Method and system for constructing disease risk prediction model based on sequencing and machine learning
A disease risk prediction model technology, applied in the field of biomedicine, that addresses the lack of good molecular markers for early- and late-stage lesions and achieves high diagnostic sensitivity and specificity as well as high prediction accuracy.
Example Embodiment
[0157] Example 1 Construction of the disease prediction model
[0158] 1. Data acquisition
[0159] Collect sample sequencing data.
[0160] 2. Sequencing data processing
[0161] Use the fastp software to perform adapter processing and quality control, yielding clean data.
[0162] 3. Sequence alignment
[0163] Use the ICGC software to align the clean data to the human reference genome (version GRCh38.d1.vd1), producing BAM files.
[0164] 4. Build the expression matrix
[0165] Using the HTSeq software together with an annotation file, quantify gene expression from the aligned BAM files. Merging the expression values of multiple samples by gene ID constructs an m × n gene expression matrix, in which the value in row i, column j represents the expression of the i-th gene in the j-th sample, where 1 ≤ i ≤ m and 1 ≤ j ≤ n; m represents the number of detected genes, n represents the number of sample...
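As a rough illustration of step 4, the sketch below merges per-sample count tables into a gene expression matrix with genes as rows and samples as columns. It assumes each sample's counts arrive in the two-column, tab-separated layout that htseq-count writes (gene ID, then count, with summary rows prefixed by `__`). The function names `parse_counts` and `build_expression_matrix`, the toy gene and sample names, and the choice to fill a gene missing from a sample with 0 are all hypothetical and not taken from the patent.

```python
from typing import Dict, List, Tuple


def parse_counts(text: str) -> Dict[str, int]:
    """Parse htseq-count style text (gene_id<TAB>count) into a dict."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and summary rows such as "__no_feature".
        if not line or line.startswith("__"):
            continue
        gene_id, value = line.split("\t")
        counts[gene_id] = int(value)
    return counts


def build_expression_matrix(
    samples: Dict[str, str],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (gene_ids, sample_ids, matrix) where matrix[i][j] is the
    count of gene i in sample j."""
    sample_ids = list(samples)
    parsed = [parse_counts(samples[s]) for s in sample_ids]
    # Rows are the union of all gene IDs, sorted for a stable order.
    gene_ids = sorted(set().union(*parsed))
    # Assumption: a gene absent from a sample's table is counted as 0.
    matrix = [[p.get(g, 0) for p in parsed] for g in gene_ids]
    return gene_ids, sample_ids, matrix


# Toy input: two hypothetical samples with two genes each.
toy_samples = {
    "S1": "geneA\t10\ngeneB\t0\n__no_feature\t5\n",
    "S2": "geneA\t3\ngeneB\t7\n__no_feature\t2\n",
}
gene_ids, sample_ids, matrix = build_expression_matrix(toy_samples)
for gene, row in zip(gene_ids, matrix):
    print(gene, row)
```

Here m is `len(gene_ids)` and n is `len(sample_ids)`, matching the row and column roles described in paragraph [0165].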
Example Embodiment
[0193] Example 2 Construction of a diagnostic model of colorectal disease
[0194] 1. Data source and acquisition
[0195] All data used to construct the colorectal cancer risk model were downloaded from the TCGA and NCBI-SRA databases: the expression files for the colorectal cancer and normal samples came from the TCGA database, and the raw data for the intestinal polyp samples were downloaded from the NCBI-SRA database. The retrieval yielded 443 colorectal cancer samples, 31 intestinal polyp samples, and 72 normal samples; a total of 546 samples were used for further screening and quality control.
[0196] 2. Raw data processing
[0197] Use the fastp software to perform adapter processing and quality control, yielding clean data, including:
[0198] a. Adapter processing
[0199] Adapter processing is performed using the fastp software's automatic adapter detection mode for paired-end sequences;
[0200] b. Data trimming and quality control
[0201] The N-base count threshold is 5, the minimum read length threshold is 15, the base quality thre...
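The settings in steps a and b could be passed to fastp roughly as sketched below. This is only an assumed mapping: it reads the automatic detection mode as fastp's `--detect_adapter_for_pe` option, the N-base threshold as `--n_base_limit`, and the read length threshold as `--length_required`. The base quality threshold is cut off in the text above, so it is left as an optional argument with no value supplied. The helper name `fastp_command` and the file names are hypothetical.

```python
from typing import List, Optional


def fastp_command(
    read1_in: str,
    read2_in: str,
    read1_out: str,
    read2_out: str,
    n_base_limit: int = 5,
    length_required: int = 15,
    qualified_quality_phred: Optional[int] = None,
) -> List[str]:
    """Build a fastp argument list for paired-end reads (not executed here)."""
    cmd = [
        "fastp",
        "-i", read1_in, "-I", read2_in,    # paired-end inputs
        "-o", read1_out, "-O", read2_out,  # cleaned outputs
        "--detect_adapter_for_pe",         # assumed match for step a
        "--n_base_limit", str(n_base_limit),
        "--length_required", str(length_required),
    ]
    # The base quality threshold is truncated in the source text, so it is
    # only added when the caller supplies a value.
    if qualified_quality_phred is not None:
        cmd += ["--qualified_quality_phred", str(qualified_quality_phred)]
    return cmd


# Hypothetical file names, for illustration only.
cmd = fastp_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "clean_R1.fastq.gz", "clean_R2.fastq.gz",
)
print(" ".join(cmd))
```

With fastp installed, the returned list can be handed to `subprocess.run(cmd, check=True)`; building the list separately keeps the parameters easy to inspect before anything is run.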