Method and system for constructing disease risk prediction model based on sequencing and machine learning
A disease risk prediction model technology applied in the field of biomedicine. It addresses the lack of good molecular markers for early and late lesions, and achieves high diagnostic sensitivity and specificity as well as high prediction accuracy.
Embodiment 1
[0157] Embodiment 1 Construction of disease prediction model
[0158] 1. Get data
[0159] Collect the sequencing data for the sample.
[0160] 2. Processing of sequencing data
[0161] fastp software was used for adapter processing and quality control to obtain clean data.
[0162] 3. Sequence Alignment
[0163] Use ICGC software to align the clean data to the human reference genome (version GRCh38.d1.vd1) and obtain a BAM file.
[0164] 4. Construct expression matrix
[0165] Use htseq software, together with the annotation file, to quantify gene expression from the aligned BAM files, and construct an M*N gene expression matrix from the gene IDs and expression values of the samples. The value in the i-th row and j-th column of the gene expression matrix is the expression count of the i-th gene in the j-th sample, where 1≤i≤M and 1≤j≤N; M is the number of detected genes and N is the number of analyzed samples. Save the expression matrix a...
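The matrix construction described in [0165] can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each sample has a per-sample htseq-count output in the usual two-column, tab-separated form (gene ID, count) with summary rows whose names start with "__". The function names, sample names, and gene IDs are hypothetical, and filling a gene missing from a sample with 0 is an assumption.

```python
import csv
import io


def parse_htseq_counts(text):
    """Parse one htseq-count output (gene_id<TAB>count per line) into a dict."""
    counts = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank lines and htseq-count summary rows such as __no_feature.
        if len(row) < 2 or row[0].startswith("__"):
            continue
        counts[row[0]] = int(row[1])
    return counts


def build_expression_matrix(sample_counts):
    """Return (gene_ids, sample_ids, matrix).

    matrix[i][j] is the count of gene i in sample j, so the matrix has
    M rows (genes) and N columns (samples), as in the text.
    """
    sample_ids = list(sample_counts)
    gene_ids = sorted(set().union(*(c.keys() for c in sample_counts.values())))
    # Assumption: a gene absent from a sample's file is given a count of 0.
    matrix = [[sample_counts[s].get(g, 0) for s in sample_ids] for g in gene_ids]
    return gene_ids, sample_ids, matrix


# Hypothetical two-sample, two-gene input for illustration only.
example = {
    "sample_A": parse_htseq_counts("ENSG01\t5\nENSG02\t0\n__no_feature\t12\n"),
    "sample_B": parse_htseq_counts("ENSG01\t7\nENSG02\t3\n__no_feature\t9\n"),
}
genes, samples, matrix = build_expression_matrix(example)
print(genes, samples, matrix)
```

Here M = 2 and N = 2; row i holds gene i across all samples, matching the row and column convention stated above.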
Embodiment 2
[0193] Embodiment 2 Construction of colorectal disease diagnosis model
[0194] 1. Data source and acquisition
[0195] All data used to build the colorectal cancer risk model were downloaded from the TCGA and NCBI-SRA databases: the expression files for colorectal cancer and adjacent non-tumor tissue came from the TCGA database, and the raw data for intestinal polyps came from the NCBI-SRA database. The search yielded 443 colorectal cancer samples, 31 intestinal polyp samples, and 72 normal samples, for a total of 546 samples used in further screening and quality control.
[0196] 2. Raw data processing
[0197] Use fastp software for adapter processing and quality control to obtain clean data. The steps include:
[0198] a. Adapter processing
[0199] Use the automatic adapter detection mode of fastp for paired-end reads to trim the adapters;
[0200] b. Read trimming and quality control
[0201] The minimum th...
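The adapter-processing step (a) above can be sketched as a small command builder. This is an illustrative sketch only: the file names are hypothetical, and it covers just the paired-end adapter auto-detection option of fastp (`--detect_adapter_for_pe`), since the quality-control thresholds are not given in the text above.

```python
import shlex


def fastp_pe_command(r1, r2, out1, out2):
    """Build a fastp command for paired-end reads with adapter auto-detection."""
    return [
        "fastp",
        "-i", r1, "-I", r2,          # raw read 1 / read 2 inputs
        "-o", out1, "-O", out2,      # cleaned read 1 / read 2 outputs
        "--detect_adapter_for_pe",   # enable adapter auto-detection for paired-end data
    ]


# Hypothetical file names for illustration only.
cmd = fastp_pe_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "sample_R1.clean.fastq.gz", "sample_R2.clean.fastq.gz",
)
print(shlex.join(cmd))
```

With fastp installed, the list could be passed to `subprocess.run(cmd, check=True)`; building it as a list avoids shell-quoting problems with unusual file names.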