Causal correlation analysis method for fine positioning of whole genome pathogenic SNP (Single Nucleotide Polymorphism)

A genome-wide association analysis technology, applied in genomics, proteomics, instruments, and related fields. It addresses problems such as the mistaken deletion of pathogenic SNPs and analysis results that depend on model settings, with the effect of breaking through these limitations, reducing the false positive rate, and improving the true positive rate.

Pending Publication Date: 2021-12-17
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

The sparse model constructed under this strategy retains only a few SNPs in regions of highly correlated SNPs, so the real pathogenic SNPs are easily deleted by mistake.
[0008] (4) Bayesian models, represented by the Bayesian variable selection model, perform fine mapping by calculating the posterior probability that each SNP is the pathogenic locus. Their settings affect the analysis results.

Examples

Embodiment 1

[0055] Take a small data set as an example, and assume that the true causal relationships between the SNPs and Y are as shown in Figure 2. Suppose the original data set contains 7 SNPs. To screen for the causal loci of outcome Y, first use a univariate regression model to test whether each SNP is marginally independent of Y; SNP7 is, so it is removed. In the two-variable regression model, SNP4 is independent of Y conditional on SNP1, so SNP4 is removed. Similarly, the three-variable regression model removes SNP5 and SNP6. At this point fewer than 4 SNPs remain in the candidate set, so a four-variable regression model cannot be constructed and the procedure terminates. The selected pathogenic loci are {SNP1, SNP2, SNP3}.
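The pruning order in this small example can be traced with a short Python sketch. It is only an illustration under stated assumptions: the regression tests are replaced by a hypothetical lookup table (`SEPARATING_SETS`) that records which conditioning set makes each removable SNP independent of Y. The entries for SNP7 and SNP4 follow the text above; the two-SNP conditioning sets for SNP5 and SNP6 are assumed, since the text does not name them. The names `is_independent` and `prune` are likewise illustrative.

```python
from itertools import combinations

# Hypothetical oracle standing in for the regression-based tests.
# SNP7 and SNP4 follow the worked example; the sets for SNP5 and SNP6
# are assumptions made only so that the sketch can run.
SEPARATING_SETS = {
    "SNP7": frozenset(),                  # marginally independent of Y
    "SNP4": frozenset({"SNP1"}),          # independent of Y given SNP1
    "SNP5": frozenset({"SNP1", "SNP2"}),  # assumed conditioning set
    "SNP6": frozenset({"SNP2", "SNP3"}),  # assumed conditioning set
}


def is_independent(snp, given):
    """True if the oracle says `snp` is independent of Y given `given`."""
    sep = SEPARATING_SETS.get(snp)
    return sep is not None and sep == frozenset(given)


def prune(candidates):
    """Remove SNPs step by step, increasing the regression order each pass."""
    remaining = list(candidates)
    order = 1  # number of SNPs entering the regression model together
    while order <= len(remaining):
        for snp in list(remaining):
            others = [s for s in remaining if s != snp]
            # Condition on every subset of (order - 1) other candidates.
            if any(is_independent(snp, cond)
                   for cond in combinations(others, order - 1)):
                remaining.remove(snp)
        order += 1
    return remaining


selected = prune([f"SNP{i}" for i in range(1, 8)])
print(selected)
```

The loop stops once the regression order exceeds the number of remaining candidates, which mirrors the termination condition described for the four-variable model.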

Embodiment 2

[0057] As shown in Figure 3, Embodiment 2 of the present invention provides a causal GWAS method for whole-genome fine mapping of pathogenic SNPs, comprising the following steps:

[0058] (1) First determine whether each SNP in the genome is marginally independent of the outcome Y. A univariate regression model (such as a linear or logistic regression model) is used to perform genome-wide association analysis on the samples. Based on the results, SNPs whose P values fall below a chosen threshold (for example, on the order of 10^-8) are retained, and the selected SNPs are defined as the candidate gene set S0, namely:

[0059] S0 = {SNP01, SNP02, ..., SNP0J}

[0060] The SNPs in S0 are sorted by P value in ascending order.
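The screening step can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: it assumes a continuous outcome and simple linear regression, approximates the P value of the slope with a normal (large-sample) distribution, and runs on simulated genotypes. The function names, effect sizes, allele frequency, and the threshold value passed to `screen` are all assumptions chosen for the example.

```python
import math
import random


def slope_p_value(x, y):
    """Two-sided P value for the slope of y ~ x (normal approximation)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    beta = sxy / sxx
    # Residual sum of squares of the simple regression is syy - beta * sxy.
    se = math.sqrt((syy - beta * sxy) / (n - 2) / sxx)
    return math.erfc(abs(beta / se) / math.sqrt(2))


def screen(snps, y, threshold):
    """Return (name, P) pairs with P < threshold, sorted by ascending P."""
    p_values = {name: slope_p_value(x, y) for name, x in snps.items()}
    hits = [(name, p) for name, p in p_values.items() if p < threshold]
    return sorted(hits, key=lambda item: item[1])


def genotype(maf=0.3):
    """Simulated genotype: number of minor alleles (0, 1, or 2)."""
    return sum(random.random() < maf for _ in range(2))


# Simulated data (assumption): SNP1 and SNP3 affect Y, the rest do not.
random.seed(0)
n = 4000
snps = {f"SNP{j}": [genotype() for _ in range(n)] for j in range(1, 7)}
y = [0.8 * a + 0.4 * b + random.gauss(0, 1)
     for a, b in zip(snps["SNP1"], snps["SNP3"])]

S0 = screen(snps, y, threshold=1e-8)  # illustrative threshold
print([name for name, _ in S0])
```

The first element of `S0` plays the role of SNP01, the SNP with the smallest P value, which the next step holds fixed.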

[0061] (2) Fix the SNP with the smallest P value in S0, denoted SNP01; the remaining SNPs constitute the subset SNP0j (j = 2, ..., J) of S0.

[0062] SNP01 and each SNP0j (j = J, ..., 2) are entered simultaneously into a regression analysis on the outcome Y (for example, u...
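A two-SNP regression of this kind can be sketched as below. The sketch assumes a continuous outcome and ordinary least squares with a normal approximation for the P values; the source text is truncated here, so the model family and decision rule used in the patent are not reproduced. In the simulated data (an assumption), `snp01` is causal and `snp0j` is only a correlated proxy for it, so the coefficient of `snp0j` should carry little signal once `snp01` is in the model.

```python
import math
import random


def two_predictor_p_values(x1, x2, y):
    """OLS fit of y ~ x1 + x2; returns two-sided P values for both slopes."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [b - my for b in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    syy = sum(b * b for b in cy)
    det = s11 * s22 - s12 ** 2          # zero only if x1 and x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det  # slope of x1 adjusted for x2
    b2 = (s11 * s2y - s12 * s1y) / det  # slope of x2 adjusted for x1
    sigma2 = (syy - b1 * s1y - b2 * s2y) / (n - 3)
    se1 = math.sqrt(sigma2 * s22 / det)
    se2 = math.sqrt(sigma2 * s11 / det)

    def to_p(z):
        return math.erfc(abs(z) / math.sqrt(2))

    return to_p(b1 / se1), to_p(b2 / se2)


def genotype(maf=0.3):
    """Simulated genotype: number of minor alleles (0, 1, or 2)."""
    return sum(random.random() < maf for _ in range(2))


# Simulated data (assumption): snp0j copies snp01 90% of the time.
random.seed(1)
n = 4000
snp01 = [genotype() for _ in range(n)]
snp0j = [g if random.random() < 0.9 else genotype() for g in snp01]
y = [0.8 * g + random.gauss(0, 1) for g in snp01]

p_snp01, p_snp0j = two_predictor_p_values(snp01, snp0j, y)
print(p_snp01, p_snp0j)
```

Following the logic of Embodiment 1, an SNP whose adjusted coefficient is not significant would be treated as conditionally independent of Y given SNP01 and dropped from the candidate set; the significance level for that decision is left to the analyst in this sketch.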

Embodiment 3

[0068] Embodiment 3 of the present invention provides a causal association analysis system for fine mapping of pathogenic SNPs in the whole genome, including:

[0069] The data acquisition module is configured to: acquire genome data to be analyzed;

[0070] The causal GWAS module, configured as:

[0071] Perform genome-wide association analysis on the genomic data using a univariate regression model, screen for significant SNPs with P values lower than the preset threshold, define the selected SNPs as the first candidate gene set (i.e., S0), and sort the SNPs in the first candidate gene set by P value in ascending order;

[0072] Fix the SNP with the smallest P value in the first candidate gene set, SNP01; the remaining SNPs constitute the first candidate gene subset. Perform two-variable regression analysis on the outcome with SNP01 and each SNP in the first candidate gene subset in turn, calculate the conditional independence between the two SNPs ...

Abstract

The invention provides a causal association analysis method for fine mapping of pathogenic SNPs across the whole genome. It is used to fine-map the pathogenic SNPs of complex human diseases and to reduce the false positive rate of GWAS results. Guided by a causal inference framework, a causal GWAS analysis strategy (the CDSFM algorithm) is constructed for whole-genome fine mapping of causal loci. Under the constraint of a specific causal graph model, a stepwise conditional-independence adjustment strategy reduces the false positive rate, improves the true positive rate, and raises the hit rate of causal SNP capture to 90% or above, so the detection efficiency is relatively high.

Description

Technical field

[0001] The invention relates to the technical field of biological genetic engineering, in particular to a causal association analysis method for fine mapping of pathogenic SNPs in the whole genome.

Background

[0002] The statements in this section merely provide background art related to the present invention and do not necessarily constitute prior art.

[0003] Since the Genome-Wide Association Study (GWAS) method was proposed, more than 140,000 SNPs throughout the genome have been found to be statistically associated with more than 4,000 common diseases. However, few of these SNPs have been functionally verified in the laboratory, which makes it difficult to elucidate their genetic mechanisms. Such a high false positive rate complicates subsequent verification and also leads non-geneticists to doubt the results of GWAS. How to further fine-map the true pathogenic SNPs and reduce the false positive rate is an issue widely discuss...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B20/20, G16B40/00
CPC: G16B20/20, G16B40/00, Y02A90/10
Inventor: 薛付忠, 孙晓茹, 李洪凯, 杨帆
Owner: SHANDONG UNIV