Method for filtering SNPs (single nucleotide polymorphisms) unrelated to complex diseases from whole-genome data

A genome-wide, disease-oriented SNP filtering technique in the field of data processing. It addresses problems of existing filters such as the loss of high-order SNP interactions and the complex, large-scale computation required during filtering.

Inactive Publication Date: 2013-10-23
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

[0005] 1. The order of SNP interactions that can be processed is very limited. Exhaustive enumeration, for example, requires a huge amount of computation, so only pairs of SNPs are enumerated and scored. As a result, only second-order SNP interactions (that is, interactions between two SNPs) can be retained, and the higher order SNP interactions
[0006] 2. The scale of SNPs that can be process


Examples


Embodiment Construction

[0026] Referring to Figure 1 and Figure 2, the method of the present invention is called the FGSA method, and its specific implementation steps are as follows:

[0027] Step 1, preprocessing and initializing the SNP data.

[0028] (1.1) According to the principle that a variation in either allele of a homologous chromosome pair can be treated as having the same impact on the disease, the SNP data is processed into N labeled samples (x_i, y_i), where x_i ∈ {0, 1, 2, 3}^d holds the values of the d SNP sites of sample i. A site takes the value 1 when its two alleles are homozygous AA, 2 when they are homozygous aa, 3 when they are heterozygous Aa or aA, and 0 when the allele data for that site is missing. y_i ∈ {1, 2} is the class label of sample x_i, where 1 represents the disease group and 2 represents the control group. N represents the number of samples in the SNP data and d represents the number of SNPs in the data. The processed data therefore contains only the values 0, 1, 2 and 3, where 0 represents missing data, the set of SNPs involved is denot...
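The encoding rule in (1.1) can be illustrated with a minimal Python sketch. The function names `encode_genotype` and `encode_sample`, and the representation of a genotype as a two-character string with `None` for missing data, are assumptions made for illustration only; the patent text does not specify an input format.

```python
# Hypothetical helper names and input format; only the 0/1/2/3 coding
# itself comes from step (1.1).
GENOTYPE_CODES = {
    "AA": 1,  # homozygous AA
    "aa": 2,  # homozygous aa
    "Aa": 3,  # heterozygous, either allele order
    "aA": 3,
}


def encode_genotype(genotype):
    """Map one genotype string to its code; missing data becomes 0."""
    if not genotype:  # None or empty string is treated as missing
        return 0
    return GENOTYPE_CODES[genotype]


def encode_sample(genotypes):
    """Encode all SNP sites of one sample into a list of codes in {0, 1, 2, 3}."""
    return [encode_genotype(g) for g in genotypes]


if __name__ == "__main__":
    print(encode_sample(["AA", "aa", "Aa", "aA", None]))
```

Because Aa and aA share the code 3, the two heterozygous orderings are treated identically, which matches the stated principle that a variation in either allele has the same impact.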


Abstract

A method for filtering SNPs (single nucleotide polymorphisms) unrelated to complex diseases from whole-genome data, for use in research on the pathogenic mechanisms of complex diseases, early diagnosis, and biopharmaceutical development. The method comprises the following steps: (1) preprocessing and initializing the SNP data, converting it into data containing only the values 0, 1, 2 and 3 according to the principle that a variation in either allele of a homologous chromosome pair can be treated as having the same effect on the disease; (2) defining the relevance measure, namely defining the relevance I(Y;X) between an SNP subset X and the disease Y as the mutual information MI(Y;X) between X and Y; (3) searching the SNP set for candidate suspected pathogenic SNP groups using the FGSA (factor based genetic search algorithm) method; (4) selecting, from the set of candidate suspected pathogenic SNP groups, the SNP groups whose frequency of occurrence exceeds a threshold, according to the frequentness-relevance priority criterion; (5) outputting the top-ranked SNPs whose frequentness exceeds the threshold. The method can retain SNPs corresponding to a pathogenic mechanism that is masked by other pathogenic mechanisms, laying a foundation for the subsequent discovery of pathogenic mechanisms.
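The relevance measure in step (2) can be sketched as a plain empirical estimate of mutual information. The function name `mutual_information`, the use of base-2 logarithms, and the choice to treat the missing-data code 0 as an ordinary value are assumptions of this sketch, not details confirmed by the text above.

```python
from collections import Counter
from math import log2


def mutual_information(samples, labels, subset):
    """Empirical MI(Y;X) in bits between labels Y and the SNP columns in `subset`.

    samples: list of encoded genotype vectors (values in {0, 1, 2, 3})
    labels:  list of class labels (1 = disease, 2 = control)
    subset:  indices of the SNPs that form the subset X
    """
    n = len(samples)
    joint = Counter()     # counts of (X value, Y value) pairs
    x_counts = Counter()  # counts of X values
    y_counts = Counter()  # counts of Y values
    for x, y in zip(samples, labels):
        key = tuple(x[j] for j in subset)
        joint[(key, y)] += 1
        x_counts[key] += 1
        y_counts[y] += 1
    mi = 0.0
    for (key, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (x_counts[key] * y_counts[y]))
    return mi


if __name__ == "__main__":
    # Toy data: disease (label 1) occurs only when the two SNPs differ.
    samples = [(1, 1), (1, 2), (2, 1), (2, 2)]
    labels = [2, 1, 1, 2]
    print(mutual_information(samples, labels, [0]))     # single SNP
    print(mutual_information(samples, labels, [0, 1]))  # SNP pair
```

In this toy data set each SNP alone carries no information about the label, while the pair determines it completely. This is the kind of high-order interaction that a filter scoring SNPs one at a time would discard.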

Description

Technical field

[0001] The invention belongs to the technical field of data processing. Specifically, it proposes a method for filtering SNPs irrelevant to complex diseases from whole-genome single nucleotide polymorphism (SNP) data, which can be used for research on the pathogenic mechanisms of complex diseases, early diagnosis, and biopharmaceutical development.

Background

[0002] Complex diseases are caused by multiple genetic and environmental factors, and their occurrence and development are affected by multiple genes acting in a complex network structure. Complex diseases differ from Mendelian genetic diseases. In most cases there is no major gene that is sufficient to cause the disease, and the effect of a single gene on the disease may be negligible or even non-existent, yet the combined effect of several such individually negligible genes may be the cause of a complex disease. These charac...


Application Information

IPC(8): G06F19/18
Inventors: 张军英, 刘丹, 赵晓雪, 谭芳慧
Owner: XIDIAN UNIV