Compression based disease gene fast analysis algorithm

A technology for the rapid analysis of disease-causing genes, applied in genomics, computing, special data processing applications, etc., that offers good adaptability and improves operating efficiency as well as the efficiency of fragment detection and computation.

Inactive Publication Date: 2018-05-01
TIANJIN UNIVERSITY OF SCIENCE AND TECHNOLOGY

AI Technical Summary

Problems solved by technology

Although the FADG algorithm is a substantial improvement over the P&F algorithm, considerable room for improvement remains in its fitting-error accuracy and its time efficiency.


Examples

Embodiment Construction

[0040] Embodiments of the present invention will be described in further detail below in conjunction with the accompanying drawings.

[0041] The design concept of the present invention is as follows. First, taking into account the two-allele (dimorphic) nature of SNPs, the SNP genotypes are converted into binary data. The idea of data compression is then introduced so that multiple SNP sites are analyzed at the same time, which reduces the number of site-by-site comparisons and improves the efficiency of the algorithm. Among two or more individuals, a DNA fragment that has the same nucleotide sequence is said to be identical by state (IBS); if an IBS fragment is inherited from the same ancestor and no recombination event has occurred in between, the fragment is said to be identical by descent (IBD). One of the most important applications of IBD fragment detection is quantifying the association between genetic loci and diseases. IBD mapping is similar to linkage analysis, but can...
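The binary conversion and the simultaneous comparison of multiple SNP sites described above can be illustrated with a minimal Python sketch. The excerpt does not give the patent's actual encoding or compression scheme, so everything here is an assumption made for illustration: the two-bit genotype code in `GENOTYPE_BITS`, the helper names `pack_genotypes` and `count_ibs_sites`, and the simplification that two individuals are IBS at a site only when their genotypes are identical.

```python
from typing import List

# Hypothetical encoding (an assumption, not taken from the patent text):
# each biallelic SNP genotype becomes two bits, one per allele copy,
# with 0 = allele A and 1 = allele B.
GENOTYPE_BITS = {"AA": 0b00, "AB": 0b01, "BB": 0b11}


def pack_genotypes(genotypes: List[str]) -> int:
    """Pack a list of genotypes into one integer, two bits per SNP site."""
    packed = 0
    for g in genotypes:
        packed = (packed << 2) | GENOTYPE_BITS[g]
    return packed


def count_ibs_sites(a: int, b: int, n_sites: int) -> int:
    """Count sites where two packed individuals carry the same genotype.

    One XOR compares all sites at once instead of looping site by site.
    """
    diff = a ^ b  # a non-zero bit pair marks a site that differs
    # Mask with a 1 in the low bit of every 2-bit pair, e.g. 0b010101.
    low_mask = (4 ** n_sites - 1) // 3
    # Fold the high bit of each pair onto its low bit, then keep low bits.
    differing = (diff | (diff >> 1)) & low_mask
    return n_sites - bin(differing).count("1")


person_1 = pack_genotypes(["AA", "AB", "BB", "AB"])
person_2 = pack_genotypes(["AA", "BB", "BB", "AA"])
print(count_ibs_sites(person_1, person_2, 4))
```

In this sketch, a single XOR over the packed integers replaces one comparison per site, which is the kind of saving the design concept points to. A real implementation would likely use fixed-width machine words and a different similarity measure.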


Abstract

The invention relates to a compression-based fast analysis algorithm for disease genes. Its main technical features are as follows: SNP (single nucleotide polymorphism) genotype data are converted to binary; the resulting binary sequence is compressed; an evaluation criterion for similarity among samples is determined; locus scores for the samples are obtained according to that criterion, and a threshold is determined so as to obtain candidate IBD (identity by descent) fragments; case-group and control-group samples are selected for case-control analysis, and the differences in evaluation values between case/case pairs and control/control pairs are compared; and associations between SNPs and diseases are identified. The algorithm is reasonably designed, greatly shortens the running time of experimental analysis, improves efficiency, can locate disease genes accurately, and is widely applicable to association studies between common diseases and genes.
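The thresholding and case-control comparison steps named in the abstract might look roughly like the following Python sketch. The excerpt does not specify the scoring function, how the threshold is chosen, or the exact comparison statistic, so the function names `candidate_segments` and `mean_difference`, the use of maximal runs of loci at or above the threshold, and the difference of means are all assumptions made for illustration.

```python
from typing import List, Tuple


def candidate_segments(scores: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of maximal runs of
    consecutive loci whose score is at least `threshold`.

    Treating such runs as candidate IBD fragments is an assumption.
    """
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            segments.append((start, i))  # the run ended at the previous locus
            start = None
    if start is not None:
        segments.append((start, len(scores)))  # run reaches the last locus
    return segments


def mean_difference(case_pairs: List[float], control_pairs: List[float]) -> float:
    """Mean case/case pair score minus mean control/control pair score."""
    return sum(case_pairs) / len(case_pairs) - sum(control_pairs) / len(control_pairs)


locus_scores = [0, 3, 4, 1, 5, 5, 5]
print(candidate_segments(locus_scores, threshold=3))
print(mean_difference([4.0, 6.0], [1.0, 3.0]))
```

Under these assumptions, a region where case/case pairs score clearly higher than control/control pairs would be flagged as potentially associated with the disease. Any real analysis would also need a significance test, which is omitted here.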

Description

Technical field

[0001] The invention belongs to the technical field of biological information processing, and in particular relates to a compression-based fast analysis algorithm for disease genes (FADG-C, Fast analysis of disease gene based on compression).

Background technique

[0002] A single nucleotide polymorphism (SNP) is a polymorphism in which a single nucleotide varies in the DNA sequence among different individuals of a given organism. SNPs are an extremely abundant form of variation in the genome, accounting for more than 90% of the genetic polymorphisms in the human genome. SNPs differ from rare variants: in general, a variant with a population frequency of 1% or less is called a mutation, and it is called a single nucleotide polymorphism only when its frequency is greater than 1%.

[0003] The many characteristics of SNPs make them suitable for the genetic dissection of complex traits and diseases, as well as population-bas...
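The 1% frequency rule stated in the background can be written as a small Python check. This is only an illustrative sketch: the function name `classify_variant` and its inputs (a count of the less common allele and the total number of alleles sampled) are assumptions, not part of the patent.

```python
def classify_variant(minor_allele_count: int, total_alleles: int) -> str:
    """Label a single-nucleotide variant by its population frequency.

    Frequency above 1% gives "SNP"; 1% or less gives "mutation".
    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    # minor_allele_count / total_alleles > 0.01, rewritten without division
    if minor_allele_count * 100 > total_alleles:
        return "SNP"
    return "mutation"


print(classify_variant(20, 1000))  # frequency 2%
print(classify_variant(5, 1000))   # frequency 0.5%
```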


Application Information

IPC(8): G06F19/18
CPC: G16B20/00
Inventors: 孙志伟, 贾洪川, 马永军, 蔡润身
Owner: TIANJIN UNIVERSITY OF SCIENCE AND TECHNOLOGY