Multi-task high-order SNP epistasis detection method and system, storage medium and equipment

A multi-task detection method, applied in the fields of proteomics, instrumentation, genomics, etc., that addresses the high computational complexity of existing detection methods, the insufficient success rate of detection algorithms, and inaccurate search results. It improves global search ability and addresses low identification accuracy.

Pending Publication Date: 2021-03-05
XIAN UNIV OF POSTS & TELECOMM
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

The advantage is that no possibly pathogenic SNP combination is missed. The disadvantage is that the amount of computation is huge: when k>3, the computation cannot be completed within a practical time.
[0008] 2. Stochastic search (random search): searching the solution space by random sampling greatly reduces the amount of computation, but the success rate is low, and the search favors SNP epistatic combinations with marginal effects.
The disadvantage is that it is difficult to find high-order SNP epistatic combinations with no marginal effect (or with a low marginal effect).
The advantage of machine learning is that it can evaluate and compare SNP combinations of any order. However, for high-order SNP combinations, the recognition accuracy is very low and the computational cost is very high.
The existing detection methods for high-order SNP epistatic combinations mainly have the following deficiencies: (1) They rely too heavily on SNP epistasis (causation) models, so the detection methods favor certain simulation models and are difficult to apply to unknown models.
In particular, for real complex disease data sets, it is difficult to provide an effective detection method.
(2) The P-value threshold used by statistical test methods is set manually, resulting in low sensitivity of the test results.
(3) Most existing swarm intelligence search algorithms use a single correlation evaluation function or several similar ones, resulting in inaccurate search results that miss the true pathogenic SNP epistatic combination.
(4) For data containing multiple pathogenic SNP combinations, the detection ability is low.
[0020] (1) The computational complexity of the detection method is very high, or the real SNP epistatic combination is easily missed.
[0021] (2) The sensitivity of the detection results is not high, and the versatility is very low.
[0022] (3) The detection method has a preference for particular SNP epistasis models, and the success rate of the detection algorithm is not high enough; the single-task detection method adopted requires repeated trials for unknown diseases, which results in a large amount of computation and is not conducive to heuristic search.
[0024] (1) Existing detection methods are computationally complex, or easily miss the real SNP epistatic combination.
[0025] (2) The sensitivity of the detection results of existing detection methods is not high, and the versatility is very low.
[0026] (3) Existing detection methods have a preference for particular SNP epistasis models, and the success rate of the detection algorithm is not high enough; the single-task detection method adopted requires repeated trials for unknown diseases, which results in a large amount of computation and is not conducive to heuristic search.
[0028] (1) The number of loci in the human genome is huge, and the number of combinations grows exponentially. Existing computers and methods cannot perform association detection of k-order (k>2) SNP combinations within a limited time, and there is no effective method to quickly discover possible k-order SNP epistatic combinations.
[0029] (2) SNP epistasis models are rich and diverse, such as the main effect + interaction model and the no main effect + interaction model. A single method cannot correctly identify all SNP epistasis models and tends to favor particular epistasis models.
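
To make the scale described in [0028] concrete, the following minimal sketch counts how many k-order SNP combinations an exhaustive scan would have to evaluate. The data set size of 100,000 SNPs and the function name are illustrative assumptions, not values taken from the patent.

```python
from math import comb


def count_snp_combinations(num_snps: int, order: int) -> int:
    """Number of distinct unordered SNP combinations of the given order."""
    return comb(num_snps, order)


if __name__ == "__main__":
    num_snps = 100_000  # assumed data set size, for illustration only
    for k in range(2, 6):
        total = count_snp_combinations(num_snps, k)
        print(f"k={k}: {total:.3e} combinations")
```

Each increase in k multiplies the count by roughly num_snps / k, which is why exhaustive evaluation quickly becomes impractical as the order rises.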

Examples

Detailed Description of Embodiments

[0083] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the examples. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0084] Aiming at the problems existing in the prior art, the present invention provides a multi-task high-order SNP epistasis detection method, system, storage medium, and equipment. The present invention will be described in detail below with reference to the accompanying drawings.

[0085] As shown in Figure 1, the multi-task high-order SNP epistasis detection method provided by the present invention comprises the following steps:

[0086] S101: Use PLINK software to read PED and MAP format data from the VCF file, convert it into binary format files (FAM, BED, BIM), and arrange it into a sample matrix;

[0087] S102: Set the parame...
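
The sample matrix mentioned in S101 can be pictured with a small sketch. The patent performs this step with PLINK; the pure-Python parser below is only a stand-in that reads PED-style text lines (six leading columns, then two alleles per SNP). The additive 0/1/2 minor-allele coding, the use of -1 for missing values, and the function name `ped_to_sample_matrix` are assumptions made for illustration, since the source does not state the encoding.

```python
from collections import Counter
from typing import List, Optional, Tuple


def ped_to_sample_matrix(ped_lines: List[str]) -> Tuple[List[List[int]], List[int]]:
    """Return (genotypes, labels).

    genotypes[i][j] is the minor-allele count (0, 1, 2, or -1 if missing)
    of sample i at SNP j; labels[i] is 1 for a case, 0 for a control,
    and -1 for an unknown phenotype.
    """
    rows = [line.split() for line in ped_lines if line.strip()]
    num_snps = (len(rows[0]) - 6) // 2

    # Pick the less frequent observed allele at each SNP as the minor allele.
    minor: List[Optional[str]] = []
    for j in range(num_snps):
        counts = Counter(
            a for r in rows for a in r[6 + 2 * j: 8 + 2 * j] if a != "0"
        )
        ranked = counts.most_common()
        minor.append(ranked[-1][0] if len(ranked) > 1 else None)

    genotypes, labels = [], []
    for r in rows:
        coded = []
        for j in range(num_snps):
            a1, a2 = r[6 + 2 * j], r[7 + 2 * j]
            if "0" in (a1, a2):
                coded.append(-1)  # missing genotype
            else:
                coded.append((a1 == minor[j]) + (a2 == minor[j]))
        genotypes.append(coded)
        # Column 6 is the phenotype: 2 = case, 1 = control in PED files.
        labels.append({"2": 1, "1": 0}.get(r[5], -1))
    return genotypes, labels
```

Each row of the result is one sample and each column one SNP site, so a k-order combination is simply a choice of k column indices.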

Abstract

The invention belongs to the technical field of single nucleotide polymorphism epistasis detection, and discloses a multi-task high-order SNP epistasis detection method and system, a storage medium and equipment. The multi-task high-order SNP epistasis detection method comprises the following steps: reading PED and MAP format data from a VCF file by using PLINK software, converting the data into binary format files, and arranging them into a sample matrix; setting search algorithm parameters according to the number of SNP sites and the sample size in the data; reading in the SNP sample data and preparing the first-stage search; and carrying out high-order SNP epistatic combination detection by using a multi-task, multi-harmony-memory-bank harmony search (HS) algorithm. The multi-task harmony search detection method provided by the invention adopts a plurality of harmony memory banks, each storing SNP combinations of a different order, and applies a multi-task technique, so that high-order SNP epistasis detection for several different orders can be carried out at the same time. This promotes mutual learning among individuals in the population, enhances population diversity, and further improves the global search capability.
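
The idea of keeping one harmony memory bank per order can be illustrated with a minimal sketch. This is not the patented algorithm: the fitness function (majority-vote accuracy over joint genotype cells), the way banks share SNPs with one another, all parameter values, and every function name below are assumptions chosen to keep the example short. It expects genotypes coded as small integers and labels coded as 0/1.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Combo = Tuple[int, ...]
Bank = List[Tuple[float, Combo]]


def score(combo: Combo, geno: Sequence[Sequence[int]], labels: Sequence[int]) -> float:
    """Assumed fitness: fraction of samples classified correctly when each
    joint-genotype cell predicts its majority class (labels must be 0 or 1)."""
    cells = defaultdict(lambda: [0, 0])
    for row, y in zip(geno, labels):
        cells[tuple(row[j] for j in combo)][y] += 1
    return sum(max(c) for c in cells.values()) / len(labels)


def improvise(order: int, banks: Dict[int, Bank], num_snps: int,
              hmcr: float, par: float, rng: random.Random) -> Combo:
    """Build one new harmony (SNP combination) of the requested order."""
    # Assumed form of knowledge sharing: SNPs stored in ANY bank may be reused.
    donors = [snp for bank in banks.values() for _, combo in bank for snp in combo]
    chosen = set()
    while len(chosen) < order:
        if rng.random() < hmcr:               # memory consideration
            snp = rng.choice(donors)
            if rng.random() < par:            # pitch adjustment: neighbouring SNP
                snp = (snp + rng.choice((-1, 1))) % num_snps
        else:                                 # random selection
            snp = rng.randrange(num_snps)
        chosen.add(snp)
    return tuple(sorted(chosen))


def multi_bank_harmony_search(geno: Sequence[Sequence[int]], labels: Sequence[int],
                              orders: Sequence[int] = (2, 3), hms: int = 5,
                              iterations: int = 200, hmcr: float = 0.9,
                              par: float = 0.3, seed: int = 0) -> Dict[int, Tuple[float, Combo]]:
    """Return the best (score, combination) found for each order."""
    rng = random.Random(seed)
    num_snps = len(geno[0])
    banks: Dict[int, Bank] = {}
    for k in orders:                          # one memory bank per order
        combos = [tuple(sorted(rng.sample(range(num_snps), k))) for _ in range(hms)]
        banks[k] = [(score(c, geno, labels), c) for c in combos]
    for _ in range(iterations):
        for k in orders:                      # every order is searched each round
            cand = improvise(k, banks, num_snps, hmcr, par, rng)
            if any(cand == c for _, c in banks[k]):
                continue
            s = score(cand, geno, labels)
            worst = min(range(hms), key=lambda i: banks[k][i][0])
            if s > banks[k][worst][0]:        # replace the worst harmony
                banks[k][worst] = (s, cand)
    return {k: max(bank) for k, bank in banks.items()}
```

Calling `multi_bank_harmony_search(geno, labels, orders=(2, 3))` on a sample matrix returns one best-scoring combination per order. Drawing donor SNPs from all banks is one simple way to let the searches for different orders inform each other; the patent's actual multi-task mechanism may differ.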

Description

Technical field

[0001] The invention belongs to the technical field of single nucleotide polymorphism epistasis detection, and in particular relates to a multi-task high-order SNP epistasis detection method, system, storage medium and equipment.

Background

[0002] At present, a SNP (single nucleotide polymorphism) is a polymorphism caused by variation at a single base site at the genome level. It may be a single-base transition or transversion, or it may result from the insertion or deletion of bases. If a base pair C-G in sequence 1 appears as A-T in sequence 2, this site is called a SNP site. In the whole human genome, there are more than 3 million such SNP sites. Under normal circumstances, most SNPs pose no threat to human health, but some SNP variation sites are closely related to human health. Epistatic effect: indicates the interaction between genes or SNPs, traditionally defined as an allele...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B20/20, G16B50/00
CPC: G16B20/20, G16B50/00
Inventor: 拓守恒, 刘凡, 李超
Owner: XIAN UNIV OF POSTS & TELECOMM