Memory optimization method for genotype imputation calculation in low-depth sequenced populations

A memory optimization technology for low-depth sequencing, applied in computing, genomics, program control design, etc. It addresses problems such as degrees of linkage disequilibrium that differ between groups, loss of imputation accuracy, and failure to take into account the data dependencies of the auxiliary variable matrices used in the calculation. Its effects are reduced memory consumption, improved efficiency, and the elimination of data dependencies.

Pending Publication Date: 2021-12-03
GENETALKS BIO TECH CHANGSHA CO LTD

AI Technical Summary

Problems solved by technology

[0020] 3. Although the auxiliary variables α_t(i) and β_t(i) introduced in calculating the above probability effectively reduce the time complexity of the calculation, they place very high demands on computer memory. A simple calculation shows that, for a chromosome containing 3 million SNP sites with 30 ancestral haplotypes, storing a single α_t(i) matrix consumes about 20 GB of memory, and the memory consumed by a single sample in step E exceeds 60 GB. In practical applications this memory limit prevents the method from processing multiple samples in parallel, so it cannot make full use of the CPU computing resources of many-core servers.
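The 20 GB figure can be checked with a short calculation. The sketch below assumes, as an illustration only, that the matrix holds one 8-byte floating-point value per SNP site and per ordered pair of ancestral haplotypes (T × K² entries); the function name and the `bytes_per_value` parameter are hypothetical and do not come from the patent text.

```python
def aux_matrix_bytes(num_snps, num_haplotypes, bytes_per_value=8):
    """Bytes needed for one auxiliary matrix with T x K^2 entries.

    Assumption: one value per SNP site and per ordered pair of
    ancestral haplotypes, stored as an 8-byte float by default.
    """
    return num_snps * num_haplotypes ** 2 * bytes_per_value


single_matrix = aux_matrix_bytes(3_000_000, 30)
single_matrix_gib = single_matrix / 2 ** 30  # binary gigabytes

print(f"{single_matrix} bytes, about {single_matrix_gib:.1f} GiB")
```

Under these assumptions a single matrix takes 21.6 billion bytes, which is in line with the 20 GB stated above. Holding several such matrices at once for one sample would quickly exhaust the memory of a typical server.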
[0021] From the above background, it can be seen that the memory consumed by the auxiliary variable matrices of a single sample is proportional to the product of the number of SNPs T and the square of the number of ancestral haplotypes, K². Considering the locality of haplotype regions, some practitioners have proposed a memory-reduction method that divides the SNP set into N consecutive, non-overlapping subsets, each containing the same number of SNPs, and then applies the above algorithm to each subset separately. This method does not take into account the data dependencies between adjacent columns when the auxiliary variable matrix is calculated, which in turn reduces the accuracy of genotype imputation.
[0022] An improvement proposed by another practitioner allows two adjacent sets to overlap by a chromosome interval of fixed length when the SNP set is divided. Although this mitigates the defect of the previous method to some extent, the degree of linkage disequilibrium differs between groups, so a manually chosen fixed-length interval still inevitably leads to a loss of imputation accuracy.
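The two earlier partitioning schemes can be sketched as index ranges over the SNP sites. This is only an illustration: `split_sites` and its parameters are hypothetical names, the overlap is expressed as a number of SNPs rather than a chromosome length, and the last range may be shorter than the others when the site count is not a multiple of the subset size.

```python
def split_sites(num_snps, subset_size, overlap=0):
    """Return half-open (start, end) index ranges covering 0..num_snps-1.

    overlap=0 gives consecutive, non-overlapping subsets; overlap>0
    extends each subset by that many SNPs on both sides, so that
    neighbouring subsets share a fixed-length region.
    """
    ranges = []
    for start in range(0, num_snps, subset_size):
        lo = max(0, start - overlap)
        hi = min(num_snps, start + subset_size + overlap)
        ranges.append((lo, hi))
    return ranges


disjoint = split_sites(10, 4)                # first scheme: no overlap
overlapping = split_sites(10, 4, overlap=1)  # second scheme: fixed overlap
```

In both schemes each range is processed as if it were an independent chromosome, so the recursion restarts at every boundary. That restart is the source of the lost column-to-column dependency described above.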

Examples

Detailed Description of the Embodiments

[0061] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0062] As shown in Figure 1, the memory optimization method for genotype imputation calculation in low-depth sequenced populations of the present invention includes:

[0063] Partitioning the single nucleotide polymorphism (SNP) sites into blocks, and setting checkpoint values according to the blocks;

[0064] According to the set checkpoint values, calculating the forward auxiliary variable and the backward auxiliary variable block by block, and calculating the conditional probability of the haplotype observations.

[0065] In a specific application example, a first round of the forward and backward auxiliary variable algorithm is used to determine the checkpoints; on the basis of the checkpoint values, a second round of the forward and backward auxiliary variable algorithm is used to calculate, block by block, the forward auxiliary variable ...
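The two-round idea can be illustrated with a minimal sketch of a checkpointed forward-backward pass. This is one possible reading, not the patented procedure: it assumes a generic hidden Markov model with K states, equal-sized blocks, per-column normalisation, and checkpoints stored at the first column (forward) and last column (backward) of each block. All function and variable names are hypothetical.

```python
def normalise(v):
    s = sum(v)
    return [x / s for x in v]


def forward_step(alpha, trans, emit_t):
    # alpha_t(j) is proportional to emit_t[j] * sum_i alpha_{t-1}(i) * trans[i][j]
    k = len(alpha)
    return normalise([emit_t[j] * sum(alpha[i] * trans[i][j] for i in range(k))
                      for j in range(k)])


def backward_step(beta, trans, emit_next):
    # beta_t(i) is proportional to sum_j trans[i][j] * emit_next[j] * beta_{t+1}(j)
    k = len(beta)
    return normalise([sum(trans[i][j] * emit_next[j] * beta[j] for j in range(k))
                      for i in range(k)])


def posteriors_checkpointed(init, trans, emit, block):
    """Yield the posterior state distribution of each column, block by block."""
    n, k = len(emit), len(init)
    bounds = [(s, min(s + block, n) - 1) for s in range(0, n, block)]
    starts = {s for s, _ in bounds}
    ends = {e for _, e in bounds}

    # Round 1: full passes that keep only the checkpoint columns.
    alpha_cp, beta_cp = {}, {}
    alpha = normalise([init[j] * emit[0][j] for j in range(k)])
    for t in range(n):
        if t > 0:
            alpha = forward_step(alpha, trans, emit[t])
        if t in starts:
            alpha_cp[t] = alpha
    beta = [1.0] * k
    for t in range(n - 1, -1, -1):
        if t < n - 1:
            beta = backward_step(beta, trans, emit[t + 1])
        if t in ends:
            beta_cp[t] = beta

    # Round 2: rebuild each block from its checkpoints, then discard it.
    for s, e in bounds:
        alphas = [alpha_cp[s]]
        for t in range(s + 1, e + 1):
            alphas.append(forward_step(alphas[-1], trans, emit[t]))
        beta = beta_cp[e]
        post = [None] * (e - s + 1)
        for t in range(e, s - 1, -1):
            if t < e:
                beta = backward_step(beta, trans, emit[t + 1])
            post[t - s] = normalise([a * b for a, b in zip(alphas[t - s], beta)])
        yield from post


# Toy data: 2 states, 7 columns; emit[t][j] = P(observation t | state j).
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.3], [0.7, 0.4], [0.1, 0.9], [0.2, 0.6],
        [0.9, 0.2], [0.5, 0.5], [0.3, 0.7]]
blocked = list(posteriors_checkpointed(init, trans, emit, block=3))
unblocked = list(posteriors_checkpointed(init, trans, emit, block=len(emit)))
```

Because every block starts from values carried over the whole chromosome, the blocked result matches the unblocked one, unlike independent windows. In this sketch the peak storage is roughly two columns per block for the checkpoints plus one block of forward columns, at the cost of running the recursions twice.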

Abstract

The invention discloses a memory optimization method for genotype imputation calculation in low-depth sequenced populations, which comprises the following steps: partitioning the single nucleotide polymorphism (SNP) sites into blocks, and setting checkpoint values according to the blocks; and calculating a forward auxiliary variable and a backward auxiliary variable block by block according to the set checkpoint values, and calculating the conditional probability of the haplotype observations. The method has the advantages that its principle is simple, the memory consumed by the calculation is effectively reduced, and the efficiency of genotype imputation is improved.

Description

Technical field

[0001] The invention mainly relates to the technical fields of gene sequencing and bioinformatics analysis, in particular to a memory optimization method for genotype imputation calculation in low-depth sequenced populations.

Background

[0002] With the reduction of sequencing costs, large-scale low-depth whole-genome sequencing of populations has gradually become an indispensable technical means in population genetics research. By genotyping a large number of samples through low-depth sequencing, denser molecular markers than those from traditional chip genotyping and reduced-representation genome sequencing can be obtained at the same or even lower cost, thereby improving the power of GWAS mapping. However, because the sequencing depth is very low, this approach inevitably leaves the genotypes of a large number of single nucleotide polymorphism (SNP) sites missing, which in turn affects downstream genetic map construction, QTL mapping and population GWAS research. Therefore,...

Application Information

IPC(8): G06F9/50, G16B20/20
CPC: G06F9/5027, G16B20/20
Inventor: 蒋艳凰, 马丑贤, 王振国, 杨仁武, 毛海波, 黄立磊, 冯博伦
Owner: GENETALKS BIO TECH CHANGSHA CO LTD