A method for detecting long tandem repeats based on the Bionano platform

A technology for detecting tandem repeat sequences on the Bionano platform, applied in genomics, instrumentation, proteomics, etc. It addresses assembly errors, high computing resource consumption, and long computing time, achieving fast running speed, lower memory consumption, and improved accuracy.

Active Publication Date: 2022-02-22
BEIJING GRANDOMICS BIOTECH
9 Cites, 0 Cited by

AI Technical Summary

Problems solved by the technology

[0006] Detecting long tandem repeats with assembly-based algorithms has the following disadvantages: 1. Bionano data contain many insertion and deletion errors, which easily lead to assembly errors; 2. the computation time is long; 3. it consumes a large amount of computing resources; 4. it cannot accurately detect chimeric samples.

Examples

Embodiment 1

[0065] Embodiment 1. Building a machine learning model

[0066] a. Dataset

[0067] Using the HX1 data (see Shi, L., Guo, Y., Dong, C., Huddleston, J., Yang, H., Han, X. & Lintner, K.E. (2016). Long-read sequencing and de novo assembly of a Chinese genome. Nature Communications.), we constructed the Chinese reference genome and its Bionano optical map data, and aligned the Bionano data to the HX1 reference genome. Sites that could be aligned to the reference genome were defined as true positive sites, and detected sites that could not be aligned were defined as false positive sites. We then randomly selected 1000 true positive sites and 1000 false positive sites as the datasets for the two classes.
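The Abstract states that the Bionano data are filtered with a naive Bayes classifier trained on these two classes. The following is a minimal sketch of a Gaussian naive Bayes classifier in plain Python, assuming each site is described by a numeric feature vector and that each feature is normally distributed within a class; the class name, the `var_floor` parameter, and the toy feature values are hypothetical and not taken from the patent. Because the dataset is balanced (1000 sites per class), the learned priors are equal and the decision rests on the class-conditional likelihoods.

```python
import math
from statistics import mean, pvariance


class GaussianNaiveBayes:
    """Gaussian naive Bayes: features are treated as independent normals within each class."""

    def fit(self, X, y, var_floor=1e-9):
        self.classes = sorted(set(y))
        self.log_priors, self.means, self.variances = {}, {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            columns = list(zip(*rows))  # transpose: one tuple per feature
            self.log_priors[c] = math.log(len(rows) / len(X))
            self.means[c] = [mean(col) for col in columns]
            # A small additive floor keeps constant features from yielding zero variance.
            self.variances[c] = [pvariance(col) + var_floor for col in columns]
        return self

    def log_posterior(self, x, c):
        """Unnormalized log posterior: log prior plus summed Gaussian log-likelihoods."""
        total = self.log_priors[c]
        for value, mu, var in zip(x, self.means[c], self.variances[c]):
            total -= 0.5 * math.log(2 * math.pi * var) + (value - mu) ** 2 / (2 * var)
        return total

    def predict(self, x):
        return max(self.classes, key=lambda c: self.log_posterior(x, c))


# Hypothetical toy sites: [intensity, SNR, coverage]; 1 = true positive, 0 = false positive.
X = [[0.90, 12.0, 30], [0.80, 11.0, 28], [0.85, 13.0, 33],
     [0.30, 4.0, 6], [0.20, 3.5, 8], [0.25, 5.0, 5]]
y = [1, 1, 1, 0, 0, 0]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([0.82, 12.5, 29]))  # → 1
print(model.predict([0.22, 4.2, 7]))    # → 0
```

Sites predicted as class 0 would be the ones discarded as false-positive insertion or deletion calls before repeat counting.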

[0068] b. Feature selection

[0069] Based on the characteristics of Bionano data, the intensity, signal-to-noise ratio (SNR), and coverage of each site are weighted by the confidence of the reads aligned to the reference genome (formulas 1-4). At the same ...
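Formulas 1-4 are truncated in this excerpt, so the exact weighting is not known. As an illustration only, the sketch below assumes that a site's intensity and SNR are confidence-weighted means over the molecules aligned across it, and that its weighted coverage is the sum of those confidences; the `ReadObservation` schema and the `weighted_site_features` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadObservation:
    """One aligned molecule's measurements at a candidate site (hypothetical schema)."""
    confidence: float  # alignment confidence of the molecule to the reference
    intensity: float   # label intensity observed at the site
    snr: float         # signal-to-noise ratio observed at the site


def weighted_site_features(observations):
    """Return [weighted intensity, weighted SNR, weighted coverage] for one site.

    Assumed weighting: confidence-weighted means for intensity and SNR, and the
    sum of confidences as weighted coverage. Not the patent's formulas 1-4.
    """
    total = sum(o.confidence for o in observations)
    if total == 0:
        return [0.0, 0.0, 0.0]
    w_intensity = sum(o.confidence * o.intensity for o in observations) / total
    w_snr = sum(o.confidence * o.snr for o in observations) / total
    return [w_intensity, w_snr, total]


obs = [ReadObservation(20.0, 0.9, 12.0), ReadObservation(10.0, 0.6, 9.0)]
print(weighted_site_features(obs))  # → [0.8, 11.0, 30.0]
```

Weighting by confidence lets well-aligned molecules dominate a site's features, so poorly aligned molecules contribute less to the classification.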

Embodiment 2

[0102] Embodiment 2. Long tandem repeat sequence detection

[0103] 1. Experimental method

[0104] Lysis of human red blood cells (1 hour)

[0105] Quantification of white blood cells (5 min)

[0106] Embedding of white blood cells (~1 hour)

[0107] Digestion with proteinase K

[0108] Gel washing to immobilize DNA

[0109] DNA recovery

[0110] DNA dialysis and homogenization

[0111] Quantification of DNA concentration (10 μl, 2 hours 30 minutes)

[0112] Digestion of DNA with the BssSI enzyme (10 μl, 2 hours 30 minutes)

[0113] Labeling (15 μl, 1 hour 15 minutes)

[0114] Repair (20 μl, 45 minutes)

[0115] Staining (60 μl, 16 hours/overnight)

[0116] Quantitative processing on the Bionano Saphyr system

[0117] For experimental details, see:

[0118] https://bionanogenomics.com/wp-content/uploads/2017/03/30033-Rev-C-Bionano-Prep-Blood-DNA-Isolation-Protocol.pdf;

[0119] https://bionanogenomics.com/wp-content/uploads/2017/07/30024-Rev-J-Bionano-Prep-Labeling-NLRS-Protocol....

Abstract

The invention provides a method for detecting long tandem repeat sequences based on the Bionano platform. The method filters Bionano data with a naive Bayes classifier (a machine learning model) to remove false-positive insertion and deletion sites, counts long tandem repeat units with an alignment-based algorithm, and reduces running time and computing resource consumption. By combining a clustering algorithm with the number of repeat units on each read, the method can also determine whether the genotype of the sample is homozygous, heterozygous, or chimeric.
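The genotype call described above combines per-read repeat-unit counts with cluster analysis. A minimal sketch, assuming single-linkage clustering on the sorted counts with a fixed gap threshold and a simple decision rule (one supported cluster is homozygous, two are heterozygous, three or more are chimeric); the thresholds `max_gap` and `min_fraction` and both function names are hypothetical, not the patent's parameters.

```python
from statistics import median


def cluster_counts(counts, max_gap=1):
    """Single-linkage 1-D clustering: start a new cluster wherever adjacent
    sorted counts differ by more than max_gap."""
    values = sorted(counts)
    clusters = [[values[0]]]
    for v in values[1:]:
        if v - clusters[-1][-1] > max_gap:
            clusters.append([v])
        else:
            clusters[-1].append(v)
    return clusters


def call_genotype(counts, max_gap=1, min_fraction=0.1):
    """Return (genotype, allele repeat counts) from per-read repeat-unit counts."""
    if not counts:
        raise ValueError("no reads span the repeat")
    clusters = cluster_counts(counts, max_gap)
    # Clusters supported by too small a fraction of reads are treated as noise.
    supported = [c for c in clusters if len(c) / len(counts) >= min_fraction]
    alleles = [round(median(c)) for c in supported]
    if len(supported) == 0:
        return "undetermined", alleles
    if len(supported) == 1:
        return "homozygous", alleles
    if len(supported) == 2:
        return "heterozygous", alleles
    return "chimeric", alleles


print(call_genotype([10, 10, 11, 10, 9]))                # → ('homozygous', [10])
print(call_genotype([8, 8, 9, 8, 8, 15, 15, 16, 15]))    # → ('heterozygous', [8, 15])
print(call_genotype([5, 5, 5, 12, 12, 12, 20, 20, 20]))  # → ('chimeric', [5, 12, 20])
```

A diploid sample can carry at most two alleles, which is why more than two well-supported clusters is read here as evidence of a chimeric sample.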

Description

Technical field

[0001] The invention relates to the technical field of gene sequencing, and in particular to a method for detecting long tandem repeat sequences based on the Bionano platform.

Background

[0002] Long tandem repeats are sequences in which a multi-nucleotide unit (a single repeat unit longer than 1 kb) is repeated head-to-tail within a DNA sequence. Changes in the number of repeat units can have an important impact on genome structure.

[0003] The Bionano optical map is an ordered, genome-wide map of restriction endonuclease sites on single DNA molecules. Endonucleases are used to recognize, cut, and fluorescently label the DNA; the DNA molecules are then straightened by electrophoresis through nanoscale channels, so that each molecule is unfolded linearly for ultra-long, high-resolution single-molecule fluorescence imaging, producing a map of enzyme site locations. Utilizing these extremely long read fragments for genome alignment overcom...
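Because a long tandem repeat unit exceeds 1 kb, a gain or loss of units changes the distance between the optical-map labels that flank the repeat by a measurable amount. A minimal sketch of per-molecule counting, assuming the flanking labels have already been aligned to the reference and that the whole span difference comes from gain or loss of complete units; the function name and example values are hypothetical, and this is not presented as the patent's alignment-based counting algorithm.

```python
def estimate_repeat_copies(observed_span_bp, reference_span_bp,
                           reference_copies, unit_length_bp):
    """Estimate the repeat-unit copy number carried by one molecule.

    observed_span_bp: distance between the two flanking labels on the molecule.
    reference_span_bp: distance between the same labels on the reference.
    reference_copies: number of repeat units between those labels on the reference.
    unit_length_bp: length of one repeat unit.
    """
    if unit_length_bp <= 0:
        raise ValueError("unit length must be positive")
    extra_units = (observed_span_bp - reference_span_bp) / unit_length_bp
    # Round to whole units; a negative estimate is clamped to zero.
    return max(0, round(reference_copies + extra_units))


# Hypothetical example: a 3,300 bp unit present in 10 copies on the reference.
spans = [66_400, 36_900, 50_200]
print([estimate_repeat_copies(s, 50_000, 10, 3_300) for s in spans])  # → [15, 6, 10]
```

Rounding absorbs small sizing noise in each molecule, but sizing errors approaching half a unit can shift a call by one copy, so per-molecule counts are best interpreted across many reads.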

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B40/00; G16B20/30
CPC: G16B40/00
Inventors: Li Pidong (李丕栋), Zhou Jiapeng (周家蓬), Wang Kai (王凯), Sun Beibei (孙贝贝), Wang Depeng (汪德鹏)
Owner BEIJING GRANDOMICS BIOTECH