Bionano platform-based long tandem repeat sequence detection method

A tandem repeat sequence and platform technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problems of assembly errors, heavy consumption of computing resources, and long computing time, and achieves fast running speed, lower memory consumption, and improved accuracy.

Active Publication Date: 2018-08-28
BEIJING GRANDOMICS BIOTECH

AI Technical Summary

Problems solved by technology

[0006] The detection of long tandem repeats based on the assembly algorithm has the following disadvantages: 1. There are a large number of insertion and deletion errors in the Bion


Examples


Embodiment 1

[0065] Embodiment 1. Building a machine learning model

[0066] a. Dataset

[0067] The Chinese reference genome constructed from HX1 data (see Shi, L., Guo, Y., Dong, C., Huddleston, J., Yang, H., Han, X. & Lintner, K.E. (2016). Long-read sequencing and de novo assembly of a Chinese genome. Nature Communications.) and its Bionano optical map data were used. The Bionano data were aligned to the HX1 reference genome; sites that could be aligned to the reference genome were defined as true positive sites, and the other detected sites were defined as false positive sites. Accordingly, 1000 true positive sites and 1000 false positive sites were randomly selected as the data sets for the two classes.
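The balanced random selection described in [0067] can be sketched as follows. This is only an illustration: the function name `sample_balanced`, the site identifiers, and the fixed seed are assumptions and do not come from the patent.

```python
import random


def sample_balanced(true_sites, false_sites, n_per_class=1000, seed=0):
    """Draw n_per_class sites from each class and attach labels (1 = true, 0 = false)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    tp = rng.sample(true_sites, n_per_class)   # sampling without replacement
    fp = rng.sample(false_sites, n_per_class)
    data = [(site, 1) for site in tp] + [(site, 0) for site in fp]
    rng.shuffle(data)
    return data


# Hypothetical site identifiers standing in for real optical-map sites.
aligned_sites = [f"site_{i}" for i in range(5000)]
unaligned_sites = [f"site_{i}" for i in range(5000, 9000)]
dataset = sample_balanced(aligned_sites, unaligned_sites)
```

Sampling the same number of sites from each class keeps the class priors equal, which is one simple way to avoid biasing a classifier toward the more common class.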

[0068] b. Feature selection

[0069] Based on the characteristics of Bionano data, the intensity, signal-to-noise ratio (SNR), and coverage of each site are weighted by the confidence of the reads aligned to the reference genome (shown in formulas 1-4). At the same ...
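Formulas 1-4 are not reproduced in this excerpt, so the sketch below only illustrates the general idea of confidence weighting. It assumes a confidence-weighted mean for intensity and SNR and a sum of confidences for coverage; the field names and the functions `weighted_mean` and `site_features` are hypothetical.

```python
def weighted_mean(values, weights):
    """Mean of values where each value counts in proportion to its weight."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def site_features(reads):
    """Aggregate per-read measurements at one site into a feature dict.

    reads: list of dicts with keys "intensity", "snr", "confidence"
    (assumed layout, not a Bionano file format).
    """
    conf = [r["confidence"] for r in reads]
    return {
        "intensity": weighted_mean([r["intensity"] for r in reads], conf),
        "snr": weighted_mean([r["snr"] for r in reads], conf),
        # Assumption: coverage is weighted by summing alignment confidences.
        "coverage": sum(conf),
    }


# Toy values for two reads covering the same site.
reads = [
    {"intensity": 10.0, "snr": 4.0, "confidence": 1.0},
    {"intensity": 20.0, "snr": 8.0, "confidence": 3.0},
]
features = site_features(reads)
```

With these toy values the read with confidence 3.0 pulls the site intensity toward 20.0, so poorly aligned reads contribute less to the site's features.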

Embodiment 2

[0102] Embodiment 2. Long tandem repeat sequence detection

[0103] 1. Experimental method

[0104] Lysis of human red blood cells (1 hour)

[0105] Quantitative white blood cell count (5 min)

[0106] Embedding of white blood cells (~1 hour)

[0107] Digestion with proteinase K

[0108] Wash gel to immobilize DNA

[0109] DNA recovery

[0110] DNA dialysis and homogenization

[0111] Quantification of DNA concentration (10μl, 2 hours and 30 minutes)

[0112] Digest DNA with BssSI enzyme (10 μl, 2 hours and 30 minutes)

[0113] Labeling (15 μl, 1 hr 15 min)

[0114] Fix (20 μl, 45 minutes)

[0115] Staining treatment (60 μl, 16 hours / overnight)

[0116] Bionano Saphyr system for quantitative processing

[0117] Experiment details reference:

[0118] https://bionanogenomics.com/wp-content/uploads/2017/03/30033-Rev-C-Bionano-Prep-Blood-DNA-Isolation-Protocol.pdf;

[0119] https://bionanogenomics.com/wp-content/uploads/2017/07/30024-Rev-J-Bionano-Prep-Labeling-NLRS-Protocol....


Abstract

The invention provides a Bionano platform-based long tandem repeat sequence detection method. A naive Bayesian classifier machine learning model is built to filter Bionano data and remove false positive insertion and deletion sites, and long tandem repeat units are counted with an alignment-based algorithm, which reduces running time and the consumption of computing resources. In addition, the sample genotype can be determined as homozygous, heterozygous, or chimeric by combining a clustering analysis algorithm with the repeat unit numbers on reads.
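The filtering step named in the abstract can be illustrated with a small Gaussian naive Bayes classifier. This is a sketch under stated assumptions, not the patent's model: the Gaussian likelihood, the three-feature layout (intensity, SNR, coverage), the function names, and all numeric values are assumptions chosen for illustration.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate a log prior and per-feature (mean, variance) for each class."""
    model = {}
    n = len(y)
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for col in zip(*rows):  # iterate over feature columns
            mean = sum(col) / len(col)
            # Small constant avoids division by zero for constant features.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (math.log(len(rows) / n), stats)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(label):
        log_prior, stats = model[label]
        score = log_prior
        for v, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Toy training data: [intensity, snr, coverage]; 1 = true site, 0 = false positive.
X = [[17.5, 7.0, 30.0], [18.0, 6.5, 28.0], [16.9, 7.2, 33.0],
     [5.0, 1.5, 4.0], [6.1, 2.0, 6.0], [4.8, 1.2, 3.0]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gaussian_nb(X, y)

# Keep only candidate sites that the model classifies as true sites.
candidates = [[17.0, 6.8, 29.0], [5.5, 1.8, 5.0]]
kept = [x for x in candidates if predict(model, x) == 1]
```

Working in log space avoids numerical underflow when per-feature likelihoods are multiplied together, and the conditional independence assumption keeps training to a single pass over the data.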

Description

Technical field

[0001] The invention relates to the technical field of gene sequencing, in particular to a method for detecting long tandem repeat sequences based on the Bionano platform.

Background

[0002] Long tandem repeats are repeated sequences formed by repeating units of multiple nucleotides (a single repeating unit being longer than 1 kb) joined end to end in a DNA sequence. Changes in the number of repeating units have an important impact on genome structure.

[0003] The Bionano optical map is an ordered, genome-wide restriction endonuclease cut site map of single DNA molecules. Endonucleases are used to recognize, digest, and fluorescently label the DNA; the DNA molecules are then straightened by nanoscale capillary electrophoresis so that each molecule is unfolded linearly, ultra-long single-molecule high-resolution fluorescence imaging is performed, and a restriction enzyme site location map is generated. Utilizing these extremely long read fragments for genome alignment overcom...


Application Information

IPC(8): G06F19/24
CPC: G16B40/00
Inventor: 李丕栋, 周家蓬, 王凯, 孙贝贝, 汪德鹏
Owner: BEIJING GRANDOMICS BIOTECH