Position anchoring bar code system for nanopore sequencing library building

This nanopore sequencing library technology, applied in the field of position-anchored barcode systems, addresses interference with doctors' diagnoses, poor accuracy of metagenomic species analysis, and barcode confusion. It screens out long-distance barcode interference, reduces false-positive identification, and improves the accuracy of results.

Active Publication Date: 2020-07-24
SIMCERE DIAGNOSTICS CO LTD +1

AI Technical Summary

Problems solved by technology

[0003] Illumina next-generation sequencing is developing rapidly in China, but it has the following problems in microbial detection. First, next-generation read lengths are no more than a few hundred bp, and different microbial species share highly homologous sequences, so the accuracy of metagenomic species analysis is poor and irrelevant microbial information appears in the data report, which increases diagnostic interference for doctors. Second, identifying disease-causing genes and drug-resistance genes in greater depth requires assembling the sequencing reads, and this complex analysis costs additional time and money to compensate for the short read length. In addition, next-generation sequencing instruments are expensive and cumbersome to operate, the initial investment is high, and the whole sequencing run takes a long time, which makes it poorly suited to acute infections.
The third-generation sequencing technology PacBio greatly improves read length and can produce long-fragment data of 8-12 kb, or even 40-70 kb, but its library construction process is relatively complicated.
However, when samples were distinguished by barcode sequence, the 24 bp barcodes included in the library construction kit were found to be seriously confused. The reason is that the single-base error rate in reads from the Oxford Nanopore sequencer is as high as 10-15%. When reads are classified by barcode sequence in downstream data analysis, barcode identification errors therefore cause cross-contamination of data between samples, which leads to false-positive identification of microorganisms and creates serious difficulties for clinical decision-making.
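To give a feel for why a 10-15% per-base error rate is so damaging to a 24 bp barcode, the following minimal sketch computes the chance that a barcode is read with no error at all. It assumes errors at each base are independent, which is an illustrative simplification and not a claim made in the source; the function name `error_free_probability` is hypothetical.

```python
def error_free_probability(barcode_length: int, per_base_error: float) -> float:
    """Probability that every base of a barcode is read correctly.

    Assumes independent per-base errors (an illustrative simplification).
    """
    return (1.0 - per_base_error) ** barcode_length


# Error rates quoted in the text (10% and 15%) applied to a 24 bp barcode.
for rate in (0.10, 0.15):
    p = error_free_probability(24, rate)
    print(f"per-base error {rate:.2f}: P(24 bp barcode read without error) = {p:.3f}")
```

Under this assumption only about 8% of 24 bp barcodes are read error-free at a 10% error rate, and about 2% at 15%, so classification has to tolerate errors, and that tolerance is what makes similar barcodes easy to confuse.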

Examples

Embodiment 1

[0081] Embodiment 1: Alignment error statistics

[0082] The present invention considers the main cause of barcode confusion to be the sequence difference between the sequenced barcode and the preset true barcode, so the differences between sequenced and true barcode sequences are characterized first. To this end, 10 sets of barcodes from the SQK-PBK004 kit of ONT were used to build libraries from sample DNA separately and sequence them. The first 250 bp at the 5' end of each read was extracted to ensure that the barcode region was included, and was then globally aligned (overlap alignment) against the corresponding preset barcode adapter. Finally, from the resulting multiple alignment file, the alignment differences at each position of the barcode adapter sequence were summarized, and the positional distribution and types of sequencing errors in the barcode were counted. Among them, the error types are d...
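The per-position tally described above can be sketched as follows. This is a minimal illustration, not the patent's pipeline: it assumes the alignment step has already produced two equal-length gapped strings (reference barcode adapter and read) with `-` marking gaps, and the function name `tally_errors` and the three category labels are hypothetical. End gaps from an overlap alignment are not treated specially here.

```python
from collections import Counter


def tally_errors(ref_aln: str, read_aln: str) -> dict:
    """Count error types per reference position from one gapped pairwise alignment.

    ref_aln and read_aln are equal-length aligned strings; '-' marks a gap.
    Returns {reference_position: Counter({error_type: count})}.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned strings must have equal length")
    per_position = {}
    ref_pos = -1  # index of the most recent reference base seen
    for r, q in zip(ref_aln, read_aln):
        if r != "-":
            ref_pos += 1
        if r == q:
            continue  # match: nothing to record
        if r == "-":
            kind = "insertion"  # extra read base, attributed to the preceding reference base
        elif q == "-":
            kind = "deletion"   # reference base missing from the read
        else:
            kind = "mismatch"   # substituted base
        per_position.setdefault(max(ref_pos, 0), Counter())[kind] += 1
    return per_position


# Toy alignment: one mismatch, one insertion, one deletion.
print(tally_errors("ACGT-ACGT", "ACCTTA-GT"))
```

Summing these counters over all reads of one barcode would give the error position distribution, and summing over positions would give the totals per error type.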

Embodiment 2

[0090] Embodiment 2: Impact of different error types and barcode lengths on the overall accuracy of barcode alignment

[0091] Based on the error types and corresponding error rates found in Embodiment 1, the present invention simulated barcode sequences of six lengths, 20 bp, 40 bp, 60 bp, 80 bp, 100 bp, and 120 bp, with 12 different barcode elements in each group. The 80 bp case is used to illustrate the simulation. First, 12 ideal barcode sequences are preset; the sequence information is shown in Table 2:

[0092] Table 2.

[0093]

[0094]

[0095] Three different types of errors were then introduced at a total error rate of 0.08 per site: indels only, base mismatches only, and both indels and base mismatches. The detailed error rate distribution ratio of each error type is consistent with...
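The error-introduction step can be sketched roughly as below. The total per-site rate of 0.08 and the three modes come from the text; everything else is an assumption for illustration. In particular, the split of the total rate among error types is elided in the source, so this sketch simply picks uniformly among the error types allowed by the mode, and the names `simulate_read` and `ERROR_KINDS` are hypothetical.

```python
import random

BASES = "ACGT"

# ASSUMPTION: error types within a mode are equally likely; the source's
# actual distribution ratio is not available here.
ERROR_KINDS = {
    "indel": ["insertion", "deletion"],
    "mismatch": ["mismatch"],
    "both": ["insertion", "deletion", "mismatch"],
}


def simulate_read(barcode, total_rate=0.08, mode="both", rng=None):
    """Return a copy of `barcode` with errors introduced independently per site."""
    if rng is None:
        rng = random.Random()
    kinds = ERROR_KINDS[mode]
    out = []
    for base in barcode:
        if rng.random() >= total_rate:
            out.append(base)  # site read correctly
            continue
        kind = rng.choice(kinds)
        if kind == "mismatch":
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "insertion":
            out.append(base)
            out.append(rng.choice(BASES))  # extra base after the true one
        # deletion: the base is dropped, so nothing is appended
    return "".join(out)


ideal = "ACGT" * 20  # placeholder 80 bp sequence, not one of the Table 2 barcodes
for mode in ("indel", "mismatch", "both"):
    print(mode, simulate_read(ideal, mode=mode, rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps a simulation run reproducible, which helps when comparing barcode lengths under identical error draws.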

Embodiment 3

[0097] Embodiment 3: Impact of inserting anchor sequences on the overall accuracy of barcode alignment

[0098] Based on the conclusions of Embodiment 2, the present invention modifies the barcode fragment in the following two ways, taking 80 bp as an example:

[0099] 1) Insertion of 1 anchor sequence: replace the 12 bases in the middle of each barcode sequence with one identical 12 bp anchor sequence (underlined in Table 2), so that the original preset barcode fragment becomes a fragment of the same length in the format "short barcode - anchor sequence - short barcode";

[0100] 2) Insertion of 2 anchor sequences: replace the 20 bp at both ends of the barcode sequence in Table 2 with the 12 bp anchor sequence 1 and anchor sequence 2 (shown in bold in Table 2), giving a fragment of the same length in the format "short barcode - anchor sequence 1 - short barcode - anchor sequence 2 - short barcode".
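The two layouts above can be sketched as simple string replacements that preserve the barcode length. The single-anchor case follows the text directly (the middle 12 bases are replaced). For the two-anchor case the exact anchor positions are not recoverable from the text here, so this sketch assumes each anchor starts 20 bp in from its end, which gives a 20 + 12 + 16 + 12 + 20 layout for 80 bp. The function names and the anchor and barcode sequences are placeholders, not the sequences in Table 2.

```python
def insert_middle_anchor(barcode: str, anchor: str) -> str:
    """Replace the central len(anchor) bases of `barcode` with `anchor`."""
    start = (len(barcode) - len(anchor)) // 2
    return barcode[:start] + anchor + barcode[start + len(anchor):]


def insert_two_anchors(barcode: str, anchor1: str, anchor2: str, offset: int = 20) -> str:
    """Replace one stretch near each end with an anchor, keeping the length.

    ASSUMPTION: each anchor begins `offset` bases in from its end of the barcode.
    """
    left_end = offset + len(anchor1)
    right_start = len(barcode) - offset - len(anchor2)
    if right_start < left_end:
        raise ValueError("barcode too short for two anchors at this offset")
    return (
        barcode[:offset] + anchor1
        + barcode[left_end:right_start]
        + anchor2 + barcode[right_start + len(anchor2):]
    )


barcode = "ACGT" * 20        # placeholder 80 bp barcode
anchor_1 = "GGGGCCCCGGGG"    # hypothetical 12 bp anchor
anchor_2 = "TTTTAAAATTTT"    # hypothetical 12 bp anchor

one = insert_middle_anchor(barcode, anchor_1)
two = insert_two_anchors(barcode, anchor_1, anchor_2)
print(len(one), one)
print(len(two), two)
```

Because the anchors are identical across all barcodes, locating them in a read gives fixed reference points, and the short barcode segments between them can then be compared over shorter distances.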

[0101] The location information of the anchor sequences obtained duri...

Abstract

The invention relates to a position-anchored barcode system for nanopore sequencing library construction, and to a preparation method and application of the system. The position-anchored barcode system provided by the invention has higher resolution and higher classification accuracy, can significantly reduce the false-positive rate of identification, improves the overall precision of nanopore sequencing, and reduces sequencing cost.

Description

Technical field

[0001] The invention relates to the field of gene sequencing, in particular to a position-anchored barcode system for nanopore sequencing library construction.

Background

[0002] There are currently many clinically infected patients worldwide, and the sources of infection are diverse. In China, infectious diseases even account for 49% of the total incidence of all diseases. The conventional clinical approach is to determine the source of infection from the doctor's empirical judgment combined with microscopic examination, biochemical analysis, and similar tests. However, human factors, the length of the detection cycle, and the limited detection range can easily lead to false detections and missed diagnoses, which is especially unfavorable for the diagnosis and treatment of acute infections. With the vigorous development of high-throughput sequencing and genomics, metagenomic sequencing technology is flourishing in the field of infection diagnosis because it can quickly,...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): C12Q1/6806, C12N15/10, C40B50/06, C40B70/00, C40B80/00
CPC: C12N15/1093, C12Q1/6806, C40B50/06, C40B70/00, C40B80/00, C12Q2565/631, C12Q2563/185, C12Q2525/191
Inventor: 戴岩, 胡龙, 张烨, 肖念清, 任用
Owner SIMCERE DIAGNOSTICS CO LTD