
Base recognition method for nanopore sequencing data based on deep learning

A deep learning technology for sequencing data, applied in the field of bioinformatics. It addresses the low accuracy of nanopore sequencing and achieves highly accurate base recognition with good generalization performance.

Active Publication Date: 2022-05-17
NORTHEAST FORESTRY UNIVERSITY +1

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to propose a deep-learning-based method for base recognition of nanopore sequencing data, addressing the low accuracy of nanopore sequencing in the prior art.


Examples


Specific Embodiment 1

[0058] Specific Embodiment 1: This embodiment is described with reference to Figure 1. It concerns the deep-learning-based method for base recognition of nanopore sequencing data.

[0059] As shown in Figure 1, the method includes the following steps S1-S8:

[0060] S1. Download 50 groups of nanopore raw data, covering Klebsiella pneumoniae, Enterobacteriaceae, and Proteobacteria, together with sequence data of 9 fungi, to form a data set.

[0061] The 50 groups of nanopore raw sequencing data serve as the training set of the model, and the gene sequences of the other 9 species serve as the test set.

[0062] S2. Use Guppy, the official Oxford Nanopore base calling tool, to perform base calling on the 50 groups of raw data.

[0063] Guppy converts the nanopore data of unknown sequence into a base sequence, which is then used to find the corresponding next-generation sequencing reference genome.

[0064] S3. Using the Illumina sequencing seque...


Abstract

A deep-learning-based base recognition method for nanopore sequencing data, in the field of bioinformatics, intended to solve the low accuracy of nanopore sequencing in the existing technology. The method comprises: 1: download 50 groups of nanopore raw data, including Klebsiella pneumoniae, Enterobacteriaceae and Proteus, as a training set; 2: perform base recognition on the 50 groups of raw data to obtain base sequences; 3: obtain Illumina sequencing sequences with an accuracy above 99%, use them as the reference genome, and, taking the reference genome as the ground truth, correct the base sequences with the Tombo algorithm; 4: use the re-squiggle method to map the corrected base sequences to the corresponding electrical signal data, and then label the electrical signal data; 5: train a neural network with the labeled electrical signal data and the raw data, and use the trained network to perform base recognition. The application achieves highly accurate recognition of the base sequence of nanopore sequencing data.
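Step 4 of the abstract (labeling the electrical signal after re-squiggling) can be illustrated with a minimal sketch. It is not the patent's implementation and does not call Tombo: it assumes a re-squiggle step has already produced, for each corrected base, the index at which its signal segment starts. The function names `label_signal` and `make_chunks`, the integer label encoding, and the fixed chunk length are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical integer encoding of the four DNA bases.
BASE_TO_LABEL = {"A": 0, "C": 1, "G": 2, "T": 3}


def label_signal(signal: Sequence[float],
                 seg_starts: Sequence[int],
                 bases: str) -> List[int]:
    """Give every raw-signal sample the label of the base its segment maps to.

    Assumes segment i covers signal[seg_starts[i]:seg_starts[i + 1]], the
    first segment starts at 0, and the last segment runs to the end.
    """
    if len(seg_starts) != len(bases):
        raise ValueError("need exactly one segment start per base")
    if not seg_starts or seg_starts[0] != 0:
        raise ValueError("first segment must start at sample 0")
    bounds = list(seg_starts) + [len(signal)]
    labels: List[int] = []
    for i, base in enumerate(bases):
        length = bounds[i + 1] - bounds[i]
        if length <= 0:
            raise ValueError("segment starts must be strictly increasing")
        labels.extend([BASE_TO_LABEL[base]] * length)
    return labels


def make_chunks(signal: Sequence[float],
                labels: Sequence[int],
                chunk_len: int) -> List[Tuple[List[float], List[int]]]:
    """Cut a labeled signal into fixed-length training chunks (tail dropped)."""
    chunks = []
    for start in range(0, len(signal) - chunk_len + 1, chunk_len):
        end = start + chunk_len
        chunks.append((list(signal[start:end]), list(labels[start:end])))
    return chunks


if __name__ == "__main__":
    toy_signal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    toy_labels = label_signal(toy_signal, [0, 2, 5], "ACG")
    print(toy_labels)
    print(make_chunks(toy_signal, toy_labels, 3))
```

Per-sample labels are only one possible training target. A network trained with a sequence-level loss would instead pair each signal chunk with the base string it covers.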

Description

Technical field

[0001] The invention relates to the field of bioinformatics, in particular to a method for base recognition of nanopore sequencing data based on deep learning.

Background

[0002] Compared with second-generation sequencers and PacBio's third-generation sequencer, the nanopore third-generation sequencer produced by Oxford Nanopore has the advantages of portability, low cost, and long sequencing reads. However, the sequencing accuracy of nanopore is much lower than that of next-generation sequencing technology and PacBio's HiFi sequencing technology. The accuracy of its official base calling tool is only about 90%, and the tool is not open source. The nanopore of the Nanopore sequencer is essentially a nanoscale protein pore with voltage detection devices on both sides. In operation, primers are used to pull single-stranded DNA / RNA through the nanopore, and different types of nucleotides cause different current changes when passing through the na...
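To make an accuracy figure such as the "about 90%" quoted above concrete, the sketch below scores a called read against its reference sequence. The source does not state how accuracy is measured, so the convention used here (one minus the edit distance divided by the longer of the two lengths) is an assumption chosen for simplicity, and the names `edit_distance` and `read_accuracy` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion from a
                           cur[j - 1] + 1,              # insertion into a
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]


def read_accuracy(called: str, reference: str) -> float:
    """Simple accuracy proxy: 1 - edit_distance / max(len(called), len(reference))."""
    if not called and not reference:
        raise ValueError("both sequences are empty")
    longest = max(len(called), len(reference))
    return 1.0 - edit_distance(called, reference) / longest


if __name__ == "__main__":
    # One substitution in a 10-base read.
    print(read_accuracy("ACGTACGTAC", "ACGTACGTAG"))
```

Published benchmarks usually compute identity from an alignment of the read to the reference genome, so values from this proxy are not directly comparable with them.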


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B30/00, G16B40/20, G06K9/62, G06N3/04, G06N3/08
CPC: G16B30/00, G16B40/20, G06N3/08, G06N3/047, G06N3/045, G06F18/2415, G06F18/241
Inventor: 汪国华, 高文韬, 邹权
Owner: NORTHEAST FORESTRY UNIVERSITY