Mathematical sequence reconstruction method for long-chain molecules

A mathematical sequence reconstruction technology for long-chain molecules, applied in the field of sequence reconstruction algorithms. It addresses the high error rate of sequencing and improves the accuracy of the reconstructed sequence.

Inactive Publication Date: 2019-05-24
GUANGZHOU SHIBIO BIOTECH CO LTD +1
Cites: 9 · Cited by: 1

AI Technical Summary

Problems solved by technology

In addition, some base modifications can be detected from the sequencing time between two adjacent bases: if a base is modified, it passes through the polymerase more slowly, so the distance between the two adjacent peaks widens. This can be used to detect methylation and similar information. SMRT technology sequences very quickly, at about 10 dNTPs per second. However, its sequencing error rate is relatively high, reaching 15% (a fault common to almost all current single-molecule sequencing technologies). Its errors are random and show no bias of the kind seen in second-generation sequencing technology, so multiple sequencing passes are required for effective error correction.
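To see why repeated sequencing helps when errors are random, the calculation below is a rough sketch under assumptions that are not in the source: each pass reads a given base wrongly with independent probability 0.15, and the consensus is a simple majority vote that fails whenever more than half of the passes are wrong. The function name `majority_error` is hypothetical.

```python
from math import comb

def majority_error(passes: int, per_read_error: float = 0.15) -> float:
    """Probability that more than half of `passes` independent reads are wrong.

    Assumption (not from the source): every read is simply right or wrong,
    so wrong reads always agree with each other. This makes the estimate
    pessimistic. Use an odd number of passes to avoid ties.
    """
    first_losing = passes // 2 + 1  # smallest count of wrong reads that outvotes the rest
    return sum(
        comb(passes, k) * per_read_error**k * (1 - per_read_error) ** (passes - k)
        for k in range(first_losing, passes + 1)
    )

for n in (1, 3, 5, 9, 15):
    print(n, majority_error(n))
```

Under these assumptions the consensus error falls quickly as passes are added, which is the sense in which random errors can be corrected by sequencing the same molecule several times.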



Examples


Specific embodiments

[0022] A mathematical sequence reconstruction method for long-chain molecules mainly includes the following steps:

[0023] 1) Provide at least two DNA molecular chains to be tested from an individual, or use a PCR instrument to amplify the individual's DNA chains. Let the number of DNA molecules be X, where X is a natural number ≥ 2. The DNA molecular chains can also be replaced by single protein molecular chains, single polysaccharide chains or other polymer molecules with a single structure, in order to determine and reconstruct protein and polysaccharide sequences;

[0024] 2) Break the X DNA molecules into fragments to form X gene libraries;

[0025] 3) Sequence the gene fragments of the X gene libraries to obtain the fragment information set of each of the X gene libraries. Suppose the X gene libraries are gene library A, gene library B, ..., gene library X, and the fragment information of gene library A is {A_1, A_2, A_3, ..., A_m}, the fragment...

Example 1

[0034] Suppose the gene sequence to be determined is tctaactg. The sequence is randomly broken into the gene fragments tct, aac and tg, from which fragment library A {'tct'; 'aac'; 'tg'} is established. The same sequence is randomly broken a second time into the gene fragments tc, taac and tg, from which fragment library B {'tc'; 'taac'; 'tg'} is established. Fully arranging the fragments of library A and of library B (splicing them in every possible order) gives two sets A and B, and their intersection C = A∩B contains tctaactg, the gene sequence to be reconstructed. Note that tgtctaac also appears in both listings below, so with these two libraries alone C = {tctaactg, tgtctaac}; the number of elements in the intersection is what the final step of the method judges.

[0035] The full arrangement of library A {'tct'; 'aac'; 'tg'} is as follows:

[0036] {

[0037] 'tctaactg'

[0038] 'tcttgaac'

[0039] 'aactcttg'

[0040] 'aactgtct'

[0041] 'tgtctaac'

[0042] 'tgaactct'

[0043]}

[0044] The full arrangement of library B {'tc'; 'taac'; 'tg'} is as follows:

[0045] {

[0046] 'tctaactg'

[0047] 'tctgtaac'

[0048] 'taactctg'

[0049] 'taactgtc'

[0050] 'tgtctaac'

[0051] 'tgtaactc...
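The two listings above can be reproduced with a few lines of Python. This is a minimal sketch, not the patent's own implementation; the helper name `full_arrangement` is hypothetical, and the fragments are taken directly from libraries A and B of this example.

```python
from itertools import permutations

def full_arrangement(fragments):
    """Return the set of strings formed by splicing the fragments in every order."""
    return {"".join(order) for order in permutations(fragments)}

library_a = ["tct", "aac", "tg"]
library_b = ["tc", "taac", "tg"]

set_a = full_arrangement(library_a)  # 3! = 6 spliced strings
set_b = full_arrangement(library_b)  # 3! = 6 spliced strings
common = set_a & set_b               # candidates consistent with both libraries

print(sorted(set_a))
print(sorted(set_b))
print(sorted(common))  # ['tctaactg', 'tgtctaac']
```

The intersection holds two candidates here because tg+tct+aac and tg+tc+taac both spell tgtctaac, so these two libraries alone do not single out tctaactg.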

Example 2

[0055] Suppose the gene sequence to be determined is tctaactggcgcctcgctgtggaaaa. The sequence is randomly broken into the gene fragments tctaactgg, cgcctcgctg, tg and gaaaa, from which fragment library A {'tctaactgg'; 'cgcctcgctg'; 'tg'; 'gaaaa'} is established. The same sequence is randomly broken again into the gene fragments tctaact, g, gcg, cctcgc and tgtggaaaa, from which fragment library B {'tctaact'; 'g'; 'gcg'; 'cctcgc'; 'tgtggaaaa'} is established. Fully arranging the fragments of library A and of library B gives two sets A and B, and their intersection is C = A∩B = {tctaactggcgcctcgctgtggaaaa}, which is exactly the gene sequence to be reconstructed. The detailed calculation process is as follows:

[0056] gene='tctaactggcgcctcgctgtggaaaa'; // the sequence is 26 bases long

[0057] // Randomly break into small fragments of length 1-10

[0058] breaks={'tctaactgg' 'cgcctcgctg' 'tg' 'gaaaa'};

[0059] // ...
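Since the calculation above is cut off, the following Python sketch shows one way the same steps could look. It is an illustration under assumptions, not the patent's code: `random_breaks` and `full_arrangement` are hypothetical names, and cutting left to right with uniformly drawn lengths is only one reading of "randomly break into small fragments of 1-10".

```python
import random
from itertools import permutations

def random_breaks(gene, min_len=1, max_len=10, rng=random):
    """Cut `gene` left to right into pieces with lengths drawn from [min_len, max_len].

    The last piece is whatever remains, so it may be shorter than min_len
    when min_len > 1.
    """
    pieces, start = [], 0
    while start < len(gene):
        size = rng.randint(min_len, max_len)
        pieces.append(gene[start:start + size])
        start += size
    return pieces

def full_arrangement(fragments):
    """Return the set of strings formed by splicing the fragments in every order."""
    return {"".join(order) for order in permutations(fragments)}

gene = "tctaactggcgcctcgctgtggaaaa"
library_a = ["tctaactgg", "cgcctcgctg", "tg", "gaaaa"]        # 4! = 24 splicings
library_b = ["tctaact", "g", "gcg", "cctcgc", "tgtggaaaa"]    # 5! = 120 splicings

common = full_arrangement(library_a) & full_arrangement(library_b)
print(len(gene), common)  # 26 {'tctaactggcgcctcgctgtggaaaa'}

# A freshly generated library; shuffled because a library carries no order information.
rng = random.Random(0)
sample = random_breaks(gene, rng=rng)
rng.shuffle(sample)
print(sample)
```

A library of m fragments has m! splicings, so exhaustive arrangement is only practical for small libraries such as the two used in this example.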


Abstract

The invention relates to a mathematical reconstruction method for long-chain molecule sequences, and specifically to a mathematical sequence reconstruction method for long-chain molecules used for gene DNA sequencing, protein amino acid sequencing, or detection of other long-chain structural chemicals. The method includes the steps of: 1) providing at least two DNA molecular chains; 2) breaking the at least two DNA molecular chains into fragment sequences to form X gene libraries; 3) sequencing the gene fragments of the gene libraries to obtain library fragment information; 4) performing full arrangement and splicing on the fragments of each library to obtain a set of possible sequences; 5) solving for the intersection; and 6) judging the number of elements in the intersection to obtain the correct gene sequence map. The method is a mathematical algorithm technology. It can sequence and reconstruct gene sequences as well as proteins, polysaccharides or other polymers having a single structure, thereby improving the accuracy of current gene sequencing. It does not rely on probabilistic speculation but on rigorous mathematical algorithms, and therefore gives highly accurate sequencing results.
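Steps 4) to 6) can be sketched for any number of libraries as follows. This is a hedged illustration rather than the claimed implementation: the function names and the three status labels are hypothetical, and how the method reacts to an empty or multi-element intersection is an assumption, since the abstract only says that the number of elements is judged. The third library is an invented fragmentation of tctaactg used purely for demonstration.

```python
from itertools import permutations

def full_arrangement(fragments):
    """Step 4: splice the fragments of one library in every possible order."""
    return {"".join(order) for order in permutations(fragments)}

def reconstruct(libraries):
    """Steps 4-6: arrange each library, intersect the sets, judge the result size."""
    candidates = None
    for fragments in libraries:
        arranged = full_arrangement(fragments)
        # Step 5: keep only sequences that every library seen so far can spell.
        candidates = arranged if candidates is None else candidates & arranged
    candidates = candidates or set()
    # Step 6 (assumed interpretation of "judging the number of elements").
    if len(candidates) == 1:
        return "unique", candidates
    if not candidates:
        return "inconsistent", candidates  # e.g. a fragment was misread
    return "ambiguous", candidates         # more libraries are needed

two = [["tct", "aac", "tg"], ["tc", "taac", "tg"]]
three = two + [["tcta", "actg"]]  # hypothetical third fragmentation

print(reconstruct(two))
print(reconstruct(three))  # ('unique', {'tctaactg'})
```

With the two three-fragment libraries the result is ambiguous between tctaactg and tgtctaac, and the extra library removes the second candidate, which illustrates why the method allows X to exceed 2.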

Description

Technical field

[0001] The invention relates to a mathematical reconstruction algorithm for long-chain molecular sequences, in particular to a mathematical sequence reconstruction method for long-chain molecules used for gene DNA sequence determination, protein amino acid sequence determination or detection of other long-chain structural chemical substances.

Background art

[0002] Sequence detection is involved in both biology and materials science. It means determining the arrangement of the various groups along a chain, as in protein sequence determination, DNA sequence determination, polysaccharide sequence determination and so on. DNA serves here as the example to illustrate the bottlenecks faced in sequence determination: in molecular biology research, DNA sequence analysis is the basis for further research on and modification of target genes.

[0003] The current technologies used for sequencing mainly include the dideoxy chain-termination method invented by Sanger et al. (1977) a...


Application Information

IPC(8): G16B30/10, G16B30/20
CPC: G16B30/10, G16B30/20
Inventors: 胡洪超, 舒绪刚
Owner: GUANGZHOU SHIBIO BIOTECH CO LTD