Genome restriction map splicing method and system

A genome restriction map splicing method and system, applied in the field of genome sequence splicing. It addresses problems of existing approaches: loss of accuracy, difficulty in directly recovering the complete genome, the need to set a prior for the Bayesian model, and high computational complexity.

Inactive Publication Date: 2015-09-30
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

Second-generation sequencing has gradually become the mainstream sequencing technology because of its high throughput and low cost. However, both Sanger sequencing and second-generation sequencing produce reads of limited length, which makes it difficult to span some of the longer repeats in the genome, where a "repeat" is a sequence that appears more than once in the genome. The existence of repeats makes it difficult to restore the complete genome directly by assembling next-generation sequencing data. Research such as structural variant detection, however, relies on the complete sequence information of the genome, so higher requirements are placed on sequencing technology.
Anantharaman, T.S., B. Mishra, and D.C. Schwartz, Genomics via optical mapping. II: Ordered restriction maps. J Comput Biol, 1997. 4(2): p. 91-118. This paper uses a Bayesian probabilistic model that predicts the occurrence of restriction sites to assemble the map molecules. Its main shortcomings are the difficulty of setting the prior of the Bayesian model and its high computational complexity.
Anantharaman, T., B. Mishra, and D. Schwartz, Genomics via optical mapping. III: Contiging genomic DNA. Proc Int Conf Intell Syst Mol Biol, 1999: p. 18-27. This paper uses the idea of sequence alignment to construct the restriction site map. The algorithm has two shortcomings: (1) finding the optimal sequence alignment is very time-consuming; (2) heuristic strategies are introduced to reduce the time complexity, but at the cost of accuracy.

Examples

Embodiment Construction

[0070] The restriction map splicing algorithm nanoARCS consists of two main parts: molecular error correction and molecular splicing. The flow chart of the algorithm is shown in Figure 2. The present invention is further described below in conjunction with the accompanying drawings; every "molecule" mentioned in the present invention denotes a "gene sequence molecule".

[0071] Step 1: molecular error correction, which consists of data preprocessing, clustering, and error correction;

[0072] Step 11: data preprocessing;

[0073] The main features of restriction map data are: 1) The resolution of the restriction map generated by the Irys system, as shown in Figure 1, is on the order of Kbp. That is, if two restriction sites are close together, there is a high chance that one of them will be missed. As shown in Figure 3, only one fluorescent signal is recognized in the restriction map for sites that are relatively ...
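The resolution effect described above can be illustrated with a minimal sketch. It is not taken from the patent: the function name `merge_close_sites`, the 1000 bp threshold, and the choice to report the mean position of each merged group are all assumptions made for illustration.

```python
def merge_close_sites(sites, resolution):
    """Collapse restriction sites closer than `resolution` into one observed site.

    `sites` holds true site positions in bp. Reporting the mean position of a
    merged group is an illustrative assumption, not the patent's rule.
    """
    observed = []
    group = []
    for pos in sorted(sites):
        # Start a new group once the gap to the previous site is resolvable.
        if group and pos - group[-1] >= resolution:
            observed.append(sum(group) / len(group))
            group = []
        group.append(pos)
    if group:
        observed.append(sum(group) / len(group))
    return observed


# Six true sites; with a hypothetical 1 Kbp resolution only three signals remain.
true_sites = [1000, 1400, 9000, 20000, 20500, 21000]
observed_sites = merge_close_sites(true_sites, resolution=1000)
print(observed_sites)
```

In this toy input the pair at 1000 and 1400 bp and the triple around 20500 bp each collapse to a single signal, which is the kind of missing-site error that the later error correction step has to handle.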

Abstract

The invention relates to genome sequence splicing in the field of molecular biology and provides a genome restriction map splicing method and system. The method comprises: preprocessing the gene sequence molecules in the genome restriction map to obtain new gene sequence molecules, and cutting the new molecules into FLES segments, where an FLES segment is a gene segment with a fixed total length that need not contain the same number of restriction sites as other segments; clustering the FLES segments to form a representative FLES set, and correcting errors in the gene sequence molecules according to the representative FLES set; building an FLES graph from the FLES set and the error-corrected molecules, searching for a path in the FLES graph, and taking the Hamiltonian path of the graph as the restriction site sequence of the genome, which completes the splicing of the genome restriction map. The method and system allow the restriction site map of a genome to be built rapidly and accurately.
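The idea of a segment with fixed total length but a variable number of restriction sites can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say where segments start, so the sketch assumes one segment beginning at each restriction site, and the names `cut_into_fles`, `molecule_length`, and `fles_length` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def cut_into_fles(sites, molecule_length, fles_length):
    """Cut one molecule into fixed-length windows, one starting at each site.

    Each window is returned as a tuple of site offsets relative to its start.
    Starting a window at every site is an assumption made for illustration.
    """
    sites = sorted(sites)
    segments = []
    for start in sites:
        end = start + fles_length
        if end > molecule_length:
            break  # the remaining windows would run off the molecule
        lo = bisect_left(sites, start)
        hi = bisect_right(sites, end)
        segments.append(tuple(s - start for s in sites[lo:hi]))
    return segments


# Hypothetical 25 Kbp molecule cut into 10 Kbp segments.
sites = [0, 3000, 5000, 11000, 12000, 20000]
segments = cut_into_fles(sites, molecule_length=25000, fles_length=10000)
for seg in segments:
    print(len(seg), seg)
```

Every window covers the same 10 Kbp, yet the windows contain between two and four sites, which is the property the abstract attributes to FLES segments.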

Description

Technical field

[0001] The invention relates to genome sequence splicing in the field of molecular biology, and in particular to a method and system for splicing genome restriction maps.

Background

[0002] The genome contains the most basic genetic information of an organism. It determines the biological characteristics of the species, guides life functions and developmental processes, and directs the synthesis of important compounds in cells (such as proteins and RNA).

[0003] The genome sequence is a double helix structure formed by deoxyribonucleotides (adenine A, guanine G, thymine T, and cytosine C) connected to each other in a certain order by 3'-5'-phosphodiester bonds. Genome sequencing refers to obtaining the sequence information of the deoxyribonucleotides of the genome. With the development of genome sequencing technology, the genomes of more and more species have been determined.

[0004] The development of ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/24, G06F19/22
Inventor: 卜东波, 许情, 陈挺, 孙世伟, 李帅成, 刘兴武, 张仁玉, 王超
Owner INST OF COMPUTING TECH CHINESE ACAD OF SCI