Method based on repetitive sequence recognition for splicing sequencing data of whole genome

A technique for splicing whole-genome sequencing data by identifying repetitive sequences, applied in the field of genetic engineering. It addresses problems such as splicing errors in the large genomes of higher animals and plants.

Inactive Publication Date: 2002-07-24
北京六合华大基因科技有限公司

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to propose a splicing method for whole genome sequencing data based on repetitive sequence identification, after analyzing the rules of the repetitive sequences in the genomes of higher animals and plants when using the "shotgun method" for seque

Examples

Example Embodiment

[0026] In the following, the steps of the method of the present invention will be described in detail with reference to the accompanying drawings:

[0027] To identify repetitive sequences, the present invention first sets a minimum fragment length, generally 15 bp to 20 bp; repetitive sequences shorter than this length are not considered. To simplify the model, all sequencing reads are assumed to have the same length L.

[0028] The parameters in the following formulas are: G, the total length of the genome; L, the average effective read length; N, the number of successful sequencing reactions; F, the minimum fragment length used for identification.

[0029] Count the number of occurrences of non-repetitive small fragments in shotgun sequencing:

[0030] Define a random variable Y_ik describing the event that a DNA fragment of the specified length, as above, occurs k times in whole-genome sequencing by the shotgun method:

[0031] If the num...
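The patent's own formula is cut off above, so the following is only a minimal sketch of the kind of calculation the parameters G, L, N and F suggest, not the inventors' derivation. It assumes that read start positions are uniform over the genome, so that the number of reads fully containing one non-repetitive F-bp fragment is approximately Poisson with mean N(L - F + 1)/(G - L + 1). The function names and the significance level `alpha` are hypothetical.

```python
import math


def expected_occurrences(G, L, N, F):
    """Mean number of reads that fully contain one unique F-bp fragment.

    Assumption (not from the patent text): read starts are uniform over the
    G - L + 1 possible positions, and L - F + 1 of them cover the fragment.
    """
    return N * (L - F + 1) / (G - L + 1)


def poisson_pmf(k, lam):
    """P(K = k) for a Poisson variable with mean lam > 0, computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def repeat_threshold(G, L, N, F, alpha=1e-3):
    """Smallest count t such that a unique fragment reaches t with probability < alpha.

    Fragments seen t or more times would then be treated as candidate repeats.
    """
    lam = expected_occurrences(G, L, N, F)
    cdf = 0.0  # running sum of P(K = 0) ... P(K = t - 1)
    t = 0
    while 1.0 - cdf >= alpha:
        cdf += poisson_pmf(t, lam)
        t += 1
    return t


if __name__ == "__main__":
    # Illustrative numbers only: a 1 Mb genome, 500 bp reads, 10,000 reads, F = 20.
    lam = expected_occurrences(G=1_000_000, L=500, N=10_000, F=20)
    t = repeat_threshold(G=1_000_000, L=500, N=10_000, F=20)
    print(f"expected occurrences of a unique fragment: {lam:.2f}")
    print(f"flag fragments seen at least {t} times as repeats")
```

Under these assumptions, a fragment observed far more often than the expected count is unlikely to be unique, which is the intuition behind setting an occurrence threshold for repeat recognition.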

Abstract

A method for splicing whole-genome sequencing data based on repetitive sequence recognition. The method calculates the probability distributions of non-repetitive and repetitive fragments in the sequencing data, determines a criterion for recognizing repetitive sequences, masks the repetitive sequences according to that criterion, splices the data in groups according to the size of the target genome, restores the N characters in the large fragments to the original bases, finds related large fragments and the reads between them, links them together, and orders them to obtain a working framework map of the target genome. Its advantages are high efficiency and high accuracy.
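The masking step in the abstract can be illustrated with a short sketch. This is not the patented procedure: it assumes that repeats are recognized by counting every F-bp fragment across all reads and masking, with N, any fragment whose count reaches a chosen threshold. The function names are hypothetical, and reverse-complement strands are ignored for brevity.

```python
from collections import Counter


def count_fragments(reads, F):
    """Count every F-bp fragment over all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - F + 1):
            counts[read[i:i + F]] += 1
    return counts


def mask_repeats(read, counts, F, threshold):
    """Return a copy of read with every over-represented F-bp fragment replaced by N."""
    masked = list(read)
    for i in range(len(read) - F + 1):
        if counts[read[i:i + F]] >= threshold:
            masked[i:i + F] = "N" * F
    return "".join(masked)


if __name__ == "__main__":
    reads = ["AAAAAT", "GCAAAA"]
    counts = count_fragments(reads, F=4)
    for read in reads:
        print(read, "->", mask_repeats(read, counts, F=4, threshold=3))
```

Because the original reads are left unchanged, the masked positions can later be restored to their original bases, as the abstract describes for the large spliced fragments.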

Description

Technical field

[0001] The invention relates to a method for splicing whole-genome sequencing data based on repetitive sequence recognition, and belongs to the technical field of genetic engineering.

Background

[0002] Genomics is the comprehensive analysis of the complete set of genetic material of an organism, aimed at understanding the function and role of genetic information from a holistic perspective. The most important step is to determine the complete set of genetic information of the organism, that is, the sequence of all of its nucleic acid bases; this is known as whole-genome sequencing analysis. At present, whole-genome sequencing mainly adopts two strategies: 1. The "hierarchical cloning method", in which the large genome is first broken into medium-sized fragments (150 kb to 300 kb) and cloned, the medium-sized fragments are then broken into small fragments (1 kb to 3 kb) for sequencing, and the data are finally spliced by computer. For examp...

Application Information

IPC(8): C12P19/34, C12Q1/68
Inventor 李松岗, 王俊, 盖伊·王, 于军, 汪建, 杨焕明, 倪培相, 韩玉军, 黄显刚, 张建国, 胡咏武
Owner 北京六合华大基因科技有限公司