Assembly strategy for highly heterozygous diploid genome Scaffold sequences

A sequence assembly technique for diploid genomes, applied in the field of bioinformatics, that addresses the poor scaffold assembly results caused by high genome heterozygosity and improves the assembly outcome.

Status: Inactive. Publication date: 2013-03-20
BEIJING NOVOGENE TECH CO LTD
5 Cites, 11 Cited by

AI Technical Summary

Problems solved by technology

[0007] The main purpose of the present invention is to provide a scaffold sequence assembly strategy for highly heterozygous diploid genomes, solving the prior-art problem that scaffold assembly performs poorly when the genome itself is highly heterozygous.

Examples

Embodiment Construction

[0022] To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here serve only to explain the present invention and are not intended to limit it.

[0023] Scaffold sequence (Scaffold) assembly requires two inputs: the contig sequence (Contig) files, which carry coverage information and are obtained by linearizing the De Bruijn graph, and the Read files of each insert-size fragment library. After a series of processing units, the final structure is a chain of linearly connected Contigs, with a gap of a certain length between adjacent Contigs that represents the distance between them. This linearly connected Contig structure is the Scaffold. Since the insertion length of the inser...
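The structure described in [0023], Contigs joined in a line with gaps standing for the distance between them, can be illustrated with a minimal sketch. The `Contig` class, the `render_scaffold` function, the use of `N` characters to write out a gap, and the clamping of negative gaps to zero are all assumptions made for illustration; the source does not specify an output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contig:
    """Hypothetical container for one contig (not defined in the source)."""
    name: str
    sequence: str


def render_scaffold(contigs: List[Contig], gaps: List[int]) -> str:
    """Join contigs in order, writing each estimated gap as a run of 'N'.

    gaps[i] is the estimated distance between contigs[i] and contigs[i + 1].
    Negative estimates (overlaps) are clamped to zero: an assumption.
    """
    if len(gaps) != len(contigs) - 1:
        raise ValueError("need exactly one gap per adjacent contig pair")
    parts = [contigs[0].sequence]
    for gap, contig in zip(gaps, contigs[1:]):
        parts.append("N" * max(gap, 0))  # blank space between Contigs
        parts.append(contig.sequence)
    return "".join(parts)


scaffold = render_scaffold(
    [Contig("c1", "ACGT"), Contig("c2", "GG"), Contig("c3", "TTA")],
    gaps=[3, 0],
)
print(scaffold)  # ACGTNNNGGTTA
```

Keeping the gap lengths separate from the sequences means the distances can be revised as each further insert-size library is processed, without touching the Contig sequences themselves.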

Abstract

The invention provides an assembly strategy for highly heterozygous diploid genome Scaffold sequences, suitable for the field of bioinformatics. The strategy comprises the following steps: aligning Reads to Contigs to obtain the mapping information Arc and Link; setting thresholds according to Contig length and coverage depth, and filtering out short Contigs with high coverage depth; constructing a directed connection graph between Contigs according to the Arc relationship between pairs of Contigs, and processing the graph with a unit that searches for bubble-like structures and filters out a single path of each; constructing a directed connection graph between Contigs and TempScaffolds according to the Link relationship between Contigs, and linearizing the graph; after traversing all insert-size libraries, taking the final TempScaffold as the final Scaffold; and restoring, from the stored information, the heterozygous single paths that were filtered out using the Arc information, so that they appear in the final result. The strategy plays an important role in Scaffold assembly for highly heterozygous diploid genomes and yields a result that meets the requirements of subsequent analysis.
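Two of the steps in the abstract, threshold filtering of Contigs and the search for bubble-like structures in the Arc graph, can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary-based graph, the restriction to the simplest bubble shape (two single-Contig branches between one source and one sink), and the rule of dropping the lower-depth branch are assumptions, not the patented procedure. The removed branch is recorded so that it can be restored later, as the abstract describes.

```python
from typing import Dict, List, Set, Tuple


def filter_contigs(contigs: Dict[str, Tuple[int, float]],
                   min_length: int, max_depth: float) -> Dict[str, Tuple[int, float]]:
    """Drop Contigs that are both short and deeply covered.

    contigs maps a name to (length, coverage depth); thresholds are assumed inputs.
    """
    return {name: (length, depth)
            for name, (length, depth) in contigs.items()
            if not (length < min_length and depth > max_depth)}


def find_simple_bubbles(arcs: Dict[str, Set[str]]) -> List[Tuple[str, str, str, str]]:
    """Return (source, branch_a, branch_b, sink) for each two-branch bubble."""
    preds: Dict[str, Set[str]] = {}
    for u, successors in arcs.items():
        for v in successors:
            preds.setdefault(v, set()).add(u)
    bubbles = []
    for source in sorted(arcs):
        branches = sorted(arcs[source])
        if len(branches) != 2:
            continue
        a, b = branches
        # Each branch must be reachable only from the source.
        if preds.get(a) != {source} or preds.get(b) != {source}:
            continue
        next_a, next_b = arcs.get(a, set()), arcs.get(b, set())
        # Both branches must rejoin at the same single sink.
        if len(next_a) == 1 and next_a == next_b:
            bubbles.append((source, a, b, next(iter(next_a))))
    return bubbles


def pop_bubbles(arcs: Dict[str, Set[str]], depth: Dict[str, float]):
    """Remove the lower-depth branch of each bubble (assumed rule).

    Returns the pruned graph and a record {removed branch: kept branch}.
    """
    pruned = {u: set(vs) for u, vs in arcs.items()}
    removed: Dict[str, str] = {}
    for source, a, b, _sink in find_simple_bubbles(arcs):
        drop, keep = (a, b) if depth[a] < depth[b] else (b, a)
        pruned[source].discard(drop)
        pruned.pop(drop, None)
        removed[drop] = keep  # stored so the path can be restored later
    return pruned, removed


kept = filter_contigs({"c1": (5000, 30.0), "c2": (120, 95.0), "c3": (150, 28.0)},
                      min_length=200, max_depth=60.0)
arcs = {"s": {"a", "b"}, "a": {"t"}, "b": {"t"}, "t": set()}
depth = {"s": 30.0, "a": 16.0, "b": 14.0, "t": 30.0}
pruned, removed = pop_bubbles(arcs, depth)
print(sorted(kept))  # ['c1', 'c3']
print(removed)       # {'b': 'a'}
```

In this toy graph the two branches `a` and `b` each carry roughly half the depth of the flanking Contigs, which is the pattern two alleles of a heterozygous region would be expected to produce.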

Description

Technical field

[0001] The invention belongs to the field of bioinformatics, and in particular relates to a scaffold sequence assembly strategy for highly heterozygous diploid genomes.

Background

[0002] High-speed, low-cost second-generation sequencing technology has advanced genome research in animals and plants. Current bioinformatics assembly algorithms based on second-generation sequencing were developed for genomes with simple sequences, and they produce good assemblies when the sequence complexity of the genome is low. However, when the genome is relatively complex (for example, when heterozygosity is higher than 0.5%), the assembly result is very unsatisfactory and does not meet the minimum requirements for subsequent analysis.

[0003] Assembly is the process of processing the short sequences (Reads) obtained during sequencing to obtain a linear genome sequ...

Application Information

IPC(8): G06F19/20
Inventor: 阮航, 王海龙, 朱红梅, 李瑞强
Owner: BEIJING NOVOGENE TECH CO LTD