Next-generation sequencing data assembly method combining reference-genome-based assembly and de novo assembly

The method combines a reference genome with next-generation sequencing technology and is applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses reduced assembly integrity, strong dependence on reference genome quality, and low assembly quality, with the effect of reducing complexity and improving continuity and accuracy.

Status: Inactive
Publication Date: 2016-02-03
HUAZHONG AGRI UNIV

AI Technical Summary

Problems solved by technology

Reference-based assembly is strongly affected by the quality of the reference genome.
When the reference genome is of high quality, the resulting assembly is also of high quality; when the reference genome is of low quality, so is the assembly.
In addition, this strategy has difficulty assembling species-specific fragments, which greatly reduces the integrity of the assembly.

Examples

Embodiment 1

[0048] The sequencing data used in this study were whole-genome shotgun sequencing data from leaves of MH63 and ZS97 (the indica rice varieties Minghui 63 and Zhenshan 97), provided by the rice research team of Huazhong Agricultural University. Samples were taken at the three-leaf stage. Sequencing was performed on the Illumina HiSeq 2000 platform with PE100 reads, using three libraries with insert sizes of 300 bp, 5 kb and 10 kb (Table 1). The Nipponbare reference genome IRGSP-1.0 (http://rapdb.dna.affrc.go.jp/) was also used.

[0049] Table 1 Sequencing data statistics

[0050]

[0051] a Statistics based on read alignment to the Nipponbare genome.

[0052] b Estimated from the Nipponbare genome size.
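
Footnote b indicates that one of the table's figures was estimated from the Nipponbare genome size. Assuming that figure is sequencing depth (the table itself is not reproduced here, so this is a guess), a minimal sketch of the usual estimate is shown below. The function name and all numbers are hypothetical and are not taken from Table 1.

```python
def estimate_depth(read_count: int, read_length: int, genome_size: int) -> float:
    """Estimate average sequencing depth as total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return read_count * read_length / genome_size


# Hypothetical inputs for illustration only (not values from the source).
example_depth = estimate_depth(read_count=10_000_000, read_length=100,
                               genome_size=400_000_000)
```

For paired-end data, `read_count` here counts individual reads, so a library of N pairs contributes 2N reads.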

[0053] We adopted an assembly strategy based on the Nipponbare reference genome. We divided the Nipponbare sequence into multiple regions and performed local de novo assembly within each region. The Nipponbare sequence, whole genome de novo as...
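
The region-based step described in [0053] can be illustrated with a small sketch that groups aligned reads by reference region, so that each group could then be handed to a local assembler. The source does not say how the regions are defined, so the fixed-size windows, the `region_size` parameter and the input tuple layout below are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_reads_to_regions(alignments, region_size):
    """Group read IDs by the reference window that contains their start.

    alignments: iterable of (read_id, chromosome, 0-based start position)
    region_size: window length in bp (hypothetical; not stated in the source)
    """
    if region_size <= 0:
        raise ValueError("region_size must be positive")
    regions = defaultdict(list)
    for read_id, chrom, start in alignments:
        window_index = start // region_size
        regions[(chrom, window_index)].append(read_id)
    return dict(regions)
```

Each key is a `(chromosome, window index)` pair. A real pipeline would also have to handle reads that span a window boundary and read pairs whose mates fall in different regions, which this sketch ignores.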

Abstract

The invention relates to a next-generation sequencing data assembly method that combines reference-genome-based assembly with de novo assembly. The two strategies are combined so that the disadvantages of each are overcome and the advantages of both are fully used. The method comprises: first, obtaining a genome sequence with relatively high continuity and accuracy using the reference-based strategy; second, obtaining a de novo assembled genome using the de novo strategy, which performs relatively well at assembling species-specific sequences; and finally, integrating the two genomes to generate a genome with relatively high accuracy, continuity and integrity.
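
As a rough illustration of the final integration step, the sketch below keeps every reference-guided sequence and adds only those de novo contigs that are not already contained in one of them. This is not the integration procedure of the invention, which the abstract does not detail. Exact substring matching (on either strand) stands in for the sequence alignment a real tool would use, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def integrate_assemblies(reference_guided: list[str], de_novo: list[str]) -> list[str]:
    """Keep all reference-guided sequences, then append de novo contigs
    that are not found (on either strand) inside a sequence already kept."""
    merged = list(reference_guided)
    for contig in de_novo:
        candidates = (contig, reverse_complement(contig))
        covered = any(c in seq for seq in merged for c in candidates)
        if not covered:
            merged.append(contig)
    return merged
```

In this toy model, a de novo contig that survives the filter plays the role of a species-specific sequence that the reference-based assembly missed.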

Description

Technical field

[0001] The present invention relates to methods for whole-genome assembly of second-generation sequencing reads when a reference genome is available.

Background technique

[0002] At present, there are two main assembly strategies for next-generation sequencing data, depending on whether a reference genome is available: de novo genome assembly and reference-based assembly. Each has advantages and disadvantages.

[0003] De novo genome assembly relies entirely on the sequenced reads, without a reference genome or any other genome. Currently, there are three main algorithms for sequence assembly. The first is the greedy algorithm. As long as sequences share overlapping (identical) regions, this algorithm finds the largest overlap and merges the sequences, repeating the process to join more of them. The algorithm is simple and reaches locally optimal results, but it rarely reaches a globally optimal one. In practice, ...
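
The greedy strategy described in [0003] can be sketched as follows: at each step, find the pair of sequences with the largest suffix-prefix overlap and merge them, stopping when no overlap of at least `min_overlap` bases remains. This is a toy version for short error-free strings, not the implementation used by any particular assembler. The `min_overlap` default and the function names are assumptions.

```python
def overlap_length(a: str, b: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    max_len = min(len(a), len(b))
    for length in range(max_len, min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                length = overlap_length(a, b, min_overlap)
                if length > best_len:
                    best_len, best_i, best_j = length, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
```

Because each merge is chosen only by the current best overlap, a repeat can cause an early merge that blocks a better overall layout, which is the local-versus-global limitation noted in the text.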

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/18
Inventors: 陈玲玲, 孙帅, 焦文标, 徐锡文, 宋佳明
Owner: HUAZHONG AGRI UNIV