Method and device for evaluating and verifying sequence assembly results of third-generation sequencing

A technology for sequence assembly and next-generation sequencing, applied in sequence analysis, special-purpose data processing, instruments, etc., which can address problems such as low coverage depth.

Active Publication Date: 2018-04-10
BGI TECH SOLUTIONS

AI Technical Summary

Problems solved by technology

[0006] To effectively address the causes of low second-generation coverage depth in some regions of third-generation sequencing results, the present invention provides a method and device for evaluating and verifying the sequence assembly results of third-generation sequencing.

Examples

Embodiment 1

[0064] Embodiment 1: a concrete application of the method of the present invention to the maize genome

[0065] (1) Third-generation sequence assembly

[0066] Use the third-generation assembly software FALCON to assemble the 80X third-generation maize genome data (PacBio single-molecule real-time (SMRT) sequencing results), polish the assembly with the original third-generation data, and then use the 60X second-generation data for further error correction to obtain the final assembly of the maize genome.

[0067] (2) Alignment of the second-generation sequences to the third-generation assembly result

[0068] Use SOAPAligner software to align the filtered 60X PE250 second-generation sequences (filtered second-generation sequences are the sequences remaining after adapters and low-quality bases are removed. Filtering from raw data to effective data proceeds in three steps: 1) filter adapter: sequencing re...

Abstract

The invention discloses a method and device for evaluating and verifying the sequence assembly results of third-generation sequencing. The method comprises the steps of: aligning second-generation sequences to the third-generation assembly result; selecting and extending low-coverage regions to obtain extended sequences; aligning third-generation sequences to the extended sequences; counting per-base coverage depth; and marking the assembly result. The method and device can screen out and mark lower-quality regions of a third-generation assembly. In subsequent research on the species, the marks serve as a warning whenever a lower-quality region needs to be used and provide a rapid screening method for later improvement. At the same time, the accuracy and quality of the third-generation assembly result can be verified, and the accuracy of the assembly result can be improved.
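The "select and extend low-coverage regions" step can be illustrated with a minimal Python sketch. It assumes per-base second-generation depth is already available as a list for one assembled sequence; the function names, the depth threshold, and the flank length are hypothetical choices for illustration, not values taken from the patent.

```python
def low_coverage_regions(depth, threshold):
    """Return half-open (start, end) intervals where depth < threshold."""
    regions = []
    start = None
    for pos, d in enumerate(depth):
        if d < threshold:
            if start is None:
                start = pos          # a low-coverage run begins here
        elif start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:            # run reaches the end of the sequence
        regions.append((start, len(depth)))
    return regions


def extend_and_merge(regions, flank, seq_len):
    """Extend each region by `flank` bases on both sides, clip to the
    sequence bounds, and merge regions that overlap or touch."""
    merged = []
    for start, end in sorted(regions):
        start = max(0, start - flank)
        end = min(seq_len, end + flank)
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Toy per-base depth profile (hypothetical data).
depth = [10, 10, 2, 1, 10, 10, 0, 0, 10, 10]
regions = low_coverage_regions(depth, threshold=5)
extended = extend_and_merge(regions, flank=1, seq_len=len(depth))
```

The extended intervals would then be cut out of the assembly so that third-generation reads can be aligned to them and per-base depth recounted, as the abstract describes.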

Description

Technical field

[0001] The invention belongs to the field of genome sequencing and relates to a method and device for evaluating and verifying the sequence assembly results of third-generation sequencing.

Background

[0002] A contig is a gap-free sequence assembled from sequence segments (reads) by joining their overlapping regions. A scaffold is built by ordering contigs according to paired-end position information, so it may contain gaps between them. To compute N50, sort the assembled contigs or scaffolds from longest to shortest and accumulate their lengths; the length of the contig or scaffold at which the cumulative length first exceeds 50% of the total length of all assembled sequences is the N50. N50 is important for evaluating the continuity and completeness of an assembly. N70 and N90 are calculated in the same way as N50, with the percentage changed to 70% or 90%.

[0003] Due to the limitation of the read length of next-generation s...
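The N50 definition in [0002] can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's own code: the function name `nx` and the example lengths are hypothetical, and the sketch uses the common convention that the cumulative length must reach at least the requested percentage of the total.

```python
def nx(lengths, percent=50):
    """Return the Nx statistic (N50 by default) for a list of contig or
    scaffold lengths: the length of the sequence at which the running
    total, taken from longest to shortest, first reaches `percent` of
    the total assembled length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    cumulative = 0
    for length in sorted(lengths, reverse=True):
        cumulative += length
        # Integer comparison avoids floating-point rounding issues.
        if cumulative * 100 >= total * percent:
            return length


# Hypothetical contig lengths (total 290).
contigs = [80, 70, 50, 40, 30, 20]
n50 = nx(contigs)        # cumulative 80, 150 -> reaches 145 at length 70
n90 = nx(contigs, 90)    # cumulative reaches 261 at length 30
```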

Application Information

IPC(8): G06F19/22
CPC: G16B30/00
Inventor: 邓天全
Owner: BGI TECH SOLUTIONS