
Method and device for improving genome assembly integrity and application thereof

A technology for improving genome assembly integrity, applied in instrumentation, sequence analysis, biostatistics, etc.

Active Publication Date: 2021-12-17
BEIJING NOVOGENE TECH CO LTD
Cites: 4 | Cited by: 8

AI Technical Summary

Problems solved by technology

[0010] The main purpose of the present invention is to provide a method, device and application for improving the integrity of genome assembly, so as to solve the problem in the prior art that most genome sequences still contain many gaps.



Examples


Embodiment 1

[0035] This embodiment provides a method for improving the integrity of genome assembly. As shown in Figure 1, the method includes:

[0036] S101, obtaining a preliminary chromosome-level genome of the target sample;

[0037] S102, aligning third-generation sequencing reads to the preliminary chromosome-level genome sequence, and clustering the optimally aligned reads by chromosome to obtain multiple clusters;

[0038] S103, locally assembling the third-generation reads within each of the clusters to obtain an assembled genome sequence with improved integrity.
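Steps S102 and S103 can be illustrated with a minimal Python sketch. It assumes the alignments are available as (read name, SAM flag, reference name) tuples, for example taken from the first three columns of a SAM file. The function name `cluster_reads_by_chromosome` is hypothetical. The sketch keeps one primary alignment per read by dropping unmapped, secondary and supplementary records, which is the same rule as the samtools -F2308 filter described in Embodiment 2. It then groups read names by chromosome. In S103 each resulting cluster would be assembled separately, which the sketch does not do.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# SAM flag bits that mark a record as NOT the single best (primary) alignment:
# 4 = unmapped, 256 = secondary, 2048 = supplementary (4 + 256 + 2048 = 2308).
NON_PRIMARY_MASK = 4 | 256 | 2048


def cluster_reads_by_chromosome(
    records: Iterable[Tuple[str, int, str]]
) -> Dict[str, List[str]]:
    """Hypothetical helper: group read names by the chromosome of their primary alignment.

    Each record is (read_name, sam_flag, reference_name).
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    seen = set()
    for read_name, flag, chrom in records:
        if flag & NON_PRIMARY_MASK:
            continue  # skip unmapped, secondary and supplementary records
        if read_name in seen:
            continue  # keep only one primary record per read
        seen.add(read_name)
        clusters[chrom].append(read_name)
    return dict(clusters)
```

Consider the records ("r1", 0, "chr1"), ("r1", 256, "chr2") and ("r2", 16, "chr2"). They yield {"chr1": ["r1"], "chr2": ["r2"]}. The secondary hit of r1 is ignored. Flag 16 only marks the reverse strand, so it does not exclude r2.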

[0039] In the method of this application, conventional assembly is first performed with the sequencing reads to obtain a primary assembly, and the third-generation reads are then aligned (mapped) back to the chromosome-level version of the primary assembly (you can al...

Embodiment 2

[0048] This embodiment takes the assembly of PacBio CCS reads as an example and describes the assembly process in detail with reference to Figure 2.

[0049] 1) Assemble the CCS reads with software such as hifiasm or HiCanu to obtain Contig V1;
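hifiasm writes its contigs in GFA format rather than FASTA, so producing Contig V1 as a FASTA file usually takes a conversion step. The following is a minimal Python sketch of that conversion. It assumes the standard GFA layout, in which each segment line starts with `S` and is followed by a tab-separated name and sequence. The function names are hypothetical. The exact output file name, for example a `*.bp.p_ctg.gfa` primary-contig file, depends on the hifiasm version and mode.

```python
from typing import Iterable, Iterator, Tuple


def gfa_segments(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Hypothetical helper: yield (name, sequence) for every segment ('S') line of a GFA file."""
    for line in lines:
        if not line.startswith("S\t"):
            continue  # skip header, link, path and read-placement lines
        fields = line.rstrip("\n").split("\t")
        name, sequence = fields[1], fields[2]
        if sequence == "*":
            continue  # segment stored without sequence; nothing to write
        yield name, sequence


def gfa_to_fasta(lines: Iterable[str], width: int = 60) -> str:
    """Hypothetical helper: convert GFA segment lines to FASTA text, wrapping at `width` bases."""
    out = []
    for name, seq in gfa_segments(lines):
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + ("\n" if out else "")
```

A file handle can be passed directly as `lines`, for example `gfa_to_fasta(open("asm.bp.p_ctg.gfa"))`, where the file name is hypothetical.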

[0050] 2) Align the Hi-C data to Contig V1, then use the extract, partition, optimize and build modules of ALLHiC to anchor the contigs to chromosomes, obtaining the preliminary chromosome-level pseudochromosome V1;

[0051] 3) Use Juicebox to adjust the above result to obtain pseudochromosome V2;

[0052] 4) Use minimap2 to align the third-generation reads to pseudochromosome V2 to obtain the alignment BAM file;
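Step 4 can be scripted by building the minimap2 command and piping its SAM output into samtools to produce a BAM file. This Python sketch is an illustration under stated assumptions, not the exact command used in the patent. It assumes the `map-hifi` preset (available in minimap2 2.19 and later) suits CCS reads. It takes `--secondary=no` from the improvements listed in [0063]. The file names are hypothetical. Flag filtering is left to step 5.

```python
import shlex
from typing import List


def minimap2_cmd(reference: str, reads: str, threads: int = 8) -> List[str]:
    # -a: SAM output; -x map-hifi: preset for PacBio HiFi/CCS reads (an assumption here);
    # --secondary=no: do not report secondary alignments, as required in [0063]
    return ["minimap2", "-a", "-x", "map-hifi", "-t", str(threads),
            "--secondary=no", reference, reads]


def samtools_to_bam_cmd(out_bam: str) -> List[str]:
    # Read SAM from stdin ("-") and write it out as BAM (-b) to out_bam (-o)
    return ["samtools", "view", "-b", "-o", out_bam, "-"]


def pipeline(reference: str, reads: str, out_bam: str) -> str:
    """Hypothetical helper: render the two commands as one shell pipeline string."""
    left = shlex.join(minimap2_cmd(reference, reads))
    right = shlex.join(samtools_to_bam_cmd(out_bam))
    return f"{left} | {right}"
```

The resulting string could be run with `subprocess.run(cmd, shell=True, check=True)`. Building each tool's argument list separately makes the options easy to inspect before anything is executed.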

[0053] 5) Use samtools to filter by flag value, i.e. samtools view -F2308 (2308 = 4 + 256 + 2048), or use samtools markdup to remove duplicate alignments, so that each read will only correspond...
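The value 2308 in step 5 is a bit mask. samtools view -F drops any record whose FLAG shares a set bit with the mask. The Python sketch below decomposes the mask into its bits and reproduces the keep/drop decision, so the arithmetic 2308 = 4 + 256 + 2048 can be checked directly. The helper names are hypothetical, and the bit meanings follow the SAM specification.

```python
from typing import List

# SAM FLAG bits removed in step 5 (meanings per the SAM specification).
FLAG_NAMES = {
    4: "unmapped",
    256: "secondary",
    2048: "supplementary",
}
FILTER_MASK = 2308  # the value given to samtools view -F


def mask_bits(mask: int) -> List[int]:
    """Return the individual power-of-two bits set in `mask`, smallest first."""
    return [1 << i for i in range(mask.bit_length()) if mask & (1 << i)]


def passes_filter(flag: int, mask: int = FILTER_MASK) -> bool:
    """Mimic `samtools view -F mask`: keep a record only if it has none of the mask bits."""
    return (flag & mask) == 0


def describe(flag: int) -> List[str]:
    """Name the filtered bits present in a record's FLAG (other bits are ignored)."""
    return [FLAG_NAMES[b] for b in mask_bits(flag) if b in FLAG_NAMES]
```

A reverse-strand primary alignment (FLAG 16) passes the filter. A reverse-strand supplementary alignment (FLAG 2064 = 2048 + 16) is dropped because it carries the 2048 bit.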

Embodiment 3

[0060] In this example, the genome assembly of a plant was tested and the contig results before and after the process were compared. Without reducing assembly quality, the contig N50 increased from 14 Mb to 19 Mb, an increase of 34% (see the table below for details).
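Contig N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. A minimal Python sketch of that definition and of the relative increase is shown below. The helper names are hypothetical, and the test lengths are made up for illustration rather than taken from Table 1.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half of the total length."""
    sorted_lengths = sorted(lengths, reverse=True)
    half = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def percent_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100
```

With the rounded figures from the text, percent_increase(14, 19) gives about 35.7%. The reported 34% was therefore presumably computed from the unrounded N50 values.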

[0061] Table 1:


[0063] From the above description, it can be seen that the above embodiments of the present application achieve the following improvements: 1) When aligning third-generation reads to the reference genome, minimap2 must be run with the parameter --secondary=no, and the resulting BAM file must be filtered with samtools using the -F2308 parameter, i.e. filtering out flag values 4 (the read is not mapped to the reference sequence), 256 (the alignment is secondary, not the best one) and 2048 (supplementary alignment), or samtools markdup is used to remove duplicate alignments, ...


Abstract

The invention provides a method and a device for improving genome assembly integrity and an application thereof. The method comprises: obtaining a preliminary chromosome-level genome of a target sample; aligning third-generation sequencing reads to the preliminary chromosome-level genome sequence, and clustering the optimally aligned reads by chromosome to obtain a plurality of clusters; and locally assembling the third-generation reads within each cluster to obtain an assembled genome sequence with improved integrity. In this approach, conventional assembly is first performed with sequencing reads to obtain a primary assembly, which is then anchored to chromosomes; the third-generation reads are aligned back to the genome, the optimally aligned reads are clustered into groups by chromosome, and third-generation local assembly is performed within each group. A genome sequence with higher integrity is thereby obtained.

Description

Technical field

[0001] The present invention relates to the field of genome sequence assembly, and in particular to a method, device and application for improving the integrity of genome assembly.

Background

[0002] Since 1977, DNA sequencing technology has gone through three stages. The first stage is based mainly on the dideoxy chain-termination method proposed by Sanger and Coulson, also known as Sanger sequencing. The second stage is represented by Roche's 454 sequencing platform and Illumina's Solexa sequencing system, known as second-generation or "next-generation" sequencing (NGS). The third stage is represented by Pacific Biosciences' SMRT (single molecule real-time) technology and Oxford Nanopore Technologies' nanopore single-molecule sequencing, which are regarded as third-generation sequencing technologies.

[0003] As sequencing has developed, researchers have also started to explore the gen...


Application Information

IPC (8): G16B30/10; G16B30/20; G16B40/00
CPC: G16B30/10; G16B30/20; G16B40/00
Inventors: 李本萍 (Li Benping), 田仕林 (Tian Shilin), 周勋 (Zhou Xun), 陶琳娜 (Tao Linna), 王静 (Wang Jing)
Owner BEIJING NOVOGENE TECH CO LTD