Method and device for removing redundancy from highly heterozygous diploid sequence assembly results, and application thereof

A sequence assembly technology applied in the field of removing redundancy from highly heterozygous diploid assembly results. It addresses the difficulty of removing redundant sequences, and achieves improved accuracy and a high rate of redundant-sequence removal.

Status: Inactive. Publication date: 2021-12-10
BEIJING NOVOGENE TECH CO LTD
Cites: 5; Cited by: 3

AI Technical Summary

Problems solved by technology

[0008] The main purpose of the present invention is to provide a method, device and application for removing redundancy from the assembly results of highly heterozygous diploid sequences, so as to solve the prior-art problem that redundant sequences are difficult to remove from the assembly results of highly heterozygous diploid genomes.

Examples

Embodiment 1

[0033] In a typical embodiment of the present application, a method for removing redundancy from the assembly results of highly heterozygous diploid sequences is provided. The method comprises:

[0034] S101: align the Hi-C data to the highly heterozygous diploid sequence assembly to obtain an alignment file that retains all alignment information, including multiple alignments;

[0035] S102: calculate the Hi-C interaction strength within each contig from the alignment file, break the misjoined contigs in the highly heterozygous diploid sequence assembly according to that intra-contig interaction strength, and obtain a correctly connected genome sequence after breaking;
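The patent does not state how misjoins are detected from intra-contig interaction strength. As a minimal sketch under that gap, the Python below assumes a common heuristic: in a correctly joined contig, short-range Hi-C contacts span every position, so a sharp drop in spanning contacts relative to the contig's median marks a likely misjoin. The function names and the `bin_size`, `window` and `ratio` parameters are hypothetical, and contacts are assumed to be given as (pos1, pos2) pairs on a single contig.

```python
from statistics import median


def spanning_counts(contacts, length, bin_size, window):
    """Count intra-contig contacts (pos1, pos2) that span each bin boundary x,
    keeping only short-range pairs with both ends within `window` bp of x."""
    counts = {}
    for x in range(bin_size, length, bin_size):
        n = 0
        for p1, p2 in contacts:
            lo, hi = min(p1, p2), max(p1, p2)
            if x - window <= lo < x <= hi < x + window:
                n += 1
        counts[x] = n
    return counts


def find_breakpoints(contacts, length, bin_size=10_000, window=50_000, ratio=0.2):
    """Return candidate misjoin positions: boundaries whose spanning-contact
    count falls below `ratio` times the contig's median count."""
    counts = spanning_counts(contacts, length, bin_size, window)
    if not counts:
        return []
    med = median(counts.values())
    weak = [x for x, n in counts.items() if n < ratio * med]
    # Collapse runs of adjacent weak boundaries into one breakpoint,
    # placed at the weakest boundary of each run.
    breakpoints, run = [], []
    for x in weak:
        if run and x - run[-1] > bin_size:
            breakpoints.append(min(run, key=counts.get))
            run = []
        run.append(x)
    if run:
        breakpoints.append(min(run, key=counts.get))
    return breakpoints


def split_contig(name, length, breakpoints):
    """Break a contig into half-open (name, start, end) pieces."""
    edges = [0, *breakpoints, length]
    return [(f"{name}_{i}", s, e)
            for i, (s, e) in enumerate(zip(edges, edges[1:]), start=1)]
```

On a contig whose contacts never cross one boundary, that boundary is reported as a breakpoint and `split_contig` turns it into two half-open intervals; a contig with uniform short-range coverage yields no breakpoints.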

[0036] S103: using the correctly connected genome obtained after breaking and its corresponding alignment file, cluster the contigs according to the interaction str...
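S103 is truncated in this extract, so the exact clustering criterion is unknown. The sketch below assumes a length-normalized average-linkage agglomerative clustering: groups are repeatedly merged by the highest inter-group contact density until `k` groups remain. `cluster_contigs` and the normalization are hypothetical choices, and `k` would be set to the expected number of chromosome groups.

```python
from itertools import combinations


def cluster_contigs(lengths, contacts, k):
    """Group contigs into k clusters by Hi-C inter-contig interaction strength.

    lengths:  dict contig -> length in bp
    contacts: dict (contig_a, contig_b) -> contact weight; if both key
              orientations are present, their weights are summed
    Linkage between two groups is their total contact weight divided by the
    product of their total lengths (a length-normalized average linkage).
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    def weight(a, b):
        return contacts.get((a, b), 0.0) + contacts.get((b, a), 0.0)

    groups = [frozenset([c]) for c in lengths]
    while len(groups) > k:
        best_score, best_pair = -1.0, None
        for g1, g2 in combinations(groups, 2):
            total = sum(weight(a, b) for a in g1 for b in g2)
            size = sum(lengths[a] for a in g1) * sum(lengths[b] for b in g2)
            score = total / size
            if score > best_score:
                best_score, best_pair = score, (g1, g2)
        g1, g2 = best_pair
        # Replace the two merged groups by their union.
        groups = [g for g in groups if g != g1 and g != g2] + [g1 | g2]
    return groups
```

Normalizing by length keeps long contigs from absorbing everything merely because they collect more reads. The all-pairs scan per merge is fine for illustration but would need a priority queue for real assemblies.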

Embodiment 2

[0056] This embodiment provides a detailed method for removing redundancy from the assembly results of highly heterozygous diploid sequences. The specific process is shown in Figure 1 (rounded rectangles represent inputs/outputs; right-angled rectangles represent processing operations):

[0057] (1) Use HICUP or HiC-Pro to align the Hi-C data to the genome from which redundancy is to be removed, and obtain the alignment file align.bam. Keep all alignment information in the result: do not filter the alignment file by MAPQ (mapping quality value), and do not filter out multiple alignments. Retaining all alignments helps improve the accuracy of the computed intra-contig and inter-contig interaction strengths, and thus the accuracy of the subsequent clustering.
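The Python below sketches how every alignment can be kept when interaction strengths are later computed. It assumes a hypothetical `Hit` record per candidate alignment of a read end (not a HICUP or HiC-Pro output format) and, as an assumption not stated in the patent, spreads each read pair's single unit of weight evenly over all combinations of its alignments. MAPQ is carried on each record but deliberately never used as a filter.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Hit:
    """One reported alignment of a read end (hypothetical record type)."""
    contig: str
    pos: int
    mapq: int  # carried along, but deliberately never used as a filter


def contact_weights(pairs):
    """Accumulate contig-pair contact weights from Hi-C read pairs.

    `pairs` is a list of (hits_r1, hits_r2); each side lists every alignment
    reported for that read end, including multi-mapped and MAPQ-0 hits.
    Each pair contributes a total weight of 1, split evenly across all
    combinations of its alignments. Intra-contig pairs are keyed (c, c).
    """
    weights = defaultdict(float)
    for hits1, hits2 in pairs:
        combos = list(product(hits1, hits2))
        if not combos:
            continue  # an unaligned end gives no contact
        share = 1.0 / len(combos)
        for h1, h2 in combos:
            key = tuple(sorted((h1.contig, h2.contig)))
            weights[key] += share
    return dict(weights)
```

A MAPQ filter would discard any read pair whose end maps to several near-identical haplotype copies, which in a highly heterozygous assembly may be a sizeable share of the contacts; splitting the weight keeps that evidence without counting any pair more than once.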

[0058] (2) Using the alignment file align.bam, calculate the Hi-C interaction strength within each contig. According to the Hi-C interaction...

Embodiment 3

[0064] Because highly heterozygous diploid genomes carry so much heterozygosity, this method abandons the simple approach of removing heterozygous (redundant) sequences first. Instead, contigs are first anchored to chromosomes and the redundant homologous set is removed afterwards, which avoids chimeric contigs that may arise from misjoins during assembly and parental chimeric sequences appearing within the same contig. The de-redundancy method of the process shown in Embodiment 2 was applied to the assembly of a Citrus species (heterozygosity greater than 1.5%); the results are shown in the table below.

[0065] Table 1:

[0066]

[0067] As the table above shows, tests on a highly heterozygous diploid plant indicate that the de-redundancy method of the present application effectively breaks obviously misjoined contigs in the genome. After the above steps, the BUSCO D value (complete and duplicated BUSCOs) of the finally retained genome drops from 93.2% before de-redundancy to 1.4% after de-redundancy, that is, it drops below...
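BUSCO reports its results in a one-line notation: C (complete), S (complete single-copy), D (complete duplicated), F (fragmented), M (missing) and n (number of BUSCOs searched). When both haplotypes of a highly heterozygous diploid are retained, most complete BUSCOs are found twice, which is why D is so high before de-redundancy. The hypothetical helpers below parse that notation so the before and after D values can be compared; they assume the standard `C:x%[S:x%,D:x%],F:x%,M:x%,n:N` layout of a BUSCO short summary.

```python
import re

# BUSCO one-line notation, e.g. C:x%[S:x%,D:x%],F:x%,M:x%,n:N
_BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(line):
    """Parse a BUSCO summary line into percentages plus the BUSCO count n."""
    m = _BUSCO_LINE.search(line)
    if m is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    result = {key: float(m[key]) for key in ("C", "S", "D", "F", "M")}
    result["n"] = int(m["n"])
    return result


def duplicated_drop(before_line, after_line):
    """Percentage points by which D (complete and duplicated) fell."""
    return parse_busco(before_line)["D"] - parse_busco(after_line)["D"]
```

Applied to the summary lines from before and after de-redundancy, `duplicated_drop` gives the reduction in D directly.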

Abstract

The invention provides a method and a device for removing redundancy from highly heterozygous diploid sequence assembly results, and an application thereof. The method comprises: aligning Hi-C data to the highly heterozygous diploid sequence assembly to obtain an alignment file that retains all alignment information, including multiple alignments; calculating the Hi-C interaction strength within each contig and breaking the misjoined contigs in the assembly; clustering the contigs into a plurality of groups using the correctly connected genome obtained after breaking, its corresponding alignment file, and the interaction strength between contigs; ordering and orienting the contig sequences within each group to obtain a chromosome-level genome; and retaining one copy of each set of homologous chromosomes in the chromosome-level genome and combining it with the unanchored contig sequences to form a final genome from which redundant sequences have been removed. This solves the problem that redundant sequences are difficult to remove.
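The final step in the abstract can be sketched as follows. How homologous chromosomes are paired is not specified in this extract, so the sketch assumes the pairs are already known (for example from synteny) and, as a hypothetical tie-breaking rule, keeps the longer member of each pair. Chromosome groups that belong to no pair and all unanchored contigs are passed through unchanged. `final_genome` and `to_fasta` are illustrative names.

```python
def final_genome(chromosomes, homolog_pairs, unplaced):
    """Build the de-redundant genome as a dict name -> sequence.

    chromosomes:   chromosome-level sequences (one entry per homolog copy)
    homolog_pairs: (name_a, name_b) pairs of homologous chromosomes,
                   assumed to be known already (e.g. from synteny)
    unplaced:      contigs not anchored to any chromosome, kept as-is
    """
    kept = {}
    paired = set()
    for a, b in homolog_pairs:
        paired.update((a, b))
        # Hypothetical rule: keep the longer copy (ties keep name_a).
        keep = a if len(chromosomes[a]) >= len(chromosomes[b]) else b
        kept[keep] = chromosomes[keep]
    # Chromosome groups without a recorded partner are retained.
    for name, seq in chromosomes.items():
        if name not in paired:
            kept[name] = seq
    kept.update(unplaced)
    return kept


def to_fasta(records, width=60):
    """Format name -> sequence records as FASTA text, wrapped at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```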

Description

Technical Field

[0001] The present invention relates to the field of genome sequence assembly, and in particular to a method, device and application for removing redundancy from the assembly results of highly heterozygous diploid sequences.

Background

[0002] At present, genome assembly is mainly based on PacBio single-molecule real-time (SMRT) sequencing (including CLR and HiFi data), and/or ONT (Oxford Nanopore Technologies) sequencing, and/or NGS (next-generation sequencing). The genome is first assembled to the contig level; contigs are then joined into scaffolds using technologies such as 10x Genomics and Bionano; finally, a genome map or Hi-C data are used to anchor the assembly to the chromosome level.

[0003] However, problems can arise during assembly, especially for highly heterozygous species (high heterozygosity means a heterozygosity level higher than 0.5%, where the...
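The 0.5% threshold in [0003] can be expressed as a simple check. The source's definition of heterozygosity is truncated, so the sketch assumes it is measured as heterozygous sites per callable base; the function names are hypothetical. Under this check, the Citrus species in Embodiment 3 (heterozygosity above 1.5%) is well past the threshold.

```python
HIGH_HETEROZYGOSITY = 0.005  # 0.5%, the threshold given in [0003]


def heterozygosity(het_sites, callable_bases):
    """Heterozygous sites per callable base (assumed definition)."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return het_sites / callable_bases


def is_highly_heterozygous(rate):
    """True when heterozygosity is strictly higher than 0.5%."""
    return rate > HIGH_HETEROZYGOSITY
```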

Application Information

Patent type & authority: Application (China)
IPC (8): G16B30/10; G16B30/20; G16B40/00
CPC: G16B30/10; G16B30/20; G16B40/00
Inventors: 李本萍 (Li Benping), 周勋 (Zhou Xun), 田仕林 (Tian Shilin), 蔡晶 (Cai Jing), 陶琳娜 (Tao Linna)
Owner: BEIJING NOVOGENE TECH CO LTD