Whole genome typing method based on Pacbio subreads and Hi-C reads

A whole-genome typing method, applied in genomics, sequence analysis, proteomics, and related fields, that addresses the shortcomings of existing typing tools, which either do not involve assembly or cannot type short contigs.

Pending Publication Date: 2020-10-23
WUHAN FRASERGEN CO LTD

AI Technical Summary

Problems solved by technology

[0009] However, existing typing workflows have shortcomings. For example, HapCUT can only type variant sites and does not involve assembly. Because short contigs contain few restriction sites, neither Falcon-Phase nor ALLHIC can type shorter contigs accurately. In addition, ALLHIC requires a chromosome-level genome assembly of a related species as a reference sequence.
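
The restriction-site limitation can be illustrated with a small sketch that counts restriction sites per contig. The motif GATC (the recognition site of MboI/DpnII, enzymes commonly used for Hi-C) and the `min_sites` threshold are assumptions made for illustration; the source does not name the enzyme or any cutoff.

```python
def count_sites(sequence: str, motif: str = "GATC") -> int:
    """Count occurrences of a restriction motif in a sequence.

    GATC is an assumed motif (MboI/DpnII); it is palindromic, so the
    reverse strand does not need to be scanned separately.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def contigs_with_few_sites(contigs: dict, motif: str = "GATC", min_sites: int = 2) -> list:
    """Return names of contigs with fewer than `min_sites` motif hits.

    `min_sites` is a hypothetical threshold, not a value from the source.
    """
    return [name for name, seq in contigs.items()
            if count_sites(seq, motif) < min_sites]


if __name__ == "__main__":
    demo = {"long_contig": "ACGATCTTGATCAAGATC", "short_contig": "ACGTACGT"}
    print(contigs_with_few_sites(demo))
```

Contigs flagged this way carry little Hi-C signal, which is the situation the paragraph above describes for Falcon-Phase and ALLHIC.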

Examples

Embodiment 1

[0038] Example 1: Reference sequence construction

[0039] In this example, doubled haploid plants of the highly heterozygous Populus nigra were available. The whole genome of a doubled haploid plant was therefore sequenced first, on the third-generation Pacbio Sequel platform, and assembled with Falcon. A Hi-C library was then constructed and sequenced, and the resulting Hi-C data were used to scaffold the draft genome assembled by Falcon. The high-quality doubled haploid genome obtained in this way served as the reference sequence for subsequent analysis.

Embodiment 2

[0040] Example 2: Aligning the sequencing data of highly heterozygous Populus nigra to the reference sequence

[0041] The highly heterozygous Populus nigra was sequenced on the third-generation Pacbio Sequel platform (about 560X). A Hi-C library of the same highly heterozygous plant was also constructed and sequenced to obtain Hi-C reads (about 515X). In addition, the shotgun sequencing data (about 289X) used to evaluate the genome heterozygosity of the highly heterozygous P. nigra were included. The three data sets were aligned to the reference genome: the third-generation data with NGMLR and the second-generation data with BWA-MEM. This produced three sets of alignment results.
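
As a rough sketch of the alignment step, the helpers below only build the NGMLR and BWA-MEM command lines; they do not run anything. The file names, thread counts, and the choice of the `pacbio` preset are assumptions for illustration, and the source does not state which options were actually used. BWA-MEM additionally expects the reference to have been indexed with `bwa index` beforehand and writes SAM to standard output.

```python
import shlex


def ngmlr_command(reference: str, reads: str, out_sam: str, threads: int = 8) -> list:
    """Build an NGMLR command for aligning Pacbio subreads (assumed preset: pacbio)."""
    return ["ngmlr", "-t", str(threads), "-r", reference,
            "-q", reads, "-o", out_sam, "-x", "pacbio"]


def bwa_mem_command(reference: str, read1: str, read2: str, threads: int = 8) -> list:
    """Build a BWA-MEM command for paired second-generation reads.

    The reference must already be indexed with `bwa index`; output goes to stdout.
    """
    return ["bwa", "mem", "-t", str(threads), reference, read1, read2]


if __name__ == "__main__":
    # All paths below are hypothetical placeholders.
    commands = [
        ngmlr_command("reference.fa", "pacbio_subreads.fq", "pacbio.sam"),
        bwa_mem_command("reference.fa", "hic_R1.fq", "hic_R2.fq"),
        bwa_mem_command("reference.fa", "shotgun_R1.fq", "shotgun_R2.fq"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run` once the tools are installed; printing the commands first makes it easy to check them before launching long jobs.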

Embodiment 3

[0042] Example 3: Building MVP blocks

[0043] HapCUT2 was used to analyze the shotgun sequencing data and the Hi-C data to construct linked SNP information, yielding one MVP block for each chromosome.
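
A minimal sketch of how such blocks might be loaded for downstream use is given below. It assumes the HapCUT2 haplotype block text format as commonly documented (a `BLOCK:` header line, one whitespace-separated line per variant with the haplotype 1 and haplotype 2 alleles in columns 2 and 3, chromosome, position, reference and alternate alleles in columns 4 to 7, and a line of asterisks closing each block). The column layout should be checked against the installed HapCUT2 version; the `PhaseBlock` class is a hypothetical helper, not part of HapCUT2.

```python
from dataclasses import dataclass, field


@dataclass
class PhaseBlock:
    """One phased block: position -> (haplotype 1 base, haplotype 2 base)."""
    chrom: str = ""
    sites: dict = field(default_factory=dict)


def parse_hapcut2_blocks(lines) -> list:
    """Parse HapCUT2-style block text into PhaseBlock objects (assumed format)."""
    blocks = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("BLOCK:"):
            current = PhaseBlock()
        elif line.startswith("*"):
            # A line of asterisks closes the current block.
            if current is not None and current.sites:
                blocks.append(current)
            current = None
        elif current is not None and line:
            cols = line.split()
            hap1, hap2, chrom, pos, ref, alt = cols[1:7]
            if hap1 == "-" or hap2 == "-":
                continue  # variant left unphased
            alleles = [ref] + alt.split(",")
            current.chrom = chrom
            current.sites[int(pos)] = (alleles[int(hap1)], alleles[int(hap2)])
    if current is not None and current.sites:
        blocks.append(current)  # file ended without a closing line
    return blocks
```

With one block per chromosome, as reported above, the resulting `sites` mapping gives the two phased alleles at every linked SNP on that chromosome.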

Abstract

The invention relates to a whole genome typing method based on Pacbio subreads and Hi-C reads. The method comprises the following steps: 1) preparing a reference genome; 2) aligning second-generation sequencing data to the reference genome and detecting all SNP sites on each chromosome; 3) aligning Hi-C library sequencing data to the reference genome and using HapCUT2 together with the SNP sites to build a linked SNP group (MVP block); 4) grouping the Pacbio subreads based on the MVP block and assembling each group separately to obtain the sequence of each chromatid; and 5) performing whole genome sequencing of the parents, aligning the sequencing results to the chromatid sequences separated in the previous step, and dividing the chromatids into two groups according to the alignment results, corresponding to the paternal and maternal genomes. The method avoids the problem that contigs with too few restriction sites cannot be assembled from Hi-C data. Because the linked SNP group is constructed across the whole genome and then combined with Pacbio long reads, the risk of typing errors is greatly reduced.
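
Step 4, grouping subreads by MVP block, can be sketched as a simple allele vote. The source only says that subreads are grouped "based on the MVP Block"; the majority-vote rule, the `min_margin` parameter, and the input layout (each read reduced to the bases it shows at SNP positions) are assumptions made for illustration.

```python
from collections import defaultdict


def assign_haplotype(read_alleles: dict, block_sites: dict, min_margin: int = 1) -> int:
    """Return 1 or 2 for the haplotype a read supports, or 0 if undecided.

    read_alleles: position -> base observed in the read.
    block_sites:  position -> (haplotype 1 base, haplotype 2 base).
    min_margin is a hypothetical minimum vote difference.
    """
    votes = [0, 0]
    for pos, base in read_alleles.items():
        if pos not in block_sites:
            continue  # position is not a phased SNP
        hap1_base, hap2_base = block_sites[pos]
        if base == hap1_base:
            votes[0] += 1
        elif base == hap2_base:
            votes[1] += 1
    if votes[0] - votes[1] >= min_margin:
        return 1
    if votes[1] - votes[0] >= min_margin:
        return 2
    return 0


def group_reads(reads: dict, block_sites: dict) -> dict:
    """Group read names by assigned haplotype (0 = unassigned)."""
    groups = defaultdict(list)
    for name, alleles in reads.items():
        groups[assign_haplotype(alleles, block_sites)].append(name)
    return dict(groups)


if __name__ == "__main__":
    block = {100: ("A", "G"), 200: ("T", "C"), 300: ("G", "A")}
    reads = {
        "read1": {100: "A", 200: "T"},
        "read2": {100: "G", 300: "A"},
        "read3": {100: "A", 200: "C"},  # conflicting evidence
    }
    print(group_reads(reads, block))
```

Reads in groups 1 and 2 would then be assembled separately to obtain the two chromatid sequences; how ties and reads without SNP coverage are handled is a design choice the source does not specify.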

Description

Technical field

[0001] The invention relates to the field of genome assembly and typing, in particular to a whole genome typing method based on Pacbio subreads and Hi-C reads.

Background art

[0002] In 1985, American scientists proposed the Human Genome Project, which aimed to sequence the complete human genome. The proposal attracted global attention, and six countries (including China) took part. Through this international cooperation, the draft human genome was officially published in 2001, marking the arrival of the genome era. Since then, a series of technologies including resequencing, transcriptomics, and three-dimensional genomics have flourished, all of which depend on high-quality reference genome sequences. Today, the molecular-level study of a new species usually starts with sequencing and assembly.

[0003] How...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B20/00, G16B30/10, G16B30/20
CPC: G16B20/00, G16B30/10, G16B30/20
Inventor: 卢锐
Owner: WUHAN FRASERGEN CO LTD