De novo diploid genome assembly and haplotype sequence reconstruction

A genome assembly and haplotype technology, applied in genomics, sequence analysis, proteomics, etc.

Active Publication Date: 2018-01-19
PACIFIC BIOSCIENCES

AI Technical Summary

Problems solved by technology

Conventional graph traversal algorithms often stop extending contigs around the nodes of such complex bubbles, which leads to fragmented assemblies.



Examples

Embodiment approach

[0096] Figure 11A is a flowchart illustrating the process of string graph assembly performed by the diploid contig generator 114 on a polyploid genome, according to an exemplary embodiment. The process may begin by receiving a string graph and a unitig graph generated from sequence reads of length at least 0.5 kb, more preferably at least 1 kb (block 1100). According to an exemplary embodiment, the diploid contig generator 114 uses long reads to generate a string graph from which a unitig graph is constructed, rather than just identifying simple paths without branches in the unitig graph. In one embodiment, the unitig graph may be generated by the string graph generator 112. Alternatively, the unitig graph can be generated by the diploid contig generator 114.
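The relationship between a string graph and a unitig graph can be illustrated with a minimal sketch. It is not the patent's implementation: it assumes the string graph is given as a plain list of directed (source, target) node pairs, and the function name build_unitigs is hypothetical. Each maximal path whose interior nodes have exactly one incoming and one outgoing edge is collapsed into a single unitig.

```python
from collections import defaultdict


def build_unitigs(edges):
    """Collapse maximal non-branching paths of a directed graph into unitigs.

    `edges` is an iterable of (source, target) node pairs (an assumed,
    simplified stand-in for a string graph). Returns a list of node paths.
    Cycles made only of unbranched nodes are not reported by this sketch.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    def is_internal(n):
        # Interior of a simple path: exactly one edge in and one edge out.
        return len(pred[n]) == 1 and len(succ[n]) == 1

    unitigs = []
    for u in sorted(nodes):
        if is_internal(u):
            continue  # unitigs start only at branching or terminal nodes
        for v in succ[u]:
            path = [u, v]
            # Extend while the current end is an unbranched interior node.
            while is_internal(path[-1]):
                path.append(succ[path[-1]][0])
            unitigs.append(path)
    return unitigs


if __name__ == "__main__":
    toy = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"),
           ("D", "F"), ("E", "F"), ("F", "G")]
    for unitig in build_unitigs(toy):
        print(" -> ".join(unitig))
```

In the toy graph, the two unitigs running from C to F form a bubble. This is the kind of structure at which a traversal that follows only unbranched paths would stop extending a contig.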

[0097] String bundles are identified in the unitig graph or string graph (block 1102). In one embodiment, a string bundle may include a set of unbranched edges that form...

Embodiment approach 2

[0112] Embodiment 2 - Identification of string bundles and determination of primary and associated contigs

[0113] Figure 12B is a graphical diagram showing the processing of a string bundle according to the second embodiment. In the second embodiment for identifying string bundles and determining primary and associated contigs, the goal is to first identify bubble regions as compound paths. One purpose of this is to try to decompose the string graph into simple paths and simple bubbles. However, the string graph of a diploid genome with complex heterozygous structural variants or repeat structures cannot be easily decomposed into simple paths and simple bubbles, because of the subgraph motifs that can occur.

[0114] For example, it is possible to have nested bubbles, loops, intertwined bubbles, and long branches between source and sink nodes, in which case the bubbles may be caused by repeats at the branch points rather than by local structural variation between haplotypes. The fol...
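The difference between a simple bubble and the harder motifs listed above can be shown with a small sketch. It is an illustration under assumptions, not the method of the second embodiment: the graph is a plain list of directed (source, target) pairs, the function name find_simple_bubbles is hypothetical, and a "simple bubble" is taken to mean a source node whose outgoing branches are all unbranched paths that rejoin at one common sink.

```python
from collections import defaultdict


def find_simple_bubbles(edges):
    """Return (source, sink, paths) for every simple bubble in a directed graph.

    A simple bubble here is an assumed definition: a node with two or more
    outgoing branches, each an unbranched path, all ending at the same sink,
    with no other edge entering that sink.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    bubbles = []
    for source in sorted(succ):
        branches = list(succ[source])
        if len(branches) < 2:
            continue
        paths = []
        for first in branches:
            path = [source, first]
            # Follow the branch while it stays linear (one edge in, one out).
            while len(pred[path[-1]]) == 1 and len(succ[path[-1]]) == 1:
                path.append(succ[path[-1]][0])
            paths.append(path)
        sinks = {p[-1] for p in paths}
        if len(sinks) != 1:
            continue  # branches do not rejoin at a single node
        sink = sinks.pop()
        if sink != source and len(pred[sink]) == len(paths):
            bubbles.append((source, sink, paths))
    return bubbles


if __name__ == "__main__":
    # Nested motif: the inner bubble A..E sits on one arm of an outer S..T bubble.
    nested = [("S", "A"), ("S", "B"), ("A", "C"), ("A", "D"),
              ("C", "E"), ("D", "E"), ("E", "T"), ("B", "T")]
    print(find_simple_bubbles(nested))
```

On the nested example only the inner bubble between A and E is reported. The outer structure between S and T is missed because one of its arms branches, which is the situation the text describes as not decomposing cleanly into simple paths and simple bubbles.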


Abstract

Exemplary embodiments provide methods and systems for diploid genome assembly and haplotype sequence reconstruction. Aspects of the exemplary embodiment include generating a fused assembly graph from reads of both haplotypes, the fused assembly graph including identified primary contigs and associated contigs; generating haplotype-specific assembly graphs using phased reads and haplotype-aware overlapping of the phased reads; merging the fused assembly graph and haplotype-specific assembly graphs to generate a merged assembly haplotype graph; removing cross-phasing edges from the merged assembly haplotype graph to generate a final haplotype-resolved assembly graph; and reconstructing haplotype-specific contigs from the final haplotype-resolved assembly graph resulting in haplotype-specific contigs.
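The step of removing cross-phasing edges can be sketched as a simple filter. The data model is an assumption made for illustration and is not taken from the patent: edges are (read, read) pairs, phase is a hypothetical dictionary mapping a read name to a haplotype label, and reads missing from the dictionary are treated as unphased. An edge is dropped only when both of its reads are phased and their labels differ.

```python
def remove_cross_phasing_edges(edges, phase):
    """Drop edges that join reads assigned to different haplotypes.

    `edges` is an iterable of (read_u, read_v) pairs and `phase` maps a read
    name to a haplotype label (assumed representation). Reads absent from
    `phase` are treated as unphased, so their edges are always kept.
    """
    kept = []
    for u, v in edges:
        pu, pv = phase.get(u), phase.get(v)
        if pu is not None and pv is not None and pu != pv:
            continue  # cross-phasing edge: both reads phased, labels disagree
        kept.append((u, v))
    return kept


if __name__ == "__main__":
    merged = [("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r2", "r5")]
    phase = {"r2": 0, "r3": 0, "r5": 1}  # r1 and r4 are unphased
    print(remove_cross_phasing_edges(merged, phase))
```

Keeping edges that touch unphased reads is a design choice of this sketch: it lets sequence shared by both haplotypes remain connected to each haplotype-specific path.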

Description

[0001] Cross References to Related Applications

[0002] This International PCT Patent Application claims the benefit of priority to U.S. Provisional Patent Application No. 62/166,605, filed May 26, 2015, and is related to U.S. Patent Application Serial No. 14/574,887, titled "String Graph Assembly for Polyploid Genomes," filed December 18, 2014, which is assigned to the assignee of the present application and incorporated herein by reference.

Background of the invention

[0003] Advances in the sequencing of biomolecules, especially with respect to nucleic acid and protein samples, have revolutionized the field of cellular and molecular biology. Motivated by the development of automated sequencing systems, it has now become possible to sequence mixed populations of sample nucleic acids. However, the quality of the sequence information must be carefully monitored and can be compromised by a number of factors related to the biomolecule itself or the sequencing system used, including t...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/22, G06F19/18, C12P19/34, G16B30/20, G16B20/00, G16B20/20, G16B30/10
CPC: G16B20/00, G16B30/00, G16B30/10, G16B20/20, G16B30/20
Inventors: C. Chin, P. Peluso, D. Rank
Owner: PACIFIC BIOSCIENCES