Methods for genome assembly and haplotype phasing

A genome assembly and haplotype phasing technology, applied in fields such as biochemical equipment and methods, computational combinatorial chemistry, and the determination/inspection of microorganisms, that addresses the difficulty of generating high-quality, highly contiguous genome sequences.

Active Publication Date: 2015-12-02
RGT UNIV OF CALIFORNIA

AI Technical Summary

Problems solved by technology

[0004] It is still difficult to generate high-quality, highly contiguous genome sequences.



Examples


Embodiment 1

[0245] Example 1. Methods of producing chromatin in vitro

[0246] Two approaches to reconstituting chromatin deserve particular attention: one uses ATP-independent random deposition of histones onto DNA, and the other uses ATP-dependent assembly of periodic nucleosomes. The present invention allows either approach to be used in conjunction with one or more of the methods disclosed herein. Examples of both approaches to generating chromatin can be found in Lusser et al. ("Strategies for the reconstitution of chromatin", Nature Methods (2004), 1(1): 19-26), which is incorporated herein by reference in its entirety, including the references cited therein.

Embodiment 2

[0247] Example 2. Genome assembly using Hi-C-based technology

[0248] The genome from a human subject was fragmented into pseudo-contigs of 500 kb in size. Using a Hi-C-based method, multiple read pairs were generated by probing the physical layout of chromosomes in living cells. A variety of Hi-C-based methods can be used to generate read pairs, including the method presented in Lieberman-Aiden et al. ("Comprehensive mapping of long-range interactions reveals folding principles of the human genome", Science (2009), 326(5950): 289-293), which is incorporated herein by reference in its entirety, including the references cited therein. Read pairs were mapped to all pseudo-contigs, and those read pairs that mapped to two separate pseudo-contigs were used to construct an adjacency matrix based on the mapping data. Using a function of the distance from each read to the edge of its pseudo-contig, read pairs amounting to about 50%, about 60%, about 70%, about 80%, about 90%,...
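The adjacency-matrix step described in this example can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Hit` record, the function names, and the `inverse_edge_weight` formula with its `scale` parameter are all hypothetical, introduced only for illustration. The source text is truncated before it states how the edge-distance function is applied, so the weighting shown here is an assumption and can be omitted to obtain plain link counts.

```python
from collections import namedtuple

# Hypothetical record for one mapped read: pseudo-contig name and 0-based position.
Hit = namedtuple("Hit", ["contig", "pos"])


def edge_distance(hit, contig_lengths):
    """Distance from a read to the nearest edge of its pseudo-contig."""
    return min(hit.pos, contig_lengths[hit.contig] - 1 - hit.pos)


def inverse_edge_weight(a, b, contig_lengths, scale=100_000.0):
    """Assumed weighting (not from the source): pairs near contig edges count more."""
    d = edge_distance(a, contig_lengths) + edge_distance(b, contig_lengths)
    return 1.0 / (1.0 + d / scale)


def build_adjacency(read_pairs, contig_lengths, weight=None):
    """Sum link weights between distinct pseudo-contigs into a symmetric matrix."""
    names = sorted(contig_lengths)
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for a, b in read_pairs:
        if a.contig == b.contig:
            continue  # only pairs spanning two pseudo-contigs are used
        w = 1.0 if weight is None else weight(a, b, contig_lengths)
        i, j = index[a.contig], index[b.contig]
        matrix[i][j] += w
        matrix[j][i] += w
    return names, matrix
```

Calling `build_adjacency(pairs, lengths)` gives raw link counts, while passing `weight=inverse_edge_weight` applies the assumed edge-distance weighting. The resulting matrix can then be handed to whatever ordering or clustering step the assembly uses.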

Embodiment 3

[0253] Example 3. Method for haplotype phasing

[0254] Because the read pairs generated by the methods disclosed herein are usually derived from intrachromosomal contacts, any read pair containing heterozygous sites will also carry information about their phasing. Using this information, reliable phasing can be performed quickly and accurately over short, medium, or even long (megabase) distances. An experiment was designed to phase data from one of the 1000 Genomes trios (a set of mother/father/offspring genomes) in order to reliably infer phasing. In addition, haplotype reconstruction using proximity ligation, similar to Selvaraj et al. (Nature Biotechnology 31: 1111-1118 (2013)), can also be used in conjunction with the haplotype phasing method disclosed herein.
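The idea that a read pair covering two heterozygous sites carries phase information can be sketched in a few lines of Python. This is an illustrative sketch under simplifying assumptions, not the method of the patent or of Selvaraj et al.: the input format (pairs of `(site, allele)` observations with alleles coded 0 for reference and 1 for alternate), the majority-vote rule, and the function names are all hypothetical.

```python
from collections import defaultdict, deque


def tally_links(observations):
    """Count cis/trans evidence for each pair of heterozygous sites."""
    votes = defaultdict(lambda: [0, 0])  # (site_a, site_b) -> [cis, trans]
    for (site_a, allele_a), (site_b, allele_b) in observations:
        if site_a == site_b:
            continue  # a pair must span two different sites to be informative
        key = (site_a, site_b) if site_a < site_b else (site_b, site_a)
        votes[key][0 if allele_a == allele_b else 1] += 1
    return votes


def phase_sites(observations):
    """Label each site 0 or 1 relative to the seed site of its linked block.

    Label 0 means the alternate allele sits on the same haplotype as the
    seed site's alternate allele; label 1 means the opposite haplotype.
    """
    graph = defaultdict(list)
    for (a, b), (cis, trans) in tally_links(observations).items():
        if cis == trans:
            continue  # tied evidence carries no phase information
        flip = 1 if trans > cis else 0
        graph[a].append((b, flip))
        graph[b].append((a, flip))
    phase = {}
    for seed in sorted(graph):
        if seed in phase:
            continue
        phase[seed] = 0
        queue = deque([seed])
        while queue:  # breadth-first propagation through linked sites
            site = queue.popleft()
            for other, flip in graph[site]:
                if other not in phase:
                    phase[other] = phase[site] ^ flip
                    queue.append(other)
    return phase
```

Because each link is evaluated independently of genomic distance, a single read pair joining sites a megabase apart contributes the same kind of evidence as one joining neighboring sites. A production phaser would additionally model sequencing error and conflicting links rather than rely on a simple majority.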

[0255] For example, haplotype reconstruction based on the proximity-ligation method can also be used in the methods disclosed herein for phasing the genome. Haplotype reconstruction using a method based on proximity...


Abstract

The disclosure provides methods for greatly accelerating and improving de novo genome assembly. The methods disclosed herein utilize methods for data analysis that allow for rapid and inexpensive de novo assembly of genomes from one or more subjects. The disclosure further provides that the methods disclosed herein can be used in a variety of applications, including haplotype phasing, and metagenomics analysis.

Description

[0001] Cross references to related applications

[0002] This application claims the benefit of provisional application number 61/759,941, filed on February 1, 2013, and provisional application number 61/892,355, filed on October 17, 2013, the disclosures of which are incorporated herein by reference.

Technical field

[0003] The present invention provides methods for genome assembly and haplotype phasing to identify short-, medium-, and long-range connections within the genome.

Background technique

[0004] In theory and in practice, it is still difficult to generate high-quality, highly contiguous genome sequences.

Summary of the invention

[0005] A long-standing shortcoming of next-generation sequencing (NGS) data is that it cannot span large repetitive regions of the genome, owing to short reads and relatively small insert sizes. This shortcoming significantly affects de novo assembly. Because the nature and arrangement of the genome rearrangement are uncertain, contigs separated by long repeat ...


Application Information

IPC(8): C12Q1/68, G16B30/20, G16B30/10, G16B35/10
CPC: C12Q1/6869, G16B30/00, G16B35/00, G16C20/60, G16B30/10, G16B35/10, G16B30/20, C12Q2521/301, C12Q2521/319, C12Q2522/101, C12Q2523/101, C12Q2535/122, C12Q2563/131, C12Q2565/501, C12Q2525/307, C12Q1/6888, C12Q2521/501
Inventor: R. E. Green Jr., L. F. Lagarias
Owner RGT UNIV OF CALIFORNIA