Method for assembly of nucleic acid sequence data

Nucleic acid sequence assembly technology, applied in sequence analysis, electronic digital data processing, and special data processing applications. It addresses problems such as non-normalized raw data, NGS sequencing machine output that has little value on its own, and differences in match thresholds.

Inactive Publication Date: 2014-05-14
KONINKLIJKE PHILIPS NV

AI Technical Summary

Problems solved by technology

Furthermore, raw data obtained from NGS platforms are not normalized and show differences in read lengths, error profiles, match thresholds, etc.
Therefore, the application of NGS methods implies an increase in the amount and complexity of sequence information.
[0004] However, the output of NGS sequencing machines is essentially worthless by itself, as sequence reads only become meaningful once the continuous genome sequence they represent is reconstructed.

Examples

Embodiment 1

[0127] Example 1 - Reference and de novo alignment of sequence reads to establish accurate repeat content of the AVPR1A gene

[0128] Since the repeat content (number of repeats) of the AVPR1A gene is associated with behavior, it has major health implications. Therefore, experimental assessments were performed based on reference and de novo alignments of sequence reads to establish the accurate repeat content of AVPR1A.

[0129] Reference alignments were used to map reads to genomic coordinates, while de novo alignments were used to determine the exact repeat content in the AVPR1A gene (see Figures 5 and 6).
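
Once a de novo contig covering the repeat region is available, the repeat content can be read off by counting consecutive copies of a repeat unit. The following is only a minimal sketch of that counting step: the function name `longest_tandem_run`, the motif, and the toy contig are illustrative assumptions, not sequences or code from the patent.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`.

    A lookahead is used so that runs starting at every position are examined,
    including runs that overlap an earlier, shorter match.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    unit = re.escape(motif.upper())
    pattern = re.compile(f"(?=((?:{unit})+))")
    best = 0
    for match in pattern.finditer(sequence.upper()):
        copies = len(match.group(1)) // len(motif)
        best = max(best, copies)
    return best


# Toy contig (hypothetical): flanking bases around a CT run and a GT run.
toy_contig = "ACGT" + "CT" * 4 + "TT" + "GT" * 6 + "ACCA"
ct_copies = longest_tandem_run(toy_contig, "CT")
gt_copies = longest_tandem_run(toy_contig, "GT")
```

Counting on the assembled contig rather than on individual reads matters here because a single short read may not span the whole repeat tract.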

[0130] The Qseq files obtained from the Illumina GAIIx were first converted into fastq format. These files were then aligned to the human reference genome (GRCh37) using the BWA aligner. Consensus sequences were constructed using the SAM output from the BWA alignment. The RS3 polymorphism in the AVPR1A gene is known to be highly polymorphic and associated with c...
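
The Qseq-to-fastq conversion mentioned above can be sketched as follows. The text does not say which tool was used, so this is an assumption-laden illustration: it assumes the commonly documented Qseq layout of 11 tab-separated columns (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag), `.` for uncalled bases, and Phred+64 quality characters, which are re-encoded here as Phred+33. The function name `qseq_line_to_fastq` is hypothetical.

```python
from typing import Optional


def qseq_line_to_fastq(line: str, keep_failed: bool = False) -> Optional[str]:
    """Convert one Qseq record into a four-line fastq record.

    Returns None for reads that failed the chastity filter unless
    keep_failed is True.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    machine, run, lane, tile, x, y, index, read_no, seq, qual, passed = fields
    if passed != "1" and not keep_failed:
        return None
    # Assumption: Qseq marks uncalled bases with '.', fastq uses 'N'.
    seq = seq.replace(".", "N")
    # Assumption: input qualities are Phred+64; shift by 31 to get Phred+33.
    qual33 = "".join(chr(ord(c) - 31) for c in qual)
    header = f"@{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    return f"{header}\n{seq}\n+\n{qual33}\n"


example = "M1\t7\t1\t1101\t100\t200\t0\t1\tAC.T\thhBh\t1"
record = qseq_line_to_fastq(example)
```

The resulting fastq records would then be passed to the aligner; that step is not shown because the exact BWA invocation is not given in the text.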

Abstract

The present invention relates to a method for assembly of nucleic acid sequence data comprising nucleic acid fragment reads into (a) contiguous nucleotide sequence segment(s). The method comprises the steps of: (a) obtaining a plurality of nucleic acid sequence data from a plurality of nucleic acid fragment reads; (b) aligning the plurality of nucleic acid sequence data to a reference sequence; (c) detecting one or more gaps or regions of non-assembly, or of non-matching with the reference sequence, in the alignment output of step (b); (d) performing de novo sequence assembly of the nucleic acid sequence data mapping to the gaps or regions of non-assembly; and (e) combining the alignment output of step (b) and the assembly output of step (d) in order to obtain (a) contiguous nucleotide sequence segment(s). The present invention further relates to a method wherein the detection of gaps or regions of non-assembly is performed by implementing a filter or threshold on base quality, coverage, complexity of the surrounding region, or length of mismatch. Also envisaged is the masking out of nucleic acid sequence data relating to known polymorphisms, disease-related mutations or modifications, repeats, low-mappability regions, CpG islands, or regions with certain biophysical features. In addition, a corresponding program element or computer program for assembly of nucleic acid sequence data and a sequence assembly system for transforming the nucleic acid sequence data comprising nucleic acid fragment reads into (a) contiguous nucleotide sequence segment(s) are provided.
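
Step (c) with a coverage threshold can be illustrated roughly as follows. This is a sketch under stated assumptions, not the claimed implementation: per-base depth is taken as a plain list of integers, and the names `find_low_coverage_regions`, `min_depth`, and `min_length` are hypothetical. The other filters named above (base quality, surrounding complexity, mismatch length) are not modeled.

```python
from typing import List, Tuple


def find_low_coverage_regions(
    coverage: List[int], min_depth: int, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where depth stays below min_depth.

    Runs shorter than min_length positions are ignored.
    """
    regions: List[Tuple[int, int]] = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth < min_depth:
            if start is None:
                start = pos  # a low-coverage run begins here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(coverage) - start >= min_length:
        regions.append((start, len(coverage)))
    return regions


depths = [10, 12, 0, 1, 2, 15, 14, 0, 9]
gaps = find_low_coverage_regions(depths, min_depth=3, min_length=2)
```

Reads mapping to, or near, the returned intervals would be the candidates handed to de novo assembly in step (d).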

Description

Field of invention

[0001] The present invention relates to a method for assembling nucleic acid sequence data comprising nucleic acid fragment reads into contiguous nucleotide sequence segments, comprising the steps of: (a) obtaining a plurality of nucleic acid sequence data from a plurality of nucleic acid fragment reads; (b) aligning the plurality of nucleic acid sequence data with a reference sequence; (c) detecting, in the alignment output of step (b), one or more gaps or regions that are unassembled or that do not match the reference sequence; (d) performing de novo sequence assembly of the nucleic acid sequence data mapped to said unassembled gaps or regions; and (e) combining the alignment output of step (b) and the assembly output of step (d) to obtain contiguous nucleotide sequence segments. The present invention also relates to a method wherein detection of unassembled gaps or regions is performed by applying base quality, coverage, surrounding region complexity or mismatc...
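
Step (e), combining the reference-guided output with the de novo output, might look like the following in its simplest form. This sketch assumes the alignment output has already been reduced to a consensus string and that each de novo contig has been assigned the half-open interval of the consensus it replaces; the function name `splice_contigs` and the toy sequences are hypothetical.

```python
from typing import List, Tuple


def splice_contigs(consensus: str, patches: List[Tuple[int, int, str]]) -> str:
    """Replace consensus[start:end] with a de novo contig for each patch.

    Patches are (start, end, contig) with half-open, non-overlapping intervals.
    A contig may be longer or shorter than the region it replaces.
    """
    pieces: List[str] = []
    cursor = 0
    for start, end, contig in sorted(patches):
        if start < cursor or start > end or end > len(consensus):
            raise ValueError(f"invalid or overlapping patch: ({start}, {end})")
        pieces.append(consensus[cursor:start])  # keep reference-guided sequence
        pieces.append(contig)                   # insert de novo sequence
        cursor = end
    pieces.append(consensus[cursor:])
    return "".join(pieces)


toy_consensus = "AAAANNNNCCCCNNGG"
merged = splice_contigs(toy_consensus, [(4, 8, "GTGTGT"), (12, 14, "T")])
```

Allowing the contig length to differ from the replaced interval is what lets the combined sequence carry a repeat count different from the reference.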

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/22, G16B30/20, G16B30/10
CPC: G16B30/00, G16B30/10, G16B30/20
Inventors: S. Kumar, R. Singh, N. Dimitrova
Owner: KONINKLIJKE PHILIPS NV