The present invention relates to a method for the assembly of nucleic acid sequence data comprising nucleic acid fragment reads into (a) contiguous nucleotide sequence segment(s). The method comprises the steps of: (a) obtaining a plurality of nucleic acid sequence data from a plurality of nucleic acid fragment reads; (b) aligning the plurality of nucleic acid sequence data to a reference sequence; (c) detecting one or more gaps or regions of non-assembly, or of non-matching with the reference sequence, in the alignment output of step (b); (d) performing de novo sequence assembly of nucleic acid sequence data mapping to the gaps or regions of non-assembly; and (e) combining the alignment output of step (b) and the assembly output of step (d) in order to obtain (a) contiguous
nucleotide sequence segment(s). The present invention further relates to a method wherein the detection of gaps or regions of non-assembly is performed by implementing a filter or threshold on base quality, coverage, complexity of the surrounding region, or length of mismatch. Also envisaged is the masking out of nucleic acid sequence data relating to known polymorphisms, disease-related mutations or modifications, repeats, low-mappability regions, CpG islands, or regions with certain biophysical features. In addition, a corresponding program element or computer program for assembly of nucleic acid sequence data and a sequence assembly system for transforming the nucleic acid sequence data comprising nucleic acid fragment reads into (a) contiguous nucleotide sequence segment(s) are provided.
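
By way of illustration only, steps (c) and (e) may be sketched as follows in Python. The sketch assumes that the alignment output of step (b) has been reduced to (start, length) pairs on the reference, and that the de novo contigs of step (d) are supplied by an external assembler. The function names, the min_coverage parameter and the half-open interval convention are hypothetical and are not taken from the disclosure. Only the coverage threshold is shown; base quality, complexity and mismatch-length filters are omitted.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on the reference


def coverage_from_alignments(ref_len: int,
                             alignments: List[Tuple[int, int]]) -> List[int]:
    """Count how many aligned reads cover each reference position.

    Each alignment is a (start, length) pair; this is a simplification
    of a real alignment record and ignores indels and clipping.
    """
    coverage = [0] * ref_len
    for start, length in alignments:
        for pos in range(max(start, 0), min(start + length, ref_len)):
            coverage[pos] += 1
    return coverage


def find_low_coverage_regions(coverage: List[int],
                              min_coverage: int = 1) -> List[Interval]:
    """Step (c), coverage threshold only: return maximal runs of
    positions whose depth is below min_coverage."""
    regions: List[Interval] = []
    run_start = None
    for pos, depth in enumerate(coverage):
        if depth < min_coverage:
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            regions.append((run_start, pos))
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(coverage)))
    return regions


def combine(consensus: str, patches: Dict[Interval, str]) -> str:
    """Step (e): replace each gap interval of the reference-guided
    consensus with the de novo contig assembled for that gap.

    Patches are assumed not to overlap. A patch may be longer or
    shorter than the interval it replaces.
    """
    pieces: List[str] = []
    cursor = 0
    for start, end in sorted(patches):
        pieces.append(consensus[cursor:start])
        pieces.append(patches[(start, end)])
        cursor = end
    pieces.append(consensus[cursor:])
    return "".join(pieces)
```

In such a sketch, the intervals returned by find_low_coverage_regions would select the reads passed to the de novo assembler in step (d), and the resulting contigs would be keyed by interval and passed to combine.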