Efficient clustering of noisy polynucleotide sequence reads

A read-clustering technique, applied in sequence analysis, instrumentation, code conversion, and related fields, addresses two problems: a polynucleotide sequencer cannot read DNA with complete accuracy, and the sequence of nucleotide bases cannot be directly observed.

Active Publication Date: 2019-05-24
MICROSOFT TECH LICENSING LLC

AI Technical Summary

Problems solved by technology

A polynucleotide sequencer cannot read the sequence of nucleotide bases on a DNA molecule with 100% accuracy.
However, since the sequence of nucleotide bases cannot be directly observed, it is difficult to identify when a polynucleotide sequencer makes an error.

Embodiment Construction

[0011] The present disclosure provides computationally efficient techniques for clustering reads in sequence data such that reads directed to the same original DNA strand are placed in the same cluster. Clustering reads by itself cannot correct errors in the sequence data, but it can organize DNA reads in a manner that makes error correction more efficient and/or accurate. One example of error correction for sequence data using clustering is described in US Provisional Patent Application No. 62/329,945. Due to the large amount of data generated by polynucleotide sequencers, computational efficiency is desired for applications involving DNA sequences. For example, data output by a single run of a polynucleotide sequencer can contain over a billion different DNA reads representing millions of different DNA strands.

[0012] The term "DNA strand" or simply "strand" refers to a DNA molecule. As used herein, "read" may be a noun referring to a string of data generated by a polynuc...

Abstract

A technique for clustering DNA reads from polynucleotide sequencing is described. DNA reads with a level of difference that is likely caused by errors in sequencing are grouped together in the same cluster. DNA reads that represent reads of different DNA molecules are placed in different clusters. The clusters are based on edit distance, which is the number of changes necessary to convert a given DNA read into another. The process of forming clusters may be performed iteratively and may use other types of distance that serve as an approximation for edit distance. Well-clustered DNA reads provide a starting point for further analysis.
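
As a rough illustration of the ideas in the abstract, the following Python sketch computes edit distance and groups reads whose distance to a cluster representative falls under a threshold. It is not the method claimed in the patent: the single-pass greedy loop, the choice of the first read as representative, the `threshold` parameter, and the use of a q-gram count distance as the cheap stand-in for edit distance are all assumptions made for this example. The q-gram filter relies on the known bound that the q-gram distance between two strings is at most 2q times their edit distance, so pairs that exceed it can be skipped without running the full computation.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def qgram_distance(a: str, b: str, q: int = 3) -> int:
    """L1 distance between q-gram count profiles (hypothetical cheap proxy)."""
    pa = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    pb = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    return sum(abs(pa[g] - pb[g]) for g in pa.keys() | pb.keys())


def cluster_reads(reads, threshold: int, q: int = 3):
    """Greedy single-pass clustering against each cluster's first read."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            rep = cluster[0]  # assumption: first member represents the cluster
            # Cheap filter: qgram_distance <= 2 * q * edit_distance, so a
            # larger value proves the edit distance exceeds the threshold.
            if qgram_distance(read, rep, q) > 2 * q * threshold:
                continue
            if edit_distance(read, rep) <= threshold:
                cluster.append(read)
                break
        else:
            clusters.append([read])  # no cluster was close enough
    return clusters


reads = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "ACGACGT", "TTTTGGGC"]
clusters = cluster_reads(reads, threshold=2)
```

With these toy reads and a threshold of 2, the three variants of `ACGTACGT` end up in one cluster and the two variants of `TTTTGGGG` in another. A greedy pass like this is order-dependent and compares every read against every representative, so it would not scale to the billions of reads mentioned in the description; it only shows how a cheap distance can rule out pairs before the more expensive edit distance is computed.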

Description

Background

[0001] Sequencing of polynucleotides such as deoxyribonucleic acid (DNA) produces errors. A polynucleotide sequencer cannot read the sequence of nucleotide bases on a DNA molecule with 100% accuracy. However, since the sequence of nucleotide bases cannot be directly observed, it is difficult to identify when a polynucleotide sequencer makes an error. The correct sequence can therefore at best be inferred from the data generated by the polynucleotide sequencer. Analysis of the sequencer output can correct some errors. Sometimes a moderate level of accuracy in the DNA sequence is sufficient. In other situations, however, it is desirable to have the DNA sequence be as accurate as possible. Various techniques are available to reduce errors in sequence data. Some techniques involve calibrating or otherwise altering the operation of polynucleotide sequencers. Other techniques involve processing sequence ...

Application Information

IPC(8): H03M7/30, G16B30/10, G16B30/20, G16B40/00
CPC: H03M7/3079, G16B30/00, G16B40/00, G16B30/10, G16B30/20, G11C13/0019, G06F16/285
Inventor: L. Ceze, S. Yekhanin, S. D. Ang, K. Strauss, C. Rashtchian, R. Kannan, K. Makarychev
Owner MICROSOFT TECH LICENSING LLC