Using k-mers for rapid quality control of sequencing data without alignment

A quality control technology based on aggregated k-mer annotations, applied in data visualization, microbial determination/inspection, sequence analysis, etc., that addresses problems such as increased cost and low efficiency.

Pending Publication Date: 2020-08-04
KONINKLIJKE PHILIPS NV

AI Technical Summary

Problems solved by technology

The significant time delay between generating a read and obtaining quality control information prevents timely corrective action and can lead to inefficiencies and increased costs.

Embodiment Construction

[0029] The present disclosure describes various embodiments of systems and methods for quality control analysis of sequencing data. More generally, Applicants have recognized and appreciated that it would be beneficial to provide a system that performs quality control analysis of sequencing data without the need to align reads to a genomic reference. The system includes a database of k-mers extracted from one or more genomic references and annotated with information about the k-mers, such as species, position, and/or other information. As NGS data is acquired, k-mers are extracted from the reads and used to identify one or more annotated k-mers in the annotated k-mer database. The annotation information associated with each identified k-mer in the database provides data regarding the kind of read, the most likely origin of the read, and/or other information about the read. Annotated information can be collated and aggregated to provide one or more quality control ...
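
The annotated k-mer database described in this paragraph can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the function names (`extract_kmers`, `build_annotated_db`, `annotate_read`), the toy reference sequences, the in-memory dictionary, and the choice of k = 4 are all assumptions made for illustration. Each k-mer from a reference is stored with (species, position) annotations, and the k-mers of an incoming read are looked up by exact match, with no alignment step.

```python
from collections import defaultdict


def extract_kmers(seq, k):
    """Yield (offset, k-mer) for every window of length k in seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def build_annotated_db(references, k):
    """Map each reference k-mer to a list of (species, position) annotations.

    `references` is a hypothetical dict of {species_name: sequence}.
    """
    db = defaultdict(list)
    for species, seq in references.items():
        for pos, kmer in extract_kmers(seq, k):
            db[kmer].append((species, pos))
    return db


def annotate_read(read, db, k):
    """Collect the annotations of every read k-mer found in the database."""
    hits = []
    for _, kmer in extract_kmers(read, k):
        # .get() avoids creating empty entries in the defaultdict
        hits.extend(db.get(kmer, []))
    return hits


# Toy references (assumed data, far shorter than real genomes)
references = {"species_A": "ACGTACGTGG", "species_B": "TTGACCATGA"}
db = build_annotated_db(references, k=4)
hits = annotate_read("CGTGG", db, k=4)
print(hits)
```

A production system would likely use a much larger k and a compact on-disk or hashed index rather than a Python dictionary; the sketch only shows the lookup idea.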

Abstract

A method (200) for evaluating nucleic acid sequencing data using a quality control analysis system (300). The method comprises: receiving (210) a plurality of reads of a nucleic acid sequence; extracting (220) a plurality of k-mers from the plurality of reads; identifying (230), using the plurality of extracted k-mers, one or more of a plurality of annotated k-mers found in the plurality of reads, wherein the plurality of annotated k-mers are stored in an annotation database (350), and further wherein the annotated k-mers are annotated with annotation information about the one or more nucleic acid sequences from which the annotated k-mers are generated; gathering (240), based on the identified annotated k-mers found in the plurality of reads, annotation information about the plurality of reads; and determining (250), based on the gathered annotation information, a quality control metric for at least some of the plurality of reads.
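
The steps of the method (210 to 250) can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the annotation database is a hypothetical in-memory dict mapping each k-mer to a set of species labels, each read is assigned to the species with the most matching k-mers (a simple majority vote chosen here for illustration), and the quality control metric is the fraction of reads assigned to each species, with reads that match nothing reported as "unassigned".

```python
from collections import Counter


def kmers(seq, k):
    """Step 220: list every k-mer of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def assign_read(read, annotation_db, k):
    """Steps 230-240 for one read: gather species votes from matched k-mers.

    Returns the species with the most votes, or None if no k-mer matched.
    Ties are broken by first occurrence (an arbitrary choice for this sketch).
    """
    votes = Counter()
    for kmer in kmers(read.upper(), k):
        for species in annotation_db.get(kmer, ()):
            votes[species] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def species_fractions(reads, annotation_db, k):
    """Step 250: fraction of reads per assigned species."""
    if not reads:
        return {}
    counts = Counter(
        assign_read(read, annotation_db, k) or "unassigned" for read in reads
    )
    return {label: n / len(reads) for label, n in counts.items()}


# Hypothetical annotation database and reads (step 210), assumed toy data
annotation_db = {
    "ACGT": {"human"},
    "CGTA": {"human"},
    "TTGA": {"e_coli"},
    "TGAC": {"e_coli"},
}
reads = ["ACGTA", "TTGAC", "GGGGG", "ACGTT"]
metric = species_fractions(reads, annotation_db, k=4)
print(metric)
```

In this toy run, an unexpectedly high "e_coli" or "unassigned" fraction in a human sample would be the kind of signal a quality control metric could flag, for example as possible contamination.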

Description

Technical field

[0001] The present disclosure generally relates to methods and systems for quality control analysis of sequencing data.

Background

[0002] Next-generation sequencing (NGS) is an important tool for genomics research and has many applications in discovery, diagnostics, and other methodologies. As a result, billions of NGS reads are generated every minute worldwide.

[0003] Quality control analysis of NGS data is a critical first step required prior to any downstream analysis. Proper assessment of the quality of NGS data prior to subsequent analysis can reduce or prevent misinterpretation of data, misdiagnosis, and other undesired downstream effects.

[0004] Metrics for quality control analysis of NGS data have been established. Typically, these analyses are performed before or after aligning the reads to the genome. For example, tools such as FASTQC are used to view raw sequencing data such as fastq files, and typically report information such as b...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B30/10
CPC: G16B50/30, G16B30/00, G16B45/00, G16B20/00, C12Q1/68
Inventor: 吴捷, Y·H·张
Owner: KONINKLIJKE PHILIPS NV