Optimization method for output result of Kraken2 software and method for identifying species types in sample

A method for optimizing Kraken2 output, applied in the field of biology, that addresses the high false-positive rate of the raw results, improves their accuracy, and avoids false positives.

Pending Publication Date: 2021-04-06
杭州瑞普基因科技有限公司

AI Technical Summary

Problems solved by technology

Kraken2 runs extremely fast, but because it does not filter its results by default, the false-positive rate in the raw results is very high and can exceed 85%.

Examples

Embodiment 1

[0037] Example 1: Analysis of simulated sequencing data for 40 species

[0038] First, the genomes of 40 representative species (Table 1) were randomly selected from the database, covering eukaryotes, bacteria, and viruses and including some closely related species. The sequencing-data simulation tool ART was then used to generate paired-end 75 bp sequencing data, and Kraken2 was run to identify the species. Finally, the Kraken2 results were filtered using the two parameters kmermax and kmersum, and the sensitivity, relative specificity, and accuracy were calculated at different data volumes.

[0039] Sensitivity is the ratio of the number of detected true-positive species, X, to the number of theoretical true-positive species (40), i.e., X / 40.
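The sensitivity defined above can be illustrated with a minimal Python sketch. The function name `sensitivity` and the placeholder species labels are hypothetical and do not come from the patent. The example only shows the ratio X / N for a set of expected species.

```python
def sensitivity(detected: set[str], expected: set[str]) -> float:
    """Return X / N: the fraction of expected species that were detected."""
    if not expected:
        raise ValueError("expected species set must not be empty")
    true_positives = detected & expected  # X: expected species that were found
    return len(true_positives) / len(expected)


# Hypothetical example: 40 expected species, 38 of them detected,
# plus one detected species that was not expected (a false positive).
expected = {f"species_{i}" for i in range(40)}
detected = {f"species_{i}" for i in range(38)} | {"unexpected_species"}

print(sensitivity(detected, expected))  # 38 / 40
```

False positives do not affect this value. They enter only through the relative specificity and accuracy measures.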

[0040] Since the original data contain no true negatives, the raw Kraken2 results are used as the baseline for evaluation. The false positive species in this result are regarded as the theoretical true negative...

Embodiment 2

[0062] Example 2: Analysis of sequencing data from the ZymoBIOMICS™ Microbial Community Standard (Catalog No. D6300)

[0063] The ZymoBIOMICS™ Microbial Community Standard (Table 5) was added to triple-distilled water for library construction and sequencing. After quality control of the paired-end 75 bp sequencing data and removal of human-derived reads, Kraken2 was run for species identification, and the results were filtered using the parameters kmermax and kmersum. Finally, the sensitivity, relative specificity, and accuracy were calculated at different data volumes.

[0064] Table 5: ZymoBIOMICS™ Microbial Community Standard

[0065]

[0066] Sensitivity is the ratio of the number of detected true-positive species, X, to the number of theoretical true-positive species (10), i.e., X / 10.

[0067] Since there is no true negative value in the original data, in order to evaluate the results, we use the original results of Kraken2 as the basis. The false positive species in thi...

Abstract

The invention provides a method for optimizing the output of the Kraken2 software, comprising the following steps: matching the sub-reads of each read in the sequencing result against species sequences in a known database; obtaining, for each read, the number of k-mers of its sub-reads matched to each species; selecting the maximum of these k-mer counts within each read and recording it as kmermax; and comparing kmermax with a first threshold and, when kmermax is less than or equal to the first threshold, removing the corresponding read, so that all reads are filtered in this way. The method accurately optimizes the Kraken2 output, avoids false positives, and can be applied to identifying the species present in a sample.
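The kmermax filter described in the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation. It assumes Kraken2's standard tab-separated per-read output, whose fifth column lists `taxid:count` pairs, with `|:|` separating the two mates of a paired read. It treats each taxid as a stand-in for a "species", and it ignores the `0` (unassigned) and `A` (ambiguous) entries. The function names and the threshold value are hypothetical.

```python
from collections import Counter


def kmer_counts(lca_field: str) -> Counter:
    """Sum the k-mer counts per taxid in a Kraken2 k-mer mapping string."""
    counts: Counter = Counter()
    for token in lca_field.split():
        if token == "|:|":  # separator between mates of a paired read
            continue
        taxid, _, n = token.partition(":")
        if taxid in ("0", "A"):  # assumption: skip unassigned/ambiguous k-mers
            continue
        counts[taxid] += int(n)
    return counts


def kmermax(lca_field: str) -> int:
    """Largest per-taxid k-mer count within one read (0 if none matched)."""
    return max(kmer_counts(lca_field).values(), default=0)


def filter_reads(lines: list[str], first_threshold: int) -> list[str]:
    """Keep reads whose kmermax is strictly greater than the threshold."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:  # skip malformed lines
            continue
        if kmermax(fields[4]) > first_threshold:
            kept.append(line)
    return kept


# Hypothetical per-read output lines (status, read ID, taxid, length, k-mer map).
lines = [
    "C\tread1\t562\t75\t562:20 0:5 562:16",
    "C\tread2\t1280\t75\t0:38 1280:2 A:1",
    "U\tread3\t0\t75\t0:41",
]
kept = filter_reads(lines, first_threshold=3)
print([line.split("\t")[1] for line in kept])
```

With a threshold of 3, only `read1` (kmermax 36) is kept. `read2` (kmermax 2) and the unclassified `read3` (kmermax 0) are removed, following the "less than or equal to the first threshold" rule in the abstract.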

Description

Technical field

[0001] The present invention relates to the field of biology. Specifically, it relates to a method for optimizing the output of the Kraken2 software and a method for identifying the species present in a sample.

Background

[0002] A metagenome is the sum of all biological genetic material in a specific environment. With the metagenome as the object of study, sequencing analysis and functional gene screening can reveal the biological composition of a sample, the relationships among organisms, and the relationships between organisms and their environment.

[0003] Metagenomic next-generation sequencing, abbreviated mNGS, is a technology that sequences the mixed genomes of all organisms in an environment without first separating them.

[0004] Pathogenic microorganisms are microorganisms that can cause disease in humans or animals, including parasites, fungi, bacteria, viruses, etc.

[0005] mNGS can iden...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B30/10, G16B50/00
CPC: G16B30/10, G16B50/00
Inventor: 王涛, 肖姗姗, 常壹昭
Owner: 杭州瑞普基因科技有限公司