Analysis method and system of metagenome data

A metagenomic data analysis technology in the field of bioinformatics. It addresses poor specificity, excessive false positives in test results, and long analysis times, and it reduces the amount of computation, eliminates false-positive results, and keeps the computation time under control.

Active Publication Date: 2018-07-27
SIMCERE DIAGNOSTICS CO LTD +2

AI Technical Summary

Problems solved by technology

[0005] 1) Metagenome high-throughput detection has high sensitivity, but its results contain too many false positives. The resulting poor specificity cannot meet the needs of applications that require highly specific identification, e.g. clinical identification of pathogenic microorganisms.
[0006] 2) Existing metagenomic sequencing data analysis methods struggle to substantially speed up analysis and shorten analysis time while preserving the accuracy of identification results.
[0007] 3) Existing metagenomic data analysis platforms have poor compatibility and cannot be applied generally across sequencing scenarios.
[0008] 4) Existing metagenomic analysis technology does not integrate species identification with functional gene analysis, so it cannot provide more comprehensive, deeply processed analysis results.

Examples

Embodiment 1

[0082] Example 1: Metagenome detection and data analysis of cardiac vegetation samples based on the Nanopore sequencing platform

[0083] The cardiac vegetation samples A1-A7 were collected from 7 patients with culture-negative infective endocarditis who underwent valve replacement surgery, and were stored in a -80°C freezer.

[0084] Nucleic acid was extracted from the samples according to the following procedure: take the vegetation sample out of the freezer and leave it at room temperature for 30 minutes, cut it into pieces with sterilized scissors, and extract nucleic acid with the TIANamp Micro DNA kit according to the manufacturer's instructions.

[0085] Libraries were constructed from the extracted nucleic acid samples and sequenced according to the following procedure. Library construction followed the 1D Native barcoding protocol provided by Oxford Nanopore:

[0086] 1) Use g-TUBE (Covaris) to disrupt 1.2 μg nucleic acid sample at ...

Embodiment 2

[0107] Example 2: Metagenome detection and data analysis of cardiac vegetation samples based on the Illumina sequencing platform

[0108] Using A1-A2 from Example 1 as samples, genomic nucleic acid was extracted, a library was constructed, and sequencing was performed on an Illumina HiSeq PE150. After adapters and sequences with a high N ratio were removed, sequence information in fastq format was obtained. The data analysis of each sample was carried out as follows:

[0109] 1) Adapters and sequences with a high N ratio are removed from the fastq data generated by Illumina sequencing, and the data then enter the quality assessment step.

[0110] 2) Sequencing quality assessment. The read length of this data is 150 bp, and sequences with a length of less than 100 bp and an average sequencing quality of less than 25 are filtered out. If the GC ratio of the first 10 bases of the data is abnormal, the first 10 bases of each sequence will...
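The length and quality thresholds in step 2 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names are hypothetical, Phred+33 quality encoding is assumed, and a read is assumed to be discarded when it fails either threshold (the text does not say whether both conditions must hold). The GC-based trimming of the first 10 bases is left out because the source text is truncated at that point.

```python
def parse_fastq(text):
    """Yield (name, sequence, quality) from 4-line FASTQ records (hypothetical helper)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        # Record layout: @name / sequence / + / quality
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def mean_phred(quality, offset=33):
    """Average Phred score of a quality string; Phred+33 encoding is an assumption."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filter(sequence, quality, min_len=100, min_mean_q=25.0):
    """Keep a read only if it is long enough and its mean quality is high enough."""
    return len(sequence) >= min_len and mean_phred(quality) >= min_mean_q


def filter_reads(reads, min_len=100, min_mean_q=25.0):
    """Return the reads that survive the length and mean-quality thresholds."""
    return [r for r in reads if passes_filter(r[1], r[2], min_len, min_mean_q)]
```

Calling `filter_reads(parse_fastq(text))` on the contents of a fastq file returns the surviving reads; the two thresholds default to the values given in the text (100 bp and 25).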

Embodiment 3

[0123] Example 3: Drug-resistance gene detection in cardiac vegetation samples based on the BGI sequencing platform

[0124] Using A1-A2 from Example 1 as samples, genomic nucleic acid was extracted, a library was constructed, and sequencing was performed on the BGI sequencing platform. The following data analysis was performed on the BGI sequencing data for each sample:

[0125] 1) Adapters and sequences with a high N ratio are removed from the fastq data generated by BGI sequencing, and the data then enter the quality assessment step.

[0126] 2) Sequencing quality assessment. The read length of the library was 150 bp, and sequences with a length of <100 bp and an average sequencing quality of <25 were filtered out.

[0127] 3) Host sequence removal. Reads were aligned to the human genome (genome version HG38); reads that failed to align were retained and passed to the next analysis step.

[0128] 4) The "two-step method" is used to identify the p...
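The host-removal step (step 3 above) keeps only reads that do not align to the human genome. The source does not name the aligner, so the following Python sketch assumes only that the alignment is available as SAM text; it relies on the SAM FLAG bit 0x4, which marks an unmapped segment. The function name is hypothetical.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_read_names(sam_lines):
    """Return names of reads flagged as unmapped, in first-seen order.

    sam_lines: iterable of SAM text lines (header lines start with '@').
    """
    names = []
    seen = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED_FLAG and name not in seen:
            seen.add(name)
            names.append(name)
    return names
```

The returned names identify the non-host reads to carry forward. For paired-end data this sketch reports a pair as soon as either mate is unmapped; whether both mates must fail to align is a design choice the source does not specify.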

Abstract

The invention relates to a method and system for analyzing metagenome data. A preliminary species identification result for a sample is obtained with a k-mer algorithm; some or all of the supporting sequences are extracted on the basis of that result; and the result is then verified with a BLAST algorithm to decide whether each preliminarily identified species should be reported as detected. The method and system reduce false positives, obtain the reportable detected species of a sample quickly and accurately, and are compatible with the mainstream sequencing platforms, making them suitable for both second-generation and third-generation sequencing technologies. They can also accurately identify drug-resistance genes and drug-resistance mutation sites in the sample and map them to the reported detected species. Furthermore, the system can be used to identify pathogenic microorganisms, especially endocarditis pathogens, overcoming the difficulty of culturing those pathogens.
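The two-stage idea in the abstract (fast k-mer screening, then verification of the supporting reads) can be illustrated with a toy Python sketch. It is not the patented implementation: the references are small in-memory strings, all function names and thresholds are hypothetical, and the verification stage uses a simple ungapped identity scan as a stand-in for BLAST rather than BLAST itself.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Generate all overlapping k-mers of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Map each k-mer to the set of species whose reference contains it."""
    index = defaultdict(set)
    for species, ref in references.items():
        for km in kmers(ref, k):
            index[km].add(species)
    return index


def preliminary_identification(reads, index, k):
    """Stage 1: assign each read to the species sharing the most k-mers with it."""
    support = defaultdict(list)
    for read in reads:
        hits = Counter(sp for km in kmers(read, k) for sp in index.get(km, ()))
        if hits:
            species, _ = hits.most_common(1)[0]
            support[species].append(read)
    return support


def best_identity(read, ref):
    """Best ungapped identity of a read over all placements on a reference.

    Stand-in for BLAST in this sketch; a real pipeline would call an aligner.
    """
    n = len(read)
    if n == 0 or n > len(ref):
        return 0.0
    best = 0
    for start in range(len(ref) - n + 1):
        matches = sum(a == b for a, b in zip(read, ref[start:start + n]))
        best = max(best, matches)
    return best / n


def verify(support, references, max_reads=50, min_identity=0.9, min_fraction=0.5):
    """Stage 2: report a species only if enough of its supporting reads verify."""
    reported = []
    for species, reads in support.items():
        sample = reads[:max_reads]  # "a part or all" of the supporting reads
        ok = sum(best_identity(r, references[species]) >= min_identity for r in sample)
        if ok / len(sample) >= min_fraction:
            reported.append(species)
    return sorted(reported)
```

A read that shares only a few k-mers with a reference is assigned to that species in stage 1 but fails the identity check in stage 2, so the species is dropped from the report. The `max_reads` cap reflects the abstract's note that only part of the supporting sequences may be checked, which bounds the cost of the slower verification stage.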

Description

Technical field

[0001] The present invention relates to the field of bioinformatics, in particular to a metagenomic data analysis method and system.

Background

[0002] The metagenome, also known as the community genome, is the sum of the genetic material of all microorganisms in a specific niche. Metagenomics is the discipline that applies genomics technology directly to the study of microbial communities in their niches, without the need to isolate and culture individual strains.

[0003] Unlike previous microbiological analysis methods, metagenomic analysis does not need to screen cultures of each microbial community; instead, it directly determines the nucleic acid sequences of all microorganisms in the sample in order to analyze the growth of the microbial community. Metagenomic analysis can avoid the bias caused by changes in microbial sequences due to environmental changes, and is especially suitable for identifying microorganisms that are difficult to cu...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/22
CPC: G16B30/00
Inventor: 康悦胡欢程军周洲任用
Owner: SIMCERE DIAGNOSTICS CO LTD