
Metagenome data mining method

A metagenomic data mining technology applied in the field of bioinformatics analysis. It addresses the redundancy of existing databases and the cumbersome collection of pathway-specific sequences, and it reduces computing costs.

Active Publication Date: 2020-05-22
RES CENT FOR ECO ENVIRONMENTAL SCI THE CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

However, the bottleneck limiting the use of this method is the establishment, analysis and use of specific databases.
Current bioinformatics databases are large and redundant. For example, the well-known nr database covers all known functional sequence information; the eggNOG database covers known protein sequence information; the KEGG database covers known metabolic pathways, enzyme functions and sequence information; and CAZy covers the functional sequences involved in sugar metabolism. Databases for specific functions, such as a methane metabolism database or a propionate metabolism database, are still lacking. Small, specific databases of this kind are especially suited to the precision required by research in narrow fields; existing examples include the ARGs database CARD, the nitrogen cycle database NCyc, and the virulence factor database VFDB. The sequences needed for such narrow-field research usually already exist within the large databases, but building a small specific database from them is often like looking for a needle in a haystack, and collecting the sequences is especially cumbersome.



Examples


Embodiment 1

[0077] Data mining of functional genes of methane metabolism in metagenomic sequencing.

[0078] Metagenome sequencing data: 12 paired-end sequencing samples, sequencing depth 5G.

[0079] Objective: To study the effect of different ammonia nitrogen suppression conditions on methane metabolism.

[0080] 1. Construct the Methane mechanism specific database

[0081] 1) Run: perl kegg_pathway_extract.pl --ko_ID_file ko_ID.txt  # ko_ID.txt file (map00680)

[0082] 2) According to the species classification information, remove the gene sequences of eukaryotes. Then, using the final list of gene IDs, extract the bacterial and archaeal sequences with the Amazing fasta extractor function of TBtools to obtain the final specific database, Methane_mechanism.fasta.
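
Step 2) amounts to filtering a FASTA file by a list of retained gene IDs. The source does this with the TBtools graphical tool; the following is only a minimal Python sketch of the same kind of filtering. It assumes a hypothetical lookup from gene ID to domain (the actual layout of the species classification information is not given in the source), and the gene IDs and sequences are toy values.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_prokaryotes(fasta_text, domain_by_gene, keep=("Bacteria", "Archaea")):
    """Return FASTA text containing only genes whose domain is in `keep`.

    `domain_by_gene` maps gene ID -> domain name. This layout is an
    assumption made for illustration, not the format used in the source.
    """
    out = []
    for header, seq in parse_fasta(fasta_text):
        gene_id = header.split()[0]  # ID = first whitespace-delimited token
        if domain_by_gene.get(gene_id) in keep:
            out.append(f">{header}\n{seq}")
    return "\n".join(out) + ("\n" if out else "")


# Toy input: hypothetical gene IDs, not real KEGG entries.
example_fasta = """>geneA some description
ATGAAA
CCC
>geneB
ATGTTT
>geneC
ATGGGG
"""
example_domains = {"geneA": "Bacteria", "geneB": "Eukaryota", "geneC": "Archaea"}
filtered = keep_prokaryotes(example_fasta, example_domains)
```

Genes with no entry in the lookup are dropped as well, which is the conservative choice when the goal is a database restricted to bacteria and archaea.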

[0083] 2. Establish the mapping file for the Methane mechanism specific database

[0084] 1) Obtain the index file of the Methane mechanism specific database from the Methane mechanism specific database, run the c...

Embodiment 2

[0097] Data mining of nitrogen metabolism (Nitrogen mechanism) functional genes in metagenomic sequencing.

[0098] Metagenome sequencing data: 12 paired-end sequencing samples, sequencing depth 5G.

[0099] Objective: To study the effect of different ammonia nitrogen suppression conditions on nitrogen metabolism in anaerobic digestion.

[0100] 1. Build Nitrogen mechanism database

[0101] 1) Run: perl kegg_pathway_extract.pl --ko_ID_file ko_ID.txt  # ko_ID.txt file (map00910)

[0102] 2) According to the species classification information, remove the gene sequences of eukaryotes. Then, using the final list of gene IDs, extract the bacterial and archaeal sequences with the Amazing fasta extractor function of TBtools to obtain the final database, Nitrogen_mechanism.fasta.

[0103] 2. Create Nitrogen mechanism database mapping file

[0104] 1) samtools faidx Nitrogen_mechanism.fasta

[0105] 2) Merge Nitrogen_mechanism.fasta.fai and ko_pathway_information.txt to form...
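
The merge in step 2) joins the sequence lengths recorded in the .fai index with the KO annotation. A rough Python sketch of such a join follows. The .fai column order (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH, tab-separated) is the standard samtools layout; the three-column layout assumed for ko_pathway_information.txt (gene ID, KO number, description) and all values in the toy data are hypothetical, since the source does not show that file.

```python
def read_fai_lengths(fai_text):
    """Map sequence name -> length from the text of a samtools .fai index.

    A FASTA .fai line is tab-separated:
    NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
    """
    lengths = {}
    for line in fai_text.splitlines():
        if line.strip():
            fields = line.split("\t")
            lengths[fields[0]] = int(fields[1])
    return lengths


def build_mapping(fai_text, ko_info_text):
    """Join gene lengths with KO annotation into mapping rows.

    Assumes (hypothetically) that each line of the KO information file
    is tab-separated as: gene ID, KO number, description.
    """
    lengths = read_fai_lengths(fai_text)
    rows = []
    for line in ko_info_text.splitlines():
        if not line.strip():
            continue
        gene_id, ko, description = line.split("\t")[:3]
        if gene_id in lengths:  # keep only genes present in the database
            rows.append((gene_id, lengths[gene_id], ko, description))
    return rows


# Toy data with placeholder identifiers (not real KEGG entries).
example_fai = "geneA\t900\t7\t60\t61\ngeneC\t1200\t930\t60\t61\n"
example_ko_info = "geneA\tKO1\tenzyme one\ngeneB\tKO2\tenzyme two\n"
mapping = build_mapping(example_fai, example_ko_info)
```

Keeping the gene length in each mapping row is a design choice of this sketch: it leaves room for length-aware normalization later, although the source does not say whether the authors use it that way.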

Embodiment 3

[0117] Data mining of sulfur metabolism (Sulfur mechanism) functional genes in metagenomic sequencing.

[0118] Metagenome sequencing data: 12 paired-end sequencing samples, sequencing depth 5G.

[0119] Objective: To study the effect of different ammonia nitrogen suppression conditions on sulfur metabolism in anaerobic digestion.

[0120] 1. Build the Sulfur mechanism database

[0121] 1) Run: perl kegg_pathway_extract.pl --ko_ID_file ko_ID.txt  # ko_ID.txt file (map00920)

[0122] 2) According to the species classification information, remove the gene sequences of eukaryotes. Then, using the final list of gene IDs, extract the bacterial and archaeal sequences with the Amazing fasta extractor function of TBtools to obtain the final database, Sulfur_mechanism.fasta.

[0123] 2. Create a Sulfur mechanism database mapping file

[0124] 1) samtools faidx Sulfur_mechanism.fasta

[0125] 2) Merge Sulfur_mechanism.fasta.fai and ko_pathway_information.txt to form the mapping file Sulfur_mechanism....



Abstract

The invention relates to a metagenome data mining method comprising the following steps: 1) acquiring all gene information of a target metabolic pathway from the KEGG database and establishing a specific database; 2) establishing a mapping file for the specific database of the target metabolic pathway; 3) based on that specific database, rapidly aligning the clean reads obtained by metagenome sequencing against the database to obtain an alignment result for each sample; 4) sorting, counting and integrating the alignment results of the samples; and 5) normalizing (homogenizing) the annotation results of each sample and carrying out quantitative analysis among different samples on the normalized results. With this method, a specific database for a specified metabolic pathway can be established quickly for subsequent analysis, and the data can be normalized and post-processed so that differences in pathway-related genes among samples can be compared quantitatively. The method can be widely applied in the field of metagenome data mining.
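
Steps 4) and 5) can be pictured with a small Python sketch. It counts aligned reads per KO in each sample, scales the counts, and integrates them into one table across samples. The scaling used here, hits per million clean reads, is an assumption chosen for illustration; this excerpt does not state the homogenization (normalization) formula the method actually uses, and all names and numbers below are toy values.

```python
from collections import Counter


def count_hits_per_ko(hit_genes, ko_by_gene):
    """Count aligned reads per KO for one sample.

    `hit_genes` holds one database gene ID per aligned read.
    """
    counts = Counter()
    for gene_id in hit_genes:
        ko = ko_by_gene.get(gene_id)
        if ko is not None:
            counts[ko] += 1
    return counts


def normalize_per_million(counts, total_clean_reads):
    """Scale raw counts to hits per million clean reads (an assumed scheme)."""
    return {ko: n * 1_000_000 / total_clean_reads for ko, n in counts.items()}


def integrate(samples, ko_by_gene):
    """Build {KO: {sample: normalized value}} across all samples.

    `samples` maps sample name -> (hit gene IDs, total clean reads).
    """
    table = {}
    for name, (hit_genes, total) in sorted(samples.items()):
        counts = count_hits_per_ko(hit_genes, ko_by_gene)
        for ko, value in normalize_per_million(counts, total).items():
            table.setdefault(ko, {})[name] = value
    # Fill zeros so every KO has a value for every sample.
    for row in table.values():
        for name in samples:
            row.setdefault(name, 0.0)
    return table


# Toy data with placeholder identifiers.
ko_by_gene = {"geneA": "KO1", "geneC": "KO2"}
samples = {
    "S1": (["geneA", "geneA", "geneC"], 1_000_000),
    "S2": (["geneA"], 2_000_000),
}
table = integrate(samples, ko_by_gene)
```

Dividing by the number of clean reads per sample is what makes values from samples of different sequencing depth comparable, which is the purpose the abstract gives for the homogenization step.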

Description

Technical field

[0001] The invention belongs to the field of bioinformatics analysis, and in particular relates to a metagenomic data mining method.

Background

[0002] Metagenome sequencing is increasingly widely used, and its data mining techniques are constantly being updated. In metagenomic bioinformatics analysis, the database is the foundation of all subsequent functional analysis. At present, metagenomic data analysis in China and abroad lacks specificity, databases for specific fields are poorly developed, and analysis results cannot be compared quantitatively or semi-quantitatively among different samples. Traditional analysis mostly follows the route: paired-end sequencing → assembly into contigs → open reading frame (ORF) annotation → data analysis. A large number of sequencing reads are lost in this process. For example, general metagenomic paired-end sequencing (5G data) will obtain about 50 million reads...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B20/00, G16B50/00
CPC: G16B20/00, G16B50/00, Y02A90/10
Inventor: 张俊亚, 魏源送
Owner: RES CENT FOR ECO ENVIRONMENTAL SCI THE CHINESE ACAD OF SCI