A method for analyzing microbial community function using metagenomic data

A metagenomics and data analysis technology, applied in the field of bioinformatics, that addresses the difficulty, unreliability, and high labor and economic costs of building integrated databases, achieving broad applicability and comprehensive information while improving the utilization of sequencing data.

Active Publication Date: 2020-11-17
BEIJING INST OF GENOMICS CHINESE ACAD OF SCI CHINA NAT CENT FOR BIOINFORMATION

AI Technical Summary

Problems solved by technology

Sequencing and analyzing such a large number of samples incurs substantial labor and economic costs, which makes such an integrated database harder to establish.
[0009] Third, in terms of data utilization, conventional methods use too small a fraction of the reads to fully reflect the real state of the research object.
Specifically, during assembly of the sequencing data, reads that share no overlapping sequence cannot be assembled; assembled sequences shorter than 500 bp are removed; and sequences longer than 500 bp in which no ORF can be predicted are also removed.
This process loses or discards a large amount of sequencing data, so the annotated microbial species and functional information fall far short of the actual situation.
Especially in studies with small sample sizes (20 samples or fewer), this data loss causes large errors and unreliable research results.



Examples


Embodiment 1

[0063] Embodiment 1: Construction of the microbial metagenomic reference database

[0064] In this embodiment, NCBI and KEGG are used as the target biological information databases. Those skilled in the art can choose other biological information databases in the field and use the method of the present invention to construct the microbial metagenomic reference database; this also falls within the scope of protection.

[0065] 1. Download the whole-genome sequences of all microorganisms, from different sources and of different species, in NCBI. The downloaded data are .fna files, which include the NC number, base sequence, gi number, and corresponding species information for each whole-genome sequence.

[0066] 2. Download the .gbk annotation files of all microorganisms in NCBI, and extract the classification information of related species from the .gbk annotation files according to the NC number and species information obtained from the .fna file in the previous step, including phylum, class, ord...
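As an illustration of step 1, the following minimal Python sketch extracts the gi number, NC number, and species description from the header lines of a .fna file. It assumes the legacy NCBI header layout `>gi|<gi>|ref|<accession>| <description>`; the function name `parse_fna_headers` and the record keys are hypothetical and are not taken from the patent.

```python
import re
from typing import Dict, Iterable, Iterator

# Assumed legacy NCBI header layout: >gi|<gi>|ref|<accession>| <description>
HEADER_RE = re.compile(
    r"^>gi\|(?P<gi>\d+)\|ref\|(?P<acc>[A-Z]{2}_\d+(?:\.\d+)?)\|\s*(?P<desc>.*)$"
)


def parse_fna_headers(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one record per FASTA entry with gi, accession, description, length."""
    record = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous entry.
            if record is not None:
                yield record
            match = HEADER_RE.match(line)
            if match is None:
                # Header in another layout: keep the raw text, leave IDs empty.
                record = {"gi": None, "accession": None,
                          "description": line[1:], "length": 0}
            else:
                record = {"gi": match.group("gi"),
                          "accession": match.group("acc"),
                          "description": match.group("desc"),
                          "length": 0}
        elif record is not None:
            # Sequence line: only the length is tracked in this sketch.
            record["length"] += len(line)
    if record is not None:
        yield record


example = [
    ">gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome",
    "ACGT",
    "AC",
]
records = list(parse_fna_headers(example))
```

The accession extracted here is the key that step 2 would use to look up the matching classification information in the .gbk annotation files.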

Embodiment 2

[0073] Embodiment 2: Method for analyzing microbial population function using metagenomic data

[0074] 1. Sequence the metagenome of the microbial population to be tested, and perform quality control on the sequencing data: remove bases with a sequencing quality value below 20, then remove reads shorter than 25 bp, and remove reads derived from host DNA. This reduces errors that may arise during sample extraction and sequencing and yields high-quality whole-genome sequencing data.

[0075] 2. Calculation of species abundance: compare the high-quality reads obtained in step 1 against the microbial species data set in the microbial metagenomic reference database constructed in Embodiment 1, and calculate species abundance to obtain abundance values for all species. Based on the species abundance values, differences in microbial composition among samples are analyzed.

[0076] 3. Calculation of gene abun...
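The quality-control thresholds in step 1 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation. It makes three assumptions: quality strings use Phred+33 encoding, low-quality bases are trimmed from both ends of each read (the text does not say how such bases are removed), and host-derived reads have already been identified by a separate alignment step and are passed in as a set of read IDs. All names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

Read = Tuple[str, str, str]  # (read_id, bases, quality string)

MIN_QUALITY = 20   # bases below this quality value are removed (from the text)
MIN_LENGTH = 25    # reads shorter than this are removed (from the text)
PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


def trim_low_quality_ends(bases: str, quals: str,
                          min_q: int = MIN_QUALITY) -> Tuple[str, str]:
    """Trim bases with quality below min_q from both ends (assumed strategy)."""
    scores = [ord(ch) - PHRED_OFFSET for ch in quals]
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return bases[start:end], quals[start:end]


def quality_control(reads: Iterable[Read],
                    host_read_ids: Set[str]) -> Iterator[Read]:
    """Drop host reads, trim low-quality ends, then drop reads that are too short."""
    for read_id, bases, quals in reads:
        if read_id in host_read_ids:
            continue  # host-derived read, identified upstream (assumption)
        bases, quals = trim_low_quality_ends(bases, quals)
        if len(bases) < MIN_LENGTH:
            continue
        yield read_id, bases, quals


reads = [
    ("r1", "A" * 30, "I" * 28 + "##"),       # 2 low-quality bases at the 3' end
    ("r2", "A" * 30, "I" * 20 + "#" * 10),   # only 20 bases survive trimming
    ("r3", "A" * 30, "I" * 30),              # flagged as host-derived
]
clean = list(quality_control(reads, host_read_ids={"r3"}))
```

In practice these steps are usually delegated to dedicated trimming and alignment tools; the sketch only makes the order of the filters and the two thresholds explicit.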

Embodiment 3

[0078] Embodiment 3: Functional metagenomic analysis of microbial populations in the intestinal contents of poultry with a small sample size

[0079] In this example, sequencing and data analysis were performed on metagenomic data from the intestinal contents of healthy and diseased poultry with a small sample size. The experimental subjects were 18 poultry individuals from the same industrial breeding farm, divided into two groups of 9 samples each: the disease group consisted of individuals diagnosed with poultry colibacillosis by veterinarians, and the control group consisted of healthy individuals.

[0080] DNA extracted from the above 18 intestinal content samples was metagenomically sequenced by microbial whole-genome shotgun sequencing. Then use the method for analyzing the functions of microbial populations using metagenomic data established in Example 2 of the present invention and the existing conventional metho...



Abstract

The invention provides a method for analyzing microbial population function using metagenomic data. The method comprises the following steps: collecting all known microbial species, gene, and function information and integrating it into a reference database; sequencing the metagenome of the microbial population to be tested, performing quality control on the sequencing data, calculating species abundance and gene abundance, and analyzing differences in microbial composition and gene levels among samples; annotating gene functions, clustering genes with the same function into function modules, and summing the relative abundances of all non-redundant genes in each function module to obtain the abundance values of all function modules; and performing differential comparison or overall evaluation of the microbial functions of the samples to be tested. The method eliminates the steps of splicing, assembly, and gene prediction and of comparing the sequencing data against individual function databases one at a time, which saves time and improves the utilization efficiency of the sequencing data. The method can be used to analyze high-throughput microbial whole-genome sequencing data and to screen for functional microbes.
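The function-module step described in the abstract, summing the relative abundances of the non-redundant genes assigned to each module, can be sketched in Python as follows. The gene and module identifiers are hypothetical placeholders, and the sketch assumes each gene maps to at most one module, which the source does not state.

```python
from collections import defaultdict
from typing import Dict


def module_abundance(gene_abundance: Dict[str, float],
                     gene_to_module: Dict[str, str]) -> Dict[str, float]:
    """Sum the relative abundance of every annotated gene into its function module."""
    totals: Dict[str, float] = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        module = gene_to_module.get(gene)
        if module is None:
            continue  # gene without a functional annotation is skipped
        totals[module] += abundance
    return dict(totals)


# Hypothetical relative abundances of non-redundant genes in one sample.
gene_abundance = {"geneA": 0.25, "geneB": 0.5, "geneC": 0.125, "geneD": 0.125}
# Hypothetical functional annotation: gene -> function module.
gene_to_module = {"geneA": "M1", "geneB": "M1", "geneC": "M2"}

modules = module_abundance(gene_abundance, gene_to_module)
```

The resulting per-module values for each sample are the inputs to the differential comparison or overall evaluation that the abstract describes.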

Description

Technical field

[0001] The present invention relates to the field of bioinformatics, in particular to a method for analyzing the functions of microbial populations using metagenomic data, which can save analysis steps and time and improve the utilization rate of sequencing data.

Background art

[0002] With the continuous development of high-throughput sequencing technology, people have been able to explore complex biological functions from the genome level, which has given us a deeper understanding of the organism itself and disease-related research. More and more studies have found that there is a mutually beneficial balance between the microbial flora and the health of the host. Among them, the microbial flora can help the host ferment undigested food, participate in energy metabolism and nutrient absorption, provide the host with various trace elements, essential amino acids, and some antibacterial polypeptides, and decompose some toxins or harmful substances in th...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B30/10, G16B40/30, G16B50/00, C12Q1/689, C12Q1/10, C12Q1/06, C12Q1/04
CPC: C12Q1/689
Inventor: 米双利, 邢志凯, 郭翀晔, 李蒙
Owner: BEIJING INST OF GENOMICS CHINESE ACAD OF SCI CHINA NAT CENT FOR BIOINFORMATION