Microbial species and functional composition analysis method for metagenome sequencing data

A metagenomic sequencing-data analysis technique, applied in the field of microbial gene analysis, that addresses problems such as limited application range and low sensitivity, and achieves high sensitivity, improved accuracy, and fewer false-positive results.

Pending Publication Date: 2021-04-02
SHANGHAI PASSION BIOTECHNOLOGY CO LTD

AI Technical Summary

Problems solved by technology

[0004] (1) No assembly (splicing) step is included, so contig sequences produced by assembly cannot be analyzed further.
[0005] (2) Taxonomic annotation of species relies on comparative analysis of unassembled sequence data; there is no sequence assembly and no identification based on the assembled data, so the results have a high false-positive rate and depend on the construction of a dedicated database,

Examples

Embodiment 1

[0073] A microbial species and functional composition analysis method for metagenomic sequencing data, comprising the following steps:

[0074] 1) Trim adapter (linker) sequence fragments and low-quality fragments from the raw data, and filter out short sequences and sequences containing ambiguous bases; if the host genome is known, remove the host sequences;

[0075] 2) Use the data obtained above for species annotation, counting the number of sequences assigned to each species as its abundance, and then remove the sequences annotated to non-target species based on the annotation results;

[0076] 3) Assemble (splice) the sequences remaining after removal of the non-target species to obtain contig sequences;

[0077] 4) Perform similarity clustering on the contig sequences, calculate the abundance of the non-redundant contig sequences in each sample, and remove the sequences whose total abundance is zero;

[0078] 5) Use the blastn algorithm to align the non-redundant contig sequences against the nucleic acid database, and use the...
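
The bookkeeping parts of steps 1), 2) and 4) can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the `min_len` threshold, and the dictionary-based data layout are all assumptions made for illustration, and the annotation, assembly, and clustering tools themselves are not modelled.

```python
from collections import Counter

def passes_basic_filter(seq, min_len=50):
    """Step 1 (in part): keep a read only if it is long enough and
    contains no ambiguous bases. min_len is a hypothetical threshold;
    the source does not state one."""
    return len(seq) >= min_len and set(seq.upper()) <= set("ACGT")

def species_abundance(read_to_species):
    """Step 2: abundance of a species = number of reads annotated to it.
    read_to_species maps a read ID to a species label (assumed layout)."""
    return Counter(read_to_species.values())

def remove_non_target(read_to_species, non_target):
    """Step 2: drop reads annotated to any species in non_target."""
    return {read_id: species
            for read_id, species in read_to_species.items()
            if species not in non_target}

def drop_zero_abundance(contig_abundance):
    """Step 4: drop non-redundant contigs whose abundance, summed over
    all samples, is zero. contig_abundance maps a contig ID to a list
    of per-sample abundances (assumed layout)."""
    return {contig: per_sample
            for contig, per_sample in contig_abundance.items()
            if sum(per_sample) > 0}

if __name__ == "__main__":
    annotations = {"r1": "Escherichia coli",
                   "r2": "Escherichia coli",
                   "r3": "Homo sapiens"}
    print(species_abundance(annotations))
    print(remove_non_target(annotations, {"Homo sapiens"}))
    print(drop_zero_abundance({"contig_1": [0, 0], "contig_2": [3, 0]}))
```

Counting abundance before removing non-target reads follows the order given in step 2), so the abundance table still records how many reads were assigned to the species that are later discarded.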

Embodiment 2

[0091] In the embodiment of the present invention, the simulated metagenomic data is used for analysis, and the species composition of the simulated data is shown in Attached Table 1, wherein the known host genome is the human genome.

[0092] In step S101, first use FastQC to check the sequencing quality of the raw data, then use Cutadapt to identify potential adapter sequences at the 3' end and truncate the reads at the identified adapters. The match with the adapter sequence (R1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA; R2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT) must be at least 3 bp long, and a base mismatch rate of up to 20% is allowed. Next, use the fastp software to cut low-quality fragments. Specifically, a sliding-window method is used for quality screening: the window size is 5 bp, the window moves from the first base at the 5' end, and the average quality of the bases in the window is required to be greater than or equal t...
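
To make the two trimming rules concrete, here is a simplified Python sketch; it does not call Cutadapt or fastp and is not a substitute for them. Several things are assumptions: the adapter search counts substitutions only (Cutadapt also tolerates insertions and deletions), the allowed number of mismatches is taken as the floor of 20% of the overlap length, the window trim cuts the read at the first failing window, and `min_mean_q=20` is a hypothetical threshold because the source text is truncated before the value.

```python
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # R1 adapter from the text

def trim_3prime_adapter(read, adapter, min_overlap=3, max_error_percent=20):
    """Truncate read at the leftmost position where the adapter (or a
    prefix of it running to the read end) matches with at least
    min_overlap bases and at most max_error_percent mismatches.
    Substitution-only simplification of 3' adapter trimming."""
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        allowed = overlap * max_error_percent // 100
        mismatches = sum(1 for a, b in zip(read[start:start + overlap], adapter)
                         if a != b)
        if mismatches <= allowed:
            return read[:start]
    return read

def sliding_window_keep(quals, window=5, min_mean_q=20):
    """Return how many bases to keep from the 5' end. The window moves
    one base at a time from the first base; at the first window whose
    mean quality is below min_mean_q, the read is cut at the window
    start. min_mean_q is a hypothetical value, not taken from the source."""
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean_q:
            return start
    return len(quals)

if __name__ == "__main__":
    read = "TTTTTTTTTT" + ADAPTER_R1[:10]
    print(trim_3prime_adapter(read, ADAPTER_R1))
    print(sliding_window_keep([30] * 10 + [2] * 5))
```

With a 3 bp minimum overlap and a 20% mismatch rate, the shortest matches allow no mismatches at all, since 20% of 3 bp rounds down to zero.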

Abstract

The invention discloses a microbial species and functional composition analysis method for metagenome sequencing data, which comprises the following steps: (1) trimming adapter (linker) sequence fragments and low-quality fragments from the raw data, and filtering out overly short sequences and sequences containing ambiguous bases; (2) performing species annotation using the obtained data; (3) assembling the sequences remaining after removal of the non-target species; (4) performing similarity clustering on the contig sequences; (5) using the blastn algorithm; (6) predicting gene regions in the non-redundant contig sequences; (7) comparing the non-redundant protein sequence set with various protein annotation databases; and (8) calculating the gene sequence abundance. The method supports further analysis of the contig sequences generated by assembly and avoids the problems of a high false-positive rate, dependence on the construction of a dedicated database, and a narrow application range; sensitivity is high and accuracy is improved.

Description

Technical field

[0001] The invention relates to the field of microbial gene analysis, in particular to a method for analyzing microbial species and functional composition in metagenomic sequencing data.

Background technique

[0002] With the continuous development of next-generation high-throughput sequencing technology (Next Generation Sequencing, NGS), research on microbial communities is becoming increasingly comprehensive and in-depth. Unlike the common amplicon sequencing technology, which targets microbial ribosomal RNA genes, metagenomics takes the genomes of all microorganisms in the entire community as the research object and, based on shotgun sequencing, comprehensively reveals the species composition and functional potential of the entire community, and thereby helps elucidate the mechanism of action of the microbial community. However, due to the variety of sequencing sample types, sample size, sequencing depth, hosts, ...

Application Information

IPC(8): G16B30/00, G16B50/00
CPC: G16B30/00, G16B50/00
Inventor: 李鸿毅, 曲昊淼, 寇文伯, 薛正晟, 孙子奎
Owner SHANGHAI PASSION BIOTECHNOLOGY CO LTD