Microbial metagenome binning method and system

This microbial metagenomics technology, applied to microbial metagenome binning methods and systems, addresses problems such as the loss of useful features, low binning accuracy, and the inability to accurately describe the sequence features of microorganisms. Its effects are accurate binning results, improved processing efficiency, and a reduced feature dimension.

Active Publication Date: 2021-08-03
NANKAI UNIV

AI Technical Summary

Problems solved by technology

[0004] The inventors found that existing binning algorithms do not fully utilize sequence features and cannot deeply mine microbial sequence features, resulting in low binning accuracy. A large number of useful features are also lost, so the sequence features that remain after dimensionality reduction cannot accurately describe the microorganisms.

Examples

Embodiment 1

[0033] The purpose of this embodiment is to provide a microbial metagenome binning method.

[0034] A microbial metagenome binning method, comprising:

[0035] Obtaining the microbial metagenomic sequences to be binned;

[0036] Performing feature extraction on each sequence in the metagenomic sequences, wherein the extracted features include tetranucleotide frequency, RPKM abundance and k-mer coverage features;

[0037] Inputting the extracted features into a VAE-GAN neural network for training, and encoding the extracted features into the VAE latent vector through training;

[0038] Clustering the metagenomic sequences based on the mean variable in the VAE latent vector, and binning the metagenome according to the clustering result.
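The encoding and clustering steps rely on the VAE encoder producing a mean for every sequence. The following is a minimal sketch of that idea, not the network described in the patent: it assumes the usual VAE parameterization in which the encoder emits a mean and a log-variance per latent dimension, and it omits the adversarial (GAN) part entirely. The function names are hypothetical.

```python
import math
import random

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    This sampling is only needed while training; it is the standard
    VAE reparameterization, assumed here rather than taken from the patent.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))

def embedding_for_clustering(mu, logvar):
    """Use only the mean as the low-dimensional representation of a sequence."""
    return list(mu)

rng = random.Random(0)
mu, logvar = [0.3, -1.2], [-0.5, 0.1]   # hypothetical encoder outputs
z = reparameterize(mu, logvar, rng)      # noisy sample used during training
point = embedding_for_clustering(mu, logvar)  # deterministic point used for binning
```

Clustering on the mean rather than on a sampled `z` keeps the embedding deterministic: the same sequence always maps to the same point.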

[0039] Specifically, for ease of understanding, the solutions described in the present disclosure are described in detail below with reference to the accompanying drawings:

[0040] As shown in Figure 1, T...
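The feature extraction step of this embodiment names tetranucleotide frequency as one of the extracted features. A minimal sketch of how such a feature could be computed is given here; the 256-element layout, the handling of ambiguous bases, and the function name are assumptions, and the sketch does not merge reverse-complement tetramers.

```python
from itertools import product

# All 256 possible tetranucleotides in a fixed order (assumed feature layout).
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}

def tetranucleotide_frequency(seq):
    """Return a 256-element list of normalized 4-mer frequencies for seq."""
    counts = [0] * len(TETRAMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        # Windows containing N or any other non-ACGT symbol are skipped.
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [c / total for c in counts]

tnf = tetranucleotide_frequency("ACGTACGT")
```

Normalizing by the number of counted windows makes the vector comparable between sequences of different lengths.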

Embodiment 2

[0066] The purpose of this embodiment is to provide a microbial metagenome binning system, comprising:

[0067] A data acquisition unit, which is used to acquire the microbial metagenomic sequences to be binned;

[0068] A feature extraction unit, which is used to extract features from each sequence in the metagenomic sequences, wherein the extracted features include tetranucleotide frequency, RPKM abundance and k-mer coverage features;

[0069] A feature dimension reduction unit, which is used to input the extracted features into a VAE-GAN neural network for training, and to encode the extracted features into the VAE latent vector through training;

[0070] A clustering unit, which is used to cluster the metagenomic sequences based on the mean variable in the VAE latent vector, and to bin the metagenome according to the clustering result.
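The clustering unit groups sequences by their latent mean vectors. The text shown here does not name the clustering algorithm, so the sketch below substitutes plain k-means (Lloyd's algorithm) purely as an illustration; the function names, the initialization from the first `k` points, and the sample data are all hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means; centroids start at the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels

def bins_from_labels(contig_ids, labels):
    """Group sequence identifiers by cluster label; each group is one bin."""
    bins = {}
    for contig_id, label in zip(contig_ids, labels):
        bins.setdefault(label, []).append(contig_id)
    return bins

# Hypothetical 2-D latent means for four sequences.
means = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels = kmeans(means, k=2)
bins = bins_from_labels(["seq1", "seq2", "seq3", "seq4"], labels)
```

In practice the number of bins is usually unknown in advance, so a method that does not require a fixed `k` may be a better fit; k-means is used here only because it is short and easy to follow.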

[0071] In further embodiments, there is also provided:

[0072] An electronic device includes a memory, a processor, and computer i...
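The feature extraction unit of this embodiment lists RPKM abundance among its features. As a worked illustration, RPKM (reads per kilobase per million mapped reads) can be computed from read-mapping counts as below. The function name and the example numbers are hypothetical, and how the patent obtains the mapped-read counts is not stated in the text shown here.

```python
def rpkm(mapped_reads, sequence_length_bp, total_mapped_reads):
    """Reads per kilobase of sequence per million mapped reads.

    RPKM = mapped_reads * 1e9 / (sequence_length_bp * total_mapped_reads),
    where the factor 1e9 combines the per-kilobase (1e3) and
    per-million-reads (1e6) scalings.
    """
    if sequence_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    return mapped_reads * 1e9 / (sequence_length_bp * total_mapped_reads)

# Hypothetical example: 500 reads map to a 2,000 bp sequence
# in a sample with 1,000,000 mapped reads in total.
abundance = rpkm(500, 2000, 1_000_000)
```

Dividing by both sequence length and sequencing depth makes abundances comparable between sequences of different lengths and between samples of different sizes.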

Abstract

The invention provides a microbial metagenome binning method and system. The method comprises the following steps: acquiring the microbial metagenomic sequences to be binned; performing feature extraction on each sequence in the metagenomic sequences; inputting the extracted features into a VAE-GAN neural network for training, and encoding the extracted features into a VAE latent vector through training; and clustering the metagenomic sequences based on the mean variable in the VAE latent vector to bin the metagenome. Compared with existing methods, the scheme fuses multiple features to deeply mine the sequence features of the metagenome so as to describe the metagenomic sequences accurately. To improve the processing efficiency of the algorithm, the VAE-GAN neural network performs dimensionality reduction on the extracted features. This reduces the feature dimension while fully retaining the necessary components of the sequence features, and balances binning accuracy against binning time.
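To make the idea of fusing multiple features concrete, the sketch below concatenates per-sequence features into one vector and standardizes each column before it would be passed to a dimensionality reduction network. The log transform, the z-score scaling, and the function names are assumptions for illustration; the abstract does not state how the features are combined or normalized.

```python
import math

def fuse_features(tnf, rpkm_abundance, kmer_coverage):
    """Concatenate one sequence's features into a single flat vector.

    log1p compresses the wide dynamic range of abundance and coverage
    (an assumed preprocessing choice, not taken from the patent).
    """
    return list(tnf) + [math.log1p(rpkm_abundance), math.log1p(kmer_coverage)]

def zscore_columns(matrix):
    """Standardize each column to zero mean and unit variance.

    Columns with zero variance are mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*matrix))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
            for col, m in zip(columns, means)]
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, stds)]
            for row in matrix]

# Two hypothetical sequences with a shortened 2-element frequency vector.
rows = [fuse_features([0.5, 0.5], 250.0, 12.0),
        fuse_features([0.9, 0.1], 10.0, 3.0)]
scaled = zscore_columns(rows)
```

Scaling the columns keeps a feature with large raw values, such as abundance, from dominating the frequency features in the fused vector.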

Description

Technical field

[0001] The disclosure belongs to the technical field of bioinformatics analysis, and in particular relates to a microbial metagenome binning method and system.

Background

[0002] The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

[0003] With the development of sequencing technology, microbial research methods based on culture and isolation are gradually being replaced by metagenomic research methods. Because metagenomics sequences all the microorganisms in a community together, studying the sequence of a single organism in the community requires classifying the metagenomic sequences by strain or species. In metagenomics, this classification is called binning.

[0004] The inventors found that the existing binning algorithm does not fully utilize the sequence features and cannot deeply mine t...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B30/00, G06K9/62, G06N3/04, G06N3/08
CPC: G16B30/00, G06N3/08, G06N3/048, G06N3/045, G06F18/23, G06F18/253
Inventor: 刘健田妹陈娇
Owner: NANKAI UNIV