Method and device for data classification of metagenome

A data classification and metagenomics technology, applied in the fields of genomics, electrical digital data processing, and special data processing applications. It addresses the problems of high running-time overhead and low classification accuracy, reducing time overhead and improving classification speed.

Active Publication Date: 2017-05-17
SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0006] However, because of the limitations of their feature extraction methods and classifier models, the existing supervised classification algorithms described above achieve relatively low accuracy on large-scale metagenomic data classification problems that involve low classification (taxonomic) levels and many species, and their running time overhead is too high.

Examples

Embodiment Construction

[0029] In order to make the objectives, technical solutions and beneficial effects of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

[0030] An embodiment of the present invention provides a method for classifying metagenomic data. The method includes: calculating feature vectors of the sequencing sequences; clustering the feature vectors to obtain M clusters of reads, G1 to GM, where M is an integer not less than 1; obtaining the center set Ki of each cluster among G1 to GM; and determining the genome category of each cluster by comparing each read in that cluster's center set Ki with the reference gene sequences. The embodiment of the present invention also prov...
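The first three steps of [0030] (feature vectors, clustering, center sets) can be illustrated with a minimal sketch. The text shown here does not say which features or which clustering algorithm the patent uses, so everything concrete below is an assumption made for illustration: normalized k-mer frequencies as the feature vector, plain k-means as the clustering step, and "the reads closest to the centroid" as the center set Ki. The function names kmer_vector, kmeans and center_set are hypothetical.

```python
from itertools import product
from math import dist


def kmer_vector(read, k=2):
    """Normalized k-mer frequency vector over the alphabet ACGT (assumed feature)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[kmer] / total for kmer in kmers]


def kmeans(vectors, m, iterations=20):
    """Plain k-means (assumed clustering step); the first m vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:m]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [min(range(m), key=lambda c: dist(v, centroids[c])) for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(m):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def center_set(vectors, labels, centroids, cluster, size):
    """Indices of the `size` reads in `cluster` closest to its centroid (assumed Ki)."""
    members = [i for i, label in enumerate(labels) if label == cluster]
    members.sort(key=lambda i: dist(vectors[i], centroids[cluster]))
    return members[:size]
```

With this layout, only the reads returned by center_set would need to be compared with the reference sequences, which is where a reduction in comparison work could come from.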

Abstract

The invention belongs to the field of gene data processing and provides a method and device for classifying metagenomic data, improving genome classification accuracy at a small time cost. The method comprises: calculating feature vectors of the sequencing sequences; clustering the feature vectors to obtain M clusters of reads, G1 to GM, where M is an integer not smaller than 1; obtaining the center set Ki of each cluster among G1 to GM; and determining the genome category of each cluster by comparing each read in that cluster's center set Ki with the reference gene sequences. Compared with the prior art, the technical scheme reduces the time spent on classification, increases operating speed, and markedly improves the accuracy with which the genome category of a sequencing sequence is identified.
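The final step, judging each cluster's genome category from its center set Ki, can be sketched as follows. The source only says that the center-set reads are compared with reference gene sequences; the shared k-mer fraction used as the comparison and the majority vote used to combine the results are both assumptions for illustration, as are the names kmer_set, best_reference and classify_clusters.

```python
from collections import Counter


def kmer_set(seq, k=4):
    """All distinct k-mers occurring in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_reference(read, references, k=4):
    """Name of the reference sharing the largest fraction of the read's k-mers.

    A stand-in for a real sequence comparison; ties go to the
    alphabetically first reference name.
    """
    read_kmers = kmer_set(read, k)

    def score(name):
        shared = len(read_kmers & kmer_set(references[name], k))
        return shared / max(len(read_kmers), 1)

    return max(sorted(references), key=score)


def classify_clusters(reads, labels, center_sets, references, k=4):
    """Label each cluster by majority vote over its center-set reads,
    then give every read the category of its cluster.

    `center_sets` maps a cluster id to a non-empty list of read indices.
    """
    cluster_class = {}
    for cluster, indices in center_sets.items():
        votes = Counter(best_reference(reads[i], references, k) for i in indices)
        cluster_class[cluster] = votes.most_common(1)[0][0]
    return [cluster_class[label] for label in labels]
```

Under these assumptions the number of read-to-reference comparisons is the total size of the center sets rather than the total number of reads.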

Description

Technical field

[0001] The invention belongs to the field of gene data processing, and in particular relates to a method and device for classifying metagenomic data.

Background

[0002] DNA-based metagenomics theoretically covers all microorganisms in an environmental sample, so it can reflect the composition of microbial communities more comprehensively and faithfully, while greatly expanding the sources for screening new genes or biologically active substances. Depending on the strategy used, metagenomics research can be divided into sequence-driven and function-driven approaches. Metagenomics research is based on constructing a metagenomic library to screen for new genes or new substances.

[0003] The goal of metagenomic research is to study the structural composition of the microbiome. For example, the sequencing of marine samples can reveal environmental diversity. Similarly, the study of human samples can reveal the rel...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/18, G06K9/62
CPC: G16B20/00, G06F18/23
Inventor: 郭宁, 魏彦杰, 滕彦宁, 葛健秋, 张慧玲
Owner: SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI