
Metagenome fragment clustering method based on fuzzy k-means

A metagenomics and clustering technology applied in fields including instruments, computing, and electrical digital data processing. It addresses problems such as errors, misclassification, and susceptibility to contamination.

Status: Inactive. Publication date: 2014-07-30
JILIN UNIV
Cites: 1; Cited by: 17

AI Technical Summary

Problems solved by the technology

[0005] China Patent No. 201110439198.X, titled "Classification Method and Device Based on Metagenome 16S Hypervariable Region V3", discloses a method for classifying metagenomic data, but it relies on 16S rDNA. 16S rDNA identification is PCR-based and, like other PCR-based identification methods, is prone to contamination. In addition, the method still assembles DNA sequences from their overlap relationships before classification, which inevitably introduces errors and ultimately reduces the accuracy of the classification results.
[0006] The master's thesis "Species Classification Method Based on k-mer Frequency Statistics" describes classifying species by k-mer frequency, but its dataset consisted of only six bacterial DNA sequences, each longer than 1000 bp. Such data cannot reflect the defining characteristics of metagenomic data: a large number of species and short sequences.
In addition, that method uses only the Euclidean distance between feature vectors as the basis for species classification, which easily leads to misclassification.
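The k-mer frequency representation and Euclidean comparison criticized in [0006] can be illustrated with a minimal sketch. It assumes k = 3 and counts only windows made of the four unambiguous bases A, C, G, T; the thesis's actual k and preprocessing are not stated here, and the function names are hypothetical.

```python
from itertools import product
from math import sqrt

def kmer_frequency_vector(seq: str, k: int = 3) -> list[float]:
    """Relative frequency of each of the 4**k possible k-mers in seq (hypothetical helper)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k].upper()
        if kmer in index:  # skip windows containing N or other non-ACGT symbols
            counts[index[kmer]] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(kmers)

def euclidean(u: list[float], v: list[float]) -> float:
    """Plain Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

With k = 3, a 200 nt fragment contributes 198 overlapping windows spread over 64 possible 3-mers.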




Embodiment

[0099] Example: Cluster analysis of simulated metagenomic data

[0100] The metagenomic data in this embodiment are simulated from the whole-genome data of 20 species. Each species contributes 300 DNA fragments, each 200 nt long (nt: nucleotides, the unit of sequence length), so the dataset contains 6000 DNA fragments in total. We apply the proposed method to perform feature extraction, data normalization, and cluster analysis on these data. The metagenomic data are described in Table 1, and the results of cluster analysis using the proposed method are given in Table 2:
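The dataset size follows directly: 20 species × 300 fragments = 6000 fragments of 200 nt. The sketch below shows one way such a dataset could be simulated and its feature vectors normalized. Uniform random sampling of fragment start positions and per-feature z-score normalization are assumptions, since this excerpt does not say how the fragments were drawn or which normalization the invention uses; the function names are hypothetical.

```python
import random
from statistics import mean, pstdev

def simulate_fragments(genomes: dict[str, str], n_per_species: int = 300,
                       length: int = 200, seed: int = 0) -> list[tuple[str, str]]:
    """Draw n_per_species random substrings of the given length from each genome.
    Returns (species_label, fragment) pairs; assumes every genome is >= length."""
    rng = random.Random(seed)
    fragments = []
    for species, genome in genomes.items():
        for _ in range(n_per_species):
            start = rng.randrange(len(genome) - length + 1)
            fragments.append((species, genome[start:start + length]))
    return fragments

def zscore_columns(vectors: list[list[float]]) -> list[list[float]]:
    """Standardize each feature (column) to zero mean and unit variance."""
    columns = list(zip(*vectors))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # constant features: avoid division by zero
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in vectors]
```

Normalizing each feature separately keeps k-mers with large raw frequencies from dominating the distance computations in the clustering step.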

[0101] Table 1: Metagenomic data used in this embodiment

[0102]

[0103] Table 2: Results of Cluster Analysis

[0104]

[0105] It can be seen from Table 2 that the cluster analysis of the 20 species in this examp...



Abstract

The invention relates to a metagenome fragment clustering method based on fuzzy k-means and belongs to the technical field of bioinformatics analysis. Without assembling the metagenome fragments, the method clusters them using their own intrinsic features and thereby obtains the number of species present and the abundance ratio of each species. The method comprises the following steps: obtaining the metagenome fragments, constructing feature vectors, clustering with the fuzzy k-means method, and calculating the number of species and the species abundance ratios from the clustering results. The method is direct and convenient.
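The clustering and abundance steps in the abstract can be sketched with standard fuzzy c-means (Bezdek), a common formulation of fuzzy k-means in which every fragment has a graded membership in every cluster. This sketch assumes a fuzzifier m = 2, a cluster count c fixed in advance, and hardening by maximum membership. It also assumes the species count is the number of non-empty clusters and each species' abundance ratio is its share of fragments. The excerpt does not give the patent's update rules, initialization, or how the species count is determined, so the functions and parameters here are hypothetical.

```python
import random
from math import dist

def fuzzy_kmeans(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means: returns (centers, memberships), where
    memberships[i][j] is the degree to which point i belongs to cluster j."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        u.append([x / sum(row) for x in row])
    centers = []
    for _ in range(max_iter):
        # Centers are means of all points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / total
                            for k in range(d)])
        # Memberships: u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            ds = [dist(p, ctr) for ctr in centers]
            if 0.0 in ds:  # point sits exactly on a center: crisp membership
                row = [1.0 if x == 0.0 else 0.0 for x in ds]
                row = [x / sum(row) for x in row]
            else:
                row = [1.0 / sum((ds[j] / ds[l]) ** (2 / (m - 1)) for l in range(c))
                       for j in range(c)]
            new_u.append(row)
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:  # stop once memberships no longer change meaningfully
            break
    return centers, u

def species_abundance(memberships):
    """Assign each fragment to its highest-membership cluster and return
    {cluster index: fraction of fragments}; len() of the result is the species count."""
    n = len(memberships)
    counts = {}
    for row in memberships:
        label = max(range(len(row)), key=row.__getitem__)
        counts[label] = counts.get(label, 0) + 1
    return {label: cnt / n for label, cnt in sorted(counts.items())}
```

For example, `species_abundance(fuzzy_kmeans(vectors, c=20)[1])` would give, under these assumptions, per-cluster abundance ratios for the 20-species embodiment, where `vectors` is a hypothetical list of normalized k-mer feature vectors.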

Description

Technical field
[0001] The invention belongs to the technical field of bioinformatics analysis.
Background
[0002] Traditional gene sequencing technology requires microorganisms to be cultured in a laboratory before they can be sequenced, and only a single species can be sequenced at a time. However, only a very small fraction of the Earth's microorganisms can be cultivated in the laboratory, so the genetic data obtained by traditional sequencing is highly incomplete and cannot describe the microbial world as it really is. Moreover, almost no microbial community on Earth contains only a single species, and the interactions between species are very complex. Sequencing the genes of a single species while ignoring the species it interacts with is therefore clearly inadequate.
[0003] With the development of gene sequencing technology, it is possible to obtain DNA sequences...


Application Information

IPC(8): G06F19/20
Inventors: 刘富, 刘云, 侯涛, 张潇, 王珂, 康冰, 薛建
Owner: JILIN UNIV