Rapid analysis method based on OrthoMCL clustering result

A rapid analysis technique for genome analysis, applied in the fields of comparative genomics and bioinformatics, that offers high versatility, high added value, and convenient operation.

Active Publication Date: 2020-02-18
ANHUI MEDICAL UNIV

AI Technical Summary

Problems solved by technology

[0006] To solve the above problems, the present invention proposes a rapid analysis method based on OrthoMCL clustering results. It addresses the lack, in the prior art, of a method dedicated to analyzing and counting protein homologous clustering results in pan-genome analysis, quickly classifying the corresponding proteins, and outputting the corresponding representative protein sequences.



Examples


Embodiment 1

[0048] This embodiment provides a rapid analysis method for homologous protein clusters, based on OrthoMCL clustering results, and applies it to the proteins of 9 Trametes species. It includes the following steps:

[0049] Step S1, download the sequencing data files (including nucleic acid and protein sequences) of the 9 Trametes species from the website of the National Center for Biotechnology Information (NCBI), and use the OrthoMCL clustering software to perform homologous clustering on the protein sequences of the 9 species, obtaining the OrthoMCL clustering results. The OrthoMCL clustering software can be downloaded from https://orthomcl.org/orthomcl/. Its homologous clustering of proteins is existing technology and is not described further here;

[0050] Step S2, set the number of species used in the pan-genome analysis to N=9, count the number of species N1 contained in the cluster of each correspo...
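The counting in Step S2 can be illustrated with a minimal sketch. It assumes the clustering result is a text file with one cluster per line in the form `cluster1: speciesA|prot1 speciesB|prot2 ...`. The function names, the `other` label, and the thresholds (core when N1 = N, single-copy core when additionally every species contributes exactly one protein, species-specific when N1 = 1) are assumptions taken from common pan-genome usage, since the criteria in the source text are truncated.

```python
def parse_groups(lines):
    """Parse cluster lines of the assumed form 'cluster1: sp1|p1 sp2|p2 ...'."""
    clusters = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, members = line.split(":", 1)
        # Each member becomes a (species, protein_id) pair.
        clusters[name.strip()] = [tuple(m.split("|", 1)) for m in members.split()]
    return clusters


def classify(members, n_species):
    """Return category labels for one cluster.

    N1 is the number of distinct species in the cluster and N is n_species.
    The thresholds below are assumptions, not the patent's stated criteria.
    """
    n1 = len({species for species, _ in members})
    if n1 == n_species:
        labels = ["core"]
        if len(members) == n_species:  # exactly one protein per species
            labels.append("single_copy_core")
        return labels
    if n1 == 1:
        return ["species_specific"]
    return ["other"]  # hypothetical label for everything in between


if __name__ == "__main__":
    example = [
        "cluster1: A|a1 B|b1 C|c1",
        "cluster2: A|a2 A|a3 B|b2 C|c2",
        "cluster3: B|b3 B|b4",
    ]
    for name, members in parse_groups(example).items():
        print(name, classify(members, n_species=3))
```

For Embodiment 1 the call would use `n_species=9`.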

Embodiment 2

[0062] This embodiment differs from Embodiment 1 only in the species used for the pan-genome analysis. The files processed in this embodiment are the OrthoMCL protein clustering results of 9 strains of Veillonella atypica. The results are shown in Figures 4 and 5: Figure 4 shows the number of specific proteins in each of the 9 Veillonella atypica strains, and Figure 5 shows the protein sequences in each strain that correspond to homologous protein cluster60. After the OrthoMCL protein clustering results of the 9 Veillonella atypica strains are processed with this method, the resulting files can be used for subsequent pan-genome analysis of Veillonella atypica.
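A per-strain count of specific proteins, of the kind this embodiment reports, might be computed as in the sketch below. It assumes cluster lines of the form `cluster1: strainA|prot1 strainB|prot2 ...` and treats a cluster as strain-specific when all of its members come from a single strain. The function name and that criterion are assumptions. Proteins that do not appear in any cluster line are not counted.

```python
from collections import Counter


def specific_protein_counts(group_lines):
    """Count, per strain, the proteins in clusters that contain only that strain."""
    counts = Counter()
    for line in group_lines:
        if ":" not in line:
            continue  # skip blank or malformed lines
        _, members = line.split(":", 1)
        strains = [m.split("|", 1)[0] for m in members.split()]
        if strains and len(set(strains)) == 1:
            counts[strains[0]] += len(strains)
    return counts
```

Calling `specific_protein_counts(open(path))` on a cluster file would return a `Counter` keyed by strain label, where `path` is a placeholder for the result file.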

Embodiment 3

[0064] This embodiment differs from Embodiment 1 only in the species used for the pan-genome analysis. The files processed in this embodiment are the OrthoMCL protein clustering results of 66 strains of Porphyromonas gingivalis. The results are shown in Figures 6 and 7: Figure 6 shows part of the core protein statistics for Porphyromonas gingivalis, and Figure 7 shows the protein sequences in each strain that correspond to homologous protein cluster459. After the OrthoMCL protein clustering results of the 66 Porphyromonas gingivalis strains are processed with this method, the resulting files can be used for subsequent pan-genome analysis of Porphyromonas gingivalis.
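Collecting the sequences of one homologous cluster across strains, as reported for cluster60 and cluster459, could look like the following sketch. It assumes the protein FASTA headers begin with the same `strain|protein` identifiers used in the cluster file. The function names and the choice to sort by strain and then by protein identifier are illustrative assumptions.

```python
def read_fasta(text):
    """Parse FASTA text into {record_id: sequence}; the ID is the first word after '>'."""
    chunks, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {rid: "".join(parts) for rid, parts in chunks.items()}


def cluster_fasta(cluster_line, seqs):
    """Return FASTA text for one cluster line, ordered by strain and then protein ID."""
    _, members = cluster_line.split(":", 1)
    ordered = sorted(members.split(), key=lambda m: tuple(m.split("|", 1)))
    records = []
    for member in ordered:
        if member in seqs:  # members without a sequence are skipped
            records.append(f">{member}\n{seqs[member]}")
    return "\n".join(records)
```

Keeping the parser and the extractor separate lets the same sequence dictionary be reused for every cluster of interest.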


Abstract

The invention discloses a rapid analysis method based on OrthoMCL clustering results, and belongs to the fields of comparative genomics and bioinformatics. Starting from an OrthoMCL clustering result, the method automatically identifies the protein categories used in pan-genome analysis, including all representative proteins, core proteins, single-copy core proteins, and species-specific proteins. Based on this classification, it counts the number of proteins of each category present in each species and outputs the results by category. The method outputs representative sequences for the proteins in each category, as well as representative sequences of each protein category within each species. In addition, it outputs the protein homologous clustering results together with the sequences corresponding to each homologous protein, laying a foundation for higher-level, customized analysis in pan-genome studies.

Description

Technical field

[0001] The invention relates to the fields of comparative genomics and bioinformatics, in particular to a rapid analysis method based on OrthoMCL clustering results.

Background

[0002] Comparative genomics analyzes the genome data of different species from an evolutionary perspective and compares known genes and genome structures, in order to study gene function and the genetic mechanisms linking genes to diseases and phenotypes (C. Setubal et al., 2017; Shilei Zhao et al., 2019). With the rapid development of sequencing technology, especially the development and innovation of second-generation and third-generation sequencing, the genomes of many species have been sequenced, and more and more species have population genome data from multiple samples at the species level. How to quickly and efficiently compare and analyze these genome sequencing data is a major research field in the developm...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B25/00, G16B50/00
CPC: G16B25/00, G16B50/00
Inventor: 韩毛振, 张雁, 曹杰, 汪栋, 罗学才
Owner: ANHUI MEDICAL UNIV