Method for analyzing data of prokaryotic proteogenomics rapidly and automatically

An automated proteogenomic analysis technique, applied in the field of proteogenomic data analysis. It addresses problems such as complicated configuration and limited scope of application, which have hindered the development of proteogenomics, and achieves improved identification coverage and good compatibility.

Active Publication Date: 2016-09-21
湖北普罗金科技有限公司

AI Technical Summary

Problems solved by technology

Although proteomics research is developing rapidly, problems remain in the following areas: a. In database construction, integrating multi-omics databases yields protein sequence databases with wider coverage, but it also makes the database so large that mass spectrometry search engines cannot cope; b. Most studies have quality control problems, for example obtaining the identified protein set directly from a global FDR at the PSM level, with no precise FDR control for newly identified peptides; c. Tools for integrating multiple data sets and for quality control are scarce, and incremental genome annotation cannot be carried out, which greatly hinders the development of proteomics; d. Mass spectrometry data also makes data sharing and transmission...
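Point b can be illustrated with a small sketch. The source does not say how the invention controls FDR for new peptides, so the following assumes a common target-decoy approach in which known and novel peptides are filtered separately. The function names, the tuple layout, and the class labels "known" and "novel" are hypothetical.

```python
def score_cutoff(psms, max_fdr=0.01):
    """Return the lowest target score at which decoys/targets <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; a higher score means a
    better match. Returns None if no cutoff satisfies the threshold.
    """
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
            continue
        targets += 1
        # Estimated FDR among all PSMs scoring at least this well.
        if decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def class_specific_cutoffs(psms, max_fdr=0.01):
    """Compute one cutoff per peptide class (assumed labels, e.g. 'known', 'novel').

    psms: iterable of (score, is_decoy, peptide_class) triples.
    """
    by_class = {}
    for score, is_decoy, peptide_class in psms:
        by_class.setdefault(peptide_class, []).append((score, is_decoy))
    return {cls: score_cutoff(items, max_fdr) for cls, items in by_class.items()}


# Toy data: pooling the classes would accept a weaker novel PSM than
# filtering the novel class on its own.
example = [
    (10, False, "known"), (9, False, "known"), (8, False, "known"), (3, True, "known"),
    (7, False, "novel"), (6, True, "novel"), (5, False, "novel"),
]
pooled = score_cutoff([(s, d) for s, d, _ in example], max_fdr=0.25)
separate = class_specific_cutoffs(example, max_fdr=0.25)
```

In this toy data the pooled cutoff is lower than the cutoff computed for the novel class alone. This is the kind of effect that motivates separate FDR control for newly identified peptides.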

Examples

Embodiment 1

[0048] The mass spectrometry data used in Example 1 and Example 2 are taken from two published articles, respectively: [Muller, S.A., Findeiss, S., Pernitzsch, S.R., Wissenbach, D.K., Stadler, P.F., Hofacker, I.L., von Bergen, M., and Kalkhof, S., "Identification of new protein coding sequences and signal peptidase cleavage sites of Helicobacter pylori strain 26695 by proteogenomics", Journal of Proteomics, 2013, 86, 27-42] and [Albrethsen, J., Agner, J., Piersma, S.R., Hojrup, P., Pham, T.V., Weldingh, K., Jimenez, C.R., Andersen, P., and Rosenkrands, I., "Proteomic Profiling of Mycobacterium tuberculosis Identifies Nutrient-starvation-responsive Toxin-antitoxin Systems", Molecular & Cellular Proteomics, 2013, 12, 1180-1191].

[0049] Example 1: Large-scale identification of new coding genes and post-translational modifications of Helicobacter pylori. The steps are as follows:

[0050] 1) Download the Helicobacter pylori complete genome sequence, transcriptome sequence, GFF format file, GBK format f...

Embodiment 2

[0065] Label-free quantitative analysis of new coding genes and proteins of Mycobacterium tuberculosis. The steps are as follows:

[0066] 1) Using the same method as in Embodiment 1, provide the whole genome sequence, transcriptome sequence, GFF format file, GBK format file, and proteome protein database sequences of Mycobacterium tuberculosis. The invention uses a Perl program to translate the sequences by six-reading-frame and three-reading-frame translation to obtain the protein database file; ProteoWizard is then used to convert the raw data into standard MGF format files; finally, the search parameters of the search engines are configured uniformly.

[0067] 2) Search engines based on five different algorithms automatically search the database and identify new genes and genes with altered structure. As shown in Table 3, the method of the present invention identified 10 new genes and 9 genes with N-terminal extensions, including 559 new unique peptides; t...
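The reading-frame translation in step 1) can be sketched as follows. The patent describes a Perl program whose code is not shown here, so this Python version is only an illustration. It assumes an uppercase or lowercase DNA input, the standard codon assignments, `*` for stop codons, and `X` for codons containing ambiguous bases. All function names are hypothetical.

```python
from itertools import product

# Standard codon assignments, listed in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def frame_translations(seq, frames=6):
    """Translate a DNA sequence in three (forward only) or six frames.

    Returns a dict keyed by frame label, e.g. '+1' ... '-3'.
    """
    seq = seq.upper()
    strands = [("+", seq)]
    if frames == 6:
        strands.append(("-", reverse_complement(seq)))
    return {
        f"{sign}{offset + 1}": translate(strand[offset:])
        for sign, strand in strands
        for offset in range(3)
    }


six = frame_translations("ATGGCCTAA")
three = frame_translations("ATGGCCTAA", frames=3)
```

A real pipeline would additionally split each translation at stop codons and write the resulting fragments to a FASTA file for the search engines. That part is omitted because the source does not describe it.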

Abstract

The invention provides a method for rapidly and automatically analyzing prokaryotic proteogenomic data. Users only need to provide mass spectrometry data and the corresponding database files and set simple search parameters to complete a proteogenomic database search. The method integrates search engines based on different algorithms in advance, which compensates for the shortcomings of any single search method. It is also compatible with user-defined search results, so it has good compatibility and maximizes peptide identification coverage. The method automatically completes the functional annotation of new genes, realizes for the first time large-scale identification of protein post-translational modifications and label-free quantitative proteomic analysis, and thereby achieves automatic, rapid, in-depth analysis of proteogenomic data.
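As a rough illustration of why combining several search engines raises peptide coverage, the sketch below merges per-engine peptide lists and records which engines support each peptide. The abstract does not describe how the invention merges results, so the simple union used here is an assumption. The engine names and peptide sequences are placeholders.

```python
from collections import defaultdict


def merge_identifications(results):
    """Merge per-engine peptide identifications.

    results: dict mapping an engine name to an iterable of peptide
    sequences that already passed that engine's own quality filter.
    Returns a dict mapping each peptide to the set of supporting engines.
    """
    support = defaultdict(set)
    for engine, peptides in results.items():
        for peptide in peptides:
            support[peptide].add(engine)
    return dict(support)


# Placeholder input: two built-in engines plus a user-defined result set.
merged = merge_identifications({
    "engine_a": ["PEPTIDEK", "AAAR"],
    "engine_b": ["PEPTIDEK"],
    "user_defined": ["LLLK"],
})
```

The union contains every peptide found by at least one source, and the per-peptide engine sets make it possible to require agreement between engines later if a stricter filter is wanted.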

Description

Technical field

[0001] The invention relates to a method for analyzing proteogenomic data, in particular to a method for automatically and rapidly analyzing prokaryotic proteogenomic data.

Background

[0002] With the rapid development of high-throughput DNA sequencing technology, the genomes of more and more species have been sequenced. The purpose of genome sequencing is to better understand the composition and function of the genes involved in biological functions. The basic task of genome annotation is therefore to determine the location and structure of genes and other elements, and to determine their specific biological functions. At present, genome annotation relies mainly on DNA and RNA sequence information; compared with genome or transcriptome annotation, proteomics can directly study the protein products translated from coding genes, so proteomics is more important than genome or transcriptome annotati...

Application Information

IPC(8): G06F19/18
CPC: G16B20/00
Inventor: 杨明坤, 张珈, 洪斌, 葛峰
Owner: 湖北普罗金科技有限公司