Method for analyzing data of prokaryotic proteogenomics rapidly and automatically

An automatic proteogenomic analysis technology, applied in the field of proteogenomic data analysis. It addresses problems such as obstacles to the development of proteogenomics, complicated configuration, and limited application scope, and achieves improved identification coverage and good compatibility.

Active Publication Date: 2016-09-21

湖北普罗金科技有限公司

Problems solved by technology

Although current research on proteomics is developing rapidly, problems remain in the following areas:
a. Database construction: integrating multi-omics databases yields protein sequence databases with wider coverage, but it also makes the database too large for mass spectrometry identification search engines to cope with.
b. Quality control: most research data have quality control problems, such as obtaining the identified protein set directly from a global FDR at the PSM level alone, with no precise FDR control for newly identified peptides.
c. Tooling: tools for integrating and quality-controlling multiple data sets are severely lacking, and incremental genome annotation cannot be achieved, which largely hinders the development of proteomics.
d. Data handling: mass spectrometry data also make data sharing and transmission very difficult and inconvenient, which further limits the adoption of proteomics.
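Point b can be illustrated with a small target-decoy calculation. The sketch below is not the patented procedure; it only shows, under stated assumptions, why a single global FDR can admit a weak novel peptide that a separate (class-specific) FDR would reject. The `PSM` record, the function names, and the convention that each decoy hit is labeled with the class (known or novel) of the database section it came from are all hypothetical.

```python
from collections import namedtuple

# Hypothetical PSM record: is_novel marks hits (targets and decoys alike)
# that come from the novel part of the search database.
PSM = namedtuple("PSM", ["peptide", "score", "is_decoy", "is_novel"])

def filter_by_fdr(psms, threshold=0.01):
    """Return target PSMs in the longest score-ranked prefix whose
    estimated FDR (decoys / targets) does not exceed the threshold."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = cutoff = 0
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= threshold:
            cutoff = i  # remember the deepest position that still passes
    return [p for p in ranked[:cutoff] if not p.is_decoy]

def class_specific_fdr(psms, threshold=0.01):
    """Apply the FDR filter separately to known and novel PSMs."""
    known = [p for p in psms if not p.is_novel]
    novel = [p for p in psms if p.is_novel]
    return filter_by_fdr(known, threshold), filter_by_fdr(novel, threshold)

# Toy data; the threshold is exaggerated so the effect shows on seven PSMs.
demo = [
    PSM("K1", 10.0, False, False),
    PSM("D1", 9.5, True, True),    # decoy hit from the novel section
    PSM("N1", 9.2, False, True),   # novel peptide candidate
    PSM("K2", 9.0, False, False),
    PSM("K3", 8.0, False, False),
    PSM("D2", 7.0, True, False),   # decoy hit from the known section
    PSM("K4", 6.0, False, False),
]
global_hits = filter_by_fdr(demo, threshold=0.5)
known_hits, novel_hits = class_specific_fdr(demo, threshold=0.5)
```

In this toy data the novel candidate `N1` survives the global filter because the many confident known peptides dilute the decoy count, but it fails once the novel class is evaluated on its own.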

[0004] At present, the main software packages for proteogenomic data analysis include Peppy, PPLine, PGTools, and Genosuite. However, the methods built into these tools are quite restrictive: they apply only to data generated by specific high-resolution mass spectrometers and to a few common databases. Their configuration is also relatively complicated and requires users to have a deep background in proteomics research, so their scope of application is greatly limited, and automatic, rapid data analysis has not been achieved. In addition, these tools cannot cope with the mass spectrometry data now being collected or with the expansion of the search space caused by large databases, which limits proteomics research.

Examples

Embodiment 1

[0048] The mass spectrometry data in Example 1 and Example 2 are taken, respectively, from the published articles [Muller, S.A., Findeiss, S., Pernitzsch, S.R., Wissenbach, D.K., Stadler, P.F., Hofacker, I.L., von Bergen, M., and Kalkhof, S., "Identification of new protein coding sequences and signal peptidase cleavage sites of Helicobacter pylori strain 26695 by proteogenomics", Journal of Proteomics, 2013, 86, 27-42] and [Albrethsen, J., Agner, J., Piersma, S.R., Hojrup, P., Pham, T.V., Weldingh, K., Jimenez, C.R., Andersen, P., and Rosenkrands, I., "Proteomic Profiling of Mycobacterium tuberculosis Identifies Nutrient-starvation-responsive Toxin-antitoxin Systems", Molecular & Cellular Proteomics, 2013, 12, 1180-1191].

[0049] Example 1: Large-scale identification of new coding genes and post-translational modifications of Helicobacter pylori. The steps are as follows:

[0050] 1) Download the complete Helicobacter pylori genome sequence, transcriptome sequence, GFF format file, GBK format f...

Embodiment 2

[0065] Label-free quantitative analysis of new coding genes and proteins of Mycobacterium tuberculosis. The steps are as follows:

[0066] 1) Using the same method as in Embodiment 1, provide the whole genome sequence of Mycobacterium tuberculosis, the transcriptome sequence, the GFF format file, the GBK format file, and the protein database sequences of the proteome. The present invention uses a Perl program to translate these sequences by six-frame and three-frame translation to obtain the protein database file; ProteoWizard is then used to convert the raw data into standard MGF format files; finally, the search parameters of the search engines are configured uniformly.

[0067] 2) Search engines based on five different algorithms automatically search the database and identify new genes and genes with altered structure. As shown in Table 3, the method of the present invention identified 10 new genes and 9 genes with N-terminal extensions, including 559 new unique peptides; t...
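The six-frame and three-frame translation mentioned in step 1) can be sketched as follows. The patent states that a Perl program is used; this Python version is only an illustrative stand-in, and the function names are hypothetical. It assumes the standard genetic code (whose codon-to-amino-acid assignments are, to the best of our knowledge, the same as those of the bacterial translation table 11), six frames for genomic DNA, and the three forward frames for transcript sequences.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def translate_frame(seq, offset):
    """Translate seq from the given 0-based offset; '*' marks a stop codon
    and 'X' any codon containing a non-ACGT character."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def three_frame(seq):
    """Forward frames +1..+3, e.g. for transcript sequences."""
    seq = seq.upper()
    return {frame: translate_frame(seq, frame - 1) for frame in (1, 2, 3)}

def six_frame(seq):
    """Frames +1..+3 on the given strand and -1..-3 on the reverse strand."""
    frames = three_frame(seq)
    rc = reverse_complement(seq.upper())
    frames.update({-frame: translate_frame(rc, frame - 1) for frame in (1, 2, 3)})
    return frames

frames = six_frame("ATGGCCTAA")
```

In a real pipeline each frame would then be split at stop codons and written out as FASTA entries for the search engines; that step is omitted here.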

Abstract

The invention provides a method for rapidly and automatically analyzing prokaryotic proteogenomic data. Users only need to provide mass spectrometry data and the corresponding database files and set a few simple search parameters to complete a proteogenomic database search. User-defined search results are also supported, which increases the identification coverage of proteogenomic data. The method integrates database search engines based on different algorithms in advance, compensating for the shortcomings of any single search method; because it also accepts user-defined search results, it has good compatibility and maximizes peptide identification coverage. The method automatically completes the functional annotation of new genes and, for the first time, achieves large-scale identification of protein post-translational modifications and label-free quantitative proteomic analysis, thereby truly achieving automatic, rapid, in-depth analysis of proteogenomic data.
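The coverage argument in the abstract can be made concrete with a small sketch. The excerpt does not say how the results of the different engines are combined, so the union-based merge below is an assumption, and the engine names, peptide strings, and function names are purely illustrative.

```python
from collections import defaultdict

def merge_engine_results(results_by_engine):
    """Map each identified peptide to the set of engines that reported it.
    Assumes each engine's list has already passed its own quality filter."""
    support = defaultdict(set)
    for engine, peptides in results_by_engine.items():
        for peptide in peptides:
            support[peptide].add(engine)
    return dict(support)

def coverage_summary(results_by_engine):
    """Return (peptides in the merged set, peptides from the best single engine)."""
    merged = merge_engine_results(results_by_engine)
    best_single = max((len(set(p)) for p in results_by_engine.values()), default=0)
    return len(merged), best_single

# Hypothetical engines and peptides; a user-supplied result is just one more key.
results = {
    "engine_a": ["PEPTIDEA", "PEPTIDEB"],
    "engine_b": ["PEPTIDEB", "PEPTIDEC"],
    "user_defined": ["PEPTIDED"],
}
merged = merge_engine_results(results)
total, best = coverage_summary(results)
```

Keeping the supporting engine set per peptide, rather than a bare union, leaves room for a stricter rule later, such as requiring agreement between at least two engines.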

Description

Technical field

[0001] The invention relates to a method for analyzing proteogenomic data, in particular to a method for automatically and rapidly analyzing prokaryotic proteogenomic data.

Background

[0002] With the rapid development of high-throughput DNA sequencing technology, the genomes of more and more species have been sequenced. The purpose of genome sequencing is to gain a better understanding of the composition and function of the genes involved in biological functions. The basic task of genome annotation is therefore to determine the location and structure of genes and other elements, and to determine the specific biological functions of these genes and elements. At present, genome annotation relies mainly on DNA and RNA sequence information; compared with genome or transcriptome annotation, proteomics can directly study the protein products translated from coding genes, so proteomics is more important than genome or transcriptome annotati...

Application Information

IPC(8): G06F19/18

CPC: G16B20/00

Inventor 杨明坤, 张珈, 洪斌, 葛峰

Owner 湖北普罗金科技有限公司
