Distributed accelerating method and system for open type protein identification

A protein identification and distributed computing technology, applied in the field of bioinformatics, that addresses the combinatorial explosion of the search space, the difficulty of open identification, and the slowdown of database searches. It overcomes time and space challenges, reduces computation and storage overhead, and improves identification speed.

Active Publication Date: 2014-03-26
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

Parallel versions usually adopt the simplest spectrum-splitting strategy, in which each node identifies a portion of the spectra, so identification speed is still limited by what a single machine can do.
[0014] Clearly, as large-scale protein databases grow rapidly, taking factors such as post-translational modifications and enzyme cleavage specificity into account leads to a combinatorial explosion of the search space. Even a single machine with a high-memory configuration cannot meet the huge space requirements of open identification, which greatly reduces database search speed and also increases the number of false-positive search results.
[0015] In short, because of limits on storage space and computing power, current protein identification systems have difficulty completing open identification effectively, that is, supporting large protein databases, non-specific digestion, and arbitrary modifications.
In addition, most practical engines still run in stand-alone mode and cannot take advantage of resources such as clusters, so effective parallel systems need to be designed and developed.
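
To make the search-space growth described above concrete, the following is a minimal sketch, not taken from the patent, that counts candidate peptides for one toy sequence under specific versus non-specific digestion and counts modified forms of one peptide. The function names, the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), and the toy sequence are all illustrative assumptions.

```python
def tryptic_peptides(protein, min_len, max_len):
    """Fully specific digestion under a simplified trypsin rule (assumption):
    cleave after K or R unless the next residue is P, no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        cleave = aa in "KR" and not is_last and protein[i + 1] != "P"
        if cleave or is_last:
            pep = protein[start:i + 1]
            if min_len <= len(pep) <= max_len:
                peptides.append(pep)
            start = i + 1
    return peptides


def nonspecific_peptides(protein, min_len, max_len):
    """Non-specific digestion: every substring in the length range is a candidate."""
    return [
        protein[s:s + n]
        for n in range(min_len, max_len + 1)
        for s in range(len(protein) - n + 1)
    ]


def modified_form_count(peptide, variable_mods):
    """Number of modified forms when each residue listed in variable_mods
    (residue -> number of alternative modifications) may also stay unmodified."""
    count = 1
    for aa in peptide:
        count *= 1 + variable_mods.get(aa, 0)
    return count


protein = "AAKGGRPCCKDD"  # hypothetical toy sequence
print(len(tryptic_peptides(protein, 2, 7)))       # 3 candidates
print(len(nonspecific_peptides(protein, 2, 7)))   # 51 candidates
print(modified_form_count("GGRPCCK", {"C": 1}))   # 4 modified forms
```

Even for this 12-residue toy sequence, dropping enzyme specificity raises the candidate count from 3 to 51, and each candidate is further multiplied by its modified forms. On a full protein database the same multiplication is what outgrows the memory of a single machine.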




Embodiment Construction

[0061] Figure 1 is a flowchart of the distributed acceleration method for open protein identification of the present invention, which comprises the following steps:

[0062] Step 100: create peptide indexes in batches from the protein sequence database and save the peptide indexes in blocks. Beforehand, the necessary search parameters are set and the protein sequence database is input. The index data is persisted to disk, so that each protein sequence database incurs the indexing overhead only once;
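
The idea of Step 100 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the JSON block format, the `manifest.json` file, the mass-sorted layout, non-specific peptide enumeration, and every function name are hypothetical choices made for the example. The residue masses are standard monoisotopic values.

```python
import json
import os
import tempfile

# Standard monoisotopic residue masses (Da); WATER is added once per peptide.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    return sum(MONO[aa] for aa in peptide) + WATER


def enumerate_peptides(protein, min_len, max_len):
    # Assumption: non-specific digestion, so every substring is a candidate.
    for n in range(min_len, max_len + 1):
        for s in range(len(protein) - n + 1):
            yield protein[s:s + n]


def build_index(proteins, index_dir, batch_size=2, min_len=2, max_len=3):
    """Index proteins batch by batch, writing one mass-sorted block per batch.
    If a manifest already exists, reuse it so indexing is paid for only once.
    (A real system would also check that the database and parameters match.)"""
    manifest_path = os.path.join(index_dir, "manifest.json")
    if os.path.exists(manifest_path):
        with open(manifest_path) as f:
            return json.load(f)
    os.makedirs(index_dir, exist_ok=True)
    blocks = []
    for b, start in enumerate(range(0, len(proteins), batch_size)):
        batch = proteins[start:start + batch_size]
        entries = sorted(
            {(round(peptide_mass(p), 5), p)
             for prot in batch
             for p in enumerate_peptides(prot, min_len, max_len)}
        )
        if not entries:
            continue
        name = f"block_{b:05d}.json"
        with open(os.path.join(index_dir, name), "w") as f:
            json.dump(entries, f)
        blocks.append({"file": name,
                       "min_mass": entries[0][0],
                       "max_mass": entries[-1][0]})
    with open(manifest_path, "w") as f:
        json.dump(blocks, f)
    return blocks


index_dir = tempfile.mkdtemp()  # throwaway directory for the demonstration
blocks = build_index(["ACDK", "GGR"], index_dir, batch_size=1)
```

Calling `build_index` a second time with the same directory returns the stored manifest without recomputing anything, which mirrors the "index once per database" property described in the step.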

[0063] Step 200: input the protein spectrum data to be identified, generate a query set from the spectrum data using multiple threads, and preload the peptide index into memory. The protein sequence database contains all relevant known protein sequences, and the spectrum data is the data to be identified; the process of spectrum identification is to match the spectrum data with the corresponding par...
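
As a rough illustration of the query-generation part of Step 200, the sketch below turns each spectrum into a precursor-mass window using a thread pool and looks the window up in a mass-sorted index held in memory. The spectrum fields (`id`, `precursor_mz`, `charge`), the fixed tolerance, and the function names are assumptions made for this example; the source does not specify the query format.

```python
from bisect import bisect_left, bisect_right
from concurrent.futures import ThreadPoolExecutor

PROTON = 1.007276  # proton mass in Da


def make_query(spectrum, tol_da=0.5):
    """Convert one spectrum into a neutral-mass window (hypothetical query format)."""
    mass = (spectrum["precursor_mz"] - PROTON) * spectrum["charge"]
    return {"id": spectrum["id"], "low": mass - tol_da, "high": mass + tol_da}


def generate_queries(spectra, workers=4):
    """Build the query set in multiple threads; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(make_query, spectra))


def candidates(index_masses, index_peptides, query):
    """Return peptides whose mass lies in the query window.
    index_masses must be sorted and aligned with index_peptides."""
    lo = bisect_left(index_masses, query["low"])
    hi = bisect_right(index_masses, query["high"])
    return index_peptides[lo:hi]


spectra = [{"id": "s1", "precursor_mz": 501.007276, "charge": 2}]
queries = generate_queries(spectra)

# Toy in-memory index; the labels stand in for peptide sequences.
index_masses = [900.0, 999.8, 1000.3, 1100.0]
index_peptides = ["pepA", "pepB", "pepC", "pepD"]
hits = candidates(index_masses, index_peptides, queries[0])
```

In CPython, threads mainly help when query generation is dominated by file reading; for CPU-bound preprocessing a process pool would be the more usual choice. The thread pool is used here only to mirror the multi-threaded wording of the step.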


Abstract

The invention discloses a distributed acceleration method and system for open protein identification. The system comprises a protein index creation module, a spectrum data preparation module, a query marking module, and a result summarizing and output module. The method and system make effective use of cluster resources and allow a user to perform protein identification against a large protein database either without specifying the types of enzyme digestion and modification or with any type of enzyme digestion and modification specified, which effectively improves the speed and analysis rate of protein spectrum identification.

Description

Technical field

[0001] The invention relates to protein spectrum identification in the field of bioinformatics, in particular to the creation and retrieval of a distributed index for an open protein identification engine.

Background technique

[0002] On February 15, 2001, when "Nature" released the framework map of the human genome, it also published the news of the establishment of the Human Proteome Organization (HUPO). Scientists realized that understanding proteins, the products of genes, is crucial to truly unraveling the genetic mysteries of life. Proteomics studies the characteristics of proteins on a large scale, including protein expression levels, post-translational modifications, and protein-protein interactions. In recent years, innovations in mass spectrometry have played a key role in promoting proteomics research.

[0003] The qualitative and quantitative analysis of proteins using mass spectrometry data has become one of the core contents of ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/18
Inventor: 张文力, 迟浩, 路远征, 王乐珩, 赵晓芳, 贺思敏
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI