Large-scale distributed parallel acceleration method and system for protein identification

A large-scale protein identification technology, applied in special data processing applications, instruments, electrical digital data processing, etc.; it addresses problems such as poor acceleration efficiency.
CN102411679A · Active · Publication Date: 2012-04-11 · INST OF COMPUTING TECH CHINESE ACAD OF SCI

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
INST OF COMPUTING TECH CHINESE ACAD OF SCI
Publication Date
2012-04-11


Abstract

The invention relates to a large-scale distributed parallel acceleration method and system for protein identification. The method comprises the following steps: using parallel processing, theoretically digesting protein sequences to obtain peptide sequences, then sorting the peptide sequences and removing redundant entries to create a peptide index file block; using parallel processing, sorting the mass spectra and dividing the sorted spectra evenly into a plurality of spectrum data blocks; distributing the spectrum data blocks evenly among a plurality of host processes, each of which sorts the blocks it receives and assigns them in turn to idle slave processes for peptide-spectrum matching; and, using parallel processing, summarizing the identification results, inferring the corresponding protein sequences from the identified peptide sequences, and generating output files. The method can achieve satisfactory acceleration efficiency for protein identification even on systems with hundreds or thousands of processor cores.
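The first two steps of the abstract (theoretical digestion with sorting and redundancy removal, then even division of the sorted spectra) can be sketched as follows. This is a minimal single-process illustration, not the patented implementation: the simplified trypsin cleavage rule, the lexicographic sort key, the use of plain numbers to stand in for spectra, and the function names `digest`, `build_peptide_index`, and `split_evenly` are all assumptions made for the example. The abstract does not specify the enzyme, the sort key, or the file layout, and the parallel distribution across processes is omitted here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def digest(protein: str) -> List[str]:
    """Theoretically digest one protein sequence.

    Assumption: a simplified trypsin rule (cleave after K or R unless
    the next residue is P), with no missed cleavages.
    """
    cut_sites = [0]
    for i in range(len(protein) - 1):
        if protein[i] in "KR" and protein[i + 1] != "P":
            cut_sites.append(i + 1)
    cut_sites.append(len(protein))
    peptides = []
    for start, end in zip(cut_sites, cut_sites[1:]):
        if end > start:  # skip the empty slice produced by an empty protein
            peptides.append(protein[start:end])
    return peptides


def build_peptide_index(proteins: Sequence[str]) -> List[str]:
    """Digest all proteins, remove duplicate peptides, and sort the result.

    Assumption: lexicographic order stands in for whatever sort key the
    real peptide index uses.
    """
    unique = set()
    for protein in proteins:
        unique.update(digest(protein))
    return sorted(unique)


def split_evenly(items: Sequence[T], n_blocks: int) -> List[List[T]]:
    """Split items into n_blocks contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)
        blocks.append(list(items[start:start + size]))
        start += size
    return blocks


if __name__ == "__main__":
    index = build_peptide_index(["AKGK", "GKAK"])
    print(index)
    # Hypothetical precursor masses standing in for spectra.
    spectra = sorted([905.4, 1203.6, 811.3, 1500.8, 1022.5])
    print(split_evenly(spectra, 2))
```

In a distributed setting, each block returned by `split_evenly` would be the unit handed to a host process, which in turn passes work to whichever slave process is idle.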

Description

Technical field

[0001] The invention relates to a distributed parallel acceleration method for large-scale protein identification, and in particular to a method and a system that use distributed parallel technology to share search tasks effectively across multiple computing nodes, thereby increasing the speed of protein identification.

Background art

[0002] A "proteome" is the ensemble of proteins expressed at a given moment and under given conditions in a particular biological sample. As the name suggests, proteomics is the study of the proteome. Its most basic task is to determine which proteins are expressed in an organism, at what levels, with which post-translational modifications, and through which protein-protein interactions, thereby obtaining a holistic and comprehensive understanding, at the protein level, of processes such as disease occurrence and cell metabolism. In current proteome research, protein identification based on tandem mass spectrometry is one of ...
