Large-scale distributed parallel acceleration method and system for protein identification

A protein identification acceleration method and system, applied in the field of distributed parallel acceleration, that addresses problems such as poor acceleration efficiency.

Active Publication Date: 2012-04-11
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to provide a large-scale distributed parallel acceleration method and system for protein identification that solves the poor acceleration efficiency of the prior art when parallelism reaches hundreds or even more than a thousand processor cores.

Detailed Description of the Embodiments

[0077] The present invention is described in detail below with reference to the accompanying drawings and specific embodiments, which are not intended to limit the invention.

[0078] Figure 1 is a flow chart of the large-scale distributed parallel acceleration method for protein identification of the present invention. The method performs large-scale distributed parallel acceleration of protein identification through the following steps:

[0079] Step 101: set the necessary search parameters;

[0080] Step 102: input the protein sequences and use multiple processes in the cluster to theoretically digest them; sort the resulting peptide sequences by theoretical precursor ion mass, remove redundant peptides, and create the Peptide Index file blocks; then generate the Peptide Index metadata file from those file blocks;

[0081] Step 103, next analyze th...
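Step 102 above can be illustrated with a minimal, single-process sketch. It is not the patented implementation. The cleavage rule (trypsin-like: cut after K or R unless followed by P, no missed cleavages), the function names, the block size, and the metadata fields are all assumptions made for illustration; the source does not specify the enzyme or the index file layout. In the method described, this work is spread across multiple processes in the cluster.

```python
import re

# Monoisotopic residue masses in daltons for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the residue sum of a peptide


def digest(protein):
    """Theoretical digestion (assumed trypsin-like rule, no missed cleavages)."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if p]  # drop the empty piece after a C-terminal K/R


def peptide_mass(peptide):
    """Theoretical neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def build_index(proteins):
    """Digest all proteins, remove duplicate peptides, sort by mass."""
    unique = {pep for prot in proteins for pep in digest(prot)}
    return sorted((peptide_mass(p), p) for p in unique)


def to_blocks(index, block_size):
    """Cut the sorted index into blocks and describe each one (hypothetical layout)."""
    blocks = [index[i:i + block_size] for i in range(0, len(index), block_size)]
    metadata = [
        {"block": i, "min_mass": b[0][0], "max_mass": b[-1][0], "count": len(b)}
        for i, b in enumerate(blocks)
    ]
    return blocks, metadata
```

Because each block covers a contiguous mass range, a metadata record of this kind would let a search process decide which blocks to load for a given precursor mass window without reading the others.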

Abstract

The invention relates to a large-scale distributed parallel acceleration method and system for protein identification. The method comprises the following steps: using parallel processing, theoretically digesting the protein sequences to obtain peptide sequences, then sorting the peptide sequences and removing redundancy to create Peptide Index file blocks; sorting the mass spectra and dividing the sorted spectra evenly into a plurality of spectrum data blocks; distributing the spectrum data blocks evenly among a plurality of master (host) processes, each of which sorts its assigned spectrum data blocks and hands them in order to idle slave processes for peptide-spectrum matching; and, again using parallel processing, summarizing the identification results, inferring the corresponding protein sequences from the identified peptide sequences, and generating the output files. The method achieves satisfactory acceleration efficiency for protein identification at scales of hundreds or even thousands of processor cores.
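The spectrum-partitioning and dispatch steps in the abstract can be sketched as follows. This is a single-process simulation under stated assumptions, not the patented implementation: the round-robin assignment of blocks to masters, the function names, and the `cost` callback that stands in for matching time are all hypothetical. A real deployment would use a message-passing layer between master and slave processes.

```python
import heapq


def split_evenly(items, n_blocks):
    """Divide a sorted list into n_blocks contiguous blocks whose sizes differ by at most 1."""
    q, r = divmod(len(items), n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        size = q + (1 if i < r else 0)
        blocks.append(items[start:start + size])
        start += size
    return blocks


def assign_round_robin(blocks, n_masters):
    """Give each master every n_masters-th block (assumed distribution policy)."""
    return [blocks[m::n_masters] for m in range(n_masters)]


def dispatch(blocks, n_workers, cost):
    """Simulate one master handing blocks, in order, to whichever worker is idle first.

    cost(block) is a hypothetical estimate of how long a worker needs for a block.
    Returns the schedule as (block_id, worker_id, start_time) and the finish time.
    """
    idle_at = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(idle_at)
    schedule = []
    for block_id, block in enumerate(blocks):
        t, w = heapq.heappop(idle_at)  # worker that becomes idle earliest
        schedule.append((block_id, w, t))
        heapq.heappush(idle_at, (t + cost(block), w))
    makespan = max(t for t, _ in idle_at)
    return schedule, makespan


# Usage: precursor masses of seven spectra, sorted and cut into three blocks.
masses = sorted([500.2, 300.1, 900.4, 700.3, 1100.5, 400.0, 800.0])
blocks = split_evenly(masses, 3)
per_master = assign_round_robin(blocks, 2)
schedule, makespan = dispatch(blocks, n_workers=2, cost=len)
```

Handing out blocks on demand rather than fixing the assignment in advance is a common way to keep workers busy when block costs vary, which is one reason a dispatch-to-idle scheme of this kind tends to scale better than a static split.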

Description

Technical field

[0001] The invention relates to a distributed parallel acceleration method for large-scale protein identification, and in particular to a method and a system that use distributed parallel technology to share search tasks effectively across multiple computing nodes and thereby increase the speed of protein identification.

Background

[0002] A "proteome" is the ensemble of proteins expressed at a given moment and under given conditions in a particular biological sample. As the name suggests, proteomics is the study of the proteome. Its most basic task is to determine which proteins are expressed in an organism, at what levels, with which post-translational modifications, and how they interact with one another, thereby obtaining, at the protein level, a holistic and comprehensive understanding of processes such as disease occurrence and cell metabolism. In current proteome research, protein identification based on tandem mass spectrometry is one of ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/00, G06F17/30
Inventors: 王乐珩, 王文平, 迟浩, 吴妍洁, 周郴, 付岩, 孙瑞祥, 贺思敏
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI