Protein sequence database parallel search and identification method and device

Protein sequence identification technology, applied in the field of bioinformatics

Status: Inactive | Publication Date: 2019-09-24
HUNAN UNIV
AI Technical Summary

Problems solved by technology

In recent years, with the rapid development and popularization of mass spectrometry technology, the amount of bioinformatics data, especially mass spectrometry data, has grown explosively. At the same time, protein databases have grown exponentially in scale. This volume of data poses an increasingly serious challenge to database search methods.

Embodiment Construction

[0050] In order to make the purpose, technical solution and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, and are not intended to limit the present application.

[0051] In one embodiment, as shown in Figure 1, a method for parallel search and identification of protein sequence databases is provided, comprising the following steps:

[0052] S100: Preprocessing the experimental mass spectrometry data set and protein sequence database to be identified.

[0053] The experimental mass spectrum data and protein sequence database to be identified are uploaded to the parallel search identification server, and the parallel search identification server starts a parallel processing process to preprocess the experimental mass spectrum dat...
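As a rough illustration of preprocessing spectra in parallel, the sketch below maps a per-spectrum function over a worker pool. The source text is cut off before it says what the preprocessing actually does, so the per-spectrum step here (keeping only the most intense peaks) is purely an assumption, as are the names `keep_top_peaks` and `preprocess_all`. A thread pool is used only to keep the example self-contained; it is not claimed to be the patent's parallel mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def keep_top_peaks(spectrum, n=3):
    """Hypothetical preprocessing step (an assumption, not from the patent):
    keep the n most intense (m/z, intensity) peaks, returned sorted by m/z."""
    strongest = sorted(spectrum, key=lambda peak: peak[1], reverse=True)[:n]
    return sorted(strongest)


def preprocess_all(spectra, workers=4):
    """Apply keep_top_peaks to every spectrum using a pool of workers.
    pool.map preserves the input order of the spectra."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(keep_top_peaks, spectra))


spectra = [
    [(100.0, 5.0), (200.0, 1.0), (300.0, 9.0), (400.0, 7.0)],
    [(50.0, 2.0)],
]
cleaned = preprocess_all(spectra)
```

Because each spectrum is processed independently, the same mapping pattern would carry over to a process pool or a cluster framework without changing the per-spectrum function.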

Abstract

The invention relates to a method and device for parallel search and identification of a protein sequence database, together with computer equipment and a storage medium. The method comprises: preprocessing the experimental mass spectrum data set and the protein sequence database to be identified; performing simulated enzyme digestion on each protein sequence in the protein sequence database to obtain a theoretical mass spectrum data set; extracting the mass spectrum peaks of each experimental mass spectrum in the experimental mass spectrum data set to generate a search query set; searching the theoretical mass spectrum data set in the protein sequence library according to the search query set to obtain a candidate peptide sequence set; generating and scoring candidate peptides from each experimental mass spectrum and its corresponding candidate peptide sequence set; and integrating the scoring results and performing peptide-to-protein inference to obtain the protein identification results. Adopting a parallel processing mode greatly improves the efficiency of protein sequence database search and identification.
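The digestion and candidate-search steps of the abstract can be illustrated with a minimal serial sketch. Everything specific here is an assumption rather than the patent's method: the enzyme rule is the common trypsin rule (cleave after K or R unless the next residue is P), candidates are selected by precursor mass within an absolute tolerance, and the function names and the `tol` parameter are hypothetical. Scoring and peptide-to-protein inference are not shown.

```python
from bisect import bisect_left, bisect_right

# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the residue sum of a free peptide


def digest(protein):
    """Simulated tryptic digestion with no missed cleavages (assumed rule):
    cut after K or R unless the following residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # trailing peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def build_index(proteins):
    """Digest every protein and sort the peptides by mass for range lookup."""
    entries = sorted(
        (peptide_mass(pep), pep, pid)
        for pid, seq in proteins.items()
        for pep in digest(seq)
    )
    masses = [mass for mass, _, _ in entries]
    return masses, entries


def candidates(index, precursor_mass, tol=0.01):
    """Return (peptide, protein_id) pairs whose mass is within tol daltons."""
    masses, entries = index
    lo = bisect_left(masses, precursor_mass - tol)
    hi = bisect_right(masses, precursor_mass + tol)
    return [(pep, pid) for _, pep, pid in entries[lo:hi]]


index = build_index({"P1": "AAKGGRPLLKMM"})
hits = candidates(index, peptide_mass("AAK"))
```

Sorting the theoretical peptides by mass once lets each query be answered with two binary searches. Since queries do not depend on one another, a batch of them can be distributed across workers.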

Description

Technical Field

[0001] The present application relates to the technical field of biological information, in particular to a protein sequence database parallel search and identification method, device, computer equipment and storage medium.

Background

[0002] With the basic completion of the Human Genome Project, life science research has entered the post-genome era, in which proteomics research has gradually become the core content of life science research. In proteomics, protein sequence identification technology based on tandem mass spectrometry has become one of the mainstream large-scale protein identification methods. There are three main computational methods for protein identification based on tandem mass spectrometry: database search methods, de novo sequencing methods, and peptide sequence tagging methods. Among them, the database search method is the most commonly used. The basic idea is to match the tandem mass spectrogram obtained in the experiment wi...

Application Information

IPC(8): G16B20/00, G16B50/30, G01N33/68
CPC: G01N33/6818, G01N33/6848, G16B20/00, G16B50/30
Inventor: 李肯立, 李闯, 唐卓, 陈建国, 刘勇刚, 李克勤, 廖湘科
Owner: HUNAN UNIV