A semi-supervised learning-based multispectral sorting method for cross-linked mass spectrometry

A semi-supervised learning-based sorting method, applied in bioinformatics and in the analysis of two-dimensional or three-dimensional molecular structures. It addresses the instability and large sensitivity fluctuations of existing methods, achieving stable performance and high sensitivity.

Active Publication Date: 2019-05-07
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

Of the two existing types of multispectral sorting method, the second is more sensitive and faster than the first, but it is unstable: its sensitivity fluctuates greatly across different data sets.


Embodiment Construction

[0038] Figure 1 shows a flow chart of a semi-supervised learning-based multispectral sorting method according to an embodiment of the present invention. Referring to Figure 1, the method includes the following steps:

[0039] Step 1: Perform single-spectrum matching and sorting on each spectrum to obtain the best (usually the top-scoring) cross-linked dipeptide single-spectrum matching result. In this embodiment, for all spectra obtained from the same batch of samples, the highest-scoring candidate peptide matched to each spectrum is obtained. A spectrum matched with a candidate cross-linked dipeptide forms a peptide-spectrum match. The first-place result is the peptide-spectrum match with the highest score. Each peptide-spectrum match includes information such as the spectrum identifier, peptide sequence, cross-linking site, and score. The above-mentioned matching scoring can use commonly used fine-scor...
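As a rough illustration of Step 1, the sketch below keeps only the highest-scoring candidate for each spectrum. The `PSM` record, its field names, the string encoding of the peptide pair, and the sample values are all hypothetical and chosen for illustration; the scoring function itself is not shown here and is assumed to have been applied already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """One peptide-spectrum match (field names are illustrative assumptions)."""
    spectrum_id: str     # spectrum identifier
    peptide_pair: str    # the two cross-linked peptide sequences, joined by "-"
    link_sites: tuple    # cross-linking site on each peptide (1-based)
    score: float         # single-spectrum matching score


def top_match_per_spectrum(psms):
    """Return a dict mapping each spectrum id to its highest-scoring PSM."""
    best = {}
    for psm in psms:
        current = best.get(psm.spectrum_id)
        if current is None or psm.score > current.score:
            best[psm.spectrum_id] = psm
    return best


# Made-up candidates: two for scan_1, one for scan_2.
candidates = [
    PSM("scan_1", "PEPTIDEK-KLMNR", (8, 1), 0.62),
    PSM("scan_1", "AAKGGR-KLMNR", (3, 1), 0.81),
    PSM("scan_2", "PEPTIDEK-TTKR", (8, 3), 0.40),
]
top = top_match_per_spectrum(candidates)
for spectrum_id, psm in top.items():
    print(spectrum_id, psm.peptide_pair, psm.score)
```

The resulting first-place matches, one per spectrum, are the inputs that the later multi-spectrum steps compare against each other.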

Abstract

The invention provides a semi-supervised learning-based multi-spectrum sorting method for cross-linked mass spectrometry. The method comprises the steps of: 1) performing single-spectrum matching and sorting on each spectrum to obtain the corresponding best cross-linked dipeptide single-spectrum matching results, and extracting multi-spectrum matching feature vectors from the current peptide-spectrum matching results, wherein the dynamic features include an SVM (Support Vector Machine) score, a parent ion error ratio feature, a modification ratio feature and the like; 2) from the obtained cross-linked dipeptide matching results, selecting the results that are positive samples and whose FDR (False Discovery Rate) is within a preset FDR threshold to construct a positive sample library, selecting all negative-sample results to construct a negative sample library, and updating the multi-spectrum matching feature vectors according to the new training samples; 3) training an SVM classifier; 4) re-scoring all the cross-linked dipeptide results with the trained SVM classifier; and 5) judging, according to a preset iteration condition, whether to continue iterating, and outputting a multi-spectrum sorting result based on the current SVM score when the iteration ends. The multi-spectrum sorting method has high sensitivity and stable performance.
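The loop in steps 2) to 5) can be sketched roughly as follows. This is a minimal sketch under several assumptions that the abstract does not confirm: negative samples are taken to be decoy matches, FDR is estimated as the simple ratio of decoys to targets at each rank, the classifier is a small hand-written linear SVM (batch subgradient descent on the hinge loss) standing in for whatever SVM implementation the method uses, the iteration condition is a fixed number of rounds, and the feature vectors are static rather than updated each round. All names and data values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    features: tuple   # hypothetical zero-centred feature values
    is_decoy: bool    # assumption: decoy matches serve as negative samples
    score: float      # current score (initially the single-spectrum score)


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def select_training_sets(psms, fdr_threshold):
    """Positives: targets whose q-value is within the threshold. Negatives: all decoys."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs, decoys, targets = [], 0, 0
    for p in ranked:
        decoys += p.is_decoy
        targets += not p.is_decoy
        fdrs.append(decoys / max(targets, 1))   # assumed FDR estimate
    best, positives = float("inf"), []
    # q-value of a rank = smallest FDR at that rank or any lower rank
    for p, fdr in zip(reversed(ranked), reversed(fdrs)):
        best = min(best, fdr)
        if not p.is_decoy and best <= fdr_threshold:
            positives.append(p)
    negatives = [p for p in psms if p.is_decoy]
    return positives, negatives


def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.01):
    """Batch subgradient descent on the L2-regularised hinge loss."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        # samples that currently violate the margin
        viol = [(xi, yi) for xi, yi in zip(X, y) if yi * (dot(w, xi) + b) < 1]
        grad_w = [lam * w[j] - sum(yi * xi[j] for xi, yi in viol) / n
                  for j in range(dim)]
        grad_b = -sum(yi for _, yi in viol) / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def semi_supervised_rerank(psms, fdr_threshold=0.05, n_rounds=3):
    for _ in range(n_rounds):   # hypothetical stopping rule: fixed round count
        positives, negatives = select_training_sets(psms, fdr_threshold)
        if not positives or not negatives:
            break
        X = [p.features for p in positives + negatives]
        y = [1] * len(positives) + [-1] * len(negatives)
        w, b = train_linear_svm(X, y)
        for p in psms:          # re-score every match, not only the training set
            p.score = dot(w, p.features) + b
    return sorted(psms, key=lambda p: p.score, reverse=True)


# Made-up data: the fifth match is a target whose features look decoy-like.
psms = [
    PSM((0.9, 0.8), False, 5.0),
    PSM((0.8, 0.9), False, 4.5),
    PSM((0.7, 0.85), False, 4.0),
    PSM((-0.9, -0.8), True, 2.0),
    PSM((-0.95, -0.95), False, 1.8),
    PSM((-0.8, -0.9), True, 1.5),
    PSM((-0.7, -0.85), True, 1.0),
]
ranked = semi_supervised_rerank(psms)
for p in ranked:
    print(p.features, p.is_decoy, round(p.score, 3))
```

In this toy run the decoy-like target starts in fifth place on its single-spectrum score and is pushed to the bottom after re-scoring, which is the kind of cross-spectrum correction the re-ranking is meant to provide. A fuller version would also recompute the dynamic features named in the abstract at every round.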

Description

Technical field

[0001] The invention relates to bioinformatics, and in particular to identification techniques for cross-linked mass spectrometry.

Background

[0002] At present, chemical cross-linking mass spectrometry has become a mainstream technology for studying protein structure and protein-protein interaction. In the prior art, because the search space of cross-linked dipeptides is huge, research on computational methods lags behind, and the accuracy of identifying cross-linked proteins in large-scale databases is low. Because the scoring function used within a single spectrum is only weakly normalized, its scores are not suitable for comparing peptide-spectrum matches across multiple spectra. Therefore, a multispectral sorting algorithm is usually needed to improve the sensitivity of cross-linked peptide identification. There are currently two types of multispectral sorting a...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B15/00
CPC: G16B15/00
Inventor: 尹吉澧, 孟佳明, 刘超, 迟浩, 陈镇霖, 孙瑞祥, 董梦秋, 贺思敏
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI