Construction method of reference protein database, storage medium and electronic equipment

A method for constructing a reference protein database, applied in proteomics, instrumentation, genomics, etc., which addresses problems such as increased experimental and analysis costs, high false-positive rates, and biased results.

Pending Publication Date: 2021-09-14
上海君谊生物科技有限公司

AI Technical Summary

Problems solved by technology

[0006] In theory, an optimal protein database can be constructed adaptively from sequencing data for each sample type. Although this greatly helps to detect the SNPs of a sample comprehensively and to improve the accuracy of peptide identification, it increases the cost of experiments and analysis, and in some cases it is difficult to obtain samples from which nucleic acids can be extracted for sequencing.
Establishing a general-

Method used



Examples


Embodiment 1

[0075] Sample preparation for Embodiment 1

[0076] Preparation of tryptic peptides

[0077] About 10^5 K562 cells were lysed for 30 minutes in 8 M urea containing 100 mM ammonium bicarbonate. The cell lysate was then diluted with 600 μL of 100 mM ammonium bicarbonate (reducing the urea concentration to 2 M) and reduced with DTT (final concentration 10 mM). Proteins were alkylated by adding IAA (final concentration 55 mM) and digested by adding 100 mg trypsin. Peptides were desalted on SepPak Plus columns, dried by vacuum centrifugation, and reconstituted in 0.1% acetic acid. Approximately 200 ng of tryptic peptides was used for analysis in each experiment.
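The paragraph does not state the lysis volume. Assuming the 600 μL of 100 mM ammonium bicarbonate is the only diluent that takes urea from 8 M to 2 M, the relation C1·V1 = C2·(V1 + 600 μL) fixes the starting lysate volume. The following minimal check of that arithmetic is only a sketch; the function name is illustrative and not part of the described method.

```python
def initial_volume_ul(c_start_m: float, c_end_m: float, added_ul: float) -> float:
    """Solve c_start * V = c_end * (V + added) for the starting volume V (uL)."""
    if not c_start_m > c_end_m > 0:
        raise ValueError("need c_start > c_end > 0")
    return c_end_m * added_ul / (c_start_m - c_end_m)


lysate_ul = initial_volume_ul(8.0, 2.0, 600.0)
print(f"implied lysate volume: {lysate_ul:.0f} uL")  # implied lysate volume: 200 uL
print(f"total after dilution: {lysate_ul + 600:.0f} uL")  # total after dilution: 800 uL
```

Under this assumption, the lysate would have been about 200 μL before dilution, for roughly 800 μL in total.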

[0078] The tryptic peptides of the present invention may also be prepared by other methods known in the art.

[0079] Mass spectrometry analysis is performed on the above tryptic peptides to obtain mass spectrometry data files. Existing conventional methods and instr...

Embodiment 2

[0080] Example 2: Method for constructing a sample-based GVP reference protein database

[0081] In this example, the GVP reference protein database was constructed based on the samples of Example 1. Figure 1 is a schematic flow chart of the method for constructing the GVP reference protein database in Example 2; Figure 2 is a detailed flow chart of step S2 in Figure 1; Figure 3 is a detailed flow chart of step S3 in Figure 1; and Figure 4 is a detailed flow chart of step S4 in Figure 1. As shown in Figures 1 to 4, the specific steps of this embodiment are as follows:

[0082] First, in step S2, SNP site information is obtained from public databases, and the SNP sites are screened to obtain a list of SNP sites that change the encoded amino acid.
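The text does not say how a change in the encoded amino acid is detected. One straightforward way is to translate the affected codon before and after substitution. The sketch below assumes each SNP has already been mapped to a 0-based position in a forward-strand coding sequence (minus-strand genes would need reverse-complementing first). It uses the standard genetic code; `changes_amino_acid` is a hypothetical helper.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated with
# bases in T, C, A, G order, third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def changes_amino_acid(cds: str, pos: int, ref: str, alt: str) -> bool:
    """True if replacing ref with alt at 0-based CDS position pos alters the encoded residue.

    A stop gain ('*') also counts as a change.
    """
    cds, ref, alt = cds.upper(), ref.upper(), alt.upper()
    if cds[pos] != ref:
        raise ValueError(f"reference mismatch at CDS position {pos}: {cds[pos]} != {ref}")
    start = pos - pos % 3          # first base of the affected codon
    offset = pos - start           # position of the SNP within that codon
    codon = cds[start:start + 3]
    mutated = codon[:offset] + alt + codon[offset + 1:]
    return CODON_TABLE[codon] != CODON_TABLE[mutated]


cds = "ATGGCTTGG"  # Met-Ala-Trp, illustrative only
print(changes_amino_acid(cds, 5, "T", "C"))  # False: GCT -> GCC, both Ala
print(changes_amino_acid(cds, 3, "G", "A"))  # True: GCT -> ACT, Ala -> Thr
print(changes_amino_acid(cds, 8, "G", "A"))  # True: TGG -> TGA, Trp -> stop
```

Synonymous SNPs (the first call) would be dropped, because they leave the peptide sequence unchanged and add nothing to a protein database.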

[0083] Specifically, in step S201, the VCF files for all chromosomes are downloaded from the official 1000 Genomes website, and the common_all.vcf....
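Step S201 names the input VCF files but not how they are parsed. The following minimal sketch reads only the fixed VCF columns (CHROM, POS, ID, REF, ALT). It keeps single-nucleotide variants and splits multi-allelic ALT fields; indels and symbolic alleles are skipped. The `Snv` type and `iter_snvs` function are hypothetical names. Real 1000 Genomes files are bgzip-compressed and would first be opened with `gzip.open(path, "rt")`.

```python
from typing import Iterable, Iterator, NamedTuple


class Snv(NamedTuple):
    chrom: str
    pos: int   # 1-based, as in VCF
    rsid: str
    ref: str
    alt: str


def iter_snvs(lines: Iterable[str]) -> Iterator[Snv]:
    """Yield single-nucleotide variants from VCF text lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        if len(ref) != 1:
            continue  # deletions / multi-base REF
        for allele in alt.split(","):
            if len(allele) == 1 and allele in "ACGT":
                yield Snv(chrom, int(pos), vid, ref, allele)


vcf_text = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t100\trs1\tA\tG\t.\tPASS\t.
1\t200\trs2\tAT\tA\t.\tPASS\t.
2\t300\trs3\tC\tT,G\t.\tPASS\t.
"""
snvs = list(iter_snvs(vcf_text.splitlines()))
for snv in snvs:
    print(snv)  # rs1 A>G, then rs3 C>T and rs3 C>G; the rs2 deletion is skipped
```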

Embodiment 3

[0123] Example 3: Proteomics-based method for determining SNPs

[0124] In this example, the reference protein database and the GVP list obtained by the sample-based construction method of Example 2 were used to determine SNPs in the samples of Example 1 by proteomics.

[0125] Figure 5 is a schematic flow chart of the proteomics-based method for determining SNPs. As shown in Figure 5, first, in step S1, proteins are extracted from the sample, proteomic detection is performed with a mass spectrometer, and mass spectrometry data are collected.

[0126] Next, the sample-based GVP reference protein database and the GVP list are constructed in steps S2 to S4, which comprise:

[0127] Step S2, obtaining SNP site information from public databases and screening the SNP sites to obtain a list of SNP sites that change the encoded amino acid.

[0128] Step S3, in the SNP site list, according to the ...


Abstract

The invention belongs to the field of biotechnology and relates in particular to a method for constructing a reference protein database, comprising the following steps: S1, obtaining SNP site information from public databases and screening the SNP sites to obtain a list of SNP sites that change the encoded amino acid; S2, selecting from the SNP site list the SNP sites located in highly expressed proteins, according to the protein abundance of the sample; and S3, generating the heritable variation peptides corresponding to the screened SNP sites and adding them to the reference protein sequences of a public protein database, thereby generating a sample-based reference protein database that contains the heritable variation peptides. Because the reference protein database is sample-based, errors caused by mistakenly removing proteins are avoided, the integrity of usable data is improved, and the high redundancy caused by excessive heritable variation information is avoided.
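Steps S2 and S3 of the abstract can be sketched roughly as follows. The abstract gives neither the abundance cut-off nor the exact form of a heritable variation peptide, so this sketch makes three assumptions. It keeps SNPs on proteins in the top half by abundance. It takes the variant peptide to be the tryptic peptide covering the substituted residue, using the common simplified rule of cleaving after K or R unless followed by P. It appends each variant peptide as a separate FASTA entry. All names, the header format, and the toy data are hypothetical.

```python
import re


def tryptic_peptides(seq: str) -> list[tuple[int, str]]:
    """In-silico digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for m in re.finditer(r"[KR](?!P)", seq):
        peptides.append((start, seq[start:m.end()]))
        start = m.end()
    if start < len(seq):
        peptides.append((start, seq[start:]))
    return peptides


def variant_peptide(seq: str, aa_pos: int, ref_aa: str, alt_aa: str) -> str:
    """Apply the substitution, re-digest, and return the peptide covering aa_pos (0-based)."""
    if seq[aa_pos] != ref_aa:
        raise ValueError(f"reference mismatch at residue {aa_pos + 1}")
    mutated = seq[:aa_pos] + alt_aa + seq[aa_pos + 1:]
    for start, pep in tryptic_peptides(mutated):
        if start <= aa_pos < start + len(pep):
            return pep
    raise ValueError("residue not covered by any peptide")


def select_by_abundance(snps, abundance, top_fraction=0.5):
    """Keep SNPs on proteins ranked in the top fraction by abundance (cut-off assumed)."""
    ranked = sorted(abundance, key=abundance.get, reverse=True)
    keep = set(ranked[: max(1, int(len(ranked) * top_fraction))])
    return [snp for snp in snps if snp[0] in keep]


def build_fasta(reference, snps, abundance) -> str:
    """Reference entries first, then one hypothetical GVP entry per retained SNP."""
    records = [f">{pid}\n{seq}" for pid, seq in reference.items()]
    for pid, pos, ref_aa, alt_aa in select_by_abundance(snps, abundance):
        pep = variant_peptide(reference[pid], pos, ref_aa, alt_aa)
        records.append(f">GVP|{pid}|{ref_aa}{pos + 1}{alt_aa}\n{pep}")
    return "\n".join(records) + "\n"


reference = {"P1": "MSAKLVDEGR", "P2": "MTTRPGQK"}
abundance = {"P1": 1e7, "P2": 1e4}
snps = [("P1", 5, "V", "M"), ("P2", 1, "T", "A")]  # (protein, 0-based residue, ref, alt)
print(build_fasta(reference, snps, abundance))
```

In this toy run, only the SNP on the more abundant protein P1 yields an extra entry (`>GVP|P1|V6M` with peptide `LMDEGR`). The protein is re-digested after substitution because a variant that creates or removes a K or R changes the cleavage pattern.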

Description

Technical field
[0001] The invention relates to the fields of biotechnology and information technology, in particular to a method for constructing a reference protein database and a method for determining SNP sites based on proteomics.
Background
[0002] Mass spectrometry-based proteomics has become a major method for the comprehensive detection and characterization of proteins and plays an important role across biology and medicine. To identify peptides or proteins, most proteomics experiments rely on public, general-purpose protein sequence reference databases: the experimental mass spectra of peptides are compared with, and scored against, theoretical mass spectra derived from the sequences in these databases. However, it is difficult to detect proteins that are missing from public general-purpose reference databases, such as undiscovered unannotated proteins or proteins containing single nucleotide polymorphism (single...
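The background explains why SNP-containing peptides are missed: identification matches experimental spectra against theoretical ones computed from database sequences. A single amino-acid substitution shifts the peptide mass, so a database lacking the variant sequence offers no matching candidate. The following minimal sketch computes monoisotopic precursor masses from standard residue masses for an illustrative V-to-M substitution. It covers precursor masses only. Fragment spectra, search-engine scoring, and fixed modifications such as carbamidomethyl-Cys from IAA alkylation are deliberately left out.

```python
# Standard monoisotopic residue masses (Da); I and L are isobaric.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


ref_pep, var_pep = "LVDEGR", "LMDEGR"  # illustrative V->M substitution
delta = peptide_mass(var_pep) - peptide_mass(ref_pep)
print(f"mass shift: {delta:.3f} Da")  # mass shift: 31.972 Da
for z in (1, 2):
    print(z, round(precursor_mz(ref_pep, z), 4), round(precursor_mz(var_pep, z), 4))
```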

Claims


Application Information

IPC (8): G16B50/00; G16B20/20; G16B20/50
CPC: G16B50/00; G16B20/20; G16B20/50
Inventor: 戚良
Owner: 上海君谊生物科技有限公司