A Rapid Method for Analyzing Eukaryotic Proteomics Data

The method, applied in the field of eukaryotic proteogenomic data analysis, addresses the limitations of existing tools, which have a narrow range of application or support only data statistics. It improves the credibility of identifications, enables rapid identification and analysis, and improves coverage.

Active Publication Date: 2021-07-13
INST OF AQUATIC LIFE ACAD SINICA

AI Technical Summary

Problems solved by technology

[0004] Proteomics research has developed rapidly. Since 2004, proteomic data have supported genome annotation for several important model organisms and a large number of non-model organisms, especially prokaryotes. However, the following problems remain:
1) Database construction: Eukaryotes have larger genomes than prokaryotes, so it is difficult to build a search database directly from the genome. For the human genome, for example, the six-reading-frame translation database is about 230 times the size of a conventional protein database. Transcriptome data are larger still than the genome because of their redundancy. Building the database from de novo assembled transcripts is worthwhile, but how to remove the redundancy with a better storage structure remains an open research question.
2) Data quality control: The false positive rate of novel peptides is often high. Most current work controls the false discovery rate (FDR) only at the spectrum level and then derives the identified protein set directly, and it applies only a global FDR filter. As a result, the actual false positive rate among novel peptides is relatively high.
3) Lack of automatic annotation tools suited to eukaryotes: The vast majority of current proteogenomic studies focus on explaining new findings rather than on developing a complete pipeline that can support further research, especially for eukaryotes. In addition, mass spectrometry data are inconvenient to share and transfer, which greatly limits the wider adoption of proteogenomics.
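
To illustrate the six-reading-frame translation database mentioned in item 1), here is a minimal Python sketch. It is not the patent's implementation. It translates a DNA sequence in all six frames with the standard genetic code, splits each translation at stop codons, and collects the fragments in a set so that identical sequences are stored only once. The function names, the `min_len` parameter, and the use of a set for redundancy removal are assumptions made for illustration; the source does not describe its storage structure.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_fragments(dna, min_len=2):
    """Collect stop-to-stop protein fragments from all six reading frames.

    A set is used so that identical fragments are kept once
    (an illustrative form of redundancy removal, assumed here).
    """
    dna = dna.upper()
    fragments = set()
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            for fragment in translate(strand[offset:]).split("*"):
                if len(fragment) >= min_len:
                    fragments.add(fragment)
    return fragments


if __name__ == "__main__":
    for fragment in sorted(six_frame_fragments("ATGGCCTAAATGAAA")):
        print(fragment)
```

Because every nucleotide position is translated three times on each strand, a database built this way grows much faster than an annotated protein database, which is the size problem the text describes.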
[0005] A complete identification and analysis method suited to eukaryotic proteogenomic data is still lacking. Alternative splicing, which is specific to eukaryotic genomes, makes gene annotation more complicated, and genes carrying point mutations add further complexity.
Existing software for eukaryotic proteogenomic data analysis includes PGTools, QUILTS, GALAXY-P, PPLine and PoGo. These tools have significant limitations: some are suitable only for human data; some rely entirely on transcriptome-based prediction of alternative splice sites and point mutations rather than identifying them directly from proteomic data; some support only data statistics and lack the upstream data processing and identification steps; and some require users to have a strong proteomics research background. Their scope of application is therefore narrow, and none achieves automatic, rapid data analysis.

Examples

Embodiment 1

[0068] The mass spectrometry data in Embodiment 1 come from a published article [Kelkar DS, Provost E, Chaerkady R, Muthusamy B, Manda SS, Subbannayya T, Selvan LDN, Wang CH, Datta KK, Woo S, Dwivedi SB, Renuse S, Getnet D, Huang TC, Kim MS, Pinto SM, Mitchell CJ, Madgundu AK, Kumar P, Sharma J, Advani J, Dey G, Balakrishnan L, Syed N, Nanjappa V, Subbannayya Y, Goel R, Prasad TSK, Bafna V, Sirdeshmukh R, Gowda H, Wang C, Leach SD, Pandey A, "Annotation of the Zebrafish Genome through an Integrated Transcriptomic and Proteomic Analysis", Molecular & Cellular Proteomics, 2014, 13:3184-3198].

[0069] Example 1

[0070] The zebrafish genome was re-annotated as follows:

[0071] Download the zebrafish whole-genome sequence, the GFF annotation file and the proteome sequence database (46,260 known and predicted protein sequences) from the Ensembl website, and download the zebrafish transcriptome sequences from NCBI.

[0072] Combine the assembled transcriptome data, EST sequences and ...

Embodiment 2

[0092] The Phaeodactylum tricornutum genome was re-annotated as follows:

[0093] 1) Following the experimental method in the literature [Yang MK, Yang YH, Chen Z, Zhang J, Lin Y, Wang Y, Xiong Q, Li T, Ge F, Bryant DA, Zhao JD, "Proteogenomic analysis and global discovery of posttranslational modifications in prokaryotes", 2014, 111(52):E5633-E5642], extract Phaeodactylum tricornutum protein and digest the total protein enzymatically to obtain a peptide mixture. Analyze the peptide solution on a Thermo LTQ Exactive mass spectrometer and collect the mass spectrometry data. A total of 1,555,391 mass spectra were collected.

[0094] 2) Using the same method as in Example 1, download the complete genome sequence, the transcriptome sequences, the GFF annotation file and the proteome sequence database (10,567 known protein sequences) of Phaeodactylum tricornutum from the JGI Genome Portal website, and use ProteoWizard to con...

Abstract

The invention provides a method for rapidly analyzing eukaryotic proteogenomic data and belongs to the technical field of proteogenomic data analysis. The method adopts the multi-omics data organization and screening methods used for prokaryotes to obtain Class II trusted peptides. Three different methods of mapping peptides back to the genome are then designed for predicting novel genes, alternative splice variants and point-mutation genes, and for correcting the structures of annotated genes. The method is applicable to any eukaryote whose genome has been sequenced. Predicting alternative splice variants and point-mutation genes improves identification coverage, and the use of different, stricter false positive control strategies improves the credibility of identifications. By running the analysis as a single pipeline, from raw mass spectrometry data through the prediction of novel genes, alternative splice variants and point-mutation genes to the correction of annotated gene structures, the method achieves rapid identification and analysis of eukaryotic mass spectrometry data.
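
The text available here does not spell out the false positive control strategies it uses. As a hedged illustration of why a single global FDR filter can understate the error rate among novel peptides, the sketch below applies a common target-decoy filter separately to known and novel peptide-spectrum matches (PSMs). The `PSM` record, its fields, the function names and the thresholds are hypothetical and chosen for illustration; this is one widely used class-specific approach, not necessarily the one in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    peptide: str
    score: float     # higher means a better match (assumed convention)
    is_decoy: bool   # True if matched against the decoy database
    is_novel: bool   # True if the peptide is absent from the reference proteome


def filter_by_fdr(psms, max_fdr=0.01):
    """Keep target PSMs above the lowest score cutoff with decoys/targets <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff = 0
    for rank, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the estimated FDR is acceptable.
        if targets and decoys / targets <= max_fdr:
            cutoff = rank
    return [p for p in ranked[:cutoff] if not p.is_decoy]


def class_specific_filter(psms, max_fdr=0.01):
    """Estimate and control FDR separately for known and novel peptides."""
    known = [p for p in psms if not p.is_novel]
    novel = [p for p in psms if p.is_novel]
    return filter_by_fdr(known, max_fdr) + filter_by_fdr(novel, max_fdr)


if __name__ == "__main__":
    psms = [
        PSM("KNOWNA", 10.0, False, False),
        PSM("KNOWNB", 9.0, False, False),
        PSM("KNOWNC", 8.0, False, False),
        PSM("KNOWND", 7.0, False, False),
        PSM("NOVELA", 6.0, False, True),
        PSM("DECOYA", 5.0, True, True),
        PSM("NOVELB", 4.0, False, True),
        PSM("DECOYB", 1.0, True, False),
    ]
    print([p.peptide for p in filter_by_fdr(psms, max_fdr=0.3)])
    print([p.peptide for p in class_specific_filter(psms, max_fdr=0.3)])
```

In the toy data, the many high-scoring known peptides dilute the decoy count under the global filter, so a weak novel match passes. When the novel class is assessed on its own, the same match is rejected.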

Description

Technical field

[0001] The invention relates to methods for analyzing proteogenomic data, and in particular to a method for rapidly analyzing eukaryotic proteogenomic data.

Background

[0002] With the completion of the Human Genome Project, genome sequencing technology has matured, and more and more species have had their genomes sequenced. However, whole-genome sequencing is only the first step. In going from sequence data to biological understanding, the value of a genome lies in its functional annotation. Genome functional annotation is the process of adding analysis and interpretation to the raw DNA sequence produced by genome sequencing, and it is necessary for understanding an organism's metabolic processes and biological significance. High-quality genome annotation describes the sequence organization of the genome, especially the detailed identification of genes and gene products.

[0003] Ten years after the co...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G16B20/00
Inventor: Ge Feng (葛峰), Yang Mingkun (杨明坤), Zhang Jia (张珈), Hong Bin (洪斌)
Owner: INST OF AQUATIC LIFE ACAD SINICA