
Method for rapidly analyzing eukaryotic protein genomic data

A proteogenomic technique for eukaryotes, applied in the field of proteogenomic data analysis. It addresses the limited application scope and high limitations of existing tools, some of which only support data statistics, and achieves more reliable identification, rapid identification and analysis, and higher identification coverage.

Active Publication Date: 2018-11-30
INST OF AQUATIC LIFE ACAD SINICA
Cites: 3; Cited by: 4

AI Technical Summary

Problems solved by technology

[0004] Although proteogenomics research is developing rapidly, and since 2004 proteomics has supported genome annotation of several important model organisms and many non-model organisms, especially prokaryotes, the following problems remain: 1) Database construction: compared with prokaryotes, eukaryotes have much larger genomes, so it is difficult to build a search database directly from the genome. Taking the human genome as an example, the six-reading-frame translation database is about 230 times the size of a conventional protein database. Transcriptome data are larger still than the genome because of their redundancy, yet building a database from de novo assembled transcripts is very worthwhile. How to use a better storage structure to remove this data redundancy is therefore an open research question.
2) Data quality control: the false positive rate of novel peptides is often high. Most current studies control the false discovery rate (FDR) only at the spectrum level to obtain the identified protein set directly, and apply only a global FDR filter, which leaves the actual false positive rate of novel peptides relatively high.
3) Lack of automatic annotation tools suited to eukaryotes: most current proteogenomics studies focus on explaining new phenomena rather than on developing a complete pipeline that can support further research, especially for eukaryotes. Moreover, given the size of mass spectrometry data, sharing and transferring the data is very inconvenient, which greatly limits the wider adoption of proteogenomics.
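Problem 1) turns on how large and redundant translated databases become. A minimal sketch, assuming a simple approach not taken from the patent: translate each nucleotide sequence in all six reading frames, cut the translations at stop codons, and store each distinct amino-acid sequence once in a dictionary whose value lists every origin. The function names, the `min_len` cutoff of 7 residues and the `source_id|orfN` origin labels are hypothetical choices for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1): codons enumerated in TCAG order,
# first base varying slowest, which matches the order of AMINO_ACIDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; codons containing ambiguous bases (e.g. N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_orfs(dna: str, min_len: int = 7) -> list[str]:
    """Stretches between stop codons in all six frames (hypothetical min_len cutoff)."""
    dna = dna.upper()
    orfs = []
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            protein = translate(strand[frame:])
            orfs.extend(seq for seq in protein.split("*") if len(seq) >= min_len)
    return orfs


def build_nonredundant_db(records) -> dict[str, list[str]]:
    """records: iterable of (source_id, dna) pairs.

    Each distinct amino-acid sequence is stored once; its value lists every
    origin ("source_id|orfN"), so hits can still be mapped back to their sources.
    """
    db: dict[str, list[str]] = {}
    for source_id, dna in records:
        for i, orf in enumerate(six_frame_orfs(dna)):
            db.setdefault(orf, []).append(f"{source_id}|orf{i}")
    return db


# Usage: two identical transcripts collapse into shared entries.
transcript = "ATG" + "GCC" * 10 + "TAA"
db = build_nonredundant_db([("t1", transcript), ("t2", transcript)])
print(db["MAAAAAAAAAA"])  # ['t1|orf0', 't2|orf0']
```

Keying on the sequence means the search database grows with the number of distinct sequences rather than the number of transcripts, while the origin lists preserve the mapping back to transcripts or genomic loci.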
[0005] At present there is still no complete identification and analysis method suited to eukaryotic proteogenomic data. In particular, alternative splicing, which is unique to eukaryotic genomes, makes gene annotation more complicated, and the presence of point-mutation genes further increases annotation complexity.
Software for eukaryotic proteogenomic data analysis includes PGTools, QUILTS, GALAXY-P, PPLine and PoGo. However, the methods these tools implement have significant limitations: some are suitable only for human proteogenomic data; some rely entirely on the transcriptome to predict alternative splice sites and point mutations instead of identifying them directly from proteomic data; some only support data statistics and lack upstream data processing and identification; and some require users to have a deep background in proteomics research. Their scope of application is therefore very limited, and automatic, rapid data analysis has not been achieved.
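Identifying a point mutation directly from proteomic data, rather than from transcriptome predictions, amounts to finding an identified peptide that matches a reference protein at every position but one. A naive sketch, assuming a brute-force scan over the reference proteome (real tools use indexed searches); the function names and the returned tuple layout are hypothetical. Leucine and isoleucine are treated as the same residue because they have identical masses and are usually indistinguishable in standard MS/MS.

```python
def normalize(seq: str) -> str:
    # Leucine and isoleucine have identical masses, so treat them as one residue.
    return seq.replace("I", "L")


def find_single_substitutions(peptide: str, proteins: dict[str, str]) -> list[tuple[str, int, str, str]]:
    """Return (protein_id, 1-based position, reference residue, observed residue)
    for every reference window that differs from the peptide at exactly one position.
    Exact matches (zero differences) are not reported: those are known peptides."""
    pep = normalize(peptide)
    n = len(pep)
    hits = []
    for prot_id, prot_seq in proteins.items():
        prot = normalize(prot_seq)
        for start in range(len(prot) - n + 1):
            window = prot[start:start + n]
            diffs = [i for i in range(n) if window[i] != pep[i]]
            if len(diffs) == 1:
                i = diffs[0]
                hits.append((prot_id, start + i + 1, prot_seq[start + i], peptide[i]))
    return hits


reference = {"P1": "MKTAYIAKQR"}
print(find_single_substitutions("TAWIAK", reference))  # [('P1', 5, 'Y', 'W')]
print(find_single_substitutions("TAYLAK", reference))  # [] (exact match once I/L are equated)
```

By construction this sketch cannot report I/L substitutions, and it ignores substitutions that create or destroy enzymatic cleavage sites; both would need separate handling.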



Examples


Embodiment 1

[0068] The mass spectrometry data in Example 1 are from a published article [Kelkar DS, Provost E, Chaerkady R, Muthusamy B, Manda SS, Subbannayya T, Selvan LDN, Wang CH, Datta KK, Woo S, Dwivedi SB, Renuse S, Getnet D, Huang TC, Kim MS, Pinto SM, Mitchell CJ, Madgundu AK, Kumar P, Sharma J, Advani J, Dey G, Balakrishnan L, Syed N, Nanjappa V, Subbannayya Y, Goel R, Prasad TSK, Bafna V, Sirdeshmukh R, Gowda H, Wang C, Leach SD, Pandey A, "Annotation of the Zebrafish Genome through an Integrated Transcriptomic and Proteomic Analysis", Molecular & Cellular Proteomics, 2014, 13:3184-3198].

[0069] Example 1

[0070] Zebrafish genome re-annotation proceeds as follows:

[0071] Download from the Ensembl website the zebrafish whole-genome sequence, the GFF annotation file and the predicted proteome database (46,260 known protein sequences), and download the zebrafish transcriptome sequences from NCBI.

[0072] Merge the assembled transcriptome data, EST sequences and no...

Embodiment 2

[0092] Phaeodactylum tricornutum genome re-annotation proceeds as follows:

[0093] 1) Following the experimental method in the literature [Yang MK, Yang YH, Chen Z, Zhang J, Lin Y, Wang Y, Xiong Q, Li T, Ge F, Bryant DA, Zhao JD, "Proteogenomic analysis and global discovery of posttranslational modifications in prokaryotes", 2014, 111(52):E5633-E5642], extract Phaeodactylum tricornutum protein, digest the total protein enzymatically to obtain a peptide mixture, analyze the peptide solution on a Thermo LTQExactive mass spectrometer and collect the mass spectrometry data; a total of 1,555,391 mass spectra are collected.

[0094] 2) Using the same method as in Example 1, download from the JGI Genome Portal website the complete genome sequence of Phaeodactylum tricornutum, the transcriptome sequences, the GFF annotation file and the proteome database (10,567 known protein sequences), and use ProteoWizard to convert the raw data into a ...



Abstract

The invention provides a method for rapidly analyzing eukaryotic proteogenomic data and belongs to the technical field of proteogenomic data analysis methods. The method obtains class II credible peptides by adopting a prokaryotic multi-omics data integration method and screening method, and designs three different methods for mapping peptides back to the genome, aimed at predicting new genes, alternative splice variants and point-mutation genes and at correcting the structures of annotated genes. The method is suitable for any sequenced eukaryote. Its splice-variant and point-mutation prediction methods increase identification coverage; its use of several relatively strict false-positive control strategies increases identification reliability; and by processing the data from the raw mass spectra through to the final sets of new genes, splice variants, point-mutation genes and corrected annotated gene structures, it achieves rapid identification and analysis of eukaryotic mass spectrometry data.
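The abstract's stricter false-positive control responds to the problem that a single global FDR lets novel peptides through at a much higher real error rate. One well-known remedy is a "separate FDR": run target-decoy filtering independently within the known and novel classes. The sketch below illustrates that general technique under stated assumptions; it is not claimed to be the patent's exact strategy. The `PSM` record, its fields and both function names are hypothetical, and decoys are assumed to carry the class of the database portion they were generated from.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str
    score: float    # higher = better match
    is_decoy: bool
    is_novel: bool  # True if the (target or decoy) sequence comes from the novel part of the database


def filter_at_fdr(psms: list[PSM], fdr: float = 0.01) -> list[PSM]:
    """Target-decoy filtering: keep target PSMs whose q-value is <= fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    estimates, targets, decoys = [], 0, 0
    for p in ranked:
        decoys += p.is_decoy
        targets += not p.is_decoy
        estimates.append(decoys / max(targets, 1))  # estimated FDR at this score cutoff
    # q-value: smallest FDR estimate at this cutoff or at any lower-scoring one.
    qvalues, running = [0.0] * len(ranked), float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running = min(running, estimates[i])
        qvalues[i] = running
    return [p for p, q in zip(ranked, qvalues) if q <= fdr and not p.is_decoy]


def separate_fdr(psms: list[PSM], fdr_known: float = 0.01, fdr_novel: float = 0.01) -> list[PSM]:
    """Control FDR independently within the known and novel classes."""
    known = [p for p in psms if not p.is_novel]
    novel = [p for p in psms if p.is_novel]
    return filter_at_fdr(known, fdr_known) + filter_at_fdr(novel, fdr_novel)


# Usage: 50 confident known PSMs plus a small, noisy novel class.
known = [PSM(f"K{i}", 100 - i, False, False) for i in range(50)]
novel = [PSM("N1", 10, False, True), PSM("N2", 9, False, True),
         PSM("D1", 8, True, True), PSM("N3", 7, False, True)]
psms = known + novel
print("N3" in {p.peptide for p in filter_at_fdr(psms, 0.02)})       # True: global FDR accepts it
print("N3" in {p.peptide for p in separate_fdr(psms, 0.02, 0.02)})  # False: novel-only FDR is about 0.33
```

In the pooled list the single novel decoy is diluted by the known targets, so N3 passes with an estimated FDR of about 1/53; within the novel class alone its estimate is 1/3. This is the effect described in item 2) under "Problems solved by technology".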

Description

Technical field
[0001] The invention relates to methods for analyzing proteogenomic data, and in particular to a method for rapidly analyzing eukaryotic proteogenomic data.
Background art
[0002] With the completion of the Human Genome Project, genome sequencing technology has matured, and more and more species have had their genomes sequenced. However, whole-genome sequencing is only the starting point. In moving from sequence data to biological understanding, the value of a genome lies in its functional annotation. Genome functional annotation is the process of adding analysis and interpretation to the raw DNA sequence produced by sequencing, and it is necessary for understanding an organism's metabolic processes and biological significance. High-quality genome annotation organizes the genome sequence, in particular through the detailed identification of genes and gene products.
[0003] Ten years after the co...


Application Information

IPC(8): G06F19/18
Inventors: 葛峰 (Ge Feng), 杨明坤 (Yang Mingkun), 张珈 (Zhang Jia), 洪斌 (Hong Bin)
Owner: INST OF AQUATIC LIFE ACAD SINICA