Bioinformatics method for annotating eukaryotic genomes based on protein mass spectrometry data

A bioinformatics technology for eukaryotes, applied in the field of bioinformatics to annotate eukaryotic genomes based on protein mass spectrometry data. It addresses problems such as the difficulty of obtaining full-length mRNA sequences, the difficulty of determining start and stop codons, and reduced mass spectrum matching sensitivity.

Inactive Publication Date: 2017-08-29
湖北普罗金科技有限公司

AI Technical Summary

Problems solved by technology

However, this method has some shortcomings. For example, because mRNA is unstable, it is difficult to obtain the full-length mRNA sequence of some genes; in some species, the transcript encodes an operon instead of a gene; and the start site is easily affected by many factors, so it is difficult to rely on mRNA to determine the start codon and stop codon.
Although genome annotation can benefit from the establishment of these methods, current research on proteomics analysis methods is still in its infancy, the methods remain computationally challenging, and some problems are unavoidable: A. Limitations o



Examples


Embodiment example

[0056] Using 10G of Aspergillus flavus mass spectrometry data, establish a six-reading-frame translation database, an N-terminal peptide database, a de novo predicted protein sequence database, a transcriptome translation sequence database, and an integrated multi-omics database, thereby constructing a high-coverage eukaryotic multi-omics sequence database.
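
The six-reading-frame translation named in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `reverse_complement`, `translate`, and `six_frame_translate` are hypothetical, the standard genetic code is assumed, and any codon containing a base other than A, C, G, or T is translated to `X`.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: the source does not
# state which translation table is used).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(dna):
    """Translate a DNA sequence in three forward and three reverse frames."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translate("ATGGCCTAA").items():
        print(name, protein)
```

Stop codons are kept as `*` so that a later step can split each frame into candidate open reading frames; how the patent handles them is not stated in the text above.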

[0057] Download the non-coding RNA, pseudogene, non-coding gene, and EST sequence data for this species from the corresponding public databases, and translate them in six different reading frames, [...] fully enzyme-digested peptide sequences with a length greater than 38.
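
The enzyme digestion step referred to here can be illustrated with a short in silico digest. The sketch below rests on assumptions: the source does not name the enzyme, so trypsin-like cleavage (after K or R, but not before P) is assumed; the function name `tryptic_digest` and the `min_length` parameter are hypothetical; and the length threshold is left to the caller because the sentence above is incomplete.

```python
def tryptic_digest(protein, min_length=1):
    """Fully digest a protein with an assumed trypsin-like rule.

    Cleaves after K or R unless the next residue is P, allows no missed
    cleavages, and keeps only peptides with at least `min_length` residues.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Trailing peptide after the last cleavage site.
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_length]


if __name__ == "__main__":
    print(tryptic_digest("MKAARPLKGG"))
    print(tryptic_digest("MKAARPLKGG", min_length=3))
```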

[0058] (b) Following the integration strategy in step (e) of the first point, integrate the four types of databases created in the previous step into a de-redundancy database.

[0059] (c) Filter the constructed eukaryotic protein sequence database. If the same sequence appears in both the eukaryotic protein sequence database and the de-redundancy database, these sequences will be removed from the eukar...
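
The filtering rule in step (c) amounts to a set difference on sequences. A minimal sketch, assuming both databases fit in memory as plain Python collections and that "the same sequence" means an exact string match; the function name `filter_protein_database` and the example identifiers are hypothetical:

```python
def filter_protein_database(protein_db, deredundancy_seqs):
    """Drop entries whose sequence also appears in the de-redundancy database.

    protein_db: dict mapping entry ID to sequence.
    deredundancy_seqs: iterable of sequences to exclude (exact match assumed).
    """
    excluded = set(deredundancy_seqs)
    return {
        entry_id: seq
        for entry_id, seq in protein_db.items()
        if seq not in excluded
    }


if __name__ == "__main__":
    proteins = {"p1": "MKAAR", "p2": "MGLSK", "p3": "MTTPR"}
    noncoding_translations = ["MGLSK", "MQQQK"]
    print(filter_protein_database(proteins, noncoding_translations))
```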


Abstract

The invention discloses a bioinformatics method for annotating eukaryotic genomes based on protein mass spectrometry data. The method comprises the following steps: 1, constructing a high-coverage eukaryotic multi-omics sequence database; 2, removing redundancy from the eukaryotic protein sequence database; 3, converting the format of the raw mass spectrometry data; 4, searching the mass spectrometry data separately with database search engines that use different algorithms; 5, performing peptide-spectrum matching and scoring on the search results; 6, screening the result data after FDR evaluation by type; 7, verifying annotated coding genes; 8, identifying new genes that have not been annotated; 9, identifying alternative splicing; 10, identifying functional point mutations; 11, identifying protein post-translational modifications on a large scale; 12, functionally annotating the new genes and the post-translational modifications. The method comprehensively improves the accuracy and sensitivity of protein mass spectrometry data analysis, achieves in-depth analysis and annotation of eukaryotic genomes, and has the advantages of being efficient, accurate, and comprehensive.
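
Step 6 (FDR-based screening of peptide-spectrum matches) can be illustrated with a simple target-decoy filter. This is a sketch under assumptions and not the method claimed here: the abstract does not say how FDR is estimated, so the common target-decoy estimate (decoys divided by targets above a score cutoff) is swapped in, higher scores are assumed to be better, and the function name `filter_by_fdr` and the `(score, is_decoy)` tuple layout are hypothetical.

```python
def filter_by_fdr(psms, max_fdr=0.01):
    """Keep target PSMs above the lowest score cutoff that satisfies max_fdr.

    psms: iterable of (score, is_decoy) tuples; higher score means better match.
    FDR at each rank is estimated as decoys / targets among the PSMs so far.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = 0  # number of top-ranked PSMs to keep
    for rank, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = rank
    return [psm for psm in ranked[:cutoff] if not psm[1]]


if __name__ == "__main__":
    example = [(10, False), (9, False), (8, True), (7, False), (6, False)]
    print(filter_by_fdr(example, max_fdr=0.3))
```

Applying such a filter separately to each class of identification (for example, annotated genes versus novel peptides) is one way to read "FDR evaluation by type", but that reading is an assumption.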

Description

Technical field

[0001] The invention belongs to the field of bioinformatics, and in particular relates to a bioinformatics method for annotating eukaryotic genomes based on protein mass spectrometry data.

Background

[0002] Genome sequencing can only determine the base-pair sequence of the entire DNA; it cannot directly determine the genes on the DNA or their functions. The sequence must be analyzed with bioinformatics methods, combined with proteomics and transcriptomics, to mine and annotate genes and their functions. This process is called gene annotation.

[0003] Genome annotation is the high-throughput annotation of the biological functions of all genes in a genome using bioinformatics methods and tools, and it is a hot spot in current functional genomics research.

[0004] The prediction of gene structure is of great significance for discovering new genes and understanding the rules of genome structure, and it is an important part of various genome projects. At prese...


Application Information

IPC(8): G06F19/18
CPC: G16B20/00
Inventor: 张珈, 葛峰, 杨明坤, 熊倩, 洪斌, 李俊峰, 刘光猛
Owner: 湖北普罗金科技有限公司