Key protein identification method based on protein clustering characteristics and activity co-expression

A key protein identification method in the field of bioinformatics, based on protein clustering characteristics and activity co-expression. It addresses reduced accuracy, biased prediction results, and the neglected influence of noise in gene expression data, and it aims to improve identification accuracy and suppress noise.

Active Publication Date: 2020-05-08
HUNAN NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

However, noise in the subcellular localization information and functional annotation information may bias the prediction results.
At the same time, incorporating multiple data sets increases the computational complexity.
[0009] In the methods and public documents described above, key protein prediction methods based on PPI network data may lose accuracy because high-throughput protein interaction data contain many false positives and false negatives. Prediction methods that also use gene expression data can offset these false positives and false negatives to a certain extent, but they ignore the influence of noise in the gene expression data itself.



Examples


Embodiment 1

[0045] 1. Selection of protein interaction network (PPI) data and gene expression data.

[0046] Because yeast data are the most complete among all species and are widely used by key protein prediction methods, the present invention is tested on data for Saccharomyces cerevisiae (baker's yeast). The genome-wide protein interaction data of yeast were downloaded from DIP, and duplicate interactions and self-interactions were discarded. The resulting yeast PPI network has 5093 proteins and 24743 edges. The set of known key proteins was compiled from four databases (MIPS, SGD, DEG and SGDP) and contains 1285 yeast key proteins, of which 1167 appear in the yeast PPI network. Gene expression data reflect the dynamic properties of genes over the metabolic cycle. The yeast gene expression data were downloaded from the NCBI Gene Expression Omnibus website. After preprocessing, 6777 gene products and 36 samples were obtained, of which 4858 genes participated in the...
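The preprocessing step described above (discarding duplicate interactions and self-interactions) can be sketched as follows. This is a minimal illustration, assuming the interactions are available as pairs of protein identifiers and that the network is undirected; the function name `clean_ppi_edges` and the toy identifiers are hypothetical and do not come from the patent.

```python
def clean_ppi_edges(raw_pairs):
    """Drop self-interactions and duplicate interactions from a PPI edge list.

    Assumes an undirected network, so (a, b) and (b, a) are the same edge.
    """
    edges = set()
    for a, b in raw_pairs:
        if a == b:
            continue  # self-interaction: discard
        edges.add(frozenset((a, b)))  # unordered pair removes duplicates
    # Return a deterministic, sorted list of (protein, protein) tuples.
    return sorted(tuple(sorted(edge)) for edge in edges)


# Toy input (hypothetical identifiers, not DIP data).
raw = [("P1", "P2"), ("P2", "P1"), ("P3", "P3"), ("P2", "P3")]
cleaned = clean_ppi_edges(raw)
proteins = {p for edge in cleaned for p in edge}
print(len(proteins), "proteins,", len(cleaned), "edges")
```

On the real DIP download, the same counting would be expected to give the protein and edge totals quoted in the paragraph, provided the input parsing matches the authors' preprocessing.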

Embodiment 2

[0063] To verify the performance of the JDC method of the present invention, nine key protein prediction methods were selected for comparison: degree centrality (DC); information centrality (IC); eigenvector centrality (EC); subgraph centrality (SC); betweenness centrality (BC); closeness centrality (CC); edge clustering coefficient centrality (NC), a key protein measure based on the edge clustering coefficient; integration of gene expression profiles and PPI data (PeC), a key protein measure based on gene expression data and PPI network data; and integration of gene expression profiles and PPI data with parameters added to adjust the proportion (P&E), a key protein measure based on weighted centrality. The top 1%, 5%, 10%, 15%, 20% and 25% of the proteins ...
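A top-percentage comparison of this kind can be sketched as below: rank the proteins by a method's score, take the top fraction, and count how many known key proteins fall inside it. This is an illustrative sketch only. The function name `hits_in_top_percent`, the rounding rule (ceiling) and the toy ranking are assumptions, since the truncated paragraph does not state them.

```python
import math


def hits_in_top_percent(ranked, known_key, percent):
    """Return (number of known key proteins in the top slice, slice size).

    `ranked` is a list of protein ids sorted from highest to lowest score.
    The slice size is rounded up; that rounding rule is an assumption.
    """
    n_top = math.ceil(len(ranked) * percent / 100)
    top = ranked[:n_top]
    hits = sum(1 for protein in top if protein in known_key)
    return hits, n_top


# Toy ranking of 20 hypothetical proteins, best first.
ranked = ["P%d" % i for i in range(1, 21)]
known_key = {"P1", "P2", "P5", "P9", "P20"}

for percent in (5, 10, 25):
    hits, n_top = hits_in_top_percent(ranked, known_key, percent)
    print("top %d%%: %d of %d are known key proteins" % (percent, hits, n_top))
```

Running the same count for each method at each cutoff gives one number per method and cutoff, which is the kind of side-by-side comparison the paragraph describes.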

Embodiment 3

[0065] ROC curve comparison and multiple performance evaluations of the key protein identification method based on protein clustering characteristics and activity co-expression.

[0066] To evaluate the global performance of each method, ROC curves were used for comparison. As shown in Figure 8, on the yeast data the area under the curve (AUC) of JDC was 0.6992, while the AUCs of WDC and NC were 0.6884 and 0.6889, respectively. JDC therefore improves on WDC and NC by 0.0108 and 0.0103, respectively. JDC, WDC and PeC differ in how they weight the PPI network. Li and Tang introduced the PCC correlation coefficient to weight the PPI network on the basis of ECC, which effectively suppressed false positives and false negatives. However, the "activity" and "inactivity" of gene expression at different moments are ignored when introducing gene expression data. For this reason, the present invention proposes to ...
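For readers who want to reproduce an AUC comparison of this kind, the area under the ROC curve can be computed directly from scores and labels using its rank interpretation: the probability that a randomly chosen key protein is scored above a randomly chosen non-key protein, with ties counted as one half. The sketch below is a generic pure-Python version, not the authors' evaluation code; the function name `roc_auc` and the toy data are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` are truthy for key proteins and falsy otherwise.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: four proteins, two of them known key proteins.
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 0]
print(roc_auc(toy_scores, toy_labels))  # 3 of 4 pairs ordered correctly
```

The pairwise loop is quadratic in the number of proteins, which is acceptable for a network of a few thousand proteins; a rank-based formulation would be preferable for much larger inputs.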


Abstract

The invention relates to a key protein identification method based on protein clustering characteristics and activity co-expression. The method comprises the following steps: describing the clustering characteristics of a protein interaction network using the edge clustering coefficient; determining whether each gene is actively expressed by applying a threshold parameter, and recording the result as a Boolean value; defining a calculation based on these Boolean activity values, and computing the activity co-expression score with the Jaccard coefficient; and finally obtaining a comprehensive key score that combines the protein clustering characteristics and the activity co-expression, outputting the sorted result, and taking the top N proteins by comprehensive score (with N as the threshold) as key proteins. The method eliminates the influence of noise in gene expression data, and it outperforms centrality measures and key protein prediction methods that use the same input data sets in identification accuracy, specificity, sensitivity and related measures.
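The steps listed in the abstract can be sketched end to end as follows. This is a sketch under stated assumptions, not the patented formulas, which are not given in this text. It assumes the commonly used edge clustering (aggregation) coefficient, ECC(u, v) = (common neighbours of u and v) / min(deg(u) - 1, deg(v) - 1); an activity threshold of mean + k * standard deviation per gene; and a combined score that sums ECC times Jaccard over a protein's neighbours. All function names and the parameter `k` are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_clustering_coefficient(adj, u, v):
    """Triangles through edge (u, v) divided by the maximum possible number."""
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if possible <= 0:
        return 0.0
    return len(adj[u] & adj[v]) / possible


def active_vector(expression, k=1.0):
    """Boolean activity per time point; threshold mean + k * std is assumed."""
    n = len(expression)
    mean = sum(expression) / n
    std = (sum((x - mean) ** 2 for x in expression) / n) ** 0.5
    threshold = mean + k * std
    return [x > threshold for x in expression]


def jaccard(a, b):
    """Jaccard coefficient of two Boolean vectors of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def score_proteins(edges, expression, k=1.0):
    """Assumed combination: sum of ECC * Jaccard over each protein's neighbours."""
    adj = build_adjacency(edges)
    active = {p: active_vector(values, k) for p, values in expression.items()}
    scores = {}
    for u in adj:
        total = 0.0
        for v in adj[u]:
            if u in active and v in active:
                ecc = edge_clustering_coefficient(adj, u, v)
                total += ecc * jaccard(active[u], active[v])
        scores[u] = total
    return scores


# Toy network: triangle A-B-C with D attached to A (hypothetical data).
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")]
toy_expression = {
    "A": [1, 1, 1, 10],
    "B": [1, 1, 1, 10],
    "C": [10, 1, 1, 1],
    "D": [1, 1, 1, 10],
}
toy_scores = score_proteins(toy_edges, toy_expression)
top_n = sorted(toy_scores, key=toy_scores.get, reverse=True)[:2]
print(toy_scores, top_n)
```

In the toy network, A and B sit in a triangle and are active at the same time point, so they score highest; C is in the same triangle but active at a different time, and D shares activity with A but lies on an edge with no triangle, so both score zero. This illustrates why the method needs both the topological and the co-expression signal.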

Description

Technical field

[0001] The invention belongs to the technical field of bioinformatics and relates to a key protein identification method based on protein clustering characteristics and activity co-expression.

Background

[0002] The life activities of organisms depend heavily on proteins. Key proteins generally occur in protein complexes, and their absence causes the loss of certain functions in the organism or even prevents the organism from surviving. Key proteins are therefore essential to the physiological activities and survival of living things, and how to predict them accurately has become a research focus in proteomics.

[0003] In early studies of key proteins, biologists mainly used biological experiments to observe the effect on an organism of losing a particular protein, and so judged whether that protein was a key protein. Although good results have been achieved, there are limitations s...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G16B5/10
CPC: G16B5/10; Y02A90/10
Inventors: 钟坚成, 唐超, 孙瑜穗, 杨家红
Owner: HUNAN NORMAL UNIVERSITY