Protein-protein interaction prediction method based on deep forest

A protein interaction prediction method, applied in the field of bioinformatics, that addresses problems such as high cost and achieves the effect of improving prediction accuracy and reducing model complexity.

Active Publication Date: 2020-05-29
QINGDAO UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0005] However, deep learning also has shortcomings. First, the number of layers and nodes of the neural network must be specified before the model is trained, and tuning these parameters requires considerable effort.


Examples


Embodiment

[0082] A deep forest-based protein-protein interaction prediction method, GcForest-PPI, proceeds as follows (as shown in Figure 1):

[0083] 1) Collect data

[0084] 1-1) Select the protein interaction datasets of yeast (S. cerevisiae) and Helicobacter pylori (H. pylori) as training sets. Experimentally verified protein-protein interaction pairs obtained from the databases serve as positive samples, and non-interacting protein pairs serve as negative samples. The S. cerevisiae dataset is from the DIP (www.dip.doe-mbi.ucla.edu) core database, version DIP_20070219. The H. pylori dataset is from Martin, S., Roe, D. and Faulon, J.L. (2005). Predicting protein-protein interactions using signature products. Bioinformatics, 21(2), 218-226.

[0085] 1-2) For the yeast dataset, first remove samples containing proteins with fewer than 50 residues, then use the sequence clustering program CD-HIT to remove protein sequences with sequence similarity higher than 40%, and o...
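The length-filtering part of step 1-2 can be illustrated with a minimal Python sketch. The function names, the dictionary layout, and the toy sequences are hypothetical and not taken from the patent; the sketch also assumes that a pair is dropped when either of its proteins is too short. The CD-HIT redundancy step is an external program and is not reproduced here.

```python
# Minimal sketch of the length filter in step 1-2 (names and data are illustrative).
MIN_RESIDUES = 50  # threshold stated in the text


def filter_short_proteins(sequences, min_residues=MIN_RESIDUES):
    """Keep only proteins whose sequence has at least `min_residues` residues."""
    return {pid: seq for pid, seq in sequences.items() if len(seq) >= min_residues}


def filter_pairs(pairs, kept):
    """Assumption: a pair is kept only if both of its proteins passed the filter."""
    return [(a, b) for a, b in pairs if a in kept and b in kept]


# Toy data: P2 has 49 residues and is removed; P3 has exactly 50 and is kept.
sequences = {"P1": "M" * 120, "P2": "A" * 49, "P3": "G" * 50}
pairs = [("P1", "P2"), ("P1", "P3")]

kept = filter_short_proteins(sequences)
kept_pairs = filter_pairs(pairs, kept)
print(sorted(kept), kept_pairs)
```

Filtering proteins first and pairs second keeps the two rules separate, so the threshold can be changed without touching the pair logic.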



Abstract

The invention belongs to the technical field of bioinformatics and discloses a protein-protein interaction prediction method based on deep forest. The method fuses pseudo amino acid composition, a mutual information descriptor, the composition-transition-distribution descriptor, the amino acid composition position-specific scoring matrix, and the dipeptide composition position-specific scoring matrix to convert a protein sequence into a numerical vector, so that the sequence information, physicochemical property information, and evolutionary information of the protein pair together form the initial features of a sample. An elastic net is used for feature selection to remove redundant and irrelevant features, and the resulting optimal feature vector is input into a constructed multi-grained cascade deep forest to predict protein-protein interactions. The method is simple and effective, and the deep forest can represent high-level feature information of the protein pair. The results on the training and test sets are clearly superior to those of other prediction methods, and the method can provide a reference for drug target prediction and human disease treatment.
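The idea of converting each protein into a numerical vector and fusing the vectors of a pair can be sketched as follows. This is only an illustration under stated assumptions: plain sequence-based amino acid composition and dipeptide composition stand in for the descriptors named in the abstract (the PSSM-based descriptors need evolutionary profiles that are not available from the sequence alone), and the function names are hypothetical.

```python
# Illustrative sketch: encode a protein pair as one fused feature vector.
# Assumption: simple sequence-based descriptors stand in for the patent's descriptors.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = [a + b for a in AMINO_ACIDS for b in AMINO_ACIDS]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Fraction of each standard residue (20 values); other symbols are ignored."""
    seq = seq.upper()
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    total = 0
    for residue in seq:
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        dipeptide = seq[i:i + 2]
        if dipeptide in counts:
            counts[dipeptide] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


def encode_protein(seq):
    """Fuse the descriptors of one protein by concatenation (20 + 400 values)."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


def encode_pair(seq_a, seq_b):
    """Concatenate the vectors of both proteins to form one sample (840 values)."""
    return encode_protein(seq_a) + encode_protein(seq_b)


vec = encode_pair("ACAC", "GG")
print(len(vec))
```

In a fuller pipeline, vectors of this kind would be passed to a feature selection step and then to the classifier; those stages are not shown here.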

Description

Technical field

[0001] The invention belongs to the technical field of bioinformatics and relates to a protein-protein interaction prediction method based on deep forest.

Background

[0002] Protein-protein interactions (PPIs) play an important role in the structure and function of cells. Disorder of the network structure causes abnormalities in the life activities of cells. In-depth study of PPIs is therefore of great significance for understanding the life activities of cells, elucidating cell function, and preventing and treating human disease. With the advent of the post-genome era and the development of high-throughput sequencing technologies, a large number of experimentally identified PPIs have been generated. Considering that experimental identification of PPIs is resource-consuming and takes a long time, how to use machine learning-based prediction of protein-protein interactions is part...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G16B20/00, G16B5/00, G16B40/00
CPC: G16B20/00, G16B5/00, G16B40/00, Y02A90/10
Inventor: 于彬, 陈成, 张青梅, 王磊, 张岩
Owner: QINGDAO UNIV OF SCI & TECH