Method of predicting interaction between chemical compounds and proteins based on random forest

The random forest algorithm, applied in the field of computer-aided drug design, addresses the problem of low accuracy in predicting compound-protein interactions.

Active Publication Date: 2013-05-22
ZHEJIANG UNIV
2 Cites, 18 Cited by

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to propose a method, based on the random forest algorithm, for discovering interactions between compounds and proteins, so as to improve the accuracy of interaction prediction.



Examples


Embodiment Construction

[0028] To make the purpose, implementation, and advantages of the present invention clearer, the invention is described in further detail below with reference to a specific embodiment, as shown in Figure 1:

[0029] (A) Collect information on target proteins known to interact with drug compounds to build a target library.

[0030] From the DrugBank 3.0 database (C. Knox et al., Nucleic Acids Research, 2011, 39 (suppl 1), p. D1035-D1041), 4177 known target proteins that can interact with drugs are obtained together with their sequences. Based on this sequence information, the PseAAC tool (Pseudo Amino Acid Composition; see H.B. Shen & K.C. Chou, Analytical Biochemistry, 2008, 373(2), p. 386-388) is used to calculate the feature descriptors of all target proteins. Each target protein is described by 30 feature descriptors.
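As a rough illustration of what a 30-dimensional pseudo amino acid composition vector can look like, the sketch below computes 20 amino acid frequencies plus 10 sequence-order correlation terms. It is a simplified sketch, not the PseAAC tool cited above: the split into 20 + 10 components (`lam=10`), the weight `w=0.05`, the use of a single Kyte-Doolittle hydropathy scale, and the function name `pseaac` are all assumptions, since the text only states that the number of descriptors is 30.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as an assumed stand-in
# for the physicochemical properties the real PseAAC tool relies on.
KYTE_DOOLITTLE = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def _standardize(scale):
    """Rescale a property table to zero mean and unit standard deviation."""
    mu = mean(scale.values())
    sd = pstdev(scale.values())
    return {aa: (value - mu) / sd for aa, value in scale.items()}


H = _standardize(KYTE_DOOLITTLE)


def pseaac(sequence, lam=10, w=0.05):
    """Return 20 + lam descriptors for a protein sequence (hypothetical helper)."""
    # Keep only the 20 standard residues so every lookup in H succeeds.
    seq = [aa for aa in sequence.upper() if aa in H]
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam residues")

    # Conventional amino acid composition: 20 normalized frequencies.
    freq = [seq.count(aa) / n for aa in AMINO_ACIDS]

    # Sequence-order correlation factors for gaps k = 1 .. lam.
    theta = []
    for k in range(1, lam + 1):
        diffs = [(H[seq[i]] - H[seq[i + k]]) ** 2 for i in range(n - k)]
        theta.append(sum(diffs) / (n - k))

    # Joint normalization so that the whole vector sums to 1.
    denom = sum(freq) + w * sum(theta)
    return [f / denom for f in freq] + [w * t / denom for t in theta]


# Usage on an arbitrary illustrative sequence (not taken from DrugBank).
descriptors = pseaac("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(len(descriptors))
```

The correlation terms are what distinguish this representation from a plain composition vector: they retain some information about residue order while keeping the vector length fixed for proteins of any size.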

[0031] (B) Collect the drug compounds used to build the training set (that is, the d...



Abstract

The invention discloses a method for predicting interactions between chemical compounds and proteins based on random forest. The method includes: collecting information on target proteins known to interact with drug compounds to establish a target library; collecting the drug compounds used to establish a training set, together with information on the interactions between these drug compounds and the target proteins, to establish a compound library; establishing the training set from the information in the compound library and the target library; training with a modified random forest algorithm on the training set to establish a prediction model; combining the compounds to be predicted with the target protein information obtained in step (A) to establish a test set; predicting the test set with the prediction model; and (H), judging from the prediction result whether the compounds to be predicted interact with the target proteins. The method can improve the accuracy of predicting interactions between compounds and proteins.
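The workflow in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: scikit-learn's standard `RandomForestClassifier` stands in for the "modified random forest algorithm", whose modifications are not described in this excerpt; the compound descriptors (8 per compound), the synthetic labels, the 0.5 decision threshold, and all names such as `build_pairs` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two libraries (all values are made up).
proteins = {f"p{j}": rng.random(30) for j in range(10)}   # 30 descriptors each
compounds = {f"c{i}": rng.random(8) for i in range(40)}   # 8 is an assumption


def build_pairs(compound_ids, protein_ids):
    """Concatenate compound and protein descriptors for every pair."""
    pairs = [(c, p) for c in compound_ids for p in protein_ids]
    X = np.array([np.concatenate([compounds[c], proteins[p]]) for c, p in pairs])
    return pairs, X


def toy_label(c, p):
    """Synthetic interaction rule, used only to make the example trainable."""
    return int(compounds[c][0] + proteins[p][0] > 1.0)


compound_ids = sorted(compounds)
protein_ids = sorted(proteins)

# Training set: compounds with known interactions paired with all targets.
train_pairs, X_train = build_pairs(compound_ids[:30], protein_ids)
y_train = np.array([toy_label(c, p) for c, p in train_pairs])

# Train the prediction model (standard random forest as a stand-in).
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Test set: compounds to be predicted, paired with the same target proteins.
test_pairs, X_test = build_pairs(compound_ids[30:], protein_ids)

# Score each pair, then judge interaction with an assumed 0.5 threshold.
scores = model.predict_proba(X_test)[:, 1]
interacts = scores >= 0.5

for (c, p), s, flag in list(zip(test_pairs, scores, interacts))[:5]:
    print(c, p, round(float(s), 3), bool(flag))
```

Representing each compound-protein pair as one concatenated feature vector lets a single classifier cover every target in the library, which fits the many-to-many view of drug-target relationships taken in the description.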

Description

Technical field

[0001] The invention relates to the field of computer-aided drug design, in particular to a method for predicting the interaction between compounds and proteins based on the random forest algorithm.

Background art

[0002] Over the past ten years, although investment in drug research and development has increased worldwide, its output, measured as the number of drugs approved by the FDA, has declined year by year (C.R. Chong & D.J. Sullivan, Nature, 2007, 448: p. 645-646). A growing number of scholars believe that the traditional "single drug, single target" drug development model is the main reason for this result.

[0003] The goal of traditional drug R&D is to discover highly selective, highly safe drugs that act on a single target. However, it has gradually become clear that the relationship between drugs and targets is many-to-many: one drug often acts on multiple different targets, and one target often interacts with multipl...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F19/18
Inventor: 黄剑平, 范骁辉
Owner: ZHEJIANG UNIV