Drug target affinity prediction method based on deep learning

A deep-learning-based prediction technology, applied in the fields of drug references, biological neural network models, and neural architectures. It addresses the problems that existing methods ignore the binding affinity of drug-target interactions and that the accuracy of drug-target interaction prediction is therefore low, and it achieves the effects of improved fault tolerance, compression, and reduced overfitting.

Active Publication Date: 2020-01-14
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0006] 1. The drug-target interaction prediction task is modeled as a binary classification problem. This formulation ignores the binding affinity between drug and target, which leads to low accuracy in the final drug-target interaction prediction;
[0007] 2. When the drug-target interaction prediction task is reformulated as drug-target affinity prediction, existing methods can learn one-dimensional structural features of drugs and proteins, but they cannot learn the order relationship between amino acids in the target protein structure, which affects prediction accuracy.
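
The information lost by the binary formulation in item 1 can be illustrated with a small sketch. The Kd values, the pKd transform, and the threshold of 7.0 below are illustrative assumptions and are not taken from the patent text.

```python
import math


def pkd_from_kd_nm(kd_nm: float) -> float:
    """Convert a dissociation constant in nM to pKd = -log10(Kd in mol/L)."""
    # Kd [mol/L] = kd_nm * 1e-9, so -log10(Kd) = 9 - log10(kd_nm).
    return 9.0 - math.log10(kd_nm)


def binarize(pkd: float, threshold: float = 7.0) -> int:
    """Collapse a continuous affinity into an interacts / does-not-interact label.

    The threshold is a hypothetical choice made for this example only.
    """
    return 1 if pkd >= threshold else 0


# Hypothetical Kd measurements (nM) for four drug-target pairs.
kd_values_nm = [1.0, 50.0, 500.0, 10000.0]
pkd_values = [pkd_from_kd_nm(kd) for kd in kd_values_nm]
labels = [binarize(p) for p in pkd_values]

for kd, p, y in zip(kd_values_nm, pkd_values, labels):
    print(f"Kd={kd:>8.1f} nM  pKd={p:.2f}  binary label={y}")
```

The first two pairs receive the same binary label although their affinities differ by a factor of 50, which is the kind of detail a regression target on affinity keeps.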

Examples

Embodiment 1

[0044] Referring to Figures 1 and 2, the present invention provides a drug target affinity prediction method based on deep learning, comprising the following steps:

[0045] S1. Data preparation: obtain drug compound and target protein data from the Davis dataset and the KIBA dataset;

[0046] S2. Data processing: encode each compound, representing it with molecular fingerprints and generating label codes; represent each protein as a sequence using a position-specific scoring matrix;

[0047] S3. Compound feature extraction: construct a CNN model, input the label codes into the CNN model, and perform feature extraction on the compound to obtain the molecular representation of the compound;

[0048] S4. Protein feature extraction: construct an LSTM model, input the position-specific scoring matrix of the protein into the LSTM model, and perform feature extraction on the protein sequence to learn the order relationship between amino acids in the protein structu...
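
The data flow of steps S3 and S4, followed by the fully connected prediction layer mentioned in the abstract, can be sketched roughly as follows. This is a minimal forward-pass illustration in plain Python, not the patented model: all layer sizes, the random untrained weights, the embedding table, and the function names are assumptions made for the example.

```python
import math
import random

random.seed(0)


def rand_matrix(rows, cols, scale=0.1):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cnn_features(labels, embedding, filters, kernel_size):
    """Embed integer label codes, apply 1D convolution + ReLU, then global max pooling."""
    embedded = [embedding[i] for i in labels]          # shape: (length, emb_dim)
    pooled = []
    for f in filters:                                  # f holds kernel_size * emb_dim weights
        best = 0.0                                     # starting at 0 applies the ReLU floor
        for start in range(len(embedded) - kernel_size + 1):
            window = [x for row in embedded[start:start + kernel_size] for x in row]
            best = max(best, sum(w * x for w, x in zip(f, window)))
        pooled.append(best)
    return pooled


def lstm_features(pssm, w_x, w_h, bias, hidden):
    """Run a single-layer LSTM over the PSSM rows and return the final hidden state."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for row in pssm:                                   # one 20-dimensional row per residue
        z = [a + b + d for a, b, d in zip(matvec(w_x, row), matvec(w_h, h), bias)]
        i_gate = [sigmoid(v) for v in z[0:hidden]]
        f_gate = [sigmoid(v) for v in z[hidden:2 * hidden]]
        g_cand = [math.tanh(v) for v in z[2 * hidden:3 * hidden]]
        o_gate = [sigmoid(v) for v in z[3 * hidden:4 * hidden]]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f_gate, c, i_gate, g_cand)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o_gate, c)]
    return h


def predict_affinity(labels, pssm, params):
    """Concatenate both representations and apply one fully connected output unit."""
    drug_vec = cnn_features(labels, params["embedding"], params["filters"], params["kernel_size"])
    prot_vec = lstm_features(pssm, params["w_x"], params["w_h"], params["bias"], params["hidden"])
    joint = drug_vec + prot_vec
    return sum(w * x for w, x in zip(params["w_out"], joint)) + params["b_out"]


# Hypothetical sizes chosen only to keep the example small.
VOCAB, EMB, N_FILTERS, KERNEL, HIDDEN = 64, 8, 4, 3, 6
params = {
    "embedding": rand_matrix(VOCAB, EMB),
    "filters": rand_matrix(N_FILTERS, KERNEL * EMB),
    "kernel_size": KERNEL,
    "w_x": rand_matrix(4 * HIDDEN, 20),
    "w_h": rand_matrix(4 * HIDDEN, HIDDEN),
    "bias": [0.0] * (4 * HIDDEN),
    "hidden": HIDDEN,
    "w_out": [random.uniform(-0.1, 0.1) for _ in range(N_FILTERS + HIDDEN)],
    "b_out": 0.0,
}

toy_labels = [1, 2, 3, 2, 1]                           # hypothetical compound label codes
toy_pssm = [[1.0 / 20] * 20 for _ in range(7)]         # 7 residues, uniform rows
score = predict_affinity(toy_labels, toy_pssm, params)
print(score)
```

Because the weights are random, the printed score carries no meaning; in practice the parameters would be fitted by regression against measured affinities such as those in the Davis and KIBA datasets.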

Embodiment 2

[0057] For step S2 in Embodiment 1, encoding the compound specifically includes: expressing the chemical structure of each compound as a string of ASCII characters through the SMILES code of the molecule, where each ASCII character represents a substructural feature of the compound.

[0058] In this embodiment, the .mol format files that store the chemical structure information are downloaded from the TCMSP database, Openbabel is used to process the .mol files and compute the SMILES molecular structure specification of each compound, and the get.fingerprint function of the "rcdk" package in the R language is then used to compute the SMILES code of the drug molecules.

[0059] Example of the SMILES representation of a compound: carbon dioxide is written as 'O=C=O'. Compounds are thus represented by letters and symbols. To make them easier for the algorithm to process, the letters and symbols are converted into numerical form: integers are used to represent letters an...
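
The conversion of SMILES characters into integers can be sketched as below. The specific integer assignment, the reserved padding value 0, the `max_len` parameter, and the function names are assumptions for illustration; the source does not state the actual mapping.

```python
def build_vocab(smiles_list):
    """Map every distinct character to a positive integer; 0 is reserved for padding."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: idx + 1 for idx, ch in enumerate(chars)}


def encode(smiles, vocab, max_len):
    """Turn a SMILES string into a fixed-length list of integer label codes."""
    codes = [vocab[ch] for ch in smiles[:max_len]]     # truncate long strings
    return codes + [0] * (max_len - len(codes))        # pad short strings with 0


# Two small molecules: carbon dioxide and ethanol.
vocab = build_vocab(["O=C=O", "CCO"])
encoded = encode("O=C=O", vocab, max_len=8)
print(vocab)     # {'=': 1, 'C': 2, 'O': 3}
print(encoded)   # [3, 1, 2, 1, 3, 0, 0, 0]
```

This sketch works one character at a time, so two-letter atom symbols such as "Cl" would be split into two codes; a fuller tokenizer would need to treat them as single symbols.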

Embodiment 3

[0061] For step S2 in Embodiment 1: more than 30 amino acids can be obtained from the hydrolysis of natural proteins, of which 20 are the basic amino acids, and each protein is generally composed of these 20 common amino acids. The position-specific scoring matrix (PSSM) can therefore be represented as an n×20 matrix M = {M_{i→j}, i = 1...n, j = 1...20}. The matrix element M_{i→j} indicates the likelihood that the amino acid at the i-th position of the sequence mutates into amino acid j during evolution; the larger the value, the more likely the substitution. n is the total number of residues in the given protein sequence.

[0062] In this embodiment, the PSSM is obtained by using the PSI-BLAST software to perform multiple sequence alignment of amino acid sequences against the nr database (non-redundant protein database) in order to find homologous sequences. The formal definition is as follows:

[0063]

[0064] The numbers in each row in the PSSM add up to 1.
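
The shape of such a matrix and the row-sum property can be illustrated with a toy frequency profile built from a small alignment. This is not the PSI-BLAST computation, which involves iterative search and sequence weighting; the alignment, the pseudocount, and the function name are assumptions made for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def frequency_profile(alignment, pseudocount=1.0):
    """Return an n x 20 matrix of per-position amino acid frequencies.

    Row i describes position i of the aligned sequences; column j corresponds
    to AMINO_ACIDS[j]. Each row is normalized so that it sums to 1.
    """
    n = len(alignment[0])
    profile = []
    for pos in range(n):
        counts = [pseudocount] * 20
        for seq in alignment:
            idx = AMINO_ACIDS.find(seq[pos])
            if idx >= 0:                    # skip gaps and non-standard symbols
                counts[idx] += 1.0
        total = sum(counts)
        profile.append([c / total for c in counts])
    return profile


toy_alignment = ["MKV", "MRV", "MKI"]       # hypothetical aligned homologous sequences
profile = frequency_profile(toy_alignment)

print(len(profile), len(profile[0]))        # n rows, 20 columns
print([round(sum(row), 6) for row in profile])
```

Each row of the resulting matrix can then be fed to a sequence model as the 20-dimensional feature vector of one residue.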

Abstract

The invention discloses a drug target affinity prediction method based on deep learning and relates to the technical field of drug target affinity prediction. The method comprises the steps of: obtaining drug compound and target protein data from the Davis dataset and the KIBA dataset; encoding the compounds, and representing the proteins using a position-specific scoring matrix; inputting the compound label codes into a CNN model and performing feature extraction on the compound to obtain the molecular representation of the compound; inputting the position-specific scoring matrix of the protein into an LSTM model, performing feature extraction on the protein sequence, and learning the order relationship between amino acids in the protein structure and the relationship between residues in the protein sequence to obtain the sequence representation of the protein; and inputting the molecular representation of the compound together with the sequence representation of the protein into a fully connected layer to predict the affinity of the interaction between the compound and the protein. The method can predict the affinity relationship between a drug and its target more accurately.

Description

Technical field

[0001] The present invention relates to the technical field of drug target affinity prediction, and in particular to a drug target affinity prediction method based on deep learning.

Background

[0002] A drug target is the binding site between a drug and a biomacromolecule of the body; drug targets include receptors, enzymes, ion channels, transporters, the immune system, genes, and so on. Most drug molecules produce their therapeutic effect by interacting with target molecules in the human body, so target selection is a critical step in drug development. The discovery of a new drug target is often the breakthrough that leads to a new drug. Drug-target interaction (DTI) prediction is an important part of the drug discovery process. With the development of bioinformatics and the continuous expansion of public datasets, it has become possible to use different computational methods to predict drug-target interactions, which ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16H70/40, G06N3/04
CPC: G16H70/40, G06N3/045, Y02A90/10
Inventors: 李巧勤, 刘勇国, 杨尚明, 李杨, 兰荻, 蔡茁
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA