Sampling learning based protein-ligand binding site prediction method

A protein-ligand binding site prediction technology, applied in the fields of instruments, computing, and electrical digital data processing, that addresses problems such as poor interpretability, insufficient consideration of the differences between samples, and poor generality, so as to improve prediction accuracy, fairness, rationality, and interpretability.

Active Publication Date: 2015-10-21
NANJING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0006] Existing protein-ligand binding site prediction methods lack generality because they cover only a limited range of ligand types, and they do not fully consider the differences between the samples to be predicted, which leaves a large gap between their prediction accuracy and the needs of practical application. Due...

Examples

Embodiment Construction

[0017] In order to better understand the technical content of the present invention, the present invention will be further described below in conjunction with the accompanying drawings.

[0018] Figure 1 gives a schematic diagram of the system structure of the prediction method of the present invention. As shown in Figure 1, according to an embodiment of the present invention, a method for predicting protein-ligand binding sites based on sampling learning includes the following steps:

[0019] First, use the PSI-BLAST and PSIPRED programs to obtain, respectively, the evolutionary information matrix (Position Specific Scoring Matrix, PSSM) and the secondary structure prediction probability matrix (Predicted Secondary Structure, PSS) of the training protein. Construct a feature vector for each amino acid residue from the evolutionary information matrix and from the secondary structure prediction probability matrix, and then serially combine (concatenate) the feature vectors of these two kinds of information to obtain the final feature vector...
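The feature construction in paragraph [0019] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the PSSM is assumed to be an L x 20 matrix and the PSS an L x 3 matrix (one row per residue), the window size of 3 is a toy value, zero-padding at the chain termini is an assumption, and the names `window_features` and `residue_features` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def window_features(matrix: Matrix, window: int) -> Matrix:
    """Flatten a sliding window of rows centred on each residue.

    Positions that fall outside the chain are filled with zero rows
    (an assumption; the source text does not state the padding rule).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it is centred on a residue")
    half = window // 2
    zero_row = [0.0] * len(matrix[0])
    features = []
    for i in range(len(matrix)):
        vec: List[float] = []
        for j in range(i - half, i + half + 1):
            row = matrix[j] if 0 <= j < len(matrix) else zero_row
            vec.extend(row)
        features.append(vec)
    return features


def residue_features(pssm: Matrix, pss: Matrix, window: int = 3) -> Matrix:
    """Serially combine (concatenate) PSSM and PSS window features per residue."""
    if len(pssm) != len(pss):
        raise ValueError("PSSM and PSS must describe the same residues")
    pssm_feats = window_features(pssm, window)
    pss_feats = window_features(pss, window)
    return [a + b for a, b in zip(pssm_feats, pss_feats)]


# Toy protein with 4 residues; the values are placeholders, not real profiles.
pssm = [[float(i)] * 20 for i in range(1, 5)]
pss = [[0.1, 0.2, 0.7] for _ in range(4)]
feats = residue_features(pssm, pss, window=3)
print(len(feats), len(feats[0]))
```

With a window of w residues, each sample has w x 20 + w x 3 features, so the toy call above yields 69 values per residue. Real PSSM and PSS matrices would be parsed from PSI-BLAST and PSIPRED output files, which is outside the scope of this sketch.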

Abstract

The invention provides a sampling-learning-based method for predicting protein-ligand binding sites. The method comprises the following steps: first, the PSI-BLAST and PSIPRED programs are used to obtain the evolutionary information and secondary structure information of a protein, and a sliding-window technique is used to extract the features of each amino acid residue (sample); second, random down-sampling is applied to the non-binding-site samples, and the resulting non-binding-site sample subset and the binding-site sample set are used to train an SVM that predicts all to-be-predicted samples; third, according to the feature information of each to-be-predicted sample, KNN dynamic sampling learning is applied to the binding-site samples and the non-binding-site samples separately, and the sampled binding-site and non-binding-site subsets are combined to train a specific SVM for predicting that sample; finally, a threshold-based integration technique is used to integrate the two trained SVMs.

The method has the following advantages: first, random down-sampling and KNN dynamic sampling learning effectively reduce the size of the training sets and speed up model training; second, KNN dynamic sampling learning trains a different SVM model for each to-be-predicted sample and thereby takes the differences among the to-be-predicted samples into account; third, SVM integration effectively reduces the information loss caused by sampling learning and improves the prediction precision of the model.
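The two sampling steps named in the abstract can be sketched as follows. This is a hedged illustration only: Euclidean distance for the KNN search, the neighbour counts, the fixed random seed, and the function names `random_downsample`, `knn_sample`, and `dynamic_training_set` are all assumptions, since the abstract does not specify them. SVM training and the threshold-based integration rule are omitted because their details are not given in the available text.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def random_downsample(negatives: List[Vector], n_keep: int, seed: int = 0) -> List[Vector]:
    """Randomly keep n_keep non-binding-site samples, without replacement."""
    if n_keep >= len(negatives):
        return list(negatives)
    return random.Random(seed).sample(negatives, n_keep)


def knn_sample(query: Vector, samples: List[Vector], k: int) -> List[Vector]:
    """Return the k samples closest to the query (Euclidean distance, assumed)."""
    return sorted(samples, key=lambda s: math.dist(query, s))[:k]


def dynamic_training_set(
    query: Vector,
    positives: List[Vector],
    negatives: List[Vector],
    k_pos: int,
    k_neg: int,
) -> Tuple[List[Vector], List[int]]:
    """Build a query-specific training set by sampling each class separately."""
    pos = knn_sample(query, positives, k_pos)
    neg = knn_sample(query, negatives, k_neg)
    return pos + neg, [1] * len(pos) + [0] * len(neg)


# Toy 2-D feature vectors; real samples would be sliding-window feature vectors.
positives = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
negatives = [(float(i), 0.0) for i in range(10)]
query = (0.9, 0.9)

# Training set for the general SVM: all positives plus a random negative subset.
subset = random_downsample(negatives, n_keep=4, seed=1)

# Training set for the query-specific SVM: nearest neighbours from each class.
X, y = dynamic_training_set(query, positives, negatives, k_pos=2, k_neg=3)
print(X, y)
```

In the method as described, the random subset together with all binding-site samples would train one general SVM, while the query-specific set would train a separate SVM for each sample to be predicted. Both training sets are much smaller than the full data, which is the source of the training speed-up claimed in the abstract.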

Description

Technical field

[0001] The present invention relates to the bioinformatics field of protein-ligand binding site prediction, and in particular to a method for predicting protein-ligand binding sites based on sampling learning, specifically a high-precision method that uses random down-sampling, KNN dynamic sampling learning, and a support vector machine ensemble strategy.

Background technique

[0002] In life activities, ligands large and small play an indispensable role, for example adenosine triphosphate (ATP) and vitamins. Among them, ATP is an important biological molecule that is of great significance for membrane transport, muscle contraction, signal transmission, cell movement, DNA replication and transcription, and other life activities. Most of these ligands interact with proteins through protein-ligand binding sites, and perform various biochemical functions by means of pro...

Application Information

IPC(8): G06F19/18, G06F19/24
Inventor: 胡俊, 何雪, 李阳, 於东军, 沈红斌, 杨静宇
Owner: NANJING UNIV OF SCI & TECH