The invention provides a sampling-learning-based method for predicting protein-ligand binding sites. The method comprises the steps of: firstly, using the PSI-BLAST and PSIPRED programs to obtain evolutionary information and secondary structure information of a protein, and using a sliding-window technique to extract features of each
amino acid residue (each residue being treated as a sample); secondly, randomly down-sampling the non-binding site samples, and using the resulting non-binding site sample subsets together with the binding site sample set to train an SVM for predicting all to-be-predicted samples; thirdly, for each to-be-predicted sample, using a KNN dynamic sampling learning technique, based on the feature information of that sample, to sample the binding site samples and the non-binding site samples separately, and combining the sampled binding site sample subsets and non-binding site sample subsets to train a sample-specific SVM for predicting that sample; and finally, using a threshold-based integration technique to integrate the two trained SVMs. The method has the following advantages: firstly, random down-sampling and KNN dynamic sampling learning effectively reduce the size of the training sets and speed up model training; secondly, KNN dynamic sampling learning can
train a different SVM model for each to-be-predicted sample and thus effectively account for the differences among the to-be-predicted samples; and thirdly, the SVM integration technique effectively reduces the information loss caused by sampling learning and improves the prediction precision of the model.
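
By way of illustration only, the sampling and integration steps described above might be sketched in Python as follows. The function names (window_features, random_downsample, knn_subset, integrate), the zero padding at the sequence ends, the Euclidean distance, and the threshold rule centred on 0.5 are assumptions introduced for this sketch and are not specified in the abstract; the SVM training itself is omitted.

```python
import math
import random


def window_features(per_residue, half_width):
    """Concatenate the feature rows in a window centred on each residue.

    Positions that fall outside the sequence are zero-padded (an assumption).
    """
    dim = len(per_residue[0])
    pad = [0.0] * dim
    out = []
    for i in range(len(per_residue)):
        row = []
        for j in range(i - half_width, i + half_width + 1):
            row.extend(per_residue[j] if 0 <= j < len(per_residue) else pad)
        out.append(row)
    return out


def random_downsample(samples, size, seed=0):
    """Draw `size` samples without replacement (all of them if fewer exist)."""
    rng = random.Random(seed)
    return rng.sample(samples, min(size, len(samples)))


def knn_subset(query, samples, k):
    """Return the k samples closest to `query` (Euclidean distance, assumed)."""
    return sorted(samples, key=lambda s: math.dist(query, s))[:k]


def integrate(global_score, specific_score, threshold=0.3):
    """Hypothetical threshold rule for combining the two SVM outputs.

    Keep the score of the first SVM when it lies at least `threshold` away
    from 0.5; otherwise fall back to the sample-specific SVM's score.
    """
    if abs(global_score - 0.5) >= threshold:
        return global_score
    return specific_score
```

In such a sketch, the output of random_downsample combined with the binding site samples would form the training set of the first SVM, while knn_subset would be called once per to-be-predicted sample on each class to build the training set of the second, sample-specific SVM.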