Granular support vector machine ensemble-based protein ligand binding site prediction method

A technology combining support vector machines with protein-ligand analysis, applied in the field of protein ligand binding site prediction based on a granular support vector machine ensemble. It addresses class imbalance and information redundancy in datasets, enhances generalization ability, improves prediction accuracy, and helps prevent overfitting.

Status: Inactive; Publication date: 2017-09-22
NANJING UNIV OF SCI & TECH
Cites: 5; cited by: 8

AI Technical Summary

Problems solved by technology

[0004] However, class imbalance in datasets is unavoidable for machine learning-based methods: the number of ligand-binding residues (positive samples) is much smaller than that of non...
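Step 2 of the abstract counters this imbalance by sampling the training set with the idea of granular computing to produce several sub-training sets. The granulation rule itself is not given in this excerpt, so the sketch below is only an assumption: it splits the negative (majority) residues into disjoint granules roughly the size of the positive set and pairs each granule with all positive residues, yielding balanced subsets on which separate SVMs could be trained. The function name `granular_subsets` and the `(sample, label)` representation are hypothetical.

```python
import random

def granular_subsets(positives, negatives, seed=0):
    """Return balanced sub-training sets (hypothetical granulation rule).

    positives, negatives: lists of (sample, label) pairs.
    """
    rng = random.Random(seed)
    neg = list(negatives)
    rng.shuffle(neg)
    # Number of granules k ~ |negatives| / |positives| (at least 1).
    k = max(1, len(neg) // max(1, len(positives)))
    # Deal the shuffled negatives round-robin into k disjoint granules.
    granules = [neg[i::k] for i in range(k)]
    # Each sub-training set = all positives + one granule of negatives.
    return [list(positives) + g for g in granules]
```

For example, 30 positive and 300 negative residues give 10 subsets of 60 samples each, every negative residue appearing in exactly one subset.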


Examples


Embodiment Construction

[0022] The present invention is further described below in conjunction with the accompanying drawings and specific embodiments.

[0023] As shown in Figure 1, according to a preferred embodiment of the present invention, the protein ligand binding site prediction method based on a granular support vector machine ensemble predicts binding sites for a protein sequence to be predicted or queried (hereinafter, the query input q). The method has five steps: the first four form the model training stage and the fifth is the prediction stage. Each step is described in detail below with reference to Figure 1.

[0024] Step 1: extract features from the evolutionary information and secondary structure of existing protein sequences, represent each amino acid residue in a sequence as a feature vector, and construct a training sample set with residues (sites) as the units. For any g...
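A common way to turn per-residue evolutionary and secondary-structure information into fixed-length feature vectors is a sliding window over the sequence. The sketch below assumes each residue already has a row of numeric features (for example 20 PSSM scores followed by 3 secondary-structure probabilities); the window size, the zero padding at the sequence ends, and the names `residue_rows` and `window_features` are illustrative assumptions, not details taken from the patent.

```python
def residue_rows(pssm, ss):
    """Concatenate per-residue evolutionary rows with secondary-structure rows."""
    if len(pssm) != len(ss):
        raise ValueError("pssm and ss must cover the same residues")
    return [list(p) + list(s) for p, s in zip(pssm, ss)]

def window_features(profile, window=5):
    """One feature vector per residue: the rows of the `window` residues
    centred on it, concatenated, with zero rows past either sequence end."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    zero = [0.0] * len(profile[0])
    feats = []
    for i in range(len(profile)):
        vec = []
        for j in range(i - half, i + half + 1):
            # Out-of-range neighbours contribute a zero row (padding).
            vec.extend(profile[j] if 0 <= j < len(profile) else zero)
        feats.append(vec)
    return feats
```

Each residue then yields a vector of length `window * row_length`, labelled as a positive sample if it is a ligand-binding residue and negative otherwise.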


Abstract

The invention discloses a protein ligand binding site prediction method based on a granular support vector machine ensemble. The method comprises the following steps: (1) extracting features from the evolutionary information and secondary structure of protein sequences, representing the amino acid residues in each sequence as feature vectors, and constructing a training sample set with residues (sites) as units; (2) sampling the training sample set using the idea of granular computing to generate multiple sub-training sample sets; (3) training a support vector machine (SVM) model on each sub-training sample set, the resulting SVMs forming an SVM ensemble; (4) integrating the models in the SVM ensemble with the AdaBoost algorithm to obtain an integrated SVM model; and (5) for a given query sequence, generating feature vectors for all of its residues with the same feature extraction method. For each residue sample, the integrated SVM model produces an original prediction, which a simple post-processing technique then turns into the final prediction. The method achieves high prediction precision and good generalization capability.
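Step (4) combines the SVM ensemble with AdaBoost. The exact formulation is not given here, so the sketch below assumes one plausible variant: the SVMs are already trained, and AdaBoost, run on a labelled validation set, repeatedly picks the member with the lowest weighted error and gives it the weight alpha = 0.5 ln((1 - err) / err), then reweights the samples it got wrong upward. Only each member's ±1 predictions are needed, so the SVMs themselves are omitted; `adaboost_select` and `ensemble_predict` are hypothetical names.

```python
import math

def adaboost_select(member_preds, labels, rounds):
    """AdaBoost over a fixed pool of pre-trained classifiers (assumed variant).

    member_preds[m][i] in {-1, +1}: prediction of member m on sample i.
    labels[i] in {-1, +1}. Returns a list of (member_index, alpha).
    """
    n = len(labels)
    w = [1.0 / n] * n  # uniform initial sample weights
    chosen = []
    for _ in range(rounds):
        # Weighted error of every member under the current sample weights.
        errs = [sum(wi for wi, p, y in zip(w, preds, labels) if p != y)
                for preds in member_preds]
        m = min(range(len(member_preds)), key=errs.__getitem__)
        err = min(max(errs[m], 1e-10), 1 - 1e-10)  # clip to avoid log(0)
        if err >= 0.5:
            break  # no member beats chance on the reweighted data
        alpha = 0.5 * math.log((1 - err) / err)
        chosen.append((m, alpha))
        # Up-weight misclassified samples, down-weight correct ones.
        w = [wi * math.exp(-alpha * y * p)
             for wi, p, y in zip(w, member_preds[m], labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return chosen

def ensemble_predict(chosen, member_outputs):
    """Weighted vote; member_outputs[m] in {-1, +1} for one query residue."""
    score = sum(alpha * member_outputs[m] for m, alpha in chosen)
    return 1 if score >= 0 else -1
```

In this variant a member can be picked in more than one round, in which case its weights simply add up in the vote; the post-processing step of the abstract is not modelled.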

Description

Technical field

[0001] The invention relates to protein-ligand interaction in bioinformatics, and in particular to a protein ligand binding site prediction method based on a granular support vector machine ensemble.

Background

[0002] In the life activities of cells, proteins often bind other molecules (ligands) to take part in various biological processes. Accurate identification of protein ligand binding sites helps in understanding protein function and in designing new drugs. However, traditional biochemical identification methods are time-consuming and costly and cannot meet the urgent needs of related research. Over recent decades, researchers in this field have therefore proposed many efficient computational methods for identifying protein ligand binding sites, including template-based methods and machine learning-based methods.

[0003] Machine learning-based methods are among the most commonly used methods for pr...


Application Information

IPC(8): G06F19/22, G06F19/24
CPC: G16B30/00, G16B40/00
Inventors: 於东军 (Yu Dongjun), 朱一亨 (Zhu Yiheng)
Owner NANJING UNIV OF SCI & TECH