Zinc-binding protein action site prediction method based on integrated learning in non-equilibrium mode

An integrated (ensemble) learning prediction technology, applied at the intersection of proteomics and computer science, that can solve problems such as data imbalance and low precision without considering

Active Publication Date: 2019-02-12
JINLING INST OF TECH

AI Technical Summary

Problems solved by technology

At present, the existing prediction methods use data mining and other methods to establish classification models, treat the two types of samples



Examples


Embodiment Construction

[0040] The present invention can be better understood from the following examples.

[0041] The overall process of the present invention is shown in Figure 1.

[0042] To address the problem of predicting zinc-binding protein action sites on an unbalanced data set, the present invention uses a down-sampling technique to balance the data so that the data tend to be stable. A probabilistic neural network classifier model based on a support vector machine and sample weighting is constructed using ensemble (integrated) learning, and the model is used to classify and identify zinc-binding protein action sites. The specific implementation steps are as follows:

[0043] 1. Balancing

[0044] The zinc-binding protein action sites are called small-class samples (negative class samples); the non-binding protein action sites are called large-class samples (positive class samples). Random down-sampling without replacement is performed on the large-class samples, and in order to avoid the loss of...
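The balancing step can be illustrated with a minimal sketch. It assumes that each balanced subset keeps every small-class sample and pairs it with an equally sized random draw, without replacement, from the large-class samples, and that the subsets are drawn independently of one another. The function name `balanced_subsets`, its parameters, and the equal-size ratio are hypothetical choices for illustration; the source text is truncated before it states these details.

```python
import random


def balanced_subsets(small_class, large_class, n_subsets, seed=0):
    """Build several balanced subsets by random down-sampling.

    Assumption (not stated in the source): every subset keeps all
    small-class samples and draws the same number of large-class
    samples without replacement.
    """
    if len(small_class) > len(large_class):
        raise ValueError("large_class must be at least as big as small_class")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    k = len(small_class)
    subsets = []
    for _ in range(n_subsets):
        # random.sample never repeats an element within one draw
        drawn = rng.sample(large_class, k)
        subsets.append((list(small_class), drawn))
    return subsets


# Toy data: 5 small-class samples against 50 large-class samples.
small = list(range(5))
large = list(range(100, 150))
subsets = balanced_subsets(small, large, n_subsets=3, seed=1)
for kept, drawn in subsets:
    print(len(kept), len(drawn))
```

Drawing several subsets rather than one is one way to reduce the information discarded from the large class, since different subsets can contain different large-class samples.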


Abstract

The invention discloses a zinc-binding protein action site prediction method based on integrated (ensemble) learning in a non-equilibrium (imbalanced) mode. Protein source data are pre-processed according to the characteristics of zinc-binding protein action sites. The imbalance of the zinc-binding protein site data is balanced by random down-sampling to obtain several balanced sub-data sets. For these sub-data sets, discriminative biochemical characteristics of the proteins are selected to form feature vectors. The feature vectors are taken as the input of a base classifier, a support vector machine, to calculate the sample weights, and a probabilistic neural network model based on the sample weights is constructed. Finally, the base classifier support vector machine and the sample-weighted probabilistic neural network model are integrated to obtain a prediction model, which is used to identify the zinc-binding protein action sites of a target sample.
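As a rough illustration of the sample-weighted probabilistic neural network mentioned above, the following sketch scores each class by a weighted sum of Gaussian kernels centred on the training samples. It is not the patented model: the Gaussian kernel, the smoothing parameter `sigma`, the per-class normalisation, and the function name `weighted_pnn_predict` are all assumptions, and the sample weights are simply passed in, because the excerpt does not state how the support vector machine produces them.

```python
import math


def weighted_pnn_predict(x, train_X, train_y, weights, sigma=1.0):
    """Classify x with a sample-weighted Parzen-window (PNN-style) rule.

    Hypothetical sketch: each training sample contributes a Gaussian
    kernel scaled by its weight; the class with the largest
    weight-normalised kernel sum wins.
    """
    kernel_sum = {}
    weight_sum = {}
    for xi, yi, wi in zip(train_X, train_y, weights):
        dist2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        k = math.exp(-dist2 / (2.0 * sigma ** 2))
        kernel_sum[yi] = kernel_sum.get(yi, 0.0) + wi * k
        weight_sum[yi] = weight_sum.get(yi, 0.0) + wi
    # Normalise by total class weight; skip classes whose weights sum to zero.
    density = {c: kernel_sum[c] / weight_sum[c]
               for c in kernel_sum if weight_sum[c] > 0}
    if not density:
        raise ValueError("no class has positive total weight")
    return max(density, key=density.get)


# Toy data: two well-separated clusters with uniform weights.
train_X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
train_y = [0, 0, 1, 1]
weights = [1.0, 1.0, 1.0, 1.0]
label_a = weighted_pnn_predict((0.2, 0.3), train_X, train_y, weights)
label_b = weighted_pnn_predict((5.0, 5.5), train_X, train_y, weights)
print(label_a, label_b)
```

In an ensemble setting, one such model could be fitted per balanced sub-data set and the outputs combined with those of the base classifier; the combination rule is not given in this excerpt.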

Description

Technical field

[0001] The invention relates to a zinc-binding protein action site prediction method based on integrated (ensemble) learning in an unbalanced mode. It aims to identify zinc-binding protein action sites by using an ensemble learning classification model under unbalanced classification, and belongs to the cross field of proteomics and computer science.

Background technique

[0002] With the completion of the Human Genome Project, life science has entered the post-genome era, and the proteins expressed by genes have become one of the important research topics in the life sciences and natural sciences. Protein is the basic organic matter that constitutes cells and the material basis of life, and it plays a decisive role in biological processes. However, this decisive role is not determined by a single protein alone. In most cases, a protein needs to interact with other proteins or ligands to complete specific biological functions.

[0003] In cel...


Application Information

IPC(8): G16B40/00, G16B5/00
Inventor: 李慧
Owner: JINLING INST OF TECH