Method and system for classification modeling based on protein length and DCNN

A classification modeling method and system based on protein length and a deep convolutional neural network (DCNN), applied in the field of protein prediction and analysis.

Pending Publication Date: 2019-10-08
QILU UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0007] The technical task of the present invention is to address the above deficiencies by providing a classification modeling method and system based on protein length and DCNN, so that deep learning can be used to predict and analyze protein secondary structure with improved accuracy.


Examples

Embodiment 1

[0072] The classification modeling method based on protein length and DCNN of the present invention comprises the following steps:

[0073] Step 1: Obtain multiple data sets as the training set, where each data set includes multiple proteins. Extract the PSSM features generated by PSI-Blast for the data sets, and convert the format of the PSSM features using sliding windows of different sizes;

[0074] Step 2: Group the proteins in the training set by protein length to obtain multiple model groups;

[0075] Step 3: For each model group, construct a corresponding prediction model based on a deep convolutional network, and train that prediction model on the model group to obtain a trained prediction model.
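Steps 2 and 3 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the length boundaries in `LENGTH_BOUNDS`, the helper names, and the `train_fn` placeholder that stands in for DCNN training are all assumptions made for illustration, since the excerpt does not state the actual cut-offs or network details.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical length boundaries; the source does not state the actual cut-offs.
LENGTH_BOUNDS = [100, 200, 300]


def group_index(length, bounds=LENGTH_BOUNDS):
    """Return the index of the length group for a protein of this length.

    With bounds [100, 200, 300]: lengths below 100 map to group 0,
    100-199 to group 1, 200-299 to group 2, and 300 or more to group 3.
    """
    return bisect_right(bounds, length)


def group_by_length(proteins, bounds=LENGTH_BOUNDS):
    """Step 2: split (protein_id, sequence) pairs into model groups by length."""
    groups = defaultdict(list)
    for protein_id, sequence in proteins:
        groups[group_index(len(sequence), bounds)].append((protein_id, sequence))
    return dict(groups)


def train_group_models(groups, train_fn):
    """Step 3: train one model per group; train_fn is a stand-in for DCNN training."""
    return {idx: train_fn(members) for idx, members in groups.items()}


# Toy data: four artificial sequences of different lengths.
proteins = [("p1", "A" * 80), ("p2", "C" * 150), ("p3", "D" * 100), ("p4", "E" * 420)]
groups = group_by_length(proteins)
# The lambda only records the group size; a real train_fn would fit a network.
models = train_group_models(groups, train_fn=lambda members: {"n_train": len(members)})
```

At prediction time, the same `group_index` function could route a new protein to the model trained on proteins of similar length.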

[0076] The data set selected in Step 1 is a classic data set for protein secondary structure prediction. In this embodiment, the data set AstraCull, which combines Astrall and CullPDB data and contains 15666 proteins, is used a...

Embodiment 2

[0112] The classification modeling system based on protein length and DCNN of the present invention includes an input module, a format conversion module, a grouping module and a model training module.

[0113] The input module obtains multiple data sets as the training set, where each data set includes multiple proteins. The selected data set is a classic data set for protein secondary structure prediction. In this embodiment, Astrall and CullPDB data were combined into the data set AstraCull, which contains 15666 proteins.

[0114] The format conversion module extracts the PSSM features generated by PSI-Blast for the data set and converts their format using a sliding window. Specifically, the 20-dimensional PSSM features generated by PSI-Blast for the above data set are extracted, and after format conversion through a sliding window of size 13, the feature of each amino acid in the training set is a...
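The sliding-window conversion can be sketched as follows. This is a hedged illustration only: the function name, the toy PSSM, and the zero padding at the sequence termini are assumptions, since the excerpt does not say how positions near the ends of a protein are handled. With a window of 13 over 20 PSSM columns, each residue is represented by a 13 x 20 block under these assumptions.

```python
def sliding_window_features(pssm, window=13):
    """Turn an L x 20 PSSM into L blocks of `window` x 20 values, one per residue.

    Each block is centred on its residue and holds the PSSM rows of the
    residue and its neighbours on both sides.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it centres on a residue")
    half = window // 2
    n_cols = len(pssm[0]) if pssm else 0
    # Assumption: rows of zeros pad the positions that fall beyond the termini.
    pad_row = [0.0] * n_cols
    padded = [pad_row] * half + [list(row) for row in pssm] + [pad_row] * half
    return [padded[i:i + window] for i in range(len(pssm))]


# Toy 30-residue PSSM: row i is filled with the value i + 1 in all 20 columns.
pssm = [[float(i)] * 20 for i in range(1, 31)]
windows = sliding_window_features(pssm, window=13)
```

Using a different `window` value gives the "different sliding windows" variant mentioned in Step 1 of Embodiment 1.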

Abstract

The invention discloses a method and system for classification modeling based on protein length and DCNN, and belongs to the field of protein prediction and analysis. The technical problem to be solved is how to use deep learning to predict and analyze the secondary structure of proteins and improve accuracy. The method comprises the following steps: taking a plurality of large data sets as a training set, extracting the PSSM features generated by PSI-Blast for the data sets, and performing format conversion on the PSSM features through a sliding window; grouping the proteins in the training set by protein length to obtain a plurality of model groups; and constructing a prediction model for each model group based on a deep convolutional network, and training each prediction model on its model group to obtain trained prediction models. The system includes an input module, a format conversion module, a grouping module and a model training module.

Description

Technical field

[0001] The invention relates to the field of protein prediction and analysis, in particular to a classification modeling method and system based on protein length and DCNN.

Background

[0002] The study of protein properties is of great significance to bioinformatics. Generally speaking, new discoveries about proteins can also lead to new discoveries about human life. In particular, the secondary structure of a protein helps to reveal its three-dimensional structure and can provide functional annotations, so protein secondary structure is a topic worthy of in-depth research. After 66 years of development, the accuracy rate of protein secondary structure prediction has exceeded 80%.

[0003] For information technology disciplines, the main goal is to explore and improve prediction accuracy, that is, to design a prediction mechanism using existing technology, and when any new protein is ...

Application Information

IPC(8): G16B15/00
CPC: G16B15/00
Inventor: 刘毅慧, 朱树平
Owner: QILU UNIV OF TECH