Protein solubility prediction method based on combined machine learning model

A machine learning prediction method applied in the fields of artificial intelligence and protein engineering. It addresses the insufficient accuracy of existing protein solubility prediction, and achieves high prediction accuracy, lower experimental costs, and improved efficiency.

Pending Publication Date: 2022-06-03
河南省健康元生物医药研究院有限公司

AI Technical Summary

Problems solved by technology

The accuracy of existing protein solubility prediction needs to be improved. How to improve the accuracy of predicting protein solubility in E. coli from the protein amino acid sequence has therefore become an urgent problem to be solved.



Examples


Embodiment Construction

[0029] In order to make the technical means, creative features, achievement goals and effects realized by the present invention easy to understand, the present invention will be further described below with reference to the specific embodiments.

[0030] In the description of the present invention, it should be noted that the orientations or positional relationships indicated by terms such as "upper", "lower", "inner", "outer", "front end", "rear end", "two ends", "one end" and "the other end" are based on the orientations or positional relationships shown in the accompanying drawings. They are used only for the convenience of describing the present invention and simplifying the description, and do not indicate or imply that the indicated device or element must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be construed as limiting the invention. Furthermore, the terms "first" and "...


Abstract

The invention discloses a protein solubility prediction method based on a combined machine learning model. The method comprises the following steps: extracting the amino acid sequences of proteins expressed in Escherichia coli, together with their solubility-related records, from the TargetTrack database and using them as a training data set; training a convolutional neural network model that takes the protein amino acid sequence as input and outputs the probability that the protein is soluble; extracting the secondary structure in which each amino acid of each protein in the training data set is located, and training a support vector machine model on these and other sequence-related features, the output of which is likewise the probability that the protein is soluble; calculating, from the training data set, the linear combination coefficient of the trained convolutional neural network model and the support vector machine model, and thereby determining the final combined model; and inputting the sequence of a protein whose solubility is to be predicted into the combined model, which outputs the corresponding probability that the protein is soluble. The method achieves high prediction accuracy and improves the efficiency of scientific research and production.
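The final combination step can be illustrated with a minimal sketch. The abstract does not say how the linear combination coefficient is calculated, so the grid search over a single weight `w` that maximizes training accuracy is an assumption made here for illustration. The function names (`combine`, `accuracy`, `fit_weight`), the 0.5 decision threshold, and the toy probabilities are likewise hypothetical; in practice the two probability lists would come from the trained convolutional neural network and support vector machine.

```python
from typing import List, Sequence, Tuple


def combine(p_cnn: float, p_svm: float, w: float) -> float:
    """Linearly combine two soluble-probability estimates with weight w in [0, 1]."""
    return w * p_cnn + (1.0 - w) * p_svm


def accuracy(probs: Sequence[float], labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """Fraction of proteins whose thresholded probability matches the 0/1 label."""
    correct = sum(int(p >= threshold) == y for p, y in zip(probs, labels))
    return correct / len(labels)


def fit_weight(cnn_probs: Sequence[float], svm_probs: Sequence[float],
               labels: Sequence[int], steps: int = 100) -> Tuple[float, float]:
    """Grid-search the weight w that maximizes accuracy on the training set.

    Assumption: the patent abstract only states that the coefficient is
    calculated from the training data; grid search is one simple option.
    Returns (best_w, best_accuracy); ties keep the smallest w.
    """
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        combined: List[float] = [combine(a, b, w)
                                 for a, b in zip(cnn_probs, svm_probs)]
        acc = accuracy(combined, labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy, made-up numbers (not from the patent): each model alone gets two of
# the four proteins wrong, but a weighted mix can classify all four correctly.
toy_cnn = [0.9, 0.3, 0.1, 0.6]
toy_svm = [0.4, 0.8, 0.6, 0.2]
toy_labels = [1, 1, 0, 0]  # 1 = soluble, 0 = insoluble

w, acc = fit_weight(toy_cnn, toy_svm, toy_labels)
# Combined soluble probability for a new protein, given hypothetical
# per-model outputs of 0.7 (CNN) and 0.5 (SVM).
p_new = combine(0.7, 0.5, w)
```

Fitting a single weight keeps the combined output a valid probability, since it is a convex combination of two values in [0, 1]. A held-out validation split, rather than the training set itself, would usually give a less optimistic estimate of the weight, but the abstract describes using the training data set.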

Description

Technical field

[0001] The invention belongs to the technical field of artificial intelligence and protein engineering, and relates to a machine learning prediction method for protein solubility, in particular to a method for predicting the solubility of a protein in Escherichia coli from its primary amino acid sequence using a combined machine learning model.

Background

[0002] Solubility is one of the important properties of proteins, and it is closely related to the application of proteins in academic and industrial fields such as enzyme engineering, synthetic biology, and structural biology. The "Protein Structure Initiative (PSI)" project pools the efforts of a large number of structural biology laboratories around the world to solve the structures of more than 300,000 protein sequences. In this project, the insoluble expression of the protein in the expression system made it impossible for a large number of protein sequences to be pu...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B30/00; G16B40/00; G06N3/04
CPC: G16B30/00; G16B40/00; G06N3/045
Inventor: 范子灵, 梁恒宇, 周鹏, 韩超, 陈民良, 幸志伟, 马英宁, 张一平, 张皓
Owner: 河南省健康元生物医药研究院有限公司