Random forest-based sucrose transporter identification method

The invention applies random forest technology to sucrose transporters. It falls under character and pattern recognition, machine learning, computer components, and related fields, and addresses the problem of sucrose transporter classification.

Pending Publication Date: 2022-01-07
NORTHEAST FORESTRY UNIVERSITY

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to classify a new protein, the sucrose transporter, and to propose a machine learning method, specifically one based on random forest and k-separated-bigrams-PSSM, that solves the problem of sucrose transporter classification.
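To make the k-separated-bigrams-PSSM feature concrete, here is a minimal sketch. The source does not spell out the formula, so this assumes the commonly used definition: for a PSSM with L rows and 20 columns, the feature for the column pair (m, n) is the sum over i of P[i][m] * P[i+k][n], which gives 400 values per protein. The function name, the plain list-of-lists input, and the default k are illustrative assumptions, and any normalization of the PSSM is assumed to happen beforehand.

```python
def k_separated_bigrams(pssm, k=1):
    """Return the flattened k-separated bigram features of a PSSM.

    pssm: list of L rows, one per residue, each holding one score per
          amino-acid column (20 columns for a real PSSM).
    k:    separation between the two residue positions (assumed parameter).
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    # features[m][n] accumulates P[i][m] * P[i + k][n] over all valid i.
    features = [[0.0] * n_cols for _ in range(n_cols)]
    for i in range(length - k):
        row_a = pssm[i]
        row_b = pssm[i + k]
        for m in range(n_cols):
            for n in range(n_cols):
                features[m][n] += row_a[m] * row_b[n]
    # Flatten row by row: 20 x 20 -> 400 values for a real PSSM.
    return [value for row in features for value in row]
```

The output length depends only on the number of columns, not on the protein length, so proteins of different lengths map to vectors of the same size that a random forest can consume.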

Examples

Embodiment Construction

[0037] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained from them by persons of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

[0038] Figure 1 is a flow diagram of the implementation of the present invention. As shown in Figure 1, the method includes:

[0039] 1. Collect the data sets required for the experiment and preprocess them:

[0040] UniProt is an authoritative protein sequence database. Searching it for the keyword "sucrose transporter" yields the initial positive data set. Through the Pfam protein family, the family c...
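The abstract describes the preprocessing as deleting sequences that contain non-standard letters and sequences that are too short. A minimal sketch of that filter follows, assuming the downloaded data is FASTA text. The function names and the `min_length` default of 50 are hypothetical, since the source does not state the length threshold. The 60% similarity filter is a separate redundancy-removal step and is not shown.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_sequences(records, min_length=50):
    """Keep sequences made only of standard residues and long enough.

    min_length is an assumed placeholder, not a value from the source.
    """
    kept = []
    for header, seq in records:
        if len(seq) < min_length:
            continue  # too short
        if not set(seq) <= STANDARD_AA:
            continue  # contains a non-standard letter such as X, B, Z, or U
        kept.append((header, seq))
    return kept
```

For example, `filter_sequences(parse_fasta(open("positives.fasta").read()))` would return the cleaned records, where `positives.fasta` is a hypothetical file name.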


Abstract

The invention relates to a random forest-based method for identifying sucrose transporters, comprising the following steps. First, initial data are acquired from a protein database and preprocessed: sequences containing non-standard letters and sequences that are too short are deleted, as are sequences with similarity greater than 60%. Next, different features are extracted according to the physicochemical properties and evolutionary information of the proteins, and each individual feature and the combined features are used as feature input. Because the numbers of positive and negative samples differ greatly, the data set is then oversampled. Finally, under ten-fold cross-validation, a random forest, a support vector machine, stochastic gradient descent, and naive Bayes are trained on the oversampled training set and tested on a test set, and the results are analyzed. The method combines k-separated-bigrams-PSSM with random forest and introduces the Borderline-SMOTE algorithm, which addresses the data imbalance problem and effectively improves the identification accuracy for sucrose transporters.
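The oversampling step can be illustrated with a simplified, pure-Python sketch in the style of Borderline-SMOTE1. It is not the inventors' implementation. It uses brute-force neighbor search, and the function names, the defaults for `k` and `seed`, and the choice to draw borderline samples at random until the classes are balanced are all assumptions made for illustration.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(idx, points, candidates, k):
    """Indices of the k candidates closest to points[idx], excluding idx."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: _sq_dist(points[idx], points[j]))
    return others[:k]


def borderline_smote(X, y, minority=1, k=5, n_new=None, seed=0):
    """Return synthetic minority samples generated near the class border."""
    rng = random.Random(seed)
    all_idx = list(range(len(X)))
    min_idx = [i for i in all_idx if y[i] == minority]

    # Step 1: a minority sample is "in danger" when at least half, but not
    # all, of its k nearest neighbors belong to the majority class.
    danger = []
    for i in min_idx:
        neigh = _nearest(i, X, all_idx, k)
        n_major = sum(1 for j in neigh if y[j] != minority)
        if len(neigh) / 2 <= n_major < len(neigh):
            danger.append(i)

    # Default: create enough samples to balance the two classes.
    if n_new is None:
        n_new = (len(X) - len(min_idx)) - len(min_idx)

    synthetic = []
    if not danger or len(min_idx) < 2:
        return synthetic

    # Step 2: interpolate between a danger sample and one of its nearest
    # minority neighbors at a random position along the connecting segment.
    while len(synthetic) < n_new:
        i = rng.choice(danger)
        j = rng.choice(_nearest(i, X, min_idx, k))
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic
```

Consistent with the abstract, the synthetic samples would be appended only to the training portion of each cross-validation fold, so that the test portion contains real sequences only.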

Description

Technical field:

[0001] The invention relates to the field of protein classification, in particular to a method for identifying sucrose transporters based on random forests.

Background technique:

[0002] The sucrose transporter is a transmembrane protein widely present in plants. It performs important physiological functions in sucrose transport and sucrose-specific signal induction, and is of great research value for plant growth and yield. Traditional biochemical techniques require the preparation of a large amount of experimental supplies, and the detection process is complex, time-consuming, and labor-intensive. Increasingly, researchers use machine learning methods to classify proteins.

[0003] Classifying proteins first requires obtaining a benchmark data set. The original data set then needs to be preprocessed to ensure reliability. Next, feature extraction is performed on the sequence, mainly including amino acid composition, physical and...
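As an illustration of the simplest sequence feature named above, amino acid composition, the following sketch computes the fraction of each of the 20 standard residues in a sequence. The function name and the alphabetical residue ordering are assumptions for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical


def amino_acid_composition(seq):
    """Return a 20-value vector of residue frequencies for one sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Each entry is the count of one residue divided by the sequence length.
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
```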


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06V10/764, G06V10/774, G06K9/62, G06N20/00
CPC: G06N20/00, G06F18/24323, G06F18/214
Inventor: 陈宇, 李赛
Owner: NORTHEAST FORESTRY UNIVERSITY