RNA binding protein recognition method based on multi-view depth features and multi-label learning

A deep-feature-based technique applied in the field of RNA-binding protein recognition

Active Publication Date: 2020-07-24
JIANGNAN UNIV

AI Technical Summary

Problems solved by technology

[0004] Many existing methods use machine learning models to identify RBP binding sites from RNA sequences. Their main focus is to use the sequence features or str...

Examples

Embodiment 1

[0075] Following the training-phase procedure, this embodiment uses the RNA-RBP binding data of the AURA2 dataset. The dataset contains 67 RBPs, 73681 RNA sequences and 550386 binding-site records, as shown in Table 1. The number of RNA samples bound by each RBP varies greatly. Because the RNA sequences differ in length, a uniform length of 2700 was stipulated, and shorter sequences were padded with the placeholder base B. Table 2 compares the method of the present invention, iDeepMV, with current advanced methods in this field.
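
The fixed-length step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the truncation of sequences longer than 2700, and the one-hot encoding in which the placeholder base B maps to an all-zero row are assumptions that the text does not confirm.

```python
FIXED_LENGTH = 2700  # uniform length stated in the text
PAD_BASE = "B"       # placeholder base stated in the text
ALPHABET = "ACGU"    # assumed RNA alphabet


def pad_sequence(seq, length=FIXED_LENGTH, pad=PAD_BASE):
    """Right-pad seq with the placeholder base up to the given length.

    Truncating sequences longer than the target length is an assumption.
    """
    seq = seq.upper().replace("T", "U")
    return seq[:length].ljust(length, pad)


def one_hot(seq):
    """Encode each base as a 4-element row (hypothetical encoding).

    The placeholder base matches no letter of ALPHABET, so it becomes
    an all-zero row.
    """
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


padded = pad_sequence("AUGGCU")
encoded = one_hot(padded)
print(len(padded), padded[:8], encoded[0], encoded[-1])
```

Mapping the padding symbol to zeros keeps the padded tail from contributing signal to a convolutional layer; other choices, such as a fifth channel for B, would also be consistent with the text.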

[0078] Table 2 Performance indices of the algorithm in Embodiment 1

[0080] Here, RNA perspective-, amino acid perspective-, dipeptide perspective- and voting results- denote the predictions of the per-view neural networks in the iDeepMV method without multi-label classifier training, together with the result of voting over them, RNA...
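
How the per-view predictions might be combined by voting can be sketched as below. The text does not state the voting rule, so simple majority voting over binary label vectors is an assumption, and the function and variable names are hypothetical.

```python
def majority_vote(view_predictions):
    """Combine per-view binary label vectors by majority vote (assumed rule).

    view_predictions: list of equal-length lists of 0/1, one list per view.
    A label is set to 1 when more than half of the views predict 1.
    """
    n_views = len(view_predictions)
    return [1 if 2 * sum(votes) > n_views else 0 for votes in zip(*view_predictions)]


# Toy predictions for one RNA over four hypothetical RBP labels.
rna_view = [1, 0, 1, 0]
amino_acid_view = [1, 1, 0, 0]
dipeptide_view = [0, 1, 1, 0]

print(majority_vote([rna_view, amino_acid_view, dipeptide_view]))
```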

Embodiment 2

[0083] To assess the prediction accuracy of the method of the present invention at the level of individual RBPs, Table 3 reports the prediction performance on different RBPs of the method used in this experiment and of the advanced methods in this field.

[0084] Table 3 Prediction effects of different RBPs

[0088] The abscissa of the three graphs in Figure 13 is the number of samples of the different RBPs, and the ordinates are the precision, recall and F1-score, respectively. For all three methods, each index rises gradually and then flattens as the number of samples increases. When the number of samples is below 5000, the indices fluctuate strongly, because some classes have too few samples for the model to learn their deep features well. And from the comparison of the three curves, the learning ability o...
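
The per-RBP precision, recall and F1-score plotted against sample count can be computed along these lines. This is a sketch under the assumption that ground truth and predictions are binary matrices with one column per RBP; the function name and the toy data are illustrative and are not taken from the patent.

```python
def per_label_scores(y_true, y_pred):
    """Return (precision, recall, f1, n_positive) for each label column.

    y_true, y_pred: lists of equal-length 0/1 rows, one row per RNA sample.
    Undefined ratios (zero denominator) are reported as 0.0.
    """
    scores = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1, tp + fn))
    return scores


# Toy data: four RNA samples, two hypothetical RBP labels.
y_true = [[1, 0], [1, 1], [0, 1], [1, 0]]
y_pred = [[1, 0], [0, 1], [0, 1], [1, 1]]

# Sorting by the number of positive samples gives the x-axis ordering used in the figure.
for precision, recall, f1, n in sorted(per_label_scores(y_true, y_pred), key=lambda s: s[3]):
    print(n, round(precision, 3), round(recall, 3), round(f1, 3))
```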

Abstract

The invention belongs to the field of intelligent cell biological recognition, and relates to an RNA binding protein recognition method based on multi-view deep features and multi-label learning. The method comprises a training stage and a use stage. The training stage comprises initial multi-view data construction, training of a deep multi-view feature extraction model, and training of a multi-label classifier. The initial multi-view data construction comprises the following steps: the original RNA sequence is converted into an amino acid sequence and a dipeptide composition using principles of molecular biology and statistics, giving the amino acid sequence features and the dipeptide composition features; these are then combined with the original RNA sequence to form the initial multi-view features, on which a model is built. Starting from the initial multi-view data, the method uses a CNN for deep learning to construct deep multi-view features. Compared with the original multi-view features, the multi-view features extracted from the deep features have a smaller data dimension and a better classification performance.
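
The construction of the amino acid and dipeptide views can be illustrated with the sketch below. It assumes that the "molecular biology principle" is translation with the standard genetic code and that the "statistics principle" is the relative frequency of the 400 possible dipeptides; the reading frame starting at position 0, the skipping of stop codons, and all function names are assumptions rather than details given in the abstract.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered with U, C, A, G at each position; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
RESIDUES = "ACDEFGHIKLMNPQRSTVWY"


def translate(rna):
    """Translate codon by codon from position 0 (assumed reading frame).

    Stop codons and codons containing unknown bases are skipped (assumption).
    """
    rna = rna.upper().replace("T", "U")
    codons = (rna[i:i + 3] for i in range(0, len(rna) - 2, 3))
    return "".join(aa for aa in (CODON_TABLE.get(c, "") for c in codons) if aa != "*")


def dipeptide_composition(protein):
    """Return a 400-element vector of relative dipeptide frequencies."""
    pairs = [protein[i:i + 2] for i in range(len(protein) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return [counts[a + b] / total if total else 0.0 for a, b in product(RESIDUES, repeat=2)]


rna = "AUGGCUUUUAUGGCU"
protein = translate(rna)
composition = dipeptide_composition(protein)
# The three initial views of one sample: RNA sequence, amino acid sequence, dipeptide composition.
print(rna, protein, len(composition))
```

The dipeptide vector has a fixed length of 400 regardless of sequence length, which is one reason such a composition is convenient as a separate view alongside the two sequence views.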

Description

Technical field

[0001] The invention belongs to the field of intelligent cell biological recognition, and relates to an RNA-binding protein recognition method based on multi-view deep features and multi-label learning.

Background technique

[0002] RNA (ribonucleic acid) is a carrier of genetic information found in biological cells and in some viruses and viroids. In living organisms it mainly regulates the expression of coding genes, and it also serves as the template for protein synthesis after gene transcription. It is an indispensable component of life. For an RNA to perform its function properly, it generally needs to be mediated by an RNA binding protein (RBP). The lack of a certain RBP may therefore leave a certain type of RNA unable to perform its regulatory or translation functions, so that the organism lacks some important proteins or certain proteins proliferate abnormally, impairing its functions. ...

Application Information

IPC(8): G16B5/00, G16B20/00, G06N3/04
CPC: G16B5/00, G16B20/00, G06N3/045, Y02A90/10
Inventor: 邓赵红, 杨海涛, 吴敬, 王蕾, 王士同
Owner: JIANGNAN UNIV