Method for predicting RNA coding potential

A method for predicting RNA coding potential, applied in the field of RNA coding potential prediction. It addresses problems such as low prediction accuracy and over-fitting risk, and achieves high accuracy with good generality across species and reduced species dependence.

Active Publication Date: 2019-04-09
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

The invention addresses the technical problems of low prediction accuracy and over-fitting risk.



Examples


Embodiment

[0058] CPPred was tested on data from human, mouse, zebrafish and Saccharomyces cerevisiae, and its results were compared with those of the existing tools CPAT, CPC2, PLEK and sORF finder.

[0059] Table 1 and Table 2 compare the prediction performance of the different tools on the human test set (which includes both long and short sequences) and on the human sORF test set. On both test sets, CPPred performs better than CPAT and CPC2 but slightly worse than PLEK. The gap to PLEK arises because PLEK's training set is redundant with the human test set.

[0060] Table 1: Comparison of CPPred with CPAT, CPC2 and PLEK on the human test set

[0061]

[0062] Table 2: Comparison of CPPred with CPAT, CPC2, PLEK and sORF finder on the human sORF test set

[0063]



Abstract

The invention belongs to the field of gene annotation and in particular relates to a method for predicting RNA coding potential. The method (named CPPred) comprises the following steps: integrating multiple sequence features, in particular using CTD to describe the global distribution of the RNA sequence; selecting an optimal feature subset as the feature vector, using the redundancy and relevance among the candidate features as criteria in combination with an incremental feature selection method; building a prediction model with a support vector machine (SVM); and finally obtaining the prediction result from the feature vector of the RNA sequence to be predicted. When predicting long RNA sequences, the method is comparable to existing methods (accuracy of 90% or higher); when predicting short RNA sequences, it is clearly better than existing methods.
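The abstract does not spell out how the CTD descriptors are computed. The following is a minimal sketch, assuming CTD denotes composition, transition and distribution descriptors over the four nucleotides; the function name `ctd_features`, the feature keys and the quartile convention are hypothetical choices for illustration, not the patent's stated definition.

```python
from itertools import combinations

ALPHABET = "ACGT"

def ctd_features(seq):
    """Return composition, transition and distribution descriptors (assumed encoding)."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA spellings alike
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    feats = {}

    # Composition: fraction of the sequence made up of each nucleotide.
    for base in ALPHABET:
        feats["C_" + base] = seq.count(base) / n

    # Transition: fraction of adjacent pairs that switch between two
    # given nucleotides, in either order.
    for a, b in combinations(ALPHABET, 2):
        switches = sum(1 for x, y in zip(seq, seq[1:]) if {x, y} == {a, b})
        feats["T_" + a + b] = switches / (n - 1)

    # Distribution: relative position of the first, 25%, 50%, 75% and last
    # occurrence of each nucleotide (0.0 if the nucleotide is absent).
    labels = ("first", "q25", "q50", "q75", "last")
    for base in ALPHABET:
        positions = [i + 1 for i, x in enumerate(seq) if x == base]
        k = len(positions)
        for q, label in enumerate(labels):
            if k == 0:
                value = 0.0
            else:
                idx = max(0, (q * k + 3) // 4 - 1)  # ceil(q*k/4) - 1, clamped at 0
                value = positions[idx] / n
            feats["D_" + base + "_" + label] = value
    return feats
```

Under these assumptions each sequence yields 30 values (4 composition, 6 transition, 20 distribution), which could then be concatenated with the other sequence features before feature selection and SVM training.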

Description

Technical field

[0001] The invention belongs to the field of gene annotation, and more specifically relates to a method for predicting RNA coding potential.

Background

[0002] In recent years, next-generation sequencing technologies have generated tens of thousands of new transcripts, so quickly and accurately distinguishing coding RNAs from non-coding RNAs (ncRNAs) has become key to analyzing these data. Although ncRNAs do not encode proteins, they have important biological functions in organisms, such as gene regulation, gene silencing, and RNA modification and processing.

[0003] In the field of coding potential prediction, the coding potential assessment tool CPAT, which uses an alignment-free logistic regression model, has been disclosed. It uses 4 sequence features: open reading frame length, open reading frame coverage, Fickett score and hexamer score. CPC2 has also been disclosed in this field; it likewise uses only 4 sequence features: the leng...
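Two of the features named in paragraph [0003], open reading frame length and open reading frame coverage, can be illustrated with a short sketch. It assumes an ORF runs from an ATG to the first in-frame stop codon on the forward strand, and that coverage is ORF length divided by transcript length; these definitions and the function names `longest_orf` and `orf_features` are illustrative assumptions, not the exact implementation of CPAT, CPC2 or CPPred.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Return (start_index, length_in_nt) of the longest forward-strand ORF.

    Returns (-1, 0) when no complete ATG...stop ORF is found.
    """
    seq = seq.upper().replace("U", "T")
    best_start, best_len = -1, 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start  # length includes the stop codon
                if length > best_len:
                    best_start, best_len = start, length
                start = None  # look for the next ORF in this frame
    return best_start, best_len

def orf_features(seq):
    """Return ORF length and ORF coverage (ORF length / transcript length)."""
    _, length = longest_orf(seq)
    coverage = length / len(seq) if seq else 0.0
    return {"orf_length": length, "orf_coverage": coverage}
```

For example, `orf_features("CCAUGAAAUAGGG")` finds the ORF `ATGAAATAG` starting at index 2, giving a length of 9 nucleotides and a coverage of 9/13.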


Application Information

IPC(8): G16B50/00
Inventor 刘士勇, 童晓雪
Owner HUAZHONG UNIV OF SCI & TECH