Circular RNA identification method based on machine learning strategy

A machine-learning-based identification method, applied in the field of data science, that addresses the low sensitivity and accuracy of existing circular RNA identification, improving classification performance and detection accuracy while saving cost.

Active Publication Date: 2022-02-22
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

A variety of computational methods for identifying circular RNAs from RNA-seq data have been proposed, but these methods generally suffer from low identification sensitivity and low accuracy.



Embodiment Construction

[0045] Referring to Figure 1, the present invention provides a circular RNA identification method based on a machine learning strategy, comprising the following steps:

[0046] S1. Input data

[0047] The inputs are the SAM file generated from RNA-seq data and the gene annotation FASTA file; the required input formats are SAM and FASTA. Run an existing circular RNA identification algorithm to obtain the candidate circular RNA set, and determine the breakpoint positions of each candidate circular RNA;

[0048] Run an existing circular RNA identification algorithm to output a set of candidate circular RNAs, with positions given in reference genome numbering. The position of the 5' splice site of a circular RNA is defined as the left breakpoint, denoted brk1; the position of the 3' splice site is defined as the right breakpoint, denoted brk2; together the two represent a candidate circular RNA. Sort the circular RNAs in the candidate circular RNA set acc...
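As an illustration of step S1, the sketch below represents a candidate by its two breakpoints and counts SAM alignments that start near each one. It is a minimal sketch, not the patented procedure: the `Candidate` class, the `reads_near_breakpoints` function, the `window` parameter and the choice of "reads starting near a breakpoint" as a feature are all assumptions made for illustration. Only the SAM column layout (FLAG in column 2, RNAME in column 3, 1-based POS in column 4, flag bit 0x4 for unmapped reads) comes from the SAM format itself.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate circular RNA."""
    chrom: str  # reference sequence name (RNAME column in SAM)
    brk1: int   # left breakpoint: position of the 5' splice site
    brk2: int   # right breakpoint: position of the 3' splice site


def reads_near_breakpoints(sam_lines: Iterable[str],
                           cand: Candidate,
                           window: int = 10) -> List[int]:
    """Return [n_left, n_right]: mapped reads on cand.chrom whose leftmost
    aligned position lies within `window` bases of brk1 and of brk2."""
    counts = [0, 0]
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # SAM has 11 mandatory columns
            continue
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4 or rname != cand.chrom:  # unmapped or other reference
            continue
        if abs(pos - cand.brk1) <= window:
            counts[0] += 1
        if abs(pos - cand.brk2) <= window:
            counts[1] += 1
    return counts
```

In this sketch a candidate is fully identified by `(chrom, brk1, brk2)`, mirroring the statement that the two breakpoints jointly represent a candidate circular RNA; a real feature set would likely use more than start positions.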


Abstract

The invention discloses a circular RNA identification method based on a machine learning strategy. The method takes the input data, locates each candidate circular RNA on a reference genome, and extracts features of the reads near these circular RNA regions; uses the extracted features to train a supervised machine learning model; and uses the trained model to separate true positives from false positives in the candidate circular RNA set, outputting the final circular RNAs. The invention belongs to the class of machine learning filtering strategies, shares the advantages of such strategies, and can significantly save cost and time in clinical practice.
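The train-then-filter idea in the abstract can be sketched as follows. The source does not name the supervised model, so logistic regression trained by stochastic gradient descent is used here purely as a stand-in. The function names, the learning rate, the epoch count and the 0.5 threshold are all assumptions, and the feature vectors are expected to be small numeric values of the reader's own choosing.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability that feature vector x is a true positive; w[-1] is the bias."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 1000) -> List[float]:
    """Fit logistic regression by plain SGD (stand-in for the unspecified model)."""
    w = [0.0] * (len(X[0]) + 1)           # feature weights plus a bias term
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi   # gradient of the log loss w.r.t. z
            for j, xj in enumerate(xi):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w


def filter_candidates(w: Sequence[float], candidates: Sequence,
                      features: Sequence[Sequence[float]],
                      threshold: float = 0.5) -> list:
    """Keep candidates whose predicted true-positive probability meets threshold."""
    return [c for c, x in zip(candidates, features)
            if predict_proba(w, x) >= threshold]
```

Here `train_logistic` would be fitted on candidates with known true or false positive labels, and `filter_candidates` then applied to the full candidate set to produce the final output list.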

Description

Technical field

[0001] The invention belongs to the technical field of data science, and in particular relates to a circular RNA identification method based on a machine learning strategy.

Background technique

[0002] Circular RNA (circRNA) is an important member of the non-coding RNA (ncRNA) family. A circRNA is a non-coding RNA molecule with a closed circular structure, lacking both a 5' cap and a 3' poly(A) tail. Its existence was discovered as early as the 1970s, but owing to the limits of technology and knowledge at the time, circular RNA was long considered to be the result of splicing errors or transcriptional noise. In recent years, with the deepening of research and the development of sequencing technology, for the first time in 2012, RNA sequencing (English name: RNA sequencing, Eng...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B40/00, G16B30/00, G06N20/00
CPC: G16B40/00, G16B30/00, G06N20/00
Inventor: 张选平, 王一丹, 王嘉寅
Owner: XI AN JIAOTONG UNIV