LightGBM-based circRNA identification method

A multi-feature fusion and sequence-based technique, applied in the field of bioinformatics, that addresses problems such as a high false positive rate, high time and equipment costs, and the resulting negative effect on circRNA functional identification, and that improves algorithm stability.

Pending Publication Date: 2019-09-17
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

circRNA has become a relatively important potential biomarker in recent years. The most important task before experimentally investigating the function of a circRNA is to identify it, but experimental methods with high reliability are too time-consuming and expensive, which is not conducive to identifying circRNAs in large batches.
At the same time, traditional identification tools suffer from problems such as a high false positive rate and a low overlap rate between the results of different methods.
These problems further hamper the functional identification of circRNAs.


Examples

Embodiment Construction

[0023] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described below in conjunction with the embodiments and accompanying drawings.

[0024] Refer to Figure 1, the flowchart of the LightGBM-based circRNA identification method in this embodiment. The main steps of the technical solution adopted by the present invention are:

[0025] S1. Import the circRNAs of the large data sample, comprising positive and negative samples, as a (.bed) file that contains the chromosome number, the sequence start site, and the positive/negative strand marker.
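As a minimal sketch of what reading such a file in step S1 might look like, the following parses BED text into records. It assumes the standard six-column BED layout (chrom, start, end, name, score, strand) with a 0-based start; the patent text only names the chromosome, start site, and strand, so the `end` column and the names `BedRecord` and `parse_bed` are assumptions made for illustration.

```python
from typing import List, NamedTuple


class BedRecord(NamedTuple):
    """One candidate circRNA region (hypothetical container, not from the patent)."""
    chrom: str
    start: int   # 0-based, inclusive (standard BED convention)
    end: int     # exclusive (assumed column; not named in the patent text)
    strand: str  # "+" or "-"


def parse_bed(text: str) -> List[BedRecord]:
    """Parse tab-separated BED text, skipping blank and header lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 BED columns: {line!r}")
        strand = fields[5]
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand value: {strand!r}")
        records.append(BedRecord(fields[0], int(fields[1]), int(fields[2]), strand))
    return records


# Toy input; the coordinates are made up for the example.
example = "chr1\t100\t250\tcirc_1\t0\t+\nchr2\t5\t40\tcirc_2\t0\t-\n"
records = parse_bed(example)
```

Positive and negative samples could be loaded from separate files with the same function and labeled afterwards.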

[0026] S2. Map the circRNA (.bed) file to the whole human genome (hg19 version) according to the relevant information, such as the start site, to obtain the specific circRNA sequence information as a (.fasta) file.
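The mapping in step S2 can be illustrated with a small sketch that slices each region out of a genome and writes FASTA text. It assumes 0-based half-open coordinates and that minus-strand regions are reverse-complemented; the patent does not spell out these details. A toy in-memory dictionary stands in for hg19, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def extract_sequence(genome: dict, chrom: str, start: int, end: int, strand: str) -> str:
    """Slice [start, end) from the chromosome; flip it for the minus strand."""
    seq = genome[chrom][start:end]
    return reverse_complement(seq) if strand == "-" else seq


def to_fasta(regions, genome: dict) -> str:
    """Render regions as FASTA text with a 'chrom:start-end(strand)' header."""
    lines = []
    for chrom, start, end, strand in regions:
        lines.append(f">{chrom}:{start}-{end}({strand})")
        lines.append(extract_sequence(genome, chrom, start, end, strand))
    return "\n".join(lines) + "\n"


# Toy stand-in for a reference genome; a real run would load hg19 sequences.
toy_genome = {"chr1": "ACGTACGTAC", "chr2": "GGGTTTAAAC"}
regions = [("chr1", 2, 6, "+"), ("chr2", 3, 8, "-")]
fasta_text = to_fasta(regions, toy_genome)
```

For a genome the size of hg19, an indexed FASTA reader would be preferable to holding whole chromosomes in a dictionary.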

[0027] S3. A feature fusion algorithm is proposed to extract relevant features based on potential circRNA seq...

Abstract

To overcome defects in the prior art, the invention identifies circRNA by means of the LightGBM method. The method comprises the following main steps:
(1) inputting the circRNA of a big-data sample as a (.bed) file, which contains the chromosome number, the sequence start site, and the positive/negative strand marker;
(2) mapping the circRNA (.bed) file to the whole human genome (hg19 version) according to the relevant information, such as the start site, and obtaining a circRNA sequence information (.fasta) file;
(3) proposing a feature fusion algorithm for extracting the features related to the circRNA formation process;
(4) according to the LightGBM algorithm, performing sample-data sampling and feature sampling by means of GOSS (Gradient-based One-Side Sampling) and EFB (Exclusive Feature Bundling), respectively; mapping the continuous features into discrete buckets by means of the histogram-based algorithm to form multiple bins; and building a histogram from those bins, that is, discretizing the continuous variables; and
(5) adjusting parameters such as the maximum tree depth max_depth, the minimum number of records in a leaf min_data_in_leaf, and the proportion of data used in each iteration bagging_fraction, thereby obtaining the optimal model parameters.
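The discretization described in step (4) can be sketched as follows. This is a simplified illustration using equal-width bins, not LightGBM's actual bin-construction routine, which chooses its boundaries differently; `n_bins` and all function names are hypothetical. It shows the idea of replacing each continuous value with a bucket index and accumulating per-bin statistics that a tree learner could scan for split points.

```python
from bisect import bisect_right
from typing import List, Tuple


def bin_edges(values: List[float], n_bins: int) -> List[float]:
    """Interior boundaries of n_bins equal-width buckets (a simplification)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return []
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def to_bins(values: List[float], edges: List[float]) -> List[int]:
    """Map each continuous value to the index of the bucket it falls in."""
    return [bisect_right(edges, v) for v in values]


def build_histogram(bin_ids: List[int], gradients: List[float],
                    n_bins: int) -> Tuple[List[int], List[float]]:
    """Accumulate the sample count and gradient sum of every bucket."""
    counts = [0] * n_bins
    grad_sums = [0.0] * n_bins
    for b, g in zip(bin_ids, gradients):
        counts[b] += 1
        grad_sums[b] += g
    return counts, grad_sums


# Made-up feature values and gradients, purely for illustration.
values = [0.0, 1.0, 2.0, 3.0, 4.0, 8.0]
gradients = [0.5, -0.5, 1.0, 1.0, -2.0, 0.25]
edges = bin_edges(values, n_bins=4)
bin_ids = to_bins(values, edges)
counts, grad_sums = build_histogram(bin_ids, gradients, n_bins=4)
```

Once the histogram exists, candidate splits only need to be evaluated at bin boundaries rather than at every distinct feature value, which is where the speed-up of the histogram-based approach comes from.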

Description

Technical field

[0001] The invention relates to the technical field of bioinformatics, in particular to the field of circRNA identification methods.

Background

[0002] circRNAs have multiple biological functions: they are rich in miRNA binding sites and act as miRNA sponges in cells; they regulate protein activity by binding to proteins; and some circRNAs can even be translated into proteins. circRNA has therefore become a relatively important potential biomarker in recent years. The most important task before experimentally investigating the function of a circRNA is to identify it. However, experimental methods with high reliability are time-consuming and require costly equipment, which is not conducive to identifying circRNAs in large quantities. At the same time, traditional identification tools suffer from problems such as a high false positive rate and a low overlap rate between the results of different methods. These problems further hamper the functional identification of circRNAs. ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B20/00
CPC: G16B20/00
Inventor: 邓怡云, 曾甜, 戴宪华
Owner: SUN YAT SEN UNIV