A Sequence Feature Analysis Method for Predicting miRNA Target Genes

A sequence feature analysis method for predicting miRNA target genes, applied in the fields of molecular biology and bioinformatics. It addresses the problem of low specificity and achieves the effects of balancing differences, good prediction accuracy, and fast training and prediction.

Active Publication Date: 2019-04-05
SYSU CMU SHUNDE INT JOINT RES INST +2

AI Technical Summary

Problems solved by technology

[0005] In recent years, many studies have adopted commonly used features such as the minimum free energy of duplex formation between the miRNA and the target site, the number of paired bases in the miRNA seed region, target site conservation, and target site accessibility. Prediction based on these features has the disadvantage of low specificity.


Examples


Embodiment 1

[0061] Embodiment 1 Experimental method

[0062] 1. Experimental environment

[0063] Experimental instrument: ASUS N551JM computer

[0064] Programming software: Anaconda3 Spyder, Visual Studio 2013

[0065] Programming languages: Python 3.5, C++

[0066] 2. Positive and negative samples and their forms

[0067] The positive samples are taken from the CLASH experimental data set, 18,514 records in total. Each record contains the following information: miRNA name, miRNA sequence, name of the mRNA to which the target site belongs (taken from the ENSEMBL database), start position of the target site on the mRNA, end position of the target site on the mRNA, and the sequence of the target site.
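The six fields listed above can be held in a small record type. The following is a minimal sketch only: the class name, field names, and the values in the usage note are hypothetical placeholders, and the coordinate convention of the start and end positions is not stated in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClashRecord:
    """One positive sample. All field names are illustrative assumptions."""
    mirna_name: str
    mirna_sequence: str
    mrna_name: str          # ENSEMBL identifier of the mRNA holding the site
    site_start: int         # start of the target site on the mRNA
    site_end: int           # end of the target site on the mRNA
    site_sequence: str

    def to_pair(self):
        """Return the (miRNA sequence, target site sequence) pair."""
        return (self.mirna_sequence, self.site_sequence)
```

For example, `ClashRecord("miR-toy", "UGGCU", "ENST_TOY", 10, 14, "AGUCA").to_pair()` gives the two sequences that later steps compare.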

[0068] Because the number of target sites that can bind a given miRNA is far smaller than the number of target sites that cannot, the positive samples are removed after randomly pairing the miRNA and target ...

Embodiment 2

[0097] Embodiment 2 Sequence feature analysis for predicting miRNA target genes

[0098] 1. Based on miRNA-target site pairing

[0099] miRNAs do not pair perfectly with their target sites, and the pairing patterns vary widely. Based on how each miRNA pairs with its target site in the sample set, this method represents the duplex formed by each miRNA and its target site as a binary sequence of "0" and "1" and analyzes the resulting binary sequences. The specific process is shown in Figure 2, where the shaded part is the seed region.
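The binary representation can be illustrated with a simplified sketch. It assumes an ungapped, antiparallel duplex, assumes that "1" marks a paired position and "0" an unpaired one, and takes the seed region as miRNA positions 2 to 8, which is a common convention and not a value stated here. The function names and the toy sequences are hypothetical.

```python
# Watson-Crick pairs plus the G:U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_bits(mirna, target_site):
    """Encode an ungapped miRNA:target duplex as a string of '0' and '1'.

    Both sequences are given 5'->3'. The target is reversed so that
    position i of the miRNA faces its antiparallel partner.
    Assumption: '1' means paired, '0' means unpaired.
    """
    mirna = mirna.upper().replace("T", "U")
    partner = target_site.upper().replace("T", "U")[::-1]
    return "".join("1" if (m, t) in PAIRS else "0"
                   for m, t in zip(mirna, partner))


def seed_bits(bits, start=2, end=8):
    """Return the bits for miRNA positions start..end (1-based, inclusive).

    The default 2..8 window is an assumed convention for the seed region.
    """
    return bits[start - 1:end]
```

With toy inputs, `pairing_bits("UGGAA", "CCCCA")` pairs the first three miRNA bases and leaves the last two unpaired.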

[0100] In Figure 2, the BEYLA sequence is the target site sequence corresponding to miR-149. First, an improved Smith-Waterman method was used to align the sequences according to A:U and G:C complementary base pairing, with G:U mismatches allowed. Starting from the first nucleotide at the 5' end of the miR-149 sequence, each nucleotide is compared with the nucleotides of the BEYLA sequence. If it matches, it ...
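The improvements made to Smith-Waterman are not described in the text available here, so the following is only a sketch of the standard local-alignment recurrence adapted to score base complementarity instead of identity. The score values (2 for A:U and G:C, 1 for G:U, -1 otherwise, -2 per gap) and the function names are assumptions chosen for illustration.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def complement_score(m, t):
    """Assumed scores: 2 for A:U or G:C, 1 for G:U, -1 for anything else."""
    if (m, t) in WATSON_CRICK:
        return 2
    if (m, t) in WOBBLE:
        return 1
    return -1


def smith_waterman_score(mirna, target_site, gap=-2):
    """Best local complementarity score between a miRNA and a target site.

    Both inputs are 5'->3'; the target is reversed so the two strands
    are compared in antiparallel orientation.
    """
    partner = target_site[::-1]
    rows, cols = len(mirna) + 1, len(partner) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + complement_score(mirna[i - 1], partner[j - 1])
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best
```

A traceback over the same matrix would recover which positions pair, which is what a per-position encoding of the duplex needs; it is omitted to keep the sketch short.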

Embodiment 3

[0133] Embodiment 3 Model construction and method for predicting miRNA target genes

[0134] Based on the above research and analysis, the method and model for predicting miRNA target genes were constructed, as follows:

[0135] 1. Collect data sets (highly reliable miRNAs and the target site data that can bind to them), and construct positive and negative samples

[0136] Select the CLASH data set as the positive samples and construct the negative samples from the same data set: randomly pair the miRNAs in the CLASH data set with the target site sequences, delete the positive samples, and then randomly select 18,514 entries from the remaining pairs as negative samples;

[0137] (1) Select the positive sample data from the CLASH data set. The positive sample data includes the miRNA name, miRNA sequence, name of the mRNA to which the target site belongs, the start position of the target site on the mRNA, and the end position of the target site on the mRNA...
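The negative-sample construction described in step 1 can be sketched as follows. This is a minimal illustration: the function name and the fixed random seed are assumptions, the input is reduced to (miRNA sequence, target site sequence) pairs, and "randomly pair" is implemented by enumerating every miRNA and target site combination before sampling.

```python
import random
from itertools import product


def build_negative_samples(positives, n_negatives, seed=0):
    """Draw negative (miRNA, target) pairs that are not known positives.

    positives: iterable of (mirna_sequence, target_sequence) tuples.
    Raises ValueError if fewer than n_negatives candidates remain.
    """
    positive_set = set(positives)
    mirnas = sorted({m for m, _ in positive_set})
    targets = sorted({t for _, t in positive_set})
    # Pair every miRNA with every target site, then delete the positives.
    remaining = [pair for pair in product(mirnas, targets)
                 if pair not in positive_set]
    rng = random.Random(seed)  # fixed seed only for reproducibility
    return rng.sample(remaining, n_negatives)
```

Calling `build_negative_samples(positives, len(positives))` yields as many negatives as positives, matching the 18,514 to 18,514 balance described above.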


Abstract

The invention discloses a sequence feature analysis method for predicting miRNA target genes. Based on the CLASH experimental data set, the method constructs 27 features related to miRNA-target site pairing sequences and combines them with traditional features to form a feature set of 84 feature values. A random forest model is then trained on this feature set to build a miRNA target gene prediction model for miRNA target gene recognition. The model built by this method has high accuracy, sensitivity, specificity, and precision, and can predict miRNA target genes relatively accurately.
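The four evaluation measures named in the abstract have standard definitions in terms of true and false positives and negatives. The sketch below computes them from binary labels; the function name and the label convention (1 for a true miRNA-target pair, 0 otherwise) are assumptions, and no values from the patent are reproduced.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and precision for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),
    }
```

Specificity is the measure that the low-specificity problem described earlier refers to: it falls as more non-binding pairs are wrongly predicted as targets.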

Description

Technical field

[0001] The invention belongs to the technical fields of molecular biology and bioinformatics. More specifically, it relates to a sequence feature analysis method for predicting miRNA target genes.

Background

[0002] MicroRNAs (miRNAs) are a class of endogenous non-coding RNAs about 23 nucleotides (nt) long. They act mainly through complete or incomplete base pairing with the 3'UTR of an mRNA, which leads to cleavage of the mRNA or inhibition of its translation into protein, and they play an important role in gene regulation at the post-transcriptional and translational levels. So far, more than 2,000 human miRNAs have been discovered; these miRNAs may regulate 80% of human genes and play a key role in the regulation of various life activities and diseases. Since the specific mechanism of miRNA target gene recognition is still unclear, and the interaction mechanism between miRNA and its target gene is ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B20/30, G16B40/00
CPC: G16B30/00, G16B40/00
Inventors: 邹小勇, 夏飞迪, 王洋, 戴宗
Owner: SYSU CMU SHUNDE INT JOINT RES INST