Prediction method for identifying 4-methylcytosine site

A prediction method and software system for identifying 4-methylcytosine sites, in the field of epigenetic modification site prediction. It addresses the lack of prediction algorithms for 4-methylcytosine sites and the laborious nature of experimental detection, and achieves accurate prediction and identification of these sites, which is of great practical significance.

Status: Inactive | Publication Date: 2021-01-19
TIANJIN UNIV

AI Technical Summary

Problems solved by technology

However, in the era of high-throughput sequencing, experimental methods such as bisulfite sequencing are laborious, and prediction algorithms for 4-methylcytosine sites remain scarce. There is therefore an urgent need for accurate and efficient prediction methods to identify these methylation sites.

Examples

Embodiment 1

[0071] A 4-methylcytosine site is a modification site formed when a specific DNA methyltransferase (DNMT) transfers a methyl group to the N4 position of cytosine. Figure 1 is a flowchart of the prediction algorithm for identifying 4-methylcytosine sites; the specific steps are as follows:

[0072] Step 1: Establishment of benchmark dataset

[0073] The six benchmark datasets constructed by Chen W, Yang H, Feng P, et al. (iDNA4mC: identifying DNA N4-methylcytosine sites based on nucleotide chemical properties [J]. Bioinformatics, 2017, 33(22): 3518-3523) are selected. They cover Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Escherichia coli, Geoalkalibacter subterraneus and Geobacter pickeringii, and each consists of positive samples containing 4-methylcytosine sites and negative samples without them. In each species, the positive sample is a sequence fragment centered on 4-methylcytosine, with 20 bp upstream and downstrea...
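To make the sample definition concrete, here is a minimal Python sketch of how 41-bp fragments (20 bp upstream + central C + 20 bp downstream) could be cut from a genomic sequence. It assumes, hypothetically, that verified 4mC sites are supplied as a set of 0-based positions and that every other cytosine with enough flanking sequence is used as a negative sample. The original description of the negative set is truncated, so this illustrates the windowing only and is not the iDNA4mC construction procedure; the function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

FLANK = 20  # bp on each side of the central cytosine, per the dataset description


def window(seq: str, center: int, flank: int = FLANK) -> Optional[str]:
    """Return the (2*flank + 1)-bp fragment centred on `center`, or None if
    the fragment would run past either end of `seq`."""
    start, end = center - flank, center + flank + 1
    if start < 0 or end > len(seq):
        return None
    return seq[start:end]


def build_samples(seq: str, methylated: Set[int],
                  flank: int = FLANK) -> Tuple[List[str], List[str]]:
    """Assign every cytosine-centred fragment to the positive set (position is
    in `methylated`, hypothetical 0-based input) or to the negative set."""
    seq = seq.upper()
    positives: List[str] = []
    negatives: List[str] = []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        frag = window(seq, i, flank)
        if frag is None:  # not enough flanking sequence on one side
            continue
        (positives if i in methylated else negatives).append(frag)
    return positives, negatives
```

Cytosines too close to either end are skipped rather than padded, which keeps every sample at exactly 41 bp.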

Embodiment 2

[0102] This embodiment provides 4mCPred, a software system built on the prediction algorithm for identifying 4-methylcytosine sites. Each species has two independent models, 4mCPred_I and 4mCPred_II: 4mCPred_II gives the best prediction results, while 4mCPred_I better reveals the position-specific tendency of triplets (trinucleotides). The system is built on the optimal model using MATLAB and JavaScript. Users submit at least one DNA sequence in FASTA format and can quickly predict whether a cytosine (C) in that sequence is likely to be methylated (the cytosine must have at least 20 nucleotides of sequence both upstream and downstream).
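One common way to quantify a position-specific tendency of triplets is a position-specific trinucleotide propensity table: for each position j and trinucleotide t, Z_j(t) = F+_j(t) - F-_j(t), the relative frequency of t at position j among positive samples minus that among negative samples. Each sequence is then encoded by looking up its own trinucleotide at every position. The sketch below assumes this formulation; the text does not spell out the exact encoding used by 4mCPred_I, and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Table = List[Dict[str, float]]


def positional_trinucleotide_counts(seqs: List[str]) -> List[Counter]:
    """counts[j][t] = number of sequences whose trinucleotide at position j is t."""
    length = len(seqs[0])  # samples are assumed to share one length (e.g. 41 bp)
    counts: List[Counter] = [Counter() for _ in range(length - 2)]
    for s in seqs:
        for j in range(length - 2):
            counts[j][s[j:j + 3]] += 1
    return counts


def pstnp_table(positives: List[str], negatives: List[str]) -> Table:
    """Z_j(t) = F+_j(t) - F-_j(t), using per-position relative frequencies."""
    cp = positional_trinucleotide_counts(positives)
    cn = positional_trinucleotide_counts(negatives)
    table: Table = []
    for j in range(len(cp)):
        seen = set(cp[j]) | set(cn[j])
        table.append({t: cp[j][t] / len(positives) - cn[j][t] / len(negatives)
                      for t in seen})
    return table


def pstnp_encode(seq: str, table: Table) -> List[float]:
    """One value per position: the propensity of the sequence's own trinucleotide;
    trinucleotides never seen in training contribute 0."""
    return [table[j].get(seq[j:j + 3], 0.0) for j in range(len(table))]
```

Positive values mark triplets that are enriched around methylated sites at that position, which is what makes this kind of encoding directly interpretable.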

[0103] To predict whether cytosines in DNA sequences from the six species are likely to be methylated, the user only needs to select the model for the corresponding species in the 4mCPred prediction interface and input the FASTA-format sequence of the corres...
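The input rule (FASTA format, at least 20 nucleotides on each side of the cytosine) can be sketched as a small Python pre-processing step that parses FASTA text and lists the cytosines eligible for prediction. This is not 4mCPred's code, which the text says is written in MATLAB and JavaScript; the function names are hypothetical, and reporting positions 1-based is an assumption.

```python
from typing import Iterator, List, Optional, Tuple

FLANK = 20  # minimum upstream/downstream length required around the cytosine


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs; multi-line sequences are joined and upper-cased."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:].strip(), []
        elif header is not None:  # ignore any text before the first header
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def candidate_sites(seq: str, flank: int = FLANK) -> Iterator[Tuple[int, str]]:
    """Yield (1-based position, fragment) for every C with >= flank nt on both sides."""
    for i in range(flank, len(seq) - flank):
        if seq[i] == "C":
            yield i + 1, seq[i - flank:i + flank + 1]
```

Each yielded fragment has the same 41-bp shape as the benchmark samples, so it can be encoded and scored by a species-specific model without further trimming.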

Abstract

The invention relates to prediction technology for epigenetic modification sites in the field of machine learning, and provides a prediction method and a software system for identifying 4-methylcytosine sites. The prediction method comprises the following steps: step 1, establishing a benchmark dataset; step 2, feature extraction: extracting sequence information from the positive and negative sample sets to construct a multi-dimensional feature encoding; step 3, selecting a machine learning algorithm: building prediction models on the same features and selecting the best classification algorithm among naive Bayes (NB), K-nearest neighbors (KNN), random forest (RF) and support vector machine (SVM); step 4, feature selection; and step 5, model construction: combining different feature subsets, validating them with the jackknife test using the algorithm selected in step 3, evaluating the resulting prediction models and selecting the optimal one. The method is mainly applied to the prediction of epigenetic modification sites.
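Steps 3 and 5 (compare classifiers on the same features, then evaluate with leave-one-out validation) can be illustrated with a dependency-free Python sketch. Assumptions: the validation step is read as the jackknife (leave-one-out) test; a normalised dinucleotide composition stands in for the multi-dimensional feature encoding; and k-NN and nearest-centroid classifiers stand in for the NB/KNN/RF/SVM candidates, because the others would require an external library. The selection logic, not the particular classifiers, is the point; the toy sequences are invented for illustration.

```python
import math
from collections import Counter
from itertools import product
from typing import Callable, Dict, List, Tuple

Vector = List[float]
# A predictor receives training vectors, training labels and one query vector.
Predictor = Callable[[List[Vector], List[int], Vector], int]


def kmer_features(seq: str, k: int = 2) -> Vector:
    """Normalised k-mer composition (4**k values); a stand-in feature encoding."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    return [counts[m] / total for m in kmers]


def knn_predict(train_x: List[Vector], train_y: List[int], x: Vector, k: int = 3) -> int:
    """Majority vote among the k training vectors closest to x (Euclidean)."""
    nearest = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_x, train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def centroid_predict(train_x: List[Vector], train_y: List[int], x: Vector) -> int:
    """Assign x to the class whose mean feature vector is closest."""
    best_label, best_dist = train_y[0], math.inf
    for label in sorted(set(train_y)):
        members = [tx for tx, ty in zip(train_x, train_y) if ty == label]
        centre = [sum(col) / len(members) for col in zip(*members)]
        d = math.dist(x, centre)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def jackknife_accuracy(predict: Predictor, xs: List[Vector], ys: List[int]) -> float:
    """Leave-one-out: predict each sample with a model built from all the others."""
    correct = 0
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        correct += predict(train_x, train_y, xs[i]) == ys[i]
    return correct / len(xs)


def select_best(candidates: Dict[str, Predictor], xs: List[Vector],
                ys: List[int]) -> Tuple[str, float]:
    """Score every candidate with the jackknife; return the best (name, accuracy)."""
    scores = {name: jackknife_accuracy(f, xs, ys) for name, f in candidates.items()}
    best = max(scores, key=scores.get)  # ties go to the first candidate listed
    return best, scores[best]


seqs = ["CGCGCGCG", "CGCGCGCA", "GCGCGCGC", "ATATATAT", "ATATATAA", "TATATATA"]
labels = [1, 1, 1, 0, 0, 0]
xs = [kmer_features(s) for s in seqs]
candidates: Dict[str, Predictor] = {
    "1-NN": lambda tx, ty, x: knn_predict(tx, ty, x, k=1),
    "3-NN": lambda tx, ty, x: knn_predict(tx, ty, x, k=3),
    "nearest centroid": centroid_predict,
}
name, acc = select_best(candidates, xs, labels)
print(name, acc)  # 1-NN 1.0
```

Because every candidate is scored on the identical feature vectors and the identical jackknife folds, differences in accuracy reflect the classifiers themselves, which is the comparison step 3 calls for.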

Description

Technical field

[0001] The invention relates to prediction technology for epigenetic modification sites in the field of machine learning, and in particular to a prediction method and software system for identifying 4-methylcytosine sites.

Background art

[0002] Under the action of a specific DNA methyltransferase (DNMT), a methyl group is transferred to the N4 position of cytosine in DNA, forming 4-methylcytosine.

[0003] DNA methylation is one of the most important epigenetic modifications in living organisms. It is involved in biological processes such as cell differentiation, genome stability and X-chromosome inactivation. Changes in DNA methylation status lead to abnormalities in gene structure and function, and are closely related to tumorigenesis.

[0004] The main experimental method for detecting DNA methylation is bisulfite sequencing. After bisulfite treatment, unmethylated cytosine (C) in DNA is converted to uracil, which is read as thymine (T) after amplification, and the CpG dinu...
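The detection principle can be illustrated with an idealised in-silico conversion: unmethylated cytosines read as T, protected (methylated) cytosines remain C, so methylated positions are those where a reference C survives in the read. This sketch assumes complete conversion, full protection of methylated cytosines and a single strand; representing methylation as a set of 0-based positions is a hypothetical choice.

```python
from typing import List, Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Idealised conversion: each C not listed in `methylated` (0-based) reads as T."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def call_methylated(reference: str, read: str) -> List[int]:
    """Positions where the reference has C and the converted read still shows C."""
    return [
        i for i, (ref, obs) in enumerate(zip(reference.upper(), read.upper()))
        if ref == "C" and obs == "C"
    ]


reference = "ACGCCG"
read = bisulfite_convert(reference, methylated={1})
print(read)                              # ACGTTG
print(call_methylated(reference, read))  # [1]
```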

Application Information

IPC (8): G16B20/00; G06K9/62; G06N20/00
CPC: G16B20/00; G06N20/00; G06F18/2411
Inventors: Guo Fei (郭菲), Zou Quan (邹权), He Wenying (何文颖), Tang Jijun (唐继军)
Owner: TIANJIN UNIV