Method for using computer program to simulate and generate simplified DNA methylation sequencing data

A computer-program and sequencing-data technology, classified under computing, electrical digital data processing and special data processing applications. It addresses the problem that existing simulated methylation data cannot be used to evaluate the reliability of alignment tools.

Active Publication Date: 2017-12-08
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

Some data simulation tools for RRBS sequencing have appeared recently, but they only generate DNA methylation values from statistical models rather than simulating the reads that actual sequencing would produce. Their output therefore cannot be used to evaluate the reliability of the corresponding alignment tools.

Examples


Embodiment 1

[0018] Embodiment 1: In the method provided by the present invention, a computer program (written in Python) simulates simplified (reduced representation) DNA methylation sequencing data. The data are first generated following the workflow shown in Figure 1:

[0019] (1) Simulate a reference genome sequence: starting from a reference such as hg19, introduce single-base insertions, deletions, single nucleotide variations and structural variations (the variation parameters can be supplied by the user).

[0020] (2) Simulate the methylation level at each CpG dinucleotide site on the reference genome obtained in step (1). Because the methylation levels of CpG sites on the genome usually follow a Beta distribution, a Beta model is used to generate the methylation level at each CpG site. In addition, because adjacent CpG sites are strongly correlated in real data, the methylation levels of CpG sites within 100 bp of one another are corrected based on a maximum likelih...
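Steps (1) and (2) can be sketched in Python roughly as follows. This is a minimal illustration and not the patent's implementation: the function names, default rates and Beta shape parameters are hypothetical, structural variations are omitted, and the neighbour averaging is only a simple stand-in for the likelihood-based correction, whose details are not given in this excerpt.

```python
import random
import re


def mutate_reference(seq, snv_rate=0.001, ins_rate=0.0001, del_rate=0.0001, seed=0):
    """Introduce SNVs and single-base insertions/deletions at user-given rates.

    Returns the mutated sequence and a list of (position, type, ref, alt) records.
    """
    rng = random.Random(seed)
    out, variants = [], []
    for pos, base in enumerate(seq):
        r = rng.random()
        if r < del_rate:
            # Single-base deletion: skip this reference base.
            variants.append((pos, "DEL", base, ""))
            continue
        if r < del_rate + snv_rate:
            # Single nucleotide variation: substitute a different base.
            alt = rng.choice([b for b in "ACGT" if b != base])
            variants.append((pos, "SNV", base, alt))
            out.append(alt)
        else:
            out.append(base)
        if rng.random() < ins_rate:
            # Single-base insertion after the current position.
            ins = rng.choice("ACGT")
            variants.append((pos, "INS", "", ins))
            out.append(ins)
    return "".join(out), variants


def find_cpg_sites(seq):
    """Return the 0-based position of the C in every CpG dinucleotide."""
    return [m.start() for m in re.finditer("CG", seq.upper())]


def simulate_methylation(sites, alpha=0.5, beta=0.5, window=100, weight=0.5, seed=0):
    """Draw Beta-distributed levels, then pull each site toward nearby sites.

    ASSUMPTION: the weighted neighbour mean below is a placeholder for the
    patent's correction step, not a reconstruction of it.
    """
    rng = random.Random(seed)
    raw = [rng.betavariate(alpha, beta) for _ in sites]
    smoothed = []
    for i, pos in enumerate(sites):
        near = [raw[j] for j, other in enumerate(sites)
                if j != i and abs(other - pos) <= window]
        if near:
            mean_near = sum(near) / len(near)
            smoothed.append((1 - weight) * raw[i] + weight * mean_near)
        else:
            smoothed.append(raw[i])
    return dict(zip(sites, smoothed))
```

The neighbour search here is quadratic in the number of sites, which is acceptable for a short test sequence; a genome-scale version would need a sliding window over the sorted positions.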



Abstract

The invention discloses a method for simulating simplified DNA methylation sequencing data, that is, reduced representation bisulfite sequencing (RRBS) data, with a computer program. The simulated data can be used to estimate the efficiency of different RRBS read alignment software and the reliability of the corresponding data analysis platforms, and so to determine the optimal alignment method and its optimal parameters. The method simulates the RRBS library construction and sequencing process in software and generates data similar to real RRBS sequencing data according to the distribution of CpG methylation levels. Besides single-base methylation levels, the simulated data also reproduce other characteristics of real data, namely insertions, deletions, single nucleotide variations and structural variations, which makes them more realistic. During the simulated RRBS sequencing, an empirical error model is introduced to simulate sequencing errors, further increasing the realism of the simulated data.
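The library construction and sequencing stages described above might be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented procedure: the excerpt does not name the restriction enzyme, so MspI (which cuts C^CGG and is commonly used for RRBS) is assumed; the size-selection range is hypothetical; only the top strand is modelled; non-CpG cytosines are treated as unmethylated; and a uniform per-base substitution rate stands in for the empirical error model.

```python
import random


def mspi_digest(seq):
    """Cut after the first C of every CCGG; return (start, fragment) pairs."""
    seq = seq.upper()
    cuts = [0]
    i = seq.find("CCGG")
    while i != -1:
        cuts.append(i + 1)
        i = seq.find("CCGG", i + 1)
    cuts.append(len(seq))
    return [(a, seq[a:b]) for a, b in zip(cuts, cuts[1:]) if b > a]


def size_select(fragments, min_len=40, max_len=220):
    """Keep fragments inside a (hypothetical) size-selection range."""
    return [(s, f) for s, f in fragments if min_len <= len(f) <= max_len]


def bisulfite_convert(start, fragment, meth_levels, rng):
    """Convert each C to T unless it is protected by methylation.

    meth_levels maps the genomic position of a CpG cytosine to its
    methylation level in [0, 1]; any cytosine not listed is converted.
    """
    out = []
    for offset, base in enumerate(fragment):
        if base == "C":
            level = meth_levels.get(start + offset, 0.0)
            out.append("C" if rng.random() < level else "T")
        else:
            out.append(base)
    return "".join(out)


def add_errors(read, error_rate, rng):
    """Substitute bases at a uniform rate (placeholder error model)."""
    out = []
    for base in read:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

A typical call chain would digest the simulated genome, size-select the fragments, bisulfite-convert each fragment using the per-site methylation levels, and finally pass the result through `add_errors` before writing reads to a FASTQ file.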

Description

Technical field

[0001] The invention belongs to the field of computer simulation of simplified DNA methylation sequencing data (bioinformatics), and specifically relates to a method for generating highly realistic simulated simplified DNA methylation sequencing data with a computer program.

Background

[0002] DNA methylation refers to the chemical modification of DNA that affects biological processes or changes genetic phenotypes without changing the DNA sequence. In recent years, as research has deepened, researchers have discovered that DNA methylation, as an important epigenetic modification, plays an important role in the formation and development of tumors. Studies have also shown that DNA methylation may be involved in important biological processes such as X chromosome silencing, genome imprinting, transposon silencing and stem cell differentiation. Therefore, accurate detection of differentially methylated regions...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/12, G06F19/20
CPC: G16B5/00, G16B25/00
Inventor: 陆燕, 孙喜伟, 刘鹏渊, 周莉媛
Owner: ZHEJIANG UNIV