Word vector and convolutional neural network based DNA replication initial region identification method

A recognition technology combining word vectors with a convolutional neural network, applied to the recognition of DNA replication origin (ORI) regions, which addresses problems such as low recognition accuracy.

Pending Publication Date: 2020-08-28
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

[0006] This application provides a DNA replication origin region recognition method based on word vectors and convolutional neural networks, to solve the technical problem of low recognition accuracy.


Examples

Embodiment Construction

[0058] To enable those skilled in the art to better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described clearly and completely below in conjunction with the drawings in those embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by persons of ordinary skill in the art from the embodiments in this application without creative effort shall fall within the scope of protection of this application.

[0059] This application provides a method for identifying the origin of DNA replication based on word vectors and convolutional neural networks. As shown in Figure 1, the method includes:

[0060] S110: Randomly select ORI sequences and non-ORI sequences from the yeast biological DNA sequence database to construct a DNA sequence sample set.
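Step S110 can be illustrated with a short sketch. This is not the applicants' implementation: the function name `build_sample_set`, the label convention (1 for ORI, 0 for non-ORI), the equal number of sequences per class, and the fixed random seed are all assumptions made for illustration, and the two input lists stand in for sequences read from the yeast DNA sequence database.

```python
import random


def build_sample_set(ori_seqs, non_ori_seqs, n_per_class, seed=0):
    """Randomly draw n_per_class sequences from each pool and label them.

    Assumed label convention (not from the source): 1 = ORI, 0 = non-ORI.
    """
    rng = random.Random(seed)  # fixed seed so the random draw is reproducible
    positives = rng.sample(ori_seqs, n_per_class)      # sampling without replacement
    negatives = rng.sample(non_ori_seqs, n_per_class)
    samples = [(s, 1) for s in positives] + [(s, 0) for s in negatives]
    rng.shuffle(samples)  # mix the two classes before any later split
    return samples
```

Calling `build_sample_set(ori_list, non_ori_list, 100)` would return 200 `(sequence, label)` pairs in shuffled order, provided each input list holds at least 100 sequences.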

[0061...

Abstract

The invention provides a method for identifying DNA replication initiation regions (ORIs) based on word vectors and a convolutional neural network. The method comprises the following steps: segmenting a DNA sequence into individual trinucleotides by continuous three-nucleotide word segmentation; performing negative sampling on the segmented trinucleotides; vectorizing the trinucleotides through iterative Word2vec training to obtain word vectors; combining all word vectors into a pre-training feature vector matrix that contains a pre-training feature vector for each trinucleotide; arranging the segmented trinucleotides vertically and embedding each one into its pre-training feature vector to obtain a word embedding layer; vectorizing the trinucleotide sequence with the word embedding layer and then performing convolution and pooling training to obtain a convolutional neural network; and using the convolutional neural network with the added word embedding layer to mine and classify ORI features in depth, thereby identifying ORIs. The identification accuracy is greatly improved.
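The segmentation and embedding-lookup steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the window is assumed to slide one base at a time (the `stride` parameter allows non-overlapping words instead), the embedding dimension is arbitrary, and randomly generated vectors stand in for the Word2vec-trained word vectors. All function and variable names are hypothetical.

```python
import itertools
import random


def segment_trinucleotides(seq, stride=1):
    """Split a DNA sequence into 3-letter words with a sliding window.

    stride=1 (assumed default) yields overlapping words; stride=3 yields
    non-overlapping words.
    """
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, stride)]


# Vocabulary of all 64 possible trinucleotides over A, C, G, T.
VOCAB = ["".join(p) for p in itertools.product("ACGT", repeat=3)]
WORD_TO_INDEX = {word: i for i, word in enumerate(VOCAB)}


def make_placeholder_embeddings(dim, seed=0):
    """Return a 64 x dim matrix of random vectors.

    Placeholder only: in the described method these rows would be the
    word vectors obtained from Word2vec training.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in VOCAB]


def embed_sequence(seq, embeddings, stride=1):
    """Map a DNA sequence to a (number of words) x dim feature matrix.

    Raises KeyError for words containing symbols other than A, C, G, T.
    """
    words = segment_trinucleotides(seq, stride)
    return [embeddings[WORD_TO_INDEX[w]] for w in words]
```

For example, `segment_trinucleotides("ATGCGT")` returns `["ATG", "TGC", "GCG", "CGT"]`, and `embed_sequence("ATGCGT", make_placeholder_embeddings(8))` returns a 4 x 8 matrix, one row per trinucleotide. A matrix of this shape is what a convolution and pooling stage would take as input.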

Description

Technical field

[0001] This application relates to the technical fields of biotechnology and genetic engineering, and in particular to a method for identifying DNA replication initiation regions based on word vectors and convolutional neural networks.

Background technique

[0002] As the first step in transmitting genetic information, DNA replication is of profound significance for biological research. DNA replication is the biological process, occurring before cell division, in which the DNA double strand is copied semi-conservatively: each strand of the original molecule serves as a parent (template) strand, producing two daughter double strands identical to the original DNA double strand. The study of DNA replication is therefore fundamental to the study of other aspects of biology and is a primary task in studying the processes of life. Numerous biological experiments have shown that DNA replication starts from a special region, which is called the ORI (Origin of Replication, replication initiation region). ...

Application Information

IPC(8): G16B30/10, G06N3/04, G06N3/08
CPC: G16B30/10, G06N3/08, G06N3/045
Inventor: 杨润涛, 吴峰, 张承进, 陈金桂, 张丽娜
Owner: SHANDONG UNIV