Bidirectional LSTM and CNN model for predicting DNA-protein binding

A model for predicting DNA-protein binding, applied in the fields of deep learning and bioinformatics. It addresses two problems: the network framework strongly affects model performance, and existing approaches capture the position and dynamic information of probe sequences poorly, so the accuracy of DNA-protein binding prediction still needs to be improved. The stated effects are high accuracy and good predictive performance.

Inactive Publication Date: 2019-04-02
CHENGDU UNIV OF INFORMATION TECH

AI Technical Summary

Problems solved by technology

[0004] However, existing solutions have the following disadvantages: the framework of the neural network greatly affects the performance of the model, and at the same time they perform worse in capturing the...

Embodiment Construction

[0024] In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings. It should be understood that these descriptions are only exemplary and not intended to limit the scope of the present invention. In addition, in the following description, descriptions of well-known structures and technologies are omitted to avoid unnecessarily obscuring the concept of the present invention.

[0025] The invention uses a bidirectional LSTM (BLSTM) structure to process both forward and reverse dependency information in the DNA sequence. The network structure and the proposed algorithm are implemented with the Keras library. All computation runs on a graphics processing unit (GPU) to shorten training time.
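A minimal Keras sketch of such a network is given here for orientation only. The layer order (input, BLSTM, convolution, max pooling, fully connected, output) follows the abstract; the function name `build_blstm_cnn`, every size (`seq_len`, `lstm_units`, `n_kernels`, `kernel_width`, the 32-unit dense layer), the activations, and the optimizer are assumptions and are not taken from the patent. Keras expects input shaped (positions, channels), so the four-row one-hot matrix is fed transposed.

```python
def build_blstm_cnn(seq_len=101, lstm_units=32, n_kernels=16, kernel_width=8):
    """Build a BLSTM -> Conv1D -> max pooling -> dense binary classifier.

    All default sizes are hypothetical, chosen only for illustration.
    """
    # Imported lazily so the module loads even where TensorFlow is absent.
    from tensorflow import keras
    from tensorflow.keras import layers

    # One-hot DNA input: seq_len positions, 4 channels (A, C, G, T).
    inputs = keras.Input(shape=(seq_len, 4))
    # Bidirectional LSTM reads the sequence forward and in reverse.
    x = layers.Bidirectional(
        layers.LSTM(lstm_units, return_sequences=True)
    )(inputs)
    # Convolution kernels scan the BLSTM output for motif-like patterns.
    x = layers.Conv1D(n_kernels, kernel_width, activation="relu")(x)
    # Keep the strongest response of each kernel over the whole sequence.
    x = layers.GlobalMaxPooling1D()(x)
    # Fully connected layer, then a sigmoid output for bound / not bound.
    x = layers.Dense(32, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_blstm_cnn().summary()` in an environment with TensorFlow installed prints the resulting layer stack; Keras places the computation on a GPU automatically when one is visible.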

[0026] As shown in Figure 1, the BLSTM network and the CNN network are combined to form the BLSTM+CN...

Abstract

The invention provides a bidirectional LSTM and CNN model for predicting DNA-protein binding. The model includes an input layer, a BLSTM layer, a convolutional layer, a max pooling layer, a fully connected layer, and an output layer. The input layer represents each input sequence as a four-row binary matrix using one-hot encoding. In the BLSTM layer, each LSTM module in the previous layer receives the DNA information of interest from the input sequence and encodes the contribution of past history into a hidden state; the BLSTM module then propagates to the next BLSTM module. In the convolutional layer, each convolution kernel scans the input matrix for motif discovery, and responses of different intensities are associated with potential sequence patterns. The max pooling layer takes the maximum output signal of each convolution kernel over the complete sequence. The output layer performs a non-linear transformation to determine the DNA-protein binding feature information.
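The encoding, kernel scan, and max pooling steps described in the abstract can be illustrated with a small pure-Python sketch. The row order A, C, G, T, the helper names `one_hot`, `scan`, and `max_pool`, and the toy kernel for the motif "TGA" are assumptions made for illustration; the patent does not specify them. For simplicity the kernel scans the one-hot matrix directly, whereas in the described model the convolutional layer follows the BLSTM layer.

```python
# Assumed row order of the four-row binary matrix.
BASES = "ACGT"


def one_hot(seq):
    """Return a 4 x len(seq) binary matrix as a list of rows.

    Characters outside A, C, G, T (for example N) give an all-zero column.
    """
    seq = seq.upper()
    return [[1 if base == row_base else 0 for base in seq]
            for row_base in BASES]


def scan(matrix, kernel):
    """Slide a 4 x k kernel along the sequence and return one score
    per position (valid cross-correlation, no padding)."""
    length = len(matrix[0])
    width = len(kernel[0])
    return [
        sum(kernel[r][j] * matrix[r][pos + j]
            for r in range(4) for j in range(width))
        for pos in range(length - width + 1)
    ]


def max_pool(scores):
    """Global max pooling: keep the strongest response of one kernel."""
    return max(scores)


# Hypothetical kernel that counts matches to the motif "TGA".
tga_kernel = one_hot("TGA")
x = one_hot("ACTGAC")
scores = scan(x, tga_kernel)   # one score per window: ACT, CTG, TGA, GAC
best = max_pool(scores)        # strongest match anywhere in the sequence
```

Here `scores` peaks at the window that contains "TGA", and `best` keeps only that peak, which is why the pooled value indicates whether a motif occurs regardless of where it occurs.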

Description

Technical field

[0001] The present invention relates to the fields of deep learning and bioinformatics, and in particular to a bidirectional LSTM and CNN model for predicting DNA-protein binding.

Background

[0002] Accurately modeling the sequence specificity of transcription factors (TFs) is a basic problem in understanding genome function and evolution. In particular, the binding properties of transcription factors have a decisive effect on downstream gene expression. With the development of high-throughput sequencing technology, the ENCODE project provides genome-wide binding specificity data for 187 TFs in 98 cell types. Based on transcription factor binding sites, a binary sequence classification problem can be defined: sequences are divided into positive and negative samples according to whether the TF is bound. By building a binary classification model of sequences, the binding sites of new samples can be predicted.

[0003]...

Application Information

IPC(8): G16B30/00, G06N3/04
CPC: G06N3/049, G06N3/045
Inventor: 张永清, 曾圆麟, 卢荣钊, 何嘉, 周激流
Owner CHENGDU UNIV OF INFORMATION TECH