Transcription factor binding site prediction method based on a deep convolutional autoencoder

The invention applies a convolutional autoencoder to transcription factor binding site (TFBS) prediction, in the fields of computer technology and bioinformatics. It addresses problems such as insufficient model generalization ability, limited prediction performance across different transcription factors, and noisy data that degrades TFBS prediction models.

Active Publication Date: 2020-06-19
CHENGDU UNIV OF INFORMATION TECH

AI Technical Summary

Problems solved by technology

(1) The data may contain noise, which degrades the performance of TFBS prediction models.
(2) Due to the heterogeneity of data samples of different transcription ...



Examples


Detailed Description of the Embodiments

[0032] To help those skilled in the art understand the technical content of the present invention, the invention is further explained below with reference to the accompanying drawings.

[0033] Considering the spatial and sequential features of DNA sequences, the present invention designs a hybrid deep neural network that integrates a convolutional autoencoder with a high-speed fully connected multilayer perceptron (MLP). Convolutional neural networks (CNNs) are a specialized form of artificial neural network (ANN) that uses a weight-sharing strategy to capture local patterns in data such as DNA sequences. The system covers preliminary prediction of transcription factor binding sites on the preprocessed DNA sequence data, feature extraction, and model building; the overall flow chart is shown in Figure 1. Each part is described in detail below:
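The weight-sharing idea in paragraph [0033] can be illustrated with a minimal sketch. It is not the patent's network: it assumes a one-hot encoding over A, C, G, T and uses a single hand-written filter (the hypothetical `tata_kernel`) instead of learned weights. The point is only that one small set of weights is reused at every position of the sequence, so a local pattern is detected wherever it occurs.

```python
# Illustrative sketch only: one shared filter scanned along a one-hot DNA sequence.
# The encoding order and the example filter are assumptions, not taken from the patent.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T); other symbols become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_valid(encoded, kernel):
    """Slide one kernel (width x 4) along the sequence, reusing the same weights at every position."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(total)
    return scores

# Hypothetical filter that responds most strongly to the pattern "TATA".
tata_kernel = one_hot("TATA")
scores = conv1d_valid(one_hot("GGTATACC"), tata_kernel)

# Position with the strongest response (index of the best-matching window).
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

In a trained CNN the filter weights would be learned from data and many filters would run in parallel; the scanning step itself is the same.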

[0034] The invention aims to make full use ...


Abstract

The invention discloses a transcription factor binding site prediction method based on a deep convolutional autoencoder. It is applied in the fields of computer technology and bioinformatics, and aims to improve the generalization ability of the model while removing the model's dependence on negative sequence samples that contain no binding site. The method comprises the following steps: first, DNA fragments bound to the target protein are specifically enriched by chromatin immunoprecipitation (ChIP) to obtain an original data set, and the original data set is preprocessed to obtain a training data set; second, the training data set is input into a convolutional autoencoder for training; finally, binding sites are identified with the trained convolutional autoencoder. Experiments show that the method can predict binding sites of different transcription factors in different cell lines with high recognition accuracy.
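The abstract does not spell out the preprocessing step, so the following is only a sketch of one plausible form it could take. The assumptions are: each raw fragment is trimmed to a fixed-length central window, windows containing ambiguous bases are dropped, and the rest are one-hot encoded. The function names `center_window` and `preprocess` and the toy window length of 8 are hypothetical. Only positive (enriched) fragments are used, in line with the stated goal of not depending on negative samples.

```python
# Hypothetical preprocessing sketch; the patent text shown here does not specify these steps.
BASES = "ACGT"

def center_window(seq, length):
    """Return the central `length` bases of seq, or None if seq is too short."""
    if len(seq) < length:
        return None
    start = (len(seq) - length) // 2
    return seq[start:start + length]

def preprocess(raw_fragments, length=8):
    """Turn raw fragments into fixed-length one-hot training examples (positive samples only)."""
    dataset = []
    for frag in raw_fragments:
        window = center_window(frag.strip().upper(), length)
        if window is None or any(base not in BASES for base in window):
            continue  # skip short fragments and windows with ambiguous bases such as N
        dataset.append([[1 if base == b else 0 for b in BASES] for base in window])
    return dataset

# Toy fragments standing in for enriched DNA sequences.
fragments = ["ttGGTATACCaa", "ACGTNNNNACGT", "ACG", "CCCCGGGGTTTT"]
train = preprocess(fragments, length=8)
print(len(train), "examples of shape", len(train[0]), "x", len(train[0][0]))
```

The resulting list of fixed-size matrices is the kind of input a convolutional autoencoder could be trained on; a real data set would use far longer windows and many more fragments.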

Description

Technical field

[0001] The invention belongs to the fields of computer technology and bioinformatics, and particularly relates to a technique for predicting transcription factor binding sites.

Background art

[0002] In early studies of transcription factor binding sites, the identification problem was traditionally solved by determining real binding sites in DNA sequences experimentally. Later, as bioinformatics developed, various methods based on mathematical models appeared, and these models freed researchers from relying solely on experimentally determined binding site information. Research on transcription factor binding sites (TFBS) has a long history; it was first widely used to study transcriptional regulators in the upstream promoter regions of co-expressed genes. Due to the relatively short sequences of transcription factor bi...


Application Information

IPC(8): G16B15/30, G16B30/00, G16B40/00, G06N3/04
CPC: G16B15/30, G16B30/00, G16B40/00, G06N3/049, G06N3/045
Inventors: 张永清, 乔少杰, 郜东瑞, 曾圆麒, 陈庆园, 卢荣钊, 林志宇
Owner: CHENGDU UNIV OF INFORMATION TECH