
Deep learning-based transcription factor binding site positioning method

A transcription factor binding site technology, applied in the field of deep learning, that addresses the problems of long response time, poor performance, and low efficiency, and achieves fast response, good performance, and accurate prediction and positioning.

Active Publication Date: 2022-07-15
GUANGXI ACAD OF SCI
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

However, such a model has two shortcomings: first, the length of TFBSs in the model is fixed; second, the model assumes that the positions of TFBSs are independent of each other.
[0005] To solve the above problems, a TFBS recognition method based on k-mer encoding was proposed, which can encode the dependencies between nucleotides. However, in the k-mer-based method, the gene sequence is represented only by a vector of k-mer counts, which does not consider the position of each segment in the sequence. In addition, although position-specific sequence kernels exist, they map the sequence to a higher-dimensional space, which makes them inefficient.
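The loss of positional information in a k-mer count vector can be illustrated with a minimal sketch. The function name `kmer_count_vector` and the example sequences are hypothetical and chosen only for illustration; they do not come from the patent.

```python
from collections import Counter
from itertools import product


def kmer_count_vector(seq, k=2):
    """Count every k-mer over {A, C, G, T}, reported in lexicographic order."""
    # Slide a window of width k over the sequence and tally each k-mer.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # A Counter returns 0 for k-mers that never occur.
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


# Two different sequences that contain the same 2-mers (AA, AC, CA)
# in a different order (illustrative inputs, not patent data).
seq_a = "AACA"
seq_b = "ACAA"

vec_a = kmer_count_vector(seq_a, k=2)
vec_b = kmer_count_vector(seq_b, k=2)

# The count vectors are identical even though the sequences differ,
# so the representation cannot tell where each segment occurred.
print(vec_a == vec_b, seq_a == seq_b)
```

Because the two vectors are identical, any classifier that sees only the counts cannot distinguish the two sequences, which is the limitation paragraph [0005] describes.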
[0006] In summary, existing methods for locating TFBSs first use a recognition algorithm to filter out candidate sequences and then use probabilistic statistics to determine the binding region. This requires traversing the entire sequence to select the position with the highest probability. When the amount of data is small, the running time is still manageable, but as the amount of data grows, these methods show poor performance and long response times.



Examples


Embodiment

[0050] As shown in Figure 1, this embodiment provides a deep learning-based method for locating transcription factor binding sites, including:

[0051] One-hot encoding is performed on the DNA sequence bound to the transcription factor to obtain a data set, and the data set is divided into a training set and a test set based on the k-fold cross-validation method;
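The k-fold division into training and test sets can be sketched as follows. This is a minimal illustration under assumptions: the function name `k_fold_splits`, the choice of k = 5, the shuffling, and the seed are all hypothetical, since the text does not state how the folds are formed.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the folds are reproducible (assumption).
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Every fold except the i-th one forms the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 10 encoded sequences split into 5 folds.
splits = list(k_fold_splits(10, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so every encoded sequence is used for evaluation once and for training in the remaining rounds.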

[0052] Further, the DNA sequence bound to the transcription factor is one-hot encoded as follows: the bases {A, C, G, T} in the DNA sequence are encoded as one-hot vectors, and the conservation information of the sequence is taken from the data at the corresponding positions. Together, the two constitute the coding information of the DNA sequence.
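A minimal sketch of this encoding is shown below. The function name `one_hot_encode`, the column order A, C, G, T, the all-zero row for unknown bases, and the choice to append the conservation value as a fifth column are assumptions made for illustration; the text does not specify the exact layout.

```python
BASES = "ACGT"  # assumed column order


def one_hot_encode(seq, conservation=None):
    """Encode a DNA sequence as an L x 4 matrix, or L x 5 with conservation."""
    if conservation is not None and len(conservation) != len(seq):
        raise ValueError("conservation must have one value per base")
    rows = []
    for i, base in enumerate(seq.upper()):
        # One column per base; an unknown base such as N becomes all zeros
        # (an assumption, not stated in the source).
        row = [1.0 if base == b else 0.0 for b in BASES]
        if conservation is not None:
            # Append the per-position conservation value as an extra channel.
            row.append(float(conservation[i]))
        rows.append(row)
    return rows


# Hypothetical inputs: a 4-base sequence and made-up conservation scores.
encoded = one_hot_encode("ACGT", conservation=[0.9, 0.1, 0.5, 0.0])
for row in encoded:
    print(row)
```

Keeping the conservation value in the same row as the one-hot vector means each position carries both pieces of coding information.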

[0053] In this embodiment, a given DNA sequence (assumed to have length L) is one-hot encoded as input data, and at the same time, a dense label (of length L, and the element value of t...


Abstract

The invention provides a deep learning-based method for locating transcription factor binding sites. The method comprises the following steps: one-hot encoding the DNA sequence bound to the transcription factor to obtain a data set, and dividing the data set into a training set and a test set based on the k-fold cross-validation method; constructing an FCNARRB+ model based on a fully convolutional network, and setting a loss function and an evaluation index; training the FCNARRB+ model on the training set with the loss function, using the trained FCNARRB+ model to locate transcription factor binding sites, and testing and evaluating its positioning results with the test set and the evaluation index. By introducing a nucleotide-level classification model, the method achieves accurate prediction and positioning of transcription factor binding sites.
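To make the idea of nucleotide-level classification concrete, the sketch below turns a list of per-nucleotide binding probabilities into located regions. It is not the patent's model or post-processing: the function name `locate_sites`, the 0.5 threshold, and the example probabilities are all hypothetical.

```python
def locate_sites(probs, threshold=0.5):
    """Return half-open (start, end) runs where probs stay >= threshold."""
    regions = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            # A new candidate binding region begins here.
            start = i
        elif p < threshold and start is not None:
            # The current region ended at the previous position.
            regions.append((start, i))
            start = None
    if start is not None:
        # The sequence ended while still inside a region.
        regions.append((start, len(probs)))
    return regions


# Made-up per-nucleotide outputs for a sequence of length 5.
example_probs = [0.1, 0.8, 0.9, 0.2, 0.7]
print(locate_sites(example_probs))
```

Because every nucleotide receives its own prediction in a single pass, no separate scan over candidate windows is needed to find the highest-probability position.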

Description

Technical field

[0001] The invention belongs to the technical field of deep learning, and in particular relates to a method for locating transcription factor binding sites based on deep learning.

Background

[0002] Proteins are ubiquitous in the human body. They are a class of macromolecular compounds composed of 20 amino acids and are synthesized through cellular activities such as gene transcription and translation. Among them is a type of protein that specifically binds to chromosomes: the DNA Binding Protein (DBP). DBPs play a key role in gene replication, recombination, chain cleavage, transcription and other processes, and are closely related to a series of chromatin changes during the cell cycle. Transcription Factors (TFs) are one kind of DBP, also known as trans-acting factors, which can specifically interact with the non-coding regions of the DNA sequences in the regulatory regions, and have specific effects on gene expression. Tran...


Application Information

IPC(8): G16B20/30, G16B40/00, G16B30/10, G06K9/62, G06N3/04, G06N3/08
CPC: G16B20/30, G16B40/00, G16B30/10, G06N3/08, G06N3/047, G06N3/048, G06N3/045, G06F18/2415, G06F18/241
Inventor: 黄德双, 徐尤红, 元昌安
Owner: GUANGXI ACAD OF SCI