Human gene promoter identification method and system

A method and system for identifying human gene promoters, applied in the field of promoter identification, that addresses the low identification rate of gene promoters.

Active Publication Date: 2014-06-18
SUZHOU UNIV

AI Technical Summary

Problems solved by technology

[0005] In view of this, this application provides a method and system for identifying human gene promoters, which address the low recognition rate of gene promoters by existing algorithms.

Examples

Embodiment 1

[0070] See Figure 1, which is a flow chart of a human gene promoter identification method disclosed in an embodiment of this application.

[0071] As shown in Figure 1, the method includes:

[0072] Step 101: receiving a sample set composed of multiple sample gene sequences;

[0073] Step 102: Count the cytosine and guanine CG preference characteristics of each sample gene sequence separately to obtain statistical results;

[0074] Specifically, the numbers of cytosine (C) and guanine (G) bases in each sample gene sequence are counted. If the proportion of C and G in the entire sample gene sequence is greater than a certain value, the sample gene sequence is considered to have the CG preference feature; otherwise it does not have the CG preference feature.

[0075] Step 103: Divide all sample gene sequences into two categories according to the statistical results: one category has the CG preference feature, and the other category does not have the CG...
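Steps 102 and 103 can be sketched roughly as follows. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, the threshold of 0.5 is a placeholder (the text only says "a certain value"), and the "ratio of C and G" is read here as the fraction of bases that are C or G.

```python
def cg_fraction(seq: str) -> float:
    """Fraction of bases in seq that are C or G (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("C") + seq.count("G")) / len(seq)


def split_by_cg_preference(sequences, threshold=0.5):
    """Partition sequences into (cg_preferring, others).

    threshold is an assumed placeholder, not a value given in the source.
    """
    preferring, others = [], []
    for seq in sequences:
        # Strictly greater than the threshold counts as CG-preferring.
        (preferring if cg_fraction(seq) > threshold else others).append(seq)
    return preferring, others


samples = ["GCGCGCAT", "ATATATGC", "CCGGAATT"]
cg_rich, cg_poor = split_by_cg_preference(samples)
```

Each of the two resulting groups would then be handled by its own set of classifiers in the later steps.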

Embodiment 2

[0087] In this embodiment, we will introduce the specific implementation process of each step in the first embodiment in detail.

[0088] First, define a sample set consisting of multiple sample gene sequences as:

$$\{(x_i, y_i)\}_{i=1}^{N}$$

[0089] where $x_i \in \mathbb{R}^L$, $y_i \in \{\text{"promoter"}, \text{"exon"}, \text{"intron"}, \text{"3′UTR"}\}$, $N$ is the number of samples, and $L$ is the length of the sample gene sequence.

[0090] It should be noted that the 3'UTR is the untranslated region at the 3' end of the gene sequence.

[0091] Next, we count the cytosine and guanine CG preference characteristics of each sample gene sequence, specifically:

[0092] For each sample gene sequence $x_i$, the ratio of cytosine C and guanine G content is counted, and the sample set after statistics is expressed as:

[0093] where $n_C$ is the number of occurrences of cytosine C in the sample gene sequence, and $n_G$ is the number of occurrences of guanine G in the sample gene sequence.
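The counts of C and G for each labelled sample might be gathered as in the sketch below. The names `CGStats` and `cg_stats` are hypothetical, sequences are represented as plain strings for simplicity, and the ratio is assumed to be (n_C + n_G) / L, since the exact expression is not reproduced in this text.

```python
from collections import Counter
from typing import NamedTuple


class CGStats(NamedTuple):
    n_c: int      # occurrences of cytosine C
    n_g: int      # occurrences of guanine G
    length: int   # sequence length L

    @property
    def cg_ratio(self) -> float:
        # Assumed definition: share of C and G among all bases.
        return (self.n_c + self.n_g) / self.length if self.length else 0.0


def cg_stats(seq: str) -> CGStats:
    """Count C and G in one sample gene sequence."""
    counts = Counter(seq.upper())
    return CGStats(counts["C"], counts["G"], len(seq))


# Each sample is a (sequence, label) pair, mirroring (x_i, y_i).
labelled = [("GCGCGCAT", "promoter"), ("ATATATGC", "intron")]
annotated = [(seq, label, cg_stats(seq)) for seq, label in labelled]
```

Keeping the raw counts alongside the ratio makes it easy to swap in a different ratio definition later.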

[0094] Through the statistics of CG preference characteristic...

Embodiment 3

[0132] See Figure 2, which is a structural diagram of a human gene promoter recognition system disclosed in an embodiment of this application.

[0133] As shown in Figure 2, the system includes:

[0134] A receiving unit 21, configured to receive a sample set composed of a plurality of sample gene sequences;

[0135] The statistics unit 22 is used to separately count the cytosine and guanine CG preference characteristics of each sample gene sequence to obtain statistical results;

[0136] A classification unit 23, configured to divide all sample gene sequences into two categories according to the statistical results, one category has the CG preference feature, and the other category does not have the CG preference feature;

[0137] The feature extraction unit 24 is used to extract the rigidity feature, CpG island feature and quadruple composition feature of each sample gene sequence for each type of sample gene sequence after division;

[0138] A rigid classi...

Abstract

The invention discloses a promoter identification method. Statistics on the cytosine (C) and guanine (G) preference characteristics are computed for a plurality of sample gene sequences, and the sample gene sequences are divided into two types accordingly. The following steps are then carried out on each type separately: the rigidity features, the CpG island features and the quadruplet component features of each sample gene sequence are extracted, and corresponding classifiers are established to judge whether each sample gene sequence is a promoter. For sequences identified as non-promoters, the five-conjoint component features are extracted, a five-conjoint classifier is established, and promoter identification is judged again. When the identification results meet a preset condition, the current sample gene sequence is determined to be a promoter sequence; otherwise it is a non-promoter sequence. Because the rigidity features, the CpG island features and the component features of genes are all considered, and identification proceeds through classification, the final promoter identification results are highly accurate.
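One plausible reading of the quadruplet and five-conjoint component features is the relative frequency of every 4-base and 5-base word in a sequence. The sketch below follows that reading; it is an assumption, not the definition given in the patent, and `kmer_composition` is a hypothetical name.

```python
from itertools import product


def kmer_composition(seq: str, k: int) -> list[float]:
    """Relative frequency of every length-k word over ACGT, in lexicographic order."""
    seq = seq.upper()
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in index:  # skip windows containing N or other symbols
            counts[index[word]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


quad = kmer_composition("ACGTACGTAC", 4)  # 4**4 = 256 features
five = kmer_composition("ACGTACGTAC", 5)  # 4**5 = 1024 features
```

Vectors of this kind could be fed to any standard classifier; the choice of classifier is not specified in this excerpt.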

Description

Technical field

[0001] The present application relates to the technical field of promoter identification, and more specifically, to a method and system for identifying a human gene promoter.

Background technique

[0002] After the completion of the human genome draft, research on the regulation of human gene expression has become a very challenging research direction. Promoter identification plays an important role in interpreting the function of the whole genome, so how to identify human promoters quickly and accurately has become a hot research field.

[0003] At present, promoter prediction mainly proceeds from four directions: identifying the transcription initiation site of the promoter, the core promoter region, the transcription factor binding domain, and the CpG island of the promoter. The meaning of a CpG island is as follows: the distribution of CpG dinucleotides in the human genome is very uneven, and in some sections of the genome, CpG mai...
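To make the CpG island notion concrete, the sketch below computes the observed/expected CpG ratio and applies the commonly cited Gardiner-Garden and Frommer criteria (length of at least 200 bp, GC content above 50%, observed/expected above 0.6). These thresholds and function names are assumptions for illustration; the CpG island feature used in the patent may be defined differently.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return n_cpg * len(seq) / (n_c * n_g)


def looks_like_cpg_island(seq: str, min_len=200, min_gc=0.5, min_oe=0.6) -> bool:
    """Apply the assumed length, GC content and obs/exp thresholds."""
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    gc = (seq.count("C") + seq.count("G")) / len(seq)
    return gc > min_gc and cpg_obs_exp(seq) > min_oe
```

For example, `cpg_obs_exp("CGCGCG")` evaluates to 2.0, because three CpG dinucleotides are observed where the base composition alone would predict 1.5.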

Application Information

IPC(8): G06F19/10
Inventor: 张莉, 徐文轩, 罗璇, 王邦军, 杨季文, 李凡长
Owner: SUZHOU UNIV