
A human gene promoter identification method and system

A technology involving promoters and promoter sequences, applied in the field of promoter recognition, that addresses the low recognition rate of gene promoters.

Active Publication Date: 2017-06-16
SUZHOU UNIV
Cites: 3 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] In view of this, this application provides a method and system for identifying human gene promoters, intended to address the low promoter recognition rate of existing algorithms.

Method used



Examples


Embodiment 1

[0070] See Figure 1, which is a flowchart of a method for identifying human gene promoters disclosed in the embodiments of this application.

[0071] As shown in Figure 1, the method includes:

[0072] Step 101: Receive a sample set composed of multiple sample gene sequences;

[0073] Step 102: Compute the cytosine and guanine (CG) preference feature of each sample gene sequence to obtain statistical results;

[0074] Specifically, the numbers of cytosine (C) and guanine (G) bases in each sample gene sequence are counted. If the proportion of C and G in the entire sample gene sequence exceeds a set threshold, the sequence is considered to have a CG preference; otherwise, it has no CG preference.

[0075] Step 103: Divide all sample gene sequences into two categories according to the statistical results: one category has the CG preference feature, and the other category does not have the...
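Steps 102 and 103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the CG statistic is assumed to be the combined fraction (n_C + n_G) / L, and the threshold `CG_THRESHOLD = 0.5` is a hypothetical placeholder, because the text only says "a certain value". The helper names are likewise hypothetical.

```python
from typing import List, Tuple

# Hypothetical threshold: the source only says "a certain value".
CG_THRESHOLD = 0.5


def cg_ratio(seq: str) -> float:
    """Assumed CG statistic: (n_C + n_G) / L for one sample gene sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n_c = seq.count("C")
    n_g = seq.count("G")
    return (n_c + n_g) / len(seq)


def has_cg_preference(seq: str, threshold: float = CG_THRESHOLD) -> bool:
    """Step 102: the sequence has a CG preference if its ratio exceeds the threshold."""
    return cg_ratio(seq) > threshold


def split_by_cg_preference(
    seqs: List[str], threshold: float = CG_THRESHOLD
) -> Tuple[List[str], List[str]]:
    """Step 103: partition sequences into (with CG preference, without CG preference)."""
    with_pref: List[str] = []
    without_pref: List[str] = []
    for s in seqs:
        if has_cg_preference(s, threshold):
            with_pref.append(s)
        else:
            without_pref.append(s)
    return with_pref, without_pref


if __name__ == "__main__":
    samples = ["GCGCGCAT", "ATATATGC", "CCGGAATT"]
    rich, poor = split_by_cg_preference(samples)
    print(rich, poor)  # ['GCGCGCAT'] ['ATATATGC', 'CCGGAATT']
```

Note that "CCGGAATT" has a ratio of exactly 0.5 and therefore falls into the second group under the strict "greater than" comparison described in [0074].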

Embodiment 2

[0087] This embodiment describes in detail how each step of Embodiment 1 is implemented.

[0088] First, define the sample set composed of multiple sample gene sequences as:

[0089] where $x_i \in \mathbb{R}^L$, $y_i \in \{\text{promoter}, \text{exon}, \text{intron}, \text{3'UTR}\}$, $N$ is the number of samples, and $L$ is the length of each sample gene sequence.

[0090] Note that the 3'UTR is the untranslated region at the 3' end of the gene sequence.
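One way to hold this sample set in code is sketched below. It is only an illustration: the `Sample` class and the `to_binary_label` and `check_sample_set` helpers are hypothetical, sequences are kept as strings rather than the numeric vectors $x_i \in \mathbb{R}^L$ used in the text, and mapping exon, intron, and 3'UTR samples to a single non-promoter class is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Label set from paragraph [0089].
LABELS = {"promoter", "exon", "intron", "3'UTR"}


@dataclass(frozen=True)
class Sample:
    """One labelled sample gene sequence (hypothetical container)."""
    sequence: str
    label: str

    def __post_init__(self) -> None:
        # Reject labels outside the four classes listed in [0089].
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def to_binary_label(sample: Sample) -> int:
    """Assumption: promoter -> 1; exon, intron and 3'UTR -> 0 (non-promoter)."""
    return 1 if sample.label == "promoter" else 0


def check_sample_set(samples: List[Sample]) -> int:
    """Verify that all N samples share one length L and return L."""
    if not samples:
        raise ValueError("sample set is empty")
    lengths = {len(s.sequence) for s in samples}
    if len(lengths) != 1:
        raise ValueError(f"expected one common length L, got {sorted(lengths)}")
    return lengths.pop()


if __name__ == "__main__":
    data = [Sample("ACGTACGT", "promoter"), Sample("TTTTAAAA", "intron")]
    print(check_sample_set(data), [to_binary_label(s) for s in data])  # 8 [1, 0]
```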

[0091] Next, the cytosine and guanine (CG) preference feature of each sample gene sequence is computed as follows:

[0092] For each sample gene sequence $x_i$, the proportion of cytosine (C) and guanine (G) is counted, and the sample set after these statistics is expressed as:

[0093] where $n_C$ is the number of cytosines (C) in the sample gene sequence and $n_G$ is the number of guanines (G) in the sample gene sequence.

[0094] Based on these CG preference statistics, all sample gene sequences are divided into t...

Embodiment 3

[0132] See Figure 2, which is a structural diagram of a human gene promoter recognition system disclosed in the embodiments of this application.

[0133] As shown in Figure 2, the system includes:

[0134] The receiving unit 21 is configured to receive a sample set composed of multiple sample gene sequences;

[0135] The statistical unit 22 is configured to compute the cytosine and guanine (CG) preference feature of each sample gene sequence separately to obtain statistical results;

[0136] The classification unit 23 is configured to divide all sample gene sequences into two categories according to the statistical results: one category has the CG preference feature, and the other does not;

[0137] The feature extraction unit 24 is configured to extract the rigidity features, CpG island features, and quadruplet component features of each sample gene sequence after the division;

[0138] The rigid classifier 25 composed of the rigid ...
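The quadruple component features of [0137] (called quadruplet component features in the Abstract), and the five-conjoint component features mentioned in the Abstract, are read here as 4-mer and 5-mer (tetranucleotide and pentanucleotide) frequencies. That reading, and the hypothetical `kmer_composition` helper below, are assumptions for illustration only; the rigidity and CpG island features are not shown.

```python
from itertools import product
from typing import List


def kmer_composition(seq: str, k: int = 4) -> List[float]:
    """Return normalised counts of all 4**k k-mers over the alphabet ACGT.

    Assumption: "quadruplet" components are 4-mers (k=4) and
    "five-conjoint" components are 5-mers (k=5). Windows containing
    characters other than A, C, G, T (e.g. N) are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    # Slide a window of width k across the sequence.
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


if __name__ == "__main__":
    feats = kmer_composition("ACGTACGTAC", k=4)
    print(len(feats), round(sum(feats), 6))  # 256 1.0
```

The resulting fixed-length vector (256 entries for k=4, 1024 for k=5) is the kind of input a conventional classifier expects, regardless of the sequence length.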


Abstract

The invention discloses a promoter identification method. The preference of a plurality of sample gene sequences for cytosine (C) and guanine (G) is computed, and the sample gene sequences are divided into two types accordingly. The following steps are then carried out on each type separately: the rigidity features, CpG island features, and quadruplet component features of each sample gene sequence are extracted; corresponding classifiers are built to judge whether each sample gene sequence is a promoter; the pentamer (five-conjoint) component features of the sequences identified as non-promoters are extracted; and a five-conjoint classifier is built to perform promoter identification again. When the identification results meet a preset condition, the current sample gene sequence is determined to be a promoter sequence; otherwise, it is a non-promoter sequence. Because the rigidity features, CpG island features, and compositional features of genes are all taken into account, the promoter identification results produced by this classification scheme are highly accurate.
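The two-stage decision described in the Abstract can be sketched as below, to be applied separately to the CG-preferring and non-CG-preferring groups. This is a hedged illustration: the classifiers are plain callables standing in for the trained rigidity, CpG island, quadruplet, and five-conjoint classifiers, and the "preset condition" is assumed, purely for illustration, to be "at least `min_votes` positive votes", since the source does not state it.

```python
from typing import Callable, Sequence

# A classifier maps a sequence to True (promoter) or False (non-promoter).
Classifier = Callable[[str], bool]


def cascade_predict(
    seq: str,
    first_stage: Sequence[Classifier],
    pentamer_clf: Classifier,
    min_votes: int = 2,
) -> bool:
    """Two-stage promoter decision.

    Hypothetical preset condition: the sequence is a promoter when at least
    `min_votes` classifiers vote "promoter".
    """
    # Stage 1: rigidity, CpG island and quadruplet classifiers vote.
    positives = sum(clf(seq) for clf in first_stage)
    if positives >= min_votes:
        return True
    # Stage 2: the sequence was judged non-promoter, so re-examine it
    # with the five-conjoint (pentamer) classifier.
    if pentamer_clf(seq):
        positives += 1
    return positives >= min_votes


if __name__ == "__main__":
    # Toy stand-ins for trained classifiers, for demonstration only.
    toy_first_stage = [
        lambda s: "TATA" in s,
        lambda s: s.count("CG") >= 2,
        lambda s: len(s) > 10,
    ]
    toy_pentamer = lambda s: "GGGCG" in s
    print(cascade_predict("CGCGTATA", toy_first_stage, toy_pentamer))  # True
```

Keeping the classifiers as callables leaves the choice of learning algorithm open, which matches the source's silence on what model each classifier uses.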

Description

Technical field

[0001] This application relates to the technical field of promoter recognition and, more specifically, to a method and system for human gene promoter recognition.

Background

[0002] Since the completion of the draft human genome, the study of human gene expression regulation has become a very challenging research direction. Promoter recognition plays an important role in interpreting the function of the entire genome, so how to recognize human promoters quickly and accurately has become an active research field.

[0003] At present, promoter prediction mainly proceeds along four directions: identifying the transcription start site of the promoter, the core promoter region, the transcription factor binding domains, and the CpG islands of the promoter. A CpG island is characterized as follows: CpG dinucleotides are distributed very unevenly in the human genome, and in certain segments of the genome, CpG is maintained or higher tha...
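To make the CpG island notion concrete, the sketch below checks a window against the widely cited Gardiner-Garden and Frommer (1987) criteria: length of at least 200 bp, C+G content above 50%, and an observed/expected CpG ratio above 0.6, where observed/expected = n_CpG × L / (n_C × n_G). These thresholds come from that general definition and are not necessarily the ones used in this patent; the function names are hypothetical.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: n_CpG * L / (n_C * n_G)."""
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    # "CG" occurrences can never overlap, so str.count is exact here.
    n_cpg = seq.count("CG")
    return n_cpg * len(seq) / (n_c * n_g)


def looks_like_cpg_island(
    seq: str, min_len: int = 200, min_gc: float = 0.5, min_oe: float = 0.6
) -> bool:
    """Apply the commonly cited length, GC-content and obs/exp thresholds."""
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    gc = (seq.count("C") + seq.count("G")) / len(seq)
    return gc > min_gc and cpg_obs_exp(seq) > min_oe


if __name__ == "__main__":
    print(looks_like_cpg_island("CG" * 100))  # True
    print(looks_like_cpg_island("AT" * 100))  # False
```

In a real pipeline this test would typically be applied to sliding windows along a longer sequence rather than to one fixed fragment.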


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8) G06F19/10
Inventors: 张莉 (Zhang Li), 徐文轩 (Xu Wenxuan), 罗璇 (Luo Xuan), 王邦军 (Wang Bangjun), 杨季文 (Yang Jiwen), 李凡长 (Li Fanzhang)
Owner: SUZHOU UNIV