A human gene promoter identification method and system
A technology relating to promoters and promoter sequences, applied in the field of promoter recognition, that addresses the low recognition rate of gene promoters.
Examples
Embodiment 1
[0070] See Figure 1, which is a flowchart of a method for identifying human gene promoters disclosed in the embodiments of this application.
[0071] As shown in Figure 1, the method includes:
[0072] Step 101: Receive a sample set composed of multiple sample gene sequences;
[0073] Step 102: Compute the cytosine-guanine (CG) preference feature of each sample gene sequence to obtain statistical results;
[0074] Specifically, the numbers of cytosine (C) and guanine (G) bases in each sample gene sequence are counted. If the proportion of C and G over the entire sample gene sequence is greater than a certain value, the sample gene sequence is considered to have a CG preference; otherwise, it has no CG preference.
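A minimal sketch of the per-sequence test in Step 102, assuming the "proportion of C and G" is the GC content (n_C + n_G) / L. The source only says "a certain value", so the threshold of 0.5 below is a hypothetical placeholder, and the function names are illustrative rather than taken from the patent.

```python
# Hypothetical threshold: the source only says "a certain value".
CG_THRESHOLD = 0.5


def cg_ratio(seq: str) -> float:
    """Fraction of C and G bases in a DNA sequence (assumed definition of the CG ratio)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n_c = seq.count("C")  # number of cytosines
    n_g = seq.count("G")  # number of guanines
    return (n_c + n_g) / len(seq)


def has_cg_preference(seq: str, threshold: float = CG_THRESHOLD) -> bool:
    """True if the sequence's CG ratio is strictly greater than the threshold."""
    return cg_ratio(seq) > threshold


print(cg_ratio("ACGCGT"))           # 0.6666666666666666
print(has_cg_preference("ACGCGT"))  # True
print(has_cg_preference("AATT"))    # False
```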
[0075] Step 103: Divide all sample gene sequences into two categories according to the statistical results: one category has the CG preference feature, and the other does not have the...
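Step 103 can be sketched as a simple partition of labelled samples. This assumes the same GC-content ratio and a hypothetical threshold of 0.5 (the source does not give the value); the function name `split_by_cg_preference` is illustrative only.

```python
from typing import List, Tuple

# Each sample is (sequence, label); labels follow Embodiment 2:
# "promoter", "exon", "intron", "3'UTR".
Sample = Tuple[str, str]


def split_by_cg_preference(
    samples: List[Sample], threshold: float = 0.5
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into (CG-preferring, non-CG-preferring) groups."""
    with_pref: List[Sample] = []
    without_pref: List[Sample] = []
    for seq, label in samples:
        s = seq.upper()
        if not s:
            raise ValueError("empty sequence")
        # Assumed statistic: combined C + G fraction of the whole sequence.
        ratio = (s.count("C") + s.count("G")) / len(s)
        target = with_pref if ratio > threshold else without_pref
        target.append((seq, label))
    return with_pref, without_pref


samples = [("CGCGCGAT", "promoter"), ("ATATATGC", "intron"), ("GGCCAATT", "exon")]
cg, non_cg = split_by_cg_preference(samples)
print([label for _, label in cg])      # ['promoter']
print([label for _, label in non_cg])  # ['intron', 'exon']
```

Note that "GGCCAATT" has a ratio of exactly 0.5 and lands in the non-preferring group, because the source's condition is "greater than" the threshold.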
Embodiment 2
[0087] This embodiment describes in detail how each step of Embodiment 1 is implemented.
[0088] First, define the sample set composed of multiple sample gene sequences as:
[0089] where $x_i \in \mathbb{R}^L$, $y_i \in \{\text{"promoter"}, \text{"exon"}, \text{"intron"}, \text{"3'UTR"}\}$, $N$ is the number of samples, and $L$ is the length of each sample gene sequence.
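The defining equation for the sample set did not survive extraction. From the symbols defined here, the set is presumably a collection of $N$ labelled pairs; one consistent way to write it (the name $D$ is our own notation, not the source's) is:

$$
D = \{(x_i, y_i)\}_{i=1}^{N}, \qquad x_i \in \mathbb{R}^L, \quad y_i \in \{\text{promoter}, \text{exon}, \text{intron}, \text{3'UTR}\}
$$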
[0090] Note that the 3'UTR is the untranslated region at the 3' end of the gene sequence.
[0091] Next, the cytosine-guanine (CG) preference feature of each sample gene sequence is computed as follows:
[0092] For each sample gene sequence $x_i$, the proportion of cytosine (C) and guanine (G) is counted, and the sample set after this statistic is expressed as:
[0093] where $n_C$ is the number of cytosines (C) in the sample gene sequence and $n_G$ is the number of guanines (G) in the sample gene sequence.
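The statistic itself was also lost in extraction. Assuming the ratio is the combined C + G fraction of a length-$L$ sequence, as Step 102 suggests, it would read as follows; the symbols $r_i$ and $\theta$ are our own, and $\theta$ stands for the unspecified threshold ("a certain value"):

$$
r_i = \frac{n_C + n_G}{L}, \qquad x_i \text{ has CG preference} \iff r_i > \theta
$$

For example, a sequence with $L = 200$, $n_C = 60$ and $n_G = 50$ gives $r_i = 110/200 = 0.55$, which would count as CG-preferring for any $\theta < 0.55$.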
[0094] Based on the CG preference statistics, all sample gene sequences are divided into t...
Embodiment 3
[0132] See Figure 2, which is a structural diagram of a human gene promoter recognition system disclosed in the embodiments of this application.
[0133] As shown in Figure 2, the system includes:
[0134] The receiving unit 21 is configured to receive a sample set composed of multiple sample gene sequences;
[0135] The statistical unit 22 is configured to compute the cytosine-guanine (CG) preference feature of each sample gene sequence to obtain statistical results;
[0136] The classification unit 23 is configured to divide all the sample gene sequences into two categories according to the statistical results, one category has the CG preference feature, and the other category does not have the CG preference feature;
[0137] The feature extraction unit 24 is configured to extract the rigid features, CpG island features, and quadruple component features of each sample gene sequence after the division;
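The source does not define these features at this point. The sketch below assumes that "quadruple component features" means tetranucleotide (4-mer) composition, and it uses the common CpG observed/expected ratio as a stand-in for a CpG island feature. Function names are hypothetical. The rigid features are omitted because they depend on parameter tables that the source does not give.

```python
from itertools import product
from typing import Dict, List

BASES = "ACGT"
# All 256 tetranucleotides in a fixed order so feature vectors line up across samples.
KMERS_4 = ["".join(p) for p in product(BASES, repeat=4)]


def tetranucleotide_composition(seq: str) -> List[float]:
    """Relative frequency of each 4-mer over overlapping windows (assumed reading of 'quadruple component')."""
    seq = seq.upper()
    counts: Dict[str, int] = {k: 0 for k in KMERS_4}
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i : i + 4]
        if kmer in counts:  # skip windows containing N or other non-ACGT symbols
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(KMERS_4)
    return [counts[k] / total for k in KMERS_4]


def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: n_CpG * L / (n_C * n_G), a common CpG-island statistic."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    n_cpg = sum(1 for i in range(len(seq) - 1) if seq[i : i + 2] == "CG")
    return n_cpg * len(seq) / (n_c * n_g)


features = tetranucleotide_composition("ACGTACGT")
print(len(features))            # 256
print(cpg_obs_exp("ACGTACGT"))  # 2 CpGs * 8 / (2 * 2) = 4.0
```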
[0138] The rigid classifier 25 composed of the rigid ...