A screening method for uORFs that inhibit abnormal proliferation of tumors

The method identifies uORFs genome-wide and filters them using transcriptome and translatome data analysis software, addressing the lack of uORF screening methods in existing technologies. It enables the screening of uORFs that inhibit abnormal tumor proliferation and provides support for tumor drug target research.

CN116486917B · Active · Publication Date: 2026-06-16 · HUNAN UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: HUNAN UNIV
Filing Date: 2023-04-24
Publication Date: 2026-06-16


Abstract

The application discloses a method for screening uORFs that inhibit abnormal tumor proliferation, comprising the following steps: S1, identifying genome-wide uORFs based on a human whole-genome file and a human genome annotation file; S2, obtaining uORFs with a translation signal according to tumor translatome data; and S3, setting screening criteria for uORFs that inhibit abnormal tumor proliferation and obtaining such uORFs based on the set criteria. The application can screen uORFs that inhibit abnormal tumor proliferation from the whole genome, provides possible tumor drug targets for future scientific research and clinical medicine, and provides a reference direction for research on the role of uORFs in tumors.

Description

Technical Field

[0001] This invention belongs to the field of biomedical technology, and in particular relates to a method for screening uORFs that inhibit abnormal tumor proliferation.

Background Technology

[0002] Cancer is now the leading cause of death among urban residents in China, and multiple complex factors act together in the development of tumors, posing a huge challenge to cancer treatment. The emergence of translatomics has provided a new direction for addressing the cancer problem.

[0003] With the rapid development of high-throughput sequencing technology and growing research on eukaryotic translation, translatomics continues to evolve. Research on mRNA (messenger ribonucleic acid) translation is no longer limited to the CDS (protein-coding sequence) region, but has begun to focus on translation in the 5'UTR (5' untranslated region), where the translated sequence is called a uORF (upstream open reading frame). As the importance of uORFs in translation regulation has grown, research on uORF regulation in tumors has also increased. For example, the translation of ATF4 protein (cyclic adenosine monophosphate-dependent transcription factor) in oral squamous cell carcinoma is abnormally regulated by a uORF, leading to abnormal tumor proliferation. In breast cancer, the uORF-mediated inhibition of HER2 mRNA translation (HER2 is an important prognostic factor for breast and gastric cancer) is suppressed, and this suppression of the HER2 uORF's inhibitory effect promotes breast cancer development. Although there are examples of abnormal uORF translation affecting tumor occurrence and development, this technical field still lacks methods for screening, from the whole genome, uORFs that inhibit abnormal tumor proliferation.

Summary of the Invention

[0004] To address the aforementioned technical problems, this invention provides a method for screening uORFs that inhibit abnormal tumor proliferation, the method comprising the following steps:

[0005] S1. Identify uORFs of the whole genome based on human whole genome files and human genome annotation files;

[0006] S2. Obtain uORFs with translational signals based on tumor translatome data;

[0007] S3. Set the screening criteria for uORFs that inhibit abnormal tumor proliferation, and obtain such uORFs based on the set screening criteria.

[0008] Preferably, step S1 specifically includes:

[0009] S11. Download the human whole genome cDNA fasta file and human genome annotation file from the ENSEMBL database, and select protein-coding genes based on the human genome annotation file. Then, find uORFs in the selected protein-coding genes according to the human whole genome cDNA fasta file. A uORF satisfies the following conditions: it begins with a start codon located upstream of the CDS start codon of the same transcript, and a stop codon exists in the same reading frame as that start codon.

[0010] S12. Extract the genomic background information of the uORFs and complete the uORF annotation file to identify uORFs of the whole genome.

[0011] Preferably, step S12 specifically involves: extracting and recording the genomic background information of each uORF, naming each uORF by the transcript name plus the position of the first base of the uORF start codon on the transcript, and recording the position of the uORF start codon, the position of the stop codon, and the sequence length.

[0012] Preferably, the extraction and recording of the genomic background information of the uORF specifically involves: first, extracting the phyloP scores of the start codons and of the complete sequences of all annotated uORFs from the UCSC Genome Browser site to determine the conservation of each uORF; then, establishing a position probability matrix based on the 6 nucleotides upstream and 1 nucleotide downstream of the CDS start codon of all protein-coding genes, and using this position probability matrix as the Kozak sequence score background; next, extracting the 6 nucleotides upstream and 1 nucleotide downstream of the uORF start codon to calculate the Kozak score of the uORF; finally, calculating the distance between the uORF start codon and the transcript start position, and the distance between the uORF stop codon and the CDS start codon of the transcript, based on the position of the uORF in the transcript, thereby obtaining the genomic background information of the uORF.

[0013] Preferably, step S2 specifically includes:

[0014] S21. Obtain FastQ data from the transcriptome and translatome, and remove adapter sequences, substandard reads, and reads that perfectly match rRNA and tRNA from the reads.

[0015] S22. Using STAR software, align the reads obtained after processing in step S21 to the reference genome to obtain binary sequence alignment map (BAM) format files of the reads successfully aligned to the reference genome. Then, based on RSEM software, identify the highest-expressed transcript of each gene in the sample from the BAM format files of the transcriptome and generate a new human genome annotation file.

[0016] S23. Based on featureCounts software and using new human genome annotation files, determine the expression level of each gene at the transcriptome level in the transcriptome data;

[0017] S24. Count the number of reads of each length in the translatome data, and select the read lengths that cover more than 85% of all reads;

[0018] S25. Based on the translation initiation site and the three-base periodicity, determine the P site for reads of each length, assign each read to its identified P site, calculate the three-base periodicity of all reads in the sample, and select qualified samples.

[0019] S26. Based on the qualified samples selected in step S25, count the number of uORF and CDS reads in each transcript.

[0020] Preferably, step S21 specifically involves: first, acquiring FastQ data from the transcriptome and translatome; then, using Fastx_clipper software to delete reads shorter than 25 bases and to remove reads containing the unknown base N; next, using Fastq_quality_filter software to remove low-quality reads where half of the bases have a quality of less than 20; and finally, using Bowtie2 software to align the reads with human rRNA and tRNA sequences, identifying and discarding reads that perfectly match the reference genome rRNA and tRNA.

[0021] Preferably, step S22 specifically involves: using STAR software to align the reads obtained after step S21 to the reference genome, retaining only uniquely aligned reads with at most two base mismatches and more than 16 successfully matched bases, to obtain a BAM format file of the reads successfully aligned to the reference genome; determining the position of each read on the reference genome to obtain a SAM format file giving the position on each transcript of the reads successfully aligned to the reference genome; sorting the alignment records in the output BAM format file by chromosome position based on the results of the rsem-calculate-expression command; then comparing the expression levels of the different transcripts of the same gene based on the alignment information of each read in the output BAM format file, and taking the transcript with the highest expression level in the gene as the transcript expressed in the sample; finally, extracting the information of the highest-expressed transcripts from the full annotation file to generate a new human genome annotation file containing only the highest-expressed transcript of each gene.

[0022] Preferably, step S25 specifically involves: when the phase with the most reads in the translation initiation site plot is the same as the phase with the most reads in the three-base periodicity plot, the P site for that read length is taken as the distance between the phase with the most reads in the translation initiation site plot and position 0; when the two phases differ, reads of that length are discarded; then, each read is assigned to its identified P site and the three-base periodicity of all reads in the sample is calculated; if the reads in the sample as a whole show three-base periodicity, the sample is considered qualified.

[0023] Preferably, step S3 specifically involves: calculating the fold difference in uORF translation efficiency and the fold difference in CDS translation efficiency between the tumor and adjacent normal tissue using Riborex software, and obtaining the uORF that inhibits abnormal tumor proliferation based on the absolute value of the difference between the uORF translation efficiency fold difference and the CDS translation efficiency fold difference.

[0024] Preferably, when the absolute value of the difference between the uORF translation efficiency fold difference $\mathrm{uORF(LFC)}_{TE}$ and the CDS translation efficiency fold difference $\mathrm{CDS(LFC)}_{TE}$ is greater than 1.5, that is, $\lvert \mathrm{uORF(LFC)}_{TE} - \mathrm{CDS(LFC)}_{TE} \rvert > 1.5$, the uORF is determined to be a uORF that inhibits abnormal tumor proliferation.

[0025] Compared with existing technologies, the screening method for uORFs that inhibit abnormal tumor proliferation proposed in this invention first identifies uORFs across the entire genome and extracts those with translational signals. Screening criteria for uORFs that inhibit abnormal tumor proliferation are then established and applied to select such uORFs. This invention can screen for uORFs that inhibit abnormal tumor proliferation, providing potential tumor drug targets for future research and clinical medicine, and offering a reference direction for research on the role of uORFs in tumors.

Description of the Drawings

[0026] Figure 1 is a flowchart of the screening method for uORFs that inhibit abnormal tumor proliferation described in this invention.

[0027] Figure 2 is a schematic diagram showing the distribution of read counts for all read lengths in the translatome in this invention.

[0028] Figure 3 is the translatome quality control plot for a read length of 28 with the P site at the 12th base of each read in this embodiment.

[0029] Figure 4 is the translatome quality control plot for a read length of 29 with the P site at the 13th base of each read in this embodiment.

[0030] Figure 5 is a schematic diagram illustrating the three-base periodicity of all reads in the sample in this invention.

[0031] Figure 6 is a schematic diagram of the criteria for screening uORFs related to abnormal tumor proliferation in this invention.

Detailed Implementation

[0032] To better understand the above-mentioned objectives, features, and advantages of the present invention, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, unless otherwise specified, the embodiments and features described in these embodiments can be combined with each other.

[0033] As shown in Figures 1-6, a method for screening uORFs that inhibit abnormal tumor proliferation includes the following steps:

[0034] S1. Identify uORFs of the whole genome based on human whole genome files and human genome annotation files;

[0035] This step specifically includes:

[0036] S11. Download the human whole-genome cDNA fasta file and human genome annotation file from the ENSEMBL database (a genome database). Based on the human genome annotation file, select protein-coding genes. Then, based on the human whole-genome cDNA fasta file, identify uORFs in the selected protein-coding genes. A uORF satisfies the following conditions: it begins with a start codon located upstream of the CDS start codon of the same transcript, and a stop codon exists in the same reading frame as that start codon. The human whole-genome cDNA fasta file is located at https://ftp.ensembl.org/pub/release-109/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz, and the human genome annotation file is located at:

[0037] https://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz;
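
The uORF definition in S11 can be illustrated with a minimal Python sketch. It is not the patent's implementation: it assumes only ATG counts as a start codon, takes the transcript sequence and the 0-based CDS start index as already parsed inputs, and uses a hypothetical transcript name `TX1`. The record fields mirror the annotation items that step S12 records.

```python
START = "ATG"                      # assumption: canonical start codon only
STOPS = {"TAA", "TAG", "TGA"}

def find_uorfs(transcript_id, seq, cds_start):
    """Return uORFs whose start codon lies upstream of the CDS start codon.

    seq is the transcript cDNA sequence; cds_start is the 0-based index of
    the first base of the CDS start codon on that transcript.
    """
    seq = seq.upper()
    uorfs = []
    for i in range(0, min(cds_start, len(seq) - 2)):
        if seq[i:i + 3] != START:
            continue
        # Walk codon by codon in the same reading frame until a stop codon.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOPS:
                uorfs.append({
                    "id": f"{transcript_id}_{i + 1}",   # transcript name + start position
                    "start": i + 1,                      # 1-based first base of start codon
                    "stop": j + 3,                       # 1-based last base of stop codon
                    "length": j + 3 - i,                 # includes start and stop codons
                    "dist_to_tx_start": i,               # bases before the start codon
                    "dist_stop_to_cds": cds_start - (j + 3),  # negative if overlapping the CDS
                })
                break
    return uorfs

# Hypothetical transcript: uORF ATG at index 2, CDS ATG at index 13.
example = find_uorfs("TX1", "GGATGAAATAGCCATGGCCTAA", 13)
```

A start codon with no in-frame stop codon before the end of the transcript yields no record, which matches the requirement that a stop codon exist in the same reading frame.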

[0038] S12. Extract the genomic background information of uORFs and complete the uORF annotation files to identify uORFs of the whole genome. Specifically, extract and record the genomic background information of uORFs as follows. First, extract the phyloP scores (a nucleic acid conservation score; the higher the score, the more conserved the sequence, suggesting that the sequence has an important function) of the start codons and of the complete sequences of all annotated uORFs from the UCSC Genome Browser site (a genome browser containing genome drafts of multiple species such as humans, mice and rats, and providing a series of web analysis tools; in this embodiment, the hg38.phyloP100way.bw file is downloaded from this site) to determine the conservation of uORFs. Then, establish a position probability matrix based on the 6 nucleotides upstream and the 1 nucleotide downstream of the CDS start codon of all protein-coding genes, and use this position probability matrix as the Kozak sequence score background. Next, extract the 6 nucleotides upstream and the 1 nucleotide downstream of the uORF start codon to calculate the Kozak score of the uORF (a sequence score used to assess the context of the start codon during translation; the Kozak sequence is the sequence of several nucleotides surrounding the start codon in eukaryotic mRNA, typically gccRccAUGG, where R represents a purine (A or G) and AUG represents the start codon; a higher Kozak score indicates that the start codon is more easily translated, thus promoting gene expression). Finally, based on the location of the uORF in the transcript, calculate the distance between the start codon of the uORF and the start position of the transcript, and the distance between the stop codon of the uORF and the start codon of the CDS of the transcript, to obtain the genomic background information of the uORF. At the same time, each uORF is named by the transcript name plus the position of the first base of its start codon on the transcript, and the position of the start codon, the position of the stop codon, and the sequence length of the uORF are recorded. It should be noted that the uORF identified in this step refers to the uORF annotated at the DNA level in the human reference genome.
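
The Kozak scoring described in S12 can be sketched in Python as follows. The text specifies the window (6 nt upstream, 1 nt downstream) and the position probability matrix, but not the scoring formula. The summed log2 odds against a uniform background and the pseudocount used here are assumptions made for illustration, as are all function names.

```python
import math

BASES = "ACGT"

def kozak_window(seq, start):
    """Return the 6 nt upstream + 1 nt downstream of the start codon whose
    first base is at 0-based index start, or None if the window is truncated."""
    if start < 6 or start + 4 > len(seq):
        return None
    return seq[start - 6:start] + seq[start + 3]

def build_ppm(windows, pseudocount=1.0):
    """Position probability matrix over 7 positions from CDS start-codon windows."""
    ppm = []
    for pos in range(7):
        counts = {b: pseudocount for b in BASES}
        for w in windows:
            if w[pos] in counts:        # ignore ambiguous bases such as N
                counts[w[pos]] += 1
        total = sum(counts.values())
        ppm.append({b: counts[b] / total for b in BASES})
    return ppm

def kozak_score(window, ppm):
    """Assumed score: summed log2 odds versus a uniform 0.25 background.
    The window must contain only A, C, G, T."""
    return sum(math.log2(ppm[i][b] / 0.25) for i, b in enumerate(window))

# Hypothetical background built from three CDS start-codon contexts.
ppm = build_ppm(["GCCACCG", "GCCGCCG", "GCCACCG"])
strong = kozak_score("GCCACCG", ppm)
weak = kozak_score("TTTTTTT", ppm)
```

In practice the background windows would come from the CDS start codons of all protein-coding transcripts, and each uORF window would be scored against that matrix.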

[0039] S2. Obtain uORFs with translational signals based on tumor translatome data;

[0040] This step specifically includes:

[0041] S21. Obtain FastQ (the raw data format of next-generation sequencing) data for the transcriptome and translatome, and remove adapter sequences, substandard reads, and reads that perfectly match rRNA and tRNA. Specifically: first, obtain FastQ data for the transcriptome and translatome; then, use Fastx_clipper software to remove reads shorter than 25 bases and reads containing the unknown base N; then use Fastq_quality_filter software to remove low-quality reads in which half of the bases have a quality below 20; finally, use Bowtie2 software to align the reads with human rRNA and tRNA sequences, and identify and discard reads that perfectly match the reference genome rRNA and tRNA. In this step, the transcriptome data refers to all transcribed mRNAs, and the translatome data refers to all mRNAs that are bound to and translated by ribosomes. Not all uORFs annotated in step S1 are bound by ribosomes and take part in translation.
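
The read-level filtering rules in S21 can be expressed as a short Python predicate. This sketch only mirrors the stated thresholds and is not a substitute for the named tools. It assumes Phred+33 quality encoding and reads "half of the bases" as "at least half".

```python
def phred_scores(quality_line, offset=33):
    """Decode a FastQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]

def keep_read(seq, quality_line, min_len=25, min_q=20):
    """Apply the three per-read rules from S21 to one FastQ record."""
    if len(seq) < min_len:              # shorter than 25 bases
        return False
    if "N" in seq.upper():              # contains the unknown base N
        return False
    low = sum(1 for q in phred_scores(quality_line) if q < min_q)
    return low * 2 < len(seq)           # discard when half or more bases are below Q20

good = keep_read("A" * 28, "I" * 28)                  # 'I' decodes to Q40
too_short = keep_read("A" * 20, "I" * 20)
has_n = keep_read("A" * 27 + "N", "I" * 28)
low_quality = keep_read("A" * 28, "#" * 14 + "I" * 14)  # '#' decodes to Q2
```

Removal of adapter sequences and of rRNA/tRNA-matching reads requires alignment and is outside the scope of this predicate.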

[0042] S22. Using STAR software, the reads obtained after processing in step S21 are aligned to the reference genome to obtain binary sequence alignment map (BAM) format files of the reads successfully aligned to the reference genome. Then, based on RSEM software, the transcript with the highest expression level for each gene in the sample is identified from the BAM format files of the transcriptome, and a new human genome annotation file is generated. Specifically, using STAR software, the reads obtained after processing in step S21 are aligned to the reference genome, and only uniquely aligned reads with at most two base mismatches and more than 16 successfully matched bases are retained, giving BAM format files of the reads successfully aligned to the reference genome. The position of each read on the reference genome is determined to obtain the position on each transcript of the reads successfully aligned to the reference genome (a SAM format file, the format for storing RNA-seq alignment results). Based on the results of the rsem-calculate-expression command (a transcript quantification script), the alignment records in the output BAM format file are sorted by chromosome position. Then, based on the alignment information of each read in the output BAM format file, the expression levels of the different transcripts of the same gene are compared, and the transcript with the highest expression level in the gene is taken as the transcript expressed in the sample. Finally, the information of the highest-expressed transcripts is extracted from the full annotation file to generate a new human genome annotation file containing only the highest-expressed transcript of each gene. When BAM format files are processed with rsem-calculate-expression, parameter A represents the average length of all fragments in the input BAM format file, and parameter B represents the standard deviation of all fragment lengths in the input BAM format file. If a strand-specific library construction method is used, the --strandedness forward parameter can also be added.
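
Selecting the highest-expressed transcript per gene and reducing the annotation accordingly can be sketched as below. The sketch assumes the comparison is made on the TPM column of an RSEM-style isoform table (columns `transcript_id`, `gene_id`, `TPM`, among others). The text does not state which expression measure is used, and the gene and transcript IDs are hypothetical.

```python
import csv
import io
import re

TX_RE = re.compile(r'transcript_id "([^"]+)"')

def top_transcript_per_gene(isoforms_tsv_text, measure="TPM"):
    """Map each gene_id to its transcript with the highest value of measure."""
    best = {}
    reader = csv.DictReader(io.StringIO(isoforms_tsv_text), delimiter="\t")
    for row in reader:
        value = float(row[measure])
        gene = row["gene_id"]
        if gene not in best or value > best[gene][1]:
            best[gene] = (row["transcript_id"], value)
    return {gene: tx for gene, (tx, _) in best.items()}

def filter_gtf_lines(lines, keep_transcripts):
    """Keep header lines and GTF records whose transcript_id is selected.
    Gene-level records carry no transcript_id and are dropped in this sketch."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        match = TX_RE.search(line)
        if match and match.group(1) in keep_transcripts:
            kept.append(line)
    return kept

example_tsv = (
    "transcript_id\tgene_id\tlength\teffective_length\texpected_count\tTPM\tFPKM\tIsoPct\n"
    "T1\tG1\t1000\t900\t10\t5.0\t4.0\t20\n"
    "T2\tG1\t1200\t1100\t40\t20.0\t16.0\t80\n"
    "T3\tG2\t800\t700\t5\t3.0\t2.5\t100\n"
)
best = top_transcript_per_gene(example_tsv)
gtf = [
    '1\tsrc\texon\t1\t10\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '1\tsrc\texon\t1\t10\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
]
reduced = filter_gtf_lines(gtf, set(best.values()))
```

Ties between transcripts are resolved in favour of the first one encountered, a choice the text does not address.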

[0043] S23. Based on featureCounts software (a transcriptome quantification tool) and using the new human genome annotation file, determine the expression level of each gene in the transcriptome data at the transcriptome level. If a strand-specific library construction method is used, the --strandedness forward parameter also needs to be added.

[0044] S24. Count the number of reads of each length in the translatome data, and select the read lengths that cover more than 85% of all reads, as shown in Figure 2. Figure 2 shows the distribution of read counts for all read lengths in the translatome, with the rectangles marking the read lengths that cover more than 85% of all reads.
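
One way to read the 85% rule in S24 is to take the most abundant read lengths until their cumulative share of reads exceeds 85%. The Python sketch below implements that interpretation, which is an assumption, since the text does not spell out how the lengths are chosen.

```python
from collections import Counter

def select_read_lengths(read_lengths, coverage=0.85):
    """Return the most abundant read lengths whose cumulative count
    exceeds the given fraction of all reads."""
    counts = Counter(read_lengths)
    total = sum(counts.values())
    selected, covered = [], 0
    for length, n in counts.most_common():
        if covered > coverage * total:
            break                       # threshold already exceeded
        selected.append(length)
        covered += n
    return sorted(selected)

# Hypothetical length distribution: 28 nt and 29 nt reads dominate.
lengths = [28] * 50 + [29] * 40 + [30] * 6 + [27] * 4
chosen = select_read_lengths(lengths)
```

With this toy distribution the 28 nt and 29 nt reads together cover 90% of reads, so only those two lengths are carried forward to P-site determination.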

[0045] S25. Based on the translation initiation site and the three-base periodicity, determine the P site for reads of each length, assign each read to its identified P site, then calculate the three-base periodicity of all reads in the sample, and select qualified samples. Specifically, as shown in Figure 3 and Figure 4, Figure 3 is a translatome quality control plot for a read length of 28 with the P site at the 12th base of each read, and Figure 4 is a translatome quality control plot for a read length of 29 with the P site at the 13th base of each read. Figure 3a and Figure 4a show the phase with the most reads in the three-base periodicity plot. Figure 3b and Figure 4b show the phase with the most reads in the translation initiation site plot (note that in Figure 3b and Figure 4b the phases are arranged from left to right in the repeating order phase 1, phase 2, phase 3). When the phase with the most reads in the translation initiation site plot is the same as the phase with the most reads in the three-base periodicity plot, the P site for that read length is taken as the distance between the phase with the most reads in the translation initiation site plot and position 0. When the two phases differ, reads of that length are discarded. Then, reads of each length are assigned to their respective identified P sites, and the three-base periodicity of all reads in the sample is recalculated, as shown in Figure 5, which shows the three-base periodicity of all reads in the whole sample. If the phase with the most reads in the sample is phase 1 and accounts for more than half of all reads, the sample is considered qualified. To determine whether a sample meets the three-base periodicity standard, it is not enough to look at reads of a single length; a comprehensive judgment must be made based on the three-base periodicity of all reads in the entire sample.
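
The sample-level periodicity check in S25 can be sketched in Python once the P-site base for each retained read length is known. The sketch takes "the 12th base" literally as a 1-based index within the read and uses transcript coordinates relative to the CDS start; both are interpretations, and the read positions are hypothetical.

```python
from collections import Counter

def phase_counts(reads, psite_base, cds_start):
    """Count reads per phase (1, 2, 3) after assigning each read to its P site.

    reads: iterable of (five_prime_pos, length) in 0-based transcript coordinates.
    psite_base: {read_length: 1-based index of the P-site base within the read}.
    cds_start: 0-based index of the first base of the CDS start codon.
    """
    counts = Counter()
    for five_prime, length in reads:
        if length not in psite_base:
            continue                    # lengths without a reliable P site are dropped
        psite = five_prime + psite_base[length] - 1
        counts[(psite - cds_start) % 3 + 1] += 1
    return counts

def is_qualified(counts):
    """Qualified when phase 1 has the most reads and more than half of all reads."""
    total = sum(counts.values())
    if total == 0:
        return False
    phase, n = counts.most_common(1)[0]
    return phase == 1 and n * 2 > total

# P-site bases from the embodiment: 12th base for 28 nt reads, 13th for 29 nt reads.
offsets = {28: 12, 29: 13}
reads = [(89, 28), (92, 28), (91, 29), (90, 28), (95, 30)]
counts = phase_counts(reads, offsets, cds_start=100)
qualified = is_qualified(counts)
```

The 30 nt read is ignored because no P site was established for that length, in line with discarding read lengths whose two quality control plots disagree.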

[0046] S26. Based on the qualified samples selected in step S25, count the number of uORF and CDS reads in each transcript.
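
Counting reads per uORF and per CDS in S26 amounts to counting P-site positions that fall inside each region of a transcript. A minimal Python sketch, assuming half-open 0-based intervals and hypothetical coordinates:

```python
from bisect import bisect_left

def count_in_regions(psites, regions):
    """Count P-site positions inside each half-open region [start, end).

    psites: P-site positions on one transcript (0-based).
    regions: {name: (start, end)} for the uORF(s) and CDS of that transcript.
    """
    ordered = sorted(psites)
    return {
        name: bisect_left(ordered, end) - bisect_left(ordered, start)
        for name, (start, end) in regions.items()
    }

# Hypothetical transcript with one uORF and its CDS.
region_counts = count_in_regions(
    [5, 12, 14, 30, 33, 36, 90],
    {"uORF": (10, 19), "CDS": (30, 90)},
)
```

Sorting once and using binary search keeps the cost low when a transcript carries many uORFs.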

[0047] S3. Set the screening criteria for uORFs that inhibit abnormal tumor proliferation, and obtain such uORFs based on the set screening criteria.

[0048] This step specifically involves: calculating the fold difference in uORF translation efficiency and the fold difference in CDS translation efficiency between the tumor and adjacent normal tissue using Riborex software; when the absolute value of the difference between the uORF translation efficiency fold difference $\mathrm{uORF(LFC)}_{TE}$ and the CDS translation efficiency fold difference $\mathrm{CDS(LFC)}_{TE}$ is greater than 1.5, that is, $\lvert \mathrm{uORF(LFC)}_{TE} - \mathrm{CDS(LFC)}_{TE} \rvert > 1.5$, the uORF is judged to be a uORF that inhibits abnormal tumor proliferation. Figure 6 shows the criteria for screening uORFs related to abnormal tumor proliferation. As can be seen from Figure 6, for the points between line ① and line ②, the absolute value of the difference between $\mathrm{uORF(LFC)}_{TE}$ and $\mathrm{CDS(LFC)}_{TE}$ is no greater than 1.5, while for the points above line ① and below line ② the absolute value of this difference is greater than 1.5; the points above line ① and below line ② are therefore the selected uORFs. Translation efficiency refers to the ratio of the translatome level to the transcriptome level for a gene.
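
The screening criterion in S3 can be illustrated numerically with a short Python sketch. It does not reproduce Riborex, which fits a statistical model. The sketch assumes LFC means a log2 fold change and computes translation efficiency as a simple ratio with an optional pseudocount. All input values are hypothetical.

```python
import math

def log2_te_fold_change(ribo_tumor, rna_tumor, ribo_normal, rna_normal, pseudo=1.0):
    """Assumed LFC of translation efficiency (translatome / transcriptome level)
    between tumor and adjacent normal tissue."""
    te_tumor = (ribo_tumor + pseudo) / (rna_tumor + pseudo)
    te_normal = (ribo_normal + pseudo) / (rna_normal + pseudo)
    return math.log2(te_tumor / te_normal)

def passes_screen(uorf_lfc_te, cds_lfc_te, threshold=1.5):
    """Apply the criterion |uORF(LFC)_TE - CDS(LFC)_TE| > 1.5."""
    return abs(uorf_lfc_te - cds_lfc_te) > threshold

# Hypothetical values: the uORF and its CDS shift in opposite directions.
selected = passes_screen(-1.2, 0.8)       # difference of 2.0
not_selected = passes_screen(0.5, 1.0)    # difference of 0.5
boundary = passes_screen(1.5, 0.0)        # exactly 1.5 does not pass
lfc = log2_te_fold_change(40, 10, 10, 10, pseudo=0.0)
```

The comparison is strict, so a uORF whose difference equals exactly 1.5 falls between line ① and line ② and is not selected.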

[0049] In this embodiment, whole-genome uORFs are first identified based on the human genome file and human genome annotation file. Then, uORFs with translational signals are extracted from these whole-genome uORFs. Next, screening criteria for uORFs that inhibit abnormal tumor proliferation are established, and uORFs that inhibit abnormal tumor proliferation are selected according to these criteria. Therefore, this method can screen for uORFs that inhibit abnormal tumor proliferation, providing potential tumor drug targets for future research and clinical medicine, and offering a reference direction for research on the role of uORFs in tumors.

[0050] The above provides a detailed description of the uORF screening method for inhibiting abnormal tumor proliferation provided by this invention. Specific examples have been used to illustrate the principles and implementation methods of this invention. The descriptions of the above embodiments are merely for the purpose of helping to understand the core ideas of this invention. It should be noted that those skilled in the art can make various improvements and modifications to this invention without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of this invention.

Claims

1. A method for screening uORFs that inhibit abnormal tumor proliferation, characterized in that the method includes the following steps: S1. Identify uORFs of the whole genome based on human whole genome files and human genome annotation files; S2. Obtain uORFs with translational signals based on tumor translatome data; S3. Set screening criteria for uORFs that inhibit abnormal tumor proliferation, and obtain such uORFs based on the set screening criteria. Specifically, calculate the fold difference in uORF translation efficiency and the fold difference in CDS translation efficiency between the tumor and adjacent normal tissue using Riborex software, and obtain the uORFs that inhibit abnormal tumor proliferation based on the absolute value of the difference between the uORF translation efficiency fold difference and the CDS translation efficiency fold difference. When the absolute value of the difference between the uORF translation efficiency fold difference $\mathrm{uORF(LFC)}_{TE}$ and the CDS translation efficiency fold difference $\mathrm{CDS(LFC)}_{TE}$ is greater than 1.5, the uORF is determined to be a uORF that inhibits abnormal tumor proliferation.

2. The screening method for uORFs that inhibit abnormal tumor proliferation as described in claim 1, characterized in that step S1 specifically includes: S11. Download the human whole genome cDNA fasta file and human genome annotation file from the ENSEMBL database, and select protein-coding genes based on the human genome annotation file. Then, find uORFs in the selected protein-coding genes according to the human whole genome cDNA fasta file. A uORF satisfies the following conditions: it begins with a start codon located upstream of the CDS start codon of the same transcript, and a stop codon exists in the same reading frame as that start codon. S12. Extract the genomic background information of the uORFs and complete the uORF annotation file to identify uORFs of the whole genome.

3. The screening method for uORFs that inhibit abnormal tumor proliferation as described in claim 2, characterized in that step S12 specifically involves: extracting and recording the genomic background information of each uORF; naming each uORF by the transcript name plus the position of the first base of the uORF start codon on the transcript, and recording the position of the uORF start codon, the position of the stop codon, and the sequence length.

4. The screening method for uORFs that inhibit abnormal tumor proliferation as described in claim 3, characterized in that the extraction and recording of the genomic background information of the uORF specifically involves the following steps: First, the conservation of the uORF is assessed by extracting the start codons of all annotated uORFs from the UCSC site and the phyloP scores of the complete uORF sequences. Then, a position probability matrix is established based on the 6 nucleotides upstream and 1 nucleotide downstream of the CDS start codon of all protein-coding genes, and this position probability matrix is used as the background for the Kozak sequence score. Next, the Kozak score of the uORF is calculated by extracting the 6 nucleotides upstream and 1 nucleotide downstream of the uORF start codon. Finally, based on the position of the uORF in the transcript, the distance between the uORF start codon and the transcript start position, as well as the distance between the uORF stop codon and the CDS start codon of the transcript, are calculated to obtain the genomic background information of the uORF.

5. The method for screening uORFs that inhibit abnormal tumor proliferation as described in claim 4, characterized in that step S2 specifically includes: S21. Obtain FastQ data from the transcriptome and translatome, and remove adapter sequences, substandard reads, and reads that perfectly match rRNA and tRNA from the reads. S22. Using STAR software, align the reads obtained after processing in step S21 to the reference genome to obtain binary sequence alignment map (BAM) format files of the reads successfully aligned to the reference genome. Then, based on RSEM software, identify the highest-expressed transcript of each gene in the sample from the BAM format files of the transcriptome and generate a new human genome annotation file. S23. Based on featureCounts software and using the new human genome annotation file, determine the expression level of each gene at the transcriptome level in the transcriptome data; S24. Count the number of reads of each length in the translatome data, and select the read lengths that cover more than 85% of all reads; S25. Based on the translation initiation site and the three-base periodicity, determine the P site for reads of each length, assign each read to its identified P site, calculate the three-base periodicity of all reads in the sample, and select qualified samples. S26. Based on the qualified samples selected in step S25, count the number of uORF and CDS reads in each transcript.

6. The screening method for uORFs that inhibit abnormal tumor proliferation as described in claim 5, characterized in that, Step S21 specifically involves: first, acquiring FastQ data from the transcriptome and translatome; then, using Fastx_clipper software to delete reads shorter than 25 bases and to remove reads containing the unknown base N; next, using Fastq_quality_filter to remove low-quality reads with half of their bases having a quality of less than 20; and finally, using Bowtie2 software to align the reads with human rRNA and tRNA sequences, identifying and discarding reads that perfectly match the reference genome rRNA and tRNA.

7. The method for screening uORFs that inhibit abnormal tumor proliferation as described in claim 6, characterized in that step S22 specifically involves: using STAR software to align the reads obtained after step S21 to the reference genome, retaining only uniquely aligned reads with at most two base mismatches and more than 16 successfully matched bases, obtaining a BAM format file of the reads successfully aligned to the reference genome, and determining the position of each read on the reference genome to obtain a SAM format file giving the position on each transcript of the reads successfully aligned to the reference genome. Based on the results of the rsem-calculate-expression command, the alignment records in the output BAM format file are sorted by chromosome position. Then, the expression levels of the different transcripts of the same gene are compared according to the alignment information of each read in the output BAM format file. The transcript with the highest expression level in the gene is taken as the transcript expressed in the sample. Finally, the information of the highest-expressed transcripts is extracted from the full annotation file to generate a new human genome annotation file containing only the highest-expressed transcript of each gene.

8. The screening method for uORFs that inhibit abnormal tumor proliferation as described in claim 7, characterized in that step S25 specifically involves: when the phase with the most reads in the translation initiation site plot is the same as the phase with the most reads in the three-base periodicity plot, the P site for that read length is taken as the distance between the phase with the most reads in the translation initiation site plot and position 0; when the two phases differ, reads of that length are discarded; then, each read is assigned to its identified P site and the three-base periodicity of all reads in the sample is calculated; if the reads in the sample as a whole show three-base periodicity, the sample is considered qualified.