A method for constructing and applying a full-length non-coding RNA sequencing library.
By ligating 3' DNA adapters and 5' RNA adapters to both ends of RNA, and combining DNA probes targeting non-target RNAs with truncated reverse transcription primers, a full-length non-coding RNA sequencing library was constructed. This solved the problem of capturing and sequencing the full-length sequence of medium-length ncRNAs in existing technologies, and enabled efficient and accurate ncRNA detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SUN YAT SEN UNIV
- Filing Date
- 2023-04-06
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies struggle to efficiently and accurately capture and sequence the full-length sequences of medium-length, low-abundance non-coding RNAs (ncRNAs), especially K-turn RNAs. Furthermore, high-throughput sequencing methods generate excessive amounts of useless data, impacting data quality and result interpretation.
By ligating 3' DNA adapters and 5' RNA adapters to both ends of the RNA, and combining them with DNA probes targeting non-target RNA and truncated reverse transcription primers, PCR amplification was performed to construct a full-length non-coding RNA sequencing library.
It increased the sequence ratio of target ncRNAs in the sequencing library, reduced useless sequencing, lowered sequencing costs, and improved the detection accuracy of low-to-medium abundance ncRNAs.
Description
Technical Field
[0001] This invention relates to the field of molecular biology, and in particular to a method for constructing and applying a sequencing library that captures full-length non-coding RNA.
Background Technology
[0002] In addition to messenger RNA (mRNA), which encodes proteins, the human genome also transcribes a large number of non-coding RNAs (ncRNAs). Currently identified ncRNAs include: tRNA and rRNA involved in protein synthesis; snRNA involved in RNA processing; box C / D snoRNA and box H / ACA snoRNA involved in RNA modification; and miRNA, piRNA, circRNA, and lncRNA involved in post-transcriptional regulation of mRNA. Mutations and abnormal expression of these ncRNAs are closely related to major human diseases such as cancer. As key regulatory molecules of genetic information, ncRNAs need to be processed to specific lengths after transcription and interact with RNA-binding proteins to exert their regulatory functions. Kink-turn (K-turn) RNA is one example: it is an ncRNA containing a three-dimensional K-turn structure formed by a C box (conserved motif RUGAUGA) and a D box (conserved motif CUGA), and is usually 60-200 nt long. It guides 2'-O-methylation of RNA or assembly of the splicing complex through a 15.5 kDa protein that binds the K-turn structure (abbreviated as 15.5K; the homologous protein is Snu13p in yeast, L7Ae in archaea, and YbxF / YbxQ in bacteria).
[0003] Among related technologies, the identification of full-length ncRNA sequences is an effective means of sequence analysis. This mainly includes RNA sequence alignment methods, RNA structure prediction methods, RACE (rapid-amplification of cDNA ends), and high-throughput sequencing methods. RNA sequence alignment and RNA structure prediction methods primarily rely on the conservation of ncRNA sequence and structure to predict RNA ends and determine the full-length sequence. While these methods are efficient, their accuracy is low. RACE-based methods can accurately identify the full-length sequence of RNA ends, but this technique has very low throughput. High-throughput sequencing methods can efficiently and accurately analyze the full-length sequence of RNA ends. However, current ncRNA full-length sequencing methods mainly target small RNAs such as miRNAs and piRNAs, or long ncRNAs with polyA tails similar to mRNAs such as lncRNAs. For medium-length, low-abundance ncRNAs such as K-turn RNA, which lack polyA tails, there is currently no specific method for high-throughput sequencing analysis. Therefore, how to specifically capture medium-length and low-abundance ncRNAs such as K-turn RNA and analyze their full-length sequences remains the biggest technical challenge in the field of RNA research.
[0004] In recent years, researchers have developed a number of techniques for capturing the interaction between RNA and RNA-binding proteins, such as RIP-seq and CLIP-seq. These techniques fall into two categories: (1) capturing the RNA that interacts with an RNA-binding protein by immunoprecipitating the protein, then fragmenting the RNA and performing reverse transcription with random primers, followed by library construction and sequencing; (2) enzymatically digesting RNA regions that are not bound by the RNA-binding protein, then capturing the protein-bound RNA fragments by immunoprecipitating the protein, and constructing a library after ligation with RNA adapters. Although these methods can study RNA and its interacting RNA-binding proteins at high throughput, they have obvious drawbacks. First, the information obtained is RNA fragment information, and the full-length information of the interacting RNA cannot be obtained, so it is impossible to determine whether the interaction involves the precursor or the mature form of the RNA. Second, the sequencing data are dominated by reads from high-abundance RNAs (e.g., rRNA, snRNA, and tRNA), and this large amount of useless data crowds out the useful data, affecting data quality and result interpretation. Therefore, how to capture the full length of various ncRNAs and increase their proportion in sequencing data remains an urgent problem to be solved.
Summary of the Invention
[0005] This invention aims to address at least one of the technical problems existing in the prior art. To this end, this invention proposes a method and application for constructing a sequencing library that captures full-length non-coding RNA, which can increase the ratio of target ncRNA reads in the sequencing library, significantly reduce useless sequencing, lower sequencing costs, and improve the accuracy of detecting low-to-medium abundance ncRNAs.
[0006] This invention also proposes a method for sequencing full-length non-coding RNA.
[0007] A first aspect of the present invention provides a method for constructing a sequencing library that captures full-length non-coding RNA, comprising:
[0008] Step S1: Obtain the RNA of the sample to be tested, and ligate a 3' DNA adapter and a 5' RNA adapter to both ends of the RNA to obtain the RNA ligation product;
[0009] Step S2: Mix the RNA ligation product with a DNA probe targeting non-target RNA, anneal, and then remove the non-target RNA and the residual DNA probe to obtain the target RNA ligation product;
[0010] Step S3: Design truncated reverse transcription primers for the target RNA ligation product and synthesize cDNA. Then, use primers containing anchored bases to perform PCR amplification on the cDNA to obtain a full-length non-coding RNA sequencing library.
[0011] The construction method according to embodiments of the present invention has at least the following beneficial effects:
[0012] (1) This invention first captures the paired-end information of RNA by ligating 3' DNA adapters and 5' RNA adapters at both ends of RNA; then, after annealing the RNA ligation product with a DNA probe targeting non-target RNA, the non-target RNA and the DNA probe are digested with RNase H and single-stranded DNA exonuclease RecJf, respectively, which effectively reduces the interference of non-target RNA and residual DNA probes on subsequent experiments. After obtaining the enriched target RNA ligation product, cDNA is synthesized using truncated reverse transcription primers, and then PCR amplification is performed using primers containing anchored bases to improve the accuracy of RNA end identification.
[0013] (2) In related technologies, random primers are used for reverse transcription to obtain sequence information, which cannot obtain information about the two ends of RNA. However, in this invention, adapters are added to the target RNA strand to effectively anchor the RNA ends, and then sequencing can obtain RNA end information at the single-base precision level. Without the corresponding adapters, the RNA ends cannot be anchored. It is evident that the method of this invention can obtain ends at the single-base precision, providing an effective means for accurately studying the structure and motif characteristics of ncRNAs and discovering new types of ncRNAs.
[0014] (3) The present invention captures the information at both ends of RNA by ligating a 3' DNA adapter and a 5' RNA adapter to the two ends. Because coding RNA (such as mRNA) carries a 5' cap structure, it cannot be ligated to the 5' RNA adapter, which effectively avoids interference from coding RNA in the construction of the full-length non-coding RNA library and in the identification results.
[0015] (4) The method of the present invention can greatly increase the proportion of target ncRNA sequences in the sequencing library, significantly reduce useless sequencing and reduce sequencing costs, and effectively improve the accuracy of detecting low- to medium-abundance ncRNAs.
[0016] In some embodiments of the present invention, the full-length non-coding RNA includes at least one of tRNA, rRNA, snRNA, snoRNA, scaRNA, miRNA, piRNA, circRNA, and lncRNA.
[0017] Preferably, the non-coding RNA is a non-coding RNA of medium length and low abundance.
[0018] Preferably, the non-coding RNA is a Kink-turn type RNA.
[0019] In some embodiments of the present invention, the RNA of the sample to be tested is RNA with genomic DNA removed.
[0020] Preferably, the method for removing genomic DNA includes: adding RQ1 DNase 1×Reaction Buffer, 2U / μL RiboLock RNase Inhibitor and RQ1 RNase-Free DNase to the RNA of the sample to be tested, reacting at 37°C for 30 minutes, and finally purifying the RNA using the RNAClean & Concentrator-5 kit.
[0021] The RQ1 RNase-Free DNase can be a Promega product, catalog number M6101; the RNAClean & Concentrator-5 can be a ZYMO RESEARCH product, catalog number R1015.
[0022] In some embodiments of the present invention, the RNA of the sample to be tested includes at least one of total RNA from cells, total RNA from tissues, RNA immunoprecipitated with RNA-binding proteins, and RNA from different organelles.
[0023] Preferably, the RNA immunoprecipitated with RNA-binding proteins includes RNA immunoprecipitated with 15.5K.
[0024] In some embodiments of the present invention, the total RNA from the cells or tissues can be obtained using the TRIzol RNA extraction method.
[0025] In some embodiments of the present invention, the 3' DNA adapter is an adenylated 3' DNA adapter with random bases at the 5' end.
[0026] Preferably, the 3' DNA adapter is: rAppNNNNNNTGGAATTCTCGGGTGCCAAGG-C3 Spacer, where rApp is the adenylation modification, NNNNNN is six random deoxyribonucleotides, N represents any one of the four deoxyribonucleotides A, T, C, and G, and C3 Spacer is a blocking group.
[0027] Preferably, the unligated 3' DNA adapter is removed using 5' Deadenylase, single-stranded DNA binding protein, and RecJf; specifically, the 5' Deadenylase is reacted at 28–32°C for 0.8–1.2 hours, the single-stranded DNA binding protein is reacted on ice for 25–35 minutes, and the RecJf is reacted at 36–38°C for 0.8–1.2 hours.
[0028] In some embodiments of the present invention, the 3' end of the 5' RNA adapter carries random bases.
[0029] Preferably, the nucleotide sequence of the 5' RNA adapter is: GUUCAGAGUUCUACAGUCCGACGAUCNNNNNN, where NNNNNN represents six random ribonucleotides, and N represents any one of the four ribonucleotides A, U, C, and G.
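To illustrate how the fixed adapter sequences and the random bases frame the captured RNA, the following minimal Python sketch locates both adapters in a read written in cDNA (sense) orientation and returns the two six-base random tags together with the insert between them. The helper name `extract_insert`, the exact-match search, and the example read are assumptions made for illustration only; they are not the analysis pipeline of this invention.

```python
# Adapter sequences from the text, written as DNA in cDNA-sense orientation.
FIVE_PRIME_ADAPTER = "GTTCAGAGTTCTACAGTCCGACGATC"  # 5' RNA adapter with U written as T
THREE_PRIME_ADAPTER = "TGGAATTCTCGGGTGCCAAGG"      # 3' DNA adapter
TAG_LEN = 6  # six random bases (NNNNNN) carried by each adapter


def extract_insert(read: str):
    """Return (5' random tag, insert, 3' random tag), or None if an adapter is missing.

    Hypothetical helper: exact matching only, with no mismatch tolerance.
    """
    start = read.find(FIVE_PRIME_ADAPTER)
    if start == -1:
        return None
    body_start = start + len(FIVE_PRIME_ADAPTER)
    end = read.find(THREE_PRIME_ADAPTER, body_start)
    if end == -1:
        return None
    body = read[body_start:end]
    if len(body) < 2 * TAG_LEN:
        return None  # too short to hold both random tags
    return body[:TAG_LEN], body[TAG_LEN:len(body) - TAG_LEN], body[len(body) - TAG_LEN:]


# Example with a made-up 12 nt insert flanked by made-up random tags.
example = FIVE_PRIME_ADAPTER + "AAACCC" + "ACGTTGCATGCA" + "GGGTTT" + THREE_PRIME_ADAPTER
print(extract_insert(example))
```

Because both ends of the insert are bounded by an adapter, the first and last bases of the returned insert correspond to the 5' and 3' ends of the ligated RNA.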
[0030] Since rRNA, snRNA, and snoRNA have the highest abundance in total RNA and have the greatest impact on the identification of new types of non-coding RNA, the above three types of RNA are used as examples for removal operations in this invention.
[0031] Specifically, when the full-length non-coding RNA is selected from at least one of tRNA, scaRNA, miRNA, piRNA, circRNA, and lncRNA, the non-target RNA includes at least one of rRNA, snRNA, and snoRNA.
[0032] Preferably, when the full-length non-coding RNA is a Kink-turn type RNA, the non-target RNA includes at least one of rRNA, snRNA, and snoRNA.
[0033] Preferably, the rRNA includes 28S rRNA, 18S rRNA, 5.8S rRNA, 5S rRNA, 12S rRNA, and 16S rRNA; the snRNA includes U1, U2, U4, U5, U6, U11, U12, U4atac, and U6atac; and the snoRNA includes SNORD101, SNORD20, and SNORA23.
[0034] In some embodiments of the present invention, the nucleotide sequence of the DNA probe targeting snRNA is shown in SEQ ID NO. 1 to 29.
[0035] In some embodiments of the present invention, the nucleotide sequence of the DNA probe targeting snoRNA is shown in SEQ ID NO. 30-196.
[0036] In some embodiments of the present invention, the DNA probe is 38-55 nt in length.
[0037] In some embodiments of the present invention, the DNA probe is 40-55 nt in length. The DNA probes targeting non-target RNA are designed with gaps of less than 10 nt between adjacent probes.
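The probe length and spacing constraints can be sketched in Python as a simple tiling routine. This is an illustrative assumption about how such probes might be laid out, not the design procedure used to generate SEQ ID NO. 1 to 196: the function names, the default length and gap, and the choice to skip a final segment shorter than one probe are all hypothetical. The only biological fact relied on is that a DNA probe must be the reverse complement of the RNA region it hybridizes to.

```python
COMPLEMENT = str.maketrans("ACGU", "TGCA")  # RNA base -> complementary DNA base


def antisense_dna(rna_segment: str) -> str:
    """Return the DNA probe (5'->3') that base-pairs with an RNA segment (5'->3')."""
    return rna_segment.translate(COMPLEMENT)[::-1]


def tile_probes(rna: str, probe_len: int = 50, gap: int = 5):
    """Tile antisense DNA probes along a non-target RNA.

    Returns a list of (start position on the RNA, probe sequence).
    Simplification: a trailing segment shorter than probe_len is left uncovered.
    """
    if not 40 <= probe_len <= 55:
        raise ValueError("probe length should be 40-55 nt")
    if not 0 <= gap < 10:
        raise ValueError("gap between adjacent probes should be under 10 nt")
    probes = []
    start = 0
    while start + probe_len <= len(rna):
        probes.append((start, antisense_dna(rna[start:start + probe_len])))
        start += probe_len + gap
    return probes


# Example on a made-up 120 nt RNA: probes start at positions 0 and 55.
for position, probe in tile_probes("ACGU" * 30):
    print(position, probe)
```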
[0038] In some embodiments of the present invention, the annealing temperature is 70–80°C.
[0039] In some embodiments of the present invention, the RNA ligation product is mixed with the DNA probe in equal mass.
[0040] In some embodiments of the present invention, the non-target RNA and the residual DNA probe are removed using RNase H enzyme and exonuclease RecJf, respectively.
[0041] In some embodiments of the present invention, the truncated reverse transcription primer sequence is shown in SEQ ID NO.197.
[0042] This invention uses truncated reverse transcription primers, which can reduce the chance of mismatch.
[0043] In some embodiments of the present invention, the fragment size in the non-coding RNA sequencing library is 150bp to 1500bp.
[0044] Preferably, the fragment size in the non-coding RNA sequencing library is 150bp to 700bp.
[0045] A second aspect of the present invention provides a method for sequencing full-length non-coding RNA, comprising constructing a sequencing library using the above method; and sequencing the sequencing library.
[0046] In some embodiments of the present invention, the sequencing is PE150 paired-end sequencing.
[0047] The sequencing libraries of this invention can be constructed and sequenced using RNA from different sources. For example, PEN-seq (Sequencing of Paired-Ends of NcRNAs) sequencing libraries targeting total RNA from cells or tissues; sub-PEN-seq sequencing libraries targeting RNA from various cellular components; and RIP-PEN-seq sequencing libraries targeting RNA immunoprecipitated with RNA binding proteins. The construction and sequencing of sub-PEN-seq sequencing libraries involves isolating RNA from various cellular components (e.g., cytoplasmic RNA, nuclear RNA, and nucleolar RNA), followed by PEN-seq for paired-end sequencing of ncRNAs and determination of their full-length sequences. Sub-PEN-seq technology combines cellular component isolation and PEN-seq techniques, enabling accurate identification of full-length RNA sequences while simultaneously determining RNA localization. The construction and sequencing of RIP-PEN-seq sequencing libraries involves RNA immunoprecipitation using K-turn RNA-specific binding proteins to enrich ncRNAs, followed by PEN-seq enrichment of K-turn RNA ligation products. RIP-PEN-seq technology combines RNA co-precipitation and PEN-seq techniques, enabling accurate identification of full-length RNA sequences while capturing RNA interactions.
[0048] Other features and advantages of the invention will be set forth in the description which follows, and will be apparent in part from the description, or may be learned by practicing the invention.
Attached Figure Description
[0049] The present invention will be further described below with reference to the accompanying drawings and embodiments, wherein:
[0050] Figure 1 is a schematic diagram illustrating the PEN-seq, RIP-PEN-seq, and sub-PEN-seq sequencing library construction techniques in embodiments of the present invention.
[0051] Figure 2 is a flowchart illustrating the identification of paired ends and full-length RNA sequences based on PEN-seq, RIP-PEN-seq, and sub-PEN-seq libraries according to the present invention.
[0052] Figure 3 is a schematic diagram of the sequencing data analysis process of the present invention.
[0053] Figure 4 shows the computational analysis workflow of the present invention for identifying K-turn RNA based on the K-turn motifs contained in K-turn RNA.
[0054] Figure 5 shows the qPCR and Western blot results used in the present invention to verify the stable knockdown of 15.5K in HEK293T cells.
[0055] Figure 6 shows a comparison of the start sites of known K-turn RNAs (i.e., box C / D snoRNAs) identified by PEN-seq in the negative control cells shNC of the present invention with the annotated start sites.
[0056] Figure 7 shows a comparison of the end sites of known K-turn RNAs (i.e., box C / D snoRNAs) identified by PEN-seq in the negative control cells shNC of the present invention with the annotated end sites.
[0057] Figure 8 shows the IGV visualization of the paired-end sites and full length of the K-turn RNA bktRNA1 in shNC, sh15.5K-1, and sh15.5K-2 cells.
[0058] Figure 9 is a heatmap of K-turn RNA expression levels in shNC, sh15.5K-1, and sh15.5K-2 cells in an embodiment of the present invention.
[0059] Figure 10 is a violin plot of the changes in K-turn RNA expression levels in shNC, sh15.5K-1, and sh15.5K-2 cells.
[0060] Figure 11 shows the Western blot results verifying the overexpression of FLAG-15.5K in a stable FLAG-15.5K cell line (A) and the immunoprecipitation of FLAG-15.5K protein (B), where pCGP is the negative control cell line.
[0061] Figure 12 shows a comparison of the start sites of known K-turn RNAs (i.e., box C / D snoRNAs) identified by the 15.5K RIP-PEN-seq of the present invention with the annotated start sites (A), and a comparison of the end sites of known K-turn RNAs (i.e., box C / D snoRNAs) with the annotated end sites (B).
[0062] Figure 13 shows the UCSC visualization of the paired-end sites and full length of the K-turn RNAs in 10 GAS5 introns identified by 15.5K RIP-PEN-seq, where Coverage indicates the full length of the RNA and its expression level.
[0063] Figure 14 shows the UCSC visualization of the paired-end sites and full length of the K-turn RNA bktRNA1 in the CWD19L1 intron identified by 15.5K RIP-PEN-seq. Coverage indicates the full length and expression level of the RNA, and Conservation represents the evolutionary conservation of bktRNA1 in 100 vertebrate species.
[0064] Figure 15 shows the Western blot verification of the separation of cytoplasm (Cyto), nucleoplasm (Np), and nucleolus (No) in HEK293T (A) and HCT116 (B) cells.
[0065] Figure 16 shows the paired-end sites and full length of the K-turn RNA bktRNA1 identified by sub-PEN-seq in the cellular components of HEK293T cells.
[0066] Figure 17 shows the paired-end sites and full length of the K-turn RNA bktRNA1 identified by sub-PEN-seq in HCT116 cells.
[0067] Figure 18 is a heatmap of K-turn RNA expression levels in each cellular component in the sub-PEN-seq data of HEK293T and HCT116 cells.
[0068] Figure 19 is a violin plot of the differences in K-turn RNA expression levels among cellular components in the sub-PEN-seq data of HEK293T and HCT116 cells.
Detailed Implementation
[0069] The following will describe the concept and technical effects of the present invention clearly and completely with reference to embodiments, so as to fully understand the purpose, features and effects of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are all within the scope of protection of the present invention.
[0070] In the description of this invention, the terms "one embodiment," "some embodiments," "illustrative embodiment," "example," "specific example," or "some examples," etc., refer to specific features, structures, materials, or characteristics described in connection with that embodiment or example, which are included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.
[0071] In an embodiment of the present invention, the solvent for the cell membrane lysis buffer is 10 mM Tris-HCl buffer at pH 7.5, and the solutes and their concentrations are as follows: 10 mM NaCl, 3 mM MgCl2, 0.3% (v / v) NP-40, 10% (v / v) glycerol, 1 mM DTT, 100 U / mL RiboLock RNase Inhibitor, and 400 μM Ribonucleoside Vanadyl Complex. Specifically, the RiboLock RNase Inhibitor is a Thermo Fisher product, catalog number EO0381; and the Ribonucleoside Vanadyl Complex is a NEB product, catalog number S1402S.
[0072] In embodiments of the present invention, the S1 sucrose solution formulation is: 0.25M sucrose, 10mM MgCl2, 1mM DTT, 100U / mL RiboLock RNase Inhibitor, and 400μM Ribonucleoside Vanadyl Complex; the S2 sucrose solution formulation is: 0.34M sucrose, 5mM MgCl2, 1mM DTT, 100U / mL RiboLock RNase Inhibitor, and 400μM Ribonucleoside Vanadyl Complex; the S3 sucrose solution formulation is: 0.88M sucrose, 5mM MgCl2, 1mM DTT, 100U / mL RiboLock RNase Inhibitor, and 400μM Ribonucleoside Vanadyl Complex.
[0073] In an embodiment of the present invention, the solvent of the RIP binding solution is 50 mM Tris-HCl buffer at pH 7.5, and the solutes and their concentrations are as follows: 150 mM NaCl, 1 mM MgCl2, 0.05% (v / v) NP-40, 20 mM EDTA-Na2, 1 mM DTT, 100 U / mL RiboLock RNase Inhibitor, and 1× Protease Inhibitor Cocktail.
[0074] Unless otherwise specified in the examples, the procedures should be performed under standard conditions or conditions recommended by the manufacturer. Reagents or instruments whose manufacturers are not specified are all commercially available products.
[0075] In embodiments of the present invention, PEN-seq sequencing library construction uses total RNA from cells or tissues; RIP-PEN-seq sequencing library construction uses RNA immunoprecipitated with RNA-binding proteins; and sub-PEN-seq sequencing library construction uses RNA from various cellular components.
[0076] In embodiments of the present invention, the basic process of sequencing library construction is shown in Figure 1. The specific steps include: after obtaining RNA, ligating an adenylated DNA adapter with random bases to the 3' end of the ncRNA and ligating an RNA adapter with random bases to the 5' end; then designing specific DNA probes for non-target RNAs (such as rRNA, snRNA, and snoRNA); annealing the ligation product with the DNA probes; digesting the non-target RNA with RNase H; digesting the DNA probes with the single-stranded DNA 5'-3' exonuclease RecJf; enriching the target ncRNA ligation product; then using a truncated reverse transcription primer to reverse transcribe the target RNA ligation product into cDNA; and then using primers containing anchored bases to perform PCR amplification to obtain a sequencing library, followed by paired-end PE150 high-throughput sequencing.
[0077] The method for capturing non-coding RNA to construct sequencing libraries provided by this invention, combined with the new paired-end sequencing technology, can effectively increase the ratio of target ncRNA sequencing reads in the data and reduce sequencing costs.
[0078] Example 1: Method for constructing sequencing libraries
[0079] This invention provides a method for constructing a sequencing library. The sequencing library construction process includes steps such as cell culture, total RNA extraction, DNase removal of genomic DNA from RNA, 3' DNA adapter ligation and removal of residual adapters, 5' RNA adapter ligation, removal of non-target RNA, reverse transcription, and library amplification.
[0080] 1. Extraction of total RNA from cells
[0081] When cells in the 6-well plate reach approximately 90% confluence, discard the culture medium and add 1 mL of TRIzol. Incubate at room temperature for 10 minutes to lyse the cells. Transfer the lysate to a 1.5 mL centrifuge tube, add 200 μL of chloroform, vortex for 15 seconds, and incubate at room temperature for 3 minutes. Centrifuge at 13000 × g at 4°C for 10 minutes, retain 500 μL of supernatant, add 500 μL of isopropanol, mix well, and incubate at room temperature for 10 minutes to precipitate the RNA. Then centrifuge at 20000 × g at 4°C for 10 minutes, discard the supernatant, add 1 mL of 75% ethanol (prepared with DEPC water) to wash the RNA pellet, centrifuge at 20000 × g at 4°C for 5 minutes, and discard the supernatant. Repeat this wash once. Wash the pellet again with 1 mL of anhydrous ethanol, centrifuge at 20000 × g at 4°C for 5 minutes, and discard the supernatant. Vacuum dry the RNA pellet, dissolve it in 30 μL of DEPC water, and determine the RNA concentration using NanoDrop. Total RNA must meet the following quality conditions: a 28S rRNA to 18S rRNA ratio of approximately 2, A260 / A280 greater than 2, and A260 / A230 greater than 2. Qualified total RNA samples can be used directly in the next experiment or stored at -80°C.
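The quality conditions stated above can be written as a small Python check. This is only a sketch: the function name `passes_rna_qc` is hypothetical, and the tolerance used to interpret "approximately 2" for the 28S to 18S ratio is an assumption, since the text gives no numerical window.

```python
def passes_rna_qc(ratio_28s_18s: float, a260_a280: float, a260_a230: float,
                  ratio_tolerance: float = 0.2) -> bool:
    """Check total RNA against the three stated quality conditions.

    ratio_tolerance is an assumed window around 2 for the 28S:18S ratio.
    """
    ratio_ok = abs(ratio_28s_18s - 2.0) <= ratio_tolerance
    purity_ok = a260_a280 > 2.0 and a260_a230 > 2.0
    return ratio_ok and purity_ok


# Made-up measurements for illustration.
print(passes_rna_qc(2.0, 2.05, 2.10))  # all three conditions met
print(passes_rna_qc(1.5, 2.05, 2.10))  # 28S:18S ratio too far from 2
```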
[0082] 2. Removal of genomic DNA from total RNA
[0083] Take 5 μg of total RNA sample, add 5 μL of RQ1 DNase 10× Reaction Buffer (RQ1 RNase-Free DNase reaction buffer), 2.5 μL of RiboLock RNase Inhibitor, and 5 μL of RQ1 RNase-Free DNase (Promega product, catalog number M6101), then add DEPC water to a final volume of 50 μL; react at 37°C for 30 minutes. Then purify the RNA obtained from the above reaction using RNAClean & Concentrator-5. Elute with 12 μL of DEPC water, determine the concentration using NanoDrop, and then proceed directly to adapter ligation or store at -80°C.
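As a quick arithmetic check of this reaction, the Python sketch below computes final concentrations from stock concentration, added volume, and total volume. The 10× buffer and the 50 μL total come from the text; the RiboLock stock concentration of 40 U / μL is an assumption not stated in this document, under which the result agrees with the 2 U / μL given in paragraph [0020].

```python
def final_concentration(stock: float, volume_ul: float, total_volume_ul: float) -> float:
    """Dilution formula: final = stock x added volume / total volume."""
    return stock * volume_ul / total_volume_ul


TOTAL_UL = 50.0  # final reaction volume from the text

# 5 uL of 10x reaction buffer in 50 uL.
buffer_x = final_concentration(10.0, 5.0, TOTAL_UL)

# 2.5 uL of RiboLock; the 40 U/uL stock concentration is an assumption.
RIBOLOCK_STOCK_U_PER_UL = 40.0
ribolock_u_per_ul = final_concentration(RIBOLOCK_STOCK_U_PER_UL, 2.5, TOTAL_UL)

print(f"buffer: {buffer_x:g}x, RiboLock: {ribolock_u_per_ul:g} U/uL")
```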
[0084] 3. 3' DNA adapter ligation
[0085] (1) Adenylation treatment of 3' DNA adapter
[0086] Take 500 pmol of 3' DNA adapter, add 10 μL of 10× 5' DNA Adenylation Reaction Buffer (Mth RNA ligase reaction buffer), 10 μL of 1 mM ATP, and 10 μL of Mth RNA ligase (NEB product, catalog number M2610), and then add DEPC water to 100 μL; react at 65°C for 2 hours, and inactivate the enzyme at 85°C for 10 minutes. Then purify the DNA adapter from the above reaction using Oligo Clean & Concentrator and elute with 20 μL of DEPC water. After determining the concentration using NanoDrop, adjust the DNA adapter concentration to 20 μM to obtain the adenylated 3' DNA adapter. The adenylated 3' DNA adapter is: rAppNNNNNNTGGAATTCTCGGGTGCCAAGG-C3 Spacer, where rApp represents the adenylation modification, NNNNNN represents six random deoxyribonucleotides, N represents any one of the four deoxyribonucleotides A, T, C, and G, and C3 Spacer is the blocking group.
[0087] (2) 3' DNA adapter ligation
[0088] Take 500 ng of the total RNA from which genomic DNA has been removed, add DEPC water to a final volume of 10.5 μL, then add 0.5 μL of the adenylated 3' DNA adapter. Mix well, denature at 70°C for 2 minutes, then immediately place on ice for 2 minutes. Add 2 μL of 10× T4 RNA Ligase 2, truncated KQ reaction buffer, 5 μL of PEG 8000MW (50%), 1 μL of RiboLock RNase Inhibitor, and 1 μL of T4 RNA Ligase 2, truncated KQ (NEB product, catalog number M0373). Mix well and incubate at 16°C for 18 hours.
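The component volumes of this ligation can be summed to confirm the reaction composition. The Python sketch below uses only volumes and stock concentrations given in the text (including the 20 μM adapter stock from the adenylation step); the dictionary keys and variable names are illustrative.

```python
# Volumes in microliters, as listed for the 3' adapter ligation.
components_ul = {
    "RNA in DEPC water": 10.5,
    "adenylated 3' DNA adapter (20 uM)": 0.5,
    "10x ligase reaction buffer": 2.0,
    "PEG 8000 (50%)": 5.0,
    "RiboLock RNase Inhibitor": 1.0,
    "T4 RNA Ligase 2, truncated KQ": 1.0,
}

total_ul = sum(components_ul.values())

# Final strengths follow from stock strength x volume / total volume.
buffer_x = 10 * components_ul["10x ligase reaction buffer"] / total_ul
peg_percent = 50 * components_ul["PEG 8000 (50%)"] / total_ul

# uM x uL = pmol, so 20 uM x 0.5 uL gives the adapter amount in pmol.
adapter_pmol = 20 * components_ul["adenylated 3' DNA adapter (20 uM)"]
adapter_um = adapter_pmol / total_ul

print(f"total {total_ul:g} uL, buffer {buffer_x:g}x, PEG {peg_percent:g}%, "
      f"adapter {adapter_pmol:g} pmol ({adapter_um:g} uM)")
```

Under these numbers the reaction is 20 μL with 1× buffer, 12.5% PEG 8000, and 10 pmol of adapter for 500 ng of RNA.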
[0089] (3) Removal of residual adapters
[0090] Add 2 μL of 5' Deadenylase (NEB product, catalog number M0331) to the above reaction system, mix well, and react at 30°C for 1 hour. Then add 2 μg of single-stranded DNA binding protein (Promega product, catalog number M3011), mix well, and react on ice for 30 minutes. Finally, add 2 μL of RecJf (NEB product, catalog number M0264), mix well, and react at 37°C for 1 hour.
[0091] 4. 5' RNA adapter ligation
[0092] To the reaction system after removal of the residual adapter, add 2 μL (40 pmol) of denatured 5' RNA adapter, 2 μL of 10× T4 RNA Ligase reaction buffer (T4 RNA Ligase 1 reaction buffer), 2.56 μL of PEG 8000MW (50%), 1 μL of RiboLock RNase Inhibitor, 4 μL of 10 mM ATP, and 4 μL of T4 RNA Ligase 1 (NEB product, catalog number M0204). After mixing, react at 16°C for 18 hours. Then purify the RNA from the above reaction with RNAClean & Concentrator-5 and elute with 12 μL of DEPC water to obtain the RNA ligation product. The nucleotide sequence of the 5' RNA adapter is: GUUCAGAGUUCUACAGUCCGACGAUCNNNNNN, where NNNNNN represents six random ribonucleotides and N represents any one of the four ribonucleotides A, U, C, and G.
[0093] 5. Removal of non-target RNA
[0094] Take 11.2 μL of the RNA ligation product obtained above, add 0.8 μL of DNA probe targeting non-target RNA (50 μM), 3 μL of 5× annealing buffer, mix well, react at 95°C for 2 minutes, then cool down to 22°C at 0.1°C per second, maintain at 22°C for 5 minutes, and then place on ice.
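The duration of this annealing program follows directly from the stated temperatures and ramp rate. The short Python sketch below works it out; the function and constant names are illustrative, and the time on ice is not included.

```python
def ramp_seconds(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Time needed to move between two temperatures at a constant ramp rate."""
    return abs(start_c - end_c) / rate_c_per_s


DENATURE_HOLD_S = 2 * 60  # 95 C for 2 minutes
FINAL_HOLD_S = 5 * 60     # 22 C for 5 minutes

ramp_s = ramp_seconds(95, 22, 0.1)  # cooling at 0.1 C per second
total_s = DENATURE_HOLD_S + ramp_s + FINAL_HOLD_S

print(f"ramp: {ramp_s:.0f} s, total: {total_s / 60:.1f} min")
```

The slow ramp therefore takes about 12 minutes, and the whole annealing step about 19 minutes before the sample is placed on ice.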
[0095] The DNA probes targeting rRNA were taken from a published paper (Adiconis, X., et al., Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat Methods, 2013. 10(7): p. 623-9.), and the DNA probe sequences targeting snRNA are shown in Table 1.
[0096] Table 1: DNA probe sequence information targeting snRNA
[0097]
[0098]
[0099] The DNA probe sequences targeting snoRNA are shown in SEQ ID NO.30–196 (Table 2).
[0100] Table 2: DNA probe sequence information targeting snoRNA
[0101]
[0102]
[0103]
[0104]
[0105]
[0106] Then, add 2 μL of 10×RNase H reaction buffer, 0.2 μL of RiboLock RNase Inhibitor, and 2 μL of RNase H (NEB product, catalog number M0297) to the above reaction system, and add DEPC water to a final volume of 20 μL. After mixing, incubate at 37°C for 30 minutes. Then, purify the RNA obtained from the above reaction using RNA Clean & Concentrator-5, and finally elute with 22 μL of DEPC water.
[0107] 6. Removal of DNA probes
[0108] Take 21.5 μL of the product after removing non-target RNA, denature at 70°C for 2 minutes, then immediately place on ice for 2 minutes. Add 3 μL of 10×NEBuffer 2 (RecJf's matching reaction buffer), 1 μL of RiboLock RNase Inhibitor, and 7 μg of single-stranded DNA binding protein. Mix well and incubate on ice for 30 minutes. Add another 3 μL of RecJf and incubate at 37°C for 1 hour to digest the DNA probe. Then purify the RNA using RNA Clean & Concentrator-5. Elute with 12 μL of DEPC water. The sample can be directly used for the next reaction or stored at -80°C.
[0109] 7. Reverse transcription reaction
[0110] Take 11.5 μL of the product after removing the above DNA probe, add 0.5 μL of 40 μM truncated reverse transcription primer, mix well, denature at 65 °C for 5 minutes, immediately place on ice, then add 4 μL of 5×RT buffer (Thermo Fisher product, catalog number 18090050), 1 μL of 100 mM DTT, 1 μL of 10 mM dNTPs, 1 μL of RiboLock RNase Inhibitor and 1 μL of SuperScript IV Reverse Transcriptase, mix well, and react at 50 °C for 60 minutes. The nucleotide sequence of the truncated reverse transcription primer is as follows: GCCTTGGCACCCGAGAAT (SEQ ID NO.197).
[0111] Add 4 μL of Exonuclease I (NEB product, catalog number M0293) and 4 μL of rSAP (NEB product, catalog number M0371) to the above reaction system, and react at 37°C for 15 minutes. Then add 5 μL of 0.5M EDTA and 7 μL of 1M NaOH, mix well, and react at 70°C for 12 minutes. Then purify the cDNA using Oligo Clean & Concentrator. Elute with 16 μL of DEPC water to obtain cDNA.
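To see where the truncated reverse transcription primer (SEQ ID NO.197) sits on the 3' DNA adapter, the following Python sketch reverse-complements the primer and finds the longest stretch, counted from the primer's 3' end, that pairs with the adapter. The helper names are illustrative, and the reading of the result is an observation from the two sequences given in this document rather than a statement taken from it.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"  # 3' DNA adapter (fixed part, 5'->3')
RT_PRIMER = "GCCTTGGCACCCGAGAAT"      # truncated RT primer, SEQ ID NO.197


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def annealing_site(primer: str, template: str):
    """Return (start on template, paired length) for the longest match that
    begins at the primer's 3' end, or None if nothing pairs."""
    target = reverse_complement(primer)  # template bases the primer would pair with
    for length in range(len(target), 0, -1):
        position = template.find(target[:length])
        if position != -1:
            return position, length
    return None


print(annealing_site(RT_PRIMER, ADAPTER_3P))
```

In this sketch the primer pairs with 17 adapter bases starting at position 4 (0-based), so the four adapter bases next to the random bases (TGGA) are not covered by the primer, which is consistent with the primer being described as truncated.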
[0112] 8. Sequencing adapter connection
[0113] Take 15 μL of the cDNA obtained above, add 25 μL of NEBNext Ultra II Q5 Master Mix (NEB product, catalog number M0544), 5 μL of RP1 (10 μM) and 5 μL of RPI-X (10 μM, including a series of primers containing different INDEXs from RPI 1 to 12, and the primers contain bases anchoring the 3' DNA linker) for PCR reaction.
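As a quick arithmetic check of the PCR setup above, the following sketch sums the listed volumes and computes the final primer concentrations. The volumes and the 10 μM primer stocks are taken from the text; the assumption that the master mix is supplied at 2× is not stated in this document and is marked as such in the code.

```python
# Volumes in microlitres, as listed in the PCR setup above.
PCR_MIX_UL = {
    "cDNA": 15.0,
    "NEBNext Ultra II Q5 Master Mix": 25.0,  # assumed to be a 2x mix
    "RP1 (10 uM)": 5.0,
    "RPI-X (10 uM)": 5.0,
}


def total_volume(mix: dict) -> float:
    """Total reaction volume in microlitres."""
    return sum(mix.values())


def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """Final concentration after dilution, in the same unit as `stock`."""
    return stock * added_ul / total_ul


total = total_volume(PCR_MIX_UL)
print(f"total volume: {total} uL")
print(f"each primer: {final_concentration(10.0, 5.0, total)} uM")
print(f"master mix: {final_concentration(2.0, 25.0, total)}x")
```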
[0114] The primer sequences for RP1 and RPI 1–12 are shown in Table 3.
[0115] Table 3: Primer sequences
[0116]
[0117] Note: The underlined parts in the table are the inserted INDEX sequences, and asterisks (*) indicate phosphorothioate modifications.
[0118] The PCR reaction program was as follows: 98℃ pre-denaturation for 30 seconds; 15 cycles: 98℃ denaturation for 10 seconds, 65℃ annealing extension for 75 seconds; 65℃ extension for 5 minutes, and storage at 4℃.
[0119] After the PCR reaction was completed, the DNA was purified using DNA Clean & Concentrator-5, eluted with 20 μL of nuclease-free water, and then subjected to 4% low-melting-point agarose gel electrophoresis. Bands in the range of 150-700 bp were recovered using the Zymoclean Gel DNA Recovery Kit and eluted with 18 μL of DEPC water, and the concentration of the recovered gel product was determined using NanoDrop to obtain the sequencing library.
[0120] After the sequencing library was obtained, PE150 paired-end sequencing was performed on an Illumina HiSeq X Ten sequencer to identify the paired-terminal sites of non-coding RNAs and analyze their full-length sequences.
[0121] Example 2: Construction method of PEN-seq sequencing library
[0122] In this embodiment, the PEN-seq sequencing library was constructed using total RNA from HEK293T cells with stable knockdown of 15.5K, and then sequenced and analyzed. The specific process included the following steps.
[0123] (I) Culture of HEK293T cells with stable knockdown of 15.5K
[0124] Following the miR-E method described in the published article (Fellmann, C., et al., An optimized microRNA backbone for effective single-copy RNAi. Cell Rep, 2013. 5(6): p. 1704-13.), HEK293T cell lines with inducible silencing of 15.5K (sh15.5K-1 and sh15.5K-2) and a control cell line (shNC) were constructed. These three cell lines were seeded into 6-well plates and cultured for 24 hours. Doxycycline (Selleck product, catalog number S5159) was then added at a final concentration of 3 μM, and the cells were cultured for another 48 hours.
[0125] (II) Extraction of total RNA from HEK293T cells with stable knockdown of 15.5K
[0126] When cells in the 6-well plate reached approximately 90% confluence, discard the culture medium, add 1 mL of Trizol, and lyse for 10 minutes at room temperature. Transfer the lysate to a 1.5 mL centrifuge tube, add 200 μL of chloroform, vortex for 15 seconds, and incubate at room temperature for 3 minutes. Centrifuge at 13000×g at 4℃ for 10 minutes, retain 500 μL of supernatant, add 500 μL of isopropanol, mix well, and incubate for 10 minutes at room temperature. Then centrifuge at 4℃ at 20000×g for 10 minutes, discard the supernatant, add 1 mL of 75% ethanol (prepared with DEPC water) to wash the RNA precipitate, then centrifuge at 4℃ at 20000×g for 5 minutes, and discard the supernatant. Repeat this step once. Add 1 mL of anhydrous ethanol to wash the precipitate again, centrifuge at 4℃ at 20000×g for 5 minutes, and discard the supernatant. Vacuum dry the RNA precipitate. Dissolve in 30 μL of DEPC water and determine RNA concentration using NanoDrop (total RNA requires RNA integrity to meet the following conditions: 28S rRNA: 18S rRNA approximately equal to 2, A260 / A280 greater than 2, A260 / A230 greater than 2).
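The acceptance criteria for total RNA stated above can be expressed as a simple check. This is a minimal sketch: the thresholds (28S:18S approximately 2, A260 / A280 greater than 2, A260 / A230 greater than 2) come from the text, while the tolerance used to interpret "approximately equal to 2" and all names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RnaQc:
    ratio_28s_18s: float  # 28S rRNA : 18S rRNA band ratio
    a260_a280: float
    a260_a230: float


def passes_qc(qc: RnaQc, ratio_tolerance: float = 0.3) -> bool:
    """Apply the stated criteria; `ratio_tolerance` is an assumed reading of 'approximately 2'."""
    ratio_ok = abs(qc.ratio_28s_18s - 2.0) <= ratio_tolerance
    return ratio_ok and qc.a260_a280 > 2.0 and qc.a260_a230 > 2.0


print(passes_qc(RnaQc(2.0, 2.05, 2.10)))  # all three criteria met
print(passes_qc(RnaQc(1.2, 2.05, 2.10)))  # rRNA ratio too low
```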
[0127] (III) Construction of the PEN-seq library from cells with stable knockdown of 15.5K
[0128] In this embodiment, the PEN-seq sequencing library was constructed as described in steps 2 to 8 of Example 1.
[0129] (IV) High-throughput sequencing
[0130] The PEN-seq library constructed above was subjected to PE150 paired-end sequencing on an Illumina HiSeq X Ten sequencer.
[0131] Example 3: Construction of RIP-PEN-seq sequencing libraries
[0132] The RIP-PEN-seq library in this embodiment is constructed from RNA obtained by immunoprecipitation of an RNA-binding protein. The process specifically includes culture of HEK293T cells stably expressing FLAG-15.5K, cell lysis, RNA immunoprecipitation, isolation of FLAG-15.5K-interacting RNA, and library construction and sequencing.
[0133] (I) Construction and cell collection of a stable cell line overexpressing FLAG-15.5K
[0134] HEK293T cells stably expressing FLAG-15.5K were constructed using a lentiviral vector. The cells were expanded and cultured until they reached approximately 90% confluence in the cell culture dish. After discarding the culture medium, the cells were washed twice with pre-chilled DPBS. After discarding the DPBS, 3 mL of pre-chilled DPBS was added again. The cells were collected in a centrifuge tube using a cell scraper and centrifuged at 1000×g at 4℃ for 5 minutes. The supernatant DPBS was then discarded to obtain the cell pellet.
[0135] (II) Cell lysis and RNA immunoprecipitation
[0136] Add an equal volume of cell lysis buffer to the cell pellet, resuspend the pellet using a pipette, and incubate on ice for 15 minutes. Then centrifuge at 15000×g at 4°C for 15 minutes and retain the supernatant (cell lysate). Add 1 / 20 volume of Dynabeads protein G magnetic beads to the cell lysate, incubate at 4°C for 30 minutes, and then separate the beads from the lysate using a magnetic rack. Dilute the lysate 10-fold with RIP binding buffer, and add an antibody against the RNA-binding protein (here, a FLAG antibody targeting FLAG-15.5K) at 5 μg per mL of diluted lysate. Incubate at 4°C for 12 hours. Then add Dynabeads protein G (Thermo Fisher Scientific, catalog number 10004D) at 10 μL per μg of antibody, and continue incubating at 4°C for 3 hours.
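The reagent amounts in the immunoprecipitation step scale with the volume of the retained supernatant. The sketch below applies the ratios given above (1 / 20 volume of beads for pre-clearing, 10-fold dilution, 5 μg of antibody per mL of diluted supernatant, 10 μL of beads per μg of antibody). The function name and the example input volume are hypothetical, and the small volume change caused by the pre-clearing beads is ignored.

```python
def rip_reagents(lysate_ml: float) -> dict:
    """Scale the RIP reagents to a given volume of cleared supernatant (mL)."""
    if lysate_ml <= 0:
        raise ValueError("lysate volume must be positive")
    preclear_beads_ul = lysate_ml * 1000.0 / 20.0  # 1/20 volume of beads
    diluted_ml = lysate_ml * 10.0                  # 10-fold dilution in RIP binding buffer
    antibody_ug = 5.0 * diluted_ml                 # 5 ug antibody per mL diluted supernatant
    capture_beads_ul = 10.0 * antibody_ug          # 10 uL beads per ug antibody
    return {
        "preclear_beads_ul": preclear_beads_ul,
        "diluted_volume_ml": diluted_ml,
        "antibody_ug": antibody_ug,
        "capture_beads_ul": capture_beads_ul,
    }


# Example with a hypothetical 0.5 mL of supernatant.
for name, value in rip_reagents(0.5).items():
    print(f"{name}: {value}")
```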
[0137] (III) Isolation of FLAG-15.5K interacting RNA
[0138] After completing the above incubation, separate the magnetic beads and solution using a magnetic rack, discard the solution, then add 1 mL of TRIzol washing buffer to the magnetic beads, and wash by vortexing at room temperature for 3 minutes. Separate the magnetic beads and solution using a magnetic rack, and discard the solution. Repeat the washing process 4 times. Then, add 1 mL of TRIzol (Thermo Fisher Scientific, catalog number 15596018) to the washed magnetic beads, mix well, and incubate at room temperature for 5 minutes. Then add 200 μL of chloroform, vortex for 15 seconds, and incubate at room temperature for 3 minutes. Centrifuge at 13000 × g at 4 °C for 10 minutes, retain 500 μL of supernatant, add 500 μL of isopropanol and 4 μL of glycogen (Thermo Fisher Scientific, catalog number AM9510), mix well, and incubate at -20 °C overnight to precipitate. Centrifuge at 20000×g for 30 minutes at 4°C, discard the supernatant, add 1 mL of 75% ethanol (prepared with DEPC water) to wash the RNA precipitate, then centrifuge at 20000×g for 5 minutes at 4°C, discard the supernatant. Repeat this step once. Add 1 mL of anhydrous ethanol to wash the precipitate again, centrifuge at 20000×g for 5 minutes at 4°C, discard the supernatant. Vacuum dry the RNA precipitate. Dissolve in 30 μL of DEPC water, and determine the RNA concentration using NanoDrop (A260 / A280 > 2, A260 / A230 > 2). Qualified RNA samples can be directly used for the next experiment or stored at -80°C.
[0139] (IV) RIP-PEN-seq sequencing library preparation and sequencing
[0140] For the specific RIP-PEN-seq sequencing library preparation and sequencing methods in this embodiment, please refer to steps 2 to 8 in Example 1.
[0141] Example 4: Construction of sub-PEN-seq sequencing libraries
[0142] The sub-PEN-seq sequencing library in this embodiment uses RNA from various cellular components, and the specific construction method includes the following process.
[0143] (I) Culture and Collection of HEK293T and HCT116 Cells
[0144] HEK293T and HCT116 cells cultured in the laboratory were used as samples, with an initial sample size of 3 × 10⁶ to 3 × 10⁷ cells. The cells were cultured in a cell culture dish until they reached approximately 90% confluence. The culture medium was discarded, and the cells were washed twice with DPBS solution (pH 7.4). Trypsin was then added to detach the cells. After digestion was terminated with serum-containing culture medium, the cell suspension was collected in a conical tube, placed on ice, and centrifuged at 500×g at 4℃ for 5 minutes. The supernatant was discarded, and the cells were resuspended in pre-cooled DPBS and counted. The relative volume (RV) of the cells was determined.
[0145] (II) Cytoplasmic RNA Isolation
[0146] Add pre-chilled cell membrane lysis buffer at 15 times the relative volume, resuspend the cells by gentle pipetting, and incubate on ice for 10 minutes. Vortex gently, centrifuge at 1000×g at 4°C for 3 minutes, and transfer the supernatant (cytoplasmic fraction) to a new centrifuge tube; the pellet is the nuclear fraction. To the cytoplasmic fraction, add 950 μL of anhydrous ethanol and 50 μL of 3 M sodium acetate (pH 5.5, Thermo Fisher Scientific, catalog number AM9740) per 330 μL of cytoplasmic fraction, mix well, and precipitate at -20°C for 2 hours. Then centrifuge at 18000×g at 4°C for 15 minutes and discard the supernatant. Add 1 mL of 75% ethanol, wash by vortexing, centrifuge at 18000×g at 4°C for 5 minutes, discard the supernatant, and air-dry briefly. Add 1 mL of TRIzol for lysis, then add 10 μL of 0.5 M EDTA and incubate at 65°C for 10 minutes to fully dissolve the RNA, and then extract the RNA with chloroform to obtain cytoplasmic RNA.
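Because the lysis buffer and precipitation reagents are specified as ratios, the actual volumes depend on the sample. The following sketch scales them from the figures given above (15 times the relative volume of lysis buffer; 950 μL of ethanol and 50 μL of 3 M sodium acetate per 330 μL of cytoplasmic fraction). Function names are hypothetical, and expressing the relative volume in microlitres is an assumption.

```python
def lysis_buffer_volume(relative_volume_ul: float, fold: float = 15.0) -> float:
    """Lysis buffer volume as a multiple of the cells' relative volume (assumed in uL)."""
    return fold * relative_volume_ul


def precipitation_volumes(fraction_ul: float) -> dict:
    """Ethanol and sodium acetate volumes for a given fraction volume (uL)."""
    scale = fraction_ul / 330.0  # ratios are stated per 330 uL of fraction
    return {
        "ethanol_ul": scale * 950.0,
        "sodium_acetate_ul": scale * 50.0,
    }


print(lysis_buffer_volume(100.0))    # buffer for a hypothetical RV of 100 uL
print(precipitation_volumes(660.0))  # reagents for a hypothetical 660 uL fraction
```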
[0147] (III) Isolation of nuclear RNA
[0148] Add 30 times the relative volume of pre-cooled cell membrane lysis buffer to the nuclear fraction from step (II) above to wash the nuclei. Centrifuge at 200×g for 2 minutes at 4°C, and repeat this step once. Then add 30 times the relative volume of cell membrane lysis buffer to resuspend the nuclei, centrifuge at 1200×g for 5 minutes at 4°C, and discard the supernatant. Add 10 times the relative volume of S1 sucrose solution to resuspend the nuclei, and layer the suspension onto 10 times the relative volume of S3 sucrose solution. Centrifuge at 1200×g for 10 minutes at 4°C; the resulting pellet is the purified nuclei. Add 10 times the relative volume of S2 sucrose solution to the purified nuclei pellet, resuspend it, and transfer it to a new tube. Then sonicate the suspension under the following conditions: power 50%, 15 seconds of sonication with a 45-second interval, for 7 cycles. Then layer the sonicated suspension onto 10 times the relative volume of S3 sucrose solution and centrifuge at 2000×g for 20 minutes at 4°C. The supernatant contains the nucleoplasm, and the pellet contains the nucleoli. After collecting the nucleoplasm-containing supernatant, add 950 μL of anhydrous ethanol and 50 μL of 3 M sodium acetate per 330 μL, mix well, and precipitate at -20℃ for 2 hours. Then extract the RNA according to the method in (II) above to obtain nucleoplasmic RNA.
[0149] (IV) Isolation of nucleolar RNA
[0150] The nucleolar precipitate from step (III) above was resuspended in 500 μL of S2 sucrose solution, centrifuged at 2000×g at 4℃ for 5 minutes, the supernatant was discarded, 1 mL of TRIzol was added, and the mixture was lysed at room temperature for 10 minutes. RNA was then extracted using chloroform to obtain nucleolar RNA.
[0151] (V) Sub-PEN-seq Library Preparation and Sequencing
[0152] For the specific methods of sub-PEN-seq sequencing library preparation and sequencing in this embodiment, please refer to steps 2 to 8 in Example 1.
[0153] Application Example 1: Construction and Sequencing Analysis of PEN-seq Sequencing Libraries
[0154] In this application example, total RNA was isolated from three stable cell lines treated with doxycycline: HEK293T-shNC, sh15.5K-1, and sh15.5K-2. The libraries were prepared and sequenced using the PEN-seq sequencing library construction method described in Example 2, and then the data were analyzed.
[0155] 1. Data Analysis Methods
[0156] The data analysis workflow for identifying the paired-terminal sites and full-length sequences of non-coding RNAs from PEN-seq sequencing libraries is shown in Figure 2 and Figure 3. The specific steps are as follows: after the raw high-throughput paired-end sequencing data of the PEN-seq sequencing library are obtained, adapter sequences and low-quality sequences are first removed from the raw data using Cutadapt (v2.8). The filtered data are then aligned to the human reference genome (hg38) using the sequence alignment software STAR (v2.7.1a). After the BAM file of the alignment results is read using SAMtools, cluster analysis is performed based on the overlap between sequences. Finally, the paired-terminal sites (start and end points) and the full-length sequences of the non-coding RNAs are determined from the cluster analysis results.
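The clustering step can be illustrated with a small sketch. The text does not specify the clustering rule or how the terminal sites are chosen within a cluster, so the code below makes two assumptions: fragments on the same chromosome and strand are merged whenever they overlap, and the most frequent start and end coordinate in each cluster are reported as its terminal sites. Fragment coordinates would in practice be derived from the aligned read pairs in the BAM file; that extraction is not shown, and all names here are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Fragment(NamedTuple):
    chrom: str
    strand: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def cluster_fragments(fragments: Iterable[Fragment]) -> List[List[Fragment]]:
    """Group overlapping fragments that share chromosome and strand."""
    clusters: List[List[Fragment]] = []
    key, cluster_end = None, 0
    for frag in sorted(fragments):  # sorts by chrom, strand, start, end
        if key == (frag.chrom, frag.strand) and frag.start < cluster_end:
            clusters[-1].append(frag)
            cluster_end = max(cluster_end, frag.end)
        else:
            clusters.append([frag])
            key, cluster_end = (frag.chrom, frag.strand), frag.end
    return clusters


def consensus_ends(cluster: List[Fragment]) -> Tuple[int, int]:
    """Most frequent start and end in a cluster (assumed selection rule)."""
    start = Counter(f.start for f in cluster).most_common(1)[0][0]
    end = Counter(f.end for f in cluster).most_common(1)[0][0]
    return start, end


# Toy input with made-up coordinates.
toy = [Fragment("chr1", "+", 100, 180)] * 3 + [
    Fragment("chr1", "+", 102, 180),
    Fragment("chr1", "+", 100, 175),
    Fragment("chr1", "+", 500, 560),
    Fragment("chr1", "-", 100, 180),
]
for cluster in cluster_fragments(toy):
    print(cluster[0].chrom, cluster[0].strand, consensus_ends(cluster), len(cluster))
```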
[0157] IGV or the UCSC Genome Browser can be used to visualize the PEN-seq data analysis results. Furthermore, K-turn RNAs can be screened from the PEN-seq data analysis results based on their structural features. The computational workflow for identifying K-turn RNAs based on the K-turn structural motifs they contain is shown in Figure 4.
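As a rough illustration of motif-based screening, the sketch below searches a candidate sequence for the two conserved motifs named in the background section of this document: the C box (RUGAUGA, where R is A or G) and the D box (CUGA). It is not the workflow of Figure 4, which is not reproduced in the text. Requiring the C box near the 5' end and the D box near the 3' end, the 20-nt search window, and all names are assumptions made for this example.

```python
import re
from typing import Optional, Tuple

C_BOX = re.compile(r"[AG]UGAUGA")  # RUGAUGA, R = A or G
D_BOX = re.compile(r"CUGA")


def find_cd_boxes(seq: str, window: int = 20) -> Optional[Tuple[int, int]]:
    """Return (C box start, D box start) if both motifs lie in the terminal windows.

    The C box is searched in the first `window` nt and the D box in the
    last `window` nt; both window rules are assumptions of this sketch.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA-style input
    c_hit = C_BOX.search(rna[:window])
    tail_offset = max(len(rna) - window, 0)
    d_hits = [m.start() + tail_offset for m in D_BOX.finditer(rna[tail_offset:])]
    if c_hit is None or not d_hits:
        return None
    return c_hit.start(), d_hits[-1]  # D box closest to the 3' end


# Made-up 75-nt candidate: C box at position 2, D box at position 69.
candidate = "GG" + "AUGAUGA" + "C" * 60 + "CUGA" + "GG"
print(find_cd_boxes(candidate))
print(find_cd_boxes("A" * 80))
```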
[0158] 2. Data Analysis Results
[0159] First, qPCR and Western blot experiments were used to assess the efficiency of stable 15.5K knockdown in HEK293T cells. As shown in Figure 5, three stable HEK293T cell lines (shNC, sh15.5K-1, and sh15.5K-2) were treated with DMSO or doxycycline for 48 hours, and RNA and protein were collected. qPCR results showed that, compared with doxycycline-treated negative control (shNC) cells, 15.5K mRNA levels in sh15.5K-1 and sh15.5K-2 cells were significantly downregulated after doxycycline-induced expression of the shRNA targeting 15.5K. Western blot results also showed a significant decrease in 15.5K protein levels.
[0160] Based on the three-dimensional structural features of K-turn RNAs, the paired-terminal sites and full-length sequences of known and novel K-turn RNAs were identified, and the start and end positions of the known K-turn RNAs were compared with their existing annotations. The comparison between the start points of known K-turn RNAs (i.e., box C / D snoRNAs) identified from the PEN-seq sequencing library and the annotated start points is shown in Figure 6; the results indicate that PEN-seq sequencing library analysis can accurately identify the start points of known K-turn RNAs. The comparison between the end points of known K-turn RNAs (i.e., box C / D snoRNAs) identified from the PEN-seq sequencing library and the annotated end points is shown in Figure 7; the results demonstrate that PEN-seq sequencing libraries can accurately identify the end points of known K-turn RNAs.
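A comparison of identified and annotated terminal sites of this kind can be summarized as a distribution of coordinate offsets. The sketch below is one possible way to tabulate it; the document does not describe how the comparison in the figures was computed, and the RNA names and coordinates used here are invented for illustration.

```python
from collections import Counter
from typing import Dict, Tuple

Coords = Dict[str, Tuple[int, int]]  # name -> (start, end)


def end_offsets(identified: Coords, annotated: Coords) -> Tuple[Counter, Counter]:
    """Count start and end offsets (identified minus annotated) for shared RNAs."""
    start_offsets, stop_offsets = Counter(), Counter()
    for name, (start, end) in identified.items():
        if name not in annotated:
            continue  # novel RNA, nothing to compare against
        ann_start, ann_end = annotated[name]
        start_offsets[start - ann_start] += 1
        stop_offsets[end - ann_end] += 1
    return start_offsets, stop_offsets


# Hypothetical coordinates, for illustration only.
identified = {"rna_a": (100, 180), "rna_b": (500, 571), "rna_c": (10, 90)}
annotated = {"rna_a": (100, 180), "rna_b": (500, 570)}
starts, ends = end_offsets(identified, annotated)
print("start offsets:", dict(starts))
print("end offsets:", dict(ends))
```

An offset of 0 means the identified site coincides with the annotation; a distribution concentrated at 0 would correspond to accurate identification of the terminal sites.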
[0161] The above results indicate that the full-length sequence of known K-turn RNAs can be accurately identified in the PEN-seq library constructed based on HEK293T-shNC cells.
[0162] Furthermore, high-throughput sequencing of the PEN-seq sequencing library constructed using this invention also identified a number of novel K-turn RNAs, such as bktRNA1 shown in Figure 8 (Coverage indicates the full length and expression level of the RNA). The start and end points of this RNA can be clearly determined from the PEN-seq sequencing library. Because 15.5K can affect the processing of bktRNA1, the level of bktRNA1 decreased significantly in the 15.5K-silenced cell lines sh15.5K-1 and sh15.5K-2. This is consistent with the existing literature report (Watkins, N.J., A. Dickmanns, and R. Luhrmann, Conserved stem II of the box C / D motif is essential for nucleolar localization and is required, along with the 15.5K protein, for the hierarchical assembly of the box C / D snoRNP. Mol Cell Biol, 2002. 22(23): p. 8342-52.) that 15.5K can promote the processing and conformation of K-turn RNA.
[0163] For the newly identified batch of K-turn RNAs, expression levels in shNC, sh15.5K-1, and sh15.5K-2 cells were further compared. As with bktRNA1, silencing 15.5K significantly downregulated the expression of these K-turn RNAs. The heatmap analysis of K-turn RNA expression levels in shNC, sh15.5K-1, and sh15.5K-2 cells is shown in Figure 9, where RPM stands for reads per million reads. Violin plots of the changes in K-turn RNA expression levels in shNC, sh15.5K-1, and sh15.5K-2 cells are shown in Figure 10; the results indicate that silencing 15.5K significantly downregulated the expression level of K-turn RNA.
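The RPM normalization mentioned above can be written out explicitly. In this sketch, the RPM of an RNA is its read count multiplied by one million and divided by the total number of reads in the library. Whether the denominator is the total read count or only the aligned reads is not stated in the text; the total is assumed here, and the names and counts are illustrative.

```python
from typing import Dict


def rpm(counts: Dict[str, int], total_reads: int) -> Dict[str, float]:
    """Reads per million: count * 1e6 / total reads in the library (assumed denominator)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {name: count * 1_000_000 / total_reads for name, count in counts.items()}


# Hypothetical counts for one RNA in two libraries of different depth.
print(rpm({"rna_x": 50}, total_reads=2_000_000))
print(rpm({"rna_x": 50}, total_reads=4_000_000))
```

Normalizing in this way makes expression values comparable between libraries sequenced to different depths, such as the shNC and sh15.5K libraries.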
[0164] In summary, these results show that the method of this invention for constructing PEN-seq sequencing libraries and performing sequencing analysis can effectively capture full-length non-coding RNA sequence information, which is of great significance for studying non-coding RNAs at the transcriptional level.
[0165] Application Example 2: RIP-PEN-seq Sequencing Library Construction and Sequencing Analysis
[0166] This application example involves collecting HEK293T-FLAG-15.5K cells that stably express FLAG-15.5K, immunoprecipitating FLAG-15.5K with FLAG antibody, and preparing and sequencing the library using the RIP-PEN-seq sequencing library construction method described in Example 3, followed by data analysis.
[0167] 1. Data Analysis Methods
[0168] The data analysis workflow for identifying the paired-terminal sites and full-length sequences of non-coding RNAs from RIP-PEN-seq includes the following steps: First, Cutadapt (v2.8) is used to remove adapter sequences and low-quality sequences from the raw paired-end sequencing data of the RIP-PEN-seq sequencing library. Then, the sequence alignment software STAR (v2.7.1a) is used to align the filtered data to the human reference genome (hg38). After the BAM file of the alignment results is read using SAMtools, cluster analysis is performed based on the overlap between sequences. Finally, the paired-terminal sites (start and end points) and full-length sequences of the non-coding RNAs are determined from the cluster analysis results. IGV or the UCSC Genome Browser can be used to visualize the RIP-PEN-seq data analysis results. Furthermore, the RIP-PEN-seq data analysis results are used to identify K-turn RNAs that interact with 15.5K.
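To illustrate what adapter removal involves for this library design, the sketch below splits a read into the RNA-derived sequence and the two six-base random tags. It assumes that a read covering the whole insert has the layout [6 random bases][RNA-derived sequence][6 random bases][TGGAATTCTCGGGTGCCAAGG...], which follows from the adapter sequences recited in claim 1 but is not spelled out in the text. It uses exact matching only and is not a substitute for Cutadapt; the function name and example read are hypothetical.

```python
from typing import Optional, Tuple

ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"  # fixed part of the 3' DNA adapter (claim 1)
RANDOM_LEN = 6                         # length of each random-base tag


def split_read(read: str) -> Optional[Tuple[str, str, str]]:
    """Return (5' tag, insert, 3' tag), or None if the layout cannot be resolved."""
    adapter_pos = read.find(ADAPTER_3P)
    # Need room for both tags plus at least one insert base before the adapter.
    if adapter_pos < 2 * RANDOM_LEN + 1:
        return None
    body = read[:adapter_pos]
    return body[:RANDOM_LEN], body[RANDOM_LEN:-RANDOM_LEN], body[-RANDOM_LEN:]


# Made-up read: tag + insert + tag + adapter + trailing bases.
example = "ACGTAC" + "GGGATGATGACCCC" + "TTGCAA" + ADAPTER_3P + "AAAA"
print(split_read(example))
print(split_read("ACGT"))
```

Reads from inserts longer than the read length will not contain the adapter and return None in this sketch; in paired-end data such inserts are instead delimited by the two mates.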
[0169] 2. Data Analysis Results
[0170] HEK293T-FLAG-15.5K cells stably expressing FLAG-15.5K were collected. Western blot was used to verify the overexpression of FLAG-15.5K in the stable FLAG-15.5K cell line, with pCGP cells as the negative control. As shown in Figure 11A, the expression of FLAG-15.5K protein was significantly increased in HEK293T-FLAG-15.5K cells. FLAG-15.5K was then immunoprecipitated using FLAG antibody, and Western blot was used to assess the immunoprecipitation efficiency of FLAG-15.5K protein in HEK293T-FLAG-15.5K cells. As shown in Figure 11B, immunoprecipitation with the FLAG antibody specifically targeting FLAG-15.5K significantly enriched FLAG-15.5K protein in HEK293T-FLAG-15.5K cells.
[0171] Library preparation, sequencing, and data analysis were further performed following the RIP-PEN-seq protocol to identify the paired-terminal sites and full-length sequences of known and novel K-turn RNAs, as well as their interactions with 15.5K. The identified sites were compared with the annotated start and end points of known K-turn RNAs. The comparison between the start points of known K-turn RNAs (i.e., box C / D snoRNAs) identified by RIP-PEN-seq and the annotated start points is shown in Figure 12A; the results show that sequencing of the constructed RIP-PEN-seq library can accurately identify the start points of known K-turn RNAs. The comparison between the end points of known K-turn RNAs (i.e., box C / D snoRNAs) identified by 15.5K RIP-PEN-seq and the annotated end points is shown in Figure 12B; the results show that sequencing of the constructed RIP-PEN-seq library can also accurately identify the end points of known K-turn RNAs.
[0172] The above results indicate that constructing a RIP-PEN-seq sequencing library using the method of this invention and performing high-throughput sequencing can accurately identify known K-turn RNAs; for example, Figure 13 shows the paired-terminal sites and full lengths of K-turn RNAs in 10 GAS5 introns. In addition, a number of novel K-turn RNAs were identified; for example, Figure 14 shows the start and end points of bktRNA1.
[0173] Application Example 3: Data Processing of Sub-PEN-Seq Sequencing Libraries
[0174] In this application example, after collecting cultured HEK293T and HCT116 cells, a sub-PEN-seq sequencing library was constructed and sequenced according to the method in Example 4, and data analysis was performed.
[0175] 1. Data Analysis Methods
[0176] The data analysis workflow identifies the paired-terminal sites, full-length sequences, and cellular localization of non-coding RNAs from sub-PEN-seq data. First, Cutadapt (v2.8) was used to remove adapter sequences and low-quality sequences from the raw paired-end sequencing data of the sub-PEN-seq library. Then, the sequence alignment software STAR (v2.7.1a) was used to align the filtered data to the human reference genome (hg38). SAMtools was used to read the BAM file of the alignment results, and cluster analysis was performed based on sequence overlap. Finally, the paired-terminal sites (start and end points) and full-length sequences of the non-coding RNAs were determined from the cluster analysis results. IGV or the UCSC Genome Browser can be used to visualize the sub-PEN-seq data analysis results. Furthermore, the sub-PEN-seq data analysis results can be used to screen K-turn RNAs based on their structural features and to analyze their distribution across cellular components, that is, their cellular localization.
[0177] 2. Data Analysis Results
[0178] After HEK293T and HCT116 cells were cultured and collected, the cytoplasm, nucleoplasm, and nucleolus were separated according to the sub-PEN-seq method. The separation efficiency was assessed using marker proteins specific to each component. Western blot verification of the separation of HEK293T cytoplasm (Cyto), nucleoplasm (Np), and nucleolus (No) is shown in Figure 15A. The results indicate that the cytoplasm-specific protein GAPDH, the nucleoplasm-specific protein FUS, and the nucleolus-specific protein FBL are each enriched in their respective components and are present at very low levels in the other components, indicating a clear separation of the cellular components. Western blot verification of the separation of HCT116 cytoplasm, nucleoplasm, and nucleolus is shown in Figure 15B. These results illustrate that the method of the present invention can effectively separate the various cellular components.
[0179] Library preparation, sequencing, and data analysis were further performed according to the sub-PEN-seq sequencing library construction method in Example 4 to identify the paired-terminal sites, full-length sequences, and cellular localization of known and novel K-turn RNAs. The paired-terminal sites and full length of the K-turn RNA bktRNA1 identified by sub-PEN-seq in the various HEK293T cell components are shown in Figure 16, and those identified by sub-PEN-seq in HCT116 cells are shown in Figure 17. The results indicate that high-throughput sequencing of the sub-PEN-seq sequencing library constructed in this invention can accurately identify known K-turn RNAs. Furthermore, expression analysis of these K-turn RNAs revealed that K-turn RNAs in HEK293T and HCT116 cells are mainly distributed in the nucleoplasm and nucleolus. The heatmap analysis of K-turn RNA expression levels in the various cellular components, based on the HEK293T and HCT116 sub-PEN-seq library data, is shown in Figure 18; the results indicate that the expression level of K-turn RNA in the nucleoplasm and nucleolus is higher than that in the cytoplasm. Violin plots of the differences in K-turn RNA expression levels among the cellular components in the HEK293T and HCT116 sub-PEN-seq data are shown in Figure 19; the results indicate that K-turn RNA is mainly distributed in the nucleoplasm and nucleolus.
[0180] The above results demonstrate that constructing a full-length non-coding RNA sequencing library using the method of this invention and performing high-throughput sequencing can detect the full length of non-coding RNA in various cellular components (such as cytoplasm, nucleoplasm, and nucleolus).
[0181] In summary, this invention provides a method for constructing and applying a sequencing library for capturing full-length non-coding RNAs. High-throughput sequencing of the non-coding RNA sequencing library constructed in this invention shows that it can accurately identify non-coding RNAs, which is of great significance for capturing the full length of various ncRNAs and studying the localization of non-coding RNAs.
[0182] The embodiments of the present invention have been described in detail above. However, the present invention is not limited to the above embodiments. Within the scope of knowledge possessed by those skilled in the art, various changes can be made without departing from the spirit of the present invention. Furthermore, the embodiments of the present invention and the features thereof can be combined with each other unless otherwise specified.
Claims
1. A method for constructing a sequencing library of full-length non-coding RNAs, characterized in that the method includes:
Step S1: Obtain the RNA of the sample to be tested, and ligate a 3' DNA adapter and a 5' RNA adapter to the two ends of the RNA to obtain an RNA ligation product; wherein the 3' DNA adapter is: rApp-NNNNNNTGGAATTCTCGGGTGCCAAGG C3 Spacer, where rApp is an adenylation modification, NNNNNN is six random deoxyribonucleotides, N represents any one of the four deoxyribonucleotides A, T, C, and G, and the C3 Spacer is a blocking group; and the 5' RNA adapter is: GUUCAGAGUUCUACAGUCCGACGAUCNNNNNN, where NNNNNN is six random ribonucleotides and N represents any one of the four ribonucleotides A, U, C, and G;
Step S2: Mix the RNA ligation product with DNA probes targeting non-target RNA and anneal, then remove the non-target RNA and the residual DNA probes to obtain the target RNA ligation product; wherein the non-target RNA includes at least one of rRNA, snRNA, and snoRNA;
Step S3: Design a truncated reverse transcription primer for the target RNA ligation product and synthesize cDNA; then perform PCR amplification on the cDNA using primers containing anchored bases to obtain the full-length non-coding RNA sequencing library.
2. The construction method according to claim 1, characterized in that the RNA in the sample to be tested includes at least one of the following: total RNA from cells or tissues, RNA immunoprecipitated with RNA-binding proteins, and RNA from different organelles.
3. The construction method according to claim 1, characterized in that the DNA probe is 38-55 nt in length.
4. The construction method according to claim 1, characterized in that the non-target RNA and the residual DNA probe are removed using RNase H and RecJf exonuclease, respectively.
5. The construction method according to claim 1, characterized in that the sequence of the truncated reverse transcription primer is shown in SEQ ID NO.197.
6. The construction method according to claim 1, characterized in that the fragment size in the captured full-length non-coding RNA sequencing library is 150 bp to 1500 bp.
7. A method for sequencing full-length non-coding RNA, characterized in that the method includes constructing a sequencing library using the method described in any one of claims 1 to 6, and sequencing the sequencing library.
8. The method according to claim 7, characterized in that the sequencing is performed using PE150 paired-end sequencing.