Reverse transcriptase and use thereof

By developing novel reverse transcriptases Tha-RT, Bac-RT, and Psy-RT, the problems of insufficient synthesis efficiency and fidelity of existing reverse transcriptases in RNA research have been solved, enabling efficient synthesis of full-length cDNA at high temperatures, which is suitable for RNA library preparation in transcriptome sequencing.

WO2026129267A1 · PCT designated stage · Publication Date: 2026-06-25 · BGI RESEARCH HANGZHOU +1

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
BGI RESEARCH HANGZHOU
Filing Date
2024-12-19
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing reverse transcriptases have shortcomings in RNA molecule research, such as low synthesis efficiency, low fidelity, and weak heat resistance, which limit the progress of RNA molecule research.

Method used

Novel reverse transcriptases Tha-RT, Bac-RT, and Psy-RT were developed, derived from bacterial group II introns of Thalassobacillus hwangdonensis, Bacillus algicola, and Psychrobacter faecalis, respectively. They exhibit enhanced thermal stability, continuous synthesis capability (processivity), and high fidelity, making them suitable for reverse transcription of RNA with complex structures.

Benefits of technology

It can efficiently synthesize full-length cDNA at higher temperatures, improving the efficiency and accuracy of RNA library preparation, and is suitable for applications such as transcriptome sequencing.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure PCTCN2024140780-FTAPPB-I100001
  • Figure PCTCN2024140780-FTAPPB-I100002
  • Figure PCTCN2024140780-FTAPPB-I100003
Patent Text Reader

Abstract

Provided are a reverse transcriptase or a biologically active fragment thereof and the use thereof. The reverse transcriptase or the biologically active fragment thereof has: a. an amino acid sequence as shown in any one of SEQ ID NOs: 1-3; b. an amino acid sequence having one or more amino acid substitutions, deletions and / or additions on the basis of the amino acid sequence as shown in any one of SEQ ID NOs: 1-3; or c. an amino acid sequence having at least 70% identity to the amino acid sequence as shown in any one of SEQ ID NOs: 1-3. Moreover, the reverse transcriptase or the biologically active fragment thereof has a reverse transcriptase function.

Description

Reverse transcriptase and applications thereof

TECHNICAL FIELD

[0001] The present application relates to the field of biotechnology, in particular to a novel reverse transcriptase and applications thereof.

BACKGROUND

[0002] Reverse transcriptase is an RNA-dependent DNA polymerase that polymerizes dNTPs on an RNA template strand to synthesize a complementary single-stranded DNA, namely complementary DNA (cDNA). Reverse transcriptase is involved in the propagation of retroviruses and in telomere elongation. In vitro, reverse transcriptase is widely used in various biological analysis methods, such as transcriptome sequencing (RNA-seq), reverse transcription PCR (RT-PCR), and reverse transcription loop-mediated isothermal amplification (RT-LAMP).

[0003] Common reverse transcriptases include Moloney murine leukemia virus reverse transcriptase (MMLV RT), human immunodeficiency virus reverse transcriptase (HIV RT), and avian myeloblastosis virus reverse transcriptase (AMV RT). In addition, reverse transcriptases encoded by group II introns have been discovered and used as powerful biotechnological tools; they exhibit higher thermal stability, fidelity, and continuous synthesis ability (processivity) than the commonly used MMLV RT and similar enzymes.

[0004] Although some commercialized modified reverse transcriptases (such as modified MMLV reverse transcriptase and reverse transcriptases from group II introns) have been proposed in the related art, these reverse transcriptases still suffer from low synthesis efficiency, low fidelity, and weak high-temperature resistance, which to some extent limits the progress of research on RNA molecules.

[0005] Therefore, there is an urgent need to develop new reverse transcriptases with improved performance to improve the efficiency and accuracy of RNA molecule synthesis and detection.

SUMMARY

[0006] The first aspect of this disclosure provides a reverse transcriptase or a bioactive fragment thereof, said reverse transcriptase or bioactive fragment having:
a. an amino acid sequence as shown in any one of SEQ ID NO: 1-3;
b. an amino acid sequence having one or more amino acid substitutions, deletions, and / or additions compared to the amino acid sequence shown in any one of SEQ ID NO: 1-3, and said reverse transcriptase or bioactive fragment having reverse transcriptase function; or
c. an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity compared to the amino acid sequence shown in any one of SEQ ID NO: 1-3, and said reverse transcriptase or bioactive fragment having reverse transcriptase function.

[0007] In some embodiments, the reverse transcriptase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 1. In some embodiments, the reverse transcriptase or its bioactive fragment is derived from *Thalassobacillus hwangdonensis*.

[0008] In some embodiments, the reverse transcriptase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 2. In some embodiments, the reverse transcriptase or its bioactive fragment is derived from Bacillus algicola.

[0009] In some embodiments, the reverse transcriptase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 3. In some embodiments, the reverse transcriptase or its bioactive fragment is derived from *Psychrobacter faecalis*.

[0010] A second aspect of this disclosure provides a fusion protein comprising a reverse transcriptase or a biologically active fragment thereof as described in any embodiment of the first aspect of this disclosure, and an additional portion fused thereto.

[0011] In some embodiments, the additional portion is a tag protein.

[0012] In some embodiments, the tag protein is at least one of Poly His, FLAG, GFP, Strep-Tag II, Poly Arg, C-myc, HA, V5, VSV-G, Trx, SUMO, GST, MBP, Ubiquitin, and NusA.

[0013] In some embodiments, the additional portion is located at the N-terminus and / or C-terminus of the reverse transcriptase or its bioactive fragment.

[0014] A third aspect of this disclosure provides a polynucleotide encoding a reverse transcriptase or its bioactive fragment as described in any embodiment of the first aspect of this disclosure, or a fusion protein as described in any embodiment of the second aspect of this disclosure. In some embodiments, the polynucleotide comprises a sequence as shown in any one of SEQ ID NO: 4-6.

[0015] A fourth aspect of this disclosure provides a vector comprising a polynucleotide as described in any embodiment of a third aspect of this disclosure.

[0016] A fifth aspect of this disclosure provides a cell comprising a polynucleotide as described in any embodiment of the third aspect of this disclosure or a vector as described in any embodiment of the fourth aspect of this disclosure, or expressing a reverse transcriptase or its bioactive fragment as described in any embodiment of the first aspect of this disclosure or a fusion protein as described in any embodiment of the second aspect of this disclosure.

[0017] A sixth aspect of this disclosure provides a kit comprising a reverse transcriptase or a biologically active fragment thereof as described in any embodiment of the first aspect of this disclosure, and a fusion protein as described in any embodiment of the second aspect of this disclosure. In some embodiments, the kit further comprises a reverse transcriptase buffer, dNTPs, and / or primers.

[0018] In some embodiments, the reverse transcriptase buffer contains one or more selected from Tris-hydrochloric acid, ammonium sulfate, magnesium chloride, potassium chloride, and PBS.

[0019] The seventh aspect of this disclosure provides the use of the reverse transcriptase or its bioactive fragment as described in any embodiment of the first aspect of this disclosure, the fusion protein as described in any embodiment of the second aspect of this disclosure, the polynucleotide as described in any embodiment of the third aspect of this disclosure, the vector as described in any embodiment of the fourth aspect of this disclosure, the cell as described in any embodiment of the fifth aspect of this disclosure, or the kit as described in any embodiment of the sixth aspect of this disclosure in nucleotide polymerization and extension.

[0020] In some embodiments, the use includes one or more of the following: a. catalyzing reverse transcription; b. preparing a nucleic acid library including a reverse transcription reaction; and c. sequencing a nucleic acid library including a reverse transcription reaction.

[0021] An eighth aspect of this disclosure provides a reverse transcription method comprising: (i) mixing a reverse transcriptase or its bioactive fragment as described in any embodiment of the first aspect of this disclosure, or a fusion protein as described in any embodiment of the second aspect of this disclosure, with an RNA template to obtain a reaction mixture; and (ii) performing a reverse transcription reaction based on the reaction mixture to obtain a DNA product.

[0022] In some embodiments, step (ii) further includes: performing a reverse transcription reaction based on the reaction mixture to obtain a cDNA first strand, wherein the cDNA first strand is wholly or partially complementary to the RNA template; and performing cDNA second strand synthesis based on the cDNA first strand to obtain a double-stranded DNA product.
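The relationship between the RNA template, the first cDNA strand, and the second strand described above can be illustrated at the sequence level. The following is a minimal sketch only: the function names and the example sequence are hypothetical and do not come from the disclosure, and the code models base complementarity, not the enzymatic reaction.

```python
# Illustrative sketch: sequence-level view of first- and second-strand synthesis.
# All names and the example sequence are hypothetical, not taken from the patent.

RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna_template: str) -> str:
    """Return the first-strand cDNA (5'->3') fully complementary to an RNA template (5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_template.upper()))


def second_strand(first_strand: str) -> str:
    """Return the second DNA strand (5'->3') complementary to the first cDNA strand."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand.upper()))


rna = "AUGGCCUAA"               # hypothetical RNA template
cdna1 = first_strand_cdna(rna)  # "TTAGGCCAT"
cdna2 = second_strand(cdna1)    # "ATGGCCTAA", the RNA sequence with U replaced by T
```

The second strand carries the same sequence as the RNA template (with T in place of U), which is why the double-stranded product preserves the transcript sequence.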

[0023] In some embodiments, the reaction mixture further includes primers, dNTPs, and reverse transcriptase buffer.

[0024] In some embodiments, the reaction temperature of the reverse transcription reaction is 35°C-55°C, preferably 42°C-50°C.

[0025] The ninth aspect of this disclosure provides a method for preparing an RNA library, comprising: mixing a reverse transcriptase or its bioactive fragment as described in any embodiment of the first aspect of this disclosure, or a fusion protein as described in any embodiment of the second aspect of this disclosure, with an RNA sample to obtain a library preparation mixture; and performing a reverse transcription reaction on the library preparation mixture to obtain the RNA library.

[0026] In some embodiments, the library preparation mixture further includes primers, dNTPs, and reverse transcriptase buffer.

The technical solution disclosed herein achieves the following technical effects:

[0027] This disclosure provides novel reverse transcriptases (abbreviated as Tha-RT, Bac-RT, and Psy-RT, respectively, with amino acid sequences shown in SEQ ID NO: 1-3). Compared to existing reverse transcriptases, the novel reverse transcriptases provided in this disclosure exhibit superior catalytic performance, such as stronger thermal stability, enabling them to mediate the reverse transcription of complex RNA structures at relatively high temperatures; stronger polymerization ability, especially sustained synthesis ability, allowing them to perform efficient polymerization catalysis at low concentrations; and the ability to catalyze the production of longer reverse transcription product fragments in a shorter time. Therefore, the novel reverse transcriptases proposed in this disclosure are suitable for applications in reverse transcription reactions (including conventional cDNA synthesis, RACE, RT-LAMP, etc.) and various reverse transcription-based application scenarios, such as RNA library preparation in transcriptome sequencing.

BRIEF DESCRIPTION OF THE DRAWINGS

[0028] To more clearly illustrate the technical solutions in the embodiments of this disclosure, the accompanying drawings used in the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0029] Figure 1 shows the sequence alignment results of the three reverse transcriptases Tha-RT, Bac-RT and Psy-RT according to Example 1 of this disclosure with reverse transcriptases of the same type, Marathon RT and TGI RT;

[0030] Figure 2 shows the predicted structures of three novel reverse transcriptases Tha-RT, Bac-RT and Psy-RT according to Example 2 of this disclosure, and the similar reverse transcriptase TGI RT.

[0031] Figure 3 shows the protein purification results of three novel reverse transcriptases, Tha-RT, Bac-RT and Psy-RT, according to Example 3 of this disclosure;

[0032] Figure 4 illustrates the specific process of template-based cDNA synthesis using three novel reverse transcriptases, Tha-RT, Bac-RT, and Psy-RT, according to Example 4 of this disclosure.

[0033] Figure 5 shows the results of template-based cDNA synthesis using three novel reverse transcriptases, Tha-RT, Bac-RT and Psy-RT, according to Example 4 of this disclosure.

DETAILED DESCRIPTION

[0034] The present invention will now be described in further detail with reference to specific embodiments. The embodiments given are merely illustrative of the invention and are not intended to limit its scope. The embodiments provided below can serve as a guide for further improvements by those skilled in the art and do not constitute a limitation on the invention in any way.

[0035] This disclosure is based on the following understanding of the inventors:

[0036] Reverse transcriptase is a specialized DNA polymerase that performs reverse transcription, synthesizing DNA using RNA as a template. Reverse transcriptases come from various sources, primarily RNA viruses. Common reverse transcriptases include Moloney murine leukemia virus reverse transcriptase (MMLV RT), human immunodeficiency virus reverse transcriptase (HIV RT), and avian myeloblastosis virus reverse transcriptase (AMV RT). In practical applications, reverse transcriptase is commonly used to study RNA molecules, for example in reverse transcription PCR (RT-PCR) or template-switching-based RNA sequencing. Properties of reverse transcriptase that directly affect the efficiency of reverse transcription of RNA into DNA include its thermostability, continuous synthesis capacity (processivity), fidelity, and template-switching activity.

[0037] Bacterial group II introns

[0038] Bacterial group II introns are large catalytic RNA elements comprising an intron ribozyme and an intron-encoded reverse transcriptase. Typically, reverse transcriptases derived from group II introns possess high fidelity, strong continuous synthesis capability, and a distinctive template-switching activity (which allows RNA sequencing adapters to be attached directly to the cDNA), giving them significant potential application value in RT-PCR and RNA sequencing. The reverse transcriptases Tha-RT, Bac-RT, and Psy-RT disclosed herein are derived from bacterial group II introns of *Thalassobacillus hwangdonensis*, *Bacillus algicola*, and *Psychrobacter faecalis*, respectively.

[0039] Thermal stability

[0040] The thermostability of a reverse transcriptase, i.e., its ability to withstand high temperatures, is a crucial factor affecting cDNA synthesis. Increasing the reaction temperature helps denature RNA with stable secondary structures and / or high GC content, allowing the reverse transcriptase to read through the sequence. Most reverse transcriptases are not very thermostable: their optimal reaction temperature is approximately 40°C, and their activity declines rapidly above this temperature. For example, MMLV reverse transcriptase shows a 50% decrease in activity after incubation at 44°C or 47°C for 10 minutes. In contrast, the reverse transcription activity of the reverse transcriptase provided in this disclosure is unaffected at 50°C. This facilitates the synthesis of full-length cDNA and increases its yield, thereby enabling better reverse transcription of RNA with complex structures, and is particularly suitable for applications such as full-length transcriptome sequencing.
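To give a feel for what a 50% loss of activity in 10 minutes implies over a longer reaction, the sketch below assumes simple first-order thermal inactivation. That kinetic model is an assumption made here for illustration; the disclosure states only the 50% figure and does not specify how inactivation proceeds. The function name is hypothetical.

```python
# Illustrative sketch, ASSUMING first-order (exponential) thermal inactivation.
# The disclosure gives only "50% decrease after 10 minutes"; the model is ours.


def residual_activity(minutes: float, half_life_min: float) -> float:
    """Fraction of initial activity remaining after `minutes` of incubation."""
    if minutes < 0:
        raise ValueError("incubation time must be non-negative")
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (minutes / half_life_min)


# A 10-minute half-life reproduces the 50% figure quoted for MMLV RT.
after_10_min = residual_activity(10, half_life_min=10)  # 0.5
after_30_min = residual_activity(30, half_life_min=10)  # 0.125
```

Under this assumed model, an enzyme with a 10-minute half-life would retain only about one eighth of its activity after 30 minutes, which illustrates why an enzyme whose activity is unaffected at 50°C is advantageous for longer high-temperature reactions.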

[0041] Continuous synthesis capability (processivity)

[0042] The continuous synthesis capability (processivity) of a reverse transcriptase refers to the number of nucleotides incorporated in a single binding event between the enzyme and its template. Reverse transcriptases with high processivity can synthesize longer cDNA strands in a shorter reaction time. The reverse transcriptase provided in this disclosure can mediate the synthesis of longer cDNA products even from low-quality and low-abundance RNA samples, and can synthesize more reaction product in the same time, thus exhibiting good processivity. It is suitable for use with RNA isolated from microorganisms, plants, animals, and clinical test samples, where the RNA is often degraded in an RNase-rich environment during processing, resulting in a low amount of input RNA for reverse transcription.
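The link between continuous synthesis capability and full-length cDNA yield can be made concrete with a toy model. The sketch below assumes that the enzyme dissociates from the template with a fixed probability before each nucleotide addition (a geometric model); this model, the function names, and the numbers are illustrative assumptions and are not taken from the disclosure.

```python
# Illustrative sketch, ASSUMING a constant per-nucleotide dissociation probability.
# Neither the model nor the example values come from the disclosure.


def expected_run_length(p_dissociate: float) -> float:
    """Mean number of nucleotides added in one binding event under the geometric model."""
    if not 0.0 < p_dissociate <= 1.0:
        raise ValueError("p_dissociate must be in (0, 1]")
    return (1.0 - p_dissociate) / p_dissociate


def full_length_probability(template_length: int, p_dissociate: float) -> float:
    """Probability of copying the whole template in a single binding event."""
    if template_length < 0:
        raise ValueError("template_length must be non-negative")
    if not 0.0 <= p_dissociate <= 1.0:
        raise ValueError("p_dissociate must be in [0, 1]")
    return (1.0 - p_dissociate) ** template_length


mean_run = expected_run_length(0.01)               # about 99 nucleotides per binding event
p_full = full_length_probability(1000, 0.001)      # roughly 0.37 for a 1000-nt template
```

In this toy model, lowering the dissociation probability tenfold raises the mean run length roughly tenfold, which is one way to picture why a more processive enzyme yields longer products in the same reaction time.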

[0043] Fidelity

[0044] The fidelity of reverse transcriptase refers to the sequence accuracy during the reverse transcription of RNA into DNA, which affects the accuracy of RNA sequencing. The reverse transcriptase provided in this disclosure has high fidelity and is suitable for use in reverse transcription reactions and various reverse transcription-based applications, such as RNA library preparation in transcriptome sequencing.

[0045] Template-switching activity

[0046] Reverse transcriptase possesses template-switching activity. During first-strand synthesis, upon reaching the 5' end of the RNA template, the reverse transcriptase adds additional non-templated nucleotides (primarily deoxycytidine, C) to the 3' end of the newly synthesized cDNA strand. Subsequently, a template-switching oligonucleotide (TSO) hybridizes with the C nucleotides added to the 3' end of the cDNA strand, and the reverse transcriptase "switches" templates from the RNA to the TSO and continues extension, providing a known sequence for second-strand synthesis. The reverse transcriptase provided in this disclosure exhibits excellent template-switching activity, which is essential for obtaining complete cDNA from RNA and facilitates the efficient amplification of full-length transcript libraries.
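The template-switching steps described above can be sketched as plain string operations. This is a simplified illustration only: the TSO sequence, the tail length of three C nucleotides, and the function names are hypothetical, and the code represents base pairing, not enzyme behavior.

```python
# Illustrative sketch of template switching at the sequence level.
# The TSO sequence, tail length, and names are hypothetical assumptions.

COMPLEMENT = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def reverse_complement_dna(seq: str) -> str:
    """Return the DNA reverse complement (5'->3') of an RNA or DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def template_switch(rna_template: str, tso: str, n_tail: int = 3) -> str:
    """Return the cDNA (5'->3') after C-tailing and extension along the TSO."""
    first_strand = reverse_complement_dna(rna_template)
    tailed = first_strand + "C" * n_tail  # non-templated C tail at the cDNA 3' end
    if not tso.upper().endswith("G" * n_tail):
        raise ValueError("TSO 3' end cannot pair with the C tail")
    # After the switch, the enzyme copies the remainder of the TSO.
    return tailed + reverse_complement_dna(tso[: len(tso) - n_tail])


rna = "AUGGCC"       # hypothetical RNA template (5'->3')
tso = "AAGCAGTGGG"   # hypothetical TSO ending in three G residues
cdna = template_switch(rna, tso)  # "GGCCATCCCACTGCTT"
```

The resulting cDNA ends with the complement of the TSO, so a primer matching the TSO sequence can later be used to amplify only those products that reached the 5' end of the RNA.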

[0047] Unless otherwise stated, all technical terms used herein have the meanings commonly understood by one of ordinary skill in the art to which this invention pertains. Generally, the nomenclature used herein and the laboratory procedures described below in cell culture, molecular genetics, organic chemistry, nucleic acid chemistry, and hybridization are well-known and commonly used in the art. Nucleic acid and peptide synthesis is performed using standard techniques. These techniques and procedures are performed according to conventional methods described in the art and in various general references (see, generally, Sambrook et al., *Molecular Cloning: A Laboratory Manual*, 2nd edition (1989), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY), which are incorporated herein by reference. The nomenclature used herein and the laboratory procedures described below in analytical chemistry and organic synthesis are well-known and commonly used in the art. Chemical synthesis or chemical analysis may also be performed using standard techniques or variations thereof.

[0048] In this disclosure, "naturally occurring" or "wild-type" refers to a form found in nature. For example, a naturally occurring or wild-type polypeptide or polynucleotide sequence is a sequence present in an organism that has not been intentionally modified by human manipulation. A "mutant" means a sequence that has a change of at least one amino acid relative to a natural or wild-type amino acid sequence. In some embodiments, the change (mutation) includes at least one of substitution, deletion, and insertion.

[0049] In this disclosure, amino acids can be represented using the universal three-letter symbols or single-letter symbols recommended by the IUPAC-IUB Biochemistry Nomenclature Committee. Similarly, nucleotides can be represented by their recognized single-letter codes.

[0050] In this disclosure, the term "amino acid" refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function similarly to naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those that are subsequently modified, such as hydroxyproline, γ-carboxyglutamic acid, and O-phosphoserine. Amino acid analogs are compounds with the same basic chemical structure as naturally occurring amino acids, i.e., an α-carbon bonded to a hydrogen atom, a carboxyl group, an amino group, and an R group, such as homoserine, norleucine, methionine sulfoxide, and methionine methylsulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as naturally occurring amino acids. Amino acid mimetics are compounds whose structures differ from the general chemical structure of amino acids, but which function similarly to naturally occurring amino acids. The amino acid sequences proposed in this disclosure (e.g., amino acid sequences shown in any one of SEQ ID NO: 1-3) may include the aforementioned amino acid analogs and mimetics or related modifications, as long as they do not affect the basic properties of the corresponding amino acid or the activity of the entire enzyme or its active fragment.

[0051] An embodiment of the first aspect of this disclosure provides a reverse transcriptase or a bioactive fragment thereof having an amino acid sequence as shown in any one of SEQ ID NO: 1-3; or having an amino acid sequence with one or more amino acid substitutions, deletions and / or additions compared to the amino acid sequence shown in any one of SEQ ID NO: 1-3, and the reverse transcriptase or bioactive fragment thereof having the mutated amino acid sequence still has reverse transcriptase function.

[0052] In some embodiments, the reverse transcriptase or its bioactive fragment has an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the amino acid sequence shown in any one of SEQ ID NO: 1-3, and the reverse transcriptase or its bioactive fragment having these identical amino acid sequences still has reverse transcriptase function.

[0053] The reverse transcriptase or its bioactive fragment proposed in this disclosure may comprise a sequence having at least 80%, at least 85%, at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or at least 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, 99.91%, 99.92%, 99.93%, 99.94%, 99.95%, 99.96%, 99.97%, 99.98%, 99.99% but less than 100% identity with any one of SEQ ID NO: 1-3, wherein the sequence has one or more amino acid mutations compared with SEQ ID NO: 1, SEQ ID NO: 2, or SEQ ID NO: 3, and the reverse transcriptase or its bioactive fragment having such a sequence still retains reverse transcriptase function.

[0054] In this disclosure, the term "percentage of identity" for nucleic acid or polypeptide sequences is defined as the percentage of nucleotide or amino acid residues in a candidate sequence that are identical to those of a known sequence after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percentage of identity. N-terminal or C-terminal insertions or deletions should not be interpreted as affecting homology. Homology or identity at the nucleotide or amino acid sequence level can be determined by BLAST (Basic Local Alignment Search Tool) analysis using the algorithms employed by the programs blastp, blastn, blastx, tblastn, and tblastx (Altschul (1997), Nucleic Acids Res. 25, 3389-3402 and Karlin (1990), Proc. Natl. Acad. Sci. USA. 87, 2264-2268), which are tailored for sequence similarity searches.
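As a rough illustration of the percentage-of-identity definition above, the sketch below computes identity from two sequences that have already been aligned (gaps written as "-"). It is not the BLAST algorithm: the alignment itself is assumed to be given, terminal overhangs are trimmed to reflect the statement that N-terminal or C-terminal insertions or deletions do not affect homology, and the choice of the trimmed alignment length as the denominator is an assumption made here. The sequences are invented examples, not SEQ ID NO: 1-3.

```python
# Illustrative sketch: percent identity from a precomputed pairwise alignment.
# Not BLAST; the denominator convention and example sequences are assumptions.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns with identical residues, ignoring terminal overhangs."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a.upper(), aligned_b.upper()))

    # Trim leading and trailing columns in which either sequence has a gap.
    start = 0
    while start < len(columns) and "-" in columns[start]:
        start += 1
    end = len(columns)
    while end > start and "-" in columns[end - 1]:
        end -= 1

    core = columns[start:end]
    if not core:
        raise ValueError("sequences do not overlap")
    matches = sum(1 for a, b in core if a == b and a != "-")
    return 100.0 * matches / len(core)


identity = percent_identity("MKT-LLVAG", "MKTALIVAG")  # 7 of 9 columns match
meets_70_percent = identity >= 70.0                    # True for this toy pair
```

For real sequences, the alignment would come from a dedicated tool such as BLAST, and the reported identity can differ slightly depending on the gap and denominator conventions that tool uses.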

[0055] In this embodiment of the disclosure, compared with the amino acid sequence shown in any one of SEQ ID NO: 1-3, the reverse transcriptase or its bioactive fragment may have one or more amino acid substitutions, deletions, and / or additions, and the reverse transcriptase or its bioactive fragment with the mutated amino acid sequence still has reverse transcriptase function. For example, the reverse transcriptase or its bioactive fragment has at least 1-100 amino acid mutations compared with any one of SEQ ID NO: 1-3. In some embodiments, the reverse transcriptase or its bioactive fragment has at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, or at least 30 amino acid mutations compared to any one of SEQ ID NO: 1-3. In embodiments of this disclosure, the reverse transcriptase or its bioactive fragment has at least 31-70 amino acid mutations compared to any one of SEQ ID NO: 1-3.
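The three kinds of mutation named above (substitution, deletion, and addition) can be counted from a pairwise alignment of a variant against a reference sequence. The following is a minimal sketch under the assumption that the alignment is already available with gaps written as "-"; the function name and the short example sequences are hypothetical and are not SEQ ID NO: 1-3.

```python
# Illustrative sketch: counting mutation types from a precomputed alignment.
# Names and example sequences are hypothetical.
from collections import Counter


def count_mutations(aligned_ref: str, aligned_var: str) -> dict:
    """Count substitutions, deletions, and additions in a variant relative to a reference."""
    if len(aligned_ref) != len(aligned_var):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter(substitution=0, deletion=0, addition=0)
    for ref_res, var_res in zip(aligned_ref.upper(), aligned_var.upper()):
        if ref_res == var_res:
            continue  # identical residue (or a shared gap): not a mutation
        if ref_res == "-":
            counts["addition"] += 1      # residue present only in the variant
        elif var_res == "-":
            counts["deletion"] += 1      # residue missing from the variant
        else:
            counts["substitution"] += 1  # different residue at the same position
    return dict(counts)


mutations = count_mutations("MKT-LLVAG", "MKTALIV-G")
total_mutations = sum(mutations.values())  # 3: one of each type
```

A count obtained this way could then be compared against the ranges recited in the paragraph above, for example to check whether a candidate falls within 1-100 mutations of a reference sequence.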

[0056] In this disclosure, the term "bioactive fragment" refers to any fragment, derivative, homolog, or analog of reverse transcriptase and its mutants that possesses biomolecular-specific in vivo or in vitro activity, including, for example, reverse transcriptase activity. In some embodiments, the bioactive fragment, derivative, homolog, or analog of reverse transcriptase possesses any degree of reverse transcriptase bioactivity in any in vivo or in vitro assay of interest. In some embodiments, the bioactive fragment may include any number of consecutive amino acid residues of reverse transcriptases Tha-RT, Bac-RT, or Psy-RT.

[0057] In some embodiments, the reverse transcriptase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 1. In some embodiments, the reverse transcriptase or its bioactive fragment is derived from *Thalassobacillus hwangdonensis*.

[0058] The amino acid sequence of Tha-RT reverse transcriptase (SEQ ID NO: 1, derived from Thalassobacillus hwangdonensis).

[0059] In some embodiments, the reverse transcriptase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 2. In some embodiments, the reverse transcriptase or its bioactive fragment is derived from Bacillus algicola.

[0060] The amino acid sequence of Bac-RT reverse transcriptase (SEQ ID NO: 2, derived from Bacillus algicola)

[0061] In some embodiments, the reverse transcriptase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 3. In some embodiments, the reverse transcriptase or its bioactive fragment is derived from *Psychrobacter faecalis*.

[0062] The amino acid sequence of Psy-RT reverse transcriptase (SEQ ID NO: 3, derived from Psychrobacter faecalis).

[0063] The novel reverse transcriptase proposed in this disclosure exhibits superior catalytic performance compared to existing reverse transcriptases, specifically in the following ways: enhanced thermostability, enabling it to mediate the reverse transcription of complex RNA structures at relatively high temperatures; and stronger polymerization ability, particularly its sustained synthesis capability, allowing it to perform highly efficient polymerization catalysis at low concentrations and to catalyze the production of longer reverse transcription product fragments in a shorter time. The novel reverse transcriptase proposed in this disclosure is suitable for applications in reverse transcription reactions (including conventional cDNA synthesis, RACE, RT-LAMP, etc.) and various reverse transcription-based application scenarios, such as RNA library preparation in transcriptome sequencing.

[0064] Embodiments of the second aspect of this disclosure provide a fusion protein comprising a reverse transcriptase or a biologically active fragment thereof as described in any embodiment of the first aspect of this disclosure, and an additional portion fused thereto.

[0065] In some embodiments, the additional portion is a tag protein.

[0066] In some embodiments, the tag protein is at least one selected from Poly His, FLAG, GFP, Strep-Tag II, Poly Arg, C-myc, HA, V5, VSV-G, Trx, SUMO, GST, MBP, Ubiquitin, and NusA. It is understood that the specific sequences of these tag proteins and the methods by which they are fused with the target protein are well known in the art. Furthermore, as is known to those skilled in the art, adding these tag proteins to the N-terminus or C-terminus of the target protein can improve the expression level, solubility, or stability of the exogenous protein, or facilitate its purification after expression; the addition of the tag protein does not affect the performance of the reverse transcriptase.

[0067] In some embodiments, the additional portion is located at the N-terminus and / or C-terminus of the reverse transcriptase or its bioactive fragment.

[0068] In the embodiments of this disclosure, other tag sequences, or other small molecules or macromolecule conjugates, may be introduced into the reverse transcriptase as additional parts to form a fusion protein. This disclosure does not limit the specific form or conjugation method of the additional parts.

[0069] An embodiment of the third aspect of this disclosure provides a polynucleotide that encodes a reverse transcriptase or its bioactive fragment as described in any embodiment of the first aspect of this disclosure, or a fusion protein as described in any embodiment of the second aspect of this disclosure.

[0070] In some embodiments, the polynucleotide comprises a sequence as shown in any one of SEQ ID NO: 4-6.

[0071] The nucleic acid sequence of Tha-RT reverse transcriptase (SEQ ID NO: 4, derived from Thalassobacillus hwangdonensis).

[0072] The nucleic acid sequence of Bac-RT reverse transcriptase (SEQ ID NO: 5, derived from Bacillus algicola).

[0073] The nucleic acid sequence of Psy-RT reverse transcriptase (SEQ ID NO: 6, derived from Psychrobacter faecalis).

[0074] In this disclosure, the term "nucleic acid" refers to DNA, RNA, single-stranded, double-stranded, or more highly aggregated hybridization motifs, and any chemical modifications thereof. Modifications include, but are not limited to, those that provide chemical groups that introduce additional charge, polarity, hydrogen bonding, electrostatic interaction, or attachment and interaction sites to the nucleic acid ligand bases or to the nucleic acid ligand as a whole. Such modifications include, but are not limited to, peptide nucleic acids (PNAs), phosphodiester group modifications (e.g., phosphorothioates, methylphosphonates), 2'-position sugar modifications, 5-position pyrimidine modifications, 8-position purine modifications, modifications at exocyclic amines, substitution of 4-thiouridine, substitution of 5-bromo- or 5-iodo-uracil, backbone modifications, methylation, and unusual base-pairing combinations such as the isobases isocytidine and isoguanidine. Nucleic acids may also contain non-natural bases, such as nitroindole. Modifications may also include 3' and 5' modifications, such as capping with a fluorophore (e.g., quantum dots) or another moiety. The nucleic acid sequences proposed in this disclosure (e.g., nucleic acid sequences shown in any one of SEQ ID NO: 4-6) may contain the above-mentioned non-natural bases or related modifications, as long as they do not affect the basic properties of the corresponding nucleotides.

[0075] It should be noted that, because of codon degeneracy, the polynucleotide sequence encoding a given amino acid sequence is not unique, and any nucleotide sequence that encodes the same amino acid sequence falls within the scope of protection of this patent.
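The degeneracy point can be illustrated with a minimal sketch. The codon table below is only a small subset of the standard genetic code, and the two DNA strings are invented for illustration; neither is taken from SEQ ID NO: 4-6.

```python
# Subset of the standard genetic code, enough for this toy example only.
CODON_TABLE = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "CTG": "L", "TTA": "L",
    "GGT": "G", "GGC": "G",
    "TAA": "*", "TGA": "*",  # stop codons
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3]]  # KeyError for codons outside the subset
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Two hypothetical coding sequences that differ at the nucleotide level.
seq_a = "ATGAAACTGGGTTAA"
seq_b = "ATGAAGTTAGGCTGA"

print(translate(seq_a), translate(seq_b))
```

Both strings translate to the same peptide even though they differ at several positions, which is why a polynucleotide is defined here by the protein it encodes rather than by one fixed nucleotide sequence.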

[0076] An embodiment of the fourth aspect of this disclosure provides a vector comprising a polynucleotide as described in any embodiment of the third aspect of this disclosure.

[0077] In this disclosure, "vector" refers to a polynucleotide that can replicate in a host organism independently of the host chromosome. Preferred vectors include plasmids and typically have an origin of replication. Vectors may include, for example, transcription and translation terminators, transcription and translation initiation sequences, and promoters for regulating the expression of specific nucleic acids. This disclosure does not limit the specific elements included in the vector.

[0078] An embodiment of the fifth aspect of this disclosure provides a cell comprising a polynucleotide as described in any embodiment of the third aspect of this disclosure, a vector as described in any embodiment of the fourth aspect of this disclosure, or expressing a reverse transcriptase or its bioactive fragment as described in any embodiment of the first aspect of this disclosure, or a fusion protein as described in any embodiment of the second aspect of this disclosure.

[0079] The reverse transcriptase or its active fragment described in this invention can be expressed in a variety of host cells, including *Escherichia coli*, other bacterial hosts, yeast, filamentous fungi, and various higher eukaryotic cell lines such as COS, CHO, and HeLa cell lines and myeloma cell lines. Techniques for gene expression in microorganisms are described, for example, in Smith, *Gene Expression in Recombinant Microorganisms* (Bioprocess Technology, Vol. 22, Marcel Dekker, 1994). Examples of bacteria suitable for expression include, but are not limited to: *Escherichia*, *Enterobacter*, *Azotobacter*, *Erwinia*, *Bacillus*, *Pseudomonas*, *Klebsiella*, *Proteus*, *Salmonella*, *Serratia*, *Shigella*, *Rhizobium*, *Vibrio*, and *Paracoccus*. Filamentous fungi suitable as expression hosts include, for example, the following genera: *Aspergillus*, *Trichoderma*, *Neurospora*, *Penicillium*, *Cephalosporium*, *Achlya*, *Podospora*, *Mucor*, *Cochliobolus*, and *Pyricularia*. See, for example, U.S. Patent No. 5,679,543 and Stahl and Tudzynski, eds., *Molecular Biology in Filamentous Fungi*, John Wiley & Sons, 1992. The synthesis of heterologous proteins in yeast is well known and described in the literature. This disclosure does not limit the host cells in which the reverse transcriptase or its active fragment is expressed.

[0080] Numerous expression systems for producing polypeptides exist and are known to those skilled in the art (see, for example, Gene Expression Systems, eds. Fernandez and Hoeffler, Academic Press, 1999; Sambrook and Russell, ibid.; and Ausubel et al., ibid.). Typically, the polynucleotide encoding the polypeptide is placed under the control of a promoter that is functional in the desired host cell. Many different promoters are available and known to those skilled in the art and can be used in the expression vectors of this invention, depending on the specific application. The choice of promoter generally depends on the cell in which the promoter is to be active. Other expression control sequences, such as ribosome binding sites and transcription termination sites, may optionally be included. A construct containing one or more of these control sequences is called an "expression cassette." The nucleic acid encoding the polypeptide is thus incorporated into such a construct for high-level expression in the desired host cell. This disclosure does not limit the expression systems for reverse transcriptases or their active fragments.

[0081] An embodiment of the sixth aspect of this disclosure provides a kit comprising a reverse transcriptase or a biologically active fragment thereof as described in any embodiment of the first aspect of this disclosure, or a fusion protein as described in any embodiment of the second aspect of this disclosure.

[0082] In some embodiments, the kit further comprises reverse transcriptase buffer.

[0083] In some embodiments, the reverse transcriptase buffer contains one or more selected from Tris-hydrochloric acid, ammonium sulfate, magnesium chloride, potassium chloride, and PBS.

[0084] In some embodiments, the kit may also contain other components for nucleotide polymerization extension, such as dNTPs and / or NTPs, metal ions, and optional primers.

[0085] Embodiments of the seventh aspect of this disclosure provide the use, in nucleotide polymerization and extension, of reverse transcriptases or their bioactive fragments as described in any embodiment of the first aspect of this disclosure, fusion proteins as described in any embodiment of the second aspect of this disclosure, polynucleotides as described in any embodiment of the third aspect of this disclosure, vectors as described in any embodiment of the fourth aspect of this disclosure, cells as described in any embodiment of the fifth aspect of this disclosure, and kits as described in any embodiment of the sixth aspect of this disclosure.

[0086] It is understood that, as a special type of polymerase, the reverse transcriptase or its bioactive fragment proposed in this disclosure can catalyze the polymerization and extension of nucleotides according to the principle of complementary base pairing, thereby synthesizing a cDNA strand complementary to the RNA template, or further synthesizing a second DNA strand complementary to the cDNA strand to form double-stranded DNA. Therefore, in some embodiments, the nucleotides that are polymerized and extended may include deoxyribonucleotides and/or ribonucleotides, or their analogues or derivatives.

[0087] In some specific embodiments, the application may include one or more of the following: a. catalyzing reverse transcription; b. preparing a nucleic acid library including a reverse transcription reaction; and c. sequencing a nucleic acid library including a reverse transcription reaction, wherein the sequencing may optionally be transcriptome sequencing.

[0088] An embodiment of the eighth aspect of this disclosure provides a reverse transcription method comprising: (i) mixing a reverse transcriptase or its bioactive fragment as described in any embodiment of the first aspect of this disclosure, or a fusion protein as described in any embodiment of the second aspect of this disclosure, with an RNA template to obtain a reaction mixture; and (ii) performing a reverse transcription reaction based on the reaction mixture to obtain a DNA product.

[0089] In some embodiments, step (ii) further includes: performing a reverse transcription reaction based on the reaction mixture to obtain a cDNA first strand, wherein the cDNA first strand is wholly or partially complementary to the RNA template; and performing cDNA second strand synthesis based on the cDNA first strand to obtain a double-stranded DNA product.

[0090] In some embodiments, the reaction mixture further includes primers, dNTPs, and reverse transcriptase buffer.

[0091] In some embodiments, the reverse transcription reaction is carried out at a temperature of 35°C-55°C. In some embodiments, the reverse transcription reaction is carried out at a temperature of 42°C-50°C. The reverse transcriptase provided in this disclosure retains its reverse transcription activity at 50°C. This favors the synthesis of full-length cDNA and increases its yield, enabling better reverse transcription of RNA with complex structures, and makes the enzyme particularly suitable for applications such as the preparation and sequencing of full-length transcript libraries.

[0092] An embodiment of the ninth aspect of this disclosure provides a method for preparing an RNA library, comprising: mixing a reverse transcriptase or its bioactive fragment as described in any embodiment of the first aspect of this disclosure, or a fusion protein as described in any embodiment of the second aspect of this disclosure, with an RNA sample to obtain a library preparation mixture; and performing a reverse transcription reaction on the library preparation mixture to obtain the RNA library.

[0093] In some embodiments, the library preparation mixture further includes primers, dNTPs, and reverse transcriptase buffer.

[0094] In some embodiments, the RNA library preparation method further includes one or more of the following: nucleic acid extraction, nucleic acid fragmentation, end repair, adapter ligation, library amplification, product circularization (preparation of DNA nanoballs), digestion, and optional product purification after each step.

[0095] In some embodiments, the RNA library includes a transcriptomic library (including a full transcriptomic library and a partial transcriptomic library) suitable for second-generation sequencing or a full-length transcriptomic library suitable for third-generation sequencing.

[0096] In some embodiments, the RNA library is used for high-throughput sequencing. In some embodiments, the library is used for second-generation high-throughput sequencing and / or third-generation high-throughput sequencing.

[0097] In summary, the embodiments of this disclosure propose novel reverse transcriptases Tha-RT, Bac-RT, and Psy-RT. Compared with existing reverse transcriptases, these novel reverse transcriptases have excellent catalytic performance, such as the ability to mediate the reverse transcription of complex RNA structures at relatively high temperatures, the high fidelity of the obtained reverse transcription products, and the longer fragment length. They are suitable for reverse transcription reactions (including ordinary cDNA synthesis, RACE, etc.) and various application scenarios based on reverse transcription reactions, such as RNA library preparation in transcriptome sequencing.

[0098] It should be noted that the foregoing explanations of the embodiments of the reverse transcriptase or its active fragment proposed in the first aspect of this disclosure also apply to the fusion protein as described in any embodiment of the second aspect of this disclosure, the polynucleotide proposed in the third aspect of this disclosure, the vector proposed in the fourth aspect of this disclosure, the cell proposed in the fifth aspect of this disclosure, the kit proposed in the sixth aspect of this disclosure, the application proposed in the seventh aspect of this disclosure, the reverse transcription method proposed in the eighth aspect of this disclosure, and the library preparation method proposed in the ninth aspect of this disclosure, which will not be repeated here.

[0099] Unless otherwise specified, the experimental methods used in the following examples are conventional methods, performed according to the techniques or conditions described in the literature in this field or according to the product instructions. Unless otherwise specified, the materials and reagents used in the following examples are commercially available.

[0100] Unless otherwise specified, the quantitative experiments in the following examples are all repeated three times, and the results are averaged.

[0101] Example 1: Identification of novel reverse transcriptases Tha-RT, Bac-RT, and Psy-RT

[0102] This embodiment identifies novel reverse transcriptases Tha-RT, Bac-RT, and Psy-RT by performing metagenomic sequencing and subsequent analysis on samples from deep-sea hydrothermal sediments in Huangdao. The specific steps are as follows:

[0103] 1.1 Metagenomic DNA was extracted from deep-sea hydrothermal sediment samples using the MGIEasy Microbial DNA Extraction Kit (catalog number: 1000027955), and metagenomic sequence data were obtained through library construction and sequencing;

[0104] 1.2 The metagenomic sequence data obtained were assembled and annotated by species and function;

[0105] 1.3 The sequences of interest were aligned with those of similar enzymes (using the Clustal Omega online sequence alignment website), and structural models were constructed.

[0106] Specifically, in section 1.3, sequences derived from bacterial group II introns, which are mobile elements encoding reverse transcriptases, were taken as sequences of interest. This embodiment uses three reverse transcriptase sequences as sequences of interest: Tha-RT from *Thalassobacillus hwangdonensis*, Bac-RT from *Bacillus algicola*, and Psy-RT from *Psychrobacter faecalis*.

[0107] Furthermore, using the Clustal Omega online sequence alignment website, multiple sequence alignments were performed between the three novel group II intron reverse transcriptases (Tha-RT, Bac-RT, and Psy-RT) and two previously reported reverse transcriptases that are likewise derived from bacterial group II introns: marathon RT (NCBI Reference Sequence: WP_015559317.1) and TGI RT (NCBI Reference Sequence: WP_053413546.1). The reverse transcriptase sequences used for alignment are shown in Table 1, and the alignment results are shown in Figure 1.

[0108] Table 1

[0109] The amino acid sequence of marathon RT (SEQ ID NO: 7):

[0110] The amino acid sequence of TGI RT (SEQ ID NO: 8):

[0111] As shown in Figure 1, the sequence identity of Tha-RT to marathon RT and TGI RT is 46.75% and 58.51%, respectively; that of Bac-RT is 46.23% and 56.20%, respectively; and that of Psy-RT is 44.55% and 52.43%, respectively. This suggests that the reverse transcriptases proposed in this embodiment are novel reverse transcriptases.
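As a hedged illustration of how a percent identity figure can be derived from a pairwise alignment, the sketch below counts identical residues over the columns in which both sequences have a residue. This denominator is an assumption: alignment tools differ in how they treat gap columns, and the source does not state the exact convention behind the values above. The two aligned strings are toy data, not the sequences of Table 1.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the denominator (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned sequences (hypothetical), with "-" marking gaps.
toy_a = "MKT-LLGA"
toy_b = "MRTALL-A"

print(round(percent_identity(toy_a, toy_b), 2))
```

Here six columns are compared and five match, so the toy pair scores about 83.33% under the assumed convention.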

[0112] Example 2: Structural models of novel reverse transcriptases Tha-RT, Bac-RT, and Psy-RT

[0113] The structures of the three novel reverse transcriptases identified in Example 1 were predicted using the AlphaFold protein structure prediction model, and the predicted structures are shown in Figure 2. The predicted structure models were then compared with the structure of the existing reverse transcriptase TGI RT (PDB ID: 6AR1) using the TM-align online structure alignment website (https://zhanggroup.org/TM-align/). The results showed that the TM-scores of Tha-RT, Bac-RT, and Psy-RT were 0.8760, 0.8598, and 0.8743, respectively. This indicates that although the sequence identity of each of the identified novel reverse transcriptases to the existing reverse transcriptases is below 60% (see the identity values reported in Example 1), their three-dimensional structures are highly similar, suggesting that the three novel reverse transcriptases identified in this example are very likely to have similar reverse transcriptase activity.
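For orientation, the TM-score reported by TM-align can be sketched as follows. This is a minimal illustration of the published scoring formula for one fixed superposition, not a reimplementation of TM-align, which additionally searches for the alignment and superposition that maximize the score. The function names and the toy distances are hypothetical.

```python
def tm_d0(l_target: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) used by the TM-score."""
    if l_target <= 15:
        raise ValueError("this sketch requires a target length above 15 residues")
    d0 = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("target too short for a positive d0 in this sketch")
    return d0


def tm_score(distances, l_target: int) -> float:
    """TM-score for a fixed superposition.

    distances: C-alpha distances (angstroms) of the aligned residue pairs.
    l_target:  length of the structure used for normalization.
    """
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


# Toy case: a 100-residue target with every residue aligned at zero distance.
perfect = tm_score([0.0] * 100, 100)
print(perfect)
```

A perfect match scores 1.0, and scores above roughly 0.5 are commonly taken to indicate that two structures share the same fold, which is the context for reading the values of about 0.86 to 0.88 reported above.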

[0114] Example 3: Determination of the polymerization activity of novel reverse transcriptases Tha-RT, Bac-RT, and Psy-RT

[0115] In this embodiment, the encoding gene sequences of Tha-RT, Bac-RT and Psy-RT were synthesized, expressed and purified to prepare reverse transcriptases Tha-RT, Bac-RT and Psy-RT (SEQ ID NO: 1-3), and their polymerization activities were then measured.

[0116] 3.1 Preparation of reverse transcriptases Tha-RT, Bac-RT and Psy-RT

[0117] 3.1.1 Construction of expression vector

[0118] The encoding gene sequences of Tha-RT, Bac-RT, and Psy-RT (SEQ ID NO: 4-6) were synthesized and introduced into the pET-28a expression vector (Changzhou Xinyi Biotechnology Co., Ltd.) to obtain recombinant vectors pET-28a-Tha-RT, pET-28a-Bac-RT, and pET-28a-Psy-RT, respectively. The cloning sites were Nde I and Xho I. The protein purification tag SUMO (see www.snapgene.com/plasmids/ta_and_gc_cloning_vectors/pET_SUMO_(linearized) for the specific SUMO sequence) was added to each recombinant vector using the In-Fusion method (In-Fusion HD Cloning Kit, TaKaRa, catalog number 639650). The reaction was carried out at 50℃ for 15 min, and the vectors were sequenced after the reaction. The In-Fusion reaction system is shown in Table 2 below.

[0119] Table 2

[0120] 3.1.2 Expression and purification of Tha-RT, Bac-RT and Psy-RT

[0121] Each sequence-confirmed recombinant vector was transformed into *E. coli* BL21(DE3) competent cells, which were spread onto plates and incubated overnight at 37°C. The next day, 3-6 single colonies were picked from the plates and inoculated into Erlenmeyer flasks containing 50 ml of liquid culture medium (catalog number: A507002-0250, purchased from Sangon Biotech), and incubated at 37°C for 5-7 h until the OD600 of the bacterial culture reached 0.6-4.0. The bacterial culture was then inoculated at a 1% inoculum into 2 L of LB medium and incubated at 37°C for 2-4 h until the OD600 of the bacterial culture reached 0.8-1.0. Isopropyl-β-D-thiogalactoside (IPTG, catalog number: A600168-0025, purchased from BBI) was added to the culture to a final concentration of 0.5 mM. The culture was then placed in a shaker pre-cooled to 16°C and cultured at 220 rpm for 12-16 h to induce expression of the respective reverse transcriptase.
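The volumes implied by this protocol follow from simple dilution arithmetic, sketched below. The 1 M IPTG stock concentration is an assumption made for illustration (the source does not state the stock concentration), and the 1% inoculum is assumed to be volume per volume.

```python
def stock_volume_ml(final_mM: float, culture_volume_ml: float, stock_mM: float) -> float:
    """Volume of stock to add, from C1*V1 = C2*V2 (ignores the small volume increase)."""
    return final_mM * culture_volume_ml / stock_mM


def inoculum_volume_ml(culture_volume_ml: float, percent: float) -> float:
    """Volume of seed culture for a given percent (v/v) inoculum."""
    return culture_volume_ml * percent / 100.0


ASSUMED_IPTG_STOCK_MM = 1000.0  # hypothetical 1 M stock, not stated in the source

iptg_ml = stock_volume_ml(0.5, 2000.0, ASSUMED_IPTG_STOCK_MM)  # 0.5 mM final in 2 L
seed_ml = inoculum_volume_ml(2000.0, 1.0)                      # 1% inoculum into 2 L

print(iptg_ml, seed_ml)
```

Under these assumptions, 1 mL of stock brings 2 L of culture to 0.5 mM IPTG, and the 1% inoculum corresponds to 20 mL of the 50 ml seed culture.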

[0122] After induction, 50 ml of each culture was centrifuged at 8000 g for 30 min, and the pelleted bacterial cells were collected. Buffer A was added to the pellet at a ratio of 1:20 to resuspend the cells, which were then disrupted by sonication in an ice bath to obtain a whole-cell lysate (referred to below as the whole bacterial culture). The lysate was centrifuged at 12000 rpm at 4 °C for 60 min and filtered through a 0.45 μm filter membrane to obtain the supernatant, which was used as the sample for loading onto the purification column.

[0123] Sample purification was performed using a nickel-affinity gravity column. First, the supernatant was loaded onto the gravity column. After loading, the column was washed with 7 ml of binding buffer, and the flow-through was collected. Elution was then performed sequentially with 3 ml each of Buffer A, Buffer B, and Buffer C to obtain the eluates. The composition of each buffer is as follows:

[0124] Purification Buffer:

[0125] Binding buffer: 20mM Imidazole, 1×PBS, pH 7.4;

[0126] Buffer A: 50mM Imidazole, 1×PBS, pH 7.4;

[0127] Buffer B: 100mM Imidazole, 1×PBS, pH 7.4;

[0128] Buffer C: 300mM Imidazole, 1×PBS, pH 7.4

[0129] The samples collected at each stage (whole bacterial culture, supernatant, flow-through, and eluates) were analyzed by SDS-PAGE, with the bacterial culture before induction used as a control. The electrophoresis results are shown in Figure 3. The theoretical molecular weights of Tha-RT, Bac-RT, and Psy-RT are 62.7 kDa, 61.9 kDa, and 66.9 kDa, respectively. As shown in Figure 3, the band size of each purified protein is consistent with that of the target band, indicating that Tha-RT, Bac-RT, and Psy-RT were successfully expressed and prepared. Furthermore, compared to 50, 100, and 200 mM imidazole, more target protein was obtained when using 300 mM imidazole as the elution buffer.
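A theoretical molecular weight of this kind is typically estimated from the amino acid sequence by summing average residue masses and adding one water molecule. The sketch below shows that calculation under stated assumptions: it uses standard average residue masses, ignores tags and post-translational modifications, and is applied to a toy peptide because SEQ ID NO: 1-3 are not reproduced in this section. It is not confirmed as the method used to obtain the values above.

```python
# Average residue masses in daltons (amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def theoretical_mw_kda(protein: str) -> float:
    """Estimate the average molecular weight (kDa) of an unmodified polypeptide."""
    total = sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER
    return total / 1000.0


# Hypothetical toy peptide, used only to exercise the function.
toy_peptide = "MKLG"
print(round(theoretical_mw_kda(toy_peptide), 4))
```

Applied to a full-length enzyme sequence, the same sum gives a figure that can be compared with the band position on the SDS-PAGE gel.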

[0130] 3.2 Determination of polymerization activity of Tha-RT, Bac-RT and Psy-RT

[0131] 3.2.1 Synthesize oligo(dT16) and 50nt Poly(rA), wherein the sequence of oligo(dT16) is TTTTTTTTTTTTTTT (SEQ ID NO: 9), and the sequence of Poly(rA) is AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA (SEQ ID NO: 10);

[0132] 3.2.2 The purified proteins prepared in 3.1 (i.e., Tha-RT, Bac-RT, and Psy-RT) were used for reverse transcription at 37℃. The specific reaction system is shown in Table 3 below. The enzyme concentrations of Tha-RT, Bac-RT, and Psy-RT were 0.782, 0.380, and 0.160 mg/mL, respectively. The commercial Alpha Reverse Transcriptase (α-RT) used as the control was TransFlex I Reverse Transcriptase (catalog number: LS-EZ-E-00005O, purchased from Changzhou Xinyisheng Biotechnology Co., Ltd.), at a concentration of 1 mg/mL. After 10 min, 1 μL of 0.5 M EDTA was added to terminate the reaction.

[0133] Table 3

[0134] 3.2.3 The concentration of dsDNA generated in 3.2.2 was determined using the Qubit dsDNA HS Assay Kit (catalog number: Q32854, purchased from Invitrogen). The polymerization activity results for the three reverse transcriptases are shown in Table 4. As can be seen from Table 4, the control enzyme α-RT (input amount 1 μg) yielded a dsDNA concentration of 21.8 ng/μL, while the three reverse transcriptases synthesized considerable amounts of product even at lower input amounts than the control. Taking Tha-RT as an example, with an input amount of only 0.782 μg it synthesized more product than the positive control α-RT under the same conditions. This indicates that the three reverse transcriptases proposed in this embodiment have good polymerization activity and can perform reverse transcription even at low concentrations; their high polymerization efficiency and low application cost give them good application potential.
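The input amounts quoted above can be related to the stock concentrations given in 3.2.2 with a short calculation, sketched here. The 1 μL enzyme volume per reaction is an assumption inferred from the stated inputs (0.782 μg from a 0.782 mg/mL stock, 1 μg from a 1 mg/mL stock); Table 3 is not reproduced in this section. Only the control yield of 21.8 ng/μL is taken from the text, and no Table 4 values are assumed.

```python
def enzyme_mass_ug(conc_mg_per_ml: float, volume_ul: float) -> float:
    """Enzyme mass in ug; mg/mL is numerically equal to ug/uL."""
    return conc_mg_per_ml * volume_ul


def yield_per_ug(product_ng_per_ul: float, enzyme_ug: float) -> float:
    """Product concentration normalized to the enzyme input."""
    return product_ng_per_ul / enzyme_ug


ASSUMED_VOLUME_UL = 1.0  # hypothetical: inferred from the stated input amounts

stock_mg_per_ml = {"Tha-RT": 0.782, "Bac-RT": 0.380, "Psy-RT": 0.160, "alpha-RT": 1.0}
inputs_ug = {name: enzyme_mass_ug(c, ASSUMED_VOLUME_UL) for name, c in stock_mg_per_ml.items()}

# Normalized yield of the control, using the 21.8 ng/uL value stated in the text.
control_yield = yield_per_ug(21.8, inputs_ug["alpha-RT"])

print(inputs_ug, control_yield)
```

Dividing each measured product concentration by its enzyme input in this way puts enzymes used at different stock concentrations on a common per-microgram basis.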

[0135] Table 4

[0136] Example 4: Performance of novel reverse transcriptases Tha-RT, Bac-RT, and Psy-RT in Smart-seq

[0137] This embodiment uses Smart-seq, a transcriptome sequencing method based on template-switching library construction, to evaluate the performance of the three reverse transcriptases in actual sequencing, with MMLV as a control. The procedure follows Picelli et al. (Picelli S, Faridani OR, Björklund ÅK, Winberg G, Sagasser S, Sandberg R. Full-length RNA-seq from single cells using Smart-seq2. Nat Protoc. 2014;9(1):171-181. doi:10.1038/nprot.2014.006). The specific experimental steps are as follows.

[0138] First, cDNA was synthesized by template switching. The reverse transcription reaction system used is shown in Table 5 below. The reaction was carried out in a PCR instrument with the following program: 42℃ or 50℃ for 1 h; 85℃ for 15 min; hold at 4℃.

[0139] The process of template-switching-based cDNA synthesis is shown in Figure 4. First, polyadenylated mRNA from the HEK293T sample serves as the template. The added Oligo-dT30VN primer anneals to the poly-A tail, and the reverse transcriptase initiates reverse transcription to synthesize the first-strand cDNA. When synthesis reaches the 5' end of the mRNA template, the 5' cap structure of the template and the terminal transferase activity of the reverse transcriptase cause additional non-templated C bases (CCC) to be added to the 3' end of the first-strand cDNA. The rGrG+G at the 3' end of the template-switching oligonucleotide (TSO) in the reaction then base-pairs with this CCC overhang. At this point, through the template-switching activity of the reverse transcriptase, the template changes from the original mRNA to the TSO, and extension continues along the TSO to complete cDNA synthesis.

[0140] Table 5

[0141] TSO: AAGCAGTGGTATCAACGCAGAGTACATrGrG+G (SEQ ID NO: 11);

[0142] Oligo-dT30VN:AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN (SEQ ID NO: 12).

[0143] After the reverse transcription reaction was completed, 5 μL of the reverse transcription product (i.e., the synthesized cDNA) was PCR-amplified in a 25 μL reaction system, as shown in Table 6. The reaction was performed in a PCR instrument with the following program: pre-denaturation at 95℃ for 5 min; 15 cycles of 98℃ for 20 s, 58℃ for 20 s, and 72℃ for 3 min; and a final extension at 72℃ for 5 min.
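The amplification program can be written down as a small data structure, which also makes the total hold time easy to check. This is a sketch only: the step labels (denaturation, annealing, extension) are the conventional reading of the three cycle temperatures rather than terms from the source, and ramp times between steps are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: int
    seconds: int


# Program as described in the text; labels are assumed conventional names.
INITIAL = [Step("pre-denaturation", 95, 5 * 60)]
CYCLE = [
    Step("denaturation", 98, 20),
    Step("annealing", 58, 20),
    Step("extension", 72, 3 * 60),
]
FINAL = [Step("final extension", 72, 5 * 60)]
N_CYCLES = 15


def hold_time_seconds(initial, cycle, n_cycles, final) -> int:
    """Sum of programmed hold times, excluding ramping between temperatures."""
    return (
        sum(s.seconds for s in initial)
        + n_cycles * sum(s.seconds for s in cycle)
        + sum(s.seconds for s in final)
    )


total_s = hold_time_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
print(total_s, total_s / 60)
```

The programmed holds add up to 3900 s, that is 65 min, before instrument ramp times are counted.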

[0144] Table 6

[0145] IS PCR primer: AAGCAGTGGTATCAACGCAGAGT (SEQ ID NO: 13)
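To make the template-switching mechanism concrete, the sketch below assembles the expected first-strand cDNA from the primer and TSO sequences listed above and shows why a single IS PCR primer can amplify it. Several points are simplifying assumptions: the modified bases rG and +G of the TSO are written as plain G, exactly three non-templated C bases are added, the VN positions of the primer are represented by the first two copied bases of the transcript, and the mRNA body is an invented toy sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


IS_PCR = "AAGCAGTGGTATCAACGCAGAGT"          # SEQ ID NO: 13
TSO_DNA = "AAGCAGTGGTATCAACGCAGAGTACATGGG"   # SEQ ID NO: 11 with rG/+G written as G
OLIGO_DT_5P = "AAGCAGTGGTATCAACGCAGAGTAC"    # 5' adapter part of Oligo-dT30VN


def first_strand_cdna(mrna_body: str) -> str:
    """Idealized first-strand cDNA (5'->3') after template switching.

    mrna_body: mRNA sequence upstream of the poly-A tail, written 5'->3'.
    """
    body_dna = mrna_body.upper().replace("U", "T")
    primer_part = OLIGO_DT_5P + "T" * 30   # adapter plus oligo-dT stretch
    copied_body = revcomp(body_dna)        # copy of the transcript body
    non_templated = "CCC"                  # added at the 5' end of the mRNA
    copied_tso = revcomp(TSO_DNA[:-3])     # extension along the TSO after the switch
    return primer_part + copied_body + non_templated + copied_tso


cdna = first_strand_cdna("GGACUGAUCC")  # hypothetical toy transcript body
print(cdna)
```

The resulting strand begins with the IS PCR primer sequence and ends with its reverse complement, so the same primer can anneal at both ends during the enrichment PCR.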

[0146] After the reaction, the enriched cDNA was quantified using the Qubit dsDNA HS Assay Kit (catalog number: Q32854, purchased from Invitrogen) and analyzed by agarose gel electrophoresis to evaluate the performance of each reverse transcriptase in template-switching-based library preparation and sequencing.

[0147] The performance evaluation results of the three reverse transcriptases in sequencing are shown in Figure 5 and Table 7. As can be seen from Figure 5 and Table 7, compared with MMLV, the amount of cDNA synthesized by Tha-RT, Bac-RT, and Psy-RT is significantly increased, indicating that the three novel reverse transcriptases proposed in this embodiment have very high polymerization activity and can achieve efficient synthesis. In addition, the cDNA synthesized by Tha-RT, Bac-RT, and Psy-RT still shows strong bands above 2 kb, indicating that the three novel reverse transcriptases have strong continuous synthesis ability and can synthesize longer fragments. They therefore have the potential to be used for third-generation sequencing with longer read lengths (such as Smart-seq), and can be effectively used in second-generation transcriptome sequencing, third-generation transcriptome sequencing (such as full-length transcriptome sequencing), and single-cell transcriptome sequencing and/or spatial transcriptome sequencing based on them.

[0148] Furthermore, cDNA was synthesized at 42℃ and at 50℃. As shown in Table 7, even at temperatures exceeding the optimal temperature of traditional reverse transcriptases (typically 40℃), the three reverse transcriptases proposed in this embodiment efficiently synthesized large quantities of long-fragment products. This demonstrates that the reverse transcriptases of this embodiment possess strong thermostability and maintain strong continuous synthesis capability at high temperatures. Table 7 also shows that the reverse transcription activity of the three reverse transcriptases did not change significantly at 50℃ compared with 42℃, again indicating good thermostability and a working temperature above the optimum of around 40℃ typical of most reverse transcriptase reactions. This is beneficial for the reverse transcription of RNA with complex structures and is applicable to the reverse transcription and sequencing of the full-length transcriptome.

[0149] Table 7

[0150] In summary, the reverse transcriptases Tha-RT, Bac-RT, and Psy-RT provided in this embodiment exhibit high thermostability, strong continuous synthesis ability, and excellent template switching activity, making them novel reverse transcriptases with significant application value and promising modification prospects.

[0151] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.

[0152] Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.

Claims

1. A reverse transcriptase or a biologically active fragment thereof, wherein the reverse transcriptase or the biologically active fragment thereof: a. has an amino acid sequence as shown in any one of SEQ ID NO: 1-3; b. has an amino acid sequence with one or more amino acid substitutions, deletions, and/or additions compared with the amino acid sequence shown in any one of SEQ ID NO: 1-3, and has reverse transcriptase function; or c. has an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity with the amino acid sequence shown in any one of SEQ ID NO: 1-3, and has reverse transcriptase function.

2. The reverse transcriptase or its bioactive fragment according to claim 1, wherein the reverse transcriptase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 1. Optionally, the reverse transcriptase or its bioactive fragment is derived from *Thalassobacillus hwangdonensis*.

3. The reverse transcriptase or its bioactive fragment according to claim 1, wherein the reverse transcriptase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 2. Optionally, the reverse transcriptase or its bioactive fragment is derived from Bacillus algicola.

4. The reverse transcriptase or its bioactive fragment according to claim 1, wherein the reverse transcriptase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 3. Optionally, the reverse transcriptase or its bioactive fragment is derived from Psychrobacter faecalis.

5. A fusion protein comprising a reverse transcriptase or a biologically active fragment thereof as described in any one of claims 1 to 4, and an additional portion fused thereto. Optionally, the additional portion is a tag protein. Optionally, the tag protein is at least one selected from Poly His, FLAG, GFP, Strep-Tag II, Poly Arg, C-myc, HA, V5, VSV-G, Trx, SUMO, GST, MBP, Ubiquitin, and NusA. Optionally, the additional portion is located at the N-terminus and / or C-terminus of the reverse transcriptase or its bioactive fragment.

6. A polynucleotide encoding a reverse transcriptase or a biologically active fragment thereof as described in any one of claims 1 to 4, or a fusion protein as described in claim 5. Optionally, the polynucleotide comprises a sequence as shown in any one of SEQ ID NO: 4-6.

7. A vector comprising the polynucleotide as described in claim 6.

8. A cell comprising the polynucleotide of claim 6 or the vector of claim 7, or expressing a reverse transcriptase or a bioactive fragment thereof of any one of claims 1 to 4 or a fusion protein of claim 5.

9. A kit comprising the reverse transcriptase or its bioactive fragment as described in any one of claims 1 to 4, or the fusion protein as described in claim 5. Optionally, the kit further comprises reverse transcriptase buffer, dNTPs, and / or primers. Optionally, the reverse transcriptase buffer contains one or more selected from Tris-hydrochloric acid, ammonium sulfate, magnesium chloride, potassium chloride, and PBS.

10. The use of the reverse transcriptase or its bioactive fragment as described in any one of claims 1 to 4, the fusion protein as described in claim 5, the polynucleotide as described in claim 6, the vector as described in claim 7, the cell as described in claim 8, or the kit as described in claim 9 in nucleotide polymerization and extension.

11. The use according to claim 10, wherein the use includes one or more of the following: a. catalyzing a reverse transcription reaction; b. preparing a nucleic acid library including a reverse transcription reaction; and c. sequencing a nucleic acid library including a reverse transcription reaction.

12. A reverse transcription method, comprising: (i) mixing the reverse transcriptase or its bioactive fragment as described in any one of claims 1 to 4, or the fusion protein as described in claim 5, with an RNA template to obtain a reaction mixture; and (ii) performing a reverse transcription reaction based on the reaction mixture to obtain a DNA product.

13. The reverse transcription method according to claim 12, wherein step (ii) further comprises: performing a reverse transcription reaction based on the reaction mixture to obtain a first cDNA strand, wherein the first cDNA strand is wholly or partially complementary to the RNA template; and synthesizing a second cDNA strand based on the first cDNA strand to obtain a double-stranded DNA product.

14. The reverse transcription method according to claim 12 or 13, wherein the reaction mixture further comprises primers, dNTPs, and reverse transcriptase buffer.

15. The reverse transcription method according to claim 12, wherein the reaction temperature of the reverse transcription reaction is 35℃-55℃, preferably 42℃-50℃.

16. A method for preparing an RNA library, comprising: mixing the reverse transcriptase or its bioactive fragment as described in any one of claims 1 to 4, or the fusion protein as described in claim 5, with an RNA sample to obtain a library preparation mixture; and performing a reverse transcription reaction on the library preparation mixture to obtain the RNA library.

17. The method of claim 16, wherein the library preparation mixture further comprises primers, dNTPs, and reverse transcriptase buffer.