Transposase and use thereof
By providing novel transposases or bioactive fragments thereof, the disclosure addresses the poor fragmentation performance of Tn5 transposase in genome sequencing, enabling more efficient transposition reactions and improving the efficiency of gene editing and library construction.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: BGI RESEARCH HANGZHOU
- Filing Date: 2024-12-19
- Publication Date: 2026-06-25
Description
Transposases and their applications
Technical Field
[0001] This disclosure relates to the field of biotechnology, specifically to transposases and their applications.
Background Art
[0002] A transposon (transposable element, TE) is a DNA fragment capable of moving, and in some cases replicating, within the genome. It is either copied in situ or excised directly from its original location and then inserted into another site in the genome, where it can affect the regulation of genes at the recipient site. Transposons are classified into type I and type II. Type I transposons follow a "copy-and-paste" mechanism: RNA is transcribed from the transposon and reverse-transcribed into DNA, which is then inserted at another location in the genome. Type II transposons follow a "cut-and-paste" mechanism that does not use RNA as an intermediate: a transposase excises the original transposon directly from the donor site, and the transposon is then transferred to and integrated into the target site. Type II transposons are more commonly used in biotechnology; for example, the Tn5 transposase commonly used in library construction belongs to a type II transposon system.
[0003] The structure of a type II transposon includes a transposase gene sequence, terminal inverted repeats (TIRs, also known as ME sequences) at both ends, and a target site duplication (TSD). The transposition process of a type II transposon is as follows: the transposase gene sequence is expressed to produce a transposase; two transposase monomers bind to the ME sequences at the two ends of the transposon, forming a transposase-ME complex dimer (i.e., the transposase complex). Using its cleavage activity, this complex excises the transposon from the donor strand through a series of chemical reactions. After leaving the donor strand and recognizing a target site, the transposase complex cleaves the target site and inserts the transposon it carries into that site, completing the transposition event.
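As a non-limiting illustration of the cut-and-paste process described above, the following Python sketch models DNA as plain strings. The function name `cut_and_paste`, the example sequences, and the `tsd_len` parameter are assumptions introduced for illustration only; this background section does not state a TSD length, and rejoining the donor flanks after excision is a deliberate simplification.

```python
def cut_and_paste(donor, start, end, target, site, tsd_len):
    """Toy model of a type II ("cut-and-paste") transposition event.

    donor[start:end] is the transposon (inverted repeats included).
    tsd_len is an assumed length for the target site duplication (TSD).
    """
    if not (0 <= start < end <= len(donor)):
        raise ValueError("transposon coordinates fall outside the donor")
    if not (0 <= site and site + tsd_len <= len(target)):
        raise ValueError("target site falls outside the target")

    transposon = donor[start:end]
    # Simplification: the donor flanks are simply rejoined after excision.
    donor_after = donor[:start] + donor[end:]

    # A staggered cut at the target leaves short single-stranded gaps; once
    # they are filled in, the same tsd_len bases flank the insert on both sides.
    tsd = target[site:site + tsd_len]
    target_after = (
        target[:site + tsd_len] + transposon + tsd + target[site + tsd_len:]
    )
    return donor_after, target_after


# Hypothetical sequences, chosen only to make the result easy to read.
transposon = "ACGGA" + "TTTT" + "TCCGT"   # left repeat + payload + right repeat
donor = "GG" + transposon + "CC"
target = "AAAACCCCGGGGTTTT"

donor_after, target_after = cut_and_paste(
    donor, 2, 2 + len(transposon), target, 4, 4
)
print(donor_after)
print(target_after)
```

In the printed target, the four bases at the chosen site appear once on each side of the inserted transposon, which is the duplication the TSD term refers to.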
[0004] In practical applications, although type II transposases such as Tn5 transposase have been widely used in second- and third-generation sequencing library construction (such as genome library construction and transcriptome library construction) due to their ingenious fragmentation mechanism and high library construction efficiency, genome sequencing results show that the fragmentation effect of Tn5 transposase is still not as good as traditional fragmentation. Its fragmentation coverage is limited and it has a certain sequence bias.
[0005] Therefore, there is an urgent need to develop alternatives to Tn5 transposase with stronger enzyme activity and better fragmentation performance, which can be used for sequencing library construction and many other applications.
Summary of the Invention
[0006] The first aspect of this disclosure provides a transposase or a bioactive fragment thereof, wherein the transposase or bioactive fragment: a. has the amino acid sequence shown in SEQ ID NO: 1 or SEQ ID NO: 2; b. has one or more amino acid substitutions, deletions, and / or additions compared to the amino acid sequence shown in SEQ ID NO: 1 or SEQ ID NO: 2, and has transposase function; or c. has an amino acid sequence with at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% identity to the amino acid sequence shown in SEQ ID NO: 1 or SEQ ID NO: 2, and has transposase function.
[0007] In some embodiments, the transposase or its bioactive fragment has an amino acid sequence as shown in SEQ ID NO: 1.
[0008] In some embodiments, the transposase or its bioactive fragment has an amino acid sequence as shown in SEQ ID NO: 2.
[0009] A second aspect of this disclosure provides a transposase complex comprising: a transposase or a bioactive fragment thereof as described in any embodiment of the first aspect; and a transposase binding sequence comprising an inverted repeat sequence and optionally a DNA sequence to be transposed, wherein the transposase or the bioactive fragment thereof binds to the inverted repeat sequence. In some embodiments, the DNA sequence to be transposed is a gene or a fragment thereof, an adapter sequence, a tag sequence, and / or a sequencing primer binding sequence.
[0010] In some embodiments, the inverted repeat sequence comprises a first nucleic acid sequence and a second nucleic acid sequence, wherein the first nucleic acid sequence and the second nucleic acid sequence present an inverted repeat structure, and the transposase or its bioactive fragment recognizes and binds to the first nucleic acid sequence and the second nucleic acid sequence.
[0011] In some embodiments, the transposase or its bioactive fragment has an amino acid sequence as shown in SEQ ID NO: 1, and the first nucleic acid sequence and the second nucleic acid sequence respectively have: a. the nucleic acid sequences shown in SEQ ID NO: 5 and SEQ ID NO: 6; or b. nucleic acid sequences with one or more nucleotide substitutions, deletions, and / or additions compared to the nucleic acid sequences shown in SEQ ID NO: 5 and SEQ ID NO: 6, such that the inverted repeat sequence has or substantially has an inverted repeat structure and is capable of binding with the transposase or its bioactive fragment to form the transposase complex.
[0012] In some embodiments, the transposase or its bioactive fragment has an amino acid sequence as shown in SEQ ID NO: 2, and the first nucleic acid sequence and the second nucleic acid sequence respectively have: a. the nucleic acid sequences shown in SEQ ID NO: 7 and SEQ ID NO: 8; or b. nucleic acid sequences with one or more nucleotide substitutions, deletions, and / or additions compared to the nucleic acid sequences shown in SEQ ID NO: 7 and SEQ ID NO: 8, such that the inverted repeat sequence has or substantially has an inverted repeat structure and is capable of binding with the transposase or its bioactive fragment to form the transposase complex.
[0013] A third aspect of this disclosure provides a polynucleotide encoding a transposase or a bioactive fragment thereof as described in any embodiment of the first aspect, or a complementary sequence of the polynucleotide. In some embodiments, the polynucleotide comprises a sequence as shown in SEQ ID NO: 3 or SEQ ID NO: 4 or a complementary sequence thereof.
[0014] A fourth aspect of this disclosure provides a transposon comprising: a polynucleotide as described in any embodiment of the third aspect; and a transposase binding sequence comprising an inverted repeat sequence and optionally a DNA sequence to be transposed, the inverted repeat sequence flanking the polynucleotide, and the transposase or a bioactive fragment thereof being capable of binding to the inverted repeat sequence. In some embodiments, the DNA sequence to be transposed is a gene or a fragment thereof, an adapter sequence, a tag sequence, and / or a sequencing primer binding sequence.
[0015] In some embodiments, the inverted repeat sequence comprises a first nucleic acid sequence and a second nucleic acid sequence, wherein the first nucleic acid sequence and the second nucleic acid sequence present an inverted repeat structure, and the first nucleic acid sequence and the second nucleic acid sequence are located on either side of the polynucleotide.
[0016] In some embodiments, the polynucleotide has a sequence as shown in SEQ ID NO: 3, and the first nucleic acid sequence and the second nucleic acid sequence respectively have: a) the nucleic acid sequences shown in SEQ ID NO: 5 and SEQ ID NO: 6; or b) nucleic acid sequences having one or more nucleotide substitutions, deletions, and / or additions compared to the nucleic acid sequences shown in SEQ ID NO: 5 and SEQ ID NO: 6, such that the inverted repeat sequence has or substantially has an inverted repeat structure and is capable of binding to the transposase or its bioactive fragment encoded by the polynucleotide to form a transposase complex.
[0017] In some embodiments, the polynucleotide has a sequence as shown in SEQ ID NO: 4, and the first nucleic acid sequence and the second nucleic acid sequence respectively have: a) the nucleic acid sequences shown in SEQ ID NO: 7 and SEQ ID NO: 8; or b) nucleic acid sequences having one or more nucleotide substitutions, deletions, and / or additions compared to the nucleic acid sequences shown in SEQ ID NO: 7 and SEQ ID NO: 8, such that the inverted repeat sequence has or substantially has an inverted repeat structure and is capable of binding to the transposase or its bioactive fragment encoded by the polynucleotide to form a transposase complex.
[0018] In some embodiments, the transposon further comprises a target site duplication sequence located on one side of the transposase binding sequence, preferably on one side of the inverted repeat sequence.
[0019] The fifth aspect of this disclosure provides a vector comprising a polynucleotide as described in any embodiment of the third aspect of this disclosure or a transposon as described in any embodiment of the fourth aspect of this disclosure.
[0020] A sixth aspect of this disclosure provides a cell comprising a polynucleotide as described in any embodiment of the third aspect of this disclosure, a transposon as described in any embodiment of the fourth aspect, or a vector as described in any embodiment of the fifth aspect, or expressing a transposase or a biologically active fragment thereof as described in any embodiment of the first aspect.
[0021] A seventh aspect of this disclosure provides a composition comprising: a transposase or a bioactive fragment thereof as described in any embodiment of the first aspect of this disclosure, or a transposase complex as described in any embodiment of the second aspect; and an enzyme buffer, wherein the enzyme buffer optionally contains Mg²⁺. In some embodiments, the enzyme buffer contains one or more selected from Tris-HCl, ammonium sulfate, magnesium chloride, potassium chloride, glycerol, and PBS.
[0022] An eighth aspect of this disclosure provides a kit comprising at least one of the following: a transposase or a bioactive fragment thereof as described in any embodiment of the first aspect of this disclosure, a transposase complex as described in any embodiment of the second aspect, a polynucleotide as described in any embodiment of the third aspect, a transposon as described in any embodiment of the fourth aspect, a vector as described in any embodiment of the fifth aspect, a cell as described in any embodiment of the sixth aspect, and a composition as described in any embodiment of the seventh aspect.
[0023] The ninth aspect of this disclosure proposes the use of transposases or their bioactive fragments as described in any embodiment of the first aspect of this disclosure, transposase complexes as described in any embodiment of the second aspect, polynucleotides as described in any embodiment of the third aspect, transposons as described in any embodiment of the fourth aspect, vectors as described in any embodiment of the fifth aspect, cells as described in any embodiment of the sixth aspect, and compositions as described in any embodiment of the seventh aspect in nucleic acid cleavage and optional nucleic acid insertion.
[0024] In some embodiments, the application includes one or more of the following: a. endogenous gene editing; b. exogenous gene transfer; and c. library construction. In some embodiments, the library includes a transcriptome library, a genomic library, or an epigenetic detection library, wherein the epigenetic detection library may optionally be a DNA-protein interaction detection library or a DNA methylation detection library.
[0025] A tenth aspect of this disclosure provides an in vitro transposition method, comprising: contacting a donor DNA molecule with a target DNA molecule and a transposase or its bioactive fragment as described in any embodiment of the first aspect of this disclosure for a period of time sufficient to allow the transposase or its bioactive fragment to catalyze in vitro transposition, wherein the donor DNA molecule comprises a DNA sequence to be transposed and an inverted repeat sequence; or contacting a transposase complex as described in any embodiment of the second aspect with the target DNA molecule for a period of time sufficient to allow the transposase or its bioactive fragment to catalyze in vitro transposition. In some embodiments, the DNA sequence to be transposed is a gene or a fragment thereof, an adapter sequence, a tag sequence, and / or a sequencing primer binding sequence.
[0026] In some embodiments, the inverted repeat sequence comprises a first nucleic acid sequence and a second nucleic acid sequence, wherein the first nucleic acid sequence and the second nucleic acid sequence present an inverted repeat structure, and the first nucleic acid sequence and the second nucleic acid sequence are respectively linked to the DNA sequence to be transposed, or are jointly linked to the DNA sequence to be transposed.
[0027] In some embodiments, based on the transposase or its bioactive fragment having an amino acid sequence as shown in SEQ ID NO: 1, the first nucleic acid sequence and the second nucleic acid sequence have nucleic acid sequences as shown in SEQ ID NO: 5 and SEQ ID NO: 6, respectively.
[0028] In some embodiments, based on the transposase or its bioactive fragment having an amino acid sequence as shown in SEQ ID NO: 2, the first nucleic acid sequence and the second nucleic acid sequence have nucleic acid sequences as shown in SEQ ID NO: 7 and SEQ ID NO: 8, respectively.
[0029] The eleventh aspect of this disclosure provides a library construction method, comprising: incubating a transposase complex with an analyte containing a target DNA molecule and performing a transposition reaction to obtain a transposition product; and obtaining the library based on the transposition product, wherein the transposase complex comprises a transposase or its bioactive fragment as described in any embodiment of the first aspect of this disclosure, an inverted repeat sequence, and a DNA sequence to be transposed, wherein the DNA sequence to be transposed is an adapter sequence, a tag sequence, and / or a sequencing primer binding sequence.
[0030] In some embodiments, the inverted repeat sequence comprises a first nucleic acid sequence and a second nucleic acid sequence, wherein the first nucleic acid sequence and the second nucleic acid sequence present an inverted repeat structure, and the first nucleic acid sequence and the second nucleic acid sequence are respectively linked to the DNA sequence to be transposed, or are jointly linked to the DNA sequence to be transposed.
[0031] In some embodiments, based on the transposase or its bioactive fragment having an amino acid sequence as shown in SEQ ID NO: 1, the first nucleic acid sequence and the second nucleic acid sequence have nucleic acid sequences as shown in SEQ ID NO: 5 and SEQ ID NO: 6, respectively.
[0032] In some embodiments, based on the transposase or its bioactive fragment having an amino acid sequence as shown in SEQ ID NO: 2, the first nucleic acid sequence and the second nucleic acid sequence have nucleic acid sequences as shown in SEQ ID NO: 7 and SEQ ID NO: 8, respectively.
[0033] In some embodiments, obtaining the library based on the transposition product includes: performing PCR amplification on the transposition product to obtain an amplification product, wherein the amplification product is the library.
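The library construction steps of the eleventh aspect can be pictured with a toy Python sketch, offered only as an illustration under stated assumptions. The function name `tagment`, the adapter strings, and the fixed cut positions are hypothetical; a real transposition reaction chooses insertion sites itself and attaches adapters strand by strand, which this string model ignores. The sketch assumes that only fragments bounded by two insertion events carry adapters at both ends and are therefore recovered by the PCR amplification step.

```python
def tagment(target, cut_sites, adapter_left, adapter_right):
    """Toy model: cut `target` at each site and keep adapter-flanked fragments.

    cut_sites are assumed insertion positions (0 < site < len(target)).
    Only fragments lying between two insertion events receive an adapter at
    both ends, so only those are returned as amplifiable library members.
    """
    sites = sorted({s for s in cut_sites if 0 < s < len(target)})
    bounds = [0] + sites + [len(target)]
    fragments = [target[a:b] for a, b in zip(bounds, bounds[1:])]

    # The first and last fragments have one original (adapter-free) end each.
    internal = fragments[1:-1]
    return [adapter_left + frag + adapter_right for frag in internal]


# Hypothetical target and adapters, for illustration only.
library = tagment("AAAACCCCGGGGTTTT", [4, 8, 12], "ACAC", "GTGT")
for member in library:
    print(member)
```

With fewer than two insertion events the function returns an empty list, reflecting the assumption that a fragment needs an adapter at each end to be amplified.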
[0034] The technical solution disclosed herein achieves the following technical effects:
[0035] This disclosure presents novel transposases (abbreviated as 22°S and 11°N, with amino acid sequences shown in SEQ ID NO: 1-2, respectively). Compared to existing transposases, the novel transposases provided in this disclosure exhibit superior catalytic performance, such as significantly higher transposition activity. Therefore, they can serve as an alternative to traditional Tn5 transposases in various transposase application scenarios, such as endogenous gene editing, exogenous gene introduction, and library construction, to improve transposition reaction efficiency, gene editing efficiency, or library construction efficiency.
Brief Description of the Drawings
[0036] To more clearly illustrate the technical solutions in the embodiments of this disclosure, the accompanying drawings used in the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0037] Figure 1 shows the sequence alignment of 22°S and 11°N with wild-type Tn5 transposase according to Example 1 of this disclosure;
[0038] Figure 2 shows the structural models of 22°S and 11°N according to Example 2 of this disclosure and their comparison with the wild-type Tn5 transposase model;
[0039] Figure 3 illustrates the binding of 22°S and 11°N to their respective ME sequences according to Example 3 of this disclosure;
[0040] Figure 4 is a schematic diagram illustrating the principle for detecting the transposition activity of 22°S and 11°N according to Example 4 of this disclosure;
[0041] Figure 5 shows the transposition activity test results for 22°S and 11°N according to Example 4 of this disclosure;
[0042] Figure 6 shows the transposition result for 22°S according to Example 4 of this disclosure;
[0043] Figure 7 shows the transposition result for 11°N according to Example 4 of this disclosure;
[0044] Figure 8 is a schematic diagram illustrating the principle for detecting the transposition activity of 22°S and 11°N according to Example 5 of this disclosure;
[0045] Figure 9 shows the transposition activity test results for 22°S and 11°N according to Example 5 of this disclosure.
Detailed Description
[0046] The present invention will now be described in further detail with reference to specific embodiments. The embodiments given are merely illustrative of the invention and are not intended to limit its scope. The embodiments provided below can serve as a guide for further improvements by those skilled in the art and do not constitute a limitation on the invention in any way.
[0047] Unless otherwise stated, all technical terms used herein have the meanings commonly understood by one of ordinary skill in the art to which this invention pertains. Generally, the nomenclature used herein and the laboratory procedures described below in cell culture, molecular genetics, organic chemistry, nucleic acid chemistry, and hybridization are well-known and commonly used in the art. Nucleic acid and peptide synthesis is performed using standard techniques. These techniques and procedures are performed according to conventional methods described in the art and in various general references (see, generally, Sambrook et al., *Molecular Cloning: A Laboratory Manual*, 2nd edition (1989), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY), which are incorporated herein by reference throughout. The nomenclature used herein and the laboratory procedures described below in analytical chemistry and organic synthesis are well-known and commonly used in the art. Chemical synthesis or chemical analysis may also be performed using standard techniques or variations thereof.
[0048] In this disclosure, "naturally occurring" or "wild-type" refers to a form found in nature. For example, a naturally occurring or wild-type polypeptide or polynucleotide sequence is a sequence present in an organism that has not been intentionally modified by human manipulation. A "mutant" means a sequence that has a change of at least one amino acid relative to a natural or wild-type amino acid sequence. In some embodiments, the change (mutation) includes at least one of substitution, deletion, and insertion.
[0049] In this disclosure, amino acids can be represented using the universal three-letter symbols or single-letter symbols recommended by the IUPAC-IUB Biochemistry Nomenclature Committee. Similarly, nucleotides can be represented by their recognized single-letter codes.
[0050] In this disclosure, the term "amino acid" refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function similarly to naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those subsequently modified, such as hydroxyproline, γ-carboxyglutamic acid, and O-phosphoserine. Amino acid analogs are compounds with the same basic chemical structure as naturally occurring amino acids, i.e., an α carbon bonded to a hydrogen atom, a carboxyl group, an amino group, and an R group, such as homoserine, norleucine, methionine sulfoxide, and methionine methylsulfonium. These analogs have modified R groups (such as norleucine) or modified peptide backbones, but retain substantially the same basic chemical structure as naturally occurring amino acids. Amino acid mimetics are compounds whose structures differ from the general chemical structure of amino acids, but which function similarly to naturally occurring amino acids. The amino acid sequences proposed in this disclosure (e.g., the amino acid sequences shown in SEQ ID NO: 1 or 2) may include the aforementioned amino acid analogs and mimetics or related modifications, as long as they do not affect the basic properties of the corresponding amino acid or the activity of the entire enzyme or its active fragment.
[0051] In this disclosure, the term "transposase" refers to an enzyme that performs transposition, typically encoded by a transposon, that recognizes specific sequences (such as inverted repeat sequences) at both ends of the transposon, thereby dislodging the transposon from its original position and inserting it into a new DNA target site. As used herein, "transposase function" or "transposase activity" refers to the function of a transposase or transposase complex to cleave the target site and insert the DNA sequence it carries into the target site.
[0052] In this disclosure, the term "transposase complex" generally refers to a complex comprising two transposase monomers and a transposase-binding sequence bound to them. The transposase-binding sequence may include a specific sequence (such as an inverted repeat sequence) recognized and bound by the transposase, and a DNA sequence to be transposed that can be used to artificially modify the target molecule, such as a gene or fragment thereof, an adapter sequence, a tag sequence, a sequencing primer-binding sequence, etc. In this disclosure, the transposase complex may contain two double-stranded nucleic acid sequences (i.e., a first double-stranded nucleic acid sequence and a second double-stranded nucleic acid sequence, which exhibit an inverted repeat structure and are hence referred to as inverted repeat sequences), each recognized and bound by one transposase monomer molecule. In this disclosure, the transposase complex may contain one or more DNA sequences to be transposed. In some embodiments, the transposase complex may contain two DNA sequences to be transposed, which are respectively linked to the first double-stranded nucleic acid sequence and the second double-stranded nucleic acid sequence; the two DNA sequences to be transposed may be the same or different.
[0053] In this embodiment of the disclosure, "Terminal Inverted Repeat (TIR; also known as ME sequence)" refers to two double-stranded nucleic acid sequences (i.e., the first double-stranded nucleic acid sequence and the second double-stranded nucleic acid sequence) located at both ends of a transposon. These two double-stranded nucleic acid sequences exhibit an inverted repeat structure and are recognized and bound by transposase monomers to form a transposase complex and exert transposition activity. In this embodiment of the disclosure, "inverted repeat sequence" can refer to any one or both of the two inverted double-stranded nucleic acid sequences.
[0054] The first aspect of this disclosure provides a transposase or its bioactive fragment having the amino acid sequence shown in SEQ ID NO: 1 or SEQ ID NO: 2; or having, compared with the amino acid sequence shown in SEQ ID NO: 1 or SEQ ID NO: 2, one or more amino acid substitutions, deletions, and / or additions, wherein the transposase or its bioactive fragment having the mutated amino acid sequence still retains transposase function; or having an amino acid sequence with at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% identity to the amino acid sequence shown in SEQ ID NO: 1 or SEQ ID NO: 2, wherein the transposase or its bioactive fragment having such an amino acid sequence still has transposase function.
[0055] In embodiments of this disclosure, the transposase or its bioactive fragment may comprise a sequence having at least 80%, at least 85%, at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or at least 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, 99.91%, 99.92%, 99.93%, 99.94%, 99.95%, 99.96%, 99.97%, 99.98%, 99.99% but less than 100% identity with SEQ ID NO: 1 or SEQ ID NO: 2, wherein the transposase or its bioactive fragment has one or more amino acid mutations compared with SEQ ID NO: 1 or SEQ ID NO: 2 and, with such an identity, still retains transposase function.
[0056] In this disclosure, the term "percentage of identity" for nucleic acid or polypeptide sequences is defined as the percentage of nucleotide or amino acid residues in a candidate sequence that are identical to the residues of a known reference sequence, after aligning the sequences and introducing gaps (if necessary) to achieve the maximum percentage of identity. N-terminal or C-terminal insertions or deletions should not be interpreted as affecting homology. Homology or identity at the nucleotide or amino acid sequence level can be determined by BLAST (Basic Local Alignment Search Tool) analysis using the algorithms employed by the programs blastp, blastn, blastx, tblastn, and tblastx (Altschul (1997), Nucleic Acids Res. 25, 3389-3402 and Karlin (1990), Proc. Natl. Acad. Sci. USA. 87, 2264-2268), which are tailored for sequence similarity searches.
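The identity definition above can be made concrete with a short Python sketch, given here as an illustration rather than as the method used in this disclosure. It assumes the two sequences have already been aligned (for example by a BLAST-style tool) and are supplied as equal-length strings with `-` marking gaps. The function name `percent_identity`, the choice of the candidate's aligned residues as the denominator, and the example peptides are assumptions made for illustration.

```python
def percent_identity(candidate_aln, reference_aln):
    """Percent identity of a pre-aligned candidate against a reference.

    Both arguments are rows of one pairwise alignment ('-' marks a gap).
    Assumption: the denominator is the number of candidate residues that
    remain after terminal gap columns are trimmed.
    """
    if len(candidate_aln) != len(reference_aln):
        raise ValueError("aligned sequences must have equal length")

    columns = list(zip(candidate_aln, reference_aln))
    # Terminal insertions or deletions are not counted against identity.
    while columns and "-" in columns[0]:
        columns.pop(0)
    while columns and "-" in columns[-1]:
        columns.pop()

    candidate_residues = sum(1 for c, _ in columns if c != "-")
    if candidate_residues == 0:
        raise ValueError("no aligned candidate residues")

    matches = sum(1 for c, r in columns if c == r and c != "-")
    return 100.0 * matches / candidate_residues


# Hypothetical aligned peptides, not taken from SEQ ID NO: 1 or 2.
identity = percent_identity("MKT-LLV", "MKTALIV")
print(f"{identity:.1f}% identity; meets a 70% threshold: {identity >= 70.0}")
```

Other denominators (alignment length, or the shorter sequence length) are also in common use, so a reported percentage should state which convention was applied.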
[0057] In this embodiment of the disclosure, compared with the amino acid sequence shown in SEQ ID NO: 1 or SEQ ID NO: 2, the transposase or its bioactive fragment may have one or more amino acid substitutions, deletions, and / or additions, and the transposase or its bioactive fragment with the mutated amino acid sequence still retains transposase function. For example, the transposase or its bioactive fragment has at least 1-100 amino acid mutations compared with SEQ ID NO: 1 or SEQ ID NO: 2. In some embodiments, the transposase or its bioactive fragment has at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, or at least 30 amino acid mutations compared to the amino acid sequence shown in SEQ ID NO: 1 or SEQ ID NO: 2. In embodiments of this disclosure, the transposase or its bioactive fragment has at least 31-70 amino acid mutations compared to the amino acid sequence shown in SEQ ID NO: 1 or SEQ ID NO: 2.
[0058] In this disclosure, the term "bioactive fragment" refers to any fragment, derivative, homolog, or analog of a transposase and its mutants that possesses biomolecular-specific in vivo or in vitro activity, including, for example, transposase activity. In some embodiments, the bioactive fragment, derivative, homolog, or analog of the transposase possesses any degree of transposase bioactivity in any in vivo or in vitro assay of interest. In some embodiments, the bioactive fragment may comprise any number of consecutive amino acid residues of transposase 22°S or transposase 11°N.
[0059] In some embodiments, the transposase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 1. The inventors of this application have named it "transposase 22°S".
[0060] The amino acid sequence of transposase 22°S (SEQ ID NO: 1):
[0061] In some embodiments, the transposase or its bioactive fragment has an amino acid sequence as shown in SEQ ID NO: 2. The inventors of this application have named it "transposase 11°N".
[0062] The amino acid sequence of transposase 11°N (SEQ ID NO: 2):
[0063] The transposases 22°S and 11°N proposed in this disclosure have superior catalytic performance compared to existing transposases, such as significantly higher transposition activity. Therefore, they can be used as alternatives to traditional Tn5 transposases in various transposase application scenarios, such as endogenous gene editing, exogenous gene introduction, and library construction, to improve transposition reaction efficiency, gene editing efficiency, or library construction efficiency.
[0064] Embodiments of the second aspect of this disclosure provide a transposase complex comprising: a transposase or a bioactive fragment thereof as described in any embodiment of the first aspect of this disclosure; and a transposase binding sequence comprising an inverted repeat sequence and optionally a DNA sequence to be transposed, wherein the transposase or the bioactive fragment thereof binds to the inverted repeat sequence.
[0065] In some embodiments, the inverted repeat sequence is a double-stranded nucleic acid sequence, and the transposase complex can be a "transposase-inverted repeat sequence" dimer, meaning the complex contains two transposase monomers and two double-stranded nucleic acid sequences (i.e., a first double-stranded nucleic acid sequence and a second double-stranded nucleic acid sequence, both of which exhibit an inverted repeat structure, also referred to as inverted repeat sequences). Each of the two double-stranded nucleic acid sequences is recognized and bound by one transposase monomer molecule, and each of the two double-stranded nucleic acid sequences is linked to a DNA sequence to be transposed, which can be used for artificial modification of the target molecule, such as a gene or its fragment, an adapter sequence, a tag sequence, a sequencing primer binding sequence, etc. In some embodiments, the DNA sequences to be transposed linked to the two inverted repeat sequences can be the same or different.
[0066] In this embodiment of the disclosure, the structure of the inverted repeat sequence can be described using one single nucleic acid strand from each of the first and second double-stranded nucleic acid sequences as examples. For instance, based on the first double-stranded nucleic acid sequence being composed of a first nucleic acid sequence and a complementary third nucleic acid sequence, the first nucleic acid sequence is used to represent the first double-stranded nucleic acid sequence. Similarly, based on the second double-stranded nucleic acid sequence being composed of a second nucleic acid sequence and a complementary fourth nucleic acid sequence, the second nucleic acid sequence is used to represent the second double-stranded nucleic acid sequence. The first and second nucleic acid sequences can be located on the same nucleic acid strand of the transposon, and the third and fourth nucleic acid sequences can be located together on the other nucleic acid strand of the transposon. In some embodiments, based on the transposon being a double-stranded structure, it can have the following structure:
[0067] 5'-First nucleic acid sequence-Transposase coding sequence-Second nucleic acid sequence-3'
[0068] 3'-Third nucleic acid sequence-Complementary sequence of transposase coding sequence-Fourth nucleic acid sequence-5'.
[0069] It is understood that, due to the special inverted repeat structure between the first and second double-stranded nucleic acid sequences, the third nucleic acid sequence can be identical to the second nucleic acid sequence in the 5' to 3' direction; similarly, the fourth nucleic acid sequence can be identical to the first nucleic acid sequence in the 5' to 3' direction. The disclosure of the first and second nucleic acid sequences in this embodiment is therefore equivalent to the disclosure of the third and fourth nucleic acid sequences, and also equivalent to the disclosure of the inverted repeat sequences recognized by each transposase in this embodiment.
[0070] In some embodiments, when the transposase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 1, the first nucleic acid sequence may have the nucleic acid sequence shown in SEQ ID NO: 5, and the second nucleic acid sequence may have the nucleic acid sequence shown in SEQ ID NO: 6. In some embodiments, based on the fact that the third nucleic acid sequence is identical to the second nucleic acid sequence in the 5' to 3' direction, and the fourth nucleic acid sequence is identical to the first nucleic acid sequence, the nucleic acid sequence shown in SEQ ID NO: 6 may also be used to represent the third nucleic acid sequence, which is complementary to the first nucleic acid sequence described in SEQ ID NO: 5, forming a first double-stranded nucleic acid sequence that can be recognized by a transposase having the amino acid sequence shown in SEQ ID NO: 1. The second double-stranded nucleic acid sequence is similarly represented.
[0071] In other embodiments, when the transposase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 2, the first nucleic acid sequence may have the nucleic acid sequence shown in SEQ ID NO: 7, and the second nucleic acid sequence may have the nucleic acid sequence shown in SEQ ID NO: 8. In some embodiments, based on the fact that the third nucleic acid sequence is identical to the second nucleic acid sequence in the 5' to 3' direction, and the fourth nucleic acid sequence is identical to the first nucleic acid sequence, the nucleic acid sequence shown in SEQ ID NO: 8 may also be used to represent the third nucleic acid sequence, which is complementary to the first nucleic acid sequence described in SEQ ID NO: 7, forming a first double-stranded nucleic acid sequence that can be recognized by a transposase having the amino acid sequence shown in SEQ ID NO: 2. The second double-stranded nucleic acid sequence is similarly represented.
[0072] 5'-ACCGTAGGATGGCCGCCCCGGCCGTCCATGGC-3' (SEQ ID NO: 5);
[0073] 5'-GCCATGGACGGCCGGGGCGGCCATCCTACGGT-3' (SEQ ID NO: 6);
[0074] 5'-CTGGCGAGCCGGTGCCGTCAGGCCTCTTTATCCCCCACATCTTGATTTTG-3' (SEQ ID NO: 7);
[0075] 5'-CAAAATCAAGATGTGGGGGATAAAGAGGCCTGACGGCACCGGCTCGCCAG-3' (SEQ ID NO: 8).
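The inverted repeat relationship described in [0069] can be checked directly on the sequences listed above. The following Python snippet is a minimal sketch and not part of the original disclosure: the helper name `reverse_complement` and the variable names are hypothetical, and the check only tests whether SEQ ID NO: 6 and SEQ ID NO: 8 read, in the 5' to 3' direction, as the reverse complements of SEQ ID NO: 5 and SEQ ID NO: 7, respectively.

```python
# Illustrative check only; helper and variable names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Sequences copied from paragraphs [0072]-[0075].
SEQ_ID_NO_5 = "ACCGTAGGATGGCCGCCCCGGCCGTCCATGGC"
SEQ_ID_NO_6 = "GCCATGGACGGCCGGGGCGGCCATCCTACGGT"
SEQ_ID_NO_7 = "CTGGCGAGCCGGTGCCGTCAGGCCTCTTTATCCCCCACATCTTGATTTTG"
SEQ_ID_NO_8 = "CAAAATCAAGATGTGGGGGATAAAGAGGCCTGACGGCACCGGCTCGCCAG"

# Each pair can anneal into one double-stranded ME sequence only if the
# second strand is the reverse complement of the first.
pair_5_6_ok = reverse_complement(SEQ_ID_NO_5) == SEQ_ID_NO_6
pair_7_8_ok = reverse_complement(SEQ_ID_NO_7) == SEQ_ID_NO_8
print(pair_5_6_ok, pair_7_8_ok)
```

For the sequences as listed, both comparisons evaluate to `True`, which is consistent with each pair forming a fully complementary double-stranded nucleic acid sequence.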
[0076] In some embodiments, when the transposase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 1, the inverted repeat sequence it recognizes and binds to can be the nucleic acid sequence and its inverted repeat shown in SEQ ID NO: 5 and SEQ ID NO: 6, wherein the first double-stranded nucleic acid sequence is formed by complementary pairing of SEQ ID NO: 5 and SEQ ID NO: 6, and the second double-stranded nucleic acid sequence is formed by complementary pairing of SEQ ID NO: 5 and SEQ ID NO: 6, wherein the first double-stranded nucleic acid sequence and the second double-stranded nucleic acid sequence can each be recognized and bound by a molecule of the transposase or its bioactive fragment having the amino acid sequence shown in SEQ ID NO: 1.
[0077] In other embodiments, when the transposase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 2, the inverted repeat sequence it recognizes and binds to can be the nucleic acid sequence and its inverted repeat shown in SEQ ID NO: 7 and SEQ ID NO: 8, wherein the first double-stranded nucleic acid sequence is formed by complementary pairing of SEQ ID NO: 7 and SEQ ID NO: 8, and the second double-stranded nucleic acid sequence is formed by complementary pairing of SEQ ID NO: 7 and SEQ ID NO: 8, wherein the first double-stranded nucleic acid sequence and the second double-stranded nucleic acid sequence can each be recognized and bound by a molecule of the transposase or its bioactive fragment having the amino acid sequence shown in SEQ ID NO: 2.
[0078] Furthermore, this disclosure is intended to cover other nucleic acid sequences derived from the nucleic acid sequences shown in SEQ ID NO: 5-6 or SEQ ID NO: 7-8, for example, by adding, deleting, or replacing one or more nucleotides at the ends or non-ends of the nucleic acid sequences shown in SEQ ID NO: 5-6 or SEQ ID NO: 7-8, so that the inverted repeat sequence has or substantially has an inverted repeat structure and can bind to the transposase or its bioactive fragment to form the transposase complex.
[0079] An embodiment of the third aspect of this disclosure provides a polynucleotide that encodes a transposase or a biologically active fragment thereof or a complementary sequence thereof as described in any embodiment of the first aspect of this disclosure.
[0080] In some embodiments, the polynucleotide comprises a sequence as shown in SEQ ID NO: 3 or SEQ ID NO: 4 or its complementary sequence.
[0081] The nucleic acid sequence of transposase 22°S (SEQ ID NO: 3):
[0082] The nucleic acid sequence of transposase 11°N (SEQ ID NO: 4):
[0083] In this disclosure, the term "nucleic acid" refers to DNA, RNA, single-stranded, double-stranded, or more highly aggregated hybridization motifs, and any chemical modifications thereof. Modifications include, but are not limited to, those that provide chemical groups that introduce additional charge, polarity, hydrogen bonding, electrostatic interaction, or points of attachment and interaction with the nucleic acid ligand bases or the nucleic acid ligand as a whole. Such modifications include, but are not limited to, peptide nucleic acids (PNAs), phosphodiester group modifications (e.g., phosphorothioates, methylphosphonates), 2'-position sugar modifications, 5-position pyrimidine modifications, 8-position purine modifications, modifications at exocyclic amines, substitution of 4-thiouridine, substitution of 5-bromo- or 5-iodo-uracil, backbone modifications, methylation, and unusual base-pairing combinations such as the isobases isocytidine and isoguanidine. Nucleic acids may also contain non-natural bases, such as nitroindole. Modifications may also include 3' and 5' modifications, such as capping with a fluorophore (e.g., a quantum dot) or another moiety. The nucleic acid sequences proposed in the embodiments of this disclosure (such as the nucleic acid sequences shown in SEQ ID NO: 3 or 4) may contain the above-mentioned non-natural bases or related modifications, as long as these do not affect the basic properties of the corresponding nucleotides.
[0084] It should be noted that, owing to codon degeneracy, the polynucleotide sequence encoding a given amino acid sequence is not unique, and any nucleotide sequence that encodes the same amino acid sequence falls within the scope of protection of this disclosure.
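Codon degeneracy can be illustrated with a short Python sketch. It is not taken from the disclosure: the two DNA variants below are hypothetical toy sequences (not SEQ ID NO: 3 or 4), and the table is the standard genetic code written in the usual T, C, A, G base order. The sketch only shows that two different nucleotide sequences can translate to the same amino acid sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence (reading frame 1) into one-letter amino acids."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical toy sequences that differ at the nucleotide level
# but use synonymous codons for Leu, Ser, and Lys.
variant_a = "ATGCTGAGCAAA"
variant_b = "ATGTTATCTAAG"

print(translate(variant_a), translate(variant_b))
print(variant_a != variant_b and translate(variant_a) == translate(variant_b))
```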
[0085] Embodiments of the fourth aspect of this disclosure provide a transposon comprising: a polynucleotide as described in any embodiment of the third aspect of this disclosure; and a transposase-binding sequence comprising an inverted repeat sequence and optionally a DNA sequence to be transposed, the inverted repeat sequence being located flanking the polynucleotide, and the transposase or a biologically active fragment thereof being capable of binding to the inverted repeat sequence. In some embodiments, the inverted repeat sequence is attached to at least one side of the polynucleotide. In some embodiments, the inverted repeat sequence is attached to both sides of the polynucleotide, i.e., a first nucleic acid sequence and a second nucleic acid sequence are located flanking each side of the polynucleotide. In some embodiments, the DNA sequence to be transposed is located flanking the inverted repeat sequence. In other embodiments, the DNA sequence to be transposed is located between the inverted repeat sequence and the polynucleotide.
[0086] In some embodiments, the polynucleotide has the sequence shown in SEQ ID NO: 3, and the first and second nucleic acid sequences respectively have: a) nucleic acid sequences shown in SEQ ID NO: 5 and SEQ ID NO: 6; or b) nucleic acid sequences having one or more nucleotide substitutions, deletions, and / or additions compared to the nucleic acid sequences shown in SEQ ID NO: 5 and SEQ ID NO: 6, such that the inverted repeat sequence has or substantially has an inverted repeat structure and is capable of binding to the transposase or its bioactive fragment encoded by the polynucleotide to form a transposase complex; or
[0087] The polynucleotide has the sequence shown in SEQ ID NO: 4, and the first nucleic acid sequence and the second nucleic acid sequence respectively have: a) the nucleic acid sequences shown in SEQ ID NO: 7 and SEQ ID NO: 8; or b) compared with the nucleic acid sequences shown in SEQ ID NO: 7 and SEQ ID NO: 8, having one or more nucleotide substitutions, deletions and / or additions, such that the inverted repeat sequence has or substantially has an inverted repeat structure and can bind to the transposase or its bioactive fragment encoded by the polynucleotide to form a transposase complex.
[0088] In some embodiments, the transposon further comprises a target site repeat sequence located on one side of the transposase-binding sequence, preferably on one side of the inverted repeat sequence. Embodiments of the fifth aspect of this disclosure provide a vector comprising a polynucleotide as described in any embodiment of the third aspect of this disclosure or a transposon as described in any embodiment of the fourth aspect of this disclosure.
[0089] In this disclosure, "vector" refers to a polynucleotide that can replicate in a host organism independently of the host chromosome. Preferred vectors include plasmids and typically have a replication initiation site. Vectors may include, for example, transcription and translation terminators, transcription and translation initiation sequences, and promoters for regulating the expression of specific nucleic acids. This disclosure does not limit the specific elements included in the vector.
[0090] An embodiment of the sixth aspect of this disclosure provides a cell comprising a polynucleotide as described in any embodiment of the third aspect of this disclosure, a transposon as described in any embodiment of the fourth aspect of this disclosure, or a vector as described in any embodiment of the fifth aspect of this disclosure, or expressing a transposase or a bioactive fragment thereof as described in any embodiment of the first aspect of this disclosure.
[0091] The transposase or its active fragment described in this disclosure can be expressed in a variety of host cells, including *Escherichia coli*, other bacterial hosts, yeast, filamentous fungi, and various higher eukaryotic cells such as COS, CHO, and HeLa cell lines and myeloma cell lines. Techniques for gene expression in microorganisms are described, for example, in Smith, *Gene Expression in Recombinant Microorganisms* (Bioprocess Technology, Vol. 22, Marcel Dekker, 1994). Examples of bacteria suitable for expression include, but are not limited to: *Escherichia*, *Enterobacter*, *Azotobacter*, *Erwinia*, *Bacillus*, *Pseudomonas*, *Klebsiella*, *Proteus*, *Salmonella*, *Serratia*, *Shigella*, *Rhizobium*, *Vibrio*, and *Paracoccus*. Filamentous fungi suitable as expression hosts include, for example, the following genera: *Aspergillus*, *Trichoderma*, *Neurospora*, *Penicillium*, *Cephalosporium*, *Achlya*, *Podospora*, *Mucor*, *Cochliobolus*, and *Pyricularia*. See, for example, U.S. Patent No. 5,679,543 and Stahl and Tudzynski, eds., *Molecular Biology in Filamentous Fungi*, John Wiley & Sons, 1992. The synthesis of heterologous proteins in yeast is well known and described in the literature. This disclosure does not limit the cells in which transposases or their active fragments are expressed.
[0092] Numerous expression systems for producing polypeptides exist and are known to those skilled in the art (see, for example, Gene Expression Systems, eds. Fernandez and Hoeffler, Academic Press, 1999; Sambrook and Russell, ibid.; and Ausubel et al., ibid.). Typically, the polynucleotide encoding the polypeptide is placed under the control of a promoter that is functional in the desired host cell. Many different promoters are available and known to those skilled in the art and can be used in the expression vectors of this disclosure, depending on the specific application. Typically, the chosen promoter depends on the cell in which the promoter is to be active. Other expression control sequences, such as ribosome binding sites and transcription termination sites, may optionally be included. A construct containing one or more of these control sequences is called an “expression cassette.” Accordingly, the nucleic acid encoding the polypeptide is incorporated for high-level expression in the desired host cell. This disclosure does not limit the expression systems for transposases or their active fragments.
[0093] An embodiment of the seventh aspect of this disclosure provides a composition comprising a) a transposase or a bioactive fragment thereof as described in any embodiment of the first aspect of this disclosure or a transposase complex as described in any embodiment of the second aspect of this disclosure, and b) an enzyme buffer.
[0094] In some embodiments, the enzyme buffer contains one or more selected from Tris-HCl, ammonium sulfate, magnesium chloride, potassium chloride, glycerol, and PBS. In some embodiments, the enzyme buffer contains Mg²⁺. In some embodiments, the composition further comprises other components for nucleotide polymerization and extension, such as dNTPs and / or NTPs, metal ions, and optionally primers, template sequences, etc., for gap filling after transposition and subsequent amplification of transposition products. This disclosure does not limit the specific components of the composition.
[0095] An embodiment of the eighth aspect of this disclosure provides a kit comprising at least one of the following: a transposase or a bioactive fragment thereof as described in any embodiment of the first aspect of this disclosure, a transposase complex as described in any embodiment of the second aspect of this disclosure, a polynucleotide as described in any embodiment of the third aspect of this disclosure, a transposon as described in any embodiment of the fourth aspect of this disclosure, a vector as described in any embodiment of the fifth aspect of this disclosure, a cell as described in any embodiment of the sixth aspect of this disclosure, and a composition as described in any embodiment of the seventh aspect of this disclosure.
[0096] Embodiments of the ninth aspect of this disclosure present the use of transposase complexes as described in any embodiment of the second aspect of this disclosure, polynucleotides as described in any embodiment of the third aspect of this disclosure, transposons as described in any embodiment of the fourth aspect of this disclosure, vectors as described in any embodiment of the fifth aspect of this disclosure, cells as described in any embodiment of the sixth aspect of this disclosure, compositions as described in any embodiment of the seventh aspect of this disclosure, or kits as described in any embodiment of the eighth aspect of this disclosure in nucleic acid cleavage and optional nucleic acid insertion.
[0097] It is understood that, based on its specific transposition activity, the transposase or its bioactive fragment proposed in this disclosure can recognize, cleave, and optionally insert a DNA sequence to be transposed into a target DNA molecule in the form of a transposase complex, wherein the DNA sequence to be transposed is linked to an inverted repeat sequence bound by the transposase to achieve transposition. Therefore, the transposase or its bioactive fragment proposed in this disclosure can be effectively used in various scenarios where transposases are applicable. This disclosure does not limit the specific application scenarios of the transposase or its bioactive fragment and its complex.
[0098] In some specific embodiments, the application may include one or more of the following: a. endogenous gene editing; b. exogenous gene transfer; and c. library construction.
[0099] In some embodiments, the library may include: a transcriptome library, such as a second-generation transcriptome library, a full-length transcriptome library, a single-cell transcriptome library, a spatial transcriptome library, etc.; a genomic library, such as a whole-genome library, a targeted sequencing genomic library, etc.; and an epigenetic detection library, such as a DNA-protein interaction detection library or a DNA methylation detection library, etc. In some embodiments, the DNA-protein interaction detection library may be an ATAC library, a CUT&Tag library, etc. In some embodiments, the DNA methylation detection library may be a whole-genome bisulfite sequencing (WGBS) library.
[0100] An embodiment of the tenth aspect of this disclosure provides an in vitro transposition method comprising: contacting a donor DNA molecule with a target DNA molecule and a transposase or its bioactive fragment as described in any embodiment of the first aspect of this disclosure for a period of time sufficient to allow the transposase or its bioactive fragment to catalyze in vitro transposition, wherein the donor DNA molecule comprises a DNA sequence to be transposed and an inverted repeat sequence; or contacting a transposase complex as described in any embodiment of the second aspect of this disclosure with the target DNA molecule for a period of time sufficient to allow the transposase or its bioactive fragment to catalyze in vitro transposition.
[0101] It is understood that, based on its specific transposition activity, the transposase or its bioactive fragment proposed in this disclosure can recognize, cleave, and insert the DNA sequence to be transposed into the target DNA molecule in the form of a transposase complex. The DNA sequence to be transposed is linked to an inverted repeat sequence bound by the transposase to achieve transposition. This sequence can be any DNA sequence of interest, such as regulatory elements for gene editing, such as promoters, terminators, coding sequences of transcription factors, enhancers, silencers, multifunctional sequence elements, etc.; sequences for mutant construction, such as nucleic acid fragments, exons, intron sequences, etc., that induce insertions, deletions, or substitutions of interest; sequences for transgenic purposes, such as the sequence of the foreign gene to be introduced; and sequences for library construction, such as tag sequences (including molecular-derived tags, cell-derived tags, library-derived tags, etc.), adapter sequences, sequencing primer-binding sequences, etc. This disclosure does not limit the specific type of DNA sequence to be transposed.
[0102] In some embodiments, the features of the inverted repeat sequence proposed in the second aspect of this disclosure are also applicable to the in vitro transposition method of the tenth aspect of this disclosure, and will not be repeated here.
[0103] An embodiment of the eleventh aspect of this disclosure provides a library construction method, comprising: incubating a transposase complex with an analyte containing a target DNA molecule and performing a transposition reaction to obtain a transposition product; and obtaining the library based on the transposition product, wherein the transposase complex comprises a transposase or a bioactive fragment thereof as described in any embodiment of the first aspect of this disclosure, an inverted repeat sequence, and a DNA sequence to be transposed.
[0104] In this embodiment of the disclosure, the DNA sequence to be transposed can be any DNA sequence that can be used in library construction, such as tag sequences (including molecular-derived tags, cell-derived tags, library-derived tags, etc.), adapter sequences, sequencing primer binding sequences, etc. This disclosure does not limit the specific type of DNA sequence to be transposed.
[0105] In this embodiment, the analyte containing the target DNA molecule can be determined based on the specific library type. For example, if the constructed library is a transcriptome library, the analyte can be single-stranded cDNA, double-stranded cDNA, or cDNA-RNA hybrid strands; if the constructed library is a genomic library, the analyte can be whole-genome DNA, targeted enriched whole-genome DNA, mitochondrial genomic DNA, chloroplast genomic DNA, etc.; if the constructed library is an epigenetic detection library, the analyte can be fixed and cross-linked chromatin, chromatin co-incubated with antibody, genomic DNA, etc. This disclosure does not limit the specific type of analyte.
[0106] In some embodiments, the library can be used for transcriptome sequencing, genome sequencing, or epigenetic sequencing, wherein the epigenetic sequencing may include whole-genome bisulfite sequencing (WGBS), ATAC-seq, or CUT&Tag. In some embodiments, the sequencing may include second-generation sequencing or third-generation sequencing. This disclosure does not limit the specific sequencing method.
[0107] In some embodiments, obtaining the library based on the transposition product includes: performing PCR amplification on the transposition product to obtain an amplification product, wherein the amplification product is the library.
[0108] In some embodiments, the library construction method further includes one or more of the following: nucleic acid extraction, reverse transcription, template switching, targeted enrichment, gap filling, adapter ligation, product circularization (preparation of DNA nanoballs), digestion, and optional product purification after each step. This disclosure does not limit the individual steps of the library construction method.
[0109] In some embodiments, the features of the inverted repeat sequences proposed in the second aspect of this disclosure are also applicable to the library construction method of the eleventh aspect of this disclosure, and will not be repeated here.
[0110] In summary, the embodiments of this disclosure propose novel transposases 22°S and 11°N. Compared with existing transposases, the novel transposases provided by this disclosure have excellent catalytic performance, such as significantly higher transposition activity. Therefore, they can be used as alternatives to traditional Tn5 transposases in various transposase application scenarios, such as endogenous gene editing, exogenous gene introduction, and library construction, to improve transposition reaction efficiency, gene editing efficiency, or library construction efficiency.
[0111] It should be noted that the foregoing explanations of the transposases or their active fragments proposed in the embodiments of the first aspect of this disclosure also apply to the transposase complexes described in any embodiment of the second aspect of this disclosure, the polynucleotides described in any embodiment of the third aspect of this disclosure, the transposons described in any embodiment of the fourth aspect of this disclosure, the vectors described in any embodiment of the fifth aspect of this disclosure, the cells described in any embodiment of the sixth aspect of this disclosure, the compositions described in any embodiment of the seventh aspect of this disclosure, or the kits described in any embodiment of the eighth aspect of this disclosure, the applications described in any embodiment of the ninth aspect of this disclosure, the in vitro transposition methods described in any embodiment of the tenth aspect of this disclosure, and the library construction methods described in any embodiment of the eleventh aspect of this disclosure, which will not be repeated here.
[0112] Unless otherwise specified, the experimental methods used in the following examples are conventional methods, performed according to the techniques or conditions described in the literature in this field or according to the product instructions. Unless otherwise specified, the materials and reagents used in the following examples are commercially available.
[0113] Unless otherwise specified, the quantitative experiments in the following examples are all repeated three times, and the results are averaged.
[0114] Example 1: Identification of transposases 22°S and 11°N
[0115] In this embodiment, novel transposases 22°S and 11°N were identified by metagenomic sequencing and subsequent analysis of samples from marine hydrothermal vent and deep-sea abyssal habitats. The 22°S transposase originated from a hydrothermal vent habitat sample at 21.99°S, 176.57°W and a depth of 1876 meters; the 11°N transposase originated from a deep-sea sample at 11.34°N, 142.22°E and a depth of 8168 meters. The specific steps are as follows:
[0116] 1.1 Metagenomic DNA was extracted from the samples using the MGIEasy Microbial DNA Extraction Kit (catalog number: 1000027955), and metagenomic sequence data were obtained through library construction and sequencing;
[0117] 1.2 The obtained metagenomic sequence data were assembled and annotated for species and function. The sequence of wild-type Tn5 transposase of the IS4 transposase family (SEQ ID NO: 9) was then used as the seed sequence for data mining;
[0118] 1.3 The 22°S (SEQ ID NO: 1) and 11°N (SEQ ID NO: 2) sequences obtained by data mining were aligned with wild-type Tn5 transposase (SEQ ID NO: 9) using the Clustal Omega online sequence alignment tool. The alignment results are shown in Figure 1.
[0119] Wild-type Tn5 transposase sequence (SEQ ID NO: 9):
[0120] As shown in Figure 1, the sequence identities of 22°S and 11°N with wild-type Tn5 transposase (WT-Tn5), a transposase of the same type, are 43.5% and 44.1%, respectively, suggesting that the transposases 22°S and 11°N obtained in this embodiment are novel transposases.
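A percent identity value of this kind is computed from a pairwise alignment. The Python sketch below illustrates one common definition and is not the procedure of this disclosure: it assumes identity is the number of identical residues divided by the number of alignment columns in which both sequences have a residue, which may differ from the definition applied by the alignment tool. The function name and the two gapped toy sequences are hypothetical and are not SEQ ID NO: 1, 2, or 9.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: this denominator (gap-free columns) is one of several
    conventions in use; other tools may divide by the alignment length
    or by the length of the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment: 6 gap-free columns, 5 of them identical.
toy_a = "MKT-LLVA"
toy_b = "MRTALL-A"
print(round(percent_identity(toy_a, toy_b), 1))
```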
[0121] Example 2: Structural models of transposases 22°S and 11°N
[0122] The structures of the two novel transposases identified in Example 1 were predicted using AlphaFold3. The predicted structures were then aligned with the structure of WT-Tn5 using the TM-align online structure alignment server (https://zhanggroup.org/TM-align/). The superimposed structures are shown in Figure 2. In A, the red portion represents the 22°S transposase dimer, and the yellow portion represents the WT-Tn5 dimer; in B, the red portion represents the 22°S transposase monomer, and the yellow portion represents the WT-Tn5 monomer; in C, the blue portion represents the 11°N transposase dimer, and the yellow portion represents the WT-Tn5 dimer; in D, the blue portion represents the 11°N transposase monomer, and the yellow portion represents the WT-Tn5 monomer. The TM-scores of 22°S and 11°N against WT-Tn5 were 0.86 and 0.88, respectively.
[0123] As can be seen from Figure 2 and the TM-scores, although each of the novel transposases shares less than 50% sequence identity with the existing transposase of the same type, their three-dimensional structures are highly similar to that of WT-Tn5, suggesting that the two transposases identified in this embodiment, 22°S and 11°N, are very likely to have similar transposase activity.
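For readers unfamiliar with the TM-score, the following Python sketch shows how the score is formed from per-residue distances. It is an illustration under stated assumptions, not a reimplementation of TM-align: it evaluates the score for one given superposition (TM-align additionally searches for the superposition that maximizes it), the function names are hypothetical, and very short chains are rejected rather than handled. The score ranges from 0 to 1, and values above roughly 0.5 are generally taken to indicate the same overall fold.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) used by the TM-score."""
    if target_length <= 21:
        # Simplification: the formula gives very small or negative values
        # for short chains, which this sketch does not handle.
        raise ValueError("this sketch only handles chains longer than 21 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: distance (angstroms) between each aligned residue pair.
    target_length: length of the reference chain used for normalization.
    Unaligned residues contribute nothing to the sum.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical example: 100-residue reference, 90 aligned pairs at 1.5 angstroms.
example = tm_score([1.5] * 90, target_length=100)
print(round(example, 3))
```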
[0124] Example 3: Binding of transposases 22°S and 11°N to inverted repeat sequences (ME sequences)
[0125] The binding of the two transposases identified in Example 1 to their respective ME sequences was predicted using the AlphaFold3 protein structure model. The ME sequence of 22°S (SEQ ID NO: 1) is the nucleic acid sequence and its inverted repeat shown in SEQ ID NO: 5 and SEQ ID NO: 6 (Figure 3, left), and the ME sequence of 11°N (SEQ ID NO: 2) is the nucleic acid sequence and its inverted repeat shown in SEQ ID NO: 7 and SEQ ID NO: 8 (Figure 3, right). The prediction results are shown in Figure 3.
[0126] As shown in Figure 3, both 22°S (red) and 11°N (blue) are predicted to bind to their respective ME sequences, indicating that the ME sequences proposed in this embodiment can be recognized and bound by 22°S and 11°N, respectively, thereby forming a transposase complex and exerting transposition activity.
[0127] Example 4: Determination of the activity of transposases 22°S and 11°N
[0128] In this embodiment, the encoding gene sequences (SEQ ID NO: 3 and SEQ ID NO: 4) and their respective ME sequences of transposases 22°S and 11°N were synthesized, expressed, and purified to prepare transposases 22°S and 11°N (SEQ ID NO: 1 and SEQ ID NO: 2). The transposition activities of 22°S and 11°N were then measured, with WT-Tn5 used as a positive control. The specific steps are as follows.
[0129] 4.1 Preparation of transposases 22°S and 11°N
[0130] A 22°S transposon sequence was synthesized: ME sequence-transposase gene sequence-ME sequence. This transposon sequence was then introduced into the pET-28a expression vector (Changzhou Xinyi Biotechnology Co., Ltd.), with cloning sites at Nco I and Xho I, yielding the pET-28a-22°S vector. The pET-28a-11°N and pET-28a-WT-Tn5 vectors were constructed using the same method, with the ME sequence of WT-Tn5 being CTGTCTCTTATACACATCT (SEQ ID NO: 12).
[0131] The recombinant vector was transformed into *E. coli* BL21(DE3) competent cells. The transformed cells were seeded onto plates and incubated overnight at 37°C. The next day, 3-4 single colonies were picked from the plates and inoculated into 2 ml of LB broth, incubated at 37°C for 5-7 h until the OD600 reached 0.8-1.0. Then, IPTG was added to the bacterial culture to a final concentration of 0.5 mM, and the culture was incubated at 16°C with shaking at 220 rpm for 16 h to induce expression.
[0132] The induced bacterial culture was centrifuged at 4,000 rpm for 5 min, the supernatant was discarded, and the pellet was resuspended in 100 μL of 1×PBS and stored temporarily at 4℃.
[0133] 4.2 Detection of the activity of transposases 22°S and 11°N
[0134] The transposase activity of 22°S and 11°N was detected by PCR amplification, as shown in Figure 4. Specifically, the region containing the transposon sequence was amplified using the vector primers T7 and T7-ter, which flank the transposon sequence, and the product length was determined. If no transposition reaction occurred, the product between the T7 and T7-ter binding sites was approximately 1800 bp long. If a transposition reaction occurred, i.e., the transposon was excised from the vector, the product between the T7 and T7-ter binding sites was approximately 220 bp long. The occurrence of a transposition reaction was thus determined from the product length, confirming transposase activity. The specific detection steps are as follows.
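The readout logic of this PCR assay can be summarized in a few lines of Python. This is a sketch for illustration only: the two expected lengths are the approximate values stated above, while the function name and the 15% tolerance window are assumptions introduced here and are not specified in this disclosure.

```python
# Approximate expected product lengths (bp) between the T7 and T7-ter sites.
EXPECTED_BP = {
    "no transposition (transposon retained)": 1800,
    "transposition (transposon excised)": 220,
}


def classify_amplicon(length_bp: float, tolerance: float = 0.15) -> str:
    """Assign an observed band to an expected product.

    tolerance is a hypothetical relative window (15% by default) that
    allows for the limited resolution of gel-based size estimates.
    """
    for label, expected in EXPECTED_BP.items():
        if abs(length_bp - expected) <= tolerance * expected:
            return label
    return "unexpected product"


for band in (1790, 225, 900):
    print(band, "->", classify_amplicon(band))
```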
[0135] First, in vivo transposition reactions were induced for each transposon: 8 μL of bacterial suspension resuspended in 1×PBS was added to 2 μL of 5×TAG Buffer (containing 50 mM TAPS-NaOH (pH 8.5 @ 25℃), 25 mM MgCl2, and 50% DMF), and incubated at 25℃ for 1 h, followed by incubation at 47℃ for 2 h to induce the transposition reaction.
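As a worked check of the dilution in this step, the Python sketch below computes the nominal final concentrations when 2 μL of 5×TAG Buffer is mixed with 8 μL of bacterial suspension. It assumes the volumes are simply additive and ignores the contribution of the PBS in the suspension; the function and variable names are hypothetical.

```python
# Composition of the 5x TAG Buffer as stated in the text.
STOCK_5X = {
    "TAPS-NaOH (mM)": 50.0,
    "MgCl2 (mM)": 25.0,
    "DMF (%)": 50.0,
}


def final_concentrations(stock: dict, stock_ul: float, sample_ul: float) -> dict:
    """Nominal concentrations after mixing stock buffer with sample.

    Assumption: volumes are additive and the sample contributes none of
    the listed components.
    """
    total_ul = stock_ul + sample_ul
    return {name: value * stock_ul / total_ul for name, value in stock.items()}


final = final_concentrations(STOCK_5X, stock_ul=2.0, sample_ul=8.0)
for name, value in final.items():
    print(f"{name}: {value:g}")
```

Under these assumptions the reaction contains the buffer at 1×, i.e., 10 mM TAPS-NaOH, 5 mM MgCl2, and 10% DMF.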
[0136] Subsequently, 2 μL of the incubated transposition reaction solution was used as the template for PCR with 2× KAPA HiFi™ HotStart ReadyMix (KM2602, Kapa Biosystems). The reaction system and program are shown in Tables 1 and 2, respectively. The amplification results are shown in Figure 5.
[0137] Table 1
[0138] T7 primer: TAATACGACTCACTATAGG (SEQ ID NO: 10)
[0139] T7-ter primer: GCTAGTATTGCTCAGCGG (SEQ ID NO: 11)
[0140] Table 2
[0141] As shown in Figure 5, compared with WT-Tn5, the bands of the 22°S transposase and the 11°N transposase are significantly brighter at the 220 bp position that indicates the transposition reaction, showing that the 22°S transposase and the 11°N transposase proposed in this embodiment have significantly higher transposition activity than the traditional Tn5 transposase.
[0142] Furthermore, the band at 220 bp in the above results was recovered and subjected to Sanger sequencing. The sequencing results are shown in Figures 6 and 7.
[0143] Figure 6 shows the transposition results of the 22°S transposase, and Figure 7 shows the transposition results of the 11°N transposase. As can be seen from the sequencing alignments in Figures 6 and 7, all four monoclonal strains examined showed deletion of the transposon sequence relative to the theoretical reference sequence, and the 220 bp band aligned to the segments immediately before and after the transposon sequence in the reference sequence, indicating that the 220 bp band is indeed the transposition reaction product. This supports the conclusion that the 22°S transposase and the 11°N transposase have significantly higher transposase activity.
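The logic of the sequencing check, namely that the read matches the reference with one contiguous internal segment missing, can be sketched as below. This is a simplified illustration under stated assumptions: the function `find_deletion` is hypothetical, the sequences are invented toy strings rather than the sequences of this disclosure, and exact matching is used, whereas real Sanger reads would normally be analyzed with an alignment tool that tolerates sequencing errors.

```python
from typing import Optional, Tuple


def find_deletion(reference: str, read: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) such that read == reference[:start] + reference[end:].

    Returns None if the read cannot be explained by deleting a single
    contiguous segment from the reference.
    """
    if len(read) >= len(reference):
        return None

    # Length of the longest common prefix of read and reference.
    prefix = 0
    while prefix < len(read) and reference[prefix] == read[prefix]:
        prefix += 1

    # The remainder of the read must match the end of the reference exactly.
    suffix_len = len(read) - prefix
    end = len(reference) - suffix_len
    if reference[end:] == read[prefix:]:
        return prefix, end
    return None


# Toy sequences (invented for illustration only).
left_flank = "ACGTACGT"
transposon = "TTTTGGGGCCCC"
right_flank = "GATTACA"
reference = left_flank + transposon + right_flank
excised_read = left_flank + right_flank

if __name__ == "__main__":
    span = find_deletion(reference, excised_read)
    print("deleted interval:", span)
    if span is not None:
        print("deleted segment:", reference[span[0]:span[1]])
```

When the bases at the deletion boundary are repeated, the reported interval may be shifted by a few positions relative to the true breakpoints while still describing an equivalent deletion of the same length.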
[0144] Example 5: Determination of the transposition activity of the 22°S and 11°N transposases
[0145] In this embodiment, the transposition activities of the 22°S and 11°N transposases were evaluated using a dual-plasmid system and the papillation assay, as shown in Figure 8. First, the gene sequences of the transposases were introduced into the pBAD/His A plasmid to construct the pBAD/His A-22°S and pBAD/His A-11°N recombinant vectors. In parallel, the ME sequences corresponding to each transposase were placed flanking the lacZ gene, with the lacZ gene containing a deletion of amino acids 1-8 (i.e., Δ1-8 lacZ), and ME-Δ1-8lacZ-ME was then constructed into the pACYC184 plasmid. Subsequently, each transposase recombinant vector and the recombinant vector pACYC184-ME-Δ1-8lacZ-ME containing its corresponding ME sequence were co-transformed into TOP10 competent E. coli at a 1:1 ratio. Transposase activity was then confirmed by lacZ-based blue-white screening: if the 22°S or 11°N transposase was active, it would excise the lacZ gene and insert it randomly into a gene X in the E. coli genome. If the reading frame was correct, the E. coli would be converted from a lac(-) strain to a lac(+) strain expressing the lacZ protein as a fusion protein, and would then form blue colonies on LB solid medium containing X-gal (X-galactoside).
[0146] In the blue-white screening, the TOP10 cells transformed with the two recombinant vectors were cultured on LB solid medium containing X-gal at 37°C for 115 h. The LB solid medium containing X-gal contained: 40 μg/mL X-gal, 0.1% arabinose, 0.05% lactose, 25 μg/mL chloramphenicol, and 100 μg/mL ampicillin.
[0147] The results of the blue-white screening are shown in Figure 9. As can be seen from Figure 9, E. coli transformed only with the recombinant vector pACYC184-ME-Δ1-8lacZ-ME, without the transposase gene sequence, did not show blue spots (see panels a and c of Figure 9: panel a shows the result of transforming only pACYC184-ME-Δ1-8lacZ-ME carrying the ME sequence corresponding to 22°S, and panel c shows the result of transforming only pACYC184-ME-Δ1-8lacZ-ME carrying the ME sequence corresponding to 11°N). In E. coli transformed with both the transposase gene sequence and its corresponding ME sequence, distinct blue spots were observed (shown as black spots in the figure; see panels b and d of Figure 9: panel b shows the result of transforming pBAD/His A-22°S together with pACYC184-ME-Δ1-8lacZ-ME carrying its corresponding ME sequence, and panel d shows the result of transforming pBAD/His A-11°N together with pACYC184-ME-Δ1-8lacZ-ME carrying its corresponding ME sequence). These results confirm once again that the 22°S and 11°N transposases proposed in this embodiment have high transposition activity and are novel transposases with important application value and modification prospects.
[0148] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
[0149] Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.
Claims
1. A transposase or a bioactive fragment thereof, wherein the transposase or the bioactive fragment thereof: a. Has an amino acid sequence as shown in SEQ ID NO: 1 or SEQ ID NO: 2; b. Compared with the amino acid sequence shown in SEQ ID NO: 1 or SEQ ID NO: 2, it has an amino acid sequence with one or more amino acid substitutions, deletions, and / or additions, and the transposase or its bioactive fragment has transposase function; or c. An amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, or at least 99.9% identity with the amino acid sequence shown in SEQ ID NO: 1 or SEQ ID NO: 2, and the transposase or its bioactive fragment having transposase function.
2. The transposase or its bioactive fragment according to claim 1, wherein the transposase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 1.
3. The transposase or its bioactive fragment according to claim 1, wherein the transposase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 2.
4. A transposase complex comprising: The transposase or its bioactive fragment as described in any one of claims 1 to 3; and A transposase-binding sequence comprising an inverted repeat sequence and an optional DNA sequence to be transposed, wherein the transposase or a biologically active fragment thereof binds to the inverted repeat sequence; Preferably, the DNA sequence to be transposed is a gene or a fragment thereof, an adapter sequence, a tag sequence, and / or a sequencing primer binding sequence.
5. The transposase complex according to claim 4, wherein the inverted repeat sequence comprises a first nucleic acid sequence and a second nucleic acid sequence, wherein the first nucleic acid sequence and the second nucleic acid sequence present an inverted repeat structure, and the transposase or its bioactive fragment recognizes and binds to the first nucleic acid sequence and the second nucleic acid sequence.
6. The transposase complex according to claim 5, wherein the transposase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 1, and the first nucleic acid sequence and the second nucleic acid sequence respectively have: a. Nucleic acid sequences as shown in SEQ ID NO: 5 and SEQ ID NO: 6; or b. A nucleic acid sequence having one or more nucleotide substitutions, deletions and / or additions compared to the nucleic acid sequences shown in SEQ ID NO: 5 and SEQ ID NO: 6, such that the inverted repeat sequence has or substantially has an inverted repeat structure and is capable of binding to the transposase or its bioactive fragment to form the transposase complex.
7. The transposase complex according to claim 5, wherein the transposase or its bioactive fragment has the amino acid sequence shown in SEQ ID NO: 2, and the first nucleic acid sequence and the second nucleic acid sequence respectively have: a. Nucleic acid sequences as shown in SEQ ID NO: 7 and SEQ ID NO: 8; or b. A nucleic acid sequence having one or more nucleotide substitutions, deletions and / or additions compared to the nucleic acid sequences shown in SEQ ID NO: 7 and SEQ ID NO: 8, such that the inverted repeat sequence has or substantially has an inverted repeat structure and is capable of binding to the transposase or its bioactive fragment to form the transposase complex.
8. A polynucleotide encoding the transposase or bioactive fragment thereof as described in any one of claims 1 to 3, or a complementary sequence thereof; optionally, the polynucleotide comprises a sequence as shown in SEQ ID NO: 3 or SEQ ID NO: 4 or a complementary sequence thereof.
9. A transposon, the transposon comprising: The polynucleotide as described in claim 8; and A transposase-binding sequence comprising an inverted repeat sequence and an optional DNA sequence to be transposed, the inverted repeat sequence being located flanking the polynucleotide, the transposase or a biologically active fragment thereof being capable of binding to the inverted repeat sequence. Preferably, the DNA sequence to be transposed is a gene or a fragment thereof, an adapter sequence, a tag sequence, and/or a sequencing primer binding sequence.
10. The transposon of claim 9, wherein the inverted repeat sequence comprises a first nucleic acid sequence and a second nucleic acid sequence, wherein the first nucleic acid sequence and the second nucleic acid sequence form an inverted repeat structure, and the first nucleic acid sequence and the second nucleic acid sequence are respectively located on each side of the polynucleotide.
11. The transposon according to claim 10, wherein: the polynucleotide has the sequence shown in SEQ ID NO: 3, and the first nucleic acid sequence and the second nucleic acid sequence respectively have: a) the nucleic acid sequences shown in SEQ ID NO: 5 and SEQ ID NO: 6; or b) nucleic acid sequences with one or more nucleotide substitutions, deletions, and/or additions compared to the nucleic acid sequences shown in SEQ ID NO: 5 and SEQ ID NO: 6, such that the inverted repeat sequence has or substantially has an inverted repeat structure and is capable of binding to the transposase or its bioactive fragment encoded by the polynucleotide to form a transposase complex; or the polynucleotide has the sequence shown in SEQ ID NO: 4, and the first nucleic acid sequence and the second nucleic acid sequence respectively have: a) the nucleic acid sequences shown in SEQ ID NO: 7 and SEQ ID NO: 8; or b) nucleic acid sequences with one or more nucleotide substitutions, deletions, and/or additions compared to the nucleic acid sequences shown in SEQ ID NO: 7 and SEQ ID NO: 8, such that the inverted repeat sequence has or substantially has an inverted repeat structure and is capable of binding to the transposase or its bioactive fragment encoded by the polynucleotide to form a transposase complex.
12. The transposon according to any one of claims 9 to 11, wherein the transposon further comprises a target site repeat sequence located to one side of the transposase-binding sequence, preferably the inverted repeat sequence.
13. A vector comprising the polynucleotide of claim 8 or the transposon of any one of claims 9 to 12.
14. A cell comprising the polynucleotide of claim 8, the transposon of any one of claims 9 to 12, or the vector of claim 13, or expressing the transposase or a bioactive fragment thereof of any one of claims 1 to 3.
15. A composition comprising: The transposase or its bioactive fragment as described in any one of claims 1 to 3, or the transposase complex as described in any one of claims 4 to 7; and Enzyme buffer, wherein the enzyme buffer optionally contains Mg²⁺; Optionally, the enzyme buffer contains one or more selected from Tris-hydrochloric acid, ammonium sulfate, magnesium chloride, potassium chloride, glycerol, and PBS.
16. A kit comprising at least one of the following: a transposase or a bioactive fragment thereof as claimed in any one of claims 1 to 3, a transposase complex as claimed in any one of claims 4 to 7, a polynucleotide as claimed in claim 8, a transposon as claimed in any one of claims 9 to 12, a vector as claimed in claim 13, a cell as claimed in claim 14, and a composition as claimed in claim 15.
17. The use of the transposase or its bioactive fragment as described in any one of claims 1 to 3, the transposase complex as described in any one of claims 4 to 7, the polynucleotide as described in claim 8, the transposon as described in any one of claims 9 to 12, the vector as described in claim 13, the cell as described in claim 14, the composition as described in claim 15, or the kit as described in claim 16 in nucleic acid cleavage and optionally nucleic acid insertion.
18. The use of claim 17, wherein the use includes one or more of the following: a. Endogenous gene editing; b. Introduction of exogenous genes; and c. Library construction; Optionally, the library includes a transcriptome library, a genome library, or an epigenetic detection library, wherein the epigenetic detection library may optionally be a DNA-protein interaction detection library or a DNA methylation detection library.
19. An in vitro transposition method, comprising: Contacting a donor DNA molecule with a target DNA molecule and a transposase or its bioactive fragment as described in any one of claims 1 to 3 for a period of time sufficient to allow the transposase or its bioactive fragment to catalyze in vitro transposition, wherein the donor DNA molecule comprises the DNA sequence to be transposed and an inverted repeat sequence; or Contacting the transposase complex as described in any one of claims 4 to 7 with the target DNA molecule for a period of time sufficient to allow the transposase or its bioactive fragment to catalyze in vitro transposition. Preferably, the DNA sequence to be transposed is a gene or a fragment thereof, an adapter sequence, a tag sequence, and/or a sequencing primer binding sequence.
20. The method of claim 19, wherein the inverted repeat sequence comprises a first nucleic acid sequence and a second nucleic acid sequence, wherein the first nucleic acid sequence and the second nucleic acid sequence form an inverted repeat structure, and the first nucleic acid sequence and the second nucleic acid sequence are respectively linked to the DNA sequence to be transposed, or are jointly linked to the DNA sequence to be transposed. Optionally, based on the fact that the transposase or its bioactive fragment has an amino acid sequence as shown in SEQ ID NO: 1, the first nucleic acid sequence and the second nucleic acid sequence have nucleic acid sequences as shown in SEQ ID NO: 5 and SEQ ID NO: 6, respectively; Optionally, based on the fact that the transposase or its bioactive fragment has an amino acid sequence as shown in SEQ ID NO: 2, the first nucleic acid sequence and the second nucleic acid sequence have nucleic acid sequences as shown in SEQ ID NO: 7 and SEQ ID NO: 8, respectively.
21. A library construction method, comprising: The transposase complex is incubated with the analyte containing the target DNA molecule and a transposition reaction is carried out to obtain the transposition product; and Based on the transposition product, the library is obtained. The transposase complex comprises the transposase or its bioactive fragment as described in any one of claims 1-3, an inverted repeat sequence, and the DNA sequence to be transposed. The DNA sequence to be transposed is an adapter sequence, a tag sequence, and / or a sequencing primer binding sequence. Optionally, the inverted repeat sequence comprises a first nucleic acid sequence and a second nucleic acid sequence, wherein the first nucleic acid sequence and the second nucleic acid sequence form an inverted repeat structure, and the first nucleic acid sequence and the second nucleic acid sequence are respectively linked to the DNA sequence to be transposed, or are jointly linked to the DNA sequence to be transposed. Optionally, based on the fact that the transposase or its bioactive fragment has an amino acid sequence as shown in SEQ ID NO: 1, the first nucleic acid sequence and the second nucleic acid sequence have nucleic acid sequences as shown in SEQ ID NO: 5 and SEQ ID NO: 6, respectively; Optionally, based on the fact that the transposase or its bioactive fragment has an amino acid sequence as shown in SEQ ID NO: 2, the first nucleic acid sequence and the second nucleic acid sequence have nucleic acid sequences as shown in SEQ ID NO: 7 and SEQ ID NO: 8, respectively.
22. The method of claim 21, wherein obtaining the library based on the transposition product comprises: subjecting the transposition product to PCR amplification to obtain an amplification product, the amplification product being the library.