A modified RNA helicase NS3h and uses thereof
By modifying the NS3h domain of hepatitis C virus RNA helicase and introducing covalent linkages, the movement of polynucleotides through nanopores is controlled. This addresses the problem of excessively fast polynucleotide translocation in nanopore sequencing and improves sequencing accuracy and sensitivity.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- BEIJING POLYSEQ BIOTECH CO LTD
- Filing Date
- 2024-12-16
- Publication Date
- 2026-06-23
Figure CN122256296A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of biotechnology, and in particular to a modified RNA helicase NS3h and its uses.
Background Technology
[0002] Currently, single-molecule sequencing technologies fall into two main categories: optical zero-mode waveguide sequencing, represented by Pacific Biosciences (PacBio) in the United States, and electrical nanopore sequencing, represented by Oxford Nanopore Technologies (ONT) in the United Kingdom. Nanopore sequencing is a nucleic acid sequencing technology developed in recent years. By pore type, nanopores can be divided into solid-state pores and biological nanopores; biological nanopores are pore proteins through which substrates can pass. Hereinafter, nanopore sequencing refers specifically to biological nanopore sequencing.
[0003] Under an electric field, charged nucleic acid substrates can pass through biological nanopores. As a nucleic acid passes through a nanopore, it impedes the ionic current flowing through the pore and generates characteristic current signals; analyzing these signals yields the base sequence of the nucleic acid. Compared with other sequencing methods, nanopore sequencing offers low equipment cost, simple sample preparation and fast sequencing, and it has begun to be applied in various fields. Specific advantages include: simple library construction without amplification; fast signal readout, typically 200-300 bp/s; long read lengths, typically thousands of bases; direct detection of DNA modifications; and direct RNA sequencing. Because RNA no longer needs to be reverse transcribed into DNA for sequence analysis, modification information on the RNA is preserved. Owing to these advantages, nanopore sequencing has attracted widespread attention in recent years.
[0004] One challenge of nanopore sequencing is that DNA / RNA molecules often pass through nanopores too quickly, exceeding the instrument's resolution and making it difficult to obtain accurate electrical signals that reflect sequence information. Controlling or slowing the translocation of DNA / RNA molecules through nanopores is therefore crucial for improving the accuracy of nanopore sequencing. In the existing art, helicases can be used to control the movement of DNA / RNA molecules through nanopores and increase their residence time. Although various helicases have been disclosed for nanopore sequencing, each has its own advantages, disadvantages and applicable conditions, and they still fall short of the increasingly stringent requirements that scientific research and medical technology place on nucleic acid sequencing. There therefore remains a need for novel helicases suitable for nanopore sequencing that improve its applicability, accuracy and sensitivity.
Summary of the Invention
[0005] To address the technical problem above, namely the excessively rapid migration of polynucleotides (RNA) through nanopores in the existing art, this invention provides an RNA helicase NS3h from hepatitis C virus, abbreviated HCV NS3h, which is located at positions 189-625, in the C-terminal domain, of the viral NS3 protein (shown in SEQ ID No. 48). HCV NS3h is useful for controlling the movement of polynucleotides (RNA) through nanopores during strand sequencing.
[0006] According to its crystal structure (PDB ID: 3KQL), HCV NS3h consists mainly of three domains: domain 1, domain 2 and domain 3. In this invention, the opening between the domains is narrowed by modification and linkage, with the focus on domains 2 and 3; covalent linkages are preferably used to narrow or close this opening.
[0007] Amino acids can be linked in two ways. The first relies on the amino acids themselves: new amino acids, such as cysteine or non-natural amino acids, are introduced by substitution or insertion. The second uses linker molecules, which typically link cysteine residues; examples include BMOE, BMH, Bis(PEG)2 and Bis(PEG)3.
[0008] This invention provides a modified RNA helicase NS3h, wherein the modified RNA helicase NS3h is any of the following proteins:
[0009] (a1) a protein obtained by substituting and/or adding one or more amino acid residues in the amino acid sequence shown in SEQ ID No. 48 and having the same function;
[0010] (a2) a protein whose amino acid sequence is more than 80% identical to the amino acid sequence defined in (a1) and which has the same function;
[0011] (a3) a protein obtained by truncating 188 amino acid residues from the N-terminus and 7 amino acid residues from the C-terminus of the protein defined in (a1) or (a2);
[0012] (a4) a fusion protein obtained by attaching a tag to a terminus of any of the proteins defined in (a1) to (a3).
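Because (a3) removes 188 N-terminal residues, a residue numbered p against SEQ ID No. 48 would sit at position p - 188 of the truncated protein, assuming nothing precedes residue 189 in the construct. The sketch below makes that bookkeeping explicit. The function names are illustrative, and the 632-residue full length used in the demo is a hypothetical value (the length implied if residues 189-625 remain after trimming 7 C-terminal residues), not a figure stated in the document.

```python
# Minimal sketch (not from the source): converts residue numbers quoted against
# SEQ ID No. 48 into positions within the (a3) truncated protein.
# ASSUMPTION: the (a3) protein starts exactly at residue 189 of SEQ ID No. 48,
# with no extra N-terminal residues (such as an added Met) before it.

N_TRIM = 188  # residues removed from the N-terminus in (a3)
C_TRIM = 7    # residues removed from the C-terminus in (a3)


def truncate(full_seq: str, n_trim: int = N_TRIM, c_trim: int = C_TRIM) -> str:
    """Return the (a3)-style truncation of a full-length sequence."""
    if n_trim + c_trim >= len(full_seq):
        raise ValueError("sequence is too short for the requested truncation")
    return full_seq[n_trim:len(full_seq) - c_trim]


def to_construct_position(full_pos: int, full_len: int,
                          n_trim: int = N_TRIM, c_trim: int = C_TRIM) -> int:
    """Map a 1-based SEQ ID No. 48 position to a 1-based (a3) position."""
    if not n_trim < full_pos <= full_len - c_trim:
        raise ValueError(f"position {full_pos} falls in a truncated region")
    return full_pos - n_trim


# Demo with a HYPOTHETICAL full length of 632 residues.
FULL_LEN = 632
for pos in (394, 553):
    print(pos, "->", to_construct_position(pos, FULL_LEN))  # 394 -> 206, 553 -> 365
```

Keeping the offset in one function avoids mixing the two numbering schemes when reading the substitution lists that follow.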
[0013] Optionally, in the modified RNA helicase NS3h described above, the protein of (a1) has a substitution of the amino acid residue at at least one of the following positions of the amino acid sequence shown in SEQ ID No. 48: 279, 289, 292, 368, 374, 428, 431, 492, 494, 499, 525, 568, 584 and 622.
[0014] Optionally, the amino acid residue is substituted with a glycine residue or a serine residue.
[0015] Optionally, in the modified RNA helicase NS3h described above, the protein of (a1) further has a substitution of the amino acid residue at at least one of the following positions of the amino acid sequence shown in SEQ ID No. 48: 256, 371, 372, 375, 394, 395, 396, 551, 552, 553, 554, 579, 580 and 581. Optionally, the residues at these positions are substituted with cysteine residues.
[0016] Optionally, the protein of (a1) has at least two amino acid residues substituted and/or added relative to the amino acid sequence shown in SEQ ID No. 48, and the at least two substituted and/or added amino acid residues are covalently linked. For example, the at least two substituted and/or added residues are at positions selected from: 256, 371, 372, 375, 394, 395, 396, 551, 552, 553, 554, 579, 580 and 581.
[0017] Optionally, the covalent link is specifically between the substituted and/or added amino acid residues at positions 394 and 553 of the amino acid sequence shown in SEQ ID No. 48.
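Whether a residue pair such as 394/553 can be bridged depends on how close the two residues lie in the folded protein. A minimal sketch, assuming a locally saved structure file in standard fixed-column PDB format (for example a 3KQL download), reads C-alpha coordinates and reports a C-alpha to C-alpha distance. This is not a method described in this document; residue numbers in a deposited structure need not match SEQ ID No. 48 numbering, and `format_ca` is a demo-only helper.

```python
import math

Coord = tuple[float, float, float]


def read_ca_coords(pdb_text: str, chain: str) -> dict[int, Coord]:
    """Collect C-alpha coordinates keyed by residue number from ATOM records.

    Fixed PDB columns: atom name 13-16, chain 22, residue number 23-26,
    x/y/z 31-54. Insertion codes are ignored and the first CA seen for a
    residue (e.g. the first alternate location) is kept.
    """
    coords: dict[int, Coord] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        resseq = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords.setdefault(resseq, xyz)
    return coords


def ca_distance(coords: dict[int, Coord], res_a: int, res_b: int) -> float:
    """Euclidean C-alpha to C-alpha distance (angstroms) for two residues."""
    return math.dist(coords[res_a], coords[res_b])


def format_ca(serial: int, resname: str, chain: str, resseq: int,
              x: float, y: float, z: float) -> str:
    """Demo-only helper: write a minimal CA ATOM record in PDB columns."""
    return (f"ATOM  {serial:>5}  CA  {resname:>3} {chain}{resseq:>4}    "
            f"{x:>8.3f}{y:>8.3f}{z:>8.3f}")


# Toy coordinates, not taken from any real structure.
demo = "\n".join([
    format_ca(1, "GLY", "A", 394, 0.0, 0.0, 0.0),
    format_ca(2, "ALA", "A", 553, 3.0, 4.0, 0.0),
])
print(round(ca_distance(read_ca_coords(demo, "A"), 394, 553), 2))  # 5.0
```

Comparing such distances with the spacer lengths of candidate cross-linkers is one way to shortlist cysteine pairs before mutagenesis.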
[0018] In one embodiment of the present invention, the protein of (a1) carries substitutions at positions 499, 584, 394 and 553 of the amino acid sequence shown in SEQ ID No. 48, for example C499S, C584G, G394C and A553C; the corresponding protein of (a3) then has the sequence shown in SEQ ID NO. 4.
[0019] Optionally, the covalent link is formed by a disulfide bond or by a linker molecule.
[0020] Optionally, the linker molecule is at least one of bismaleimidoethane (BMOE), 1,4-bismaleimidobutane (BMB), bismaleimidohexane (BMH), dithiobismaleimidoethane (DTME), tris(2-maleimidoethyl)amine (TMEA), 1,8-bismaleimido-diethylene glycol (Bis(PEG)2) and 1,11-bismaleimido-triethylene glycol (Bis(PEG)3).
[0021] Optionally, in the modified RNA helicase NS3h described above, the protein of (a1) further has a substitution of the amino acid residue at at least one of the following positions of the amino acid sequence shown in SEQ ID No. 48: 444, 450, 245, 544, 594, 607, 292, 462, 465, 418, 493, 432, 587, 541, 570, 593 and 545.
[0022] Optionally, the substitutions are as follows: F444W, T450I, A245Q, A544Q, G594N, N607D, C292G, R462L, T465N, F418Y, E493K or E493Q, V432D, R587A, H541A, R570A, H593A and H545A.
[0023] For example, the protein of (a3) has the sequence shown in any one of SEQ ID NO. 12 to 29.
[0024] This invention also provides biological materials related to the above modified RNA helicase NS3h, the biological materials being any of the following:
[0025] c1) The nucleic acid molecule encoding the modified RNA helicase NS3h described above;
[0026] c2) An expression cassette containing the nucleic acid molecule described in c1);
[0027] c3) A recombinant vector containing the nucleic acid molecule described in c1), or a recombinant vector containing the expression cassette described in c2);
[0028] c4) Recombinant microorganisms containing the nucleic acid molecules described in c1), or recombinant microorganisms containing the expression cassette described in c2), or recombinant microorganisms containing the recombinant vector described in c3).
[0029] Optionally, the nucleic acid molecule of c1) is any of the following DNA molecules:
[0030] d1) a DNA molecule whose nucleotide sequence is shown in SEQ ID NO. 3 or any one of SEQ ID NO. 30 to 47 in the sequence listing;
[0031] d2) a DNA molecule whose coding sequence is shown in SEQ ID NO. 3 or any one of SEQ ID NO. 30 to 47 in the sequence listing;
[0032] d3) a DNA molecule that has 90% or more identity with the nucleotide sequence defined in d1) or d2), is derived from hepatitis C virus and encodes the above protein;
[0033] d4) a DNA molecule that hybridizes under stringent conditions with the nucleotide sequence defined in d1) or d2) and encodes the above protein.
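The identity thresholds in (a2) and d3) (more than 80% for proteins, 90% or more for DNA) are stated without fixing how identity is computed. One common convention, used here as an assumption, counts identical positions over the columns of a pairwise alignment produced beforehand by any aligner; this sketch does not perform the alignment itself, and the demo sequences are toy strings, not entries from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Convention (an assumption): identical residues divided by the number
    of columns in which at least one sequence has a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # a column of gaps in both sequences carries no information
        columns += 1
        if a == b:
            matches += 1  # a cannot be a gap here: the both-gap case was skipped
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * matches / columns


# Toy alignment: 9 of 10 columns identical.
score = percent_identity("ATGGCTAAAC", "ATGGCAAAAC")
print(f"{score:.1f}% identity; meets a 90% threshold: {score >= 90.0}")
```

Other conventions (for example dividing by the length of the shorter sequence) give different numbers for gapped alignments, so the convention should be fixed before applying a threshold.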
[0034] In the above-mentioned biological materials, the nucleic acid molecule can be DNA, such as cDNA, genomic DNA or recombinant DNA; the nucleic acid molecule can also be RNA, such as mRNA, siRNA, shRNA, sgRNA, miRNA or antisense RNA.
[0035] In the aforementioned biological materials, the expression cassette refers to DNA capable of expressing a gene in a host cell. This DNA may include not only a promoter that initiates transcription of the gene but also a terminator that terminates it, and may further include enhancer sequences.
[0036] The present invention also provides a construct comprising the modified RNA helicase NS3h described above and a binding moiety for binding polynucleotides.
[0037] The use of the modified RNA helicase NS3h, the biological material or the construct described above in any of the following also falls within the scope of protection of this invention:
[0038] (1) characterizing a target polynucleotide;
[0039] (2) preparing a product for characterizing a target polynucleotide;
[0040] (3) controlling the passage of a target polynucleotide through a nanopore;
[0041] (4) preparing a product for controlling the passage of a target polynucleotide through a nanopore.
[0042] This invention also provides a method for controlling the passage of a target polynucleotide through a nanopore, comprising:
[0043] (1) binding the target polynucleotide to the modified RNA helicase NS3h or the above construct to obtain a sample to be tested; and
[0044] (2) bringing the sample to be tested into contact with a nanopore protein so that the target polynucleotide in the sample moves relative to the nanopore protein.
[0045] The present invention also provides a method for characterizing a target polynucleotide, comprising:
[0046] A. performing the above method for controlling the passage of a target polynucleotide through a nanopore; and
[0047] B. acquiring one or more measurements as the target polynucleotide in the sample to be tested moves relative to the nanopore protein, thereby determining the presence or absence of, or one or more characteristics of, the target polynucleotide.
[0048] Optionally, the one or more characteristics are selected from at least one of: (i) the length of the target polynucleotide; (ii) the identity of the target polynucleotide; (iii) the sequence of the target polynucleotide; (iv) the secondary structure of the target polynucleotide; and (v) whether the target polynucleotide is modified.
[0049] The present invention also provides a kit for characterizing a target polynucleotide or controlling the passage of a target polynucleotide through a nanopore, comprising at least one of the modified RNA helicase NS3h or the construct described above. Optionally, the kit further comprises a nanopore protein and a membrane. The membrane can be any membrane known in the prior art, preferably a lipid bilayer; for example, a bilayer formed by the self-assembly of block copolymer and/or phospholipid molecules.
[0050] Optionally, the nucleic acid can be naturally occurring or artificially synthesized. Specifically, the nucleic acid can be natural DNA or RNA, modified DNA or RNA, or an artificially synthesized nucleic acid such as peptide nucleic acid (PNA), glycol nucleic acid (GNA), threose nucleic acid (TNA), locked nucleic acid (LNA), or another synthetic polymer with nucleoside side chains.
[0051] Optionally, the nucleic acid is single-stranded, double-stranded, or at least partially double-stranded.
[0052] Optionally, the nucleic acid can be of any length. For example, the length of the nucleic acid can be at least 10, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 400 or at least 500 nucleotides or nucleotide pairs, or it can be 1000 or more nucleotides or nucleotide pairs, 5000 or more nucleotides or nucleotide pairs, or 100000 or more nucleotides or nucleotide pairs.
[0053] Optionally, one or more nucleotides in the nucleic acid may be modified, for example methylated, oxidized, damaged, abasic, protein-labeled or tagged, or linked to a spacer in the middle of the polynucleotide sequence.
[0054] In this document, the amino acids and their three-letter and one-letter abbreviations are as follows: Histidine (His, H); Serine (Ser, S); Glutamic acid (Glu, E); Glutamine (Gln, Q); Glycine (Gly, G); Threonine (Thr, T); Phenylalanine (Phe, F); Aspartic acid (Asp, D); Tyrosine (Tyr, Y); Leucine (Leu, L); Isoleucine (Ile, I); Arginine (Arg, R); Alanine (Ala, A); Valine (Val, V); Tryptophan (Trp, W); Methionine (Met, M); Asparagine (Asn, N); Cysteine (Cys, C); Lysine (Lys, K); Proline (Pro, P). Standard substitution notation is also used; for example, E80Q means that the E at position 80 of the sequence is replaced by Q.
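The substitution notation is regular enough to apply mechanically. The sketch below parses codes such as C499S and applies them to a sequence, checking that the named wild-type residue is actually present at each position. `parse_mutation`, `apply_mutations` and the toy fragment are illustrative; applying the document's mutations would require the real SEQ ID No. 48 sequence, which is not reproduced here.

```python
import re

# One-letter codes of the 20 standard amino acids in the table above.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutation(code: str) -> tuple[str, int, str]:
    """Split standard notation such as 'E80Q' into ('E', 80, 'Q')."""
    match = SUBSTITUTION.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a single-substitution code: {code!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    if wild_type not in AMINO_ACIDS or new not in AMINO_ACIDS:
        raise ValueError(f"non-standard amino acid in {code!r}")
    return wild_type, position, new


def apply_mutations(seq: str, codes: list[str], first_position: int = 1) -> str:
    """Apply substitutions to seq, whose first residue is numbered first_position.

    Raises ValueError if a position is out of range or the residue found
    there differs from the wild-type residue named in the code.
    """
    residues = list(seq)
    for code in codes:
        wild_type, position, new = parse_mutation(code)
        index = position - first_position
        if not 0 <= index < len(residues):
            raise ValueError(f"{code}: position outside the sequence")
        if residues[index] != wild_type:
            raise ValueError(f"{code}: expected {wild_type}, found {residues[index]}")
        residues[index] = new
    return "".join(residues)


# Toy 5-residue fragment numbered from 189 (not the real SEQ ID No. 48 sequence).
print(apply_mutations("GCAVC", ["G189C", "C193S"], first_position=189))  # CCAVS
```

The wild-type check catches numbering mistakes, such as confusing SEQ ID No. 48 positions with positions in a truncated construct.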
[0055] The modified HCV NS3h helicase provided by this invention can control the movement of RNA through biological nanopores, especially under the influence of an electric field. The helicase enables target polynucleotides to move through the nanopores in a controlled and stepwise manner.
[0056] The HCV NS3h helicase mutants provided by this invention have an enhanced ability to control the translocation of polynucleotides through nanopores. These mutants typically carry one or more modifications in domain 2 or domain 3. Accordingly, the modified HCV NS3h helicase provided by this invention contains at least one introduced cysteine or non-natural amino acid while retaining its ability to control polynucleotide movement.
[0057] This invention also provides a modified HCV NS3h helicase in which domain 2 and domain 3 are covalently linked by a linker molecule, which improves the stability with which the HCV NS3h helicase of this invention binds polynucleotides. In particular, as the polynucleotide chain becomes longer, the helicase of this invention can still stably control polynucleotide movement without detaching from the polynucleotide.
Attached Figure Description
[0058] Figure 1 is an SDS-PAGE image of the purified HCV NS3h helicase.
[0059] Figure 2 shows SDS-PAGE images of the purified HCV NS3h-C499S / C584G / G394C / A553C protein and HCV NS3h-C499S / C584G / G394C / A553C-BMH protein.
[0060] Figure 3 shows the results of the gel migration assay in Example 3.
[0061] Figure 4 is a schematic diagram of nucleic acid construct X.
[0062] Figure 5 shows an example current trace of nucleic acid construct X translocating through a nanopore under the control of the HCV NS3h-C499S / C584G / G394C / A553C-BMH helicase in Example 4.
[0063] Figure 6 is an enlarged view of the region of the Example 4 trace in which the helicase controls the movement of nucleic acid construct X through the pore.
[0064] Figure 7 is an SDS-PAGE image of purified mutants N1 to N18.
[0065] Figure 8 shows the velocity distribution of helicase-controlled nucleic acid construct X moving through the nanopore; the horizontal axis is velocity (nt/s) and the vertical axis is percentage.
[0066] Figure 9 shows the velocity distribution of nucleic acid construct X moving through the nanopore under the control of N1; the horizontal axis is velocity (nt/s) and the vertical axis is percentage.
[0067] Figure 10 shows the velocity distribution of helicase-controlled nucleic acid construct X moving through the nanopore; the horizontal axis is velocity (nt/s) and the vertical axis is percentage.
[0068] Figure 11 shows the velocity distribution of nucleic acid construct X moving through the nanopore under the control of N16; the horizontal axis is velocity (nt/s) and the vertical axis is percentage.
Detailed Implementation
[0069] The present invention will now be described in further detail with reference to specific embodiments. The given embodiments are merely illustrative of the invention and not intended to limit its scope. The embodiments provided below can serve as a guide for further improvements by those skilled in the art and do not constitute a limitation on the invention in any way.
[0070] Unless otherwise specified, the experimental methods used in the following examples are conventional methods, performed according to the techniques or conditions described in the literature in this field or according to the product instructions. Unless otherwise specified, the materials and reagents used in the following examples are commercially available. All quantitative experiments in the following examples were performed in triplicate, and the results were averaged.
[0071] Example 1: Preparation of HCV NS3h helicase protein
[0072] Based on the amino acid sequence of the wild-type HCV NS3h protein (PDB ID: 3KQL), the corresponding nucleic acid sequence was obtained by in vitro gene synthesis, with codons replaced by those commonly used in *E. coli*, the expression host. To facilitate subsequent expression and purification, a His-tag was added at the 3' end of the sequence. The resulting optimized nucleic acid sequence encoding the HCV NS3h helicase protein is shown in SEQ ID NO.1, and the translated amino acid sequence is shown in SEQ ID NO.2. This gene was ligated into the pET22b expression vector via the NdeI and XhoI restriction sites and, after the sequence was verified by sequencing, a recombinant expression plasmid expressing the HCV NS3h helicase was obtained.
[0073] The recombinant expression plasmid was transformed into the BL21(DE3) *E. coli* expression host by heat shock. For induction, host cells containing the recombinant expression plasmid were first cultured overnight at 37°C in LB medium supplemented with kanamycin. The culture was then scaled up 1:100 at 37°C until the OD(600) reached 0.4-0.6, after which growth was stopped and the cells were cooled at 4°C for 1 hour. Isopropyl β-D-1-thiogalactopyranoside (IPTG) was then added to 0.5 mM, and expression was induced at 16°C for 12-16 hours. Cells were harvested by centrifugation at 15,000 rpm and 4°C, disrupted under high pressure at 4°C, and centrifuged at 4°C to collect the supernatant. The target protein was purified step by step using nickel, heparin, Q and size-exclusion (molecular sieve) columns to obtain a large quantity of high-purity HCV NS3h helicase protein, which was analyzed by SDS-PAGE.
[0074] Figure 1 is an SDS-PAGE image of the purified HCV NS3h helicase protein: the left lane is the marker and the right lane is the HCV NS3h helicase protein.
[0075] Example 2: Preparation of HCV NS3h-C499S / C584G / G394C / A553C and HCV NS3h-C499S / C584G / G394C / A553C-BMH proteins
[0076] The recombinant expression plasmid of the HCV NS3h helicase prepared in Example 1 was subjected to site-directed mutagenesis by overlap PCR. The sequence encoding the HCV NS3h helicase protein in the plasmid was replaced with a sequence encoding the HCV NS3h-C499S / C584G / G394C / A553C protein carrying this combination of mutations (shown in SEQ ID NO.3), giving the mutated recombinant expression plasmid. This plasmid expresses the HCV NS3h-C499S / C584G / G394C / A553C protein, whose sequence is shown in SEQ ID NO.4. This protein differs from the HCV NS3h helicase protein in that cysteine 499 is replaced by serine (C499S), cysteine 584 by glycine (C584G), glycine 394 by cysteine (G394C) and alanine 553 by cysteine (A553C).
[0077] The mutated recombinant plasmid was transformed into the BL21(DE3) *E. coli* expression host by heat shock. For induction, host cells containing the expression plasmid were first cultured overnight at 37°C in LB medium supplemented with kanamycin. The culture was then scaled up 1:100 at 37°C; when the OD(600) reached 0.4-0.6, growth was stopped and the cells were cooled at 4°C for 1 hour. Isopropyl β-D-1-thiogalactopyranoside (IPTG) was then added to a final concentration of 0.5 mM, and expression was induced at 16°C for 12-16 hours. Cells were harvested by centrifugation at 4°C and 15,000 rpm, disrupted under high pressure at 4°C, and centrifuged at 4°C to collect the supernatant. The target protein was purified step by step using nickel, heparin, Q and size-exclusion (molecular sieve) columns to obtain a large quantity of high-purity HCV NS3h-C499S / C584G / G394C / A553C protein, which was analyzed by SDS-PAGE.
[0078] 1 μL of 1 M DTT was added to 100 μL of purified HCV NS3h-C499S / C584G / G394C / A553C protein (stored in 25 mM Tris-HCl pH 7.5, 500 mM NaCl, 10% glycerol), and the mixture was incubated at room temperature for 30 minutes. The buffer was exchanged into PBS (pH 7.0) on a 0.5 mL Zeba desalting column (7 kDa MWCO), giving 100 μL of sample. 0.5 μL of 10 mM BMH was added to the sample, and the mixture was incubated with rotation at 20 rpm for 1 hour at room temperature to obtain the HCV NS3h-C499S / C584G / G394C / A553C-BMH protein. The products were then analyzed on a 4-10% polyacrylamide gel. The HCV NS3h-C499S / C584G / G394C / A553C-BMH protein is obtained by cross-linking the HCV NS3h-C499S / C584G / G394C / A553C protein with BMH; specifically, BMH forms a covalent link between the cysteine residues at positions 394 and 553 of this protein.
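The additions in [0078] imply approximate working concentrations, assuming simple volume additivity: DTT close to 10 mM during reduction, and BMH close to 50 μM during cross-linking, the same BMH concentration used in the binding and nanopore experiments. The helper name below is illustrative.

```python
def final_concentration(stock: float, added_volume: float, base_volume: float) -> float:
    """Concentration after adding added_volume of stock to base_volume of a
    solution initially free of that solute (result in the units of stock)."""
    return stock * added_volume / (base_volume + added_volume)


# DTT: 1 uL of 1 M (1000 mM) stock into 100 uL of protein solution.
dtt_mM = final_concentration(1000.0, 1.0, 100.0)
# BMH: 0.5 uL of 10 mM (10,000 uM) stock into the 100 uL desalted sample.
bmh_uM = final_concentration(10_000.0, 0.5, 100.0)
print(f"DTT ~ {dtt_mM:.2f} mM, BMH ~ {bmh_uM:.1f} uM")  # DTT ~ 9.90 mM, BMH ~ 49.8 uM
```

The desalting step between the two additions removes the DTT, which would otherwise react with the maleimide groups of BMH.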
[0079] The results are shown in Figure 2: lane 1 is the HCV NS3h-C499S / C584G / G394C / A553C protein band, lane 2 is the HCV NS3h-C499S / C584G / G394C / A553C-BMH protein band, and lane M is the marker. Figure 2 shows that the HCV NS3h-C499S / C584G / G394C / A553C protein was cross-linked with BMH with a yield close to 95%.
[0080] Example 3: Using a gel migration assay to detect the DNA binding ability of modified HCV NS3h helicase.
[0081] Nucleic acid substrate 1:
[0082] GGCGTCTGCTTGGGTGTTTAACCTTTTTTTTTTCCACAACTTCGTTCAGTTACGTATTGCT (SEQ ID NO. 5).
[0083] Nucleic acid substrate 2:
[0084] GCAATACGTAACTGAACGAAGTTGTGG (SEQ ID NO. 6).
[0085] Nucleic acid substrate 3:
[0086] GGTTAAACACCCAAGCAGACGCC (SEQ ID NO. 7).
[0087] Nucleic acid substrates required for gel migration experiments were prepared by annealing (nucleic acid substrate 1, nucleic acid substrate 2, and nucleic acid substrate 3 were annealed at a molar ratio of 1:1.1:1.1, with a final concentration of 10 μM).
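Comparing the three oligonucleotides shows how they assemble. The reverse complement of substrate 3 matches the 5' end of substrate 1, and the reverse complement of substrate 2 matches its 3' region, leaving an unpaired poly(T) stretch of substrate 1 between the two duplexes plus a single 3' T. The short check below reproduces that reading; it is an illustrative sketch rather than part of the described protocol, and it only looks for perfect matches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


substrate_1 = "GGCGTCTGCTTGGGTGTTTAACCTTTTTTTTTTCCACAACTTCGTTCAGTTACGTATTGCT"  # SEQ ID NO. 5
substrate_2 = "GCAATACGTAACTGAACGAAGTTGTGG"  # SEQ ID NO. 6
substrate_3 = "GGTTAAACACCCAAGCAGACGCC"  # SEQ ID NO. 7

# Where each short strand's complement lies on substrate 1 (-1 would mean no perfect match).
site_3 = substrate_1.find(reverse_complement(substrate_3))
site_2 = substrate_1.find(reverse_complement(substrate_2))

# Unpaired stretch of substrate 1 between the two duplexes, and any 3' overhang.
gap = substrate_1[site_3 + len(substrate_3):site_2]
tail = substrate_1[site_2 + len(substrate_2):]
print(f"substrate 3 pairs at index {site_3}, substrate 2 at index {site_2}")
print(f"single-stranded gap: {gap} ({len(gap)} nt); 3' overhang: {tail}")
```

The text does not state the purpose of the single-stranded gap; a common design choice is to give the helicase a single-stranded loading site, but that is an interpretation here.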
[0088] The nucleic acid substrate was reacted with the HCV NS3h helicase protein prepared in Example 1 at a molar ratio of 1:5 in buffer (25 mM Tris-HCl pH 7.0, 100 mM NaCl, 10% glycerol, 1 mM EDTA) to obtain final reaction solution 1 (final protein concentration of 2.5 μM, final nucleic acid substrate concentration of 500 nM). The total reaction volume was 20 μL. The mixture was incubated at room temperature for one hour.
[0089] The nucleic acid substrate was mixed with the HCV NS3h-C499S / C584G / G394C / A553C protein prepared in Example 2 at a molar ratio of 1:5 in a buffer solution (25 mM Tris-HCl pH 7.0, 100 mM NaCl, 10% glycerol, 1 mM EDTA). BMH was then added to a final concentration of 50 μM to obtain final reaction solution 2 (final protein concentration 2.5 μM, final nucleic acid substrate concentration 500 nM, final BMH concentration 50 μM). The total reaction volume was 20 μL. The mixture was incubated at room temperature for one hour.
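The reaction compositions in [0088] and [0089] follow from C1V1 = C2V2. A minimal sketch, assuming hypothetical stock concentrations of 50 μM for the protein and 1 mM for BMH (the document does not state them); the 10 μM substrate stock is the annealed product described above. It also confirms that 500 nM substrate and 2.5 μM protein correspond to the stated 1:5 molar ratio.

```python
def stock_volume(final_conc: float, total_volume: float, stock_conc: float) -> float:
    """C1*V1 = C2*V2 solved for V1 (concentrations in the same units)."""
    if not 0 < final_conc <= stock_conc:
        raise ValueError("stock must be at least as concentrated as the target")
    return final_conc * total_volume / stock_conc


TOTAL_UL = 20.0            # reaction volume from the text
SUBSTRATE_STOCK_UM = 10.0  # annealed substrate stock from the text
PROTEIN_STOCK_UM = 50.0    # HYPOTHETICAL protein stock concentration
BMH_STOCK_UM = 1000.0      # HYPOTHETICAL 1 mM BMH working stock

substrate_ul = stock_volume(0.5, TOTAL_UL, SUBSTRATE_STOCK_UM)  # 500 nM
protein_ul = stock_volume(2.5, TOTAL_UL, PROTEIN_STOCK_UM)      # 2.5 uM
bmh_ul = stock_volume(50.0, TOTAL_UL, BMH_STOCK_UM)             # 50 uM, reaction 2 only
# Remainder made up with reaction buffer (assumes the stocks are buffer-compatible).
buffer_ul = TOTAL_UL - substrate_ul - protein_ul - bmh_ul

print(f"substrate {substrate_ul} uL, protein {protein_ul} uL, "
      f"BMH {bmh_ul} uL, buffer {buffer_ul} uL")
print("protein:substrate =", 2.5 / 0.5)
```

For final reaction solution 1, which contains no BMH, the buffer volume rises by the BMH volume so the total stays at 20 μL.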
[0090] The products of final reaction solutions 1 and 2 were then analyzed on a 4%-10% TBE gel run at 120 V for 1.5 hours, and the DNA bands were visualized under UV light after GelRed staining.
[0091] The results are shown in Figure 3: lane 1 is the nucleic acid substrate without protein, lane 2 is the product of final reaction solution 1, and lane 3 is the product of final reaction solution 2. In lanes 2 and 3, the upper band is protein-bound nucleic acid and the lower band is free nucleic acid substrate. The upper band in lane 3 is markedly brighter and sharper than that in lane 2, indicating that the modified HCV NS3h (HCV NS3h-C499S / C584G / G394C / A553C-BMH protein) binds polynucleotides considerably more strongly than the unmodified HCV NS3h (HCV NS3h helicase protein).
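The lane comparison above is made by eye. If the gel were quantified, a common approach (an assumption here, not part of this example) is to express the protein-bound upper band as a fraction of the total substrate signal in each lane after background subtraction. The intensities in the demo are arbitrary placeholders, not measurements from Figure 3.

```python
def fraction_bound(upper: float, lower: float, background: float = 0.0) -> float:
    """Fraction of substrate signal in the protein-bound (upper) band.

    Assumes stain intensity is proportional to the amount of DNA and is
    the same per molecule for bound and free substrate.
    """
    bound = max(upper - background, 0.0)
    free = max(lower - background, 0.0)
    if bound + free == 0.0:
        raise ValueError("no signal above background in this lane")
    return bound / (bound + free)


# Placeholder densitometry values (arbitrary units) for two hypothetical lanes.
lanes = {"example A": (30.0, 70.0), "example B": (80.0, 20.0)}
for name, (upper, lower) in lanes.items():
    print(name, f"{fraction_bound(upper, lower):.0%} bound")
```

Because the same substrate appears in both bands, the ratio does not depend on how strongly the dye stains that particular sequence.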
[0092] Example 4: HCV NS3h-C499S / C584G / G394C / A553C-BMH has the ability to control the movement of the intact nucleic acid construct X through the nanopore.
[0093] I. Preparation of the sample to be tested
[0094] Nucleic acid construct X, shown in Figure 4, was prepared. It contains regions A, B, C, D, E, F and H. Region E is a single-stranded RNA fragment (sequence shown in SEQ ID NO. 8) whose 3' end is linked to the 5' end of region D. Region D is a 5'-phosphorylated single-stranded DNA fragment (positions 1-20 of SEQ ID NO. 9) whose 3' end is linked to region C. Region C is an iSpC18 spacer (position 21 of SEQ ID NO. 9) and is also linked to the 5' end of region B. Region B consists of 10 i2FU residues (positions 22-31 of SEQ ID NO. 9), i.e., U nucleotides in which the 2'-OH of the ribose is replaced by fluorine; its 3' end is linked to the 5' end of region A. Region A is a single-stranded DNA fragment (positions 32-74 of SEQ ID NO. 9). Region F is a single-stranded DNA fragment (SEQ ID NO. 10) with a cholesterol modification at its 5' end. Region F hybridizes with regions D and E: the 10 T bases at the 3' end of region F pair with the 3' end (poly(A)) of region E. Region H is a single-stranded DNA fragment (SEQ ID NO. 11) that hybridizes with the 3' end of region A.
[0095] First, three primers were designed. Primer 1 contains the sequences of regions A, B, C and D, as shown in SEQ ID NO. 9. Primer 2 contains the sequence of region F, as shown in SEQ ID NO. 10, with a cholesterol modification at its 5' end. Primer 3 contains the sequence of region H, as shown in SEQ ID NO. 11. Primers 1, 2 and 3 were annealed at a molar ratio of 1:1.1:1.1 to give an annealed product at a final concentration of 10 μM. This product carries a protruding stretch of 10 T's (at the 3' end of region F) that can base-pair with the poly(A) tail of the mRNA (region E, an RNA sequence, SEQ ID NO. 8). The annealed product was then ligated to the 3' end of the mRNA to be tested using a high concentration of T4 DNA ligase, thereby constructing nucleic acid construct X. Reverse transcriptase (MMLV) was then added to reverse transcribe the mRNA in nucleic acid construct X into a DNA / RNA hybrid, which was purified using magnetic beads.
[0096] The purified DNA / RNA hybrids were subjected to the following treatments to obtain different pre-incubated test samples.
[0097] (a) The purified DNA / RNA hybrid was pre-incubated with HCV NS3h-C499S / C584G / G394C / A553C protein (final concentration 10 nM) in a buffer (10 mM Hepes, pH 8.0, 100 mM KCl, 10% glycerol) at room temperature for 30 minutes, and then BMH was added to a final concentration of 50 μM. The mixture was incubated at room temperature for 1 hour to obtain the pre-incubated test sample.
[0098] (b) The purified DNA / RNA hybrid was pre-incubated with HCV NS3h-C499S / C584G / G394C / A553C protein (final concentration 10 nM) in a buffer solution (10 mM Hepes, pH 8.0, 100 mM KCl, 10% glycerol) at room temperature for 30 minutes to obtain the pre-incubated test sample.
[0099] (c) The purified DNA / RNA hybrid was pre-incubated with HCV NS3h helicase protein (final concentration 10 nM) in a buffer solution (10 mM Hepes, pH 8.0, 100 mM KCl, 10% glycerol) at room temperature for 30 minutes to obtain the pre-incubated test sample.
[0100] II. Construction of Single Nanopore Experimental System
[0101] Electrical signals were recorded from CsgG nanopores (specifically CsgG-Y51A / F56Q / R97W in WO 2017/149318 A1) inserted into a DPhPC phospholipid bilayer in buffer (600 mM KCl, 75 mM K3[Fe(CN)6], 25 mM K4[Fe(CN)6]·3H2O, 100 mM Hepes, pH 8.0). After a single CsgG nanopore had inserted into the phospholipid bilayer, 2 mL of the same buffer (600 mM KCl, 75 mM K3[Fe(CN)6], 25 mM K4[Fe(CN)6]·3H2O, 100 mM Hepes, pH 8.0) was flushed through the system to remove residual excess nanopores, giving a single-nanopore experimental system.
[0102] III. Testing
[0103] The pre-incubated sample, ATP (final concentration 2 mM) and MgCl2 (final concentration 10 mM) were added to the single-nanopore experimental system (total volume 100 μL), and the signal was recorded for 6 h at a constant voltage of +180 mV (including possible 2 s voltage reversals to -180 mV).
[0104] Representative results are shown in Figures 5 and 6. Figure 5 is an example current trace of the HCV NS3h-C499S / C584G / G394C / A553C-BMH helicase controlling the translocation of nucleic acid construct X through a nanopore; the X-axis is time (s) and the Y-axis is current (nA). Figure 6 is a magnified view of the region of Figure 5 in which the HCV NS3h-C499S / C584G / G394C / A553C-BMH helicase controls the passage of nucleic acid construct X through the nanopore; the X-axis is time (s) and the Y-axis is current (nA). The results show that this mutant can control the movement of the intact nucleic acid construct X through the nanopore. In contrast, the HCV NS3h helicase protein and the HCV NS3h-C499S / C584G / G394C / A553C protein have poor processivity and readily detach from the nucleic acid, and therefore fail to control the passage of nucleic acid construct X through the nanopore.
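Traces like these are summarized in this document as velocity distributions, with velocity in nt/s on one axis and percentage of events on the other. How those distributions were computed is not described. A minimal sketch, assuming each helicase-controlled event has already been reduced to a length in nucleotides and a duration in seconds (for example from the construct design or basecalling), is shown below; the bin width and the demo events are hypothetical.

```python
from collections.abc import Iterable


def velocity_distribution(events: Iterable[tuple[float, float]],
                          bin_width: float = 50.0) -> dict[float, float]:
    """Percentage of events per velocity bin (nt/s).

    events: (length_nt, duration_s) pairs, one per helicase-controlled event.
    Bins are [k * bin_width, (k + 1) * bin_width), keyed by their lower edge.
    """
    counts: dict[float, int] = {}
    total = 0
    for length_nt, duration_s in events:
        if duration_s <= 0:
            continue  # skip events without a usable duration
        velocity = length_nt / duration_s
        lower_edge = (velocity // bin_width) * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
        total += 1
    if total == 0:
        return {}
    return {edge: 100.0 * n / total for edge, n in sorted(counts.items())}


# HYPOTHETICAL events: (length in nt, duration in s).
BIN = 50.0
demo_events = [(1000, 10.0), (1200, 10.0), (600, 10.0)]
for edge, pct in velocity_distribution(demo_events, BIN).items():
    print(f"{edge:.0f}-{edge + BIN:.0f} nt/s: {pct:.1f}%")
```

Binning by lower edge keeps the output directly plottable as a bar chart of percentage against velocity.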
[0105] Example 5: Preparation of different HCV NS3h mutants
[0106] In this embodiment, the HCV NS3h mutants N1-N18 shown in Table 1 were prepared.
[0107] The recombinant expression plasmid prepared in Example 2 was subjected to site-directed mutagenesis by overlap PCR: the nucleic acid sequence encoding the HCV NS3h-C499S/C584G/G394C/A553C protein in the recombinant expression plasmid was replaced with the nucleic acid sequences encoding HCV NS3h mutants N1-N18, respectively, to obtain recombinant expression plasmids N1-N18.
[0108] The recombinant expression plasmid N1 contains the nucleic acid sequence encoding HCV NS3h mutant N1 (as shown in SEQ ID NO.30) and expresses HCV NS3h mutant N1 (as shown in SEQ ID NO.12). N1 differs from the HCV NS3h-C499S/C584G/G394C/A553C protein in that the phenylalanine at position 444 is replaced with tryptophan (F444W).
[0109] The recombinant expression plasmid N2 contains the nucleic acid sequence encoding HCV NS3h mutant N2 (as shown in SEQ ID NO.31) and expresses HCV NS3h mutant N2 (as shown in SEQ ID NO.13). N2 differs from the HCV NS3h-C499S/C584G/G394C/A553C protein in that the threonine at position 450 is replaced with isoleucine (T450I).
[0110] The recombinant expression plasmid N3 contains the nucleic acid sequence encoding HCV NS3h mutant N3 (as shown in SEQ ID NO.32) and expresses HCV NS3h mutant N3 (as shown in SEQ ID NO.14). N3 differs from the HCV NS3h-C499S/C584G/G394C/A553C protein in that the alanine at position 245 is replaced with glutamine (A245Q).
[0111] The recombinant expression plasmid N4 contains the nucleic acid sequence encoding HCV NS3h mutant N4 (as shown in SEQ ID NO.33) and expresses HCV NS3h mutant N4 (as shown in SEQ ID NO.15). N4 differs from the HCV NS3h-C499S/C584G/G394C/A553C protein in that the alanine at position 544 is replaced with glutamine (A544Q).
[0112] The recombinant expression plasmid N5 contains the nucleic acid sequence encoding HCV NS3h mutant N5 (as shown in SEQ ID NO.34) and expresses HCV NS3h mutant N5 (as shown in SEQ ID NO.16). N5 differs from the HCV NS3h-C499S/C584G/G394C/A553C protein in that the glycine at position 594 is replaced with asparagine (G594N).
[0113] The recombinant expression plasmid N6 contains the nucleic acid sequence encoding HCV NS3h mutant N6 (as shown in SEQ ID NO.35) and expresses HCV NS3h mutant N6 (as shown in SEQ ID NO.17). N6 differs from the HCV NS3h-C499S/C584G/G394C/A553C protein in that the asparagine at position 607 is replaced with aspartic acid (N607D).
[0114] The recombinant expression plasmid N7 contains the nucleic acid sequence encoding HCV NS3h mutant N7 (as shown in SEQ ID NO.36) and expresses HCV NS3h mutant N7 (as shown in SEQ ID NO.18). N7 differs from the HCV NS3h-C499S/C584G/G394C/A553C protein in that the cysteine at position 292 is replaced with glycine (C292G).
[0115] The recombinant expression plasmid N8 contains the nucleic acid sequence encoding HCV NS3h mutant N8 (as shown in SEQ ID NO.37) and expresses HCV NS3h mutant N8 (as shown in SEQ ID NO.19). N8 differs from the HCV NS3h-C499S/C584G/G394C/A553C protein in that the arginine at position 462 is replaced with leucine (R462L).
[0116] The recombinant expression plasmid N9 contains the nucleic acid sequence encoding HCV NS3h mutant N9 (as shown in SEQ ID NO.38) and expresses HCV NS3h mutant N9 (as shown in SEQ ID NO.20). N9 differs from the HCV NS3h-C499S/C584G/G394C/A553C protein in that the threonine at position 465 is replaced with asparagine (T465N).
[0117] The recombinant expression plasmid N10 contains the nucleic acid sequence encoding HCV NS3h mutant N10 (as shown in SEQ ID NO.39) and expresses HCV NS3h mutant N10 (as shown in SEQ ID NO.21). N10 differs from the HCV NS3h-C499S/C584G/G394C/A553C protein in that the phenylalanine at position 418 is replaced with tyrosine (F418Y).
[0118] The recombinant expression plasmid N11 contains the nucleic acid sequence encoding HCV NS3h mutant N11 (as shown in SEQ ID NO.40) and expresses HCV NS3h mutant N11 (as shown in SEQ ID NO.22). N11 differs from the HCV NS3h-C499S/C584G/G394C/A553C protein in that the glutamic acid at position 493 is replaced with lysine (E493K).
[0119] The recombinant expression plasmid N12 contains the nucleic acid sequence encoding HCV NS3h mutant N12 (as shown in SEQ ID NO.41) and expresses HCV NS3h mutant N12 (as shown in SEQ ID NO.23). N12 differs from the HCV NS3h-C499S/C584G/G394C/A553C protein in that the glutamic acid at position 493 is replaced with glutamine (E493Q).
[0120] The recombinant expression plasmid N13 contains the nucleic acid sequence encoding HCV NS3h mutant N13 (as shown in SEQ ID NO.42) and expresses HCV NS3h mutant N13 (as shown in SEQ ID NO.24). N13 differs from the HCV NS3h-C499S/C584G/G394C/A553C protein in that the valine at position 432 is replaced with aspartic acid (V432D).
[0121] The recombinant expression plasmid N14 contains the nucleic acid sequence encoding HCV NS3h mutant N14 (as shown in SEQ ID NO.43) and expresses HCV NS3h mutant N14 (as shown in SEQ ID NO.25). N14 differs from the N1 protein in that the arginine at position 587 is replaced with alanine (R587A).
[0122] The recombinant expression plasmid N15 contains the nucleic acid sequence encoding HCV NS3h mutant N15 (as shown in SEQ ID NO.44) and expresses HCV NS3h mutant N15 (as shown in SEQ ID NO.26). N15 differs from the N1 protein in that the histidine at position 541 is replaced with alanine (H541A).
[0123] The recombinant expression plasmid N16 contains the nucleic acid sequence encoding HCV NS3h mutant N16 (as shown in SEQ ID NO.45) and expresses HCV NS3h mutant N16 (as shown in SEQ ID NO.27). N16 differs from the N1 protein in that the arginine at position 570 is replaced with alanine (R570A).
[0124] The recombinant expression plasmid N17 contains the nucleic acid sequence encoding HCV NS3h mutant N17 (as shown in SEQ ID NO.46) and expresses HCV NS3h mutant N17 (as shown in SEQ ID NO.28). N17 differs from the N1 protein in that the histidine at position 593 is replaced with alanine (H593A).
[0125] The recombinant expression plasmid N18 contains the nucleic acid sequence encoding HCV NS3h mutant N18 (as shown in SEQ ID NO.47) and expresses HCV NS3h mutant N18 (as shown in SEQ ID NO.29). N18 differs from the N1 protein in that the histidine at position 545 is replaced with alanine (H545A).
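Each mutant above is defined as a set of point substitutions in the usual wild-type/position/new-residue notation (for example F444W), with N14-N18 built on top of N1. The Python sketch below is a hypothetical helper, not part of the described method, for applying such substitution sets to a sequence and checking that every expected wild-type residue is present before primers are designed. It assumes that positions follow the numbering of SEQ ID No. 48 and that the construct begins at reference residue 189 (that is, after removal of the 188 N-terminal residues mentioned in claim 1 (a3)); the offset would need adjusting for a tagged construct.

```python
import re

# Assumption: numbering follows SEQ ID No. 48 and the construct starts at
# reference residue 189 (188 N-terminal residues removed, cf. claim 1 (a3)).
# Hypothetical value; change it for constructs carrying an N-terminal tag.
FIRST_RESIDUE = 189

# Substitution notation: wild-type residue, reference position, new residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# Substitution sets taken from Example 5. The parent replaces cysteines 499
# and 584 and introduces the cysteine pair at 394/553 that BMH crosslinks.
PARENT = ["C499S", "C584G", "G394C", "A553C"]
N1 = PARENT + ["F444W"]
N14 = N1 + ["R587A"]  # N14-N18 are each described relative to N1


def apply_mutations(seq, mutations, first_residue=FIRST_RESIDUE):
    """Return a copy of seq with substitutions such as 'F444W' applied.

    Raises ValueError if a substitution is malformed, falls outside the
    construct, or the expected wild-type residue is not found, which
    catches numbering-offset mistakes early.
    """
    residues = list(seq)
    for mutation in mutations:
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError(f"unrecognised substitution: {mutation}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        index = position - first_residue  # 0-based index into the construct
        if not 0 <= index < len(residues):
            raise ValueError(f"{mutation}: position lies outside the construct")
        if residues[index] != wild_type:
            raise ValueError(
                f"{mutation}: expected {wild_type}, found {residues[index]}"
            )
        residues[index] = new
    return "".join(residues)
```

Applied to the unmodified NS3h helicase sequence (not reproduced here), apply_mutations(wild_type_seq, N1) should reproduce the N1 protein sequence only if the offset assumption matches the actual construct; a ValueError on the first substitution is a quick sign that it does not.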
[0126] Recombinant plasmids N1-N18 were transformed into E. coli BL21(DE3) expression hosts by heat shock. For induction, host bacteria carrying each plasmid were first cultured overnight at 37°C in LB medium supplemented with kanamycin. The culture was then scaled up at 37°C by 1:100 inoculation and grown until the OD600 reached 0.4-0.6, at which point growth was halted by cooling the culture at 4°C for 1 hour. Isopropyl β-D-1-thiogalactopyranoside (IPTG) was added to a final concentration of 0.5 mM, and expression was induced at 16°C for 12-16 hours. The bacteria were then harvested by centrifugation at 15,000 rpm at 4°C, lysed by high-pressure homogenization at 4°C, and centrifuged again at 4°C to collect the supernatant. The target proteins were purified stepwise on nickel, heparin, and Q columns followed by size-exclusion chromatography (molecular sieve) to obtain large quantities of high-purity HCV NS3h mutants N1-N18. The purified HCV NS3h mutants N1-N18 were analyzed by SDS-PAGE.
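The harvest step specifies 15,000 rpm, but the force the cells experience depends on the rotor radius, which is not given. As a sketch of the standard conversion, relative centrifugal force is RCF = 1.118 × 10^-5 × r × rpm^2 with r in centimetres; the radii below are hypothetical values chosen only for illustration.

```python
def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a rotor of the given radius.

    Standard relation: RCF = 1.118e-5 * r * rpm**2, with r in centimetres.
    """
    return 1.118e-5 * radius_cm * rpm ** 2


# Hypothetical rotor radii; the protocol specifies only 15,000 rpm.
for radius in (6.0, 8.0, 10.0):
    print(f"r = {radius} cm: {rcf_from_rpm(15_000, radius):,.0f} x g")
```

For radii of 6-10 cm, 15,000 rpm corresponds to roughly 15,000-25,000 x g, so quoting the force in x g makes the step easier to reproduce on a different centrifuge.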
[0127] 1 μL of 1 M DTT was added to 100 μL of each purified HCV NS3h mutant N1-N18 (stored in 25 mM Tris-HCl, pH 7.5, 500 mM NaCl, 10% glycerol), and the mixture was incubated at room temperature for 30 minutes. The buffer was then exchanged into PBS (pH 7.0) using a 0.5 mL Zeba desalting column (7K MWCO) to obtain 100 μL of sample. 0.5 μL of 10 mM BMH was added to the sample, and the mixture was incubated at room temperature for 1 hour with rotation at 20 rpm to obtain HCV NS3h mutants N1-BMH to N18-BMH. The reaction products were then analyzed by 4-10% polyacrylamide gel electrophoresis. HCV NS3h mutants N1-BMH to N18-BMH are obtained by crosslinking HCV NS3h mutants N1-N18 with BMH; specifically, BMH forms covalent links with the cysteine residues at positions 394 and 553 of each mutant.
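The reagent additions in this paragraph are small spikes of concentrated stock into a fixed sample volume. The sketch below works out the resulting final concentrations, assuming the volumes are simply additive; the function name is illustrative only.

```python
def final_concentration(stock, added_volume, sample_volume):
    """Concentration after adding added_volume of stock to sample_volume.

    Assumes additive volumes. Volumes must share a unit; the result has
    the same concentration unit as stock.
    """
    return stock * added_volume / (sample_volume + added_volume)


# DTT: 1 uL of 1 M (1000 mM) stock into 100 uL of protein; result in mM.
dtt_mM = final_concentration(1000.0, 1.0, 100.0)
# BMH: 0.5 uL of 10 mM (10,000 uM) stock into the 100 uL desalted sample; result in uM.
bmh_uM = final_concentration(10_000.0, 0.5, 100.0)
print(f"DTT ~ {dtt_mM:.1f} mM, BMH ~ {bmh_uM:.1f} uM")
```

This gives about 9.9 mM DTT and about 49.8 μM BMH, consistent with the BMH final concentration of 50 μM used when crosslinking in the presence of the nucleic acid in Example 6.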
[0128] SDS-PAGE was used to analyze the purified HCV NS3h mutants N1-N18, with the results shown in Figure 7. The 4-10% polyacrylamide gel electrophoresis of the incubation products showed that N1-N18 could all be successfully crosslinked with BMH to obtain HCV NS3h mutants N1-BMH to N18-BMH.
[0129] Table 1 Mutational combinations of HCV NS3h mutants
[0130]
[0131]
[0132] Example 6: Sequencing performance testing of different HCV NS3h mutants
[0133] The sequencing performance of the HCV NS3h mutants prepared in Example 5 was tested by the same method as in Example 4, except that, when preparing the test samples, the purified DNA/RNA hybrid was pre-incubated with each of N1-N18 (final concentration 10 nM) in buffer (10 mM HEPES, pH 8.0, 100 mM KCl, 10% glycerol) at room temperature for 30 minutes, BMH was then added to a final concentration of 50 μM, and the samples were incubated at room temperature for 1 hour to obtain the pre-incubated test samples.
[0134] The test results showed that the mutants could control the movement of the intact nucleic acid construct X through the nanopore. Compared with N0, the mutants showed a more concentrated translocation-speed distribution (statistics from 1000 sequencing data points). For example, as shown in Figures 8-11, the main peak of N0 accounts for only 3.5%, whereas the main peaks of the other three mutants each account for a higher proportion, about 4%.
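The description does not state how the main-peak percentage is calculated. One plausible reading, assumed here, is the fraction of translocation events whose speed falls into the most populated bin of a fixed-width histogram. The Python sketch below computes that quantity; the bin width and speed unit are hypothetical parameters, and comparisons between mutants are only meaningful if the same bin width is used for all of them.

```python
import math
from collections import Counter


def main_peak_fraction(speeds, bin_width):
    """Fraction of events in the most populated fixed-width speed bin.

    speeds: per-event translocation speeds in any consistent unit.
    bin_width: histogram bin width in the same unit (an assumed parameter).
    """
    if not speeds:
        raise ValueError("speeds must not be empty")
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    # Assign each event to bin floor(speed / bin_width) and count per bin.
    counts = Counter(math.floor(s / bin_width) for s in speeds)
    return max(counts.values()) / len(speeds)
```

With 1000 events, a main peak of 3.5% corresponds to 35 events in the tallest bin and 4% to 40 events, so under this reading the narrower distributions place a few more events in the dominant speed bin.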
[0135] The present invention has been described in detail above. Those skilled in the art will appreciate that the invention can be practiced over a wide range of equivalent parameters, concentrations, and conditions without departing from its spirit and scope and without undue experimentation. Although specific embodiments have been given, it should be understood that the invention is capable of further modification. In summary, this application is intended to cover any variations, uses, or adaptations of the invention that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art and as may be applied to the essential features set forth above, within the scope of the appended claims.
Claims
1. A modified RNA helicase NS3h, characterized in that the modified RNA helicase NS3h is any one of the following proteins: (a1) a protein that is obtained by substituting and/or adding one or more amino acid residues in the amino acid sequence shown in SEQ ID No. 48 and has the same function; (a2) a protein whose amino acid sequence is more than 80% identical to the amino acid sequence of the protein defined in (a1) and which has the same function; (a3) a protein obtained by truncating 188 amino acid residues at the N-terminus and 7 amino acid residues at the C-terminus of the protein defined in (a1) or (a2); (a4) a fusion protein obtained by attaching a tag to a terminus of any of the proteins defined in (a1)-(a3).
2. The modified RNA helicase NS3h according to claim 1, characterized in that the protein of (a1) is a protein in which the amino acid residue at at least one of the following positions of the amino acid sequence shown in SEQ ID No. 48 is substituted: 279, 289, 292, 368, 374, 428, 431, 492, 494, 499, 525, 568, 584 and 622; preferably, the substitution replaces the amino acid residue with an ornithine residue or a serine residue.
3. The modified RNA helicase NS3h according to claim 2, characterized in that the protein of (a1) is a protein in which the amino acid sequence shown in SEQ ID No. 48 is further substituted at at least the following positions: 256, 371, 372, 375, 394, 395, 396, 551, 552, 553, 554, 579, 580 and 581; preferably, the substitution replaces the amino acid residue with a cysteine residue; preferably, the protein of (a1) is a protein in which at least two amino acid residues of the amino acid sequence shown in SEQ ID No. 48 are substituted and/or added, and the at least two substituted and/or added amino acid residues are covalently linked; more preferably, the covalent link is between the substituted and/or added amino acid residues at positions 394 and 553 of the amino acid sequence shown in SEQ ID No. 48; more preferably, the covalent link is formed through a disulfide bond or a linker molecule; preferably, the linker molecule is selected from at least one of bismaleimidoethane (BMOE), 1,4-bismaleimidobutane (BMB), bismaleimidohexane (BMH), dithiobismaleimidoethane (DTME), tris(2-maleimidoethyl)amine (TMEA), 1,8-bismaleimido-diethylene glycol (BM(PEG)2), and 1,11-bismaleimido-triethylene glycol (BM(PEG)3).
4. The modified RNA helicase NS3h according to any one of claims 1-3, characterized in that the protein of (a1) is a protein in which the amino acid residue at at least one of the following positions of the amino acid sequence shown in SEQ ID No. 48 is further substituted: 444, 450, 245, 544, 594, 607, 292, 462, 465, 418, 493, 432, 587, 541, 570, 593 and 545; preferably, the substitutions are as follows: F at position 444 is replaced by W; T at position 450 is replaced by I; A at position 245 is replaced by Q; A at position 544 is replaced by Q; G at position 594 is replaced by N; N at position 607 is replaced by D; C at position 292 is replaced by G; R at position 462 is replaced by L; T at position 465 is replaced by N; F at position 418 is replaced by Y; E at position 493 is replaced by K or Q; V at position 432 is replaced by D; R at position 587 is replaced by A; H at position 541 is replaced by A; R at position 570 is replaced by A; H at position 593 is replaced by A; and H at position 545 is replaced by A.
5. A biomaterial related to the modified RNA helicase NS3h according to claim 1, characterized in that the biomaterial is any one of the following: c1) a nucleic acid molecule encoding the modified RNA helicase NS3h according to claim 1; c2) an expression cassette containing the nucleic acid molecule of c1); c3) a recombinant vector containing the nucleic acid molecule of c1), or a recombinant vector containing the expression cassette of c2); c4) a recombinant microorganism containing the nucleic acid molecule of c1), a recombinant microorganism containing the expression cassette of c2), or a recombinant microorganism containing the recombinant vector of c3); preferably, the nucleic acid molecule of c1) is any one of the following DNA molecules: d1) a DNA molecule whose nucleotide sequence is any one of SEQ ID NO.3 or SEQ ID NO.30-47 in the sequence listing; d2) a DNA molecule whose coding sequence is any one of SEQ ID NO.3 or SEQ ID NO.30-47 in the sequence listing; d3) a DNA molecule that has 90% or more identity with the nucleotide sequence defined in d1) or d2), is derived from hepatitis C virus, and encodes the protein of claim 1; d4) a DNA molecule that hybridizes under stringent conditions with the nucleotide sequence defined in d1) or d2) and encodes the protein of claim 1.
6. A construct, characterized in that the construct comprises the modified RNA helicase NS3h according to any one of claims 1-4 and a binding moiety for binding a polynucleotide.
7. Use of the modified RNA helicase NS3h according to any one of claims 1-4, the biomaterial according to claim 5, or the construct according to claim 6 in any of the following: (1) characterizing a target polynucleotide; (2) preparing a product for characterizing a target polynucleotide; (3) controlling the passage of a target polynucleotide through a nanopore; (4) preparing a product for controlling the passage of a target polynucleotide through a nanopore.
8. A method for controlling the passage of a target polynucleotide through a nanopore, characterized in that the method comprises: (1) binding the target polynucleotide to the modified RNA helicase NS3h according to any one of claims 1-4 or the construct according to claim 6 to obtain a sample to be tested; (2) bringing the sample to be tested into contact with a nanopore protein, so that the target polynucleotide in the sample to be tested moves relative to the nanopore protein.
9. A method for characterizing a target polynucleotide, characterized in that the method comprises: A. performing the method according to claim 8; B. acquiring one or more measurements as the target polynucleotide in the sample to be tested moves relative to the nanopore protein, thereby determining the presence or absence of the target polynucleotide or one or more of its characteristics; preferably, the one or more characteristics are selected from at least one of (i) the length of the target polynucleotide; (ii) the identity of the target polynucleotide; (iii) the sequence of the target polynucleotide; (iv) the secondary structure of the target polynucleotide; and (v) whether the target polynucleotide is modified.
10. A kit for characterizing a target polynucleotide or controlling the passage of a target polynucleotide through a nanopore, characterized in that the kit comprises at least one modified RNA helicase NS3h according to any one of claims 1-4 or the construct according to claim 6; preferably, the kit further comprises a nanopore protein and a membrane.