Nucleases and uses thereof
Smaller guide nucleic acid-guided DNA endonucleases like S9D01 to S9E04 overcome the delivery issues of Cas9, enabling effective DNA modification for genome editing and gene therapy.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- SHANGHAI INSTITUTE OF MATERIA MEDICA CHINESE ACADEMY OF SCIENCES
- Filing Date
- 2025-12-22
- Publication Date
- 2026-06-25
AI Technical Summary
The large size of Cas9 and its derivatives complicates delivery and limits their use in practical applications for genome editing and gene therapy.
Development of guide nucleic acid-guided DNA endonucleases, such as S9D01 to S9E04, with smaller sizes and improved properties, along with corresponding guide nucleic acids, to facilitate targeted DNA modification.
The new endonucleases enable efficient and targeted DNA modification, addressing the delivery limitations of Cas9 and its derivatives, enhancing their applicability in genome editing and gene therapy.
Abstract
Description
NUCLEASES AND USES THEREOF
[0001] REFERENCE TO RELATED APPLICATIONS
[0002] The instant application claims priority to and the benefit of the filing date of PCT/CN2024/141003, filed on December 20, 2024, and PCT/CN2025/097420, filed on May 27, 2025, the entire contents of which, including any drawings and sequence listing, are incorporated herein by reference.
[0003] REFERENCE TO AN ELECTRONIC SEQUENCE LISTING
[0004] The disclosure contains a Sequence Listing XML file which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. Said XML copy, created on December 21, 2025, by software “WIPO Sequence” according to WIPO Standard ST.26, is named SYP009PCT.xml, and is 85,399 bytes in size.
[0005] According to WIPO Standard ST.26, symbol “t” is used to denote both T in DNA and U in RNA. Thus, in the instant sequence listing prepared according to ST.26, wherever a sequence is an RNA, the T in the sequence shall be deemed as U.
BACKGROUND
[0006] Cas9 (Class 2, Type II Cas) is a class of guide nucleic acid-guided DNA endonucleases and has been developed into genome editing tools for use in basic research and gene therapy development. Nickase or deactivated (dead) versions of Cas9 fused with various functional domains (e.g., deaminase) have established base editing, prime editing, and epigenome editing technologies. However, the large size of Cas9 complicates delivery and has limited its use. A guide nucleic acid-guided DNA binding protein with more suitable properties for practical applications, such as a smaller size, is desired to meet this unmet need in the art.
[0007] Citation or identification of any document in the disclosure is not an admission that such a document is available as prior art to the disclosure. Each of the references mentioned or cited in the disclosure is incorporated by reference in its entirety.
SUMMARY
[0008] Included in the disclosure is a collection of guide nucleic acid-guided DNA endonucleases (or “endonucleases” for short), including S9D01, S9D02, S9D03, S9D04, S9D05, S9D06, S9D07, S9E01, S9E02, S9E03, and S9E04 newly identified in the disclosure, and their derivatives and uses, meeting the unmet need in the art. Also included in the disclosure is a guide nucleic acid (e.g., gRNA) suitable for use to guide the corresponding guide nucleic acid-guided DNA endonucleases or derivatives thereof in the disclosure to a target DNA. Also included in the disclosure is a system or composition comprising the guide nucleic acid-guided DNA endonucleases or derivatives thereof in the disclosure and the corresponding guide nucleic acid in the disclosure suitable for use to target (e.g., function on) a target DNA. Also included in the disclosure is a method of using or use of the system in the disclosure to target (e.g., function on) a target DNA.
[0009] The endonucleases and the scaffold sequence of the guide nucleic acid corresponding to each endonuclease are listed in pair in Table 1 below. Provided herein are two versions of the scaffold sequences. The first version consists of the DR sequence and the tracr sequence with no linker in-between, and the second version consists of the DR sequence and the tracr sequence with a linker of AAAAAA (SEQ ID NO: 62) in-between.
[0010] Table 1
[0011] In an aspect, provided in the disclosure is a polypeptide comprising an amino acid sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 4, 1-3, and 5-11.
[0012] In another aspect, provided in the disclosure is a fusion protein comprising the polypeptide of any preceding claim and a functional domain.
[0013] In yet another aspect, provided in the disclosure is a system comprising:
[0014] (1) the polypeptide of any preceding claim or the fusion protein of any preceding claim, or a polynucleotide (e.g., a DNA, an RNA) encoding the polypeptide or the fusion protein, and
[0015] (2) a guide nucleic acid or a polynucleotide (e.g., a DNA, an RNA) encoding the guide nucleic acid, the guide nucleic acid comprising:
[0016] (i) a scaffold sequence capable of forming a complex with the polypeptide or the fusion protein; and
[0017] (ii) a guide sequence capable of hybridizing to a target sequence of a target DNA, thereby guiding the complex to the target DNA.
[0018] In yet another aspect, provided in the disclosure is a polynucleotide comprising a sequence encoding the polypeptide of any preceding claim or the fusion protein of any preceding claim.
[0019] In yet another aspect, provided in the disclosure is a vector comprising the polynucleotide of any preceding claim; optionally, the vector is a plasmid vector, a viral vector (e.g., a recombinant AAV (rAAV) vector, a recombinant lentivirus vector), a ribonucleoprotein (RNP), or a lipid nanoparticle (LNP).
[0020] In yet another aspect, provided in the disclosure is a cell comprising the polypeptide of any preceding claim, the fusion protein of any preceding claim, the system of any preceding claim, the polynucleotide of any preceding claim, or the vector of any preceding claim.
[0021] In yet another aspect, provided in the disclosure is a method for modifying a target DNA, comprising contacting the target DNA with the system of any preceding claim, wherein the guide sequence is capable of hybridizing to a target sequence of the target DNA, wherein the target DNA is modified by the complex.
[0022] The details of one or more embodiments of the disclosure are set forth in the description below. Other features or advantages of the disclosure will be apparent from the following drawings and detailed description of several embodiments, and also from the appended claims. It is understood that any aspect or embodiment of the disclosure can be combined with any other one or more aspects or embodiments of the disclosure, including aspects or embodiments only described in one sub-section, only in the examples, or only in the claims, to constitute another embodiment explicitly or implicitly disclosed herein unless otherwise indicated.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] An understanding of the features and advantages of the disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure may be utilized, and the accompanying drawings of which:
[0024] Fig. 1 illustrates an exemplary target dsDNA, and an exemplary system comprising (1) an exemplary guide nucleic acid comprising a guide sequence and a scaffold sequence and (2) an exemplary endonuclease or nickase fused with a functional domain.
[0025] The figures herein are for illustrative purposes only and are not necessarily drawn to scale.
DETAILED DESCRIPTION
[0026] The disclosure will be described with respect to particular embodiments, but the disclosure is not limited thereto in any respect. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which this disclosure belongs. Terms as set forth hereinafter are generally to be understood in their plain and ordinary meaning or common sense unless indicated otherwise.
[0027] Definition
[0028] The disclosure will be described with respect to particular embodiments, but the disclosure is not limited thereto in any respect. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which this disclosure belongs. Terms as set forth hereinafter are generally to be understood in their plain and ordinary meaning or common sense unless indicated otherwise.
[0029] Similar to guide nucleic acid-guided DNA endonucleases Cas9, Cas12, and IscB, the guide nucleic acid-guided DNA endonucleases or derivatives thereof of the disclosure are capable of binding to a target DNA (e.g., a dsDNA) as guided by a guide nucleic acid (e.g., gRNA) comprising a guide sequence targeting the DNA. The guide nucleic acid-guided DNA endonucleases or derivatives thereof of the disclosure may be associated with the guide nucleic acid, which localizes / targets the guide nucleic acid-guided DNA endonucleases or derivatives thereof of the disclosure to a target DNA that comprises a DNA strand (i.e., a target strand) that is reversely complementary to the guide nucleic acid, or a portion thereof (e.g., the guide sequence of a guide nucleic acid). In other words, the guide nucleic acid is “programmed” to localize and bind the guide nucleic acid-guided DNA endonucleases or derivatives thereof of the disclosure to the target DNA, and so the guide nucleic acid is also termed a programmable nucleic acid. Binding of the guide nucleic acid-guided DNA endonucleases or derivatives thereof of the disclosure to the target DNA enables the guide nucleic acid-guided DNA endonucleases or derivatives thereof of the disclosure, or a construct comprising them, to access and function on the target DNA.
For this purpose, the guide nucleic acid comprises a scaffold sequence responsible for (capable of) forming a complex with the guide nucleic acid-guided DNA endonucleases or derivatives thereof of the disclosure, and a guide sequence that is intentionally designed to be responsible for (capable of) hybridizing to a target sequence of the target DNA, thereby guiding the complex comprising the guide nucleic acid-guided DNA endonucleases or derivatives thereof of the disclosure and the guide nucleic acid to the target DNA such that the guide nucleic acid-guided DNA endonucleases or derivatives thereof of the disclosure is indirectly bound to the target DNA. The ability of (indirect) binding to target DNA makes the guide nucleic acid-guided DNA endonucleases or derivatives thereof of the disclosure also guide nucleic acid-guided DNA binding proteins, similar to Cas9, Cas12, and IscB.
[0030] Referring to Fig. 1, an exemplary dsDNA is depicted to comprise a 5’ to 3’ single DNA strand and a 3’ to 5’ single DNA strand.
[0031] An exemplary guide nucleic acid is depicted to comprise a guide sequence and a scaffold sequence. The guide sequence is designed according to the base pairing principle to be capable of hybridizing to a part of the 3’ to 5’ single DNA strand, and so the guide sequence “targets” that part. Thus, the 3’ to 5’ single DNA strand is referred to as the “target strand (TS)” of the dsDNA, while the opposite 5’ to 3’ single DNA strand is referred to as the “nontarget strand (NTS)” of the dsDNA. The part of the target strand based on which the guide sequence is designed and to which the guide sequence is capable of hybridizing is referred to as the “target sequence”, while the opposite part on the nontarget strand is referred to as the “protospacer sequence”, which is typically 100% (fully) reversely complementary to the target sequence if there is no intentional or unintentional mismatch.
[0032] Generally, as is conventional in the art, a nucleic acid sequence (e.g., a DNA sequence) is written in 5’ to 3’ direction / orientation unless explicitly indicated otherwise.
[0033] For example, a DNA sequence of ATGC is usually understood as 5’-ATGC-3’ unless otherwise indicated. Its reverse sequence is 5’-CGTA-3’. Its fully complementary sequence is 5’-TACG-3’. Its fully reverse complementary sequence is 5’-GCAT-3’ (3’-TACG-5’). Note that the fully complementary sequence usually does not have the ability to base-pair / hybridize with the original sequence.
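The four related sequences in the ATGC example can be reproduced with a short Python sketch. The function names are illustrative and are not part of the disclosure; sequences are assumed to be plain upper-case strings written 5’ to 3’.

```python
# Watson-Crick pairing for DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse(seq: str) -> str:
    """Same bases read in the opposite order."""
    return seq[::-1]

def complement(seq: str) -> str:
    """Each base replaced by its pairing partner, order unchanged."""
    return "".join(DNA_COMPLEMENT[base] for base in seq)

def reverse_complement(seq: str) -> str:
    """Complement read in the opposite order; this is the strand that can hybridize."""
    return complement(seq)[::-1]

print(reverse("ATGC"))             # CGTA
print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Only the reverse complement pairs antiparallel with the original sequence, which is why the fully complementary sequence written 5’ to 3’ does not usually hybridize with it.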
[0034] Generally, the double-strand sequence of a dsDNA may be represented with the sequence of its 5’ to 3’ single DNA strand conventionally written in 5’ to 3’ direction / orientation unless otherwise indicated.
[0035] For example, for a dsDNA having a 5’ to 3’ single DNA strand of 5’-ATGC-3’ and a 3’ to 5’ single DNA strand of 3’-TACG-5’ as shown below, the dsDNA may be simply represented as 5’-ATGC-3’ .
[0036] 5’-----ATGC-----3’
[0037] 3’-----TACG-----5’
[0038] It should be noted that either the 5’ to 3’ single DNA strand or the 3’ to 5’ single DNA strand of a dsDNA can be a nontarget strand from which a protospacer sequence is selected.
[0039] In the sense of base editing, the strand on which the target nucleotide to be edited is located is termed as an edited strand, and the opposite strand is termed as a non-edited strand. As used herein, the nontarget strand is the edited strand, and the target strand is the non-edited strand.
[0040] Typically for a gene, the 5’ to 3’ single DNA strand of the gene is the sense strand, and the 3’ to 5’ single DNA strand of the gene is the antisense strand. Either the sense strand or the antisense strand can be a nontarget strand from which a protospacer sequence is selected.
[0041] To hybridize to a dsDNA, such as a dsDNA 5’-ATGC-3’, the guide sequence of a guide nucleic acid can be designed in one embodiment to have a sequence of 5’-AUGC-3’ that is fully reversely complementary to the 3’ to 5’ strand of the dsDNA (3’-TACG-5’), which would be set forth as ATGC in the electronic sequence listing and marked as an RNA sequence according to WIPO Standard ST.26; and in another embodiment, the guide sequence of a guide nucleic acid can be designed to have a sequence of 5’-GCAU-3’ that is fully reversely complementary to the 5’ to 3’ strand of the dsDNA (5’-ATGC-3’), which would be set forth as GCAT in the electronic sequence listing and marked as an RNA sequence according to WIPO Standard ST.26.
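As a minimal sketch of the two design choices for the dsDNA 5’-ATGC-3’, the following Python snippet derives a guide sequence as the RNA reverse complement of whichever strand is chosen as the target strand. The helper name `guide_for_target` is hypothetical, and both strands are passed in written 5’ to 3’.

```python
# Watson-Crick pairing for DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def guide_for_target(target_5to3: str) -> str:
    """Guide RNA (5'->3') fully reversely complementary to a target strand given 5'->3'."""
    dna = "".join(DNA_COMPLEMENT[base] for base in reversed(target_5to3))
    return dna.replace("T", "U")  # RNA carries U where DNA carries T

top_strand = "ATGC"     # the 5' to 3' strand
bottom_strand = "GCAT"  # the 3'-TACG-5' strand rewritten 5' to 3'

print(guide_for_target(bottom_strand))  # AUGC, targets the 3' to 5' strand
print(guide_for_target(top_strand))     # GCAU, targets the 5' to 3' strand
```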
[0042] In the case that the guide sequence of a guide nucleic acid is fully reversely complementary to the target sequence of a target dsDNA and the target sequence of a target dsDNA is fully reversely complementary to the protospacer sequence of the target dsDNA, the guide sequence of a guide nucleic acid is identical to the protospacer sequence of the target dsDNA except for the difference between the U in the guide sequence due to its RNA nature and the corresponding T in the protospacer sequence due to its DNA nature. According to WIPO Standard ST.26, symbol “t” is used to denote both T in DNA and U in RNA (see “Table 1: List of nucleotides symbols”, in which the definition of symbol “t” is “thymine in DNA / uracil in RNA (t / u)”). Thus, in the electronic sequence listing of the disclosure prepared according to WIPO Standard ST.26, such a guide sequence of a guide nucleic acid could be set forth in the same sequence as a corresponding protospacer sequence of a target dsDNA of the same length. For convenience, a single SEQ ID NO entry in the electronic sequence listing can be used to denote both such a guide sequence of a guide nucleic acid and a protospacer sequence of a target dsDNA, regardless of whether the SEQ ID NO entry is marked as DNA or RNA in the electronic sequence listing. When a reference is made to such a SEQ ID NO entry that sets forth a protospacer / guide sequence, it refers to either a protospacer sequence that is a DNA sequence or a guide sequence of a guide nucleic acid that is an RNA sequence depending on the context, no matter whether it is marked as a DNA or an RNA in the electronic sequence listing.
[0043] As used herein, if a DNA sequence, for example, 5’-ATGC-3’, is transcribed to an RNA sequence, with each dT (deoxythymidine, or “T” for short) in the primary sequence of the DNA sequence replaced with a U (uridine) and each dA (deoxyadenosine, or “A” for short), dG (deoxyguanosine, or “G” for short), and dC (deoxycytidine, or “C” for short) replaced with A (adenosine), G (guanosine), and C (cytidine), respectively, resulting in 5’-AUGC-3’, it is said in the disclosure that the DNA sequence “encodes” the RNA sequence.
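The “encodes” relationship and the shared ST.26 representation can be illustrated with a small Python sketch. Both function names are hypothetical; the lower-case output reflects the ST.26 convention of lower-case nucleotide symbols, and the sketch handles only the unmodified bases A, C, G, T, and U.

```python
def transcribe(dna: str) -> str:
    """RNA sequence that a DNA sequence 'encodes' in the sense defined above (T -> U)."""
    return dna.upper().replace("T", "U")

def to_st26_residues(seq: str) -> str:
    """ST.26-style residue string, where 't' denotes T in DNA and U in RNA."""
    return seq.lower().replace("u", "t")

protospacer = "ATGC"             # DNA, 5' to 3'
guide = transcribe(protospacer)  # RNA, 5' to 3'

print(guide)                          # AUGC
print(to_st26_residues(guide))        # atgc
print(to_st26_residues(protospacer))  # atgc
```

Because both strings collapse to the same residues, a single listing entry can stand for the protospacer and for the guide sequence; the molecule type has to come from context.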
[0044] As used herein, the term “polypeptide” and “protein” are used interchangeably to refer to a polymer of amino acids of any length. The polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids. The terms also encompass a polymer of amino acids that has been modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component.
[0045] As used herein, the polypeptide of the disclosure may refer to any polypeptide in the disclosure, for example, a wild type or reference polypeptide in the disclosure, a mutant of a reference polypeptide in the disclosure, and more specifically, any one of the guide nucleic acid-guided DNA endonucleases of the disclosure and any derivative thereof, such as, a nickase or a high-efficiency mutant of any one of the guide nucleic acid-guided DNA endonucleases of the disclosure.
[0046] As used herein, the term “reference polypeptide” is used in the context of designing and developing a new polypeptide based on an original polypeptide (e.g., a wild-type polypeptide) . For example, the original polypeptide is mutated to generate the new polypeptide. In that case, the original polypeptide is a reference of the new polypeptide and termed as a reference polypeptide. The properties of a new polypeptide can be evaluated with the reference polypeptide as a reference from which the new polypeptide is derived. For example, one or more of the properties (e.g., endonuclease activity) of the new polypeptide can be compared with the reference polypeptide from which the new polypeptide is derived.
[0047] With respect to a polypeptide in the disclosure, the terms “variant” , “mutant” , and “engineered polypeptide” as used herein are used interchangeably to refer to a mutant of a reference polypeptide (e.g., a wild type polypeptide) generated by introducing an amino acid mutation (e.g., addition, deletion, substitution) into the reference polypeptide.
[0048] The amino acid sequence of a protein often starts with a most N-terminal Methionine (M) (i.e., at position 1), since the coding sequence for the protein would require a 5’ end start codon ATG to initiate its transcription and translation, and the start codon ATG encodes amino acid Met. If a start codon ATG is already separately present upstream of the coding sequence for a polypeptide with a most N-terminal Met at position 1, then the codon for the Met in the coding sequence may be omitted as needed and hence the amino acid Met at position 1 of the polypeptide is deleted. Usually, such deletion of Met at position 1 would not affect the function of the polypeptide. In another view, the Met encoded by the upstream start codon ATG and the polypeptide lacking the most N-terminal Met can together be regarded as the complete polypeptide having the N-terminal Met. In some embodiments, the polypeptide of the disclosure comprises a deletion of the most N-terminal Methionine (M) relative to a reference polypeptide. For convenience, when reference is made to any one of SEQ ID NOs: 1-11 or a derivative thereof or any other polypeptide in the disclosure, if Met is present at position 1 of any of those referred polypeptides, the reference is also made to an N-terminal truncation of any of those referred polypeptides lacking the most N-terminal Methionine (M). For convenience, when reference is made to any one of SEQ ID NOs: 1-11 or a derivative thereof or any other polypeptide in the disclosure, if Met is not present at position 1 of any of those referred polypeptides and a start codon ATG is essential for its expression, the reference is also made to the polypeptide with the addition of a most N-terminal Methionine (M).
[0049] As used herein, the description of a mutant “comprising an amino acid mutation (e.g., substitution) at a position of a reference polypeptide that is corresponding to a given position (e.g., D10) of a given polypeptide (e.g., S9D04) ” or similar description means that the mutant is a mutant of the reference polypeptide (which can be the given polypeptide or another polypeptide) and comprises an amino acid mutation to be introduced at a position of the reference polypeptide corresponding to the given position of the given polypeptide.
[0050] The position of the mutation to be introduced into the reference polypeptide may be the same as the given position of the given polypeptide, for example, when the mutant is a mutant of the given polypeptide.
[0051] The position of the mutation to be introduced into the reference polypeptide may be different from the given position of the given polypeptide. For example, the mutant may be a mutant of another polypeptide (as a reference polypeptide) different from the given polypeptide but the reference polypeptide is structurally similar to the given polypeptide and therefore the position of the mutation to be introduced into the reference polypeptide can be determined according to the given position of the given polypeptide, for example, by sequence alignment of the reference polypeptide and the given polypeptide. For example, the mutant is mutated from a reference polypeptide (e.g., S9D02) and comprises an amino acid mutation (e.g., substitution) at an XXX position of the reference polypeptide corresponding to YYY position of a given polypeptide (e.g., S9D04) , where the reference polypeptide and the given polypeptide are not identical but structurally similar, for example, the reference polypeptide and the given polypeptide are conservative at one or more amino acid residues. For example, S9D02 and S9D04 in the disclosure are two different polypeptides but structurally similar and conservative at D10 (numbered according to the sequence of S9D04) . Therefore, with respect to a mutant of S9D02 comprising an amino acid mutation at a position corresponding to D10 of S9D04, the position of the mutation to be introduced into S9D02 can be determined to be position D7 of S9D02 by sequence alignment of S9D02 and S9D04.
[0052] With respect to an amino acid residue of a polypeptide, by “conserved” or “conservative” it is meant that the amino acid residue is constant (not changed) across all indicated polypeptides. With respect to a motif of a polypeptide or a nucleic acid, by “conserved” or “conservative” it is meant that the motif is constant (not changed) across all indicated polypeptides or nucleic acids. As used herein, the term “motif” refers to a segment of a polypeptide or a nucleic acid, consisting of multiple (more than one) amino acids or nucleotides.
[0053] As used herein, the description of a mutant “comprising an amino acid mutation relative to a reference polypeptide that is corresponding to a given amino acid mutation (e.g., substitution, such as, X125Y) relative to a given polypeptide” means that the mutant is a mutant of the reference polypeptide (which can be the given polypeptide or another polypeptide) and comprises the same type of amino acid mutation (e.g., X-to-Y substitution) as the given amino acid mutation at a position in the reference polypeptide corresponding to the position (e.g., X125) of the given amino acid mutation (e.g., X125Y) numbered according to the given polypeptide. For example, a mutant comprising an amino acid substitution relative to S9D02 corresponding to D10A relative to S9D04 refers to the fact that S9D04 comprises amino acid D at position 10, and the mutant comprises amino acid A at position 7 of S9D02, from which the mutant is generated, corresponding to position 10 of S9D04. The corresponding relationship of positions in two or more amino acid sequences as determined by sequence alignment is explained in the previous paragraphs.
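The position correspondence described above can be sketched in Python for a pair of sequences that have already been aligned. The two aligned strings below are short invented toy sequences, not the actual S9D04 or S9D02 sequences (which appear only in the sequence listing); they are constructed so that D10 of the “given” sequence aligns with D7 of the “reference” sequence. The function name `map_position` is hypothetical.

```python
from typing import Optional

def map_position(aligned_given: str, aligned_ref: str, given_pos: int) -> Optional[int]:
    """Map a 1-based position in the given polypeptide to the aligned 1-based
    position in the reference polypeptide; None if it aligns to a gap ('-')."""
    if len(aligned_given) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    g = r = 0  # residues counted so far in each ungapped sequence
    for a, b in zip(aligned_given, aligned_ref):
        if a != "-":
            g += 1
        if b != "-":
            r += 1
        if a != "-" and g == given_pos:
            return r if b != "-" else None
    raise ValueError("position outside the given sequence")

# Toy alignment (assumption): the reference lacks three N-terminal residues.
given_aln = "MKRNYSIGLDIGTNS"
ref_aln   = "---MYSIGLDIGTNS"

pos = map_position(given_aln, ref_aln, 10)
print(pos)  # 7

# Introduce the corresponding D-to-A substitution into the ungapped reference.
ref = ref_aln.replace("-", "")
mutant = ref[:pos - 1] + "A" + ref[pos:]
print(ref[pos - 1], "->", mutant[pos - 1])  # D -> A
```

In practice the aligned strings would come from an alignment tool of the user's choice; this sketch only performs the coordinate bookkeeping.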
[0054] As used herein, “amino acid mutation” includes addition, deletion, and / or substitution. Insertion is also a kind of addition that often occurs within a polypeptide. Truncation is also a kind of deletion. N-terminal or C-terminal truncation / deletion means the truncation / deletion occurs at the N-terminus or C-terminus of a polypeptide. Typically, C-terminal truncation / deletion refers to a deletion of one or more amino acids starting from the most C-terminal amino acid of a polypeptide towards the N-terminus of the polypeptide. Typically, N-terminal truncation / deletion refers to a deletion of one or more amino acids starting from the most N-terminal amino acid of a polypeptide towards the C-terminus of the polypeptide. Alternatively, in some embodiments, the most N-terminal amino acid of a polypeptide is Met (corresponding to the start codon ATG of a nucleic acid encoding the polypeptide), and the N-terminal truncation / deletion starts from the second amino acid of the polypeptide, immediately downstream of (C-terminal to) the Met, towards the C-terminus of the polypeptide. As a specific but non-limiting example, an N-terminal truncation of 20 amino acids of a polypeptide refers to a deletion of amino acids at positions 1-20 of the polypeptide, or in some embodiments, a deletion of amino acids at positions 2-21 of the polypeptide where the most N-terminal amino acid of the polypeptide is Met and retained after the deletion.
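The two readings of an N-terminal truncation of 20 amino acids can be made concrete with a brief Python sketch. The function names and the 25-residue toy sequence are assumptions for illustration only.

```python
def n_terminal_truncation(protein: str, n: int, keep_initial_met: bool = False) -> str:
    """Delete n residues from the N-terminus.

    With keep_initial_met=True and a leading Met, positions 2..n+1 are deleted
    and the Met at position 1 is retained; otherwise positions 1..n are deleted.
    """
    if n < 0 or n > len(protein):
        raise ValueError("n must be between 0 and the protein length")
    if keep_initial_met and protein.startswith("M"):
        return "M" + protein[1 + n:]
    return protein[n:]

def c_terminal_truncation(protein: str, n: int) -> str:
    """Delete n residues from the C-terminus."""
    if n < 0 or n > len(protein):
        raise ValueError("n must be between 0 and the protein length")
    return protein[:len(protein) - n]

toy = "MACDEFGHIKLMNPQRSTVWYGGSS"  # invented 25-residue sequence

print(n_terminal_truncation(toy, 20))                         # YGGSS  (positions 1-20 deleted)
print(n_terminal_truncation(toy, 20, keep_initial_met=True))  # MGGSS  (positions 2-21 deleted)
print(c_terminal_truncation(toy, 20))                         # MACDE
```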
[0055] As used herein, a “conservative substitution” refers to a substitution of an amino acid made among amino acids within one of the following four groups:
[0056] (1) non-polar amino acids, including Glycine (Gly / G) , Alanine (Ala / A) , Valine (Val / V) , Cysteine (Cys / C) , Proline (Pro / P) , Leucine (Leu / L) , Isoleucine (Ile / I) , Methionine (Met / M) , Tryptophan (Trp / W) , and Phenylalanine (Phe / F) ;
[0057] (2) negatively charged amino acids, including Aspartic Acid (Asp / D) and Glutamic Acid (Glu / E) ;
[0058] (3) polar amino acids, including Serine (Ser / S) , Threonine (Thr / T) , Tyrosine (Tyr / Y) , Asparagine (Asn / N) , and Glutamine (Gln / Q) ; and
[0059] (4) positively charged amino acids, including Lysine (Lys / K) , Arginine (Arg / R) , and Histidine (His / H) .
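The four groups above translate directly into a lookup table. The following Python sketch, with hypothetical names, checks whether a substitution between two one-letter amino acid codes is conservative under this definition; it covers only the twenty amino acids listed.

```python
# The four groups as listed above, using one-letter codes.
CONSERVATIVE_GROUPS = {
    "non-polar": set("GAVCPLIMWF"),
    "negatively charged": set("DE"),
    "polar": set("STYNQ"),
    "positively charged": set("KRH"),
}

def group_of(aa: str) -> str:
    """Name of the group containing the given one-letter amino acid code."""
    for name, members in CONSERVATIVE_GROUPS.items():
        if aa in members:
            return name
    raise ValueError(f"unknown amino acid: {aa!r}")

def is_conservative_substitution(original: str, replacement: str) -> bool:
    """True if the two residues differ but belong to the same group."""
    return original != replacement and group_of(original) == group_of(replacement)

print(is_conservative_substitution("D", "E"))  # True
print(is_conservative_substitution("D", "A"))  # False
```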
[0060] As used herein, the terms “nucleic acid” and “polynucleotide” are used interchangeably. They refer to a polymer of deoxyribonucleotides or ribonucleotides or their mixtures of any length in either single- or double-stranded form, and, unless otherwise stated, encompass known analogs of natural nucleotides that can function in a similar manner as naturally occurring nucleotides. The terms encompass nucleic acid-like structures with synthetic backbones, as well as amplification products. DNAs and RNAs are both polynucleotides. The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine), chemically modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, modified sugars (e.g., 2′-fluororibose, ribose, 2′-deoxyribose, arabinose, and hexose), or modified phosphate groups (e.g., phosphorothioates and 5′-N-phosphoramidite linkages). The terms include both modified and unmodified forms.
[0061] As used herein, the terms “programmable nucleic acid” and “guide nucleic acid” are used interchangeably and refer to a nucleic acid-based molecule capable of guiding a polypeptide (for example, the gRNA-guided DNA endonucleases or derivatives thereof of the disclosure) to a target nucleic acid, by comprising a scaffold sequence capable of forming a complex with the polypeptide and comprising a guide sequence capable of hybridizing to the target nucleic acid. The terms include, but are not limited to, RNA-based molecules, e.g., a guide RNA.
[0062] As used herein, the terms “programmable RNA”, “RNA guide”, “guide RNA”, and “gRNA” are used interchangeably and refer to an RNA-based molecule capable of guiding a polypeptide (for example, the gRNA-guided DNA endonucleases or derivatives thereof of the disclosure) to a target nucleic acid, by comprising a scaffold sequence capable of forming a complex with the polypeptide and comprising a guide sequence capable of hybridizing to the target nucleic acid.
[0063] As used in the disclosure, the term “guide sequence” is used interchangeably with the term “spacer sequence” or “spacer” .
[0064] As used herein, the term “complex” refers to a grouping of two or more molecules, e.g., grouping of a polypeptide in the disclosure and a guide nucleic acid in the disclosure (via the scaffold sequence of the guide nucleic acid) . In some embodiments, the complex comprises a nucleic acid and a polypeptide interacting with (e.g., binding to, coming into contact with, adhering to) one another. As used herein, the term “complex” can refer to a grouping of a guide nucleic acid and a polypeptide. As used herein, the term “complex” can refer to a grouping of a guide nucleic acid, a polypeptide, and a target nucleic acid (e.g., a target DNA) .
[0065] With respect to a scaffold sequence in the disclosure, the terms “variant” and “mutant” are used interchangeably to refer to a mutant of a reference scaffold sequence (e.g., a scaffold sequence in Table 1) generated by introducing a nucleotide mutation (e.g., addition, deletion, substitution) into the reference scaffold sequence.
[0066] As described herein, the guide sequence is designed to be capable of hybridizing to a target sequence of a target DNA. As used herein, the term “hybridize”, “hybridizing”, or “hybridization” refers to a reaction in which one or more polynucleotide sequences react to form a complex that is stabilized via hydrogen bonding between the bases of the one or more polynucleotide sequences. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. As used herein, the hybridization of a guide sequence and a target sequence is stabilized sufficiently to permit a polypeptide that is complexed with a guide nucleic acid comprising the guide sequence to act (e.g., cleave, deaminize) at or near the target sequence or its complement.
[0067] For the purpose of hybridization, in some embodiments, the guide sequence is reverse complementary to a target sequence. As used herein, the term “reverse complementary” refers to the ability of nucleobases of a first polynucleotide sequence, such as a guide sequence, to base pair with nucleobases of a second polynucleotide sequence, such as a target sequence, by traditional Watson-Crick base-pairing. Two reverse complementary polynucleotide sequences are able to non-covalently bind under appropriate temperature and solution ionic strength conditions. In some embodiments, a first polynucleotide sequence (e.g., a guide sequence) comprises 100% (fully) reverse complementarity to a second polynucleotide sequence (e.g., a target sequence). In some embodiments, a first polynucleotide sequence (e.g., a guide sequence) is reverse complementary to a second polynucleotide sequence (e.g., a target sequence) if the first polynucleotide sequence comprises at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementarity to the second polynucleotide sequence (i.e., at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the nucleotides of the first polynucleotide sequence can base-pair with the nucleotides of the second polynucleotide sequence). As used herein, the term “substantially complementary” refers to a first polynucleotide sequence (e.g., a guide sequence) that has a certain level of complementarity to a second polynucleotide sequence (e.g., a target sequence) (e.g., at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the first polynucleotide sequence can base-pair with the second polynucleotide sequence, or at most 1, 2, 3, 4, or 5 contiguous or non-contiguous nucleotides of the first polynucleotide sequence mismatch the nucleotides of the second polynucleotide sequence). In some embodiments, the level of complementarity is such that the first polynucleotide sequence (e.g., a guide sequence) can hybridize to the second polynucleotide sequence (e.g., a target sequence) with sufficient affinity to permit a polypeptide that is complexed with the first polynucleotide sequence or a nucleic acid comprising the first polynucleotide sequence to act (e.g., cleave, deaminate) on the target sequence or its complement. In some embodiments, a guide sequence that is substantially complementary to a target sequence has 100% or less than 100% complementarity to the target sequence. In some embodiments, a guide sequence that is substantially complementary to a target sequence has at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementarity to the target sequence, and/or has at most 1, 2, 3, 4, or 5 contiguous or non-contiguous nucleotide mismatches from the target sequence.
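The percent complementarity and mismatch count described above can be illustrated with a minimal Python sketch. It assumes equal-length guide and target sequences written 5’ to 3’, ungapped Watson-Crick pairing only, and treats U as T; the function names and example sequences are hypothetical and are not taken from the disclosure.

```python
# Watson-Crick partners; "U" is accepted so RNA guides can be passed in.
COMPLEMENT = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of seq, written 5' to 3' in DNA letters."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def complementarity(guide: str, target: str) -> tuple[float, int]:
    """Return (percent complementarity, mismatch count).

    Assumption of this sketch: both sequences have the same length and
    pair without gaps, so guide position i pairs with target position
    len - 1 - i.
    """
    if len(guide) != len(target) or not guide:
        raise ValueError("this sketch assumes non-empty, equal-length sequences")
    guide_dna = guide.upper().replace("U", "T")
    perfect_guide = reverse_complement(target)
    mismatches = sum(1 for g, p in zip(guide_dna, perfect_guide) if g != p)
    percent = 100.0 * (len(guide_dna) - mismatches) / len(guide_dna)
    return percent, mismatches


# Hypothetical 20-nt target and a guide carrying one mismatch at its 5' end.
target = "ATGCCGTAAGCTTGACCGTA"
perfect = reverse_complement(target)
one_mismatch = ("A" if perfect[0] != "A" else "C") + perfect[1:]
print(complementarity(perfect, target))
print(complementarity(one_mismatch, target))
```

Under these assumptions, a guide with at most 5 mismatches over 20 nucleotides would fall at or above 75% complementarity, so the percentage threshold and the mismatch-count threshold in the definition are separate criteria that may be applied alone or together.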
[0068] With respect to a system in the disclosure comprising a polypeptide (e.g., S9D04 or a derivative thereof) in the disclosure and a guide nucleic acid in the disclosure, wild type system may be used to refer to a system comprising a wild type polypeptide in the disclosure (e.g., S9D04 of SEQ ID NO: 4 in Table 1) and a guide nucleic acid comprising the scaffold sequence in the disclosure corresponding to the wild type polypeptide (e.g., the scaffold sequence of SEQ ID NO: 37 or 48 in Table 1), and variant system may be used to refer to a system comprising a derivative of a wild type polypeptide (e.g., a fusion protein comprising S9D04) and/or a guide nucleic acid comprising a derivative of a wild type scaffold sequence (e.g., a mutant of the scaffold sequence of SEQ ID NO: 48 in Table 1).
[0069] As used herein, the terms “protospacer adjacent motif (PAM)” and “target adjacent motif (TAM)” are used interchangeably and refer to a short nucleotide sequence (or a motif) that is recognizable by a polypeptide of the disclosure and that lies immediately 3’ to a protospacer sequence on the nontarget strand of a target dsDNA.
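The geometry in this definition (motif immediately 3’ to the protospacer, on the nontarget strand) can be sketched as a simple scan in Python. The motif `NGG`, the protospacer length, and the function name below are placeholders chosen for illustration only; they are not the PAM or spacer length of any polypeptide of the disclosure, and the sketch scans only the strand it is given.

```python
import re

# Small IUPAC subset, sufficient for this sketch; extend as needed.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}


def find_protospacers(nontarget_strand: str, pam: str,
                      protospacer_len: int) -> list[tuple[int, str, str]]:
    """Return (0-based start, protospacer, matched motif) for each site where
    the motif sits immediately 3' of a protospacer on the given strand."""
    strand = nontarget_strand.upper()
    pam_regex = re.compile("".join(IUPAC[code] for code in pam.upper()))
    hits = []
    last_start = len(strand) - protospacer_len - len(pam)
    for start in range(last_start + 1):
        pam_start = start + protospacer_len
        candidate = strand[pam_start:pam_start + len(pam)]
        if pam_regex.fullmatch(candidate):
            hits.append((start, strand[start:pam_start], candidate))
    return hits


# Hypothetical toy strand, placeholder motif, 4-nt protospacer for brevity.
print(find_protospacers("ACGTAGGTT", "NGG", 4))
```

To cover both strands of a dsDNA, the same function could be called a second time on the reverse complement of the input.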
[0070] As used herein, the term “identity” or “sequence identity” refers to the overall relatedness between polymeric molecules, e.g., between nucleic acids (e.g., DNA and/or RNA) and/or between polypeptides. In some embodiments, polymeric molecules are considered to be “substantially identical” to one another if their sequences are at least 80%, 85%, 90%, 95%, or 99% identical. Calculation of the percent identity of two nucleic acids or polypeptides, for example, can be performed by aligning the two sequences for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second sequence for optimal alignment, and non-identical sequences can be disregarded for comparison purposes). In certain embodiments, the length of a sequence aligned for comparison purposes is at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or 100% of the length of a reference sequence. The nucleotides or amino acids at corresponding positions are then compared. The comparison of sequences and determination of percent identity between two sequences can be accomplished using a mathematical algorithm. As is well known in the art, nucleic acids or polypeptides may be compared using any of a variety of algorithms, including those available in commercially available computer programs such as BLASTN for nucleotide sequences and BLASTP, gapped BLAST, and PSI-BLAST for amino acid sequences. In some embodiments, the sequence identity is calculated by global alignment, for example, using the Needleman-Wunsch algorithm, for example, using an online tool at ebi.ac.uk/Tools/psa/emboss_needle/. In some embodiments, the sequence identity is calculated by local alignment, for example, using the Smith-Waterman algorithm, for example, using an online tool at ebi.ac.uk/Tools/psa/emboss_water/.
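As an illustration of identity calculated by global alignment, the following Python sketch implements a basic Needleman-Wunsch alignment and reports identity as identical aligned positions divided by alignment length. The scoring values (match +1, mismatch -1, linear gap -1) and the example sequences are assumptions of this sketch. The EMBOSS needle tool uses substitution matrices and affine gap penalties, so its reported values can differ.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[str, str]:
    """Globally align a and b; return the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of alignment length."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        raise ValueError("at least one sequence must be non-empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy peptides differing at one position.
print(percent_identity("MKTAYIAK", "MKTGYIAK"))
```

Dividing by alignment length (gaps included) is one common convention; dividing by the length of the reference sequence is another, and the two give different values whenever the alignment contains gaps.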
[0071] As used herein, the terms “upstream” and “downstream” refer to relative positions within a single nucleic acid (e.g., DNA) or within a single polypeptide. “Upstream” and “downstream” relate to the 5’ to 3’ direction of a single nucleic acid, in which transcription occurs, or to the N-to-C orientation of a single polypeptide, in which translation occurs. For a first sequence and a second sequence present on the same strand of a single nucleic acid written in 5’ to 3’ direction or a single polypeptide written in N-to-C orientation, the first sequence is upstream of the second sequence when the 3’ end or C-terminus of the first sequence is on the left side of the 5’ end or N-terminus of the second sequence, and the first sequence is downstream of the second sequence when the 5’ end or N-terminus of the first sequence is on the right side of the 3’ end or C-terminus of the second sequence. For example, a promoter is usually upstream of a coding sequence under the regulation of the promoter; conversely, a coding sequence under the regulation of a promoter is usually downstream of the promoter.
[0072] As used herein, the term “regulatory element” refers to a DNA sequence that controls or impacts one or more aspects of transcription and/or expression and is intended to include promoters, enhancers, silencers, termination signals, internal ribosome entry sites (IRES), and other expression control elements (e.g., transcription termination signals such as polyadenylation signals and poly-U sequences). Regulatory elements include those that direct constitutive expression of a nucleotide sequence in many types of host cells and those that direct expression of a nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). Regulatory elements may also direct expression in a time-dependent manner, e.g., in a cell cycle-dependent or developmental stage-dependent manner, which may or may not be tissue or cell type specific.
[0073] As used herein, the term “operably linked” refers to a juxtaposition wherein the components described are in a relationship permitting them to function in their intended manner. A regulatory element “operably linked” to a functional element is associated in such a way that transcription, expression, and/or activity of the functional element is achieved under conditions compatible with the regulatory element. In some embodiments, “operably linked” regulatory elements are contiguous (e.g., covalently linked) with the functional elements of interest; in some embodiments, regulatory elements act in trans to or otherwise at a distance from the functional elements of interest.
[0074] As used herein, the term “cell” is understood to refer not only to a particular individual cell, but to the progeny or potential progeny of the cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term.
[0075] As used herein, the term “in vivo” means inside the body of an organism, and the terms “ex vivo” and “in vitro” mean outside the body of an organism.
[0076] As used herein, the term “treat”, “treatment”, or “treating” refers to an approach for obtaining beneficial or desired results including clinical results. For purposes of the disclosure, the beneficial or desired clinical results include, but are not limited to, one or more of the following: alleviating one or more symptoms resulting from a disease, diminishing the extent of a disease, stabilizing a disease (e.g., delaying the worsening of a disease), delaying the spread (e.g., metastasis) of a disease, delaying the recurrence of a disease, reducing the recurrence rate of a disease, delaying or slowing the progression of a disease, ameliorating a disease state, providing a remission (partial or total) of a disease, decreasing the dose of one or more other medications required to treat a disease, delaying the progression of a disease, increasing the quality of life, and prolonging survival. Also encompassed by the term is a reduction of a pathological consequence of a disease (such as cancer). The methods of the disclosure contemplate any one or more of these aspects of treatment.
[0077] As used herein, the term “disease” includes the terms “disorder” and “condition” and is not limited to those specific diseases that have been medically or clinically defined.
[0078] As used herein, reference to “not” a value or parameter generally means and describes “other than” a value or parameter. For example, “the method is not used to treat cancer of type X” means that the method may be used to treat cancer of types other than X.
[0079] As used herein, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. That is, articles “a/an” and “the” are used herein to refer to one or more than one (i.e., at least one) grammatical object of the article. For example, “an element” means one element or more than one element, e.g., two elements.
[0080] As used herein, the term “and/or” in a phrase such as “A and/or B” is intended to mean either or both of the alternatives, including both A and B, A or B, A (alone), and B (alone). Likewise, the term “and/or” in a phrase such as “A, B, and/or C” is intended to encompass each of the following embodiments: A, B, and C; A, B, or C; A or C; A or B; B or C; A and C; A and B; B and C; A (alone); B (alone); and C (alone).
[0081] As used herein, when the term “about” precedes a series of numbers (for example, about 1, 2, 3), it is understood that each number in the series is modified by the term “about” (that is, about 1, about 2, about 3). The term “about X-Y” used herein has the same meaning as “about X to about Y.”
[0082] As used herein, a numerical range includes the end values of the range and each specific value within the range, for example, “16 to 100 nucleotides” includes 16 nucleotides and 100 nucleotides and each specific value between 16 and 100, e.g., 17, 23, 34, 52, 78.
[0083] It is understood that embodiments of the disclosure described herein include “consisting of” and/or “consisting essentially of” embodiments.
[0084] It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely”, “only”, and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
[0085] I. Overview
[0086] The disclosure provides, in part, guide nucleic acid-guided DNA endonucleases including SEQ ID NOs: 4, 1-3, and 5-11, and derivatives (including mutants and fusion proteins) thereof, systems comprising the guide nucleic acid-guided DNA endonucleases or the derivatives, and uses thereof.
[0087] The guide nucleic acid-guided DNA endonucleases and derivatives thereof provided in the disclosure are CRISPR-associated proteins (Cas proteins). The systems provided in the disclosure are CRISPR-Cas systems.
[0088] II. Representative polypeptides
[0089] In an aspect, the disclosure provides a guide nucleic acid-guided DNA endonuclease as set forth in any one of SEQ ID NOs: 4, 1-3, and 5-11. In some embodiments, the disclosure provides a wild type polypeptide as set forth in any one of SEQ ID NOs: 4, 1-3, and 5-11. In some embodiments, the disclosure provides a reference polypeptide as set forth in any one of SEQ ID NOs: 4, 1-3, and 5-11. In some embodiments, the disclosure provides a reference polypeptide that is a polypeptide in the disclosure.
[0090] In another aspect, the disclosure provides a polypeptide comprising an amino acid sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 4, 1-3, and 5-11.
[0091] The derivatives of the guide nucleic acid-guided DNA endonucleases of the disclosure include, but are not limited to, mutants of the guide nucleic acid-guided DNA endonucleases, and fusion proteins comprising the guide nucleic acid-guided DNA endonucleases or the mutants.
[0092] (I) Mutants
[0093] In some embodiments, the polypeptide provided in the disclosure is a mutant of a reference polypeptide (which could be any polypeptide in the disclosure, e.g., any one of SEQ ID NOs: 4, 1-3, and 5-11) in the disclosure, i.e., comprising an (e.g., one or more) amino acid substitution relative to (compared to) the reference polypeptide.
[0094] In some embodiments, the polypeptide has one or more of the following properties, without limitation, compared to the reference polypeptide:
[0095] (1) higher endonuclease activity;
[0096] (2) lower endonuclease activity;
[0097] (3) higher nickase activity;
[0098] (4) higher ability to bind to a target dsDNA;
[0099] (5) higher base editing efficiency when used in a base editor for base editing;
[0100] (6) higher prime editing efficiency when used in a prime editor for prime editing;
[0101] (7) higher epigenomic editing efficiency when used in an epigenomic editor for epigenomic editing;
[0102] (8) higher transcription activating efficiency when used in a transcriptional activator for transcriptional activation;
[0103] (9) lower off-target endonuclease activity;
[0104] (10) lower off-target nickase activity;
[0105] (11) wider PAM recognition;
[0106] (12) higher on-target editing; and
[0107] (13) lower off-target editing.
[0108] In some embodiments, the polypeptide comprises an amino acid substitution relative to (compared to) the amino acid sequence of any one of SEQ ID NOs: 4, 1-3, and 5-11.
[0109] In some embodiments, the polypeptide comprising the amino acid substitution comprises an amino acid sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%) and less than 100% to the amino acid sequence of any one of SEQ ID NOs: 4, 1-3, and 5-11.
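The relationship between amino acid substitutions and the identity window (at least about 80% and less than 100%) can be sketched in Python as follows. The sketch assumes substitution-only mutants, so the mutant and the reference have equal length and identity can be computed position by position without alignment. It treats “about 80%” as exactly 80.0, and it uses a hypothetical 10-residue toy sequence rather than any sequence of the disclosure.

```python
def apply_substitutions(reference: str, substitutions: dict[int, str]) -> str:
    """Return a mutant of reference with the given 1-based substitutions."""
    residues = list(reference)
    for position, new_residue in substitutions.items():
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the reference")
        if residues[position - 1] == new_residue:
            raise ValueError(f"position {position} already has {new_residue}")
        residues[position - 1] = new_residue
    return "".join(residues)


def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences, position by position."""
    if len(a) != len(b) or not a:
        raise ValueError("this sketch assumes non-empty, equal-length sequences")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def is_mutant_in_window(reference: str, candidate: str,
                        minimum: float = 80.0) -> bool:
    """True when identity is at least `minimum` and strictly below 100%."""
    return minimum <= ungapped_identity(reference, candidate) < 100.0


reference = "MKTAYIAKQR"                                # hypothetical toy reference
single = apply_substitutions(reference, {4: "G"})        # one substitution
triple = apply_substitutions(reference, {1: "A", 4: "G", 10: "K"})
print(single, ungapped_identity(reference, single), is_mutant_in_window(reference, single))
print(triple, ungapped_identity(reference, triple), is_mutant_in_window(reference, triple))
```

The strict upper bound excludes the unmodified reference itself, which is why a sequence with 100% identity is not counted as a mutant in this check.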
[0110] In some embodiments, the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 
375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 
775, 776, 777, 778, 779, 780, 781, 782, 783, 784, 785, 786, 787, and 788 of SEQ ID NO: 1.
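A position in one polypeptide that “corresponds to” a position of SEQ ID NO: 1 is commonly identified from a pairwise alignment. The Python sketch below assumes the alignment has already been computed and is supplied as two gapped strings of equal length; the function name and the toy alignment are hypothetical and do not represent any sequence of the disclosure.

```python
from typing import Optional


def corresponding_positions(aligned_ref: str,
                            aligned_query: str) -> dict[int, Optional[int]]:
    """Map each 1-based reference position to its 1-based query position.

    A reference residue aligned to a gap in the query maps to None.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned strings must have the same length")
    mapping: dict[int, Optional[int]] = {}
    ref_pos = 0
    query_pos = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if query_char != "-":
            query_pos += 1
        if ref_char != "-":
            ref_pos += 1
            mapping[ref_pos] = query_pos if query_char != "-" else None
    return mapping


# Toy alignment: the query has one insertion and one deletion vs. the reference.
mapping = corresponding_positions("MK-TAY", "MKQT-Y")
print(mapping)
```

In this toy case, reference position 3 corresponds to query position 4 because of the upstream insertion, and reference position 4 has no counterpart in the query.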
[0111] In some embodiments, the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 
375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, and 699 of SEQ ID NO: 2.
[0112] In some embodiments, the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 
375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, and 750 of SEQ ID NO: 3.
[0113] In some embodiments, the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 
375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, and 735 of SEQ ID NO: 4.
[0114] In some embodiments, the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 
375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, and 744 of SEQ ID NO: 5.
[0115] In some embodiments, the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 
375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, and 673 of SEQ ID NO: 6.
[0116] In some embodiments, the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 
375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, and 681 of SEQ ID NO: 7.
[0117] In some embodiments, the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 
375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, and 641 of SEQ ID NO: 8.
[0118] In some embodiments, the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 
375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, and 653 of SEQ ID NO: 9.
[0119] In some embodiments, the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 
375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, and 644 of SEQ ID NO: 10.
[0120] In some embodiments, the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position or that is a position selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 
375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, and 647 of SEQ ID NO: 11.
[0121] In some embodiments, the amino acid substitution is a conservative amino acid substitution or a non-conservative amino acid substitution.
[0122] In some embodiments, the amino acid substitution is an amino acid substitution with an amino acid residue that is different from the original amino acid residue at the position of any one of SEQ ID NOs: 4, 1-3, and 5-11.
[0123] In some embodiments, the amino acid substitution is an amino acid substitution with
[0124] (1) a non-polar amino acid residue (such as Glycine (Gly / G), Alanine (Ala / A), Valine (Val / V), Cysteine (Cys / C), Proline (Pro / P), Leucine (Leu / L), Isoleucine (Ile / I), Methionine (Met / M), Tryptophan (Trp / W), Phenylalanine (Phe / F)),
[0125] (2) a polar amino acid residue (such as Serine (Ser / S), Threonine (Thr / T), Tyrosine (Tyr / Y), Asparagine (Asn / N), Glutamine (Gln / Q)),
[0126] (3) a positively charged amino acid residue (such as Lysine (Lys / K), Arginine (Arg / R), Histidine (His / H)), or
[0127] (4) a negatively charged amino acid residue (such as Aspartic Acid (Asp / D), Glutamic Acid (Glu / E)).
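The four residue classes in items (1)-(4) can be written as a small lookup table. The following is a minimal sketch, not part of the disclosure: the names `RESIDUE_CLASSES`, `residue_class`, and `is_same_class` are hypothetical, and treating "same class" as a stand-in for a conservative substitution is an assumption made only for illustration.

```python
# Residue classes as enumerated in items (1)-(4), using one-letter codes.
RESIDUE_CLASSES = {
    "non-polar": set("GAVCPLIMWF"),
    "polar": set("STYNQ"),
    "positively charged": set("KRH"),
    "negatively charged": set("DE"),
}


def residue_class(residue: str) -> str:
    """Return the class name for a one-letter amino acid code."""
    for name, members in RESIDUE_CLASSES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def is_same_class(original: str, replacement: str) -> bool:
    """Illustrative proxy (an assumption): both residues fall in the same class."""
    return residue_class(original) == residue_class(replacement)
```

Under this grouping, a K-to-R substitution stays within the positively charged class, whereas an E-to-R substitution moves from the negatively charged class to the positively charged class.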
[0128] In some embodiments, the amino acid substitution is an amino acid substitution with a positively charged amino acid residue, such as Arginine (R).
[0129] In some embodiments, the amino acid substitution is an amino acid substitution with a non-polar amino acid residue, such as Alanine (A).
[0130] In some embodiments, the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position selected from the group consisting of Y11, K14, Y15, N25, Q27, E35, G46, E49, G52, I53, T56, Q57, Q58, G75, Y76, D77, S79, T80, K84, Y95, A98, D99, M100, P103, E104, E105, I106, E107, E108, K115, T120, Q121, N123, S124, A132, K136, D145, H147, K149, T170, K173, G176, K177, I193, L199, N220, N229, Q230, D232, W233, K236, N237, D240, A242, K254, E255, E261, K262, T265, K268, A274, L277, K278, T286, K289, D290, N295, A297, D299, F308, A311, A312, K313, G316, K317, L320, A321, K322, D323, E324, Y325, K327, G328, N329, E331, G333, K337, A339, G346, N353, S358, S373, N374, F376, L379, A405, A408, K411, N412, D413, K415, K416, K417, D420, K423, S424, I425, N426, Q427, N431, K432, S435, H436, A437, Y469, G475, K477, K478, A481, D483, K484, H485, H486, E502, S514, S528, K529, E530, K532, D534, S536, S537, Y559, E563, F585, N592, K616, C617, T618, K624, G625, L629, G631, L648, C657, E658, K660, V665, E670, D673, K676, I679, K684, K685, N695, L700, E704, D709, W712, T714, S716, K719, K720, and T725 of SEQ ID NO: 4. In some embodiments, the amino acid substitution is an amino acid substitution with Arginine (R).
[0131] In some embodiments, the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position selected from the group consisting of K14, Q57, K136, T286, A311, N374, L379, K484, E530, Y559, N592, E658, K685, and E704 of SEQ ID NO: 4. In some embodiments, the amino acid substitution is an amino acid substitution with Arginine (R).
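A position in one sequence that "corresponds to" a position of SEQ ID NO: 4 is commonly identified from a pairwise alignment of the two sequences. The sketch below assumes such an alignment has already been computed by some alignment tool; this passage does not specify the alignment method, and the function name `corresponding_position` and the toy sequences in the usage note are hypothetical.

```python
def corresponding_position(aligned_ref: str, aligned_query: str, ref_pos: int):
    """Map a 1-based reference position to the 1-based query position.

    Both inputs are rows of one pairwise alignment (equal length, '-' marks a gap).
    Returns None when the reference residue is aligned to a gap in the query.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_index = 0    # residues of the reference consumed so far
    query_index = 0  # residues of the query consumed so far
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            ref_index += 1
        if query_char != "-":
            query_index += 1
        if ref_char != "-" and ref_index == ref_pos:
            return query_index if query_char != "-" else None
    raise ValueError("ref_pos is outside the reference sequence")
```

For example, with the toy alignment rows `"MKD-GYK"` (reference) and `"M-DAGYR"` (query), reference position 3 maps to query position 2, and reference position 2 has no counterpart because it faces a gap.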
[0132] In some embodiments, the polypeptide comprises amino acid substitutions at positions of any one of SEQ ID NOs: 4, 1-3, and 5-11 that are corresponding to the positions of SEQ ID NO: 4 selected from the group consisting of
[0133] 1) A311, E530, E658, N592, and K14;
[0134] 2) E530, E658, and K14;
[0135] 3) A311, E530, and E658;
[0136] 4) A311, E530, E658, and Q57;
[0137] 5) A311, E530, E658, and E704;
[0138] 6) A311, E530, E658, and N374;
[0139] 7) A311, E530, E658, Q57, and E704;
[0140] 8) A311, E530, E658, E704, and N374;
[0141] 9) A311, E530, E658, Q57, E704, and N374;
[0142] 10) A311, E530, E658, Q57, and N374;
[0143] 11) A311, E530, E658, L379, and K484;
[0144] 12) A311, E530, E658, and K685;
[0145] 13) A311, E530, E658, K136, and T286;
[0146] 14) A311, E530, E658, Q57, E704, L379, and K484;
[0147] 15) A311, E530, E658, Q57, E704, and K685;
[0148] 16) A311, E530, E658, Q57, E704, K136, and T286;
[0149] 17) A311, E530, E658, Q57, L379, and K484;
[0150] 18) A311, E530, E658, Q57, and K685;
[0151] 19) A311, E530, E658, Q57, K136, and T286;
[0152] 20) A311, E530, E658, N374, L379, and K484;
[0153] 21) A311, E530, E658, N374, and K685; and
[0154] 22) A311, E530, E658, N374, K136, and T286. In some embodiments, the amino acid substitution is an amino acid substitution with Arginine (R).
[0155] In some embodiments, the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position selected from the group consisting of D10, G12, E304, F308, H486, and D489 of SEQ ID NO: 4. In some embodiments, the amino acid substitution is an amino acid substitution with a non-polar amino acid residue, such as Alanine (A).
[0156] In some embodiments, the polypeptide comprises amino acid substitutions at positions of any one of SEQ ID NOs: 4, 1-3, and 5-11 that are corresponding to the positions of SEQ ID NO: 4 selected from the group consisting of
[0157] 1) D10, A311, E530, E658, N592, and K14;
[0158] 2) D10, E530, E658, and K14;
[0159] 3) D10, A311, E530, and E658;
[0160] 4) D10, A311, E530, E658, and Q57;
[0161] 5) D10, A311, E530, E658, and E704;
[0162] 6) D10, A311, E530, E658, and N374;
[0163] 7) D10, A311, E530, E658, Q57, and E704;
[0164] 8) D10, A311, E530, E658, E704, and N374;
[0165] 9) D10, A311, E530, E658, Q57, E704, and N374;
[0166] 10) D10, A311, E530, E658, Q57, and N374;
[0167] 11) D10, A311, E530, E658, L379, and K484;
[0168] 12) D10, A311, E530, E658, and K685;
[0169] 13) D10, A311, E530, E658, K136, and T286;
[0170] 14) D10, A311, E530, E658, Q57, E704, L379, and K484;
[0171] 15) D10, A311, E530, E658, Q57, E704, and K685;
[0172] 16) D10, A311, E530, E658, Q57, E704, K136, and T286;
[0173] 17) D10, A311, E530, E658, Q57, L379, and K484;
[0174] 18) D10, A311, E530, E658, Q57, and K685;
[0175] 19) D10, A311, E530, E658, Q57, K136, and T286;
[0176] 20) D10, A311, E530, E658, N374, L379, and K484;
[0177] 21) D10, A311, E530, E658, N374, and K685; and
[0178] 22) D10, A311, E530, E658, N374, K136, and T286.
[0179] In some embodiments, the polypeptide comprises an amino acid substitution relative to any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to an amino acid substitution selected from the group consisting of K14R, Q57R, K136R, T286R, A311R, N374R, L379R, K484R, E530R, N592R, E658R, K685R, and E704R relative to SEQ ID NO: 4.
[0180] In some embodiments, the polypeptide comprises amino acid substitutions relative to any one of SEQ ID NOs: 4, 1-3, and 5-11 that are corresponding to the amino acid substitutions relative to SEQ ID NO: 4 selected from the group consisting of
[0181] 1) A311R, E530R, E658R, N592R, and K14R;
[0182] 2) E530R, E658R, and K14R;
[0183] 3) A311R, E530R, and E658R;
[0184] 4) A311R, E530R, E658R, and Q57R;
[0185] 5) A311R, E530R, E658R, and E704R;
[0186] 6) A311R, E530R, E658R, and N374R;
[0187] 7) A311R, E530R, E658R, Q57R, and E704R;
[0188] 8) A311R, E530R, E658R, E704R, and N374R;
[0189] 9) A311R, E530R, E658R, Q57R, E704R, and N374R;
[0190] 10) A311R, E530R, E658R, Q57R, and N374R;
[0191] 11) A311R, E530R, E658R, L379R, and K484R;
[0192] 12) A311R, E530R, E658R, and K685R;
[0193] 13) A311R, E530R, E658R, K136R, and T286R;
[0194] 14) A311R, E530R, E658R, Q57R, E704R, L379R, and K484R;
[0195] 15) A311R, E530R, E658R, Q57R, E704R, and K685R;
[0196] 16) A311R, E530R, E658R, Q57R, E704R, K136R, and T286R;
[0197] 17) A311R, E530R, E658R, Q57R, L379R, and K484R;
[0198] 18) A311R, E530R, E658R, Q57R, and K685R;
[0199] 19) A311R, E530R, E658R, Q57R, K136R, and T286R;
[0200] 20) A311R, E530R, E658R, N374R, L379R, and K484R;
[0201] 21) A311R, E530R, E658R, N374R, and K685R; and
[0202] 22) A311R, E530R, E658R, N374R, K136R, and T286R.
[0203] In some embodiments, the polypeptide is selected from the group consisting of the following polypeptides:
[0204] 1) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, N592R, and K14R;
[0205] 2) SEQ ID NO: 4 with substitutions E530R, E658R, and K14R;
[0206] 3) SEQ ID NO: 4 with substitutions A311R, E530R, and E658R;
[0207] 4) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, and Q57R;
[0208] 5) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, and E704R;
[0209] 6) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, and N374R;
[0210] 7) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, Q57R, and E704R;
[0211] 8) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, E704R, and N374R;
[0212] 9) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, Q57R, E704R, and N374R;
[0213] 10) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, Q57R, and N374R;
[0214] 11) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, L379R, and K484R;
[0215] 12) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, and K685R;
[0216] 13) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, K136R, and T286R;
[0217] 14) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, Q57R, E704R, L379R, and K484R;
[0218] 15) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, Q57R, E704R, and K685R;
[0219] 16) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, Q57R, E704R, K136R, and T286R;
[0220] 17) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, Q57R, L379R, and K484R;
[0221] 18) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, Q57R, and K685R;
[0222] 19) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, Q57R, K136R, and T286R;
[0223] 20) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, N374R, L379R, and K484R;
[0224] 21) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, N374R, and K685R; and
[0225] 22) SEQ ID NO: 4 with substitutions A311R, E530R, E658R, N374R, K136R, and T286R.
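Labels such as A311R name the original residue, its 1-based position, and the replacement residue. The following sketch shows one way to parse such labels and apply them to a sequence while checking that the stated original residue is actually present. The function names are hypothetical, and the short sequence in the usage note is a toy stand-in, not SEQ ID NO: 4.

```python
import re

# One-letter original residue, 1-based position, one-letter replacement.
SUBSTITUTION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str):
    """Split a label such as 'A311R' into ('A', 311, 'R')."""
    match = SUBSTITUTION_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


def apply_substitutions(sequence: str, labels):
    """Return a copy of `sequence` with every labelled substitution applied."""
    residues = list(sequence)
    for label in labels:
        original, position, replacement = parse_substitution(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if sequence[position - 1] != original:
            raise ValueError(f"{label}: sequence has {sequence[position - 1]!r} there")
        residues[position - 1] = replacement
    return "".join(residues)
```

With the toy sequence `"MDKAEN"`, calling `apply_substitutions("MDKAEN", ["K3R", "E5R"])` replaces the lysine at position 3 and the glutamic acid at position 5 with arginine. The residue check guards against applying a label to a sequence in which the numbering has shifted.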
[0226] In some embodiments, the polypeptide comprises an amino acid substitution relative to any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to an amino acid substitution selected from the group consisting of D10A, G12A, E304A, F308A, H486A, and D489A relative to SEQ ID NO: 4.
[0227] In some embodiments, the polypeptide comprises an amino acid substitution relative to any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to an amino acid substitution of D10A relative to SEQ ID NO: 4.
[0228] In some embodiments, the polypeptide comprises an amino acid substitution selected from the group consisting of:
[0229] (1) K14R;
[0230] (2) Q57R;
[0231] (3) K136R;
[0232] (4) T286R;
[0233] (5) T286R;
[0234] (6) A311R;
[0235] (7) N374R;
[0236] (8) L379R;
[0237] (9) K484R;
[0238] (10) E530R;
[0239] (11) N592R;
[0240] (12) E658R;
[0241] (13) K685R;
[0242] (14) E704R;
[0243] (15) A311R + E530R + E658R + N592R + K14R;
[0244] (16) E530R + E658R + K14R;
[0245] (17) A311R + E530R + E658R;
[0246] (18) A311R + E530R + E658R + Q57R;
[0247] (19) A311R + E530R + E658R + E704R;
[0248] (20) A311R + E530R + E658R + N374R;
[0249] (21) A311R + E530R + E658R + Q57R + E704R;
[0250] (22) A311R + E530R + E658R + E704R + N374R;
[0251] (23) A311R + E530R + E658R + Q57R + E704R + N374R;
[0252] (24) A311R + E530R + E658R + Q57R + N374R;
[0253] (25) A311R + E530R + E658R + L379R + K484R;
[0254] (26) A311R + E530R + E658R + K685R;
[0255] (27) A311R + E530R + E658R + K136R + T286R;
[0256] (28) A311R + E530R + E658R + Q57R + E704R + L379R + K484R;
[0257] (29) A311R + E530R + E658R + Q57R + E704R + K685R;
[0258] (30) A311R + E530R + E658R + Q57R + E704R + K136R + T286R;
[0259] (31) A311R + E530R + E658R + Q57R + L379R + K484R;
[0260] (32) A311R + E530R + E658R + Q57R + K685R;
[0261] (33) A311R + E530R + E658R + Q57R + K136R + T286R;
[0262] (34) A311R + E530R + E658R + N374R + L379R + K484R;
[0263] (35) A311R + E530R + E658R + N374R + K685R;
[0264] (36) A311R + E530R + E658R + N374R + K136R + T286R; and
[0265] any combination of (1) - (36) ;
[0266] relative to SEQ ID NO: 4; and optionally further, amino acid substitution D10A relative to SEQ ID NO: 4. In some embodiments, the polypeptide comprises amino acid substitutions relative to any one of SEQ ID NOs: 4, 1-3, and 5-11 that are corresponding to the amino acid substitutions relative to SEQ ID NO: 4 selected from the group consisting of:
[0267] 1) D10A, A311R, E530R, E658R, N592R, and K14R;
[0268] 2) D10A, E530R, E658R, and K14R;
[0269] 3) D10A, A311R, E530R, and E658R;
[0270] 4) D10A, A311R, E530R, E658R, and Q57R;
[0271] 5) D10A, A311R, E530R, E658R, and E704R;
[0272] 6) D10A, A311R, E530R, E658R, and N374R;
[0273] 7) D10A, A311R, E530R, E658R, Q57R, and E704R;
[0274] 8) D10A, A311R, E530R, E658R, E704R, and N374R;
[0275] 9) D10A, A311R, E530R, E658R, Q57R, E704R, and N374R;
[0276] 10) D10A, A311R, E530R, E658R, Q57R, and N374R;
[0277] 11) D10A, A311R, E530R, E658R, L379R, and K484R;
[0278] 12) D10A, A311R, E530R, E658R, and K685R;
[0279] 13) D10A, A311R, E530R, E658R, K136R, and T286R;
[0280] 14) D10A, A311R, E530R, E658R, Q57R, E704R, L379R, and K484R;
[0281] 15) D10A, A311R, E530R, E658R, Q57R, E704R, and K685R;
[0282] 16) D10A, A311R, E530R, E658R, Q57R, E704R, K136R, and T286R;
[0283] 17) D10A, A311R, E530R, E658R, Q57R, L379R, and K484R;
[0284] 18) D10A, A311R, E530R, E658R, Q57R, and K685R;
[0285] 19) D10A, A311R, E530R, E658R, Q57R, K136R, and T286R;
[0286] 20) D10A, A311R, E530R, E658R, N374R, L379R, and K484R;
[0287] 21) D10A, A311R, E530R, E658R, N374R, and K685R; and
[0288] 22) D10A, A311R, E530R, E658R, N374R, K136R, and T286R.
[0289] In some embodiments, the polypeptide is selected from the group consisting of the following polypeptides:
[0290] 1) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, N592R, and K14R;
[0291] 2) SEQ ID NO: 4 with substitutions D10A, E530R, E658R, and K14R;
[0292] 3) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, and E658R;
[0293] 4) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, and Q57R;
[0294] 5) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, and E704R;
[0295] 6) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, and N374R;
[0296] 7) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, Q57R, and E704R;
[0297] 8) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, E704R, and N374R;
[0298] 9) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, Q57R, E704R, and N374R;
[0299] 10) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, Q57R, and N374R;
[0300] 11) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, L379R, and K484R;
[0301] 12) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, and K685R;
[0302] 13) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, K136R, and T286R;
[0303] 14) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, Q57R, E704R, L379R, and K484R;
[0304] 15) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, Q57R, E704R, and K685R;
[0305] 16) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, Q57R, E704R, K136R, and T286R;
[0306] 17) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, Q57R, L379R, and K484R;
[0307] 18) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, Q57R, and K685R;
[0308] 19) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, Q57R, K136R, and T286R;
[0309] 20) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, N374R, L379R, and K484R;
[0310] 21) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, N374R, and K685R; and
[0311] 22) SEQ ID NO: 4 with substitutions D10A, A311R, E530R, E658R, N374R, K136R, and T286R.
[0312] In some embodiments, the polypeptide comprises amino acid substitutions of (a) D10A, A311R, E530R, E658R, K136R, and T286R relative to SEQ ID NO: 4; or (b) D10A, A311R, E530R, E658R, N374R, L379R, and K484R relative to SEQ ID NO: 4.
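By way of illustration only, the substitution notation used above (for example, A311R denotes the alanine at position 311 replaced by arginine) can be applied to a sequence programmatically. The following Python sketch is not part of the disclosure: the function name `apply_substitutions` and the 12-residue toy sequence are hypothetical, and the toy sequence is not SEQ ID NO: 4. It assumes 1-based positions and checks that the reference residue matches before substituting.

```python
import re

# Pattern for "<reference residue><position><new residue>", e.g. "A311R".
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence, substitutions):
    """Return a copy of `sequence` with each named substitution applied.

    Positions are 1-based. A ValueError is raised if the residue found at a
    position in the reference sequence differs from the residue named in the
    substitution, which guards against numbering mistakes.
    """
    residues = list(sequence)
    for name in substitutions:
        match = SUBSTITUTION.match(name)
        if match is None:
            raise ValueError(f"unrecognized substitution: {name!r}")
        reference = match.group(1)
        position = int(match.group(2))
        replacement = match.group(3)
        if not 1 <= position <= len(sequence):
            raise ValueError(f"{name}: position outside the sequence")
        found = sequence[position - 1]
        if found != reference:
            raise ValueError(f"{name}: expected {reference}, found {found}")
        residues[position - 1] = replacement
    return "".join(residues)


# Hypothetical toy sequence for demonstration; NOT SEQ ID NO: 4.
toy_sequence = "MKTAYDGSEQLA"
toy_variant = apply_substitutions(toy_sequence, ["D6A", "E9R"])
```

With a real reference sequence, the same call would take a list such as `["D10A", "A311R", "E530R", "E658R", "K136R", "T286R"]`.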
[0313] In some embodiments, the polypeptide comprises an amino acid sequence of SEQ ID NO: 53 (S9D04-D10A, A311R, E530R, E658R, K136R, and T286R) (HAP5889) ; or comprises an amino acid sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 53.
[0314] In some embodiments, the polypeptide comprises an amino acid sequence of SEQ ID NO: 54 (S9D04-D10A, A311R, E530R, E658R, N374R, L379R, and K484R) (HAP5896) ; or comprises an amino acid sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 54.
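As a rough illustration of a sequence identity threshold such as "at least about 80%", the sketch below computes percent identity over two sequences that have already been aligned to equal length, with "-" marking gaps. This is only one common convention (identical positions divided by alignment length) and is an assumption here; the disclosure's own definition of sequence identity governs. The function name `percent_identity` and the example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Gap characters ("-") never count as matches. The denominator is the
    full alignment length, which is one of several conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical 10-residue alignment differing at one position.
example_identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
meets_threshold = example_identity >= 80.0
```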
[0315] In some embodiments, the polypeptide has at least one of endonuclease activity, nickase activity, and DNA binding property.
[0316] In some embodiments, the polypeptide is capable of recognizing a protospacer adjacent motif (PAM) comprising, consisting essentially of, or consisting of 5’-NNN-3’ immediately 3’ to a protospacer sequence of a target DNA, wherein N is A, T, G, or C. In some embodiments, the polypeptide is capable of recognizing a PAM comprising, consisting essentially of, or consisting of 5’-TTN-3’ immediately 3’ to a protospacer sequence of a target DNA, wherein N is A, T, G, or C.
[0317] (II) Fusion proteins
[0318] The polypeptide of the disclosure can be linked to or fused to a functional domain to form a fusion protein. Alternatively, the polypeptide itself is a fusion protein herein. The functional domain is usually heterologous to the polypeptide.
[0319] Therefore, in an aspect, the disclosure provides a fusion protein comprising the polypeptide of the disclosure and a functional domain.
[0320] In some embodiments, the polypeptide is fused to a functional domain to form a fusion protein. The fusion protein can be regarded as a derivative of the polypeptide or as a mutant of the polypeptide with an amino acid addition at the N-terminal and / or C-terminal of the polypeptide or with an amino acid insertion within the polypeptide. The polypeptide in the disclosure may also refer to the fusion protein of the disclosure depending on the context.
[0321] In some embodiments, the functional domain is fused to the N-terminal of (N-terminally fused to) or the C-terminal of (C-terminally fused to) the polypeptide or inserted into (fused internally to) the polypeptide.
[0322] In some embodiments, the functional domain is fused to the polypeptide via a linker, e.g., an XTEN linker or a GS linker. As used herein, the term “GS linker” refers to a linker comprising one or more Gly (G; glycine) and one or more Ser (S; serine) in any sequence. In some embodiments, a GS linker may contain an additional sequence, e.g., an XTEN linker or an NLS, within the GS linker.
[0323] In some embodiments, the fusion protein contains the polypeptide and more than one (e.g., 2, 3, 4, 6, or more) functional domain. In some embodiments, two functional domains are fused together via a linker in the disclosure.
[0324] In some embodiments, the functional domain has transposase activity, methylase activity, demethylase activity, translation activation activity, translation repression activity, transcription activation activity, transcription repression activity, deaminase activity, transcription release factor activity, chromatin modifying or remodeling activity, histone modification activity, nuclease activity, single-strand RNA cleavage activity, double-strand RNA cleavage activity, single-strand DNA cleavage activity, double-strand DNA cleavage activity, nucleic acid binding activity, detectable activity, or any combination thereof.
[0325] In some embodiments, the functional domain is selected from the group consisting of a nuclear localization signal (NLS) , a nuclear export signal (NES) , a deaminase or a catalytic domain thereof, a uracil glycosylase inhibitor (UGI) , a uracil glycosylase (UNG) , a methylpurine glycosylase (MPG) , a methylase or a catalytic domain thereof, a demethylase or a catalytic domain thereof, a transcription activating domain (e.g., VP64, VPR, or miniVPR) , a transcription inhibiting domain (e.g., KRAB moiety or SID moiety) , a reverse transcriptase or a catalytic domain thereof, an exonuclease or a catalytic domain thereof (e.g., T5 exonuclease) , a histone residue modification domain, a nuclease catalytic domain (e.g., FokI) , a transcription modification factor, a light gating factor, a chemical inducible factor, a chromatin visualization factor, a targeting polypeptide for providing binding to a cell surface portion on a target cell or a target cell type, a reporter (e.g., fluorescent) polypeptide or a detection label (e.g., GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP) , a localization signal, a polypeptide targeting moiety, a DNA binding domain (e.g., MBP, Lex A DBD, Gal4 DBD) , an epitope tag (e.g., His, myc, V5, FLAG, HA, VSV-G, Trx, etc.) , a transcription release factor, an HDAC, a moiety having RNA cleavage activity, a moiety having ssDNA cleavage activity, a moiety having dsDNA cleavage activity, a DNA or RNA ligase, a functional domain exhibiting activity to modify a target DNA selected from the group consisting of: methyltransferase activity, DNA repair activity, DNA damage activity, dismutase activity, alkylation activity, dealkylation activity, depurination activity, oxidation activity, deoxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, glycosylase activity, acetyl transferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitination activity, adenylation activity, deadenylation activity, SUMOylation activity, deSUMOylation activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase) , deglycosylation activity, and a catalytic domain thereof, and a functional fragment thereof, and any combination thereof.
[0326] In some embodiments, the fusion protein comprises a NLS at the N-terminal and / or the C-terminal of the polypeptide. In some embodiments, the fusion protein comprises one or two NLS at the N-terminal and / or the C-terminal of the polypeptide.
[0327] In some embodiments, the fusion protein comprises a NLS at the N-terminal and / or the C-terminal of the functional domain. In some embodiments, the fusion protein comprises one or two NLS at the N-terminal and / or the C-terminal of the functional domain.
[0328] In some embodiments, the NLS comprises or is SV40 NLS, bpSV40 NLS (BP NLS, bpNLS) , NP NLS (Xenopus laevis Nucleoplasmin NLS, nucleoplasmin NLS) , or c-myc NLS.
[0329] (i) Base editor
[0330] In some embodiments, the functional domain comprises a deaminase or a catalytic domain thereof.
[0331] In some embodiments, the functional domain comprises a uracil glycosylase (UNG) .
[0332] In some embodiments, the functional domain comprises a methylpurine glycosylase (MPG) .
[0333] The polypeptide of the disclosure can be used to replace the napDNAbp / napDNAbd in PCT / CN2023 / 094023 (AYBE) and PCT / CN2024 / 089874 (gBE) to constitute an AYBE base editor and a gBE base editor, respectively, both of which PCT applications are incorporated herein by reference in their entireties.
[0334] (a) Adenine Base editor
[0335] In some embodiments, the deaminase or catalytic domain thereof is an adenine deaminase or a catalytic domain thereof (e.g., tRNA adenosine deaminase (TadA) , such as, TadA8e, TadA8.17, TadA8.20, TadA9, TadA8e-V106W, TadA8EV106W+D108Q, TadA-CDa, TadA-CDb, TadA-CDc, TadA-CDd, TadA-CDe, TadA-dual, TADAC-1.2, TADAC-1.14, TADAC-1.17, TADAC-1.19, TADAC-2.5, TADAC-2.6, TADAC-2.9, TADAC-2.19, TADAC-2.23, TadA8e-N46L, TadA8e-N46P, TadA* (8.17m) , TadA (8.8m) ) .
[0336] In some embodiments, the adenine deaminase or a catalytic domain thereof is TadA8E or TadA8EV106W.
[0337] In some embodiments, the deaminase or catalytic domain thereof comprises an amino acid sequence having a sequence identity of at least about 80% (e.g., at least about 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 57.
[0338] In some embodiments, the adenine deaminase or a catalytic domain thereof is set forth in SEQ ID NO: 57 (TadA8E) .
[0339] In some embodiments, the fusion protein comprises the polypeptide and the adenine deaminase or a catalytic domain thereof.
[0340] In some embodiments, the fusion protein comprises, from N-to C-terminus, an optional NLS, the adenine deaminase or a catalytic domain thereof, an optional linker, the polypeptide, and an optional NLS.
[0341] In some embodiments, the fusion protein comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the amino acid sequence of SEQ ID NO: 56.
[0342] In some embodiments, the fusion protein comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the amino acid sequence of SEQ ID NO: 56 provided that the polypeptide in SEQ ID NO: 56 is replaced with the polypeptide of any preceding claim other than S9D04-D10A.
[0343] (b) Cytosine Base editor
[0344] In some embodiments, the deaminase or catalytic domain thereof is a cytosine deaminase or a catalytic domain thereof (e.g., an apolipoprotein B mRNA-editing complex (APOBEC) family deaminase, an activation induced deaminase (AID) , a cytidine deaminase 1 from Petromyzon marinus (pmCDA1) , DddA, or a functional variant thereof, e.g., APOBEC1 (rAPOBEC1) , APOBEC2, APOBEC3, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, hAPOBEC3-W104A) .
[0345] In some embodiments, the cytidine deaminase or a catalytic domain thereof is any deaminase mentioned in PCT / CN2024 / 078613 or in any PCT application claiming priority to PCT / CN2024 / 078613.
[0346] In some embodiments, the functional domain comprises a uracil glycosylase inhibitor (UGI) domain.
[0347] In some embodiments, the fusion protein comprises one, two, or three UGI domains.
[0348] In some embodiments, the fusion protein comprises the polypeptide, the cytidine deaminase or a catalytic domain thereof, and the UGI domain.
[0349] In some embodiments, the fusion protein comprises, from N-to C-terminus, an optional NLS, the cytidine deaminase or a catalytic domain thereof, an optional linker, the polypeptide, an optional linker, the UGI domain, an optional linker, optionally the UGI domain, an optional linker, and an optional NLS.
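For illustration only, the N-to-C-terminus arrangement described above can be written out as an ordered list of parts. The sketch below is a simplification and not part of the disclosure: the function name `cytosine_base_editor_layout` and its parameters are hypothetical, a single flag switches all optional linkers on or off together (the text treats each linker as independently optional), and the parts are labels rather than actual sequences.

```python
def cytosine_base_editor_layout(n_nls=True, linkers=True, ugi_copies=2, c_nls=True):
    """Return the N-to-C order of parts as a list of text labels.

    ugi_copies may be 1 or 2, mirroring "the UGI domain ... optionally the
    UGI domain" in the arrangement described in the text.
    """
    if ugi_copies not in (1, 2):
        raise ValueError("this sketch supports one or two UGI domains")
    parts = []
    if n_nls:
        parts.append("NLS")
    parts.append("cytidine deaminase")
    if linkers:
        parts.append("linker")
    parts.append("polypeptide")
    for _ in range(ugi_copies):
        if linkers:
            parts.append("linker")
        parts.append("UGI")
    # A trailing linker is only emitted when something follows it.
    if c_nls:
        if linkers:
            parts.append("linker")
        parts.append("NLS")
    return parts


full_layout = cytosine_base_editor_layout()
minimal_layout = cytosine_base_editor_layout(
    n_nls=False, linkers=False, ugi_copies=1, c_nls=False
)
```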
[0350] (ii) Prime editor
[0351] In some embodiments, the functional domain comprises a reverse transcriptase or a catalytic domain thereof.
[0352] In some embodiments, the fusion protein comprises the polypeptide and the reverse transcriptase or a catalytic domain thereof.
[0353] In some embodiments, the fusion protein comprises, from N-to C-terminus, an optional NLS, the polypeptide, an optional linker, the reverse transcriptase or a catalytic domain thereof, an optional linker, an optional NLS, an optional linker, and an optional NLS.
[0354] (iii) Epigenomic editor
[0355] In some aspects, the disclosure provides a way of epigenomically modifying a target gene, e.g., by methylation, to regulate the gene. The epigenomic modification, in some embodiments, silences the expression of the gene, leading to a reduced level of a corresponding mRNA and / or a reduced level of a corresponding protein.
[0356] In some embodiments, the fusion protein comprises a transcription inhibiting domain (e.g., KRAB domain or SID domain) .
[0357] In some embodiments, the fusion protein comprises a KRAB domain.
[0358] In some embodiments, the fusion protein comprises a DNA methyltransferase, such as DNMT3l or DNMT3a.
[0359] In some embodiments, the fusion protein comprises a DNMT3l domain and a DNMT3a domain.
[0360] In some embodiments, the fusion protein comprises the polypeptide, a DNMT3l domain, a DNMT3a domain, and a KRAB domain.
[0361] In some embodiments, the fusion protein comprises, from N-terminal to C-terminal, the polypeptide, the KRAB domain, the DNMT3l domain, and the DNMT3a domain.
[0362] (iv) Transcriptional Activator
[0363] In some aspects, the disclosure provides a way of transcriptionally regulating a target gene, for example, by transcriptional activation or inhibition of the promoter that regulates the target gene. The transcriptional regulation, in some embodiments, increases or decreases the expression of the target gene, leading to an increased or decreased level of a corresponding mRNA and / or an increased or decreased level of a corresponding protein.
[0364] In some embodiments, the functional domain comprises a transcription activating domain (e.g., VP64, VPR, or miniVPR) .
[0365] In some embodiments, the fusion protein comprises the polypeptide of the disclosure and the transcription activating domain.
[0366] In some embodiments, the fusion protein comprises, from N-terminal to C-terminal, the polypeptide, and the transcription activating domain.
[0367] In some embodiments, the transcription activating domain comprises miniVPR.
[0368] III. Representative systems
[0369] The polypeptide (including the fusion protein) of the disclosure is used in combination with a guide nucleic acid as described herein to constitute a system comprising the polypeptide and the guide nucleic acid. The system in the disclosure is non-naturally occurring as it requires a guide sequence targeting to a target dsDNA heterologous to the scaffold sequence.
[0370] In an aspect, the disclosure provides a system comprising:
[0371] (1) the polypeptide of the disclosure, or a polynucleotide (e.g., a DNA, an RNA) encoding the polypeptide, and
[0372] (2) a guide nucleic acid or a polynucleotide (e.g., a DNA, an RNA) encoding the guide nucleic acid, the guide nucleic acid comprising:
[0373] (i) a scaffold sequence capable of forming a complex with the polypeptide; and
[0374] (ii) a guide sequence capable of hybridizing to a target sequence of a target DNA, thereby guiding the complex to the target DNA.
[0375] In some embodiments, the system is a complex comprising the polypeptide complexed with the guide nucleic acid. In some embodiments, the complex further comprises the target DNA hybridized with the guide sequence.
[0376] In some embodiments, the system is a composition comprising the component (1) and the component (2) .
[0377] In some embodiments, the scaffold sequence is 3’ to the guide sequence.
[0378] In some embodiments, the guide nucleic acid is a guide RNA (gRNA) .
[0379] In yet another aspect, the disclosure provides a guide nucleic acid as described in the system of the disclosure.
[0380] (i) Scaffold sequence
[0381] For the purpose of the disclosure, the scaffold sequence is compatible with the polypeptide of the disclosure and is capable of complexing with the polypeptide. The scaffold sequence may be a naturally occurring scaffold sequence identified along with the polypeptide, or a variant thereof maintaining the ability to complex with the polypeptide. Generally, the ability to complex with the polypeptide is maintained as long as the secondary structure of the variant is substantially identical to the secondary structure of the naturally occurring scaffold sequence. A nucleotide deletion, insertion, or substitution in the primary sequence of the scaffold sequence may not necessarily change the secondary structure of the scaffold sequence (e.g., the relative locations and / or sizes of the stems, bulges, and loops of the scaffold sequence do not significantly deviate from those of the original stems, bulges, and loops) . For example, the nucleotide deletion, insertion, or substitution may be in a bulge or loop region of the scaffold sequence so that the overall symmetry of the bulge and hence the secondary structure remains largely the same. The nucleotide deletion, insertion, or substitution may also be in the stems of the scaffold sequence so that the lengths of the stems do not significantly deviate from those of the original stems (e.g., adding or deleting one base pair in each of two stems corresponds to 4 total base changes) . On the other hand, engineering of the scaffold sequence may be applied to improve the activity of the system.
[0382] In some embodiments, the scaffold sequence comprises a direct repeat (DR) sequence and a tracr sequence 3’ to the DR sequence.
[0383] In some embodiments, the DR sequence is linked to the tracr sequence with or without a linker.
[0384] In some embodiments, the linker comprises or is 5’-GAAA-3’ or 5’-AAAAAA-3’ .
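The arrangement described above (a DR sequence, an optional linker such as 5’-GAAA-3’, then a tracr sequence, with the scaffold placed 3’ to the guide sequence) can be sketched as simple string concatenation. The Python below is illustrative only: the function names and the short placeholder sequences are hypothetical and are not any of SEQ ID NOs: 12-33. Because ST. 26 writes “t” for both T and U, the sketch converts T to U when emitting an RNA.

```python
def build_scaffold(dr_sequence, tracr_sequence, linker="GAAA"):
    """Join DR, optional linker, and tracr in 5'-to-3' order.

    Pass linker="" for a DR linked directly to the tracr sequence.
    """
    return dr_sequence + linker + tracr_sequence


def build_guide_rna(guide_sequence, scaffold_sequence):
    """Place the scaffold 3' to the guide and write the result as RNA."""
    dna_letters = (guide_sequence + scaffold_sequence).upper()
    return dna_letters.replace("T", "U")


# Hypothetical placeholder sequences, not taken from the sequence listing.
placeholder_dr = "GTTTCA"
placeholder_tracr = "TGAAAC"
placeholder_guide = "ACGTACGTACGTACGTACGT"

scaffold = build_scaffold(placeholder_dr, placeholder_tracr)
guide_rna = build_guide_rna(placeholder_guide, scaffold)
```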
[0385] In some embodiments, the DR sequence comprises any one of SEQ ID NOs: 12-22 or a polynucleotide different from any one of SEQ ID NOs: 12-22 by no more than 1, 2, 3, 4, 5, 6, or 7 nucleotides.
[0386] In some embodiments, the tracr sequence comprises any one of SEQ ID NOs: 23-33 or a polynucleotide different from any one of SEQ ID NOs: 23-33 by no more than 1, 2, 3, 4, 5, 6, or 7 nucleotides.
[0387] In some embodiments, the scaffold sequence has substantially the same secondary structure as the secondary structure of any one of SEQ ID NOs: 34-55 and 65.
[0388] In some embodiments, the scaffold sequence comprises a polynucleotide sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 34-55 and 65.
[0389] In some embodiments, the scaffold sequence comprises the polynucleotide sequence of any one of SEQ ID NOs: 34-55 and 65.
[0390] In the disclosure, referring to the correspondence relationship as shown in Table 1, for any polypeptide of the disclosure based on SEQ ID NO: 1 (e.g., SEQ ID NO: 1 per se or a derivative thereof) , the corresponding scaffold sequence is SEQ ID NO: 34 or 45 or a derivative thereof; for any polypeptide of the disclosure based on SEQ ID NO: 2 (e.g., SEQ ID NO: 2 per se or a derivative thereof) , the corresponding scaffold sequence is SEQ ID NO: 35 or 46 or a derivative thereof; for any polypeptide of the disclosure based on SEQ ID NO: 3 (e.g., SEQ ID NO: 3 per se or a derivative thereof) , the corresponding scaffold sequence is SEQ ID NO: 36 or 47 or a derivative thereof; for any polypeptide of the disclosure based on SEQ ID NO: 4 (e.g., SEQ ID NO: 4 per se or a derivative thereof) , the corresponding scaffold sequence is SEQ ID NO: 37 or 48 or 65 or a derivative thereof; for any polypeptide of the disclosure based on SEQ ID NO: 5 (e.g., SEQ ID NO: 5 per se or a derivative thereof) , the corresponding scaffold sequence is SEQ ID NO: 38 or 49 or a derivative thereof; for any polypeptide of the disclosure based on SEQ ID NO: 6 (e.g., SEQ ID NO: 6 per se or a derivative thereof) , the corresponding scaffold sequence is SEQ ID NO: 39 or 50 or a derivative thereof; for any polypeptide of the disclosure based on SEQ ID NO: 7 (e.g., SEQ ID NO: 7 per se or a derivative thereof) , the corresponding scaffold sequence is SEQ ID NO: 40 or 51 or a derivative thereof; for any polypeptide of the disclosure based on SEQ ID NO: 8 (e.g., SEQ ID NO: 8 per se or a derivative thereof) , the corresponding scaffold sequence is SEQ ID NO: 41 or 52 or a derivative thereof; for any polypeptide of the disclosure based on SEQ ID NO: 9 (e.g., SEQ ID NO: 9 per se or a derivative thereof) , the corresponding scaffold sequence is SEQ ID NO: 42 or 53 or a derivative thereof; for any polypeptide of the disclosure based on SEQ ID NO: 10 (e.g., SEQ ID NO: 10 per se or a derivative thereof) , the corresponding scaffold sequence is SEQ ID NO: 43 or 54 or a derivative thereof; and for any polypeptide of the disclosure based on SEQ ID NO: 11 (e.g., SEQ ID NO: 11 per se or a derivative thereof) , the corresponding scaffold sequence is SEQ ID NO: 44 or 55 or a derivative thereof.
[0391] In some embodiments, the polypeptide comprises SEQ ID NO: 1 or a derivative thereof in the disclosure, and the scaffold sequence comprises SEQ ID NO: 34 or 45 or a derivative thereof in the disclosure.
[0392] In some embodiments, the polypeptide comprises SEQ ID NO: 2 or a derivative thereof in the disclosure, and the scaffold sequence comprises SEQ ID NO: 35 or 46 or a derivative thereof in the disclosure.
[0393] In some embodiments, the polypeptide comprises SEQ ID NO: 3 or a derivative thereof in the disclosure, and the scaffold sequence comprises SEQ ID NO: 36 or 47 or a derivative thereof in the disclosure.
[0394] In some embodiments, the polypeptide comprises SEQ ID NO: 4 or a derivative thereof in the disclosure, and the scaffold sequence comprises SEQ ID NO: 37 or 48 or 65 or a derivative thereof in the disclosure.
[0395] In some embodiments, the polypeptide comprises SEQ ID NO: 5 or a derivative thereof in the disclosure, and the scaffold sequence comprises SEQ ID NO: 38 or 49 or a derivative thereof in the disclosure.
[0396] In some embodiments, the polypeptide comprises SEQ ID NO: 6 or a derivative thereof in the disclosure, and the scaffold sequence comprises SEQ ID NO: 39 or 50 or a derivative thereof in the disclosure.
[0397] In some embodiments, the polypeptide comprises SEQ ID NO: 7 or a derivative thereof in the disclosure, and the scaffold sequence comprises SEQ ID NO: 40 or 51 or a derivative thereof in the disclosure.
[0398] In some embodiments, the polypeptide comprises SEQ ID NO: 8 or a derivative thereof in the disclosure, and the scaffold sequence comprises SEQ ID NO: 41 or 52 or a derivative thereof in the disclosure.
[0399] In some embodiments, the polypeptide comprises SEQ ID NO: 9 or a derivative thereof in the disclosure, and the scaffold sequence comprises SEQ ID NO: 42 or 53 or a derivative thereof in the disclosure.
[0400] In some embodiments, the polypeptide comprises SEQ ID NO: 10 or a derivative thereof in the disclosure, and the scaffold sequence comprises SEQ ID NO: 43 or 54 or a derivative thereof in the disclosure.
[0401] In some embodiments, the polypeptide comprises SEQ ID NO: 11 or a derivative thereof in the disclosure, and the scaffold sequence comprises SEQ ID NO: 44 or 55 or a derivative thereof in the disclosure.
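For convenience, the polypeptide-to-scaffold correspondence recited in the preceding paragraphs can be held in a lookup table. The Python sketch below merely transcribes the SEQ ID NO pairings stated in the text; the names `SCAFFOLD_SEQ_IDS` and `scaffold_seq_ids` are hypothetical, and derivatives of the listed sequences are not represented.

```python
# Polypeptide SEQ ID NO -> corresponding scaffold SEQ ID NOs, as recited above.
SCAFFOLD_SEQ_IDS = {
    1: (34, 45),
    2: (35, 46),
    3: (36, 47),
    4: (37, 48, 65),
    5: (38, 49),
    6: (39, 50),
    7: (40, 51),
    8: (41, 52),
    9: (42, 53),
    10: (43, 54),
    11: (44, 55),
}


def scaffold_seq_ids(polypeptide_seq_id):
    """Return the scaffold SEQ ID NOs paired with a polypeptide SEQ ID NO."""
    try:
        return SCAFFOLD_SEQ_IDS[polypeptide_seq_id]
    except KeyError:
        raise ValueError(
            f"no scaffold recorded for SEQ ID NO: {polypeptide_seq_id}"
        ) from None
```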
[0402] As used herein, the term “derivative” in connection with the scaffold sequence of the disclosure refers to a mutant of the scaffold sequence in which a nucleotide addition, deletion, or substitution is introduced.
[0403] In some embodiments, the scaffold sequence is a mutant of the scaffold sequence of SEQ ID NO: 48 and comprises a nucleotide deletion relative to SEQ ID NO: 48 selected from the group consisting of:
[0404] 1) Deletion of nucleotides at positions 17-19 and 24-26 of SEQ ID NO: 48;
[0405] 2) Deletion of nucleotides at positions 14-19 and 24-29 of SEQ ID NO: 48;
[0406] 3) Deletion of nucleotides at positions 12-19 and 24-31 of SEQ ID NO: 48;
[0407] 4) Deletion of nucleotides at positions 17-19, 24-26, 7-10, and 33-36 of SEQ ID NO: 48;
[0408] 5) Deletion of nucleotides at positions 46-53 and 126-131 of SEQ ID NO: 48;
[0409] 6) Deletion of nucleotides at positions 47-51 and 127-130 of SEQ ID NO: 48;
[0410] 7) Deletion of nucleotides at positions 50-54 and 125-127 of SEQ ID NO: 48;
[0411] 8) Deletion of nucleotides at positions 70-72 and 85-87 of SEQ ID NO: 48;
[0412] 9) Deletion of nucleotides at positions 92-99 of SEQ ID NO: 48;
[0413] 10) Deletion of nucleotides at positions 97-101 of SEQ ID NO: 48;
[0414] 11) Deletion of nucleotides at positions 14-19, 24-29, and 92-99 of SEQ ID NO: 48;
[0415] 12) Deletion of nucleotides at positions 12-19, 24-31, and 92-99 of SEQ ID NO: 48; and
[0416] 13) Deletion of nucleotides at positions 17-19, 24-26, 7-10, 33-36, and 92-99 of SEQ ID NO: 48.
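The deletions listed above are given as position ranges relative to SEQ ID NO: 48. As a minimal sketch, assuming the positions are 1-based and inclusive, such a truncated scaffold could be derived as follows. The function name `delete_positions` and the 30-nucleotide placeholder are hypothetical; the placeholder is not SEQ ID NO: 48.

```python
def delete_positions(sequence, ranges):
    """Remove the nucleotides at the given 1-based, inclusive ranges.

    All ranges refer to positions in the ORIGINAL sequence, so the order in
    which the ranges are listed does not matter.
    """
    to_drop = set()
    for start, end in ranges:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"range {start}-{end} is outside the sequence")
        to_drop.update(range(start, end + 1))
    return "".join(
        base for position, base in enumerate(sequence, start=1)
        if position not in to_drop
    )


# Hypothetical 30-nt placeholder; NOT SEQ ID NO: 48.
placeholder_scaffold = "ACGTACGTACGTACGTACGTACGTACGTAC"
# Same ranges as deletion 1) above: positions 17-19 and 24-26.
truncated = delete_positions(placeholder_scaffold, [(17, 19), (24, 26)])
```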
[0417] In some embodiments, the scaffold sequence comprises, consists essentially of, or consists of a sequence of SEQ ID NO: 65.
[0418] (ii) Protospacer sequence / target sequence
[0419] In some embodiments, the protospacer sequence comprises about or at least about 14 contiguous nucleotides of the target DNA, e.g., about or at least about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more contiguous nucleotides of the target DNA, or in a numerical range between any two of the preceding values, e.g., from about 14 to about 50, or from about 19 to about 24 contiguous nucleotides of the target DNA. In some embodiments, the protospacer sequence comprises about 20 contiguous nucleotides of the target DNA. As used herein, in the context of a target dsDNA, the protospacer sequence is on the nontarget strand of the target dsDNA.
[0420] In some embodiments, the protospacer sequence is immediately 5’ to a protospacer adjacent motif (PAM) . In some embodiments, the PAM comprises, consists essentially of, or consists of 5’-TTN-3’ , wherein N is A, T, G, or C.
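To illustrate the geometry described above, the sketch below scans one DNA strand, taken as the nontarget strand written 5’ to 3’, for protospacers that are immediately followed on their 3’ side by a 5’-TTN-3’ PAM. It is a simplified example rather than a design tool: the function name `find_protospacers`, the default length of 20, and the example sequence are assumptions, and only the strand supplied is searched.

```python
def find_protospacers(nontarget_strand, length=20):
    """List (start_index, protospacer, pam) for every TTN PAM match.

    start_index is 0-based. The PAM is the three nucleotides immediately
    3' of the protospacer, and N may be A, T, G, or C.
    """
    sequence = nontarget_strand.upper()
    hits = []
    for start in range(len(sequence) - length - 3 + 1):
        pam = sequence[start + length:start + length + 3]
        if pam[:2] == "TT" and pam[2] in "ACGT":
            hits.append((start, sequence[start:start + length], pam))
    return hits


# Hypothetical example: a 20-nt protospacer, a TTG PAM, then two more bases.
example_strand = "ACGTACGTACGTACGTACGT" + "TTG" + "CC"
example_hits = find_protospacers(example_strand)
```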
[0421] In some embodiments, the target sequence comprises about or at least about 14 contiguous nucleotides of the target DNA, e.g., about or at least about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more contiguous nucleotides of the target DNA, or in a numerical range between any two of the preceding values, e.g., from about 14 to about 50, or from about 19 to about 24 contiguous nucleotides of the target DNA. In some embodiments, the target sequence comprises about 20 contiguous nucleotides of the target DNA. As used herein, in the context of a target dsDNA, the target sequence is on the target strand of the target dsDNA.
[0422] (iii) Guide sequence
[0423] In some embodiments, the guide sequence is about or at least about 14 nucleotides in length, e.g., about or at least about 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, or more nucleotides in length, or in a length of a numerical range between any two of the preceding values, e.g., in a length of from about 14 to about 50 nucleotides, or from about 19 to about 24 nucleotides. In some embodiments, the guide sequence is about 20 nucleotides in length.
[0424] In some embodiments, (1) the guide sequence is at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100% (fully) reverse complementary to the target sequence; (2) the guide sequence contains no more than 5, 4, 3, 2, or 1 mismatch or contains no mismatch with the target sequence; or (3) the guide sequence comprises no mismatch with the target sequence in the first 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70 nucleotides at the 5’ end of the guide sequence. In some embodiments, (1) the guide sequence is about 100% (fully) reverse complementary to the target sequence.
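As an illustration of the reverse complementarity and mismatch criteria above, the following sketch compares a guide sequence with a target sequence of equal length, both written 5’ to 3’. It assumes a position-by-position comparison without gaps; the function names and the short example sequences are hypothetical.

```python
# Complement table for DNA letters; U is accepted so RNA guides can be used.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(sequence):
    """Reverse complement, returned in DNA letters."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def count_mismatches(guide_sequence, target_sequence):
    """Count positions where the guide departs from perfect pairing.

    A fully reverse complementary guide equals the reverse complement of
    the target sequence (with U read as T), giving zero mismatches.
    Position 0 of the comparison is the 5' end of the guide.
    """
    guide = guide_sequence.upper().replace("U", "T")
    ideal_guide = reverse_complement(target_sequence)
    if len(guide) != len(ideal_guide):
        raise ValueError("guide and target sequence must be the same length")
    return sum(1 for g, i in zip(guide, ideal_guide) if g != i)


def percent_reverse_complementary(guide_sequence, target_sequence):
    """Percentage of guide positions that pair with the target sequence."""
    mismatches = count_mismatches(guide_sequence, target_sequence)
    return 100.0 * (len(guide_sequence) - mismatches) / len(guide_sequence)


# Hypothetical 5-nt example: target strand 5'-AACGT-3'.
perfect = count_mismatches("ACGUU", "AACGT")
one_off = count_mismatches("ACGUA", "AACGT")
```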
[0425] In some embodiments, the system comprises one guide nucleic acid comprising two or more guide sequences capable of hybridizing to two or more target sequences of the same target DNA or different target DNAs, wherein the two or more guide sequences are the same or different, and wherein the two or more target sequences are the same or different.
[0426] In some embodiments, the system comprises two or more guide nucleic acids comprising two or more guide sequences capable of hybridizing to two or more target sequences of the same target DNA or different target DNAs, wherein the two or more guide sequences are the same or different, and wherein the two or more target sequences are the same or different.
[0427] (iv) Target DNA
[0428] In some embodiments, the target DNA is a target dsDNA, such as, a eukaryotic dsDNA, e.g., a gene in a eukaryotic cell.
[0429] IV. Polynucleotides
[0430] In yet another aspect, the disclosure provides a polynucleotide comprising a sequence encoding the polypeptide of the disclosure. In some embodiments, the polynucleotide further comprises a sequence encoding a guide nucleic acid as described in the disclosure or further comprises a guide nucleic acid as described in the disclosure.
[0431] In yet another aspect, the disclosure provides a polynucleotide encoding or comprising a sequence encoding a guide nucleic acid as described in the disclosure.
[0432] (i) Regulation of guide nucleic acid
[0433] In some embodiments, the polynucleotide encoding or comprising a sequence encoding the guide nucleic acid is a DNA, an RNA, or a DNA / RNA mixture. As used herein, a “DNA / RNA mixture” refers to a nucleic acid comprising both one or more modified or unmodified ribonucleotides and one or more modified or unmodified deoxyribonucleotides, whether consecutive or not. However, “DNA” or “RNA” may also refer to a DNA containing one or more modified or unmodified ribonucleotides, whether consecutive or not, or an RNA containing one or more modified or unmodified deoxyribonucleotides, whether consecutive or not.
[0434] In some embodiments, the guide nucleic acid is operably linked to or under the regulation of a promoter.
[0435] In some embodiments, the promoter is a ubiquitous, tissue-specific, cell-type specific, constitutive, or inducible promoter.
[0436] Suitable promoters are known in the art and include, for example, a Cbh promoter, a Cba promoter, a pol I promoter, a pol II promoter, a pol III promoter, a T7 promoter, a U6 promoter, an H1 promoter, a retroviral Rous sarcoma virus LTR promoter, a cytomegalovirus (CMV) promoter, an SV40 promoter, a dihydrofolate reductase promoter, a β-actin promoter, an elongation factor 1α short (EFS) promoter, a β-glucuronidase (GUSB) promoter, a cytomegalovirus (CMV) immediate-early (Ie) enhancer and / or promoter, a chicken β-actin (CBA) promoter or derivative thereof such as a CAG promoter, a CB promoter, a (human) elongation factor 1α-subunit (EF1α) promoter, a ubiquitin C (UBC) promoter, a prion promoter, a neuron-specific enolase (NSE) promoter, a neurofilament light (NFL) promoter, a neurofilament heavy (NFH) promoter, a platelet-derived growth factor (PDGF) promoter, a platelet-derived growth factor B-chain (PDGF-β) promoter, a synapsin (Syn) promoter, a synapsin 1 (Syn1) promoter, a methyl-CpG binding protein 2 (MeCP2) promoter, a Ca²⁺ / calmodulin-dependent protein kinase II (CaMKII) promoter, a metabotropic glutamate receptor 2 (mGluR2) promoter, a β-globin minigene nβ2 promoter, a preproenkephalin (PPE) promoter, an enkephalin (Enk) promoter, an excitatory amino acid transporter 2 (EAAT2) promoter, a glial fibrillary acidic protein (GFAP) promoter, and a myelin basic protein (MBP) promoter.
[0437] (ii) Regulation of polypeptide
[0438] In some embodiments, the polynucleotide comprising a sequence encoding the polypeptide is a DNA, an RNA, or a DNA / RNA mixture.
[0439] In some embodiments, the polynucleotide comprising a sequence encoding the polypeptide is operably linked to or under the regulation of a promoter.
[0440] In some embodiments, the promoter is a ubiquitous, tissue-specific, cell-type specific, constitutive, or inducible promoter.
[0441] Suitable promoters are known in the art and include, for example, a Cbh promoter, a Cba promoter, a pol I promoter, a pol II promoter, a pol III promoter, a T7 promoter, a U6 promoter, an H1 promoter, a retroviral Rous sarcoma virus LTR promoter, a cytomegalovirus (CMV) promoter, an SV40 promoter, a dihydrofolate reductase promoter, a β-actin promoter, an elongation factor 1α short (EFS) promoter, a β-glucuronidase (GUSB) promoter, a cytomegalovirus (CMV) immediate-early (Ie) enhancer and / or promoter, a chicken β-actin (CBA) promoter or derivative thereof such as a CAG promoter, a CB promoter, a (human) elongation factor 1α-subunit (EF1α) promoter, a ubiquitin C (UBC) promoter, a prion promoter, a neuron-specific enolase (NSE) promoter, a neurofilament light (NFL) promoter, a neurofilament heavy (NFH) promoter, a platelet-derived growth factor (PDGF) promoter, a platelet-derived growth factor B-chain (PDGF-β) promoter, a synapsin (Syn) promoter, a human synapsin (hSyn) promoter, a synapsin 1 (Syn1) promoter, a methyl-CpG binding protein 2 (MeCP2) promoter, a Ca²⁺ / calmodulin-dependent protein kinase II (CaMKII) promoter, a metabotropic glutamate receptor 2 (mGluR2) promoter, a β-globin minigene nβ2 promoter, a preproenkephalin (PPE) promoter, an enkephalin (Enk) promoter, an excitatory amino acid transporter 2 (EAAT2) promoter, a glial fibrillary acidic protein (GFAP) promoter, a myelin basic protein (MBP) promoter, an OTOF promoter, a GRK1 promoter, a CRX promoter, an NRL promoter, an MECP2 promoter, an mMECP2 promoter, an hMECP2 promoter, an APP promoter, and an RCVRN promoter.
[0442] V. Delivery
[0443] Various modes of delivery can be applied to the polypeptide of the disclosure or the system of the disclosure as needed in practice.
[0444] In yet another aspect, the disclosure provides a delivery system comprising (1) the polypeptide of the disclosure, the polynucleotide of the disclosure, or the system of the disclosure; and (2) a delivery vehicle.
[0445] In yet another aspect, the disclosure provides a vector comprising the polynucleotide of the disclosure. In some embodiments, the vector is a plasmid vector, a viral vector (e.g., a recombinant AAV (rAAV) vector, a recombinant lentivirus vector) , a ribonucleoprotein (RNP) , or a lipid nanoparticle (LNP) .
[0446] In yet another aspect, the disclosure provides a recombinant AAV (rAAV) particle comprising the rAAV vector of the disclosure. In some embodiments, the rAAV vector is an RNA. For a simple introduction to AAV for delivery, see “Adeno-associated Virus (AAV) Guide” (addgene.org/guides/aav/).
[0447] Adeno-associated virus (AAV), when engineered to deliver, e.g., a protein-encoding sequence of interest, may be termed a (r)AAV vector, a (r)AAV vector particle, or a (r)AAV particle, where “r” stands for “recombinant”. The nucleic acid packaged in AAV vectors for delivery may be termed a (r)AAV vector genome, vector genome, or vg for short, while viral genome may refer to the original viral genome of natural AAVs.
[0448] The serotypes of the capsids of rAAV particles can be matched to the types of target cells. For example, Table 2 of WO2018002719A1 lists exemplary cell types that can be transduced by the indicated AAV serotypes (incorporated herein by reference) .
[0449] In some embodiments, the rAAV particle comprises a capsid with a serotype suitable for delivery into target cells (e.g., inner hair cells). In some embodiments, the rAAV particle comprises a capsid with a serotype of AAV1, AAV2, AAV3A, AAV3B, AAV4, AAV5, AAV6, AAV7, AAVrh74, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV-DJ, or AAV.PHP.eB, a member of the Clade to which any of the AAV1-AAV13 belong, or a functional variant (e.g., a functional truncation) thereof, encapsidating the rAAV vector. In some embodiments, the serotype of the capsid is AAV9 or a functional variant thereof.
[0450] General principles of rAAV particle production are known in the art. In some embodiments, rAAV particles may be produced using the triple transfection method (described in detail in U.S. Pat. No. 6,001,650) .
[0451] The vector titers are usually expressed as vector genomes per ml (vg / ml). In some embodiments, the vector titer is above 1×10⁹, above 5×10¹⁰, above 1×10¹¹, above 5×10¹¹, above 1×10¹², above 5×10¹², or above 1×10¹³ vg / ml.
[0452] Instead of packaging a single-stranded (ss) DNA as a vector genome of a rAAV particle, systems and methods of packaging an RNA as a vector genome into a rAAV particle have recently been developed and are applicable herein. See PCT / CN2022 / 075366, which is incorporated herein by reference in its entirety.
[0453] When the vector genome is RNA as in, for example, PCT / CN2022 / 075366, for simplicity of description and claiming, sequence elements described herein for DNA vector genomes, when present in RNA vector genomes, should generally be considered to be applicable for the RNA vector genomes except that the deoxyribonucleotides in the DNA sequence are the corresponding ribonucleotides in the RNA sequence (e.g., dT is equivalent to U, and dA is equivalent to A) and / or the element in the DNA sequence is replaced with the corresponding element with a corresponding function in the RNA sequence or omitted because its function is unnecessary in the RNA sequence and / or an additional element necessary for the RNA vector genome is introduced.
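As a minimal sketch of the nucleotide-level part of this equivalence only (dT read as U, with A, C, and G unchanged), the mapping can be written as follows. The function name is hypothetical, and the sketch does not address replacing, omitting, or adding sequence elements.

```python
def dna_to_rna(seq: str) -> str:
    """Map a DNA sequence to the corresponding RNA sequence (T -> U), preserving case."""
    return seq.translate(str.maketrans({"T": "U", "t": "u"}))
```

For example, `dna_to_rna("ATGCtt")` maps each T or t to U or u and leaves the other symbols untouched.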
[0454] As used herein, a coding sequence, e.g., as a sequence element of rAAV vector genomes herein, is construed, understood, and considered as covering and covers both a DNA coding sequence and an RNA coding sequence. When it is a DNA coding sequence, an RNA sequence can be transcribed from the DNA coding sequence, and optionally further a protein can be translated from the transcribed RNA sequence as necessary. When it is an RNA coding sequence, the RNA coding sequence per se can be a functional RNA sequence for use, or an RNA sequence can be produced from the RNA coding sequence, e.g., by RNA processing, or a protein can be translated from the RNA coding sequence.
[0455] For example, a polynucleotide encoding a polypeptide covers either a DNA from which the polypeptide is expressed (indirectly via transcription and translation) or an RNA from which the polypeptide is translated (directly).
[0456] For example, a nucleic acid comprising or encoding a gRNA covers either a DNA from which a gRNA is transcribed or an RNA (1) which per se functions in the same way as the gRNA does, or (2) from which a gRNA is produced, e.g., by RNA processing.
[0457] In some embodiments for rAAV RNA vector genomes, 5’-ITR and / or 3’-ITR as DNA packaging signals may be unnecessary and can be omitted at least partly, while RNA packaging signals can be introduced. In some embodiments for rAAV RNA vector genomes, a promoter to drive transcription of DNA sequences may be unnecessary and can be omitted at least partly. In some embodiments for rAAV RNA vector genomes, a sequence encoding a polyA signal may be unnecessary and can be omitted at least partly, while a polyA tail can be introduced. Similarly, other DNA elements of rAAV DNA vector genomes can be either omitted or replaced with corresponding RNA elements and / or additional RNA elements can be introduced, in order to adapt to the strategy of delivering an RNA vector genome by rAAV particles.
[0458] In yet another aspect, the disclosure provides a ribonucleoprotein (RNP) comprising the polypeptide of the disclosure and the guide nucleic acid as described in the disclosure.
[0459] In yet another aspect, the disclosure provides a lipid nanoparticle (LNP) comprising (1) an RNA (e.g., mRNA) comprising a sequence encoding the polypeptide of the disclosure and (2) a guide nucleic acid as described in the disclosure.
[0460] In yet another aspect, the disclosure provides a cell comprising the polypeptide of the disclosure, the system of the disclosure, the polynucleotide of the disclosure, the vector of the disclosure, the rAAV particle of the disclosure, the RNP of the disclosure, or the LNP of the disclosure.
[0461] VI. Method of modifying
[0462] The polypeptide and system of the disclosure have a wide variety of utilities, including modifying (e.g., cleavage, base editing, transcriptional activation or inactivation, methylation or demethylation) a target DNA in a multiplicity of cell types. The system has a broad spectrum of applications requiring high activity / efficiency and small sizes, e.g., establishing animal models, cell engineering, prevention, diagnosis, and treatment of diseases.
[0463] The polypeptide and system of the disclosure can be used to modify a target DNA, for example, to cleave the target DNA or to base edit the target DNA. For example, the modification may lead to silencing of a gene.
[0464] In yet another aspect, the disclosure provides a method for modifying a target DNA, comprising contacting the target DNA with the system of the disclosure, the vector of the disclosure, the ribonucleoprotein of the disclosure, or the lipid nanoparticle of the disclosure, wherein the guide sequence is capable of hybridizing to a target sequence of the target DNA, wherein the target DNA is modified by the complex.
[0465] In some embodiments, the modification includes cleavage, base editing (e.g., single base editing), prime editing, or epigenomic editing, including transcriptional activation or inhibition.
[0466] In some embodiments, the method is in vitro, in vivo, or ex vivo.
[0467] In some embodiments, the target DNA is in a cell.
[0468] In yet another aspect, the disclosure provides a cell modified by the system of the disclosure or the method of the disclosure. In some embodiments, the cell is modified in vitro, in vivo, or ex vivo.
[0469] In some embodiments, the cell is a eukaryotic cell (e.g., an animal cell, a vertebrate cell, a mammalian cell, a non-human mammalian cell, a non-human primate cell, a rodent (e.g., mouse or rat) cell, a human cell, a plant cell, or a yeast cell) or a prokaryotic cell (e.g., a bacterial cell).
[0470] In some embodiments, the cell is from a plant or an animal. In some embodiments, the cell is not from a plant.
[0471] In some embodiments, the cell is a non-human mammalian cell, such as a cell from a non-human primate (e.g., monkey) , an ox / cow / bull / cattle, sheep, goat, pig, horse, dog, cat, rodent (such as rabbit, mouse, rat, hamster, etc. ) , alpaca. In some embodiments, the cell is from fish (such as salmon, zebra fish) , bird (such as poultry bird, including chick, duck, goose) , reptile, shellfish (e.g., oyster, clam, lobster, shrimp) , insect, worm, yeast, etc.
[0472] In some embodiments, the plant is a dicotyledon. In some embodiments, the dicotyledon is selected from the group consisting of soybean, cabbage (e.g., Chinese cabbage), rapeseed, brassica, watermelon, melon, potato, tomato, tobacco, eggplant, pepper, cucumber, cotton, alfalfa, and grape. In some embodiments, the plant is a monocotyledon. In some embodiments, the monocotyledon is selected from the group consisting of rice, corn, wheat, barley, oat, sorghum, millet, grasses, Poaceae, Zizania, Avena, Coix, Hordeum, Oryza, Panicum (e.g., Panicum miliaceum), Secale, Setaria (e.g., Setaria italica), Sorghum, Triticum, Zea, Cymbopogon, Saccharum (e.g., Saccharum officinarum), Phyllostachys, Dendrocalamus, Bambusa, and Yushania.
[0473] In some embodiments, the cell is from a plant, such as a monocot or dicot. In certain embodiments, the plant is a food crop such as barley, cassava, cotton, groundnuts or peanuts, maize, millet, oil palm fruit, potatoes, pulses, rapeseed or canola, rice, rye, sorghum, soybeans, sugar cane, sugar beets, sunflower, and wheat. In certain embodiments, the plant is a cereal (barley, maize, millet, rice, rye, sorghum, and wheat). In certain embodiments, the plant is a tuber (cassava and potatoes). In certain embodiments, the plant is a sugar crop (sugar beets and sugar cane). In certain embodiments, the plant is an oil-bearing crop (soybeans, groundnuts or peanuts, rapeseed or canola, sunflower, and oil palm fruit). In certain embodiments, the plant is a fiber crop (cotton). In certain embodiments, the plant is a tree (such as a peach or a nectarine tree, an apple or pear tree, a nut tree such as almond or walnut or pistachio tree, or a citrus tree, e.g., orange, grapefruit or lemon tree), a grass, a vegetable, a fruit, or an alga. In certain embodiments, the plant is a nightshade plant; a plant of the genus Brassica; a plant of the genus Lactuca; a plant of the genus Spinacia; a plant of the genus Capsicum; cotton, tobacco, asparagus, carrot, cabbage, broccoli, cauliflower, tomato, eggplant, pepper, lettuce, spinach, strawberry, blueberry, raspberry, blackberry, grape, coffee, cocoa, etc.
[0474] In some embodiments, the cell is a stem cell. In some embodiments, the cell is an embryonic stem cell. In some embodiments, the cell is a primary human cell or an established human cell line.
[0475] In some embodiments, the cell is not a human or animal embryonic stem cell. In some embodiments, the cell is not a human or animal germ cell. In some embodiments, the cell is not a plant cell.
[0476] Further embodiments are illustrated in the following Examples which are given for illustrative purposes only and are not intended to limit the scope of the disclosure.
[0477] EXAMPLES
[0478] The following examples are provided to further illustrate some embodiments of the disclosure but are not intended to limit the scope of the invention; it will be understood by their exemplary nature that other procedures, methodologies, or techniques known to those skilled in the art may alternatively be used.
[0479] Methods
[0480] Cell culture, transfection, and flow cytometry
[0481] HEK293T cells (Stem Cell Bank, Chinese Academy of Sciences) were maintained in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum (FBS) and penicillin–streptomycin at 37 ℃ in 5% CO2. Cells were seeded onto poly-D-lysine–coated 24-well plates (Corning) prior to transfection.
[0482] For GFP activation assays, cells were transfected using polyethylenimine (PEI, Polysciences) according to the manufacturer’s instructions with a total of 1.6 μg plasmids (0.8 μg reporter and 0.8 μg Cas-expressing constructs) and 3.2 μl PEI. Forty-eight hours post-transfection, GFP activation was quantified using a CytoFLEX flow cytometer (Beckman Coulter) , and data were analyzed with FlowJo v10.9.0.
[0483] For endogenous editing assays, HEK293T cells were transfected with 800 ng of Cas-expressing plasmids, 400 ng of sgRNA plasmids, and 2.4 μl PEI. At 48 h post-transfection, ~12,000 mCherry+GFP+ double-positive cells were isolated by fluorescence-activated cell sorting (FACS) for indel quantification, and cells collected at 72 h were used for base editing efficiency analysis.
[0484] Targeted deep sequencing and analysis
[0485] Sorted cells were lysed in 20 μl of lysis buffer containing proteinase K (Vazyme) according to the manufacturer’s instructions to extract genomic DNA. For EditR analysis, the genomic region surrounding each target site was amplified by nested PCR using Phanta Max Super-Fidelity DNA polymerase (Vazyme) . Purified amplicons were subjected to Sanger sequencing, and editing outcomes were quantified with EditR. For deep sequencing, target loci were amplified by nested PCR, with barcoded primers introduced in the second round. Amplicons were purified using a gel extraction kit (Vazyme) and sequenced on an Illumina NovaSeq 6000 platform (AZENTA) with 150-bp paired-end reads. Raw reads were demultiplexed using Cutadapt, and editing outcomes (indels and base substitutions) were quantified with CRISPResso2.
[0486] In vitro target DNA cleavage assay
[0487] To investigate the target DNA cleavage activity of programmable RNA-guided DNA endonucleases, in vitro cleavage assays were performed using a 708-bp dsDNA containing the 20-bp target site as substrate. Cleavage reactions were performed with 0.5 μM of target DNA and different concentrations of endonuclease-sgRNA RNP in reaction buffer (25 mM Hepes-NaOH pH 7.5, 50 mM NaCl, 50 mM KCl, 1.5 mM MgCl2, 20% glycerol, and 1 mM DTT). Reactions were incubated at 37 ℃ for 2 h and quenched by adding urea and proteinase K (Thermo Fisher Scientific) at final concentrations of 1 M and 1 μg μl⁻¹, respectively, and incubated at 60 ℃ for 3 h. The sample was heated at 95 ℃ for 5 min before loading on a 6% Novex Native-PAGE (Thermo Fisher Scientific). The gel was run at 200 V for 30 min, stained with GelRed nucleic acid stain, and imaged on an iBright FL1500 Imaging System. Gel images were processed and prepared in ImageJ (v. 1.53k).
[0488] EXAMPLE 1. Characterization of endonuclease activities of the programmable nucleic acid-guided DNA endonucleases of the disclosure
[0489] Design and Construction:
[0490] A number of endonuclease systems were developed by the Applicant, each consisting of one endonuclease of SEQ ID NOs: 1-11 in Table 1 and a gRNA corresponding to the endonuclease, which consisted of the scaffold sequence (one of SEQ ID NOs: 34-44 or 45-55) corresponding to the endonuclease in Table 1 and a guide sequence (spacer sequence) 5’ to the scaffold sequence (i.e., 5’-guide sequence-scaffold sequence-3’) without a linker in-between. As an example, the gRNA used in combination with the endonuclease of SEQ ID NO: 4 or a derivative thereof consists of the scaffold sequence of SEQ ID NO: 48 and a guide sequence 5’ to the scaffold sequence.
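The 5’-guide sequence-scaffold sequence-3’ layout without a linker can be sketched as a simple concatenation. The scaffold string below is a hypothetical placeholder and is not SEQ ID NO: 48 or any other sequence of the disclosure; the function name is likewise an assumption made for illustration.

```python
# Hypothetical placeholder only; NOT a scaffold sequence of the disclosure.
PLACEHOLDER_SCAFFOLD = "GUUUCAGAGCUA"


def assemble_grna(guide: str, scaffold: str) -> str:
    """Concatenate 5'-guide-scaffold-3' with no linker in between."""
    for name, seq in (("guide", guide), ("scaffold", scaffold)):
        if not seq or set(seq.upper()) - set("ACGU"):
            raise ValueError(f"{name} must be a non-empty RNA sequence over A, C, G, U")
    return guide.upper() + scaffold.upper()
```

With a 20-nucleotide guide, the assembled gRNA begins with the guide sequence and ends with the scaffold sequence, with no intervening nucleotides.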
[0491] An expression plasmid was designed to express the endonuclease in eukaryotic cells, comprising, from 5’ to 3’ , CMV enhancer, chicken β-actin promoter, a sequence encoding the endonuclease, and bGH poly (A) signal; and a sequence encoding mCherry under the regulation of CMV promoter.
[0492] A gRNA plasmid was designed to express the gRNA in eukaryotic cells, comprising a sequence encoding a gRNA under the regulation of U6 promoter.
[0493] A fluorescent reporter system was used to evaluate the endonuclease activity of the eleven (11) endonuclease systems, which was indicated by the intensity of a specific fluorescent signal. The guide sequence of the gRNA was designed to target the fluorescent reporter system.
[0494] Results:
[0495] Endonuclease activity of the tested endonucleases has been shown.
[0496] EXAMPLE 2. Design of nickase based on the programmable RNA-guided DNA endonucleases of the disclosure and characterization of nickase activities
[0497] Designs:
[0498] By aligning the amino acid sequences of the eleven (11) nucleases of SEQ ID NOs: 1-11 with CLUSTAL O (1.2.4) multiple sequence alignment (EMBL-EBI), a conservative motif (at positions corresponding to positions 10-12 of SEQ ID NO: 4) composed of conserved amino acid residues D and G and one undefined amino acid residue x between D and G was identified in the RuvC I domain of all the nucleases, a conservative motif (at positions corresponding to positions 304-308 of SEQ ID NO: 4) composed of conserved amino acid residues E and F and three undefined amino acid residues x between E and F was identified in the RuvC II domain of all the nucleases, and a conservative motif (at positions corresponding to positions 486-490 of SEQ ID NO: 4) composed of conserved amino acid residues H, D, and A and two undefined amino acid residues x between H and D was identified in the RuvC III domain of all the nucleases. By “conserved amino acid residue” it is meant that the amino acid residue is constant (not changed) across all the nucleases. By “conservative motif” it is meant that the motif (consisting of multiple amino acid residues) is constant (not changed) across all the nucleases. Mutations were then made at one or more of the conserved amino acid residues in one or more of the conservative motifs of S9D04 (SEQ ID NO: 4) to generate derivatives (mutants) and evaluate the possible outcome.
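The three motif patterns described above (D-x-G, E-x-x-x-F, and H-x-x-D-A) can be expressed as regular expressions. The following is a minimal sketch of a pattern scan only; it is not the multiple sequence alignment used above, and short patterns of this kind can match at unrelated positions of a full-length protein. The toy sequence and names are hypothetical.

```python
import re

# Motif patterns as described in the text; "." stands for the undefined residue x.
RUVC_MOTIF_PATTERNS = {
    "RuvC I": "D.G",      # D-x-G
    "RuvC II": "E...F",   # E-x-x-x-F
    "RuvC III": "H..DA",  # H-x-x-D-A
}


def find_motifs(protein: str) -> dict[str, list[int]]:
    """Return 1-based start positions of every (possibly overlapping) motif hit."""
    seq = protein.upper()
    hits = {}
    for name, pattern in RUVC_MOTIF_PATTERNS.items():
        # A zero-width lookahead reports overlapping matches as well.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", seq)]
    return hits


# Hypothetical toy sequence, not a sequence of the disclosure.
TOY_PROTEIN = "MKRDYGLAAEKLMFTTHRSDAQ"
```

In practice, candidate hits would be confirmed against the alignment and the domain boundaries rather than accepted from the pattern scan alone.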
[0499] Evaluation:
[0500] To evaluate the endonuclease activity and nickase activity of the derivatives (mutants) generated based on S9D04 (SEQ ID NO: 4) according to the mutagenesis principle above, a dual plasmid GFxxFP reporter system was established, containing an expression plasmid, and either a reporter plasmid for endonuclease activity evaluation or a reporter plasmid for nickase activity evaluation.
[0501] The expression plasmid comprised, from 5’ to 3’ , (1) a polynucleotide encoding S9D04 or its mutants operably linked to a promoter; (2) a polynucleotide encoding a guide nucleic acid operably linked to a promoter; and (3) a polynucleotide encoding mCherry operably linked to a promoter. Red fluorescent signals generated by the expression of the mCherry indicated successful transfection and expression of the expression plasmid in host cells.
[0502] The two reporter plasmids comprised a polynucleotide encoding BFP-T2A-GFxxFP expression cassette with a deactivated EGFP coding sequence (GFxxFP coding sequence) operably linked to a promoter. Blue fluorescent signals generated by the expression of the BFP indicated successful transfection and expression of the reporter plasmid in host cells.
[0503] For evaluation of the endonuclease activities of the mutants of S9D04, the GFxxFP coding sequence harbored an insertion sequence containing a protospacer sequence containing a premature stop codon, immediately 5’ to a protospacer adjacent motif (PAM), on the 5’-3’ strand, which premature stop codon prevents translation of the GFxxFP coding sequence.
[0504] For evaluation of the nickase activities of the mutants of S9D04, the GFxxFP coding sequence harbored an insertion sequence containing (1) the complement of the protospacer sequence containing a premature stop codon, immediately 5’ to a PAM, on the 3’-5’ strand and (2) a protospacer sequence containing a premature stop codon, immediately 5’ to a PAM, on the 5’-3’ strand, which premature stop codon prevents translation of the GFxxFP coding sequence.
[0505] The gRNA was designed to target the insertion sequence of the GFxxFP coding sequence so as to trigger double-strand cleavage by endonuclease activity or single-strand cleavage (nick) by nickase activity at the insertion sequence comprising the protospacer sequence and / or the complement of the protospacer sequence. For the purpose of targeting the GFxxFP coding sequence, the guide sequence is a GFxxFP-targeting guide sequence capable of targeting the insertion sequence. For the negative control (nontargeting control, “NT”), the guide sequence is a nontargeting guide sequence incapable of targeting the insertion sequence.
[0506] When using the reporter plasmid for endonuclease activity evaluation, in the event that a double-strand break (DSB) was generated by the tested polypeptide guided by the GFxxFP-targeting gRNA, the subsequent DNA repair triggered by the DSB would restore and activate the deactivated EGFP coding sequence to express EGFP with green fluorescence emission indicative of endonuclease activity functioning in a guide sequence-specific (on-target) manner.
[0507] When using the reporter plasmid for nickase activity evaluation, in the event that two single-strand breaks (SSBs) (two nicks) were generated by the tested polypeptide guided by the GFxxFP-targeting gRNA, the two SSBs are similar to a DSB and may also trigger subsequent DNA repair, thereby restoring and activating the deactivated EGFP coding sequence to express EGFP with green fluorescence emission indicative of nickase activity functioning in a guide sequence-specific (on-target) manner. Note that the tested S9D04, while showing double-strand cleavage demonstrating that it is an endonuclease, may also show single-strand cleavage (since the endonuclease activity can certainly generate SSBs) in the nickase evaluation reporter system, which fact however does not make it a nickase.
[0508] HEK293T cells were cultured in 24-well tissue culture plates according to standard methods for 12 hours before the reporter and expression plasmids were co-transfected into the cells using standard polyethylenimine (PEI) transfection. The transfected cells were then cultured at 37 ℃ under 5% CO2 for 48 hours. The cultured cells were then analyzed by flow cytometry for BFP, EGFP, and mCherry fluorescent signals.
[0509] The endonuclease and nickase activities were calculated as the percentage of EGFP-positive cells (“EGFP+”) among BFP and mCherry dual-positive cells (“mCherry+ BFP+”). The higher the %EGFP+ / mCherry+ BFP+, the higher the endonuclease or nickase activity.
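The activity readout described above can be sketched as follows, assuming each flow cytometry event has already been classified as positive or negative in each channel. The gating thresholds, the `Event` type, and the function name are assumptions made for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One flow cytometry event after gating (hypothetical representation)."""
    bfp: bool      # reporter plasmid marker
    mcherry: bool  # expression plasmid marker
    egfp: bool     # restored EGFP signal


def percent_egfp_in_dual_positive(events: list[Event]) -> float:
    """Return %EGFP+ among mCherry+ BFP+ dual-positive events."""
    dual_positive = [e for e in events if e.bfp and e.mcherry]
    if not dual_positive:
        raise ValueError("no mCherry+ BFP+ events to normalize against")
    return 100.0 * sum(e.egfp for e in dual_positive) / len(dual_positive)
```

Restricting the denominator to dual-positive events means that only cells carrying both the reporter plasmid and the expression plasmid contribute to the readout.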
[0510] Results:
[0511] The mutants of S9D04 include S9D04-D10A, S9D04-G12A, S9D04-E304A, S9D04-F308A, S9D04-H486A, and S9D04-D489A.
[0512] The results demonstrate that by introducing a substitution (e.g., with alanine (A)) at a conserved amino acid residue (e.g., D10) in the identified conservative motif of S9D04, the resulting mutant of S9D04 (e.g., S9D04-D10A) showed significantly reduced (almost eliminated) endonuclease activity as compared with S9D04 (“WT”) and showed significant nickase activity, suggesting that such a substitution inactivates the RuvC nuclease domain and generates a nickase that is believed to nick the target strand and to be suitable for base editing.
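The mutant naming used above (e.g., D10A: wild-type residue, 1-based position, replacement residue) can be applied to a sequence with a short helper. This is a minimal sketch; the toy sequence is hypothetical and merely places D at position 10 and G at position 12, and it is not SEQ ID NO: 4.

```python
import re


def apply_substitution(protein: str, label: str) -> str:
    """Apply a single substitution such as 'D10A' to a protein sequence."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at position {position}, found {protein[position - 1]}")
    return protein[: position - 1] + replacement + protein[position:]


# Hypothetical toy sequence with D at position 10 and G at position 12.
TOY_SEQUENCE = "MKRSTLAAEDYGLV"
```

Checking the wild-type residue before substituting guards against applying a label to the wrong sequence or the wrong numbering.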
[0513] EXAMPLE 3. Design of DNA adenine base editor
[0514] Based on the S9D04-D10A nickase, an adenine base editor (ABE) (SEQ ID NO: 56) was designed and is termed “wild type ABE (WT ABE)” hereinafter. WT ABE contains a TadA8e deaminase (SEQ ID NO: 57), an N-terminal bpSV40 NLS (SEQ ID NO: 58), a C-terminal NP NLS (SEQ ID NO: 59), and an XTEN-GS linker (SEQ ID NO: 60) between the deaminase and the nickase. A SpCas9-based adenine base editor (SEQ ID NO: 61) was used as control.
[0515] EXAMPLE 4. Design and screening of scaffold sequence of gRNA
[0516] For use in combination with WT ABE (SEQ ID NO: 56) comprising nickase S9D04-D10A, a scaffold sequence of a gRNA was designed to consist of a DR sequence of SEQ ID NO: 15, a tracr sequence of SEQ ID NO: 26, and a linker of 5’-AAAAAA-3’ (SEQ ID NO: 62) between the DR sequence and the tracr sequence. The scaffold sequence was set forth in SEQ ID NO: 48 and termed as “wild type scaffold (WT scaffold) ” hereinafter.
[0517] Further scaffold variants were designed based on the WT scaffold as shown in Table 2.
[0518] Table 2
[0519] An ABE expression plasmid was designed to express the ABE in eukaryotic cells, comprising, from 5’ to 3’ , CMV enhancer, chicken β-actin promoter, a sequence encoding the ABE, and bGH poly (A) signal; and a sequence encoding mCherry under the regulation of CMV promoter.
[0520] A gRNA plasmid was designed to express the gRNA in eukaryotic cells, comprising a sequence encoding a gRNA under the regulation of U6 promoter.
[0521] An A11 fluorescent reporter plasmid was designed to evaluate the base editing efficiency of the ABE, which was indicated by the intensity of a specific fluorescent signal. The fluorescent reporter system comprises a sequence encoding BFP-T2A-insertion sequence-P2A-EGFP cassette under the regulation of CAG promoter. The insertion sequence contains a protospacer sequence ( SEQ ID NO: 75) carrying a premature stop codon with nucleotide A at position 11 of the protospacer sequence. The guide sequence of the gRNA was designed to target the protospacer sequence. In the case that the ABE converted nucleotide A in the premature stop codon to nucleotide G in a new codon TGG (Trp; W) , the EGFP fluorescent signal would be observed.
[0522] Similarly, an A5 fluorescent reporter plasmid was designed in which the protospacer sequence of the A11 fluorescent reporter system was replaced with a protospacer sequence (SEQ ID NO: 76) carrying a premature stop codon with nucleotide A at position 5 of the protospacer sequence.
[0523] Similarly, a site 2229 fluorescent reporter plasmid was designed in which the protospacer sequence of the A11 fluorescent reporter system was replaced with a protospacer sequence (SEQ ID NO: 77) carrying a premature stop codon with nucleotide A at position 6 of the protospacer sequence.
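The reporter logic described above can be sketched in Python as follows. The protospacer below is a hypothetical 20-nt sequence, not SEQ ID NO: 75, 76, or 77, and the sketch assumes that the stop codon is read on the protospacer strand as written and that the reading frame offset is known; both are assumptions made for illustration only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def a_to_g_edit(protospacer: str, position: int) -> str:
    """Return the protospacer with the A at a 1-based position changed to G."""
    idx = position - 1
    if protospacer[idx] != "A":
        raise ValueError(f"position {position} is {protospacer[idx]}, not A")
    return protospacer[:idx] + "G" + protospacer[idx + 1:]


def codon_at(seq: str, position: int, frame_offset: int) -> str:
    """Return the codon containing a 1-based position.

    frame_offset is the 0-based index at which a codon starts.
    """
    idx = position - 1
    start = idx - ((idx - frame_offset) % 3)
    if start < 0 or start + 3 > len(seq):
        raise ValueError("codon extends beyond the sequence")
    return seq[start:start + 3]


# Hypothetical protospacer with a TAG stop codon whose A is at position 11.
protospacer = "GCCACCGTC" + "TAG" + "GGCATCGA"
edited = a_to_g_edit(protospacer, 11)

before = codon_at(protospacer, 11, frame_offset=0)
after = codon_at(edited, 11, frame_offset=0)
print(before, "->", after)  # TAG -> TGG
```

In this toy case the edit turns a stop codon into TGG, which corresponds to the condition under which the downstream EGFP would be translated in the reporter.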
[0524] The ABE expression plasmid, the gRNA plasmid, and one of the three fluorescent reporter plasmids were co-transfected into HEK293 cells to test the base editing efficiency of each ABE. NT refers to a negative control with a non-targeting gRNA.
[0525] The results in Table 3 below show that some scaffold variants in Table 2 achieved improved base editing efficiency over the WT scaffold when used in combination with the same WT ABE.
[0526] Table 3
[0527] The results in Table 4 below show that some scaffold variants in Table 2 achieved improved base editing efficiency over the WT scaffold when used in combination with the same ABE of SEQ ID NO: 46.
[0528] Table 4
[0529] EXAMPLE 5. Design and screening of high-efficiency mutants based on S9D04 with single substitution
[0530] Further mutants of S9D04 were designed by introducing one additional single substitution with arginine (R) into the S9D04-D10A contained in the WT ABE, at positions spanning substantially the full length of S9D04-D10A, and were tested for base editing efficiency. The single substitutions that increased base editing efficiency over the WT ABE (fold change of greater than 1.0) are listed in Tables 5-8 below.
[0531] Table 5
[0532] Table 6
[0533] Table 7
[0534] Table 8
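The single-substitution screen of Example 5 can be sketched in Python as follows. The function names, the toy protein, and the efficiency values are hypothetical and are not data from the disclosure; the sketch assumes that fold change is the editing efficiency of the mutant divided by that of the WT ABE measured under the same conditions.

```python
def arginine_scan(protein: str) -> list[str]:
    """List single substitutions to R (e.g. 'K14R') for every non-R residue."""
    return [f"{aa}{pos}R" for pos, aa in enumerate(protein, start=1) if aa != "R"]


def fold_change(mutant_efficiency: float, wt_efficiency: float) -> float:
    """Editing efficiency of a mutant relative to the WT ABE."""
    if wt_efficiency <= 0:
        raise ValueError("WT efficiency must be positive")
    return mutant_efficiency / wt_efficiency


def improved(fold_changes: dict[str, float], threshold: float = 1.0) -> dict[str, float]:
    """Keep only substitutions whose fold change exceeds the threshold."""
    return {name: fc for name, fc in fold_changes.items() if fc > threshold}


toy_protein = "MKRDA"  # hypothetical 5-residue sequence, not S9D04
candidates = arginine_scan(toy_protein)

# Hypothetical efficiencies (percent edited), for illustration only.
wt = 20.0
measured = {"M1R": 10.0, "K2R": 30.0, "D4R": 20.0, "A5R": 25.0}
hits = improved({name: fold_change(eff, wt) for name, eff in measured.items()})
print(candidates, hits)
```

The strict comparison against 1.0 mirrors the "fold change of greater than 1.0" criterion, so a mutant that merely matches the WT ABE is not reported as a hit.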
[0535] The single substitutions above were then combined to form S9D04 mutants with multiple substitutions, as shown in Table 9 below; all of these mutants also contain D10A.
[0536] Table 9
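Combining single substitutions into a multiple-substitution mutant can be sketched in Python as follows. The combination notation ("A311R + E530R + E658R") follows the claims; the helper names and the toy protein are hypothetical, and positions are assumed to be 1-based relative to the reference sequence.

```python
import re


def parse_combo(combo: str) -> list[str]:
    """Split a combination such as 'A311R + E530R' into single substitutions."""
    return [part.strip() for part in combo.split("+")]


def apply_substitutions(protein: str, substitutions: list[str]) -> str:
    """Apply substitutions like 'D10A' to a protein, checking each WT residue."""
    residues = list(protein)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"cannot parse substitution {sub!r}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(protein):
            raise ValueError(f"{sub}: position outside the sequence")
        if protein[pos - 1] != wt:
            raise ValueError(f"{sub}: reference residue is {protein[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


toy_protein = "MADKE"  # hypothetical sequence, not S9D04
mutant = apply_substitutions(toy_protein, parse_combo("D3A + K4R"))
print(mutant)  # MAARE
```

Checking the reference residue before each replacement catches numbering mismatches, which matters when substitutions defined relative to SEQ ID NO: 4 are transferred to corresponding positions of another sequence.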
[0537] EXAMPLE 6. Design and screening of high-efficiency mutants based on S9D04 with multiple substitutions
[0538] The multiple substitutions were introduced into the S9D04-D10A contained in the WT ABE to form new ABEs. The fold changes in base editing efficiency of those new ABEs over the WT ABE are listed in Tables 10-17 below. The new ABEs were tested in combination with the gRNA comprising the scaffold sequence of HAP5164 or the scaffold sequence of SEQ ID NO: 65.
[0539] Table 10
[0540] Table 11
[0541] Table 12
[0542] Table 13
[0543] Table 14
[0544] Table 15
[0545] Table 16
[0546] Table 17
[0547] EXAMPLE 7. Test of high-efficiency mutants based on S9D04 with multiple substitutions on endogenous sites
[0548] The new ABEs were tested for their base editing efficiency at the A2, A6, A11, and A17 sites of endogenous genes, including CIITA, CD52, PD1, B2M, and TRAC, as shown in Tables 18-26 below.
[0549] A single gRNA was used in the following experiments.
[0550] Table 18
[0551] Table 19
[0552] Table 20
[0553] Table 21
[0554] Mixed gRNAs were used in the following experiments to test base editing at multiple targets.
[0555] Table 22
[0556] Table 23
[0557] Table 24
[0558] Table 25
[0559] Table 26
[0560] ***
[0561] Various modifications and variations of the described products, methods, and uses of the disclosure will be apparent to those skilled in the art without departing from the scope and spirit of the disclosure. Although the disclosure has been described in connection with specific embodiments, it will be understood that it is capable of further modifications and that the disclosure as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the disclosure that are obvious to those skilled in the art are intended to be within the scope of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains and as may be applied to the essential features hereinbefore set forth.
Claims
A polypeptide comprising an amino acid sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to the amino acid sequence of any one of SEQ ID NOs: 4, 1-3, and 5-11.
The polypeptide of any preceding claim, wherein the polypeptide comprises: (a) an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position selected from the group consisting of Y11, K14, Y15, N25, Q27, E35, G46, E49, G52, I53, T56, Q57, Q58, G75, Y76, D77, S79, T80, K84, Y95, A98, D99, M100, P103, E104, E105, I106, E107, E108, K115, T120, Q121, N123, S124, A132, K136, D145, H147, K149, T170, K173, G176, K177, I193, L199, N220, N229, Q230, D232, W233, K236, N237, D240, A242, K254, E255, E261, K262, T265, K268, A274, L277, K278, T286, K289, D290, N295, A297, D299, F308, A311, A312, K313, G316, K317, L320, A321, K322, D323, E324, Y325, K327, G328, N329, E331, G333, K337, A339, G346, N353, S358, S373, N374, F376, L379, A405, A408, K411, N412, D413, K415, K416, K417, D420, K423, S424, I425, N426, Q427, N431, K432, S435, H436, A437, Y469, G475, K477, K478, A481, D483, K484, H485, H486, E502, S514, S528, K529, E530, K532, D534, S536, S537, Y559, E563, F585, N592, K616, C617, T618, K624, G625, L629, G631, L648, C657, E658, K660, V665, E670, D673, K676, I679, K684, K685, N695, L700, E704, D709, W712, T714, S716, K719, K720, and T725 of SEQ ID NO: 4; optionally, the amino acid substitution is an amino acid substitution with Arginine (R); and/or (b) an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position selected from the group consisting of D10, G12, E304, F308, H486, and D489 of SEQ ID NO: 4; optionally, the amino acid substitution is an amino acid substitution with a non-polar amino acid residue, such as, Alanine (A).
The polypeptide of any preceding claim, wherein the polypeptide comprises an amino acid substitution relative to any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to an amino acid substitution of D10A relative to SEQ ID NO: 4.
The polypeptide of any preceding claim, wherein the polypeptide comprises an amino acid substitution at a position of any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to a position selected from the group consisting of K14, Q57, K136, T286, A311, N374, L379, K484, E530, N592, E658, K685, and E704 of SEQ ID NO: 4; optionally, the amino acid substitution is an amino acid substitution with Arginine (R).
The polypeptide of any preceding claim, wherein the polypeptide comprises an amino acid substitution relative to any one of SEQ ID NOs: 4, 1-3, and 5-11 that is corresponding to an amino acid substitution selected from the group consisting of K14R, Q57R, K136R, T286R, T286R., A311R, N374R, L379R, K484R, E530R, N592R, E658R, K685R, and E704R relative to SEQ ID NO: 4.
The polypeptide of any preceding claim, wherein the polypeptide comprises an amino acid substitution selected from the group consisting of: (1) K14R; (2) Q57R; (3) K136R; (4) T286R; (5) T286R.; (6) A311R; (7) N374R; (8) L379R; (9) K484R; (10) E530R; (11) N592R; (12) E658R; (13) K685R; (14) E704R; (15) A311R + E530R + E658R + N592R + K14R; (16) E530R + E658R + K14R; (17) A311R + E530R + E658R; (18) A311R + E530R + E658R + Q57R; (19) A311R + E530R + E658R + E704R; (20) A311R + E530R + E658R + N374R; (21) A311R + E530R + E658R + Q57R + E704R; (22) A311R + E530R + E658R + E704R + N374R; (23) A311R + E530R + E658R + Q57R + E704R + N374R; (24) A311R + E530R + E658R + Q57R + N374R; (25) A311R + E530R + E658R + L379R + K484R; (26) A311R + E530R + E658R + K685R; (27) A311R + E530R + E658R + K136R + T286R; (28) A311R + E530R + E658R + Q57R + E704R + L379R + K484R; (29) A311R + E530R + E658R + Q57R + E704R + K685R; (30) A311R + E530R + E658R + Q57R + E704R + K136R + T286R; (31) A311R + E530R + E658R + Q57R + L379R + K484R; (32) A311R + E530R + E658R + Q57R + K685R; (33) A311R + E530R + E658R + Q57R + K136R + T286R; (34) A311R + E530R + E658R + N374R + L379R + K484R; (35) A311R + E530R + E658R + N374R + K685R; (36) A311R + E530R + E658R + N374R + K136R + T286R; and any combination of (1)-(36); relative to SEQ ID NO: 4; and optionally further, amino acid substitution D10A relative to SEQ ID NO: 4.
The polypeptide of any preceding claim, wherein the polypeptide comprises amino acid substitutions of (a) D10A, A311R, E530R, E658R, K136R, and T286R relative to SEQ ID NO: 4; or (b) D10A, A311R, E530R, E658R, N374R, L379R, and K484R relative to SEQ ID NO: 4.
The polypeptide of any preceding claim, wherein the polypeptide comprises the amino acid sequence of SEQ ID NO: 53 or 54; or comprises an amino acid sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 53 or 54.
The polypeptide of any preceding claim, wherein the polypeptide has at least one of endonuclease activity, nickase activity, and DNA binding property.
A fusion protein comprising the polypeptide of any preceding claim and a functional domain.
The fusion protein of any preceding claim, wherein the functional domain is selected from the group consisting of a nuclear localization signal (NLS), a nuclear export signal (NES), a deaminase or a catalytic domain thereof, a uracil glycosylase inhibitor (UGI), a uracil glycosylase (UNG), a methylpurine glycosylase (MPG), a methylase or a catalytic domain thereof, a demethylase or a catalytic domain thereof, a transcription activating domain (e.g., VP64, VPR, miniVPR), a transcription inhibiting domain (e.g., KRAB moiety or SID moiety), a reverse transcriptase or a catalytic domain thereof, an exonuclease or a catalytic domain thereof (e.g., T5 exonuclease), a histone residue modification domain, a nuclease catalytic domain (e.g., FokI), a transcription modification factor, a light gating factor, a chemical inducible factor, a chromatin visualization factor, a targeting polypeptide for providing binding to a cell surface portion on a target cell or a target cell type, a reporter (e.g., fluorescent) polypeptide or a detection label (e.g., GST, HRP, CAT, GFP, HcRed, DsRed, CFP, YFP, BFP), a localization signal, a polypeptide targeting moiety, a DNA binding domain (e.g., MBP, Lex A DBD, Gal4 DBD), an epitope tag (e.g., His, myc, V5, FLAG, HA, VSV-G, Trx, etc.), a transcription release factor, an HDAC, a moiety having RNA cleavage activity, a moiety having ssDNA cleavage activity, a moiety having dsDNA cleavage activity, a DNA or RNA ligase, a functional domain exhibiting activity to modify a target DNA selected from the group consisting of: methyltransferase activity, DNA repair activity, DNA damage activity, dismutase activity, alkylation activity, dealkylation activity, depurination activity, oxidation activity, deoxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity, glycosylase activity, acetyl transferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitination activity, adenylation activity, deadenylation activity, SUMOylation activity, deSUMOylation activity, ribosylation activity, deribosylation activity, myristoylation activity, demyristoylation activity, glycosylation activity (e.g., from O-GlcNAc transferase), deglycosylation activity, and a catalytic domain thereof, and a functional fragment thereof, and any combination thereof.
The fusion protein of any preceding claim, wherein the deaminase or catalytic domain thereof comprises an amino acid sequence having a sequence identity of at least about 80% (e.g., at least about 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to SEQ ID NO: 57.
The fusion protein of any preceding claim, wherein the fusion protein comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the amino acid sequence of SEQ ID NO: 56; or wherein the fusion protein comprises, consists essentially of, or consists of an amino acid sequence having a sequence identity of at least about 80% (e.g., at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) to the amino acid sequence of SEQ ID NO: 56 provided that the polypeptide in SEQ ID NO: 56 is replaced with the polypeptide of any preceding claim other than S9D04-D10A.
A system comprising: (1) the polypeptide of any preceding claim or the fusion protein of any preceding claim, or a polynucleotide (e.g., a DNA, an RNA) encoding the polypeptide or the fusion protein, and (2) a guide nucleic acid or a polynucleotide (e.g., a DNA, an RNA) encoding the guide nucleic acid, the guide nucleic acid comprising: (i) a scaffold sequence capable of forming a complex with the polypeptide or the fusion protein; and (ii) a guide sequence capable of hybridizing to a target sequence of a target DNA, thereby guiding the complex to the target DNA.
The system of any preceding claim, wherein the scaffold sequence has substantially the same secondary structure as the secondary structure of any one of SEQ ID NOs: 34-55 and 65; or the scaffold sequence comprises a polynucleotide sequence having a sequence identity of at least about 80% (e.g., at least about 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, or 100%) to any one of SEQ ID NOs: 34-55 and 65; or the scaffold sequence comprises the polynucleotide sequence of any one of SEQ ID NOs: 34-55 and 65.
The system of any preceding claim, wherein the scaffold sequence is a mutant of the scaffold sequence of SEQ ID NO: 48 and comprises a nucleotide deletion relative to SEQ ID NO: 48 selected from the group consisting of: 1) Deletion of nucleotides at positions 17-19 and 24-26 of SEQ ID NO: 48; 2) Deletion of nucleotides at positions 14-19 and 24-29 of SEQ ID NO: 48; 3) Deletion of nucleotides at positions 12-19 and 24-31 of SEQ ID NO: 48; 4) Deletion of nucleotides at positions 17-19, 24-26, 7-10, and 33-36 of SEQ ID NO: 48; 5) Deletion of nucleotides at positions 46-53 and 126-131 of SEQ ID NO: 48; 6) Deletion of nucleotides at positions 47-51 and 127-130 of SEQ ID NO: 48; 7) Deletion of nucleotides at positions 50-54 and 125-127 of SEQ ID NO: 48; 8) Deletion of nucleotides at positions 70-72 and 85-87 of SEQ ID NO: 48; 9) Deletion of nucleotides at positions 92-99 of SEQ ID NO: 48; 10) Deletion of nucleotides at positions 97-101 of SEQ ID NO: 48; 11) Deletion of nucleotides at positions 14-19, 24-29, and 92-99 of SEQ ID NO: 48; 12) Deletion of nucleotides at positions 12-19, 24-31, and 92-99 of SEQ ID NO: 48; and 13) Deletion of nucleotides at positions 17-19, 24-26, 7-10, 33-36, and 92-99 of SEQ ID NO: 48.
The system of any preceding claim, wherein the guide sequence is in a length of from about 19 to about 24 nucleotides; optionally, the guide sequence is about 20 nucleotides in length.
A polynucleotide comprising a sequence encoding the polypeptide of any preceding claim or the fusion protein of any preceding claim.
A vector comprising the polynucleotide of any preceding claim; optionally, the vector is a plasmid vector, a viral vector (e.g., a recombinant AAV (rAAV) vector, a recombinant lentivirus vector), a ribonucleoprotein (RNP), or a lipid nanoparticle (LNP).
A cell comprising the polypeptide of any preceding claim, the fusion protein of any preceding claim, the system of any preceding claim, the polynucleotide of any preceding claim, or the vector of any preceding claim.
A method for modifying a target DNA, comprising contacting the target DNA with the system of any preceding claim, wherein the guide sequence is capable of hybridizing to a target sequence of the target DNA, wherein the target DNA is modified by the complex.