New CRISPR gene editing system
By developing a CRISPR system that combines TraC effector proteins with multiple guide RNAs, the invention addresses the limited functionality and difficult delivery of existing CRISPR-Cas12 proteins, enabling multiplex targeted editing and efficient genome editing.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- INST OF GENETICS & DEVELOPMENTAL BIOLOGY CHINESE ACAD OF SCI
- Filing Date
- 2023-06-01
- Publication Date
- 2026-06-12
AI Technical Summary
Existing CRISPR-Cas12 proteins are limited to a single function and are difficult to deliver for gene editing; they also lack multi-target editing capability and high editing efficiency.
A gene editing system based on transposon and CRISPR-Cas12 intermediate (TraC) effector proteins was developed. The TraC effector proteins form complexes either with guide RNA derived from the right-hand element of the transposon or with guide RNA containing tracrRNA and crRNA, enabling multiplex target binding and editing.
The system enriches the genome editing toolbox, provides multiplex gene editing capability, and improves editing efficiency. In addition, the TraC effector protein is the smallest monomeric Cas12 protein, which facilitates delivery and editing in vivo.
Description
Technical Field
[0001] This invention belongs to the field of genetic engineering. Specifically, this invention relates to a novel CRISPR gene editing system and its applications. More specifically, this invention provides a transposon and CRISPR-Cas12 intermediate (TraC) effector protein or a functional variant thereof, and a gene editing system based thereon and its applications.
Background of the Invention
[0002] Type V effector proteins are Cas12 proteins with multiple functional domains. They are characterized by a RuvC-like domain that is generally responsible for cleaving target DNA. Type V includes a rich variety of subtypes; so far, 11 subtypes (Cas12a-k) have been discovered and classified. Among them, Cas12a and Cas12b have been developed into highly efficient eukaryotic gene editing systems. Cas12a, also known as Cpf1, contains a RuvC-like domain similar to those of Cas9 and TnpB proteins. Unlike Cas9, however, Cas12a family proteins lack the HNH domain and use only the RuvC domain to cleave both DNA strands. Cas12b, initially called C2c1 (Class 2 candidate 1), has a C-terminal sequence very similar to that of the IS605-family TnpB proteins. It shows no significant sequence similarity to other Class 2 proteins. Its cas genes include a Cas1/Cas4 fusion gene, Cas2, and Cas12b, and maturation of its crRNA also requires tracrRNA. Cas12c was initially called C2c3 (Class 2 candidate 3), and its cas genes include only Cas1 and Cas12c. Cas12c shows only limited similarity to the TnpB-homologous portion of Cpf1.
[0003] Thanks to improved bioinformatic analysis methods, the number of Type V subtypes has grown rapidly in recent years, with a total of 10 Type V subtypes discovered, including Cas12a, Cas12b, and Cas12c. The nucleic acid interference activities of these subtypes have also gradually been demonstrated experimentally.

For example, scientists at Arbor Biotechnologies demonstrated in vitro the DNA double-strand cleavage activity of the effector proteins Cas12c, Cas12g, Cas12h, and Cas12i from subtypes V-C, V-G, V-H, and V-I. The effector proteins of subtypes V-D and V-E, CasY (Cas12d) and CasX (Cas12e), were first discovered in metagenomes of "unculturable" microorganisms. They were shown in 2019 to possess genome editing activity in E. coli and human cell lines.

The effector protein of subtype V-F, previously considered part of the Type V-U family, is Cas14 (also known as Cas12f), which can cleave single-stranded DNA and RNA. Cas14, only one-third the size of Cas9, was first developed into the nucleic acid detection tool DETECTR. It has recently been shown to have DNA double-strand cleavage activity in prokaryotes and eukaryotes. The Casφ protein (also known as Cas12j), recently discovered in huge bacteriophages, has also been shown to cleave double-stranded DNA in prokaryotes, animal cells, and plant cells.

In subtype V-K, the effector protein Cas12k is "hijacked" by a Tn7-like transposon. It generates an R-loop at the target site and uses the targeting ability of the crRNA to achieve site-specific transposition of the transposon. This hijacked protein provides a new strategy for targeted DNA insertion.
[0004] Identifying new CRISPR effector proteins capable of gene editing can enrich genome editing tools and is of great significance to the biomedical field.
Invention Summary
[0005] This invention provides at least the following technical solutions:
[0006] Implementation Scheme 1. An engineered clustered regularly interspaced short palindromic repeats (CRISPR) system, comprising:
[0007] a) Transposons and CRISPR-Cas12 intermediate (TraC) effector proteins, or one or more nucleotide sequences encoding such effector proteins; and
[0008] b) One or more guide RNAs, or nucleotide sequences encoding such guide RNAs.
[0009] The guide RNA is selected from i) guide RNA derived from the right-hand element of a transposon (reRNA) and / or ii) guide RNA containing tracrRNA and / or crRNA, such as a single guide RNA (sgRNA) containing tracrRNA and crRNA.
[0010] The TraC effector protein can form a CRISPR complex with the guide RNA.
[0011] The TraC effector protein can target and bind to target DNA sequences both under the guidance of guide RNA derived from the right-hand element of the transposon and under the guidance of guide RNA containing tracrRNA and / or crRNA.
[0012] Implementation Scheme 2. An engineered CRISPR system according to Implementation Scheme 1, wherein the tracrRNA contains a non-target strand binding sequence (NTB) complementary to the non-target strand (NTS).
[0013] Implementation Scheme 3. An engineered clustered regularly interspaced short palindromic repeats (CRISPR) vector system comprising one or more constructs, the constructs comprising:
[0014] a) A first regulatory element operatively linked to the nucleotide sequence encoding the transposon and CRISPR-Cas12 intermediate (TraC) effector protein; and
[0015] b) A second regulatory element operatively linked to one or more nucleotide sequences encoding one or more guide RNAs selected from i) guide RNAs (reRNAs) derived from right-hand elements of transposons and / or ii) guide RNAs containing tracrRNA and / or crRNA, such as single guide RNAs (sgRNAs) containing tracrRNA and crRNA.
[0016] The TraC effector protein can form a CRISPR complex with the guide RNA.
[0017] The TraC effector protein can target and bind to target DNA sequences both under the guidance of guide RNA derived from the right-hand element of the transposon and under the guidance of guide RNA containing tracrRNA and / or crRNA.
[0018] Implementation Scheme 4. An engineered CRISPR vector system according to Implementation Scheme 3, wherein the tracrRNA contains a non-target strand binding sequence (NTB) complementary to the non-target strand (NTS).
[0019] Implementation Scheme 5. The system as described in Implementation Scheme 2 or 4, wherein the guide RNA is a guide RNA comprising tracrRNA and crRNA, wherein the tracrRNA contains a non-target strand binding sequence (NTB) complementary to the non-target strand (NTS), wherein the guide RNA hybridizes with the target strand (TS) of the target DNA sequence via crRNA and with the non-target strand (NTS) via NTB.
[0020] Implementation Scheme 6. The system as described in Implementation Scheme 4, wherein during transcription, one or more guide RNAs hybridize with the target DNA, and the guide RNAs form a complex with the TraC effector protein, the complex causing distal cleavage of the target DNA sequence.
[0021] Implementation Scheme 7. The system as described in any one of Implementation Schemes 1-6, wherein the target DNA sequence is in a cell, preferably a eukaryotic cell.
[0022] Implementation Scheme 8. The system of any one of Implementation Schemes 1-7, wherein the effector protein comprises one or more nuclear localization sequences (NLS), cytoplasmic localization sequences, chloroplast localization sequences or mitochondrial localization sequences.
[0023] Implementation Scheme 9. The system of any one of Implementation Schemes 1-8, wherein the nucleic acid sequences encoding the effector protein are codon-optimized for expression in eukaryotic cells.
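To make "codon-optimized" concrete, here is a minimal Python sketch of the simplest form of codon optimization: back-translating a protein with one preferred codon per amino acid. The `PREFERRED_CODON` table and the `naive_codon_optimize` helper are hypothetical. The table covers only Met, a stop, and the residues of the commonly used SV40 nuclear localization signal (PKKKRKV). Practical tools instead use a full codon-usage table for the host organism and also weigh GC content, repeats, and cryptic splice sites.

```python
# Hypothetical one-codon-per-residue table (illustrative only, not a real
# codon-usage table for any particular eukaryote).
PREFERRED_CODON = {
    "M": "ATG",
    "P": "CCC",
    "K": "AAG",
    "R": "AGA",
    "V": "GTG",
    "*": "TGA",  # stop codon
}


def naive_codon_optimize(protein: str) -> str:
    """Back-translate `protein` using the single preferred codon for each residue."""
    codons = []
    for aa in protein:
        if aa not in PREFERRED_CODON:
            raise ValueError(f"no preferred codon defined for residue {aa!r}")
        codons.append(PREFERRED_CODON[aa])
    return "".join(codons)


# Met + SV40 NLS + stop
print(naive_codon_optimize("MPKKKRKV*"))  # ATGCCCAAGAAGAAGAGAAAGGTGTGA
```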
[0024] Implementation Scheme 10. The system as described in any one of Implementation Schemes 1-9, wherein components a) and b) or their nucleotide sequences are constructed on the same or different vectors.
[0025] Implementation Scheme 11. A method for modifying a target DNA sequence, the method comprising delivering the system of any one of Implementation Schemes 1-10 to the target DNA sequence or to a cell containing the target DNA sequence.
[0026] Implementation Scheme 12. A method for modifying a target DNA sequence, the method comprising delivering a composition comprising a TraC effector protein and one or more nucleic acid components to the target DNA sequence, wherein:
- the effector protein is capable of targeting and binding the target DNA sequence both under the guidance of a guide RNA derived from a right-hand element of a transposon and under the guidance of a guide RNA comprising tracrRNA and crRNA;
- the effector protein forms a CRISPR complex with the one or more nucleic acid components; and
- after the complex targets and binds the target DNA sequence located 3' of a protospacer adjacent motif (PAM), the effector protein induces modification of the target DNA sequence.
[0027] Implementation Scheme 13. The method as described in Implementation Scheme 12, wherein the target gene is in a cell, preferably a eukaryotic cell.
[0028] Implementation Scheme 14. The method as described in Implementation Scheme 13, wherein the cell is an animal cell or a human cell.
[0029] Implementation Scheme 15. The method as described in Implementation Scheme 13, wherein the cell is a plant cell.
[0030] Implementation Scheme 16. The method of Implementation Scheme 12, wherein the effector protein comprises one or more nuclear localization sequences (NLS), cytoplasmic localization sequences, chloroplast localization sequences, or mitochondrial localization sequences.
[0031] Implementation Scheme 17. The method of Implementation Scheme 12, wherein the effector protein and nucleic acid component, or a construct expressing the effector protein and nucleic acid component, are contained in a delivery system.
[0032] Implementation Scheme 18. The method of Implementation Scheme 17, wherein the delivery system comprises a virus, virus-like particles, virions, liposomes, vesicles, exosomes, lipid nanoparticles (LNP), N-acetylgalactosamine (GalNAc), or engineered bacteria.
[0033] Implementation Scheme 19. A transposon and CRISPR-Cas12 intermediate (TraC) effector protein or a functional variant thereof for genome editing in an organism or somatic cells, wherein the TraC effector protein or a functional variant thereof is capable of forming a CRISPR complex with guide RNA;
[0034] The guide RNA is selected from i) guide RNAs derived from the right-hand element of a transposon (reRNA) and / or ii) guide RNAs containing tracrRNA and crRNA, such as single guide RNAs (sgRNAs) containing tracrRNA and crRNA.
[0035] The TraC effector protein or its functional variants can target and bind to target DNA sequences either under the guidance of guide RNA derived from the right-hand element of a transposon or under the guidance of guide RNA containing tracrRNA and crRNA.
[0036] Implementation Scheme 20. A transposon and CRISPR-Cas12 intermediate (TraC) effector protein or a functional variant thereof for genome editing in an organism or somatic cells, wherein the TraC effector protein or a functional variant thereof (i) comprises at least one, at least two, or all three amino acid sequence motifs selected from “TSxxCxxCx”, “GIDRG”, and “CxxCGxxxxADxxAA”, wherein x represents any amino acid, such as any naturally encoded amino acid; and
[0037] (ii) Contains an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or even 100% sequence identity with one of SEQ ID NO:1-37, or contains an amino acid sequence having one or more, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acid substitutions, deletions, or additions relative to SEQ ID NO:1-37.
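The two criteria of Implementation Scheme 20 can be checked mechanically. The Python sketch below does two things:

- For criterion (i), it turns each "x"-wildcard motif into a regular expression.
- For criterion (ii), it counts the minimum number of single-residue substitutions, deletions, or additions between a candidate and a reference, using the Levenshtein distance.

The helper names and the toy sequence are hypothetical, and the toy is not one of SEQ ID NO:1-37. The text above does not say which algorithm is used to count such changes; edit distance is one reasonable choice.

```python
import re

# The three conserved motifs of Implementation Scheme 20; "x" = any amino acid.
MOTIFS = ("TSxxCxxCx", "GIDRG", "CxxCGxxxxADxxAA")


def motif_regex(motif: str) -> re.Pattern:
    """Compile an 'x'-wildcard motif into a regular expression."""
    return re.compile("".join("." if ch == "x" else re.escape(ch) for ch in motif))


def motifs_present(protein: str) -> dict:
    """Map each motif to True if it occurs anywhere in `protein`."""
    return {m: motif_regex(m).search(protein) is not None for m in MOTIFS}


def edit_distance(ref: str, query: str) -> int:
    """Fewest single-residue substitutions, deletions and additions turning
    `ref` into `query` (Levenshtein distance, dynamic programming)."""
    prev = list(range(len(query) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, q in enumerate(query, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # add q
                prev[j - 1] + (r != q),  # substitute (cost 0 if identical)
            ))
        prev = curr
    return prev[-1]


toy = "MKTSAACRRCLPGIDRGQ"  # hypothetical toy sequence
hits = motifs_present(toy)
print(sum(hits.values()))                       # 2 of the 3 motifs present
print(edit_distance(toy, "MKTSAACRRCLPGIDRG"))  # 1 (one residue removed)
```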
[0038] Implementation Scheme 21. The TraC effector protein or a functional variant thereof as described in Implementation Scheme 20, wherein the functional variant of the effector protein is derived from SEQ ID NO:25 and comprises, relative to the sequence of SEQ ID NO:25, one or more amino acid substitutions selected from K78R, D86R, S137R, V145R, I147R, P148R, D150R, V228R, V254R, A510R, A278R, K315R, S334R, L343R, A369R, H392R, L394R, S408R, N456R, V500R, A510R, and T573R.
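The substitutions above use the usual notation: wild-type residue, 1-based position, then new residue (e.g. K78R). The minimal Python sketch below applies such a list to a sequence and checks that each stated wild-type residue matches. The `apply_substitutions` helper and the six-residue toy sequence are hypothetical; the toy is not SEQ ID NO:25. Because the list above names A510R twice, the sketch drops repeated entries before applying them. Otherwise the second copy would fail the wild-type check.

```python
import re

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(seq, mutations):
    """Apply 1-based substitutions such as 'K78R' to a protein sequence,
    checking that the stated wild-type residue matches the sequence."""
    residues = list(seq)
    for mut in dict.fromkeys(mutations):  # drop repeated entries, keep order
        m = SUBSTITUTION.match(mut)
        if m is None:
            raise ValueError(f"unrecognised substitution: {mut!r}")
        wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{mut}: position outside sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"{mut}: expected {wt} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


# Hypothetical toy sequence: M1 K2 S3 D4 V5 A6
print(apply_substitutions("MKSDVA", ["K2R", "D4R", "K2R"]))  # MRSRVA
```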
[0039] Implementation Scheme 22. The TraC effector protein or a functional variant thereof as described in Implementation Scheme 20 or 21, wherein the functional variant of the effector protein is derived from SEQ ID NO:25 and contains any set of mutations selected from those shown in Table 3 or Table 4 relative to the SEQ ID NO:25 sequence.
[0040] Implementation Scheme 23. The TraC effector protein or a functional variant thereof as described in Implementation Scheme 20, wherein the functional variant of the effector protein comprises an amino acid sequence selected from SEQ ID NO:80-87.
[0041] Implementation Scheme 24. The TraC effector protein or a functional variant thereof of any one of Implementation Schemes 20-23, having at least the ability to target sequences specifically via guide RNA.
[0042] Implementation Scheme 25. The TraC effector protein or a functional variant thereof of any one of Implementation Schemes 20-23, having guide RNA-mediated sequence-specific targeting capability and double-stranded nucleic acid cleavage activity.
[0043] Implementation Scheme 26. The TraC effector protein or a functional variant thereof of any one of Implementation Schemes 20-23, having guide RNA-mediated sequence-specific targeting capability and nicking enzyme activity.
[0044] Implementation Scheme 27. The TraC effector protein or a functional variant thereof of any one of Implementation Schemes 20-23, which has guide RNA-mediated sequence-specific targeting ability, but does not have double-stranded nucleic acid cleavage activity and / or nicking enzyme activity.
[0045] Implementation Scheme 28. The TraC effector protein or a functional variant thereof of any one of Implementation Schemes 24-27, wherein the guide RNA is selected from i) guide RNAs derived from right-hand elements of transposons (reRNA) and / or ii) guide RNAs containing tracrRNA and / or crRNA, such as single guide RNAs (sgRNAs) containing tracrRNA and crRNA.
[0046] Implementation Scheme 29. The TraC effector protein or a functional variant thereof of Implementation Scheme 28, wherein the TraC effector protein or a functional variant thereof is capable of targeting and binding a target DNA sequence under the guidance of a guide RNA derived from a right-hand element of a transposon, or under the guidance of a guide RNA comprising tracrRNA and crRNA.
[0047] Implementation Scheme 30. The TraC effector protein of Implementation Scheme 28 or a functional variant thereof, wherein the guide RNA is a reRNA derived from the TnpB system, for example, the reRNA comprising the scaffold sequence shown in SEQ ID NO: 77 or 78.
[0048] Implementation Scheme 31. The TraC effector protein of Implementation Scheme 28 or a functional variant thereof, wherein the guide RNA is a single guide RNA (sgRNA) comprising tracrRNA and crRNA, for example, the sgRNA comprising the scaffold sequence shown in SEQ ID NO: 75 or 76.
[0049] Implementation Scheme 32. The TraC effector protein or a functional variant thereof of any of Implementation Schemes 19-31, further comprising at least one nuclear localization sequence (NLS), cytoplasmic localization sequence, chloroplast localization sequence or mitochondrial localization sequence.
[0050] Implementation Scheme 33. A fusion protein comprising the TraC effector protein or a functional variant thereof described in any one of Implementation Schemes 19-32, and at least one other functional protein.
[0051] Implementation Scheme 34. The fusion protein of Implementation Scheme 33, wherein the other functional protein is a deaminase.
[0052] Implementation Scheme 35. The fusion protein of Implementation Scheme 34, wherein the deaminase is a cytosine deaminase, for example, the cytosine deaminase is selected from APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, human APOBEC3A deaminase, double-stranded DNA deaminase (Ddd), single-stranded DNA deaminase (Sdd) or functional variants thereof.
[0053] Implementation Scheme 36. The fusion protein of Implementation Scheme 35, wherein the fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI).
[0054] Implementation Scheme 37. The fusion protein of Implementation Scheme 34, wherein the deaminase is an adenine deaminase, for example, a DNA-dependent adenine deaminase derived from Escherichia coli tRNA adenine deaminase TadA (ecTadA).
[0055] Implementation Scheme 38. The fusion protein of any one of Implementation Schemes 34-37, wherein the fusion protein comprises cytosine deaminase and adenine deaminase.
[0056] Implementation Scheme 39. The fusion protein of Implementation Scheme 33, wherein the other functional proteins are selected from transcription activators, transcription repressors, DNA methyltransferases, DNA demethylases, and reverse transcriptases.
[0057] Implementation Scheme 40. The fusion protein of any one of Implementation Schemes 33-39, wherein the different parts of the fusion protein are each independently connected via a linker or directly.
[0058] Implementation Scheme 41. The fusion protein of any one of Implementation Schemes 33-40, further comprising at least one nuclear localization sequence (NLS), cytoplasmic localization sequence, chloroplast localization sequence or mitochondrial localization sequence.
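Implementation Schemes 33-41 describe fusion proteins whose parts are joined through linkers or directly, optionally with localization signals. A minimal Python sketch of that assembly at the sequence level follows.

- The SV40 NLS (PKKKRKV) and the GGGGS repeat are commonly used elements, chosen here for illustration. This patent does not state that it uses them.
- "DEAMINASE" and "EFFECTOR" are placeholder strings standing in for real protein sequences.
- The `assemble_fusion` helper is hypothetical.

```python
# Commonly used example elements (illustrative choices, not from this patent).
SV40_NLS = "PKKKRKV"
FLEXIBLE_LINKER = "GGGGS" * 3


def assemble_fusion(parts, linkers):
    """Join protein parts in N- to C-terminal order.

    linkers[i] is placed between parts[i] and parts[i + 1]; an empty string
    means the two parts are connected directly."""
    if len(linkers) != len(parts) - 1:
        raise ValueError("need exactly one linker (possibly empty) per junction")
    out = [parts[0]]
    for linker, part in zip(linkers, parts[1:]):
        out.append(linker)
        out.append(part)
    return "".join(out)


# NLS - linker - deaminase - short linker - effector - (direct) - NLS
fusion = assemble_fusion(
    [SV40_NLS, "DEAMINASE", "EFFECTOR", SV40_NLS],
    [FLEXIBLE_LINKER, "GS", ""],
)
print(fusion)
```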
[0059] Implementation Scheme 42. The use of any TraC effector protein or a functional variant thereof from Implementation Schemes 19-32 or any fusion protein from Implementation Schemes 33-41 for genome editing of cells, preferably eukaryotic cells, more preferably plant cells.
[0060] Implementation Scheme 43. The use of Implementation Scheme 42, wherein the genome editing includes base editing (Base Editor), prime editing (Prime Editor), and PrimeRoot editing (PrimeRoot Editor).
[0061] Implementation Scheme 44. A genome editing system for site-specific modification of target nucleic acid sequences in a cell genome, comprising:
[0062] TraC effector proteins or functional variants thereof from any of Implementation Schemes 19-32, or fusion proteins from any of Implementation Schemes 33-41; and / or
[0063] An expression construct comprising a nucleotide sequence encoding any TraC effector protein or functional variant thereof of Implementation Schemes 19-32 or the fusion protein of Implementation Schemes 33-41.
[0064] Implementation Scheme 45. The genome editing system of Implementation Scheme 44, further comprising at least one guide RNA (gRNA) and / or an expression construct containing a nucleotide sequence encoding said at least one guide RNA.
[0065] Implementation Scheme 46. The genome editing system of Implementation Scheme 45, wherein the genome editing system comprises any one of the following:
[0066] i) The TraC effector protein or functional variant thereof of any one of Implementation Schemes 19-32 or the fusion protein of any one of Implementation Schemes 33-41, and the at least one guide RNA; optionally, the TraC effector protein, functional variant, or fusion protein forms a complex with the at least one guide RNA;
[0067] ii) An expression construct comprising a nucleotide sequence encoding the TraC effector protein or functional variant thereof of any one of Implementation Schemes 19-32 or the fusion protein of any one of Implementation Schemes 33-41, and the at least one guide RNA;
[0068] iii) The TraC effector protein or functional variant thereof of any one of Implementation Schemes 19-32 or the fusion protein of any one of Implementation Schemes 33-41, and an expression construct comprising a nucleotide sequence encoding the at least one guide RNA;
[0069] iv) An expression construct comprising a nucleotide sequence encoding the TraC effector protein or functional variant thereof of any one of Implementation Schemes 19-32 or the fusion protein of any one of Implementation Schemes 33-41, and a separate expression construct comprising a nucleotide sequence encoding the at least one guide RNA;
[0070] v) An expression construct comprising both a nucleotide sequence encoding the TraC effector protein or functional variant thereof of any one of Implementation Schemes 19-32 or the fusion protein of any one of Implementation Schemes 33-41 and a nucleotide sequence encoding the at least one guide RNA.
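The five options i)-v) differ in two ways. First, the effector and the guide RNA are each supplied either directly or as an expression construct. Second, when both are encoded, they either share one construct or use separate ones. The Python sketch below encodes that decision. The `Form` enum and the `configuration` function are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Form(Enum):
    DIRECT = "direct"    # protein (or guide RNA) supplied as the molecule itself
    ENCODED = "encoded"  # supplied as an expression construct


def configuration(effector: Form, guide: Form, same_construct: bool = False) -> str:
    """Return which of options i)-v) a delivery format corresponds to.

    `same_construct` only matters when both components are encoded."""
    if effector is Form.DIRECT and guide is Form.DIRECT:
        return "i"    # e.g. a preassembled protein-RNA complex
    if effector is Form.ENCODED and guide is Form.DIRECT:
        return "ii"
    if effector is Form.DIRECT and guide is Form.ENCODED:
        return "iii"
    return "v" if same_construct else "iv"


print(configuration(Form.ENCODED, Form.ENCODED, same_construct=True))  # v
```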
[0071] Implementation Scheme 47. A genome editing system of any one of Implementation Schemes 45-46, wherein the guide RNA is selected from i) guide RNA (reRNA) derived from the right-hand element of a transposon and / or ii) guide RNA containing tracrRNA and / or crRNA, such as a single guide RNA (sgRNA) containing tracrRNA and crRNA.
[0072] Implementation Scheme 48. The genome editing system of Implementation Scheme 47, wherein the guide RNA is a reRNA derived from the TnpB system, for example, the reRNA contains the scaffold sequence shown in SEQ ID NO: 77 or 78.
[0073] Implementation Scheme 49. A genome editing system of any one of Implementation Schemes 45-46, wherein the guide RNA is a single guide RNA (sgRNA) comprising tracrRNA and crRNA, for example, the sgRNA comprising the scaffold sequence shown in SEQ ID NO:75 or 76.
[0074] Implementation Scheme 50. The genome editing system of Implementation Scheme 47 or 49, wherein the guide RNA comprises tracrRNA and crRNA, for example, a single guide RNA (sgRNA) comprising tracrRNA and crRNA. The crRNA comprises a sequence identical to the target sequence immediately adjacent to the PAM. The tracrRNA comprises a sequence complementary to a non-target-strand sequence located on the PAM-distal side of the target sequence (the non-target strand binding sequence, NTB).
[0075] Implementation Scheme 51. The genome editing system of any one of Implementation Schemes 44-50, further comprising a donor nucleic acid molecule that comprises a nucleotide sequence to be inserted into the genome at a predetermined site; for example, the nucleotide sequence to be inserted is flanked by sequences homologous to the sequences flanking the target sequence in the genome.
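Implementation Scheme 51's donor is easiest to picture at the sequence level: the insert sits between two stretches that copy the genome on either side of the chosen site (homology arms). The minimal Python sketch below assumes the site is given as a 0-based position between two bases, with equal-length arms. The `build_donor` helper and the toy genome are hypothetical.

```python
def build_donor(genome: str, cut: int, insert: str, arm_len: int) -> str:
    """Return left homology arm + insert + right homology arm.

    `cut` is a 0-based position between genome[cut - 1] and genome[cut]; each
    arm copies `arm_len` bases of the reference on that side of the site."""
    if arm_len <= 0 or not arm_len <= cut <= len(genome) - arm_len:
        raise ValueError("homology arms would run off the reference sequence")
    left_arm = genome[cut - arm_len:cut]
    right_arm = genome[cut:cut + arm_len]
    return left_arm + insert + right_arm


print(build_donor("AAAACCCCGGGGTTTT", cut=8, insert="ATG", arm_len=4))
# CCCCATGGGGG
```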
[0076] Implementation Scheme 52. A genome editing system of any one of Implementation Schemes 44-51, wherein the nucleotide sequence encoding the TraC effector protein or a functional variant thereof or the fusion protein and / or the nucleotide sequence encoding the at least one guide RNA is operatively linked to an expression regulatory element such as a promoter.
[0077] Implementation Scheme 53. A genome editing system of any one of Implementation Schemes 44-52, wherein the components of the genome editing system are contained in a delivery system selected from viruses, virus-like particles, virions, liposomes, vesicles, exosomes, lipid nanoparticles (LNP), N-acetylgalactosamine (GalNAc), or engineered bacteria.
[0078] Implementation Scheme 54. A method for producing genetically modified cells, comprising introducing a genome editing system of any one of Implementation Schemes 44-53 into the cells.
[0079] Implementation Scheme 55. The method of Implementation Scheme 54, wherein the cells are derived from prokaryotes or eukaryotes, preferably from mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, and cats; poultry such as chickens, ducks, and geese; and plants, including monocots and dicots, such as rice, corn, wheat, sorghum, barley, soybeans, peanuts, and Arabidopsis thaliana.
[0080] The main advantages of this invention are:
[0081] (1) This invention yields a new CRISPR effector protein and its genome editing system, enriching the selection and application scenarios of genome editing tools;
[0082] (2) The TraC-subbranch CRISPR effector protein obtained in this invention has a dual-guide mechanism, combining the targeted cleavage pathways of both the TnpB system and the CRISPR system. That is, the TraC effector protein can target and bind target DNA under the guidance of either reRNA or sgRNA, which helps to achieve multiplex genome editing with a single gene editing tool;
[0083] (3) The TraC effector protein obtained in this invention is the smallest known monomeric Cas12 protein, which facilitates in vivo delivery and editing;
[0084] (4) The sgRNA contains a sequence (NTB) complementary to the non-target strand (NTS). Under the guidance of this sgRNA, the TraC effector protein obtained in this invention interacts with the non-target strand of the target dsDNA to form a bubble structure. This helps to unwind and edit PAM-distal DNA.
Brief Description of the Drawings
[0085] Figure 1. Three conserved structural motifs in 86 Cas12 proteins.
[0086] Figure 2. Prokaryotic expression system for the TraC protein.
[0087] Figure 3. Flowchart for screening CRISPR systems for double-stranded DNA binding ability using a fluorescent reporter system.
[0088] Figure 4. Results of screening the dLbCas12a protein using the fluorescent reporter system.
[0089] Figure 5. Flowchart for screening CRISPR systems capable of cleaving double-stranded DNA.
[0090] Figure 6. Results of double-stranded DNA cleavage tests. A: TraC-875, TraC-365, TraC-655, and TraC-445; B: TraC-297, TraC-459, TraC-466, and TraC-949. LbCpf1 was used as a positive control.
[0091] Figure 7. Flowchart for detecting the double-stranded DNA cleavage ability of a novel CRISPR system using a plasmid interference system.
[0092] Figure 8. Results of testing the TraC-459, TraC-875, and TraC-297 proteins using the plasmid interference system.
[0093] Figure 9. A: Secondary structure prediction and structural folding model analysis of accessory RNAs in type V CRISPR and TnpB systems; B: Model of the co-evolution of effectors and accessory RNAs.
[0094] Figure 10. Optimization of the sgRNA for TraC proteins. A: Predicted sgRNA for the TraC-459 protein; B: Effects of the truncation length of the tracrRNA complementary region, the truncation length of the tracrRNA 5' region, and the spacer length on the editing efficiency of the TraC-459 protein.
[0095] Figure 11. The optimized sgRNA (sgRNA-opt) significantly improves the editing efficiency of TraC-459.
[0096] Figure 12. Plasmid interference experiments showing the ability of TraC-459 to cleave dsDNA in E. coli under different guide RNAs.
[0097] Figure 13. Predicted three-dimensional structure of the TraC-459 protein.
[0098] Figure 14. Screening of TraC-459 variants.
[0099] Figure 15. Secondary structure prediction of the TraC effector protein-sgRNA complex, showing a bubble-like region at the end of the tracrRNA.
[0100] Figure 16. TraC effector proteins target DNA in bubble-like regions under the guidance of reprogrammed sgRNAs.
[0101] Figure 17. Reprogrammed sgRNAs can improve editing efficiency.
[0102] Figure 18. The editing activity of the TraC protein in plant cells is affected by temperature: the editing efficiency of TraC-5M-7 at 32°C is 1 to 29 times higher than that at 25°C.
Invention Details
[0103] I. Definitions
[0104] In this invention, unless otherwise stated, scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. Furthermore, the terms and laboratory procedures related to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology, and immunology used herein are all widely used terms and routine procedures in their respective fields. For example, the standard recombinant DNA and molecular cloning techniques used in this invention are well known to those skilled in the art and are described more fully in the following literature: Sambrook, J., Fritsch, EF, and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter referred to as "Sambrook"). Meanwhile, to better understand this invention, definitions and explanations of relevant terms are provided below.
[0105] As used herein, the term “and / or” covers all combinations of items connected by the term and should be regarded as if each combination had been listed separately herein. For example, “A and / or B” covers “A,” “A and B,” and “B.” For example, “A, B, and / or C” covers “A,” “B,” “C,” “A and B,” “A and C,” “B and C,” and “A and B and C.”
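The "and / or" convention above amounts to listing every non-empty combination of the named items, which is 2^n - 1 readings for n items. The small Python sketch below enumerates them with `itertools.combinations`. The `and_or_readings` helper name is hypothetical.

```python
from itertools import combinations


def and_or_readings(items):
    """Every non-empty combination covered by "item1, ..., and / or itemN"."""
    return [c for k in range(1, len(items) + 1) for c in combinations(items, k)]


print(and_or_readings(["A", "B", "C"]))
# [('A',), ('B',), ('C',), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('A', 'B', 'C')]
```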
[0106] When the term "comprising" is used herein to describe a protein or nucleic acid sequence, the protein or nucleic acid may consist of the stated sequence, or may have additional amino acids or nucleotides at one or both ends of the protein or nucleic acid, while still possessing the activities described in this invention. Furthermore, those skilled in the art will understand that the methionine encoded by the start codon at the N-terminus of a polypeptide may be retained in certain practical situations (e.g., when expressed in a specific expression system) without substantially affecting the polypeptide's function. Therefore, when describing a specific polypeptide amino acid sequence in this specification and claims, although it may not contain the methionine encoded by the start codon at the N-terminus, the sequence containing that methionine is still included, and correspondingly, its encoding nucleotide sequence may also contain the start codon; and vice versa.
[0107] The term "genome," as used herein, encompasses not only chromosomal DNA located in the cell nucleus but also organelle DNA located in subcellular components of the cell, such as mitochondria and plastids.
[0108] As used herein, “organism” includes any organism suitable for genome editing, preferably eukaryotes. Examples of organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, and cats; poultry such as chickens, ducks, and geese; and plants including monocots and dicots such as rice, corn, wheat, sorghum, barley, soybeans, peanuts, and Arabidopsis thaliana.
[0109] "Genetically modified organism" or "genetically modified cell" refers to an organism or cell whose genome contains exogenous polynucleotides or modified genes or expression regulatory sequences. For example, exogenous polynucleotides can be stably integrated into the genome of an organism or cell and inherited across generations. Exogenous polynucleotides can be integrated into the genome alone or as part of a recombinant DNA construct. Modified genes or expression regulatory sequences are sequences in the genome of an organism or cell that contain single or multiple deoxynucleotide substitutions, deletions, or additions.
[0110] In relation to a sequence, “exogenous” means a sequence that originates from a foreign species or, if from the same species, a sequence whose composition and / or locus has been significantly altered from its natural form through deliberate human intervention.
[0111] The terms “polynucleotide,” “nucleic acid sequence,” “nucleotide sequence,” or “nucleic acid fragment” are used interchangeably and refer to single-stranded or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural, or modified nucleotide bases. Nucleotides are designated by their single-letter names as follows: “A” for adenosine or deoxyadenosine (corresponding to RNA or DNA, respectively), “C” for cytidine or deoxycytidine, “G” for guanosine or deoxyguanosine, “U” for uridine, “T” for deoxythymidine, “R” for purine (A or G), “Y” for pyrimidine (C or T), “K” for G or T, “H” for A, C, or T, “I” for inosine, and “N” for any nucleotide. Although nucleotide sequences may be represented as DNA sequences (containing T) herein, when referring to RNA, those skilled in the art can readily determine the corresponding RNA sequence (i.e., replacing T with U).
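The single-letter convention above can be made concrete with a short Python sketch, offered only as an illustration and not as part of the invention. It expands the degenerate codes listed in this paragraph, checks whether a DNA sequence fits a degenerate pattern, and rewrites a DNA-written sequence as RNA by replacing T with U. The names `DEGENERATE`, `pattern_matches`, and `dna_to_rna` are hypothetical.

```python
# Degenerate nucleotide codes as listed in [0111]. "I" (inosine) is a modified
# base rather than a degeneracy code, so it is deliberately not expanded here.
DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def pattern_matches(pattern: str, seq: str) -> bool:
    """True if each base of seq is allowed by the code at the same position."""
    if len(pattern) != len(seq):
        return False
    return all(
        base in DEGENERATE[code]
        for code, base in zip(pattern.upper(), seq.upper())
    )


def dna_to_rna(seq: str) -> str:
    """Write a DNA-represented sequence as the corresponding RNA (T -> U)."""
    return seq.upper().replace("T", "U")


print(pattern_matches("TTRN", "TTGC"))  # True
print(dna_to_rna("acgtt"))              # ACGUU
```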
[0112] The terms “polypeptide,” “peptide,” and “protein” are used interchangeably in this invention to refer to polymers of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogs of the corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers. The terms “polypeptide,” “peptide,” “amino acid sequence,” and “protein” may also include modified forms, including but not limited to glycosylation, lipid attachment, sulfation, γ-carboxylation of glutamic acid residues, hydroxylation, and ADP-ribosylation.
[0113] Sequence “identity” has a generally accepted meaning in the art, and the percentage of sequence identity between two nucleic acid or polypeptide molecules or regions can be calculated using publicly available techniques. Sequence identity can be measured along the full length of a polynucleotide or polypeptide or along a region of that molecule. (See, for example: Computational Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A.M. and Griffin, H.G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., Stockton Press, New York, 1991.) Although there are many methods for measuring the identity between two polynucleotides or polypeptides, the term "identity" is well known to those skilled in the art (Carrillo, H. & Lipman, D., SIAM J Applied Math 48:1073 (1988)).
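As a minimal sketch of one common way to compute percent identity, the Python function below scores an existing pairwise alignment. It assumes the alignment has already been produced by a standard tool (for example a global Needleman-Wunsch alignment or BLAST). The column-counting convention used here is an assumption: columns where both sequences have a gap are ignored and a gap opposite a residue counts as a mismatch, whereas other tools normalize by the shorter sequence or by the full reference length. The name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of an existing pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both sequences are gapped; keep all other columns.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column is identical only if both residues match and neither is a gap.
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)


pid = percent_identity("MKT-LV", "MKTALI")
print(round(pid, 1))  # 66.7
print(pid >= 60.0)    # True
```

Under this convention, a threshold such as "at least 60% sequence identity" is simply a comparison of the returned value with 60.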
[0114] In peptides or proteins, suitable conservative amino acid substitutions are known to those skilled in the art and can generally be made without altering the biological activity of the resulting molecule. Typically, those skilled in the art recognize that single amino acid substitutions in non-essential regions of a polypeptide do not substantially alter its biological activity (see, for example, Watson et al., Molecular Biology of the Gene, 4th Edition, 1987, The Benjamin/Cummings Pub. Co., p. 224).
[0115] As used in this invention, "construct" or "expression construct" refers to a vector, such as a recombinant vector, suitable for expressing a nucleotide sequence of interest in an organism. "Expression" refers to the production of a functional product. For example, expression of a nucleotide sequence can refer to its transcription (e.g., to generate mRNA or functional RNA) and / or the translation of RNA into a precursor or mature protein.
[0116] The "expression construct" of the present invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector, or, in some embodiments, a translatable RNA (such as mRNA).
[0117] The "expression construct" of the present invention may contain regulatory sequences and nucleotide sequences of interest from different sources, or regulatory sequences and nucleotide sequences of interest from the same source but arranged in a manner different from those normally found in nature.
[0118] "Regulatory sequence" and "regulatory element" are used interchangeably and refer to nucleotide sequences located upstream (5' non-coding sequence), within, or downstream (3' non-coding sequence) of a coding sequence that affect the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns, and polyadenylation recognition sequences.
[0119] A "promoter" refers to a nucleic acid fragment that controls the transcription of another nucleic acid fragment. In some embodiments of the present invention, the promoter is one capable of controlling gene transcription in a cell, regardless of whether it originates from that cell. A promoter can be a constitutive promoter, a tissue-specific promoter, a developmentally regulated promoter, or an inducible promoter.
[0120] "Constitutive promoters" refer to promoters that generally drive gene expression in most cell types and under most conditions. "Tissue-specific promoters" and "tissue-preferred promoters" are used interchangeably and refer to promoters that drive expression primarily, but not necessarily exclusively, in one tissue or organ, and may also drive expression in a specific cell type. "Developmentally regulated promoters" are promoters whose activity is determined by developmental events. "Inducible promoters" selectively drive expression of operably linked DNA sequences in response to endogenous or exogenous stimuli (environmental, hormonal, or chemical signals, etc.).
[0121] As used herein, the term "operably linked" refers to the linking of a regulatory element (e.g., but not limited to, promoter sequences, transcription termination sequences, etc.) to a nucleic acid sequence (e.g., coding sequences or open reading frames) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element. Techniques for operably linking regulatory element regions to nucleic acid molecules are known in the art.
[0122] "Introducing" nucleic acid molecules (such as plasmids, linear nucleic acid fragments, RNA, etc.) or proteins into an organism refers to transforming the cells of an organism with the nucleic acid or protein, enabling the nucleic acid or protein to function within the cell. The term "transformation" as used in this invention includes both stable transformation and transient transformation.
[0123] "Stable transformation" refers to the introduction of a foreign nucleotide sequence into the genome, resulting in the stable inheritance of the foreign nucleotide sequence. Once stable transformation occurs, the foreign nucleic acid sequence is stably integrated into the genome of the organism and its subsequent generations.
[0124] "Transient transformation" refers to the introduction of nucleic acid molecules or proteins into cells so that they perform their functions without the foreign nucleotide sequence being stably inherited. In transient transformation, the foreign nucleic acid sequence does not integrate into the genome.
[0125] "Trait" refers to a physiological, morphological, biochemical, or physical characteristic of a cell or organism.
[0126] "Agronomic traits" specifically refer to measurable parameters of crop plants, including but not limited to: leaf greenness, grain yield, growth rate, total biomass or biomass accumulation rate, fresh weight at maturity, dry weight at maturity, fruit yield, seed yield, total nitrogen content of plants, nitrogen content of fruits, nitrogen content of seeds, nitrogen content of plant vegetative tissues, total free amino acid content of plants, free amino acid content of fruits, free amino acid content of seeds, free amino acid content of plant vegetative tissues, total protein content of plants, protein content of fruits, protein content of seeds, protein content of plant vegetative tissues, herbicide resistance and drought resistance, nitrogen uptake, root lodging, harvest index, stem lodging, plant height, ear height, ear length, disease resistance, cold resistance, salt tolerance, and tiller number, etc.
[0127] II. Genome Editing Systems
[0128] This invention provides a novel class of CRISPR effector proteins that possess both TnpB-like and CRISPR-like targeting and cleavage activities. Specifically, they can target and bind to target DNA under the guidance of either a reRNA or a guide RNA composed of tracrRNA and / or crRNA, such as an sgRNA. This subtype of CRISPR nuclease is referred to herein as a transposon and CRISPR-Cas12 intermediate (TraC) effector protein.
[0129] Therefore, in one aspect, the present invention provides an engineered clustered regularly interspaced short palindromic repeats (CRISPR) system, comprising:
[0130] a) A transposon and CRISPR-Cas12 intermediate (TraC) effector protein, or one or more nucleotide sequences encoding the effector protein; and
[0131] b) One or more guide RNAs, or nucleotide sequences encoding such guide RNAs.
[0132] The guide RNA is selected from i) guide RNA derived from the right-hand element of a transposon (reRNA) and / or ii) guide RNA containing tracrRNA and / or crRNA, such as a single guide RNA (sgRNA) containing tracrRNA and crRNA.
[0133] The TraC effector protein can form a CRISPR complex with the guide RNA.
[0134] The TraC effector protein can target and bind to target DNA sequences both under the guidance of guide RNA derived from the right-hand element of the transposon and under the guidance of guide RNA containing tracrRNA and crRNA.
[0135] In some implementations, the engineered clustered regularly interspaced short palindromic repeats (CRISPR) system is a genome editing system used for genome editing in organisms or somatic cells.
[0136] In some implementations, the TraC effector protein is as defined below.
[0137] In some implementations, the tracrRNA contains a non-target strand binding sequence (NTB) that is complementary to the non-target strand (NTS).
[0138] In one aspect, the present invention also provides an engineered clustered regularly interspaced short palindromic repeats (CRISPR) vector system comprising one or more constructs, the vector system comprising:
[0139] a) A first regulatory element operably linked to a nucleotide sequence encoding the transposon and CRISPR-Cas12 intermediate (TraC) effector protein; and
[0140] b) A second regulatory element operably linked to one or more nucleotide sequences encoding one or more guide RNAs selected from i) guide RNAs derived from the right-hand element of a transposon (reRNAs) and / or ii) guide RNAs containing tracrRNA and / or crRNA, such as single guide RNAs (sgRNAs) containing tracrRNA and crRNA.
[0141] The TraC effector protein can form a CRISPR complex with the guide RNA.
[0142] The TraC effector protein can target and bind to target DNA sequences both under the guidance of guide RNA derived from the right-hand element of the transposon and under the guidance of guide RNA containing tracrRNA and / or crRNA.
[0143] In some implementations, the TraC effector protein is as defined below.
[0144] In some implementations, the tracrRNA contains a non-target strand binding sequence (NTB) that is complementary to the non-target strand (NTS).
[0145] In some embodiments, the guide RNA is a guide RNA comprising tracrRNA and / or crRNA, wherein the tracrRNA contains a non-target strand binding sequence (NTB) complementary to the non-target strand (NTS), wherein the guide RNA hybridizes with the target strand (TS) of the target DNA sequence via crRNA and with the non-target strand (NTS) via NTB.
[0146] In some embodiments, once transcribed, the one or more guide RNAs hybridize with the target DNA and form a complex with the TraC effector protein, which causes distal cleavage of the target DNA sequence.
[0147] In some implementations, the target DNA sequence is located within a cell, preferably a eukaryotic cell.
[0148] In some implementations, the effector protein contains one or more nuclear localization signals.
[0149] In some implementations, the nucleotide sequence encoding the effector protein is codon-optimized for expression in eukaryotic cells.
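Codon optimization can take many forms. The sketch below shows only the simplest one, back-translating each amino acid to a single preferred codon; practical tools typically also weigh organism-specific codon frequencies, GC content, and unwanted sequence motifs. The codon table is a hypothetical placeholder of valid codons, not a published eukaryotic codon-usage table, and the names `PREFERRED_CODON`, `codon_optimize`, and `translate` are illustrative.

```python
# Hypothetical "preferred" codon per amino acid ("*" = stop). Every entry is a
# valid codon for its amino acid, but the choices are illustrative only and do
# NOT come from any published codon-usage table.
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",
}
CODON_TO_AA = {codon: aa for aa, codon in PREFERRED_CODON.items()}


def codon_optimize(protein: str, add_stop: bool = True) -> str:
    """Back-translate a protein using one preferred codon per amino acid."""
    protein = protein.upper()
    if add_stop and not protein.endswith("*"):
        protein += "*"
    return "".join(PREFERRED_CODON[aa] for aa in protein)


def translate(cds: str) -> str:
    """Translate a CDS built from PREFERRED_CODON (round-trip check only)."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


cds = codon_optimize("MPKKKRKV")
print(cds)             # ATGCCCAAGAAGAAGCGCAAGGTGTGA
print(translate(cds))  # MPKKKRKV*
```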
[0150] In some implementations, components a) and b), or the nucleotide sequences encoding them, are carried on the same vector or on different vectors.
[0151] In one aspect, the present invention provides a method for modifying a target DNA sequence, the method comprising delivering the system described herein to the target DNA sequence or into a cell containing the target DNA sequence.
[0152] In one aspect, the present invention provides a method for modifying a target DNA sequence, the method comprising delivering a composition of a TraC effector protein and one or more nucleic acid components to the target DNA sequence, wherein the effector protein is capable of targeting and binding the target DNA sequence either under the guidance of a guide RNA derived from a right-hand element of a transposon or under the guidance of a guide RNA comprising tracrRNA and / or crRNA; the effector protein forms a CRISPR complex with the one or more nucleic acid components, and after the complex targets and binds to the target DNA sequence located 3' of a protospacer adjacent motif (PAM), the effector protein induces modification of the target DNA sequence.
[0153] In some embodiments, the target DNA sequence is located within a cell, preferably a eukaryotic cell.
[0154] In some implementations, the cell is an animal cell or a human cell.
[0155] In some implementations, the cell is a plant cell.
[0156] In some implementations, the effector protein includes one or more nuclear localization sequences (NLS), cytoplasmic localization sequences, chloroplast localization sequences, or mitochondrial localization sequences.
[0157] In some implementations, the effector protein and nucleic acid components, or constructs expressing the effector protein and nucleic acid components, are contained in a delivery system.
[0158] In some implementations, the delivery system includes a virus, virus-like particle, virion, liposome, vesicle, exosome, lipid nanoparticle (LNP), N-acetylgalactosamine (GalNAc), or engineered bacteria.
[0159] In one aspect, the present invention provides a transposon and CRISPR-Cas12 intermediate (TraC) effector protein or a functional variant thereof for genome editing in an organism or somatic cells, wherein the TraC effector protein or the functional variant thereof is capable of forming a CRISPR complex with a guide RNA.
[0160] The guide RNA is selected from i) guide RNAs derived from the right-hand element of a transposon (reRNA) and / or ii) guide RNAs containing tracrRNA and / or crRNA, such as single guide RNAs (sgRNAs) containing tracrRNA and crRNA.
[0161] The TraC effector protein or its functional variants can target and bind to target DNA sequences either under the guidance of guide RNA derived from the right-hand element of a transposon or under the guidance of guide RNA containing tracrRNA and crRNA.
[0162] In one aspect, the present invention provides a transposon and CRISPR-Cas12 intermediate (TraC) effector protein or a functional variant thereof for genome editing in organisms or somatic cells, wherein the TraC effector protein or the functional variant thereof:
[0163] (i) Contains at least one, at least two, or all three amino acid sequence motifs selected from “TSxxCxxCx”, “GIDRG”, and “CxxCGxxxxADxxAA”, where x represents any amino acid, such as any naturally encoded amino acid; and
[0164] (ii) Contains an amino acid sequence having at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.1%, at least 99.2%, at least 99.3%, at least 99.4%, at least 99.5%, at least 99.6%, at least 99.7%, at least 99.8%, at least 99.9%, or even 100% sequence identity with one of SEQ ID NO:1-37, or contains an amino acid sequence having one or more, for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acid substitutions, deletions, or additions relative to SEQ ID NO:1-37.
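Criterion (i) can be screened with regular expressions by treating each "x" as a wildcard. The sketch below is a hypothetical illustration: the function names and the toy sequence are assumptions, the toy sequence is not one of SEQ ID NO:1-37, and "x" is restricted here to the 20 standard amino acids. It reports 1-based start positions for each motif; criterion (ii) would additionally require an identity comparison against SEQ ID NO:1-37.

```python
import re

# Motifs from criterion (i); "x" stands for any of the 20 standard residues.
MOTIFS = ("TSxxCxxCx", "GIDRG", "CxxCGxxxxADxxAA")
ANY_RESIDUE = "[ACDEFGHIKLMNPQRSTVWY]"


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile a motif, mapping each "x" to a standard-residue class."""
    return re.compile(
        "".join(ANY_RESIDUE if c == "x" else re.escape(c) for c in motif)
    )


def find_motifs(protein: str, motifs=MOTIFS) -> dict:
    """Map each motif to the 1-based start positions of its matches."""
    protein = protein.upper()
    return {
        m: [hit.start() + 1 for hit in motif_to_regex(m).finditer(protein)]
        for m in motifs
    }


def motifs_present(protein: str, motifs=MOTIFS) -> int:
    """Count distinct motifs found at least once (criterion (i) needs >= 1)."""
    return sum(1 for hits in find_motifs(protein, motifs).values() if hits)


toy = "MKTSAACAACALLGIDRGPPCAACGAAAAADAAAA"  # toy sequence, not a SEQ ID NO
print(find_motifs(toy))
print(motifs_present(toy))  # 3
```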
[0165] In some embodiments, the effector protein or a functional variant thereof is derived from SEQ ID NO:25.
[0166] In some embodiments, the effector protein or a functional variant thereof comprises, relative to the sequence of SEQ ID NO:25, one or more amino acid substitutions selected from K78R, D86R, S137R, V145R, I147R, P148R, D150R, V228R, V254R, A510R, A278R, K315R, S334R, L343R, A369R, H392R, L394R, S408R, N456R, V500R, A510R, and T573R.
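Substitutions such as K78R follow the conventional notation of wild-type residue, 1-based position, and replacement residue. The hypothetical helper below applies such a list to a reference sequence and rejects any entry whose wild-type residue does not match. It checks every entry against the unmodified reference and applies a repeated entry only once (A510R appears twice in the list above). The reference in the example is a toy sequence, not SEQ ID NO:25.

```python
import re

SUBSTITUTION = re.compile(
    r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])"
)


def apply_substitutions(reference: str, mutations) -> str:
    """Apply substitutions written as <wild type><1-based position><new>."""
    residues = list(reference)
    for mutation in dict.fromkeys(mutations):  # de-duplicate, keep order
        match = SUBSTITUTION.fullmatch(mutation)
        if match is None:
            raise ValueError(f"unrecognized substitution: {mutation}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(reference):
            raise ValueError(f"{mutation}: position outside the sequence")
        # Compare against the original reference, not the partly edited copy.
        if reference[position - 1] != wild_type:
            raise ValueError(
                f"{mutation}: expected {wild_type} at {position}, "
                f"found {reference[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


# Toy reference only; SEQ ID NO:25 is not reproduced here.
print(apply_substitutions("MKDSV", ["K2R", "S4R", "K2R"]))  # MRDRV
```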
[0167] In some embodiments, the effector protein or a functional variant thereof contains mutations selected from any of the groups shown in Table 3 or Table 4 relative to the sequence of SEQ ID NO:25.
[0168] In some specific embodiments, the effector protein or a functional variant thereof comprises an amino acid sequence selected from SEQ ID NO:80-87.
[0169] In some embodiments, the TraC effector protein or a functional variant thereof possesses at least guide RNA-mediated sequence-specific targeting capability. That is, the TraC effector protein or a functional variant thereof is capable of forming a complex with guide RNA and binding to a specific target sequence (such as a DNA target sequence).
[0170] In some embodiments, the TraC effector protein or a functional variant thereof possesses guide RNA-mediated sequence-specific targeting capability and double-stranded nucleic acid (e.g., double-stranded DNA) cleavage activity. For example, after the TraC effector protein or a functional variant thereof forms a complex with guide RNA and binds to a specific target sequence (e.g., a DNA target sequence), it can cleave double-stranded nucleic acids (e.g., double-stranded DNA) within or near the target sequence, forming double-strand breaks (DSBs).
[0171] In some embodiments, the TraC effector protein or a functional variant thereof possesses guide RNA-mediated sequence-specific targeting capability and nickase activity. For example, after the TraC effector protein or a functional variant thereof forms a complex with guide RNA and binds to a specific target sequence (such as a DNA target sequence), it can generate a nick within or near the target sequence. TraC effector proteins or functional variants thereof with nickase activity are also referred to as TraC nickases.
[0172] In some embodiments, the TraC effector protein or a functional variant thereof has guide RNA-mediated sequence-specific targeting capability but lacks double-stranded nucleic acid cleavage activity and / or nickase activity. Such TraC effector proteins or functional variants lacking double-stranded nucleic acid cleavage activity and / or nickase activity are also referred to as dead TraC effector proteins.
[0173] "Guide RNA" and "gRNA" are used interchangeably in this document, referring to RNA molecules capable of forming complexes with TraC effector proteins or their functional variants and targeting the target sequence by means of a certain degree of similarity to the target sequence. Typically, gRNAs in the CRISPR system target the target sequence through base pairing between the crRNA and the complementary strand of the target sequence.
[0174] In this invention, the guide RNA may be selected from i) guide RNA (reRNA) derived from the right-hand element of a transposon and / or ii) guide RNA containing tracrRNA and / or crRNA, such as a single guide RNA (sgRNA) containing tracrRNA and crRNA.
[0175] In some embodiments, the TraC effector protein or its functional variants of the present invention can target and bind to target DNA sequences either under the guidance of a guide RNA derived from a right-hand element of a transposon or under the guidance of a guide RNA comprising tracrRNA and / or crRNA.
[0176] In some embodiments, the guide RNA is a guide RNA (reRNA) derived from the right-hand element of the transposon, for example, the reRNA comprising the scaffold sequence shown in SEQ ID NO:77 or 78. The specific form or sequence of the reRNA may vary depending on the specific TraC effector protein; for design, refer to Karvelis, T. et al. Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease. Nature 599, 692-696 (2021).
[0177] In some embodiments, the guide RNA comprises tracrRNA and / or crRNA. In some embodiments, the guide RNA is a guide RNA formed by the complementarity of tracrRNA and crRNA. In some embodiments, the guide RNA is a single guide RNA (sgRNA) comprising tracrRNA and crRNA, wherein tracrRNA and crRNA are fused. In some embodiments, the guide RNA may contain only crRNA, which may also be referred to as sgRNA. The specific form or sequence of the gRNA may vary depending on the specific TraC nuclease.
[0178] In this document, guide RNAs containing tracrRNA and / or crRNA may also be referred to as CRISPR system guide RNAs. Guide RNAs containing tracrRNA and / or crRNA are the standard form of guide RNAs in the CRISPR system. The sequences of tracrRNA and / or crRNA can be obtained by analyzing sequences near CRISPR effector protein loci. Analyzing and obtaining guide RNAs containing tracrRNA and / or crRNA for CRISPR effector proteins is within the capabilities of those skilled in the art.
[0179] In some embodiments, the guide RNA comprising tracrRNA and / or crRNA is derived from or matured from a nucleotide sequence of one of SEQ ID NO:38-74.
[0180] In some embodiments, the guide RNA comprises tracrRNA and crRNA, for example, a single guide RNA (sgRNA) comprising tracrRNA and crRNA. In some embodiments, the crRNA comprises a sequence identical to the target sequence immediately adjacent to the PAM (e.g., 3' of the PAM), and thereby base-pairs with the complementary strand (the target strand). In some embodiments, the tracrRNA comprises a sequence complementary to a PAM-distal sequence, i.e., a sequence located away from the PAM in the direction of the target sequence (a non-target strand binding sequence, NTB). In some embodiments, the non-target strand binding sequence is located at the 5' end of the tracrRNA.
[0181] Binding of the NTB in the tracrRNA to the PAM-distal sequence can help the effector protein-guide RNA complex open the PAM-distal region of the DNA, improving editing efficiency.
[0182] In some embodiments, the complementary sequence of the non-target strand binding sequence is approximately 10 to approximately 50 nucleotides from the PAM, for example, approximately 10, approximately 16, approximately 20, approximately 24, approximately 28, approximately 30, approximately 40, or approximately 50 nucleotides, preferably approximately 20 nucleotides from the PAM. In some embodiments, the non-target strand binding sequence is approximately 5 to approximately 20 nucleotides long, preferably approximately 8 to 12 nucleotides, more preferably approximately 10 nucleotides. In some embodiments, the complementary sequence of the non-target strand binding sequence at least partially overlaps with the target sequence. In some embodiments, the complementary sequence of the non-target strand binding sequence is included in the target sequence.
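The placement rules above can be expressed as a small design helper. This is a hypothetical sketch under stated assumptions: the non-target strand is written 5' to 3' with the PAM on the 5' side of the target, the distance is measured from the 3' end of the PAM to the PAM-proximal edge of the NTB binding site, and the NTB is the RNA reverse complement of that site. Neither the 4-nt PAM nor the sequence in the example comes from the specification.

```python
def reverse_complement_rna(dna: str) -> str:
    """RNA sequence (5'->3') that base-pairs with the given DNA (5'->3')."""
    pair = {"A": "U", "C": "G", "G": "C", "T": "A"}
    return "".join(pair[base] for base in reversed(dna.upper()))


def design_ntb(nts: str, pam_end: int, distance: int, length: int) -> str:
    """Hypothetical NTB design.

    nts      -- non-target strand 5'->3', PAM lying 5' of the target
    pam_end  -- 0-based index just past the PAM's 3' end on nts
    distance -- nt between the PAM and the PAM-proximal edge of the site
    length   -- NTB length in nt
    """
    start = pam_end + distance
    site = nts[start:start + length]
    if len(site) != length:
        raise ValueError("binding site runs past the end of the sequence")
    return reverse_complement_rna(site)


# Toy strand: placeholder 4-nt PAM, 20 nt, a 10-nt binding site, 5 nt.
nts = "TTTG" + "A" * 20 + "ACGTTGCAAG" + "C" * 5
print(design_ntb(nts, pam_end=4, distance=20, length=10))  # CUUGCAACGU
```

The example uses the preferred values stated above (about 20 nt from the PAM, about 10 nt long); both are plain parameters so other values in the stated ranges can be tried.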
[0183] As used herein, a "target sequence" refers to a sequence of approximately 20 nucleotides in the genome that is flanked (e.g., on its 5' side) by a PAM (protospacer adjacent motif). Typically, the PAM is essential for recognition of the target sequence by the complex formed between a CRISPR nuclease, such as the TraC effector protein of this invention or a functional variant thereof, and the guide RNA. Based on the presence of the PAM, those skilled in the art can readily identify targetable sequences in the genome. Moreover, depending on the location of the PAM, the target sequence can be located on either strand of the genomic DNA molecule; the strand to which the crRNA binds is called the target strand (TS), and the strand complementary to the target strand is called the non-target strand (NTS).
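Identifying targetable sequences from PAM occurrences can be sketched in Python by scanning both strands for a PAM and taking the 20 nt immediately 3' of each match. The PAM "TTTV" in the example is a placeholder, not the PAM of any TraC protein described here, and `find_targets` is a hypothetical name. Minus-strand hits are reported in the coordinates of the reverse-complement sequence.

```python
import re

# IUPAC codes used to write a degenerate PAM as a regular expression.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]", "S": "[CG]",
    "W": "[AT]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_targets(genome: str, pam: str, target_len: int = 20):
    """Return (strand, 0-based PAM start on that strand, target) tuples for
    every PAM that is followed 3' by a full-length target."""
    # A zero-width lookahead lets overlapping PAM occurrences all be found.
    pam_regex = re.compile("(?=" + "".join(IUPAC[c] for c in pam.upper()) + ")")
    hits = []
    for strand, seq in (("+", genome.upper()), ("-", reverse_complement(genome))):
        for match in pam_regex.finditer(seq):
            start = match.start() + len(pam)
            target = seq[start:start + target_len]
            if len(target) == target_len:
                hits.append((strand, match.start(), target))
    return hits


genome = "GG" + "TTTA" + "ACGT" * 5 + "CC"  # toy sequence
print(find_targets(genome, "TTTV"))  # [('+', 2, 'ACGTACGTACGTACGTACGT')]
```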
[0184] In some embodiments, the sgRNA comprises the scaffold sequence shown in SEQ ID NO:75 or 76.
[0185] In some embodiments, nucleotides 154-209 of the scaffold sequence shown in SEQ ID NO:75 or nucleotides 92-147 of the scaffold sequence shown in SEQ ID NO:76 are reprogrammable regions that can be reprogrammed to contain a non-target strand binding sequence (NTB).
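The reprogrammable regions are given as 1-based, inclusive nucleotide ranges, and both span 56 nt (209 - 154 + 1 and 147 - 92 + 1). The sketch below shows how such a range maps onto zero-based string slicing when the region is swapped for an NTB-containing sequence. The scaffold in the example is a placeholder with the same coordinates, not SEQ ID NO:75 or 76, and `reprogram_region` is a hypothetical name; the replacement length is left unconstrained because no length requirement is stated.

```python
def reprogram_region(scaffold: str, start: int, end: int, replacement: str) -> str:
    """Replace 1-based, inclusive positions start..end of a scaffold."""
    if not 1 <= start <= end <= len(scaffold):
        raise ValueError("region lies outside the scaffold")
    # 1-based inclusive [start, end] corresponds to Python slice [start-1:end].
    return scaffold[:start - 1] + replacement + scaffold[end:]


# Placeholder scaffold using the SEQ ID NO:76 coordinates (92-147) only.
scaffold = "A" * 91 + "C" * 56 + "G" * 10
new = reprogram_region(scaffold, 92, 147, "U" * 56)
print(new == "A" * 91 + "U" * 56 + "G" * 10)  # True
print(147 - 92 + 1, 209 - 154 + 1)          # 56 56
```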
[0186] In one aspect, the present invention also provides a protein complex of the TraC effector protein or a functional variant thereof with at least one other functional protein. In some embodiments, the TraC effector protein or a functional variant thereof and the other functional protein form a protein complex via an affinity tag that mediates specific binding. In some embodiments, the other functional protein forms a protein complex with the TraC effector protein or a functional variant thereof by specifically binding to a guide RNA.
[0187] In one aspect, the present invention also provides a fusion protein of the TraC effector protein or a functional variant thereof with at least one other functional protein.
[0188] In some embodiments, the other functional protein is a deaminase. Thus, the protein complex or fusion protein can be used for base editing in an organism or somatic cells. A protein complex or fusion protein comprising the TraC effector protein or a functional variant thereof and a deaminase is also referred to as a base editor. In some embodiments, the protein complex or fusion protein may contain one or more of the deaminases.
[0189] In some embodiments, the deaminase is a cytosine deaminase. "Cytosine deaminase" refers to a deaminase capable of accepting single-stranded DNA as a substrate and catalyzing the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively. Examples of cytosine deaminases that can be used in this invention include, but are not limited to, APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, human APOBEC3A deaminase, double-stranded DNA deaminase (Ddd), and single-stranded DNA deaminase (Sdd) (for Ddd and Sdd, see CN202310220057.1 and PCT/CN2023/080052), or functional variants thereof. Each of the aforementioned documents or patents is incorporated herein by reference in its entirety.
[0190] In some embodiments of the present invention, the cytidine deaminase in the protein complex or fusion protein can deaminate the cytidine in the single-stranded DNA generated during the formation of the protein complex or fusion protein-guide RNA-DNA complex into U, and then achieve C to T base substitution through base mismatch repair.
[0191] In some embodiments, the protein complex or fusion protein containing cytosine deaminase further comprises a uracil DNA glycosylase inhibitor (UGI). Uracil DNA glycosylase can catalyze the removal of U from DNA and initiate base excision repair (BER), resulting in repair of U:G back to C:G. Therefore, without being bound by any theory, including a UGI in the fusion protein of the present invention can increase the efficiency of C-to-T base editing.
[0192] In some embodiments, the deaminase is an adenine deaminase. "Adenine deaminase" refers to a domain capable of accepting single-stranded DNA as a substrate and catalyzing the formation of inosine (I) from adenosine or deoxyadenosine (A).
[0193] In this invention, the adenine deaminase in the protein complex or fusion protein can deaminate adenosine in the single-stranded DNA generated during the formation of the protein complex or fusion protein-guide RNA-DNA complex and convert it into inosine (I). Since the DNA polymerase treats inosine (I) as guanine (G), the substitution from A to G can be achieved through base mismatch repair.
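The outcomes described for cytosine and adenine deaminases (C to T via U, A to G via I) can be illustrated with a deterministic sketch that returns the fully edited allele. Its assumptions are all hypothetical: the protospacer is written as the displaced (non-target) strand, the editing window is a free parameter because no window is specified for TraC base editors, and every editable base in the window is converted, whereas real editing is probabilistic and yields mixed products. `predicted_edit` is an illustrative name.

```python
# Net conversions described in [0190] and [0193]: cytosine base editors give
# C -> T (via U); adenine base editors give A -> G (via I).
CONVERSIONS = {"CBE": ("C", "T"), "ABE": ("A", "G")}


def predicted_edit(protospacer: str, editor: str, window: tuple) -> str:
    """Convert every editable base inside a 1-based, inclusive window.

    Returns the fully edited allele only; the window is hypothetical.
    """
    source, product = CONVERSIONS[editor]
    start, end = window
    bases = list(protospacer.upper())
    if not 1 <= start <= end <= len(bases):
        raise ValueError("window lies outside the protospacer")
    for i in range(start - 1, end):
        if bases[i] == source:
            bases[i] = product
    return "".join(bases)


print(predicted_edit("ACCATGCA", "CBE", (2, 4)))  # ATTATGCA
print(predicted_edit("ACCATGCA", "ABE", (2, 4)))  # ACCGTGCA
```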
[0194] In some embodiments, the adenine deaminase is a DNA-dependent adenine deaminase derived from Escherichia coli tRNA adenine deaminase TadA (ecTadA).
[0195] In some embodiments, the protein complex or fusion protein includes cytosine deaminase and adenine deaminase.
[0196] In some embodiments, the other functional protein may be a transcription activator, transcription repressor, DNA methyltransferase, DNA demethylase, etc., thereby enabling transcriptional regulation and / or epigenetic modification. In some embodiments, the other functional protein may be a reverse transcriptase. Protein complexes or fusion proteins containing the TraC effector protein or its functional variants and a reverse transcriptase can be used for large DNA insertion, for example as prime editors (Anzalone, A.V., Randolph, P.B., Davis, J.R. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019)) and PrimeRoot editors (Sun, C., Lei, Y., Li, B. et al. Precise integration of large DNA sequences in plant genomes using PrimeRoot editors. Nat Biotechnol (2023)), each of which is incorporated herein by reference in its entirety.
[0197] Different portions of the fusion protein of the present invention can be independently linked by a linker or directly. The linker described herein can be a non-functional amino acid sequence of 1-50 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or 20-25, 25-50) or more amino acids without secondary or higher structures. For example, the linker can be a flexible linker.
[0198] In some embodiments of various aspects of the present invention, the TraC effector protein or its functional variants, other functional proteins forming protein complexes, or the fusion protein are recombinantly generated. In some embodiments of various aspects of the present invention, the TraC effector protein or its functional variants, other functional proteins forming protein complexes, or the fusion protein further contains a fusion tag, such as a tag for the isolation and / or purification of the TraC effector protein or its functional variants, other functional proteins forming protein complexes, or the fusion protein. Methods for recombinantly generating proteins are known in the art. Furthermore, various tags that can be used for the isolation and / or purification of proteins are known in the art, including but not limited to His tags, GST tags, etc. Generally, these tags do not alter the activity of the target protein.
[0199] In some embodiments of various aspects of the present invention, the TraC effector protein or its functional variants, other functional proteins forming a protein complex, or the fusion protein of the present invention further comprises a nuclear localization sequence (NLS), linked, for example, via a linker. Generally, the one or more NLSs should be strong enough to drive accumulation of the TraC effector protein or its functional variants, the other functional proteins forming a protein complex, or the fusion protein in the cell nucleus in an amount sufficient for its genome editing function. The strength of nuclear localization activity is generally determined by the number and position of the NLSs, the specific NLS or NLSs used, or a combination of these factors. Exemplary nuclear localization sequences include, but are not limited to, the SV40 nuclear localization signal sequence and the nucleoplasmin nuclear localization signal sequence. Furthermore, depending on the location of the DNA to be edited, the TraC effector protein or its functional variants or the fusion protein of the present invention may also include other localization sequences, such as cytoplasmic localization sequences, chloroplast localization sequences, mitochondrial localization sequences, etc.
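The linker and localization options in paragraphs [0197] and [0199] can be illustrated by assembling a fusion sequence as a string. The SV40 (PKKKRKV) and nucleoplasmin (KRPAATKKAGQAKKKK) NLS sequences are the standard published ones; the (GGGGS)n flexible linker is one common choice assumed here for illustration, and the effector sequence is a placeholder rather than any SEQ ID NO. The helper names are hypothetical.

```python
SV40_NLS = "PKKKRKV"                     # SV40 large T antigen NLS
NUCLEOPLASMIN_NLS = "KRPAATKKAGQAKKKK"   # bipartite nucleoplasmin NLS


def flexible_linker(repeats: int) -> str:
    """(GGGGS)n, a commonly used flexible linker (illustrative choice)."""
    return "GGGGS" * repeats


def assemble_fusion(*domains: str, linker: str = "GGGGS") -> str:
    """Join domains N- to C-terminus; linker="" gives a direct fusion."""
    return linker.join(domains)


effector = "ACDEFGHIK"  # placeholder for a TraC effector sequence
fusion = assemble_fusion(
    "M" + SV40_NLS, effector, NUCLEOPLASMIN_NLS, linker=flexible_linker(1)
)
print(fusion)  # MPKKKRKVGGGGSACDEFGHIKGGGGSKRPAATKKAGQAKKKK
```

Passing `linker=""` models the direct fusion also allowed in [0197]; different linkers between different domain pairs would need separate calls.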
[0200] In one aspect, the present invention provides the use of the TraC effector protein of the present invention or its functional variants or other functional proteins forming protein complexes or the fusion protein thereof in the genome editing of cells, preferably eukaryotic cells, more preferably plant cells.
[0201] In one aspect, the present invention provides a genome editing system for site-specific modification of target nucleic acid sequences in a cell genome, comprising the TraC effector protein of the present invention or a functional variant thereof or other functional proteins forming a protein complex or the fusion protein thereof and / or an expression construct comprising a nucleotide sequence encoding the TraC effector protein of the present invention or a functional variant thereof or the fusion protein thereof.
[0202] In this document, the terms "genome editing system" and "gene editing system" are used interchangeably and refer to the combination of components required for genome editing within the genome of an organism's cells. The individual components of this system, such as the TraC effector protein or its functional variants, other functional proteins forming protein complexes, fusion proteins, gRNAs, or corresponding expression constructs, may exist independently or in any combination as a composition. In some embodiments, the components of the genome editing system are contained in a delivery system selected from viruses, virus-like particles, virions, liposomes, vesicles, exosomes, lipid nanoparticles (LNPs), N-acetylgalactosamine (GalNAc) conjugates, or engineered bacteria.
[0203] In some embodiments, the genome editing system further includes at least one guide RNA (gRNA) and / or an expression construct containing a nucleotide sequence encoding said at least one guide RNA.
[0204] In some embodiments, the guide RNA is selected from i) guide RNAs derived from right-hand elements of transposons (reRNA) and / or ii) guide RNAs containing tracrRNA and / or crRNA, such as single guide RNAs (sgRNAs) containing tracrRNA and crRNA.
[0205] In some embodiments, the guide RNA is a single guide RNA (sgRNA) comprising tracrRNA and crRNA, for example, the sgRNA comprising the scaffold sequence shown in SEQ ID NO:75 or 76.
[0206] In some embodiments, the guide RNA derived from the CRISPR system comprises tracrRNA and crRNA, for example, a single guide RNA (sgRNA) comprising tracrRNA and crRNA. In some embodiments, the crRNA comprises a sequence identical to the target sequence immediately adjacent to the PAM, and thereby binds complementarily to the strand opposite the PAM-containing strand. In some embodiments, the tracrRNA comprises a sequence complementary to a sequence located distal to the target sequence in the direction of the PAM (a non-target strand binding sequence, NTB). In some embodiments, the non-target strand binding sequence is located at the 5' end of the tracrRNA.
[0207] In some embodiments, the complementary sequence of the non-target strand binding sequence is approximately 10 to approximately 50 nucleotides from the PAM, for example, approximately 10, approximately 16, approximately 20, approximately 24, approximately 28, approximately 30, approximately 40, or approximately 50 nucleotides, preferably approximately 20 nucleotides from the PAM. In some embodiments, the non-target strand binding sequence is approximately 5 to approximately 20 nucleotides long, preferably approximately 8 to 12 nucleotides, more preferably approximately 10 nucleotides. In some embodiments, the complementary sequence of the non-target strand binding sequence at least partially overlaps with the target sequence. In some embodiments, the complementary sequence of the non-target strand binding sequence is included in the target sequence.
[0208] In some embodiments, nucleotides 154-209 of the scaffold sequence shown in SEQ ID NO:75 or nucleotides 92-147 of the scaffold sequence shown in SEQ ID NO:76 are reprogrammable regions that can be reprogrammed to contain a non-target strand binding sequence (NTB).
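To make the coordinate convention concrete, the following minimal Python sketch extracts or replaces a 1-based, inclusive nucleotide range such as 154-209 of SEQ ID NO:75 or 92-147 of SEQ ID NO:76. The scaffold sequences themselves are not reproduced in this section, so the function names and the placeholder input are hypothetical. Note that both ranges span 56 nucleotides (209 - 154 + 1 = 147 - 92 + 1 = 56).

```python
def extract_region(seq: str, start: int, end: int) -> str:
    """Return positions start..end of seq (1-based, inclusive)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates out of range")
    # 1-based inclusive [start, end] equals the 0-based half-open slice [start - 1, end).
    return seq[start - 1:end]


def reprogram_region(seq: str, start: int, end: int, replacement: str) -> str:
    """Return seq with positions start..end (1-based, inclusive) replaced by replacement."""
    extract_region(seq, start, end)  # reuse the range check
    return seq[:start - 1] + replacement + seq[end:]


# Hypothetical placeholder; the real scaffold of SEQ ID NO:75 is not shown here.
scaffold_75 = "N" * 250
region = extract_region(scaffold_75, 154, 209)
print(len(region))  # 56
```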
[0209] Generally, the target sequence targeted by the genome editing system of this invention must be flanked at its 5' or 3' end by a protospacer adjacent motif (PAM). The specific form or sequence of the gRNA varies with the specific nuclease.
[0210] In some implementations, the gRNA used for prime editing can be a so-called pegRNA. A pegRNA is an sgRNA that additionally carries a reverse transcription template (RT) sequence and a primer binding site (PBS) sequence.
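A minimal sketch of how the components of a pegRNA might be concatenated. The 5'-to-3' order used here (spacer, scaffold, RT template, PBS) is the layout commonly used for SpCas9 pegRNAs and is an assumption; the arrangement suitable for a TraC-based prime editor is not specified in this section. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PegRNA:
    spacer: str       # guide (spacer) sequence matching the target
    scaffold: str     # effector-binding scaffold of the sgRNA
    rt_template: str  # reverse transcription template carrying the desired edit
    pbs: str          # primer binding site

    def assemble(self) -> str:
        """Concatenate the parts 5'->3' in the assumed SpCas9-style order."""
        parts = (self.spacer, self.scaffold, self.rt_template, self.pbs)
        if not all(parts):
            raise ValueError("all pegRNA components must be non-empty")
        return "".join(parts)
```

Keeping each component as a separate field makes it easy to vary the RT template and PBS lengths independently when screening pegRNA designs.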
[0211] In some embodiments, the PAM recognized by the nuclease of the present invention or a functional variant thereof is a T-rich PAM. In some embodiments, the PAM recognized by the nuclease of the present invention or a functional variant thereof is a G-rich PAM. The PAM may be, for example, 5'-TTTN-3', 5'-TGTNNN-3', PolyT, PolyG, 5'-TTTG-3', 5'-TTC-3', 5'-TGA-3', 5'-YTTC-3', 5'-CTCGTG-3', 5'-GTTG-3', 5'-CTTG-3', 5'-TCTG-3', 5'-TTTA-3', or 5'-TTAG-3', where N represents A, G, C, or T, and Y represents C or T.
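As an illustration of how degenerate PAMs such as 5'-TTTN-3' or 5'-YTTC-3' can be located in a sequence, the following Python sketch converts a PAM written in IUPAC nucleotide codes into a regular expression. It assumes the standard IUPAC meanings (N = A/C/G/T, Y = C/T) and scans one strand only. The function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes needed for the PAMs listed above (plus a few others).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "Y": "[CT]", "R": "[AG]", "S": "[CG]", "W": "[AT]",
}


def pam_to_regex(pam: str) -> str:
    """Translate an IUPAC PAM such as 'TTTN' into a regular-expression string."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_pams(seq: str, pam: str) -> list[int]:
    """Return 0-based start positions of all PAM matches on the given strand.

    Wrapping the pattern in a lookahead makes overlapping matches visible.
    """
    pattern = re.compile(f"(?={pam_to_regex(pam)})")
    return [m.start() for m in pattern.finditer(seq.upper())]


print(find_pams("TTTATTTG", "TTTN"))  # [0, 4]
```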
[0212] Based on the presence of PAM, those skilled in the art can readily identify target sequences in the genome that can be targeted and optionally edited, and design suitable guide RNAs accordingly. For example, if a PAM sequence 5'-TTC-3' is present in the genome, then approximately 18 to approximately 35, preferably 20, 21, 22, or 23 consecutive nucleotides immediately adjacent to its 5' or 3' end can serve as target sequences.
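A minimal sketch of the target-identification step described above, assuming an exact PAM string (here 5'-TTC-3') and a fixed target length of 20 nt. It reports targets on either side of the PAM and on both strands; minus-strand positions refer to coordinates in the reverse-complemented sequence. All names are hypothetical.

```python
from typing import Iterator

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_targets(genome: str, pam: str = "TTC", length: int = 20,
                      side: str = "3prime") -> Iterator[tuple[str, int, str]]:
    """Yield (strand, pam_start, target) for every exact PAM match on both strands.

    side="3prime" takes the `length` nt immediately 3' of the PAM,
    side="5prime" the `length` nt immediately 5' of it.
    Minus-strand positions are coordinates in the reverse-complemented sequence.
    """
    if side not in ("3prime", "5prime"):
        raise ValueError("side must be '3prime' or '5prime'")
    for strand, seq in (("+", genome.upper()), ("-", revcomp(genome.upper()))):
        start = seq.find(pam)
        while start != -1:
            if side == "3prime":
                t0, t1 = start + len(pam), start + len(pam) + length
            else:
                t0, t1 = start - length, start
            if t0 >= 0 and t1 <= len(seq):  # skip PAMs too close to a sequence end
                yield strand, start, seq[t0:t1]
            start = seq.find(pam, start + 1)
```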
[0213] In some embodiments, the at least one guide RNA is encoded by different expression constructs. In some embodiments, the at least one guide RNA is encoded by the same expression construct. In some embodiments, the at least one guide RNA and the TraC effector protein of the present invention or a functional variant thereof or the fusion protein are encoded by the same expression construct.
[0214] For example, in some implementations, the genome editing system may include any of the following:
[0215] i) The TraC effector protein of the present invention or its functional variant or the other functional protein forming a protein complex or the fusion protein and the at least one guide RNA, optionally, the TraC effector protein or its functional variant or the fusion protein and the at least one guide RNA forming a complex;
[0216] ii) An expression construct comprising a nucleotide sequence encoding the TraC effector protein of the present invention or a functional variant thereof or other functional protein forming a protein complex or the fusion protein thereof, and the at least one guide RNA;
[0217] iii) The TraC effector protein of the present invention or a functional variant thereof or other functional protein forming a protein complex or the fusion protein thereof, and an expression construct comprising a nucleotide sequence encoding the at least one guide RNA;
[0218] iv) An expression construct comprising a nucleotide sequence encoding the TraC effector protein of the present invention or a functional variant thereof or other functional protein forming a protein complex or the fusion protein thereof, and an expression construct comprising a nucleotide sequence encoding the at least one guide RNA;
[0219] v) An expression construct comprising the nucleotide sequence encoding the TraC effector protein of the present invention or a functional variant thereof or other functional protein forming a protein complex or the fusion protein thereof, and the nucleotide sequence encoding the at least one guide RNA.
[0220] In some embodiments, the genome editing system further includes a donor nucleic acid molecule containing a nucleotide sequence to be inserted into the genome at a predetermined site. In some embodiments, the nucleotide sequence to be inserted into the genome is flanked by sequences homologous to sequences flanking a target sequence in the genome. After editing, the nucleotide sequence to be inserted into the genome can be integrated into the genome via homologous recombination.
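The donor layout described above (an insert flanked by sequences homologous to the regions flanking the target) can be sketched as follows. The 500-bp default arm length and the function name are illustrative assumptions, not values taken from this disclosure.

```python
def build_donor(genome: str, cut_site: int, insert: str, arm_length: int = 500) -> str:
    """Return left homology arm + insert + right homology arm.

    cut_site is a 0-based boundary: the insert is placed between
    genome[cut_site - 1] and genome[cut_site].
    """
    if not arm_length <= cut_site <= len(genome) - arm_length:
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_arm = genome[cut_site - arm_length:cut_site]   # homologous to the 5' flank
    right_arm = genome[cut_site:cut_site + arm_length]  # homologous to the 3' flank
    return left_arm + insert + right_arm
```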
[0221] In order to achieve effective expression in cells, in some embodiments of the present invention, the nucleotide sequence encoding the TraC effector protein or its functional variants or other functional proteins forming protein complexes or the fusion protein is codon-optimized for the organism from which the cells to be genome edited originate.
[0222] Codon optimization refers to the modification of a nucleic acid sequence to enhance its expression in host cells of interest by replacing at least one codon of the natural sequence (e.g., about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with codons that are used more frequently or most frequently in the genes of the host cell, while maintaining the natural amino acid sequence. Different species exhibit specific preferences for certain codons of specific amino acids. Codon preference (differences in codon usage between organisms) is often associated with the translation efficiency of messenger RNA (mRNA), which is thought to depend on the nature of the codons being translated and the availability of specific transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell generally reflects the codons most frequently used in peptide synthesis. Therefore, genes can be tailored for optimal expression in a given organism by codon optimization. Codon usage tables are readily available, for example, in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in various ways. See Nakamura, Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000," Nucl. Acids Res. 28:292 (2000).
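The substitution principle behind codon optimization can be illustrated with a toy Python sketch that replaces each codon with a host-preferred synonymous codon while leaving the encoded protein unchanged. The preference table below is hypothetical and covers only a few amino acids; in practice it would be derived from a codon usage table for the target organism, and practical optimization tools usually weigh additional factors.

```python
# Hypothetical host preferences: amino acid -> preferred codon (illustration only).
PREFERRED = {"M": "ATG", "K": "AAG", "L": "CTG", "S": "AGC", "*": "TGA"}

# Fragment of the standard genetic code covering the amino acids above.
CODE = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "TTA": "L", "TTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S", "AGT": "S", "AGC": "S",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def codons(cds: str) -> list[str]:
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    return "".join(CODE[c] for c in codons(cds))


def optimize(cds: str) -> str:
    """Swap each codon for the preferred synonymous codon; the protein is unchanged."""
    return "".join(PREFERRED.get(CODE[c], c) for c in codons(cds))


native = "ATGAAATTATCTTAA"
optimized = optimize(native)
print(optimized, translate(optimized) == translate(native))  # ATGAAGCTGAGCTGA True
```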
[0223] The organism from which the genome is derived for genome editing via the TraC effector protein or its functional variants, or the fusion protein or genome editing system of the present invention can be a prokaryote or a eukaryote, preferably a eukaryote, including but not limited to mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, and cats; poultry such as chickens, ducks, and geese; and plants including monocots and dicots, such as rice, corn, wheat, sorghum, barley, soybeans, peanuts, and Arabidopsis thaliana.
[0224] In some embodiments of the present invention, the nucleotide sequence encoding the TraC effector protein or a functional variant thereof or the fusion protein and/or the nucleotide sequence encoding the at least one guide RNA is operably linked to an expression regulatory element such as a promoter.
[0225] Examples of promoters that can be used in this invention include, but are not limited to, polymerase (pol) I, pol II, or pol III promoters. Examples of pol I promoters include the chicken RNA pol I promoter. Examples of pol II promoters include, but are not limited to, the cytomegalovirus Immediate Early (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter, and the simian virus 40 (SV40) Immediate Early promoter. Examples of pol III promoters include the U6 and H1 promoters. Inducible promoters such as metallothionein promoters can be used. Other examples of promoters include the T7 phage promoter, the T3 phage promoter, the β-galactosidase promoter, and the Sp6 phage promoter. When used in plants, the promoter can be the cauliflower mosaic virus 35S promoter, the maize Ubi-1 promoter, the wheat U6 promoter, the rice U3 promoter, or the rice actin promoter.
[0226] In some embodiments, to precisely generate guide RNA within cells, in an expression construct encoding the nucleotide sequence of the at least one guide RNA, the 5' end of the guide RNA coding sequence is linked to the 3' end of a first ribozyme coding sequence, the first ribozyme being programmed to cleave an intracellularly transcribed first ribozyme-guide RNA fusion at the 5' end of the guide RNA, thereby forming a guide RNA without an additional 5' nucleotide. In one embodiment, the 3' end of the guide RNA coding sequence is linked to the 5' end of a second ribozyme coding sequence, the second ribozyme being programmed to cleave an intracellularly transcribed guide RNA-second ribozyme fusion at the 3' end of the guide RNA, thereby forming a guide RNA without an additional 3' nucleotide. In some embodiments, the 5' end of the guide RNA coding sequence is linked to the 3' end of the first ribozyme coding sequence, and the 3' end of the guide RNA coding sequence is linked to the 5' end of the second ribozyme coding sequence. The first ribozyme is designed to cleave the intracellularly transcribed first ribozyme-guide RNA-second ribozyme fusion at the 5' end of the guide RNA, and the second ribozyme is designed to cleave the intracellularly transcribed first ribozyme-guide RNA-second ribozyme fusion at the 3' end of the guide RNA, thereby forming a guide RNA that does not carry additional nucleotides at the 5' and 3' ends.
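The first ribozyme-guide RNA-second ribozyme arrangement can be modeled, in a simplified way, as a transcript cut exactly at the two guide RNA boundaries. The sketch below treats cleavage as string slicing and uses placeholder strings in place of real ribozyme and guide sequences. It does not model ribozyme folding or catalysis, and the names are hypothetical.

```python
def cleave(transcript: str, cut_sites: list[int]) -> list[str]:
    """Cut a transcript at the given 0-based boundaries and return the fragments."""
    bounds = [0, *sorted(cut_sites), len(transcript)]
    return [transcript[a:b] for a, b in zip(bounds, bounds[1:])]


def ribozyme_cassette(grna: str, ribozyme_5: str, ribozyme_3: str) -> tuple[str, list[int]]:
    """Build a first ribozyme-guide RNA-second ribozyme transcript.

    The 5' ribozyme is assumed to cut at its own 3' boundary and the 3'
    ribozyme at its own 5' boundary, i.e. exactly at the guide RNA ends.
    """
    transcript = ribozyme_5 + grna + ribozyme_3
    cuts = [len(ribozyme_5), len(ribozyme_5) + len(grna)]
    return transcript, cuts


# Placeholder strings stand in for real ribozyme and guide sequences.
transcript, cuts = ribozyme_cassette("GUIDE", "RZ5", "RZ3")
fragments = cleave(transcript, cuts)
print(fragments[1])  # GUIDE
```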
[0227] The design of the first or second ribozyme is within the capabilities of those skilled in the art. For example, see Gao et al., JIPB, Apr, 2014; Vol 56, Issue 4, 343-349.
[0228] In some embodiments, to precisely generate guide RNA within cells, in an expression construct comprising the nucleotide sequence encoding the at least one guide RNA, the 5' end of the guide RNA coding sequence is linked to the 3' end of a first tRNA coding sequence, the first tRNA being designed so that the intracellularly transcribed first tRNA-guide RNA fusion is cleaved at the 5' end of the guide RNA (i.e., by the cell's endogenous machinery for precise tRNA processing, which removes the extra 5' and 3' sequences of precursor tRNAs to form mature tRNAs), thereby forming a guide RNA without extra 5' nucleotides. In one embodiment, the 3' end of the guide RNA coding sequence is linked to the 5' end of a second tRNA coding sequence, the second tRNA being designed so that the intracellularly transcribed guide RNA-second tRNA fusion is cleaved at the 3' end of the guide RNA, thereby forming a guide RNA without extra 3' nucleotides. In some embodiments, the 5' end of the guide RNA coding sequence is linked to the 3' end of the first tRNA coding sequence, and the 3' end of the guide RNA coding sequence is linked to the 5' end of the second tRNA coding sequence. The first tRNA is designed so that the intracellularly transcribed first tRNA-guide RNA-second tRNA fusion is cleaved at the 5' end of the guide RNA, and the second tRNA is designed so that the same fusion is cleaved at the 3' end of the guide RNA, thereby forming a guide RNA that carries no additional nucleotides at its 5' and 3' ends.
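Because tRNA processing can release several guide RNAs from one transcript, the same principle supports multiplex editing with a tandem tRNA-gRNA array, as in the Xie et al. reference cited in the next paragraph. The following Python sketch assumes that the cellular tRNA-processing machinery cuts precisely at every tRNA boundary, and it uses placeholder strings for the tRNA and guides. The function names are hypothetical.

```python
def build_trna_array(guides: list[str], trna: str) -> tuple[str, list[int]]:
    """Assemble tRNA-gRNA1-tRNA-gRNA2-...-tRNA and return it with all tRNA/gRNA junctions."""
    parts: list[str] = []
    cuts: list[int] = []
    pos = 0
    for guide in guides:
        for segment in (trna, guide):
            parts.append(segment)
            pos += len(segment)
            cuts.append(pos)  # junction after each tRNA and after each guide
    parts.append(trna)  # closing tRNA
    return "".join(parts), cuts


def release_guides(transcript: str, cuts: list[int]) -> list[str]:
    """Cut at every junction (assumed precise tRNA processing) and keep the guide fragments."""
    bounds = [0, *cuts, len(transcript)]
    fragments = [transcript[a:b] for a, b in zip(bounds, bounds[1:])]
    # Fragments alternate tRNA, guide, tRNA, guide, ..., tRNA: guides sit at odd indices.
    return fragments[1::2]
```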
[0229] The design of the tRNA-guide RNA fusion is within the capabilities of those skilled in the art. For example, see Xie et al., PNAS, Mar 17, 2015; vol.112, no.11, 3570-3575.
[0230] III. Methods for site-specific modification of target nucleic acid sequences in the cellular genome
[0231] In another aspect, the present invention provides a method for site-specific modification of target nucleic acid sequences in a cell genome, comprising introducing the genome editing system of the present invention into the cell.
[0232] In another aspect, the present invention also provides a method for producing genetically modified cells, comprising introducing the genome editing system of the present invention into the cells.
[0233] In another aspect, the present invention also provides genetically modified organisms comprising genetically modified cells or their progeny cells produced by the methods of the present invention.
[0234] In this invention, the target sequence to be modified can be located anywhere in the genome, such as within a functional gene like a protein-coding gene, or in a gene expression regulatory region such as a promoter or enhancer region, thereby achieving modification of the gene function or gene expression. The modification of the target sequence in the cells can be detected by T7EI, PCR/RE, or sequencing methods.
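As a simplified illustration of sequencing-based detection, the sketch below estimates the editing frequency as the fraction of amplicon reads that no longer contain the exact wild-type sequence spanning the target site. The choice of window is an assumption, and the approach ignores sequencing errors and does not classify mutation types, which dedicated amplicon-analysis tools handle. The function name is hypothetical.

```python
def editing_frequency(reads: list[str], wild_type_window: str) -> float:
    """Fraction of reads that lack the exact wild-type window spanning the target site."""
    if not reads:
        raise ValueError("no reads supplied")
    edited = sum(1 for read in reads if wild_type_window not in read)
    return edited / len(reads)
```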
[0235] In the method of the present invention, the gene editing system can be introduced into cells using various methods well known to those skilled in the art.
[0236] Methods for introducing the gene editing system of the present invention into cells include, but are not limited to: calcium phosphate transfection, protoplast fusion, electroporation, liposome transfection, microinjection, viral infection (such as baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), gene gun method, PEG-mediated protoplast transformation, and Agrobacterium tumefaciens-mediated transformation.
[0237] In some embodiments, the method of the present invention is performed in vitro. For example, the cells are isolated cells, or cells in isolated tissues or organs.
[0238] In other embodiments, the method of the present invention can also be performed in vivo. For example, the cells are cells within an organism, and the system of the present invention can be introduced into the cells in vivo via, for example, a viral or Agrobacterium-mediated method.
[0239] Cells that can be genome edited using the method of this invention can be derived from prokaryotes or eukaryotes, such as mammals like humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, and cats; poultry like chickens, ducks, and geese; and plants, including monocots and dicots, such as rice, corn, wheat, sorghum, barley, soybeans, peanuts, and Arabidopsis thaliana.
[0240] The present invention provides a method for producing genetically modified plants, comprising introducing the genome editing system of the present invention into at least one plant, thereby resulting in a modification in the genome of the at least one plant.
[0241] In the method of this invention, the genome editing system can be introduced into plants using various methods well known to those skilled in the art. Methods that can be used to introduce the genome editing system of this invention into plants include, but are not limited to: gene gun method, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, plant virus-mediated transformation, pollen tube pathway method, and ovary injection method.
[0242] In the method of this invention, modification of the target sequence can be achieved simply by introducing or generating the TraC effector protein or its functional variants, or the fusion protein or guide RNA, in plant cells. Furthermore, the modification is stably inherited without requiring the genome editing system to be stably transformed into a plant. This avoids the potential off-target effects of a stably existing genome editing system and also prevents the integration of exogenous nucleotide sequences into the plant genome, thus providing higher biosafety.
[0243] In some preferred embodiments, the introduction is performed without selection pressure, thereby avoiding the integration of exogenous nucleotide sequences into the plant genome.
[0244] In some embodiments, the introduction includes transforming the genome editing system of the present invention into isolated plant cells or tissues, and then regenerating the transformed plant cells or tissues into complete plants. Preferably, the regeneration is performed without selection pressure, that is, without using any selection agent targeting the selection gene carried on the expression vector during tissue culture. Omitting selection agents can improve plant regeneration efficiency and yield herbicide-resistant plants free of exogenous nucleotide sequences.
[0245] In other embodiments, the genome editing system of the present invention can be transformed into specific parts of a whole plant, such as leaves, shoot tips, pollen tubes, young spikelets, or hypocotyls. This is particularly suitable for the transformation of plants that are difficult to regenerate through tissue culture.
[0246] In some embodiments of the present invention, in vitro expressed proteins and / or in vitro transcribed RNA molecules are directly transformed into the plant. The proteins and / or RNA molecules enable genome editing in plant cells and are subsequently degraded by the cells, avoiding the integration of exogenous nucleotide sequences into the plant genome.
[0247] In some embodiments, the method further includes treating (e.g., culturing) the plant cells, tissues, or whole plants into which the genome editing system has been introduced at an elevated temperature (relative to conventional culture temperatures such as room temperature), for example 32°C. In some preferred embodiments, the plant is rice.
[0248] Therefore, in some embodiments, genetic modification of plants using the methods of the present invention can yield plants whose genomes are free of integrated foreign polynucleotides, i.e., transgene-free modified plants.
[0249] In some embodiments of the invention, the modification is related to plant traits such as agronomic traits, for example, the modification results in the plant having altered (preferably improved) traits, such as agronomic traits, relative to the wild-type plant.
[0250] In some embodiments, the method further includes the step of screening plants with desired modifications and / or desired traits such as agronomic traits.
[0251] In some embodiments of the invention, the method further includes obtaining offspring of the genetically modified plant. Preferably, the genetically modified plant or its offspring have the desired modification and / or desired traits such as agronomic traits.
[0252] In another aspect, the present invention also provides genetically modified plants or their offspring or portions thereof, wherein said plants are obtained by the methods described above. In some embodiments, the genetically modified plants or their offspring or portions thereof are non-transgenic. Preferably, the genetically modified plants or their offspring have the desired genetic modification and/or desired traits such as agronomic traits.
[0253] In another aspect, the present invention also provides a plant breeding method, comprising crossing a genetically modified first plant obtained by the method described above with a second plant that does not contain the modification, thereby introducing the modification into the second plant. Preferably, the genetically modified first plant has desired traits such as agronomic traits.
[0254] IV. Therapeutic Applications
[0255] This invention also covers the application of the genome editing system of this invention in disease treatment.
[0256] By modifying disease-related genes using the genome editing system of this invention, it is possible to achieve upregulation, downregulation, inactivation, activation, or mutation correction of disease-related genes, thereby enabling disease prevention and / or treatment. For example, the genome modification described in this invention can be located within the protein-coding region of the disease-related gene, or, for example, within gene expression regulatory regions such as promoter regions or enhancer regions, thereby enabling modification of the function or expression of the disease-related gene. Therefore, the modification of disease-related genes described herein includes modification of the disease-related gene itself (e.g., protein-coding regions), as well as modification of its expression regulatory regions (e.g., promoters, enhancers, introns, etc.).
[0257] "Disease-associated" genes are any genes that produce transcriptional or translational products at abnormal levels or in abnormal forms in cells derived from tissues affected by a disease, compared with tissues or cells from non-disease controls. Where altered expression is associated with the onset and/or progression of the disease, the gene may be expressed at abnormally high levels or at abnormally low levels. Disease-associated genes also include genes carrying one or more mutations or genetic variations that are directly responsible for the etiology of the disease, or that are in linkage disequilibrium with one or more genes responsible for the etiology of the disease. Such mutations or genetic variations are, for example, single nucleotide variants (SNVs). The transcribed or translated products can be known or unknown and can be at normal or abnormal levels.
[0258] Therefore, the present invention also provides a method for treating a disease in a subject in need, comprising delivering an effective amount of the genome editing system of the present invention to the subject to modify a gene associated with the disease. The present invention also provides the use of the genome editing system in the preparation of a pharmaceutical composition for treating a disease in a subject in need, wherein the genome editing system is used to modify a gene associated with the disease. The present invention also provides a pharmaceutical composition for treating a disease in a subject in need, comprising the genome editing system of the present invention and optionally a pharmaceutically acceptable carrier, wherein the genome editing system is used to modify a gene associated with the disease.
[0259] Preferably, the "subject" referred to in this invention is a mammal, such as a human.
[0260] In some implementations, the genome editing system described in this invention is used to introduce point mutations into nucleic acids.
[0261] In some embodiments, the genome editing systems described herein are used to correct genetic defects, for example point mutations that result in loss of function of a gene product. In some embodiments, the genetic defect is associated with a disease or condition (e.g., a lysosomal storage disease or a metabolic disease such as type 1 diabetes). In some embodiments, the methods provided herein can be used to introduce inactivating point mutations into a gene or allele encoding a gene product associated with a disease or condition.
[0262] In some embodiments, the methods described in this invention are used to treat diseases associated with or caused by point mutations that can be corrected using the genome editing systems provided herein. In some embodiments, the disease is a proliferative disease. In some embodiments, the disease is a genetic disease. In some embodiments, the disease is a neonatal disease. In some embodiments, the disease is a metabolic disease. In some embodiments, the disease is a lysosomal storage disease.
[0263] In some embodiments, the methods described in this invention are used to treat mitochondrial diseases or disorders. As used herein, "mitochondrial disease" refers to a disease caused by abnormal mitochondria, for example by mitochondrial gene mutations or defects in enzymatic pathways. Examples include, but are not limited to: neurological disorders, loss of motor control, muscle weakness and pain, gastrointestinal disorders and dysphagia, poor growth, heart disease, liver disease, diabetes, respiratory complications, epilepsy, visual/hearing problems, lactic acidosis, developmental delay, and susceptibility to infection.
[0264] Examples of diseases described in this invention include, but are not limited to, genetic diseases, circulatory system diseases, muscle diseases, diseases of the brain, central nervous system and immune system, Alzheimer's disease, secretase disorders, amyotrophic lateral sclerosis (ALS), autism, trinucleotide repeat expansion disorders, hearing disorders, gene-targeted therapy of non-dividing cells (neurons, muscle), liver and kidney diseases, epithelial cell and lung diseases, cancer, Usher syndrome or retinitis pigmentosa-39, cystic fibrosis, HIV and AIDS, β-thalassemia, sickle cell disease, herpes simplex virus infection, drug addiction, age-related macular degeneration, and schizophrenia. Other diseases that can be treated by correcting point mutations or introducing inactivating mutations into disease-related genes are known to those skilled in the art, and this disclosure is therefore not limited in this respect. In addition to the diseases exemplarily described herein, other related diseases can also be treated with the strategies and genome editing systems provided in this invention, as will be apparent to those skilled in the art. The diseases or targets to which this invention can be applied include those related to the genome editing systems described in WO2015089465A1 (PCT/US2014/070135), WO2016205711A1 (PCT/US2016/038181), WO2018141835A1 (PCT/EP2018/052491), WO2020191234A1 (PCT/US2020/023713), WO2020191233A1 (PCT/US2020/023712), WO2019079347A1 (PCT/US2018/056146), and WO2021155065A1 (PCT/US2021/015580).
[0265] The administration of the genome editing system or pharmaceutical composition of the present invention can be tailored to the patient's or subject's weight and species. The frequency of administration is within medically or veterinary permissible limits. It depends on conventional factors including the patient's or subject's age, sex, general health condition, other conditions, and the specific symptom or condition being addressed.
[0266] V. Reagent Kit
[0267] The present invention also includes a kit for use with the methods of the present invention, the kit comprising the genome editing system of the present invention and instructions for use. The kit generally includes a label indicating the intended use and/or method of use of the kit contents. The term "label" includes any written or recorded material provided on or with the kit or otherwise accompanying the kit.
Examples
[0268] Example 1: Bioinformatic Mining of Novel CRISPR Systems
[0269] First, we developed a more targeted strategy for identifying genes encoding CRISPR effector proteins, specifically one anchored on shared, highly conserved motifs. We used the MEME motif software to predict conserved domains in 86 known Cas12 proteins from the Cas12b-Cas12i families and identified three conserved motifs present in all 86 proteins (Figure 1): “TSxxCxxCx”, “GIDRG”, and “CxxCGxxxxADxxAA”.
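The three motifs use "x" for any amino acid, so they translate directly into regular expressions. The following Python sketch, with hypothetical function names, reports where each motif occurs in a protein sequence. It is an illustration only and does not reproduce the MEME-based analysis.

```python
import re

MOTIFS = ["TSxxCxxCx", "GIDRG", "CxxCGxxxxADxxAA"]


def motif_to_regex(motif: str) -> re.Pattern:
    """'x' (any amino acid) becomes '.', every other residue must match literally."""
    return re.compile("".join("." if c in "xX" else re.escape(c) for c in motif))


def motif_hits(protein: str) -> dict[str, list[int]]:
    """0-based start positions of non-overlapping matches for each motif."""
    return {m: [hit.start() for hit in motif_to_regex(m).finditer(protein)] for m in MOTIFS}


def has_all_motifs(protein: str) -> bool:
    """True only if every motif occurs at least once."""
    return all(motif_hits(protein).values())
```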
[0270] Subsequently, microbial genome/metagenome data in published public NCBI databases were searched. An initial search covered all proteins encoded within 10 kb upstream and downstream of 32,562 CRISPR arrays in the GTDB database, from which 166 candidate proteins were selected that possess at least the second of the three conserved motifs shown in Figure 1. Next, CRISPR type analysis and protein similarity analysis were used to filter out candidates of the same type as, or with sequences similar to, already annotated proteins. After removing redundancy, 37 novel proteins containing the conserved domains were obtained (SEQ ID NO: 1-37). These proteins were defined as transposon and CRISPR-Cas12 intermediates (TraC for short). Accordingly, a CRISPR system using TraC as the effector protein is defined as a CRISPR-TraC system.
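The location-plus-motif filter described above can be sketched as follows, assuming that gene and CRISPR array coordinates are available as 0-based, half-open intervals on named contigs. The data layout, the names, and the way the motif is supplied are assumptions; the CRISPR type and similarity-based redundancy filtering are not modeled.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def near_array(gene: Feature, array: Feature, flank: int = 10_000) -> bool:
    """True if the gene overlaps the CRISPR array extended by `flank` bp on each side."""
    return (gene.contig == array.contig
            and gene.start < array.end + flank
            and gene.end > array.start - flank)


def select_candidates(proteins: list[tuple[Feature, str]], arrays: list[Feature],
                      motif: str, flank: int = 10_000) -> list[tuple[Feature, str]]:
    """Keep proteins encoded near any CRISPR array whose sequence contains the motif regex."""
    pattern = re.compile(motif)
    return [(feat, seq) for feat, seq in proteins
            if pattern.search(seq) and any(near_array(feat, a, flank) for a in arrays)]
```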
[0271] The prokaryotic expression system for TraC proteins is shown in Figure 2, taking TraC-N483 as an example. Figure 2 illustrates prokaryotic expression of the TraC-N483 protein, where 483 is the designation of the new protein and "repeat" denotes the CRISPR array region. NC1 and NC2 are non-coding RNA regions in which the tracrRNA may be located. During gene synthesis, the pTac promoter was used to drive expression of the protein-coding gene, and J22119 was used to drive expression of the repeat-spacer-repeat-noncoding sequence (SEQ ID NO:38-74).
[0272] Table 1: Sequence Reference Table
[0275] Example 2: Screening for novel CRISPR systems with DNA-binding capabilities in prokaryotic cells using a fluorescent reporter system.
[0276] The inventors used a fluorescence reporter system, which can identify CRISPR systems capable of binding double-stranded DNA, to screen the function of the novel CRISPR systems. The experimental design is shown in Figure 3 and Figure 4.
[0277] Taking catalytically dead LbCas12a as an example (i.e., dCas is dLbCas12a, and the corresponding Y53 vector is called Y53-dLbCas12a), a plasmid with a p15a backbone expresses the Cas12 protein, a miniCRISPR array (repeat-spacer-repeat), and the non-coding RNA sequence (ncRNA), while a second plasmid with a pBR322 backbone expresses yellow fluorescent protein (YFP) (pUC-PAM-YFP). The 5' untranslated region of YFP carries a target site complementary to the spacer, preceded by a random PAM library with the sequence nnnnnnGTGATCGACAGCAACAAGTGAGCG or nnnnGTGATCGACAGCAACAAGTGAGCG, where nnnnnn and nnnn are libraries of different PAM lengths covering 4096 and 256 PAM sequences, respectively (Figure 3).
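The library sizes follow from there being four possible bases at each randomized position: 4^6 = 4096 and 4^4 = 256. A minimal sketch that enumerates both libraries in front of the protospacer given above (the function name `pam_library` is illustrative):

```python
from itertools import product

# Protospacer placed 3' of the randomized PAM in pUC-PAM-YFP (from the text).
TARGET = "GTGATCGACAGCAACAAGTGAGCG"


def pam_library(pam_length: int, target: str = TARGET) -> list[str]:
    """All sequences N^pam_length + target, with each N drawn from A, C, G, T."""
    return ["".join(pam) + target for pam in product("ACGT", repeat=pam_length)]


if __name__ == "__main__":
    for k in (6, 4):
        print(k, len(pam_library(k)))  # prints 6 4096, then 4 256
```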
[0278] If the protein under test successfully processes the crRNA and, guided by it, targets the 5' untranslated region (5' UTR) of YFP, it remains bound to the 5' UTR, blocking YFP transcription and thus reducing YFP expression. This happens only when a compatible PAM is present. Bacteria with low YFP expression can therefore be sorted by flow cytometry, and first-generation (Sanger) sequencing of the sorted bacteria quickly reveals the PAM of the protein under test. Figure 4 shows the screening results for dLbCas12a: bacteria with extremely low YFP expression in the P2 gate (box B) were sorted by flow cytometry, and Sanger sequencing showed that the PAM of these YFP-negative cells was TTTN, the same as the previously reported LbCas12a PAM. In contrast, the PAMs of bacteria sorted by fluorescence-activated cell sorting (FACS) before or after IPTG-induced protein expression remained in the form of a random library (NNNN) (box A).
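One simple way to turn PAMs read from sorted cells into a consensus such as TTTN is to call a base at a position only when it dominates that position. The sketch assumes individual PAM reads are available (for example, from sequencing single clones) and uses an arbitrary 75% cutoff; both are assumptions, not parameters from the text, and `pam_consensus` is a hypothetical name.

```python
from collections import Counter


def pam_consensus(pams: list[str], threshold: float = 0.75) -> str:
    """Per-position consensus: the most common base if its share is at least
    `threshold`, otherwise 'N'."""
    if not pams:
        raise ValueError("need at least one PAM")
    length = len(pams[0])
    if any(len(p) != length for p in pams):
        raise ValueError("all PAMs must have the same length")
    consensus = []
    for i in range(length):
        base, count = Counter(p[i] for p in pams).most_common(1)[0]
        consensus.append(base if count / len(pams) >= threshold else "N")
    return "".join(consensus)


if __name__ == "__main__":
    print(pam_consensus(["TTTA", "TTTC", "TTTG", "TTTT"]))  # TTTN
```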
[0279] The system described above can be used to screen candidate proteins from novel CRISPR systems for double-stranded DNA binding. The inventors screened several representative candidates: TraC-N287, TraC-445, TraC-483, and TraC-655 all showed T-rich PAMs, whereas TraC-N701 showed a G-rich PAM. Thus most of these proteins recognize T-rich PAMs and a few recognize G-rich PAMs, consistent with the previous finding that most Cas12 family proteins recognize T-rich PAMs.
[0280] Example 3: Detailed determination of the PAMs of proteins with double-strand cleavage activity using next-generation sequencing
[0281] This system can also be used to screen for CRISPR systems capable of cleaving double-stranded DNA. The experimental design is as follows:
[0282] As shown in Figure 5, the PAM library plasmid was co-transformed with the protein expression plasmid (treatment group), while in the control group the PAM library plasmid was co-transformed with a protein expression vector lacking the crRNA expression cassette. In principle, PAMs that the target protein recognizes and cleaves will be lost, so their proportion falls relative to the control group. The PAM of the target protein can therefore be determined by using next-generation sequencing to compare how much each PAM is depleted between the two groups' libraries.
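The group comparison amounts to a depletion score per PAM. A minimal sketch, assuming PAM read counts have already been extracted from each sequencing library: normalize each library to frequencies, add a pseudocount so that PAMs unseen in one library do not cause division by zero, and take log2(treatment / control). The function name and the pseudocount of 1 are assumptions.

```python
import math


def pam_depletion(treated: dict[str, int], control: dict[str, int],
                  pseudocount: float = 1.0) -> dict[str, float]:
    """log2 of normalized PAM frequency in treated vs. control libraries.
    More negative scores indicate PAMs that were cleaved and lost."""
    pams = set(treated) | set(control)
    t_total = sum(treated.values()) + pseudocount * len(pams)
    c_total = sum(control.values()) + pseudocount * len(pams)
    scores = {}
    for pam in pams:
        t_freq = (treated.get(pam, 0) + pseudocount) / t_total
        c_freq = (control.get(pam, 0) + pseudocount) / c_total
        scores[pam] = math.log2(t_freq / c_freq)
    return scores
```

Because frequencies in each library sum to one, strong depletion of recognized PAMs pushes the scores of non-recognized PAMs slightly above zero; ranking by score is usually more informative than the absolute values.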
[0283] The inventors tested the PAMs of TraC-875, TraC-365, TraC-655, and TraC-445 (Figure 6A) and of TraC-297, TraC-459, TraC-466, and TraC-949 (Figure 6B), using LbCpf1 as a positive control in both groups. In both experiments the LbCpf1 positive control showed the expected enrichment of TTTN PAMs, indicating that both experiments are reliable. In the first group, TraC-875 and TraC-365 showed enrichment of a TGTNNN PAM, while TraC-655 and TraC-445 showed weaker enrichment of poly-T or poly-G PAMs (Figure 6A). This result is similar to the PAM results obtained by flow cytometry in Example 2: these Cas proteins have 5' poly-T or poly-G PAMs. In the second group (Figure 6B), TraC-297 recognizes TTTG-type PAMs, TraC-459 and TraC-466 recognize TTC-type PAMs, and TraC-949 recognizes TGA-type PAMs. These results provide a basis for further exploring the functional requirements of these proteins in eukaryotic systems.
[0284] Example 4: Screening for novel CRISPR systems with DNA cutting capabilities in prokaryotic cells using plasmid interference systems.
[0285] To further test the double-stranded DNA cleavage activity of the novel CRISPR systems, this embodiment uses a plasmid interference system as the detection model; the experimental design is shown in Figure 7. The PAMs of the candidate proteins for which Example 3 gave a clear PAM were verified with this plasmid interference system, as follows. Taking the candidate protein TraC-459 as an example, Example 3 showed that this protein recognizes a typical 5'-TTC-3' PAM motif, whose 3' end lies immediately adjacent to the GFP-T1 target site (SEQ ID NO: 79). Using the pUC-polyT-YFP vector as a template, a series of target vectors carrying candidate PAM sequences for TraC-459 (pUC-TTC-YFP, pUC-GTC-YFP, pUC-TCC-YFP, pUC-TTG-YFP, pUC-TGC-YFP, pUC-CTTC-YFP, pUC-GTTC-YFP, and pUC-TTTC-YFP) was constructed. The Y53-459 vector was co-transformed with each of these target vectors into competent *E. coli* cells, and the empty Y53 vector was co-transformed with each target vector as a control. After overnight incubation on LB solid medium containing antibiotics, the number of positive clones was counted to assess the targeting ability of TraC-459 on the different PAMs. The results show that TraC-459 recognizes TTC PAMs but has lower targeting ability on the other PAMs (Figure 8A), consistent with the next-generation sequencing results of Example 3.
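The colony counts can be summarized per PAM as a fold reduction relative to the empty-vector control: cleavage of the target plasmid lowers the number of surviving transformants. The sketch below is hypothetical; the function names, the pseudocount of 1, and any numbers used with it are illustrative and are not data from Figure 8.

```python
def interference_fold(empty_colonies: int, trac_colonies: int,
                      pseudocount: int = 1) -> float:
    """Empty-vector colonies / TraC colonies; values well above 1 mean the
    target plasmid was depleted (stronger interference)."""
    return (empty_colonies + pseudocount) / (trac_colonies + pseudocount)


def rank_pams(counts: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """counts maps PAM -> (empty-vector colonies, TraC colonies);
    returns (PAM, fold) pairs with the strongest interference first."""
    folds = {pam: interference_fold(e, t) for pam, (e, t) in counts.items()}
    return sorted(folds.items(), key=lambda item: item[1], reverse=True)
```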
[0286] The PAMs of TraC-875, TraC-297, and TraC-949 were validated in the same way. TraC-875 showed strong cleavage activity with the 5'-CTCGTG-3' PAM motif; its detailed PAM sequence requires further investigation. TraC-297 broadly and efficiently cleaved target sequences adjacent to the 5'-GTTG-3', 5'-CTTG-3', 5'-TCTG-3', 5'-TTTA-3', and 5'-TTAG-3' PAM motifs. TraC-949 cleaved target sequences adjacent to the 5'-NTGA-3' PAM motif, with the highest cleavage efficiency for the 5'-TTGA-3' PAM motif, while its cleavage efficiency for the 5'-TTGA-3', 5'-ATGA-3', 5'-GTGA-3', and 5'-CTGA-3' PAM motifs was relatively low. The results are shown in Figure 8B.
[0287] Example 5: Evolutionary Model of Newly Predicted CRISPR Systems
[0288] To further analyze the structural and functional characteristics of TraC proteins in the newly identified CRISPR systems, the inventors predicted the secondary structures of the accessory RNAs of type V CRISPR and TnpB systems and analyzed their folding models (Figure 9A). Based on these folding models, the different protein subtypes fall into three categories, which reflect the features of three kinds of CRISPR loci, namely the proximity of the tracrRNA to the CRISPR protein gene, or the absence of tracrRNA. The classification suggests that TnpB may have transposed into CRISPR loci, or that the reRNA split into a tracrRNA and a CRISPR RNA. The diversity of accessory RNA combinations also supports a model in which effectors and accessory RNAs co-evolved (see the evolutionary model in Figure 9B).
[0289] Example 6: Editing activity of TraC protein using sgRNA as guide RNA
[0290] In this embodiment, the TraC-459 protein from the TraC system was selected to verify its DNA editing activity.
[0291] The structure and length of the sgRNA can affect the editing efficiency of a CRISPR system, so the inventors screened for the most suitable sgRNA structure for the TraC system. First, targeting the VEGFA-T1 site in HEK293T cells, the inventors designed a predicted sgRNA (sgRNA-predicted, or sgRNA-pre) by fusing the tracrRNA and crRNA (Figure 10A). Then, using sgRNA-pre as the baseline, they examined how the truncation length of the tracrRNA:crRNA complementary region, the truncation length of the tracrRNA 5' region, and the spacer length affect the editing efficiency of TraC-459 (Figure 10B). Truncating the tracrRNA complementary region to 11-15 bp, truncating the tracrRNA 5' region to 19-21 bp, or using a 22-27 bp spacer each gave better editing performance. On this basis, an optimized sgRNA (sgRNA-optimal, or sgRNA-opt) was obtained; this optimization strategy is called the second-generation sgRNA optimization method of the TraC system (sgRNA-v2). The results in Figure 11 show that using sgRNA-opt as the guide RNA significantly improves the editing efficiency of TraC-459.
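The three ranges above define a small design space. The text reports each range separately; combining them into a single joint grid, as the hypothetical sketch below does, is an assumption made only to show its size (5 × 3 × 6 = 90 combinations).

```python
from itertools import product

# Ranges reported as giving better TraC-459 editing at VEGFA-T1.
COMPLEMENTARY_BP = range(11, 16)  # tracrRNA:crRNA complementary region, 11-15 bp
TRACR_5P_BP = range(19, 22)       # tracrRNA 5' region, 19-21 bp
SPACER_BP = range(22, 28)         # spacer, 22-27 bp


def sgrna_design_grid() -> list[dict[str, int]]:
    """Every combination of the three length parameters (an assumed joint grid)."""
    return [
        {"complementary": c, "tracr_5p": t, "spacer": s}
        for c, t, s in product(COMPLEMENTARY_BP, TRACR_5P_BP, SPACER_BP)
    ]


if __name__ == "__main__":
    print(len(sgrna_design_grid()))  # 90
```

Each entry would still need the actual tracrRNA, repeat, and spacer sequences, which this excerpt does not give.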
[0292] Example 7: Editing activity of TraC protein using reRNA as guide RNA
[0293] The co-evolutionary model in Example 5 predicts that the TraC protein descends from TnpB. Because the TnpB system uses a 3' flanking sequence as the guide for DNA cleavage (Karvelis, T. et al. Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease. Nature 599, 692-696 (2021)), this example examines whether the TnpB-type guide RNA can guide the TraC protein.
[0294] In the in vivo validation experiments, the inventors used three-dimensional structural clustering analysis to select the reRNAs of structurally similar TnpB proteins (882-TnpB-reRNA and 966-TnpB-reRNA) as guide RNAs. To target the GFP-T1 site, the scaffold sequences of 882-TnpB-reRNA and 966-TnpB-reRNA were each fused to the GFP-T1 target sequence. The plasmid interference assay described in Example 4 was then used to analyze the dsDNA cleavage activity of TraC-459 in *E. coli* with the different guide RNAs. As shown in Figure 12, compared with the empty-vector control (pEmpty), TraC-459 showed varying degrees of DNA interference activity with the different types of guide RNA.
[0295] Taken together, the results of Examples 6 and 7 indicate that TraC-459 has a dual-guide mechanism, using both the TnpB and the CRISPR targeting pathways. That is, TraC effector proteins can target and bind DNA under the guidance of either reRNA or sgRNA.
[0296] Example 8: Prediction of the protein function of TraC protein
[0297] To further elucidate the working mechanism of TraC system proteins, the inventors constructed a dimer input consisting of two copies of the TraC-459 sequence and used the AlphaFold2 multimer v3 model to predict its three-dimensional structure. None of the five top-ranked predicted structures of TraC-459 (Rank 1-5) showed dimer-type interactions. Figure 13 shows heatmaps of the Predicted Aligned Error (PAE), which gives an error estimate for every pair of residues: the expected positional error at residue x when the predicted and true structures are aligned on residue y. Values range from 0 to 35 Å (white to black). Residue numbers run along both axes, and the color of each pixel is the PAE of the corresponding residue pair. If the relative position of two domains is predicted reliably, residue pairs with one residue in each domain have low PAE values (below 5 Å; white represents 0 Å). Both axes span the two TraC-459 monomers: the first 575 amino acids are one monomer and the last 575 are the other. Only the blocks within the first and within the second monomer are white, while the blocks between the two monomers are black. Together with its compact 575-amino-acid structure, this suggests that TraC-459 is the smallest monomeric Cas12 protein.
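The block reading of the PAE heatmap can be expressed numerically. Assuming the PAE matrix for the two-copy TraC-459 prediction has already been loaded into a NumPy array (the loading step and the function name `block_pae` are assumptions), the mean PAE of the inter-chain blocks can be compared with the intra-chain blocks and with the 5 Å threshold mentioned above. The example uses a synthetic matrix, not the actual prediction.

```python
import numpy as np

PAE_CONFIDENT_A = 5.0  # threshold quoted in the text, in ångströms


def block_pae(pae: np.ndarray, chain_len: int) -> dict[str, float]:
    """Mean PAE within chain A, within chain B, and between the two chains."""
    n = 2 * chain_len
    if pae.shape != (n, n):
        raise ValueError(f"expected a {n} x {n} PAE matrix, got {pae.shape}")
    a, b = slice(0, chain_len), slice(chain_len, n)
    return {
        "intra_A": float(pae[a, a].mean()),
        "intra_B": float(pae[b, b].mean()),
        # PAE is not symmetric, so average both off-diagonal blocks.
        "inter": float((pae[a, b].mean() + pae[b, a].mean()) / 2),
    }


if __name__ == "__main__":
    L = 575
    pae = np.full((2 * L, 2 * L), 30.0)  # synthetic: high error between chains
    pae[:L, :L] = 2.0                     # low error within chain A
    pae[L:, L:] = 2.0                     # low error within chain B
    scores = block_pae(pae, L)
    print(scores, "confident interface:", scores["inter"] < PAE_CONFIDENT_A)
```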
[0298] Example 9: Optimization of the TraC protein
[0299] To further validate the effectiveness of the TraC protein and broaden its applications, this embodiment obtained a series of optimized TraC-459 variants by arginine-scanning mutagenesis, directed evolution, and artificial-intelligence-assisted evolution. The screening process is shown in Figure 14a-c. Measuring the intracellular editing efficiency of the TraC-459 mutant library showed that some of the screened mutants have higher editing efficiency. Taking five-point mutants as an example, the editing efficiency of each five-point mutant in the library was measured in three parallel experiments (Table 2). A ratio greater than 1 between a mutant's editing efficiency and that of wild-type TraC-459 indicates that the mutant edits more efficiently. The mutants with improved editing efficiency identified in this way are listed in Table 3. A representative five-arginine mutant obtained by arginine-scanning screening is TraC-5M-7 (S137R, P148R, D150R, K315R, and A369R), i.e., the TraC-5M-7 mutant in Figure 14b. TraC-5M-7 has 24.02-fold higher editing efficiency at the VEGFA-T1 site than the original TraC-459.
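The ratio test for mutants can be sketched as follows, assuming editing efficiencies from the three parallel experiments are averaged before dividing. The text does not say whether means or per-replicate ratios were used, and the function names are hypothetical.

```python
from statistics import mean


def efficiency_ratio(mutant: list[float], wild_type: list[float]) -> float:
    """Mean mutant editing efficiency divided by mean wild-type efficiency."""
    wt = mean(wild_type)
    if wt == 0:
        raise ValueError("wild-type efficiency must be non-zero")
    return mean(mutant) / wt


def improved(mutants: dict[str, list[float]],
             wild_type: list[float]) -> dict[str, float]:
    """Ratios of the mutants whose ratio exceeds 1 (higher editing efficiency)."""
    ratios = {name: efficiency_ratio(reps, wild_type) for name, reps in mutants.items()}
    return {name: r for name, r in ratios.items() if r > 1}
```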
[0300] To design TraC-459 variants with further enhanced editing activity, the inventors developed a deep learning model using data from a series of TraC variants. The model yielded seven representative mutants, TraC-B22, -B24, -B26, -B32, -B34, -B35, and -B36, that showed enhanced editing activity in human cells (editing activity is shown in Figure 14d, and the mutation sites are listed in Table 4).
[0301] Table 2. Arginine-scanning mutagenesis results for five-point mutants
[0302]
[0303]
[0304] Table 3. TraC mutants with improved editing efficiency
[0305]
[0306] Table 4. Mutation sites and amino acid sequences of representative TraC-459 mutants.
[0307]

| Mutant name | Mutation sites | Sequence |
|---|---|---|
| TraC-5M-7 | S137R, P148R, D150R, K315R, A369R | SEQ ID NO: 80 |
| TraC-B22 | S137R, P148R, D150R, M284I, K315R, A369R, A423S | SEQ ID NO: 81 |
| TraC-B24 | S137R, P148R, D150R, K315R, A369R, I417F, A510R | SEQ ID NO: 82 |
| TraC-B26 | S137R, P148R, D150R, P245G, K315R, A369R, T426R | SEQ ID NO: 83 |
| TraC-B32 | S137R, P148R, D150R, K315R, A369R, I417F, I471V | SEQ ID NO: 84 |
| TraC-B34 | S137R, P148R, D150R, K189Q, K315R, A369R, A489G | SEQ ID NO: 85 |
| TraC-B35 | S137R, P148R, D150R, K315R, A369R, A423S, A489G | SEQ ID NO: 86 |
| TraC-B36 | S137R, P148R, D150R, K315R, A369R, P496G, A510R | SEQ ID NO: 87 |
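Mutation names such as S137R follow the usual convention of wild-type residue, 1-based position, new residue. The sketch below applies such a list to a parent sequence while checking each wild-type residue, and lists the substitutions a variant adds on top of TraC-5M-7. Because this excerpt does not say which SEQ ID NO is wild-type TraC-459, the example uses a toy sequence; the function names are hypothetical.

```python
import re

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse(mutations: str) -> list[str]:
    """Split 'S137R, P148R' or 'S137R,P148R' into tokens."""
    return [t.strip() for t in mutations.split(",") if t.strip()]


def apply_mutations(seq: str, mutations: str) -> str:
    """Apply substitutions (1-based positions), verifying each wild-type residue."""
    residues = list(seq)
    for token in parse(mutations):
        m = SUBSTITUTION.fullmatch(token)
        if m is None:
            raise ValueError(f"cannot parse substitution {token!r}")
        wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{token}: position outside sequence of length {len(residues)}")
        if residues[pos - 1] != wt:
            raise ValueError(f"{token}: expected {wt} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


def extra_mutations(variant: str, parent: str) -> list[str]:
    """Substitutions present in `variant` but not in `parent`, in variant order."""
    parent_set = set(parse(parent))
    return [t for t in parse(variant) if t not in parent_set]


if __name__ == "__main__":
    five_m7 = "S137R, P148R, D150R, K315R, A369R"
    b22 = "S137R,P148R,D150R,M284I,K315R,A369R,A423S"
    print(extra_mutations(b22, five_m7))  # ['M284I', 'A423S']
```

Checking the wild-type residue before substituting catches off-by-one numbering errors, which are easy to introduce when moving between sequence files.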
[0308] Example 10: Dual Pairing Function of TraC System
[0309] Subsequently, secondary structure prediction of sgRNA-opt revealed a bubble-like region at the end of the tracrRNA (Figure 15). Evolutionary analysis suggests that this protruding region may have evolved from flanking DNA during the evolution of the TnpB reRNA, so it could be reprogrammed to pair with target DNA. Previously reported TnpB structures show that TnpB cannot autonomously unwind the PAM-distal end of the target site, which results in low editing efficiency at targets with high GC content. Modifying this region so that it pairs with the PAM-distal region may therefore help unwind the DNA double strand and improve editing efficiency at high-GC targets (Figure 16a).
[0310] We reprogrammed the bubble region against the VEGFA-T1 target in HEK293 human cells to determine the optimal pairing region and pairing length. Seven complementary regions (L1–L7), spanning positions 13 to 47 bp downstream of the PAM of VEGFA-T1, were evaluated (Figure 16), and the region complementary to positions 21–32 nt downstream of the PAM (L2 construct) gave the highest editing activity (Figure 16b). Next, within the L2 region, we tested complementary lengths from 20 bp down to 6 bp (S1 to S8) and found that a 10 bp complementary length (S6) gave the highest editing activity (Figure 16c). Finally, by pairing the bubble region with complementary sequences in the PAM-distal region, we found that reprogrammed sgRNAs improved editing efficiency at endogenous target sites with high GC content in the PAM-distal region in four human cell lines (Figure 17). This additional reprogrammable region has not been found in other CRISPR systems.
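The pairing window and length series above can be sketched as follows. Assumptions: positions are numbered from 1 starting at the first nucleotide 3' of the 5' PAM on the PAM-containing strand, and S1 to S8 step down evenly by 2 bp (20, 18, ..., 6), which places 10 bp at S6 as reported. The function names are hypothetical, and the sketch does not design the bubble sequence itself.

```python
def downstream_window(protospacer: str, start: int, end: int) -> str:
    """Nucleotides start..end (1-based, inclusive) counted 3' from the PAM."""
    if not 1 <= start <= end <= len(protospacer):
        raise ValueError("window lies outside the supplied sequence")
    return protospacer[start - 1:end]


def gc_content(seq: str) -> float:
    """Fraction of G and C in seq (0.0 for an empty string)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


# S1..S8 complementary lengths, assuming even 2-bp steps from 20 bp to 6 bp.
S_LENGTHS = {f"S{i}": 20 - 2 * (i - 1) for i in range(1, 9)}

if __name__ == "__main__":
    target = "A" * 20 + "GGCCGGCCGGCC" + "T" * 15  # toy sequence, not VEGFA-T1
    l2 = downstream_window(target, 21, 32)          # the L2 window, 12 nt
    print(l2, gc_content(l2), S_LENGTHS["S6"])      # GGCCGGCCGGCC 1.0 10
```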
[0311] In summary, this embodiment demonstrates that TraC-459 is a highly compact monomeric Cas12-like protein. Comparison with known proteins in the prior art showed that it is the smallest known monomeric CRISPR effector protein. It also has a unique dual-guide mechanism (sgRNA and reRNA) and a dual-pairing function, neither of which is found in other Cas12 subtypes.
[0312] Example 11: The effect of temperature on TraC protein in plant cells
[0313] Some plant cells can tolerate culture at high temperatures. To examine how temperature affects TraC activity in plant cells, the inventors tested five endogenous target sites (OsAAT1, OsALST1, OsEPSPST1, OsPDST1, and OsPDST2) in rice protoplasts at 25℃ and 32℃. In these tests, the editing efficiency of TraC-5M-7 at 32℃ was 1 to 29 times higher than at 25℃, with the highest efficiency reaching 3.41% (Figure 18).
[0314] Sequence List:
[0315] >SEQ ID NO:1
[0316] MITVRKLKLSIMADEELRIQQLKWIKDEQYNQYRALNNGMAFLIADHMLNTAESTKIIYKNNEINKKKKKIYYMEDKIKKENNKLEEEKILKFESDINKLKHEIKILENEKVELELETKNLSEQFKNHYVEDMYTRLDEIPFQYKDNKSLVQNRLKKDFDFYLNNGGKRGERKPTAYKRDYPLLIRGRLLNFYYNKDNVFIKWIAGITFKVELGNKIKNNIELRHTLHQCMNNEKYKVCDSSLQFDNKNNIILNLTIDIPINTSENNFIEGRVMGVDLGMKIPAYASFNDVEYCRAFGDIEDFLRVRTQLQSRMRKLQMALTLIKGGHGRGKKLQALNRLKDKEKDFVNTYNHMISKRIIEYSIKNCCGVINLEYLSLAAREKDLFLTLQPQKSNRIKRNWSYYDLQTKIENKAKKYGIIVKKIDPYLTSQTCHICGNYDEGQRISQEQFECKACNRKFNADYNASKNIALSTKYINNINESEFFKRYKNN
[0317] >SEQ ID NO:2
[0318] MTDVPVSRIRNFSIVAHIDHGKSTLADRLLQDTGTVAARDMKEQFLDNMELERERGITIKLQAARMDYRAKDGNDYVINLIDTPGHVDFSYEVSRSLAACEGALLVVDASQGVEAQTLANVYLALENNLEIVPVLNKIDLPGAEPDRVAGEIEDTIGLDCSEAIHASAKEGIGIREILEAIVQKVPPPADTVDKPLRALIFDSYYDSYRGVIVYFRVMDGRGKKGDKVRLMASGKEYEIDELGVLAPNQCPIDELHAGEVGYLAASIKAVGDARVGDTITLVSKPATEPWPGYTEAKPMVFCGLFPTDSDQFGELREALDRLRLNDAALSYEPETSSAMGFGFRCGFLGLLHMEIVQERLEREYNLDLIATAPSVVYRVTKIDGEVVMIDNPSELPEPQYRETIEEPYVKVEMITPESFVGTLMELSQTRRGVFVDMRYLTQGRTTLIYEIPLAEVVTDFFDQMKSRSRGYASMEYSLIGYRENPLVRLDVLINGDRVDSLSAIVHRDKAYYVGRALVEKLKELIPRHQFKVPIQATIGSRVVASEAIPALRKDVLAKCYGGDISRKKKLLKKQAEGKKRMKAIGTVDVPQSAFMAVLKIDRE
[0319] >SEQ ID NO:3
[0320] MSALSLSRPLAARLARRNIHYAWVVAAVTFLTMLVTAGAVGIPGVLIAPLEAEFGWTNGQISAAFAIRLVLFGLMGPFAAALMNRFGLRPVILASQLLIATGIGFGAMAMTQLWQLTLLLGVVVGIGTGLTAMVLGATVVSRWFAARRGLVIGLLTASTATGQLVFLPVLAALTEAVGWRSAALSLVTLMLTAALVMALMRNRPSDLGPPFGSATVPPEPPRQSLGAMLISPLTTLRDAVRVPMFWLLFGTFFVCGASTNGLVQTHFIALCGDYGMAAVTAAGMLAVIGVFDFFGTVGSGWLSDRYDSRWLLFWYYSLRGLSLVWLPFSDFSFYALSIFAVFYGLDWVATVPPTVKLTAEHFGERSNIVFGWVFTGHQLGAASAAWSAGATRTAYETYLPAFLTAGLLCLAAALVIALFIGSGLRRRATPGGARAAA
[0321] >SEQ ID NO:4
[0322] MLKAYRYRIYPNKEQEEFFAKTFGACRFVWNKMLEGKLNALNNKEKLPKITPAKYKKQYQFLREVDSLALANVQLNQEKAFRDYFKNKKHFGLPKFKKKKDKQSYTTNNQGNTIRIDFEKQLLYLPKVKTGIKIKLHRIFEGKIKSATITKTKSGNYYVSILVETDKVQNKIKQPKSKICGIDLGLKDFAIITNDNGSCKIENPKYLVRAEKRLKRLQRQLSRKQKGSNNYKKTREKIAKLHEYISNARNDFLHKTSKAIDENQVIVVEGLSVKALQQSMLSKLVSDVSWGTFLRYLEYKANWYGRELIIVDRFYPSSKTCSVCGYINTQLRLSNRYWECPECNTFHDRDINASKNLYKVGLSRVGTTRQACGECSDGGTMIYHRSTSQHSMKQEATTSVSGSSSLSSL
[0323] >SEQ ID NO:5
[0324] LSEQLDDTPEQPNEVEETKKRKQRNKGKHPARIWSVFSRYLVSGREHFDKQVLLAHRFRNKLVELELQRRAAANVVIAQASSELQPLIDALAAAEQVLEVSLQELKAVRAKHRRRAESAAQRDAVTNARTARNQASKALSKARKDAFASEAAQVGLWLAEEHHFQAVLAARHAFINDGLYWPTATDVQDRARAMRKGAPPVFRRFGGAEQAGRIAVQIQKSTDKSQSEGGITFEEAFSCSHGFFRLEKKPGRDPLPEIADQPDYKSKRQQLLTYARAWLRVGSEGKGARAKPCWVVADVLLTRQAPKTARIVQVYLDHSVIGDRERWRLSLVLTNQEGWPKPNRASGCMVGIDLGWRLLDTGELRVAYACGADGQHHELRLPASLVKVWRRPDRIQQERDNLFNDVKARLLEWLKGREDLPDWLKEQAEHLHLWKSSTRLSRLVDHWAGRDINWSSQRRIAGDEEILASLRGWVKRNLHLRDYQYHEREQLAAHRLDVYRKWADGLARLYQTAVLEDADWRDLARLPSPEDDAVNETARYNQRMASPGLLASVITNMFAITSRVECANTTRECWRCGHTEAFDAEAQLIRVCPGCGDACDQDESAARVLLARGQALNQSQVAEAAPSS
[0325] >SEQ ID NO:6
[0326] MDDAPRRTIPIKLDVSEERRGDLHQTKSQFLHCANRTSEWAWRYDDYCVTSKSKAENSLYDELREETDLTANLVQKGIRRAIEAVDAGVEKLQQGEQTSQPEFDSWSVVYDKRSASFNDDHATLSTPNGRVTTEYVLPPEDEREDTPFGRYYESDDWDASSATLQYDEQDDTFYLHVTLKNPDYTVDRTERQEASHDDDGAENGVVLGVDLNVTGAFAVTSTGKFIGSADYLTHKRDQYEKRRARLQQTGTRSAHLTIQSIGSRFSDWSLDWLHNRANDLIAEAQATDVDGIIFENLDHIRENIANESKFQQWTYAKFVELVEYKVESTALFVDTVNPAYTSQRCSHCGFTHEDNRDDKEFACQDCGYEVNADYNAAKNIAKRYCGYIHRGQKSRGGWATSQLALESGTLNVNGEYTPSALRG
[0327] >SEQ ID NO:7
[0328] MPIRLSGADYRRAHEACHNAASLWNFFLASIQEYWKENKTDPTEKEMRHFLYEQRPDLRDGLHAHTVQLILSDLMDAISTYRSNRDSGDNNARAPHREKNYRPLEFSAGFGWRVTPDRQHIALSFGKGHKRITLPLPEIADPKTGEILPPERWGSMQLCWNHNARKWSLHVSVPTPKPPQGNPNIIAAIDEGIINPMTVAIETENSYDVLVINGRHARAIKHYRNTRIATLQEKLSRCVKGSKRWHKLNAKRKRIEAMTDSALHNANHQTTRKVADFIQAHDAGRIVAGDVRDIEKNTRKDEAKRIKNTKNQRRRLSQWSRGKQGTLLEHKTGSTVGHINEAWSSKTCPACQTRNHPNGRGYHCHNCGFTCNRDAVGAINILMRAQHGAYTPIDMDKPIRVKYLRATPIFQPKTVPSME
[0329] >SEQ ID NO:8
[0330] MLGFTIPPKRKFQPRLMICKVSEFKANFTSDQIAIIREYQEGLRRLWNFALGQLINADENWHYDKVSKTYVSKCTLVRHQVKDDLSKTWVAHSPVLDRRSRWIVAACQAYPTATTKAGHQRMADKDHVFTRAEVKASGWVGGYGYSCPIQRDHNPGLISNSAMKDSKRGLGILARARDLQDPTIITPDLLKDLDRQLILNVQQKYRMGLLEQLGGAWDKYVKDRYSKESLRRGKPKFKGTNTPVNTIRHPNPNAGGKKPQSADAVRAIGSDVLKIPGFGDVHLPGLDKRWGKSRVKIMQIVEKASGFFVQLTREETPRRVRKVEPRGYAGIDWGWKEENYCSVSWVNEGGETRSQQISKPRFYRESQEQLGKAQQKLDEKLYRRLILWIHHPDTNLLDYVSEGDADILKTCRTVEDLCAHIGVDLNASTIQRLRWKQCGDSQGVKECKQKIKRIHEKTKRRRRSHQYRIASYLTTYSSGIAVEDGLQKKVGKAKAKAKEDGSGFDKNQASAVTGQNKSNRDAAIGQQIDLIDKKTQEFGKSTQRIKFAADLPLSRICCKCLAYDPEMDISKPVYHCTQCGYEGDRDVNSSALIAKIGREPSLQKTIKQKIKKTRKKS
[0331] >SEQ ID NO:9
[0332] MFAIKSMSELVQHHITIQLKAYLSTTQTALFENWTDSLRPLYNLALGLLYEEQQRRWRTNQKFLKNYLDKS
[0333] SLQTYLNEIENKPDIYPVEWHITKALPECDWLTKEENEVRKKDNTKSLACRTINRDGNFFTPIRPYWHLEEPQKLAKFKCFTNQWLISCNLLTNYHLQKLLNVNMKVRQSFISMNLMEAWKRYQKGDFRKLKFKSKRNPVISLCNKQTNRIKFDPEANNCQLLGKEFGLIEFRGLHNRHQGQIQPRNGSLTKKADGYYLNLVFQVEHKPIPDSDLQVGIDPGLVTLLTLSDGKCISNQRFLKENERHLTVLQKKLSRQTPGSKNWEKTKKALAKIHKQTADHRKYYNHKVSTHLVNKYGAIAIEDTKLTNMNKRPKAEKREDGKGYEHNGAKAKAGLNQSFHDAGLGQLRAFLESKANSYENRHIERVRANYTSQKCSRCGHTDSENRLTQASFHCLKCGLEMPADLNAAINIEQTAFGLDKS
[0334] >SEQ ID NO:10
[0335] MKRVTITIDGEQTKVIVIGTIAANHTAAEWLLTASVSAKSAKVRFDPEEAVAETSSLVMIAPTRTEKYLYLVPDEQVQPVTTIVRKYGLLSPLDWDCPDYPAGDAFEHLFLQNKLWNDLVTIEREHRAKYRELIGSDEETAQMDTEIASIKDRLSVLDEGRKKLRVEHRKKKCPEIDCLDENIKKLKSELKAVASKAKETRAAAKDRIRAAGNDIENLEKDRQAAVIKAYNNSGLWWGNYNAVLESYKKARIKALKDGAELKYHRFDGSGRFTNQIQGGMSVQDLLEGNRNVASLRLVSSGELGDISGKKPPSLDLQSVGSRRDSREYGILAITLYTGTDEQSKKFRRTLSFPVILHRPLPEGATLKSLSVHRKRVGTDFVWSVVFTFTTDCPTYDQRSSTGNRCGLNLGWKKQAGGGLRVATIYDGSDARHITLPQAIIDGLDYVNGDLQGRIDSAANENHAWLLEQWGGDELPESLQELRSMLRRSKRPHPAKFAKAVIAWRNYPEYLGDARDEAEQRRKATKRLTIEMAHKREKLLRRRMDFYRNTAKQLTSVYDVICLDKMDLRRLLALEKGDGTPNELTKIARKQRQQAAISELRECLSKAAAKNGTQIEQVSTASSATCSACKGKMEQVDGIMWRCRECRALVDQDINAAANLFREVL
[0336] >SEQ ID NO:11
[0337] MTSSIEPTSDASKPATTTRVFKYGLLLPRPEDRSAVWEQLRAAHRYNKLIEIECHRRQAYRELRSELSRQVAVLEGRLAELNAQIDAALAGLQGERKAKRSAKTSGPEKEALKQLKAERKKLSAALKLERAAALNNPVLKTAIEKLDERSHKELLEARANCGVYWGTYLLVEAAVQQARKSKIDPKFKRWNTGRLGVQVQKGISVSGFFFKAKEQKGGRQLVQVAELDPEAWSSPIRGVRRRLSRTTLRMRINSDAKGKPVWAEWPMVMHRPLPDGAQIMGVTISAHKKADRTFFTVDVACRLPAVVVAPGPRMVGIDLGWRKIGDEIRVASWVDSDGQKGELRLPASIQARIRKARDIRAIRDETRDKMLPVLATLLKPLKTSDEFKERIENLIMWRSFDRLHKLFWFWKNARFAGDEVPWYVFETWHHRDRHLHQYEAGCRSGALGHRREIYRIFAAHVAKHYGKVVLERWDLRKMAERKPPEEDTADSREPNRQRFDAGVSELRLAIVNAVEGGLIWRTPAALTTMRCHACGHVHDESIVALQHQCVGCSLVWDQDENAGWNLLASGQAMDNDPDLLAAANPPKKPRKPRFTKKKKASPTSSTMEHDEDLSPTGS
>SEQ ID NO:12
[0339] MIRTYKYSLKAPENFAEDCEDELRRMNDLWNRLIEIDRQRERSFKDLCRSTSAEYAAAQDEIEALREPIDNLYDAIRAERIATRSKEPSDELRARRDELLGRRKALWEICKAIQKAIPKESQAPINEVYKTNVKLARQQSGCFWGNYNAVIESFETAKSKAIKDGGRLHFKSFDGSGRFVNQIQGGMTVTELLAGSHSQAQLTNLVTTNKTKGRFAFTAFTGKDDAGKRFRRQLFSEINYHRPIPADGVIKAVEVVKVPHDGKQKYKWHACFTVALPEVDIKHPKRNIAGVNLGWRQFGGRLRVAVVVDDAGKKTEYFVPAELVSKFEAAETIQKAADDARNEMLSWLRTFYQDNRDEAPQEWRESIQGLLRNRPSVDAANHLMTIWRECVFAQEESRRYAAWLKSDAALRRSYTGCRQNAVKWREEIYRHIAKELAERYAVLAVTDTPLSTMSRTKAKDDLAVDNALPESARRNRVIAAIYSLKEWIGKQAAKTGSTVETITGKMTATCHKCGYVAEKRLRGSQYATCKSCGSELELDENAAINCRNHASGAVLISDKPEKTGRFQRAKMAENDFARKIGDNASPLVT
[0340] >SEQ ID NO:13
[0341] MKTKTIEFKIYPTLAQSQTIDKWLQDIKWVWNKGLSLKFADRQKHYRTQIGDRNIPDGLPLRWKWRKVEATDKKGKITEKWEKIRLVGGGVVRPKSGYPYCPIREHRNIEDPGKFKYFRNDNSPQFVIDIPSEFKEGMADTLKKAWQAYSDPKRPTQKPKFKGKQDKVRSLTSLHAGGKSKLLKPERIPGSDNGFVQFPKLGKLKVKGLYSRHDWVEWGSAKIVKEPSGYYLHVCVDVPNDPLPKSDKSVGIDPGLLSVITTDQGREVAPPKLFRKQQTKLRRLQRKASRQQKGGSNQKKTYQKVALQHEKIRRSRNAFNHKLSTKVVREYSGIVMEDLKIQNLSRKPKAKKREDGNGYEQNGAKRKAGLNKSFADSALGDLISKIETKCKDTDREFVKVAAHHTTVDCSNCGAKIKKALSQRTHRCTECGYEDGRDSNAAKNILIKGQKEFKTVYRAWAWEHGET
[0342] >SEQ ID NO:14
[0343] MFAIKSMSELVQHHITIQLKAYLSTTQTALFENWTDSLRPLYNLALGLLYEEQQRRWRTNQKFLKNYLDKS
[0344] SLQTYLNEIENKPDIYPVEWHITKALPECDWLTKEENEVRKKDNTKSLACRTINRDGNFFTPIRPYWHLEEPAKLAKFKCFTNQWLISCNLLTNYHLQKLLNVTMKVRQSFISMNLMEAWKRYQKGDFRKLKFKSKRNPVISLCNKQTAIIKFDPEANNCQLLGKEFGLIEFRGLHNRHQGQIQPRNGSLTKKADGYYLNLVFQVEHKPIPDSDLQVGIDPGLVTLLTLSDGKCISNPRFLKENERHLTVLQKKLSRQTPGSKNWEKTKKALAKIHKQTADHRKYYNHKVSTHLVNKYGAIAIEDTKLTNMNKRPKAEKREDGKGYEHNGAKAKAGLNQSLHDAGLGQLRAFLESKANSYENRHIERVKANYTSQKCSRCGHTDKENRLTQASFYCLKCGLEMPADLNAAINIEQAAFGLDKS
[0345] >SEQ ID NO:15
[0346] MVKKPPKEKEDKPKKVIEFKIHPNKEQVEEINRSFASCKLLWNLSIALKEESKQRYYRKNYKFDEFSPEIWELSYSGHYDEEKFKSLKDKEKELLIKNPCCKIAYFKKTSDGKEYTPLNAIPIRRFMNAEDIDKDAVNYLNRQKLAFYFREDTAKFIGEIETEFKKGFFKSVIKTAYDAAKKGIRGIPKFKGRRDKVETLVNGQPDTIKIKSNGVIVSSKIGLLKVRGLDRLQGKAPRMAKITRKATGYYLQLTVETDDTIYKESDKCVGLDMGAVAIFTDDLGRQSEAKRYAKYIQKKRLNRLQRQASRQKDNSNNQRKTYAKLARVHEKIARQRKGRNAQLAHKITSEYQSVILEDLKLKNMTAAAAKPKEREDGDGYKQNGKKRKSGLNKALLDNAIGQLRTFIENKANERGRKIIRVNPKHTSQTCPNCGNIDKANRVSQSKFKCVSCGYEAHADQNAAANILIRGLQDEFLRAIGSLIKFPVSLIGKYPGLEGKFTPDLDANQESIGDAPIENAEHSISKQMKQEGNRTPTQSENGSQSLIFSSAPPQPCGDSHGKNKPKALPDKASKRSSKPRGAIPENPDQLTIWDLLD
[0347] >SEQ ID NO:16
[0348] MSKITRKIEIIPDVEGLTHEESNEKCYKAFYNYDRKLYKVANLLVSQLYGLDNLLSLMRLQNEEYVDSQRKLSFKSTTDTAKEEIKKRMEEIDAELMAIKKKIAPMHPQSYSYRAVNSSEYAKDMPSDILDSLKQDVYKHFNDSKKEQIRGERSLTTYKRGMPIPFNLKKKHSIVCDGGNYYLPWFEDTRFLNFGRDRSNNRAIIDNCIKTKKYKLCAAAKIQLKERKLFLLITVDIPKAESVPVKGKVMGVMGVDLGVNPAYVAVNDGPERSRIGNGEAFQKQRDVFRRRFRELQRSQLTQGGHGRKHKTKATEILRGKERNWVQTENHRISREIVNLASRWKVETIQMESLKGFGKNQEGEVENNHKRLLGRWSYFELQKDIEYKAAMAGIKVQYVNPAYTSQTCHVCGQRGNRIERDTFICTNPECTCYNQVQDADMNAAINIAKSIKVVK
[0349] >SEQ ID NO:17
[0350] MLQHVGTTWVSWTRTCRAGGGRWVSVASTKTVVLRAFKFTLAPTATQDQQLLRWCGNARLAFNYALASKRAAHTEWRAQVDALVTSGVKEPVARKRVTGPKTPTKPAVYKAFIAERGDTREGLDGVCPWAHEINTHVFQSAFIDADRAWKNWLDSFKGTRKGRRVGYPRFKKRGRARDAFRLHHTVTKPTIRFSTHRRLRLPTFGEVRLHDSARTLVRQIDRGTAVVQSVTVSRAGHRWYASVLCKVEMDLPSGPTRSQQAAGTVGIDFGVKALAALSKPLVPDRPESTLLPNPRHLAKAAHRLKRAQQTLSRRQKGSARREKARRRVARLHHEVAVRRQSALHQITKRLTTRFATIAVEDLHVSGMTRSARGTMDKPGRKVRQKAGLNRAILDASMAEARRQITYKTSWYGSRLAVLDRWWPSSKTCSACGWQNPSLTLADRVFECAQCGLTLDRDLNAARNIEQHAVQVASGTGETQNARGEPVRLPRPRAEKQGSTKREDTGPPGPVPPRRSDPPTPPNPRQGQAKLF
[0351] >SEQ ID NO:18
[0352] MPIITRKIELKIVRLTDEEYDQQWKYLYQINNTIYQAANRISTHCLFNDEYEMRLRLRYMSQCNKIDKKLEKLNPDKKTSDKEEQYRLLNERKELDEDFKNKKQDFQRIKRNSTYQLVSKEFLNYIPAEILTDLNQYVQNNHNNNKNKVKSGERALSTYKKGMGIPFSIKPESGLRLFVKEEGIYLKWFKGILFRLEFGKDASNNRCIVERLIESDKQQKKGEDVANNSCICLVKNGKNTRIFLLLSIDIPACKQVLDKEVVLGVDLGIKCPLYLAINKDDNFKMQIGDIEHFHNQRTMFQKRFKSLQKLMCTQGGHGRKKLEPLEKLKEKERNWVHTQNHVYSREVIKQALKQNAGTIHMESLKDFGKGKEGYVKDEYKYLLRYWSYYELQSMIEYKAKLEGIEVKYIDPAYTSQTCSYCGERGERKKQEEFVCTNPQCKRRGEKINADFNAARNIAMSKKIVKDN
[0353] >SEQ ID NO:19
[0354] MINKAYKFREEPNQAQAILINKTIGCSRFVFNHFLSLWDHAYKETGKGLTYGICSAKLPAMKKELVWLKEV
[0355] DSIAIQSSVRNLADAYTRFFKNQNSAPRFKSKKNNVQSYTTKQTNENIAVVGNKIKLPKLGLVRFAKSREVEGRIVNATVRRNPSGRYFVSLLVETEVQELPKTHSYIGIDVGLKDFAILSDGTHYENPKFFRSLEEKLAKAQHVLSRRMKGSSRWNKQRVKVARIHEYISNARKDYLDKISTEIIKNHDVGIEDLQVSNMLKNHKLAKAISEVSWLQFRSMLEYKAKWYGKQVIVVSKTFASSQLCSCCGYQNKDVKNLNLREWFCPSCQTHHDRDINASINLKNEAIRLLTARTAGLA
[0356] >SEQ ID NO:20
[0357] MSSERAPKLRNVVTQQAYKYALEPTPRQQCAFSSHAGAARFAYNWGIARVADSLDAYAEQKAAGIDEPDVKFPGHFDLCKMWTAWKNTAEWTDRHTGQTTTGVPWVASNFVGTYQAALRDAAGAWQRFFRARKTGARAGRPRFKKRGRARDSFQLHGDGLRIVDAKHVNLPKIGTVKTFEATRKLARRLAKGSVPCPTCRATGKITDSASGKVKKCSDCKAAGSRPAARIVRGTVARDSAGRWYLALTVELVREVRTAPTPRQLAGGPVGVDFGVRQVATLSTGQLVDNPRHLESHLRRVKTAQQALSRCPPGSRRRAKAQQRLGRLHARVRHLRENSLQQATSALIHQHSVIAVEGWDVQQTAQHASPKNLPKQIRRNRNRALLDTGIGAARWQLQSKGAWYGTTVVVTDRHAPTGRQCSACGTVKATPIPPTQDEYRCPACGTSLDRRTNTARVLAAVAAQHHDAPSGGESKNARGENTRPTAPRRNGQFSAKREPRSRPPGRGQTGTPGT
[0358] >SEQ ID NO:21
[0359]
[0360] >SEQ ID NO:22
[0361] MTGGNSMGGKVLCQTIRLELEYLYDEDGKRLPYYTELKRIQEQVYLADNRTIQILWEWDNYSVFYKKEHGVAPKAIDFSNGQGSIRGYVYDILKTEFPDWYSGNLTTSIQQTLASYKEVRKDVQAGKRSITSYKKDAPMDIHNACISFEFEGGNAAVSLKVFSEPFRKACGYGSTTVTFKVKKLGKAVREIIELCARGTNGYAFGASKLIYNAKKDCWFLHLAYKFRARENENLDPNHIMGVDLGIKYAAYMGFNHSPERYYIKGGEVETFRHKVEARKRSLQEQGKYCGDGRIGHGYQTRMKPVQKLSDSIARFRDTANHKYSRYIVDMAVKHGCGKIQVEDLSNIKDRQDKFLQEWTYFDLQQKIKYKAEAEGIEVVKVAPHHTSQRCSQCGYIDENNRPSQSEFKCGRCGFTANADYNASLNLATAGIEELIAEELRAKRKNASKH
[0362] >SEQ ID NO:23
[0363] MSQTITVKIKLLSTKEQASILKNMSKEYISTINSLVSEMVAEKKSTKKTTKDVPANLPSAVKNQVIKDAKSVFQKAKKSKYTAVSVLKKPVCIWNNQNYSFDFTHISMPIMIDGKTTKTPIRALLVDKDNRNFDLLKHKLGTLRITQKSGKWIAQISVTIPTVERTGLKVMGVDLGLKVPAVAVTGDDKVRFFGNGRQNKYVRRMFKSKRKKLGKLKKLNAIRNLDDKEQRWMKDQDHKISRAIVDFAKENKISVIRLEQLANIRQTARTSRKNEKNLHTWSFYRLSQFIEYKANLVGIKVEYVNPAYTSQTCPKCSARNKAQDRKYKCECGFEKHRDLVGAMNIRYAPVIDGNSQSA
[0364] >SEQ ID NO:24
[0365] MPIITRKIELKIVKDGLTDEEYDQQWKYLYQINNTIYQAANRISTHCLFNDEYEMRLKLHMPRYKEIDEGLENIKTELEKLNTKKKASDKEKRDRLLNEQRKLVDERKELDKDVKNKKKDFLQCSKQNSTYQLVSKEFLKYIPAEILTDLNQYVQNNYNNNKRKVKSGERALSTYKKGMGIPFSIKPESGLRLFIKENGIYLKWFKLNNDRKEKEILFRLEFGKDTSNNRCIVERLIESDRQQKKKGEDYIANNSSIKLVKNGKNTRIFLLLSIDIPAKKQMLDKDVVLGVDLGIKCPLYLAINKNDNFNMQIGDIEHFHNQRTMFQKRFKSLQKLMCTQGGHGRKKKLEPLEKLKEKERNWVHTQNHVYSREVIKQALKQNAGTIHMESLKDFGKGKDGYVKDEYKYLLRYWSYYELQSMIEYKAKLEGIEVKYIDPAYTSQTCSYCGERGERKKQEEFVCTNPQCKRCGEKINADFNAARNIAMSKKIVKDN
[0366] >SEQ ID NO:25
[0367] MSVSFSLNAKKIRLENYAMKMRLYPSPTQAEQMDKMFLALRLAYNMTFHEVFQQNPAVCGDPDEDGNVWPS
[0368] YKKMANKTWRKALIDQNPAIAEAPAAAITTNNGLFLSNGQKAWKTGMHNLPANKADRKDFRFYSLSKPRRSFAVQIPPDCIIPSDTNQKVARIKLPKIDGAIKARGFNRKIWFGPDGKHTYEEALAAHELSNNLTVRVSKDTCGDYFICITFSQGKVKGDKPTWEFYQEVRVSPIPEPIGLDVGIKDIAILNTGTKYENKQFKRDRAATLKKMSRQLSRRWGPANSAFRDYNKNIRAENRALEKAQQDPGSSGVGPEAPVLKSVAQPSRRYLTIQKNRAKLERKIARRRDTYYHQVTAEVAGKSSLLAVETLRVKNMLQNHRLAFALSDAAMSDFISKLKYKARRIQVPLVAIGTFQPSSQTCSVCGSINPAVKNLSIRVWTCPNCGTRHNRDINAAKNILAIAQNMLEKKVPFADEALPDEKPPAAPVKKAARKPRDAVFPDHPDLVIRFSKELTQLNDPRYVIVNKATNQIVDNAQGAGYRSAAKAKNCYKAKLAWSSKTNK
[0369] >SEQ ID NO:26
[0370] MPEIVKERITHHAMKMRLYPSPEQAEKIGRIFRALQLAYNITFHAVFCMDPAVCEEPKADGAVWPSYKKMAKAPWRHALIEKNAAIAEAPAGALMTNEGLFLRDARRAWETGMHNLPINKADRKDFRFYHAGKPRRSFLVQMACGNLVPSPDNPKVAWITVQGVPGKMKARGFDRKIWFGAGGAHTYEEAVAAGEVASNLTVCVSQDSCGDYYISVRFSIGKKGARELYRETRACASPVPIGLDVGIKDIAITSEGTKYPNRRPKQRKSPVLERMSRQLSRRWGPANPSFRDYNKAVRQANRTLDQGPPKPLAQPSRGYLAVQRSRARLERRIARQRESYYHQVTAELVRRASMLAVETLHVKNMMQNHRLAYALGDAAMSDFLEKLRYKAERSHVPLVAVGMFEPTSQRCSVCGAQNEAVKSLAVRRWICPSCGAEHDRDINAARNILTIAQTTGPGEDKAKKDAGPRPKRGPRRPRAEVVFADRPQIVVRFSPELTKLNDPRYLIVDKETGAILDDAQGAGYRSITKAKNCYKAKAAWTKKRQETP
[0371] >SEQ ID NO:27
[0372] MEQMTITAKVQIVATDTDKVLLDETMSAYRDACNYVSDYVFQTHDLKQFSLNKVLYSTLREKFGLKSQMAQ
[0373] SVFKTVIARYKTILENQKEWIKPSFKKPQYDLVWNRDYSLTQNCFSVNTLNGRVKLPYFADGMSKYFDHTIYKFGTAKLINKHGKYYLHIPVTYDVEESNISDICNVVGIDRGINFVVATYDSNHKSGFVSGKAIKQKRANYSKLRKELQMRQTPSSRQRIKAIGGRENRWMQDINHQVSKALVENNPKHTLFVLEDLSGIRNTIERVKTKDRYVSVSWSFYDLEQKLIYKAKQNQSSIIKVDPRYTSQCCPCCGHVEKSNRNKKIHLFTCKNCGYKSNDDRIGAMNLYRMGINYLVDSQVPNTVVTE
[0374] >SEQ ID NO:28
[0375] MAETAVLRAFRFALDATAAQEEGFLRHAGASRWAFNHALGMKVAAHRQWQREVKALVEGGMLEAQARKTVK
[0376] VPVPTRPTIQKHLNRIKGDSRSPDHPEGAQGPQRPCPWFHEVSTYAFQCAFEDADRAWDNWQASLSGRRAGRKVGYPRFKKKGRTKDSFRICHDAKKPTIRPDGYRRLRIPVLGSVRLHDTAKPLARLVDRGASVKSVTVSRSGARWYASVLVSVLQDLPERPTRRQRQAGTVGVDFGVKTLAALSAPVTLPNLGTLTMVPNPRHLASDTRRLTKAQQTLSRTTKGSARRRKAAQRVGIVHHRIASRRATYLHTLTKQLATGYAVVAVEHLNVAGMTSARGTVEEPGSKVRQKSGLNRSILDASPAEMRRQLDYKTRWNGSQLAVCDRWPSSRTCSACGWQKPRLTLAERVFNCGQCGLVIDRDLNAARNIAAHAVLVPHGTAAPGSGEASNARGAATRPATPRGGRQAALKREDTGPPRPVPPQRSDPLTLFTLDVPDQQTAKRP
[0377] >SEQ ID NO:29
[0378] MATEYTCITRKIEVHLHKHGDGEEAEQRRAEEFRIWNEINDNLYKAANRIVSHCFFNDAYEYRLKIQSPRYNEIQKSLRYTKRNKLTDDEIKSLKAERKSLFDEFKKQRQTFLRGGVSEGANPEQNSTYKVVSNEFLDVIPSHILTCLNQNVSSTYKCYSKEVEFGNRTIPNFKKGIPVPFPIKHGNILLLKKREDGSIFVDFPKGLEWDLSFGRDRSNNREIVERILSGQYDAGTSSLQEGKNGKIFLLLVVKIPKQSNALDPNRVVGIDLGINIPLYAALNDNEYGGMSIGSREQFLKMRMRMTAQKRELQRNLHYSTNGGRGRSHKLQALERLEGKERNWVHLQNHIFSKNIIEFAVKNNAGVIQMERLTGFGHDRNDEVDDGFKFILRYWSFFELQSMIEYKAEAAGIEVRYIDPYHTSQTCSFCGHYEKGQRIDQATFICKNPECEKGKGKKRSDGTYIGINADWNAARNIALSDKFVDKKKK
[0379] >SEQ ID NO:30
[0380] MKTTEKNVLMTKCIKVTLNRCVNYNMKEIMNIIREMQYLSSKAYNLATNYLYIWDTNSMNFKNLYEEKIVDKDLLGKSKSAWIENRMNEIMKGFLTNNVAQARQDVINKYNKSKKDGLFIGKVTLPSYKMNGKVVIHNKAYRFSKNEGYFVEIGLFNKEKKEELNCDWIKFKLDKIDSNKKATIYKILNGDYKQGSAQLHINKKGKIEFIISYSFERENSIKLDKNRTLGIDIGIVNIAAMAIWDNNKQEWELTRYSHNLISGNEAIALRQKYYKLGLRNKELEKNINRELHELEEKEYRGLSTNIISGHNLTYKRIMLNSKRIRLSQSCKWCGNSKVGHGRRVRCKQVDKIGNKIERFKDTFNHKYSRYIVDFAVKNNCGIIQMENLKNFNPSEKFLKDWPYFDLQTKIEYKAKEYGIEVIKVNPKYTSKRCSRCGCINELNRDCKKNQSKFKCVNDECNNYENADINAAKNIALPYIDKIIEQCLETNKVV
[0381] >SEQ ID NO:31
[0382] MLRAYRLELEIASERKRRFLHQHFGAARFIYNWALRLVREKGFEFFKQGKGGIGKRILYYWREERDKVAPWHHEINAHTFNGAVLDLGRDLKHHSPSQLRGRSNRHRSAHFYGIKLQHIGRRSIKLPGSRAGEFRGGVYLKVKKGTKLYEDIQQGRIRTIKRITVSERAGRYFASVLCEVQEPEPLPCTGRVCGLDVGISSIVTVALSDGRIIKQGNPQWLAKKLRRLRRIQRSLARCKTHFPRDPGFKSLHILRRDGEVLAVFPLMWRRQNGELVPEVGDDVRILAEGVASYALLSRAPDRLEKPYRLTVGGQGSGANIELSTMRIRQGHPVRVVHLSVDRSHTMERRRKTLARAHYRVSVAREDFWHRLSTWLVRNCDVIVVELSIKGMLRSGRFSRHIADAAWGTFFQMLQYKCERYGRTLLRVERSYPSSKKCSRCGKIRKALSSERTYRCDQCGLKIDRDENAALNLMKLGLAHLTSPTTARLAGSDAGGDTIAGGAAACT
[0384] >SEQ ID NO:32
[0385] MITVRKVKLIVNSEEAEEINRTYKFIRDSIYAQYQGLNRCMGYLLSGYYANGMDIKSDGFKNHMKTIKNSLNIFDDINFGIGIDSKSAITQKVKKDFSTSLKNGLAKGERGATNYKRNFPLMTRGRDVKISYLEDINTFVIRWVNKIEFKVILGQKDNIELSHTLHKIINKEYNLGQCTFEFDKNNKLLLALNINIPDNLISKNKEIIPGRVLGVDLGVKVPAMICLNDNTFIKKSIGSYNEFFKVRSQFKARRERLYKQLESSNGGKGRKHKLKATMQFRDKEKNFARTYNHFLSKNIIEFAKKYKCETINLEELNKGFDNNLLGKWGYYQLQSMIEYKAERVGIKVKYVDPAFTSQTCSKCGYVDEENRITQDKFECQKCGFTLNADHNAAINIARK
[0386] >SEQ ID NO:33
[0387] MTMFERYVKLPVYTINLRLYPTSSQKETIDTIIHELHKACNIAVYDMFTNLTNTTTEREDKESKQKIHFPNINTMIKKDYLEKLREERPSIIPASALSGNNGVFKRDLSKRLDAQVSEGNQKKKTNGKGVKRPIESSKAPYYSKKHPRRSYTYQETLSKIIFNETNDNVVHLNLNKVGRIKARGMKNYLRILRFDETCNMTFKEYCELHKRTAHLFTIKKDNCGDYFIQICLKNVYKLIKDSDNKKEIGIDVGVTDLMILSDGTKYSNPRFKNGEHGEVQKHREQLHRQLSRREGFANIKFREKYKVDHDTLPSKRYQETKLQAAKLERKVARQRKHHMENMVLDVIRQSSFIGIENLSVKDMMKNVKKNKSET
[0388] >SEQ ID NO:34
[0389] MPIITRKIELKIVKDGLTDEEYDQQWKYLYQINNTIYQAANRISTHCLFNDEYEMRLRLRYMSQCNKIDKKLEKLNPDKKTSDKEEQYRLLNERKELDEDFKNKKQDFQRIKRNSTYQLVSKEFLNYIPAEILTDLNQYVQNNHNNNNKKKVKSGERALSTYKKGMGIPFSIKPESGLRLFVKEEGIYLKWFKGILFRLEFGKDASNNRCIVERLIESDKQQKKKGEDYVANNSSIKLVKNGKNTRIFLLSIDIPAKKQVLDKEVVLGVDLGIKCPLYLAINKDDNFKMQIGDIEHFHNQRTMFQKRFKSLQKLMCTQGGHGRKKKLEPLEKLKEKERNWVHTQNHVYSREVIKQALKQNAGTIHMESLKDFGKGKEGYVKDEYKYLLRYWSYYELQSMIEYKAKLEGIEVKYIDPAYTSQTCSYCGERGERKKQEEFVCTNPQCKRRGEKINADFNAARNIAMSKKIVKDN
[0390] >SEQ ID NO:35
[0391] MTKVVKLPLICEQSDKDGNPIDYKKIYEILFELQRQTREIKNKSIQYCWEFSNFSSDYYKQNHEYPKEKDILSYTLVGFVNDKFKTGNDLYSGNCSTTVRGACGEFKNSKTDFLKGTKSIINYKGNQPLDLHNKTIRFECIGKDYYAYLKLLNRPAFQRNNFSSSEIKFKVLVYDNSSKTIVERCIDNIYKISASKLIYNEKKKCWVLNLSYSFTNNNVCELDENKILGVDLGIHYPICASVNGERKFFKIDGGEIDHTRRKIEVRKKSLLKQGSSCGEGRIGHGIKTRNKPVYNIEDKIACFRDTANHKYSRALINYAVNNCGIIQMEKLTGITADSDRFLKNWSYFDLQTKIEYKAKEAGITVVYIDPQYTSQRCSKCGYISKENRKVQAKFCCQKCGYEANADYNASQNIGIKDIDKIIKNTK
[0392] >SEQ ID NO:36
[0393] MITVRKLKLTIINDDETKRNEQYKFIRDSQYAQYQGLNLAMSVLTNAYLSSNRDIKSDLFKETQKNLKNSSHIFDDITFGKGTDNKSLINQKVKKDFNSAIKNGLARGERNITNYKRTFPLMTRGTALKFSYKDDCSDEIIIKWVNKIVFKVVIGRKDKNYLELMHTLNKVINGEYKVGQSSIYFDKSNKLILNLTLYIPEKKDDDAINGRTLGVDLGIKYPAYVCLNDTTFIRQHIGESLELSKQREQFRNRRKRLQQQQLKNVKGGKGREKKLAALDKVAVCERNFVKTYNHTISKRIIDFAKKNKCEFINLEQLTKDGFDNIILSNWSYYELQNMIKYKADREGIKVRYVNPAYTSQKCSKCGYIDKENRPTQEKFKCIKCGFELNADHNAAINISRLEE
[0394] >SEQ ID NO:37
[0395] MRKLRKSFKTEINPTEEQKIKIRKTIGTCRYIYNFYLTHNKELYDNGKKFMSGKSFSVWLNNEYLPNHPEYSWVKEVSSKSVKHSIECGCTAFTRFFKHQSAFPNYKKKGKSDVKMYFVKNNPKDCICERHRINIPTLGWVRIKEKGYIPTTKDGYIIKSGSVSIKVDKFYVSVLVEIPDGQIANNSNDGIGIDLGLKDFAIVSNGKTYKNINKSARLKKLEKQLIREQRCLSRKYDENLKKGEVTQRANIQKQKRKVQKLHHKIDNVRTDYINKTIAEIVKTKPSYITIENLNVKGMMKNRHLSKAVASQKFYEFRAKLQSKCNENGIELRVVARWYPSSKMCHRCGCIKRDLKLSDRIYRCECGYVEDRDFNAALNLRDAITYEVA
[0396] >SEQ ID NO:38
[0397] GTAATAACTTAACTATATAGAATATGAATGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATGTAATAACTTAACTATATAGAATATGAATTATTTAACAAGTTAAGGCGATTCAACGTCTTTGGTTAGAGGATTATATCACTCGGCAAAAGGGTTAATGTATGTCAATATGTTACCGACATACTATTCTAAAAAATAATTTTGCAGTAAATCAAATTTTCATTAAAATATGCCTTAAAGTTAGTTGCATCAATATATTACATGTATTTTTAATTTTGTTAATAAAATATTACTATAGAATTACTGCAAAAACTTTCGTTTTAAAGATGAATAATAGCACTATAACTATATTTACATTAGGCATTCAAAGATTTTTATATACGAACTGATTTATAAATTAGAAATTTAGCTATATAAAAGGAGTACTTAATTTAATAAGGAACAATTAAATTTTTACTAATATTGAGTAGTTAAATTAAATACTACTCTTTTTTTATGCTCAATTATAGAAAGGACGTGATTATTTGGATACGGAATTATGGAAG
[0398] >SEQ ID NO:39
[0399]
[0400] >SEQ ID NO:40
[0401] GCTTCAATGAGGCCCGGGCTGTTGGCCCGGGAGTCGTGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATGGGCTTCAATGAGGCCCGGGCTGTTGGCCCGGGAGTCGTCTCGAGCGCCGGATCGCCGCGCGGTTCGATCGCGCCGGAGGGGGCCCCACGGGGTGCCCGTCCGGCGCGGGAATCCGGACCGCGAACGCTTAGAGCCCTCGGCGGCGCGCGATAAGATGTATATACATCGTGGCCGACATGATGCATATGCATCTCTGACGTTCCTCTCTCCTCCGGCAACAGGTCGCTCCACCGGAGCGTGACGGGGCTAACGGCACGGTGTCGCAATGAGGCCCGGGCTGTTGGGCTGGGAGTCGTTCCGGCTCTATCTCGCGCTCGCCGAGAACATGCGATTCAAGCCTCGGGACCACCTCAAGCGCATCGAGCACCTGAAGCTTCAATGAGGCCCGGGAGTCGTTTGACTGCAAAAGTCAAGTCTCATTCCGAATATCAGGGCCCCCCGGATTTCCCCTGGCCGGCGAGTGCCTTGCAGGGGTGTTTGCGAGGGGTCCGCGGATGCATGCGGGTGCGCATGGATTTGCCCACGTTCCGGTTCATCCACGATAACAAAGAACCCGCCGGAATCAAGGCGTTGGCCGGTCTGCGAGCGCTGGGCGGAAGATCTGTCGCGCGGCAGCGCTCGCGCCGAGGCGGCGAAGCTT
[0402] >SEQ ID NO:
[0403] >SEQ ID NO:42
[0404] GTGACAAAGCCCTGTGCAGCGGGCTCAAAGCTGCGACGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATGGGTGACAAAGCCCTGTGCAGCGGGCTCAAAGCTGCGACCCGAGGAGTAATAGTTCTGGCCAACAAGCAGTTTGAGCACCTTTTGGAGGACCAGTACGATGGCGAACAGTCGCAGCATTCAGCCCGCTGCGGAGGGCTTTGGTCCAGCGGTCATCGTAGCACGTGCTAACTCTGTCGCCCGTCGAAATGAGCGCTCGCGAGTGGCTGATACAGGTCTGCGAGGCCCACAGGGCCAGGCAGCCATGGCGTCTTGTTGATTCACAGGCAGGTCCTTAACCGCGCCAAGGCTCTCTGCAGCGGACTAAATGCTGCAAAACCAATCGGTCCCACCTTTCAGGCCCCCTCTGTTGCTACCGTACATCCGTTCACAGAGTCAGGGCCTGCGCGGCGGCGCGGTAGCCGCGGCTGGCGATCCGGCCGACCCATGAAGGAGCGCTATCTAGGCTGGAACTTGCGTCGCTTTTGAGCCAGAAGTGGCGCTCGCGCGACGCAAGTTCCATACTAGTTAGTGTTTACGGCGGCTACCGTGTAGCAACTTGTCGTAACGGGTCGTCTGGTTGGTAACGCGCTCGCGTGGCTGGTTGTAACCTCTTGTCCTATCGCGTGCTACGCTACCGAGGGACGTGAATCGCTCCCAAACAGGAGTCAGCG
[0405] >SEQ ID NO:43
[0406]
[0407] >SEQ ID NO:44
[0408] GTGTCGCCGATGGTGGAGGTTTGGGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATGGTGTGTCGCCGATGGTGGAGGTTTGGGCTTCCTCCGGCCTTGTGGCGAATAAGGGTGCTGGTGCGCCACCTGTCCGAATCCTGGATGTCTGATGCCGTGCCGGCTGCATGCAATGCCGCGCGCCAGGTCTCCTGGTGCATCCGGTTGGTGTCGTCGCCACACAGGTTAGCGGCAATCGAGAGGAATGTCACTAGTTCGCGCGAAAAAAATCACAAAATCACGCCTGCCCTTCGCCCCCGGGCACACATCTGGGAATCCACCGTGCTCAGCCTTCAAGCCAGCCATACAGCACCAGGGAATTCCCTTACTAATTTCTGTGCCGAATCCTTTTATCATGCCCCTCGCCAAGCGGGACAGTGCTGTGCTCAGTCCGCTTGGTCATCCCCTGCGATGTGGGGTCGCCGATGGCGGATGTTAGGAATCCCTATAGGAATCTGTATCCTCCGTTTTCGGCGACGGTTTTGTAGCACAATCATAGACCATGTGATAGAATGAGAGTCATGAAAAGAAGCGCCAGAACCAAAGACAGTGAGACATGGAGGGTTGCCGTCGCAATTCCGCTTGTAGAGGAGCTGCCTCTGCTTCGGGCCTGTAACCTCCACAATCGGCGACCCATGGATAACCTGTTCTAGGAGGTTGTGTCCTCCGTTTTCGGCGACGAACCGTATCATTTCGTTTGCCTGTGCCGTGCCTGCGCCATGCCTGCGCGATGCGCTGACGGGAATCATGGGGTATACAAAAGGGGCCGGTCCCGTGGAGGGAACCGGCCCCTTGAGTTCGTTCGGTGCTGCCGAATCATTTGTGGTTTTCAACGTCGTCGGACGCTGAAAACCACGTCGACCAGCAAACTGTAGTCTCGCGGCCAGCGGTCTGCGGTCTGCGGTCTGCGGCCACGGCCAGCGGCCTACCGCGACCGTGCCTTCGAC
[0409] >SEQ ID NO:45
[0410] GTTTCAAAATCTAATAGAGATTGCTGTGGATTGAAACGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATGTTTCAAAATCTAATAGAGATTGCTGTGGATTGAAACCTTACGATGTTCCCTGGTTTCGAGGGATACGGCGGTGTAGCTTCGGCAGCCGCAATCTGCTGGTGAGATCAACCAGCGATTTCTAATAGAGATTAGTTATTCATGAACCTTGAAAACTGAATATTCGGATTGCCCTCTAATTCACCGCAATCTCTATTCGCTTTTGGAAGAATTCCGACACAGGGTAGATGTTAAGTTGAAGTGATTCCGAGATAGGCTACAGCCCCGATCCCGATTGACTTCTAAGGACTCAACTGCTCAATCGACCGTCGGCCCTTATGCAGTCAGGGTTTCAGCGATTTTGGCAGGATCAAAGTCTCTCATTCAGCCATCTTTGGGGTTCAGTGTCGGAAACTGAGTGTCAAAGCTATGGCTGATAAGGATTCTACAGCAAGCGGTAGGGGCAAAAAATCTTGCGACCTTACCATTAA
[0411] >SEQ ID NO:46
[0412] GTTTCAATCCCATTGCTAGGATTCATTAATAAGAAACGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATGTTTCAATCCCATTGCTAGGATTCATTAATAAGAAACAGCTTGATCTACTAAACCATGTTTTTCCTTCAAGCAGTGATCTTAATCACATGAACGTCTTATTTTTTTAGCAGTTAATAATCATTTTCTTTTAGCGTTTTGATAGCCTCGTGAGCTATACCTTATACCCAATCCTGTTTAGATAAACTGATTTACATTCTATCGGTTCCCTGTTTCCTCTTATATTTTTGCTATTATTAAATCAAGAGATTTTGAACCTTGAAAACTGAATAAAAATAGATCCCCTAATTTTGGCGAGAGCCACCGTACCCAAGTTTCAAGGGGGAACGTCGGTGAAAGACCGGAAAAGCTTGCCTTTGAGTGCCATAGTGTGCTGTCAGCAAGAAGCCGCAAGGCACAGCCTGAGATTACAGGCTAATCCCTCCTTTTTAGGGGATCTTCGCTAGGACTTAACTATCGCTAGAACTTAACTAAAACTGACGGCTCGAAGCGGGGTCAAAATCCCTGGCCTCCCGTCAGATCGCCAGAACCTTGATAATTAAAGAGTTGTAGCTGTTTCACCTCTGGGTTATTACTAACATTGATGTCCCCAATCACTCGCGATCGCGCCAAGCCCATCCCCATAAGGATTGTGAGGTTAAATCTGACGGGACTCAAAAAAATGGAAGCATGGGGAAATCATTTTCTCCCTGCTTCTTGTAGGGAGATTCATTAGGAATTTTCGTGTTCCCGTCTTTGTTGACAAGAAACAAAGATCATTTTTAATTGTCGGGGGTTGCTTATTCT
[0413] >SEQ ID NO:47
[0414] GTCTCCACTACAAAGCGGCGGAGCACGCATGTCAACGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATGTCTCCACTACAAAGCGGCGGAGCACGCATGTCAACTGCCTATCAACTATATCCTTAAAATTACGAGATTATTCCGGGGTGATTTCGAGCACCCCTTTCAATTCGTCTATTTCAGACCACGGATTTGCAGTTAACTCTCGTTATCATACAAGAACTCGCTCCTTAAATTAAACGGTTCGAGCGGTACCGATGGTTTACGCAGGCTCTCCCTTGAGAAAAATCTTTGTCACGTGACCACGAATATTAATACAACACCGTTGTGTAAACACGCAACCATCTAATGGTTATTGAATAATTCGCATACACTAAA
[0415] >SEQ ID NO:48
[0416] GTCTCAGAACCCAAAGATCTGAGCTTCTCTTGTGAGGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATGTCTCAGAACCCAAAGATCTGAGCTTCTCTTGTGAGATCGCTACGCTAGCAACTAGCTCACTTTACGTCAAGAGCTACTAACACACTTTACGAACAGGTGTTAGGTTCAACGAATACCAGCTTGCCAGGCTTGAGCGATCTTTACCGCACGAGGTATCTTGGCTTCTTCATCTCCGGGAACAGAGGTCGTATGCGGTTGTCAAAGAGCGCGCGGACCGGCCCTTCGTCGGTCTAAGATCCCAAAGACCTGAGCTTCTATTGCAATACGCAGGAATCAC
[0417] >SEQ ID NO:49
[0418] GTCTCATTAACCAACCCAATGAGCAGGGTTTGTAACGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATGTCTCATTAACCAACCCAATGAGCAGGGTTTGTAACAACATGTACATTATATGGGTGATAATAGCGACCGATGACGTAATTAAATATTACAGTCCTAACAGGTTGGTGAAAAACTATATTTGAGACGCTAATTTTGCTCAATCGAACCAGCATTTAGTAGGAAATCGGCTTCATCGTCACTAAATATAGAGTCATTCAAAGCCAAAAAAGAGTTTTTCACCCAACCTGCTAAGTGTAATATTACAGCGAATGGTGTCTTATTGGCGTTTTATGCCTAACACACACACATTAGCATTAACCATAGTCGCTATATGTGCATCACTTTTCGGGATTCCGATCTATGTAATTGCCAATGCGTCTTTCGTTGGTCGATGAATCTTATTGCCTTCCTACAGCGACTTTACTGTACTGAAAGGAGCACGGGGATTTGTCTCATTAAACCAGCCCAATGAGCAGGGGTTTGAACTTTAAAACATTCAGATTTCGCTAATTGCGGGGCTACT
[0419] >SEQ ID NO:50
[0420] GTTAGGATGCCCGCAGGATGTCATATAGTTGGCAAGGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATGTTAGGATGCCCGCAGGATGTCATATAGTTGGCAAGATACAATTAATTTGTGGGTGTACTAGACCTACTCCTAGTGGTGAATCGGTGCTGACAACACCATCACGCGACTAGGTTTAATATTTTAATTTTACTGTATAAAGTACCAAAAAAACTATTATTAATTGTGGTAAAATATTAAAGTCAGCCCACGAAGGAATACGGGAATATTCCGTGTGGTTTCACTGACTTATCTTTCTCCTGAAATAGCAGGTTACTCAATCCATGTCACCAGAGAGTAAGCTGTGTATGACTTTCATTGCTAGTTTCCGAGGGTCGCTGTTTTTTGGCATTTTTTCTTTTTGTTTTAATTGGTAAATCGTCTGTAACTTGGTCATGGCAAGTAACCGAAGTTCCCGGCTTGCCAACTCGGAGGATGTAATGTAGTTGGGGGAGAATATTCATCTCCCGGTGCTATTTCCGATAAATCGGTCTTAGCTTCCTGATGACATTCCGTATTGGAATCAGAGTCAGGCTT
[0421] >SEQ ID NO:51
[0422] GTTTCAATCCCATTGCTAGGATTCATTAATAAGAAACGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATGTTTCAATCCCATTGCTAGGATTCATTAATAAGAAACAGCTTGATCTACTAAACCATGTTTTTCCTTCAAAAGCAGTGATCTTAATCACATGAACGTCTTATTTTTTTAGCAGTTAATAATCATTTTCTTTTAGCGTTTTGATAGCTCGTGAGGTATACCTTATACCCAATCCTGTTTAGATAAACTGATTTACATTCTATCGGTTCCCTGTTTCCTCTTATTGCGATCGCGCCAAGCCCATCCCCATAAGGATTGTGAGCTTAAATCTGACGGGGCTCAAAAAAATGAGTGATTGGGGACATCAATGTTAGTGATAACCCAGAGGTGAAACAGCTACAACTCTTTAATTATCAAGGTTCTGGCGATCTGACGGGAGGCCAGGGATTTTGACCCCGCTTCGAGCCGTCAGTTTTAGTTAAGTTCTAGCGATAGTTAAGTCCTAGCGAAGATCCCCTAAAAAGGAGGGATTAGCCTGTAATCTCAGGCTGTGCCTTGCGGCTTCTTGCTGACAGCACACTATGGCACTCAAAGGCAAGCTTTTCCGGTCTTTCACCGACGTTCCCCCTTGAAACTTGGGTACGGTGGCTCTCGCCAAAATTAGGGGATCTATTTTTATTCAGTTTTCAAGGTTCAAAATCTCTTGATTTAATAATAGCAAAAATGGAAGCATGGGGAAATCATTTTCTCCCTGCTTCTTGTAGGGAGATTCATTAGGAGTTTTCGCCTTCCCGTCTTTGTTGACAAGAAACAAAGATGATTTTTAATTGTCGGGGGTTACTTGTTCT
[0423] >SEQ ID NO:52
[0424] GTTTCAATCCCTAATAAGGCTTAAGCTTAATTGCTTCGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATGTTTCAATCCCTAATAAGGCTTAAGCTTAATTGCTTCATTAAACCTAATTGAACTTATACTAGAGCTAGGCACTCTTTCAATTTGGTATTTATCTGGTTTCTTTACATTCTTTTTTGCTTTCATGGTTTCGACCTCACATAACCACTAACACACTTTATATTAACTTAAAAATAAAGCACCCATCTTCCCGAAACCCTTGATGTAAAAGGATTTCGGGTAAGGATTGCGCCCCATCATTAAAAATTCAGTTTTCAAGGTTCTGCTCGGTTCAAAAAATTGTTTCAGCGATCGCCGATCGCGTTGGTTGGTCTAGGTACGCGTACCGACGATCGTCGTGAATAAGTGCGGCGTTGCTGAACGGGGGGATGATAGCCGAAAGATGCGAAACGGGAAAAAAGAGATCGAATCGATCGCGAAAAATGAATTTCAGGCGTTGCATAATAGAGATATGAATGGAGGAAATCAAAATGCAATGCCCTGAATGTAAATGGTTCTTCCGAACTTACACTGTAGAAAATTCCTCAAGCTCCCTCTCTGCCAAGCAATGCTCTAAAGTTGATAAATATCAATCTAACAAAAACTCAAAGCCTTAGTATACAAGCTTTACAAGCACGAGTTGTTGAGAGTTTT
[0425] >SEQ ID NO:53
[0426]
[0427] >SEQ ID NO:54
[0428]
[0429] >SEQ ID NO:55
[0430] GTTGGATATACCATTCAAAACAATGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATGGTAGCGTTGGATATACCATTCAAAACAATACTGACCTCCTTTTTTATTAGTGGGGACTGTAATGACAAAATTACAATTTTATTTGAAATGGCAAAGCTTTCTTGCAGATTATTACATACGTTAGTCGTTTCTTGTTACTAACGTATGCAATGATTGATTACTAACGTATGTGATAAGCAACCACATGCGTTAGCAATAATAGCCCTTATAAAGTATATTACCTTATAACTAAAATCAGTAAAAAAAGACACGATTATCAACCCCCATGCGCACATCATCTCAGAGCGGCTTATATTTTGATATACCATTCAAAACAATAATAAAACCAACATTCGGCCACATAGCCGTCTGTATTCTTCTGTTGGATATACCATAATAAAACCAACTCTGTGCCGGTTGTGCTGGGTGTAGGACTTGTTGGATATACCATTCAAAACAATGTTGGATATACCATTCAAAACAATGATAAAACCAACCGTTAGTTTCTAAGTTTATTAATTTCAATGTTTTGACGTACTTCCCGAGTATAGTTTCCTATCTCAGGTTAAGGACTTCCAGCAGCCCCAATTAGAAAATTTCTTTTTGTAATAATAGGCATACTCTTTTTAATTTTACAACCGCAAATATTGTAATTATAATTGTAAGTCTCAAATCCAACTCAAAATAATTTTCAGAATTTCACAAAATTCTCAAAATCGCCCATAAAAAATCCCCCATGCAACCGCATGAGGGATTCCACAAGTAAATTCTATTTCGAGAAATTTATTTTGCTACA
[0431] >SEQ ID NO:56
[0432]
[0433] >SEQ ID NO:57
[0434] GACGCCTGGCTCGACACCCCCACCGAACACTGGCCGCTGACCCTGCCGCCGGCCACGACAGACCCGCCCCCGAGGTCCTAGACCGCGAGACGAGAGGACGAGTTGTCGGGCCGCCGCCGCAGCGGTGATCGGTGAGCCCGACAGGGAGGCGTAAGCGGCCGCGCGGTCGGCGGTGATCCGCGCGGTGTCATACCGCAAGCAGCTTCGACAGCACCTGCGATCCAGGCTGTAGCACGCGGAGAGACGCGCGTAGTCCATCTCGGAACGGGTCCGGGCCGCCGCCCGGGCGCGCTGGGAGACGGTGATCGACAGCAACAAGTGAGCGGACGCAACTACCCGTCGCCCGTGTGCGCTGGGAGACGGTGATCGACAGCAACAAGTGAGCGGACGCAACTACCCGTCGCCCGTGTGCGCTGGGAGACGCCCCTTCCCGAGCTGGTGTCTTCTCCCGACCGCTGTGAAGTTGTGGGGTGCACGGACGCCCGGACCTTACGTTCCGGGGGTCCCCGTCTGACCTCGGCCGGGTGGCCGGGAGCGGGGTTCTCGTTTCGCCGAGAACTGCTCAGCTTCGGCGCCCTCTCCGACGACATCGGCCCCTCCACATTCCGGCACCGTAAGGCTACCAACGAAACCGACCCCGCCCGCACAGTGCCCGAAGCAGCCATGCGGCCGATGCCGGGCCGTCCCCGGGCCGCCGGGCAGGAGCACGCGGGGGCCGAGGACACCGCGCGCCGGCCGGCTGCCGGGGCGCAGAGCCGGGGCGCATGCCCGCGCGAGCCGCGCGGCGGGACCGCCACCCGGATCGCCGAAGCCGCCGGCCTCGGCACCGGTCCGAGCGGCGCAGGCTCCGGCGGCCAGGTGCTTCGCTGGGGCGGCCGACGGGCGGCCGCCCCAGCGTGACGTCAGGTACGCGGCTTGAGCCTGA
[0435] >SEQ ID NO:58
[0436] TCTACACTAGTAGAAATTCTGAATGAGTTTTAGACGTGATCGACAGCAACAAGTGAGCGGACTCTACACTAGTAGAAATTCTGAATGAGTTTTAGACCCAGGTCCCGGACCGTAACTTTAATCCGAATGATTCCGATGTGGAAGACAGCCTTTCGGCCTGTTTGTTTTGGCGACTTAAACGAATCGAAAGGCCGCTTTTATTCTATCGCCAGCAGACCGTATCGTGCAGGGCGTCAGCCCCTGCACCATATCAAAAATTCACTTGATAACTGTCACACTCGGGTCTTAATCAATCTACACTAGTAGAAATTCTGAATGAGTTTTAGACGTGATCGACAGCAACAAGTGAGCGGACTCTACACTAGTAGAAATTCTGAATGAGTTTTAGACTTACCTAACATAAACTGTATGTAAGTAATGGTATATAACACCTCTTACCGTACCCTAGCCAGCATATGAAAAATTGATCAGAAAACAGGATTGCGATTACTTAGCACGCGCAGAATTTGGGTTTTAGACTGGACACGGTCCAACACGGTCCAACCATTTAGACCCGATAGTTCGTACAAGGCCCATTTTAATCGGTTTAACGCGTTAGATCAATGTTATATATCATAAATAAACGTTTTTTTCATCATATATATACAGATTCGGACATCGTGCTGAAAAGAAGACGGAGGTTTTGTCCATCCATGCTAAGTAATCCGCCTGTGTGCATATATAATTTTGTATACCAAATGTTTTATTTATTTTTATAACATAATATTGTTATTAATGTAATTTTGCAATTATATATAAAGAACGCACATGCAGCCAACATGAAGTCCGGCGGACGCAGGAAC
[0437] >SEQ ID NO:59
[0438]
[0439] >SEQ ID NO:60
[0440]
[0441] >SEQ ID NO:61
[0442] ACTAACGTATGTAATGATTGATTACTAACGTATGTGATAAGCAACTACATGCGTTAGCAATAATAGCCCTTATAAGTATATTAAATTATAACCAAAATCAGTAAAAAAGACACGATTATCAACACCCATGCGTACATCATCGCAGATGACTTTATAGTTGGATATACCATACAAAACAATAATAAAACCAACGTGATCGACAGCAACAAGTGAGCGGACGCAACTAAATATACCATTCAAAACAATGATAAAACCAACGTGATCGACAGCAACAAGTGAGCGGACGCAACTAAATATACCATTCAAAACAATGATAAAACCAACATTTAGTTTCTAAGTTTATTAATTTCAATGTTTTGACGTACTTCCCGAGTATAGTTTCCTATCTCAGGTTAAGGACTTCCAGCAGCCCCAATTAGAAAATTTCTTTTGTGTAATAATAGGCATACTCTTTTTAATTTTACCACCGCAAATATTGTAATTATAATTGTAAGTCTCAAATCCAACTCAAATAATTTCCTATAAAATCTATCGGCAAGTTTATATAATAGAATATCATGGCAAAGATACATTTTTTGCCGATATCGGCAATATTTCAATCAAATAAAGTCAGGCTATTATCCACCTGT
[0443] >SEQ ID NO:62
[0444] GAAGCACTTCACAGAAGTGCATGGAGCAGAAGGTGATCGACAGCAACAAGTGAGCGGACGCAACTAGAAGCACTTCACAGAAGTGCATGGAGCAGAAGCGCACTCCGTCTCTGCCCTGTCAAGGCAGCTGTTTTGCAAGTTCGATTATACCTTTGGGGTAAATGGTTGTCAATGGCGGGCCCGCGACAGAAAATAGGAAGCCTTTCTCTGCACATTCAACAAACAACATTTACAGAATGCCCCAATTATGTTATACTAATGTTGCCTTGCAATAATGTGAACGTGAAATGGAGGGATGTCTGTGGCAGATATCAAATCTCAAAGTGGTGGGAGGTCTGTCCCCACCATGGGGTGCGAACCTTGTGTGCTCATCATTGCCGTGAGCGTTCGCACGTCCAAACGACCATATCATCGTTGCCCTGCGACCATTCAGCGGCAATCAAGACGCAGGCATGATATGTAACCATGCATTTCTGTGAACTGCTTCCAAAACGCACTGCTTCATTATAATCCTCCCTGTGCAGTTCTGCCGAAGCACTTCACAGAAGTGCATGGAGCAGAAGGTGATCGACAGCAACAAGTGAGCGGACGCAACTAGAAGCACTTCACAGAAGTGCATGGAGCAGAAGAAGTCGTTTATAGCGGTTTGTTTACAGGGAGTTAGAAGCACATTTCAATTTTTCAGTAAAACTCCTCCAGCTTTGCAGCTGGGGGAGTTTTTTACTCCCCTCTTGACAATTATCTTCTACGTCTGTAGAATATAATCAAACTACAACCGTAGAAGATATTCACAGAAGGGAGGATAACGCTATG
[0445] >SEQ ID NO:63
[0446] CTTCTGCTCCACGCACTCGTGCGGAATGCTTCGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATCTTCTGCTCCACGCACTCGTGCGGAATGCTTCGGGGTCCGTAAGCGAGACTGATACCATGAGAAAAACGAGCGCCGTGGGGAGCCACGGCGCTCGTTCTTTTTTGTTGGGGGTTGAGAAAGGACCGCCGGTATCCCGGCGGTCCTGCTTCGTTTACAGAATGGAGCGAGGATATGGCACCTCTCTCGTGCGGAATGCTTCCTTGCGGCTGCCAGGCAGATGCGCATCGTCGTAGGTCTTCTGCTCCACGCACTCGTGCGGAATGCTTCGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATCTTCTGCTCCACGCACTCGTGCGGAATGCTTCGGCAAGATCCTACGATCTCATGGCTGCGAGAGGACGACGCGGAAGCAGTCTGCACGAATGCGTGGTCCCATATCATACCCGCGCGCGGATCGCCCACTCGAAATGGCGCGCAAGATAGCGATGATATGTCTGTTCGGAGGTGCGAACACCCGCGGCGACCGGCGGGCGATCAAGGTTCGCACTCCTCGGTGGGGACAGACCTCCCACCGATGTTCTCATTTCCTTGTTTCGTTCTGTTCGTTTCGTTCGATGTAGCTCCTCACCTCCCATGCTACATCATCCGACGCTTGGTCCTTTTCGGTCCCATCATACCACGCTGCGCCAGAGCTGTAAACACCGAGACCGTCCTAAGATGCAGAAAGGACCGCCGGGATGTCGGCGGTCCTTCTCTTTATCTCGTCCT
[0447] >SEQ ID NO:64
[0448] CTTTCAATTCACGACCTCCCGAGGAGGTCGACGTGATCGACAGCAACAAGTGAGCGGACGCAACTACTTTCAATTCACGACCTCCCGAGGAGGTCGACTTTTTTCGATTGTCAATTTCTTTTGGCAAATGACTATATTTCATTTTTAATCAGTGCAATTTTTTCCAATTTCTCGGATTCAAGTGTATCGCGGAATTTTAATTTCTTATTATATTTTTTCTCGTAATCCTTTTTGTTTAATCTGTATACACAATACCACAGAAGTAGCACGATTATTAGCCCTGACATAGTATTTACCCCTTTTCAACTCACAACTTCTTAATAAAAATCACGCTGCGCAAACCTTATTGCAGAATTTCTACTCACACGTCCCTTACGGGACGTGACATCTTCCGATTTAATAGAGACATTGTGGAATGTAATTTCTACTCACACGTCCCTCGCGGGACGTGACGTCGGATGATTTGAATATAGAAACAAAGAGTAATCCATTTCAATTCCAGACCTTCCGAGAAGGTCGACAACATGCCGCTCATATCCGTCACCTGACTGGTATCCTTTCAATTCACGACCTCCCGAGGAGGTCGACGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATCTTTCAATTCACGACCTCCCGAGGAGGTCGACTGCAAAATTCCACAGTTTTCATCTCTAAAACCTTGAAATTTTTTAATACCTTTACTAATTTCTCTTACTCCTTCTTGCTTTAATAATACCACTCATTTTCCCTTTTCCTAACTAAAACCTCTCCATTTTCTGGTGCGAATGCCGAAGACTTTTCTGTGTGCTTAGGATTCGCGCTCCACTTGAAACATGTTCTGACACGG
[0449] >SEQ ID NO:65
[0450] GTCCGGTCCGTGCGCGTGGGGAGCGACCGTGATCGACAGCAACAAGTGAGCGGACGCAACTGTCCGGTCCGTACGCGTGGGGAGCGACGAACGGATCCACAGCGGGCCCTTGATGACCCGACAGAAAGGCACCTTTCGTTGAGAAACAGGCTCTCCCGCCCCCCTGCGGCTCAAGGCCCGAGAACCGGCCTTCGCTACTGGCTCGCCCGCCGCCGCCGGGCCATGGCCGCACATGCACTGCGCGGCGCCTGCTACGGCATCGGCACCGGCGCAGTCGGCCTCGTATTCTGGTGGATCGAGCAGACCCTGTAAACGGGCTCCCGCACCCAAGGGCAGGGCGGGCGGCGCGGGTGATCTGGTGGTGAAGAAGCAGGTCAGCGGCCGAACGGGGACCGTTGTCAGTGGTGGCCTCTACAGTCGTCAACGACTCTTGGCGCGCGGGTGGTTGAAGGAGGTGGCTGATGGCGGAGACGGCGGTGCTGCGTGCGACCAGCAAACGGCAAAGCGGCCCTGAAACCGCAGGTGGCGAAGTGTCCGGTCCGTGCGCGTGGGGAGCGACCGTGATCGACAGCAACAAGTGAGCGGACGCAACTGTCCGGTCCGTACGCGTGGGGAGCGACGAGTTGCTGTTGTGCCTGCCCGCATGCCGTCTTGGTCTAGGCGATCTCCCAGCGGTTGCTGTCGTGGGTCTGTTCGT
[0451] >SEQ ID NO:66
[0452]
[0453] >SEQ ID NO:67
[0454] TTAAATATAACCTATGTTAATCATTAACGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATTTAAATATAACCTATGTTAATCATTAACGTACTCATTTTAAATCCCCCTCATTTTAATCTTACATTAAAAATTTTACCTCTTTAACTGTTTTCTGTACACAAATTCGACAAATTTAGAATTAAACATTTATTATTAGACAATTATTTTTAATAAAAAATGAGCCTAGAAGTAACTAAACTCCTAGGCTCATTTCGTGCAATATTCTTATTATATTTATAATTTATTTTAACATAATTCTATTATTTATTAAACTACATTAACCATTAAATTTACATTCATATTCAACATAATCTATGTTAATCATTAACAAAGGAGGTTGCTTATGTTTAATAACAATATTTCATTTAAATATAACCTATGTTAATCATTAACGTGATCGACAGCAACAAGTGAGCGGACGCAACTAATTTAAATATAACCTATGTTAATCATTAACCTTAGTAACTCAGCCATTTCCTCTAATAATTATTAATTTTTCTAATTGGTTATATCATAGTACCCACCATGCTTATAGCATTTTTTATACTAGCTTTTTACCAGCAAATTTCAACTGGAGCGGTTGTAAGAGTGTTAACATGCTTCCTTTCAAGTAATTTCAATGTTTTTAAGCGTGTTACTTCTCTACTTAGGATGGTAAATCACCCTTTATTAAAATCATTTACTCCCTCATTTCAAAGTTTTATTTTTAAAGTTTTATAAATAAAAAAAGAAGACCTAGAGATTCCCCCTAGGTCTATTATGTGCAACATTTACTTATATTATTATAAAAATTATATTTTAAA
[0455] >SEQ ID NO:68
[0456]
[0457] >SEQ ID NO:69
[0458]
[0459] >SEQ ID NO:70
[0460]
[0461] >SEQ ID NO:71
[0462] AAAAATCGTGAAAGATAATTAGAAAAGAAATTTTCTAATTGGGGCTGCTGGAAGTCCTTAACCTGAGATAGGAAACTATACTCGGGAAGTACGTCAAAACATTGAAATTAATAAACTTAGAAACTAACGGTTGGTTTTATTATAGTTTTGAATGGTATATCCAACAGTGATCGACAGCAACAAGTGAGCGGACGCGTTGGTTTTATTATTGTTTTGAATGGTATATCCAACAGTGATCGACAGCAACAAGTGAGCGGACGCGTTGGTTTTATTATTGTTTTGAATGGTATATCCAACAGTGATCGACAGCAACAAGTGAGCGGACGCGTTGGTTTTATTATTGTTTTGAATGGTATATCCAACAGTGATCGACAGCAACAAGTGAGCGGACGCAATTGGTTTTATTATTGCTTTTGAATAGTATAAAGTCGTCTGCGATGATGTGCGCATGGGTGTTGATAATCGTGTCTTTTTTTTACTGATTTTAGTTATAAGTTAATATACTTATAAGGGATATTATTGCTAACGCATGTAGCTGTTTATCACATACGTTAGTAATCAATCATTGCATACGTTAGTAACAAGAAACGACTAACGTATGTAATAATCTGCAAGAAAACTTTGCCATTTCAAATAAAATTGTAATTTTGTCATTACAGTCCCCACTAATAAAAAAGGAGGTCAGTAATGCTGTTCATGGTGAAATGGCTGGAACGTGTAGCAAAATAAATTTCTCGAAATAGAATTTACTTGTGGAATCCCTCATGCGGTTGCATGGGGGATTTTTTATGGGCGATTTTGAGAATTTTTGAAATTCTGAAAATTATTTGAGTTGGATTTGAGACTTACAATTATAATTACAATATTTGCGGTTGTAAAATTAAAAAAGAGTATGCCTATTATTACACGTAAGATAGAATTGAAAATCGTAAAAGATGGATTGACAGATGAAGAATATGACCAACAATGGAAAATATCTTTACCAGATAAACAACACC
[0463] >SEQ ID NO:72
[0464]
[0465] >SEQ ID NO:73
[0466]
[0467] >SEQ ID NO:74
[0468]
[0469] >SEQ ID NO:75 sgRNA-pre scaffold (The underlined part is the reprogrammable region)
[0470] GCCUUGCAAUAAUGUGAACGUGAAAUGGAGGGAUGUCUGUGGCAGAUAUCAAAUCUCAAAGUGGUGGGAGGUCUGUCCCCACCAUGGGGUGCGAACCUUGUGUGCUCAUCAUUGCCGUGAGCGUUCGCACGUCCAAACGACCAUAUCAUCGUU GCCCUGCGACCAUUCAGCGGCAAUCAAGACGCAGGCAUGAUAUGUAACCAUGCAUU GGAAAGUGCAUGGAGCAGAAG
[0471] >SEQ ID NO:76 sgRNA-opt scaffold (The underlined part is the reprogrammable region)
[0472] GGUGGGAGGUCUGUCCCCACCAUGGGGUGCGAACCUUGUGUGCUCAUCAUUGCCGUGAGCGUUCGCACGUCCAAACGACCAUAUCAUCGUU GCCCUGCGACCAUUCAGCGGCAAUCAAGACGCAGGCAUGAUAUGUAACCAUGCAU U GGAAAGUGCAUGGAGCAGAAG
[0473] >SEQ ID NO:77 882-TnpB-reRNA scaffold
[0474] GGUGAACAUUCGCUACGCAACUGUGGUUGGCGGUAAUAGUCAAUCAGCCUAAGAACCUAUAUGGUCUGUCUUAGGAGGGGUGAUGGCACACCCUCAUCUUGAAGGCUGUUCAAAACAGAAAUGGACUGCGAACGUUUAGUCAUUCAAGAAUCCCAUCCGUUACACCAUAAGGUGUAGGACUUGCGGCUUUAGCCGUGGGAGUGUCAAAAAACACAGUAGGAGGGACUUACGUUAG
[0475] >SEQ ID NO:78 966-TnpB-reRNA scaffold
[0476] GGUUCUGCAUGGUCAGAAUCGGAAUGGGUCAAAGAGCAGCUACAAGCCCAUGACUUCAGUCGUGGGUAGUUGACAAAACACAGUAGGAGGGACUUACGUUAG
[0477] >SEQ ID NO:79 GFP-T1 GTGATCGACAGCAACAAGTGAGCG
[0478] >SEQ ID NO:80 TraC-5M-7
[0479] MSVSFSLNAKKIRLENYAMKMRLYPSPTQAEQMDKMFLALRLAYNMTFHEVFQQNPAVCGDPDEDGNVWPSYKKMANKTWRKALIDQNPAIAEAPAAAITTNNGLFLSNGQKAWKTGMHNLPANKADRKDFRFYSLRKPRRSFAVQIRPRCIIPSDTNQKVARIKLPKIDGAIKARGFNRKIWFGPDGKHTYEEALAAHELSNNLTVRVSKDTCGDYFICITFSQGKVKGDKPTWEFYQEVRVSPIPEPIGLDVGIKDIAILNTGTKYENKQFKRDRAATLKKMSRQLSRRWGPANSAFRDYNKNIRAENRALERAQQDPGSSGVGPEAPVLKSVAQPSRRYLTIQKNRAKLERKIARRRDTYYHQVTREVAGKSSLLAVETLRVKNMLQNHRLAFALSDAAMSDFISKLKYKARRIQVPLVAIGTFQPSSQTCSVCGSINPAVKNLSIRVWTCPNCGTRHNRDINAAKNILAIAQNMLEKKVPFADEALPDEKPPAAPVKKAARKPRDAVFPDHPDLVIRFSKELTQLNDPRYVIVNKATNQIVDNAQGAGYRSAAKAKNCYKAKLAWSSKTNK
[0480] >SEQ ID NO:81 TraC-B22
[0481] MSVSFSLNAKKIRLENYAMKMRLYPSPTQAEQMDKMFLALRLAYNMTFHEVFQQNPAVCGDPDEDGNVWPSYKKMANKTWRKALIDQNPAIAEAPAAAITTNNGLFLSNGQKAWKTGMHNLPANKADRKDFRFYSLRKPRRSFAVQIRPRCIIPSDTNQKVARIKLPKIDGAIKARGFNRKIWFGPDGKHTYEEALAAHELSNNLTVRVSKDTCGDYFICITFSQGKVKGDKPTWEFYQEVRVSPIPEPIGLDVGIKDIAILNTGTKYENKQFKRDRAATLKKISRQLSRRWGPANSAFRDYNKNIRAENRALERAQQDPGSSGVGPEAPVLKSVAQPSRRYLTIQKNRAKLERKIARRRDTYYHQVTREVAGKSSLLAVETLRVKNMLQNHRLAFALSDAAMSDFISKLKYKARRIQVPLVSIGTFQPSSQTCSVCGSINPAVKNLSIRVWTCPNCGTRHNRDINAAKNILAIAQNMLEKKVPFADEALPDEKPPAAPVKKAARKPRDAVFPDHPDLVIRFSKELTQLNDPRYVIVNKATNQIVDNAQGAGYRSAAKAKNCYKAKLAWSSKTNK
[0482] >SEQ ID NO:82 TraC-B24
[0483] MSVSFSLNAKKIRLENYAMKMRLYPSPTQAEQMDKMFLALRLAYNMTFHEVFQQNPAVCGDPDEDGNVWPSYKKMANKTWRKALIDQNPAIAEAPAAAITTNNGLFLSNGQKAWKTGMHNLPANKADRKDFRFYSLRKPRRSFAVQIRPRCIIPSDTNQKVARIKLPKIDGAIKARGFNRKIWFGPDGKHTYEEALAAHELSNNLTVRVSKDTCGDYFICITFSQGKVKGDKPTWEFYQEVRVSPIPEPIGLDVGIKDIAILNTGTKYENKQFKRDRAATLKKMSRQLSRRWGPANSAFRDYNKNIRAENRALERAQQDPGSSGVGPEAPVLKSVAQPSRRYLTIQKNRAKLERKIARRRDTYYHQVTREVAGKSSLLAVETLRVKNMLQNHRLAFALSDAAMSDFISKLKYKARRFQVPLVAIGTFQPSSQTCSVCGSINPAVKNLSIRVWTCPNCGTRHNRDINAAKNILAIAQNMLEKKVPFADEALPDEKPPAAPVKKAARKPRDRVFPDHPDLVIRFSKELTQLNDPRYVIVNKATNQIVDNAQGAGYRSAAKAKNCYKAKLAWSSKTNK
[0484] >SEQ ID NO:83 TraC-B26
[0485] MSVSFSLNAKKIRLENYAMKMRLYPSPTQAEQMDKMFLALRLAYNMTFHEVFQQNPAVCGDPDEDGNVWPSYKKMANKTWRKALIDQNPAIAEAPAAAITTNNGLFLSNGQKAWKTGMHNLPANKADRKDFRFYSLRKPRRSFAVQIRPRCIIPSDTNQKVARIKLPKIDGAIKARGFNRKIWFGPDGKHTYEEALAAHELSNNLTVRVSKDTCGDYFICITFSQGKVKGDKPTWEFYQEVRVSGIPEPIGLDVGIKDIAILNTGTKYENKQFKRDRAATLKKMSRQLSRRWGPANSAFRDYNKNIRAENRALERAQQDPGSSGVGPEAPVLKSVAQPSRRYLTIQKNRAKLERKIARRRDTYYHQVTREVAGKSSLLAVETLRVKNMLQNHRLAFALSDAAMSDFISKLKYKARRIQVPLVAIGRFQPSSQTCSVCGSINPAVKNLSIRVWTCPNCGTRHNRDINAAKNILAIAQNMLEKKVPFADEALPDEKPPAAPVKKAARKPRDAVFPDHPDLVIRFSKELTQLNDPRYVIVNKATNQIVDNAQGAGYRSAAKAKNCYKAKLAWSSKTNK
[0486] >SEQ ID NO:84 TraC-B32
[0487] MSVSFSLNAKKIRLENYAMKMRLYPSPTQAEQMDKMFLALRLAYNMTFHEVFQQNPAVCGDPDEDGNVWPSYKKMANKTWRKALIDQNPAIAEAPAAAITTNNGLFLSNGQKAWKTGMHNLPANKADRKDFRFYSLRKPRRSFAVQIRPRCIIPSDTNQKVARIKLPKIDGAIKARGFNRKIWFGPDGKHTYEEALAAHELSNNLTVRVSKDTCGDYFICITFSQGKVKGDKPTWEFYQEVRVSPIPEPIGLDVGIKDIAILNTGTKYENKQFKRDRAATLKKMSRQLSRRWGPANSAFRDYNKNIRAENRALERAQQDPGSSGVGPEAPVLKSVAQPSRRYLTIQKNRAKLERKIARRRDTYYHQVTREVAGKSSLLAVETLRVKNMLQNHRLAFALSDAAMSDFISKLKYKARRFQVPLVAIGTFQPSSQTCSVCGSINPAVKNLSIRVWTCPNCGTRHNRDINAAKNILAIAQNMLEKKVPFADEALPDEKPPAAPVKKAARKPRDAVFPDHPDLVIRFSKELTQLNDPRYVIVNKATNQIVDNAQGAGYRSAAKAKNCYKAKLAWSSKTNK
[0488] >SEQ ID NO:85 TraC-B34
[0489] MSVSFSLNAKKIRLENYAMKMRLYPSPTQAEQMDKMFLALRLAYNMTFHEVFQQNPAVCGDPDEDGNVWPSYKKMANKTWRKALIDQNPAIAEAPAAAITTNNGLFLSNGQKAWKTGMHNLPANKADRKDFRFYSLRKPRRSFAVQIRPRCIIPSDTNQKVARIKLPKIDGAIKARGFNRKIWFGPDGKHTYEEALAAHELSNNLTVRVSKDTCGDYFICITFSQGKVKGDKPTWEFYQEVRVSPIPEPIGLDVGIKDIAILNTGTKYENKQFKRDRAATLKKMSRQLSRRWGPANSAFRDYNKNIRAENRALERAQQDPGSSGVGPEAPVLKSVAQPSRRYLTIQKNRAKLERKIARRRDTYYHQVTREVAGKSSLLAVETLRVKNMLQNHRLAFALSDAAMSDFISKLKYKARRIQVPLVAIGTFQPSSQTCSVCGSINPAVKNLSIRVWTCPNCGTRHNRDINAAKNVLAIAQNMLEKKVPFADEALPDEKPPAAPVKKAARKPRDAVFPDHPDLVIRFSKELTQLNDPRYVIVNKATNQIVDNAQGAGYRSAAKAKNCYKAKLAWSSKTNK
[0490] >SEQ ID NO:86 TraC-B35
[0491] MSVSFSLNAKKIRLENYAMKMRLYPSPTQAEQMDKMFLALRLAYNMTFHEVFQQNPAVCGDPDEDGNVWPSYKKMANKTWRKALIDQNPAIAEAPAAAITTNNGLFLSNGQKAWKTGMHNLPANKADRKDFRFYSLRKPRRSFAVQIRPRCIIPSDTNQKVARIKLPKIDGAIKARGFNRKIWFGPDGKHTYEEALAAHELSNNLTVRVSKDTCGDYFICITFSQGKVKGDKPTWEFYQEVRVSPIPEPIGLDVGIKDIAILNTGTKYENKQFKRDRAATLKKMSRQLSRRWGPANSAFRDYNKNIRAENRALERAQQDPGSSGVGPEAPVLKSVAQPSRRYLTIQKNRAKLERKIARRRDTYYHQVTREVAGKSSLLAVETLRVKNMLQNHRLAFALSDAAMSDFISKLKYKARRIQVPLVSIGTFQPSSQTCSVCGSINPAVKNLSIRVWTCPNCGTRHNRDINAAKNILAIAQNMLEKKVPFADEGLPDEKPPAAPVKKAARKPRDAVFPDHPDLVIRFSKELTQLNDPRYVIVNKATNQIVDNAQGAGYRSAAKAKNCYKAKLAWSSKTNK
[0492] >SEQ ID NO:87 TraC-B36
[0493] MSVSFSLNAKKIRLENYAMKMRLYPSPTQAEQMDKMFLALRLAYNMTFHEVFQQNPAVCGDPDEDGNVWPSYKKMANKTWRKALIDQNPAIAEAPAAAITTNNGLFLSNGQKAWKTGMHNLPANKADRKDFRFYSLRKPRRSFAVQIRPRCIIPSDTNQKVARIKLPKIDGAIKARGFNRKIWFGPDGKHTYEEALAAHELSNNLTVRVSKDTCGDYFICITFSQGKVKGDKPTWEFYQEVRVSPIPEPIGLDVGIKDIAILNTGTKYENKQFKRDRAATLKKMSRQLSRRWGPANSAFRDYNKNIRAENRALERAQQDPGSSGVGPEAPVLKSVAQPSRRYLTIQKNRAKLERKIARRRDTYYHQVTREVAGKSSLLAVETLRVKNMLQNHRLAFALSDAAMSDFISKLKYKARRIQVPLVAIGTFQPSSQTCSVCGSINPAVKNLSIRVWTCPNCGTRHNRDINAAKNILAIAQNMLEKKVPFADEALPDEKPGAAPVKKAARKPRDRVFPDHPDLVIRFSKELTQLNDPRYVIVNKATNQIVDNAQGAGYRSAAKAKNCYKAKLAWSSKTNK
[0494] >SEQ ID NO:88 VEGFA-T17 AAACACAGTAGGAGGGACTTACGTTAG
[0495] >SEQ ID NO:89 VEGFA-T3 CGAGCGCCGAGTCGCCACTGCGGCCCC
[0496] >SEQ ID NO:90 APEX-T1 CCAGGCTCAAAGTGATTTAGGGGTGGT
[0497] >SEQ ID NO:91 HEK-T1 CTAGACAGGGGCTAGTATGTGCAGCTC
[0498] >SEQ ID NO:92 CCDC127-T1 CTGTTGTTGTTTGTACTTCCTGATGCA
[0499] >SEQ ID NO:93 OsAAT-T1 AGCACCAGGGGCTTCCCTTCCTACTCC
[0500] >SEQ ID NO:94 OsALS-T1 CTCTGGGGTACAAAACTTTTGGTGAAG
[0501] >SEQ ID NO:95 OsEPSPS-T1 AGCACCAGGGGCTTCCCTTCCTACTCC
[0502] >SEQ ID NO:96 OsPDS-T1 TTGTACCTCAAGAGTGGAAAGAAATAT
[0503] >SEQ ID NO:97 OsPDS-T2 AGAAACAGTGAACAACCCACTAAACCA
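Short target sequences such as those listed above (SEQ ID NO:79 and SEQ ID NO:88-97) can be located in a longer DNA sequence by searching both strands. For example, the GFP-T1 sequence (SEQ ID NO:79) occurs within SEQ ID NO:38. The Python sketch below is illustrative only and is not the method of this disclosure. The function names are hypothetical, and the sketch deliberately ignores PAM requirements, which are not specified in this section.

```python
# Illustrative sketch: locate a short target sequence on either strand of a DNA sequence.
# Function names are hypothetical; PAM requirements are deliberately ignored.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (output is uppercase)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_target(genome: str, target: str) -> list[tuple[str, int]]:
    """Return (strand, start) pairs for every exact occurrence of `target`.

    Start positions are 0-based coordinates on the sequence as given (+ strand).
    A "-" hit means the reverse complement of `target` occurs at that position.
    """
    genome = genome.upper()
    target = target.upper()
    hits = []
    for strand, probe in (("+", target), ("-", reverse_complement(target))):
        start = genome.find(probe)
        while start != -1:
            hits.append((strand, start))
            start = genome.find(probe, start + 1)  # allow overlapping matches
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    gfp_t1 = "GTGATCGACAGCAACAAGTGAGCG"  # SEQ ID NO:79 (GFP-T1)
    # Opening 61 nt of SEQ ID NO:38
    construct = "GTAATAACTTAACTATATAGAATATGAATGTGATCGACAGCAACAAGTGAGCGGACGCAAC"
    print(find_target(construct, gfp_t1))
```

The sketch uses plain exact string matching. A palindromic target would be reported once per strand, which reflects that it can pair in either orientation.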
Claims
1. An engineered CRISPR system, comprising: a) a transposon and CRISPR-Cas12 intermediate (TraC) effector protein, or one or more nucleotide sequences encoding the effector protein; and b) one or more guide RNAs, or nucleotide sequences encoding the guide RNAs, wherein the guide RNA is selected from i) a reRNA and/or ii) a single guide RNA comprising tracrRNA and crRNA; the TraC effector protein is capable of forming a CRISPR complex with the guide RNA; and the amino acid sequence of the TraC effector protein is selected from any one of SEQ ID NO:25 and SEQ ID NO:80-87, or the TraC effector protein is derived from SEQ ID NO:25 and carries, relative to SEQ ID NO:25, a mutation selected from any one of the groups shown in Table 2 or Table 3.
2. The engineered CRISPR system of claim 1, wherein the tracrRNA contains a non-target-strand binding sequence (NTB) that is complementary to the non-target strand (NTS).
3. The system of claim 2, wherein the guide RNA is a single guide RNA comprising tracrRNA and crRNA, the tracrRNA containing a non-target-strand binding sequence (NTB) complementary to the non-target strand (NTS), and wherein the guide RNA hybridizes with the target strand (TS) of the target DNA sequence via the crRNA and with the non-target strand (NTS) via the NTB.
4. The system of claim 1, wherein, upon transcription, the one or more guide RNAs hybridize with the target DNA and form a complex with the TraC effector protein, the complex causing distal cleavage of the target DNA sequence.
5. The system of any one of claims 3-4, wherein the target DNA sequence is within a eukaryotic cell.
6. The system of any one of claims 1-4, wherein one or more nuclear localization sequences (NLS), cytoplasmic localization sequences, chloroplast localization sequences, or mitochondrial localization sequences are linked to the effector protein with or without a linker.
7. The system of any one of claims 1-4, wherein the nucleic acid sequences encoding the effector protein are codon-optimized for expression in eukaryotic cells.
8. The system according to any one of claims 1-4, wherein components a) and b) or their nucleotide sequences are constructed on the same or different vectors.
9. A method for modifying a target DNA sequence, comprising delivering the system of any one of claims 1-8 to the target DNA sequence or into a cell containing the target DNA sequence, wherein the method is not for the diagnosis or treatment of a disease.
10. The method of claim 9, further comprising delivering a composition of a TraC effector protein and one or more guide RNAs to the target DNA sequence, wherein the effector protein is capable of targeting and binding the target DNA sequence both under the guidance of a reRNA and under the guidance of a single guide RNA comprising tracrRNA and crRNA; the effector protein forms a CRISPR complex with the one or more guide RNAs; and after the complex targets and binds the target DNA sequence located 3' of a protospacer adjacent motif (PAM), the effector protein induces modification of the target DNA sequence.
11. The method of claim 10, wherein the target gene is in a eukaryotic cell.
12. The method of claim 11, wherein the cell is an animal cell.
13. The method of claim 11, wherein the cell is a human cell.
14. The method of claim 11, wherein the cell is a plant cell.
15. The method of claim 10, wherein one or more nuclear localization sequences (NLS), cytoplasmic localization sequences, chloroplast localization sequences, or mitochondrial localization sequences are linked to the effector protein with or without a linker.
16. The method of claim 10, wherein the effector protein and the guide RNA, or a construct expressing the effector protein and the guide RNA, are contained in a delivery system.
17. The method of claim 16, wherein the delivery system comprises a virus, virus-like particles, virions, liposomes, vesicles, exogenous bodies, liposome nanoparticles (LNP), N-acetylgalactosamine (GalNAc), or engineered bacteria.
18. A transposon and CRISPR-Cas12 intermediate (TraC) effector protein for genome editing in an organism or in somatic cells, wherein the amino acid sequence of the TraC effector protein is selected from any one of SEQ ID NO: 25 and SEQ ID NOs: 80-87, or the TraC effector protein is derived from SEQ ID NO: 25 and comprises, relative to SEQ ID NO: 25, a mutation selected from any one of the groups shown in Table 2 or Table 3.
19. The TraC effector protein of claim 18, wherein at least one nuclear localization sequence (NLS), cytoplasmic localization sequence, chloroplast localization sequence, or mitochondrial localization sequence is linked to the TraC effector protein with or without a linker.
20. The use of the engineered CRISPR system of any one of claims 1-8 or the TraC effector protein of any one of claims 18-19 for genome editing of cells, wherein the use is not for the diagnosis or treatment of a disease.
21. The use of claim 20, wherein the genome editing comprises base editing, prime editing, and PrimeRoot editing.
22. A method for producing genetically modified cells, comprising introducing an engineered CRISPR system according to any one of claims 1-8 or a TraC effector protein according to any one of claims 18-19 into the cells, the method being for purposes other than the diagnosis or treatment of a disease.
23. The method of claim 22, wherein the cells are derived from prokaryotes or eukaryotes.
24. The method of claim 23, wherein the eukaryote is selected from animals or plants.
25. The method of claim 24, wherein the animal is selected from mammals or poultry.
26. The method of claim 24, wherein the animal is selected from humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, cats, chickens, ducks, and geese.
27. The method of claim 24, wherein the plant is selected from monocotyledons or dicotyledons.
28. The method of claim 24, wherein the plant is selected from rice, corn, wheat, sorghum, barley, soybean, peanut, and Arabidopsis thaliana.
29. Use of the engineered CRISPR system of any one of claims 1-8 in the preparation of a therapeutic agent for a disease, wherein the disease is selected from genetic diseases or neonatal diseases.