Adeno-associated virus for therapeutic use, including a liver-specific promoter for treating Pompe disease and lysosomal disorders.

rAAV vectors with liver-specific promoters and fusion peptides enhance GAA delivery to lysosomes, addressing inefficiencies in current treatments and minimizing side effects for Pompe disease.

JP7880290B2 · Active · Publication Date: 2026-06-25 · ASKLEPIOS BIOPHARMACEUTICAL INC

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
ASKLEPIOS BIOPHARMACEUTICAL INC
Filing Date
2020-11-19
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Current methods for treating lysosomal storage disorders like Pompe disease face challenges in achieving efficient delivery and expression of lysosomal enzymes, such as GAA, to target tissues, leading to frequent infusions and side effects like hypoglycemia and hyperglycemia.

Method used

The use of rAAV vectors with liver-specific promoters and fusion peptides to deliver and target GAA polypeptides to lysosomes, enhancing liver-specific expression and reducing side effects.

Benefits of technology

This approach improves the efficacy of GAA delivery to lysosomes, reducing the need for frequent infusions and minimizing side effects, providing a more stable and effective treatment for Pompe disease.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure 0007880290000018
  • Figure 0007880290000019
  • Figure 0007880290000020

Abstract

A recombinant AAV (rAAV) vector comprising an rAAV genome that includes a heterologous nucleic acid encoding a lysosomal protein, e.g., an acid alpha-glucosidase (GAA) polypeptide, operably linked to a liver-specific promoter (LSP), and optionally a signal peptide and/or a targeting sequence, e.g., an IGF2 targeting peptide, enabling the GAA polypeptide to be secreted from the liver and targeted to lysosomes. Certain embodiments relate to an rAAV vector encoding an acid alpha-glucosidase (GAA) polypeptide that has a liver secretory signal peptide and an IGF2 targeting polypeptide that binds to the cation-independent mannose-6-phosphate receptor (CI-MPR), also known as the IGF2 receptor, enabling proper intracellular localization of the GAA polypeptide to lysosomes.

Description

[Technical Field]

[0001] Cross-Reference to Related Applications This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application Nos. 62/937,556 and 62/937,583, both filed on November 19, 2019, and No. 63/023,570, filed on May 12, 2020, the contents of which are incorporated herein by reference in their entirety.

[0002] Sequence Listing This application includes an electronically filed sequence listing in ASCII format, which is incorporated herein by reference in its entirety. The ASCII copy, created on November 17, 2020, is named 046192-096600WOPT SL.txt and is 840,179 bytes in size.

[0003] Field of the Invention The present invention relates to adeno-associated virus (AAV) particles, virions, and vectors for the targeted transfer of lysosomal enzymes, such as acid alpha-glucosidase (GAA) polypeptides, and to methods of using them to treat lysosomal storage diseases and lysosomal storage disorders, such as Pompe disease.

[Background Art]

[0004] Background More than 40 lysosomal storage disorders (LSDs) are caused, directly or indirectly, by the absence of one or more lysosomal enzymes from lysosomes. Enzyme replacement therapy for LSDs is being actively pursued. Such therapies generally require LSD proteins to be taken up by, and delivered to the lysosomes of, various cell types in a mannose-6-phosphate (M6P)-dependent manner. One possible approach involves purifying LSD proteins and modifying them to incorporate M6P into their carbohydrate moieties. This modified material may be taken up by cells more efficiently than unmodified LSD proteins because of its interaction with M6P receptors on the cell surface.

[0005] The feasibility of gene therapy approaches to treat GSD-II is being investigated as an alternative or adjunct to enzyme therapy (Amalfitano, A., et al., (1999) Proc. Natl. Acad. Sci. USA 96:8861-8866, Ding, E., et al. (2002) Mol. Ther. 5:436-446, Fraites, TJ, et al., (2002) Mol. Ther. 5:571-578, Tsujino, S., et al. (1998) Hum. Gene Ther. 9:1609-1616).

[0006] However, viral or AAV delivery of genes, particularly genes encoding lysosomal proteins and enzymes for the treatment of lysosomal storage disorders, presents challenges. Normally, mammalian lysosomal enzymes are synthesized in the cytosol, pass through the ER, and are glycosylated with N-linked high-mannose carbohydrates. In the Golgi apparatus, the high-mannose carbohydrates on lysosomal proteins are modified by the addition of mannose-6-phosphate (M6P), which targets these proteins to lysosomes. M6P-modified proteins are delivered to lysosomes via interaction with one of two M6P receptors. However, recombinantly produced proteins used in enzyme replacement therapy often lack the M6P addition necessary to target them to lysosomes, and therefore high doses of the recombinant enzymes and/or frequent infusions must often be administered to patients.

[0007] Acid alpha-glucosidase (GAA) is a lysosomal enzyme that hydrolyzes alpha-1,4 linkages in maltose and other linear oligosaccharides, including the outer branches of glycogen, thereby breaking down excess glycogen in lysosomes (Hirschhorn et al. (2001) in The Metabolic and Molecular Basis of Inherited Disease, Scriver, et al., eds. (2001), McGraw-Hill: New York, p. 3389-3420). Like other mammalian lysosomal enzymes, GAA is synthesized in the cytosol and passes through the ER, where it is glycosylated with N-linked high-mannose carbohydrates. In the Golgi, these high-mannose carbohydrates are modified by the addition of mannose-6-phosphate (M6P), which targets the protein to lysosomes. The M6P-modified protein is delivered to the lysosome via interaction with one of two M6P receptors. The most preferred form of modification is one in which two M6P residues are attached to a single high-mannose carbohydrate.

[0008] Insufficient GAA activity in lysosomes leads to Pompe disease, also known as acid maltase deficiency (AMD), glycogen storage disorder type II (GSDII), glycogen storage disease type II, or GAA deficiency. The reduction in enzyme activity is caused by various missense and nonsense mutations in the gene encoding GAA. As a result, glycogen accumulates in the lysosomes of all cells in patients with Pompe disease. Glycogen accumulation is particularly pronounced in the lysosomes of cardiac and skeletal muscle, the liver, and other tissues. The accumulated glycogen ultimately impairs muscle function. In the most severe forms of Pompe disease, death occurs before the age of two due to cardiopulmonary failure.

[0009] There is a need for effective treatment of Pompe disease. Enzyme replacement therapy for Pompe disease requires the administration of recombinant GAA protein, which is taken up by muscle and liver cells in the subject and subsequently transported to lysosomes in those cells in an M6P-dependent manner. However, although enzyme therapy has demonstrated reasonable efficacy for severe pediatric GSD II, its benefits are limited by the need for frequent infusions and by the development of inhibitors or neutralizing antibodies against recombinant hGAA protein in the subject (Amalfitano, A., et al. (2001) Genet. Med. 3:132-138).

[0010] Viral gene therapy has the potential not only to cure hereditary disorders but also to enable long-term, non-invasive treatment of acquired degenerative diseases. One such gene therapy vector is adeno-associated virus (AAV). AAV itself is a non-pathogenic dependoparvovirus that requires a helper virus for efficient replication. Because of its safety and simplicity, AAV is used as a viral vector for gene therapy. AAV has broad host and cell-type tropism and can transduce both dividing and non-dividing cells.

[0011] However, AAV delivery of GAA polypeptides presents several challenges in achieving adequate expression in the liver and/or delivery to lysosomes, and patients have reported experiencing hypoglycemia. In particular, in human studies, administration of rAAV vectors encoding GAA polypeptides has resulted in several patients experiencing hypoglycemia or hyperglycemia due to nonspecific cellular uptake (see, for example, Byrne et al., "A study on the safety and efficacy of reveglucosidase alfa in patients with late-onset Pompe disease," Orphanet J. Rare Dis. 2017; 12: 144). Therefore, there is a need in the art for improved methods of producing lysosomal polypeptides such as GAA in vitro and in vivo to treat lysosomal polypeptide deficiencies, including, for example, modifications of GAA. There is also a need for improved secretion from the liver and improved targeting of GAA to lysosomes, to help reduce side effects caused by overexpression of the GAA polypeptide and to reduce the risk of hypoglycemia. Furthermore, there is a need for methods that achieve systemic delivery of GAA and other lysosomal polypeptides to affected tissues and organs. In particular, there remains a need for more efficient methods of administering GAA protein and targeting it to patients' lysosomes while reducing potential side effects.

[Prior Art Documents]
[Non-Patent Literature]

[0012] [Non-Patent Document 1] Amalfitano, A., et al. (1999) Proc. Natl. Acad. Sci. USA 96:8861-8866
[Non-Patent Document 2] Ding, E., et al. (2002) Mol. Ther. 5:436-446; Fraites, T. J., et al. (2002) Mol. Ther. 5:571-578
[Non-Patent Document 3] Tsujino, S., et al. (1998) Hum. Gene Ther. 9:1609-1616
[Non-Patent Document 4] Hirschhorn et al. (2001) in The Metabolic and Molecular Basis of Inherited Disease, Scriver, et al., eds., McGraw-Hill: New York, pp. 3389-3420
[Non-Patent Document 5] Amalfitano, A., et al. (2001) Genet. Med. 3:132-138

[Summary of the Invention]
[Means for Solving the Problems]

[0013] Summary of the Invention The technology described herein generally relates to, for example but without limitation, gene therapy constructs, methods, and compositions for the treatment of lysosomal storage diseases and lysosomal storage disorders, such as Pompe disease. More particularly, the technology relates to adeno-associated virus (AAV) virions configured to deliver lysosomal enzymes, such as GAA polypeptides, to a subject, and more particularly to deliver to the liver of a subject a lysosomal enzyme, such as a GAA polypeptide, that is secreted from liver cells and targeted to lysosomes.

[0014] In particular, an exemplary targeted viral vector described herein is an rAAV vector comprising a nucleotide sequence that contains inverted terminal repeats (ITRs), a promoter, a heterologous gene, a polyA tail, and potentially other regulatory elements, for use in treating lysosomal storage disorders such as those listed in Table 5A or Table 6A herein. The heterologous gene encodes, for example, a lysosomal enzyme such as GAA, and the vector (e.g., rAAV) can be administered to a patient at a therapeutically effective dose that is delivered to the appropriate tissue and/or organ for expression of the heterologous lysosomal enzyme gene and treatment of the disease, e.g., Pompe disease.

[0015] Aspects of the present invention teach certain advantages in construction and use that result in the exemplary benefits described below.

[0016] Accordingly, in certain embodiments, rAAV vectors are described herein that include inverted terminal repeats (ITRs) and, located between the ITRs, a nucleotide sequence containing a liver-specific promoter (LSP), a heterologous nucleic acid sequence encoding the acid alpha-glucosidase (GAA) protein, a polyA tail, and other regulatory elements, for use in potentially treating Pompe disease. The GAA-expressing rAAV can be administered to a patient at a therapeutically effective dose that is delivered to the appropriate tissue and/or organ for expression of the heterologous gene encoding the GAA protein, to treat subjects having Pompe disease.

[0017] More specifically, the AAV virion or genome includes an LSP selected from any promoter listed in Table 4 herein or a functional variant or functional fragment thereof, or any LSP selected from SEQ ID NOs: 86, 91-96 or 146-150 or a functional variant or functional fragment thereof, which enables preferential expression of a lysosomal protein, e.g., GAA protein, in the liver. In some embodiments, the liver-specific promoter may also express hGAA to some extent in another tissue of interest, e.g., muscle, CNS, or both muscle and CNS tissue, while preferentially expressing hGAA protein in the liver. In some embodiments, the expressed lysosomal enzyme, e.g., GAA protein, may be configured as a GAA fusion protein with a targeting sequence, such as an IGF2-targeting peptide disclosed herein, which targets the GAA protein to lysosomes, and/or may be fused to a signal peptide (SP). The GAA protein is expressed from the rAAV genome in the liver, secreted, and taken up into the lysosomes of mammalian cells, particularly muscle cells.

[0018] In some embodiments of the compositions and methods described herein, the rAAV vector disclosed herein comprises in its genome, between the 5' and 3' ITRs, a liver-specific promoter (LSP) operably linked to a heterologous nucleic acid sequence encoding an alpha-glucosidase (GAA) polypeptide, wherein the LSP comprises SEQ ID NO: 86 (CRM 0412), SEQ ID NO: 91 (SP0412), SEQ ID NO: 92 (SP0422), SEQ ID NO: 93 (SP0239), SEQ ID NO: 94 (SP0265, also referred to as SP131_A1), SEQ ID NO: 95 (SP0240), SEQ ID NO: 96 (SP0246), SEQ ID NO: 146 (SP0265-UTR), SEQ ID NO: 147 (SP0239-UTR), SEQ ID NO: 148 (SP0240-UTR), SEQ ID NO: 149 (SP0246-UTR), or SEQ ID NO: 150 (SP0131-A1-UTR), or a functional fragment or variant thereof, or any LSP selected from SEQ ID NOs: 270-341 or 342-430, or a functional fragment or variant of any of these promoters. In some embodiments, the GAA polypeptide is fused to neither an IGF2 targeting sequence nor a signal sequence. In some embodiments, the GAA polypeptide is fused to a signal sequence and/or an IGF2 targeting sequence disclosed herein.

[0019] In some embodiments of the compositions and methods described herein, the rAAV vector disclosed herein comprises in its genome a liver-specific promoter (LSP) operably linked to a heterologous nucleic acid sequence that is located between the 5' and 3' AAV inverted terminal repeat (ITR) sequences and encodes a fusion polypeptide comprising (i) a secretory signal peptide and/or an IGF2-targeting peptide and (ii) an alpha-glucosidase (GAA) polypeptide, wherein the LSP is selected from any promoter listed in Table 4 herein or a functional variant or functional fragment thereof, or any LSP selected from SEQ ID NOs: 86, 91-96 or 146-150 or a functional variant or functional fragment thereof.

[0020] In some embodiments, the rAAV vectors disclosed herein include in their genome a heterologous nucleic acid sequence, located between the 5' and 3' AAV inverted terminal repeat (ITR) sequences, that encodes a fusion polypeptide comprising (i) a secretory signal peptide (also referred to as a leader peptide) and (ii) an alpha-glucosidase (GAA) polypeptide, wherein the heterologous nucleic acid is operably linked to a liver-specific promoter (LSP) selected from any promoter listed in Table 4 herein or a functional variant or functional fragment thereof, or any LSP selected from SEQ ID NOs: 86, 91-96 or 146-150 or a functional variant or functional fragment thereof. Exemplary leader sequences include, but are not limited to, the native GAA leader sequence, the AAT sequence, IL2(1-3), the IL2 leader sequence (IL2 wt), a modified IL2 leader sequence (IL2 mut), the fibronectin (FN1) signal sequence, or an IgG leader sequence, or functional variants thereof, as disclosed herein. In some embodiments, the AAV vector includes a Kozak sequence located between the LSP and the leader sequence.
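The Kozak sequence mentioned above sits immediately upstream of the start codon that begins the leader. As a minimal sketch, assuming the widely cited vertebrate consensus GCCRCCATGG (R = A or G) and a simple three-level rating based only on positions -3 and +4, the hypothetical helper `kozak_strength` below rates the context of a given ATG. The patent text here does not specify which Kozak sequence is used.

```python
# Hypothetical helper, not from the patent: rates the Kozak context of an ATG
# using only the two positions usually weighted most (-3 and +4), assuming the
# vertebrate consensus GCCRCCATGG.

def kozak_strength(seq: str, atg_index: int) -> str:
    """Return "strong", "adequate" or "weak" for the ATG at 0-based atg_index.

    Numbering: the A of ATG is +1, so -3 is three bases upstream of it and
    +4 is the base immediately after the G of ATG.
    """
    s = seq.upper().replace("U", "T")  # accept RNA or DNA input
    if s[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    minus3 = s[atg_index - 3] if atg_index >= 3 else ""
    plus4 = s[atg_index + 3] if atg_index + 3 < len(s) else ""
    purine_at_minus3 = minus3 in ("A", "G")
    g_at_plus4 = plus4 == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


# Example: consensus context with the ATG starting at index 6.
print(kozak_strength("GCCACCATGG", 6))  # prints strong
```

Scoring only two positions keeps the sketch transparent; a fuller model would weight every position of the consensus.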

[0021] In some embodiments, the rAAV vectors disclosed herein include in their genome a heterologous nucleic acid sequence, located between the 5' and 3' AAV inverted terminal repeat (ITR) sequences, that encodes a fusion polypeptide comprising (i) an IGF2-targeting peptide and (ii) an alpha-glucosidase (GAA) polypeptide, wherein the heterologous nucleic acid is operably linked to a liver-specific promoter (LSP) selected from any promoter listed in Table 4 herein or a functional variant or functional fragment thereof, or any LSP selected from SEQ ID NOs: 86, 91-96 or 146-150 or a functional variant or functional fragment thereof.

[0022] In further embodiments, the rAAV vectors disclosed herein include in their genome 5' and 3' AAV inverted terminal repeat (ITR) sequences and, between them, a heterologous nucleic acid sequence encoding an alpha-glucosidase (GAA) polypeptide that is fused neither to a heterologous signal peptide (or leader sequence) nor to an IGF2 targeting sequence described herein, wherein the heterologous nucleic acid is operably linked to a liver-specific promoter (LSP) selected from any promoter listed in Table 4 herein or a functional variant or functional fragment thereof, or any LSP selected from SEQ ID NOs: 86, 91-96 or 146-150 or a functional variant or functional fragment thereof.

[0023] In some embodiments, the rAAV vector comprises a liver-specific capsid, for example, a liver-specific capsid selected from XL32 and XL32.1 disclosed in WO2019/241324, which is incorporated herein by reference in its entirety. In some embodiments, the rAAV vector is a haploid AAV vector comprising AAVXL32 or AAVXL32.1, or is an AAV8 vector, or comprises at least one AAV8 capsid protein (for example, at least one of VP1, VP2, or VP3 is derived from the AAV8 serotype); in some embodiments, the AAV vector is a haploid AAV vector comprising at least two AAV8 capsid proteins. In some embodiments, the AAV vector comprises a capsid disclosed in WO2019241324A1 or International Patent Application PCT/US2019/036676, which are incorporated herein by reference in their entirety. In some embodiments, the AAV vector comprises a capsid encoded by an AAV capsid coding sequence that is at least 90% identical to (a) the nucleotide sequence encoding any one of SEQ ID NOs: 1-3 disclosed in WO2019241324A1 or (b) any one of SEQ ID NOs: 4-6 disclosed in WO2019241324A1. In some embodiments, the AAV capsid of an AAV particle of the present invention, which contains the AAV vector genome, comprises an amino acid sequence that is at least 90% identical to any one of SEQ ID NOs: 4-6 disclosed in WO2019241324A1. In some embodiments, the rAAV vector includes a capsid protein that enables the AAV vector to transduce liver cells; in some embodiments, it includes a capsid protein that enables the vector to transduce both muscle and liver cells.

[0024] An exemplary LSP for use in the methods and compositions is SP0412 (SEQ ID NO: 91) or a functional variant thereof. In alternative embodiments, the LSP may be selected from SEQ ID NO: 86 (CRM 0412), SEQ ID NO: 91 (SP0412), SEQ ID NO: 92 (SP0422), SEQ ID NO: 93 (SP0239), SEQ ID NO: 94 (SP0265, also referred to as SP131_A1), SEQ ID NO: 95 (SP0240), SEQ ID NO: 96 (SP0246), SEQ ID NO: 146 (SP0265-UTR), SEQ ID NO: 147 (SP0239-UTR), SEQ ID NO: 148 (SP0240-UTR), SEQ ID NO: 149 (SP0246-UTR), or SEQ ID NO: 150 (SP0131-A1-UTR), or any functional fragment or variant thereof.

[0025] In some embodiments of the compositions and methods described herein, the secretory signal peptide is selected from an AAT signal peptide, a fibronectin (FN1) signal peptide, a GAA signal peptide, the native GAA leader sequence, the AAT sequence, IL2(1-3), the IL2 leader sequence (IL2 wt), a modified IL2 leader sequence (IL2 mut), or an IgG leader sequence, or functional variants thereof having secretory signal activity.

[0026] In some embodiments of the compositions and methods described herein, an alpha-glucosidase (GAA) polypeptide is fused to an IGF2-targeting peptide at the N-terminus of the GAA polypeptide. In some embodiments, the IGF2-targeting peptide is fused to the N-terminus of a human acid alpha-glucosidase (GAA) polypeptide (SEQ ID NO: 10) that begins at amino acid 70 (i.e., to the N-terminus of residues 70-952 of human GAA), or to a GAA polypeptide with at least 85% sequence identity to amino acids 70-952 of SEQ ID NO: 10. In alternative embodiments, the IGF2-targeting peptide is fused to the N-terminus of a human GAA polypeptide (SEQ ID NO: 10) that begins at amino acid 40 (i.e., to the N-terminus of residues 40-952), or to a GAA polypeptide with at least 85% sequence identity to amino acids 40-952 of SEQ ID NO: 10. In some embodiments of the compositions and methods described herein, the GAA polypeptide may be encoded by a wild-type GAA nucleic acid sequence (e.g., SEQ ID NO: 11 or SEQ ID NO: 72) or by a GAA nucleic acid sequence codon-optimized for one or more of the following: increased expression in vivo, reduction of CpG islands, and/or reduction of the innate immune response in a subject. Exemplary codon-optimized GAA nucleic acid sequences include, but are not limited to, SEQ ID NO: 73, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 76, and SEQ ID NO: 182.
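Residue ranges such as 70-952 are 1-based and inclusive, so they map onto a zero-based slice shifted by one position. The sketch below, using hypothetical helper names and placeholder sequences rather than SEQ ID NO: 10, shows how an N-terminal IGF2-peptide fusion to a truncated GAA could be assembled in silico. It assumes a direct fusion with no linker.

```python
from typing import Optional

# Hypothetical helpers, not from the patent; sequences passed in are placeholders.


def residues(protein: str, start: int, end: int) -> str:
    """Return residues start..end of `protein`, numbered 1-based and inclusive,
    matching notation such as 'residues 70-952'."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError(f"residue range {start}-{end} is outside 1-{len(protein)}")
    return protein[start - 1:end]


def n_terminal_fusion(tag: str, protein: str, start: int, end: Optional[int] = None) -> str:
    """Fuse `tag` directly to the N-terminus of protein residues start..end.

    This sketch inserts no linker between the tag and the truncated protein.
    """
    end = len(protein) if end is None else end
    return tag + residues(protein, start, end)


# Example with toy sequences: tag + residues 3-10 of a 10-residue protein.
print(n_terminal_fusion("TAG", "ABCDEFGHIJ", 3))  # prints TAGCDEFGHIJ
```

For a 952-residue precursor, residues 70-952 span 883 residues and residues 40-952 span 913.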

[0027] In some embodiments of the methods and compositions disclosed herein, the recombinant AAV vector comprises a liver-specific promoter (LSP) selected, for example but without limitation, from any promoter listed in Table 4 herein or a functional variant thereof. Exemplary LSPs for use in the methods and compositions include SP0412 and its functional variants. In alternative embodiments, the LSP may include a nucleic acid sequence selected from SP0422, SP0131A1, SP0239, SP0240, or SP0246, as disclosed herein, or a functional variant thereof. For example, the liver-specific promoter may include a nucleic acid sequence selected from SEQ ID NO: 86 (CRM 0412), SEQ ID NO: 91 (SP0412), or SEQ ID NO: 92 (SP0422), or a functional variant or fragment thereof. In alternative embodiments, the liver-specific promoter may include a nucleic acid sequence selected from SEQ ID NO: 93 (SP0239), SEQ ID NO: 94 (SP0265, also known as SP131_A1), SEQ ID NO: 95 (SP0240), SEQ ID NO: 96 (SP0246), SEQ ID NO: 146 (SP0265-UTR), SEQ ID NO: 147 (SP0239-UTR), SEQ ID NO: 148 (SP0240-UTR), SEQ ID NO: 149 (SP0246-UTR), or SEQ ID NO: 150 (SP0131-A1-UTR). In some embodiments of the compositions and methods disclosed herein, liver-specific promoters include liver-specific cis-regulatory elements (CREs), synthetic liver-specific cis-regulatory modules (CRMs), or synthetic liver-specific promoters comprising a promoter sequence selected from SEQ ID NOs: 270-341 (minimal LSPs, which may include CRMs) or SEQ ID NOs: 342-430 (exemplary synthetic LSPs), as previously disclosed in Table 4A or 4B of Provisional Application No. 62/937,556, which is incorporated herein by reference in its entirety, or functional fragments or functional variants thereof. These liver-specific promoter elements may include minimal liver-specific promoters (see, e.g., SEQ ID NOs: 86 and 270-341) or liver-specific proximal promoters (see, e.g., SEQ ID NOs: 91-96, 146-150 and 342-430), for example, SEQ ID NO: 86 (CRM 0412), SEQ ID NO: 91 (SP0412), or SEQ ID NO: 92 (SP0422), or their functional variants or functional fragments.

[0028] In some embodiments of the methods and compositions disclosed herein, the recombinant AAV vector comprises a liver-specific promoter (LSP), for example, but not limited to, SEQ ID NOs: 86, 91-96, 146-150, 370-430, or any functional variant or functional fragment thereof.

[0029] For example, any functional variant or functional fragment of a liver-specific promoter disclosed in Table 4 of this specification, or of any LSP selected from SEQ ID NOs: 86, 91-96, 146-150, or 370-430, has at least about 75% sequence identity to the original unmodified reference sequence (or at least about 80%, 90%, 95%, or 98% sequence identity) and also retains at least 35% of the promoter activity of the corresponding unmodified promoter sequence (or at least about 45%, 50%, 60%, 75%, 80%, 85%, 90%, or 95% of that promoter activity).

[0030] For example, a functional variant or functional fragment of SEQ ID NO: 92 (SP0422) or SEQ ID NO: 91 (SP0412) has at least about 75% (or at least about 80%, 90%, 95%, or 98%) sequence identity to SEQ ID NO: 92 or SEQ ID NO: 91, the original unmodified sequence, and also retains at least 35% (or at least about 45%, 50%, 60%, 75%, 80%, 85%, 90%, or 95%) of the promoter activity of the corresponding unmodified promoter sequence of SEQ ID NO: 92 or SEQ ID NO: 91.
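The two-part definition of a functional variant given above (a sequence-identity floor plus a retained-activity floor) can be expressed as a simple check. This is a sketch under stated assumptions: identity is computed from a Needleman-Wunsch global alignment with match +1, mismatch -1 and a linear gap penalty of -1, and is taken as identical columns divided by alignment length. The patent does not say which alignment tool or identity convention applies, and tools with affine gap penalties can report different values. Activities are assumed to come from the same assay, and all function names are hypothetical.

```python
# Hypothetical sketch, not the patent's method: alignment scoring and the
# identity convention below are assumptions.


def _pair_score(x: str, y: str, match: int, mismatch: int) -> int:
    return match if x == y else mismatch


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Fraction of identical columns in one optimal Needleman-Wunsch alignment,
    divided by the alignment length (gap columns included)."""
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j] (linear gap penalty).
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch),
                score[i - 1][j] + gap,  # a[i-1] aligned to a gap
                score[i][j - 1] + gap,  # b[j-1] aligned to a gap
            )
    # Trace back one optimal path, counting identical columns and all columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _pair_score(
                a[i - 1], b[j - 1], match, mismatch):
            identical += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return identical / columns if columns else 1.0


def is_functional_variant(candidate: str, reference: str,
                          candidate_activity: float, reference_activity: float,
                          min_identity: float = 0.75,
                          min_relative_activity: float = 0.35) -> bool:
    """True if the candidate meets both the identity and the activity threshold."""
    if reference_activity <= 0:
        raise ValueError("reference activity must be positive")
    identity = global_identity(candidate, reference)
    relative_activity = candidate_activity / reference_activity
    return identity >= min_identity and relative_activity >= min_relative_activity
```

The thresholds are parameters, so narrower tiers (for example 0.95 identity or 0.80 retained activity) can be tested the same way.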

[0031] A functional fragment is a portion of a promoter that retains at least 35%, or at least about 45%, 50%, 75%, 80%, 85%, or 90%, of the activity of the full-length (uncleaved) promoter. In some embodiments, the functional fragment comprises a contiguous portion of the unmodified promoter sequence. The TTR promoter (SEQ ID NO: 431) is disclosed in the examples herein as an exemplary LSP, but those skilled in the art will understand that the TTR promoter (SEQ ID NO: 431) can be replaced with a nucleic acid sequence comprising one or more of the liver-specific promoters listed in Table 4 herein, for example, at least SEQ ID NO: 92 (SP0422) or SEQ ID NO: 91 (SP0412) or a functional variant or fragment thereof, or SEQ ID NO: 93 (SP0239), SEQ ID NO: 94 (SP131_A1), SEQ ID NO: 95 (SP0240), SEQ ID NO: 96 (SP0246), SEQ ID NO: 146 (SP0265-UTR), SEQ ID NO: 147 (SP0239-UTR), SEQ ID NO: 148 (SP0240-UTR), SEQ ID NO: 149 (SP0246-UTR), or SEQ ID NO: 150 (SP0131-A1-UTR), or any sequence selected from SEQ ID NOs: 270-341 or 342-430, or functional variants or functional fragments thereof. In some embodiments, the LSP can also express hGAA to some extent in another tissue of interest, such as muscle, CNS, or both muscle and CNS tissue, while preferentially expressing the hGAA protein in the liver.
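For the contiguous-fragment embodiment described above, a check needs only a substring test and an activity ratio. The sketch below is hypothetical: it assumes "fragment" means a strictly shorter contiguous piece of the full-length promoter, that both activities were measured in the same assay, and it does not cover non-contiguous variants.

```python
# Hypothetical check, not from the patent: contiguous-fragment case only.


def is_functional_fragment(fragment: str, promoter: str,
                           fragment_activity: float, promoter_activity: float,
                           min_relative_activity: float = 0.35) -> bool:
    """True if `fragment` is a strictly shorter contiguous piece of `promoter`
    and retains at least `min_relative_activity` of its activity."""
    if promoter_activity <= 0:
        raise ValueError("promoter activity must be positive")
    f, p = fragment.upper(), promoter.upper()
    is_contiguous_portion = 0 < len(f) < len(p) and f in p  # substring test
    return is_contiguous_portion and fragment_activity / promoter_activity >= min_relative_activity
```

Passing a higher `min_relative_activity` (for example 0.90) tests the stricter tiers listed in the paragraph.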

[0032] In some embodiments of the methods and compositions disclosed herein, the recombinant AAV vector comprises a heterologous nucleic acid sequence encoding a wild-type GAA polypeptide (wtGAA) or a modified GAA polypeptide disclosed herein in which one or more amino acids are modified, e.g., by the H199R, R223H, or H201L modifications. In some embodiments, the heterologous nucleic acid sequence encoding the GAA polypeptide is a human GAA gene, a human codon-optimized GAA gene (coGAA), or a codon-optimized modified GAA nucleic acid sequence encoding a modified GAA polypeptide comprising one or more modifications selected from H199R, R223H, and H201L. In all embodiments of the methods and compositions disclosed herein, the nucleic acid sequence encoding the GAA polypeptide is codon-optimized for one or more of the following: enhanced expression in vivo, reduction of CpG islands, or reduction of the innate immune response. In all embodiments of the methods and compositions disclosed herein, the nucleic acid sequences encoding the GAA polypeptide are codon-optimized both to reduce CpG islands and to reduce the innate immune response. In some embodiments, the nucleic acid sequences encoding the wild-type GAA polypeptide include SEQ ID NO: 182 and the modifications described herein.
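Substitutions such as H199R follow the usual wild-type/position/replacement notation (His at residue 199 replaced by Arg). A minimal sketch, with a hypothetical function name and assuming positions are counted from residue 1 of whatever reference sequence is supplied, applies such substitutions and refuses to proceed if the stated wild-type residue is not found, which guards against numbering mismatches. The patent's numbering reference is not restated in this paragraph.

```python
import re
from typing import List

# Hypothetical helper, not from the patent. Positions are 1-based relative to
# the supplied reference protein (an assumption of this sketch).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # standard one-letter codes
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(protein: str, substitutions: List[str]) -> str:
    """Apply substitutions written as <wild-type><position><mutant>, e.g. 'H199R'.

    Raises ValueError if a substitution is malformed, out of range, or if the
    reference residue at that position is not the stated wild-type residue.
    """
    residues = list(protein.upper())
    for sub in substitutions:
        m = SUBSTITUTION.match(sub)
        if not m:
            raise ValueError(f"unrecognised substitution: {sub}")
        wt, pos, mut = m.group(1), int(m.group(2)), m.group(3)
        if wt not in AMINO_ACIDS or mut not in AMINO_ACIDS:
            raise ValueError(f"not a standard amino acid code: {sub}")
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position out of range: {sub}")
        if residues[pos - 1] != wt:
            raise ValueError(f"{sub}: expected {wt} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = mut
    return "".join(residues)
```

Checking the wild-type residue before substituting is the design choice that matters here: an off-by-one numbering error fails loudly instead of silently mutating the wrong residue.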

[0033] Another aspect of the technology described herein relates to a pharmaceutical composition comprising any of the recombinant AAV vector compositions disclosed herein and a pharmaceutically acceptable carrier.

[0034] Another aspect of the technology herein relates to a composition comprising, in the following order: a 5' ITR; a nucleic acid sequence comprising a liver-specific promoter (LSP) operably linked to a nucleic acid encoding a modified GAA polypeptide comprising one or more modifications selected from H199R, R223H, and H201L; and a nucleic acid sequence comprising a 3' ITR. In one embodiment, the nucleic acid sequence optionally further comprises a nucleic acid sequence encoding a leader sequence (or signal sequence) located between the LSP and the nucleic acid encoding the GAA polypeptide, the leader sequence being selected from the native GAA leader sequence, the AAT sequence, IL2(1-3), the IL2 leader sequence (IL2 wt), a modified IL2 leader sequence (IL2 mut), fibronectin (FN1), or an IgG leader sequence, or functional variants thereof, as disclosed herein. In some embodiments, the nucleic acid sequence optionally further comprises a Kozak sequence located between the LSP and the leader sequence. In some embodiments, the nucleic acid sequence optionally further comprises an IGF2-targeting peptide located between the leader sequence and the nucleic acid encoding the GAA polypeptide. In some embodiments, the nucleic acid sequence optionally further comprises, together with the nucleic acid encoding the GAA polypeptide, a 3' UTR located 3' of the polyA sequence. In some embodiments, the nucleic acid sequence optionally further comprises an intron sequence located 3' of the LSP and 5' of the nucleic acid encoding the GAA polypeptide, preferably between the LSP and the Kozak sequence. Exemplary constructs for rAAV vectors or rAAV genomes are shown in Figures 5A–5G.
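Because each optional element above has a fixed position relative to the others, a construct can be described as an ordered list and checked against a canonical 5'-to-3' order. In this sketch the element labels are hypothetical, the polyA signal is assumed to follow the GAA coding sequence (as in the ITR-promoter-gene-polyA arrangement described earlier), and the 3' UTR is omitted because its placement is stated only relative to the polyA sequence. It is not a reproduction of the constructs in Figures 5A–5G.

```python
from typing import List

# Hypothetical element labels; canonical 5'->3' order assumed from the text,
# with polyA placed after the GAA coding sequence (an assumption).
CANONICAL_ORDER = ["5'ITR", "LSP", "intron", "Kozak", "leader", "IGF2", "GAA", "polyA", "3'ITR"]
REQUIRED = {"5'ITR", "LSP", "GAA", "3'ITR"}


def assemble_cassette(intron: bool = False, kozak: bool = False, leader: bool = False,
                      igf2: bool = False, polya: bool = True) -> List[str]:
    """Return the element list in canonical order, including chosen optional parts."""
    optional = {"intron": intron, "Kozak": kozak, "leader": leader,
                "IGF2": igf2, "polyA": polya}
    return [e for e in CANONICAL_ORDER if e in REQUIRED or optional.get(e, False)]


def is_valid_order(elements: List[str]) -> bool:
    """True if required elements are present, nothing repeats or is unknown,
    and the elements appear in canonical 5'->3' order."""
    if len(set(elements)) != len(elements) or not REQUIRED.issubset(elements):
        return False
    if any(e not in CANONICAL_ORDER for e in elements):
        return False
    positions = [CANONICAL_ORDER.index(e) for e in elements]
    return positions == sorted(positions)


# Example: a cassette with a Kozak sequence, leader, and IGF2-targeting peptide.
print(assemble_cassette(kozak=True, leader=True, igf2=True))
```

Encoding the order once in `CANONICAL_ORDER` means both assembly and validation use the same rule, so a leader placed upstream of the Kozak sequence is rejected.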

[0035] Another aspect of the technology described herein relates to a composition comprising a nucleic acid sequence that comprises a 5' ITR, a liver-specific promoter (LSP) operably linked to a nucleic acid sequence encoding a modified GAA polypeptide comprising one or more modifications selected from H199R, R223H, and H201L, a polyA sequence, and a 3' ITR sequence, wherein the polyA sequence may be a full-length or truncated polyA signal sequence. Another aspect of the technology described herein relates to a composition comprising a nucleic acid sequence that comprises a 5' ITR, a liver-specific promoter (LSP) operably linked to a nucleic acid sequence encoding a modified GAA polypeptide comprising one or more modifications selected from H199R, R223H, and H201L, a full-length polyA sequence, a terminal repeat sequence, and a 3' ITR sequence, wherein the nucleic acid lacks an AAV P5 promoter sequence.

[0036] Another aspect of the technology described herein relates to a composition comprising, in the following order: (a) a nucleic acid encoding a secretory signal peptide; (b) a nucleic acid encoding an IGF2-targeting peptide; and (c) a nucleic acid sequence comprising a liver-specific promoter (LSP) operably linked to a nucleic acid sequence encoding a GAA polypeptide.

[0037] Another aspect of the technology described herein relates to a composition comprising a nucleic acid sequence for a recombinant adeno-associated virus (rAAV) vector genome, wherein the nucleic acid sequence comprises (a) 5' and 3' AAV inverted terminal repeat (ITR) nucleic acid sequences, and (b) a heterologous nucleic acid sequence encoding a polypeptide, located between the 5' and 3' ITR sequences, the heterologous nucleic acid being operably linked to a liver-specific promoter as described above. Exemplary liver-specific promoters are SP0412 and SP0422, or functional variants thereof. In some embodiments, liver-specific promoters for use in the methods and compositions disclosed herein include the liver-specific cis-regulatory elements (CREs), synthetic liver-specific cis-regulatory modules (CRMs), or synthetic liver-specific promoters disclosed in Table 4 herein.

[0038] In some embodiments of the methods and compositions disclosed herein, the nucleic acid sequence comprises a heterologous nucleic acid sequence encoding a GAA polypeptide, wherein the heterologous nucleic acid sequence is a human GAA gene, a human codon-optimized GAA gene (coGAA), or a modified GAA nucleic acid sequence. In some embodiments, the heterologous nucleic acid sequence is a codon-optimized GAA gene (coGAA) optimized for one or more of the following purposes: enhanced expression in vivo, reduction of CpG islands, or reduction of the innate immune response. In some embodiments, the heterologous nucleic acid sequence is a codon-optimized GAA gene (coGAA) optimized both to reduce CpG islands and to reduce the innate immune response.
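Codon optimization can lower CpG content without changing the encoded protein, because several amino acids (arginine among them) have synonymous codons both with and without a CpG dinucleotide. The sketch below counts CpG dinucleotides on one strand and computes the commonly used observed/expected CpG ratio, CpG count × length / (C count × G count). It is an illustration under these assumptions, not the procedure used to design the coGAA sequences, and it does not call CpG islands, which are regions defined by additional length and GC-content thresholds. The function names are hypothetical.

```python
def count_cpg(sequence: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G) on one strand."""
    seq = sequence.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G")


def cpg_observed_expected(sequence: str) -> float:
    """Observed/expected CpG ratio: CpG count * length / (C count * G count).

    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = sequence.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return count_cpg(seq) * len(seq) / (c_count * g_count)
```

For example, `CGCCGG` and `AGAAGG` both encode Arg-Arg, yet `count_cpg` gives 2 for the first and 0 for the second, which is the kind of synonymous swap that reduces CpG content.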

[0039] In some embodiments of the methods and compositions disclosed herein, the nucleic acid sequence comprises a heterologous nucleic acid sequence encoding a GAA polypeptide having the amino acid sequence of SEQ ID NO: 11 (full-length hGAA), SEQ ID NO: 55 (Dwight cDNA), SEQ ID NO: 56 (hGAA Δ1-66), SEQ ID NO: 182 (modGAA; H199R, R223H), SEQ ID NO: 170 (modGAA; H199R, R223H), or SEQ ID NO: 171 (modGAA; H199R, R223H, H201L), or encoding a GAA polypeptide having at least about 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with any of SEQ ID NOs: 11, 55, 56, or 182.
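The identity thresholds recited above presuppose a way of scoring two sequences. This section does not state which alignment method or denominator is used, so the following is only a simple sketch: it assumes the two sequences have already been aligned to equal length (gaps written as `-`) and divides the number of matching positions by the alignment length. Alignment tools such as BLAST or global aligners apply their own conventions and can report different values. `percent_identity` and `meets_threshold` are hypothetical names.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Positions involving a gap ('-') count as mismatches, and the denominator
    is the full alignment length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


def meets_threshold(seq_a: str, seq_b: str, threshold: float) -> bool:
    """True if the identity is at least `threshold` percent (e.g. 75, 90, 99)."""
    return percent_identity(seq_a, seq_b) >= threshold
```

For example, the toy sequences `MKHAR` and `MKRAH` share 3 of 5 positions, giving 60.0%, which falls below the lowest (75%) threshold recited in this paragraph.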

[0040] In some embodiments of the methods and compositions disclosed herein, the nucleic acid sequence comprises a heterologous nucleic acid sequence encoding a GAA polypeptide, the nucleic acid encoding the GAA polypeptide being selected from SEQ ID NO: 74 (codon optimized 1), SEQ ID NO: 75 (codon optimized 2), SEQ ID NO: 76 (codon optimized 3), and SEQ ID NO: 182 (modGAA; H199R, R223H), or any nucleic acid sequence having at least about 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with any of SEQ ID NOs: 74, 75, 76, or 182.
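Paragraph [0040] lists several codon-optimized GAA coding sequences that differ at the nucleotide level. Whether two such coding sequences encode the same polypeptide can be checked by translating both with the standard genetic code. The sketch below is illustrative: `translate` and `is_synonymous` are hypothetical helper names, the input is assumed to be an in-frame coding sequence, and the example sequences are toy fragments, not SEQ ID NOs: 74-76.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence; '*' marks stop codons."""
    dna = dna.upper().replace("U", "T")
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_synonymous(cds_a: str, cds_b: str) -> bool:
    """True if two coding sequences encode the same amino-acid sequence."""
    return translate(cds_a) == translate(cds_b)
```

For example, `ATGCGTCTG` and `ATGAGACTT` share only 6 of 9 nucleotides, yet both translate to `MRL`, illustrating how nucleotide identity can fall well below 100% while the encoded polypeptide is unchanged.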

[0041] Another aspect of the technology described herein relates to the use of the rAAV vector compositions and nucleic acid compositions disclosed herein in methods of treating disease. In particular, one aspect relates to their use in methods of treating a subject having glycogen storage disorder type II (GSD II, Pompe disease, acid maltase deficiency) or a deficiency in the alpha-glucosidase (GAA) polypeptide, wherein the method comprises administering to the subject the recombinant AAV vector, the rAAV genome, or the nucleic acid sequence disclosed herein. In some embodiments of the methods disclosed herein, the expressed GAA polypeptide is secreted from the liver of the subject and taken up by skeletal muscle tissue, cardiac muscle tissue, diaphragmatic muscle tissue, or a combination thereof, and this uptake reduces lysosomal glycogen storage in the tissue. In some embodiments of the disclosed methods, the recombinant AAV vector, rAAV genome, or nucleic acid sequence is administered to the subject by any suitable route, for example, a route selected from, but not limited to, intramuscular, subcutaneous, intraspinal, intracisternal, intrathecal, or intravenous administration. In some embodiments, the pharmaceutical compositions disclosed herein can be used in the disclosed methods.

[0042] Another aspect of the technology herein relates to cells comprising one or more of the rAAV compositions, rAAV genome compositions, or nucleic acid compositions disclosed herein. In some embodiments, the cells are human cells, or non-human mammalian cells, or insect cells.

[0043] Another aspect of the technology disclosed herein relates to a host animal comprising one or more of the rAAV compositions, rAAV genome compositions, or nucleic acid compositions disclosed herein. In some embodiments, the host animal is a mammal, a non-human mammal, or a human.

[0044] Another aspect of the technology disclosed herein relates to a host animal comprising at least one cell containing one or more of the rAAV compositions, rAAV genome compositions, or nucleic acid compositions disclosed herein. In some embodiments, the host animal comprising such modified cells is a mammal, a non-human mammal, or a human.

[0045] In some embodiments, disclosed herein is a pharmaceutical formulation comprising an rAAV vector or a nucleic acid encoding an rAAV genome, and a pharmaceutically acceptable carrier.

[0046] Aspects of the present invention provide certain advantages in construction and use, exemplary advantages of which are described below. Other features and advantages of aspects of the present invention will become apparent from the following more detailed description, taken in conjunction with the accompanying drawings, which illustrate, by way of example, the principles of aspects of the present invention.

[0047] This patent or application file contains at least one drawing executed in color. Copies of this patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee. The accompanying drawings illustrate aspects of the invention. In embodiments of the present invention, for example, the following items are provided:
(Item 1) A recombinant adeno-associated virus (AAV) vector comprising an AAV genome, the genome comprising: a. 5' and 3' AAV inverted terminal repeat (ITR) sequences; and b. a heterologous nucleic acid sequence encoding a polypeptide comprising an alpha-glucosidase (GAA) polypeptide, located between the 5' and 3' ITRs, wherein the heterologous nucleic acid sequence is operably linked to a liver-specific promoter selected from one of the following: i. CRM_SP0412 (SEQ ID NO: 86) or SP0412 (SEQ ID NO: 91), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 86 or SEQ ID NO: 91; ii. SP0422 (SEQ ID NO: 92), or a functional variant or functional fragment thereof having at least 60% of the activity of SEQ ID NO: 92; iii. CRM_SP0239 (SEQ ID NO: 87), SP0239 (SEQ ID NO: 93), or SP0238-UTR (SEQ ID NO: 147), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 87, SEQ ID NO: 93, or SEQ ID NO: 147; iv. CRM_SP0265 (SP0131_A1) (SEQ ID NO: 88), SP0265 (LVR_SP0131_A1) (SEQ ID NO: 94), or SP0265-UTR (SEQ ID NO: 146), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 88, SEQ ID NO: 94, or SEQ ID NO: 146; v. CRM_SP0240 (SEQ ID NO: 89), SP0240 (SEQ ID NO: 95), or SP0240-UTR (SEQ ID NO: 148), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 89, SEQ ID NO: 95, or SEQ ID NO: 148; or vi. CRM_SP0246 (SEQ ID NO: 90), SP0246 (SEQ ID NO: 96), or SP0246-UTR (SEQ ID NO: 149), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 90, SEQ ID NO: 96, or SEQ ID NO: 149.
(Item 2) The recombinant AAV vector according to item 1, wherein the heterologous nucleic acid sequence encodes a fusion protein comprising a secretory signal fused to the GAA polypeptide, a fusion polypeptide comprising a targeting peptide fused to the GAA polypeptide, or a fusion protein comprising a secretory signal and a targeting peptide fused to the GAA polypeptide.
(Item 3) The recombinant AAV vector according to item 2, wherein the AAV genome comprises, in the 5' to 3' direction: a. a 5'ITR; b. a liver-specific promoter sequence; c. an intron sequence; d. a nucleic acid encoding a secretory signal peptide; e. a nucleic acid encoding an IGF2-targeting peptide; f. a nucleic acid encoding an alpha-glucosidase (GAA) polypeptide; g. a polyA sequence; and h. a 3'ITR.
(Item 4) A recombinant AAV vector according to any one of items 1 to 3, wherein the nucleic acid encoding the secretory signal peptide encodes a signal sequence selected from any of the following: an AAT signal peptide, a fibronectin signal peptide (FN1), a GAA leader sequence, an IL-2 wt leader sequence, a modified IL-2 leader sequence, an IL2(1-3) leader sequence, an IgG leader sequence, an AAT leader sequence, or an active fragment thereof having secretory signaling activity.
(Item 5) A recombinant AAV vector according to any one of items 1 to 3, wherein the IGF2-targeting peptide binds to the human cation-independent mannose-6-phosphate receptor (CI-MPR) or IGF2 receptor.
(Item 6) The recombinant AAV vector according to item 5, wherein the IGF2-targeting peptide comprises SEQ ID NO: 5, or comprises a variant of SEQ ID NO: 5 having at least one amino acid modification that binds to the IGF2 receptor.
(Item 7) The recombinant AAV vector according to item 6, wherein the at least one amino acid modification in SEQ ID NO: 5 is a V43M amino acid modification (SEQ ID NO: 8 or SEQ ID NO: 9), Δ2-7 (SEQ ID NO: 6), or Δ1-7 (SEQ ID NO: 7).
(Item 8) The recombinant AAV vector according to item 1 or 2, wherein the nucleic acid sequence encodes a wild-type GAA polypeptide or a modified GAA polypeptide.
(Item 9) A recombinant AAV vector according to any one of items 1 to 8, wherein the nucleic acid sequence encoding the GAA polypeptide is a human GAA gene, a human codon-optimized GAA gene (coGAA), or a modified GAA nucleic acid sequence.
(Item 10) A recombinant AAV vector according to any one of items 1 to 9, wherein the nucleic acid sequence encoding the GAA polypeptide is modified from SEQ ID NO: 11 for one or more of the following: (i) codon optimization for enhanced expression in vivo, (ii) reduction of CpG islands, (iii) modification of the STOP sequence, (iv) reduction of alternative reading frames, and (v) reduction of the innate immune response.
(Item 11) A recombinant AAV vector according to any one of items 1 to 10, wherein the nucleic acid sequence encoding the GAA polypeptide encodes at least one, at least two, or at least three amino acid modifications selected from H201L, H199R, or R233H of SEQ ID NO: 10.
(Item 12) A recombinant AAV vector according to any one of items 1 to 11, wherein the encoded fusion polypeptide further comprises a spacer of at least one amino acid located between the C-terminus of the IGF2-targeting peptide and the N-terminus of the GAA polypeptide.
(Item 13) The recombinant AAV vector according to item 12, further comprising a nucleic acid encoding a spacer of at least one amino acid located between the nucleic acid encoding the IGF2-targeting peptide and the nucleic acid encoding the GAA polypeptide.
(Item 14) A recombinant AAV vector according to any one of items 1 to 13, further comprising at least one polyA sequence located 3' of the nucleic acid encoding the GAA polypeptide and 5' of the 3'ITR sequence.
(Item 15) A recombinant AAV vector according to any one of items 1 to 14, wherein the heterologous nucleic acid sequence further comprises a collagen stability (CS) sequence, a 3'UTR sequence, or both CS and 3'UTR sequences, located 3' of the nucleic acid encoding the GAA polypeptide and 5' of the 3'ITR sequence.
(Item 16) A recombinant AAV vector according to any one of items 1 to 15, further comprising a collagen stability (CS) sequence, a 3'UTR sequence, or a nucleic acid encoding both the CS and 3'UTR sequences, located between the nucleic acid encoding the GAA polypeptide and the poly(A) sequence.
(Item 17) A recombinant AAV vector according to any one of items 1 to 16, further comprising an intron sequence located 5' of the sequence encoding a secretory signal peptide and 3' of the promoter.
(Item 18) The recombinant AAV vector according to item 17, wherein the intron sequence comprises an MVM sequence, an HBB2 sequence, or an SV40 sequence.
(Item 19) A recombinant AAV vector according to any one of items 1 to 18, wherein the ITR comprises insertions, deletions, or substitutions.
(Item 20) The recombinant AAV vector according to item 19, wherein one or more CpG islands in the ITR have been removed.
(Item 21) A recombinant AAV vector according to any one of items 1 to 20, wherein: a. the nucleic acid encoding the secretory signal peptide is selected from the group consisting of: a nucleic acid encoding an AAT signal peptide (e.g., SEQ ID NO: 17) or an active fragment thereof having secretory signaling activity, or encoding an amino acid sequence having at least approximately 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity therewith (e.g., SEQ ID NOs: 17-22); a nucleic acid encoding a fibronectin signal peptide (FN1) (e.g., SEQ ID NOs: 18-21) or an active fragment thereof having secretory signaling activity, or encoding an amino acid sequence having at least approximately 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity therewith; a nucleic acid encoding the native GAA signal peptide (SEQ ID NO: 175) or an active fragment thereof having secretory signaling activity, or encoding an amino acid sequence having at least approximately 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 175; a nucleic acid encoding the hIGF2 signal peptide (e.g., SEQ ID NO: 22) or an active fragment thereof having secretory signaling activity, or encoding an amino acid sequence having at least approximately 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity therewith; a nucleic acid encoding the IgG1 leader peptide (SEQ ID NO: 177) or an active fragment thereof having secretory signaling activity, or encoding an amino acid sequence having at least approximately 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 177; a nucleic acid encoding the wtIL2 leader peptide (SEQ ID NO: 179) or an active fragment thereof having secretory signaling activity, or encoding an amino acid sequence having at least approximately 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 179; and a nucleic acid encoding a mutant IL2 leader peptide (SEQ ID NO: 181) or an active fragment thereof having secretory signaling activity, or encoding an amino acid sequence having at least approximately 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 181; and b. the nucleic acid encoding the GAA polypeptide is selected from the group consisting of SEQ ID NOs: 11, 72, and 182, and nucleic acid sequences having at least 60%, 70%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 11, 72, or 182.
(Item 22) A recombinant AAV vector according to any one of items 1 to 21, wherein the IGF2-targeting peptide is selected from SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 9.
(Item 23) A recombinant AAV vector according to any one of items 1 to 22, wherein the nucleic acid encoding the IGF2-targeting peptide is located between the nucleic acid encoding the secretory signal peptide and the nucleic acid encoding the alpha-glucosidase (GAA) polypeptide.
(Item 24) A recombinant AAV vector according to any one of items 1 to 23, which is a chimeric AAV vector, a haploid AAV vector, a hybrid AAV vector, or a polyploid AAV vector.
(Item 25) A recombinant AAV vector according to any one of items 1 to 24, which is a rational haploid vector, a mosaic AAV vector, a chemically modified AAV vector, or an AAV vector derived from any AAV serotype.
(Item 26) A recombinant AAV vector according to any one of items 1 to 25, selected from the group consisting of an AAVXL32 vector, an AAVXL32.1 vector, an AAV8 vector, and a haploid AAV8 vector containing at least one AAV8 capsid protein.
(Item 27) A recombinant AAV vector according to any one of items 1 to 26, having serotype AAV3b.
(Item 28) The recombinant AAV vector according to item 27, wherein the AAV3b serotype comprises one or more mutations in the capsid protein selected from 265D, 549A, and Q263Y.
(Item 29) The recombinant AAV vector according to item 28, wherein the AAV3b serotype is selected from AAV3b265D, AAV3b265D549A, AAV3b549A, AAV3bQ263Y, or AAV3bSASTG.
(Item 30) A recombinant adeno-associated virus (AAV) vector comprising an AAV genome and a capsid protein selected from serotypes AAV3, AAV3b, and AAV8, the genome comprising: a. 5' and 3' AAV inverted terminal repeat (ITR) sequences; and b. a heterologous nucleic acid sequence encoding a polypeptide comprising an alpha-glucosidase (GAA) polypeptide, located between the 5' and 3' ITRs, wherein the heterologous nucleic acid sequence is operably linked to a liver-specific promoter selected from one of the following: i. CRM_SP0412 (SEQ ID NO: 86) or SP0412 (SEQ ID NO: 91), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 86 or SEQ ID NO: 91; ii. SP0422 (SEQ ID NO: 92), or a functional variant or functional fragment thereof having at least 60% of the activity of SEQ ID NO: 92; iii. CRM_SP0239 (SEQ ID NO: 87), SP0239 (SEQ ID NO: 93), or SP0238-UTR (SEQ ID NO: 147), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 87, SEQ ID NO: 93, or SEQ ID NO: 147; iv. CRM_SP0265 (SP0131_A1) (SEQ ID NO: 88), SP0265 (LVR_SP0131_A1) (SEQ ID NO: 94), or SP0265-UTR (SEQ ID NO: 146), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 88, SEQ ID NO: 94, or SEQ ID NO: 146; v. CRM_SP0240 (SEQ ID NO: 89), SP0240 (SEQ ID NO: 95), or SP0240-UTR (SEQ ID NO: 148), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 89, SEQ ID NO: 95, or SEQ ID NO: 148; or vi. CRM_SP0246 (SEQ ID NO: 90), SP0246 (SEQ ID NO: 96), or SP0246-UTR (SEQ ID NO: 149), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 90, SEQ ID NO: 96, or SEQ ID NO: 149.
(Item 31) The recombinant AAV vector according to item 30, wherein the heterologous nucleic acid sequence encoding the GAA polypeptide further comprises a nucleic acid encoding a secretory signal peptide located 5' of the nucleic acid encoding the GAA polypeptide.
(Item 32) The recombinant AAV vector according to item 31, wherein the heterologous nucleic acid sequence encoding the GAA polypeptide further comprises a nucleic acid encoding a targeting peptide located between the nucleic acid encoding the secretory signal peptide and the nucleic acid encoding the alpha-glucosidase (GAA) polypeptide.
(Item 33) The recombinant AAV vector according to item 30, wherein the AAV genome comprises, in the 5' to 3' direction: a. a 5'ITR; b. a liver-specific promoter sequence; c. an intron sequence; d. a nucleic acid encoding a secretory signal peptide; e. a nucleic acid encoding an alpha-glucosidase (GAA) polypeptide; f. a polyA sequence; and g. a 3'ITR.
(Item 34) The recombinant AAV vector according to item 30, wherein the AAV genome comprises, in the 5' to 3' direction: a. a 5'ITR; b. a liver-specific promoter sequence; c. an intron sequence; d. a nucleic acid encoding a targeting peptide; e. a nucleic acid encoding an alpha-glucosidase (GAA) polypeptide; f. a polyA sequence; and g. a 3'ITR.
(Item 35) A recombinant AAV vector according to any one of items 30 to 34, wherein the secretory signal peptide is selected from an AAT signal peptide, a fibronectin signal peptide (FN1), a GAA leader sequence, an IL-2 wt leader sequence, a modified IL-2 leader sequence, an IL2(1-3) leader sequence, an IgG leader sequence, an AAT leader sequence, or an active fragment thereof having secretory signaling activity.
(Item 36) A recombinant AAV vector according to any one of items 30 to 35, wherein the targeting peptide is an IGF2-targeting peptide sequence that binds to the human cation-independent mannose-6-phosphate receptor (CI-MPR) or the IGF2 receptor, or a functional variant thereof.
(Item 37) The recombinant AAV vector according to item 36, wherein the IGF2-targeting peptide comprises SEQ ID NO: 5; comprises at least one amino acid modification in SEQ ID NO: 5 that does not affect binding to the CI-MPR receptor or that reduces binding to at least one serum IGF-binding protein (IGFBP); or comprises an amino acid sequence having at least 85% sequence identity with SEQ ID NO: 5.
(Item 38) A recombinant AAV vector according to any one of items 30 to 37, wherein the nucleic acid sequence encodes the wild-type GAA polypeptide of SEQ ID NO: 10 or a modified GAA polypeptide.
(Item 39) A recombinant AAV vector according to any one of items 30 to 38, wherein the nucleic acid sequence encodes a GAA polypeptide comprising at least one, at least two, or at least three amino acid modifications selected from H201L, H199R, or R233H of SEQ ID NO: 10.
(Item 40) A recombinant AAV vector according to any one of items 30 to 39, wherein the nucleic acid sequence encoding the GAA polypeptide is a human GAA gene, a human codon-optimized GAA gene (coGAA), or a modified GAA nucleic acid sequence.
(Item 41) A recombinant AAV vector according to any one of items 30 to 40, wherein the nucleic acid sequence encoding the GAA polypeptide is codon-optimized to reduce CpG islands.
(Item 42) A recombinant AAV vector according to any one of items 30 to 41, wherein the nucleic acid sequence encoding the GAA polypeptide is codon-optimized to reduce the innate immune response, to reduce CpG islands, or both to reduce CpG islands and to reduce the innate immune response.
(Item 43) A recombinant AAV vector according to any one of items 30 to 42, wherein the intron sequence comprises an MVM sequence or an HBB2 sequence.
(Item 44) A recombinant AAV vector according to any one of items 30 to 43, wherein the ITR comprises insertions, deletions, or substitutions, or one or more CpG islands in the ITR have been removed.
(Item 45) The recombinant AAV vector according to item 44, which is an AAVXL32, AAVXL32.1, or AAV8 vector, or a haploid AAV8 vector containing at least one AAV8 capsid protein.
(Item 46) A recombinant AAV vector according to any one of items 30 to 45, wherein: a. the nucleic acid encoding the secretory signal peptide is selected from the group consisting of: a nucleic acid encoding an AAT signal peptide (e.g., SEQ ID NO: 17) or an active fragment thereof having secretory signaling activity, or encoding an amino acid sequence having at least approximately 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity therewith (e.g., SEQ ID NOs: 17-22); a nucleic acid encoding a fibronectin signal peptide (FN1) (e.g., SEQ ID NOs: 18-21) or an active fragment thereof having secretory signaling activity, or encoding an amino acid sequence having at least approximately 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity therewith; a nucleic acid encoding the native GAA signal peptide (SEQ ID NO: 175) or an active fragment thereof having secretory signaling activity, or encoding an amino acid sequence having at least approximately 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 175; a nucleic acid encoding the hIGF2 signal peptide (e.g., SEQ ID NO: 22) or an active fragment thereof having secretory signaling activity, or encoding an amino acid sequence having at least approximately 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity therewith; a nucleic acid encoding the IgG1 leader peptide (SEQ ID NO: 177) or an active fragment thereof having secretory signaling activity, or encoding an amino acid sequence having at least approximately 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 177; a nucleic acid encoding the wtIL2 leader peptide (SEQ ID NO: 179) or an active fragment thereof having secretory signaling activity, or encoding an amino acid sequence having at least approximately 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 179; and a nucleic acid encoding a mutant IL2 leader peptide (SEQ ID NO: 181) or an active fragment thereof having secretory signaling activity, or encoding an amino acid sequence having at least approximately 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 181; and b. the nucleic acid encoding the GAA polypeptide is selected from the group consisting of SEQ ID NO: 11, SEQ ID NO: 72, and SEQ ID NO: 182, and nucleic acid sequences having at least 60%, 70%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 11, SEQ ID NO: 72, or SEQ ID NO: 182, or is a nucleic acid sequence encoding a GAA polypeptide having at least one, at least two, or at least three amino acid modifications selected from H201L, H199R, or R233H of SEQ ID NO: 10.
(Item 47) A recombinant AAV vector according to any one of items 30 to 46, wherein the IGF2-targeting peptide is selected from SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 9.
(Item 48) A recombinant AAV vector according to any one of items 30 to 47, wherein the IGF2-targeting peptide is SEQ ID NO: 8 or SEQ ID NO: 9, or a functional variant having at least 85% sequence identity thereto.
(Item 49) A pharmaceutical composition comprising a recombinant AAV vector according to any one of the preceding items in a pharmaceutically acceptable carrier.
(Item 50) A nucleic acid sequence comprising a liver-specific promoter operably linked to a nucleic acid sequence encoding a GAA polypeptide, wherein the liver-specific promoter is selected from any one of the following: i. CRM_SP0412 (SEQ ID NO: 86) or SP0412 (SEQ ID NO: 91), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 86 or SEQ ID NO: 91; ii. SP0422 (SEQ ID NO: 92), or a functional variant or functional fragment thereof having at least 60% of the activity of SEQ ID NO: 92; iii. CRM_SP0239 (SEQ ID NO: 87), SP0239 (SEQ ID NO: 93), or SP0238-UTR (SEQ ID NO: 147), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 87, SEQ ID NO: 93, or SEQ ID NO: 147; iv. CRM_SP0265 (SP0131_A1) (SEQ ID NO: 88), SP0265 (LVR_SP0131_A1) (SEQ ID NO: 94), or SP0265-UTR (SEQ ID NO: 146), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 88, SEQ ID NO: 94, or SEQ ID NO: 146; v. CRM_SP0240 (SEQ ID NO: 89), SP0240 (SEQ ID NO: 95), or SP0240-UTR (SEQ ID NO: 148), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 89, SEQ ID NO: 95, or SEQ ID NO: 148; or vi. CRM_SP0246 (SEQ ID NO: 90), SP0246 (SEQ ID NO: 96), or SP0246-UTR (SEQ ID NO: 149), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 90, SEQ ID NO: 96, or SEQ ID NO: 149.
(Item 51) A nucleic acid sequence for a recombinant adeno-associated virus (AAV) vector genome, comprising: a. 5' and 3' AAV inverted terminal repeat (ITR) nucleic acid sequences; and b. a heterologous nucleic acid sequence encoding a polypeptide comprising a secretory signal peptide and an alpha-glucosidase (GAA) polypeptide, located between the 5' and 3' ITR sequences, wherein the heterologous nucleic acid sequence is operably linked to a liver-specific promoter selected from one of the following: i. SP0422 (SEQ ID NO: 92), or a functional variant or functional fragment thereof having at least 60% of the activity of SEQ ID NO: 92; ii. CRM_SP0239 (SEQ ID NO: 87), SP0239 (SEQ ID NO: 93), or SP0238-UTR (SEQ ID NO: 147), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 87, SEQ ID NO: 93, or SEQ ID NO: 147; iii. CRM_SP0265 (SP0131_A1) (SEQ ID NO: 88), SP0265 (LVR_SP0131_A1) (SEQ ID NO: 94), or SP0265-UTR (SEQ ID NO: 146), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 88, SEQ ID NO: 94, or SEQ ID NO: 146; iv. CRM_SP0240 (SEQ ID NO: 89), SP0240 (SEQ ID NO: 95), or SP0240-UTR (SEQ ID NO: 148), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 89, SEQ ID NO: 95, or SEQ ID NO: 148; or v. CRM_SP0246 (SEQ ID NO: 90), SP0246 (SEQ ID NO: 96), or SP0246-UTR (SEQ ID NO: 149), or functional variants or functional fragments thereof having at least 60% of the activity of SEQ ID NO: 90, SEQ ID NO: 96, or SEQ ID NO: 149.
(Item 52) The nucleic acid sequence according to item 50 or 51, wherein the heterologous nucleic acid sequence encoding the GAA polypeptide further encodes an IGF2-targeting peptide located between the secretory signal peptide and the alpha-glucosidase (GAA) polypeptide.
(Item 53) The nucleic acid sequence according to any one of items 50 to 52, wherein the nucleic acid encoding the secretory signal is selected from SEQ ID NOs: 17, 22-26, 177, 179, or 181, or nucleic acids having at least 85% sequence identity therewith.
(Item 54) The nucleic acid sequence according to item 50, wherein the nucleic acid encoding the IGF2-targeting peptide is selected from SEQ ID NO: 2 (IGF2-Δ2-7), SEQ ID NO: 3 (IGF2-Δ1-7), or SEQ ID NO: 4 (IGF2 V43M), or a nucleic acid having at least 85% sequence identity therewith.
(Item 55) The nucleic acid sequence according to any one of items 50 to 54, wherein the nucleic acid sequence encoding the GAA polypeptide is a human GAA gene, a human codon-optimized GAA gene (coGAA), or a modified GAA nucleic acid sequence.
(Item 56) The nucleic acid sequence according to item 55, wherein the nucleic acid sequence encoding the GAA polypeptide is modified from SEQ ID NO: 11 for one or more of the following: (i) codon optimization for enhanced expression in vivo, (ii) reduction of CpG islands, (iii) modification of STOP sequences, (iv) reduction of alternative reading frames, and (v) reduction of the innate immune response.
(Item 57) The nucleic acid sequence according to item 55, wherein the nucleic acid sequence encoding the GAA polypeptide is codon-optimized to reduce CpG islands, to reduce the innate immune response, or both, and/or for enhanced expression in vivo.
(Item 58) The nucleic acid sequence according to any one of items 50 to 57, wherein the nucleic acid sequence encoding the GAA polypeptide is codon-optimized for enhanced expression in vivo.
(Item 59) The nucleic acid sequence according to any one of items 50 to 58, wherein the nucleic acid encoding the GAA polypeptide is selected from SEQ ID NO: 11 (full-length hGAA), SEQ ID NO: 55 (Dwight cDNA), SEQ ID NO: 56 (hGAA Δ1-66), SEQ ID NO: 82 (mod_hGAA), or SEQ ID NO: 182, or any nucleic acid sequence having at least 80%, 85%, 90%, 95%, or 98% identity therewith.
(Item 60) The nucleic acid sequence according to any one of items 50 to 58, wherein the nucleic acid encoding the GAA polypeptide is selected from SEQ ID NO: 74 (codon optimized 1), SEQ ID NO: 75 (codon optimized 2), SEQ ID NO: 76 (codon optimized 3), and SEQ ID NO: 82 (mod_hGAA), or any nucleic acid sequence having at least 80%, 85%, 90%, 95%, or 98% identity therewith.
(Item 61) The nucleic acid sequence according to any one of items 50 to 58, wherein the nucleic acid encodes a GAA polypeptide comprising at least one, at least two, or at least three amino acid modifications selected from H201L, H199R, or R233H of SEQ ID NO: 10.
(Item 62) A method for treating a subject having glycogen storage disease type II (GSD II, Pompe disease, acid maltase deficiency) or a deficiency in alpha-glucosidase (GAA) polypeptide, comprising administering to the subject a recombinant AAV vector, an rAAV genome, or a nucleic acid sequence according to any one of the preceding items. (Item 63) The method according to item 62, wherein a GAA polypeptide is secreted from the liver of the subject, the secreted GAA is taken up by skeletal muscle tissue, cardiac muscle tissue, diaphragmatic muscle tissue, or a combination thereof, and the uptake of the secreted GAA results in a reduction of lysosomal glycogen storage in the tissue. (Item 64) The method according to item 62, wherein the administering is selected from intramuscular, subcutaneous, intraspinal, intracisternal, intrathecal, or intravenous administration. (Item 65) The method according to item 62, wherein the recombinant AAV vector is a chimeric AAV vector, a haploid AAV vector, a hybrid AAV vector, or a polyploid AAV vector. (Item 66) The method according to item 62, wherein the recombinant AAV vector is a rational haploid vector, a mosaic AAV vector, a chemically modified AAV vector, or an AAV vector derived from any AAV serotype. (Item 67) The method according to item 62, wherein the recombinant AAV vector is an AAVXL32 vector, an AAVXL32.1 vector, an AAV8 vector, or a haploid AAV8 vector containing at least one AAV8 capsid protein. (Item 68) The method according to item 62, wherein the recombinant AAV vector is an AAV8 vector. (Item 69) A method for treating a subject having a lysosomal storage disorder (LSD), comprising administering to the subject a recombinant AAV vector, an rAAV genome, or a nucleic acid sequence according to any one of the preceding items, wherein the AAV vector expresses a polypeptide selected from the polypeptides in Table 5B or Table 6B. 
(Item 70) The method according to item 69, wherein the lysosomal storage disorder (LSD) is selected from those listed in Table 5A or Table 6A. (Item 71) The method according to item 69, wherein the recombinant AAV vector is a chimeric AAV vector, a haploid AAV vector, a hybrid AAV vector, or a polyploid AAV vector. (Item 72) The method according to item 69, wherein the recombinant AAV vector is a rational haploid vector, a mosaic AAV vector, a chemically modified AAV vector, or an AAV vector derived from any AAV serotype. (Item 73) The method according to item 69, wherein the recombinant AAV vector is an AAVXL32 vector, an AAVXL32.1 vector, an AAV8 vector, or a haploid AAV8 vector containing at least one AAV8 capsid protein.

[Brief Description of the Drawings]

[0048] [Figure 1] Figure 1 is a graph of vector genomes per diploid genome measured in whole blood for different AAV serotypes (AAV3b, AAV3ST, AAV8, and AAV9, shown on the x-axis), according to at least one embodiment.

[0049] [Figure 2] Figure 2 is a graph of vector genomes per diploid genome measured in the left, middle, and right lobes of the liver for the AAV serotypes AAV3b, AAV3ST, AAV8, and AAV9, according to at least one embodiment.
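As an illustration only, vector genomes per diploid genome are often estimated by normalizing qPCR copies of a vector target to copies of a host reference gene measured in the same DNA sample. The minimal Python sketch below assumes a single-copy autosomal reference gene (two copies per diploid genome); the function name and inputs are hypothetical and do not describe the assay actually used for Figures 1 and 2.

```python
def vg_per_diploid_genome(vector_copies: float,
                          reference_gene_copies: float,
                          ref_copies_per_diploid: int = 2) -> float:
    """Return vector genomes per diploid genome (hypothetical helper).

    Assumes both copy numbers come from the same DNA sample and that the
    reference gene is present at `ref_copies_per_diploid` copies per
    diploid genome (2 for a single-copy autosomal gene).
    """
    if reference_gene_copies <= 0:
        raise ValueError("reference_gene_copies must be positive")
    diploid_genomes = reference_gene_copies / ref_copies_per_diploid
    return vector_copies / diploid_genomes
```

For example, 5,000 vector copies against 20,000 reference-gene copies corresponds to 10,000 diploid genomes, i.e., 0.5 vector genomes per diploid genome under these assumptions.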

[0050] [Figure 3-1] Figures 3A and 3B show exemplary plasmids for the production of rAAV vectors useful in the methods and compositions disclosed herein. Figure 3A is a plasmid map of a pAAV-LSPhGAA plasmid for the production of an rAAV vector in a production cell line, e.g., a Pro10 cell line, according to at least one embodiment; the plasmid comprises a 5' ITR, an LSP, an hGAA nucleic acid sequence, a 3'UTR, a poly(A) sequence, and a 3' ITR, the ITRs being derived from AAV2. Figure 3B shows a more detailed version of the plasmid map in Figure 3A. [Figure 3-2] Same as above.

[0051] [Figure 4-1] Figures 4A-4G illustrate exemplary nucleic acid constructs for the rAAV genome disclosed herein that include a targeting peptide, using hGAA as the exemplary lysosomal protein to be expressed. Figure 4A shows a nucleic acid construct for the rAAV genome comprising a 5' ITR, a liver-specific promoter (LSP) operably linked to a heterologous nucleic acid encoding a secretory signal peptide (SS), a targeting peptide (TP), and a human GAA (hGAA) polypeptide, and a 3' ITR. Figure 4B shows an exemplary nucleic acid construct for the rAAV genome disclosed herein comprising the same elements as Figure 4A, with the addition of at least one polyA signal located 3' of the hGAA coding sequence and 5' of the 3' ITR. Figure 4C shows an exemplary nucleic acid construct for the rAAV genome disclosed herein comprising the same elements as Figure 4B, with the addition of an intron sequence located 3' of the promoter. Figure 4D shows an exemplary nucleic acid construct for the rAAV genome disclosed herein comprising the same elements as Figure 4C, with the addition of a collagen stability (CS) sequence and/or a 3'UTR sequence located 3' of the hGAA coding sequence and 5' of the polyA sequence. Figure 4E shows an exemplary nucleic acid construct for the rAAV genome disclosed herein comprising the same elements as Figure 4D, with the addition of a nucleic acid encoding a spacer of at least one amino acid, located between the nucleic acid encoding the targeting peptide (TP), for example, the IGF2 targeting peptide, and the nucleic acid encoding the hGAA polypeptide. 
Figure 4F shows an exemplary nucleic acid construct for the rAAV genome disclosed herein comprising the same elements as Figure 4E, wherein the promoter is a liver-specific promoter; the intron sequence is selected from the MVM or HBB2 intron sequence; the secretory signal peptide is selected from the FN1 signal peptide (e.g., hFN1, ratFN1), the AAT signal peptide, or the hGAA signal peptide; the targeting peptide is an IGF2 targeting peptide disclosed herein; and the at least one polyA sequence is selected from the hGHpA or synPA polyA sequence. Figure 4G shows an exemplary nucleic acid construct for the rAAV genome disclosed herein comprising the same elements as Figure 4F, except that the IGF2-targeting peptide is encoded by a nucleic acid sequence selected from SEQ ID NO: 2 (IGF2 Δ2-7), SEQ ID NO: 3 (IGF2 Δ1-7), or SEQ ID NO: 4 (IGF2 V43M). [Figure 4-2] Same as above.
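The element orders described for Figures 4A-4E can be represented as ordered lists and checked for ITR placement and 5'-to-3' ordering. This Python sketch is an illustration under stated assumptions: the element labels and the `check_layout` helper are hypothetical, the CS sequence and/or 3'UTR of Figure 4D is collapsed into a single label, and sequence content is not modeled.

```python
from typing import List

# Hypothetical shorthand labels following the element names of Figures 4A-4E.
FIG4 = {
    "4A": ["5'ITR", "LSP", "SS", "TP", "hGAA", "3'ITR"],
    "4B": ["5'ITR", "LSP", "SS", "TP", "hGAA", "polyA", "3'ITR"],
    "4C": ["5'ITR", "LSP", "intron", "SS", "TP", "hGAA", "polyA", "3'ITR"],
    "4D": ["5'ITR", "LSP", "intron", "SS", "TP", "hGAA", "CS/3'UTR",
           "polyA", "3'ITR"],
    "4E": ["5'ITR", "LSP", "intron", "SS", "TP", "spacer", "hGAA",
           "CS/3'UTR", "polyA", "3'ITR"],
}

# Relative 5'-to-3' order that internal elements must follow when present.
EXPECTED_ORDER = ["LSP", "intron", "SS", "TP", "spacer", "hGAA",
                  "CS/3'UTR", "polyA"]


def check_layout(elements: List[str]) -> List[str]:
    """Return a list of layout problems; an empty list means none found."""
    problems = []
    if not elements or elements[0] != "5'ITR":
        problems.append("genome must start with the 5' ITR")
    if not elements or elements[-1] != "3'ITR":
        problems.append("genome must end with the 3' ITR")
    if "hGAA" not in elements:
        problems.append("missing GAA coding sequence")
    positions = [elements.index(e) for e in EXPECTED_ORDER if e in elements]
    if positions != sorted(positions):
        problems.append("elements are out of 5'-to-3' order")
    return problems
```

For example, `check_layout(FIG4["4E"])` returns an empty list, whereas a layout that places hGAA upstream of the LSP is flagged as out of order.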

[0052] [Figure 5-1] Figures 5A-5G show exemplary nucleic acid constructs for the rAAV genome. Figure 5A is a schematic diagram of an exemplary rAAV genome including a 5' ITR, a liver-specific promoter operably linked to a nucleic acid encoding the hGAA polypeptide, one or more polyA sequences (e.g., hGHpA, synPA, RBG, or SV40 polyA sequences), and a 3' ITR. Figure 5B is a schematic diagram of an exemplary rAAV genome including a 5' ITR, a liver-specific promoter operably linked to a nucleic acid encoding a secretory signal peptide (e.g., selected from FN1, AAT, a native GAA signal peptide, IL2, mutIL2, or IgG) and a human GAA polypeptide, a polyA sequence, and a 3' ITR. Figure 5C is a schematic diagram of an exemplary rAAV genome including a 5' ITR, a liver-specific promoter operably linked to an intron sequence (e.g., an MVM, SV40, or HBB2 intron sequence), a nucleic acid encoding a secretory signal peptide (e.g., selected from FN1, AAT, a native GAA signal peptide, IL2, mutIL2, or IgG), a nucleic acid encoding a human GAA polypeptide, a poly(A) sequence, and a 3' ITR. Figure 5D is a schematic diagram of a construct similar to Figure 5C that includes a collagen stability (CS) sequence or 3'UTR located between the 3' end of the GAA-encoding nucleic acid and at least one poly(A) sequence (e.g., an hGHpA and/or synPA poly(A) sequence). In some embodiments, the construct includes both the CS sequence and the 3'UTR sequence disclosed herein. In some embodiments, the CS sequence may be replaced by the 3'UTR sequence disclosed herein. In Figures 5A-5D, exemplary liver-specific promoters may be selected from, but are not limited to, those disclosed in Table 4 of this specification, and include sequences such as SEQ ID NOs: 86, 91-96, or 146-150, or sequences having at least 85% sequence identity with SEQ ID NOs: 86, 91-96, or 146-150. Figure 5E is a schematic diagram of one embodiment of an AAV vector useful in the methods and compositions disclosed herein for treating Pompe disease, comprising, flanked by the 5' and 3' ITR sequences and in the 5'-to-3' direction, an LSP promoter, a Kozak sequence, a signal sequence (referred to as the leader sequence in Figure 5E), a nucleic acid encoding hGAA, and a polyA sequence. In some embodiments, the leader sequence can be selected from a native GAA leader sequence, an IL2 leader sequence (IL2 wt), a modified IL2 leader sequence (IL2 mut), or an IgG leader sequence, or functional variants thereof, and the hGAA sequence can be selected from a consensus hGAA nucleic acid sequence or an hGAA nucleic acid having at least the H201L mutation or other modifications disclosed herein (e.g., H199R, R223H). Figure 5F is a schematic diagram of another embodiment of an AAV vector useful in the methods and compositions disclosed herein for treating Pompe disease, comprising, in the 5'-to-3' direction, a liver-specific promoter, an intron sequence, a Kozak sequence, a signal sequence (also referred to as a leader sequence), an IGF2-targeting peptide sequence (referred to as "GILT" in Figure 5F), a nucleic acid encoding hGAA, optionally a 3'UTR sequence, and a polyA sequence. 
For example, the promoter can be selected from any LSP disclosed herein, e.g., LSPs having different levels of expression such as a high-expression LSP (LSP-H), a medium-expression LSP (LSP-M), or a low-expression LSP (LSP-L); the intron sequence can be selected from HBB2, MVM, SV40, and other intron sequences; the leader sequence can be a native GAA leader sequence, an AAT sequence (referred to as A1AT in Figure 5F), IL2 (1-3), an IL2 leader sequence (IL2 wt), a modified IL2 leader sequence (IL2 mut), a fibronectin leader sequence (FN1, referred to as FBN in Figure 5F), or an IgG leader sequence, or functional variants thereof; the IGF2-targeting peptide sequence can be selected from any of the IGF2-targeting peptides described herein, e.g., WT IGF2 (SEQ ID NO: 1), Δ2-7, V43M (SEQ ID NO: 9), Δ2-7V43M, or functional variants thereof; the hGAA nucleic acid sequence is, for example, C1-10, may optionally include at least the H201L mutation and/or other modifications disclosed herein (e.g., H199R, R223H), and is codon-optimized as disclosed herein; and the polyA sequence is selected from, for example, RBG or SV40 polyA. Figure 5F shows different embodiments combining these elements. The LSPs designated LSP-H, LSP-M, and LSP-L are liver-specific promoters that primarily and preferentially express hGAA in the liver but can also express hGAA in one or more other tissues, e.g., muscle. 
Such LSPs enable expression in the liver for systemic secretion and subsequent uptake by muscle cells, as well as some expression in muscle tissue. Figure 5G shows schematic diagrams of different embodiments of AAV vector constructs useful in the methods and compositions disclosed herein for treating Pompe disease. Construct 1 (upper panel) is an rAAV vector construct comprising, in the 5'-to-3' direction, a 5' ITR, an AAV P5 promoter, a liver-specific promoter (LSP), an hGAA nucleic acid sequence, a truncated poly(A) sequence (t-pA), and a 3' ITR. Construct 2 (lower panel) is an exemplary rAAV vector construct from which the AAV P5 promoter fragment has been removed, comprising, in the 5'-to-3' direction, a 5' ITR, a liver-specific promoter (LSP), an hGAA nucleic acid sequence, a full-length poly(A) sequence (fl-pA), a terminator sequence in antisense orientation, and a 3' ITR (sense orientation). [Figure 5-2] Same as above. [Figure 5-3] Same as above. [Figure 5-4] Same as above.
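Several embodiments above are defined by percent sequence identity (e.g., at least 85% identity with SEQ ID NOs: 86, 91-96, or 146-150). This passage does not state how identity is calculated, so the following Python sketch assumes a simple Needleman-Wunsch global alignment (match +1, mismatch -1, gap -1) and defines identity as identical aligned positions divided by alignment length. The scoring scheme and helper names are assumptions for illustration; established alignment tools with documented parameters would normally be used in practice.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1):
    """Needleman-Wunsch global alignment; returns the two aligned strings."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned columns divided by alignment length, times 100."""
    x, y = global_align(a.upper(), b.upper())
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x) if x else 0.0
```

Under these assumptions, an "at least 85% identity" condition would read `percent_identity(candidate, reference) >= 85.0`. The full scoring table uses memory proportional to the product of the two lengths, which is acceptable for promoter-length sequences but not for much longer inputs.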

[0053] [Figure 6] Figure 6 illustrates the Gibson cloning technique for constructing the rAAV genome disclosed herein. In particular, a three-fragment assembly can be performed to join three nucleic acid sequence blocks, and the product can then be cloned into a vector having a promoter, for example, a liver-specific promoter, and 5' and 3' ITRs to construct the rAAV genome. The following rAAV genomes were constructed using the Gibson cloning method: SEQ ID NO: 57 (AAT-V43M-wtGAA(delta 1~69aa)); SEQ ID NO: 58 (ratFN1-IGF2V43M-wtGAA(delta 1~69aa)); SEQ ID NO: 59 (hFN1-IGF2V43M-wtGAA(delta 1~69aa)); SEQ ID NO: 60 (AAT-IGF2Δ2~7-wtGAA(delta 1~69)); SEQ ID NO: 61 (FN1rat-IGFΔ2~7-wtGAA(delta 1~69)); SEQ ID NO: 62 (hFN1-IGFΔ2~7-wtGAA(delta 1~69)).
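Gibson assembly joins fragments whose ends share homologous overlaps. The in silico Python sketch below assumes that each block's 3' end overlaps the next block's 5' end by at least `min_overlap` nucleotides and simply collapses each overlap once; the 15-nt default and the function names are assumptions, and the enzymatic steps of the reaction are not modeled.

```python
def find_overlap(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`,
    or 0 if no such overlap of at least `min_overlap` nt exists."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if k > 0 and left[-k:] == right[:k]:
            return k
    return 0


def gibson_assemble(blocks, min_overlap: int = 15) -> str:
    """Join sequence blocks 5'->3', keeping each shared overlap only once."""
    product = blocks[0]
    for nxt in blocks[1:]:
        k = find_overlap(product, nxt, min_overlap)
        if k == 0:
            raise ValueError("adjacent blocks do not share a sufficient overlap")
        product += nxt[k:]
    return product
```

Taking the longest matching overlap is a simplification; real assembly designs use overlaps chosen to be unique so that only the intended junction can anneal.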

[0054] [Figure 7] Figure 7 shows the construction of an exemplary rAAV genome of SEQ ID NO: 57, containing AAT-V43M-wtGAA (delta 1-69aa), by Gibson cloning of nucleic acid sequence blocks 1, 2, and 3. Those skilled in the art can readily replace the TTR liver promoter with any of the liver-specific promoters disclosed in Table 4 herein, including, but not limited to, promoters selected from SEQ ID NOs: 86, 91-96, and 146-150, or functional variants or functional fragments thereof. Figure 7 also shows, in the AAT-V43M-wtGAA(delta1~69aa) vector, the location of a 3-amino-acid (3aa) spacer nucleic acid sequence located 3' of the nucleic acid sequence encoding the IGF2(V43M) targeting peptide and 5' of the nucleic acid encoding the wtGAA(Δ1-69) enzyme (the 3aa sequence "GAP" is exemplified as SEQ ID NO: 31), and the location of the stuffer nucleic acid sequence located 3' of the poly(A) sequence and 5' of the 3' ITR sequence (referred to as the "spacer" sequence in Figure 8).
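The spacer described above is a three-codon insertion that reads "GAP" between the IGF2 targeting peptide and GAA. The Python sketch below, using the standard genetic code, checks that coding segments are joined in frame and translate without a premature stop codon. The segment sequences are hypothetical placeholders (GGC-GCC-CCC is one possible encoding of GAP, not necessarily the codons used in the constructs described here), and the helper names are assumptions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def assemble_fusion_orf(segments):
    """Join (name, dna) coding segments 5'->3' and verify the reading frame.

    Each segment must be a whole number of codons so that downstream
    segments stay in frame; a stop codon may appear only at the very end.
    """
    for name, dna in segments:
        if len(dna) % 3:
            raise ValueError(f"{name} is not a whole number of codons")
    orf = "".join(dna for _, dna in segments)
    protein = translate(orf)
    if "*" in protein[:-1]:
        raise ValueError("premature stop codon in the fusion ORF")
    return orf, protein
```

With hypothetical segments such as `("signal", "ATGAAA")`, `("spacer", "GGCGCCCCC")`, and `("gaa_fragment", "TGGTAA")`, the fusion translates to `MKGAPW*`, with the spacer contributing the GAP residues in frame.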

[0055] [Figure 8] Figure 8 shows the construction of the rAAV genome of SEQ ID NO: 62, containing hFN1-IGFΔ2~7-wtGAA (delta 1~69), by Gibson cloning of nucleic acid sequence blocks 8, 2, and 3. Those skilled in the art can readily replace the TTR liver promoter with any of the liver-specific promoters disclosed in Table 4 herein, including, but not limited to, promoters selected from SEQ ID NOs: 86, 91-96, and 146-150, or functional variants or functional fragments thereof. Figure 8 also shows, in the hFN1-IGFΔ2~7-wtGAA(delta 1~69) vector, the location of a 3-amino-acid (3aa) spacer nucleic acid sequence located 3' of the nucleic acid sequence encoding the IGFΔ2-7 targeting peptide and 5' of the nucleic acid encoding the wtGAA(Δ1-69) enzyme (the 3aa sequence "GAP" is exemplified as SEQ ID NO: 31), and the location of the stuffer nucleic acid sequence located 3' of the poly(A) sequence and 5' of the 3' ITR sequence (referred to as the "spacer" sequence in Figure 13).

[0056] [Figure 9-1] Figures 9A-9F show schematic diagrams of exemplary rAAV genome constructs expressing wild-type GAA. Figure 9A shows a schematic diagram of an exemplary rAAV genome construct for candidate 1_AAT_hIGF2-V43M_wtGAA_del1-69_Stuffer.V02 (SEQ ID NO: 79). Figure 9B shows a schematic diagram of an exemplary rAAV genome construct for candidate 2_FIBrat_hIGF2-V43M_wtGAA_del1-69_Stuffer.V02 (SEQ ID NO: 80). Figure 9C shows a schematic diagram of an exemplary rAAV genome construct for candidate 3_FIBhum_hIGF2-V43M_wtGAA_del1-69_Stuffer.V02 (SEQ ID NO: 81). Figure 9D shows a schematic diagram of an exemplary rAAV genome construct for candidate 4_AAT_GILT_wtGAA_del1-69__Stuffer.V02 (SEQ ID NO: 82). Figure 9E shows a schematic diagram of an exemplary rAAV genome construct for candidate 5_FIBrat_GILT_wtGAA_del1-69_Stuffer.V02 (SEQ ID NO: 83). Figure 9F shows a schematic diagram of an exemplary rAAV genome construct for candidate 6_FIBhum_GILT_wtGAA_del1-69_Stuffer.V02 (SEQ ID NO: 84). Those skilled in the art can readily replace the TTR liver promoter shown in Figures 9A-9F with any liver-specific promoter disclosed in Table 4 herein, including any LSP, e.g., a promoter selected from, but not limited to, SEQ ID NOs: 86, 91-96, or 146-150. Alternatively, the TTR promoter can be replaced with an LSP that can preferentially express the hGAA polypeptide in the liver and in at least one other tissue of interest, e.g., muscle or CNS. 
In some embodiments, the TTR promoter can be replaced with an LSP that can preferentially express the hGAA polypeptide in the liver as well as in muscle and CNS tissues. In some embodiments, the expressed lysosomal enzyme, e.g., GAA protein, may be configured as a GAA fusion protein with a targeting sequence, such as the IGF2-targeting peptide disclosed herein, which targets the GAA protein to lysosomes, and/or may be fused with a signal peptide (SP); the GAA protein is expressed from the rAAV genome in the liver, secreted, and taken up into lysosomes of mammalian cells, particularly muscle cells. Because these are exemplary constructs for illustrative purposes only, the wtGAA sequence can readily be replaced with a codon-optimized sequence disclosed herein, or with a GAA sequence modified to reduce CpG islands and/or reduce innate immunity, as disclosed herein (see Figure 11B).

[0057] [Figure 9-2] Same as above. [Figure 9-3] Same as above.

[0058] [Figure 10] Figure 10 shows mean in vivo luciferase expression in mice driven by the exemplary liver-specific promoters SP0244 and SP0239. Expression is shown as mean bioluminescence, measured as total flux (photons per second). Error bars represent the standard error of the mean. No luciferase bioluminescence was detected in animals injected with saline alone (n=10). Luciferase bioluminescence was detected in animals injected with a construct containing luciferase operably linked to the LP1 promoter (n=9). To test the activity of the exemplary liver-specific promoters, animals were injected with equivalent constructs containing luciferase operably linked to the SP0244 promoter (n=8) or the SP0239 promoter (n=10). Promoters SP0244 and SP0239 showed higher in vivo luciferase expression than the LP1 control.
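Figure 10 reports group means with standard-error error bars. As a sketch only, the Python code below computes a group mean and the standard error of the mean (sample standard deviation divided by the square root of n), plus the fold change of a test group's mean over a control group's mean. No values from Figure 10 are reproduced, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample SD (n - 1)."""
    if len(values) < 2:
        raise ValueError("at least two values are needed to estimate the SEM")
    return mean(values), stdev(values) / math.sqrt(len(values))


def fold_over_control(test_values, control_values):
    """Ratio of the test-group mean to the control-group mean."""
    return mean(test_values) / mean(control_values)
```

In this framing, each promoter group (e.g., n=8 or n=10 animals) would contribute one list of total-flux values, and the LP1 group would serve as the control.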

[0059] [Figure 11-1] Figures 11A-11D show exemplary modifications to nucleic acid sequences encoding the GAA polypeptide, and nucleic acid constructs, for optimizing in vivo GAA protein expression by AAV. Figure 11A is a schematic diagram of a wild-type GAA (wtGAA) nucleotide sequence operably linked to a liver-specific promoter disclosed herein, e.g., an LSP in Table 4, showing alternative reading frames (indicated by arrows) and three CpG islands. Figure 11B is a schematic diagram similar to Figure 11A showing in more detail the modifications to the GAA-encoding nucleic acid sequence that remove CpG islands. Figure 11C shows SEQ ID NO: 182, a wtGAA nucleic acid sequence modified to (i) reduce alternative reading frames, (ii) reduce the number of CpG islands, and (iii) provide an optimal Kozak sequence. Figure 11D is another schematic diagram of modifications to the nucleic acid sequence encoding the GAA polypeptide that reduce the number of CpG islands, provide an optimal Kozak sequence, and reduce alternative reading frames. [Figure 11-2] Same as above. [Figure 11-3] Same as above. [Figure 11-4] Same as above.
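The sequence features named in Figure 11 can be screened computationally. The Python sketch below counts CpG dinucleotides (a simpler proxy than formally defined CpG islands), lists ATG-initiated open reading frames in the two alternative forward frames relative to a main ORF assumed to start at index 0, and tests whether an ATG sits in a strong Kozak context (a purine at -3 and G at +4). The thresholds and helper names are assumptions for illustration, not the procedure used to derive SEQ ID NO: 182.

```python
STOPS = {"TAA", "TAG", "TGA"}


def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (5'-CG-3') on the given strand."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def alternative_orfs(seq: str, min_codons: int = 50):
    """Return (start, end) of ATG-initiated ORFs in forward frames +2 and +3.

    Assumes the main ORF starts at index 0 (frame +1). `end` is exclusive
    and includes the stop codon when one occurs before the sequence ends.
    """
    seq = seq.upper()
    found = []
    for frame in (1, 2):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOPS:
                j += 3
            end = j + 3 if j + 3 <= len(seq) else j
            if (j - i) // 3 >= min_codons:
                found.append((i, end))
            i = j + 3  # resume scanning after this ORF
    return found


def strong_kozak(seq: str, atg_index: int) -> bool:
    """True if the ATG at atg_index has a purine at -3 and a G at +4."""
    seq = seq.upper()
    if atg_index < 3 or atg_index + 3 >= len(seq):
        return False
    if seq[atg_index:atg_index + 3] != "ATG":
        return False
    return seq[atg_index - 3] in "AG" and seq[atg_index + 3] == "G"
```

A design loop built on such checks would, for example, prefer synonymous codon choices that lower `count_cpg` and remove entries from `alternative_orfs` without changing the encoded GAA protein.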

[0060] [Figure 12] Figure 12 shows a schematic diagram of an exemplary rAAV construct including an LSP for expressing GAA under a liver-specific promoter. The LSP can be selected from any of the liver-specific promoters disclosed in Table 4 herein, with or without a stuffer sequence.

[0061] [Figure 13] Figures 13A and 13B show GAA expression from constructs containing the liver-specific promoters SP0412 and SP0422 in Huh7 and HepG2 cells. Figure 13A shows a Western blot of GAA expression from constructs containing the liver-specific promoters SP0412 (SEQ ID NO: 91) and SP0422 (SEQ ID NO: 92) in Huh7 cells. Figure 13A shows that hGAA expression using promoters 412 (SEQ ID NO: 91) and 422 (SEQ ID NO: 92) is significantly higher in Huh7 cells than expression using the LP1 promoter (SEQ ID NO: 432), referred to as "LSP SS". Figure 13B shows a Western blot of GAA expression from constructs containing the liver-specific promoters SP0412 (SEQ ID NO: 91) and SP0422 (SEQ ID NO: 92) in HepG2 cells. GAA polypeptides were expressed from rAAVs constructed using the following plasmids: LSP NEW (SEQ ID NO: 160), 412 NEW (SEQ ID NO: 159), TTR NEW (SEQ ID NO: 155), LSP ss (AAV with LP-1), 412 TTR, 422 stuffer (SEQ ID NO: 158), 422 TTR, and 412 stuffer (SEQ ID NO: 156). Figure 13B shows that hGAA expression using promoters 412 (SEQ ID NO: 91) and 422 (SEQ ID NO: 92) is significantly higher in HepG2 cells than expression using the LP1 promoter (SEQ ID NO: 432), referred to as "LSP SS".

[Modes for Carrying Out the Invention]

[0062] The drawings described above illustrate aspects of the invention in at least one of its exemplary embodiments, which are further defined in detail in the following description. Features, elements, and aspects of the invention referenced by the same numbers in different drawings represent the same, equivalent, or similar features, elements, or aspects in one or more embodiments.

[0063] Detailed Description. The disclosure described herein generally relates to recombinant AAV (rAAV) vectors and rAAV genome constructs for gene therapy, for the targeted delivery of lysosomal proteins such as GAA polypeptides. In particular, the technology described herein generally relates to rAAV vectors or rAAV genomes for producing lysosomal proteins, such as GAA polypeptides, that are expressed in the liver and effectively targeted to lysosomes in mammalian cells, such as human cardiac muscle cells and skeletal muscle cells. For example, the technology relates to an rAAV vector that transduces hepatocytes; the transduced hepatocytes secrete GAA polypeptides, and the secreted GAA polypeptides are targeted to lysosomes in skeletal muscle tissue, cardiomyocytes, diaphragmatic muscle tissue, or a combination thereof.

[0064] Accordingly, one aspect of the technique described herein provides an rAAV vector comprising an rAAV genome that can be used to produce a lysosomal protein, such as GAA or modified GAA, which is more effectively secreted from cells, such as liver cells, and then targeted to lysosomes in mammalian cells, such as human cardiac muscle cells and skeletal muscle cells.

[0065] In particular, in some embodiments, the lysosomal protein, e.g., GAA polypeptide, is expressed by itself. In some embodiments, the lysosomal protein is expressed as a fusion protein comprising at least a signal peptide that promotes the secretion of the lysosomal protein, e.g., GAA polypeptide, from the liver. In some embodiments, the GAA polypeptide or modified GAA is expressed as a fusion protein comprising at least a signal peptide that promotes the secretion of the GAA polypeptide from the liver, as well as a targeting sequence that enables effective targeting to lysosomes in mammalian cells, e.g., muscle cells, e.g., human cardiac muscle cells and skeletal muscle cells. In some embodiments, the targeting peptide is an IGF2-targeting peptide as described herein.

[0066] One aspect of the technology described herein relates to an rAAV vector for use in treating diseases such as Pompe disease, comprising a nucleotide sequence containing inverted terminal repeats (ITRs), a liver-specific promoter, a heterologous gene (transgene), a polyA tail, and optionally other regulatory elements, wherein the heterologous gene is GAA, and wherein the rAAV-GAA can be administered to a patient at a therapeutically effective dose delivered to the appropriate tissues and/or organs for expression of the heterologous gene and treatment of the disease.

[0067] One aspect of the technology described herein relates to an rAAV vector whose genome comprises, between the 5' and 3' ITRs in the 5'-to-3' direction, a heterologous nucleic acid sequence encoding an alpha-glucosidase (GAA) polypeptide, wherein the heterologous nucleic acid is operably linked to a liver-specific promoter, for example, one of the liver-specific promoters disclosed in Table 4 herein, or a functional variant thereof. Another aspect of the technology described herein relates to an rAAV vector whose genome comprises, between the 5' and 3' ITRs in the 5'-to-3' direction, a heterologous nucleic acid sequence encoding a secretory signal peptide (SS) and a nucleic acid sequence encoding an alpha-glucosidase (GAA) polypeptide, wherein the heterologous nucleic acid is operably linked to a liver-specific promoter, for example, one of the liver-specific promoters disclosed in Table 4 herein, or a functional variant thereof.

[0068] One aspect of the technology described herein relates to an rAAV vector whose genome comprises, between the 5' and 3' ITRs, a heterologous nucleic acid sequence encoding a fusion polypeptide comprising (i) a secretory signal peptide (SS), (ii) an IGF2-targeting peptide, and (iii) an alpha-glucosidase (GAA) polypeptide, wherein the heterologous nucleic acid is operably linked to a liver-specific promoter, for example, one of the liver-specific promoters disclosed in Table 4 herein, or a functional variant thereof.

[0069] In all embodiments of the technology described herein, the liver-specific promoter preferentially expresses a lysosomal protein, such as the hGAA polypeptide, in the liver. In some embodiments, the liver-specific promoter preferentially expresses the lysosomal protein in the liver and in at least one other tissue of interest, such as muscle or CNS, and in some embodiments, the LSP can be replaced with an LSP that preferentially expresses the hGAA polypeptide in the liver and in muscle and CNS tissues. In embodiments where the AAV vector includes at least one capsid protein that targets muscle, the liver-specific promoter can be replaced with another promoter, such as a muscle promoter.

[0070] In some embodiments of the methods and compositions disclosed herein, the secretory signal peptide is selected from an AAT signal peptide, a fibronectin (FN1) signal peptide, a GAA signal peptide, or active fragments thereof having secretory signal activity.

[0071] In some embodiments, the rAAV vectors described herein are derived from any serotype. In some embodiments, the rAAV vector is of the AAV3b serotype, including, but not limited to, an AAV3b265D virion, AAV3b265D549A virion, AAV3b549A virion, AAV3bQ263Y virion, or AAV3bSASTG virion (i.e., a virion comprising an AAV3b capsid containing the Q263A/T265 mutation). In some embodiments, the rAAV vector comprises a liver-specific capsid, for example, a liver-specific capsid selected from XL32 and XL32.1 disclosed in WO2019/241324, which is incorporated herein by reference in its entirety. In some embodiments, the rAAV vector is AAVXL32 or AAVXL32.1 disclosed in WO2019/241324. In some embodiments, the rAAV vector is an rAAV8 vector, or a haploid rAAV vector comprising at least one capsid protein derived from AAV8 (i.e., one or more of VP1, VP2, and VP3 is derived from AAV8 or is a chimeric protein thereof). In some embodiments, the AAV vector comprises a capsid disclosed in WO2019241324A1 or International Patent Application PCT/US2019/036676, which are incorporated herein by reference in their entirety. In some embodiments, the AAV vector comprises a capsid encoded by an AAV capsid coding sequence that is at least 90% identical to (a) the nucleotide sequence encoding any one of SEQ ID NOs: 1-3 disclosed in WO2019241324A1 or (b) any one of SEQ ID NOs: 4-6 disclosed in WO2019241324A1. In some embodiments, the AAV capsid of an AAV particle comprising the AAV vector genome and AAV capsid of the present invention has an amino acid sequence at least 90% identical to any one of SEQ ID NOs: 4-6 disclosed in WO2019241324A1. In some embodiments, the rAAV vector includes a capsid protein that enables the AAV vector to transduce liver cells, and in some embodiments, the rAAV vector includes a capsid protein that enables the AAV vector to transduce muscle and liver cells. 
In such embodiments, where the rAAV includes a capsid protein that enables transduction of muscle cells, the LSP can be replaced with another promoter, for example, a muscle promoter, or a promoter that expresses the protein in liver and muscle cells.

I. Definitions

[0072] The following terms are used in this specification and in the appended claims.

[0073] In the context of describing the present invention (in particular, in the context of the following claims), the terms “a,” “an,” “the,” and similar referents are to be construed to cover both the singular and the plural unless otherwise indicated herein or clearly contradicted by context. Furthermore, sequential designations such as “first,” “second,” “third,” etc., for identified elements are used to distinguish between elements and do not indicate or imply a required or limited number of such elements, nor do they indicate a particular position or order of such elements unless otherwise specifically stated. All methods described herein can be performed in any suitable order unless otherwise indicated herein or clearly contradicted by context. The use of any and all examples or exemplary language provided herein (e.g., “such as”) is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.

[0074] Furthermore, as used herein, the term “about,” when referring to a measurable value such as the length of a polynucleotide or polypeptide sequence, a dose, a time, or a temperature, is meant to encompass variations of ±20%, ±10%, ±5%, ±1%, ±0.5%, or even ±0.1% of the specified amount.
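The definition of “about” can be read as a relative tolerance around the specified amount. The one-function Python sketch below assumes the tolerance is applied symmetrically as a fraction of the specified value; the function name is hypothetical.

```python
def is_about(value: float, specified: float, tolerance: float = 0.20) -> bool:
    """True if value lies within ±tolerance (a fraction) of the specified amount."""
    return abs(value - specified) <= abs(specified) * tolerance
```

Each narrower reading in the definition corresponds to a smaller `tolerance`, e.g., 0.10, 0.05, 0.01, 0.005, or 0.001.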

[0075] Furthermore, as used herein, “and/or” refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative (“or”).

[0076] As used herein, the transitional phrase “consisting essentially of” is to be interpreted as encompassing the specified materials or steps recited in the claims and those that “do not materially affect the basic and novel characteristic(s)” of the invention described herein. See In re Herz, 537 F.2d 549, 551-52, 190 USPQ 461, 463 (CCPA 1976) (emphasis in the original); see also MPEP § 2111.03. Thus, the term “consisting essentially of,” when used in the claims of the present invention, is not intended to be interpreted as equivalent to “comprising.” Unless the context indicates otherwise, the various features of the present invention described herein are intended to be usable in any combination.

[0077] Furthermore, in some embodiments of the present invention, it is intended that any feature or combination of features described herein may be excluded or omitted.

[0078] To illustrate further, where this specification indicates that a particular amino acid can be selected from A, G, I, L, and/or V, this language also indicates that the amino acid can be selected from any subset of these amino acids, e.g., A, G, I, or L; A, G, I, or V; A or G; only L; etc., as if each such subset were expressly set forth herein. Such language also indicates that one or more of the specified amino acids can be disclaimed (e.g., by a negative proviso). For example, in particular embodiments, the amino acid is not A, G, or I; is not A; is not G or V; etc., as if each such possible disclaimer were expressly set forth herein.
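The subset and negative-proviso language above can be made concrete by enumeration. In this Python sketch (helper names are hypothetical), the list A, G, I, L, and/or V expands to 31 non-empty subsets (2^5 - 1), and a proviso such as “not A, G, or I” leaves L and V.

```python
from itertools import combinations


def all_subsets(options):
    """Every non-empty subset of the listed options, smallest subsets first."""
    return [set(combo)
            for r in range(1, len(options) + 1)
            for combo in combinations(options, r)]


def apply_proviso(options, excluded):
    """Options that remain after a negative proviso excludes some of them."""
    return [opt for opt in options if opt not in excluded]


AMINO_ACID_OPTIONS = ["A", "G", "I", "L", "V"]
```

For example, `all_subsets(AMINO_ACID_OPTIONS)` includes `{"L"}` (“only L”) and `{"A", "G"}` (“A or G”), while `apply_proviso(AMINO_ACID_OPTIONS, {"G", "V"})` returns the choices permitted when the amino acid “is not G or V”.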

[0079] As used herein, the term “parvovirus” encompasses the family Parvoviridae, including autonomously replicating parvoviruses and dependoviruses. The autonomous parvoviruses include members of the genera Parvovirus, Erythrovirus, Densovirus, Iteravirus, and Contravirus. Exemplary autonomous parvoviruses include, but are not limited to, minute virus of mice, bovine parvovirus, canine parvovirus, chicken parvovirus, feline panleukopenia virus, feline parvovirus, goose parvovirus, H1 parvovirus, Muscovy duck parvovirus, B19 virus, and any other autonomous parvovirus now known or later discovered. Other autonomous parvoviruses are known to those skilled in the art. See, e.g., BERNARD N. FIELDS et al., VIROLOGY, volume 2, chapter 69 (4th ed., Lippincott-Raven Publishers).

[0080] As used herein, the term “adeno-associated virus” (AAV) includes, but is not limited to, AAV type 1, AAV type 2, AAV type 3 (including types 3A and 3B), AAV type 4, AAV type 5, AAV type 6, AAV type 7, AAV type 8, AAV type 9, AAV type 10, AAV type 11, avian AAV, bovine AAV, canine AAV, equine AAV, ovine AAV, and any other AAV now known or later discovered. See, e.g., BERNARD N. FIELDS et al., VIROLOGY, volume 2, chapter 69 (4th ed., Lippincott-Raven Publishers). A number of relatively new AAV serotypes and clades have been identified (see, e.g., Gao et al., (2004) J. Virology 78:6381-6388; Moris et al., (2004) Virology 330:375-383; Table 1 disclosed in U.S. Provisional Application No. 62/937,556, filed November 19, 2019; and Table 1 in international applications WO2020/102645 and WO2020/102667, each of which is incorporated herein by reference in its entirety).

[0081] The genomic sequences of the various serotypes of AAV and of the autonomous parvoviruses, as well as the sequences of the native inverted terminal repeats (ITRs), Rep proteins, and capsid subunits, are known in the art. Such sequences can be found in the literature or in public databases such as GenBank. See, e.g., GenBank Accession Nos. NC_002077, NC_001401, NC_001729, NC_001863, NC_001829, NC_001862, NC_000883, NC_001701, NC_001510, NC_006152, NC_006261, AF063497, U89790, AF043303, AF028705, AF028704, J02275, J01901, J02275, X01457, AF288061, AH009962, AY028226, AY028223, NC_001358, NC_001540, AF513851, AF513852, and AY530579, the disclosures of which are incorporated herein by reference for their teaching of parvovirus and AAV nucleic acid and amino acid sequences. See also, e.g., Srivistava et al., (1983) J. Virology 45:555; Chiorini et al., (1998) J. Virology 71:6823; Chiorini et al., (1999) J. Virology 73:1309; Bantel-Schaal et al., (1999) J. Virology 73:939; Xiao et al., (1999) J. Virology 73:3994; Muramatsu et al., (1996) Virology 221:208; Shade et al., (1986) J. Virol. 58:921; Gao et al., (2002) Proc. Nat. Acad. Sci. USA 99:11854; Moris et al., (2004) Virology 330:375-383; international patent publications WO00/28061, WO99/61601, and WO98/11244; and U.S. Patent No. 6,156,303, the disclosures of which are incorporated herein by reference for their teaching of parvovirus and AAV nucleic acid and amino acid sequences. See also Tables 1 and 5 disclosed in U.S. Provisional Application No. 62/937,556, filed November 19, 2019, or Table 1 disclosed in international applications WO2020/102645 and WO2020/102667, each of which is incorporated herein by reference in its entirety. The capsid structures of autonomous parvoviruses and AAV are described in more detail in BERNARD N. FIELDS et al., VIROLOGY, volume 2, chapters 69 & 70 (4th ed., Lippincott-Raven Publishers). See also the descriptions of the crystal structures of AAV2 (Xie et al., (2002) Proc. Nat. Acad. Sci. 99:10405-10), AAV4 (Padron et al., (2005) J. Virol. 79:5047-58), AAV5 (Walters et al., (2004) J. Virol. 78:3361-71), and CPV (Xie et al., (1996) J. Mol. Biol. 6:497-520 and Tsao et al., (1991) Science 251:1456-64).

[0082] As used herein, the term "tropism" refers to the preferential entry of a virus into certain cells or tissues, optionally followed by expression (e.g., transcription and, optionally, translation) within the cell of a sequence carried by the viral genome, for example, expression of a heterologous nucleic acid of interest in the case of a recombinant virus.

[0083] As used herein, “systemic tropism” and “systemic transduction” (and equivalent terms) indicate that the viral capsid or viral vector of the present invention exhibits tropism for and/or transduces tissues throughout the body (e.g., brain, lung, skeletal muscle, heart, liver, kidney, and/or pancreas). In embodiments of the present invention, systemic transduction of the central nervous system (e.g., brain, neuronal cells, etc.) is observed. In other embodiments, systemic transduction of cardiac muscle tissue is achieved.

[0084] As used herein, “selective tropism” or “specific tropism” means delivery of a viral vector to, and/or specific transduction of, certain target cells and/or certain tissues.

[0085] Unless otherwise indicated, “efficient transduction” or “efficient tropism,” or similar terms, can be determined by reference to a suitable control (e.g., at least about 50%, 60%, 70%, 80%, 85%, 90%, 95%, 100%, 125%, 150%, 175%, 200%, 250%, 300%, 350%, 400%, 500%, or more of the transduction or tropism, respectively, of the control). In certain embodiments, the viral vector efficiently transduces, or has efficient tropism for, liver cells and muscle cells. A suitable control will depend on a variety of factors, including the desired tropism and/or transduction profile.

[0086] Similarly, whether a virus does “not efficiently transduce” or “does not have efficient tropism” for a target tissue, or similar terms, can be determined by reference to a suitable control. In certain embodiments, the viral vector does not efficiently transduce (i.e., does not have efficient tropism for) the kidney, gonads, and/or germ cells. In certain embodiments, transduction (e.g., undesirable transduction) of a tissue (e.g., kidney) is 20% or less, 10% or less, 5% or less, 1% or less, or 0.1% or less of the level of transduction of the desired target tissue (e.g., liver, skeletal muscle, diaphragm muscle, cardiac muscle, and/or cells of the central nervous system).
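The comparisons in [0085] and [0086] reduce to expressing one transduction level as a percentage of another. The Python sketch below assumes some quantitative readout (for example, a reporter signal or vector genome copies per cell, neither of which is specified here); the helper names and the example values are hypothetical. The default thresholds are the lowest efficiency value (50%) and the highest off-target ceiling (20%) recited above; `min_pct=10.0` would match the broader list in [0100].

```python
def relative_transduction_pct(signal: float, reference_signal: float) -> float:
    """Express a transduction readout as a percentage of a reference readout."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return 100.0 * signal / reference_signal


def is_efficient(signal, control_signal, min_pct=50.0):
    """'Efficient' transduction: at least min_pct percent of the control's level."""
    return relative_transduction_pct(signal, control_signal) >= min_pct


def is_not_efficient(off_target_signal, target_signal, max_pct=20.0):
    """Off-target transduction at or below max_pct percent of the desired target tissue."""
    return relative_transduction_pct(off_target_signal, target_signal) <= max_pct


if __name__ == "__main__":
    # Hypothetical reporter readouts in arbitrary units.
    liver, control_liver, kidney = 1.5e6, 1.0e6, 5.0e4
    print(relative_transduction_pct(liver, control_liver))  # 150.0
    print(is_not_efficient(kidney, liver, max_pct=5.0))      # True (about 3.3%)
```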

[0087] In some embodiments of the present invention, AAV particles comprising a capsid of the present invention can exhibit multiple phenotypes: efficient transduction of certain tissues/cells together with very low levels of transduction (e.g., reduced transduction) of certain tissues/cells in which such transduction is undesirable.

[0088] As used herein, the term “polypeptide” encompasses both peptides and proteins unless otherwise indicated.

[0089] A "polynucleotide" is a sequence of nucleotide bases and can be RNA, DNA, or DNA-RNA hybrid sequences (including both naturally occurring and non-naturally occurring nucleotides), but in typical embodiments, it is either a single-stranded or double-stranded DNA sequence.

[0090] The terms “heterologous nucleotide sequence” and “heterologous nucleic acid molecule” are used interchangeably herein and refer to a nucleic acid sequence that does not occur naturally in the virus. Generally, the heterologous nucleic acid molecule or heterologous nucleotide sequence comprises an open reading frame that encodes a polypeptide and/or non-coding RNA of interest (e.g., for delivery to a cell and/or subject).

[0091] A "chimeric nucleic acid" is a nucleic acid sequence containing two or more nucleic acid sequences covalently linked together to encode a fusion polypeptide. The nucleic acid can be DNA, RNA, or a hybrid thereof.

[0092] The term "fusion polypeptide" typically refers to two or more polypeptides that are covalently linked together by peptide bonds.

[0093] As used herein, an “isolated” polynucleotide (e.g., an “isolated DNA” or “isolated RNA”) means a polynucleotide at least partially separated from at least some of the other components of the naturally occurring organism or virus, for example, the structural components of the cell or virus, or other polypeptides or nucleic acids, commonly found associated with the polynucleotide. In typical embodiments, an “isolated” polynucleotide is enriched by at least about 10-fold, 100-fold, 1,000-fold, 10,000-fold, or more compared with the starting material.

[0094] Similarly, an “isolated” polypeptide means a polypeptide at least partially separated from at least some of the other components of the naturally occurring organism or virus, for example, the structural components of the cell or virus, or other polypeptides or nucleic acids, commonly found associated with the polypeptide. In typical embodiments, an “isolated” polypeptide is enriched by at least about 10-fold, 100-fold, 1,000-fold, 10,000-fold, or more compared with the starting material.

[0095] "Isolated cells" refers to cells separated from other components that are normally associated with them in their native state. For example, isolated cells may be cells in a culture medium and/or cells in a pharmaceutically acceptable carrier of the present invention. Thus, isolated cells may be delivered to and/or introduced into a subject. In some embodiments, isolated cells may be cells that are removed from a subject, manipulated ex vivo as described herein, and then returned to the subject.

[0096] A population of virions can be produced by any of the methods described herein. In one embodiment, the population comprises at least 10^1 virions. In one embodiment, the population comprises at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, at least 10^10, at least 10^11, at least 10^12, at least 10^13, at least 10^14, at least 10^15, at least 10^16, or at least 10^17 virions. The population of virions may be heterogeneous or homogeneous (e.g., substantially homogeneous or perfectly homogeneous).

[0097] "Substantially homogeneous population," as used herein, means a population of virions that are substantially identical, with few or no contaminants (non-identical virions). A substantially homogeneous population comprises at least 90% identical virions (e.g., the desired virions) and may comprise at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, or at least 99.9% identical virions.

[0098] A perfectly homogeneous population of virions contains only identical virions.
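The population categories in [0096] to [0098] can be expressed as a simple rule on the fraction of virions identical to the desired virion. This Python sketch is hypothetical: it assumes each virion has already been assigned a label by some characterization method, which the text does not specify.

```python
from collections import Counter


def classify_population(virion_labels, desired, min_fraction=0.90):
    """Classify a virion population by the fraction of virions labeled `desired`.

    Returns "perfectly homogeneous" (all identical), "substantially homogeneous"
    (at least min_fraction identical), or "heterogeneous".
    """
    counts = Counter(virion_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty population")
    fraction = counts[desired] / total
    if fraction == 1.0:
        return "perfectly homogeneous"
    if fraction >= min_fraction:
        return "substantially homogeneous"
    return "heterogeneous"


if __name__ == "__main__":
    # Hypothetical labels: 95 desired virions and 5 non-identical ones.
    population = ["rAAV-GAA"] * 95 + ["other"] * 5
    print(classify_population(population, "rAAV-GAA"))  # substantially homogeneous
```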

[0099] As used herein, “isolating” or “purifying” (or grammatical equivalents) a viral vector, viral particles, or population of viral particles means that the viral vector, viral particles, or population of viral particles is separated at least partially from at least some of the other components in the starting material. In typical embodiments, the “isolated” or “purified” viral vector, viral particles, or population of viral particles is enriched at least about 10-fold, 100-fold, 1,000-fold, 10,000-fold, or more compared to the starting material.
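The text does not say how fold enrichment is measured. Assuming, for illustration only, that it is the ratio of the component's fraction of total material after purification to its fraction before purification, the recited levels can be checked as follows (the helper names are hypothetical).

```python
def fold_enrichment(fraction_before: float, fraction_after: float) -> float:
    """Fold enrichment, assumed here to be the ratio of the component's fraction
    of total material after purification to its fraction in the starting material."""
    for f in (fraction_before, fraction_after):
        if not 0.0 < f <= 1.0:
            raise ValueError("fractions must lie in (0, 1]")
    return fraction_after / fraction_before


# Enrichment levels recited in the definitions of "isolated" and "purified".
RECITED_FOLDS = (10, 100, 1_000, 10_000)


def recited_levels_met(fraction_before, fraction_after):
    """Return the recited fold-enrichment levels that a preparation reaches."""
    fold = fold_enrichment(fraction_before, fraction_after)
    return [level for level in RECITED_FOLDS if fold >= level]


if __name__ == "__main__":
    # Hypothetical: vector is 0.01% of total material before and 50% after purification.
    print(recited_levels_met(0.0001, 0.5))  # [10, 100, 1000]
```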

[0100] Unless otherwise indicated, “efficient transduction” or “efficient tropism,” or similar terms, can be determined by reference to a suitable control (e.g., at least about 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 125%, 150%, 175%, 200%, 250%, 300%, 350%, 400%, 500%, or more of the transduction or tropism, respectively, of the control). In certain embodiments, the viral vector efficiently transduces, or has efficient tropism for, neurons and cardiomyocytes. A suitable control will depend on a variety of factors, including the desired tropism and/or transduction profile.

[0101] A "therapeutic polypeptide" is a polypeptide that can alleviate, reduce, prevent, delay, and/or stabilize a condition resulting from the absence or deficiency of a protein in a cell or subject, and/or, alternatively, a polypeptide that otherwise confers a benefit to the subject, such as enzyme replacement to reduce or eliminate symptoms of a disease, improvement of transplant survival, or induction of an immune response.

[0102] The terms "heterologous nucleotide sequence" and "heterologous nucleic acid molecule" are used interchangeably herein and refer to a nucleic acid sequence that does not occur naturally in a virus. Generally, a heterologous nucleic acid molecule or heterologous nucleotide sequence includes an open reading frame encoding a polypeptide of interest (e.g., for delivery to a cell and/or subject) and/or non-translated RNA, such as the open reading frame encoding the GAA polypeptide.

[0103] As used herein, the terms "viral vector", "vector", or "gene delivery vector" refer to a virus (e.g., AAV) particle that functions as a nucleic acid delivery vehicle and contains a vector genome (e.g., viral DNA [vDNA]) packaged within a virion. Alternatively, in some contexts, the term "vector" may be used to refer to the vector genome/vDNA alone.

[0104] An "rAAV vector genome" or "rAAV genome" is an AAV genome (i.e., vDNA) that comprises one or more heterologous nucleic acid sequences. rAAV vectors generally require only the terminal repeat(s) (TR(s)) in cis to generate virus. All other viral sequences are dispensable and can be supplied in trans (Muzyczka, (1992) Curr. Topics Microbiol. Immunol. 158:97). Typically, the rAAV vector genome retains only the one or more TR sequences, so as to maximize the size of the transgene that can be efficiently packaged by the vector. The structural and non-structural protein coding sequences can be provided in trans (e.g., from a vector such as a plasmid, or by stably integrating the sequences into a packaging cell). In embodiments of the present invention, the rAAV vector genome comprises at least one ITR sequence (e.g., an AAV TR sequence), and optionally two ITRs (e.g., two AAV TRs), which typically are located at the 5' and 3' ends of the vector genome and flank the heterologous nucleic acid, but need not be contiguous with it. The TRs can be the same as or different from each other.
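To make the cis/trans division concrete, the following Python sketch models an rAAV genome as an ordered list of elements and checks two points from the paragraph above: at least one TR is present in cis, and when two TRs are present they sit at the 5' and 3' ends. It also checks total length against a packaging limit. The 145 nt ITR length (that of wild-type AAV2) and the 4.7 kb limit are commonly cited approximations assumed here, not values stated in this document, and all other element lengths are hypothetical.

```python
from dataclasses import dataclass

# Illustrative assumptions, not values taken from this document: a wild-type
# AAV2 ITR is about 145 nt, and roughly 4.7 kb (about the size of the wild-type
# AAV genome) is commonly used as a practical packaging limit.
ITR_LENGTH_NT = 145
PACKAGING_LIMIT_NT = 4700


@dataclass
class Element:
    name: str
    length_nt: int
    is_tr: bool = False


def check_genome(elements):
    """Return (total length in nt, list of problems) for an ordered rAAV genome layout."""
    problems = []
    total = sum(e.length_nt for e in elements)
    n_trs = sum(e.is_tr for e in elements)
    if n_trs == 0:
        problems.append("at least one TR is required in cis")
    elif n_trs == 2 and not (elements[0].is_tr and elements[-1].is_tr):
        problems.append("two TRs are expected at the 5' and 3' ends")
    if total > PACKAGING_LIMIT_NT:
        problems.append(f"{total} nt exceeds the assumed {PACKAGING_LIMIT_NT} nt limit")
    return total, problems


if __name__ == "__main__":
    # Hypothetical element lengths for a liver-specific-promoter-driven GAA cassette.
    genome = [
        Element("5' ITR", ITR_LENGTH_NT, is_tr=True),
        Element("liver-specific promoter", 400),
        Element("signal peptide + IGF2 + GAA ORF", 3000),
        Element("polyA", 200),
        Element("3' ITR", ITR_LENGTH_NT, is_tr=True),
    ]
    print(check_genome(genome))  # (3890, [])
```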

[0105] The term “terminal repeat” or “TR” includes any viral terminal repeat or synthetic sequence that forms a hairpin structure and functions as an inverted terminal repeat (i.e., mediates the desired functions such as replication, viral packaging, integration, and/or proviral rescue). A TR may be an AAV TR or a non-AAV TR. For example, non-AAV TR sequences, such as those of other parvoviruses (e.g., canine parvovirus (CPV), minute virus of mice (MVM), human parvovirus B-19), or any other suitable viral sequence (e.g., the SV40 hairpin that serves as the SV40 origin of replication) can be used as TRs, and they may be further modified by truncation, substitution, deletion, insertion, and/or addition. Furthermore, TRs may be partially or completely synthetic, such as the “double-D sequence” described in U.S. Patent No. 5,478,745 by Samulski et al.
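Because a TR is defined by its ability to form a hairpin and act as an inverted repeat, two simple sequence checks are sketched below in Python: whether the 3' TR is the reverse complement of the 5' TR, and how many base pairs a sequence can form when folded back on itself from both ends. This is a toy model under stated simplifications, not a structure-prediction method, and the sequences are made up.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_inverted_repeat(left_tr: str, right_tr: str) -> bool:
    """True if the 3' TR, read 5'->3' on the same strand, is the reverse complement of the 5' TR."""
    return right_tr.upper() == reverse_complement(left_tr).upper()


def terminal_stem_length(seq: str) -> int:
    """Count base pairs formed when seq folds back on itself from both ends.

    Position i pairs with position len(seq)-1-i when seq[i] equals the
    corresponding base of the reverse complement. This is a crude proxy for
    hairpin-forming potential: it ignores loop-size limits, wobble pairs and
    the branched (T-shaped) structure of natural AAV ITRs.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    stem = 0
    for i in range(len(seq) // 2):
        if seq[i] != rc[i]:
            break
        stem += 1
    return stem


if __name__ == "__main__":
    toy_hairpin = "GGCCAT" + "TTTT" + "ATGGCC"  # 6-bp stem, 4-nt loop (made up)
    print(terminal_stem_length(toy_hairpin))            # 6
    print(is_inverted_repeat("TTGGCCACTC", "GAGTGGCCAA"))  # True
```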

[0106] The "AAV terminal repeat" or "AAV TR", including the "AAV inverted terminal repeat" or "AAV ITR", can be derived from any AAV, including, but not limited to, serotypes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12, or any other AAV that is currently known or later discovered. The AAV terminal repeat need not have the native terminal repeat sequence so long as the terminal repeat mediates a desired function, such as replication, viral packaging, integration and/or proviral rescue (e.g., the native AAV TR or AAV ITR sequence may be altered by insertions, deletions, truncations and/or missense mutations).

[0107] The AAV proteins VP1, VP2 and VP3 are capsid proteins that interact together to form the icosahedral AAV capsid. VP1.5 is an AAV capsid protein described in U.S. Publication No. 2014/0037585.

[0108] The viral vectors of the present invention can further be "targeted" viral vectors (e.g., having a directed tropism) and/or "hybrid" parvoviruses (i.e., parvoviruses in which the viral TRs and viral capsid are from different parvoviruses) as described in International Patent Publication WO00/28004 and Chao et al., (2000) Molecular Therapy 2:619.

[0109] The viral vectors of the present invention can further be double-stranded parvovirus particles described in International Patent Publication WO01/92551, the disclosure of which is incorporated herein by reference in its entirety. Thus, in some embodiments, a double-stranded (ds) genome can be packaged into the viral capsids of the present invention.

[0110] Furthermore, the viral capsid or genomic elements can contain other modifications, including insertions, deletions and/or substitutions.

[0111] When used herein, a "chimeric" capsid protein means an AAV capsid protein (e.g., one or more of VP1, VP2, or VP3) that has been modified, relative to the wild type, by substitution of one or more amino acid residues (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, etc.) in its amino acid sequence and/or by insertion and/or deletion of one or more amino acid residues (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, etc.) in its amino acid sequence. In some embodiments, complete or partial domains, functional regions, epitopes, etc. from one AAV serotype can replace, in any combination, the corresponding wild-type domains, functional regions, epitopes, etc. of a different AAV serotype to produce the chimeric capsid protein of the present invention. Chimeric capsid proteins can be produced according to protocols well known in the art, and a large number of chimeric capsid proteins that can be included in the capsids of the present invention are described in the literature and herein.

[0112] As used herein, the term “haploid AAV” means an AAV as described in international application WO2018/170310 or US application US2018/037149, which are incorporated herein by reference in their entirety. In some embodiments, a population of virions is a haploid AAV population, in which virion particles can be constructed such that at least one of the AAV capsid proteins VP1, VP2, and VP3 required to form a virion particle capable of encapsulating an AAV genome differs from at least one of the other viral proteins. For each viral protein present (VP1, VP2, and/or VP3), the protein is of a single type (e.g., all AAV2 VP1). In one example, at least one of the viral proteins is a chimeric viral protein, and at least one of the other two viral proteins is not chimeric. In one embodiment, VP1 and VP2 are chimeric, and only VP3 is non-chimeric. Examples include viral particles composed only of VP1/VP2 from the chimeric AAV2/8 (the N-terminus of AAV2 and the C-terminus of AAV8) paired only with VP3 from AAV2, or composed only of the chimeric VP1/VP2 28m-2P3 (the N-terminus of AAV8 without a mature VP3 start codon and the C-terminus of AAV2) paired only with VP3 from AAV2. In another embodiment, only VP3 is chimeric, while VP1 and VP2 are not. In another embodiment, at least one of the viral proteins is from a completely different serotype, for example, only the chimeric VP1/VP2 28m-2P3 paired only with VP3 from AAV3. In yet another example, no chimeric protein is present.
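The haploid AAV configurations above can be summarized by recording, for each of VP1, VP2, and VP3, the serotype origin of its N-terminal and C-terminal portions. The Python sketch below is a hypothetical bookkeeping model: it ignores sequence-level features such as the VP3 start codon noted for 28m-2P3. It reproduces the AAV2/8 example in which VP1/VP2 are chimeric and VP3 is from AAV2 only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViralProtein:
    """A capsid protein described only by the serotype origin of its N- and C-terminal parts."""
    n_terminal: str
    c_terminal: str

    @property
    def is_chimeric(self) -> bool:
        return self.n_terminal != self.c_terminal


def describe_capsid(vps: dict) -> dict:
    """Report which VPs are chimeric and whether the VPs differ in origin."""
    chimeric = sorted(name for name, vp in vps.items() if vp.is_chimeric)
    origins_differ = len(set(vps.values())) > 1
    return {"chimeric": chimeric, "origins_differ": origins_differ}


if __name__ == "__main__":
    aav2_8 = ViralProtein("AAV2", "AAV8")  # chimeric AAV2/8: AAV2 N-terminus, AAV8 C-terminus
    aav2 = ViralProtein("AAV2", "AAV2")    # non-chimeric AAV2
    capsid = {"VP1": aav2_8, "VP2": aav2_8, "VP3": aav2}
    print(describe_capsid(capsid))  # {'chimeric': ['VP1', 'VP2'], 'origins_differ': True}
```

A configuration with no chimeric protein but VPs from different serotypes (e.g., VP1/VP2 from AAV8 with VP3 from AAV2) reports an empty `chimeric` list with `origins_differ` still true.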

[0113] The term "hybrid" AAV vector or parvovirus refers to an rAAV vector in which the viral TRs or ITRs and the viral capsid are from different parvoviruses. Hybrid vectors are described in International Patent Publication WO00/28004 and Chao et al., (2000) Molecular Therapy 2:619. For example, a hybrid AAV vector typically contains adenovirus 5' and 3' cis-acting ITR sequences sufficient for adenovirus replication and packaging (i.e., the adenovirus terminal repeats and PAC sequence).

[0114] The term "polyploid AAV" refers to an AAV vector composed of capsids derived from two or more AAV serotypes, which may, for example, take advantage of the individual serotypes for higher transduction but, in certain embodiments, eliminate parental tropism.

[0115] The terms "GAA" or "GAA polypeptide," as used herein, encompass not only mature (approximately 76 or approximately 67 kDa) and precursor (e.g., approximately 110 kDa) GAA, but also modified GAA proteins (e.g., truncated, or mutated by insertion, deletion, and/or substitution) or fragments thereof that retain biological function (i.e., have at least one biological activity of the native GAA protein as defined above, e.g., the ability to hydrolyze glycogen), as well as GAA variants (e.g., GAA II as described by Kunita et al., (1997) Biochimica et Biophysica Acta 1362:269; and GAA polymorphisms and SNPs as described by Hirschhorn, R. and Reuser, A.J. (2001) in The Metabolic and Molecular Bases of Inherited Disease (Scriver, C.R., Beaudet, A.L., Sly, W.S. & Valle, D. Eds.), pp. 3389-3419, McGraw-Hill, New York; see pages 3403–3405, the entirety of which is incorporated herein by reference). Any GAA coding sequence known in the art may be used; see, for example, the coding sequences in Figures 8 and 9; GenBank accession number NM_00152, Hoefsloot et al., (1988) EMBO J. 7:1697, and Van Hove et al., (1996) Proc. Natl. Acad. Sci. USA 93:65 (human); GenBank accession number NM_008064 (mouse); and Kunita et al., (1997) Biochimica et Biophysica Acta 1362:269 (quail); the disclosures of which are incorporated herein by reference for their teaching of GAA coding and non-coding sequences.

[0116] The terms “cation-independent mannose-6-phosphate receptor (CI-MPR),” “M6P/IGF-II receptor,” “CI-MPR/IGF-II receptor,” “IGF-II receptor,” or “IGF2 receptor,” or their abbreviations, are used interchangeably herein and refer to cellular receptors that bind to both M6P and IGF-II.

[0117] The term “targeting peptide,” also referred to herein as “targeting sequence,” refers to a peptide that targets a specific intracellular compartment, such as the mammalian lysosome. The targeting peptides for use herein are mannose-6-phosphate-independent lysosomal targeting peptides. An exemplary targeting sequence is the IGF2 targeting peptide disclosed herein.

[0118] The terms “IGF2 sequence” and “IGF2 targeting peptide,” as well as “IGF2 targeting sequence” and “IGF2 leader sequence,” are used interchangeably herein and refer to sequences of IGF2 polypeptides that bind to CI-MPR on the cell surface. In particular, an IGF2 sequence is a peptide that comprises a portion of the IGF2 uptake sequence of SEQ ID NO: 5 or an amino acid modification of SEQ ID NO: 5. An IGF2 targeting peptide refers to a peptide sequence that binds to a receptor domain of the human cation-independent mannose-6-phosphate receptor (CI-MPR or CI-M6P receptor) consisting essentially of repeats 11–12, repeat 11, or amino acids 1508–1566.

[0119] The term “leader sequence” is used herein interchangeably with the terms “secretion signal sequence,” “signal sequence,” or “signal peptide,” or variations thereof, and refers to an amino acid sequence (as defined above) that functions to enhance secretion from a cell of an operably linked polypeptide (e.g., a GAA polypeptide or an IGF2-GAA fusion protein) relative to the level of secretion observed with the native polypeptide. As defined above, “enhanced” secretion means that the relative proportion of the lysosomal polypeptide synthesized by the cell that is secreted from the cell is increased; the absolute amount of secreted protein need not also increase. In certain embodiments of the present invention, essentially all (i.e., at least 95%, 97%, 98%, 99%, or more) of the GAA polypeptide is secreted. However, it is not necessary for essentially all or even most of the GAA polypeptide to be secreted, as long as the level of secretion is enhanced relative to the native GAA polypeptide. Examples of leader sequences include, but are not limited to, native GAA leader sequences (including congeneric GAA leader sequences), AAT sequences, IL2(1-3), the IL2 leader sequence (IL2 wt), modified IL2 leader sequences (IL2 mut), fibronectin (FN1; also known as FBN), or IgG leader sequences, or functional variants thereof, as disclosed herein.
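The definition of enhanced secretion turns on a proportion rather than an absolute amount. In the following Python sketch (hypothetical helper names and amounts), a construct with a leader sequence secretes less polypeptide in absolute terms than the native polypeptide but a larger fraction of what it makes, so its secretion counts as enhanced under the definition.

```python
def secreted_fraction(secreted: float, intracellular: float) -> float:
    """Proportion of the synthesized polypeptide that is secreted from the cell."""
    total = secreted + intracellular
    if total <= 0:
        raise ValueError("no polypeptide detected")
    return secreted / total


def secretion_enhanced(construct: tuple, native: tuple) -> bool:
    """Each argument is (secreted, intracellular). Enhancement is judged on the
    secreted proportion only, so the absolute secreted amount may even fall."""
    return secreted_fraction(*construct) > secreted_fraction(*native)


if __name__ == "__main__":
    native_gaa = (20.0, 80.0)        # hypothetical amounts: 20% secreted
    leader_construct = (15.0, 15.0)  # hypothetical amounts: 50% secreted
    print(secretion_enhanced(leader_construct, native_gaa))  # True
```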

[0120] As used herein, the term “amino acid” encompasses any naturally occurring amino acid, modified forms thereof, and synthetic amino acids. The naturally occurring levorotatory (L-) amino acids are shown in Table 2 of U.S. Publication No. 2018/0371496, which is incorporated herein by reference. Alternatively, the amino acid may be a modified amino acid residue (non-limiting examples are shown in Table 4 of U.S. Publication No. 2018/0371496) and/or an amino acid modified by post-translational modification (e.g., acetylation, amidation, formylation, hydroxylation, methylation, phosphorylation, or sulfation). Furthermore, the non-naturally occurring amino acid may be an “unnatural” amino acid as described by Wang et al., Annu Rev Biophys Biomol Struct. 35:225-49 (2006). These unnatural amino acids can advantageously be used to chemically link a molecule of interest to the AAV capsid protein.

[0121] To illustrate further, if the specification indicates, for example, that a particular amino acid can be selected from A, G, I, L, and/or V, this language also indicates that the amino acid can be selected from any subset of these amino acids, e.g., A, G, I, or L; A, G, I, or V; A or G; only L; etc., as if each such subcombination were expressly set forth herein. Such language also indicates that one or more of the specified amino acids can be disclaimed (e.g., by a negative limitation). For example, in particular embodiments the amino acid is not A, G, or I; is not A; is not G or V; etc., as if each such possible disclaimer were expressly set forth herein.

[0122] The term “cis-regulatory element” or “CRE” is well known to those skilled in the art and refers to a nucleic acid sequence, such as an enhancer, promoter, insulator, or silencer, that can regulate or modulate the transcription of a neighboring gene (i.e., in cis). CREs are found near the genes they regulate. CREs typically regulate gene transcription by binding transcription factors (TFs), i.e., they contain TF binding sites (TFBSs). A single TF may bind to many CREs and thus control the expression of many genes (pleiotropy). CREs are usually, but not always, located upstream of the transcription start site (TSS) of the gene they regulate. “Enhancers” are CREs that enhance (i.e., upregulate) the transcription of the genes with which they are operably associated, and they can be found upstream of, downstream of, and even within the introns of the genes they regulate. Multiple enhancers can act in a coordinated manner to regulate the transcription of a single gene. In this context, a “silencer” is a CRE that binds TFs called repressors, which act to prevent or downregulate transcription of a gene. The term “silencer” can also refer to a region in the 3' untranslated region of a messenger RNA that binds proteins which repress translation of that mRNA molecule, but this usage is distinct from its use in describing a CRE. Generally, the CREs of the present invention are liver-specific enhancers (often referred to as liver-specific CREs, liver-specific CRE enhancers, or the like). In this context, the CRE is preferably located 1500 nucleotides or less from the transcription start site (TSS), more preferably 1000 nucleotides or less from the TSS, more preferably 500 nucleotides or less from the TSS, and preferably 250, 200, 150, or 100 nucleotides or less from the TSS. The CREs of the present invention are preferably relatively short, preferably 100 nucleotides or less in length; for example, they may be 90, 80, 70, or 60 nucleotides or less in length.
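The positional and length preferences for a liver-specific CRE can be checked with a few lines of Python. In this sketch, coordinates are hypothetical integers on the coding strand, with lower numbers upstream, and distance is taken as the number of nucleotides separating the CRE from the TSS; the text does not fix a convention, so this is an assumption.

```python
from typing import Optional

# Preferred upper limits recited for a liver-specific CRE, in nucleotides.
MAX_DISTANCE_TO_TSS = (1500, 1000, 500, 250, 200, 150, 100)
MAX_CRE_LENGTH = (100, 90, 80, 70, 60)


def distance_to_tss(cre_start: int, cre_end: int, tss: int) -> int:
    """Nucleotides separating a CRE spanning [cre_start, cre_end) from the TSS.

    Returns 0 if the CRE covers or directly abuts the TSS.
    """
    if cre_start <= tss < cre_end:
        return 0
    if cre_end <= tss:              # CRE lies upstream of the TSS
        return tss - cre_end
    return cre_start - tss - 1      # CRE lies downstream of the TSS


def tightest_limit(value: int, limits) -> Optional[int]:
    """Smallest recited limit that `value` does not exceed, or None if none is met."""
    met = [limit for limit in limits if value <= limit]
    return min(met) if met else None


if __name__ == "__main__":
    start, end, tss = -320, -240, 0  # hypothetical 80-nt CRE upstream of the TSS
    gap = distance_to_tss(start, end, tss)
    print(gap, tightest_limit(gap, MAX_DISTANCE_TO_TSS))  # 240 250
    print(tightest_limit(end - start, MAX_CRE_LENGTH))    # 80
```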

[0123] The term “cis-regulatory module” or “CRM” means a functional module composed of two or more CREs; in the present invention, the CREs are typically liver-specific enhancers. Therefore, in this application, a CRM typically comprises multiple liver-specific enhancer CREs. Typically, the multiple CREs within a CRM act together (e.g., additively or synergistically) to enhance the transcription of the gene with which the CRM is operably associated. There is considerable scope for shuffling (i.e., reordering), inverting (i.e., reversing the orientation of), and altering the spacing between the CREs within a CRM. Thus, functional variants of a CRM of this invention include variants of a reference CRM in which the CREs are shuffled and/or inverted, and/or the spacing between the CREs is altered.
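The shuffled and inverted CRM variants described above can be enumerated systematically. This Python sketch uses placeholder sequences (not real CREs) and models the spacing between CREs with a single hypothetical spacer string.

```python
from itertools import permutations, product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def crm_variants(cres, spacer=""):
    """Yield CRM sequences in which the CREs are reordered and/or inverted.

    Every ordering of the CREs is combined with every choice of orientation, so
    k CREs give k! * 2**k arrangements (duplicates are possible if a CRE is its
    own reverse complement). `spacer` models the sequence placed between CREs;
    varying it corresponds to altering the spacing.
    """
    for order in permutations(cres):
        for flips in product((False, True), repeat=len(order)):
            parts = [reverse_complement(c) if flip else c for c, flip in zip(order, flips)]
            yield spacer.join(parts)


if __name__ == "__main__":
    toy_cres = ["AAAC", "GGGT", "CCTA"]  # hypothetical placeholders, not real CREs
    variants = list(crm_variants(toy_cres, spacer="TT"))
    print(len(variants))  # 48 = 3! * 2**3
    print(variants[0])    # AAACTTGGGTTTCCTA (reference order, no inversion)
```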

[0124] As used herein, the term “promoter” generally refers to a region of DNA located upstream of the nucleic acid sequence being transcribed that is necessary for transcription to occur, i.e., to initiate transcription. Promoters enable appropriate activation or repression of transcription of coding sequences under their control. Promoters typically contain specific sequences that are recognized and bound to by multiple TFs. TFs bind to the promoter sequence, resulting in the recruitment of RNA polymerase, an enzyme that synthesizes RNA from the coding region of a gene. A great many promoters are known in the art.

[0125] As used herein, the term "synthetic promoter" refers to a promoter that does not occur naturally. In this context, this typically includes the synthetic CRE and/or CRM of the present invention operably linked to a minimal (or core) promoter or a liver-specific proximal promoter. The CRE and/or CRM of the present invention serve to enhance the liver-specific transcription of a gene operably linked to the promoter. The portions of the synthetic promoter may be naturally occurring (e.g., a minimal promoter, or one or more CREs in a promoter), but the fully synthetic promoter does not occur naturally.

[0126] As used herein, a "minimal promoter" (also known as a "core promoter") refers to a short DNA segment that is inactive or mostly inactive by itself, but can mediate transcription when combined with other transcriptional regulatory elements. Minimal promoter sequences can be derived from a variety of different sources, including prokaryotic and eukaryotic genes. Examples of minimal promoters are discussed above and include the dopamine beta-hydroxylase gene minimal promoter, the cytomegalovirus (CMV) major immediate early gene minimal promoter (CMV-MP), and the herpes simplex thymidine kinase minimal promoter (MinTK). A minimal promoter typically includes a transcription start site (TSS) and elements immediately upstream, a binding site for RNA polymerase II, and basal transcription factor binding sites (often a TATA box).
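Minimal promoters often contain a TATA box, and locating such a motif is a simple pattern search. The Python sketch below converts an IUPAC consensus into a regular expression. The consensus TATAWAW (W = A or T) is one commonly cited form of the TATA box and is an assumption here, as is the made-up input sequence.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def find_motif(seq: str, consensus: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return [m.start() for m in re.finditer(f"(?={body})", seq.upper())]


if __name__ == "__main__":
    print(find_motif("GGCTATAAAAGGC", "TATAWAW"))  # [3]
```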

[0127] As used herein, a "proximal promoter" refers to a minimal promoter plus the proximal sequence upstream of a gene that tends to contain primary regulatory elements. This typically extends approximately 250 base pairs upstream of the TSS and includes specific TFBSs. In this case, the proximal promoter is preferably a naturally occurring liver-specific proximal promoter that can be combined with one or more CREs or CRMs of the present invention. However, the proximal promoter can be synthetic.

[0128] In the context of the present invention, a “functional variant” of a cis-regulatory element (CRE), cis-regulatory module (CRM), promoter, or other nucleic acid sequence is a variant of a reference sequence that retains the ability to function in the same manner as the reference sequence, for example, as a liver-specific cis-regulatory enhancer element, a liver-specific cis-regulatory module, or a liver-specific promoter. Alternative terms for such functional variants include “biological equivalent” or “equivalent.”

[0129] It will be recognized that the ability of a given cis-regulatory element to function as a liver-specific enhancer is determined primarily by the ability of the sequence to bind the same liver-specific transcription factors (TFs) that bind the reference sequence. Therefore, in most cases, a functional variant of a cis-regulatory element will contain transcription factor binding sites (TFBSs) for the same TFs as the reference cis-regulatory element. Although not essential, it is preferable that the TFBSs of the functional variant be in the same relative positions (i.e., order) as in the reference cis-regulatory element. Although not essential, it is also preferable that the TFBSs of the functional variant be in the same orientation as in the reference sequence (note that in some cases a TFBS may be present in the opposite orientation, for example, as the reverse complement of the sequence in the reference sequence). Although not essential, it is also preferable that the TFBSs of the functional variant be on the same strand as in the reference sequence. Therefore, in a preferred embodiment, the functional variant contains TFBSs for the same TFs in the same order, in the same orientation, and on the same strand as the reference sequence. It will also be recognized that the sequences between TFBSs (sometimes referred to as spacer sequences) are not particularly important to the function of the cis-regulatory element. Such sequences can typically be modified to some extent, and their lengths can be altered. However, in preferred embodiments, the spacing (i.e., the distance between adjacent TFBSs) in the functional variant is substantially the same as in the reference sequence (e.g., it is altered by no more than 20%, preferably by no more than 10%, and more preferably it is unchanged).
It will also be apparent that a functional variant of a cis-regulatory enhancer element can exist in the reverse orientation, for example, as the reverse complement of a cis-regulatory enhancer element described above or of a variant thereof.
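As a minimal sketch of the spacing criterion in [0129], the following Python compares the distances between adjacent TFBSs in a reference element and a candidate variant. It assumes (hypothetically) that TFBS start coordinates have already been located for the same TFs, in the same order, in both sequences; the function names and coordinates are illustrative only, and tolerances of 0.20 and 0.10 correspond to the 20% and 10% limits mentioned above.

```python
def adjacent_spacings(starts: list[int]) -> list[int]:
    """Distances between consecutive TFBS start positions (0-based coordinates)."""
    return [b - a for a, b in zip(starts, starts[1:])]


def spacing_preserved(ref_starts: list[int], var_starts: list[int],
                      tolerance: float = 0.20) -> bool:
    """True if every inter-TFBS spacing in the variant differs from the
    corresponding reference spacing by no more than `tolerance` (a fraction)."""
    if len(ref_starts) != len(var_starts):
        return False  # a different number of TFBSs is not compared here
    for ref_gap, var_gap in zip(adjacent_spacings(ref_starts),
                                adjacent_spacings(var_starts)):
        if abs(var_gap - ref_gap) > tolerance * ref_gap:
            return False
    return True


# Hypothetical coordinates for illustration only.
reference = [0, 20, 50]   # spacings 20 and 30
variant = [0, 22, 49]     # spacings 22 and 27
print(spacing_preserved(reference, variant, tolerance=0.20))  # True
print(spacing_preserved(reference, variant, tolerance=0.05))  # False
```

Comparing spacings rather than absolute positions lets a variant tolerate an insertion or deletion upstream of the first TFBS without being flagged.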

[0130] The level of sequence identity between a functional variant and a reference sequence can also be an indicator of preserved functionality. A high level of sequence identity within the TFBSs of a cis-regulatory element is generally more important than sequence identity within a spacer sequence (where there is little or no requirement for sequence conservation). However, it will be recognized that even within a TFBS a considerable degree of sequence variation can be tolerated, given that a functional TFBS does not need to match the consensus sequence exactly.

[0131] The ability of one or more TFs to bind to the TFBSs in a given functional variant can be determined by any relevant means known in the art, including, but not limited to, electrophoretic mobility shift assays (EMSA), binding assays, chromatin immunoprecipitation (ChIP), and ChIP sequencing (ChIP-seq). In a preferred embodiment, the ability of one or more TFs to bind to a given functional variant is determined by EMSA. Methods for performing EMSA are well known in the art; a preferred approach is described in Sambrook et al., cited above. Numerous relevant papers describing this procedure are available, e.g., Hellman and Fried, Nat Protoc. 2007; 2(8): 1849-1861.

[0132] The terms “liver-specific” or “liver-specific expression,” when relating to promoters, refer to the ability of a cis-regulatory element, cis-regulatory module, or promoter to enhance or drive gene expression in the liver (or in cells derived from the liver) in a preferential or dominant manner compared to other tissues (e.g., spleen, muscle, heart, lung, and brain). Gene expression can be in the form of mRNA or protein. In some embodiments, liver-specific expression is such that there is only a very small amount of expression in other (i.e., non-liver) tissues or cells, i.e., the expression is highly liver-specific. In some embodiments, a liver-specific promoter preferentially drives expression in the liver, but it can also drive gene expression at a lower level in another tissue of interest, e.g., muscle.

[0133] Whether a cis-regulatory element functions as a liver-specific cis-regulatory enhancer element can be readily assessed by those skilled in the art, who can therefore readily determine whether any variant of the specific cis-regulatory elements listed above remains functional (i.e., is a functional variant as defined above). For example, any given cis-regulatory element to be evaluated can be operably linked to a minimal promoter (e.g., placed upstream of a CMV minimal promoter (CMV-MP)), and the ability of the cis-regulatory element to drive liver-specific expression of a gene (typically a reporter gene) can be measured. Alternatively, a variant of a cis-regulatory enhancer element can be substituted for the reference cis-regulatory enhancer element within a synthetic liver-specific promoter, and the liver-specific expression driven by the modified promoter can be determined and compared to that driven by the unmodified promoter. Similarly, the ability of a cis-regulatory module or promoter to drive liver-specific expression can be readily assessed by those skilled in the art (e.g., as described in the examples below). The expression level of a gene driven by a variant of a reference promoter can be compared to the expression level driven by the reference sequence. In some embodiments, the variant can be said to remain functional if the liver-specific expression level it drives is at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 100% of the expression level driven by the reference promoter. Suitable nucleic acid constructs and reporter assays for evaluating the enhancement of liver-specific expression can be readily constructed, and the examples described below provide suitable methodologies.
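A short sketch of the functional-variant threshold in [0133] follows. It assumes background-corrected reporter readings (e.g., luciferase units) measured in the same liver-derived cell type for the variant and the reference promoter; the helper names and numbers are hypothetical, and the `threshold` parameter stands for any of the 50% to 100% cut-offs listed above.

```python
def relative_activity(variant_signal: float, reference_signal: float) -> float:
    """Expression driven by the variant as a fraction of the reference."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return variant_signal / reference_signal


def remains_functional(variant_signal: float, reference_signal: float,
                       threshold: float = 0.5) -> bool:
    """True if the variant reaches `threshold` (e.g., 0.5 for 50%) of the
    liver-specific expression driven by the reference promoter."""
    return relative_activity(variant_signal, reference_signal) >= threshold


# Hypothetical reporter readings from a liver-derived cell line.
print(relative_activity(620.0, 1000.0))        # 0.62
print(remains_functional(620.0, 1000.0, 0.5))  # True
print(remains_functional(620.0, 1000.0, 0.7))  # False
```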

[0134] Liver specificity can be identified where the expression of a gene (e.g., a therapeutic gene or reporter gene) occurs preferentially or predominantly in liver-derived cells. Preferential or predominant expression can be defined, for example, as a level of expression that is significantly higher in liver-derived cells than in other types of cells (i.e., non-liver-derived cells). For example, expression in liver-derived cells is preferably at least 5 times higher than in non-liver cells, more preferably at least 10 times higher, and in some cases 50 times higher or more. For convenience, liver-specific expression can preferably be demonstrated by comparing expression levels in hepatocyte-derived cell lines (e.g., Huh7 and / or HepG2 cells) or primary liver cells with expression levels in kidney-derived cell lines (e.g., HEK-293), cervical tissue-derived cell lines (e.g., HeLa), and / or lung-derived cell lines (e.g., A549).
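The fold-change comparison in [0134] can be sketched as below. As a conservative design choice (an assumption, not stated in the text), the ratio divides the lowest signal among the liver-derived lines by the highest signal among the non-liver lines, so a construct passes only if every liver line exceeds every comparison line by the required factor. The cell-line readings are hypothetical.

```python
import math


def liver_specificity_ratio(liver: dict[str, float],
                            non_liver: dict[str, float]) -> float:
    """Weakest liver-derived signal divided by the strongest non-liver signal."""
    weakest_liver = min(liver.values())
    strongest_other = max(non_liver.values())
    if strongest_other <= 0:
        return math.inf  # no detectable non-liver expression
    return weakest_liver / strongest_other


def is_liver_specific(liver: dict[str, float], non_liver: dict[str, float],
                      min_fold: float = 5.0) -> bool:
    """True if the conservative ratio reaches `min_fold` (e.g., 5, 10, or 50)."""
    return liver_specificity_ratio(liver, non_liver) >= min_fold


# Hypothetical background-corrected reporter signals.
liver_lines = {"Huh7": 800.0, "HepG2": 600.0}
other_lines = {"HEK-293": 40.0, "HeLa": 25.0, "A549": 50.0}
print(liver_specificity_ratio(liver_lines, other_lines))           # 12.0
print(is_liver_specific(liver_lines, other_lines, min_fold=10.0))  # True
print(is_liver_specific(liver_lines, other_lines, min_fold=50.0))  # False
```

Using mean signals instead of the min/max pair would be a less stringent alternative.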

[0135] The synthetic liver-specific promoter of the present invention is preferably suitable for promoting expression in the liver, for example, to drive liver-specific expression of a transgene, preferably a therapeutic transgene. In some embodiments, the liver-specific promoter of the present invention is suitable for promoting liver-specific expression of a transgene at a level at least 1.5 times higher than that of the LP1 promoter (SEQ ID NO: 432), preferably at least 2 times higher, more preferably at least 3 times higher, and even more preferably at least 5 times higher. Such expression is preferably determined in liver-derived cells, e.g., Huh7 and / or HepG2 cells, or primary hepatocytes (preferably primary human hepatocytes). In some embodiments, the synthetic liver-specific promoter of the present invention promotes gene expression in non-liver-derived cells (e.g., HEK-293, HeLa and / or A549 cells) at a level of two-thirds or less of that of the LP1 promoter (SEQ ID NO: 432).
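The two LP1-relative criteria in [0135] reduce to simple ratios. This sketch assumes the test promoter and LP1 were measured side by side with the same reporter, once in liver-derived cells and once in non-liver cells; checking both criteria together is an illustrative choice, since the text presents them as separate embodiments. Function names and numbers are hypothetical.

```python
def lp1_fold_changes(test_liver: float, lp1_liver: float,
                     test_other: float, lp1_other: float) -> tuple[float, float]:
    """Return (liver expression as a multiple of LP1,
    non-liver expression as a fraction of LP1)."""
    return test_liver / lp1_liver, test_other / lp1_other


def meets_lp1_criteria(test_liver: float, lp1_liver: float,
                       test_other: float, lp1_other: float,
                       min_liver_fold: float = 1.5,
                       max_other_fraction: float = 2 / 3) -> bool:
    liver_fold, other_fraction = lp1_fold_changes(
        test_liver, lp1_liver, test_other, lp1_other)
    return liver_fold >= min_liver_fold and other_fraction <= max_other_fraction


# Hypothetical readings: 3x LP1 in Huh7, half of LP1 in HEK-293.
print(lp1_fold_changes(300.0, 100.0, 10.0, 20.0))                      # (3.0, 0.5)
print(meets_lp1_criteria(300.0, 100.0, 10.0, 20.0))                    # True
print(meets_lp1_criteria(300.0, 100.0, 10.0, 20.0, min_liver_fold=5))  # False
```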

[0136] The preferred synthetic liver-specific promoter of the present invention is suitable for promoting the expression of liver-specific transgenes and has activity in liver cells that is at least 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 125%, 150%, 175%, 200%, 250%, 300%, 350%, or 400% of the activity of the TBG promoter (SEQ ID NO: 435).

[0137] The synthetic liver-specific promoters of the present invention are preferably suitable for promoting liver-specific expression at a level at least 1.5 times higher, and more preferably at least 2 times higher, than that of the CMV-IE promoter (SEQ ID NO: 433) in liver-derived cells (e.g., Huh7 and / or HepG2 cells). The synthetic liver-specific promoters disclosed herein may be LSP-H, LSP-M, and LSP-L promoters, denoting high, medium, and low expression in the liver, respectively. In some embodiments, LSP-H, LSP-M, and LSP-L promoters drive expression of proteins preferentially or predominantly in the liver, but can also express proteins in one or more other tissues, e.g., muscle and / or brain. Such LSP-H, LSP-M, and LSP-L promoters can express at least 90%, at least 80%, at least 70%, at least 60%, or at least 50% of the protein in the liver, and can also express at least 10%, at least 20%, at least 30%, at least 40%, or at least 50% in another tissue, such as muscle tissue. In some embodiments, for example, for the treatment of Pompe disease or other lysosomal storage diseases, the LSP-H, LSP-M, and LSP-L promoters useful in the methods and compositions disclosed herein can preferentially or predominantly drive or enhance gene expression in the liver, but can also express at least some of the protein in muscle tissue.

[0138] The term "identity" (or "sequence identity") refers to the sequence similarity between two polymer molecules, for example, between two nucleic acid molecules such as two DNA molecules. Sequence alignment and determination of sequence identity can be performed using the Basic Local Alignment Search Tool (BLAST), originally described by Altschul et al. 1990 (J Mol Biol 215: 403-10), or the "BLAST 2 Sequences" algorithm, described by Tatusova and Madden 1999 (FEMS Microbiol Lett 174: 247-250).
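Once an alignment has been produced (by BLAST, BLAST 2 Sequences, or another aligner), the identity percentage can be computed as in the sketch below. It assumes a gapped pairwise alignment given as two equal-length strings with '-' marking gaps, and counts gap columns in the denominator; conventions differ between tools, so this is an illustration, not a reimplementation of BLAST.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Gap columns ('-') count toward the alignment length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
                  if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Toy alignment: 7 identical columns out of 9.
print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 2))  # 77.78
```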

[0139] As used herein, the term "synthetic" refers to a nucleic acid molecule that does not exist in nature. The synthetic nucleic acid expression constructs of the present invention are produced artificially, typically by recombinant techniques. Such synthetic nucleic acids may contain naturally occurring sequences (e.g., promoters, enhancers, introns, and other such regulatory sequences), but these are present in a context not found in nature. For example, a synthetic gene (or portion of a gene) typically contains one or more nucleic acid sequences that are not contiguous in nature (chimeric sequences), and / or may include substitutions, insertions, and deletions, as well as combinations thereof.

[0140] A “spacer sequence” or “spacer,” as used herein, is a nucleic acid sequence that separates two functional nucleic acid sequences. It can be essentially any sequence, provided that it does not prevent the functional nucleic acid sequence (e.g., a cis-regulatory element) from functioning as desired (for example, this may occur if it contains a silencer sequence that prevents the binding of a desired transcription factor). Typically, it is non-functional, as it exists solely to separate adjacent functional nucleic acid sequences from each other.

[0141] When used herein, the term "pharmaceutically acceptable" means consistent with the art, compatible with other components of a pharmaceutical composition, and not harmful to its recipient.

[0142] The terms “treat,” “treating,” and “treatment” (and their grammatical variations) mean reducing the severity of, at least partially improving, or stabilizing the condition in question, and / or achieving some degree of reduction, mitigation, decrease, or stabilization of at least one clinical symptom, and / or delaying the progression of the disease or disorder.

[0143] The terms “prevent,” “preventing,” and “prevention” (and their grammatical variations) refer to preventing and / or delaying the onset of a disease, disorder, and / or clinical symptoms in a subject, and / or reducing the severity of such onset, compared to what would occur in the absence of the method of the present invention. Prevention may be complete, for example, the complete absence of the disease, disorder, and / or clinical symptoms. Prevention may also be partial, such that the appearance and / or onset of the disease, disorder, and / or clinical symptoms in a subject is substantially less severe than what would occur in the absence of the present invention.

[0144] A “therapeutically effective” amount, as used herein, is an amount sufficient to provide some improvement or benefit to a subject. In other words, a “therapeutically effective” amount is an amount that provides some reduction, alleviation, decrease, or stabilization of at least one clinical symptom in a subject. Those skilled in the art will recognize that the therapeutic effect does not need to be complete or curative, as long as some benefit is provided to the subject.

[0145] A “prophylactically effective” amount, as used herein, is an amount sufficient to prevent and / or delay the onset of a disease, disorder, and / or clinical symptoms in a subject, and / or to reduce and / or delay the severity of the onset of a disease, disorder, and / or clinical symptoms in a subject, compared to what would occur in the absence of the method of the present invention. Those skilled in the art will recognize that the level of prevention does not need to be complete, as long as some preventive benefit is provided to the subject.

[0146] The term "therapeutic dose" and similar terms mean the dose or plasma concentration in a subject that provides the desired specific pharmacological effect, for example, through expression of a therapeutic gene in the liver and its secretion into the plasma. It is emphasized that a therapeutically effective dose may not always be effective in treating the conditions described herein, even if such a dose is considered therapeutically effective by those skilled in the art. The therapeutically effective dose may be adjusted based on the route of administration and dosage form, the age and weight of the subject, and / or the disease or condition being treated.

[0147] The terms “individual,” “subject,” and “patient” are used interchangeably and refer to any individual subject having a disease or condition requiring treatment. For the purposes of this disclosure, a subject may be a primate, preferably a human, or another mammal, such as a dog, cat, horse, pig, goat, or cattle.

[0148] Additional patents incorporated herein by reference that relate to, disclose, or describe AAV or embodiments of AAV, including DNA vectors containing genes intended to be expressed, are U.S. Patent Nos. 6,491,907; 7,229,823; 7,790,154; 7,01898; 7,071,172; 7,892,809; 7,867,484; 8,889,641; 9,169,494; 9,169,492; 9,441,206; 9,409,953; 9,447,433; 9,592,247; and 9,737,618.

II. rAAV genomic elements

[0149] As disclosed herein, one aspect of the technology relates to an rAAV vector comprising a capsid and a nucleotide sequence within the capsid, referred to as the “rAAV vector genome.” The rAAV vector genome (also referred to as the “rAAV genome”) comprises a plurality of elements, including, but not limited to, two inverted terminal repeats (ITRs, e.g., a 5'-ITR and a 3'-ITR), as well as additional elements located between the ITRs, including a promoter, a heterologous gene, and a poly-A tail.

[0150] In some embodiments, the rAAV genome disclosed herein comprises 5'ITR and 3'ITR sequences and, located between the 5'ITR and 3'ITR, a promoter operably linked to a heterologous nucleic acid encoding an alpha-glucosidase (GAA) polypeptide. The promoter can be, for example, a liver-specific promoter sequence disclosed herein. The heterologous nucleic acid sequence may further optionally include one or more of the following elements: intron sequences, nucleic acids encoding secretory signal peptides, nucleic acids encoding IGF2 targeting peptides, and polyA sequences.

[0151] In some embodiments, the rAAV genome disclosed herein comprises 5'ITR and 3'ITR sequences and, located between the 5'ITR and 3'ITR, a promoter operably linked to a heterologous nucleic acid comprising a nucleic acid encoding a secretory signal peptide and a nucleic acid encoding an alpha-glucosidase (GAA) polypeptide (i.e., the heterologous nucleic acid encodes a GAA fusion polypeptide comprising a signal peptide-GAA polypeptide). The rAAV genome optionally further comprises one or more of: intron sequences, collagen stability (CS) sequences, poly-A tails, and nucleic acids encoding spacers of at least one amino acid. In some embodiments, the rAAV genome disclosed herein comprises 5'ITR and 3'ITR sequences and, located between the 5'ITR and 3'ITR, a liver-specific promoter disclosed herein operably linked to a heterologous nucleic acid comprising a nucleic acid encoding a secretory signal peptide (e.g., an FN1, AAT, or GAA signal peptide) and a nucleic acid encoding an alpha-glucosidase (GAA) polypeptide. The rAAV genome may further optionally include one or more of: intron sequences (e.g., MVM or HBB2 intron sequences), collagen stability (CS) sequences, poly-A tails, and nucleic acids encoding spacers of at least one amino acid.

[0152] In some embodiments, the rAAV genome disclosed herein comprises 5'ITR and 3'ITR sequences and, located between the 5'ITR and 3'ITR, a promoter operably linked to a heterologous nucleic acid encoding a GAA fusion polypeptide comprising a signal peptide, a targeting sequence, and a GAA polypeptide (signal peptide-targeting sequence-GAA polypeptide), where the targeting peptide is an IGF2 targeting peptide as described herein. The rAAV genome may further optionally comprise one or more of: intron sequences, collagen stability (CS) sequences, poly-A tails, and nucleic acids encoding spacers of at least one amino acid.
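The element orders described in [0150] to [0152] can be checked mechanically when constructs are assembled in silico. The sketch below uses hypothetical element labels; placing the intron between the promoter and the coding sequence is an assumption (a common arrangement, not stated in these paragraphs), and labels outside the ordered list, such as spacer or CS sequences, are not position-checked.

```python
# Hypothetical labels, 5' to 3'. The coding part follows the
# signal peptide -> IGF2 targeting peptide -> GAA order described above.
ORDER = ["promoter", "intron", "signal_peptide", "IGF2", "GAA", "polyA"]


def check_layout(elements: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the layout passes."""
    problems = []
    if not elements or elements[0] != "5'ITR":
        problems.append("5'ITR must be the first element")
    if not elements or elements[-1] != "3'ITR":
        problems.append("3'ITR must be the last element")
    for required in ("promoter", "GAA"):
        if required not in elements:
            problems.append(f"missing {required}")
    # Ranks of the recognized elements must be non-decreasing.
    ranks = [ORDER.index(e) for e in elements if e in ORDER]
    if ranks != sorted(ranks):
        problems.append("elements out of order")
    return problems


ok = ["5'ITR", "promoter", "intron", "signal_peptide", "IGF2",
      "spacer", "GAA", "polyA", "3'ITR"]
bad = ["5'ITR", "promoter", "IGF2", "signal_peptide", "GAA", "polyA", "3'ITR"]
print(check_layout(ok))   # []
print(check_layout(bad))  # ['elements out of order']
```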

[0153] Each element in the rAAV genome is discussed herein.

A. Alpha-glucosidase (GAA) polypeptide

[0154] Alpha-glucosidase (GAA) polypeptides are members of glycoside hydrolase family 31. Human GAA is synthesized as a 110 kDa precursor (Wisselaar et al. (1993) J. Biol. Chem. 268(3): 2223-31). The mature form of the enzyme is a mixture of 70 kDa and 76 kDa monomers (Wisselaar et al. (1993) J. Biol. Chem. 268(3): 2223-31). The precursor enzyme has seven potential glycosylation sites, four of which are retained in the mature enzyme (Wisselaar et al. (1993) J. Biol. Chem. 268(3): 2223-31). The proteolytic cleavage events that produce the mature enzyme occur in late endosomes or lysosomes (Wisselaar et al. (1993) J. Biol. Chem. 268(3): 2223-31).

[0155] The rAAV vector genome can encode a GAA polypeptide that may contain, for example, smaller portions of human GAA, such as amino acid residues 40-952 or 70-952, or amino acid residues 40-790 or 70-790.

[0156] In one embodiment, a GAA polypeptide may be fused to an IGF2 targeting sequence. In some embodiments, the IGF2 targeting sequence is fused at amino acid 40 or amino acid 70 of a human GAA polypeptide, or at a position within one or two amino acids of position 40 or 70. In some embodiments, the IGF2 targeting peptide disclosed herein is a ligand for an extracellular receptor; for example, the IGF2 targeting peptide binds to the human cation-independent mannose-6-phosphate receptor (CI-MPR) or the IGF2 receptor.

[0157] The C-terminal 160 amino acids are absent from the mature 70 kDa and 76 kDa GAA polypeptide species. However, certain Pompe alleles that result in complete loss of GAA activity, e.g., Val949Asp, map to this region (Becker et al. (1998) J. Hum. Genet. 62:991). The phenotype of this mutant indicates that the C-terminal portion of the protein, although not part of the 70 kDa or 76 kDa species, plays a crucial role in the protein's function. It has also been reported that the C-terminal portion of the protein is cleaved from the rest of the protein during processing but remains associated with the major species (Moreland et al. (Nov. 1, 2004) J. Biol. Chem., Manuscript 404008200). Therefore, the C-terminal residues may play a direct role in the protein's catalytic activity and / or may be involved in promoting proper folding of the N-terminal portion of the protein.

[0158] The native GAA gene encodes a precursor polypeptide possessing a signal sequence and an adjacent putative transmembrane domain, a trefoil domain (PFAM PF00088) (Thim (1989) FEBS Lett. 250:85), which is a cysteine-rich domain of approximately 45 amino acids containing three disulfide bonds, a domain corresponding to the 70 / 76 kDa mature polypeptide, and a C-terminal domain. It has been reported that both the trefoil domain and the C-terminal domain are necessary for the production of functional GAA, and that the C-terminal domain may interact with the trefoil domain during protein folding, possibly promoting proper disulfide bond formation in the trefoil domain.

[0159] GAA polypeptides are described in U.S. Patents 5,962,313 and 6,537,785, which are incorporated herein by reference in their entirety. Those skilled in the art can recognize specific positions on GAA to which a secretory signal peptide (SS) or, alternatively, a targeting peptide (e.g., an IGF2 targeting peptide) can be fused. Accordingly, in one embodiment, the present invention relates to GAA fusion proteins in which an SS or an IGF2 targeting peptide is fused at amino acid 40, 68, 69, 70, 71, 72, 779, 787, 789, 790, 791, 792, 793, or 796 of human GAA of SEQ ID NO: 10, of a modified GAA protein of SEQ ID NOs: 170-174, or of a portion thereof.

[0160] In some embodiments of the methods and compositions disclosed herein, the human GAA protein expressed by AAV comprises the amino acid sequence of SEQ ID NO: 10, or a fragment or variant thereof, for example, a human GAA protein beginning at residue 40, 68, 69, 70, 71, 72, 779, 787, 789, 790, 791, 792, 793, or 796 of SEQ ID NO: 10. In some embodiments of the methods and compositions disclosed herein, the human GAA protein expressed by AAV comprises the amino acid sequence of SEQ ID NO: 10, or a protein that is at least 60%, 70%, 80%, 85%, 90%, 95%, 98%, or 99% identical to SEQ ID NO: 10. In some embodiments of the methods and compositions disclosed herein, the human GAA protein expressed by AAV comprises a human GAA protein beginning at residue 40, 68, 69, 70, 71, 72, 779, 787, 789, 790, 791, 792, 793, or 796 of SEQ ID NO: 10, or an amino acid sequence that is at least 60%, 70%, 80%, 85%, 90%, 95%, 98%, or 99% identical thereto. In some embodiments, the human GAA protein expressed by AAV comprises the amino acids beginning at residue 40, 68, 69, 70, 71, 72, 779, 787, 789, 790, 791, 792, 793, or 796 of either SEQ ID NO: 170 (modGAA; H199R, R223H) or SEQ ID NO: 171 (modGAA; H199R, R223H, H201L), or a protein that is at least 60%, 70%, 80%, 85%, 90%, 95%, 98%, or 99% identical thereto.

[0161] In some embodiments, those skilled in the art can recognize specific positions on GAA to which a secretory signal peptide (SS) or, alternatively, a targeting peptide (e.g., an IGF2 targeting peptide) can be fused. For example, International Patent Application WO2018046774A1, which is incorporated herein by reference in its entirety, discloses truncated GAA polypeptides to which a secretory signal peptide (SS) or, alternatively, a targeting peptide (e.g., an IGF2 targeting peptide) can be attached. The signal peptide or IGF2 targeting peptide can be attached to any truncated GAA polypeptide or truncated modified GAA polypeptide beginning at an amino acid of a truncated GAA protein disclosed in U.S. Provisional Application No. 62,937,556, filed November 19, 2019, and International Application WO2020 / 102667, filed November 15, 2019, which are incorporated herein by reference in their entirety.

[0162] In some embodiments, the GAA fusion polypeptide encoded by the rAAV genome described herein may include smaller portions of human GAA, such as amino acid residues 40-952 or 70-952, or amino acid residues 40-790 or 70-790. In one embodiment, a secretory signal peptide (SS) or a targeting peptide, such as an IGF2 targeting peptide, is fused at amino acid 40 or amino acid 70, or at a position within one or two amino acids of position 40 or 70.
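The residue ranges and fusion points in [0162] translate directly into 1-based, inclusive slicing, with fusion partners joined in the N- to C-terminal order given in [0163]. In this sketch the GAA sequence is a placeholder (SEQ ID NO: 10 is not reproduced here), the targeting peptide is a run of 'X' standing in for an IGF2 targeting peptide, and the signal peptide is the IgG1 leader (SEQ ID NO: 176) quoted in [0178]; the helper names are hypothetical.

```python
from typing import Optional


def residues(seq: str, start: int, end: Optional[int] = None) -> str:
    """Residues start..end of `seq`, 1-based and inclusive (end defaults to the last)."""
    stop = len(seq) if end is None else end
    if start < 1 or stop > len(seq) or stop < start:
        raise ValueError(f"invalid range {start}-{stop} for length {len(seq)}")
    return seq[start - 1:stop]


def build_fusion(gaa: str, start: int, end: Optional[int] = None,
                 signal: str = "", targeting: str = "") -> str:
    """N- to C-terminal order: signal peptide, targeting peptide, GAA fragment."""
    return signal + targeting + residues(gaa, start, end)


IGG1_LEADER = "MEFGLSWVFLVALLKGVQCE"    # SEQ ID NO: 176 (201lp)
PLACEHOLDER_IGF2 = "X" * 10             # hypothetical stand-in
toy_gaa = "ACDEFGHIKLMNPQRSTVWY" * 48   # 960-residue placeholder, not SEQ ID NO: 10

fusion = build_fusion(toy_gaa, 70, 952, IGG1_LEADER, PLACEHOLDER_IGF2)
print(len(residues(toy_gaa, 70, 952)))  # 883
print(len(fusion))                      # 913
```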

[0163] In some embodiments, the fusion protein (i.e., an SS-GAA fusion polypeptide or an SS-IGF2-GAA fusion protein) comprising a secretory signal peptide (SS), a GAA polypeptide, and optionally an IGF2 targeting peptide, comprises amino acid residues 40-952 or 70-952 of human acid alpha-glucosidase (GAA) (SEQ ID NO: 10). In some embodiments, the N-terminus of the GAA polypeptide is attached to the C-terminus of the SS; in other embodiments, the N-terminus of the GAA polypeptide is attached to the C-terminus of the IGF2 targeting peptide, and the N-terminus of the IGF2 targeting peptide is attached to the C-terminus of the secretory signal peptide.

(i) Modified GAA (modGAA)

[0164] In some embodiments, the GAA protein includes the H201L variant disclosed in US2014 / 0186326 and Moreland et al., Gene, 2012; 491 (25-30), both of which are incorporated herein by reference in their entirety. Specifically, the histidine (H) at amino acid position 201 is replaced with a leucine (L) residue to enable rapid processing of the 76 kDa GAA intermediate to the 70 kDa mature GAA protein.

[0165] In particular, in some embodiments, the fusion proteins disclosed herein include the GAA polypeptide of SEQ ID NO: 10 having an amino acid modification that increases hydrophobicity at or near the N-terminal 70 kDa processing site. In some examples, the GAA polypeptide is modified at one or more amino acids corresponding to positions 190-209 of SEQ ID NO: 10. In further embodiments, the polypeptide is modified at one or more amino acids corresponding to positions 195-209 of SEQ ID NO: 10. In further embodiments, the modification is at one or more amino acids corresponding to positions 200-204 of SEQ ID NO: 10. In certain embodiments, the modification is at the amino acid corresponding to position 201 of SEQ ID NO: 10. In further embodiments, the modification is the substitution of one or more amino acids with a more hydrophobic amino acid. In other embodiments, the modification is the insertion of one or more hydrophobic amino acids. In still further embodiments, the hydrophobic amino acid is selected from leucine and tyrosine, or conservative substitutes for leucine or tyrosine.

[0166] In certain embodiments, GAA is modified to increase its hydrophobicity at or near the N-terminal 70 kDa processing site by substituting at least one amino acid with a more hydrophobic amino acid. In some embodiments, the substitution may be made within five amino acids upstream or downstream of the N-terminal 70 kDa processing site. In certain examples, the amino acid substitution may be made at amino acids corresponding to positions 195-209 of SEQ ID NO: 10. In other examples, the amino acid substitution may be made at amino acids corresponding to positions 200-204 of SEQ ID NO: 10. In further embodiments, the modified human GAA contains a hydrophobic amino acid at the position corresponding to amino acid position 201 of SEQ ID NO: 10. In some embodiments, GAA is modified by inserting one or more hydrophobic amino acids at or near the N-terminal 70 kDa processing site. Additional modifications include the deletion of one or more amino acids at or near the N-terminal 70 kDa processing site.

[0167] In certain embodiments, a modified human GAA is provided that contains hydrophobic amino acids (natural or synthetic) at two or more positions of the N-terminal 70 kDa processing site, or within five amino acids of the N-terminal 70 kDa processing site. In one embodiment, one of the modified amino acids is at the position corresponding to amino acid 201 of SEQ ID NO: 10.

[0168] In various embodiments, the hydrophobic amino acid is selected from valine, leucine, isoleucine, methionine, phenylalanine, tryptophan, tyrosine, cysteine, or alanine. In further embodiments, the hydrophobic amino acid is leucine or tyrosine. In some embodiments, the modified human GAA contains synthetic or non-natural amino acids that exhibit hydrophobic properties. Generally, the substituted amino acid is more hydrophobic than the wild-type amino acid, and therefore the hydrophobicity is increased at or near the 70 kDa processing site at the N-terminus.

[0169] In one exemplary embodiment, the modified GAA has leucine at the position corresponding to amino acid 201 of SEQ ID NO: 10. In another embodiment, the modified GAA has tyrosine at the position corresponding to amino acid 201 of SEQ ID NO: 10.
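The substitutions described in [0165] to [0169] replace a residue near the N-terminal 70 kDa processing site with a more hydrophobic one. Because the text does not fix a hydrophobicity scale, this sketch assumes the Kyte-Doolittle hydropathy index; note that this scale scores tyrosine and tryptophan as mildly hydrophilic, so whether a given substitution counts as "more hydrophobic" can depend on the scale chosen. The 195-209 window comes from [0166]; the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982); an assumed scale.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def is_qualifying_substitution(wild_type: str, new: str, position: int,
                               window: tuple[int, int] = (195, 209)) -> bool:
    """True if `position` lies in `window` (SEQ ID NO: 10 numbering) and the new
    residue scores higher than the wild-type residue on the assumed scale."""
    lo, hi = window
    in_window = lo <= position <= hi
    return in_window and KYTE_DOOLITTLE[new] > KYTE_DOOLITTLE[wild_type]


print(is_qualifying_substitution("H", "L", 201))  # True  (H201L)
print(is_qualifying_substitution("H", "Y", 201))  # True  (H201Y)
print(is_qualifying_substitution("H", "R", 201))  # False
print(is_qualifying_substitution("H", "L", 220))  # False (outside 195-209)
```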

[0170] In some embodiments, the modified human GAA protein includes a polypeptide having a histidine (H) to arginine (R) substitution at amino acid position 199 of SEQ ID NO: 10 (H199R; GAA(H199R)) or an arginine (R) to histidine (H) substitution at amino acid position 223 of SEQ ID NO: 10 (R223H; GAA(R223H)). In some embodiments, the modified human GAA protein includes a polypeptide having both the H199R and the R223H substitutions of SEQ ID NO: 10 (GAA(H199R-R223H)). In some embodiments, the modified human GAA protein includes SEQ ID NO: 170, or a variant having at least one of the H199R and R223H modifications, or both, and having at least 80%, 90%, 95%, or 99% homology to at least 500, 550, 600, 650, 700, 750, 800, 850, or 900 amino acids of SEQ ID NO: 170. In some embodiments, the native leader sequence of GAA (i.e., SEQ ID NO: 175, amino acids 1-27 of SEQ ID NO: 170) is replaced with an IGF2 targeting peptide disclosed herein, the leader sequence of SEQ ID NO: 176, a wild-type IL2 leader peptide (SEQ ID NO: 178), a modified IL2 leader peptide (SEQ ID NO: 180), or a leader peptide with at least 90% sequence identity to SEQ ID NO: 176, 178, or 180.

[0171] In some embodiments, the modified human GAA protein includes variants having at least 80%, 90%, 95%, or 99% homology to at least 500, 550, 600, 650, 700, 750, 800, 850, or 900 amino acids of SEQ ID NO: 171 and having the H201L modification. In some embodiments, the native leader sequence of GAA (i.e., SEQ ID NO: 175, amino acids 1-27 of SEQ ID NO: 171) is replaced with an IGF2 targeting peptide disclosed herein, the leader sequence of SEQ ID NO: 176, a wild-type IL2 leader peptide (SEQ ID NO: 178), a modified IL2 leader peptide (SEQ ID NO: 180), or a leader peptide with at least 90% sequence identity to SEQ ID NO: 176, 178, or 180.

[0172] In some embodiments, the modified human GAA protein comprises a polypeptide having at least one modification selected from H199R, R223H, or H201L of SEQ ID NO: 10, or a variant having at least one of these modifications and exhibiting at least 80%, 90%, 95%, or 99% homology to at least 500, 550, 600, 650, 700, 750, 800, 850, or 900 amino acids of SEQ ID NO: 10. In some embodiments, the modified human GAA protein comprises a polypeptide having at least two modifications selected from H199R, R223H, or H201L of SEQ ID NO: 10, or a variant having at least two of these modifications and exhibiting at least 80%, 90%, 95%, or 99% homology to at least 500, 550, 600, 650, 700, 750, 800, 850, or 900 amino acids of SEQ ID NO: 10. In some embodiments, the modified human GAA protein includes a polypeptide having three modifications of SEQ ID NO: 10, namely H199R, R223H, and H201L (GAA-H199R-H201L-R223H), or a variant having these three modifications and exhibiting at least 80%, 90%, 95%, or 99% homology to at least 500, 550, 600, 650, 700, 750, 800, 850, or 900 amino acids of SEQ ID NO: 10.
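Combinations such as H199R, R223H, and H201L in [0170] to [0172] use standard substitution notation (wild-type residue, 1-based position, new residue). The sketch below applies such substitutions and refuses to proceed if the stated wild-type residue is not found at that position, which guards against numbering off-by-one errors. The input is a hypothetical toy sequence, since SEQ ID NO: 10 is not reproduced in this section, and the function name is illustrative.

```python
import re

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(seq: str, substitutions: list[str]) -> str:
    """Apply substitutions like 'H201L' (1-based positions) to a protein sequence."""
    residues = list(seq)
    for sub in substitutions:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognized notation: {sub}")
        wild_type, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{sub}: position outside sequence")
        if residues[pos - 1] != wild_type:
            raise ValueError(f"{sub}: found {residues[pos - 1]} at {pos}")
        residues[pos - 1] = new
    return "".join(residues)


# Toy 250-residue sequence carrying H199, H201, and R223 (placeholder only).
toy = ["A"] * 250
toy[198], toy[200], toy[222] = "H", "H", "R"
toy_seq = "".join(toy)

mod = apply_substitutions(toy_seq, ["H199R", "R223H", "H201L"])
print(mod[198], mod[200], mod[222])  # R L H
```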

[0173] In a particular embodiment, a modified human GAA is provided that has at least 80%, 90%, 95%, or 99% homology to at least 500, 550, 600, 650, 700, 750, 800, 850, or 900 amino acids of SEQ ID NO: 10, and further comprises at least one amino acid substituted with a more hydrophobic amino acid at the N-terminal 70 kDa processing site.

[0174] In some embodiments, at least 50% of the modified human GAA is processed to a 70 kDa form in lysosomes within 20, 30, or 40 hours. In further embodiments, substantially all of the modified human GAA is processed to a 70 kDa form in lysosomes within 55, 65, or 75 hours.

[0175] In certain embodiments, the modified human GAA of the present invention can be identified by its more rapid proteolytic processing to a 70 kDa mature form or its corresponding variant. In other embodiments, the modified human GAA described herein can be identified by the generation of an 82 kDa intermediate polypeptide that is not generated during the proteolytic processing of native human GAA. In further embodiments, the modified human GAA can be identified by the absence of a 76 kDa intermediate polypeptide that is generated during the proteolytic processing of unmodified human GAA.

[0176] In certain embodiments, the polypeptide has at least 80% identity with at least 500 amino acids of SEQ ID NO: 10 or SEQ ID NOs: 170-171. In some examples, the polypeptide has at least 90% identity with at least 500 amino acids of SEQ ID NO: 10 or SEQ ID NOs: 170-171. In other examples, the polypeptide has at least 95% identity with at least 500 amino acids of SEQ ID NO: 10 or SEQ ID NOs: 170-171.

[0177] In certain embodiments, GAA polypeptides having a modification to a hydrophobic residue at amino acid 201, e.g., the H201L modification, exhibit more rapid proteolytic processing by lysosomal proteases compared to unmodified human acid alpha-glucosidase protein. In some embodiments, at least 50% of the GAA precursor polypeptide is proteolytically processed to the 70 kDa mature GAA form within 20 hours of expression. In other embodiments, substantially all of the GAA precursor polypeptide is proteolytically processed to the 70 kDa mature GAA form within 55 hours of expression.
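Kinetic criteria such as those in [0174] and [0177] (e.g., at least 50% processed to the 70 kDa form within 20 hours) can be checked against a time course. This sketch assumes the processed fraction has already been quantified at each time point (for example, by band densitometry) and uses linear interpolation between sampled points; the time course shown is hypothetical, not data from this document.

```python
from typing import Optional, Sequence


def time_to_fraction(times: Sequence[float], fractions: Sequence[float],
                     target: float = 0.5) -> Optional[float]:
    """First time at which the processed fraction reaches `target`, by linear
    interpolation between sampled points; None if it is never reached."""
    if len(times) != len(fractions) or not times:
        raise ValueError("need matching, non-empty time and fraction lists")
    if fractions[0] >= target:
        return times[0]
    for i in range(1, len(times)):
        if fractions[i] >= target:
            # All earlier points are below target, so f1 > f0 here.
            t0, t1 = times[i - 1], times[i]
            f0, f1 = fractions[i - 1], fractions[i]
            return t0 + (target - f0) * (t1 - t0) / (f1 - f0)
    return None


hours = [0, 10, 20, 30, 40]
processed = [0.0, 0.3, 0.6, 0.8, 0.9]  # hypothetical fraction in the 70 kDa form
t_half = time_to_fraction(hours, processed)
print(round(t_half, 1))  # 16.7
print(t_half <= 20)      # True
```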

[0178] In some embodiments, the native GAA leader peptide consisting of amino acids 1-27 of SEQ ID NO: 10 (i.e., MGVRHPPCSHRLLAVCALVSLATAALL, SEQ ID NO: 175) is replaced with a different signal peptide (leader peptide). For example, the native leader peptide of GAA (SEQ ID NO: 175) may be replaced with any of the following: (i) an IgG1 leader peptide having the amino acid sequence MEFGLSWVFLVALLKGVQCE (SEQ ID NO: 176) encoded by the nucleic acid sequence of SEQ ID NO: 177 (referred to herein as "201 leader peptide" or "201lp"), (ii) wtIL2 lp: MYRMQLLSCIALSLALVTNS (SEQ ID NO: 178) encoded by the nucleic acid sequence of SEQ ID NO: 179, or (iii) mutIL2 lp: MYRMQLLLLIALSLALVTNS (SEQ ID NO: 180) encoded by the nucleic acid sequence of SEQ ID NO: 181. In some embodiments, the native GAA leader peptide (SEQ ID NO: 175) is retained and one or more additional signal peptides are added, for example, an AAT or FN1 signal peptide, or (i) the IgG1 leader peptide MEFGLSWVFLVALLKGVQCE (SEQ ID NO: 176) encoded by the nucleic acid sequence of SEQ ID NO: 177 ("201 leader peptide" or "201lp"), (ii) wtIL2 lp: MYRMQLLSCIALSLALVTNS (SEQ ID NO: 178) encoded by the nucleic acid sequence of SEQ ID NO: 179, or (iii) mutIL2 lp: MYRMQLLLLIALSLALVTNS (SEQ ID NO: 180) encoded by the nucleic acid sequence of SEQ ID NO: 181.
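The leader swap described above can be sketched at the protein level as a simple prefix replacement. The leader sequences are copied from this paragraph; the function name `swap_leader` and the short placeholder standing in for the mature GAA residues (28 onward) are hypothetical.

```python
# Leader sequences as given in the text.
NATIVE_GAA_LEADER = "MGVRHPPCSHRLLAVCALVSLATAALL"  # SEQ ID NO: 175 (aa 1-27 of SEQ ID NO: 10)
ALTERNATIVE_LEADERS = {
    "201lp": "MEFGLSWVFLVALLKGVQCE",      # IgG1 leader, SEQ ID NO: 176
    "wtIL2 lp": "MYRMQLLSCIALSLALVTNS",   # SEQ ID NO: 178
    "mutIL2 lp": "MYRMQLLLLIALSLALVTNS",  # SEQ ID NO: 180
}


def swap_leader(gaa_protein: str, new_leader: str) -> str:
    """Replace the native 27-residue GAA leader with `new_leader`.

    Raises ValueError if the input does not start with the native leader,
    so an already-modified sequence is not silently truncated.
    """
    if not gaa_protein.startswith(NATIVE_GAA_LEADER):
        raise ValueError("sequence does not begin with the native GAA leader")
    return new_leader + gaa_protein[len(NATIVE_GAA_LEADER):]


mature_placeholder = "QQQQ"  # hypothetical stand-in for GAA residues 28 onward
print(swap_leader(NATIVE_GAA_LEADER + mature_placeholder, ALTERNATIVE_LEADERS["201lp"]))
# MEFGLSWVFLVALLKGVQCEQQQQ
```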

[0179] In some embodiments, GAA is modified to add or remove glycosylation sites, such as N-linked glycosylation sites, O-linked glycosylation sites, or both. In certain embodiments, glycosylation sites are added or removed by N-terminal deletion, C-terminal deletion, internal deletion, random point mutagenesis, or site-directed mutagenesis. In some embodiments, exemplary GAA modifications include the addition of one or more asparagine (Asn) residues, one or more mutations that create asparagine (Asn) residues, or the deletion of one or more asparagine (Asn) residues. In certain embodiments, all or some of the N-linked and/or O-linked glycosylation sites present in GAA are mutated. In some embodiments, GAA modifications can provide information about the bioactivity, physical structure, and/or substrate-binding potential of GAA.

(ii) Nucleic acid encoding GAA

[0180] In some embodiments, the rAAV genome includes heterologous nucleic acid sequences encoding the entire GAA polypeptide (e.g., the N-terminal/catalytic domain and the C-terminal domain), which are not fused to heterologous signal sequences or targeting peptides.

[0181] In some embodiments, the rAAV genome includes a heterologous nucleic acid sequence encoding a secretory signal peptide or an IGF2-targeting peptide, fused in-frame to the 3' end of a GAA nucleic acid sequence encoding the entire GAA polypeptide (e.g., the N-terminal/catalytic domain and the C-terminal domain). For example, the heterologous nucleic acid sequence encoding the secretory signal peptide or IGF2-targeting peptide is fused in-frame to the 3' end of a GAA nucleic acid sequence encoding the 70 kDa and 76 kDa GAA polypeptides, and both polypeptides are expressed from the rAAV genome when the rAAV vector transduces a mammalian cell. In some embodiments, expression of the GAA nucleic acid may be driven by two promoters in the rAAV genome, or by a single promoter driving expression of a bicistronic construct.

[0182] In some embodiments of the methods and compositions disclosed herein, the rAAV vector comprises a nucleic acid sequence encoding the GAA protein that is a wild-type GAA nucleic acid sequence, for example, SEQ ID NO: 11, SEQ ID NO: 72, or SEQ ID NO: 182. In some embodiments of the methods and compositions disclosed herein, the rAAV vector comprises a nucleic acid sequence encoding the GAA protein that is codon-optimized for one or more of the following: (i) enhanced expression in vivo, (ii) reduction of CpG islands, or (iii) reduction of the innate immune response. Exemplary codon-optimized GAA nucleotide sequences for use in the methods and rAAV compositions disclosed herein can be selected from SEQ ID NO: 73, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 76, or SEQ ID NO: 182, or from a nucleic acid sequence having at least 60%, 70%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 73, 74, 75, 76, or 182.

[0183] In addition, in some embodiments, the GAA nucleic acid sequences for use in the methods and rAAV compositions disclosed herein are further modified by one or more of the following: (i) removal of at least one or two, or in some cases all, alternative reading frames; (ii) removal of one or more CpG islands; (iii) modification of the Kozak sequence; (iv) modification of the translation termination sequence; and (v) removal of spacers between the promoter and the Kozak sequence.
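Two of the modifications listed above, CpG removal and removal of alternative reading frames, start from locating CpG dinucleotides (the units that CpG islands are built from) and out-of-frame ATG codons. The sketch below scans the forward strand only; treating every out-of-frame ATG as a candidate alternative reading frame is an assumption made here for illustration, and the function names are hypothetical.

```python
from typing import List


def cpg_positions(seq: str) -> List[int]:
    """1-based positions of the C in every CpG dinucleotide on the given strand."""
    s = seq.upper()
    return [i + 1 for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def out_of_frame_atgs(seq: str) -> List[int]:
    """1-based positions of ATG triplets not in frame with nucleotide 1.

    These are only candidate starts of alternative reading frames on the
    forward strand; the reverse strand and ORF length are ignored here.
    """
    s = seq.upper()
    return [i + 1 for i in range(len(s) - 2)
            if s[i:i + 3] == "ATG" and i % 3 != 0]


toy = "ATGCGATGAAACG"  # hypothetical toy sequence
print(cpg_positions(toy))      # [4, 12]
print(out_of_frame_atgs(toy))  # [6]
```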

[0184] For example, in some embodiments, the rAAV composition includes the hGAA nucleotide sequence of SEQ ID NO: 182, or a nucleic acid sequence having at least 60%, 70%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 182, wherein SEQ ID NO: 182 includes the following elements shown in Table 1A, compared to the wild-type nucleic acid sequence for GAA.

[0185] [Table 1A]

[0186] In some embodiments, the rAAV composition comprises the hGAA nucleotide sequence of SEQ ID NO: 182, or a nucleic acid sequence having at least 60%, 70%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 182, wherein the hGAA nucleotide sequence is modified with a series of point mutations that eliminate three potentially pro-inflammatory CpG motifs and several alternative reading frames (ARFs). Relative to the wild-type GAA nucleic acid sequence, SEQ ID NO: 182 includes the point mutations shown in Table 1B, in which nucleotides are numbered with the "A" of the GAA start codon ATG as nucleotide 1.
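Because Table 1B numbers nucleotides from the "A" of the ATG start codon, any nucleotide position maps to a codon by integer arithmetic. A minimal sketch; the helper names `codon_of` and `codon_span` are hypothetical.

```python
from typing import Tuple


def codon_of(nt_pos: int) -> Tuple[int, int]:
    """Map a 1-based nucleotide position (A of ATG = 1) to
    (codon number, position within that codon), both 1-based."""
    if nt_pos < 1:
        raise ValueError("positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def codon_span(codon_number: int) -> Tuple[int, int]:
    """First and last nucleotide positions of a 1-based codon number."""
    if codon_number < 1:
        raise ValueError("codon numbers are 1-based")
    first = 3 * (codon_number - 1) + 1
    return first, first + 2


print(codon_span(27))  # (79, 81)
print(codon_of(81))    # (27, 3)
```

With this numbering, nucleotides 1-81 correspond exactly to codons 1-27, i.e., to the 27-residue native GAA leader peptide (amino acids 1-27 of SEQ ID NO: 10).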

[0187] [Table 1B]

[0188] In some embodiments, the nucleic acid sequence encoding the native leader peptide in SEQ ID NO: 182 (e.g., nucleotides 1-81 of SEQ ID NO: 182) can be replaced by a nucleic acid sequence encoding 201lp, wtIL2 lp, or mutIL2 lp. Thus, in some embodiments, nucleotides 1-81 of SEQ ID NO: 182 (encoding the native leader peptide of GAA) can be replaced by the nucleic acid sequence of SEQ ID NO: 177 (201lp), SEQ ID NO: 179 (wtIL2 lp), or SEQ ID NO: 181 (mutIL2 lp), or by a nucleic acid sequence having at least 60%, 70%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 177, 179, or 181.
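Replacing nucleotides 1-81 keeps the rest of the coding sequence in frame as long as the incoming leader is a whole number of codons. The sketch below assumes plain DNA strings. The sequences of SEQ ID NOs: 177, 179, and 181 are not reproduced in this section, so the test values are hypothetical placeholders, and the requirement that the new leader begin with ATG is an added sanity check, not something stated in the text.

```python
NATIVE_LEADER_NT = 81  # nucleotides 1-81 = codons 1-27 = native GAA leader


def replace_leader_cds(gaa_cds: str, new_leader_cds: str) -> str:
    """Swap nucleotides 1-81 of a GAA coding sequence for another leader's CDS."""
    if len(gaa_cds) <= NATIVE_LEADER_NT:
        raise ValueError("coding sequence is too short to contain the native leader")
    if len(new_leader_cds) % 3 != 0:
        raise ValueError("replacement leader must be a whole number of codons")
    if not new_leader_cds.upper().startswith("ATG"):
        raise ValueError("replacement leader should start with an ATG codon")
    # Both the removed (81 nt) and inserted segments are multiples of three,
    # so the downstream GAA sequence stays in its original reading frame.
    return new_leader_cds + gaa_cds[NATIVE_LEADER_NT:]


toy_cds = "ATG" + "GCT" * 26 + "CAC" * 3  # hypothetical 90-nt stand-in
new_leader = "ATG" + "AAA" * 19           # hypothetical 20-codon leader
print(replace_leader_cds(toy_cds, new_leader)[-9:])  # CACCACCAC
```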

[0189] In some embodiments, the rAAV vector or rAAV genome includes a heterologous nucleic acid sequence encoding a GAA polypeptide of SEQ ID NO: 170 (a GAA polypeptide having the native GAA signal sequence and the H199R and R223H modifications) or SEQ ID NO: 171 (a GAA polypeptide having the native GAA signal sequence and the H199R, H201L, and R223H modifications). The GAA polypeptide of SEQ ID NO: 170 is encoded by the nucleic acid sequence of SEQ ID NO: 182. Therefore, in some embodiments, the rAAV vector includes the nucleic acid of SEQ ID NO: 182 encoding a modified GAA polypeptide including the H199R and R223H modifications. The GAA polypeptide of SEQ ID NO: 171 is encoded by the nucleic acid sequence of SEQ ID NO: 182 in which base pairs (bp) 667-669 are changed from CAC to one of UUA, UUG, CUU, CUC, CUA, or CUG (changing the encoded amino acid from histidine (H) to leucine (L)), or in which bp 668 is changed from A to U. Therefore, in some embodiments, the rAAV vector contains the nucleic acid of SEQ ID NO: 182 so modified (bp 667-669 changed from CAC to one of UUA, UUG, CUU, CUC, CUA, or CUG, or bp 668 changed from A to U), which encodes a modified GAA polypeptide including the H199R, H201L, and R223H modifications.
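The substitutions described above can be checked codon by codon. This minimal sketch writes U as T so the RNA-style codons in the text can be compared with a DNA sequence, uses the six leucine codons of the standard genetic code, and runs on a short hypothetical toy sequence (CAC at positions 4-6 standing in for bp 667-669) rather than on SEQ ID NO: 182 itself. The function names are hypothetical.

```python
LEUCINE_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}  # standard genetic code


def to_dna(seq: str) -> str:
    """Uppercase and write U as T so RNA-style notation can be compared."""
    return seq.upper().replace("U", "T")


def point_mutate(seq: str, pos: int, ref: str, alt: str) -> str:
    """Change the base at 1-based `pos` from `ref` to `alt`, checking `ref` first."""
    i = pos - 1
    if to_dna(seq[i]) != to_dna(ref):
        raise ValueError(f"expected {ref} at position {pos}, found {seq[i]}")
    return seq[:i] + alt + seq[i + 1:]


def codon_at(seq: str, first_pos: int) -> str:
    """Three bases starting at 1-based `first_pos`, in DNA notation."""
    return to_dna(seq[first_pos - 1:first_pos + 2])


toy = "GGGCACGGG"                      # hypothetical; CAC (His) at positions 4-6
mutated = point_mutate(toy, 5, "A", "T")
print(codon_at(mutated, 4), codon_at(mutated, 4) in LEUCINE_CODONS)  # CTC True
```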

[0190] In some embodiments, the rAAV vector or rAAV genome includes a heterologous nucleic acid sequence encoding a GAA polypeptide selected from SEQ ID NO: 172 (a GAA polypeptide in which the native signal peptide is replaced with an IgG signal sequence, with H199R and R223H modifications), SEQ ID NO: 173 (a GAA polypeptide in which the native signal peptide is replaced with a wtIL2 signal sequence, with H199R and R223H modifications), or SEQ ID NO: 174 (a GAA polypeptide in which the native signal peptide is replaced with a mutIL2 signal sequence, with H199R and R223H modifications).

[0191] In some embodiments, the rAAV vector or rAAV genome includes a heterologous nucleic acid sequence containing SEQ ID NO: 182 in which bp 1-81 are replaced with the nucleic acid of SEQ ID NO: 177 (IgG signal sequence), which encodes the GAA polypeptide of SEQ ID NO: 172 (IgG leader-GAA with H199R and R223H modifications). In some embodiments, the rAAV vector includes a heterologous nucleic acid sequence containing SEQ ID NO: 182 in which bp 668 is changed from A to U and bp 1-81 are replaced with the nucleic acid of SEQ ID NO: 177 (IgG signal peptide), which encodes the GAA polypeptide of SEQ ID NO: 172 (IgG leader-GAA with H199R, H201L, and R223H modifications).

[0192] In some embodiments, the rAAV vector or rAAV genome includes a heterologous nucleic acid sequence containing SEQ ID NO: 182 in which bp 1-81 are replaced with the nucleic acid of SEQ ID NO: 179 (wt IL2 signal peptide), which encodes the GAA polypeptide of SEQ ID NO: 173 (wt IL2 signal peptide-GAA with H199R and R223H modifications). In some embodiments, the rAAV vector includes a heterologous nucleic acid sequence containing SEQ ID NO: 182 in which bp 668 is changed from A to U and bp 1-81 are replaced with the nucleic acid of SEQ ID NO: 179 (wt IL2 signal peptide), which encodes the GAA polypeptide of SEQ ID NO: 173 (wt IL2 signal peptide-GAA with H199R, H201L, and R223H modifications).

[0193] In some embodiments, the rAAV vector includes a heterologous nucleic acid sequence containing SEQ ID NO: 182 in which bp 1-81 are replaced with the nucleic acid of SEQ ID NO: 181 (mut IL2 signal peptide), which encodes the GAA polypeptide of SEQ ID NO: 174 (mut IL2 signal peptide-GAA with H199R and R223H modifications). In some embodiments, the rAAV vector includes a heterologous nucleic acid sequence containing SEQ ID NO: 182 in which bp 668 is changed from A to U and bp 1-81 are replaced with the nucleic acid of SEQ ID NO: 181 (mut IL2 signal peptide), which encodes the GAA polypeptide of SEQ ID NO: 174 (mut IL2 signal peptide-GAA with H199R, H201L, and R223H modifications).

[0194] The C-terminal domain of GAA functions in trans together with the 70/76 kDa species to generate active GAA. The boundary between the catalytic domain and the C-terminal domain is thought to lie around amino acid 791, based on a short region of fewer than 18 amino acids containing four consecutive proline residues, which is present in GAA but absent from most members of the Family 31 hydrolases. It has been reported that the C-terminal domain associated with the mature species begins at amino acid 792 (Moreland et al. (Nov. 1, 2004) J. Biol. Chem., Manuscript 404008200). Therefore, in some embodiments, the GAA nucleic acid sequence encodes the GAA polypeptide without the C-terminal domain. In such embodiments, the rAAV vector can be used to transduce mammalian cells that express the C-terminal domain of GAA as a separate polypeptide.

B. Secretory signal peptides

[0195] The native GAA signal peptide is not cleaved in the ER, so the native GAA polypeptide remains membrane-bound in the ER (Tsuji et al. (1987) Biochem. Int. 15(5):945-952). GAA membrane association can be disrupted by replacing the endogenous GAA signal peptide (and, if necessary, adjacent sequences) with a surrogate signal peptide.

[0196] Accordingly, in a typical embodiment, the rAAV vector and rAAV genome disclosed herein comprise a heterologous nucleic acid encoding a GAA polypeptide to be delivered to a target cell, fused to a heterologous nucleic acid sequence encoding a secretion signal peptide in place of the endogenous GAA signal peptide. The heterologous nucleic acid is operably associated with the segment encoding the secretion signal peptide so that transcription and translation produce a fusion polypeptide in which the secretion signal sequence is operably associated with the GAA polypeptide (e.g., directing its secretion).

[0197] In some embodiments, the AAV vector encodes a GAA polypeptide comprising the endogenous GAA signal peptide (e.g., amino acids 1-27 of SEQ ID NO: 10, also referred to as the "native GAA" or "homologous GAA" signal peptide). In some embodiments, the AAV vector encodes a GAA polypeptide comprising the endogenous GAA signal peptide (e.g., amino acids 1-27 of SEQ ID NO: 10) and an additional heterologous (non-native) signal sequence. In some embodiments, a GAA polypeptide lacking the endogenous signal peptide (amino acids 1-27 of GAA) is fused to a secretory signal. In some embodiments of the compositions and methods described herein, the secretory signal serves the general purpose of assisting secretion of the GAA polypeptide or fusion polypeptide, e.g., an IGF2-targeting peptide-GAA fusion polypeptide, from hepatocytes into the blood, from which it can be taken up by mammalian cells, e.g., human cardiac muscle cells and skeletal muscle cells, and targeted to lysosomes as described herein. In some embodiments, the heterologous secretion signal is selected from the AAT signal peptide, the fibronectin signal peptide (FN1), the GAA signal peptide, or an active fragment of the AAT, FN1, or GAA signal peptide having secretion signaling activity.

[0198] In some embodiments, the secretory signal peptide is heterologous (i.e., exogenous or extrinsic) to the polypeptide of interest. For example, the heterologous secretory signal peptide is a fibronectin secretory signal peptide and the polypeptide of interest is not fibronectin. In some embodiments, the secretory signal peptide is selected from an AAT signal peptide, a fibronectin signal peptide (FN1), or an active fragment of an AAT, FN1, or GAA signal peptide having secretory signaling activity. In alternative embodiments, the secretory signal peptide is not heterologous to GAA; that is, the signal peptide is the GAA signal peptide (i.e., residues 1-27 of the native GAA polypeptide).

[0199] In some embodiments, the native GAA signal peptide of amino acids 1-27 of SEQ ID NO: 10 (i.e., MGVRHPPCSHRLLAVCALVSLATAALL, SEQ ID NO: 175) is replaced with a different or heterologous leader peptide. For example, the native leader peptide of GAA (SEQ ID NO: 175) may be replaced with any of the following: (i) an IgG1 leader peptide having the amino acid sequence MEFGLSWVFLVALLKGVQCE (SEQ ID NO: 176) encoded by the nucleic acid sequence of SEQ ID NO: 177 (referred to herein as "201 leader peptide" or "201 lp"), (ii) wtIL2 lp: MYRMQLLSCIALSLALVTNS (SEQ ID NO: 178) encoded by the nucleic acid sequence of SEQ ID NO: 179, (iii) mutIL2 lp: MYRMQLLLLIALSLALVTNS (SEQ ID NO: 180) encoded by the nucleic acid sequence of SEQ ID NO: 181, or (iv) a heterologous signal peptide selected from leader peptides having at least 90% sequence identity with any of SEQ ID NOs: 176, 178, or 180.
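The "at least 90% sequence identity" criterion can be illustrated with the two IL2 leaders given above, which have the same length (20 residues) and differ only at positions 8 and 9 (S to L and C to L). This sketch computes ungapped identity for equal-length sequences only; that is an assumption, since sequences of unequal length would need an alignment. `percent_identity` is a hypothetical name.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions with identical residues, for equal-length,
    ungapped sequences (unequal lengths would need an alignment)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


WT_IL2_LP = "MYRMQLLSCIALSLALVTNS"   # SEQ ID NO: 178
MUT_IL2_LP = "MYRMQLLLLIALSLALVTNS"  # SEQ ID NO: 180

identity = percent_identity(WT_IL2_LP, MUT_IL2_LP)
print(identity)          # 90.0
print(identity >= 90.0)  # True
```

The result is 18/20 = 90%, so mutIL2 lp sits exactly at the 90% threshold relative to wtIL2 lp.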

[0200] Generally, secretory signal peptides are located at the amino terminus (N-terminus) of the fusion polypeptide (i.e., the nucleic acid segment encoding the secretory signal peptide is located 5' of the heterologous nucleic acid encoding the GAA polypeptide or GAA fusion polypeptide in the rAAV vector or rAAV genome disclosed herein). Alternatively, secretory signals may be located at the carboxy terminus of, or embedded within, the GAA polypeptide or GAA fusion polypeptide (e.g., IGF2-GAA fusion polypeptide), provided that the secretory signal is operably associated with it and directs secretion of the desired GAA polypeptide or GAA fusion polypeptide from the cell (with or without cleavage of the signal peptide from the GAA polypeptide).

[0201] The secretory signal is operably associated with the GAA polypeptide or GAA fusion polypeptide so that the polypeptide is targeted to the secretory pathway. In other words, the secretory signal is operably associated with the GAA polypeptide such that the GAA polypeptide or GAA fusion polypeptide is secreted from the cell at higher levels (i.e., in greater quantities) than in the absence of the secretory signal peptide. Generally, when the signal peptide is attached, at least about 20%, 30%, 40%, 50%, 70%, 80%, 85%, 90%, 95%, or more of the GAA polypeptide or IGF2-GAA fusion polypeptide (alone and/or fused to the signal peptide) is secreted from the cell, compared with the polypeptide without the attached secretory signal peptide. In other embodiments, essentially all of the detectable polypeptide (alone and/or as a fusion polypeptide) is secreted from the cell.

[0202] The phrase "secreted from cells" means that polypeptides can be secreted into any extracellular compartment (e.g., fluids or spaces), including, but not limited to, the interstitial space, blood, lymph, cerebrospinal fluid, renal tubules, airways (e.g., alveoli, bronchioles, bronchi, nasal cavity, etc.), gastrointestinal tract (e.g., esophagus, stomach, small intestine, colon, etc.), vitreous fluid of the eye, and intracochlear lymph.

[0203] In one embodiment, the rAAV genome comprises a heterologous nucleic acid encoding a secretory signal peptide (SP) fused to a GAA fusion polypeptide, the GAA fusion polypeptide comprising a targeting peptide (e.g., an IGF2-targeting peptide) fused to the GAA polypeptide. As used herein, GAA also refers to the modified GAA described above. Thus, the signal peptides disclosed herein enhance the efficiency of secretion of GAA polypeptides or IGF2-GAA fusion polypeptides from cells transduced with an rAAV vector or from cells containing an rAAV genome, as described herein.

[0204] Therefore, in some embodiments, the rAAV genome disclosed herein includes 5' ITR and 3' ITR sequences and, located between the 5' ITR and 3' ITR, a promoter operably linked to a heterologous nucleic acid encoding a secretory peptide and a nucleic acid encoding an alpha-glucosidase (GAA) polypeptide (i.e., the heterologous nucleic acid encodes a GAA fusion polypeptide comprising a signal peptide-GAA polypeptide).

[0205] In alternative embodiments, the rAAV genome disclosed herein comprises 5' ITR and 3' ITR sequences and, located between the 5' ITR and 3' ITR, a promoter operably linked to a heterologous nucleic acid encoding a secretory peptide and a nucleic acid encoding an alpha-glucosidase (GAA) fusion polypeptide, wherein the fusion polypeptide comprises an IGF2-targeting peptide and a GAA polypeptide (i.e., the heterologous nucleic acid encodes a GAA fusion polypeptide comprising a signal peptide-IGF2-GAA polypeptide).
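The two genome layouts in [0204] and [0205] differ only in whether an IGF2-targeting segment sits between the signal peptide and GAA. The sketch below records that 5'-to-3' element order as data. The `Element` class, the role labels, and the omission of any other genome elements are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Element:
    role: str  # hypothetical role label, e.g. "ITR5", "promoter", "GAA"
    name: str  # descriptive name, e.g. "LSP" or "FN1 signal peptide"


# 5'-to-3' element orders for the two layouts described above.
# Any other elements a genome might carry are deliberately left out.
KNOWN_LAYOUTS = {
    ("ITR5", "promoter", "signal_peptide", "GAA", "ITR3"): "signal peptide-GAA",
    ("ITR5", "promoter", "signal_peptide", "IGF2", "GAA", "ITR3"): "signal peptide-IGF2-GAA",
}


def classify_genome(elements: Sequence[Element]) -> str:
    """Return the fusion layout encoded by an ordered list of elements."""
    order = tuple(e.role for e in elements)
    if order not in KNOWN_LAYOUTS:
        raise ValueError(f"unrecognized element order: {order}")
    return KNOWN_LAYOUTS[order]


genome = [
    Element("ITR5", "5' ITR"),
    Element("promoter", "LSP"),
    Element("signal_peptide", "FN1 signal peptide"),
    Element("IGF2", "IGF2-targeting peptide"),
    Element("GAA", "hGAA"),
    Element("ITR3", "3' ITR"),
]
print(classify_genome(genome))  # signal peptide-IGF2-GAA
```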

[0206] Generally, secretory signal peptides are cleaved within the endoplasmic reticulum, and in some embodiments, the secretory signal peptides are cleaved from the GAA polypeptide before secretion. However, as long as the secretion of the GAA polypeptide or IGF2-GAA fusion polypeptide from the cell is enhanced and the GAA polypeptide remains functional, cleavage of the secretory signal peptide is not necessary. Therefore, in some embodiments, the secretory signal peptide is partially or completely retained.

[0207] In some embodiments, the rAAV genome or isolated nucleic acid disclosed herein comprises a nucleic acid encoding a chimeric polypeptide containing a GAA polypeptide operably linked to a secretory signal peptide, the chimeric polypeptide being expressed and produced from cells transduced with an rAAV vector, and the GAA polypeptide being secreted from the cells. The GAA polypeptide or GAA fusion polypeptide (e.g., IGF2-GAA fusion polypeptide) may be secreted after cleavage of all or part of the secretory signal peptide. Alternatively, the GAA polypeptide or GAA fusion polypeptide (e.g., IGF2-GAA fusion polypeptide) may retain the secretory signal peptide (i.e., the secretory signal is not cleaved). Thus, in this context, “GAA polypeptide or GAA fusion polypeptide” may be a chimeric polypeptide containing a secretory peptide.

[0208] The secretion signal sequences of the present invention are not limited to any particular length, as long as they direct the target polypeptide into the secretory pathway. In typical embodiments, the signal peptide is at least about 6, 8, 10, 12, 15, 20, 25, 30, or 35 amino acids long, and at most about 40, 50, 60, 75, or 100 amino acids long, or longer.

[0209] The secretory signal peptides encoded by the rAAV genomes and in the rAAV vectors disclosed herein may comprise, consist essentially of, or consist of naturally occurring secretory signal sequences or modifications thereof. Numerous secretory proteins and sequences that direct secretion from cells are known in the art and are disclosed in U.S. Patent No. 9,873,868, which is incorporated herein by reference in its entirety. Exemplary secretory proteins (and their secretory signals) include, but are not limited to, erythropoietin, coagulation factor IX, cystatin, lactotransferrin, plasma protease C1 inhibitor, apolipoproteins (e.g., APOA, C, E), MCP-1, α-2-HS-glycoprotein, α-1-microglobulin, complement (e.g., C1Q, C3), vitronectin, lymphotoxin-α, azurocidin, VIP, metalloproteinase inhibitor 2, glypican-1, pancreatic hormone, clusterin, hepatocyte growth factor, insulin, α-1-antichymotrypsin, growth hormone, type IV collagenase, guanylin, propaginin, proenkephalin A, inhibin β (e.g., A chain), prealbumin, angiogenin, lutropin (e.g., β chain), insulin-like growth factor binding proteins 1 and 2, proactivator polypeptide, fibrinogen (e.g., β chain), gastric triacylglycerol lipase, midkine, neutrophil defensins 1, 2, and 3, α-1-antitrypsin, matrix Gla protein, α-tryptase, bile salt-activated lipase, chymotrypsinogen B, elastin, Ig lambda chain V region, platelet factor 4 variant, chromogranin A, WNT-1 proto-oncogene protein, oncostatin M, β-neoendorphin-dynorphin, von Willebrand factor, plasma serine protease inhibitor, serum amyloid A protein, nidogen, fibronectin, rennin, osteonectin, histatin 3, phospholipase A2, cartilage matrix protein, GM-CSF, matricin, neuroendocrine protein 7B2, placental protein 11, gelsolin, M-CSF, transcobalamin I, lactase-phlorizin hydrolase, elastase 2B, pepsinogen A, MIP-1β, prolactin, trypsinogen II, gastrin-releasing peptide II, atrial natriuretic factor, secretory alkaline phosphatase, pancreatic α-amylase, secretogranin I, β-casein, serotransferrin, tissue factor pathway inhibitor, follitropin β chain, coagulation factor XII, growth hormone-releasing factor, prostatic seminal plasma protein, interleukins (e.g., 2, 3, 4, 5, 9, 11), inhibins (e.g., α chain), angiotensinogen, thyroglobulin, Ig heavy or light chains, plasminogen activator inhibitor-1, lysozyme C, plasminogen activator, antileukoproteinase 1, statherin, fibulin-1 isoform B, uromodulin, thyroxine-binding globulin, axonin-1, endometrial α-2 globulin, interferons (e.g., α, β, γ), β-2-microglobulin, procholecystokinin, progastricsin, prostatic acid phosphatase, bone sialoprotein II, colipase, Alzheimer's disease amyloid A4 protein, PDGF (e.g., A or B chain), coagulation factor V, triacylglycerol lipase, haptoglobin-2, corticosteroid-binding globulin, triacylglycerol lipase, prorelaxin H2, follistatins 1 and 2, platelet glycoprotein IX, G-CSF, VEGF, heparin cofactor II, antithrombin III, leukemia suppressor, interstitial collagenase, pleiotrophin, small inducible cytokine A1, melanin-concentrating hormone, angiotensin-converting enzyme, pancreatic trypsin inhibitor, coagulation factor VIII, α-fetoprotein, α-lactalbumin, semenogelin II, κ-casein, glucagon, thyrotropin β chain, transcobalamin II, thrombospondin 1, parathyroid hormone, vasopressin copeptin, tissue factor, motilin, MPIF-1, kininogen, neuroendocrine convertase 2, stem cell factor, procollagen α1 chain, plasma kallikrein, keratinocyte growth factor, and any other secretory hormones, growth factors, cytokines, enzymes, coagulation factors, milk proteins, immunoglobulin chains, etc.

[0210] In some embodiments, other secretion signal peptides encoded by the rAAV genome and in the rAAV vector disclosed herein may be selected from, but are not limited to, those of prepro-cathepsin L (e.g., GenBank accession numbers KHRTL, NP_037288, NP_034114, AAB81616, AAA39984, P07154, CAA68691; their disclosures are incorporated herein by reference in their entirety) and prepro-alpha-2 collagen (e.g., GenBank accession numbers CAA98969, CAA26320, CGHU2S, NP_000080, BAA25383, P08123; their disclosures are incorporated herein by reference in their entirety), as well as allelic variants, modifications, and functional fragments thereof (as discussed above with respect to fibronectin secretion signal sequences). Exemplary secretory signal sequences include those of prepro-cathepsin L (Rattus norvegicus, MTPLLLLAVLCLGTALA [SEQ ID NO: 27]; accession number CAA68691) and prepro-alpha-2 collagen (Homo sapiens, MLSFVDTRTLLLLAVTLCLATC [SEQ ID NO: 28]; accession number CAA98969). Also included are full-length secretory signal sequences derived from prepro-cathepsin L and prepro-alpha-2 collagen, or longer amino acid sequences containing functional fragments thereof (as discussed above with respect to fibronectin secretory signal sequences).

[0211] In some embodiments, the secretory signal peptide is derived in part or in whole from a secretory polypeptide produced by hepatocytes. In some embodiments, the secretory signal peptide may also be wholly or partly synthetic or artificial. Synthetic or artificial secretory signal peptides are known in the art; see, for example, Barash et al., "Human secretory signal peptide description by hidden Markov model and generation of a strong artificial signal peptide for secreted protein expression," Biochem. Biophys. Res. Comm. 294:835-42 (2002), the disclosure of which is incorporated herein in its entirety. In certain embodiments, the secretory signal peptide comprises, consists essentially of, or consists of the artificial secretory signal MWWRLWWLLLLLLLLWPMVWA (SEQ ID NO: 29), or a variant thereof having one, two, three, four, or five amino acid substitutions (optionally conservative amino acid substitutions, which are known in the art).
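Whether a candidate peptide is a variant of SEQ ID NO: 29 "having one, two, three, four, or five amino acid substitutions" can be checked by counting mismatched positions. This minimal sketch handles substitutions only (no insertions or deletions) and does not test whether a substitution is conservative; the function names are hypothetical.

```python
ARTIFICIAL_SP = "MWWRLWWLLLLLLLLWPMVWA"  # SEQ ID NO: 29 (21 residues)


def count_substitutions(candidate: str, reference: str = ARTIFICIAL_SP) -> int:
    """Number of positions at which an equal-length candidate differs."""
    if len(candidate) != len(reference):
        raise ValueError("only substitution variants (same length) are handled")
    return sum(1 for a, b in zip(candidate, reference) if a != b)


def is_allowed_variant(candidate: str, max_substitutions: int = 5) -> bool:
    """True for SEQ ID NO: 29 itself or a variant with at most five substitutions.

    Whether each substitution is conservative is not checked here.
    """
    return count_substitutions(candidate) <= max_substitutions


def substitute(seq: str, changes: dict) -> str:
    """Apply {1-based position: new residue} substitutions to `seq`."""
    residues = list(seq)
    for pos, aa in changes.items():
        residues[pos - 1] = aa
    return "".join(residues)


two_changes = substitute(ARTIFICIAL_SP, {2: "F", 3: "F"})
print(count_substitutions(two_changes), is_allowed_variant(two_changes))  # 2 True
```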

[0212] Exemplary signal peptides for use in the methods and compositions disclosed herein can be selected from any signal peptide disclosed in Table 2 or functional variants thereof. Exemplary signal peptides include the fibronectin (FN1) and AAT signal peptides. In some embodiments of the methods and compositions disclosed herein, the rAAV vector composition comprises a nucleic acid encoding a secretory signal peptide, e.g., AAT signal peptide (e.g., SEQ ID NO: 17), fibronectin signal peptide (FN1) (e.g., SEQ ID NOs: 18-21), GAA signal peptide, hIGF2 signal peptide (SEQ ID NO: 22), or a nucleic acid encoding an amino acid sequence having at least about 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NOs: 17-22.

[0213] In some embodiments of the methods and compositions disclosed herein, the nucleic acid encoding the secretion signal is selected from any nucleic acid sequence having at least about 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NOs: 17, 18-21, or 22-26.

[0214] In some embodiments, the FN1 or AAT signal peptide can be readily substituted with any signal peptide, including a signal peptide of a protein expressed in the liver, or a signal peptide disclosed in U.S. Provisional Application No. 62/937,556, filed November 19, 2019, or PCT/US19/61653, filed November 15, 2019.

[0215] Fibronectin secretory signal peptide:

[0216] In some embodiments, the secretory signal peptide is a fibronectin secretory signal peptide, and this term includes modifications of naturally occurring sequences (as described in more detail below).

[0217] In some embodiments, the secretory signal peptide is a fibronectin signal peptide, such as the signal sequence of human fibronectin or a signal sequence derived from rat fibronectin. Fibronectin (FN1) signal sequences and modified FN1 signal peptides for use in the rAAV genomes and rAAV vectors described herein are disclosed in Table 3 of U.S. Patent No. 7,071,172, which is incorporated herein by reference in its entirety, and in Provisional Application No. 62/937,556, filed November 19, 2019. Exemplary fibronectin secretory signal sequences, including but not limited to those listed in Table 1 of U.S. Patent No. 7,071,172, are listed here.

[0218] [Table 2]

[0219] An exemplary nucleotide sequence encoding the fibronectin secretion signal sequence of Rattus norvegicus can be found in GenBank accession number X15906 (its disclosure is incorporated herein by reference). Another exemplary sequence is a nucleotide sequence encoding the secretion signal peptide of human fibronectin 1, transcript variant 1 (accession number NM_002026, nucleotides 268-345; the disclosure of accession number NM_002026 is incorporated herein by reference in its entirety). Another exemplary secretion signal sequence is encoded by a nucleotide sequence encoding the secretion signal peptide of Xenopus laevis fibronectin protein (accession number M77820, nucleotides 98-190; the disclosure of accession number M77820 is incorporated herein by reference in its entirety).

[0220] In another embodiment, the fibronectin signal sequence (FN1, nucleotides 208-303, 5'-ATG CTC AGG GGT CCG GGA CCC GGG CGG CTG CTG CTG CTA GCA GTC CTG TGC CTG GGG ACA TCG GTG CGC TGC ACC GAA ACC GGG AAG AGC AAG AGG-3', SEQ ID NO: 23) is derived from the rat fibronectin mRNA sequence (GenBank accession number X15906) and encodes the following peptide signal sequence: Met Leu Arg Gly Pro Gly Pro Gly Arg Leu Leu Leu Leu Ala Val Leu Cys Leu Gly Thr Ser Val Arg Cys Thr Glu Thr Gly Lys Ser Lys Arg (SEQ ID NO: 18). In some embodiments of the methods and compositions disclosed herein, the recombinant AAV vector comprises a heterologous nucleic acid sequence encoding a secretory signal peptide that is a fibronectin signal peptide (FN1) or an active fragment thereof having secretory signaling activity (for example, an FN1 signal peptide having an amino acid sequence with at least about 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with any of SEQ ID NOs: 18-21), wherein the heterologous nucleic acid sequence encodes an IGF2-targeting peptide selected from SEQ ID NOs: 5, 6, 7, 8, or 9, or an IGF2 peptide having at least about 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NOs: 5-9. In some embodiments of the methods and compositions disclosed herein, the recombinant AAV vector comprises a heterologous nucleic acid sequence encoding a secretory signal peptide that is an AAT signal peptide or an active fragment thereof having secretory signal activity (for example, an AAT signal peptide having the sequence of SEQ ID NO: 17, or an amino acid sequence with at least about 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 17), wherein the heterologous nucleic acid sequence encodes an IGF2-targeting peptide selected from SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 9, or an IGF2 peptide having at least about 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NOs: 5-9.
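As a consistency check, translating the nucleotide sequence of SEQ ID NO: 23 with the standard genetic code should reproduce the peptide of SEQ ID NO: 18. A minimal sketch using a compact encoding of the standard codon table; the helper names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence (spaces ignored, U read as T); '*' marks stops."""
    seq = "".join(cds.split()).upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Rat FN1 signal sequence as given in the text (SEQ ID NO: 23, nt 208-303 of X15906).
FN1_SP_NT = (
    "ATG CTC AGG GGT CCG GGA CCC GGG CGG CTG CTG CTG CTA GCA GTC CTG "
    "TGC CTG GGG ACA TCG GTG CGC TGC ACC GAA ACC GGG AAG AGC AAG AGG"
)

print(translate(FN1_SP_NT))  # MLRGPGPGRLLLLAVLCLGTSVRCTETGKSKR
```

The 96 nucleotides (positions 208-303) yield 32 residues, matching the three-letter peptide sequence listed above for SEQ ID NO: 18.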

[0221] Those skilled in the art will recognize that the secretory signal sequence may include one, two, three, four, five, six, or more amino acids located C-terminal (carboxyl-terminal) to the peptidase cleavage site (indicated by the upward arrow; see, for example, SEQ ID NOs: 19 and 24 in Table 2).

[0222] In some embodiments of the methods and compositions disclosed herein, the recombinant AAV vector comprises or consists of, between the 5' ITR and the 3' ITR, a heterologous nucleic acid sequence encoding a secretory signal peptide and a nucleic acid encoding an hGAA polypeptide. The encoded signal peptide is an AAT signal peptide (e.g., SEQ ID NO: 17), a fibronectin (FN1) signal peptide (e.g., SEQ ID NOs: 18-21), a homologous GAA signal peptide (SEQ ID NO: 175), an hIGF2 signal peptide (e.g., SEQ ID NO: 22), an IgG1 leader peptide (SEQ ID NO: 177), a wtIL2 leader peptide (SEQ ID NO: 179), a mutant IL2 signal peptide (SEQ ID NO: 181), or an active fragment thereof having secretory signaling activity; the nucleic acid encoding the signal peptide may be any nucleic acid encoding an amino acid sequence having at least about 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NOs: 17-22, 175, 177, 179, or 181. The nucleic acid encoding the signal peptide is located 5' of the nucleic acid encoding the hGAA polypeptide disclosed herein, and the signal sequence and the hGAA-encoding nucleic acid are operably linked to any LSP disclosed herein in Table 4, or a functional variant thereof.

[0223] In embodiments of the present invention, the functional fragment has at least about 50%, 70%, 80%, 90%, or more of the secretory signaling activity of the sequences specifically disclosed herein, or even a higher level of secretory signaling activity.

[0224] Peptidase cleavage site

[0225] In some embodiments, one or more exogenous peptidase cleavage sites may be inserted into the secretory signal peptide-GAA fusion polypeptide, for example, between the secretory signal peptide and the GAA polypeptide. In certain embodiments, an autoprotease (e.g., the foot-and-mouth disease virus 2A autoprotease) is inserted between the secretory signal peptide and the GAA polypeptide or IGF2-GAA fusion polypeptide. In other embodiments, protease recognition sites that can be controlled by the addition of an exogenous protease are used (e.g., a Lys-Arg recognition site for trypsin, a Lys-Arg recognition site for an Aspergillus KEX2-like protease, a recognition site for metalloproteases, a recognition site for serine proteases, etc.). Modifications of the GAA polypeptide that delete or inactivate the native protease site are disclosed in U.S. Provisional Application No. 62,937,556, filed November 19, 2019, and International Application No. PCT/US19/61653, filed November 15, 2019, which are incorporated herein by reference.

C. IGF2-targeted peptide sequence

[0226] In one embodiment, the rAAV genome contains a heterologous nucleic acid encoding a targeted peptide (TP) fused to a GAA polypeptide. In some embodiments, the targeted peptide is a ligand for an extracellular receptor: it binds to the extracellular domain of the receptor on the surface of the target cell, so that the fused polypeptide is delivered to lysosomes when the receptor is internalized. In one embodiment, the targeted peptide includes a urokinase-type plasminogen activator receptor moiety that can bind to the cation-independent mannose-6-phosphate receptor. In some embodiments, the targeted peptide incorporates one or more amino acid sequences of an IGF2 targeted peptide.

[0227] In some embodiments, the IGF2-targeted peptides disclosed herein include at least a portion of a ligand for an extracellular receptor; for example, the IGF2-targeted peptide binds to the human cation-independent mannose-6-phosphate receptor (CI-MPR) or the IGF2 receptor.

[0228] IGF2 is also known by other names: chromosome 11 open reading frame 43, insulin-like growth factor 2, IGF-II, FLJ44734, somatomedin A, and preptin. The wild-type human nucleotide sequence encoding mature IGF2 is GCTTACCGCCCCAGTGAGACCCTGTGCGGCGGGGAGCTGGTGGACACCCTCCAGTTCGTCTGTGGGGACCGCGGCTTCTACTTCAGCAGGCCCGCAAGCCGTGTGAGCCGTCGCAGCCGTGGCATCGTTGAGGAGTGCTGTTTCCGCAGCTGTGACCTGGCCCTCCTGGAGACGTACTGTGCTACCCCCGCCAAGTCCGAG (SEQ ID NO: 1). The full-length IGF2 protein (NP_000603.1), which includes the IGF2 targeting sequence, is encoded by the nucleic acid sequence NM_000612.6.

[0229] The mature human IGF2-targeted peptide has the following sequence: AYRPSETLCGGELVDTLQFVC GDRGFYFSRPASRVSRRSRGI VEECCFRSCDLALLETYCATP AKSE (SEQ ID NO: 5).

[0230] The coding sequence for human IGF2 is also disclosed in U.S. Patent No. 8,492,388 (see, e.g., Figure 2), which is incorporated herein by reference in its entirety. The IGF2 protein is synthesized as a preproprotein with a 24-amino-acid signal peptide at the amino terminus and an 89-amino-acid carboxy-terminal region, both of which are removed post-translationally, as outlined in O'Dell et al. (1998) Int. J. Biochem. Cell Biol. 30(7):767-71. The mature protein has 67 amino acids. A Leishmania codon-optimized version of mature IGF2 is disclosed in U.S. Patent No. 8,492,388 (see, e.g., Figure 3) (Langford et al. (1992) Exp. Parasitol. 74(3):360-1). Additional cassettes containing a deletion of amino acids 1-7 or 2-7 of the mature polypeptide (Δ1-7 or Δ2-7), a tyrosine-to-leucine substitution at residue 27 (Y27L), or both (Δ1-7,Y27L or Δ2-7,Y27L) were prepared to generate IGF2 cassettes specific for only the desired receptor, as described below. Thus, in some embodiments, the IGF2 targeting sequence is selected from the wild-type, Y27L, Δ1-7, Δ2-7, Y27L-Δ1-7, Y27L-Δ2-7, V43M, Y27L-V43M, Y27L-Δ1-7-V43M, and Y27L-Δ2-7-V43M IGF2 variants.

[0231] Exemplary IGF2-targeted peptides for use in the methods and compositions described herein are disclosed in U.S. Provisional Application No. 62,937,556, filed November 19, 2019, International Application No. PCT/US19/61653, filed November 15, 2019, and International Application No. PCT/US19/61701, filed November 15, 2019, each of which is incorporated herein by reference in its entirety.

[0232] In some embodiments, the IGF2-targeted peptides for use in the methods and compositions herein may have one or more of the modifications E6R, F26S, Y27L, V43L, F48T, R49S, S50I, A54R, L55R, and K65R, as disclosed in U.S. Application No. 2019/0343968, which is incorporated herein by reference in its entirety. In some embodiments, the IGF2-targeted peptides may have one or more modifications selected from E6R, F26S, Y27L, V43L, F48T, R49S, S50I, A54R, L55R, and K65R in addition to the V43M modification. In some embodiments, the IGF2-targeting peptide has one or more modifications selected from E6R, F26S, Y27L, V43L, F48T, R49S, S50I, A54R, L55R, and K65R in addition to the Δ1-7 or Δ2-7 modification. In some embodiments, the IGF2-targeting peptide has the Δ1-7 or Δ2-7 modification, the V43M modification, and one or more modifications selected from E6R, F26S, Y27L, V43L, F48T, R49S, S50I, A54R, L55R, and K65R.
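The substitution names in [0232] use the usual wild-type residue, position, new residue convention, numbered against mature IGF2 (SEQ ID NO: 5). A minimal Python sketch (the helper `apply_substitutions` is hypothetical) applies such substitutions and refuses any whose wild-type residue does not match, which is a quick way to catch numbering or transcription errors. The residue-49 change is written here as R49S (Arg49 to Ser), consistent with the insulin-derived Thr-Ser-Ile substitution at residues 48-50 described in [0263].

```python
import re

# Mature human IGF2 (SEQ ID NO: 5), residues numbered 1-67.
IGF2_MATURE = (
    "AYRPSETLCGGELVDTLQFVCGDRGFYFSRPASRVSRRSRGI"
    "VEECCFRSCDLALLETYCATPAKSE"
)

# Point-substitution notation: wild-type residue, 1-based position, new residue.
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(seq: str, substitutions: list[str]) -> str:
    """Apply substitutions such as 'Y27L', verifying each wild-type residue first."""
    residues = list(seq)
    for sub in substitutions:
        match = SUBSTITUTION.fullmatch(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{sub}: position {pos} is outside 1-{len(residues)}")
        if residues[pos - 1] != wt:
            raise ValueError(f"{sub}: residue {pos} is {residues[pos - 1]}, not {wt}")
        residues[pos - 1] = new
    return "".join(residues)


# Substitutions listed in [0232].
LISTED = ["E6R", "F26S", "Y27L", "V43L", "F48T", "R49S", "S50I", "A54R", "L55R", "K65R"]

for sub in LISTED:
    apply_substitutions(IGF2_MATURE, [sub])  # raises if the wild-type residue does not match

# Insulin-like residues at 48-50 (Phe-Arg-Ser replaced by Thr-Ser-Ile).
print(apply_substitutions(IGF2_MATURE, ["F48T", "R49S", "S50I"])[47:50])  # TSI
```

A string such as "R495" is rejected by the pattern check rather than silently misread.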

[0233] In certain embodiments, the IGF2-targeted peptide includes a modification at valine 43 (V43M), in which the valine is replaced with methionine so that translation can initiate at amino acid 43. The IGF2-targeted peptide having the V43M modification, which is incorporated herein for use as a targeted peptide or IGF2-targeted peptide, binds to the cation-independent mannose-6-phosphate receptor. In alternative embodiments, the IGF2-targeted peptide is IGF2 delta 1-42 with V43 changed to Met (i.e., IGF2-Δ1-42 (SEQ ID NO: 8) or IGF2-V43M (SEQ ID NO: 9)).

[0234] In some embodiments, the rAAV genome comprises a nucleic acid encoding an IGF2-GAA fusion protein, in which a nucleic acid encoding the mature IGF2-targeting peptide (SEQ ID NO: 5), an IGF2-targeting peptide variant (e.g., SEQ ID NO: 6 (IGF2-Δ2-7), SEQ ID NO: 7 (IGF2-Δ1-7), SEQ ID NO: 8 (IGF2-Δ1-42), or SEQ ID NO: 9 (IGF2-V43M)), or a sequence having at least 85%, 90%, or 95% sequence identity with SEQ ID NOs: 5-9 is fused to the 5' end of the nucleic acid encoding the GAA protein, producing a fusion protein (e.g., an IGF2-GAA fusion polypeptide) that can be taken up by various cell types and transported to lysosomes. Alternatively, a nucleic acid encoding a precursor IGF2 polypeptide may be fused to the 3' end of the GAA gene; the precursor contains a carboxyl-terminal portion that is cleaved in mammalian cells to yield the mature IGF2 polypeptide, and the IGF2 signal peptide is preferably omitted (or moved to the 5' end of the GAA gene). This approach has several advantages over glycosylation-based methods, including simplicity and cost-effectiveness, because no further modification is required once the protein is isolated.

[0235] In some embodiments, the IGF2-targeted peptides are those described in U.S. Patent Nos. 7,785,856 and 9,873,868, each of which is incorporated herein by reference in its entirety.

(i) IGF2 deletion mutants:

[0236] In some embodiments, the IGF2-targeted peptide is a modified or truncated IGF2-targeted peptide (also referred to as an IGF2 deletion variant), as disclosed in International Application PCT/US19/61701, filed November 15, 2019, which is incorporated herein by reference in its entirety. For example, in some embodiments, the IGF2-targeted peptide includes the V43M modification and also includes a deletion of one or more amino acids from amino acids 1 to 42. For example, in some embodiments of the methods and compositions disclosed herein, the IGF2-targeting peptide comprises V43M and further comprises one or more deletions selected from any of Δ1-3, Δ1-4, Δ1-5, Δ1-6, Δ1-8, Δ1-9, Δ1-10, Δ1-11, Δ1-12, Δ1-13, Δ1-14, Δ1-15, Δ1-16, Δ1-17, Δ1-18, Δ1-19, Δ1-20, Δ1-21, Δ1-22, Δ1-23, Δ1-24, Δ1-25, Δ1-26, Δ1-27, Δ1-28, Δ1-29, Δ1-30, Δ1-31, Δ1-32, Δ1-33, Δ1-34, Δ1-35, Δ1-36, Δ1-37, Δ1-38, Δ1-39, Δ1-40, Δ1-41, or Δ1-42, wherein residue 43 of SEQ ID NO: 5 is methionine (V43M). In some embodiments of the methods and compositions disclosed herein, the IGF2-targeting peptide comprises V43M and further comprises a Δ1-7 deletion (IGF2-Δ1-7,V43M).

[0237] In some embodiments of the methods and compositions disclosed herein, the lysosomal IGF2-targeting peptide comprises V43M and further comprises one or more deletions selected from any of Δ2-3, Δ2-4, Δ2-5, Δ2-6, Δ2-8, Δ2-9, Δ2-10, Δ2-11, Δ2-12, Δ2-13, Δ2-14, Δ2-15, Δ2-16, Δ2-17, Δ2-18, Δ2-19, Δ2-20, Δ2-21, Δ2-22, Δ2-23, Δ2-24, Δ2-25, Δ2-26, Δ2-27, Δ2-28, Δ2-29, Δ2-30, Δ2-31, Δ2-32, Δ2-33, Δ2-34, Δ2-35, Δ2-36, Δ2-37, Δ2-38, Δ2-39, Δ2-40, Δ2-41, or Δ2-42, wherein residue 43 of SEQ ID NO: 5 is methionine (V43M). In some embodiments of the methods and compositions disclosed herein, the IGF2-targeted peptide comprises V43M and further comprises a Δ2-7 deletion (IGF2-Δ2-7,V43M).

[0238] In some embodiments, the IGF2-targeted peptide for fusion to the GAA polypeptide may include amino acids 8-28 and 41-61 of IGF2. In some embodiments, these stretches of amino acids may be directly joined or separated by a linker. Alternatively, amino acids 8-28 and 41-61 may be provided on separate polypeptide chains. In some embodiments, amino acids 8-28 of IGF2, or conservative substitution variants thereof, may be fused to the GAA polypeptide to express an IGF2-GAA fusion protein from an rAAV vector, while a separate rAAV vector expresses IGF2 amino acids 41-61 or conservative substitution variants thereof.

[0239] To facilitate the proper presentation and folding of IGF2-targeted peptides, longer portions of the IGF2 protein can be used. For example, IGF2-targeted peptides containing amino acid residues 1-67, 1-87, or the entire precursor form can be used.

[0240] In some embodiments, the IGF2-targeting peptide is a nucleic acid sequence encoding one of the following IGF2-targeting peptides: residue 1 followed by residues 8-67 of wild-type mature human insulin-like growth factor II (IGF2) in SEQ ID NO: 5 (i.e., SEQ ID NO: 6; IGF2-delta 2-7); residues 8-67 of wild-type mature human IGF2 in SEQ ID NO: 5 (i.e., SEQ ID NO: 7; IGF2-delta 1-7); or residues 43-67 of wild-type mature human IGF2 in SEQ ID NO: 5 (i.e., IGF2-V43M (SEQ ID NO: 9) or IGF2-delta 1-42 (SEQ ID NO: 8)).
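The deletion variants above are defined by 1-based, inclusive residue ranges of SEQ ID NO: 5, which are easy to get wrong by one when sliced in code. The following Python sketch builds the variants from those ranges; it is illustrative only, the sequences are derived from the ranges stated in the text rather than copied from the sequence listing, and the helper `residues` is hypothetical. The resulting lengths are 61 residues for delta 2-7, 60 for delta 1-7, and 25 for both delta 1-42 and V43M.

```python
# Mature human IGF2 (SEQ ID NO: 5), residues numbered 1-67.
IGF2_MATURE = (
    "AYRPSETLCGGELVDTLQFVCGDRGFYFSRPASRVSRRSRGI"
    "VEECCFRSCDLALLETYCATPAKSE"
)


def residues(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq using 1-based, inclusive numbering."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"invalid range {start}-{end} for length {len(seq)}")
    return seq[start - 1:end]


# Variants built from the residue ranges described in the text.
variants = {
    "IGF2-delta2-7": residues(IGF2_MATURE, 1, 1) + residues(IGF2_MATURE, 8, 67),
    "IGF2-delta1-7": residues(IGF2_MATURE, 8, 67),
    "IGF2-delta1-42": residues(IGF2_MATURE, 43, 67),
    # V43M: Val43 replaced by Met, which then serves as the translation start.
    "IGF2-V43M": "M" + residues(IGF2_MATURE, 44, 67),
}

for name, seq in variants.items():
    print(f"{name:15s} {len(seq):3d} aa  starts {seq[:5]}")
```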

[0241] In some embodiments of the methods and compositions disclosed herein, the IGF2-targeted peptide is a nucleic acid sequence selected from any nucleic acid sequence that includes SEQ ID NO: 2 (i.e., IGF2-Delta 2-7); SEQ ID NO: 3 (i.e., IGF2-Delta 1-7); or SEQ ID NO: 4 (i.e., IGF2-V43M); or sequences with at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity with them.

[0242] In some embodiments of the methods and compositions disclosed herein, the IGF2(V43M) sequence is a nucleic acid sequence encoding an amino acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity with SEQ ID NO: 65 (IGF2Δ2-7V43M), an amino acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity with SEQ ID NO: 66 (IGF2Δ1-7V43M), or an amino acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity with SEQ ID NO: 66.

[Table 3]

[0243] In some embodiments, longer portions of the IGF2 protein can be used to facilitate the proper presentation and folding of the IGF2-targeted peptide. For example, IGF2-targeted peptides containing amino acid residues 1-67, 1-87, or the entire precursor form can be used.

[0244] In some embodiments of the methods and compositions disclosed herein, the recombinant AAV comprises a heterologous nucleic acid sequence encoding a signal peptide-GAA (SP-GAA) fusion polypeptide that further comprises an IGF2-targeting peptide located between the secretory signal peptide (SP) and the alpha-glucosidase (GAA) polypeptide.

[0245] In some embodiments of the methods and compositions disclosed herein, the recombinant AAV vector comprises a heterologous nucleic acid sequence encoding an IGF2-targeted peptide that binds to the human cation-independent mannose-6-phosphate receptor (CI-MPR) or IGF2 receptor; for example, the heterologous nucleic acid sequence encodes an IGF2-targeted peptide having the amino acid sequence of SEQ ID NO: 5, or comprising at least one amino acid modification of SEQ ID NO: 5 that binds to an IGF2 receptor. In some embodiments, the recombinant AAV vector comprises a heterologous nucleic acid sequence encoding an IGF2-targeted peptide, wherein the at least one amino acid modification of SEQ ID NO: 5 is the V43M modification (SEQ ID NO: 8 or SEQ ID NO: 9), Δ2-7 (SEQ ID NO: 6), or Δ1-7 (SEQ ID NO: 7), or encoding an IGF2 peptide having at least about 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NOs: 5-9.

[0246] In some embodiments of the methods and compositions disclosed herein, the nucleic acid encoding the IGF2-targeted peptide is selected from SEQ ID NO: 2 (IGF2-Δ2-7), SEQ ID NO: 3 (IGF2-Δ1-7), or SEQ ID NO: 4 (IGF2 V43M), or a sequence having at least about 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with any of SEQ ID NOs: 2, 3, or 4.

[0247] In some embodiments of the compositions and methods described herein, the IGF2-targeted peptide is a nucleic acid sequence encoding one of the following: residue 1 followed by residues 8-67 of wild-type mature human insulin-like growth factor II (IGF2) of SEQ ID NO: 5 (i.e., IGF2-delta 2-7 or IGF2Δ2-7; this corresponds to SEQ ID NO: 6); residues 8-67 of wild-type mature human insulin-like growth factor II (IGF2) of SEQ ID NO: 5 (i.e., IGF2-delta 1-7 or IGF2Δ1-7, this corresponds to SEQ ID NO: 7); or residues 43-67 of wild-type mature human insulin-like growth factor II (IGF2) of SEQ ID NO: 5 (i.e., IGF2-delta 1-42 or IGF2Δ1-42, this corresponds to SEQ ID NO: 8). In some embodiments of the compositions and methods described herein, the IGF2-targeted peptide is a nucleic acid sequence having a modification of amino acid residue 43, for example, residue 43 is modified to a start codon, e.g., IGF2-V43M (corresponding to SEQ ID NO: 9).

[0248] In some embodiments of the compositions and methods described herein, the IGF2-targeted peptide is a nucleic acid sequence comprising one of the following: SEQ ID NO: 2 (i.e., IGF2-Delta 2-7); SEQ ID NO: 3 (i.e., IGF2-Delta 1-7); or SEQ ID NO: 4 (i.e., IGF2-V43M).

[0249] In some embodiments of the compositions and methods described herein, a fusion protein comprising a GAA polypeptide and an IGF2-targeted peptide comprises amino acid residues 40-952 or 70-952 of a human acid alpha-glucosidase (GAA) polypeptide (SEQ ID NO: 10) attached to an IGF2-targeted peptide containing residue 1 followed by residues 8-67 of wild-type mature human insulin-like growth factor II (IGF2) (SEQ ID NO: 5) (i.e., residues 2-7 of mature human IGF2 (SEQ ID NO: 5) are absent), wherein the IGF2-targeted peptide is linked to amino acid residue 70 of human GAA (SEQ ID NO: 10).

[0250] In some embodiments of the compositions and methods described herein, a fusion protein comprising a GAA polypeptide and an IGF2-targeted peptide comprises amino acid residues 40-952 or 70-952 of a human acid alpha-glucosidase (GAA) polypeptide (SEQ ID NO: 10) attached to an IGF2-targeted peptide containing residues 8-67 of wild-type mature human insulin-like growth factor II (IGF2) (SEQ ID NO: 5) (i.e., residues 1-7 of mature human IGF2 (SEQ ID NO: 5) are absent; cf. YRPSET, SEQ ID NO: 63, which corresponds to residues 2-7), wherein the IGF2-targeted peptide is linked to amino acid residue 70 of human GAA (SEQ ID NO: 10).

[0251] In some embodiments of the compositions and methods described herein, a fusion protein comprising a GAA polypeptide and an IGF2-targeted peptide comprises amino acid residues 40-952 or 70-952 of human acid alpha-glucosidase (GAA) (SEQ ID NO: 10) attached to a modified IGF2-targeted peptide containing residues 43-67 of wild-type mature human insulin-like growth factor II (IGF2) (SEQ ID NO: 5) (residues 1-42 of mature human IGF2 (SEQ ID NO: 5) are absent), wherein the IGF2-targeted peptide is linked to amino acid residue 70 of human GAA (SEQ ID NO: 10).

[0252] In some embodiments of the methods and compositions disclosed herein, the recombinant AAV vector comprises a heterologous nucleic acid sequence encoding an IGF2 peptide, wherein the IGF2 peptide has at least about 75%, 80%, 85%, 90%, 95%, 98%, or 99% sequence identity with SEQ ID NO: 8 or SEQ ID NO: 9.

(ii) Modified IGF2-targeted peptides and IGF2 homologs

[0253] In some embodiments, the nucleic acid encoding IGF2 can be modified so that the encoded peptide has reduced affinity for IGFBPs and/or for the IGF-I receptor, thereby increasing the targeting of the fused GAA polypeptide to lysosomes and increasing its bioavailability.

[0254] IGF2-targeted peptides preferably target and bind specifically to the M6P receptor. IGF2-targeted peptides carrying mutations that yield a protein that binds with high affinity to the CI-MPR/M6P receptor but no longer binds with appreciable affinity to the other two receptors (the IGF-I receptor and the insulin receptor) are particularly useful.

[0255] IGF2(V43M)-targeted peptides preferably target the M6P receptor specifically. IGF2(V43M)-targeted peptides carrying mutations that yield a protein that binds with high affinity to the CI-MPR/M6P receptor but no longer binds with appreciable affinity to the other two receptors (the IGF-I receptor and the insulin receptor) are particularly useful.

[0256] IGF2(V43M)-targeted peptides can also be modified to minimize binding to serum IGF-binding proteins (IGFBPs) (Baxter (2000) Am. J. Physiol. Endocrinol. Metab. 278(6):967-76) and to IGF-I receptors, to avoid sequestration of the IGF2 construct. Several studies have identified the IGF-I and IGF2 residues required for binding to IGF-binding proteins. Constructs with mutations at these residues can be screened for retention of high-affinity binding to the M6P/IGF2 receptor and for reduced affinity for IGF-binding proteins. For example, replacing Phe 26 of IGF2 with Ser has been reported to reduce the affinity of IGF2 for IGFBP-1 and IGFBP-6 without affecting binding to the M6P/IGF2 receptor (Bach et al. (1993) J. Biol. Chem. 268(13):9246-54). Other substitutions, such as Ser for Phe 19 and Lys for Glu 9, may also be beneficial. Analogous mutations in the corresponding, highly conserved region of IGF-I, either individually or in combination, result in a significant decrease in IGFBP binding (Magee et al. (1999) Biochemistry 38(48):15863-70).

[0257] IGF2-targeted peptides can also be modified to minimize binding to serum IGF-binding proteins (IGFBPs) and IGF-I receptors in order to avoid sequestering of the IGF2 construct.

[0258] In some embodiments, the IGF2-targeted peptide is modified to be furin-resistant, i.e., resistant to cleavage by furin proteases that recognize the Arg-X-X-Arg cleavage site. Such an IGF2-targeted peptide is disclosed in U.S. Patent Application No. 2012/0213762, which is incorporated herein by reference in its entirety. In some embodiments, the furin-resistant IGF2-targeted peptide for use in the rAAV genomes described herein contains mutations, which may be substitutions or deletions of one or more amino acids, within the region corresponding to amino acids 30-40 (e.g., 31-40, 32-40, 33-40, 34-40, 30-39, 31-39, 32-39, 34-37, 33-39, 34-39, 35-39, 36-39, 37-40) of SEQ ID NO: 5 (wt IGF2-targeted peptide). For example, a substitution at position 34 can affect furin recognition of the first cleavage site. Insertion of one or more additional amino acids within each recognition site can abolish one or both furin cleavage sites. Deletion of one or more residues at degenerate positions can also abolish both furin cleavage sites.

[0259] In some embodiments, the furin-resistant IGF2-targeted peptide contains an amino acid substitution at the position corresponding to Arg37 (R37) or Arg40 (R40) in SEQ ID NO: 5. In some embodiments, the furin-resistant IGF2-targeted peptide contains a Lys (K) or Ala (A) substitution at the Arg37 or Arg40 position in SEQ ID NO: 5. Other substitutions are possible, including combinations of Lys and/or Ala mutations at both positions 37 and 40, or substitutions with amino acids other than Lys (K) or Ala (A). In some embodiments, the IGF2-targeted peptides included for use in the rAAV genome disclosed herein are IGFΔ2-7-K37, IGFΔ2-7-K40, IGFΔ1-7-K37, or IGFΔ1-7-K40, indicating that the IGF2-targeted peptides have a deletion of residues 2-7 or 1-7 together with substitution of the Arg (R) residue at position 37 (R37K) or position 40 (R40K) with lysine. In some embodiments, the IGF2-targeted peptides included for use in the rAAV genome disclosed herein are IGFΔ2-7-K37-K40 or IGFΔ1-7-R37K-R40K, indicating that the IGF2-targeted peptides have a deletion of residues 2-7 or residues 1-7 together with substitution of the Arg residues at positions 37 and 40 with lysine (R37K and R40K). In some embodiments, the IGF2-targeting peptides for use in rAAV genomes are selected from IGFΔ2-7-R37A, IGFΔ2-7-R40A, IGFΔ1-7-R37A, IGFΔ1-7-R40A, IGFΔ2-7-R37A-R40A, or IGFΔ1-7-R37A-R40A. Exemplary constructs for the IGF2-targeting peptides for use in rAAV genomes are disclosed in U.S. Application No. 2012/0213762, which is incorporated herein by reference in its entirety.
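The Arg-X-X-Arg motif named above can be located in SEQ ID NO: 5 with a simple scan, which shows why Arg37 and Arg40 are the natural targets: the wild-type peptide contains two overlapping motifs, at residues 34-37 and 37-40, and Arg37 is shared by both. The Python sketch below is a motif scan only, under the simplifying assumption that any R-X-X-R is a candidate site; real furin cleavage also depends on flanking residues, so the output is not a prediction of which variants resist cleavage in cells. The function names are hypothetical.

```python
# Mature human IGF2 (SEQ ID NO: 5), residues numbered 1-67.
IGF2_MATURE = (
    "AYRPSETLCGGELVDTLQFVCGDRGFYFSRPASRVSRRSRGI"
    "VEECCFRSCDLALLETYCATPAKSE"
)


def rxxr_sites(seq: str) -> list[tuple[int, int]]:
    """Return 1-based (P4, P1) positions of every minimal Arg-X-X-Arg motif."""
    return [
        (i + 1, i + 4)
        for i in range(len(seq) - 3)
        if seq[i] == "R" and seq[i + 3] == "R"
    ]


def substitute(seq: str, pos: int, new: str) -> str:
    """Replace the residue at 1-based position pos with new."""
    return seq[:pos - 1] + new + seq[pos:]


wild_type = rxxr_sites(IGF2_MATURE)
r37k = rxxr_sites(substitute(IGF2_MATURE, 37, "K"))
r40k = rxxr_sites(substitute(IGF2_MATURE, 40, "K"))
r37a_r40a = rxxr_sites(substitute(substitute(IGF2_MATURE, 37, "A"), 40, "A"))

print(wild_type)        # [(34, 37), (37, 40)]
print(r40k)             # [(34, 37)]
print(r37k, r37a_r40a)  # [] []
```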

[0260] In some embodiments, the furin-resistant IGF2-targeted peptide suitable for the present invention may contain additional mutations. For example, up to 30% or more of the residues in SEQ ID NO: 5 may be modified (for example, up to 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, or more of the residues may be modified). Accordingly, a furin-resistant IGF2 mutant protein suitable for the present invention may have an amino acid sequence at least 70% identical to SEQ ID NO: 5, for example, at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical.
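The identity thresholds above translate into a maximum number of changed positions. A minimal Python sketch, assuming an ungapped, position-by-position comparison over the 67 residues of SEQ ID NO: 5 (the text does not specify how identity is computed; real comparisons would normally use a pairwise alignment): at the 70% threshold at most 20 of 67 residues can differ, because 47/67 is about 70.1% while 46/67 is about 68.7%. The function names are hypothetical.

```python
# Mature human IGF2 (SEQ ID NO: 5), residues numbered 1-67.
IGF2_MATURE = (
    "AYRPSETLCGGELVDTLQFVCGDRGFYFSRPASRVSRRSRGI"
    "VEECCFRSCDLALLETYCATPAKSE"
)


def max_substitutions(length: int, min_identity_pct: int) -> int:
    """Largest number of substituted positions that keeps ungapped identity at or above the threshold."""
    # (length - n) / length >= pct / 100  <=>  n <= length * (100 - pct) / 100
    # Integer floor division avoids floating-point rounding at the boundary.
    return length * (100 - min_identity_pct) // 100


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences (no alignment step)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Example: the R37K/R40K double mutant differs from SEQ ID NO: 5 at two positions.
residues = list(IGF2_MATURE)
residues[36] = "K"  # position 37
residues[39] = "K"  # position 40
double_mutant = "".join(residues)

print(max_substitutions(67, 70))  # 20
for pct in (75, 80, 85, 90, 95, 98, 99):
    print(pct, max_substitutions(67, pct))
print(round(percent_identity(IGF2_MATURE, double_mutant), 2))
```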

[0261] Furthermore, the use of the IGF2-targeted peptides disclosed herein is also referred to in the art as glycosylation-independent lysosomal targeting (GILT), because the IGF2-targeted peptide replaces M6P as the lysosome-targeting moiety. Details of GILT technology are described in U.S. Patent Publications 2003/0082176, 2004/0006008, 2004/0005309, 2003/0072761, 2005/0281805, and 2005/0244400, and international publications WO03/032913, WO03/032727, WO02/087510, WO03/102583, and WO2005/078077, all of which are incorporated herein by reference.

[0262] Other modifications to the amino acid sequence of the IGF2-targeting peptide for use in the methods and compositions disclosed herein are disclosed in U.S. Provisional Application No. 62,937,556, filed November 19, 2019, and PCT Application No. PCT/US19/61653, filed November 15, 2019, both of which are incorporated herein by reference in their entirety.

[0263] IGF2 binds to the IGF2/M6P and IGF-I receptors with relatively high affinity and to the insulin receptor with lower affinity. Substitution of residues 48-50 (Phe Arg Ser) of IGF2 with the corresponding insulin residues (Thr Ser Ile), or of residues 54-55 (Ala Leu) with the corresponding IGF-I residues (Arg Arg), reduces binding to the IGF2/M6P receptor but retains binding to the IGF-I and insulin receptors (Sakano et al. (1991) J. Biol. Chem. 266(31):20626-35).

[0264] IGF2 binds to repeat 11 of the cation-independent M6P receptor. In fact, a minireceptor in which only repeat 11 is fused to the transmembrane and cytoplasmic domains of the cation-independent M6P receptor can bind IGF2 (with approximately one-tenth the affinity of the full-length receptor) and mediate the internalization and delivery of IGF2 to lysosomes (Grimme et al. (2000) J. Biol. Chem. 275(43):33697-33703). The structure of domain 11 of the M6P receptor is publicly available (Protein Data Bank entries 1GP0 and 1GP3; Brown et al. (2002) EMBO J. 21(5):1054-1062). The putative IGF2 binding site is a hydrophobic pocket thought to interact with hydrophobic amino acids of IGF2; candidate IGF2 residues include leucine 8, phenylalanine 48, alanine 54, and leucine 55. Although repeat 11 is sufficient for IGF2 binding, constructs containing a larger portion of the cation-independent M6P receptor (e.g., repeats 10-13 or 1-15) generally bind IGF2 with higher affinity and greater pH dependence (see, e.g., Linnell et al. (2001) J. Biol. Chem. 276(26):23986-23991).

[0265] Substitution of IGF2 residue Tyr 27 with Leu, or Ser 26 with Phe, reduces the affinity of IGF2 for the IGF-I receptor by 94-fold, 56-fold, and 4-fold, respectively (Torres et al. (1995) J. Mol. Biol. 248(2):385-401). Deletion of residues 1-7 of human IGF2 caused a 30-fold reduction in affinity for the human IGF-I receptor and a 12-fold increase in affinity for the rat IGF2 receptor (Hashimoto et al. (1995) J. Biol. Chem. 270(30):18013-8). Truncation of the C-terminus of IGF2 (residues 62-67) also appears to reduce the affinity of IGF2 for the IGF-I receptor about 5-fold (Roth et al. (1991) Biochem. Biophys. Res. Commun. 181(2):907-14).

[0266] Substitution of phenylalanine 26 of IGF2 with serine reduces binding to IGFBP1-5 by 5- to 75-fold (Bach et al. (1993) J. Biol. Chem. 268(13):9246-54). Substituting threonine-serine-isoleucine for residues 48-50 of IGF2 reduces binding to most IGFBPs by more than 100-fold (Bach et al. (1993) J. Biol. Chem. 268(13):9246-54); however, these residues are also important for binding to the cation-independent mannose-6-phosphate receptor. The Y27L substitution, which disrupts binding to the IGF-I receptor, prevents formation of the ternary complex with IGFBP3 and the acid-labile subunit (Hashimoto et al. (1997) J. Biol. Chem. 272(44):27936-42); this ternary complex accounts for the majority of circulating IGF2 (Yu et al. (1999) J. Clin. Lab. Anal. 13(4):166-72). Deletion of the first six residues of IGF2 also prevents IGFBP binding (Luthi et al. (1992) Eur. J. Biochem. 205(2):483-90).

[0267] Studies on the interaction of IGF-I with IGFBPs have additionally shown that substituting serine for phenylalanine 16 reduces IGFBP binding 40- to 300-fold without affecting secondary structure (Magee et al. (1999) Biochemistry 38(48):15863-70). Substitution of glutamate 9 with lysine also significantly reduced IGFBP binding, and the lysine 9/serine 16 double mutant showed the lowest affinity for IGFBPs. Sequence conservation between this region of IGF-I and IGF2 suggests that similar effects would be observed if the corresponding mutations were made in IGF2 (glutamate 12 to lysine / phenylalanine 19 to serine).
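The IGF-I-to-IGF2 correspondence described above (Glu9 and Phe16 of IGF-I corresponding to Glu12 and Phe19 of IGF2) amounts to a fixed offset of three residues in this N-terminal region. The Python sketch below assumes that offset, which holds only locally in this stretch and not across the whole chain, and uses residues 1-18 of mature human IGF-I, a sequence not reproduced in this document. It checks that each residue is conserved before renumbering; the function name is hypothetical.

```python
# N-terminal fragments of the mature human proteins.
IGF1_N = "GPETLCGAELVDALQFVC"     # IGF-I residues 1-18
IGF2_N = "AYRPSETLCGGELVDTLQFVC"  # IGF2 residues 1-21 (start of SEQ ID NO: 5)

# Assumption: in this region IGF-I residue n aligns with IGF2 residue n + 3
# (e.g. IGF-I Cys6 with IGF2 Cys9); the offset is not valid for the whole chain.
OFFSET = 3


def map_igf1_to_igf2(substitution: str) -> str:
    """Renumber an IGF-I substitution such as 'E9K' to IGF2, checking conservation."""
    wt, pos, new = substitution[0], int(substitution[1:-1]), substitution[-1]
    if IGF1_N[pos - 1] != wt:
        raise ValueError(f"IGF-I residue {pos} is {IGF1_N[pos - 1]}, not {wt}")
    igf2_pos = pos + OFFSET
    if IGF2_N[igf2_pos - 1] != wt:
        raise ValueError(f"IGF2 residue {igf2_pos} ({IGF2_N[igf2_pos - 1]}) is not conserved")
    return f"{wt}{igf2_pos}{new}"


print([map_igf1_to_igf2(s) for s in ("E9K", "F16S")])  # ['E12K', 'F19S']
```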

[0268] In some embodiments, the IGF2(V43M) sequence includes at least amino acids 48-55, at least amino acids 8-28 and 41-61, or at least amino acids 8-87, or sequence variants thereof that bind to a cation-independent mannose-6-phosphate receptor (e.g., R68A) or truncated versions thereof (e.g., with the C-terminus shortened from position 62).

[0269] In another embodiment of the present invention, a nucleic acid encoding a targeted peptide (e.g., an IGF2-targeted peptide) is inserted, within the rAAV genome, into the native GAA coding sequence at the junction between the 70/76 kDa mature polypeptide and its C-terminal domain, for example, at position 791. This creates a single chimeric polypeptide. In some embodiments, a protease cleavage site may be inserted immediately downstream of the targeted peptide (e.g., an IGF2-targeted peptide).

[0270] In one embodiment, a targeted peptide, as defined herein, such as an IGF2-targeted peptide, is directly fused to the N-terminus or C-terminus of a GAA polypeptide. In another embodiment, the IGF2-targeted peptide is fused to the N-terminus or C-terminus of the GAA polypeptide by a spacer. In one specific embodiment, the IGF2-targeted peptide is fused to the GAA polypeptide by a spacer of 10 to 25 amino acids. In yet another embodiment, the IGF2-targeted peptide is fused to the GAA polypeptide by a spacer containing a glycine residue.

[0271] In some embodiments, the IGF2-targeting peptide is fused to the GAA polypeptide by a spacer consisting of at least one, two, or three amino acids. In some embodiments, the spacer comprises the amino acids GAP or Gly-Ala-Pro (SEQ ID NO: 31), or an amino acid sequence at least 50% identical thereto. In some embodiments, the spacer is GGG, GA, AP, or GP, or variants thereof. In some embodiments, the spacer is encoded by the nucleic acid sequence GGC GCG CCG (SEQ ID NO: 30).
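To make the N-to-C order and residue numbering of such a fusion concrete, the following Python sketch lays out one illustrative primary translation product: the FN1 signal peptide (SEQ ID NO: 18), IGF2 delta 2-7, the GAP spacer (SEQ ID NO: 31), and GAA residues 70-952. This particular combination is chosen for illustration only. Because the GAA sequence (SEQ ID NO: 10) is not reproduced here, a hypothetical placeholder string of 952 "X" residues stands in for it, so only lengths and coordinates are meaningful. The coordinates refer to the unprocessed chain, before the signal peptide is cleaved; the helper names are hypothetical.

```python
FN1_SIGNAL = "MLRGPGPGRLLLLAVLCLGTSVRCTETGKSKR"  # SEQ ID NO: 18
IGF2_MATURE = (                                  # SEQ ID NO: 5
    "AYRPSETLCGGELVDTLQFVCGDRGFYFSRPASRVSRRSRGI"
    "VEECCFRSCDLALLETYCATPAKSE"
)
SPACER = "GAP"                                   # SEQ ID NO: 31, encoded by GGC GCG CCG
GAA_PLACEHOLDER = "X" * 952                      # hypothetical stand-in for SEQ ID NO: 10


def residues(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq using 1-based, inclusive numbering."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"invalid range {start}-{end} for length {len(seq)}")
    return seq[start - 1:end]


def layout(parts: list[tuple[str, str]]) -> list[tuple[str, int, int]]:
    """Return (name, first, last) 1-based coordinates of each part in the fused chain."""
    coords, pos = [], 1
    for name, seq in parts:
        coords.append((name, pos, pos + len(seq) - 1))
        pos += len(seq)
    return coords


parts = [
    ("FN1 signal peptide", FN1_SIGNAL),
    ("IGF2 delta2-7", residues(IGF2_MATURE, 1, 1) + residues(IGF2_MATURE, 8, 67)),
    ("GAP spacer", SPACER),
    ("GAA 70-952", residues(GAA_PLACEHOLDER, 70, 952)),
]
fusion = "".join(seq for _, seq in parts)

for name, first, last in layout(parts):
    print(f"{name:20s} {first:4d}-{last}")
print(len(fusion))  # 979
```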

[0272] In some embodiments, the IGF2-targeted peptide is fused to the GAA polypeptide by a spacer containing a helical structure. In another specific embodiment, the IGF2-targeted peptide is fused to the GAA polypeptide by a spacer that is at least 50% identical to the sequence GGGTVGDDDDK (SEQ ID NO: 35). In some embodiments of the methods and compositions disclosed herein, the spacer is SEQ ID NO: 31 (encoded by the nucleic acid of SEQ ID NO: 30). In some embodiments of the methods and compositions disclosed herein, the spacer is selected from SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, or SEQ ID NO: 35, or sequences with at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity with them.

(iii) Alternative targeting peptides that bind to the cation-independent M6P receptor (CI-MPR).

[0273] In some embodiments, the targeting peptide is a lysosomal targeting peptide or protein, or any portion of the IGF2-targeting peptide disclosed herein, that binds to the cation-independent M6P/IGF2 receptor (CI-MPR) in a mannose-6-phosphate-independent manner. The CI-MPR also contains binding sites for at least three different ligands that can be used as targeting moieties. As disclosed herein, IGF2 binds to the CI-MPR primarily through interaction with repeat 11, at or about pH 7.4, with a dissociation constant of about 14 nM. The CI-MPR can also bind high-molecular-weight O-glycosylated forms of IGF2. Therefore, in some embodiments, the IGF2-targeting peptide may be post-translationally modified to include O-glycosylation.

[0274] In an alternative embodiment, the targeting moiety that binds to the CI-MPR is retinoic acid. Retinoic acid binds to the receptor with a dissociation constant of 2.5 nM. Affinity photolabeling of the CI-MPR with retinoic acid does not interfere with binding of IGF2 or M6P to the receptor, indicating that retinoic acid binds to a different site on the receptor. Binding of retinoic acid alters the intracellular distribution of the receptor, increasing its accumulation in cytoplasmic vesicles, and also enhances uptake of M6P-modified β-glucuronidase. Retinoic acid has a photoactivatable moiety that can be used to link it to therapeutic agents without interfering with its ability to bind to the CI-MPR.
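As a rough, non-authoritative illustration of what these dissociation constants imply, the sketch below applies the textbook single-site binding relation, fractional occupancy θ = [L]/([L] + Kd). It assumes equilibrium, free ligand equal to added ligand (no receptor depletion), and no competition between ligands; the ligand concentrations are hypothetical. The uPAR value (9 μM) is taken from the following paragraph.

```python
# Sketch: fractional CI-MPR occupancy under a simple single-site binding model.
# Assumptions (not from the source): equilibrium, free ligand ~= added ligand,
# one independent site per ligand, no competition between ligands.

KD_NM = {
    "IGF2": 14.0,           # ~14 nM at ~pH 7.4
    "retinoic acid": 2.5,   # 2.5 nM
    "uPAR": 9000.0,         # 9 uM, expressed in nM
}


def fractional_occupancy(ligand_nm: float, kd_nm: float) -> float:
    """Return theta = [L] / ([L] + Kd) for a single binding site."""
    if ligand_nm < 0 or kd_nm <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_nm / (ligand_nm + kd_nm)


if __name__ == "__main__":
    for conc_nm in (1.0, 10.0, 100.0):  # hypothetical free ligand concentrations
        row = ", ".join(
            f"{name}: {fractional_occupancy(conc_nm, kd):.3f}"
            for name, kd in KD_NM.items()
        )
        print(f"{conc_nm:>6.1f} nM -> {row}")
```

At 10 nM, for example, this model gives about 0.42 occupancy for IGF2 and 0.80 for retinoic acid, but only about 0.001 for uPAR, which is why a micromolar binder would be a much weaker targeting moiety at comparable concentrations.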

[0275] The urokinase-type plasminogen activator receptor (uPAR) also binds to the CI-MPR, with a dissociation constant of 9 μM. uPAR is a GPI-anchored receptor present on the surface of most cell types, where it functions as an adhesion molecule and in the proteolytic activation of plasminogen and TGF-β. Binding of uPAR to the CI-MPR targets uPAR to lysosomes, thereby modulating its activity. Therefore, fusing the extracellular domain of uPAR, or a portion thereof that retains the ability to bind the CI-MPR, to a therapeutic agent enables lysosomal targeting of that agent.

D. Spacers and fusion junctions for GAA polypeptides

[0276] When GAA is expressed as a fusion protein with a secretory signal peptide (e.g., an SS-GAA fusion polypeptide), or additionally with a targeting peptide (e.g., an SS-IGF2-GAA double fusion polypeptide), the signal peptide or IGF2-targeting peptide may be fused directly to the GAA polypeptide or separated from it by a linker. An amino acid linker (also referred to herein as a “spacer”) incorporates one or more amino acids other than those that appear at that position in the native protein. Spacers can generally be designed to be flexible or to insert a structure, such as an α-helix, between two protein moieties.

[0277] Accordingly, in some embodiments of the methods and compositions disclosed herein, the recombinant AAV vector comprises a heterologous nucleic acid sequence encoding an IGF2-GAA fusion polypeptide, and the IGF2-GAA fusion protein further comprises a spacer of at least one amino acid located between the C-terminus of the IGF2-targeting peptide and the N-terminus of the GAA polypeptide. In some embodiments of the methods and compositions disclosed herein, the recombinant AAV vector comprises a heterologous nucleic acid sequence comprising a nucleic acid encoding a spacer of at least one amino acid, located between the nucleic acid encoding the IGF2-targeting peptide and the nucleic acid encoding the GAA polypeptide.

[0278] In one embodiment, the IGF2-targeting peptide is directly fused to the N-terminus or C-terminus of the GAA polypeptide. In another embodiment, the IGF2-targeting peptide is fused to the N-terminus or C-terminus of the GAA polypeptide by a spacer. In one specific embodiment, the IGF2-targeting peptide is fused to the GAA polypeptide by a spacer of 10-25 amino acids. In another specific embodiment, the IGF2-targeting peptide is fused to the GAA polypeptide by a spacer containing a glycine residue. In yet another specific embodiment, the IGF2-targeting peptide is fused to the GAA polypeptide by a spacer containing a helical structure. In yet another specific embodiment, the IGF2-targeting peptide is fused to the GAA polypeptide by a spacer that is at least 50% identical to the sequence GGGTVGDDDDK (SEQ ID NO: 35).
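Percent-identity thresholds such as "at least 50% identical to GGGTVGDDDDK" are normally computed from a sequence alignment (for example, BLAST or a global alignment). The sketch below is a deliberate simplification, not the method used in this specification: it compares equal-length spacers position by position without gaps, and the candidate spacers are hypothetical.

```python
# Sketch: ungapped percent identity against the GGGTVGDDDDK spacer (SEQ ID NO: 35).
# Simplification: equal-length sequences compared position by position, no alignment.

REFERENCE_SPACER = "GGGTVGDDDDK"


def ungapped_identity(candidate: str, reference: str = REFERENCE_SPACER) -> float:
    """Percent of positions with identical residues (equal-length, ungapped)."""
    if len(candidate) != len(reference):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def meets_threshold(candidate: str, threshold_pct: float = 50.0) -> bool:
    return ungapped_identity(candidate) >= threshold_pct


if __name__ == "__main__":
    for spacer in ("GGGSGGDDDDK", "GSGSGSGSGSG"):  # hypothetical candidate spacers
        print(spacer, round(ungapped_identity(spacer), 1), meets_threshold(spacer))
```

Here GGGSGGDDDDK matches 9 of 11 positions (about 81.8%) and passes the 50% threshold, while GSGSGSGSGSG matches only 2 of 11 and does not.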

[0279] In some embodiments, the spacer or linker may be relatively short, for example, at least 1, 2, 3, 4, or 5 amino acids, such as Gly-Ala-Pro (SEQ ID NO: 31) or Gly-Gly-Gly-Gly-Gly-Pro (SEQ ID NO: 32), or it may be longer, for example, 5-10 or 10-25 amino acids in length. For example, a flexible linker of 3-4 repeats of GGGGS (SEQ ID NO: 33) and an α-helical linker of 2-5 repeats of EAAAK (SEQ ID NO: 34) have been described (Arai et al. (2004) Proteins: Structure, Function and Bioinformatics 57:829-838).
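The repeat counts translate directly into linker lengths: 3-4 copies of a 5-residue unit give 15-20 amino acids, and 2-5 copies give 10-25 amino acids, so every design cited here falls in the longer 10-25 amino acid class. The minimal sketch below generates these repeats and checks that range; it builds only the bare repeats and assumes no extra flanking residues.

```python
# Sketch: repeat linkers from the cited designs and a check against the
# 10-25 amino acid range mentioned in the text.

FLEXIBLE_UNIT = "GGGGS"  # SEQ ID NO: 33
HELICAL_UNIT = "EAAAK"   # SEQ ID NO: 34


def repeat_linker(unit: str, copies: int) -> str:
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return unit * copies


def within_range(linker: str, low: int = 10, high: int = 25) -> bool:
    return low <= len(linker) <= high


if __name__ == "__main__":
    designs = [("flexible", FLEXIBLE_UNIT, n) for n in (3, 4)]
    designs += [("helical", HELICAL_UNIT, n) for n in range(2, 6)]
    for kind, unit, n in designs:
        linker = repeat_linker(unit, n)
        print(f"{kind:8s} ({unit}){n}: {linker} ({len(linker)} aa), in 10-25: {within_range(linker)}")
```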

[0280] Another linker, GGGTVGDDDDK (SEQ ID NO: 35), has also been reported in the context of IGF2 fusion proteins (DiFalco et al. (1997) Biochem. J. 326:407-413) and is also contemplated for use. Linkers incorporating α-helical segments of human serum proteins can be used to minimize the immunogenicity of the linker region.

[0281] In some embodiments, the spacer is encoded by the nucleic acid GGC GCG CCG (SEQ ID NO: 30), which encodes the three-amino-acid spacer Gly-Ala-Pro (GAP; SEQ ID NO: 31).
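The codon assignment stated here can be checked with a short translation routine using the standard genetic code. The sketch below is self-contained (no third-party libraries); reading in frame from the first base and ignoring an incomplete trailing codon are simplifying assumptions. As a second check, it translates the collagen-stabilizing sequence given in [0288] (SEQ ID NO: 65), whose first 18 nucleotides correspond to PSPLFP (SEQ ID NO: 66).

```python
# Sketch: translate a DNA sequence with the standard genetic code.
# Assumptions: read from the first base in frame; a trailing partial codon is ignored.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}  # "*" marks a stop codon


def translate(dna: str) -> str:
    seq = "".join(dna.split()).upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(translate("GGC GCG CCG"))           # GAP (SEQ ID NO: 30 -> SEQ ID NO: 31)
    print(translate("CCCAGCCCACTTTTCCCCAA"))  # PSPLFP (SEQ ID NO: 65 -> SEQ ID NO: 66)
```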

[0282] The fusion site in the GAA polypeptide, whether for fusion to a signal peptide (to construct an SS-GAA fusion protein) or to a targeting peptide (e.g., to construct an SS-IGF2-GAA double fusion polypeptide), should be carefully selected to promote proper folding and activity of each polypeptide in the fusion protein and to prevent premature separation of the signal peptide from the GAA polypeptide.

[0283] In some embodiments, the IGF2-targeting peptide is fused to the GAA polypeptide via a spacer containing a helical structure. In another specific embodiment, the IGF2-targeting peptide is fused to the GAA polypeptide via a spacer that is at least 50% identical to the sequence GGGTVGDDDDK (SEQ ID NO: 35). In some embodiments of the methods and compositions disclosed herein, the spacer is SEQ ID NO: 31 (encoded by the nucleic acid of SEQ ID NO: 30). In some embodiments of the methods and compositions disclosed herein, the spacer is selected from SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, or SEQ ID NO: 35.

[0284] Four exemplary strategies for producing IGF2-GAA fusion proteins are disclosed in U.S. Provisional Application No. 62/937,556, filed November 19, 2019, and PCT/US19/61653, filed November 15, 2019, which are incorporated herein by reference in their entirety.

[0285] In some embodiments, the targeting peptide (e.g., an IGF2-targeting peptide) may be fused, directly or via a spacer, to amino acid 40 or amino acid 70 of GAA, positions that permit protein expression, catalytic activity of the GAA protein, and appropriate targeting by the IGF2-targeting peptide, as described in the examples herein. Alternatively, the targeting peptide may be fused at or near the cleavage site that separates the C-terminal domain of GAA from the mature polypeptide. This enables synthesis of a GAA protein with an internal targeting peptide, which, depending on the placement of the cleavage site, can be cleaved as needed to release the mature polypeptide or the C-terminal domain from the targeting domain. Alternatively, the mature polypeptide may be synthesized as a fusion protein joined at approximately position 791, without incorporating the C-terminal sequence into the open reading frame of the expression construct.
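The internal and truncated fusion layouts can be expressed as simple string operations on a protein sequence. In this sketch, the GAA, targeting-peptide, and cleavage-site sequences are hypothetical placeholders (not SEQ ID NOs from this document), positions are 1-based, and "fused at position 791" is assumed to mean immediately after residue 791; the helper names are illustrative only.

```python
# Sketch: building internal and truncated GAA fusions as strings.
# All sequences are hypothetical placeholders; positions are 1-based.

def insert_after_residue(protein: str, position: int, insert: str) -> str:
    """Insert `insert` immediately after residue `position` (1-based)."""
    if not 0 <= position <= len(protein):
        raise ValueError("position must lie within the protein")
    return protein[:position] + insert + protein[position:]


def internal_fusion(gaa: str, position: int, targeting: str, cleavage_site: str = "") -> str:
    """Internal targeting peptide, optionally followed by a protease cleavage site."""
    return insert_after_residue(gaa, position, targeting + cleavage_site)


def truncated_fusion(gaa: str, position: int, targeting: str) -> str:
    """Keep residues 1..position, append the targeting peptide; C-terminal domain omitted."""
    if not 0 < position <= len(gaa):
        raise ValueError("position must lie within the protein")
    return gaa[:position] + targeting


if __name__ == "__main__":
    gaa_placeholder = "A" * 952        # hypothetical stand-in for a 952-residue GAA precursor
    targeting_placeholder = "W" * 10   # hypothetical stand-in for an IGF2-targeting peptide
    cleavage_placeholder = "EEEE"      # hypothetical stand-in for a protease cleavage site
    internal = internal_fusion(gaa_placeholder, 791, targeting_placeholder, cleavage_placeholder)
    truncated = truncated_fusion(gaa_placeholder, 791, targeting_placeholder)
    print(len(internal), len(truncated))  # 966 801
```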

[0286] To facilitate folding of the IGF2-targeting peptide, the GAA amino acid residues adjacent to the fusion junction can be modified. For example, because GAA cysteine residues can interfere with proper folding of the targeting peptide (e.g., an IGF2-targeting peptide), the C-terminal GAA cysteine, Cys952, can be deleted or replaced with serine to accommodate a C-terminal targeting peptide. The targeting peptide can also be fused immediately before Cys952. In addition to mutating Cys952 to serine, the second-to-last cysteine, Cys938, can be changed to proline.

E. CS sequence

[0287] In some embodiments of the methods and compositions disclosed herein, the recombinant AAV vector comprises a heterologous nucleic acid sequence further comprising a collagen stability (CS) sequence located 3' of the nucleic acid encoding the GAA polypeptide and 5' of the 3' ITR sequence. In some embodiments, the rAAV genome disclosed herein comprises a heterologous nucleic acid sequence that may optionally include a collagen stability sequence (CS or CSS) located 3' of the GAA gene and 5' of the polyA signal. In some embodiments, the CS sequence may be replaced by a 3' UTR sequence disclosed herein.

[0288] An exemplary collagen-stabilizing sequence comprises CCCAGCCCACTTTTCCCCAA (SEQ ID NO: 65), or a sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity thereto. An exemplary collagen-stabilizing sequence may have the amino acid sequence PSPLFP (SEQ ID NO: 66), or an amino acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity thereto. The CS sequence is described in Holcik and Liebhaber, Proc. Natl. Acad. Sci. 94:2410-2414, 1997 (see, for example, Figure 3, page 5205), which is incorporated herein by reference in its entirety.

F. Promoter

[0289] In some embodiments, to achieve an appropriate level of GAA expression, the rAAV genome includes a liver-specific promoter (LSP). The LSP drives expression of an operably linked gene in the liver and, in some embodiments, may be an inducible LSP. In embodiments, the LSP is located 5' (upstream) of, and operably linked to, a heterologous nucleic acid sequence encoding the GAA protein. Exemplary liver-specific promoters are disclosed herein and include, for example, LSPs comprising SEQ ID NOs: 86, 91-96, or 146-150, or functional variants or functional fragments thereof, or any LSP, functional fragment, or functional variant thereof listed in Table 4 herein. In some embodiments of the compositions and methods disclosed herein, the liver-specific promoter may be a liver-specific cis-regulatory element (CRE), a synthetic liver-specific cis-regulatory module (CRM), or a synthetic liver-specific promoter selected from SEQ ID NOs: 270-341 (minimal LSPs with CRMs) or SEQ ID NOs: 342-430 (synthetic liver-specific proximal promoters) disclosed in Table 4 herein. In some embodiments, the rAAV vector genome may include one or more constitutive promoters, such as viral promoters or promoters derived from mammalian genes that are generally active in promoting transcription.

(i) Synthetic liver-specific promoter

[0290] In some embodiments of the methods and compositions disclosed herein, the promoter is a liver-specific promoter and may be selected from, but is not limited to, the promoters listed in Table 4 herein or functional variants thereof, and/or the promoters listed in Tables 4A and 4B of U.S. Provisional Application No. 62/937,556, filed November 19, 2019, or functional variants thereof.

[0291] The transthyretin (TTR) promoter (SEQ ID NO: 431), SP0412 (SEQ ID NO: 91), and SP0422 (SEQ ID NO: 92) are used as exemplary liver-specific promoters in this specification and in the examples (see Examples 1, 12, and 13). Those skilled in the art can readily replace TTR with any liver-specific promoter, or functional variant thereof, disclosed in Table 4 of this specification and/or in Tables 4A and 4B of U.S. Provisional Application No. 62/937,556, filed November 19, 2019. The liver-specific promoter may include a liver-specific cis-regulatory element (CRE), a synthetic liver-specific cis-regulatory module (CRM), or a synthetic liver-specific promoter disclosed in this specification or in Tables 4A and 4B of U.S. Provisional Application No. 62/937,556, or a functional variant thereof.

[0292] Table 4 shows exemplary liver-specific promoters. The relatively small size of the liver-specific promoters disclosed herein is advantageous because they occupy only a small portion of the vector payload. This is particularly important when LSPs are used in vectors with limited packaging capacity, such as AAV vectors.
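The payload argument can be made concrete with simple arithmetic. The sketch below assumes a commonly cited AAV packaging limit of about 4.7 kb and ITRs of about 145 nt each; the promoter and polyA lengths are hypothetical placeholders (Table 4 is not reproduced here), and the GAA open reading frame length assumes a 952-residue precursor (Cys952 being the terminal residue, as noted in [0286]) plus a stop codon.

```python
# Sketch: rough AAV genome "budget" for a liver-directed GAA cassette.
# Assumed values are labelled; none of the element sizes come from Table 4.

AAV_PACKAGING_LIMIT_NT = 4700   # commonly cited approximate limit (assumption)
ITR_NT = 145                    # approximate length of one AAV2 ITR (assumption)
GAA_ORF_NT = 952 * 3 + 3        # 952 codons + stop codon = 2859 nt


def remaining_capacity(elements: dict, limit: int = AAV_PACKAGING_LIMIT_NT) -> int:
    """Return nucleotides left after packing all elements (negative = oversized)."""
    return limit - sum(elements.values())


if __name__ == "__main__":
    cassette = {
        "5' ITR": ITR_NT,
        "liver-specific promoter (hypothetical size)": 300,
        "GAA ORF": GAA_ORF_NT,
        "polyA signal (hypothetical size)": 200,
        "3' ITR": ITR_NT,
    }
    print(remaining_capacity(cassette))  # 1051
```

Under these assumptions, every nucleotide saved in the promoter remains available for other elements described in this specification, such as a signal peptide, an IGF2-targeting peptide, a spacer, or a CS sequence.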

[0293] [Table 4-1] [Table 4-2] [Table 4-3]

(ii) Functional variants of liver-specific promoters

[0294] In some embodiments, synthetic liver-specific promoters useful in the methods and compositions disclosed herein are bispecific or trispecific promoters, as defined herein. For example, a liver bispecific promoter is active in the liver and one other tissue, e.g., muscle or brain. A liver trispecific promoter is active in the liver and two other tissues, e.g., muscle and brain, or kidney and muscle.

[0295] In some embodiments, a synthetic liver-specific promoter that is at least 50%, 60%, 70%, 80%, 90%, or 95% identical to any of SEQ ID NOs: 86, 91-96, 146-150, or 270-430 comprises a cis-regulatory nucleic acid sequence that is preferentially active in the liver and is also active, to a lesser extent, in a second cell or tissue type, e.g., muscle or CNS (e.g., <50% of total expression, or about 49-40%, about 39-30%, about 29-20%, about 19-10%, or <10%).
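The expression bands above can be applied to measured data roughly as follows. This is a sketch under stated assumptions: "total expression" is taken to be the sum of reporter readouts across the tissues measured, the band boundaries are one reasonable reading of the listed ranges, and the readouts are hypothetical.

```python
# Sketch: share of total expression per tissue and the bands listed in [0295].
# Reporter readouts are hypothetical; "total" = sum over measured tissues (assumption).

def percent_of_total(expression: dict) -> dict:
    total = sum(expression.values())
    if total <= 0:
        raise ValueError("total expression must be positive")
    return {tissue: 100.0 * value / total for tissue, value in expression.items()}


def secondary_band(percent: float) -> str:
    """Map a secondary tissue's share of total expression to a band."""
    if percent >= 50:
        return ">=50% (not preferentially liver)"
    for lower, label in ((40, "about 40-49%"), (30, "about 30-39%"),
                         (20, "about 20-29%"), (10, "about 10-19%")):
        if percent >= lower:
            return label
    return "<10%"


if __name__ == "__main__":
    readout = {"liver": 800.0, "muscle": 150.0, "brain": 50.0}  # hypothetical values
    shares = percent_of_total(readout)
    for tissue in ("muscle", "brain"):
        print(tissue, round(shares[tissue], 1), secondary_band(shares[tissue]))
```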

[0296] In some embodiments, the promoter is a synthetic liver-specific promoter comprising the cis-regulatory elements (CREs) CRE0051 (SEQ ID NO: 97) and CRE0042 (SEQ ID NO: 104), or functional variants thereof. These functional variants may have sequences that are at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to them. Typically, the CREs are operably linked to a promoter element. In some preferred embodiments, the liver-specific promoter comprises the CREs or their functional variants in the order CRE0051 (SEQ ID NO: 97), CRE0042 (SEQ ID NO: 104), and then the promoter element (the order is given in the upstream-to-downstream direction, as is customary in the art).

[0297] The promoter element may be any suitable proximal promoter or minimal promoter. In some embodiments, the promoter element is a minimal promoter. When the promoter element is a proximal promoter, it is generally preferable that the proximal promoter be liver-specific.

[0298] In some preferred embodiments, the promoter element is CRE0059 (SEQ ID NO: 110) or a functional variant thereof. CRE0059 is a proximal promoter, as will be discussed further below.

[0299] Therefore, in one embodiment, the promoter includes the following regulatory elements: CRE0051 (SEQ ID NO: 97), CRE0042 (SEQ ID NO: 104), and CRE0059 (SEQ ID NO: 110), or functional variants thereof.

[0300] Functional variants of CRE0051 (SEQ ID NO: 97) are regulatory elements whose sequence is modified relative to CRE0051 but which substantially retain its activity as a liver-specific CRE. Those skilled in the art will recognize that the sequence of a CRE can be modified while retaining its ability to bind the necessary transcription factors (TFs) and enhance expression. Functional variants may include substitutions, deletions, and/or insertions relative to the reference CRE, provided that these do not render the CRE substantially nonfunctional.

[0301] In some embodiments, a functional variant of CRE0051 is a CRE that substantially retains the activity of the promoter when substituted for CRE0051 in that promoter. For example, a liver promoter in which CRE0051 is replaced by a functional variant preferably retains at least 80%, more preferably at least 90%, more preferably at least 95%, and even more preferably 100% of its activity. Taking promoter SP0412 (SEQ ID NO: 91) as an example, CRE0051 in SP0412 can be replaced with a functional variant of CRE0051, and the promoter substantially retains its activity. Retention of activity can be evaluated by comparing, under equivalent conditions, the expression of a suitable reporter under the control of the reference promoter with that under the control of an otherwise identical promoter containing the substituted CRE.
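The reporter comparison described above reduces to a ratio of mean readouts. The sketch below, assuming replicate reporter measurements taken under equivalent conditions, reports the percentage retained and the most stringent of the stated thresholds (80%, 90%, 95%, 100%) that it meets; the readout values are hypothetical.

```python
from statistics import mean

# Sketch: percent activity retained when a CRE in a promoter is replaced by a variant.
# Readouts are hypothetical reporter measurements under equivalent conditions.

PREFERRED_THRESHOLDS = (100.0, 95.0, 90.0, 80.0)  # from the text, most to least stringent


def percent_retained(variant_readouts, reference_readouts) -> float:
    return 100.0 * mean(variant_readouts) / mean(reference_readouts)


def highest_threshold_met(percent: float):
    """Return the most stringent threshold met, or None if below 80%."""
    for threshold in PREFERRED_THRESHOLDS:
        if percent >= threshold:
            return threshold
    return None


if __name__ == "__main__":
    reference = [1000.0, 1040.0, 960.0]  # e.g., SP0412 driving a reporter (hypothetical)
    variant = [930.0, 910.0, 950.0]      # same promoter with a CRE0051 variant (hypothetical)
    pct = percent_retained(variant, reference)
    print(f"{pct:.1f}% retained; meets {highest_threshold_met(pct)}% threshold")
```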

[0302] In some embodiments, functional variants of CRE0051 include transcription factor binding sites (TFBSs) for the same liver-specific TFs as CRE0051. The liver-specific TFBSs present in CRE0051, listed in order, are HNF1 (SEQ ID NO: 98), HNF4 (SEQ ID NO: 99), HNF3 (SEQ ID NO: 100), HNF1' (SEQ ID NO: 101), and HNF3' (SEQ ID NO: 102); see Table 5. Therefore, functional variants of CRE0051 preferably include all of these TFBSs, preferably in the same order in which they occur in CRE0051. When the CRE is associated with a promoter and gene, this order is preferably read upstream to downstream (i.e., from distal to proximal relative to the transcription start site (TSS)). Spacer sequences may be provided between adjacent TFBSs. In some embodiments, TFBSs may overlap, provided they remain functional, i.e., both overlapping sequences are capable of binding their respective TFs to the extent necessary to regulate expression.

[0303] In some embodiments, functional variants of CRE0051 (SEQ ID NO: 97) include the following TFBS sequences: GTTAATTTTTAAA (HNF1) (SEQ ID NO: 98), GTGGCCCTTGG (HNF4) (SEQ ID NO: 99), TGTTTGC (HNF3) (SEQ ID NO: 100), TGGTAATAATCTCA (HNF1') (SEQ ID NO: 101), and ACAAACA (HNF3') (SEQ ID NO: 102), their complementary sequences, or functional variants of these TFBS sequences that retain the ability to bind their respective TFs. These may be in the same order as in CRE0051, i.e., in the order shown above. It is well known in the art that TFBSs exhibit sequence variability: for a given TFBS there is typically a consensus sequence, from which some degree of deviation is common. Such variation can be represented by a position weight matrix (PWM), which typically gives the frequency with which each nucleotide occurs at each position of the binding site. Details of TF consensus sequences and associated PWMs can be found, for example, in the JASPAR or TRANSFAC databases (http://jaspar.genereg.net/ and http://gene-regulation.com/pub/databases.html). This information enables those skilled in the art to modify the sequence of any given TFBS in a CRE in a manner that preserves, and in some cases enhances, the functionality of the CRE.
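The PWM idea can be sketched in a few lines. The example below builds a log-odds matrix from a handful of aligned sites, using a pseudocount and a uniform background, and scans a sequence for the best-scoring window. The training sites are hypothetical variants of the HNF3 site TGTTTGC, not curated JASPAR or TRANSFAC data; a real analysis would start from a matrix obtained from those databases.

```python
import math

# Sketch: a log-odds position weight matrix (PWM) built from aligned sites.
# Training sites are hypothetical variants of TGTTTGC; the pseudocount and the
# uniform 0.25 background are modelling assumptions.

BASES = "ACGT"


def log_odds_matrix(sites, pseudocount=0.5, background=0.25):
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("sites must be aligned and of equal length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        matrix.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return matrix


def score(window, matrix):
    """Sum of per-position log-odds scores; window length must equal the matrix width."""
    return sum(column[base] for column, base in zip(matrix, window.upper()))


def best_hit(sequence, matrix):
    """Return (score, 0-based start) of the highest-scoring window."""
    width = len(matrix)
    return max((score(sequence[i:i + width], matrix), i)
               for i in range(len(sequence) - width + 1))


if __name__ == "__main__":
    sites = ["TGTTTGC", "TGTTTAC", "TATTTGC", "TGTTTGT"]  # hypothetical aligned sites
    pwm = log_odds_matrix(sites)
    print(round(score("TGTTTGC", pwm), 2), round(score("ACGAACG", pwm), 2))
    print(best_hit("AAAAATGTTTGCAAAA", pwm))  # best window starts at index 5
```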

[0304] In some embodiments, the functional variant of CRE0051 comprises the following arrangement:

[0305] GTTAATTTTTAAA-Na-GTGGCCCTTGG-Nb-TGTTTGC-Nc-TGGTTAATAATCTCA-Nd-ACAAACA (SEQ ID NO: 103), or a sequence that is at least 70%, 80%, 90%, 95%, or 99% identical thereto, where Na, Nb, Nc, and Nd represent optional spacer sequences. If present, Na may have a length of 10 to 26 nucleotides, preferably 14 to 22 nucleotides, more preferably 18 nucleotides. If present, Nb may have a length of 8 to 22 nucleotides, preferably 12 to 20 nucleotides, more preferably 16 nucleotides. If present, Nc may have a length of 1 to 10 nucleotides, preferably 1 to 5 nucleotides, more preferably 2 nucleotides. If present, Nd may have a length of 1 to 13 nucleotides, preferably 2 to 9 nucleotides, more preferably 5 nucleotides.
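Whether a candidate sequence follows this arrangement exactly (each spacer either absent or within its stated length range) can be checked with a regular expression. The sketch below checks layout only, not the percent-identity variants, and uses the HNF1' motif exactly as written in SEQ ID NO: 103 (TGGTTAATAATCTCA, 15 nt), which differs by one nucleotide from the form listed in [0303]; which form is intended is not resolved here. The "A" filler used for spacers is a placeholder.

```python
import re

# Sketch: does a candidate sequence follow the SEQ ID NO: 103 layout?
# Motifs and spacer ranges come from the text; each spacer is either absent
# or within its stated range. Identity-based variants are NOT handled here.

CRE0051_LAYOUT = [
    "GTTAATTTTTAAA",      # HNF1
    (10, 26),             # Na
    "GTGGCCCTTGG",        # HNF4
    (8, 22),              # Nb
    "TGTTTGC",            # HNF3
    (1, 10),              # Nc
    "TGGTTAATAATCTCA",    # HNF1' (as written in SEQ ID NO: 103)
    (1, 13),              # Nd
    "ACAAACA",            # HNF3'
]


def layout_pattern(layout) -> "re.Pattern[str]":
    parts = []
    for item in layout:
        if isinstance(item, tuple):
            low, high = item
            parts.append(f"(?:[ACGT]{{{low},{high}}})?")  # optional spacer
        else:
            parts.append(re.escape(item))
    return re.compile("".join(parts))


def matches_layout(sequence: str, layout=CRE0051_LAYOUT) -> bool:
    return layout_pattern(layout).fullmatch(sequence.upper()) is not None


def build(layout, spacer_lengths, filler="A"):
    """Assemble a test sequence using placeholder filler for each spacer."""
    lengths = iter(spacer_lengths)
    return "".join(item if isinstance(item, str) else filler * next(lengths) for item in layout)


if __name__ == "__main__":
    preferred = build(CRE0051_LAYOUT, [18, 16, 2, 5])  # most preferred spacer lengths
    print(len(preferred), matches_layout(preferred))   # 94 True
    too_short_na = build(CRE0051_LAYOUT, [5, 16, 2, 5])
    print(matches_layout(too_short_na))                # False: Na neither absent nor 10-26 nt
```

With the most preferred spacer lengths the arrangement is 94 nt, consistent with the 100-nucleotide upper length given for CRE0051-containing CREs in [0308].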

[0306] In some embodiments, the CRE consists of SEQ ID NOs: 98-102, or functional variants thereof.

[0307] It should be noted that a CRE or its functional variant may be provided on either strand of a double-stranded polynucleotide and in either orientation. Therefore, the complementary and reverse-complementary sequences of SEQ ID NOs: 97-102, or of their functional variants, are within the scope of the present invention. Single-stranded nucleic acids containing the sequence of SEQ ID NO: 97 or 103, or a functional variant thereof, are also within the scope of the present invention.
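Because a CRE may sit on either strand and in either orientation, a search for its TFBSs should examine both the motif and its reverse complement. The minimal sketch below does this for exact matches only; the example construct is hypothetical.

```python
# Sketch: strand-agnostic exact-match search, reflecting that a CRE may sit on
# either strand and in either orientation. The example construct is hypothetical.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_either_strand(sequence: str, motif: str):
    """Return sorted (0-based start, strand) hits of `motif` on both strands."""
    sequence, motif = sequence.upper(), motif.upper()
    hits = set()
    for strand, probe in (("+", motif), ("-", reverse_complement(motif))):
        start = sequence.find(probe)
        while start != -1:
            hits.add((start, strand))
            start = sequence.find(probe, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    hnf4 = "GTGGCCCTTGG"  # HNF4 TFBS of CRE0051 (SEQ ID NO: 99)
    construct = "AAA" + reverse_complement(hnf4) + "TTT"  # site in reverse orientation
    print(find_either_strand(construct, hnf4))  # [(3, '-')]
```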

[0308] In some embodiments, a CRE comprising or consisting of CRE0051 (SEQ ID NO: 97) or a functional variant thereof has a length of 200 nucleotides or fewer, 150 nucleotides or fewer, 125 nucleotides or fewer, or 100 nucleotides or fewer.

[0309] In some embodiments, a CRE comprising or consisting of CRE0042 (SEQ ID NO: 104) or a functional variant thereof has a length of 200 nucleotides or fewer, 150 nucleotides or fewer, 125 nucleotides or fewer, or 100 nucleotides or fewer. The functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to it.

[0310] Functional variants of CRE0042 (SEQ ID NO: 104) are regulatory elements whose sequence is modified relative to CRE0042 but which substantially retain its activity as a liver-specific CRE. Those skilled in the art will recognize that the sequence of a CRE can be modified while retaining its ability to bind the necessary transcription factors (TFs) and enhance expression. Functional variants may include substitutions, deletions, and/or insertions relative to the reference CRE, provided that these do not render the CRE substantially nonfunctional.

[0311] In some embodiments, a functional variant of CRE0042 (SEQ ID NO: 104) is a CRE that substantially retains the activity of the promoter when substituted for CRE0042 in that promoter. For example, a promoter in which CRE0042 is replaced by a functional variant preferably retains at least 80%, more preferably at least 90%, more preferably at least 95%, and even more preferably 100% of its activity (compared with a reference promoter containing CRE0042 (SEQ ID NO: 104)). Taking promoter SP0412 as an example, CRE0042 (SEQ ID NO: 104) in SP0412 (SEQ ID NO: 91) can be replaced with a functional variant of CRE0042, and the promoter substantially retains its activity. Retention of activity can be evaluated by comparing, under equivalent conditions, the expression of a suitable reporter under the control of the reference promoter with that under the control of an otherwise identical promoter containing the substituted CRE.

[0312] In some embodiments, functional variants of CRE0042 (SEQ ID NO: 104) preferably include TFBSs for the same liver-specific TFs as CRE0042. The liver-specific TFBSs present in CRE0042, listed in order, are HNF-3 (SEQ ID NO: 106), C/EBP (SEQ ID NO: 107), HNF-4 (SEQ ID NO: 108), and C/EBP' (SEQ ID NO: 109). Therefore, functional variants of CRE0042 preferably include all of these TFBSs, preferably in the same order in which they occur in CRE0042, i.e., HNF-3, C/EBP, HNF-4, and then C/EBP'. When the CRE is associated with a promoter and gene, this order is preferably read upstream to downstream (i.e., from distal to proximal relative to the transcription start site (TSS)). Spacer sequences may be provided between adjacent TFBSs. In some embodiments, TFBSs may overlap, provided they remain functional, i.e., both overlapping sequences can bind their respective TFs.

[0313] In some embodiments, functional variants of CRE0042 (SEQ ID NO: 104) include the following TFBS sequences: GTTCAAACATG (HNF-3) (SEQ ID NO: 106), CTAATACTCTG (C/EBP) (SEQ ID NO: 107), TGCAAGGGTCAT (HNF-4) (SEQ ID NO: 108), and TTACTCAACA (C/EBP') (SEQ ID NO: 109), as well as their complementary sequences or functional variants of these TFBS sequences that retain the ability to bind their respective TFs. These may be in the same order as in CRE0042, i.e., in the order shown above. As discussed above, TFBSs exhibit sequence variability: for a given TFBS there is typically a consensus sequence, from which some degree of deviation is common.

[0314] In some embodiments of the present invention, a functional variant of CRE0042 comprises the following arrangement:

[0315] GTTCAAACATG-Na-CTAATACTCTG-Nb-TGCAAGGGTCAT-Nc-TTACTCAACA (SEQ ID NO: 105), or a sequence that is at least 70%, 80%, 90%, 95%, or 99% identical thereto, where Na, Nb, and Nc represent optional spacer sequences. If present, Na may have a length of 1 to 10 nucleotides, preferably 1 to 5 nucleotides, more preferably 2 nucleotides. If present, Nb may have a length of 1 to 10 nucleotides, preferably 2 to 6 nucleotides, more preferably 4 nucleotides. If present, Nc may have a length of 8 to 23 nucleotides, preferably 10 to 20 nucleotides, more preferably 15 nucleotides.
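The length bounds of this arrangement follow from simple arithmetic on the motif and spacer lengths stated above; the sketch below computes them and nothing more. The motif labels follow [0312], and the data layout (min, max, most preferred) is an illustrative choice.

```python
# Sketch: length bounds for the CRE0042 arrangement of SEQ ID NO: 105.
# Motifs and spacer ranges are copied from the text; everything else is arithmetic.

MOTIFS = ["GTTCAAACATG", "CTAATACTCTG", "TGCAAGGGTCAT", "TTACTCAACA"]  # HNF-3, C/EBP, HNF-4, C/EBP'
SPACERS = {"Na": (1, 10, 2), "Nb": (1, 10, 4), "Nc": (8, 23, 15)}   # (min, max, most preferred)


def arrangement_lengths(motifs, spacers):
    """Return (motif total, shortest, longest, most-preferred) lengths in nt."""
    core = sum(len(m) for m in motifs)
    shortest = core                                   # all optional spacers absent
    longest = core + sum(hi for _, hi, _ in spacers.values())
    preferred = core + sum(pref for _, _, pref in spacers.values())
    return core, shortest, longest, preferred


if __name__ == "__main__":
    print(arrangement_lengths(MOTIFS, SPACERS))  # (44, 44, 87, 65)
```

With the most preferred spacers the arrangement is 65 nt, within the 80-nucleotide upper length given for CRE0042-containing CREs in [0317]; using the longest permitted spacers would give 87 nt.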

[0316] In some embodiments of the present invention, the cis-regulatory enhancer element consists of CRE0042 (SEQ ID NO: 104) or a functional variant thereof.

[0317] It should be noted that a CRE or its functional variant may be provided on either strand of a double-stranded polynucleotide and in either orientation. Therefore, the complementary and reverse-complementary sequences of SEQ ID NOs: 104 and 105, or of their functional variants, are within the scope of the present invention. Single-stranded nucleic acids containing the sequence of SEQ ID NO: 104 or 105, or a functional variant thereof, are also within the scope of the present invention. In some embodiments, a CRE comprising or consisting of CRE0042 (SEQ ID NO: 104) or a functional variant thereof has a length of 200 nucleotides or fewer, 150 nucleotides or fewer, 125 nucleotides or fewer, 100 nucleotides or fewer, or 80 nucleotides or fewer.

[0318] In some embodiments, a CRE comprising or consisting of CRE0059 (SEQ ID NO: 110) or a functional variant thereof has a length of 200 nucleotides or fewer, 150 nucleotides or fewer, 125 nucleotides or fewer, or 100 nucleotides or fewer. The functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to it.

[0319] As discussed above, functional variants of CRE0059 (SEQ ID NO: 110) substantially retain the ability of CRE0059 to act as a liver-specific promoter element. For example, when a functional variant of CRE0059 is substituted for CRE0059 in the liver-specific promoter SP0412 (SEQ ID NO: 91), the modified promoter preferably retains at least 80%, more preferably at least 90%, more preferably at least 95%, and even more preferably 100% of the activity of SP0412. Preferably, functional variants of CRE0059 include a sequence having at least 70%, 80%, 90%, 95%, or 99% identity with SEQ ID NO: 110.

[0320] CRE0059 is a proximal promoter that includes a TFBS for a liver-specific TF, namely HNF1, upstream of the TSS. Therefore, a functional variant of CRE0059 preferably includes a TFBS for HNF1 upstream of the TSS.

[0321] In some embodiments, a functional variant of CRE0059 comprises a sequence that is at least 70% identical (preferably at least 80%, 90%, 95%, or 99% identical) to SEQ ID NO: 110, contains a TFBS for HNF1 (SEQ ID NO: 111), and, downstream of the HNF1 TFBS, includes a TSS sequence (referred to as p1@SERPINA1 or p1@AFP) that is at least 80%, 90%, or 95% identical, or completely identical, to SEQ ID NO: 112.

[0322] In some embodiments, a functional variant of CRE0059 comprises a sequence having at least 70%, 80%, 90%, 95%, or 99% identity with SEQ ID NO: 110, and further includes a TFBS for HNF1 comprising SEQ ID NO: 111 at or near positions 24–36, and a TSS sequence at or near positions 73–93 that is at least 80%, 90%, or 95% identical, or completely identical, to SEQ ID NO: 112, the positions being numbered with reference to SEQ ID NO: 110. In this context, "at" or "near" preferably means within 10, 5, 4, 3, 2, or 1 nucleotide(s) of the positions enumerated with reference to SEQ ID NO: 110. SEQ ID NO: 111 and SEQ ID NO: 112 are the preferred HNF1 TFBS and TSS sequences, respectively, but alternative sequences may be used.
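The "at or near" criterion can be checked mechanically once a site has been located. In this sketch, coordinates are 1-based and inclusive, the variant is assumed to be numbered directly against SEQ ID NO: 110 (no insertions or deletions), and both the 13-nt motif and the variant sequence are hypothetical placeholders, since SEQ ID NOs: 110 and 111 are not reproduced here.

```python
# Sketch: is a motif "at or near" its expected coordinates in SEQ ID NO: 110?
# Coordinates are 1-based and inclusive. Assumes the variant can be numbered
# directly against SEQ ID NO: 110 (no indels); motif and sequence are hypothetical.

def locate(sequence: str, motif: str):
    """1-based start of the first occurrence of motif, or None."""
    index = sequence.find(motif)
    return None if index == -1 else index + 1


def at_or_near(start: int, length: int, expected_start: int, expected_end: int, tolerance: int) -> bool:
    end = start + length - 1
    return abs(start - expected_start) <= tolerance and abs(end - expected_end) <= tolerance


if __name__ == "__main__":
    motif = "GTTAATCATTAAC"                 # hypothetical 13-nt placeholder, not SEQ ID NO: 111
    variant = "C" * 25 + motif + "C" * 50   # hypothetical variant, site starts at position 26
    start = locate(variant, motif)
    for tol in (10, 5, 2, 1):
        print(tol, at_or_near(start, len(motif), 24, 36, tol))  # True, True, True, False
```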

[0323] In some embodiments, a promoter element comprising or consisting of CRE0059 (SEQ ID NO: 110) or a functional variant thereof has a length of 200 nucleotides or fewer, 150 nucleotides or fewer, 125 nucleotides or fewer, 110 nucleotides or fewer, or 95 nucleotides or fewer.

[0324] In some embodiments, liver-specific promoters useful in the methods and compositions disclosed herein comprise or consist of SEQ ID NO: 91 or a functional variant thereof. In some embodiments, the functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to it. The promoter having the sequence of SEQ ID NO: 91 is referred to as SP0412. The SP0412 promoter is particularly preferred in some embodiments: it has been found to be potent and is also very short, which is advantageous in some situations.

a. SP0265 (also known as SP131A1) and its variants

[0325] In some embodiments, the promoter is a synthetic liver-specific promoter comprising CRE0051 (SEQ ID NO: 97), CRE0058 (SEQ ID NO: 113), CRE0065 (SEQ ID NO: 117), and CRE0066 (SEQ ID NO: 122), or functional variants thereof. Typically, the CREs are operably linked to a promoter element. In some preferred embodiments, the liver-specific promoter comprises the CREs or their functional variants in the order CRE0051, CRE0058, CRE0065, CRE0066, and then the promoter element (from upstream to downstream).

[0326] The promoter element may be any suitable proximal promoter or minimal promoter. In some preferred embodiments, the promoter element is a minimal promoter. When the promoter element is a proximal promoter, it is generally preferable that the proximal promoter be liver-specific.

[0327] In some preferred embodiments, the promoter element is CRE0052 (also referred to as G6PC) (SEQ ID NO: 126). CRE0052 is a minimal promoter (also referred to as a core promoter).

[0328] In some embodiments, the liver-specific promoter comprises the following regulatory elements (or their functional variants), in order: CRE0051, CRE0058, CRE0065, CRE0066, and then CRE0052 (SEQ ID NO: 126). The sequence of CRE0051 (SEQ ID NO: 97) and its functional variants are described above.

[0329] In some embodiments, a CRE comprising or consisting of CRE0058 (SEQ ID NO: 113) or a functional variant thereof has a length of 200 nucleotides or fewer, 150 nucleotides or fewer, 125 nucleotides or fewer, 100 nucleotides or fewer, or 80 nucleotides or fewer. The functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to it.

[0330] Functional variants of CRE0058 (SEQ ID NO: 113) are regulatory elements whose sequence is modified relative to CRE0058 but which substantially retain its activity as a liver-specific CRE. Those skilled in the art will recognize that the sequence of a CRE can be modified while retaining its ability to bind the necessary TFs and enhance expression. Functional variants may include substitutions, deletions, and/or insertions relative to the reference CRE, provided that these do not render the CRE nonfunctional.

[0331] In some embodiments, a functional variant of CRE0058 (SEQ ID NO: 113) is a CRE that substantially retains the activity of the promoter when substituted for CRE0058 in that promoter. For example, a promoter in which CRE0058 is replaced by a functional variant preferably retains at least 80%, more preferably at least 90%, more preferably at least 95%, and even more preferably 100% of its activity (compared with a reference promoter containing CRE0058 (SEQ ID NO: 113)). Taking promoter SP0265 (SEQ ID NO: 94) as an example, CRE0058 in SP0265 can be replaced with a functional variant of CRE0058, and the promoter substantially retains its activity. Retention of activity can be evaluated by comparing, under equivalent conditions, the expression of a suitable reporter under the control of the reference promoter with that under the control of an otherwise identical promoter containing the substituted CRE.

[0332] In some embodiments, functional variants of CRE0058 (SEQ ID NO: 113) preferably contain transcription factor binding sites (TFBSs) for the same liver-specific transcription factors (TFs) as CRE0058. The liver-specific TFBSs present in CRE0058, in the order in which they occur, are HNF4 (SEQ ID NO: 115) and C/EBP (SEQ ID NO: 116). Therefore, functional variants of CRE0058 preferably contain both of these TFBSs, preferably in the same order as in CRE0058, i.e., HNF4 followed by C/EBP. When the CRE is associated with a promoter and gene, this order is preferably read from upstream to downstream (i.e., from distal to proximal relative to the transcription start site (TSS)). Spacer sequences may be provided between adjacent TFBSs. In some embodiments, adjacent TFBSs may overlap, provided they remain functional, i.e., both overlapping sequences are capable of binding their respective TFs.

[0333] In some embodiments, functional variants of CRE0058 (SEQ ID NO: 113) include the following TFBS sequences: CGCCCTTTGGACC (HNF4) (SEQ ID NO: 115) and GACCTTTTGCAATCCTGG (C/EBP) (SEQ ID NO: 116), sequences complementary thereto, or functional variants of these TFBS sequences that maintain their ability to bind their respective TFs. These may be in the same order as in CRE0058, i.e., the order shown above. As discussed above, it is well known in the art that TFBSs tolerate sequence variability: for a given TFBS there is typically a consensus sequence, from which some degree of deviation is typical.

[0334] In some embodiments, a functional variant of CRE0058 includes the sequence GCGCCCTTTGGACCTTTTGCAATCCTGG (SEQ ID NO: 114), or a sequence that is at least 70%, 80%, 90%, 95%, or 99% identical thereto. In some embodiments, the CRE consists of SEQ ID NO: 113 or 114, or a functional variant thereof.
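The order and overlap rules for TFBSs can be checked directly on SEQ ID NO: 114. This Python sketch locates each site by exact string search (first occurrence only, an assumption that holds for this short sequence) and reports whether the sites appear in the listed order and where adjacent sites overlap.

```python
def locate_sites(sequence, sites):
    """Return (name, start, end) for each motif, using its first occurrence.

    Coordinates are 0-based; `end` is exclusive.
    """
    located = []
    for name, motif in sites:
        start = sequence.find(motif)
        if start == -1:
            raise ValueError(f"{name} site {motif} not found")
        located.append((name, start, start + len(motif)))
    return located


def in_listed_order(located):
    """True if each site starts after the previous one (upstream to downstream)."""
    return all(a[1] < b[1] for a, b in zip(located, located[1:]))


def overlaps(located):
    """List (upstream, downstream, overlap_length) for adjacent sites that overlap."""
    return [(a[0], b[0], a[2] - b[1]) for a, b in zip(located, located[1:]) if a[2] > b[1]]


SEQ_ID_114 = "GCGCCCTTTGGACCTTTTGCAATCCTGG"
CRE0058_SITES = [
    ("HNF4", "CGCCCTTTGGACC"),        # SEQ ID NO: 115
    ("C/EBP", "GACCTTTTGCAATCCTGG"),  # SEQ ID NO: 116
]

if __name__ == "__main__":
    sites = locate_sites(SEQ_ID_114, CRE0058_SITES)
    print(sites)
    print("in order:", in_listed_order(sites))
    print("overlaps:", overlaps(sites))
```

On SEQ ID NO: 114 the HNF4 site spans 0-based positions 1-13 and the C/EBP site 10-27, so the two sites share the four bases GACC, an instance of the overlapping TFBSs contemplated in [0332].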

[0335] It should be noted that a CRE or its functional variants may be located on either strand of a double-stranded polynucleotide and may be oriented in either direction. Therefore, the complementary and reverse-complementary sequences of SEQ ID NO: 113 or 114, or of their functional variants, are within the scope of the present invention. Single-stranded nucleic acids containing the sequence of SEQ ID NO: 113 or 114, or of their functional variants, are also within the scope of the present invention.
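Because a CRE may sit on either strand and in either orientation, a search for it in a larger construct has to test both the element and its reverse complement. A minimal Python sketch of that check, using the SEQ ID NO: 114 core as the example element; the flanking bases in the usage example are hypothetical.

```python
# Watson-Crick complement for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Base-by-base complement, read in the same direction as the input."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return complement(seq)[::-1]


def present_on_either_strand(target: str, element: str) -> bool:
    """True if `element` occurs in `target` in either orientation."""
    return element in target or reverse_complement(element) in target


SEQ_ID_114 = "GCGCCCTTTGGACCTTTTGCAATCCTGG"

if __name__ == "__main__":
    flipped = reverse_complement(SEQ_ID_114)
    print(flipped)
    # An element inserted in reverse orientation is still found.
    print(present_on_either_strand("TTTT" + flipped + "AAAA", SEQ_ID_114))
```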

[0336] In some embodiments, a CRE comprising or consisting of CRE0058 or a functional variant thereof has a length of 120 or fewer nucleotides, 80 or fewer nucleotides, 60 or fewer nucleotides, or 40 or fewer nucleotides.

[0337] In some embodiments, a CRE comprising or consisting of CRE0065 (SEQ ID NO: 117) or a functional variant thereof has a length of 200 or fewer nucleotides, 150 or fewer nucleotides, 125 or fewer nucleotides, 100 or fewer nucleotides, or 80 or fewer nucleotides. The functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 117.

[0338] Functional variants of CRE0065 (SEQ ID NO: 117) are regulatory elements whose sequence is modified relative to CRE0065 but which substantially retain its activity as a liver-specific CRE. Those skilled in the art will recognize that the sequence of a CRE can be modified while retaining its ability to bind the necessary TFs and enhance expression. Functional variants may include substitutions, deletions, and/or insertions relative to the reference CRE, provided that they do not render the CRE nonfunctional.

[0339] In some embodiments, a functional variant of CRE0065 can be considered a CRE that substantially retains its activity when substituted for CRE0065 in a promoter. For example, a promoter in which CRE0065 is replaced by a functional variant preferably retains at least 80%, more preferably at least 90%, more preferably at least 95%, and even more preferably 100% of the activity of a reference promoter containing CRE0065. Taking promoter SP0265 (SEQ ID NO: 94) as an example, CRE0065 in SP0265 can be replaced with a functional variant of CRE0065 and the promoter substantially retains its activity. Retention of activity can be evaluated by comparing, under equivalent conditions, the expression of a suitable reporter under the control of the reference promoter with its expression under the control of an otherwise identical promoter containing the substituted CRE.

[0340] In some embodiments, a functional variant of CRE0065 preferably includes TFBSs for the same liver-specific TFs as CRE0065. The liver-specific TFBSs present in CRE0065, in the order in which they occur, are RXR-alpha (SEQ ID NO: 119), HNF3 (SEQ ID NO: 120), and HNF3 (SEQ ID NO: 121). Therefore, a functional variant of CRE0065 preferably includes all of these TFBSs, preferably in the same order as in CRE0065, i.e., RXR-alpha, HNF3, and then HNF3. When cis-regulatory elements are associated with promoters and genes, this order is preferably read from upstream to downstream (i.e., from distal to proximal relative to the transcription start site (TSS)). Spacer sequences may be provided between adjacent TFBSs. In some embodiments, adjacent TFBSs may overlap, provided they remain functional, i.e., both overlapping sequences are capable of binding their respective TFs.

[0341] In some embodiments, functional variants of CRE0065 include the following TFBS sequences: ACTGAACCCTTGACCCCTGCCCT (RXR-alpha) (SEQ ID NO: 119), CTGTTTGCCC (HNF3) (SEQ ID NO: 120), and CTATTTGCCC (HNF3) (SEQ ID NO: 121), sequences complementary thereto, or functional variants of these TFBS sequences that maintain their ability to bind their respective TFs. These may be in the same order as in CRE0065, i.e., the order shown above. As discussed above, it is well known in the art that TFBSs tolerate sequence variability: for a given TFBS there is typically a consensus sequence, from which some degree of deviation is typical.

[0342] In some embodiments, the functional variant of CRE0065 comprises the following sequence:

[0343] ACTGAACCCTTGACCCCT-Na-CTGTTTGCCC-Nb-TATTTGCCC (SEQ ID NO: 118), or a sequence that is at least 70%, 80%, 90%, 95%, or 99% identical thereto, where Na and Nb represent optional spacer sequences. If present, Na may have a length of 14 to 30 nucleotides, preferably 18 to 26 nucleotides, more preferably 22 nucleotides. If present, Nb may have a length of 1 to 10 nucleotides, preferably 2 to 6 nucleotides, more preferably 4 nucleotides. In some embodiments, the CRE consists of SEQ ID NO: 117 or 118, or a functional variant thereof.
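The modular formula for SEQ ID NO: 118 translates naturally into a regular expression with bounded spacer lengths. This Python sketch assumes the spacers contain only A, C, G, and T and checks only exact matches to the fixed modules, so it does not model the identity-based variants also covered by [0343]; the spacer sequences are hypothetical.

```python
import re

# Fixed modules of SEQ ID NO: 118 with the spacer ranges of [0343]:
# Na = 14-30 nt, Nb = 1-10 nt (assumed here to be plain A/C/G/T).
CRE0065_CORE = re.compile(
    r"ACTGAACCCTTGACCCCT([ACGT]{14,30})CTGTTTGCCC([ACGT]{1,10})TATTTGCCC"
)


def build_cre0065_core(na: str, nb: str) -> str:
    """Join the fixed modules of SEQ ID NO: 118 with the given spacers."""
    return "ACTGAACCCTTGACCCCT" + na + "CTGTTTGCCC" + nb + "TATTTGCCC"


def spacer_lengths(candidate: str):
    """Return (len(Na), len(Nb)) if the candidate fits the pattern, else None."""
    match = CRE0065_CORE.fullmatch(candidate.upper())
    if match is None:
        return None
    return len(match.group(1)), len(match.group(2))


# Hypothetical spacers at the "more preferred" lengths (22 nt and 4 nt).
NA_22 = "ACGTACGTACGTACGTACGTAC"
NB_4 = "AGGC"

if __name__ == "__main__":
    core = build_cre0065_core(NA_22, NB_4)
    print(len(core), spacer_lengths(core))
```

With the preferred spacers the core is 63 nt, within the 72-nt limit given in [0345]; a 12-nt Na falls outside the 14 to 30 nt range, so `spacer_lengths` returns `None` for it.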

[0344] It should be noted that a CRE or its functional variants may be located on either strand of a double-stranded polynucleotide and may be oriented in either direction. Therefore, the complementary and reverse-complementary sequences of SEQ ID NO: 117 or 118, or of their functional variants, are within the scope of the present invention. Single-stranded nucleic acids containing the sequence of SEQ ID NO: 117 or 118, or of their functional variants, are also within the scope of the present invention.

[0345] In some preferred embodiments, a CRE comprising or consisting of CRE0065 or a functional variant thereof has a length of 200 or fewer nucleotides, 150 or fewer nucleotides, 125 or fewer nucleotides, 90 or fewer nucleotides, or 72 or fewer nucleotides.

[0346] In some embodiments, a CRE comprising or consisting of CRE0066 (SEQ ID NO: 122) or a functional variant thereof has a length of 200 or fewer nucleotides, 150 or fewer nucleotides, 125 or fewer nucleotides, 100 or fewer nucleotides, or 80 or fewer nucleotides. The functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 122.

[0347] Functional variants of CRE0066 (SEQ ID NO: 122) are regulatory elements whose sequence is modified relative to CRE0066 but which substantially retain its activity as a liver-specific CRE. Those skilled in the art will recognize that the sequence of a CRE can be modified while retaining its ability to bind the necessary TFs and enhance expression. Functional variants may include substitutions, deletions, and/or insertions relative to the reference CRE, provided that they do not render the CRE nonfunctional.

[0348] In some embodiments, a functional variant of CRE0066 can be considered a CRE that substantially retains its activity when substituted for CRE0066 in a promoter. For example, a promoter in which CRE0066 is replaced by a functional variant preferably retains at least 80%, more preferably at least 90%, more preferably at least 95%, and even more preferably 100% of the activity of a reference promoter containing CRE0066 (SEQ ID NO: 122). Taking promoter SP0265 (SEQ ID NO: 94) as an example, CRE0066 in SP0265 can be replaced with a functional variant of CRE0066 and the promoter substantially retains its activity. Retention of activity can be evaluated by comparing, under equivalent conditions, the expression of a suitable reporter under the control of the reference promoter with its expression under the control of an otherwise identical promoter containing the substituted CRE.

[0349] In some embodiments, a functional variant of CRE0066 includes transcription factor binding sites (TFBSs) for the same liver-specific transcription factors (TFs) as CRE0066. The liver-specific TFBSs present in CRE0066, in the order in which they occur, are HNF4G (SEQ ID NO: 124) and FOS::JUN (SEQ ID NO: 125). Therefore, a functional variant of CRE0066 preferably includes both of these TFBSs, preferably in the same order as in CRE0066, i.e., HNF4G followed by FOS::JUN. When cis-regulatory elements are associated with promoters and genes, this order is preferably read from upstream to downstream (i.e., from distal to proximal relative to the transcription start site (TSS)). Spacer sequences may be provided between adjacent TFBSs. In some embodiments, adjacent TFBSs may overlap, provided they remain functional, i.e., both overlapping sequences are capable of binding their respective TFs.

[0350] In some embodiments, functional variants of CRE0066 (SEQ ID NO: 122) include the following TFBS sequences: GCAGGGCAAAGTGCA (HNF4G) (SEQ ID NO: 124) and GATGACTCAG (FOS::JUN) (SEQ ID NO: 125), sequences complementary thereto, or functional variants of these TFBS sequences that maintain their ability to bind their respective TFs. These may be in the same order as in CRE0066, i.e., the order shown above. As discussed above, it is well known in the art that TFBSs tolerate sequence variability: for a given TFBS there is typically a consensus sequence, from which some degree of deviation is typical.

[0351] In some embodiments, the functional variant of CRE0066 (SEQ ID NO: 122) includes the sequence GCAGGGCAAAGTGCA-Na-GATGACTCAG (SEQ ID NO: 123), or a sequence that is at least 70%, 80%, 90%, 95%, or 99% identical thereto, where Na represents an optional spacer sequence. If present, Na has a length of 10 to 28 nucleotides, preferably 14 to 24 nucleotides, and more preferably 19 nucleotides. In some embodiments, the CRE consists of CRE0066 or a functional variant thereof.
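The spacer ranges fix how long the modular core of SEQ ID NO: 123 can be. This Python sketch computes the shortest and longest possible core from the fixed modules and the spacer ranges; it counts only the modules written out in the formula, not any additional flanking sequence that a full CRE might carry.

```python
def core_length_bounds(fixed_parts, spacer_ranges):
    """Shortest and longest possible core length for a modular CRE.

    fixed_parts: sequences that are always present (TFBS modules).
    spacer_ranges: (min_len, max_len) for each spacer; to model a spacer that
    may be omitted entirely, pass (0, max_len).
    """
    fixed = sum(len(part) for part in fixed_parts)
    shortest = fixed + sum(low for low, _ in spacer_ranges)
    longest = fixed + sum(high for _, high in spacer_ranges)
    return shortest, longest


# SEQ ID NO: 123 = GCAGGGCAAAGTGCA-Na-GATGACTCAG, with Na = 10-28 nt when present.
CRE0066_MODULES = ["GCAGGGCAAAGTGCA", "GATGACTCAG"]

if __name__ == "__main__":
    print(core_length_bounds(CRE0066_MODULES, [(10, 28)]))  # spacer present
    print(core_length_bounds(CRE0066_MODULES, [(19, 19)]))  # preferred 19-nt spacer
```

Any spacer length in the stated range gives a core of 35 to 53 nt, and the preferred 19-nt spacer gives 44 nt; all of these lie below the 87-nt upper limit stated in [0353].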

[0352] It should be noted that a CRE or its functional variants may be located on either strand of a double-stranded polynucleotide and may be oriented in either direction. Therefore, the complementary and reverse-complementary sequences of SEQ ID NO: 122 or 123, or of their functional variants, are within the scope of the present invention. Single-stranded nucleic acids containing the sequence of SEQ ID NO: 122 or 123, or of their functional variants, are also within the scope of the present invention.

[0353] In some preferred embodiments, a CRE comprising or consisting of CRE0066 or a functional variant thereof has a length of 200 or fewer nucleotides, 150 or fewer nucleotides, 125 or fewer nucleotides, 100 or fewer nucleotides, or 87 or fewer nucleotides.

[0354] In some embodiments, the promoter includes the promoter element CRE0052 (also referred to as G6PC) (SEQ ID NO: 126), or a functional variant or functional fragment thereof. The functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 126.

[0355] A functional variant of CRE0052 (SEQ ID NO: 126) substantially retains the ability of CRE0052 to act as a liver-specific promoter element. For example, when the functional variant is substituted for CRE0052 in the liver-specific promoter SP0265, the modified promoter retains at least 80%, more preferably at least 90%, more preferably at least 95%, and even more preferably 100% of the activity of SP0265.

[0356] In one embodiment, the liver-specific promoter includes SEQ ID NO: 94 or a functional variant thereof. In some embodiments, the functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 94. The promoter having the sequence according to SEQ ID NO: 94 is referred to as SP0265 (also known as SP131A1 or LVR 131_A1). Promoters containing or consisting of SEQ ID NO: 94 are particularly preferred in some embodiments.

[0357] In some embodiments, the liver-specific promoter is SP0265 (SEQ ID NO: 94) and comprises SEQ ID NOs: 97, 113, 117, 122, and 126, or functional variants thereof having sequences that are at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to them.

b. SP0239 and its variants

[0358] In some embodiments, the promoter is a synthetic liver-specific promoter comprising the following CREs: CRE0018 (SEQ ID NO: 151), CRE0051 (SEQ ID NO: 97), CRE0058 (SEQ ID NO: 113), CRE0065 (SEQ ID NO: 117), and CRE0066 (SEQ ID NO: 122), or functional variants thereof. Typically, the CREs are operably coupled to the promoter element. In some preferred embodiments, the liver-specific promoter comprises the CREs or their functional variants in the order of CRE0018, CRE0051, CRE0058, CRE0065, CRE0066, and then the promoter element (from upstream to downstream).

[0359] The promoter element may be any suitable proximal promoter or minimal promoter. In some preferred embodiments, the promoter element is CRE0052 (also referred to as G6PC), which is a minimal promoter (also referred to as a core promoter).

[0360] In some embodiments, the liver-specific promoter includes the following elements (or functional variants thereof): CRE0018, CRE0051, CRE0058, CRE0065, CRE0066, and then CRE0052.

[0361] The sequences of CRE0051, CRE0058, CRE0065, and CRE0066, as well as the promoter element CRE0052, and their functional variants are shown above.

[0362] CRE0018 has the sequence of SEQ ID NO: 151; a functional variant or functional fragment thereof may also be used. The functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 151.

[0363] Functional variants of CRE0018 (SEQ ID NO: 151) are regulatory elements whose sequence is modified relative to CRE0018 but which substantially retain its activity as a liver-specific CRE. Those skilled in the art will recognize that the sequence of a CRE can be modified while retaining its ability to bind the necessary TFs and enhance expression. Functional variants may include substitutions, deletions, and/or insertions relative to the reference CRE, provided that they do not render the CRE substantially nonfunctional.

[0364] In some embodiments, a functional variant of CRE0018 can be considered a CRE that substantially retains its activity when substituted for CRE0018 in a promoter. For example, a promoter in which CRE0018 is replaced by a functional variant preferably retains at least 80%, more preferably at least 90%, more preferably at least 95%, and even more preferably 100% of the activity of a reference promoter containing CRE0018. Taking promoter SP0239 as an example, CRE0018 in SP0239 can be replaced with a functional variant of CRE0018 and the promoter substantially retains its activity. Retention of activity can be evaluated by comparing, under equivalent conditions, the expression of a suitable reporter under the control of the reference promoter with its expression under the control of an otherwise identical promoter containing the substituted CRE.

[0365] In some embodiments, a functional variant of CRE0018 includes TFBSs for the same liver-specific TFs as CRE0018. The liver-specific TFBSs present in CRE0018, in the order in which they occur, are IRF (SEQ ID NO: 129), NF1 (SEQ ID NO: 130), HNF3 (SEQ ID NO: 131), HBLF (SEQ ID NO: 132), RXRa (SEQ ID NO: 133), EF-C (SEQ ID NO: 134), NF1 (SEQ ID NO: 135), and C/EBP (SEQ ID NO: 136). Therefore, a functional variant of CRE0018 preferably includes all of these TFBSs, preferably in the same order as in CRE0018, i.e., IRF, NF1, HNF3, HBLF, RXRa, EF-C, NF1, and then C/EBP. When CREs are associated with promoters and genes, this order is preferably read from upstream to downstream (i.e., from distal to proximal relative to the transcription start site (TSS)). Spacer sequences may be provided between adjacent TFBSs. In some embodiments, adjacent TFBSs may overlap, provided they remain functional, i.e., both overlapping sequences are capable of binding their respective TFs.

[0366] In some embodiments, functional variants of CRE0018 include the following TFBS sequences: CTTTCACTTTC (IRF) (SEQ ID NO: 129), TCGCCAA (NF1) (SEQ ID NO: 130), TGTGTAAACA (HNF3) (SEQ ID NO: 131), TGTAAACAATA (HBLF) (SEQ ID NO: 132), CTGAACCTTTACCC (RXRa) (SEQ ID NO: 133), GTTGCCCGGCAAC (EF-C) (SEQ ID NO: 134), CAGGTCTGTGCCAAG (NF1) (SEQ ID NO: 135), and TGCCAAGTGTTTG (C/EBP) (SEQ ID NO: 136), sequences complementary thereto, or functional variants of these TFBS sequences (SEQ ID NOs: 129-136) that maintain the ability to bind their respective TFs. These may be in the same order as in CRE0018, i.e., the order shown above. As discussed above, it is well known in the art that TFBSs tolerate sequence variability: for a given TFBS there is typically a consensus sequence, from which some degree of deviation is typical.

[0367] In some embodiments of the present invention, the functional variant of CRE0018 includes the sequence CTTTCACTTTCTCGCCAA-Na-TGTGTAAACAATA-Nb-CTGAACCTTTACCC-Nc-GTTGCCCGGCAAC-Nd-CAGGTCTGTGCCAAGTGTTTG (SEQ ID NO: 128), or a sequence that is at least 70%, 80%, 90%, 95%, or 99% identical thereto, where Na, Nb, Nc, and Nd represent optional spacer sequences. If present, Na may have a length of 10 to 20 nucleotides, preferably 13 to 17 nucleotides, more preferably 15 nucleotides. If present, Nb may have a length of 1 to 10 nucleotides, preferably 1 to 5 nucleotides, more preferably 1 nucleotide. If present, Nc may have a length of 1 to 10 nucleotides, preferably 1 to 5 nucleotides, more preferably 1 nucleotide. If present, Nd preferably has a length of 1 to 10 nucleotides, more preferably 2 to 8 nucleotides, and even more preferably 3 nucleotides.
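Assembling SEQ ID NO: 128 from its fixed modules shows how the eight TFBSs of [0366] pack into five blocks, two of which contain overlapping sites. In this Python sketch the four spacer sequences are hypothetical, chosen only at the most preferred lengths (Na 15, Nb 1, Nc 1, Nd 3 nt); the order check uses first occurrences, which is adequate here because no motif recurs earlier in this particular assembled sequence.

```python
# Fixed modules of SEQ ID NO: 128, upstream to downstream.
MODULES = [
    "CTTTCACTTTCTCGCCAA",     # IRF + NF1
    "TGTGTAAACAATA",          # HNF3 + HBLF (overlapping)
    "CTGAACCTTTACCC",         # RXRa
    "GTTGCCCGGCAAC",          # EF-C
    "CAGGTCTGTGCCAAGTGTTTG",  # NF1 + C/EBP (overlapping)
]

# TFBSs of [0366] in the order listed (SEQ ID NOs: 129-136).
SITES = [
    ("IRF", "CTTTCACTTTC"), ("NF1", "TCGCCAA"), ("HNF3", "TGTGTAAACA"),
    ("HBLF", "TGTAAACAATA"), ("RXRa", "CTGAACCTTTACCC"), ("EF-C", "GTTGCCCGGCAAC"),
    ("NF1", "CAGGTCTGTGCCAAG"), ("C/EBP", "TGCCAAGTGTTTG"),
]


def assemble(modules, spacers):
    """Interleave modules with spacers: M1 + Na + M2 + Nb + ... + M5."""
    if len(spacers) != len(modules) - 1:
        raise ValueError("need exactly one spacer between each pair of modules")
    parts = [modules[0]]
    for spacer, module in zip(spacers, modules[1:]):
        parts += [spacer, module]
    return "".join(parts)


def site_starts(sequence, sites):
    """0-based start of the first occurrence of each site (-1 if absent)."""
    return [sequence.find(motif) for _, motif in sites]


# Hypothetical spacers at the most preferred lengths: Na 15, Nb 1, Nc 1, Nd 3.
SPACERS = ["GATCGATCGATCGAT", "G", "A", "TCA"]

if __name__ == "__main__":
    core = assemble(MODULES, SPACERS)
    starts = site_starts(core, SITES)
    print(len(core), starts)
    print("all present, in order:", -1 not in starts and starts == sorted(starts))
```

With these spacer lengths the assembled core is 99 nt, within the 103-nt limit of [0370], and all eight TFBSs are found in the listed order; the HNF3/HBLF and NF1/C/EBP pairs overlap by 8 and 7 bases, respectively.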

[0368] In some embodiments of the present invention, the CRE consists of SEQ ID NO: 127 or 128, or a functional variant thereof.

[0369] It should be noted that a CRE or its functional variants may be located on either strand of a double-stranded polynucleotide and may be oriented in either direction. Therefore, the complementary and reverse-complementary sequences of SEQ ID NO: 128 or 129, or of their functional variants, are within the scope of the present invention. Single-stranded nucleic acids containing the sequence of SEQ ID NO: 128 or 129, or of their functional variants, are also within the scope of the present invention.

[0370] In some embodiments, a CRE comprising or consisting of CRE0018 (SEQ ID NO: 151) or a functional variant thereof has a length of 200 or fewer, 150 or fewer, 125 or fewer, or 103 or fewer nucleotides.

[0371] In one embodiment, the liver-specific promoter includes or consists of SEQ ID NO: 93 or its functional variant. The promoter having the sequence of SEQ ID NO: 93 is referred to as SP0239. Functional variants of SP0239 may have sequences that are at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical thereto.

[0372] Therefore, in some embodiments, the liver-specific promoter is SP0239 (SEQ ID NO: 93) and comprises CRE0018 (SEQ ID NO: 151), CRE0051 (SEQ ID NO: 97), CRE0058 (SEQ ID NO: 113), CRE0065 (SEQ ID NO: 117), CRE0066 (SEQ ID NO: 122), and CRE0052 (SEQ ID NO: 126), or functional variants thereof having sequences that are at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to them.

c. SP0240 and its variants

[0373] In some embodiments, the promoter is a synthetic liver-specific promoter comprising CRE0018 operably coupled to a promoter element. In some preferred embodiments, CRE0018 is located immediately upstream of the promoter element.

[0374] The promoter element may be any suitable proximal promoter or minimal promoter. In some preferred embodiments, the promoter element is CRE0006 (SEQ ID NO: 137), which is a liver-specific proximal promoter.

[0375] In some embodiments, the liver-specific promoter includes the following elements (or functional variants thereof): CRE0018, and then CRE0006.

[0376] The sequence of CRE0018 and its variants are shown above.

[0377] CRE0006 is a proximal promoter and contains TFBSs for liver-specific TFs upstream of the TSS. The liver-specific TFBSs present in CRE0006, in the order in which they occur, are HNF4 (SEQ ID NO: 138), RXRa (SEQ ID NO: 139), HNF4 (SEQ ID NO: 140), C/EBP (SEQ ID NO: 141), and HNF3 (SEQ ID NO: 142), together with, where needed, p1@VTN (SEQ ID NO: 143). Thus, functional variants of CRE0006 preferably include these TFBSs, preferably in the same order as in CRE0006, i.e., HNF4, RXRa, HNF4, C/EBP, and HNF3. In some embodiments, the TFBSs overlap, provided they remain functional, i.e., both overlapping sequences are capable of binding their respective TFs.

[0378] p1@VTN (SEQ ID NO: 143) denotes the transcription start site (TSS) in CRE0006, as determined by Cap Analysis of Gene Expression (CAGE).

[0379] In some embodiments, the functional variant of CRE0006 includes a sequence that is at least 70% identical to SEQ ID NO: 137 (preferably at least 80%, 90%, 95%, or 99% identical thereto), which contains TFBSs for HNF4, RXRa, HNF4, C/EBP, and HNF3 and preferably contains, downstream of these TFBSs, a TSS sequence that is at least 80%, 90%, 95%, or 100% identical to SEQ ID NO: 143.

[0380] In some embodiments, a functional variant of CRE0006 has at least 70%, 80%, 90%, 95%, or 99% identity with SEQ ID NO: 137 and further includes the following TFBSs: HNF4 (SEQ ID NO: 138) at or near positions 25-37; RXRa (SEQ ID NO: 139) at or near positions 73-83; HNF4 (SEQ ID NO: 140) at or near positions 74-86; C/EBP (SEQ ID NO: 141) at or near positions 123-136; and HNF3 (SEQ ID NO: 142) at or near positions 129-137, and includes, at or near positions 166-196, a TSS sequence that is at least 80%, 90%, 95%, or 100% identical to SEQ ID NO: 143, the positions being numbered with reference to SEQ ID NO: 137. In this context, "at or near" preferably means within 10, 5, 4, 3, 2, or 1 nucleotide(s) of the positions listed with reference to SEQ ID NO: 137. The preferred TFBS sequences are SEQ ID NOs: 138-142, but alternative TFBS sequences can be used.
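The "at or near" language can be expressed as a tolerance check on feature coordinates. This Python sketch assumes that both ends of each observed feature must lie within the chosen tolerance of the listed 1-based positions in SEQ ID NO: 137; the observed coordinates are hypothetical, standing in for the output of a motif scan of a variant with a 3-nt insertion upstream of all listed features.

```python
# Expected 1-based windows in SEQ ID NO: 137, as listed in [0380].
EXPECTED_WINDOWS = {
    "HNF4 (SEQ ID NO: 138)": (25, 37),
    "RXRa (SEQ ID NO: 139)": (73, 83),
    "HNF4 (SEQ ID NO: 140)": (74, 86),
    "C/EBP (SEQ ID NO: 141)": (123, 136),
    "HNF3 (SEQ ID NO: 142)": (129, 137),
    "TSS (SEQ ID NO: 143)": (166, 196),
}


def at_or_near(observed, expected, tolerance):
    """Both ends of the observed window lie within `tolerance` nt of the expected ends."""
    return (abs(observed[0] - expected[0]) <= tolerance
            and abs(observed[1] - expected[1]) <= tolerance)


def check_layout(observed, tolerance=10):
    """Map each expected feature to True/False; missing features count as False."""
    return {
        name: name in observed and at_or_near(observed[name], window, tolerance)
        for name, window in EXPECTED_WINDOWS.items()
    }


# Hypothetical scan result: every feature shifted by +3 nt.
observed = {name: (start + 3, end + 3) for name, (start, end) in EXPECTED_WINDOWS.items()}

if __name__ == "__main__":
    for tol in (10, 5, 2):
        print(tol, all(check_layout(observed, tol).values()))
```

The shifted layout passes at the 10-nt and 5-nt tolerances but fails at 2 nt, illustrating how the successively narrower tolerances in [0380] tighten the definition.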

[0381] In one embodiment, the liver-specific promoter includes or consists of SEQ ID NO: 95 or a functional variant thereof. The promoter having the sequence of SEQ ID NO: 95 is referred to as SP0240. Functional variants of SP0240 may have sequences that are at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical thereto.

d. SP0246 and its variants

[0382] In some embodiments, the promoter is a synthetic liver-specific promoter comprising the following CREs: CRE0051, CRE0058, and CRE0065, or functional variants thereof. Typically, the CREs are operably coupled to the promoter element. In some preferred embodiments, the liver-specific promoter comprises the CREs or their functional variants in the order CRE0051, CRE0058, CRE0065, and then the promoter element (from upstream to downstream).

[0383] The promoter element may be any suitable proximal promoter or minimal promoter. In some preferred embodiments, the promoter element is CRE0052 (also referred to as G6PC), which is a minimal promoter (also referred to as a core promoter).

[0384] In some embodiments, the liver-specific promoter includes the following elements (or their functional variants): CRE0051, CRE0058, CRE0065, and then CRE0052. The sequences of CRE0051, CRE0058, CRE0065, and promoter element CRE0052, as well as their functional variants, are shown above.

[0385] In one embodiment, the liver-specific promoter includes or consists of SEQ ID NO: 96 or a functional variant thereof. The promoter having the sequence of SEQ ID NO: 96 is referred to as SP0246. Functional variants of SP0246 may have sequences that are at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 96.

e. SP0131 and its variants

[0386] In some embodiments, the promoter is a synthetic liver-specific promoter comprising the following CREs: CRE0058, CRE0065, and CRE0066, or functional variants thereof. Typically, the CREs are operably coupled to the promoter element. In some preferred embodiments, the liver-specific promoter comprises the CREs or their functional variants in the order CRE0058, CRE0065, CRE0066, and then the promoter element (from upstream to downstream).

[0387] The promoter element may be any suitable proximal promoter or minimal promoter. In some preferred embodiments, the promoter element is CRE0052 (also referred to as G6PC), which is a minimal promoter (also referred to as a core promoter).

[0388] The sequences of CRE0058, CRE0065, and CRE0066, as well as the promoter element CRE0052, and their functional variants are shown above.
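The promoter designs described in this part of the specification differ mainly in which CREs sit upstream of the promoter element. This Python sketch, an illustration rather than a description of how the promoters were actually built, records each element order as data and assembles a sequence from it. The element sequences are hypothetical four-base placeholders, because the actual SEQ ID NO sequences are not reproduced here, and the `substitutions` argument mimics the CRE-swap comparison used above to define functional variants. Where the text names CRE0052 only as the preferred promoter element (SP0239, SP0246, SP0131), the table records that preferred choice.

```python
# Upstream-to-downstream element order for the promoters in this section.
ARCHITECTURES = {
    "SP0265": ("CRE0051", "CRE0058", "CRE0065", "CRE0066", "CRE0052"),
    "SP0239": ("CRE0018", "CRE0051", "CRE0058", "CRE0065", "CRE0066", "CRE0052"),
    "SP0240": ("CRE0018", "CRE0006"),
    "SP0246": ("CRE0051", "CRE0058", "CRE0065", "CRE0052"),
    "SP0131": ("CRE0058", "CRE0065", "CRE0066", "CRE0052"),
}


def assemble_promoter(name, element_seqs, substitutions=None):
    """Concatenate element sequences in the architecture's order.

    substitutions maps an element name to a replacement sequence (for example,
    a candidate functional variant) that is used instead of the reference.
    """
    substitutions = substitutions or {}
    parts = []
    for element in ARCHITECTURES[name]:
        if element in substitutions:
            parts.append(substitutions[element])
        elif element in element_seqs:
            parts.append(element_seqs[element])
        else:
            raise KeyError(f"no sequence supplied for {element}")
    return "".join(parts)


# Hypothetical placeholders only; they are NOT the real element sequences.
PLACEHOLDERS = {"CRE0051": "AAAA", "CRE0058": "CCCC", "CRE0065": "GGGG",
                "CRE0066": "TTTT", "CRE0052": "ACGT"}

if __name__ == "__main__":
    print(assemble_promoter("SP0265", PLACEHOLDERS))
    print(assemble_promoter("SP0265", PLACEHOLDERS, {"CRE0058": "CCGC"}))
```

Written this way, the relationships between the designs are easy to check: SP0239 is SP0265 with CRE0018 added upstream, while SP0246 and SP0131 each omit one CRE (CRE0066 and CRE0051, respectively) from SP0265.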

[0389] In one embodiment, the liver-specific promoter includes or consists of SEQ ID NO: 141 or a functional variant thereof. The promoter having the sequence of SEQ ID NO: 141 is referred to as SP0131. Functional variants of SP0131 may have sequences that are at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 141.

f. Composite promoters

[0390] In some embodiments, the liver-specific promoter described above is operably linked to one or more additional regulatory elements. These additional regulatory elements can enhance expression compared with, for example, the same liver-specific promoter without them. Generally, it is preferable that the additional regulatory elements do not substantially reduce the specificity of the liver-specific promoter.

[0391] For example, a liver-specific promoter can be operably linked to sequences encoding UTRs (e.g., 5' and/or 3' UTRs), introns, etc.

[0392] In some embodiments, a liver-specific promoter is operably linked to a sequence encoding a UTR, e.g., a 5'UTR. The 5'UTR can contain a variety of elements that regulate gene expression. In native genes, the 5'UTR begins at the transcription start site and ends one nucleotide before the start codon of the coding region. The 5'UTR referred to herein may be an entire naturally occurring 5'UTR or a portion of one, and it may also be partially or entirely synthetic. In eukaryotes, the 5'UTR has a median length of approximately 150 nt, but in some cases it can be considerably longer. Regulatory sequences that may be found in the 5'UTR include, but are not limited to: binding sites for proteins that may affect mRNA stability or translation; riboswitches; sequences that promote or inhibit translation initiation; and introns within the 5'UTR that regulate gene expression and mRNA export.

[0393] In some embodiments, the liver-specific promoter described above is operably linked to a sequence encoding a 5'UTR derived from the CMV major immediate-early gene (CMV-IE gene). For example, the 5'UTR derived from the CMV-IE gene preferably includes CMV-IE gene exon 1 and CMV-IE gene exon 1, or a portion thereof. In some cases, the promoter element may be modified to accommodate joining to the 5'UTR, for example, by removing a sequence downstream of the transcription start site (TSS) in the promoter element (e.g., by replacing it with the 5'UTR).

[0394] The CMV-IE 5'UTR is described in Simari et al., "Requirements for Enhanced Transgene Expression by Untranslated Sequences from the Human Cytomegalovirus Immediate-Early Gene," Molecular Medicine 4: 700-706, 1998, which is incorporated herein by reference. Variants of the CMV-IE 5'UTR sequence discussed in Simari et al. are also shown in WO2002/031137, which is incorporated herein by reference, and the regulatory sequences disclosed therein can also be used. Other UTRs that can be used in combination with the promoters are known in the art, for example from Leppek, K., Das, R. & Barna, M., "Functional 5' UTR mRNA structures in eukaryotic translation regulation and how to find them," Nat Rev Mol Cell Biol 19, 158-174 (2018), which is incorporated herein by reference.

[0395] In some embodiments, the sequence encoding the 5'UTR includes SEQ ID NO: 145 or a functional variant thereof. In some embodiments, the functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 145. SEQ ID NO: 145 encodes the CMV-IE 5'UTR.

[0396] In some embodiments, the 5'UTR includes a nucleic acid motif that functions as a protein translation initiation site, such as a sequence that defines a Kozak sequence in the resulting mRNA. For example, in some embodiments, the sequence encoding the 5'UTR includes the sequence motif GCCACC (SEQ ID NO: 153) at or near its 3' end. Other Kozak sequences or other protein translation initiation sites known in the art may also be used (e.g., Marilyn Kozak, "Point Mutations Define a Sequence Flanking the AUG Initiator Codon That Modulates Translation by Eukaryotic Ribosomes," Cell, Vol. 44, 283-292, January 31, 1986; Marilyn Kozak, "At Least Six Nucleotides Preceding the AUG Initiator Codon Enhance Translation in Mammalian Cells," J. Mol. Biol. (1987) 196, 947-950; Marilyn Kozak, "An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs," Nucleic Acids Research, Vol. 15 (20), 1987, all of which are incorporated herein by reference). The protein translation initiation site (e.g., a Kozak sequence) is preferably located immediately adjacent to the start codon.
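Placing GCCACC immediately upstream of the start codon can be checked mechanically. This Python sketch tests two things: that the 5'UTR ends with the motif and the coding sequence begins with ATG, and, as a simple proxy for a strong context, the two positions Kozak's cited work identified as most influential (a purine at -3 and G at +4, counting the A of ATG as +1). It is not a predictor of translation efficiency, and the UTR and coding sequences are hypothetical.

```python
def kozak_adjacent(utr: str, cds: str, motif: str = "GCCACC") -> bool:
    """True if the 5'UTR ends with `motif` and the coding sequence starts with ATG,
    i.e. the motif sits immediately upstream of the start codon."""
    return utr.upper().endswith(motif.upper()) and cds.upper().startswith("ATG")


def strong_context(utr: str, cds: str) -> bool:
    """Purine (A/G) at -3 and G at +4, with the A of ATG numbered +1."""
    if len(utr) < 3 or len(cds) < 4 or not cds.upper().startswith("ATG"):
        return False
    return utr[-3].upper() in "AG" and cds[3].upper() == "G"


# Hypothetical sequences for illustration only.
UTR = "TCAGATCGCCACC"    # ends with GCCACC (SEQ ID NO: 153)
CDS = "ATGGCTAGCAAG"     # G at +4

if __name__ == "__main__":
    print(kozak_adjacent(UTR, CDS), strong_context(UTR, CDS))
    print(strong_context(UTR, "ATGTCTAGC"))  # T at +4
```

Note that GCCACC itself places an A at -3, so any coding sequence whose fourth base is G completes the classic GCCACCATGG context.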

[0397] In some embodiments, the sequence encoding the 5'UTR includes SEQ ID NO: 438 or a functional variant thereof. In some embodiments, the functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 438. This 5'UTR includes, at the 3' end of the CMV-IE 5'UTR, the six nucleotides of SEQ ID NO: 153, which define the Kozak sequence.

[0398] In some embodiments, the SP0412 promoter or its variants discussed above are concatenated to a sequence encoding the 5'UTR to provide a composite promoter / 5'UTR regulatory construct. Herein, such a composite promoter / 5'UTR construct may be referred to simply as a “composite promoter,” or, in some cases, simply as a “promoter” for brevity.

[0399] In some embodiments, the composite promoter includes or consists of SEQ ID NO: 92 or a functional variant thereof. In some embodiments, the functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 92.

[0400] This composite promoter includes SP0412 operably linked to a sequence encoding the CMV-IE-derived 5'UTR (SEQ ID NO: 145) followed by the Kozak sequence GCCACC (SEQ ID NO: 153). This (composite) promoter is referred to as SP0422 (SEQ ID NO: 92). SP0422 is a preferred liver-specific promoter in some embodiments. As discussed above, the 5'UTR preferably includes a sequence defining a nucleic acid motif, e.g., a Kozak sequence, which functions as a protein translation initiation site. In the above sequence, the 5'UTR includes the sequence motif GCCACC (SEQ ID NO: 153) at its 3' end, although this sequence motif may be omitted or an alternative sequence may be used.

[0401] In some embodiments, the SP0265 promoter or its variant discussed above is concatenated to a sequence encoding the 5'UTR to provide a composite promoter (SP0236-5UTR).

[0402] In some embodiments, the composite promoter includes or consists of SEQ ID NO: 146 or a functional variant thereof. In some embodiments, the functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 146.

[0403] This composite promoter includes SP0265 (SEQ ID NO: 94) operably linked to the 5'UTR (SEQ ID NO: 145) derived from the CMV-IE gene and the Kozak sequence GCCACC (SEQ ID NO: 153). This (composite) promoter is referred to as SP0420. In this promoter, a short sequence downstream of the transcription start site (TSS) in the CRE0052 promoter element is replaced with a sequence from the CMV-IE-derived 5'UTR. Thus, this promoter actually contains a minor variant of SP0265 in which CRE0052 is modified so that some of its sequence is removed. In some embodiments, SP0420 is preferred. As discussed above, the 5'UTR preferably includes a sequence that defines a nucleic acid motif, e.g., a Kozak sequence, that functions as a protein translation initiation site. In the above sequence, the 5'UTR includes the sequence motif GCCACC (SEQ ID NO: 153) at its 3' end, although this sequence motif may be omitted or an alternative sequence may be used.
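The modification described above keeps the promoter sequence upstream of the TSS, drops the short native segment downstream of it, and puts the CMV-IE-derived 5'UTR in its place. The sketch below illustrates that splice. The TSS index and the toy sequences are hypothetical, because the TSS coordinates within CRE0052 are not given in this excerpt. It also assumes, for simplicity, that the discarded native segment runs to the 3' end of the promoter fragment.

```python
def splice_utr_at_tss(promoter: str, tss: int, utr: str) -> tuple[str, str]:
    """Replace the native sequence from the TSS onward with a 5'UTR.

    promoter: promoter sequence written 5' to 3'
    tss:      hypothetical 0-based index of the transcription start nucleotide
    utr:      5'UTR to splice in (e.g., a CMV-IE-derived 5'UTR)
    Returns (new_sequence, removed_native_segment).
    """
    if not 0 <= tss <= len(promoter):
        raise ValueError("TSS index lies outside the promoter")
    return promoter[:tss] + utr, promoter[tss:]


# Toy example: the native "AAT" after position 5 is swapped for a UTR.
new_seq, removed = splice_utr_at_tss("GGGGAAAT", 5, "CCGCCACC")
print(new_seq, removed)
```

Returning the removed segment alongside the new sequence makes explicit that the result is a variant of the original promoter with some native sequence removed, as the paragraph notes.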

[0404] In some embodiments, the SP0239 promoter or its variant discussed above is concatenated to a sequence encoding the 5'UTR to provide a composite promoter (SP0239-UTR).

[0405] In some embodiments, the composite promoter includes or consists of SEQ ID NO: 147 or a functional variant thereof. In some embodiments, the functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 147.

[0406] This composite promoter / 5'UTR construct includes SP0239 operably linked to the 5'UTR derived from the CMV-IE gene and the Kozak sequence GCCACC (SEQ ID NO: 153). This (composite) promoter is referred to as SP0421. Again, in this promoter, the short sequence downstream of the TSS in the CRE0052 promoter element is replaced with a sequence from the CMV-IE-derived 5'UTR. Thus, this promoter actually contains a minor variant of SP0239 in which CRE0052 is modified so that some of its sequence is removed. In some embodiments, SP0421 is preferred. As discussed above, the 5'UTR preferably includes a sequence defining a nucleic acid motif, e.g., a Kozak sequence, that functions as a protein translation initiation site. In the above sequence, the 5'UTR includes the sequence motif GCCACC (SEQ ID NO: 153) at its 3' end, although this sequence motif can be omitted or an alternative sequence can be used.

[0407] In some embodiments, the SP0240 promoter or its variant discussed above is concatenated to a sequence encoding the 5'UTR to provide a composite promoter.

[0408] In some embodiments, the composite promoter includes or consists of SEQ ID NO: 148 or a functional variant thereof. In some embodiments, the functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 148.

[0409] This composite promoter / 5'UTR construct comprises SP0240 operably linked to a 5'UTR derived from the CMV-IE gene and the Kozak sequence GCCACC (SEQ ID NO: 153). This (composite) promoter is referred to as SP0240-UTR. Again, in this promoter, a short sequence downstream of the TSS in the CRE0006 promoter element is replaced with a sequence from the CMV-IE-derived 5'UTR. Thus, this promoter actually contains a minor variant of SP0240 in which CRE0006 is modified so that some of its sequence is removed. In some embodiments, SP0240-UTR is preferred. As discussed above, the 5'UTR preferably includes a sequence that defines a nucleic acid motif, e.g., a Kozak sequence, that functions as a protein translation initiation site. In the above sequence, the 5'UTR includes the sequence motif GCCACC (SEQ ID NO: 153) at its 3' end, although this sequence motif can be omitted or an alternative sequence can be used.

[0410] In some embodiments, the SP0246 promoter or its variant discussed above is concatenated to a sequence encoding the 5'UTR to provide a composite promoter.

[0411] In some embodiments, the composite promoter includes or consists of SEQ ID NO: 149 (SP0246-UTR) or a functional variant thereof. In some embodiments, the functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 149.

[0412] This composite promoter / 5'UTR construct includes SP0246 operably linked to a 5'UTR derived from the CMV-IE gene and the Kozak sequence GCCACC (SEQ ID NO: 153). This (composite) promoter is referred to as SP0246-UTR. Again, in this promoter, the short sequence downstream of the TSS in the CRE0052 promoter element is replaced with a sequence from the CMV-IE-derived 5'UTR. Thus, this promoter actually contains a minor variant of SP0246 in which CRE0052 is modified so that some of its sequence is removed. In some embodiments, SP0246-UTR is preferred. As discussed above, the 5'UTR preferably includes a sequence that defines a nucleic acid motif, e.g., a Kozak sequence, that functions as a protein translation initiation site. In the above sequence, the 5'UTR includes the sequence motif GCCACC (SEQ ID NO: 153) at its 3' end, although this sequence motif can be omitted or an alternative sequence can be used.

[0413] In some embodiments, the SP0131_A1 promoter or a variant thereof discussed above is concatenated to a sequence encoding the 5'UTR to provide a composite promoter.

[0414] In some embodiments, the composite promoter includes or consists of SEQ ID NO: 150 (SP0131_A1-UTR) or a functional variant thereof. In some embodiments, the functional variant may have a sequence that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 150.

[0415] This composite promoter / 5'UTR construct comprises SP0131 operably linked to a 5'UTR derived from the CMV-IE gene and the Kozak sequence GCCACC (SEQ ID NO: 153). This (composite) promoter is referred to as SP0131-UTR. Again, in this promoter, the short sequence downstream of the TSS in the CRE0052 promoter element is replaced with a sequence from the CMV-IE-derived 5'UTR. Thus, this promoter actually contains a minor variant of SP0131 in which CRE0052 is modified so that some of its sequence is removed. In some embodiments, SP0131-UTR is preferred. As discussed above, the 5'UTR preferably includes a sequence that defines a nucleic acid motif, e.g., a Kozak sequence, that functions as a protein translation initiation site. In the above sequence, the 5'UTR includes the sequence motif GCCACC (SEQ ID NO: 153) at its 3' end, although this sequence motif can be omitted or an alternative sequence can be used.

[0416] In some embodiments, the liver-specific promoter is SP0412 (SEQ ID NO: 91) and includes the following components: CRE0051 (SEQ ID NO: 97), CRE0067 (SEQ ID NO: 152), CRE0059 (SEQ ID NO: 110), and the Kozak sequence (SEQ ID NO: 153), or functional variants thereof having sequences at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical thereto.

[0417] In some embodiments, the liver-specific promoter is SP0422 (SEQ ID NO: 92) and includes the following components: CRE0051 (SEQ ID NO: 97), CRE0067 (SEQ ID NO: 152), CRE0059 (SEQ ID NO: 110), the CMV-IE 5'UTR (SEQ ID NO: 145), and the Kozak sequence (SEQ ID NO: 153), or functional variants thereof having sequences at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% identical thereto.

(iii) Functional variants of synthetic liver-specific promoters

[0418] In some embodiments, a functional variant of a liver-specific promoter can be considered a promoter element that substantially retains its activity when substituted in place of a reference promoter element within the promoter. For example, a functional variant of a liver-specific promoter may be a functional variant of a given promoter in Table 4 of this specification, or SEQ ID NO: 86 (CRM Any promoter listed from...

Claims

1. A recombinant adeno-associated virus (AAV) vector comprising an AAV genome, the genome comprising: a. 5' and 3' AAV inverted terminal repeat (ITR) sequences; and b. a heterologous nucleic acid sequence encoding a polypeptide comprising an alpha-glucosidase (GAA) polypeptide, located between the 5' and 3' ITRs, wherein the heterologous nucleic acid sequence is operably linked to a liver-specific promoter selected from: i. CRM_SP0412 (SEQ ID NO: 86) or SP0412 (SEQ ID NO: 91); ii. SP0422 (SEQ ID NO: 92); or iii. CRM_SP0239 (SEQ ID NO: 87) or SP0239 (SEQ ID NO: 93).

2. The recombinant AAV vector according to claim 1, wherein the heterologous nucleic acid sequence encodes a fusion polypeptide comprising a secretion signal peptide fused to the GAA polypeptide, a fusion polypeptide comprising a targeting peptide fused to the GAA polypeptide, or a fusion polypeptide comprising both a secretion signal peptide and a targeting peptide fused to the GAA polypeptide.

3. The recombinant AAV vector according to claim 2, wherein the AAV genome comprises, in the 5' to 3' direction: a. a 5' ITR; b. a liver-specific promoter sequence; c. an intron sequence; d. a nucleic acid encoding a secretory signal peptide; e. a nucleic acid encoding an IGF2-targeting peptide; f. a nucleic acid encoding an alpha-glucosidase (GAA) polypeptide; g. a poly(A) sequence; and h. a 3' ITR.

4. The recombinant AAV vector according to claim 2 or 3, wherein the nucleic acid encoding the secretory signal peptide encodes a signal sequence selected from among an AAT signal peptide, a fibronectin signal peptide (FN1), a GAA leader sequence, an IL-2 wt leader sequence, a modified IL-2 leader sequence, an IL2(1-3) leader sequence, an IgG leader sequence, or an AAT leader sequence.

5. The recombinant AAV vector according to claim 3, wherein the IGF2-targeting peptide binds to a human cation-independent mannose-6-phosphate receptor (CI-MPR) or an IGF2 receptor.

6. The recombinant AAV vector according to claim 5, wherein the IGF2-targeting peptide comprises SEQ ID NO: 5, or comprises SEQ ID NO: 5 having at least one amino acid modification and binds to the IGF2 receptor.

7. The recombinant AAV vector according to claim 6, wherein the at least one amino acid modification in SEQ ID NO: 5 is the V43M amino acid modification (SEQ ID NO: 8 or SEQ ID NO: 9), or Δ2 to Δ7 (SEQ ID NO: 6), or Δ1 to Δ7 (SEQ ID NO: 7).

8. The recombinant AAV vector according to claim 1 or 2, wherein the nucleic acid sequence encodes a wild-type GAA polypeptide or a modified GAA polypeptide.

9. The recombinant AAV vector according to any one of claims 1 to 8, wherein the nucleic acid sequence encoding the GAA polypeptide is a human GAA gene, a human codon-optimized GAA gene (coGAA), or a modified GAA nucleic acid sequence.

10. The recombinant AAV vector according to any one of claims 1 to 9, wherein the nucleic acid sequence encoding the GAA polypeptide is modified from SEQ ID NO: 11 for one or more of the following: (i) codon optimization for enhanced expression in vivo, (ii) reduction of CpG islands, (iii) modification of the STOP sequence, (iv) reduction of alternative reading frames, and (v) reduction of the innate immune response.

11. The recombinant AAV vector according to any one of claims 1 to 10, wherein the nucleic acid sequence encoding the GAA polypeptide encodes a GAA polypeptide comprising at least one, at least two, or at least three amino acid modifications selected from V780I, H199R, or R223H of SEQ ID NO: 10.

12. The recombinant AAV vector according to any one of claims 3 and 5 to 7, wherein the encoded fusion polypeptide further comprises a spacer of at least one amino acid located between the C-terminus of the IGF2-targeting peptide and the N-terminus of the GAA polypeptide.

13. The recombinant AAV vector according to claim 12, further comprising a nucleic acid encoding a spacer of at least one amino acid located between the nucleic acid encoding the IGF2-targeting peptide and the nucleic acid encoding the GAA polypeptide.

14. The recombinant AAV vector according to any one of claims 1 to 13, further comprising at least one poly(A) sequence located 3' of the nucleic acid encoding the GAA polypeptide and 5' of the 3' ITR sequence.

15. The recombinant AAV vector according to any one of claims 1 to 14, wherein the heterologous nucleic acid sequence further comprises a collagen stability (CS) sequence, a 3'UTR sequence, or both a CS and a 3'UTR sequence, located 3' of the nucleic acid encoding the GAA polypeptide and 5' of the 3' ITR sequence.

16. The recombinant AAV vector according to claim 3 or 14, further comprising a collagen stability (CS) sequence or a 3'UTR sequence, or a nucleic acid encoding both the CS and 3'UTR sequences, located between the nucleic acid encoding the GAA polypeptide and the polyA sequence.

17. The recombinant AAV vector according to any one of claims 2 to 16, further comprising an intron sequence located 5' of the sequence encoding the secretory signal peptide and 3' of the promoter.

18. The recombinant AAV vector according to claim 17, wherein the intron sequence comprises an MVM sequence, an HBB2 sequence, or an SV40 sequence.

19. The recombinant AAV vector according to any one of claims 1 to 18, wherein the ITR includes an insertion, deletion, or substitution.

20. The recombinant AAV vector according to claim 19, wherein one or more CpG islands in the ITR are removed.

21. The recombinant AAV vector according to any one of claims 2 to 7 and 17, wherein: a. the nucleic acid encoding the secretory signal peptide is selected from the group consisting of an AAT signal peptide (SEQ ID NO: 17); a fibronectin signal peptide (FN1) (SEQ ID NOs: 18-21); a homologous GAA signal peptide (SEQ ID NO: 175); an hIGF2 signal peptide (SEQ ID NO: 22); an IgG1 leader peptide (SEQ ID NO: 177); a wtIL2 leader peptide (SEQ ID NO: 179); and a mutant IL2 leader peptide (SEQ ID NO: 181); and b. the nucleic acid encoding the GAA polypeptide is selected from the group consisting of SEQ ID NO: 11, SEQ ID NO: 72, and SEQ ID NO: 182.

22. The recombinant AAV vector according to any one of claims 3 and 5 to 7, wherein the IGF2-targeting peptide is selected from any one of SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 9.

23. The recombinant AAV vector according to any one of claims 3, 5 to 7, and 22, wherein the nucleic acid encoding the IGF2-targeting peptide is located between the nucleic acid encoding the secretion signal peptide and the nucleic acid encoding the GAA polypeptide.

24. A recombinant AAV vector according to any one of claims 1 to 23, which is a chimeric AAV vector, a haploid AAV vector, a hybrid AAV vector, or a polyploid AAV vector.

25. A recombinant AAV vector according to any one of claims 1 to 24, which is a rational haploid vector, a mosaic AAV vector, a chemically modified AAV vector, or an AAV vector derived from an AAV serotype.

26. A recombinant AAV vector according to any one of claims 1 to 25, selected from the group consisting of an AAVXL32 vector, an AAVXL32.1 vector, an AAV8 vector, and a haploid AAV8 vector containing at least one AAV8 capsid protein.

27. A recombinant AAV vector according to any one of claims 1 to 26, wherein the serotype is AAV3b.

28. The recombinant AAV vector according to claim 27, wherein the AAV3b serotype comprises one or more mutations in the capsid protein selected from 265D, 549A, and Q263Y.

29. The recombinant AAV vector according to claim 28, wherein the AAV3b serotype is selected from AAV3b265D, AAV3b265D549A, AAV3b549A, AAV3bQ263Y, or AAV3bSASTG.

30. A recombinant adeno-associated virus (AAV) vector comprising an AAV genome and a capsid protein of serotype AAV3, AAV3b, or AAV8, the genome comprising: a. 5' and 3' AAV inverted terminal repeat (ITR) sequences; and b. a heterologous nucleic acid sequence encoding a polypeptide comprising an alpha-glucosidase (GAA) polypeptide, located between the 5' and 3' ITRs, wherein the heterologous nucleic acid sequence is operably linked to a liver-specific promoter selected from: i. CRM_SP0412 (SEQ ID NO: 86) or SP0412 (SEQ ID NO: 91); ii. SP0422 (SEQ ID NO: 92); or iii. CRM_SP0239 (SEQ ID NO: 87) or SP0239 (SEQ ID NO: 93).

31. The recombinant AAV vector according to claim 30, wherein the heterologous nucleic acid sequence encoding the GAA polypeptide further comprises a nucleic acid encoding a secretion signal peptide located 5' of the nucleic acid encoding the GAA polypeptide.

32. The recombinant AAV vector according to claim 31, wherein the heterologous nucleic acid sequence encoding the GAA polypeptide further comprises a nucleic acid encoding a targeting peptide located between the nucleic acid encoding the secretory signal peptide and the nucleic acid encoding the GAA polypeptide.

33. The recombinant AAV vector according to claim 30, wherein the AAV genome comprises, in the 5' to 3' direction: a. a 5' ITR; b. a liver-specific promoter sequence; c. an intron sequence; d. a nucleic acid encoding a secretory signal peptide; e. a nucleic acid encoding a GAA polypeptide; f. a poly(A) sequence; and g. a 3' ITR.

34. The recombinant AAV vector according to claim 30, wherein the AAV genome comprises, in the 5' to 3' direction: a. a 5' ITR; b. a liver-specific promoter sequence; c. an intron sequence; d. a nucleic acid encoding a targeting peptide; e. a nucleic acid encoding a GAA polypeptide; f. a poly(A) sequence; and g. a 3' ITR.

35. The recombinant AAV vector according to any one of claims 31 to 33, wherein the secretory signal peptide is selected from any one of an AAT signal peptide, a fibronectin signal peptide (FN1), a GAA leader sequence, an IL-2 wt leader sequence, a modified IL-2 leader sequence, an IL2(1-3) leader sequence, an IgG leader sequence, or an AAT leader sequence.

36. The recombinant AAV vector according to claim 32 or 34, wherein the targeting peptide is an IGF2-targeting peptide sequence that binds to a human cation-independent mannose-6-phosphate receptor (CI-MPR) or an IGF2 receptor.

37. The recombinant AAV vector according to claim 36, wherein the IGF2-targeting peptide includes SEQ ID NO: 5, or includes at least one amino acid modification in SEQ ID NO: 5 that does not affect binding to the CI-MPR receptor or reduces binding to at least one serum IGF-binding protein (IGFBP).

38. The recombinant AAV vector according to any one of claims 30 to 37, wherein the nucleic acid sequence encodes the wild-type GAA polypeptide of SEQ ID NO: 10 or a modified GAA polypeptide.

39. The recombinant AAV vector according to any one of claims 30 to 38, wherein the nucleic acid sequence encodes a GAA polypeptide comprising at least one, at least two, or at least three amino acid modifications selected from V780I, H199R, or R223H of SEQ ID NO: 10.

40. The recombinant AAV vector according to any one of claims 30 to 39, wherein the nucleic acid sequence encoding the GAA polypeptide is a human GAA gene, a human codon-optimized GAA gene (coGAA), or a modified GAA nucleic acid sequence.

41. The recombinant AAV vector according to any one of claims 30 to 40, wherein the nucleic acid sequence encoding the GAA polypeptide is codon-optimized to reduce CpG islands.

42. The recombinant AAV vector according to any one of claims 30 to 41, wherein the nucleic acid sequence encoding the GAA polypeptide is codon-optimized to reduce the innate immune response, reduce CpG islands, or reduce the innate immune response and reduce CpG islands.

43. The recombinant AAV vector according to claim 33 or 34, wherein the intron sequence comprises an MVM sequence or an HBB2 sequence.

44. The recombinant AAV vector according to claim 41 or 42, wherein the ITR includes insertions, deletions, or substitutions, or one or more CpG islands in the ITR are removed.

45. The recombinant AAV vector according to claim 44, which is an AAVXL32 vector, an AAVXL32.1 vector, an AAV8 vector, or a haploid AAV8 vector comprising at least one AAV8 capsid protein.

46. The recombinant AAV vector according to any one of claims 31 to 33 and 35, wherein: a. the nucleic acid encoding the secretory signal peptide is selected from the group consisting of an AAT signal peptide (SEQ ID NO: 17); a fibronectin signal peptide (FN1) (SEQ ID NOs: 18-21); a homologous GAA signal peptide (SEQ ID NO: 175); an hIGF2 signal peptide (SEQ ID NO: 22); an IgG1 leader peptide (SEQ ID NO: 177); a wtIL2 leader peptide (SEQ ID NO: 179); and a mutant IL2 leader peptide (SEQ ID NO: 181); and b. the nucleic acid encoding the GAA polypeptide is selected from the group consisting of SEQ ID NO: 11, SEQ ID NO: 72, and SEQ ID NO: 182, or is a nucleic acid sequence encoding a GAA polypeptide having at least one, at least two, or at least three amino acid modifications selected from V780I, H199R, or R223H of SEQ ID NO: 10.

47. The recombinant AAV vector according to claim 36 or 37, wherein the IGF2-targeting peptide is SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 9.

48. The recombinant AAV vector according to claim 36 or 37, wherein the IGF2-targeting peptide is SEQ ID NO: 8 or SEQ ID NO: 9.

49. A pharmaceutical composition comprising a recombinant AAV vector according to any one of claims 1 to 48 in a pharmaceutically acceptable carrier.

50. A nucleic acid comprising a liver-specific promoter operably linked to a nucleic acid sequence encoding a GAA polypeptide, wherein the liver-specific promoter is selected from: i. CRM_SP0412 (SEQ ID NO: 86) or SP0412 (SEQ ID NO: 91); ii. SP0422 (SEQ ID NO: 92); or iii. CRM_SP0239 (SEQ ID NO: 87) or SP0239 (SEQ ID NO: 93).

51. A nucleic acid for a recombinant adeno-associated virus (AAV) vector genome, comprising: a. 5' and 3' AAV inverted terminal repeat (ITR) nucleic acid sequences; and b. a heterologous nucleic acid sequence encoding a polypeptide comprising a secretory signal peptide and an alpha-glucosidase (GAA) polypeptide, located between the 5' and 3' ITRs, wherein the heterologous nucleic acid sequence is operably linked to a liver-specific promoter selected from: i. SP0422 (SEQ ID NO: 92); or ii. CRM_SP0239 (SEQ ID NO: 87) or SP0239 (SEQ ID NO: 93).

52. The nucleic acid according to claim 51, wherein the heterologous nucleic acid sequence encoding the GAA polypeptide further encodes an IGF2-targeting peptide located between the secretory signal peptide and the GAA polypeptide.

53. The nucleic acid according to claim 51 or 52, wherein the nucleic acid encoding the secretion signal peptide is selected from any of SEQ ID NOs: 17, 22-26, 177, 179, and 181.

54. The nucleic acid according to claim 52, wherein the nucleic acid encoding the IGF2-targeting peptide is SEQ ID NO: 2 (IGF2-Δ2 to 7), SEQ ID NO: 3 (IGF2-Δ1 to 7), or SEQ ID NO: 4 (IGF2 V43M).

55. The nucleic acid according to any one of claims 50 to 54, wherein the nucleic acid sequence encoding the GAA polypeptide is a human GAA gene, a human codon-optimized GAA gene (coGAA), or a modified GAA nucleic acid sequence.

56. The nucleic acid according to claim 55, wherein the nucleic acid sequence encoding the GAA polypeptide is modified from SEQ ID NO: 11 for one or more of the following: (i) codon optimization for enhanced expression in vivo, (ii) reduction of CpG islands, (iii) modification of the STOP sequence, (iv) reduction of alternative reading frames, and (v) reduction of the innate immune response.

57. The nucleic acid according to claim 55, wherein the nucleic acid sequence encoding the GAA polypeptide is codon-optimized to reduce CpG islands, reduce the innate immune response, or reduce CpG islands and reduce the innate immune response, and / or for enhanced expression in vivo.

58. The nucleic acid according to any one of claims 50 to 57, wherein the nucleic acid sequence encoding the GAA polypeptide is codon-optimized for enhanced in vivo expression.

59. The nucleic acid according to any one of claims 50 to 58, wherein the nucleic acid encoding the GAA polypeptide is selected from any one of SEQ ID NO: 11 (full-length hGAA), SEQ ID NO: 55 (Dwith cDNA), SEQ ID NO: 56 (hGAA Δ1 to 66), SEQ ID NO: 82 (mod_hGAA), or SEQ ID NO: 182.

60. The nucleic acid according to any one of claims 50 to 58, wherein the nucleic acid encoding the GAA polypeptide is selected from any one of SEQ ID NO: 74 (codon optimized 1), SEQ ID NO: 75 (codon optimized 2), SEQ ID NO: 76 (codon optimized 3), or SEQ ID NO: 82 (mod_hGAA).

61. The nucleic acid according to any one of claims 50 to 58, wherein the nucleic acid encodes a GAA polypeptide comprising at least one, at least two, or at least three amino acid modifications selected from V780I, H199R, or R223H of SEQ ID NO: 10.

62. A composition for treating a subject having glycogen storage disease type II (GSD II, Pompe disease, acid maltase deficiency) or a deficiency in alpha-glucosidase (GAA) polypeptide, comprising a recombinant AAV vector according to any one of claims 1 to 48, or a nucleic acid according to any one of claims 50 to 61.

63. The composition according to claim 62, wherein the GAA polypeptide is secreted from the liver of the subject, and the secreted GAA is taken up by skeletal muscle tissue, cardiac muscle tissue, diaphragmatic muscle tissue, or a combination thereof, and the uptake of the secreted GAA results in a reduction of lysosomal glycogen storage in the tissue.

64. The composition according to claim 62, wherein the composition is administered to the subject by a route selected from intramuscular, subcutaneous, intraspinal, intracisternal, intrathecal, and intravenous administration.

65. The composition according to claim 62, wherein the recombinant AAV vector is a chimeric AAV vector, a haploid AAV vector, a hybrid AAV vector, or a polyploid AAV vector.

66. The composition according to claim 62, wherein the recombinant AAV vector is a rational haploid vector, a mosaic AAV vector, a chemically modified AAV vector, or an AAV vector derived from an AAV serotype.

67. The composition according to claim 62, wherein the recombinant AAV vector is an AAVXL32 vector, or an AAVXL32.1 vector, or an AAV8 vector, or a haploid AAV8 vector containing at least one AAV8 capsid protein.

68. The composition according to claim 62, wherein the recombinant AAV vector is an AAV8 vector.
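Claims 3, 33, and 34 fix only the relative 5'-to-3' order of the cassette elements, and other claims allow additional elements in between (e.g., the spacer of claim 13 or the CS/3'UTR sequences of claim 16). The sketch below checks whether a proposed cassette, written as a list of labels, respects one of those orders. The labels are hypothetical shorthand, and no sequence content is validated.

```python
# Shorthand labels (hypothetical) for the elements recited in claims 3, 33 and 34.
CLAIM_3 = ["5'ITR", "promoter", "intron", "signal_peptide",
           "IGF2_peptide", "GAA", "polyA", "3'ITR"]
CLAIM_33 = ["5'ITR", "promoter", "intron", "signal_peptide",
            "GAA", "polyA", "3'ITR"]
CLAIM_34 = ["5'ITR", "promoter", "intron", "targeting_peptide",
            "GAA", "polyA", "3'ITR"]


def follows_order(cassette: list[str], required: list[str]) -> bool:
    """True if every required element occurs in the cassette in the given
    5'-to-3' order; extra elements may sit in between."""
    remaining = iter(cassette)
    # `r in remaining` consumes the iterator up to the first match, so each
    # later element must be found downstream of the previous one.
    return all(r in remaining for r in required)


design = ["5'ITR", "promoter", "intron", "signal_peptide", "IGF2_peptide",
          "spacer", "GAA", "CS", "3'UTR", "polyA", "3'ITR"]
print(follows_order(design, CLAIM_3))  # True
```

Treating the claimed order as a subsequence check, rather than as an exact list match, is what lets optional elements such as a spacer or a CS sequence be interleaved without failing the check.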