Vaccine compositions comprising hotspot sequences from the large HBsAg

WO2026093412A3 · PCT designated stage · Publication Date: 2026-06-25 · NEC ONCOIMMUNITY AS +1

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: NEC ONCOIMMUNITY AS
Filing Date: 2025-10-29
Publication Date: 2026-06-25

Application Information

IPC: A61K39/12; A61P31/20

CPC: C12N2730/10122; C12N2730/10134; A61K39/12; A61P31/20

AI Tagging

Application Domain

Viral antigen ingredients Antivirals

Technology Topics

Disease Microorganism

AI Technical Summary

Technical Problem

Current HBV vaccines face challenges in generating a broad T-cell response due to human leukocyte antigen (HLA) restriction, leading to variable efficacy across different individuals, and there is a need for a vaccine that can effectively target a diverse spectrum of HLA types to provide global protection against HBV infections.

Method used

Development of polypeptides comprising specific hotspot sequences that bind to multiple HLA alleles, identified through in silico analysis, to stimulate both cellular and humoral immune responses, formulated into vaccines that cover a broad range of HBV genotypes.

Benefits of technology

The identified hotspot sequences in polypeptides offer a potential cure for chronic HBV infections and related diseases across various HLA types, providing robust immune response and global protection against HBV.

✦ Generated by Eureka AI based on patent content.

Abstract

The present invention relates to polypeptides, polynucleotides, compositions, microorganisms, vectors, and vaccine compositions optimised for the treatment or prophylaxis of a disease or infection caused by the Hepatitis B virus (HBV). In particular, the invention provides a polypeptide comprising one or more of SEQ ID NOs: 1 to 7, preferably SEQ ID NOs: 5 and / or 7, or a variant thereof having at least 70% sequence identity thereto, said polypeptide being no more than 1500 amino acids in length.

Description

[0001] HBV VACCINE

[0002] FIELD OF THE INVENTION

[0003] The present invention relates to polypeptides, polynucleotides, compositions, microorganisms, vectors, and vaccine compositions optimised for the treatment or prophylaxis of a disease or infection caused by the Hepatitis B virus (HBV), specifically HBV-related diseases, wherein the above comprise one or more amino acid sequences selected for their ability to stimulate a broad and effective adaptive immune response to HBV and cover a diverse spectrum of human leukocyte antigen (HLA) alleles.

[0004] BACKGROUND

[0005] Hepatitis B is a serious viral infection that can manifest as both chronic and acute disease. Although safe and effective vaccines aimed at preventing the transmission of hepatitis B virus (HBV) have been available for over four decades, approximately 1.5 million individuals are infected with HBV each year, with HBV infections accounting for an estimated 820,000 deaths in 2019 (World Health Organisation, October 2023).

[0006] HBV infections are primarily spread through horizontal transmission, and, in highly endemic areas, perinatal transmission. Other common means for transmission include sexual exposure and needlestick injury. Frequent contact with infected individuals therefore establishes a serious risk of infection to unvaccinated groups including health workers, persons with multiple sexual partners, and body modification artists.

[0007] HBV infections contracted during childhood predominantly develop as chronic hepatitis, in comparison to adulthood HBV infections which rarely lead to chronic hepatitis. Individuals with acute HBV illness typically present with nausea, vomiting, abdominal pain, and jaundice, which generally alleviates after several weeks, although severely acute cases can lead to liver failure and death. The chronic disease may be asymptomatic or may trigger complications including cirrhosis, hepatic decompensation, and hepatocellular carcinoma (HCC, one of the most prevalent forms of cancer in humans).

[0008] Cellular immunity is an arm of the adaptive immune system that is specialised to resolve intracellular infections and prevent reinfection from pathogens, and often works in tandem with humoral (antibody-based) immunity upon natural exposure to a foreign body. A cellular immune response involves the interaction of T cells, each providing a variety of immune-related functions to aid in the reduction or elimination of pathogen-infected host cells (Amanna & Slifka 2011, Virology 411(2): 206-215). Furthermore, the generation of memory T cells as part of the cellular immune response results in the ability to mount a faster and stronger immune response upon re-exposure to a previously encountered pathogen (Restifo & Gattinoni 2013, Current Opinion in Immunology 25(5): 556-63).

[0009] However, the design of vaccines engineered to induce a broad T cell response faces the challenge of human leukocyte antigen (HLA) restriction, both within an individual and across a broader population. The HLA complex is a set of genes encoding the major histocompatibility complex (MHC) proteins in humans. These proteins regulate an individual’s immune system and present, at the surface of infected cells, epitopes that are generated during a natural infection or delivered to said individual in the form of a vaccine, thereby eliciting an immune response against those epitopes (Marsh et al. 2010 Tissue Antigens 75(4): 291-455).

[0010] The high polymorphism of HLA alleles and subsequent immune system variability between individuals results in a diverse spectrum of “HLA types” across the population. As an added complication to peptide-based vaccine development, such HLA types can have a significant impact on the efficacy of a potentially prophylactic viral vaccine composition between different individuals. As such, generation of an epitope-based vaccine composition that is compatible with a particular subset of HLA types may prove ineffective with a significant proportion of the global population comprising individuals of different HLA types. Considering this, the generation of T-cell and B-cell epitope vaccines, that target a limited number of HLA types, may only prove advantageous for a narrow, select population.

[0011] Currently, no specific treatment for acute HBV infections exists (World Health Organisation, October 2023), and antiviral drugs including tenofovir or entecavir are the first-line drugs of choice to decrease viral load within infected individuals by inhibiting the reverse-transcriptase function of the HBV DNA polymerase. However, such antiviral therapies are virostatic and require life-long usage, and HBV mutants can develop that are resistant to these drugs, resulting in treatment failure.

[0012] Thus, there exists a clear unmet medical need for a safe and effective vaccine for use in the therapeutic or prophylactic treatment of HBV-related diseases and HBV infections. A suitable vaccine would need to be optimised to incorporate epitopes from a broad range of HBV viral genotypes and cover a diverse spectrum of HLA types, in order to provide broad protection across a global population.

[0013] SUMMARY OF INVENTION

[0014] This invention is based on the surprising identification of specific hotspot sequences that are present across a broad range of different hepatitis B virus (HBV) genotypes. In validation experiments, these hotspot sequences generated an immune response in healthy donor peripheral blood mononuclear cells (PBMCs) and in PBMCs from chronic HBV patients. Accordingly, new vaccines with predicted high efficacy can be formulated that comprise one or more amino acid sequences based on these hotspot sequences. Such a vaccine has the potential to stimulate an immune response to HBV that is both cellular and potentially humoral in nature, for the therapeutic or prophylactic treatment of infection with HBV in humans across the global population.

[0015] In a first aspect of the invention, there is provided a polypeptide comprising one or more of SEQ ID NOs: 1 to 7, or a variant thereof having at least 70% sequence identity thereto, said polypeptide being no more than 1500 amino acids in length. Preferably, the polypeptide comprises SEQ ID NO: 5 and / or 7, or a variant having at least 70% sequence identity thereto, said polypeptide being no more than 1500 amino acids in length.

[0016] In a second aspect of the invention, there is provided a polynucleotide encoding a polypeptide according to the first aspect of the invention. The polynucleotide may be DNA, or RNA such as mRNA. The polynucleotide may include synthetic non-natural nucleotides, as would be understood by a person of skill in the art.

[0017] In a third aspect of the invention, there is provided a vector comprising a polynucleotide according to the second aspect of the invention, wherein the vector may further comprise regulatory elements capable of driving transcription and / or translation of the polynucleotide in a host cell. The vector may be a plasmid or a viral vector or any other suitable vector. The plasmid vector may comprise DNA or RNA (such as mRNA). The viral vector may comprise DNA or RNA (such as mRNA).

[0018] In a fourth aspect of the invention, there is provided a microorganism comprising a polypeptide according to the first aspect of the invention, or a polynucleotide according to the second aspect of the invention. The microorganism may be a bacterial microorganism.

[0019] In a fifth aspect of the invention, there is provided a composition comprising one or more of any of the amino acid sequences according to SEQ ID NOs: 1 to 7, or a variant thereof having at least 70% sequence identity, each of said amino acid sequences being no more than 150 amino acids in length, preferably SEQ ID NO: 5 and / or 7, or a variant having at least 70% sequence identity thereto, said polypeptide being no more than 150 amino acids in length.

[0020] In a sixth aspect of the invention, there is provided a vaccine composition comprising a polypeptide, polynucleotide, vector, microorganism, or composition according to the first, second, third, fourth or fifth aspects of the invention, respectively. In a seventh aspect of the invention, there is provided a polypeptide, polynucleotide, vector, microorganism, composition or vaccine composition according to the first, second, third, fourth, fifth or sixth aspects of the invention, respectively, for therapeutic or prophylactic use.

[0021] In an eighth aspect of the invention, there is provided a polypeptide, polynucleotide, vector, microorganism, composition or vaccine composition according to the first, second, third, fourth, fifth, or sixth aspects of the invention, respectively, for use in the treatment or prophylaxis of a HBV-related disease, preferably wherein the disease is hepatitis, HBV-related liver cirrhosis or HBV-related liver cancer.

[0022] In a ninth aspect of the invention, there is provided a method of treatment comprising administering to a subject the polypeptide, polynucleotide, vector, microorganism, composition or vaccine composition according to the first, second, third, fourth, fifth or sixth aspects of the invention, respectively. The treatment is preferably a treatment or prophylaxis of a HBV-related disease, the HBV-related disease being, more preferably, HBV-related liver cirrhosis or HBV-related liver cancer.

[0023] In a tenth aspect of the invention, there is provided a use of the polypeptide, polynucleotide, vector, microorganism, composition or vaccine composition according to the first, second, third, fourth, fifth or sixth aspects of the invention, respectively, in the manufacture of a medicament. The medicament is preferably for therapeutic use, more preferably for the treatment or prophylaxis of a HBV-related disease, the HBV-related disease being, more preferably, HBV-related liver cirrhosis or HBV-related liver cancer.

[0024] BRIEF DESCRIPTION OF FIGURES

[0025] Figures 1A, 1B, 1C and 1D show a workflow diagram for hotspot identification via the in silico approach for identifying and refining the top candidate hotspots starting from the HBV database of raw protein sequences. Figure 2 shows the process of evaluating candidate hotspots by estimating the coverage of simulated digital twin populations.

[0026] Figures 3A and 3B show the coverage of hotspots aggregated across genotypes and the coverage split by genotype, both worldwide and across region-specific populations (Europe, North-East Asia, and South-East Asia).

[0027] Figure 4 shows a workflow of an ELISPOT assay for evaluating candidate hotspots in PBMCs.

[0028] Figure 5 shows the peptide pools designed and synthesized.

[0029] Figure 6 shows a workflow of an ELISPOT assay for evaluating T-cell expansion in response to stimulation by various candidate peptides and peptide pools.

[0030] Figures 7A and 7B show the representative image results of the ELISPOT assay for determining T-cell expansion in response to stimulation by various candidate peptides or DMSO controls in three different HBV donors.

[0031] Figures 8A and 8B show specific interferon-gamma production after restimulation of cells with peptide pools versus restimulation of cells with DMSO.

[0032] Figures 9A and 9B show a summary of the ELISPOT assays in healthy donors (A) and HBV patients (B), where all healthy donor PBMCs responded to at least one of the candidate peptide pools.

[0033] Figure 10 shows the initial pooling of peptides according to their properties.

[0034] Figures 11A and 11B show the results of a Greedy Hill Climbing (GHC) algorithm to optimize vaccine element selection.

[0035] Figure 12 lists the Class II HLA alleles used to assess the presence of characteristic epitopes within the selected hotspots.

[0036] Figures 13 to 15 show the estimated population coverage for a number of preferred sequences. Figures 16 and 17 show the estimated population coverage for a number of preferred combinations of sequences.

[0037] DETAILED DESCRIPTION

[0038] This invention is predicated on the identification of specific hotspot sequences that contain multiple epitopes that bind to multiple HLA (encoded by different HLA alleles) and are present across a broad range of different hepatitis B virus (HBV) genotypes. The in silico analysis utilised by the inventors of the present invention has identified optimal candidates for HBV vaccines, which have subsequently been validated for their ability to stimulate an immune response. Polypeptides encoded by these hotspot sequences were evaluated for their ability to promote T cell expansion and thus stimulate an immune response in a subject in need thereof. Thus, the incorporation of these hotspots into polypeptides, polynucleotides, vectors, microorganisms, compositions, and vaccine compositions may allow for the therapeutic and / or prophylactic treatment of HBV-related diseases and HBV infections. The current first-line antiviral drugs which aim to prevent HBV-related diseases and HBV infections often require long-term (often life-long) adherence to the regime, and are associated with drug resistance and financial burden. The polypeptides, polynucleotides, vectors, microorganisms, compositions, and vaccine compositions of the present invention offer a potential treatment and / or cure for chronic HBV infections and HBV-related diseases caused by any of the 8 known HBV genotypes, which have distinct geographical distribution across the world. HBV genotype A (HBV-A) is prevalent across Europe, North America, South East Africa, and India, whereas HBV genotype B (HBV-B) and C (HBV-C) are widespread in Asia and Oceania. HBV-D is found in Europe, North America, North Africa, the Middle East, and Oceania, with HBV-E prevalent in West Africa, HBV-F in South America, and HBV-G and HBV-H in Central and South America (Rajoriya et al., 2017).

[0039] As the course, treatment, and prognosis of HBV-related diseases and HBV infections are influenced by the genotype in question, the present invention offers a global solution to HBV-related diseases and infections in that the hotspots identified are present across all HBV genotypes. Furthermore, a surprisingly robust statistical model allows for the identification of those predicted hotspots that are capable of triggering immunogenicity across a wide variety of human leukocyte antigen (HLA) types, further indicating that the present invention has the potential to elicit protection against HBV across the global human population.

[0040] Thus, in a first aspect of the invention, there is provided a polypeptide comprising one or more of SEQ ID NOs: 1 to 7 or a variant thereof having at least 70% sequence identity, said polypeptide being no more than 1500 amino acids in length. Preferably, the polypeptide comprises a sequence according to SEQ ID NO: 5 and / or 7, or a variant having at least 70% sequence identity thereto, said polypeptide being no more than 1500 amino acids in length. As used herein, the term “polypeptide” refers to a molecule that is made of two or more amino acids (a polymer of amino acids), which are bonded together via peptide bonds to form a polypeptide.

[0041] The polypeptide of the present invention may comprise one or more amino acid sequences according to SEQ ID NOs: 1 to 7, preferably 5 and / or 7, optionally also SEQ ID NO: 1. For the avoidance of doubt, SEQ ID NOs: 1 to 7 are amino acid sequences that are encoded by the hotspot sequences identified by the present inventors across the 8 different HBV genotypes. The polypeptide of the present invention may therefore be a larger polypeptide composed of multiple amino acid sequences of or encompassed by SEQ ID NOs: 1 to 7, preferably 5 and / or 7. Accordingly, the term “polypeptide” denotes the claimed polypeptide, whereas the term “hotspot” is used to denote the sequence encoding the amino acid sequences according to any of SEQ ID NOs: 1 to 7 or variants thereof.

[0042] In one embodiment of the invention, the one or more amino acid sequences (encoded by the hotspots) are (each) to be no more than 150 amino acids in length. For example, no more than 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, or 30 amino acids in length. In particular, amino acid sequences according to SEQ ID NO: 1 or variants thereof may be no more than about 140 amino acids in length. For example, the sequence may be 131 amino acids in length. Amino acid sequences according to SEQ ID NO: 2 or variants thereof may be no more than about 40 amino acids in length. For example, the sequence may be 31 amino acids in length. Amino acid sequences according to SEQ ID NO: 3 or variants thereof may be no more than about 90 amino acids in length. For example, the sequence may be 82 amino acids in length. Amino acid sequences according to SEQ ID NO: 4 or variants thereof may be no more than about 50 amino acids in length. For example, the sequence may be 42 amino acids in length. Amino acid sequences according to SEQ ID NO: 5 or variants thereof may be no more than about 40 amino acids in length. For example, the sequence may be 38 amino acids in length. Amino acid sequences according to SEQ ID NO: 6 or variants thereof may be no more than about 40 amino acids in length. For example, the sequence may be 36 amino acids in length. Amino acid sequences according to SEQ ID NO: 7 or variants thereof may be no more than about 50 amino acids in length. For example, the sequence may be 42 amino acids in length. In a preferred embodiment, the amino acid sequences encoded by the hotspot sequences may be between 31 and 131 amino acids in length. The varying lengths are intended to encapsulate the respective beneficial characteristics of having a polypeptide that is shorter, usually termed an “oligopeptide” or one that is longer, termed a “polypeptide”.

[0043] As described herein, these amino acid sequences or variants thereof may be present in any combination to form a larger polypeptide, with or without spacers in between each polypeptide, wherein the overall polypeptide of the present invention is to be no more than 1500, 1400, 1300, 1200, 1100, 1000, 950, 900, 850, 800, 750, 700, 650, 600, 550, 500, 450, 400, 350, 300, 250, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, or 30 amino acids in length. As such, it will be understood that the overall (i.e., larger) polypeptide may be no more than 1500, 1400, 1300, 1200, 1100, 1000, 950, 900, 850, 800, 750, 700, 650, 600, 550, 500, 450, 400, 350, 300, 250, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, or 30 amino acids in length, and each of the one or more constituent amino acid sequences within the overall polypeptide may each be no more than 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, or 30 amino acids in length. An example of this would be the overall polypeptide being 1500 amino acids in length comprising ten amino acid sequences each of 150 amino acids in length. Another example is the overall polypeptide being 1065 amino acids in length comprising seventeen amino acid sequences. In one embodiment, the seventeen amino acid sequences comprise three variants of SEQ ID NO: 1, two variants of SEQ ID NO: 2, three variants of SEQ ID NO: 3, three variants of SEQ ID NO: 4, two variants of SEQ ID NO: 5, SEQ ID NO: 6, and three variants of SEQ ID NO: 7.
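The 1065-residue example can be checked arithmetically against the example lengths given for SEQ ID NOs: 1 to 7 in paragraph [0042]. The sketch below assumes that every variant has the example length of its parent sequence and that no spacers are present; the names `EXAMPLE_LENGTHS`, `COPIES` and `total_length` are illustrative only and do not appear in the application.

```python
# Example lengths (amino acids) for SEQ ID NOs: 1 to 7, as stated in paragraph [0042].
EXAMPLE_LENGTHS = {1: 131, 2: 31, 3: 82, 4: 42, 5: 38, 6: 36, 7: 42}

# Number of sequences drawn from each SEQ ID NO in the seventeen-sequence embodiment.
COPIES = {1: 3, 2: 2, 3: 3, 4: 3, 5: 2, 6: 1, 7: 3}

MAX_TOTAL = 1500       # upper bound on the overall polypeptide
MAX_CONSTITUENT = 150  # upper bound on each constituent amino acid sequence


def total_length(lengths, copies, spacer_length=0):
    """Length of a concatenation, with an optional spacer between adjacent sequences."""
    n_sequences = sum(copies.values())
    residues = sum(lengths[seq_id] * copies[seq_id] for seq_id in copies)
    return residues + spacer_length * max(n_sequences - 1, 0)


n_sequences = sum(COPIES.values())
overall = total_length(EXAMPLE_LENGTHS, COPIES)  # assumption: no spacers

print(n_sequences, overall)
print(overall <= MAX_TOTAL)
print(all(length <= MAX_CONSTITUENT for length in EXAMPLE_LENGTHS.values()))
```

Under these assumptions the seventeen sequences sum to exactly 1065 residues, which suggests that the 1065-residue example is a spacer-free concatenation. Passing a non-zero `spacer_length` shows how quickly spacers consume the remaining margin below 1500.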

[0044] The term “amino acid” retains its meaning as used in the art and would be understood by the skilled person. The term “variant” is used to refer to amino acid sequences that differ from the indicated SEQ ID NOs by one or more amino acid residues. This could be a substitution, addition, or deletion (e.g., any form of mutation or modification) of one or more amino acids. An example would be the amino acid sequences ACDEF and ACDEE, whereby the latter is a variant of the former as one amino acid has been substituted. Thus, variants may have amino acids inserted, deleted, or substituted from the indicated sequence.

[0045] The sequences of SEQ ID NOs: 1 to 7 of the invention are summarised in Table 1.

[0046]

[0047] Table 1. Amino acid sequences according to SEQ ID NOs: 1 to 7, with variable amino acids highlighted in bold (X, Z, J, O, U, B, X2 and Z2). The possible identities of the variable amino acids are discussed below.

[0048] The polypeptide of the present invention comprises one or more of SEQ ID NOs: 1 to 7, preferably 5 and / or 7, or a variant thereof having at least 70% sequence identity. As described below, SEQ ID NOs: 1 to 7 encompass variants of said sequences which may differ by one or more amino acids. In particular, SEQ ID NOs: 1 to 7 encompass variants according to SEQ ID NOs: 8 to 23. Therefore, in a preferred embodiment, the polypeptide of the present invention may comprise one or more of SEQ ID NOs: 1 to 23. In particular, the polypeptide of the present invention may comprise one or more of SEQ ID NOs: 6 and 8 to 23.

[0049] In a particularly preferred embodiment, the polypeptide of the invention comprises any of SEQ ID NOs: 8, 9, 10, 19, 20, 21, 22 and / or 23. Even more preferably, the polypeptide of the invention comprises any of SEQ ID NOs: 8, 9, 20 and / or 22. Even more preferably, the polypeptide of the invention comprises any of SEQ ID NOs: 8, 20 and / or 22. Even more preferably, the polypeptide of the invention comprises any of SEQ ID NOs: 20 and / or 22, preferably SEQ ID NOs: 20 and 22. The relationship between possible variants of SEQ ID NOs: 1 to 7 and their corresponding hotspots is summarised in Table 2 (see also Figure 5) and described below.

[0050]

[0051]

[0052] Table 2. Relationship between SEQ ID NOs: 1 to 7 and possible variant sequences with their corresponding hotspot (HS) labels.

[0053] As used herein, the terms “sequence identity” and “sequence homology” are interchangeable and refer to the number of identical residues over a defined length in a given alignment. To calculate the % sequence identity of any of the sequences herein disclosed, sequence comparison software may be used, for example the BLAST software package (V2.10.1) with its default settings.
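The definition above can be illustrated with a minimal sketch. It is not the BLAST procedure: it assumes the two sequences have already been aligned to equal length (with `-` marking a gap) and simply counts identical residues over the alignment length. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as identical only if both residues match and neither is a gap.
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * identical / len(aligned_a)


# The toy pair used in the definition of "variant": one substitution in five residues.
print(percent_identity("ACDEF", "ACDEE"))
```

For the toy pair ACDEF and ACDEE, four of the five aligned residues are identical, so the variant has 80% sequence identity to the original. An alignment tool may choose a different defined length (for example, a local alignment), so values reported by BLAST can differ from this simple count.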

[0054] In accordance with claim 1, the polypeptide may comprise SEQ ID NO: 5 and / or 7, i.e., SEQ ID NOs 5 and 7 or SEQ ID NOs 5 or 7. In the context of SEQ ID NO: 5, X may represent different amino acids. In one embodiment, X represents a hydrophobic amino acid. Preferably, X represents valine (V) or alanine (A). A variant of SEQ ID NO: 5 may comprise any hydrophobic amino acid at position X. For example, SEQ ID NOs: 19 and 20 are variants of SEQ ID NO: 5. Therefore, in a preferred embodiment, the polypeptide of the present invention may comprise one or more amino acid sequences according to SEQ ID NOs: 5, 19 and / or 20.

[0055] SEQ ID NO: 5 LWEWASXRFSWLSLLVPFVQWFVGLSPTVWLSXIWMMW

[0056] SEQ ID NO: 19 LWEWASVRFSWLSLLVPFVQWFVGLSPTVWLSVIWMMW

[0057] SEQ ID NO: 20 LWEWASARFSWLSLLVPFVQWFVGLSPTVWLSAIWMMW
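The relationship between the SEQ ID NO: 5 template and its variants can be checked mechanically. The sketch below converts the template into a regular expression in which each X matches one of the preferred residues (V or A) and tests SEQ ID NOs: 19 and 20 against it. It assumes that each X position is filled independently; the helper `template_to_regex` is illustrative only.

```python
import re

SEQ_ID_5 = "LWEWASXRFSWLSLLVPFVQWFVGLSPTVWLSXIWMMW"
SEQ_ID_19 = "LWEWASVRFSWLSLLVPFVQWFVGLSPTVWLSVIWMMW"
SEQ_ID_20 = "LWEWASARFSWLSLLVPFVQWFVGLSPTVWLSAIWMMW"

PREFERRED_X = "VA"  # valine or alanine, the preferred residues at position X


def template_to_regex(template: str, allowed_at_x: str) -> re.Pattern:
    """Build a regex in which every X in the template matches one allowed residue."""
    parts = [
        f"[{re.escape(allowed_at_x)}]" if ch == "X" else re.escape(ch)
        for ch in template
    ]
    return re.compile("".join(parts))


pattern = template_to_regex(SEQ_ID_5, PREFERRED_X)
for name, seq in (("SEQ ID NO: 19", SEQ_ID_19), ("SEQ ID NO: 20", SEQ_ID_20)):
    # fullmatch requires the whole sequence to fit the template, not just a part of it.
    print(name, bool(pattern.fullmatch(seq)))
```

Both variants fit the template. Widening `PREFERRED_X` to all hydrophobic residues would model the broader embodiment in which X represents any hydrophobic amino acid.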

[0058] Any other suitable variant is intended to be included, especially variants in which the amino acid variations have little to no effect on the binding of the polypeptide to its cognate HLA molecule, including variants having at least 70% sequence identity to SEQ ID NOs: 5, 19 and / or 20. In one embodiment, there may be a variant having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, or 97% sequence identity to SEQ ID NOs: 5, 19 and / or 20.

[0059] In accordance with claim 1, the polypeptide may comprise SEQ ID NO: 7. In the context of SEQ ID NO: 7, X, Z and J may represent different amino acids. In one embodiment, X represents alanine (A) or threonine (T). In another embodiment, Z represents asparagine (N), isoleucine (I), or serine (S). In another embodiment, J may represent a basic amino acid. Preferably, J represents histidine (H) or arginine (R). A variant of SEQ ID NO: 7 may comprise any selection of these amino acids at positions X, Z and / or J. For example, SEQ ID NOs: 21, 22 and 23 are variants of SEQ ID NO: 7. Therefore, in a preferred embodiment, the polypeptide of the present invention may comprise one or more amino acid sequences according to SEQ ID NOs: 7, 21, 22 and / or 23.

[0060] SEQ ID NO: 7 FLVDKNPHNTXESRLVVDFSQFSRGZTJVSWPKFAVPNLQSL

SEQ ID NO: 21 FLVDKNPHNTAESRLVVDFSQFSRGNTRVSWPKFAVPNLQSL

SEQ ID NO: 22 FLVDKNPHNTTESRLVVDFSQFSRGITRVSWPKFAVPNLQSL

SEQ ID NO: 23 FLVDKNPHNTTESRLVVDFSQFSRGSTHVSWPKFAVPNLQSL
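The preferred residues at positions X, Z and J define a small, finite set of sequences encompassed by SEQ ID NO: 7. A minimal sketch of that enumeration follows; it assumes each placeholder is filled independently with one of the residues named above, and the names `OPTIONS` and `enumerate_variants` are hypothetical.

```python
from itertools import product

SEQ_ID_7 = "FLVDKNPHNTXESRLVVDFSQFSRGZTJVSWPKFAVPNLQSL"

# Residues named for each placeholder: X = A or T; Z = N, I or S; J = H or R.
OPTIONS = {"X": "AT", "Z": "NIS", "J": "HR"}


def enumerate_variants(template, options):
    """Yield every sequence obtained by filling each placeholder with one allowed residue."""
    positions = [i for i, ch in enumerate(template) if ch in options]
    choices = [options[template[i]] for i in positions]
    for combo in product(*choices):
        residues = list(template)
        for i, amino_acid in zip(positions, combo):
            residues[i] = amino_acid
        yield "".join(residues)


variants = list(enumerate_variants(SEQ_ID_7, OPTIONS))
print(len(variants), len(variants[0]))
```

With two choices at X, three at Z and two at J, the template expands to 2 × 3 × 2 = 12 sequences of 42 residues each. Variants defined only by a percent-identity threshold are not covered by this enumeration.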

[0061] Any other suitable variant is intended to be included, especially variants in which the amino acid variations have little to no effect on the binding of the polypeptide to its cognate HLA molecule, including variants having at least 70% sequence identity to SEQ ID NOs: 7, 21, 22 and / or 23. In one embodiment, there may be a variant having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, or 97% sequence identity to SEQ ID NOs: 7, 21, 22 and / or 23.

[0062] The polypeptide of the invention may further comprise SEQ ID NO: 1. In the context of SEQ ID NO: 1, X, Z, J, O, U, B, X2 and Z2 may represent different amino acids. In one embodiment, X represents cysteine (C) or serine (S). In another embodiment, Z represents a hydrophobic amino acid. Preferably, Z represents leucine (L) or methionine (M). In another embodiment, J represents proline (P) or serine (S). In another embodiment, O represents a hydrophobic amino acid. Preferably, O represents isoleucine (I) or methionine (M). In another embodiment, U represents a hydrophilic amino acid. Preferably, U represents lysine (K) or asparagine (N). In another embodiment, B represents a hydrophobic amino acid. Preferably, B represents isoleucine (I) or leucine (L). In another embodiment, X2 represents alanine (A) or threonine (T). In another embodiment, Z2 represents alanine (A) or serine (S). A variant of SEQ ID NO: 1 may comprise any selection of these amino acids at positions X, Z, J, O, U, B, X2, Z2 and / or J2. For example, SEQ ID NOs: 8, 9 and 10 are variants of SEQ ID NO: 1. Therefore, in a preferred embodiment, the polypeptide of the present invention may comprise one or more amino acid sequences according to SEQ ID NOs: 1, 8, 9 and / or 10.

[0063] SEQ ID NO: 1 FTQCGYPALMPLYACIQZ2KQAFTFSPTYKAFLXKQYZNLYPVARQRJGLCQVFADATPTGWGLAOGHQRMRGTFVZ2PLPIHTAELLAACFARSRSGAUBBGTDNSVVLSRKYTSFPWLLGCX2ANWILRGTS

[0064] SEQ ID NO: 8 FTQCGYPALMPLYACIQAKQAFTFSPTYKAFLCKQYLNLYPVARQRPGLCQVFADATPTGWGLAIGHQRMRGTFVAPLPIHTAELLAACFARSRSGAKLIGTDNSVVLSRKYTSFPWLLGCAANWILRGTS

SEQ ID NO: 9 FTQCGYPALMPLYACIQSKQAFTFSPTYKAFLSKQYMNLYPVARQRSGLCQVFADATPTGWGLAMGHQRMRGTFVSPLPIHTAELLAACFARSRSGANLIGTDNSVVLSRKYTSFPWLLGCTANWILRGTS

SEQ ID NO: 10 FTQCGYPALMPLYACIQSKQAFTFSPTYKAFLSKQYMNLYPVARQRSGLCQVFADATPTGWGLAMGHQRMRGTFVSPLPIHTAELLAACFARSRSGANILGTDNSVVLSRKYTSFPWLLGCTANWILRGTS

[0065] Any other suitable variant is intended to be included, especially variants in which the amino acid variations have little to no effect on the binding of the polypeptide to its cognate HLA molecule, including variants having at least 70% sequence identity to SEQ ID NOs: 1, 8, 9 and / or 10. In one embodiment, there may be a variant having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NOs: 1, 8, 9 and / or 10.

[0066] The polypeptide may comprise any other sequence disclosed herein, including any of those according to or encompassed by SEQ ID NOs: 2, 3, 4, and 6. In the context of SEQ ID NO: 2, X may represent different amino acids. In one embodiment, X represents a hydrophilic amino acid. X may represent a basic amino acid. Preferably, X represents lysine (K) or arginine (R). A variant of SEQ ID NO: 2 may comprise any hydrophilic amino acid at position X. For example, SEQ ID NOs: 11 and 12 are variants of SEQ ID NO: 2. Therefore, in a preferred embodiment, the polypeptide of the present invention may comprise one or more amino acid sequences according to SEQ ID NOs: 2, 11, and / or 12.

[0067] SEQ ID NO: 2 LGPLLVLQAGFFLLTXILTIPQSLDSWWTSL

[0068] SEQ ID NO: 11 LGPLLVLQAGFFLLTRILTIPQSLDSWWTSL

[0069] SEQ ID NO: 12 LGPLLVLQAGFFLLTKILTIPQSLDSWWTSL

[0070] Any other suitable variant is intended to be included, especially variants in which the amino acid variations have little to no effect on the binding of the polypeptide to its cognate HLA molecule, including variants having at least 70% sequence identity to SEQ ID NOs: 2, 11, and / or 12. In one embodiment, there may be a variant having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, or 96% sequence identity to SEQ ID NOs: 2, 11, and / or 12.

[0071] In the context of SEQ ID NO: 3, X, Z, J, O and X2 may represent different amino acids. In one embodiment, X represents a hydrophilic amino acid. Preferably, X represents lysine (K) or glutamine (Q). In another embodiment, Z represents a hydrophobic amino acid. Preferably, Z represents valine (V) or leucine (L). In another embodiment, J represents a hydrophilic amino acid. Preferably, J represents serine (S) or threonine (T). In another embodiment, O represents a hydrophilic (basic) amino acid. Preferably, O represents histidine (H) or arginine (R). In another embodiment, X2 represents a hydrophilic amino acid. Preferably, X2 represents lysine (K) or asparagine (N). A variant of SEQ ID NO: 3 may comprise any hydrophilic amino acid at position X, any hydrophobic amino acid at position Z, any hydrophilic amino acid at position J, any hydrophilic amino acid at position O, and / or any hydrophilic amino acid at position X2. For example, SEQ ID NOs: 13, 14 and 15 are variants of SEQ ID NO: 3. Therefore, in a preferred embodiment, the polypeptide of the present invention may comprise one or more amino acid sequences according to SEQ ID NOs: 3, 13, 14, and / or 15.

[0072] SEQ ID NO: 3 VGPLTVNEX2RRLXLIMPARFYPNZTKYLPLDKGIKPYYPEHZVNHYFQTRHYLHTLWKAGILIYKREJTOSASFCGSPYSWEQ

SEQ ID NO: 13 VGPLTVNEKRRLKLIMPARFYPNVTKYLPLDKGIKPYYPEHLVNHYFQTRHYLHTLWKAGILYKRETTRSASFCGSPYSWEQ

SEQ ID NO: 14 VGPLTVNENRRLQLIMPARFYPNLTKYLPLDKGIKPYYPEHWNHYFQTRHYLHTLWKAGILYKRESTRSASFCGSPYSWEQ

SEQ ID NO: 15 VGPLTVNENRRLQLIMPARFYPNLTKYLPLDKGIKPYYPEHWNHYFQTRHYLHTLWKAGILYKRETTHSASFCGSPYSWEQ

Any other suitable variant is intended to be included, especially variants in which the amino acid variations have little to no effect on the binding of the polypeptide to its cognate HLA molecule, including variants having at least 70% sequence identity to SEQ ID NOs: 3, 13, 14, and / or 15. In one embodiment, there may be a variant having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, or 98% sequence identity to SEQ ID NOs: 3, 13, 14, and / or 15.

[0073] In the context of SEQ ID NO: 4, X, Z and X2 may represent different amino acids. In one embodiment, X represents a hydrophobic amino acid, Z represents a hydrophilic amino acid, and / or X2 represents a hydrophilic amino acid. Preferably, X represents valine (V) or isoleucine (I). Preferably, Z represents glutamate (E) or aspartate (D). Preferably, X2 represents serine (S) or threonine (T). A variant of SEQ ID NO: 4 may comprise any hydrophobic amino acid at position X, any hydrophilic amino acid at position Z, and / or any hydrophilic amino acid at position X2. For example, SEQ ID NOs: 16, 17 and 18 are variants of SEQ ID NO: 4. Therefore, in a preferred embodiment, the polypeptide of the present invention may comprise one or more amino acid sequences according to SEQ ID NOs: 4, 16, 17, and / or 18.

[0074] SEQ ID NO: 4 MDIDPYKEFGAX2VELLSFLPSDFFPSXRDLLDTASALYRZAL

SEQ ID NO: 16 MDIDPYKEFGATVELLSFLPSDFFPSVRDLLDTASALYREAL

SEQ ID NO: 17 MDIDPYKEFGASVELLSFLPSDFFPSIRDLLDTASALYREAL

SEQ ID NO: 18 MDIDPYKEFGASVELLSFLPSDFFPSIRDLLDTASALYRDAL
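By way of illustration only, the preferred residues given for positions X, Z and X2 of SEQ ID NO: 4 can be enumerated programmatically. The following Python sketch expands the SEQ ID NO: 4 template into every combination of the preferred residues and checks that SEQ ID NOs: 16, 17 and 18 are among the results. The names `expand_template`, `PREFERRED` and `PLACEHOLDER` are hypothetical and are introduced for this example only; they form no part of the disclosure.

```python
import re
from itertools import product
from typing import Dict, List

# Template and variants as written in this specification.
SEQ_ID_4 = "MDIDPYKEFGAX2VELLSFLPSDFFPSXRDLLDTASALYRZAL"
SEQ_ID_16 = "MDIDPYKEFGATVELLSFLPSDFFPSVRDLLDTASALYREAL"
SEQ_ID_17 = "MDIDPYKEFGASVELLSFLPSDFFPSIRDLLDTASALYREAL"
SEQ_ID_18 = "MDIDPYKEFGASVELLSFLPSDFFPSIRDLLDTASALYRDAL"

# Preferred residues per placeholder, taken from the text on SEQ ID NO: 4.
PREFERRED: Dict[str, str] = {"X": "VI", "Z": "ED", "X2": "ST"}

# "X2" is listed first so that it is not read as "X" followed by "2".
PLACEHOLDER = re.compile(r"X2|X|Z")


def expand_template(template: str, choices: Dict[str, str]) -> List[str]:
    """Return every sequence obtained by filling each placeholder."""
    tokens = PLACEHOLDER.findall(template)   # placeholders, in order
    parts = PLACEHOLDER.split(template)      # fixed stretches between them
    variants = []
    for combo in product(*(choices[t] for t in tokens)):
        seq = parts[0]
        for residue, tail in zip(combo, parts[1:]):
            seq += residue + tail
        variants.append(seq)
    return variants


if __name__ == "__main__":
    for variant in expand_template(SEQ_ID_4, PREFERRED):
        print(variant)
```

Under these assumptions the template yields eight sequences of 42 amino acids each. SEQ ID NOs: 16, 17 and 18 correspond to the combinations (X2, X, Z) = (T, V, E), (S, I, E) and (S, I, D), respectively.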

[0075] Any other suitable variant is intended to be included, especially variants in which the amino acid variations have little to no effect on the binding of the polypeptide to its cognate HLA molecule, including variants having at least 70% sequence identity to SEQ ID NOs: 4, 16, 17, and / or 18. In one embodiment, there may be a variant having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, or 97% sequence identity to SEQ ID NOs: 4, 16, 17, and / or 18. Any suitable variant of SEQ ID NO: 6 may be present, especially variants in which the amino acid variations have little to no effect on the binding of the polypeptide to its cognate HLA molecule, including variants having at least 70% sequence identity to SEQ ID NO: 6. In one embodiment, there may be a variant having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, or 97% sequence identity to SEQ ID NO: 6.

[0076] SEQ ID NO: 6 KLHLYSHPIILGFRKIPMGVGLSPFLLAQFTSAICS

[0077] The polypeptide according to a first aspect of the invention preferably further comprises SEQ ID NO: 1. In another embodiment, the polypeptide comprises SEQ ID NO: 2. In another embodiment, the polypeptide comprises SEQ ID NO: 3. In another embodiment, the polypeptide comprises SEQ ID NO: 4. In another embodiment, the polypeptide comprises SEQ ID NO: 6.

[0078] Given the variants of the hotspot sequences according to SEQ ID NOs: 5 and / or 7, the polypeptide may comprise one or more of the amino acid sequences according to any of SEQ ID NOs: 8, 9, 10, 19, 20, 21, 22, 23, or a sequence having at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity thereto. A sequence may differ from any of SEQ ID NOs: 1 to 23 by at least one amino acid. Therefore, unless the sequence shares 100% identity with any of SEQ ID NOs: 1 to 23, a sequence differing by at least one amino acid may have up to about 96% sequence identity to any one of SEQ ID NOs: 2, 11, or 12; up to about 97% sequence identity to any one of SEQ ID NOs: 4, 16, 17, 18, 6, 7, 21, 22, 23, 5, 19, or 20; up to about 98% sequence identity to any one of SEQ ID NOs: 3, 13, 14, or 15; or up to about 99% sequence identity to any one of SEQ ID NOs: 1, 8, 9, or 10. Any number and combination of said sequences may be present to form a polypeptide of the present invention.
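The ceilings quoted above follow from sequence length: a single substitution in a sequence of n residues leaves at most (n - 1) / n identity. The Python sketch below illustrates this, assuming that identity is counted position by position over an ungapped alignment of equal-length sequences. That is a simplification and not necessarily the identity measure intended in this specification, and the helper names are hypothetical.

```python
# Sequences as written in this specification.
SEQ_ID_6 = "KLHLYSHPIILGFRKIPMGVGLSPFLLAQFTSAICS"
SEQ_ID_11 = "LGPLLVLQAGFFLLTRILTIPQSLDSWWTSL"
SEQ_ID_12 = "LGPLLVLQAGFFLLTKILTIPQSLDSWWTSL"
SEQ_ID_16 = "MDIDPYKEFGATVELLSFLPSDFFPSVRDLLDTASALYREAL"


def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions (assumed ungapped, equal length)."""
    if len(a) != len(b):
        raise ValueError("this simplified measure needs equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def single_change_ceiling(seq: str) -> float:
    """Highest identity still possible after one substitution in seq."""
    return 100.0 * (len(seq) - 1) / len(seq)


if __name__ == "__main__":
    for name, seq in [("SEQ ID NO: 11", SEQ_ID_11),
                      ("SEQ ID NO: 6", SEQ_ID_6),
                      ("SEQ ID NO: 16", SEQ_ID_16)]:
        print(f"{name}: {len(seq)} aa, ceiling {single_change_ceiling(seq):.2f}%")
    print(f"SEQ ID NO: 11 vs 12: {percent_identity(SEQ_ID_11, SEQ_ID_12):.2f}%")
```

For the 31-residue SEQ ID NO: 11 the ceiling is 30/31, or about 96.8%, and SEQ ID NOs: 11 and 12, which differ only at position X, share exactly that identity. The 36-residue SEQ ID NO: 6 and the 42-residue SEQ ID NO: 16 give about 97.2% and 97.6%, respectively, in line with the "about 97%" figure.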

[0079] In one embodiment, one or multiple copies of any particular sequence according to any of SEQ ID NOs: 1 to 7, and in particular any of SEQ ID NOs: 6 and 8 to 23 may be present. In a preferred embodiment, one or multiple copies of any particular sequence according to SEQ ID NOs: 5 and / or 7, and in particular any of SEQ ID NOs: 8, 9, 10, 19, 20, 21, 22, 23 may be present. Optionally, any of SEQ ID NOs: 8 to 23 may be present. For example, there may be at least one, at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, or more than ten copies of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, and / or SEQ ID NO: 23 present in the polypeptide of the present invention.

[0080] In addition, there may be any combination of one or multiple different sequences according to SEQ ID NOs: 8, 9, 10, 19, 20, 21, 22, 23. In a preferred embodiment, the polypeptide may comprise any of SEQ ID NOs: 1, 2, 3, 4, 5, 6, and / or 7. In a more preferred embodiment, the polypeptide may comprise any of SEQ ID NOs: 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 and / or 23. In a more preferred embodiment, the polypeptide may comprise any of SEQ ID NOs: 8, 9, 10, 19, 20, 21, 22 and / or 23. In an even more preferred embodiment, the polypeptide may comprise any of SEQ ID NOs: 8, 9, 20 and / or 22. In an even more preferred embodiment, the polypeptide may comprise any of SEQ ID NOs: 8, 20 and / or 22. In an even more preferred embodiment, the polypeptide may comprise any of SEQ ID NOs: 20 and / or 22, preferably SEQ ID NOs: 20 and 22.

[0081] The inventors of the present invention utilised a Greedy Hill Climbing (GHC) algorithm to assess optimal (sub)sets of refined hotspots with maximised population coverage (Figures 11A and 11B and Table 3). This analysis revealed that, of all 17 refined hotspot sequences (i.e., SEQ ID NOs: 6 and 8 to 23, encompassed by SEQ ID NOs: 1 to 7), SEQ ID NOs: 20 and 22 achieved significant population coverage, averaging 82.45% and 81.71%, respectively. SEQ ID NOs: 8 and 9 achieved a population coverage of 87.16% and 87.33%, respectively. Generally, the incorporation of multiple sequences in combination increased the average population coverage even further; however, the inventors have identified optimal combinations of sequences which provide substantial population coverage (Table 3). Figures 13 to 17 show a schematic representation of these estimated population coverages.
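The inventors' GHC implementation is not reproduced in this passage. Purely as an illustration of the general technique, the following Python sketch performs a greedy hill climb over a toy data set: at each step it adds the candidate sequence that most increases the fraction of individuals carrying at least one HLA allele bound by the selected set, and it stops when no candidate improves coverage. The cohort, allele labels, binding sets and hotspot names are all invented for the example, and the coverage measure is an assumed simplification.

```python
from typing import Dict, List, Set, Tuple


def coverage(selected: List[str],
             binders: Dict[str, Set[str]],
             cohort: List[Set[str]]) -> float:
    """Fraction of individuals with at least one allele bound by the selection."""
    bound: Set[str] = set()
    for name in selected:
        bound |= binders[name]
    hits = sum(1 for alleles in cohort if alleles & bound)
    return hits / len(cohort)


def greedy_hill_climb(binders: Dict[str, Set[str]],
                      cohort: List[Set[str]],
                      max_size: int) -> Tuple[List[str], float]:
    """Add the best-scoring candidate until coverage stops improving."""
    selected: List[str] = []
    best = 0.0
    while len(selected) < max_size:
        # Sorted for a deterministic tie-break (first name wins).
        candidates = sorted(n for n in binders if n not in selected)
        if not candidates:
            break
        scores = {n: coverage(selected + [n], binders, cohort) for n in candidates}
        top = max(scores, key=scores.get)
        if scores[top] <= best:
            break  # local optimum: no single addition helps
        selected.append(top)
        best = scores[top]
    return selected, best


# Invented toy data: which alleles each hypothetical hotspot binds,
# and which alleles each of five hypothetical individuals carries.
TOY_BINDERS = {
    "hotspot_a": {"allele_1", "allele_2"},
    "hotspot_b": {"allele_2", "allele_3"},
    "hotspot_c": {"allele_4"},
}
TOY_COHORT = [
    {"allele_1", "allele_3"},
    {"allele_2", "allele_4"},
    {"allele_3", "allele_4"},
    {"allele_4", "allele_5"},
    {"allele_5", "allele_6"},
]

if __name__ == "__main__":
    print(greedy_hill_climb(TOY_BINDERS, TOY_COHORT, max_size=3))
```

With this toy data the climb selects hotspot_b and then hotspot_c, reaching 80% coverage, and stops because adding hotspot_a gives no further gain. A greedy search of this kind finds a local optimum and is not guaranteed to find the best possible subset.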

[0082] Therefore, in a most preferred embodiment, the polypeptide comprises a sequence according to SEQ ID NO: 5 and / or 7, optionally further comprising a sequence according to SEQ ID NO: 1.

[0083]

[0084] Table 3. Preferred sequences and combinations thereof and their associated average population coverage.

[0085] In another preferred embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 5. The polypeptide may comprise a sequence according to SEQ ID NOs: 19 and / or 20. In a most preferred embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 20.

[0086] In another preferred embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 7. The polypeptide may comprise a sequence according to SEQ ID NO: 21, 22 and / or 23. In a most preferred embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 22.

[0087] Optionally, the polypeptide comprises a sequence according to SEQ ID NO: 5 and / or 7, and further comprises a sequence according to SEQ ID NO: 1. Therefore, the polypeptide may comprise a sequence according to SEQ ID NOs: 8, 9 and / or 10. In a most preferred embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 8.

[0088] In another embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 2. The polypeptide may comprise a sequence according to SEQ ID NOs: 11 and / or 12. In a most preferred embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 12.

[0089] In another embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 3. The polypeptide may comprise a sequence according to SEQ ID NOs: 13, 14 and / or 15. In a most preferred embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 14.

[0090] In another embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 4. The polypeptide may comprise a sequence according to SEQ ID NOs: 16, 17 and / or 18. In a most preferred embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 18.

[0091] In another embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 16.

[0092] In another embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 6.

[0093] In another embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 9. In another embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 13.

[0094] In another embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 21.

[0095] In another embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 15.

[0096] In another embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 11.

[0097] In another embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 17.

[0098] In another embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 19.

[0099] In another embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 23.

[0100] In another embodiment, the polypeptide may comprise a sequence according to SEQ ID NO: 10.

[0101] The one or more sequences that are present in any combination to form the polypeptide may, or may not be, joined or linked together. In one embodiment, the sequences may be present separately or independently from one another, and may be present together, albeit not physically joined, as a composition. However, in another embodiment, the sequences may be present in a manner such that they are physically joined together. As such, the polypeptide may comprise a concatemer of sequences having the amino acid sequences of any of SEQ ID NOs: 1 to 23 (i.e., any of SEQ ID NOs: 1 to 7 or variants thereof). The term “concatemer” is used to refer to a linear arrangement of repeated sequences that are connected in a continuous chain, with or without spacers as required. The skilled person would understand that the polypeptide of the invention may alternatively comprise combinations of sequences selected from SEQ ID NOs: 1 to 23, preferably 5 and / or 7, wherein certain sequences are repeated. Thus, the polypeptide of the invention may comprise a combination of concatenated and non-concatenated sequences. Concatenated polypeptides of the invention may comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more repeats of any of the sequences of the invention, as would be appreciated by the person of skill in the art. This applies to all embodiments of the invention.

[0102] The polypeptide may further comprise one or more spacer regions; preferably wherein when the polypeptide comprises more than one of the amino acid sequences of SEQ ID NOs: 1 to 7, preferably SEQ ID NO: 5 and / or 7 or the variants thereof, optionally further comprising SEQ ID NO: 1 or the variants thereof (i.e., any of SEQ ID NOs: 8, 9, 10, 19, 20, 21, 22 and / or 23) encoded by a hotspot sequence, the one or more spacer regions are positioned between one or more of each amino acid sequence encoded by a hotspot sequence. Alternatively, spacer regions may be positioned between only selected amino acid sequences as would be determined according to the specific polypeptide design, as explained above. Therefore, the overall polypeptide may comprise spacer regions between some of its constituent amino acid sequences and no spacer region between some of its other constituent amino acid sequences. The term “spacer regions” is used to describe additional stretches of amino acids between the amino acid sequences. These may be important to avoid additional off-target or autoimmune sequences being created at the junction between the intended amino acid sequences. The spacer regions of the polypeptide of the invention may be present in instances of either one or multiple of the sequences of SEQ ID NOs: 5 and / or 7 or variants thereof, optionally further comprising SEQ ID NO: 1 (i.e., SEQ ID NOs: 8, 9, 10, 19, 20, 21, 22 and / or 23 or any other sequence having at least 70% identity to SEQ ID NOs: 5 and / or 7 and optionally SEQ ID NO: 1) comprising the polypeptide of the invention. The inclusion of spacer regions in the polypeptide of the invention is applicable to all embodiments of the invention. The order of the sequences of the polypeptide can be tailored to optimise the immunogenicity of the polypeptide, whilst minimising the creation of off-target or autoimmune sequences.
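As a sketch of how junctions might be inspected when ordering sequences and placing spacers, the following Python example joins SEQ ID NO: 6 and SEQ ID NO: 16, with and without a spacer, and lists every peptide window that contains residues from both flanking sequences. The spacers `GGS` and `GGSGGSGG` and the 9-residue window are assumptions chosen for illustration (HLA class I ligands are typically 8 to 11 residues long); neither is specified in this passage, and the function names are hypothetical. Screening the listed windows against self-proteins would be a separate step that is not shown.

```python
from typing import List, Sequence

# Sequences as written in this specification.
SEQ_ID_6 = "KLHLYSHPIILGFRKIPMGVGLSPFLLAQFTSAICS"
SEQ_ID_16 = "MDIDPYKEFGATVELLSFLPSDFFPSVRDLLDTASALYREAL"


def concatenate(sequences: Sequence[str], spacer: str = "") -> str:
    """Join sequences in the given order, placing the spacer between them."""
    return spacer.join(sequences)


def bridging_windows(left: str, spacer: str, right: str, k: int = 9) -> List[str]:
    """All k-residue windows holding at least one residue of left AND of right."""
    joined = left + spacer + right
    # A window starting at i covers [i, i + k). It bridges the junction when it
    # starts inside `left` and ends inside `right`.
    first = max(0, len(left) + len(spacer) - k + 1)
    last = min(len(left) - 1, len(joined) - k)
    return [joined[i:i + k] for i in range(first, last + 1)]


if __name__ == "__main__":
    # Hypothetical spacers; the empty string means direct fusion.
    for spacer in ("", "GGS", "GGSGGSGG"):
        windows = bridging_windows(SEQ_ID_6, spacer, SEQ_ID_16)
        print(f"spacer {spacer!r}: {len(windows)} bridging 9-mers", windows)
```

Under these assumptions, direct fusion creates eight 9-mers that mix residues from both sequences, a three-residue spacer reduces this to five, and a spacer of eight or more residues leaves none. Windows that overlap the spacer itself are still newly created peptides, so the choice of spacer composition matters as well as its length.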

[0103] For the avoidance of doubt, the polypeptide of the invention, according to any embodiment, may comprise at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, or more than ten sequences having the amino acid sequences of any of SEQ ID NOs: 1 to 7, preferably SEQ ID NOs: 5 and / or 7, or any variant thereof having at least 70% sequence identity thereto, optionally further comprising SEQ ID NO: 1, including any of SEQ ID NOs: 8, 9, 10, 19, 20, 21, 22 and / or 23. As described herein, it will be appreciated that any number and / or combination of these amino acid sequences may be utilised.

[0104] As would be understood by the skilled person, the polypeptide of the invention may preferably comprise a single amino acid sequence selected from SEQ ID NOs: 5 and / or 7 or any variant thereof having at least 70% sequence identity thereto, optionally further comprising SEQ ID NO: 1, including any of SEQ ID NOs: 8, 9, 10, 19, 20, 21, 22 and / or 23, or a chain of one or more amino acid sequences selected from SEQ ID NOs: 5 and / or 7 or any variant thereof having at least 70% sequence identity thereto, wherein within this chain each single amino acid sequence is no longer than 150 amino acids in length. In an embodiment of the invention, each of the one or more variant sequences of the polypeptide has at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to SEQ ID NOs: 5 and / or 7. For SEQ ID NOs: 1 to 7, a sequence differing by at least one amino acid may have up to about 96% sequence identity to SEQ ID NO: 2; up to about 97% sequence identity to any one of SEQ ID NOs: 4, 5, 6, or 7; up to about 98% sequence identity to SEQ ID NO: 3; or up to about 99% sequence identity to SEQ ID NO: 1. The skilled person would understand that the polypeptide of the invention may therefore comprise combinations of amino acid sequences selected from SEQ ID NOs: 1 to 7, and / or SEQ ID NOs: 8 to 23, or a sequence having up to about 96% sequence identity to any one of SEQ ID NOs: 2, 11, or 12; up to about 97% sequence identity to any one of SEQ ID NOs: 4, 16, 17, 18, 6, 7, 21, 22, 23, 5, 19, or 20; up to about 98% sequence identity to any one of SEQ ID NOs: 3, 13, 14, or 15; or up to about 99% sequence identity to any one of SEQ ID NOs: 1, 8, 9, or 10. This means that the polypeptide may retain its intended qualities and uses thereof even if the amino acid sequence changes. Substitutions, deletions, and insertions of amino acids may comprise potential manipulations of the polypeptide of the invention.

[0105] The GHC analysis conducted by the inventors of the present invention identified some optimal combinations of amino acid sequences for their estimated percentage population coverage.

[0106] In a second aspect of the invention, there is provided a polynucleotide encoding a polypeptide according to the first aspect of the invention. The polynucleotide may be DNA, RNA, or mRNA. The term “polynucleotide” refers to a polymer whose molecule is composed of many nucleotide units, constituting a section of a nucleic acid molecule. The terms “DNA”, “RNA” and “mRNA” retain their meaning as used in the art and would be well understood by the skilled person.

[0107] In a third aspect of the invention, there is provided a vector comprising a polynucleotide according to the second aspect of the invention, wherein the vector further comprises regulatory elements capable of driving transcription and / or translation of the polynucleotide in a host cell. By “vector” we intend any vehicle that is capable of supporting (i.e., carrying, delivering, encapsulating, incorporating and / or protecting) the polynucleotide of the invention and facilitating stable or transient transfection of a target host cell with the polynucleotide of the invention. The vector may also include the indicated regulatory elements to drive expression of a polypeptide encoded by the polynucleotide in the host cell. Suitable vectors may be, for example, plasmids, viral particles (e.g., lentiviruses, adenoviruses, adeno-associated viruses (AAVs) and any number of other viruses, as would be understood by a person of skill in the art), nanoparticles, lipid nanoparticles etc. The vector may be a naked DNA or RNA (i.e., mRNA) such as a plasmid or a virus, among other vectors used in the art and that would be familiar to the skilled person. The term “regulatory elements” refers to nucleotide sequences of genes that are involved in regulation of transcription and / or translation. In preferred embodiments, the vector is a viral vector. An example of a vector suitable for the invention is a lentivirus, capable of inserting DNA or RNA into host cell genomes. Examples may also include one or more adenovirus vectors, vesicular stomatitis virus vectors, influenza virus vectors or measles virus vectors. When viral particles are used as vectors, they are typically modified from their wild-type form to remove viral antigen encoding nucleic acid and / or to prevent viral replication in the host cell.

[0108] In other preferred embodiments, the vector is a nanoparticle. The nanoparticle may be a lipid nanoparticle or a polymer nanoparticle. The polynucleotide of the second aspect of the invention or vector of the third aspect of the invention may be formulated within a nanoparticle for delivery to and subsequent transfection of a target host cell, such as a cell of a subject suffering from an HBV-related disease or HBV infection. For example, there may be the delivery of mRNA via a lipid nanoparticle. The polypeptide of the first aspect of the invention may be formulated within a nanoparticle.

[0109] In a fourth aspect of the invention, there is provided a microorganism comprising a polypeptide according to the first aspect of the invention, or a polynucleotide according to the second aspect of the invention. The term “microorganism” refers to a microscopic organism, such as a bacterium, virus, or fungus. Microorganisms may be used to further propagate the polynucleotide or polypeptide of the invention or to be administered as part of therapeutic or prophylactic use in order to deliver the polynucleotide or polypeptide to a specific tissue or cell type such as an antigen presenting cell. In a preferred embodiment, the microorganism is a bacterial microorganism. A microorganism comprising a polypeptide of the invention may be suited to industrial production of the polypeptide, and may include, but is not limited to, Escherichia coli, Bacillus subtilis, Streptomyces spp., Corynebacterium glutamicum, Pseudomonas putida, Clostridium spp. and Lactobacillus. The microorganism of the invention may also be genetically modified. The polypeptide of the present invention may be produced recombinantly within microorganisms or produced via synthetic means. The skilled person would be aware of both recombinant and synthetic methods (e.g., chemical or de novo synthesis) of polypeptide synthesis. The invention also provides a polypeptide according to the first aspect of the invention, or a polynucleotide according to the second aspect of the invention, or a vector according to the third aspect of the invention, formulated in a nanoparticle or lipid formulation. The polypeptide may be encapsulated in a lipid formulation. The term “nanoparticle” refers to a nanoscale particle between 1 and 100 nanometres in diameter, often with a very high surface area to volume ratio. Formulation in a nanoparticle may improve delivery of the polynucleotide or vector of the invention to a subject and / or the immune system of a subject. The nanoparticle may be a lipid nanoparticle, designed to facilitate encapsulation and delivery of the polynucleotide or vector of the invention.

[0110] In a fifth aspect of the invention, there is provided a composition comprising a polypeptide as described above, or one or more of the amino acid sequences comprising SEQ ID NOs: 5 and / or 7 or a variant thereof having at least 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity thereto, optionally further comprising SEQ ID NO: 1, including any of SEQ ID NOs: 8, 9, 10, 19, 20, 21, 22 and / or 23, each of said amino acid sequences being no more than 150 amino acids in length, or one or more polynucleotides encoding one or more of said polypeptides / amino acid sequences. There may be one or more, two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, or more than ten amino acid sequences present. The amino acid sequences may be present in any combination and / or be present as repeated copies. The composition of the present invention may comprise one or more of said amino acid sequences that are joined together to form a larger overall polypeptide, wherein there may or may not be spacer sequences present. In this embodiment, the overall polypeptide is to be no longer than 1500, 1400, 1300, 1200, 1100, 1000, 950, 900, 850, 800, 750, 700, 650, 600, 550, 500, 450, 400, 350, 300, 250, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, or 30 amino acids in length, and each of the one or more constituent amino acid sequences within the overall polypeptide may be no more than 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, or 30 amino acids in length. In other embodiments, the amino acid sequences present in the composition may not be joined together but may be separate, or free to move relative to one another within the composition. For example, the composition may be composed of multiple separate copies of any one of SEQ ID NOs: 5 and / or 7, or may be composed of multiple separate copies of mixtures of SEQ ID NOs: 5 and / or 7.

[0111] In a sixth aspect of the invention, there is provided a vaccine composition comprising a polypeptide, polynucleotide, vector, microorganism or composition according to the first, second, third, fourth, or fifth aspects of the invention. In a preferred embodiment of the invention, the vaccine composition is an RNA vaccine. In a more preferred embodiment, the vaccine composition is an mRNA vaccine. In a further embodiment of the invention, the vaccine composition comprises a pharmaceutically acceptable carrier, diluent, excipient and / or adjuvant. In yet a further embodiment of the invention, the vaccine composition is formulated for parenteral, oral, sublingual, nasal, naso-oral, or pulmonary administration. In a still further embodiment of the invention, said parenteral administration is subcutaneous, intradermal, intramuscular, subdermal, intraperitoneal, or intravenous administration.

[0112] The term “mRNA vaccine” describes a vaccine that uses a copy of an mRNA molecule to produce an immune response. The vaccine delivers molecules of antigen-encoding mRNA into host immune cells, which use the designed mRNA as a blueprint to build foreign protein, or fragment of a protein, or multiple fragments from the same or different proteins, that would normally be produced by a pathogen such as a virus. These protein molecules stimulate an adaptive immune response that teaches the body to identify and destroy the corresponding pathogen. In an embodiment of the invention, the foreign protein(s), or fragments thereof, originate from the hepatitis B virus. The mRNA may be delivered by a co-formulation of the mRNA encapsulated in lipid nanoparticles. The mRNA may also be formulated for nasal or intratracheal delivery via a nanoparticle-delivery-based system. This system may comprise a biodegradable poly(amine-co-ester) polymer that forms polyplexes with mRNA so that the vaccine is inhalable.

[0113] The terms “vaccine composition” or “vaccine” relate to a biological preparation that induces active acquired immunity to a particular infectious disease, in this case a HBV infection. Typically, the vaccine contains an agent, or “foreign” agent, that resembles the infection-causing pathogen, or part of the infection-causing pathogen, which within the prior art has often been a weakened or killed form of said pathogen, or recombinant protein, protein fragment or protein fragments from such pathogen, or polynucleotide encoding such a protein, protein fragment or fragments (Williamson et al. 1995, FEMS Immunology and Medical Microbiology 12 (3-4): 223-230). Such a foreign agent, protein, protein fragment or protein fragments would be recognised by a vaccine-receiver’s immune system, which in turn would destroy said agent and develop “memory” against the pathogen, inducing a level of lasting protection against future infections from the same or similar pathogenic sub-species. Through the route of vaccination, including those vaccine compositions of the present invention, it is envisaged that the resulting immune response in the recipient would not only resolve the ongoing chronic infection, but would also through the induction of immune memory protect the recipient against future exposure to the same pathogen. The vaccine composition may comprise an adjuvant, a pharmaceutically acceptable carrier or excipient.

[0114] The phrase "pharmaceutically acceptable" refers to molecular entities and compositions that do not produce an adverse, allergic or other untoward reaction when administered to a human, as appropriate. The preparation of a pharmaceutical composition that contains the vaccine composition of the present invention will be known to those of skill in the art in light of the present disclosure. Moreover, for human administration, it will be understood that preparations should meet sterility, pyrogenicity, general safety and purity standards. Examples include, but are not limited to, disodium hydrogen phosphate, soya peptone, potassium dihydrogen phosphate, ammonium chloride, sodium chloride, magnesium sulphate, calcium chloride, sucrose, borate buffer, sterile saline solution (0.9% NaCl) and sterile water.

[0115] Suitable aqueous and non-aqueous carriers that may be employed in the vaccine compositions of the invention include water, ethanol, polyols (such as glycerol, propylene glycol, polyethylene glycol, and the like), and suitable mixtures thereof, vegetable oils, such as olive oil, and injectable organic esters, such as ethyl oleate. Proper fluidity can be maintained, for example, by the use of coating materials, such as lecithin, by the maintenance of the required particle size in the case of dispersions, and by the use of surfactants.

[0116] As used herein, "pharmaceutically acceptable carrier" includes any and all solvents, dispersion media, coatings, surfactants, antioxidants, preservatives (e.g., antibacterial agents, antifungal agents), isotonic agents, absorption delaying agents, salts, preservatives, drugs, drug stabilizers, gels, binders, excipients, disintegration agents, lubricants, sweetening agents, flavouring agents, dyes, such like materials and combinations thereof, as would be known to one of ordinary skill in the art (see, for example, Remington's Pharmaceutical Sciences, 18th Ed. Mack Printing Company, 1990, pp. 1289-1329).

[0117] Examples of adjuvants which may be effective include but are not limited to: unmethylated cytosine-guanine dinucleotide (CpG) motifs, granulocyte-macrophage colony-stimulating factor (GM-CSF), aluminium hydroxide, N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as nor-MDP), N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine (CGP 19835A, referred to as MTP-PE), and RIBI, which contains three components extracted from bacteria, monophosphoryl lipid A, trehalose dimycolate and cell wall skeleton (MPL+TDM+CWS) in a 2% squalene / Tween 80 emulsion. Further examples of adjuvants and other agents include aluminium hydroxide, aluminium phosphate, aluminium potassium sulfate (alum), beryllium sulfate, silica, kaolin, carbon, water-in-oil emulsions, oil-in-water emulsions, muramyl dipeptide, bacterial endotoxin, lipid X, Corynebacterium parvum (Propionibacterium acnes), Bordetella pertussis, polyribonucleotides, sodium alginate, lanolin, lysolecithin, vitamin A, saponin, liposomes, levamisole, DEAE-dextran, block copolymers or other synthetic adjuvants. Such adjuvants are available commercially from various sources, for example, Merck Adjuvant 65 (Merck and Company, Inc., Rahway, N. J.) or Freund's Incomplete Adjuvant and Complete Adjuvant (Difco Laboratories, Detroit, Mich.).

[0118] Thus, in some embodiments of the invention the composition or vaccine composition may further comprise a pharmaceutically acceptable carrier, diluent, excipient and / or adjuvant. In a preferred embodiment, the composition may further comprise an adjuvant.

[0119] In a seventh aspect of the invention, there is provided a polypeptide, polynucleotide, vector, microorganism, composition or vaccine composition according to the first, second, third, fourth, fifth or sixth aspects of the invention for therapeutic or prophylactic use. The polypeptide, polynucleotide, vector, microorganism, or composition according to the first, second, third, fourth, fifth or sixth aspects of the invention may be for use in medicine.

[0120] In an eighth aspect of the invention, there is provided a polypeptide, polynucleotide, vector, microorganism, composition or vaccine composition according to the first, second, third, fourth, fifth, sixth or seventh aspects of the invention, for use in the treatment or prophylaxis of a HBV-related disease or HBV infection.

[0121] In a ninth aspect of the invention, there is provided a method of treatment comprising administering to a subject the polypeptide, polynucleotide, vector, microorganism, composition or vaccine composition according to the first, second, third, fourth, fifth or sixth aspects of the invention, respectively.

[0122] In a tenth aspect of the invention, there is provided a use of the polypeptide, polynucleotide, vector, microorganism, composition or vaccine composition according to the first, second, third, fourth, fifth or sixth aspects of the invention, respectively, in the manufacture of a medicament. The medicament is preferably for therapeutic use, more preferably for the treatment or prophylaxis of a HBV-related disease, the HBV-related disease being, more preferably, HBV-related liver cirrhosis or HBV-related liver cancer.

[0123] The terms “treatment” and “prophylaxis” are to be used interchangeably with the terms “therapeutic treatment” and “prophylactic treatment”, respectively. It is envisaged that the polypeptide, polynucleotide, vector, microorganism, composition or vaccine composition in any aspect of the present invention may be used against any HBV infection. The term “prophylactic treatment”, as used herein, refers to a medical procedure whose purpose is to prevent or reduce the morbidity or duration of (rather than treat or cure) a disease, such as an infection. In contrast, the term “therapeutic treatment” refers to a medical procedure with the purpose of treating or curing a viral infection or the associated symptoms thereof, as would be appreciated within the art.

[0124] It is envisaged that the polypeptide, polynucleotide, vector, microorganism, composition or vaccine composition of the present invention may aid in the therapeutic or prophylactic treatment of a HBV-related disease. In some embodiments, the HBV-related disease is chronic. In other embodiments, the HBV-related disease is acute. In other embodiments, the HBV-related disease is acute and progresses to chronic. The HBV-related disease may include, but is not limited to, hepatitis, HBV-related liver cirrhosis and HBV-related liver cancer. In one embodiment, the therapeutic or prophylactic treatment is for a human subject. However, any member of the animal kingdom suffering from an HBV-related disease may benefit from the therapeutic or prophylactic treatment as required.

[0125] It is envisaged that the polypeptide, polynucleotide, vector, microorganism, composition, or vaccine composition of the present invention may aid in the therapeutic or prophylactic treatment of a HBV infection or HBV-related disease in a human subject, wherein said composition comprises one or more epitopes of the present invention that are capable of stimulating a broad adaptive immune response across a variety of HLA types. The term “subject” refers to a person subjected to treatment, observation or experiment and would be understood by the skilled person as such.

[0126] The active acquired immunity that may be induced by the invention is expected to be predominantly cellular. The active acquired immunity that may be induced by the invention may also be humoral. Humoral immunity refers to a response involving B cells which produce antibodies that specifically bind to antigens, or any future antigens, corresponding to those within the administered vaccine composition. B cells, each expressing a unique B cell receptor (BCR), recognise antigens in their native form. Upon this recognition and further interaction with other cells of the immune system, the activated B cell can differentiate into a plasma cell specialised to secrete antibodies against the encountered antigen. The term antibody refers to an immunoglobulin (Ig) that is produced by the immune system to specifically identify and neutralise foreign antigens. A subset of these B-cell derived plasma cells become long-lived antigen-specific memory B cells, as would be well understood by the skilled person.

[0127] Cellular immunity, meanwhile, can be broken into two distinct arms. The first involves helper T cells, or CD4+ T cells, which produce cytokines and orchestrate the activity of other immune cells in the immune response. The second involves killer T cells, also known as cytotoxic T lymphocytes (CTLs), or CD8+ T cells, which are cells capable of recognising antigens / epitopes presented by HLA and of eradicating virally or bacterially infected host cells or cancerous or otherwise diseased host cells. In contrast to B cells, T cells only recognise antigens that have been processed into peptides, loaded onto a major histocompatibility complex (MHC) molecule and presented at the cell surface. CD4+ T cells interact with MHC Class II molecules, and are responsible for orchestrating the immune response, recognising foreign antigens, activating various parts of the immune system and activating B cells and CD8+ T cells. CD8+ T cells interact with MHC Class I molecules and play a role in mounting an immune response against intracellular pathogens. As would be understood by the skilled person, on resolution of the infection, a subset of both CD8+ T cells and CD4+ T cells may remain as memory T cells, contributing to the acquired adaptive immunity, and allowing for a faster and stronger response to any secondary infection from the same foreign body (Bonilla & Oettgen 2010, Journal of Allergy and Clinical Immunology 125:33-40).

[0128] The ability to activate CD4+ T cells is usually restricted to professional antigen presenting cells (APCs) that express MHC Class II molecules. Parenchymal cells normally do not express MHC Class II molecules. However, it has been observed that in clinical hepatitis, hepatocytes exhibit aberrant MHC Class II expression, thus allowing infected hepatocytes to act as APCs presenting class II epitopes and activate CD4+ T cells (Herkel J et al, Hepatology. 2003;37(5): 1079-85). Therefore, infected hepatocytes represent a known exception to the rule that MHC Class II expression is generally limited to professional APCs. As such, Class II epitopes are beneficial when designing HBV vaccines.

[0129] The amino acid sequences of the present invention may contain epitopes of the hepatitis B virus. As used herein, the term “epitope” is given its usual meaning in the art and is used to denote potentially shorter amino acid sequences making up the polypeptide. The term “epitope” as used herein also refers to any part of an antigen that is specifically recognised by receptors on any T cells (T cell receptors), as is its natural meaning. The “epitope” may also refer to any part of an antigen that is specifically recognised by any antibodies or B cells. An “antigen” refers to a molecule capable of being bound by a receptor on a T cell, and may be comprised of one or more epitopes. An “antigen” may also refer to a molecule capable of being bound by an antibody or a B cell. As such, the terms epitope and antigen may be used interchangeably herein. Epitopes may also be referred to by the molecule to which they bind, such as “T cell epitopes”, or more specifically, “MHC Class I epitopes” or “MHC Class II epitopes”.

[0130] As such, the vaccine composition of the present invention may be an epitope-based vaccine, or in other words, is comprised of one or more epitopes. Epitope-based vaccines (EVs) make use of short antigen-derived peptides corresponding to immune epitopes, which are administered to trigger a protective cellular immune response. EVs may also potentially enhance humoral responses. EVs potentially allow for precise control over the immune response activation by focusing on the most relevant, immunogenic and conserved, antigenic regions. Experimental screening of large sets of peptides is time-consuming and costly; therefore, in silico methods that facilitate the identification of hotspots and T-cell epitope mapping of protein antigens are paramount for EV development. The prediction of T-cell epitopes focuses on the presentation of peptides at the infected cell surface by proteins encoded by the major histocompatibility complex (MHC).

[0131] The amino acid sequences that make up the polypeptide of the present invention may interact with MHC Class I and / or MHC Class II molecules to induce a CD8+ T cell and / or CD4+ T cell response, respectively. In a preferred embodiment of the present invention, there may be at least one amino acid sequence that interacts with MHC Class I, and at least one amino acid sequence that interacts with MHC Class II.

[0132] Thus, in some embodiments of the present invention, if the amino acid sequence comprises epitopes, there may be one or more epitopes present within each amino acid sequence encoded by the hotspot sequence. The one or more epitopes may have the same length, or same number of amino acids. In other embodiments, the one or more epitopes may differ in length, or the number of amino acids.

[0133] It is envisaged that the one or more amino acid sequences of the present invention are capable of stimulating a broad adaptive immune response across a plurality of human leukocyte antigen (HLA) types. In the context of the present invention, the term “plurality” is used to refer to “at least two”, or “two or more”. The human leukocyte antigen (HLA) complex is a set of genes encoding the MHC proteins in humans. Owing to the highly polymorphic nature of HLA genes, in which the term “polymorphic” refers to a high variability of different alleles, the precise MHC proteins of each human individual coded by varying HLA genes may differ to fine-tune the adaptive immune system. Many thousands of different alleles have been recognised for HLA molecules. As a result, each individual may have a unique “HLA type”, or “HLA genotype”, that differs across the global population, with a slight variability in the functioning of the immune system. The terms “HLA type”, “HLA allele”, or “HLA genotype” may be used interchangeably herein. HLA types are of particular significance when considering a vaccine comprised of epitopes that interact with MHC class I and / or class II molecules, as many epitopes are restricted in their capability of binding only particular HLA molecules encoded by particular HLA alleles, or in other words, are restricted to certain HLA types only. It would thus be appreciated by the skilled person that T cell epitopes that are capable of binding to a subject’s MHC Class I or MHC Class II molecules (and of being presented at the infected cell surface), compatible with said subject’s HLA type, would thus present as a robust vaccine. A vaccine composition consisting of the same T cell epitopes may not prove effective if given to a subject with a different HLA type, if said HLA type encodes MHC molecules that are not capable of binding or interacting with said T cell epitopes. Amino acid sequences comprising such epitopes would not be able to stimulate a broad adaptive immune response for MHC Class I and / or MHC Class II immunogenicity in that particular subject.

[0134] The polypeptides, polynucleotides, vectors, microorganisms, compositions, and vaccine compositions of the present invention, in contrast, are envisaged to be able to stimulate a broad adaptive immune response across a plurality of HLA types, including alleles such as HLA-A*24:02 and HLA-B*07:01. The HLA alleles as referenced herein are given contemporary HLA nomenclature as standard to the field, wherein HLA-A, for example, refers to the gene locus on chromosome 6, whilst HLA-A*24:02 refers to the protein the allele codes for. An in-depth explanation of the complexities of HLA nomenclature can be found in Marsh et al. 2010, Tissue Antigens 75(4): 291-455.

[0136] The amino acid sequences of the present invention were identified by predicting and selecting hotspot sequences present across multiple HBV genotypes that were enriched for epitopes that could be bound and presented by the most frequent Class I HLA alleles in the human population. These sequences were validated in healthy donor and HBV PBMCs to confirm their efficacy in stimulating T cell expansion. The platform used to identify and predict the one or more amino acid sequences (hotspots) that may make up the polypeptide of the present invention was surprisingly robust, as was its integrated statistical analysis.

[0137] Firstly, the most prevalent amino acid sequence for each protein and genotype was identified within the input data set. Next, the selected amino acid sequences of the proteins were split into peptides of length 9 and 10 by means of a sliding window. The important determinants of antigen presentation (AP) were assessed for each peptide of each protein and each genotype for their potential to be efficiently presented. These determinants consisted of: (1) the predicted binding affinity between the candidate peptide and 156 of the most frequent Class I HLA molecules in the human population, (2) the predicted potential of the candidate peptide to be efficiently processed by the antigen processing machinery of the host infected cell, and (3) the predicted probability of the candidate peptide to be presented on the host infected cell surface.
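By way of non-limiting illustration, the sliding-window step described above can be sketched as follows. This is a minimal sketch only: the function name `sliding_window_peptides` and the toy sequence are hypothetical and are not taken from the platform described herein.

```python
def sliding_window_peptides(sequence, lengths=(9, 10)):
    """Yield (start, peptide) for every window of each requested length.

    `start` is the 0-based position of the first residue of the peptide.
    """
    for k in lengths:
        for start in range(len(sequence) - k + 1):
            yield start, sequence[start:start + k]


# Hypothetical 12-residue toy sequence (not an HBV sequence).
toy_sequence = "ACDEFGHIKLMN"
peptides = list(sliding_window_peptides(toy_sequence))

for start, peptide in peptides:
    print(start, len(peptide), peptide)
```

For a protein of length L, this yields (L - 8) 9-mers and (L - 9) 10-mers, each of which can then be scored against each HLA allele under consideration.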

[0138] The AI prediction platform used was the NEC Immune Profiler (NIP), which provided the AI predictions for these key determinants, such as the AP scores (https://pubmed.ncbi.nlm.nih.gov/33361777/). The AP scores were calculated based on a set of 156 most frequent Class I HLA alleles (A and B) for all the possible combinations between an HLA allele and each epitope. As such, each amino acid within a considered protein sequence was assigned an AP score. Matrices with the AP scores at each amino acid position for each of the considered HLA alleles were generated for each protein and genotype. Hotspots were identified as protein regions consistently showing a higher-than-average AP score within each matrix, across all considered HLA alleles.
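One possible reading of the hotspot criterion above is illustrated in the following sketch. It is an assumption made for illustration, not the actual NIP procedure: the per-position AP score is averaged across alleles, and contiguous runs of positions whose average exceeds the overall matrix mean are reported. The function `find_hotspots`, the `min_len` parameter and the toy matrix are hypothetical.

```python
def find_hotspots(matrix, min_len=3):
    """Return contiguous (start, end) runs, end exclusive, of above-average positions.

    matrix[a][i] is the AP score (0 to 1) of amino acid position i for allele a.
    `min_len` is a hypothetical minimum run length, not a value from the source.
    """
    n_pos = len(matrix[0])
    # Average AP score of each position across all alleles.
    per_pos = [sum(row[i] for row in matrix) / len(matrix) for i in range(n_pos)]
    overall = sum(per_pos) / n_pos
    runs, start = [], None
    # A trailing sentinel closes any run that reaches the end of the protein.
    for i, value in enumerate(per_pos + [float("-inf")]):
        if value > overall and start is None:
            start = i
        elif value <= overall and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Toy matrix: 2 alleles x 8 positions (illustrative values only).
toy_matrix = [
    [0.1, 0.1, 0.8, 0.9, 0.7, 0.1, 0.2, 0.1],
    [0.1, 0.3, 0.6, 0.7, 0.9, 0.1, 0.0, 0.1],
]
hotspots = find_hotspots(toy_matrix)
print(hotspots)
```

In this toy example only positions 2 to 4 have an allele-averaged score above the matrix mean, so a single candidate region is reported.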

[0139] To assess the performance of the identified hotspots, “Digital Twin” (DT) simulations based on statistical models of optimized population coverage were performed. These analyses estimated the response likelihood of the hotspots against different HLA haplotypes of individuals from different regions of the world, as well as on a worldwide population. For this purpose, the HLA haplotype data from The Allele Frequency Net Database was leveraged (https://academic.oup.com/nar/article/48/D1/D783/5624967).
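The general idea of estimating population coverage from allele frequencies may be sketched as below. This is a simplified illustration under stated assumptions and is not the DT model itself: a single locus is considered, each simulated individual draws two alleles independently from the population frequencies, and an individual is counted as a likely responder if at least one allele is predicted to present an epitope of the hotspot. The allele labels and frequencies are hypothetical placeholders, not values from The Allele Frequency Net Database.

```python
import random


def simulate_coverage(allele_freqs, covered, n=10000, seed=0):
    """Monte Carlo estimate of the fraction of simulated individuals covered."""
    rng = random.Random(seed)
    alleles = list(allele_freqs)
    weights = [allele_freqs[a] for a in alleles]
    hits = 0
    for _ in range(n):
        # One simulated individual: two alleles drawn independently at one locus.
        genotype = rng.choices(alleles, weights=weights, k=2)
        if any(a in covered for a in genotype):
            hits += 1
    return hits / n


def analytic_coverage(allele_freqs, covered):
    """Closed form under the same assumptions: 1 - (1 - p)^2."""
    total = sum(allele_freqs.values())
    p = sum(f for a, f in allele_freqs.items() if a in covered) / total
    return 1 - (1 - p) ** 2


# Hypothetical allele labels and frequencies for one population.
freqs = {"allele_1": 0.3, "allele_2": 0.2, "allele_3": 0.5}
covered_alleles = {"allele_1", "allele_2"}

exact = analytic_coverage(freqs, covered_alleles)
estimate = simulate_coverage(freqs, covered_alleles)
print(exact, estimate)
```

Repeating such a calculation with region-specific frequency tables would give a per-region and worldwide comparison of candidate hotspots; a fuller model would consider several loci and haplotype linkage.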

[0140] Top candidate hotspots were selected as being the best performing with respect to their estimated population coverage (i.e., response likelihoods) from DT simulations and being the most dissimilar with respect to their amino acid sequence (e.g., differing by more than 90% in sequence). Finally, the homologs of each of the top hotspots in other genotypes were identified. This allowed the inventors to reduce the set of identified hotspots from 216 to the seven sequences as defined in SEQ ID NOs: 1 to 7.

[0141] This approach advantageously uses a statistical model to quantitatively analyse the predicted immunogenic potential of one or more amino acid sequences (in other words, the predicted ability of the one or more amino acid sequences to instigate an immunogenic response) within an amino acid sub-sequence, across a set of different HLA types. The candidate regions (or “hotspots”) of the amino acid sequences that are identified by the quantitative statistical analysis represent regions (or areas) of the one or more source proteins that are viable vaccine targets and should be used in vaccine design and creation. The HBV genotypes that were screened were obtained from the HBV database (HBVdb) (Hayer, 2013).

[0142] Each of the amino acid sequences (encoded by the identified hotspots) may comprise one or more epitopes capable of stimulating an adaptive immune response through MHC Class I and / or MHC Class II. A candidate region may comprise a single epitope that is predicted to instigate an immunogenic response across a plurality of the HLA types. Such an epitope may be termed as a “promiscuous epitope” and have multiple cognate HLA “receptors”. More typically however, a candidate region comprises a plurality of epitopes that, collectively, overlap with a large proportion of the analysed HLA types. For example, one epitope within a candidate region may overlap with n HLA types and a different epitope within the candidate region may overlap with m HLA types such that the candidate region is predicted to instigate an immunogenic response across the (m+n) HLA types.
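The way in which several epitopes within one candidate region collectively cover HLA types can be illustrated with a simple set union. This is a minimal sketch; the epitope names and HLA labels are hypothetical. It also makes explicit that the total equals (m+n) only when the two epitopes cover disjoint sets of HLA types; where the sets overlap, the region covers fewer than (m+n) distinct types.

```python
def region_coverage(epitope_to_hla):
    """Return the set of HLA types covered by at least one epitope in the region."""
    covered = set()
    for hla_types in epitope_to_hla.values():
        covered |= set(hla_types)
    return covered


# Hypothetical epitopes and HLA labels.
disjoint_region = {
    "epitope_1": {"hla_a", "hla_b"},           # n = 2
    "epitope_2": {"hla_c", "hla_d", "hla_e"},  # m = 3
}
overlapping_region = {
    "epitope_1": {"hla_a", "hla_b"},
    "epitope_2": {"hla_b", "hla_c", "hla_d"},
}

n_disjoint = len(region_coverage(disjoint_region))
n_overlap = len(region_coverage(overlapping_region))
print(n_disjoint, n_overlap)
```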

[0143] The approach comprised the step of assigning, for each of the set of HLA types, an antigen presentation (AP) score for each amino acid, wherein said score is indicative of the immunogenic potential of an epitope within an amino acid sequence (hotspot) comprising that amino acid, for that HLA type. For a given HLA allele, the score allocated to an amino acid corresponds to the best score obtained by an epitope prediction overlapping with this amino acid. For Class I HLA alleles, 1 represents the best score, wherein the amino acid has a higher likelihood of being naturally presented on the cell surface, whereas a score closer to 0 represents a lower likelihood.

[0144] The predictions for Class I HLA types were performed using an antigen presentation and binding affinity prediction algorithm, as well as experimental data. The predictions for a subset of Class II HLA types were performed on the selected hotspots identified through Class I HLA predictions using BERTMHC (https://academic.oup.com/bioinformatics/article/37/22/4172/6294399). Examples of publicly available databases and tools that may be used for such predictions include the Immune Epitope Database (IEDB) (https://www.iedb.org/), the NetMHC family of prediction tools (http://www.cbs.dtu.dk/services/NetMHC/), the TepiTool prediction tool (http://tools.iedb.org/tepitool/), the NetChop prediction tool (http://www.cbs.dtu.dk/services/NetChop/) and the MHC-NP prediction tool (http://tools.immuneepitope.org/mhcnp/). Other techniques are disclosed in WO2020/070307 and WO2017/186959.

[0145] Antigen presentation was predicted from a machine learning model that integrates, in an ensemble machine learning layer, information from several HLA binding predictors (trained on IC50 nM binding affinity data) and a plurality of different predictors of antigen processing (trained on mass spectrometry data).

[0146] Each of the identified epitopes was then preferably allocated a score based on the immunogenic potential predicted using the above techniques. Advantageously, the method not only identified candidate regions comprising epitopes that may bind to an HLA molecule, but also those CD8 epitopes that are naturally processed by a cell’s antigen processing machinery, and presented on the surface of host infected cells.

[0147] The AP scores were assigned by the following protocol. Firstly, a plurality of first epitopes was identified across the amino acid sequence, in a “moving window” of amino acids of fixed length. This was performed for each HLA type. For each of the identified first epitopes, a score was generated that is indicative of the immunogenic potential of that epitope, for the respective HLA type. A plurality of further epitopes was subsequently identified across the amino acid sequence, for each HLA type. Again, this was performed using a “moving window” approach. Each of the further epitopes was also assigned a score that was indicative of the immunogenic potential of that epitope, for the respective HLA type. Each amino acid was then assigned, for each HLA type, the score of the epitope that was predicted to have the best immunogenic potential of all the epitopes comprising that amino acid. Hence, for a particular HLA type, if epitope “A” and epitope “B” both comprised a particular amino acid “X”, the amino acid “X” would have been assigned the score of whichever epitope “A” or “B” is predicted to have the best immunogenic potential. In other words, for a given HLA type, the score allocated to an amino acid corresponds to the best score obtained by an epitope overlapping with this amino acid.
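The per-amino-acid assignment described above can be sketched as follows for a single HLA type. The function name `per_residue_best` and the toy epitope scores are hypothetical; it is assumed here, consistent with the Class I scoring described herein, that scores lie between 0 and 1 and that a higher score is better.

```python
def per_residue_best(seq_len, epitope_scores):
    """Assign each position the best score of any epitope overlapping it.

    epitope_scores: list of (start, length, score) tuples for one HLA type,
    where `start` is 0-based. Positions covered by no epitope keep 0.0.
    """
    best = [0.0] * seq_len
    for start, length, score in epitope_scores:
        for i in range(start, min(start + length, seq_len)):
            best[i] = max(best[i], score)
    return best


# Toy predictions for a 12-residue sequence: 9-mers and one 10-mer.
toy_epitopes = [
    (0, 9, 0.2),   # covers positions 0-8
    (1, 10, 0.5),  # covers positions 1-10
    (2, 9, 0.9),   # covers positions 2-10
    (3, 9, 0.4),   # covers positions 3-11
]
residue_scores = per_residue_best(12, toy_epitopes)
print(residue_scores)
```

Repeating this for every HLA type gives one row per allele, which together form the matrix of AP scores for the protein.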

[0148] The AP score for each amino acid within a given source protein, or open reading frame, was averaged across HLA types, as shown in Figures 1-3. An AP score is given for each amino acid, wherein it is the average AP score of that amino acid across 156 of the most common HLA-A and HLA-B alleles that correspond to MHC Class I (Figures 1A, 1B, 1C). In total, 156 of the most common human Class I HLA-A and HLA-B alleles across the globe were subjected to analysis. The one or more epitopes of the present invention may (in addition to being able to interact with the top 156 HLA-A or HLA-B alleles) also be able to stimulate a broad adaptive immune response across a plurality of HLA types including HLA-C, HLA-DP, HLA-DQ and / or HLA-DR alleles. An in-house Class II analysis tool developed by the present inventors was used to check for the presence of Class II epitopes within the hotspots identified herein that could be recognized by common Class II HLA alleles or alleles relevant to HBV. Figure 12 lists the Class II HLA alleles used to assess the presence of characteristic epitopes within the selected hotspots.

[0149] The HLA types analysed may further be characterised into HLA types of the same or different human population groups. A population group may be an ethnic population group (e.g., Caucasian, African, Asian) or a geographical population group (e.g., Lombardy, Lagos).

[0150] The polypeptide, polynucleotide, vector, microorganism, composition, or vaccine composition of the present invention may comprise one or more epitopes within the amino acid sequences which make up the polypeptide, and wherein said epitopes meet a particular threshold of a mean antigen presentation (AP) cut off value. Said mean AP cut off value is the value, averaged across all amino acids within an epitope, for which said epitope is considered able to stimulate a broad adaptive immune response across a plurality of HLA types for MHC Class I. For the sake of avoiding confusion, the term “antigen presentation (AP) value” may be used to mean binding affinity or percentile ranking, and the terms shall be used interchangeably.

[0151] It is envisaged that the composition of the present invention may comprise an immunogenic portion of the hepatitis B virus. Each amino acid sequence must be able to stimulate a broad adaptive immune response across a plurality of HLA types, for either MHC Class I and / or MHC Class II immunogenicity.

[0152] In some embodiments, the size of said immunogenic portion may have an upper limit of 150 amino acids in length. In other embodiments, the upper limit may be 140 amino acids in length. In a further embodiment, the upper limit may be 130 amino acids. In yet another further embodiment, the upper limit may be 120 amino acids in length. In yet another further embodiment, the upper limit may be 110 amino acids in length. In yet another further embodiment, the upper limit may be 100 amino acids in length. In yet another further embodiment, the upper limit may be 90 amino acids in length. In yet another further embodiment, the upper limit may be 80 amino acids in length. In yet another further embodiment, the upper limit may be 70 amino acids in length. In yet another further embodiment, the upper limit may be 60 amino acids in length. In yet another further embodiment, the upper limit may be 50 amino acids in length. In yet another further embodiment, the upper limit may be 40 amino acids in length. In yet another further embodiment, the upper limit may be 30 amino acids in length. Accordingly, the immunogenic portion may consist of the complete amino acid sequence (hotspot), or fragments thereof.

[0153] It is envisaged that such an immunogenic portion for use in the composition or vaccine composition of the present invention may be recombinant in nature, wherein “recombinant” refers to the artificial and / or modified characteristic of said immunogenic portion, which may be produced through genetic recombination means. Such means would be apparent to the skilled person. As such, it is envisaged that the immunogenic portion may be a non-functional, recombinant fragment of a protein, wherein said non-functional, recombinant fragments include one or more of the epitopes capable of stimulating a broad adaptive immune response across a plurality of HLA types, as described in the present invention.

[0154] The amino acid sequences of the invention are able to stimulate a broad adaptive immune response across a plurality of HLA types, for either MHC Class I and / or MHC Class II. In some embodiments, the invention may comprise one or more amino acid sequences according to the present invention that are considered able to stimulate a broad adaptive immune response across a plurality of HLA types for MHC Class I. In other embodiments, the invention comprises one or more amino acid sequences according to the present invention that are considered able to stimulate a broad adaptive immune response across a plurality of HLA types for MHC Class II. In a preferred embodiment, the invention may comprise one or more amino acid sequences that are considered able to stimulate a broad adaptive immune response across a plurality of HLA types for both MHC Class I and MHC Class II.

[0155] It is envisaged that the present invention may further comprise tertiary protein structures, or domains thereof, of species including the hepatitis B virus.

[0156] In some embodiments of the present invention, the vaccine composition may further comprise a pharmaceutically acceptable carrier, diluent, excipient and / or adjuvant, as well as minor amounts of auxiliary substances such as wetting or emulsifying agents, pH buffering agents, and / or adjuvants which enhance the effectiveness of the vaccine.

[0157] In some embodiments, the polypeptide, polynucleotide, vector, microorganism, composition or vaccine composition may be used in the therapeutic or prophylactic treatment of any HBV infection or HBV-related disease. The compositions or vaccine compositions of the present invention may be formulated for parenteral, oral, sublingual, nasal, naso-oral, or pulmonary administration. In a preferred embodiment, the parenteral administration may be subcutaneous, intradermal, intramuscular, subdermal, intraperitoneal, or intravenous.

[0158] It is envisaged that administration of the polypeptide, polynucleotide, vector, microorganism, composition, or vaccine composition according to the present invention would be carried out following an appropriate immunisation regimen. The term “appropriate immunisation regimen” is to be construed as a schedule or timescale of one or more administrations of the compositions of the present invention, which may resultantly yield the most effective results in consideration of immunisation efficacy and safety of the subject to which the composition is being administered. For example, for the therapeutic or prophylactic treatment of an HBV infection, an immunisation regimen should be chosen that yields as effective immunisation against the HBV as possible, whilst still maintaining suitable safety for the subject. The immunisation regimen may act to “prime”, “condition”, “boost”, “amplify”, “enhance”, “improve”, “augment” or “promote” (used interchangeably) an immune response in the subject receiving the compositions of the present invention. The immune response may be, amongst others, a systemic immune response, a local immune response, an innate immune response, an adaptive immune response, a memory immune response, a primary and / or secondary immune response, a specific and / or non-specific immune response, immune cell activation, proliferation, and / or differentiation or the like, or any combinations thereof.

[0159] In some embodiments of the present invention, the immunisation regimen may comprise a single administration. In other embodiments, the immunisation regimen may comprise multiple administrations, either concomitantly or over an appropriate period of time. It is envisaged that the appropriate dosage regimen may be repeated for a subject at a suitable time.

[0160] There further exists the possibility of administering boost immunisations after a more extended period of time. This may be selected as an appropriate measure if a subject’s T-cell response falls below determined protective levels. The boost immunisations may be administered if a subject’s immunoglobulin G (IgG) antibody levels fall below determined protective levels. Thus, in some embodiments, an appropriate dosage regimen may be given as a “boost immunisation” after 6 months.

[0161] In some embodiments of the present invention, the polypeptide, polynucleotide, vector, microorganism, composition, or vaccine composition may be administered for the treatment or prevention of infections caused by a virus in combination with one or more other antiviral therapies or other appropriate therapies such as stem cell therapies. Such antiviral therapies may include administration of nucleos(t)ide analogues, including entecavir (Baraclude), tenofovir disoproxil (Viread), tenofovir alafenamide (Vemlidy), lamivudine (Epivir), adefovir dipivoxil (Hepsera), telbivudine (Tyzeka or Sebivo), and lamivudine (Epivir-HBV, Zeffix, or Heptodin). Other therapies may include, but are not limited to, pegylated interferons, interferon alpha (Intron A), small interfering RNAs (siRNA), tenofovir prodrugs, entry inhibitors, capsid inhibitors, smoothened agonist inhibitors, cccDNA inhibitors, CRISPR-Cas and transcription activator-like effector nucleases, toll-like receptor agonists, STING, second mitochondrial-derived activator of caspases mimetics, and cyclophilin inhibitors. Any combination of these therapies may be administered with the polypeptide, polynucleotide, vector, microorganism, composition, or vaccine composition of the present invention.

[0162] Such therapies may be administered simultaneously, separately, or sequentially with the polypeptide, polynucleotide, vector, microorganism, composition, or vaccine composition of the present invention. In a further embodiment, the antiviral therapy may be administered via the same or different route of administration as the polypeptide, polynucleotide, vector, microorganism, composition, or vaccine composition of the present invention, for example via intradermal injection.

[0163] The use of the alternative (e.g., "or") should be understood to mean either one, both, or any combination thereof of the alternatives. As used herein, the indefinite articles "a" or "an" should be understood to refer to "one or more" of any recited or enumerated component.

[0164] As used herein, "about" means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, "about" can mean within 1 or more than 1 standard deviation per the practice in the art. Alternatively, "about" can mean a range from the nearest whole number and up to 20%. When particular values are provided in the application and claims, unless otherwise stated, the meaning of "about" should be assumed to be within an acceptable error range for that particular value.

[0165] The invention is now described with reference to the following Examples:

[0166] Example 1

[0167] The overall pipeline for identifying and evaluating candidate hotspots by estimating coverage of simulated digital twin populations is summarised in Figures 1 and 2. The first part of the pipeline focussed on the identification of hotspot vaccine candidates.

[0168] The raw amino acid protein sequences were downloaded from the HBV database (version 55.0, released on 2022-03-04). There are eight genotypes of HBV designated A to H based on a nucleotide variation of more than 7% across the entire genome. The eight HBV genotypes have specific geographical distribution which influences the disease pathology and necessary treatment. Table 4 provides a summary of the various HBV genomes.

[0169]

[0170] Table 4. HBVdb reference complete genomes by genome / subtype.

The HBV core protein (HBc) and polymerase protein (Pol) are two proteins which are required for RNA encapsidation of HBV. For most HBV genotypes, the HBc antigen comprises 183 or 185 amino acids, whereas HBc of HBV-G is longer due to an additional internal sequence, and comprises 195 amino acids. The HBc antigen is involved in core particle assembly at its N-terminal (assembly) end, and packing of the pre-genome / reverse transcriptase complex at its C-terminal (functional) end. Pol (832 or 845 amino acids in length depending on the HBV genotype) exhibits both DNA-dependent and RNA-dependent DNA polymerase activity to replicate the HBV genome from a pre-genomic RNA template. The final product of replication is a relaxed-circular form of the HBV genome. HBV Pol consists of four distinct domains: the Terminal Protein (TP) domain, a non-conserved spacer domain, an RNase H domain, and a reverse transcriptase domain. The latter is targeted by nucleos(t)ide analogs (antiviral drugs) which inhibit the reverse transcriptase activity in an attempt to control viral replication.

[0171] The sequences were stratified by genotype: A, B, C, D, and ‘All’; and by protein: HBe, HBc, HBx, LHBs, MHBs, SHBs, Pol, and HBSP. The ‘All’ genotype represents the complete collection of sequences from all the available HBV genotypes. For each genotype, the most frequent protein sequence that occurs in nature was selected. The advantages of this approach are that the selected sequence is i) the most prevalent sequence for any given protein in each genotype, and ii) a naturally occurring sequence.

[0172] These sequences were then individually run through the inventors’ AI prediction platform, the NEC Immune Profiler (NIP), which provided the AI predictions for the key determinants described below, such as the AP scores. The predictions were run individually for each protein and genotype of interest. The AP score is in the range between 0 and 1, with 1 being the maximum likelihood that a specific candidate peptide was presented on the host infected cell. In brief, the amino acid sequences of the proteins were split into minimal epitopes (9- and 10-mers) by means of a sliding window. The important determinants of antigen presentation (AP) were assessed for each peptide of each protein and each genotype for their potential to be efficiently presented. These determinants consisted of: (i) the predicted binding affinity between the candidate peptide and 156 of the most frequent Class I HLA molecules in the human population, (ii) the predicted potential of the candidate peptide to be efficiently processed by the antigen processing machinery of the host infected cell, and (iii) the predicted probability of the candidate peptide to be presented on the host infected cell surface, which, among other factors, takes binding and processing into account. For each protein a matrix of AP scores was generated. This matrix contains AP scores at each amino acid position. As a result, a matrix of aggregated AP scores for each amino acid-HLA pair is generated, which can be represented as a heatmap for each protein.
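By way of illustration only, the sliding-window step that splits a protein into 9- and 10-mers could be sketched as follows. The function name `sliding_peptides` and the toy sequence are hypothetical and are not part of the NIP platform; the AP score prediction itself is not reproduced here.

```python
from typing import Iterator, Tuple


def sliding_peptides(protein: str,
                     lengths: Tuple[int, ...] = (9, 10)) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, peptide) for every window of each length."""
    for k in lengths:
        for start in range(len(protein) - k + 1):
            yield start + 1, protein[start:start + k]


# Toy 12-residue sequence used only for illustration (not an HBV sequence).
toy = "ACDEFGHIKLMN"
peptides = list(sliding_peptides(toy))
for position, peptide in peptides:
    print(position, peptide)
```

Each peptide produced in this way would then be scored against each HLA allele by the prediction platform.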

[0173] Based on each matrix of AP scores, the inventors then identified a set of top hotspots across the HBV genomes and refined the sequences of the top candidate hotspots.

[0174] A peak calling algorithm identifies sub-regions of a protein which consistently present higher immunogenicity than the rest of the protein. These regions are labeled as “hotspots”.

[0175] The entire matrix of AP score predictions of a given protein served as input data. First, the AP scores for each amino acid within the protein sequence were summed across the 156 considered HLA alleles and the mean of these sums was calculated. This mean represented the baseline for the peak detection algorithm. A local polynomial regression (LOESS) smoothing curve with a span of 10% was applied on the individual AP score sums (i.e., at amino acid level). Next, a rolling function was applied on the smoothed AP scores to identify all local maxima within a given protein. The default window size for the rolling function was set to nine amino acids, in agreement with the framework used to predict the presentation potential of each peptide within the protein. The identified peak maxima represent the peak summits across the protein sequence. Only peak summits that are above the baseline were considered. The peak summits were then input into a run length encoding (RLE) algorithm to identify hotspot regions. A hotspot region was defined as the entire region flanking a peak summit on both sides that is consistently above the baseline. This framework led to the identification of 216 hotspot regions across all proteins and genotypes.
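A minimal sketch of the peak calling steps described above is given below, under stated assumptions. A centred moving average is used as a simple stand-in for the LOESS smoother, and the same nine-residue window is reused for smoothing and for the rolling maximum; both choices are simplifications for illustration. The names `moving_average` and `call_hotspots` and the toy matrix are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def moving_average(values: List[float], window: int) -> List[float]:
    """Centred moving average; a simple stand-in for the LOESS smoother."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def call_hotspots(ap_matrix: List[List[float]], window: int = 9) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) hotspot regions.

    ap_matrix[i][j] is the AP score at amino acid position i for HLA allele j.
    """
    sums = [sum(row) for row in ap_matrix]   # sum across HLA alleles
    baseline = sum(sums) / len(sums)         # mean of the per-position sums
    smooth = moving_average(sums, window)
    half = window // 2
    # Peak summits: above baseline and equal to the maximum of their rolling window.
    summits = {
        i for i, v in enumerate(smooth)
        if v > baseline and v == max(smooth[max(0, i - half): i + half + 1])
    }
    # Run-length encode the "above baseline" mask; keep runs that contain a summit.
    regions, pos = [], 0
    for above, run in groupby(v > baseline for v in smooth):
        length = len(list(run))
        if above and any(pos <= s < pos + length for s in summits):
            regions.append((pos + 1, pos + length))
        pos += length
    return regions


# Toy matrix: 30 positions x 2 alleles, with elevated scores at positions 11-15.
toy = [[0.9, 0.9] if 10 <= i <= 14 else [0.1, 0.1] for i in range(30)]
regions = call_hotspots(toy)
print(regions)
```

Because the smoother widens the elevated stretch, the region returned for the toy matrix is broader than the five positions with raw high scores; a LOESS curve with a 10% span would be expected to behave similarly but not identically.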

[0176] Example 2

[0177] The second part of the pipeline involved the simulation of a “Digital Twin” (DT) population whereby region-specific HLA haplotype distributions were built using real-world citizen HLA haplotypes (Figure 2). To assess how well the identified hotspots would perform as vaccine candidates, DT simulations based on statistical models of optimized population coverage were performed. These analyses estimated the response likelihood of the hotspots against different HLA haplotypes of individuals from the regions of interest: Europe, United States, Japan, China, and Worldwide. For this purpose, we leveraged the HLA haplotype data from The Allele Frequency Net Database (AFND). More specifically, the HLA haplotypes of ~5.7 million individuals accounting for all regions of the world were incorporated into the analyses.

[0178] The DT framework allowed for the assessment of hotspot performance by estimating the response likelihood of individuals to these peptides based on their HLA haplotype. This is achieved by (i) simulating populations for any given geographical region of interest based on the data derived from the AFND and (ii) selecting candidate hotspots from the set of hotspots that maximize the likelihood of response given the HLA haplotype makeup of the individuals in the underlying population, at the same time accounting for rare HLA alleles. More specifically, the population for each geographical region of interest was generated by sampling ten times 10,000 individuals from the total number of individuals available for that region in the data derived from AFND. This means that for each region, the hotspot performance was assessed across 100,000 individuals in total. For the Worldwide population, the individuals were sampled from all regions available in AFND with the sample size for each of these regions being proportional to the current global distribution. The simulations were run individually on the sets of hotspots corresponding to each genotype of interest (e.g., A, B, C, D, and all).

[0179] The DT also allows for a variable number of the candidate hotspots (i.e., how many hotspots are desired). This number was set to three candidate hotspots. The framework outputs the set of hotspots that were selected as the best candidate hotspots for the population in every region simulated and for each of the ten repetitions (e.g., 10 x 10,000 individuals per region).

[0180] The entire set of 216 identified hotspots was then used to assemble a subset of top candidate hotspots. For each genotype and population of interest, the inventors identified the top 3 most dissimilar hotspots, differing by more than 90% in sequence from the DT output (i.e., hotspots selected the most often by DT in the 10 population repetitions). This reduced the number of candidate hotspots from 216 to 75 (e.g., 3 hotspots x 5 genotypes x 5 geographical regions). Next, for each hotspot, the inventors identified its homologs (i.e., more than 90% similar) across the other genotypes and populations of interest. This resulted in a set of 11 high-performing, most dissimilar, and non-homologous hotspots. Further manual processing of the selected set of 11 hotspots reduced the set to seven hotspots due to sequence similarities which fall just outside of the 90% threshold.

[0181] Example 3

[0182] Each individual hotspot was then evaluated for its coverage across the populations (Figure 3).

[0183] The DT framework allowed for the estimation of expected coverage of the simulated populations described above. The expected coverage was estimated for either a single hotspot or a combination of them. Briefly, the coverage was estimated by simulating the response of each individual in the population based on their response likelihood, where the response likelihood for an individual is taken as the maximum AP score for any hotspot under consideration accounting for the HLAs for that individual. For each individual, a random number is generated (uniform distribution in [0, 1]), and the individual is considered to have responded to the hotspot if that number is less than the response likelihood. The coverage is then the fraction of individuals in the population which respond.
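The coverage estimate described above can be sketched as a short Monte Carlo simulation. This is an illustration under assumptions: the per-hotspot, per-allele AP scores are taken as given in a dictionary, and the names `response_likelihood`, `estimate_coverage`, the hotspot labels and the toy population are all hypothetical.

```python
import random
from typing import Dict, List, Sequence


def response_likelihood(hlas: Sequence[str], hotspots: Sequence[str],
                        ap: Dict[str, Dict[str, float]]) -> float:
    """Maximum AP score over the considered hotspots and the individual's HLAs."""
    return max((ap[h].get(a, 0.0) for h in hotspots for a in hlas), default=0.0)


def estimate_coverage(population: List[Sequence[str]], hotspots: Sequence[str],
                      ap: Dict[str, Dict[str, float]], rng: random.Random) -> float:
    """Fraction of simulated individuals who respond to at least one hotspot."""
    responders = 0
    for hlas in population:
        # Draw a uniform number in [0, 1); respond if it is below the likelihood.
        if rng.random() < response_likelihood(hlas, hotspots, ap):
            responders += 1
    return responders / len(population)


# Toy AP scores (made up): hs1 is always presented by A*02:01, never by B*07:02.
ap_scores = {
    "hs1": {"HLA-A*02:01": 1.0, "HLA-B*07:02": 0.0},
    "hs2": {"HLA-B*07:02": 0.0},
}
toy_population = [("HLA-A*02:01", "HLA-B*07:02")] * 5 + [("HLA-B*07:02",)] * 5
coverage = estimate_coverage(toy_population, ["hs1", "hs2"], ap_scores, random.Random(0))
print(coverage)
```

With scores of exactly 0 or 1 the toy result is deterministic; with intermediate scores the estimate varies between runs, which is consistent with the use of repeated population samples.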

[0184] An example of estimated population coverage of a set of hotspot candidates in different populations across genotypes is shown in Figure 3A. An example of estimated population coverage of a set of candidate hotspots split per genotype in different populations is shown in Figure 3B.

[0185] Example 4

[0186] The foregoing examples describe the identification and extraction of hotspot candidates from HBV proteins. The sequences of these hotspot candidates were derived from each genotype of interest and therefore they represent genotype-specific amino acids. However, even though the hotspots were selected from highly conserved regions of the protein, some amino acid positions within the hotspot are subject to natural variation. To address this observation and to increase the universality of the hotspot candidates, a further refinement step was performed. Hotspot refining allowed the inventors to trim, slice, and combine hotspot sequences so as to minimize the number of variable amino acid positions, but also to evaluate possible hotspot variants.

[0187] First, the inventors analysed the amino acid frequency of hotspot candidates based on the entire set of unique HBV sequences for a given protein in a given genotype. If an amino acid accounts for more than 90% of all amino acids found at that specific position in that specific protein, then this amino acid was selected for further refinement. When the amino acid frequency at a given position was less than 90% in the entire set of protein sequences, we cross-referenced the amino acid frequency at the same position in the other genotypes of interest. If three of these genotypes shared a particular amino acid at more than 90% frequency at that position, then this amino acid was selected. If two or more amino acids are common at a given position, then the inventors compared the results of the NIP prediction pipeline of the epitopes that contain the conflicting amino acids. Finally, based on the amino acid frequency data and AP score, the inventors selected an amino acid for that position. Although the hotspot refining step can combine several hotspot sequences into one, the inventors propose that it is better to keep conflicting amino acids separate as different candidate hotspots for the assessment of their immunogenicity. Based on this proposition, the inventors generated several refined hotspots from the list of top candidates, except for the Pol:482-534 hotspot, as this hotspot did not show any conflicting amino acids. The refined versions of the sequences are detailed herein.
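The first step of the refinement, the per-position frequency check against a 90% threshold, could be sketched as below. This covers only the within-genotype check; the cross-genotype comparison and the AP score comparison are not reproduced. The function name `refine_consensus`, the `X` placeholder for unresolved positions, and the toy sequences are assumptions made for illustration.

```python
from collections import Counter
from typing import List


def refine_consensus(aligned: List[str], threshold: float = 0.9,
                     unknown: str = "X") -> str:
    """Keep the dominant residue where its frequency exceeds the threshold.

    Positions without a sufficiently dominant residue are marked with `unknown`
    so that they can be resolved in a later step.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        out.append(residue if count / len(aligned) > threshold else unknown)
    return "".join(out)


# Toy alignments (not HBV sequences).
conserved = refine_consensus(["ACD"] * 19 + ["AGD"])     # C occurs in 95% of sequences
variable = refine_consensus(["ACD"] * 5 + ["AGD"] * 5)   # C and G occur in 50% each
print(conserved, variable)
```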

[0188] Example 5

[0189] To assess the optimal (sub)sets of refined hotspots such that the expected population coverage is maximized, the inventors employed a Greedy Hill Climbing (GHC) algorithm. Figures 11A and 11B illustrate the results of GHC for a number of vaccine elements, ranging from one vaccine element to all possible 17 vaccine elements. Table 3 also shows a number of preferred combinations of sequences.

[0190] In the table split across Figures 11A and 11B, the first column contains the allowed number of vaccine elements. The second contains the estimated population coverage for a worldwide population. The last column contains the total number of amino acids for a given number of vaccine elements / vaccine element combination. The rest of the columns contain the selected refined hotspot sequences.

[0191] These data demonstrate that with only one refined hotspot sequence, the inventors observed an expected population coverage of ~87%. Importantly, the optimal combination was found to be using 12 vaccine elements with an expected population coverage of 93.07%. Adding more refined hotspot sequences beyond that does not increase the expected coverage. However, it is possible that the rest of the refined sequences would account for more rare, specific HLA haplotypes.

[0192] In brief, given a set of hotspot sequences and a fixed number of allowed vaccine elements or hotspot sequences, GHC will aim at selecting the optimal subset of vaccine elements that fits the number of allowed vaccine elements and maximizes the expected population coverage.
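One plain greedy variant of such a selection is sketched below for illustration; it is not asserted to be the GHC implementation used by the inventors. It assumes that a response likelihood per vaccine element and per simulated individual is already available, and it uses the mean of the per-individual maximum likelihood as the expected coverage. The names `expected_coverage`, `greedy_select` and the toy values are hypothetical.

```python
from typing import Dict, List, Sequence


def expected_coverage(selected: Sequence[str],
                      likelihood: Dict[str, List[float]]) -> float:
    """Mean over individuals of the best response likelihood among selected elements."""
    if not selected:
        return 0.0
    n = len(next(iter(likelihood.values())))
    return sum(max(likelihood[e][i] for e in selected) for i in range(n)) / n


def greedy_select(likelihood: Dict[str, List[float]], k: int) -> List[str]:
    """Add, one at a time, the element with the largest coverage gain.

    Stops early when no remaining element increases the expected coverage.
    """
    chosen: List[str] = []
    best = 0.0
    while len(chosen) < k:
        candidates = [e for e in likelihood if e not in chosen]
        if not candidates:
            break
        scored = [(expected_coverage(chosen + [e], likelihood), e) for e in candidates]
        score, element = max(scored, key=lambda t: t[0])
        if score <= best:
            break
        chosen.append(element)
        best = score
    return chosen


# Toy likelihoods for four simulated individuals (made-up values).
toy_likelihood = {
    "E1": [0.9, 0.9, 0.1, 0.1],
    "E2": [0.1, 0.1, 0.8, 0.8],
    "E3": [0.9, 0.8, 0.1, 0.1],
}
selection = greedy_select(toy_likelihood, k=3)
print(selection, expected_coverage(selection, toy_likelihood))
```

In the toy data the third element adds no coverage and is not selected, which mirrors the plateau in expected coverage reported above once enough vaccine elements are included.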

[0193] Example 6

[0194] Examples 1 to 5 detail the in silico approach used to identify individual candidate hotspots. Example 6 focusses on the wet lab techniques utilised to validate these hotspots.

[0195] Therefore, once the individual hotspots were identified using the in silico approach, the peptides were validated through ELISPOT assays in order to assess which of the selected hotspots would generate an immune response. A workflow of an ELISPOT assay for evaluating the candidate hotspots in PBMCs is shown in Figure 4.

[0196] Peripheral blood mononuclear cells (PBMCs) were collected from healthy donors (n=10) of Hispanic and Caucasian ethnicity and chronic HBV patients (n=17) of Asian, Hispanic, Caucasian and African American ethnicity. All 10 of the healthy donor samples were obtained from Cellular Technology Limited. 7 of the HBV patient samples were obtained from Cureline, and 10 of the HBV patient samples were obtained from BioIVT.

[0197] The HBV hotspots identified were fragmented into 9-mer peptides for validation in PBMCs. Upon administration of a peptide, for example in a vaccine, the peptides are intracellularly processed by the body and broken down into Class I and Class II minimal epitopes, such as 9-mers, in order to induce an immune response. Therefore, 540 predicted Class I minimal 9-mer peptides selected from HBV hotspots were synthesized. Each 9-mer peptide was dissolved in DMSO to prepare 10 peptide pools for testing the immunogenicity of the peptides.

[0198] Peptides were pooled according to their properties (e.g., molecular weight, isoelectric point, hydrophobicity), as shown in Figure 10. These properties of the peptides were calculated using the Peptides package in R (Osorio D., Rondón-Villarreal P., Torres R. Peptides: A package for data mining of antimicrobial peptides. R J. 2015;7:4–14. doi: 10.32614/RJ-2015-001).

[0199] The number of peptides in each pool is shown in Figure 5. Table 2 details the relationship between the SEQ ID NOs disclosed herein and the hotspots (HS) synthesised. To summarise, HS1, HS2 and HS3 correspond to SEQ ID NOs: 16, 17 and 18, respectively, and each of HS1-HS3 are encompassed by SEQ ID NO: 4. HS4, HS5 and HS6 correspond to SEQ ID NOs: 13, 14 and 15, respectively, and each of HS4-HS6 are encompassed by SEQ ID NO: 3. HS7, HS8 and HS9 correspond to SEQ ID NOs: 21, 22 and 23, respectively, and each of HS7-HS9 are encompassed by SEQ ID NO: 7. HS10 corresponds to SEQ ID NO: 6. HS11, HS12 and HS13 correspond to SEQ ID NOs: 8, 9 and 10, respectively, and each of HS11-HS13 are encompassed by SEQ ID NO: 1. HS14 and HS15 correspond to SEQ ID NOs: 11 and 12, respectively, and each of HS14-HS15 are encompassed by SEQ ID NO: 2. HS16 and HS17 correspond to SEQ ID NOs: 19 and 20, respectively, and each of HS16-HS17 are encompassed by SEQ ID NO: 5.

[0200] The peptides originating from longer hotspots were divided into multiple pools, with no more than 70 peptides per pool. For instance, pools 6-8 contain peptides from the same hotspot, namely SEQ ID NO: 1, which accounted for an initial 200 unique peptides. Prior to splitting them into multiple pools, the amino acid sequences of those peptides were sorted alphabetically and then split into three groups of similar sizes. Their positioning within the hotspot sequence was not taken into account during the split. The same approach was used for the rest of the hotspot sequences that required splitting into multiple pools.
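The pooling rule described above could be sketched as follows. The rule of using the fewest pools that respect the cap of 70 peptides is an assumption made for illustration; the description states only the cap and the alphabetical sort. The function name `split_into_pools` and the placeholder peptide labels are hypothetical.

```python
import math
from typing import List


def split_into_pools(peptides: List[str], max_per_pool: int = 70) -> List[List[str]]:
    """Sort unique peptides alphabetically and split them into near-equal pools."""
    ordered = sorted(set(peptides))
    # Fewest pools that keep every pool at or below the cap (assumed rule).
    n_pools = max(1, math.ceil(len(ordered) / max_per_pool))
    base, extra = divmod(len(ordered), n_pools)
    pools, start = [], 0
    for i in range(n_pools):
        size = base + (1 if i < extra else 0)
        pools.append(ordered[start:start + size])
        start += size
    return pools


# 200 placeholder labels standing in for the 200 unique peptides of one hotspot.
toy_peptides = [f"PEP{i:04d}" for i in range(200)]
pools = split_into_pools(toy_peptides)
print([len(pool) for pool in pools])
```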

[0201] Each peptide was dissolved in DMSO and combined into one peptide pool. DMSO (without peptides) was utilised as a control.

[0202] Figure 5 shows 17 different hotspots identified from the in silico analysis. 10 peptide pools were generated, each pool encompassing (i.e., representing) a certain hotspot of the 17 hotspots identified. Pool 1 encompasses three hotspot sequences with the hotspot (‘HS’) coordinate HBc1-42 (HS1, HS2, HS3).

[0203] Pool 2 encompasses three hotspot sequences with the hotspot coordinate Pol62-175 PoolA (HS4, HS5, HS6).

[0204] Pool 3 encompasses three hotspot sequences with the hotspot coordinate Pol62-175 PoolB (HS4, HS5, HS6).

[0205] Pool 4 encompasses three hotspot sequences with the hotspot coordinate Pol363-415 (HS7, HS8, HS9).

[0206] Pool 5 encompasses one hotspot sequence with the hotspot coordinate Pol482-534 (HS10).

[0207] Pool 6 encompasses three hotspot sequences with the hotspot coordinate Pol651-760 PoolA (HS11, HS12, HS13).

[0208] Pool 7 encompasses three hotspot sequences with the hotspot coordinate Pol651-760 PoolB (HS11, HS12, HS13).

[0209] Pool 8 encompasses three hotspot sequences with the hotspot coordinate Pol651-760 PoolC (HS11, HS12, HS13).

[0210] Pool 9 encompasses two hotspot sequences with the hotspot coordinate LHB102-211 (HS14 and HS15).

[0211] Pool 10 encompasses two hotspot sequences with the hotspot coordinate LHB325-392 (HS16 and HS17).

[0212] In order to assess the immunogenicity of the candidate peptide pools (i.e., determine which peptide pools were recognisable by T cells in PBMCs), the collected PBMCs were subject to T cell expansion ELISPOT (enzyme-linked immunosorbent spot) assays with pre-stimulation.

[0213] Briefly, frozen stocks of PBMC from each subject were thawed and lymphocyte numbers were measured. Then, PBMCs were cultured in appropriate medium (X-VIVO 15) supplemented with GM-CSF, IL-4 and Flt3-L at 5 x 10^6 cells/ml in 96-well plates. After overnight incubation, cells were incubated with HBV peptide pools, R848, LPS, and IL-1 beta for stimulation at day one and expansion at day two. On day two, cells were incubated with IL-2, IL-7, and IL-15 for 7 days. Half of the medium was changed with the freshly prepared cytokine medium every 2-3 days.

[0214] On day nine, the pre-stimulated cells were then re-stimulated with the candidate peptide pools on ELISPOT plates (Cellular Technology Limited). The antigen specific interferon-gamma production was subsequently detected by ELISPOT at day ten by following the manufacturer’s instructions. Briefly, pre-stimulated cells were re-stimulated with the medium containing appropriate peptide pools or DMSO, and added to anti-CD28 and anti-CD49d antibody ELISPOT plates at appropriate cell numbers. After around 20 hours of re-stimulation, plates were washed, and the anti-interferon-gamma antibody was added for detection. After 2 hours of incubation, plates were washed, and spots were detected by chromogenic methods using alkaline phosphatase-conjugated antibodies. Spot numbers were counted by an ELISPOT reader (Cellular Technology Limited).

[0215] As positive controls, cells stimulated with CERI (Cellular Technology Limited) and HBV LEP / CAP (JPT Peptide Technologies) peptides were prepared. First, the ‘CERI’ control peptide pool was made up of 124 peptides in total derived from Cytomegalovirus (30 peptides), Epstein Barr Virus (59 peptides), Human Respiratory Syncytial Virus (11 peptides), and Influenza A Virus (24 peptides). Second, the ‘HBV LEP’ control peptide pool was made up of 216 peptides from a peptide scan (15-mer with 11 amino acid overlap) through the Large envelope protein of HBV. Third, the HBV CAP control peptide pool was made up of 155 peptides derived from a peptide scan (15-mer with 11 amino acid overlap) through the Capsid protein of HBV. Two peptide pools were utilised for stimulation of cells at day one. At day nine, cells were split into wells and re-stimulated with each single peptide pool or DMSO shown in Figure 6.

[0216] As an example, Figure 7A shows the representative results of the T cell expansion ELISPOT assay of PBMCs from two of the HBV donors (donors ‘HBV10’ and ‘HBV11’, see Figures 7A and 9B) across each of the 10 peptide pools (labelled as columns 1-10 in Figure 7A). Figure 7B shows further representative results of the T cell expansion ELISPOT assay of PBMCs from another of the HBV donors (donor ‘HBV16’) across each of the 10 peptide pools (labelled as columns 1-10 in Figure 7B).

[0217] Figure 8 shows the results of the ELISPOT in all 17 donors. Across the 10 healthy donors and 17 HBV patients, the results confirm that peptide pools derived from the candidate hotspots were reactive with T cells in both healthy donor PBMCs and PBMCs from chronic HBV patients, and all tested PBMCs responded in the HLA masked assays to at least one candidate peptide pool (Figure 8).

[0218] Figure 8A shows summary data, which takes into account the results of cells stimulated with the control peptide pools. The average number of spots observed for cells re-stimulated with DMSO controls was subtracted from the average number of spots observed for cells re-stimulated with the peptide pools to yield a representative result to calculate the number of ‘specific spots’. Figure 8B shows the results of cells stimulated with candidate peptide pools.
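The background subtraction described above is simple arithmetic and can be written out as a short sketch. The function name `specific_spots` and the spot counts are made up for illustration and are not experimental data.

```python
from statistics import mean
from typing import Sequence


def specific_spots(pool_counts: Sequence[float], dmso_counts: Sequence[float]) -> float:
    """Mean spot count for a peptide pool minus mean spot count for the DMSO control."""
    return mean(pool_counts) - mean(dmso_counts)


# Made-up replicate well counts for one donor and one peptide pool.
result = specific_spots([120, 130, 140], [10, 20, 30])
print(result)
```

The sketch does not clip negative values to zero, since no such rule is stated in the description.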

[0219] HLA typing by next generation sequencing (NGS) was carried out on DNA samples from PBMC donors by GenoDive Pharma Inc. By way of example, DNA samples can be taken from donors, then subject to fragmentation, for example using restriction enzymes. Adaptors and barcodes can be ligated to the fragments which may subsequently be selected by size and sequenced. The results of the ELISPOT assays in healthy donors are summarised in Figure 9A, which confirms that all of the PBMCs derived from healthy donors responded to at least one of the candidate peptide pools.

[0220] Figure 9B summarises the results of the ELISPOT assays in HBV patients, which confirms that all of the PBMCs derived from HBV patients responded to at least one of the candidate peptide pools.

[0221] Overall, the results shown across Figures 7 to 9 confirm the immunogenicity of the hotspots predicted via the in silico approach in donor PBMCs.

SEQUENCES FORMING PART OF THE DESCRIPTION

[0222] SEQ ID NO: 1 (POL:651-760) FTQCGYPALMPLYACIQZ2KQAFTFSPTYKAFLXKQYZNLYPVARQRJGLCQVFADATP TGWGLAOGHQRMRGTFVZ2PLPIHTAELLAACFARSRSGAUBBGTDNSVVLSRKYTSFP WLLGCX2ANWILRGTS

[0223] SEQ ID NO: 2 (LHBS:103-211)

[0224] LGPLLVLQAGFFLLLTXILTIPQSLDSWWTSL

[0225] SEQ ID NO: 3 (POL:62-175) VGPLTVNEX2RRLXLIMPARFYPNZTKYLPLDKGIKPYYPEHZVNHYFQTRHYLHTLWK AGILIYKREJTO SASFCGSPYSWEQ

[0226] SEQ ID NO: 4 (HBC:1-42) MDIDPYKEFGAX2VELLSFLPSDFFPSXRDLLDTASALYRZAL

[0227] SEQ ID NO: 5 (LHBS:325-392) LWEWASXRFSWLSLLVPFVQWFVGLSPTVWLSXIWMMW

[0228] SEQ ID NO: 6 (POL:482-534: REFINED) KLHLYSHPIILGFRKIPMGVGLSPFLLAQFTSAICS

[0229] SEQ ID NO: 7 (POL:363-415)

[0230] FLVDKNPHNTXESRLVVDFSQFSRGZTJVSWPKFAVPNLQSL

[0231] SEQ ID NO: 8 FTQCGYPALMPLYACIQAKQAFTFSPTYKAFLCKQYLNLYPVARQRPGLCQVFADATPT GWGLAIGHQRMRGTFVAPLPIHTAELLAACFARSRSGAKLIGTDNSVVLSRKYTSFPWL LGCAANWILRGTS

[0232] SEQ ID NO: 9 FTQCGYPALMPLYACIQSKQAFTFSPTYKAFLSKQYMNLYPVARQRSGLCQVFADATPT GWGLAMGHQRMRGTFVSPLPIHTAELLAACFARSRSGANLIGTDNSVVLSRKYTSFPWL LGCTANWILRGTS

[0233] SEQ ID NO: 10 FTQCGYPALMPLYACIQSKQAFTFSPTYKAFLSKQYMNLYPVARQRSGLCQVFADATPT GWGLAMGHQRMRGTFVSPLPIHTAELLAACFARSRSGANILGTDNSVVLSRKYTSFPWL LGCTANWILRGTS

[0234] SEQ ID NO: 11

[0235] LGPLLVLQAGFFLLTRILTIPQSLDSWWTSL

[0236] SEQ ID NO: 12

[0237] LGPLLVLQAGFFLLTKILTIPQSLDSWWTSL

[0238] SEQ ID NO: 13 VGPLTVNEKRRLKLIMPARFYPNVTKYLPLDKGIKPYYPEHLVNHYFQTRHYLHTLWKA GILYKRETTRSASFCGSPYSWEQ

[0239] SEQ ID NO: 14 VGPLTVNENRRLQLIMPARFYPNLTKYLPLDKGIKPYYPEHVVNHYFQTRHYLHTLWKA GILYKRESTRSASFCGSPYSWEQ

[0240] SEQ ID NO: 15 VGPLTVNENRRLQLIMPARFYPNLTKYLPLDKGIKPYYPEHVVNHYFQTRHYLHTLWKA GILYKRETTHSASFCGSPYSWEQ

[0241] SEQ ID NO: 16

[0242] MDIDPYKEFGATVELLSFLPSDFFPSVRDLLDTASALYREAL

[0243] SEQ ID NO: 17

[0244] MDIDPYKEFGASVELLSFLPSDFFPSIRDLLDTASALYREAL

SEQ ID NO: 18 MDIDPYKEFGASVELLSFLPSDFFPSIRDLLDTASALYRDAL

SEQ ID NO: 19 LWEWASVRFSWLSLLVPFVQWFVGLSPTVWLSVIWMMW

SEQ ID NO: 20 LWEWASARFSWLSLLVPFVQWFVGLSPTVWLSAIWMMW

SEQ ID NO: 21 FLVDKNPHNTAESRLWDFSQFSRGNTRVSWPKFAVPNLQSL

SEQ ID NO: 22 FLVDKNPHNTTESRLWDFSQFSRGITRVSWPKFAVPNLQSL

SEQ ID NO: 23 FLVDKNPHNTTESRLWDFSQFSRGSTHVSWPKFAVPNLQSL

SEQ ID NO: 24

[0245] DPASRDLVVNYVNTNMGLKIRQ

SEQ ID NO: 25

[0246] GELMTLATW

SEQ ID NO: 26

[0247] TFGRETVLEYLVSFGVWIRTPPAYR

SEQ ID NO: 27 MDIDPYKEFGATVELLSFLPSDFFPSVRDLLDTA

SEQ ID NO: 28

[0248] LSTLPETTVV

REFERENCES

[0249] Arauz-Ruiz P, Norder H, Robertson BH, Magnius LO. Genotype H: a new Amerindian genotype of hepatitis B virus revealed in Central America. J Gen Virol. 2002 Aug;83(Pt 8):2059-2073.

[0251] Flichman D, Galdame O, Livellara B, Viaut M, Gadano A, Campos R. Full-length genome characterization of hepatitis B virus genotype H strain isolated from serum samples collected from two chronically infected patients in Argentina. J Clin Microbiol. 2009 Dec;47(12):4191-3.

[0252] Hannoun C, Horal P, Lindh M. Long-term mutation rates in the hepatitis B virus genome. J Gen Virol. 2000 Jan;81(Pt 1):75-83.

[0253] Hass M, Hannoun C, Kalinina T, Sommer G, Manegold C, Günther S. Functional analysis of hepatitis B virus reactivating in hepatitis B surface antigen-negative individuals. Hepatology. 2005 Jul;42(1):93-103.

[0254] Herkel J, Jagemann B, Wiegard C, Lazaro JF, Lueth S, Kanzler S, Blessing M, Schmitt E, Lohse AW. MHC class II-expressing hepatocytes function as antigen-presenting cells and activate specific CD4 T lymphocytes. Hepatology. 2003 May;37(5):1079-85. doi: 10.1053/jhep.2003.50191. PMID: 12717388.

[0255] Kato H, Orito E, Gish RG, Sugauchi F, Suzuki S, Ueda R, Miyakawa Y, Mizokami M. Characteristics of hepatitis B virus isolates of genotype G and their phylogenetic differences from the other six genotypes (A through F). J Virol. 2002 Jun;76(12):6131-7.

[0256] Makuwa M, Caron M, Souquière S, Malonga-Mouelet G, Mahé A, Kazanji M. Prevalence and genetic diversity of hepatitis B and delta viruses in pregnant women in Gabon: molecular evidence that hepatitis delta virus clade 8 originates from and is endemic in central Africa. J Clin Microbiol. 2008 Feb;46(2):754-6. doi: 10.1128/JCM.02142-07. Epub 2007 Dec 12.

Meldal BH, Bon AH, Prati D, Ayob Y, Allain JP. Diversity of hepatitis B virus infecting Malaysian candidate blood donors is driven by viral and host factors. J Viral Hepat. 2011 Feb;18(2):91-101.

[0257] Meldal BHM, Moula NM, Barnes IHA, Boukef K, Allain JP. A novel hepatitis B virus subgenotype, D7, in Tunisian blood donors. J Gen Virol. 2009 Jul;90(Pt 7):1622-1628.

[0258] Nagasaki F, Niitsuma H, Cervantes JG, Chiba M, Hong S, Ojima T, Ueno Y, Bondoc E, Kobayashi K, Ishii M, Shimosegawa T. Analysis of the entire nucleotide sequence of hepatitis B virus genotype B in the Philippines reveals a new subgenotype of genotype B. J Gen Virol. 2006 May;87(Pt 5):1175-1180.

[0259] Norder H, Couroucé AM, Magnius LO. Complete genomes, phylogenetic relatedness, and structural proteins of six strains of the hepatitis B virus, four of which represent two new genotypes. Virology. 1994 Feb;198(2):489-503.

[0260] Okamoto H, Tsuda F, Sakugawa H, Sastrosoewignjo RI, Imai M, Miyakawa Y, Mayumi M. Typing hepatitis B virus by homology in nucleotide sequence: comparison of surface antigen subtypes. J Gen Virol. 1988 Oct;69(Pt 10):2575-83.

[0261] Osorio D., Rondón-Villarreal P., Torres R. Peptides: A package for data mining of antimicrobial peptides. R J. 2015;7:4–14. doi: 10.32614/RJ-2015-001.

[0262] Valenzuela P, Quiroga M, Zaldivar J, Gray P, Rutter WJ. The nucleotide sequence of the hepatitis B viral genome and the identification of the major viral genes. In: Fields BN, Jaenisch R, editors. Animal Virus Genetics. Academic Press; 1980. p. 57-70.

Stuyver L, De Gendt S, Van Geyt C, Zoulim F, Fried M, Schinazi RF, Rossau R. A new genotype of hepatitis B virus: complete genome and phylogenetic relatedness. J Gen Virol. 2000 Jan;81(Pt 1):67-74.

[0263] Thedja MD, Muljono DH, Nurainy N, Sukowati CH, Verhoef J, Marzuki S. Ethnogeographical structure of hepatitis B virus genotype distribution in Indonesia and discovery of a new subgenotype, B9. Arch Virol. 2011 May;156(5):855-68. doi: 10.1007/s00705-011-0926-y. Epub 2011 Feb 12.

Claims

1. A polypeptide comprising SEQ ID NOs: 5 and/or 7, or a variant having at least 70% sequence identity thereto, said polypeptide being no more than 1500 amino acids in length.

2. The polypeptide according to claim 1, further comprising SEQ ID NO: 1, or a variant having at least 70% sequence identity thereto.

3. The polypeptide according to claim 1 or 2, wherein the polypeptide comprises any of SEQ ID NOs: 8, 9, 10, 19, 20, 21, 22 and/or 23.

4. The polypeptide according to claim 3, wherein the polypeptide comprises any of SEQ ID NOs: 8, 9, 20 and/or 22.

5. The polypeptide according to claim 3 or 4, wherein the polypeptide comprises any of SEQ ID NOs: 8, 20 and/or 22.

6. The polypeptide according to any of claims 3 to 5, wherein the polypeptide comprises SEQ ID NOs: 20 and/or 22.

7. The polypeptide according to any of claims 1 to 6, wherein the polypeptide comprises SEQ ID NO: 5 and/or 7, or a variant thereof having at least 70% sequence identity thereto, and optionally one or more spacer regions.

8. A polynucleotide encoding a polypeptide as defined in any of claims 1 to 7.

9. The polynucleotide according to claim 8, wherein the polynucleotide is DNA.

10. The polynucleotide according to claim 8, wherein the polynucleotide is RNA.

11. The polynucleotide according to claim 10, wherein the RNA is mRNA.

12. A vector comprising a polynucleotide according to any of claims 8 to 11.

13. The vector according to claim 12, further comprising regulatory elements for its transcription or translation.

14. The vector according to claim 12 or 13, wherein the vector is a plasmid vector or a viral vector.

15. A microorganism comprising a polypeptide according to any of claims 1 to 7, or a polynucleotide according to any of claims 8 to 11, or a vector according to any of claims 12 to 14.

16. The microorganism according to claim 15, wherein said microorganism is a bacterial microorganism.

17. The polypeptide according to any of claims 1 to 7, or the polynucleotide according to any of claims 8 to 11, or the vector according to any of claims 12 to 14, formulated in a nanoparticle or lipid formulation.

18. A composition comprising the amino acid sequence SEQ ID NO: 5 and/or 7, or a variant having at least 70% sequence identity thereto, each of said amino acid sequences being no more than 150 amino acids in length.

19. A vaccine composition comprising a polypeptide, polynucleotide, vector, microorganism, or composition according to any one of claims 1 to 18.

20. The vaccine composition according to claim 19, wherein the vaccine is an mRNA vaccine.

21. The vaccine composition according to claim 18 or 19, wherein the vaccine further comprises a pharmaceutically acceptable carrier, diluent, excipient and/or adjuvant.

22. The vaccine composition according to any of claims 19 to 21, wherein said vaccine composition is formulated for a parenteral, oral, sublingual, nasal, naso-oral, or pulmonary administration.

23. The vaccine composition according to claim 22, wherein said parenteral administration is subcutaneous, intradermal, intramuscular, subdermal, intraperitoneal, or intravenous administration.

24. The polypeptide, polynucleotide, vector, microorganism, composition, or vaccine composition according to any preceding claim, for therapeutic use.

25. The polypeptide, polynucleotide, vector, microorganism, composition, or vaccine composition according to any of claims 1 to 23, for use in the treatment or prophylaxis of a HBV-related disease.

26. The polypeptide, polynucleotide, vector, microorganism, composition, or vaccine composition for use according to claim 25, wherein the HBV-related disease is hepatitis, HBV-related liver cirrhosis or HBV-related liver cancer.

27. A method of treatment comprising administering to a subject the polypeptide, polynucleotide, vector, microorganism, composition or vaccine composition according to any of claims 1 to 23.

28. The method according to claim 27, wherein the treatment is a treatment or prophylaxis of a HBV-related disease.

29. The method according to claim 28, wherein the HBV-related disease is hepatitis, HBV-related liver cirrhosis or HBV-related liver cancer.

30. Use of the polypeptide, polynucleotide, vector, microorganism, composition or vaccine composition according to any of claims 1 to 23 in the manufacture of a medicament.

31. The use according to claim 30, wherein the medicament is for therapeutic use.

32. The use according to claim 31, wherein the therapeutic use is a treatment or prophylaxis of a HBV-related disease.

33. The use according to claim 32, wherein the HBV-related disease is hepatitis, HBV-related liver cirrhosis or HBV-related liver cancer.