Methods of designing nanostructures
Decoupling geometry from function in nanostructure design reduces components and regulatory hurdles, enabling cost-effective and versatile nucleic acid-based nanostructures with varied pharmacokinetic properties.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- THE UNIVERSITY OF NEWCASTLE
- Filing Date
- 2025-12-19
- Publication Date
- 2026-06-25
AI Technical Summary
Existing nanostructures, particularly nucleic acid-based ones like DNA origami, are costly and time-consuming to produce due to their complex geometries and large number of components, and they face regulatory hurdles for therapeutic applications.
The method designs nanostructures that decouple geometry from function, allowing fewer components while maintaining the desired topological and functional characteristics. It relies on computational methods that use multi-criteria decision-making and off-target reaction minimization.
This approach reduces production costs and time, minimizes regulatory burdens, and enables new conformations with varied pharmacokinetic properties, enhancing therapeutic efficacy and versatility.
Description
[0001] METHODS
[0002] Field
[0003] The invention is in the field of nanostructures, in particular bio-based nanostructures.
[0004] Background
[0005] Nanostructures comprised of a number of individual smaller units are used in a wide array of applications, including therapeutic applications, smart materials, biosensing, DNA data storage and DNA computation.
[0006] Nanostructures can be made from a range of different subunits, for example subunits made of nucleic acid such as DNA or RNA, or protein-based subunits.
[0007] In particular, nucleic acid-based nanostructures such as those formed by DNA origami are a growing area of interest since these structures have an array of uses, including in vaccines, drug delivery, viral therapeutics and other non-biotechnological areas such as DNA data storage.
[0008] DNA or RNA origami refers to the use of smaller building blocks or subunits that each comprise one or a plurality of nucleic acids. The building blocks are designed so as to self-assemble into the desired conformation. Each building block typically comprises a longer scaffold nucleic acid and multiple shorter "staple" strands that are designed to hybridise to the scaffold at regions that are, in some instances, discontinuous, forcing the scaffold to adopt pre-designed conformations.
[0009] Other nucleic acid-based nanostructures that are not generated with the origami approach are also known and can have a range of functions, for example as drug delivery systems, nano-pores, computational substrates, and data storage substrates.
[0010] The design of nanostructures can be performed using computational tools, whereby a user inputs specific target parameters, for example a nanostructure sub-structure that has a particular geometry. The user then physically generates the sub-structure according to the design generated by the computational aid.
[0011] Since nanostructures typically comprise a large number of parts, they can be costly both in terms of materials and time to produce. Furthermore, in the context of therapeutic uses of nanostructures, regulatory approval is required for each individual component of a therapeutic nanostructure.
[0012] There is a need for the ability to design nanostructures that perform as required, but which comprise fewer components than currently designed nanostructures.
[0013] Summary of the invention
[0014] The inventors have surprisingly found that, whilst current computational methods of designing nanostructures (for example, for designing the component parts that generate the nanostructure) are geared to the provision of optimal, "ideal" nanostructures that most closely fit the design constraints, it is possible to re-design these structures so that the nanostructure retains the desired functional and / or topological characteristics whilst comprising fewer overall components. Fewer components allow the structures to be produced faster and at a lower cost. Furthermore, reducing the number of component parts also reduces the number of failure modes. The computational methods and other methods described herein open up the design space for new conformations of nanostructures that may retain the desired properties (e.g. function) of the parent nanostructure while varying its topology, or have new and unanticipated functional and / or topological properties.
[0015] To date, nanostructures have been designed to have a particular geometry which correlates with function. In some instances the methods of the invention described herein separate function from geometry and focus on designing a structure with the required function, irrespective of geometry (while perhaps preserving some topological characteristics).
[0016] There may be provided a computer program, which when run on a computer, causes the computer to configure any apparatus, including a circuit, controller, converter, or device disclosed herein or perform any method disclosed herein. The computer program may be a software implementation, and the computer may be considered as any appropriate hardware, including a digital signal processor, a microcontroller, and an implementation in read only memory (ROM), erasable programmable read only memory (EPROM) or electronically erasable programmable read only memory (EEPROM), as non-limiting examples. The software may be an assembly program.
[0017] The computer program may be provided on a computer readable medium, which may be a physical computer readable medium such as a disc or a memory device, or may be embodied as a transient signal. Such a transient signal may be a network download, including an internet download. There may be provided one or more non-transitory computer-readable storage media storing computer-executable instructions that, when executed by a computing system, cause the computing system to perform any method disclosed herein.
[0018] Detailed description of the invention
[0019] The invention provides various methods, including computer implemented methods, and compositions relating to the provision of nanostructures that comprise a reduced number of component parts, and / or a reduced number of different component parts, relative to typical nanostructures, for example fewer component parts or fewer different component parts than a parent nanostructure, whilst retaining desired topological, functional and / or structural parameters. The invention also provides the nanostructures designed by these methods themselves. The invention also provides various computer implemented versions of these methods. Furthermore, the invention also provides methods and computer implemented methods for the optimal design of the parent nanostructures as well as the derivative nanostructures.
[0020] The invention provides a method for designing a nanostructure that is comprised of a number of component parts and that has a desired topology, function and / or geometry, but which advantageously has a reduced number of component parts relative to known nanostructures or a "parent" starting nanostructure that has the desired target topology, function or geometry. For example the nanostructure may have to have a particular topology to act in a certain way, for example as a drug delivery vehicle; or may have to have a topology or geometry that allows for delayed-degradation of the nanostructure, or for degradation of the nanostructure under human bodily conditions. Since a key role for this technology is in the design of nucleic acid origami structures with a reduced number of component parts, or a reduced number of different component parts with respect to a known nanostructure or a parent nanostructure, the method has been termed (dis)origami. The known nanostructure or "parent" may be designed using known methods for designing such structures, and the present invention improves on these designed structures.
[0021] The general premise underlying the invention is that in many instances, nanostructures tend to be optimised for geometry and shape, and tend to be organised and structured, with the aim being that the organised structure has a particular use or function, for example self-assembling with other nanostructures to form a larger structure. However, in many instances, the inventors have found that nanostructures can be produced that are actually optimised for topology and / or function, rather than a specific geometry or shape. From a geometrical point of view, these topologically / functionally optimised structures may appear to be physically disorganised compared to their structured counterparts. For example a nanostructure with fewer component parts relative to a parent nanostructure may appear to be geometrically disorganised, but can still be optimised for a function / topology, or have a function / topology that is within a tolerable range. In some instances the methods of the invention may result in a new nanostructure that has the same desired function / topology as the parent, or may in some instances have an even more preferred function / topology than the parent. For example, it is possible to express proteins from nucleic acid nanostructures within the human body. In some instances it is expected that, using the methods described herein, a range of nanostructures may all express the same protein but, due to their different numbers of component parts, have a range of different pharmacokinetic properties, from which a desired structure can be selected.
[0022] Some exemplary desired features of a disorigami nanostructure and which can be optimised, whilst reducing the number of component parts are:
[0023] • Therapeutic mRNA Folding: Optimizing the design of therapeutic mRNA structures for efficient cellular delivery and translation.
[0024] • Therapeutic DNA folding: Optimizing the design of therapeutic DNA structures for efficient cellular delivery, transcription and translation.
[0025] • Small Circulating RNAs and Small Interfering RNAs: Designing and optimizing nano-machines that interact with (or are built with) these molecules for diagnostics and next-generation therapeutics.
[0026] • DNA / RNA Nanomachines: Creating nanomachines for in situ diagnostics and therapeutic applications.
[0027] • Biomedicine: Applications in gene therapy, drug delivery, and personalized medicine.
[0028] • Biosensors: Programmable DNA / RNA nanomachines as biosensors (e.g. for the natural environment or for clinical diagnosis).
[0029] • DNA Data Storage: Utilizing DNA origami or other DNA / RNA multicomponent systems for compact and stable data storage solutions in materio, in vitro, in vivo or in natura.
The invention also provides methods of designing nanostructures that self-assemble from a number of component parts and that have fewer off-target assemblies, relative to structures that are not designed with the method. The methods of the invention can be combined to design and generate nucleic acid structures with fewer overall component parts and / or fewer overall different component parts relative to a parent structure, and which also have fewer off-target assemblies than the parent structure, or fewer off-target assemblies compared to other candidate nanostructures.
[0030] In many instances the method of the invention starts with a known nanostructure (a "parent" nanostructure) with known component parts and that has the desired topology, geometry or function. Candidate variants of the parent nanostructure with one or more fewer component parts are designed and their function / topology / geometry is predicted in silico (though in some instances various candidates may be physically generated and the function / topology / geometry physically determined in the real world). Those candidates that have fewer component parts but which have the same function / topology as the parent nanostructure, or which have a function / topology that is within a range of tolerance of the function / topology of the parent, are then selected.
[0031] The majority of existing nanostructures, for example nucleic acid nanostructures such as DNA origami nanostructures, are designed based on a target shape or geometry. There are many software-based tools available to design a nanostructure with a particular target geometry, for example caDNAno (www.cadnano.org; Douglas SM, et al. Rapid prototyping of 3D DNA-origami shapes with caDNAno. Nucleic Acids Res. 2009;37:5001-5006). For example, in the context of nucleic acid-based nanostructures the software will output a list of polynucleotide component parts of specified sequences that, once physically produced, should auto-assemble into the desired nanostructure. However, off-target structures can also be produced. In another aspect described elsewhere herein (but which can be used to design the parent nanostructure described herein) the invention also provides methods to mitigate the occurrence of off-target structure formation by taking further parameters into account during a multicriteria decision making (MCDM) process. This invention provides four new criteria that can guide any MCDM process and may be used to generate an optimal parent nanostructure as well as to guide the generation of the derivative disorigami.
[0032] This step of designing a desired nanostructure, based on a desired geometry, function and / or topology, is step C1 (sequence selector programme) in Figure 1, and may or may not be part of the method of the invention. The parent nanostructure will comprise a set of component parts and the nanostructure will have (if tested experimentally and determined), or is predicted to have, a desired particular topology, function and / or geometry. The set of parent component parts comprises n total parent component parts and m different parent component parts. For example in some nanostructures each component part is different, in which case m=n. In other cases some of the component parts that make up a nanostructure are replicates of one another, in which case n>m. Reducing n (total number of component parts) and reducing m (number of different component parts) both have advantages. Reducing the total number of parts reduces cost and time; reducing the number of different component parts reduces the regulatory burden since in some instances each different component part, e.g. each different nucleic acid, in a nanostructure has to be separately approved.
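The distinction between n and m can be illustrated with a minimal Python sketch. The sequences below are hypothetical placeholders, and treating two parts as "the same" when their sequences are identical strings is an assumption made for illustration only.

```python
from collections import Counter


def count_parts(parts):
    """Return (n, m): the total number of parts and the number of distinct parts."""
    counts = Counter(parts)
    n = sum(counts.values())  # every copy counts towards n
    m = len(counts)           # each distinct sequence counts once towards m
    return n, m


# Hypothetical parent set: the first sequence is present in two copies.
parent = ["ACGTAC", "ACGTAC", "GGCATT", "TTAGCA"]
n, m = count_parts(parent)
print(n, m)  # 4 3, i.e. n > m because one part is replicated
```

Here removing one copy of the replicated sequence would lower n only, whereas removing both copies would lower n and m.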
[0033] The first step of the method of the invention involves obtaining information about the parent nanostructure. This information is typically information about the component parts, for example the sequences of any nucleic acid component parts, and the number (n) of total parts and number (m) of different parts.
[0034] This information is used to generate information about one or a plurality of candidate sets of component parts where each candidate set of component parts comprises: at least one fewer component part compared to the parent set of component parts (i.e. the total number of component parts in a set of candidate component parts is n-1, or n minus more than 1); and / or at least one fewer different component part compared to the parent set of component parts (i.e. the number of different component parts in a set of candidate component parts is m-1, or m minus more than 1).
[0035] To be clear, in some preferred embodiments the candidate set of component parts is physically the same as the set of the parent component parts, just with one or more fewer parts. For example where the nanostructure is a DNA nanostructure that comprises 10 parts of sequence (a)-(j), a candidate set may comprise 9 parts having sequences (a)-(i); or 5 parts having sequences (a), (c), (e), (f) and (g). The nature of the component parts is the same between the parent and the candidate - there are fewer component parts in the candidate.
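As a rough illustration of how such candidate sets could be enumerated, the following Python sketch removes one or more parts from a parent set. The function name `candidate_sets` and the `max_removed` parameter are hypothetical and do not come from the source.

```python
from itertools import combinations


def candidate_sets(parent_parts, max_removed=1):
    """Yield candidate sets made by removing 1..max_removed parts from the parent.

    At least one part is always kept, so max_removed is clamped to n - 1.
    """
    n = len(parent_parts)
    for k in range(1, min(max_removed, n - 1) + 1):
        for keep in combinations(range(n), n - k):
            yield [parent_parts[i] for i in keep]


# Parent with 10 parts labelled (a)-(j), as in the example above.
parent = list("abcdefghij")
one_fewer = list(candidate_sets(parent, max_removed=1))
print(len(one_fewer))               # 10 candidate sets of 9 parts each
print(list("abcdefghi") in one_fewer)  # True: the (a)-(i) candidate is included
```

Exhaustive enumeration grows combinatorially with the number of removed parts, which is one reason a search heuristic such as a genetic algorithm may be preferable for larger structures.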
[0036] In some instances there may be some redesign of the sequence of one or more of the component parts, or one or more additional component parts may be introduced which supports the removal of one or more of the original parent / candidate component parts. The information about the one or plurality of sets of candidate component parts, as for the parent nanostructure, is typically information about the sequences of the component parts where the nanostructure is a nucleic acid nanostructure.
[0037] The step of generating the information about the one or plurality of candidate sets of component parts may be done by any means. In Figure 1 this step is indicated as C3 and can be a computer implemented step using a material minimisation step, e.g., implemented via a genetic algorithm. Alternatives to a genetic algorithm could be a Monte Carlo tree search, reinforcement learning or some other combinatorial optimisation or AI method or hybrid combination. Alternatively, information about a range of sets of candidate component parts can be obtained without a computer implemented method.
[0038] The nanostructures that would be formed from each set of candidate components are determined in silico, and the topology / geometry / function of each candidate is determined. For example programmes such as REVNANO (see Shirt-Ediss 2023 Computational and Structural Biotechnology Journal 21: 3615-3626) are able to predict the shape or topology of a nanostructure based on a set of input component parts. REVNANO is an example of software that is suitable for use in this step. Other suitable software may also be used, for example other rapid DNA / RNA multi-strand conformation prediction software. For each nanostructure that is predicted to be formed from each of the candidate sets of component parts, the predicted topology, function and / or geometry is determined and compared to the topology, function and / or geometry of the parent nanostructure, and the distance between the topology, function and / or geometry of the candidate nanostructure and that of the parent nanostructure is determined. At this point it is likely that some or many of the nanostructures that are predicted to form from the sets of candidate component parts will fail and will be outside the tolerance range of topology, function and / or geometry with respect to the parent nanostructure. In other instances the distance between the two will not be so great and one or a subset of the candidate nanostructures will be within the tolerable range of topology, geometry and / or function. In some instances the topology, geometry and / or function may even be more preferred than the parent topology, geometry and / or function. Furthermore, candidate nanostructures may also be ranked with any one or combination of the new criteria in this invention (formulas w,x,y,z), as some candidates may perform better than others when evaluated based on these criteria, i.e. may have fewer off-target assemblies than other candidates.
Although the above is described in the context of an in silico approach, it is of course possible to perform the method using real-world data obtained from physically produced nanostructures. For example some or all of the candidate nanostructures may be physically made, and their physical topology, geometry and / or function determined and used to score the candidate nanostructure with respect to the parent nanostructure.
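The scoring step can be sketched as follows. This is a minimal illustration only: it assumes that a predicted topology can be summarised as a set of contacts (pairs of integers), and it uses Jaccard distance as a stand-in for the distance measure, which the source does not specify. The candidate names and contact sets are hypothetical.

```python
def jaccard_distance(contacts_a, contacts_b):
    """Distance in [0, 1] between two contact sets; 0 means identical."""
    a, b = set(contacts_a), set(contacts_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def within_tolerance(parent_contacts, candidates, tolerance):
    """Return (name, distance) pairs for candidates within tolerance, closest first."""
    scored = [
        (name, jaccard_distance(parent_contacts, contacts))
        for name, contacts in candidates.items()
    ]
    return sorted((s for s in scored if s[1] <= tolerance), key=lambda s: s[1])


# Hypothetical predicted contacts for a parent and two candidates.
parent_contacts = {(0, 1), (1, 2), (2, 3), (3, 0)}
candidates = {
    "A": {(0, 1), (1, 2), (2, 3)},  # loses one contact
    "B": {(0, 1)},                  # loses three contacts
}
print(within_tolerance(parent_contacts, candidates, tolerance=0.3))
# [('A', 0.25)]
```

Candidate B falls outside the tolerance and is discarded, mirroring the expectation that many candidates will fail at this point. The same filter could be applied to distances computed from wet-lab measurements instead of predictions.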
[0039] In either case, whether purely in silico or with some real-world data, the candidate nanostructures are scored relative to the parent nanostructure for the desired characteristic. Those that fall within an acceptable range of topology, geometry and / or function, with respect to the parent nanostructure are taken forward, and either used directly for the purpose for which they were designed, or are optimised further. For example the top 1, 2 or more candidates may be taken and optimised further, by for example altering one or more sequences in the now reduced set of component parts to augment the topology, geometry and / or function further; or the candidates may be put back into the method to further reduce the number of component parts whilst retaining the desired features.
[0040] Accordingly in some embodiments the method provides a method of designing a candidate nanostructure with a topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) and that comprises a plurality (at least two or more) of component parts, wherein said method comprises: a) obtaining information about a set of parent component parts that comprise a parent nanostructure which has said or is estimated or calculated to have said topology, function and / or geometry, wherein said set of parent component parts comprises n total parent component parts and m different parent component parts; b) reducing the number of parent component parts or the number of different parent component parts to produce one or a plurality of candidate sets of component parts wherein each said candidate set of component parts comprises the same set of parent component parts, but wherein each set of candidate component parts comprises i) n-1, or n-2, or n-3 or fewer total candidate component parts; and / or ii) m-1, or m-2 or m-3 or fewer different candidate component parts; c) measuring or calculating the topology, function and / or geometry of a candidate nanostructure formed from one or the plurality of sets of candidate component parts, based on the information about the plurality of sets of candidate component parts, and optionally determining Metric 1 (M1), Metric 2 (M2), Metric 3 (M3) and / or Metric 4 (M4) as defined elsewhere herein and calculating the likelihood of off-target assemblies; and d) selecting one or more candidate nanostructures that are measured or calculated to have a topology, function and / or geometry that is within the target range of topologies, function(s) and / or geometry(s), and optionally calculated to have fewer off-target assemblies.
[0041] Since the candidate nanostructures have at least one fewer component than the parent, the selected candidates in (d) will each have fewer component parts than the starting parent nanostructure.
[0042] Step (f) is split into two parts since in step (f)(i) the information may be physically obtained information, for example from wet-lab experimental work, and / or, in (f)(ii), may be information obtained via in silico predictions using modelling software such as REVNANO (see Shirt-Ediss et al 2023 Comput Struct. Biotechnol. J. 21: 3615-3626). REVNANO is an example of software that is suitable for use in this step. Other suitable software may also be used, for example other rapid DNA / RNA multi-strand conformation prediction software.
[0043] In some instances, it may be desired that the output of the method is a nanostructure with a predetermined total number of component parts, or a predetermined number of different component parts. In these instances, in step (b) one or more of the sets of candidate component parts may comprise a set of parent component parts that has: a) n-X or optionally fewer total candidate component parts; or b) m-X or optionally fewer different candidate component parts, where X is a value that takes n and / or m to a predetermined number of total candidate component parts and / or different candidate component parts. In some instances this may be referred to as a materials budget: the amount of material that the designed nanostructure should have. The invention also then includes methods of designing a nanostructure that has a predetermined number of total component parts and / or a predetermined number of different component parts, wherein the method comprises the methods of designing a candidate nanostructure with a particular topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) described herein. The skilled person will appreciate that the steps of the method can be iterated to progressively reduce the number of component parts in a nanostructure whilst maintaining the desired characteristics.
[0044] In some instances, the method can comprise step (e) wherein the one or more selected candidate nanostructures of (d) become the target nanostructure of (a), and wherein steps (a) to (e) are repeated / iterated at least once, optionally at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 100, 150, 200 or more times. In some instances following iteration the topology, function and / or geometry of the further candidates is compared to the topology, function and / or geometry of the candidate from which the further candidate was derived. However, in some instances there is a risk that the topology, function and / or geometry deviates from the topology, function and / or geometry of the original parent nanostructure. Accordingly in preferred instances the topology, function and / or geometry of each further candidate is compared to the topology, function and / or geometry of the parent nanostructure, and a determination made as to whether the topology, function and / or geometry of the further candidate is within a tolerable range or not. If within a tolerable range, the further candidate may be used directly for the intended purposes, or the component parts fed back into the method to reduce the total number and / or number of different component parts further.
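The value X for a given materials budget is simple arithmetic, sketched below in Python. The function name and the idea of passing separate budgets for total and different parts are illustrative assumptions.

```python
def removals_needed(n, m, total_budget=None, distinct_budget=None):
    """Return (X_total, X_distinct): how many parts must be removed to meet each budget.

    A budget of None means that quantity is unconstrained; a budget that is
    already met gives 0.
    """
    x_total = max(0, n - total_budget) if total_budget is not None else 0
    x_distinct = max(0, m - distinct_budget) if distinct_budget is not None else 0
    return x_total, x_distinct


# Hypothetical parent with n = 200 total parts and m = 180 different parts,
# and a budget of at most 150 total parts.
print(removals_needed(200, 180, total_budget=150))  # (50, 0)
```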
[0045] As above, in some instances the method may be performed with an aim to design a nanostructure with a predetermined number of total parts / different parts, or fewer. Accordingly in some instances step (e) comprises: wherein the one or more candidate nanostructures of (d) become the target nanostructure of (a), and wherein steps (a) to (e) are repeated / iterated until the one or more candidate nanostructures of (e) comprises a number of component parts that is below a predetermined number of component parts (materials budget).
[0046] As above, preferably the topology, function and / or geometry of the further candidates is compared to the original parent nanostructure, rather than to one of the intermediate candidates.
[0047] Accordingly in some embodiments each selected candidate nanostructure (from step (d)) has a set of component parts with a total number of component parts (x) and a total number of different component parts (y) and wherein the method further comprises: step (e) providing, for one or more or each of the selected candidate nanostructures of (d), information about one or a plurality of sets of further candidate component parts wherein each said set of further candidate component parts comprises the set of component parts of the corresponding candidate set of component parts, but wherein each set of further candidate component parts comprises i) x-1, or x-2, or x-3 or fewer total further candidate component parts; and / or ii) y-1, or y-2 or y-3 or fewer different candidate component parts; step (f) measuring or calculating the topology, function and / or geometry of one or the plurality of further candidate nanostructures formed from one or the plurality of sets of further candidate component parts, based on the information about the plurality of sets of further candidate component parts, and optionally determining Metric 1 (M1), Metric 2 (M2), Metric 3 (M3) and / or Metric 4 (M4) as defined elsewhere herein and calculating the likelihood of off-target assemblies; and step (g) selecting one or more further candidate nanostructures that are predicted to have a topology, function and / or geometry that is within the target range of topologies, function(s) and / or geometry(s), and that optionally have fewer off-target assemblies.
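The iteration described above, with every further candidate compared against the original parent rather than an intermediate, can be sketched in Python as a greedy loop. Everything here is an illustrative assumption: the `predict` and `distance` callables stand in for a structure predictor and a distance measure, the parts are hypothetical (name, contribution) pairs, and removing one part per round is just one possible search strategy.

```python
def reduce_iteratively(parent_parts, predict, distance, tolerance, budget):
    """Greedily remove one part per round until the budget is met or no
    single removal stays within tolerance of the ORIGINAL parent."""
    parent_prediction = predict(parent_parts)  # fixed reference for all rounds
    current = list(parent_parts)
    while len(current) > budget:
        best = None
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            d = distance(parent_prediction, predict(candidate))
            if d <= tolerance and (best is None or d < best[0]):
                best = (d, candidate)
        if best is None:
            break  # every further removal drifts too far from the parent
        current = best[1]
    return current


# Toy model: each part contributes an integer amount to a single "function" score.
parts = [("a", 50), ("b", 1), ("c", 30), ("d", 2), ("e", 1)]
predict = lambda ps: sum(weight for _, weight in ps)
distance = lambda ref, value: abs(ref - value) / ref

reduced = reduce_iteratively(parts, predict, distance, tolerance=0.05, budget=2)
print([name for name, _ in reduced])  # ['a', 'c']
```

Because the reference is the original parent, the small deviations introduced in successive rounds accumulate against a single tolerance instead of compounding unnoticed, which is the risk the text identifies when comparing only to the previous candidate.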
[0048] As set out above the component parts may be a) polynucleotides; and / or b) polymers of amino acids.
[0049] It will be clear to the skilled person that the types of component parts of the candidate sets are the same types of component parts as the parent.
[0050] The nucleic acids may be DNA and / or RNA; or may be variants of DNA and / or RNA, such as those selected from the group comprising or consisting of: PNAs (peptide nucleic acids), Locked Nucleic Acid (LNA), Morpholino Oligonucleotides (MOs), Glycol Nucleic Acid (GNA), Threose Nucleic Acid (TNA), Xeno Nucleic Acids (XNAs), Cyclohexene Nucleic Acid (CeNA), Phosphorothioate DNA / RNA, and Artificial Expanded Genetic Information Systems.
[0051] In some instances one or more or all of the nucleic acids of a set of component parts comprise at least 5 nucleotides for example at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200 or more nucleotides. The nanostructure, for example the parent nanostructure, candidate nanostructure or desired target nanostructure may be a nucleic acid nanostructure, for example a DNA nanostructure or an RNA nanostructure.
[0052] The nanostructure may be a nucleic acid origami nanostructure wherein the component parts comprise at least one scaffold polynucleotide and a plurality of staple polynucleotides. For example in these cases the method of designing a nanostructure with a particular topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) of the invention may result in a nucleic acid origami structure which has fewer total number of staple nucleic acids relative to a parent structure, and / or may have fewer total different staple nucleic acids, i.e. the number of nucleic acid staples with different sequences is reduced.
[0053] In some instances the nanostructure is a nucleic acid origami nanostructure that comprises a plurality of substructures, for example a plurality of substructures with a particular geometry, such as a triangular shape. In these instances one or more of the substructures may be designed using the method of the invention, i.e. the substructures may comprise the component parts and the number of component parts is reduced using the method of the invention. The component parts of the substructures may comprise at least one scaffold polynucleotide and a plurality of staple polynucleotides.
[0054] As set out elsewhere herein, the method of designing a nanostructure with a particular topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) may be performed on a computer. Accordingly the invention also provides a computer implemented method of designing a nanostructure with a particular topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s). Preferences for features of the computer implemented method are as set out elsewhere.
[0055] An example of pseudocode for one exemplary implementation of a computer-based approach for executing the method of the invention of designing a nanostructure with a particular topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s), for example in the context of reducing the number of staples for a DNA nanostructure, is as follows:
[0056] Input: list of initial staple sequences; network layout; size of the population; maximum number of generations; percentage for reducing population size; maximum allowed deletions
[0057] Output: Staple sequence with the best fitness
[0058] Initialize: assign zero to generation count and set best individual and best fitness to None
[0059] [population], [removed sequences] <- initialize population
[0060] Run: Sequence Selector
[0061] WHILE (generation count < maximum number of generations) do generation count += 1
[0062] FOR (individual in population) do
[0063] Run: REVNANO
[0064] Run: Calculate Network Similarity
[0065] (Optionally) Run: Calculate metrics M1, M2, M3, M4
[0066] Run: Pareto Sort
[0067] Run: Parent Selection
[0068] WHILE (population < population size) do
[0069] Run: Mutate
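For illustration only, the pseudocode above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the implementation of the invention: the callable `evaluate` is a hypothetical stand-in for the REVNANO and network similarity steps, `toy_evaluate` and the parameter `keep_fraction` are invented for the example, candidates are represented simply as sets of retained staple indices, and the lexicographic choice of the "best" individual is an assumption.

```python
import random
from typing import Callable, FrozenSet, List, Tuple

Design = FrozenSet[int]      # indices of the staples kept in a candidate
Cost = Tuple[float, ...]     # costs to minimise, e.g. (dissimilarity, staple count)


def dominates(a: Cost, b: Cost) -> bool:
    """a is no worse than b on every cost and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_sort(population: List[Design], costs: List[Cost]) -> List[Design]:
    """Return the non-dominated designs of the population."""
    return [d for d, c in zip(population, costs)
            if not any(dominates(other, c) for other in costs)]


def mutate(design: Design, rng: random.Random, min_size: int) -> Design:
    """Remove one random staple, unless the deletion budget is already spent."""
    if len(design) <= min_size:
        return design
    return design - {rng.choice(sorted(design))}


def toy_evaluate(design: Design) -> Cost:
    """Hypothetical scoring: staples 0-2 are treated as structurally essential."""
    essential = frozenset({0, 1, 2})
    return (float(len(essential - design)), float(len(design)))


def reduce_staples(n_staples: int,
                   evaluate: Callable[[Design], Cost],
                   population_size: int = 20,
                   max_generations: int = 30,
                   keep_fraction: float = 0.5,
                   max_deletions: int = 3,
                   seed: int = 0) -> Tuple[Design, Cost]:
    rng = random.Random(seed)
    min_size = n_staples - max_deletions
    parent = frozenset(range(n_staples))
    # Initial population: the parent plus single-deletion variants of it.
    population = [parent] + [mutate(parent, rng, min_size)
                             for _ in range(population_size - 1)]
    best, best_cost = None, None
    for _ in range(max_generations):
        costs = [evaluate(d) for d in population]   # stand-in for REVNANO + similarity
        for d, c in zip(population, costs):
            if best_cost is None or c < best_cost:  # lexicographic comparison (assumption)
                best, best_cost = d, c
        parents = pareto_sort(population, costs)
        parents = parents[:max(1, int(keep_fraction * population_size))]
        population = list(parents)
        while len(population) < population_size:    # refill the population by mutation
            population.append(mutate(rng.choice(parents), rng, min_size))
    return best, best_cost
```

Calling `reduce_staples(8, toy_evaluate)` returns a retained staple set together with its cost tuple. In a real workflow the cost tuple would come from the folding prediction and similarity calculation rather than from the toy function.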
[0070] As described elsewhere herein, the target nanostructure may be defined by properties other than geometry, for example may be defined by a desired functional property or topology or may be defined by the optimisation of the new off-target criteria described in this invention. The workflow set out herein and exemplified in Figure 1, and the pseudocode set out above can also be used in these "target-shape-free" disorigami approaches.
[0071] Any of the methods of designing a nanostructure with a particular topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) can also comprise the further step of physically producing one or more candidate nanostructures that are predicted to have a particular topology, function and / or geometry that is within the target range of topologies, function(s) and / or geometry(s). The physically produced candidate nanostructure can be used for the desired target use, for example in drug delivery, or can be used to obtain physical data to feed back into the method allowing further iteration and optimisation. As set out above, the methods of designing a nanostructure with a particular topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) comprise a step (a): obtaining information about a set of parent component parts that comprise a parent nanostructure which has said or is predicted to have said particular topology, function and / or geometry, wherein said set of parent component parts comprises n total parent component parts and m different parent component parts.
[0072] Step (a) therefore may simply comprise taking information that has been generated prior to commencing the method of the invention, for example digital information about the set of parent component parts that has been designed using any computational approach.
[0073] In other instances, the step of obtaining the information about the set of parent component parts is part of the method of the invention, for example in such instances the methods of designing a nanostructure with a particular topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) may comprise a step (a)(i) prior to step (a):
[0074] (a)(i) designing a parent nanostructure that comprises a set of parent component parts and obtaining information about said parent component parts.
[0075] As set out elsewhere, the information may be sequence information and / or information about how each component part interacts with other component parts in the set.
[0076] The invention also provides one or more candidate nanostructures as set out herein with a particular topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s). Preferences for the nanostructure are set out herein, for example in some instances the nanostructure is a DNA or RNA nanostructure, for example a DNA origami nanostructure.
[0077] Nanostructures designed with the disorigami or off-target methods are physically and structurally different from conventional "perfectly ordered" origami in that they deliberately decouple topology (who is connected to whom) from geometry (the exact 2D / 3D shape), and are engineered to occupy a family of disordered, materials-minimised conformations rather than a single, idealised structure. Conventional DNA / RNA origami is sequence- and staple-designed to enforce one target geometry as closely as possible: a dense, highly regular lattice where almost all designed staples are present, local coordination numbers are fixed, and the scaffold routing plus staple set over-constrain the 3D embedding. Geometry and function are therefore tightly locked together. By contrast, in disorigami component strands (typically staples) are systematically removed and / or merged from a parent structure, and optionally sequences are redesigned, while constraining only the target topological connectivity and / or functional criteria to remain within a tolerance window. This yields nanostructures with: fewer total components (reduced n), fewer distinct components (reduced m), missing duplex segments, non-uniform edge lengths and junction valences, and / or increased local flexibility. From a physical point of view, the resulting network is a programmed disordered graph rather than a fully filled, periodic origami lattice.
[0078] By missing duplex segments we include the meaning that the structure is missing a double-stranded DNA helix in a region where the skilled person would usually expect to find a double-stranded DNA helix, for example in a conventional highly ordered nanostructure. In some embodiments the term missing duplex segment is relative to a similar structure that is highly ordered.
[0079] Because the connectivity constraints are relaxed relative to classical origami, a given disorigami design does not correspond to a single rigid geometry but to a tunable ensemble of shapes that all realise essentially the same topological scaffold-staple graph. In other words, the structure is designed to maintain a target topological connectivity while its actual geometry is allowed - and computationally driven - to change (e.g. aspect ratio, compactness, surface roughness, local curvature). The degree and pattern of staple removal (and / or sequence re-optimisation) directly control this geometric variation. Experimentally, this manifests as distinct sizes / compactions for different disorigami variants derived from the same parent. The Figures show that progressive material minimisation of a triangular origami yields systematically shifted bands on agarose gels for both DNA / RNA and DNA / DNA disorigami, evidencing physically different structures (different mobility and effective size / shape) built on the same underlying scaffold logic. The off-target optimisation method further introduces sequence-level structural differences that are absent in standard origami: scaffolds and staples are selected to minimise four explicit off-target metrics (M1-M4), reducing staple-scaffold, scaffold-scaffold, staple-staple and intra-staple misfolded structures. This produces nanostructures whose internal "off-target interaction landscape" is systematically suppressed and is therefore structurally distinct (in terms of accessible kinetic and thermodynamic states during folding) from known designs that only aim at the on-target geometry and ignore these off-target classes.
[0080] Functionally, disorigami architectures give rise to a new product class: topology-preserving, geometry-tunable nanostructures. For example, a therapeutic mRNA or siRNA can be folded into a disorigami where the topological wrapping and shielding of the nucleic acid (number and pattern of protective contacts, encapsulation motifs, routing of the scaffold) is preserved so that key functions such as protection from degradation and correct delivery / translation are maintained. At the same time, by varying the disordered geometric embedding of that topology, one can deliberately avoid or enhance biological constraints that depend on geometry and surface presentation - such as recognition by immune receptors, interaction with restriction / cleavage enzymes, circulation half-life, or tissue penetration - without changing the encoded protein or the high-level topological protection. Different disorigami variants based on the same therapeutic sequence can therefore share the same topological function (e.g. encoding and protecting the same mRNA) but exhibit systematically different pharmacokinetics and biodistribution, which is not achievable with conventional single-shape, perfectly ordered origami.
[0081] Taken together, these features provide a clear structural distinction between disorigami structures and known structures. Disorigami / off-target nanostructures are defined not just by reduced materials usage, but by (i) a deliberately disordered, partially sparsified connectivity pattern relative to a parent yet preserving specified topological invariants; (ii) a correspondingly altered, tunable geometry evidenced by distinct physical properties (e.g. gel mobility) despite conserved topology; and (iii) sequence-level optimisation against off-target interactions, yielding a different accessible folding and interaction landscape than known "perfectly ordered" origami nanostructures.
[0082] Independently of a comparison to a parent nanostructure, the disorigami structures of the invention set out herein have any one or more of: a) missing duplex segments; b) non-uniform edge lengths; c) non-uniform junction valences; d) local flexibility, optionally high local flexibility.
[0083] In some embodiments the disorigami structure of the invention is not a fully filled, periodic origami lattice. The new disorigami structures are either (a) a randomly partially disassembled nanostructure derived from the parent structure, which preserves or even improves function, costs, regulatory compliance or other criteria; (b) an optimally partially disassembled nanostructure derived from the parent structure, which preserves or even improves function, costs, regulatory compliance or other criteria; or (c) a parentless optimally disorganised nanostructure that optimises a target function, costs, regulatory compliance or other criteria.
[0084] The invention also provides a nanostructure with a disordered structure. In some instances the nanostructure may have a programmed disordered structure. Preferences for the disordered nanostructure are set out elsewhere herein, for example the disordered nanostructure may be a nucleic acid nanostructure, for example a DNA or RNA nanostructure. The disordered nanostructure may be a disordered nucleic acid origami nanostructure, for example a disordered DNA origami nanostructure.
[0085] The invention also provides advances in the computational design of nanostructures, for example a parent nanostructure as set out herein. This second aspect may be used in combination with the first aspect for example to more optimally design the parent nanostructure (Cl in the workflow of Figure 1); or may be used entirely independently and used to design nanostructures that are not to go through the disorigami workflow of the first aspect of the invention.
[0086] Rather than top-down fabrication, DNA origami nanostructures are obtained through a complex 'bottom-up' self-assembly reaction involving hundreds of different types of single stranded nucleic acids interacting in parallel via hybridisation, melting and strand-displacement reactions in a decreasing temperature gradient applied over an extended time period. The end result of the cooled self-assembly reaction is, ideally, the target nanostructure at high copy number and yield.
[0087] Various works have investigated how in vitro origami assembly is affected by factors such as the origami design itself (i.e. the staple and scaffold routing patterns), the temperature protocol, the stoichiometric ratio of staples to scaffold, the type and amount of salts in the assembly buffer, the solvent, and the primary base sequences of the scaffold and staple strands, generally focussing only on the GC content.
[0088] Base sequences have traditionally been regarded as providing just an enthalpic contribution to origami folding, where higher nearest-neighbour base energies have been experimentally demonstrated to generally raise the melting temperature of the structure and higher GC content scaffolds have been shown to assemble at lower salt concentrations. Some origami CAD tools have included design algorithms to make synthetic scaffold sequences deficient in repeated regions and with controlled GC content.
[0089] Off-target reactions in DNA origami assembly have so far been ignored by the DNA nanotechnology community because of the perceived robustness of the DNA origami folding process and since origami assembly is largely conceived as a 'sequence agnostic nucleation and growth system'.
[0090] As robust as self-assembly seems, it is conceivable that the nucleation-and-growth process could still get kinetically trapped away from the target equilibrium origami structure if base sequences made particular off-target bindings possible. For example, strong scaffold secondary structure, made more favourable because of the high effective concentration of the scaffold strand with respect to itself, could block on-target staple binding sites and instead could co-operatively help the mis-binding of staples. In this way, via unplanned off-target bindings, sequence could make entropic as well as enthalpic contributions to folding. Furthermore, the ability of on-target staples to strand-displace mis-bound configurations could be hampered by off-target staples reversibly binding and occluding toeholds. Staples with a leg mis-bound could conceivably permit staple blocking by other copies of the same staple. Finally, off-target bindings between the staples themselves could create secondary structure slowing the kinetic rate of on-target staple attachment to the scaffold.
[0091] Off-target reactions may be more relevant when scaffold sequences are derived from repeat-prone biological sources, when staple legs are fairly short (7 or 8nt) and bind initially at intermediate temperatures, and when constant-temperature folding with no initial thermal denaturation step is employed.
[0092] Simulation of DNA origami folding by coarse-grained molecular dynamics does in principle rigorously include all off-target reaction possibilities because of the base-level resolution. However, such simulations are practically restricted to second timescales after months of computation and typically require artificial assembly conditions (a small origami, high staple concentrations, relatively high constant temperature) to be feasible. Further coarse-graining to 8nt per bead and using a switchable force field has more recently allowed self-assembly to be simulated for kilobase-size origamis (again at second timescales, constant temperature). However, the latter model only considers hybridising bead pairs to be those in the target origami design; accordingly, kinetic traps are limited to situations where on-target binding sites become sterically inaccessible during folding. Moreover, the 8nt bead resolution cannot incorporate strand-displacement reactions - a crucial mechanism for resolving off-target binding sites.
[0093] Approximate ‘domain-level’ kinetic models of origami assembly based on mass action rates between hybridisation states are able to access experimentally relevant timescales (hours) by sacrificing geometric state information for a purely topological hybridisation state. They are also able to include a full temperature annealing ramp in the simulation. However, again, these models only consider on-target staple-scaffold bindings in their transition possibilities, leaving staple blocking (at very high staple concentrations) as the only mechanism able to create kinetic traps. While off-target reactions and associated strand displacement mechanisms could be (partially) included in these mass action models in future by relaxing the domain-level constraint, it is a formidable undertaking to implement: accurate reaction enumeration and parameterisation of possible hybridisation and (non-standard) strand-displacement reactions is non-trivial, and difficult computational problems such as stiff dynamics and efficient shortest path calculation on complex graphs must be overcome. Furthermore, all theoretical models mentioned above only track a single origami instance and do not take into account e.g. possible undesired inter-origami binding effects.
[0094] Given the difficulty in entirely dismissing the importance of off-target reactions in DNA origami self-assembly, and also given the above challenges of rigorously incorporating off-target side reactions into simulation models, the inventors have devised a different 'static' optimisation approach to identify well-folding origami scaffold sequences for a particular origami design.
[0095] The computational approach described herein is based on exactly enumerating and minimising four types of off-target side reactions for a particular origami design, and uses multi-objective optimisation techniques (also recently applied to RNAs, proteins and gene circuits) to identify a 'pareto set' of scaffold sequences that best minimise off-target reactions from a large pool of alternatives. With fewer off-target reactions in the system of assembling strands, these scaffold sequences (along with their complementary staple sequences) are statistically less likely to get kinetically trapped during the origami folding process.
[0096] The Examples present exemplary in vitro assembly data for a DNA triangle and a DNA rectangle. These data demonstrate that selecting scaffold sequences with a high number of associated off-target reactions leads to failed origami folding (even when all staples are Watson-Crick complements to the scaffold), whereas structures designed, using the method of the invention, to have fewer off-target reactions fold with higher yield for both triangle and rectangle origamis. This principle can be extrapolated to any nucleic acid nanostructure that is designed from a set of component nucleic acid parts.
[0097] Principles of multi-objective scaffold sequence selection
[0098] The multi-objective approach to select well-folding origami scaffold sequences is summarised in Figure 4. A target origami design is specified a priori and a pool of 'sequence variants' (potential scaffold sequences, each with their Watson-Crick complementary staples) is scored on four thermodynamic objective functions. Each of these objective functions M1 to M4 represents the worst-case prevalence of a particular class of off-target binding site during folding. Objective M1 represents the total energy of staple-scaffold off-target binding sites; M2 the total energy of scaffold-scaffold binding sites (all off-target); M3 the energy of the worst staple-staple co-fold and M4 the energy of the worst staple hairpin (detailed definitions are set out below). When an objective function is minimised to zero it signifies that off-target bindings of that particular type do not exist.
[0099] Rather than making a direct prediction of origami assembly yield for a sequence variant, the premise of the method is instead to calculate four scores for each variant which are general heuristics for high-yield folding (but not absolute measures of it). Sequence variants with low-scoring objectives are statistically more likely to fold with high yield than sequence variants with high-scoring objectives because fewer off-target sites correlate with reduced total kinetic traps on the origami folding pathway.
[0100] Definitions of metrics 1, 2, 3, and 4 (M1-M4)
[0101] Note that all metrics defined below take the absolute value of the computed free energy quantity. This allows the metrics to be costs, which are minimised to zero from a positive number. Metrics 1 and 2 derive sequence-averaged free energies whereas Metrics 3 and 4 derive sequence-dependent free energies.
[0102] M1 Premise: The yield of the target origami is more likely to be maximised if potential off-target staple binding positions on the scaffold are minimised, leaving only on-target staple binding positions.
[0103] Metric 1 indicates to what extent staples can initially bind the scaffold in non-designed i.e. "off-target" locations. Metric 1 is the (absolute) total free energy of initial off-target staple binding locations on the scaffold. It should be emphasised that Metric 1 is concerned with minimising the number of initial off-target binding sites where staples first hybridise with the scaffold: it is not concerned with calculating and minimising binding sites for the second, third, fourth etc. staple legs, which are additionally determined by entropic loop penalties dependent on the current fold state of the structure.
[0104] Minimising Metric 1 reduces potential kinetic traps in origami self-assembly by simultaneously minimising both (i) the number and (ii) the strength of initial off-target binding sites.
[0105] Metric 1 is defined as: $M_1 = \left| \sum_{s_i \in S} \Delta G^{\circ}_{\mathrm{offtarget}}(s_i) \right|$, optimised as $M_1 \to 0$, where $S$ is the set of staples and $\Delta G^{\circ}_{\mathrm{offtarget}}(s_i) = \min\left(0,\ \Delta G^{\circ}_{\mathrm{total}}(s_i) - \Delta G^{\circ}_{\mathrm{ontarget}}(s_i)\right)$ is the total free energy of initial off-target binding sites for staple $s_i$ on the scaffold. Defining the terms of this expression: $\Delta G^{\circ}_{\mathrm{total}}(s_i)$ is the total free energy of all initial on-target and off-target binding sites for staple $s_i$ on the scaffold. It is calculated algorithmically by positioning staple $s_i$ anti-parallel to the scaffold and moving it along the scaffold as a "sliding window". At each position of the sliding window the energy of all potential binding sites between the staple bases and the aligned scaffold bases is calculated by the energy model, as explained elsewhere herein. $\Delta G^{\circ}_{\mathrm{total}}(s_i)$ is the sum of all binding site energies when the staple window has been moved all the way along the scaffold.
[0106] $\Delta G^{\circ}_{\mathrm{ontarget}}(s_i)$ is the total free energy of the initial on-target designed binding sites for staple $s_i$ on the scaffold. It is the sum of the (sequence averaged) hybridisation energies for each of the individual staple leg binding sites, namely: $\Delta G^{\circ}_{\mathrm{ontarget}}(s_i) = d_{s_i}\,\Delta G^{\circ}_{\mathrm{init}} + b_{s_i}\,\Delta G^{\circ}_{\mathrm{bp}}$, where $d_{s_i}$ is the number of staple legs that $s_i$ binds to the scaffold, and $b_{s_i}$ is the total number of bases on staple $s_i$ that hybridise to the scaffold. This is derived by considering that each staple section initially binds with energy $\Delta G^{\circ}_{\mathrm{init}} + b\,\Delta G^{\circ}_{\mathrm{bp}}$, where $b$ is the number of bases on the staple section in question.
[0107] When all initial staple binding sites detected are exactly the designed staple binding sites, then $\Delta G^{\circ}_{\mathrm{total}}(s_i) = \Delta G^{\circ}_{\mathrm{ontarget}}(s_i)$ and thus $\Delta G^{\circ}_{\mathrm{offtarget}}(s_i) = 0$.
[0108] The most common case is for extra off-target binding sites to exist for a staple. Then $\Delta G^{\circ}_{\mathrm{total}}(s_i) < \Delta G^{\circ}_{\mathrm{ontarget}}(s_i)$ and thus $\Delta G^{\circ}_{\mathrm{offtarget}}(s_i) < 0$.
[0109] Very short staple sections (e.g. less than 6nt) are not detectable by the energy model and therefore are not included in $\Delta G^{\circ}_{\mathrm{total}}(s_i)$. However, they are included in $\Delta G^{\circ}_{\mathrm{ontarget}}(s_i)$, making it more negative. Staple sections less than 6nt therefore have the effect of making $\Delta G^{\circ}_{\mathrm{offtarget}}(s_i)$ less negative overall and make it seem (falsely) as though fewer off-target sites exist.
[0110] The min function is included for completeness, to ensure that $\Delta G^{\circ}_{\mathrm{offtarget}}(s_i)$ never becomes positive in the rare case that an origami design has most staples with very short sections.
[0111] Extended binding regions "spilling over" designed binding sites also (correctly) contribute to the total energy of off-target binding sites.
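As a worked illustration of the Metric 1 arithmetic described above, the following minimal Python sketch computes the on-target energy of a staple from its leg lengths, clamps the off-target energy with the min function, and sums over the staple set. The parameter values `DG_INIT` and `DG_BP`, the staple names and the total energies are hypothetical numbers chosen for the example; in practice the total energies would come from the sliding window energy model.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical sequence-averaged parameters in kcal/mol (illustrative values only).
DG_INIT = 2.0    # initiation penalty per staple leg
DG_BP = -1.5     # average free energy per hybridised base


def on_target_energy(leg_lengths: Sequence[int]) -> float:
    """d * DG_INIT + b * DG_BP for a staple with d legs and b bound bases in total."""
    return len(leg_lengths) * DG_INIT + sum(leg_lengths) * DG_BP


def off_target_energy(total: float, on_target: float) -> float:
    """min(0, total - on_target): zero when only the designed sites are detected."""
    return min(0.0, total - on_target)


def metric_1(staples: Dict[str, Tuple[float, Sequence[int]]]) -> float:
    """staples maps a staple name to (total sliding-window energy, leg lengths)."""
    return abs(sum(off_target_energy(total, on_target_energy(legs))
                   for total, legs in staples.values()))


example = {
    "A": (-42.0, (8, 16, 8)),   # only designed sites detected: contributes 0
    "B": (-52.5, (16, 16)),     # one extra 7-base site detected: contributes -8.5
    "C": (-32.0, (5, 16, 8)),   # 5 nt leg is undetected: clamped to 0 by min()
}
print(metric_1(example))  # 8.5
```

Staple C shows why the min function is needed: its undetected short leg would otherwise give a positive off-target energy and partly cancel the genuine off-target contribution of staple B.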
[0112] M2
[0113] Metric 2 approximates the total energy of all the scaffold-scaffold binding sites, in the absence of loops. Scaffold-scaffold binding events are effectively treated as bimolecular events where two independent strands come together in solution. In reality, scaffold-scaffold bindings are pseudo-unimolecular reaction events. Instead of a constant initiation penalty, scaffold-scaffold bindings have an entropic loop penalty dependent on the loop size between the hybridising domains (which is, in turn, dependent on the temporal fold state of the origami). Hence, it should be emphasised that Metric 2 does not calculate quantitatively accurate hybridisation energies between different complementary regions of the scaffold. Rather, a lower value of Metric 2 simply indicates that one scaffold has relatively fewer off-target scaffold-scaffold binding sites than another scaffold. This is sufficient for the multi-objective selection approach in this work.
[0114] Metric 2 is defined as:
[0115] where $\Delta G^{\circ}_{\mathrm{total}}(p_i)$ is the cumulative energy of off-target binding sites in permutation $p_i$.
[0116] Metric 2 is more suitable than a direct MFE calculation of the scaffold sequence. This is because Metric 2 includes all possible scaffold-scaffold binding sites, whereas the MFE only includes scaffold-scaffold bindings in the MFE configuration. Furthermore, MFE calculation time for long scaffolds is considerable in NUPACK and circular scaffolds and pseudo-knots are prohibited.
[0117] Metric 2 also serves to quantify the extent of off-target bindings between two different copies of the scaffold strand in solution.
[0118] Circular and linear versions of the same scaffold generally have identical M2 scores, except when the circular version contains extra binding sites existing in the region where the virtual 5' and 3' scaffold ends join into a loop.
[0119] M3
[0120] Premise: The yield of the target origami is more likely to be maximised if staples are able to engage in hybridisation reactions with the scaffold as singular entities, rather than being sequestered in strong staple / staple co-folds or in multi-stranded staple complexes.
[0121] Metric 3 calculates the worst extent of staple-staple co-folding in the staple set. It is reasoned that if staples are less likely to co-fold in pairs then multi-stranded staple complexes (> 3 strands) are also less likely. Metric 3 is the (absolute) energy of the strongest staple-staple co-fold at 55C as assessed by NUPACK mfe(). See Table below for NUPACK parameters:
[0122] Table: NUPACK mfe() parameters for metrics M3 and M4. DNA parameters attempt to mimic a typical origami folding buffer (12.5mM magnesium). RNA parameters are constrained to 1.0M sodium by lack of RNA salt corrections in NUPACK. The "No Stacking" ensemble disregards energetic contributions from dangling ends and co-axial stacking. Minimising Metric 3 brings down the ceiling value for all staple-staple co-fold MFE energies at 55C. The number of distinct staple-staple pairs in the set of all co-folds $C$ is given by: $|C| = \frac{|S|\,(|S| + 1)}{2}$, where $|S|$ denotes the total number of staples. Note that all staple dimers $(s_n, s_n)$ are included in $C$.
[0123] Metric 3 is defined as: $M_3 = \left| \min_{(s_1, s_2) \in C} \left( \Delta G^{\circ}_{s_1,s_2} - \Delta G^{\circ}_{s_1} - \Delta G^{\circ}_{s_2} \right) \right|$, where $\Delta G^{\circ}_{s_1,s_2}$ is the minimum free energy of formation of staple co-fold $(s_1, s_2)$ at 55C and $\Delta G^{\circ}_{s_n}$ is the minimum free energy of formation of staple $s_n$ at 55C. Note that the term $\Delta G^{\circ}_{s_1,s_2} - \Delta G^{\circ}_{s_1} - \Delta G^{\circ}_{s_2}$ gives the overall free energy of binding of the two staple strands, taking into account the secondary structure of the individual staples themselves. Staple pairs which are completely orthogonal (with no co-fold MFE structure defined) are omitted from the minimum calculation. For computational efficiency, Metric 4 is calculated before Metric 3 and, once computed, the $\Delta G^{\circ}_{s_n}$ values are saved for re-use by Metric 3.
[0124] Staple co-fold binding energies are dependent on three energies, $\Delta G^{\circ}_{s_1,s_2}$, $\Delta G^{\circ}_{s_1}$ and $\Delta G^{\circ}_{s_2}$, each with their own temperature dependence. The choice is made to evaluate M3 at 55C, in the middle of the origami folding temperature range.
[0125] Staple concentrations are also relevant for the formation of staple complexes. Metric 3 works on the assumption that minimising staple co-fold energies minimises the likelihood of staple complexes at any given concentration.
[0126] If some staples are intended to hybridise together as a feature of the origami design, then these hybridisation regions should be removed from the staples before Metric 3 is run.
[0127] M4
[0128] Premise: The yield of the target origami is more likely to be maximised when individual staples possess less internal secondary structure, as internal base pairing can sequester staple bases designed to hybridise with the scaffold.
[0129] Metric 4 calculates the worst extent of intra-staple secondary structure in the staple set. Metric 4 is the (absolute) energy of the staple with the strongest hairpin-like fold at 55C, as assessed by NUPACK mfe(). Minimising Metric 4 minimises the worst-case hairpin fold in the staple set. An evaluation temperature of 55C is used for consistency with Metric 3.
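[0130] Metric 4 is defined as: $M_4 = \left| \min_{s_n \in S} \Delta G^{\circ}_{s_n} \right|$, where $\Delta G^{\circ}_{s_n}$ is the minimum free energy of formation of staple $s_n$ at 55C.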
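The calculation order described for Metrics 3 and 4 (single-staple energies first, then re-use for the pairwise co-folds) can be sketched as follows. This is an illustrative sketch only: `mfe_energy` is a hypothetical callable standing in for a wrapper around the NUPACK mfe() calculation at 55C, it is assumed to return `None` when no MFE structure is defined, and the toy energy table and staple names are invented for the example.

```python
from itertools import combinations_with_replacement
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical interface: strands -> free energy (kcal/mol), or None if no structure forms.
MfeFn = Callable[[Tuple[str, ...]], Optional[float]]


def metric_4(staples: Sequence[str], mfe_energy: MfeFn) -> Tuple[float, Dict[str, float]]:
    """Return M4 and the cache of single-staple energies for re-use by Metric 3."""
    single = {s: (mfe_energy((s,)) or 0.0) for s in staples}
    return abs(min(single.values())), single


def metric_3(staples: Sequence[str], mfe_energy: MfeFn, single: Dict[str, float]) -> float:
    """Absolute binding energy of the strongest staple-staple co-fold."""
    worst = 0.0
    # Unordered pairs, including dimers (s, s): |S| * (|S| + 1) / 2 pairs in total.
    for s1, s2 in combinations_with_replacement(staples, 2):
        cofold = mfe_energy((s1, s2))
        if cofold is None:          # orthogonal pair: omitted from the minimum
            continue
        worst = min(worst, cofold - single[s1] - single[s2])
    return abs(worst)


# Toy energy table (invented values) in place of a real MFE calculation.
table = {("s1",): -1.0, ("s2",): -0.5,
         ("s1", "s1"): -2.5, ("s1", "s2"): -4.0, ("s2", "s3"): -1.0}
toy_mfe = table.get

staples = ["s1", "s2", "s3"]
m4, cache = metric_4(staples, toy_mfe)
m3 = metric_3(staples, toy_mfe, cache)
print(m4, m3)  # 1.0 2.5
```

Passing the energy function in as a parameter keeps the metric logic independent of any particular NUPACK version or buffer model.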
[0131] Filter G
[0132] Filter G is not a metric, but may help in deciding which sequence set to order. It counts the number of staples on a pareto front origami which contain four consecutive G's in their sequence. Such motifs may lead to stacked G-quadruplex structures which can cause difficulty in staple synthesis (reduced yields) and in determination of staple purity. A lower G score is preferable, ideally G = 0.
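A minimal Python sketch of such a count is given below. The function name `filter_g` and the example sequences are hypothetical; the only rule taken from the description is that a staple is counted when it contains four consecutive G's.

```python
from typing import Iterable


def filter_g(staples: Iterable[str], motif: str = "GGGG") -> int:
    """Count the staples whose sequence contains four consecutive G's."""
    return sum(1 for s in staples if motif in s.upper())


# Invented example sequences: the first and third contain a GGGG run.
print(filter_g(["ACGGGGTA", "GGGAGGG", "ttggggggaa"]))  # 2
```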
[0133] The relative comparison of sequence variants takes place in objective space (Figure 4e) where a 'pareto front' of optimal trade-off sequence variants becomes defined (black dots). Sequence variants on the pareto front are special in that they represent the set of optimal trade-offs for the optimisation problem: for pareto variants, it is not possible to decrease prevalence of one type of off-target reaction without increasing prevalence of other types of off-target reaction. A pareto front exists because it is typically not possible for a sequence variant to minimise all metrics M1 to M4 to zero simultaneously. Similarly, a reverse pareto front of 'worse' variants can be defined (red dots, Figure 4e) by treating the scoring metrics as benefits to be maximised rather than costs to be minimised.
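The pareto front and the reverse pareto front can be illustrated with the following short Python sketch. It uses a simple pairwise dominance check, which is one straightforward way of finding non-dominated variants and is not asserted to be the sorting procedure used in the Examples; the variant names and the (M1, M2, M3, M4) scores are invented.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a is no worse than b on every cost and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Variants not dominated by any other variant (costs are minimised)."""
    return [v for v, s in scores.items()
            if not any(dominates(other, s) for other in scores.values())]


def reverse_pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """The 'worse' front: negate the costs so that they act as benefits to maximise."""
    negated = {v: [-x for x in s] for v, s in scores.items()}
    return pareto_front(negated)


# Invented (M1, M2, M3, M4) scores for four sequence variants.
scores = {
    "A": (1.0, 5.0, 2.0, 2.0),
    "B": (2.0, 4.0, 2.0, 2.0),
    "C": (2.0, 5.0, 3.0, 2.0),   # dominated by A
    "D": (0.0, 9.0, 9.0, 9.0),   # best on M1, worst elsewhere
}
print(pareto_front(scores))          # ['A', 'B', 'D']
print(reverse_pareto_front(scores))  # ['C', 'D']
```

Variant D appears on both fronts because it is extreme in opposite directions on different metrics, which illustrates why a further decision step is useful when choosing a single variant.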
[0134] For a given origami design and pool of scaffold / staple sequence variants realising that design, the method hence returns the pareto set of sequence variants that have the best trade-offs at minimising off-target reactions during self-assembly. Therefore, not one sequence variant, but a cloud of potentially good sequence variants is returned. Using all four scoring metrics, the pareto set is typically 2-3% of the initial pool of sequence variants. When the initial sequence variant pool has a size of 5000, the pareto set is over 100 variants. This number of pareto sequence variants is still too many for a human decision maker to review manually and choose one. Therefore, for the last step, a Multi-Criteria Decision Making (MCDM) method is used to help decide which single variant on the pareto front is the most appropriate scaffold / staple sequence set to order for implementing the origami design. For each of the MCDM methods, in the absence of better information, we weighted all of the thermodynamic scoring metrics M1 to M4 equally. Where the initial pool size is smaller and the number of sequence sets on the pareto front is much smaller, it may not be necessary to use an MCDM method to select the most optimal set of sequences. For example in some instances the number of sets on the pareto front is small enough that each set of sequences can be physically made and tested for desired assembly properties in the real world.
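As a simple illustration of selecting one variant from the pareto front with equally weighted metrics, the sketch below applies a weighted sum to min-max normalised scores. The weighted-sum rule is an assumption made for the example and is only one of many possible MCDM methods; the function name `choose_variant` and the scores are hypothetical.

```python
from typing import Dict, Optional, Sequence


def choose_variant(front: Dict[str, Sequence[float]],
                   weights: Optional[Sequence[float]] = None) -> str:
    """Pick the variant with the lowest weighted sum of normalised costs."""
    names = list(front)
    n_metrics = len(front[names[0]])
    if weights is None:
        weights = [1.0 / n_metrics] * n_metrics    # equal weighting of all metrics
    lo = [min(front[v][k] for v in names) for k in range(n_metrics)]
    hi = [max(front[v][k] for v in names) for k in range(n_metrics)]

    def score(v: str) -> float:
        total = 0.0
        for k, w in enumerate(weights):
            span = hi[k] - lo[k]
            # Min-max normalise each metric to [0, 1]; a constant metric contributes 0.
            total += w * ((front[v][k] - lo[k]) / span if span > 0 else 0.0)
        return total

    return min(names, key=score)


# Invented (M1, M2, M3, M4) scores for three pareto-front variants.
front = {
    "A": (1.0, 5.0, 2.0, 2.0),
    "B": (2.0, 4.0, 2.0, 2.0),
    "D": (0.0, 9.0, 9.0, 9.0),
}
print(choose_variant(front))                          # A
print(choose_variant(front, [1.0, 0.0, 0.0, 0.0]))    # D
```

With equal weights the balanced variant A is chosen, whereas weighting only M1 selects D, which shows how the weighting encodes the decision maker's priorities.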
[0135] A fast approximated energy model parameterised on NUPACK for calculating metrics M1 and M2 is also provided. The energy model uses sliding window techniques to efficiently detect off-target binding sites between staples and scaffold, and to detect regions where the scaffold can bind to itself. Binding sites are detected at base-level resolution and the energy model allows for mis-matching symmetric interior loops of different sizes within binding sites (described elsewhere herein). Of note, the energy model takes into account all potential scaffold-scaffold bindings when calculating M2, not just those bindings present in the minimum free energy (MFE) configuration of the scaffold strand; also it is much faster to calculate than the MFE configuration. For metrics M3 and M4, which require fewer calculations, we call NUPACK to calculate binding energies based on full sequence information.
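The sliding window idea can be illustrated with the simplified Python sketch below. It is a toy under stated assumptions rather than the energy model of the invention: it detects only uninterrupted runs of Watson-Crick pairs (no mismatching interior loops), treats the scaffold as linear, uses a 6nt detection threshold, and scores each site with the hypothetical sequence-averaged values `DG_INIT` and `DG_BP`. The sequences are invented.

```python
from typing import List, Tuple

WC = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
DG_INIT, DG_BP = 2.0, -1.5   # hypothetical kcal/mol values, for illustration only


def binding_sites(scaffold: str, staple: str, min_len: int = 6) -> List[Tuple[int, int]]:
    """Return (scaffold_start, length) for each run of at least min_len consecutive
    Watson-Crick pairs, over all anti-parallel alignments of staple and scaffold."""
    rev = staple[::-1]                   # staple read 3'->5' so that it pairs anti-parallel
    n, m = len(scaffold), len(rev)
    sites = []
    for offset in range(-(m - 1), n):    # scaffold position aligned with rev[0]
        run_start, run_len = 0, 0
        for j in range(m):
            i = offset + j
            paired = 0 <= i < n and (scaffold[i], rev[j]) in WC
            if paired:
                if run_len == 0:
                    run_start = i
                run_len += 1
            if (not paired or j == m - 1) and run_len:
                if run_len >= min_len:   # shorter runs are treated as undetectable
                    sites.append((run_start, run_len))
                run_len = 0
    return sites


def total_energy(sites: List[Tuple[int, int]]) -> float:
    """Sum of DG_INIT + length * DG_BP over all detected sites."""
    return sum(DG_INIT + length * DG_BP for _, length in sites)


scaffold = "CCCCGATTACAGCCCCGATTACACCCC"   # designed site at 4, partial repeat at 16
staple = "CTGTAATC"                        # reverse complement of GATTACAG
sites = binding_sites(scaffold, staple)
print(sites)                 # [(4, 8), (16, 7)]
print(total_energy(sites))   # -18.5
```

Here the 8-base site is the designed binding site and the 7-base site is an off-target site, so subtracting the on-target energy of -10.0 from the total of -18.5 leaves an off-target contribution of -8.5 for this staple.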
[0136] Accordingly, the present invention provides improved methods for designing the underlying nucleic acid sequences that make up a nucleic acid nanoparticle to drive proper self-assembly of the parts, resulting in higher yields of the desired product.
[0137] Accordingly in some embodiments of this second aspect, the invention provides a method of designing a nucleic acid nanostructure with a desired geometry that is comprised of a set of component parts, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts, and wherein the self-assembly of the component parts results in fewer instances of off-target self-assembly.
[0138] The invention also provides a method of reducing the incidence of off-target assembly of a nucleic acid nanostructure with a desired geometry wherein the nucleic acid nanostructure comprises a set of component parts, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts. The nucleic acid nanostructure is preferably a DNA nanostructure (or variant thereof), but may be an RNA nanostructure. For example the nucleic acid nanostructure may be or may comprise: a) DNA and / or RNA; b) a nucleic acid variant of DNA and / or RNA, optionally selected from the group comprising or consisting of: PNAs (peptide nucleic acids), Locked Nucleic Acid (LNA), Morpholino Oligonucleotides (MOs), Glycol Nucleic Acid (GNA), Threose Nucleic Acid (TNA), Xeno Nucleic Acids (XNAs), Cyclohexene Nucleic Acid (CeNA), Phosphorothioate DNA / RNA, and Artificial Expanded Genetic Information Systems; c) mRNA (messenger RNA), tRNA (transfer RNA), rRNA (ribosomal RNA), siRNA (small interfering RNA), miRNA (microRNA), lncRNA (long noncoding RNA), snRNA (small nuclear RNA), snoRNA (small nucleolar RNA), piRNA (Piwi-interacting RNA), crRNA (CRISPR RNA), shRNA (short hairpin RNA), dsDNA (double-stranded DNA), ssDNA (single-stranded DNA), cDNA (complementary DNA), mtDNA (mitochondrial DNA), gRNA (guide RNA), sgRNA (single-guide RNA), tracrRNA (trans-activating CRISPR RNA); d) DNA or RNA used to store digital data via a suitable error correcting and encoding scheme.
[0139] Nucleic acid origami nanostructures are known to those in the field and the skilled person will recognise and understand the terms "scaffold" and "staples".
[0140] The:
[0141] A) method of designing a nucleic acid nanostructure with a desired geometry that is comprised of a set of component parts, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts, and wherein the self-assembly of the component parts results in fewer instances of off-target self-assembly; or
[0142] B) a method of reducing the incidence of off-target assembly of a nucleic acid nanostructure with a desired geometry wherein the nucleic acid nanostructure comprises a set of component parts, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts; comprises: a) designing a plurality of sets of test component parts where each set of test component parts is predicted to self-assemble into a nanostructure with the desired geometry; and b) scoring each set of test component parts for each of the following cost criteria: i) Metric 1 (M1) - the extent to which staples can initially bind the scaffold in non-designed locations, wherein M1 is the (absolute) total free energy of initial off-target staple binding locations on the scaffold; ii) Metric 2 (M2) - the prevalence of scaffold-scaffold binding sites, wherein M2 is the (absolute) total energy of scaffold-scaffold binding sites found in all of the distinct binding permutations; iii) Metric 3 (M3) - the worst extent of staple-staple co-folding in the staple set, wherein M3 is the (absolute) energy of the strongest staple-staple co-fold at 55 °C, optionally as assessed by NUPACK mfe(); iv) Metric 4 (M4) - the worst extent of intra-staple secondary structure in the staple set, wherein M4 is the (absolute) energy of the staple with the strongest hairpin-like fold at 55 °C, optionally as assessed by NUPACK mfe().
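As an illustration of how the two worst-case metrics M3 and M4 aggregate over a staple set, the sketch below takes the energy calculation as a caller-supplied function, so that any thermodynamic package may be plugged in. The toy energy functions are crude stand-ins that merely count Watson-Crick pairs; they are assumptions for demonstration only and are not the NUPACK calculation referred to above. Whether a staple paired with a copy of itself should be included in M3 is not specified here; the sketch considers distinct pairs only.

```python
from itertools import combinations
from typing import Callable, Sequence

PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def metric_m3(staples: Sequence[str], cofold_energy: Callable[[str, str], float]) -> float:
    """Absolute energy of the strongest co-fold over all distinct staple pairs."""
    return max(abs(cofold_energy(a, b)) for a, b in combinations(staples, 2))


def metric_m4(staples: Sequence[str], fold_energy: Callable[[str], float]) -> float:
    """Absolute energy of the staple with the strongest intra-strand fold."""
    return max(abs(fold_energy(s)) for s in staples)


def toy_cofold_energy(a: str, b: str) -> float:
    """Stand-in energy: -1 per Watson-Crick pair in one antiparallel alignment."""
    return -float(sum((x, y) in PAIRS for x, y in zip(a, reversed(b))))


def toy_fold_energy(s: str) -> float:
    """Stand-in energy: -1 per pair when the strand is folded back on itself."""
    half = len(s) // 2
    return -float(sum((x, y) in PAIRS for x, y in zip(s[:half], reversed(s))))


staples = ["ACGT", "AAAA", "TTTT"]
m3 = metric_m3(staples, toy_cofold_energy)
m4 = metric_m4(staples, toy_fold_energy)
print(m3, m4)  # 4.0 2.0
```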
[0143] In some embodiments each score for M1, M2, M3 and M4 is given equal weighting in the predictive model. In other embodiments the weighting of one or more of the scores may be higher or lower than one or more of the other scores. For example in some instances the effect of M1 may be considered to be more influential than the effect of M2-M4.
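One simple way to express equal or unequal weighting is a normalised weighted sum, which is the idea behind simple additive weighting. The sketch below is an assumption about how such a weighting might be applied, not a statement of the predictive model itself: each metric is min-max normalised across the candidates, lower is treated as better, and the function name and example values are hypothetical.

```python
from typing import List, Optional, Sequence


def weighted_cost(scores: List[Sequence[float]],
                  weights: Optional[Sequence[float]] = None) -> List[float]:
    """Return one combined cost per candidate (lower is better).

    Each metric column is min-max normalised to [0, 1] before weighting.
    If no weights are given, all metrics are weighted equally.
    """
    n_metrics = len(scores[0])
    if weights is None:
        weights = [1.0 / n_metrics] * n_metrics
    columns = list(zip(*scores))
    lows = [min(c) for c in columns]
    # Guard against a zero span when every candidate has the same score.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [
        sum(w * (v - lo) / span for w, v, lo, span in zip(weights, row, lows, spans))
        for row in scores
    ]


# Made-up (M1, M2, M3, M4) scores for three hypothetical candidates.
scores = [(10.0, 5.0, 3.0, 2.0), (12.0, 4.0, 3.0, 2.0), (9.0, 7.0, 2.0, 1.0)]
equal = weighted_cost(scores)
m1_heavy = weighted_cost(scores, weights=(0.7, 0.1, 0.1, 0.1))
best_equal = min(range(len(scores)), key=lambda i: equal[i])
```

Passing a weight vector such as `(0.7, 0.1, 0.1, 0.1)` corresponds to treating M1 as more influential than M2-M4.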
[0144] The methods can comprise a further optional screening step in which a filter G is applied to the plurality of sets of test component parts, where filter G counts the number of staples in a pareto front origami which contain four consecutive Gs in their sequence, or applies some other application-specific constraint.
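A minimal sketch of the counting performed by filter G is given below. The function name and the example staples are hypothetical; only the criterion of four consecutive Gs is taken from the description above.

```python
from typing import Iterable


def count_g_run_staples(staples: Iterable[str], run: str = "GGGG") -> int:
    """Count the staples whose sequence contains four consecutive Gs."""
    return sum(run in staple.upper() for staple in staples)


# Two of these four made-up staples contain a run of four Gs.
n_flagged = count_g_run_staples(["ACGGGGTA", "ACGT", "ggggA", "GGGAG"])
print(n_flagged)  # 2
```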
[0145] In some embodiments the methods further comprise:
[0146] (c) Mapping the scores for M1-M4 to a low-dimensional objective space and calculating the pareto front of optimal trade-offs in this space.
[0147] In some embodiments the methods further comprise:
[0148] (d) Applying a multi-criteria decision making method, for example TOPSIS, SAW or KNEE, to rank pareto candidates. The method may also comprise selecting one or a plurality of sets of component parts with the most desired characteristics for physical production. As above, in some instances this selection may be made with the aid of an MCDM model.
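For illustration, the ranking step can be sketched with the standard textbook form of TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution), treating all four metrics as costs. This is a generic sketch under those assumptions rather than the specific implementation used; the function name and example scores are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def topsis_closeness(scores: List[Sequence[float]],
                     weights: Optional[Sequence[float]] = None) -> List[float]:
    """Return the TOPSIS closeness of each candidate (higher is better).

    All criteria are treated as costs, so the ideal point takes the minimum
    of each weighted, vector-normalised column and the anti-ideal the maximum.
    """
    n = len(scores[0])
    if weights is None:
        weights = [1.0 / n] * n
    norms = [math.sqrt(sum(row[j] ** 2 for row in scores)) or 1.0 for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in scores]
    ideal = [min(r[j] for r in weighted) for j in range(n)]
    anti = [max(r[j] for r in weighted) for j in range(n)]
    closeness = []
    for r in weighted:
        d_best = math.dist(r, ideal)
        d_worst = math.dist(r, anti)
        total = d_best + d_worst
        closeness.append(d_worst / total if total else 1.0)
    return closeness


# Made-up (M1, M2, M3, M4) scores for three hypothetical pareto candidates.
front = [(10.0, 5.0, 3.0, 2.0), (12.0, 4.0, 3.0, 2.0), (9.0, 7.0, 2.0, 1.0)]
closeness = topsis_closeness(front)
ranking = sorted(range(len(front)), key=lambda i: closeness[i], reverse=True)
```

The first entry of `ranking` is the candidate closest to the ideal point and farthest from the anti-ideal point under the chosen weights.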
[0149] The methods may comprise physically generating one or a plurality of sets of component parts with the most desired characteristics, for example for physical testing or for use in the desired end use.
[0150] As above, the nucleic acid nanostructure may comprise one or more nucleic acid variants, for example a variant of DNA and / or RNA, for example selected from the group comprising or consisting of: PNAs (peptide nucleic acids), Locked Nucleic Acid (LNA), Morpholino Oligonucleotides (MOs), Glycol Nucleic Acid (GNA), Threose Nucleic Acid (TNA), Xeno Nucleic Acids (XNAs), Cyclohexene Nucleic Acid (CeNA), Phosphorothioate DNA / RNA, and Artificial Expanded Genetic Information Systems.
[0151] There are some instances where the method of the invention is considered to be particularly beneficial, for example wherein: a) the scaffold sequences are derived from repeat-prone biological sources; b) the staple legs are short, optionally 7 or 8 nt in length, and bind initially at intermediate temperatures; and / or c) constant-temperature folding with no initial thermal denaturation step is employed.
[0152] In preferred embodiments the method comprises exactly enumerating and minimising the effects of M1-M4.
[0153] The method may be performed on any number of sets of test component parts. In some preferred instances the number of sets of test component parts onto which the method is applied is large, for example is at least 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3250, 3500, 3750, 4000, 4250, 4500, 4750, 5000, 5250, 5500, 5750, 6000 or more sets of test component parts.
[0154] In some embodiments determining M1 involves the use of sliding windows to determine binding sites at base-level resolution and allows for mismatching symmetric interior loops of different sizes within binding sites. It will be clear to the skilled person that the method of the first aspect, of designing a nanostructure with a particular topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s), and the method of the second aspect, wherein a nanostructure is designed that has improved on-target self-assembly properties, can be used in combination with one another.
[0155] For example in some embodiments the parent nanostructure used in the method of the first aspect has been designed according to any method of the second aspect, and has been designed so as to have improved on-target self-assembly properties.
[0156] In other instances, the selected candidate nanostructures that are obtained from the method of the first aspect may be re-designed using the methods of the second aspect, so as to result in a nanostructure with a geometry, topology and / or function that is within a target range of geometries, topologies and / or functions with respect to a parent nanostructure, and which has improved on-target assembly with respect to the selected candidate nanostructure.
[0157] The invention also provides a nanostructure that has been designed according to any of these methods. The invention also provides a nanostructure with improved on-target self-assembly properties.
[0158] The invention provides a computer implemented method of any of the methods described herein.
[0159] As can be seen from Figure 1, it is possible, using the methods of the invention, to derive two databases: one database with optimal staple-scaffold sets and one database with (disorigami) optimal and minimal staple-scaffold sets. These databases can be used for various purposes, for example training an AI algorithm, for example a generative AI.
[0160] Any of the methods described herein may involve outputting the sequence information about the resultant component parts and / or other associated data to a database, which could, for example, be used to train an algorithm such as an AI algorithm.
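Purely as an illustration of such an output step, the sketch below writes one design record to an SQLite database using the Python standard library. The table layout, the column names and the example record are assumptions; no particular database schema is prescribed herein.

```python
import json
import sqlite3


def store_design(conn: sqlite3.Connection, name: str, scaffold: str,
                 staples: list, metrics: dict) -> None:
    """Insert or update one design record (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS designs ("
        "name TEXT PRIMARY KEY, scaffold TEXT, staples TEXT, metrics TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO designs VALUES (?, ?, ?, ?)",
        (name, scaffold, json.dumps(staples), json.dumps(metrics)),
    )
    conn.commit()


# An in-memory database and a made-up record, for demonstration only.
conn = sqlite3.connect(":memory:")
store_design(
    conn,
    "variant_0001",
    "ACGTACGTAC",
    ["GTACGT", "ACGTAC"],
    {"M1": 10.0, "M2": 5.0, "M3": 3.0, "M4": 2.0},
)
rows = conn.execute("SELECT name, staples FROM designs").fetchall()
```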
[0161] Accordingly the invention also provides a method of generating a database wherein the method comprises: i) the method of designing a nanostructure with a topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) and that comprises a plurality (at least two or more) of component parts of the invention; and / or ii) the method of designing a nucleic acid nanostructure with a desired geometry that is comprised of a set of component parts that self-assemble to form the nanostructure, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts, and wherein the self-assembly of the component parts results in fewer instances of off-target self-assembly relative to a nanostructure not designed using the method, optionally wherein one or more or all of the nucleic acids are nucleic acid variants; and / or iii) the method of reducing the incidence of off-target assembly of a nucleic acid nanostructure with a desired geometry wherein the nucleic acid nanostructure comprises a set of component parts, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts; and outputting the resultant sequences or the component parts and / or other associated data to a database, wherein the database comprises information about the component parts of said nanostructures.
[0162] The invention also provides a method for training an AI algorithm, for example a generative AI system, wherein the method comprises training the AI system on one or more databases generated by one or more methods of the invention.
[0163] The invention also provides a method of using an AI that has been trained on the datasets provided herein.
[0164] The invention also provides a method of designing a nanostructure using an algorithm, such as an AI algorithm, that has been trained on a dataset or database generated using any one or more methods of the invention.
[0165] The invention also provides the following numbered embodiment paragraphs:
[0166] 1. A method of designing a nanostructure with a topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) and that comprises a plurality (at least two or more) of component parts, wherein said method comprises: a) obtaining information about a set of parent component parts that comprise a parent nanostructure which has said or is estimated or calculated to have said topology, function and / or geometry, wherein said set of parent component parts comprises n total parent component parts and m different parent component parts; b) reducing the number of parent component parts or the number of different parent component parts to produce one or a plurality of candidate sets of component parts wherein each said candidate set of component parts comprises the same set of parent component parts, but wherein each candidate set of component parts comprises i) n-1, or n-2, or n-3 or fewer total candidate component parts; and / or ii) m-1, or m-2 or m-3 or fewer different candidate component parts; c) measuring or calculating the topology, function and / or geometry of a candidate nanostructure formed from one or the plurality of sets of candidate component parts, based on the information about the plurality of sets of candidate component parts; d) selecting one or more candidate nanostructures that are measured or calculated to have a topology, function and / or geometry that is within the target range of topologies, function(s) and / or geometry(s).
[0167] 2. The method of embodiment 1 wherein in step (b) one or more of the candidate sets of component parts comprises the set of parent component parts but has: a) n-X or fewer total candidate component parts; or b) m-X or fewer different candidate component parts where X is a value that reduces n and / or m to a threshold or target number of total candidate component parts and / or different candidate component parts.
[0168] 3. The method of embodiment 1 or 2 further comprising step (e) wherein the one or more selected candidate nanostructures of (d) become the target nanostructure of (a), and wherein steps (a) to (e) are repeated / iterated at least once, optionally at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 100, 150, 200 or more times. 4. The method of embodiment 1 or 2 further comprising step (e) wherein the one or more selected candidate nanostructures of (d) become the target nanostructure of (a), and wherein steps (a) to (e) are repeated / iterated until the one or more candidate nanostructures of (e) comprises a number of component parts that is below a threshold or target number of component parts (materials budget).
[0169] 5. The method of any of embodiments 1 or 2 wherein each selected candidate nanostructure has a set of component parts with a total number of component parts (x) and a total number of different component parts (y) and wherein the method further comprises: step (e) providing, for one or more or each of the selected candidate nanostructures of (d) information about one or a plurality of further candidate sets of component parts wherein each said candidate set of component parts comprises the set of component parts of the corresponding candidate set of component parts, but wherein each set of further candidate component parts comprises i) x-1, or x-2, or x-3 or fewer total further candidate component parts; and / or ii) y-1, or y-2 or y-3 or fewer different candidate component parts; and step (f) measuring or calculating the topology, function and / or geometry of one or the plurality of further candidate nanostructures formed from one or the plurality of sets of further candidate component parts, based on the information about the plurality of sets of further candidate component parts; and step (g) selecting one or more further candidate nanostructures that are predicted to have a topology, function and / or geometry that is within the target range of topologies, function(s) and / or geometry(s).
[0170] 6. The method of any of the preceding embodiments wherein step (b) is a computer implemented step, optionally using a materials minimalization genetic algorithm.
[0171] 7. The method of any of the preceding embodiments wherein the component parts are polynucleotides. 8. The method of embodiment 7 wherein: a) the polynucleotides are composed of nucleic acids that are DNA and / or RNA; and / or b) the polynucleotides comprise a nucleic acid variant of DNA and / or RNA, optionally selected from the group comprising or consisting of: PNAs (peptide nucleic acids), Locked Nucleic Acid (LNA), Morpholino Oligonucleotides (MOs), Glycol Nucleic Acid (GNA), Threose Nucleic Acid (TNA), Xeno Nucleic Acids (XNAs), Cyclohexene Nucleic Acid (CeNA), Phosphorothioate DNA / RNA, and Artificial Expanded Genetic Information Systems.
[0172] 9. The method of any of embodiments 6 or 7 wherein one or more or all of the nucleic acids comprise at least 5 nucleotides for example at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200 or more nucleotides.
[0173] 10. The method of any of the preceding embodiments wherein the nanostructure is a nucleic acid nanostructure.
[0174] 11. The method of any of the preceding embodiments wherein the nanostructure is a DNA nanostructure or an RNA nanostructure.
[0175] 12. The method of any of the preceding embodiments wherein the nanostructure is a nucleic acid origami nanostructure wherein the component parts comprise at least one scaffold polynucleotide and a plurality of staple polynucleotides.
[0176] 13. The method of any of the preceding embodiments wherein the nanostructure is a nucleic acid origami nanostructure that comprises a plurality of substructures, wherein the plurality of substructures is formed from the component parts and each substructure comprises at least one scaffold polynucleotide and a plurality of staple polynucleotides.
[0177] 14. The method of any of the preceding embodiments wherein the nanostructure is a nucleic acid origami substructure that is formed from the component parts and comprises at least one scaffold polynucleotide and a plurality of staple polynucleotides. 15. The method of any of the preceding embodiments comprising a step (a)(i) prior to step (a):
[0178] (a)(i) designing a parent nanostructure that comprises a parent set of component parts and obtaining information about said parent component parts.
[0179] 16. A nanostructure that has been designed according to the method of designing a nanostructure with a particular topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) of any of the preceding embodiments.
[0180] 17. A nanostructure with a programmed disordered structure.
[0181] 18. The nanostructure of any of embodiments 16 or 17 wherein the nanostructure is: a nucleic acid nanostructure, optionally a DNA or RNA nanostructure; or a nucleic acid origami nanostructure, optionally a DNA origami nanostructure.
[0182] 19. A method of designing a nucleic acid nanostructure with a desired geometry that is comprised of a set of component parts that self-assemble to form the nanostructure, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts, and wherein the self-assembly of the component parts results in fewer instances of off-target self-assembly relative to a nanostructure not designed using the method, optionally wherein one or more or all of the nucleic acids are nucleic acid variants.
[0183] 20. A method of reducing the incidence of off-target assembly of a nucleic acid nanostructure with a desired geometry wherein the nucleic acid nanostructure comprises a set of component parts, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts.
[0184] 21. The method of any of embodiments 19 or 20 wherein the nanostructure is a nucleic acid nanostructure, optionally a DNA or an RNA nanostructure. 22. The method of any of embodiments 19-21 wherein the nanostructure is a DNA origami nanostructure that comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids.
[0185] 23. The method of any of embodiments 19-22 wherein the method comprises:
[0186] (a) designing a plurality of sets of test component parts where each set of test component parts is predicted to assemble into a nanostructure with the desired geometry; and
[0187] (b) scoring each set of the test component parts according to an off-target self-assembly cost function.
[0188] 24. The method of embodiment 23, wherein the off-target self-assembly cost function comprises one or more of the following cost criteria: i) Metric 1 (M1) - the extent to which staples can initially bind the scaffold in non-designed locations, wherein M1 is the (absolute) total free energy of initial off-target staple binding locations on the scaffold; ii) Metric 2 (M2) - the prevalence of scaffold-scaffold binding sites, wherein M2 is the (absolute) total energy of scaffold-scaffold binding sites found in all of the distinct binding permutations; iii) Metric 3 (M3) - the worst extent of staple-staple co-folding in the staple set, wherein M3 is the (absolute) energy of the strongest staple-staple co-fold at 55 °C, optionally as assessed by NUPACK mfe(); and / or iv) Metric 4 (M4) - the worst extent of intra-staple secondary structure in the staple set, wherein M4 is the (absolute) energy of the staple with the strongest hairpin-like fold at 55 °C, optionally as assessed by NUPACK mfe().
[0189] 25. The method of embodiment 24 wherein the scores for M1 to M4 are given equal weighting.
[0190] 26. The method of embodiment 24 wherein one or more of M1, M2, M3 or M4 is given more weighting than one or more of the other scores.
[0191] 27. The method according to any of embodiments 19-26 wherein the method comprises applying a filter G to the plurality of sets of component parts wherein filter G identifies the number of staples in a pareto front origami which contain four consecutive Gs in their sequence and optionally: i) removes those staples from the set of sequences; and / or ii) removes the set of component parts from the plurality of test component parts.
[0192] 28. The method of any of embodiments 19-27 comprising mapping the scores for M1-M4 to a low-dimensional objective space and calculating the pareto front of optimal trade-offs in this space.
[0193] 29. The method of any of embodiments 19-28 further comprising applying a multi-criteria decision making method, optionally TOPSIS, SAW or KNEE, to rank pareto candidates, and optionally selecting the one or more highest ranking candidates.
[0194] 30. The method of any of embodiments 19-28 further comprising selecting one or a plurality of sets of component parts with the most desired characteristics for physical production.
[0195] 31. The method of any of embodiments 19-30 further comprising outputting the selected design for physical production.
[0196] 32. The method of any of embodiments 19-31 further comprising producing the selected one or plurality of component parts.
[0197] 33. The method of any of embodiments 19-32 wherein the nucleic acid variant is a nucleic acid variant of DNA and / or RNA, optionally selected from the group comprising or consisting of: PNAs (peptide nucleic acids), Locked Nucleic Acid (LNA), Morpholino Oligonucleotides (MOs), Glycol Nucleic Acid (GNA), Threose Nucleic Acid (TNA), Xeno Nucleic Acids (XNAs), Cyclohexene Nucleic Acid (CeNA), Phosphorothioate DNA / RNA, and Artificial Expanded Genetic Information Systems.
[0198] 34. The method of any of embodiments 19-33 wherein: a) the scaffold sequences are derived from repeat-prone biological sources; b) the staple legs are short, optionally 7 or 8nt in length and bind initially at intermediate temperatures; and / or c) when constant-temperature folding with no initial thermal denaturation step is employed.
[0199] 35. The method of any of embodiments 19-34 wherein the method comprises exactly enumerating and minimising the effects of M1-M4. 36. The method of any of embodiments 23-35 wherein the plurality of sets of test component parts is at least 50, 100, 200, 300, 400, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3250, 3500, 3750, 4000, 4250, 4500, 4750, 5000, 5250, 5500, 5750, 6000 or more sets of test component parts.
[0200] 37. The method of any of embodiments 23-36 wherein determining M1 comprises using a sliding window process to determine binding sites at base-level resolution and allows for mismatching symmetric interior loops of different sizes within binding sites.
[0201] 38. The method of any of embodiments 1-15 wherein the parent nanostructure has been designed according to the method of any of embodiments 19-37.
[0202] 39. The method of any of embodiments 1-15 wherein the selected candidate nanostructure is re-designed using the method of any of embodiments 19-37 so as to result in a nanostructure with a geometry, topology and / or function that is within a target range of geometries, topologies and / or functions with respect to a parent nanostructure, and which has improved on-target assembly with respect to the selected candidate nanostructure.
[0203] 40. A nanostructure that has been designed according to the method of any of embodiments 19-39.
[0204] 41. A nanostructure that has any one or more of: a) missing duplex segments; b) non-uniform edge lengths; c) non-uniform junction valences; d) local flexibility, optionally high local flexibility.
[0205] 42. The nanostructure of embodiment 41 that is not a fully filled, periodic origami lattice.
[0206] 43. A nanostructure with improved on-target self-assembly properties.
[0207] 44. The nanostructure of embodiment 41 wherein the nanostructure is a partially disassembled nanostructure.
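By way of illustration only, the iterative reduction of embodiments 1 to 4 can be sketched as a simple greedy loop that removes one component part at a time while the candidate remains within the target range. This greedy search is a stand-in chosen for brevity and is not the materials minimalization genetic algorithm of embodiment 6. The function names, the `within_target` predicate and the reading of the materials budget as "at most `budget` parts" are assumptions.

```python
from typing import Callable, FrozenSet, Iterable


def reduce_parts(parent: Iterable[str],
                 within_target: Callable[[FrozenSet[str]], bool],
                 budget: int) -> FrozenSet[str]:
    """Greedily remove parts until the set fits the budget or no removal passes.

    `within_target` is a caller-supplied check (hypothetical) that a candidate
    set still yields a topology, function and / or geometry in the target range.
    """
    current = frozenset(parent)
    while len(current) > budget:
        # Step (b): candidate sets with one fewer component part.
        candidates = [current - {part} for part in sorted(current)]
        # Steps (c) and (d): keep candidates that stay within the target range.
        accepted = [c for c in candidates if within_target(c)]
        if not accepted:
            break  # no single removal keeps the structure in range
        # Step (e): the selected candidate becomes the next parent.
        current = accepted[0]
    return current


# Toy check: the structure is taken to be in range while staples s1 and s3 remain.
core = frozenset({"s1", "s3"})
result = reduce_parts({"s1", "s2", "s3", "s4", "s5"}, lambda c: core <= c, budget=3)
print(sorted(result))  # ['s1', 's3', 's5']
```

In practice the predicate would wrap whatever measurement or simulation is used in step (c), and several accepted candidates could be carried forward rather than only the first.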
[0208] The invention also provides the following numbered clauses:
[0209] 1. A method of designing a nanostructure with a topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) and that comprises a plurality (at least two or more) of component parts, wherein said method comprises: a) obtaining information about a set of parent component parts that comprise a parent nanostructure which has said or is estimated or calculated to have said topology, function and / or geometry, wherein said set of parent component parts comprises n total parent component parts and m different parent component parts; b) reducing the number of parent component parts or the number of different parent component parts to produce one or a plurality of candidate sets of component parts wherein each said candidate set of component parts comprises the same set of parent component parts, but wherein each candidate set of component parts comprises i) n-1, or n-2, or n-3 or fewer total candidate component parts; and / or ii) m-1, or m-2 or m-3 or fewer different candidate component parts; c) measuring or calculating the topology, function and / or geometry of a candidate nanostructure formed from one or the plurality of sets of candidate component parts, based on the information about the plurality of sets of candidate component parts; d) selecting one or more candidate nanostructures that are measured or calculated to have a topology, function and / or geometry that is within the target range of topologies, function(s) and / or geometry(s).
[0210] 2. The method of clause 1 wherein in step (b) one or more of the candidate sets of component parts comprises the set of parent component parts but has: a) n-X or fewer total candidate component parts; or b) m-X or fewer different candidate component parts where X is a value that reduces n and / or m to a threshold or target number of total candidate component parts and / or different candidate component parts.
[0211] 3. The method of clause 1 or 2 further comprising step (e) wherein the one or more selected candidate nanostructures of (d) become the target nanostructure of (a), and wherein steps (a) to (e) are repeated / iterated at least once, optionally at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 100, 150, 200 or more times. 4. The method of clause 1 or 2 further comprising step (e) wherein the one or more selected candidate nanostructures of (d) become the target nanostructure of (a), and wherein steps (a) to (e) are repeated / iterated until the one or more candidate nanostructures of (e) comprises a number of component parts that is below a threshold or target number of component parts (materials budget).
[0212] 5. The method of any of clauses 1 or 2 wherein each selected candidate nanostructure has a set of component parts with a total number of component parts (x) and a total number of different component parts (y) and wherein the method further comprises: step (e) providing, for one or more or each of the selected candidate nanostructures of (d) information about one or a plurality of further candidate sets of component parts wherein each said candidate set of component parts comprises the set of component parts of the corresponding candidate set of component parts, but wherein each set of further candidate component parts comprises i) x-1, or x-2, or x-3 or fewer total further candidate component parts; and / or ii) y-1, or y-2 or y-3 or fewer different candidate component parts; and step (f) measuring or calculating the topology, function and / or geometry of one or the plurality of further candidate nanostructures formed from one or the plurality of sets of further candidate component parts, based on the information about the plurality of sets of further candidate component parts; and step (g) selecting one or more further candidate nanostructures that are predicted to have a topology, function and / or geometry that is within the target range of topologies, function(s) and / or geometry(s).
[0213] 6. The method of any of the preceding clauses wherein step (b) is a computer implemented step, optionally using a materials minimalization genetic algorithm.
[0214] 7. The method of any of the preceding clauses wherein the component parts are polynucleotides.
[0215] 8. The method of clause 7 wherein: a) the polynucleotides are composed of nucleic acids that are DNA and / or RNA; and / or b) the polynucleotides comprise a nucleic acid variant of DNA and / or RNA, optionally selected from the group comprising or consisting of: PNAs (peptide nucleic acids), Locked Nucleic Acid (LNA), Morpholino Oligonucleotides (MOs), Glycol Nucleic Acid (GNA), Threose Nucleic Acid (TNA), Xeno Nucleic Acids (XNAs), Cyclohexene Nucleic Acid (CeNA), Phosphorothioate DNA / RNA, and Artificial Expanded Genetic Information Systems.
[0216] 9. The method of any of clauses 6 or 7 wherein one or more or all of the nucleic acids comprise at least 5 nucleotides for example at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200 or more nucleotides.
[0217] 10. The method of any of the preceding clauses wherein the nanostructure is a nucleic acid nanostructure.
[0218] 11. The method of any of the preceding clauses wherein the nanostructure is a DNA nanostructure or an RNA nanostructure.
[0219] 12. The method of any of the preceding clauses wherein the nanostructure is a nucleic acid origami nanostructure wherein the component parts comprise at least one scaffold polynucleotide and a plurality of staple polynucleotides.
[0220] 13. The method of any of the preceding clauses wherein the nanostructure is a nucleic acid origami nanostructure that comprises a plurality of substructures, wherein the plurality of substructures is formed from the component parts and each substructure comprises at least one scaffold polynucleotide and a plurality of staple polynucleotides.
[0221] 14. The method of any of the preceding clauses wherein the nanostructure is a nucleic acid origami substructure that is formed from the component parts and comprises at least one scaffold polynucleotide and a plurality of staple polynucleotides.
[0222] 15. The method of any of the preceding clauses comprising a step (a)(i) prior to step (a): (a)(i) designing a parent nanostructure that comprises a parent set of component parts and obtaining information about said parent component parts.
[0223] 16. A nanostructure that has been designed according to the method of designing a nanostructure with a particular topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) of any of the preceding clauses.
[0224] 17. A nanostructure with a programmed disordered structure.
[0225] 18. The nanostructure of any of clauses 16 or 17 wherein the nanostructure is: a nucleic acid nanostructure, optionally a DNA or RNA nanostructure; or a nucleic acid origami nanostructure, optionally a DNA origami nanostructure.
[0226] 19. A method of designing a nucleic acid nanostructure with a desired geometry that is comprised of a set of component parts that self-assemble to form the nanostructure, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts, and wherein the self-assembly of the component parts results in fewer instances of off-target self-assembly relative to a nanostructure not designed using the method, optionally wherein one or more or all of the nucleic acids are nucleic acid variants.
[0227] 20. A method of reducing the incidence of off-target assembly of a nucleic acid nanostructure with a desired geometry wherein the nucleic acid nanostructure comprises a set of component parts, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts.
[0228] 21. The method of any of clauses 19 or 20 wherein the nanostructure is a nucleic acid nanostructure, optionally a DNA or an RNA nanostructure.
[0229] 22. The method of any of clauses 19-21 wherein the nanostructure is a DNA origami nanostructure that comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids.
[0230] 23. The method of any of clauses 19-22 wherein the method comprises: (a) designing a plurality of sets of test component parts where each set of test component parts is predicted to assemble into a nanostructure with the desired geometry; and
[0231] (b) scoring each set of the test component parts according to an off-target self-assembly cost function.
[0232] 24. The method of clause 23, wherein the off-target self-assembly cost function comprises one or more of the following cost criteria: i) Metric 1 (M1) - the extent to which staples can initially bind the scaffold in non-designed locations, wherein M1 is the (absolute) total free energy of initial off-target staple binding locations on the scaffold; ii) Metric 2 (M2) - the prevalence of scaffold-scaffold binding sites, wherein M2 is the (absolute) total energy of scaffold-scaffold binding sites found in all of the distinct binding permutations; iii) Metric 3 (M3) - the worst extent of staple-staple co-folding in the staple set, wherein M3 is the (absolute) energy of the strongest staple-staple co-fold at 55°C, optionally as assessed by NUPACK mfe(); and / or iv) Metric 4 (M4) - the worst extent of intra-staple secondary structure in the staple set, wherein M4 is the (absolute) energy of the staple with the strongest hairpin-like fold at 55°C, optionally as assessed by NUPACK mfe().
[0233] 25. The method of clause 24 wherein the scores for M1 to M4 are given equal weighting.
[0234] 26. The method of clause 24 wherein one or more of M1, M2, M3 or M4 is given more weighting than one or more of the other scores.
[0235] 27. The method according to any of clauses 19-26 wherein the method comprises applying a filter G to the plurality of sets of component parts, wherein filter G identifies the number of staples in a pareto front origami which contain four consecutive Gs in their sequence and optionally: i) removes them from the set of sequences; and / or ii) removes the set of component parts from the plurality of test component parts.
[0236] 28. The method of any of clauses 19-27 comprising mapping the scores for M1 to M4 to a low-dimensional objective space and calculating the pareto front of optimal trade-offs in this space.
29. The method of any of clauses 19-28 further comprising applying a multi-criteria decision making method, optionally TOPSIS, SAW or KNEE, to rank pareto candidates, and optionally selecting the one or more highest ranking candidates.
[0237] 30. The method of any of clauses 19-28 further comprising selecting one or a plurality of sets of component parts with the most desired characteristics for physical production.
[0238] 31. The method of any of clauses 19-30 further comprising outputting the selected design for physical production.
[0239] 32. The method of any of clauses 19-31 further comprising producing the selected one or plurality of component parts.
[0240] 33. The method of any of clauses 19-32 wherein the nucleic acid variant is a nucleic acid variant of DNA and / or RNA, optionally selected from the group comprising or consisting of: PNAs (peptide nucleic acids), Locked Nucleic Acid (LNA), Morpholino Oligonucleotides (MOs), Glycol Nucleic Acid (GNA), Threose Nucleic Acid (TNA), Xeno Nucleic Acids (XNAs), Cyclohexene Nucleic Acid (CeNA), Phosphorothioate DNA / RNA, and Artificial Expanded Genetic Information Systems.
[0241] 34. The method of any of clauses 19-33 wherein: a) the scaffold sequences are derived from repeat-prone biological sources; b) the staple legs are short, optionally 7 or 8nt in length, and bind initially at intermediate temperatures; and / or c) constant-temperature folding with no initial thermal denaturation step is employed.
[0242] 35. The method of any of clauses 19-34 wherein the method comprises exactly enumerating and minimising the effects of M1-M4.
[0243] 36. The method of any of clauses 23-35 wherein the plurality of sets of test component parts is at least 50, 100, 200, 300, 400, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3250, 3500, 3750, 4000, 4250, 4500, 4750, 5000, 5250, 5500, 5750, 6000 or more sets of test component parts.
37. The method of any of clauses 23-36 wherein determining M1 comprises using a sliding window process to determine binding sites at base-level resolution and allows for mismatching symmetric interior loops of different sizes within binding sites.
[0244] 38. The method of any of clauses 1-15 wherein the parent nanostructure has been designed according to the method of any of clauses 19-37.
[0245] 39. The method of any of clauses 1-15 wherein the selected candidate nanostructure is re-designed using the method of any of clauses 19-37 so as to result in a nanostructure with a geometry, topology and / or function that is within a target range of geometries, topologies and / or functions with respect to a parent nanostructure, and which has improved on-target assembly with respect to the selected candidate nanostructure.
[0246] 40. A nanostructure that has been designed according to the method of any of clauses 19-39, optionally where the nanostructure has any one or more of: a) missing duplex segments; b) non-uniform edge lengths; c) non-uniform junction valences; d) local flexibility, optionally high local flexibility, optionally where the nanostructure is not a fully filled, periodic origami lattice.
[0247] 41. A nanostructure with improved on-target self-assembly properties.
[0248] 42. A method of generating a database wherein the method comprises: i) the method of designing a nanostructure with a topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) and that comprises a plurality (at least two or more) of component parts of any of clauses 1-15; and / or ii) the method of designing a nucleic acid nanostructure with a desired geometry that is comprised of a set of component parts that self-assemble to form the nanostructure, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts, and wherein the self-assembly of the component parts results in fewer instances of off-target self-assembly relative to a nanostructure not designed using the method, optionally wherein one or more or all of the nucleic acids are nucleic acid variants of any of clauses 19, or 21-39; and / or iii) the method of reducing the incidence of off-target assembly of a nucleic acid nanostructure with a desired geometry wherein the nucleic acid nanostructure comprises a set of component parts, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts and wherein the database comprises information about the component parts of said nanostructures of any of clauses 20-39; and outputting the resultant sequences or the component parts and / or other associated data to a database.
[0249] 43. A method for training an AI algorithm, for example a generative AI system, wherein the method comprises training the AI system on one or more databases generated by the method of clause 42.
[0250] 44. A method of designing a nanostructure using an algorithm such as an AI algorithm that has been trained on a dataset or database generated by the method of clause 42.
[0251] 45. A partially disordered nanostructure.
[0252] 46. A partially disordered nanostructure produced by a method of any of the preceding clauses.
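By way of illustration only, the weighting of clauses 25 and 26 and the filter G of clause 27 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the Examples: the function names are hypothetical, the four scores are assumed to be already computed (and, where weights differ, already normalised to comparable scales), and the combined cost is assumed to be a simple weighted mean.

```python
from typing import Dict, List, Sequence


def weighted_cost(scores: Sequence[float],
                  weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Combine the scores (M1..M4) into one cost; lower is better.

    Equal weights correspond to clause 25, unequal weights to clause 26.
    The weighted-mean form is an assumption made for this sketch.
    """
    if len(scores) != len(weights):
        raise ValueError("one weight is required per score")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def staples_with_g_run(staples: Sequence[str], run_length: int = 4) -> List[str]:
    """Return the staples that contain `run_length` consecutive Gs."""
    motif = "G" * run_length
    return [s for s in staples if motif in s.upper()]


def apply_filter_g(variants: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Option (ii) of clause 27: drop every variant with a flagged staple."""
    return {name: staples for name, staples in variants.items()
            if not staples_with_g_run(staples)}
```

For example, `staples_with_g_run(["ACGGGGTA", "ACGGGTA"])` flags only the first staple, because the second contains a run of three Gs.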
[0253] Figure Legends
[0254] Figure 1: Materials Minimisation for DNA / RNA (dis)origami and other nanotechnologies based on forward and backward optimisation of DNA & RNA molecules.
[0255] • Sequence Selector Software (C1): This tool uses multicriteria decision-making (MCDM) techniques to select optimal scaffold and staple strands, minimizing off-target binding and optimizing energy distribution.
[0256] • REVNANO Software (C2): This reverse-engineering tool aligns scaffold and staple sequences, identifies crossover points, and infers spatial arrangements, which aids in design refinements. REVNANO is an example of software that is suitable for use in this step. Other suitable software may also be used, for example other rapid DNA / RNA multi-strand conformation prediction tools.
• Genetic Algorithm (C3): This algorithm iteratively minimizes the number of staples while maintaining the structural integrity of the origami, producing multiple optimized design variants.
[0257] The workflow integrates these components to generate highly optimized and cost-effective DNA / RNA (dis)origami designs.
[0258] Figure 2: Experimental confirmation of materials reduction impact on agarose gel mobility. Starting with full origami on the left and moving towards a disorigami on the right with minimised materials.
[0259] Figure 3: Atomic Force Microscopy images for a range of DNA nanostructures (disorigami structures) that have been designed using the method of designing a nanostructure with a particular topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) as described herein.
[0260] Figure 4 (a-f): Principles of multi-objective scaffold sequence selection. (a) The same DNA origami target design can be realised by many different scaffold sequences. Three possible sequence variants are displayed for a triangle origami. Each variant consists of a unique scaffold sequence along with the Watson-Crick complementary staple sequences that pin the scaffold strand into a specific rotation. (b) Scaffold sequences are commonly sourced from regions of biological vectors, custom-cloned sequences or from synthetic sequences. (c) The origami self-assembly reaction involves numerous off-target bindings (marked as [x]) in addition to the intended on-target staple / scaffold bindings (marked as [y]). Off-target bindings are classified into four types 1-4. (d) Scaffold sequence selection takes place by scoring a large pool of scaffold / staple sequence variants on four cost criteria that quantify the extent of each off-target binding site type in panel (c). Full definition of each metric is described elsewhere herein. (e) After scoring, all variants (here n = 5000) are mapped to a low-dimensional objective space (N scores are omitted for 3D visualisation). The pareto front of optimal trade-off variants is computed in this space (black dots). (f) A multi-criteria decision making method (MCDM; here TOPSIS) ranks pareto candidates and aids a human decision maker in choosing a single scaffold / staple sequence variant to order for in vitro assembly. To visualise TOPSIS ranking, TOPSIS score contours are shown on a 2-dimensional objective space: normally this method would work on a 3 or 4 dimensional objective space.
[0261] Figure 5: Triangle and rectangle DNA origami sequence variants for in vitro assembly. Three variants of a DNA origami triangle T1, T2, T3 (2410nt scaffold) and three variants of a DNA origami rectangle R1, R2, R3 (2484nt scaffold) were selected for in vitro assembly. Due to their different scaffold sequences, the origami variants had different extents of off-target side reactions during the self-assembly reaction. Graphical depiction of on- and off-target binding sites existing between staples and scaffold (M1) for variants of the triangle origami. All variants have the same on-target sites, but T3 has more off-target staple binding sites than T2 which in turn has more than T1. See Supplementary Note 10 for full analysis of off-target sites.
[0262] Figure 6: Triangle DNA origami sequence variants: agarose gel and AFM images. (a) Triangle variant T1. (b) Triangle variant T2. (c) Triangle variant T3. In (a), (b) and (c) the first panel shows laser scanned image of SYBR Gold-stained 1% TBE agarose gel with lanes: L = 1kb Ladder; Sc = scaffold only; St = staples only; A = Origami self-assembly reaction mix (scaffold + staples); P = Purified assembly reaction mix used for AFM imaging. The assembly reaction was performed under a fast temperature ramp of 95°C to 20°C in 1h 15min (-0.1°C per cycle, each cycle 6 seconds). Remaining panels show representative high-resolution AFM images of the purified DNA self-assembly sample (P). (d) Manual quantification of origami objects on AFM images into "well-folded", "semi-folded" and "mis-folded" categories. Percent of origami objects in each category shown with mean sample size of 130 objects.
[0263] Figure 7: Rectangle DNA origami sequence variants: agarose gel and AFM images. (a) Rectangle variant R1. (b) Rectangle variant R2. (c) Rectangle variant R3. In (a), (b) and (c) the first panel shows laser scanned image of SYBR Gold-stained 1% TBE agarose gel with lanes: L = 1kb Ladder; Sc = scaffold only; St = staples only; A = Origami self-assembly reaction mix (scaffold + staples); P = Purified assembly reaction mix used for AFM imaging. The assembly reaction was performed under a fast temperature ramp of 95°C to 20°C in 1h 15min (-0.1°C per cycle, each cycle 6 seconds). Remaining panels show representative high-resolution AFM images of the purified DNA self-assembly sample (P). In (a), dotted lines on AFM images indicate a composite image made of different sample areas to account for low origami adsorption density. (d) Manual quantification of origami objects on AFM images into "well-folded", "semi-folded" and "mis-folded" categories. Percent of origami objects in each category shown with mean sample size of 123 objects.
Examples
[0264] Example 1
[0265] Three reduced sets of staples were generated using computational tools and tested experimentally showing differences in the mobility shifts in agarose electrophoresis. See Figure 2.
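Example 1 does not specify the computational tools used. Purely as an illustration of iteratively removing component parts while a calculated property stays within a target range, the following is a minimal sketch that uses a simple greedy loop as a stand-in. It is not the method of the Example; `reduce_parts`, `evaluate` and `within_target` are hypothetical names, and the property being evaluated is a toy placeholder for whatever topology, function or geometry measure is chosen.

```python
from typing import Callable, FrozenSet, Iterable


def reduce_parts(parent: Iterable[str],
                 evaluate: Callable[[FrozenSet[str]], float],
                 within_target: Callable[[float], bool],
                 max_rounds: int = 10) -> FrozenSet[str]:
    """Greedily remove one part per round while the design stays in range.

    Each round builds every candidate set with one part fewer, keeps the
    first candidate (in sorted part order) whose evaluated property is
    still within the target range, and repeats from that candidate.
    """
    current = frozenset(parent)
    for _ in range(max_rounds):
        accepted = None
        for part in sorted(current):
            candidate = current - {part}
            if within_target(evaluate(candidate)):
                accepted = candidate
                break
        if accepted is None:  # no smaller set satisfies the target range
            break
        current = accepted
    return current


# Toy usage: each staple contributes a hypothetical "integrity" weight and
# the target range is an integrity of at least 8.
weights = {"s1": 5, "s2": 1, "s3": 1, "s4": 3}
reduced = reduce_parts(weights,
                       evaluate=lambda parts: sum(weights[p] for p in parts),
                       within_target=lambda value: value >= 8)
```

In this toy case the loop removes `s2` and then `s3`, and stops because removing either remaining staple would leave the target range.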
[0266] Example 2 - Reducing DNA origami assembly defects through selection of scaffold sequences that minimise off-target side reactions
[0267] Scaffold sequence selection from biological vectors is most effective
[0268] To see in which scenarios scaffold sequence selection is most effective at minimising off-target reactions, we performed a large-scale computational investigation that used our method to select scaffold sequences for 14 different 2D and 3D origami designs (see supplementary information to Example 2 in Example 3).
[0269] Each origami design had sequence variants (scaffold and complementary staple sequences) selected from three qualitatively different sequence pools: (i) a biological vectors pool containing 5000 scaffold sequences taken from different random contiguous regions of biological vectors pUC19 (2686bp), M13mp18 (7249nt), p7560, p8064 and Lambda DNA (48502bp); (ii) a random pool containing 5000 synthetic random 4-letter scaffold sequences, where each base had an occurrence probability of 0.25; (iii) a de Bruijn pool containing 5000 synthetic de Bruijn sequences (where a de Bruijn sequence of order k has the special mathematical property that a window of length k bases (or larger) will never frame exactly the same sequence fragment twice when it is moved along the sequence base-by-base). We used k = 5 for scaffolds 257 to 1024nt; k = 6 for scaffolds 1025 to 4096nt, and k = 7 for scaffolds 4097nt to 16384nt. Note that the three sequence pools described above were different for each origami. Each sequence pool only provided a very sparse sampling of the combinatorially vast sequence space; however, this sparse sampling was sufficient to create significant variation in the metric scores.
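The window property of a de Bruijn sequence can be checked directly. The following is a minimal sketch, assuming the standard recursive (Lyndon word) construction; how the 5000 distinct de Bruijn sequences of each pool were actually generated is not stated here, and the function names are hypothetical. Note that the upper scaffold lengths quoted for each k (1024, 4096 and 16384nt) equal 4^k, the length of one de Bruijn sequence of order k over four bases.

```python
from typing import List


def de_bruijn(alphabet: str, k: int) -> str:
    """Return one cyclic de Bruijn sequence of order k (length len(alphabet)**k)."""
    n = len(alphabet)
    a = [0] * (n * k)
    out: List[int] = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)


def has_unique_windows(seq: str, k: int) -> bool:
    """True if no window of length k frames the same fragment twice."""
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return len(windows) == len(set(windows))


scaffold = de_bruijn("ACGT", 5)  # 4**5 = 1024 bases
```

Any contiguous fragment of such a sequence keeps the property, which is why a scaffold shorter than 4^k can be cut from it.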
[0270] Additionally, we tried selecting scaffold rotations of the published scaffold sequence for those origamis in the test set which had a circular scaffold (where 'scaffold rotation' signifies the physical translation of the fixed-sequence scaffold strand around the scaffold routing path in the origami nanostructure by choice of a different set of complementary staples). Overall, the computational investigation revealed that the biological vectors pool produced the largest variance in all scoring metrics M1 to M4 for each of the 14 test origamis. This is consistent with the fact that biological sequences typically have a heterogeneous spatial distribution of GC content which creates higher densities of repeated sub-sequences in specific regions.
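A scaffold rotation can be pictured as shifting a circular scaffold along a fixed routing path and re-deriving the complementary staples. The sketch below is a simplification made for illustration: each staple is assumed to bind one contiguous scaffold segment given as a hypothetical `(start, length)` pair, whereas real origami staples typically bind several segments.

```python
from typing import List, Sequence, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Watson-Crick reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotate(scaffold: str, offset: int) -> str:
    """Shift a circular scaffold so that base `offset` sits at routing position 0."""
    offset %= len(scaffold)
    return scaffold[offset:] + scaffold[:offset]


def staples_for_rotation(scaffold: str,
                         segments: Sequence[Tuple[int, int]],
                         offset: int) -> List[str]:
    """Staples complementary to the rotated scaffold at fixed routing positions."""
    rotated = rotate(scaffold, offset)
    return [reverse_complement(rotated[start:start + length])
            for start, length in segments]
```

The routing positions stay fixed, so only the staple sequences change from one rotation to the next.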
[0271] Selection of scaffolds for all 14 test origamis from the biological vectors pool above yielded objective spaces where the reverse front and the pareto front of sequence variants were separated by a large margin. Even for larger 8knt scaffold origamis in the test set, there was still an approximately 40% relative reduction in off-target sites from the worst sequence variant on the reverse front (with many off-target reactions) to the best sequence variant on the pareto front (with fewest off-target reactions). Therefore, we reasoned that scaffold sequences derived from biological vectors were not only the most commonly used to fabricate origamis, but also the most effective target for our multi-objective selection method.
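The pareto front and reverse front of a scored pool can be computed with a simple dominance test. The following is a minimal sketch that assumes every metric is a cost to be minimised and that the reverse front is the set of non-dominated points when the same costs are maximised; the latter definition is an assumption made for this sketch, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on every cost and strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points: Sequence[Point]) -> List[int]:
    """Indices of variants that no other variant dominates (best trade-offs)."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for q in points)]


def reverse_front(points: Sequence[Point]) -> List[int]:
    """Indices of the worst trade-offs: the pareto front of the negated costs."""
    negated = [tuple(-x for x in p) for p in points]
    return pareto_front(negated)
```

This pairwise comparison is quadratic in the pool size, which remains practical for pools of a few thousand variants.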
[0272] For the synthetic sequence pools, we found that de Bruijn sequences had the best absolute performance across the test set of 14 origamis. These sequences optimally minimised the total number of off-target reactions between staples and scaffold (but note that M1 scores did not reduce to zero since the energy model additionally includes symmetric mismatches in binding sites that are encompassed by the de Bruijn sequence property).
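Off-target staple / scaffold sites of the kind counted above can be located with a sliding window. The sketch below is deliberately simplified: it reports only exact complementary matches of a fixed window length, and it omits both the free-energy model and the symmetric mismatches that the M1 metric allows. All names and the default window length are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Watson-Crick reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def off_target_sites(scaffold: str, staple: str,
                     designed_start: int, window: int = 8) -> List[Tuple[int, int]]:
    """Exact off-target sites as (scaffold position, offset within binding site).

    The offset is measured along the scaffold-sense sequence that the staple
    binds, i.e. the reverse complement of the staple. Matches in the designed
    register are on-target and are skipped.
    """
    target = reverse_complement(staple)

    # Index every window-length fragment of the scaffold by position.
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(scaffold) - window + 1):
        index[scaffold[i:i + window]].append(i)

    sites = []
    for j in range(len(target) - window + 1):
        for i in index.get(target[j:j + window], []):
            if i != designed_start + j:
                sites.append((i, j))
    return sites
```

On a de Bruijn scaffold of order k, no exact off-target site of length k or more can exist, so any residual score has to come from the mismatched sites that this sketch omits.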
[0273] De Bruijn sequences also significantly decreased scaffold / scaffold bindings (M2). Interestingly, even though the formal de Bruijn sequence property was not originally developed in relation to nucleic acids - and hence did not consider base pairing between a de Bruijn sequence and itself - this type of self-interaction also turns out to be quite minimal for de Bruijn sequences.
[0274] We found (synthetic) random sequences to minimise scaffold-scaffold interactions as well as de Bruijn sequences, and to typically perform better than biological sequences (but worse than de Bruijn sequences) at minimising off-target staple-scaffold interactions (M1). This performance can be attributed to the fact that a non-designed binding region on a random sequence is exponentially less likely to be matched by a complement as it grows in length. In general, we found little benefit in performing DNA origami scaffold selection from large pools of de Bruijn or random sequences because of their generally good baseline performance (but see Example 3 for more information). We also observed that rotating the fixed-sequence scaffold strand of a DNA origami through the nanostructure did little to minimise the already present off-target reactions, particularly for larger origamis (Example 3). This effect can be anticipated, as rotations of the scaffold only change the staple sequences and different staples inherit similar sequence patterns as the scaffold is rotated. Experimentally, scaffold strand rotations have also been found not to affect assembly yield for large origamis.
[0275] On measuring the Pearson correlation between all unique pairs of metric scores M1 to M4, for all 14 origamis across all sequence pools, we detected no intrinsic linear correlation (Example 4). Visual observation of scatter plots also revealed no non-linear correlations. This signified that the scoring metrics measured independent off-target binding phenomena as intended.
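The exponential fall-off for random sequences can be made concrete with a short calculation. The sketch below assumes independent, uniformly distributed bases (probability 0.25 each, as in the random pool) and exact matches only; `expected_exact_matches` is a hypothetical helper name.

```python
def expected_exact_matches(scaffold_length: int, site_length: int) -> float:
    """Expected number of positions on a random scaffold that exactly match
    one given fragment of `site_length` bases.

    Each position matches with probability 0.25 ** site_length, and there are
    scaffold_length - site_length + 1 positions to test.
    """
    positions = scaffold_length - site_length + 1
    return max(positions, 0) * 0.25 ** site_length


# For a 2410nt scaffold, doubling the site length from 8 to 16 bases lowers
# the expected count by a factor of roughly 4 ** 8 = 65536.
short_site = expected_exact_matches(2410, 8)
long_site = expected_exact_matches(2410, 16)
```

Under these assumptions the expected count for an 8-base site on a 2410nt scaffold is 2403 / 65536, or about 0.037.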
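The pairwise correlation check can be sketched as follows, assuming the scores for each metric are held as equal-length lists with one entry per sequence variant. The function names and the dictionary layout are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


def pairwise_pearson(metrics: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Correlation for every unique pair of metrics (six pairs for M1..M4)."""
    return {(a, b): pearson(metrics[a], metrics[b])
            for a, b in combinations(sorted(metrics), 2)}
```

Values near zero for all six pairs would be consistent with the absence of linear correlation reported above; non-linear relationships still need a visual or rank-based check.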
[0276] Finally, the particular MCDM method (SAW, KNEE or TOPSIS) used to select a single origami sequence variant from the pareto front only made a difference in isolated cases. Largely due to the data normalisation scheme we used (Example 5), the MCDM methods approximately agreed about the best candidate sequence variants on the pareto front, and the worst candidate variants on the reverse front in objective space.
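For orientation, a textbook form of TOPSIS is sketched below. It assumes all criteria are costs to be minimised, positive weights, and vector (Euclidean) normalisation of each criterion; the normalisation scheme actually used is the one described in Example 5 and may differ. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def topsis_scores(rows: Sequence[Sequence[float]],
                  weights: Optional[Sequence[float]] = None) -> List[float]:
    """Closeness of each candidate to the ideal point (1.0 = best, 0.0 = worst)."""
    n_criteria = len(rows[0])
    w = list(weights) if weights is not None else [1.0] * n_criteria

    # Vector-normalise each criterion, then apply the weights.
    norms = [math.sqrt(sum(r[j] ** 2 for r in rows)) or 1.0
             for j in range(n_criteria)]
    scaled = [[w[j] * r[j] / norms[j] for j in range(n_criteria)] for r in rows]

    # All criteria are costs: the ideal point takes each column minimum.
    ideal = [min(col) for col in zip(*scaled)]
    anti_ideal = [max(col) for col in zip(*scaled)]

    scores = []
    for row in scaled:
        d_plus = math.dist(row, ideal)
        d_minus = math.dist(row, anti_ideal)
        total = d_plus + d_minus
        scores.append(d_minus / total if total > 0 else 0.0)
    return scores


def rank(scores: Sequence[float]) -> List[int]:
    """Candidate indices ordered from best to worst."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

Applied to the metric scores of the pareto candidates, `rank(topsis_scores(rows))` places the variant closest to the ideal point first.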
[0277] Selection of DNA origami triangle and rectangle variants for in vitro assembly
[0278] To empirically test if selecting scaffold sequences from regions of biological sequences with relatively many or relatively few off-target reactions had a measurable effect on DNA origami assembly yield, we selected three sequence variants of a 2410nt DNA origami triangle and three sequence variants of a 2484nt DNA origami rectangle for in vitro assembly. See Figure 5 for details.
[0279] Triangle variants T1, T2 and T3 had a successively increasing number of potential off-target bindings. Variant T1 acted as a control, with a synthetic scaffold sequence selected from a pool of 5000 de Bruijn sequences. Variant T2 was selected from the pareto front from a pool of 5000 biological vector sequences (derived from contiguous fragments of the pUC19, M13mp18, p7560, p8064 and Lambda DNA sequences), while variant T3 was selected from the reverse front of the latter pool. Rectangle variants R1, R2 and R3 were obtained similarly. It should be noted that all sequence variants T1, T2, T3 formed exactly the same triangle design when perfectly self-assembled. Namely, all implemented the same staple pattern, the same scaffold routing, and all had a staple set which was fully complementary to the scaffold sequence. Likewise, all sequence variants R1, R2, R3 implemented exactly the same rectangle design. The only factor differentiating variants was the number of off-target bindings possible in each case. 2D shapes were chosen for assembly because they deposit flat on mica and are amenable to direct AFM imaging for assessment of assembly quality.
[0280] While the 5000 biological vector sequences prepared for each origami came from fragments of five different biological sequences, the scaffold sequences actually selected from the pareto (T2, R2) and reverse fronts (T3, R3) all came from the Lambda DNA phage sequence.
[0281] Synthesis of single-stranded DNA scaffold sequences
[0282] Single-stranded DNA scaffold sequences can be made by any means.
[0283] In the Examples, strand-specific T7 exonuclease digestion was employed to synthesise tailor-made single-stranded DNA sequences (ssDNA) to be used as scaffolds in the folding reactions of all six origami variants (three triangles and three rectangles).
[0284] Phosphorothioate protective modification at the 5' end of the desired ssDNA strand was introduced by polymerase chain reaction (PCR) using a modified forward primer as previously reported. In detail, the double-stranded DNAs (dsDNA) of defined sequence and length were amplified from Lambda DNA or gBlocks Gene Fragments templates using a modified forward primer which contains sequential phosphorothioate bonds between the first five nucleotides. The modified phosphate backbone inhibited exonuclease digestion, while the non-required antisense strand was digested by T7 exonuclease. Six different dsDNA sequences were amplified and purified.
[0285] Next, the purified PCR products were selectively digested by T7 exonuclease overnight. The resulting ssDNA was purified to remove the polymerase, concentrated and resuspended in water to remove undesired ions. The ssDNA sequences of lengths 2410 nt (triangle scaffolds) and 2484 nt (rectangle scaffolds) showed higher gel mobilities compared to the non-digested dsDNA. A faint higher band was also observed in each scaffold variant and considered as non-digested dsDNA. Sanger sequencing verified the correct composition of each ssDNA scaffold and indicated no off-target degradation of the sense strand, as previously shown.
[0286] Triangle and rectangle DNA origami variants: folding and pre-screening by gel electrophoresis
[0287] The origami self-assembly reactions were run in tris-acetate-EDTA (TAE) buffer containing 12.5 mM of magnesium acetate and with a 10-fold excess of staple strands, a common protocol for 2D origami folding.
[0288] For each origami variant, 4 different thermal annealing protocols were considered. The mixtures were heated at 95°C for a short period of time and then gradually cooled down to 20°C following a slow (5h 40 min) or a fast (1h 15min) temperature ramp. Isothermal folding protocols were also considered, testing constant annealing temperature (37°C) with or without initial denaturation at 95°C.
[0289] To identify potential nanostructure side products (partially assembled and / or misfolded intermediates) and aggregates, folding solutions incubated at different temperature conditions were first analysed by agarose gel electrophoresis (AGE), a method commonly used to initially assess origami assembly performance. The migration distances, the presence of smearing or multiple bands, and band sharpness were considered as indicators of the folding quality, allowing us to select a specific folding condition for subsequent purification and characterisation by atomic force microscopy (AFM). Scaffold alone, staple strands alone, and scaffold mixed with a non-complementary staple set were used as negative controls. The formation of DNA origami nanostructures should lead to a mobility shift compared to the scaffold band, which is expected to disappear, while scaffold mixed with a non-complementary staple set should migrate as the scaffold alone.
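The duration of the fast ramp can be cross-checked against the step parameters given in the legends to Figures 6 and 7 (-0.1°C per cycle, 6 seconds per cycle). The following is a small arithmetic sketch; the helper names are hypothetical, and exact fractions are used to avoid floating-point rounding of the 0.1°C step.

```python
from fractions import Fraction
from typing import Tuple, Union

Number = Union[int, str, Fraction]


def ramp_duration_seconds(start_c: Number, end_c: Number,
                          step_c: Number, seconds_per_step: int) -> Fraction:
    """Total time of a stepwise cooling ramp from start_c down to end_c."""
    steps = (Fraction(start_c) - Fraction(end_c)) / Fraction(step_c)
    return steps * seconds_per_step


def as_hours_minutes(seconds: Fraction) -> Tuple[int, int]:
    """Split a duration in seconds into whole hours and remaining whole minutes."""
    hours, remainder = divmod(int(seconds), 3600)
    return hours, remainder // 60


fast = ramp_duration_seconds(95, 20, "0.1", 6)  # 750 steps of 6 s each
```

The 750 steps of 6 seconds give 4500 seconds, which agrees with the stated 1h 15min. If the slow ramp used the same 0.1°C step (an assumption, since its step parameters are not given), its 5h 40 min duration would correspond to 27.2 seconds per step.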
[0290] The triangle DNA origami variants T1, T2, T3 folded under isothermal conditions without initial denaturation showed a similar band pattern characterised by a main band, a second fainter and diffuse band, and smearing. While T1 and T2 variants folded under isothermal conditions with initial denaturation were characterised by a similar band pattern and migration distance, the T3 variant showed aggregates visible in the loading well as a non-migrating band. Gel electrophoretic analysis of the same variants folded following a fast ramp revealed different band patterns compared to the profiles reported above. In detail, sharp leading bands without intense smearing were clearly visible, while aggregates were still formed in the T3 variant. The scaffold mixed with a non-complementary staple set showed the same migration distance as the scaffold alone.
[0291] After 4 days (slow ramp) or 5 days (isothermal folding and fast ramp) at 4°C, the structural stability of the variant assemblies was evaluated. T1 and T2 variants showed a similar band intensity as on day 1, indicating a higher stability when compared to the T3 sample (fast ramp) and other origami samples folded at different temperature conditions.
[0292] Based on these results and considering the smearing and fainter bands noticeable in samples folded through a slow temperature ramp, we selected the fast ramp for further purification and characterisation by AFM (see below).
[0293] As in the triangle DNA origami gel image analysis, rectangle variants R1, R2, R3 folded under isothermal conditions without initial denaturation showed a similar band pattern characterised by a main band, a less intense second band and smearing. With initial denaturation, R1 and R2 variants were characterised by the same band pattern and migration distance with less visible smearing, while the R3 variant showed aggregates in the loading well, wide smearing and a main band migrating slightly slower.
[0294] R1, R2 and R3 samples annealed with a fast ramp had a similar electrophoretic behaviour when compared with both isothermal processes, but with less pronounced smearing and slightly different migration distances. The negative controls (scaffold alone and scaffold with non-complementary staple set) showed the same migration distance as each other.
[0295] Considering the above results and the fainter bands corresponding to the slower ramp, assemblies obtained from the fast ramp were selected, purified, and imaged by AFM (see below). As noted for triangle variants, the rectangle assemblies showed different structural stability when stored at 4°C: after 4 / 5 days, R1 and R2 samples (fast ramp) were more stable compared to the scaffold and samples folded with a slow temperature ramp, while the R3 sample (fast ramp) band disappeared, underlining the assembly instability. The R1, R2 and R3 samples folded in isothermal conditions were relatively stable.
Triangle and rectangle DNA origami variants: purification and AFM imaging
[0296] Triangle and rectangle DNA assemblies from a fast ramp annealing were purified by centrifugal filtration to remove the low molecular weight excess of staple strands and to concentrate the samples.
[0297] The purified reaction mixtures were analysed by AGE and compared with non-purified samples. The centrifugal filters efficiently separated the higher molecular weight DNA assemblies from staple strands. Purified samples (Figures 6 and 7, gel lanes 'P') showed the same migration distances as the non-purified samples (Figures 6 and 7, gel lanes 'A'), suggesting that the purification was damage-free.
[0298] We classified structures under AFM into three categories: 'well-folded' structures resembled the folded shape, 'semi-folded' structures resembled the folded shape with small defects and 'mis-folded' structures were fragments or aggregates without a specific shape.
[0299] The origami triangle AFM images showed that T1 and T2 variants (Figures 6a and 6b, respectively) folded with the highest frequency into well-folded triangles (61.7%) and triangles with opened vertices (51.9%), respectively. The T3 variant (Figure 6c) revealed the lowest percentage of well-folded nanostructures (18.7%), characterised by a low stability during the AFM imaging. Semi-folded and mis-folded assemblies represented 81.3% of the T3 origami population. The average side lengths of the T1 and T2 variants were compatible with the design.
[0300] The R1 and R2 variants (Figures 7a and 7b, respectively) exhibited higher percentages of well-folded rectangles under AFM (Figure 7d). Conversely, the R3 variant (Figure 7c) showed a high percentage (about 90%) of semi-folded and mis-folded assemblies.
[0301] It should be noted that blunt end stacking interactions leading to origami dimers and multimers were observed by AFM imaging on R1 and R2 variants (Figures 7a and 7b, respectively). In detail, the R1 variant often formed longer stacked step-like chains compared to the R2 variant with more frequent dimers. Methods to avoid aggregation based on stacking interactions were not present in the origami designs used. The measured length and width of R1 and R2 variants were consistent with theoretical values (length and width expected values were 62 nm and 47 nm, respectively).
Claims
1. A method of designing a nanostructure with a topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) and that comprises a plurality (at least two or more) of component parts, wherein said method comprises: a) obtaining information about a set of parent component parts that comprise a parent nanostructure which has said or is estimated or calculated to have said topology, function and / or geometry, wherein said set of parent component parts comprises n total parent component parts and m different parent component parts; b) reducing the number of parent component parts or the number of different parent component parts to produce one or a plurality of candidate sets of component parts wherein each said candidate set of component parts comprises the same set of parent component parts, but wherein each candidate set of component parts comprises i) n-1, or n-2, or n-3 or fewer total candidate component parts; and / or ii) m-1, or m-2 or m-3 or fewer different candidate component parts; c) measuring or calculating the topology, function and / or geometry of a candidate nanostructure formed from one or the plurality of sets of candidate component parts, based on the information about the plurality of sets of candidate component parts; d) selecting one or more candidate nanostructures that are measured or calculated to have a topology, function and / or geometry that is within the target range of topologies, function(s) and / or geometry(s).
2. The method of claim 1 wherein in step (b) one or more of the candidate sets of component parts comprises the set of parent component parts but has: a) n-X or fewer total candidate component parts; or b) m-X or fewer different candidate component parts where X is a value that reduces n and / or m to a threshold or target number of total candidate component parts and / or different candidate component parts.
3. The method of claim 1 or 2 further comprising step (e) wherein the one or more selected candidate nanostructures of (d) become the target nanostructure of (a), and wherein steps (a) to (e) are repeated / iterated at least once, optionally at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 100, 150, 200 or more times.
4. The method of claim 1 or 2 further comprising step (e) wherein the one or more selected candidate nanostructures of (d) become the target nanostructure of (a), and wherein steps (a) to (e) are repeated / iterated until the one or more candidate nanostructures of (e) comprises a number of component parts that is below a threshold or target number of component parts (materials budget).
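By way of non-limiting illustration only, the iterative reduction of claims 1 to 4 can be sketched as a simple greedy loop. The sketch below is an assumption-laden toy and not the claimed implementation: the names `candidate_sets`, `reduce_to_budget`, `evaluate` and `in_target` are hypothetical, each iteration removes exactly one part, and the "function" being evaluated is a made-up additive score rather than a measured or simulated topology, function or geometry.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

Parts = FrozenSet[str]


def candidate_sets(parent: Parts, drop: int = 1) -> List[Parts]:
    """Step (b): all subsets of the parent set with `drop` parts removed."""
    keep = len(parent) - drop
    return [frozenset(c) for c in combinations(sorted(parent), keep)]


def reduce_to_budget(
    parent: Parts,
    evaluate: Callable[[Parts], float],
    in_target: Callable[[float], bool],
    budget: int,
) -> Parts:
    """Repeat steps (a)-(e) until the budget is met or no candidate passes."""
    current = parent
    while len(current) > budget:
        # Steps (c) and (d): score each candidate and keep those in range.
        passing = [c for c in candidate_sets(current) if in_target(evaluate(c))]
        if not passing:
            break  # no smaller set stays within the target range
        # Step (e): the best passing candidate becomes the next parent.
        current = max(passing, key=evaluate)
    return current


# Toy data (hypothetical): each part contributes a fixed amount to a score,
# and the target range is "score of at least 12".
CONTRIBUTION = {"s1": 5, "s2": 1, "s3": 4, "s4": 1, "s5": 3}
result = reduce_to_budget(
    frozenset(CONTRIBUTION),
    evaluate=lambda parts: sum(CONTRIBUTION[p] for p in parts),
    in_target=lambda score: score >= 12,
    budget=2,
)
print(sorted(result))
```

In this toy run the loop stops at three parts because no two-part set stays within the target range, which shows that a materials budget may not always be reachable under a given target range.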
5. The method of any of claims 1 or 2 wherein each selected candidate nanostructure has a set of component parts with a total number of component parts (x) and a total number of different component parts (y) and wherein the method further comprises: step (e) providing, for one or more or each of the selected candidate nanostructures of (d) information about one or a plurality of further candidate sets of component parts wherein each said candidate set of component parts comprises the set of component parts of the corresponding candidate set of component parts, but wherein each set of further candidate component parts comprises i) x-1, or x-2, or x-3 or fewer total further candidate component parts; and / or ii) y-1, or y-2 or y-3 or fewer different candidate component parts; and step (f) measuring or calculating: i) the topology, function and / or geometry of one or the plurality of further candidate nanostructures formed from one or the plurality of sets of further candidate component parts, based on the information about the plurality of sets of further candidate component parts; and step (g) selecting one or more further candidate nanostructures that are predicted to have a topology, function and / or geometry that is within the target range of topologies, function(s) and / or geometry(s).
6. The method of any of the preceding claims wherein step (b) is a computer implemented step, optionally using a materials minimalization genetic algorithm.
7. The method of any of the preceding claims wherein the component parts are polynucleotides.
8. The method of claim 7 wherein: a) the polynucleotides are composed of nucleic acids that are DNA and / or RNA; and / or b) the polynucleotides comprise a nucleic acid variant of DNA and / or RNA, optionally selected from the group comprising or consisting of: PNAs (peptide nucleic acids), Locked Nucleic Acid (LNA), Morpholino Oligonucleotides (MOs), Glycol Nucleic Acid (GNA), Threose Nucleic Acid (TNA), Xeno Nucleic Acids (XNAs), Cyclohexene Nucleic Acid (CeNA), Phosphorothioate DNA / RNA, and Artificial Expanded Genetic Information Systems.
9. The method of any of claims 6 or 7 wherein one or more or all of the nucleic acids comprise at least 5 nucleotides for example at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200 or more nucleotides.
10. The method of any of the preceding claims wherein the nanostructure is a nucleic acid nanostructure.
11. The method of any of the preceding claims wherein the nanostructure is a DNA nanostructure or an RNA nanostructure.
12. The method of any of the preceding claims wherein the nanostructure is a nucleic acid origami nanostructure wherein the component parts comprise at least one scaffold polynucleotide and a plurality of staple polynucleotides.
13. The method of any of the preceding claims wherein the nanostructure is a nucleic acid origami nanostructure that comprises a plurality of substructures, wherein the plurality of substructures is formed from the component parts and each substructure comprises at least one scaffold polynucleotide and a plurality of staple polynucleotides.
14. The method of any of the preceding claims wherein the nanostructure is a nucleic acid origami substructure that is formed from the component parts and comprises at least one scaffold polynucleotide and a plurality of staple polynucleotides.
15. The method of any of the preceding claims wherein one or more or all of the parent component parts and / or one or more or all of the candidate component parts comprise one or more nucleic acid variant, optionally where the nanostructure is a nucleic acid origami structure that is formed from the component parts that comprises at least one scaffold polynucleotide and a plurality of staple polynucleotides, then at least one or more or all of the scaffold and / or plurality of staple polynucleotides comprises one or more nucleic acid variant.
16. The method of any of the preceding claims comprising a step (a)(i) prior to step (a): (a)(i) designing a parent nanostructure that comprises a parent set of component parts and obtaining information about said parent component parts.
17. The method of any of the preceding claims where the parent nanostructure is produced through the self-assembly of the parent component parts and the candidate nanostructures are produced through the self-assembly of the candidate component parts.
18. The method of claim 17 where the method further comprises a step of scoring one or more or each set of the parent component parts and / or candidate component parts according to an off-target self-assembly cost function.
19. The method of claim 18 wherein the off-target self-assembly cost function comprises one or more of the following cost criteria: i) Metric 1 (M1) - the extent to which staples can initially bind the scaffold in non-designed locations wherein M1 is the (absolute) total free energy of initial off-target staple binding locations on the scaffold; ii) Metric 2 (M2) - the prevalence of scaffold-scaffold binding sites wherein M2 is the (absolute) total energy of scaffold-scaffold binding sites found in all of the distinct binding permutations; iii) Metric 3 (M3) - the worst extent of staple-staple co-folding in the staple set wherein M3 is the (absolute) energy of the strongest staple-staple co-fold at 55 °C, optionally as assessed by NUPACK mfe(); and / or iv) Metric 4 (M4) - the worst extent of intra-staple secondary structure in the staple set, wherein M4 is the (absolute) energy of the staple with the strongest hairpin-like fold at 55 °C, optionally as assessed by NUPACK mfe().
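By way of non-limiting illustration only, the worst-case character of Metrics 3 and 4 can be sketched as a maximum over absolute energies. The sketch assumes that the energy of a co-fold or of a single-strand fold is supplied by a caller-provided function (for example a wrapper around a thermodynamic package evaluated at 55 °C). The functions `toy_pair_energy` and `toy_self_energy` are hypothetical stand-ins that merely count antiparallel Watson-Crick matches and carry no thermodynamic meaning. Whether a staple paired with a second copy of itself counts towards M3 is not stated here, so only distinct pairs are considered.

```python
from itertools import combinations
from typing import Callable, Sequence


def metric_m3(staples: Sequence[str],
              pair_energy: Callable[[str, str], float]) -> float:
    """Absolute energy of the strongest staple-staple co-fold in the set."""
    return max((abs(pair_energy(a, b)) for a, b in combinations(staples, 2)),
               default=0.0)


def metric_m4(staples: Sequence[str],
              self_energy: Callable[[str], float]) -> float:
    """Absolute energy of the staple with the strongest self-fold."""
    return max((abs(self_energy(s)) for s in staples), default=0.0)


# Hypothetical stand-in energies: -1 unit per antiparallel complementary pair.
_COMP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def toy_pair_energy(a: str, b: str) -> float:
    return -float(sum(_COMP[x] == y for x, y in zip(a, reversed(b))))


def toy_self_energy(s: str) -> float:
    return toy_pair_energy(s, s) / 2


staples = ["ACGT", "AAAA", "TTTT"]
m3 = metric_m3(staples, toy_pair_energy)  # AAAA/TTTT is the strongest pair
m4 = metric_m4(staples, toy_self_energy)  # ACGT is self-complementary
print(m3, m4)
```

Passing the energy model in as a function keeps the worst-case logic independent of whichever folding tool is used.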
20. The method of claim 19 wherein the scores for M1 to M4 are given equal weighting.
21. The method of claim 19 wherein one or more of M1, M2, M3 or M4 is given more weighting than one or more of the other scores.
22. The method according to any of claims 18-21 wherein the method comprises applying a filter G to the plurality of sets of component parts wherein filter G identifies the number of staples in a pareto front origami which contain four consecutive Gs in their sequence and optionally: i) removes the staples from the set of sequences; and / or ii) removes the set of component parts from the plurality of test component parts.
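By way of non-limiting illustration only, filter G can be sketched as a substring test for four consecutive G bases, followed by one of the two optional removal actions. The function names and the `remove` argument are hypothetical, and the example sequences are invented.

```python
from typing import Dict, List, Sequence


def has_g_run(sequence: str, run: int = 4) -> bool:
    """True if the sequence contains `run` consecutive G bases."""
    return "G" * run in sequence.upper()


def count_flagged(staples: Sequence[str]) -> int:
    """Number of staples in a design that contain four consecutive Gs."""
    return sum(has_g_run(s) for s in staples)


def filter_g(designs: Dict[str, List[str]],
             remove: str = "staples") -> Dict[str, List[str]]:
    """Option i) drop flagged staples, or option ii) drop flagged designs."""
    if remove not in ("staples", "design"):
        raise ValueError("remove must be 'staples' or 'design'")
    kept: Dict[str, List[str]] = {}
    for name, staples in designs.items():
        if remove == "design" and count_flagged(staples) > 0:
            continue  # option ii): discard the whole set of component parts
        # Option i): keep the design but without the flagged staples.
        kept[name] = [s for s in staples if not has_g_run(s)]
    return kept


designs = {"d1": ["ACGGGGTA", "ATATAT"], "d2": ["GGGAGGG", "ccgt"]}
print(count_flagged(designs["d1"]))
print(filter_g(designs))
print(filter_g(designs, remove="design"))
```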
23. The method of any of claims 18-22 comprising mapping the scores for M1 to M4 to a low-dimensional objective space and calculating the pareto front of optimal tradeoffs in this space.
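By way of non-limiting illustration only, a pareto front over cost scores can be computed by discarding every candidate that is dominated by another. The sketch assumes that all objectives are costs to be minimised and that the scores have already been mapped to the objective space; the mapping itself is not shown, and the two-objective points are invented.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse everywhere and better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the non-dominated points (all objectives minimised)."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Invented two-objective scores for five candidate designs.
scores = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
front = pareto_front(scores)
print(front)
```

This pairwise comparison is quadratic in the number of candidates, which is adequate for a few thousand designs; faster algorithms exist for larger sets.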
24. The method of any of claims 18-23 further comprising applying a multi-criteria decision making method, optionally TOPSIS, SAW or KNEE to rank pareto candidates, and optionally selecting the one or more highest ranking candidates.
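By way of non-limiting illustration only, a ranking of pareto candidates by the standard TOPSIS procedure can be sketched as follows. The sketch assumes that every criterion is a cost to be minimised, uses vector normalisation, and defaults to equal weights; the function name `topsis_rank` and the example scores are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def topsis_rank(
    rows: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[int], List[float]]:
    """Rank candidates (best first) by closeness to the ideal solution."""
    n_crit = len(rows[0])
    if weights is None:
        weights = [1.0 / n_crit] * n_crit  # equal weighting by default
    # Vector-normalise each criterion, then apply the weights.
    norms = [math.sqrt(sum(r[j] ** 2 for r in rows)) or 1.0
             for j in range(n_crit)]
    v = [[weights[j] * r[j] / norms[j] for j in range(n_crit)] for r in rows]
    # For cost criteria the ideal is the column minimum.
    ideal = [min(col) for col in zip(*v)]
    anti_ideal = [max(col) for col in zip(*v)]
    closeness = []
    for r in v:
        d_pos = math.dist(r, ideal)
        d_neg = math.dist(r, anti_ideal)
        total = d_pos + d_neg
        closeness.append(d_neg / total if total else 1.0)
    order = sorted(range(len(rows)), key=lambda i: -closeness[i])
    return order, closeness


# Invented scores for three pareto candidates on two cost criteria.
order, closeness = topsis_rank([(1, 5), (2, 2), (5, 1)])
print(order[0], round(closeness[order[0]], 3))
```

With equal weights the balanced candidate ranks first in this example; unequal weights passed through `weights` shift the ranking towards the favoured criterion.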
25. The method of any of claims 1-24 further comprising selecting one or a plurality of sets of candidate component parts with the most desired characteristics for physical production.
26. The method of any of claims 1-25 further comprising outputting the selected design for physical production.
27. The method of any of claims 1-26 further comprising producing the one or more selected candidate component parts.
28. The method of any of claims 15-27 wherein the nucleic acid variant is a nucleic acid variant of DNA and / or RNA, optionally selected from the group comprising or consisting of: PNAs (peptide nucleic acids), Locked Nucleic Acid (LNA), Morpholino Oligonucleotides (MOs), Glycol Nucleic Acid (GNA), Threose Nucleic Acid (TNA), Xeno Nucleic Acids (XNAs), Cyclohexene Nucleic Acid (CeNA), Phosphorothioate DNA / RNA, and Artificial Expanded Genetic Information Systems.
29. The method of any of claims 12-28 wherein: a) the scaffold sequences are derived from repeat-prone biological sources; b) the staple legs are short, optionally 7 or 8nt in length and bind initially at intermediate temperatures; and / or c) when constant-temperature folding with no initial thermal denaturation step is employed.
30. The method of any of claims 19-29 wherein the method comprises exactly enumerating and minimising the effects of M1-M4.
31. The method of any of claims 1-30 wherein the plurality of candidate sets of component parts is at least 50, 100, 200, 300, 400, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3250, 3500, 3750, 4000, 4250, 4500, 4750, 5000, 5250, 5500, 5750, 6000 or more sets of candidate component parts.
32. The method of any of claims 19-31 wherein determining M1 comprises using a sliding window process to determine binding sites at base-level resolution and allows for mismatching symmetric interior loops of different sizes within binding sites.
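By way of non-limiting illustration only, a sliding window search for off-target binding sites can be sketched as below. The sketch moves a window along the scaffold one base at a time and accepts a window when every mismatch run is at most `max_loop` bases long and is flanked on both sides by at least `min_flank` matched bases; in a gap-free alignment such a run corresponds to a symmetric interior loop. The function names, the parameters and the example sequences are hypothetical. The sketch only locates candidate sites and does not compute the free energies that M1 sums.

```python
import re
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def is_binding_window(window: str, site: str,
                      max_loop: int = 1, min_flank: int = 2) -> bool:
    """Accept matched stems separated by short, interior mismatch runs."""
    mask = "".join("1" if a == b else "0" for a, b in zip(window, site))
    # Optional repeats of: a mismatch run (the loop) then another stem.
    loop = f"(0{{1,{max_loop}}}1{{{min_flank},}})*" if max_loop > 0 else ""
    return re.fullmatch(f"1{{{min_flank},}}" + loop, mask) is not None


def off_target_sites(scaffold: str, leg: str, designed_start: int,
                     max_loop: int = 1, min_flank: int = 2) -> List[int]:
    """Scaffold offsets, other than the designed one, where the leg can bind."""
    site = reverse_complement(leg)  # scaffold sequence that pairs with the leg
    width = len(site)
    return [
        start for start in range(len(scaffold) - width + 1)
        if start != designed_start
        and is_binding_window(scaffold[start:start + width], site,
                              max_loop, min_flank)
    ]


# Invented scaffold: designed site at offset 2, near-copy at offset 14.
scaffold = "TTGATTACAGCCCCGATCACAGTT"
print(off_target_sites(scaffold, "CTGTAATC", designed_start=2))
print(off_target_sites(scaffold, "CTGTAATC", designed_start=2, max_loop=0))
```

Setting `max_loop=0` restricts the search to perfect matches, so the near-copy with a single interior mismatch is no longer reported.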
33. The method of any of claims 1-32 wherein the parent nanostructure has been designed according to any of methods 1-32.
34. The method of any of claims 1-33 wherein the selected one or more candidate nanostructures has reduced off-target self-assembly.
35. A nanostructure that has been designed according to the method of any of the preceding claims.
36. The nanostructure of claim 35 wherein the nanostructure is produced from a set of self-assembling component parts that has reduced off-target self-assembly relative to a nanostructure that has not been designed according to the method of any of the preceding claims.
37. A nanostructure with a programmed disordered structure or a partially disordered nanostructure.
38. The nanostructure of any of claims 36 or 37 wherein the nanostructure has any one or more of: a) one or more missing duplex segments; b) one or more non-uniform edge lengths; c) one or more non-uniform junction valences; d) local flexibility, optionally high local flexibility; optionally where the nanostructure is not a fully filled, periodic origami lattice.
39. The nanostructure of any of claims 35-38 wherein the nanostructure is: a nucleic acid nanostructure, optionally a DNA or RNA nanostructure; or a nucleic acid origami nanostructure, optionally a DNA origami nanostructure.
40. A method of designing a nucleic acid nanostructure with a desired geometry that is comprised of a set of component parts that self-assemble to form the nanostructure, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts, and wherein the self-assembly of the component parts results in fewer instances of off-target self-assembly relative to a nanostructure not designed using the method, optionally wherein one or more or all of the nucleic acids are nucleic acid variants.
41. A method of reducing the incidence of off-target assembly of a nucleic acid nanostructure with a desired geometry wherein the nucleic acid nanostructure comprises a set of component parts, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts.
42. The method of any of claims 40 or 41 wherein the nanostructure is a nucleic acid nanostructure, optionally a DNA or an RNA nanostructure.
43. The method of any of claims 40-42 wherein the nanostructure is a DNA origami nanostructure that comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids.
44. The method of any of claims 40-43 wherein the method comprises: (a) designing a plurality of sets of test component parts where each set of test component parts is predicted to assemble into a nanostructure with the desired geometry; and (b) scoring each set of the test component parts according to an off-target self-assembly cost function.
45. The method of claim 44, wherein the off-target self-assembly cost function comprises one or more of the following cost criteria: i) Metric 1 (M1) - the extent to which staples can initially bind the scaffold in non-designed locations wherein M1 is the (absolute) total free energy of initial off-target staple binding locations on the scaffold; ii) Metric 2 (M2) - the prevalence of scaffold-scaffold binding sites wherein M2 is the (absolute) total energy of scaffold-scaffold binding sites found in all of the distinct binding permutations; iii) Metric 3 (M3) - the worst extent of staple-staple co-folding in the staple set wherein M3 is the (absolute) energy of the strongest staple-staple co-fold at 55 °C, optionally as assessed by NUPACK mfe(); and / or iv) Metric 4 (M4) - the worst extent of intra-staple secondary structure in the staple set, wherein M4 is the (absolute) energy of the staple with the strongest hairpin-like fold at 55 °C, optionally as assessed by NUPACK mfe().
46. The method of claim 45 wherein the scores for M1 to M4 are given equal weighting.
47. The method of claim 45 wherein one or more of M1, M2, M3 or M4 is given more weighting than one or more of the other scores.
48. The method according to any of claims 40-47 wherein the method comprises applying a filter G to the plurality of sets of component parts wherein filter G identifies the number of staples in a pareto front origami which contain four consecutive Gs in their sequence and optionally: i) removes the staples from the set of sequences; and / or ii) removes the set of component parts from the plurality of test component parts.
49. The method of any of claims 40-48 comprising mapping the scores for M1 to M4 to a low-dimensional objective space and calculating the pareto front of optimal tradeoffs in this space.
50. The method of any of claims 40-49 further comprising applying a multi-criteria decision making method, optionally TOPSIS, SAW or KNEE to rank pareto candidates, and optionally selecting the one or more highest ranking candidates.
51. The method of any of claims 40-50 further comprising selecting one or a plurality of sets of component parts with the most desired characteristics for physical production.
52. The method of any of claims 40-51 further comprising outputting the selected design for physical production.
53. The method of any of claims 40-52 further comprising producing the selected one or plurality of component parts.
54. The method of any of claims 40-53 wherein the nucleic acid variant is a nucleic acid variant of DNA and / or RNA, optionally selected from the group comprising or consisting of: PNAs (peptide nucleic acids), Locked Nucleic Acid (LNA), Morpholino Oligonucleotides (MOs), Glycol Nucleic Acid (GNA), Threose Nucleic Acid (TNA), Xeno Nucleic Acids (XNAs), Cyclohexene Nucleic Acid (CeNA), Phosphorothioate DNA / RNA, and Artificial Expanded Genetic Information Systems.
55. The method of any of claims 40-54 wherein: a) the scaffold sequences are derived from repeat-prone biological sources; b) the staple legs are short, optionally 7 or 8nt in length and bind initially at intermediate temperatures; and / or c) when constant-temperature folding with no initial thermal denaturation step is employed.
56. The method of any of claims 40-55 wherein the method comprises exactly enumerating and minimising the effects of M1-M4.
57. The method of any of claims 40-56 wherein the plurality of sets of test component parts is at least 50, 100, 200, 300, 400, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3250, 3500, 3750, 4000, 4250, 4500, 4750, 5000, 5250, 5500, 5750, 6000 or more sets of test component parts.
58. The method of any of claims 40-57 wherein determining M1 comprises using a sliding window process to determine binding sites at base-level resolution and allows for mismatching symmetric interior loops of different sizes within binding sites.
59. The method of any of claims 1-34 wherein the parent nanostructure has been designed according to any of methods 40-58.
60. The method of any of claims 1-34 wherein the selected candidate nanostructure is re-designed using the method of any of claims 40-58 so as to result in a nanostructure with a geometry, topology and / or function that is within a target range of geometries, topologies and / or functions with respect to a parent nanostructure, and which has improved on-target assembly with respect to the selected candidate nanostructure.
61. A nanostructure that has been designed according to the method of any of claims 40-58.
62. A nanostructure with improved on-target self-assembly properties.
63. A method of generating a database wherein the method comprises: i) the method of designing a nanostructure with a topology, function and / or geometry that is within a target range of topologies, function(s) and / or geometry(s) and that comprises a plurality (at least two or more) of component parts of any of claims 1-34; and / or ii) the method of designing a nucleic acid nanostructure with a desired geometry that is comprised of a set of component parts that self-assemble to form the nanostructure, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts, and wherein the self-assembly of the component parts results in fewer instances of off-target self-assembly relative to a nanostructure not designed using the method, optionally of any of claims 40-58, optionally wherein one or more or all of the nucleic acids are nucleic acid variants; and / or iii) the method of reducing the incidence of off-target assembly of a nucleic acid nanostructure with a desired geometry wherein the nucleic acid nanostructure comprises a set of component parts, wherein the set of component parts comprises at least one scaffold nucleic acid and a plurality of staple nucleic acids, wherein the nucleic acid nanostructure is produced through the self-assembly of the component parts and wherein the database comprises information about the component parts of said nanostructures of any of claims 40-58; and outputting the resultant sequences or the component parts and / or other associated data to a database.
64. A method for training an AI algorithm, for example a generative AI system, wherein the method comprises training the AI system on one or more databases generated by the method of claim 63.
65. A method of designing a nanostructure using an algorithm such as an AI algorithm that has been trained on a dataset or database generated by the method of claim 63.