Self-driving laboratory system and method for AI-based design and discovery of molecules of interest, and lipid nanoparticles as delivery vehicles
An AI-driven laboratory system iteratively synthesizes and analyzes molecules to develop ionizable lipids for lipid nanoparticles, addressing data scarcity issues and optimizing molecular designs for effective cargo delivery.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Filing Date
- 2025-12-09
- Publication Date
- 2026-06-18
AI Technical Summary
The development of self-driving laboratory systems for molecular therapeutics is challenging due to the scarcity of annotated historical data, making it difficult for models to learn and adapt in unexplored chemical spaces.
An AI-based self-driving laboratory system integrates a machine learning module and a laboratory module to iteratively synthesize and analyze molecules, using a transformer-based foundation model and active-learning experiment workflow to optimize ionizable lipids for lipid nanoparticles, despite limited data availability.
The system efficiently discovers novel ionizable lipids for lipid nanoparticles, enhancing cargo molecule delivery by iteratively refining molecular designs through high-throughput experimentation, overcoming data scarcity challenges.
Figure CA2025051650_18062026_PF_FP_ABST
Abstract
Description
PCT Application CPST Ref. 41191 / 00011 CPST Doc: 1409-0341-7370.1
SELF-DRIVING LABORATORY SYSTEM AND METHOD FOR AI-BASED DESIGN AND DISCOVERY OF MOLECULES OF INTEREST, AND LIPID NANOPARTICLES AS DELIVERY VEHICLES
CROSS REFERENCE TO PRIOR APPLICATIONS
[0001] The present application claims priority to US Application Number 63/729,809 filed on December 9, 2024, the entire contents of which are incorporated herein as if set forth in their entirety.
FIELD
[0002] The following relates generally to design and discovery of molecules of interest. Specifically, this disclosure relates to self-driving laboratory (SDL) systems and methods for artificial intelligence (AI) based design and discovery of novel chemical compounds, and more specifically, to design and discovery of ionizable lipids for use in lipid nanoparticles (LNPs) for molecule delivery, ionizable lipids and LNPs.
BACKGROUND
[0003] The following paragraphs are provided by way of background to the present disclosure. They are not, however, an admission that anything discussed therein is prior art or part of the knowledge of persons skilled in the art.
[0004] The increasing complexity of molecular and material discovery necessitates the development of autonomous systems powered by AI to navigate vast, uncharted molecular spaces with precision and efficiency. SDLs, which combine advanced robotic automation with data-driven experimental workflows, have emerged as powerful platforms for accelerating discovery. By executing iterative design-make-test-analyze (DMTA) cycles autonomously, SDLs have demonstrated remarkable potential in areas with well-established datasets, such as solid-state materials and small organic molecules.
[0005] However, the development of SDLs in data-sparse domains, particularly in molecular therapeutics, remains challenging due to the scarcity of annotated historical data, which makes it difficult for models to learn and adapt to these unexplored chemical spaces.
[0006] It would be desirable to develop an SDL system and method that address one or more of these issues and / or shortcomings.
SUMMARY
[0007] Various embodiments of SDL systems, methods, ionizable lipids and LNPs are provided according to the teachings herein.
[0008] In accordance with one aspect of the present disclosure, there is provided an artificial intelligence (AI) based automated self-driving laboratory system for developing molecule candidates. The system comprises a machine learning module and a laboratory module operatively coupled to the machine learning module. The machine learning module is pretrained with a set of molecules to learn structures and one or more properties of interest of the set of molecules. The machine learning module is configured to produce an initial set of molecules to be synthesized and analyzed by the laboratory module based on the learned structures and properties of interest, and receive feedback of structures and properties of interest based on the initial set of molecules synthesized and analyzed by the laboratory module for a further training and produce a next iteration of molecules to be synthesized and analyzed by the laboratory module. The laboratory module is configured to synthesize and analyze the initial set of molecules received from the machine learning module, relay the structures and properties of interest determined from synthesis and analysis of the initial set of molecules to the machine learning module for the iterative training, receive from the machine learning module the next iteration of molecules, synthesize and analyze the next iteration of molecules. The machine learning module is iteratively retrained to produce the next iteration of molecules, and the next iteration of molecules is iteratively synthesized and analyzed by the laboratory module until at least one termination criterion is met.
[0009] In various embodiments, the at least one termination criterion comprises at least one lead molecule candidate being identified for output via an output device associated with the system.
[0010] In various embodiments, the one or more properties of interest comprise one or more of pKa, hydrophobicity, hydrophilicity, pH, and cargo molecule transfection efficiency.
[0011] In various embodiments, the laboratory module comprises a synthesis module configured to synthesize the one or more of initial and next iteration set of molecules received from the machine learning model, and an analytical module configured to analyze the synthesized molecules for the one or more properties of interest and produce feedback for the machine learning module.
[0012] In various embodiments, the synthesis module comprises a first handler module to conduct the synthesis of the produced set of molecules; and a second handler module to deliver the synthesized molecules to the analytical module.
[0013] In various embodiments, the analytical module comprises an incubator module accessible by the first handler module, the second handler module or both, and a reader module configured to measure the one or more properties of the synthesized molecules. The incubator module is configured to hold targets for incubation of the synthesized molecules for a predetermined amount of time.
[0014] In various embodiments, training of the machine learning module comprises a pretraining stage with a generic dataset of molecules and a continual pretraining stage for refining model embeddings based on a domain specific dataset of molecules.
[0015] In various embodiments, the molecule candidates are ionizable lipids.
[0016] In various embodiments, the second handler module is further configured to formulate lipid nanoparticles (LNPs) and dose each of the formulated LNPs with the targets and deliver the formulated LNPs dosed with the targets to the incubator module for incubation, wherein each LNP comprises a unique synthesized ionizable lipid and a cargo molecule of interest.
[0017] In various embodiments, the reader module is configured to measure and determine a corresponding cargo molecule transfection efficiency value of each of the formulated LNPs in the targets.
[0018] In various embodiments, the cargo molecule of interest is a protein, a peptide, or a nucleic acid.
[0019] In various embodiments, the nucleic acid is DNA, siRNA, tRNA, circRNA, miRNA, mRNA or a combination thereof.
[0020] In various embodiments, the targets comprise cell systems, multicellular constructs, or both.
[0021] In various embodiments, the cell systems comprise one or more of cancer cell lines, immortalized cell lines, primary cells, stem cells, differentiated cells derived therefrom, mammalian cells, non-mammalian cells, engineered cells, and recombinant cells; and the multicellular constructs are selected from organoids and spheroids.
[0022] In various embodiments, the ionizable lipid is of General Formula (I) [chemical structure not reproduced], wherein R1 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, and optionally having a halogen at the terminal position or within the intermediate chain extension; R2 is selected from acyclic amine, cyclic amine and heterocyclic amine, each of the acyclic amine, cyclic amine and heterocyclic amine having at least one ionizable nitrogen atom; R3 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, optionally having a halogen at the terminal position or within the intermediate chain extension, and optionally having an unsubstituted or substituted C7-C30 cage hydrocarbon at the terminal position when the terminal position does not have a halogen; and R4 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, optionally having a halogen at the terminal position or within the intermediate chain extension, and optionally having an unsubstituted or substituted C7-C30 cage hydrocarbon at the terminal position when the terminal position does not have a halogen; provided that at least one of R1, R3 and R4 has a halogen at the terminal position or within the intermediate chain extension.
[0023] In some embodiments, the halogen is fluorine, chlorine or bromine. In some embodiments, R3 has a halogen at the terminal position.
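The substituent conditions recited for General Formula (I) can be read as a simple validity check. The following is a minimal sketch of one possible reading, not a definitive implementation: the `Chain` class, its field names and the function `satisfies_formula_constraints` are hypothetical and introduced only for illustration, and the check covers only the halogen proviso and the cage hydrocarbon condition, not chain length or hydrolyzable groups.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chain:
    """Hypothetical summary of one alkyl/alkenyl substituent (R1, R3 or R4)."""
    halogen_position: Optional[str] = None  # "terminal", "intermediate" or None
    terminal_cage: bool = False             # cage hydrocarbon at the terminal position


def satisfies_formula_constraints(r1: Chain, r3: Chain, r4: Chain) -> bool:
    """Check two conditions of General Formula (I) under the stated assumptions."""
    # Proviso: at least one of R1, R3 and R4 carries a halogen.
    has_halogen = any(c.halogen_position is not None for c in (r1, r3, r4))
    # A terminal cage hydrocarbon is recited only for R3 and R4, and only
    # when the terminal position does not already carry a halogen.
    cage_ok = not r1.terminal_cage and all(
        not (c.terminal_cage and c.halogen_position == "terminal")
        for c in (r3, r4)
    )
    return has_halogen and cage_ok


# Example: R3 carries a terminal halogen, R1 and R4 are plain chains.
example_ok = satisfies_formula_constraints(Chain(), Chain("terminal"), Chain())
```

A candidate with no halogen on any of R1, R3 and R4 fails the proviso, as does one that places both a halogen and a cage hydrocarbon at the same terminal position.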
[0024] In various embodiments, R1 is: [chemical structures not reproduced]; R2 is: [chemical structures not reproduced]; and R4 is: [chemical structures not reproduced].
[0026] In various embodiments, the ionizable lipid is synthesized by a method comprising reacting a compound of Formula A, a compound of Formula B, a compound of Formula C and a compound of Formula D under conditions to provide the ionizable lipid:
Formula A: R1-CHO; Formula B: R2-NH2; Formula C: R3-COOH; Formula D: R4-NC.
In at least one embodiment, the method is performed using a high-throughput chemical procedure.
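Because the synthesis combines one building block of each of four types (an aldehyde, an amine, a carboxylic acid and an isocyanide), the set of synthesizable candidates can be enumerated as a Cartesian product. The following is a minimal sketch under stated assumptions: the building-block labels ("A1", "B1" and so on) and the function `enumerate_library` are hypothetical placeholders and do not correspond to any compound named in this disclosure.

```python
from itertools import product
from typing import Dict, List

# Hypothetical building-block labels; real libraries would hold structures.
aldehydes = ["A1", "A2"]          # R1-CHO components
amines = ["B1", "B2", "B3"]       # R2-NH2 components
acids = ["C1", "C2"]              # R3-COOH components
isocyanides = ["D1"]              # R4-NC components


def enumerate_library(
    aldehydes: List[str],
    amines: List[str],
    acids: List[str],
    isocyanides: List[str],
) -> List[Dict[str, str]]:
    """Return every combination of one aldehyde, amine, acid and isocyanide."""
    return [
        {"aldehyde": a, "amine": b, "acid": c, "isocyanide": d}
        for a, b, c, d in product(aldehydes, amines, acids, isocyanides)
    ]


library = enumerate_library(aldehydes, amines, acids, isocyanides)
# The library size is the product of the component counts: 2 * 3 * 2 * 1 = 12.
print(len(library))
```

The size of the library grows multiplicatively with each component list, which is why a modest number of building blocks can define a large candidate space.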
[0027] In accordance with another aspect of the present disclosure, there is provided an AI-based method for developing molecule candidates. The method comprises: a) producing, by a pretrained machine learning module, an initial set of molecules to be synthesized and analyzed based on structures and properties of interest learned from a set of molecules; b) synthesizing and analyzing, by a laboratory module, the initial set of molecules synthesized; c) training the machine learning module based on the structure and analyzed properties of interest of the initial set of molecules synthesized and analyzed by the laboratory module; d) producing, by the machine learning module, a next iteration of molecules to be synthesized and analyzed; e) synthesizing and analyzing, by the laboratory module, the next iteration of molecules; and repeating steps b) to e) until at least one termination criterion is met.
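The iterative method of steps a) to e) can be sketched as a loop over propose, synthesize-and-analyze and retrain operations. The following is a minimal sketch, assuming the machine learning module and the laboratory module are represented by plain callables; `run_closed_loop`, the toy library, the toy assay and the threshold are all hypothetical and stand in for components this disclosure describes only at a functional level.

```python
from typing import Callable, Dict, List, Tuple

Results = Dict[str, float]  # molecule identifier -> measured property value


def run_closed_loop(
    propose: Callable[[Results], List[str]],
    synthesize_and_analyze: Callable[[List[str]], Results],
    retrain: Callable[[Results], None],
    is_lead: Callable[[float], bool],
    max_iterations: int = 10,
) -> Tuple[List[str], int]:
    """Iterate until a lead candidate is found or the budget is spent."""
    history: Results = {}
    batch = propose(history)                      # step a): initial set
    iterations_run = 0
    for iteration in range(1, max_iterations + 1):
        if not batch:
            break                                 # nothing left to test
        iterations_run = iteration
        results = synthesize_and_analyze(batch)   # steps b) and e)
        history.update(results)
        retrain(history)                          # step c): feedback to the model
        leads = [m for m, value in results.items() if is_lead(value)]
        if leads:                                 # termination criterion
            return leads, iterations_run
        batch = propose(history)                  # step d): next iteration
    return [], iterations_run


# Toy stand-ins (assumptions): 20 labelled candidates, batches of 4,
# and an "assay" that simply returns the numeric suffix of the label.
LIBRARY = [f"lipid-{i:02d}" for i in range(20)]


def toy_propose(history: Results) -> List[str]:
    return [m for m in LIBRARY if m not in history][:4]


def toy_assay(batch: List[str]) -> Results:
    return {m: float(m.split("-")[1]) for m in batch}


def toy_retrain(history: Results) -> None:
    pass  # a real module would update model weights here


leads, n_iterations = run_closed_loop(
    toy_propose, toy_assay, toy_retrain, is_lead=lambda v: v >= 10.0
)
print(leads, n_iterations)
```

Passing the modules in as callables keeps the loop independent of any particular model or robot, so the same control flow applies whether the termination criterion is a property threshold, as assumed here, or another condition.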
[0028] In various embodiments, the at least one termination criterion comprises at least one lead molecule candidate being identified for output via an output device associated with one or more of the machine learning module and the laboratory module.
[0029] In various embodiments, the molecule candidates are ionizable lipids.
[0030] In various embodiments, the properties of interest comprise one or more of pKa, hydrophobicity, hydrophilicity, pH and cargo molecule transfection efficiency.
[0031] In various embodiments, training of the machine learning module comprises a pretraining stage with a generic dataset of molecules and a continual pretraining stage for refining model embeddings based on a domain specific dataset of molecules.
[0032] In accordance with yet another aspect of the present disclosure, there is provided a compound of General Formula (I) [chemical structure not reproduced], wherein R1 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, and optionally having a halogen at the terminal position or within the intermediate chain extension; R2 is selected from acyclic amine, cyclic amine and heterocyclic amine, each of the acyclic amine, cyclic amine and heterocyclic amine having at least one ionizable nitrogen atom; R3 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, optionally having a halogen at the terminal position or within the intermediate chain extension, and optionally having an unsubstituted or substituted C7-C30 cage hydrocarbon at the terminal position when the terminal position does not have a halogen; and R4 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, optionally having a halogen at the terminal position or within the intermediate chain extension, and optionally having an unsubstituted or substituted C7-C30 cage hydrocarbon at the terminal position when the terminal position does not have a halogen; provided that at least one of R1, R3 and R4 has a halogen at the terminal position or within the intermediate chain extension.
[0033] In accordance with yet another aspect of the present disclosure, there is provided a lipid nanoparticle comprising an ionizable lipid described herein and a cargo molecule.
[0034] In various embodiments, the cargo molecule is a protein, a peptide, or a nucleic acid. In some embodiments, the nucleic acid is DNA, siRNA, tRNA, circRNA, miRNA, mRNA or a combination thereof.
[0035] Other features and advantages of the present disclosure will become apparent from the following detailed description taken together with the accompanying drawings. It should be understood, however, that the detailed description and the specific examples, are given by way of illustration only, since various changes and modifications within the spirit and scope of the present disclosure will become apparent to those skilled in the art from this detailed description.
BRIEF DESCRIPTION OF THE DRAWINGS
[0036] Embodiments will now be described with reference to the appended drawings wherein:
[0037] FIGs. 1A to 1E (collectively, FIG. 1) illustrate an example closed-loop SDL system, hereinafter referred to as LUMI-system solely for ease of reference, according to some embodiments of the present disclosure. FIG. 1A illustrates an example molecular foundation model, hereinafter referred to as LUMI-model solely for ease of reference; FIG. 1B illustrates an example control system comprising one or more example software modules of a control panel and data analysis associated with the closed-loop SDL system; FIG. 1C illustrates an example simplified schematic of one or more example hardware modules, hereinafter collectively referred to as LUMI-lab solely for ease of reference, and automatic experiments controlled by an orchestration module associated with the LUMI-system, with the curves illustrating, as an example, two experiments conducted simultaneously in an orchestrated manner; FIG. 1D illustrates the one or more example hardware modules shown in FIG. 1C; and FIG. 1E illustrates example iterations performed by the closed-loop SDL system;
[0038] FIG. 2 illustrates a timeline of automatic experiments in the one or more example hardware modules controlled by the orchestration module associated with the closed-loop SDL system of FIG. 1 and the shift of tasks between the experiments;
[0039] FIG. 3 is a schematic of a system architecture of the closed-loop SDL system of FIG. 1, showing example software modules and example hardware modules and their communication with the LUMI-model;
[0040] FIG. 4 illustrates an example method including training steps of the LUMI-model according to some embodiments of the present disclosure;
[0041] FIG. 5 is a top view illustration of an example setup showing example hardware components in the LUMI-lab of the closed-loop LUMI-system of FIG. 1, with arrows indicating the directions of transferring materials and reagents between various components;
[0042] FIG. 6 illustrates an example sampler module of the LUMI-lab shown in FIG. 5;
[0043] FIG. 7 illustrates an example feeder module in the LUMI-lab shown in FIG. 5;
[0044] FIGs. 8A to 8F (collectively, FIG. 8) illustrate an example embodiment of a method for developing LNPs. FIG. 8A illustrates examples of lipid candidates selected by an example LUMI-system; FIG. 8B is a schematic illustration of formation of example cargo-LNPs followed by intratracheal (I.T.) injection for in vivo functional validation; FIGs. 8C and 8D illustrate graphs showing physical properties of the example cargo-LNPs, including size (nm) and polydispersity index (PDI); FIGs. 8E and 8F illustrate in vivo imaging system (IVIS) results showing luciferase expression in mouse lungs mediated by the example cargo-LNPs formulated with different lipid candidates; and FIG. 8F illustrates quantification of luciferase expression in mouse lungs;
[0045] FIG. 9 shows a chemistry library for synthesis of the ionizable lipids using Ugi 4-Component Reactions (Ugi-4CRs) according to embodiments of the present disclosure;
[0046] FIGs. 10A to 10D (collectively, FIG. 10) illustrate an example method of using a closed-loop SDL system to develop cargo-LNPs according to embodiments of the present disclosure. FIG. 10A is an illustration of an example dual-plate experiment strategy in each iteration; FIG. 10B illustrates mRNA transfection potency (mTP) readouts of the exploitation plates of FIG. 10A through ten experiment iterations; FIG. 10C illustrates performance of four components in the combinatorial chemical library of FIG. 9 for the ionizable lipid synthesis; and FIG. 10D illustrates mTP readouts of the exploration plates through iterations;
[0047] FIGs. 11A to 11E (collectively, FIG. 11) illustrate the ionizable lipids developed using the closed-loop SDL system as illustrated in FIG. 10A. FIG. 11A illustrates Uniform Manifold Approximation and Projection (UMAP) of all synthesizable lipids out of the Ugi-4CR library illustrated in FIG. 9; FIG. 11B illustrates UMAP of all synthesizable lipids (top), and zoom-in panels with respect to the subclusters of the lipids shown in FIG. 11A; FIG. 11C illustrates distribution of the lipids; FIG. 11D illustrates mTP distributions of each lipid candidate in cargo-LNP; and FIG. 11E illustrates a pie chart showing categorizations of the lipids by the tails (inner) and by the headgroups (outer);
[0048] FIGs. 12A to 12C (collectively, FIG. 12) illustrate results from a head-to-head comparison of brominated and non-brominated ionizable lipids according to some embodiments of the present disclosure. FIG. 12A illustrates results of relative luminescence units between the compounds; FIG. 12B illustrates results of cytotoxicity assessment of the compounds; and FIG. 12C illustrates results of fluorescence image analysis;
[0049] FIGs. 13A to 13D (collectively, FIG. 13) illustrate gene editing evaluation of a brominated ionizable lipid according to some embodiments of the present disclosure. FIG. 13A is an illustration of the gene editing procedure; FIG. 13B illustrates IVIS images of animal organs; FIG. 13C illustrates graphs of flow cytometry analysis; and FIG. 13D illustrates immunofluorescence images of lung sections;
[0050] FIGs. 14A to 14F (collectively, FIG. 14) illustrate a study of in vivo Adenine Base Editing (ABE) mediated by example cargo-LNPs according to some embodiments of the present disclosure. FIG. 14A illustrates a schematic of the gene editing strategy; FIG. 14B shows a schematic of experimental design for in vivo base editing; FIG. 14C illustrates an ex vivo bioluminescence imaging (IVIS) of major organs; FIG. 14D illustrates a graph showing luminescence activity (in Relative Light Units (RLU)); FIG. 14E illustrates a graph showing Sanger sequencing chromatograms of the target DNA locus from lung tissue; and FIG. 14F illustrates a graph showing in vivo base editing efficiency in the lung;
[0051] FIGs. 15A to 15D (collectively, FIG. 15) illustrate subchronic toxicity and immune response assessments of ionizable lipids according to some embodiments of the present disclosure. FIG. 15A illustrates heatmaps displaying dynamic expression of cytokines and chemokines in the plasma of the animal after I.T. administration of example cargo-LNPs; FIG. 15B illustrates histopathological images; FIG. 15C illustrates images and graph from a hemolysis assay; and FIG. 15D shows a graph measuring sC5b-9 concentration;
[0052] FIGs. 16A to 16B (collectively, FIG. 16) illustrate further subchronic toxicity and immune response assessments of ionizable lipids according to some embodiments of the present disclosure. FIG. 16A illustrates heatmaps displaying dynamic expression of cytokines and chemokines in the plasma of the animal after I.M. administration of example cargo-LNPs; and FIG. 16B shows histopathological images;
[0053] FIGs. 17A to 17C (collectively, FIG. 17) illustrate a small-angle X-ray scattering (SAXS) and Cryogenic Transmission Electron Microscopy (Cryo-TEM) study of ionizable lipids according to some embodiments of the present disclosure. FIG. 17A illustrates bioluminescence images and statistical analysis following inhalation of LUMI-6CI and LUMI-6 LNPs encapsulating mRNA; FIG. 17B illustrates a Cryo-TEM SAXS graph; and FIG. 17C shows Cryo-TEM images;
[0054] FIGs. 18A to 18C (collectively, FIG. 18) illustrate development of ionizable lipids using the closed-loop SDL system as illustrated in FIG. 10. FIG. 18A illustrates UMAP of lipid embeddings by models after pretraining and selected iterations; FIG. 18B shows ranking across iterations of a subset of lipid candidates; and FIG. 18C illustrates a graph showing trends of proposed Br containing lipid candidates at individual exploitation plates and accumulated number of Br containing lipid candidates in experiments;
[0055] FIGs. 19A and 19B (collectively, FIG. 19) are graphs showing mTP results according to some embodiments of the present disclosure. FIG. 19A shows mTP results of classic formulations using different helper lipids; and FIG. 19B shows mTP results of predetermined formulations; and
[0056] FIG. 20 illustrates a block diagram of an exemplary SDL system according to various embodiments of the present disclosure.
[0057] Further aspects and features of the example embodiments described herein will appear from the following description taken together with the accompanying drawings.
DETAILED DESCRIPTION
[0058] Various embodiments and aspects of the disclosure will be described with reference to details discussed below. The following description is illustrative of the disclosure and is not to be construed as limiting the disclosure. Numerous specific details are described to provide a thorough understanding of various embodiments of the present disclosure. However, in certain instances, well-known or conventional details are not described in order to provide a concise discussion of embodiments of the present disclosure.
[0059] Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
[0060] In understanding the scope of the present disclosure, the articles “a”, “an”, “the”, and “said” preceding an element are intended to mean that there are one or more of the elements. Additionally, the term “comprising” and its derivatives, as used herein, are intended to be open-ended terms that specify the presence of the stated features, elements, components, groups, integers, and / or steps, but do not exclude the presence of other unstated features, elements, components, groups, integers and / or steps. The foregoing also applies to words having similar meanings such as the terms, “including”, “having”, and their derivatives.
[0061] It will be understood that any embodiments described as “comprising” certain components may also “consist of” or “consist essentially of,” whereas “consisting of” has a closed-ended or restrictive meaning and “consisting essentially of” means including the components specified but excluding other components except for materials present as impurities, unavoidable materials present as a result of processes used to provide the components, and components added for a purpose other than achieving the technical effect of the invention.
[0062] It should also be noted that, as used herein, the wording “and / or” is intended to represent an inclusive-or. That is, “X and / or Y” is intended to mean X or Y or both, for example. As a further example, “X, Y, and / or Z” is intended to mean X or Y or Z or any combination thereof.
[0063] It should be noted that terms of degree such as “substantially”, “about” and “approximately” as used herein mean a reasonable amount of deviation of the modified term such that the end result is not significantly changed. These terms of degree may also be construed as including a deviation of the modified term, such as by 1%, 2%, 5%, or 10%, for example, if this deviation does not negate the meaning of the term it modifies.
[0064] Furthermore, the recitation of numerical ranges by endpoints herein includes all numbers and fractions subsumed within that range (e.g., 1 to 5 includes 1, 1.5, 2, 2.75, 3, 3.90, 4, and 5). It is also to be understood that all numbers and fractions thereof are presumed to be modified by the term “about” which means a variation of up to a certain amount of the number to which reference is being made if the end result is not significantly changed, such as 1%, 2%, 5%, or 10%, for example.
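The reading of ranges and of the term “about” described above can be illustrated numerically. The following is a minimal sketch, assuming a purely relative tolerance; the function names `in_range` and `within_about` are hypothetical, and the sketch does not capture the further condition that the end result is not significantly changed, which depends on context.

```python
def in_range(value: float, low: float, high: float) -> bool:
    """A range recited by endpoints includes the endpoints and everything between."""
    return low <= value <= high


def within_about(value: float, stated: float, tolerance: float = 0.10) -> bool:
    """True if value deviates from the stated number by at most the given fraction."""
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    return abs(value - stated) <= tolerance * abs(stated)


print(in_range(2.75, 1, 5))               # 2.75 lies within "1 to 5"
print(within_about(104.0, 100.0, 0.05))   # a 4% deviation against a 5% tolerance
print(within_about(111.0, 100.0, 0.10))   # an 11% deviation against a 10% tolerance
```

The tolerance values of 1%, 2%, 5% and 10% recited above would be passed as 0.01, 0.02, 0.05 and 0.10 respectively.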
[0065] As used herein, the term “exemplary” means “serving as an example, instance, or illustration,” and should not be construed as preferred or advantageous over other embodiments and / or configurations disclosed herein.
[0066] The term “alkyl” as used herein, whether it is used alone or as part of another group, means straight or branched chain, saturated alkyl groups. The number of carbon atoms that are possible in the referenced alkyl group is indicated by the prefix “Cn”. Thus, for example, the term “C1-C6 alkyl” (or “C1-6 alkyl”) means an alkyl group having 1, 2, 3, 4, 5, or 6 carbon atoms and includes any of the isomers, for example, any of the hexyl alkyl and pentyl alkyl isomers as well as n-, iso-, sec- and tert-butyl, n- and iso-propyl, ethyl and methyl. As another example, “C1-C4 alkyl” includes n-, iso-, sec- and tert-butyl, n- and iso-propyl, ethyl and methyl.
[0067] The term “alkylene”, whether it is used alone or as part of another group, means a straight or branched chain, saturated alkylene group, that is, a saturated carbon chain that contains substituents on one or two of its ends. The number of carbon atoms that are possible in the referenced alkylene group is indicated by the prefix “Cn”. For example, the term C1-C6 alkylene means an alkylene group having 1, 2, 3, 4, 5 or 6 carbon atoms.
[0068] The term “alkenyl”, whether it is used alone or as part of another group, means a straight or branched chain, unsaturated alkyl group containing at least one double bond. The number of carbon atoms that are possible in the referenced alkenyl group is indicated by the prefix “Cn”. For example, the term C2-6 alkenyl means an alkenyl group having 2, 3, 4, 5 or 6 carbon atoms.
[0069] The term “cage hydrocarbon”, whether it is used alone or as part of another group, generally refers to a substituted or unsubstituted hydrocarbon compound in which the carbon atoms are arranged in a cage-like structure.
[0070] The term “hetero” as used herein, whether it is used alone or as part of another group, refers to an atom that is not carbon or hydrogen, or a hetero-moiety containing at least one heteroatom.
[0071] The term “hydrolyzable” as used herein refers to a chemical group that comprises at least one bond that is broken by a hydrolysis reaction under certain conditions, for example, under physiological conditions.
[0072] The term “hydrolysis” as used herein refers to a chemical reaction in which a molecule of water breaks one or more chemical bonds.
[0073] As used herein, an “ionizable lipid” refers to a class of lipid molecules that remain neutral at physiological pH, but are protonated at low pH, making them positively charged. As an example, the term “ionizable” refers to the ability of a functional group in a molecule, for example, a tertiary amine, to remain neutral at physiological pH but become protonated at lower pH, making it positively charged.
[0074] As used herein, the term “nanoparticle” refers to any particle having a diameter that makes the particle suitable for systemic administration of active agents or molecules.
[0075] As used herein, the term “lipid nanoparticle” (LNP) refers to a nanoparticle comprising lipids. LNPs are often used as a pharmaceutical drug delivery system or pharmaceutical formulation, or as a delivery vehicle for cargo molecules such as peptides, proteins, or nucleic acids, including ribonucleic acids (RNAs) or deoxyribonucleic acids (DNAs).
[0076] The term “transfection efficiency” as used herein refers to the proportion or percentage of targets, for example, cells, within a population that successfully take up a cargo molecule (abbreviated as “cargo”). When the cargo molecule is mRNA, the term “transfection efficiency” as used herein refers to the proportion or percentage of cells within a population that successfully take up and express the mRNA introduced into them via an LNP.
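As a worked illustration of this definition, transfection efficiency expressed as a percentage is the number of targets that took up (and, for mRNA, expressed) the cargo divided by the total number of targets in the population, multiplied by 100. The following is a minimal sketch; the function name `transfection_efficiency` and the example counts are hypothetical and are not data from this disclosure.

```python
def transfection_efficiency(positive_cells: int, total_cells: int) -> float:
    """Percentage of cells in a population that took up (or expressed) the cargo."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= positive_cells <= total_cells:
        raise ValueError("positive_cells must lie between 0 and total_cells")
    return 100.0 * positive_cells / total_cells


# Hypothetical counts: 45 of 60 cells are positive for the cargo.
example_efficiency = transfection_efficiency(45, 60)
print(example_efficiency)  # 75.0
```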
[0077] The term “produce” (or “producing”, “produced” or “production”) as used herein in the context of AI is generally understood to include generating molecular structures and / or conformations; and / or predicting molecular properties and / or other properties of interest followed by proper ranking of the properties for molecule candidate selection.
[0078] The example embodiments of the modules, apparatuses, devices, systems, or methods described in accordance with the teachings herein may be implemented as a combination of hardware and software. For example, the embodiments described herein may be implemented, at least in part, by using one or more computer programs, executing on one or more programmable devices comprising at least one storage element (i.e., at least one volatile memory element and at least one non-volatile memory element). The hardware may comprise input devices, including at least one of a touch screen, a keyboard, a mouse, buttons, keys, sliders, and the like, as well as one or more of a display, a printer, and the like, depending on the implementation of the hardware.
[0079] While the present disclosure uses the design and discovery of LNPs to illustrate SDL systems disclosed herein and methods of using such systems, it will be appreciated that the systems and methods can be readily applied to the design and discovery of other molecules of interest, for example, molecules as therapeutics.
SELF-DRIVING LABORATORY (SDL) SYSTEMS
[0080] Conventional SDL implementations often rely on machine learning models that require extensive fine-tuning or domain-specific training data to provide accurate predictions. This reliance on large-scale data and computational tuning presents a significant obstacle for emerging fields where historical data is sparse. For instance, in LNP development for nucleic acid delivery, annotated datasets are limited and the design landscape is largely unexplored. Addressing these limitations requires foundational advances in both the computational and experimental components of SDLs: models that are capable of efficiently generalizing to novel tasks, and laboratory systems that can execute high-throughput, high-fidelity experimentation, preferably with minimal human intervention.
[0081] The availability of historical data on LNPs is limited, as only three LNPs have received FDA approval to date. In addition, the design of ionizable lipids in LNPs, which are crucial for the encapsulation and endosomal escape of cargo molecules such as mRNA, has relied heavily on prior expert knowledge due to the absence of comprehensive research data and systematic development platforms.
[0082] In one aspect, there is provided an integrated SDL system 100 that combines robotics, automation, and machine learning to create a fully autonomous platform for molecular discovery. An exemplary SDL system 100 is a LUMI (Large-scale Unsupervised Modeling followed by Iterative experiments) system, which comprises two components: (i) a foundation machine learning module, referred to as LUMI-model solely for ease of reference, and (ii) an automated, closed-loop laboratory module, referred to as LUMI-lab solely for ease of reference. Generally, the LUMI-lab is driven by the LUMI-model’s predictions and provides “wet-lab” data to iteratively optimize the compounds or molecules of interest through syntheses, formulations, and / or high-throughput screening. The SDL system 100, the LUMI-model 102 and the LUMI-lab 104 will be described in greater detail later in the description with reference to FIGS. 1 and 20.
[0083] According to one embodiment, there is provided an SDL system 100 designed for the discovery of molecules of interest. For example, the SDL system 100 can be used for design and discovery of new ionizable lipids for the delivery of cargo molecules such as proteins, peptides, and nucleic acids, addressing unique challenges of LNP development in a field with limited historical data. Provided are various embodiments of an SDL system 100 comprising a closed-loop optimization and an autonomous lab for LNP engineering and, particularly, ionizable lipid engineering. According to one embodiment, the SDL system 100 is a LUMI-system, which is a self-driving laboratory platform that integrates a transformer-based foundation machine learning model with an active-learning experiment workflow to address the challenges of data scarcity.
[0084] In an embodiment, the example closed-loop SDL system 100, or the LUMI-system, described herein is a fully autonomous platform for molecular discovery that integrates computational, software, and hardware components into a unified workflow. Designed to streamline the iterative DMTA process, the LUMI-system 100 combines machine learning-driven molecular design, experimental automation, and active learning to enable efficient optimization of complex chemical systems.
[0085] Reference is made to FIGs. 1 and 20, which illustrate an example LUMI-system 100, a closed-loop optimization and autonomous platform according to various embodiments of the present disclosure. In an example implementation, the LUMI-system 100 comprises a machine learning module 102, referred to as the LUMI-model herein solely for ease of reference, a laboratory module 104, referred to as the LUMI-lab herein solely for ease of reference, and a control system 106 operatively coupled to the LUMI-model 102 and the LUMI-lab 104. The control system 106 may be configured to receive input signals from, provide output signals to, and transmit one or more control signals to one or more components associated with the LUMI-system 100, such as the LUMI-model 102 and the LUMI-lab 104.
[0086] The entire LUMI-system 100, including the LUMI-lab 104, the LUMI-model 102 and the control system 106, can be implemented within a single computing environment or, in some examples, in a distributed computing environment, wherein the modules can be configured to communicate with one another over a network (not shown). In some examples, the LUMI-system 100 may be implemented in a cloud environment incorporating the operations of the LUMI-model 102, the LUMI-lab 104, and / or the control system 106 to provide the functionalities described in this disclosure. Further, although the LUMI-model 102 and the control system 106 are shown as separate modules, it will be appreciated that they can be implemented in a single unit.
[0087] As shown in FIG. 20, the control system 106 can include an input / output (I / O) unit 202, a memory unit 204, a communication interface 206, and a processor 208. It will be appreciated by those of ordinary skill in the art that FIG. 20 depicts the control system 106 in a simplified manner and a practical embodiment may include additional components and suitably configured logic to support known or conventional operating features that are not described in detail herein. It will further be appreciated by those of ordinary skill in the art that the control system 106 may be implemented as a cloud based system, a server, a personal computer, a desktop computer, a tablet, a smartphone, or as any other computing device known now or that may be developed in the future.
[0088] The I / O unit 202 may be used to receive one or more inputs from and / or to provide one or more system outputs to one or more devices or components. For example, the I / O unit 202 may be configured to receive one or more inputs from the one or more users of the LUMI-system 100. The control system 106 may enable the LUMI-system 100 to receive one or more inputs via the I / O unit 202 including, for example, a keyboard, touch screen, touchpad, mouse or any other input device. The control system 106 output may include the identified one or more lead molecule candidates provided by the LUMI-system 100 and can be provided as a system output via the I / O unit 202 including, for example, a display device, speakers, printer (not shown) or any other output device associated with the system 100.
[0089] The memory unit 204 may include any of volatile memory elements (e.g., random access memory (RAM)), nonvolatile memory elements (e.g., ROM), and combinations thereof. Further, the memory unit 204 may incorporate electronic, magnetic, optical, and / or other types of storage media. It may be contemplated that the memory unit 204 may have a distributed architecture, where various components are situated remotely from one another, and are accessed by the LUMI-system 100 and any of the components associated with the LUMI-system 100, such as the LUMI-model 102, or the control system processor 208. The memory unit 204 may include one or more software programs, each of which includes a listing of computer executable instructions for implementing logical functions. The software in the memory unit 204 may include a suitable operating system and one or more programming codes for execution by the components, such as the LUMI-model 102, the LUMI-lab 104 and the control system processor 208 associated with the LUMI-system 100. The operating system may be configured to control the execution of the programming codes and provide scheduling, input-output control, file and data management, memory management, communication control, and related services. The programming codes may be configured to implement the various processes, algorithms, methods, techniques, etc. described herein.
[0090] The communication interface 206 may be configured to enable communication, such as on a network, between all the components of the LUMI-system 100. The communication interface 206 may include, for example, an Ethernet card or adapter or a wireless local area network (WLAN) card or adapter. Additionally, or alternatively, the communication interface 206 may include a radio frequency interface for wide area communications such as Long-Term Evolution (LTE) networks, or any other networks known now or developed in the future. The communication interface 206 may include address, control, and / or data connections to enable appropriate communications on the network.
[0091] The processor 208 may be a hardware device for executing software instructions, such as the software instructions stored in the memory unit 204 for achieving the functionalities of the LUMI-system 100 as described in the present disclosure. The processor 208 may include one or more of a custom made or commercially available processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the processor 208, a semiconductor-based microprocessor, or generally any device for executing software instructions. When the LUMI-system 100 is in operation, the processor 208 may be configured to enable executing a set of software modules to generally control and perform the one or more operations of the LUMI-system 100 pursuant to the software instructions.
[0092] In an embodiment, the machine learning module 102 is pretrained with a set of compounds or molecules to learn structures, physical and chemical properties of the set of compounds or molecules, and / or other properties of interest. The physical and chemical properties can comprise, for example, one or more of pKa, hydrophobicity, hydrophilicity, and pH, and other properties of interest can comprise, for example, cargo molecule transfection efficiency. The LUMI-model 102 is configured to propose an initial set of molecules or compounds to be synthesized and analyzed by the laboratory module 104 based on the learned structures and properties. The LUMI-model 102 is further configured to receive feedback on the structures of the initial set of molecules synthesized and the initial set of properties obtained and analyzed by the laboratory module 104, for further training and iterative retraining, and to propose a next iteration of molecules to be synthesized and analyzed by the laboratory module 104, such as for identifying one or more lead molecule candidates.
[0093] In an embodiment, using mRNA as the cargo molecule as an example, the LUMI-model 102 (illustrated in FIG. 1A) is a pretrained foundation model that serves as the computational “brain” of the LUMI-system 100. The LUMI-model 102 incorporates the transformer architecture to understand varying three-dimensional (3D) molecular structures, and links lipid structural features to their potential mRNA transfection efficacy. In each experimental iteration, the LUMI-model 102 proposes a candidate set of molecules, such as candidate molecules for synthesis and testing, gradually prioritizing and identifying those that can provide good or high mTP. The experimental process is supported by the control system 106 that includes a suite of software modules, which manage laboratory communication, experimental orchestration, and hardware controls (FIGs. 1B and 3). These modules integrate robotic operations within the laboratory module 104 (or the LUMI-lab 104) with real-time data acquisition, providing a centralized, automated platform for workflow management.
[0094] In an embodiment, the laboratory module 104 includes a synthesis module 302 and an analytical module 304 configured to synthesize and analyze, respectively, the provided set of molecule candidates received from the LUMI-model 102. For example, the hardware infrastructure of the laboratory module 104 includes samplers and handlers for lipid synthesis and nanoparticle formulation, robotic arms for material transfer, incubators, plate readers for data acquisition, analysis and readouts, and feeder systems for pipette tips and multi-well plates. These modules of the LUMI-lab 104 will be described in greater detail below.
[0095] Referring to FIGs. 1C, 1D, 5 and 20, the LUMI-lab is a “wet lab” component of the LUMI-system 100, comprising a first handler module 306 and a second handler module 308, a sampler module 309, a robot unit 305, an incubator module 310, such as a cell incubator, a reader module 312, such as a plate reader, for assay measurements, and one or more helping components, comprising a feeder system 320 having automated loading systems for equipment such as pipette tips and multi-well plates (96-well plates are shown for illustration). In some example implementations, the first and second handler modules 306, 308 can be implemented as automated liquid handlers (a first and a second liquid handler are shown for illustration, with one for ionizable lipid synthesis and the other for cargo-LNP formulation and cell dosing with the formulated cargo-LNPs). The sampler module 309 can be implemented as an automated liquid sampler. Each of these components of the LUMI-lab 104 may be configured to perform its respective operations based on the one or more control signals received from the control system 106 based on the inputs from the LUMI-model 102, for example.
[0096] In an example implementation, the first handler module 306, the second handler module 308, and the sampler module 309 may function as part of the synthesis module 302, and the reader module 312 and the incubator module 310 may function as part of the analytical module 304 within the lab module 104 of the system 100. In some examples, the robot unit 305 may be implemented as a robotic arm and may be configured to function as part of both the synthesis module 302 and the analytical module 304. Further, although there are two handler modules 306, 308 shown and described here, it will be appreciated that they can be combined and implemented within a single handler module to achieve the functionalities described herein.
[0097] In an example implementation, the sampler module 309 can include an automated liquid sampler that can act as a central supply hub. The sampler module 309 (hereinafter referred to as the liquid sampler 309 solely for the sake of simplicity) can, in some examples, comprise one or more integrated modules. For example, as shown in FIGs. 6 and 7, the liquid sampler 309 can include a dispensing module 602 configured for an automated replenishment of raw synthesis reagents, and a raw material storage module 604 configured to store the raw synthesis reagents, both anchored to a base 606. In one example, located at the base 606, the dispensing module 602 is a liquid dispensing module, which features a multi-channel nozzle head (a 96-channel head is shown for illustration) matching a multi-well plate format (a 96-well plate is shown for illustration). Movement of this module can be driven by a dual-rail gantry system and a stepper motor 610, allowing a precise alignment with the deep-well plates. In one example, the raw material storage module 604 comprises one or more tiers (two tiers are shown for illustration). Each tier houses one or more pumps 612 that may be implemented as, for example, peristaltic pumps (48 peristaltic pumps per tier are shown for illustration). This module 604 may be equipped with a quick-change adapter system with a matching number of syringes 614 for rapid reagent loading. In an example, each syringe 614 may be labeled with a QR code, enabling intelligent, vision-based inventory management.
[0098] In operation, upon completion of an experimental round, the control system 106 calculates reagent consumption. The robotic arm 305 places a deep-well plate onto the mobile base of the sampler module 309, which transports it to the dispensing zone. In some implementations, controllers such as Raspberry Pi controllers may activate specific peristaltic pumps 612 to refill missing reagents. Once replenished, the well plate is ejected for the robotic arm 305 to retrieve. The control system 106 also monitors inventory levels, alerting operators when bulk refills are necessary.
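By way of illustration only, the replenishment logic described above can be sketched as follows. The sketch assumes a one-to-one mapping between wells and pumps, volumes in microlitres, and a minimum dispensable volume; the names `Well`, `plan_refill` and `low_stock_alerts`, and all numeric values, are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Well:
    reagent: str      # reagent identifier (hypothetical naming)
    pump_id: int      # peristaltic pump assumed to feed this well
    volume_ul: float  # volume left in the well after the round
    target_ul: float  # volume the well should hold before the next round


def plan_refill(wells, min_dispense_ul=5.0):
    """Return {pump_id: volume_ul} for wells whose deficit is worth dispensing."""
    plan = {}
    for well in wells:
        deficit = well.target_ul - well.volume_ul
        if deficit >= min_dispense_ul:
            plan[well.pump_id] = plan.get(well.pump_id, 0.0) + deficit
    return plan


def low_stock_alerts(stock_ul, plan, pump_to_reagent, threshold_ul):
    """List reagents whose bulk stock would drop below threshold_ul after the refill."""
    alerts = set()
    for pump_id, volume in plan.items():
        reagent = pump_to_reagent[pump_id]
        if stock_ul[reagent] - volume < threshold_ul:
            alerts.add(reagent)
    return sorted(alerts)


# Example with made-up volumes: only the first well needs a refill.
wells = [Well("amine_A", 0, 200.0, 1000.0), Well("aldehyde_B", 1, 998.0, 1000.0)]
plan = plan_refill(wells)
alerts = low_stock_alerts(
    {"amine_A": 1000.0, "aldehyde_B": 5000.0},
    plan,
    {0: "amine_A", 1: "aldehyde_B"},
    threshold_ul=500.0,
)
```

In such a sketch, the refill plan would be translated into pump activations by the dedicated controllers, and the alert list would be surfaced to operators when bulk refills become necessary.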
[0099] In some example implementations, the first and second handler modules 306, 308 can be two liquid handling workstations that may be based on the Opentrons® OT-2 platform, designated as the first liquid handler 306 for ionizable lipid synthesis and the second liquid handler 308 for cargo-LNP formulation and cell dosing with the cargo-LNPs, respectively. Both handlers 306, 308 are configured to leverage the OT-2's high-precision automated pipetting for high-throughput liquid transfer. The first liquid handler 306 can be dedicated to lipid synthesis by integrating an on-deck shaker module to automate the mixing and reaction processes within the workstation. The second liquid handler 308 can be dedicated to formulation of cargo-LNPs and cell dosing by integrating a custom-engineered cooling module to ensure synthesized lipids are preserved at optimal low temperatures prior to biological application.
[0100] For example, in operation, following reagent replenishment at the sampler module 309, the robotic arm 305 transfers the deep-well plate to the first liquid handler 306. Guided by AI predictions provided by the LUMI-model 102, the first liquid handler 306 executes the synthesis protocol. The resulting lipids are then transferred to the second liquid handler 308 for formulation of cargo-LNPs and cell dosing, after which the cell plates are moved to the incubator module 310.
[0101] In some example implementations, the incubator module 310 may be any one that is commercially available, which may be modified for automation, or equipped with automation function(s). In one example, the incubator module 310 is a MyTemp™ Mini CO2 Digital Incubator modified for automation. Further, the door mechanism of the incubator 310 may be actuated by an automated hydraulic linear pusher, with the opening and closing sequences synchronized via communication protocols with the robotic arm 305 to ensure seamless plate loading and unloading.
[0102] In some example implementations, the robotic arm 305 may be any one that is commercially available. In one embodiment, the robotic arm 305 is a UR5e robotic arm by Universal Robots. The robotic arm 305 may be configured to serve as a central material transport unit. For example, the robotic arm 305 may be equipped with a gripper optimized for stability when handling various labware formats. The robotic arm 305 can maintain seamless communication with all other hardware components, executing transfer tasks under the global orchestration of the AI model 102.
[0103] In some example implementations, the laboratory module 104 further includes the feeder system 320 for pipette tips and multi-well plates that comprises one or more modular feeder units which can be arranged in a vertical tower configuration, such as that shown in FIGS. 6 and 7. The feeder system 320 can be controlled by the control system 106, which may include a dedicated controller, such as a Raspberry Pi, for controlling the feeder system 320, and can utilize a motorized weighing platform.
[0104] In operation, when the control system 106 triggers a restocking command, the platform of the feeder system 320 elevates the stack of plates or tips by exactly one unit height, presenting the consumable for retrieval by the robotic arm 305.
[0105] In some example implementations, the analytical module 304 comprises a reader module 312, such as a plate reader, for data acquisition, analysis and readouts. The reader module 312 may be any one that is commercially available. In one embodiment, the plate reader is a BioTek™ Cytation 1 (from BioSPX).
[0106] For example, following the incubation period, cell plates are transferred by the robotic arm 305 to the plate reader 312. The resulting readout data is fed directly back into the LUMI-model 102, closing the loop for the optimization of subsequent experimental rounds.
[0107] In operation, each experimental iteration begins with the LUMI-model 102 producing a set of compounds or molecules, such as a set of chemically diverse ionizable lipid structures. These lipid candidates are then synthesized in the LUMI-lab 104 in a high-throughput manner by the first automated liquid handler 306, which is programmed to precisely mix samples and perform controlled shaking. Following the completion of a predetermined reaction duration, the resulting ionizable lipids are transferred to the second automated liquid handler 308 and formulated into LNPs with other components of classical LNP formulations consisting of helper lipids, cholesterol, and lipid-conjugated polyethylene glycol (PEGylated lipids), according to a predetermined LNP formulation ratio, which are then combined with a cargo molecule according to a predetermined ratio to form cargo-LNP samples. The second automated liquid handler 308 is configured to then dose the cargo-LNP samples with targets, such as target cells, and transfer the target-dosed cargo-LNP samples to the incubator module 310 that includes, for example, multi-well plates containing cell cultures for cell incubation for a predetermined amount of time. After the incubation, the robotic arm 305 retrieves the treated cell culture plates and transfers them back to the second liquid handler 308 to prepare assay samples. The robotic arm 305 then moves the cell cultures containing the assay samples to the analytical module 304 comprising the plate reader 312, which measures, for example, the bioluminescence signal intensity to directly reflect cargo transfection efficiency. This enables a fast and quantitative evaluation of LNPs with varying ionizable lipids. The data from each experimental cycle are processed automatically by the control system 106, which handles quality control, data normalization, and iterative updates to the LUMI-model 102 using active learning algorithms.
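By way of illustration only, one possible form of the quality control and data normalization steps is sketched below. The disclosure does not specify these steps; the sketch assumes that each plate carries negative-control and positive-control wells, that readouts are expressed as a log2 fold change over the negative-control median, and that a plate passes quality control when the positive-to-negative ratio exceeds a chosen minimum. The function names, well labels and thresholds are hypothetical.

```python
import math
import statistics


def normalize_plate(raw, negative_wells, min_signal=1.0):
    """Return log2 fold change over the negative-control median for each sample well."""
    background = statistics.median(raw[w] for w in negative_wells)
    if background <= 0:
        raise ValueError("negative-control median must be positive")
    scores = {}
    for well, value in raw.items():
        if well in negative_wells:
            continue  # controls define the baseline and are not scored
        # Clamp very small readings so the logarithm stays defined.
        scores[well] = math.log2(max(value, min_signal) / background)
    return scores


def passes_qc(raw, negative_wells, positive_wells, min_ratio=10.0):
    """Accept the plate only if positive controls clearly exceed negative controls."""
    neg = statistics.median(raw[w] for w in negative_wells)
    pos = statistics.median(raw[w] for w in positive_wells)
    return neg > 0 and pos / neg >= min_ratio


# Example with made-up luminescence counts.
raw = {"A1": 100.0, "A2": 100.0, "B1": 800.0, "B2": 3200.0, "H12": 6400.0}
negatives = {"A1", "A2"}
scores = normalize_plate(raw, negatives)
plate_ok = passes_qc(raw, negatives, {"H12"})
```

Normalizing per plate in this way is one option for making readouts from different experimental rounds comparable before they are used to update the model.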
[0108] Scheme 1 below illustrates a flow of experiments in the LUMI-lab 104 according to some embodiments:
Lipid Synthesis → Cargo-LNPs Formulation, Cell Dosing → Assay
Scheme 1
[0109] According to some embodiments, over nine or ten iterative DMTA cycles, the LUMI-system 100 is capable of identifying numerous ionizable lipids capable of providing good or even excellent mRNA transfection potency, culminating in the discovery of novel halogen-modified lipid tails that can significantly enhance delivery performance. As further described below, examples of the novel lipids having a halogen-modified tail include those having a fluorine, bromine, chlorine or iodine-modified lipid tail. Some examples of the novel lipids having a halogen-modified tail include those having a fluorine, bromine or chlorine-modified lipid tail. It has been found that these examples of lipids, particularly those with bromine or chlorine-modified tails, consistently outperform existing benchmark ionizable lipids in both in vitro and in vivo studies, demonstrating the application of the closed-loop self-driving laboratory (SDL) systems disclosed herein to design, discover and develop novel molecules for delivery of therapeutic agents and / or cargo molecules.
[0110] As can be seen by the sample experimental results provided herein, the iterative design of the LUMI-system 100 enables continuous enhancement of predictive accuracy. Using mRNA as the cargo molecule as an example, the mRNA transfection data generated in each experimental cycle are used to fine-tune the LUMI-model 102, thereby improving its ability to evaluate and identify high-efficacy candidates. This closed-loop approach systematically integrates experimental insights for optimizing the performance of the LUMI-model 102, thereby resulting in significant improvements in terms of transfection potency across iterations, as illustrated in FIG. 1E. In addition, to maximize the experiment throughput, the LUMI-system 100 incorporates the LUMI-lab 104 that manages concurrent tasks. For example, while one liquid handler doses the cargo-LNPs with target cells during an ongoing experiment, the other liquid handler initiates lipid synthesis for the next experiment, as illustrated by the curves in FIG. 1C, and by a timeline of each experiment in the LUMI-lab 104 and the shift of tasks between experiments as illustrated in FIG. 2. This parallelized workflow can substantially reduce the turnaround time, achieving a throughput far exceeding traditional manual methods.
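By way of illustration only, the effect of overlapping tasks between consecutive experiments can be estimated with a simple pipeline model. The sketch assumes that each stage (for example, synthesis, formulation and dosing, incubation, assay) runs on its own instrument, that each instrument handles one experiment at a time, and that the stage durations shown are placeholders rather than values from the disclosure.

```python
def sequential_makespan(stage_hours, n_experiments):
    """Total time when each experiment starts only after the previous one has finished."""
    return sum(stage_hours) * n_experiments


def pipelined_makespan(stage_hours, n_experiments):
    """Total time when every instrument starts the next experiment as soon as it is free."""
    # finish_prev[s] is the time at which stage s finished for the previous experiment.
    finish_prev = [0.0] * len(stage_hours)
    for _ in range(n_experiments):
        finish_cur = []
        ready = 0.0  # time at which this experiment leaves the preceding stage
        for s, hours in enumerate(stage_hours):
            start = max(ready, finish_prev[s])  # wait for both the plate and the instrument
            ready = start + hours
            finish_cur.append(ready)
        finish_prev = finish_cur
    return finish_prev[-1]


# Hypothetical durations in hours: synthesis, formulation and dosing, incubation, assay.
stages = [4.0, 2.0, 24.0, 1.0]
serial = sequential_makespan(stages, 3)
overlapped = pipelined_makespan(stages, 3)
```

Under these assumptions the longest stage sets the pace of the pipeline, so the benefit of overlapping grows with the number of experiments queued.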
[0111] By combining predictive modeling with autonomous high-throughput experimentation, the LUMI-system 100 provides a scalable solution for molecular discovery in data-sparse fields. The ability of the LUMI-system 100 to rapidly propose, produce, test, and optimize molecular candidates makes it particularly suited to addressing challenges in cargo molecule delivery.
MOLECULAR FOUNDATION MODELS FOR DATA-EFFICIENT ACTIVE LEARNING
[0112] In one aspect, to address the challenges in emerging fields with limited historical data, there is provided the LUMI-model 102, a molecular foundation model for data-efficient active learning. The molecular foundation model can be pretrained with unsupervised learning of a broad set of molecules to develop a generic understanding of molecular structures. This approach leverages the fact that, even in the absence of annotated datasets, the chemical space of interest can be enumerated in its raw form, enabling the training of large-scale unsupervised machine learning (ML) models.
[0113] Unlike conventional supervised-learning models, the pretrained molecular foundation model LUMI-model 102 may be capable of few-shot learning, allowing it to adapt quickly with minimal ongoing “wet-lab” data. When integrated into an active learning framework, the model 102 can be continuously optimized in a closed-loop experimental workflow, further enhancing its predictive accuracy and efficiency.
[0114] According to one embodiment, the molecular foundation model is the LUMI-model 102 described herein. Functioning as the computational engine of the system 100, in an example implementation, the LUMI-model 102 is a transformer-based model designed to process two key inputs: (i) atomic types and (ii) 3D atomic coordinates that define molecular conformations. For example, by incorporating 3D structural information, the model 102 develops a comprehensive understanding of physical and chemical properties, and / or other properties of interest such as transfection efficiency and / or potency, and atomic interactions, which are critical for tasks involving molecular structure analysis and optimization. To overcome the limitations in data-sparse fields, the LUMI-model 102 employs a three-stage training pipeline: unsupervised molecular pretraining, continual pretraining, and supervised fine-tuning within an active learning framework. This progressive learning strategy equips the model 102 with robust molecular representations, enabling rapid adaptation to the delivery of cargo molecules such as nucleic acids and to broader molecular discovery applications.
[0115] Referring to FIG. 4, which illustrates training steps of the LUMI-model 102, the first stage (shown as Step 1 - pretraining with generic molecules) of the LUMI-model 102 employs self-supervised pretraining on a massive dataset of over 13 million molecular structures and their corresponding 3D conformations. This stage uses a masked token prediction objective, where the model 102 learns to reconstruct original atoms and denoise the atom 3D coordinates during training. Additionally, a contrastive learning approach is integrated into the pretraining to improve the model’s ability to distinguish similar molecule structures. This combined approach enables the model 102 to capture general chemical and spatial features that are broadly applicable across molecular design tasks.
[0116] The second stage (shown as Step 2 - continual pretraining with lipids) refines the model's embeddings by focusing on a domain-specific dataset of lipid molecules. This step adapts the general chemical representations learned in the first stage to the chemical space relevant to ionizable lipid design. In this stage, the continual pretraining ensures that the LUMI-model 102 retains generic molecular knowledge while prioritizing features most critical for LNP engineering.
[0117] The third stage (shown as Step 3 - closed-loop active learning) involves supervised fine-tuning of the model 102 within an active learning framework. In this stage, the model 102 is incrementally trained on labeled experimental data from previous experiment iterations to predict the mTP. The lipids proposed for the next experiment iteration are selected based on the predicted mTP in an active learning manner. After that iteration is completed, the predicted values are compared with the actual on-board in vitro luminescence readouts, and the results are fed into the next round of fine-tuning. This fine-tuning allows the model 102 to incorporate experimental feedback and improve its ability to propose high-performing candidates for subsequent rounds of synthesis and testing.
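By way of illustration only, the input corruption used by a masked-atom and coordinate-denoising objective can be sketched as follows. The sketch assumes that a random fraction of atoms is replaced by a mask token and that Gaussian noise is added to the coordinates of those same atoms; the masking rate, noise scale and function name are hypothetical, and the contrastive component is not shown.

```python
import random

MASK = "[MASK]"  # placeholder token for a hidden atom type


def corrupt_conformer(symbols, coords, mask_prob=0.15, noise_std=0.1, rng=None):
    """Return (masked_symbols, noisy_coords, masked_indices) for one training example.

    The training targets would be the original symbols and coordinates at the
    masked indices; the model itself is outside the scope of this sketch.
    """
    rng = rng or random.Random()
    masked_symbols = list(symbols)
    noisy_coords = [tuple(c) for c in coords]
    masked_indices = []
    for i in range(len(symbols)):
        if rng.random() < mask_prob:
            masked_symbols[i] = MASK
            # Perturb each coordinate of the selected atom with Gaussian noise.
            noisy_coords[i] = tuple(x + rng.gauss(0.0, noise_std) for x in coords[i])
            masked_indices.append(i)
    return masked_symbols, noisy_coords, masked_indices


# Example: a three-atom fragment with made-up coordinates.
atoms = ["C", "N", "O"]
xyz = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0)]
masked, noisy, idx = corrupt_conformer(atoms, xyz, rng=random.Random(0))
```

A model trained on such examples would be asked to recover the original atom types and coordinates at the masked positions from the uncorrupted context.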
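By way of illustration only, the closed-loop selection and fine-tuning cycle described above can be sketched as follows. The disclosure does not specify the selection rule; the sketch assumes a greedy rule that picks the untested candidates with the highest predicted score. The callables `predict`, `fine_tune` and `measure` are hypothetical stand-ins for the model 102, its update step and the wet-lab readout, respectively.

```python
def select_batch(candidates, predict, tested, batch_size):
    """Pick the untested candidates with the highest predicted score."""
    pool = [c for c in candidates if c not in tested]
    return sorted(pool, key=predict, reverse=True)[:batch_size]


def run_closed_loop(candidates, predict, fine_tune, measure, n_rounds, batch_size):
    """Alternate selection, measurement and fine-tuning; return all labeled data."""
    labeled = {}
    for _ in range(n_rounds):
        batch = select_batch(candidates, predict, labeled, batch_size)
        if not batch:
            break  # every candidate has been tested
        for lipid in batch:
            labeled[lipid] = measure(lipid)    # experimental readout for this candidate
        predict = fine_tune(predict, labeled)  # updated predictor for the next round
    return labeled


# Toy example: the "model" simply memorizes measured values.
truth = {"lipid_a": 1.0, "lipid_b": 4.0, "lipid_c": 2.0, "lipid_d": 3.0}
results = run_closed_loop(
    candidates=list(truth),
    predict=lambda lipid: 0.0,
    fine_tune=lambda old, data: (lambda lipid: data.get(lipid, 0.0)),
    measure=truth.__getitem__,
    n_rounds=2,
    batch_size=2,
)
```

A purely greedy rule favors exploitation; a practical system might also reserve part of each batch for exploratory or uncertain candidates, which is likewise an assumption and not a feature stated in the disclosure.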
[0118] As can be appreciated, this three-stage process culminates in a versatile model 102 capable of both generalizing across chemical spaces and excelling in targeted molecular design tasks.
[0119] In some embodiments, the LUMI-model 102 is pre-trained on over 10 million to over 28 million molecular structures, which has the advantage of facilitating few-shot learning of new molecules with minimal experimental data. The LUMI-lab 104 then complements this computational precision with a high-throughput robotic platform capable of autonomous synthesis, formulation, and in vitro screening.
[0120] To evaluate the effectiveness of the pretraining of the LUMI-model 102 and its 3D inputs, a series of experiments has been conducted comparing its performance against baseline models, including a version of the LUMI-model 102 without pretraining, and standard linear models. In one embodiment, using a dataset of 1,920 lipid molecules with experimentally measured transfection efficiencies, the pre-trained LUMI-model 102 significantly outperforms all alternatives. The pretrained model 102 has achieved higher correlation compared to the non-pretrained version and linear models.
[0121] The embeddings generated by the pre-trained LUMI-model 102 have revealed a high degree of clustering for lipid molecules with similar physical or chemical properties and / or similar other properties of interest, suggesting that the model 102 captures meaningful representations of chemical space. These embeddings have also demonstrated transferability across related tasks, highlighting the versatility of the pretraining approach and its applicability to other molecular discovery challenges.
METHODS OF USING SDL SYSTEMS TO DEVELOP COMPOUNDS AND COMPOSITIONS
[0122] In one aspect, there are provided various embodiments of methods of using the closed-loop SDL system 100 described herein for the discovery and development of compound candidates and composition candidates.
[0123] According to some embodiments, with the use of the SDL system 100 described herein, novel halogen-modified lipid tails have been identified as a previously unrecognized structural feature that can enhance the delivery efficiency of cargo molecules, such as mRNA, in the context of LNPs.
[0124] Ribonucleic acid (RNA) has emerged as a promising drug modality. Messenger ribonucleic acid (mRNA) and small interfering RNA (siRNA) are being actively developed for diseases such as cancer and infections. However, clinical applications face significant hurdles, such as the inability to deliver these molecules to specific organs or tissues, their large size, negative charges, and susceptibility to enzymatic degradation. These challenges underscore a critical need for efficient delivery systems that preserve nucleic acid integrity and also enable targeted delivery. Among the options, LNPs have emerged as a leading platform for mRNA delivery. Additionally, there is a need for LNPs designed for tissue-specific delivery, minimizing off-target effects and addressing safety issues associated with current LNPs.
[0125] LNP formulations typically include ionizable lipids, helper lipids such as phospholipids, PEGylated lipids, and cholesterol. Ionizable lipids are promising for complexation and targeted delivery of cargo molecules such as proteins, peptides, and nucleic acids due to their amine head groups, linkers, and / or hydrophobic tails. Despite their promise, designing effective ionizable lipids presents significant challenges due to limited understanding of their complex properties and the intensive nature of synthesis and testing.
[0126] The various systems and methods disclosed herein include the SDL system 100 utilizing, for example, Ugi-4CR reactions to enable a high-throughput fabrication of ionizable lipids. Further, the various systems and methods disclosed herein strategically reconfigure functional groups and optimize component design to enhance the chemical diversity of the resulting lipids, particularly for their lipid tails.
[0127] In another aspect, the present disclosure provides novel ionizable lipids, which can be formulated into LNPs suitable for delivery of therapy involving cargo molecules such as proteins, peptides, and nucleic acids. The term “therapy” is understood broadly to refer to remediation or prevention of a disease or disorder. Examples of therapy by medical means generally include, but are not limited to, drug therapy, gene therapy and vaccination.
[0128] According to the teachings of the present disclosure, novel ionizable lipids and LNPs discovered through iterative experimentation facilitated by the AI-driven closed-loop SDL system 100 include halogen-modified lipids.
[0129] Through iterative experimentation facilitated by the AI-driven closed-loop SDL system 100 described herein, it has been discovered that the incorporation of a halogen atom at the terminal position of the lipid tail(s) can substantially enhance LNP performance for transfection of cargo molecules. This structural modification has led to significant improvements in, for example, the mRNA transfection efficacy compared to non-halogenated analogs, demonstrating the power of AI in accelerating the discovery of novel lipids and lipid formulations. In particular, the AI-driven closed-loop SDL system 100 can integrate high-throughput experimentation, machine learning algorithms, and real-time data analysis to systematically explore the chemical space of lipid structures. In this process, the system 100 identifies halogen-modified lipid tails as a high-priority candidate for further evaluation. Subsequent experiments validate that the halogen-modified lipid has consistently outperformed its non-halogenated counterparts across a variety of cellular models and delivery conditions.
[0130] Various embodiments of the halogen modification pattern have shown enhanced properties of interest of the LNPs, including improved lipid packing, augmented endosomal escape efficiency, and significantly enhanced delivery potency of the cargo molecule, such as mRNA.
[0131] Without being limited to any particular theory, it is hypothesized that the unique properties of the halogenated lipid tail likely arise from its impact on lipid packing, mRNA encapsulation, and cellular interactions. This modification may enhance the properties of the LNPs, including but not limited to, improved stability and optimized endosomal escape mechanisms, leading to superior transfection outcomes. The discovery highlights the synergy between exemplary AI-driven systems, such as the system 100 described herein, and lipid chemistry, offering a transformative approach to designing next-generation lipid-based delivery vehicles.
[0132] Therefore, in accordance with the teachings herein, there is provided an ionizable lipid of General Formula I:
[0133] wherein:
[0134] R1 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, and optionally having a halogen at the terminal position or within the intermediate chain extension;
[0135] R2 is selected from acyclic amine, cyclic amine and heterocyclic amine, each of the acyclic amine, cyclic amine and heterocyclic amine having at least one ionizable nitrogen atom;
[0136] R3 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, optionally having a halogen at the terminal position or within the intermediate chain extension, and optionally having an unsubstituted or substituted C7-C30 cage hydrocarbon at the terminal position when the terminal position does not have a halogen; and
[0137] R4 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, optionally having a halogen at the terminal position or within the intermediate chain extension, and optionally having an unsubstituted or substituted C7-C30 cage hydrocarbon at the terminal position when the terminal position does not have a halogen;
[0138] provided that at least one of R1, R3 and R4 has a halogen at the terminal position or within the intermediate chain extension.
[0139] In some embodiments, the C2-C30 alkenyl in R1, R3 or R4 comprises one or two double bonds.
[0140] In some embodiments, when present, the hydrolyzable group in R1, R3 or R4 is selected from O, OC(O), NHC(O) and imide. In some embodiments, when present, the hydrolyzable group in R1, R3 or R4 is OC(O).
[0141] In some embodiments, R1 is derived from an aldehyde precursor having a C1-C30 alkyl or C2-C30 alkenyl, the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, and optionally having a halogen at the terminal position or within the intermediate chain extension. In some embodiments, the C2-C30 alkenyl comprises one or two double bonds. In some embodiments, when present, the hydrolyzable group is selected from O, OC(O), NHC(O) and imide. In some embodiments, when present, the hydrolyzable group is OC(O).
[0142] In some embodiments, when present, the halogen in R1, R3 or R4 is fluorine, chlorine, bromine, or iodine. In some embodiments, the halogen is fluorine, chlorine or bromine.
[0143] In some embodiments, the at least one ionizable nitrogen atom in R2 is an ionizable tertiary amine group.
[0144] In some embodiments, R2 is derived from an amine precursor, which comprises a cyclic or heterocyclic amine having at least one ionizable nitrogen atom, an acyclic amine having at least one ionizable nitrogen atom, or both. In some embodiments, the cyclic amine or heterocyclic amine is a C3-C17 cyclic or heterocyclic amine. In some embodiments, the cyclic or heterocyclic amine is a C3-C7 cyclic or heterocyclic amine. In some embodiments, the acyclic amine is a C1-C5 linear or branched alkyl amine. In some embodiments, the ionizable nitrogen atom is an ionizable tertiary amine group.
[0145] In some embodiments, R3 is derived from a carboxylic acid precursor having a C1-C30 alkyl or C2-C30 alkenyl group, the C1-C30 alkyl or C2-C30 alkenyl group being optionally interrupted by a hydrolyzable group, optionally having a halogen at the terminal position or within the intermediate chain extension, and optionally having an unsubstituted or substituted C7-C30 cage hydrocarbon at the terminal position when the terminal position does not have a halogen. In some embodiments, the C2-C30 alkenyl comprises one or two double bonds. In some embodiments, when present, the hydrolyzable group is selected from O, OC(O), NHC(O) and imide. In some embodiments, when present, the hydrolyzable group is OC(O).
[0146] In some embodiments, R4 is derived from an isocyanide precursor having a C1-C30 alkyl or C2-C30 alkenyl group, the C1-C30 alkyl or C2-C30 alkenyl group being optionally interrupted by a hydrolyzable group, optionally having a halogen at the terminal position or within the intermediate chain extension, and optionally having an unsubstituted or substituted C7-C30 cage hydrocarbon at the terminal position when the terminal position does not have a halogen. In some embodiments, the C2-C30 alkenyl comprises one or two double bonds. In some embodiments, when present, the hydrolyzable group is selected from O, OC(O), NHC(O) and imide. In some embodiments, when present, the hydrolyzable group is OC(O).
[0147] In some embodiments, R1 is: [chemical structures not reproduced]
[0148] In some embodiments, R2 is: [chemical structures not reproduced]
[0149] In some embodiments, R3 is: [chemical structures not reproduced]
[0150] In some embodiments, R4 is: [chemical structures not reproduced]
[0151] In some embodiments, R3 is selected from C1-C30 alkyl and C2-C30 alkenyl, optionally interrupted by a hydrolyzable group, and has a halogen at the terminal position or within the intermediate chain extension. In some embodiments, R3 has a bromine at the terminal position. In some embodiments, R3 has a chlorine at the terminal position. In some embodiments, R3 has a fluorine at the terminal position. In some embodiments, the C2-C30 alkenyl comprises one or two double bonds. In some embodiments, when present, the hydrolyzable group is selected from O, OC(O), NHC(O) and imide. In some embodiments, when present, the hydrolyzable group is OC(O).
[0152] In some embodiments, the ionizable lipid of Formula I is: [chemical structures not reproduced]
[0153] In some embodiments, the ionizable lipids of Formula I exist as different isomers. In particular, the lipid chains may exist in the trans or cis configuration such as when they contain double bonds. In some embodiments, the lipids occur in the cis configuration.
[0154] As used herein, when a compound has one or more stereocenters, each stereocenter may have the R or S configuration, unless stated otherwise. In some embodiments, the compound is a racemic mixture of enantiomers and / or diastereoisomers, or in some embodiments, it exists as a mixture having an excess of one or more of the enantiomers and / or diastereoisomers, such as more than 60 %, more than 70 %, more than 80 %, more than 85 %, more than 90 %, more than 95 %, more than 98 %, or more than 99 %.
[0155] In some embodiments, a method for preparing the ionizable lipid of General Formula I comprises reacting a compound of Formula A, a compound of Formula B, a compound of Formula C and a compound of Formula D, wherein R1, R2, R3 and R4 are as described above, provided that at least one of R1, R3 and R4 includes a halogen at the terminal position or within the intermediate chain extension.
R1-CHO (A); R2-NH2 (B); R3-COOH (C); R4-NC (D)
[0156] In some embodiments, the reaction is according to the following synthetic route (Scheme II):
R1-CHO + R2-NH2 + R3-COOH + R4-NC → Ugi adduct (Ugi-4CR; EtOH, rt, 18 h)
Scheme II.
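The four-component route lends itself to combinatorial enumeration: every choice of aldehyde (Formula A), amine (Formula B), carboxylic acid (Formula C) and isocyanide (Formula D) defines one candidate lipid. The following is a minimal sketch of that enumeration together with the halogen proviso of Formula I. The building-block labels and halogen flags are hypothetical placeholders, not components of the disclosed library.

```python
from itertools import product

# Hypothetical building blocks: (label, tail carries a halogen?).
# A real library would store full structures; only the flag matters here.
ALDEHYDES = [("ald-1", False), ("ald-2", True)]      # R1-CHO  (Formula A)
AMINES = [("amine-1", False), ("amine-2", False)]    # R2-NH2  (Formula B)
ACIDS = [("acid-1", False), ("acid-2", True)]        # R3-COOH (Formula C)
ISOCYANIDES = [("iso-1", False), ("iso-2", True)]    # R4-NC   (Formula D)


def enumerate_ugi_library(aldehydes, amines, acids, isocyanides):
    """Yield one record per four-component combination."""
    for ald, amine, acid, iso in product(aldehydes, amines, acids, isocyanides):
        yield {
            "components": (ald[0], amine[0], acid[0], iso[0]),
            # Proviso: at least one of R1, R3 and R4 carries a halogen.
            "satisfies_proviso": ald[1] or acid[1] or iso[1],
        }


library = list(enumerate_ugi_library(ALDEHYDES, AMINES, ACIDS, ISOCYANIDES))
halogenated = [rec for rec in library if rec["satisfies_proviso"]]
print(len(library), len(halogenated))
```

With two options per component, the sketch enumerates 16 combinations, of which 14 satisfy the proviso. The library size grows as the product of the four component counts, which is why a modest set of precursors can span a large chemical space.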
[0157] As will be described in greater detail herein below, in accordance with the teachings herein, there are provided various embodiments of methods of using the LUMI-system 100 to develop a family of novel ionizable lipids with halogenated tails.
[0158] Reference is made to FIG. 10, which illustrates a method of using an example closed-loop SDL system 100 to develop ionizable lipids and LNPs. LUMI-lab 104 is utilized to conduct consecutive DMTA iterations to optimize LNPs for cargo delivery, for example, mRNA delivery, to targets, for example, target cells such as human bronchial epithelial (HBE) cells, as illustrated in FIG. 10. As shown in FIG. 10A, each iteration comprises two parallel experiments, synthesizing lipid candidates (in this example, 184 candidates) across two multi-well plates (in this example, 92 wells x 2), followed by cargo-LNP formulation and evaluation or assay. Each plate includes two wells treated by industry-standard LNPs containing benchmark lipids such as dilinoleylmethyl-4-dimethylaminobutyric acid (DLin-MC3-DMA or MC3 for abbreviation) or SM-102 as positive controls and two non-treated wells as negative controls. In some embodiments, a 3D surface plot as shown in FIG. 10A can be plotted from the predictions produced by an ensemble of the LUMI-model 102 to visualize the transfection potency (mTP shown in FIG. 10A for illustration) predictions of the lipid candidates, brighter regions indicating higher predicted mTP values, and the z-axis variance representing prediction uncertainty.
[0159] As described herein, targets suitable for various embodiments of the system and / or method described herein comprise, but are not limited to, cell systems and multicellular constructs. In some embodiments, the cell systems comprise, but are not limited to, one or more of cancer cell lines, immortalized cell lines, primary cells, stem cells (including pluripotent or multipotent stem cells), differentiated cells derived therefrom, mammalian cells, non-mammalian cells, engineered cells, and recombinant cells. In some embodiments, suitable targets can be any one or more of the following cells: HeLa cells, T-cells, stem cells (including pluripotent or multipotent stem cells), bronchial epithelial cells, lung epithelial cells, pancreatic beta cells, natural killer cells, macrophages, dendritic cells, endothelial cells, chondrocytes, skeletal muscle cells, cardiomyocytes, neurons, retinal cells and hepatocytes. In some embodiments, suitable multicellular constructs comprise, but are not limited to, organoids and spheroids.
[0160] MC3 is an ionizable cationic lipid extensively used in LNP formulations for delivering nucleic acids such as siRNA and mRNA. It is notably a key component in the FDA-approved drug Onpattro® (patisiran), which is the first siRNA therapeutic approved for treating hereditary transthyretin-mediated amyloidosis. MC3 contains a dimethylaminobutyric acid head group that is ionizable. This allows the lipid to acquire a positive charge under acidic conditions (e.g., during nanoparticle formation), facilitating electrostatic interaction with negatively charged nucleic acids.
[0161] SM-102 (a proprietary lipid developed by Moderna Therapeutics) is an ionizable lipid prominently used in Moderna's mRNA-based COVID-19 vaccine (mRNA-1273). It plays a crucial role in delivering mRNA encoding the spike protein of SARS-CoV-2 into human cells, eliciting an immune response. It contains an amine group that can be protonated under acidic conditions, essential for mRNA encapsulation during LNP formation.
[0162] Unlike conventional active learning workflows, which primarily focus on improving predictive accuracy, the LUMI-system 100 simultaneously refines predictions and identifies high-performing lipids within a finite number of iterations. In one embodiment, to achieve this, each iteration follows a dual-plate strategy, optimizing top-performing candidates while maximizing information gain. As shown in FIG. 10A, the first plate (exploitation plate) contains a plurality of ionizable lipid candidates (92 shown for illustration) predicted to exhibit high mTP, prioritized based on the latest LUMI-model 102 predictions, and the second plate (exploration plate) contains a plurality of ionizable lipid candidates (92 shown for illustration) selected based on high ensemble model uncertainty. Lipids with the greatest variance across ensemble predictions are prioritized, ensuring systematic exploration of underrepresented regions in the chemical space. This exploitation-exploration framework can enable the LUMI-system 100 to efficiently probe a broad ionizable lipid chemical space, ensuring both the discovery of superior lipid candidates and the continuous expansion of structure-activity relationship (SAR) knowledge. By iteratively improving predictions and expanding the diversity of evaluated lipid structures, the LUMI-system 100 can steadily advance toward optimizing LNPs for cargo molecule delivery.
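The dual-plate strategy can be illustrated with a short sketch. It assumes that each candidate has one mTP prediction per ensemble member, that the exploitation plate takes the highest ensemble means, and that the exploration plate takes the highest ensemble variances among the candidates not already chosen. The function name, the exclusion of already-chosen candidates, and the use of population variance are assumptions made for illustration; the disclosure does not specify them.

```python
from statistics import mean, pvariance


def select_dual_plates(ensemble_predictions, plate_size=92):
    """Split candidates into an exploitation plate and an exploration plate.

    ensemble_predictions maps a candidate id to the list of mTP predictions
    made for that candidate by each ensemble member.
    """
    stats = {
        cid: (mean(preds), pvariance(preds))
        for cid, preds in ensemble_predictions.items()
    }
    # Exploitation plate: highest mean predicted mTP.
    by_mean = sorted(stats, key=lambda cid: stats[cid][0], reverse=True)
    exploitation = by_mean[:plate_size]
    # Exploration plate: highest ensemble variance among the remaining
    # candidates (assumption: the two plates do not share candidates).
    chosen = set(exploitation)
    remaining = [cid for cid in stats if cid not in chosen]
    by_variance = sorted(remaining, key=lambda cid: stats[cid][1], reverse=True)
    exploration = by_variance[:plate_size]
    return exploitation, exploration


# Toy predictions from a three-member ensemble (hypothetical values).
toy = {
    "lipid-a": [9.0, 9.1, 8.9],  # high mean, low variance
    "lipid-b": [2.0, 8.0, 5.0],  # moderate mean, high variance
    "lipid-c": [1.0, 1.1, 0.9],  # low mean, low variance
    "lipid-d": [7.0, 7.2, 6.8],  # high mean, low variance
}
exploit, explore = select_dual_plates(toy, plate_size=1)
print(exploit, explore)
```

In the toy data, the candidate with the highest mean lands on the exploitation plate, while the candidate on which the ensemble members disagree most lands on the exploration plate even though its mean is only moderate.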
[0163] In one embodiment, over the course of nine iterations, an exemplary LUMI system 100 has evaluated a total of 864 lipid candidates, achieving significant advancements in transfection efficiency. Early iterations focus on refining the predictive capacity of the LUMI-model 102, as experimental data have revealed trends in structural features associated with transfection performance. By the fourth iteration, the system 100 can identify several promising lipids that outperform initial benchmarks. Notably, performance gains can accelerate in later iterations, reflecting the dual impact of refined exploitation and informed exploration.
[0164] In one embodiment, throughout ten iterations, the LUMI-system 100 can evaluate a total of 1,781 distinct lipid candidates, achieving substantial improvement in transfection efficiency. Early iterations prioritize refining the predictive accuracy of the LUMI-model 102, as the experimental data reveal more SARs between molecular features and transfection performance. After a number of iterations (by the 4th iteration as shown in FIG. 10B), the system 100 can identify several promising lipids that outperform positive controls, as illustrated in FIG. 10B. Later iterations can further improve the LNP performance driven by the combined effects of refined exploitation and informed exploration. In the final tenth iteration, over 50% of candidates exhibit superior transfection efficiencies (relative light units (RLU) > 10), highlighting the effectiveness of the optimization strategy of the LUMI-system 100, as illustrated in FIG. 10B.
[0165] FIG. 10C illustrates trends indicative of the learning process of the LUMI-model 102 by looking at the performance of the four components in the combinatorial chemical library shown in FIG. 9 for the ionizable lipid synthesis, i.e. the amines, isocyanides, aldehydes, and carboxylic acids, respectively. The exploitation plates consistently yield high-performing candidates, while expanding the diversity of chemical features captured by the model 102, indicating the capability of rapidly discovering positive candidates by the exploration-exploitation active learning strategy employed by the LUMI-system 100. By later iterations, specific headgroups and linkers are associated with high transfection potency, particularly the N-(2-Aminoethyl)piperazine (shown as A4 in FIG. 9) head, the tris(2-aminoethyl)amine (shown as A8 in FIG. 9) head, and the 1-Isocyanoadamantane (shown as B12 in FIG. 9) linker.
[0166] Reference is made to FIG. 11, which illustrates results of ionizable lipids developed using the closed-loop SDL system 100 as illustrated in FIG. 10. Throughout the active learning and fine-tuning process, the LUMI-model 102 can progressively learn the relationship between molecular features of ionizable lipids and their transfection performance. This relationship can be visualized, for example, through UMAPs of lipid embeddings generated by the optimized LUMI-model 102 after the tenth iteration. FIG. 11A shows a UMAP of the entire 221k synthesizable lipid chemical space out of the designed Ugi-4CR library illustrated in FIG. 9. Areas of the UMAP are shaded by the average mTP predicted by an ensemble of five models. The UMAP shows that lipids with the highest and lowest predicted mTPs form distinct local clusters, reflecting well-captured SARs. The top of FIG. 11B shows that one of the most prominent clusters corresponds to lipids with brominated tails. The bottom zoom-in panels of FIG. 11B show the average predicted mTP, the variance of predictions by the ensemble of five models, and the presence of the A4 head component in the subcluster of brominated lipids, respectively. This brominated lipid cluster is enriched in lipids with globally high predicted mTP and low prediction variance, suggesting a strong correlation between a bromine modification and transfection performance.
[0167] In addition to identifying brominated lipids as a distinct structural feature in the embedding space, the LUMI-model 102 actively prioritizes these lipids in experimental iterations. Despite accounting for only 8.33% of the 221k synthesizable lipids (see the left pie chart in FIG. 11C), the bromine-tail lipids constitute 26.4% of all candidates proposed by the LUMI-model 102 for synthesis and testing, as illustrated in the middle pie chart in FIG. 11C. Further, 52% of all top-performing lipids (having top 10% mTP) across experiments are brominated (see the right pie chart in FIG. 11C), confirming their important contribution to enhanced transfection efficiency.
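The three percentages quoted for FIG. 11C can be expressed as enrichment factors relative to the share of bromine-tail lipids in the library. The short calculation below uses only the figures stated in the text; the variable and function names are illustrative.

```python
# Shares of bromine-tail lipids quoted in the text for FIG. 11C.
library_share = 0.0833   # fraction of the 221k synthesizable lipids
proposed_share = 0.264   # fraction of candidates proposed by the model
top_share = 0.52         # fraction of top-performing (top 10% mTP) lipids


def enrichment(observed_share, baseline_share):
    """Ratio of an observed share to the baseline share in the library."""
    return observed_share / baseline_share


proposal_enrichment = enrichment(proposed_share, library_share)
top_enrichment = enrichment(top_share, library_share)
print(f"{proposal_enrichment:.2f}x, {top_enrichment:.2f}x")
```

On these figures, bromine-tail lipids are proposed roughly 3.2 times more often than their library share would suggest, and they appear among the top performers roughly 6.2 times more often.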
[0168] In various embodiments, after the self-supervised pretraining stage, the LUMI-model 102 can identify important structure features. For example, as shown in FIG. 18A, Br-containing structures are already distinguishable in the latent space after the self-supervised training stage.
[0169] In various embodiments, the LUMI-model 102 can identify potential molecule candidates during the supervised fine-tuning stage. For example, as shown in FIGs. 18B and 18C, the LUMI-model 102 can quickly associate the important structure features identified after the self-supervised pretraining stage with high mTP potential during the early iterations of the fine-tuning stage, and can continue to rank the corresponding candidates until their rankings stabilize with the feedback of growing data points.
[0170] Using the brominated lipids as an example, the superior performance of these lipids is further validated through a comparative analysis with non-brominated lipids. As illustrated in FIG. 11D, left side graph, of all the lipid candidates, brominated lipids exhibit significantly higher expected mTP values (shown as the kernel density estimation (KDE) curves) (p < 0.0001) and a greater proportion of high-mTP lipids. This advantage remains consistent among the top-performing 10% of lipids (p < 0.001), as shown in FIG. 11D, right side graph. Further, other molecular features, particularly headgroups and linkers, can also play important roles. For example, brominated and non-brominated lipids share similar headgroup structures (FIG. 11E), with top-performing headgroups such as N-(2-Aminoethyl)piperazine (A4 in FIG. 9) and tris(2-aminoethyl)amine (A8 in FIG. 9) present in both categories.
[0171] According to various embodiments, candidates selected by the closed-loop SDL system 100 are synthesized and formulated into LNPs that encapsulate cargo molecules. The LNP-encapsulated cargo molecules are tested in vivo for their transfection efficiency in a target, thereby validating the predictive power of AI-driven molecular design using the LUMI-model 102.
[0172] In various embodiments, there is also included herein a LNP or LNP composition comprising a lipid, in particular an ionizable lipid as described herein, for example of Formula I, or as identified using the AI-based closed-loop SDL systems and methods disclosed herein.
[0173] Generally, the term “nanoparticle” refers to any particle having a diameter that makes the particle suitable for systemic, in particular intravenous administration, of active agents, typically having a diameter of less than 1000 nanometers (nm), less than 500 nm, less than 200 nm, such as for example between 50 and 200 nm; or between 80 and 160 nm.
[0174] In various embodiments, the nanoparticle is in the form of a LNP or LNP composition comprising a lipid, in particular, an ionizable lipid as described herein.
[0175] In a further aspect, the present disclosure also provides a pharmaceutical composition comprising one or more nanoparticles as described herein and a pharmaceutically acceptable agent, such as a carrier, excipient, etc. Such pharmaceutical compositions are particularly suitable in various fields such as prophylactic vaccines, therapeutic vaccines, protein replacement therapies, gene editing, gene silencing, small molecule delivery and the like.
[0176] In various embodiments, the nanoparticles and pharmaceutical compositions as described herein are used in the treatment of a disease, condition or disorder characterized by the overexpression of a polypeptide in a subject by administering to the subject a pharmaceutical composition as described herein, wherein the active agent is a protein, a peptide or a nucleic acid. In some embodiments, the nucleic acid is DNA, RNA, or recombinantly produced and chemically synthesized molecules. In some embodiments, the nucleic acid is selected from DNA, genomic DNA, cDNA, RNA, tRNA, mRNA, siRNA, circRNA, microRNA (miRNA), antisense oligonucleotides, ribozymes, plasmids, immune-stimulating nucleic acids, antisense nucleic acids, antagomirs (anti-miRs), miRs, supermiRs, U1 adaptors, and aptamers. In some embodiments, the nucleic acid is siRNA, tRNA, circRNA or mRNA, or a combination thereof. In some embodiments, the nucleic acid is selected from an siRNA, a microRNA, and an antisense oligonucleotide, wherein the siRNA, miRNA, or antisense oligonucleotide includes a polynucleotide that specifically binds to a polynucleotide that encodes the polypeptide, or a complement thereof. In some embodiments, the nucleic acid is a siRNA or miRNA. In some embodiments, the nucleic acid is mRNA.
[0177] A nucleic acid is either in the form of a molecule that is single stranded or double stranded and linear or closed covalently to form a circle. In some embodiments, the nucleic acid is used for introduction into, i.e. transfection of cells, for example, in the form of RNA which can be prepared by in vitro transcription from a DNA template. The RNA can moreover be modified before application by stabilizing sequences, capping, and / or polyadenylation.
[0178] The term “RNA” relates to a molecule that comprises ribonucleotide residues and in some embodiments being entirely or substantially composed of ribonucleotide residues. “Ribonucleotide” relates to a nucleotide with a hydroxyl group at the 2'-position of a β-D-ribofuranosyl group. The term includes double stranded RNA, single stranded RNA, isolated RNA such as partially purified RNA, essentially pure RNA, synthetic RNA, recombinantly produced RNA, as well as modified RNA that differs from naturally occurring RNA by the addition, deletion, substitution and / or alteration of one or more nucleotides. Such alterations can include addition of non-nucleotide material, such as to the end(s) of a RNA or internally, for example at one or more nucleotides of the RNA. Nucleotides in RNA molecules can also comprise non-standard nucleotides, such as non-naturally occurring nucleotides or chemically synthesized nucleotides or deoxynucleotides. These altered RNAs can be referred to as analogs or analogs of naturally-occurring RNA. In some embodiments, nucleic acids are comprised in a vector. The vector may be any selected from vectors known to the skilled person including plasmid vectors, cosmid vectors, phage vectors such as lambda phage, viral vectors such as adenoviral or baculoviral vectors, or artificial chromosome vectors such as bacterial artificial chromosomes (BAC) or yeast artificial chromosomes (YAC).
[0179] In various embodiments, the term “RNA” relates to “mRNA” which means “messenger RNA” and relates to a “transcript” which is produced using DNA as template and encodes a peptide or protein. mRNA typically comprises a 5' untranslated region (5'-UTR), a protein or peptide coding region and a 3' untranslated region (3'-UTR). mRNA has a limited half-life in cells and in vitro. In some embodiments, mRNA is produced by in vitro transcription using a DNA template. In some embodiments, the RNA is obtained by in vitro transcription or chemical synthesis. The in vitro transcription methodology is known to the skilled person. For example, there is a variety of in vitro transcription kits commercially available.
[0180] In various embodiments, the nanoparticles and pharmaceutical compositions as described herein are used in the treatment of a disease, condition or disorder characterized by under-expression of a polypeptide in a subject by administering to the subject a pharmaceutical composition as described herein, wherein the active agent is a plasmid that encodes the polypeptide or a functional variant or fragment thereof, such as in the context of protein replacement therapy.
[0181] In some embodiments, the nanoparticles and compositions as described herein are used as a transfection agent that includes the compositions or nanoparticles described herein, wherein the composition or nanoparticles comprise a cargo molecule. The transfection agent, when contacted with targets, can efficiently deliver the cargo molecule to the targets. Yet another aspect is a method of delivering a cargo molecule to the interior of a target, by obtaining or forming a composition or nanoparticles described herein, and contacting the composition or lipid particles with a target. In some embodiments, the cargo molecule is a nucleic acid.
[0182] While the teachings described herein are in conjunction with various embodiments for illustrative purposes, it is not intended that the teachings be limited to such embodiments, as the embodiments described herein are intended to be examples. On the contrary, the teachings described and illustrated herein encompass various alternatives, modifications, and equivalents, without departing from the embodiments described herein, the general scope of which is defined in the appended claims.
[0183] EXAMPLES
[0184] The following non-limiting examples are illustrative of the present disclosure.
GENERAL METHODS
[0185] 1.1 LUMI-Model Architecture
[0186] In an example implementation, the machine learning module or the LUMI-model 102 was a large-scale molecular representation model that built on advances in 3D molecular pretraining frameworks. The LUMI-model 102 leveraged a 3D transformer architecture that has been proposed for modeling small molecules, and expanded on it with training procedures and objectives tailored to ionizable lipid engineering. The backbone of the model 102 employed a 15-layer transformer augmented to handle 3D spatial data effectively. The input to the model 102 contained atom types and atom coordinates in 3D space, typically representing a molecule conformation.
[0187] 1.1.1 Input embeddings
[0188] For an input molecule containing N atoms, the structural information is composed of two parts: (1) The atom types, denoted as a list of integer identifiers $\{x_1, x_2, \ldots, x_N\}$, $x_i \in \mathbb{N}$. Each integer denotes an atom element, such as carbon and nitrogen. (2) The pairwise distance matrix between all atoms, denoted as $D_X \in \mathbb{R}^{N \times N}$.
[0189] PyTorch embedding layers are applied to encode the atom type into a learnable vector $h_i^{(0)} \in \mathbb{R}^{D}$, where D is the dimension of the embedding features, and the superscript (0) denotes it as the embedding before the first transformer layer. The combined embedding matrix $H^{(0)} = [h_1^{(0)}, h_2^{(0)}, \ldots, h_N^{(0)}]$ is input to the stacked transformer layers in the LUMI-model 102.
[0190] To encode the distance matrix, an atom type aware Gaussian Kernel method was used to process each distance recordbetween atoms / and j from Dxinto a pairwise e(Ti) wA'x. Vinteraction matrixJbetween atoms / and j. These encodings are only dependent on the pairwise distances, and thus ensure the model 102 is invariant to global rotations and translations, making it particularly suitable for handling the 3D conformation of molecules.
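The invariance property stated above can be checked numerically. The following is a minimal sketch under stated assumptions, not the LUMI-model's implementation: it uses a plain Gaussian basis that is not atom type aware, the kernel centers and width are hypothetical values, and the helper names `pairwise_distances`, `gaussian_encode`, and `random_rotation` are illustrative only.

```python
import numpy as np

def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """Return the N x N Euclidean distance matrix for N x 3 coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def gaussian_encode(dist: np.ndarray, centers: np.ndarray, width: float) -> np.ndarray:
    """Expand each distance into K Gaussian basis features (N x N x K).

    Assumption: a fixed, atom-type-agnostic basis; the model described in the
    text additionally conditions the kernel on atom types.
    """
    return np.exp(-0.5 * ((dist[..., None] - centers) / width) ** 2)

def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Draw a random proper rotation matrix via QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis so that det = +1
    return q

rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 3))        # toy 5-atom conformation
centers = np.linspace(0.0, 5.0, 8)      # hypothetical kernel centers
# Apply a global rotation followed by a global translation.
moved = coords @ random_rotation(rng).T + np.array([1.0, -2.0, 0.5])

enc_a = gaussian_encode(pairwise_distances(coords), centers, width=0.5)
enc_b = gaussian_encode(pairwise_distances(moved), centers, width=0.5)
invariant = bool(np.allclose(enc_a, enc_b))
```

Because the encoding only ever sees interatomic distances, `enc_a` and `enc_b` agree up to floating-point error even though the raw coordinates differ.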
[0191] 1.1.2 Self-attention for molecular representation learning
[0192] In one embodiment, for the self-attention between atom embedding vectors, the Query, Key, and Value matrices at l-th layer were first computed as in standard multi-head self-attention. These matrices, denoted as $Q_k^{(l)}$, $K_k^{(l)}$, and $V_k^{(l)}$, are learnable transformations from the input embeddings $H^{(l-1)}$. $d_k$ is the embedding dimension for the $k$-th attention head, and $\sum_k d_k = D$.
[0193] In an example, the attention layer was enhanced to incorporate the atom coordinates information encoded in the pairwise distance representation, which was originally introduced by Uni-Mol (Zhou et al. “Uni-Mol: A Universal 3D Molecular Representation Learning Framework”, Conference Paper at ICLR 2023). Specifically, in the $l$-th ($l \in [1, 15]$) transformer layer, the self-attention per head $k$ was computed as
$$\mathrm{Attention}\left(Q_k^{(l)}, K_k^{(l)}, V_k^{(l)}\right) = \mathrm{softmax}\left(\frac{Q_k^{(l)} \left(K_k^{(l)}\right)^{T}}{\sqrt{d_k}} + E_k^{(l)}\right) V_k^{(l)}, \quad (1)$$
where a standard softmax operator was used, and the interaction matrix per layer $E_k^{(l)}$ was computed as
$$E_k^{(l)} = \begin{cases} \dfrac{Q_k^{(l-1)} \left(K_k^{(l-1)}\right)^{T}}{\sqrt{d_k}} + E_k^{(l-1)}, & \text{if } l > 1 \\ E, & \text{if } l = 1 \end{cases} \quad (2)$$
with $E$ denoting the pairwise interaction matrix produced by the Gaussian kernel encoding of the distance matrix.
[0194] The output embedding of the l-th layer, $H^{(l)}$, was then generated by consecutive computations of the concatenation of multi-head attentions, a feed-forward neural network, and layer normalization operation.
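To make the role of the pairwise interaction term concrete, the following is a minimal single-head sketch of attention with an additive pair term, in the style the text attributes to Uni-Mol. It is an illustration under stated assumptions, not the model's code: the function name `biased_attention`, the weight names, and the toy dimensions are hypothetical, and the feed-forward and layer normalization steps are omitted.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def biased_attention(h, w_q, w_k, w_v, pair_bias):
    """Single-head attention with an additive pairwise term.

    h: (N, D) atom embeddings; w_q, w_k, w_v: (D, d_k) projections;
    pair_bias: (N, N) interaction term added to the attention logits.
    Returns the (N, d_k) head output and the (N, N) pair term that would be
    handed to the next layer (logits plus the incoming pair term).
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    logits = q @ k.T / np.sqrt(q.shape[-1])
    next_bias = logits + pair_bias            # carried forward to the next layer
    weights = softmax(next_bias, axis=-1)     # each row sums to 1
    return weights @ v, next_bias

rng = np.random.default_rng(0)
n_atoms, dim, head_dim = 4, 8, 4              # toy sizes, not the model's
h = rng.normal(size=(n_atoms, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, head_dim)) for _ in range(3))
pair_bias = rng.normal(size=(n_atoms, n_atoms))
out, next_bias = biased_attention(h, w_q, w_k, w_v, pair_bias)
```

Stacking calls so that each layer receives the previous layer's `next_bias` reproduces the recursive update of the interaction term across layers.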
[0195] 1.2 Model Training
[0196] The LUMI-model 102 employed a three-stage training workflow to maximize its adaptability and performance:
[0197] 1.2.1 Step 1. Unsupervised pretraining on generic molecules
[0198] In an example implementation, the initial unsupervised pretraining phase was conducted on a comprehensive molecular dataset of 13,369,320 distinct molecules with 147,062,520 conformations. This extensive dataset comprised commercially available molecules, originally released by Uni-Mol. To enhance data quality and reduce redundancy, deduplication was applied to ensure a structurally diverse and high-quality molecular space capable of optimizing the LUMI-model’s 102 representation learning.
[0199] In an embodiment, the LUMI-model 102 was trained using three complementary learning objectives: (1) masked atom prediction, (2) 3D positional recovery, and (3) contrastive representation learning. In the first task, a subset of atomic identities in the input molecule was randomly masked, and the model 102 learned to reconstruct the missing atomic properties. In the second task, Gaussian noise was added to the atomic coordinates, and the model 102 was trained to predict the original 3D positions of perturbed atoms. These first two objectives followed the settings in Uni-Mol. In addition to these objectives, the LUMI-model 102 used a third objective, contrastive learning, to improve conformation-aware molecular embeddings. For each mini-batch of training examples, two augmented versions of the same molecule were generated by applying independent atom masking and coordinate perturbations. These augmented versions served as positive pairs, while embeddings of different molecules within the mini-batch were treated as negative examples. The model was trained using the normalized temperature-scaled cross-entropy loss (NT-Xent), following the SimCLR framework. The NT-Xent loss is defined as:
$$\mathcal{L}_{\text{NT-Xent}} = -\sum_{(i,j)} \log \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}{\sum_{z_k \in \mathcal{B}} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)} \quad (3)$$
where $z_i$ and $z_j$ represent the embeddings of two augmented versions of the same molecule, $\mathrm{sim}(z_i, z_j)$ denotes the cosine similarity between the two embeddings, $\tau$ is a temperature scaling parameter, and $\mathcal{B}$ is the set of all samples in the minibatch. This loss encourages the model 102 to bring representations of different conformations of the same molecule closer while pushing apart those of different molecules.
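The contrastive objective can be illustrated with a short NumPy sketch. It is a hedged example rather than the training code: the function name `nt_xent`, the temperature value, and the toy embeddings are assumptions, the loss is averaged rather than summed over anchors, and the sketch follows the usual SimCLR convention of excluding each anchor's similarity with itself from the denominator.

```python
import numpy as np

def nt_xent(z_a: np.ndarray, z_b: np.ndarray, tau: float = 0.1) -> float:
    """Mean NT-Xent loss over positive pairs (z_a[i], z_b[i]).

    z_a, z_b: (B, D) embeddings of two augmented views of the same B molecules.
    """
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit norm: dot product = cosine
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                    # drop self-similarity (SimCLR convention)
    n = z_a.shape[0]
    # Row i's positive is the other view of the same molecule.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob = sim[np.arange(2 * n), pos] - log_denom
    return float(-log_prob.mean())

rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 16))                     # toy embeddings
view_b = view_a + 0.01 * rng.normal(size=(8, 16))     # nearly identical second view
loss_aligned = nt_xent(view_a, view_b)
loss_unrelated = nt_xent(view_a, rng.normal(size=(8, 16)))
```

When the two views of each molecule embed close together, the loss is small; pairing each embedding with an unrelated vector gives a much larger value, which is the behavior the objective rewards and penalizes respectively.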
[0200] 1.2.2 Step 2. Continual pretraining on lipid-like molecules
[0201] In an embodiment, following initial unsupervised pretraining, the model 102 further underwent continual pretraining on a domain-specific dataset of lipid-like molecules, employing the same learning objective as unsupervised pretraining. This specialized dataset included 15,491,072 unique lipid-like molecules with 170,401,792 distinct conformations. The details of constructing this lipid-focused corpus are described below under “1.2.4 Dataset for training stages”.
[0202] To avoid catastrophic forgetting during the continual pretraining stage, a phenomenon where the model loses its learning from the previous stage, data from the prior stage were sampled and a lower learning rate was used during training, as described below in the “TRAINING DETAILS FOR LUMI-MODEL” section.
[0203] This step adapts the general chemical knowledge gained in the first stage to the specific chemical space of ionizable lipids, ensuring the model prioritizes features most relevant to nucleic acid delivery applications.
[0204] 1.2.3 Step 3. Fine-tuning with closed-loop active learning
[0205] The final training stage involved fine-tuning in a supervised manner using experimental data generated by the LUMI-lab 104. For example, the model 102 predicted the potential mTP of the lipids and was optimized to match the onboard mTP readouts. Fine-tuning was performed iteratively, with the model 102 updated after each experimental cycle to incorporate new data and refine its predictions. For each round of the experiment, an ensemble of 5 models was fine-tuned using 5-fold cross-validation. Specifically, the data accumulated over all completed iterations were used for training. Each model was trained on a rolling 80% split of the accumulated data, and the other 20% was used as the validation set. The ensemble improved prediction robustness and enabled uncertainty quantification through prediction variance in the active learning framework.
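The rolling 80%/20% ensemble described above can be sketched as follows. This is an illustration only: a least-squares linear model stands in for the fine-tuned transformer, the data are synthetic, and the helper names (`rolling_folds`, `fit_linear`, `predict_linear`, `ensemble_predict`) are hypothetical.

```python
import numpy as np

def rolling_folds(n_samples: int, n_folds: int = 5, seed: int = 0):
    """Yield (train_idx, val_idx); each fold holds out a different ~20% slice."""
    order = np.random.default_rng(seed).permutation(n_samples)
    chunks = np.array_split(order, n_folds)
    for i in range(n_folds):
        train = np.concatenate([c for j, c in enumerate(chunks) if j != i])
        yield train, chunks[i]

def fit_linear(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Stand-in learner: least squares with a bias column."""
    xb = np.hstack([x, np.ones((len(x), 1))])
    coef, *_ = np.linalg.lstsq(xb, y, rcond=None)
    return coef

def predict_linear(coef: np.ndarray, x: np.ndarray) -> np.ndarray:
    return np.hstack([x, np.ones((len(x), 1))]) @ coef

def ensemble_predict(x_train, y_train, x_pool, n_folds: int = 5):
    """Train one member per fold; return mean and variance across members."""
    preds = []
    for train_idx, _val_idx in rolling_folds(len(x_train), n_folds):
        coef = fit_linear(x_train[train_idx], y_train[train_idx])
        preds.append(predict_linear(coef, x_pool))
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.var(axis=0)

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 3))                               # synthetic features
y = x @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
pool_mean, pool_var = ensemble_predict(x, y, x[:10])
folds = list(rolling_folds(50))
```

The per-candidate variance `pool_var` plays the role of the uncertainty estimate, while `pool_mean` plays the role of the predicted mTP.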
[0206] The LUMI-model 102 operated within a closed-loop active learning framework, enabling it to improve predictions and optimize lipid proposals iteratively. At each iteration, the model 102 produced a set of compounds, such as a candidate pool of lipids for the LUMI-lab 104. To balance exploitation and exploration, two sets of candidates were selected for synthesis and testing: (1) The first experiment plate consisted of 92 lipid candidates prioritized for high predicted mTP, ensuring that the most promising candidates were experimentally validated; and (2) The second plate included 92 lipids with the highest uncertainty in ensemble predictions, maximizing the information gain in subsequent iterations.
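The two-plate selection can be sketched in a few lines. The example below is a simplified illustration with synthetic predictions: the function name `select_plates` is hypothetical, the exclusion of exploitation picks from the exploration plate is an assumption made here to keep the plates disjoint, and no diversity-aware sampling is applied.

```python
import numpy as np

PLATE_SIZE = 92  # lipid candidates per plate, as stated in the text

def select_plates(mean_pred, var_pred, plate_size: int = PLATE_SIZE):
    """Return (exploitation, exploration) index lists.

    Exploitation: highest ensemble-mean prediction.
    Exploration: highest ensemble variance among the remaining candidates
    (keeping the plates disjoint is an assumption of this sketch).
    """
    mean_pred = np.asarray(mean_pred, dtype=float)
    var_pred = np.asarray(var_pred, dtype=float)
    exploit = np.argsort(-mean_pred)[:plate_size].tolist()
    taken = set(exploit)
    ranked_by_var = np.argsort(-var_pred).tolist()
    explore = [i for i in ranked_by_var if i not in taken][:plate_size]
    return exploit, explore

rng = np.random.default_rng(0)
mean_pred = rng.normal(size=1000)            # synthetic predicted mTP
var_pred = rng.uniform(size=1000)            # synthetic ensemble variance
exploit_plate, explore_plate = select_plates(mean_pred, var_pred)
```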
[0207] To effectively explore the vast pool of 221K candidates, a scalable sampling policy tailored to the foundation model 102 was designed. In an embodiment, prediction uncertainty was first quantified using ensemble prediction variance. The most uncertain lipid candidates often share similar chemical properties, such as having the same ionizable headgroups. Sending all such molecules for synthesis would use experimental resources inefficiently and fail to diversify the training data, as many samples would be redundant. Therefore, a diversity-aware sampling policy was implemented. First, the candidate lipids were clustered using molecular embeddings generated by the foundation model 102, which captured the structural and chemical diversity of the molecules. Next, the N = 10,000 (~5% of the pool) top-ranked lipids were identified, ranked by ensemble variance for the exploration plate or by average predicted mTP for the exploitation plate. To ensure a balanced representation across clusters, a round-robin sampling strategy was employed, selecting candidates from each cluster in a cyclical manner. This approach ensures both high uncertainty and molecular diversity in the exploration set. Additionally, a limit was applied to the number of times each chemical reagent (a component in the Ugi-4CR reaction) could be used in each experiment. This limit further encourages molecular diversity and avoids the stock management challenges caused by overusing a few specific reagents. The maximum usage per reagent was set to 35 for each experiment.
1.2.4 Datasets for Training Stages
Construction of in-domain lipid-like molecule dataset
[0208] In an embodiment, to further align the LUMI-model's 102 capability for lipid engineering, an expanded lipid-like molecule dataset for continual pretraining was constructed. Following the Ugi-4CR combinatorial chemistry used for synthesizing lipids in SDL, the dataset was produced by systematically enumerating various options for each chemical component. Based on the coverage of chemical properties and empirical experience, 64 amines, 62 isocyanides, 64 alkyl aldehydes, and 61 alkyl carboxylic acids were selected for enumeration, as further described below in “CHEMICAL LIBRARY FOR EXPANDED LIPID DATASET” section, and 15,491,072 distinct lipid-like molecules in total were constructed.
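The size of the enumerated dataset follows directly from the four component counts. The short sketch below verifies the arithmetic and shows the enumeration pattern on a toy set; the reagent labels are placeholders, not actual reagents from the library.

```python
from itertools import product
from math import prod

# Component counts for the expanded lipid-like dataset, as stated in the text.
component_counts = {"amine": 64, "isocyanide": 62, "aldehyde": 64, "carboxylic_acid": 61}
n_molecules = prod(component_counts.values())

# Toy enumeration with placeholder reagent labels (hypothetical names).
toy_components = {
    "amine": ["A1", "A2"],
    "isocyanide": ["I1", "I2", "I3"],
    "aldehyde": ["D1"],
    "carboxylic_acid": ["C1", "C2"],
}
# One four-component combination per product of the Ugi-4CR building blocks.
toy_library = list(product(*toy_components.values()))
```

Here `n_molecules` evaluates to 64 × 62 × 64 × 61 = 15,491,072, matching the reported dataset size.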
[0209] To generate the 3D conformations for each molecule, a conformation generation pipeline similar to that used in constructing the dataset for unsupervised pretraining, as described above in “1.2.1 Step 1. Unsupervised pretraining on generic molecules”, was employed. ETKDGv3 (Sereina Riniker and Gregory A Landrum “Better informed distance geometry: using what we know to improve conformation generation”, Journal of chemical information and modeling, 55.12 (2015), pp. 2562 - 2574; and Shuzhe Wang et al. “Improving conformer generation for small rings and macrocycles based on distance geometry and experimental torsional-angle preferences”. Journal of chemical information and modeling 60.4 (2020), pp. 2044-2058) with RDKit (Greg Landrum. RDKit: Open-source cheminformatics. 2006. https://www.rdkit.org.) was used to propose conformations, and the Merck molecular force field (MMFF) was applied to optimize the generated conformations. For each of the molecules, 10 3D conformations were generated along with an additional molecular graph, resulting in a total of 11 conformers per molecule.
Combinatorial chemical library for the closed-loop experiments
The virtual library for screening ionizable lipids consisted of 221,184 lipids, covering 32 amines, 12 isocyanides, 36 aldehydes, and 16 carboxylic acids, as shown in FIG. 9. The construction of this virtual library is similar to the method described above for “Construction of in-domain lipid-like molecule dataset”.
1.3 Design of Automatic Experiment
[0210] In an embodiment, each experimental iteration began with the LUMI-model 102 proposing a diverse set of ionizable lipid candidates, balancing predicted mTP and chemical diversity to ensure both optimization of high-performing lipids and exploration of new chemical space. The selected lipids were synthesized using the first handler module 306 implemented as, for example, an automated liquid handling system (Opentrons® OT-2). The handler module 306 is configured to precisely dispense reagents, mix components, and perform controlled shaking to initiate the reaction. The synthesis process continued for 18 hours, allowing the ionizable lipids to form under standardized conditions. Once the reaction was complete, the synthesized lipids were transferred to a second handler module 308, where they were formulated into firefly-luciferase (FLuc) mRNA (mLuc)-LNPs by combining them with helper lipids, cholesterol, PEGylated lipids, and mLuc. The formulation followed a predetermined optimized molar ratio to ensure efficient encapsulation and LNP stability.
[0211] To evaluate mRNA delivery efficiency, mLuc-LNPs were dosed into two replicate 96-well plates containing HBE cells. The second handler module 308 ensured precise and uniform LNP dosing, minimizing well-to-well variability. The plates were then incubated in the incubator module 310 for 18 hours, allowing sufficient time for cellular uptake and mRNA translation. After incubation, a robotic arm retrieved the treated plates and transferred them back to the second handler module 308, where a luciferase substrate reagent was automatically added to initiate the luminescence reaction.
[0212] To further enhance data reliability, each 96-well plate was read twice in the bioluminescence plate reader 312, resulting in four replicate readings per lipid (two independent cell culture plates × two reads per plate). This approach reduced technical fluctuations, ensuring that potential inconsistencies in cell incubation or signal detection were minimized and yielding highly reproducible mTP measurements. The luminescence intensity, which directly quantifies mRNA transfection efficiency, was automatically recorded and processed for analysis.
[0213] All experimental data underwent real-time processing within the LUMI-system 100’s integrated software modules, such as those hosted via the control system 106, which performed error detection, data normalization, and quality control. The system automatically corrected inconsistencies, removed outliers, and normalized luminescence readings using control wells. The processed data were then fed back into the LUMI-model 102, iteratively refining its predictive framework with each experimental cycle. By continuously learning from experimental results, the LUMI-model 102 improved its ability to propose increasingly effective lipid candidates in subsequent iterations. The hardware of the LUMI-lab 104 and software designs in the control system 106 are further detailed below in the “DESIGN OF MECHANICAL MODULES OF LUMI-LAB” and “DESIGN OF SOFTWARE MODULES” sections, respectively.
[0214] Additional design choices were applied to the experiment iterations: (1) To accumulate diverse ionizable lipids in the initial iterations, lipid candidates were selected based on their diversity in the molecular embeddings produced by the pretrained LUMI-model 102. Specifically, a round-robin sampling strategy was used to select candidates from different embedding clusters as described above in “1.2.3 Step 3. Fine-tuning with closed-loop active learning”. This diversity-based strategy was used for the first two iterations to generate sufficient, diverse data as a warm start for the following iterations of active learning. (2) Starting from the third iteration, lipid candidates were proposed by the dual-plate strategy as described above in “1.2.3 Step 3. Fine-tuning with closed-loop active learning”. To ensure continuous exploration of chemical space, previously tested lipids were excluded from selection, and resampling was performed if a duplicate lipid was encountered. (3) The last iteration used only the exploitation strategy to propose two plates (92 × 2) of lipids with high predicted mTP, and duplication of previously tested lipids was allowed. This was designed to comprehensively assess the most promising lipid candidates identified throughout the optimization process.
1.3.1 Online quality control and data processing
[0215] In some example implementations, to ensure the reliability and consistency of luminescence readings in high-throughput screening, an automated online quality control (QC) and data processing pipeline was implemented. This pipeline integrated log transformation, normalization, and quality assessment of experimental data obtained from 96-well plate assays, enabling real-time validation for downstream analysis.
[0216] Raw luminescence readings from each well were first log-transformed using base-2 logarithm to stabilize variance and improve comparability across experiments. The first two wells (A1, B1), which contained no ionizable lipids, were designated as empty control wells, while subsequent wells without lipid components but containing benchmark lipids (e.g., MC3) served as positive control wells. The remaining wells represented experimental lipid candidates. Normalization was performed by adjusting each reading relative to the control wells:
$$R_{\text{norm}} = \log_2\left(R_{\text{raw}}\right) - \log_2\left(R_{\text{control}}\right)$$
where $R_{\text{raw}}$ represents the raw luminescence reading, and $R_{\text{control}}$ is the mean control well reading. Any negative normalized value was clipped to zero to ensure robust normalization while maintaining interpretability.
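As an illustration of the normalization step, the following minimal sketch applies a base-2 log ratio against the mean control reading and clips negative values to zero. The function name `normalize_luminescence` and the example readings are hypothetical, and the sketch assumes all raw readings are strictly positive.

```python
import numpy as np

def normalize_luminescence(raw, control_wells) -> np.ndarray:
    """Return log2(raw) - log2(mean control), clipped at zero.

    Assumes strictly positive raw readings (log2 is undefined otherwise).
    """
    raw = np.asarray(raw, dtype=float)
    control_mean = float(np.mean(control_wells))
    norm = np.log2(raw) - np.log2(control_mean)
    return np.clip(norm, 0.0, None)  # negative values become 0

# Hypothetical readings: controls average 100 relative light units.
normalized = normalize_luminescence([100.0, 800.0, 50.0], control_wells=[90.0, 110.0])
```

A reading eight times the control mean maps to 3 on this scale, while readings at or below the control mean map to 0.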
[0217] Each lipid candidate typically had up to four replicate readings, with the first two replicates derived from one well and the latter two from another. The QC pipeline assessed the consistency of replicate readings using predefined thresholds. Replicates were deemed reliable if the absolute difference between paired readings did not exceed a predefined threshold of 3 in log scale. If the difference exceeded this threshold but the absolute values were high ($R_{\text{norm}} > 7$), the maximum value was retained; otherwise, the data point was flagged as unreliable. When multiple replicates were deemed trustworthy, the final well reading was computed as the maximum reading of the replicates.
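The replicate rules above can be expressed as a small decision function. This is a sketch of one reading of those rules, not the pipeline's code: the names `qc_pair` and `final_reading` are hypothetical, and treating the larger reading of a pair as the one that must exceed 7 is an assumption, since the text does not specify which value is tested.

```python
from typing import Optional, Sequence, Tuple

DIFF_THRESHOLD = 3.0  # maximum allowed |difference| in log2 scale (from the text)
HIGH_SIGNAL = 7.0     # normalized reading regarded as high (from the text)

def qc_pair(a: float, b: float) -> Optional[float]:
    """Return the retained reading for one replicate pair, or None if unreliable."""
    if abs(a - b) <= DIFF_THRESHOLD:
        return max(a, b)                 # consistent pair: keep the maximum
    if max(a, b) > HIGH_SIGNAL:          # assumption: the larger reading decides
        return max(a, b)                 # inconsistent but high: keep the maximum
    return None                          # inconsistent and low: flag as unreliable

def final_reading(pairs: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Combine replicate pairs: maximum over the pairs that passed QC."""
    kept = [v for v in (qc_pair(a, b) for a, b in pairs) if v is not None]
    return max(kept) if kept else None

# Hypothetical normalized readings for one candidate (two pairs of reads).
example = final_reading([(1.0, 5.0), (2.0, 3.0)])
```

In the example, the first pair differs by 4 with low signal and is discarded, so the final value comes from the second pair.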
[0218] This automated QC and data processing pipeline enabled a real-time validation of experimental data, ensuring that only high-confidence readings were used in subsequent analyses. The framework was designed to scale efficiently across experimental iterations, providing a robust foundation for adaptive learning and active optimization within the LUMI-system 100.
1.4 Pre-experimental Dataset
[0219] To validate the model 102 design, a dataset produced using an established high-throughput synthesis and screening method (Yue Xu et al. “AGILE platform: a deep learning powered approach to accelerate LNP development for mRNA delivery”, Nature Communications 15.1 (July, 2024), p. 6305) was utilized. The dataset included 1,920 unique ionizable lipids synthesized through the Ugi-4CR reaction. This combinatorial lipid library was constructed with 20 amines, 4 aldehydes, 4 carboxylic acids, and 6 isocyanides as shown in Table 1 below. The 1,920 LNPs were tested in vitro, providing labeled transfection data for the pre-experimental test.
[0220] Table 1. Components of Combinatorial Lipid Library for Pre-Experimental Dataset
[Table 1: chemical structure drawings of the amine, aldehyde, carboxylic acid, and isocyanide components; the structures are not recoverable from the extracted text.]
1.5 Pre-Experimental Evaluation and Comparison
[0221] To assess the predictive performance of the LUMI-model 102, a systematic benchmark analysis against several baseline models was conducted. The evaluation aimed to determine the effectiveness of pretraining strategies and 3D molecular representations in improving mTP predictions. The LUMI-model 102 was compared against the following alternatives: a variant of LUMI-model 102 trained without any pretraining, a version that excluded continual pretraining while retaining the initial unsupervised pretraining step, a graph neural network (GNN) method MolCLR (Yuyang Wang et al. “Molecular contrastive learning of representations via graph neural networks”. Nature Machine Intelligence 4.3 (2022), pp. 279-287) trained on atom graphs without 3D coordinate input, a hybrid method LiON (Jacob Witten et al. “Artificial intelligence-guided design of lipid nanoparticles for pulmonary gene therapy”. Nature biotechnology (2024), pp. 1-10) that utilized GNNs and Morgan molecular fingerprints (David Rogers and Mathew Hahn. “Extended-connectivity fingerprints”. Journal of chemical information and modeling 50.5 (2010), pp. 742-754), XGBoost (Tianqi Chen and Carlos Guestrin “Xgboost: A scalable tree boosting system”. Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining (2016), pp. 785-794) trained on RDKit molecular descriptors, and an MLP model trained on Morgan molecular fingerprints.
[0222] The models were evaluated on a pre-experimental dataset of 1,920 lipid molecules with experimentally validated transfection efficacies as described above. To ensure robust performance estimation, a five-fold rolling validation scheme was employed, where each model was trained five times, with 20% of the data held out as a test set in each iteration. The cross validation was run twice per model with different random seeds, totaling 2 × 5 runs. In addition, an extra set of five-fold cross validation runs was conducted for XGBoost over molecular descriptors, due to the high variation in its performance observed between runs. The final reported performance for each model was obtained by averaging results across the test sets.
[0223] Model performance was assessed using Pearson correlation coefficients, which measure the linear association between predicted and experimental mTP values. The coefficients were computed on two evaluation subsets: the full test set containing all lipids and the subset comprising the top 25% of lipids based on mTP, which specifically evaluated model performance on high-performing lipid candidates. This analysis allowed quantification of the contribution of pretraining, continual pretraining, and 3D molecular representations in enhancing model accuracy.
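The two-subset evaluation can be sketched as follows. This is an illustrative example with synthetic values: the function names `pearson` and `evaluate` are hypothetical, and defining the top 25% by the measured (rather than predicted) mTP is an assumption of the sketch.

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def evaluate(y_true, y_pred, top_fraction: float = 0.25) -> dict:
    """Pearson r on the full set and on the top fraction ranked by measured value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    cutoff = np.quantile(y_true, 1.0 - top_fraction)
    top = y_true >= cutoff                      # assumption: ranked by measured mTP
    return {
        "all": pearson(y_true, y_pred),
        "top": pearson(y_true[top], y_pred[top]),
        "n_top": int(top.sum()),
    }

measured = np.arange(100, dtype=float)           # synthetic measured values
scores = evaluate(measured, 2.0 * measured + 1.0)  # perfectly linear predictor
```

With real predictions, the correlation on the top subset is usually the harder test, since the range of measured values is restricted there.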
[0224] The LUMI-model consistently outperformed all other methods, and within the comparison of LUMI-model variants, the full LUMI-model outperformed all alternatives, verifying the effectiveness of the first and second stage pretraining. It was also observed that conventional baselines such as XGBoost and the MLP on fingerprints achieved moderate performance overall but dropped sharply on the top 25% subset, underscoring their limited ability to prioritize high-performing candidates and highlighting the relative advantage of the LUMI-model, including the value of its transformer architecture and 3D conformation input features.
[0225] To evaluate the model’s robustness against potential experimental noise, a challenging test was conducted by progressively adding Gaussian perturbations (σ = 2.0 to 8.0) to measured mTP values during training, simulating mild to severe data collection noise. This test included the full pretrained LUMI-model as well as two strong baselines, LiON and MolCLR. Across all noise levels, the LUMI-model consistently achieved superior performance, with growing margins over the others. Notably, at the highest noise level (σ = 8.0), the top 25% Pearson correlation for LiON dropped to approximately 0.37, while the LUMI-model maintained a substantially higher median correlation of 0.51. These results supported the conclusion that the pretrained model can leverage generalizable representations learned during the self-supervised pretraining, offering improved robustness and reliability under data distributional shifts.
1.6 Benchmarking Active Learning Strategies
[0226] To evaluate the active learning strategy further, retrospective simulations using the public benchmarking data of QM9 (Zhenqin Wu et al. “MoleculeNet: a benchmark for molecular machine learning”. Chemical science 9.2 (2018), pp. 513-530) (≈ 134k molecules) were implemented to compare the exploration-exploitation balanced strategy against random and exploration-uncertainty-based baselines. A base modeling approach of MLP over Morgan fingerprints was used, which demonstrated reasonable performance on the pre-experimental dataset described above and is suitable for this large-scale simulation because of its high computational efficiency. All active learning strategies were used to optimize the training of the base model architecture in five-fold settings per round, to predict the polarizability of the molecules in QM9. In each round, a strategy selectively acquired 184 new data points from the observable data pool (90% of QM9) and was evaluated by a Top-k enrichment metric (average true polarizability of the top 180 predicted molecules) on a held-out 10% test set.
1.7 “Wet Lab” Materials
[0227] All materials were prepared and processed under nuclease-free conditions throughout synthesis and formulation. CleanCap Firefly Luciferase mRNA and CleanCap M6 CRISPR-Associated Protein 9 mRNA (N1-methylpseudouridine), both sourced from TriLink BioTechnologies, were used as purchased. All mRNAs were stored at -80 °C and thawed on ice prior to use. MC3 and SM-102 were obtained from Echelon Biosciences. Amine headgroups and other precursors for ionizable lipid synthesis were procured from Sigma-Aldrich and TCI America. Lipid tails were synthesized and purified using flash column chromatography, and their final structures were confirmed using ¹H NMR (400 MHz, CDCl₃). The synthesis methods and LNP formulation procedures followed previously published methods. High-resolution mass spectra of the synthesized materials were acquired using an LC-MS spectrophotometer at the Centre for Pharmaceutical Oncology, University of Toronto. ONE-Glo™ luciferase assay system (Promega) was used for the detection of firefly luciferase reporter gene expression in vitro. Cell Counting Kit 8 (CCK-8, ab228554) was purchased from Abcam. Single-guide RNA (sgRNA) was chemically modified with 2’-O-methylation at the 2’-hydroxyl group and phosphorothioate bonds at the non-bridging oxygen in the phosphate backbone, specifically at or between the first three and last three nucleotides. This sgRNA was purchased from IDT. Cy5-mRNA was synthesized in-house following established protocols reported in the literature. Lysotracker Green DND-26 (Cat. No. 8783S) was purchased from New England Biolabs (NEB). Hoechst 33342, trihydrochloride trihydrate was obtained from Thermo Fisher Scientific.
1.8 In vitro Luciferase Assay and Fluorescent Imaging
[0228] To create the high-throughput screening library, a modular synthesis approach was designed, enabling systematic variation of functional groups. This strategy required key building blocks, including isocyanides, amines, carboxylic acids, and aldehydes, which served as fundamental reactants in the formation of intermediates leading to the final products. Briefly, 96 distinct stock solutions of amines, carboxylic acids, aldehydes, and isocyanides were prepared in a 96-deep-well plate and dissolved in ethanol. Using an OT-2 liquid handler, these solutions were transferred into a 96-well PCR plate. For the synthesis of the final products, all isocyanides and amines were commercially sourced. A portion of the aldehyde and carboxylic acid tails was synthesized as described below in “SYNTHESIS OF LIPID TAILS”.
[0229] Ionizable lipids were synthesized directly within each well by mixing the components in a 1:1:1:1 ratio and stirring the reactions for 18 hours.
[0230] LNP formulations and high-throughput screening methods were conducted as previously described (Yue Xu et al. “AGILE platform: a deep learning powered approach to accelerate LNP development for mRNA delivery”, Nature Communications 15.1 (July, 2024), p. 6305) using the LUMI-system 100. For each iteration, all the lipid candidates were formulated into cargo-LNPs and tested in duplicate. For in vitro screening, all cargo-LNPs were formulated using a molar ratio of 35:16:46.5:2.5 for ionizable lipid, Dioleoylphosphatidylethanolamine (DOPE), cholesterol, and dimyristoyl-sn-glycero-3-phosphoethanolamine-N-(methoxy(polyethylene glycol)-2000) (C14-PEG2000), following the protocols previously established (Yue Xu et al. “AGILE platform: a deep learning powered approach to accelerate LNP development for mRNA delivery”, Nature Communications 15.1 (July, 2024), p. 6305).
[0231] The HBE cell line was obtained from Sigma-Aldrich and maintained in Eagle’s Minimum Essential Medium (MEM) supplemented with 10% fetal bovine serum (Gibco) and 1% Penicillin / Streptomycin (Gibco). To evaluate the impact of brominated and non-brominated ionizable lipids on mTP, LUMI-6 and its debrominated derivative, LUMI-6D, were synthesized, as described below. The ionizable lipid synthesized, a helper lipid, cholesterol, and C14-PEG2000 (all from Avanti Polar Lipids) were used to formulate LNPs.
[0232] Unless otherwise specified, distinct mRNA-LNP formulations were created using different helper lipids (DSPC, DOPE, or DOTAP) and molar ratios (ionizable lipid:helper lipid:cholesterol:PEGylated-lipid) as follows: a DSPC-containing formulation (50:10:38.5:1.5), a DOPE-containing formulation (35:16:46.5:2.5), and a DOTAP-containing formulation (30:39:30:1). The resulting mLuc-LNPs were incubated with HBE cells for 18 hours. After incubation, the cells were lysed by adding 50 µL of the ONE-Glo Luciferase Assay Reagent directly to each well. The plate was gently mixed on a plate shaker for 3 minutes to ensure complete cell lysis and then incubated at room temperature for 10 minutes. Luciferase activity was measured as relative light units (RLUs) on a Cytation microplate reader following the vendor's protocol.
[0233] To visualize the intracellular localization of mRNA and lysosomal compartments, the following fluorescent probes were used: Cy5-mRNA (λex = 649 nm, λem = 670 nm), encapsulated into LNPs to track mRNA localization; Lysotracker Green DND-26 (λex = 504 nm, λem = 511 nm), for staining lysosomes and late endosomes; and Hoechst 33342 (λex = 350 nm, λem = 461 nm), used for nuclear staining. For the fluorescent imaging assay, prior to transfection, the HBE cells were seeded in glass-bottom 35 mm dishes at an appropriate density to achieve 70-80% confluency on the day of imaging. For the mLuc-LNP treatment, the cells were incubated with LUMI-6 and LUMI-6D LNPs encapsulating mLuc at a final concentration of 100 ng per well. After incubation with the mLuc-LNPs, the cells were washed twice with PBS (pH = 7.4) and stained with Lysotracker Green (50 nM) and Hoechst 33342 (1 µg/mL) for 30 minutes at 37°C. The stained cells were then washed with fresh culture medium before imaging. Live-cell imaging was performed using a confocal laser scanning microscope (Zeiss Axio Observer) equipped with a 63x oil-immersion objective. Excitation and emission wavelengths were set according to the fluorophore specifications. Fluorescence imaging data were analyzed using ZEN Microscopy Software (ZEISS).
1.9 Manual Lipid Synthesis and LNP Characterization for Post-Optimization validation
[0234] The lipids of LUMI-1 to LUMI-6 and LUMI-6D were synthesized manually, and the detailed synthesis and purification methods are described below in “GENERAL SYNTHESIS OF LIPID CANDIDATES”. All the mRNA-LNPs used for I.T. injection experiments were formulated with the ionizable lipid, 1,2-dioleoyl-3-trimethylammonium-propane (DOTAP), cholesterol, and C14-PEG2000 in a molar ratio of 30:39:30:1.
[0235] The other mRNA-LNP formulations, including SM-102, and MC3, were prepared with a molar ratio of 50:10:38.5:1.5 for ionizable lipid, distearoylphosphatidylcholine (DSPC), cholesterol, and C14-PEG2000, respectively.
[0236] The mRNA-LNPs were synthesized using a microfluidic chip device by combining an aqueous phase containing the mRNA with an ethanol phase containing the lipids. The aqueous phase was prepared using a 10 mM citrate buffer with the desired concentration of the mRNA. The ethanol phase was prepared by solubilizing a lipid mixture, which included an ionizable lipid, helper lipid, cholesterol, and C14-PEG2000. The lipids were mixed at the appropriate molar ratios, as determined by the LNP formulation parameters.
[0237] Following synthesis, the mRNA-LNPs were subjected to dialysis to remove ethanol and exchange the buffer. Dialysis was performed against phosphate-buffered saline (PBS) using either a 10,000 molecular weight cut-off (MWCO) Pierce 96-well microdialysis plate (ThermoFisher) or a 20,000 MWCO dialysis cassette (ThermoFisher). This step ensured buffer exchange while maintaining the structural integrity and stability of the mRNA-LNPs.
[0238] The size and polydispersity index (PDI) of the mRNA-LNPs were measured using a Zetasizer Nano ZS (Malvern Instruments) for quality control.
1.10 Toxicity Assay and Safety Assessment of Lipid Candidate
[0239] The cytotoxicity of the lipid candidates was assessed using the CCK-8 assay following the manufacturer’s instructions. HBE cells were seeded into a 96-well plate at a density of 5 x 10⁴ cells per well. The mRNA-LNPs containing the test lipids were added to the cells, followed by an 18-hour incubation. After the incubation, 10 µL of CCK-8 reagent was added to each well containing 100 µL of culture medium. The plate was then incubated at 37°C for 2 hours in a cell incubator. The absorbance was measured at 450 nm using a plate reader, with the absorbance intensity corresponding to the number of viable cells.
[0240] Luminex and xMAP technology was employed to quantitatively and simultaneously measure 32 mouse cytokines, chemokines, and growth factors. Multiplex analysis was performed by Eve Technologies Corporation (Calgary, Alberta, Canada) using the Luminex 200™ system (Luminex Corporation / DiaSorin, Saluggia, Italy) and Bio-Plex Manager™ software (Bio-Rad Laboratories Inc., Hercules, California, USA). Samples were analyzed with the Mouse Cytokine / Chemokine 32-Plex Discovery Assay Array (MD32) following the manufacturer’s instructions (MILLIPLEX Mouse Cytokine / Chemokine Magnetic Bead Panel, Cat. #MCYTOMAG-70K, Millipore Sigma, Burlington, Massachusetts, USA). The 32-plex panel included: Eotaxin / CCL11, G-CSF / CSF-3, GM-CSF, GRO-alpha / CXCL1 / KC / CINC-1, GRO-beta / CXCL2 / MIP-2 / CINC-3, IFN-gamma, IL-1-alpha, IL-1-beta, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-9, IL-10, IL-12(p40), IL-12(p70), IL-13, IL-15, IL-17, IP-10 / CXCL10, LIF, LIX, MCP-1 / CCL2, M-CSF, MIG / CXCL9, MIP-1-alpha / CCL3, MIP-1-beta / CCL4, RANTES / CCL5, TNF-alpha, and VEGF-A. Assay sensitivities for these analytes ranged from 0.3 to 30.6 pg / mL; individual sensitivity values were provided in the MILLIPLEX® protocol (Millipore Sigma).
[0241] The hemolytic potential of mRNA-LNPs formulated with LUMI-6, or with the benchmark lipid SM-102 in the classic DSPC formulation, was evaluated using an in vitro hemolysis assay. Freshly isolated murine red blood cells (RBCs) were incubated with the mRNA-LNP formulations for 1 hour at 37°C. Phosphate-buffered saline (PBS) and 1% Triton X-100 served as the negative (0% hemolysis) and positive (100% hemolysis) controls, respectively. Following incubation, the samples were centrifuged to pellet intact RBCs. The extent of hemolysis was quantified by measuring the absorbance of the hemoglobin released into the supernatant at 540 nm. The percentage of hemolysis was calculated relative to the positive control.
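The hemolysis calculation described above can be sketched as follows. This is a minimal illustration that assumes the common convention of scaling the sample absorbance between the negative (PBS) and positive (Triton X-100) controls; the text does not state whether the negative control was subtracted, and the absorbance values below are hypothetical.

```python
def percent_hemolysis(a_sample, a_negative, a_positive):
    """Percent hemolysis from 540 nm absorbance, scaled between the two controls."""
    span = a_positive - a_negative
    if span <= 0:
        raise ValueError("positive control must absorb more than the negative control")
    return 100.0 * (a_sample - a_negative) / span


# Hypothetical absorbance readings, for illustration only.
print(round(percent_hemolysis(a_sample=0.12, a_negative=0.05, a_positive=1.45), 2))
```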
[0242] To evaluate complement activation, the concentration of the soluble terminal complement complex (sC5b-9) in plasma was quantified using a commercial sandwich ELISA kit (Invitrogen, Thermo Fisher Scientific, Waltham, MA, USA, Catalog #EEL207) according to the manufacturer's protocol. Blood samples were collected in EDTA tubes, immediately placed on ice, and centrifuged at 4°C to separate the plasma, which was then stored at -80°C until analysis. For the ELISA assay, diluted plasma samples and standards were added to microtiter wells pre-coated with a monoclonal antibody specific for an sC5b-9 neoantigen. Bound complexes were subsequently detected with a horseradish peroxidase (HRP)-conjugated antibody and a TMB substrate. The absorbance was measured at 450 nm, and sample concentrations were calculated from a standard curve generated using a four-parameter logistic curve fit.
1.11 In vivo Luciferase Assay and Gene Editing Study
[0243] To create a plasmid for ABE gene editing, the ABE8e gene (Addgene, Cat# 185910) was cloned into an in vitro transcription (IVT) template vector. The NEBuilder HiFi DNA Assembly Cloning Kit (NEB, Cat# E5520) was used to build a construct containing a T7 promoter along with a 5' untranslated region (UTR) (GGGACATCGTAGAGAGTCGTACTTAGAAAAATCTATAGCAGAAGTCAGCGGTAGACGCACGGCATAGCATCCAAC) and a 3' UTR (CAAGCACGCAGCAATGCAGCTCAAAACGCTTAGCCTAGCCACACCCCCACGGGAAACAGCAGTGATTAACCTTTAGCAATAAACGTTTAACTAAGCTATACTAACCCCAGGGTTGGTCAATTTCGTGCCAGCCACACCGAAA), in preparation for IVT. To synthesize ABE8e-NGG mRNAs, the resultant plasmids were linearized with the enzyme NdeI (NEB, Cat# R0111) and purified using a FastPure Gel DNA Extraction Kit (Vazyme, Cat# DC301). Subsequent IVT was performed with the HiScribe T7 RNA Synthesis Kit (NEB, Cat# E2040), with UTP substituted by N1-methyl-pseudouridine-5'-triphosphate (SyngeneBio). The IVT product was then precipitated with lithium chloride, and the resulting pellet was washed with 70% ethanol, air-dried, and resuspended in nuclease-free water.
[0244] Following transcription, the mRNA underwent sequential modification. First, it was capped using the Faustovirus capping enzyme (NEB, Cat# M2081) and 2’-O-methyltransferase (NEB, Cat# M0366). A poly(A) tail was then added with the E. coli poly(A) polymerase (NEB, Cat# M0276). The final product was purified, quantified using a NanoDrop, and adjusted to a concentration of 1 µg/µL for LNP encapsulation.
[0245] All animal experiments were approved by the University Health Network Animal Resources Centre and conducted in compliance with Animal Use Protocol (AUP) guidelines. C57BL/6 and B6.Cg-Gt(ROSA)26Sor(tm9(CAG-tdTomato)Hze)/J (Ai9) mice were purchased from Jackson Laboratory.
[0246] For the lung bioluminescence assay, 50 µL of mLuc-LNP (0.25 mg/kg Fluc mRNA (mLuc), 0.1 mg/mL mLuc-LNP) was administered via I.T. injection. Six hours post-dosing, the mice received an intraperitoneal injection of 2 mg D-luciferin (0.1 g/kg, 10 mg/mL) to facilitate bioluminescence imaging. Anesthesia was provided using 1.5% isoflurane in oxygen, and the mice were euthanized 10 minutes later. The lungs were excised and imaged using the In Vivo Imaging System (IVIS, PerkinElmer). The total flux (photons per second) of bioluminescence in each organ was quantified. Bioluminescence imaging data were analyzed and quantified using Living Image Software (PerkinElmer).
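The dosing figures above can be cross-checked with simple arithmetic. The sketch below assumes a mouse body weight of 20 g, which is not stated in the text but is the weight at which the stated volumes, concentrations, and per-kilogram doses agree; the function names are illustrative.

```python
def dose_mg_per_kg(volume_ul, conc_mg_per_ml, body_weight_g):
    """Administered dose in mg per kg of body weight."""
    mass_mg = (volume_ul / 1000.0) * conc_mg_per_ml
    return mass_mg / (body_weight_g / 1000.0)


def injection_volume_ul(mass_mg, conc_mg_per_ml):
    """Volume in µL needed to deliver a given mass at a given concentration."""
    return 1000.0 * mass_mg / conc_mg_per_ml


BODY_WEIGHT_G = 20.0  # assumption: typical adult mouse, not stated in the text

mrna_dose = dose_mg_per_kg(50.0, 0.1, BODY_WEIGHT_G)      # 50 µL at 0.1 mg/mL
luciferin_dose = 2.0 / (BODY_WEIGHT_G / 1000.0)           # 2 mg D-luciferin, in mg/kg
luciferin_volume = injection_volume_ul(2.0, 10.0)         # 2 mg at 10 mg/mL

print(mrna_dose, luciferin_dose, luciferin_volume)
```

Under this assumption the mRNA dose evaluates to about 0.25 mg/kg and the D-luciferin dose to about 100 mg/kg (0.1 g/kg), matching the stated values.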
[0247] For CRISPR-Cas9 gene editing in Ai9 mice, 50 µL of LNP-Cas9/sgRNA (0.75 mg/kg Cas9 mRNA and sgRNA at a weight ratio 4:1, with a Cas9 mRNA dose of 1 mg/kg) was administered via intratracheal injection according to the dosing schedule shown in FIG. 13A. Nine days after the first dose, the mice were euthanized, and their lungs were harvested for flow cytometry analysis and immunofluorescence staining. Briefly, the tissues were washed twice with 1x PBS, followed by overnight fixation in 0.5% paraformaldehyde prepared in 1x PBS. The cells were then resuspended in 1x PBS containing 5% FBS and analyzed using a BD Biosciences flow cytometer. The antibodies and dyes are listed in Table 2.
[0248] Table 2 Antibodies and Dyes for Flow Cytometry and Immunofluorescence Staining
Antibody / Dye | Conjugate | Source (Cat #)
Flow Cytometry Staining
TruStain FcX™ blocker | - | BioLegend (Cat # 101320)
CD31 Antibody | Alexa Fluor 594 | BioLegend (Cat # 102520)
EpCAM Antibody | Brilliant Violet 421 | BioLegend (Cat # 118225)
Zombie NIR™ Fixable Viability Kit | - | BioLegend (Cat # 423106)
Fix / Perm. Kit | - | BD Biosciences (Cat # 55-1711)
Immunofluorescence Staining
Acetylated Tubulin Antibody | Alexa Fluor 647 | Santa Cruz (Cat # sc-23950)
CC10 Antibody | Alexa Fluor 488 | Santa Cruz (Cat # sc-390313)
DAPI | - | Invitrogen (Cat # P36931)
[0249] For ABE in vivo gene editing, LumA mice received an I.T. administration of LNP-mABE/sgFluc (50 µL) following the dosing scheme shown in FIG. 14B. The injected solution (0.3 mg/mL) was formulated with a 1:1 weight ratio of mABE to A9 guide RNA, resulting in a dose of 0.75 mg/kg. The mice were sacrificed 11 days post-injection, and lungs were harvested to quantify luminescence.
[0250] 1.12 Mechanistic Investigation of LNPs
[0251] Small-angle X-ray scattering (SAXS) measurements were performed on an Anton Paar SAXSpace system equipped with a Primux 3000 Cu K-alpha X-ray source (λ = 0.154 nm) and an EIGER R 1M detector in the Hospital for Sick Children’s Structural & Biophysical Core Facility. Data were collected in line-collimation mode with a sample-to-detector distance of 317 mm, covering a q range of 0.03 to 7.5 nm⁻¹. The mRNA-LNP samples (10-12 mg/mL lipid) were loaded into 1.5 mm quartz capillaries (Hilgenberg GmbH) and measured at 25 °C for 2 h. Background subtraction was performed using a buffer measured in the same capillary. Bulk-phase samples were analyzed for 2 h in a thermostatic sandwich holder with mica windows. For cryogenic transmission electron microscopy, a concentrated mRNA-LNP formulation (10-20 mg/mL lipid) was applied as a small drop onto a polymer-coated, carbon-reinforced copper grid, blotted with filter paper to form a thin film, and vitrified in liquid ethane at -180 °C to prevent ice crystallization. Sample preparation was performed immediately before imaging, and the grids were maintained at a temperature below -165 °C throughout the examination. Specimens were initially screened using a FEI Tecnai™ F20 electron microscope (Thermo Fisher Scientific) operated at 200 kV and equipped with a Gatan K2 Summit direct electron detector at the Hospital for Sick Children’s Nanoscale Biomedical Imaging Facility. Images were acquired at 25,000x magnification in counting mode, corresponding to a pixel size of 1.45 Å and an exposure rate of ~5 e⁻/pixel/s. Data collection was automated with EPU software (Thermo Fisher Scientific). Image processing was conducted using iTEM software (Olympus Soft Imaging Solutions GmbH), and particle size distributions were determined from three high-quality images per sample, analyzing >100 particles for each LNP formulation.
1.13 Statistical Analysis
[0252] Statistical comparisons between two groups were conducted using a two-tailed Student’s t-test, while one-way analysis of variance (ANOVA) was used for comparisons involving more than two groups. Data analysis was performed using GraphPad Prism 10.0. Statistical significance was defined as P < 0.05, with significance levels indicated as *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001.
TRAINING DETAILS FOR LUMI-MODEL
[0253] Table 3 provides the parameters for unsupervised pretraining.
[0254] Table 3 Unsupervised Pretraining Parameters
Parameter | Value
Batch size | 448
Scheduler | Polynomial decay
Learning rate | 1 x 10⁻⁴
Optimizer | Adam
Optimizer betas | 0.9, 0.99
Epsilon | 1 x 10⁻⁶
Weight decay | 1 x 10⁻⁴
Dropout | 0.0
Warmup | 1000 steps
Training steps | 200000 steps
Model width | 512
Number of attention heads | 64
Atom mask probability | 0.15
Atom coordinate noise type | uniform
Atom coordinate noise scale | 1.0
Masked atom loss scale | 0.5
Coordinate recovery loss scale | 5.0
Contrastive loss scale | 10.0
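To illustrate how the scheduler, warmup, and loss-scale entries of Table 3 might fit together, the following Python sketch implements a warmup followed by polynomial decay, plus a weighted sum of the three pretraining losses. Several details are assumptions not given in the table: the warmup is taken to be linear, the decay power is set to 1.0, the final learning rate is set to 0, and the peak learning rate in the example call is illustrative only.

```python
def learning_rate(step, peak_lr, warmup_steps=1000, total_steps=200_000,
                  power=1.0, end_lr=0.0):
    """Linear warmup to peak_lr, then polynomial decay to end_lr (assumed form)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    if step >= total_steps:
        return end_lr
    remaining = 1.0 - (step - warmup_steps) / (total_steps - warmup_steps)
    return (peak_lr - end_lr) * remaining ** power + end_lr


def total_loss(masked_atom, coord_recovery, contrastive,
               scales=(0.5, 5.0, 10.0)):
    """Weighted sum of the three loss terms using the Table 3 loss scales."""
    w_mask, w_coord, w_contrast = scales
    return w_mask * masked_atom + w_coord * coord_recovery + w_contrast * contrastive


# Illustrative peak learning rate; not asserted to be the value used.
for step in (0, 500, 1000, 100_500, 200_000):
    print(step, learning_rate(step, peak_lr=1e-4))
```

The same functions would cover the continued pretraining stage by passing a different `total_steps` and peak learning rate.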
[0255] The continued pretraining parameters are provided in Table 4.
[0256] Table 4 Continued Pretraining Parameters
Parameter | Value
Batch size | 448
Scheduler | Polynomial decay
Learning rate | 2 x 10-’’
Optimizer | Adam
Optimizer betas | 0.9, 0.99
Epsilon | 1 x 10 *’
Weight decay | 1 x 10⁻⁴
Dropout | 0.0
Warmup | 1000 steps
Training steps | 40000 steps
Model width | 512
Number of attention heads | 64
Atom mask probability | 0.15
Atom coordinate noise type | uniform
Atom coordinate noise scale | 1.0
Masked atom loss scale | 0.5
Coordinate recovery loss scale | 5.0
Contrastive loss scale | 10.0
lipid:others sampling ratio | 1:0.5
CHEMICAL LIBRARY FOR EXPANDED LIPID DATASET
[0257] The chemical library for the expanded lipid dataset is provided in Table 5.
[0258] Table 5 Chemical Library for Expanded Lipid Dataset
[Table with columns ID and SMILES, organized into Category A: Amines; Category B: Isocyanides; Category C: Lipid Aldehydes; Category D: Lipid Carboxylic Acids. The individual SMILES entries are not legible in this text.]
DESIGN OF MECHANICAL MODULES OF LUMI-LAB
[0259] An example LUMI-lab 104 is shown in FIGs. 1 and 5-7.
Liquid Sampler
[0260] In an example implementation, the liquid sampler 309 is designed with two main components: a storage area and a sample loading module (shown more clearly in FIG. 6). The storage area comprises 96 peristaltic pumps 612, each paired with a 5 mL syringe 614. Each pump 612 is dedicated to dispensing a single type of chemical liquid. Through silicone tubes, the peristaltic pumps 612 transfer the liquid from the syringes 614 to the corresponding wells of the well sampler cap. The precise alignment of the loading heads on the sampler cap with the wells ensures accurate liquid dispensing (FIG. 6).
[0261] To optimize space utilization and maintain modular expandability, the 96 pumps are arranged across two layers. Each syringe 614 is labeled with a unique QR code for tracking raw material usage. During operation, the 96-well plate is positioned on a motorized base that moves into the loading zone via a screw-driven mechanism. Based on the loading volumes provided by our model, precise and individualized dispensing is executed.
[0262] To mitigate errors caused by evaporation of residual liquid within the tubing during periods of inactivity, the system pre-wets the tubes by pumping out liquid before each operation. Subsequently, the residual liquid is completely retracted, followed by the formal loading process. To address discrepancies arising from variations in tubing lengths, a linear correction model is applied to adjust loading times, ensuring high accuracy and efficiency.
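The linear correction of loading times mentioned above can be sketched as a per-channel least-squares fit. This is a minimal illustration under stated assumptions: each pump channel is calibrated by timing a few test dispenses, the relationship between run time and delivered volume is taken to be linear with an offset that absorbs the tubing length, and the calibration numbers and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + offset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical calibration for one channel: pump run time (s) vs delivered volume (µL).
run_times_s = [1.0, 2.0, 3.0, 4.0]
delivered_ul = [18.0, 38.0, 58.0, 78.0]

# Fit run time as a function of target volume, so the model can be applied directly.
slope, offset = fit_line(delivered_ul, run_times_s)


def run_time_for(volume_ul):
    """Corrected pump run time (s) for a requested volume on this channel."""
    return slope * volume_ul + offset


print(run_time_for(48.0))
```

A separate `(slope, offset)` pair would be stored for each of the 96 channels, since the offset differs with tubing length.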
[0263] The sampler’s controller 617 utilizes a modular stacking driver board system, enabling scalability. This design allows additional peristaltic pumps to be integrated, expanding the range of chemical liquids that can be sampled.
Feeder for Pipette Tip Racks and Well Plates
[0264] Reference is made to FIG. 7, which illustrates the example feeder system 320 for pipette tip racks and well plates in the LUMI-lab 104 according to an embodiment.
[0265] The feeder system 320 is an automated system designed to replenish experimental consumables, such as tip racks, with high precision and efficiency. Its structural components are composed of aluminum extrusion frames and 3D-printed parts, ensuring a lightweight yet durable design. The core mechanism relies on a stepper motor 610 coupled with a lifting platform 619. This synchronous lifting mechanism enables the smooth and accurate positioning of consumables during the replenishment process, minimizing the risk of mechanical errors.
[0266] To enhance usability, the feeder system 320 includes a manual control interface that allows operators to lower the platform 619 for restocking experimental consumables manually. This design ensures operational flexibility and facilitates straightforward maintenance and replenishment tasks. Additionally, the system features a modular design, allowing for the seamless connection of multiple feeders via standardized connectors. This modularity not only supports scalability but also enhances adaptability to varying experimental setups, making it suitable for a wide range of laboratory environments.
[0267] The control method of the feeder system 320 is designed for consistency and interoperability. It employs the same control methodology as the clamper system, utilizing a Raspberry Pi and a motor driver board for precise actuation. This integration ensures compatibility with other automated laboratory equipment, streamlining system management and reducing the complexity of multi-device coordination.
DESIGN OF SOFTWARE MODULES
[0268] An integrated software framework to orchestrate the LUMI-system’s mechanical components and manage complex, high-throughput experimental workflows was developed. The overall architecture for the software framework is illustrated in FIG. 3, which interconnects the hardware resources, local control panel, and cloud computing. In an example implementation, the control system 106 may be configured to implement the integrated software framework.
[0269] The system architecture addresses the following needs in automated experimentation: (i) parallel control of multiple hardware operations, (ii) closed-loop integration between computational modeling (LUMI-model 102) and “wet lab” experiments (LUMI-lab 104), and (iii) a human-computer interface or graphical user interface (GUI) for progress monitoring (such as that hosted via the control system 106).
[0270] To achieve parallel control of the hardware, a task planner was designed to schedule tasks based on the pre-defined experiment protocol and the availability of labware and consumables. The task planner accelerates the experiment by maximizing the utilization of the available resources, as depicted in FIG. 2. Most of the deployed hardware is controlled through the HTTP or SSH protocol, including the robotic arm (via the Universal Robots UR5e Dashboard Server), the liquid handlers 306, 308 (via Opentrons API v2), and the plate reader (via the Microsoft Windows OLE service with a FastAPI interface). For other hardware, including the liquid sampler, cell incubator 310, and plate feeders, two Raspberry Pis were used to control the miscellaneous motors and sensors, and FastAPI was used to communicate with the Raspberry Pis. In some examples, the hardware and software components can be controlled via the central control system 106.
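The resource-aware scheduling idea behind the task planner can be sketched as a simple greedy list scheduler. This is not the planner of the disclosure: the task names, durations, and instrument labels are hypothetical, and the rule used here (each task starts as soon as its prerequisites have finished and its instrument is free, considered in declaration order) is one assumed policy among many.

```python
def plan(tasks):
    """Greedy schedule. tasks maps name -> (resource, duration_min, prerequisites).

    Returns name -> (start_min, end_min).
    """
    schedule = {}
    resource_free_at = {}
    pending = dict(tasks)
    while pending:
        progressed = False
        for name, (resource, duration, prereqs) in list(pending.items()):
            if all(p in schedule for p in prereqs):
                ready_at = max([schedule[p][1] for p in prereqs], default=0)
                start = max(ready_at, resource_free_at.get(resource, 0))
                schedule[name] = (start, start + duration)
                resource_free_at[resource] = start + duration
                del pending[name]
                progressed = True
        if not progressed:
            raise ValueError("cyclic or unsatisfiable prerequisites")
    return schedule


# Hypothetical two-plate protocol; durations in minutes are illustrative.
tasks = {
    "dispense_plate1": ("liquid_sampler", 10, []),
    "dispense_plate2": ("liquid_sampler", 10, []),
    "transfect_plate1": ("liquid_handler", 15, ["dispense_plate1"]),
    "transfect_plate2": ("liquid_handler", 15, ["dispense_plate2"]),
    "read_plate1": ("plate_reader", 5, ["transfect_plate1"]),
    "read_plate2": ("plate_reader", 5, ["transfect_plate2"]),
}

schedule = plan(tasks)
for name, (start, end) in schedule.items():
    print(f"{name}: {start}-{end} min")
```

In this example the second plate is dispensed while the first is being transfected, so the two plates finish sooner than they would if processed strictly one after the other.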
[0271] In some example implementations, integration of the LUMI-lab 104 and the LUMI-model 102 is achieved through a distributed architecture optimized for high-throughput data processing. Experimental readouts from the analytical module 304 of LUMI-lab 104 were managed through a MongoDB database, which served as the central data repository for model refinement. The LUMI-model 102 framework implemented a hybrid computing strategy: model fine-tuning was performed locally with one A6000 Ada GPU, while inference tasks were distributed across cloud-based GPU clusters. Cloud-based parallel inference was implemented using Modal’s serverless API and Docker containers with up to ten A100 GPUs. Prediction results were aggregated locally for subsequent iterations.
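The pattern of distributing inference across workers and aggregating the results locally can be sketched with the Python standard library. This sketch deliberately does not use Modal's API: a local thread pool stands in for the remote GPU workers, and `score_batch` is a placeholder scorer rather than the LUMI-model. All names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split items into at most n_chunks interleaved batches."""
    n_chunks = max(1, min(n_chunks, len(items)))
    return [items[i::n_chunks] for i in range(n_chunks)]


def score_batch(smiles_batch):
    """Placeholder for remote model inference: scores each string by its length."""
    return [(smiles, float(len(smiles))) for smiles in smiles_batch]


def parallel_predict(candidates, n_workers=10, scorer=score_batch):
    """Score batches concurrently, then merge and rank the results locally."""
    batches = chunked(candidates, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        merged = [pair for batch in pool.map(scorer, batches) for pair in batch]
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


ranked = parallel_predict(["CCO", "CCCCN", "C"], n_workers=2)
print(ranked)
```

The ranked list is what a subsequent active-learning iteration would consume when selecting the next candidates to synthesize.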
[0272] An intuitive web-based control interface was developed, enabling real-time monitoring and remote operation of the LUMI-lab 104. A control panel, implemented within the control system 106, was deployed using Streamlit, which presented key metrics for the LUMI-lab 104, such as the current experiment plan, latest readouts, and consumable status. The LUMI-system 100 can support remote intervention capabilities, such as labware reconfiguration for consumable replenishment. System events, including experimental milestones and potential failures, can be communicated through automated Slack notifications, ensuring continuous experimental oversight while minimizing the need for constant human supervision.
SYNTHESIS OF LIPID TAILS
[0273] General synthesis of aldehydes. Aldehyde tails were synthesized through either a one-step process (Route A shown below) or a two-step process (Route B shown below).
[Reaction scheme: DMP oxidation in DCM of 1-hexadecanol (hexadecan-1-ol) to palmitaldehyde; 1-heptadecanol (heptadecan-1-ol) to heptadecanal; 1-octadecanol (octadecan-1-ol) to stearaldehyde; oleyl alcohol ((Z)-octadec-9-en-1-ol) to olealdehyde; linoleyl alcohol ((9Z,12Z)-octadeca-9,12-dien-1-ol) to (9Z,12Z)-octadeca-9,12-dienal; and 2-hexyldecan-1-ol to 2-hexyldecanal. Esterification of 2-hexyldecanoic acid to 6-hydroxyhexyl 2-hexyldecanoate, followed by DMP oxidation in DCM to 6-oxohexyl 2-hexyldecanoate.]
Route A:
[0274] Aldehyde tails were synthesized directly from alcohol tails using Dess-Martin Periodinane (DMP) oxidation. Alcohol (1 mmol) was dissolved in 30 mL of anhydrous dichloromethane (DCM), and DMP was added. The mixture was stirred under nitrogen at room temperature for 2 hours. After confirming reaction completion via thin-layer chromatography (TLC), 200 mL of 50% (w/v) sodium thiosulfate pentahydrate solution was added and stirred for an additional 15 minutes. The organic layers were combined, washed with brine, dried over anhydrous sodium sulfate (Na2SO4), and concentrated to yield a crude colorless oil. Purification was carried out using silica gel chromatography with a gradient of 0-50% ethyl acetate in hexane to obtain the aldehyde tails.
Route B:
[0275] Aldehyde tails were synthesized in two steps. First, esterification was carried out by dissolving alcohol (10.0 mmol), carboxylic acid (1.0 mmol), dicyclohexylcarbodiimide (DCC, 1.1 mmol), and 4-dimethylaminopyridine (DMAP, 0.2 mmol) in 20 ml of anhydrous DCM in a round-bottom flask. The reaction was stirred at room temperature under nitrogen for 24 hours. The reaction mixture was filtered to remove dicyclohexylurea byproducts, and the filtrate was evaporated under vacuum. The residue was purified using a CombiFlash BUCHI C-815 chromatography system with gradient elution (0%- 20% ethyl acetate in hexane) to isolate the ester product. The product was then oxidized to aldehyde using the same DMP oxidation method described in Route A.
[0276] 6-oxohexyl 2-butyloctanoate (C9): Followed the synthesis method (Route B) as above. The residue was purified by silica gel chromatography to give 6-hydroxyhexyl 2-butyloctanoate as a colorless oil (yield 58%). 1H NMR (400 MHz, CDCl3) δ 4.04 (t, J = 6.7 Hz, 2H), 3.61 (t, J = 6.6 Hz, 2H), 2.28 (tt, J = 9.0, 5.3 Hz, 1H), 1.64 - 1.51 (m, 6H), 1.39 - 1.21 (m, 18H), 0.87 - 0.84 (m, 6H). The product was then oxidized to give 6-oxohexyl 2-butyloctanoate (C9) as a colorless oil (yield 95%). 1H NMR (400 MHz, CDCl3) δ 4.06 (t, J = 6.6 Hz, 2H), 2.53 - 2.38 (m, 2H), 2.29 (tt, J = 9.0, 5.4 Hz, 1H), 1.74 - 1.50 (m, 6H), 1.48 - 1.35 (m, 4H), 1.33 - 1.17 (m, 13H), 0.86 (td, J = 7.0, 2.2 Hz, 6H).
[0277] 6-oxohexyl 2-hexyldecanoate (C10): Followed the synthesis method (Route B) as above. The residue was purified by silica gel chromatography to give 6-hydroxyhexyl 2-hexyldecanoate as a colorless oil (yield 52%). 1H NMR (400 MHz, CDCl3) δ 4.05 (t, J = 6.6 Hz, 2H), 3.62 (t, J = 6.7 Hz, 2H), 2.29 (tt, J = 9.0, 5.3 Hz, 1H), 1.64 - 1.52 (m, 6H), 1.41 - 1.18 (m, 25H), 0.86 (t, J = 6.7 Hz, 6H). The product was then oxidized to give 6-oxohexyl 2-hexyldecanoate (C10) as a colorless oil (yield 98%). 1H NMR (400 MHz, CDCl3) δ 9.76 (dt, J = 3.9, 1.9 Hz, 1H), 4.06 (td, J = 6.6, 3.5 Hz, 2H), 2.61 - 1.98 (m, 3H), 1.69 - 1.22 (m, 29H), 0.86 (dq, J = 6.9, 3.4 Hz, 6H).
[0278] 2-hexyldecanal (C11): Followed the synthesis method (Route A) as above. The residue was purified by silica gel chromatography to give 2-hexyldecanal (C11) as a colorless oil (yield 92%). 1H NMR (400 MHz, CDCl3) δ 9.55 (d, J = 3.2 Hz, 1H), 2.22 (dqd, J = 11.2, 5.4, 3.2 Hz, 1H), 1.46 - 1.40 (m, 2H), 1.28 - 1.25 (m, 21H), 0.88 (d, J = 2.2 Hz, 6H).
[0279] Palmitaldehyde (C12): Followed the synthesis method (Route A) as above. The residue was purified by silica gel chromatography to give palmitaldehyde (C12) as a white powder (yield 95%). 1H NMR (400 MHz, CDCl3) δ 9.75 (t, J = 1.9 Hz, 1H), 2.41 (td, J = 7.4, 1.9 Hz, 2H), 1.69 - 1.59 (m, 2H), 1.25 (s, 23H), 0.89 - 0.85 (m, 3H).
[0280] Heptadecanal (C13): Followed the synthesis method (Route A) as above. The residue was purified by silica gel chromatography to give heptadecanal (C13) as a white powder (yield 96%). 1H NMR (400 MHz, CDCl3) δ 9.76 (t, J = 1.9 Hz, 1H), 2.42 (td, J = 7.4, 1.9 Hz, 2H), 1.64 - 1.61 (m, 2H), 1.26 (d, J = 3.8 Hz, 25H), 0.86 (s, 3H).
[0281] Stearaldehyde (C14): Followed the synthesis method (Route A) as above. The residue was purified by silica gel chromatography to give stearaldehyde (C14) as a white powder (yield 95%). 1H NMR (400 MHz, CDCl3) δ 9.75 (t, J = 1.9 Hz, 1H), 2.41 (td, J = 7.4, 1.9 Hz, 2H), 1.78 - 1.56 (m, 2H), 1.24 (s, 27H), 0.96 - 0.79 (m, 3H).
[0282] Olealdehyde (C15): Followed the synthesis method (Route A) as above. The residue was purified by silica gel chromatography to give olealdehyde (C15) as a colorless oil (yield 90%). 1H NMR (400 MHz, CDCl3) δ 9.76 (t, J = 1.9 Hz, 1H), 5.49 - 5.12 (m, 2H), 2.41 (td, J = 7.4, 1.9 Hz, 2H), 2.00 (q, J = 5.9 Hz, 4H), 1.68 - 1.57 (m, 2H), 1.39 - 1.23 (m, 20H), 0.87 (t, J = 6.8 Hz, 3H).
[0283] (9Z,12Z)-octadeca-9,12-dienal (C16): Followed the synthesis method (Route A) as above. The residue was purified by silica gel chromatography to give (9Z,12Z)-octadeca-9,12-dienal (C16) as a colorless oil (yield 90%). 1H NMR (400 MHz, CDCl3) δ 9.76 (t, J = 1.9 Hz, 1H), 5.54 - 5.13 (m, 4H), 2.87 - 2.68 (m, 2H), 2.42 (td, J = 7.4, 1.9 Hz, 2H), 2.05 (q, J = 6.9 Hz, 4H), 1.62 (dd, J = 8.6, 5.9 Hz, 2H), 1.36 - 1.28 (m, 14H), 0.88 (d, J = 2.7 Hz, 3H).
[0284] General synthesis of carboxylic acids. Carboxylic acid tails were synthesized through a one-step process (Route C shown below).
[Reaction scheme, Route C: esterification of adipic acid (DCC, DMAP, DCM) with alcohols including 2-hexyldecan-1-ol, 7-ethyl-2-methylundecan-4-ol, decyl alcohol, 1-undecanol and 10-undecylenic alcohol to give 6-((2-hexyldecyl)oxy)-6-oxohexanoic acid, 6-oxo-6-(undecan-2-yloxy)hexanoic acid, 6-oxo-6-(undecan-3-yloxy)hexanoic acid, 6-(decan-4-yloxy)-6-oxohexanoic acid, 6-(heptadecan-9-yloxy)-6-oxohexanoic acid, 6-((7-ethyl-2-methylundecan-4-yl)oxy)-6-oxohexanoic acid, 6-(decyloxy)-6-oxohexanoic acid, 6-oxo-6-(undecyloxy)hexanoic acid and 6-oxo-6-(undec-10-en-1-yloxy)hexanoic acid.]
[0285] In particular, esterification was carried out by dissolving alcohol (1.0 mmol), adipic acid (10.0 mmol), dicyclohexylcarbodiimide (DCC, 1.1 mmol), and 4-dimethylaminopyridine (DMAP, 0.2 mmol) in 20 ml of anhydrous DCM in a round-bottom flask. The reaction was stirred at room temperature under nitrogen for 24 hours. The reaction mixture was filtered to remove dicyclohexylurea byproducts, and the filtrate was evaporated under vacuum. The residue was purified using a CombiFlash BUCHI C-815 chromatography system with gradient elution (0%- 30% ethyl acetate in hexane) to isolate the ester product.
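As a worked illustration of the stoichiometry above, the following Python sketch converts the stated millimole amounts into weighable masses and molar equivalents relative to the alcohol (1.0 mmol). The molar masses are standard literature values rather than figures from this disclosure, and the helper names `mass_mg` and `equivalents` are hypothetical.

```python
# Molar masses in g/mol. These are standard literature values, not values
# taken from the disclosure.
MOLAR_MASS = {
    "adipic acid": 146.14,
    "DCC": 206.33,
    "DMAP": 122.17,
}

# Amounts stated in the Route C procedure, in mmol.
ROUTE_C_MMOL = {
    "adipic acid": 10.0,
    "DCC": 1.1,
    "DMAP": 0.2,
}


def mass_mg(reagent, mmol):
    """Mass in mg for a given amount in mmol (g/mol x mmol = mg)."""
    return MOLAR_MASS[reagent] * mmol


def equivalents(mmol, limiting_mmol=1.0):
    """Molar equivalents relative to the limiting alcohol (1.0 mmol)."""
    return mmol / limiting_mmol


for reagent, mmol in ROUTE_C_MMOL.items():
    print(f"{reagent}: {mmol} mmol = {mass_mg(reagent, mmol):.1f} mg "
          f"({equivalents(mmol):.1f} equiv)")
```

The mass of the alcohol is omitted because it depends on which alcohol tail is used.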
[0286] 6-((2-hexyldecyl)oxy)-6-oxohexanoic acid (D28): Followed the synthesis method as above. The residue was purified by silica gel chromatography to give 6-((2-hexyldecyl)oxy)-6-oxohexanoic acid (D28) as a white crystal (yield 75%). 1H NMR (400 MHz, CDCl3) δ 3.94 (dd, J = 5.9, 2.8 Hz, 2H), 2.48 - 2.17 (m, 4H), 1.72 - 1.59 (m, 5H), 1.24 (d, J = 3.3 Hz, 24H), 0.86 (t, J = 6.7 Hz, 6H).
[0287] 6-oxo-6-(undecan-2-yloxy)hexanoic acid (D29): Followed the synthesis method as above. The residue was purified by silica gel chromatography to give 6-oxo-6-(undecan-2-yloxy)hexanoic acid (D29) as a white crystal (yield 72%). 1H NMR (400 MHz, CDCl3) δ 5.03 - 4.71 (m, 1H), 2.46 - 2.22 (m, 4H), 1.72 - 1.53 (m, 5H), 1.47 - 1.06 (m, 18H), 0.99 - 0.68 (m, 3H).
[0288] 6-oxo-6-(undecan-3-yloxy)hexanoic acid (D30): Followed the synthesis method as above. The residue was purified by silica gel chromatography to give 6-oxo-6-(undecan-3-yloxy)hexanoic acid (D30) as a white crystal (yield 69%). 1H NMR (400 MHz, CDCl3) δ 4.80 (ddd, J = 12.3, 6.8, 5.5 Hz, 1H), 2.52 - 2.19 (m, 4H), 1.71 - 1.46 (m, 8H), 1.24 (s, 12H), 0.86 (td, J = 7.1, 2.1 Hz, 6H).
[0289] 6-(decan-4-yloxy)-6-oxohexanoic acid (D31): Followed the synthesis method as above. The residue was purified by silica gel chromatography to give 6-(decan-4-yloxy)-6-oxohexanoic acid (D31) as a white crystal (yield 75%). 1H NMR (400 MHz, CDCl3) δ 4.87 (tt, J = 7.1, 5.4 Hz, 1H), 2.47 - 2.18 (m, 4H), 1.72 - 1.60 (m, 4H), 1.56 - 1.41 (m, 4H), 1.34 - 1.15 (m, 10H), 0.87 (dt, J = 9.8, 7.2 Hz, 6H).
[0290] 6-(heptadecan-9-yloxy)-6-oxohexanoic acid (D32): Followed the synthesis method as above. The residue was purified by silica gel chromatography to give 6-(heptadecan-9-yloxy)-6-oxohexanoic acid (D32) as a white crystal (yield 68%). 1H NMR (400 MHz, CDCl3) δ 4.95 - 4.79 (m, 1H), 2.50 - 2.18 (m, 4H), 1.96 - 1.06 (m, 26H), 0.87 (dt, J = 9.5, 7.2 Hz, 6H).
[0291] 6-((7-ethyl-2-methylundecan-4-yl)oxy)-6-oxohexanoic acid (D33): Followed the synthesis method as above. The residue was purified by silica gel chromatography to give 6-((7-ethyl-2-methylundecan-4-yl)oxy)-6-oxohexanoic acid (D33) as a white crystal (yield 73%). 1H NMR (400 MHz, CDCl3) δ 4.94 (dtd, J = 8.6, 6.1, 4.3 Hz, 1H), 2.39 - 2.19 (m, 4H), 1.68 - 1.45 (m, 7H), 1.33 - 1.09 (m, 13H), 1.03 - 0.59 (m, 12H).
[0292] 6-(decyloxy)-6-oxohexanoic acid (D34): Followed the synthesis method as above. The residue was purified by silica gel chromatography to give 6-(decyloxy)-6-oxohexanoic acid (D34) as a white crystal (yield 76%). 1H NMR (400 MHz, CDCl3) δ 3.98 (t, J = 6.8 Hz, 2H), 2.40 - 2.13 (m, 4H), 1.60 (h, J = 3.1 Hz, 4H), 1.52 - 1.04 (m, 17H), 0.84 - 0.76 (m, 3H).
[0293] 6-oxo-6-(undecyloxy)hexanoic acid (D35): Followed the synthesis method as above. The residue was purified by silica gel chromatography to give 6-oxo-6-(undecyloxy)hexanoic acid (D35) as a white crystal (yield 77%). 1H NMR (400 MHz, CDCl3) δ 3.98 (q, J = 6.4 Hz, 2H), 2.28 (ddd, J = 18.7, 8.0, 4.8 Hz, 4H), 1.67 - 1.51 (m, 6H), 1.22 (dt, J = 21.2, 6.0 Hz, 16H), 0.94 - 0.68 (m, 3H).
[0294] 6-oxo-6-(undec-10-en-1-yloxy)hexanoic acid (D36): Followed the synthesis method as above. The residue was purified by silica gel chromatography to give 6-oxo-6-(undec-10-en-1-yloxy)hexanoic acid (D36) as a white crystal (yield 66%). 1H NMR (400 MHz, CDCl3) δ 5.79 (ddt, J = 16.9, 10.2, 6.7 Hz, 1H), 5.06 - 4.74 (m, 2H), 4.04 (t, J = 6.8 Hz, 2H), 2.39 - 2.24 (m, 4H), 2.10 - 1.96 (m, 2H), 1.72 - 1.53 (m, 7H), 1.45 - 1.09 (m, 14H).
GENERAL SYNTHESIS OF LIPID CANDIDATES
[0295] Briefly, for the synthesis of the ionizable lipid library, Ugi four-component reaction (Ugi-4CR) chemistry was employed to prepare ionizable cationic lipids through reactions involving amine groups (-NH2), aldehyde groups (-CHO), carboxylic acids (-COOH), and isocyanide groups (-NC). The amines, isocyanides, aldehydes, and carboxylic acids were sourced from TCI and Sigma Aldrich or synthesized following previously reported methods (Yue Xu et al. “AGILE platform: a deep learning powered approach to accelerate LNP development for mRNA delivery”, Nature Communications 15.1 (July, 2024), p. 6305).
[0296] In brief, the reactants were dissolved in ethanol and reacted in a round bottom flask for 24 hours at a molar ratio of 1:1:1:1 (aldehyde: amine: isocyanide: carboxylic acid). The reaction mixture was evaporated under vacuum. The residue was purified using a CombiFlash BUCHI C-815 chromatography system with gradient elution (1% ammonia water, 0% - 10% MeOH in DCM) to isolate the final product (see FIG. 8A for the structures of the LUMI-1 to LUMI-6 candidates described below).
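The four-component, 1:1:1:1 combination described above lends itself to combinatorial enumeration. The following Python sketch shows one way such a library could be listed. The aldehyde (C9 to C16) and carboxylic acid (D28 to D36) labels are taken from the tail syntheses above; the amine and isocyanide labels, the `scale_mmol` parameter and the function name are hypothetical placeholders, since the text does not enumerate those components or a reaction scale.

```python
from itertools import product

# Tail labels from the synthesis section above.
aldehydes = ["C9", "C10", "C11", "C12", "C13", "C14", "C15", "C16"]
acids = ["D28", "D29", "D30", "D31", "D32", "D33", "D34", "D35", "D36"]
# Hypothetical placeholders: the amines and isocyanides are not listed here.
amines = ["amine_1", "amine_2"]
isocyanides = ["isocyanide_1", "isocyanide_2"]


def enumerate_ugi_library(aldehydes, amines, isocyanides, acids, scale_mmol=0.1):
    """List every aldehyde/amine/isocyanide/acid combination at a 1:1:1:1 ratio.

    scale_mmol is an assumed per-component amount, used only to show that
    all four components are charged in equal molar quantities.
    """
    library = []
    for ald, amine, iso, acid in product(aldehydes, amines, isocyanides, acids):
        library.append({
            "components": (ald, amine, iso, acid),
            "mmol": {ald: scale_mmol, amine: scale_mmol,
                     iso: scale_mmol, acid: scale_mmol},
        })
    return library


library = enumerate_ugi_library(aldehydes, amines, isocyanides, acids)
print(len(library))  # 8 * 2 * 2 * 9 = 288 combinations
```

The library size grows multiplicatively with each component list, which is why a modest set of building blocks can span a large candidate space.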
[0297] LUMI-1: Followed the synthesis method as above. The residue was purified by silica gel chromatography to give LUMI-1 as a colorless oil (yield 85%). MS (ESI) m / z: [M+H]+ calcd. for C40H75BrN5O2, 737.97; found, 737.50. 1H NMR (400 MHz, CDCl3) δ 5.49 - 5.23 (m, 4H), 3.57 - 3.31 (m, 3H), 2.77 (s, 1H), 2.22 (s, 2H), 2.12 - 1.88 (m, 16H), 1.84 (s, 1H), 1.67 (s, 6H), 1.41 - 1.12 (m, 34H), 0.89 (t, J = 6.8 Hz, 3H).
[0298] LUMI-2: Followed the synthesis method as above. The residue was purified by silica gel chromatography to give LUMI-2 as a colorless oil (yield 82%). MS (ESI) m / z: [M+H]+ calcd. for C42H76BrN3O2, 734.51; found, 734.92. 1H NMR (400 MHz, CDCl3) δ 5.42 - 5.30 (m, 2H), 3.39 (td, J = 6.9, 1.2 Hz, 4H), 2.50 - 2.43 (m, 2H), 2.34 (s, 7H), 1.93 (d, J = 2.9 Hz, 5H), 1.64 (d, J = 6.1 Hz, 6H), 1.48 - 1.21 (m, 40H), 0.86 (dd, J = 7.0, 1.2 Hz, 3H).
[0299] LUMI-3: Followed the synthesis method as above. The residue was purified by silica gel chromatography to give LUMI-3 as a colorless oil (yield 81%). MS (ESI) m / z: [M+H]+ calcd. for C39H71BrN3O4, 724.91; found, 724.75. 1H NMR (400 MHz, CDCl3) δ 4.10 (q, J = 7.1 Hz, 6H), 3.40 (t, J = 6.8 Hz, 8H), 2.55 (d, J = 1.7 Hz, 2H), 2.33 (t, J = 7.4 Hz, 6H), 2.03 (s, 6H), 1.89 - 1.83 (m, 6H), 1.66 - 1.61 (m, 10H), 1.50 - 1.45 (m, 6H), 1.25 (d, J = 7.2 Hz, 12H), 0.86 (td, J = 7.0, 2.0 Hz, 6H).
[0300] LUMI-4: Followed the synthesis method as above. The residue was purified by silica gel chromatography to give LUMI-4 as a colorless oil (yield 88%). MS (ESI) m / z: [M+H]+ calcd. for C41H73BrN4O2, 733.97; found, 733.58. 1H NMR (400 MHz, CDCl3) δ 5.39 - 5.18 (m, 2H), 2.52 (ddd, J = 69.7, 6.6, 4.2 Hz, 14H), 2.08 - 1.74 (m, 13H), 1.62 (q, J = 4.7 Hz, 10H), 1.35 - 1.04 (m, 28H), 0.96 - 0.61 (m, 3H).
[0301] LUMI-5: Followed the synthesis method as above. The residue was purified by silica gel chromatography to give LUMI-5 as a colorless oil (yield 80%). MS (ESI) m / z: [M+H]+ calcd. for C41H74BrN5O2, 748.50; found, 748.50. 1H NMR (400 MHz, CDCl3) δ 5.33 (d, J = 7.3 Hz, 4H), 2.85 - 2.61 (m, 12H), 2.08 - 1.91 (m, 20H), 1.31 - 1.24 (m, 27H), 0.87 (d, J = 2.8 Hz, 3H).
[0302] LUMI-6: Followed the synthesis method as above. The residue was purified by silica gel chromatography to give LUMI-6 as a colorless oil (yield 81%). MS (ESI) m / z: [M+H]+ calcd. for C45H81BrN4O4, 821.54; found, 821.67. 1H NMR (400 MHz, CDCl3) δ 6.67 - 6.42 (m, 1H), 4.04 (td, J = 6.7, 4.0 Hz, 2H), 3.40 (td, J = 6.7, 3.0 Hz, 3H), 2.84 - 2.39 (m, 7H), 2.37 - 2.12 (m, 4H), 1.97 (d, J = 2.9 Hz, 4H), 1.92 - 1.84 (m, 2H), 1.75 - 1.19 (m, 41H), 0.91 - 0.81 (m, 6H).
[0303] LUMI-6D: Followed the synthesis method as above. The residue was purified by silica gel chromatography to give LUMI-6D as a colorless oil (yield 75%). MS (ESI) m / z: [M+H]+ calcd. for C46H84N4O4, 757.65; found, 757.50. 1H NMR (400 MHz, CDCl3) δ 6.61 (s, 1H), 4.00 (t, J = 6.7 Hz, 2H), 3.33 (q, J = 5.7 Hz, 2H), 2.67 (t, J = 6.2 Hz, 1H), 2.49 (t, J = 6.0 Hz, 9H), 2.30 - 2.20 (m, 2H), 2.19 - 2.06 (m, 2H), 2.03 (t, J = 3.2 Hz, 3H), 1.69 - 1.12 (m, 49H), 0.83 (dtd, J = 7.0, 4.4, 2.2 Hz, 9H).
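For readers who wish to reproduce a calculated [M+H]+ value, the following Python sketch computes a monoisotopic m/z from a molecular formula, using the LUMI-6D formula as the example. The isotope masses are standard reference values rather than values from this disclosure, the function names are hypothetical, and only C, H, N and O are tabulated, so brominated formulas would need an added entry for the chosen bromine isotope.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes; standard reference
# values, not taken from the disclosure.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307400, "O": 15.99491462}
PROTON = 1.00727647  # mass of H+ (a hydrogen atom minus one electron)


def parse_formula(formula):
    """Turn a simple formula such as 'C46H84N4O4' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def mz_m_plus_h(formula):
    """Monoisotopic m/z of the singly protonated ion [M+H]+."""
    counts = parse_formula(formula)
    neutral = sum(MONOISOTOPIC[element] * n for element, n in counts.items())
    return neutral + PROTON


# LUMI-6D formula as given above; the result can be compared with the
# calculated value of 757.65 reported in the text.
print(round(mz_m_plus_h("C46H84N4O4"), 2))
```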
[0304] LUMI-6Cl: Followed the synthesis method as above. The residue was purified by silica gel chromatography to give LUMI-6Cl as a yellow oil (yield 85%). 1H NMR (400 MHz, CDCl3) δ 5.76 (s, 1H), 4.67 (s, 1H), 4.02 (q, J = 6.4 Hz, 2H), 3.53 (td, J = 6.5, 4.5 Hz, 2H), 3.38 (ddt, J = 20.6, 10.6, 5.5 Hz, 1H), 2.96 (t, J = 4.8 Hz, 3H), 2.57 (q, J = 8.6 Hz, 4H), 2.47 - 2.13 (m, 4H), 2.11 - 0.98 (m, 52H), 0.93 - 0.79 (m, 6H).
[0305] LUMI-6F: Followed the synthesis method as above. The residue was purified by silica gel chromatography to give LUMI-6F as a yellow oil (yield 65%). 1H NMR (400 MHz, CDCl3) δ 6.12 (s, 1H), 4.02 (q, J = 6.5 Hz, 2H), 3.36 (t, J = 7.3 Hz, 1H), 2.99 (t, J = 4.9 Hz, 3H), 2.71 - 2.20 (m, 12H), 2.13 - 1.78 (m, 8H), 1.73 - 1.04 (m, 35H), 0.94 - 0.74 (m, 6H).
LNP SCREENING
[0306] A general protocol of a screening method for cargo-LNPs included:
[0307] 1) Preparation of lipid mixture: a non-cationic lipid (e.g., phospholipid), a binding polymeric lipid to prevent particle aggregation (e.g., PEGylated-lipid), and a structural lipid (e.g., cholesterol) were added to reaction vessels containing the prepared ionizable lipids to form a lipid mixture.
[0308] 2) Preparation of aqueous phase: an aqueous solution containing a cargo nucleic acid, such as mRNA, was prepared.
[0309] 3) Formulation of cargo-LNPs: The lipid mixture obtained in Step 1) was mixed with the aqueous phase containing the nucleic acid prepared according to Step 2) to formulate the cargo-LNPs.
[0310] In some examples, cargo-LNPs were synthesized according to reported methods (He, Z. et al. A Multidimensional Approach to Modulating Ionizable Lipids for High-Performing and Organ-Selective mRNA Delivery. Angewandte Chemie International Edition 62, e202310401 (2023); and Xu, S. et al. Tumor-Tailored Ionizable Lipid Nanoparticles Facilitate IL-12 Circular RNA Delivery for Enhanced Lung Cancer Immunotherapy. Advanced Materials 36, 2400307 (2024)).
[0311] 4) Screening of cargo-LNPs: in vitro or in vivo assays were conducted to evaluate the LNPs encapsulating the nucleic acid.
[0312] As three different helper lipids were used to formulate the mRNA-LNPs, automated head-to-head comparisons were conducted to assess the impact of the helper lipids on the LNP activity. The comparisons showed that the relative advantage of ionizable lipid candidates identified by the LUMI-system 100 over baselines of commercial lipids remained consistently substantial across all tested helper lipids and formulations (FIGs. 19A and 19B). These results suggest that the ionizable lipid structure, not the helper lipids, predominantly affects the LNP performance, aligning with the strategy of using a standardized baseline formulation for high-throughput screening, followed by formulation optimization around lead ionizable lipid candidates.
[0313] An in vivo study of pulmonary mRNA delivery was conducted to evaluate the top-performing lipids from the LUMI-system 100’s final iteration.
[0314] FIG. 8B illustrates the formulation of mLuc-LNPs with each of LUMI-1 to LUMI-6, respectively. The characterization of the nanoparticle size and polydispersity index (PDI) confirmed that all six mLuc-LNPs formed stable complexes with known helper lipid systems and successfully encapsulated mLuc (see FIGs. 8C and 8D).
[0315] Generally, the mLuc-LNPs were administered into mouse lungs via intratracheal (I. T.) injection, and In Vivo Imaging System (IVIS) images of the lungs were captured six hours post-injection to assess in vivo mRNA transfection efficiency.
[0316] The results are shown in FIGS. 8E and 8F, where the data are presented as mean ± s.e.m. of tdTOM+ cells in lungs (n = 3 biologically independent animals). Statistical significance was evaluated using a two-tailed unpaired t-test (*P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001). Notably, five candidates exhibited mRNA delivery efficiency comparable to or exceeding SM-102, an industry-standard lipid used in Moderna’s COVID-19 mRNA vaccine, while the least effective candidate still matched the performance of MC3, another FDA-approved ionizable lipid.
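The summary statistics named above can be illustrated with a short Python sketch that computes mean ± s.e.m., an unpaired t statistic, and the asterisk notation. The readout values are invented purely for illustration and are not data from this study. The pooled-variance (Student's) form of the unpaired t-test is an assumption, since the text does not state which variant was used, and the conversion of the t statistic into a two-tailed P value is left to a statistics library.

```python
import math
import statistics


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


def unpaired_t(a, b):
    """Student's unpaired t statistic and degrees of freedom.

    Assumes equal variances (pooled estimate); this is an assumption.
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def stars(p):
    """Map a P value to the asterisk notation used in the text."""
    for threshold, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < threshold:
            return label
    return "ns"


# Hypothetical readouts for two groups of n = 3 animals each.
candidate = [4.0, 5.0, 6.0]
benchmark = [1.0, 2.0, 3.0]
m, sem = mean_sem(candidate)
t, df = unpaired_t(candidate, benchmark)
print(f"candidate: {m:.2f} ± {sem:.2f}; t = {t:.2f}, df = {df}")
```

With n = 3 per group the test has only 4 degrees of freedom, so large differences in means are needed to reach the stricter thresholds.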
[0317] As a comparison with LUMI-6, LUMI-6D, a debrominated derivative, was synthesized. mRNA-LNPs were formulated under identical conditions to ensure experimental consistency. Bioluminescence assays in HBE cells revealed that LUMI-6 exhibited 1.8-fold higher mRNA transfection efficiency than LUMI-6D, directly confirming that bromine incorporation plays a critical role in enhancing mRNA delivery (as illustrated in FIG. 12A). No significant difference in cytotoxicity was observed between the mRNA-LNPs formulated with brominated and non-brominated ionizable lipids (see FIG. 12B). Fluorescence imaging and colocalization analysis were conducted to assess LNP distribution, endosomal escape efficiency, and intracellular mRNA release. At early time points (2-6 hours post-transfection), both LUMI-6 and LUMI-6D exhibited similar intracellular distributions, with Cy5-tagged mRNA primarily sequestered within endosomal compartments, leading to comparable cytoplasmic / endosomal Cy5 intensity ratios, as shown in FIG. 12C, left graph. However, by 18 hours, LUMI-6 demonstrated significantly enhanced endosomal escape, as indicated by a marked increase in the cytoplasmic / endosomal Cy5 intensity ratio, total cytoplasmic Cy5 intensity (FIG. 12C, middle graph), and per-cell cytoplasmic Cy5 intensity (FIG. 12C, right graph) relative to LUMI-6D.
[0318] The gene editing potential of LUMI-6 was also investigated to assess its suitability for CRISPR / Cas9 mediated gene editing in the lung. LUMI-6 LNPs encapsulating Cas9 mRNA / gRNA were formulated separately using an optimized mass ratio described above (0.75 mg / kg Cas9 mRNA and sgRNA at a weight ratio 4:1, with a Cas9 mRNA dose of 1 mg / kg) and administered via I. T. instillation into Ai9 reporter mice (illustrated in FIG. 13A). A CRISPR Cas9 gene editing strategy was introduced to trigger the disruption of stop signals and turn-on of tdTomato expression in the Ai9 reporter mice. Three days after the last dose, the lungs were collected and analyzed by flow cytometry (n = 3 mice for each group). IVIS imaging at 6 h following I. T. administration of SM-102 or LUMI-6 mLuc-LNPs (1 mg / kg mLuc per mouse) revealed that the LUMI-6 mLuc-LNPs significantly outperformed the SM-102 mLuc-LNPs, demonstrating superior editing efficiency in the lung tissue (FIG. 13B). Flow cytometry analysis of tdTomato-positive cells in lung endothelial and epithelial cells from Ai9 mice treated with Cas9 mRNA and sgRNA-formulated LNPs (dose: 1 mg / kg, n = 3 biologically independent animals and n = 2 samples for each mouse) further showed that LUMI-6 LNP achieved a gene editing efficiency of 20.3% in lung epithelial cells, a substantial improvement over SM-102 LNP (FIG. 13C). To further assess the distribution of CRISPR-Cas9 editing mediated by the LUMI-6 mLuc-LNPs across cell subtypes, immunofluorescence staining of ciliated cells (α-tubulin+) and club cells (CCSP+), two important airway epithelial populations implicated in lung diseases such as cystic fibrosis (CF) and alpha-1 antitrypsin deficiency (AATD), was performed. The results confirmed efficient gene editing in both cell types, reinforcing LUMI-6's potential for gene editing in the airway epithelium (FIG. 13D).
[0319] To assess the in vivo base editing potential of mRNA-LUMI-6 LNPs, the LumA luciferase reporter mouse model was utilized. This model carries an R387X nonsense mutation that abolishes luciferase activity, which can be restored by A-to-G correction with an Adenine Base Editor (ABE) (FIG. 14A). ABE mRNA and sgRNA were encapsulated into LUMI-6 or control SM-102 LNPs and administered intratracheally to LumA mice (FIG. 14B). Following two administrations of 0.75 mg / kg of ABE mRNA / sgRNA encapsulated in LUMI-6 LNPs on Day 0 and Day 7, the animals were sacrificed on Day 11 for analysis. IVIS imaging of major organs (lung, heart, liver, spleen and kidney) revealed a substantially higher luminescence signal in the lungs of the LUMI-6 LNP group compared to the SM-102 LNP group (FIG. 14C). Quantification of luminescence activity in RLU in dissected organs confirmed this observation, showing significantly elevated luminescence activity in the lungs of LUMI-6 treated mice (FIG. 14D, using saline as the control). Subsequent sequencing analysis verified that the ABE mRNA / sgRNA LUMI-6 LNPs mediated a significantly higher percentage of T-to-C base conversion, corresponding to the A-to-G edit on the complementary strand, than the SM-102 counterparts, demonstrating the enhanced efficiency of using LUMI-6 as an ionizable lipid for pulmonary base editing (FIGs. 14E and 14F).
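The statement that a T-to-C conversion in the sequenced strand corresponds to an A-to-G edit on the complementary strand can be checked with a few lines of Python. The 9-nucleotide sequence below is made up for illustration and is not the LumA reporter sequence; the choice of a TGA stop codon reverting to CGA is likewise only an illustrative assumption, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def substitutions(before, after):
    """List (position, old base, new base) for every mismatch."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(before, after)) if x != y]


# Hypothetical sequenced strand carrying a TGA stop codon (positions 3 to 5),
# and the same strand after correction of the stop codon to CGA.
sequenced_before = "GGCTGACCT"
sequenced_after = "GGCCGACCT"

print(substitutions(sequenced_before, sequenced_after))
print(substitutions(reverse_complement(sequenced_before),
                    reverse_complement(sequenced_after)))
```

The first list reports the change as T to C on the sequenced strand; the second reports the same event as A to G on the complementary strand, which is the strand on which an adenine base editor acts.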
[0320] Subchronic toxicity and immune response profiles of the brominated lipids were also assessed by conducting a 28-day repeat-dose toxicity study in rodents using LUMI-6-based LNPs, LUMI-6 (DSPC as the helper lipid) and LUMI-6 (DOTAP as the helper lipid), delivered via I. T. route. The dosing schedule was as follows: a dose of 0.5 mg / kg of the LUMI-6-based LNPs on each of Day 0, 7, 14, 21 and 28 and the rodents were sacrificed on Day 29 for analysis. As shown in FIGs. 15B-D, the immune responses, including cytokine production (IL-6, TNF-alpha), complement activation, and hemolysis, of the LUMI-6-based LNPs were comparable to those of SM-102. These results suggest that the immunogenicity profile is not intrinsically heightened by the bromination.
[0321] An intramuscular (I. M.) injection study was also conducted to assess the LNP's safety in systemic applications, such as vaccines, by assessing the subchronic toxicity and immune response profiles of the brominated lipids in a 28-day repeat-dose toxicity study in rodents using LUMI-6 DSPC based LNPs, similar to the above-described assessment via the I. T. route. However, the permanently charged cationic lipid DOTAP was excluded from the experiment due to its known toxicity in I. M. injection. The results showed that, compared to the I. T. route, the I. M. administration had a similar immune response (FIGs. 16A and 16B). Furthermore, a histological analysis showed no significant pathological changes in major organs (FIG. 16B). In all cases, the overall safety profile was comparable to that of the clinical benchmark SM-102, suggesting the tolerability of LUMI-6 as an ionizable lipid for the LNPs.
[0322] LUMI-6Cl was evaluated for its in vivo performance. The results showed that LUMI-6Cl based LNP exhibited comparable in vivo transfection potency to the LUMI-6 based LNP (FIG. 17A).
[0323] To further elucidate the SAR of the halogenated lipids, their self-assembly properties at endosomal pH (5.5) were characterized using SAXS and Cryo-TEM. Based on the SAXS data, LUMI-6 and LUMI-6Cl showed sharper peaks than LUMI-6D, suggesting that the halogenated lipids, LUMI-6 and LUMI-6Cl, formed more compact and ordered structures compared to mRNA-LUMI-6D (FIG. 17C). Cryo-TEM imaging confirmed these findings and revealed a significant difference in the lipid phase organization (FIG. 17D). While mRNA-LUMI-6D LNPs exhibited a mix of lamellar (illustrated by a rectangle) and inverted hexagonal (H_II) (illustrated by a square) phases, the halogenated lipids predominantly organized into large domains of the H_II phase.
[0324] All patent applications, patents, printed publications, and source codes cited herein are incorporated herein by reference in their entireties, except for any definitions, subject matter disclaimers or disavowals, and except to the extent that the incorporated material is inconsistent with the express disclosure herein, in which case the language in this disclosure controls.
[0325] For simplicity and clarity of illustration, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the examples described herein. However, it will be understood by those of ordinary skill in the art that the examples described herein may be practiced without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to obscure the examples described herein. Also, the description is not to be considered as limiting the scope of the examples described herein.
[0326] It will be appreciated that the examples and corresponding diagrams used herein are for illustrative purposes only. Different configurations and terminology can be used without departing from the principles expressed herein. For instance, components and modules can be added, deleted, modified, or arranged with differing connections without departing from these principles.
[0327] Although the above principles have been described with reference to certain specific examples, various modifications thereof will be apparent to those skilled in the art as outlined in the appended claims.
Claims
1. An artificial intelligence (AI) based automated self-driving laboratory (SDL) system for developing molecule candidates, the system comprising:
a machine learning module and a laboratory module operatively coupled to the machine learning module, wherein,
the machine learning module is pretrained with a set of molecules to learn structures and one or more properties of interest of the set of molecules, the machine learning module being configured to produce an initial set of molecules to be synthesized and analyzed by the laboratory module based on the learned structures and properties of interest, and receive feedback of structures and properties of interest based on the initial set of molecules synthesized and analyzed by the laboratory module for a further training and produce a next iteration of molecules to be synthesized and analyzed by the laboratory module; and
the laboratory module is configured to synthesize and analyze the initial set of molecules received from the machine learning module, relay the structures and properties of interest determined from synthesis and analysis of the initial set of molecules to the machine learning module for the iterative training, receive from the machine learning module the next iteration of molecules, synthesize and analyze the next iteration of molecules, and
wherein the machine learning module is iteratively retrained to produce the next iteration of molecules, and the next iteration of molecules are iteratively synthesized and analyzed by the laboratory module until at least one termination criterion is met.
2. The AI-based automated SDL system of claim 1, wherein the at least one termination criterion comprises at least one lead molecule candidate being identified for output via an output device associated with the system.
3. The AI-based automated SDL system of claim 1 or claim 2, wherein the one or more properties of interest comprise one or more of pKa, hydrophobicity, hydrophilicity, pH, and cargo molecule transfection efficiency.
4. The AI-based automated SDL system of any one of claims 1-3, wherein the laboratory module comprises a synthesis module configured to synthesize the one or more of initial and next iteration set of molecules received from the machine learning model, and an analytical module configured to analyze the synthesized molecules for the one or more properties of interest and produce feedback for the machine learning module.
5. The AI-based automated SDL system of claim 4, wherein the synthesis module comprises a first handler module to conduct the synthesis of the produced set of molecules; and a second handler module to deliver the synthesized molecules to the analytical module.
6. The AI-based automated SDL system of claim 5, wherein the analytical module comprises:
an incubator module accessible by the first handler module, the second handler module or both, the incubator module being configured to hold targets for incubation of the synthesized molecules for a predetermined amount of time; and
a reader module configured to measure the one or more properties of the synthesized molecules.
7. The AI-based automated SDL system of any one of claims 1-6, wherein training of the machine learning module comprises a pretraining stage with a generic dataset of molecules and a continual pretraining stage for refining model embeddings based on a domain-specific dataset of molecules.
8. The AI-based automated SDL system of any one of claims 1-7, wherein the molecule candidates are ionizable lipids.
9. The AI-based automated SDL system of claim 8, wherein the second handler module is further configured to formulate lipid nanoparticles (LNPs) and dose each of the formulated LNPs with the targets and deliver the formulated LNPs dosed with the targets to the incubator module for incubation, each LNP comprising a unique synthesized ionizable lipid and a cargo molecule of interest.
10. The AI-based automated SDL system of claim 9, wherein the reader module is configured to measure and determine a corresponding cargo molecule transfection efficiency value of each of the formulated LNPs in the targets.
11. The AI-based automated SDL system of any one of claims 9-10, wherein the cargo molecule of interest is a protein, a peptide, or a nucleic acid.
12. The AI-based automated SDL system of claim 11, wherein the nucleic acid is DNA, siRNA, tRNA, circRNA, miRNA, mRNA or a combination thereof.
13. The AI-based automated SDL system of claim 9, wherein the targets comprise cell systems, multicellular constructs, or both.
14. The AI-based automated SDL system of claim 13, wherein the cell systems comprise one or more of cancer cell lines, immortalized cell lines, primary cells, stem cells, differentiated cells derived therefrom, mammalian cells, non-mammalian cells, engineered cells, and recombinant cells; and wherein the multicellular constructs are selected from organoids and spheroids.
15. The AI-based automated SDL system of claim 8, wherein the ionizable lipid is of General Formula (I)
wherein:
R1 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, and optionally having a halogen at the terminal position or within the intermediate chain extension;
R2 is selected from acyclic amine, cyclic amine and heterocyclic amine, each of the acyclic amine, cyclic amine and heterocyclic amine having at least one ionizable nitrogen atom;
R3 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, optionally having a halogen at the terminal position or within the intermediate chain extension, and optionally having an unsubstituted or substituted C7-C30 cage hydrocarbon at the terminal position when the terminal position does not have a halogen; and
R4 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, optionally having a halogen at the terminal position or within the intermediate chain extension, and optionally having an unsubstituted or substituted C7-C30 cage hydrocarbon at the terminal position when the terminal position does not have a halogen;
provided that at least one of R1, R3 and R4 has a halogen at the terminal position or within the intermediate chain extension.
16. The AI-based automated SDL system of claim 15, wherein R3 has a halogen at the terminal position.
17. The AI-based automated SDL system of claim 15 or 16, wherein the halogen is fluorine, chlorine or bromine.
18. The AI-based automated SDL system of any one of claims 15-17, wherein
R1 is:
R2 is:
and R4 is:
19. The AI-based automated SDL system of any one of claims 15-18, wherein the ionizable lipid is
20. The AI-based automated SDL system of any one of claims 15-19, wherein the ionizable lipid is synthesized by a method comprising reacting a compound of Formula A, a compound of Formula B, a compound of Formula C and a compound of Formula D under conditions to provide the ionizable lipid:
R1CHO (A), R2-NH2 (B), R3-COOH (C), R4-NC (D).
21. The AI-based automated SDL system of claim 20, wherein the method is performed using a high-throughput chemical procedure.
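For illustration only, the combinatorial reach of the four-component synthesis recited in claims 20 and 21 can be sketched as a simple enumeration: one building block is taken from each of Formulas A to D, and each combination is checked against the halogen proviso of claim 15. The block labels, the `has_halogen` flag and the function names below are hypothetical placeholders, not reagents or software disclosed in the application.

```python
from itertools import product
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical building block: a label plus a halogen flag."""
    label: str
    has_halogen: bool


# Placeholder inventories (assumed); real reagents are not listed in the claims.
aldehydes = [Block("A1", False), Block("A2", True)]          # Formula A, supplies R1
amines = [Block("B1", False), Block("B2", False), Block("B3", False)]  # Formula B, supplies R2
acids = [Block("C1", False), Block("C2", True)]              # Formula C, supplies R3
isocyanides = [Block("D1", False)]                           # Formula D, supplies R4


def enumerate_library(a_list, b_list, c_list, d_list):
    """Return every (A, B, C, D) combination as one candidate lipid."""
    return list(product(a_list, b_list, c_list, d_list))


def meets_halogen_proviso(candidate):
    """True if at least one of R1, R3 or R4 carries a halogen (R2 is not counted)."""
    r1, _r2, r3, r4 = candidate
    return r1.has_halogen or r3.has_halogen or r4.has_halogen


library = enumerate_library(aldehydes, amines, acids, isocyanides)
eligible = [c for c in library if meets_halogen_proviso(c)]
print(len(library), len(eligible))
```

With these placeholder inventories, 2 x 3 x 2 x 1 = 12 combinations are enumerated, of which 9 satisfy the proviso. The library size grows multiplicatively with each inventory, which is one reason a high-throughput procedure is a natural fit.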
22. An AI-based method for developing molecule candidates, the method comprising:
a. producing, by a pretrained machine learning module, an initial set of molecules to be synthesized and analyzed based on structures and properties of interest learned from a set of molecules;
b. synthesizing and analyzing, by a laboratory module, the initial set of molecules;
c. training the machine learning module based on the structure and analyzed properties of interest of the initial set of molecules synthesized and analyzed by the laboratory module;
d. producing, by the machine learning module, a next iteration of molecules to be synthesized and analyzed;
e. synthesizing and analyzing, by the laboratory module, the next iteration of molecules; and
repeating steps b to e until at least one termination criterion is met.
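For illustration only, the control flow of steps a to e in claim 22 can be sketched as a closed loop. Everything in the sketch is an assumption made for readability: `ToyModel` stands in for the machine learning module, `toy_assay` stands in for the laboratory module, candidates are plain integers rather than molecules, and the termination criterion is a score threshold. A single synthesize-and-analyze call per round plays the role of both step b and step e. It is not the claimed transformer-based model or laboratory hardware.

```python
class ToyModel:
    """Hypothetical stand-in for the machine learning module."""

    def __init__(self, candidates):
        self.candidates = list(candidates)
        self.best = None  # best-scoring candidate seen so far

    def train(self, history):
        # Step c: update the model from (candidate, score) pairs.
        self.best = max(history, key=lambda record: record[1])[0]

    def propose(self, tested, batch_size):
        # Steps a and d: choose the next batch of untested candidates.
        untested = [c for c in self.candidates if c not in tested]
        if self.best is None:
            # Before any lab data exists, spread the batch evenly.
            step = max(1, len(untested) // batch_size)
            return untested[::step][:batch_size]
        # Afterwards, explore the neighbourhood of the current best.
        untested.sort(key=lambda c: abs(c - self.best))
        return untested[:batch_size]


def toy_assay(candidate, optimum=37):
    """Hypothetical laboratory measurement: higher is better, 0 is ideal."""
    return -abs(candidate - optimum)


def run_loop(model, assay, batch_size, target_score, max_rounds):
    history = []
    batch = model.propose(tested=set(), batch_size=batch_size)  # step a
    for _ in range(max_rounds):
        history.extend((c, assay(c)) for c in batch)            # steps b and e
        model.train(history)                                    # step c
        if max(score for _, score in history) >= target_score:  # termination
            break
        tested = {c for c, _ in history}
        batch = model.propose(tested, batch_size)               # step d
    return history


history = run_loop(ToyModel(range(100)), toy_assay,
                   batch_size=4, target_score=0, max_rounds=10)
lead = max(history, key=lambda record: record[1])[0]
print(lead, len(history))
```

In this toy setting the loop reaches the optimum candidate after five rounds of four measurements each. The `max_rounds` cap is a second, budget-style termination criterion added for safety; the claim requires only that at least one criterion be met.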
23. The AI-based method of claim 22, wherein the at least one termination criterion comprises at least one lead molecule candidate being identified for output via an output device associated with one or more of the machine learning module and the laboratory module.
24. The AI-based method of claim 22, wherein the molecule candidates are ionizable lipids.
25. The AI-based method of claim 22 or 23, wherein the properties of interest comprise one or more of pKa, hydrophobicity, hydrophilicity, pH and cargo molecule transfection efficiency.
26. The AI-based method of any one of claims 22-24, wherein training of the machine learning module comprises a pretraining stage with a generic dataset of molecules and a continual pretraining stage for refining model embeddings based on a domain-specific dataset of molecules.
27. A compound of General Formula I:
wherein:
R1 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, and optionally having a halogen at the terminal position or within the intermediate chain extension;
R2 is selected from acyclic amine, cyclic amine and heterocyclic amine, each of the acyclic amine, cyclic amine and heterocyclic amine having at least one ionizable nitrogen atom;
R3 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, optionally having a halogen at the terminal position or within the intermediate chain extension, and optionally having an unsubstituted or substituted C7-C30 cage hydrocarbon at the terminal position when the terminal position does not have a halogen; and
R4 is selected from C1-C30 alkyl and C2-C30 alkenyl, each of the C1-C30 alkyl and C2-C30 alkenyl being optionally interrupted by a hydrolyzable group, optionally having a halogen at the terminal position or within the intermediate chain extension, and optionally having an unsubstituted or substituted C7-C30 cage hydrocarbon at the terminal position when the terminal position does not have a halogen;
provided that at least one of R1, R3 and R4 has a halogen at the terminal position or within the intermediate chain extension.
28. The compound of claim 27, wherein R3 has a halogen at the terminal position.
29. The compound of claim 27 or claim 28, wherein the halogen is fluorine, chlorine or bromine.
30. The compound of any one of claims 27-29, wherein
R1 is:
R2 is:
and R4 is:
31. The compound of any one of claims 27-30, wherein the compound is
32. A lipid nanoparticle comprising the compound of any one of claims 27-31 and a cargo molecule.
33. The lipid nanoparticle of claim 32, wherein the cargo molecule is a protein, a peptide, or a nucleic acid.
34. The lipid of claim 33, wherein the nucleic acid is DNA, siRNA, tRNA, circRNA, miRNA, mRNA or a combination thereof.