Method and associated device for obtaining an optimized aptamer library's template

The optimized aptamer library template design and selection method addresses the limitations of SELEX by enhancing sensitivity and specificity through structural diversity evaluation and network analysis, achieving improved predictive capacity for biomarker identification.

WO2026132016A1 · PCT designated stage · Publication Date: 2026-06-25 · NEOVENTURES BIOTECH

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
NEOVENTURES BIOTECH
Filing Date
2025-12-17
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing aptamer selection methods, such as SELEX, face challenges in achieving high sensitivity and specificity due to variability in sequence frequency and structural complexity, leading to suboptimal performance in identifying biomarkers for diseases.

Method used

A computer-implemented method for designing an optimized aptamer library template with fixed and random nucleotide regions, followed by structural diversity evaluation and selection criteria, including secondary structure prediction and network analysis, to generate a diverse and specific aptamer library.

Benefits of technology

The method enhances aptamer performance by improving sensitivity and specificity, as demonstrated by improved predictive capacity in identifying biomarkers, particularly for brain amyloid deposition, with sensitivity of 0.88 and specificity of 0.76, surpassing SELEX-based approaches.


Figure EP2025087681_25062026_PF_FP_ABST

Abstract

The present invention relates to a computer-implemented method and associated device for designing an optimized aptamer library's template and to a method for generating at least one first secondary structure using the designed optimized aptamer library's template (50). The present invention further relates to a computer-implemented method and associated device for determining relatedness of aptamer's structures.

Description

METHOD AND ASSOCIATED DEVICE FOR OBTAINING AN OPTIMIZED APTAMER LIBRARY’S TEMPLATE

FIELD OF INVENTION

[0001] The present invention relates to aptamer development and selection methodologies. In particular, the invention relates to a computer-implemented method and associated device for designing an optimized aptamer library’s template and to a method for generating at least one first secondary structure using the designed optimized aptamer library’s template. The present invention further relates to a computer-implemented method and associated device for determining relatedness of aptamer’s structures.

BACKGROUND OF INVENTION

[0002] The current patent application, focusing on the neomer pipeline for reproducible aptamer selection, builds upon and extends the concepts and methods described in the previous two patent applications (WO2023/137559 and WO2023/137558).

[0003] The specification of WO2023/137559 introduced the concept of closed sequence solution space libraries for aptamer selection. It also described the use of these libraries in single-round selection processes based on statistical evaluation of sequence frequency changes between positive and negative selections.

[0004] The specification of WO2023/137558 further expanded on these concepts by introducing the Aptamarker platform, which uses aptamer libraries to identify unknown biomarkers in biological fluids or tissues. It also described the application of the neomer library approach to eliminate the first step of SELEX-based Aptamarker selection and to enable direct application of the same library to different samples based on phenotypic differences.

[0005] The current patent application extends and refines these concepts in several key ways:
• Description of an additional novel step in the design of optimized neomer libraries: This was not contemplated in the previous filings. This application provides a comprehensive explanation of the multi-step process used to generate structurally diverse and optimized DNA sequences for aptamer selection. It covers aspects such as primer design, fixed region composition, hybridization level assessment, and structural diversity maximization.
• Statistical framework for structure analysis: The neomer structure analysis pipeline utilizes established statistical methods for differential enrichment analysis, adapted specifically for aptamer structure data. This approach enables the identification of significantly enriched structures while accounting for data variability. The application of these statistical techniques to the analysis of aptamer secondary structures in the context of a closed sequence library represents a novel methodology in the field of aptamer selection and characterization.
• Advanced network analysis and visualization: The application introduces novel concepts for representing and interpreting relationships among selected aptamer structures. These network analysis techniques, combined with multi-dimensional data integration in network visualisation software, offer a powerful means of understanding the structural basis of aptamer-target interactions.
• Detailed sequence selection methodology: The current application provides a comprehensive description of the sequence selection process, including clustering, dimensionality reduction, and the use of various statistical metrics. This multi-tiered approach ensures the selection of high-performing aptamers that capture both structural and sequence diversity.

[0006] The Punnett square approach recalculated the original full sequence, positing that the secondary structure of each module was the primary determinant of aptamer performance. While this simplification does not fully capture the complexity of the aptamer’s performance in the selection, it resulted in the selection of high-performing aptamers that significantly outperformed those derived from SELEX.

[0007] The reproducible library approach demonstrated improved predictive capacity compared to the previous SELEX-based Aptamarker development method. To clarify the distinction, the previous SELEX-based approach, as described in earlier publications (Penner et al., 2021; Lecocq et al., 2018), involved a two-stage process:
• An initial enrichment step consisting of multiple rounds of selection against a pool of samples to create an enriched aptamer library.
• This enriched library was then aliquoted and applied in a further round of SELEX to individual samples.

[0008] Direct statistical comparisons between the two methods show that the SELEX approach resulted in a sensitivity of 0.55, a specificity of 0.77, and an AUC (area under the curve) of 0.75, while the neomer approach resulted in a model with a sensitivity of 0.88, a specificity of 0.76, and an AUC of 0.79 when applied to brain amyloid deposition prediction. These results were achieved using eight Aptamarkers, age, and sex as variables.

[0009] This improvement is likely due to the neomer approach's ability to maintain a consistent starting library across selections, enabling more accurate frequency comparisons between sample groups and potentially leading to the identification of more robust Aptamarkers. The successful application of the neomer approach to brain amyloid status prediction, with notably improved sensitivity and overall performance, demonstrates its potential for developing more effective Aptamarkers for various diseases and conditions. The underlying principle is that the signal from a highly selected module outweighs the noise from modules that might deprecate its function. This approach has demonstrated efficacy, but subsequent work has shown that while structural diversity is important for identifying aptamers specific to protein epitopes, statistical rigor is critical for evaluating discrete predictive characteristics of sequences, such as the structure of the full module.

[0010] This concept finds support in natural immune systems. In humans, the naive antibody repertoire is likely around 1E7, which is approximately in our range of structures and sequences we can consider as aptamers specific to a target (Briney et al., 2019). Although these are subsequently refined through somatic hypermutation, the principle remains that high specificity and sensitivity can be achieved with a limited number of structures.

SUMMARY

[0011] This invention thus relates to a computer-implemented method for designing an optimized aptamer library’s template comprising:
a. designing an aptamer library’s template comprising:
(i) regions of fixed nucleotides comprising (i.a) 5’ end region of fixed nucleotides, 3’ end regions of fixed nucleotides, and (i.b) internal regions of fixed nucleotides;
(ii) regions of random nucleotides;
wherein said aptamer library’s template comprises from 40 to 90 nucleotides, preferably from 70 to 80 nucleotides, and wherein said regions of random nucleotides comprise from 12 to 16 nucleotides;
b. generating multiple versions of said aptamer library’s template, wherein each of said multiple versions satisfies the following criteria:
• said 5’ end region of fixed nucleotides and 3’ end regions of fixed nucleotides comprise from 8 to 16 nucleotides each;
• said internal regions of fixed nucleotides have a G/C content from 40 to 60%;
• said version contains a maximum of three consecutive G’s;
c. predicting one or more secondary structures of each of said multiple versions of said aptamer library’s template;
d. evaluating structural diversity of each of said multiple versions comprising:
• counting a number of secondary structures predicted;
• evaluating complexity of secondary structures predicted;
e. evaluating distribution of sequences within said multiple versions for comprising each secondary structure;
f. selecting said optimized aptamer library’s template among said multiple versions as the one satisfying the following predefined criteria:
• the number of secondary structures predicted is greater than 100,000;
• the complexity of secondary structures is selected from the group comprising: highly hybridized, mediumly hybridized, and lowly hybridized;
• said distribution of sequences within said multiple versions per secondary structure has a median greater than 10.
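The per-version criteria of step b can be expressed as a simple filter. The following is a minimal sketch and not the applicant's implementation: the function names, the representation of a template version as separate 5’ end, internal, and 3’ end strings, the choice to apply the G/C window to each internal region individually, and the choice to test the G-run limit on the fixed regions only are all assumptions made for illustration.

```python
import re


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def longest_g_run(seq: str) -> int:
    """Length of the longest run of consecutive G's in a sequence."""
    runs = re.findall(r"G+", seq.upper())
    return max((len(run) for run in runs), default=0)


def satisfies_step_b(five_prime: str, internals: list[str], three_prime: str) -> bool:
    """Check one template version against the three criteria of step b.

    Assumption: the G/C window applies to every internal fixed region
    separately, and the G-run limit is tested on each fixed region.
    """
    # 5' and 3' end regions of fixed nucleotides: 8 to 16 nucleotides each
    ends_ok = all(8 <= len(end) <= 16 for end in (five_prime, three_prime))
    # Internal fixed regions: G/C content from 40 to 60%
    gc_ok = all(0.40 <= gc_fraction(region) <= 0.60 for region in internals)
    # At most three consecutive G's
    g_run_ok = all(
        longest_g_run(region) <= 3
        for region in (five_prime, *internals, three_prime)
    )
    return ends_ok and gc_ok and g_run_ok
```

A candidate version would be kept for the structure prediction of step c only if `satisfies_step_b` returns `True`; the example sequences used to exercise the function are arbitrary.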

[0012] According to one embodiment, the computer-implemented method for designing an optimized aptamer library’s template comprises:
a. designing an aptamer library’s template comprising:
(i) regions of fixed nucleotides comprising (i.a) 5’ end region of fixed nucleotides, 3’ end regions of fixed nucleotides, and (i.b) internal regions of fixed nucleotides;
(ii) regions of random nucleotides;
wherein said aptamer library’s template comprises from 40 to 90 nucleotides, preferably from 70 to 80 nucleotides, and wherein said regions of random nucleotides comprise from 12 to 16 nucleotides;
b. generating multiple versions of said aptamer library’s template, wherein each of said multiple versions satisfies the following criteria:
• said 5’ end region of fixed nucleotides and 3’ end regions of fixed nucleotides comprise from 8 to 16 nucleotides each;
• said internal regions of fixed nucleotides have a G/C content from 40 to 60%;
• said version contains a maximum of three consecutive G’s;
c. predicting one or more secondary structures of each of said multiple versions of said aptamer library’s template;
d. evaluating structural diversity of each of said multiple versions comprising:
• counting a number of secondary structures predicted;
• evaluating complexity of secondary structures predicted;
e. evaluating distribution of sequences within said multiple versions for comprising each secondary structure;
f. selecting said optimized aptamer library’s template among said multiple versions as the one satisfying the following predefined criteria:
• the number of secondary structures predicted is greater than 100,000;
• a predefined complexity criterion based on the complexity of secondary structures, or alternatively the complexity of secondary structures is associated with a predefined level of hybridization (e.g., said level of hybridization may be chosen from the following: highly hybridized, mediumly hybridized, and lowly hybridized);
• said distribution of sequences within said multiple versions per secondary structure has a median greater than 10.
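The two numerical criteria of step f (more than 100,000 predicted secondary structures, and a median number of sequences per structure greater than 10) can be checked as sketched below. This is an illustrative sketch only: the function name `passes_selection` and the input format, a mapping from each predicted structure in dot-bracket notation to the number of sequences folding into it, are assumptions. The complexity criterion is deliberately left out because the text does not define it numerically.

```python
from statistics import median


def passes_selection(
    structure_counts: dict[str, int],
    min_structures: int = 100_000,
    min_median: float = 10,
) -> bool:
    """Apply the numerical criteria of step f to one template version.

    structure_counts maps a predicted secondary structure to the number
    of library sequences predicted to adopt that structure (assumed format).
    """
    if not structure_counts:
        return False
    n_structures = len(structure_counts)  # number of distinct structures
    median_per_structure = median(structure_counts.values())
    return n_structures > min_structures and median_per_structure > min_median
```

The thresholds are exposed as parameters so that the same check can be exercised on a small toy mapping, for example `passes_selection({"((..))": 12, "(....)": 15, "......": 11}, min_structures=2)`.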

[0013] According to one embodiment, said structural diversity is evaluated using software selected from: RNAFold, MFold, or MXFold.

[0014] The present invention further relates to a method for generating at least one first secondary structure using an optimized aptamer library’s template, designed with a computer-implemented method for designing an optimized aptamer library’s template according to any embodiment of the present invention, against a first target molecule, said method for generating at least one first secondary structure comprising the following steps:
a. generating, from said optimized aptamer library’s template, an optimized aptamer library comprising multiple sequences of aptamer;
b. applying said optimized aptamer library from step a to said first target molecule, wherein said optimized aptamer library comprises an average copy number per sequence from 100 to 1E7;
c. removing, through a wash step, sequences of said optimized aptamer library which remained unbound to said first target molecule from step b;
d. eluting sequences of said optimized aptamer library which bound to said first target molecule from step b so as to obtain a first selected optimized aptamer library;
e. proceeding to NGS analysis of said first selected optimized aptamer library;
f. proceeding to NGS analysis of said optimized aptamer library of step a;
g. evaluating secondary structure of each sequence in each library, said first selected optimized aptamer library and said optimized aptamer library, and counting the number of each secondary structure;
h. comparing the number of each secondary structure of the first selected optimized aptamer library to the number of each secondary structure of the optimized aptamer library;
i. identifying said at least one first secondary structure, being at least one secondary structure defined on the basis of p-values and on the basis of fold difference between the number of each secondary structure of said optimized aptamer library and the number of each secondary structure of said first selected optimized aptamer library.
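Steps g to i amount to counting structures in the selected and unselected libraries and ranking them by fold difference and p-value. The sketch below shows one possible way to do this with the Python standard library. Several points are assumptions rather than statements of the claimed method: the inputs are lists holding one predicted dot-bracket string per sequenced read, Fisher's exact test is used as the source of p-values, a pseudocount of 1 is added to avoid division by zero, and the thresholds `min_log2_fc` and `alpha` are arbitrary illustrative defaults. The exact test as written is only practical for small counts.

```python
from collections import Counter
from math import comb, log2


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    p_value = sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def enriched_structures(selected, naive, min_log2_fc=1.0, alpha=0.05):
    """Return (structure, log2 fold change, p-value) for enriched structures."""
    sel, nai = Counter(selected), Counter(naive)  # step g: count each structure
    n_sel, n_nai = sum(sel.values()), sum(nai.values())
    hits = []
    for structure in sel.keys() | nai.keys():  # step h: compare the counts
        a, c = sel[structure], nai[structure]
        # Pseudocount of 1 (assumption) keeps the ratio finite for zero counts
        log2_fc = log2(((a + 1) / (n_sel + 1)) / ((c + 1) / (n_nai + 1)))
        p_value = fisher_exact_two_sided(a, n_sel - a, c, n_nai - c)
        if log2_fc >= min_log2_fc and p_value <= alpha:  # step i: identify
            hits.append((structure, log2_fc, p_value))
    return sorted(hits, key=lambda hit: hit[2])
```

With replicated treatments, a count-based model such as the negative binomial approach described for DESeq2 in the definitions would typically replace the single 2x2 test used here.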

[0015] According to other advantageous aspects of the invention, the method for generating at least one first secondary structure comprises one or more of the features described in the following embodiments, taken alone or in any possible combination.

[0016] According to one embodiment, said first target molecule is immobilized.

[0017] According to one embodiment, step i further comprises determining an optimal aptamer sequence by determining which sequence exhibits the strongest response to selection.

[0018] According to one embodiment, said method for generating at least one first secondary structure using an optimized aptamer library’s template further comprises the following steps:
• applying said at least one first secondary structure obtained from step i to at least one second target molecule;
• evaluating a specificity of said at least one first secondary structure to bind to said at least one second target molecule.

[0019] According to one embodiment, said optimized aptamer library is applied to a target molecule, said target molecule being said first target molecule or said second target molecule, in separate replicated treatments, and a naive form of said optimized aptamer library is also evaluated in replicated treatments, whereby the number of such replicates is a minimum of two for each, and a statistical significance is determined for said fold differences identified for aptamer structures between said first selected optimized aptamer library and said unselected optimized aptamer library (i.e. naive library).

[0020] The present invention further relates to a computer-implemented method for determining relatedness of aptamer’s structures based on the following steps:
• evaluating single nucleotide position difference between predicted secondary structures at any given position, said single nucleotide position difference being defined as a unit of distance;
• generating, as network analysis, at least one structural network topology, comprising: applying a structural distance threshold to each of said predicted secondary structures to identify related structures whose structural distance from said secondary structure is equal to or less than said structural distance threshold; and constructing a network topology that represents structural relationships and similarities among said selected secondary structures or their related structures.
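The two steps above can be sketched as a position-by-position comparison of dot-bracket strings followed by a thresholded graph construction. The code below is an illustrative sketch under stated assumptions: structures are unique dot-bracket strings of equal length, the per-position difference count is computed as a Hamming distance, and the function names and the adjacency-set representation of the network are hypothetical.

```python
from itertools import combinations


def structural_distance(s1: str, s2: str) -> int:
    """Number of positions at which two equal-length structures differ."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def build_network(structures, threshold):
    """Connect every pair of structures whose distance is <= threshold."""
    edges = {s: set() for s in structures}
    for s1, s2 in combinations(structures, 2):
        if structural_distance(s1, s2) <= threshold:
            edges[s1].add(s2)
            edges[s2].add(s1)
    return edges


def connected_components(edges):
    """Group structures into networks (connected components of the graph)."""
    seen, components = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from the start node
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        components.append(component)
    return components
```

Each component returned by `connected_components` corresponds to one group of related structures; the pairwise comparison is quadratic in the number of structures, so a real pipeline operating on a large structure set would need a more scalable neighbour search.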

[0021] According to other advantageous aspects of the invention, the computer-implemented method for determining relatedness of aptamer’s structures comprises one or more of the features described in the following embodiments, taken alone or in any possible combination.

[0022] According to one embodiment, said method for determining relatedness of aptamer’s structures further comprises:
• using said at least one structural network topology as a basis for the hypothesis that each network contains a shared structural solution to the problem of binding to a specific epitope on a protein, and that different networks represent different structural solutions; this may mean different epitopes or the same epitope on a protein; and/or
• selecting from the network analysis secondary structures (34) that show appropriate levels of sensitivity and specificity through integration of information on how said secondary structures performed in selections across different targets and analyses.

[0023] According to one embodiment, an additional level of information is added that provides information as to which elements of a secondary structure are necessary for binding and which are not, for use in truncating said secondary structure to a minimally effective form, comprising the steps of:
a. identifying structures in a structure reference array, that is generated by predicting the most probable secondary structure for all possible sequences from the optimized aptamer library at a defined temperature and salt concentration appropriate for selection, that are within a structural distance threshold of one of said selected secondary structures from the network analysis;
b. determining an effect of selection on other secondary structures at a distance equal to or less than said structural distance threshold from one of said secondary structures selected from the network analysis;
c. identifying structural elements within said secondary structures denoted from step b using a sliding window technique where a window size ranges between 15 and 45 nucleotides;
d. determining, among said structural elements, critical structural elements by analyzing those that perform positively and negatively in selection against a target molecule, as defined by Fisher’s exact test derived p-values;
e. isolating positively performing critical structural elements by truncating said secondary structure to retain only said positively performing critical structural elements;
f. ensuring structural integrity of said truncated secondary structure is maintained by:
• preserving complete hybridized portions of the secondary structure identified as critical structural elements for overall structural stability and functionality;
• removing redundant or unstructured dangling ends that do not contribute to stability or binding efficiency of said secondary structure.
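The sliding window technique of step c can be illustrated as follows. This is a sketch under assumptions, not the described pipeline: structures are given in dot-bracket notation, every window size from 15 to 45 nucleotides is enumerated, and the balance check used to keep only windows that do not cut through a base-paired stem is one possible reading of the requirement in step f to preserve complete hybridized portions. Both function names are hypothetical.

```python
def sliding_windows(structure: str, min_size: int = 15, max_size: int = 45):
    """Yield (start, size, substructure) for every window of each allowed size."""
    length = len(structure)
    for size in range(min_size, min(max_size, length) + 1):
        for start in range(length - size + 1):
            yield start, size, structure[start:start + size]


def is_balanced(substructure: str) -> bool:
    """True if every '(' in the window is closed by a ')' inside the window."""
    depth = 0
    for symbol in substructure:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:  # closing bracket whose partner lies outside the window
                return False
    return depth == 0


def candidate_elements(structure: str, min_size: int = 15, max_size: int = 45):
    """Windows that contain only complete base-paired stems (assumed filter)."""
    return [
        window
        for window in sliding_windows(structure, min_size, max_size)
        if is_balanced(window[2])
    ]
```

The windows returned by `candidate_elements` would then be scored for positive or negative performance in selection, for example with the Fisher's exact test named in step d.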

[0024] In addition, the disclosure relates to a device adapted to perform a computer-implemented method for designing an optimized aptamer library’s template compliant with any of the above execution modes when the program is executed by a processor.

[0025] In addition, the disclosure relates to a device adapted to perform a computer-implemented method for determining relatedness of aptamer’s structures compliant with any of the above execution modes when the program is executed by a processor.

[0026] The present disclosure further pertains to a non-transitory program storage device, readable by a computer, tangibly embodying a program of instructions executable by the computer to perform any of the computer-implemented methods herein disclosed.

[0027] Such a non-transitory program storage device can be, without limitation, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor device, or any suitable combination of the foregoing. It is to be appreciated that the following, while providing more specific examples, is merely an illustrative and not exhaustive listing as readily appreciated by one of ordinary skill in the art: a portable computer diskette, a hard disk, a ROM, an EPROM (Erasable Programmable ROM) or a Flash memory, a portable CD-ROM (Compact-Disc ROM).

[0028] The present disclosure further relates to a method for generating at least one first secondary structure using an aptamer library’s template against a first target molecule, said method comprising the following steps:
a. receiving said aptamer library’s template, an aptamer library comprising multiple sequences of aptamer;
b. applying said aptamer library from step a to said first target molecule, wherein said aptamer library comprises an average copy number per sequence from 100 to 1E7;
c. removing, through a wash step, sequences of said aptamer library which remained unbound to said first target molecule from step b;
d. eluting sequences of said aptamer library which bound to said first target molecule from step b so as to obtain a first selected aptamer library;
e. proceeding to NGS analysis of said first selected aptamer library;
f. proceeding to NGS analysis of said aptamer library of step a;
g. evaluating secondary structure of each sequence in each library, said first selected aptamer library and said aptamer library, and counting the number of each secondary structure;
h. comparing the number of each secondary structure of the first selected aptamer library to the number of each secondary structure of the aptamer library;
i. identifying said at least one first secondary structure, being at least one secondary structure defined on the basis of p-values and on the basis of fold difference between the number of each secondary structure of said aptamer library and the number of each secondary structure of said first selected aptamer library.

[0029] According to one embodiment, said first target molecule is immobilized.

[0030] According to one embodiment, step i further comprises determining an optimal aptamer sequence by determining which sequence exhibits the strongest response to selection.

[0031] According to one embodiment, said method for generating at least one first secondary structure using an aptamer library’s template further comprises the following steps:
• applying said at least one first secondary structure obtained from step i to at least one second target molecule;
• evaluating a specificity of said at least one first secondary structure to bind to said at least one second target molecule.

[0032] According to one embodiment, said aptamer library is applied to a target molecule, said target molecule being said first target molecule or said second target molecule, in separate replicated treatments, and a naive form of said aptamer library is also evaluated in replicated treatments, whereby the number of such replicates is a minimum of two for each, and a statistical significance is determined for said fold differences identified for aptamer structures between said first selected aptamer library and said unselected aptamer library (i.e. naive library).

[0033] According to one embodiment, the first target molecule is not known and/or the aptamer library is applied to samples that differ in terms of a medical contrast or medical state, and/or where the enrichment of aptamer structures is used to predict said medical contrast or medical state.

DEFINITIONS

[0034] In the present invention, the following terms have the following meanings:

[0035] The terms “adapted” and “configured” are used in the present disclosure as broadly encompassing initial configuration, later adaptation or complementation of the present device, or any combination thereof alike, whether effected through material or software means (including firmware).

[0036] The term “processor” should not be construed to be restricted to hardware capable of executing software, and refers in a general way to a processing device, which can for example include a computer, a microprocessor, an integrated circuit, or a programmable logic device (PLD). The processor may also encompass one or more Graphics Processing Units (GPU), whether exploited for computer graphics and image processing or other functions. Additionally, the instructions and/or data enabling to perform associated and/or resulting functionalities may be stored on any processor-readable medium such as, e.g., an integrated circuit, a hard disk, a CD (Compact Disc), an optical disc such as a DVD (Digital Versatile Disc), a RAM (Random-Access Memory) or a ROM (Read-Only Memory). Instructions may be notably stored in hardware, software, firmware or in any combination thereof.

[0037] “Machine learning (ML)” designates in a traditional way computer algorithms improving automatically through experience, on the ground of training data enabling to adjust parameters of computer models through gap reductions between expected outputs extracted from the training data and evaluated outputs computed by the computer models.

[0038] The above ML definitions are compliant with their usual meaning, and can be completed with numerous associated features and properties, and definitions of related numerical objects, well known to a person skilled in the ML field. Additional terms will be defined, specified or commented wherever useful throughout the following description.

[0039] “Amplification bias”: The preferential amplification of certain sequences over others during PCR, leading to an overrepresentation of those sequences in the resulting pool.

[0040] “Aptamarker”: A term coined to describe aptamers that are predictive of complex diseases due to their ability to agnostically identify non-canonical biomolecules.

[0041] “Aptamers”: Short, single-stranded oligonucleotides (DNA or RNA) that fold into specific three-dimensional structures and bind to target molecules with high affinity and specificity.

[0042] “Asymmetric topology”: A property of a network graph in which the edges between nodes have a directionality or unequal weights, leading to an imbalance in the connectivity patterns.

[0043] “Base pairing”: The specific hydrogen bonding between complementary nucleotide bases (A-T or A-U, C-G) in double-stranded DNA or RNA molecules.

[0044] “Binding solutions”: The specific interactions and conformations that allow an aptamer to bind to its target molecule with high affinity and specificity.

[0045] “Bulge formations”: A type of secondary structure motif in which unpaired nucleotides are present on one strand of a double-stranded region.

[0046] “Copy number”: The number of identical copies of a specific sequence present in a library or pool of aptamers.

[0047] “Count matrix”: A table that contains the number of occurrences (counts) of each unique sequence or feature in a dataset, often used in NGS data analysis.

[0048] “Cytoscape”: A popular open-source software platform for visualizing and analyzing complex networks and biological pathways, often used in the study of gene expression, protein interactions, and other omics data.

[0049] “Dangling end parameters”: Thermodynamic parameters that describe the stability contributions of unpaired nucleotides adjacent to a base-paired region in a nucleic acid secondary structure.

[0050] “DESeq2”: A statistical package for analyzing count-based NGS data, such as RNA-seq or DNA-seq experiments. It uses negative binomial generalized linear models to test for differential expression or enrichment between conditions.

[0051] “Differential enrichment”: The process of identifying sequences or features that are significantly overrepresented or underrepresented in one sample compared to another, often used in the analysis of SELEX or other selection data.

[0052] “Dimensionality reduction”: A set of techniques used to reduce the number of variables or features in a high-dimensional dataset while preserving the most important information, often used to simplify complex data and facilitate visualization or downstream analysis.

[0053] “Dispersion estimation”: A statistical method used to quantify the variability of count data, often employed in the analysis of RNA sequencing or aptamer selection datasets.

[0054] “Dot-bracket notation”: A simple and widely used format to represent the secondary structure of RNA or DNA sequences, where unpaired bases are denoted by dots and paired bases by matching parentheses.
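As an illustration of this notation, the short sketch below converts a dot-bracket string into a list of paired positions. It is a hypothetical helper, not part of the described pipeline, and it assumes the basic three-symbol alphabet (dots and round parentheses) without pseudoknot symbols.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the 0-based (opening, closing) positions of every base pair."""
    stack, pairs = [], []
    for position, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(position)  # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {position}")
            pairs.append((stack.pop(), position))  # pair with the latest '('
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {position}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For the string `((..))`, the outer pair joins positions 0 and 5 and the inner pair joins positions 1 and 4, with positions 2 and 3 unpaired.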

[0055] “Dynamic programming algorithm”: A computational approach that solves complex problems by breaking them down into simpler subproblems and storing the solutions to avoid redundant calculations.

[0056] “EdgeR”: A popular software package for the analysis of RNA-seq expression data. Like DESeq2, it uses empirical Bayes estimation and exact tests based on the negative binomial distribution to determine differential expression.

[0057] “Edges”: The lines or connections between nodes in a network graph, representing the relationships or interactions between the entities.

[0058] “Effect size”: A quantitative measure of the magnitude of a phenomenon, such as the difference between two groups or the strength of a relationship between variables.

[0059] “Epitope”: The specific region of an antigen that is recognized and bound by an antibody or aptamer.

[0060] “False discovery rate (FDR)”: A method of conceptualizing the rate of type I errors (false positives) in null hypothesis testing when conducting multiple comparisons. In aptamer selection, FDR correction helps control for the increased likelihood of false positives when testing many sequences or structures simultaneously.
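One widely used FDR correction is the Benjamini-Hochberg procedure. The text does not state which correction is applied, so the sketch below is an assumption offered only to make the idea concrete; the function name is hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda index: p_values[index])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

A structure would then be reported as significant when its adjusted p-value falls below the chosen FDR level, rather than when its raw p-value does.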

[0061] “Fisher's exact test”: A statistical significance test used for the analysis of contingency tables, particularly in cases with small sample sizes. In aptamer selection, it can be used to compare the frequency of specific motifs or structures between selected and naive libraries.

[0062] “Fold change”: A measure of the relative change in abundance or expression of a sequence or feature between two conditions, calculated as the ratio of the abundance in one condition to the abundance in the other condition.

[0063] “Functional relationships”: The meaningful connections or interactions between entities in a biological system, such as the binding interactions between aptamers and their targets or the regulatory relationships between genes.

[0064] “GC content”: The percentage of guanine (G) and cytosine (C) bases in a DNA or RNA sequence.

[0065] “Hairpin loops”: A common secondary structure motif in which a single-stranded region of nucleotides is enclosed by a base-paired stem.

[0066] “Hamming distance”: A metric used to quantify the difference between two character strings of equal length by counting the number of positions at which the corresponding symbols differ.

[0067] “Hex codes”: In the context of the neomer pipeline, these refer to unique nucleotide sequences added to selected libraries to serve as identifiers for different experimental conditions or replicates during next-generation sequencing.

[0068] “Hybridization levels”: The extent to which complementary nucleic acid strands form base pairs and associate into a double-stranded molecule.

[0069] “Internal loops”: A secondary structure motif in which unpaired nucleotides are present on both strands of a double-stranded region, interrupting the continuity of the duplex.

[0070] “Library”: A large, diverse pool of oligonucleotides used as the starting material for aptamer selection.

[0071] “Log2Fold change”: The logarithm (base 2) of the fold change, commonly used in the analysis of gene expression or sequence enrichment data to facilitate visualization and interpretation.
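The two definitions above (fold change and its base-2 logarithm) can be sketched as follows. The function names and the counts are hypothetical, and the sketch assumes non-zero counts; real pipelines typically handle zero counts and normalization separately.

```python
import math


def fold_change(abundance_a, abundance_b):
    """Ratio of the abundance in condition A to the abundance in condition B."""
    return abundance_a / abundance_b


def log2_fold_change(abundance_a, abundance_b):
    """Base-2 logarithm of the fold change; symmetric around zero."""
    return math.log2(fold_change(abundance_a, abundance_b))


# Hypothetical normalized counts of one structure: selected vs naive.
enrichment = log2_fold_change(80, 10)
depletion = log2_fold_change(10, 80)
```

The logarithmic scale makes an eightfold enrichment and an eightfold depletion equidistant from zero, which is why it is convenient for volcano plots.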

[0072] “Loop entropy parameters”: Thermodynamic parameters that account for the entropic cost of forming loops in nucleic acid secondary structures, used in the Turner energy model.

[0073] “Loop formations”: Unpaired nucleotides within a secondary structure, such as hairpin loops, internal loops, or bulges.

[0074] “Melting temperature”: The temperature at which half of the DNA or RNA molecules in a sample are in a single-stranded state, and the other half are in a double-stranded state.

[0075] “Minimum free energy (MFE)”: The lowest possible energy state of a secondary structure formed by a nucleic acid sequence, representing the most thermodynamically stable configuration.

[0076] “Multi-branched loops”: A secondary structure motif in which more than two helices are joined together by unpaired nucleotides.

[0077] “Nearest-neighbor thermodynamic parameters”: A set of energy values that describe the stability contributions of adjacent base pairs in a nucleic acid duplex, used in the Turner energy model.

[0078] “Neo-939 library”: A specific neomer library designed for aptamer selection, optimized for structural diversity and sequence composition.

[0079] “Neomer library”: A designed library of oligonucleotides with optimized sequence diversity and structural complexity for improved aptamer selection.

[0080] “Network” (in network analysis): Groups of related nodes connected by edges in a network graph, representing similar binding solutions or structural motifs in the context of aptamer selection.

[0081] “Network analysis”: A set of techniques used to study the relationships and interactions between entities in a complex system, often represented as graphs with nodes and edges.
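As a hedged illustration of nodes, edges and networks, the sketch below links two nodes whenever their labels differ at no more than one position and then collects the connected groups. The edge rule, the threshold `max_distance`, the function names and the example labels are all assumptions for this sketch; the relatedness measure actually used is not defined in this glossary.

```python
from collections import deque


def hamming(first, second):
    """Number of differing positions between two equal-length labels."""
    return sum(x != y for x, y in zip(first, second))


def connected_groups(nodes, max_distance=1):
    """Group unique, equal-length node labels into connected networks."""
    # Hypothetical edge rule: two nodes are linked if they are close enough.
    neighbours = {
        node: [other for other in nodes
               if other != node and hamming(node, other) <= max_distance]
        for node in nodes
    }
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        # Breadth-first traversal collects everything reachable from start.
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in neighbours[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups


groups = connected_groups(["AAAA", "AAAT", "AATT", "GGGG"])
```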

[0082] “Next-generation sequencing” (NGS): High-throughput sequencing technologies that allow for the rapid and parallel sequencing of large numbers of nucleic acid fragments.

[0083] “Nodes”: The fundamental units in a network graph, representing the entities (such as sequences, structures, or genes) being studied.

[0084] “Non-canonical proteins”: Proteins that are not commonly studied or well-characterized, often due to their low abundance, unique isoforms, or post-translational modifications.

[0085] “Normalization procedures”: Statistical methods used to adjust for differences in sequencing depth or other technical factors between samples, allowing for more accurate comparisons of sequence abundance.

[0086] “Nucleic acid” or “Polynucleic” refers to a polymer of nucleotides covalently linked by phosphodiester bonds, such as deoxyribonucleic acids (DNA) or ribonucleic acids (RNA), in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides.

[0087] “P-value”: A statistical measure that indicates the probability of obtaining test results at least as extreme as the observed results, assuming that the null hypothesis is true. In the context of aptamer selection, a low p-value suggests that the observed enrichment of a particular structure or sequence is unlikely to have occurred by chance.

[0088] “Pattern matching algorithms”: Computational methods used to find occurrences of a specific pattern within a larger sequence or dataset.

[0089] “Post-translational modifications”: Covalent modifications of proteins after their synthesis, such as phosphorylation, glycosylation, or ubiquitination, which can alter their function or stability.

[0090] “Power analysis”: A statistical method used to determine the minimum sample size required to detect an effect of a given size with a specified level of confidence.

[0091] “Primer3”: A widely used software tool for designing PCR primers, considering various parameters such as melting temperature, GC content, and potential for secondary structure formation.

[0092] “Principal Component Analysis” (PCA): A widely used dimensionality reduction method that transforms a set of correlated variables into a smaller number of uncorrelated variables called principal components, which capture the maximum amount of variance in the original data.

[0093] “Proteomics”: The large-scale study and quantitation of proteins, particularly their structures and functions.

[0094] “Punnett square approach”: A method used in the Alzheimer's disease study (Meehan et al., 2024) to generate a library of aptamers with 16 random nucleotides.

[0095] “qPCR” (quantitative polymerase chain reaction): A laboratory technique used to amplify and quantify targeted DNA molecules, often used for gene expression analysis or diagnostic applications.

[0096] “Regular expressions”: A sequence of characters that define a search pattern, primarily used for pattern matching and text processing.

[0097] “RNAfold”: A computational tool used to predict the secondary structure of single-stranded RNA or DNA sequences.
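As an illustrative sketch, a regular expression can locate simple motifs in a predicted secondary structure written in dot-bracket notation, where "(" and ")" mark paired positions and "." marks unpaired positions. The pattern below matches a hairpin loop, that is, a run of unpaired positions enclosed directly by one opening and one closing bracket. The function name and the example string are hypothetical.

```python
import re

# One "(" followed by one or more "." and then one ")" is a hairpin loop.
HAIRPIN_LOOP = re.compile(r"\((\.+)\)")


def hairpin_loop_lengths(dot_bracket):
    """Return the number of unpaired positions in each hairpin loop."""
    return [len(loop) for loop in HAIRPIN_LOOP.findall(dot_bracket)]


lengths = hairpin_loop_lengths("((((...))))..((....))")
```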

[0098] “Salt correction factors”: Mathematical factors used to adjust the thermodynamic parameters for nucleic acid secondary structure prediction based on the salt concentration of the solution.

[0099] “Secondary structures”: The base-pairing interactions within a single-stranded RNA or DNA molecule, often represented by stem-loops, pseudoknots, or other structural motifs.

[0100] “SELEX” (Systematic Evolution of Ligands by Exponential enrichment): An iterative process used to identify aptamers that bind to a specific target molecule with high affinity and specificity.

[0101] “Sequence diversity”: The number of unique sequences present in a library or pool of aptamers.

[0102] “Shannon Diversity Index” (SDI): A quantitative measure that reflects the number of different types (such as different nucleotide bases or structural motifs) in a dataset, while also taking into account the evenness of the distribution of these types.

[0103] “Stacking interactions”: The noncovalent, attractive forces between adjacent base pairs in a nucleic acid double helix, contributing to the stability of the structure.

[0104] “Standard deviation”: A measure of the amount of variation or dispersion in a set of data values, often used to quantify the spread of a distribution.

[0105] “Statistical inference”: The process of drawing conclusions about a population based on data from a sample.

[0106] “Stem-loop structure”: A common structural motif in RNA or DNA molecules, consisting of a double-stranded stem and a single-stranded loop.

[0107] “Structural redundancy”: A library design with reduced random nucleotide number that exhibits a characteristic where multiple sequences form the same secondary structures such that the number of secondary structures is an order of magnitude lower than the total possible sequences (i.e. 1, 10, 100, 1000 times more possible sequences than structures).

[0108] “Subnetwork”: Nodes in a network graph that are connected to a large number of other nodes, often representing key entities or hubs in the system being studied.

[0109] “Symmetrical "hairball" structure”: A type of network graph in which the nodes are highly interconnected without any clear pattern or organization, making it difficult to interpret the underlying relationships or functional significance.

[0110] “t-Distributed Stochastic Neighbor Embedding” (t-SNE): A non-linear dimensionality reduction technique used for visualizing high-dimensional data in a lower-dimensional space, often used for exploring and visualizing clusters in complex datasets.

[0111] “Temperature-dependent corrections”: Adjustments made to the thermodynamic parameters for nucleic acid secondary structure prediction to account for the effect of temperature on the stability of the structure.

[0112] “Theophylline”: A methylxanthine compound structurally similar to caffeine, often used as a target molecule in aptamer selection studies.

[0113] “Transcriptomics”: The study of the complete set of RNA transcripts produced by the genome under specific circumstances or in a specific cell, using high-throughput methods such as RNA sequencing.

[0114] “tRNA” (transfer RNA): Small RNA molecules that transport amino acids to ribosomes during protein synthesis.

[0115] “Turner energy model”: A widely used set of thermodynamic parameters for predicting the stability of RNA secondary structures, based on experimentally determined free energy contributions of various structural motifs.

[0116] “Z-score”: A statistical measure that indicates how many standard deviations an observation or data point is from the mean of a dataset. In aptamer selection, Z-scores can be used to identify structures or sequences that are significantly enriched compared to the background.

BRIEF DESCRIPTION OF THE DRAWINGS

[0117] The present disclosure will be better understood, and other specific features and advantages will emerge upon reading the following description of particular and non-restrictive illustrative embodiments, the description making reference to the annexed drawings wherein:

[0118] Figure 1 is a flow chart showing successive steps of a computer-implemented method for designing an optimized aptamer library’s template, compliant with the present disclosure;

[0119] Figure 2 is a flow chart showing successive steps of a method for generating at least one first secondary structure, compliant with the present disclosure;

[0120] Figure 3 is a flow chart showing successive steps of a method for determining relatedness of aptamer’s structures, compliant with the present disclosure;

[0121] Figure 4 is a block diagram representing schematically a particular mode of a device for designing an optimized aptamer library’s template, compliant with the present disclosure;

[0122] Figure 5 is a block diagram representing schematically a particular mode of a device for determining relatedness of aptamer’s structures, compliant with the present disclosure;

[0123] Figure 6 diagrammatically shows an apparatus integrating the functions of the device of Figure 4 and of the device of Figure 5;

[0124] Figure 7 is an illustration of the explanation of structure annotation;

[0125] Figure 8 is a heatmap showing enrichment of SDI per position across multiple numbers of random nucleotides in a neomer template and a Somalogics SELEX library;

[0126] Figure 9 is a histogram and densigram of logarithmically transformed counts of possible sequences per secondary structure in the optimized 12 nucleotide random library (Neomer Hyb939);

[0127] Figure 10 is a plot illustrating the dispersion estimate fitting of structure counts for Hyb939 IL6 using DESeq2;

[0128] Figure 11 is a plot of the principal component analysis, a dimensionality reduction technique that can be used to show how samples cluster according to their variability. Here, the IL-6 sample conditions (bottom right) show high concordance with each other based on structure enrichment patterns, while naive samples cluster in the middle left and negative controls in the top right;

[0129] Figure 12 is a volcano plot of target conditions (here IL6) vs naive, showing significant fold changes and p-values that can be used to select differentially enriched structures;

[0130] Figure 13 is a Venn diagram of all differentially compared conditions: IL6 vs Naive (I6vsNlfr011c05), IL6 vs HSA (I6vsNAfr011c05) and IL6 vs IgG (I6vsNGfr011c05). Overlapping the differentially enriched structures allows the identification of structures specific to IL6 with no cross-reactivity;

[0131] Figure 14 is an illustration of the asymmetrical network topology for IL6 selected structures;

[0132] Figure 15 is an illustration of the asymmetric structure network analysis of inferred different binding solutions. Nodes indicated by boxes are candidate secondary structures for sequence analysis and subsequent truncation that are inferred to be from different binding solutions (Table 2). These also have the potential to be filtered and screened for other metrics such as log2fold enrichment in other selections that have used Hyb939. Red dots are those that passed filter checks, while yellow dots are those selected for different epitope binding;

[0133] Figure 16 is a volcano plot of log2fold change and log-transformed p-values from the sequence enrichment analysis, showing several enriched sequences for IL6 vs Naive;

[0134] Figure 17 is a diagram showing the truncation of the IL6 aptamer IL6-D 1 from motif analysis;

[0135] Figure 18 is a diagram showing the truncation of the IL6 aptamer IL6-D 2 from motif analysis;

[0136] Figure 19 is a diagram showing the truncation of the IL6 aptamer IL6-D 3 from motif analysis;

[0137] Figure 20 is a diagram showing the truncation of the IL6 aptamer IL6-D 4 from motif analysis;

[0138] Figure 21 is a diagram showing the truncation of the IL6 aptamer IL6-D 5 from motif analysis;

[0139] Figure 22 shows surface plasmon resonance imaging results for all Interleukin-6 selected aptamers following subtraction of a negative aptamer, with two different concentrations of interleukin 6: (A) 100 nM, (B) 250 nM. (The legend provides the identity of each aptamer in this graph). The association phase lasted 240 seconds and was followed by the dissociation phase.

[0140] Figure 23 shows surface plasmon resonance imaging results for all Interleukin-6 selected aptamers following subtraction of a negative aptamer, with two different concentrations of human serum albumin: (A) 100 nM, (B) 250 nM. (The legend provides the identity of each aptamer in this graph). The association phase lasted 240 seconds and was followed by the dissociation phase.

[0141] Figure 24 shows surface plasmon resonance imaging results for all Interleukin-6 selected aptamers following subtraction of a negative aptamer, with two different concentrations of a pool of immunoglobulins: (A) 100 nM, (B) 250 nM. (The legend provides the identity of each aptamer in this graph). The association phase lasted 240 seconds and was followed by the dissociation phase.

[0142] Figure 25 is a flow chart showing successive steps of a computer-implemented method for generating at least one first secondary structure, compliant with the present disclosure.

[0143] On the figures, the drawings are not to scale, and identical or similar elements are designated by the same references.

ILLUSTRATIVE EMBODIMENTS

[0144] The present description illustrates the principles of the present disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the disclosure and are included within its scope.

[0145] All examples and conditional language recited herein are intended for educational purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.

[0146] Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

[0147] Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein may represent conceptual views of illustrative circuitry embodying the principles of the disclosure. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

[0148] The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, a single shared processor, or a plurality of individual processors, some of which may be shared.

[0149] It should be understood that the elements shown in the figures may be implemented in various forms of hardware, software or combinations thereof. Preferably, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices, which may include a processor, memory and input / output interfaces.

[0150] The present disclosure will be described in reference to a particular functional embodiment of a computer-implemented method 1 for designing an optimized aptamer library’s template, as illustrated on Figure 1.

[0151] The computer-implemented method 1 (i.e., method 1) is adapted to produce an optimized aptamer library’s template.

[0152] The computer-implemented method 1 for designing an optimized aptamer library’s template may be associated with a method 2 for generating at least one first secondary structure using an optimized aptamer library’s template 50 obtained from method 1, represented on Figure 2, which will be subsequently described.

[0153] One aptamer sequence forms a unique secondary structure under specific conditions (such as temperature, salt, etc.). However, one unique secondary structure can be obtained by multiple aptamer sequences.

[0154] Though the presently described methods 1 and 2 are versatile and provided with several functions that can be carried out alternatively or in any cumulative way, other implementations within the scope of the present disclosure include methods having only parts of the present functionalities.

[0155] The computer-implemented method 1 may comprise a first step 11 for designing an aptamer library’s template. Said aptamer library’s template may notably comprise:
(i) regions of fixed nucleotides comprising:
(i.a) a 5’ end region of fixed nucleotides and a 3’ end region of fixed nucleotides, and
(i.b) internal regions of fixed nucleotides;
(ii) regions of random nucleotides;
wherein said aptamer library’s template comprises from 50 to 90 nucleotides, preferably from 70 to 80 nucleotides; and wherein said regions of random nucleotides comprise from 12 to 16 nucleotides.

[0156] In some embodiments, the aptamer library’s template comprises (or consists of) from 40 to 90 nucleotides, from 50 to 90 nucleotides, from 55 to 90 nucleotides, from 60 to 90 nucleotides, from 65 to 90 nucleotides, from 70 to 90 nucleotides, from 75 to 90 nucleotides, from 80 to 90 nucleotides, from 85 to 90 nucleotides, from 50 to 85 nucleotides, from 50 to 80 nucleotides, from 50 to 75 nucleotides, from 50 to 70 nucleotides, from 50 to 65 nucleotides, from 50 to 60 nucleotides, or from 50 to 55 nucleotides.

[0157] In some embodiments, the aptamer library’s template comprises (or consists of) preferably from 70 to 80 nucleotides, from 75 to 80 nucleotides, or from 70 to 75 nucleotides.

[0158] In some embodiments, the aptamer library’s template comprises (or consists of) 50 nucleotides, 55 nucleotides, 60 nucleotides, 65 nucleotides, 70 nucleotides, 75 nucleotides, 80 nucleotides, 85 nucleotides, or 90 nucleotides.

[0159] In some embodiments, the aptamer library’s template comprises (or consists of) 70 nucleotides, 71 nucleotides, 72 nucleotides, 73 nucleotides, 74 nucleotides, or 75 nucleotides, preferably 74 nucleotides.

[0160] In some embodiments, the aptamer library’s template comprises (or consists of) 49 nucleotides.

[0161] In some embodiments, the regions of random nucleotides comprise, in total, from 12 to 20 nucleotides, from 12 to 16 nucleotides, from 13 to 16 nucleotides, from 14 to 16 nucleotides, from 15 to 16 nucleotides, from 12 to 15 nucleotides, from 12 to 14 nucleotides, or from 12 to 13 nucleotides.

[0162] In some embodiments, the regions of random nucleotides comprise, in total, 11 nucleotides, 12 nucleotides, 13 nucleotides, 14 nucleotides, 15 nucleotides, 16 nucleotides, 17 nucleotides, 18 nucleotides, 19 nucleotides, or 20 nucleotides, preferably 14 nucleotides in total, preferably 12 nucleotides.

[0163] In some embodiments, the regions of random nucleotides consist of, in total, 11 nucleotides, 12 nucleotides, 13 nucleotides, 14 nucleotides, 15 nucleotides, 16 nucleotides, 17 nucleotides, 18 nucleotides, 19 nucleotides, or 20 nucleotides, preferably 14 nucleotides in total, preferably 12 nucleotides.

[0164] An aptamer library’s template is, for example, a structured nucleotide sequence framework that combines fixed and random regions to create a standard format for an aptamer. This aptamer library’s template may be thought of as an array or vector, where specific positions are filled with either fixed nucleotides (e.g., at the 5' and 3' ends) or random nucleotides (e.g., in internal regions) according to predefined rules to introduce diversity.

[0165] The method 1 may further comprise a second step 12 for generating multiple versions of the aptamer library’s template. Each of these multiple versions is a sequence, so that multiple sequences are generated based on the designed aptamer library’s template. Each of the multiple versions may notably be generated so as to satisfy the following criteria: said 5’ end and 3’ end regions of fixed nucleotides each comprise from 8 to 16 nucleotides; said internal regions of fixed nucleotides have a G / C content from 40 to 60%; and the version contains a maximum of three consecutive G’s.

[0166] In some embodiments, the 5’ end region of fixed nucleotides and the 3’ end region of fixed nucleotides have the same number of nucleotides.

[0167] In some embodiments, the 5’ end region of fixed nucleotides and the 3’ end region of fixed nucleotides do not have the same number of nucleotides.
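The three criteria above can be sketched as a simple check on one version. This is a minimal illustration only: the function names are hypothetical, the G / C content is computed over fixed positions while ignoring N placeholders, and the run of consecutive G’s is checked on the fixed letters of the whole version; both of these readings are assumptions.

```python
import re


def gc_percent(sequence):
    """G/C percentage over fixed positions only (N placeholders are ignored)."""
    fixed = [base for base in sequence.upper() if base in "ACGT"]
    return 100.0 * sum(base in "GC" for base in fixed) / len(fixed)


def satisfies_version_criteria(five_prime, internal, three_prime):
    """Check end-region length, internal G/C content and G runs."""
    ends_ok = all(8 <= len(region) <= 16 for region in (five_prime, three_prime))
    gc_ok = 40.0 <= gc_percent(internal) <= 60.0
    full = (five_prime + internal + three_prime).upper()
    # Four or more G's in a row would exceed three consecutive G's.
    runs_ok = re.search(r"G{4,}", full) is None
    return ends_ok and gc_ok and runs_ok


# Regions taken from the example template given in this description.
INTERNAL = "ATACAGCNGATGACTNNGACGNNAGGTNCCANNTGTNGCNATCANNTGCTGANNCTAG"
ok = satisfies_version_criteria("GCGTAAGC", INTERNAL, "CCGGAATT")
```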

[0168] In some embodiments, the 5’ end region of fixed nucleotides comprises from 8 to 16 nucleotides, from 8 to 15 nucleotides, from 8 to 14 nucleotides, from 8 to 13 nucleotides, from 8 to 12 nucleotides, from 8 to 11 nucleotides, from 8 to 10 nucleotides, from 8 to 9 nucleotides, from 9 to 16 nucleotides, from 10 to 16 nucleotides, from 11 to 16 nucleotides, from 12 to 16 nucleotides, from 13 to 16 nucleotides, from 14 to 16 nucleotides, from 15 to 16 nucleotides.

[0169] In some embodiments, the 5’ end region of fixed nucleotides comprises or consists of 8 nucleotides, 9 nucleotides, 10 nucleotides, 11 nucleotides, 12 nucleotides, 13 nucleotides, 14 nucleotides, 15 nucleotides or 16 nucleotides, preferably 8 nucleotides.

[0170] In some embodiments, the 5’ end region of fixed nucleotides is SEQ ID NO 1 : GCGTAAGC.

[0171] In some embodiments, the 3’ end region of fixed nucleotides comprises from 8 to 16 nucleotides, from 8 to 15 nucleotides, from 8 to 14 nucleotides, from 8 to 13 nucleotides, from 8 to 12 nucleotides, from 8 to 11 nucleotides, from 8 to 10 nucleotides, from 8 to 9 nucleotides, from 9 to 16 nucleotides, from 10 to 16 nucleotides, from 11 to 16 nucleotides, from 12 to 16 nucleotides, from 13 to 16 nucleotides, from 14 to 16 nucleotides, from 15 to 16 nucleotides.

[0172] In some embodiments, the 3’ end region of fixed nucleotides comprises or consists of 8 nucleotides, 9 nucleotides, 10 nucleotides, 11 nucleotides, 12 nucleotides, 13 nucleotides, 14 nucleotides, 15 nucleotides or 16 nucleotides, preferably 8 nucleotides.

[0173] In some embodiments, the 3’ end region of fixed nucleotides is SEQ ID NO 2: CCGGAATT.

[0174] In some embodiments, the internal regions of fixed nucleotides have a G / C content from 40 to 60%, from 45 to 60%, from 50 to 60%, from 55 to 60%, from 40 to 55%, from 40 to 50%, or from 40 to 45%.

[0175] In some embodiments, the internal regions of fixed nucleotides have a G / C content of 40%, of 45%, of 50%, of 55%, or of 60%, preferably of 50%.

[0176] In some embodiments, the internal region of fixed nucleotides is SEQ ID NO 3: ATACAGCNGATGACTNNGACGNNAGGTNCCANNTGTNGCNATCANNTGCTGANNCTAG.

[0177] In some embodiments, the aptamer library’s template contains 1 consecutive G, 2 consecutive G’s, or 3 consecutive G’s, preferably no more than 1 consecutive G, no more than 2 consecutive G’s, or no more than 3 consecutive G’s, more preferably no more than 2 consecutive G’s.

[0178] For example, a version of the aptamer library’s template may be 74 nucleotides in length, with a structure that includes an 8-nucleotide fixed region at the 5’ end, an internal region of 44 fixed nucleotides with 50% G / C content, an 8-nucleotide fixed region at the 3’ end, and 14 dispersed N nucleotides, with no more than two consecutive G’s. For instance, a version of the aptamer library’s template may be: 5' - GCGTAAGC - ATACAGCNGATGACTNNGACGNNAGGTNCCANNTGTNGCNATCANNTGCTGANNCTAG - CCGGAATT - 3' (SEQ ID NO 4).

[0179] In some embodiments, the aptamer library’s template is SEQ ID NO 4: 5' - GCGTAAGC - ATACAGCNGATGACTNNGACGNNAGGTNCCANNTGTNGCNATCANNTGCTGANNCTAG - CCGGAATT - 3', wherein -GCGTAAGC- is the 5’ end region of fixed nucleotides, -CCGGAATT- is the 3’ end region of fixed nucleotides, -ATACAGCNGATGACTNNGACGNNAGGTNCCANNTGTNGCNATCANNTGCTGANNCTAG- is the internal region, and wherein N is a random nucleotide chosen from A, T, G or C.
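As a short illustration, the template above can be handled as a string in which each N marks a random position. The sketch below counts the random positions, computes the number of distinct sequences the template can encode (four choices per N), and draws one sequence. The function names and the use of a seeded generator are assumptions of this sketch.

```python
import random

TEMPLATE = (
    "GCGTAAGC"  # 5' end region of fixed nucleotides
    "ATACAGCNGATGACTNNGACGNNAGGTNCCANNTGTNGCNATCANNTGCTGANNCTAG"  # internal region
    "CCGGAATT"  # 3' end region of fixed nucleotides
)


def count_random_positions(template):
    """Number of N positions in the template."""
    return template.count("N")


def library_size(template):
    """Number of distinct sequences: four possible nucleotides per N."""
    return 4 ** count_random_positions(template)


def sample_sequence(template, rng):
    """Replace each N with a nucleotide drawn uniformly from A, T, G and C."""
    return "".join(rng.choice("ATGC") if base == "N" else base for base in template)


sequence = sample_sequence(TEMPLATE, random.Random(0))
```

With 14 random positions, the template encodes 4^14 = 268,435,456 possible sequences.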

[0180] Advantageously, this specific construction of an aptamer library’s template allows optimizing the number of secondary structures and maximizing their diversity.

[0181] Method 1 may further comprise a step 13 for predicting one or more secondary structures for each version of the multiple versions. For example, RNAfold may be used to predict the most probable secondary structure, using a dot-bracket notation to facilitate comparisons across multiple versions of the aptamer library’s template and identify common structural motifs. RNAfold may calculate the minimum free energies (MFE) of secondary structures at a specific temperature (e.g. 22°C) and a specific salt concentration (e.g. 0.127M). RNAfold employs a dynamic programming algorithm to determine the secondary structure with the lowest energy, focusing on nearest-neighbor interactions that influence the secondary structure stability, loop entropy for various types of loops (such as hairpin, bulge, internal, and multi-branched loops), and dangling end parameters.
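To illustrate what a dot-bracket string encodes, the sketch below recovers the paired positions from such a string with a stack: each "(" is matched with the closest unmatched ")" that follows it. It does not perform any folding itself and is not part of RNAfold; the function name is hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the sorted (opening, closing) index pairs of a dot-bracket string."""
    stack, pairs = [], []
    for index, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError("closing bracket without a matching opening bracket")
            pairs.append((stack.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("opening bracket without a matching closing bracket")
    return sorted(pairs)


pairs = base_pairs("((..))")
```

Because two sequences with the same dot-bracket string share the same pairing pattern, the string itself can serve as a key for comparing versions.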

[0182] Method 1 may further comprise a step 14 for evaluating the structural diversity of each version (i.e., generated sequence) of the multiple versions. The evaluation of each version may comprise: counting a number of secondary structures predicted during step 13 for said sequence; evaluating the complexity of the secondary structures predicted during step 13.

[0183] Evaluating the complexity of secondary structures may comprise: counting the number of base pairs in each secondary structure to assess hybridization levels; sorting secondary structures based on base pair counts; and selecting a predefined number of libraries (e.g. nine libraries). For example, nine libraries may be selected among which three are highly hybridized (top base pair count = ~32), three are mediumly hybridized (closest to mean base pair count = ~16), and three are lowly hybridized (lowest base pair count = ~0).
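The counting, sorting and selection described above can be sketched as follows, assuming that each candidate is represented by a dot-bracket string. The function names, the grouping keys and the tie-breaking behaviour are assumptions of this sketch; with `per_group=3` it returns nine candidates in total.

```python
def count_base_pairs(dot_bracket):
    """Each '(' in a dot-bracket string corresponds to one base pair."""
    return dot_bracket.count("(")


def select_by_hybridization(structures, per_group=3):
    """Pick the most, the least and the closest-to-mean hybridized candidates."""
    ranked = sorted(structures, key=count_base_pairs)
    mean_pairs = sum(count_base_pairs(s) for s in ranked) / len(ranked)
    return {
        "high": ranked[-per_group:],
        "medium": sorted(
            ranked, key=lambda s: abs(count_base_pairs(s) - mean_pairs)
        )[:per_group],
        "low": ranked[:per_group],
    }


# Toy candidates with 0 to 4 base pairs.
candidates = ["....", "(..)", "((..))", "(((..)))", "((((..))))"]
selection = select_by_hybridization(candidates, per_group=1)
```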

[0184] In some embodiments, the complexity of secondary structures may be selected from the group comprising: highly hybridized, mediumly hybridized, and lowly hybridized.

[0185] In some embodiments, lowly hybridized corresponds to the range [0 to 16[ hybridized base pairs, mediumly hybridized to the range [16 to 32[ hybridized base pairs, and highly hybridized to 32 or more hybridized base pairs, such as [32 to 40].

[0186] In some embodiments, the predefined complexity criterion may be a level, a percentage, or a numerical value. A level may categorize complexity into different hybridization categories such as “low,” “medium,” and “high”. A percentage or a numerical value may quantify the degree of hybridization of a secondary structure by measuring for example the extent of base-pairing interactions (e.g. pairing within regions of the secondary structure) within the secondary structure.

[0187] Furthermore, the Shannon diversity Index (SDI) may be used for the evaluation of complexity of secondary structures. In this context, the SDI can quantify how many of these secondary structures contribute to the overall diversity. For instance, if many secondary structures possess high internal hybridization, but only a few types are present, the diversity score would be lower. Conversely, if a wide variety of secondary structures with varying levels of internal hybridization are observed, the SDI will indicate higher structural diversity. Advantageously, a high SDI suggests a rich diversity of secondary structures that may enhance the functionality and binding specificity of the aptamer’s library template, as diverse structures can interact with a wider range of target molecules.
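As a minimal sketch, the SDI of a collection of predicted structures can be computed as H = -Σ p_i ln(p_i), where p_i is the fraction of observations that fall into structure type i. The use of the natural logarithm, the function name and the toy data are assumptions of this sketch.

```python
import math
from collections import Counter


def shannon_diversity_index(items):
    """Shannon diversity index of a list of labels (natural logarithm)."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(
        (count / total) * math.log(count / total) for count in counts.values()
    )


# Few structure types dominate: low diversity.
few_types = shannon_diversity_index(["((..))"] * 9 + ["...."])
# Four equally frequent structure types: higher diversity.
many_types = shannon_diversity_index(["((..))", "(....)", "......", "(())()"])
```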

[0188] The method 1 may further comprise a step 15 for evaluating the distribution of said multiple versions for each secondary structure. Step 15 may analyze the frequency and diversity of the multiple versions associated with each secondary structure to ensure balanced representation and maximize binding affinity and specificity. This advantageously allows identifying predominant versions, among the multiple versions, that may overshadow others, guiding the optimization of the secondary structure selection.
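By way of illustration, the distribution evaluated in step 15 can be summarized from a mapping of each sequence to its predicted structure. The input format, the function names and the reported summary fields are hypothetical choices for this sketch.

```python
from collections import Counter
from statistics import median


def sequences_per_structure(structure_by_sequence):
    """Count how many sequences fold into each secondary structure."""
    return Counter(structure_by_sequence.values())


def summarize_distribution(structure_by_sequence):
    """Median sequences per structure plus the most represented structure."""
    counts = sequences_per_structure(structure_by_sequence)
    top_structure, top_count = counts.most_common(1)[0]
    return {
        "median": median(counts.values()),
        "predominant": top_structure,
        "share": top_count / sum(counts.values()),
    }


# Toy mapping of sequences to dot-bracket structures.
summary = summarize_distribution(
    {"AAAA": "(..)", "AAAT": "(..)", "AATT": "(..)", "GGGG": "...."}
)
```

A large share for a single structure would indicate that it may overshadow the others.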

[0189] The method 1 may further comprise a step 16 for selecting the optimized aptamer library’s template 50 among the multiple versions as the one satisfying the following predefined criteria:
• a number of secondary structures greater than 100,000,
• the complexity of secondary structures being selected from the group comprising: highly hybridized, mediumly hybridized, and lowly hybridized,
• said distribution of said sequences per secondary structure has a median greater than 10.
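The predefined criteria above can be sketched as a single check applied to summary values computed for one version. The function name, the argument names and the reading of the complexity criterion (every observed class must belong to the three named classes) are assumptions of this sketch; the thresholds are exposed as parameters because other embodiments recite other values.

```python
ALLOWED_COMPLEXITY = {
    "highly hybridized",
    "mediumly hybridized",
    "lowly hybridized",
}


def meets_selection_criteria(
    n_structures,
    complexity_classes,
    median_sequences_per_structure,
    min_structures=100_000,
    min_median=10,
):
    """Return True when a version satisfies all three selection criteria."""
    enough_structures = n_structures > min_structures
    # Assumed reading: at least one class, all drawn from the named group.
    valid_complexity = bool(complexity_classes) and set(
        complexity_classes
    ) <= ALLOWED_COMPLEXITY
    well_represented = median_sequences_per_structure > min_median
    return enough_structures and valid_complexity and well_represented


# Hypothetical summary values for one candidate version.
selected = meets_selection_criteria(
    150_000, {"highly hybridized", "lowly hybridized"}, 12
)
```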

[0190] In some embodiments, the number of secondary structures is greater than 100 000, greater than 150 000, greater than 200 000, greater than 300 000, greater than 400 000, greater than 500 000, greater than 1 million, greater than 500 000 million, greater than 1 billion, or greater than 100 000 billion.

[0191] In some embodiments, the number of secondary structures is 100 000, 150 000, 200 000, 300 000, 400 000, 500 000, 1 million, 500 000 million, 1 billion, or 100 000 billion.

[0192] In some embodiments, the distribution of the possible sequences per secondary structure has a median more than 10, more than 20, more than 50, or more than 100. In some embodiments, the distribution of the possible sequences per secondary structure has a median about 10, about 20, about 50, or about 100.

[0193] In other words, step 16 involves a selection process for identifying the optimized aptamer library’s template 50 based on specific, predefined criteria that ensure structural diversity and functional efficacy. For instance, the requirement that the number of secondary structures exceeds a certain threshold (e.g., more than 50 secondary structures) ensures that the optimized aptamer library’s template 50 can potentially bind to a wide range of target molecules, increasing the likelihood of finding aptamers that can interact with various proteins or biomarkers. Similarly, a criterion for the complexity of secondary structures (e.g., requiring a minimum complexity score of 20 based on factors like loop types and folding patterns) may guarantee that the aptamers exhibit intricate configurations that may enhance binding specificity and affinity. Furthermore, ensuring that the distribution of multiple versions per secondary structure has a median greater than 10 (e.g., indicating that at least half of the structures have more than 10 associated sequences) promotes robust representation and diversity within the library, which is essential for successful downstream applications like diagnostics or therapeutic interventions. By adhering to these criteria, the selection process maximizes the potential of the optimized aptamer library’s template 50 to yield effective and specific binding solutions.
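As a minimal sketch, the three predefined criteria of step 16 could be checked as shown below. The function name, the dictionary layout, and the labels "high", "medium" and "low" are hypothetical. Reading the complexity criterion as membership in one of the three hybridization classes is likewise an assumption of this sketch.

```python
from statistics import median

def meets_criteria(seqs_per_structure, complexity_class,
                   min_structures=100_000, min_median=10):
    """Check the three criteria on a candidate template.

    seqs_per_structure: dict mapping each predicted secondary structure to the
    number of sequences folding into it (hypothetical data layout).
    complexity_class: "high", "medium" or "low" hybridization (assumed labels).
    """
    allowed = {"high", "medium", "low"}
    return (len(seqs_per_structure) > min_structures
            and complexity_class in allowed
            and median(seqs_per_structure.values()) > min_median)

# Toy data: 5 structures with 12 sequences each (far below real library sizes).
toy = {f"structure_{i}": 12 for i in range(5)}
print(meets_criteria(toy, "high"))                    # False: fewer than 100,000 structures
print(meets_criteria(toy, "high", min_structures=3))  # True with a relaxed toy threshold
```

The default thresholds follow the values recited for step 16; they are exposed as parameters so that the alternative thresholds of the other embodiments can be substituted.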

[0194] In some embodiments, taken together, these three criteria act synergistically: i) a high number of distinct secondary structures provides broad structural coverage; ii) a sufficient structural complexity ensures that these structures are; iii) a sufficient distribution of multiple versions per structure guarantees that each structure is properly represented during selection. By meeting all three conditions together, the optimized aptamer library’s template 50 achieves maximal efficiency by producing high-affinity and high-specificity aptamers.

[0195] Structural diversity may be evaluated using software selected from: RNAfold, MFold, and MXFold. RNAfold may be particularly effective for predicting the minimum free energy (MFE) structures and provides insights into the most stable configurations based on nucleotide interactions, which can help assess structural diversity across various sequences. MFold, on the other hand, allows for comprehensive folding simulations, enabling the exploration of multiple potential secondary structures and their probabilities, thus facilitating a deeper understanding of structural variability within a library. MXFold, in turn, employs advanced algorithms to evaluate complex RNA structures and their interactions, offering high-resolution predictions that are vital for identifying folding patterns and motifs. These tools can help researchers to quantitatively and qualitatively analyze the structural diversity of an aptamer library, ensuring the identification of candidates with optimal functional properties for binding applications.

[0196] As previously mentioned, the present invention further relates to a method 2 adapted to generate at least one first secondary structure using an optimized aptamer library’s template 50 obtained from the computer-implemented method 1.

[0197] Method 2 may comprise a first step 21 configured to generate, from the optimized aptamer library’s template 50, an optimized aptamer library comprising multiple sequences. In other words, step 21 may comprise the generation of the optimized aptamer library through replacement of each random nucleotide N in the optimized aptamer library’s template 50 with one of the four nucleotides A, T, C, and G. For example, if the optimized aptamer library’s template comprises 12 random nucleotide positions, and each position can be filled with any of the four nucleotides (A, T, C, G), the total number of combinations would be 4^12. This means that 16777216 sequences may be generated from the optimized aptamer library’s template 50 based on the random nucleotide positions. Advantageously, this not only leads to a rich library of diverse multiple sequences but also ensures that each variant (i.e. each sequence of the multiple sequences) retains the structural features crucial for aptamer function, such as the ability to fold into specific secondary structures.
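The replacement of each random nucleotide N may be sketched as follows. The function names and the short template "GNNC" are hypothetical and serve only to illustrate the counting argument above.

```python
from itertools import product

def library_size(template):
    """Number of sequences obtainable from a template: 4 per random position N."""
    return 4 ** template.count("N")

def expand(template):
    """Yield every sequence obtained by replacing each N with A, T, C or G."""
    slots = template.count("N")
    for combo in product("ATCG", repeat=slots):
        fill = iter(combo)
        yield "".join(next(fill) if base == "N" else base for base in template)

print(library_size("N" * 12))    # 16777216, i.e. 4 to the power of 12
print(list(expand("GNNC"))[:3])  # ['GAAC', 'GATC', 'GACC']
```

A generator is used in this sketch because a template with many random positions yields too many sequences to hold in memory at once.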

[0198] The method 2 may further comprise a second step 22 for applying the optimized aptamer library from step a (step 21), to one or more target molecules, notably at least one target molecule. In one example, the optimized aptamer library comprises an average copy number per sequence from 100 to 1E7. For instance, if the (at least one) target molecule is a specific protein, the optimized aptamer library might be incubated with a solution containing said specific protein, allowing aptamers that recognize and bind to the specific protein to attach themselves. Alternatively, if the target molecule is a viral RNA, the optimized aptamer library could be applied in a cell-free system where the aptamers interact with the viral RNA, seeking to identify sequences of the optimized aptamer library that bind most effectively to the target molecule / viral RNA. Another example could involve using the optimized aptamer library in a diagnostic assay for detecting biomarkers in a patient sample, where the optimized aptamer library is applied to the patient sample to capture a specific disease-related target molecule. In each case, the aim is to advantageously utilize the optimized aptamer library to find sequences (of the optimized aptamer library) that demonstrate strong binding affinity to the target molecule. With an average copy number per sequence ranging from 100 to 10 million, this ensures robust representation during selection.

[0199] The method 2 may further comprise a step 23 for removing, through a wash step, sequences of the optimized aptamer library which did not bind to the (first) target molecule in step b (step 22). During step 23, the interactions between the sequences and the first target molecule are stabilized, allowing bound sequences to remain attached to the first target molecule. Meanwhile, unbound sequences are washed away using a suitable buffer or saline solution.

[0200] Method 2 may further comprise a step 24 for eluting sequences of the optimized aptamer library which bound to the first target molecule from step b (step 22) so as to obtain a first selected optimized aptamer library. In other words, step 24 may be used to separate bound sequences from the target molecule, resulting in the creation of a first selected optimized aptamer library that contains only those sequences with specific affinity for the first target molecule.

[0201] Method 2 may further comprise a step 25 for proceeding to NGS analysis of the first selected optimized aptamer library, and a step 26 for proceeding to NGS analysis of the optimized aptamer library of step a (step 21). Steps 25 and 26 are not necessarily in this specific order.

[0202] Method 2 may further comprise a step 27 for predicting one or more secondary structures using each sequence in each library (i.e., the first selected optimized aptamer library and the optimized aptamer library) and evaluating secondary structure(s) of each sequence in each library. Alternatively, step 27 may involve evaluating the secondary structures of each sequence within both the first selected optimized aptamer library and the optimized aptamer library. This evaluation is essential for understanding how the sequences fold into secondary structures, which can influence their binding properties and overall functionality. By using computational tools such as RNAfold or similar software, the sequences may be analyzed to predict their secondary structures. After identifying the secondary structures, step 27 may comprise counting the number of occurrences of each secondary structure within both libraries (i.e. the first selected optimized aptamer library and the optimized aptamer library). This advantageously helps to assess the structural diversity of the secondary structure candidates, identify common structural motifs, and determine which secondary structures may be associated with high binding affinity to the target molecule (e.g. the first target molecule).

[0203] Method 2 may further comprise a step 28 for comparing the number of each secondary structure of the first selected optimized aptamer library to the number of each secondary structure of the optimized aptamer library. Step 28 is important for assessing the effectiveness of the selection process, as it reveals which secondary structures have been enriched, indicating their potential binding affinity to the target molecule.

[0204] Method 2 may further comprise a step 29 for identifying the at least one first secondary structure, being at least one secondary structure defined on the basis of p-values and on the basis of fold difference between the number of each secondary structure of the optimized aptamer library and the number of each secondary structure of the first selected optimized aptamer library. On one hand, p-values can indicate the statistical significance of the differences in the occurrence of each secondary structure between the optimized aptamer library and the first selected optimized aptamer library. A low p-value suggests that the observed differences are unlikely to have occurred by chance, thereby highlighting structures that are significantly enriched in the first selected optimized aptamer library. On the other hand, the fold difference quantifies how many times more prevalent each secondary structure is in the first selected optimized aptamer library compared to the optimized aptamer library. Structures with a high fold difference indicate a substantial increase in their presence after the selection process, further signifying their potential relevance in binding interactions. By integrating these two metrics (i.e. p-values and fold difference), one can identify one or more secondary structures that are not only statistically significant but also demonstrate a meaningful increase in abundance following selection. Advantageously, step 29 makes it possible to pinpoint the most promising aptamer candidates (i.e. secondary structures) that are likely to contribute to successful binding to a target molecule.
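For illustration, the two metrics of step 29 may be computed for one secondary structure as sketched below. The function names and the counts are hypothetical. The choice of a one-sided Fisher's exact test for this step and the normalization of counts by library size are assumptions of this sketch, since the present paragraph does not name the test used to derive the p-values.

```python
from math import comb

def fisher_enrichment_p(sel_count, sel_total, start_count, start_total):
    """One-sided Fisher's exact p-value for enrichment in the selected library.

    Computed from the hypergeometric tail: probability of observing at least
    sel_count occurrences in the selected library if the structure were
    distributed at random between the two libraries.
    """
    successes = sel_count + start_count
    population = sel_total + start_total
    upper = min(successes, sel_total)
    tail = sum(comb(successes, k) * comb(population - successes, sel_total - k)
               for k in range(sel_count, upper + 1))
    return tail / comb(population, sel_total)

def fold_difference(sel_count, sel_total, start_count, start_total):
    """Ratio of relative frequencies (selected / starting); assumed normalization."""
    if start_count == 0:
        return float("inf")  # structure absent from the starting library
    return (sel_count * start_total) / (start_count * sel_total)

# Hypothetical counts: 30 of 100 reads after selection, 10 of 100 before.
print(fold_difference(30, 100, 10, 100))  # 3.0
print(fisher_enrichment_p(30, 100, 10, 100) < 0.05)
```

A structure would then be retained when both its p-value is low and its fold difference is high, the exact cut-offs being left to the practitioner.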

[0205] According to one embodiment, the first target molecule is immobilized. In other words, the first target molecule can be fixed in place on a solid surface, such as a microplate or a sensor. This immobilization enhances the efficiency of binding interactions between secondary structures and the first target molecule, as the secondary structures can readily access the stationary target without the need for diffusion. Moreover, immobilization facilitates steps, such as washing away unbound sequences (in step 23) and eluting bound sequences (in step 24), ultimately improving the identification and isolation of sequences of the optimized aptamer library with strong, specific binding interactions.

[0206] It has to be understood that the embodiments previously described with respect to one target molecule (first target molecule) may be repeated on a second target molecule, a third target molecule, etc.

[0207] Step i (i.e. step 29) may further comprise determining an optimal aptamer sequence by determining which sequence exhibits the strongest response to selection. A secondary sequence with "strongest response to selection" refers to the secondary sequence that is most significantly enriched after the selection process, meaning it is present in a much higher quantity in the first selected optimized aptamer library relative to its initial abundance in the optimized aptamer library. The evaluation of the “strongest response to selection” may be based on multiple criteria, including the frequency of a secondary structure in the first selected optimized aptamer library, the statistical significance of its enrichment (as indicated by p-values), and the extent of its abundance compared to the optimized aptamer library.

[0208] Method 2 may further comprise applying the at least one first secondary structure obtained from step i (i.e. step 29) to at least one second target molecule and evaluating a specificity of the at least one first secondary structure to bind to the at least one secondary target molecule. For example, if the first secondary structure is a specific aptamer that binds effectively to a first target molecule A, this step would entail testing whether that same aptamer can also bind to a second target molecule B. For instance, the aptamer (i.e. secondary structure) may be incubated with the second target molecule B; then the binding affinity and specificity of the second target molecule B may be assessed through techniques like surface plasmon resonance or enzyme-linked assays. If the aptamer binds strongly to the first target molecule A but shows little to no interaction with second target molecule B, this indicates a high specificity for the first target molecule A. Advantageously, this helps to ensure that the aptamer selectively interacts with the intended target without cross-reacting with other molecules. Techniques such as surface plasmon resonance (SPR), bio-layer interferometry (BLI), and enzyme-linked immunosorbent assays (ELISA) may be used to assess the binding affinity and specificity of an aptamer for its target molecules.

[0209] According to one embodiment, a defined target is not applied, but the Aptamarker approach is applied. This differs in that a specific target is not defined to identify sensitive and specific aptamers from the optimized aptamer library’s template, but rather that the aptamers with the highest fold differences and statistical significance in one condition versus another condition (e.g. diseased versus healthy) are selected on the basis that they are inferred to be binding to an unknown target that is discriminative and thus predictive of the disease condition. This has been enabled by a previous filing and the application of FRELEX for the agnostic identification of predictive biomarkers. The FRELEX technique can be used for defined targets. It would be clear to one trained in the art that this embodiment enables the identification of aptamers that are either specific for one target over another or are able to bind to both targets.

[0210] The present disclosure also relates to a method 7 adapted to generate at least one first secondary structure using an aptamer library’s template 50-1. One example of method 7 is illustrated in figure 25.

[0211] Method 7 is advantageous because it enables the identification of secondary structures that bind to a target molecule with high specificity and sensitivity. By comparing the number of each secondary structure and identifying those defined by p-values and fold differences, method 7 identifies the most promising aptamer candidates. This approach ensures that the identified aptamers are not only statistically significant but also demonstrate a meaningful increase in abundance following selection, making it a powerful tool for aptamer selection. In addition, method 7 is versatile and simple to implement, as it does not rely on a particular library. This makes method 7 flexible and adaptable to various experimental setups.

[0212] The method 7 may comprise a first step 61 for receiving an aptamer library’s template 50-1. The aptamer library’s template 50-1 may be generated according to any known technique. For example, the library is generated as disclosed in WO2023/137558. The aptamer library’s template 50-1 comprises multiple sequences. The aptamer library is obtained through replacement of each random nucleotide N in the aptamer library’s template 50-1 with one of the four nucleotides A, T, C, and G. For example, if the aptamer library’s template comprises 12 random nucleotide positions, and each position can be filled with any of the four nucleotides (A, T, C, G), the total number of combinations would be 4^12. This means that 16777216 sequences may be generated from the aptamer library’s template 50-1 based on the random nucleotide positions.

[0213] The method 7 may comprise a second step 62 for applying the aptamer library from the first step (step 61) to one or more target molecules, notably at least one target molecule. The step 62 has the same implementation, functionalities, and specificities as the step 22 of method 2.

[0214] The method 7 may further comprise a step 63 for removing, through a wash step, sequences of the aptamer library which did not bind to the (first) target molecule in step 62. The step 63 has the same implementation, functionalities, and specificities as the step 23 of method 2.

[0215] The method 7 may further comprise a step 64 for eluting sequences of the aptamer library which bound to the first target molecule from step 62 so as to obtain a first selected aptamer library. The step 64 has the same implementation, functionalities, and specificities as the step 24 of method 2.

[0216] The method 7 may further comprise a step 65 for proceeding to sequencing analysis, such as NGS analysis, of the first selected aptamer library, and a step 66 for proceeding to NGS analysis of the aptamer library of step 61. Steps 65 and 66 are not necessarily in this specific order.

[0217] The method 7 may further comprise a step 67 for predicting one or more secondary structures using each sequence in each library (i.e., the first selected aptamer library and the aptamer library) and evaluating secondary structure(s) of each sequence in each library. The step 67 has the same implementation, functionalities, and specificities as the step 27 of method 2.

[0218] The method 7 may further comprise a step 68 for comparing the number of each secondary structure of the first selected aptamer library to the number of each secondary structure of the aptamer library. The step 68 has the same implementation, functionalities, and specificities as the step 28 of method 2.

[0219] The method 7 may further comprise a step 69 for identifying the at least one first secondary structure, being at least one secondary structure defined on the basis of p-values and on the basis of fold difference between the number of each secondary structure of the aptamer library and the number of each secondary structure of the first selected aptamer library. The step 69 has the same implementation, functionalities, and specificities as the step 29 of method 2.

[0220] According to one embodiment, the first target molecule is immobilized. In other words, the first target molecule can be fixed in place on a solid surface, such as a microplate or a sensor. This immobilization enhances the efficiency of binding interactions between secondary structures and the first target molecule, as the secondary structures can readily access the stationary target without the need for diffusion. Moreover, immobilization facilitates steps, such as washing away unbound sequences (in step 63) and eluting bound sequences (in step 64), ultimately improving the identification and isolation of sequences of the aptamer library with strong, specific binding interactions.

[0221] It has to be understood that the embodiments previously described with respect to one target molecule (first target molecule) may be repeated on a second target molecule, a third target molecule, etc.

[0222] The step i (i.e. step 69) may further comprise determining an optimal aptamer sequence by determining which sequence exhibits the strongest response to selection. A secondary sequence with "strongest response to selection" refers to the secondary sequence that is most significantly enriched after the selection process, meaning it is present in a much higher quantity in the first selected aptamer library relative to its initial abundance in the aptamer library. The evaluation of the “strongest response to selection” may be based on multiple criteria, including the frequency of a secondary structure in the first selected aptamer library, the statistical significance of its enrichment (as indicated by p-values), and the extent of its abundance compared to the aptamer library.

[0223] The method 7 may further comprise applying the at least one first secondary structure obtained from step i (i.e. step 69) to at least one second target molecule and evaluating a specificity of the at least one first secondary structure to bind to the at least one secondary target molecule. For example, if the first secondary structure is a specific aptamer that binds effectively to a first target molecule A, this step would entail testing whether that same aptamer can also bind to a second target molecule B. For instance, the aptamer (i.e. secondary structure) may be incubated with the second target molecule B; then the binding affinity and specificity of the second target molecule B may be assessed through techniques like surface plasmon resonance or enzyme-linked assays. If the aptamer binds strongly to the first target molecule A but shows little to no interaction with second target molecule B, this indicates a high specificity for the first target molecule A. Advantageously, this helps to ensure that the aptamer selectively interacts with the intended target without cross-reacting with other molecules. Techniques such as surface plasmon resonance (SPR), bio-layer interferometry (BLI), and enzyme-linked immunosorbent assays (ELISA) may be used to assess the binding affinity and specificity of an aptamer for its target molecules.

[0224] According to one embodiment, a defined target is not applied, but the Aptamarker approach is applied. This differs in that a specific target is not defined to identify sensitive and specific aptamers from the aptamer library’s template, but rather that the aptamers with the highest fold differences and statistical significance in one condition versus another condition (e.g. diseased versus healthy) are selected on the basis that they are inferred to be binding to an unknown target that is discriminative and thus predictive of the disease condition. This has been enabled by a previous filing and the application of FRELEX for the agnostic identification of predictive biomarkers. It would be clear to one trained in the art that this embodiment enables the identification of aptamers that are either specific for one target over another or are able to bind to both targets.

[0225] The computer-implemented method 3 (i.e. method 3) for determining relatedness of aptamer’s structures, illustrated on Figure 3, may comprise a step 31 for evaluating single nucleotide position difference between predicted secondary structures at any given position, the single nucleotide position difference being defined as a unit of distance. Step 31 may comprise comparing nucleotide sequences of predicted secondary structures one nucleotide at a time using a distance measurement. For instance, if secondary structure 1 is 5'-AGCTGAC-3' and secondary structure 2 is 5'-AGCTGTC-3', the single nucleotide change at the sixth position (A vs. T) signifies a unit of distance of one between the two sequences. Hamming distance may be calculated between sequences as an efficient method to quantify these differences, providing a straightforward metric for assessing structural similarity of secondary structures. By calculating these differences across multiple predicted structures, step 31 may further comprise generating a distance matrix (i.e. structure reference array) comprising the distance between a secondary structure (i.e. reference secondary structure) and the other secondary structures; such a distance matrix makes it possible to quantify the structural similarity or dissimilarity of each secondary structure (i.e. aptamer). Furthermore, this analysis makes it possible to identify clusters of similar secondary structures, aiding in the selection of candidates for further study based on their functional potential and structural diversity.
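The distance measurement of step 31 may be sketched as follows, using the two example sequences given above. The function names and the third sequence are hypothetical, and a full pairwise matrix is shown here as one possible layout of the distance matrix.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(items):
    """Pairwise Hamming distances; row i holds distances from items[i] to all items."""
    return [[hamming(a, b) for b in items] for a in items]

print(hamming("AGCTGAC", "AGCTGTC"))  # 1 (A vs. T at the sixth position)

# The third sequence is a hypothetical addition for the matrix example.
for row in distance_matrix(["AGCTGAC", "AGCTGTC", "AGCAGTC"]):
    print(row)
```

The same functions apply unchanged to any equal-length string encoding of a secondary structure.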

[0226] Method 3 may further comprise a step 32 for generating, as network analysis, at least one structural network topology, comprising: applying a structural distance threshold to each of the predicted secondary structures to identify related structures whose structural distance from the secondary structure (i.e. reference secondary structure) is equal to or less than the structural distance threshold; and constructing a network topology that represents structural relationships and similarities among the selected secondary structures or their related structures. Step 32 begins by applying a structural distance threshold to the distance measurements obtained in step 31. The structural distance threshold may be used to identify related structures whose distance from a reference secondary structure is equal to or less than a predetermined value. A nearest neighbor approach may be used to identify centroid nodes and their related structures, connecting these through shared nodes to form clusters that represent analogous binding solutions to targets. This asymmetrical network structure is favored over a symmetrical "hairball" model, as it provides meaningful topological insights into functional relationships, aligning with recent advances in network theory. For example, if the structural distance threshold is set at 2, any secondary structure that is within this distance from the reference secondary structure will be considered related. Once the related structures are identified, a network topology is constructed to represent these structural relationships. In this network, nodes correspond to the secondary structures, while edges are drawn between nodes that meet the distance criteria. For instance, if secondary structure A has a distance of 1 from the reference secondary structure and secondary structure B has a distance of 2, both will be connected to the reference node corresponding to the reference secondary structure. This visual representation not only highlights direct relationships but also advantageously makes it possible to observe patterns, clusters, and relationships among the secondary structures. By analyzing the structural network topology, insights into the structural similarities of the aptamers (i.e. secondary structures) may be inferred, guiding the selection of candidates for further functional studies or applications based on their inherent structural diversity. Consequently, step 32 enhances the ability to explore and utilize aptamers with potentially similar functionalities.
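A simplified sketch of the thresholding and network construction of step 32 is given below. It joins any two structures whose Hamming distance is within the threshold and then groups connected structures into networks. It does not implement the nearest neighbor and centroid approach mentioned above. The function names and the dot-bracket strings are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Positions at which two strings differ (equal lengths assumed)."""
    return sum(x != y for x, y in zip(a, b))

def build_network(structures, threshold):
    """Adjacency sets: an edge joins two structures at distance <= threshold."""
    graph = {s: set() for s in structures}
    for a, b in combinations(structures, 2):
        if hamming(a, b) <= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def components(graph):
    """Group nodes into connected networks with a depth-first traversal."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(group)
    return groups

# Hypothetical structures in dot-bracket notation (illustrative data only).
structures = ["((....))", "(......)", "........", "..((..))"]
network = build_network(structures, threshold=2)
for group in components(network):
    print(sorted(group))
```

With a threshold of 2, the first three structures form one network and the fourth remains isolated. The adjacency sets could be exported to a visualization tool for inspection.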

[0227] In some embodiments, the selected secondary structure that binds to a given epitope on a protein can correspond to a shared structural solution to the problem of binding to a specific epitope.

[0228] According to one embodiment, said method further comprises:
• using said at least one structural network topology as a basis for the hypothesis that each network contains a shared structural solution to the problem of binding to a specific epitope on a protein, and that different networks represent different structural solutions, this may mean different epitopes or the same epitope on a protein; and / or
• selecting from the network analysis secondary structures that show appropriate levels of sensitivity and specificity through integration of information of how said secondary structures performed in selections across different targets and analyses.

[0229] Advantageously, a structural network topology also allows to form a conceptual space composed of multiple networks that represent different binding solutions to a target molecule. Each network may emerge from the structural relatedness of the aptamers, suggesting that aptamers within the same network may exhibit similar binding characteristics due to their shared structural features. This allows the exploration of aptamer candidates' functional potential, guiding the selection process by considering both structural similarities and their implications for binding behavior. For example, networks may correspond to different structural distance thresholds (e.g. Hamming distance threshold). Networks may be visualized using software like Cytoscape.

[0230] According to other advantageous aspects of the invention, the computer-implemented method 3 for determining relatedness of aptamer’s structures comprises one or more of the features described in the following embodiments, taken alone or in any possible combination.

[0231] Method 3 may further comprise a step 33 of using the at least one structural network topology as a basis for the hypothesis that each network contains a selected secondary structure that binds to a given epitope on a protein, and that different structural network topologies derived from a same protein bind to different epitopes. For example, if several secondary structures are grouped into different networks based on their structural similarities, one can hypothesize that each secondary structure (represented by a different network) is designed to bind to a specific epitope on a protein. This suggests that while all secondary structures are related to the same protein, they can be tailored to recognize different epitopes on that protein. This approach advantageously allows for a more targeted and effective binding strategy.

[0232] Method 3 may further comprise a step 34 for selecting from the network analysis secondary structures that show appropriate levels of sensitivity and specificity through integration of information of how the secondary structures performed in selections across different targets and analyses. Advantageously, step 34 makes it possible to identify secondary structures that effectively bind to specific targets while minimizing interactions with non-targets, ensuring they are reliable candidates for further use.

[0233] In method 3, an additional level of information may be added; such level of information may provide information as to which elements of a secondary structure are necessary for binding and which are not, for use in truncating the secondary structure to a minimally effective form. In order to obtain this information, method 3 may further comprise the steps of:
a. identifying structures in a structure reference array, that is generated by predicting the most probable secondary structure for all possible sequences from the optimized aptamer library at a defined temperature and salt concentration appropriate for selection, that are within a structural distance threshold of one of the selected secondary structures from the network analysis;
b. determining an effect of selection on other secondary structures equal or inferior to the structural distance threshold of one of said secondary structures selected from the network analysis;
c. identifying structural elements within said secondary structures denoted from step b using a sliding window technique where a window size ranges between 15 and 45 nucleotides;
d. determining, among said structural elements, critical structural elements by analyzing those that perform positively (contribute significantly to the secondary structures effective binding to a target molecule) and negatively (may interfere with or reduce the secondary structures binding efficiency / specificity) in selection against a target molecule, as defined by Fisher’s exact test derived p-values;
e. isolating positively performing critical structural elements by truncating said secondary structure to retain only said positively performing critical structural elements;
f. ensuring structural integrity of the truncated secondary structure is maintained by:
g. preserving complete hybridized portions of the secondary structure identified as critical structural elements for overall structural stability and functionality;
h. removing redundant or unstructured dangling ends that do not contribute to stability or binding efficiency of the secondary structure.

[0234] In other words, after identifying structures in a structure reference array that are within the structural distance threshold, determining an effect of selection may for instance be achieved by assessing the performance of a secondary structure D in binding to a target molecule and finding that it has a high affinity, then analyzing other sequences (e.g., secondary structure E and secondary structure F) that fall within the structural distance threshold from the secondary structure D. Evaluating the performance in selection assays, for example, may reveal that while secondary structure E shows a similar binding affinity compared to secondary structure D, secondary structure F performs poorly. This may help to understand which structural features might be crucial for binding. Then, the sliding window makes it possible to identify structural elements (e.g. stems, loops) that can be essential for the secondary structure functionality and binding ability. Once the structural elements are identified, determining critical structural elements may be achieved by analyzing structural elements that perform positively and negatively in selection against a target molecule, as defined by Fisher’s exact test derived p-values. Structural elements that show a lower p-value may be identified as critical. After that, the secondary structure is truncated to retain only the positively performing critical structural elements. Finally, the structural integrity of the truncated secondary structure is ensured by: preserving complete hybridized portions of the secondary structure that are considered as critical structural elements; and removing redundant or unstructured dangling ends that do not contribute to stability or binding efficiency of the secondary structure.
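Two of the operations described above, the sliding window over a secondary structure and the removal of unstructured dangling ends, may be sketched as follows. The function names, the dot-bracket notation, and the example sequence are assumptions of this sketch. The scoring of windows by Fisher's exact test is not shown.

```python
def sliding_windows(structure, min_size=15, max_size=45):
    """Yield (start, size, fragment) for every window of 15 to 45 positions."""
    for size in range(min_size, min(max_size, len(structure)) + 1):
        for start in range(len(structure) - size + 1):
            yield start, size, structure[start:start + size]

def trim_dangling_ends(sequence, structure):
    """Drop unpaired positions before the first and after the last paired base.

    Assumes dot-bracket notation: '(' and ')' mark paired (hybridized) bases,
    '.' marks unpaired bases. Hybridized portions are kept whole.
    """
    paired = [i for i, c in enumerate(structure) if c in "()"]
    if not paired:
        return sequence, structure  # nothing hybridized, leave unchanged
    lo, hi = paired[0], paired[-1] + 1
    return sequence[lo:hi], structure[lo:hi]

# Hypothetical hairpin with two unpaired bases at each end.
seq = "AAGGGCTTTTGCCCAA"
dot = "..((((....)))).."
print(trim_dangling_ends(seq, dot))       # ('GGGCTTTTGCCC', '((((....))))')
print(len(list(sliding_windows(dot))))    # 3 windows: two of size 15, one of size 16
```

Trimming only from the outermost paired bases outward keeps every stem intact, which corresponds to preserving complete hybridized portions while removing dangling ends.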

[0235] The device 4 for designing an optimized aptamer library’s template is represented on Figure 4. Device 4 may implement the computer-implemented method 1 for designing an optimized aptamer library’s template according to any embodiment or any plausible combination of embodiments of the method 1.

[0236] The device 5 for determining relatedness of aptamer’s structures is represented on Figure 5. Device 5 may implement the computer-implemented method 3 for determining relatedness of aptamer’s structures according to any embodiment or any combination of embodiments of the method 3.

[0237] Though the presently described devices 4 and 5 are versatile and provided with several functions that can be carried out alternatively or in any cumulative way, other implementations within the scope of the present disclosure include devices having only parts of the present functionalities.

[0238] Devices 4 and / or 5 may also implement a computer-implemented method that may comprise at least one of the steps g, h and i of method 2.

[0239] Devices 4 and / or 5 may interact with a user interface 20, via which information can be entered and retrieved by a user. The user interface 20 includes any means appropriate for entering or retrieving data, information or instructions, notably visual, tactile and / or audio capacities that can encompass any or several of the following means, as well known by a person skilled in the art: a screen, a keyboard, a trackball, a touchpad, a touchscreen, a loudspeaker, a voice recognition system.

[0240] Devices 4 and / or 5 may interact with one or more local or remote database(s) 10. The latter can take the form of storage resources available from any kind of appropriate storage means, which can be notably a RAM or an EEPROM (Electrically-Erasable Programmable Read-Only Memory) such as a Flash memory, possibly within an SSD (Solid-State Disk).

[0241] A particular apparatus 9, visible on Figure 6, is embodying the device 4 as well as the device 5 described above. It corresponds for example to a workstation, a laptop, a tablet, a smartphone, or a head-mounted display (HMD).

[0242] That apparatus 9 may comprise the following elements, connected to each other by a bus 95 of addresses and data that also transports a clock signal:
- a microprocessor 91 (or CPU);
- a graphics card 92 comprising several Graphical Processing Units (or GPUs) 920 and a Graphical Random Access Memory (GRAM) 921; the GPUs are quite suited to image processing, due to their highly parallel structure;
- a non-volatile memory of ROM type 96;
- a RAM 97;
- one or several I / O (Input / Output) devices 94 such as for example a keyboard, a mouse, a trackball, a webcam; other modes for introduction of commands such as for example vocal recognition are also possible;
- a power source 98; and
- a radiofrequency unit 99.

[0243] According to a variant, the power supply 98 is external to the apparatus 9.

[0244] The apparatus 9 may also comprise a display device 93 of display screen type directly connected to the graphics card 92 to display synthesized images calculated and composed in the graphics card. The use of a dedicated bus to connect the display device 93 to the graphics card 92 offers the advantage of having much greater data transmission bitrates and thus reducing the latency time for the displaying of images composed by the graphics card. According to a variant, a display device is external to apparatus 9 and is connected thereto by a cable or wirelessly for transmitting the display signals. The apparatus 9, for example through the graphics card 92, comprises an interface for transmission or connection adapted to transmit a display signal to an external display means such as for example an LCD or plasma screen or a video-projector. In this respect, the RF unit 99 can be used for wireless transmissions.

[0245] It is noted that the word "register" used hereinafter in the description of memories 97 and 921 can designate in each of the memories mentioned, a memory zone of low capacity (some binary data) as well as a memory zone of large capacity (enabling a whole program to be stored or all or part of the data representative of data calculated or to be displayed). Also, the registers represented for the RAM 97 and the GRAM 921 can be arranged and constituted in any manner, and each of them does not necessarily correspond to adjacent memory locations and can be distributed otherwise (which covers notably the situation in which one register includes several smaller registers).

[0246] When switched-on, the microprocessor 91 loads and executes the instructions of the program contained in the RAM 97.

[0247] As will be understood by a skilled person, the presence of the graphics card 92 is not mandatory, and can be replaced with entire CPU processing and / or simpler visualization implementations.

[0248] In variant modes, the apparatus 9 may include only the functionalities of the device 4, and not the functionalities of the device 5. In addition, the device 4 and / or the device 5 may be implemented differently than a standalone software, and an apparatus or set of apparatus comprising only parts of the apparatus 9 may be exploited through an API call or via a cloud interface.

[0249] The present invention further relates to the product obtained by the method as described herein.

[0250] The invention also relates to an optimized aptamer library’s template and / or an optimized aptamer library obtained by the method as described herein.

EXAMPLES

[0251] The present invention is further illustrated by the following examples.

Optimized selection library design

[0252] The present invention answers the need for a method of optimizing a reproducible aptamer selection library so as to maximize structural diversity while maintaining operational efficacy in library processing.

[0253] A library of 12 random nucleotides was designed (16.8M possible sequences) interspersed with fixed regions and flanked by primer recognition sequences to optimize the number of secondary structures and maximize diversity. RNAfold was used to predict the most probable secondary structure for each sequence in our library, representing the structures using dot-bracket notation (Figure 7). However, it should be noted that other secondary structure prediction tools could also be used. In this notation, dots represent unpaired nucleotides, while opening and closing parentheses represent paired nucleotides forming a stem.
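As an illustration of the notation, the following minimal sketch converts a dot-bracket string into its list of paired positions and checks the size of the random sequence space. The function name `base_pairs` is hypothetical and introduced only for this example.

```python
def base_pairs(dot_bracket):
    """Return the sorted list of (i, j) paired positions in a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)          # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))  # pair with the most recent opening
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)


# 12 random positions with 4 possible nucleotides each.
n_sequences = 4 ** 12
```

Here `n_sequences` evaluates to 16,777,216, which corresponds to the "16.8M possible sequences" stated above.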

[0254] The neomer library optimization pipeline (i.e. the method / process according to the present invention, or the pipeline) employs a multi-step process to generate a structurally diverse set of DNA or RNA sequences optimized for aptamer selection. The process begins by generating 10,000 random combinations of primers, interspersed fixed regions, and evenly distributed random nucleotide position templates. Each template is 49 nucleotides long, containing 37 nucleotides of fixed regions interspersed with 12 randomly placed nucleotides (Ns), excluding the primers at either end. This design allows for 16.8 million possible sequences within the random regions.

[0255] The optimization of interspersed fixed regions provides sequence context that influences overall structure and potential binding properties. The primer sequences are designed using Primer3 with specific parameters: optimal size of 15 bp (range 8-16 bp), optimal melting temperature of 49°C (range 43-54°C), and GC content between 40-60%. The pipeline ensures the template does not form secondary structures with the primer recognition regions. RNAfold, with DNA energy parameters, was used to predict secondary structures and minimum free energies (MFE) for each sequence at 22°C and 0.127M salt concentration. RNAfold uses a dynamic programming algorithm to find the secondary structure with the minimum free energy for a given DNA sequence, considering base pairing and stacking interactions, as well as loop and bulge formations. It employs the Turner energy model adapted for DNA, which includes:
1. Nearest-neighbor thermodynamic parameters for DNA duplex stability.
2. Loop entropy parameters adjusted for DNA, including hairpin loops, bulge loops, internal loops, and multi-branched loops.
3. DNA-specific dangling end parameters.
4. Salt correction factors for the 0.127M concentration.
5. Temperature-dependent corrections for 22°C.

[0256] The number of base pairs in each structure is counted to assess hybridization levels. Sequences are then sorted based on base pair count, and nine libraries are selected: three highly hybridized (top base pair count = ~32), three medium hybridized (closest to mean base pair count = ~16), and three lowly hybridized (lowest base pair count = ~0). It is hypothesized that the optimum level of hybridization may be different for different targets. As such, the hybridization level is also an important variable for library optimization.
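The grouping by hybridization level could be sketched as below. This is a simplified illustration: the function name, the tie-breaking by name, and the choice to exclude already selected high and low candidates from the medium group are assumptions, not details given in the description.

```python
def pick_by_hybridization(structures, per_group=3):
    """Group candidates by base pair count.

    structures: dict mapping a candidate name to its dot-bracket structure.
    Returns a dict with 'high', 'medium' and 'low' lists of names.
    """
    # Each '(' corresponds to exactly one base pair.
    counts = {name: s.count("(") for name, s in structures.items()}
    ranked = sorted(counts, key=lambda n: (counts[n], n))
    mean = sum(counts.values()) / len(counts)
    low = ranked[:per_group]
    high = ranked[-per_group:][::-1]
    remaining = [n for n in ranked if n not in low and n not in high]
    # Medium group: candidates whose count is closest to the mean.
    medium = sorted(remaining, key=lambda n: (abs(counts[n] - mean), n))[:per_group]
    return {"high": high, "medium": medium, "low": low}
```

With `per_group=3`, this yields the nine libraries (three per hybridization level) referred to above, provided at least nine candidates are supplied.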

[0257] For each selected library, the pipeline generates 500 template permutations by varying the positions of the 12 random nucleotides within the 37-nucleotide fixed region. This step characterizes different structural possibilities while maintaining the optimized fixed sequence and primer composition. For each template, 1,000 sequences are generated by filling the N positions with random nucleotides, and their secondary structures are predicted using RNAfold.
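A minimal sketch of this permutation step is given below, assuming that random positions are drawn uniformly and that the order of the fixed nucleotides is preserved. The function names and the use of Python's `random` module are illustrative assumptions; structure prediction itself (e.g. with RNAfold) is not shown.

```python
import random


def place_random_positions(fixed, n_random, rng):
    """Return a template in which n_random 'N' positions are interspersed in `fixed`."""
    total = len(fixed) + n_random
    n_slots = set(rng.sample(range(total), n_random))
    fixed_iter = iter(fixed)
    # Walk the template; emit 'N' at chosen slots, otherwise the next fixed base.
    return "".join("N" if i in n_slots else next(fixed_iter) for i in range(total))


def fill_template(template, rng):
    """Replace every 'N' in the template with a random nucleotide."""
    return "".join(rng.choice("ACGT") if ch == "N" else ch for ch in template)
```

For example, calling `place_random_positions` 500 times on a 37-nucleotide fixed region with `n_random=12`, and `fill_template` 1,000 times per resulting template, would reproduce the sampling sizes described above.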

[0258] The Shannon Diversity Index (SDI) is calculated for each position in the template across all 1,000 sequences. This measures the diversity of structural elements (paired, unpaired, or specific base pairings) at each position. The SDI calculation uses the formula: SDI = -Σ(p_i * log(p_i)), where p_i is the probability of each structural element at a given position.
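The per-position calculation could be sketched as follows. The sketch assumes that the structural element at a position is simply the dot-bracket character found there and that the natural logarithm is used; neither choice is fixed by the description, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def positional_sdi(structures):
    """Shannon Diversity Index per position over equal-length dot-bracket strings."""
    n = len(structures)
    result = []
    for column in zip(*structures):       # one column per template position
        counts = Counter(column)
        result.append(-sum((c / n) * log(c / n) for c in counts.values()))
    return result


def total_sdi(structures):
    """Sum of the per-position SDI values, used to rank templates."""
    return sum(positional_sdi(structures))
```

A position at which every sequence shows the same element contributes zero, so templates whose random positions drive many alternative foldings obtain a higher total.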

[0259] The template with the highest overall SDI is selected as the optimal template for each library, maximizing structural diversity. Through the use of this process, an optimized library was identified: the Neo-939 library. This library contains 105,826 secondary structures with a median of 34 sequences per structure at 22°C and 0.127M salt concentration.

[0260] A set of optimized templates with varying numbers of random nucleotides was created using the process described (Table 1; Seq ID 5 - 14). It was found that the total SDI per position was similarly enriched in templates with 12 to 20 random nucleotides and showed higher enrichment compared to a library with 11 random nucleotides and Somalogic’s standard SELEX library (Figure 8). It is important to note here that this was performed to achieve a template with a total length of 80 nucleotides, to make it directly comparable to the SELEX template. This shows that 12 random nucleotides were sufficient to maximize structural diversity in the template to a greater extent than a commonly used SELEX library.

[0261] Table 1: Most structurally diverse templates for each set of random nucleotides identified from the optimised template approach

[0262] The predicted secondary structures are represented using dot-bracket notation. This approach enables easy comparison and analysis of structural similarities and differences between aptamers, facilitating the identification of common structural motifs that are enriched in response to specific targets or conditions, even if the underlying sequences differ.

[0263] The pipeline optimizes several key factors iteratively to compute and select the top libraries based on total or average SDI per position (Figure 8; Table 2; Sequences ID: 15 - 21):
1. Primer design for efficient amplification
2. Interspersed fixed region compositions for structural context
3. Hybridization levels to cover a range of structural stabilities
4. Positions of random nucleotides to maximize structural diversity

[0264] Table 2: Most structurally diverse templates identified from the optimized template approach

[0265] Libraries are also evaluated and chosen based on the distribution of sequences across secondary structures with the following method:
1. Generate all the possible sequence combinations for each optimized library template;
2. Predict the secondary structures of these sequences using RNAfold;
3. Count the number of sequences that fold into each secondary structure;
4. The number of sequences folding into each predicted structure is assessed to identify library templates that minimize the number of structures with high and low sequence representation;
5. It is possible to identify library templates that have a more even distribution of sequences across a wide range of structures. This helps us select libraries that offer an unbiased representation of the structural space, maximizing the potential binding solutions that can be effectively evaluated. This even distribution can be seen in Figure 9;
6. Additionally, the number of sequences folding into each secondary structure is assessed to identify library templates that minimize the number of structures with low and high sequence representation. A range of 5-3000 sequences per structure would be considered optimal as these can be effectively evaluated after a selection based on expected observations.
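Steps 3 to 6 above could be sketched as follows, taking as input the predicted structure of every sequence of a template. The function name and the returned summary fields are hypothetical; only the 5-3000 range comes from the description.

```python
from collections import Counter


def representation_summary(structure_of_each_sequence, low=5, high=3000):
    """Summarize how evenly sequences are distributed across structures.

    structure_of_each_sequence: iterable with one dot-bracket string per sequence.
    """
    counts = Counter(structure_of_each_sequence)   # sequences per structure
    within = sum(1 for c in counts.values() if low <= c <= high)
    return {
        "n_structures": len(counts),
        "under": sum(1 for c in counts.values() if c < low),
        "over": sum(1 for c in counts.values() if c > high),
        "fraction_within": within / len(counts),
    }
```

Under this sketch, a template with a higher `fraction_within` and fewer under- or over-represented structures would be preferred.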

[0266] Step by step this approach aims to optimize the structural diversity of the library while maintaining adequate sequence representation for each structure, enhancing the statistical power of subsequent analyses. This library design technique is critical for optimizing the utility of our approach by maximizing the potential binding solutions while adhering to the statistical principles outlined in the previous section.

[0267] The library of Sequence ID 11 was applied to immobilized interleukin-6. Recombinant IL6 with an N-terminal his tag was immobilized on His-Pur™ Ni-NTA Resin (ThermoScientific) following a modified purification protocol. IL6 (50 µg) was prepared in 200 µL of binding buffer (20 mM sodium phosphate, 300 mM sodium chloride, pH 7.4) and incubated with 70 µL of Ni-NTA Resin for 2 hr at room temperature with agitation. Unbound protein was collected by centrifuging at 700 x g for 2 min and removing the supernatant. The resin was washed once with 140 µL of binding buffer. Remaining active sites were blocked with 2 mM imidazole, incubating for 1 hr at room temperature with rotation. After blocking, the resin was washed with 1x PBS and resuspended as a 50:50 slurry in 1x PBS.

[0268] Human serum albumin (HSA) (Sigma-Aldrich Canada) was immobilized on Ultralink Biosupport (ThermoScientific), which uses azlactone functional groups to bind primary amines on proteins, following user guide recommendations. Solubilized HSA was prepared to a final concentration of 2 mg / mL in conjugation buffer (500 mM sodium citrate, 200 mM sodium bicarbonate, pH 8.5). 200 µL of protein solution was added to 8.8 mg of dry UltraLink Beads (swell volume of 8 µL / mg) and incubated for 2 hr at room temperature with agitation. Unbound protein was removed by centrifuging at 1200 x g for 5 min. The resin was washed once with 140 µL of conjugation buffer. Remaining active sites were blocked overnight with 1 M Tris (which contains a primary amine) at 4 °C. After blocking, the resin was washed and stored in PBS. This method results in multiple orientations of immobilized HSA, presenting various epitopes for aptamer binding.

Neomer Selection

[0269] Prior to selection, the Neomer library was denatured at 95 °C for 5 min, cooled on ice for 10 min, and equilibrated to room temperature. Selection was conducted in a 1 mL column fitted with a 20 µm frit. The refolded library (7.15 pmoles) was incubated with 10 µL of IL6-Ni-NTA resin (223 pmol IL6) in 100 µL of selection buffer (10 mM HEPES, 120 mM sodium chloride, 5 mM potassium chloride, 5 mM magnesium chloride, pH 7.6) for 30 min at room temperature with agitation. The unbound library was discarded in the flowthrough and the column washed three times with 500 µL of selection buffer. Bound library was eluted by adding 200 µL of 6 M urea to the column, heating at 85 °C for 5 min and collecting the flowthrough. The elution was repeated, and the eluents pooled together. The eluted library was purified using the GeneJet PCR Purification Kit (ThermoFisher) and eluted with 50 µL of MilliQ filtered water. The library was brought up to 100 µL in selection buffer and incubated with another 10 µL of IL6-Ni-NTA resin. The purified library was eluted in 400 µL of MilliQ filtered water and stored in a silanized vial. Selection against HSA was conducted using the same procedure with the following changes. The library was incubated with 10 µL of HSA-Ultralink Biosupport (369 pmol HSA) in 100 µL of PBS. The purified library was eluted in 400 µL of MilliQ filtered water. Each selection as described was conducted in triplicate for each target.

NGS Preparation

[0270] Nested PCR primers were applied to each selected DNA library for sequence identification in a two-step process. The sequences used for this process are listed below. This process adds a hex code for each selection replicate immediately to the 5’ side of the neomer sequence. This amplicon is flanked by universal primer sequences.

[0271] The amplified sequences were then purified from a 20% acrylamide gel. The target band was excised from the gel, fragmented, and stored in TE buffer in a silanized vial for 3 days to elute the DNA. The DNA was subsequently purified using a GeneJet PCR Purification Kit, following the standard protocol, and eluted from the column with 30 µL of water. 10 µL of the DNA was run on a 10% polyacrylamide gel to determine the concentration of each library. Sequence analysis was completed using Illumina HiSeq 2500.

[0272] Negative control selections of the library against UltraLink resin and nickel resin without immobilized proteins were also performed in triplicate, and the unselected naive library was also analyzed.

2 Neomer structure analysis

[0273] Previously, the development of a reproducible aptamer selection library relied on a method that involved splitting the library into two halves (modules), characterizing the selected sequences from each module separately through next generation sequencing analysis, and predicting optimally selected sequences by multiplying the individual module sequence frequencies by each other in an outer product matrix. This approach represented a significant advance in aptamer development by reducing selection to a reproducible process, but the use of an outer product matrix to compile predicted frequencies of individual sequences was limited in its accuracy. The outer product matrix approach was not deterministic. As such, this approach enabled the characterization of selection on the level of each module, but not on the full sequence level.

[0274] To overcome this constraint, a method to characterize the effect of selection on neomer structures rather than sequences was invented. This method was fully deterministic in terms of accuracy. In general, any given structure within the optimized neomer selection library was created by more than one sequence. This redundancy increases the statistical robustness of this approach by increasing the number of reads per structure from NGS analysis given a fixed number of total reads.

[0275] The Neomer selection analysis pipeline was designed to process next-generation sequencing (NGS) data from Neomer selections by evaluating statistically enriched structures and generating output tables suitable for network analysis. Network analysis is based on relationships among structures and serves as a basis for selection of candidate structures. This approach aims to emulate transcriptomics methodologies by identifying variables that exhibit differential enrichment between conditions, applying analogous logic for selecting structures with appropriate count levels for statistical inference.

[0276] The pipeline's initial steps parallel those of RNA-seq analysis workflows, where raw sequence data is processed, aligned to a reference, and quantified. In our case, the reference consists of the set of all possible predicted secondary structures from an optimized selection library rather than a genome or transcriptome. The quantification step produces a count matrix of structures across samples, analogous to gene expression matrices in transcriptomics.
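The quantification step could be sketched as follows. The sketch assumes that reads have already been trimmed to the neomer sequence and that the reference is available as a mapping from each possible sequence to its predicted structure; the function and parameter names are hypothetical.

```python
from collections import Counter


def structure_count_matrix(reads_by_sample, structure_of):
    """Build a structure-by-sample count matrix.

    reads_by_sample: dict mapping a sample name to a list of read sequences.
    structure_of: dict mapping each reference sequence to its dot-bracket structure.
    Returns (samples, matrix) where matrix maps a structure to counts per sample.
    """
    samples = sorted(reads_by_sample)
    per_sample = {
        # Reads that are absent from the reference are ignored.
        s: Counter(structure_of[r] for r in reads_by_sample[s] if r in structure_of)
        for s in samples
    }
    structures = sorted(set().union(*per_sample.values()))
    matrix = {st: [per_sample[s][st] for s in samples] for st in structures}
    return samples, matrix
```

Because several sequences fold into the same structure, their reads accumulate in a single row, which is the redundancy that the structural analysis relies on.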

[0277] Statistical analysis for differential enrichment of structures between conditions is performed using methods adapted from differential expression analysis in transcriptomics. Similar approaches to those used in popular tools such as DESeq2 or edgeR were employed, adjusting for the specific characteristics of our structural data (Figure 10). This includes normalization procedures to account for sequencing depth differences and dispersion estimation to model the variance of structure counts. A key difference is that our pipeline works on the structural level rather than the sequence level.
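As one possible illustration of normalization for sequencing depth, the sketch below computes median-of-ratios size factors, the scheme used by DESeq2, in plain Python. It is a stand-in chosen for this example: the description does not state which normalization the pipeline applies, and dispersion estimation is not shown.

```python
from math import exp, log
from statistics import median


def size_factors(matrix):
    """Median-of-ratios size factors for a structure-by-sample count matrix.

    matrix: dict mapping a structure to a list of counts, one per sample.
    """
    # Only structures observed in every sample contribute to the reference.
    rows = [r for r in matrix.values() if all(c > 0 for c in r)]
    if not rows:
        raise ValueError("no structure has non-zero counts in every sample")
    n_samples = len(rows[0])
    log_geo_means = [sum(log(c) for c in r) / n_samples for r in rows]
    return [
        exp(median(log(r[j]) - g for r, g in zip(rows, log_geo_means)))
        for j in range(n_samples)
    ]


def normalize(matrix):
    """Divide each count by the size factor of its sample."""
    factors = size_factors(matrix)
    return {k: [c / f for c, f in zip(r, factors)] for k, r in matrix.items()}
```

After normalization, two samples that differ only in sequencing depth yield identical normalized counts for each structure.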

[0278] The application of statistical techniques from transcriptomics to our aptamer selection methodology provides a robust statistical framework with innovative utility. This approach represents a preferred enablement of this invention for the following reasons:
1. Statistical rigor: Transcriptomics methods have established protocols for handling large-scale, high-dimensional data with inherent biological variability. These statistical approaches, when applied to aptamer selection, enable high-confidence identification of significant structures. Visualisations such as principal component analyses, concordance scatterplots and densigrams are used to verify sample condition variation and technical concordance, and are shown as a specific enablement of this approach for the detection of neomers for IL6 (Figure 11).
2. Reproducibility: Standardized transcriptomics workflows have dramatically improved reproducibility in gene expression studies. Adapting these principles to aptamer selection addresses a critical need for more consistent and reliable aptamer discovery processes.
3. Differential analysis: Transcriptomics techniques for identifying differentially expressed genes can be analogously applied to detect differentially enriched aptamer structures, potentially revealing structure-function relationships, and are shown as a specific enablement of this approach for the detection of neomers for IL6 (Figures 12, 13).

[0279] Gene network exploration refers to the study of interactions and relationships among genes using network analysis techniques. In this context, a gene network is a graphical representation of gene interactions, where nodes represent genes, and edges represent the relationships or interactions between them. These relationships can be based on various factors, such as co-expression patterns, regulatory interactions, or functional associations.

[0280] Instead of gene expression or interaction data, selection performance data and structural relationships among selected aptamer structures are used here, giving us insight into the nature of the structures selected. This is immediately useful in terms of assigning aptamers to different structural groups as a means of identifying aptamers that bind to different epitopes on a target. For many diagnostic applications, the identification of a capture / detection pair of aptamers that bind to different epitopes on a protein is necessary. Our analysis of relationships among selected structures enables scientific guidance in selecting candidates for this. Secondly, insight into the structural basis of selection enables scientific guidance over the truncation of aptamers to their minimal binding domains, a process that often increases specificity and affinity. To our knowledge, our insight regarding the application of gene network and transcriptomic analysis to the characterization of aptamer selection is novel. These insights are enabled by the existence of a reproducible aptamer selection platform, but they are not obvious, as these approaches have not been used previously to characterize the selection process for aptamers, antibodies or peptides with phage-display.

[0281] In an enablement, the top ten, twenty-five, seventy-five, one hundred, or two hundred and fifty or more high-performing neomer structures are selected.

Neomer structure network analysis

[0282] The invention of characterizing the effect of selection at a structural rather than a sequence level for neomers enabled us to develop an additional insight. This insight is that the relationship among selected structures has utility in terms of defining different structural solutions for binding to a given protein that are apparent in the selected neomer library. In this example a method for the analysis and use of such potential structural relationships is detailed.

[0283] Hamming distance is defined for the purposes of this invention as representing the number of position-specific differences in dot-bracket secondary structures. These distances are determined for all selected neomer structures and are used to define edges between them. A threshold level Hamming distance is established and applied to create an asymmetric topology, which has been shown to be particularly useful in network analysis for inferring functional relationships. This approach was applied to the selected IL6 neomer structures (Figure 14).
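The distance and thresholding steps could be sketched as follows. The function names and the threshold value used in any call are illustrative; the description does not state the threshold applied.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length dot-bracket strings differ."""
    if len(a) != len(b):
        raise ValueError("structures must have equal length")
    return sum(x != y for x, y in zip(a, b))


def threshold_edges(structures, max_distance):
    """Return (structure_a, structure_b, distance) for pairs within the threshold."""
    edges = []
    for a, b in combinations(structures, 2):
        d = hamming(a, b)
        if d <= max_distance:
            edges.append((a, b, d))
    return edges
```

Since all structures derived from one template share the same length, the position-by-position comparison is well defined for every pair.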

[0284] An extension of this approach was also applied in which the concept of nearest neighbour modules or subnetworks is used to describe centroid nodes and their similar structures. When nearest neighbour modules or subnetworks share an identical node, an edge connects them through that node, forming what is described as a cluster. Clusters are interpreted as representing similar binding solutions to a target, which is inferred to be topologically meaningful in representing a solution or epitope for which structures show varying specificity and binding characteristics. This approach was applied to the selected IL6 neomer structures (Figure 15, in which red dots are those that passed filter checks, while yellow dots are the ones selected for different epitope binding).
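One way to sketch modules and clusters is shown below. It assumes that the centroid nodes are supplied by the caller (for example, the most enriched structures) and that a module consists of a centroid plus every structure within the distance threshold; how centroids are actually chosen is not specified in the description, and all names are hypothetical.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length dot-bracket strings."""
    return sum(x != y for x, y in zip(a, b))


def neighbour_modules(centroids, structures, max_distance):
    """Map each centroid to the set formed by itself and its near neighbours."""
    return {
        c: {c} | {s for s in structures if hamming(c, s) <= max_distance}
        for c in centroids
    }


def merge_into_clusters(modules):
    """Merge modules that share at least one node into clusters."""
    clusters = []
    for members in modules.values():
        merged = set(members)
        keep = []
        for cluster in clusters:
            if cluster & merged:
                merged |= cluster      # shared node: absorb the existing cluster
            else:
                keep.append(cluster)
        clusters = keep + [merged]
    return clusters
```

Under this sketch, structures drawn from different clusters would be candidates for binding different epitopes, in line with the interpretation given above.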

[0285] This asymmetric network structure contrasts with the symmetrical "hairball" structure that would result if all nodes were connected based on Hamming distance alone. Such symmetrical networks, while comprehensive, often lack inherent topological meaning for deciphering functional relationships. Our approach, by introducing asymmetry, aligns with recent advancements in network theory that highlight the importance of directed and asymmetric relationships in biological networks.

[0286] The use of Hamming distance as a measure of structural similarity in our network construction is analogous to sequence similarity measures used in protein interaction networks. However, our focus on secondary structure provides a unique perspective on potential functional similarities between aptamers.

[0287] The neomer analysis pipeline culminates in the integration of multi-dimensional data into Cytoscape for advanced network visualization and analysis. It should be noted that any network visualisation software could be used instead, such as Gephi, a custom interactive network visualisation tool, or a Python library that renders non-interactive networks combined with subsetting and filtering thresholds. This process incorporates structure identifiers, enrichment statistics, diversity metrics, and thermodynamic parameters, providing a comprehensive view of aptamer performance and relationships across different selections.

[0288] Central to this analysis are the enrichment metrics: target and counter-target averages, fold changes, log-transformed ratios, and other statistical metrics such as p-values or other relevant measures of statistical significance. These parameters allow for a nuanced assessment of aptamer specificity and performance across various selection conditions. By comparing these metrics between different selections, researchers can identify structures that demonstrate consistent enrichment for the target of interest while maintaining low affinity for counter-targets.

[0289] Network visualization reveals clusters of related structures based on their Hamming distance. This approach allows for the strategic selection of aptamers based on their structural diversity, ensuring a diverse set of binding solutions that potentially target distinct epitopes on the molecule of interest. By comparing the performance of structures across different selections, it is possible to identify aptamers that are not only highly enriched but also highly specific to the target of interest.

[0290] This method of integrating and visualizing data from multiple selections represents a significant advance over traditional SELEX approaches in several key ways:
1. Comprehensive structural characterization: Our approach allows for the full characterization of all possible secondary structures in the initial library and their binding during selection. This contrasts with SELEX, where libraries do not show the redundancy characteristics (multiple sequences corresponding to the same structure) that make a library amenable to reproducible and robust statistical analysis.
2. Structure-based analysis: By analyzing aptamers based on their secondary structures rather than just nucleotide sequences, deeper insights into the structural determinants of binding are gained. This allows us to identify families of related structures that may bind to similar epitopes on the target.
3. Statistical robustness: The closed sequence space enables rigorous statistical analysis of enrichment patterns across the entire structural landscape of the optimised template, including structures that may be negatively selected against. This provides a more complete understanding of selection dynamics.
4. Multi-dimensional data integration: Our approach integrates enrichment data, structural information, and statistical metrics across multiple selection conditions. This multi-faceted view enables researchers to make more informed decisions in aptamer selection.
5. Specificity analysis: There are two primary concerns regarding target specificity for which the approach disclosed herein provides an improvement over SELEX. The approach for both specificity concerns is the same and is based on the following steps:
i. Applying the same set of sequences in triplicate across different targets.
ii. Identifying the secondary structures selected for the desired target.
iii. Characterizing the performance of the selected secondary structures from step ii in selections against other targets.
iv. Maximizing specificity of aptamer binding for a specified target molecule versus other similar molecules (for example: protein isoforms, protein post-translational modifications, or similar metabolites, such as steroids or beta-lactam antibiotics).
v. Analyzing the differential enrichment of secondary structures among targets exposed to selection with the same initial library. Structures that show high enrichment in the target selection but low enrichment in the off-target selections will be more specific for the target.
vi. By performing specificity analysis as outlined above, it is possible to identify aptamer structures that are not only highly enriched for the target of interest but also demonstrate minimal binding to related molecules, non-specific surfaces, or unrelated targets. This approach helps to pinpoint secondary structures that are likely to be highly specific for the target, improving the chances of success in downstream applications.
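Step v could be sketched as a comparison of mean normalized counts per structure between the target selection and the off-target selections. The use of a log2 ratio against the highest off-target value, the pseudocount, and the cut-off are assumptions made for this illustration, and the function names are hypothetical.

```python
from math import log2


def specificity_table(target_mean, off_target_means, pseudocount=1.0):
    """Log2 ratio of target enrichment over the strongest off-target enrichment.

    target_mean: dict mapping a structure to its mean normalized count on target.
    off_target_means: dict mapping an off-target name to a dict of the same kind.
    """
    table = {}
    for structure, t in target_mean.items():
        # Compare against the worst case among all off-target selections.
        worst = max(m.get(structure, 0.0) for m in off_target_means.values())
        table[structure] = log2((t + pseudocount) / (worst + pseudocount))
    return table


def specific_structures(table, min_log2_ratio=1.0):
    """Structures at or above the ratio cut-off, most specific first."""
    hits = [s for s, v in table.items() if v >= min_log2_ratio]
    return sorted(hits, key=lambda s: -table[s])
```

In practice such a ratio would be combined with the significance measures discussed above rather than used on its own.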

[0291] The ability of the present invention to fully characterize the structural basis for selection by comparing selection effect across related structures represents a key advancement in aptamer science. It enables a more comprehensive understanding of aptamer-target interactions by simultaneously considering enrichment, specificity, structural diversity, and statistical significance across various selection conditions. This multi-dimensional analysis allows researchers to make informed decisions in aptamer selection, potentially increasing the success rate in binding assays.

[0292] By performing specificity analysis as outlined above, it is possible to identify aptamer structures that are not only highly enriched for the target of interest but also demonstrate minimal binding to related molecules, non-specific surfaces, or unrelated targets. This approach helps to pinpoint structures that are likely to be highly specific for the target, improving the chances of success in downstream applications.

Neomer sequence selection

[0293] It is ultimately necessary to define an aptamer sequence for each candidate aptamer structure arising from the previous example, for synthesis and experimentation. The choice of the sequence that represents a selected structure comprises the following steps:
1. Sequence Selection: Sequences for each identified secondary structure of interest are analysed relative to the naive library.
2. Enrichment Quantification: The degree of enrichment is determined by comparing the counts of each sequence within that secondary structure to its counts in the naive library.
3. Specificity Considerations: Specificity analysis is deliberately omitted from the selection criteria. This decision is based on the following observations:
i. The naive library typically shows non-zero counts for almost all sequences across various samples.
ii. In contrast, counter-target conditions often result in zero counts for many sequences.
iii. This disparity in sequence representation between the target and naive library conditions can lead to mathematical artifacts when calculating dispersion, fold change, and other statistical metrics such as p-values.
4. Statistical Robustness: By focusing on enrichment relative to the naive library, rather than incorporating counter-target data at this stage, the statistical integrity of the selection process is maintained and potential biases introduced by the artifacts are avoided.
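The enrichment quantification of step 2 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline itself: the function name, the use of a log2 frequency ratio and the example counts are hypothetical. Because the naive library is described as having non-zero counts for almost all sequences, the sketch uses no pseudocount and simply skips any sequence with a zero count.

```python
import math

def sequence_enrichment(selected, naive):
    """log2 ratio of each sequence's frequency after selection to its naive frequency."""
    selected_total = sum(selected.values())
    naive_total = sum(naive.values())
    enrichment = {}
    for seq, count in selected.items():
        naive_count = naive.get(seq, 0)
        if count == 0 or naive_count == 0:
            continue  # ratio undefined; skipped in this sketch
        ratio = (count / selected_total) / (naive_count / naive_total)
        enrichment[seq] = math.log2(ratio)
    return enrichment

# Hypothetical read counts for three sequences folding into the same structure.
selected = {"ACGTACGT": 80, "ACGTTCGT": 15, "ACGAACGT": 5}
naive = {"ACGTACGT": 10, "ACGTTCGT": 30, "ACGAACGT": 60}

enrichment = sequence_enrichment(selected, naive)
chosen = max(enrichment, key=enrichment.get)  # most enriched sequence of the structure
```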

[0294] This approach was applied to the selected IL6 structures to identify a set of neomer sequences specific to IL6 (Figure 16; Table 3; Table 4; Seq ID 22 - 26).

[0295] Table 3: Structure candidates selected from Hyb939 IL6 network analysis

[0296] Table 4: Significant sequences selected from candidate structures for truncation analysis.

[0297] The pipeline flow ensures a reliable initial selection of sequences based on their structural properties and enrichment profiles, while reserving specificity analysis for subsequent stages of aptamer development and characterization.

[0298] By combining this sequence-level analysis with the previous structure-level network analysis, the neomer pipeline provides a multi-tiered approach to aptamer selection. It ensures that chosen aptamers not only represent high-performing structures but also capture sequence diversity within those structures, increasing the probability of selecting aptamers that are sensitive and specific to a target.

[0299] The novel ability to assign selected neomer structures to different clusters based on structural similarity has utility in the following applications:
1. Many diagnostic applications rely on the use of a pair of ligands, known as a capture ligand and a detection ligand, for example but not limited to Enzyme Linked OligoNucleotide Assays (ELONA). In these applications it is necessary that the capture and detection neomers bind to different epitopes on the same target. This invention enables rational choices for the assembly of such pairs.
2. The specificity performance of an aptamer or neomer as applied in a selection library may differ from the specificity performance when such an aptamer or neomer is used by itself in a diagnostic assay, as the presence of the other aptamers in the library may block aspecific binding to other targets. The ability to characterize aptamer structures that bind to different epitopes is useful for the selection of candidate sequences for binding assays. This increases the probability that a neomer will be identified that exhibits the desired level of specificity by enabling the screening of different binding solutions.
3. It is possible that in some instances the different solutions derived for binding to the same protein may involve the same epitope on such a protein. In these cases a combination of the two neomers in one molecule may result in higher binding affinity at a given epitope.
4. Certain applications of aptamers or neomers involve binding to a protein in order to inhibit a specific property of the protein, such as but not limited to a binding domain within the protein. The network analysis described herein could be applied in a co-enrichment analysis to identify structures that bind to a common protein domain, across different proteins.
5. High-plex proteomics (also named next generation sequencing based proteomics) simplifies traditional proteomics based on liquid chromatography followed by two mass spectrometry analyses (LC-MS/MS) by translating information on protein abundance to surrogate DNA sequence abundance, which can be readily read out by NGS or qPCR analysis. Our invention of a method to predict neomers that bind to the same protein but at different epitopes enables High-plex proteomics to also screen for post-translational modifications across multiple epitopes.

5: Neomer optimization through truncation

[0300] An enablement of this invention is to optimise aptamer function by decreasing the flux of structures represented by a selected aptamer sequence arising from example 3. Disclosed here is a method for this enablement, based on characterizing which elements within a selected structure are responsible for target affinity and specificity and which are not. This characterization is then used to truncate selected aptamer sequences to their optimal binding efficacy. The method is applied to the selected secondary structure and its most significantly enriched sequence and comprises the following steps:
1. Selection of Target Structures: Structures are chosen based on a Hamming distance threshold of 5, 10, 15, 20 or 25 from the reference secondary structure. This focuses on structures closely related to the reference, implying similar functional capabilities.
2. Motif Extraction and Analysis: All motifs from selected structures are extracted, including information about their position, frequency, and significance. A sliding window method examines motif lengths from 15 to 45 nucleotides within the structure. The significance of motifs is evaluated using Fisher's exact test, comparing their occurrence to the naive dataset.
3. Aligning and Overlaying Motifs: Significant motifs are realigned on the reference structure based on their starting positions. A dynamic threshold for statistical significance (p-value) is set at the 75th, 80th, 85th, 90th, 95th, 97th or 99th percentile of all motif p-values. Motifs below this threshold are considered for truncation.
4. Optimization of the Truncated Structure: The truncated structure is adjusted to balance hybridized regions, ensuring structural stability, first considering the original nucleotide sequence and using non-reference sequence nucleotides if this fails. Excess regions or 'dangling ends' that do not contribute to the structure's function are trimmed, either because they fall in a percentile below the one chosen in the previous step or, in the case of dangling ends, on the basis of their secondary structure annotation using tools like bpRNA.
5. Verification and Adjustment: Tools like RNAfold are used to verify the truncated structure's stability and correctness. If the structure does not meet expected criteria, further adjustments are made, such as extending certain dangling end regions that may contribute to the stability of functional areas of the structure (Figures 11-15, Table 5).
6. Mutation analysis: To understand how single nucleotide positions can change the structure of the truncated aptamer, interactive plots (HTML visualizations) are created to map each random nucleotide’s identity and its effect on statistical significance (p-values) to the truncated structure. This characterises the most important parts of the truncated structure, giving insight into binding positions.
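The motif extraction and significance test of step 2 can be sketched as follows. This is an illustration under assumptions, not the disclosed code: the function names and the example structure are hypothetical, the test is taken to be one-sided (enrichment in the selected library), and the exact binomial-coefficient computation is only practical for small counts.

```python
from math import comb

def sliding_motifs(structure, min_len=15, max_len=45):
    """Yield (start, motif) for every window of min_len..max_len characters."""
    top = min(max_len, len(structure))
    for length in range(min_len, top + 1):
        for start in range(len(structure) - length + 1):
            yield start, structure[start:start + length]

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a: selected reads with the motif, b: selected reads without it,
    c: naive reads with the motif,    d: naive reads without it.
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(total - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(total, row1)

# Hypothetical 30-nt dot-bracket reference structure.
reference = "(" * 10 + "." * 10 + ")" * 10
motifs = list(sliding_motifs(reference))

# Toy 2x2 table: motif seen in 3 of 4 selected reads and 1 of 4 naive reads.
p_value = fisher_exact_greater(3, 1, 1, 3)
```

The p-values collected over all motifs are what the percentile threshold of step 3 would then be applied to.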

[0301] Table 5: Truncated aptamer statistics

[0302] This truncation process ensures that aptamer structures are effectively minimized while retaining their functional regions, to improve affinity and specificity.

[0303] Truncation was applied to the IL6 selected structures and sequences to generate a set of truncated neomers specific to IL6 (Tables 3-5; Figures 11-15; Seq ID 22 - 31).

[0304] The original full length selected aptamer sequences and the truncated sequences resulting from the examples outlined herein were tested for their binding affinity to interleukin 6 (positive target), human serum albumin (negative target) and immunoglobulin (negative target) using surface plasmon resonance imaging (SPRi).

[0305] Each aptamer was synthesized with a 5’ extension as listed in Table 5. This extension was added so that each aptamer, when immobilized to a surface, could still form the structure selected for in the free state.

[0306] Table 6: Binding coefficients for full length and truncated aptamers selected for interleukin-6
NB = No binding observed

[0307] The aptamers were immobilized in triplicate on a gold surface, each at a concentration of 5 µM in a volume of 10 nL. A negative aptamer was spotted as a control to correct for baseline resonance and determine resonance due to binding with the candidate aptamers. The gold chip was suspended above a glass prism on a layer of mineral oil. The targets were prepared at 100 nM in 10X HEPES buffer pH 7.4 (identical in composition to the buffer used for selection) and then flowed over the chip. A volume of 200 µL of the target solution was injected each time, with the flow rate maintained at 50 µL per minute. After each injection, the chip was regenerated by injecting 200 µL of 5% (w/v) sodium dodecyl sulfate (SDS).

[0308] Figure 22 shows the binding performance of all aptamers against a 100 nM (A) and a 250 nM (B) concentration of Interleukin 6. The legend provides a guide for the identification of each aptamer in each figure.

[0309] Figure 23 shows the binding performance of all aptamers against a 100 nM (A) and a 250 nM (B) concentration of human serum albumin. The legend provides a guide for the identification of each aptamer in each figure.

[0310] Figure 24 shows the binding performance of all aptamers against a 100 nM (A) and a 250 nM (B) concentration of a pool of immunoglobulins. The legend provides a guide for the identification of each aptamer in each figure.

[0311] Given the lack of positive resonance due to binding, it is clear that only the aptamers D2 and D1.1 exhibited any binding response to human serum albumin and that no aptamers exhibited any binding response to immunoglobulins.

[0312] The binding affinities of the aptamers against interleukin 6 are provided in Table 6.

[0313] Certain of the selected aptamers exhibit high binding affinity to interleukin 6 with no apparent binding to human serum albumin or immunoglobulins. In particular, the aptamers D4.1 and D5.1 exhibited digit nM binding affinity to the desired target.

Example 6: Power analysis for determining appropriate experimental design and interpretation

[0314] To validate the statistical robustness of our approach, a power analysis was conducted to determine whether the expected observations of structures were detectable for various effect sizes.

[0315] The power analysis results demonstrate that for large effect sizes (greater than a Cohen’s effect size of 3), our sample size of at least 3 target samples and 9 naive samples at 1E7 reads for the initial selection is more than adequate to achieve 80% power and a significance of 0.05 (Table 7).
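As a rough check on this statement, power for a two-group comparison can be approximated as below. This is a sketch under assumptions, not the analysis behind Table 7: it uses a normal approximation to a two-sided two-sample test, ignores sequencing depth, and the function name is hypothetical. The normal approximation overstates power for samples this small; an exact calculation would use the noncentral t distribution.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size, n_target, n_naive, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    effect_size is a standardised mean difference (Cohen's d).
    """
    z = NormalDist()
    noncentrality = effect_size * sqrt(n_target * n_naive / (n_target + n_naive))
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)

# 3 target samples versus 9 naive samples at an effect size of 3.
power = approx_power(effect_size=3.0, n_target=3, n_naive=9)
meets_target = power >= 0.80
```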

[0316] Table 7: Power analysis statistics show that most structures in Hyb939 cross the statistical threshold for an experiment conducted under standard conditions.

[0317] This analysis is particularly relevant in the context of our Alzheimer's disease study, which utilized the original reproducible 16-nucleotide, sequence-focused approach detailed in the previous patent. A sample size of 10 cases vs. 10 controls was used for the initial aptamarker selection, followed by qPCR assays on 390 samples for validation. Aptamarkers with effect sizes ranging from 2.05 to 3.08 were identified.

[0318] The ability to detect these large effect sizes with just 10 samples per group underscores the efficiency of our approach in identifying strong aptamer signals that are predictive of complex diseases due to their ability to agnostically identify non-canonical biomolecules as explained in our aptamarker patent. This has enabled us to develop an aptamer-based qPCR assay test achieving 82% accuracy for Alzheimer's disease using this small sample size for initial selection, further validating the power and efficiency of our neomer analysis pipeline in biomarker discovery and diagnostic development.

[0319] It is important to note that while the Alzheimer's study was conducted using the 16 nucleotide Punnett square approach detailed in the paper, recent advancements have led to the development of the new method detailed here, which yields even more sensitive and specific aptamers by taking a structure-focused approach. Future studies employing this improved approach will further enhance the statistical power and performance of our aptamarker-based diagnostic tests.

[0320] The neomer technology, as described in the current patent application, extends beyond aptamer selection and Aptamarkers to proteomics. With the reproducibility and statistical robustness of the neomer approach, it is possible to develop high-throughput, quantitative protein profiling methods similar to those employed by companies like Olink and SomaLogic but using a top-down approach as opposed to a bottom-up approach.

[0321] Olink and SomaLogic use probe-based assays for highly multiplexed protein quantification. This allows them to identify individual aptamers for each canonical target protein using a bottom-up method. In contrast, the neomer approach would identify all the associations between the structures and proteins in a sample simultaneously by developing a machine learning training model on samples with known protein concentrations in a single selection round. SomaLogic’s SomaScan platform relies on the specific binding of proprietary aptamers (Somamers) binding to target proteins. Both companies provide their platforms for analysis with next generation sequencing. Their approach depends on the training of probes on recombinant proteins, thus we term these approaches as bottom-up. Probes are trained on recombinant proteins and then combined into a panel to detect these proteins simultaneously. In contrast, the neomer approach can be considered top-down in that it involves the application of the same library of probes on different samples. There is no pre-training of the probes on recombinant proteins. Information is gained through differences in the relative proportions of aptamer structures across samples. As such, the platform is capable of detecting any change in any protein in any species.

[0322] Protein identity of aptamer structures that change in relation to a medical contrast can be determined by several methods that are known in the art, including but not limited to: positive correlation with measurements of known proteins across the same samples; reverse proteomics involving the immobilization of an enriched aptamer, flow of the biological sample over the immobilized aptamer, elution of bound proteins, and characterization of their identity by mass spectrometry; characterization of the library response to known proteins by training it against recombinant proteins; and integration with genomics, transcriptomics or other proteomics data.
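The first of these methods, correlation with measurements of known proteins across the same samples, can be sketched as follows. The function names, the choice of Pearson correlation and all values are assumptions made for illustration; the text does not specify a particular correlation measure.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def best_matching_protein(structure_abundance, protein_levels):
    """Return (protein, r) with the highest positive correlation across samples."""
    scores = {name: pearson(structure_abundance, levels)
              for name, levels in protein_levels.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical abundance of one aptamer structure in five samples, and
# hypothetical measured levels of two known proteins in the same samples.
structure_abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
protein_levels = {
    "protein_X": [2.0, 4.0, 6.0, 8.0, 10.0],
    "protein_Y": [5.0, 3.0, 4.0, 1.0, 2.0],
}
protein, r = best_matching_protein(structure_abundance, protein_levels)
```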

[0323] Enablements of this invention as applied to High-Plex proteomics include the following:
1. Top-down protein profiling: By applying the neomer library to samples with known protein concentrations, it would be possible to identify the associations between aptamer structures and protein levels in a single selection round. This top-down approach would enable the simultaneous development of aptamers for multiple proteins in a single process. Once the panel is built, like Olink and SomaLogic assays, it can be applied as a single multiplexed assay. The key difference is in the initial development process, where our approach reduces the need for individual aptamer selection for each target protein, as is required in the bottom-up methods used to develop the Olink and SomaLogic panels;
2. Multiple aptamers targeting protein epitopes: The neomer approach can identify multiple aptamer structures that bind to different epitopes on the same protein. This is particularly important for accurate protein quantification, as it allows for the development of sandwich assays and provides a more comprehensive view of protein abundance. By selecting aptamers against different epitopes, the neomer technology can improve the specificity and reliability of protein measurements, reducing the risk of false positives or negatives due to epitope masking or conformational changes;
3. Non-canonical protein identification through epitope mapping: The neomer technology's ability to identify multiple aptamers targeting different epitopes on a single protein can be extended to the discovery and characterization of non-canonical proteins. By comparing the epitope-binding patterns of aptamers across different samples, it is possible to identify novel protein isoforms, post-translational modifications, or protein-protein interactions. This epitope mapping approach will provide valuable insights into the functional diversity of the proteome and help uncover new biomarkers or therapeutic targets.
4. The characterization of the proteomic basis for any medical condition or state by agnostically applying the neomer library to samples that differ for such a state and determining differences in aptamer (Aptamarker) recovery that can be used to define patterns that predict said medical condition or state. Such pattern identification can be performed by methods known to those trained in the art including but not limited to correlation or regression analysis, machine learning approaches, artificial intelligence. The delineation of the identity of the proteins bound by the aptamers used in this manner is desirable but not necessary for application.

Claims

1. A computer-implemented method (1) for designing an optimized aptamer library’s template (50) comprising:
a. designing (11) an aptamer library’s template comprising:
(i) regions of fixed nucleotides comprising (i.a) 5’ end region of fixed nucleotides, 3’ end regions of fixed nucleotides, and (i.b) internal regions of fixed nucleotides;
(ii) regions of random nucleotides;
wherein said aptamer library’s template comprises from 40 to 90 nucleotides, preferably from 70 to 80 nucleotides, and wherein said regions of random nucleotides comprise from 12 to 16 nucleotides;
b. generating (12) multiple versions of said aptamer library’s template, wherein each of said multiple versions satisfy following criteria:
• said 5’ end region of fixed nucleotides and 3’ end regions of fixed nucleotides comprise from 8 to 16 nucleotides each;
• said internal regions of fixed nucleotides have a G/C content from 40 to 60%;
• contains maximum three consecutive G’s.
c. predicting (13) one or more secondary structures of each of said multiple versions of said aptamer library’s template;
d. evaluating structural diversity (14) of each of said multiple versions comprising:
• counting a number of secondary structures predicted;
• evaluating complexity of secondary structures predicted;
e. evaluating distribution (15) of sequences within said multiple versions for comprising each secondary structure;
f. selecting (16) said optimized aptamer library’s template among said multiple versions as the one satisfying following predefined criteria:
• the number of secondary structures predicted is superior to 100,000;
• the complexity of secondary structures is selected from the group comprising: highly hybridized, mediumly hybridized, and lowly hybridized;
• said distribution of sequences within said multiple versions per secondary structure has a median greater than 10.
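The template criteria of steps a and b can be expressed as a simple check. The sketch below is illustrative only and rests on several assumptions not stated in the claim: a template is written as a string in which A, C, G and T are fixed positions and N marks a random position; the G/C content is computed over the internal fixed regions pooled together; and the limit of three consecutive G’s is checked on the fixed nucleotides as written. The function names and example templates are hypothetical.

```python
import re

def gc_fraction(seq):
    """Fraction of G and C in a non-empty sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def check_template(template):
    """Return True if the template string satisfies the criteria of steps a and b."""
    # 5' fixed end, a middle part that starts and ends with N, 3' fixed end.
    match = re.fullmatch(r"([ACGT]+)(N[ACGTN]*N)([ACGT]+)", template)
    if match is None:
        return False
    five_prime, middle, three_prime = match.groups()
    internal_fixed = middle.replace("N", "")
    n_random = middle.count("N")
    return (
        40 <= len(template) <= 90
        and 12 <= n_random <= 16
        and 8 <= len(five_prime) <= 16
        and 8 <= len(three_prime) <= 16
        and len(internal_fixed) > 0
        and 0.40 <= gc_fraction(internal_fixed) <= 0.60
        and "GGGG" not in template
    )

# Hypothetical 52-nt templates with 12 random positions each.
good = "ATCGATCGATCG" + "NNNN" + "GCATGCAT" + "NNNN" + "ACGTACGT" + "NNNN" + "TAGCTAGCTAGC"
bad = "ATCGATCGATCG" + "NNNN" + "GGGGCATT" + "NNNN" + "ACGTACGT" + "NNNN" + "TAGCTAGCTAGC"
results = (check_template(good), check_template(bad))
```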

2. The computer-implemented method (1) according to claim 1, wherein said evaluation of structural diversity (14) is evaluated using software selected from: RNAFold, MFold, and MXFold.
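The structural diversity evaluation (steps d to f of claim 1) can be sketched independently of the folding software by passing the predictor in as a function. Everything named here is an assumption for illustration: `predict_structure` stands for a wrapper around whichever folding tool is used, and `toy_predictor` is a deliberately trivial stand-in so that the example runs without external software. A real template with 12 to 16 random positions expands to millions of sequences, so the tiny template below is for demonstration only.

```python
from collections import Counter
from itertools import product
from statistics import median

def expand_template(template):
    """Yield every sequence obtained by filling each N with A, C, G or T."""
    n_positions = [i for i, base in enumerate(template) if base == "N"]
    for fill in product("ACGT", repeat=len(n_positions)):
        seq = list(template)
        for pos, base in zip(n_positions, fill):
            seq[pos] = base
        yield "".join(seq)

def structural_diversity(template, predict_structure):
    """Return (number of distinct structures, median sequences per structure)."""
    counts = Counter(predict_structure(seq) for seq in expand_template(template))
    return len(counts), median(counts.values())

def passes_selection(n_structures, median_per_structure):
    """Numeric thresholds of step f: more than 100,000 structures, median above 10."""
    return n_structures > 100_000 and median_per_structure > 10

def toy_predictor(seq):
    """Stand-in for a folding tool: pairs the two ends only if complementary."""
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    if (seq[0], seq[-1]) in pairs:
        return "(" + "." * (len(seq) - 2) + ")"
    return "." * len(seq)

n_structures, median_per_structure = structural_diversity("NAAN", toy_predictor)
```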

3. A method (2) for generating at least one first secondary structure (60) using an optimized aptamer library’s template (50), designed with a method according to claim 1 or claim 2, against a first target molecule, said method (2) comprising following steps:
a. generating (21), from said optimized aptamer library’s template (50), an optimized aptamer library comprising multiple sequences of aptamer;
b. applying (22) said optimized aptamer library from step a, to said first target molecule, wherein said optimized aptamer library comprises an average copy number per sequence from 100 to 1E7;
c. removing (23) sequences of said optimized aptamer library which unbound to said first target molecule from step b through a wash step;
d. eluting (24) sequences of said optimized aptamer library which bound to said first target molecule from step b so as to obtain a first selected optimized aptamer library;
e. proceeding to NGS analysis of said first selected optimized aptamer library (25);
f. proceeding to NGS analysis of said optimized aptamer library of step a (26);
g. evaluating (27) secondary structure of each sequence in each library, said first selected optimized aptamer library and said optimized aptamer library, and counting the number of each secondary structure;
h. comparing (28) the number of each secondary structure of the first selected optimized aptamer library to the number of each secondary structure of the optimized aptamer library;
i. identifying (29) said at least one first secondary structure (60), being at least one secondary structure defined on the base of p-values and on the base of fold difference between the number of each secondary structure of said optimized aptamer library to the number of each secondary structure of said first selected optimized aptamer library.
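Steps g to i can be sketched as a per-structure comparison of read counts. This is an illustration under assumptions and not the claimed implementation: the claim does not name a statistical test, so a two-proportion z-test is used here purely as a stand-in; the function names, the sequence-to-structure map and the counts are hypothetical; and the structure is assumed to be present in both libraries.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist

def structure_counts(reads, structure_of):
    """Sum read counts per secondary structure. reads maps sequence -> count."""
    counts = Counter()
    for seq, n in reads.items():
        counts[structure_of[seq]] += n
    return counts

def compare_structure(selected_n, selected_total, naive_n, naive_total):
    """Return (fold difference, two-sided p-value) for one structure."""
    p_sel = selected_n / selected_total
    p_naive = naive_n / naive_total
    pooled = (selected_n + naive_n) / (selected_total + naive_total)
    se = sqrt(pooled * (1 - pooled) * (1 / selected_total + 1 / naive_total))
    z = (p_sel - p_naive) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_sel / p_naive, p_value

# Hypothetical structure assignments and NGS read counts.
structure_of = {"AAAA": "....", "GAAC": "(..)", "CAAG": "(..)"}
selected_reads = {"AAAA": 400, "GAAC": 350, "CAAG": 250}
naive_reads = {"AAAA": 800, "GAAC": 100, "CAAG": 100}

selected = structure_counts(selected_reads, structure_of)
naive = structure_counts(naive_reads, structure_of)
fold, p_value = compare_structure(
    selected["(..)"], sum(selected.values()), naive["(..)"], sum(naive.values())
)
```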

4. The method (2) according to claim 3, wherein said first target molecule is immobilized.

5. The method (2) according to claim 3 or claim 4, wherein step i further comprises determining an optimal aptamer sequence by determining which sequence exhibits the strongest response to selection.

6. The method (2) according to any one of claims 3 to 5, wherein said method (2) further comprises following steps:
• applying said at least one first secondary structure (60) obtained from step i to at least one second target molecule;
• evaluating a specificity of said at least one first secondary structure (60) to bind to said at least one secondary target molecule.

7. The method (2) according to any one of claims 3 to 6, wherein said optimized aptamer library is applied to a target molecule, said target molecule being said first target molecule or said second target molecule, in separate replicated treatments and a naive form of said optimized aptamer library is also evaluated in replicated treatments whereby such replicates is a minimum of two for each and a statistical significance of said fold differences identified for aptamer structures between said first selected optimized aptamer library and said unselected optimized aptamer library.

8. A method (7) for generating at least one first secondary structure (60-1) using an aptamer library’s template (50-1) against a first target molecule, said method (7) comprising following steps:
a. receiving (61) said aptamer library’s template (50-1), an aptamer library comprising multiple sequences of aptamer;
b. applying (62) said aptamer library from step a, to said first target molecule, wherein said aptamer library comprises an average copy number per sequence from 100 to 1E7;
c. removing (63) sequences of said aptamer library which unbound to said first target molecule from step b through a wash step;
d. eluting (64) sequences of said aptamer library which bound to said first target molecule from step b so as to obtain a first selected aptamer library;
e. proceeding (65) to NGS analysis of said first selected aptamer library;
f. proceeding (66) to NGS analysis of said aptamer library of step a;
g. evaluating (67) secondary structure of each sequence in each library, said first selected aptamer library and said aptamer library, and counting the number of each secondary structure;
h. comparing (68) the number of each secondary structure of the first selected aptamer library to the number of each secondary structure of the aptamer library;
i. identifying (69) said at least one first secondary structure (60-1), being at least one secondary structure defined on the base of p-values and on the base of fold difference between the number of each secondary structure of said aptamer library to the number of each secondary structure of said first selected aptamer library.

9. The method (7) according to claim 8, wherein said first target molecule is immobilized.

10. The method (7) according to claim 8 or claim 9, wherein step i further comprises determining an optimal aptamer sequence by determining which sequence exhibits the strongest response to selection.

11. The method (7) according to any one of claims 8 to 10, wherein said method (7) further comprises following steps:
• applying said at least one first secondary structure (60-1) obtained from step i to at least one second target molecule;
• evaluating a specificity of said at least one first secondary structure (60-1) to bind to said at least one secondary target molecule.

12. The method (7) according to any one of claims 8 to 11, wherein said aptamer library is applied to a target molecule, said target molecule being said first target molecule or said second target molecule, in separate replicated treatments and a naive form of said aptamer library is also evaluated in replicated treatments whereby such replicates is a minimum of two for each and a statistical significance of said fold differences identified for aptamer structures between said first selected aptamer library and said unselected aptamer library.

13. The method (7) according to any one of claims 8 to 12, wherein the first target molecule is not known and / or the aptamer library is applied to samples that differ in terms of a medical contrast or medical state, and / or where the enrichment of aptamer structures is used to predict said medical contrast or medical state.

14. A computer-implemented method (3) for determining relatedness of aptamer’s structures based on following steps:
• evaluating single nucleotide position difference (31) between predicted secondary structures at any given position, said single nucleotide position difference being defined as a unit of distance;
• generating (32), as network analysis, at least one structural network topology, comprising: applying a structural distance threshold to each of said predicted secondary structures to identify related structures whose structural distance from said secondary structure is equal to or inferior than said structural distance threshold; and constructing a network topology that represents structural relationships and similarities among said selected secondary structures or their related structures.
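The distance measure and network construction can be sketched as follows. This is a minimal illustration under assumptions: structures are represented as equal-length dot-bracket strings, each position that differs counts as one unit of distance, and a network is taken to be a connected component of the resulting graph. The function names and example structures are hypothetical.

```python
from itertools import combinations

def structural_distance(a, b):
    """Number of positions at which two equal-length dot-bracket strings differ."""
    if len(a) != len(b):
        raise ValueError("structures must have equal length")
    return sum(x != y for x, y in zip(a, b))

def build_network(structures, threshold):
    """Adjacency map linking structures whose distance is <= threshold."""
    graph = {s: set() for s in structures}
    for a, b in combinations(structures, 2):
        if structural_distance(a, b) <= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def connected_components(graph):
    """Group structures into networks (connected components of the graph)."""
    seen, components = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical predicted structures (unique, equal length).
structures = ["((....))", "(......)", "........", "..((..))"]
networks = connected_components(build_network(structures, threshold=2))
```

With a threshold of 2, the first three structures form one network and the fourth stands alone.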

15. The computer-implemented method (3) according to claim 14, wherein said method (3) further comprises:
• using said at least one structural network topology (33) as a basis for the hypothesis that each network contains a shared structural solution to the problem of binding to a specific epitope on a protein, and that different networks represent different structural solutions, this may mean different epitopes or the same epitope on a protein; and / or
• selecting from the network analysis secondary structures (34) that show appropriate levels of sensitivity and specificity through integration of information of how said secondary structures performed in selections across different targets and analyses.

16. The computer-implemented method (3) according to claim 14, wherein an additional level of information is added that provides information as to which elements of a secondary structure are necessary for binding and which are not, for use in truncating said secondary structure to a minimally effective form comprising the steps of:
a. identifying structures in a structure reference array, that is generated by predicting the most probable secondary structure for all possible sequences from the optimized aptamer library at a defined temperature and salt concentration appropriate for selection, that are within a structural distance threshold of one of said selected secondary structures from the network analysis;
b. determining an effect of selection on other secondary structures equal or inferior to said structural distance threshold of one of said secondary structures selected from the network analysis;
c. identifying structural elements within said secondary structures denoted from step b using a sliding window technique where a window size ranges between 15 and 45 nucleotides;
d. determining, among said structural elements, critical structural elements by analyzing those that perform positively and negatively in selection against a target molecule, as defined by Fisher’s exact test derived p-values;
e. isolating positively performing critical structural elements by truncating said secondary structure to retain only said positively performing critical structural elements;
f. ensuring structural integrity of said truncated secondary structure is maintained by:
• preserving complete hybridized portions of the secondary structure identified as critical structural elements for overall structural stability and functionality;
• removing redundant or unstructured dangling ends that do not contribute to stability or binding efficiency of said secondary structure.
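The last part of step f, removing unstructured dangling ends while keeping hybridized portions intact, can be sketched on a dot-bracket representation. This is an illustration only: the function name, the `keep` parameter and the example sequence are assumptions, and the sketch trims purely on the annotation without judging whether an end contributes to binding. A trimmed sequence would still need to be re-folded to confirm that the structure is retained.

```python
def trim_dangling_ends(sequence, structure, keep=0):
    """Remove unpaired 5' and 3' ends from a sequence and its dot-bracket structure.

    keep: number of unpaired nucleotides to retain on each side (hypothetical
    parameter, e.g. for ends that help stabilise the adjacent stem).
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    first = structure.find("(")
    last = structure.rfind(")")
    if first == -1 or last == -1:
        return sequence, structure  # no paired region, nothing to anchor on
    start = max(0, first - keep)
    end = min(len(structure), last + 1 + keep)
    return sequence[start:end], structure[start:end]

# Hypothetical hairpin with two unpaired nucleotides on each side.
sequence = "TTGGCCAAAAGGCCTT"
structure = "..((((....)))).."
trimmed = trim_dangling_ends(sequence, structure)
trimmed_keep_one = trim_dangling_ends(sequence, structure, keep=1)
```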