Computer-implemented method for determining the optimal positional match of a binding molecule in a docking site of a target molecule
By representing the binding molecule and the target molecule as graphs and using a quantum computing system to solve the resulting binary optimization problem, the method addresses the complexity of matching a binding molecule to the docking site of a target molecule in the drug discovery process and achieves efficient and accurate ligand identification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- DABO PHARM
- Filing Date
- 2024-11-15
- Publication Date
- 2026-06-19
AI Technical Summary
In the drug discovery process, existing technologies involve a complex and computationally resource-intensive process for determining the matching sites between binding molecules and target molecules, making it difficult to effectively identify suitable ligand-target molecule pairs.
A computer-implemented approach is employed in which the structures of the binding molecule and the target molecule are represented as graphs. The two graphs are correlated and an error function is minimized to determine the optimal positional match. To this end, a quadratic unconstrained binary optimization (QUBO) problem is constructed and solved using a quantum computing system.
It achieves the accurate identification of the optimal match between binding molecules and the docking sites of target molecules with lower computational resource requirements, thus improving the efficiency and accuracy of the drug discovery process.
Figure CN122249857A_ABST
Description
Technical Field
[0001] This invention relates to a computer-implemented method for determining the optimal positional match of a binding molecule at a docking site of a target molecule, and optionally for determining the optimal positional match of a ligand at a docking site of a target protein.
[0002] The present invention also relates to computer-implemented methods for modeling binding molecules, such as for modeling ligands designed to bind to docking sites of target proteins.
[0003] The present invention also relates to a computer-implemented method for modeling docking sites of target molecules (e.g., target proteins).
[0004] The present invention also relates to a computing system configured to perform the above-described methods. Additionally, the present invention relates to a non-transitory computer-readable storage medium configured to be coupled to one or more processors and storing instructions thereon that, when executed by one or more processors, cause one or more processors to perform the above-described methods.
[0005] The methods, systems, and media of the present invention can be used in drug discovery processes to facilitate the identification of promising matches between binding molecules (e.g., ligands) and target molecules (e.g., proteins).
Background Technology
[0006] Drug discovery involves several tasks performed on a computer (in silico), in vitro, and in vivo. While the process ultimately yields a single effective solution, it typically begins by considering a large set of candidate molecules. In the early stages of drug discovery, the focus is on identifying a small set of candidate molecules, or ligands, that interact strongly with the binding sites (called pockets) of the target molecule. These stages are performed on a computer and typically use molecular docking algorithms to estimate the interaction strength between the target pocket and the ligand under evaluation. The interaction strength depends on the three-dimensional arrangement of the ligand as it interacts with the target pocket. The docking algorithm therefore estimates the correct ligand pose in order to test whether the ligand must be discarded. This is a complex task because of the large number of degrees of freedom involved. In addition to the six degrees of freedom (three translations and three rotations) of a rigid body in three-dimensional space, ligands also contain so-called rotamers (rotatable bonds between rigid segments of the ligand). Each rotamer introduces another degree of freedom (the angle between two adjacent segments) into the problem.
[0007] In this context, identifying suitable ligand-target molecule pairs is a computationally complex problem.
[0008] Object of the Invention
[0009] One object of the present invention is to overcome the shortcomings and / or limitations of the above-described prior art solutions. Additional objects of the present invention are as follows.
[0010] One object of the present invention is to provide a novel method for modeling binding molecules (e.g., ligands) and / or target molecules (e.g., proteins). In detail, one additional object is to identify such modeling methods that allow for computationally simpler solutions for determining the optimal match of a binding molecule at a docking site on a target molecule, and optionally for determining the optimal match of a ligand at a docking site on a target protein.
[0011] Another object of the present invention is to provide a novel method for determining the optimal positional match of a binding molecule at a docking site of a target molecule, and optionally for determining the optimal positional match of a ligand at a docking site of a target protein.
[0012] Another objective is to implement the above method as a computer-based approach, which requires relatively low computational power but can still produce reliable solutions.
[0013] Another object of the present invention is to provide a system, and in particular a computer system, configured to implement one or more new methods.
[0014] Another object of the present invention is to provide a medium for storing instructions that, when executed by a computing system, configure or program the computing system to perform one or more of the methods described above.
[0015] An additional object is to provide molecular modeling methods, and methods for determining the optimal match between candidate binding molecules and target molecules at docking sites, that allow the formulation of binary problems particularly well suited to being solved by quantum computing systems.
Summary of the Invention
[0016] One or more of the above objectives are achieved substantially by computer-based methods for determining the optimal positional match of a binding molecule at a docking site of a target molecule, such as determining the optimal positional match of a ligand at a docking site of a target protein, as disclosed herein.
[0017] One or more of the above objectives are achieved substantially by computer-based methods for modeling binding molecules, such as for modeling ligands designed to bind to docking sites of target proteins, as disclosed herein.
[0018] One or more of the above objectives are achieved substantially through computer-based methods for modeling target molecules, such as for modeling target proteins, as disclosed herein.
[0019] One or more of the above objectives are also achieved by a computing system configured to perform at least one of the above methods.
[0020] In addition, one or more of the described objectives are also achieved by a non-transitory computer-readable storage medium carrying instructions that, when processed by a computing system, allow the execution of at least one of the above methods.
[0021] Several aspects of the present invention are disclosed below.
[0022] The first aspect relates to a computer-based method for modeling binding molecules, the computer-based method comprising the following steps:
[0023] - Receive (31) the molecular description of the molecule to be modeled,
[0024] - Represent (32) the binding molecule to be modeled as a graph (G_mol), wherein the graph (G_mol) of the binding molecule includes:
[0025] • Multiple nodes (1), where each node represents a corresponding atom of the binding molecule,
[0026] • Multiple first edges (2), where each first edge represents a bond between a corresponding pair of atoms.
[0027] In the second aspect according to the first aspect, the binding molecule that is the object of the modeling method is a ligand designed to bind to the docking site of a target protein.
[0028] In the third aspect according to any of the foregoing aspects, the computer-implemented method provides that the weight of each first edge (2) represents the bond length between the corresponding pair of atoms.
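The first three aspects can be illustrated with a minimal Python sketch. It is not the claimed implementation: the input layout (a dictionary of atoms with coordinates and a list of bonded index pairs), the function name `build_first_edges`, and the coordinates are assumptions made only for illustration.

```python
import math

# Hypothetical input: atom index -> (element, (x, y, z)), plus bonded index pairs.
# The coordinates are illustrative values for a bent three-atom molecule.
atoms = {
    0: ("O", (0.000, 0.000, 0.000)),
    1: ("H", (0.960, 0.000, 0.000)),
    2: ("H", (-0.240, 0.930, 0.000)),
}
bonds = [(0, 1), (0, 2)]


def build_first_edges(atoms, bonds):
    """Return {(i, j): bond_length} with i < j, one entry per bond (first edges)."""
    edges = {}
    for i, j in bonds:
        key = (min(i, j), max(i, j))
        # Edge weight = Euclidean distance between the two bonded atoms.
        edges[key] = math.dist(atoms[i][1], atoms[j][1])
    return edges


nodes = sorted(atoms)                        # one node per atom
first_edges = build_first_edges(atoms, bonds)
```

Here each key of `first_edges` is a first edge and its value is the weight (bond length) of that edge.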
[0029] In the fourth aspect according to any of the foregoing aspects, the graph (G_mol) also includes:
[0030] - One or more second edges (3), where each second edge represents a corresponding bond angle, where each bond angle is the angle between a corresponding pair of bonds adjacent to the same atom.
[0031] In the fifth aspect according to the foregoing aspect, the weight of each second edge (3) represents the magnitude of the corresponding bond angle.
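As a hedged illustration of the second edges, the sketch below computes one bond angle for every pair of bonds that share an atom. The function names, the coordinate dictionary, and the choice to key each second edge by the triple (neighbour, centre, neighbour) are assumptions of this example, not taken from the source.

```python
import math
from itertools import combinations


def bond_angle(a, b, c):
    """Angle in degrees at atom b between the bonds b-a and b-c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def build_second_edges(coords, bonds):
    """Return {(a, centre, c): angle} for each pair of bonds meeting at `centre`."""
    neighbours = {}
    for i, j in bonds:
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)
    edges = {}
    for centre, nbrs in neighbours.items():
        for a, c in combinations(sorted(nbrs), 2):
            # The second edge joins atoms a and c; its weight is the angle a-centre-c.
            edges[(a, centre, c)] = bond_angle(coords[a], coords[centre], coords[c])
    return edges


# Illustrative coordinates: two bonds at a right angle around atom 0.
coords = {0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0), 2: (0.0, 1.0, 0.0)}
second_edges = build_second_edges(coords, [(0, 1), (0, 2)])
```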
[0032] In the sixth aspect according to any of the foregoing aspects, the graph also includes:
[0033] - One or more third edges (4), where each third edge represents a corresponding dihedral angle.
[0034] Each dihedral angle is formed between the plane defined by the first and second bonds (AB, BC) and the plane defined by the second and third bonds (BC, CD).
[0035] In a sequence of first, second, third and fourth atoms (A, B, C, D) connected in succession, the first bond (AB) is between the first and second atoms, the second bond (BC) is between the second and third atoms, and the third bond (CD) is between the third and fourth atoms, wherein the first and second bonds are adjacent to the second atom (B), and wherein the second and third bonds are adjacent to the third atom (C).
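The dihedral angle defined above can be computed from the coordinates of the four atoms A, B, C, D. The following Python sketch uses a standard projection-based formula. The function name, the sign convention, and the example coordinates are assumptions of this illustration.

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def dihedral_angle(a, b, c, d):
    """Signed dihedral angle (degrees) between planes (A, B, C) and (B, C, D)."""
    b0 = _sub(a, b)                    # from B towards A
    b1 = _sub(c, b)                    # central bond BC
    b2 = _sub(d, c)                    # from C towards D
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Illustrative geometries sharing the same A, B, C atoms.
A, B, C = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
cis = dihedral_angle(A, B, C, (1.0, 0.0, 1.0))      # D on the same side as A
trans = dihedral_angle(A, B, C, (-1.0, 0.0, 1.0))   # D opposite to A
perp = dihedral_angle(A, B, C, (0.0, 1.0, 1.0))     # D rotated by a quarter turn
```

The magnitude of the returned angle would be the weight of the corresponding third edge. Only the magnitude is independent of the sign convention chosen here.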
[0036] In the seventh aspect according to any of the foregoing aspects, the molecular description of the molecule to be modeled includes the identification of the atoms of the molecule.
[0037] In the eighth aspect according to any of the foregoing aspects, the molecular description of the molecule to be modeled includes the bonds between the atoms of the molecule.
[0038] In the ninth aspect according to any of the foregoing aspects, the molecular description of the molecule to be modeled includes the type of the bond.
[0039] In the 10th aspect according to any of the foregoing aspects, the step (31) of receiving the molecular description of the molecule to be modeled includes:
[0040] - Access a database (103) containing multiple molecular descriptions in a given format, optionally wherein the database contains molecular descriptions in .mol2 file format;
[0041] - Select the molecules to be modeled from the database; optionally, select the molecules to be modeled from the database that are described in .mol2 file format.
[0042] In aspect 11 according to the foregoing aspect, the step (31) of receiving a molecular description of the molecule to be modeled includes:
[0043] - Access a database (103) containing multiple molecular descriptions in .mol2 file format;
[0044] - Select the molecule to be modeled from the database, wherein the molecule to be modeled selected from the database is described in .mol2 file format.
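As an illustration of how atoms, bonds, and bond types might be read from a .mol2 description, the sketch below parses the `@<TRIPOS>ATOM` and `@<TRIPOS>BOND` sections of a small hand-written record. The function name `parse_mol2`, the returned data layout, and the sample record are assumptions of this example. A production workflow would more likely rely on an established cheminformatics library.

```python
def parse_mol2(text):
    """Return (atoms, bonds) from the ATOM and BOND sections of a mol2 record.

    atoms: {atom_id: {"name", "xyz", "type"}}
    bonds: [(origin_atom_id, target_atom_id, bond_type)]
    """
    atoms, bonds, section = {}, [], None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("@<TRIPOS>"):
            section = line[len("@<TRIPOS>"):]
            continue
        fields = line.split()
        if section == "ATOM" and len(fields) >= 6:
            atom_id, name, x, y, z, atom_type = fields[:6]
            atoms[int(atom_id)] = {
                "name": name,
                "xyz": (float(x), float(y), float(z)),
                "type": atom_type,
            }
        elif section == "BOND" and len(fields) >= 4:
            _, origin, target, bond_type = fields[:4]
            bonds.append((int(origin), int(target), bond_type))
    return atoms, bonds


# Hand-written sample record with illustrative coordinates.
SAMPLE = """@<TRIPOS>MOLECULE
water
3 2 1
SMALL
NO_CHARGES

@<TRIPOS>ATOM
1 O1 0.0000 0.0000 0.0000 O.3 1 HOH 0.0
2 H1 0.9600 0.0000 0.0000 H 1 HOH 0.0
3 H2 -0.2400 0.9300 0.0000 H 1 HOH 0.0
@<TRIPOS>BOND
1 1 2 1
2 1 3 1
"""

mol_atoms, mol_bonds = parse_mol2(SAMPLE)
```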
[0045] In the 12th aspect according to any of the foregoing aspects, the method provides that, before or during the step (32) of representing the binding molecule as a graph (G_mol), the method includes a molecular simplification step (33) of the molecule to be modeled, which includes removing one or more atoms from the molecule to be modeled.
[0046] In aspect 13 of the foregoing, the molecular simplification step (33) further includes:
[0047] - A sub-step to reduce the number of atoms in the molecule to be modeled, optionally by removing terminal hydrogens.
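A minimal sketch of the atom-reduction sub-step, assuming the molecule is given as a dictionary of element symbols and a list of bonded index pairs (both hypothetical layouts), could look as follows. A hydrogen is treated as terminal when it takes part in at most one bond.

```python
def strip_terminal_hydrogens(elements, bonds):
    """Drop hydrogens that have at most one bond, together with their bonds."""
    degree = {}
    for i, j in bonds:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    removed = {a for a, el in elements.items()
               if el == "H" and degree.get(a, 0) <= 1}
    kept_elements = {a: el for a, el in elements.items() if a not in removed}
    kept_bonds = [(i, j) for i, j in bonds
                  if i not in removed and j not in removed]
    return kept_elements, kept_bonds


# Illustrative connectivity of methanol: C(0)-O(1), three H on C, one H on O.
elements = {0: "C", 1: "O", 2: "H", 3: "H", 4: "H", 5: "H"}
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]
heavy_elements, heavy_bonds = strip_terminal_hydrogens(elements, bonds)
```

After this step only the heavy atoms and the bonds between them remain, which reduces the number of nodes of the graph built afterwards.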
[0048] In aspect 14 according to either of the two aforementioned aspects, the molecular simplification step (33) further includes:
[0049] - A fragment removal sub-step, which includes removing one or more atoms and one or more bonds that are not contained in the shortest path connecting adjacent rotatable bonds, and introducing one or more constraints to preserve any dihedral angles of the molecule to be modeled.
[0050] In aspect 15, which is based on any of the three aspects mentioned above, the molecular simplification step (33) further includes:
[0051] - A fragment substitution sub-step, which includes removing one or more atoms and one or more bonds that are not contained in the shortest path connecting adjacent rotatable bonds, and replacing the removed atoms and bonds with dummy atoms optionally located at the centroid of the removed atoms.
[0052] The 16th aspect relates to a computer-based method for modeling docking sites of target molecules, the computer-based method comprising the following steps:
[0053] - Receive (41) the description of the docking site to be modeled,
[0054] - Represent (42) the docking site to be modeled as a graph (G_grid), which includes:
[0055] • Multiple nodes, where each node represents a corresponding docking point.
[0056] • Multiple edges, where each edge represents a corresponding connection between two docking points.
[0057] In aspect 17 of the foregoing, the target molecule is a protein designed to accept a ligand.
[0058] In aspect 18, which is based on either of the two aspects mentioned above, the weight of each edge represents the Euclidean distance between two connected points.
[0059] In aspect 19, which is based on any of the three aspects mentioned above, the description of the docking site to be modeled includes the spatial coordinates of a plurality of docking points that are part of the docking site.
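The docking-site graph described in these aspects can be sketched in Python as follows. The sketch assumes that every pair of docking points is connected (a fully connected graph, as stated in a later aspect). The function name and the coordinates are illustrative assumptions.

```python
import math
from itertools import combinations


def build_grid_graph(points):
    """Return (nodes, edges) with one node per docking point.

    edges maps (i, j), i < j, to the Euclidean distance between points i and j.
    """
    nodes = list(range(len(points)))
    edges = {(i, j): math.dist(points[i], points[j])
             for i, j in combinations(nodes, 2)}
    return nodes, edges


# Illustrative docking points (spatial coordinates inside a pocket).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
grid_nodes, grid_edges = build_grid_graph(points)
```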
[0060] In aspect 20, which is based on any of the four aspects mentioned above, the step (41) of receiving the description of the docking site to be modeled includes:
[0061] - Access a database (103) containing descriptions of docking sites for multiple molecules in a given format.
[0062] - Select the docking sites to be modeled from the database.
[0063] In aspect 21, which is based on any of the five aspects mentioned above, the step (41) of receiving the description of the docking site to be modeled includes:
[0064] - Access a database (103) containing descriptions of docking sites of multiple molecules in a given format, wherein the database contains descriptions of docking sites of molecules in .pbp file format;
[0065] - Select the docking sites to be modeled from the database; wherein the descriptions of the docking sites to be modeled from the database are in .pbp file format.
[0066] In aspect 22 according to any of the aforementioned six aspects, the method includes an enhancement step (43), which includes:
[0067] - Based on information related to the binding molecule (optionally a ligand) to be accommodated at the docking site of the target molecule, add one or more nodes and one or more edges to the graph representing the docking site.
[0068] In aspect 23 according to any of the aforementioned seven aspects, the method includes an enhancement step (43), which includes:
[0069] - Based on knowledge of the edge length distribution of the graph (G_mol) representing the binding molecule, insert nodes into the graph (G_grid) representing the docking site so as to achieve a similar edge weight distribution.
[0070] In the 24th aspect according to any of the aforementioned eight aspects, the method includes a reduction step (44), which includes:
[0071] - Based on information related to the binding molecule (optionally a ligand) to be accommodated at the docking site of the target molecule, remove one or more nodes and one or more edges of the graph representing the docking site.
[0072] In aspect 25 according to the foregoing aspect, the reduction step includes, based on the edge length distribution of the graph (G_mol) representing the binding molecule, removing from the graph (G_grid) representing the docking site any edge whose length does not match the length of an edge of the graph (G_mol) representing the binding molecule, or differs from it by more than a given tolerance limit.
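The reduction step can be sketched as a simple filter over the edges of G_grid. The function name, the data layout, and the numerical values below are assumptions chosen for illustration. Removing nodes left without edges is omitted for brevity.

```python
def prune_grid_edges(grid_edges, mol_edge_lengths, tol):
    """Keep only grid edges whose length is within `tol` of some G_mol edge length."""
    return {
        edge: length
        for edge, length in grid_edges.items()
        if any(abs(length - m) <= tol for m in mol_edge_lengths)
    }


# Illustrative values: edge lengths of G_grid and of G_mol.
grid_edges = {(0, 1): 1.0, (0, 2): 1.5, (1, 2): 2.9}
mol_edge_lengths = [1.0, 1.45]
pruned = prune_grid_edges(grid_edges, mol_edge_lengths, tol=0.1)
```

In this example the edge (1, 2) is discarded because no edge of G_mol has a length within the tolerance of 2.9.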
[0073] In aspect 26, which is based on any of the aforementioned ten aspects, in combination with aspect 22,
[0074] The enhancement step includes adding nodes to the graph representing docking sites so that the edge weight distribution in the graph representing docking sites is similar to the edge weight distribution in the graph representing binding molecules (optionally ligands).
[0075] Aspect 27 relates to a computer-implemented method for determining the optimal positional match of a binding molecule at a docking site of a target molecule, for example, for determining the optimal positional match of a ligand at a docking site of a target protein, the computer-implemented method comprising the following steps:
[0076] - Obtain (51) a graph (G_mol) representing the binding molecule,
[0077] - Obtain (52) a graph (G_grid) representing the docking site of the target molecule,
[0078] - Correlate (53) the graph (G_grid) representing the docking site of the target molecule with the graph (G_mol) representing the binding molecule to determine whether there is an optimal positional match between the binding molecule and the docking site of the target molecule.
[0079] In aspect 28 of the foregoing, the method is used to determine the optimal positional match of the ligand at the docking site of the target protein.
[0080] In aspect 29, which is based on either of the two aforementioned aspects, the graph (G_mol) representing the binding molecule is obtained using the method according to any one of aspects 1 to 15.
[0081] In aspect 30, based on any of the three aspects mentioned above, the graph (G_grid) representing the docking site of the target molecule is obtained using the method according to any one of aspects 16 to 26.
[0082] In aspect 31, which is based on any of the four aspects mentioned above, determining whether there is an optimal positional match between the binding molecule and the docking site of the target molecule includes:
[0083] - Identify one or more optimal three-dimensional poses of the binding molecule at the docking site.
[0084] In aspect 32 of the foregoing, each three-dimensional pose defines the conformation, position, and orientation of the binding molecule within the docking site.
[0085] In aspect 33, which is based on any of the aforementioned six aspects, the step (53) of correlating the graph (G_grid) representing the docking site with the graph (G_mol) representing the binding molecule includes:
[0086] - Determine (53a) whether the graph (G_grid) representing the docking site of the target molecule contains a subgraph that is isomorphic to the graph (G_mol) representing the binding molecule.
[0087] In aspect 34, which is based on any of the aforementioned seven aspects, the step of correlating the graph (G_grid) representing the docking site with the graph (G_mol) representing the binding molecule includes:
[0088] - Determine (53b) whether the graph (G_grid) representing the docking site of the target molecule contains a subgraph that is weighted isomorphic, or nearly weighted isomorphic, to the graph (G_mol) representing the binding molecule.
[0089] In aspect 35 according to the foregoing aspect, the step (53b) of determining whether the graph (G_grid) representing the docking site of the target molecule contains a subgraph that is weighted isomorphic, or nearly weighted isomorphic, to the graph (G_mol) representing the binding molecule includes:
[0090] - Identify one or more optimal three-dimensional poses of the binding molecule at the docking site by minimizing an error function, said error function being related to the difference between the weights of the edges of the graph (G_mol) representing the binding molecule and the weights of the edges of the subgraph of the graph (G_grid) representing the docking site.
[0091] In aspect 36, which is based on any of the aforementioned six aspects, the step of identifying one or more optimal three-dimensional poses of the binding molecule at the docking site includes verifying:
[0092]
[0093] In aspect 37 according to the foregoing aspect, the step of identifying one or more optimal three-dimensional poses of the binding molecule at the docking site further includes minimizing, or reducing to zero, the following error function:
[0094]
[0095] where:
[0096]
[0097] This means that, for the error function to be zero (or sufficiently small, i.e., below a given acceptable threshold), the weight associated with each edge of G_mol must be equal (or nearly equal) to the weight of the corresponding edge of G'_grid, where, for a given isomorphism f, a corresponding edge exists in G'_grid for each edge of G_mol;
[0098] The terms in the above expressions for aspects 36 and 37 are as follows:
[0099] - G_mol is the graph representing the binding molecule,
[0100] - G_grid is the graph representing the docking site,
[0101] - G'_grid is a subgraph of G_grid, and f is the isomorphism of G_mol onto G'_grid,
[0102] - V_mol is the set of nodes of G_mol,
[0103] - E_mol is the set of edges of G_mol,
[0104] - W_mol is the set of edge weights of G_mol,
[0105] - V_grid is the set of nodes of G_grid,
[0106] - E_grid is the set of edges of G_grid,
[0107] - W_grid is the set of edge weights of G_grid,
[0108] - V'_grid is the set of nodes of G'_grid,
[0109] - E'_grid is the set of edges of G'_grid,
[0110] - W'_grid is the set of edge weights of G'_grid,
[0111] - v_mol is a node of G_mol,
[0112] - v_grid is a node of G_grid.
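To make the role of the error function concrete, the Python sketch below evaluates one plausible form of it for a given node mapping f. The form is an assumption, chosen to match the squared differences mentioned for the later optimization term: it sums, over the edges of G_mol, the squared difference between the weight of each edge and the weight of the edge of G_grid onto which f maps it. The function name and the data are illustrative.

```python
def mapping_error(mol_edges, grid_edges, f):
    """Sum of squared weight differences between G_mol edges and their images under f.

    mol_edges / grid_edges: {(node_a, node_b): weight}, undirected.
    f: {mol_node: grid_node}.
    Returns infinity if some G_mol edge is not mapped onto an edge of G_grid.
    """
    total = 0.0
    for (u, v), w_mol in mol_edges.items():
        i, j = f[u], f[v]
        key = (i, j) if (i, j) in grid_edges else (j, i)
        if key not in grid_edges:
            return float("inf")    # f is not edge-preserving for this edge
        total += (w_mol - grid_edges[key]) ** 2
    return total


# Illustrative weighted graphs.
mol_edges = {(0, 1): 1.0, (1, 2): 1.5}
grid_edges = {(0, 1): 1.0, (1, 2): 1.4, (0, 2): 2.0}

good = mapping_error(mol_edges, grid_edges, {0: 0, 1: 1, 2: 2})   # near-isomorphic
poor = mapping_error(mol_edges, grid_edges, {0: 2, 1: 1, 2: 0})   # reversed placement
```

An error of zero would correspond to an exact weighted isomorphism. A small non-zero value corresponds to the "nearly weighted isomorphic" case.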
[0113] In aspect 38, according to any of the aforementioned eleven aspects, the step of correlating the graph (G_grid) representing the docking site with the graph (G_mol) representing the binding molecule includes:
[0114] - In a first stage (53a), determine whether the graph (G_grid) representing the docking site of the target molecule contains subgraphs that are unweighted isomorphic to the graph (G_mol) representing the binding molecule, which allows pre-optimal three-dimensional poses of the binding molecule at the docking site to be identified.
[0115] - Then, in a second stage (53b), select, from the pre-optimal three-dimensional poses identified in the first stage, the optimal three-dimensional pose that minimizes an error function, said error function being related to the difference between the weights of the edges of the graph (G_mol) representing the binding molecule and the weights of the edges of the subgraph of the graph (G_grid) representing the docking site.
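The two stages can be sketched classically as an exhaustive search, which is only practical for very small graphs. In this illustration the first stage enumerates injective node mappings that send every edge of G_mol onto an edge of G_grid (an edge-preserving check used here as a stand-in for unweighted subgraph isomorphism). The second stage picks the mapping with the smallest sum of squared weight differences. All names, the squared-difference error, and the data are assumptions of this example.

```python
from itertools import permutations


def edge_weight(edges, a, b):
    """Weight of the undirected edge {a, b}, or None if it is absent."""
    return edges.get((a, b), edges.get((b, a)))


def candidate_mappings(mol_nodes, mol_edges, grid_nodes, grid_edges):
    """Stage 1: injective mappings under which every G_mol edge hits a G_grid edge."""
    for image in permutations(grid_nodes, len(mol_nodes)):
        f = dict(zip(mol_nodes, image))
        if all(edge_weight(grid_edges, f[u], f[v]) is not None
               for u, v in mol_edges):
            yield f


def best_pose(mol_nodes, mol_edges, grid_nodes, grid_edges):
    """Stage 2: among stage-1 candidates, return (error, mapping) of minimum error."""
    def error(f):
        return sum((w - edge_weight(grid_edges, f[u], f[v])) ** 2
                   for (u, v), w in mol_edges.items())

    scored = ((error(f), f)
              for f in candidate_mappings(mol_nodes, mol_edges,
                                          grid_nodes, grid_edges))
    return min(scored, key=lambda pair: pair[0], default=(None, None))


# Illustrative data: a three-node path to be placed on a four-node grid graph.
mol_nodes = ["a", "b", "c"]
mol_edges = {("a", "b"): 1.0, ("b", "c"): 2.0}
grid_nodes = [0, 1, 2, 3]
grid_edges = {(0, 1): 1.0, (1, 2): 2.1, (2, 3): 1.3, (0, 2): 3.0}

best_error, best_mapping = best_pose(mol_nodes, mol_edges, grid_nodes, grid_edges)
```

Separating the stages keeps the weighted comparison restricted to placements that are already structurally admissible.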
[0116] In aspect 39 according to the foregoing aspect, the first-stage (53a) step of identifying the pre-optimal three-dimensional poses of the binding molecule at the docking site includes verifying:
[0117]
[0118] In aspect 40 according to the foregoing aspect, the second-stage (53b) step of selecting the optimal three-dimensional pose from the pre-optimal three-dimensional poses identified in the first stage includes determining the three-dimensional pose that minimizes, or reduces to zero, the following error function:
[0119]
[0120]
[0121] where:
[0122]
[0123] This means that, for the error function to be zero (or sufficiently small, i.e., below a given acceptable threshold), the weight associated with each edge of G_mol must be equal (or nearly equal) to the weight of the corresponding edge of G'_grid, where, for a given isomorphism f, a corresponding edge exists in G'_grid for each edge of G_mol;
[0124] The terms in the above expressions for aspects 39 and 40 are as follows:
[0125] - G_mol is the graph representing the binding molecule,
[0126] - G_grid is the graph representing the docking site,
[0127] - G'_grid is a subgraph of G_grid, and f is the isomorphism of G_mol onto G'_grid,
[0128] - V_mol is the set of nodes of G_mol,
[0129] - E_mol is the set of edges of G_mol,
[0130] - W_mol is the set of edge weights of G_mol,
[0131] - V_grid is the set of nodes of G_grid,
[0132] - E_grid is the set of edges of G_grid,
[0133] - W_grid is the set of edge weights of G_grid,
[0134] - V'_grid is the set of nodes of G'_grid,
[0135] - E'_grid is the set of edges of G'_grid,
[0136] - W'_grid is the set of edge weights of G'_grid,
[0137] - v_mol is a node of G_mol,
[0138] - v_grid is a node of G_grid.
[0139] In aspect 41, according to any of the aforementioned thirteen aspects, the step of correlating the graph (G_grid) representing the docking site with the graph (G_mol) representing the binding molecule includes constructing (53c) the corresponding quadratic unconstrained binary optimization (QUBO) problem.
[0140] In aspect 42 according to the foregoing aspect, given the graph (G_grid) representing the docking site and the graph (G_mol) representing the binding molecule, constructing the corresponding quadratic unconstrained binary optimization (QUBO) problem includes:
[0141] - Define a set of binary variables:
[0142]
[0143] - Map each node of the graph (G_mol) of the binding molecule to one and only one node of the graph (G_grid) of the docking points:
[0144]
[0145] - Map each edge of the graph (G_mol) of the binding molecule to an edge of the graph (G_grid) of the docking points:
[0146]
[0147] - Define the isomorphism:
[0148]
[0149] In aspect 43 according to the foregoing aspect, constructing the corresponding quadratic unconstrained binary optimization (QUBO) problem further includes:
[0150] - Define the optimization term:
[0151]
[0152] This means that if the two binary variables,
[0153] the one that maps a first node of G_mol onto a first node of G_grid, and
[0154] the one that maps a second node of G_mol onto a second node of G_grid,
[0155] both have the value 1, then a score is assigned that is equal to the squared modulus of the difference between the weight of the edge connecting the two nodes of G_mol and the weight of the edge connecting the two nodes of G_grid.
[0156] In aspect 44 according to the foregoing aspect, constructing the corresponding quadratic unconstrained binary optimization (QUBO) problem includes:
[0157] - Define the complete Hamiltonian associated with the formulation of the quadratic unconstrained binary optimization (QUBO) problem as:
[0158]
[0159] where A is the optimization parameter.
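A small Python sketch may help to show how such a QUBO could be assembled and checked. It is only one possible formulation under stated assumptions and does not reproduce the expressions of the source. It uses a binary variable x[u, i] equal to 1 when node u of G_mol is placed on node i of G_grid, penalty terms weighted by A that enforce a valid placement, and an optimization term equal to the squared difference of edge weights. Exhaustive enumeration stands in for the quantum annealer, and all names and values are illustrative.

```python
from itertools import combinations, permutations, product


def build_qubo(mol_nodes, mol_edges, grid_nodes, grid_edges, A):
    """Return (var, Q, offset) such that energy(x) = offset + sum Q[p, q] * x_p * x_q."""
    var = {(u, i): k for k, (u, i) in enumerate(product(mol_nodes, grid_nodes))}
    Q = {}

    def add(p, q, coeff):
        key = (min(p, q), max(p, q))
        Q[key] = Q.get(key, 0.0) + coeff

    offset = 0.0
    # Penalty: each G_mol node sits on exactly one grid node, A * (1 - sum_i x[u, i])^2.
    for u in mol_nodes:
        offset += A
        for i in grid_nodes:
            add(var[u, i], var[u, i], -A)
        for i, j in combinations(grid_nodes, 2):
            add(var[u, i], var[u, j], 2 * A)
    # Penalty: a grid node hosts at most one G_mol node.
    for i in grid_nodes:
        for u, v in combinations(mol_nodes, 2):
            add(var[u, i], var[v, i], A)
    # Edge terms: penalty A if the image is not a grid edge, else squared weight difference.
    for (u, v), w in mol_edges.items():
        for i, j in permutations(grid_nodes, 2):
            w_grid = grid_edges.get((i, j), grid_edges.get((j, i)))
            add(var[u, i], var[v, j], A if w_grid is None else (w - w_grid) ** 2)
    return var, Q, offset


def solve_brute_force(var, Q, offset):
    """Enumerate all bit strings (classical stand-in for an annealer) and keep the minimum."""
    best_energy, best_bits = None, None
    for bits in product((0, 1), repeat=len(var)):
        energy = offset + sum(c * bits[p] * bits[q] for (p, q), c in Q.items())
        if best_energy is None or energy < best_energy:
            best_energy, best_bits = energy, bits
    mapping = {u: i for (u, i), k in var.items() if best_bits[k]}
    return best_energy, mapping


# Illustrative data: three-node path placed on a fully connected three-node grid graph.
mol_nodes = [0, 1, 2]
mol_edges = {(0, 1): 1.0, (1, 2): 2.1}
grid_nodes = [0, 1, 2]
grid_edges = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 2.9}

var, Q, offset = build_qubo(mol_nodes, mol_edges, grid_nodes, grid_edges, A=10.0)
energy, mapping = solve_brute_force(var, Q, offset)
```

In this sketch the value A = 10.0 is simply chosen larger than any squared weight difference in the example, so that violating a constraint never pays off. How A should be set in general is not addressed here.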
[0160] In aspect 45 according to the foregoing aspect, the optimal three-dimensional pose, i.e., the pose for which the error function related to the difference between the weights of the edges of the graph (G_mol) representing the binding molecule and the weights of the edges of the subgraph of the graph (G_grid) representing the docking site is minimized, is determined by minimizing the complete Hamiltonian.
[0161] In aspect 46, which is based on one of the two aforementioned aspects, the solution of the quadratic unconstrained binary optimization (QUBO) problem and / or the determination of the optimal three-dimensional pose is performed by a quantum annealing (QA) optimization process executed by a quantum annealer.
[0162] In aspect 47, based on one of the aforementioned three aspects, the optimization parameter (A) is as follows:
[0163]
[0164] In aspect 48 of the foregoing, the optimization parameter (A) is:
[0165]
[0166] In aspect 49, according to any one of aspects 6 through 48, the graph (G_mol) representing the binding molecule does not include any edges other than the first edges (2), the second edges (3), and the third edges (4).
[0167] In aspect 50 according to any one of aspects 6 to 49, the binding molecule comprises five or more atoms, and the graph (G_mol) representing the binding molecule contains one or more nodes that are not directly connected, via a first, second, or third edge, to at least one other node of the same graph (G_mol).
[0168] In aspect 51 according to any one of aspects 6 to 50, the binding molecule comprises six or more atoms, and the graph (G_mol) representing the binding molecule contains one or more nodes (1) that are not directly connected, via a first, second, or third edge, to at least two other nodes of the same graph (G_mol).
[0169] In aspect 52 according to any of the foregoing aspects, the graph (G_mol) representing the binding molecule is not a fully connected graph.
[0170] In aspect 53 according to any of the foregoing aspects, the total number of nodes (1) of the graph (G_mol) representing the binding molecule is an integer denoted as n, and the total number of edges (2, 3, 4) is an integer denoted as E, where:
[0171]
[0172] In aspect 54 according to any of the foregoing aspects, two molecules that differ only in the degree of rotation about a rotatable bond (a rotatable bond is a bond connecting two parts of a molecule that can rotate relative to each other) are represented by the same graph (G_mol).
[0173] In aspect 55 according to any of the foregoing aspects, the graph (G_grid) representing the docking site of the target molecule is a fully connected graph.
[0174] In aspect 56 according to any of the foregoing aspects, the binding molecule is a ligand and the target molecule is a protein, whereby the computer-implemented method determines the optimal positional match of the ligand at the docking site of the target protein.
[0175] In aspect 57 according to any of the foregoing aspects, the method is used within the scope of a drug discovery process.
[0176] Aspect 58 relates to a drug discovery process, which includes a computer-implemented method according to any of the preceding aspects.
[0177] Aspect 59 relates to a computer-implemented method for identifying one or more binding molecules from a list of numerous candidate binding molecules that are expected to match at the docking site of a target molecule, for example, for identifying one or more ligands from a list of numerous ligands that are expected to match a protein pocket.
[0178] The method includes:
[0179] - Perform the method according to any one of aspects 27 to 58 above for each candidate binding molecule.
[0180] - Identify, from among the candidate binding molecules, one or more binding molecules that are expected to match at the docking site of the target molecule, namely those binding molecules that have been determined to have an optimal positional match at the docking site of the target molecule.
[0181] In aspect 60 according to the foregoing aspect, the one or more binding molecules that are expected to match at the docking site of the target molecule are those binding molecules for which an optimal three-dimensional pose at the docking site has been identified.
[0182] In aspect 61, according to either of the two preceding aspects, the method is used within the scope of the drug discovery process.
[0183] Aspect 62 relates to a drug discovery process, which includes a computer-implemented method according to any one of the preceding three aspects.
[0184] Aspect 63 relates to computing systems, which include:
[0185] One or more processors; and
[0186] A computer-readable storage device coupled to one or more processors and having instructions stored thereon, which, when executed by one or more processors, cause the one or more processors to perform the method described in any of the foregoing aspects.
[0187] The 64th aspect relates to a non-transitory computer-readable storage medium that can be coupled to one or more processors and has instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to perform the method described in any one of the preceding aspects 1 through 62.
Attached Figure Description
[0188] Some embodiments and aspects of the present invention are described below with reference to the accompanying drawings, which are provided for illustrative purposes only and therefore not for limiting purposes, wherein:
[0189] Figures 1 and 2 are graph representations of molecules;
[0190] Figure 3 is a schematic diagram showing the bond angle between two edges that leave a common node;
[0191] Figures 4 and 5 are further graph representations of the binding molecule;
[0192] Figure 6 is a schematic diagram showing the torsion angle (also known as the dihedral angle) between the AB and DC bonds when considering four atoms connected in the order ABCD;
[0193] Figure 7 is a schematic diagram representing a molecule and the related graph;
[0194] Figures 8 to 12 are schematic diagrams showing a molecule at successive steps of the molecular simplification stage;
[0195] Figure 13 is a schematic diagram of a computing system for performing the methods described herein, according to some aspects of the present invention;
[0196] Figure 14 is a block diagram of a computer-implemented method for modeling binding molecules according to some aspects of the present invention;
[0197] Figure 15 is a block diagram of a computer-implemented method for modeling docking sites of target molecules according to some aspects of the present invention; and
[0198] Figure 16 is a block diagram representing a computer-implemented method for determining the optimal positional match of a binding molecule (e.g., a ligand) in a docking site (e.g., a pocket of a protein) of a target molecule, according to other aspects of the invention.
[0199] Conventions and Definitions
[0200] In this detailed description and the accompanying drawings, corresponding parts are indicated by the same reference numerals. The drawings may show a non-scale representation; furthermore, the parts and components shown in the drawings may be in schematic form.
[0201] Note that in the following text and claims, the binding molecule may be, for example, a ligand, and the target molecule may be, for example, a protein. Therefore, when describing methods or method steps (e.g., methods for modeling molecules or methods for determining the best match between a list of numerous ligands and a pocket of a target protein), this should not be interpreted in a limiting manner, and it should be understood that the methods described and claimed herein can be applied to binding molecules other than ligands and target molecules other than proteins.
[0202] Bond order refers to the number of chemical bonds formed between two atoms in a molecule. For example, two hydrogen atoms bond to form a hydrogen molecule (H-H or H2), so the bond order of H2 is 1. Similarly, the bond order of oxygen (O=O or O2) is 2, and the bond order of nitrogen (N≡N or N2) is 3. The value need not be an integer, especially for polyatomic molecules with resonance structures. Bond order is an indicator of bond stability and is inversely related to bond length: the higher the bond order, the shorter the bond.
[0203] The bond order can be determined from the Lewis structure using the following steps:
[0204] Step 1: Draw the Lewis structure
[0205] Step 2: Assign the following bond orders to the different bond types:
[0206] 0: No bond
[0207] 1: Single covalent bond
[0208] 2: Double covalent bond
[0209] 3: Triple covalent bond
[0210] Step 3: Count the total number of bonded atom pairs (bond groups).
[0211] Step 4: Add up the bond orders of all bond groups.
[0212] Step 5: Divide the total bond order by the number of bond groups to obtain the average bond order.
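Purely by way of non-limiting illustration, steps 3 to 5 above can be sketched in a few lines of Python. The function name `average_bond_order` and the choice of the nitrate ion as an example are assumptions made for this sketch and do not form part of the described method.

```python
from fractions import Fraction


def average_bond_order(bond_orders):
    """Average bond order over the bond groups of one Lewis structure.

    bond_orders: one integer per bonded atom pair (1 single, 2 double, 3 triple).
    """
    if not bond_orders:
        raise ValueError("at least one bond group is required")
    # Step 4 (sum of bond orders) divided by step 3 (number of bond groups).
    return Fraction(sum(bond_orders), len(bond_orders))


# Illustrative input: one Lewis structure of the nitrate ion has
# one N=O double bond and two N-O single bonds.
nitrate_bonds = [2, 1, 1]
print(average_bond_order(nitrate_bonds))  # 4/3
```

The exact fraction is kept rather than a float so that non-integer bond orders arising from resonance structures remain easy to read.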
[0213] Bond length describes the distance between the nuclei of bonded atoms. Bond length is inversely related to bond order: the higher the bond order, the shorter the bond, because a higher bond order increases the attraction between the atoms. Bond length is measured in picometers (1 pm = $10^{-12}$ m) or angstroms (1 Å = $10^{-10}$ m); see, for example, the following figure showing the bond lengths between atoms in certain molecules.
[0214]
[0215] Bond length examples in picometers and angstroms
[0216] Bond lengths can be considered fixed: they cannot be arbitrarily compressed or stretched. At short distances, hard-sphere boundaries prevent the interaction from becoming positive. At long distances, spinoline conditions prevent the bond constant from becoming negative. Both are intrinsic consequences of bond lengths at equilibrium.
Detailed Description
[0217] A. General Introduction
[0218] Molecular docking (MD) is an important step in the drug discovery process, which aims to calculate the preferred position and shape of the first molecule relative to the second molecule when the first molecule binds to the second molecule.
[0219] The MD process consists of two main tasks:
[0220] - Shape Complementarity (SC) Search: detects the three-dimensional orientation of a molecule (ligand), that is, the effective conformation, position and orientation of the ligand within the active site (also known as the pocket) of the target molecule (protein).
[0221] - Binding Affinity (BA) Evaluation: the orientations are ranked using a scoring function. Generally, a lower docking score indicates better binding affinity. Scoring functions estimate, rather than exactly calculate, the binding affinity between protein and ligand, and rely on assumptions and simplifications to do so. Scoring functions can be categorized based on whether they use statistical knowledge of the interactions between the physical and chemical characteristics of the ligand and pocket, or the type of chemical compound and some typical patterns in pocket shape.
[0222] The computer implementation method, computing system, and storage medium of the present invention can be used in drug discovery processes, and for example in SC searches, to perform one or more of the following:
[0223] - Effectively model the structure and geometry of bound molecules (e.g., ligands).
[0224] - Effectively model the structure and geometry of target molecules or pockets within target molecules (e.g., pockets in proteins).
[0225] - Facilitate the recognition of promising matches between binding molecules (e.g., ligands) and target molecules (e.g., proteins).
[0226] - Solve the so-called "molecular docking (MD) problem," which involves identifying from a list of numerous binding molecules those that are likely to match the docking sites of the target molecule. For example, from a list of numerous ligands, identify those that are likely to match the protein pocket.
[0227] The computer-implemented methods may include the aspects summarized below.
[0228] According to one aspect, a complex object, such as a molecule or a part thereof (e.g., docking site or pocket), can be simplified by considering only the following structural information: atoms, the relative bonds between atoms, and optional other constraints.
[0229] In fact, at least from a geometric point of view, candidate binding molecules are considered as structures that must fit into the docking sites of the target molecule in order to dock.
[0230] The same approach can be applied to target protein pockets, which can be viewed as three-dimensional volumes into which binding molecules can be fitted using rigid rotation and translation together with the additional torsion provided by the internal degrees of freedom of the binding molecules.
[0231] According to another aspect, candidate binding molecules are modeled using corresponding graphs, the docking sites or pockets of the target molecule are also modeled using corresponding graphs, and the binding molecule graph is then correlated with the target molecule graph to search for the most suitable binding molecule-target molecule complex, as further explained below.
[0232] In other aspects, the optimal match between a list of numerous ligands (or other candidate binding molecules) and the pocket of the target protein (or other target molecule) is determined, selecting the best fit from the tested candidates. In one possible aspect, the proposed method addresses the SC search phase by formulating the problem in a form that is more easily digestible by annealing methods. The new method is better suited for quadratic unconstrained binary optimization (QUBO) formulation, which is naturally applicable to optimization problems solved, for example, by quantum annealing.
[0233] According to one aspect of the proposed method, docking points are selected within a pocket (e.g., a protein pocket); these points identify the active region of the pocket itself. The number of docking points depends on the shape and size of the pocket, and the points are generated using methods from the literature (e.g., CAVIAR, PASS, POCASA). The docking points of the pocket can be viewed as vertices of a weighted spatial grid that discretizes the 3D spatial region within the pocket, where the weights represent the distances between docking points.
[0234] On the other hand, ligands (or other binding molecules) are represented by weighted graphs that incorporate the molecular geometry, such as the connectivity between atoms, rotatable bonds, bond lengths, and fixed angle values.
[0235] Finally, the pose of the ligand (or other binding molecule) is evaluated based on the weighted subgraph isomorphism between the ligand graph and the spatial grid of the pocket. This method can be naturally formulated as a QUBO problem, thus avoiding the resource overhead typically associated with transforming a higher-order unconstrained binary optimization (HUBO) problem into a QUBO problem.
[0236] Optimization of the weights enables the search for configurations that preserve the geometry, while the subgraph isomorphism allows rotatable bonds to rotate in space and the ligand (or other binding molecule) as a whole to rotate and translate within the spatial grid of the pocket.
[0237] B. Computing System
[0238] As already mentioned, the methods disclosed herein are computer-implemented methods. These methods can be implemented by a system 100 (see Figure 13). The system 100, for example a computing system, includes one or more processors 101 and a computer-readable storage device 102 coupled to the one or more processors 101 and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform the methods disclosed herein and the method of any one of the preceding claims. The computing system 100 may also be communicatively connected to one or more databases 103 for receiving molecule-related information, as further described below. The methods disclosed and claimed herein may also be executed by one or more processors, for example the processor 101 of Figure 13, capable of executing appropriate instructions stored in a non-transitory computer-readable storage medium that can be coupled to the one or more processors.
[0239] As used herein, the term "computing system" encompasses all devices, apparatuses, and machines used for processing data, including, for example, programmable processors, computers, or multiple processors or computers. In addition to hardware, a device may include code that creates an execution environment for a computer program (e.g., code constituting processor firmware, protocol stacks, database management systems, operating systems, or any suitable combination thereof). A computer program (also referred to as a program, software, software application, script, or code) may be written in any suitable form of programming language, including compiled or interpreted languages, and may be deployed in any suitable form, including as a standalone program or as a module, component, subroutine, or other unit suitable for a computing environment. A computer program does not necessarily correspond to a file in a file system. A program may be stored as part of a file containing other programs or data (e.g., in one or more scripts in a markup language document), in a single file dedicated to the program in question, or in multiple co-located files (e.g., in a file storing one or more modules, subroutines, or portions of code). A computer program may be deployed to execute on a single computer, or on multiple computers located at one site or distributed across multiple sites and interconnected by a communication network.
[0240] The processes and logic flows described in this specification can be executed by one or more programmable processors that execute one or more computer programs to perform functions by manipulating input data and generating outputs. The processes and logic flows can also be executed by dedicated logic circuits, and the device can be implemented as dedicated logic circuits (e.g., FPGAs (Field Programmable Gate Arrays) or ASICs (Application-Specific Integrated Circuits), GPUs (Graphics Processing Units), TPUs (Tensor Processing Units), or quantum computers).
[0241] Processors suitable for executing computer programs include, for example, both general-purpose and special-purpose microprocessors, and any one or more processors of any suitable type of digital computer. Typically, the processor receives instructions and data from read-only memory or random access memory, or both. The components of a computer may include a processor for executing instructions and one or more storage devices for storing instructions and data. Typically, a computer will also include one or more mass storage devices (e.g., disks, magneto-optical disks, or optical disks) for storing data, or operatively coupled thereto to receive data from or transfer data to, or both. However, a computer does not need to have such devices. Furthermore, a computer may be embedded in another device, such as a mobile phone, a wearable computing device (e.g., a smartwatch, smart wristband, smart ring), a personal digital assistant (PDA), a mobile audio player, or a Global Positioning System (GPS) receiver. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and storage devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices); magnetic disks (e.g., internal hard disks or removable disks); magneto-optical disks; and CD-ROMs and DVD-ROMs. Processors and memory may be supplemented by or incorporated into dedicated logic circuitry.
[0242] C. From binding molecules (e.g., ligands) to molecular diagrams
[0243] According to one aspect, a computer-implemented method for modeling a binding molecule is now described. This method and its steps are schematically represented in the block diagram of Figure 14, in which the method is generally indicated by reference numeral 30.
[0244] The description of the binding molecule may already be available to the computational system 100, or it may be obtained, for example, from a database 103 of molecules communicatively connected to the computational system 100 (step 31). The computational system 100 may access the database 103, which contains multiple molecular descriptions in a given format (e.g., the database may contain molecular descriptions in .mol2 file format), and subsequently select the binding molecule to be modeled from the database.
[0245] A computing system 100 capable of performing the methods described herein starts from a molecular description (e.g., in .mol2 format) and obtains a graph (G_mol) representing the molecular geometry by introducing weighted edges that enforce constraints on bond lengths and, for example, bond angles. It should be noted that, regardless of the format used, the molecular description of the binding molecule to be modeled may include at least the identification of the following: the atoms of the binding molecule, the bonds between the atoms of the binding molecule, and the type of said bonds.
[0246] As already mentioned, the binding molecule (ligand) to be modeled is converted (Figure 14, step 32) into a graph (G_mol), which is shown schematically in Figure 1. The graph (G_mol) can represent molecular features and geometric constraints. The graph (G_mol) representing the binding molecule can then be conveniently associated with the graph of the target molecule (protein) to determine whether the two molecules in question are an acceptable match, as explained further below.
[0247] More specifically, once a molecular description is obtained, for example in .mol2 format, the molecule is mapped to the graph (G_mol) (see again Figure 1) as follows. Each atom of the molecule is represented by one of the nodes 1 (or vertices) of the graph (substep 32a). Furthermore, to connect the nodes 1 of the graph so as to preserve the coherent structure of the molecule, an edge 2 is added between nodes 1 associated with atoms having a non-zero bond order (substep 32b). In other words, the bonds between corresponding atom pairs of the molecule are represented by edges 2 of the graph (also referred to below as first edges), as shown in Figure 1.
[0248] In addition, to respect the bond lengths between atom pairs in the binding molecule, each edge 2 (first edge) is a weighted edge, as indicated in Figure 2. In other words, the edge is enriched with weight information representing the bond length between the two related atoms (substep 32c). The weight is equal to the distance between the two atoms, expressed in angstroms or picometers.
[0249] To provide an even better model of the binding molecule, another fixed geometric feature to consider is the bond angle between pairs of bonds adjacent to the same atom. According to another aspect, each bond angle can be represented in the molecular graph by an additional (or second) edge 3, which has a weight representing the magnitude of the corresponding bond angle (substep 32d). In fact, as shown in the example of Figure 3, the bond angle between two edges departing from a common node can be fixed by adding a length constraint between the two adjacent edges (i.e., between the two nodes that lie at the ends of the two adjacent edges opposite the common node). In the case shown in Figure 3, the two adjacent edges measure A = 3 and B = 4 and the third edge measures C = 5 (where A, B, and C are measured in the same unit, such as picometers), resulting in an angle of 90° between A and B.
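By way of illustration only, the relationship between a bond angle and the weight of the corresponding second edge follows from the law of cosines, $C^2 = A^2 + B^2 - 2AB\cos\theta$. The sketch below reproduces the 3-4-5 example; the function names are hypothetical and are introduced only for this sketch.

```python
import math


def second_edge_weight(a, b, angle_deg):
    """Length of the edge that closes the triangle formed by two bonds
    of lengths a and b enclosing the bond angle angle_deg."""
    theta = math.radians(angle_deg)
    return math.sqrt(a * a + b * b - 2 * a * b * math.cos(theta))


def bond_angle_deg(a, b, c):
    """Bond angle between bonds a and b recovered from the closing edge c."""
    cos_theta = (a * a + b * b - c * c) / (2 * a * b)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


print(round(second_edge_weight(3, 4, 90), 6))  # 5.0
print(round(bond_angle_deg(3, 4, 5), 6))       # 90.0
```

Because the two bond lengths are already fixed by the first edges, fixing the length of the closing edge is equivalent to fixing the angle itself.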
[0250] Figures 4 and 5 show the molecular graph before and after the addition of the second edges 3, which stand in for the bond angle constraints.
[0251] In summary, one can start with molecules in .mol2 format to obtain a graph representing their geometry and ensuring bond length and bond angle constraints are met by introducing weighted edges. Each second edge 3 has a weight representing the magnitude of the corresponding bond angle.
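As a minimal, non-limiting sketch of substeps 32a to 32d, the following Python code builds a weighted graph from atom coordinates and a bond list. The dictionary-based data layout, the function name `build_mol_graph`, and the example coordinates are assumptions made for illustration; parsing of an actual .mol2 file is not shown.

```python
import math
from itertools import combinations


def build_mol_graph(coords, bonds):
    """Return {(i, j): (weight, kind)} for a molecule.

    coords: {atom_id: (x, y, z)}, one node per atom (substep 32a).
    bonds:  iterable of (atom_id, atom_id) pairs with non-zero bond order.
    """
    edges = {}
    neighbors = {atom: set() for atom in coords}
    # First edges: one per bond, weighted by bond length (substeps 32b, 32c).
    for i, j in bonds:
        edges[tuple(sorted((i, j)))] = (math.dist(coords[i], coords[j]), "bond")
        neighbors[i].add(j)
        neighbors[j].add(i)
    # Second edges: connect each pair of atoms bonded to a common atom,
    # which fixes the bond angle at that atom (substep 32d).
    for nbrs in neighbors.values():
        for i, j in combinations(sorted(nbrs), 2):
            if (i, j) not in edges:  # keep an existing bond edge, e.g. in a 3-ring
                edges[(i, j)] = (math.dist(coords[i], coords[j]), "angle")
    return edges


# Made-up coordinates reproducing the 3-4-5 triangle of Figure 3.
coords = {0: (0.0, 0.0, 0.0), 1: (3.0, 0.0, 0.0), 2: (0.0, 4.0, 0.0)}
graph = build_mol_graph(coords, bonds=[(0, 1), (0, 2)])
for pair, (weight, kind) in sorted(graph.items()):
    print(pair, weight, kind)
```

Storing the edge kind next to the weight is a design choice of this sketch; it makes it easy to tell first edges from second edges when the graph is later simplified.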
[0252] Besides bond lengths and bond angles, another parameter to consider when describing molecular conformation is the torsion angle. The torsion angle (also called the dihedral angle) is the relative angle between the AB and DC bonds when considering four atoms connected in the sequence ABCD. It can also be considered as the angle between the two planes defined by ABC and BCD. Rotation around the BC single bond therefore results in a different torsion angle. The torsion angle is usually denoted τ (see Figure 6). By convention, following the definition proposed by W. Klyne and V. Prelog (Experientia, 1960, 16, 521-523, which is incorporated herein by reference), the torsion angle ABCD is positive when a clockwise rotation of up to 180° is needed to bring the front atom into an eclipsed position with the rear atom. It should be noted that molecules can have rotatable bonds. A rotatable bond is defined as any single non-cyclic bond connected to a non-terminal non-hydrogen atom. In these cases, the torsion angle should not be considered fixed. Each rotatable bond adds one degree of freedom to those of a general rigid body in three-dimensional space (three translational and three rotational degrees of freedom). In particular, single bonds in organic molecules are free to rotate due to the end-to-end nature of their orbital overlap. Double and triple bonds are not rotatable because the π bond (generated by overlapping p orbitals) prevents rotation of the dihedral angle. Since some dihedral angles are fixed in some molecules, according to another aspect, it is conceivable that the graph representing the binding molecule or ligand also represents this constraint.
To address this, for each dihedral angle ABCD that needs to be fixed, one or more additional edges (or third edges) 4 and 5 can be added to the molecular graph between the nodes representing atoms A and D (Figure 14, step 32e). This strategy allows for the topological representation of non-rotational constraints between four atoms. Figure 7 shows how to fix the dihedral angle in a graph related to ethane: adding one of the two edges, 4 or 5, is sufficient.
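For illustration only, the signed torsion angle of four atoms A, B, C, D can be computed from their coordinates with the standard two-argument arctangent formula, and the weight of a third edge is simply the A-D distance. The helper names and the example coordinates below are hypothetical. For fixed bond lengths and bond angles, the A-D distance varies with the magnitude of the torsion angle, which is why a weighted A-D edge can pin it.

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def torsion_deg(a, b, c, d):
    """Signed torsion angle A-B-C-D in degrees (between -180 and 180)."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of planes ABC and BCD
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: A and D on the same side of the B-C bond (eclipsed).
A, B, C, D = (1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)
print(torsion_deg(A, B, C, D))   # eclipsed arrangement, torsion of 0 degrees
print(math.dist(A, D))           # weight that a third edge A-D would carry
```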
[0253] On the other hand, it is worth noting that, generally speaking, a binding molecule (e.g., a ligand) may comprise two or more parts capable of rotating relative to each other. According to another aspect of the invention, the parts of the same molecule that have rotational degrees of freedom relative to each other can be efficiently represented by the methods disclosed herein. In fact, according to another aspect of the invention, the graph (G_mol) representing a binding molecule is not a fully connected graph; in other words, if the total number of nodes 1 in the graph (G_mol) representing the binding molecule is an integer denoted by n, and the number of edges 2, 3, 4 in the graph (G_mol) is an integer denoted by E, then:
[0254] $$E < \frac{n(n-1)}{2}$$
[0255] For example, the value of E can differ from the value of $\frac{n(n-1)}{2}$ by 1, 2, 3 or more, the latter value being greater than E. This means that the graph (G_mol) representing the binding molecule is not a graph in which every possible pair of nodes (or vertices) has an edge between them. This leaves room for proper modeling of molecules or ligands in which one or more parts of the molecule can rotate relative to one or more other parts. Therefore, the graph according to the invention allows for more efficient determination of promising matches between binding molecules and docking sites of target molecules, because parts of the binding molecule (as represented by the graph G_mol) can rotate and adapt to the docking site, in contrast to a model in which the binding molecule is represented by a fully connected and therefore completely rigid graph.
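As a small illustrative check, and not as part of the claimed method, the number of node pairs left unconnected in a graph can be computed as n(n-1)/2 minus E. The function name `missing_edges` and the four-atom chain used as input are assumptions of this sketch.

```python
def missing_edges(n_nodes, edges):
    """Number of node pairs without an edge: n(n-1)/2 - E."""
    unique = {frozenset(e) for e in edges}  # ignore duplicates and orientation
    return n_nodes * (n_nodes - 1) // 2 - len(unique)


# Hypothetical chain A-B-C-D with a rotatable B-C bond:
# first edges A-B, B-C, C-D; second edges A-C, B-D; no third edge A-D.
chain_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("B", "D")]
print(missing_edges(4, chain_edges))  # 1
```

A result greater than zero indicates that the graph is not fully connected, so at least one part of the modeled molecule is left free to rotate.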
[0256] In one possible instance, the graph (G_mol) representing the binding molecule may include the first, second, and third edges 2, 3, 4, 5 described above (thus fixing the bond lengths, the angles between two bonds connected to the same node, and the fixed dihedral angles), but no edges other than the first edges 2, the second edges 3, and the third edges 4, 5. In particular, it does not include further edges that would impose locking constraints between parts of the molecule that should retain rotational degrees of freedom relative to each other.
[0257] In one possible instance, if the binding molecule contains five atoms, the graph (G_mol) representing the binding molecule may include a node that is not directly connected, via a first, second, or third edge, to at least one other node of the same graph (G_mol); in other words, a node need not be connected to all the remaining nodes.
[0258] In another example, the binding molecule contains six atoms, and the graph (G_mol) representing the binding molecule may include one of the nodes 1 that is not directly connected, via a first, second, or third edge, to two other nodes of the same graph (G_mol).
[0259] Once the described constraints are incorporated in the graph (G_mol), the geometry of the molecule is defined by the edges and edge weights of the same graph (G_mol). Therefore, each graph (G_mol) defined as above introduces no distortion of bond lengths, bond angles, or fixed dihedral angles associated with non-rotatable bonds. On the other hand, the graph representation (G_mol) does not constrain the rotation of rotatable bonds. In other words, two identical molecules that differ only in the rotation of rotatable bonds produce the same graph: the same set of nodes and the same set of edges (and edge weights), because bond lengths and bond angles are independent of the configuration of the dihedral angles associated with the rotatable bonds.
[0260] According to another aspect, the computer implementation method described herein may include a molecular simplification step (step 33), which aims to reduce the modeling size of the molecule without substantially affecting the geometric behavior of the molecule during docking evaluation, and at the same time reduce the computational requirements of the computing system 100.
[0261] The first type of molecular simplification involves, for example, reducing the number of atoms in the molecule to be modeled by removing terminal hydrogens. A comparison between a given starting molecule 6 with terminal hydrogens 7 before removal (Figure 8) and the same molecule 6 after removal of the terminal hydrogens (Figure 9) shows how molecule 6, and the associated graph (G_mol) subsequently determined, becomes simpler. In other words, removing some simple atoms, such as terminal hydrogens, has a non-negligible simplifying effect: both the number of edges and the number of nodes are reduced. The docking task therefore also becomes simpler, and the lost information does not significantly affect the quality of the model or the reliability of the docking task.
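A minimal sketch of this first simplification, under the assumption that the molecule is held as a dictionary of element symbols and a list of bonds, could look as follows. The function name, the data layout, and the methanol example are hypothetical and serve only to illustrate the idea.

```python
def strip_terminal_hydrogens(elements, bonds):
    """Remove hydrogens bonded to exactly one atom, together with their bonds.

    elements: {atom_id: element symbol}
    bonds:    list of (atom_id, atom_id) pairs
    """
    degree = {atom: 0 for atom in elements}
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    removed = {a for a, el in elements.items() if el == "H" and degree[a] == 1}
    kept_elements = {a: el for a, el in elements.items() if a not in removed}
    kept_bonds = [(i, j) for i, j in bonds if i not in removed and j not in removed]
    return kept_elements, kept_bonds


# Illustrative input: methanol, CH3-OH.
elements = {"C1": "C", "O1": "O", "H1": "H", "H2": "H", "H3": "H", "H4": "H"}
bonds = [("C1", "O1"), ("C1", "H1"), ("C1", "H2"), ("C1", "H3"), ("O1", "H4")]
print(strip_terminal_hydrogens(elements, bonds))
# ({'C1': 'C', 'O1': 'O'}, [('C1', 'O1')])
```

Applying the simplification before the angle and torsion edges are added keeps those edges from being created for atoms that are about to be discarded.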
[0262] In another aspect, it is conceivable to remove a fragment 8 of molecule 6 by removing one or more atoms and one or more bonds that are not included in the shortest path connecting adjacent rotatable bonds (Figure 10). This simplification can considerably ease the problem: in some cases, small molecules can be reduced to sequences of a few (< 10) rotatable bonds. This simplification can be combined with the removal of terminal hydrogens (see again Figures 8 to 10). It is important to emphasize that, after a rigid structure is removed, the associated dihedral angles (which are essentially non-rotatable) must be fixed. For this purpose, it may be necessary to insert a bond 9 into the molecule each time a molecular fragment is replaced (Figure 11). Simplification by fragment removal significantly reduces the difficulty of the docking problem. However, choosing a graph with fewer vertices to represent the molecule increases the likelihood that some of the poses obtained from associating the binding molecule graph (G_mol) with the target molecule graph (G_grid) will need to be discarded, as explained below in the discussion of weighted isomorphism between graphs.
[0263] In a variation of the fragment-removal simplification (see Figure 12), the method can remove large rigid structures, such as chains of aromatic compounds, while retaining only one edge between the two rotatable bonds connecting these large rigid structures. To enhance the expressive power of this simplification, the method can provide for adding a representative node 10 to the graph of the binding molecule, positioned for example at the center of mass. In this way, the presence of a rigid structure at that location can still be taken into account without having to retain a large number of atoms (which would result in a large number of vertices and edges to be matched when determining the isomorphism of the graph associated with the molecule).
[0264] After the above steps (or a portion thereof, depending on the situation) are completed, the final binding molecule graph (G_mol) is generated (Figure 14, step 34), and the method described in this section ends.
[0265] D. From target molecule (e.g., protein or protein pocket) to target molecule map (e.g., protein pocket map)
[0266] According to another aspect, a computer-implemented method for modeling a target molecule is now described. Figure 15 shows a block diagram of the steps of the method, in which the entire method is indicated by reference numeral 40.
[0267] The description of the target molecule may already be available to the computational system 100, or it may be obtained, for example, from a database 103 of molecules communicatively connected to the computational system 100. The computational system 100 can access the database 103, which contains descriptions of multiple target molecules in a given format (e.g., the database may contain descriptions of molecular docking sites in .pdb file format), and subsequently select from the database the docking sites or target molecules to be modeled (Figure 15, step 41).
[0268] Using techniques from the literature (e.g., CAVIAR, PASS, POCASA), a description of the docking site (a set of coordinates) can be found for a given target molecule (e.g., a given protein). The docking site is therefore described as a set of points and their associated coordinates. The computational system 100 starts from the coordinates of the set of points representing the docking site, for example from a .pdb file description of the docking site of the target molecule, and obtains a graph representing the geometry of the docking site (Figure 15, step 42). In detail, the computer-implemented method executed by the computing system 100 represents the docking site to be modeled as a graph including: a plurality of nodes, wherein each node represents a corresponding docking point of the docking site; and a plurality of edges, wherein each edge represents a corresponding connection between two docking points.
[0269] In summary, a graph (G_grid) can be generated from a grid of points describing a pocket in a target molecule or protein, each node of the graph being obtained from the set of points so obtained. Different strategies can be used to define the connectivity (i.e., the arcs or edges) of the graph.
[0270] According to one aspect, each docking point v is used as a vertex of a weighted graph associated with the pocket of the protein (or other target molecule):
[0271] $$G_{grid} = (V, E)$$
[0272] where V is the set of docking points v, E is the set of edges, and e_{u,v} ∈ E is the edge connecting points u and v, with weight w_{u,v}.
[0273] In the currently preferred solution, the graph (G_grid) is a complete graph (i.e., a fully connected graph), and the edge weight w_{u,v} is defined as the Euclidean distance between docking points u and v, i.e. $w_{u,v} = \sqrt{(x_u - x_v)^2 + (y_u - y_v)^2 + (z_u - z_v)^2}$. The distance is typically expressed in angstroms or picometers. It can be obtained from the x, y, z coordinates of the docking points in a molecular description (e.g., in the .pdb file format as described above).
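By way of non-limiting illustration, a complete weighted graph over a set of docking points can be built as sketched below. The function name `build_grid_graph`, the dictionary layout, and the example coordinates are assumptions; reading the points from an actual pocket file is not shown.

```python
import math
from itertools import combinations


def build_grid_graph(points):
    """Complete weighted graph over the docking points.

    points: {point_id: (x, y, z)}
    Returns {(u, v): Euclidean distance} for every pair u < v.
    """
    return {(u, v): math.dist(points[u], points[v])
            for u, v in combinations(sorted(points), 2)}


# Made-up docking points (units are whatever the input file uses).
pocket = {0: (0.0, 0.0, 0.0), 1: (3.0, 0.0, 0.0),
          2: (0.0, 4.0, 0.0), 3: (0.0, 0.0, 12.0)}
grid = build_grid_graph(pocket)
print(len(grid))      # 6 edges for 4 points, i.e. n(n-1)/2
print(grid[(1, 2)])   # 5.0
```

Since every pair of points receives an edge, the number of edges grows quadratically with the number of docking points, which is one reason the size of the point set matters in practice.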
[0274] Therefore, the constructed weighted graph (G_grid) identifies the 3D spatial grid inside the pocket. In contrast to the graph (G_mol) representing the binding molecule, the constructed weighted graph (G_grid) can be a fully connected graph.
[0275] As described, this method requires the x, y, z positions of the pocket points. Therefore, different file formats can be used, or simply a binary file containing a list of points and their coordinates. The dataset used to create the graphs of protein pockets (or pockets of other target molecules) consists of proteins stored, for example, in '.pdb' files. The Protein Data Bank (PDB) file format represents three-dimensional structural data of (large) biomolecules (e.g., proteins). Such protein datasets are typically available on the Internet through the websites of organizations such as PDBe, PDBj, RCSB, and BMRB. In one implementation, the pocket points (and the corresponding .pdb files) are generated using the POCASA software via the Roll algorithm.
[0276] According to another aspect, the method may also include a pocket augmentation step (Figure 15, step 43). In fact, it is possible that, among the edges of the pocket (grid), there is no edge weight w_{u,v} corresponding to a given edge length of the binding molecule. Therefore, based on information about the binding molecule or ligand, a procedure can be used to strategically insert additional points to compensate for this problem. The information related to the binding molecule (optionally a ligand) to be hosted at the docking site of the target molecule includes the edge weights of the graph representing the binding molecule (optionally a ligand). Augmentation step 43 includes adding nodes to the graph representing the docking site such that the edge weight distribution of the graph (G_grid) representing the docking site is similar to the edge weight distribution of the graph (G_mol) representing the binding molecule (optionally a ligand). For example, if, based on knowledge of the binding molecule or ligand, it is determined that the edges of the pocket do not match most or a significant portion of the edges of the binding molecule, then intermediate points can be added to the spatial grid of the pocket, and thus intermediate nodes can be added to the graph representing the docking site of the target molecule. In this example, a greater number of edges is obtained, potentially covering more distance values similar to those of the edges of the graph representing the binding molecule. In some cases, it is additionally or alternatively possible to remove one or more nodes and one or more edges from the graph representing the docking site, based on information related to the binding molecule (optionally a ligand) to be accommodated in the docking site of the target molecule (Figure 15, reduction step 44).
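One conceivable way to sketch the augmentation idea, given purely as an assumption since the text does not prescribe how the intermediate points are chosen, is to insert the midpoints of existing point pairs and then check which ligand edge weights are still not matched by any grid distance. The function names and the tolerance `tol` are hypothetical.

```python
import math
from itertools import combinations


def uncovered(ligand_weights, points, tol=0.1):
    """Ligand edge weights with no grid distance within tol of them."""
    grid_weights = [math.dist(p, q) for p, q in combinations(points, 2)]
    return [w for w in ligand_weights
            if not any(abs(w - g) <= tol for g in grid_weights)]


def add_midpoints(points):
    """Return the points plus the midpoint of every pair (illustrative strategy)."""
    mids = [tuple((a + b) / 2 for a, b in zip(p, q))
            for p, q in combinations(points, 2)]
    # dict.fromkeys removes duplicate midpoints while preserving order.
    return points + [m for m in dict.fromkeys(mids) if m not in points]


pocket_points = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
ligand_weights = [1.5, 3.0]   # made-up ligand edge weights

print(uncovered(ligand_weights, pocket_points))    # [1.5]
augmented = add_midpoints(pocket_points)
print(uncovered(ligand_weights, augmented))        # []
```

In this toy case a single midpoint is enough to cover the missing distance; a practical procedure would presumably add points more selectively to limit the growth of the grid graph.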
[0277] E. Ideal positional matching of the binding molecule at the docking site of the target molecule.
[0278] Another aspect relates to a computer-implemented method for determining the optimal positional match between a binding molecule (e.g., a ligand) and a docking site of a target molecule (e.g., a pocket of a protein). This method can be executed by a computing system 100, which can be configured to perform the following steps (see Figure 16, which shows a block diagram of the method, where reference numeral 50 indicates the entire method):
[0279] - Obtain (step 51) a graph (G_mol) representing the binding molecule, and
[0280] - Obtain (step 52) a graph (G_grid) representing the docking site of the target molecule.
[0281] The graph (G_mol) representing the binding molecule can be obtained from a molecular description using the computer-implemented method for modeling binding molecules described in the preceding section "C. From Binding Molecules (e.g., Ligands) to Molecular Diagrams" (e.g., method 30 of Figure 14).
[0282] Likewise, the graph (G_grid) representing the docking site of the target molecule can be obtained from the target molecule description or docking site description using the computer-implemented method for modeling docking sites of target molecules (e.g., docking sites of target proteins that receive ligands) described above in section "D. From Target Molecule (e.g., Protein or Protein Pocket) to Target Molecule Diagram (e.g., Protein Pocket Diagram)" of the detailed description of this invention (e.g., method 40 of Figure 15).
[0283] Using the binding molecule graph (G_mol) and the docking site or target molecule graph (G_grid), the method associates the two graphs (step 53) to determine whether an ideal positional match exists between the binding molecule and the docking site of the target molecule by identifying one or more ideal three-dimensional poses of the binding molecule at the docking site. According to a preferred solution, each three-dimensional pose defines the conformation, position, and orientation of the binding molecule within the docking site. To achieve this (i.e., to complete step 53), the method associates the graph (G_grid) representing the docking site of the target molecule with the graph (G_mol) representing the binding molecule and determines whether an optimal positional match exists between the binding molecule and the docking site of the target molecule. More specifically, according to one aspect of the invention, the molecular docking problem (a complex optimization problem) is solved by mapping it to an isomorphism problem between graphs; in other words, the method described herein associates the two graphs (G_mol and G_grid) and verifies whether the graph (G_mol) of the binding molecule can be "embedded" in the graph (G_grid) associated with the target molecule pocket identified as a potential docking site.
[0284] In more detail, step 53 of associating the graph (G_grid) representing the docking site of the target molecule with the graph (G_mol) representing the binding molecule includes determining whether the graph (G_grid) representing the docking site of the target molecule contains a subgraph (G'_grid) that is isomorphic to the graph (G_mol) representing the binding molecule.
[0285] More specifically, according to another aspect, associating the graph (G_grid) representing the docking site with the graph (G_mol) representing the binding molecule includes determining whether the graph (G_grid) representing the docking site of the target molecule contains a subgraph (G'_grid) that is weighted isomorphic or nearly weighted isomorphic to the graph (G_mol) representing the binding molecule. According to another aspect, this determining step includes identifying one or more optimal three-dimensional poses of the binding molecule at the docking site by minimizing an error function, which is related to the difference between the edge weights of the graph (G_mol) representing the binding molecule and the edge weights of the subgraph (G'_grid) of the graph (G_grid) representing the docking site. In other words, a "perfectly matching solution" cannot be guaranteed, and the method therefore accepts a certain range of error: an error function is defined, and the error is calculated and can be compared with an acceptable threshold. Different error functions can be defined as needed, provided that they are based on the difference between the edge weights of the graph (G_mol) representing the binding molecule and the edge weights of the subgraph (G'_grid) of the graph (G_grid) representing the docking site.
[0286] In summary, according to a currently preferred aspect, the isomorphism required to solve the docking task is of a weighted type. The concept of isomorphism is therefore extended so that not only the topological structure of the graph but also the values of the labels associated with its edges must be preserved.
[0287] Therefore, given the graph associated with the binding molecule:
[0288] $$G_{mol} = (V_{mol}, E_{mol}, W_{mol})$$
[0289] and the graph associated with the docking site or spatial grid of the target molecule:
[0290] $$G_{grid} = (V_{grid}, E_{grid}, W_{grid})$$
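[0291] The docking problem is considered solved when an ideal match between G_mol and a subgraph of the grid graph is determined:
[0292] $$G'_{grid} = (V'_{grid}, E'_{grid}, W'_{grid}) \cong G_{mol} \qquad (1)$$
[0293] and therefore V'_grid ⊆ V_grid and E'_grid ⊆ E_grid.
[0294] Since this isomorphism is weighted, it should ideally satisfy:
[0295] $$w_{u,v} = w_{u',v'} \quad \forall (u,v) \in E_{mol} \qquad (2)$$
[0296] This means that the weight associated with the edge (u, v) must be equal to the weight of the edge (u', v'), where (u, v) ∈ E_mol and (u', v') ∈ E'_grid, and, for the given isomorphism G'_grid, for each (u, v) there exists a corresponding (u', v').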
[0297] The items in the above expressions are as follows:
[0298] - G_mol is the graph representing the binding molecule,
[0299] - G_grid is the graph representing the docking site,
[0300] - G'_grid is a subgraph of G_grid and is the isomorphism of G_mol on G_grid,
[0301] - V_mol is the set of nodes of G_mol,
[0302] - E_mol is the set of edges of G_mol,
[0303] - W_mol is the set of edge weights of G_mol,
[0304] - V_grid is the set of nodes of G_grid,
[0305] - E_grid is the set of edges of G_grid,
[0306] - W_grid is the set of edge weights of G_grid,
[0307] - V'_grid is the set of nodes of G'_grid,
[0308] - E'_grid is the set of edges of G'_grid,
[0309] - W'_grid is the set of edge weights of G'_grid,
[0310] - u, v are nodes of G_mol,
[0311] - u', v' are nodes of G_grid.
[0312] Since the size of the grid is finite, the existence of a perfect solution to the weighted subgraph isomorphism cannot be guaranteed. In other words, it is unlikely that an isomorphism perfectly satisfying the condition of equation (2) can be found, because the target molecule (protein) pocket graph is a discretization of the pocket space and has a finite number of points.
[0313] Therefore, this problem can be reconsidered in two stages:
[0314] 1. The first stage 53a determines perfect topological (unweighted) subgraph isomorphisms.
[0315] 2. Then, in the second stage 53b, among the solutions so determined, the one is chosen that minimizes the error introduced by the mismatch when each edge weight of graph (G_mol) is mapped onto graph (G_grid).
[0316] In other words, associating the graph (G_grid) representing the docking site with the graph (G_mol) representing the binding molecule includes:
[0317] - In the first stage 53a, determining whether the graph (G_grid) representing the docking site of the target molecule contains a subgraph (G'_grid) that is unweighted isomorphic to the graph (G_mol) representing the binding molecule, thereby identifying pre-optimal three-dimensional poses of the binding molecule at the docking site.
[0318] - Then, in the second stage 53b, selecting, from the pre-optimal three-dimensional poses identified in the first stage, the optimal three-dimensional pose that minimizes the aforementioned error function (e.g., equation (2) above or another expression that can be used to determine the minimum difference between the edge weights of two related graphs). This error function is related to the difference between the edge weights of the graph (G_mol) representing the binding molecule and the edge weights of the subgraph (G'_grid) of the graph (G_grid) representing the docking site.
[0319] In the first stage (53a), identifying the pre-optimal three-dimensional poses of the binding molecule at the docking site includes verifying:
[0320]
[0321] In other words, geometric isomorphism must be checked.
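The first stage can be illustrated with a brute-force search, sketched here in Python under stated assumptions: nodes are numbered from 0, edges are undirected pairs, and the mapping is required to be injective (a non-induced subgraph match). The function name `topological_embeddings` is hypothetical, and the exhaustive enumeration is chosen only for clarity; it is not the solver described in this specification and scales factorially with the grid size.

```python
from itertools import permutations

def topological_embeddings(mol_edges, n_mol, grid_edges, n_grid):
    """Yield tuples m where m[u] is the grid node assigned to molecule node u.

    A tuple is yielded only if every molecule edge lands on a grid edge;
    edge weights are ignored at this stage.
    """
    grid = {frozenset(e) for e in grid_edges}
    for m in permutations(range(n_grid), n_mol):
        if all(frozenset((m[u], m[v])) in grid for u, v in mol_edges):
            yield m
```

Each yielded tuple corresponds to one pre-optimal candidate that the second stage can then rank by an error function.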
[0322] Then, in the second stage (53b), the step of selecting the optimal three-dimensional pose from the pre-optimal three-dimensional poses identified in the first stage includes determining the three-dimensional pose for which the following error function is minimized or equal to zero:
[0323]
[0324] where:
[0325]
[0326] This means that, for an error function equal to zero or below a given acceptable threshold, the weight associated with the edge (u, v) must be equal or nearly equal to the weight of the edge (u', v'), where (u, v) ∈ E_mol and (u', v') ∈ E'_grid, and, for the given isomorphism G'_grid, for each (u, v) there exists a corresponding (u', v'). In other words, according to this two-stage approach, the first stage selects several potentially good candidates on the basis of unweighted isomorphism, and the second stage selects the most ideal candidate while accepting an error that can be bounded by the operator.
[0327] The items in the above expressions are as follows (as indicated above):
[0328] - G_mol is the graph representing the binding molecule,
[0329] - G_grid is the graph representing the docking site,
[0330] - G'_grid is a subgraph of G_grid and is the isomorphism of G_mol on G_grid,
[0331] - V_mol is the set of nodes of G_mol,
[0332] - E_mol is the set of edges of G_mol,
[0333] - W_mol is the set of edge weights of G_mol,
[0334] - V_grid is the set of nodes of G_grid,
[0335] - E_grid is the set of edges of G_grid,
[0336] - W_grid is the set of edge weights of G_grid,
[0337] - V'_grid is the set of nodes of G'_grid,
[0338] - E'_grid is the set of edges of G'_grid,
[0339] - W'_grid is the set of edge weights of G'_grid,
[0340] - u, v are nodes of G_mol,
[0341] - u', v' are nodes of G_grid.
[0342] In stage 53b, different error measures can be used to rank the samples that satisfy the isomorphism constraint (i.e., the candidates resulting from stage 53a), for example:
[0343] - Total Pairwise Distance (TPD);
[0344] - Root Mean Squared Deviation (RMSD);
[0345] - Average Bond Distortion (ABD).
[0346] Each scoring function is suited to different contexts, and the skilled person can of course define other ranking methods. For example, RMSD can be measured only when the crystallographic pose of the ligand in the pocket is known. The TPD and ABD measures are described in detail below.
[0347] The total pairwise distance is calculated as the sum of the errors accumulated when each edge (u, v) of the graph G_mol associated with the molecule is mapped to an edge (u', v') of the graph G_grid associated with the protein pocket.
[0348] G'_grid is defined as a sample solution of the problem that satisfies the hard constraints:
[0349]
[0350] Given the sample solution G'_grid, the total pairwise distance is calculated as follows:
[0351]
[0352] In other words, the TPD of a solution is equal to the error introduced by mapping the edges of the molecule to grid edges with different weights. Better mappings are those that yield lower TPD values.
[0353] Another version of the TPD function is the same as the previous equation but uses the absolute value instead of the square of the difference:
[0354]
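Both TPD variants can be sketched in Python as follows. The sketch assumes, consistently with the description above, that the TPD is the sum over the molecule edges (u, v) of the squared (or absolute) difference between w_{u,v} and the weight of the grid edge onto which (u, v) is mapped. The function name, the dictionary layout, and the `squared` flag are hypothetical.

```python
def total_pairwise_distance(mol_w, grid_w, mapping, squared=True):
    """Accumulate the weight mismatch of every molecule edge under `mapping`.

    mol_w:   {(u, v): weight} for the molecule graph
    grid_w:  {(i, j): weight} for the grid graph, keyed with i < j
    mapping: {molecule node: grid node}
    """
    total = 0.0
    for (u, v), w_mol in mol_w.items():
        a, b = mapping[u], mapping[v]
        diff = w_mol - grid_w[(min(a, b), max(a, b))]
        total += diff * diff if squared else abs(diff)
    return total
```

For instance, mapping a molecule edge of weight 1.5 onto a grid edge of weight 1.0 contributes 0.25 to the squared variant and 0.5 to the absolute variant.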
[0355] Using the total pairwise distance within the objective function defines the shape (landscape) of the solution space.
[0356] One limitation of the total pairwise distance is that, for the same candidate binding molecule, its value changes drastically when different simplifications are applied, since it is the cumulative distortion of the weights associated with each bond once the bonds are mapped onto the grid representing the pocket.
[0357] Furthermore, this value depends heavily on the number of edges in the graph associated with the binding molecule, making it difficult to compare the performance of different binding molecule-target molecule pairs, especially when they have different sizes.
[0358] Therefore, the average bond distortion (ABD) can be used, which measures the average degree of distortion of each bond (including the edges added to represent bond angle and dihedral angle constraints in the case of double and triple bonds) in a specific sample (candidate) that satisfies the isomorphism hard constraints (i.e., the first stage).
[0359] As for the total pairwise distance, G'_grid represents a sample solution of the problem that satisfies the hard constraints (i.e., a potential candidate that passes the first stage). The average bond distortion of the sample solution G'_grid is measured as follows:
[0360]
[0361] In other words, the ABD of the solution is equal to the root mean square of the TPD. It measures the mean square distortion of the edges of the molecule when they are mapped to grid edges with different weights. Better mappings are those that yield lower ABD values.
[0362] As for the TPD, another version of the ABD measure applies the absolute value instead of the square of the difference:
[0363]
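A possible reading of the two ABD variants is sketched below in Python. It assumes that the squared variant is the square root of the TPD divided by the number of molecule edges, and that the absolute variant is the mean absolute weight difference; both are interpretations of the description, and the function name and arguments are hypothetical.

```python
import math

def average_bond_distortion(mol_w, grid_w, mapping, squared=True):
    """Per-edge distortion of a candidate mapping, normalised by the edge count."""
    diffs = []
    for (u, v), w_mol in mol_w.items():
        a, b = mapping[u], mapping[v]
        diffs.append(w_mol - grid_w[(min(a, b), max(a, b))])
    if squared:
        # root of the mean squared difference, i.e. sqrt(TPD / |E_mol|)
        return math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return sum(abs(d) for d in diffs) / len(diffs)
```

Because the value is normalised by the number of edges, candidates obtained from molecules of different sizes become easier to compare than with the TPD.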
[0364] In summary, the described method allows determining whether an ideal positional match exists between a binding molecule (e.g., a ligand) and a docking site (e.g., a pocket of a protein) of a target molecule by associating two graphs and subsequently identifying the most ideal match (if any) or substantially the most ideal match. Note that it is not necessary to strictly select one binding molecule (e.g., a ligand) and one target molecule (e.g., a protein): nothing prevents the creation of graphs of multiple corresponding binding molecules and graphs of docking sites of the target molecule, and subsequently testing even a large number of binding molecules for matches with one or more docking sites (and vice versa), followed by ranking the matches using an error function.
[0365] F. From Mathematical Modeling to QUBO Formulation
[0366] According to another aspect, in order to solve the docking problem, associating the graph (G_grid) representing the docking site with the graph (G_mol) representing the binding molecule includes constructing (step 53c) a corresponding Quadratic Unconstrained Binary Optimization (QUBO) problem.
[0367] As described in Section E above, after the .mol2 files of the binding molecules and the .pdb files of the target molecules (proteins) have been obtained, the corresponding graph (G_mol) of each binding molecule and the corresponding docking site graph (G_grid) of each target molecule are computed.
[0368] The binding molecule graph (G_mol) obtained from the .mol2 file can be simplified by removing terminal hydrogens and certain rigid fragments, leaving only the atoms and edges that form the shortest path connecting the rotatable bonds adjacent to the rigid fragments. Alternatively, nodes can be added at the centroid to replace the atoms removed from the fragments, thus retaining greater expressiveness.
[0369] Once some data about the candidate binding molecules is available, the target molecule graph (G_grid) obtained from the .pdb file can be augmented by adding nodes at key locations. For example, knowing the edge length distribution of the graph (G_mol) obtained by representing bond lengths and bond angles as edges, nodes can be inserted into the grid graph (G_grid) so that its edge weights (i.e., lengths) have a similar distribution.
[0370] The goal is then to determine the rule that associates each node of the molecular graph with a node of the grid graph.
[0371] Therefore, weighted subgraph isomorphism is used. It is thus necessary to extract the subgraph G'_grid that is a valid isomorphism of G_mol on G_grid. As defined in equation (1), G'_grid is a graph whose vertices are a subset of the vertex set V_grid of the grid graph and whose edges are a subset of the edge set E_grid of the grid graph (including only the edges that belong to the subgraph).
[0372] Topological isomorphism (ignoring weights) determines the solutions that can exist on the grid, but this does not mean that equation (2) holds. However, since determining the most ideal isomorphism is not an easy task, these solutions are still taken into account.
[0373] Given the two graphs G_mol and G_grid, the Quadratic Unconstrained Binary Optimization (QUBO) problem is formalized by defining the set of binary variables:
[0374]
[0375] Therefore, the QUBO formulation is defined as a weighted sum of two terms:
[0376] - Isomorphism term: a hard constraint ensuring that every solution is a valid topological isomorphism of G_mol on G_grid (without considering the edge weights W_mol and W_grid). This constraint ensures that each node or vertex of G_mol is associated with one and only one node or vertex of G_grid, and that each edge (u, v) of the molecule is mapped to an edge (u', v') that actually belongs to the grid.
[0377] - Optimization term: a penalty term that is evaluated if the weight associated with an edge (u, v) belonging to the molecule is mapped to an edge (u', v') with a different weight.
[0378]
[0379] Coefficient A is introduced as a hyperparameter of the problem: both the isomorphism term and the optimization term are part of the Hamiltonian of the problem, but for a solution to be considered valid, the former must be satisfied.
[0380] To achieve this, the first part of the Hamiltonian expression, H_iso, is scaled by the factor A to increase its priority (i.e., solutions that do not satisfy it are severely penalized).
[0381] F1. Isomorphisms in QUBO Formulation
[0382] The hard constraints of this problem must be ensured; if they are satisfied, a subgraph topologically isomorphic to graph G_mol can be found in graph G_grid. Therefore, this term does not consider the values of the edge weights of the graphs.
[0383] A valid isomorphism must ensure two properties of the mapping:
[0384] - Each node or vertex of graph G_mol must be mapped to one and only one vertex of the grid graph G_grid:
[0385]
[0386] - Each edge of graph G_mol must be mapped to an edge belonging to the grid graph G_grid:
[0387]
[0388] The isomorphism term consists of the sum of the two components mentioned above:
[0389]
[0390] When both conditions are met, the value of H_iso is minimized. In the case of a fully connected graph associated with a protein pocket (or the pocket of another target molecule), the second term equals zero:
[0391] .
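The structure of the isomorphism term can be illustrated with the following Python sketch, which rests on several assumptions that are not taken from the specification: one binary variable x[(u, i)] per pair of molecule node u and grid node i, a one-hot penalty (1 - sum_i x[(u, i)])^2 per molecule node, and a unit penalty whenever the two ends of a molecule edge are placed on a pair of grid nodes that is not a grid edge. Placing both ends on the same grid node is also penalized here, which is an added assumption to prevent bonded atoms from collapsing onto one point. The names `build_h_iso` and `energy` are hypothetical.

```python
from itertools import combinations, product

def build_h_iso(n_mol, n_grid, mol_edges, grid_edges):
    """Return (Q, offset) with H_iso(x) = offset + sum of Q[a, b] * x[a] * x[b]."""
    grid = {frozenset(e) for e in grid_edges}
    Q, offset = {}, 0.0

    def add(a, b, coeff):
        key = (a, b) if a <= b else (b, a)
        Q[key] = Q.get(key, 0.0) + coeff

    for u in range(n_mol):
        # (1 - sum_i x_i)^2 = 1 - sum_i x_i + 2 * sum_{i<j} x_i x_j for binary x
        offset += 1.0
        for i in range(n_grid):
            add((u, i), (u, i), -1.0)
        for i, j in combinations(range(n_grid), 2):
            add((u, i), (u, j), 2.0)
    for u, v in mol_edges:
        for i, j in product(range(n_grid), repeat=2):
            if frozenset((i, j)) not in grid:  # includes the case i == j
                add((u, i), (v, j), 1.0)
    return Q, offset

def energy(Q, offset, x):
    """Evaluate the quadratic form for a full 0/1 assignment x."""
    return offset + sum(c * x[a] * x[b] for (a, b), c in Q.items())
```

Under these assumptions the energy is zero exactly for assignments that map every molecule node to one grid node and every molecule edge onto a grid edge, and is at least 1 otherwise.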
[0392] F2. Optimization terms for QUBO formulas
[0393] The goal of the optimization term is to evaluate the solutions that are valid topological isomorphisms of graph G_mol on graph G_grid. Since a weighted isomorphism is sought, a topologically valid match may not satisfy the ideal requirement of equation (2), namely that each edge (u, v) is matched to an edge (u', v') with the same weight.
[0394] Therefore, solutions that satisfy the hard constraints have additional optimization scores that need to be minimized.
[0395]
[0396] This means that if the binary variables w_{u,u'} (node u mapped to node u') and w_{v,v'} (node v mapped to node v') are both equal to 1, then the evaluation score is equal to the squared modulus of the difference between the weights associated with edge (u, v) and edge (u', v').
[0397] In other words, if the bond lengths or bond angles are distorted by the isomorphism, then the value of H_opt will be strictly greater than 0.
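The behaviour of the optimization term described above can be sketched in Python as follows, assuming one binary variable x[(u, i)] per pair of molecule node u and grid node i, and undirected grid edges keyed as (i, j). The function name `h_opt` and the data layout are hypothetical.

```python
def h_opt(x, mol_w, grid_w):
    """Sum the squared weight differences over all active edge pairings.

    x:      {(molecule node, grid node): 0 or 1}
    mol_w:  {(u, v): weight} for the molecule graph
    grid_w: {(i, j): weight} for the grid graph
    """
    total = 0.0
    for (u, v), w_mol in mol_w.items():
        for (i, j), w_grid in grid_w.items():
            # an undirected grid edge can host (u, v) in either orientation
            active = x[(u, i)] * x[(v, j)] + x[(u, j)] * x[(v, i)]
            total += active * (w_mol - w_grid) ** 2
    return total
```

The score is zero when every molecule edge lands on a grid edge of identical weight and grows with each distorted bond length or bond angle.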
[0398] The complete Hamiltonian of the QUBO formulation of the molecular docking problem using weighted subgraph isomorphism is:
[0399]
[0400] Since H_iso must be treated as a hard constraint, the hyperparameter A must be large enough to ensure that this constraint is enforced.
[0401] The hyperparameter A can be chosen such that:
[0402]
[0403] For example, A could be one of the following:
[0404]
[0405] The skilled person may choose other values of A depending on the circumstances.
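The role of A can be illustrated on a toy instance with the Python sketch below. It assumes a Hamiltonian of the form A * H_iso + H_opt, with H_iso and H_opt built as a one-hot penalty plus an off-grid penalty and as a sum of squared weight differences, respectively. The bound used for A (number of molecule edges times the largest possible squared weight difference, plus 1) is a conservative, hypothetical choice and not one of the values given in this specification. The exhaustive search merely stands in for a quantum or classical QUBO solver; all function names are hypothetical.

```python
from itertools import product

def h_iso(x, n_mol, n_grid, mol_edges, grid_edges):
    grid = {frozenset(e) for e in grid_edges}
    one_hot = sum((1 - sum(x[(u, i)] for i in range(n_grid))) ** 2
                  for u in range(n_mol))
    off_grid = sum(x[(u, i)] * x[(v, j)]
                   for u, v in mol_edges
                   for i, j in product(range(n_grid), repeat=2)
                   if frozenset((i, j)) not in grid)
    return one_hot + off_grid

def h_opt(x, mol_w, grid_w):
    return sum((x[(u, i)] * x[(v, j)] + x[(u, j)] * x[(v, i)]) * (wm - wg) ** 2
               for (u, v), wm in mol_w.items()
               for (i, j), wg in grid_w.items())

def brute_force_minimum(n_mol, n_grid, mol_w, grid_w):
    """Return (A, best energy, best assignment) by exhaustive enumeration."""
    worst = max((wm - wg) ** 2 for wm in mol_w.values() for wg in grid_w.values())
    A = len(mol_w) * worst + 1.0  # any constraint violation costs more than H_opt can save
    keys = [(u, i) for u in range(n_mol) for i in range(n_grid)]
    best_e, best_x = None, None
    for bits in product((0, 1), repeat=len(keys)):
        x = dict(zip(keys, bits))
        e = A * h_iso(x, n_mol, n_grid, list(mol_w), list(grid_w)) + h_opt(x, mol_w, grid_w)
        if best_e is None or e < best_e:
            best_e, best_x = e, x
    return A, best_e, best_x
```

With this choice, every assignment that violates the hard constraint has an energy of at least A, which exceeds the energy of any valid assignment, so the minimum is always a valid mapping whenever one exists.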
[0406] As already discussed, another aspect relates to a computing system (100) comprising one or more processors; and a computer-readable storage device coupled to one or more processors and having instructions stored thereon that, when executed by one or more processors, cause one or more processors to perform the methods disclosed above and in the claims.
[0407] Furthermore, another aspect relates to a non-transitory computer-readable storage medium coupled to one or more processors (e.g., the processor of system 100) and having instructions stored thereon that, when executed by one or more processors, cause one or more processors to perform the methods described herein and claimed in the appended claims.
[0408] While this specification contains numerous details, it should not be construed as limiting the scope of this disclosure or the scope of possible claims, but rather as a description of specific features of particular embodiments. Certain features described in the context of individual embodiments in this specification may also be implemented in combination in a single embodiment. Conversely, multiple features described in the context of a single embodiment may also be implemented individually or in any suitable sub-combination in multiple embodiments. Furthermore, although features may be described above as functioning in certain combinations and even initially claimed, in some cases one or more features from that claimed combination may be removed, and the claimed combination may be for sub-combinations or variations thereof.
[0409] Similarly, although operations are depicted in a specific order in the accompanying drawings, this should not be construed as requiring such operations to be performed in the specific order or sequence shown, or requiring all shown operations to achieve the desired result. In some cases, multitasking and parallel processing can be advantageous. Furthermore, the separation of various system components in the above embodiments should not be construed as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
[0410] Many embodiments have been described. However, it should be understood that various modifications can be made without departing from the spirit and scope of this disclosure. For example, the steps can be rearranged, added, or deleted in various forms of the process shown above. Therefore, other embodiments are within the scope of the appended claims.
Claims
1. A computer-implemented method for determining the optimal positional match between a binding molecule and a docking site of a target molecule, for use in a drug discovery process, the computer-implemented method comprising the following steps: - obtaining (51) a graph (G_mol) representing the binding molecule, - obtaining (52) a graph (G_grid) representing the docking site of the target molecule, - associating (53) the graph (G_grid) representing the docking site of the target molecule with the graph (G_mol) representing the binding molecule to determine whether an ideal positional match exists between the binding molecule and the docking site of the target molecule.
2. The computer-implemented method according to claim 1, wherein obtaining (51) the graph (G_mol) representing the binding molecule includes the following steps: - receiving (31) a molecular description of the molecule to be modeled, - representing (32) the molecule to be modeled as a graph (G_mol), wherein the graph (G_mol) representing the binding molecule includes: • a plurality of nodes (1), wherein each node represents a corresponding atom of the binding molecule, • a plurality of first edges (2), wherein each first edge represents a bond between a corresponding pair of atoms.
3. The computer-implemented method according to claim 2, wherein the weight of each first edge (2) represents the bond length between the corresponding pair of atoms.
4. The computer-implemented method according to claim 3, wherein the graph (G_mol) representing the binding molecule further includes: - one or more second edges (3), wherein each second edge represents a corresponding bond angle, each bond angle being the angle between a corresponding pair of bonds adjacent to the same atom, and the weight of each second edge (3) represents the magnitude of the corresponding bond angle.
5. The computer-implemented method according to claim 4, wherein the graph (G_mol) representing the binding molecule further includes: - one or more third edges (4), wherein each third edge represents a corresponding dihedral angle, each dihedral angle being formed between the plane defined by the first and second bonds (AB, BC) and the plane defined by the second and third bonds (BC, CD), wherein, in a sequence of first, second, third, and fourth atoms (A, B, C, D) connected in succession, the first bond (AB) is the bond between the first atom and the second atom, the second bond (BC) is the bond between the second atom and the third atom, and the third bond (CD) is the bond between the third atom and the fourth atom, wherein the first and second bonds are adjacent to the second atom (B), and wherein the second and third bonds are adjacent to the third atom (C).
6. The computer-implemented method according to claim 5, wherein the graph (G_mol) representing the binding molecule does not include any edges other than the first edges (2), the second edges (3), and the third edges (4).
7. The computer-implemented method according to any one of claims 5 to 6, wherein the binding molecule comprises five or more atoms, and wherein the graph (G_mol) representing the binding molecule includes one or more nodes that are not directly connected by a first, second, or third edge to at least one other node of the same graph (G_mol) representing the binding molecule; or the binding molecule comprises six or more atoms, and the graph (G_mol) representing the binding molecule includes one or more nodes (1) that are not directly connected by a first, second, or third edge to at least two other nodes of the same graph (G_mol) representing the binding molecule.
8. The computer-implemented method according to any one of claims 2 to 7, wherein the molecular description of the molecule to be modeled includes at least an identification of: - the atoms of the binding molecule, - the bonds between the atoms of the binding molecule, and - the type of the bonds.
9. The computer-implemented method according to any one of claims 2 to 8, wherein step (31) of receiving a molecular description of the molecule to be modeled comprises: - accessing a database (103) containing a plurality of molecular descriptions in a given format, optionally wherein the database contains molecular descriptions in .mol2 file format; - selecting the molecule to be modeled from the database, optionally selecting from the database a molecule to be modeled that is described in .mol2 file format.
10. The computer-implemented method according to any one of claims 2 to 9, wherein, before or during the step (32) of representing the binding molecule as a graph (G_mol), the method includes a molecular simplification step (33) of the molecule to be modeled, the molecular simplification step (33) including the removal of at least one or more atoms from the molecule to be modeled.
11. The computer-implemented method according to claim 10, wherein the molecular simplification step (33) comprises: - a sub-step of reducing the number of atoms in the molecule to be modeled.
12. The computer-implemented method according to claim 11, wherein the sub-step of reducing the number of atoms in the molecule to be modeled is achieved by removing the terminal hydrogens present in the molecule.
13. The computer-implemented method according to claim 10, 11, or 12, wherein the molecular simplification step (33) comprises: - a fragment removal sub-step, which includes removing one or more atoms and one or more bonds that are not contained in the shortest path connecting adjacent rotatable bonds, and introducing one or more constraints to preserve any dihedral angles of the molecule to be modeled.
14. The computer-implemented method according to claim 10, 11, 12, or 13, wherein the molecular simplification step (33) comprises: - a fragment substitution sub-step, which includes removing one or more atoms and one or more bonds that are not contained in the shortest path connecting adjacent rotatable bonds, and replacing the removed atoms and bonds with dummy atoms, which are optionally located at the centroid of the removed atoms.
15. The computer-implemented method according to any one of the preceding claims, wherein the graph (G_mol) representing the binding molecule is a non-fully connected graph; and wherein, in the graph (G_mol) representing the binding molecule, with n denoting the total number of nodes (1) and E denoting the total number of edges (2, 3, 4): E < [n (n - 1)] / 2.
16. The computer-implemented method according to any one of the preceding claims, wherein the map (G) representing the docking sites of the target molecule is obtained (52). grid This includes the following steps: - Receive the description of the docking site to be modeled (41). - The docking site to be modeled is represented as Figure (42), wherein the figure includes: • Multiple nodes, where each node represents a corresponding docking point of the docking site. • Multiple edges, where each edge represents a corresponding connection between two docking points.
17. The computer implementation method of claim 16, wherein the weight of each edge represents the Euclidean distance between two connected points.
18. The computer-implemented method according to any one of the preceding two claims, wherein the description of the docking site to be modeled includes the spatial coordinates of a plurality of docking points that are part of the docking site; and The description (41) of the receiving docking site to be modeled includes: - Access a database (103) containing descriptions of docking sites of multiple molecules in a given format, optionally wherein the database contains descriptions of docking sites of molecules in .pbp file format; - Select a docking site to be modeled from the database; optionally, select a docking site to be modeled in .pbp file format from the database.
19. The computer-implemented method according to any one of the preceding three claims, wherein the method includes an enhancement step (43), the enhancement step (43) comprising: - Based on information related to the binding molecules to be accommodated at the docking site of the target molecule, the graph (G) representing the docking sites is mapped... grid Add one or more nodes and one or more edges, wherein the binding molecule is optionally a ligand.
20. The computer implementation method of claim 19, wherein the enhancement step includes adding nodes to the graph representing docking sites such that the edge weight distribution in the graph representing docking sites is similar to the edge weight distribution in the graph representing binding molecules, the binding molecules optionally being ligands.
21. The computer-implemented method according to any one of the preceding five claims, wherein the method includes a reduction step (44), the reduction step (44) comprising: - Based on information related to the binding molecules to be accommodated at the docking site of the target molecule, remove the graph (G) representing the docking site. grid One or more nodes and one or more edges of a ), wherein the binding molecule is optionally a ligand.
22. The computer-implemented method according to claim 21, wherein the reduction step comprises: based on the edge length distribution of the graph (G_mol) representing the binding molecule, removing from the graph (G_grid) representing the docking site any edge whose length does not match, or differs by more than a given tolerance limit from, the edge lengths in the graph (G_mol) representing the binding molecule.
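By way of illustration only, and not as part of the claims, the reduction step of claim 22 might be sketched as below. The function name `reduce_grid_edges`, the dictionary representation of the graphs, and the sample lengths and tolerance are assumptions made for the example.

```python
def reduce_grid_edges(grid_edges, mol_edges, tolerance):
    """Keep only the grid edges whose length lies within `tolerance`
    of at least one edge length of the binding-molecule graph."""
    mol_lengths = list(mol_edges.values())
    return {
        edge: length
        for edge, length in grid_edges.items()
        if any(abs(length - m) <= tolerance for m in mol_lengths)
    }


# Hypothetical edge lengths (arbitrary units), keyed by node pairs.
grid_edges = {(0, 1): 1.5, (0, 2): 2.9, (1, 2): 4.0}
mol_edges = {("a", "b"): 1.52, ("b", "c"): 3.0}
reduced = reduce_grid_edges(grid_edges, mol_edges, tolerance=0.15)
```

In this example the edge of length 4.0 is removed because no edge of the binding-molecule graph has a comparable length, which shrinks the search space before any matching is attempted.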
23. The computer-implemented method according to any one of the preceding claims, wherein the graph (G_grid) representing the docking site of the target molecule is a fully connected graph.
24. The computer-implemented method according to any one of the preceding claims, wherein determining whether an optimal positional match exists between the binding molecule and the docking site of the target molecule comprises: identifying one or more optimal three-dimensional poses of the binding molecule at the docking site, each three-dimensional pose defining the conformation, position, and orientation of the binding molecule within the docking site.
25. The computer-implemented method according to any one of the preceding claims, wherein associating (53) the graph (G_grid) representing the docking site with the graph (G_mol) representing the binding molecule includes: determining (53a) whether the graph (G_grid) representing the docking site of the target molecule contains a subgraph isomorphic to the graph (G_mol) representing the binding molecule.
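By way of illustration only, and not as part of the claims, the unweighted check of claim 25 can be sketched with an exhaustive search over injective node mappings. The function name `find_embeddings` and the toy graphs are hypothetical. The sketch only requires every edge of the binding-molecule graph to land on an edge of the docking-site graph, which is an assumed reading of the isomorphism condition, and the search is practical only for very small graphs.

```python
from itertools import permutations


def find_embeddings(mol_nodes, mol_edges, grid_nodes, grid_edges):
    """Return every injective mapping f from molecule nodes to grid nodes
    such that each molecule edge (u, v) maps onto a grid edge (f(u), f(v))."""
    grid_set = {frozenset(edge) for edge in grid_edges}
    embeddings = []
    for image in permutations(grid_nodes, len(mol_nodes)):
        f = dict(zip(mol_nodes, image))
        if all(frozenset((f[u], f[v])) in grid_set for u, v in mol_edges):
            embeddings.append(f)
    return embeddings


# Toy example: a three-node chain placed on a four-node chain.
mol_nodes = ["a", "b", "c"]
mol_edges = [("a", "b"), ("b", "c")]
grid_nodes = [0, 1, 2, 3]
grid_edges = [(0, 1), (1, 2), (2, 3)]
embeddings = find_embeddings(mol_nodes, mol_edges, grid_nodes, grid_edges)
```

Each returned mapping corresponds to one candidate placement of the binding molecule on the docking points; an empty list means that no isomorphic subgraph exists.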
26. The computer-implemented method according to any one of claims 1 to 24, wherein associating the graph (G_grid) representing the docking site with the graph (G_mol) representing the binding molecule includes: determining (53b) whether the graph (G_grid) representing the docking site of the target molecule contains a subgraph exhibiting weighted isomorphism with the graph (G_mol) representing the binding molecule.
27. The computer-implemented method according to claim 26, wherein determining (53b) whether the graph (G_grid) representing the docking site of the target molecule contains a subgraph exhibiting weighted isomorphism with the graph (G_mol) representing the binding molecule includes: identifying, by minimizing an error function, one or more optimal three-dimensional poses of the binding molecule at the docking site, the error function being related to the difference between the edge weights of the graph (G_mol) representing the binding molecule and the edge weights of the subgraph of the graph (G_grid) representing the docking site.
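By way of illustration only, and not as part of the claims, the error-function minimization of claim 27 might look as follows. The exact error function is not reproduced in this text, so the sketch assumes a sum of squared differences between matched edge weights. The function name `best_pose` and the sample weights are hypothetical, and the exhaustive search is a classical stand-in suitable only for tiny graphs.

```python
from itertools import permutations


def best_pose(mol_nodes, mol_w, grid_nodes, grid_w):
    """Return (mapping, error) minimizing the assumed error function:
    the sum over molecule edges of (w_mol - w_grid) ** 2."""

    def grid_weight(a, b):
        # Edges are undirected, so look the pair up in both orders.
        return grid_w.get((a, b), grid_w.get((b, a)))

    best_f, best_err = None, float("inf")
    for image in permutations(grid_nodes, len(mol_nodes)):
        f = dict(zip(mol_nodes, image))
        err = 0.0
        for (u, v), w in mol_w.items():
            gw = grid_weight(f[u], f[v])
            if gw is None:  # molecule edge has no counterpart in the grid
                break
            err += (w - gw) ** 2
        else:
            if err < best_err:
                best_f, best_err = f, err
    return best_f, best_err


# Hypothetical edge weights (distances in arbitrary units).
mol_w = {("a", "b"): 1.0, ("b", "c"): 2.0}
grid_w = {(0, 1): 1.0, (1, 2): 2.1, (2, 3): 1.3, (0, 3): 5.0}
pose, error = best_pose(["a", "b", "c"], mol_w, [0, 1, 2, 3], grid_w)
```

An error of zero would indicate an exact weighted match; in practice the result would be compared against an acceptable threshold.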
28. The computer-implemented method according to any one of claims 24 to 26, wherein identifying one or more optimal three-dimensional poses of the binding molecule at the docking site includes verifying: [formula omitted in source] and minimizing or reducing to zero the following error function: [formula omitted in source]. This means that, for an error function that is zero or below a given acceptable threshold, the weight associated with the edge [symbol omitted] must be equal or nearly equal to the weight of the edge [symbol omitted], where [symbol omitted] and [symbol omitted], and, for a given isomorphism f, for each [symbol omitted] there exists [symbol omitted]. The terms in the above expressions are as follows:
- G_mol is the graph representing the binding molecule,
- G_grid is the graph representing the docking site,
- [symbol omitted] is a subgraph of G_grid, and [symbol omitted] is an isomorphism of G_mol on G_grid,
- V_mol is the set of nodes of G_mol,
- E_mol is the set of edges of G_mol,
- W_mol is the set of edge weights of G_mol,
- V_grid is the set of nodes of G_grid,
- E_grid is the set of edges of G_grid,
- W_grid is the set of edge weights of G_grid,
- [symbols omitted] are, respectively, the sets of nodes, edges, and edge weights of the subgraph of G_grid,
- u, v are nodes of G_mol,
- u', v' are nodes of G_grid.
29. The computer-implemented method according to any one of claims 24 to 27, wherein associating (53) the graph (G_grid) representing the docking site with the graph (G_mol) representing the binding molecule includes:
- in a first stage (53a), determining whether the graph (G_grid) representing the docking site of the target molecule contains a subgraph exhibiting unweighted isomorphism with the graph (G_mol) representing the binding molecule, thereby identifying pre-optimal three-dimensional poses of the binding molecule at the docking site;
- then, in a second stage (53b), selecting, from the pre-optimal three-dimensional poses identified in the first stage, the optimal three-dimensional pose that minimizes an error function, the error function being related to the difference between the edge weights of the graph (G_mol) representing the binding molecule and the edge weights of the subgraph of the graph (G_grid) representing the docking site.
30. The computer-implemented method according to claim 29, wherein, in the first stage (53a), identifying the pre-optimal three-dimensional poses of the binding molecule at the docking site includes verifying: [formula omitted in source]; and, in the second stage (53b), selecting the optimal three-dimensional pose from the pre-optimal three-dimensional poses identified in the first stage includes determining the three-dimensional pose that minimizes or reduces to zero the following error function: [formula omitted in source]. This means that, for an error function that is zero or below a given acceptable threshold, the weight associated with the edge [symbol omitted] must be equal or nearly equal to the weight of the edge [symbol omitted], where [symbol omitted] and [symbol omitted], and, for a given isomorphism f, for each [symbol omitted] there exists [symbol omitted]. The terms in the above expressions are as follows:
- G_mol is the graph representing the binding molecule,
- G_grid is the graph representing the docking site,
- [symbol omitted] is a subgraph of G_grid, and [symbol omitted] is an isomorphism of G_mol on G_grid,
- V_mol is the set of nodes of G_mol,
- E_mol is the set of edges of G_mol,
- W_mol is the set of edge weights of G_mol,
- V_grid is the set of nodes of G_grid,
- E_grid is the set of edges of G_grid,
- W_grid is the set of edge weights of G_grid,
- [symbols omitted] are, respectively, the sets of nodes, edges, and edge weights of the subgraph of G_grid,
- u, v are nodes of G_mol,
- u', v' are nodes of G_grid.
31. The computer-implemented method according to any one of the preceding claims, wherein associating (53) the graph (G_grid) representing the docking site with the graph (G_mol) representing the binding molecule includes constructing (53c) the corresponding quadratic unconstrained binary optimization (QUBO) problem.
32. The computer-implemented method according to claim 31, wherein solving the quadratic unconstrained binary optimization (QUBO) problem and/or determining the optimal three-dimensional pose is performed by a quantum annealing (QA) optimization process executed by a quantum annealer.
33. The computer-implemented method according to claim 31 or 32, wherein constructing the quadratic unconstrained binary optimization (QUBO) problem corresponding to the graph (G_grid) representing the docking site and the graph (G_mol) representing the binding molecule includes:
- defining a set of binary variables: [formula omitted in source];
- mapping each node of the graph (G_mol) of the binding molecule to one and only one node of the graph (G_grid) of the docking site: [formula omitted in source];
- mapping each edge of the graph (G_mol) of the binding molecule to an edge of the graph (G_grid) of the docking site: [formula omitted in source];
- defining the isomorphism: [formula omitted in source];
- defining the optimization term: [formula omitted in source], meaning that if the binary variable x_{u,u'}, which maps node u onto node u', and the binary variable x_{v,v'}, which maps node v onto node v', are both equal to 1, a score is assigned, the score being equal to the squared modulus of the difference between the weight of the edge joining u and v and the weight of the edge joining u' and v';
- defining the complete Hamiltonian associated with the formulation of the quadratic unconstrained binary optimization (QUBO) problem as: [formula omitted in source], where A is an optimization parameter.
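By way of illustration only, and not as part of the claims, one possible QUBO construction in the spirit of claim 33 is sketched below. Because the formulas of the claim are not reproduced in this text, the penalty terms are assumptions: a one-hot penalty per molecule node, a penalty for two molecule nodes sharing a grid node, a penalty for molecule edges landing on missing grid edges, and a squared weight difference as the optimization term, with the constraints scaled by A. The names `build_qubo` and `solve_brute_force` are hypothetical, and the exhaustive solver is a classical stand-in for a quantum annealer.

```python
from itertools import combinations, product


def build_qubo(mol_nodes, mol_w, grid_nodes, grid_w, A):
    """Return (var, Q, offset) where var maps (u, u') to a variable index
    and Q maps index pairs to coefficients of x_i * x_j."""
    var = {pair: k for k, pair in enumerate(product(mol_nodes, grid_nodes))}
    Q = {}

    def add(i, j, value):
        key = (min(i, j), max(i, j))
        Q[key] = Q.get(key, 0.0) + value

    def grid_weight(g, h):
        return grid_w.get((g, h), grid_w.get((h, g)))

    # Assumed constraint: A * (1 - sum_u' x_{u,u'})^2 for each molecule node u.
    for u in mol_nodes:
        for g in grid_nodes:
            add(var[u, g], var[u, g], -A)
        for g, h in combinations(grid_nodes, 2):
            add(var[u, g], var[u, h], 2 * A)
    # Assumed constraint: two molecule nodes may not share a grid node.
    for g in grid_nodes:
        for u, v in combinations(mol_nodes, 2):
            add(var[u, g], var[v, g], A)
    # Edge terms: penalty A if the grid edge is missing, otherwise the
    # squared difference of the two edge weights (optimization term).
    for (u, v), w in mol_w.items():
        for g, h in product(grid_nodes, repeat=2):
            if g != h:
                gw = grid_weight(g, h)
                add(var[u, g], var[v, h], A if gw is None else (w - gw) ** 2)
    return var, Q, A * len(mol_nodes)


def solve_brute_force(n_vars, Q, offset):
    """Exhaustively minimize the QUBO energy (tiny instances only)."""
    best_x, best_e = None, float("inf")
    for x in product((0, 1), repeat=n_vars):
        e = offset + sum(c * x[i] * x[j] for (i, j), c in Q.items())
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e


# Hypothetical two-node molecule and three-node docking grid.
mol_nodes, grid_nodes = ["a", "b"], [0, 1, 2]
mol_w = {("a", "b"): 2.0}
grid_w = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 3.5}
var, Q, offset = build_qubo(mol_nodes, mol_w, grid_nodes, grid_w, A=10.0)
x, energy = solve_brute_force(len(var), Q, offset)
mapping = {u: g for (u, g), k in var.items() if x[k] == 1}
```

In this toy instance the minimum energy is reached when the two molecule nodes occupy grid nodes 1 and 2, whose separation equals the molecule edge weight. The value of A must be large enough that violating a constraint always costs more than any weight mismatch.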
34. The computer-implemented method of claim 33, wherein the complete Hamiltonian is minimized to determine the optimal three-dimensional pose that minimizes the error function, the error function being related to the difference between the edge weights of the graph (G_mol) representing the binding molecule and the edge weights of the subgraph of the graph (G_grid) representing the docking site.
35. A computer-based method for a drug discovery process, wherein the method is configured to identify, from a list of candidate binding molecules, one or more binding molecules representing a potential match at the docking site of a target molecule, and optionally to identify, from a list of ligands, one or more ligands representing a potential match with a protein pocket, the method including:
- performing the method according to any one of the preceding claims on each candidate binding molecule;
- identifying, as the one or more binding molecules representing a potential match at the docking site of the target molecule, those candidate binding molecules that have been determined to have an optimal positional match at the docking site of the target molecule.
36. The computer-implemented method according to any one of the preceding claims, wherein the binding molecule is a ligand and the target molecule is a protein, and wherein the computer-implemented method determines the optimal positional match of the ligand at the docking site of the target protein.
37. A drug discovery process, comprising a computer-implemented method according to any one of the preceding claims.
38. A computing system, comprising: one or more processors; and a computer-readable storage device coupled to the one or more processors and storing instructions thereon that, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 36.
39. A non-transitory computer-readable storage medium, which can be coupled to one or more processors and stores instructions thereon that, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 36.