Method and apparatus for predicting target protein-based toxicity value
The method addresses the limitations of conventional toxicity prediction by incorporating binding affinity and compound structure analysis, enabling efficient design of non-toxic compounds through advanced computational methods, thereby optimizing drug development.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: NAMUICT CO LTD
- Filing Date: 2025-01-03
- Publication Date: 2026-07-02
AI Technical Summary
Conventional toxicity prediction methods rely solely on compound structure databases, fail to account for binding affinity with proteins, and provide no guidance on modifying binding sites to create non-toxic compounds that still bind effectively to proteins.
A method and apparatus that predicts toxicity values by considering the binding affinity between a target protein and a compound, using a meta-ensemble approach with 2D and 3D descriptive methods, molecular graph methods, and molecular fingerprint methods, along with artificial intelligence learning to identify influential compound fragments and optimize their structure for non-toxicity.
Enables efficient design of non-toxic compounds by understanding protein-compound interactions, reducing the time and cost of new drug development by identifying critical compound fragments that affect toxicity.
Description
Method and apparatus for predicting target protein-based toxicity value
[0001] The present invention relates to a method and apparatus for predicting toxicity values based on a target protein. More specifically, it relates to a method and apparatus that predict toxicity values from the correlation between toxicity information and the binding affinity between a toxicity-related protein and a compound, and that provide conditions for creating a new compound by identifying the influence of the detailed fragments constituting the compound on the toxicity value.
[0002] One of the critical requirements when manufacturing novel compounds, such as new drugs, is that they must be non-toxic. Conventional toxicity prediction devices are limited to utilizing databases that record toxicity information derived from the compound's structure. Due to this approach, it is currently very difficult to efficiently design non-toxic compounds.
[0003] The problem with predicting toxicity values based solely on the structure of a compound is that the values recorded in existing databases may no longer hold once the compound binds to a protein. To address this, the binding affinity between the target protein and the compound must additionally be considered, but no such method has been proposed.
[0004] Furthermore, even when toxicity is identified using existing methods, those methods cannot indicate which part of the binding site between the compound and the protein needs to be modified to create a new compound that is non-toxic yet binds well to the protein.
[0005] The present invention aims to solve the aforementioned problems by providing a target protein-based toxicity value prediction method and apparatus that can efficiently design a novel, non-toxic compound by identifying the influence of each detailed component constituting the compound on the toxicity value and providing conditions for the creation of the novel compound.
[0006] A method for predicting a target protein-based toxicity value according to an embodiment of the present invention for achieving the above objective may include the steps of: receiving target protein structure information, compound structure information, and toxicity information; predicting binding affinity based on the received target protein structure information and compound structure information; obtaining compound characteristic information from the received compound structure information through a meta-ensemble method; learning the predicted binding affinity and the received toxicity information through one of a fully connected layer and a graph convolution layer, respectively, according to a method for configuring the meta-ensemble method, and predicting a toxicity value by concatenating the respective learning results; if the predicted toxicity value is greater than or equal to a preset value, analyzing the influence of the detailed fragments constituting the compound on the toxicity value; and determining the detailed fragments among the detailed fragments that should be maintained when creating a new compound using the analyzed influence.
[0007] The step of predicting the binding affinity may include: a step of deriving side chain information of the target protein based on the received target protein structure information; a step of obtaining a compound structure within the pocket structure of the target protein, based on the received compound structure information and the derived side chain information, through the consensus of at least three analysis methods each using rigid docking and flexible docking methods; and a step of predicting the binding affinity between the target protein and the compound based on the obtained compound structure within the pocket structure.
[0008] Additionally, the step of predicting the binding affinity between the target protein and the compound based on the compound structure within the obtained pocket structure may include the step of obtaining the optimal side chain position of the target protein based on the compound structure within the obtained pocket structure, the step of deriving the final structure of the compound to which a diffusion model is applied within the pocket structure of the target protein based on the optimal side chain position, and the step of predicting the binding affinity between the target protein and the compound using the final structure of the derived compound.
[0009] The step of obtaining the compound characteristic information may obtain first to third compound characteristic information by applying the 2D and 3D descriptive method, the molecular graph method, and the molecular fingerprint method, respectively, to the received compound structure information.
[0010] Additionally, the step of predicting the toxicity value may include: a step of extracting weights to be applied to the predicted binding affinity and the received toxicity information; a step of generating first to third integrated information by applying the extracted weights to the predicted binding affinity and the received toxicity information and then integrating them with the first to third compound characteristic information, respectively; a step of learning the generated first and third integrated information through a fully connected layer and learning the generated second integrated information through a graph convolution layer; and a step of predicting the toxicity value by concatenating the respective learning results.
[0011] Meanwhile, a target protein-based toxicity value prediction device according to one embodiment of the present invention for achieving the above objective includes a communication unit that communicates with an external device, a database that stores data, and a processor that controls the target protein-based toxicity value prediction device. The processor receives target protein structure information, compound structure information, and toxicity information through the communication unit, predicts binding affinity based on the received target protein structure information and compound structure information, obtains compound characteristic information from the received compound structure information through a meta-ensemble method, learns the predicted binding affinity and the received toxicity information through one of a fully connected layer and a graph convolution layer according to a method of configuring the meta-ensemble method, predicts a toxicity value by concatenating the respective learning results, analyzes the influence of the detailed fragments constituting the compound on the toxicity value if the predicted toxicity value is greater than or equal to a preset value, and uses the analyzed influence to determine the detailed fragments among the detailed fragments that must be maintained when creating a new compound, and can store information about the detailed fragments in the database.
[0012] The processor may derive side chain information of the target protein based on the received target protein structure information, obtain the compound structure within the pocket structure of the target protein, based on the received compound structure information and the derived side chain information, through the consensus of at least three analysis methods each using rigid docking and flexible docking methods, and predict the binding affinity between the target protein and the compound based on the obtained compound structure within the pocket structure.
[0013] In addition, the processor can obtain the optimal side chain position of the target protein based on the compound structure within the obtained pocket structure, derive the final structure of the compound to which a diffusion model is applied within the pocket structure of the target protein based on the obtained optimal side chain position, and predict the binding affinity between the target protein and the compound using the derived final structure of the compound.
[0014] The processor can obtain first to third compound characteristic information by applying the 2D and 3D descriptive method, the molecular graph method, and the molecular fingerprint method, respectively, to the received compound structure information.
[0015] Additionally, the processor may extract weights to be applied to the predicted binding affinity and the received toxicity information, apply the extracted weights to the predicted binding affinity and the received toxicity information and then integrate them with the first to third compound characteristic information to generate first to third integrated information, learn the generated first and third integrated information through a fully connected layer, learn the generated second integrated information through a graph convolution layer, and concatenate the respective learning results to predict a toxicity value.
[0016] According to various embodiments of the present invention as described above, the mechanism of action of a compound with a protein that acts as a toxin can be understood, and based on this, the influence of the detailed components constituting the compound on the toxicity value can be identified to provide conditions for the generation of a novel compound. Through this, non-toxic compounds can be designed very efficiently, and the time and cost involved in new drug development can be saved.
[0017] FIG. 1 is a schematic block diagram illustrating the configuration of a target protein-based toxicity value prediction device according to one embodiment of the present invention.
[0018] FIG. 2 is a reference diagram for explaining the functions of the target protein-based toxicity value prediction device.
[0019] FIG. 3 is a conceptual diagram for explaining the operation of the compound characteristic extraction unit.
[0020] FIG. 4 is a conceptual diagram for explaining the operation of the toxicity value prediction unit.
[0021] FIG. 5 is a reference diagram for explaining the detailed fragments in the structure of a compound.
[0022] FIG. 6 is a flowchart illustrating a target protein-based toxicity value prediction method according to one embodiment of the present invention.
[0023] Various embodiments of this document are described below with reference to the accompanying drawings. However, this is not intended to limit the technology described in this document to specific embodiments and should be understood to include various modifications, equivalents, and / or alternatives to the embodiments of this document. Similar reference numerals may be used for similar components in connection with the description of the drawings.
[0024] In this document, expressions such as 'have,' 'can have,' 'include,' or 'can include' refer to the existence of the relevant feature (e.g., components such as numerical values, functions, actions, or parts) and do not exclude the existence of additional features.
[0025] In this document, expressions such as 'A or B', 'at least one of A and / or B', or 'one or more of A and / or B' may include all possible combinations of the items listed together. For example, 'A or B', 'at least one of A and B', or 'at least one of A or B' may refer to cases including (1) at least one A, (2) at least one B, or (3) both at least one A and at least one B. Expressions such as 'first' or 'second' used in this document may modify various components regardless of order and / or importance, and are used only to distinguish one component from another and do not limit said components.
[0026] As used in this document, the expression 'configured to' may be replaced, depending on the context, with, for example, 'suitable for,' 'having the capacity to,' 'designed to,' 'adapted to,' 'made to,' or 'capable of.' The term 'configured to' does not necessarily mean 'specifically designed to' in hardware. Instead, in some situations, the expression 'device configured to' may mean that the device is 'capable of' doing something together with other devices or components.
[0027] The terms used in this specification are for the purpose of describing embodiments and are not intended to limit and / or restrict the invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprising" or "having" are intended to indicate the existence of the features, numbers, actions, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the existence or addition of one or more other features, numbers, actions, components, parts, or combinations thereof.
[0028] In the embodiments, a 'module' or 'part' performs at least one function or operation and may be implemented in hardware, in software, or in a combination of hardware and software. Additionally, a plurality of 'modules' or a plurality of 'parts' may be integrated into at least one module and implemented by at least one processor, except for a 'module' or 'part' that needs to be implemented in specific hardware.
[0029] The present invention will be described in detail below using the attached drawings.
[0030] FIG. 1 is a schematic block diagram for explaining the configuration of a target protein-based toxicity value prediction device (100) according to an embodiment of the present invention. Referring to FIG. 1, the toxicity value prediction device (100) may include a communication unit (110), a database (120), and a processor (130). It goes without saying that other components, such as an input unit (not shown), may also be included.
[0031] The toxicity value prediction device (100) can be implemented in various forms. For example, the toxicity value prediction device (100) may be implemented as a separate electronic device to provide the function of predicting toxicity values through communication with an external device or server and determining a compound fragment that does not exhibit toxicity when a new compound is created. As another example, the toxicity value prediction device (100) may be implemented in the form of a cloud server to provide toxicity value prediction and compound fragment determination functions in a SaaS format.
[0032] The communication unit (110) communicates with external devices or servers, etc. For example, the communication unit (110) can receive data such as protein structure information, compound structure information, and toxicity information from an external server or database. To this end, the communication unit (110) can use wired communication methods such as LAN and HDMI, and wireless communication methods such as wireless LAN, NFC, IR communication, Zigbee communication, and Bluetooth.
[0033] The database (120) may store various modules, software, functions, trained artificial intelligence models, agents, data, etc. for driving the toxicity value prediction device (100). For example, the database (120) may store at least one trained artificial intelligence model for predicting toxicity values, and may store information on the compound fragments determined to be retained when creating new compounds.
[0034] The processor (130) can control the remaining components of the toxicity value prediction device (100). For example, the processor (130) can control the communication unit (110) to receive protein structure information, compound structure information, toxicity information, etc. from an external device. The processor (130) may be implemented as a single CPU that performs functions such as predicting binding affinity, obtaining compound characteristic information, predicting toxicity values, analyzing influence on toxicity values, and generating new compounds, or it may be implemented as multiple processors and IPs or accelerators that perform specific functions. The specific operation of the processor (130) will be described below.
[0035] The processor (130) may receive relevant data from an external server or database, etc., to predict toxicity values and provide conditions for the creation of new compounds. Specifically, the processor (130) may control the communication unit (110) to receive target protein structure information, compound structure information, and toxicity information. For example, toxicity information may be received from the database of the EPA (United States Environmental Protection Agency) or PubChem. The processor (130) may integrate the received data and store it in the database (120).
[0036] The operation of the processor (130) is explained below by dividing it into units according to function. Referring to FIG. 2, the processor (130) may include a binding affinity prediction unit (131), a compound characteristic extraction unit (132), a toxicity value prediction unit (133), a toxicity characteristic evaluation unit (134), and a new compound generation unit (135).
[0037] The processor (130) can predict the binding affinity between the target protein and the compound. Specifically, the binding affinity prediction unit (131) can predict the binding affinity between them based on the target protein structure information and the compound structure information.
[0038] The binding affinity prediction unit (131) can derive side chain information of the target protein based on the target protein structure information. It can then obtain the compound structure within the pocket structure of the target protein from the side chain information of the target protein and the compound structure information. When obtaining the compound structure within the pocket structure, the binding affinity prediction unit (131) obtains it through the consensus of at least three analysis methods. For example, it can use three or more analysis methods that differ in scoring function, in accuracy and performance depending on the situation, and in scope of application, such as AD4 (AutoDock4), GNINA, and SMINA. AD4 uses a semi-empirical scoring function, GNINA uses a convolutional neural network (CNN) ensemble as the scoring function, and SMINA is based on the empirical scoring function of Vina. By using these in combination, the binding affinity prediction unit (131) can predict the binding affinity more accurately.
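As a non-limiting illustration, the consensus over three analysis methods might be sketched as follows. The specification does not state how the consensus is computed, so the average-rank rule used here is an assumption, and the function name `consensus_pose`, the pose identifiers, and the example rankings are all hypothetical. Each method is assumed to supply its candidate poses already ordered from best to worst, which sidesteps the fact that the three scoring functions are on different scales.

```python
from statistics import mean


def consensus_pose(rankings: dict[str, list[str]]) -> str:
    """Return the pose with the lowest average rank across all methods.

    `rankings` maps a method name to its pose IDs ordered best to worst.
    The average-rank rule is an assumed consensus criterion.
    """
    if len(rankings) < 3:
        raise ValueError("consensus requires at least three analysis methods")
    # Only poses that every method ranked are eligible for consensus.
    common = set.intersection(*(set(r) for r in rankings.values()))
    if not common:
        raise ValueError("no pose was ranked by every method")
    avg_rank = {
        pose: mean(r.index(pose) for r in rankings.values()) for pose in common
    }
    # Iterate in sorted order so that ties resolve deterministically.
    return min(sorted(avg_rank), key=avg_rank.get)


# Hypothetical rankings; real ones would come from the docking programs.
rankings = {
    "AD4": ["pose_b", "pose_a", "pose_c"],
    "GNINA": ["pose_a", "pose_b", "pose_c"],
    "SMINA": ["pose_a", "pose_c", "pose_b"],
}
best = consensus_pose(rankings)
```

In this example two of the three methods rank `pose_a` first, so it has the lowest average rank and is selected.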
[0039] Additionally, the binding affinity prediction unit (131) can obtain the compound structure within the pocket structure through the consensus of the results obtained when each of the three or more analysis methods is run with rigid docking and with flexible docking. In rigid docking, the structures of the ligand and the receptor are fixed, so mainly only the rotation and translation of the ligand are considered. Rigid docking can therefore calculate the binding structure quickly and simply, but its accuracy may decrease when structural changes are large. In contrast, flexible docking allows for structural changes of the ligand and can also consider the flexibility of the side chains of the receptor's binding site. This permits more realistic modeling, but at a higher computational cost and complexity. It is therefore important to use an appropriate method depending on the characteristics of the target compound. In the present invention, the binding affinity prediction unit (131) can predict the binding affinity by combining rigid docking and flexible docking. Each of the three or more analysis methods can apply only one of rigid docking or flexible docking at a time. To overcome this, the binding affinity prediction unit (131) can consider the results of the analysis methods run with rigid docking together with the results of the analysis methods run with flexible docking.
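One way to consider rigid and flexible results together, offered only as a sketch, is to check for each analysis method whether its rigid and flexible poses agree. The specification does not define such a check; the RMSD criterion, the 2.0 angstrom cutoff, the function names, and the two-atom coordinates below are assumptions for illustration. The RMSD is computed without superposition, which presumes that both poses share the receptor's coordinate frame and the same atom ordering.

```python
import math

Coords = list[tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """Root-mean-square deviation between two poses with matching atom order."""
    if len(a) != len(b) or not a:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def agreeing_methods(
    rigid: dict[str, Coords], flexible: dict[str, Coords], cutoff: float = 2.0
) -> list[str]:
    """Methods whose rigid and flexible poses lie within `cutoff` of each other."""
    return sorted(
        m for m in rigid if m in flexible and rmsd(rigid[m], flexible[m]) <= cutoff
    )


# Hypothetical two-atom ligand poses, one per method and docking mode.
base: Coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
rigid = {"AD4": base, "GNINA": base, "SMINA": base}
flexible = {
    "AD4": [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],    # shifted by 1 along z
    "GNINA": base,                                # identical pose
    "SMINA": [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)],  # shifted by 3 along z
}
stable = agreeing_methods(rigid, flexible)
```

Here the AD4 and GNINA poses agree between the two docking modes, while the SMINA poses differ by more than the cutoff.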
[0040] When the structure of a compound within the pocket structure of a target protein is obtained, the binding affinity prediction unit (131) can predict the binding affinity between the target protein and the compound. Specifically, the binding affinity prediction unit (131) can obtain the optimal position of the side chains of the target protein based on the obtained compound structure within the pocket structure. The optimal side chain position can be obtained using an energy minimization technique. Subsequently, the binding affinity prediction unit (131) can derive the final structure of the compound within the pocket structure of the target protein based on the optimal side chain position. To derive the final structure of the compound, the binding affinity prediction unit (131) can use a trained artificial intelligence model that applies a diffusion model within the protein pocket structure. Using the final structure of the compound derived in this way, the binding affinity prediction unit (131) can predict the binding affinity between the target protein and the compound. The processor (130) can control the database (120) to store the predicted binding affinity information.
[0041] The processor (130) can obtain characteristic information of the compound itself. Specifically, the compound characteristic extraction unit (132) can obtain compound characteristic information from the compound structure information through a meta-ensemble method.
[0042] A meta-ensemble method refers to the simultaneous use of 2D and 3D descriptive methods, molecular graph methods, and molecular fingerprint methods. As shown in the example of FIG. 3, the compound characteristic extraction unit (132) can obtain first to third compound characteristic information by applying the 2D and 3D descriptive methods, molecular graph methods, and molecular fingerprint methods, respectively, to the received compound structure information.
[0043] The 2D and 3D descriptive information acquisition unit (1321) can extract first compound characteristic information, including chemical information, geometric information, etc., from the received compound structure information. For example, the 2D and 3D descriptive information acquisition unit (1321) can extract information such as the number of atoms, number of bonds, molecular weight, autocorrelation descriptor, substituent constant, surface-to-volume ratio, etc.
[0044] The molecular graph characteristic information acquisition unit (1322) can extract second compound characteristic information including structural characteristics, chemical characteristics, physicochemical characteristics, biological activity, reactivity, etc. from the received compound information. For example, structural characteristics may include the size, shape, and symmetry of the molecule, chemical characteristics may include the bonding pattern between atoms and the presence or absence of functional groups, physicochemical characteristics may include solubility, polarity, molecular weight, etc., and biological activity may include the efficacy and toxicity of the drug.
[0045] The molecular fingerprint information acquisition unit (1323) can extract third compound characteristic information, including specific structural features, the presence of substructures or atomic patterns, etc., from the received compound structure information. For example, by converting the structural features of a molecule into a fingerprint, the molecular fingerprint information acquisition unit (1323) makes it possible to identify compounds likely to have a specific activity in a large-scale compound library. The extracted third compound characteristic information can thus be used to predict and optimize the characteristics of new drug candidate substances.
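By way of example only, the three kinds of compound characteristic information can be sketched for a toy molecule described by a list of heavy atoms and a list of bonds. This input representation, the function names, the 16-bit fingerprint length, and the choice of bonded atom pairs as fingerprint patterns are assumptions made for illustration; an actual embodiment would use far richer descriptors and established cheminformatics tooling.

```python
import zlib

# Standard atomic masses for the few elements used in this sketch.
ATOMIC_MASS = {"C": 12.011, "N": 14.007, "O": 15.999}

Bond = tuple[int, int]


def descriptors(atoms: list[str], bonds: list[Bond]) -> list[float]:
    """First view: simple descriptors (atom count, bond count, heavy-atom mass)."""
    return [float(len(atoms)), float(len(bonds)), sum(ATOMIC_MASS[a] for a in atoms)]


def adjacency(atoms: list[str], bonds: list[Bond]) -> list[list[int]]:
    """Second view: the molecular graph as a symmetric adjacency matrix."""
    n = len(atoms)
    adj = [[0] * n for _ in range(n)]
    for i, j in bonds:
        adj[i][j] = adj[j][i] = 1
    return adj


def fingerprint(atoms: list[str], bonds: list[Bond], n_bits: int = 16) -> list[int]:
    """Third view: a bit vector marking which bonded atom pairs are present."""
    bits = [0] * n_bits
    for i, j in bonds:
        pattern = "-".join(sorted((atoms[i], atoms[j])))
        # crc32 is used because it is stable across runs, unlike built-in hash().
        bits[zlib.crc32(pattern.encode()) % n_bits] = 1
    return bits


# Heavy atoms of ethanol (C-C-O); hydrogens are omitted in this toy input.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
first_info = descriptors(atoms, bonds)
second_info = adjacency(atoms, bonds)
third_info = fingerprint(atoms, bonds)
```

The three results have different shapes (a flat vector, a matrix, and a bit vector), which is why the description below routes them to different types of layers.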
[0046] The processor (130) can perform artificial intelligence learning for predicting toxicity values using the predicted binding affinity, received toxicity information, and first to third compound characteristic information obtained by the meta-ensemble method. The toxicity value prediction unit (133) can vary the learning method depending on which of the methods constituting the meta-ensemble method is used to obtain the information. Then, the processor (130) can predict toxicity values through the artificial intelligence model that has completed the learning. As described below, the processor (130) can comprehensively predict toxicity values by concatenating the outputs from each of the three artificial intelligence models that have completed the learning.
[0047] Referring to FIG. 4, the toxicity value prediction unit (133) can learn the first compound characteristic information obtained from the 2D and 3D descriptive information acquisition unit (1321) among the meta-ensemble methods, along with the predicted binding affinity and the received toxicity information, through the first fully connected layer module (1331). At this time, the first compound characteristic information, the predicted binding affinity, and the received toxicity information can be integrated in advance as input data to generate the first integrated information. When generating the first integrated information, the toxicity value prediction unit (133) can extract in advance the weights to be applied to the binding affinity and the toxicity information. Unlike conventional methods that predict toxicity from the structure of a compound alone, the present invention uses the binding affinity together with the structure of the compound. Various weightings can be applied: the structure information and the binding affinity information can be weighted equally, or the weight of the binding affinity information can be lowered or raised. For example, the toxicity value prediction unit (133) can decide how to set the weights based on the first compound characteristic information.
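The generation of integrated information might look like the following minimal sketch. The specification does not give the integration formula, so appending the weighted values to the characteristic vector is an assumption, as are the function name `integrate`, the weight values, and the example numbers.

```python
def integrate(
    characteristics: list[float],
    binding_affinity: float,
    toxicity: float,
    w_affinity: float = 1.0,
    w_toxicity: float = 1.0,
) -> list[float]:
    """Append the weighted binding affinity and toxicity information
    to the compound characteristic vector (assumed integration rule)."""
    return list(characteristics) + [
        w_affinity * binding_affinity,
        w_toxicity * toxicity,
    ]


# Hypothetical values: two descriptors, an affinity of -7.5, a toxicity label of 1.
characteristics = [3.0, 2.0]
equal = integrate(characteristics, -7.5, 1.0)                     # equal weighting
lowered = integrate(characteristics, -7.5, 1.0, w_affinity=0.5)   # affinity down-weighted
raised = integrate(characteristics, -7.5, 1.0, w_affinity=2.0)    # affinity up-weighted
```

The three calls correspond to the three weighting options named in the paragraph above: equal weighting, a lowered binding affinity weight, and a raised binding affinity weight.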
[0048] Likewise, the toxicity value prediction unit (133) can learn the second compound characteristic information obtained from the molecular graph characteristic information acquisition unit (1322) among the meta-ensemble methods, along with the predicted binding affinity and the received toxicity information, through the graph convolution network layer module (1332). At this time, the second compound characteristic information, the predicted binding affinity, and the received toxicity information can be integrated in advance as input data to generate the second integrated information. When generating the second integrated information, the weights are extracted in the same way as described for the first integrated information.
[0049] Additionally, the toxicity value prediction unit (133) can learn the third compound characteristic information obtained from the molecular fingerprint information acquisition unit (1323) among the meta-ensemble methods, along with the predicted binding affinity and the received toxicity information, through the second fully connected layer module (1333). At this time, the third compound characteristic information, the predicted binding affinity, and the received toxicity information can be integrated in advance as input data to generate the third integrated information. When generating the third integrated information, the weights are extracted in the same way as described for the first integrated information.
[0050] In this way, the toxicity value prediction unit (133) uses an appropriate artificial intelligence learning method according to the input compound characteristic information and can predict the toxicity value by concatenating the output results of each completed artificial intelligence learning model. A fully connected layer (FC Layer) has a dense connection structure in which all input nodes are connected to all output nodes. A fully connected layer is applicable to all types of data and does not consider the structural characteristics of the input data. A graph convolution network layer (GCN Layer) has a sparse connection structure in which each node is connected only to neighboring nodes. A graph convolution network layer uses only a fixed number of parameters regardless of the graph size and processes data by directly utilizing the graph structure. In this respect, a graph convolution network layer is more suitable for processing the second compound characteristic information obtained by the molecular graph method.
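The contrast between the two layer types can be sketched in plain Python. The graph convolution below uses the widely used normalized-adjacency formulation, ReLU(D^(-1/2) (A + I) D^(-1/2) H W); the specification does not say which formulation the invention uses, so this choice, the ReLU activation, and all weights and inputs are assumptions for illustration.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fc_layer(x: list[float], weights: Matrix, bias: list[float]) -> list[float]:
    """Dense layer: every input node contributes to every output node.
    `weights` has shape (inputs x outputs)."""
    return [
        max(0.0, sum(xi * w for xi, w in zip(x, col)) + b)
        for col, b in zip(zip(*weights), bias)
    ]


def gcn_layer(adj: Matrix, h: Matrix, weights: Matrix) -> Matrix:
    """Graph convolution: each node aggregates only itself and its neighbours.
    `weights` has shape (features in x features out), independent of node count."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    return [[max(0.0, v) for v in row] for row in matmul(matmul(norm, h), weights)]


# Dense layer on a two-element input with hypothetical weights.
fc_out = fc_layer([1.0, 2.0], [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])

# Three-node path graph 0-1-2; only node 0 carries a non-zero feature.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
gcn_out = gcn_layer(adj, [[1.0], [0.0], [0.0]], [[1.0]])
```

After one graph convolution, node 2 still has a zero value because it is not a neighbour of node 0, which shows the sparse connectivity. The same one-by-one weight matrix would also work unchanged on a larger graph, which is the fixed-parameter property described above.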
[0051] The toxicity value prediction unit (133) can predict toxicity values by concatenating the outputs of trained AI models that were trained by different AI learning methods. Through concatenation, the performance of the toxicity value prediction model can be improved by combining features obtained from various learning methods. The toxicity value prediction unit (133) can extract feature vectors from the results output by the trained AI models of the first fully connected layer module (1331), the graph convolution network layer module (1332), and the second fully connected layer module (1333), respectively. Then, the toxicity value prediction unit (133) can combine the extracted feature vectors according to a preset dimension. Subsequently, the toxicity value prediction unit (133) can perform a final toxicity value prediction using the combined feature vectors.
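A minimal sketch of the concatenation step follows. The mean pooling of per-node graph outputs, the single linear head with a sigmoid output, and every numeric value are assumptions; the specification only states that the feature vectors are combined according to a preset dimension and then used for the final prediction.

```python
import math


def mean_pool(node_vectors: list[list[float]]) -> list[float]:
    """Average per-node vectors into one fixed-size vector (assumed pooling)."""
    return [sum(col) / len(col) for col in zip(*node_vectors)]


def concatenate(*vectors: list[float]) -> list[float]:
    """Join feature vectors end to end."""
    return [v for vec in vectors for v in vec]


def predict_toxicity(
    combined: list[float], head_weights: list[float], head_bias: float
) -> float:
    """Linear head plus sigmoid, giving a toxicity value between 0 and 1."""
    if len(combined) != len(head_weights):
        raise ValueError("combined vector does not match the preset dimension")
    logit = sum(c * w for c, w in zip(combined, head_weights)) + head_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical outputs of the three trained branches.
fc1_features = [2.0, 0.0]                # first fully connected layer module
gcn_features = [[0.5], [0.4], [0.0]]     # graph convolution module, one row per node
fc2_features = [1.0]                     # second fully connected layer module

combined = concatenate(fc1_features, mean_pool(gcn_features), fc2_features)
toxicity_value = predict_toxicity(combined, [0.4, 0.1, -0.3, 0.2], -0.5)
```

The length check in `predict_toxicity` plays the role of the preset dimension: the head only accepts a combined vector of the size it was trained for.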
[0052] If the predicted toxicity value is less than the preset value, the processor (130) can control the database (120) to store the compound as a candidate for new compound generation. The processor (130) can generate a new compound using the candidate information stored in the database (120); this corresponds to the operation of the new compound generation unit (135).
[0053] Conversely, if the predicted toxicity value is greater than or equal to the preset value, the processor (130) can analyze the influence of the detailed fragments constituting the compound on the toxicity value. Specifically, the toxicity characteristic evaluation unit (134) analyzes the influence of each detailed fragment constituting the compound on the toxicity value and, based on the analysis results, can determine which fragments may be retained when creating a new compound and which fragments increase the toxicity value and must be excluded. That is, the toxicity characteristic evaluation unit (134) can determine which of the detailed fragments constituting the candidate compound should be retained when creating a new compound. FIG. 5 is a reference diagram for explaining the detailed fragments in the structure of a compound. In the structure of a compound, the detailed fragments and their structures, particularly those shown in FIG. 5, affect toxicity. Since it is possible to determine which detailed fragments must be used to create a new compound without toxicity, according to one embodiment of the present invention, a new compound (e.g., a new drug) can be created more efficiently.
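The specification does not state how the influence of each fragment is computed. One possible approach, shown here purely as an assumption, is ablation: remove one fragment at a time, predict again, and take the drop in the predicted toxicity value as that fragment's influence. The fragment names, the additive toy predictor, and the threshold of 0.5 are hypothetical, and fragment names are assumed to be unique.

```python
from typing import Callable, Sequence

Predictor = Callable[[Sequence[str]], float]


def fragment_influence(fragments: Sequence[str], predict: Predictor) -> dict[str, float]:
    """Influence of a fragment = prediction with it minus prediction without it."""
    baseline = predict(fragments)
    influence = {}
    for i, frag in enumerate(fragments):
        without = [f for j, f in enumerate(fragments) if j != i]
        influence[frag] = baseline - predict(without)
    return influence


def fragments_to_keep(
    fragments: Sequence[str], predict: Predictor, threshold: float
) -> list[str]:
    """Below the threshold every fragment is kept; otherwise only fragments
    that do not raise the predicted toxicity value are kept."""
    if predict(fragments) < threshold:
        return list(fragments)
    influence = fragment_influence(fragments, predict)
    return [f for f in fragments if influence[f] <= 0.0]


# Toy stand-in for the trained model: made-up additive contributions.
TOY_CONTRIBUTION = {"frag_A": 0.75, "frag_B": 0.0, "frag_C": -0.25}


def toy_predict(fragments: Sequence[str]) -> float:
    return sum(TOY_CONTRIBUTION[f] for f in fragments)


compound = ["frag_A", "frag_B", "frag_C"]
influence = fragment_influence(compound, toy_predict)
kept = fragments_to_keep(compound, toy_predict, threshold=0.5)
```

In this toy case the predicted toxicity value equals the threshold, so the analysis runs; `frag_A` is the only fragment with a positive influence and is the one left out of the retained set.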
[0054] The new compound generation unit (135) can generate a new compound based on information on the detailed pieces to be retained. Because it is built from those pieces, the candidate compound newly created by the new compound generation unit (135) may be a compound that is freer from toxicity issues.
[0055] According to various embodiments of the present invention as described above, the accuracy of the prediction can be improved by additionally utilizing the binding affinity between the target protein and the compound, rather than simply predicting toxicity values based solely on information obtained from the structure of the compound. Furthermore, compound characteristic information having multiple different features can be obtained using a meta-ensemble method, and effective analysis is possible by using an artificial intelligence learning method suited to these features. In addition, by concatenating various artificial intelligence learning results, toxicity value prediction results reflecting various features can be obtained, thereby allowing for the determination of which specific fragment of the compound affects toxicity. If a new candidate compound is generated by retaining only the specific fragments that do not affect toxicity, it is possible to manufacture a compound that is freer from toxicity.
[0056] FIG. 6 is a flowchart illustrating a method for predicting a target protein-based toxicity value according to an embodiment of the present invention. Referring to FIG. 6, a toxicity value prediction device (100) can receive target protein structure information, compound structure information, and toxicity information (S610). Subsequently, the toxicity value prediction device (100) can predict binding affinity based on the received target protein structure information and compound structure information (S620). Specifically, the toxicity value prediction device (100) can derive side chain information of the target protein based on the received target protein structure information. Then, using the received compound structure information and the derived side chain information, the structure of the compound within the pocket structure of the target protein can be obtained through the consensus of at least three analysis methods that use rigid docking and flexible docking methods.
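The description does not define how the consensus among the docking methods is reached. A minimal sketch of one possible scheme follows, assuming that each method returns one pose as an array of atom coordinates in the same receptor frame, and that a pose is accepted when at least three methods (itself included) place the compound within an RMSD cutoff of it. The function names, the method labels, and the cutoff of 2.0 are all assumptions made for illustration.

```python
import numpy as np


def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses in the same receptor frame."""
    return float(np.sqrt(np.mean(np.sum((pose_a - pose_b) ** 2, axis=1))))


def consensus_pose(poses, cutoff=2.0, min_agree=3):
    """Return the name of the pose supported by at least `min_agree` methods.

    A method supports a pose when its own pose lies within `cutoff` of it.
    Among supported poses, the one closest on average to its supporters wins.
    Returns None when no pose reaches the required agreement.
    """
    best_name, best_score = None, float("inf")
    for name, pose in poses.items():
        dists = [rmsd(pose, other) for key, other in poses.items() if key != name]
        close = [d for d in dists if d <= cutoff]
        if 1 + len(close) >= min_agree:  # the pose counts as its own supporter
            score = float(np.mean(close)) if close else 0.0
            if score < best_score:
                best_name, best_score = name, score
    return best_name


rng = np.random.default_rng(1)
reference = rng.normal(size=(5, 3))  # toy coordinates for a 5-atom compound

# Hypothetical outputs of four docking runs (labels are illustrative).
poses = {
    "rigid_dock_1": reference,
    "flexible_dock_1": reference + 0.3,
    "flexible_dock_2": reference - 0.3,
    "rigid_dock_2": reference + 5.0,  # outlier that no other method supports
}
selected = consensus_pose(poses)
no_consensus = consensus_pose(
    {"a": reference, "b": reference + 5.0, "c": reference - 5.0}
)
print(selected, no_consensus)
```

Requiring agreement between independent docking runs is a way to discard poses that only one scoring function favours; returning `None` makes the no-agreement case explicit instead of silently picking a pose.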
[0057] Furthermore, the toxicity value prediction device (100) can predict the binding affinity between the target protein and the compound based on the structure of the compound within the acquired pocket structure. Specifically, the toxicity value prediction device (100) can obtain the optimal position of the side chain of the target protein based on the structure of the compound within the acquired pocket structure and, based on that optimal position, derive the final structure of the compound within the pocket structure of the target protein by applying a diffusion model. Using the derived final structure of the compound, the toxicity value prediction device (100) can predict the binding affinity between the target protein and the compound.
[0058] The toxicity value prediction device (100) can obtain compound characteristic information from the received compound structure information through a meta-ensemble method (S630). Step S630 may be performed after Step S620, or it may be performed in parallel with Step S620. The toxicity value prediction device (100) can obtain first to third compound characteristic information through a meta-ensemble method that applies the 2D and 3D descriptive method, the molecular graph method, and the molecular fingerprint method, respectively, to the received compound structure information.
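To make the three kinds of compound characteristic information concrete, the sketch below derives each from a toy molecule given as a list of heavy atoms and bonds. It is a simplified illustration under stated assumptions, not the method of the invention: the whole-molecule counts stand in for 2D and 3D descriptors, the node and adjacency lists stand in for the molecular graph, and the hashed atom environments stand in for a molecular fingerprint. All function names and the 64-bit fingerprint length are hypothetical; a practical implementation would more likely rely on a cheminformatics toolkit.

```python
import hashlib

ATOMIC_MASS = {"C": 12.011, "N": 14.007, "O": 15.999}


def descriptor_features(atoms, bonds):
    """First information: simple whole-molecule descriptors (stand-in)."""
    heavy_mass = sum(ATOMIC_MASS[a] for a in atoms)
    hetero = sum(a != "C" for a in atoms)
    return [float(len(atoms)), float(len(bonds)), heavy_mass, float(hetero)]


def graph_features(atoms, bonds):
    """Second information: one-hot node features and an adjacency matrix."""
    n = len(atoms)
    adj = [[0] * n for _ in range(n)]
    for i, j in bonds:
        adj[i][j] = adj[j][i] = 1
    elements = sorted(ATOMIC_MASS)
    nodes = [[1 if a == e else 0 for e in elements] for a in atoms]
    return nodes, adj


def fingerprint_features(atoms, bonds, n_bits=64):
    """Third information: hashed atom-environment bit vector (stand-in)."""
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(atoms[j])
        neighbours[j].append(atoms[i])
    bits = [0] * n_bits
    for i, atom in enumerate(atoms):
        env = atom + "(" + "".join(sorted(neighbours[i])) + ")"
        digest = hashlib.sha256(env.encode()).digest()
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits


# Ethanol heavy atoms: C-C-O (hydrogens omitted for brevity).
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

first_info = descriptor_features(atoms, bonds)
second_info = graph_features(atoms, bonds)
third_info = fingerprint_features(atoms, bonds)
print(first_info, second_info[1], sum(third_info))
```

The three outputs have different shapes (a flat vector, a graph, and a bit vector), which is why the description routes them to different layer types in the subsequent step.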
[0059] Next, the toxicity value prediction device (100) inputs the binding affinity, the toxicity information, and the first to third compound characteristic information obtained through the meta-ensemble method into an artificial intelligence model, and can predict the toxicity value by concatenating the respective output values (S640). That is, the toxicity value prediction device (100) learns the predicted binding affinity and the received toxicity information through either a fully connected layer or a graph convolution layer, depending on which of the methods constituting the meta-ensemble method is involved, and can predict the toxicity value by concatenating the respective learning results.
[0060] Specifically, the toxicity value prediction device (100) can extract weights to be applied to the predicted binding affinity and the received toxicity information, and can then apply the extracted weights to the predicted binding affinity and the received toxicity information. Subsequently, the toxicity value prediction device (100) can generate first to third integrated information by integrating the weighted binding affinity and toxicity information with the first to third compound characteristic information, respectively. Here, the first and third integrated information can be learned through a fully connected layer, and the second integrated information can be learned through a graph convolution network layer. The toxicity value prediction device (100) can predict the toxicity value by concatenating the respective learning results.
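The weighting and integration steps can be sketched as follows. This is a minimal illustration under several assumptions not found in the description: the binding affinity and toxicity information are treated as scalars, the extracted weights are given as fixed numbers, integration is done by appending the weighted values to each feature set, and the graph convolution uses the common symmetric normalization with self-loops. All names, sizes, and values are hypothetical, and the layer parameters are random rather than trained.

```python
import numpy as np


def weighted_context(affinity, toxicity_info, w_affinity, w_toxicity):
    """Apply the extracted weights to binding affinity and toxicity information."""
    return np.array([w_affinity * affinity, w_toxicity * toxicity_info])


def fc_branch(features, context, w, b):
    """Integrate a flat feature vector with the context, then one dense layer."""
    x = np.concatenate([features, context])
    return np.maximum(x @ w + b, 0.0)


def gcn_branch(node_feats, adj, context, w):
    """Integrate node features with the context, then one graph convolution."""
    n = adj.shape[0]
    x = np.hstack([node_feats, np.tile(context, (n, 1))])
    a_hat = adj + np.eye(n)  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    hidden = np.maximum(a_norm @ x @ w, 0.0)
    return hidden.mean(axis=0)  # mean-pool nodes into one vector


rng = np.random.default_rng(0)

# Illustrative scalar inputs and weights (assumed values).
context = weighted_context(affinity=-7.5, toxicity_info=1.0,
                           w_affinity=0.8, w_toxicity=0.2)

descriptors = rng.normal(size=4)                        # first characteristic info
node_feats = np.eye(3)                                  # second: 3-atom toy graph
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
fingerprint = rng.integers(0, 2, size=64).astype(float)  # third characteristic info

out1 = fc_branch(descriptors, context, rng.normal(scale=0.1, size=(6, 8)), np.zeros(8))
out2 = gcn_branch(node_feats, adj, context, rng.normal(scale=0.1, size=(5, 8)))
out3 = fc_branch(fingerprint, context, rng.normal(scale=0.1, size=(66, 8)), np.zeros(8))

fused = np.concatenate([out1, out2, out3])  # input to the final prediction
print(fused.shape)
```

Appending the same weighted context to every branch lets each representation be learned jointly with the binding affinity and toxicity information, which matches the "first to third integrated information" structure in the text.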
[0061] If the predicted toxicity value is greater than or equal to a preset value (S650-Y), that is, if the compound poses a toxicity problem, the toxicity value prediction device (100) can analyze the influence of each of the detailed pieces constituting the compound on the toxicity value (S660). Then, the toxicity value prediction device (100) can use the analyzed influence to determine which of the detailed pieces should be retained when creating a new compound (S670). Using the detailed pieces determined in this way reduces toxicity concerns when new compounds are designed and created in the future, thus enabling the establishment of an efficient new compound design process.
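The branch at S650 and the two outcomes can be summarized in a short control-flow sketch. It assumes a simple in-memory stand-in for the database (120), an injected analysis function in place of the toxicity characteristic evaluation, and a retention rule that keeps pieces whose influence is not positive. The names `Database`, `route_compound`, and `stub_analyze` and all values are hypothetical. A value equal to the threshold is sent to the analysis branch, following the "greater than or equal to" wording of this paragraph.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Database:
    """In-memory stand-in for the database (120)."""
    candidates: List[str] = field(default_factory=list)
    retained_pieces: Dict[str, List[str]] = field(default_factory=dict)


def route_compound(
    compound_id: str,
    predicted_toxicity: float,
    threshold: float,
    pieces: List[str],
    analyze: Callable[[List[str]], Dict[str, float]],
    db: Database,
) -> str:
    """Route one compound according to the S650 decision."""
    if predicted_toxicity >= threshold:                     # S650-Y
        influence = analyze(pieces)                         # S660
        keep = [p for p in pieces if influence[p] <= 0.0]   # S670 (assumed rule)
        db.retained_pieces[compound_id] = keep
        return "analyzed"
    db.candidates.append(compound_id)  # below threshold: store as a candidate
    return "candidate"


def stub_analyze(pieces: List[str]) -> Dict[str, float]:
    """Placeholder influence scores; a real analysis would query the model."""
    scores = {"piece_a": 0.4, "piece_b": -0.1}
    return {p: scores[p] for p in pieces}


db = Database()
first = route_compound("cmpd_1", 0.9, 0.5, ["piece_a", "piece_b"], stub_analyze, db)
second = route_compound("cmpd_2", 0.2, 0.5, ["piece_a", "piece_b"], stub_analyze, db)
print(first, second, db.candidates, db.retained_pieces)
```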
[0062] The methods described above may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, etc., either individually or in combination. The program instructions recorded on the medium may be those specifically designed and configured for the present invention, or they may be those known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that generated by a compiler, as well as high-level language code that can be executed by a computer using an interpreter, etc. The hardware devices may be configured to operate as one or more software modules to perform the operation of the present invention, and vice versa.
[0063] As described above, although the present disclosure has been explained by limited embodiments and drawings, the present disclosure is not limited to the above embodiments, and various modifications and variations are possible from this description by those skilled in the art to which the present disclosure belongs. Therefore, the scope of the present disclosure should not be limited to the described embodiments, but should be defined by the claims set forth below as well as equivalents thereof.
Claims
1. A method for predicting a toxicity value based on a target protein, the method comprising: a step of receiving target protein structure information, compound structure information, and toxicity information; a step of predicting binding affinity based on the received target protein structure information and compound structure information; a step of obtaining compound characteristic information from the received compound structure information through a meta-ensemble method; a step of learning the predicted binding affinity and the received toxicity information through one of a fully connected layer and a graph convolution layer, respectively, according to the method constituting the meta-ensemble method, and predicting a toxicity value by concatenating the respective learning results; a step of analyzing, if the predicted toxicity value is greater than or equal to a preset value, the influence of the detailed pieces constituting the compound on the toxicity value; and a step of determining, using the analyzed influence, a detailed piece to be retained among the detailed pieces when creating a new compound.
2. The method of claim 1, wherein the step of predicting the binding affinity comprises: a step of deriving side chain information of the target protein based on the received target protein structure information; a step of obtaining a compound structure within the pocket structure of the target protein through the consensus of at least three analysis methods using rigid docking and flexible docking methods, respectively, with respect to the received compound structure information and the derived side chain information; and a step of predicting the binding affinity between the target protein and the compound based on the obtained compound structure within the pocket structure.
3. The method of claim 2, wherein the step of predicting the binding affinity between the target protein and the compound based on the obtained compound structure within the pocket structure comprises: a step of obtaining the optimal position of the side chain of the target protein based on the obtained compound structure within the pocket structure; a step of deriving the final structure of the compound to which a diffusion model is applied within the pocket structure of the target protein based on the obtained optimal side chain position; and a step of predicting the binding affinity between the target protein and the compound using the derived final structure of the compound.
4. The method of claim 1, wherein the step of obtaining the compound characteristic information comprises applying the 2D and 3D descriptive method, the molecular graph method, and the molecular fingerprint method, respectively, to the received compound structure information to obtain first to third compound characteristic information.
5. The method of claim 4, wherein the step of predicting the toxicity value comprises: a step of extracting weights to be applied to the predicted binding affinity and the received toxicity information; a step of applying the extracted weights to the predicted binding affinity and the received toxicity information, and then integrating them with the first to third compound characteristic information, respectively, to generate first to third integrated information; a step of learning the generated first and third integrated information through a fully connected layer, and learning the generated second integrated information through a graph convolution layer; and a step of predicting the toxicity value by concatenating the respective learning results.
6. A target protein-based toxicity value prediction device, comprising: a communication unit that communicates with an external device; a database that stores data; and a processor that controls the target protein-based toxicity value prediction device, wherein the processor receives target protein structure information, compound structure information, and toxicity information through the communication unit, predicts binding affinity based on the received target protein structure information and compound structure information, obtains compound characteristic information from the received compound structure information through a meta-ensemble method, learns the predicted binding affinity and the received toxicity information through one of a fully connected layer and a graph convolution layer according to the method constituting the meta-ensemble method, predicts a toxicity value by concatenating the respective learning results, analyzes the influence of the detailed pieces constituting the compound on the toxicity value if the predicted toxicity value is greater than or equal to a preset value, determines, using the analyzed influence, the detailed pieces among the detailed pieces that are to be retained when creating a new compound, and stores information about the determined detailed pieces in the database.
7. The device of claim 6, wherein the processor derives side chain information of the target protein based on the received target protein structure information, obtains a compound structure within a pocket structure of the target protein through the consensus of at least three analysis methods using rigid docking and flexible docking methods, respectively, based on the received compound structure information and the derived side chain information, and predicts the binding affinity between the target protein and the compound based on the obtained compound structure within the pocket structure.
8. The device of claim 7, wherein the processor obtains an optimal side chain position of the target protein based on the obtained compound structure within the pocket structure, derives a final structure of the compound to which a diffusion model is applied within the pocket structure of the target protein based on the obtained optimal side chain position, and predicts the binding affinity between the target protein and the compound using the derived final structure of the compound.
9. The device of claim 6, wherein the processor obtains first to third compound characteristic information by applying the 2D and 3D descriptive method, the molecular graph method, and the molecular fingerprint method, respectively, to the received compound structure information.
10. The device of claim 9, wherein the processor extracts weights to be applied to the predicted binding affinity and the received toxicity information, applies the extracted weights to the predicted binding affinity and the received toxicity information, and then integrates them with the first to third compound characteristic information, respectively, to generate first to third integrated information, learns the generated first and third integrated information through a fully connected layer, learns the generated second integrated information through a graph convolution layer, and predicts a toxicity value by concatenating the respective learning results.