Scaffold-oriented universal line system
By converting the chemical structure of molecules into the SOULS representation of scaffold and decorative sequences, the problem of existing line notation being difficult to use in computational environments is solved, thereby improving computational efficiency and the performance of machine learning models.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INSILICO MEDICINE IP LTD
- Filing Date
- 2021-01-14
- Publication Date
- 2026-06-16
AI Technical Summary
Existing line notations for computational chemistry structures are difficult to use effectively in a computer environment, especially in the development of deep neural networks, and need to be improved to generate molecules that satisfy specific targets.
The SOULS (Scaffold-Oriented Universal Line System) representation method is used to convert the chemical structure of molecules into scaffold and decoration sequences. By identifying and separating scaffolds and decorations, a line notation that can be used in computational protocols is generated.
It improves the efficiency and accuracy of computations on chemical structures, especially in machine learning models, enhancing their generative and predictive capabilities.
Figure CN115088039B_ABST
Abstract
Description
[0001] Inventor
[0002] A. Zavorokov
[0003] D. Polikovsky
[0004] M. Kuznetsov
[0005] A. Filimonov
[0006] Cross-references
[0007] This patent application claims priority to U.S. Patent Application Serial No. 16/831,747, filed March 26, 2020, and U.S. Provisional Application No. 62/966,465, filed January 27, 2020, each of which is incorporated herein by reference in its entirety.
Background of the Invention
Technical Field
[0009] This disclosure relates to systems and methods for providing a scaffold-oriented universal line system for chemical notation. More specifically, this disclosure relates to systems and methods for converting simplified molecular-input line-entry system (SMILES) notation or graph notation of molecules into the more useful scaffold-oriented universal line system (SOULS).
[0010] Related technical specifications
[0011] Chemical structures have at least a two-dimensional graphical representation of molecules, and often a three-dimensional representation. However, such 2D or 3D representations are difficult to use when performing computations on chemical structures in a computer environment. Therefore, chemical structures can be defined using line notation, such as molecular string representations. A molecular string representation is a type of line notation that uses ASCII strings to describe the structure of a chemical species. An example of this representation is the Simplified Molecular Input Line Entry System (SMILES). SMILES representations can be obtained from the chemical structure of molecules, and these representations can be converted back to 2D or 3D chemical structures. Other molecular line notations include the Wiswesser line notation (WLN), ROSDAL, and SYBYL line notation (SLN).
[0012] Therefore, the use of line notation for chemical structures is necessary in computational protocols. While some line notations are currently available, computational techniques are constantly being updated and improved. The development of deep neural networks (DNNs) continues to drive the optimization and improvement of data processing. These DNNs are configured to generate objects that satisfy defined conditions. For example, DNNs can generate molecules with specific biological activities for specific targets (e.g., receptors involved in disease states). Therefore, there remains a need to refine the line notation for chemical structures for computational techniques.
Summary of the Invention
[0013] In some embodiments, a scaffold-oriented line notation for a chemical structure may include: a scaffold sequence of multiple atom identifiers arranged in line notation, defining a scaffold of the chemical structure of the molecule, wherein the scaffold sequence includes at least one decoration marker (or any number of decoration markers), each decoration marker being adjacent to the atom identifier of the connecting atom of the scaffold to which a decoration is attached, wherein, in the chemical structure of the molecule, a decoration is a chemical moiety bound to a connecting atom of the scaffold; a decoration separator following the last atom identifier or the last decoration marker in the scaffold sequence; and at least one decoration (or any number of decorations) having at least one atom identifier in line notation, defining the chemical structure of the chemical moiety of the decoration attached to a connecting atom of the scaffold; wherein: in the scaffold sequence, the order of the at least one decoration marker (or any number of decoration markers) defines the order of the at least one decoration (or any number of decorations); the first decoration follows the first decoration separator; and the first decoration is defined as being attached at a first connecting atom identifier among the multiple atom identifiers between the first atom identifier and the last atom identifier. While the scaffold-oriented line notation has been described for molecules comprising a scaffold and at least one decoration, it should be recognized that the notation can also be applied to molecules without any decoration. For example, benzene can be represented in scaffold-oriented line notation even though it has no decoration.
[0014] In some embodiments, the at least one decoration marker is located in one of the following positions: before the first atom identifier of the scaffold sequence connected to the first decoration marker; after the first atom identifier of the scaffold sequence connected to the first decoration marker; before the first connecting atom identifier of the scaffold sequence connected to the first decoration marker, wherein the first connecting atom identifier is not the first atom identifier in the scaffold sequence; after the first connecting atom identifier of the scaffold sequence connected to the first decoration marker; before a subsequent atom identifier of the scaffold sequence connected to the first decoration marker; or after a subsequent atom identifier of the scaffold sequence connected to the first decoration marker.
[0015] In some implementations, within the scaffold sequence, a first connecting atom identifier of multiple atom identifiers is adjacent to a first decorative marker. In some aspects, the first decorative marker precedes the first atom identifier of the scaffold sequence. The first connecting atom can be any atom, including the first or last atom in the scaffold sequence, or any atom in between.
[0016] In some implementations, the line notation may include: at least one subsequent decoration marker adjacent to a subsequent atom identifier; at least one subsequent decoration separator after the first decoration; and at least one subsequent decoration after the at least one subsequent decoration separator, wherein each subsequent decoration is separated by a subsequent decoration separator.
[0017] In some embodiments, the line notation may include: a plurality of decoration markers, each adjacent to the corresponding atom identifier; and a plurality of decorations separated by a plurality of decoration separators, each of the plurality of decorations following the corresponding decoration separator. In some aspects, each decoration includes a corresponding decoration marker followed by the line notation of the chemical structure of the decoration. In some aspects, each atom identifier is defined by the periodic table. In some aspects, each decoration marker is a symbol. In some aspects, each decoration separator is a second symbol different from the decoration marker symbol. In some aspects, each decoration marker in the scaffold sequence is connected by a third symbol different from the decoration marker symbol and the decoration separator symbol.
[0018] In some embodiments, a method for converting the line notation of a molecule's chemical structure into a scaffold-oriented line notation of the chemical structure may include: providing the line notation of the chemical structure; converting the line notation into a graphical notation of the chemical structure; identifying the scaffold of the graphical notation of the chemical structure; searching for at least one decoration of the graphical notation of the chemical structure; separating the scaffold from any decoration; converting the graphical representation of the scaffold into a corresponding line notation representation of the scaffold, wherein the line notation includes a plurality of atomic identifiers arranged in a scaffold sequence; converting the graphical representation of any decoration into a corresponding line notation representation for each decoration; identifying a first connecting atom in the scaffold connected to the first decoration when the first decoration is present and connected to a first connecting atom in the chemical structure; identifying a first connecting atom identifier of the first connecting atom in the scaffold sequence when the first connecting atom is identified; placing a first decoration marker adjacent to the first connecting atom identifier in the scaffold sequence when the first decoration is present in the chemical structure; placing a first decoration separator after the last atomic identifier or the last decoration marker in the scaffold sequence; placing the first decoration after the first decoration separator when the first decoration is present in the chemical structure; and providing the scaffold-oriented line notation for the chemical structure. This method can be performed using molecules comprising scaffolds with or without decorations. When the molecule is merely a scaffold, the methodological steps describing the decorative actions are omitted.
[0019] In some embodiments, the method may include: identifying at least one decoration of a graphical notation of a chemical structure; separating a scaffold from the at least one decoration; converting a graphical representation of each decoration into a corresponding line notation representation of each decoration; identifying a first connecting atom identifier in a scaffold sequence for connecting to a first connecting atom of the first decoration of the last identified decoration; placing a first decoration marker adjacent to the first connecting atom identifier; placing the first decoration after a first decoration separator; and providing a scaffold-oriented line notation for the chemical structure, wherein the scaffold-oriented line notation includes a scaffold sequence and a decoration sequence of at least one decoration, wherein the scaffold sequence and the decoration sequence are separated by a first decoration separator.
[0020] In some embodiments, the method may include: identifying each atom and each bond of the molecular chemical structure; identifying the scaffold of the chemical structure; identifying each decoration attached to the scaffold atoms; identifying each bond between each decoration of the scaffold and the corresponding atom; and breaking the identified bonds between each decoration of the scaffold and the corresponding atom.
[0021] In some implementations, the method may include: replacing each broken bond with a scaffold node connected to the corresponding atom of the scaffold; and replacing each broken bond with a decoration node arranged on each decoration.
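The bond-breaking and node-placement steps of paragraphs [0020] and [0021] can be illustrated with a minimal sketch on a toy molecular graph. Everything named here is an assumption for illustration only: the function `cut_decorations`, the dict-and-set graph encoding, and the use of "*" as the symbol for both node types are hypothetical, and the set of scaffold atoms is supplied by hand because this passage does not specify how the scaffold is identified.

```python
from typing import Dict, FrozenSet, Set, Tuple

Bond = FrozenSet[int]

def cut_decorations(
    atoms: Dict[int, str], bonds: Set[Bond], scaffold_ids: Set[int]
) -> Tuple[Dict[int, str], Set[Bond]]:
    """Break every scaffold-decoration bond and cap both ends with a '*' node.

    Illustrative only: atoms maps atom id -> element symbol, bonds is a set of
    two-element frozensets, and scaffold_ids is assumed to be known already.
    """
    atoms = dict(atoms)                 # work on copies, leave inputs untouched
    new_bonds: Set[Bond] = set()
    next_id = max(atoms) + 1
    for a, b in sorted(tuple(sorted(bond)) for bond in bonds):
        a_in, b_in = a in scaffold_ids, b in scaffold_ids
        if a_in == b_in:                # bond lies fully inside one part: keep it
            new_bonds.add(frozenset((a, b)))
            continue
        scaffold_atom, decoration_atom = (a, b) if a_in else (b, a)
        scaffold_node, decoration_node = next_id, next_id + 1
        next_id += 2
        atoms[scaffold_node] = "*"      # scaffold node replaces the broken bond
        atoms[decoration_node] = "*"    # decoration node replaces it on the other side
        new_bonds.add(frozenset((scaffold_atom, scaffold_node)))
        new_bonds.add(frozenset((decoration_atom, decoration_node)))
    return atoms, new_bonds

# Toluene-like toy graph: six ring atoms (ids 0-5) plus one substituent carbon (id 6).
atoms = {i: "c" for i in range(6)}
atoms[6] = "C"
bonds = {frozenset((i, (i + 1) % 6)) for i in range(6)} | {frozenset((0, 6))}
new_atoms, new_bonds = cut_decorations(atoms, bonds, scaffold_ids=set(range(6)))
```

After the call, the ring and the substituent are two disconnected fragments, each carrying one "*" node where the original bond used to be.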
[0022] In some implementations, the method may include: constructing a line notation for the scaffold with a decoration marker for each decoration node; and constructing a line notation for each decoration.
[0023] In some implementations, the method may include: determining the order of the at least one decoration marker in the line notation of the scaffold; and arranging the at least one decoration in a decoration sequence having the order of the at least one decoration marker in the line notation of the scaffold, wherein each decoration has its own line notation and the decorations are separated by decoration separators.
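The ordering rule of paragraph [0023] can be sketched as a small string-assembly step. This is a minimal illustration under stated assumptions, not the patented implementation: the helper name `build_souls` is hypothetical, the marker "*" and the separator "." are the example symbols used elsewhere in this description, and the decorations are assumed to be supplied already sorted in the order their markers appear in the scaffold sequence.

```python
from typing import List

MARKER = "*"      # example decoration marker symbol (assumed)
SEPARATOR = "."   # example decoration separator symbol (assumed)

def build_souls(scaffold_with_markers: str, decorations: List[str]) -> str:
    """Join a marked scaffold sequence and its decorations into one string.

    decorations must be listed in the same order as the markers in the
    scaffold sequence; each one is prefixed with its own marker.
    """
    if scaffold_with_markers.count(MARKER) != len(decorations):
        raise ValueError("need exactly one decoration per decoration marker")
    parts = [scaffold_with_markers] + [MARKER + d for d in decorations]
    return SEPARATOR.join(parts)

souls = build_souls("*C1Oc2ccc(*)cc2N(*)C1=O", ["C", "Cl", "CC(O)CO"])
```

A scaffold without markers and an empty decoration list simply returns the scaffold sequence, which matches the statement that molecules without decorations can also be represented.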
[0024] In some implementations, the method may include arranging a scaffold sequence such that a first decorative marker precedes a first connecting atom identifier in the scaffold sequence. The first connecting atom can be any atom, including the first or last atom in the scaffold sequence or any atom in between.
[0025] In some implementations, the method may include arranging the line notation to have: at least one subsequent decoration marker adjacent to a subsequent atom identifier; at least one subsequent decoration separator after a first decoration; and at least one subsequent decoration after the at least one subsequent decoration separator, wherein each subsequent decoration is separated by a subsequent decoration separator.
[0026] In some implementations, the method may include arranging the line notation to have: a plurality of decoration markers adjacent to the corresponding atom identifiers; a plurality of decorations separated by a plurality of decoration separators; and each of the plurality of decorations following the corresponding decoration separator.
[0027] In some implementations, the method may include defining each decoration as a line symbol comprising a corresponding decoration mark followed by the chemical structure of the decoration.
[0028] In some implementations, the line notation for the scaffold may include at least one of the following: each atom identifier is defined by the periodic table of elements; each decoration marker is a symbol; each decoration separator is a second symbol different from the decoration marker symbol; or each decoration marker in the scaffold sequence is connected by a third symbol different from the decoration marker symbol and the decoration separator symbol.
[0029] In some aspects, a method for converting a scaffold-oriented line symbol of a chemical structure of one embodiment into different line symbols of the chemical structure may include: providing a scaffold-oriented line symbol for the chemical structure; splitting the scaffold-oriented line symbol into a scaffold sequence and each decoration; constructing a graphical representation of the scaffold sequence; constructing a graphical representation of each decoration; combining the graphical representation of the scaffold sequence and the graphical representation of each decoration to form a graphical representation of the molecule; and converting the graphical representation of the molecule into different line symbols. In some aspects, the method may include identifying scaffold connection points on the graphical representation of the scaffold for each decoration; identifying scaffold atoms connected to the scaffold connection points for each decoration; and removing each scaffold connection point. In some aspects, the method may include: identifying decoration connection points on the graphical representation of each decoration; identifying decoration atoms connected to the decoration connection points for each decoration; and removing each decoration connection point.
[0030] In some implementations, the method may include: connecting each scaffold atom to a corresponding decoration atom via a bond; and providing a graphical representation of the chemical structure of the molecule.
[0031] In some implementations, the method may include identifying a first decoration separator and each decoration separator between each decoration, wherein the first decoration separator is located after the last atomic identifier or the last decoration mark.
[0032] In some implementations, the method may include: identifying atom A in a scaffold that defines a connection point to a decoration; identifying atom B in a decoration that defines a connection point to the scaffold; identifying atom A_neig connected to atom A; identifying atom B_neig connected to atom B; removing atom A; removing atom B; and connecting atom A_neig to atom B_neig via a bond.
[0033] In some implementations, the method may include: identifying each atom A in a scaffold that defines a connection point to a decoration; identifying each atom B in a decoration that defines a connection point to the scaffold; identifying each atom A_neig connected to each atom A; identifying each atom B_neig connected to each atom B; removing each atom A; removing each atom B; and connecting each atom A_neig to each corresponding atom B_neig via a bond.
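The reattachment steps of paragraphs [0032] and [0033] can be sketched on a toy graph as follows. The function name `join_at_connection_points`, the adjacency-dict encoding, and the assumption that each connection-point atom has exactly one neighbor are illustrative choices, not details taken from the disclosure; the pairing of each atom A with its atom B is supplied by the caller.

```python
from typing import Dict, Iterable, Set, Tuple

def join_at_connection_points(
    symbols: Dict[int, str],
    adjacency: Dict[int, Set[int]],
    pairs: Iterable[Tuple[int, int]],
) -> Tuple[Dict[int, str], Dict[int, Set[int]]]:
    """Remove each (A, B) connection-point pair and bond A_neig to B_neig.

    Illustrative only: A is a connection point on the scaffold, B the matching
    connection point on a decoration, and each is assumed to have one neighbor.
    """
    symbols = dict(symbols)
    adjacency = {atom: set(neighbors) for atom, neighbors in adjacency.items()}
    for a, b in pairs:
        (a_neig,) = adjacency[a]        # fails loudly if A has != 1 neighbor
        (b_neig,) = adjacency[b]        # fails loudly if B has != 1 neighbor
        for point, neighbor in ((a, a_neig), (b, b_neig)):
            adjacency[neighbor].discard(point)
            del adjacency[point]        # remove atom A / atom B
            del symbols[point]
        adjacency[a_neig].add(b_neig)   # connect A_neig to B_neig via a bond
        adjacency[b_neig].add(a_neig)
    return symbols, adjacency

# Scaffold fragment C-* (ids 0, 1) and decoration fragment *-Cl (ids 2, 3).
symbols = {0: "C", 1: "*", 2: "*", 3: "Cl"}
adjacency = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
joined_symbols, joined_adjacency = join_at_connection_points(symbols, adjacency, [(1, 2)])
```

In this toy case the two "*" atoms disappear and the carbon ends up bonded directly to the chlorine.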
[0034] In some embodiments, the method for calculating chemical structures may include: providing a scaffold-oriented line symbol of a chemical structure according to one embodiment to a computing system; and using the computing system to execute a computing protocol with the scaffold-oriented line symbol.
[0035] In some embodiments, the method for calculating chemical structures may include: providing scaffold-oriented line symbols of the chemical structure obtained by performing one of the methods to a computing system; and using the computing system to perform a computing protocol with scaffold-oriented line symbols.
[0036] In some aspects, a computer program product may include a non-transitory tangible storage device having computer-executable instructions that, when executed by a processor, cause the execution of a method of one embodiment for converting a line notation into a scaffold-oriented line notation.
[0037] In some aspects, a computer program product may include a non-transitory tangible storage device having computer-executable instructions that, when executed by a processor, cause the execution of a method for converting a scaffold-oriented line notation into a different line notation.
[0038] The above description is for illustrative purposes only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent from the accompanying drawings and the following detailed description.
Description of the Drawings
[0039] The foregoing and following information, as well as other features of the invention, will become more apparent from the description and appended claims, taken in conjunction with the accompanying drawings. It should be understood that these drawings merely illustrate a few embodiments of the invention and should therefore not be considered as limiting its scope; the invention will be described with additional specificity and detail using the drawings.
[0040] Figure 1A shows a method for obtaining the SOULS representation.
[0041] Figure 1B shows a method for generating a scaffold and peripheral decorations from a molecule.
[0042] Figure 2 shows a method for converting a line notation representation (e.g., SMILES representation) of a molecule into a SOULS representation of the molecule.
[0043] Figure 3A shows an example of an algorithm for converting the molecular line notation SMILES to a SOULS representation; however, it should be recognized that any molecular line notation can be used to generate SOULS representations.
[0044] Figure 3B shows another example of an algorithm for converting the molecular line notation SMILES to a SOULS representation; however, it should be recognized that any molecular line notation can be used to generate SOULS representations.
[0045] Figure 4A shows an example of a method for converting a SMILES representation to a SOULS representation.
[0046] Figure 4B shows a detailed example of the method for converting a SMILES representation to a SOULS representation.
[0047] Figure 5A shows an example of a method for converting a SOULS representation to a line notation (e.g., SMILES representation).
[0048] Figure 5B shows an example method for constructing a complete graphical representation of a molecule from a SOULS representation.
[0049] Figure 5C shows another example method for converting SOULS or graphical representations into a different line notation representation.
[0050] Figure 5D shows an example of an algorithm for converting a SOULS representation to a SMILES representation; however, it should be recognized that any molecular line notation can be generated from a SOULS representation.
[0051] Figure 5E shows a detailed example of the method for converting a SOULS representation to a SMILES representation.
[0052] Figure 6 shows an example of a computer or computing system configured to perform the calculations and methods described herein.
[0053] The elements and components in the accompanying drawings may be arranged according to at least one embodiment described herein, and those skilled in the art may modify the arrangement based on the disclosure provided herein.
[0054] Detailed description
[0055] In the following detailed description, reference is made to the accompanying drawings, which form a part of this document. In the drawings, similar symbols generally identify similar components unless the context otherwise requires. The illustrative embodiments described in the detailed description, drawings, and claims are not intended to be limiting. Other embodiments may be used, and other modifications may be made, without departing from the spirit or scope of the subject matter described herein. It will be readily understood that, as generally described herein and illustrated in the figures, aspects of the invention can be arranged, substituted, combined, separated, and designed in a variety of different configurations, all of which are explicitly contemplated herein.
[0056] Generally, this technology includes systems and methods for providing a scaffold-oriented universal line system for chemical notation. More specifically, this disclosure relates to systems and methods for converting simplified molecular-input line-entry system (SMILES) notation into the more useful scaffold-oriented universal line system (SOULS). However, it should be appreciated that the systems and methods can be used to convert any chemical line notation or graphical notation into a SOULS representation of a chemical structure.
[0057] SOULS representations of chemical structures can be used in a variety of computational techniques related to chemical structures. Some exemplary techniques that can be implemented using SOULS representations are provided in the incorporated references. SOULS representations are particularly useful in artificial intelligence (AI), such as in training and using machine learning models for chemical analysis and design, as well as in other computations involving chemical structure or molecular data.
[0058] In some implementations, SOULS representations can be used to train machine learning models. SOULS representations depict molecular structures in a computer-readable format that can be processed by a computer during machine learning. They provide a scaffold-oriented molecular representation that isolates the molecular scaffold from the peripheral chemical moieties (e.g., decorations) within the molecular representation. This makes SOULS a useful tool for many machine learning methods, including generative modeling, property optimization using algorithms and reinforcement learning, and predictive modeling.
[0059] In some embodiments, the system and method include an algorithm for converting a molecular structure into a SOULS representation. Therefore, the system may include a computer configured with an algorithm designed to convert any representation of a molecule into a SOULS representation. The system can obtain molecular representations in various formats, such as string formats (e.g., line notation, linear notation, etc.) or graphical representations. When a molecular representation is provided in string format, the system uses a string-to-graph conversion to convert the string notation into a graphical notation, and then processes the graphical notation to obtain a SOULS representation. When a molecular representation is provided in graphical notation, the system processes the graphical notation to obtain a SOULS representation. Therefore, the system can select the algorithm for obtaining the SOULS representation based on the format of the provided molecular structure.
[0060] In some embodiments, the system and method include algorithms for converting the molecular structure represented in the SOULS representation into other standard representations, such as string formats (e.g., SMILES) or graphics (e.g., 2D or 3D). Therefore, the system may include a computer configured with algorithms designed to convert the SOULS representation into any other molecular representation, such as those described herein. The conversion may continue across one or more steps until the desired representation is obtained.
[0061] In some implementations, the conversion from a SOULS representation to another molecular representation can be used to provide a particular molecular representation that may be required for other defined operations. The ability to convert a molecular representation to a SOULS representation and then back from the SOULS representation to a non-SOULS molecular representation allows two-way conversion of chemical structures between notations: a molecular notation can be converted to a SOULS representation and then back to the original notation or to a different notation. This one-way or two-way conversion can be used for various computational processing of chemical structure data.
[0062] In some implementations, the system and method include the use of SOULS representations for algorithmic applications in machine learning systems, including generative modeling, predictive modeling, and attribute optimization. That is, SOULS can be used for AI, rather than other string or graphical formats. The conversion can be implemented according to computational protocols. Some steps may use software with specific notations, so SOULS can be converted to the specific notations for these steps. In other steps, computations can be improved using the SOULS representation format, and such computations can be presented using SOULS representations, where any other molecular representation can be converted to SOULS for such computations. For algorithms that convert molecules to SOULS format, the data can be a set of molecular structures represented in any format, including MOL, SDF, SMI, and PDB file formats, graphs, or SMILES. For algorithms that convert molecules in SOULS format to any other format, the data is a set of molecules represented in SOULS format. For machine learning applications of the proposed framework, the data can be a set of molecules represented in any format. For some applications, such as predictive modeling, each molecule may have a specified set of properties.
[0063] In some embodiments, the SOULS representation may include a sequence of atomic identifiers for a molecular scaffold, within which are decoration markers (e.g., indicators). The indicators identify the locations where chemical moieties (e.g., substituents) within the sequence are attached to the scaffold, which can be represented as decorations or side chains of the molecular core scaffold. Thus, the language describing the molecule includes a scaffold representing the molecular core and decorations representing chemical moieties (e.g., substituents) attached to the core molecule. Decorations may have connection points on the core scaffold, which can be considered nodes, where each decoration has a node on the scaffold. The line symbols may include a scaffold sequence of atomic identifiers for the scaffold, with decoration markers identifying the locations of decorations attached within the sequence. The decoration markers are placed near atoms that serve as decoration nodes. Decorations are then listed in a decoration sequence following the scaffold sequence, where each decoration in the decoration sequence is separated by a decoration separator (e.g., a period "."). Each decoration in the decoration sequence is adjacent to a decoration marker (e.g., *) and separated from each other by decoration separators. Thus, the SOULS representation includes a scaffold sequence and a decoration sequence. The decoration sequence includes at least one decoration. In some cases, the decoration sequence includes multiple decoration line symbols. Each decoration has its own line symbol.
[0064] The order of the decoration markers in the scaffold sequence defines the order of the decorations listed in the decoration sequence. A typical arrangement reads from left to right, with the scaffold sequence on the left and the decoration sequence on the right; however, the direction can be modified, for example: reading from right to left, with the scaffold sequence on the right and the decoration sequence on the left; reading from right to left, with the scaffold sequence on the left and the decoration sequence on the right; or reading from left to right, with the decoration sequence on the left and the scaffold sequence on the right.
[0065] The scaffold sequence can include atom identifiers (e.g., atoms represented as in the periodic table), where decoration markers are adjacent to the atoms to which the decorations connect. Typically, a decoration marker is to the left of the decoration node atom when that atom is the initial scaffold atom, but decoration markers may be to the right of the scaffold atom that serves as a decoration node atom. Decoration markers identify the location of each decoration, and the order of the decoration markers identifies the order in which the decorations are defined in the decoration sequence. Left or right adjacency can be modified depending on the notation to be used. However, as presented herein, the first decoration marker is on the left and is the initial character of the SOULS representation (e.g., the symbol *), and subsequent decoration markers are to the right of the scaffold atoms that serve as decoration node atoms.
[0066] For example, a SOULS representation can be read as follows:
[0067] *C1Oc2ccc(*)cc2N(*)C1=O.*C.*Cl.*CC(O)CO
[0068] Here, the initial symbol is an asterisk (*), used here as the first decoration marker; however, it should be recognized that any other symbol (e.g., non-alphanumeric) may be used. Following the decoration marker is the support sequence C1Oc2ccc, which defines a portion of the support. This support sequence C1Oc2ccc is followed by a second asterisk (*), used as the decoration marker for the second decoration listed in the decoration sequence, where the decoration marker is located to the left of the support atom that serves as the decoration node. Following the second asterisk (*) is the support sequence cc2N, followed by a third asterisk (*), used as the decoration marker for the third decoration listed in the decoration sequence (e.g., N is the decoration node). Following the third asterisk (*) is the support sequence C1=O, followed by a period (e.g., "."), which indicates the end of the support sequence. The subject following the period (e.g., the decoration separator) comprises the decoration sequence, which defines the decorations according to the order in which the asterisks are placed in the support sequence. Thus, the first asterisk (*) is defined as *C, defining the first decoration in the support sequence as C (e.g., carbon). Following the first decoration *C is another period, used as the decoration separator. However, any symbol other than a period (e.g., not alphanumeric or used for different markers) can be used as a decoration separator. The decoration separator period is followed by *Cl (e.g., chlorine Cl), so *Cl is the second decoration attached to the scaffold at the position of the second decoration marker asterisk (*) in the scaffold sequence. The second decoration *C is followed by the decoration separator period, then the third decoration *CC(O)CO, which defines the chemical structure of the atom attached to the scaffold sequence with the third decoration marker (*). 
Thus, this representation defines the molecular structure by resolving the structure into scaffold and decoration, and sequentially defines the positions of the decorations within the line symbol sequence of the molecular representation. The order of the decoration markers in the scaffold sequence defines the order in which the decorations are defined in the decoration sequence. This makes it easy to determine the scaffold structure, the decoration structure, and then their combination with the decorations attached to the scaffold, as indicated by the positions of the decoration markers, in the corresponding order.
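The reading described above can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: the function name `split_souls` and its parameters are hypothetical, and it assumes the period is used as the only separator and that every asterisk in the scaffold sequence is a decoration marker.

```python
def split_souls(souls, separator=".", marker="*"):
    """Split a SOULS string into its scaffold sequence and decoration list.

    Assumption: the separator never occurs inside the scaffold sequence and
    every marker character in the scaffold is a decoration marker.
    """
    scaffold, *decorations = souls.split(separator)
    if scaffold.count(marker) != len(decorations):
        raise ValueError("number of decoration markers and decorations differ")
    return scaffold, decorations


scaffold, decorations = split_souls("*C1Oc2ccc(*)cc2N(*)C1=O.*C.*Cl.*CC(O)CO")

# Character positions of the decoration markers in the scaffold sequence;
# the k-th marker is defined by the k-th decoration.
marker_positions = [i for i, ch in enumerate(scaffold) if ch == "*"]
pairs = list(zip(marker_positions, decorations))
```

Here `pairs` associates each marker position in the scaffold sequence with the decoration listed at the same rank in the decoration sequence.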
[0069] As can be seen, when the decoration asterisk (*) is listed at the beginning of the scaffold sequence, it is not within parentheses, but it can be listed within parentheses if needed. Therefore, either a bare asterisk (*) or an asterisk within parentheses can be used as a decoration marker. Furthermore, the decoration marker can be a vertical line, such as "|", or any other symbol. Preferably, the decoration marker is not alphanumeric, so that atoms can be clearly distinguished from the decoration marker. Parentheses or other symbols can be used for all instances of the decoration marker, or only for markers located inside the scaffold sequence. Thus, parentheses around an asterisk identify the corresponding decoration as being located within the scaffold sequence, while the absence of parentheses around an asterisk marks the first dangling decoration at the start of the scaffold sequence.
[0070] In some implementations, the SOULS representation includes a first character selected from a first atom identifier or a first decoration marker. In some aspects, the first character is a first atom identifier, which may be the symbol of an element in the periodic table. In some aspects, the first character may be a first decoration marker (e.g., an asterisk *). The SOULS representation may include a pre-marker sequence of one or more atom identifiers before the first decoration marker. Alternatively, the SOULS representation may begin with the first decoration marker, followed by a first scaffold sequence of one or more atom identifiers. After the first scaffold sequence is a second decoration marker (e.g., an asterisk (*) in parentheses) that identifies the atom preceding it as a second scaffold node atom. Following the second decoration marker is a second scaffold sequence, followed by a third decoration marker that identifies the atom preceding it as a third scaffold node atom. Following the third decoration marker is the final scaffold sequence, followed by a first decoration separator (e.g., a period "."). This first decoration separator separates the scaffold sequence from the first decoration and thus from the entire decoration sequence. The individual decorations are separated by decoration separators and listed in the order in which their decoration markers appear in the scaffold sequence. Each decoration is preceded by a decoration marker (e.g., an asterisk *) that identifies the characters defining the decoration. Therefore, the SOULS representation separates the scaffold structure from the decorations, which allows for improved use of various computational techniques, such as machine learning models.
[0071] In some implementations, the SOULS representation may include a line symbol or string consisting of two parts separated by a special symbol, which herein is a dot or period ".", although other symbols may be defined as the decoration separator that separates the decoration sequence from the scaffold sequence. The decoration separator allows the scaffold sequence to be written first, followed by the decoration sequence, or vice versa. The first part, the scaffold sequence, contains a special symbol (e.g., an asterisk "*") at each outer connection point, i.e., at each scaffold atom connected to an outer decoration. The second part of the SOULS representation, the decoration sequence, lists the individual decoration segments (e.g., the line symbol for each decoration) in the same order as the corresponding connection points listed in the scaffold sequence. The segments are separated by a special symbol, such as a dot or period ".", which may differ from the symbol separating the scaffold and decoration sequences if desired. However, a dot or period can be used to indicate that the following characters are a decoration line symbol. Each decoration segment is written as a basic line symbol with its connection point marked by a special symbol (e.g., an asterisk "*"). In some aspects, the line symbols in the SOULS representation are defined and used as in SMILES line symbols.
[0072] Figure 1A shows a method 100 for obtaining a SOULS representation. At box 102, method 100 obtains a graphical representation of the molecule. Then, at box 104, the graphical representation is divided into a scaffold portion and at least one decoration portion. Here, the term "decoration" describes a chemical moiety (e.g., a substituent) coupled to the scaffold at a decoration node atom. However, the term "decoration" can be used interchangeably with terms such as "peripheral," "dangling," or other identifiers of chemical moieties attached to the scaffold. At box 106, method 100 takes the graphical representation of the scaffold, identifies a first scaffold node atom linked to the first decoration portion, and places a first decoration marker at the first scaffold node atom. Then, at box 108, method 100 converts the graphical representation of the scaffold into an ASCII-compliant line symbol (e.g., SMILES) that begins at the first scaffold node atom. Furthermore, at box 110, the graphical representation of each decoration portion is converted into a corresponding line symbol (e.g., SMILES, or the same notation as the scaffold line symbol). At box 112, the next scaffold node atom in the line symbol is identified with a second decoration marker, and this operation is repeated until all scaffold node atoms are identified with subsequent decoration markers. At box 114, the decorations are arranged in the order of the decoration markers in the scaffold sequence. At box 116, the SOULS representation is provided, which includes the decoration sequence associated with the scaffold sequence; the two can be adjacent, for example with the scaffold sequence on the left and the decoration sequence on the right, separated from each other by a defined separator character. Therefore, the SOULS representation includes a scaffold sequence with ordered decoration markers and a decoration sequence whose decorations follow the order of the decoration markers.
[0073] In some implementations, scaffolds can be identified and / or separated from decorations using various frameworks. Different frameworks can yield different scaffolds. This invention allows different scaffold definitions to be used with the line notation described herein. This invention uses the scaffolds and decorations of molecules to generate line notation in the SOULS representation.
[0074] In some implementations, the Bemis-Murcko framework can be used to separate the graphical representation into scaffolds and peripheral decorations. The Bemis-Murcko framework provides a system for separating scaffolds from peripheral decorations by defining the scaffold as a set of ring structures and connecting atoms, along with the peripheral decorations attached to those connecting atoms. The Bemis-Murcko algorithm can be used to define the scaffold and the decorations attached to it at the nodes. This can include extracting the Bemis-Murcko scaffold from the molecular structure.
[0075] Figure 1B provides a method 120 for generating a scaffold and peripheral decorations from a molecule. At box 122, method 120 obtains the molecule as its graphical representation. Then, at box 124, the graphical representation can be analyzed to identify atoms as nodes and bonds as edges. At box 126, decorations are identified and removed from the decoration nodes, such that each decoration node is identified and each decoration is identified. At box 128, after removing the decorations, the scaffold is identified as the remaining structure in the graphical representation.
[0076] In some implementations, the algorithm for extracting Bemis-Murcko scaffolds from a molecular structure can be as follows: (1) represent the molecular structure as a graph, where atoms are nodes and bonds are edges; (2) while the molecular graph has leaf nodes, remove the leaf nodes and the edges connected to them from the graph; (3) the remaining graph is the Bemis-Murcko scaffold of the molecule; (4) all nodes removed from the graph are the periphery of the molecule. In some aspects, the Bemis-Murcko framework may define leaf nodes in different ways, such as nodes with at most one connecting edge, or nodes with at most one connecting edge where that edge corresponds to a single bond (as opposed to a double bond or aromatic bond). In some aspects, decorations can be considered as leaves connected to leaf nodes. In any case, the protocol can resolve the molecule into a scaffold and one or more decorations connected to the decoration nodes of the scaffold (i.e., scaffold nodes).
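The leaf-removal procedure above can be sketched as follows. This is an illustrative sketch only: the function name `murcko_split`, the dictionary-based graph encoding (atom index to element symbol, atom-index pair to bond order), and the `single_bonds_only` switch between the two leaf definitions are assumptions made for this example.

```python
def murcko_split(atoms, bonds, single_bonds_only=True):
    """Return (scaffold, periphery) as sets of atom indices.

    atoms: {index: element symbol}; bonds: {(i, j): bond order}.
    A leaf is a node with at most one remaining edge; with
    single_bonds_only=True that edge must also be a single bond.
    """
    neighbors = {a: {} for a in atoms}
    for (i, j), order in bonds.items():
        neighbors[i][j] = order
        neighbors[j][i] = order

    scaffold = set(atoms)
    changed = True
    while changed:  # step (2): repeat while leaf nodes remain
        changed = False
        for a in sorted(scaffold):
            live = [o for b, o in neighbors[a].items() if b in scaffold]
            is_leaf = len(live) <= 1 and (
                not single_bonds_only or all(o == 1 for o in live)
            )
            if is_leaf:
                scaffold.discard(a)
                changed = True
    return scaffold, set(atoms) - scaffold  # steps (3) and (4)


# Hypothetical test molecule: cyclohexanone ring (atoms 0-5, carbonyl O = 6)
# carrying an ethyl group (atoms 7 and 8) on ring atom 1.
ring_atoms = {0: "C", 1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "O", 7: "C", 8: "C"}
ring_bonds = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 5): 1, (5, 0): 1,
              (0, 6): 2, (1, 7): 1, (7, 8): 1}

strict_scaffold, strict_periphery = murcko_split(ring_atoms, ring_bonds)
loose_scaffold, loose_periphery = murcko_split(ring_atoms, ring_bonds, single_bonds_only=False)
```

With the single-bond leaf definition the carbonyl oxygen stays in the scaffold and only the ethyl group becomes periphery; with the broader definition the oxygen is pruned as well.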
[0077] In some implementations, alternative definitions to the Bemis-Murcko scaffold are possible and can be used to create the SOULS representation as described herein. For example, the Bemis-Murcko scaffold extension can be used where the peripheral decorations are connected to the scaffold via any bond (not just single bonds). Furthermore, any other algorithm that divides the molecular graph into a central scaffold and decorative peripheral portions, where all peripheral decorations are interconnected only through the central portion, can be used for the scaffold-periphery definition. While the disclosure herein may refer to the scaffold as a Bemis-Murcko scaffold with leaf nodes defined using single bonds, it should be recognized that the scaffold and decorations can be generated using other algorithms.
[0078] Figure 2 shows a method 200 for converting a line notation (e.g., SMILES notation) of a molecule into a SOULS notation of the molecule. At box 202, method 200 includes obtaining the molecule in line notation (e.g., SMILES notation). Then, at box 204, the line notation (e.g., SMILES notation) is converted into a graphical representation of the molecule. It should be appreciated that, when available, the molecule can be initially obtained in the graphical representation without first starting from the line notation. Then, at box 206, the graphical representation of the molecule is converted into a scaffold and at least one decoration connected to the scaffold at a node. Then, at box 208, the bonds between the scaffold and the decoration are identified. Then, at box 210, the identified bonds are labeled as bonds (a1, a2), where a1 is a node atom in the scaffold (e.g., scaffold node atom) and a2 is a node atom in the decoration (e.g., decoration node atom), or vice versa. Then, at box 212, the bond (a1, a2) is removed and replaced with a special node c1 for atom a1 and a special node c2 for atom a2. Then, at box 214, a bond (a1, c1) is added between the scaffold node atom a1 and the special node c1 of the scaffold, and a bond (c2, a2) is added between the decoration node atom a2 and the special node c2 of the decoration. The bond types of (a1, c1) and (c2, a2) are the same as that of (a1, a2). At box 216, for each bond between a scaffold atom and the corresponding decoration atom, the processes in boxes 210, 212, and 214 are repeated until all bonds between the scaffold and the decorations are removed, and bonds are generated from each scaffold node atom (e.g., a1) to a special node (e.g., c1), and from each decoration node atom (e.g., a2) to a special node (e.g., c2). This separates the scaffold from the different decorations. At box 218, the line symbols for the scaffold and for each decoration are constructed. 
At box 220, the scaffold atoms connected to special nodes (e.g., via bonds (a1, c1)) are identified and marked (e.g., with decoration markers) in decoration order within the scaffold line symbol. The marking can use decoration markers as defined herein, such that the scaffold line symbol includes at least one decoration marker. At box 222, the decoration line symbols are ordered according to the order of the decoration markers in the scaffold line symbol.
[0079] Figure 3A shows an example algorithm for converting a molecular line symbol (SMILES) to a SOULS representation; however, it should be recognized that any molecular line symbol can be used to generate a SOULS representation. Furthermore, the algorithm may start from a graphical representation when one is provided instead of a molecular line symbol. The algorithm of Figure 3A for converting SMILES to SOULS is as follows: (1) construct a graph of the molecule defined in the SMILES; (2) apply the Bemis-Murcko algorithm to assign each atom to the scaffold or the periphery (i.e., decoration); (3) for each bond (a1, a2) in the molecule, (3A) if ((a1 in the scaffold) and (a2 in the periphery)) or ((a2 in the scaffold) and (a1 in the periphery)): (3A1) remove the bond (a1, a2); (3A2) create special nodes c1 and c2 with atom type "*"; (3A3) add bonds (a1, c1) and (c2, a2) with the same bond type as (a1, a2); (3A4) repeat step (3A) until all bonds have been analyzed; (4) construct the SMILES representation of the scaffold (for the non-canonical SOULS representation, scaf_sm = smiles(scaffold); for the canonical SOULS representation, scaf_sm = canonical_smiles(scaffold)), where "canonical_smiles" is any SMILES normalization algorithm; (5) start SOULS construction using the constructed SMILES representation of the scaffold (SOULS = scaf_sm); (6) for each atom in scaf_sm, if atom_type == "*": (6A) find the periphery P previously attached to that atom; (6B) append "." and the periphery's SMILES representation, written starting from its "*" atom, to the SOULS (SOULS = SOULS + "." + smiles(P, start_at = "*")); and (7) return the SOULS representation. Here, the periphery P is the same as the decoration described herein, i.e., a chemical moiety located at the periphery of the structure.
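Steps (3A1) to (3A3) can be illustrated on a simple graph encoding. The sketch below is an assumption-laden example rather than the disclosed code: the name `cut_decorations` and the dictionary encoding of atoms and bonds are hypothetical, and writing the resulting graphs out as SMILES (steps (4) to (6)) would additionally require a cheminformatics toolkit, which is not shown.

```python
def cut_decorations(atoms, bonds, scaffold):
    """Replace every scaffold-periphery bond by two bonds to new '*' nodes.

    atoms: {index: element symbol}; bonds: {(a1, a2): bond order};
    scaffold: set of atom indices assigned to the scaffold.
    """
    atoms = dict(atoms)  # work on a copy so the input graph is untouched
    new_bonds = {}
    next_index = max(atoms) + 1
    for (a1, a2), order in bonds.items():
        if (a1 in scaffold) != (a2 in scaffold):  # exactly one end in the scaffold
            c1, c2 = next_index, next_index + 1
            next_index += 2
            atoms[c1] = atoms[c2] = "*"   # (3A2) special nodes
            new_bonds[(a1, c1)] = order   # (3A3) same bond type as (a1, a2)
            new_bonds[(c2, a2)] = order
        else:
            new_bonds[(a1, a2)] = order   # bond kept unchanged
    return atoms, new_bonds


# Hypothetical input: toluene, aromatic ring atoms 0-5 and methyl carbon 6.
toluene_atoms = {0: "c", 1: "c", 2: "c", 3: "c", 4: "c", 5: "c", 6: "C"}
toluene_bonds = {(0, 1): 1.5, (1, 2): 1.5, (2, 3): 1.5, (3, 4): 1.5,
                 (4, 5): 1.5, (5, 0): 1.5, (0, 6): 1}

cut_atoms, cut_bonds = cut_decorations(toluene_atoms, toluene_bonds, {0, 1, 2, 3, 4, 5})
```

After the cut, the ring and the methyl group are disconnected components, each ending in its own "*" node.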
[0080] Figure 3B shows another example of an algorithm for converting a molecular line symbol (SMILES) to a SOULS representation; however, it should be recognized that any molecular line symbol can be used to generate a SOULS representation. Furthermore, the algorithm may start from a graphical representation when one is provided instead of a molecular line symbol. The algorithm of Figure 3B for converting SMILES to SOULS is as follows: (1) construct the molecular graph defined in the SMILES; (2) apply the Bemis-Murcko algorithm to assign each atom to the scaffold or the periphery (i.e., decoration); (3) for each bond (a1, a2) in the molecule, if ((a1 in the scaffold) and (a2 in the periphery)) or ((a2 in the scaffold) and (a1 in the periphery)): (3A) remove the bond (a1, a2); (3B) create a special node c with atom type "*"; (3C) add the bonds (a1, c) and (c, a2); (4) start the SOULS with some SMILES representation of the scaffold (SOULS = smiles(scaffold)); (5) for each atom in smiles(scaffold), if atom_type == "*": (5A) find the periphery P previously connected to that atom; (5B) append "." and the periphery's SMILES representation, written starting from its "*" atom, to the SOULS (SOULS = SOULS + "." + smiles(P, start_at = "*")); and (6) return the SOULS representation. Here, the periphery P is the same as the decoration described herein, i.e., a chemical moiety located at the periphery of the structure.
[0081] In some implementations, a single molecule can have multiple SOULS representations, depending on the order of the peripheral decorations, different basic line symbols, or different graph traversals when constructing the basic line symbols. A canonical SOULS is a SOULS representation of a molecule obtained by applying a normalization algorithm to the SOULS representation. When the basic line symbol is SMILES, an example normalization algorithm is as follows: (1) Normalize the first part of the SOULS using the SMILES normalization algorithm; (2) Change the order of the peripheral fragments of the second part of the SOULS accordingly; (3) For each peripheral fragment, apply the SMILES normalization process so that the connection point of the fragment becomes the first symbol after normalization.
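Step (2) of the normalization, reordering the peripheral fragments to follow the normalized scaffold, can be sketched as below. The helper `reorder_decorations` is hypothetical, and the permutation `new_marker_order` is assumed to be reported by whatever SMILES normalization routine is used in step (1); here it is written out by hand for the molecule of Example 1.

```python
def reorder_decorations(decorations, new_marker_order):
    """List decorations in the order their markers appear after normalization.

    new_marker_order[k] is the original rank of the marker that comes k-th
    in the normalized scaffold (assumed to be supplied by the normalizer).
    """
    if sorted(new_marker_order) != list(range(len(decorations))):
        raise ValueError("new_marker_order must be a permutation of the ranks")
    return [decorations[i] for i in new_marker_order]


# Example 1: the non-canonical scaffold lists methyl, chlorine, then the diol
# chain; in the canonical scaffold the chlorine marker comes first, the
# nitrogen marker second, and the methyl marker last.
original_decorations = ["*C", "*Cl", "*CC(O)CO"]
canonical_scaffold = "*c1ccc2c(c1)N(*)C(=O)C(*)O2"
canonical_souls = ".".join(
    [canonical_scaffold, *reorder_decorations(original_decorations, [1, 2, 0])]
)
```

The resulting string matches the canonical form listed for Example 1; step (3), normalizing each fragment so that its connection point comes first, is not shown.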
[0082] Examples of SMILES and SOULS representations for the same molecule are given below.
[0083] Example 1:
[0084] Standard SMILES: CC1Oc2ccc(Cl)cc2N(CC(O)CO)C1=O
[0085] SOULS:*C1Oc2ccc(*)cc2N(*)C1=O.*C.*Cl.*CC(O)CO
[0086] Standard SOULS: *c1ccc2c(c1)N(*)C(=O)C(*)O2.*Cl.*CC(O)CO.*C
[0087] Example 2:
[0088] Standard SMILES: CC1C2CCC(C2)C1CN(CCO)C(=O)c1ccc(Cl)cc1
[0089] SOULS:C(C1C(*)C2CC1CC2)N(*)C(c1ccc(cc1)*)=O.*C.*CCO.*Cl
[0090] Standard SOULS: *c1ccc(C(=O)N(*)CC2C3CCC(C3)C2*)cc1.*Cl.*CCO.*C
[0091] Example 3:
[0092] Standard SMILES: CCCS(=O)c1ccc2[nH]c(=NC(=O)OC)[nH]c2c1
[0093] SOULS:*c1cc2c(cc1)[nH]c([nH]2)=N*.*S(=O)CCC.*C(=O)OC
[0094] Standard SOULS: *N=c1[nH]c2ccc(*)cc2[nH]1.*C(=O)OC.*S(=O)CCC
[0095] Example 4:
[0096] SMILES: C#CC(C)(C)NC(=O)CN(c1cc(C)ccc1C)S(=O)(=O)c1ccccc1
[0097] Standard SOULS:
[0098] *c1ccc(*)c(N(*)S(=O)(=O)c2ccccc2)c1.*C.*C.*CC(=O)NC(C)(C)C#C
[0099] SOULS:
[0100] c1cc(*)c(cc1*)N(S(=O)(=O)c1ccccc1)*.*C.*C.*CC(=O)NC(C)(C)C#C
[0101] Example 5:
[0102] The SOULS of Example 1 can alternatively be represented as follows:
[0103] SOULS:*C1Oc2ccc(*)cc2N(*)C1=O.*C*Cl*CC(O)CO
[0104] SOULS:*C1Oc2ccc(*)cc2N(*)C1=O.*C|*Cl|*CC(O)CO
[0105] SOULS:*C1Oc2ccc(*)cc2N(*)C1=OC|Cl|CC(O)CO
[0106] SOULS:*C1Oc2ccc(*)cc2N(*)C1=OCCl.CC(O)CO
[0107] Furthermore, the SOULS representation can be used for molecules that contain only the scaffold sequence and have no decorations. The method for generating the SOULS representation likewise applies to molecules that consist only of a scaffold and have no decorations. Therefore, the first symbol or identifier in the SOULS representation does not necessarily have to be a decoration marker or an asterisk. The leading asterisk in each decoration provided here is not required, but it is useful for visualization, as most SMILES visualization tools will draw such a SOULS as a collection of fragments with distinct connection points.
[0108] The SOULS representation can be checked and verified to ensure it is generated correctly. The following set of conditions are used to verify SOULS:
[0109] The scaffold S and all peripheral decoration fragments [P_1,...,P_n] are valid SMILES;
[0110] The number of "*" atoms in the scaffold S is equal to the number of peripheral decoration fragments P;
[0111] Each peripheral decoration fragment P_i contains exactly one "*" atom; and / or
[0112] Each peripheral decoration fragment P_i contains at least one atom other than the "*" atom.
[0113] When SOULS is a canonical SOULS, another rule for valid SOULS is that each peripheral decoration fragment P_i begins with an asterisk (*).
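The string-level conditions above can be checked with a short routine such as the following sketch. The function name `check_souls` and its return format are assumptions for illustration. The first condition (that S and every P_i are valid SMILES) needs a SMILES parser and is deliberately left out, and the check assumes that "." is the separator and "*" the marker.

```python
def check_souls(souls, canonical=False, separator=".", marker="*"):
    """Return a list of rule violations; an empty list means no violation found.

    SMILES validity of the scaffold and fragments is NOT checked here.
    """
    scaffold, *fragments = souls.split(separator)
    problems = []
    if scaffold.count(marker) != len(fragments):
        problems.append("marker count in S differs from number of fragments")
    for i, fragment in enumerate(fragments, start=1):
        if fragment.count(marker) != 1:
            problems.append(f"P_{i} must contain exactly one marker")
        if not any(ch.isalpha() for ch in fragment):
            problems.append(f"P_{i} has no atom other than the marker")
        if canonical and not fragment.startswith(marker):
            problems.append(f"P_{i} must begin with the marker in canonical form")
    return problems


valid = check_souls("*C1Oc2ccc(*)cc2N(*)C1=O.*C.*Cl.*CC(O)CO", canonical=True)
count_mismatch = check_souls("*C1CC1.*C.*Cl")
empty_fragment = check_souls("*C1CC1.*")
not_canonical = check_souls("*C1CC1.C*", canonical=True)
scaffold_only = check_souls("c1ccccc1")
```

A scaffold-only string with no markers and no fragments passes, consistent with the statement that a SOULS representation need not contain any decoration.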
[0114] Figure 4A shows an example of a method for converting the SMILES representation to the SOULS representation. Step 1 involves creating a graphical representation of the molecule. Step 2 involves identifying the scaffold and peripheral decorations, where the scaffold is surrounded by solid lines and the peripheral decorations by dashed lines. Thus, there is a single scaffold and three peripheral decorations. Step 3 involves separating the scaffold from the peripheral decorations and adding markers (e.g., asterisks *).
[0115] Figure 4B shows a detailed example of the method used to convert the SMILES representation to the SOULS representation. Step 1 involves creating a graphical representation of the molecule. Step 2 involves applying the Bemis-Murcko algorithm to the graphical representation to identify the scaffold and the peripheral decorations, where the scaffold is surrounded by solid lines and the peripheral decorations are surrounded by dashed lines. Step 3 involves iterating through the molecular bonds one by one until a bond (a1, a2) between the scaffold and a decoration is found, as shown in the circle (e.g., between nitrogen and carbon). Step 4 involves creating new nodes c1 and c2 with atom type "*", removing bond (a1, a2), and adding bonds (a1, c1) and (c2, a2). Step 5 involves continuing through the molecular bonds one by one until the next such bond (a1, a2) is found, as shown in the circle (e.g., between phenyl and methyl). Step 6 involves creating new nodes c1 and c2 with atom type "*", removing bond (a1, a2), and adding bonds (a1, c1) and (c2, a2). Step 7 involves continuing through the molecular bonds one by one until the next such bond (a1, a2) is found, as indicated by the circle (e.g., between the phenyl group and the other methyl group). Step 8 involves creating new nodes c1 and c2 with atom type "*", removing the bond (a1, a2), and adding bonds (a1, c1) and (c2, a2). Step 9 involves generating line symbols (e.g., SOULS) for the scaffold, which involves iterating through the scaffold atoms until a connection point for a peripheral decoration is found. Step 10 involves adding the line symbols for the decoration to the scaffold line symbols. Step 11 involves repeating steps 9 and 10 for each peripheral decoration until completion, whereupon a SOULS representation is provided.
[0116] Furthermore, the protocol used to generate the SOULS representation can be reversed to generate a graphical representation and / or other line notation representations, such as SMILES. The reversed protocol can generate a graphical representation and then convert it to a line notation representation. Thus, the reversed protocol can be used to obtain either a graphical representation or a line notation representation from a SOULS representation.
[0117] Figure 5A shows an example method 500 for converting a SOULS representation to a line notation representation (e.g., SMILES representation). At box 502, method 500 may include obtaining a SOULS representation of the molecule. Then, at box 504, the SOULS representation is split into scaffold sequence line symbols and decoration sequence line symbols. Here, the decoration sequence includes at least one decoration, and each decoration is defined by a line symbol. Thus, the decoration sequence lists all peripheral decoration fragments. Then, at box 506, a graphical representation of the molecule is constructed using the scaffold line symbols and the individual decoration line symbols. Then, at box 508, the graphical representation is converted to a line notation (e.g., SMILES) representation.
[0118] Figure 5B shows an example method 510 for constructing a complete graphical representation of a molecule from a SOULS representation. Here, method 510 provides additional details for box 506. At box 512, method 510 may include constructing a graphical representation of the scaffold line symbols. Here, the graphical representation preserves the atomic ordering from the scaffold sequence. Then, at box 514, each atom in the graphical representation is analyzed to determine the connection points of the decorations. At box 516, for the connection points of the decorations in the scaffold, the correct decoration is identified and a graphical representation of that decoration is created. Then, at box 518, the connection points in the graphical representation of the decorations are identified. At box 520, the connection points in the scaffold graphical representation and the connection points in the decoration graphical representation are connected or added together. Then, at box 522, the scaffold atoms and decoration atoms adjacent to the connection points are identified. At box 524, the connection points are removed, and the identified adjacent scaffold atoms and the identified adjacent decoration atoms are connected together to form bonds. For each atom in the scaffold with a connection point and for each corresponding decoration, the method steps from boxes 518 to 524 are repeated. Once the entire molecule has been iterated to attach the decorations to the scaffold, a complete graphical representation of the molecule is generated at box 526. Then, at box 528, the graphical representation is converted to a line notation representation of the molecule (e.g., SMILES). This conversion from graphical to line representation can be performed in a manner known or developed in the art. For example, a complete SMILES representation is generated.
[0119] Figure 5C shows another example method 550 for converting a SOULS representation into a graphical representation or a different line notation representation. At box 552, method 550 may include obtaining a SOULS representation of the molecule. At box 554, the symbols separating the scaffold sequence from the decoration sequence are identified, and at box 556, the scaffold S is separated from the decorations P (e.g., peripheral decorations). At box 558, the graphical representations of the scaffold and of the decorations are generated. At box 560, the scaffold is analyzed to identify atom A in the scaffold as a connection point (*) for a decoration. At box 562, atom B in the decoration is identified as the corresponding connection point (*). At box 564, the neighboring atom A_neig adjacent to atom A in the scaffold is identified. At box 566, the neighboring atom B_neig adjacent to atom B in the decoration is identified. At box 568, atom A is removed from the scaffold and atom B is removed from the decoration. At box 570, the scaffold atom A_neig is connected to the decoration atom B_neig (e.g., with a bond of the same type as A-A_neig and B-B_neig). At box 572, once all decorations are connected to the scaffold, a graphical representation of the molecule is generated. Then, at box 574, the graphical representation of the molecule is converted into a line notation of the molecule (e.g., SMILES).
[0120] Figure 5D shows an example algorithm for converting a SOULS representation to a SMILES representation, but it should be recognized that any molecular line symbol can be generated from a SOULS representation. The algorithm of Figure 5D for converting SOULS to SMILES performs the following steps: (1) split the SOULS representation at the "." symbol to obtain the scaffold S and the list of peripheral fragments [P_1, P_2, P_3, ..., P_N]; (2) construct from the scaffold S a molecular graph MOL that preserves atomic order; (3) set i = 1; (4) for each atom A in MOL (in the atomic order taken from S), if atom A is a connection point (atom_type(A) == '*'): (4A) create the molecule P_MOL from the peripheral fragment P_i; (4B) find the connection point B in P_MOL (atom_type(B) == '*'); (4C) add P_MOL to MOL; (4D) define A_neig as the atom in molecule MOL connected to connection point A; (4E) define B_neig as the atom in fragment P_MOL connected to connection point B; (4F) remove connection points A and B; (4G) connect atoms A_neig and B_neig with a bond of the same type as bond (A, A_neig); (4H) set i = i + 1, repeating until all peripheral decorations are connected; (5) return the SMILES representation of molecule MOL.
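The graph-level part of this reverse conversion (locating A, B, A_neig, and B_neig, removing the connection points, and bonding the neighbors) can be sketched as follows. The names `bonded_neighbor`, `attach`, and `assemble` and the dictionary graph encoding are assumptions for this example; atom indices are assumed to follow the order of the scaffold sequence, and parsing SOULS fragments into graphs or writing the result as SMILES would require a cheminformatics toolkit that is not shown.

```python
def bonded_neighbor(bonds, point):
    """Return (neighbor, bond order) of a connection point with a single bond."""
    for (i, j), order in bonds.items():
        if point in (i, j):
            return (j if i == point else i), order
    raise ValueError(f"connection point {point} has no bond")


def attach(atoms, bonds, frag_atoms, frag_bonds, a):
    """Attach one fragment at connection point a; return the new (atoms, bonds)."""
    offset = max(atoms) + 1  # shift fragment indices to avoid collisions
    atoms = {**atoms, **{i + offset: s for i, s in frag_atoms.items()}}
    bonds = {**bonds,
             **{(i + offset, j + offset): o for (i, j), o in frag_bonds.items()}}
    b = next(i + offset for i, s in frag_atoms.items() if s == "*")
    a_neig, order = bonded_neighbor(bonds, a)
    b_neig, _ = bonded_neighbor(bonds, b)
    # Remove both connection points, then bond their former neighbors.
    atoms = {i: s for i, s in atoms.items() if i not in (a, b)}
    bonds = {k: o for k, o in bonds.items() if a not in k and b not in k}
    bonds[(a_neig, b_neig)] = order  # same type as bond (A, A_neig)
    return atoms, bonds


def assemble(scaffold_atoms, scaffold_bonds, fragments):
    """Attach fragments P_1..P_N to the scaffold markers in atomic order."""
    points = sorted(i for i, s in scaffold_atoms.items() if s == "*")
    if len(points) != len(fragments):
        raise ValueError("marker count and fragment count differ")
    atoms, bonds = dict(scaffold_atoms), dict(scaffold_bonds)
    for a, (frag_atoms, frag_bonds) in zip(points, fragments):
        atoms, bonds = attach(atoms, bonds, frag_atoms, frag_bonds, a)
    return atoms, bonds


# Hypothetical input: scaffold *C1C(*)C1 with fragments *Cl and *O.
s_atoms = {0: "*", 1: "C", 2: "C", 3: "*", 4: "C"}
s_bonds = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (2, 4): 1, (4, 1): 1}
frags = [({0: "*", 1: "Cl"}, {(0, 1): 1}), ({0: "*", 1: "O"}, {(0, 1): 1})]

mol_atoms, mol_bonds = assemble(s_atoms, s_bonds, frags)
```

The assembled graph contains no "*" nodes: chlorine ends up on the first ring carbon and oxygen on the second, in the order given by the scaffold markers.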
[0121] Figure 5E shows a detailed example of a method for converting a SOULS representation to a SMILES representation. Step 1 involves splitting the SOULS representation into a scaffold sequence S and separate peripheral decorations [P_1,P_2,...,P_N] by identifying at least one decoration separator, which is a period "." preceding a decoration line symbol. This splits the line symbol at each "." symbol and obtains the scaffold sequence S and peripheral decorations [P_1,P_2,...,P_N]. Step 2 involves creating a molecular graphical representation of the scaffold, and optionally of each peripheral decoration. Step 3 involves identifying a connection point for the peripheral decoration P1, which is methyl. Step 4 involves connecting the peripheral decoration P1 to the position on the molecule defined by the corresponding decoration marker "*". Step 5 involves identifying a connection point for the peripheral decoration P2, which is methyl. Step 6 involves connecting the peripheral decoration P2 to the position on the molecule defined by the corresponding decoration marker "*". Step 7 involves identifying a connection point for the peripheral decoration P3, which is N-(2-methylbut-3-yn-2-yl)propionamide. Step 8 involves attaching the peripheral decoration P3 to the position on the molecule defined by the corresponding decoration marker "*", which completes the graphical representation. Step 9 involves generating the SMILES representation from the molecular graphical representation.
[0122] In some implementations, the SOULS representation of a molecule can be used in a variety of computational systems, such as for describing molecules, and essentially for any computational processing protocol involving molecules. The SOULS representation is a general framework that can be applied to multiple domains, including machine learning. Machine learning protocols that can utilize the SOULS representation include representation learning, predictive modeling, generative modeling, property optimization (e.g., using Bayesian optimization), or supervised and unsupervised tasks in any general algorithm involving molecules.
[0123] In some implementations, for predictive modeling, SOULS representations can be used as a direct replacement for SMILES representations of molecules or any other line notation of molecules in a computational system or any computational protocol. SOULS representations are particularly useful in deep recurrent neural networks that operate on molecular chemical structures. Therefore, SOULS representations can be used to predict various properties of molecules, including biochemical properties (e.g., pharmacokinetic or pharmacodynamic) and physical properties (e.g., solubility, vaporization temperature, or others). This makes SOULS representations useful in many neural network architectures, including autoencoder-based networks for representation learning. Implementations can use SOULS representations as both the input and the output format for molecules. SOULS representations can also be used with other molecular representations, such as SMILES or graphs, or converted back and forth as needed by various computational protocols. For example, an encoder-decoder model can receive input represented in SOULS format and execute a computational protocol that produces output in SMILES format, and vice versa.
[0124] In some implementations, SOULS representations can be used in protocols for molecular property optimization that may include generating representations of molecules that satisfy a given set of properties (e.g., solubility or ease of synthesis) used for training models or computational criteria. The SOULS format partitions the scaffold and separates peripheral decorations for easy readability during computation, where the order of decoration markers within the scaffold sequence defines the presentation order of decoration line symbols, making scaffolds and decorations easier to associate. SOULS notations are also more human-readable because the order of decoration markers allows for easier tracking of defined decorations at specific connection points. Therefore, SOULS representations allow for optimization by finding molecules with high values of certain quality functions that are typically associated with certain scaffolds or decorations. Optimization of SOULS notations can be accomplished using various methods, including genetic algorithms, Bayesian optimization, and random search.
[0125] Genetic algorithms are a type of metaheuristic algorithm inspired by the process of natural selection and belong to the class of evolutionary algorithms (EA). They are commonly used to generate high-quality solutions to optimization and search problems by relying on biologically inspired operators such as mutation, crossover, and selection. The SOULS representation can be used to define chemical structures in any genetic algorithm.
[0126] Furthermore, the SOULS notation can be used to create analogs of chemical structures. The SOULS notation can be used to exchange different decorations for a specific scaffold, or to exchange scaffolds for different but similar structures with the same number and position of decorations. For example, to generate analogs, one possible mutation procedure is to replace random peripheral fragments with random peripheral fragments from other molecules. Therefore, the improved SOULS notation allows for easy replacement of substituents on the scaffold by switching or exchanging one decoration line symbol for another, which significantly simplifies the process.
[0127] The SOULS notation is configured for mutations and modifications of peripheral decorations. It thus provides a library of substituents for a molecule at specific positions: the order of the decoration markers in the scaffold sequence is identified, and the ordered decoration line notations are then traced within the decoration sequence. Similarly, the SOULS notation allows a scaffold sequence to be modified or substituted with a different sequence, enabling computers to process mutations by replacing a molecule's scaffold with a scaffold from a different molecule having the same number of peripheral fragments. Therefore, the SOULS notation can be used to create analogs of chemical scaffolds, as well as analogs with the same or related chemical scaffolds and a range of different substituent patterns.
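The decoration and scaffold exchanges described above reduce to simple string operations once the notation is split at its separators. The following is a hedged sketch only: the marker `*`, the separator `|`, and all function names are assumptions introduced for illustration and are not specified by this disclosure.

```python
import random

MARKER = "*"     # assumed decoration marker symbol
SEPARATOR = "|"  # assumed decoration separator symbol


def split_notation(souls):
    """Return (scaffold, [decorations]) for a SOULS-like string."""
    scaffold, *decorations = souls.split(SEPARATOR)
    return scaffold, decorations


def replace_decoration(souls, position, new_decoration):
    """Swap the decoration at `position` (marker order) for another one."""
    scaffold, decorations = split_notation(souls)
    decorations[position] = new_decoration
    return SEPARATOR.join([scaffold, *decorations])


def replace_scaffold(souls, new_scaffold):
    """Swap the scaffold for one with the same number of decoration markers."""
    _, decorations = split_notation(souls)
    if new_scaffold.count(MARKER) != len(decorations):
        raise ValueError("scaffold must carry one marker per decoration")
    return SEPARATOR.join([new_scaffold, *decorations])


def mutate_from_donor(souls, donor, rng):
    """Replace a random decoration with a random decoration of a donor molecule."""
    _, decorations = split_notation(souls)
    _, donor_decorations = split_notation(donor)
    position = rng.randrange(len(decorations))
    return replace_decoration(souls, position, rng.choice(donor_decorations))
```

For instance, `replace_decoration("*c1ccc(*)cc1|*C|*O", 1, "*N")` exchanges only the second substituent and leaves the scaffold sequence untouched.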
[0128] A scaffold-oriented line notation for a chemical structure includes: a scaffold sequence of atom identifiers arranged in a line notation that defines a scaffold of the chemical structure, the scaffold sequence including at least one decoration marker; a decoration separator following the last atom identifier or the last decoration marker in the scaffold sequence; and at least one decoration having at least one atom identifier in a line notation that defines the chemical structure of the decoration attached to the scaffold. The order of the decoration markers in the scaffold sequence defines the order of the decorations; a first atom identifier is adjacent to a first decoration marker in the scaffold sequence; the first decoration follows the first decoration separator; and the first decoration is attached to the first atom identifier.
[0129] In some embodiments, the scaffold-oriented line notation for a chemical structure may include: a scaffold sequence of multiple atom identifiers arranged in a line notation that defines the scaffold of the chemical structure of a molecule, wherein the scaffold sequence includes at least one decoration marker located at one of the following positions: before a first atom identifier of the scaffold sequence; before a subsequent / second atom identifier; or after a subsequent / second atom identifier; a decoration separator after the last atom identifier or the last decoration marker of the scaffold sequence; and at least one decoration having at least one atom identifier in a line notation that defines the chemical structure of a peripheral decoration attached to the scaffold of the molecule; wherein: in the scaffold sequence, the order of the at least one decoration marker defines the order of the at least one decoration; in the scaffold sequence, a first atom identifier is adjacent to a first decoration marker; among the at least one decoration, the first decoration follows a first decoration separator; and the first decoration is defined as being attached to the first atom identifier in the chemical structure of the molecule. In some aspects, the first decoration marker precedes the first connecting atom identifier of the scaffold sequence. The first connecting atom can be any atom, including the first atom or the last atom in the scaffold sequence or any atom between them.
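The structural rules above (a scaffold sequence with markers, a separator after the scaffold, and one decoration per marker in marker order) can be checked mechanically. The sketch below is illustrative only; the marker `*`, the separator `|`, and the names `Souls`, `parse_souls`, and `format_souls` are assumptions and do not come from this disclosure.

```python
from dataclasses import dataclass
from typing import Tuple

MARKER = "*"     # assumed decoration marker symbol
SEPARATOR = "|"  # assumed decoration separator symbol


@dataclass(frozen=True)
class Souls:
    scaffold: str                 # scaffold sequence, including markers
    decorations: Tuple[str, ...]  # decorations in marker order


def parse_souls(text: str) -> Souls:
    """Split a SOULS-like string and check it against the structural rules."""
    scaffold, *decorations = text.split(SEPARATOR)
    if not decorations:
        raise ValueError("expected at least one decoration after the separator")
    if scaffold.count(MARKER) != len(decorations):
        raise ValueError("number of markers and decorations must match")
    for decoration in decorations:
        # Each decoration is assumed to start with its marker, followed by
        # at least one atom identifier.
        if not decoration.startswith(MARKER) or len(decoration) < 2:
            raise ValueError(f"malformed decoration: {decoration!r}")
    return Souls(scaffold, tuple(decorations))


def format_souls(souls: Souls) -> str:
    """Inverse of parse_souls."""
    return SEPARATOR.join((souls.scaffold, *souls.decorations))
```

With these assumptions, `parse_souls("*c1ccc(*)cc1|*C|*O")` pairs the first marker with `*C` and the second with `*O`, which is the ordering rule stated in the paragraph above.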
[0130] In some implementations, the scaffold-oriented line notation includes: at least a second decoration marker adjacent to a second atom identifier; at least one second decoration separator after the first decoration; and at least one second decoration after the at least one second decoration separator, wherein each second decoration is separated by a second decoration separator.
[0131] In some implementations, the scaffold-oriented line notation includes: a plurality of decoration markers adjacent to corresponding atom identifiers; a plurality of decoration separators separated by a plurality of decorations; and each of the plurality of decorations follows a corresponding decoration separator.
[0132] In some implementations, each decoration includes a corresponding decorative mark followed by a line symbol of the chemical structure of the decoration.
[0133] In some implementations, each atom identifier is defined by the periodic table. In some aspects, each decoration marker is a single symbol. In some aspects, each decoration separator is a second symbol distinct from the decoration marker symbol. In some aspects, each decoration marker in the scaffold sequence is connected by a third symbol distinct from both the decoration marker symbol and the decoration separator symbol.
[0134] In some embodiments, a method for converting a line notation of a chemical structure into a scaffold-oriented line notation of the chemical structure may include: providing a line notation of the chemical structure; converting the line notation into a graph representation of the chemical structure; identifying the scaffold and at least one decoration in the graph representation of the chemical structure; separating the scaffold from the at least one decoration; converting the graph representation of the scaffold into a corresponding line notation of the scaffold; converting the graph representation of each decoration into a corresponding line notation of each decoration; identifying a first connecting atom in the scaffold connected to a first decoration; placing a first decoration marker adjacent to the first connecting atom; placing a first decoration separator after the last atom identifier or the last decoration marker of the scaffold sequence; placing the first decoration after the first decoration separator; and providing the scaffold-oriented line notation of the chemical structure. The first connecting atom may be any atom, including the first atom or the last atom in the scaffold sequence or any atom between them.
[0135] In some embodiments, the method includes: identifying each atom and each bond of the chemical structure of the molecule; identifying the scaffold of the chemical structure; identifying each decoration attached to a scaffold atom; identifying each bond between a decoration and the corresponding scaffold atom; and breaking each identified bond between a decoration and the corresponding scaffold atom.
[0136] In some embodiments, the method includes: replacing each broken bond with a scaffold node connected to the corresponding atom of the scaffold; and replacing each broken bond with a decoration node arranged on the corresponding decoration.
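The bond-breaking and node-insertion steps above can be sketched on a plain molecular graph. This is an illustration under assumptions, not the disclosed implementation: the graph is a dictionary of atom labels plus a list of bonds, the scaffold atoms are supplied by the caller (how the scaffold is identified is outside this sketch), and `*` is an assumed label for the inserted scaffold and decoration nodes.

```python
from itertools import count


def split_scaffold(atoms, bonds, scaffold_ids):
    """Break every scaffold-decoration bond and cap both ends with a '*' node.

    atoms: dict mapping atom id -> element label
    bonds: iterable of (i, j) atom id pairs
    scaffold_ids: set of atom ids that belong to the scaffold
    Returns (atoms, bonds, links), where links pairs each new scaffold node
    with the decoration node created from the same broken bond.
    """
    atoms = dict(atoms)
    fresh_id = count(max(atoms) + 1)
    kept, links = [], []
    for i, j in bonds:
        if (i in scaffold_ids) == (j in scaffold_ids):
            kept.append((i, j))  # bond lies inside the scaffold or a decoration
            continue
        scaffold_atom, decoration_atom = (i, j) if i in scaffold_ids else (j, i)
        scaffold_node, decoration_node = next(fresh_id), next(fresh_id)
        atoms[scaffold_node] = atoms[decoration_node] = "*"
        kept.append((scaffold_atom, scaffold_node))
        kept.append((decoration_atom, decoration_node))
        links.append((scaffold_node, decoration_node))
    return atoms, kept, links


# Toluene as a toy input: ring atoms 0-5 form the scaffold, atom 6 is the methyl.
toluene_atoms = {0: "c", 1: "c", 2: "c", 3: "c", 4: "c", 5: "c", 6: "C"}
toluene_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
new_atoms, new_bonds, links = split_scaffold(
    toluene_atoms, toluene_bonds, set(range(6))
)
```

After the call, the ring and the methyl group are separate connected components, each carrying one `*` node where the original bond used to be.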
[0137] In some embodiments, the method includes: constructing a line notation for the scaffold having a decoration marker for each scaffold node; and constructing a line notation for each decoration.
[0138] In some embodiments, the method includes: determining the order of the at least one decoration marker in the line notation of the scaffold; and arranging the at least one decoration in a decoration sequence that follows the order of the at least one decoration marker in the line notation of the scaffold, wherein each decoration has a decoration line notation and is separated by a decoration separator.
[0139] In some implementations, the method includes arranging a scaffold sequence such that a first decorative marker precedes a first connecting atom identifier in the scaffold sequence. The first connecting atom can be any atom, including the first or last atom in the scaffold sequence or any atom in between.
[0140] In some embodiments, the method includes arranging the line notation to include: at least one second decoration marker adjacent to a second connecting atom identifier; at least one second decoration separator after the first decoration; and at least one second decoration after the at least one second decoration separator, wherein each second decoration is separated by a second decoration separator.
[0141] In some implementations, the method includes arranging the line notation to include: a plurality of decoration markers adjacent to corresponding connecting atom identifiers; a plurality of decoration separators separated by a plurality of decorations; and each of the plurality of decorations following a corresponding decoration separator.
[0142] In some implementations, the method includes defining each decoration to include a corresponding decoration marker, followed by a line symbol of the decoration's chemical structure.
[0143] In some implementations, the method includes at least one of the following: each atom identifier is defined by the periodic table; each decoration marker is a symbol; each decoration separator is a second symbol different from the decoration marker symbol; or each decoration marker in the scaffold sequence is connected by a third symbol different from the decoration marker symbol and the decoration separator symbol.
[0144] In some implementations, a method for converting scaffold-oriented line symbols (e.g., SOULS) of a chemical structure into different line symbols (e.g., SMILES) of the chemical structure may include: providing scaffold-oriented line symbols for the chemical structure; splitting the scaffold-oriented line symbols into scaffold sequences and each decoration; constructing a graphical representation of the scaffold sequences; constructing a graphical representation of each decoration; combining the graphical representations of the scaffold sequences and each decoration to form a graphical representation of the molecule; and converting the graphical representation of the molecule into different line symbols.
[0145] In some implementations, the method (e.g., SOULS to SMILES) includes: identifying scaffold connection points on the graphical representation of the scaffold; identifying the scaffold atoms connected to each scaffold connection point; and removing each scaffold connection point.
[0146] In some implementations, the method (e.g., SOULS to SMILES) includes: identifying decoration connection points on the graphical representation of each decoration; identifying decoration atoms connected to the decoration connection points for each decoration; and removing each decoration connection point.
[0147] In some implementations, the method (e.g., SOULS to SMILES) includes: connecting each scaffold atom to a corresponding decoration atom with a bond; and providing a graphical representation of the chemical structure of the molecule.
[0148] In some implementations, the method (e.g., SOULS to SMILES) includes identifying a first decoration separator and each decoration separator between each decoration, the first decoration separator being located after the last atomic identifier or the last decoration mark.
[0149] In some implementations, the method (e.g., SOULS to SMILES) includes: identifying atom A in the scaffold, which defines the connection point to the decoration; identifying atom B in the decoration, which defines the connection point to the scaffold; identifying atom A_neig connected to atom A; identifying atom B_neig connected to atom B; removing atom A; removing atom B; and connecting atom A_neig to atom B_neig by a bond.
[0150] In some implementations, the method (e.g., SOULS to SMILES) includes: identifying each atom A in the scaffold, which defines a connection point to a decoration; identifying each atom B in a decoration, which defines a connection point to the scaffold; identifying each atom A_neig connected to each atom A; identifying each atom B_neig connected to each atom B; removing each atom A; removing each atom B; and connecting each atom A_neig to each corresponding atom B_neig by a bond.
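The A / B removal procedure above can be sketched on a plain graph as follows. The data layout (a dictionary of atom labels, a list of bonds, and a list of `(A, B)` pairs) and the function names are assumptions made for the example; only the sequence of steps follows the text.

```python
def sole_neighbor(bonds, node):
    """Return the single atom bonded to a connection-point node."""
    neighbors = [j if i == node else i for i, j in bonds if node in (i, j)]
    if len(neighbors) != 1:
        raise ValueError(f"connection point {node} must have exactly one neighbor")
    return neighbors[0]


def join_fragments(atoms, bonds, links):
    """Rebuild the molecule graph from a scaffold and its decorations.

    atoms: dict mapping atom id -> label; connection points are ordinary entries
    bonds: iterable of (i, j) pairs
    links: list of (A, B) pairs, A in the scaffold and B in a decoration
    """
    atoms = dict(atoms)
    bonds = [tuple(bond) for bond in bonds]
    for a, b in links:
        a_neig = sole_neighbor(bonds, a)  # scaffold atom bonded to A
        b_neig = sole_neighbor(bonds, b)  # decoration atom bonded to B
        # Remove A and B together with their bonds, then bond the neighbors.
        bonds = [bond for bond in bonds if a not in bond and b not in bond]
        del atoms[a], atoms[b]
        bonds.append((a_neig, b_neig))
    return atoms, bonds


# Toy input: a benzene scaffold with connection point 7 and a methyl
# decoration with connection point 8.
fragment_atoms = {0: "c", 1: "c", 2: "c", 3: "c", 4: "c", 5: "c",
                  6: "C", 7: "*", 8: "*"}
fragment_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
                  (0, 7), (6, 8)]
joined_atoms, joined_bonds = join_fragments(
    fragment_atoms, fragment_bonds, [(7, 8)]
)
```

In this toy case the result is the toluene graph: the two connection points are gone and ring atom 0 is bonded directly to the methyl carbon 6.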
[0151] In some embodiments, a method of computing with chemical structures may include: providing the scaffold-oriented line notation of a chemical structure according to one of the embodiments to a computing system; and using the computing system to execute a computational protocol with the scaffold-oriented line notation.
[0152] In some embodiments, a computer program product may include: a non-transitory tangible memory device having computer-executable instructions that, when executed by a processor, cause the execution of a method for converting a line notation of a chemical structure into a scaffold-oriented line notation of the chemical structure according to one of the embodiments.
[0153] In some implementations, a computer program product may include: a non-transitory tangible memory device having computer-executable instructions that, when executed by a processor, cause the execution of a method for converting a scaffold-oriented line notation of a chemical structure into a different line notation of the chemical structure.
[0154] Those skilled in the art will understand that, for this and other processes and methods disclosed herein, the functions performed in the processes and methods may be implemented in a different order. Furthermore, the outlined steps and operations are provided only as examples; some steps and operations may be optional, may be combined into fewer steps and operations, or may be expanded into additional steps and operations without departing from the essence of the disclosed embodiments.
[0155] This disclosure is not limited to the specific embodiments described herein, which are intended as illustrative of various aspects. Many modifications and variations can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. Based on the foregoing description, functionally equivalent methods and apparatuses within the scope of this disclosure, in addition to those listed herein, will be apparent to those skilled in the art. Such modifications and variations are intended to fall within the scope of the appended claims. This disclosure is limited only by the terms of the appended claims and the full scope of their equivalents. It should be understood that this disclosure is not limited to specific methods, reagents, compound compositions, or biological systems, which can certainly be varied. It should also be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not restrictive.
[0156] In one embodiment, the method may include aspects that are executed on a computing system. Therefore, the computing system may include a memory device having computer-executable instructions for performing the method. The computer-executable instructions may be part of a computer program product that includes one or more algorithms for performing any of the methods in any claim.
[0157] In one embodiment, any operation, process, method, or step described herein may be implemented as computer-readable instructions stored on a computer-readable medium. These computer-readable instructions may be executed by a processor of a variety of computing systems, including desktop computing systems, portable computing systems, tablet computing systems, handheld computing systems, and network elements, base stations, femtocells, and / or any other computing device.
[0158] There is little distinction between hardware and software implementations of aspects of systems; the use of hardware or software is generally (but not always, since in certain contexts the choice between hardware and software can become significant) a design choice representing a trade-off between cost and efficiency. The processes and / or systems and / or other technologies described herein can be implemented through a variety of vehicles (e.g., hardware, software, and / or firmware), and the preferred vehicle will vary with the context in which the processes and / or systems and / or other technologies are deployed. For example, if an implementer determines that speed and accuracy are paramount, the implementer may choose a mainly hardware and / or firmware vehicle; if flexibility is paramount, the implementer may choose a mainly software implementation; or the implementer may choose some combination of hardware, software, and / or firmware.
[0159] The foregoing detailed description has illustrated various implementations of the process using block diagrams, flowcharts, and / or examples. As such block diagrams, flowcharts, and / or examples contain one or more functions and / or operations, those skilled in the art will understand that each function and / or operation in such block diagrams, flowcharts, or examples can be implemented individually and / or collectively by various hardware, software, firmware, or virtually any combination thereof. In one implementation, several portions of the subject matter described herein can be implemented using application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or other integration formats. However, those skilled in the art will recognize that some aspects of the embodiments disclosed herein, in whole or in part, can be equivalently implemented in integrated circuits as one or more computer programs running on one or more computers (e.g., one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., one or more programs running on one or more microprocessors), firmware, or virtually any combination thereof, and that designing circuitry and / or writing code for the software and / or firmware will be entirely within the skill of those skilled in the art in light of this disclosure. Furthermore, those skilled in the art will understand that the mechanisms of the subject matter described herein can be distributed as program products in various forms, and regardless of the specific circumstances, the illustrative implementations of the subject matter described herein are applicable to the type of signal-bearing medium in which the distribution is actually performed. 
Examples of signal-bearing media include, but are not limited to, the following: recordable media, such as floppy disks, hard disk drives, CDs, DVDs, digital magnetic tapes, computer memory, etc.; and transmission-type media, such as digital and / or analog communication media (e.g., fiber optic cables, waveguides, wired communication links, wireless communication links, etc.).
[0160] Those skilled in the art will recognize that describing devices and / or processes in the manner described herein is common practice within the art, and that such devices and / or processes are subsequently integrated into data processing systems using engineering practice. That is, at least a portion of the devices and / or processes described herein can be integrated into data processing systems with a reasonable amount of experimentation. Those skilled in the art will recognize that typical data processing systems generally include one or more system unit housings, video display devices, memory (e.g., memory of volatile and non-volatile memories), processors (e.g., microprocessors and digital signal processors), computing entities (e.g., operating systems), drivers, graphical user interfaces and applications, one or more interactive devices, such as touchpads or screens, and / or control systems including feedback loops and control motors (e.g., feedback for sensing position and / or speed; control motors for moving and / or adjusting components and / or numbers). Typical data processing systems can be implemented using any suitable commercially available components, such as those commonly found in data computing / communication and / or network computing / communication systems.
[0161] The subject matter described herein sometimes illustrates different components contained within, or connected with, different other components. It should be understood that the depicted architectures are merely exemplary, and that many other architectures that achieve the same functionality can be implemented. In a conceptual sense, any arrangement of components that achieves the same functionality is effectively “associated” such that the desired functionality is achieved. Therefore, any two components combined herein to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermediate components. Likewise, any two components so associated can also be viewed as being “operably connected” or “operably coupled” to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable” to each other to achieve the desired functionality. Specific examples of operably couplable components include, but are not limited to, physically mateable and / or physically interacting components, and / or wirelessly interacting components, and / or logically interacting components.
[0162] Figure 6 shows an example computing device 600 arranged to perform any of the computing methods described herein. In a very basic configuration 602, the computing device 600 typically includes one or more processors 604 and system memory 606. A memory bus 608 is used for communication between the processors 604 and the system memory 606.
[0163] Depending on the required configuration, processor 604 can be of any type, including but not limited to microprocessors (μP), microcontrollers (μC), digital signal processors (DSPs), or any combination thereof. Processor 604 may include multi-level caches, such as L1 cache 610 and L2 cache 612, processor core 614, and registers 616. Example processor core 614 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. Example memory controller 618 may also be used with processor 604, or in some embodiments, memory controller 618 may be an internal part of processor 604.
[0164] Depending on the desired configuration, system memory 606 can be of any type, including but not limited to volatile memory (e.g., RAM), non-volatile memory (e.g., ROM, flash memory, etc.), or any combination thereof. System memory 606 may include operating system 620, one or more application programs 622, and program data 624. Application program 622 may include measurement application 626, which is arranged to perform the functions described herein, including those described with respect to the methods. Program data 624 may include measurement information 628 that can be used to analyze the contamination characteristics provided by sensor unit 240. In some embodiments, application program 622 may be arranged to operate together with program data 624 on operating system 620, thereby enabling the verification of work performed by untrusted computing nodes, as described herein. In Figure 6, the basic configuration 602 is depicted by the components within the inner dashed line.
[0165] Computing device 600 may have additional features or functions, as well as additional interfaces, to facilitate communication between basic configuration 602 and any desired devices and interfaces. For example, bus / interface controller 630 may be used to facilitate communication between basic configuration 602 and one or more data storage devices 632 via storage interface bus 634. Data storage device 632 may be removable storage device 636, non-removable storage device 638, or a combination thereof. Examples of removable and non-removable storage devices include disk devices such as floppy disk drives and hard disk drives (HDDs), optical disk drives such as compact disc (CD) drives or digital versatile disk (DVD) drives, solid-state drives (SSDs), and magnetic tape drives, etc. Example computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information, such as computer-readable instructions, data structures, program modules, or other data.
[0166] System memory 606, removable storage device 636, and non-removable storage device 638 are examples of computer storage media. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other storage technologies, CD-ROM, digital versatile disk (DVD) or other optical storage, cassette tape, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and is accessible by computing device 600. Any such computer storage medium may be part of computing device 600.
[0167] The computing device 600 may also include an interface bus 640 for facilitating communication from various interface devices (e.g., output device 642, peripheral interface 644, and communication device 646) via a bus / interface controller 630 to the basic configuration 602. Example output device 642 includes a graphics processing unit 648 and an audio processing unit 650, which can be configured to communicate with various external devices, such as displays or speakers, via one or more A / V ports 652. Example peripheral interface 644 includes a serial interface controller 654 or a parallel interface controller 656, which can be configured to communicate with external devices such as input devices (e.g., keyboards, mice, pens, voice input devices, touch input devices, etc.) or other peripheral devices (e.g., printers, scanners) via one or more input / output ports 658. Example communication device 646 includes a network controller 660, which can be arranged to facilitate communication with one or more other computing devices 662 over a network communication link via one or more communication ports 664.
[0168] A network communication link can be an example of a communication medium. A communication medium can typically be embodied in computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transmission mechanism, and can include any information delivery medium. A “modulated data signal” can be a signal having one or more characteristics set or altered in a manner that encodes information in the signal. For example, but not limited to, communication media can include wired media, such as wired networks or direct wired connections, and wireless media, such as acoustic, radio frequency (RF), microwave, infrared (IR), and other wireless media. The term computer-readable medium as used herein can include storage media and communication media.
[0169] The computing device 600 can be implemented as part of a small portable (or mobile) electronic device, such as a mobile phone, personal data assistant (PDA), personal media player device, wireless network watch device, personal headset device, application-specific device, or hybrid device including any of the above functions. The computing device 600 can also be implemented as a personal computer, including laptop and non-laptop configurations. The computing device 600 can also be any type of network computing device. The computing device 600 can also be an automated system as described herein.
[0170] The implementation methods described herein may include the use of dedicated or general-purpose computers that include various computer hardware or software modules.
[0171] Embodiments within the scope of this invention also include computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available medium accessible to a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, disk storage or other magnetic storage devices, or any other medium that can be used to carry or store the required program code means in the form of computer-executable instructions or data structures and that can be accessed by a general-purpose or special-purpose computer. When information is transmitted or provided to a computer via a network or other communication connection (hardwired, wireless, or a combination of hardwired and wireless), the computer correctly considers the connection as a computer-readable medium. Therefore, any such connection is properly referred to as a computer-readable medium. Combinations of the above should also be included within the scope of computer-readable media.
[0172] Computer-executable instructions include, for example, instructions and data that cause a general-purpose computer, a special-purpose computer, or a special-purpose processing device to perform a particular function or group of functions. Although the subject matter has been described in language specific to structural features and / or methodological behavior, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or behaviors described above. Rather, the specific features and behaviors described above are disclosed as exemplary forms for implementing the claims.
[0173] As used herein, the terms "module" or "component" can refer to a software object or routine that executes on a computing system. The different components, modules, engines, and services described herein can be implemented as objects or processes that execute on a computing system (e.g., as separate threads). While the systems and methods described herein are preferably implemented in software, implementation in hardware or a combination of software and hardware is also possible and contemplated. In this description, a "computing entity" can be any computing system as previously defined herein, or any module or combination of modules running on a computing system.
[0174] Regarding the use of any plural and / or singular terms in this document, those skilled in the art can translate them from plural to singular and / or from singular to plural depending on the context and / or application. For clarity, various singular / plural permutations may be explicitly described herein.
[0175] Those skilled in the art will understand that, in general, the terms used herein, and in particular in the appended claims (e.g., the bodies of the appended claims), are typically intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). Those skilled in the art will further understand that if a specific number of an introduced claim recitation is intended, that intent will be explicitly recited in the claim, and in the absence of such a recitation no such intent is present. For example, as an aid to understanding, the appended claims may use the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that introducing a claim recitation with the indefinite article “a” or “an” limits any particular claim containing that recitation to implementations containing only one such recitation, even when the same claim includes the introductory phrase “one or more” or “at least one” together with an indefinite article such as “a” or “an” (e.g., “a” and / or “an” should be interpreted to mean “at least one” or “one or more”); the same holds for definite articles used to introduce claim recitations. Furthermore, even when a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such a recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations).
Furthermore, where a convention analogous to “at least one of A, B, and C” is used, the construction is generally intended in the sense in which a person skilled in the art would understand the convention (e.g., “a system having at least one of A, B, and C” includes, but is not limited to, systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Where a convention analogous to “at least one of A, B, or C” is used, the construction is likewise intended in the sense in which a person skilled in the art would understand the convention (e.g., “a system having at least one of A, B, or C” includes, but is not limited to, systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). A person skilled in the art will further understand that any disjunctive word and/or phrase presenting two or more alternative terms, whether in the specification, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B”.
[0176] Furthermore, where features or aspects of this disclosure are described in terms of Markush groups, those skilled in the art will recognize that this disclosure is thereby also described in terms of any individual member or subgroup of members of the Markush group.
[0177] Those skilled in the art will understand that, for any and all purposes, such as providing a written description, all ranges disclosed herein also include any and all possible subranges and combinations of subranges thereof. Any listed range can be readily recognized as sufficiently describing the same range broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, a middle third, and an upper third, etc. As those skilled in the art will also understand, all language such as “at most,” “at least,” and the like includes the number recited and refers to ranges that can subsequently be broken down into subranges as described above. Finally, as those skilled in the art will understand, a range includes each individual member. Thus, for example, a group having 1-3 units means a group having 1, 2, or 3 units. Similarly, a group having 1-5 units means a group having 1, 2, 3, 4, or 5 units, and so on.
[0178] In summary, it should be understood that various embodiments of this disclosure have been described herein for ease of explanation, and various modifications may be made without departing from the scope and spirit of this disclosure. Therefore, the various embodiments disclosed herein are not intended to be limiting, and the true scope and spirit are indicated by the appended claims.
[0179] This patent application cross-references the following: U.S. Application No. 16/015,990, filed June 2, 2018; U.S. Application No. 16/134,624, filed September 18, 2018; U.S. Application No. 16/562,373, filed September 5, 2019; U.S. Application No. 62/727,926, filed September 6, 2018; U.S. Application No. 62/746,771, filed October 17, 2018; and U.S. Application No. 62/809,413, filed February 22, 2019; these applications are incorporated herein by reference in their entirety. All references cited herein are incorporated herein by reference in their entirety.
Claims
1. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause a system to perform operations, the operations comprising:
obtaining a chemical structure of a molecule with a computing system;
converting the chemical structure into a computer-readable scaffold-oriented line notation, the scaffold-oriented line notation comprising:
a scaffold sequence of a plurality of atom identifiers arranged in a line notation that defines a scaffold of the chemical structure of the molecule, wherein the scaffold sequence includes at least one decoration marker, each decoration marker being adjacent to the atom identifier of a connecting atom of the scaffold to which a decoration is attached, wherein the decoration is a chemical moiety bonded to the connecting atom of the scaffold in the chemical structure of the molecule;
a decoration separator following the last atom identifier or the last decoration marker in the scaffold sequence; and
at least one decoration having at least one atom identifier in a line notation that defines the chemical structure of the chemical moiety of the decoration that is attached to the scaffold of the molecule,
wherein:
the order of the at least one decoration marker in the scaffold sequence defines the order of the at least one decoration;
a first decoration of the at least one decoration follows a first decoration separator; and
the first decoration is linked to a first connecting atom identifier among the plurality of atom identifiers that run from a first atom identifier to a last atom identifier, wherein the first connecting atom is any atom of the scaffold sequence, including the first atom, the last atom, or any atom between them; and
executing, with the computing system, a computational protocol with the scaffold-oriented line notation.
2. The non-transitory computer-readable medium as claimed in claim 1, characterized in that the at least one decoration marker is located at one of the following positions:
before the first atom identifier of the scaffold sequence that is connected to the first decoration;
after the first atom identifier of the scaffold sequence that is connected to the first decoration;
before the first connecting atom identifier of the scaffold sequence that is connected to the first decoration, wherein the first connecting atom identifier is not the first atom identifier in the scaffold sequence;
after the first connecting atom identifier of the scaffold sequence that is connected to the first decoration;
before a subsequent atom identifier of the scaffold sequence that is connected to the first decoration; or
after the subsequent atom identifier of the scaffold sequence that is connected to the first decoration.
3. The non-transitory computer-readable medium as claimed in claim 1, characterized in that, in the scaffold sequence, the first connecting atom identifier of the plurality of atom identifiers is adjacent to a first decoration marker.
4. The non-transitory computer-readable medium of claim 1, comprising a first decoration marker preceding the first atom identifier of the scaffold sequence.
5. The non-transitory computer-readable medium of claim 2, comprising:
at least one subsequent decoration marker adjacent to the subsequent atom identifier;
at least one subsequent decoration separator after the first decoration; and
at least one subsequent decoration after the at least one subsequent decoration separator, each subsequent decoration being separated by a subsequent decoration marker.
6. The non-transitory computer-readable medium of claim 1, comprising:
a plurality of decoration markers, each adjacent to a corresponding atom identifier;
a plurality of decorations separated by a plurality of decoration separators; and
each of the plurality of decorations following its corresponding decoration separator.
7. The non-transitory computer-readable medium of claim 1, wherein each decoration includes a corresponding decoration marker followed by a line notation of the chemical structure of the decoration.
8. The non-transitory computer-readable medium as claimed in claim 1, characterized in that each atom identifier is defined by the periodic table.
9. The non-transitory computer-readable medium as claimed in claim 8, characterized in that each decoration marker is a symbol.
10. The non-transitory computer-readable medium as claimed in claim 9, characterized in that each decoration separator is a second symbol, distinct from the decoration marker symbol.
11. The non-transitory computer-readable medium as claimed in claim 10, characterized in that each decoration marker in the scaffold sequence is connected by a third symbol, different from the decoration marker symbol and the decoration separator symbol.
12. The non-transitory computer-readable medium as claimed in claim 1, characterized in that the operations include converting a line notation of the chemical structure of the molecule into the scaffold-oriented line notation of the chemical structure through the following steps:
obtaining, by a computer, the line notation of the chemical structure;
converting, by the computer, the line notation into a graph notation of the chemical structure;
identifying a scaffold in the graph notation of the chemical structure;
identifying any decoration attached to the scaffold in the graph notation of the chemical structure;
separating the scaffold from any decoration;
converting, by the computer, the graph notation of the scaffold into a corresponding line notation of the scaffold, wherein the line notation includes the plurality of atom identifiers arranged in the scaffold sequence;
converting the graph notation of any decoration into a corresponding line notation for each decoration;
when a first decoration is present and connected to a first connecting atom in the chemical structure, identifying the first connecting atom of the scaffold that is connected to the first decoration;
when the first connecting atom is identified, identifying the first connecting atom identifier of the first connecting atom in the scaffold sequence;
when the first decoration is present in the chemical structure, placing a first decoration marker adjacent to the first connecting atom identifier in the scaffold sequence;
placing a first decoration separator after the last atom identifier or the last decoration marker in the scaffold sequence;
when the first decoration is present in the chemical structure, placing the first decoration after the first decoration separator;
providing, by the computer, the chemical structure as the scaffold-oriented line notation; and
executing, by the computer, the computational protocol with the scaffold-oriented line notation.
13. The non-transitory computer-readable medium of claim 12, wherein the operations further comprise:
identifying at least one decoration in the graph notation of the chemical structure;
separating the scaffold from the at least one decoration;
converting the graph notation of each decoration into a corresponding line notation of each decoration;
identifying, in the scaffold sequence, the first connecting atom identifier that is connected to the first decoration of the at least one identified decoration;
placing the first decoration marker adjacent to the first connecting atom identifier;
placing the first decoration after the first decoration separator; and
providing the scaffold-oriented line notation of the chemical structure, wherein the scaffold-oriented line notation includes the scaffold sequence, having at least one decoration marker, and a decoration sequence, wherein the scaffold sequence and the decoration sequence are separated by the first decoration separator.
14. The non-transitory computer-readable medium as claimed in claim 13, characterized in that the scaffold-oriented line notation comprises:
a scaffold sequence of a plurality of atom identifiers arranged in a line notation that defines a scaffold of the chemical structure of the molecule, wherein the scaffold sequence includes at least one decoration marker, each decoration marker being adjacent to the atom identifier of a connecting atom of the scaffold to which a decoration is attached, wherein the decoration is a chemical moiety attached to the connecting atom of the scaffold in the chemical structure of the molecule;
a decoration separator following the last atom identifier or the last decoration marker in the scaffold sequence; and
at least one decoration having at least one atom identifier in a line notation that defines the chemical structure of the chemical moiety of the decoration that is connected to the connecting atom of the scaffold of the molecule,
wherein:
the order of the at least one decoration marker in the scaffold sequence defines the order of the at least one decoration;
a first decoration of the at least one decoration follows the first decoration separator; and
the first decoration is linked to the first connecting atom identifier among the plurality of atom identifiers that run from the first atom identifier to the last atom identifier.
15. The non-transitory computer-readable medium of claim 13, wherein the operations further comprise:
identifying each atom and each bond in the chemical structure of the molecule;
identifying the scaffold of the chemical structure;
identifying each decoration connected to an atom of the scaffold;
identifying each bond between the corresponding atoms of each decoration and the scaffold; and
breaking the identified bonds between the corresponding atoms of each decoration and the scaffold.
16. The non-transitory computer-readable medium of claim 15, wherein the operations further comprise:
replacing each broken bond with a scaffold node that is connected to the corresponding atom of the scaffold; and
replacing each broken bond with a decoration node arranged on each decoration.
17. The non-transitory computer-readable medium of claim 16, wherein the operations further comprise:
constructing a line notation of the scaffold having a decoration marker for each decoration node; and
constructing a line notation for each decoration.
18. The non-transitory computer-readable medium of claim 16, wherein the operations further comprise:
determining the order of the at least one decoration marker in the line notation of the scaffold; and
arranging the at least one decoration in a decoration sequence, the decoration sequence having the order of the at least one decoration marker in the line notation of the scaffold, wherein each decoration has a decoration line notation and is separated by a decoration separator.
19. The non-transitory computer-readable medium of claim 13, wherein the operations further comprise arranging the scaffold sequence such that the first decoration marker precedes the first connecting atom identifier of the scaffold sequence.
20. The non-transitory computer-readable medium of claim 13, wherein the operations further comprise arranging the line notation to include:
at least one subsequent decoration marker adjacent to a subsequent atom identifier;
at least one subsequent decoration separator following the first decoration; and
at least one subsequent decoration following the at least one subsequent decoration separator, each subsequent decoration being separated by a subsequent decoration marker.
21. The non-transitory computer-readable medium of claim 13, wherein the operations further comprise arranging the line notation to include:
a plurality of decoration markers, each adjacent to a corresponding atom identifier;
a plurality of decorations separated by a plurality of decoration separators; and
each of the plurality of decorations following its corresponding decoration separator.
22. The non-transitory computer-readable medium of claim 13, wherein the operations include defining each decoration as including a corresponding decoration marker followed by a line notation of the chemical structure of the decoration.
23. The non-transitory computer-readable medium of claim 13, wherein the operations further comprise at least one of the following:
each atom identifier is defined by the periodic table;
each decoration marker is a symbol;
each decoration separator is a second symbol distinct from the decoration marker symbol; or
each decoration marker in the scaffold sequence is connected by a third symbol, which is different from the decoration marker symbol and the decoration separator symbol.
24. The non-transitory computer-readable medium of claim 12, wherein the operations further comprise using the scaffold-oriented line notation as training data for a machine learning model.
25. The non-transitory computer-readable medium of claim 1, wherein the operations further comprise:
providing the scaffold-oriented line notation of the chemical structure to the computing system; and
executing, with the computing system, the computational protocol with the scaffold-oriented line notation, the computational protocol including at least one of a genetic algorithm, Bayesian optimization, and random search.
26. The non-transitory computer-readable medium of claim 12, wherein the operations further comprise executing, with a computing system, the computational protocol with the scaffold-oriented line notation, the computational protocol including at least one of a genetic algorithm, Bayesian optimization, and random search.
27. The non-transitory computer-readable medium of claim 1, wherein the operations further comprise:
training a machine learning model using the scaffold-oriented line notation of the chemical structure; and
performing a chemical analysis using the trained machine learning model.
28. The non-transitory computer-readable medium of claim 27, wherein the operations further comprise:
designing a generated chemical structure using the trained machine learning model; and
providing the generated chemical structure.
29. The non-transitory computer-readable medium of claim 27, wherein the operations further comprise:
determining a desired property of a chemical structure using the trained machine learning model;
generating a molecule using the trained machine learning model, wherein the generated molecule has the desired property; and
outputting the generated molecule.
30. The non-transitory computer-readable medium of claim 27, wherein the operations further comprise generating an analogue of the chemical structure for training the machine learning model by means of the following steps:
identifying at least one decoration on the chemical structure that is to be replaced by at least one different decoration;
modifying the chemical structure with the at least one different decoration to produce the analogue of the chemical structure; and
providing the analogue of the chemical structure.
31. The non-transitory computer-readable medium of claim 12, wherein the operations further comprise:
training a machine learning model using the scaffold-oriented line notation of the chemical structure; and
performing a chemical analysis using the trained machine learning model.
32. The non-transitory computer-readable medium of claim 31, wherein the operations further comprise:
designing a generated chemical structure using the trained machine learning model; and
providing the generated chemical structure.
33. The non-transitory computer-readable medium of claim 31, wherein the operations further comprise:
determining a desired property of a chemical structure using the trained machine learning model;
generating a molecule using the trained machine learning model, wherein the generated molecule has the desired property; and
outputting the generated molecule.
34. The non-transitory computer-readable medium of claim 31, wherein the operations further comprise generating an analogue of the chemical structure for training the machine learning model by means of the following steps:
identifying at least one decoration on the chemical structure that is to be replaced by at least one different decoration;
modifying the chemical structure with the at least one different decoration to produce the analogue of the chemical structure; and
providing the analogue of the chemical structure.
35. A computer program product comprising a non-transitory tangible storage device having computer-executable instructions that, when executed by a processor, cause performance of the operations recited in claim 12.
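By way of non-limiting illustration, the layout recited in claim 1 (a scaffold sequence carrying decoration markers, followed by decoration separators and decorations in marker order) and the decoration replacement recited in claims 30 and 34 might be sketched in Python as follows. The marker symbol "*", the separator symbol "|", the names ScaffoldNotation, parse, and replace_decoration, and the example string are all assumptions made for this sketch; the claims require only that the marker and the separator be distinct symbols. The sketch handles the sequences as plain text and does not check chemical validity.

```python
from dataclasses import dataclass
from typing import Tuple

MARKER = "*"     # assumed decoration marker symbol (hypothetical choice)
SEPARATOR = "|"  # assumed decoration separator symbol (hypothetical choice)


@dataclass(frozen=True)
class ScaffoldNotation:
    scaffold: str                 # scaffold sequence with one MARKER per decoration
    decorations: Tuple[str, ...]  # decoration line notations, in marker order

    def compose(self) -> str:
        """Write the scaffold, then each decoration after its own separator."""
        return SEPARATOR.join((self.scaffold,) + self.decorations)


def parse(text: str) -> ScaffoldNotation:
    """Split a composed string and check that markers and decorations pair up."""
    scaffold, *decorations = text.split(SEPARATOR)
    if scaffold.count(MARKER) != len(decorations):
        raise ValueError("number of markers and number of decorations differ")
    return ScaffoldNotation(scaffold, tuple(decorations))


def replace_decoration(notation: ScaffoldNotation, index: int, new: str) -> ScaffoldNotation:
    """Return an analogue whose decoration at position `index` is swapped for `new`."""
    decorations = list(notation.decorations)
    decorations[index] = new
    return ScaffoldNotation(notation.scaffold, tuple(decorations))


# Hypothetical example: a ring scaffold with two markers and two decorations.
original = parse("*c1ccc(*)cc1|C|Cl")
analogue = replace_decoration(original, 1, "Br")
print(analogue.compose())  # *c1ccc(*)cc1|C|Br
```

Because the decoration at a given position corresponds to the marker at the same position in the scaffold sequence, swapping one entry changes a single substituent while leaving the scaffold sequence untouched.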