Prioritized biological targets

By receiving category levels input by users, the system uses machine learning methods to determine the consistency between biological targets and levels, automatically prioritizing biological targets. This solves the problem of biological target identification in existing technologies, which is time-consuming, expensive, and susceptible to human bias, and achieves efficient and accurate biological target prioritization.

CN114762049B · Active · Publication Date: 2026-06-12 · BENEVOLENTAI TECH LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BENEVOLENTAI TECH LTD
Filing Date
2020-11-27
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

The process of identifying suitable biological targets in existing technologies is time-consuming, expensive, and susceptible to human bias, leading to inaccurate results.

Method used

By receiving category levels input by users, machine learning methods are used to determine the degree of consistency between biological targets and each level, and biological targets are automatically prioritized based on the degree of consistency, outputting prioritized biological target representations.

Benefits of technology

It reduces the time and cost of manual analysis, eliminates human bias, enables the review of larger data sources, uncovers potential results, and improves the accuracy of results.

✦ Generated by Eureka AI based on patent content.

Abstract

A computer-implemented method of prioritizing biological targets is disclosed. The method includes receiving a selection of levels of one or more categories; and determining, for each of a plurality of biological targets, a degree of consistency of the biological target with each selected level. The method further includes prioritizing the biological targets based on the degree of consistency; and outputting a representation of one or more prioritized biological targets.

Description

[0001] This application relates to systems and methods for prioritizing biological targets. The currently disclosed techniques find specific applications in the fields of biochemistry and drug discovery where biological targets with certain characteristics may be required.

Background

[0002] In the field of drug discovery, it is necessary to identify suitable biological targets, such as genes, nucleic acid sequences, proteins, amino acid sequences, protein complexes, or biological pathways for treating diseases. Typically, potentially suitable biological targets are reviewed by scientific experts in the field. They manually review data sheets related to the target and rank or otherwise prioritize them according to desired criteria. For example, scientists might manually review data related to the severity and incidence of side effects associated with a biological target. Other categories to be reviewed and considered may include druggability, other safety aspects, and whether there is a known association between the target and successful treatment of the disease. This manual review process is time-consuming and expensive, and the results can be affected by human bias or error.

[0003] Therefore, there is a need for an improved technique for identifying suitable biological targets that does not require users to manually review biological target data.

[0004] The embodiments described below are not limited to implementations that address any or all of the drawbacks of the known methods described above.

Summary of the Invention

[0005] This summary is provided to introduce the selection of concepts in a simplified form, which will be further described in the detailed description below. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to define the scope of the claimed subject matter.

[0006] In a first aspect, this disclosure provides a computer-implemented method for prioritizing biological targets, the method comprising: receiving a selection of levels of one or more categories; determining, for each of a plurality of biological targets, a degree of consistency between the biological target and each selected level; prioritizing the biological targets based on the degree of consistency; and outputting a representation of one or more prioritized biological targets.

[0007] Optionally:

• a level of a category represents a numerical value or a range of values for that category;
• selected levels within one of the categories are not adjacent to each other;
• the selection of levels includes at least two levels of the same category;
• a category represents a property of the biological target;
• the method includes receiving user input that includes the selection of levels of one or more categories;
• the degree of consistency between a biological target and a selected level includes the probability that the biological target belongs to the selected level;
• the probability corresponds to a normalized distribution across all levels of the same category;
• the method includes determining the degree of consistency from one or more data sources;
• the method includes aggregating the degree of consistency from gradings based on the respective data sources;
• the method includes determining the degree of consistency using a trained machine learning grader;
• the biological targets include genes, nucleic acid sequences, proteins, amino acid sequences, protein complexes, and/or biological pathways;
• prioritizing the biological targets includes identifying biological targets that match the user input by applying a minimum required degree of consistency to each selected level;
• the method includes determining a confidence metric for the degree of consistency and, optionally, ranking the biological targets that match the user input based on the confidence metric;
• the method includes determining the confidence metric using machine learning techniques;
• prioritizing the biological targets includes ranking them based on their degree of consistency with the selected levels;
• the user input includes an indication of the relative importance of the categories, and prioritizing the biological targets includes using that indication;
• the method includes outputting a representation of the biological targets that match the user input;
• the method includes outputting a representation of the ranking;
• the method includes outputting a representation of the confidence metric;
• the method includes providing a graphical user interface as an input and/or output tool;
• the method includes providing a user input tool that enables a user to generate a manual labeling command to override at least a portion of the output, the manual labeling command specifying whether one of the biological targets belongs to one of the levels;
• the method includes training the grader based on the manual labeling command and/or adding the override to a set of training data.

[0008] In a second aspect, this disclosure provides a computer-readable medium for storing code that, when executed by a computer, causes the computer to perform the method described in any of the preceding claims.

[0009] In a third aspect, this disclosure provides a system for prioritizing biological targets, the system comprising: an input module configured to receive a selection of levels of one or more categories; an analysis module configured to determine, for each of a plurality of biological targets, the degree of consistency between the biological target and each selected level; a prioritization module configured to prioritize the biological targets based on the degree of consistency; and an output module configured to output representations of one or more prioritized biological targets.

[0010] The methods described herein can be implemented by software in machine-readable form on a tangible storage medium, for example as a computer program comprising computer program code adapted to perform all steps of any method described herein when the program is run on a computer, where the computer program can be embodied on a computer-readable medium. Examples of tangible (or non-transitory) storage media include disks, thumb drives, memory cards, etc., but exclude propagated signals. The software can be adapted to execute on a parallel or serial processor, such that the method steps can be carried out in any suitable order or simultaneously.

[0011] This application acknowledges that firmware and software can be valuable, separately tradable commodities. It is intended to cover software that runs or controls “dumb” or standard hardware to implement the required functions. It is also intended to cover software that “describes” or defines the configuration of hardware, such as HDL (Hardware Description Language) software used to design silicon chips or configure general-purpose programmable chips to implement the required functions.

[0012] Preferred features can be appropriately combined, as will be apparent to those skilled in the art, and can be combined with any aspect of the invention.

Description of the Drawings

[0013] Embodiments of the invention will be described by way of example with reference to the following figures, wherein:

[0014] Figure 1 is a block diagram of a system for prioritizing biological targets according to an embodiment of the present invention;

[0015] Figure 2 is a flowchart of a method that can be implemented by the system of Figure 1, according to an embodiment of the present invention;

[0016] Figure 3 is a block diagram of the analysis module of the system, showing optional features;

[0017] Figure 4 is a block diagram showing example data sources that the system can use;

[0018] Figure 5 is a block diagram of the prioritization module of the system, showing optional features;

[0019] Figure 6 is a block diagram illustrating an example implementation of the system;

[0020] Figure 7 is a block diagram of a variant of the system according to another embodiment of the present invention; and

[0021] Figure 8 is a block diagram of a computer suitable for implementing embodiments of the present invention.

[0022] Common reference numerals are used in all the accompanying drawings to indicate similar features.

Detailed Description

[0023] The following description of embodiments of the invention is by way of example only. These examples represent the best mode of carrying out the invention as currently known to the applicant, although they are not the only ways in which this can be achieved. The description illustrates the function of the examples and the order of steps for constructing and operating the examples. However, the same or equivalent function and order can be accomplished by different examples.

[0024] In the field of biochemistry, the task of developing new therapies for diseases often involves attempting to identify suitable biological targets, such as genes, nucleic acid sequences, proteins, amino acid sequences, protein complexes, or biological pathways that can interact with drugs. For the avoidance of ambiguity, in this document, “nucleic acid sequence” includes deoxyribonucleic acid (DNA) (including genes) and ribonucleic acid (RNA), and “biological target” includes biomolecules, complexes, or pathways that can be targeted by drugs to treat diseases. To identify suitable biological targets from a large pool of potential candidates, an evaluation of their characteristics and a decision-making process can be implemented regarding which candidates meet a desired set of criteria. Depending on the context and purpose of the required biological target, the desired characteristics can span multiple categories, such as ligandability, safety, and therapeutic evidence, all of which must be considered; therefore, the search requires considering multiple properties of candidates simultaneously. In traditional methods, this complex analysis is performed manually by scientists who review data related to potential biological targets and screen candidates for potential matches of the desired characteristics. In cases involving multiple categories and a large number of potential biological targets, this manual analysis is very time-consuming and tends to lead to delays and increased costs in the process of developing new therapies for diseases.

[0025] The inventors have recognized the need for a system that eliminates the burden on scientists to manually review potential biological targets and assists the process by automatically generating outputs of biological targets that have been prioritized in a reasonable manner based on user-specified criteria.

[0026] The system according to the invention for automated prioritization of biological targets is associated with a number of advantages. Such a system not only saves time but also eliminates the potential dangers of human bias in decision-making that could limit or distort outcomes. Therefore, it can produce results that manual review of available information might miss. Furthermore, the automated system's ability to review larger data sources in ways different from humans also increases the likelihood of producing results that human experts might not find.

[0027] Figure 1 illustrates a system 100 for prioritizing biological targets based on user input, according to an embodiment of the present invention. It will be understood that in other embodiments, the prioritization may be based on predetermined or automatically generated criteria rather than on user input. In the embodiment of Figure 1, system 100 is configured to receive user input 102 at input module 104. User input 102 relates to desired characteristics of a biological target, which may include numerical values or ranges of values expressed as permissible levels within various categories. Categories may represent physical, chemical, or biological properties of the biological target, or other classifications. Such other classifications may represent considerations such as how well established the target's association with a specific disease is. The permissible levels for a category represent the numerical values or ranges of values that the user will accept for that category. In this example, input module 104 may suitably include a graphical user interface configured to receive user selections of permissible levels across one or more categories, where the categories may also be user-selected. Values for some categories (such as solubility) may be numerical, while values for other categories may be expressed using words such as "safe" or "unsafe." By specifying permissible levels for one or more categories, the user indicates to the system the desired characteristics of the biological target. When a user selects multiple categories and permissible levels, the complex analysis across those categories can be performed automatically, relieving the user of the burden of manual analysis.

[0028] A non-limiting list of example categories includes the following:

[0029] • Ligandability – An assessment of the likelihood of the existence or potential generation of small molecule modulators that can effectively interact with the biological target.

[0030] • Safety – Assessment of the likelihood that modulating the target may lead to serious adverse clinical events.

[0031] • Therapeutic evidence – An assessment of whether modulating the target is known to treat the relevant disease.

[0032] • Biological principles – An assessment of the likelihood that abnormal regulation of the target leads to disease.

[0033] • Target expression – A measure of whether a target is expressed in the relevant tissue / cell type and / or differentially expressed in the relevant healthy and diseased tissue / cell types.

[0034] • Stratifiability – A measure of whether a target is expressed differentially between or across different endotypes, which can be defined by clinical characteristics or latent variables. For example, latent variables can provide a measure of whether a target is expressed differentially across different patient endotypes, which can be defined by clinical or biological data. In this case, expression differences occur between disease subgroups, and within a specific disease endotype, the target can be consistently expressed within predefined boundaries. In another example, latent variables can provide a measure of whether a target is expressed differentially within a patient endotype, which can be defined by clinical or biological data. In this case, expression differences occur within disease subgroups, causing the target to be expressed in multiple ways within a single endotype of interest.

[0035] System 100 includes an analysis module 106 configured to determine the degree of consistency between each of a plurality of biological targets and each selected level input by the user. The degree of consistency between a biological target and a selected level provides a measure of the probability that the biological target belongs to the selected level. Therefore, determining the degree of consistency provides a deeper understanding of how well each biological target matches the user-specified criteria. For example, the degree of consistency can be expressed as a percentage or a probability value, or in other suitable examples, as words such as “high probability” or “low probability.” In the case where the degree of consistency is expressed as a numerical probability, the probability can correspond to a distribution, such as a normalized probability distribution across all levels of the same category. Using data from one or more data sources 108, the analysis module 106 can query a large number (e.g., hundreds, thousands, or hundreds of thousands) of biological targets.
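The normalized distribution mentioned above can be illustrated with a minimal sketch. It assumes that some upstream step has already produced a non-negative raw score per level; the function name `normalize_levels` and the example scores are hypothetical and do not come from the patent.

```python
from typing import Dict


def normalize_levels(raw_scores: Dict[int, float]) -> Dict[int, float]:
    """Scale non-negative per-level scores so they sum to 1 within one category."""
    if any(score < 0 for score in raw_scores.values()):
        raise ValueError("scores must be non-negative")
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("at least one level needs a positive score")
    return {level: score / total for level, score in raw_scores.items()}


# Hypothetical raw scores for one target in one four-level category.
safety = normalize_levels({1: 1.0, 2: 3.0, 3: 4.0, 4: 2.0})
```

Because the resulting values sum to 1 across the levels of a category, the probabilities of several selected levels can be added to give the probability that the target belongs to any one of them.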

[0036] System 100 includes a prioritization module 110 configured to prioritize biological targets based on the degree of consistency. For example, biological targets with good consistency with a user-selected level may be considered to match user requirements and may be prioritized over non-matching biological targets. Alternatively or additionally, biological targets may be prioritized by ordering them according to proximity to user requirements and/or according to the confidence level of the match between the biological targets and user-specified criteria. In this context, prioritization means any form of organization, classification, or labeling of biological targets according to how well they meet the user-specified criteria, as measured by the degree of consistency. Details of the prioritization module 110 are described below with reference to Figure 5.

[0037] Finally, system 100 includes an output module 112 configured to output a representation of one or more prioritized biological targets 114. This may include at least some biological targets in sorted order, for example, the top ten biological targets whose properties most closely match the allowed levels specified in user input 102. Alternatively, all biological targets deemed to meet the user's requirements may be reported, or any other suitable reporting format may be provided, consisting of biological targets organized, categorized, or labeled by the prioritization module.

[0038] Therefore, this disclosure extends to a system 100 for prioritizing biological targets based on user input. System 100 includes: an input module 104 configured to receive user input including the selection of levels for one or more categories; an analysis module 106 configured to determine the degree of consistency between each of a plurality of biological targets and each selected level; a prioritization module 110 configured to prioritize biological targets based on the degree of consistency; and an output module 112 configured to output a representation of one or more prioritized biological targets. This disclosure also extends to systems where the selection of levels for one or more categories is not based on user input, but may, for example, be predetermined or automatically generated.

[0039] This disclosure also extends to a computer-implemented method 200 for prioritizing biological targets based on user input. Method 200 includes: receiving 202 user input, the user input including a selection of levels for one or more categories; determining 204, for each of a plurality of biological targets, a degree of consistency between the biological target and each selected level; prioritizing 206 the biological targets based on the degree of consistency; and outputting 208 a representation of the one or more prioritized biological targets. This disclosure also extends to the method in which the selection of levels for one or more categories is not based on user input, but may, for example, be predetermined or automatically generated.
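The four steps of method 200 can be sketched as a single function. This is only an illustration under stated assumptions: the name `run_method_200`, the pluggable `degree_of_consistency` callable, the toy probabilities, and the choice to order targets by their weakest category are all hypothetical and are not taken from the patent.

```python
from typing import Callable, Dict, List, Set, Tuple

Selection = Dict[str, Set[int]]   # category name -> selected levels
Scores = Dict[str, float]         # category name -> degree of consistency


def run_method_200(
    selection: Selection,
    targets: List[str],
    degree_of_consistency: Callable[[str, str, Set[int]], float],
) -> List[Tuple[str, Scores]]:
    """Steps 202-208 as one function; `selection` is the input received at 202."""
    if not selection:
        raise ValueError("at least one category must be selected")
    # 204: score every target against the selected levels of every category.
    scored = [
        (target, {cat: degree_of_consistency(target, cat, levels)
                  for cat, levels in selection.items()})
        for target in targets
    ]
    # 206: ASSUMPTION - order by each target's weakest category, best first.
    scored.sort(key=lambda item: min(item[1].values()), reverse=True)
    # 208: the ordered list is the output representation.
    return scored


# Hypothetical per-level probabilities, for illustration only.
TOY_PROBS = {
    ("gene_a", "safety"): {1: 0.1, 2: 0.1, 3: 0.3, 4: 0.5},
    ("gene_b", "safety"): {1: 0.6, 2: 0.2, 3: 0.1, 4: 0.1},
}


def toy_consistency(target: str, category: str, levels: Set[int]) -> float:
    """Sum the probabilities of the selected levels for one target and category."""
    probs = TOY_PROBS[(target, category)]
    return sum(probs[level] for level in levels)


ranked = run_method_200({"safety": {3, 4}}, ["gene_b", "gene_a"], toy_consistency)
```

Passing the scoring step in as a callable keeps the sketch independent of how the degree of consistency is actually obtained, whether from a lookup table as here or from a trained model.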

[0040] As described above, the analysis module 106 is configured to determine the degree of consistency between each of a plurality of biological targets and each permitted level already selected by the user. This creates a metric of the degree to which each biological target conforms to the requirements already specified by the user, allowing biological targets to be subsequently prioritized based on their level of conformity to the user's requirements. As described above, the degree of consistency between one biological target and its respective permitted level can take the form of the probability that the biological target belongs to its respective permitted level.

[0041] In one example, user input 102 could include a selection of permitted levels across the categories of safety and biological principles. Within the safety category, the user may have selected the levels at which modulation of the biological target is known not to cause, or is predicted not to cause, serious adverse clinical events. Within the biological principles category, the user may have selected the levels at which abnormal regulation of the target is known not to cause, or is predicted not to cause, disease. Example user selections for these levels are shown in Table 1 below.

[0042]

[0043]

[0044] Table 1

[0045] In this example, the analysis module 106 is configured to determine the consistency of each of a plurality of biological targets with levels 3 and 4 of the safety category, and with levels 3 and 4 of the biological principles category. This can be achieved by referencing one or more data sources containing data related to the biological targets, which can be used to determine the characteristics of the biological targets and thereby infer their degree of conformity with the user-selected levels.

[0046] Therefore, the analysis module 106 can perform the role of a classifier by querying data from one or more data sources to assign each biological target a probability of belonging to each level of each category. In a suitable example, the probability is normalized across the levels of a given category. For example, various probabilistic classifier methods (such as naive Bayes, logistic regression, or support vector machines) can be used to assign probabilities to each category level. Alternatively or additionally, machine learning methods, such as a trained machine learning classifier, can be used to determine the degree of consistency between the biological target and the user requirements. Therefore, the analysis module 106 may suitably include a machine learning grader 304, as shown in Figure 3.

[0047] If multiple data sources are used, then analysis module 106 can be configured to generate probabilities that take into account the combination of those data sources. The use of multiple data sources can result in multiple classifications for the same category, so some form of aggregation may be needed to return a final classification for a given biological target and a given category. This can be implemented in various ways, such as by determining a weighted average of the classifications across all available data sources. In this case, analysis module 106 may suitably include an aggregation module 302, as shown in Figure 3, configured to generate the degree of consistency by aggregating the gradings derived from the respective data sources.
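The weighted-average aggregation mentioned above can be sketched as follows. The function name `aggregate_sources`, the source weights, and the example distributions are hypothetical; the patent does not specify how weights are chosen.

```python
from typing import Dict, List, Tuple

LevelProbs = Dict[int, float]  # level -> probability within one category


def aggregate_sources(per_source: List[Tuple[LevelProbs, float]]) -> LevelProbs:
    """Weighted average of per-source level probabilities for one target and category."""
    total_weight = sum(weight for _, weight in per_source)
    if total_weight <= 0:
        raise ValueError("source weights must sum to a positive value")
    levels = set()
    for probs, _ in per_source:
        levels.update(probs)
    # A level missing from a source is treated as probability 0 for that source.
    return {
        level: sum(probs.get(level, 0.0) * weight for probs, weight in per_source)
        / total_weight
        for level in sorted(levels)
    }


# Hypothetical gradings of one target from two data sources.
literature = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
database = {1: 0.3, 2: 0.2, 3: 0.3, 4: 0.2}
combined = aggregate_sources([(literature, 3.0), (database, 1.0)])
```

If every input distribution sums to 1, the weighted average also sums to 1, so the aggregated result remains a normalized distribution across the levels of the category.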

[0048] A confidence score can be determined during grading to indicate the level of confidence in the degree of consistency determined between the biological target and the user requirements. The confidence score can be inferred from the distribution of results from multiple data sources and/or from a machine learning model (in the case where analysis module 106 includes a machine learning grader 304) or from any other suitable computational method.
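One simple way to infer a confidence score from the spread of results across data sources is sketched below. The heuristic (one minus the range of the reported probabilities) and the name `agreement_confidence` are assumptions chosen for illustration; the patent does not prescribe a formula.

```python
from typing import List


def agreement_confidence(probabilities: List[float]) -> float:
    """Return 1.0 when every source reports the same probability, lower as they diverge.

    ASSUMPTION: each input is a probability in [0, 1] that one target belongs to
    one level, as reported by one data source.
    """
    if not probabilities:
        raise ValueError("need at least one source")
    return 1.0 - (max(probabilities) - min(probabilities))


unanimous = agreement_confidence([0.7, 0.7, 0.7])
divided = agreement_confidence([0.9, 0.6])
```

With inputs in the range 0 to 1 the result also lies in that range, so it can be reported as a percentage or mapped to labels such as high, medium, or low.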

[0049] Referring to Figure 4, it will be understood that, in the case of machine learning methods, one or more data sources 108 can be used to train the machine learning grader 304. These data sources 108 may include biomedical literature 402, at least one biomedical database 404, and predictions 406 related to the characteristics and properties of biological targets. When ingesting biomedical literature 402, the machine learning grader 304 can be configured to examine the text of the literature to determine the probable level for a given target within a given category. For example, if a drug target is frequently mentioned in biomedical literature together with words indicating serious effects or serious side effects, then the probable level assigned to this target in the safety category may be one associated with a high probability of adverse reactions.

[0050] Once the analysis module 106 has determined the degree of consistency between the biological targets and the category level selected by the user, the characterization of the biological targets is complete. The system 100 then prepares to classify and organize the characterized biological targets by prioritizing them according to the degree of consistency, in order to return feasible recommendations for biological targets that meet the user's requirements.

[0051] Referring to Figure 5, the prioritization module 110 may include a match identification module 502, which is configured to identify biological targets deemed to match user input 102 by applying a minimum required degree of consistency for each selected level. Suitably, a biological target may be deemed to match the user input only if it meets the minimum required degree of consistency for each selected level. Once matching biological targets have been identified, the output module 112 of system 100 may be configured to output a representation of the biological targets matching the user input, and may also be configured to output a list of targets deemed unsuitable based on the user input. For example, these outputs may be provided to the user via a graphical user interface.
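The match identification step can be sketched as a partition of targets into matching and unsuitable lists. The function name `identify_matches`, the target names, the scores, and the default minimum of 0.8 are placeholders for illustration only.

```python
from typing import Dict, List, Tuple


def identify_matches(
    consistency: Dict[str, Dict[str, float]],
    minimum: float = 0.8,
) -> Tuple[List[str], List[str]]:
    """Split targets into (matching, unsuitable).

    `consistency` maps target -> category -> degree of consistency with the
    levels selected for that category. A target matches only if it reaches
    `minimum` in every selected category.
    """
    matching: List[str] = []
    unsuitable: List[str] = []
    for target, per_category in consistency.items():
        if all(value >= minimum for value in per_category.values()):
            matching.append(target)
        else:
            unsuitable.append(target)
    return matching, unsuitable


# Hypothetical degrees of consistency for two targets.
matching, unsuitable = identify_matches({
    "gene_a": {"safety": 0.9, "ligandability": 0.85},
    "gene_b": {"safety": 0.95, "ligandability": 0.4},
})
```

Returning both lists mirrors the option of reporting targets deemed unsuitable alongside the matches.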

[0052] The prioritization module 110 may include a confidence module 504 configured to determine, for example using machine learning techniques 508, a confidence metric indicating the level of confidence in the degree of consistency. In this case, the prioritization module 110 may also include a ranking module 506 configured to rank the biological targets, or a subset of them (such as those matching the user input), based on the confidence metric. Alternatively or additionally, the ranking module 506 may rank some or all of the biological targets based on their degree of consistency with the selected levels. If a ranking and/or confidence metric is determined, output module 112 may be configured to output a representation of the ranking or confidence metric. In embodiments, these may be output to a user via a graphical user interface. It will be understood that confidence metrics may be represented, for example, as a percentage confidence, as a text string (such as high, medium, or low), or in any other suitable manner.

[0053] In some embodiments, the minimum confidence level at which a target is deemed to belong to a particular category level may be included in user input 102. Alternatively or additionally, user input 102 may include a user indication of the relative importance of the categories. In this case, prioritization module 110 may be configured to prioritize biological targets using the indication of relative importance. For example, a user may indicate that the safety category is more important than the ligandability category.
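One possible way to use an indication of relative importance is an importance-weighted mean of the per-category degrees of consistency, sketched below. The function name `weighted_score`, the weights, and the scores are hypothetical; the patent does not state how relative importance enters the prioritization.

```python
from typing import Dict


def weighted_score(consistency: Dict[str, float], importance: Dict[str, float]) -> float:
    """Importance-weighted mean of per-category degrees of consistency."""
    total = sum(importance[category] for category in consistency)
    if total <= 0:
        raise ValueError("importance weights must sum to a positive value")
    return sum(consistency[c] * importance[c] for c in consistency) / total


# ASSUMPTION: the user marks safety as twice as important as ligandability.
importance = {"safety": 2.0, "ligandability": 1.0}
candidates = {
    "gene_a": {"safety": 0.9, "ligandability": 0.5},
    "gene_b": {"safety": 0.5, "ligandability": 0.9},
}
ranked = sorted(
    candidates,
    key=lambda target: weighted_score(candidates[target], importance),
    reverse=True,
)
```

With equal weights the two candidates would tie; the higher weight on safety is what places gene_a first.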

[0054] Output module 112 can be configured to provide further information to the user. For example, it can provide the user with the highest probability level for each category via a graphical user interface or other reporting methods. Alternatively, in some cases, it can return a flag indicating insufficient information for target classification to the user. Output module 112 can also be configured to specify to the user that a given target belongs to some user-defined category levels and not others.

[0055] Figure 6 shows a non-limiting example use case 600 of an embodiment of the present invention. In this example, two categories are provided from which the user can select appropriate levels: ligandability and therapeutic evidence. In use case 600 of Figure 6, user input 602 includes two ligandability levels 604 that the user has selected as acceptable. The selected ligandability levels are level 1 ("Unpredictable or unknown as ligandable") and level 4 ("Suitable tool compounds are available in the library"). Note that within the range of levels 1 to 4, levels 1 and 4 are not adjacent to each other (i.e., they are non-contiguous). Other levels (2 and 3) lie between them, so the selection represents highly contrasting levels within the category. In the therapeutic evidence category, the user has specified 606 that the target should belong to level 1 ("The target-disease relationship is well-known") or level 2 ("A target-disease relationship is shown but not well-known"). This requirement can be summarized by stating that the target should at least be shown to be related to the disease.

[0056] After receiving user input 602, the analysis module 608 examines data related to known biological targets from one or more data sources 610 in order to grade each target. In this use case, two targets 612 are considered. The analysis module 608 includes a grader that returns the percentage probability that each target belongs to each level of each category. For example, as shown in Figure 6, the probability that target 1 belongs to level 1 of the ligandability category is 20%. Similarly, the probability that target 2 belongs to level 3 of the therapeutic evidence category is 55%.

[0057] The percentage probabilities determined by the analysis module 608 can now be used by the prioritization module 618 to determine the degree to which targets 1 and 2 meet the requirements specified in user input 602. In this use case 600, the prioritization module 618 is configured to determine which of the targets (i.e., targets 1 and 2) are considered to match the requirements of user input 602. For example, for the ligandability category, a target must be highly likely to belong to either level 1 or level 4 to be considered a match for that category. In this example use case 600, the minimum threshold for the probability of belonging to one of the selected levels is 80%. For ligandability, the user selected levels 1 and 4, and the analysis module 608 determines that target 1 has a 20% probability of belonging to level 1 and a 65% probability of belonging to level 4. Therefore, the probability that target 1 belongs to one of the acceptable levels is 20% + 65% = 85%. Because this exceeds the minimum threshold of 80%, target 1 is considered to meet the user's requirement for the ligandability category.

[0058] Using the same method, there is a 10% + 5% = 15% probability (see 614) that target 2 belongs to one of the acceptable levels of ligand capacity. Because 15% is below the 80% threshold for being considered a match, target 2 is not considered to match user input 602 for the category of ligand capacity.

[0059] Using the same method and the same 80% threshold, the prioritization module 618 calculates the percentages 616 for the category of therapeutic evidence and finds that target 1 has a 20% + 75% = 95% probability of belonging to one of the acceptable levels, and therefore matches therapeutic evidence, while target 2 has a 10% + 35% = 45% probability of belonging to one of the acceptable levels, and therefore does not match therapeutic evidence. The prioritization module 618 thus produces a result 620, namely that target 1 matches both categories, while target 2 does not match either category. It is understood that other thresholds may be used in other embodiments or use cases. In some cases, further techniques such as machine learning may be used to automatically determine one or more thresholds.
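
The matching arithmetic of use case 600 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of the prioritization module 618: the names `PROBS`, `SELECTED`, `category_match` and `match_target` are hypothetical, only the probabilities quoted in the text are included (unselected levels that the text does not give are omitted), the assignment of the 10% and 5% figures to levels 1 and 4 for target 2 is assumed from the order of the sum, and the comparison against the threshold is assumed to be "greater than or equal to".

```python
# Probabilities in percent, as quoted in use case 600.
# Levels whose probabilities are not stated in the text are left out.
PROBS = {
    "target 1": {
        "ligand capacity": {1: 20, 4: 65},
        "therapeutic evidence": {1: 20, 2: 75},
    },
    "target 2": {
        # Assumption: 10% is level 1 and 5% is level 4 (order of the sum).
        "ligand capacity": {1: 10, 4: 5},
        "therapeutic evidence": {1: 10, 2: 35, 3: 55},
    },
}

# Levels selected by the user in each category (user input 602).
SELECTED = {
    "ligand capacity": {1, 4},
    "therapeutic evidence": {1, 2},
}

THRESHOLD = 80  # minimum summed probability, in percent


def category_match(level_probs, selected_levels, threshold=THRESHOLD):
    """Sum the probabilities of the selected levels and compare to the threshold."""
    total = sum(level_probs.get(level, 0) for level in selected_levels)
    return total, total >= threshold


def match_target(target_probs, selected, threshold=THRESHOLD):
    """Return {category: (summed probability, matches?)} for one target."""
    return {
        category: category_match(target_probs[category], levels, threshold)
        for category, levels in selected.items()
    }


results = {name: match_target(probs, SELECTED) for name, probs in PROBS.items()}

for name, per_category in results.items():
    for category, (total, matches) in per_category.items():
        print(f"{name} / {category}: {total}% -> {'match' if matches else 'no match'}")
```

Integer percentages are used rather than fractions so that sums such as 20 + 65 compare exactly against the threshold without floating-point rounding.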

[0060] In use case 600, the analysis module 608 is configured to determine a confidence metric that represents the level of confidence that the grading is accurate for each target. An output module (not shown) is configured to output 622 the matching targets, followed by the non-matching targets, each accompanied by its respective confidence score. As illustrated in the example of use case 600, output 622 provides target 1 as a match with a 90% confidence score, followed by target 2 as a non-match with a 70% confidence score.
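
The ordering of output 622 can be illustrated with a short Python sketch. The `TargetResult` type and `order_output` function are hypothetical names introduced here. The text only states that matching targets precede non-matching targets; ordering by descending confidence within each group is an additional assumption.

```python
from dataclasses import dataclass


@dataclass
class TargetResult:
    name: str
    matches: bool      # result 620: does the target match the user input?
    confidence: int    # confidence score in percent


def order_output(results):
    """Matching targets first, then non-matching ones.

    Assumption: within each group, higher confidence comes first.
    """
    return sorted(results, key=lambda r: (not r.matches, -r.confidence))


# Values taken from use case 600.
unordered = [
    TargetResult("target 2", matches=False, confidence=70),
    TargetResult("target 1", matches=True, confidence=90),
]

ordered = order_output(unordered)
for r in ordered:
    label = "match" if r.matches else "non-match"
    print(f"{r.name}: {label} ({r.confidence}% confidence)")
```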

[0061] Figure 7 shows a variant 700 of the system 100 of Figure 1. Variant 700 includes a feedback loop that enables manual user feedback to be provided to the analysis module 106 via a machine learning grading system 702. System 700 includes a user input device, such as a graphical user interface, configured to receive a user-provided manual labeling command 704 to overwrite at least a portion of the output 114. For example, an expert in the field may know that a given biological target has a high probability of causing adverse side effects, whereas the machine learning grading system 702 may have determined this probability to be low. The user can therefore manually assign a "high" label to the biological target in the category "risk of causing adverse side effects," thereby overwriting the system's "low" output. This manual input can then be automatically fed back into the machine learning grading system 702 and used for further training of the grader. Alternatively or additionally, the manual input 704 can be stored and used in subsequent uses of system 700 by being included in training data 706. Other methods of incorporating user feedback are also envisioned, such as supervised or semi-supervised machine learning methods, as well as unsupervised machine learning techniques.
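
A minimal sketch of the bookkeeping behind such a feedback loop is given below, in Python. It is not the machine learning grading system 702 itself: the `GradingStore` class and its methods are hypothetical, "target X" is a placeholder name, and the retraining step is deliberately left out. The sketch only shows a manual label taking precedence over the system's label and being recorded as an additional training example.

```python
from dataclasses import dataclass, field


@dataclass
class GradingStore:
    # (target, category) -> label produced by the grader
    system_labels: dict = field(default_factory=dict)
    # (target, category) -> label entered manually by a user
    manual_labels: dict = field(default_factory=dict)
    # accumulated (target, category, label) examples for later retraining
    training_data: list = field(default_factory=list)

    def apply_manual_label(self, target, category, label):
        """Record a manual labeling command and keep it as a training example."""
        self.manual_labels[(target, category)] = label
        self.training_data.append((target, category, label))

    def effective_label(self, target, category):
        """A manual label, if present, overwrites the system's label."""
        key = (target, category)
        return self.manual_labels.get(key, self.system_labels.get(key))


store = GradingStore()
category = "risk of causing adverse side effects"

# The grader initially outputs "low" for a placeholder target.
store.system_labels[("target X", category)] = "low"
before = store.effective_label("target X", category)

# An expert overwrites the output with "high".
store.apply_manual_label("target X", category, "high")
after = store.effective_label("target X", category)

print(before, "->", after)
print(store.training_data)
```

Keeping the system's label and the manual label in separate mappings means the original grader output remains available for comparison after it has been overwritten.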

[0062] A computer device 800 suitable for implementing the method according to the invention is shown in Figure 8. The device 800 includes a processor 802, an input-output device 804, a communication portal 806, and a computer memory 808. The memory 808 can store code that, when executed by the processor 802, causes the device 800 to perform the method 200 shown in Figure 2.

[0063] In the embodiments described above, the server may include a single server or a network of servers. In some examples, the functionality of the server may be provided by a network of servers distributed across geographical areas, such as a globally distributed network of servers, and a user can connect to an appropriate one of the servers in the network based on the user's location.

[0064] For clarity, the above description has referred to embodiments of the invention discussed with reference to a single user. It will be understood that in practice the system can be shared by multiple users, and possibly by a very large number of users simultaneously.

[0065] The embodiments described above are fully automated. In some examples, the system user or operator may manually instruct some steps of the method to be implemented.

[0066] In the embodiments described in this invention, the system can be implemented as any form of computing and / or electronic device. Such a device may include one or more processors, which may be microprocessors, controllers, or any other suitable type of processor, for processing computer-executable instructions to control the operation of the device, thereby collecting and recording routing information. In some examples, such as in the case of a system using a chip architecture, the processor may include one or more fixed-function blocks (also called accelerators) that are implemented as part of a hardware (rather than software or firmware) approach. Platform software, including an operating system or any other suitable platform software, may be provided at the computing-based device to enable application software to execute on the device.

[0067] The various functions described herein can be implemented in hardware, software, or any combination thereof. If implemented in software, these functions can be stored or transmitted as one or more instructions or code on a computer-readable medium. A computer-readable medium can include, for example, a computer-readable storage medium. A computer-readable storage medium can include volatile or non-volatile, removable or non-removable media implemented using any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. A computer-readable storage medium can be any available storage medium accessible to a computer. By way of example and not limitation, such a computer-readable storage medium can include RAM, ROM, EEPROM, flash memory or other storage devices, CD-ROM or other optical disc storage, disk storage or other magnetic storage devices, or any other medium that can be used to carry or store the required program code in the form of instructions or data structures and is accessible to a computer. As used herein, discs and disks include compact discs (CD), laser discs, optical discs, DVDs, floppy disks, and Blu-ray discs (BD). Furthermore, transmitted signals are not included within the scope of computer-readable storage media. Computer-readable media also includes communication media, encompassing any medium that facilitates the transfer of a computer program from one place to another. For example, a connection can be a communication medium. For instance, if software is transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies (such as infrared, radio, and microwave), it is included in the definition of communication media. Combinations of the above should also be included within the scope of computer-readable media.

[0068] Alternatively, or further, the functionality described herein may be implemented at least in part by one or more hardware logic components. For example, but not limited to, hardware logic components that may be used may include: Field Programmable Gate Array (FPGA), Application-Specific Integrated Circuit (ASIC), Application-Specific Standard Product (ASSP), System-on-a-Chip (SOC), Complex Programmable Logic Device (CPLD), etc.

[0069] Although the diagram depicts a single system, it should be understood that computing devices can be distributed systems. Therefore, for example, several devices can communicate via a network connection and collaboratively perform tasks described as being performed by the computing devices.

[0070] Although the illustration shows a local device, it is understood that the computing device can be located remotely and can be accessed via a network or other communication link (e.g., using a communication interface).

[0071] This document uses the term "computer" to refer to any device that has processing power to enable it to execute instructions. Those skilled in the art will recognize that this processing power is incorporated into many different devices, and therefore the term "computer" includes PCs (personal computers), servers, mobile phones, personal digital assistants, and many other devices.

[0072] Those skilled in the art will recognize that the storage devices used to store program instructions can be distributed across a network. For example, a remote computer can store an example of the described process as software. A local or terminal computer can access the remote computer and download part or all of the software to run the program. Alternatively, a local computer can download fragments of the software as needed, or execute some software instructions on the local terminal and some on the remote computer (or computer network). Those skilled in the art will also recognize that, by utilizing conventional techniques known to them, all or part of the software instructions can be implemented by dedicated circuitry, such as DSPs, programmable logic arrays, etc.

[0073] It is understood that the above benefits and advantages may relate to one embodiment or several embodiments. The embodiments are not limited to embodiments that solve any or all of the described problems or embodiments that have any or all of the described benefits and advantages.

[0074] Any reference to “an” item refers to one or more of those items. The term “comprising” is used herein to mean including the identified method steps or elements, but such steps or elements do not form an exclusive list, and the method or apparatus may contain additional steps or elements.

[0075] As used herein, the terms "component" and "system" are intended to cover computer-readable data storage configured with computer-executable instructions that, when executed by a processor, cause certain functions to be performed. Computer-executable instructions may include routines, functions, etc. It is also understood that a component or system may be located on a single device or distributed across multiple devices.

[0076] Furthermore, as used herein, the term “exemplary” is intended to mean “as an illustration or example of something.”

[0077] Furthermore, within the scope of the use of the term "include" in the detailed description or claims, the term is intended to include in a manner similar to the term "comprising," as interpreted when "comprising" is used as a transition word in a claim.

[0078] The accompanying drawings illustrate exemplary methods. Although these methods are shown and described as a series of actions performed in a specific order, it should be understood that these methods are not limited by the order. For example, some actions may occur in a different order than those described herein. Furthermore, one action may occur simultaneously with another. Moreover, in some cases, not all actions are required to implement the methods described herein.

[0079] Furthermore, the actions described herein may include computer-executable instructions that can be implemented by one or more processors and / or stored on a computer-readable medium. Computer-executable instructions may include routines, subroutines, programs, threads of execution, and / or the like. Further, the results of the actions of the method may be stored in a computer-readable medium, displayed on a display device, and / or similar.

[0080] The order of steps in the methods described herein is exemplary, but these steps can be performed in any suitable order, or simultaneously where appropriate. Furthermore, steps can be added to or substituted in any method, or individual steps can be deleted from any method, without departing from the scope of the subject matter described herein. Aspects of any of the examples described above can be combined with aspects of any other examples described to form further examples without losing the desired effect.

[0081] It is understood that the above description of the preferred embodiments is given by way of example only, and various modifications can be made by those skilled in the art. The content described above includes examples of one or more embodiments. Of course, for the purposes of describing the above aspects, it is impossible to describe every conceivable modification and alteration of the above apparatus or method, but those skilled in the art will recognize that many further modifications and arrangements of the aspects are possible. Therefore, the described aspects are intended to include all such changes, modifications, and variations that fall within the scope of the appended claims.

Claims

1. A computer-implemented method for prioritizing biological targets, the method comprising: receiving a selection of levels of one or more categories, wherein the categories represent properties of a biological target, wherein the categories include one or more of the following: ligand capacity; safety; therapeutic evidence; biological principles; target expression; and stratification capacity, and wherein a level of a category represents a numerical value or a range of values of the category; for each of a plurality of biological targets, using a trained machine learning classifier to determine the degree of consistency between the biological target and each selected level, wherein the degree of consistency between the biological target and the selected level includes the probability that the biological target belongs to the selected level, and wherein the plurality of biological targets include genes, nucleic acid sequences, proteins, amino acid sequences, protein complexes and/or biological pathways; prioritizing the biological targets based on the degree of consistency; outputting representations of one or more prioritized biological targets; providing a user input tool to enable a user to generate a manual labeling command to overwrite at least a portion of the output, the manual labeling command specifying whether one of the biological targets belongs to one of the levels; and training the classifier based on the manual labeling command.

2. The method according to claim 1, wherein the selected levels in one of the categories are not adjacent to each other.

3. The method according to claim 1 or 2, wherein the selection of levels includes at least two levels of the same category.

4. The method according to any one of the preceding claims, comprising: receiving user input, which includes the selection of the levels of the one or more categories.

5. The method according to any one of the preceding claims, wherein the probability corresponds to a normalized distribution across all levels of the same category.

6. The method according to any one of the preceding claims, comprising: determining the degree of consistency from one or more data sources.

7. The method according to any one of the preceding claims, comprising: aggregating the degree of consistency from gradings based on their respective data sources.

8. The method according to any one of the preceding claims, wherein prioritizing the biological targets involves identifying biological targets that match the user input by applying a minimum required degree of consistency to each selected level.

9. The method according to any one of the preceding claims, comprising: determining a confidence metric for the degree of consistency, and optionally ranking the biological targets that match the user input based on the confidence metric.

10. The method of claim 9, comprising: determining the confidence metric using machine learning techniques.

11. The method according to any one of the preceding claims, wherein prioritizing the biological targets includes ranking the biological targets based on their degree of consistency with the selected levels.

12. The method according to any one of the preceding claims, wherein the user input includes an indication of the relative importance of the categories, and prioritizing the biological targets includes using the indication of relative importance.

13. The method according to any one of claims 8-12, comprising: outputting a representation of the biological targets that match the user input.

14. The method of claim 11, comprising: outputting a representation of the ranking.

15. The method according to claim 9 or 10, comprising: outputting a representation of the confidence metric.

16. The method according to any one of the preceding claims, comprising: providing a graphical user interface as an input and/or output tool.

17. The method according to any one of the preceding claims, comprising: using the overwrite command to add to a set of training data.

18. A computer-readable medium storing code that, when executed by a computer, causes the computer to perform the method of any of the preceding claims.

19. A system for prioritizing biological targets, the system comprising: an input module configured to receive a selection of levels of one or more categories, wherein the categories represent properties of a biological target, wherein the categories include one or more of the following: ligand capacity; safety; therapeutic evidence; biological principles; target expression; and stratification capacity, and wherein a level of a category represents a numerical value or a range of values of the category; an analysis module configured to determine, for each of a plurality of biological targets, the degree of consistency between the biological target and each selected level using a trained machine learning classifier, wherein the degree of consistency between the biological target and the selected level includes the probability that the biological target belongs to the selected level, and wherein the plurality of biological targets include genes, nucleic acid sequences, proteins, amino acid sequences, protein complexes and/or biological pathways; a prioritization module configured to prioritize the biological targets based on the degree of consistency; an output module configured to output representations of one or more prioritized biological targets; and a user input tool configured to enable a user to generate a manual labeling command to overwrite at least a portion of the output, the manual labeling command specifying whether one of the biological targets belongs to one of the levels, wherein the classifier is configured to be trained based on the manual labeling command.