Drug target screening method, device, equipment and storage medium

By integrating multi-level genetic evidence and machine learning models, the accuracy and efficiency of drug target screening are addressed, enabling more comprehensive and reliable target evaluation, improving drug development efficiency and clinical trial success rates, and reducing costs.

CN122201409A · Pending · Publication Date: 2026-06-12 · WESTLAKE UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: WESTLAKE UNIV
Filing Date: 2026-01-26
Publication Date: 2026-06-12

Patent Timeline

26 Jan 2026: Application
12 Jun 2026: Publication (CN122201409A)

IPC: G16B15/30; G16B40/20; G16B20/40

AI Tagging

Application Domain

Biostatistics Proteomics

AI Technical Summary

⚠Technical Problem

Existing drug target screening methods are inaccurate, lack comprehensive features, and are inefficient. They also do not cover all genetic analysis methods, resulting in high drug development costs and low efficiency.

⚗Method used

By integrating multi-level information such as GWAS signals, QTL functional annotation, epigenomics, and network and pathway analysis with gene database information, machine learning models are used to model multi-source data and screen drug targets while avoiding circular data dependencies, improving the accuracy and coverage of target screening.

🎯Benefits of technology

It significantly improves the accuracy and coverage of drug target screening, increases the success rate of clinical trials, reduces drug development costs, and accelerates the process of bringing new drugs to market.

✦ Generated by Eureka AI based on patent content.

Abstract

The application relates to the technical field of biomedicine and discloses a drug target screening method, device, equipment and storage medium. The method comprises the following steps: acquiring N target diseases and M corresponding target genes, and constructing N×M disease-gene pairs; for each disease-gene pair, acquiring corresponding target information; inputting the target information of each disease-gene pair into a machine learning model to obtain a corresponding prediction result, the prediction result indicating the priority of the target gene in the disease-gene pair as a target of the target disease; and, based on the prediction results of the disease-gene pairs, screening drug targets for the target disease. The application significantly improves the accuracy and efficiency of drug target prioritization.

Description

Technical Field

[0001] This invention relates to the field of biomedical technology, specifically to a method, apparatus, equipment, and storage medium for screening drug targets.

Background Technology

[0002] To improve drug development efficiency and reduce high R&D costs, it is necessary to enhance the accuracy and feasibility of drug development in its initial stages, such as target screening and validation. Based on this need, various methods and tools for drug target screening and prioritization have emerged in recent years. These methods integrate existing genome-wide association study (GWAS) data, molecular quantitative trait data such as eQTL/pQTL (eQTL, expression quantitative trait loci; pQTL, protein quantitative trait loci), and multidimensional genetic and omics information such as protein-protein interactions and transcriptional regulation data. Typical methods include L2G (locus-to-gene) models, GPS and ML-GPS, and V2G (variant-to-gene) scoring frameworks. These methods attempt to establish a link between drug development and genomics data, helping researchers eliminate a large number of candidate genes lacking genetic support at an early stage, thereby improving R&D efficiency. However, these methods are not very accurate, do not cover all genetic analysis methods, and require complex data processing, resulting in low efficiency.

Summary of the Invention

[0003] In view of this, the present invention provides a method, apparatus, device and storage medium for drug target screening, in order to solve the problems of low accuracy, incomplete features and low efficiency in drug target screening.

[0004] In a first aspect, the present invention provides a method for screening drug targets, the method comprising:

[0005] Obtain N target diseases and M target genes corresponding to the target diseases, and construct N×M disease-gene pairs. For each disease-gene pair, obtain corresponding target information, the target information including at least one of the following: genome annotation information, mutation location gene information, gene-level genetic association test information, locus multi-omics integration analysis information, gene functional similarity analysis and network analysis information, a genetic method summative score, network diffusion information of the genetic method summative score, gene pathway information, disease database information, gene functional annotation information, gene protein family information, and gene expression level information; the disease database information refers to information related to the disease-gene pair obtained by screening a disease database. Input the target information of each disease-gene pair into a machine learning model to obtain a corresponding prediction result, the prediction result indicating the priority of the target gene in the disease-gene pair as a target of the target disease. Based on the prediction results of the disease-gene pairs, screen drug targets for the target disease.

[0006] In one optional implementation, the prediction result includes predicted values corresponding to multiple categories, the multiple categories including at least one of the following: non-drug targets, preclinical stage, clinical phase I, clinical phase II, clinical phase III, and approved for marketing; the priority of each target gene is determined based on the predicted value corresponding to the selected category.
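As a minimal sketch of how a priority could be read off such a multi-category prediction, the snippet below orders genes by the predicted value of one selected category. The category labels, gene names, and numeric values are illustrative assumptions, not values from the application.

```python
# Category names are illustrative labels for the stages listed in the text.
CATEGORIES = ["non_target", "preclinical", "phase_1", "phase_2", "phase_3", "approved"]


def rank_genes(predictions, selected_category):
    """Order genes by the predicted value of one selected category, highest first.

    predictions: dict mapping gene -> list of predicted values, one per category.
    """
    idx = CATEGORIES.index(selected_category)
    return sorted(predictions, key=lambda gene: predictions[gene][idx], reverse=True)


# Made-up predicted values for three placeholder genes of one disease.
predictions = {
    "GENE1": [0.70, 0.10, 0.08, 0.06, 0.04, 0.02],
    "GENE2": [0.20, 0.10, 0.10, 0.10, 0.20, 0.30],
    "GENE3": [0.40, 0.20, 0.10, 0.10, 0.10, 0.10],
}
ranking = rank_genes(predictions, "approved")
print(ranking)
```

Choosing a late-stage category such as "approved" as the selected category is only one possible design choice; the text leaves the selection open.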

[0007] In an optional implementation, when the target information includes the genetic method summative score, obtaining the corresponding target information for each disease-gene pair includes: obtaining a score for each sub-item separately, the sub-items including at least one of the following: genome annotation, mutation localization genes, gene-level testing, locus multi-omics integration analysis, and network analysis; and calculating the genetic method summative score from the scores corresponding to the sub-items.

[0008] In an optional implementation, when the sub-items include the genome annotation, obtaining the scores corresponding to each sub-item includes: for the disease-gene pair, identifying independent single nucleotide polymorphisms associated with the target disease; defining a genome-wide association study (GWAS) locus interval within a predetermined range upstream and downstream of each independent single nucleotide polymorphism; determining the GWAS loci within the GWAS locus interval; using the gene locations marked by genome annotation, calculating the base distance between the possible pathogenic mutations at the GWAS loci and the transcription start site of the target gene, and detecting whether any possible pathogenic mutation is located in the coding region of the target gene, thereby localizing mutations to genes; and counting whether the possible pathogenic mutations are located within the coding region of the target gene to obtain the genome annotation score.

[0009] In an optional implementation, when the sub-item includes the mutation-localizing gene, obtaining the score corresponding to each sub-item includes: For the disease-gene pair, identify independent single nucleotide polymorphisms associated with the target disease; Within a predetermined range upstream and downstream of each independent single nucleotide polymorphism, a genome-wide association analysis site range is defined; Within the specified genome-wide association study (GWAS) loci interval, determine the GWAS loci; By using chromatin function data, the possible pathogenic mutations in the genome-wide association analysis sites are linked to the exons, promoters, and enhancer functional regions of the target gene, thereby achieving functional localization from mutation to gene. The number and types of connections between the pathogenic mutation and the target gene are counted to obtain the score corresponding to the mutation-localized gene.

[0010] In an optional implementation, when the sub-items include the gene-level test, obtaining the scores corresponding to each sub-item includes: applying multiple gene-level genetic association analysis methods, which perform gene-based association analysis using the linkage disequilibrium structure of the chromosomes and thereby summarize signals at the single nucleotide polymorphism level; and counting the gene-level genetic association analysis methods that pass threshold detection to obtain the score corresponding to the gene-level test.

[0011] In an optional implementation, when the sub-items include the locus multi-omics integration analysis, obtaining the scores corresponding to each sub-item includes: performing colocalization or causal inference on the target gene's RNA expression QTLs, RNA splicing QTLs, protein abundance QTLs, methylation QTLs, chromatin accessibility QTLs, and histone modification QTLs in multiple organs, tissues, and cell types using colocalization, Mendelian randomization, and transcriptome-wide association analysis; and counting the methods and the tissues or organs that pass threshold detection in the locus multi-omics integration analysis to obtain the corresponding score.

[0012] In an optional implementation, when the sub-items include the network analysis, obtaining the scores corresponding to each sub-item includes: based on gene and protein networks, applying multiple gene functional similarity analysis and network analysis methods to evaluate the functional association of genes in co-expression networks and protein interactions; and counting the gene functional similarity analysis and network analysis methods that pass threshold detection to obtain the score corresponding to the network analysis.

[0013] In an optional implementation, when the target information includes the network diffusion information of the genetic method summative score, obtaining the corresponding target information for each disease-gene pair includes: obtaining the genetic method summative score of each disease-gene pair; in the copy of the protein-protein interaction network corresponding to the target disease, setting as seed nodes all target genes whose disease-gene pair for that disease has a genetic method summative score greater than zero; diffusing the summative scores from the seed nodes to their neighboring nodes in the network using a random walk algorithm and a PageRank algorithm, respectively; and taking the post-diffusion score of each target gene in that network copy as the network diffusion information of the genetic method summative score.
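The diffusion step can be illustrated with a random walk with restart, which is closely related to personalized PageRank. This is a sketch under stated assumptions rather than the application's implementation: the function name, the restart probability of 0.3, the toy three-gene network, and the choice to normalize seed scores into a restart distribution are all hypothetical.

```python
def random_walk_with_restart(adjacency, seed_scores, restart=0.3, n_iter=200, tol=1e-10):
    """Diffuse seed scores over an undirected network by power iteration.

    adjacency: dict node -> set of neighbouring nodes (every neighbour is also a key).
    seed_scores: dict node -> positive score; nodes absent from it are non-seeds.
    Returns a dict node -> stationary visiting probability.
    """
    nodes = sorted(adjacency)
    total = sum(seed_scores.values())
    # Restart distribution: seeds weighted by their (assumed) genetic total score.
    p0 = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(n_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            walk_mass = (1.0 - restart) * p[n]
            nbrs = adjacency[n]
            if nbrs:
                # Spread the walking mass evenly over the neighbours.
                share = walk_mass / len(nbrs)
                for m in nbrs:
                    nxt[m] += share
            else:
                # Isolated node: send its mass back to the seeds.
                for m in nodes:
                    nxt[m] += walk_mass * p0[m]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy chain network A - B - C with a single seed gene A (placeholder names).
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
diffused = random_walk_with_restart(network, {"A": 2.0})
print(diffused)
```

In this toy chain the seed keeps the highest score and the score decays with network distance, which is the behaviour the diffusion feature is meant to capture.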

[0014] In one optional implementation, the disease database information includes at least one of the following: annotation information from Online Mendelian Inheritance in Man (OMIM); annotation information from the ClinVar clinical variant database; annotation information from the Mouse Genome Informatics (MGI) database; annotation information from the Catalogue of Somatic Mutations in Cancer (COSMIC); annotation information from the Cancer Gene Census (CGC); annotation information from Integrative OncoGenomics (IntOGen); annotation information from the Network of Cancer Genes (NCG).

[0015] In one optional implementation, the machine learning model includes a multilayer perceptron network structure and a CORAL ordinal regression framework.
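To show how a CORAL-style ordinal output layer turns one shared score into an ordered stage, the following sketch encodes a rank as K-1 binary labels and decodes a prediction. It is a simplified illustration: the stage labels, the bias values, and the scalar `shared_score` (which would come from the multilayer perceptron, not modeled here) are assumptions.

```python
import math

# Ordered stage labels, assumed here to follow the categories named in the text.
STAGES = ["non_target", "preclinical", "phase_1", "phase_2", "phase_3", "approved"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def coral_levels(rank, num_classes):
    """Extended binary labels: task k asks 'is the rank greater than k?'."""
    return [1 if rank > k else 0 for k in range(num_classes - 1)]


def coral_predict(shared_score, biases):
    """Decode a CORAL-style output.

    shared_score: one scalar from the shared final layer (same weights for all tasks).
    biases: K-1 task-specific biases, typically non-increasing after training.
    Returns (P(rank > k) for each k, predicted rank index).
    """
    probs = [sigmoid(shared_score + b) for b in biases]
    rank = sum(1 for p in probs if p > 0.5)
    return probs, rank


# Hypothetical trained biases and a hypothetical shared score for one disease-gene pair.
biases = [2.0, 1.0, 0.0, -1.0, -2.0]
probs, rank = coral_predict(0.5, biases)
print(STAGES[rank], coral_levels(rank, len(STAGES)))
```

Because every binary task shares the same score and differs only in its bias, the predicted probabilities are monotone across thresholds, which is what makes the rank decoding consistent.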

[0016] In a second aspect, the present invention provides a drug target screening device, the device comprising: a first acquisition module, used to acquire N target diseases and M target genes corresponding to the target diseases, and to construct N×M disease-gene pairs; a second acquisition module, used to acquire corresponding target information for each disease-gene pair, the target information including at least one of the following: genome annotation information, mutation location gene information, gene-level genetic association test information, locus multi-omics integration analysis information, gene functional similarity analysis and network analysis information, a genetic method summative score, network diffusion information of the genetic method summative score, gene pathway information, disease database information, gene functional annotation information, gene protein family information, and gene expression level information, where the disease database information refers to information related to the disease-gene pair obtained by screening a disease database; a prediction module, used to input the target information of each disease-gene pair into the machine learning model to obtain a corresponding prediction result, the prediction result indicating the priority of the target gene in the disease-gene pair as a target of the target disease; and a screening module, used to screen drug targets for the target disease based on the prediction results of the disease-gene pairs.

[0017] In a third aspect, the present invention provides a computer device, comprising: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the drug target screening method of the first aspect or any corresponding embodiment described above.

[0018] In a fourth aspect, the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to perform the drug target screening method of the first aspect or any corresponding embodiment described above.

[0019] In a fifth aspect, the present invention provides a computer program product, including computer instructions for causing a computer to execute the drug target screening method of the first aspect or any corresponding embodiment described above.

[0020] The drug target screening method, apparatus, equipment, and storage medium provided in this invention comprehensively and deeply integrate multi-level information and multi-dimensional genetic evidence, including GWAS signals, QTL functional annotations, epigenomics, and network and pathway analysis, to further improve our understanding of the molecular mechanisms of human diseases. Based on this, it combines gene-level features, including disease database information, gene pathway information, the protein family to which the gene belongs, the gene's expression level in different tissues, and gene functional annotation information. It also utilizes protein-protein interaction network propagation features and employs machine learning models that avoid data circularity to model multi-source data. This significantly improves the accuracy of drug target prioritization screening, enhances the comprehensiveness of covered genetic analysis methods, increases the success rate and efficiency of clinical trials, effectively reduces drug development costs, and accelerates the new drug launch process.

[0021] In other words, this invention provides a more comprehensive and reliable target evaluation system for drug development, improves the efficiency of drug target discovery, helps accelerate the launch of new drugs, and improves the success rate of clinical trials.

Attached Figure Description

[0022] To more clearly illustrate the technical solutions in the specific embodiments or related technologies of the present invention, the drawings used in the description of the specific embodiments or related technologies will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0023] Figure 1 is a flowchart of a drug target screening method according to an embodiment of the present invention; Figure 2 is a flowchart of another drug target screening method according to an embodiment of the present invention; Figure 3 is a structural block diagram of a drug target screening device according to an embodiment of the present invention; Figure 4 is a schematic diagram of the hardware structure of a computer device according to an embodiment of the present invention.

Detailed Implementation

[0024] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0025] The methods and tools proposed in related technologies for screening and prioritizing drug targets have the following problems: (1) The model input features are relatively limited. They usually consider only a few features, such as the distance between the gene and the GWAS signal site, whether it is located in the coding region, and eQTL or pQTL evidence, and fail to fully integrate multiple types of genetic and functional omics evidence. (2) Strong positive samples (such as targets paired with mature drugs that have been approved for marketing or have entered late-stage clinical trials) are hard to obtain, and the number of such targets is extremely low relative to the total number of human coding genes. This imbalance between positive and negative samples makes the model prone to bias. (3) Some methods suffer from result leakage or potential circular prediction during model training because they use interdependent training data or functional features, which can lead to overestimation of some results. For example, when the input of one method includes the output of another method, circular dependencies arise between the data.

[0026] According to an embodiment of the present invention, a method for screening drug targets is provided. It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of executable computer instructions. Furthermore, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than that shown here.

[0027] This embodiment provides a drug target screening method that can be used with various computer devices. Figure 1 is a flowchart of a drug target screening method according to an embodiment of the present invention. As shown in Figure 1, the process includes the following steps: Step S101: Obtain N target diseases and M target genes corresponding to the target diseases, and construct N×M disease-gene pairs, where M and N are positive integers.
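Step S101 amounts to a Cartesian product of the disease list and the gene list. A minimal sketch, with placeholder disease and gene names that are not from the application:

```python
from itertools import product


def build_disease_gene_pairs(diseases, genes):
    """Return every (disease, gene) combination: N diseases x M genes."""
    return list(product(diseases, genes))


diseases = ["disease_A", "disease_B"]   # N = 2 (placeholder names)
genes = ["GENE1", "GENE2", "GENE3"]     # M = 3 (placeholder names)
pairs = build_disease_gene_pairs(diseases, genes)
print(len(pairs))
```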

[0028] Step S102: For each disease-gene pair, obtain the corresponding target information. As shown in Figure 2, the target information includes at least one of the following: genome annotation information, mutation location gene information, gene-level genetic association test information, locus multi-omics integration analysis information, gene functional similarity analysis and network analysis information, the genetic method summative score (i.e., genetic analysis summative score 19 in Figure 2), network diffusion information of the genetic method summative score (i.e., genetic analysis score network diffusion 20 in Figure 2), gene pathway information (i.e., signaling pathway data 21), disease database information 2, gene functional annotation information 5, gene protein family information (i.e., protein family data 22), and gene expression level information 4; the disease database information refers to information related to the disease-gene pair obtained by screening a disease database.

[0029] In some optional embodiments, the disease database may include monogenic disease data and/or a tumor database. Monogenic disease data includes at least one of OMIM (Online Mendelian Inheritance in Man), ClinVar (clinical variant database), HGMD (The Human Gene Mutation Database), and MGI (Mouse Genome Informatics). Tumor databases include at least one of COSMIC CGC (Catalogue of Somatic Mutations in Cancer, Cancer Gene Census), IntOGen (Integrative OncoGenomics), and NCG (Network of Cancer Genes). Accordingly, the disease database information includes at least one of the following: OMIM-annotated pathogenic gene information; ClinVar-annotated harmful mutations, their number, and their gene location information; HGMD-annotated harmful mutations and their gene location information; MGI-annotated information on genes associated with disease in humans and mice; COSMIC CGC-annotated information on the pathogenicity level, whether the gene is a tumor marker, whether it carries somatic mutations, whether it carries germline mutations, whether it is an oncogene, and whether it is a tumor suppressor gene; IntOGen-annotated information on mutation rate, gene function, whether the gene is an oncogene, and the gene mutation rate in the sample population; and NCG-annotated information on gene duplication, essentiality in cell lines, RNA expression levels, protein expression levels, single-base mutations among germline mutations, and the proportion of structural mutations.

[0030] Gene expression information 4 consists of gene expression data and subcellular localization data in various tissues and cell types.

[0031] In addition, pre-trained embedding features can be combined to improve the model's predictive performance. The pre-trained embedding features can be low-dimensional real-valued vectors formed by integrating multiple binary (0/1) features obtained from the literature.

[0032] In some optional embodiments, when the target information includes the total score of genetic methods, step S102, i.e., obtaining the corresponding target information for each disease-gene pair, includes: Step S1021: Obtain the scores corresponding to each sub-item. As shown in Figure 2, the sub-items include at least one of the following: genome annotation, mutation localization of genes, gene-level testing, locus multi-omics integration analysis, and network analysis; Step S1022: Calculate the total score of genetic methods based on the scores corresponding to each of the sub-items.

[0033] In embodiments of the present invention, as shown in Figure 2, the integrated genetic analysis methods based on genome-wide association studies (GWAS) include functional annotation of GWAS signals (i.e., genome annotation), gene-based association analysis (i.e., gene-level testing), integrated analysis of GWAS and quantitative trait loci (i.e., locus multi-omics integration analysis, shown as locus integration analysis 17 in Figure 2), gene localization based on chromatin function (i.e., mutation localization gene 15), and signaling pathway and network analysis methods (i.e., network analysis 18). From these, the total score of genetic methods 19 is calculated, which is the sum of the scores of the genetic analysis methods.
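Since the total score is described as the sum of the sub-item scores, the aggregation can be sketched in a few lines. The dictionary keys are hypothetical identifiers for the five sub-items, and treating a missing sub-item as 0 is an assumption.

```python
# Hypothetical keys for the five sub-items named in the text.
SUB_ITEMS = (
    "genome_annotation",
    "mutation_localization",
    "gene_level_test",
    "locus_multiomics",
    "network_analysis",
)


def genetic_total_score(sub_scores):
    """Sum the available sub-item scores; a missing sub-item counts as 0 (assumption)."""
    return sum(sub_scores.get(name, 0) for name in SUB_ITEMS)


# Made-up sub-item scores for one disease-gene pair.
example = {"genome_annotation": 1, "gene_level_test": 1, "locus_multiomics": 3}
total = genetic_total_score(example)
print(total)
```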

[0034] In some optional implementations, when the sub-items include the genome annotation, step S1021, i.e., obtaining the scores corresponding to each sub-item, includes: Step S1021A: For the disease-gene pair, identify independent single nucleotide polymorphisms associated with the target disease; Step S1021B: Define genome-wide association analysis (GWAS) site intervals within a preset range upstream and downstream of each independent single nucleotide polymorphism; Step S1021C: Within the GWAS site interval, determine the GWAS site; Step S1021D: Using the gene locations marked by genome annotation, calculate the base distance between the possible pathogenic mutations in the GWAS site and the transcription start site of the target gene, and detect whether any possible pathogenic mutation is located in the coding region of the target gene, thereby localizing mutations to genes; Step S1021E: Count whether the possible pathogenic mutations are located in the coding region of the target gene to obtain the score of the genome annotation.
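Steps S1021B to S1021E can be sketched as simple coordinate arithmetic. This is an illustration under assumptions: all positions are taken to lie on the same chromosome, the `Gene` class and function names are hypothetical, the 1 Mb flank is one example of a preset range, and scoring 1 when any candidate variant falls in the coding region is only one possible reading of the scoring rule.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    tss: int                 # transcription start site (base position)
    coding_intervals: list   # list of (start, end) coding intervals, inclusive


def locus_window(lead_pos, flank=1_000_000):
    """Interval spanning `flank` bases upstream and downstream of a lead SNP."""
    return max(0, lead_pos - flank), lead_pos + flank


def annotate_variants(gene, lead_pos, variant_positions, flank=1_000_000):
    """For candidate variants inside the locus window, record TSS distance and coding overlap."""
    start, end = locus_window(lead_pos, flank)
    records = []
    for pos in variant_positions:
        if not start <= pos <= end:
            continue  # variant lies outside the locus interval
        in_coding = any(s <= pos <= e for s, e in gene.coding_intervals)
        records.append({"pos": pos, "tss_distance": abs(pos - gene.tss), "in_coding": in_coding})
    return records


def genome_annotation_score(records):
    """Assumed rule: 1 if any candidate variant is in the coding region, else 0."""
    return int(any(r["in_coding"] for r in records))


# Made-up coordinates for one gene and three candidate variants.
gene = Gene(tss=5_000_000, coding_intervals=[(5_000_100, 5_000_300)])
records = annotate_variants(gene, 5_200_000, [5_000_200, 4_100_000, 5_900_000])
print(records, genome_annotation_score(records))
```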

[0035] In some optional embodiments, when the sub-item includes the mutation localization gene, step S1021, i.e., obtaining the score corresponding to each sub-item, includes: Step S10211: For the disease-gene pair, identify the independent single nucleotide polymorphism (SNP) associated with the target disease.

[0036] Step S10212: Define genome-wide association study (GWAS) site ranges within a preset range upstream and downstream of each independent single nucleotide polymorphism. The preset ranges can be set based on experience or needs, for example, a 1 Mb range.

[0037] Step S10213: Within the genome-wide association analysis (GWAS) site interval, determine the GWAS site.

[0038] Step S10214: Using chromatin function data (i.e., chromatin function annotation 10 shown in Figure 2), establish links between the possible pathogenic mutations in the genome-wide association analysis site and the exon, promoter, and enhancer functional regions of the target gene, thereby achieving functional localization from mutation to gene.

[0039] Step S10215: Count the number and types of connections between pathogenic mutations and the target gene to obtain the score corresponding to the mutation-localized gene.
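The counting in step S10215 can be sketched as a tally over variant-to-region links. The link representation as `(variant_id, region_type)` tuples and the returned fields are assumptions; the text does not specify how the counts are combined into a single score.

```python
from collections import Counter


def mutation_localization_score(links):
    """Tally the links between candidate variants and functional regions of one gene.

    links: iterable of (variant_id, region_type) tuples, where region_type is
    e.g. "exon", "promoter" or "enhancer".
    """
    by_type = Counter(region for _, region in links)
    return {
        "n_links": sum(by_type.values()),  # number of connections
        "n_types": len(by_type),           # number of distinct connection types
        "by_type": dict(by_type),
    }


# Made-up links for two placeholder variants.
links = [("rs1", "exon"), ("rs1", "enhancer"), ("rs2", "enhancer")]
summary = mutation_localization_score(links)
print(summary)
```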

[0040] In this embodiment of the invention, the site definition for mutation-to-gene mapping (i.e., effect variant localization 13 shown in Figure 2) identifies independent single nucleotide polymorphisms (SNPs) associated with traits using LD Clumping (i.e., linkage disequilibrium clustering or linkage disequilibrium grouping) and COJO (i.e., conditional and joint analysis) methods. A genome-wide association study (GWAS) site interval is defined within 1 Mb upstream and downstream of each SNP to refine the genomic extent of the associated region. Potential pathogenic mutations in each GWAS site are linked to the exons, promoters, and enhancers of the target gene through genomic annotation and chromatin function data, achieving functional localization from variant to gene.

[0041] LD Clumping is a linkage disequilibrium (LD)-based method used to select a set of non-redundant, significantly associated SNPs as representatives. Since SNPs may exhibit high correlation (i.e., be in an LD state), a representative SNP is needed to represent the entire LD region. COJO (Conditional and Joint Analysis) is a conditional and joint analysis method that further refines the selected SNP set, ensuring that each retained SNP contributes independently to phenotypic variation.
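The idea behind LD clumping can be sketched as a greedy selection: walk through significant SNPs from most to least significant and keep a SNP only if it is not in high LD with an already kept one. This is a simplified illustration, not a replacement for an established tool; the thresholds, the pairwise r² lookup, and the omission of a physical distance window are assumptions, and the COJO refinement is not shown.

```python
def ld_clump(pvalues, r2, p_threshold=5e-8, r2_threshold=0.1):
    """Greedy LD clumping sketch.

    pvalues: dict SNP id -> association p-value.
    r2: dict frozenset({snp_a, snp_b}) -> pairwise r^2 (missing pairs count as 0).
    Returns the index SNPs, most significant first.
    """
    candidates = sorted(
        (snp for snp, p in pvalues.items() if p <= p_threshold),
        key=lambda snp: pvalues[snp],
    )
    index_snps = []
    for snp in candidates:
        # Keep the SNP only if it is not in LD with any index SNP chosen so far.
        if all(r2.get(frozenset((snp, kept)), 0.0) < r2_threshold for kept in index_snps):
            index_snps.append(snp)
    return index_snps


# Made-up p-values and r^2 values for four placeholder SNPs.
pvalues = {"rs1": 1e-12, "rs2": 1e-10, "rs3": 1e-9, "rs4": 0.01}
r2 = {
    frozenset(("rs1", "rs2")): 0.8,
    frozenset(("rs1", "rs3")): 0.02,
    frozenset(("rs2", "rs3")): 0.5,
}
index_snps = ld_clump(pvalues, r2)
print(index_snps)
```

Here rs2 is absorbed by rs1 because of their high r², rs3 survives as a second independent signal, and rs4 never enters because it is not significant.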

[0042] In addition, embodiments of the present invention can also generate local site data maps of the mutated genes and output them for user viewing.

[0043] In some optional embodiments, when the sub-items include the gene level test, step S1021, i.e., obtaining the scores corresponding to each sub-item, includes: Step S1021a involves employing various gene-level genetic association analysis methods to perform gene-based association analysis based on chromosomal linkage disequilibrium structures, thereby summarizing signals at the single nucleotide polymorphism (SNP) level. For example, MAGMA and mBAT-combo methods can be used to perform gene-based association analysis based on chromosomal linkage disequilibrium structures.

[0044] Step S1021b: Count the gene-level genetic association analysis methods that pass threshold detection to obtain the score corresponding to the gene-level test. For example, if, of the MAGMA and mBAT-combo methods, only MAGMA passes threshold detection, the score for the gene-level test is 1.
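The example in step S1021b, where only one of two methods passes, can be reproduced with a small counting function. The p-values and the significance threshold of 2.5e-6 are made-up illustration values, not taken from the application.

```python
def gene_level_score(method_pvalues, threshold):
    """Count the gene-level methods whose p-value passes the threshold."""
    return sum(1 for p in method_pvalues.values() if p < threshold)


# Made-up p-values for one gene: MAGMA passes, mBAT-combo does not.
score = gene_level_score({"MAGMA": 1e-7, "mBAT-combo": 0.2}, threshold=2.5e-6)
print(score)
```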

[0045] In this embodiment of the invention, gene-level testing 16 uses MAGMA and mBAT-combo methods to perform gene-based association analysis, performs gene-based association tests based on linkage disequilibrium structures, and summarizes the signals at the single nucleotide polymorphism level.

[0046] MAGMA (Multi-marker Analysis of GenoMic Annotation) is a widely used tool primarily for gene set enrichment analysis and gene-level association analysis. It helps researchers understand which genes or gene sets are significantly associated with specific phenotypes or diseases. mBAT-combo is a gene-based (set-based) association test that operates on GWAS summary statistics and is designed to retain power to detect gene-trait associations when the effects of SNPs within a gene mask one another.

[0047] In some optional implementations, when the sub-items include the locus multi-omics integration analysis, step S1021, i.e., obtaining the scores corresponding to each sub-item, includes: Step S1021 (I): Use multi-omics integration analysis methods such as colocalization (COLOC), Mendelian randomization (SMR), and transcriptome-wide association analysis (TWAS) to perform colocalization or causal inference on the target gene's RNA expression QTLs, RNA splicing QTLs, protein abundance QTLs, methylation QTLs, chromatin accessibility QTLs, and histone modification QTLs (i.e., quantitative trait loci 11) in multiple organs, tissues, and cell types, thereby integrating the genetic signals of genome-wide association studies with molecular phenotypic quantitative trait loci (i.e., multi-omics integration analysis) to identify potential pathogenic mechanisms and related genes.

[0048] Tools such as FUSION or OPERA can be used to perform colocalization or causal inference on QTLs for RNA expression, RNA splicing, protein abundance, methylation, chromatin accessibility, and histone modification to identify potential pathogenic mechanisms and related genes.

[0049] Step S1021 (II): Count the methods and the tissues or organs that pass threshold detection in the locus multi-omics integration analysis to obtain the score corresponding to the locus multi-omics integration analysis.
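One way to read this counting step is to tally the distinct methods and distinct tissues that have at least one passing result. The sketch below follows that reading; the per-method pass rules, their cut-off values, and the result records are all assumptions made for illustration.

```python
# Assumed pass rules per method; the cut-offs are illustrative, not from the application.
PASS_RULES = {
    "COLOC": lambda value: value >= 0.8,   # posterior probability of a shared causal variant
    "SMR": lambda value: value <= 1e-5,    # p-value
    "TWAS": lambda value: value <= 1e-5,   # p-value
}


def locus_multiomics_score(results):
    """Count distinct methods and tissues with at least one result passing its threshold."""
    passed = [r for r in results if PASS_RULES[r["method"]](r["value"])]
    return {
        "n_methods": len({r["method"] for r in passed}),
        "n_tissues": len({r["tissue"] for r in passed}),
    }


# Made-up results for one disease-gene pair.
results = [
    {"method": "COLOC", "qtl": "eQTL", "tissue": "liver", "value": 0.92},
    {"method": "COLOC", "qtl": "pQTL", "tissue": "blood", "value": 0.40},
    {"method": "SMR", "qtl": "eQTL", "tissue": "blood", "value": 1e-7},
    {"method": "TWAS", "qtl": "eQTL", "tissue": "liver", "value": 0.03},
]
multiomics = locus_multiomics_score(results)
print(multiomics)
```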

[0050] In this embodiment of the invention, the integration analysis with quantitative trait loci integrates genome-wide association studies with the genetic signals of molecular phenotypic quantitative trait loci through colocalization, Mendelian randomization, and transcriptome association analysis (i.e., multi-omics integration analysis of loci). Specifically, tools such as SMR, COLOC, FUSION, or OPERA can be used to perform colocalization or causal inference on QTLs for RNA expression, RNA splicing, protein abundance, methylation, chromatin accessibility, and histone modification to identify potential pathogenic mechanisms and related genes.

[0051] SMR stands for Summary-data-based Mendelian Randomization. It is a method that uses GWAS and QTL (quantitative trait locus) summary data to test for causal relationships between gene expression levels and complex traits. COLOC (colocalisation) is a method for testing whether two traits (such as gene expression and disease status) share the same causal variant; it compares GWAS and QTL data to determine whether a common causal variant exists at a given locus. FUSION (Functional Summary-based ImputatiON) is a suite of tools for performing transcriptome-wide association studies (TWAS) and regulome-wide association studies (RWAS). FUSION builds predictive models of the genetic component of a functional / molecular phenotype and uses GWAS (genome-wide association study) summary statistics to predict and test the association between that component and disease. Its goal is to identify associations between a GWAS phenotype and a functional phenotype measured only in reference data. OPERA (omics pleiotropic association) jointly analyzes GWAS data and xQTL data from multiple omics layers, using summary statistics and an LD (linkage disequilibrium) reference, to reveal the potential molecular mechanisms underlying GWAS loci.
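
As an illustration of the SMR step, the sketch below computes the standard SMR effect estimate and approximate test statistic for a single top cis-QTL variant from summary statistics. The function name `smr_test` and its argument names are hypothetical; the embodiment itself relies on existing tools such as SMR, and nothing here should be read as that tool's interface.

```python
import math

def smr_test(b_gwas, se_gwas, b_qtl, se_qtl):
    """SMR estimate for one variant (hypothetical helper, not the SMR tool's API).

    b_gwas, se_gwas: variant effect and standard error on the trait (GWAS).
    b_qtl, se_qtl:   variant effect and standard error on the molecular phenotype (QTL).
    Returns (b_xy, t_smr, p_value).
    """
    z_gwas = b_gwas / se_gwas
    z_qtl = b_qtl / se_qtl
    # Estimated effect of the molecular phenotype on the trait.
    b_xy = b_gwas / b_qtl
    # Approximate chi-square statistic with 1 degree of freedom.
    t_smr = (z_gwas ** 2 * z_qtl ** 2) / (z_gwas ** 2 + z_qtl ** 2)
    # Upper tail of a chi-square(1) distribution: P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_value
```

A gene would then count as passing this method if its p-value falls below whatever cutoff the analysis adopts; the cutoff is not specified here.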

[0052] In addition, embodiments of the present invention can also generate local site data maps of Mendelian randomization analysis results and output them for user viewing.

[0053] In some optional implementations, when the sub-items include the network analysis, step S1021, i.e., obtaining the scores corresponding to each sub-item, includes: Step S1021-1: Based on gene and protein networks, various gene functional similarity analysis and network analysis methods are used to evaluate the functional associations of genes in co-expression networks and protein interactions (i.e., the functional relationships in the signaling pathway / network 12 shown in Figure 2); Step S1021-2: Count the gene functional similarity analysis and network analysis methods that pass threshold detection in the network analysis to obtain the score corresponding to the network analysis.

[0054] For example, in step S1021-1, the functional associations of genes in co-expression networks and protein interactions can be evaluated using the DEPICT and PoPS algorithms, respectively. Then, in step S1021-2, if only the DEPICT method passes the threshold detection, the score for the network analysis is 1.
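
The counting rule in this example can be sketched as follows. This is a minimal illustration that assumes each method reports a p-value and has its own cutoff; the function name, the p-values, and the cutoffs are all hypothetical and are not values given in this embodiment.

```python
def count_passing_methods(p_values, cutoffs):
    """Sub-item score = number of methods whose result passes its threshold.

    p_values: dict mapping method name -> p-value for one disease-gene pair.
    cutoffs:  dict mapping method name -> significance cutoff (assumed values).
    """
    return sum(1 for method, p in p_values.items() if p < cutoffs[method])

# Hypothetical inputs: only DEPICT passes, so the network analysis score is 1.
example_score = count_passing_methods(
    {"DEPICT": 0.001, "PoPS": 0.30},
    {"DEPICT": 0.05, "PoPS": 0.05},
)
```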

[0055] Among them, DEPICT (Data-driven Expression-Prioritized Integration for Complex Traits) systematically prioritizes the screening of the most likely pathogenic genes at associated sites by predicting gene function, highlighting enriched pathways, and identifying tissue or cell types with high expression of genes at associated sites.

[0056] PoPS (Polygenic Priority Score) utilizes genome-wide GWAS summary statistics and combines a large number of publicly available mixed cell and single-cell expression datasets, biological pathways, and predicted protein-protein interaction data to prioritize disease-related genes.

[0057] In summary, the embodiments of the present invention integrate multi-omics data and multi-level genetic evidence, including functional annotation of genome-wide association study (GWAS) signals, gene-based association analysis, integrated analysis of GWAS and quantitative trait loci (QTL), gene localization based on chromatin function, and signaling pathway and network analysis, to calculate a total genetic method score, which is used to screen and evaluate potential drug targets.

[0058] In some optional embodiments, when the target information includes network diffusion information of the sum of genetic methods scores, step S102, i.e., obtaining the corresponding target information for each disease-gene pair, includes: Step S102(Ⅰ): Obtain the genetic method sum score of the disease-gene pair.

[0059] Step S102(Ⅱ): In the protein-protein interaction network copy corresponding to the target disease, all target genes with a total genetic method score greater than zero in the disease-gene pair corresponding to the target disease are set as seed nodes. The total genetic method score of the disease-gene pair is combined, and the target gene is diffused to the adjacent nodes in the network through random walk and PageRank algorithms (i.e., genetic analysis scoring network diffusion 20) to identify additional potential associated genes. Step S102 (Ⅲ) involves obtaining the post-diffusion score of the target gene in the protein-protein interaction network copy corresponding to the target disease. This score serves as the network diffusion information for the sum of genetic method scores. This increases the likelihood that the target gene in the disease-gene pair will be used as a drug target and that the corresponding drug will enter various stages of drug development.

[0060] In this embodiment of the invention, the network propagation characteristics of protein-protein interaction networks are utilized to perform network diffusion (also known as network propagation) on the sum score of genetic methods.

[0061] The PageRank algorithm is an improved method based on random walks, which takes into account the strength and direction of links between nodes, as well as the influence of the global network structure.
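
The diffusion of steps S102(Ⅱ) and S102(Ⅲ) can be sketched as a random walk with restart (personalized PageRank) on an undirected protein-protein interaction network. This is only one plausible reading: the function name, the restart probability of 0.15, the convergence tolerance, and the treatment of isolated nodes are assumptions and are not specified in this embodiment.

```python
def diffuse_scores(edges, seed_scores, restart=0.15, tol=1e-10, max_iter=1000):
    """Personalized PageRank by power iteration (illustrative sketch).

    edges:       iterable of (gene_a, gene_b) undirected PPI edges.
    seed_scores: dict gene -> total genetic method score; genes with a
                 score greater than zero act as seed nodes.
    Returns dict gene -> post-diffusion score (scores sum to 1).
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    for gene in seed_scores:
        neighbors.setdefault(gene, set())

    total = sum(s for s in seed_scores.values() if s > 0)
    if total <= 0:
        raise ValueError("at least one seed gene must have a score above zero")
    # Restart distribution is proportional to the genetic method score.
    restart_vec = {g: max(seed_scores.get(g, 0.0), 0.0) / total for g in neighbors}

    rank = dict(restart_vec)
    for _ in range(max_iter):
        new_rank = {g: restart * restart_vec[g] for g in neighbors}
        for g, nbrs in neighbors.items():
            walk_mass = (1.0 - restart) * rank[g]
            if nbrs:
                share = walk_mass / len(nbrs)
                for n in nbrs:
                    new_rank[n] += share
            else:
                # Isolated node: send its mass back to the seeds (assumption).
                for n in neighbors:
                    new_rank[n] += walk_mass * restart_vec[n]
        delta = sum(abs(new_rank[g] - rank[g]) for g in neighbors)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

Genes that are not seeds but sit close to high-scoring seeds receive a non-zero post-diffusion score, which is how additional potentially associated genes are surfaced.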

[0062] Step S103: Input the target information of each of the disease-gene pairs into the machine learning model (i.e., the ordered regression machine learning model 6 shown in Figure 2) to obtain the corresponding prediction results, which are used to indicate the priority of the target gene in the disease-gene pair as the target of the target disease.

[0063] In some optional implementations, the machine learning model includes a multilayer perceptron network structure and a CORAL (Consistent Rank Logits) ordered regression framework. Additionally, a multi-label regression framework may also be included.

[0064] Specifically, the machine learning model includes multiple multilayer perceptron hidden layers, each with a number of neurons, and each layer is equipped with the Leaky ReLU activation function (an improved version of the traditional ReLU (Rectified Linear Unit) activation function, i.e., a leaky linear rectifier unit) and Dropout and batch normalization.

[0065] In addition, the loss function of the machine learning model is the multi-label cross-entropy function or the coral_loss function. The coral_loss function is a loss function specifically designed for ordered regression tasks. It improves the accuracy of the model in predicting ordered labels by considering the order relationship between categories.
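
To make the ordered-label idea concrete, the following is a minimal pure-Python sketch of a CORAL-style loss for one sample, with all task weights set to 1. It assumes the usual CORAL formulation, in which one shared logit is combined with K-1 bias terms and each resulting binary task asks whether the stage exceeds level k. The function names are hypothetical and this is not the `coral_loss` implementation referred to above.

```python
import math

def coral_levels(label, num_classes):
    """Encode an ordinal label as K-1 binary targets: 1.0 if label > k."""
    return [1.0 if label > k else 0.0 for k in range(num_classes - 1)]

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def coral_style_loss(shared_logit, biases, label):
    """Sum of binary cross-entropies over the K-1 ordered thresholds.

    shared_logit: single output of the network for this sample.
    biases:       K-1 threshold-specific bias terms.
    label:        integer stage in 0..K-1.
    """
    levels = coral_levels(label, len(biases) + 1)
    loss = 0.0
    for bias, y in zip(biases, levels):
        z = shared_logit + bias
        # log(1 - sigmoid(z)) equals log_sigmoid(-z).
        loss -= y * log_sigmoid(z) + (1.0 - y) * log_sigmoid(-z)
    return loss
```

Because every threshold shares the same logit, a sample labelled Phase II is penalized less for a prediction near Phase I than for a prediction of non-drug target, which is the order relationship the loss is meant to respect.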

[0066] In this embodiment of the invention, a multilayer perceptron network structure is employed for sequence-type data, combined with multi-label regression or CORAL ordered regression frameworks to perform ordered multi-classification of drug clinical stages. This avoids circular dependencies between data (specifically, training data and validation data) and demonstrates the orderliness of the research and development process when facing clinical progress information. In other words, the ordered multi-classification method can more reasonably predict clinical stage progress, effectively reducing the risk and cost of failure in drug development due to inappropriate target selection.

[0067] In some optional implementations, the prediction results include predicted values corresponding to multiple categories, which include at least one of the following: non-drug targets, preclinical stage, Phase I clinical trials, Phase II clinical trials, Phase III clinical trials, and market approval; the priority of each target gene is determined based on the predicted value corresponding to the selected category. These six categories are ordered classes.

[0068] Specifically, in the prediction results of a disease-gene pair, the predicted values corresponding to each category represent the probability that the target gene cannot be used as a drug target (i.e., non-drug target category) for the target disease, and the probability that it can be used as a drug target to reach the corresponding stage (including the preclinical stage, clinical stage I, clinical stage II, clinical stage III and market approval stages).
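
One way such per-category values could be derived from an ordered regression model is sketched below. It assumes the model emits cumulative probabilities P(stage > k) that do not increase with k, as a CORAL-style model does; the function name and the 0.5 decision cutoff are assumptions, and the stage names follow the categories listed above.

```python
STAGES = [
    "Non-drug target",
    "Preclinical stage",
    "Phase I clinical trial",
    "Phase II clinical trial",
    "Phase III clinical trial",
    "Approved for marketing",
]

def decode_ordinal(cum_probs):
    """Turn K-1 cumulative probabilities into a stage index and class probabilities.

    cum_probs[k] is P(stage > k) and is assumed to be non-increasing in k.
    """
    # Predicted stage = number of thresholds the gene is predicted to exceed.
    stage_index = sum(1 for p in cum_probs if p > 0.5)
    class_probs = [1.0 - cum_probs[0]]
    for higher, lower in zip(cum_probs, cum_probs[1:]):
        class_probs.append(higher - lower)
    class_probs.append(cum_probs[-1])
    return stage_index, class_probs
```

For instance, cumulative values of 0.9, 0.7, 0.6, 0.3 and 0.1 exceed the cutoff three times, so the predicted stage is index 3 of `STAGES`.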

[0069] Step S104: Based on the prediction results of each of the disease-gene pairs, screen drug gene targets for the target disease.

[0070] The drug target screening method provided in this embodiment can be described as a comprehensive approach combining genetic analysis and machine learning. By comprehensively and deeply integrating multi-level information and multi-dimensional genetic evidence, such as GWAS signaling, QTL functional annotation, epigenomics, and network and pathway analysis, it further improves the understanding of the molecular mechanisms of human diseases. On this basis, it combines gene-level features, including disease database information, gene pathway information (i.e., signaling pathway data 21), the protein family to which the gene belongs (i.e., protein family information 22), the gene expression level in different tissues (i.e., gene expression level information 4), and gene function annotation information 5. It also uses protein-protein interaction network propagation features and employs a machine learning model that avoids data circular dependence to model multi-source data. This significantly improves the accuracy of drug target priority screening, enhances the comprehensiveness of the genetic analysis methods covered, increases the success rate and efficiency of clinical trials, effectively reduces drug development costs, and accelerates the process of bringing new drugs to market.

[0071] In other words, this invention provides a more comprehensive and reliable target evaluation system for drug development, improves the efficiency of drug target discovery, helps accelerate the process of new drug launch and improves the success rate of clinical trials.

[0072] This invention also effectively solves problems in existing drug target prediction such as incomplete features, imbalanced positive and negative samples (in this invention, a large amount of pre-drug development data is introduced during training, increasing the number of positive samples), data leakage, and overfitting (in this invention, the output of other models is not used during training, and model performance is validated by holding out all genes on one chromosome as the validation set, which avoids inflated performance due to similarity to the disease). This improves the efficiency and success rate of drug development and has broad application prospects. Furthermore, it can discover gene-disease pairs with the potential for "drug repurposing" under high-threshold screening. For example, if a potential gene-disease pair is discovered, the gene itself may already be the target of a drug for another disease. That drug may also be suitable for the current disease, which makes it easier to find preclinical experimental evidence.
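
The chromosome hold-out described above can be sketched as follows. The function name, the gene symbols, and the chromosome assignments in the usage example are hypothetical placeholders.

```python
def split_by_chromosome(pairs, gene_chromosome, held_out):
    """Hold out every disease-gene pair whose gene lies on one chromosome.

    pairs:           list of (disease, gene) tuples.
    gene_chromosome: dict gene -> chromosome label.
    held_out:        chromosome label reserved for validation.
    Returns (training_pairs, validation_pairs).
    """
    training, validation = [], []
    for disease, gene in pairs:
        if gene_chromosome[gene] == held_out:
            validation.append((disease, gene))
        else:
            training.append((disease, gene))
    return training, validation

# Hypothetical usage: GENE_B sits on chromosome 2 and is held out.
train_pairs, valid_pairs = split_by_chromosome(
    [("disease_1", "GENE_A"), ("disease_1", "GENE_B"), ("disease_2", "GENE_B")],
    {"GENE_A": "1", "GENE_B": "2"},
    held_out="2",
)
```

Because no gene appears in both sets, a gene seen during training cannot leak into validation through a different disease.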

[0073] The drug target screening method provided in this invention can be developed into an application, which has pre-integrated relevant molecular characteristic data, GWAS and QTL association results, disease gene databases, genetic analysis scores, and other relevant annotations. Furthermore, the application has standardized the names of various disease phenotypes through software and manual matching.

[0074] In this embodiment, users can collect new GWAS data themselves and then use the machine learning model in the program to predict or score the new data to obtain the probability of gene-disease pair associations and potential drug development stages. The specific operation process is as follows: I. Preparing GWAS Data: Users obtain their own GWAS summary statistical file. This file should contain information such as SNP loci, corresponding p-values, and effects, and should be in a standard GWAS format. Users confirm the name of the GWAS phenotype and attempt to match it with the corresponding disease / phenotype name in the database.

[0075] II. Log in and upload data: Users select the model's "prediction / scoring" function, upload their own GWAS summary statistics, specify a phenotype name, and match that name with an existing disease / phenotype in the database.

[0076] III. Data Integration and Preprocessing: The program first performs LD clumping or COJO analysis on the uploaded GWAS data (if the user has already completed this step, they can instead directly upload a list of screened independent SNPs) to define the range of each GWAS locus and align it with the existing gene annotations in the database. The program can automatically call the fine-mapping step in the background or directly use a credible variant set provided by the user (PIP > 0.1), and then links chromatin function data to these candidate pathogenic variant regions to obtain annotation information such as V2G and L2G. The program compares the user-uploaded phenotype to be predicted with the various QTL data stored in the database. If corresponding QTL information (including eQTL, sQTL, mQTL, etc.) is available within the locus range requested by the user, it is added to the scoring features; if no corresponding information is available, the missing features are filled with default values (z-scores use the mean, p-values use 1, and probabilities use 0).
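
The default-filling rule at the end of this step can be sketched as below. The sketch assumes that "the mean" refers to the mean of the observed values of the same feature, and it falls back to 0.0 when a z-score feature has no observed values at all; both points, along with the function and feature names, are assumptions.

```python
def fill_missing_features(rows, feature_kinds):
    """Fill missing feature values with type-specific defaults.

    rows:          list of dicts, feature name -> value or None.
    feature_kinds: dict, feature name -> "zscore", "pvalue" or "probability".
    """
    # Mean of observed values for each z-score feature (assumed meaning of "mean").
    means = {}
    for name, kind in feature_kinds.items():
        if kind == "zscore":
            observed = [r[name] for r in rows if r.get(name) is not None]
            means[name] = sum(observed) / len(observed) if observed else 0.0

    filled = []
    for row in rows:
        out = dict(row)
        for name, kind in feature_kinds.items():
            if out.get(name) is None:
                if kind == "zscore":
                    out[name] = means[name]
                elif kind == "pvalue":
                    out[name] = 1.0
                else:
                    out[name] = 0.0
        filled.append(out)
    return filled
```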

[0077] IV. Prediction using a pre-trained model: The program uses a pre-trained machine learning model to score or predict the uploaded GWAS data. This model includes disease-specific and non-disease-specific features (signaling pathways, gene families, gene expression, etc.). The model outputs six ordered categories: (0) Non-drug target, (1) Preclinical stage, (2) Phase I clinical trial, (3) Phase II clinical trial, (4) Phase III clinical trial, and (5) Approved for marketing. For each gene-disease pair corresponding to the new GWAS data, the program provides a predicted value representing where the gene is likely to fall on the scale from never entering drug development to reaching the highest clinical stage.

[0078] V. View and Export Results: Based on the prediction results, the program generates risk predictions and potential clinical development stage predictions for each gene-disease pair. Users can view the gene-disease pair prediction table, including information such as molecular regulatory evidence, co-localization results, and consistency scores. Users can export the results for further analysis or as a reference for drug development clues.

[0079] In this embodiment, users can use a machine learning model that has been trained on a database to infer new GWAS data without having to train a neural network model themselves, thereby obtaining gene-disease relationships and potential research and development stage analysis results.

[0080] In some scenarios, users may want to retrain their models using large-scale GWAS data or data from specific populations that they have collected themselves, in order to obtain more accurate predictive capabilities on more granular phenotypic or population characteristics. The process is as follows: I. Preparing Training Data: Users collect and organize a large amount of GWAS summary statistics, covering multiple studies containing the target phenotype or related traits. To ensure compatibility with the genetic analysis methods described in this application, users should use standardized GWAS format files corresponding to each method, clearly specifying the phenotype name and the corresponding disease database ID. Users may also add data features from other sources (such as new QTL data, unique population gene expression profiles), provided that they are consistent with the existing annotation system of the database (gene IDs, disease terminology, etc.) or can be compared using third-party tools.

[0081] II. Data Preprocessing: Configure training parameters, such as which levels of scores to train, whether to include disease-specific features such as ClinVar, OMIM, HGMD, and MGI, and whether to incorporate principal component analysis information of protein family features. Automatically fill in missing features; for parts without data support on QTLs or chromatin networks, automatically use default filler values.

[0082] III. Training Set Partitioning and Model Training: Configure cross-validation strategy; call the multilayer perceptron structure of the CORAL framework for ordered multi-class classification training. The model structure by default includes 2 hidden layers and modules such as LeakyReLU, Dropout (0.2) and Batch Normalization; define the learning rate (default Adam optimizer 0.001), number of iterations and early stopping strategy.
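
The defaults named in this step, together with a simple early stopping rule, can be collected in a small configuration sketch. Only the two hidden layers, the Dropout rate of 0.2, the Adam optimizer, and the learning rate of 0.001 come from the text; the maximum number of epochs, the patience value, and the class names are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    hidden_layers: int = 2          # default stated in the text
    dropout: float = 0.2            # default stated in the text
    optimizer: str = "adam"         # default stated in the text
    learning_rate: float = 0.001    # default stated in the text
    max_epochs: int = 200           # assumption: not specified in the text
    patience: int = 10              # assumption: not specified in the text

class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, validation_loss):
        """Record one epoch; return True if training should stop."""
        if validation_loss < self.best_loss:
            self.best_loss = validation_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```

In a training loop, `step` would be called once per epoch with the validation loss, and the loop would exit as soon as it returns True or `max_epochs` is reached.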

[0083] IV. Validation and Hyperparameter Tuning: Record evaluation metrics on the validation set (such as accuracy, mean squared error, AUC, or specific metrics based on CORAL); review the training curves and manually adjust hyperparameters as needed; after training, evaluate the final performance on the test set and output the confusion matrix, classification report, and ranking evaluation results.

[0084] V. Model saving and prediction application: Save well-trained or best-performing models, identify the target population or specific phenotypic domains for which they are applicable, and indicate the source of the training data in the model information; if users or other collaborators need to use the model for prediction in the future, they can input the new GWAS data and select the custom model to obtain the corresponding prediction results.

[0085] Through the above steps, users can retrain their models using their own large-scale GWAS data, thereby obtaining more accurate gene-disease association predictions and drug development guidance in more diverse or specific population contexts.

[0086] In summary, through the operational process examples of the three embodiments above, the analytical methods and models provided in this application achieve integrated analysis and prediction of gene-disease associations, including but not limited to: analyzing gene-disease pair correlations through integrated genetic methods, predicting new GWAS data based on existing models, and training or refining models independently. This application can simultaneously meet the data analysis, bioinformatics mining, and drug development needs of different researchers, greatly improving the efficiency and accuracy in complex disease gene identification and potential drug target screening.

[0087] This embodiment also provides a drug target screening device for implementing the above-described drug target screening method embodiments and preferred embodiments, which will not be repeated hereafter. As used below, the term "module" can refer to a combination of software and / or hardware that performs a predetermined function. Although the device described in the following embodiments is preferably implemented in software, hardware implementation, or a combination of software and hardware, is also possible and contemplated.

[0088] This embodiment provides a drug target screening device. As shown in Figure 3, the device includes: The first acquisition module 301 is used to acquire N target diseases and M target genes corresponding to the target diseases, and construct N×M disease-gene pairs; The second acquisition module 302 is used to acquire corresponding target information for each disease-gene pair. The target information includes at least one of the following: genome annotation information, mutation location gene information, gene-level genetic association test information, site-specific multi-omics integration analysis information, gene functional similarity analysis and network analysis information, genetic method summative score, network diffusion information of the genetic method summative score, gene pathway information, single-gene disease database information, tumor disease database information, gene functional annotation information, gene protein family information, and gene expression level information. The single-gene disease database information refers to information related to the disease-gene pair that is screened from the disease database. The prediction module 303 is used to input the target information of each disease-gene pair into the machine learning model to obtain the corresponding prediction results. The prediction results are used to indicate the priority of the target gene in the disease-gene pair as the target of the target disease. The screening module 304 is used to screen drug gene targets for the target disease based on the prediction results of each of the disease-gene pairs.

[0089] In some optional embodiments, when the target information includes a genetic method summation score, the second acquisition module 302 includes: The scoring acquisition unit is used to acquire the scores corresponding to each sub-item, wherein the sub-item includes at least one of the following: genome annotation, mutation localization gene, gene-level test, site multi-omics integration analysis, and network analysis. The total score calculation unit is used to calculate the total score of the genetic method based on the scores corresponding to each of the sub-items.

[0090] In some optional implementations, where the sub-item includes the genome annotation, the scoring unit is specifically used for: For the disease-gene pair, identify independent single nucleotide polymorphisms associated with the target disease; Within a predetermined range upstream and downstream of each independent single nucleotide polymorphism, a genome-wide association analysis site range is defined; Within the specified genome-wide association study (GWAS) loci interval, determine the GWAS loci; By using gene locations marked by genome annotation, the base distance between potential pathogenic mutations at the genome-wide association analysis sites and the transcription start sites of the target gene is calculated, and it is detected whether any potential pathogenic mutations are located in the coding region of the target gene, thereby achieving the localization of mutations to genes; The genome annotation score is obtained by statistically analyzing whether the pathogenic mutation is located within the coding region of the target gene.

[0091] In some optional embodiments, when the sub-item includes the mutation-localized gene, the scoring unit is specifically used for: For the disease-gene pair, identify independent single nucleotide polymorphisms associated with the target disease; Within a predetermined range upstream and downstream of each independent single nucleotide polymorphism, a genome-wide association analysis site range is defined; Within the specified genome-wide association study (GWAS) loci interval, determine the GWAS loci; By using chromatin function data, the possible pathogenic mutations in the genome-wide association analysis sites are linked to the exons, promoters, and enhancer functional regions of the target gene, thereby achieving functional localization from mutation to gene. The number and types of connections between the pathogenic mutation and the target gene are counted to obtain the score corresponding to the mutation-localized gene.

[0092] In some optional embodiments, when the sub-item includes the gene-level test, the scoring acquisition unit is specifically used for: Applying multiple gene-level genetic association analysis methods, and performing gene-based association analysis based on the linkage disequilibrium structure of chromosomes to summarize signals at the single nucleotide polymorphism level; Counting the genetic association analysis methods that pass threshold detection in the gene-level test to obtain the score corresponding to the gene-level test.

[0093] In some optional implementations, when the sub-item includes the multi-omics integrated analysis of the loci, the scoring acquisition unit is specifically used for: Performing colocalization or causal inference on the target gene's RNA expression QTLs, RNA splicing QTLs, protein abundance QTLs, methylation QTLs, chromatin accessibility QTLs, and histone modification QTLs in multiple organs, tissues, and cell types using colocalization, Mendelian randomization, and transcriptome association analysis; Counting the methods that pass threshold detection in the locus multi-omics integration analysis, together with the tissues and organs in which they pass, to obtain the score corresponding to the locus multi-omics integration analysis.

[0094] In some optional implementations, when the sub-item includes the network analysis, the scoring acquisition unit is specifically used for: Based on gene and protein networks, using various gene functional similarity analysis and network analysis methods to evaluate the functional associations of genes in co-expression networks and protein interactions; Counting the gene functional similarity analysis and network analysis methods that pass threshold detection in the network analysis to obtain the score corresponding to the network analysis.

[0095] In some optional embodiments, when the target information includes network diffusion information of the summation score of genetic methods, the second acquisition module 302 includes: The total score acquisition unit is used to acquire the genetic method total score of the disease-gene pair; The diffusion unit is used to set all target genes with a total genetic method score greater than zero in the protein-protein interaction network copy corresponding to the target disease as seed nodes, and to diffuse the target genes to the adjacent nodes in the network by combining the total genetic method scores of the disease-gene pairs through random walk and PageRank algorithms respectively. The diffusion information acquisition unit is used to obtain the post-diffusion score of the target gene in the protein-protein interaction network copy corresponding to the target disease, as the network diffusion information of the sum score of genetic methods.

[0096] Further functional descriptions of the above modules and units are the same as those in the corresponding embodiments described above, and will not be repeated here.

[0097] In this embodiment, the drug target screening device is presented in the form of a functional unit. Here, a unit refers to an ASIC (Application Specific Integrated Circuit) circuit, a processor and memory that execute one or more software or fixed programs, and / or other devices that can provide the above functions.

[0098] This invention also provides a computer device having the drug target screening device shown in Figure 3.

[0099] Please refer to Figure 4, which is a schematic diagram of the structure of a computer device provided in an optional embodiment of the present invention. As shown in Figure 4, the computer device includes one or more processors 410, memory 420, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components communicate with each other via different buses and can be mounted on a common motherboard or otherwise installed as needed. The processors can process instructions executed within the computer device, including instructions stored in or on memory to display graphical information of a GUI on external input / output devices (such as display devices coupled to the interfaces). In some alternative implementations, multiple processors and / or multiple buses can be used with multiple memories and multiple memory modules, if desired. Similarly, multiple computer devices can be connected, each providing some of the necessary operations (e.g., as a server array, a group of blade servers, or a multiprocessor system). Figure 4 takes one processor 410 as an example.

[0100] Processor 410 may be a central processing unit, a network processor, or a combination thereof. Processor 410 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The programmable logic device may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.

[0101] The memory 420 stores instructions executable by at least one processor 410 to cause the at least one processor 410 to perform the method shown in the above embodiments.

[0102] The memory 420 may include a program storage area and a data storage area. The program storage area may store the operating system and applications required for at least one function; the data storage area may store data created based on the use of the computer device. Furthermore, the memory 420 may include high-speed random access memory and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, the memory 420 may optionally include memory remotely located relative to the processor 410, and these remote memories may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

[0103] The memory 420 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk or solid-state drive; the memory 420 may also include a combination of the above types of memory.

[0104] The computer device also includes an input device 430 and an output device 440. The processor 410, memory 420, input device 430, and output device 440 can be connected via a bus or other means. Figure 4 takes connection via a bus as an example.

[0105] Input device 430 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer device; examples include a touchscreen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, and joystick. Output device 440 may include display devices, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors). The aforementioned display devices include, but are not limited to, liquid crystal displays, light-emitting diode displays, and plasma displays. In some alternative embodiments, the display device may be a touchscreen.

[0106] The computer device also includes a communication interface for communicating with other devices or communication networks.

[0107] This invention also provides a computer-readable storage medium. The methods described above according to embodiments of the invention can be implemented in hardware or firmware, or implemented as computer code that can be recorded on a storage medium, or implemented as computer code downloaded via a network and originally stored on a remote storage medium or a non-transitory machine-readable storage medium and then stored on a local storage medium. Thus, the methods described herein can be processed by software stored on a storage medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, optical disk, read-only memory, random access memory, flash memory, hard disk, or solid-state drive, etc.; further, the storage medium can also include combinations of the above types of memory. It is understood that computers, processors, microprocessor controllers, or programmable hardware include storage components capable of storing or receiving software or computer code, which, when accessed and executed by the computer, processor, or hardware, implements the methods shown in the above embodiments.

[0108] A portion of this invention can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide the methods and / or technical solutions according to the invention through the operation of the computer. Those skilled in the art will understand that the forms in which computer program instructions exist in a computer-readable medium include, but are not limited to, source files, executable files, installation package files, etc. Correspondingly, the ways in which computer program instructions are executed by a computer include, but are not limited to: the computer directly executing the instructions, or the computer compiling the instructions and then executing the corresponding compiled program, or the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed program. Here, the computer-readable medium can be any available computer-readable storage medium or communication medium accessible to a computer.

[0109] Although embodiments of the invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations all fall within the scope defined by the appended claims.

Claims

1. A method for screening drug targets, characterized in that the method includes:
obtaining N target diseases and M target genes corresponding to the target diseases, and constructing N×M disease-gene pairs;
obtaining, for each disease-gene pair, corresponding target information, the target information including at least one of the following: genome annotation information, mutation-to-gene localization information, gene-level genetic association test information, locus multi-omics integration analysis information, gene functional similarity analysis and network analysis information, a genetic method summative score, network diffusion information of the genetic method summative score, gene pathway information, single-gene disease database information, tumor disease database information, gene functional annotation information, gene protein family information, and gene expression level information, wherein the disease database information refers to information related to the disease-gene pair obtained from disease databases;
inputting the target information of each disease-gene pair into a machine learning model to obtain corresponding prediction results, the prediction results indicating the priority of the target gene in the disease-gene pair as a target of the target disease; and
screening drug gene targets for the target disease based on the prediction results of each of the disease-gene pairs.
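
The flow of claim 1 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names `build_pairs` and `screen_targets`, the `top_k` cut-off, the toy feature vectors, and the `predict` stand-in for the machine learning model are all assumptions introduced here.

```python
from itertools import product


def build_pairs(diseases, genes):
    """Return every (disease, gene) combination: N diseases x M genes."""
    return list(product(diseases, genes))


def screen_targets(pairs, features, predict, top_k=1):
    """Rank the genes of each disease by predicted priority and keep top_k."""
    by_disease = {}
    for disease, gene in pairs:
        score = predict(features[(disease, gene)])
        by_disease.setdefault(disease, []).append((score, gene))
    return {
        disease: [g for _, g in sorted(scored, key=lambda sg: sg[0], reverse=True)[:top_k]]
        for disease, scored in by_disease.items()
    }


def predict(feature_vector):
    """Hypothetical stand-in for the trained model: priority = first feature."""
    return feature_vector[0]


diseases = ["disease_A", "disease_B"]
genes = ["GENE1", "GENE2", "GENE3"]
pairs = build_pairs(diseases, genes)

# Toy feature vectors (assumption); a real pipeline would fill these from the
# target information sources listed in the claim.
features = {pair: [float(i)] for i, pair in enumerate(pairs)}

selected = screen_targets(pairs, features, predict, top_k=1)
```

Keeping the pair construction separate from the scoring step means the same N×M grid can be reused when the feature sources or the model change.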

2. The method according to claim 1, characterized in that the prediction results include predicted values corresponding to multiple categories, the categories including at least one of the following: non-drug target, preclinical stage, clinical phase I, clinical phase II, clinical phase III, and approved for marketing; and the priority of each target gene is determined based on the predicted value corresponding to a selected category.
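
As an illustration of claim 2, the sketch below ranks genes by the predicted value of one selected category. The category keys, the helper `rank_genes`, and the numbers are hypothetical and serve only to show the selection step.

```python
CATEGORIES = ["non_target", "preclinical", "phase_1", "phase_2", "phase_3", "approved"]


def rank_genes(predictions, selected_category):
    """Order genes by their predicted value for the selected category, highest first."""
    return sorted(
        predictions,
        key=lambda gene: predictions[gene][selected_category],
        reverse=True,
    )


# Toy per-category predicted values for three genes of one disease (assumption).
predictions = {
    "GENE1": dict(zip(CATEGORIES, [0.70, 0.10, 0.08, 0.06, 0.04, 0.02])),
    "GENE2": dict(zip(CATEGORIES, [0.10, 0.10, 0.10, 0.20, 0.20, 0.30])),
    "GENE3": dict(zip(CATEGORIES, [0.30, 0.20, 0.20, 0.10, 0.10, 0.10])),
}

ranked_by_approved = rank_genes(predictions, "approved")
```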

3. The method according to claim 1, characterized in that, when the target information includes the genetic method summative score, obtaining the corresponding target information for each disease-gene pair includes:
scoring each sub-item separately, the sub-items including at least one of the following: genome annotation, mutation-to-gene localization, gene-level test, locus multi-omics integration analysis, and network analysis; and
calculating the genetic method summative score based on the scores corresponding to each of the sub-items.

4. The method according to claim 3, characterized in that, when the sub-items include the genome annotation, obtaining the score corresponding to each sub-item includes:
identifying, for the disease-gene pair, independent single nucleotide polymorphisms associated with the target disease;
defining a genome-wide association study (GWAS) locus interval within a predetermined range upstream and downstream of each independent single nucleotide polymorphism;
determining the GWAS loci within the defined GWAS locus interval;
using the gene positions given by the genome annotation, calculating the base-pair distance between potential pathogenic mutations at the GWAS loci and the transcription start site of the target gene, and detecting whether any potential pathogenic mutation lies in the coding region of the target gene, thereby localizing mutations to genes; and
obtaining the genome annotation score by tallying whether the pathogenic mutations lie within the coding region of the target gene.
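
The positional steps of claim 4 can be illustrated with plain coordinate arithmetic. In this sketch the 500 kb flank, the `Gene` record, the inclusive coordinate convention, and the "count of coding hits" scoring rule are assumptions; the claim only requires a predetermined range and a tally.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    chrom: str
    tss: int    # transcription start site position
    cds: list   # coding intervals as (start, end), inclusive


def locus_window(snp_pos, flank=500_000):
    """Interval around an independent SNP (flank size is an assumed value)."""
    return max(0, snp_pos - flank), snp_pos + flank


def variants_in_window(variants, window):
    start, end = window
    return [v for v in variants if start <= v <= end]


def tss_distance(variant_pos, gene):
    """Base-pair distance between a variant and the gene's TSS."""
    return abs(variant_pos - gene.tss)


def in_coding_region(variant_pos, gene):
    return any(start <= variant_pos <= end for start, end in gene.cds)


def annotation_score(variants, gene):
    """Assumed scoring rule: number of candidate variants inside the coding region."""
    return sum(1 for v in variants if in_coding_region(v, gene))


gene = Gene("chr1", tss=1_000_000, cds=[(1_000_200, 1_000_400), (1_002_000, 1_002_300)])
window = locus_window(1_000_100)
candidates = variants_in_window([999_500, 1_000_250, 1_002_100, 1_700_000], window)
distances = [tss_distance(v, gene) for v in candidates]
score = annotation_score(candidates, gene)
```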

5. The method according to claim 3, characterized in that, when the sub-items include the mutation-to-gene localization, obtaining the score corresponding to each sub-item includes:
identifying, for the disease-gene pair, independent single nucleotide polymorphisms associated with the target disease;
defining a genome-wide association study (GWAS) locus interval within a predetermined range upstream and downstream of each independent single nucleotide polymorphism;
determining the GWAS loci within the defined GWAS locus interval;
using chromatin function data, linking the potential pathogenic mutations at the GWAS loci to the exon, promoter, and enhancer functional regions of the target gene, thereby achieving functional localization from mutation to gene; and
counting the number and types of connections between the pathogenic mutations and the target gene to obtain the score corresponding to the mutation-to-gene localization.

6. The method according to claim 3, characterized in that, when the sub-items include the gene-level test, obtaining the score corresponding to each sub-item includes:
employing multiple gene-level genetic association analysis methods, and performing gene-based association analysis according to the linkage disequilibrium structure of the chromosomes to aggregate signals at the single nucleotide polymorphism level; and
obtaining the score corresponding to the gene-level test by counting the genetic association analysis methods that pass threshold detection in the gene-level test.
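
One reading of claim 6 is that the score counts how many gene-level methods pass a significance threshold. The sketch below follows that reading as an assumption; the method names, p-values, and the Bonferroni-style threshold are hypothetical.

```python
def gene_level_score(p_values, threshold):
    """Count the gene-level methods whose p-value passes the threshold."""
    return sum(1 for p in p_values.values() if p < threshold)


# Assumed threshold: 0.05 corrected for 20,000 tested genes.
n_genes_tested = 20_000
threshold = 0.05 / n_genes_tested

# Toy p-values for one disease-gene pair from three placeholder methods.
p_values = {"method_a": 1e-8, "method_b": 3e-6, "method_c": 0.2}

score = gene_level_score(p_values, threshold)
```

The same counting pattern would extend to the multi-omics and network sub-items by iterating over (method, tissue) or (method, network) combinations instead of methods alone.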

7. The method according to claim 3, characterized in that, when the sub-items include the locus multi-omics integration analysis, obtaining the score corresponding to each sub-item includes:
performing colocalization or causal inference on the target gene's RNA expression QTLs, RNA splicing QTLs, protein abundance QTLs, methylation QTLs, chromatin accessibility QTLs, and histone modification QTLs in multiple organs, tissues, and cell types, using colocalization, Mendelian randomization, and transcriptome-wide association analysis; and
obtaining the score corresponding to the locus multi-omics integration analysis by counting the methods and the tissues or organs that pass threshold detection in the locus multi-omics integration analysis.

8. The method according to claim 3, characterized in that, when the sub-items include the network analysis, obtaining the score corresponding to each sub-item includes:
evaluating, based on gene and protein networks, the functional association of genes in co-expression networks and protein interactions using multiple gene functional similarity analysis and network analysis methods; and
obtaining the score corresponding to the network analysis by counting the gene functional similarity analysis and network analysis methods that pass threshold detection in the network analysis.

9. The method according to claim 1, characterized in that, when the target information includes the network diffusion information of the genetic method summative score, obtaining the corresponding target information for each disease-gene pair includes:
obtaining the genetic method summative scores of the disease-gene pairs;
in the copy of the protein-protein interaction network corresponding to the target disease, setting as seed nodes all target genes whose genetic method summative score is greater than zero in the disease-gene pairs of that target disease;
diffusing the genetic method summative scores from the seed nodes to their neighboring nodes in the network using a random walk algorithm and a PageRank algorithm, respectively; and
taking the post-diffusion score of each target gene in that network copy as the network diffusion information of the genetic method summative score.
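
The diffusion step of claim 9 can be illustrated with a random walk with restart, which is closely related to personalized PageRank. The sketch shows only this one variant; the restart probability, convergence tolerance, and toy network are assumptions, and the claim does not specify these parameters.

```python
def random_walk_with_restart(adjacency, seed_scores, restart=0.5, tol=1e-10, max_iter=1000):
    """Diffuse seed scores over an undirected network given as {node: [neighbours]}."""
    seeds = {n: s for n, s in seed_scores.items() if s > 0}  # only scores > 0 seed the walk
    total = sum(seeds.values())
    if total == 0:
        raise ValueError("at least one seed with a positive score is required")
    nodes = sorted(adjacency)
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}  # restart distribution
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: send its walking mass back to the seeds.
                for m in nodes:
                    nxt[m] += (1 - restart) * p[n] * p0[m]
                continue
            share = (1 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy protein-protein interaction chain A - B - C - D (assumption).
ppi = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
# Genetic method summative scores; only A is positive, so only A is a seed.
seed_scores = {"A": 2.0, "B": 0.0}

scores = random_walk_with_restart(ppi, seed_scores)
```

After diffusion, non-seed genes such as `D` carry a small positive score that decreases with network distance from the seeds.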

10. The method according to claim 1, characterized in that the disease database information includes at least one of the following:
annotation information from the online database of human Mendelian inheritance;
annotation information from the clinical variant database;
annotation information from the mouse genome informatics database;
information from the catalog of somatic mutations in cancer;
annotation information from the cancer gene list;
annotation information from integrative tumor genomics; and
annotation information from the cancer gene network.

11. The method according to claim 1, characterized in that the machine learning model includes a multilayer perceptron network structure and a CORAL ordinal regression framework.
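
To illustrate how a multilayer perceptron can feed a CORAL output layer, the sketch below runs a single forward pass. In CORAL, one shared weight vector produces a single logit, and K-1 bias terms turn it into the probabilities P(y > k); the predicted rank is the number of those probabilities above 0.5. The layer sizes and all weight values here are hypothetical, and training is omitted.

```python
import math

CATEGORIES = ["non_target", "preclinical", "phase_1", "phase_2", "phase_3", "approved"]


def dense(x, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def coral_forward(x, hidden_w, hidden_b, out_w, threshold_biases):
    """MLP hidden layer followed by a CORAL head with shared output weights."""
    h = relu(dense(x, hidden_w, hidden_b))
    logit = sum(w * hi for w, hi in zip(out_w, h))          # shared across thresholds
    probs = [sigmoid(logit + b) for b in threshold_biases]  # P(y > k), k = 0..K-2
    rank = sum(p > 0.5 for p in probs)
    return probs, rank


# Hypothetical fixed parameters: 2 inputs, 2 hidden units, 6 ordered categories.
hidden_w = [[0.5, 0.5], [1.0, -1.0]]
hidden_b = [0.0, 0.0]
out_w = [1.0, 1.0]
threshold_biases = [2.0, 1.0, 0.0, -1.0, -2.0]

probs, rank = coral_forward([1.0, 2.0], hidden_w, hidden_b, out_w, threshold_biases)
predicted_category = CATEGORIES[rank]
```

Because every threshold shares the same logit, the probabilities are monotone whenever the biases are ordered, which keeps the predicted stages consistent with their ordering.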

12. A drug target screening device, characterized in that the device includes:
a first acquisition module, used to acquire N target diseases and M target genes corresponding to the target diseases, and to construct N×M disease-gene pairs;
a second acquisition module, used to acquire corresponding target information for each disease-gene pair, the target information including at least one of the following: genome annotation information, mutation-to-gene localization information, gene-level genetic association test information, locus multi-omics integration analysis information, gene functional similarity analysis and network analysis information, a genetic method summative score, network diffusion information of the genetic method summative score, gene pathway information, disease database information, gene functional annotation information, gene protein family information, and gene expression level information, wherein the disease database information refers to information related to the disease-gene pair that is screened from a disease database;
a prediction module, used to input the target information of each disease-gene pair into a machine learning model to obtain corresponding prediction results, the prediction results indicating the priority of the target gene in the disease-gene pair as a target of the target disease; and
a screening module, used to screen drug gene targets for the target disease based on the prediction results of each of the disease-gene pairs.

13. A computer device, characterized in that it includes a memory and a processor that are communicatively connected, wherein the memory stores computer instructions and the processor executes the computer instructions to perform the drug target screening method according to any one of claims 1 to 11.

14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a computer to perform the drug target screening method according to any one of claims 1 to 11.