Graph-based clinical analysis method integrating medical information data and medical ontology, and computer device therefor

The graph-based clinical analysis method addresses the challenges of data heterogeneity and complexity by integrating medical information and ontology, ensuring reliable and explainable clinical insights through graph neural networks and verification techniques.

WO2026127740A1 · PCT designated stage · Publication Date: 2026-06-18 · SEOUL NATIONAL UNIVERSITY R&DB FOUNDATION

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SEOUL NATIONAL UNIVERSITY R&DB FOUNDATION
Filing Date
2025-12-11
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

Conventional medical data analysis technologies struggle with handling the heterogeneity and complexity of medical data, failing to preserve semantic context and temporal sequences, and lack explainability, leading to unreliable and unverifiable analysis results.

Method used

A graph-based clinical analysis method integrating medical information data and medical ontology, utilizing a graph neural network to generate embedding vectors that encapsulate structural and semantic information, and performing topological and statistical verification to ensure stability and explainability.

Benefits of technology

Preserves deep semantic context and topological features, enhancing the precision and reliability of clinical analysis, providing explainable insights for medical professionals.

Generated by Eureka AI based on patent content.

Abstract

According to an embodiment of the present application, provided is a graph-based clinical analysis method integrating medical information data and medical ontology. The method may comprise the steps of: constructing an integrated graph database by modeling medical information data and medical ontology data of a plurality of patients with nodes and edges; generating a graph structure to be analyzed on the basis of at least some data of the integrated graph database; applying an embedding generation model to the graph structure to generate an embedding vector encoding structural information and semantic information of the graph structure; analyzing a relationship structure between patients on the basis of the embedding vector; verifying at least one of topological stability or statistical stability of the relationship structure; and generating and providing explainable clinical information corresponding to a user query on the basis of the relationship structure and the integrated graph database.

Description

Graph-based clinical analysis method integrating medical information data and medical ontology and computer device for the same

[0001] The present application relates to a graph-based clinical analysis method integrating medical information data and medical ontology, and a computer device for the same.

[0002] Owing to recent rapid advances in medical IT, vast amounts of medical big data, including electronic medical records (EMR), genomic data, and life logs, are being accumulated. Consequently, research on clinical decision support systems (CDSS) and precision medicine technologies is actively underway to analyze this accumulated data using artificial intelligence (AI) and machine learning techniques to predict patient diseases or suggest personalized treatments.

[0003] However, conventional medical data analysis technologies face technical limitations in handling the inherent heterogeneity and complexity of medical data. Medical data exists in various forms, such as diagnoses, prescriptions, and test results, and often uses different formats depending on the institution; consequently, existing relational database (RDB)-based systems find it difficult to flexibly reflect complex relationships between data or hierarchical semantic information, such as medical ontologies. As a result, there is a problem where the semantic context or temporal sequence between clinical events is omitted or not fully utilized during the analysis process.

[0004] Furthermore, the black-box nature of artificial intelligence models, such as deep learning, which have recently demonstrated high predictive performance, is also becoming an obstacle to their adoption in clinical settings. In the medical field, which deals with human lives, it is essential not only that a prediction be accurate but also that the medical basis from which it was derived be explained. However, existing models struggle to gain the trust of medical professionals because they fail to provide specific grounds for their judgments, which reveals the limitations of systems lacking explainability.

[0005] Furthermore, topological and statistical verification procedures for the derived analysis results are also currently inadequate. For instance, when deriving phenotypes by clustering patient data, it is difficult to clearly verify whether the formed clusters are the result of chance caused by data noise or parameter settings, or if they are meaningful outcomes reflecting the intrinsic structure of the actual data. This lack of verification undermines the reproducibility and reliability of the analysis results.

[0006] Accordingly, there is a need to develop new medical information analysis technologies that can integrate heterogeneous medical data to preserve semantic connectivity between data without loss, objectively verify the stability of analysis results, and provide medical professionals with explainable grounds for prediction results.

[0007] The present application aims to provide a graph-based clinical analysis method integrating medical information data and medical ontology, and a computer device for the same.

[0008] According to an embodiment of the present application, a graph-based clinical analysis method integrating medical information data and medical ontology is provided. The method may include the steps of: constructing an integrated graph database by modeling medical information data and medical ontology data of a plurality of patients into nodes and edges; generating a graph structure to be analyzed based on at least some of the data of the integrated graph database; applying an embedding generation model to the graph structure to generate an embedding vector that encapsulates structural information and semantic information of the graph structure; analyzing the relationship structure between patients based on the embedding vector; verifying at least one of topological stability and statistical stability of the relationship structure; and generating and providing explainable clinical information corresponding to a user query based on the relationship structure and the integrated graph database.

[0009] In addition, the step of generating the graph structure can be performed by extracting data from the integrated graph database that meets the conditions of a specific disease group or patient cohort set according to the purpose of analysis.

[0010] In addition, the step of constructing the integrated graph database may include establishing a semantic mapping relationship between a diagnosis or prescription node included in the medical information data and a concept node of the medical ontology, and preserving a semantic connection structure including at least one of a superordinate concept relationship, a lesion site relationship, and an association relationship within the medical ontology to integrate the semantic context of a clinical event.

[0011] In addition, the step of constructing the integrated graph database may further include the step of preprocessing the medical information data based on a predetermined standard format, and the step of preserving a time-series causal relationship by generating temporal edges indicating the chronological relationship or temporal interval of occurrence between a plurality of clinical event nodes connected to the same patient node.

[0012] Additionally, the embedding generation model may be a graph neural network (GNN)-based model and may include at least one of a graph attention network (GAT), a graph convolutional network (GCN), a relational graph convolutional network (RGCN), and GraphSAGE.

[0013] Additionally, the step of generating the embedding vector may include the step of optimizing the embedding vector by performing multitask learning including clinical event prediction, inter-node link prediction, and community detection tasks.

[0014] In addition, in the step of optimizing the embedding vector, model parameters can be optimized by applying a spectral-topological normalization loss function that induces the overall connectivity structure and topological features of the graph structure to be maintained within the embedding vector space when performing the multitask learning.

[0015] Additionally, the step of analyzing the relationship structure between the patients may include the step of calculating a distance or similarity index between the embedding vectors of the patient nodes, and the step of constructing a Patient Similarity Network (PSN) by setting edges for pairs of patient nodes that satisfy a set criterion based on the calculated distance or similarity index.
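
The construction of a patient similarity network described above can be sketched as follows. This is a minimal illustration only, assuming cosine similarity as the similarity index and a fixed threshold as the "set criterion"; the function names, the toy two-dimensional embeddings, and the threshold value are hypothetical and are not taken from the application.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_psn(embeddings, threshold):
    """Connect every pair of patients whose cosine similarity is >= threshold."""
    edges = []
    for a, b in combinations(sorted(embeddings), 2):
        sim = cosine_similarity(embeddings[a], embeddings[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges

# Toy patient embeddings (hypothetical values, two dimensions for readability).
embeddings = {
    "p1": [1.0, 0.0],
    "p2": [0.9, 0.1],
    "p3": [0.0, 1.0],
    "p4": [0.1, 0.9],
}
psn = build_psn(embeddings, threshold=0.9)
for a, b, sim in psn:
    print(a, b, round(sim, 3))
```

With these toy values only the pairs (p1, p2) and (p3, p4) exceed the threshold. A distance-based criterion or a k-nearest-neighbor rule would be equally consistent with the paragraph above.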

[0016] Additionally, the step of analyzing the relationship structure between the patients may further include the step of identifying a plurality of patient clusters within the patient similarity network, the step of identifying a hub node within each of the patient clusters whose centrality is greater than or equal to a predetermined threshold, and the step of defining the hub node as a clinical prototype representing the corresponding patient cluster and analyzing the clinical characteristics of the hub node.
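
As a rough sketch of the cluster and hub identification described above, the example below treats connected components of the patient similarity network as clusters and uses within-cluster degree centrality as the centrality measure. Both choices are assumptions made for brevity; the application does not fix a particular clustering algorithm or centrality definition, and the function names and threshold are hypothetical.

```python
def connected_components(nodes, edges):
    """Return (clusters, adjacency) for an undirected graph given as node and edge lists."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(adjacency[node] - seen)
        clusters.append(sorted(component))
    return clusters, adjacency

def find_hubs(nodes, edges, threshold):
    """Map each cluster to the nodes whose degree centrality is >= threshold."""
    clusters, adjacency = connected_components(nodes, edges)
    hubs = {}
    for cluster in clusters:
        if len(cluster) < 2:
            continue  # a singleton has no meaningful centrality
        # Degree centrality within the cluster: degree / (cluster size - 1).
        centrality = {n: len(adjacency[n]) / (len(cluster) - 1) for n in cluster}
        hubs[tuple(cluster)] = [n for n in cluster if centrality[n] >= threshold]
    return hubs

# Toy network: a star centered on p1 and a path p5 - p6 - p7.
nodes = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]
edges = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p5", "p6"), ("p6", "p7")]
hubs = find_hubs(nodes, edges, threshold=0.8)
print(hubs)
```

Here p1 and p6 are the hub nodes; in the method described above, the clinical characteristics of such nodes would then be inspected as prototypes of their clusters.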

[0017] Additionally, the step of verifying at least one of the topological stability and statistical stability may include the step of performing topological data analysis on the patient similarity network to extract topological invariants or persistence features.

[0018] Additionally, the step of verifying at least one of the topological stability and statistical stability may further include the step of evaluating the reliability or significance of the topological invariant or persistence feature using statistical resampling.
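
One simple way to picture the two verification steps above is sketched below. It assumes that the persistence feature of interest is 0-dimensional persistence (the scales at which connected components merge in a Vietoris-Rips filtration) and that the resampling scheme is a plain bootstrap over patients. These are illustrative assumptions, not the procedure specified in the application, and all names and parameter values are hypothetical.

```python
import math
import random
from itertools import combinations

def h0_deaths(points):
    """Scales at which connected components merge (0-dimensional persistence),
    computed with union-find over pairwise distances sorted in ascending order."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    deaths = []
    for dist, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(dist)  # one component dies at this scale
    return deaths

def persistent_clusters(points, min_persistence):
    """Components surviving beyond min_persistence (the never-dying one included)."""
    return 1 + sum(d > min_persistence for d in h0_deaths(points))

def bootstrap_support(points, min_persistence, n_resamples=200, seed=0):
    """Fraction of bootstrap resamples that reproduce the observed cluster count."""
    rng = random.Random(seed)
    observed = persistent_clusters(points, min_persistence)
    hits = 0
    for _ in range(n_resamples):
        sample = [rng.choice(points) for _ in points]
        hits += persistent_clusters(sample, min_persistence) == observed
    return observed, hits / n_resamples

# Toy embeddings: two well-separated groups of four patients each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
observed, support = bootstrap_support(points, min_persistence=1.0)
print(observed, support)
```

A support value close to 1 suggests that the observed number of persistent clusters is not an artifact of a particular sample. Higher-dimensional features such as loops would require a dedicated topological data analysis library.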

[0019] Additionally, the method may further include a step of identifying patient phenotype groups by performing clustering on the embedding vectors.

[0020] Additionally, the step of verifying at least one of the topological stability and statistical stability may further include a step of verifying the validity of a phenotype discovered in a data-driven manner by checking whether the patient phenotype group is consistent with the topological invariant or persistence feature.

[0021] Additionally, the step of generating and providing the explainable clinical information may include the step of generating a graph query to explore the integrated graph database by analyzing the user's natural language query, and the step of generating an answer to the natural language query based on the patient trajectory path extracted from the integrated graph database as the result of executing the graph query and on the patient cluster information identified in the relationship structure.

[0022] Additionally, the step of generating and providing the explainable clinical information may include the step of calculating a prediction result regarding the patient's clinical condition using the embedding vector, the step of calculating a major clinical variable that contributes significantly to the prediction result, and the step of providing the contribution information of the major clinical variable by including it in the explainable clinical information.

[0023] In addition, the method may further include a step of calculating a quality indicator for at least one of a specific disease group and prescription pattern using the integrated graph database.

[0024] A computer program is provided according to an embodiment of the present application. The program may be stored on a recording medium to execute a method according to an embodiment of the present application.

[0025] According to an embodiment of the present application, a computer device for performing graph-based clinical analysis integrating medical information data and medical ontology is provided. The computer device comprises at least one processor; and a memory for storing a program executable by the processor. By executing the program, the processor models the medical information data and medical ontology data of a plurality of patients into nodes and edges to construct an integrated graph database; generates a graph structure to be analyzed based on at least some data of the integrated graph database; applies an embedding generation model to the graph structure to generate an embedding vector containing structural information and semantic information of the graph structure; analyzes the relationship structure between patients based on the embedding vector; verifies at least one of topological stability and statistical stability of the relationship structure; and generates and provides explainable clinical information corresponding to a user query based on the relationship structure and the integrated graph database.

[0026] According to the embodiments of the present application, by integrating electronic medical record (EMR) data and standard medical ontologies into a multilayer graph structure and applying semantic mapping, the deep semantic context between clinical events can be perfectly preserved, thereby dramatically improving the precision of tracing complex disease pathways and the accuracy of data interpretation at the same time.

[0027] In addition, according to the embodiments of the present application, not only local connectivity of patient data but also macroscopic topological features can be precisely reflected in the learning process, thereby maximizing the performance of clinical deterioration prediction even in high-dimensional, sparse medical data environments and fundamentally solving the problem of overfitting.

[0028] In addition, according to the embodiments of the present application, by constructing a patient similarity network (PSN) based on learned embedding vectors and performing topological data analysis (TDA) and bootstrap-based statistical verification thereon, the structural stability of data-driven patient phenotypes can be mathematically verified, thereby significantly increasing clinical reliability.

[0029] In addition, according to the embodiments of the present application, not only the prediction results but also the patient trajectory path and quality measures that serve as the basis for them are extracted and linked with a large language model (LLM), thereby providing medical professionals with explainable clinical information with medical validity, which can dramatically improve the efficiency of decision-making in clinical settings.

[0030] The effects obtainable from the embodiments of the present application are not limited to those mentioned above, and other unmentioned effects will be clearly understood by those skilled in the art to which the present application belongs from the description below.

[0031] A brief description of each drawing is provided to aid understanding of the drawings cited in this application.

[0032] FIG. 1 is a flowchart illustrating a graph-based clinical analysis method that integrates medical information data and medical ontology according to an embodiment of the present application.

[0033] FIG. 2 is a flowchart illustrating the integrated graph database construction step (S110) of FIG. 1.

[0034] FIG. 3 is a flowchart illustrating the patient relationship structure analysis step (S140) of FIG. 1.

[0035] FIG. 4 is a flowchart illustrating the stability verification step (S150) of FIG. 1.

[0036] FIGS. 5 and 6 are flowcharts illustrating the explainable clinical information provision step (S160) of FIG. 1.

[0037] FIG. 7 is a diagram schematically illustrating an overall data processing pipeline that performs clinical analysis based on a graph integrating medical information data and medical ontology according to an embodiment of the present application.

[0038] FIG. 8 is a diagram illustrating an exemplary form in which patient information, clinical events, and medical ontologies are connected in a multi-layered hierarchical structure within an integrated graph database according to an embodiment of the present application.

[0039] FIG. 9 is a diagram illustrating an exemplary visualization of a specific graph structure in which a patient's clinical path and medical ontology are interconnected within an integrated graph database according to an embodiment of the present application.

[0040] FIG. 10 is a diagram visualizing a hub node identified within a patient similarity network according to an embodiment of the present application.

[0041] FIG. 11 is a diagram visualizing patient phenotype clusters identified by projecting patient embedding vectors according to an embodiment of the present application into a two-dimensional space.

[0042] FIGS. 12 and 13 are diagrams visualizing the topological features of patient data extracted through topological data analysis (TDA) according to an embodiment of the present application.

[0043] FIG. 14 is a block diagram of a computer device for performing graph-based clinical analysis that integrates medical information data and medical ontology according to an embodiment of the present application.

[0044] The technical concept of the present application is subject to various modifications and may have various embodiments, and specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the technical concept of the present application to specific embodiments, and it should be understood that it includes all modifications, equivalents, and substitutions that fall within the scope of the technical concept of the present application.

[0045] In explaining the technical concept of the present application, detailed descriptions of related prior art are omitted if it is determined that such descriptions may unnecessarily obscure the essence of the present application.

[0046] The terms used herein are for describing embodiments and are not intended to limit and/or restrict the present application. Singular expressions include plural expressions unless the context clearly indicates otherwise. Additionally, ordinal terms used herein (e.g., first, second, etc.) are merely identifiers to distinguish one component from another.

[0047] In this specification, when it is stated that a part is connected to another part, this includes not only cases where they are directly connected, but also cases where they are indirectly connected with other components in between. Furthermore, when it is stated that a part includes a certain component, this means that, unless specifically stated otherwise, it does not exclude other components but may include additional components.

[0048] Furthermore, in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or.” That is, unless otherwise specified or evident from the context, “X uses A or B” is intended to mean any of the natural inclusive permutations. That is, “X uses A or B” applies where X uses A, where X uses B, or where X uses both A and B. Additionally, the term “and/or” as used herein should be understood to refer to and include all possible combinations of one or more of the enumerated related items.

[0049] In addition, terms such as “~part,” “~device,” and “~module” described in this application refer to a unit that processes at least one function or operation, and this can be implemented as hardware or software or a combination of hardware and software, such as a processor, microprocessor, microcontroller, CPU (Central Processing Unit), GPU (Graphics Processing Unit), APU (Accelerated Processing Unit), DSP (Digital Signal Processor), ASIC (Application Specific Integrated Circuit), FPGA (Field Programmable Gate Array), etc.

[0050] Furthermore, it is intended to clarify that the classification of the components in this application is merely based on the primary function each component is responsible for. That is, two or more components described below may be combined into a single component, or a single component may be divided into two or more components based on more subdivided functions. Additionally, each component described below may additionally perform some or all of the functions performed by other components in addition to its own primary function, and it is obvious that some of the primary functions performed by each component may be exclusively performed by other components.

[0051]

[0052] The method according to the embodiment of the present application may be performed on a personal computer, workstation, server computer device, etc., equipped with computing power, or on a separate device for this purpose.

[0053] Additionally, the method may be performed on one or more computing devices. For example, at least one step of the method according to an embodiment of the present application may be performed on a client device, and other steps may be performed on a server device. In this case, the client device and the server device may be connected via a network to transmit and receive computation results. Alternatively, the method may be performed by distributed computing technology.

[0054]

[0055] In this specification, the term "artificial intelligence model" may be used interchangeably with "artificial intelligence learning model," "computational model," "machine learning model," etc. An artificial intelligence model may be trained by various algorithms, such as, for example, decision tree, random forest, Gaussian naive bayes, k-nearest neighbor, Ada Boost, support vector machine, voting, bagging, neural network, and deep learning. However, it is not limited thereto.

[0056] An artificial intelligence model can be trained using at least one of supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning. Training an artificial intelligence model may be a process of applying knowledge to the model to perform a specific action.

[0057] When algorithms such as neural networks or deep learning are applied to an artificial intelligence model, the AI model may be referred to as a network function. The term "network function" can be used interchangeably with "neural network." A neural network can generally be composed of a set of interconnected computational units referred to as nodes. These nodes may also be referred to as neurons. A neural network is composed of at least one node, and the nodes may be interconnected by one or more links.

[0058] Neural networks may include deep neural networks (DNNs). Deep neural networks may include convolutional neural networks (CNNs), recurrent neural networks (RNNs), autoencoders, restricted Boltzmann machines (RBMs), deep belief networks (DBNs), Q networks, U networks, Siamese networks, and Generative Adversarial Networks (GANs), but are not limited to these.

[0059] In this application, a Large Language Model refers to an artificial intelligence model designed to enable natural language processing. For example, a Large Language Model can generate an optimal response (or answer) to a question by grasping the context of the question related to the input data and analyzing the relationships between words. For example, the language model can be pre-trained based on large-scale data and can be further trained and fine-tuned to suit a specific task or domain.

[0060]

[0061] Hereinafter, embodiments of the present application will be described in detail in turn.

[0062]

[0063] FIG. 1 is a flowchart illustrating a graph-based clinical analysis method that integrates medical information data and a medical ontology according to an embodiment of the present application. In addition, FIG. 2 is a flowchart illustrating the step of constructing an integrated graph database (S110) of FIG. 1, FIG. 3 is a flowchart illustrating the step of analyzing the relationship structure between patients (S140) of FIG. 1, FIG. 4 is a flowchart illustrating the step of verifying stability (S150) of FIG. 1, and FIG. 5 and FIG. 6 are flowcharts illustrating the step of providing explainable clinical information (S160) of FIG. 1.

[0064] In step S110, the computer device can construct an integrated graph database containing medical information data and medical ontology of multiple patients.

[0065] Here, medical information data may refer to clinical data related to a patient's health status, medical history, or treatment process. For example, medical information data may be Electronic Medical Record (EMR) data including the patient's personal details, diagnosis, prescribed medications, procedure history, lab results, or vital signs. However, this is merely illustrative and is not limited thereto.

[0066] In addition, medical ontology may refer to a knowledge system structured to enable computers to understand and process meanings by systematically defining concepts within a medical domain and the relationships between them. For example, medical ontology may include, but is not limited to, disease classification systems such as ICD-10 (International Classification of Diseases), clinical terminology standards such as SNOMED-CT (Systematized Nomenclature of Medicine - Clinical Terms), or drug classification systems such as RxNorm.

[0067] In step S110, an integrated graph database can be created by interconnecting individual instances of such medical information data with abstract concepts of medical ontologies based on nodes and edges. Specifically, the integrated graph database may take the form of a Knowledge Graph composed of nodes representing patients, diagnoses, prescriptions, procedures, etc., and edges representing semantic connections between nodes, such as 'diagnosed_with', 'prescribed_with', and 'is_a'. Through this, the system according to the embodiment can construct structured information that includes deep contextual relations between data, going beyond a simple collection of text data.
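
The node-and-edge model described above can be pictured with a small in-memory property graph. The sketch below is illustrative only: the class, the node identifiers, and the concept labels are hypothetical, and a production system would use a graph database rather than Python dictionaries. The edge labels 'diagnosed_with', 'prescribed_with', and 'is_a' follow the examples given in the text.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Minimal in-memory property graph with typed nodes and typed, directed edges."""

    def __init__(self):
        self.nodes = {}                      # node_id -> {"type": ..., **attributes}
        self.out_edges = defaultdict(list)   # node_id -> [(relation, target_id, attrs)]

    def add_node(self, node_id, node_type, **attrs):
        self.nodes[node_id] = {"type": node_type, **attrs}

    def add_edge(self, source, relation, target, **attrs):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.out_edges[source].append((relation, target, attrs))

    def neighbors(self, node_id, relation=None):
        """Targets of outgoing edges, optionally restricted to one relation type."""
        return [t for (r, t, _) in self.out_edges[node_id]
                if relation is None or r == relation]


kg = KnowledgeGraph()
# Instance-level nodes taken from a (fictional) patient record.
kg.add_node("patient:1", "patient", sex="M", birth_year=1970)
kg.add_node("dx:1", "diagnosis", code="I10")
kg.add_node("rx:1", "prescription", text="Tylenol")
# Concept-level nodes; labels are illustrative, not real ontology identifiers.
kg.add_node("concept:hypertension", "concept", label="Hypertensive disorder")
kg.add_node("concept:cardiovascular", "concept", label="Cardiovascular disease")

kg.add_edge("patient:1", "diagnosed_with", "dx:1")
kg.add_edge("patient:1", "prescribed_with", "rx:1")
kg.add_edge("concept:hypertension", "is_a", "concept:cardiovascular")

print(kg.neighbors("patient:1"))
```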

[0068] In the embodiment, step S110 may include steps S111 to S113 as shown in FIG. 2.

[0069] First, in step S111, the computer device can preprocess collected medical information data of multiple patients based on a predetermined standard format. Here, the standard format may refer to a standard specification for data exchange to ensure the interoperability of medical data. For example, the standard format may include the HL7 FHIR (Fast Healthcare Interoperability Resources) standard, but this is merely an example and is not limited thereto; various standards such as CDA (Clinical Document Architecture) or OMOP CDM may be applied. The computer device can ensure data consistency by using a data integration tool such as HAPI FHIR to parse data in heterogeneous formats collected from different medical institutions and convert it into the corresponding standard format. In addition, during this process, the computer device can perform de-identification and typo correction tasks in parallel through a regular expression-based preprocessing module. For example, it is possible to mask patient names, resident registration numbers, etc. within the data, correct typos, and convert abbreviations such as 'Tab' and 'Tyl' into standard terms such as 'Tablet' and 'Tylenol'.
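
A regular expression-based preprocessing module of the kind mentioned above might look roughly like the following. This is a sketch under stated assumptions: the abbreviation table contains only the two examples named in the text, the resident registration number pattern assumes the common written form of six digits, a hyphen, and seven digits, and patient names are assumed to be supplied from structured fields. All names in the code are hypothetical.

```python
import re

# Abbreviation table limited to the two examples given in the text.
ABBREVIATIONS = {"Tab": "Tablet", "Tyl": "Tylenol"}

# Assumed written form of a resident registration number: 6 digits, hyphen, 7 digits.
RRN_PATTERN = re.compile(r"\b\d{6}-\d{7}\b")
ABBREV_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b"
)

def preprocess(text, patient_names):
    """Mask identifiers, then expand known abbreviations to standard terms."""
    text = RRN_PATTERN.sub("[RRN]", text)
    for name in patient_names:
        text = re.sub(re.escape(name), "[NAME]", text)
    return ABBREV_PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], text)

note = "Hong Gildong (800101-1234567) given Tyl 500mg 1 Tab."
cleaned = preprocess(note, ["Hong Gildong"])
print(cleaned)
```

The word-boundary anchors keep the expansion from firing inside longer words such as "Tablet" itself. Real de-identification generally needs more than pattern matching, so this should be read as an illustration of the step rather than a complete solution.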

[0070] Subsequently, in step S112, the computer device can integrate the semantic context of clinical events by establishing a semantic mapping relationship between a diagnosis or prescription node included in the medical information data and a concept node of the medical ontology, while simultaneously preserving the unique semantic connection structure within the medical ontology on a graph.

[0071] For example, a computer device can utilize a medical domain-specific natural language processing (NLP) engine to extract diagnostic codes (e.g., ICD-10) within input medical information or clinical terms contained in unstructured text. Subsequently, the computer device can map and connect individual diagnosis or prescription nodes extracted from a patient's medical record to standard concept nodes defined within a medical ontology, such as SNOMED-CT or RxNorm. In this process, the computer device does not stop at simply connecting nodes one-to-one, but can also transfer the unique knowledge system inherent in the medical ontology into a graph database. That is, by preserving the semantic connection structure as is—which includes at least one of the following: a hierarchical concept relationship (Is-A) representing the taxonomic hierarchy of the disease, a lesion site relationship (Finding Site) representing the location of the disease's occurrence, or an association relationship between the disease and symptoms—the computer device can reconstruct fragmented clinical events into an integrated semantic context.
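
The mapping and structure-preservation idea above can be sketched as follows: a diagnosis code is linked to a concept, and the concept's superordinate and finding-site relations are copied into the graph along with it. The toy ontology fragment, the code-to-concept table, and the 'mapped_to' edge label are all hypothetical; the concept names are illustrative labels, not real SNOMED-CT identifiers.

```python
# Toy ontology fragment (illustrative labels only).
IS_A = {
    "essential_hypertension": ["hypertensive_disorder"],
    "hypertensive_disorder": ["cardiovascular_disease"],
    "cardiovascular_disease": [],
}
FINDING_SITE = {"hypertensive_disorder": "systemic_arterial_structure"}

# Hypothetical code-to-concept table; a real system would use a curated map.
CODE_TO_CONCEPT = {"I10": "essential_hypertension"}

def ancestors(concept):
    """All superordinate concepts reachable via is_a (supports multiple parents)."""
    seen, stack = [], list(IS_A.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(IS_A.get(parent, []))
    return seen

def map_diagnosis(code):
    """Return the edges that attach one diagnosis code to the ontology."""
    concept = CODE_TO_CONCEPT.get(code)
    if concept is None:
        return None  # unmapped codes are left for manual review
    edges = [("diagnosis:" + code, "mapped_to", concept)]
    for child in [concept] + ancestors(concept):
        for parent in IS_A.get(child, []):
            edges.append((child, "is_a", parent))
        if child in FINDING_SITE:
            edges.append((child, "finding_site", FINDING_SITE[child]))
    return edges

mapped_edges = map_diagnosis("I10")
for edge in mapped_edges:
    print(edge)
```

Because the ancestor chain is carried over with the mapping, a later query for "cardiovascular disease" can reach this diagnosis even though the record itself only contains the code.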

[0072] In step S113, the computer device can complete the integrated graph database by generating temporal edges between multiple clinical event nodes connected to the same patient node.

[0073] In other words, the computer device can utilize a Graph Database Management System (GDBMS), such as Neo4j, to sort event nodes—including hospitalization, diagnosis, medication, and surgery—in chronological order of occurrence, rather than simply listing the patient's medical records. The device can connect nodes with sequential relationships using a 'NEXT_EVENT' relationship and assign the time interval between the two events as an attribute of the corresponding edge. By preserving the chronological causal relationships of the patient's medical history within the graph structure, the progression of the disease or the effectiveness of treatment over time can be structured in a traceable format.
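
The temporal-edge generation described above reduces to sorting one patient's events by time and linking consecutive events. The sketch below assumes ISO-formatted date strings and stores the interval in days under a hypothetical attribute name, 'interval_days'; only the 'NEXT_EVENT' label comes from the text.

```python
from datetime import datetime

def build_temporal_edges(events):
    """events: list of (event_id, ISO timestamp) for ONE patient.
    Returns NEXT_EVENT edges between consecutive events with the gap in days."""
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e[1]))
    edges = []
    for (src, t0), (dst, t1) in zip(ordered, ordered[1:]):
        gap = datetime.fromisoformat(t1) - datetime.fromisoformat(t0)
        edges.append((src, "NEXT_EVENT", dst, {"interval_days": gap.days}))
    return edges

# Events deliberately listed out of order to show the sorting step.
events = [
    ("surgery:1", "2024-03-10"),
    ("admission:1", "2024-03-01"),
    ("diagnosis:1", "2024-03-02"),
]
temporal_edges = build_temporal_edges(events)
for edge in temporal_edges:
    print(edge)
```

In a graph database the same result would typically be written as relationships with an interval property, which keeps the patient's trajectory traversable in chronological order.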

[0074] Referring again to FIG. 1, in step S120, the computer device can generate a graph structure to be analyzed based on at least some data of the integrated graph database.

[0075] In this case, the generated graph structure can be defined as a heterogeneous graph containing multiple node types and multiple edge types. That is, unlike a homogeneous graph composed of only a single type of entity, the graph structure contains a mixture of nodes with different attributes, such as 'patient', 'diagnosis', 'prescription', 'procedure', and 'concept' of medical ontology; furthermore, the relationships between them can have a complex structure connected by various types of edges, such as 'treated', 'taken', 'super-concept', and 'temporal precedence'.

[0076] In the embodiment, step S120 can be performed by extracting data from an integrated graph database that meets the conditions of a specific disease group or patient cohort set according to the purpose of analysis.

[0077] The computer device can construct a subgraph by selectively extracting only patient nodes and associated heterogeneous nodes that satisfy specific cohort conditions (e.g., 'men in their 50s diagnosed with hypertension') that the user wishes to analyze from a vast, integrated graph database. Through this, the computer device can exclude interference from unnecessary data and generate an optimized graph structure in which key heterogeneous information suitable for the purpose of analysis is densely packed.
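
The cohort-driven subgraph extraction above can be illustrated with the example condition from the text, men in their 50s diagnosed with hypertension. The sketch assumes a simple dictionary-based graph, assumes that hypertension is identified by the ICD-10 code I10, and keeps every node reachable from a selected patient through outgoing edges; the data, attribute names, and edge labels are hypothetical.

```python
def extract_cohort_subgraph(nodes, edges, predicate):
    """nodes: {id: attrs}; edges: [(src, relation, dst)].
    Keeps the patients selected by predicate plus every node reachable from them."""
    adjacency = {}
    for src, _, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    keep = set()
    frontier = [n for n, a in nodes.items()
                if a["type"] == "patient" and predicate(n, nodes, edges)]
    while frontier:
        node = frontier.pop()
        if node not in keep:
            keep.add(node)
            frontier.extend(adjacency.get(node, []))
    sub_nodes = {n: nodes[n] for n in keep}
    sub_edges = [(s, r, d) for s, r, d in edges if s in keep and d in keep]
    return sub_nodes, sub_edges

def male_50s_with_hypertension(pid, nodes, edges):
    """Cohort condition: male, aged 50 to 59, with a diagnosis coded I10."""
    patient = nodes[pid]
    has_htn = any(s == pid and r == "diagnosed_with" and nodes[d].get("code") == "I10"
                  for s, r, d in edges)
    return patient["sex"] == "M" and 50 <= patient["age"] <= 59 and has_htn

nodes = {
    "p1": {"type": "patient", "sex": "M", "age": 54},
    "p2": {"type": "patient", "sex": "F", "age": 57},
    "p3": {"type": "patient", "sex": "M", "age": 52},
    "dx1": {"type": "diagnosis", "code": "I10"},
    "dx2": {"type": "diagnosis", "code": "I10"},
    "dx3": {"type": "diagnosis", "code": "E11"},
    "c_htn": {"type": "concept"},
}
edges = [
    ("p1", "diagnosed_with", "dx1"),
    ("p2", "diagnosed_with", "dx2"),
    ("p3", "diagnosed_with", "dx3"),
    ("dx1", "mapped_to", "c_htn"),
    ("dx2", "mapped_to", "c_htn"),
]
sub_nodes, sub_edges = extract_cohort_subgraph(nodes, edges, male_50s_with_hypertension)
print(sorted(sub_nodes), sub_edges)
```

Only p1 satisfies all three conditions, so the subgraph contains p1, its diagnosis, and the shared concept node, while the other patients' records are excluded.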

[0078] Referring to FIG. 1, in step S130, the computer device can apply an Embedding Generation Model to the generated graph structure to generate an Embedding Vector that encapsulates the structural information and semantic information of the graph structure.

[0079] Here, the embedding generation model is an artificial intelligence model that transforms high-dimensional graph data into a low-dimensional dense vector space, and can be constructed based on Graph Neural Networks (GNNs), which are artificial intelligence algorithms for learning complex relationships in graph data. For example, the embedding generation model may include at least one architecture among a Graph Convolutional Network (GCN), a Graph Attention Network (GAT), GraphSAGE (Graph Sample and Aggregate), and a Relational Graph Convolutional Network (RGCN) specialized for heterogeneous graph processing.

[0080] Specifically, a computer device can vectorize the following information by learning it through an embedding generation model.

[0081] First, as structural information, it is possible to learn the topology in which each node is located within the graph and the connection patterns between nodes. That is, through a message passing method that aggregates information about neighboring nodes connected to a central node, it is possible to learn that adjacent nodes in the graph are distributed in similar positions in vector space.

[0082] Second, as semantic information, the unique attributes and type information of nodes and edges can be learned. In particular, considering that the graph generated in the previous S120 step is a heterogeneous graph, the computer device can apply different weight matrices or assign attention scores depending on the type of edge (e.g., 'diagnosed', 'prescribed', 'super-concept', etc.). Through this, a sophisticated embedding vector can be generated that reflects not only the simple existence of a connection but also the importance of the medical significance of that connection.
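
As a non-limiting sketch, one message-passing layer with edge-type-specific weight matrices (in the style of an RGCN) can be written in plain Python as below. The weight values, the two-dimensional features, and the function name are assumptions chosen to keep the arithmetic easy to follow; they are not the trained model of the embodiment.

```python
def relational_message_passing(features, edges, rel_weights, self_weight):
    """One layer: h_i' = ReLU(W_self h_i + sum_r mean_{j in N_r(i)} W_r h_j).

    edges are (neighbor, edge type, center) triples: messages flow
    from the neighbor node to the center node.
    """
    def matvec(matrix, vector):
        return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

    updated = {}
    for node, h in features.items():
        total = matvec(self_weight, h)
        # Group incoming messages by edge type so each type uses its own matrix.
        by_type = {}
        for src, rel, dst in edges:
            if dst == node:
                by_type.setdefault(rel, []).append(matvec(rel_weights[rel], features[src]))
        for messages in by_type.values():
            for k in range(len(total)):
                total[k] += sum(m[k] for m in messages) / len(messages)
        updated[node] = [max(0.0, x) for x in total]  # ReLU
    return updated

features = {"patient": [1.0, 0.0], "dx": [0.0, 1.0], "rx": [1.0, 1.0]}
edges = [("dx", "diagnosed", "patient"), ("rx", "prescribed", "patient")]
rel_weights = {
    "diagnosed": [[2.0, 0.0], [0.0, 2.0]],   # diagnosis edges weighted strongly
    "prescribed": [[0.5, 0.0], [0.0, 0.5]],  # prescription edges weighted weakly
}
identity = [[1.0, 0.0], [0.0, 1.0]]
updated = relational_message_passing(features, edges, rel_weights, identity)
```

Because the 'diagnosed' and 'prescribed' edges use different matrices, the same neighbor feature contributes differently depending on the type of connection, which is the effect the paragraph describes.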

[0083] Consequently, the generated embedding vectors represent information from a complex medical knowledge graph compressed into real-valued vectors that can be computed by a computer, and can be utilized in subsequent steps such as patient condition analysis, disease prediction, or similarity calculation.

[0084] In an embodiment, the computer device can optimize embedding vectors by performing multitask learning that includes tasks of predicting generated clinical events, predicting links between nodes, and community detection.

[0085] Specifically, rather than optimizing a single objective function, multiple auxiliary tasks can be trained simultaneously. For example, the computer device can train the entire model on a weighted sum of the loss functions for (1) 'clinical event prediction (node classification)', which predicts whether a target patient will develop a disease in the future, (2) 'link prediction', which infers potential relationships between patients and drugs that are not currently connected, and (3) 'community detection', which identifies patient clusters with similar characteristics. Through this, it is possible to prevent overfitting to specific tasks and generate robust embedding vectors that can be universally used for various analysis tasks.
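
As a non-limiting illustration, the weighted sum of the three task losses can be written as follows. The symbols and the non-negative weights $\lambda_k$ are notation assumed here for illustration; the description does not fix their values.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{1}\,\mathcal{L}_{\text{event}}
  + \lambda_{2}\,\mathcal{L}_{\text{link}}
  + \lambda_{3}\,\mathcal{L}_{\text{community}},
  \qquad \lambda_{k} \ge 0
```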

[0086] In addition, in the embodiments, when performing such multitask learning, a Spectral-Topological Regularization Loss Function may be applied.

[0087] In a typical graph neural network training process, there is a risk that macroscopic information about the entire graph is lost because training focuses on local neighbor information. To prevent this, the computer device can add the spectral-topological regularization loss function to the objective function so that the overall connectivity structure and topological features of the graph structure are maintained within the embedding vector space. That is, the computer device can optimize model parameters in a direction that minimizes structural distortion of the data by constraining strongly connected nodes in the original graph to be located close to each other in the embedding space, and by calculating and backpropagating a learning error that preserves the spectral distribution of the entire graph among the embedding vectors.
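
The description does not give the exact form of this loss. As a non-limiting sketch under that caveat, one widely used term that keeps strongly connected nodes close in the embedding space is the graph Laplacian smoothness penalty, the sum over edges of A_ij * ||z_i - z_j||^2, which equals tr(Z^T L Z) for L = D - A. The code below illustrates only this proximity constraint, not the preservation of the spectral distribution, and the function name is an assumption.

```python
def laplacian_smoothness(adjacency, embeddings):
    """Sum of A_ij * ||z_i - z_j||^2 over each undirected node pair once."""
    n = len(adjacency)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if adjacency[i][j]:
                squared = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
                total += adjacency[i][j] * squared
    return total

# Nodes 0 and 1 are connected; node 2 is isolated.
adjacency = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
faithful = [[0.0, 0.0], [0.5, 0.0], [5.0, 5.0]]   # connected pair stays close
distorted = [[0.0, 0.0], [3.0, 4.0], [5.0, 5.0]]  # connected pair pulled apart

penalty_faithful = laplacian_smoothness(adjacency, faithful)
penalty_distorted = laplacian_smoothness(adjacency, distorted)
```

Added to the task losses with a small weight, a term of this kind penalizes embeddings that separate nodes the original graph connects.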

[0088] In step S140, the computer device can analyze the relationship structure between patients based on embedding vectors.

[0089] This step may refer to the process of deriving a new secondary relationship network centered on the Patient object, which differs in dimension from the graph structure (a graph containing a mixture of patients, diseases, and drugs) generated in step S120. By utilizing embedding vectors in which each patient's extensive medical history and medical context are compressed, the computer device can identify patients with essentially similar clinical characteristics, even if their outwardly apparent diagnoses differ, and structure their collective patterns.

[0090] In the embodiment, step S140 may include steps S141 to S145 as illustrated in FIG. 3.

[0091] First, in step S141, the computer device can calculate the distance or similarity index between the embedding vectors of the patient nodes.

[0092] For example, the computer device can calculate various distance or similarity metrics, such as Euclidean distance, cosine similarity, or the Pearson correlation coefficient, between two patient nodes using the coordinate values of each patient node projected onto a vector space. The resulting indices represent a high-dimensional similarity that comprehensively considers structural information (similarity of treatment processes) and semantic information (similarity of the medical context of the disease), going beyond simply whether two patients share the same disease code.
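
As a non-limiting sketch, the three metrics named above can be computed for a pair of patient embedding vectors as follows. The three-dimensional vectors are toy values assumed for illustration.

```python
import math

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)  # undefined for an all-zero vector

def pearson_correlation(u, v):
    # Pearson correlation is the cosine similarity of mean-centered vectors.
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])

patient_a = [1.0, 2.0, 3.0]
patient_b = [2.0, 4.0, 6.0]  # same direction as patient_a
patient_c = [3.0, 2.0, 1.0]  # reversed trend

d_ac = euclidean_distance(patient_a, patient_c)
cos_ab = cosine_similarity(patient_a, patient_b)
r_ac = pearson_correlation(patient_a, patient_c)
```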

[0093] In step S142, the computer device can construct a Patient Similarity Network (PSN) by creating edges between pairs of patient nodes whose distance or similarity index satisfies a preset criterion.

[0094] For example, a computer device can define all patients as nodes and selectively generate edges only for patient pairs whose calculated similarity value exceeds a preset threshold. Alternatively, the network can be constructed by connecting only the top K neighbors with the highest similarity (K-Nearest Neighbors) to each patient node. The patient similarity network constructed through this process forms a dense topological structure in which patients with similar clinical characteristics cluster together.
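
As a non-limiting sketch, both construction rules (a similarity threshold and top-K neighbors) can be illustrated as below. Cosine similarity, the function names, and the toy two-dimensional embeddings are assumptions for illustration.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def psn_by_threshold(embeddings, threshold):
    """Undirected edges for patient pairs whose similarity exceeds the threshold."""
    ids = sorted(embeddings)
    return {(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
            if cosine_similarity(embeddings[a], embeddings[b]) > threshold}

def psn_by_knn(embeddings, k):
    """Undirected edges linking each patient to its k most similar patients."""
    edges = set()
    for a in embeddings:
        ranked = sorted(
            (b for b in embeddings if b != a),
            key=lambda b: (-cosine_similarity(embeddings[a], embeddings[b]), b),
        )
        for b in ranked[:k]:
            edges.add(tuple(sorted((a, b))))
    return edges

embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.1, 0.9],
}
threshold_edges = psn_by_threshold(embeddings, 0.9)
knn_edges = psn_by_knn(embeddings, 1)
```

On this toy data both rules produce the same two edges; on real data the threshold rule can leave some patients isolated, whereas the K-nearest-neighbor rule gives every patient at least K edges.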

[0095] In step S143, the computer device can identify multiple patient clusters within the constructed patient similarity network.

[0096] At this stage, network analysis algorithms such as the Louvain algorithm, Leiden algorithm, or spectral clustering techniques can be applied to detect node groups with high connection strength and internal density within the network. Each patient cluster identified in this way can be defined as a cohort of patients sharing latent clinical patterns, such as a 'high-risk complication group' or a 'highly drug-responsive group'.

[0097] In step S144, the computer device can identify a hub node within a patient cluster whose connectivity centrality is greater than or equal to a predetermined threshold.

[0098] First, the computer device can calculate graph-theoretic centrality indices for the patient nodes belonging to each cluster. For example, it can calculate 'degree centrality', which reflects how many similar patients a node is directly connected to, or 'closeness centrality', which reflects how close a node is to the other patients in the network. Subsequently, the computer device can select the node with the highest centrality index, or the nodes in the top N%, as the core hub node of the corresponding cluster.
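
As a non-limiting sketch, hub selection by degree centrality within one cluster can be illustrated as follows. The normalization by (n - 1), the threshold value, and the function names are assumptions for illustration.

```python
def degree_centrality(cluster, edges):
    """Degree of each cluster member divided by (cluster size - 1)."""
    degree = {node: 0 for node in cluster}
    for a, b in edges:
        if a in degree and b in degree:  # count only edges inside the cluster
            degree[a] += 1
            degree[b] += 1
    denominator = max(len(cluster) - 1, 1)
    return {node: d / denominator for node, d in degree.items()}

def hub_nodes(cluster, edges, threshold):
    """Cluster members whose degree centrality is at or above the threshold."""
    centrality = degree_centrality(cluster, edges)
    return sorted(node for node, c in centrality.items() if c >= threshold)

cluster = ["p1", "p2", "p3", "p4", "p5"]
edges = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p1", "p5"), ("p2", "p3")]
centrality = degree_centrality(cluster, edges)
hubs = hub_nodes(cluster, edges, threshold=0.75)
```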

[0099] Finally, in step S145, the computer device defines the hub node as a clinical prototype representing a corresponding patient cluster and can analyze the clinical characteristics of the hub node.

[0100] This is a process designed to improve the inefficiency of analyzing large-scale patient data individually. Instead of analyzing every patient within a cluster, the computer device can focus on analyzing the medical records, disease progression trajectories, and prescription drug patterns of a hub node (prototype patient) that most perfectly represents the characteristics of that cluster. As a result of this analysis, the clinical features of the hub node can be interpreted as standard clinical trajectories or dominant prognostic patterns representative of the entire cluster and provided to medical professionals.

[0101] Meanwhile, although not illustrated, in the embodiment, the method (100) may further include the step of performing clustering on the embedding vector to identify patient phenotype groups. Specifically, the computer device may project the previously generated high-dimensional embedding vector into a low-dimensional space using a dimensionality reduction technique such as Uniform Manifold Approximation and Projection (UMAP) or t-SNE. Subsequently, the computer device may apply K-Means clustering or the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm to identify groups of patients that are densely clustered in the embedding space as 'patient phenotype groups'. That is, whereas the patient cluster identified in step S143 is based on the connection structure of the network, the patient phenotype group may refer to a patient subtype newly discovered in a data-driven manner from the distribution characteristics of the embedding vector. Through this, potential patient types that were difficult to distinguish using only existing clinical knowledge or diagnostic codes can be identified.

[0102] Referring again to FIG. 1, in step S150, the computer device can verify at least one of the topological stability and statistical stability of the relational structure analyzed through step S140.

[0103] Here, topological stability refers to structural robustness, indicating how long geometric features inherent to the data, such as connectivity components or holes, are maintained without disappearing even when the thresholds or scales constituting the patient similarity network change. Additionally, statistical stability refers to reliability, indicating whether the analysis results remain consistent and reproducible even when variations are introduced, such as randomly sampling a portion of the data or injecting noise.

[0104] Through such stability verification, the computer device can mathematically prove that patient clusters or patterns derived by the artificial intelligence model are not the result of chance but are meaningful structures actually existing within the data. For example, the computer device can objectively guarantee the reliability of the discovered patient clusters or patterns by introducing Topological Data Analysis (TDA), which analyzes the shape of the data, and statistical testing techniques.

[0105] In the embodiment, step S150 may include steps S151 to S153 as shown in FIG. 4.

[0106] First, in step S151, the computer device can perform Topological Data Analysis (TDA) on the patient similarity network to extract Topological Invariants or Persistence Features.

[0107] In the embodiment, at step S151, a persistent homology technique may be utilized. Specifically, the computer device may track the lifetimes of topological features counted by the Betti numbers, such as connected components (β0), holes (β1), or voids (β2), that are created and destroyed within the network while gradually changing the connection threshold of the patient similarity network. The topological invariants or persistence features extracted at this time may be visualized or quantified in the form of a barcode or persistence diagram, which can represent the structural shape inherently possessed by the data without being affected by noise.
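
As a non-limiting sketch, the zero-dimensional part of persistent homology (the lifetimes of connected components, β0) can be computed with a union-find structure. The sketch assumes that each patient pair carries a distance, for example 1 minus similarity, and that pairs are connected once the threshold reaches that distance. Holes (β1) and voids (β2) require a dedicated TDA library and are not computed here.

```python
import math

def h0_persistence(n_nodes, weighted_edges):
    """Return (birth, death) bars for connected components.

    weighted_edges: list of (distance, i, j). Every component is born at 0
    and dies at the distance where it merges into another component.
    """
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    bars = []
    for distance, i, j in sorted(weighted_edges):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # two components merge: one bar ends
            parent[root_i] = root_j
            bars.append((0.0, distance))
    # Components that never merge persist indefinitely.
    bars.extend([(0.0, math.inf)] * (n_nodes - len(bars)))
    return bars

weighted_edges = [(0.1, 0, 1), (0.2, 2, 3), (0.9, 1, 2), (0.95, 0, 3)]
bars = h0_persistence(4, weighted_edges)
# Bars much longer than the rest indicate stable components.
stable = sum(1 for birth, death in bars if death - birth > 0.5)
```

Here two bars end early (0.1 and 0.2) and two are long, which suggests two stable groups of patients over a wide range of thresholds.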

[0108] Subsequently, in step S152, the computer device can evaluate the statistical stability of topological invariants or persistence features using a statistical resampling technique.

[0109] In other words, since the mere existence of topological features is not sufficient, the computer device can verify robustness against data variability by applying resampling techniques such as bootstrapping. For example, the computer device can generate multiple virtual networks by randomly sampling from the original data with replacement, and by checking whether the topological features identified in step S151 (e.g., Betti numbers) are consistently observed in these networks as well, it can calculate a confidence interval for the feature and verify that the network structure is not due to chance but is a statistically stable feature.
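
As a non-limiting sketch, bootstrapping the number of connected components (β0) can be illustrated as follows. One-dimensional "embeddings", the fixed connection threshold, the percentile interval, and the function names are simplifying assumptions for illustration.

```python
import random

def beta0(points, threshold):
    """Number of connected components when points within `threshold` are linked."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if abs(points[i] - points[j]) <= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})

def bootstrap_beta0(points, threshold, n_resamples=200, seed=0):
    """Resample with replacement and return sorted beta0 values and a 95% interval."""
    rng = random.Random(seed)
    counts = sorted(
        beta0([rng.choice(points) for _ in points], threshold)
        for _ in range(n_resamples)
    )
    low = counts[int(0.025 * n_resamples)]
    high = counts[min(n_resamples - 1, int(0.975 * n_resamples))]
    return counts, (low, high)

# Two well-separated groups of patients along a single embedding axis.
points = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
observed = beta0(points, 1.0)
counts, (low, high) = bootstrap_beta0(points, 1.0)
```

If the interval is narrow and contains the observed value, the component count can be regarded as stable under resampling rather than as an artifact of one particular sample.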

[0110] Finally, in step S153, the computer device can verify the validity of the phenotype by comparing whether the patient phenotype group based on clustering of the embedding vector matches the topological invariant or persistence feature.

[0111] Specifically, the computer device can compare whether the number of previously identified patient phenotype groups or cluster structures match the topological invariants (e.g., the number of zero-dimensional features on the persistence diagram) extracted through topological data analysis in step S151. If the number of patient clusters discovered through data-driven clustering matches the number of stable connected components derived mathematically (topologically), the computer device can confirm that the discovered patient phenotypes are valid results consistent with the intrinsic topological structure of the data and output this as the final analysis result.

[0112] Meanwhile, although not illustrated, in an embodiment, the method (100) may further include the step of calculating a quality indicator for at least one of a specific disease group and a prescription pattern using an integrated graph database.

[0113] Specifically, the computer device can determine whether it meets pre-established medical quality control standards by exploring patient, diagnosis, and prescription nodes and the relationships between them stored in an integrated graph database. For example, to calculate the 'Statin Use in Persons with Diabetes (SUPD)' metric, the computer device can identify patient nodes with diabetes diagnosis codes among all patients and check whether statin-class drugs exist among the prescription nodes connected to those patient nodes. Additionally, to calculate the 'Concurrent Use of Opioids and Benzodiazepines (COB)' metric, the computer device can detect time-series patterns in which opioid-class drugs and benzodiazepine-class drugs are prescribed to the same patient for a specified period (e.g., 30 days) or longer. Through this, the computer device can monitor in real-time whether regulatory agency evaluation criteria are met or proactively identify patient groups at risk of potential drug side effects to provide alerts to medical staff.
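
As a non-limiting sketch, the two indicators can be computed from per-patient records as below. The record layout, the drug class labels, and the interpretation of the COB criterion as at least 30 overlapping days of supply are assumptions for illustration; official measure specifications contain further inclusion and exclusion rules that are omitted here.

```python
from datetime import date, timedelta

def covered_days(prescriptions, drug_class):
    """Set of calendar days on which a drug of the given class is supplied."""
    days = set()
    for rx in prescriptions:
        if rx["class"] == drug_class:
            for offset in range(rx["days_supply"]):
                days.add(rx["start"] + timedelta(days=offset))
    return days

def supd_rate(patients):
    """Share of patients with a diabetes diagnosis who have a statin prescription."""
    diabetic = [p for p in patients if "diabetes" in p["diagnoses"]]
    if not diabetic:
        return None
    on_statin = [p for p in diabetic
                 if any(rx["class"] == "statin" for rx in p["prescriptions"])]
    return len(on_statin) / len(diabetic)

def cob_flagged(patients, min_overlap_days=30):
    """Ids of patients with enough overlapping opioid and benzodiazepine days."""
    flagged = []
    for p in patients:
        overlap = (covered_days(p["prescriptions"], "opioid")
                   & covered_days(p["prescriptions"], "benzodiazepine"))
        if len(overlap) >= min_overlap_days:
            flagged.append(p["id"])
    return flagged

patients = [
    {"id": "p1", "diagnoses": {"diabetes"}, "prescriptions": [
        {"class": "statin", "start": date(2024, 1, 1), "days_supply": 90}]},
    {"id": "p2", "diagnoses": {"diabetes"}, "prescriptions": [
        {"class": "opioid", "start": date(2024, 1, 1), "days_supply": 60},
        {"class": "benzodiazepine", "start": date(2024, 1, 15), "days_supply": 45}]},
    {"id": "p3", "diagnoses": {"hypertension"}, "prescriptions": [
        {"class": "opioid", "start": date(2024, 3, 1), "days_supply": 10},
        {"class": "benzodiazepine", "start": date(2024, 3, 1), "days_supply": 10}]},
]
rate = supd_rate(patients)
flagged = cob_flagged(patients)
```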

[0114] In step S160, the computer device can generate and provide explainable clinical information corresponding to a user query based on a relational structure and an integrated graph database.

[0115] In the embodiment, step S160 may include steps S161 to S164 as illustrated in FIGS. 5 and 6.

[0116] First, in step S161, the computer device can analyze the user's natural language query and generate a graph query for exploring the integrated graph database.

[0117] Specifically, a computer device can use a large language model (LLM) to understand the intent of a user's question and automatically generate a query for querying a graph database that matches it (e.g., a Cypher query).
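
The LLM step itself is not reproduced here. As a non-limiting sketch, the following shows the kind of parameterized Cypher query such a generator might target for a question about one patient's diagnoses. The relationship types follow those named for FIGS. 8 and 9, while the `Concept` label, the `subject_id` property, and the function name are hypothetical.

```python
def patient_trajectory_query(subject_id):
    """Build a Cypher query and its parameters for one patient's diagnoses."""
    query = (
        "MATCH (p:Patient {subject_id: $subject_id})"
        "-[:HAS_ADMISSION]->(a:Admission)"
        "-[:HAS_ICU_STAY]->(s:ICUStay)"
        "-[:HAS_EVENT]->(d:Diagnosis)"
        "-[:MAPS_TO]->(c:Concept) "
        "RETURN a, s, d, c"
    )
    # The id is passed as a parameter rather than spliced into the query text.
    return query, {"subject_id": subject_id}

query, params = patient_trajectory_query(817)
```

Passing user-derived values as query parameters, rather than interpolating them into the query string, limits the effect of malformed or adversarial natural language input.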

[0118] Subsequently, in step S162, the computer device can generate an answer to the natural language query based on (i) the patient trajectory path extracted from the integrated graph database as a result of executing the graph query and (ii) the patient cluster information identified in the relationship structure (i.e., the patient similarity network).

[0119] First, the computer device can extract key information necessary for generating an answer from an integrated graph database as a result of executing a graph query. The information extracted at this time may include not only simple text, but also structured evidence data such as (1) the patient journey of the target patient, (2) prediction results and major contributing variables regarding the patient's clinical condition (described in detail in steps S163 to S164 below), (3) cluster and hub node information identified on the patient similarity network, (4) super / subordinate concepts of the relevant medical ontology, and (5) quality indicators (SUPD, COB, etc.).

[0120] Subsequently, the computer device can generate medically valid and logical answers to natural language queries by injecting the extracted structured evidence data into a large language model (LLM) as context. For example, the computer device can explain a specific prognosis by presenting the predicted result for the current condition of the queried patient together with the common clinical characteristics of the similar cluster to which the patient belongs (e.g., impaired liver function), while referring to the treatment progress of the hub node (prototype) among similar patients.

[0121] In step S163, the computer device can generate a prediction result regarding the patient's clinical condition using the embedding vector. This step is a process of numerically predicting the risk of future clinical events (e.g., occurrence of sepsis, acute exacerbation, etc.) by utilizing the previously generated high-quality patient embedding vector.

[0122] Finally, in step S164, the computer device may calculate major clinical variables that contribute significantly to the prediction results and provide information on the contribution of major clinical variables by including it in explainable clinical information.

[0123] For example, the computer device can calculate SHAP (SHapley Additive exPlanations) values and identify the clinical factors (e.g., elevated white blood cell counts, administration of specific drugs, etc.) that had the greatest impact on the predicted risk. Furthermore, by visualizing this contribution information or including it in text descriptions provided to the user, the basis for the AI's judgment can be transparently explained.
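
As a non-limiting sketch, the Shapley values that SHAP approximates can be computed exactly for a small model by enumerating feature coalitions. The linear risk model, its coefficients, the baseline patient, and the feature order are hypothetical; exact enumeration is exponential in the number of features and is shown only to make the contribution calculation concrete.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Hypothetical features: [white blood cell count, drug administered (0/1), age]
def risk_model(features):
    return 0.05 * features[0] + 0.2 * features[1] + 0.01 * features[2]

patient = [18.0, 1.0, 70.0]
baseline = [8.0, 0.0, 60.0]
phi = shapley_values(risk_model, patient, baseline)
```

The contributions sum to the difference between the patient's predicted risk and the baseline prediction, so each value can be reported as that factor's share of the predicted risk.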

[0124] The configurations of FIGS. 1 to 6 are exemplary, and various configurations may be applied according to the embodiments of the present application.

[0125]

[0126] FIG. 7 is a diagram schematically illustrating an overall data processing pipeline that performs clinical analysis based on a graph integrating medical information data and medical ontology according to an embodiment of the present application.

[0127] First, the computer device can collect patients' Electronic Health Record (EHR) data from sources such as hospital information systems. The collected data includes patient information, medical records, and prescription history, and can be combined with medical ontologies (such as SNOMED-CT) to construct an integrated graph database (Graph DB). In this process, individual data records are transformed into nodes and edges, forming an interconnected knowledge graph.

[0128] Next, the computer device can train a Graph Neural Network (GNN) model using the constructed integrated graph database as input. The GNN model learns complex relationship information and node attributes within the graph structure to convert each patient node into a low-dimensional embedding vector. At this time, techniques such as multitask learning or spectral-topological regularization can be applied to optimize the quality of the embeddings.

[0129] Based on learned embedding vectors, a computer device can generate a Patient Similarity Network (PSN). This is a new type of secondary graph structure in which patients with similar clinical characteristics are connected, providing a basis for visualizing and analyzing potential associations between patients.

[0130] Finally, the computer device can perform Topological Data Analysis (TDA) and Graph Analysis (ML on Graphs) on the generated patient similarity network. Through TDA, the stability of the analysis results can be verified by identifying the topological structure of the data (connected components, holes, etc.), and data-driven clinical insights can be derived by identifying patient clusters or discovering key hub nodes (prototypes) through graph analysis techniques (community detection, centrality analysis, etc.).

[0131]

[0132] FIG. 8 is a diagram illustrating an exemplary form in which patient information, clinical events, and medical ontologies are connected in a multi-layered hierarchical structure within an integrated graph database according to an embodiment of the present application.

[0133] Referring to FIG. 8, the integrated graph database can be modeled by dividing it into multiple levels according to the level of data abstraction and semantic connection depth.

[0134] First, Level 1 is the 'Patient Basic Information Hierarchy'. The top-level node, the Patient node, is connected to the Admission node via the 'HAS_ADMISSION' edge, and the Admission node is connected to the DRG Code node ('CLASSIFIED_BY'), which is the patient classification system, to define the patient's basic personal information and insurance claim information.

[0135] Level 2 is the 'Clinical Event' layer. The admission node is connected to the Intensive Care Unit (ICUStay) node, which in turn is connected to specific clinical action nodes such as medication administration (InputEvent), vital sign measurement (ChartEvent), excretion (OutputEvent), and procedures (ProcedureEvent). Each event node is connected to an Item node representing the definition of the corresponding action via a 'DEFINED_BY' edge to clarify the standardized meaning of the data.

[0136] Level 3 is the 'Diagnosis / Drug Ontology' layer. The Diagnosis node connected to the Admission node is linked to concept nodes of standard medical terminology systems, such as ICD-10 or SNOMED-CT, via a 'MAPS_TO' edge, extending medical meaning (such as hierarchical concepts) beyond simple diagnostic codes. Additionally, the Prescription node is connected to the Drug Item node, which is then linked to the IngredientEvent node via a 'CONTAINS_INGREDIENT' edge, so that specific ingredient information of the drug can be tracked.

[0137] Level 4 is the 'Lab / Monitoring Ontology' layer. Diagnostic test (LabEvent) nodes and biosignal (ChartEvent) nodes are connected to standardized test item (LabItem) and measurement item (Item) nodes, respectively, providing a foundation for integrating and analyzing data measured from different equipment or departments if they have the same meaning.

[0138] Finally, Level 5 is the 'Administration Ontology' layer. The Prescription node is connected to the Electronic Medication Administration Record (EMAR) node and the Detail (EMAR_Detail) node ('DOCUMENTS_DISPENSATION', 'HAS_DETAIL') to record the time and dosage at which the drug was actually administered to the patient, and is linked to the Pharmacy node ('BASED_ON') to track the entire drug use path, including the dispensing and administration process.

[0139] Through the multi-layered hierarchical structure constructed in this way, the computer device can perform three-dimensional data inference, such as vertically exploring from macroscopic patient hospitalization information to microscopic drug component information, or horizontally analyzing the correlation between diagnosis and prescription.

[0140]

[0141] FIG. 9 is a diagram illustrating an exemplary visualization of a specific graph structure in which a patient's clinical path and medical ontology are interconnected within an integrated graph database according to an embodiment of the present application.

[0142] Referring to FIG. 9, the integrated graph database has a hierarchical structure in which multiple admission nodes are connected around a patient node, and each admission node is connected to an intensive care unit (ICU) stay node.

[0143] Specifically, the central patient node is connected to the '1st Admission' and '2nd Admission' nodes in chronological order via the 'HAS_ADMISSION' edge, representing the patient's hospital visit history in a time-series manner. Each admission node is then connected to the '1st ICU Stay' and '2nd ICU Stay' nodes that occurred during the corresponding hospitalization period via the 'HAS_ICU_STAY' edge.

[0144] Each ICU Stay node is radially connected to nodes representing specific clinical actions performed during that period. For example, 'Input Events' (e.g., drug administration), 'Output Events' (e.g., urine volume measurement), and 'Diagnosis' nodes are connected via relationships such as 'HAS_EVENT' to record the patient's detailed treatment process.

[0145] In particular, the Diagnosis node contains diagnostic information such as ICD-10 codes, which are connected to the 'SNOMED-CT Concepts' node via the 'MAPS_TO' edge. This demonstrates that data can be expanded beyond simple diagnosis records to encompass a vast medical knowledge system (Medical Ontology), including higher-level concepts of the disease, affected sites, and associated symptoms.

[0146] Through this graph structure, the computer device can seamlessly track the patient's entire journey from admission to discharge and readmission, while simultaneously deeply interpreting the medical context of clinical events occurring at each stage through an ontology.

[0147]

[0148] FIG. 10 is a diagram visualizing a hub node identified within a patient similarity network according to an embodiment of the present application.

[0149] Referring to FIG. 10, the patient similarity network has a complex structure in which multiple patient nodes are interconnected, and at its center is a hub node (hub patient, visualized in yellow, for example) that has the most connections (edges). For example, it can be seen that patient number 817 is directly connected to 327 other patients and acts as a central anchor in the densest area of the network.

[0150] In addition, neighboring nodes around the hub node can be visualized in different colors according to their predicted risk scores. For example, nodes of the first color (e.g., red) can be visualized to represent the high-risk group (High Risk > 0.7), nodes of the second color (e.g., green) to represent the medium-risk group (Medium Risk 0.4-0.7), and nodes of the third color (e.g., blue) to represent the low-risk group (Low Risk < 0.4). Through this, one can intuitively understand the risk distribution of the patient cluster formed around the hub node.
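
As a non-limiting sketch, the mapping from predicted risk score to risk group can be written as below. The band boundaries follow the example values above; the function name and the color table are assumptions for illustration.

```python
RISK_COLORS = {"high": "red", "medium": "green", "low": "blue"}

def risk_band(score):
    """Map a predicted risk score in [0, 1] to a risk group."""
    if score > 0.7:
        return "high"
    if score >= 0.4:
        return "medium"
    return "low"

hub_band = risk_band(0.561)
hub_color = RISK_COLORS[hub_band]
```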

[0151] In particular, the hub node can function as a prototype representing the clinical characteristics of the cluster. By analyzing the detailed clinical information of this hub node (e.g., diagnosis name K830, risk level 0.561, severity level Severe, etc.), the computer device can efficiently infer the dominant pathophysiology or clinical patterns (e.g., severe inflammatory response, etc.) shared by the entire cluster and generate hypotheses.

[0152]

[0153] FIG. 11 is a diagram visualizing patient phenotype clusters identified by projecting patient embedding vectors according to an embodiment of the present application into a two-dimensional space.

[0154] Referring to FIG. 11, when the Uniform Manifold Approximation and Projection (UMAP) algorithm is applied to the high-dimensional patient embedding vector space and the result is visualized, it can be seen that distinct, independent patient clusters are formed.

[0155] These clusters are derived based solely on the latent structure and patterns of the data learned by GNN models, without relying on predefined clinical labels or diagnostic codes. This can visually demonstrate the existence of potential patient subtypes or new phenotypes that were not clearly distinguished by existing clinical knowledge alone.

[0156] Furthermore, points that do not belong to a cluster and are scattered (outliers) may represent specific patient cases that deviate from typical patterns; these anomalies can be identified separately to classify them as potential risk groups or used for further detailed analysis. By cross-validating these clustering results with the topological structure of the previously constructed patient similarity network, they can serve as important evidence supporting the statistical and clinical validity of the discovered patient phenotypes.

[0157]

[0158] FIGS. 12 and 13 are diagrams visualizing the topological features of patient data extracted through topological data analysis (TDA) according to an embodiment of the present application.

[0159] First, FIG. 12 shows a persistence barcode that represents the persistence of topological features in the form of bars.

[0160] Referring to FIG. 12, (a) shows the barcode for connected components (H0), which are 0-dimensional topological features; (b) shows the barcode for cycles or holes (H1), which are 1-dimensional features; and (c) shows the barcode for voids (H2), which are 2-dimensional features. The horizontal axis represents the filtration parameter, indicating the change in the network connection threshold, and the length of each bar indicates the lifespan during which the corresponding feature is maintained. In the figure, features indicated by thick, long bars represent the intrinsic structure (signal) of the data that is not easily lost due to noise, and through this, it can be confirmed that there are multiple stable clusters (H0) and complex cyclic relationships (H1) within the patient data.

[0161] Next, FIG. 13 shows a persistence diagram projecting the birth and death times of topological features onto a two-dimensional coordinate plane.

[0162] Referring to FIG. 13, each point represents a topological feature, and the further it is located from the diagonal (y=x), the longer the lifespan and persistence of the corresponding feature. In the H0 diagram of FIG. 13 (a), the H1 diagram of (b), and the H2 diagram of (c), multiple points spaced apart from the diagonal are observed, which visually demonstrates that the patient similarity network constructed in this application is not a simple random connection, but has a mathematically significant and robust topological structure.

[0163]

[0164] FIG. 14 is a block diagram of a computer device for performing graph-based clinical analysis that integrates medical information data and medical ontology according to an embodiment of the present application.

[0165] The communication unit (1410) can transmit data to, and receive data from, internal or external components. The communication unit (1410) may include a wired or wireless communication unit. If the communication unit (1410) includes a wired communication unit, the communication unit (1410) may include one or more components that enable communication through a Local Area Network (LAN), a Wide Area Network (WAN), a Value Added Network (VAN), a mobile radio communication network, a satellite communication network, and combinations thereof. Additionally, if the communication unit (1410) includes a wireless communication unit, the communication unit (1410) can transmit or receive data or signals wirelessly using cellular communication, a wireless LAN (e.g., Wi-Fi), etc. In an embodiment, the communication unit (1410) can transmit or receive data or signals to and from an external device or an external server under the control of the processor (1440).

[0166] The input unit (1420) can receive various user commands through external operation. To this end, the input unit (1420) may include or be connected to one or more input devices. For example, the input unit (1420) may receive user commands by being connected to an interface for various input devices, such as a keypad or a mouse. To this end, the input unit (1420) may include an interface such as a USB port or a Thunderbolt port. Additionally, the input unit (1420) may receive external user commands by including or being combined with various input devices such as a touchscreen or a button.

[0167] The memory (1430) can store programs and / or program instructions for the operation of the processor (1440) and can temporarily or permanently store input / output data. The memory (1430) may include at least one type of storage medium among flash memory type, hard disk type, multimedia card micro type, card type memory (e.g., SD or XD memory, etc.), RAM, SRAM, ROM, EEPROM, PROM, magnetic memory, magnetic disk, and optical disk.

[0168] Additionally, the memory (1430) can store various artificial intelligence models, network functions, and algorithms, and can store various data, programs (one or more instructions), applications, software, commands, code, and the like for driving and controlling the device (1400).

[0169] The processor (1440) can control the overall operation of the device (1400). The processor (1440) can execute one or more programs or software stored in the memory (1430). The processor (1440) may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a dedicated processor on which the methods according to embodiments of the present application are performed.

[0170] In an embodiment, the processor (1440) can perform the method (100) described above with reference to FIGS. 1 to 6 by executing one or more programs or software stored in the memory (1430).

[0171] In an embodiment, the processor (1440) models medical information data and medical ontology data of a plurality of patients into nodes and edges to construct an integrated graph database, generates a graph structure to be analyzed based on at least some of the data in the integrated graph database, applies an embedding generation model to the graph structure to generate an embedding vector that encapsulates structural information and semantic information of the graph structure, analyzes the relationship structure between patients based on the embedding vector, verifies at least one of topological stability and statistical stability of the relationship structure, and generates and provides explainable clinical information corresponding to a user query based on the relationship structure and the integrated graph database.

[0172] In an embodiment, the processor (1440) can generate a graph structure by extracting data from an integrated graph database that meets the conditions of a specific disease group or patient cohort set according to the purpose of analysis.
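
As a non-limiting illustration, the cohort-based extraction described above may be sketched as follows. The triple layout, the relation names `HAS_DIAGNOSIS` and `HAS_PRESCRIPTION`, the node identifiers, and the function `extract_cohort_subgraph` are hypothetical and are not taken from the present application.

```python
from typing import List, Set, Tuple

# Hypothetical toy graph: (source node, relation, target node) triples.
Edge = Tuple[str, str, str]


def extract_cohort_subgraph(edges: List[Edge], cohort_codes: Set[str]) -> List[Edge]:
    """Keep only the edges of patients having at least one cohort diagnosis."""
    # Patients whose diagnosis node falls inside the cohort definition.
    cohort_patients = {
        src for src, rel, dst in edges
        if rel == "HAS_DIAGNOSIS" and dst in cohort_codes
    }
    # Retain every edge that starts at one of those patients.
    return [edge for edge in edges if edge[0] in cohort_patients]


edges = [
    ("patient:1", "HAS_DIAGNOSIS", "dx:E11"),
    ("patient:1", "HAS_PRESCRIPTION", "rx:metformin"),
    ("patient:2", "HAS_DIAGNOSIS", "dx:I10"),
    ("patient:3", "HAS_DIAGNOSIS", "dx:E11"),
]
subgraph = extract_cohort_subgraph(edges, {"dx:E11"})
```

In this toy input, only the edges of patient 1 and patient 3 remain in the graph structure to be analyzed.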

[0173] In an embodiment, the processor (1440) can integrate the semantic context of a clinical event by establishing a semantic mapping relationship between a diagnosis or prescription node included in medical information data and a concept node of a medical ontology, and by preserving a semantic connection structure including at least one of a super-subordinate concept relationship, a lesion site relationship, and an association relationship within the medical ontology.

[0174] In an embodiment, the processor (1440) can preserve a time-series causal relationship by preprocessing medical information data based on a predetermined standard format and generating temporal edges indicating the chronological relationship or temporal interval of occurrence between a plurality of clinical event nodes connected to the same patient node.
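
As a non-limiting illustration, temporal edges between the clinical events of one patient may be generated as sketched below. The event tuple layout, the identifiers, and the function `build_temporal_edges` are hypothetical; the sketch assumes that each event carries a single occurrence date and links only consecutive events.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical event record: (patient id, event node id, occurrence date).
Event = Tuple[str, str, date]


def build_temporal_edges(events: List[Event]) -> List[Tuple[str, str, int]]:
    """Link consecutive events of each patient; the edge attribute is the gap in days."""
    by_patient: Dict[str, List[Event]] = {}
    for event in events:
        by_patient.setdefault(event[0], []).append(event)

    edges = []
    for patient_events in by_patient.values():
        # Order the events of one patient chronologically.
        ordered = sorted(patient_events, key=lambda event: event[2])
        for earlier, later in zip(ordered, ordered[1:]):
            gap_days = (later[2] - earlier[2]).days
            edges.append((earlier[1], later[1], gap_days))
    return edges


events = [
    ("p1", "dx:diabetes", date(2024, 1, 10)),
    ("p1", "rx:metformin", date(2024, 1, 17)),
    ("p2", "dx:hypertension", date(2024, 3, 1)),
    ("p1", "lab:hba1c", date(2024, 4, 16)),
]
temporal_edges = build_temporal_edges(events)
```

Each resulting edge records both the order of occurrence (its direction) and the temporal interval (its day count); a patient with a single event produces no temporal edge.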

[0175] In an embodiment, the processor (1440) can optimize the embedding vector by performing multitask learning including clinical event prediction, inter-node link prediction, and community detection tasks.

[0176] In an embodiment, the processor (1440) can optimize model parameters by applying a spectral-topological normalization loss function that induces the overall connectivity structure and topological features of the graph structure to be maintained within the embedding vector space when performing multitask learning.
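
The present application does not state the form of the loss function. One possible reading of the multitask learning and the structure-preserving term is a weighted sum of task losses plus a penalty on the embedding. The weights, the symbol names, and the choice of a graph-Laplacian smoothness penalty below are assumptions made for illustration only, and the penalty shown captures only connectivity, not persistence-based topological features.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{1}\,\mathcal{L}_{\text{event}}
  + \lambda_{2}\,\mathcal{L}_{\text{link}}
  + \lambda_{3}\,\mathcal{L}_{\text{comm}}
  + \beta\,\mathcal{L}_{\text{struct}},
\qquad
\mathcal{L}_{\text{struct}}
  = \operatorname{tr}\!\left(Z^{\top} L Z\right)
  = \tfrac{1}{2}\sum_{i,j} A_{ij}\,\lVert z_{i} - z_{j} \rVert^{2}
```

Here $Z$ is the matrix whose rows $z_i$ are node embedding vectors, $A$ is a symmetric adjacency matrix of the graph structure, $D$ is its degree matrix, and $L = D - A$ is the graph Laplacian. Under these assumptions, the penalty is small when connected nodes lie close together in the embedding vector space.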

[0177] In an embodiment, the processor (1440) can construct a Patient Similarity Network (PSN) by calculating a distance or similarity index between embedding vectors of patient nodes and setting edges for pairs of patient nodes that satisfy a set criterion for the calculated distance or similarity index.
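
As a non-limiting illustration, a patient similarity network may be constructed as sketched below. The use of cosine similarity as the similarity index, the threshold value, the toy embedding vectors, and the function names are assumptions; the present application does not fix a particular index or criterion.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_psn(embeddings: Dict[str, List[float]], threshold: float) -> Set[Tuple[str, str]]:
    """Set an edge for every patient pair whose similarity meets the threshold."""
    edges = set()
    for a, b in combinations(sorted(embeddings), 2):
        if cosine_similarity(embeddings[a], embeddings[b]) >= threshold:
            edges.add((a, b))
    return edges


# Hypothetical two-dimensional patient embedding vectors.
embeddings = {
    "p1": [1.0, 0.0],
    "p2": [0.9, 0.1],
    "p3": [0.0, 1.0],
    "p4": [0.1, 0.9],
}
psn_edges = build_psn(embeddings, threshold=0.9)
```

With these toy vectors, only the pairs (p1, p2) and (p3, p4) satisfy the criterion, so the network has two edges.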

[0178] In an embodiment, the processor (1440) can identify a plurality of patient clusters within a patient similarity network, identify a hub node within each patient cluster whose connectivity centrality is greater than or equal to a predetermined threshold, define the hub node as a clinical prototype representing the corresponding patient cluster, and analyze the clinical characteristics of the hub node.
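
As a non-limiting illustration, hub nodes may be identified as sketched below. The sketch assumes that patient clusters are the connected components of the patient similarity network and that connectivity centrality is the within-cluster degree divided by the cluster size minus one; both choices, together with the function names and the threshold, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets built from an edge list."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def connected_components(nodes: Iterable[str], adjacency: Dict[str, Set[str]]) -> List[Set[str]]:
    """Depth-first search over the network; each component is one cluster."""
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def find_hubs(nodes: Iterable[str], edges: Iterable[Tuple[str, str]],
              min_centrality: float) -> Dict[int, List[str]]:
    """Per cluster, list the nodes whose degree centrality meets the threshold."""
    adjacency = build_adjacency(edges)
    hubs: Dict[int, List[str]] = {}
    for index, cluster in enumerate(connected_components(nodes, adjacency)):
        if len(cluster) < 2:
            continue  # a singleton cluster has no within-cluster degree
        hubs[index] = sorted(
            node for node in cluster
            if len(adjacency[node] & cluster) / (len(cluster) - 1) >= min_centrality
        )
    return hubs


nodes = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]
edges = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p5", "p6"), ("p6", "p7")]
hubs = find_hubs(nodes, edges, min_centrality=0.8)
```

In this toy network, p1 is the hub of the first cluster and p6 is the hub of the second; the clinical characteristics of those nodes would then be examined as cluster prototypes.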

[0179] In an embodiment, the processor (1440) can perform topological data analysis on a patient similarity network to extract topological invariants or persistence features.
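
As a non-limiting illustration of one persistence feature, the sketch below computes the scales at which connected components merge, that is, the death values of 0-dimensional persistent homology under a distance filtration. Running the filtration on Euclidean distances between embedding vectors rather than on the patient similarity network itself, and the function name `h0_persistence`, are assumptions; higher-dimensional features would ordinarily require a dedicated library.

```python
import math
from itertools import combinations
from typing import List, Sequence


def h0_persistence(points: Sequence[Sequence[float]]) -> List[float]:
    """Death scales of connected components; every component is born at scale 0."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process point pairs in order of increasing distance (the filtration).
    pairs = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for distance, i, j in pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j   # two components merge,
            deaths.append(distance)   # so one of them dies at this scale
    return deaths


# Hypothetical embedding vectors forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
deaths = h0_persistence(points)
```

For n points the function returns n - 1 finite death scales, and one component never dies. A large gap between consecutive death scales, here between 1.0 and 5.0, indicates groups that persist over a wide range of scales.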

[0180] In an embodiment, the processor (1440) can evaluate the reliability or significance of a topological invariant or persistence feature using statistical resampling.
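
As a non-limiting illustration, statistical resampling may be applied as sketched below. The sketch uses a bootstrap, one common resampling scheme, and asks how often a feature survives when patients are resampled with replacement. The chosen feature (the largest scale at which two components merge, equal to the longest edge of a minimum spanning tree), the threshold `min_scale`, the number of resamples, and the function names are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def largest_merge_scale(points: Sequence[Point]) -> float:
    """Longest minimum-spanning-tree edge, computed with Prim's algorithm."""
    # best[i] is the distance from point i to the tree grown so far.
    best = {i: math.dist(points[0], points[i]) for i in range(1, len(points))}
    largest = 0.0
    while best:
        nearest = min(best, key=best.get)
        largest = max(largest, best.pop(nearest))
        for i in best:
            best[i] = min(best[i], math.dist(points[nearest], points[i]))
    return largest


def bootstrap_stability(points: List[Point], min_scale: float,
                        n_resamples: int = 200, seed: int = 0) -> float:
    """Fraction of bootstrap resamples in which the feature reaches min_scale."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Resample patients with replacement, keeping the sample size.
        sample = [rng.choice(points) for _ in points]
        if largest_merge_scale(sample) >= min_scale:
            hits += 1
    return hits / n_resamples


# Hypothetical embedding vectors: two groups separated along the first axis.
group = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
points = group + [(x + 10.0, y) for x, y in group]
stability = bootstrap_stability(points, min_scale=5.0)
```

A stability close to 1 suggests that the two-group separation is not an artifact of a few individual patients; a low value would argue against relying on the feature.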

[0181] In an embodiment, the processor (1440) can identify patient phenotype groups by performing clustering on the embedding vectors.

[0182] In an embodiment, the processor (1440) can verify the validity of a phenotype discovered in a data-driven manner by checking whether a patient phenotype group matches a topological invariant or persistence feature.
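
As a non-limiting illustration, the agreement between phenotype groups and a topological grouping may be quantified as sketched below. The sketch assumes that the topological result has already been reduced to a component label per patient and uses the Rand index as the agreement measure; the present application names neither the measure nor this reduction, and the labels shown are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable


def rand_index(labels_a: Dict[str, Hashable], labels_b: Dict[str, Hashable]) -> float:
    """Fraction of patient pairs on which the two groupings agree."""
    agree = total = 0
    for p, q in combinations(sorted(labels_a), 2):
        same_a = labels_a[p] == labels_a[q]
        same_b = labels_b[p] == labels_b[q]
        # A pair agrees when both groupings join it or both separate it.
        agree += same_a == same_b
        total += 1
    return agree / total


# Hypothetical phenotype clusters and topological component labels.
phenotype = {"p1": 0, "p2": 0, "p3": 1, "p4": 1, "p5": 1}
component = {"p1": "A", "p2": "A", "p3": "B", "p4": "B", "p5": "A"}
agreement = rand_index(phenotype, component)
```

A value of 1.0 means that the two groupings coincide. In this toy case, patient p5 is grouped differently by the two methods, which lowers the agreement to 0.6.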

[0183] In an embodiment, the processor (1440) can generate a graph query for exploring an integrated graph database by analyzing a user's natural language query, and can generate an answer to the natural language query based on patient trajectory paths extracted from the integrated graph database as a result of executing the graph query and on patient cluster information identified in the relationship structure.

[0184] In an embodiment, the processor (1440) can calculate a prediction result for a patient's clinical condition using an embedding vector, identify major clinical variables that contribute significantly to the prediction result, and include information on the contribution of the major clinical variables in the explainable clinical information.
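
As a non-limiting illustration, the contribution of clinical variables to a prediction may be reported as sketched below. For simplicity the sketch assumes a logistic model applied directly to named, standardized clinical variables, so that each contribution is the product of a weight and a value; a model operating on embedding vectors would need a different attribution technique. The variable names, weights, bias, and the function `explain_prediction` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def explain_prediction(features: Dict[str, float], weights: Dict[str, float],
                       bias: float, top_k: int = 2) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the predicted probability and the top_k largest contributions."""
    # In a linear model, each variable contributes weight * value to the logit.
    contributions = {name: weights[name] * value for name, value in features.items()}
    logit = bias + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    # Rank variables by the magnitude of their contribution.
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return probability, ranked[:top_k]


# Hypothetical standardized clinical variables and model parameters.
features = {"hba1c": 2.0, "age": 0.5, "bmi": -1.0}
weights = {"hba1c": 0.8, "age": 0.2, "bmi": 0.3}
probability, top_variables = explain_prediction(features, weights, bias=-1.0)
```

The ranked list can be attached to the explainable clinical information so that a medical professional sees which variables raised or lowered the predicted risk.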

[0185] In an embodiment, the processor (1440) can calculate a quality indicator for at least one of a specific disease group and prescription pattern using an integrated graph database.
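
As a non-limiting illustration, one simple quality indicator is the share of patients in a disease group who have a given prescription. The indicator itself, the triple layout, the relation names, the node identifiers, and the function `prescription_rate` are hypothetical and are not taken from the present application.

```python
from typing import List, Tuple

# Hypothetical toy graph: (source node, relation, target node) triples.
Edge = Tuple[str, str, str]


def prescription_rate(edges: List[Edge], diagnosis: str, prescription: str) -> float:
    """Share of patients with the diagnosis who also have the prescription."""
    diagnosed = {
        src for src, rel, dst in edges
        if rel == "HAS_DIAGNOSIS" and dst == diagnosis
    }
    treated = {
        src for src, rel, dst in edges
        if rel == "HAS_PRESCRIPTION" and dst == prescription and src in diagnosed
    }
    # An empty disease group yields 0.0 rather than a division error.
    return len(treated) / len(diagnosed) if diagnosed else 0.0


edges = [
    ("patient:1", "HAS_DIAGNOSIS", "dx:E11"),
    ("patient:1", "HAS_PRESCRIPTION", "rx:metformin"),
    ("patient:2", "HAS_DIAGNOSIS", "dx:E11"),
    ("patient:2", "HAS_PRESCRIPTION", "rx:metformin"),
    ("patient:3", "HAS_DIAGNOSIS", "dx:E11"),
    ("patient:3", "HAS_PRESCRIPTION", "rx:metformin"),
    ("patient:4", "HAS_DIAGNOSIS", "dx:E11"),
    ("patient:5", "HAS_DIAGNOSIS", "dx:I10"),
    ("patient:5", "HAS_PRESCRIPTION", "rx:metformin"),
]
rate = prescription_rate(edges, "dx:E11", "rx:metformin")
```

Here three of the four patients in the disease group have the prescription, so the indicator is 0.75; patient 5 has the prescription but is outside the disease group and is not counted.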

[0186] The configuration of FIG. 14 is exemplary, and various configurations may be applied according to embodiments of the present application.

[0188] The method according to an embodiment of the present application may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, etc., either alone or in combination. The program instructions recorded on the medium may be those specifically designed and configured for the present application or may be those known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that generated by a compiler, as well as high-level language code that can be executed by a computer using an interpreter, etc.

[0189] Additionally, the method according to the disclosed embodiments may be provided by being included in a computer program product. The computer program product may be traded between a seller and a buyer as a product.

[0190] A computer program product may include a software program and a computer-readable storage medium on which the software program is stored. For example, a computer program product may include a product in the form of a software program (e.g., a downloadable app) that is electronically distributed through a manufacturer of an electronic device or an electronic market (e.g., Google Play Store, App Store). For electronic distribution, at least a portion of the software program may be stored on a storage medium or temporarily created. In this case, the storage medium may be a server of the manufacturer, a server of the electronic market, or a storage medium of a relay server that temporarily stores the software program.

[0191] A computer program product may include a storage medium of a server or a storage medium of a client device in a system composed of a server and a client device. Alternatively, if there is a third device (e.g., a smartphone) that communicates with the server or the client device, the computer program product may include a storage medium of the third device. Alternatively, the computer program product may include the software program itself that is transmitted from the server to the client device or the third device, or transmitted from the third device to the client device.

[0192] In this case, one of the server, the client device, and the third device may execute the computer program product to perform the method according to the disclosed embodiments. Alternatively, two or more of the server, the client device, and the third device may execute the computer program product to perform the method according to the disclosed embodiments in a distributed manner.

[0193] For example, a server (e.g., a cloud server or an artificial intelligence server) can execute a computer program product stored on the server to control a client device communicatively connected to the server so that the client device performs a method according to the disclosed embodiments.

[0195] Although the embodiments have been described in detail above, the scope of the present application is not limited thereto, and various modifications and improvements by those skilled in the art using the basic concept of the present application as defined in the following claims also fall within the scope of the present application.

Claims

1. A graph-based clinical analysis method integrating medical information data and medical ontology, the method comprising: constructing an integrated graph database by modeling medical information data and medical ontology data of a plurality of patients as nodes and edges; generating a graph structure to be analyzed based on at least some data of the integrated graph database; applying an embedding generation model to the graph structure to generate an embedding vector that encapsulates structural information and semantic information of the graph structure; analyzing a relationship structure between patients based on the embedding vector; verifying at least one of topological stability and statistical stability of the relationship structure; and generating and providing explainable clinical information corresponding to a user query based on the relationship structure and the integrated graph database.

2. The method of claim 1, wherein the generating of the graph structure is performed by extracting, from the integrated graph database, data that meets conditions of a specific disease group or patient cohort set according to the purpose of analysis.

3. The method of claim 1, wherein the constructing of the integrated graph database comprises establishing a semantic mapping relationship between a diagnosis or prescription node included in the medical information data and a concept node of the medical ontology, and preserving a semantic connection structure including at least one of a super-subordinate concept relationship, a lesion site relationship, and an association relationship within the medical ontology, thereby integrating the semantic context of a clinical event.

4. The method of claim 3, wherein the constructing of the integrated graph database further comprises: preprocessing the medical information data based on a predetermined standard format; and preserving a time-series causal relationship by generating a temporal edge indicating an order of occurrence or a temporal interval between a plurality of clinical event nodes connected to the same patient node.

5. The method of claim 1, wherein the embedding generation model is a graph neural network (GNN)-based model and comprises at least one of a graph attention network (GAT), a graph convolutional network (GCN), a relational graph convolutional network (RGCN), and GraphSAGE.

6. The method of claim 5, wherein the generating of the embedding vector comprises optimizing the embedding vector by performing multitask learning including clinical event prediction, inter-node link prediction, and community detection tasks.

7. The method of claim 6, wherein the optimizing of the embedding vector comprises optimizing model parameters by applying, when performing the multitask learning, a spectral-topological normalization loss function that induces the overall connection structure and topological features of the graph structure to be maintained within the embedding vector space.

8. The method of claim 1, wherein the analyzing of the relationship structure between patients comprises: calculating a distance or similarity index between embedding vectors of patient nodes; and constructing a Patient Similarity Network (PSN) by setting edges for pairs of patient nodes for which the calculated distance or similarity index satisfies a set criterion.

9. The method of claim 8, wherein the analyzing of the relationship structure between patients further comprises: identifying a plurality of patient clusters within the patient similarity network; identifying, within each of the patient clusters, a hub node whose connectivity centrality is greater than or equal to a predetermined threshold; and defining the hub node as a clinical prototype representing the corresponding patient cluster and analyzing clinical characteristics of the hub node.

10. The method of claim 8, wherein the verifying of at least one of the topological stability and the statistical stability comprises performing topological data analysis on the patient similarity network to extract topological invariants or persistence features.

11. The method of claim 10, wherein the verifying of at least one of the topological stability and the statistical stability further comprises evaluating the reliability or significance of the topological invariants or persistence features using statistical resampling.

12. The method of claim 10, further comprising identifying patient phenotype groups by performing clustering on the embedding vectors.

13. The method of claim 12, wherein the verifying of at least one of the topological stability and the statistical stability further comprises verifying the validity of a phenotype discovered in a data-driven manner by checking whether the patient phenotype groups match the topological invariants or persistence features.

14. The method of claim 1, wherein the generating and providing of the explainable clinical information comprises: analyzing a natural language query of a user to generate a graph query for exploring the integrated graph database; and generating an answer to the natural language query based on a patient trajectory path extracted from the integrated graph database as a result of executing the graph query and on patient cluster information identified in the relationship structure.

15. The method of claim 1, wherein the generating and providing of the explainable clinical information comprises: calculating a prediction result for a clinical condition of a patient using the embedding vector; calculating major clinical variables that contribute significantly to the prediction result; and providing information on the contribution of the major clinical variables by including it in the explainable clinical information.

16. The method of claim 1, further comprising calculating a quality indicator for at least one of a specific disease group and a prescription pattern using the integrated graph database.

17. A computer program stored on a recording medium to execute the method according to any one of claims 1 through 16.

18. A computer device for performing graph-based clinical analysis integrating medical information data and medical ontology, the computer device comprising: at least one processor; and a memory storing a program executable by the processor, wherein the processor, by executing the program: models medical information data and medical ontology data of a plurality of patients as nodes and edges to construct an integrated graph database; generates a graph structure to be analyzed based on at least some data of the integrated graph database; applies an embedding generation model to the graph structure to generate an embedding vector containing structural information and semantic information of the graph structure; analyzes a relationship structure between patients based on the embedding vector; verifies at least one of topological stability and statistical stability of the relationship structure; and generates and provides explainable clinical information corresponding to a user query based on the relationship structure and the integrated graph database.