Chronic disease progression pattern mining method and system based on time-series comorbidity network
By constructing a temporal comorbidity network whose edges are weighted by time-related risk coefficients, and combining it with a graph neural network, the method identifies the critical paths of disease progression. This addresses the failure of existing techniques to consider the influence of time intervals and enables more accurate identification of disease progression paths.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- UNIV OF ELECTRONICS SCI & TECH OF CHINA
- Filing Date
- 2025-07-23
- Publication Date
- 2026-06-26

Figure CN120824036B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of disease progression prediction technology, specifically to a method and system for mining chronic disease progression patterns based on temporal comorbidity networks.
Background Technology
[0002] With the advancement of large-scale medical data analysis and network analytics techniques, constructing disease networks and conducting comorbidity studies using clinical medical data has become a common approach. For directed disease networks, current research attempts to uncover disease progression patterns through network analysis, particularly identifying ordered pathways of disease development or predicting potential disease progression directions in patients. This type of research has significant reference value for disease prediction, prevention, and the development of personalized medicine plans.
[0003] However, existing research has the following problems and gaps: (1) When calculating the directed relationship between diseases based on medical visit data, the influence of the time interval between the two diseases on the strength of the relationship is generally not considered, and only the existence of a sequential relationship is considered. (2) There is still a gap in how to mine important and meaningful chronic disease progression paths through directed temporal comorbidity networks.
[0004] In view of this, the present invention is proposed.
Summary of the Invention
[0005] This invention aims to solve at least one of the above technical problems and provides a method and system for mining chronic disease progression patterns based on temporal comorbidity networks. The method constructs a temporal comorbidity network by calculating time-related risk coefficients, and then combines node feature vectors with the learning results of a graph neural network to identify and mine the key paths of disease progression.
[0006] Compared with the prior art, the present invention has the following beneficial effects:
[0007] (1) This invention provides a method for identifying critical paths of disease progression that combines time-related risk coefficients and graph neural networks. Compared with traditional path mining methods, it incorporates graph neural networks to fuse node information of diseases in the graph, thereby enhancing the correlation between different diseases in a specific context and improving the accuracy and stability of path identification.
[0008] (2) In constructing the risk coefficient, the present invention accounts for the influence of the time interval between diagnoses and therefore retains the temporal correlation information between diseases more accurately than traditional correlation coefficient construction methods.
Attached Figure Description
[0009] Figure 1 is a flowchart of the method for mining chronic disease progression patterns based on temporal comorbidity networks provided in an embodiment of the present invention;
[0010] Figure 2 is a framework diagram of the method for mining chronic disease progression patterns based on temporal comorbidity networks provided in a specific embodiment of the present invention;
[0011] Figure 3 is a schematic structural diagram of the chronic disease progression pattern mining system based on temporal comorbidity networks provided in an embodiment of the present invention;
[0012] Figure 4 is a schematic block diagram of an example electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0013] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0014] Referring to Figure 1, the first embodiment of the present invention provides a method for mining chronic disease progression patterns based on temporal comorbidity networks, which includes the following steps:
[0015] S101: Obtain multi-time-point medical diagnosis records of the patient group and construct a directed temporal co-disease network with edge weights, where nodes represent diseases and directed edges represent the temporal progression relationship between diseases.
[0016] This step requires obtaining multiple medical records from the patient population. Generally, the more medical records and the larger the sample size, the more accurate the final data mining results. For any given patient, each diagnostic record typically includes the patient's unique identifier, the date of their visit, and one or more disease diagnoses.
[0017] The method for constructing the directed temporal co-disease network with edge weights includes:
[0018] S101-1, Construct a disease time matrix based on each patient's first diagnosis time.
[0019] The method for constructing the disease time matrix in this invention is a commonly used method in the field. Specifically, for each patient, the first occurrence time of every diagnosed disease is recorded to construct a disease time matrix, with rows representing patients and columns representing disease types. Each matrix element is the time at which the patient was first diagnosed with the corresponding disease; if the patient was never diagnosed with that disease, the element is left empty.
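The matrix construction in S101-1 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the record layout `(patient_id, visit_date, diagnoses)`, the function name `build_disease_time_matrix` and the example ICD-10 codes are hypothetical, and `None` stands in for an empty matrix element.

```python
from datetime import date


def build_disease_time_matrix(records):
    """Build a patient x disease matrix of first-diagnosis times.

    records: iterable of (patient_id, visit_date, diagnoses), where diagnoses
    is an iterable of disease codes (hypothetical record layout).
    Returns (patients, diseases, matrix); matrix[r][c] is the first diagnosis
    time of diseases[c] for patients[r], or None if never diagnosed.
    """
    first_seen = {}  # (patient, disease) -> earliest visit date seen so far
    for patient, visit_date, diagnoses in records:
        for disease in diagnoses:
            key = (patient, disease)
            if key not in first_seen or visit_date < first_seen[key]:
                first_seen[key] = visit_date
    patients = sorted({p for p, _ in first_seen})
    diseases = sorted({d for _, d in first_seen})
    matrix = [[first_seen.get((p, d)) for d in diseases] for p in patients]
    return patients, diseases, matrix


# Example with hypothetical diagnosis records
records = [
    ("p1", date(2020, 1, 5), ["E11"]),
    ("p1", date(2021, 3, 2), ["I10", "E11"]),  # E11 repeated: earlier date kept
    ("p2", date(2019, 6, 1), ["I10"]),
]
patients, diseases, matrix = build_disease_time_matrix(records)
```

Only the earliest visit per (patient, disease) is kept, so repeated diagnoses at later visits do not change the matrix.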
[0020] S101-2, for any two diseases, determine the temporal directionality using a binomial test.
[0021] The binomial test used in this invention to determine temporal directionality is commonly used in the field. Taking disease i and disease j as an example, the number of patients with both i and j and their onset times are counted from the disease time matrix, and the proportion of patients with both i and j in whom disease j was diagnosed after disease i is calculated:
$$r_{ij} = \frac{N_{i \to j}}{N_{ij}}$$
where $N_{ij}$ is the total number of patients who have both diseases i and j, and $N_{i \to j}$ is the number of patients who have both diseases i and j and in whom j was diagnosed after i.
[0022] For disease pairs satisfying $r_{ij} > 0.5$, a binomial test is further used to calculate the significance level p:
$$p = P(X \ge N_{i \to j}) = \sum_{k = N_{i \to j}}^{N_{ij}} \binom{N_{ij}}{k} p_0^{k} (1 - p_0)^{N_{ij} - k}$$
[0023] where X is a random variable denoting the number of times disease j is diagnosed after disease i in $N_{ij}$ independent trials; $P(X \ge N_{i \to j})$ is the probability that X is greater than or equal to $N_{i \to j}$ in $N_{ij}$ independent trials; k is the summation variable; and $p_0 = 0.5$ is the expected probability under a random order.
[0024] For disease pairs that meet the significance threshold, the temporal directionality from disease i to disease j is considered to hold.
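A minimal sketch of the directionality test in S101-2 follows, using only the Python standard library. The function names are hypothetical, `alpha = 0.05` is an assumed significance threshold (the text does not fix one), the pre-filter "proportion greater than 0.5" mirrors the random-order expectation p0 = 0.5, and patients whose two diseases were first diagnosed at the same time are counted as having both diseases but not as having j after i.

```python
from math import comb


def count_pair(matrix, i, j):
    """Return (patients with both i and j, patients with j diagnosed after i).

    matrix: rows are patients, columns are diseases, cells are first-diagnosis
    times (any comparable type) or None when the disease is absent.
    """
    both = [row for row in matrix if row[i] is not None and row[j] is not None]
    return len(both), sum(1 for row in both if row[j] > row[i])


def direction_p_value(n_both, n_i_before_j, p0=0.5):
    """One-sided binomial test: P(X >= n_i_before_j), X ~ Binomial(n_both, p0)."""
    return sum(
        comb(n_both, k) * p0**k * (1 - p0) ** (n_both - k)
        for k in range(n_i_before_j, n_both + 1)
    )


def has_direction(n_both, n_i_before_j, alpha=0.05):
    """Direction i -> j holds if j-after-i exceeds one half and p < alpha (assumed)."""
    if n_both == 0:
        return False
    ratio = n_i_before_j / n_both
    return ratio > 0.5 and direction_p_value(n_both, n_i_before_j) < alpha
```

For example, with 10 co-occurring patients of whom 9 had j after i, the p-value is 11/1024 (about 0.011), so the direction i to j is accepted at the assumed 0.05 level.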
[0025] S101-3, for disease pairs with significant temporal directionality, calculate the time-related risk coefficient as the edge weight.
[0026] To preserve the temporal correlation information between diseases more accurately, in some preferred embodiments the time interval between the onsets of the two diseases in a pair is incorporated into the pair's correlation calculation, yielding a time-related risk coefficient. The edge weights are adjusted dynamically according to this interval: the shorter the time interval, the larger the weight.
[0027] The calculation of edge weights includes: for disease pairs with significant temporal directionality, calculating the weight value based on a dynamic decay function of the time interval between the diagnosis of the two diseases; the dynamic decay function satisfies that the weight value monotonically decreases as the time interval increases. The decay rate of the dynamic decay function is controlled by preset parameters.
[0028] Specifically, the edge weight of disease pair (i, j), i.e., its time-related risk coefficient TWRR(i, j), is calculated from the following quantities. The weighting term in the formula is the dynamic decay function of the time interval; x denotes a patient; $S_{i \to j}$ denotes the set of patients who have both disease i and disease j and in whom j was diagnosed after i; $N_{i \to j}$ denotes the number of such patients; $N_i$ and $N_j$ denote the numbers of patients with disease i and disease j, respectively; N denotes the total number of patients in the patient group; $\Delta t_x$ denotes the interval between the diagnoses of disease i and disease j for patient x; and $t_0$ is the preset initial weight, i.e., the preset parameter that controls the decay rate.
[0029] The diseases in the directed disease pairs identified above form the network node set. The time-related risk coefficients and temporal directions between diseases serve as the weights and directions of the edges, forming the network edge set and yielding the complete directed temporal co-disease network.
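The quantities listed above can be combined as in the following sketch, which rests on two assumptions that should not be read as the patented formula. First, the decay function is taken to be exp(-Δt/t0), where t0 plays the role of the decay-rate parameter. Second, the normalization follows the classical comorbidity relative risk N·C/(N_i·N_j), with the plain co-occurrence count C replaced by the sum of decayed weights over the patients in whom j followed i. The function names `decay_weight` and `twrr` are hypothetical.

```python
import math


def decay_weight(delta_t, t0):
    """Assumed decay function exp(-delta_t / t0): decreases monotonically in delta_t."""
    return math.exp(-delta_t / t0)


def twrr(intervals, n_i, n_j, n_total, t0=365.0):
    """Hypothetical time-weighted relative risk for the directed pair (i, j).

    intervals: diagnosis gaps (e.g. in days) for every patient in whom j was
               diagnosed after i, so len(intervals) is the j-after-i count.
    n_i, n_j:  numbers of patients with disease i and with disease j.
    n_total:   total number of patients in the cohort.
    t0:        decay-rate parameter (assumed to share the units of intervals).
    """
    weighted_count = sum(decay_weight(dt, t0) for dt in intervals)
    return n_total * weighted_count / (n_i * n_j)
```

With every interval equal to zero the sketch reduces to the classical relative risk, and lengthening any interval strictly lowers the weight, which is the monotonic-decay property required in [0027].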
[0030] S102, Based on the directed temporal co-disease network, the individual characteristics and network topology characteristics of each disease node are fused to generate an initial feature vector for the node.
[0031] The method for generating the initial feature vector of nodes in this invention is a commonly used method in the field, where "fusion" refers to the simple concatenation of individual features with network topology features. Specifically, based on the directed temporal co-disease network, an initial feature vector $h_i^{(0)} \in \mathbb{R}^{d}$ is constructed for each disease node i, where $\mathbb{R}$ is the set of real numbers and d is the feature dimension. The feature vector consists of two types of features:
[0032] (1) Individual characteristics of the disease, such as the prevalence of disease i, the mortality rate of disease i, etc.;
[0033] (2) Node network characteristics, such as node out-degree, node in-degree, node centrality, clustering coefficient, etc.
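The simple-concatenation fusion of S102 might look like the sketch below. The input layout and the function name `initial_features` are hypothetical, and only out-degree, in-degree and degree centrality (total degree divided by v - 1) are computed, as an illustrative subset of the topology features listed above; a clustering coefficient could be appended in the same way.

```python
def initial_features(nodes, edges, individual):
    """Concatenate per-disease attributes with simple topology features.

    nodes:      list of disease codes (v nodes in total)
    edges:      dict {(src, dst): weight} of the directed co-disease network
    individual: dict {disease: [prevalence, mortality, ...]} (hypothetical)
    Returns {disease: individual features + [out-degree, in-degree, centrality]}.
    """
    n = len(nodes)
    feats = {}
    for v in nodes:
        out_deg = sum(1 for (src, _dst) in edges if src == v)
        in_deg = sum(1 for (_src, dst) in edges if dst == v)
        # Degree centrality: share of the other v - 1 nodes this node touches
        centrality = (out_deg + in_deg) / (n - 1) if n > 1 else 0.0
        feats[v] = list(individual[v]) + [out_deg, in_deg, centrality]
    return feats
```

In practice the concatenated features would usually be standardized per column before entering the graph attention network, since prevalence and degree live on very different scales; the text does not specify this step.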
[0034] S103, the initial feature vector of the node and the edge weight are input into the graph attention network, and the features of adjacent nodes are dynamically aggregated through the attention mechanism to output the disease node embedding representation containing temporal semantic association.
[0035] As a specific embodiment, referring to Figure 2, the input to the graph attention network in this step is the matrix of initial feature vectors from S102, $H^{(0)} = [h_1^{(0)}, h_2^{(0)}, \dots, h_v^{(0)}]^{\top} \in \mathbb{R}^{v \times d}$, where $\mathbb{R}$ is the set of real numbers and v is the number of nodes. Because the attention mechanism also uses the edge weights, the initial feature vectors and the edge weights can be regarded as being input into the graph attention network together.
[0036] The execution of the attention mechanism includes:
[0037] Use edge weights as a weighting factor in the attention coefficient calculation;
[0038] The representation of the current node is updated by weighting and aggregating the features of neighboring nodes based on the attention coefficient.
[0039] Specifically, for any disease node i, let disease j be the source node of an incoming edge of node i (i.e., the edge j → i). The attention coefficient $e_{ij}^{(q)}$ between the node pair (i, j) in the q-th layer of the graph attention network is calculated as:
$$e_{ij}^{(q)} = \phi\big(\mathrm{TWRR}(j, i)\big) \cdot \mathrm{LeakyReLU}\Big(\mathbf{a}^{(q)\top}\big[\,W^{(q)} h_i^{(q-1)} \,\Vert\, W^{(q)} h_j^{(q-1)}\,\big]\Big)$$
where LeakyReLU is the neural network activation function; ∥ denotes vector concatenation; q is the current layer number; $h_i^{(q-1)}$ and $h_j^{(q-1)}$ are the feature vectors of diseases i and j output by layer q−1 of the graph attention network, with 1 ≤ q ≤ L, where L is the number of layers. When q = 1, $h_i^{(0)}$ and $h_j^{(0)}$ are the initial feature vectors of disease nodes i and j; the outputs of the final layer, $h_i^{(L)}$ and $h_j^{(L)}$, are the feature vectors of disease nodes i and j output by the graph attention network. $W^{(q)}$ and $\mathbf{a}^{(q)}$ are the learnable parameters of the q-th layer; and $\phi(\cdot)$ is a weighting function of the edge weight.
[0040] Applying softmax normalization, which is commonly used in this field, to the attention coefficients $e_{ij}^{(q)}$ yields the normalized attention weights:
$$\alpha_{ij}^{(q)} = \frac{\exp\big(e_{ij}^{(q)}\big)}{\sum_{k \in N_{in}(i)} \exp\big(e_{ik}^{(q)}\big)}$$
where exp is the exponential function with the natural constant e as its base, $N_{in}(i)$ is the set of neighboring disease nodes on all incoming edges of disease i, and k indexes those neighboring nodes.
[0041] The feature vector of each disease node is then updated with the attention weights obtained above, i.e., the features of neighboring nodes are aggregated according to the attention weights:
$$h_i^{(q)} = \sigma\Big(\sum_{j \in N_{in}(i)} \alpha_{ij}^{(q)} W^{(q)} h_j^{(q-1)}\Big)$$
where σ is the activation function and $h_i^{(q)}$ is the updated feature vector of disease i output by layer q of the graph attention network.
[0042] After processing by the L-layer graph attention network, the final output is $H^{(L)} = [h_1^{(L)}, h_2^{(L)}, \dots, h_v^{(L)}]^{\top} \in \mathbb{R}^{v \times d_L}$, where $d_L$ is the dimension of the final output features. This output is the disease node embedding representation containing temporal semantic associations: a low-dimensional vector per disease, generated by the graph attention network, that simultaneously encodes the strength of temporal associations between diseases and the similarity of their medical features.
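One layer of the edge-weighted attention in S103 can be sketched in pure Python as below. This is an illustration under stated assumptions, not the patented network: the edge weight enters as a multiplicative factor phi(TWRR(j, i)) outside the LeakyReLU, phi is the identity, sigma is tanh, and a node without incoming edges keeps sigma(W h_i), since the text does not say how such nodes are handled (standard GAT implementations usually add self-loops instead). The names `gat_layer`, `in_edges` and `phi` are hypothetical, and training of W and a is omitted.

```python
import math


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(z, slope=0.2):
    """LeakyReLU with the negative slope commonly used in GAT (0.2)."""
    return z if z > 0 else slope * z


def gat_layer(h, in_edges, W, a, phi=lambda w: w, sigma=math.tanh):
    """One edge-weighted graph attention layer (illustrative sketch).

    h:        {node: feature vector output by layer q-1}
    in_edges: {node i: [(j, w_ji), ...]} incoming edges j -> i with TWRR weights
    W:        projection matrix (d_out rows); a: attention vector of length 2*d_out
    phi:      weighting function of the edge weight (identity assumed)
    """
    Wh = {v: matvec(W, x) for v, x in h.items()}
    out = {}
    for i in h:
        neighbors = in_edges.get(i, [])
        if not neighbors:
            # Assumption: a node without incoming edges keeps its projected features
            out[i] = [sigma(z) for z in Wh[i]]
            continue
        # e_ij = phi(w_ji) * LeakyReLU(a^T [W h_i || W h_j])
        scores = [
            phi(w) * leaky_relu(sum(ak * zk for ak, zk in zip(a, Wh[i] + Wh[j])))
            for j, w in neighbors
        ]
        # Softmax over N_in(i), shifted by the max score for numerical stability
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        # h_i^(q) = sigma(sum_j alpha_ij * W h_j^(q-1))
        agg = [0.0] * len(W)
        for e, (j, _w) in zip(exps, neighbors):
            alpha = e / total
            agg = [acc + alpha * z for acc, z in zip(agg, Wh[j])]
        out[i] = [sigma(z) for z in agg]
    return out
```

An L-layer network is obtained by calling `gat_layer` L times, feeding each output dictionary back in with that layer's own W and a; the last output plays the role of the final node embeddings.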
[0043] As can be seen from the above, the graph attention network structure used in this invention is a basic structure in this field. The key to this invention is to use the edge weights, i.e. the time-related risk coefficients, as weighting factors in the calculation of attention coefficients, and to use the graph attention network model to construct disease node embedding representations with temporal semantics for subsequent disease path risk calculation.
[0044] S104. For a given initial disease and target disease, combine the disease node embedding representation with the edge weights of the directed temporal co-disease network to calculate the criticality score of the disease progression path, and output the critical progression path based on the score.
[0045] In some preferred embodiments, the calculation of the criticality score of the disease progression path satisfies:
[0046] For each edge in the path, calculate the product of the semantic similarity of the embeddings of its two endpoints and the edge weight;
[0047] The average of the products of all edges on the path is taken as the score for that path.
[0048] Specifically, given an initial disease s and a target disease t, if there exists a path $P_{st} = [s, v_1, v_2, \dots, t]$ from s to t, the criticality score of the disease progression path is calculated as:
$$\mathrm{Score}(P_n) = \frac{1}{|P_n|} \sum_{(i, j) \in P_n} \cos\big(h_i^{(L)}, h_j^{(L)}\big) \cdot \mathrm{TWRR}(i, j)$$
[0049] where n is the path number; $P_n$ is the n-th path from the initial disease s to the target disease t; $|P_n|$ is the number of edges in the path; L is the number of layers in the graph attention network; $h_i^{(L)}$ and $h_j^{(L)}$ are the feature vectors of disease nodes i and j output by the graph attention network; $\cos(h_i^{(L)}, h_j^{(L)})$ is their cosine similarity; and TWRR(i, j) is the edge weight.
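The criticality score defined above can be computed with the following sketch. The function names are hypothetical, embeddings are plain lists, and edge weights are looked up by directed pair (i, j); a zero-norm embedding is given cosine similarity 0 by assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def path_score(path, emb, weights):
    """Mean over the path's edges of cos(h_i, h_j) * TWRR(i, j).

    path:    node list [s, ..., t]
    emb:     {node: final-layer embedding}
    weights: {(i, j): TWRR(i, j)} for each directed edge
    """
    edges = list(zip(path, path[1:]))
    if not edges:
        raise ValueError("a path needs at least one edge")
    return sum(cosine(emb[i], emb[j]) * weights[(i, j)] for i, j in edges) / len(edges)
```

Dividing by the number of edges keeps long paths from being favored merely because they accumulate more terms.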
[0050] Enumerate all paths from s to t, $\mathcal{P} = \{P_1, P_2, \dots\}$, calculate the criticality score of each path, and output the M highest-scoring paths as critical paths, where M is an adjustable parameter.
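Enumeration and top-M selection can be sketched with a depth-first search over simple (cycle-free) paths; restricting to simple paths is an assumption, since the text does not say whether a disease may be revisited. The number of simple paths can grow exponentially with network size, so a practical implementation would likely cap the path length. The function names are hypothetical, and `score` is any callable, such as a function implementing the criticality score above.

```python
import heapq


def all_simple_paths(adj, s, t):
    """Yield every simple path from s to t via iterative depth-first search.

    adj: {node: [successor, ...]} adjacency lists of the directed network.
    """
    stack = [(s, [s])]
    while stack:
        node, path = stack.pop()
        if node == t:
            yield path
            continue
        for nxt in adj.get(node, []):
            if nxt not in path:  # skip nodes already on this path (no cycles)
                stack.append((nxt, path + [nxt]))


def top_m_paths(adj, s, t, score, m):
    """Score every s -> t path and return the m best as (score, path) pairs."""
    scored = ((score(p), p) for p in all_simple_paths(adj, s, t))
    return heapq.nlargest(m, scored, key=lambda sp: sp[0])
```

`heapq.nlargest` keeps only M candidates in memory, which matters when many paths are enumerated.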
[0051] A second aspect of the present invention provides a chronic disease progression pattern mining system 300 based on temporal co-morbidity networks, comprising a construction module 301, a fusion module 302, a graph attention module 303, and a computation module 304. The specific functions of each module are described below:
[0052] The construction module 301 is used to obtain multi-time-point medical diagnosis records of the patient group and construct a directed temporal co-disease network with edge weights, where nodes represent diseases and directed edges represent the temporal progression relationship between diseases.
[0053] The fusion module 302 is used to fuse individual features and network topology features for each disease node based on the directed temporal co-disease network to generate an initial feature vector for the node.
[0054] The graph attention module 303 is used to input the initial feature vector of the node and the edge weight into the graph attention network, dynamically aggregate the features of adjacent nodes through the attention mechanism, and output the disease node embedding representation containing temporal semantic association.
[0055] The calculation module 304 is used to calculate the criticality score of the disease progression path for a given initial disease and target disease, combining the disease node embedding representation and the edge weights of the directed temporal co-disease network, and output the critical progression path based on the score.
[0056] Based on the above embodiments, the present invention also provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps in the method for mining chronic disease progression patterns based on temporal co-morbidity networks in the first aspect embodiment.
[0057] Figure 4 shows a schematic block diagram of an example electronic device 400 that can be used to implement embodiments of the present invention. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the invention described and/or claimed herein.
[0058] As shown in Figure 4, the electronic device 400 may include a computing unit 401, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 402 or a computer program loaded from a storage unit 408 into a random access memory (RAM) 403. The RAM 403 may also store various programs and data required for the operation of the device 400. The computing unit 401, ROM 402, and RAM 403 are interconnected via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
[0059] Multiple components in device 400 are connected to I/O interface 405, including: input unit 406, such as keyboard, mouse, etc.; output unit 407, such as various types of monitors, speakers, etc.; storage unit 408, such as disk, optical disk, etc.; and communication unit 409, such as network card, modem, wireless transceiver, etc. Communication unit 409 allows device 400 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.
[0060] The computing unit 401 can be a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 401 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the various methods and processes described above, such as methods for mining chronic disease progression patterns based on temporal co-disease networks or methods for training models for mining chronic disease progression patterns based on temporal co-disease networks. For example, in some embodiments, methods for mining chronic disease progression patterns based on temporal co-disease networks or methods for training models for mining chronic disease progression patterns based on temporal co-disease networks can be implemented as computer software programs tangibly contained in a machine-readable medium, such as storage unit 408. In some embodiments, part or all of the computer program can be loaded and/or installed on device 400 via ROM 402 and/or communication unit 409. When the computer program is loaded into RAM 403 and executed by computing unit 401, one or more steps of the method for mining chronic disease progression patterns based on temporal co-disease networks or the model training method for mining chronic disease progression patterns based on temporal co-disease networks, as described above, can be performed. 
Alternatively, in other embodiments, computing unit 401 can be configured by any other suitable means (e.g., by means of firmware) to perform the method for mining chronic disease progression patterns based on temporal co-disease networks or the model training method for mining chronic disease progression patterns based on temporal co-disease networks.
[0061] Based on the above embodiments, the present invention also provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause the computer to execute the chronic disease progression pattern mining method based on temporal comorbidity networks disclosed in the embodiments of the present invention.
[0062] Based on the above embodiments, the present invention also provides a computer program product, including a computer program, which, when executed by a processor, implements the method for mining chronic disease progression patterns based on temporal comorbidity networks disclosed in the embodiments of the present invention.
[0063] The computer program includes computer program code, which can be in the form of source code, object code, executable file, or some intermediate form. The computer-readable medium can include: any entity or device capable of carrying the computer program code, recording media, USB flash drive, portable hard drive, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc. It should be noted that the content included in the computer-readable medium can be appropriately added or removed according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
[0064] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.
[0065] In the several embodiments provided in this application, it should be understood that the disclosed devices, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces, or indirect coupling or communication connection between devices or units, and may be electrical, mechanical, or other forms.
[0066] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0067] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0068] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application.
[0069] The above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.
Claims
1. A method for mining chronic disease progression patterns based on temporal co-morbidity networks, characterized in that, The method includes the following steps: obtaining multi-time-point medical diagnosis records of a patient group, constructing a directed temporal co-disease network with edge weights, where nodes represent diseases and directed edges represent the temporal progression relationship between diseases; the calculation of the edge weights includes: for any two diseases, determining the temporal directionality through a binomial test; for disease pairs with significant temporal directionality, calculating the weight value based on a dynamic decay function of the time interval between the diagnosis of the two diseases, wherein the dynamic decay function satisfies that the weight value monotonically decreases as the time interval increases; Based on the directed temporal co-disease network, individual features and network topology features are fused for each disease node to generate an initial feature vector for the node; The initial feature vector of the node and the edge weight are input into the graph attention network. The features of adjacent nodes are dynamically aggregated through the attention mechanism, and the disease node embedding representation containing temporal semantic association is output. Given an initial disease and a target disease, the criticality score of the disease progression path is calculated by combining the disease node embedding representation with the edge weights of the directed temporal co-disease network, and the critical progression path is output based on the score.
2. The method for mining chronic disease progression patterns based on temporal co-morbidity networks as described in claim 1, characterized in that the edge weight, i.e., the time-related risk coefficient TWRR(i, j) of disease pair (i, j), is calculated from the following quantities: x is a patient; $S_{i \to j}$ is the set of patients who have both disease i and disease j and in whom j was diagnosed after i; $N_{i \to j}$ is the number of such patients; $N_i$ and $N_j$ are the numbers of patients with disease i and disease j, respectively; N is the total number of patients in the patient group; $\Delta t_x$ is the interval between the diagnoses of disease i and disease j for patient x; and $t_0$ is the preset initial weight.
3. The method for mining chronic disease progression patterns based on temporal co-morbidity networks as described in claim 1, characterized in that, The execution of the attention mechanism includes: Use edge weights as a weighting factor in the attention coefficient calculation; The representation of the current node is updated by weighting and aggregating the features of neighboring nodes based on the attention coefficient.
4. The method for mining chronic disease progression patterns based on temporal co-morbidity networks as described in claim 3, characterized in that the attention coefficient $e_{ij}^{(q)}$ of the q-th layer of the graph attention network is calculated as:
$$e_{ij}^{(q)} = \phi\big(\mathrm{TWRR}(j, i)\big) \cdot \mathrm{LeakyReLU}\Big(\mathbf{a}^{(q)\top}\big[\,W^{(q)} h_i^{(q-1)} \,\Vert\, W^{(q)} h_j^{(q-1)}\,\big]\Big)$$
where LeakyReLU is the neural network activation function; ∥ denotes vector concatenation; q is the current layer number; $h_i^{(q-1)}$ and $h_j^{(q-1)}$ are the feature vectors of diseases i and j output by layer q−1 of the graph attention network, with 1 ≤ q ≤ L, where L is the number of layers in the graph attention network; when q = 1, $h_i^{(0)}$ and $h_j^{(0)}$ are the initial feature vectors of disease nodes i and j; the outputs of the final layer, $h_i^{(L)}$ and $h_j^{(L)}$, are the feature vectors of disease nodes i and j output by the graph attention network; $W^{(q)}$ and $\mathbf{a}^{(q)}$ are the learnable parameters of the q-th layer; and $\phi(\cdot)$ is a weighting function of the edge weight.
5. The method for mining chronic disease progression patterns based on temporal co-morbidity networks as described in claim 3, characterized in that the attention-weighted aggregation of neighbor node features is calculated as:
$$h_i^{(q)} = \sigma\Big(\sum_{j \in N_{in}(i)} \alpha_{ij}^{(q)} W^{(q)} h_j^{(q-1)}\Big)$$
where σ is the activation function; $N_{in}(i)$ is the set of neighboring disease nodes on all incoming edges of disease i; $\alpha_{ij}^{(q)}$ is the normalized attention weight obtained from the attention coefficient $e_{ij}^{(q)}$ of the q-th layer of the graph attention network; $W^{(q)}$ is the learnable parameter of the q-th layer; $h_j^{(q-1)}$ is the feature vector of disease j output by layer q−1 of the graph attention network; and $h_i^{(q)}$ is the updated feature vector of disease i output by layer q of the graph attention network.
6. The method for mining chronic disease progression patterns based on temporal co-morbidity networks as described in claim 1, characterized in that, After calculating the criticality score of the disease progression path, the top M paths with the highest criticality scores from all paths are selected as critical paths; the calculation of the criticality score of the disease progression path satisfies the following: For each edge in the path, calculate the product of the semantic similarity of the embeddings of its two endpoints and the edge weight; The average of the products of all edges on the path is taken as the score for that path.
7. The method for mining chronic disease progression patterns based on temporal co-morbidity networks as described in claim 1 or 6, characterized in that the criticality score $\mathrm{Score}(P_n)$ of the disease progression path is calculated as:
$$\mathrm{Score}(P_n) = \frac{1}{|P_n|} \sum_{(i, j) \in P_n} \cos\big(h_i^{(L)}, h_j^{(L)}\big) \cdot \mathrm{TWRR}(i, j)$$
where n is the path number; $P_n$ is the n-th path from the given initial disease to the target disease; $|P_n|$ is the number of edges in the path; L is the number of layers in the graph attention network; $h_i^{(L)}$ and $h_j^{(L)}$ are the feature vectors of disease nodes i and j output by the graph attention network; $\cos(h_i^{(L)}, h_j^{(L)})$ is their cosine similarity; and TWRR(i, j) is the edge weight.
8. A system for mining chronic disease progression patterns based on temporal co-morbidity networks, characterized in that, Include: A construction module is used to acquire multi-time-point medical diagnosis records of a patient group and construct a directed temporal co-disease network with edge weights, where nodes represent diseases and directed edges represent the temporal progression relationship between diseases; the calculation of the edge weights includes: for any two diseases, determining the temporal directionality through a binomial test; for disease pairs with significant temporal directionality, calculating the weight value based on a dynamic decay function of the time interval between the diagnosis of the two diseases, wherein the dynamic decay function satisfies that the weight value monotonically decreases as the time interval increases; The fusion module is used to fuse individual features and network topology features for each disease node based on the directed temporal co-disease network to generate an initial feature vector for the node. The graph attention module is used to input the initial feature vector of the node and the edge weight into the graph attention network, dynamically aggregate the features of adjacent nodes through the attention mechanism, and output the disease node embedding representation containing temporal semantic association. The calculation module is used to calculate the criticality score of the disease progression path for a given initial disease and target disease, combining the disease node embedding representation and the edge weights of the directed temporal co-disease network, and output the critical progression path based on the score.