A method for constructing a prostate cancer patient prognosis risk prediction model
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: DEHUA COUNTY HOSPITAL
- Filing Date: 2026-01-12
- Publication Date: 2026-06-23

Figure CN121506505B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of medical artificial intelligence and prognosis prediction technology, specifically a method for constructing a prognostic risk prediction model for prostate cancer patients.
Background Technology
[0002] Currently, machine learning-based medical prognosis prediction models have become important tools for assisting clinical decision-making. When building such models, conventional technical processes typically treat feature engineering and model training as two separate stages. Feature selection often employs static methods such as filter methods, wrapper methods, or embedded methods based on model-derived importance. These methods fix the feature set before model training begins or after a single training cycle, and cannot adapt to dynamic feedback during training. Regarding the model itself, although deep learning methods, represented by attention mechanisms, can capture key information from the data, their weight allocation is entirely data-driven and lacks a guiding mechanism that integrates external prior knowledge or high-level feature correlations. As a result, attention may be scattered across noisy or irrelevant features, affecting the stability and interpretability of the predictions.
[0003] The main shortcomings of existing technologies lie in the disconnect between feature selection and model learning, as well as the closed nature of the model's attention mechanism. Static feature selection struggles to adapt to the complex and evolving feature association patterns in time-series data covering a patient's entire disease course; it is easily trapped in local optima and cannot optimize feature subsets in real time during training. Simultaneously, unguided attention mechanisms may fail to accurately focus on the feature combinations that truly have a synergistic impact on prognosis, and their decision-making logic is disconnected from medical knowledge. This limits the model's predictive accuracy and clinical reliability in complex clinical scenarios. A method is needed that enables real-time interaction between feature selection and model training, and that can use deep correlation knowledge between features to guide the model's attention mechanism.
Summary of the Invention
[0004] The purpose of this invention is to provide a method for constructing a prognostic risk prediction model for prostate cancer patients, so as to solve the problems mentioned in the background art.
[0005] To achieve the above objectives, the present invention provides a method for constructing a prognostic risk prediction model for prostate cancer patients, the method comprising:
[0006] Create a hierarchical knowledge network of prognostic features;
[0007] Based on the diagnosis and treatment event logs, the temporal causal dependencies of the diagnosis and treatment path are mined to generate a diagnosis and treatment state transition topology;
[0008] Based on the aforementioned diagnosis and treatment state transition topology, the patient's entire disease course data is sliced to form multiple continuous time-series data segments with overlapping information;
[0009] The hierarchical knowledge network is used to perform feature semantic mapping and encoding on each time-series data segment;
[0010] The encoded temporal data segments are input into a deep prediction architecture that includes a multi-scale temporal attention mechanism;
[0011] A simulated annealing-optimized feature selection loop is run synchronously, and the feature selection loop interacts with the training process of the deep prediction architecture to dynamically filter feature subsets;
[0012] Based on the intermediate results of the feature selection loop, a feature influence propagation graph is constructed to identify key feature clusters;
[0013] The attention weights of the deep prediction architecture are guided and corrected using the key feature clusters;
[0014] The guidance-corrected deep prediction architecture, the final output of the feature selection loop, and the diagnosis and treatment state transition topology are integrated to assemble a complete prognostic risk prediction model;
[0015] An online adaptive fine-tuning loop is established for the prognostic risk prediction model, and the internal parameters of the model are incrementally adjusted based on the newly input patient data stream.
[0016] Preferably, the process of creating a hierarchical knowledge network of prognostic-related features includes:
[0017] The hierarchical knowledge network contains multiple associated nodes, each of which corresponds to a clinical feature and its attribute definition.
[0018] Entities and attributes are extracted from structured electronic medical records and unstructured clinical texts. The entities include disease names, examination items, treatment plans, and biomarkers.
[0019] Define the medical logical relationships between entities, including causal relationships, concomitant relationships, temporal relationships, and mutually exclusive relationships;
[0020] Based on the aforementioned medical logic, multiple related nodes are connected to form a directed acyclic graph structure, which serves as the initial knowledge network.
[0021] A medical ontology library is introduced to perform semantic disambiguation and concept merging on the nodes in the initial knowledge network to ensure the uniqueness of each associated node;
[0022] Each associated node is assigned a weight vector and a confidence interval. The weight vector reflects the importance of the clinical feature in different prognostic assessment scenarios, and the confidence interval reflects the medically reasonable range of values for the clinical feature.
[0023] The network structure with added weight vectors and confidence intervals is solidified to form the hierarchical knowledge network.
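The node structure in [0017] to [0023] can be illustrated with a small data-structure sketch. The class names, the relation labels, and the JSON serialization are hypothetical choices made for illustration; the acyclicity check uses Kahn's topological sort to confirm that the connected nodes still form a directed acyclic graph, as [0020] requires.

```python
import json
from collections import deque
from dataclasses import dataclass, field

# The four relation types named in the text; the string labels are hypothetical.
RELATIONS = ("causal", "concomitant", "temporal", "exclusive")


@dataclass
class KnowledgeNode:
    concept_id: str   # unique concept after ontology merging
    weights: list     # importance per prognostic scenario
    interval: tuple   # medically plausible (low, high) value range


@dataclass
class KnowledgeNetwork:
    nodes: dict = field(default_factory=dict)  # concept_id -> KnowledgeNode
    edges: list = field(default_factory=list)  # (source, target, relation)

    def add_node(self, node):
        if node.concept_id in self.nodes:
            raise ValueError(f"duplicate concept: {node.concept_id}")
        self.nodes[node.concept_id] = node

    def add_edge(self, src, dst, relation):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.append((src, dst, relation))

    def is_acyclic(self):
        """Kahn's algorithm: the graph is a DAG iff every node can be popped."""
        indegree = {n: 0 for n in self.nodes}
        children = {n: [] for n in self.nodes}
        for src, dst, _ in self.edges:
            children[src].append(dst)
            indegree[dst] += 1
        queue = deque(n for n, d in indegree.items() if d == 0)
        visited = 0
        while queue:
            node = queue.popleft()
            visited += 1
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    queue.append(child)
        return visited == len(self.nodes)

    def serialize(self):
        """'Solidify' the network as a JSON string (one possible storage format)."""
        return json.dumps({
            "nodes": {k: {"weights": v.weights, "interval": list(v.interval)}
                      for k, v in self.nodes.items()},
            "edges": [list(e) for e in self.edges],
        }, sort_keys=True)
```

A caller would reject any edge whose addition makes `is_acyclic()` return `False` before solidifying the network.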
[0024] Preferably, the process of generating a treatment state transition topology by mining the temporal causal dependencies of treatment paths based on treatment event logs includes:
[0025] Analyze the patient's entire diagnosis and treatment process event log recorded in the hospital information system. Each event log contains the diagnosis and treatment action, execution time, and patient status identifier.
[0026] A causal discovery algorithm is used to analyze the sequence and conditional probability of different diagnostic and treatment actions, and to identify statistically significant causal edges.
[0027] By using patient status identifiers as state nodes and identifying causal edges as directed connections between state nodes, a preliminary diagnosis and treatment state transition network is constructed.
[0028] Calculate the transition strength and uncertainty measure of each directed connection in the diagnosis and treatment state transition network;
[0029] Based on the transition strength and uncertainty measure, the preliminary diagnosis and treatment state transition network is pruned and merged to simplify it, removing redundant connections and merging similar state nodes to generate a refined diagnosis and treatment state transition topology.
[0030] Preferably, the process of slicing the patient's entire disease course data into multiple continuous time-series data segments with overlapping information, based on the aforementioned diagnosis and treatment state transition topology, includes:
[0031] Using each state node in the aforementioned diagnosis and treatment state transition topology as a time anchor point, a core time window is determined on the patient's entire disease course timeline;
[0032] Around each core time window, a preset time range is extended forward and backward to form an extended time window, with adjacent extended time windows overlapping in time;
[0033] All clinical data within each extended time window are extracted from the patient's full disease course data to form a raw data slice;
[0034] For each raw data slice, relevant associated nodes and their attribute definitions are matched from the hierarchical knowledge network based on the slice's corresponding state node;
[0035] Using the matched associated nodes and their attribute definitions, each raw data slice is reorganized into a structured form: missing attribute fields are filled in and the data format is unified to form standardized time-series data segments.
[0036] Preferably, the process of using the hierarchical knowledge network to perform feature semantic mapping and encoding for each time-series data segment includes:
[0037] Each data record in the time-series data segment is semantically matched against the associated nodes in the hierarchical knowledge network to find the corresponding target associated node;
[0038] Read the weight vector and confidence interval of the target associated node;
[0039] The original numerical values of the data records are transformed into the semantic space defined by the target associated node. The transformation includes numerical normalization, outlier truncation, and reasonableness verification based on confidence intervals.
[0040] The transformed data records are arranged in chronological order and concatenated with the corresponding weight vectors to generate high-dimensional feature vectors.
[0041] The high-dimensional feature vector is compressed and encoded using a time-series encoder to obtain a fixed-dimensional semantic encoding vector, which represents the feature expression of the time-series data segment.
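The mapping and encoding steps in [0037] to [0041] can be sketched as follows, under two loudly stated simplifications: a lower-cased synonym dictionary stands in for semantic-similarity matching, and mean/last-value pooling stands in for the learned time-series encoder, which the text does not specify. All names are hypothetical. Values are truncated to the node's confidence interval, min-max normalized by that interval, and concatenated with the node's weight vector, giving a fixed-length vector per segment.

```python
import numpy as np


def encode_segment(records, nodes, synonyms):
    """records: (day, name, value) rows of one time-series data segment.
    nodes: concept -> {"weights": array, "interval": (low, high)}.
    synonyms: lower-cased surface term -> concept (stand-in for semantic matching).
    """
    concepts = sorted(nodes)                      # fixed feature order
    series = {c: [] for c in concepts}
    flagged = {c: 0 for c in concepts}
    for day, name, value in sorted(records):      # chronological order
        concept = synonyms.get(name.strip().lower())
        if concept is None:
            continue                              # no target associated node
        lo, hi = nodes[concept]["interval"]
        if not lo <= value <= hi:
            flagged[concept] += 1                 # fails the reasonableness check
        clipped = min(max(value, lo), hi)         # outlier truncation
        series[concept].append((clipped - lo) / (hi - lo))  # normalize to [0, 1]
    parts = []
    for c in concepts:
        vals = series[c]
        n = len(vals)
        summary = [np.mean(vals) if n else 0.0,   # mean level in the segment
                   vals[-1] if n else 0.0,        # most recent value
                   1.0 if n else 0.0,             # observed / missing indicator
                   flagged[c] / n if n else 0.0]  # share of implausible raw values
        parts.append(np.concatenate([summary, nodes[c]["weights"]]))
    return np.concatenate(parts)
```

The output length is `len(nodes) * (4 + len(weights))`, so every segment yields a vector of the same dimension regardless of how many records it contains.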
[0042] Preferably, inputting the encoded temporal data segment into a deep prediction architecture that includes a multi-scale temporal attention mechanism includes:
[0043] The deep prediction architecture employs a phased, gradual unfreezing strategy during training.
[0044] A neural network with a deep prediction architecture is constructed, which includes parallel long-range attention paths and short-range attention paths, used to capture dependencies across time-series data segments and local patterns within a single time-series data segment, respectively.
[0045] Initialize all parameters of the deep prediction architecture and freeze all weights except for the final output layer;
[0046] The semantic encoding vector is input into the deep prediction architecture in chronological order, and the unfrozen output layer is trained using the initial loss function until the loss converges.
[0047] The deeper network modules in the deep prediction architecture are unfrozen step by step in a preset order. After each unfreezing, the unfrozen part and the previously unfrozen part are retrained using data containing historical semantic encoding vectors, and the network parameters are updated.
[0048] Once all modules of the deep prediction architecture have been unfrozen and trained, the basic prediction model is obtained.
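The parallel long-range and short-range attention paths in [0044] can be illustrated with a parameter-free NumPy sketch. It assumes that the long-range path is full scaled dot-product attention over the chronological sequence of encoding vectors and that the short-range path is the same attention restricted by a band mask of hypothetical radius `radius`; a trained model would add learned query/key/value projections, which are the weights the phased unfreezing in [0045] to [0047] would act on.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, mask=None):
    """Scaled dot-product attention; masked positions receive zero weight."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    weights = softmax(scores)
    return weights @ v, weights


def band_mask(n, radius):
    """True where |i - j| <= radius, i.e. a local neighbourhood around each step."""
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= radius


def multi_scale_block(x, radius=2):
    """x: (T, d) semantic encoding vectors in chronological order."""
    long_out, long_w = attention(x, x, x)                          # global dependencies
    short_out, short_w = attention(x, x, x, band_mask(len(x), radius))  # local patterns
    return np.concatenate([long_out, short_out], axis=-1), long_w, short_w
```

Concatenating the two paths doubles the feature dimension; a projection layer would normally map it back before the output layer.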
[0049] Preferably, a simulated annealing-optimized feature selection loop is run synchronously. This feature selection loop interacts with the training process of the deep prediction architecture, and the process of dynamically selecting a subset of features includes:
[0050] Initialize a global feature pool containing all potential clinical features;
[0051] After each training cycle of the basic prediction model, a subset of features is randomly sampled from the global feature pool;
[0052] Using the data corresponding to the currently sampled feature subset, perform forward inference on the basic prediction model and calculate the model's performance index on the current feature subset.
[0053] The simulated annealing algorithm is used to determine whether to accept the currently sampled feature subset as a new candidate subset, with the performance index as the optimization target, and the probability of accepting the suboptimal subset is dynamically adjusted according to the annealing temperature.
[0054] The subset of candidate features accepted in multiple iterations is recorded and fused to form a dynamically evolving feature importance distribution;
[0055] Based on the feature importance distribution, features with importance scores exceeding a threshold are selected from the global feature pool to form a dynamically updated feature subset, which will be used to guide the next stage of model training.
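The loop in [0050] to [0055] can be sketched as standard simulated annealing with the Metropolis acceptance rule and geometric cooling. Several details are assumptions: the proposal is a random single-feature swap (one way to realize the random sampling in [0051]), `score` is a hypothetical callback standing in for forward inference of the base model (for example a validation AUC), and feature importance is the fraction of iterations in which a feature belongs to the currently accepted subset.

```python
import math
import random


def sa_feature_selection(features, score, subset_size, n_iter=500, t0=1.0,
                         cooling=0.99, threshold=0.5, seed=0):
    """features: list of feature names; score(subset) -> performance index (higher is better)."""
    rng = random.Random(seed)
    current = set(rng.sample(features, subset_size))
    current_score = score(current)
    counts = {f: 0 for f in features}
    temp = t0
    for _ in range(n_iter):
        # Propose a neighbouring subset by swapping one feature in for one feature out.
        drop = rng.choice(sorted(current))
        add = rng.choice(sorted(set(features) - current))
        candidate = (current - {drop}) | {add}
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Metropolis rule: always accept improvements; accept worse subsets with prob exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
        for f in current:
            counts[f] += 1          # record the accepted subset at this iteration
        temp *= cooling             # geometric cooling schedule
    importance = {f: c / n_iter for f, c in counts.items()}
    selected = sorted(f for f, s in importance.items() if s > threshold)
    return selected, importance
```

At high temperature the loop readily accepts worse subsets, which is what lets it leave local optima; as the temperature falls it settles on the best-scoring region.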
[0056] Preferably, the process of constructing a feature influence propagation graph and identifying key feature clusters based on the intermediate results of the feature selection loop includes:
[0057] Collect the subset of accepted candidate features recorded in multiple iterations of the simulated annealing optimization feature selection loop;
[0058] Analyze the frequency of feature co-occurrence and conditional dependencies in each candidate feature subset;
[0059] A weighted directed feature co-occurrence network is constructed using features as nodes and the co-occurrence strength and conditional dependency strength between features as edges.
[0060] A community detection algorithm is run on the feature co-occurrence network to group closely connected feature nodes into the same community, and each community forms a feature cluster.
[0061] Calculate the internal connectivity density and average contribution to model performance for each feature cluster, and mark the feature clusters with high internal connectivity density and large average contribution as key feature clusters.
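Steps [0057] to [0061] can be sketched in pure Python. Directed edge strength is the conditional co-occurrence rate P(b | a) over the accepted subsets. As a deliberately simple stand-in for a community detection algorithm such as Louvain or Leiden, features are grouped into connected components of links that are strong in both directions. `contribution` is a hypothetical per-feature performance contribution, and the thresholds are illustrative.

```python
from collections import Counter
from itertools import combinations


def key_feature_clusters(accepted_subsets, contribution, min_weight=0.6,
                         density_min=0.6, contrib_min=0.0):
    freq = Counter(f for s in accepted_subsets for f in set(s))
    pairs = Counter(p for s in accepted_subsets for p in combinations(sorted(set(s)), 2))
    # Directed conditional co-occurrence strength P(b | a) = count(a, b) / count(a).
    cond = {}
    for (a, b), c in pairs.items():
        cond[(a, b)] = c / freq[a]
        cond[(b, a)] = c / freq[b]
    # Keep a link only if it is strong in both directions.
    strong = {(a, b) for (a, b) in pairs if min(cond[(a, b)], cond[(b, a)]) >= min_weight}
    parent = {f: f for f in freq}

    def find(f):  # union-find with path halving
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in strong:
        parent[find(a)] = find(b)
    groups = {}
    for f in freq:
        groups.setdefault(find(f), []).append(f)
    key = []
    for members in groups.values():
        if len(members) < 2:
            continue
        members = sorted(members)
        possible = len(members) * (len(members) - 1) / 2
        density = sum((a, b) in strong for a, b in combinations(members, 2)) / possible
        mean_contrib = sum(contribution.get(f, 0.0) for f in members) / len(members)
        if density >= density_min and mean_contrib >= contrib_min:
            key.append((members, density, mean_contrib))
    return key
```

Clusters that are tightly linked but contribute little to model performance (the `contrib_min` test) are not marked as key clusters.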
[0062] Preferably, the process of guiding the correction of the attention weights of the deep prediction architecture using the key feature clusters includes:
[0063] Extract the original attention weight distribution generated by the multi-scale temporal attention mechanism from the trained base prediction model;
[0064] Map the features contained in the key feature clusters to the feature dimensions corresponding to the multi-scale temporal attention mechanism;
[0065] The original attention weight distribution is enhanced on the feature dimensions corresponding to the key feature clusters, while the weight values on the feature dimensions corresponding to non-key feature clusters are weakened proportionally.
[0066] The adjusted attention weight distribution is normalized to ensure that the sum of all weights is one.
[0067] The normalized attention weight distribution is re-injected into the basic prediction model to replace the original attention weights, thereby obtaining a deep prediction architecture with guided correction.
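The reweighting in [0065] and [0066] amounts to multiplying key-cluster dimensions by one factor and all other dimensions by a common smaller factor, then renormalizing. A minimal sketch follows; `boost` and `damp` are hypothetical hyperparameters. Because every non-key dimension is scaled by the same factor, their relative proportions are preserved, matching "weakened proportionally".

```python
import numpy as np


def correct_attention(weights, key_dims, boost=1.5, damp=0.8):
    """weights: (..., d) attention over feature dimensions; key_dims: key-cluster indices."""
    w = np.array(weights, dtype=float)
    factor = np.full(w.shape[-1], damp)        # weaken non-key dimensions uniformly
    factor[list(key_dims)] = boost             # strengthen key-cluster dimensions
    w = w * factor
    return w / w.sum(axis=-1, keepdims=True)   # renormalize so each row sums to one
```

For example, a uniform distribution `[0.25, 0.25, 0.25, 0.25]` with `key_dims=[0]`, `boost=2.0`, `damp=0.5` becomes `[4/7, 1/7, 1/7, 1/7]`.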
[0068] Preferably, the process of establishing the online adaptive fine-tuning loop of the prognostic risk prediction model includes:
[0069] Deploy the assembled prognostic risk prediction model to the online prediction service environment;
[0070] The system receives new patient data streams in real time and performs the same hierarchical knowledge network mapping, temporal slicing, and semantic encoding processing on the new patient data streams according to the process in the construction method, generating online semantic encoding vectors.
[0071] The online semantic encoding vector is input into the prognostic risk prediction model to obtain the prognostic risk prediction result, and the uncertainty estimate of the prognostic risk prediction result is calculated at the same time.
[0072] When the uncertainty of the prediction result exceeds the preset confidence threshold, the fine-tuning mechanism is triggered;
[0073] The fine-tuning mechanism utilizes newly input patient data streams and their corresponding real prognostic labels to construct small-batch fine-tuning data;
[0074] Using the small batch of fine-tuning data, some internal parameters of the prognostic risk prediction model are updated by gradient descent with a limited number of steps to achieve incremental adjustment;
[0075] Record the parameters and data batches adjusted in each fine-tuning operation for model version tracking and rollback.
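The trigger, limited-step update, and version log in [0071] to [0075] can be sketched on a logistic output head. The class name and defaults are hypothetical. The sketch assumes that mean binary predictive entropy serves as the uncertainty estimate (the text leaves the estimator open), that only the head's parameters are treated as the "some internal parameters" being updated, and that true prognostic labels are available for the fine-tuning batch.

```python
import numpy as np


class OnlineHead:
    """Logistic output head with uncertainty-triggered, limited-step fine-tuning."""

    def __init__(self, w, b=0.0):
        self.w = np.array(w, dtype=float)  # copy so the caller's array is untouched
        self.b = float(b)
        self.versions = []  # (version_id, weights, bias, batch_size) for tracking and rollback

    def predict(self, x):
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))

    @staticmethod
    def entropy(p):
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))

    def maybe_finetune(self, x, y, threshold=0.6, lr=0.1, steps=5):
        """Fine-tune only if mean predictive entropy exceeds `threshold` (max is ln 2)."""
        if self.entropy(self.predict(x)).mean() <= threshold:
            return False
        self.versions.append((len(self.versions), self.w.copy(), self.b, len(y)))
        for _ in range(steps):  # limited number of gradient descent steps
            grad = self.predict(x) - y  # gradient of the log loss w.r.t. the logits
            self.w -= lr * x.T @ grad / len(y)
            self.b -= lr * grad.mean()
        return True

    def rollback(self):
        """Restore the parameters saved before the most recent fine-tuning."""
        _, self.w, self.b, _ = self.versions.pop()
```

Keeping the step count and learning rate small limits how far one mini-batch can move the deployed model, and the version log makes every adjustment reversible.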
[0076] Compared with the prior art, the beneficial effects of the present invention are:
[0077] The synchronously running simulated annealing-optimized feature selection loop interacts with the training process of the deep prediction architecture, replacing the traditional serial or static feature selection mode. The simulated annealing algorithm uses a probabilistic strategy to continuously explore different feature subsets during training; its ability to accept suboptimal solutions helps it escape local optima. Information such as the loss gradient during training is fed back to this loop in real time, dynamically adjusting the feature search direction so that feature selection closely tracks the current model's learning state and data distribution. This dynamic interaction ensures that the final feature subset is not only data-driven but also co-optimized through close interplay with the specific prediction task, improving the ability to characterize the complex mapping between feature combinations and prognostic risk and enhancing the robustness of the model's features.
[0078] A feature influence propagation graph is constructed based on the intermediate results of the feature selection loop to identify key feature clusters. This graph is then used to guide the correction of the multi-scale temporal attention mechanism, providing structured knowledge constraints for data-driven attention. The propagation graph reveals the dependencies and co-occurrence relationships between features, and the key feature clusters reflect groups of features with strong internal correlations. Injecting this knowledge into the attention calculation process through regularization or weight biasing constrains the model to prioritize these feature clusters with proven synergistic influence when calculating the importance of different time steps and feature dimensions. This guidance aligns the model's focus more closely with the overall effect of feature groups in medical knowledge, reducing the possibility of attention being distracted by irrelevant or weakly correlated features, thereby improving the clinical interpretability of predictive decisions and the accuracy of capturing key prognostic factors.
Attached Figure Description
[0079] Figure 1 is a schematic diagram illustrating the working principle of the method for constructing a prognostic risk prediction model for prostate cancer patients according to this invention;
[0080] Figure 2 is a flowchart for creating a hierarchical knowledge network of prognostic features;
[0081] Figure 3 is a flowchart for generating the diagnosis and treatment state transition topology;
[0082] Figure 4 is a graph of the AUC trend across iterations of the simulated annealing feature selection;
[0083] Figure 5 is a heatmap of the co-occurrence network and key feature clusters for prognostic features of prostate cancer.
Detailed Implementation
[0084] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0085] Referring to Figure 1, this invention provides a method for constructing a prognostic risk prediction model for prostate cancer patients. The method includes: mining the temporal causal dependencies between treatment actions based on treatment event logs obtained from a hospital information system, and generating a state transition topology representing a typical treatment path. Using this state transition topology, the patient's continuous disease course data is sliced to generate a series of temporally continuous and overlapping temporal data segments. Each temporal data segment is mapped and uniformly encoded using a hierarchical knowledge network. The encoded temporal data is then fed into a deep prediction architecture containing a multi-scale temporal attention mechanism for training. During training, a feature selection loop optimized by simulated annealing is run synchronously. This loop interacts with the training iterations of the deep prediction architecture, dynamically evaluating and selecting feature subsets. Based on the intermediate results accumulated in the feature selection loop, an influence propagation graph between features is constructed, identifying key feature clusters that contribute significantly to the prediction. These key feature clusters are used to guide the correction of the attention weight distribution within the deep prediction architecture. The attention-corrected deep prediction architecture, the stable feature subset output by the feature selection loop, and the diagnosis and treatment state transition topology are integrated to form a complete prognostic risk prediction model. To maintain model performance, an online adaptive fine-tuning loop is established to make small-scale incremental adjustments to the model's internal parameters based on the newly input patient data stream.
[0086] In one embodiment of the present invention, referring to Figure 2, creating a hierarchical knowledge network of prognostic features involves systematically organizing clinical features and their intrinsic relationships from multi-source medical data. This hierarchical knowledge network contains multiple interconnected nodes, each corresponding to a specific clinical feature and its complete attribute definition. The implementation process begins with extracting medical entities and attributes from structured electronic medical record forms and unstructured clinical text records. In structured electronic medical records, entities such as "prostate-specific antigen," "Gleason score," and "clinical T stage" can be extracted by directly reading field values. In unstructured clinical texts such as pathology reports or imaging reports, entities such as "nerve invasion," "bone metastasis," and "endocrine treatment plan" are extracted using named entity recognition technology. It can be understood that the extracted entity types cover disease names, imaging and laboratory test items, surgical and drug treatment plans, and biomarkers.
[0087] In practical implementation, defining the medical logical relationships between entities is the foundation for constructing the semantic framework of the hierarchical knowledge network. These medical logical relationships include causal relationships, concomitant relationships, temporal relationships, and mutually exclusive relationships. Causal relationships describe the influence of one clinical feature on another feature or outcome; for example, "persistently elevated prostate-specific antigen (PSA) levels" and "increased risk of biochemical recurrence" constitute a causal relationship. Concomitant relationships describe features that frequently occur simultaneously or are jointly managed in clinical practice; for example, "androgen deprivation therapy" and "GnRH agonist administration" constitute a concomitant relationship. Temporal relationships specify the logical order in which diagnostic and treatment events occur; for example, "radical prostatectomy" should precede "postoperative pathological confirmation." Mutually exclusive relationships indicate features that cannot both hold in medical logic; for example, "organ-confined tumor" and "distant metastasis" constitute a mutually exclusive relationship. Based on the defined medical logical relationships, all extracted associated nodes are connected using directed edges to form an initial knowledge network. This initial knowledge network has a directed acyclic graph structure, where nodes are clinical feature entities and edges represent the relationships between entities.
[0088] In some embodiments, introducing an external medical ontology library to perform semantic disambiguation and concept merging on the initial knowledge network is a step to ensure knowledge consistency. The medical ontology library provides a standardized medical terminology and concept encoding system. By mapping and matching the node terms in the initial knowledge network to the standard concepts in the medical ontology library, ambiguities such as polysemy and synonymy can be eliminated. For example, "PSA" and "prostate-specific antigen" appearing in medical records can be unified into the standard concept "prostate-specific antigen (LOINC code: 2857-1)," and "castration therapy" and "androgen deprivation therapy" can be merged into the same concept node. This merging process ensures that each clinical feature-related node in the hierarchical knowledge network has a unique and clear medical definition. It can be understood that the network structure after semantic disambiguation and concept merging is clearer and more standardized.
[0089] In practice, attaching a weight vector and a confidence interval to each uniquely identified associated node is a crucial operation for injecting quantitative medical knowledge into the network. The weight vector is a multi-dimensional vector reflecting the relative importance of a clinical feature in different prognostic assessment scenarios. The weight vector is initialized from the prior knowledge of medical experts or learned from historical data. For example, for the associated node "Gleason score," the weight vector might be initialized to [0.9, 0.7, 0.3], corresponding to its importance in three prediction scenarios: "biochemical recurrence," "metastatic progression," and "overall survival," respectively. The confidence interval defines the medically reasonable range of values for this clinical feature and is used for subsequent data validation.
[0090] In some embodiments, the network structure with added weight vectors and confidence intervals is solidified to form the final hierarchical knowledge network. The solidification process includes serializing node, relation, weight, and confidence information into a data structure or storing it in a graph database. As a static knowledge framework, the hierarchical knowledge network maintains stability in its nodes and relations during the model building phase, while the node weight vectors can be fine-tuned during subsequent model training. The solidified hierarchical knowledge network provides a unified semantic dictionary and constraint specifications for subsequent feature mapping. An optional weight vector initialization formula can be expressed as:
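[0091] $$\mathbf{w}_i \sim \mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$$;
[0092] where $\mathbf{w}_i$ denotes the weight vector of the $i$-th associated node, $\mathcal{N}(\cdot)$ denotes a multivariate normal distribution, $\boldsymbol{\mu}_i$ is a mean vector defined based on prior knowledge, and $\boldsymbol{\Sigma}_i$ is a diagonal covariance matrix used to control the uncertainty of the initial weights.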
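Drawing an initial weight vector from a multivariate normal with a diagonal covariance is equivalent to perturbing each component of the prior mean independently. A minimal NumPy sketch follows, assuming the diagonal of $\boldsymbol{\Sigma}_i$ is supplied as per-component standard deviations `sigma`; the function name is hypothetical.

```python
import numpy as np


def init_node_weights(mu, sigma, rng=None):
    """Draw w_i ~ N(mu_i, Sigma_i) with diagonal Sigma_i = diag(sigma**2)."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # A diagonal covariance means each component can be sampled independently.
    return mu + sigma * rng.standard_normal(mu.shape)
```

With the Gleason example, `init_node_weights([0.9, 0.7, 0.3], [0.05, 0.05, 0.05])` returns a vector near the expert prior; larger entries of `sigma` express less confidence in that prior.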
[0093] In one embodiment of the present invention, referring to Figure 3, the process of mining the temporal causal dependencies of treatment paths and generating treatment state transition topologies based on treatment event logs begins with parsing the patient's entire treatment process event logs recorded by the hospital information system. Each event log contains three core elements: the specific treatment action, the precise execution time, and the corresponding patient status identifier. Treatment actions originate from medical orders and operation records, such as "ordering a prostate-specific antigen test," "performing a radical prostatectomy," or "starting androgen deprivation therapy." The execution time records the date and time the action occurred, and the patient status identifier is a summary code of the patient's current overall condition, such as "newly diagnosed localized cancer," "postoperative monitoring period," or "castration-resistant prostate cancer." It can be understood that the parsing process requires cleaning, aligning, and standardizing massive amounts of heterogeneous raw log records to form standardized sequence data that can be processed by the algorithm.
[0094] In some embodiments, a causal discovery algorithm is employed to analyze the order and conditional probabilities of different diagnostic and treatment actions. This algorithm identifies statistically significant causal edges from a large number of patient treatment event sequences. The algorithm calculates the probability of treatment action B occurring within a subsequent time window given that treatment action A has occurred, and compares this probability with the marginal probability of treatment action B. Hypothesis testing is then used to determine whether the causal relationship from treatment action A to treatment action B is significant. For example, the analysis might reveal that the treatment action "positive bone scan" significantly increases the conditional probability of the subsequent treatment action "starting radium-223 treatment," thus identifying a causal edge from "positive bone scan" to "starting radium-223 treatment." Optionally, the causal discovery algorithm can be the PC algorithm, the FCI algorithm, or a transfer entropy-based method. The core objective is to uncover causal dependencies between diagnostic and treatment events that go beyond simple temporal correlation.
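The conditional-versus-marginal comparison described above can be sketched as a screening test. This is not the PC or FCI algorithm; it is a simplified one-sided two-proportion z-test, and the choice of baseline is an assumption: the rate at which B follows any other (non-B) event within the same window stands in for B's marginal probability. Names, thresholds, and the minimum-support rule are hypothetical.

```python
import math
from itertools import product


def causal_edges(sequences, window, z_crit=2.326, min_support=5):
    """Screen ordered action pairs (A, B) for a significant lift of P(B soon after A).

    sequences: one list of (action, time) events per patient.
    """
    sequences = [sorted(seq, key=lambda e: e[1]) for seq in sequences]
    actions = sorted({a for seq in sequences for a, _ in seq})
    edges = []
    for a, b in product(actions, repeat=2):
        if a == b:
            continue
        hit_a = n_a = hit_ref = n_ref = 0
        for seq in sequences:
            for i, (act, t0) in enumerate(seq):
                if act == b:
                    continue  # B itself is not an anchor event
                # Does B occur within (t0, t0 + window] after this event?
                hit = any(x == b and t0 < t <= t0 + window for x, t in seq[i + 1:])
                if act == a:
                    n_a, hit_a = n_a + 1, hit_a + hit
                else:
                    n_ref, hit_ref = n_ref + 1, hit_ref + hit
        if n_a < min_support or n_ref < min_support:
            continue
        p_cond, p_base = hit_a / n_a, hit_ref / n_ref
        pooled = (hit_a + hit_ref) / (n_a + n_ref)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_ref))
        # One-sided two-proportion z-test; z_crit = 2.326 corresponds to alpha = 0.01.
        if se > 0 and (p_cond - p_base) / se > z_crit:
            edges.append((a, b, p_cond, p_base))
    return edges
```

Each returned tuple carries both probabilities, so the strength of the lift can be inspected alongside the significance decision.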
[0095] In practical implementation, patient status identifiers are used as state nodes, and identified causal edges are used as directed connections between state nodes to construct a preliminary treatment state transition network. Each unique status identifier, such as "biochemical relapse" or "local clinical progression," constitutes a state node in the network. The causal edge from treatment action A to treatment action B, identified by the causal discovery algorithm, is transformed into a directed connection from the state node to which treatment action A belongs to the state node to which treatment action B belongs. For example, if causal discovery determines that "prostate-specific antigen levels >0.2 ng/mL twice consecutively" leads to "initiation of salvage radiotherapy," and these two actions belong to the "biochemical relapse monitoring period" and "salvage treatment period" states, respectively, then a directed edge from the "biochemical relapse monitoring period" node to the "salvage treatment period" node will be established in the preliminary treatment state transition network. This network initially depicts the possible state evolution paths of the patient.
[0096] In practical implementation, calculating the transition strength and uncertainty measure of each directed connection in the treatment state transition network is a key step in network refinement. Transition strength quantifies the frequency and certainty of state transitions. Transition strength can be calculated by statistically analyzing the proportion of transitions from the source state node to the target state node, and weighted by the average time interval between transitions. Uncertainty measure assesses the statistical confidence of the transition relationship, considering factors such as the p-value of a causal test, the confidence interval width, or the Bayesian posterior probability. For example, the transition strength from the state "active surveillance period" to the state "curative treatment period" may be high, while the uncertainty is low; conversely, the transition strength from the state "endocrine therapy period" to a state with a rare complication may be very low, with high uncertainty. A formula for calculating weighted transition strength is as follows:
[0097] $$S_{ij} = \frac{N_{ij}}{N_i} \cdot e^{-\lambda \bar{t}_{ij}}$$;
[0098] where $S_{ij}$ denotes the transition strength from state node $i$ to state node $j$, $N_{ij}$ is the number of observed transitions from state node $i$ to state node $j$, $N_i$ is the total number of transitions out of state node $i$, $\lambda$ is the time decay coefficient, and $\bar{t}_{ij}$ is the average time interval of transitions from state node $i$ to state node $j$.
[0099] In some embodiments, the preliminary treatment state transition network is pruned and simplified based on the transition strength and uncertainty measures. Pruning removes directed connections whose transition strength is below a threshold α or whose uncertainty measure is above a threshold β, treating these connections as statistical noise or atypical treatment pathways. Simplification targets state nodes that have highly similar clinical significance and indistinguishable transition patterns; for example, "neoadjuvant therapy phase" and "preoperative preparation phase" may be merged into a single "preoperative treatment management phase" node. Pruning and simplification yield a more streamlined, more typical, and clinically interpretable treatment state transition topology that summarizes the core states and key transition relationships in the treatment of prostate cancer patients.
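Pruning by the thresholds α and β, followed by node merging, can be sketched as follows. The threshold values, the example edges and the function name `prune_and_merge` are hypothetical. Two behaviours are assumptions not stated in the source: edges that collapse inside a merged node are dropped, and when merging produces duplicate edges, the stronger one is kept.

```python
def prune_and_merge(edges, alpha, beta, merge_map=None):
    """Prune weak or uncertain connections, then merge clinically equivalent state nodes.

    edges: dict {(source, target): (transition_strength, uncertainty)}.
    alpha: minimum transition strength for an edge to be kept.
    beta: maximum uncertainty for an edge to be kept.
    merge_map: optional dict {old_state: merged_state}.
    """
    merge_map = merge_map or {}
    kept = {}
    for (src, dst), (strength, uncertainty) in edges.items():
        if strength < alpha or uncertainty > beta:
            continue  # statistical noise or atypical treatment pathway
        src, dst = merge_map.get(src, src), merge_map.get(dst, dst)
        if src == dst:
            continue  # assumption: edges inside a merged node are dropped
        previous = kept.get((src, dst))
        if previous is None or strength > previous[0]:
            kept[(src, dst)] = (strength, uncertainty)  # assumption: keep the stronger duplicate
    return kept


edges = {
    ("active surveillance period", "curative treatment period"): (0.62, 0.05),
    ("endocrine therapy period", "rare complication"): (0.01, 0.40),
    ("neoadjuvant therapy phase", "preoperative preparation phase"): (0.50, 0.10),
    ("preoperative preparation phase", "radical prostatectomy"): (0.70, 0.08),
}
merge_map = {
    "neoadjuvant therapy phase": "preoperative treatment management phase",
    "preoperative preparation phase": "preoperative treatment management phase",
}
topology = prune_and_merge(edges, alpha=0.05, beta=0.2, merge_map=merge_map)
```

In this example the rare-complication edge is pruned, and the neoadjuvant-to-preparation edge disappears once both nodes are merged.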
[0100] In practice, when slicing the patient's entire disease course data based on the treatment state transition topology, each state node in the topology is used as a time anchor point to determine a core time window on the patient's entire disease course timeline. For a given state node, such as "recovery period after radical prostatectomy," determining the core time window requires tracing back the treatment event log to find all consecutive time periods marked as that state. The core time window may start from the patient's first entry into that state and continue until the state identifier changes. Around each determined core time window, a preset fixed time range is extended forward and backward, for example, 30 days forward and 60 days backward, forming an extended time window. By setting the extension range, the extended time windows corresponding to adjacent state nodes will inevitably have overlapping areas on the timeline, thereby capturing information near the state transition boundary.
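The construction of core and extended time windows can be sketched in Python. This is a minimal sketch that assumes the patient's status timeline is available as (start date, status identifier) pairs sorted by date. The source does not say which side of the core window receives the 30-day extension and which receives the 60-day extension, so the assignment below is an assumption; the function name and state labels are illustrative.

```python
from datetime import date, timedelta


def extended_windows(status_changes, followup_end, extend_before=30, extend_after=60):
    """Build core and extended time windows from a patient's status timeline.

    status_changes: list of (start_date, status_identifier), sorted by date.
    followup_end: date on which the last core window closes.
    extend_before / extend_after: days added before the core start and after the core end
        (which side gets 30 and which gets 60 days is an assumption).
    """
    # Merge consecutive entries with the same status into one continuous run.
    runs = []
    for start, state in status_changes:
        if not runs or runs[-1][1] != state:
            runs.append((start, state))
    windows = []
    for i, (start, state) in enumerate(runs):
        # The core window lasts until the status identifier changes.
        end = runs[i + 1][0] if i + 1 < len(runs) else followup_end
        windows.append({
            "state": state,
            "core": (start, end),
            "extended": (start - timedelta(days=extend_before), end + timedelta(days=extend_after)),
        })
    return windows


timeline = [
    (date(2024, 1, 1), "preoperative treatment management"),
    (date(2024, 2, 1), "recovery after radical prostatectomy"),
    (date(2024, 3, 15), "recovery after radical prostatectomy"),
    (date(2024, 5, 1), "follow-up monitoring"),
]
windows = extended_windows(timeline, followup_end=date(2024, 12, 31))
```

Because every extension is positive, the extended window of one state always reaches into the extended window of the next, which is the overlap the paragraph relies on.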
[0101] In practice, a raw data slice is formed by extracting all clinical data within each extended time window from the patient's full-course disease data. The full-course data includes laboratory test results, imaging reports, pathology records, vital signs, and medication records. The extraction is performed according to timestamps; all clinical records falling within an extended time window, regardless of their data type, are grouped into the same raw data slice. For example, for an extended time window centered on "early stage of endocrine therapy," its raw data slice might include multiple prostate-specific antigen (PSA) test values, testosterone levels, medication records (such as bicalutamide and leuprorelin), and records of potential side effects during that window. Optionally, the raw data slices are organized in the form of a time-series table, with each record containing a timestamp, data category, and specific value.
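Timestamp-based extraction into raw data slices, organized as a time-series table of (timestamp, category, value) records, can be sketched as follows. The window bounds, record values and the function name `slice_records` are illustrative assumptions; bounds are treated as inclusive.

```python
from datetime import date


def slice_records(records, windows):
    """Group clinical records into raw data slices by timestamp.

    records: list of dicts with "timestamp" (date), "category" and "value" keys.
    windows: list of (state, ext_start, ext_end) extended windows (inclusive bounds).
    A record that falls inside several overlapping windows is copied into each slice.
    """
    slices = []
    for state, ext_start, ext_end in windows:
        rows = sorted(
            (r for r in records if ext_start <= r["timestamp"] <= ext_end),
            key=lambda r: r["timestamp"],
        )
        slices.append({"state": state, "rows": rows})
    return slices


records = [
    {"timestamp": date(2024, 6, 5), "category": "PSA (ng/mL)", "value": 1.2},
    {"timestamp": date(2024, 3, 10), "category": "PSA (ng/mL)", "value": 4.1},
    {"timestamp": date(2024, 3, 20), "category": "medication", "value": "bicalutamide"},
    {"timestamp": date(2024, 4, 15), "category": "testosterone (ng/dL)", "value": 25.0},
]
windows = [
    ("early stage of endocrine therapy", date(2024, 3, 1), date(2024, 4, 30)),
    ("endocrine maintenance", date(2024, 4, 1), date(2024, 8, 1)),
]
slices = slice_records(records, windows)
```

The testosterone record on 2024-04-15 lands in both slices because the two extended windows overlap.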
[0102] In practice, for each raw data slice, the relevant associated nodes and their attribute definitions are matched from the hierarchical knowledge network based on the slice's corresponding state node. Each clinical record in the raw data slice must be matched to a corresponding associated node in the hierarchical knowledge network; matching relies on a semantic mapping between the record's data category and the definitions of the associated nodes. For example, for a record in the raw data slice with "prostate-specific antigen = 8.5 ng/mL", the associated node named "prostate-specific antigen" is matched in the hierarchical knowledge network, and its attribute definition, including units, normal value range, and weight vector, is read. The matched associated nodes and their attribute definitions are then used to structurally reconstruct the raw data slice. Following the attribute definition template of the associated nodes, the structural reconstruction maps the data in the raw slice onto uniform fields, for example ensuring that all date and time formats are consistent, all units of measurement are uniform, and missing attribute fields are filled with marker values, ultimately forming standardized time-series data segments. Each time-series data segment is tied to a core state node in the diagnosis and treatment state transition topology.
[0103] In one embodiment of the present invention, using the hierarchical knowledge network to perform feature semantic mapping and encoding on each time-series data segment is the key step that turns raw clinical records into vector representations the model can process. The mapping begins with semantic similarity matching between each data record in the time-series data segment and the associated nodes in the hierarchical knowledge network. For a laboratory record containing the value "8.5" and the unit "ng/mL", the system calculates its semantic similarity to all associated nodes in the hierarchical knowledge network that describe laboratory test items. By comparing the test item name, code, and unit in the record, the record is matched to the target associated node named "prostate-specific antigen". The weight vector and confidence interval attached to this target associated node are then read; for example, the weight vector may be [0.85, 0.60, 0.20] and the confidence interval may be [0, 1000] ng/mL.
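Matching a laboratory record to an associated node by comparing name, code and unit can be sketched with Python's standard `difflib.SequenceMatcher`. The scoring rule (mean of string similarity and two exact-match indicators), the node dictionary and the testosterone attributes are hypothetical placeholders; only the prostate-specific antigen weight vector and interval come from the text.

```python
from difflib import SequenceMatcher

# Hypothetical laboratory-test nodes of the hierarchical knowledge network.
KNOWLEDGE_NODES = {
    "prostate-specific antigen": {
        "code": "PSA", "unit": "ng/mL",
        "weights": [0.85, 0.60, 0.20], "interval": (0.0, 1000.0),
    },
    "testosterone": {
        "code": "TESTO", "unit": "ng/dL",
        "weights": [0.40, 0.30, 0.10], "interval": (0.0, 2000.0),  # placeholder values
    },
}


def similarity(record, node_name, node):
    """Assumed score: mean of name similarity, exact code match and exact unit match."""
    name_score = SequenceMatcher(None, record["name"].lower(), node_name.lower()).ratio()
    code_score = 1.0 if record.get("code") == node["code"] else 0.0
    unit_score = 1.0 if record.get("unit") == node["unit"] else 0.0
    return (name_score + code_score + unit_score) / 3


def match_node(record):
    """Return the name of the best-matching associated node and its attributes."""
    best = max(KNOWLEDGE_NODES, key=lambda name: similarity(record, name, KNOWLEDGE_NODES[name]))
    return best, KNOWLEDGE_NODES[best]


record = {"name": "Prostate specific antigen", "code": "PSA", "value": 8.5, "unit": "ng/mL"}
name, attributes = match_node(record)
```

A production system would more likely use a terminology code system or learned embeddings for similarity; string matching is used here only to keep the sketch self-contained.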
[0104] The raw numerical values of the data records are transformed into the semantic space defined by the target associated nodes. The transformation follows the attribute definitions of the associated nodes and specifically includes numerical normalization, outlier truncation, and reasonableness verification based on the confidence interval. For the raw value "8.5 ng/mL", reasonableness verification is performed first against the confidence interval: since 8.5 falls within [0, 1000], the value is deemed reasonable. Normalization then linearly scales the raw value to the range [0, 1]; if the upper limit of the reference range for prostate-specific antigen is set to 20 ng/mL, the normalized value is 8.5 / 20 = 0.425. If the raw value exceeds the confidence interval, for example a recorded value of 1500 ng/mL, outlier truncation sets it to the interval's upper limit of 1000 ng/mL before normalization. The semantically transformed data records are arranged strictly in timestamp order, and each is concatenated with the weight vector of its target associated node to generate a high-dimensional feature vector. For example, a record with a normalized value of 0.425 is concatenated with its weight vector [0.85, 0.60, 0.20] to form the four-dimensional vector [0.425, 0.85, 0.60, 0.20]. All records within a time-series data segment are processed in this way to form a sequence of high-dimensional feature vectors. A time-series encoder can then compress and encode this sequence. The time-series encoder uses a gated recurrent unit (GRU) network, which processes the concatenated feature vectors one at a time and ultimately condenses the information of the entire sequence into the hidden state of the last time step.
This hidden state is used as a fixed-dimensional semantic encoding vector, which represents the comprehensive feature expression of the time series data segment.
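The truncation, normalization and concatenation steps of the worked example can be reproduced with the short sketch below. It follows the text literally: values are clipped to the confidence interval and then divided by the reference upper limit, so a truncated value such as 1000 ng/mL maps to 50.0 rather than into [0, 1]. The function and dictionary names are hypothetical. The GRU compression step is not shown; a framework layer such as PyTorch's `torch.nn.GRU` would typically consume the resulting sequence.

```python
def encode_record(value, interval, reference_upper, weights):
    """Semantic transformation of one record, then concatenation with node weights.

    interval: (low, high) confidence interval of the matched associated node.
    reference_upper: upper limit of the reference range used for linear scaling.
    weights: weight vector attached to the matched associated node.
    """
    low, high = interval
    # Reasonableness check: out-of-interval values are truncated to the interval bounds.
    clipped = min(max(value, low), high)
    # Linear scaling by the reference upper limit, as in the 8.5 / 20 example;
    # values above the reference limit therefore map above 1.
    normalized = clipped / reference_upper
    return [normalized] + list(weights)


def encode_segment(records, node):
    """Encode all (timestamp, value) records of one segment in timestamp order."""
    return [
        encode_record(value, node["interval"], node["reference_upper"], node["weights"])
        for _, value in sorted(records, key=lambda r: r[0])
    ]


psa_node = {"interval": (0.0, 1000.0), "reference_upper": 20.0, "weights": [0.85, 0.60, 0.20]}
print(encode_record(8.5, psa_node["interval"], psa_node["reference_upper"], psa_node["weights"]))
# [0.425, 0.85, 0.6, 0.2]
```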
[0105] In some embodiments, inputting the encoded time-series data segments into a deep prediction architecture incorporating a multi-scale temporal attention mechanism involves constructing a neural network with a specific structure and training it with a phased, progressive unfreezing strategy. The deep prediction architecture includes parallel long-range and short-range attention paths. The long-range attention path receives the sequence of semantic encoding vectors of multiple time-series data segments and uses a self-attention mechanism to calculate the correlations between different semantic encoding vectors, capturing long-term dependencies across segments, such as the association between "preoperative prostate-specific antigen (PSA) levels" and "risk of biochemical recurrence two years after surgery." The short-range attention path focuses on the internal structure of a single time-series data segment. Its input is the sequence of high-dimensional feature vectors from which the segment's semantic encoding vector is produced (that is, the sequence before encoder compression), and it captures local fine-grained patterns through a convolutional attention module, such as the slope and fluctuation of the decline in PSA levels during the "initial stage of endocrine therapy." The outputs of the long-range and short-range attention paths are fused in later layers of the network and jointly contribute to the final risk prediction.
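The self-attention computation of the long-range path can be illustrated with a dependency-free sketch of scaled dot-product attention across segment encodings. It simplifies the architecture by using the encoding vectors directly as queries, keys and values; a trained model would first apply learned projection matrices. Function names and example vectors are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable Softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def long_range_self_attention(encodings):
    """Scaled dot-product self-attention across segment encoding vectors.

    encodings: list of equal-length semantic encoding vectors, one per time-series segment.
    Simplification: the vectors serve directly as queries, keys and values.
    Returns (context vectors, attention weight matrix).
    """
    dim = len(encodings[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in encodings:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in encodings]
        row = softmax(scores)
        weights.append(row)
        contexts.append([sum(w * value[i] for w, value in zip(row, encodings)) for i in range(dim)])
    return contexts, weights


segments = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
contexts, attention = long_range_self_attention(segments)
```

Because every segment attends to every other segment regardless of distance, the first and third segments receive identical, higher weight from each other than from the dissimilar middle segment.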
[0106] In practice, the phased, progressive unfreezing strategy begins by initializing all parameters of the deep prediction architecture and freezing all weights except those of the final output layer. The parameters of the deep prediction architecture include the weight matrices and bias terms of the long-range and short-range attention paths, as well as the parameters of the subsequent fusion layers. After initialization, all of these parameters are set to non-trainable, except the weights of the fully connected output layer that outputs the risk score, which remain trainable. The semantic encoding vectors of all patients in the training dataset are organized chronologically and input into the deep prediction architecture, and the output layer, the only unfrozen part, is trained with an initial binary cross-entropy loss function until the loss converges on the validation set. This stage lets the output layer learn to make preliminary predictions from fixed feature representations. Deeper modules of the deep prediction architecture are then unfrozen gradually in a predetermined order, starting with the fusion modules closest to the output layer and working backwards towards the front attention modules. One complete module is unfrozen at a time; for example, the feature fusion layer is unfrozen first, then the last transformation layer of the long-range attention path, followed by the corresponding part of the short-range attention path. After each unfreezing step, the parameters of the newly unfrozen portion and of all previously unfrozen portions are retrained on all training data containing historical semantic encoding vectors, while the parameters of the still-frozen portions remain unchanged. Through this gradual approach, the network parameters are fine-tuned layer by layer.
Once all modules of the deep prediction architecture, including the core attention computation layers of both the long-range and short-range attention paths, are unfrozen and trained, a basic prediction model with initial training is obtained. The attention weights within this basic prediction model are capable of learning dependencies from the data. An optional formula for calculating local feature weights in the short-range attention path is:
[0107] $$\alpha_t = \frac{\exp\left(f(\mathbf{h}_t, \mathbf{q})\right)}{\sum_{k=1}^{T} \exp\left(f(\mathbf{h}_k, \mathbf{q})\right)}$$
[0108] where $\alpha_t$ is the attention weight of the high-dimensional feature vector corresponding to the $t$-th time point within a single time-series data segment; $\mathbf{h}_t$ is the feature vector at the $t$-th time point; $\mathbf{q}$ is a learnable global query vector; $f(\cdot,\cdot)$ is a similarity function, such as a dot product or a small neural network; and $T$ is the total number of time points within this time-series data segment.
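The local attention weights of the short-range path can be computed as a Softmax over query similarities, as sketched below. The sketch assumes the dot-product choice for the similarity function and fixes the query vector, which would be learnable in the model; the vectors reuse the four-dimensional PSA encoding from the earlier example plus one made-up vector.

```python
import math


def short_range_attention(feature_vectors, query):
    """Attention weights alpha_t = exp(f(h_t, q)) / sum_k exp(f(h_k, q)) with f = dot product.

    feature_vectors: concatenated high-dimensional vectors h_t of one segment, in time order.
    query: the global query vector q (learnable in the model; fixed here for illustration).
    """
    scores = [sum(h * q for h, q in zip(vector, query)) for vector in feature_vectors]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


vectors = [[0.425, 0.85, 0.60, 0.20], [0.30, 0.85, 0.60, 0.20]]
weights = short_range_attention(vectors, query=[1.0, 0.0, 0.0, 0.0])
```

Subtracting the maximum score before exponentiation does not change the result, because the common factor cancels in the ratio, but it prevents overflow for large scores.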
[0109] In one embodiment of the invention, a simulated annealing-optimized feature selection loop is run synchronously. This loop interacts with the training process of the deep prediction architecture to dynamically filter feature subsets. The process begins by initializing a global feature pool containing all potential clinical features. This pool is derived from the clinical features corresponding to all associated nodes in a hierarchical knowledge network, and may include dozens to hundreds of features such as "prostate-specific antigen baseline value," "Gleason score," "clinical T stage," "bone metastasis status," "endocrine therapy duration," and "prostate-specific antigen doubling time." After each training cycle of the basic prediction model, the feature selection loop is triggered, randomly sampling a feature subset from the global feature pool according to a uniform distribution. The sampling size can be set to a fixed proportion of the global feature pool size, for example, randomly selecting 30% of the features each time to form a feature subset. The data corresponding to the currently sampled feature subset is used to perform forward inference on the basic prediction model. This means that only the feature data belonging to this subset is input into the model, while other features are masked. Subsequently, the performance index of the model on the current feature subset is calculated. The performance index can be the area under the receiver operating characteristic curve (AUC) or the F1 score on the validation set.
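Uniform sampling of a fixed fraction of the global feature pool and masking of the unsampled features can be sketched with Python's `random` module. Replacing masked features with 0.0 is an assumption (the source only says they are masked), and the pool contents and function names are illustrative.

```python
import random


def sample_feature_subset(feature_pool, fraction=0.3, rng=random):
    """Uniformly sample a fixed fraction of the global feature pool without replacement."""
    size = max(1, round(fraction * len(feature_pool)))
    return set(rng.sample(feature_pool, size))


def mask_features(patient_features, subset, mask_value=0.0):
    """Keep features in the subset; replace all others with a mask value (assumed 0.0)."""
    return {name: (value if name in subset else mask_value)
            for name, value in patient_features.items()}


FEATURE_POOL = [
    "PSA baseline value", "Gleason score", "clinical T stage", "bone metastasis status",
    "endocrine therapy duration", "PSA doubling time", "age", "lymph node status",
    "BMI", "number of comorbidities",
]
rng = random.Random(42)
subset = sample_feature_subset(FEATURE_POOL, fraction=0.3, rng=rng)
patient = {name: 1.0 for name in FEATURE_POOL}
masked = mask_features(patient, subset)
```

Passing a seeded `random.Random` instance makes each annealing run reproducible, which helps when comparing accepted subsets across experiments.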
[0110] The simulated annealing algorithm is used to determine whether to accept the currently sampled feature subset as a new candidate subset, with performance metrics as the optimization objective. The simulated annealing algorithm maintains a current optimal feature subset and its corresponding performance score, and treats each randomly sampled feature subset as a "neighborhood solution." The algorithm dynamically adjusts the probability of accepting a suboptimal subset based on performance improvement and an annealing temperature that decays over time. If the performance of a newly sampled subset is better than the current best, it is always accepted; if the performance is worse, it is accepted with a probability determined by both the annealing temperature and the performance degradation. The annealing temperature is higher in the early stages of iteration, allowing for more solutions with decreased performance to avoid getting trapped in local optima. As iterations progress, the temperature gradually decreases, and the algorithm tends to accept only solutions with improved performance. The accepted candidate feature subsets from multiple iterations are recorded and merged. The records include the specific features contained in the subset and their corresponding performance metrics. By statistically analyzing the frequency of each feature in all accepted subsets and combining this with the average performance of the subset at the time of its occurrence, a dynamically evolving feature importance distribution is formed. Based on the distribution of feature importance, features with importance scores exceeding a preset threshold are selected from the global feature pool to form a dynamically updated feature subset. For example, a threshold is set to select the top 40% of features by importance score. This dynamically updated feature subset will guide the next stage of training of the base prediction model. 
In the next stage of training, the model mainly focuses on these selected features, but the feature selection loop will continue to run for further optimization. For a more detailed explanation, refer to Table 1, which shows a simplified subset of candidate features recorded during the simulation.
[0111] Table 1: Candidate Feature Subset Record Table
[0112]
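The aggregation of accepted subsets into a feature importance distribution described in [0110] can be sketched as follows. The source says only that frequency is combined with average performance; this sketch assumes the score is relative frequency multiplied by the mean performance of the subsets containing the feature. The accepted records are illustrative values, not taken from Table 1.

```python
from collections import defaultdict


def feature_importance(accepted):
    """Aggregate accepted (feature_subset, performance) records into importance scores.

    Assumed scoring: relative frequency of the feature across accepted subsets
    multiplied by the mean performance of the subsets in which it appeared.
    """
    counts = defaultdict(int)
    perf_sums = defaultdict(float)
    for subset, performance in accepted:
        for feature in subset:
            counts[feature] += 1
            perf_sums[feature] += performance
    total = len(accepted)
    return {f: (counts[f] / total) * (perf_sums[f] / counts[f]) for f in counts}


def select_top_fraction(importance, fraction=0.4):
    """Keep the top fraction of features ranked by importance score."""
    keep = max(1, round(fraction * len(importance)))
    return sorted(importance, key=importance.get, reverse=True)[:keep]


accepted = [  # illustrative (subset, AUC) records
    ({"Gleason score", "clinical T stage"}, 0.84),
    ({"Gleason score", "PSA baseline value"}, 0.82),
    ({"age", "BMI"}, 0.76),
    ({"Gleason score", "clinical T stage", "PSA baseline value"}, 0.86),
]
importance = feature_importance(accepted)
selected = select_top_fraction(importance, fraction=0.4)
```

With five distinct features, the 40% cut keeps two: "Gleason score" (0.75 × 0.84 = 0.63) and "clinical T stage" (0.5 × 0.85 = 0.425).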
[0113] In a specific implementation, the probability $P$ of accepting a suboptimal feature subset in the simulated annealing algorithm is calculated using the following formula:
[0114] $$P = \begin{cases} 1, & \Delta E \ge 0 \\ \exp\left(\dfrac{\Delta E}{T_k}\right), & \Delta E < 0 \end{cases}$$
[0115] where $\Delta E = E_{\text{new}} - E_{\text{best}}$ is the difference between the performance index $E_{\text{new}}$ of the newly sampled feature subset and the current best performance index $E_{\text{best}}$, and $\Delta E < 0$ indicates a performance degradation; $T_k$ is the current annealing temperature, which decays with the iteration round $k$ from the initial temperature $T_0$ according to $T_k = T_0 \gamma^k$; and $\gamma$ is the decay coefficient ($0 < \gamma < 1$). When the temperature $T_k$ is high, even a subset with a significant performance decrease (a relatively large $|\Delta E|$) can be accepted with non-negligible probability $P$; as the temperature decreases, the probability of accepting a subset with degraded performance falls rapidly.
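The acceptance rule and geometric temperature decay can be written directly in Python, as sketched below. The initial temperature, decay coefficient and example subsets are illustrative assumptions; since AUC differences between subsets are typically only a few hundredths, the initial temperature is chosen on the same scale so that early degradations are still accepted with meaningful probability.

```python
import math
import random


def annealing_temperature(t0, gamma, k):
    """Geometric decay T_k = T0 * gamma**k with 0 < gamma < 1."""
    return t0 * gamma ** k


def acceptance_probability(e_new, e_best, temperature):
    """Always accept improvements; accept degradations with probability exp(dE / T)."""
    delta = e_new - e_best
    if delta >= 0:
        return 1.0
    return math.exp(delta / temperature)


def anneal_step(current, candidate, temperature, rng=random):
    """current / candidate: (feature_subset, performance). Returns (new_current, accepted)."""
    p = acceptance_probability(candidate[1], current[1], temperature)
    if rng.random() < p:
        return candidate, True
    return current, False


# Illustrative run: the same AUC drop becomes much less likely to be accepted as T_k decays.
rng = random.Random(0)
current = ({"Gleason score", "clinical T stage"}, 0.84)
candidate = ({"Gleason score", "age"}, 0.83)
for k in (0, 50):
    t_k = annealing_temperature(t0=0.05, gamma=0.95, k=k)
    print(k, round(t_k, 4), round(acceptance_probability(candidate[1], current[1], t_k), 4))
new_current, accepted = anneal_step(current, candidate, annealing_temperature(0.05, 0.95, 0), rng)
```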
[0116] In practice, a feature influence propagation graph is constructed based on the intermediate results of the feature selection loop, and key feature clusters are identified. This process collects all accepted candidate feature subsets recorded in multiple iterations of the simulated annealing optimization feature selection loop. These subsets constitute the basic data for analyzing the relationships between features. The frequency of feature co-occurrence and conditional dependency in each candidate feature subset are analyzed. Feature co-occurrence frequency refers to the number of times two features appear simultaneously in the same accepted subset. Conditional dependency is evaluated by statistically analyzing the conditional probability that feature B also appears in the subset where feature A appears. For example, the analysis may find that "Gleason score" and "clinical T stage" have a high co-occurrence frequency and conditional probability. A weighted directed feature co-occurrence network is constructed with features as nodes and the co-occurrence strength and conditional dependency strength between features as edges. Co-occurrence strength can be represented by the standardized value of co-occurrence frequency, and conditional dependency strength can be represented by the conditional probability value. The direction of the edges is from the conditional feature to the dependent feature, such as from "Gleason score" to "prostate-specific antigen baseline value".
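Counting co-occurrences and conditional dependencies across accepted subsets, and turning them into weighted directed edges, can be sketched as follows. Standardizing the co-occurrence count by the largest pair count is an assumption, and the example subsets are made up; each edge points from the conditioning feature to the dependent feature and carries P(dependent | conditioning).

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(accepted_subsets):
    """Build weighted directed edges from accepted candidate feature subsets.

    Returns {(a, b): {"cooccurrence": ..., "conditional": ...}} where the edge points
    from the conditioning feature a to the dependent feature b,
    "conditional" = P(b in subset | a in subset), and "cooccurrence" is the pair count
    divided by the largest pair count (an assumed standardization).
    """
    feature_counts = Counter()
    pair_counts = Counter()
    for subset in accepted_subsets:
        feature_counts.update(subset)
        for a, b in combinations(sorted(subset), 2):
            pair_counts[(a, b)] += 1
    max_pair = max(pair_counts.values(), default=1)
    edges = {}
    for (a, b), n_ab in pair_counts.items():
        for src, dst in ((a, b), (b, a)):
            edges[(src, dst)] = {
                "cooccurrence": n_ab / max_pair,
                "conditional": n_ab / feature_counts[src],
            }
    return edges


subsets = [
    {"Gleason score", "clinical T stage", "PSA baseline value"},
    {"Gleason score", "clinical T stage"},
    {"Gleason score", "age"},
]
network = cooccurrence_network(subsets)
```

The two directions of an edge share the same co-occurrence strength but generally differ in conditional strength: here "clinical T stage" always appears with "Gleason score" (1.0), while the reverse holds only in two of three subsets.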
[0117] Running a community detection algorithm on a feature co-occurrence network groups tightly connected feature nodes into the same community. This algorithm can employ the Louvain algorithm or spectral clustering. Based on the weights of edges between nodes, the algorithm divides the network into several communities with tightly connected internal connections and sparse external connections, each forming a feature cluster. In some embodiments, the internal connection density and average contribution to model performance of each feature cluster are calculated. The internal connection density is the ratio of the sum of all edge weights within the cluster to the maximum possible number of edges. The average contribution to model performance can be approximated by calculating the average performance gain of features belonging to that cluster across all candidate subsets in which they appear. Feature clusters with high internal connection density and large average contribution are marked as key feature clusters. For example, a feature cluster containing "Gleason score," "clinical T stage," and "prostate-specific antigen baseline value" might be identified. These clusters have tight internal connections, and the joint occurrence of these features often corresponds to high model performance; this cluster is thus marked as a key feature cluster.
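Once communities have been found (for example with the `louvain_communities` function in NetworkX's community module), marking key feature clusters reduces to two scores per cluster, as sketched below. Internal density follows the definition above (sum of within-cluster edge weights divided by the maximum possible number of edges); the per-feature performance gains, edge weights and thresholds are illustrative assumptions.

```python
from itertools import combinations


def internal_density(cluster, weights):
    """Sum of within-cluster edge weights divided by the maximum possible number of edges."""
    members = sorted(cluster)
    n = len(members)
    if n < 2:
        return 0.0
    total = sum(weights.get(frozenset(pair), 0.0) for pair in combinations(members, 2))
    return total / (n * (n - 1) / 2)


def average_contribution(cluster, gains):
    """Mean performance gain recorded for the cluster's features across accepted subsets."""
    values = [g for feature in cluster for g in gains.get(feature, [])]
    return sum(values) / len(values) if values else 0.0


def key_feature_clusters(clusters, weights, gains, min_density, min_contribution):
    """Mark clusters that are both densely connected and strongly contributing."""
    return [c for c in clusters
            if internal_density(c, weights) >= min_density
            and average_contribution(c, gains) >= min_contribution]


clusters = [  # e.g. communities returned by a Louvain run
    {"Gleason score", "clinical T stage", "PSA baseline value"},
    {"BMI", "number of comorbidities"},
]
weights = {
    frozenset({"Gleason score", "clinical T stage"}): 0.90,
    frozenset({"Gleason score", "PSA baseline value"}): 0.85,
    frozenset({"clinical T stage", "PSA baseline value"}): 0.80,
    frozenset({"BMI", "number of comorbidities"}): 0.70,
}
gains = {  # illustrative AUC gains
    "Gleason score": [0.03, 0.02], "clinical T stage": [0.02], "PSA baseline value": [0.01],
    "BMI": [0.0], "number of comorbidities": [-0.01],
}
key = key_feature_clusters(clusters, weights, gains, min_density=0.8, min_contribution=0.01)
```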
[0118] Figure 4 shows how the performance metric (AUC) changes during the iterative simulated annealing feature selection process. The horizontal axis is the iteration number and the vertical axis is the AUC value. The "single-round AUC" (blue line) shows the immediate performance of the feature subset evaluated in each round, while the "3-round rolling mean" (orange line) smooths out single-round fluctuations to reveal the trend. The orange shaded area represents the fluctuation range of the single-round AUC. In the early iterations (around round 150), the single-round AUC fluctuated markedly (e.g., dropping to about 0.77), while the 3-round rolling mean rose gradually. As the iterations progressed, the fluctuation range of the single-round AUC first narrowed and then widened, and the single-round AUC climbed to 0.86 around round 167.5, with the 3-round rolling mean rising in step. Throughout this stage the annealing temperature was decaying: the initially high temperature tolerated performance fluctuations so that the feature space could be explored, and as the temperature fell the algorithm increasingly favored feature subsets with improved performance, ultimately driving the AUC to converge at a higher level.
[0119] In one embodiment of the present invention, the attention weights of the deep prediction architecture are guided and corrected using key feature clusters. The original attention weight distribution generated by the multi-scale temporal attention mechanism is extracted from the trained base prediction model. This original attention weight distribution may include attention scores between different semantic encoding vectors in the long-range attention path and attention scores between features at different time points within a temporal data segment in the short-range attention path. The features contained in the key feature clusters are mapped to the feature dimensions corresponding to the multi-scale temporal attention mechanism. Key feature clusters are sets of features identified by the feature influence propagation graph that are internally connected and contribute significantly to model performance. For example, a key feature cluster may include "Gleason score," "clinical T stage," and "prostate-specific antigen baseline value." In the hierarchical knowledge network, these features each correspond to associated nodes, and are then mapped to specific dimensional indices of the input feature vector. The original attention weight distribution is enhanced on the feature dimensions corresponding to key feature clusters, while the weight values on the feature dimensions corresponding to non-key feature clusters are proportionally weakened. For long-range attention, when calculating the attention between two semantic encoding vectors, if these vectors show a significant correlation on the key feature cluster dimension, their attention score is increased. For short-range attention, within a certain time series, if a record at a certain time point belongs to a feature of a key feature cluster, the attention weight at that time point is increased. The adjusted attention weight distribution is then normalized to ensure that the sum of all weights is one. 
This is typically done by recalculating the probability distribution of the adjusted original scores using the Softmax function. The normalized attention weight distribution is then reinjected into the base prediction model, replacing the original attention weight parameters, resulting in a guided correction deep prediction architecture. This guided correction deep prediction architecture will focus more on the clinical information indicated by key feature clusters during inference. One method for adjusting short-range attention weights can be described as follows:
[0120] $$\tilde{e}_t = e_t + \eta \cdot \mathbb{1}\left(x_t \in \mathcal{C}_{\text{key}}\right)$$
[0121] where $\tilde{e}_t$ is the adjusted raw attention score for the $t$-th time point; $e_t$ is the raw attention score calculated by the base prediction model; $\eta$ is a positive guiding strength coefficient; and $\mathbb{1}(\cdot)$ is an indicator function that equals 1 when the feature $x_t$ processed at the $t$-th time point belongs to the key feature cluster $\mathcal{C}_{\text{key}}$, and 0 otherwise. After this adjustment, all $\tilde{e}_t$ are passed through the Softmax function to obtain the final normalized weights. In some embodiments, the guiding strength coefficient $\eta$ can be set dynamically based on the average contribution of the key feature clusters.
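The short-range guidance step can be sketched as an additive boost of the raw scores followed by Softmax renormalization. The additive form, the value of the guiding strength coefficient and the example features are assumptions made for illustration.

```python
import math


def guided_attention(raw_scores, time_point_features, key_cluster, eta):
    """Boost raw scores of key-cluster time points by eta, then renormalize with Softmax.

    raw_scores: raw attention scores e_t from the base prediction model.
    time_point_features: feature name processed at each time point t.
    key_cluster: set of feature names in the key feature cluster.
    eta: positive guiding strength coefficient.
    """
    adjusted = [e + eta * (1.0 if feature in key_cluster else 0.0)
                for e, feature in zip(raw_scores, time_point_features)]
    peak = max(adjusted)
    exps = [math.exp(a - peak) for a in adjusted]
    total = sum(exps)
    return [x / total for x in exps]


key_cluster = {"Gleason score", "clinical T stage", "PSA baseline value"}
weights = guided_attention(
    raw_scores=[0.2, 0.2, 0.2],
    time_point_features=["PSA baseline value", "BMI", "Gleason score"],
    key_cluster=key_cluster,
    eta=1.0,
)
```

Starting from equal raw scores, the non-key time point ("BMI") drops from 1/3 to 1/(2e + 1) ≈ 0.155 of the attention mass. This shows how boosting key dimensions and renormalizing proportionally weakens the others without an explicit down-weighting step.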
[0122] In practical implementation, an online adaptive fine-tuning loop for the prognostic risk prediction model is established. The assembled prognostic risk prediction model is deployed to the online prediction service environment, which receives new patient data streams pushed in real time from the hospital information system. Upon receiving the new input patient data streams in real time, the same hierarchical knowledge network mapping, temporal slicing, and semantic encoding processes are applied to the new input patient data streams according to the construction method, generating online semantic encoding vectors. This means that the data of new patients also needs to be sliced based on the topology of diagnosis and treatment state transitions and encoded using hierarchical knowledge networks. The online semantic encoding vectors are input into the prognostic risk prediction model to obtain the prognostic risk prediction results. Simultaneously, the uncertainty estimate of the prognostic risk prediction results is calculated. The uncertainty estimate can be obtained by observing the probability entropy output by the model or by estimating the prediction variance using a Bayesian neural network method. When the uncertainty estimate of the prediction results exceeds a preset confidence threshold, a fine-tuning mechanism is triggered. For example, if the prediction probability entropy is greater than 0.7 or the prediction variance is greater than 0.05, the model is considered to have low confidence in predicting that sample, and adaptive learning is required.
[0123] The fine-tuning mechanism constructs mini-batch fine-tuning data from the newly input patient data streams and their corresponding true prognostic labels. These labels may come from biochemical recurrence, radiological progression, or survival status confirmed during subsequent follow-up. The mini-batch fine-tuning data is used to update some internal parameters of the prognostic risk prediction model by gradient descent with a limited number of steps, achieving incremental adjustment. In practice, typically only the last few layers of the model are unfrozen and updated, to avoid catastrophic forgetting, and the number of gradient update steps is strictly controlled. The parameters adjusted and the data batch used in each fine-tuning operation are recorded for model version tracking and rollback: each fine-tuning run generates a log entry with a timestamp and batch identifier, recording the names of the updated parameters, the change in their norms before and after the update, and the anonymized patient batch ID used. This allows the model to be rolled back to an earlier parameter snapshot if its performance is found to have degraded because of certain data batches. In essence, the online adaptive fine-tuning loop enables the model to continuously adapt to gradual shifts in patient population characteristics or to new treatment modalities in clinical practice. An optional uncertainty metric for triggering fine-tuning is the entropy of the predicted probability distribution:
[0124] $$H(x) = -\sum_{c} p(c \mid x) \log_2 p(c \mid x)$$
[0125] where $H(x)$ is the entropy of the model's predicted output for a single sample $x$, and $p(c \mid x)$ is the probability predicted by the model that the sample belongs to category $c$; in binary risk prediction, for example, $c$ takes two values, "low risk" and "high risk". The larger the entropy, the higher the uncertainty of the model's prediction. With a base-2 logarithm, the binary entropy lies in [0, 1], which is consistent with the example entropy threshold of 0.7 given in [0122].
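The fine-tuning trigger from [0122] can be sketched as an entropy check with an optional variance check. Using a base-2 logarithm is an assumption motivated by the 0.7 threshold: with the natural logarithm, binary entropy never exceeds ln 2 ≈ 0.693, so that threshold could never be reached. The function names are illustrative.

```python
import math


def prediction_entropy(probabilities):
    """Shannon entropy (bits) of a predicted class distribution; 0 * log 0 is treated as 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def needs_finetuning(probabilities, variance=None,
                     entropy_threshold=0.7, variance_threshold=0.05):
    """Trigger fine-tuning when entropy or (optional) predictive variance is too high."""
    if prediction_entropy(probabilities) > entropy_threshold:
        return True
    return variance is not None and variance > variance_threshold


for probs, var in (([0.5, 0.5], None), ([0.8, 0.2], None), ([0.95, 0.05], None), ([0.95, 0.05], 0.08)):
    print(probs, var, round(prediction_entropy(probs), 4), needs_finetuning(probs, var))
```

A confident prediction such as [0.95, 0.05] (about 0.29 bits) passes the entropy check and only triggers fine-tuning if a Bayesian variance estimate above 0.05 is also supplied.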
[0126] Figure 5 is a heatmap, produced in the feature influence propagation analysis phase of the prostate cancer prognostic risk prediction model, that presents the co-occurrence network and community partitioning results for prognosis-related clinical features. In practice, the co-occurrence strength between features is quantified in matrix form (each matrix element is the strength of the co-occurrence association between two features, ranging from 0 to 1). The feature nodes are clustered with a community detection algorithm into four key feature clusters: key feature cluster 1 (red box) includes Gleason score, clinical T stage, and PSA baseline value, with pairwise co-occurrence strengths of ≥0.80, indicating that the features within this cluster are strongly associated in prognostic assessment; feature cluster 2 (blue box) includes age and lymph node status, with co-occurrence strengths of 1.00 (age-age) and 0.75 (age-lymph node status); feature cluster 3 (green box) includes treatment plan, bone scan results, and history of biochemical recurrence, with co-occurrence strengths of ≥0.70; feature cluster 4 (yellow box) includes BMI and number of comorbidities, with co-occurrence strengths of 1.00 (BMI-BMI) and 0.70 (BMI-number of comorbidities). The heatmap conveys the degree of correlation between features through a color gradient (light to dark corresponding to co-occurrence strength from 0 to 1). Combined with the community partitioning results, it clarifies the synergistic effects of the different feature clusters in the prognostic model and provides the key feature dimensions used for the attention-weight guided correction of the subsequent deep prediction architecture. In the parameter configuration, the co-occurrence strength threshold is set to 0.70, and the internal connection density threshold for community partitioning is 0.85.
[0127] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.
[0128] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A method for constructing a prognostic risk prediction model for prostate cancer patients, characterized in that the method comprises the following steps: Create a hierarchical knowledge network of prognostic features; Based on the diagnosis and treatment event logs, the temporal causal dependencies of the diagnosis and treatment path are mined to generate a diagnosis and treatment state transition topology; Based on the aforementioned diagnosis and treatment state transition topology, the patient's entire disease course data is sliced to form multiple continuous time-series data segments with overlapping information; The hierarchical knowledge network is used to perform feature semantic mapping and encoding on each time-series data segment; The encoded temporal data segments are input into a deep prediction architecture that includes a multi-scale temporal attention mechanism; A simulated annealing-optimized feature selection loop is run synchronously. This feature selection loop interacts with the training process of the deep prediction architecture to dynamically filter feature subsets, including: Initialize a global feature pool containing all potential clinical features; After each training cycle of the basic prediction model, a subset of features is randomly sampled from the global feature pool; Using the data corresponding to the currently sampled feature subset, perform forward inference on the basic prediction model and calculate the model's performance index on the current feature subset. The simulated annealing algorithm is used to determine whether to accept the currently sampled feature subset as a new candidate subset, with the performance index as the optimization target, and the probability of accepting the suboptimal subset is dynamically adjusted according to the annealing temperature.
The subset of candidate features accepted in multiple iterations is recorded and fused to form a dynamically evolving feature importance distribution; Based on the feature importance distribution, features with importance scores exceeding a threshold are selected from the global feature pool to form a dynamically updated feature subset, which will be used to guide the model training in the next stage. Based on the intermediate results of the feature selection loop, a feature influence propagation graph is constructed to identify key feature clusters, including: Collect the subset of accepted candidate features recorded in multiple iterations of the simulated annealing optimization feature selection loop; Analyze the frequency of feature co-occurrence and conditional dependencies in each candidate feature subset; A weighted directed feature co-occurrence network is constructed using features as nodes and the co-occurrence strength and conditional dependency strength between features as edges. A community detection algorithm is run on the feature co-occurrence network to group closely connected feature nodes into the same community, and each community forms a feature cluster. 
Calculate the internal connectivity density and average contribution to model performance for each feature cluster, and mark the feature clusters with high internal connectivity density and large average contribution as key feature clusters; Guided correction of the attention weights of the deep prediction architecture using the key feature clusters includes: Extract the original attention weight distribution generated by the multi-scale temporal attention mechanism from the trained base prediction model; Map the features contained in the key feature clusters to the feature dimensions corresponding to the multi-scale temporal attention mechanism; The original attention weight distribution is enhanced on the feature dimensions corresponding to the key feature clusters, while the weight values on the feature dimensions corresponding to non-key feature clusters are weakened proportionally. The adjusted attention weight distribution is normalized to ensure that the sum of all weights is one. The normalized attention weight distribution is re-injected into the basic prediction model to replace the original attention weights, thereby obtaining a deep prediction architecture with guided correction. The guided correction deep prediction architecture, the final output of the feature selection loop, and the diagnosis and treatment state transition topology are integrated to assemble a complete prognostic risk prediction model. An online adaptive fine-tuning loop is established for the prognostic risk prediction model, and the internal parameters of the model are incrementally adjusted based on the newly input patient data stream.
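The feature selection loop and the attention correction in claim 1 can be illustrated with a minimal, framework-free sketch. It assumes a hypothetical `evaluate` callback standing in for forward inference of the base prediction model on a feature subset, fixed-size random subsets, a geometric cooling schedule, and an importance score defined as the fraction of iterations in which a feature belonged to the accepted subset. The `boost` and `damp` factors in the hypothetical `correct_attention` helper are likewise illustrative; the claim does not specify any of these values.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Sequence, Set, Tuple


def sa_feature_selection(
    pool: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],  # hypothetical: base-model performance on a subset
    n_iters: int = 1000,     # one iteration per training cycle (assumption)
    subset_size: int = 3,    # fixed subset size (assumption)
    t0: float = 1.0,
    cooling: float = 0.97,   # geometric cooling schedule (assumption)
    threshold: float = 0.5,  # importance threshold for the updated subset
    seed: int = 0,
) -> Tuple[List[str], Dict[str, float]]:
    rng = random.Random(seed)
    current = frozenset(rng.sample(list(pool), subset_size))
    current_score = evaluate(current)
    temperature = t0
    history: List[FrozenSet[str]] = []
    for _ in range(n_iters):
        candidate = frozenset(rng.sample(list(pool), subset_size))
        score = evaluate(candidate)
        delta = score - current_score  # a higher performance index is better
        # Metropolis rule: always accept improvements; accept a worse subset with
        # probability exp(delta / T), which shrinks as the temperature cools.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, score
        history.append(current)  # record the accepted candidate subset
        temperature = max(temperature * cooling, 1e-6)
    # Fuse the recorded subsets: importance = fraction of iterations containing the feature.
    counts = Counter(f for subset in history for f in subset)
    importance = {f: counts[f] / len(history) for f in pool}
    selected = [f for f in pool if importance[f] > threshold]
    return selected, importance


def correct_attention(
    weights: Sequence[float], key_dims: Set[int], boost: float = 1.5, damp: float = 0.8
) -> List[float]:
    """Boost key-cluster dimensions, damp the rest by a common factor, renormalize to sum to 1."""
    adjusted = [w * (boost if i in key_dims else damp) for i, w in enumerate(weights)]
    total = sum(adjusted)
    return [w / total for w in adjusted]
```

In a real pipeline, `evaluate` would run the partially trained network on the sampled feature columns, and `key_dims` would be obtained by mapping the key feature clusters onto attention dimensions. Multiplying all non-key weights by one common factor is one way to realize "weakened proportionally".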
2. The method for constructing a prognostic risk prediction model for prostate cancer patients according to claim 1, characterized in that creating the hierarchical knowledge network of prognostic features comprises: providing the hierarchical knowledge network with multiple associated nodes, each of which corresponds to one clinical feature and its attribute definition; extracting entities and attributes from structured electronic medical records and unstructured clinical texts, the entities including disease names, examination items, treatment plans, and biomarkers; defining the medical logical relationships between entities, including causal, accompaniment, temporal, and mutual exclusion relationships; connecting the associated nodes according to these medical logical relationships to form a directed acyclic graph that serves as the initial knowledge network; introducing a medical ontology library to perform semantic disambiguation and concept merging on the nodes of the initial knowledge network, so that each associated node is unique; assigning each associated node a weight vector and a confidence interval, the weight vector reflecting the importance of the clinical feature in different prognostic assessment scenarios and the confidence interval reflecting the medically reasonable range of values for the clinical feature; and fixing the network structure, together with its weight vectors and confidence intervals, as the hierarchical knowledge network.
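A minimal data-structure sketch of the knowledge network in claim 2 follows. It assumes a hypothetical `ontology` dictionary mapping lower-cased synonyms to canonical concept names (a stand-in for a real medical ontology library), stores each node's weight vector as a scenario-to-importance dictionary, and rejects any relation that would close a cycle so the graph stays a directed acyclic graph. Entity extraction from records and clinical texts is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

RELATIONS = {"causal", "accompaniment", "temporal", "exclusion"}


@dataclass
class KnowledgeNode:
    name: str
    kind: str  # e.g. "disease", "examination", "treatment", "biomarker"
    weights: Dict[str, float] = field(default_factory=dict)  # scenario -> importance
    interval: Tuple[float, float] = (float("-inf"), float("inf"))  # plausible value range


class KnowledgeNetwork:
    def __init__(self, ontology: Dict[str, str]) -> None:
        self.ontology = ontology  # hypothetical synonym -> canonical concept map
        self.nodes: Dict[str, KnowledgeNode] = {}
        self.edges: Dict[str, List[Tuple[str, str]]] = {}

    def canonical(self, name: str) -> str:
        key = name.strip().lower()
        return self.ontology.get(key, key)

    def add_node(self, node: KnowledgeNode) -> KnowledgeNode:
        key = self.canonical(node.name)
        if key not in self.nodes:  # concept merging: synonyms resolve to the first node seen
            node.name = key
            self.nodes[key] = node
            self.edges[key] = []
        return self.nodes[key]

    def add_relation(self, src: str, dst: str, relation: str) -> None:
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        s, d = self.canonical(src), self.canonical(dst)
        if s not in self.nodes or d not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        if self._reachable(d, s):  # d already reaches s, so s -> d would close a cycle
            raise ValueError("relation would create a cycle")
        self.edges[s].append((d, relation))

    def _reachable(self, start: str, target: str) -> bool:
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(nxt for nxt, _ in self.edges[node])
        return False
```

Treating mutual exclusion as a directed edge is a simplification; a fuller model might store symmetric relations separately from the acyclic backbone.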
3. The method for constructing a prognostic risk prediction model for prostate cancer patients according to claim 2, characterized in that mining the temporal causal dependencies of diagnosis and treatment paths from the diagnosis and treatment event logs to generate the diagnosis and treatment state transition topology comprises: parsing the event logs of the patient's entire diagnosis and treatment process recorded in the hospital information system, each event record containing a diagnosis or treatment action, its execution time, and a patient status identifier; applying a causal discovery algorithm to the sequence and conditional probabilities of the different diagnosis and treatment actions to identify statistically significant causal edges; constructing a preliminary diagnosis and treatment state transition network by taking patient status identifiers as state nodes and the identified causal edges as directed connections between state nodes; calculating the transition strength and an uncertainty measure for each directed connection in the network; and, based on the transition strength and uncertainty measure, simplifying the preliminary network by pruning redundant connections and merging similar state nodes, thereby generating a refined diagnosis and treatment state transition topology.
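The strength, uncertainty, and pruning steps in claim 3 can be sketched as below. This is a deliberate simplification: first-order counting of consecutive state changes substitutes for the causal discovery algorithm, transition strength is taken as the empirical conditional probability P(next state | current state), the binomial standard error serves as the uncertainty measure, and all thresholds are arbitrary. Merging of similar state nodes is omitted.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, float, str]  # (diagnosis/treatment action, execution time, patient state)


def build_transition_topology(
    logs: Dict[str, List[Event]],  # patient id -> event log
    min_count: int = 2,
    min_strength: float = 0.2,
    max_stderr: float = 0.3,
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for events in logs.values():
        # Order each patient's events by execution time, then read off the state sequence.
        states = [state for _, _, state in sorted(events, key=lambda e: e[1])]
        for prev, nxt in zip(states, states[1:]):
            if prev != nxt:
                counts[prev][nxt] += 1
    topology: Dict[Tuple[str, str], Tuple[float, float]] = {}
    for src, targets in counts.items():
        total = sum(targets.values())
        for dst, n in targets.items():
            strength = n / total  # P(dst | src)
            stderr = math.sqrt(strength * (1 - strength) / total)  # uncertainty measure
            # Prune rare, weak, or highly uncertain connections.
            if n >= min_count and strength >= min_strength and stderr <= max_stderr:
                topology[(src, dst)] = (strength, stderr)
    return topology
```

Each surviving edge carries its (strength, uncertainty) pair, which downstream steps could use as anchor metadata.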
4. The method for constructing a prognostic risk prediction model for prostate cancer patients according to claim 3, characterized in that slicing the patient's full disease course data, based on the diagnosis and treatment state transition topology, into multiple consecutive time-series data segments with overlapping information comprises: using each state node in the diagnosis and treatment state transition topology as a time anchor and determining a core time window on the patient's full disease course timeline; extending each core time window forward and backward by a preset time range to form an extended time window, such that adjacent extended time windows overlap in time; extracting all clinical data within each extended time window from the patient's full disease course data to form a raw data slice; for each raw data slice, matching the relevant associated nodes and their attribute definitions from the hierarchical knowledge network according to the slice's corresponding state node; and restructuring each raw data slice using the matched associated nodes and their attribute definitions, filling in missing attribute fields and unifying the data format, to form standardized time-series data segments.
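The windowing in claim 4 can be sketched as follows, assuming records are (time, field, value) tuples, symmetric core windows around each anchor time, and a hypothetical `required_fields` list standing in for the attribute fields of the matched knowledge nodes. Filling missing fields with `None` and keeping only the latest value per field are illustrative choices, not rules stated in the claim.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Tuple[float, str, float]  # (time, field name, value)


def slice_course(
    records: Sequence[Record],
    anchors: Sequence[float],        # times of state nodes from the transition topology
    core_half_width: float,          # half-width of each core window (assumed symmetric)
    extension: float,                # preset forward/backward extension
    required_fields: Sequence[str],  # attribute fields of the matched knowledge nodes
) -> List[Dict[str, object]]:
    segments: List[Dict[str, object]] = []
    for anchor in sorted(anchors):
        start = anchor - core_half_width - extension
        end = anchor + core_half_width + extension
        if segments and start >= segments[-1]["end"]:
            raise ValueError("adjacent extended windows must overlap; increase `extension`")
        window = sorted((r for r in records if start <= r[0] <= end), key=lambda r: r[0])
        latest: Dict[str, float] = {}
        for _, name, value in window:
            latest[name] = value  # keep the most recent value of each field
        # Unify the format: every required field is present, missing ones are None.
        values: Dict[str, Optional[float]] = {f: latest.get(f) for f in required_fields}
        segments.append({"anchor": anchor, "start": start, "end": end, "values": values})
    return segments
```

Raising an error when windows fail to overlap enforces the claimed overlap property explicitly instead of silently producing disjoint segments.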
5. The method for constructing a prognostic risk prediction model for prostate cancer patients according to claim 4, characterized in that performing feature semantic mapping and encoding on each time-series data segment using the hierarchical knowledge network comprises: matching each data record in the time-series data segment by semantic similarity against the associated nodes in the hierarchical knowledge network to find the corresponding target associated node; reading the weight vector and confidence interval of the target associated node; transforming the original numerical value of the data record into the semantic space defined by the target associated node, the transformation including numerical normalization, outlier truncation, and a reasonableness check against the confidence interval; arranging the transformed data records in chronological order and concatenating them with the corresponding weight vectors to generate high-dimensional feature vectors; and compressing and encoding the high-dimensional feature vectors with a time-series encoder to obtain a fixed-dimensional semantic encoding vector that represents the features of the time-series data segment.
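The mapping steps in claim 5 can be sketched with the standard library alone. Assumptions, all hypothetical: string similarity from `difflib.SequenceMatcher` stands in for semantic similarity, each node's confidence interval is used as a min-max normalization range, and per-node mean pooling stands in for a learned time-series encoder, so temporal order inside a segment is lost in this toy version.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical node table: canonical name -> (scenario weight vector, (low, high) plausible range)
Nodes = Dict[str, Tuple[List[float], Tuple[float, float]]]


def match_node(raw_name: str, nodes: Nodes) -> str:
    """Return the node whose name is most similar to the raw record name."""
    return max(nodes, key=lambda n: SequenceMatcher(None, raw_name.lower(), n.lower()).ratio())


def to_semantic_space(value: float, low: float, high: float) -> Tuple[float, bool]:
    """Min-max normalize within the node's plausible range after truncating outliers."""
    plausible = low <= value <= high      # reasonableness check against the interval
    clipped = min(max(value, low), high)  # outlier truncation
    return (clipped - low) / (high - low), plausible


def encode_segment(records: List[Tuple[float, str, float]], nodes: Nodes) -> List[float]:
    """Fixed-size encoding: for each node (sorted by name), the mean normalized value
    of its records in this segment (0.0 if absent), followed by its weight vector."""
    per_node: Dict[str, List[float]] = {name: [] for name in nodes}
    for _, raw_name, value in sorted(records):  # chronological order
        node = match_node(raw_name, nodes)
        low, high = nodes[node][1]
        per_node[node].append(to_semantic_space(value, low, high)[0])
    encoding: List[float] = []
    for name in sorted(nodes):
        values = per_node[name]
        encoding.append(sum(values) / len(values) if values else 0.0)
        encoding.extend(nodes[name][0])
    return encoding
```

The output length depends only on the node table, not on how many records a segment holds, which is the fixed-dimension property the claim requires.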
6. The method for constructing a prognostic risk prediction model for prostate cancer patients according to claim 5, characterized in that inputting the encoded time-series data segments into the deep prediction architecture that includes a multi-scale temporal attention mechanism comprises training the deep prediction architecture with a phased, gradual unfreezing strategy, namely: constructing a neural network for the deep prediction architecture that includes parallel long-range and short-range attention paths, used respectively to capture dependencies across time-series data segments and local patterns within a single time-series data segment; initializing all parameters of the deep prediction architecture and freezing all weights except those of the final output layer; inputting the semantic encoding vectors into the deep prediction architecture in chronological order and training the unfrozen output layer with an initial loss function until the loss converges; unfreezing the deeper network modules of the deep prediction architecture step by step in a preset order and, after each unfreezing, retraining the newly unfrozen modules together with the previously unfrozen ones on data that includes historical semantic encoding vectors, thereby updating the network parameters; and obtaining the base prediction model once all modules of the deep prediction architecture have been unfrozen and trained.
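The phased unfreezing in claim 6 reduces to a schedule over module groups. The sketch below is framework-agnostic: the module names are hypothetical, they are assumed to be listed from the output layer toward the input as the "preset order", and `train_stage` is a placeholder callback that would retrain the currently unfrozen modules and return a loss (in PyTorch, for example, by toggling `requires_grad` on each module's parameters).

```python
from typing import Callable, Iterator, List, Sequence, Set


def unfreeze_schedule(modules: Sequence[str], step: int = 1) -> Iterator[Set[str]]:
    """Yield the trainable module set for each stage (modules[0] is the output layer)."""
    trainable: List[str] = [modules[0]]
    yield set(trainable)  # stage 0: only the output layer is trainable
    for i in range(1, len(modules), step):
        trainable.extend(modules[i:i + step])  # unfreeze the next group of modules
        yield set(trainable)


def train_with_unfreezing(
    modules: Sequence[str],
    train_stage: Callable[[Set[str], Set[str]], float],  # hypothetical: (trainable, frozen) -> loss
    step: int = 1,
) -> List[float]:
    losses: List[float] = []
    for trainable in unfreeze_schedule(modules, step):
        frozen = set(modules) - trainable
        # train_stage is expected to retrain every currently unfrozen module,
        # e.g. on data that includes historical semantic encoding vectors.
        losses.append(train_stage(trainable, frozen))
    return losses
```

Because each stage's trainable set is a superset of the previous one, previously unfrozen modules keep training alongside the newly unfrozen ones, as the claim describes.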
7. The method for constructing a prognostic risk prediction model for prostate cancer patients according to claim 6, characterized in that establishing the online adaptive fine-tuning loop of the prognostic risk prediction model comprises: deploying the assembled prognostic risk prediction model to an online prediction service environment; receiving new patient data streams in real time and applying to them the same hierarchical knowledge network mapping, temporal slicing, and semantic encoding used in the construction method, to generate online semantic encoding vectors; inputting the online semantic encoding vectors into the prognostic risk prediction model to obtain a prognostic risk prediction result, while also calculating an uncertainty estimate for that result; triggering the fine-tuning mechanism when the uncertainty of the prediction result exceeds a preset confidence threshold; constructing, within the fine-tuning mechanism, small-batch fine-tuning data from the newly input patient data streams and their corresponding true prognostic labels; using the small-batch fine-tuning data to update part of the internal parameters of the prognostic risk prediction model with a limited number of gradient descent steps, thereby achieving incremental adjustment; and recording the parameters adjusted and the data batches used in each fine-tuning operation, for model version tracking and rollback.
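The trigger-and-update logic of claim 7 can be sketched with a tiny logistic model standing in for the adjustable part of the prognostic model. Every concrete choice here is an assumption: the Bernoulli variance p(1 - p) is a crude substitute for a proper uncertainty estimate (such as an ensemble or Monte Carlo dropout), labels are assumed to be available when a sample is observed, and the threshold, batch size, learning rate, and step count are arbitrary.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class OnlineRiskModel:
    weights: List[float]     # the adjustable subset of parameters (assumption)
    bias: float = 0.0
    threshold: float = 0.15  # uncertainty threshold that triggers fine-tuning
    batch_size: int = 4      # size of the small fine-tuning batch
    lr: float = 0.5
    max_steps: int = 5       # limited number of gradient descent steps
    buffer: List[Tuple[List[float], int]] = field(default_factory=list)
    versions: List[Dict[str, object]] = field(default_factory=list)

    def predict(self, x: List[float]) -> Tuple[float, float]:
        risk = sigmoid(sum(w * xi for w, xi in zip(self.weights, x)) + self.bias)
        return risk, risk * (1.0 - risk)  # Bernoulli variance as a crude uncertainty proxy

    def observe(self, x: List[float], label: int) -> float:
        risk, uncertainty = self.predict(x)
        if uncertainty > self.threshold:  # only uncertain cases feed the fine-tuning batch
            self.buffer.append((x, label))
            if len(self.buffer) >= self.batch_size:
                self.fine_tune()
        return risk

    def fine_tune(self) -> None:
        batch, self.buffer = self.buffer, []
        # Snapshot pre-update parameters and the batch for version tracking and rollback.
        self.versions.append({"weights": list(self.weights), "bias": self.bias, "batch": batch})
        n = len(batch)
        for _ in range(self.max_steps):
            grad_w, grad_b = [0.0] * len(self.weights), 0.0
            for x, y in batch:
                err = self.predict(x)[0] - y  # gradient of log loss w.r.t. the logit
                grad_w = [g + err * xi / n for g, xi in zip(grad_w, x)]
                grad_b += err / n
            self.weights = [w - self.lr * g for w, g in zip(self.weights, grad_w)]
            self.bias -= self.lr * grad_b

    def rollback(self) -> None:
        snapshot = self.versions.pop()
        self.weights, self.bias = list(snapshot["weights"]), snapshot["bias"]
```

Confident predictions leave the model untouched, so updates are incremental and driven only by the cases the model is unsure about; `rollback` restores the most recent snapshot.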