A survival risk prediction method and system fusing WSI images and clinical data

By employing feature compression and multi-stage pathological characterization methods, combined with WSI images and clinical data for feature encoding and cross-modal fusion, the problem of insufficient information utilization in existing technologies is solved, thereby improving the accuracy and stability of survival risk prediction.

CN122201596A · Pending · Publication Date: 2026-06-12 · SHANGHAI JIAOTONG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHANGHAI JIAOTONG UNIV
Filing Date
2026-04-24
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing survival risk prediction methods struggle to fully exploit the multi-source heterogeneous information in WSI images and clinical data, and they suffer from high computational complexity, excessive redundant information, difficulty in characterizing global dependencies, and limited prediction accuracy and generalization ability.

Method used

Using multilayer perceptrons and cross-attention mechanisms, the method encodes the WSI images and clinical data, compresses and models the pathological features to extract multi-stage pathological characterization features, modulates these features with the clinical features, and fuses the two modalities to generate the survival risk prediction result.

Benefits of technology

It improves the efficiency of utilizing global pathological information, enhances the ability to characterize key prognostic information, and improves the accuracy and stability of survival risk prediction.

✦ Generated by Eureka AI based on patent content.

Figure CN122201596A_ABST

Abstract

The present application relates to the field of artificial intelligence-assisted medical technology and discloses a survival risk prediction method and system fusing WSI images and clinical data, comprising: acquiring WSI images and clinical data of a subject to be tested, and performing feature encoding on the WSI images and the clinical data, respectively, to obtain pathological features and clinical features; performing feature compression on the pathological features to obtain a compressed pathological feature sequence representing the overall information of the WSI images; performing feature modeling and aggregation on the compressed pathological feature sequence to extract multi-stage pathological characterization features; and modulating the multi-stage pathological characterization features based on the clinical features, and performing cross-modal fusion on the modulated pathological features and the clinical features to output a survival risk prediction result for the subject. Through this efficient, cooperative fusion of WSI images and clinical data, the method improves the accuracy, stability and clinical application value of survival risk prediction.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence-assisted medical technology, and in particular to a survival risk prediction method and system that integrates WSI images and clinical data.

Background Technology

[0002] In recent years, artificial intelligence-based methods for medical prognostic analysis have received widespread attention. Survival risk prediction, as an important research direction in prognostic analysis, has significant application value in assisting clinicians in identifying high-risk patients and developing individualized treatment strategies.

[0003] Whole-slide images (WSI) contain rich information on tissue structure, cell morphology, and the tumor microenvironment, making them an important data source for survival analysis. However, because WSI images typically have ultra-high resolution, modeling directly on the original images often faces high computational complexity, excessive redundancy, and difficulty in characterizing global dependencies. Clinical data, on the other hand, supplement basic patient information, pathological staging, and treatment-related information, and are therefore strongly complementary to pathological image information.

[0004] Most existing survival risk prediction methods employ a single-modality modeling approach, making it difficult to fully utilize the joint representation capabilities of multi-source heterogeneous data. While some multimodal methods introduce fusion mechanisms between pathological images and clinical information, they still have shortcomings in pathological feature compression, hierarchical representation extraction, and feature modulation guided by clinical information. This results in limited ability to capture key prognostic information, thereby affecting prediction performance and generalization ability.

[0005] Therefore, it is necessary to propose a survival risk prediction scheme that integrates WSI images and clinical data to achieve effective compression of global pathological information, full extraction of multi-stage pathological characteristics, and efficient fusion of information from different modalities, thereby improving the accuracy of survival risk prediction.

Summary of the Invention

[0006] To address the problems of insufficient utilization of overall WSI image information, limited multimodal data fusion effects, and insufficient accuracy in survival risk prediction in existing technologies, this invention provides a survival risk prediction method, system, device, and storage medium that integrates WSI images and clinical data to achieve collaborative modeling of global pathological information and clinical information, thereby improving the accuracy and stability of survival risk prediction.

[0007] In one aspect, the present invention provides a survival risk prediction method that integrates WSI images and clinical data, comprising the following steps: S1: acquire WSI images and clinical data of the subject to be tested, and perform feature encoding on the WSI images and the clinical data to obtain pathological features and clinical features, respectively; S2: perform feature compression on the pathological features to obtain a compressed pathological feature sequence that represents the overall information of the WSI image; S3: perform feature modeling and aggregation on the compressed pathological feature sequence to extract multi-stage pathological characterization features; S4: modulate the multi-stage pathological characterization features based on the clinical features, and perform cross-modal fusion of the modulated pathological characterization features and the clinical features to output the survival risk prediction result of the subject.

[0008] Further, in step S1, feature encoding of the clinical data includes the following. For discrete categorical variables in the clinical data, an independent embedding layer is constructed for each categorical feature to map its discrete values into a corresponding semantic feature space, yielding the categorical feature embedding $\mathbf{e}_{\mathrm{cat}}$. For continuous numerical variables, a multilayer perceptron (MLP) models the nonlinear relationships among the continuous variables, yielding the continuous feature representation $\mathbf{h}_{\mathrm{num}}$. The categorical embedding $\mathbf{e}_{\mathrm{cat}}$ and the continuous representation $\mathbf{h}_{\mathrm{num}}$ are concatenated and further transformed by an MLP to output a unified clinical semantic vector $\mathbf{c}$, which serves as the prior representation of the clinical features for the subsequent modulation of pathological features and cross-modal fusion: $\mathbf{c} = \mathrm{MLP}\left(\left[\mathbf{e}_{\mathrm{cat}} \,\|\, \mathbf{h}_{\mathrm{num}}\right]\right),\ \mathbf{c} \in \mathbb{R}^{d}$, where $\mathrm{MLP}(\cdot)$ denotes the mapping function of a multilayer perceptron, $[\cdot \,\|\, \cdot]$ denotes concatenation, $\mathbb{R}^{d}$ denotes the feature space in which the clinical semantic vector $\mathbf{c}$ lies, and $d$ denotes the feature dimension of the clinical semantic vector.
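The clinical encoding in [0008] can be illustrated with the following minimal pure-Python sketch. It assumes one embedding table per categorical variable, two-layer ReLU MLPs for both the continuous branch and the fusion step, and randomly initialised (untrained) weights; the class name `ClinicalEncoder`, the layer sizes and the initialisation scheme are hypothetical and are not taken from the patent. A real system would use a deep-learning framework with trained parameters.

```python
import math
import random


def linear(x, weight, bias):
    """Affine map y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def init_layer(rng, in_dim, out_dim):
    """Uniform init in [-1/sqrt(in_dim), 1/sqrt(in_dim)] with zero bias (an assumption)."""
    bound = 1.0 / math.sqrt(in_dim)
    weight = [[rng.uniform(-bound, bound) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


class ClinicalEncoder:
    """Hypothetical encoder: categorical embeddings + continuous MLP + fusion MLP."""

    def __init__(self, cardinalities, n_continuous, emb_dim=4, hidden=8, d=16, seed=0):
        rng = random.Random(seed)
        # One independent embedding table per categorical variable.
        self.tables = [[[rng.gauss(0.0, 1.0) for _ in range(emb_dim)] for _ in range(card)]
                       for card in cardinalities]
        # Two-layer MLP over the continuous variables.
        self.num_layers = [init_layer(rng, n_continuous, hidden), init_layer(rng, hidden, hidden)]
        # Two-layer MLP over the concatenated [categorical || continuous] representation.
        fused_dim = emb_dim * len(cardinalities) + hidden
        self.fuse_layers = [init_layer(rng, fused_dim, hidden), init_layer(rng, hidden, d)]

    @staticmethod
    def _mlp(x, layers):
        (w1, b1), (w2, b2) = layers
        return linear(relu(linear(x, w1, b1)), w2, b2)

    def __call__(self, categorical, continuous):
        # Look up and concatenate the embedding of each categorical value.
        e_cat = [v for table, idx in zip(self.tables, categorical) for v in table[idx]]
        h_num = self._mlp(continuous, self.num_layers)
        # Concatenate both branches and map them to the clinical semantic vector.
        return self._mlp(e_cat + h_num, self.fuse_layers)


encoder = ClinicalEncoder(cardinalities=[2, 4], n_continuous=3)
c = encoder([1, 2], [0.5, -1.2, 0.3])
print(len(c))  # 16
```

Keeping a separate table per categorical variable mirrors the "independent embedding layer for each categorical feature" wording; the fixed seed only makes the illustration reproducible.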

[0009] Preferably, in step S1, feature encoding of the WSI image includes the following. The WSI image comprises a plurality of image patches, and each patch is encoded into a corresponding patch token by a pathology foundation model. The spatial coordinate information of each patch token is fused with its visual features, and the spatially aware feature of each patch token is output as a pathological feature: $\mathbf{z}_i = \mathbf{v}_i + \mathbf{p}_i$, where $\mathbf{z}_i$ is the spatially aware feature of the $i$-th patch token, i.e., the $i$-th pathological feature; $\mathbf{v}_i$ is the visual feature vector of the $i$-th patch; and $\mathbf{p}_i$ is the spatial position encoding vector of the $i$-th patch token.
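One common way to fuse patch coordinates with visual features, as described in [0009], is to add a fixed 2D sinusoidal position encoding to each patch embedding. The sketch below assumes additive fusion, a grid index derived from each patch's top-left pixel coordinate, and visual vectors already produced by some pathology foundation model (represented here by plain lists); the function names and the 256-pixel patch size are hypothetical.

```python
import math


def sincos_1d(pos, dim):
    """Transformer-style sinusoidal encoding of a scalar position into `dim` values (dim even)."""
    out = []
    for k in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * k / dim))
        out.extend([math.sin(pos * freq), math.cos(pos * freq)])
    return out


def sincos_2d(col, row, dim):
    """First half of the channels encodes the column index, second half the row index."""
    return sincos_1d(col, dim // 2) + sincos_1d(row, dim // 2)


def spatially_aware_tokens(visual_feats, coords, patch_size=256):
    """Add a position encoding of each patch's grid cell to its visual feature vector.

    visual_feats: list of equal-length vectors (dimension divisible by 4).
    coords: list of (x, y) top-left pixel coordinates of the patches.
    """
    tokens = []
    for v, (x, y) in zip(visual_feats, coords):
        p = sincos_2d(x // patch_size, y // patch_size, len(v))
        tokens.append([a + b for a, b in zip(v, p)])
    return tokens


tokens = spatially_aware_tokens([[0.0] * 8, [0.0] * 8], [(0, 0), (256, 512)])
print(tokens[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Encoding grid indices rather than raw pixel coordinates keeps the sinusoid frequencies meaningful regardless of slide size; a learned position embedding would be an equally plausible alternative.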

[0010] Further, in step S2, the feature compression of the pathological features includes the following. Multiple spatially or sequentially adjacent pathological features are grouped together, and the pathological features within each group are weighted and aggregated based on a learnable weight vector to obtain an aggregated local-region feature vector, i.e., a compressed pathological feature: $\mathbf{g}_k = \sum_{i \in \mathcal{G}_k} w_i \mathbf{z}_i$, where $\mathbf{g}_k$ denotes the aggregated feature of the $k$-th group after compression, $\mathcal{G}_k$ denotes the set of pathological feature indices in the $k$-th group, $w_i$ is the learnable weight coefficient of the $i$-th pathological feature, and $\mathbf{z}_i$ is the $i$-th pathological feature.
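A minimal sketch of the grouped, weighted compression in [0010] follows. The text states only that the weights come from a learnable weight vector; this sketch assumes one plausible reading, in which a learnable vector `a` scores each feature and the scores are softmax-normalised within each group, and it assumes groups are runs of `group_size` consecutive features in patch order. The names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def compress_groups(features, a, group_size):
    """Compress N feature vectors into ceil(N / group_size) vectors.

    Assumed weighting: w_i = softmax over the group of the scores (a . z_i).
    """
    compressed = []
    for start in range(0, len(features), group_size):
        group = features[start:start + group_size]
        weights = softmax([sum(ai * zi for ai, zi in zip(a, z)) for z in group])
        dim = len(group[0])
        # Weighted sum of the group's features, channel by channel.
        compressed.append([sum(w * z[d] for w, z in zip(weights, group)) for d in range(dim)])
    return compressed


feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [5.0, 5.0]]
print(compress_groups(feats, a=[0.0, 0.0], group_size=2))
# [[2.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
```

With a zero scoring vector every weight in a group is equal, so each compressed vector is simply the group mean; the sequence length drops from N to ceil(N / group_size), which is what reduces the cost of the later global modeling.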

[0011] Further, in step S3, feature modeling of the compressed pathological feature sequence includes the following. The compressed pathological feature sequence is input into a mixture-of-experts (MoE) gating module, in which a gating network computes relevance scores between the input features and multiple expert networks. The top-K experts with the highest relevance scores are selected to transform the compressed pathological features, their outputs are weighted and aggregated according to the corresponding mixture weights, and a residual connection yields the output sequence: $\mathbf{Y} = \mathbf{X} + \sum_{k \in \mathcal{T}_K} \alpha_k \, \mathrm{FFN}_k(\mathbf{X})$, where $\mathbf{X}$ denotes the input compressed pathological feature sequence, $\mathcal{T}_K$ denotes the indices of the top-K experts with the highest relevance scores, $\alpha_k$ is the mixture weight of the corresponding expert, $\mathrm{FFN}_k(\mathbf{X})$ denotes the output of the $k$-th expert feed-forward network on the input sequence, and $\mathbf{Y}$ denotes the output sequence after weighted aggregation and residual connection.
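The top-K routing with a residual connection in [0011] can be sketched per token as follows. This sketch assumes a linear gate (one scoring vector per expert), per-token routing, and mixture weights obtained by a softmax over the selected experts' scores; each expert is an arbitrary callable standing in for a feed-forward network. These choices and the function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(tokens, gate_vectors, experts, top_k=2):
    """Per-token top-K mixture of experts followed by a residual connection.

    gate_vectors[k] scores expert k via a dot product with the token.
    experts[k] maps a token (list of floats) to a list of the same length.
    """
    outputs = []
    for x in tokens:
        scores = [sum(g * xi for g, xi in zip(gv, x)) for gv in gate_vectors]
        # Indices of the top-K experts by relevance score (ties keep list order).
        top = sorted(range(len(experts)), key=lambda k: scores[k], reverse=True)[:top_k]
        mix = softmax([scores[k] for k in top])  # mixture weights over the selected experts
        y = list(x)  # residual path: start from the input token
        for weight, k in zip(mix, top):
            for d, value in enumerate(experts[k](x)):
                y[d] += weight * value
        outputs.append(y)
    return outputs


experts = [lambda x: [1.0, 1.0], lambda x: [10.0, 10.0], lambda x: [100.0, 100.0]]
gates = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
print(moe_layer([[2.0, 0.0], [0.0, 3.0]], gates, experts, top_k=1))
# [[3.0, 1.0], [10.0, 13.0]]
```

Only the K selected experts are evaluated for each token, which is what keeps the cost of a large expert pool manageable; the residual term keeps the original token information even when the gate favours a poorly matched expert.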

[0012] Preferably, in step S3, the extraction of the multi-stage pathological characterization features includes the following. Based on learnable query vectors and a cross-attention mechanism, the pathological feature sequence is compressed into a fixed-length representation, and multi-stage pathological characterization features are extracted from four perspectives, namely mean aggregation, maximum aggregation, variance aggregation, and risk-weighted aggregation: $\mathbf{s}_1 = \frac{1}{N}\sum_{i=1}^{N}\mathbf{h}_i$, $\mathbf{s}_2 = \max_{1 \le i \le N}\mathbf{h}_i$, $\mathbf{s}_3 = \frac{1}{N}\sum_{i=1}^{N}\left(\mathbf{h}_i - \mathbf{s}_1\right)^{2}$, $\mathbf{s}_4 = \sum_{i=1}^{N} r_i \mathbf{h}_i$, where $\mathbf{s}_1$ is the mean aggregation feature, $\mathbf{s}_2$ is the maximum aggregation feature, $\mathbf{s}_3$ is the variance aggregation feature, $\mathbf{s}_4$ is the risk-weighted aggregation feature, $\mathbf{h}_i$ is the $i$-th pathological feature, $N$ is the number of pathological features, and $r_i$ is the risk probability weight of the $i$-th pathological feature; the maximum and the square are taken element-wise.
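The four aggregation views in [0012] reduce to element-wise statistics once the feature sequence is available. The sketch below covers only those four statistics (not the preceding query-based cross-attention compression) and assumes population variance (division by N) and risk weights obtained by a softmax over per-feature risk logits, neither of which the text specifies; `multi_view_aggregate` is a hypothetical name.

```python
import math


def multi_view_aggregate(h, risk_logits):
    """Mean, max, variance and risk-weighted summaries of N feature vectors (element-wise)."""
    n, dim = len(h), len(h[0])
    mean = [sum(v[d] for v in h) / n for d in range(dim)]
    maximum = [max(v[d] for v in h) for d in range(dim)]
    variance = [sum((v[d] - mean[d]) ** 2 for v in h) / n for d in range(dim)]
    # Assumed: risk probability weights r_i = softmax(risk_logits), so they sum to 1.
    m = max(risk_logits)
    exps = [math.exp(s - m) for s in risk_logits]
    total = sum(exps)
    r = [e / total for e in exps]
    risk_weighted = [sum(r_i * v[d] for r_i, v in zip(r, h)) for d in range(dim)]
    return mean, maximum, variance, risk_weighted


print(multi_view_aggregate([[1.0, 2.0], [3.0, 6.0]], risk_logits=[0.0, 0.0]))
# ([2.0, 4.0], [3.0, 6.0], [1.0, 4.0], [2.0, 4.0])
```

The mean captures the overall tissue profile, the maximum the strongest local signal, the variance the heterogeneity, and the risk-weighted sum concentrates on the features the model scores as most prognostic.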

[0013] Further, in step S4, modulating the multi-stage pathological characterization features includes the following. The features of the four stages are projected into the clinical semantic space by a multilayer perceptron, gating values are generated from the clinical feature vector, and the gating value of each stage is used to modulate the corresponding stage feature, so that the $m$-th stage feature $\mathbf{s}_m$ ($m = 1, \dots, 4$) yields the gated, modulated $m$-th stage feature $\tilde{\mathbf{s}}_m$. Some components of the modulation expression are excluded from gradient computation (a stop-gradient operation), thereby controlling the degree to which clinical information influences the pathological features.
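The modulation formula itself is not given explicitly here, so the following is only one plausible sketch of clinical gating, not the patented expression: a linear projection stands in for the MLP that maps a stage feature into the clinical semantic space, and a sigmoid gate computed from the clinical vector scales it element-wise. Pure Python has no autograd, so the stop-gradient part cannot be shown; in a framework such as PyTorch it would typically be written with `Tensor.detach()`. All names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(w, x):
    """Matrix-vector product with w stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def modulate_stage(stage_feat, clinical_vec, w_proj, w_gate):
    """Hypothetical gated modulation of one stage feature by the clinical vector."""
    projected = matvec(w_proj, stage_feat)  # stage feature -> clinical semantic space
    gate = [sigmoid(v) for v in matvec(w_gate, clinical_vec)]  # gate values in (0, 1)
    return [g * p for g, p in zip(gate, projected)]  # element-wise modulation


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(modulate_stage([2.0, -4.0], [1.0, 1.0], identity, zeros))  # [1.0, -2.0]
```

Because the gate lies in (0, 1), the clinical vector can only attenuate or pass through each projected channel, which is one simple way to bound how strongly clinical information reshapes the pathological features.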

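[0014] Furthermore, in step S4, outputting the survival risk prediction result of the subject includes the following. With the clinical feature vector as the query, multi-head cross-attention fusion is performed separately with each of the four modulated stage features; the fusion results are concatenated and input into a multilayer perceptron to obtain the unnormalized prediction scores (logits) of the prediction model: $\mathrm{logits} = \mathrm{MLP}\left(\left[\mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{s}}_1, \tilde{\mathbf{s}}_1) \,\|\, \mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{s}}_2, \tilde{\mathbf{s}}_2) \,\|\, \mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{s}}_3, \tilde{\mathbf{s}}_3) \,\|\, \mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{s}}_4, \tilde{\mathbf{s}}_4)\right]\right)$, where $\mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{s}}_m, \tilde{\mathbf{s}}_m)$ denotes the multi-head cross-attention output with the clinical vector $\mathbf{c}$ as the Query and the modulated $m$-th stage feature $\tilde{\mathbf{s}}_m$ as both the Key and the Value, and $[\cdot \,\|\, \cdot]$ denotes concatenation.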
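The fusion in [0014] can be sketched as follows, with the clinical vector as the single query and each modulated stage represented as a short sequence of tokens used as both keys and values. To stay short, the sketch omits the learned query/key/value/output projections of multi-head attention and reduces the final MLP to one linear layer; each head simply attends over its own slice of channels. These simplifications and the function names are assumptions, not the patented configuration.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multihead_cross_attention(query, keys, values, n_heads):
    """Scaled dot-product attention of one query over a key/value sequence, per channel-slice head.

    Learned Q/K/V/output projections are omitted (identity), an assumption of this sketch.
    """
    head_dim = len(query) // n_heads
    out = []
    for h in range(n_heads):
        sl = slice(h * head_dim, (h + 1) * head_dim)
        q = query[sl]
        scores = [sum(a * b for a, b in zip(q, k[sl])) / math.sqrt(head_dim) for k in keys]
        attn = softmax(scores)
        # Weighted sum of this head's value slices.
        out.extend(sum(a * v[sl][d] for a, v in zip(attn, values)) for d in range(head_dim))
    return out


def fuse_and_predict(clinical_vec, stages, n_heads, w_out, b_out):
    """logits = Linear([MHCA(c, S_1, S_1) || ... || MHCA(c, S_M, S_M)])."""
    fused = []
    for tokens in stages:
        fused.extend(multihead_cross_attention(clinical_vec, tokens, tokens, n_heads))
    return [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(w_out, b_out)]


stages = [[[1.0] * 4], [[2.0] * 4], [[3.0] * 4], [[4.0] * 4]]  # four stages, one token each
print(fuse_and_predict([1.0, 0.0, 0.0, 1.0], stages, n_heads=2, w_out=[[1.0] * 16], b_out=[0.0]))
# [40.0]
```

With a single token per stage the attention weight is trivially 1, so the fused vector is just the concatenated stage tokens; with several tokens the clinical query decides which of them dominate each stage's summary.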

[0015] In another aspect, the present invention provides a survival risk prediction system that integrates WSI images and clinical data, comprising: a feature encoding module, configured to acquire WSI images and clinical data of the subject to be tested and to perform feature encoding on the WSI images and the clinical data, respectively, to obtain pathological features and clinical features; a pathological feature compression module, configured to perform feature compression on the pathological features to obtain a compressed pathological feature sequence that represents the overall information of the WSI image; a multi-stage pathological characterization module, configured to perform feature modeling and aggregation on the compressed pathological feature sequence and extract multi-stage pathological characterization features; and a cross-modal fusion prediction module, configured to modulate the multi-stage pathological characterization features based on the clinical features and to perform cross-modal fusion of the modulated pathological characterization features and the clinical features to output the survival risk prediction result of the subject.

[0016] Compared with the prior art, the beneficial effects of the present invention are as follows. By compressing the pathological features of WSI images, the invention retains key features representing the overall pathological information while reducing computational complexity, thereby improving the efficiency with which global pathological information is used. By modeling and aggregating the compressed pathological feature sequence, it extracts multi-stage pathological characterization features, enhancing the expression of pathological information at different levels. By modulating the multi-stage pathological features based on clinical features, it highlights key pathological information related to the individual patient's condition and makes the feature representation more targeted. By fusing pathological and clinical features across modalities, it exploits complementary information between modalities, thereby improving the accuracy and stability of survival risk prediction results.

Description of the Drawings

[0017] The accompanying drawings are provided to further illustrate the invention and form part of the specification. Together with the embodiments of the invention, they serve to explain the invention and do not limit it. In the drawings: Figure 1 is a flowchart of the survival risk prediction method integrating WSI images and clinical data according to the present invention; Figure 2 is a block diagram of the survival risk prediction system integrating WSI images and clinical data according to the present invention.

Detailed Implementation

[0018] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0019] The specific embodiments of the present invention are described below with reference to the accompanying drawings and examples.

Example 1

[0020] Referring to Figure 1, this embodiment provides a survival risk prediction method that integrates WSI images and clinical data, including the following steps. S1: Obtain the WSI image and clinical data of the subject to be tested, and perform feature encoding on the WSI image and the clinical data, respectively, to obtain pathological features and clinical features.

[0021] WSI images can be whole-slide digital images obtained by scanning pathology slides. Because WSI images typically have high resolution and large size, in practice they can be preprocessed before being input into a pathological feature encoding network for feature extraction. The preprocessing may include at least one of the following: tiling into patches, filtering of valid tissue regions, normalization, and data augmentation. The pathological feature encoding network can be a convolutional neural network, a vision Transformer, or another neural network structure capable of extracting image representations.
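The tiling and valid-region filtering mentioned in [0021] can be sketched as below, assuming non-overlapping square patches on a regular grid and a precomputed low-resolution binary tissue mask (for example, from thresholding a slide thumbnail). Reading the pixels themselves, e.g. with OpenSlide's `read_region`, is omitted; the function names, the mask convention and the 0.5 tissue-fraction threshold are hypothetical.

```python
def tile_coordinates(width, height, patch_size, stride=None):
    """Top-left (x, y) of every full patch on a regular grid; partial edge patches are dropped."""
    stride = stride or patch_size
    return [(x, y)
            for y in range(0, height - patch_size + 1, stride)
            for x in range(0, width - patch_size + 1, stride)]


def filter_tissue(coords, tissue_mask, mask_scale, patch_size, min_fraction=0.5):
    """Keep patches whose footprint on a low-resolution binary tissue mask is mostly tissue.

    tissue_mask[row][col] is True for tissue; mask_scale = slide pixels per mask pixel.
    """
    kept = []
    for x, y in coords:
        c0, r0 = x // mask_scale, y // mask_scale
        c1 = max(c0 + 1, (x + patch_size) // mask_scale)  # cover at least one mask cell
        r1 = max(r0 + 1, (y + patch_size) // mask_scale)
        cells = [tissue_mask[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        if sum(cells) / len(cells) >= min_fraction:
            kept.append((x, y))
    return kept


coords = tile_coordinates(1024, 512, patch_size=256)
mask = [[True, False, False, False],
        [True, True, False, False]]  # one mask cell per 256 x 256 slide region
print(filter_tissue(coords, mask, mask_scale=256, patch_size=256))
# [(0, 0), (0, 256), (256, 256)]
```

Filtering on a thumbnail-scale mask avoids reading background regions at full resolution, which is usually where most of a slide's area lies.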

[0022] Clinical data can include basic patient information, pathological staging information, treatment information, examination indicators, and other structured or semi-structured data related to survival risk prediction. For clinical data, preprocessing such as missing value handling, normalization, and discrete variable encoding can be performed before using multilayer perceptrons, embedded layer networks, or other feature encoding structures to obtain clinical feature representations.

[0023] Specifically, feature encoding of the clinical data includes the following. For discrete categorical variables in the clinical data, an independent embedding layer is constructed for each categorical feature to map its discrete values into a corresponding semantic feature space, yielding the categorical feature embedding $\mathbf{e}_{\mathrm{cat}}$. For continuous numerical variables, a multilayer perceptron (MLP) models the nonlinear relationships among the continuous variables, yielding the continuous feature representation $\mathbf{h}_{\mathrm{num}}$. The categorical embedding $\mathbf{e}_{\mathrm{cat}}$ and the continuous representation $\mathbf{h}_{\mathrm{num}}$ are concatenated and further transformed by an MLP to output a unified clinical semantic vector $\mathbf{c}$, which serves as the prior representation of the clinical features for the subsequent modulation of pathological features and cross-modal fusion: $\mathbf{c} = \mathrm{MLP}\left(\left[\mathbf{e}_{\mathrm{cat}} \,\|\, \mathbf{h}_{\mathrm{num}}\right]\right),\ \mathbf{c} \in \mathbb{R}^{d}$, where $\mathrm{MLP}(\cdot)$ denotes the mapping function of a multilayer perceptron, $[\cdot \,\|\, \cdot]$ denotes concatenation, $\mathbb{R}^{d}$ denotes the feature space in which the clinical semantic vector $\mathbf{c}$ lies, and $d$ denotes the feature dimension of the clinical semantic vector.

[0024] Secondly, feature encoding of the WSI image includes the following. The WSI image comprises a plurality of image patches, and each patch is encoded into a corresponding patch token by a pathology foundation model. The spatial coordinate information of each patch token is fused with its visual features, and the spatially aware feature of each patch token is output as a pathological feature: $\mathbf{z}_i = \mathbf{v}_i + \mathbf{p}_i$, where $\mathbf{z}_i$ is the spatially aware feature of the $i$-th patch token, i.e., the $i$-th pathological feature; $\mathbf{v}_i$ is the visual feature vector of the $i$-th patch; and $\mathbf{p}_i$ is the spatial position encoding vector of the $i$-th patch token.

[0025] S2: Perform feature compression processing on the pathological features to obtain a compressed pathological feature sequence that represents the overall information of the WSI image.

[0026] Since the pathological features corresponding to WSI images typically have long sequence lengths, directly performing global modeling can easily lead to high computational costs and a large amount of redundant information. Therefore, this embodiment uses a feature compression mechanism to compress the pathological features, extracting key features that can characterize the overall information of the WSI image from the original pathological features to form a compressed pathological feature sequence.

[0027] The feature compression of the pathological features includes the following. Multiple spatially or sequentially adjacent pathological features are grouped together, and the pathological features within each group are weighted and aggregated based on a learnable weight vector to obtain an aggregated local-region feature vector, i.e., a compressed pathological feature: $\mathbf{g}_k = \sum_{i \in \mathcal{G}_k} w_i \mathbf{z}_i$, where $\mathbf{g}_k$ denotes the aggregated feature of the $k$-th group after compression, $\mathcal{G}_k$ denotes the set of pathological feature indices in the $k$-th group, $w_i$ is the learnable weight coefficient of the $i$-th pathological feature, and $\mathbf{z}_i$ is the $i$-th pathological feature.

[0028] In some implementations, the feature compression process can be achieved through attention mechanisms, clustering strategies, pooling operations, learnable query vectors, or other sequence compression methods. The compressed pathological feature sequence retains global pathological information while reducing the complexity of subsequent feature modeling.

[0029] S3: Perform feature modeling and aggregation on the compressed pathological feature sequence to extract multi-stage pathological characterization features.

[0030] In this embodiment, to more fully characterize the hierarchical information and histological features at different scales in pathological images, multi-stage feature modeling is performed on the compressed pathological feature sequence. The multi-stage feature modeling may include multiple feature extraction layers arranged in series or parallel, with each stage used to learn pathological characterization information at different levels.

[0031] Specifically, feature modeling of the compressed pathological feature sequence includes the following. The compressed pathological feature sequence is input into a mixture-of-experts (MoE) gating module, in which a gating network computes relevance scores between the input features and multiple expert networks. The top-K experts with the highest relevance scores are selected to transform the compressed pathological features, their outputs are weighted and aggregated according to the corresponding mixture weights, and a residual connection yields the output sequence: $\mathbf{Y} = \mathbf{X} + \sum_{k \in \mathcal{T}_K} \alpha_k \, \mathrm{FFN}_k(\mathbf{X})$, where $\mathbf{X}$ denotes the input compressed pathological feature sequence, $\mathcal{T}_K$ denotes the indices of the top-K experts with the highest relevance scores, $\alpha_k$ is the mixture weight of the corresponding expert, $\mathrm{FFN}_k(\mathbf{X})$ denotes the output of the $k$-th expert feed-forward network on the input sequence, and $\mathbf{Y}$ denotes the output sequence after weighted aggregation and residual connection.

[0032] Secondly, the features output from each stage can be aggregated to obtain multi-stage pathological characterization features that comprehensively reflect the pathological state of the subject. This aggregation can use concatenation, weighted summation, attention fusion, gated fusion, or other feature integration techniques. Specifically, the extraction of the multi-stage pathological characterization features includes the following. Based on learnable query vectors and a cross-attention mechanism, the pathological feature sequence is compressed into a fixed-length representation, and multi-stage pathological characterization features are extracted from four perspectives, namely mean aggregation, maximum aggregation, variance aggregation, and risk-weighted aggregation: $\mathbf{s}_1 = \frac{1}{N}\sum_{i=1}^{N}\mathbf{h}_i$, $\mathbf{s}_2 = \max_{1 \le i \le N}\mathbf{h}_i$, $\mathbf{s}_3 = \frac{1}{N}\sum_{i=1}^{N}\left(\mathbf{h}_i - \mathbf{s}_1\right)^{2}$, $\mathbf{s}_4 = \sum_{i=1}^{N} r_i \mathbf{h}_i$, where $\mathbf{s}_1$ is the mean aggregation feature, $\mathbf{s}_2$ is the maximum aggregation feature, $\mathbf{s}_3$ is the variance aggregation feature, $\mathbf{s}_4$ is the risk-weighted aggregation feature, $\mathbf{h}_i$ is the $i$-th pathological feature, $N$ is the number of pathological features, and $r_i$ is the risk probability weight of the $i$-th pathological feature; the maximum and the square are taken element-wise.

[0033] Next, S4: Modulate the multi-stage pathological characterization features based on the clinical features, and perform cross-modal fusion of the modulated pathological characterization features and the clinical features to output the survival risk prediction result of the subject.

[0034] Clinical features reflect a patient's basic condition, disease stage, and treatment-related information. By modulating pathological features using clinical features, the representation of key pathological information related to an individual's survival risk can be enhanced, improving the specificity and discriminative power of pathological features.

[0035] In step S4, modulating the multi-stage pathological characterization features includes the following. The features of the four stages are projected into the clinical semantic space by a multilayer perceptron, gating values are generated from the clinical feature vector, and the gating value of each stage is used to modulate the corresponding stage feature, so that the $m$-th stage feature $\mathbf{s}_m$ ($m = 1, \dots, 4$) yields the gated, modulated $m$-th stage feature $\tilde{\mathbf{s}}_m$. Some components of the modulation expression are excluded from gradient computation (a stop-gradient operation), thereby controlling the degree to which clinical information influences the pathological features.

[0036] In some implementations, the modulation method can be at least one of channel weighting, feature scaling, bias adjustment, gating control, or conditional attention mechanisms. After modulation, the modulated pathological features and clinical features are input into a cross-modal fusion network to achieve correlation modeling and complementary information mining between different modalities.

[0037] Furthermore, the cross-modal fusion network can employ feature concatenation followed by fully connected layers, cross-attention mechanisms, bilinear fusion, Transformer fusion networks, or other multimodal fusion structures. The fusion result can be further input into a risk prediction layer to output the survival risk prediction result of the subject, which can be one or more of the following: a risk score, a risk stratification category, a survival probability, and a hazard function value. In this embodiment, outputting the survival risk prediction result of the subject includes the following. With the clinical feature vector as the query, multi-head cross-attention fusion is performed separately with each of the four modulated stage features; the fusion results are concatenated and input into a multilayer perceptron to obtain the unnormalized prediction scores (logits) of the prediction model: $\mathrm{logits} = \mathrm{MLP}\left(\left[\mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{s}}_1, \tilde{\mathbf{s}}_1) \,\|\, \mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{s}}_2, \tilde{\mathbf{s}}_2) \,\|\, \mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{s}}_3, \tilde{\mathbf{s}}_3) \,\|\, \mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{s}}_4, \tilde{\mathbf{s}}_4)\right]\right)$, where $\mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{s}}_m, \tilde{\mathbf{s}}_m)$ denotes the multi-head cross-attention output with the clinical vector $\mathbf{c}$ as the Query and the modulated $m$-th stage feature $\tilde{\mathbf{s}}_m$ as both the Key and the Value, and $[\cdot \,\|\, \cdot]$ denotes concatenation.
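If the logits are read as per-interval hazards of a discrete-time survival model, which is a common convention in WSI survival work but is not stated in this text, they can be turned into several of the listed outputs (hazard values, survival probabilities and a scalar risk score) as sketched below. The function name and the choice of the negative summed survival as the risk score are assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logits_to_survival(logits):
    """Map per-interval logits to hazards, survival probabilities and a risk score.

    hazard_t = sigmoid(logit_t); S_t = prod_{j <= t} (1 - hazard_j); risk = -sum_t S_t.
    """
    hazards = [sigmoid(z) for z in logits]
    survival, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h  # probability of surviving past interval t
        survival.append(s)
    risk = -sum(survival)  # larger value means lower predicted survival
    return hazards, survival, risk


print(logits_to_survival([0.0, 0.0]))  # ([0.5, 0.5], [0.5, 0.25], -0.75)
```

A risk stratification category could then be obtained by thresholding the risk score, for example at a cohort median; that threshold is likewise an assumption rather than part of the described method.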

[0038] In summary, the survival risk prediction method that integrates WSI images and clinical data provided in this embodiment, through compression processing of pathological features, multi-stage feature modeling, and modulation fusion based on clinical features, can improve the representation ability of key prognostic information while preserving the overall pathological information of WSI images, and enhance the synergistic effect between different modal information, thereby improving the accuracy and stability of survival risk prediction results, and has good application value.

[0039] Based on the above method, and as shown in Figure 2, this embodiment also provides a survival risk prediction system that integrates WSI images and clinical data, including: a feature encoding module 10, configured to acquire the WSI image and clinical data of the subject to be tested and to perform feature encoding on the WSI image and the clinical data, respectively, to obtain pathological features and clinical features; a pathological feature compression module 20, configured to perform feature compression on the pathological features to obtain a compressed pathological feature sequence that represents the overall information of the WSI image; a multi-stage pathological characterization module 30, configured to perform feature modeling and aggregation on the compressed pathological feature sequence and extract multi-stage pathological characterization features; and a cross-modal fusion prediction module 40, configured to modulate the multi-stage pathological characterization features based on the clinical features and to perform cross-modal fusion of the modulated pathological characterization features and the clinical features to output the survival risk prediction result of the subject.

[0040] It should be noted that the steps in the survival risk prediction method that integrates WSI images and clinical data provided in this embodiment can be implemented based on the corresponding modules in the survival risk prediction system that integrates WSI images and clinical data. Those skilled in the art can refer to the technical solution of the system to implement the steps of the method. That is, the embodiments in the system can be understood as preferred examples of implementing the method, and will not be elaborated here.

[0041] Besides implementing the system and its various devices provided by this invention in purely computer-readable program code, the same functions can be achieved by logically programming the method steps, making the system and its various devices of this invention appear as logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. Therefore, the system and its various devices provided by this invention can be considered as a hardware component, and the devices included therein for implementing various functions can also be considered as structures within the hardware component; alternatively, the devices for implementing various functions can be considered as both software modules implementing the method and structures within the hardware component.

[0042] Finally, it should be noted that the above description is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited to the above embodiments. All technical solutions falling within the scope of the present invention's concept are within the scope of protection of the present invention. It should be pointed out that for those skilled in the art, any improvements and modifications made without departing from the principle of the present invention should also be considered within the scope of protection of the present invention.

[0043] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

Claims

1. A survival risk prediction method integrating WSI images and clinical data, characterized in that it comprises the following steps: S1: acquire WSI images and clinical data of the subject to be tested, and perform feature encoding on the WSI images and the clinical data to obtain pathological features and clinical features, respectively; S2: perform feature compression on the pathological features to obtain a compressed pathological feature sequence that represents the overall information of the WSI image; S3: perform feature modeling and aggregation on the compressed pathological feature sequence to extract multi-stage pathological characterization features; S4: modulate the multi-stage pathological characterization features based on the clinical features, and perform cross-modal fusion of the modulated pathological characterization features and the clinical features to output the survival risk prediction result of the subject.

2. The survival risk prediction method integrating WSI images and clinical data according to claim 1, characterized in that, in step S1, the feature encoding of the clinical data comprises: for discrete categorical variables in the clinical data, constructing an independent embedding mapping layer for each categorical feature to map discrete values into the corresponding semantic feature space, thereby obtaining the categorical feature embedding representation $\mathbf{e}_{\mathrm{cat}}$; for continuous numerical variables in the clinical data, using a multilayer perceptron (MLP) to model the nonlinear relationships among the continuous variables, thereby obtaining the continuous feature representation $\mathbf{e}_{\mathrm{num}}$; and concatenating and fusing the categorical feature embedding representation $\mathbf{e}_{\mathrm{cat}}$ with the continuous feature representation $\mathbf{e}_{\mathrm{num}}$, then further transforming the result with a multilayer perceptron to output a unified clinical semantic vector $\mathbf{c}$, which serves as the prior representation of the clinical features for the subsequent modulation of pathological features and cross-modal fusion, expressed as:

$$\mathbf{c} = \mathrm{MLP}\left([\mathbf{e}_{\mathrm{cat}} \,\|\, \mathbf{e}_{\mathrm{num}}]\right) \in \mathbb{R}^{d_c},$$

where $\mathrm{MLP}(\cdot)$ denotes the mapping function of a multilayer perceptron, $[\cdot \,\|\, \cdot]$ denotes concatenation, $\mathbb{R}^{d_c}$ denotes the feature space in which the clinical semantic vector $\mathbf{c}$ lies, and $d_c$ denotes the feature dimension of the clinical semantic vector.
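The clinical encoding of claim 2 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the clinical schema (`stage`, `sex`, three standardized continuous variables), all layer sizes, the two-layer ReLU MLPs, concatenating the per-field embeddings to form $\mathbf{e}_{\mathrm{cat}}$, and the helper names `init_mlp`, `mlp`, and `encode_clinical` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(d_in, d_hidden, d_out):
    """Random weights for a hypothetical 2-layer MLP (Linear -> ReLU -> Linear)."""
    return [(rng.normal(0, 0.1, (d_in, d_hidden)), np.zeros(d_hidden)),
            (rng.normal(0, 0.1, (d_hidden, d_out)), np.zeros(d_out))]

def mlp(x, params):
    (w1, b1), (w2, b2) = params
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

# Hypothetical clinical schema: categorical field -> number of distinct values.
CARDINALITIES = {"stage": 4, "sex": 2}
N_CONTINUOUS = 3          # e.g. standardized age and two lab values (assumed)
D_EMB, D_NUM, D_C = 8, 8, 16

# One independent embedding table per categorical feature.
tables = {name: rng.normal(0, 0.1, (card, D_EMB)) for name, card in CARDINALITIES.items()}
num_mlp = init_mlp(N_CONTINUOUS, 16, D_NUM)
fuse_mlp = init_mlp(D_EMB * len(CARDINALITIES) + D_NUM, 32, D_C)

def encode_clinical(categorical, continuous):
    # e_cat: look up each categorical value in its own table and concatenate.
    e_cat = np.concatenate([tables[name][categorical[name]] for name in CARDINALITIES])
    # e_num: nonlinear modeling of the continuous variables with an MLP.
    e_num = mlp(np.asarray(continuous, dtype=float), num_mlp)
    # c = MLP([e_cat || e_num]), the clinical semantic vector in R^{D_C}.
    return mlp(np.concatenate([e_cat, e_num]), fuse_mlp)

c = encode_clinical({"stage": 2, "sex": 1}, [0.5, -1.2, 0.4])
print(c.shape)  # (16,)
```

Keeping a separate table per categorical field means that, for example, index 1 of `sex` and index 1 of `stage` map to unrelated vectors.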

3. The survival risk prediction method integrating WSI images and clinical data according to claim 1, characterized in that, in step S1, the feature encoding of the WSI image comprises: the WSI image comprises a plurality of image patches, and each patch is encoded into a corresponding patch token by a pathology foundation model; the spatial coordinate information of each patch token is fused with its corresponding visual feature, and the spatially aware feature of each patch token is output as a pathological feature, namely $\mathbf{h}_i$, the spatially aware feature of the $i$-th patch token (i.e., the $i$-th pathological feature), obtained by fusing $\mathbf{v}_i$, the visual feature vector of the $i$-th patch, with $\mathbf{p}_i$, the spatial position encoding vector of the $i$-th patch token.
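The claim says only that coordinates are "fused" with the visual features. The sketch below assumes the common choice of adding a fixed 2-D sinusoidal position encoding to each patch's visual vector, i.e. $\mathbf{h}_i = \mathbf{v}_i + \mathbf{p}_i$; both the encoding scheme and the additive fusion are assumptions, the random `visual` array stands in for pathology foundation-model outputs, and the function names are hypothetical.

```python
import numpy as np

def sincos_1d(pos, dim):
    """Sinusoidal encoding of 1-D positions: (n,) -> (n, dim), dim even."""
    half = dim // 2
    freqs = 1.0 / (10000.0 ** (np.arange(half) / half))
    angles = pos[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def spatial_encoding(coords, dim):
    """2-D encoding p_i (dim divisible by 4): first half of channels encodes x, second half y."""
    x = sincos_1d(coords[:, 0].astype(float), dim // 2)
    y = sincos_1d(coords[:, 1].astype(float), dim // 2)
    return np.concatenate([x, y], axis=1)

def spatially_aware_tokens(visual, coords):
    # Additive fusion h_i = v_i + p_i is an assumption; the claim only says "fused".
    return visual + spatial_encoding(coords, visual.shape[1])

rng = np.random.default_rng(0)
visual = rng.normal(size=(5, 32))                            # stand-in for patch features v_i
coords = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 0]])  # patch grid positions (column, row)
h = spatially_aware_tokens(visual, coords)
```

Addition keeps the token dimension unchanged; concatenating $\mathbf{p}_i$ to $\mathbf{v}_i$ would be an equally valid reading of "fused" at the cost of a wider token.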

4. The survival risk prediction method integrating WSI images and clinical data according to claim 3, characterized in that, in step S2, the feature compression of the pathological features comprises: grouping multiple spatially or sequentially adjacent pathological features together, and, based on a learnable weight vector, computing a weighted aggregation of the pathological features within each group to obtain an aggregated local-region feature vector, i.e., the compressed pathological feature, expressed as:

$$\mathbf{z}_g = \sum_{i \in \mathcal{G}_g} w_i \, \mathbf{h}_i,$$

where $\mathbf{z}_g$ denotes the aggregated feature of the $g$-th group after compression, $\mathcal{G}_g$ denotes the set of indices of the pathological features in the $g$-th group, $\mathbf{h}_i$ denotes the $i$-th pathological feature, and $w_i$ denotes the learnable weight coefficient corresponding to the $i$-th pathological feature.
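A minimal sketch of the grouped weighted aggregation, assuming sequential grouping of consecutive tokens into fixed-size groups and a single learnable weight vector of length G shared by every group (one reading of "a learnable weight vector"; per-token weights $w_i$ fit the formula equally well). `compress_tokens` is a hypothetical name.

```python
import numpy as np

def compress_tokens(h, w):
    """Weighted aggregation z_g = sum_{i in group g} w_i h_i over consecutive groups.

    h: (N, D) pathological features in sequence order.
    w: (G,) learnable weights, shared by every group of G tokens (assumption).
    A trailing group shorter than G uses only the first len(group) weights.
    """
    G = len(w)
    groups = [h[start:start + G] for start in range(0, h.shape[0], G)]
    return np.stack([w[:len(chunk)] @ chunk for chunk in groups])

h = np.arange(12, dtype=float).reshape(6, 2)   # 6 toy tokens of dimension 2
w = np.full(4, 0.25)                            # full groups start out as mean pooling
z = compress_tokens(h, w)                       # z == [[3.0, 4.0], [4.5, 5.0]]
```

With N tokens and groups of size G, the sequence shrinks from N to ⌈N/G⌉ entries, which is where the saving in downstream computation comes from.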

5. The survival risk prediction method integrating WSI images and clinical data according to claim 1, characterized in that, in step S3, the feature modeling of the compressed pathological feature sequence comprises: inputting the compressed pathological feature sequence into a mixture-of-experts gating module; computing, through a gating network, relevance scores between the input features and multiple expert networks; selecting the top-K experts with the highest relevance scores to transform the compressed pathological features; and weighting and aggregating the outputs of the selected experts according to their corresponding mixing weights and adding a residual connection to obtain the output sequence, expressed as:

$$\mathbf{Y} = \mathbf{X} + \sum_{k \in \mathcal{T}_K} g_k \, E_k(\mathbf{X}),$$

where $\mathbf{X}$ denotes the input compressed pathological feature sequence, $\mathcal{T}_K$ denotes the indices of the top-K experts with the highest relevance scores, $g_k$ denotes the mixing weight of the corresponding expert, $E_k(\mathbf{X})$ denotes the output of the $k$-th expert feed-forward network applied to the input sequence, and $\mathbf{Y}$ denotes the output sequence after weighted aggregation and residual connection.
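The top-K routing of claim 5 can be sketched as follows, under assumptions the claim does not state: routing is done per token, the gating network is a single linear map `W_gate`, each expert is a two-layer ReLU feed-forward network, and the mixing weights $g_k$ are a softmax over the K selected scores. All sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, E, K = 8, 16, 4, 2   # model dim, expert hidden dim, number of experts, top-K (assumed)

W_gate = rng.normal(0, 0.5, (D, E))   # hypothetical linear gating network
experts = [(rng.normal(0, 0.1, (D, H)), rng.normal(0, 0.1, (H, D))) for _ in range(E)]

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def expert_ffn(x, params):
    w1, w2 = params
    return np.maximum(x @ w1, 0.0) @ w2   # ReLU feed-forward expert E_k

def moe_layer(X):
    """Y = X + sum_{k in top-K} g_k * E_k(x), routed independently for each token x."""
    scores = X @ W_gate                                   # (N, E) relevance scores
    topk = np.argsort(-scores, axis=1)[:, :K]             # (N, K) best-scoring experts
    gates = softmax(np.take_along_axis(scores, topk, axis=1), axis=1)  # mixing weights g_k
    Y = X.copy()                                          # residual path
    for n in range(X.shape[0]):
        for j in range(K):
            Y[n] += gates[n, j] * expert_ffn(X[n], experts[topk[n, j]])
    return Y, topk, gates

X = rng.normal(size=(5, D))
Y, topk, gates = moe_layer(X)
```

Only K of the E experts run for each token, so compute grows with K rather than E, while the residual term preserves the input signal when the routed experts contribute little.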

6. The survival risk prediction method integrating WSI images and clinical data according to claim 1, characterized in that, in step S3, the extraction of the multi-stage pathological characterization features comprises: compressing the pathological feature sequence into a fixed-length representation based on learnable query vectors and a cross-attention mechanism; and extracting multi-stage pathological characterization features from four perspectives, namely mean aggregation, maximum aggregation, variance aggregation, and risk-weighted aggregation, expressed respectively as:

$$\mathbf{f}_{\mathrm{mean}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_i, \quad \mathbf{f}_{\max} = \max_{1 \le i \le N} \mathbf{x}_i, \quad \mathbf{f}_{\mathrm{var}} = \frac{1}{N}\sum_{i=1}^{N} \left(\mathbf{x}_i - \mathbf{f}_{\mathrm{mean}}\right)^{2}, \quad \mathbf{f}_{\mathrm{risk}} = \sum_{i=1}^{N} r_i \, \mathbf{x}_i,$$

where $\mathbf{f}_{\mathrm{mean}}$ denotes the mean aggregation feature, $\mathbf{f}_{\max}$ denotes the maximum aggregation feature, $\mathbf{f}_{\mathrm{var}}$ denotes the variance aggregation feature, $\mathbf{f}_{\mathrm{risk}}$ denotes the risk-weighted aggregation feature, $\mathbf{x}_i$ denotes the $i$-th pathological feature, $N$ denotes the number of pathological features, and $r_i$ denotes the risk probability weight of the $i$-th pathological feature; the maximum and the square are taken element-wise.
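The fixed-length compression and the four aggregation views can be sketched as below. Assumptions: the compression uses single-head, unprojected cross-attention from M learnable queries (random stand-ins here); pooling runs along the token axis; variance uses the 1/N population form; and the risk weights $r_i$ are a softmax over per-token risk logits. Function names are hypothetical.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def query_compress(x, queries):
    """Single-head cross-attention: M queries attend over N tokens -> (M, D)."""
    attn = softmax(queries @ x.T / np.sqrt(x.shape[1]), axis=1)   # (M, N)
    return attn @ x

def multi_stage_pool(x, risk_logits):
    """Mean, max, variance and risk-weighted views over the rows of x (N, D)."""
    f_mean = x.mean(axis=0)
    f_max = x.max(axis=0)
    f_var = x.var(axis=0)          # population variance (divides by N)
    r = softmax(risk_logits)       # risk probability weights summing to 1 (assumed)
    f_risk = r @ x
    return f_mean, f_max, f_var, f_risk

rng = np.random.default_rng(0)
tokens = rng.normal(size=(100, 16))       # compressed pathological sequence
queries = rng.normal(size=(8, 16))        # learnable queries (random stand-ins)
fixed = query_compress(tokens, queries)   # (8, 16) regardless of sequence length
stages = multi_stage_pool(fixed, rng.normal(size=8))
```

For the toy input `[[1, 2], [3, 0]]` with equal risk logits, the four views are `[2, 1]`, `[3, 2]`, `[1, 1]` and `[2, 1]`: the mean and risk-weighted views coincide only because the risk weights are uniform.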

7. The survival risk prediction method integrating WSI images and clinical data according to claim 6, characterized in that, in step S4, the modulation of the multi-stage pathological characterization features comprises: projecting the features of the four stages into the clinical semantic space through a multilayer perceptron; generating gate values $\mathbf{g}$ from the clinical feature vector; and modulating the feature of each stage with the gate values $\mathbf{g}$ to obtain the modulated $s$-th stage feature $\tilde{\mathbf{f}}_s$, where $\mathbf{f}_s$ denotes the $s$-th stage feature and $\tilde{\mathbf{f}}_s$ denotes the $s$-th stage feature after gated modulation; some components of the modulation expression do not participate in gradient computation, thereby controlling the degree to which clinical information influences the pathological features.
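One possible form of the clinically gated modulation is sketched below. The claim does not give the formula, so everything here is an assumption: a single tanh-activated linear layer stands in for the projection MLP, per-stage sigmoid gates are computed from the clinical vector $\mathbf{c}$, and modulation is element-wise multiplication. The claim also states that some components are excluded from gradient computation without saying which; NumPy has no autodiff, so that detail is not shown (in PyTorch it would typically be expressed with `Tensor.detach()`, in JAX with `jax.lax.stop_gradient`).

```python
import numpy as np

rng = np.random.default_rng(0)
D_P, D_C, N_STAGES = 8, 16, 4   # stage-feature dim, clinical dim, number of stages (assumed)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical per-stage parameters.
W_proj = [rng.normal(0, 0.3, (D_P, D_C)) for _ in range(N_STAGES)]
W_gate = [rng.normal(0, 0.3, (D_C, D_C)) for _ in range(N_STAGES)]

def modulate(stage_feats, c):
    """Return f~_s = sigmoid(c W_gate[s]) * tanh(f_s W_proj[s]) for each stage s."""
    out = []
    for s, f in enumerate(stage_feats):
        f_proj = np.tanh(f @ W_proj[s])   # projection into the clinical semantic space
        g = sigmoid(c @ W_gate[s])        # gate values in (0, 1) from the clinical vector
        out.append(g * f_proj)            # element-wise gated modulation
    return out

c = rng.normal(size=D_C)
stage_feats = [rng.normal(size=D_P) for _ in range(N_STAGES)]
modulated = modulate(stage_feats, c)
```

Because each gate lies in (0, 1), this variant lets the clinical vector only attenuate, never amplify, a projected pathological channel; additive or FiLM-style modulation would also be consistent with the claim's wording.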

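8. The survival risk prediction method integrating WSI images and clinical data according to claim 7, characterized in that, in step S4, outputting the survival risk prediction result of the subject comprises: using the clinical feature vector as the query vector, performing multi-head cross-attention fusion with each of the four modulated stage features; and concatenating the fusion results and inputting them into a multilayer perceptron to obtain the unnormalized prediction scores (logits) output by the prediction model, expressed as:

$$\mathrm{logits} = \mathrm{MLP}\left(\left[\mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{f}}_1, \tilde{\mathbf{f}}_1) \,\|\, \mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{f}}_2, \tilde{\mathbf{f}}_2) \,\|\, \mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{f}}_3, \tilde{\mathbf{f}}_3) \,\|\, \mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{f}}_4, \tilde{\mathbf{f}}_4)\right]\right),$$

where $\mathrm{MHCA}(\mathbf{c}, \tilde{\mathbf{f}}_s, \tilde{\mathbf{f}}_s)$ denotes the multi-head cross-attention output with the clinical vector $\mathbf{c}$ as the query and the modulated $s$-th stage feature $\tilde{\mathbf{f}}_s$ as both the key and the value, and $[\cdot \,\|\, \cdot]$ denotes concatenation.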
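The fusion head of claim 8 can be sketched as below. Assumptions: the clinical vector and stage features share dimension D; each modulated stage feature is treated as a short token sequence of shape (M, D) (if it is a single vector, M = 1 and the attention weights are trivially 1); each stage has its own attention parameters; and the output MLP produces 4 logits (for example, one per discrete-time risk interval). None of these sizes come from the source, and the names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, HEADS, N_OUT = 16, 4, 4   # model dim, attention heads, number of logits (assumed)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def init_attn():
    return {name: rng.normal(0, D ** -0.5, (D, D)) for name in ("q", "k", "v", "o")}

def mhca(query, kv, p):
    """Multi-head cross-attention: query (D,), key/value tokens kv (M, D) -> (D,)."""
    dh = D // HEADS
    q = (query @ p["q"]).reshape(HEADS, dh)                 # (H, dh)
    k = (kv @ p["k"]).reshape(-1, HEADS, dh)                # (M, H, dh)
    v = (kv @ p["v"]).reshape(-1, HEADS, dh)                # (M, H, dh)
    scores = np.einsum("hd,mhd->hm", q, k) / np.sqrt(dh)    # (H, M)
    heads = np.einsum("hm,mhd->hd", softmax(scores), v)     # (H, dh)
    return heads.reshape(D) @ p["o"]

attn_params = [init_attn() for _ in range(4)]   # one attention block per stage
W1 = rng.normal(0, (4 * D) ** -0.5, (4 * D, 32))
W2 = rng.normal(0, 32 ** -0.5, (32, N_OUT))

def predict_logits(c, stage_feats):
    # Clinical vector c is the query; each modulated stage feature is key and value.
    fused = np.concatenate([mhca(c, f, p) for f, p in zip(stage_feats, attn_params)])
    return np.maximum(fused @ W1, 0.0) @ W2     # unnormalized scores (logits)

c = rng.normal(size=D)
stage_feats = [rng.normal(size=(6, D)) for _ in range(4)]   # 6 tokens per stage (assumed)
logits = predict_logits(c, stage_feats)
```

Using the clinical vector as the only query yields one fused vector per stage, so the concatenated input to the MLP has a fixed width of 4·D regardless of how many tokens each stage carries.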

9. A survival risk prediction system integrating WSI images and clinical data, characterized in that it comprises: a feature encoding module, configured to acquire WSI images and clinical data of a subject to be tested and to perform feature encoding on the WSI images and the clinical data respectively to obtain pathological features and clinical features; a pathological feature compression module, configured to perform feature compression on the pathological features to obtain a compressed pathological feature sequence representing the overall information of the WSI image; a multi-stage pathological characterization module, configured to perform feature modeling and aggregation on the compressed pathological feature sequence and to extract multi-stage pathological characterization features; and a cross-modal fusion prediction module, configured to modulate the multi-stage pathological characterization features based on the clinical features and to perform cross-modal fusion of the modulated clinical features and the pathological characterization features to output the survival risk prediction result of the subject.