Pathological WSI dynamic rendering and computing power scheduling method and system based on multimodal semantic-spatial mapping
By using clinical text parsing based on a large language model and a two-dimensional spatial computing power weight matrix, the rendering of pathological images and the allocation of computing power are dynamically optimized, solving the problems of I/O congestion and missed diagnosis in digital pathology processing. This achieves efficient rendering and precise computing power scheduling, improving the performance and diagnostic accuracy of the digital pathology system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- SHENZHEN SHENGQIANG TECH
- Filing Date
- 2026-04-02
- Publication Date
- 2026-06-19
AI Technical Summary
Existing digital pathology processing solutions suffer from I/O congestion, wasted memory, and missed diagnoses of small lesions because of how they handle rendering and computing power allocation. The main cause is that the system cannot predict which areas the doctor will focus on, so high-magnification images of non-critical areas are blindly pushed into memory and computing power is distributed evenly.
By parsing clinical texts using a large language model to extract diagnostic intent, and combining low-magnification panoramic images to generate a two-dimensional spatial computing power weight matrix, the system dynamically guides the asymmetric cascade scheduling of memory rendering and AI models, prioritizing the push of high-weight tiles and downgrading low-priority regions.
It effectively solves the system I/O bottleneck, improves rendering efficiency, saves memory, reduces computing power waste, lowers the false negative rate, and achieves a smooth image viewing experience and efficient AI diagnosis.

Figure CN121979684B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of digital pathology, multimodal large language models (LLMs), and computer system resource scheduling. It specifically concerns a method and system that use prior knowledge from clinical text to dynamically optimize two things: the loading priority of tiles from ultra-high-resolution whole-slide pathological images (WSI), and the allocation of the underlying GPU computing power.
Background Technology
[0002] Digital pathology whole-slide images (WSI) have ultra-high resolution, typically on the order of 100,000 × 100,000 pixels, and single files range from several gigabytes to tens of gigabytes. To make them tractable for computer processing, a multi-level "tile pyramid" architecture is usually used for their storage, rendering, and AI analysis. However, existing digital pathology processing solutions face the following serious bottlenecks in clinical practice:
[0003] Firstly, regarding front-end rendering and memory scheduling, existing WSI viewers typically load images with hard-coded strategies such as a "sliding window" or "uniform loading per field of view". The system cannot predict which anatomical areas the doctor is currently most interested in. As a result, large non-critical areas (such as blank glass or normal adipose tissue) are blindly pushed into memory and video memory at high magnification, causing severe I/O congestion, rendering stutter, and even out-of-memory (OOM) errors.
[0004] Secondly, regarding the allocation of underlying computing power, existing AI-assisted diagnostic models typically spread the same GPU inference power across all tissue regions during full-slide scanning, for example by uniformly applying large-parameter models such as ViT or ResNet. In practice, the doctor's diagnostic focus is often stated in the patient's pathology request or clinical text. Existing AI systems ignore this valuable prior textual knowledge and consume significant computing power in low-risk areas. This lengthens overall report generation time. Because computing power is spread evenly, it also makes minute lesions in high-risk areas easy to miss.
Summary of the Invention
[0005] This invention provides a method and system for dynamic rendering and computing power scheduling of pathological WSI based on multimodal semantic-spatial mapping. When processing ultra-high-resolution pathological images, existing technologies use indiscriminate uniform loading and evenly amortized computing power. This causes system I/O congestion, serious waste of memory and GPU computing power, and a tendency to miss small lesions because computing power is dispersed. The invention addresses these problems.
[0006] The core technology of this invention is to extract diagnostic intent by parsing clinical text with a large language model and combine it with a low-magnification panoramic image to generate a two-dimensional spatial computing power weight matrix. This matrix then dynamically guides the underlying system's on-demand memory prefetching and the asymmetric cascade scheduling of large/small AI models.
[0007] In a first aspect, the present invention provides a method for dynamic rendering and computing power scheduling of pathological WSI based on multimodal semantic-spatial mapping, the method comprising the following steps:
[0008] Acquire the prior clinical text and the corresponding whole-slide image (WSI) of the target patient;
[0009] Clinical prior text is input into a large language model for parsing, diagnostic intent labels with initial semantic risk weights are extracted, and the diagnostic intent labels are transformed into pathological semantic feature vectors.
[0010] A low-resolution panoramic image of the whole-slide image (WSI) is extracted. Local visual features of the panoramic image are extracted through a lightweight segmentation network. Cross-modal similarity matching is performed between the local visual features and the pathological semantic feature vectors. Regions that meet the similarity requirement are mapped to a set of two-dimensional physical tile coordinates at a high-resolution level.
[0011] For each image tile to be processed, its scheduling priority is calculated by combining its semantic risk weight, local image information entropy, and rendering cost. A two-dimensional spatial computing power weight matrix is then generated from the scheduling priorities of all tiles.
[0012] The two-dimensional spatial computing power weight matrix is input into the underlying resource scheduling engine, and the memory rendering scheduling strategy and / or the computing power allocation strategy of the artificial intelligence inference model are executed based on the scheduling priority.
[0013] Furthermore, the steps of inputting prior clinical text into a large language model for parsing specifically include:
[0014] Medical entities indicating anatomical orientation or specific histological features are extracted using preset prompts and used as diagnostic intent labels.
[0015] The diagnostic intent labels are assigned initial semantic risk weights, and the standardized medical entities are encoded into pathological semantic feature vectors.
[0016] Furthermore, the step of performing cross-modal similarity matching between local visual features and pathological semantic feature vectors specifically includes:
[0017] Candidate tissue connected components and local visual features of low-resolution panoramic images are generated using a lightweight morphological segmentation network.
[0018] Query the pre-defined cross-modal mapping dictionary and calculate the similarity score between the pathological semantic feature vector and the local visual features of the connected domains of each candidate tissue.
[0019] Candidate connected regions with similarity scores higher than a preset threshold are selected as target regions, and the target regions are linearly mapped to a set of two-dimensional physical tile coordinates at a high resolution level according to the scaling relationship of the image pyramid hierarchy.
[0020] Furthermore, the construction method of the cross-modal mapping dictionary includes: annotating typical anatomical or pathological regions on low-resolution pathological images to form a visual prototype library, and inputting standardized pathological terms and corresponding annotated regions into the text encoder and image encoder respectively, and completing cross-modal alignment training through visual-language contrastive learning to obtain a semantic-visual embedding space for retrieval.
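The text specifies only "visual-language contrastive learning" in the style of CLIP. The minimal sketch below writes the symmetric InfoNCE objective used by CLIP in plain Python. It assumes a batch of already-encoded embeddings in which row i of each batch is a matched term/region pair. The encoders themselves are not shown, the temperature of 0.07 is an assumed value, and the function names are illustrative rather than part of the invention.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _cross_entropy(logits, target):
    """Softmax cross-entropy of one row of logits against a target index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    text_emb[i] embeds the i-th standardized pathological term and
    image_emb[i] embeds its annotated low-magnification region, so the
    matching pairs lie on the diagonal of the similarity matrix.
    """
    if len(text_emb) != len(image_emb) or not text_emb:
        raise ValueError("need a non-empty batch of paired embeddings")
    t = [_normalize(v) for v in text_emb]
    im = [_normalize(v) for v in image_emb]
    n = len(t)
    # logits[i][j] = cosine(text_i, image_j) / temperature
    logits = [[sum(a * b for a, b in zip(t[i], im[j])) / temperature
               for j in range(n)] for i in range(n)]
    text_to_image = sum(_cross_entropy(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    image_to_text = sum(_cross_entropy(columns[j], j) for j in range(n)) / n
    return (text_to_image + image_to_text) / 2
```

Minimizing this loss pulls each term embedding toward the embedding of its own annotated region and away from the other regions in the batch. That separation is what makes the later similarity lookup in the embedding space meaningful.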
[0021] Furthermore, the scheduling priority of image tiles is calculated, specifically including:
[0022] The semantic risk weight of the image tile and its local image information entropy are combined by weighted summation. The weighted sum is then compared against, or weighted by, the rendering cost of loading the tile from the storage medium into video memory to obtain the final scheduling priority.
[0023] Here, the local image information entropy characterizes image complexity after excluding interference from blank areas and uniform air bubbles. The rendering cost is obtained by jointly evaluating file size, storage read latency, network transmission latency, and decoding and handling time.
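The text does not specify the rule for excluding blank glass and bubbles. The minimal sketch below assumes 8-bit grayscale tiles and treats pixels at or above a hypothetical brightness cut-off of 220 as blank glass. It then computes the Shannon entropy, in bits, of the histogram of the remaining pixels. Bubble detection would need a separate mask and is not shown.

```python
import math
from collections import Counter


def tile_entropy(pixels, blank_threshold=220, min_tissue_fraction=0.05):
    """Shannon entropy (bits) of the grayscale histogram of non-blank pixels.

    pixels: iterable of 8-bit grayscale values (0-255) for one tile.
    Pixels at or above blank_threshold (hypothetical cut-off) are treated as
    blank glass and excluded; tiles with too little tissue return 0.0.
    """
    pixels = list(pixels)
    if not pixels:
        return 0.0
    tissue = [p for p in pixels if p < blank_threshold]
    if len(tissue) < min_tissue_fraction * len(pixels):
        return 0.0  # essentially blank tile: no diagnostic content
    counts = Counter(tissue)
    n = len(tissue)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For example, a tile whose tissue pixels take four gray levels with equal frequency has an entropy of exactly 2 bits. A tile that is entirely blank glass scores 0.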
[0024] Furthermore, the memory rendering scheduling strategy includes: forcibly pushing image tiles with a scheduling priority greater than the first threshold in the two-dimensional spatial computing power weight matrix into the system's cache; and executing lazy loading or discard instructions for image tiles with a scheduling priority lower than or equal to the first threshold.
[0025] Furthermore, the computing power allocation strategy of the artificial intelligence inference model includes: enabling an adaptive cascaded deep learning framework, calling a heavy inference model for fine recognition and analysis of image tiles with a scheduling priority greater than the second threshold in the two-dimensional spatial computing power weight matrix; and calling a lightweight coarse screening model for pre-screening or skipping image tiles with a scheduling priority lower than or equal to the second threshold.
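The sketch below shows the two thresholded strategies together. It assumes the weight matrix is held as a dictionary that maps (x, y) tile coordinates to scheduling priority. The `DispatchPlan` structure and its list names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DispatchPlan:
    push_to_cache: list = field(default_factory=list)  # priority > first threshold
    lazy_load: list = field(default_factory=list)      # priority <= first threshold
    heavy_model: list = field(default_factory=list)    # priority > second threshold
    light_model: list = field(default_factory=list)    # priority <= second threshold


def build_dispatch_plan(weight_matrix, t1, t2):
    """Split tiles by the first (memory) and second (compute) thresholds.

    weight_matrix: dict mapping (x, y) tile coordinates to scheduling priority.
    Tiles are visited in descending priority, so each list is ordered with
    the most important tiles first.
    """
    plan = DispatchPlan()
    ranked = sorted(weight_matrix.items(), key=lambda kv: kv[1], reverse=True)
    for coord, p in ranked:
        (plan.push_to_cache if p > t1 else plan.lazy_load).append(coord)
        (plan.heavy_model if p > t2 else plan.light_model).append(coord)
    return plan
```

The text allows tiles routed to `light_model` to be either pre-screened or skipped. This sketch leaves that choice to the caller.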
[0026] Furthermore, the first threshold and the second threshold are dynamically adaptive thresholds. The method further includes: starting from a preset baseline threshold, dynamically adjusting the first threshold and the second threshold in real time according to the whole-slide image size, the number of candidate tiles, the remaining host memory, the remaining graphics processor (GPU) memory, and the degree of I/O congestion.
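The text lists the inputs to the adaptive thresholds but not the adjustment rule. The sketch below uses an assumed rule with two parts. First, the baseline is scaled up linearly with the worst of the memory and I/O pressure signals. Second, the threshold is raised further so that the number of admitted candidate tiles, which grows with slide size, never exceeds what fits in free memory. The `sensitivity` parameter and all function names are hypothetical.

```python
def adaptive_threshold(baseline, host_free_ratio, gpu_free_ratio,
                       io_congestion, sensitivity=0.5):
    """Hypothetical rule: raise a baseline threshold under resource pressure.

    host_free_ratio, gpu_free_ratio: fraction of host RAM / GPU memory free (0-1).
    io_congestion: 0 (idle) to 1 (saturated).
    The worst of the three signals sets the pressure, and the threshold grows
    linearly with it, up to baseline * (1 + sensitivity).
    """
    for v in (host_free_ratio, gpu_free_ratio, io_congestion):
        if not 0.0 <= v <= 1.0:
            raise ValueError("resource signals must lie in [0, 1]")
    pressure = max(1.0 - host_free_ratio, 1.0 - gpu_free_ratio, io_congestion)
    return baseline * (1.0 + sensitivity * pressure)


def budget_threshold(priorities, free_bytes, tile_bytes):
    """A threshold that admits at most as many tiles as fit in free_bytes.

    Tiles are admitted when priority > threshold, so returning the priority of
    the (capacity + 1)-th ranked tile caps admissions at `capacity`.
    """
    capacity = free_bytes // tile_bytes
    ranked = sorted(priorities, reverse=True)
    if capacity >= len(ranked):
        return float("-inf")  # everything fits: memory adds no constraint
    return ranked[capacity]


def effective_threshold(baseline, priorities, free_bytes, tile_bytes,
                        host_free_ratio, gpu_free_ratio, io_congestion):
    # Use the stricter of the pressure-scaled and the capacity-based threshold.
    return max(adaptive_threshold(baseline, host_free_ratio,
                                  gpu_free_ratio, io_congestion),
               budget_threshold(priorities, free_bytes, tile_bytes))
```

The same functions could produce the first and second thresholds separately. For example, the first threshold could be derived from host memory and the second from GPU memory, each with its own baseline.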
[0027] Furthermore, the pathological WSI dynamic rendering and computing power scheduling method also includes anti-hallucination confidence verification and system rollback mechanisms:
[0028] When the set of two-dimensional physical tile coordinates obtained by cross-modal matching is empty, the confidence of cross-modal matching is lower than the safety threshold, or the system experiences input and output anomalies, the system rollback mechanism of the underlying resource scheduling engine is triggered, the scheduling according to the two-dimensional spatial computing power weight matrix is stopped, and the resource allocation mode is rolled back to the full-map uniform scanning and sequential loading mode.
[0029] Secondly, this invention provides a pathological WSI dynamic rendering and computing power scheduling system based on multimodal semantic-spatial mapping, comprising:
[0030] The multi-source data acquisition and parsing module is used to acquire clinical prior text and whole slide images, extract diagnostic intent labels with semantic risk weights through a large language model and convert them into pathological semantic feature vectors.
[0031] The image topology sensing module is used to extract low-resolution panoramic images and obtain local visual features, and to map high-risk areas to a set of two-dimensional physical tile coordinates at high resolution through cross-modal matching.
[0032] The computing power matrix generation module is used to combine the semantic risk weights of image tiles, local image information entropy, and rendering cost to calculate scheduling priority and generate a two-dimensional spatial computing power weight matrix.
[0033] The underlying resource scheduling engine receives the two-dimensional spatial computing power weight matrix and performs memory prefetching scheduling and dynamic computing power allocation for image tiles and artificial intelligence cascade models accordingly.
[0034] The main contributions and innovations of this invention are as follows:
[0035] 1. Overcoming system I/O bottlenecks and improving rendering efficiency: The conventional field-of-view-based uniform loading mechanism is abandoned, and natural language serves as the memory scheduling instruction for the underlying image engine. Loading high-weight tiles first and lazily loading low-priority areas effectively saves network bandwidth and front-end memory. Experiments show that front-end high-magnification rendering latency is significantly reduced, peak memory usage is greatly reduced, and a smooth viewing experience is achieved.
[0036] 2. Reduce computational waste and accelerate model inference: Asymmetric cascaded computational power allocation is achieved using prior textual knowledge. Heavy inference models with a large number of parameters are allocated to key areas, while lightweight coarse-screening models are allocated to non-key areas. This achieves efficient utilization of computational power with limited hardware resources, effectively improving single-chip inference efficiency and reducing model computational consumption.
[0037] 3. Highly reliable medical fault-tolerant design: A cross-modal dictionary verification and multi-condition triggered system fallback mechanism are introduced. When the large language model produces medical hallucinations or the text is misleading, the system can automatically detect the anomaly and degrade to a global uniform rendering and scanning mode, effectively reducing the risk of hallucinations generated by traditional multimodal large models in medical applications, and has good practical application potential.
[0038] Details of one or more embodiments of the present invention are set forth in the following drawings and description, so that other features, objects and advantages of the invention will be more readily understood.
Attached Figure Description
[0039] The accompanying drawings, which are included to provide a further understanding of the invention and form part of this invention, illustrate exemplary embodiments of the invention and are used to explain the invention, but do not constitute an undue limitation of the invention. In the drawings:
[0040] Figure 1 is an architecture diagram of a pathological WSI dynamic rendering and computing power scheduling system based on multimodal semantic-spatial mapping according to an embodiment of the present invention;
[0041] Figure 2 is a flowchart of a pathological WSI dynamic rendering and computing power scheduling method based on multimodal semantic-spatial mapping according to an embodiment of the present invention;
[0042] Figure 3 is a schematic diagram illustrating the principle of cross-modal semantic-spatial mapping according to an embodiment of the present invention;
[0043] Figure 4 is an execution diagram of the underlying software and hardware scheduling logic according to an embodiment of the present invention.
Detailed Implementation
[0044] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with one or more embodiments of this specification. Rather, they are merely examples of apparatuses and methods consistent with some aspects of one or more embodiments of this specification as detailed in the appended claims.
[0045] It should be noted that the steps of the corresponding methods are not necessarily performed in the order shown and described in this specification in other embodiments. In some other embodiments, the methods may include more or fewer steps than described in this specification. Furthermore, a single step described in this specification may be broken down into multiple steps in other embodiments; and multiple steps described in this specification may be combined into a single step in other embodiments.
[0046] In existing technologies, uniform loading of whole-slide images (WSI) causes memory congestion and wastes GPU computing power. To address these technical problems, this invention proposes a method and system for dynamic rendering and computing power scheduling of pathological WSI based on multimodal semantic-spatial mapping.
[0047] Example 1
[0048] Figure 1 shows the overall system architecture of this invention. From a logical and data-flow perspective, the pathological WSI dynamic rendering and computing power scheduling system based on multimodal semantic-spatial mapping can be divided into the following core functional modules. Their structure and interactions are detailed below:
[0049] 1. Data source layer (input end)
[0050] Clinical text database (HIS/LIS): As the system's source of prior textual knowledge, it provides unstructured or semi-structured medical text data for target patients, such as clinical application forms and surgical records.
[0051] Whole-Slide Image Storage Server (WSI Storage): As the image data source, it stores and serves ultra-high-resolution pathological image files organized in a multi-level "Tile Pyramid" architecture.
[0052] 2. Multimodal analysis and perception layer (processing center)
[0053] Large Language Model Parsing Engine: This module receives clinical text from HIS/LIS. Its core functions are "intent label extraction" and "semantic vectorization". Through prompt engineering, it identifies the doctor's focus (such as specific anatomical sites or lesion categories) and converts it into diagnostic intent labels with risk weights. It then encodes these labels into pathological semantic feature vectors for downstream modules.
[0054] Image Topology Awareness Module: This module connects image storage with the large model engine. Internally, it provides "low-magnification image reading", a "lightweight morphological segmentation network", and "physical coordinate anchoring". It first pulls low-magnification panoramic images (1.25x or 2.5x) from the WSI server and quickly segments the tissue topology with the lightweight network. It then performs cross-modal matching between the semantic feature vectors from the large model and the visual features of the image. Finally, it identifies the two-dimensional physical tile coordinates of high-risk areas at high magnification in the WSI.
[0055] 3. Scheduling Strategy Generation Layer (Decision Core)
[0056] Computing Power Matrix Generator: This module has a built-in "Spatial Computing Power Weight Matrix Generation Algorithm". It receives the physical coordinates determined by the image topology awareness module. It then combines semantic risk weights, local image information entropy, and memory/video-memory rendering costs to calculate the scheduling priority of each tile in the slide. Finally, it generates a "two-dimensional spatial computing power weight matrix" proportional to the physical size of the WSI, which serves as a "map" guiding the operation of the underlying hardware.
[0057] 4. Low-level resource execution layer (hardware control)
[0058] The underlying resource scheduling engine is the bridge connecting software algorithms and underlying hardware, integrating two core controllers:
[0059] Memory paging/caching controller: It intercepts traditional in-order load instructions. Guided by the computing power matrix, it forcibly preloads high-priority tiles into the cache (L1 Cache/RAM) and performs lazy loading or paging for low-priority areas.
[0060] GPU large/small model dynamic allocator: It controls the asymmetric allocation of AI computing resources. High-weight tiles are sent to heavy inference models (large models), while low-weight tiles are downgraded to lightweight coarse-screening models (small models). The system fallback mechanism is also executed in this module.
[0061] 5. Application and Output Layer (User End)
[0062] Front-end doctor's image viewing terminal (Viewer): The underlying memory scheduling controller keeps high-definition images of high-risk areas resident in memory. When doctors drag and view ultra-high-resolution WSIs on this terminal, these images are already available, giving a fast, lag-free viewing experience.
[0063] AI Diagnostic Report Generation Module: Supported by the GPU dynamic allocator, this module concentrates the very limited heavy computing power on the core lesion areas. It quickly generates high-precision auxiliary diagnostic reports, including resection margin assessment and lymph node metastasis status. This significantly shortens per-slide processing time and reduces the rate of missed micrometastases.
[0064] The Light Model is used for rapid pre-screening of large numbers of WSI tiles. It preferably uses MobileNetV3, ResNet18, EfficientNet-B0, or a lightweight detection network to perform preliminary identification and scoring of suspected high-risk areas.
[0065] The Heavy Model performs fine analysis of the high-priority regions retained after coarse screening. It preferably uses ResNet50/101, Swin Transformer, ConvNeXt, MIL (multiple instance learning) models, or joint segmentation-detection models to perform fine classification, identification, or segmentation of the target regions.
[0066] Example 2
[0067] Figure 2 is the main flowchart of the core method of the present invention. The method in this embodiment specifically includes the following steps:
[0068] Step S1: Multi-source data acquisition and prior semantic vectorization
[0069] First, the system retrieves the target patient's prior clinical text (such as pathology request forms and surgical records) from the Clinical Text Database (HIS/LIS). It also retrieves the corresponding whole-slide images (WSI) from the Whole-Slide Image Storage Server (WSI Storage). The prior clinical text is then input into a pre-trained large language model (LLM) parsing engine. Through predefined prompt engineering, the LLM extracts "diagnostic intent labels" that indicate anatomical orientation or specific histological features, for example "focus on assessing the superior resection margin" or "find lymph nodes". It assigns each label an initial semantic risk weight (e.g., 0.95). Finally, each diagnostic intent label is transformed into a standardized pathological semantic feature vector.
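The output format of the LLM parsing engine is not specified. Suppose the prompt instructs the model to answer with a JSON array of objects that carry hypothetical `entity` and `risk_weight` fields. A defensive parser might then look like the sketch below. It drops malformed entries and clamps weights to [0, 1], so an out-of-range value cannot dominate scheduling. The LLM call and the encoding into semantic feature vectors are not shown.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IntentLabel:
    """A diagnostic intent label with its initial semantic risk weight."""
    entity: str         # normalized medical entity, e.g. "lymph node"
    risk_weight: float  # initial semantic risk weight in [0, 1]


def parse_intent_labels(llm_output: str) -> List[IntentLabel]:
    """Parse LLM output of the (hypothetical) form
    [{"entity": "lymph node", "risk_weight": 0.95}, ...]
    into validated IntentLabel objects."""
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError:
        return []  # unparseable output yields no labels
    if not isinstance(items, list):
        return []
    labels = []
    for item in items:
        if not isinstance(item, dict):
            continue
        entity = item.get("entity")
        weight = item.get("risk_weight")
        if not isinstance(entity, str) or not entity.strip():
            continue
        # Reject booleans (a bool is an int in Python) and non-numeric weights.
        if isinstance(weight, bool) or not isinstance(weight, (int, float)):
            continue
        weight = min(max(float(weight), 0.0), 1.0)  # clamp to [0, 1]
        labels.append(IntentLabel(entity.strip().lower(), weight))
    return labels


raw = ('[{"entity": "Upper resection margin", "risk_weight": 0.95},'
       ' {"entity": "lymph node", "risk_weight": 1.7}, {"foo": 1}]')
print(parse_intent_labels(raw))
# [IntentLabel(entity='upper resection margin', risk_weight=0.95), IntentLabel(entity='lymph node', risk_weight=1.0)]
```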
[0070] Step S2: Low-Magnification Topology Sensing and Cross-Modal Physical Coordinate Anchoring
[0071] Figure 3 shows the cross-modal semantic-spatial mapping principle. Following this principle, the system extracts the low-resolution panoramic image (preferably the 1.25x or 2.5x magnification image) from the low-resolution level of the WSI pyramid. The image topology awareness module uses a lightweight morphological segmentation network to extract the local visual features of the panoramic image. An example is a multi-scale feature fusion network that combines a MobileNetV3 or EfficientNet-Lite backbone encoder with a U-Net or FPN. This step generates a tissue topology table and candidate tissue connected components.
[0072] In this embodiment, to align text semantics with image features, the system needs to query a "pathological anatomy semantics and visual morphological feature mapping dictionary." This dictionary is not a simple rule base, but is constructed using a "static prototype library + dynamic embedding matching" approach.
[0073] Offline construction phase: Pathology experts annotate typical anatomical/pathological regions such as lymph nodes, resection margins, and fat areas on 1.25x or 2.5x low-magnification WSI thumbnails, forming a low-magnification visual prototype library with semantic labels. Standardized pathological terms and their corresponding annotated regions are then input into the text encoder and image encoder, respectively. Cross-modal alignment training is completed through CLIP-style visual-language contrastive learning to obtain a retrieval-ready semantic-visual embedding space.
[0074] During the online matching phase (system runtime): The system first calculates the similarity between the pathological semantic feature vector generated in step S1 and the local visual feature vectors of each candidate region in the panoramic image. To improve the localization accuracy, the matching process is not limited to a single vector dot product, but combines the area, roundness, boundary features, and surrounding adipose tissue context of the candidate region for joint scoring.
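The joint scoring rule is not given in closed form. The sketch below assumes a weighted sum of four terms:

- cosine similarity;
- roundness (4πA/P², equal to 1 for a circle);
- an area plausibility check against a hypothetical expected range;
- the adipose fraction around the candidate, a plausible context cue for lymph nodes embedded in fat.

The weights and the 0.7 threshold are illustrative values, and boundary features are omitted for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate tissue connected component on the low-magnification image."""
    visual_vec: list     # local visual feature vector
    area: float          # pixel area at the low-magnification level
    perimeter: float     # contour length in pixels
    fat_fraction: float  # adipose fraction of the surrounding ring, 0-1


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def roundness(area, perimeter):
    # 4*pi*A / P^2: 1.0 for a perfect circle, smaller for irregular shapes.
    return 0.0 if perimeter <= 0 else min(1.0, 4 * math.pi * area / perimeter ** 2)


def joint_score(semantic_vec, c, area_range, weights=(0.6, 0.15, 0.1, 0.15)):
    """Hypothetical joint score: similarity plus morphology and context priors."""
    w_sim, w_round, w_area, w_ctx = weights
    lo, hi = area_range
    area_ok = 1.0 if lo <= c.area <= hi else 0.0
    sim = max(0.0, cosine(semantic_vec, c.visual_vec))
    return (w_sim * sim + w_round * roundness(c.area, c.perimeter)
            + w_area * area_ok + w_ctx * c.fat_fraction)


def select_targets(semantic_vec, candidates, area_range, threshold=0.7):
    """Keep candidates whose joint score exceeds the preset threshold."""
    return [c for c in candidates
            if joint_score(semantic_vec, c, area_range) > threshold]
```

With these weights, a round, fat-surrounded candidate whose visual vector matches the "lymph node" semantic vector scores close to 1. An elongated, dissimilar candidate stays well below the threshold even if its area is plausible.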
[0075] The system filters out image regions with a joint score (similarity) higher than a preset threshold. Based on the WSI pyramid hierarchy scaling relationship, these target regions (candidate boxes) are linearly mapped to a two-dimensional physical tile coordinate set at a high resolution level.
[0076] Taking the location of "lymph nodes" as an example: After the large language model extracts the diagnostic intent label of "lymph node" from the application form and maps it to a standard semantic identifier, the system generates candidate tissue connected regions on a 1.25x low-magnification image using a lightweight morphological segmentation network, and extracts the bounding box coordinates and visual features of each candidate region. Next, cross-modal similarity matching is performed between the "lymph node" semantic vector and the visual vectors of each candidate region, selecting regions with scores higher than a threshold as lymph node candidate boxes, obtaining their physical coordinates at the 1.25x level. Finally, these coordinates are linearly scaled and mapped to a set of tile coordinates at a high magnification (e.g., 40x) level for subsequent rendering prefetching and priority use by the heavy inference model. This process forms a complete technical loop from "textual semantics" to "low-magnification localization" and then to "high-magnification tile scheduling."
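The linear mapping in this example reduces to a scale factor of 40 / 1.25 = 32. The minimal sketch below assumes 256-pixel square tiles at the 40x level (a common tile size, but not stated in the text) and bounding boxes whose right and bottom edges are exclusive. The function name and the optional margin are hypothetical.

```python
import math


def bbox_to_tiles(bbox, low_mag=1.25, high_mag=40.0, tile_size=256, margin_tiles=0):
    """Map a low-magnification bounding box to high-magnification tile indices.

    bbox: (x0, y0, x1, y1) in pixels at low_mag, with x1/y1 exclusive.
    Returns the set of (col, row) tile indices at high_mag that the box touches,
    optionally padded by margin_tiles on every side.
    """
    scale = high_mag / low_mag  # 40 / 1.25 = 32
    x0, y0, x1, y1 = bbox
    # Scale to high-magnification pixels, then convert to tile indices.
    c0 = math.floor(x0 * scale / tile_size) - margin_tiles
    r0 = math.floor(y0 * scale / tile_size) - margin_tiles
    c1 = math.ceil(x1 * scale / tile_size) - 1 + margin_tiles
    r1 = math.ceil(y1 * scale / tile_size) - 1 + margin_tiles
    return {(c, r)
            for c in range(max(c0, 0), c1 + 1)
            for r in range(max(r0, 0), r1 + 1)}
```

For instance, a 1.25x box from (10, 20) to (18, 28) covers pixels 320 to 575 horizontally and 640 to 895 vertically at 40x. It therefore touches tile columns 1 to 2 and rows 2 to 3. Clamping to the slide's right and bottom edges is omitted here.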
[0077] Step S3: Construct the "diagnostic benefit - rendering cost" scheduling function and generate the weight matrix
[0078] For each tile to be processed in the WSI pyramid, the system calculates its dynamic scheduling priority. The priority function is preferably calculated using a linear weighted summation method, as shown in the following formula:
[0079] $$P(x,y) = \frac{\alpha \cdot W_{\text{sem}}(x,y) + \beta \cdot H(x,y)}{C_{\text{render}}(x,y) + \varepsilon}$$
[0080] Here, P(x,y) is the dynamic scheduling priority (the objective function value) of the image tile at coordinates (x,y). The coordinates x and y are the horizontal and vertical coordinates of a high-resolution whole-slide image (WSI) tile in the two-dimensional physical coordinate system. The magnitude of P(x,y) directly determines whether the tile is prioritized for cache loading and whether the heavy inference model is invoked. The numerator represents the "diagnostic benefit": the sum of the weighted semantic risk weight $W_{\text{sem}}(x,y)$ and the weighted local image information entropy $H(x,y)$. The denominator represents the "rendering cost" $C_{\text{render}}(x,y)$, calculated by combining the tile file size, storage read time, and transmission time. A small smoothing constant $\varepsilon$ is added to the denominator to prevent it from being zero. α is the adjustment coefficient (weighting factor) for the semantic risk weight. It dynamically controls the proportion of prior textual knowledge in the overall decision under different disease types or clinical diagnostic scenarios. β is the adjustment coefficient (weighting factor) for the local image information entropy. It controls the proportion of the image's underlying objective visual features in the overall decision. Traversing all tile coordinates produces a "two-dimensional spatial computing power weight matrix" proportional to the physical size of the WSI.
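Under assumed inputs, the priority function (the weighted benefit α·semantic weight + β·entropy, divided by the rendering cost plus a small ε) can be evaluated over the whole tile grid as follows. Several choices here are hypothetical and for illustration only. These are the rendering-cost model (transfer time at an assumed 200 MB/s plus read latency plus decode time) and the defaults α = 1.0, β = 0.1, and ε = 1e-6. The text only states which quantities enter each term.

```python
from dataclasses import dataclass


@dataclass
class TileStats:
    semantic_weight: float  # semantic risk weight, 0 outside matched regions
    entropy: float          # local image information entropy, e.g. in bits
    file_bytes: int         # compressed tile size on storage
    read_latency_s: float   # storage read latency in seconds
    decode_s: float         # decoding / handling time in seconds


def render_cost(t, bandwidth_bps=200e6):
    # Hypothetical cost model in seconds: transfer + read latency + decode.
    return t.file_bytes / bandwidth_bps + t.read_latency_s + t.decode_s


def priority_matrix(stats, alpha=1.0, beta=0.1, eps=1e-6):
    """Evaluate (alpha*semantic + beta*entropy) / (cost + eps) for every tile.

    stats: 2-D list indexed [y][x] of TileStats covering the whole slide.
    Returns a 2-D list of priorities with the same shape.
    """
    return [[(alpha * t.semantic_weight + beta * t.entropy) / (render_cost(t) + eps)
             for t in row]
            for row in stats]
```

A blank tile with zero semantic weight and zero entropy receives priority 0 regardless of its cost. It therefore falls below any positive first or second threshold.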
[0081] Step S4: Dynamic scheduling and execution of underlying resources based on the weight matrix
[0082] Figure 4 shows the execution diagram of the underlying hardware and software scheduling logic. The two-dimensional spatial computing power weight matrix is input into the underlying resource scheduling engine, which executes the following strategies:
[0083] (1) Memory and front-end rendering optimization: The system intercepts regular in-order loading instructions. Tiles whose scheduling priority in the weight matrix exceeds the first threshold (T1) are forced into the high-speed cache (L1 Cache or RAM) first. For memory-resident tiles, Linux mmap is used to map the WSI tiles, madvise is used to issue prefetch hints, and mlock is used to lock high-priority tiles in RAM. Low-priority tiles are lazy-loaded or left on disk.
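Python's standard `mmap` module exposes `madvise` (Python 3.8+, where the platform supports it) and constants such as `mmap.MADV_WILLNEED`. It does not expose `mlock`, so the following sketch covers only the prefetch hint. It assumes, hypothetically, that tiles are stored as contiguous byte ranges of a single file with a known offset table. Real WSI containers such as tiled TIFF record tile offsets in their own headers. On Linux the advised start address must be page-aligned, which is why the code rounds down to `mmap.PAGESIZE`. The function names are illustrative.

```python
import mmap


def prefetch_tiles(mm, tile_offsets, tile_ids):
    """Hint the kernel to read the given tiles into the page cache.

    mm: an mmap.mmap over the tile file.
    tile_offsets: dict tile_id -> (byte_offset, byte_length)  (hypothetical layout).
    Returns the number of madvise calls issued (0 if madvise is unsupported).
    """
    if not hasattr(mm, "madvise") or not hasattr(mmap, "MADV_WILLNEED"):
        return 0  # no madvise on this platform: rely on normal demand paging
    issued = 0
    for tid in tile_ids:
        offset, length = tile_offsets[tid]
        # Round the start down to a page boundary and extend the length to match.
        start = offset - offset % mmap.PAGESIZE
        mm.madvise(mmap.MADV_WILLNEED, start, length + (offset - start))
        issued += 1
    return issued


def read_tile(mm, tile_offsets, tid):
    """Return the raw bytes of one tile (served from the page cache if prefetched)."""
    offset, length = tile_offsets[tid]
    return mm[offset:offset + length]
```

Locking pages with `mlock(2)` would additionally require a call outside the standard library, for example through `ctypes`. It is also subject to the process's locked-memory limit.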
[0084] (2) Asymmetric allocation of AI inference computing power: For high-priority tile regions, the system dynamically calls heavy inference models (such as ResNet50, ViT, etc.) with large parameter counts and wide receptive fields for fine analysis. For low-priority regions, it dynamically downgrades to lightweight coarse screening models (such as MobileNetV3, lightweight CNN, etc.) for rapid pre-screening. Hot switching between large and small models is achieved through "dual-model resident memory + CUDA multi-stream asynchronous scheduling + memory quota control" to ensure low-latency switching.
[0085] In addition, the first threshold (for cache loading) and the second threshold (for model switching) use a dynamic adaptive mechanism. They are adjusted in real time according to the WSI size, the remaining host memory, the remaining GPU memory, and the degree of I/O congestion.
[0086] Step S5: Confidence verification and system fallback mechanism for anti-hallucination
[0087] To guard against "medical hallucinations" by the large language model, a multi-condition safety degradation mechanism is triggered in any of three cases. The first is an empty physical coordinate set, meaning the text requests a search for a certain tissue but no match is found in the image. The second is cross-modal matching confidence that is too low. The third is a video memory or I/O anomaly. When triggered, the system automatically ignores the weight matrix generated from the large language model output and reverts to a global uniform computing power scanning and sequential loading mode, thereby preventing missed diagnoses.
[0088] The fallback mechanism is preferably a multi-condition safety degradation mechanism, rather than one triggered only when the coordinate set is empty. Besides an empty coordinate set, it can be triggered by low cross-modal matching confidence, insufficient candidate region coverage, unstable candidate ranking, conflicts between semantic and morphological rules, or abnormal GPU memory/I/O behavior. Once triggered, the system stops the locally skewed scheduling based on the LLM weight matrix and reallocates GPU computing power to a full-area uniform scanning task. It also switches the front-end rendering strategy to the sequential loading/neighborhood prefetching mode of traditional WSI browsing. This prevents missed diagnoses caused by medical hallucinations of the large model or by misleading text.
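A minimal sketch of the multi-condition trigger follows. The text names the conditions but not their thresholds. The report fields and the default cut-offs are therefore hypothetical: a safety confidence of 0.6, a minimum coverage of 1%, and a top-k overlap of 0.5 between two matching runs as a proxy for ranking stability. Any single trigger is enough to revert to uniform scanning.

```python
from dataclasses import dataclass


@dataclass
class MatchReport:
    """Signals gathered after cross-modal matching (hypothetical fields)."""
    tile_coords: set       # mapped high-magnification tile coordinates
    confidence: float      # cross-modal matching confidence, 0-1
    coverage: float        # fraction of tissue area covered by candidates
    topk_overlap: float    # overlap of top-k candidates across two runs, 0-1
    rule_conflict: bool    # semantic label contradicts morphology rules
    io_or_gpu_fault: bool  # GPU memory or I/O anomaly reported


def fallback_reasons(r, safety_conf=0.6, min_coverage=0.01, min_overlap=0.5):
    """Return the triggered fallback conditions (an empty list means keep scheduling)."""
    reasons = []
    if not r.tile_coords:
        reasons.append("empty coordinate set")
    if r.confidence < safety_conf:
        reasons.append("low matching confidence")
    if r.coverage < min_coverage:
        reasons.append("insufficient candidate coverage")
    if r.topk_overlap < min_overlap:
        reasons.append("unstable candidate ranking")
    if r.rule_conflict:
        reasons.append("semantic/morphology rule conflict")
    if r.io_or_gpu_fault:
        reasons.append("GPU memory or I/O anomaly")
    return reasons


def choose_mode(r):
    # Any single trigger reverts to uniform full-slide scanning and sequential loading.
    return "uniform_sequential" if fallback_reasons(r) else "weighted"
```

Returning the list of reasons, rather than a bare flag, lets the engine log why the weight matrix was discarded for a given slide.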
[0089] Specific application verification:
[0090] Application 1 (Radical gastrectomy for gastric cancer):
[0091] The clinical prior text was: "An ulcerative mass is seen on the greater curvature; please focus on assessing the upper resection margin and surrounding lymph node metastasis." The large language model identified the "upper resection margin" and "lymph nodes" as high-priority rendering regions. The system located their coordinates on a 1.25x thumbnail and generated the two-dimensional spatial computing power weight matrix. When the doctor opened the 30 GB image, the high-resolution tiles of the high-risk regions were already resident in memory, so the image opened effectively instantly when dragged and viewed. The AI devoted 90% of GPU computing power to analyzing nuclear atypia at the resection margin and avoided unnecessary computation on normal smooth muscle of the gastric wall, reducing the processing time per image from 5 minutes to 1 minute.
[0092] Application 2 (Validation in scenarios with different histological distribution characteristics):
[0093] To verify the generalization ability of the present invention, WSIs of three disease types with different histological distribution characteristics were selected (average file size: 25 GB to 30 GB per image; scan magnification: 40x).
[0094] Scenario A: Breast-conserving surgery and sentinel lymph node biopsy for breast cancer (abundant fat, extremely small lesions)
[0095] Clinical prior text: "Sentinel lymph nodes were submitted for examination after surgery for invasive ductal carcinoma of the left breast. Please focus on screening for lymph node micrometastases."
[0096] WSI characteristics: The slides contain a large amount of diagnostically uninformative normal adipose tissue (low information entropy) and only a very small proportion of lymph node tissue.
[0097] Scenario B: Lobectomy specimen for lung cancer (numerous cavities; margins must be examined)
[0098] Clinical prior text: "Right upper lobectomy revealed grayish-white nodules. Please assess pleural involvement and the bronchial resection margin."
[0099] WSI characteristics: Lung tissue contains numerous alveolar spaces (large blank, noisy areas), and the real diagnostic challenge lies in detecting tumor penetration of the pleura at the periphery.
[0100] Scenario C: Radical resection specimen for colorectal cancer (distinct layers; deep layers must be examined)
[0101] Clinical prior text: "Ulcerative mass in the ascending colon; please assess the depth of tumor invasion (whether it penetrates the muscle layer) and intravascular tumor emboli."
[0102] WSI characteristics: The intestinal wall has an obvious layered topology (mucosa, submucosa, and muscular layer), and high-precision computational coverage of the blood vessels deep in the muscular layer is required.
[0103] The specific experimental data are shown in Table 1 below:
[0104] Table 1
[0105]
[0106] Using disease scenarios with different histological characteristics (breast cancer, lung cancer, and colorectal cancer) as test benchmarks, these experimental data verify the generalization ability of the present invention and its significant technical advantages in practical clinical applications.
[0107] 1. Effectively breaks through I/O bottlenecks, achieving ultra-fast rendering and reduced memory usage.
[0108] Experimental data show that, compared with the traditional uniform loading strategy, the present invention greatly reduces front-end high-magnification rendering latency. In the breast cancer and lung cancer scenarios, the waiting time when doctors drag high-resolution images dropped from 1850 ms and 2100 ms to 120 ms and 180 ms, respectively, a latency reduction of at least 91.4%. In addition, by lazily loading or downgrading non-critical regions, peak host memory and GPU memory usage in the lung cancer and colorectal cancer scenarios was reduced by 77.2% and 69.0%, respectively, fundamentally eliminating the risk of system out-of-memory (OOM) failures.
[0109] 2. Asymmetric computing power scheduling greatly saves AI computing resources and improves inference efficiency.
[0110] The present invention allocates computing power precisely through semantic-spatial dynamic scheduling. Rather than following the traditional practice of running large models over 100% of the slide, the system activates heavy models only in high-weight regions. For example, heavy models are activated in only 12% of the area in the breast cancer scenario and only 35% in the colorectal cancer scenario, reducing computing power consumption by 88% and 65%, respectively. Thanks to this asymmetric, fine-grained allocation, the average total AI inference time per image decreased from 325 seconds to 68 seconds, improving overall processing efficiency by a factor of approximately 4.7.
[0111] 3. Improves model robustness and significantly reduces false-negative and false-positive rates.
[0112] In terms of diagnostic accuracy, the present invention overcomes the feature dilution and noise interference caused by evenly amortizing computing power in the traditional manner. In the sentinel lymph node scenario with small lesions (Scenario A), because the heavy model's computing power is concentrated on high-weight regions, the false-negative rate for micrometastases decreased from 4.5% to 0.8%. In the lung cancer pleural scenario with many cavities (Scenario B), regions not targeted by the text are downgraded to coarse screening, which significantly reduces interference from background noise such as inflammatory cells and lowers the false-positive rate caused by over-scanning from 6.2% to 1.5%.
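The percentages quoted in items 1 and 2 follow from simple relative-change arithmetic, which the short check below reproduces from the reported figures. `reduction_pct` and `relative_compute` are illustrative helpers. The `light_cost_ratio` parameter, the light model's per-area cost relative to the heavy model, is an assumption that the text does not quantify.

```python
def reduction_pct(before: float, after: float) -> float:
    """Relative reduction in percent: (before - after) / before * 100."""
    return (before - after) / before * 100.0


def relative_compute(active_frac: float, light_cost_ratio: float = 0.0) -> float:
    """Inference cost relative to running the heavy model on the whole slide.

    active_frac: fraction of slide area routed to the heavy model.
    light_cost_ratio: hypothetical light-model cost per unit area divided by the
        heavy-model cost; 0.0 counts heavy-model compute only.
    """
    return active_frac + (1.0 - active_frac) * light_cost_ratio


# Item 1: drag latency (ms) before -> after, Scenarios A and B.
for name, (before, after) in {"breast (A)": (1850, 120), "lung (B)": (2100, 180)}.items():
    print(f"{name}: latency {reduction_pct(before, after):.1f}% lower")

# Item 2: heavy-model activation fraction, Scenarios A and C.
for name, frac in {"breast (A)": 0.12, "colorectal (C)": 0.35}.items():
    print(f"{name}: heavy-model compute {1.0 - relative_compute(frac):.0%} lower")
```

The latency reductions come out at 93.5% (breast) and 91.4% (lung), so 91.4% is the lower bound. The 88% and 65% compute figures equal one minus the heavy-model activation fraction, which corresponds to `light_cost_ratio = 0.0`. If a nonzero light-model cost were counted (for example, a hypothetical ratio of 0.1), the Scenario A total would be 0.208 of the baseline, an overall saving of 79.2%.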
[0113] Those skilled in the art should understand that the technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments have been described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0114] The above embodiments are merely illustrative of several implementations of the present invention, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention should be determined by the appended claims.
Claims
1. A method for pathological WSI dynamic rendering and computing power scheduling based on multi-modal semantic-spatial mapping, characterized in that it comprises the following steps: acquiring the clinical prior text and the corresponding whole slide image (WSI) of a target patient; inputting the clinical prior text into a large language model for parsing, extracting diagnostic intent labels with initial semantic risk weights, and converting the diagnostic intent labels into pathological semantic feature vectors; extracting a low-resolution panoramic image of the WSI, extracting local visual features of the panoramic image through a lightweight segmentation network, performing cross-modal similarity matching between the local visual features and the pathological semantic feature vectors, and mapping regions that meet the similarity requirement to a two-dimensional physical tile coordinate set at a high-resolution level; and, for each image tile to be processed, calculating its scheduling priority by combining its semantic risk weight, local image information entropy, and rendering cost, and generating a two-dimensional spatial computing power weight matrix from the scheduling priorities of all tiles. Specifically, calculating the scheduling priority of an image tile comprises: computing a weighted sum of the semantic risk weight and the local image information entropy of the image tile, and dividing this weighted sum by the sum of the rendering cost of loading the image tile from storage into GPU memory and a small smoothing constant, the resulting ratio being the final scheduling priority.
The local image information entropy characterizes image complexity after excluding blank areas and uniform bubble interference; the rendering cost is obtained by jointly evaluating file size, storage read latency, network transmission latency, and decoding/transfer time. The two-dimensional spatial computing power weight matrix is input into an underlying resource scheduling engine, which executes a memory rendering scheduling strategy and/or a computing power allocation strategy for an artificial intelligence inference model based on the scheduling priority. The memory rendering scheduling strategy comprises: forcibly pushing image tiles whose scheduling priority in the two-dimensional spatial computing power weight matrix is greater than a first threshold into the system's high-speed cache, and executing lazy loading or discard instructions for image tiles whose scheduling priority is lower than or equal to the first threshold. The computing power allocation strategy for the artificial intelligence inference model comprises: enabling an adaptive cascaded deep learning framework; calling a heavy inference model for fine recognition and analysis of image tiles whose scheduling priority in the two-dimensional spatial computing power weight matrix is greater than a second threshold; and, for image tiles whose scheduling priority is lower than or equal to the second threshold, downgrading to a lightweight coarse screening model for pre-screening or skipping processing.
2. The pathological WSI dynamic rendering and computing power scheduling method of claim 1, wherein the step of performing cross-modal similarity matching between the local visual features and the pathological semantic feature vector specifically comprises: generating candidate tissue connected components of the low-resolution panoramic image and their local visual features using a lightweight morphological segmentation network; querying a preset cross-modal mapping dictionary and calculating a similarity score between the pathological semantic feature vector and the local visual features of each candidate tissue connected component; and selecting candidate connected components whose similarity scores are higher than a preset threshold as target regions, and linearly mapping the target regions to the two-dimensional physical tile coordinate set at the high-resolution level according to the scaling relationship between levels of the image pyramid.
3. The pathological WSI dynamic rendering and computing power scheduling method of claim 2, wherein the method of constructing the cross-modal mapping dictionary comprises: annotating typical anatomical or pathological regions on low-resolution pathological images to form a visual prototype library; inputting standardized pathological terms and the corresponding annotated regions into a text encoder and an image encoder, respectively; and completing cross-modal alignment training through vision-language contrastive learning to obtain a semantic-visual embedding space for retrieval.
4. The pathological WSI dynamic rendering and computing power scheduling method of claim 1, wherein the first threshold and the second threshold are dynamic adaptive thresholds, and the method further comprises: dynamically adjusting the first threshold and the second threshold in real time from a preset baseline threshold, taking into account the size of the whole slide image, the number of candidate tiles, the current remaining host memory, the remaining GPU memory, and the degree of input/output congestion.
5. The pathological WSI dynamic rendering and computing power scheduling method of claim 1, further comprising an anti-hallucination confidence verification and system rollback mechanism: when the two-dimensional physical tile coordinate set obtained by cross-modal matching is empty, the cross-modal matching confidence is lower than a safety threshold, or an input/output anomaly occurs in the system, the system rollback mechanism of the underlying resource scheduling engine is triggered, scheduling according to the two-dimensional spatial computing power weight matrix is stopped, and the resource allocation mode is rolled back to full-slide uniform scanning and sequential loading.
6. A system for implementing the pathological WSI dynamic rendering and computing power scheduling method of any one of claims 1-5, characterized in that it comprises: a multi-source data acquisition and parsing module, configured to acquire clinical prior text and whole slide images, extract diagnostic intent labels with semantic risk weights through a large language model, and convert them into pathological semantic feature vectors; an image topology sensing module, configured to extract a low-resolution panoramic image, obtain local visual features, and map high-risk areas to a two-dimensional physical tile coordinate set at high resolution through cross-modal matching; a computing power matrix generation module, configured to calculate scheduling priorities by combining the semantic risk weights, local image information entropy, and rendering costs of image tiles, and to generate a two-dimensional spatial computing power weight matrix; and an underlying resource scheduling engine, configured to receive the two-dimensional spatial computing power weight matrix and accordingly perform memory prefetch scheduling and dynamic computing power allocation for the image tiles and the artificial intelligence cascade model.
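The scheduling-priority computation of claim 1 can be written as P = (α·W_sem + β·H) / (C + ε), where W_sem is the semantic risk weight, H the local information entropy, C the rendering cost, and ε the smoothing constant. The sketch below is a minimal illustration of that formula under stated assumptions. The weights α = 1.0 and β = 0.25, ε = 1e-6, the near-white cut-off of 220 used to exclude blank areas, the 16-level quantisation, and the additive rendering-cost model are all hypothetical, as are the function names. The final step applies the claim-1 memory strategy with a first threshold.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Tile = Tuple[int, int]  # (row, col) in the high-resolution tile grid


def tissue_entropy(pixels: Sequence[int], background: int = 220, bins: int = 16) -> float:
    """Shannon entropy (bits) of grey levels, ignoring near-white background pixels.

    The background cut-off is a hypothetical stand-in for excluding blank areas
    and uniform bubbles.
    """
    tissue = [p for p in pixels if p < background]
    if not tissue:
        return 0.0
    counts = Counter(p * bins // 256 for p in tissue)  # quantise 0..255 into `bins` levels
    n = len(tissue)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def rendering_cost(size_mb: float, read_ms: float, net_ms: float, decode_ms: float,
                   mb_per_ms: float = 1.0) -> float:
    """Hypothetical additive cost (ms): transfer time for the tile plus fixed latencies."""
    return size_mb / mb_per_ms + read_ms + net_ms + decode_ms


def priority(w_sem: float, entropy: float, cost: float,
             alpha: float = 1.0, beta: float = 0.25, eps: float = 1e-6) -> float:
    """P = (alpha * W_sem + beta * H) / (cost + eps); alpha, beta, eps are assumptions."""
    return (alpha * w_sem + beta * entropy) / (cost + eps)


def weight_matrix(tiles: Dict[Tile, Tuple[float, float, float]],
                  n_rows: int, n_cols: int) -> List[List[float]]:
    """Build the 2-D priority grid; tiles not listed keep priority 0."""
    m = [[0.0] * n_cols for _ in range(n_rows)]
    for (r, c), (w_sem, h, cost) in tiles.items():
        m[r][c] = priority(w_sem, h, cost)
    return m


def memory_plan(m: List[List[float]], first_threshold: float) -> Tuple[List[Tile], List[Tile]]:
    """Split tiles into 'push to cache' (> threshold) and 'lazy load or discard'."""
    push: List[Tile] = []
    lazy: List[Tile] = []
    for r, row in enumerate(m):
        for c, p in enumerate(row):
            (push if p > first_threshold else lazy).append((r, c))
    return push, lazy


textured = [0, 64, 128, 192] * 16  # four equally frequent grey levels
blank = [250] * 64                 # all background
cost = rendering_cost(size_mb=0.4, read_ms=0.3, net_ms=0.2, decode_ms=0.1)
tiles = {(0, 0): (0.9, tissue_entropy(textured), cost),  # high semantic risk
         (0, 1): (0.1, tissue_entropy(blank), cost)}     # low risk, empty background
m = weight_matrix(tiles, n_rows=1, n_cols=2)
push, lazy = memory_plan(m, first_threshold=0.5)
print(push, lazy)
```

Dividing by the rendering cost favours tiles that are both informative and cheap to fetch. The ε term only guards against a zero cost.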
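The matching and mapping step of claim 2 can be sketched as follows. Cosine similarity is used as the similarity score, and each selected region's thumbnail bounding box is scaled linearly to the high-resolution level. `MAPPING_DICT` is a hypothetical stand-in for the preset cross-modal mapping dictionary, and its toy 3-dimensional embeddings, the 256-pixel tile size, and the choice of cosine similarity are assumptions. The scale factor of 32 corresponds to the 1.25x thumbnail and 40x scan magnification mentioned in the description.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in thumbnail pixels, x1/y1 exclusive
Tile = Tuple[int, int]            # (row, col) at the high-resolution level

# Hypothetical stand-in for the preset cross-modal mapping dictionary:
# standard term -> embedding in the shared semantic-visual space.
MAPPING_DICT: Dict[str, List[float]] = {
    "resection margin": [1.0, 0.0, 0.0],
    "lymph node": [0.0, 1.0, 0.0],
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def bbox_to_tiles(bbox: BBox, scale: float, tile_size: int) -> Set[Tile]:
    """Linearly scale a thumbnail box to the high-res level and list covered tiles."""
    x0, y0, x1, y1 = bbox
    c0, r0 = int(x0 * scale) // tile_size, int(y0 * scale) // tile_size
    # Subtract 1 because the scaled right/bottom edges are exclusive.
    c1, r1 = (int(x1 * scale) - 1) // tile_size, (int(y1 * scale) - 1) // tile_size
    return {(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}


def match_regions(term: str, regions: List[Tuple[BBox, List[float]]], threshold: float,
                  scale: float, tile_size: int = 256) -> Set[Tile]:
    """Keep candidate regions whose similarity to the term exceeds `threshold`."""
    query = MAPPING_DICT[term]
    coords: Set[Tile] = set()
    for bbox, feature in regions:
        if cosine(query, feature) > threshold:
            coords |= bbox_to_tiles(bbox, scale, tile_size)
    return coords


scale = 40 / 1.25  # 1.25x thumbnail -> 40x level: factor 32
regions = [((8, 0, 24, 8), [0.9, 0.1, 0.0]),         # margin-like component
           ((100, 100, 120, 120), [0.0, 0.2, 1.0])]  # unrelated tissue
print(sorted(match_regions("resection margin", regions, threshold=0.7, scale=scale)))
```

A query that matches nothing, such as "lymph node" against these two toy regions, returns an empty set. Under claim 5, that empty result is exactly the condition that triggers the rollback to uniform scanning.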
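Claim 3 names vision-language contrastive learning but does not specify the training objective. One common choice is the symmetric InfoNCE loss popularized by CLIP, in which each (pathological term, annotated region) pair is a positive and every other pairing in the batch is a negative. The sketch below only evaluates that loss on toy embeddings; it includes no encoders and no optimizer. `clip_style_loss` and the temperature of 0.07 are assumptions.

```python
import math
from typing import List

Vec = List[float]


def normalise(v: Vec) -> Vec:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def logsumexp(xs: List[float]) -> float:
    m = max(xs)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in xs))


def clip_style_loss(text: List[Vec], image: List[Vec], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of (term, annotated region) embedding pairs.

    Pair i is the positive for row/column i; every other pair in the batch is a negative.
    """
    t = [normalise(v) for v in text]
    im = [normalise(v) for v in image]
    n = len(t)
    logits = [[sum(a * b for a, b in zip(t[i], im[j])) / temperature for j in range(n)]
              for i in range(n)]
    # Text -> image: cross-entropy of each row against its diagonal entry.
    t2i = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Image -> text: cross-entropy of each column against its diagonal entry.
    i2t = sum(logsumexp([logits[i][j] for i in range(n)]) - logits[j][j]
              for j in range(n)) / n
    return (t2i + i2t) / 2


aligned = clip_style_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
swapped = clip_style_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
print(aligned, swapped)
```

The loss is near zero when each term embedding already points at its own region and large when the pairs are swapped. Training the two encoders would push matched pairs toward the first case, producing the retrievable semantic-visual space that the dictionary lookup in claim 2 relies on.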