Method for processing DICOM file sequence, medical image analysis method and system

By generating dynamic subsequence partitioning strategies and online fine-tuning mechanisms in a heterogeneous computing environment, the problems of low resource utilization and long processing time are solved, enabling efficient and progressive medical image analysis, adapting to dynamic load changes, and improving user experience and the degree of analysis automation.

CN121601170B · Active · Publication Date: 2026-06-19 · BEIJING HUAYI NETWORK TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: BEIJING HUAYI NETWORK TECH CO LTD
Filing Date: 2026-01-28
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing technologies have low resource utilization and long processing time in heterogeneous computing environments, and traditional parallel processing methods lack incremental feedback, making it difficult to meet the real-time needs of clinical practice.

Method used

By generating a dynamic subsequence partitioning strategy through parallelism adaptation analysis, image subsequences are non-uniformly distributed to computing units, and an online fine-tuning mechanism and progressive result integration are introduced to dynamically adjust task load and resource allocation.

Benefits of technology

It improves the utilization of heterogeneous computing resources, shortens processing time, enables progressive result output, improves user experience, adapts to dynamic load changes, and enhances the automation and reliability of medical image analysis results.



Abstract

This invention belongs to the field of image processing, specifically relating to a method for processing DICOM file sequences, a medical image analysis method, and a system, aiming to solve the problems of low resource utilization and long processing time of existing parallel processing technologies in heterogeneous environments. The method includes: acquiring and preprocessing the original DICOM file sequence; performing parallelism adaptation analysis based on the total number of images and currently available computing resources to generate a dynamic subsequence partitioning strategy that determines the total number of subsequences and the non-uniform number of images in each subsequence; non-uniformly partitioning the image sequence into multiple image subsequences according to this strategy; allocating each image subsequence to an independent image processing thread for parallel execution, and monitoring the load in real time to fine-tune the task allocation online; and triggering dynamic integration whenever any thread completes its calculation, merging the processing results according to their order in the original sequence to generate the final processed image sequence. This application significantly shortens the overall processing time and improves processing efficiency and user experience.

Description

Technical Field

[0001] This invention belongs to the field of image processing, and specifically relates to a method for processing DICOM file sequences, a medical image analysis method, and a system.

Background Technology

[0002] With the development of modern medical imaging technologies such as computed tomography (CT) and magnetic resonance imaging (MRI), the resulting file sequences conforming to the Digital Imaging and Communications in Medicine (DICOM) standard contain very large numbers of image frames and therefore an enormous data volume. Image processing of these sequences, including 3D reconstruction, lesion identification, and quantitative analysis, is a common requirement in clinical diagnosis and medical research. However, due to the sheer volume of data, traditional serial processing methods are extremely time-consuming and cannot meet the real-time requirements of clinical practice.

[0003] To improve processing efficiency, existing technologies typically employ parallel computing methods. A common parallel strategy is to evenly divide an image sequence into several subsequences and distribute them across multiple computing cores, such as CPU cores or GPU stream processors, for simultaneous processing. However, this simple even partitioning strategy faces significant efficiency bottlenecks in practical applications. First, modern computing systems are often heterogeneous, meaning that different computing units, such as different models of CPU cores, or CPUs and GPUs, have varying processing capabilities. Evenly distributing tasks on such systems leaves the more powerful computing units idle after they complete their tasks while they wait for weaker units, creating a bottleneck and unnecessarily lengthening the overall processing time. Second, system load is dynamic; other processes may preempt computing resources, so that a statically allocated task load becomes unsuitable during execution, further reducing resource utilization. Furthermore, traditional parallel processing models often employ a fork-join pattern, which requires waiting for all subtasks to complete before integrating the results. Users therefore wait a long time for the final results and receive no progressive feedback in the meantime.

[0004] Therefore, how to design a parallel processing method that can fully utilize heterogeneous computing resources, dynamically adapt to changes in system load, and achieve rapid and progressive integration of processing results is a technical problem that urgently needs to be solved in the field of medical image processing.

Summary of the Invention

[0005] To address the aforementioned problems in existing technologies, namely the low resource utilization and long processing time of existing parallel processing technologies in heterogeneous environments, this invention provides a method for processing DICOM file sequences, a medical image analysis method, and a system.

[0006] In a first aspect, the present invention provides a method for processing DICOM file sequences, comprising the following steps:

[0007] Acquire and preprocess the raw DICOM file sequence to generate a standardized image sequence;

[0008] Parallelism adaptation analysis is performed on the standardized image sequence. The parallelism adaptation analysis is based on the total number of images and the currently available computing resources, and generates a dynamic subsequence partitioning strategy that determines the total number of subsequences and the non-uniform number of images in each subsequence.

[0009] Based on the dynamic subsequence partitioning strategy, the standardized image sequence is non-uniformly divided into multiple image subsequences;

[0010] Each image subsequence is assigned to an independent image processing thread, and the same target image processing algorithm is executed in parallel in each image processing thread. At the same time, the starting index of each image subsequence in the original sequence is recorded.

[0011] Based on the starting index and the calculation completion status of each image processing thread, all processed image subsequences are dynamically integrated. When any thread completes its calculation, it triggers the merging of the corresponding processing results according to the order of each image subsequence in the original sequence to generate the final processed image sequence.

[0012] Furthermore, the parallelism adaptation analysis includes:

[0013] The currently available computing resources are mapped into multiple computing units, and a resource topology model is established for each computing unit. The baseline throughput of each computing unit is calculated, and the total number of subsequences is set to be equal to the number of computing units.

[0014] The computing units are arranged in descending order based on their baseline throughput to form an ordered sequence of computing units.

[0015] A task volume gradient allocation model is constructed. The task volume gradient allocation model defines the functional relationship between the difference in the number of images allocated to any two adjacent computing units in the ordered computing unit sequence and the difference in the baseline throughput of the two computing units through a preset decay function.

[0016] Based on the aforementioned functional relationship and the total number of images, the task volume gradient allocation model is solved to determine the specific number of images contained in each subsequence, thereby generating the dynamic subsequence partitioning strategy.

[0017] Furthermore, solving the task volume gradient allocation model based on the aforementioned functional relationship and the total number of images, to determine the specific number of images contained in each subsequence and thereby generate the dynamic subsequence partitioning strategy, proceeds as follows:

[0018] An initial image quantity baseline value to be allocated is set for the ordered computation unit sequence;

[0019] Based on the preset decay function, the difference between the baseline throughput of the first-ranked computing unit and the baseline throughput of the second-ranked computing unit in the ordered computing unit sequence is used to calculate the difference in the number of image allocations between the first-ranked computing unit and the second-ranked computing unit.

[0020] The number of images allocated to the second-ranked computing unit is determined by subtracting this allocation difference from the initial image quantity baseline value;

[0021] The preset decay function is applied iteratively, along with the number of images allocated to the previous calculation unit in the ordered calculation unit sequence, to sequentially determine the number of images allocated to each subsequent calculation unit.

[0022] The number of images allocated to all computing units in the ordered computing unit sequence is summed, and the summation result is compared with the total number of images;

[0023] If the summation result is not equal to the total number of images, the initial image quantity baseline value is adjusted according to the difference, and the steps starting from calculating the difference in the number of image allocations are repeated for iterative calculation.

[0024] The iteration terminates when the summation result matches the total number of images. At this point, the number of images allocated to each computing unit constitutes the dynamic subsequence partitioning strategy.
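The iterative procedure in [0018] to [0024] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the patent does not specify the preset decay function, so `linear_decay` is a hypothetical stand-in, and the final rounding of fractional counts to whole images (spare images go to the fastest units first) is likewise an assumption. The names `gradient_allocate` and `linear_decay` are illustrative only.

```python
import math
from typing import Callable, List, Sequence


def linear_decay(throughput_gap: float) -> float:
    """Hypothetical decay function: the allocation gap equals the throughput gap."""
    return throughput_gap


def gradient_allocate(
    throughputs: Sequence[float],
    total_images: int,
    decay: Callable[[float], float] = linear_decay,
    max_iter: int = 100,
) -> List[int]:
    """Return image counts for units ordered by descending baseline throughput."""
    ordered = sorted(throughputs, reverse=True)
    k = len(ordered)
    if k == 0:
        raise ValueError("at least one computing unit is required")
    base = total_images / k  # initial baseline value for the first-ranked unit
    counts: List[float] = []
    for _ in range(max_iter):
        # Walk down the ordered units, subtracting the decayed throughput gap.
        counts = [base]
        for faster, slower in zip(ordered, ordered[1:]):
            counts.append(counts[-1] - decay(faster - slower))
        residual = total_images - sum(counts)
        if abs(residual) < 1e-9:
            break
        base += residual / k  # adjust the baseline by the mismatch and retry
    if min(counts) < 0:
        raise ValueError("decay function is too steep for this image total")
    # Assumption: round down, then hand spare images to the fastest units.
    whole = [math.floor(c) for c in counts]
    leftover = total_images - sum(whole)
    for i in range(leftover):
        whole[i % k] += 1
    return whole
```

For example, `gradient_allocate([60, 100, 40], 300)` orders the units as 100, 60, 40 and returns one count per unit in that order, with the counts summing to 300.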

[0025] Furthermore, each image subsequence is assigned to an independent image processing thread, and the same target image processing algorithm is executed in parallel in each image processing thread. Simultaneously, the starting index of each image subsequence in the original sequence is recorded. The method is as follows:

[0026] Based on the dynamic subsequence partitioning strategy, an image processing thread pool corresponding to the total number of subsequences is created, and each image processing thread in the pool is associated with a physical computing unit in the ordered computing unit sequence.

[0027] Based on the non-uniform number of images in each image subsequence, the start index and end index of each image subsequence in the standardized image sequence are calculated sequentially, and the corresponding image data blocks are mapped to the private memory space of each image processing thread through memory mapping technology.

[0028] Each image processing thread independently executes the target algorithm based on the data in its own memory space, and monitors the actual load indicators of its associated computing units;

[0029] The execution progress and load metrics of all image processing threads are periodically collected. When the actual performance metrics of the associated computing units monitored within the preset continuous monitoring period deviate from the corresponding baseline throughput, the online fine-tuning of the data block range allocated to the image processing threads associated with the computing units whose performance metrics deviate is triggered based on the real-time load and the number of remaining unprocessed images.

[0030] The online fine-tuning is achieved by dynamically adjusting the offset and length of the memory mapping region corresponding to the affected image processing thread, and after completion, a synchronization event is sent to the affected image processing thread to update the data range and starting index processed by the image processing thread.

[0031] After each image processing thread completes the processing of the current data block, it associates the final processing result with the starting index it maintains and outputs it.
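The index bookkeeping in [0026] to [0031] can be sketched as follows. This is a simplified illustration, not the patented implementation: a standard-library thread pool stands in for the thread pool bound to physical computing units, plain list slicing stands in for memory mapping, and load monitoring is omitted. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate
from typing import Callable, List, Sequence, Tuple


def subsequence_ranges(counts: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn per-subsequence image counts into half-open [start, end) index ranges."""
    ends = list(accumulate(counts))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def process_block(images: Sequence, start: int, end: int, algorithm: Callable):
    """Run the target algorithm on one block and tag the result with its start index."""
    return start, [algorithm(image) for image in images[start:end]]


def run_parallel(images: Sequence, counts: Sequence[int], algorithm: Callable):
    """Process every subsequence in its own thread; results keep their start index."""
    ranges = subsequence_ranges(counts)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        futures = [
            pool.submit(process_block, images, start, end, algorithm)
            for start, end in ranges
        ]
        return [future.result() for future in futures]
```

Because every result carries its starting index, the blocks can later be merged in original order regardless of which thread finishes first.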

[0032] Furthermore, the online fine-tuning is achieved by dynamically adjusting the offset and length of the corresponding memory-mapped region, including:

[0033] Based on the degree and duration of deviation between the actual performance indicators of the associated computing units and the baseline throughput, it is determined whether to trigger a fine-tuning operation.

[0034] When fine-tuning is triggered, the adjustment amount for the affected thread is calculated based on the actual load index and the number of remaining unprocessed images;

[0035] Based on the adjustment amount, define the data sub-blocks to be cut from the range of data blocks currently allocated to the affected threads;

[0036] The association between the data sub-block and the original memory mapping region is released using atomic operations;

[0037] The released data sub-block is re-associated with the candidate thread that currently has the lowest real-time load, by creating for that candidate thread a private memory mapping region pointing to the starting physical address of the data sub-block.

[0038] Synchronously update the offset address and length of the memory mapping region corresponding to the affected thread and the candidate thread, as well as the start index and end index maintained by each thread.
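The rebalancing logic in [0033] to [0038] can be sketched as follows. This is a minimal sketch under several assumptions that the patent leaves open: index ranges stand in for memory-mapped regions, a lock stands in for the atomic operations, the trigger rule (`threshold`, `min_periods`) and the adjustment amount (remaining images multiplied by the relative deviation) are hypothetical choices, and the sub-block is cut from the tail of the affected thread's range.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class WorkerState:
    name: str
    next_index: int   # first image not yet processed
    end_index: int    # exclusive end of the currently assigned block
    load: float       # latest real-time load indicator, 0.0 (idle) to 1.0 (busy)
    extra_blocks: List[Tuple[int, int]] = field(default_factory=list)

    @property
    def remaining(self) -> int:
        return self.end_index - self.next_index


_rebalance_lock = threading.Lock()


def fine_tune(
    affected: WorkerState,
    workers: Sequence[WorkerState],
    deviation: float,
    periods_over: int,
    threshold: float = 0.2,
    min_periods: int = 3,
) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Move a tail sub-block from a lagging worker to the least-loaded worker."""
    # Trigger only on a deviation that is both large enough and sustained.
    if deviation <= threshold or periods_over < min_periods:
        return None
    with _rebalance_lock:
        # Assumed rule: give away a share of the remaining work, keep at least one image.
        amount = min(affected.remaining - 1, int(affected.remaining * deviation))
        candidates = [w for w in workers if w is not affected]
        if amount <= 0 or not candidates:
            return None
        candidate = min(candidates, key=lambda w: w.load)
        cut_start = affected.end_index - amount
        block = (cut_start, affected.end_index)
        affected.end_index = cut_start        # shrink the affected range
        candidate.extra_blocks.append(block)  # re-associate the sub-block
        return candidate.name, block
```

Both ranges are updated inside the same critical section, so no image index is ever owned by two workers or by none.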

[0039] Furthermore, based on the starting index and the computation completion status of each image processing thread, all processed image subsequences are dynamically integrated, as follows:

[0040] Set up a final result buffer that matches the total number of images, and start a separate event listener process to poll the computation completion status of each image processing thread;

[0041] When any thread completes its calculation, immediately obtain the image subsequence processing result and the starting index of the record output by that image processing thread;

[0042] Based on the obtained starting index, the corresponding processing result data is written sequentially into the corresponding logical storage area of the final result buffer;

[0043] In a global status table, the logical range covered by this write result is marked as integrated;

[0044] Based on the global state table, continuously determine whether there is a continuous and uninterrupted integrated interval starting from the logical start position of the standardized image sequence;

[0045] If it exists, mark the buffer data corresponding to the longest continuous integrated interval that has been formed as committable.

[0046] The process of polling, acquiring, writing, marking, and judging is repeated until the global state table indicates that all logical intervals of the standardized image sequence have been marked as integrated, and the complete interval from the beginning to the end is in a committable state. At this time, the data in the final result buffer constitutes the final processed image sequence.
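The buffer, global status table, and committable interval described in [0040] to [0046] can be sketched as follows. This is an illustrative sketch only: the class name `ProgressiveIntegrator` is hypothetical, the status table is modelled as one flag per image, and the event listener is left out (a caller would invoke `write` each time a thread reports completion).

```python
from typing import Any, List, Sequence


class ProgressiveIntegrator:
    """Final result buffer plus a status table tracking which images are integrated."""

    def __init__(self, total_images: int) -> None:
        self.buffer: List[Any] = [None] * total_images
        self.integrated: List[bool] = [False] * total_images  # global status table
        self.committable_end = 0  # exclusive end of the contiguous prefix

    def write(self, start_index: int, results: Sequence[Any]) -> int:
        """Store one finished block and return the new committable prefix length."""
        end = start_index + len(results)
        if start_index < 0 or end > len(self.buffer):
            raise ValueError("block falls outside the result buffer")
        self.buffer[start_index:end] = list(results)
        self.integrated[start_index:end] = [True] * len(results)
        # Extend the uninterrupted integrated interval starting at position 0.
        while (
            self.committable_end < len(self.buffer)
            and self.integrated[self.committable_end]
        ):
            self.committable_end += 1
        return self.committable_end

    @property
    def complete(self) -> bool:
        return self.committable_end == len(self.buffer)
```

Everything in `buffer[:committable_end]` can be handed to the user immediately, which is what makes the output progressive: a late block in the middle of the sequence delays only the images after it.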

[0047] In a second aspect, the present invention proposes a medical image analysis method based on a method for processing DICOM file sequences, comprising:

[0048] While generating the dynamic subsequence partitioning strategy, execution priority and resource allocation parameters are determined for at least two medical image analysis algorithms that are to be applied sequentially to the output results of the standardized image sequence or the target image processing algorithm.

[0049] After obtaining multiple image subsequences according to the dynamic subsequence partitioning strategy, a composite computation task is constructed for each image subsequence. The composite computation task executes the target image processing algorithm and at least two medical image analysis algorithms in sequence.

[0050] An independent image processing thread is created for each of the composite computing tasks and assigned to an independent computing unit. Within each image processing thread, the medical image analysis algorithm is called serially according to the execution priority.

[0051] After completing the composite computation task, each image processing thread generates a result data block containing all algorithm results and records the starting index corresponding to the processed image subsequence;

[0052] Start a result aggregation thread to monitor the completion status of each image processing thread. When a thread is detected to be complete, obtain the result data block output by that image processing thread and its corresponding starting index.

[0053] The result aggregation thread merges each result data block into the global result sequence according to the original logical order based on the starting index, and performs weighted fusion of the merged results according to the execution priority;

[0054] During the process of dynamically integrating all processed image subsequences, a progressive analysis report matching the processing procedure of the composite computing task is generated synchronously based on the integration status of the global result sequence.

[0055] Furthermore, the execution priority and resource allocation parameters for at least two medical image analysis algorithms are determined, including the following steps:

[0056] Obtain historical execution performance data of the at least two medical image analysis algorithms, wherein the historical execution performance data includes peak time for a single operation and peak memory usage.

[0057] Based on the data characteristics of the data processed by the medical image analysis algorithms, the degree of dependence of each medical image analysis algorithm on the data characteristics is analyzed, and the dependence level is defined;

[0058] Based on the peak computation time and memory usage in the historical execution performance data, as well as the dependency level, a quantified initial execution priority score is generated for each medical image analysis algorithm through a preset priority calculation function.

[0059] During the generation or application of the dynamic subsequence partitioning strategy, the initial execution priority score is mapped to specific resource allocation parameters based on the baseline throughput and free memory capacity of each computing unit. The resource allocation parameters include the proportion of computing time slices reserved for each algorithm on each computing unit and the maximum available memory quota.

[0060] Based on the resource allocation parameters, the resource quota allocated by each computing unit to the composite computing task is dynamically adjusted.

[0061] Furthermore, when the result aggregation thread acquires the result data block and its starting index, it synchronously receives the actual load indicators of the computing unit where it is located when processing the composite computing task, which are monitored and reported by each image processing thread.

[0062] Based on the actual load indicators, the effective throughput of the corresponding computing unit when executing various medical image analysis algorithms is re-evaluated;

[0063] The effective throughput is compared with the baseline throughput of the computing unit, and the deviation value is calculated.

[0064] During the process of merging each result data block into the global result sequence, a confidence factor that is dynamically adjusted according to the deviation value is attached to the algorithm result in each result data block. When the deviation value exceeds a preset threshold, the confidence factor is negatively adjusted.

[0065] When it is necessary to fuse multiple results from different result data blocks that belong to the same logical location and originate from the same algorithm, a weighted fusion coefficient is calculated based on the confidence factor attached to each result and the execution priority.

[0066] After completing the fusion processing of all currently available result data blocks, a computing resource efficiency assessment report is generated based on the overall distribution of the deviation values.

[0067] Based on the aforementioned computing resource efficiency assessment report, online fine-tuning of the resource allocation parameters of the computing units associated with the image processing threads that have not yet completed their computations is triggered.
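The confidence adjustment and weighted fusion in [0063] to [0065] can be sketched as follows. The patent does not give formulas, so everything numeric here is an assumption made for illustration: the deviation is taken as a relative difference, the negative adjustment is a fixed `penalty`, and the fusion coefficient is the normalized product of confidence and priority.

```python
from typing import List, Tuple


def confidence_factor(
    effective_throughput: float,
    baseline_throughput: float,
    threshold: float = 0.2,
    penalty: float = 0.5,
) -> Tuple[float, float]:
    """Return (confidence, deviation); confidence drops when deviation exceeds threshold."""
    deviation = abs(effective_throughput - baseline_throughput) / baseline_throughput
    confidence = 1.0 - penalty if deviation > threshold else 1.0
    return confidence, deviation


def weighted_fusion(results: List[Tuple[float, float, float]]) -> float:
    """Fuse (value, confidence, priority) triples for one logical location."""
    weights = [confidence * priority for _, confidence, priority in results]
    total = sum(weights)
    if total == 0:
        raise ValueError("all fusion weights are zero")
    return sum(value * w for (value, _, _), w in zip(results, weights)) / total
```

Under these assumptions, a result computed on a unit running well below its baseline contributes less to the fused value than an equally prioritized result from a unit running at its baseline.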

[0068] In a third aspect, the present invention provides a medical image analysis system for performing a medical image analysis method, the system comprising:

[0069] The strategy generation module is used to determine the execution priority and resource allocation parameters for at least two medical image analysis algorithms that are to be applied sequentially to the output results of the standardized image sequence or the target image processing algorithm while generating the dynamic subsequence partitioning strategy.

[0070] The task construction module is used to obtain multiple image subsequences according to the dynamic subsequence partitioning strategy, and then construct a composite computing task for each image subsequence. The composite computing task executes the target image processing algorithm and at least two medical image analysis algorithms in sequence.

[0071] The thread scheduling module is used to create an independent image processing thread for each of the composite computing tasks and allocate it to an independent computing unit. Within each image processing thread, the medical image analysis algorithm is called serially according to the execution priority.

[0072] The result encapsulation module is used by each image processing thread to generate a result data block containing all algorithm results after completing the corresponding composite calculation task, and to record the starting index corresponding to the processed image subsequence.

[0073] The result monitoring and collection module is used to start a result aggregation thread, monitor the completion status of each image processing thread, and when the thread is detected to be completed, obtain the result data block output by that image processing thread and the corresponding starting index.

[0074] The result fusion module is used by the result aggregation thread to merge each result data block into the global result sequence according to the original logical order based on the starting index, and to perform weighted fusion of the merged results according to the execution priority.

[0075] The report generation module is used to synchronously generate a progressive analysis report that matches the corresponding image processing process based on the integration status of the global result sequence during the process of dynamically integrating all processed image subsequences.

[0076] The beneficial effects of this invention are:

[0077] This invention fundamentally overcomes the limitations of traditional uniform task partitioning by introducing a parallelism adaptation analysis based on currently available computing resources and the total number of images, and generating a dynamic subsequence partitioning strategy for non-uniform image numbers. This method accurately matches the computational load with the actual processing capabilities of heterogeneous computing units, allocating more image data to computing units with higher baseline throughput. This capability-gradient task allocation mechanism effectively avoids the "weakest link" effect caused by a few slow units dragging down the overall progress, thus significantly improving the overall utilization of computing resources and shortening the overall processing time in a heterogeneous computing environment.

[0078] The online fine-tuning mechanism of this invention endows the system with robustness and adaptability in the face of dynamically changing loads. By continuously monitoring the actual performance indicators of each computing unit during parallel execution and dynamically adjusting the memory mapping region when a continuous deviation is detected, the system can achieve real-time rebalancing of the task load. This feature makes the processing not only dependent on the initial static partitioning, but also responsive to performance fluctuations caused by factors such as other processes preempting resources during runtime, ensuring that high processing efficiency and stability are maintained even in highly dynamic environments.

[0079] Unlike the synchronization mechanism of the traditional fork-join model, which requires waiting for all subtasks to complete, the dynamic integration method of this invention, based on event triggering and state polling, enables the progressive generation and output of processing results. Once any image processing thread completes its calculation, its results can be merged immediately, and in order, into the final sequence. Users can therefore obtain some or even most of the processing results without waiting for all data to be calculated, which greatly improves the user experience and provides a technological foundation for clinical scenarios requiring real-time or near-real-time feedback.

[0080] When this invention is further applied to the field of medical image analysis, it forms an efficient and intelligent end-to-end analysis workflow by constructing composite computational tasks, managing the priority and resource allocation of multiple algorithms, and introducing confidence factors and progressive report generation. This system not only inherits the efficiency and adaptability advantages of the underlying parallel processing framework, but also improves the automation level, the reliability of results, and the visualization level of medical image analysis through algorithm-level optimization and integration and result quality assessment, thus providing a more powerful and reliable tool for assisted diagnosis and medical research.

Brief Description of the Drawings

[0081] Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:

[0082] Figure 1 is a flowchart of a method for processing a DICOM file sequence according to the first embodiment of the present invention;

[0083] Figure 2 is a flowchart of a medical image analysis method according to the second embodiment of the present invention;

[0084] Figure 3 is a structural diagram of a medical image analysis system according to the third embodiment of the present invention;

[0085] Figure 4 is a schematic structural diagram of a computer system used to implement the methods, systems, and electronic devices of this application.

Detailed Implementation

[0086] The present application will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the invention. Furthermore, it should be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings.

[0087] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. This application will now be described in detail with reference to the accompanying drawings and embodiments.

[0088] The first embodiment of the present invention proposes a method for processing DICOM file sequences, comprising the following steps:

[0089] Step S10: Obtain and preprocess the original DICOM file sequence to generate a standardized image sequence;

[0090] Step S20: Perform parallelism adaptation analysis on the standardized image sequence. The parallelism adaptation analysis is based on the total number of images and the currently available computing resources, and generates a dynamic subsequence partitioning strategy that determines the total number of subsequences and the non-uniform number of images in each subsequence.

[0091] Step S30: According to the dynamic subsequence partitioning strategy, the standardized image sequence is non-uniformly divided into multiple image subsequences;

[0092] Step S40: Assign each image subsequence to an independent image processing thread, and execute the same target image processing algorithm in parallel in each image processing thread, while recording the starting index of each image subsequence in the original sequence;

[0093] Step S50: Based on the starting index and the calculation completion status of each image processing thread, dynamically integrate all processed image subsequences. When any thread completes its calculation, it triggers the merging of the corresponding processing results according to the order of each image subsequence in the original sequence to generate the final processed image sequence.

[0094] To illustrate the method for processing DICOM file sequences according to the present invention more clearly, the steps in this embodiment are described in detail below with reference to Figure 1:

[0095] Step S10: Obtain and preprocess the original DICOM file sequence to generate a standardized image sequence;

[0096] Step S10 in this embodiment is the foundation for all subsequent parallel processing. Its core objective is to transform raw medical image data, which may come from diverse sources with subtle format differences, into a unified, well-organized in-memory data structure suitable for high-performance computing. Specifically, this step first retrieves, in batches, all DICOM files constituting a complete examination sequence from a specified storage location, such as the local file system, network attached storage, or a picture archiving and communication system (PACS). During retrieval, the system automatically filters out non-DICOM files and files that do not belong to the target sequence based on file header information or metadata, so that only the target sequence enters the pipeline. After retrieval, a crucial sub-step is sequence reordering. Since the naming or arrangement order of files in physical storage may not match their logical anatomical or temporal order, all files must be sorted precisely. This sorting is not based simply on filenames. Instead, the metadata tags of each DICOM file are parsed, for example the Instance Number or the more reliable Slice Location and Image Position (Patient) tags, to reconstruct the order of all image frames in three-dimensional space or time, thereby forming a logically continuous raw image sequence.
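One common way to realize the metadata-based sorting described above is to project each slice's Image Position (Patient) (0020,0032) onto the slice normal derived from Image Orientation (Patient) (0020,0037), falling back to Instance Number (0020,0013) for ties. The sketch below assumes this approach; the patent itself does not prescribe it. `SliceMeta` is a hypothetical container that would be filled from each file's header by whichever DICOM parser is in use.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SliceMeta:
    filename: str
    instance_number: int                         # Instance Number (0020,0013)
    image_position: Tuple[float, float, float]   # Image Position (Patient) (0020,0032)
    image_orientation: Tuple[float, ...]         # Image Orientation (Patient), 6 values


def slice_normal(orientation: Sequence[float]) -> Tuple[float, float, float]:
    """Cross product of the row and column direction cosines."""
    rx, ry, rz, cx, cy, cz = orientation
    return (ry * cz - rz * cy, rz * cx - rx * cz, rx * cy - ry * cx)


def sort_slices(slices: Sequence[SliceMeta]) -> List[SliceMeta]:
    """Order slices by their position along the slice normal, not by filename."""
    if not slices:
        return []
    normal = slice_normal(slices[0].image_orientation)

    def sort_key(s: SliceMeta):
        distance = sum(p * n for p, n in zip(s.image_position, normal))
        return distance, s.instance_number

    return sorted(slices, key=sort_key)
```

Sorting by projected position stays correct for oblique acquisitions, where no single coordinate of the position vector increases monotonically from slice to slice.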

[0097] After the sequences are correctly sorted, the core stage of data preprocessing and standardization begins. This stage aims to eliminate data heterogeneity caused by different scanning devices, scanning parameters, or reconstruction algorithms. The system reads the sorted DICOM files one by one, extracting pixel data and key metadata. An important standardization operation is the conversion of pixel values. Pixel values in raw DICOM images are usually stored as integers, and their physical meaning must be interpreted through the window level and window width in the metadata or, more generally, the rescale slope and rescale intercept. This method applies this metadata to convert the raw stored pixel values into units with explicit physical meaning, such as Hounsfield units in CT images, thereby ensuring that image data from different sources are numerically comparable. To meet the requirements of specific downstream image processing algorithms, intensity normalization may also be performed, such as linearly scaling pixel values to the floating-point range of 0 to 1, or applying Z-score standardization so that the distribution has zero mean and unit variance, which is particularly important for many deep learning-based analysis models.
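The pixel-value conversions described above can be sketched as follows. This is a minimal standard-library illustration operating on flat lists of pixel values; a production pipeline would more likely use an array library. The function names are hypothetical, and returning all zeros for constant input is an assumption made to avoid division by zero.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def to_hounsfield(stored: Sequence[float], slope: float, intercept: float) -> List[float]:
    """Apply the linear rescale: value = stored * rescale slope + rescale intercept."""
    return [v * slope + intercept for v in stored]


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Linearly scale values to the range 0 to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0] * len(values)  # assumption: constant input maps to zeros
    return [(v - low) / (high - low) for v in values]


def z_score(values: Sequence[float]) -> List[float]:
    """Standardize values to zero mean and unit variance."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        return [0.0] * len(values)  # assumption: constant input maps to zeros
    return [(v - mean) / std for v in values]
```

For a CT slice stored with rescale slope 1 and rescale intercept -1024, a stored value of 1024 maps to 0 Hounsfield units.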

[0098] Besides pixel standardization, spatial standardization is equally crucial. Images in the original DICOM sequence may contain anisotropic voxels, meaning the interlayer spacing is inconsistent with the intralayer pixel spacing. To facilitate accurate 3D analysis, this method performs voxel resampling. Using interpolation algorithms such as trilinear interpolation, cubic spline interpolation, or more advanced methods, the entire 3D image dataset is resampled onto a uniform, isotropic voxel grid, for example, uniformly 1mm × 1mm × 1mm cubes. After this series of processes, the originally scattered and inconsistent DICOM file set is transformed and loaded into a continuous, uniformly typed, numerically standardized, and spatially standardized multidimensional array—the standardized image sequence—stored in computer memory. This standardized data block lays a solid foundation for subsequent parallel partitioning and efficient processing, ensuring that all parallel processing threads deal with homogeneous data, thereby simplifying algorithm design and improving processing stability and accuracy.
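The resampling idea can be illustrated along a single axis. The sketch below linearly interpolates between neighbouring slices to reach a target interlayer spacing; it is a one-dimensional reduction of trilinear interpolation under simplifying assumptions (each slice is a flat list of equal length, and in-plane resampling is omitted), and `resample_slices` is a hypothetical name.

```python
from typing import List, Sequence


def resample_slices(
    slices: Sequence[Sequence[float]], spacing: float, target_spacing: float
) -> List[List[float]]:
    """Resample a slice stack from `spacing` to `target_spacing` along the slice axis."""
    if len(slices) < 2:
        return [list(s) for s in slices]
    extent = (len(slices) - 1) * spacing  # physical length covered by the stack
    count = int(extent / target_spacing + 1e-9) + 1
    resampled = []
    for k in range(count):
        position = k * target_spacing / spacing  # fractional source slice index
        lower = min(int(position), len(slices) - 2)
        weight = position - lower
        below, above = slices[lower], slices[lower + 1]
        resampled.append([(1 - weight) * a + weight * b for a, b in zip(below, above)])
    return resampled
```

Two slices spaced 2 mm apart, resampled to 1 mm, yield three slices with the middle one halfway between the originals.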

[0099] Step S20: Perform parallelism adaptation analysis on the standardized image sequence. The parallelism adaptation analysis is based on the total number of images and the currently available computing resources to generate a dynamic subsequence partitioning strategy that determines the total number of subsequences and the number of non-uniform images in each subsequence.

[0100] In this embodiment, the parallelism adaptation analysis includes:

[0101] The currently available computing resources are mapped into multiple computing units, and a resource topology model is established for each computing unit. The baseline throughput of each computing unit is calculated, and the total number of subsequences is set to be equal to the number of computing units.

[0102] The computing units are arranged in descending order based on their baseline throughput to form an ordered sequence of computing units.

[0103] A task volume gradient allocation model is constructed. The task volume gradient allocation model defines the functional relationship between the difference in the number of images allocated to any two adjacent computing units in the ordered computing unit sequence and the difference in the baseline throughput of the two computing units through a preset decay function.

[0104] Based on the aforementioned functional relationship and the total number of images, the task quantity gradient allocation model is solved to determine the specific number of images contained in each subsequence, thereby generating the dynamic subsequence partitioning strategy.

[0105] In this embodiment, the task quantity gradient allocation model is solved, based on the aforementioned functional relationship and the total number of images, as follows:

[0106] An initial image quantity baseline value to be allocated is set for the ordered computation unit sequence;

[0107] Based on the preset decay function, the difference between the baseline throughput of the first-ranked computing unit and the baseline throughput of the second-ranked computing unit in the ordered computing unit sequence is used to calculate the difference in the number of image allocations between the first-ranked computing unit and the second-ranked computing unit.

[0108] Based on the difference between the initial image quantity benchmark value and the image allocation quantity, the image allocation quantity of the second-ranked calculation unit is determined;

[0109] The preset decay function is applied iteratively, along with the number of images allocated to the previous calculation unit in the ordered calculation unit sequence, to sequentially determine the number of images allocated to each subsequent calculation unit.

[0110] The number of images allocated to all computing units in the ordered computing unit sequence is summed, and the summation result is compared with the total number of images;

[0111] If the summation result is not equal to the total number of images, the initial image quantity benchmark value is adjusted according to the difference, and the steps starting from calculating the difference in the number of image allocations are repeated for iterative calculation.

[0112] The iteration terminates when the summation result matches the total number of images. At this point, the number of images allocated to each computing unit constitutes the dynamic subsequence partitioning strategy.

[0113] Step S20 in this embodiment is the core decision-making step in the entire parallel processing flow. Its goal is to generate an optimal initial task partitioning scheme before starting actual computation, based on the total data scale of the standardized image sequence to be processed and the specific status of computing resources that can be safely and stably called in the current operating environment, through a set of refined quantitative analysis and modeling calculations. The core feature of this scheme is its dynamism and non-uniformity. That is, the total number of subsequences strictly corresponds to the number of currently available computing units that have been evaluated for performance, while the number of image frames allocated to each subsequence is not equal, but is allocated in a gradient and differentiated manner according to the actual processing capabilities of the computing units associated with it. Thus, at the beginning of task distribution, it strives to achieve a precise match between computing load and hardware capabilities, maximizing the initial utilization efficiency of the heterogeneous computing cluster.

[0114] In practice, the parallelism adaptation analysis first initiates a resource discovery and modeling process. The system uses interfaces provided by the operating system or dedicated performance monitoring libraries to explore all available physical computing resources on the current computing node. These resources may include CPU physical cores of different performance levels, streaming multiprocessor clusters in GPU devices supporting parallel computing, and even computing nodes available for remote invocation. The system abstracts and maps these physical resources into multiple independent logical computing units. A resource topology model is established for each computing unit. This model not only records its hardware identifier and type, but more importantly, it calculates its baseline throughput by executing a set of standardized benchmark tests. This baseline throughput is a comprehensive performance indicator that reflects the number of standard image frames that the computing unit can stably process per unit time when performing a load similar to the computational characteristics of the target image processing algorithm; its unit can be frames per second. Benchmark tests are typically performed when the system is idle or the load is controllable to ensure the representativeness of the results. After the calculation is completed, the system directly sets the total number of subsequences to be divided as the number of currently identified and modeled computing units, ensuring that each computing unit can be assigned a dedicated image subsequence for processing, achieving maximum parallelism.
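A baseline throughput measurement of the kind described above might look like the following sketch. It times a representative per-frame workload on the calling thread and reports frames per second; binding the measurement to a particular CPU core or GPU is outside its scope. The name `measure_baseline_throughput`, its parameters, and the use of the median over several runs are all assumptions made for illustration.

```python
import time
from statistics import median
from typing import Callable, List


def measure_baseline_throughput(
    process_frame: Callable[[int], object], frames: int = 50, repeats: int = 3
) -> float:
    """Return the median frames-per-second over `repeats` timed benchmark runs."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        for index in range(frames):
            process_frame(index)  # workload resembling the target algorithm
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            samples.append(frames / elapsed)
    return median(samples) if samples else 0.0
```

Using the median rather than a single run is one way to reduce the influence of transient system load on the reported figure.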

[0115] After obtaining the baseline throughput of all computing units, the system sorts all computing units in descending order of this value, forming an ordered sequence of computing units from strongest to weakest, denoted as $U_1, U_2, \dots, U_n$, where $U_1$ is the unit with the highest baseline throughput, $U_n$ is the unit with the lowest baseline throughput, and $n$ is the total number of units. This sorting forms the basis for subsequent non-uniform distribution.

[0116] Next, the task volume gradient allocation model is constructed. The core idea of this model is that computing units with greater processing power should undertake more computing tasks. To quantify this allocation relationship, the difference in workload between adjacent units is first defined. Let the workload allocated to computing unit $U_i$ be $I_i$ image frames. The workload difference between unit $U_i$ and the next weaker unit $U_{i+1}$ is then defined as $\Delta I_i = I_i - I_{i+1}$. This difference $\Delta I_i$ is a non-negative value that needs to be determined.

[0117] The model uses a preset, monotonically increasing decay function $f$ that formally defines how performance differences are mapped to workload differences. Specifically, for any two adjacent computing units $U_i$ and $U_{i+1}$ in the ordered sequence of computing units, the corresponding task volume difference $\Delta I_i$ is related to the difference in baseline throughput of the two units, $R_i - R_{i+1}$, through the function $f$, that is, the functional relationship $\Delta I_i = f(R_i - R_{i+1})$ is established, where $R_i$ and $R_{i+1}$ are the baseline throughputs of $U_i$ and $U_{i+1}$ respectively. The specific form of the decay function $f$ can be determined based on experience or experimentation; for example, it can be designed as a linear function $f(x) = k \cdot x$, where $k$ is a predefined scaling factor.

[0118] In this embodiment, to smooth out the differences, a nonlinear function in logarithmic or exponential form can also be used. Such a function ensures that the larger the performance gap $R_i - R_{i+1}$, the larger the difference in assigned task volume $\Delta I_i$, while the rate at which the gap widens can be controlled through the function, thus avoiding excessive concentration of tasks on a few top-level units.

[0119] Based on the above functional relationship $\Delta I_i = f(R_i - R_{i+1})$, combined with the known total number of images $N$, the system determines the specific number of images that each subsequence (i.e., each computing unit) should contain by iteratively solving the task volume gradient allocation model. The solution process starts from an assumed initial image quantity baseline value $I_{\text{base}}$. Initially, this value is typically set to an estimate based on an even distribution, for example $\lceil N/n \rceil$, that is, $N/n$ rounded up. First, because $U_1$ is the first and most capable unit in the sequence, it is assigned the most images, and $I_1$ is set to the current $I_{\text{base}}$. Next, the first task volume difference is calculated from the functional relationship: $\Delta I_1 = f(R_1 - R_2)$. The number of images allocated to the second unit $U_2$ is then obtained as $I_2 = I_1 - \Delta I_1$.

[0120] Next, iterative calculations are performed to determine the allocation of all subsequent units in the sequence. For the $i$-th unit $U_i$ in the sequence ($i = 3, \dots, n$), the difference in baseline throughput between it and the previous unit $U_{i-1}$, namely $R_{i-1} - R_i$, is calculated, and the allocation difference $\Delta I_{i-1} = f(R_{i-1} - R_i)$ is obtained from the decay function. The number of images allocated to that unit is then $I_i = I_{i-1} - \Delta I_{i-1}$. This process starts at $i = 3$ and proceeds sequentially until $i = n$, yielding the initial allocation of all computing units, $I_1, I_2, \dots, I_n$.

[0121] Then, the sum of all initially allocated image counts is computed, $S = \sum_{i=1}^{n} I_i$, and the summation result $S$ is compared with the actual total number of images $N$. If $S$ is not equal to $N$, the current baseline value $I_{\text{base}}$ is unsuitable, leaving a discrepancy in the total allocation. The system then adjusts $I_{\text{base}}$ according to the difference $N - S$. The adjustment strategy is usually as follows: if $S < N$, $I_{\text{base}}$ is increased by a certain step size (such as 1, or a step calculated from the difference); if $S > N$, $I_{\text{base}}$ is reduced. After the adjustment, the system repeats the process starting from setting $I_1 = I_{\text{base}}$, recalculates $\Delta I_1$ and $I_2$, and continues with the subsequent recursive calculations and the summation comparison, forming an iterative loop.

[0122] The above iterative process continues until the sum $S$ of the image counts allocated to all units after a certain iteration is exactly equal to the total number of images $N$. The iteration terminates at this point, and the final set of allocated image counts $I_1, I_2, \dots, I_n$ constitutes the core data of the dynamic subsequence partitioning strategy. Based on this strategy, the system can explicitly and non-uniformly segment the standardized image sequence into $n$ subsequences, where the $i$-th subsequence contains $I_i$ consecutive images and is assigned to the corresponding $i$-th computing unit $U_i$ in the ordered computing unit sequence. This strategy generation process, through quantitative modeling and iterative solving, achieves a precise and adaptive match between task load and heterogeneous computing capabilities, laying a foundation for subsequent efficient parallel execution.
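The iterative solution described above can be sketched in a few lines. This is an illustration under stated assumptions, not the patented procedure: `linear_decay` and `solve_allocation` are hypothetical names, workload differences are rounded to whole frames, and allocations are clamped at zero. Because changing the baseline value by one frame shifts the total by roughly one frame per unit, an exact match is not always reachable by baseline adjustment alone, so the sketch gives any remainder smaller than the number of units to the strongest units, one frame each; that remainder rule is an added assumption.

```python
import math
from typing import Callable, List, Sequence


def linear_decay(k: float) -> Callable[[float], float]:
    """Linear form of the decay function: workload difference = k * throughput gap."""
    return lambda gap: k * gap


def solve_allocation(
    throughputs: Sequence[float],
    total_images: int,
    decay: Callable[[float], float],
    max_iterations: int = 1000,
) -> List[int]:
    """Return image counts for the units sorted by descending baseline throughput."""
    rates = sorted(throughputs, reverse=True)
    n = len(rates)
    base = math.ceil(total_images / n)  # initial baseline: even split, rounded up
    for _ in range(max_iterations):
        counts = [base]
        for stronger, weaker in zip(rates, rates[1:]):
            difference = max(0, round(decay(stronger - weaker)))
            counts.append(max(0, counts[-1] - difference))
        gap = total_images - sum(counts)
        if 0 <= gap < n:
            for i in range(gap):  # assumption: leftover frames go to the strongest units
                counts[i] += 1
            return counts
        base += gap // n  # raise or lower the baseline according to the discrepancy
    raise RuntimeError("allocation did not converge")
```

With throughputs of 100, 80, and 40 frames per second, 100 images, and `linear_decay(0.5)`, the adjacent differences are 10 and 20 frames and the result is `[47, 37, 16]`.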

[0123] Step S30: According to the dynamic subsequence partitioning strategy, the standardized image sequence is non-uniformly divided into multiple image subsequences;

[0124] Specifically, step S30 includes:

[0125] The strategy is analyzed to obtain the total number of subsequences and the corresponding sequence of image counts;

[0126] Initialize the cumulative index based on the memory starting address, single frame size, and total number of frames of the standardized image sequence;

[0127] The image quantity sequence is processed sequentially, and the current cumulative index is used as the starting index of the current subsequence;

[0128] Add the current number of images to the starting index and subtract one to obtain the ending index of the current subsequence;

[0129] Update the cumulative index to the end index plus one, and repeat the above steps until all images have been processed, while verifying the validity of all index ranges.

[0130] Calculate the data start pointer based on the starting index, single frame size, and memory start address of each subsequence;

[0131] Generate a task descriptor for each subsequence, containing its data start pointer, number of images, and start index.

[0132] In a specific implementation, the dynamic subsequence partitioning strategy is first obtained from the output of step S20. This strategy is represented in memory as an integer array of length n, denoted as $[I_1, I_2, \dots, I_n]$, where element $I_i$ indicates the number of image frames to be allocated to the $i$-th computing unit in the ordered computing unit sequence. The system holds the standardized image sequence, which is logically treated as a contiguous block of memory starting at address pBase, with a total of N frames and frameSize bytes per frame.

[0133] The partitioning operation begins with the calculation of the boundary of the first subsequence. The system initializes a cumulative index variable acc_idx to 0, indicating the logical starting point of the entire sequence. For the first subsequence (corresponding to the first computing unit), its logical starting index $S_1$ is the current value of acc_idx. Based on the number of frames $I_1$ specified in the strategy, the logical end index $E_1$ is calculated using the formula $E_1 = S_1 + I_1 - 1$. The system records this index range $(S_1, E_1)$ as a tuple and associates it with the identifier of the subsequence. Subsequently, acc_idx is updated to $E_1 + 1$, which is the value of $S_2$, in preparation for calculating the next subsequence.

[0134] The system then iteratively partitions the subsequent $i$-th subsequence ($i$ ranging from 2 to n). For each subsequence, its logical starting index $S_i$ is inherited from the updated acc_idx, and its logical end index $E_i$ is determined by the formula $E_i = S_i + I_i - 1$. After each calculation, the system records the range and updates the cumulative index to acc_idx $= E_i + 1$. Throughout the iteration, the system performs boundary checks to ensure that $E_i \le N - 1$ holds for all $i$, and verifies that, after the last subsequence has been processed, the value of acc_idx is exactly equal to the total number of frames N, in order to ensure the integrity and accuracy of the division and prevent data overflow or omission.

[0135] After calculating all logical index ranges, the system constructs a lightweight task descriptor for each image subsequence. The key feature of this descriptor is that it provides efficient, zero-copy access to the original data. For the $i$-th subsequence, the starting address $p_i$ of its corresponding image data in memory is calculated by the formula $p_i = \text{pBase} + S_i \times \text{frameSize}$. The task descriptor encapsulates the following information: the data start pointer $p_i$, the number of image frames $I_i$ in this subsequence, its logical starting index $S_i$ in the original sequence, and the identifier of its associated target computing unit. Finally, the output of step S30 is n such task descriptors, which are non-uniformly mapped to different contiguous regions of the original data and accurately preserve global order information, providing directly usable task units for parallel scheduling and processing in subsequent steps.
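The index bookkeeping of step S30 can be sketched as below. The class `TaskDescriptor` and the function `build_descriptors` are hypothetical names, and a byte offset relative to the start of the buffer stands in for the data start pointer, since raw addresses are not meaningful in Python.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TaskDescriptor:
    unit_id: int      # position of the target unit in the ordered unit sequence
    start_index: int  # logical starting index in the original sequence
    frame_count: int  # number of frames in this subsequence
    byte_offset: int  # stands in for the data start pointer


def build_descriptors(
    counts: Sequence[int], total_frames: int, frame_size: int, base_offset: int = 0
) -> List[TaskDescriptor]:
    """Turn per-unit image counts into contiguous, validated task descriptors."""
    descriptors = []
    acc_idx = 0  # cumulative index: first frame not yet assigned
    for unit_id, count in enumerate(counts):
        start = acc_idx
        end = start + count - 1
        if count < 0 or end > total_frames - 1:
            raise ValueError(f"subsequence {unit_id} exceeds the sequence bounds")
        offset = base_offset + start * frame_size
        descriptors.append(TaskDescriptor(unit_id, start, count, offset))
        acc_idx = end + 1
    if acc_idx != total_frames:
        raise ValueError("image counts do not cover the whole sequence")
    return descriptors
```

No pixel data is copied at this stage; each descriptor only records where its subsequence begins and how long it is.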

[0136] Step S40: Assign each image subsequence to an independent image processing thread, and execute the same target image processing algorithm in parallel in each image processing thread, while recording the starting index of each image subsequence in the original sequence;

[0137] In this embodiment, step S40 specifically includes:

[0138] Based on the dynamic subsequence partitioning strategy, an image processing thread pool corresponding to the total number of subsequences is created, and each image processing thread in the pool is associated with a physical computing unit in the ordered computing unit sequence.

[0139] Based on the number of non-uniform images in each image subsequence, the start index and end index of each image subsequence in the standardized image sequence are calculated sequentially, and the corresponding image data blocks are mapped to the private memory space of each image processing thread through memory mapping technology.

[0140] Each image processing thread independently executes the target algorithm based on the data in its own memory space, and monitors the actual load indicators of its associated computing units;

[0141] The execution progress and load metrics of all image processing threads are periodically collected. When the actual performance metrics of the associated computing units monitored within the preset continuous monitoring period deviate from the corresponding baseline throughput, the online fine-tuning of the data block range allocated to the image processing threads associated with the computing units whose performance metrics deviate is triggered based on the real-time load and the number of remaining unprocessed images.

[0142] The online fine-tuning is achieved by dynamically adjusting the offset and length of the memory mapping region corresponding to the affected image processing thread, and after completion, a synchronization event is sent to the affected image processing thread to update the data range and starting index processed by the image processing thread.

[0143] After each image processing thread completes the processing of the current data block, it associates the final processing result with the starting index it maintains and outputs it.

[0144] The online fine-tuning is achieved by dynamically adjusting the offset and length of the corresponding memory-mapped region, including:

[0145] Based on the degree and duration of deviation between the actual performance indicators of the associated computing units and the benchmark throughput, it is determined whether to trigger a fine-tuning operation.

[0146] When fine-tuning is triggered, the adjustment amount for the affected thread is calculated based on the actual load index and the number of remaining unprocessed images;

[0147] Based on the adjustment amount, define the data sub-blocks to be cut from the range of data blocks currently allocated to the affected threads;

[0148] The association between the data sub-block and the original memory mapping region is released using atomic operations;

[0149] The memory mapping relationship between the released data sub-block and the candidate thread with the lowest current real-time load is re-established by creating, for the candidate thread, a private memory mapping region pointing to the starting physical address of the data sub-block.

[0150] Synchronously update the offset address and length of the memory mapping region corresponding to the affected thread and the candidate thread, as well as the start index and end index maintained by each thread.

[0151] In practical implementation, after binding threads to hardware, the system needs to provide each thread with its own dedicated image data. Based on the logical starting index and the number of non-uniform images calculated for each image subsequence in step S30, the system can accurately calculate the physical offset and data length of the data block corresponding to each subsequence in the shared memory region of the standardized image sequence. Subsequently, using memory mapping technology, such as through the mmap system call or similar high-performance I / O interfaces, the physical memory data block corresponding to each subsequence is mapped to the private virtual address space of its associated thread. As a result of this operation, each image processing thread obtains a direct pointer to its dedicated data block, allowing the thread to directly read and write these image data as if accessing a local array, without any expensive data copying process. While performing memory mapping, each thread must be safely informed of and store the logical starting index of the image subsequence it is responsible for processing in the original global sequence. This index value will be strictly maintained by the thread from this moment on and will serve as the identity label of its final output result, which is the fundamental basis for ensuring that the distributed computing results are correctly placed in the original order.
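The zero-copy access pattern can be illustrated with Python's built-in `memoryview`, used here as a stand-in for a memory-mapped region: slicing a view exposes a window onto the shared buffer without copying it. The helper name `frame_view` is hypothetical, and a real implementation would map the region through the operating system rather than through an in-process buffer.

```python
def frame_view(buffer: memoryview, start_index: int, frame_count: int, frame_size: int) -> memoryview:
    """Return a zero-copy window covering `frame_count` frames from `start_index`."""
    offset = start_index * frame_size
    return buffer[offset:offset + frame_count * frame_size]


# Six frames of four bytes each in one shared buffer.
shared = bytearray(range(24))
window = frame_view(memoryview(shared), start_index=2, frame_count=3, frame_size=4)

# Writing through the window changes the shared buffer, so no copy was made.
window[0] = 255
```

After the final statement, `shared[8]` holds 255, because frame 2 begins at byte offset 8.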

[0152] Once all threads have gained data access and are aware of their logical context, they begin to execute the same target image processing algorithm independently and concurrently on their bound physical computing units. Algorithm execution is the core of the computationally intensive process. To address potential resource contention and performance fluctuations during runtime, a fixed time interval, such as 100ms, is used as the monitoring period to periodically poll or passively receive status reports from all worker threads. These status reports mainly contain two types of information: first, execution progress, such as the number of image frames successfully processed; and second, real-time performance metrics of their associated physical computing units. These metrics may include, but are not limited to, CPU or GPU utilization, current operating frequency, cache hit rates at all levels, memory bandwidth utilization, and thread queue length. The monitoring module internally maintains a baseline throughput curve for each computing unit when the system is idle. It compares and analyzes the periodically collected real-time metrics with the corresponding baseline curve to assess whether the actual operating status of each computing unit deviates from expectations.

[0153] If, within several consecutive preset monitoring periods, such as three consecutive periods, the actual performance index of a specific computing unit is consistently and significantly lower than a certain threshold of its baseline throughput (e.g., consistently below 80% of the baseline value), the system determines that the computing unit may be experiencing resource preemption by other high-priority processes, or frequency reduction caused by thermal protection after overheating, resulting in a substantial decrease in its processing capacity. At this point, the system automatically triggers the online fine-tuning mechanism. The fine-tuning decision first involves a quantitative calculation: based on the number of remaining image frames not yet processed by the affected thread, the actual performance degradation ratio of its current computing unit, and the real-time load of all other threads in the current thread pool, a suggested transfer workload is calculated using a predefined load transfer model, that is, the number of image frames to be cut from the affected thread and reallocated. Next, working backwards from the logical end of the data block currently handled by the affected thread, the system delimits a data sub-block to be cut whose size exactly equals the calculated transfer workload.
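The trigger condition and the transfer calculation can be sketched as follows. The 80% threshold and the three-period window come from the example values in the text; the class `DeviationDetector` and the function `frames_to_transfer` are hypothetical. The text does not specify the load transfer model, so the proportional rule below, which ignores the load of the other threads, is only an assumed placeholder.

```python
from collections import deque


class DeviationDetector:
    """Flags a unit whose throughput stays below a fraction of its baseline."""

    def __init__(self, baseline: float, threshold: float = 0.8, periods: int = 3) -> None:
        self.baseline = baseline
        self.threshold = threshold
        self.window = deque(maxlen=periods)  # most recent below-threshold flags

    def observe(self, measured: float) -> bool:
        """Record one monitoring period; return True if fine-tuning should trigger."""
        self.window.append(measured < self.threshold * self.baseline)
        return len(self.window) == self.window.maxlen and all(self.window)


def frames_to_transfer(remaining: int, measured: float, baseline: float) -> int:
    """Assumed model: move a share of the remaining frames equal to the degradation ratio."""
    if baseline <= 0 or remaining <= 0:
        return 0
    degradation = max(0.0, 1.0 - measured / baseline)
    return min(remaining, int(remaining * degradation))
```

A unit running at half its baseline throughput with 40 frames left would hand over 20 frames under this assumed rule.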

[0154] The execution of online fine-tuning is a highly coordinated process that ensures data consistency. The system first coordinates the affected threads, causing them to pause processing subsequent frames and enter a safe point after completing the current image frame. Then, the system kernel or hypervisor atomically disconnects the data sub-blocks to be segmented from the affected threads' current private memory mapping regions. After deassociation, the system selects the least loaded candidate thread from one or more candidate threads showing the lowest real-time load. Next, the system creates a new private memory mapping region for this candidate thread, pointing to the starting physical address of the deassociated data sub-block, thus transparently and dynamically transferring this portion of the computational load to computing units with available processing power. After successful memory mapping adjustment, the system must synchronously update the task metadata maintained by both the affected and candidate threads. This includes their respective new data processing ranges, i.e., the updated start and end logical indices, and adjust the boundary conditions of their internal loops or pointers accordingly. After the update, the system notifies both threads of the change in their data boundaries by sending a synchronization event or signal. After receiving an event, the thread reloads its processing range parameters and continues execution of the target algorithm from the safe point of interruption, processing its own new data blocks.
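The hand-off of a tail sub-block between two threads can be sketched with plain index ranges. In this illustration a lock stands in for the atomic unmapping and remapping performed by the system, each thread keeps a list of pending inclusive `(start, end)` ranges, and the names `RangeTable` and `transfer_tail` are hypothetical. The sketch assumes the affected thread is already paused at its safe point, so the start of its last range is its next unprocessed frame.

```python
import threading
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (start_index, end_index)


class RangeTable:
    """Pending frame ranges per thread, updated under a single lock."""

    def __init__(self, pending: Dict[int, List[Range]]) -> None:
        self._lock = threading.Lock()
        self._pending = {tid: list(ranges) for tid, ranges in pending.items()}

    def transfer_tail(self, source: int, target: int, count: int) -> Range:
        """Cut `count` frames from the end of the source's last range and give them to target."""
        with self._lock:
            start, end = self._pending[source][-1]
            if not 0 < count < end - start + 1:
                raise ValueError("transfer must leave the source at least one frame")
            cut_start = end - count + 1
            self._pending[source][-1] = (start, cut_start - 1)
            self._pending[target].append((cut_start, end))
            return (cut_start, end)

    def ranges_of(self, thread_id: int) -> List[Range]:
        with self._lock:
            return list(self._pending[thread_id])
```

Because every transferred range carries its own logical start index, results produced by the candidate thread can still be placed at the correct position in the original order.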

[0155] Ultimately, each image processing thread, regardless of whether its workload has been dynamically adjusted during the process, will perform a result output operation after completing the processing of all the image data it has been allocated. This operation tightly associates and encapsulates all the processing results generated by the thread—which may be a continuous block of image data or a set of extracted feature vectors—with the original logical starting index maintained by the thread throughout, forming a complete result unit with a clear location label, and outputting it to a designated result queue or shared memory area. Through the above-described complete closed-loop process integrating initial static load matching, continuous performance monitoring, and dynamic task reallocation, step S40 not only achieves parallel data computing but also endows the system with the adaptive capability of self-diagnosis and self-adjustment to maintain overall processing efficiency and high stability when facing complex and ever-changing heterogeneous computing environments.

[0156] Step S50: Based on the starting index and the calculation completion status of each image processing thread, dynamically integrate all processed image subsequences. When any thread completes its calculation, it triggers the merging of the corresponding processing results according to the order of each image subsequence in the original sequence to generate the final processed image sequence.

[0157] In this embodiment, based on the starting index and the calculation completion status of each image processing thread, all processed image subsequences are dynamically integrated. The method is as follows:

[0158] Set up a final result buffer that matches the total number of images, and start a separate event listener process to poll the computation completion status of each image processing thread;

[0159] When any thread completes its calculation, immediately obtain the image subsequence processing result and the starting index of the record output by that image processing thread;

[0160] Based on the obtained starting index, the corresponding processing result data is written sequentially into the corresponding logical storage area of the final result buffer;

[0161] In a global status table, the logical range covered by this write result is marked as integrated;

[0162] Based on the global state table, continuously determine whether there is a continuous and uninterrupted integrated interval starting from the logical start position of the standardized image sequence;

[0163] If it exists, mark the buffer data corresponding to the longest continuous integrated interval that has been formed as committable.

[0164] The process of polling, acquiring, writing, marking, and judging is repeated until the global state table indicates that all logical intervals of the standardized image sequence have been marked as integrated, and the complete interval from the beginning to the end is in a committable state. At this time, the data in the final result buffer constitutes the final processed image sequence.

[0165] Step S50 is responsible for dynamically integrating all parallel-processed image subsequence results into a final ordered sequence. Its implementation begins synchronously after parallel computation starts. The process first allocates a contiguous storage area in memory as a final result buffer. The size of this buffer is precisely calculated based on the total number of image frames and the size of the result data for each frame, ensuring it can accommodate the complete final processed image sequence. Simultaneously, a global state table is created. This table is typically implemented as a bitmap or a state array of equal length, with its length matching the total number of image frames. Each bit or array element corresponds to a logical storage unit in the final result buffer, used to precisely track the result data filling status at each position. The initial values ​​of all state bits are set to preset values ​​indicating that the data is not filled. A dedicated listening thread is then started. This thread is independent of all worker threads executing image processing algorithms, and its core function is to monitor the completion status of worker threads. The listening thread checks a shared thread-safe queue or waits for synchronization primitives triggered by worker threads, such as condition variables, to determine if any worker thread has completed its computation task. The listening loop can be designed to include a polling method with brief sleep intervals, such as 100ms between each loop, to avoid excessive consumption of processor resources.

[0166] When any image processing thread completes all its assigned image processing tasks, it encapsulates its output result data block, the starting logical index of this data block in the original standardized image sequence, and the number of result frames contained in the data block into a completion event object. The worker thread adds this completion event object to the shared completion event queue through atomic operations and immediately notifies the listening thread, via a signal mechanism, that new results are available for integration. Subsequently, the worker thread can enter a waiting state or terminate directly. Once the listening thread detects the event notification or successfully retrieves the event from the queue, it begins result integration. The listening thread parses the event, obtaining the memory pointer of the result data block, the starting logical index S, and the number of result frames L. Based on the starting index S, the listening thread sequentially copies each frame of result data from the result data block into the L consecutive units of the final result buffer starting at offset S. That is, for the i-th result within the block (counting i from 0), its target position index in the buffer is S+i. This write operation ensures that each result fragment is accurately positioned according to its own logical position information.
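The completion-event flow can be sketched with Python's thread-safe `queue.Queue`. In this illustration the calling thread plays the role of the listening thread and blocks on the queue instead of polling; `CompletionEvent` and `run_workers` are hypothetical names, and the per-frame `process` callable stands in for the target image processing algorithm.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class CompletionEvent:
    start_index: int  # S: logical start of this block in the original sequence
    results: list     # L processed frames, in order


def run_workers(frames: Sequence, counts: Sequence[int], process: Callable) -> List:
    """Process non-uniform subsequences in threads and place results by start index."""
    events: "queue.Queue[CompletionEvent]" = queue.Queue()

    def worker(start: int, count: int) -> None:
        block = [process(frame) for frame in frames[start:start + count]]
        events.put(CompletionEvent(start, block))  # thread-safe hand-off

    threads, start = [], 0
    for count in counts:
        thread = threading.Thread(target=worker, args=(start, count))
        thread.start()
        threads.append(thread)
        start += count

    final = [None] * len(frames)
    for _ in counts:  # one completion event per worker, in completion order
        event = events.get()
        for i, frame in enumerate(event.results):
            final[event.start_index + i] = frame  # target position is S + i
    for thread in threads:
        thread.join()
    return final
```

The order in which workers finish does not matter: each block lands at its own offset, so the final buffer always follows the original sequence order.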

[0167] After data writing is complete, the listening thread immediately updates the global state table, marking all corresponding states in the index range from S to S+L-1 as preset values indicating that the data at that position has been integrated. After the state update, the listening thread executes continuity judgment logic. This logic scans the global state table to check whether there exists a continuous, uninterrupted interval of integrated states starting from index 0. The scan starts from the first position and continues until the first unintegrated state bit is encountered. At this point, the longest continuous integrated prefix interval can be determined; let its ending index be M-1. If the length of this continuous interval is greater than zero, the listening thread marks the data area in the final result buffer corresponding to this interval, that is, the data area from index 0 to M-1, as committable. This marking can be achieved by setting an atomic variable to record the current committable boundary index M, or by publishing an event containing information about this interval. Downstream display, storage, or analysis modules continuously monitor this committable state. Once a new data interval is marked as committable, they can immediately read the ready final result data for subsequent operations, thereby achieving progressive output of processing results and improving the user experience.

[0168] After completing an event processing step, including data writing, status updating, and continuity checking, the listening thread returns to its monitoring loop to wait for and process the next completion event. This loop continues until the global status table shows that all status bits have been marked as integrated. When all image frames' corresponding status bits are marked, it means that the final result buffer has been completely filled, and the entire buffer data naturally forms a complete committable range from beginning to end. At this point, the dynamic integration process ends, and the final result buffer stores a complete, sequential sequence of processed images ready for final use. Through this event-driven, state-tracking, and sequential direct writing mechanism, this method achieves efficient, low-latency streaming result integration, effectively avoiding the output delays caused by traditional batch processing integration methods.
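The event-driven integration described for step S50 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the names `worker`, `listener`, `events`, `filled`, and `boundary_log` are hypothetical, a blocking `queue.Queue` stands in for the shared completion event queue and its signal mechanism, and the committable boundary M is advanced incrementally rather than rescanned from index 0 on every event (the two give the same boundary).

```python
import queue
import threading

TOTAL_FRAMES = 12                       # small stand-in for the real frame count
result_buffer = [None] * TOTAL_FRAMES   # final result buffer, one slot per frame
filled = [False] * TOTAL_FRAMES         # global state table
events = queue.Queue()                  # shared completion event queue
committable = 0                         # boundary M: frames 0..M-1 are ready
boundary_log = []                       # boundary recorded after each event


def worker(start, count):
    """Stand-in worker: 'process' count frames, then post one completion event."""
    block = [f"processed-{start + i}" for i in range(count)]
    events.put((start, block))          # (starting index S, result data block)


def listener():
    """Integrate completion events until every frame has been filled."""
    global committable
    while committable < TOTAL_FRAMES:
        start, block = events.get()     # blocks until a worker posts an event
        for i, frame in enumerate(block):
            result_buffer[start + i] = frame    # write result i to index S+i
            filled[start + i] = True            # mark the position as integrated
        # Advance the boundary over the contiguous integrated prefix.
        while committable < TOTAL_FRAMES and filled[committable]:
            committable += 1
        boundary_log.append(committable)


listener_thread = threading.Thread(target=listener)
listener_thread.start()
# Completion order deliberately differs from sequence order: 4..7, 0..3, 8..11.
for start in (4, 0, 8):
    t = threading.Thread(target=worker, args=(start, 4))
    t.start()
    t.join()
listener_thread.join()
print(boundary_log)   # [0, 8, 12]
```

The first event fills frames 4 to 7 but leaves the boundary at 0 because frame 0 is still missing; the second event closes the gap and the boundary jumps to 8.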

[0169] To further illustrate the method of this embodiment, please refer to the following example:

[0170] A medical imaging center received enhanced abdominal CT scan data from a patient, containing a sequence of 512 DICOM files. The analysis task was to apply a computationally intensive 3D Gaussian smoothing filter to this 3D data for image denoising preprocessing. The computing device performing this task was a workstation equipped with heterogeneous processing cores, featuring two high-performance P-Cores and four more energy-efficient E-Cores.

[0171] First, step S10 is executed to obtain the 512 DICOM files. The system parses the metadata of each file and precisely sorts all images according to their slice location tags, ensuring top-to-bottom anatomical continuity. Next, the system converts the pixel data of each file from raw 16-bit integers to standard Hounsfield units using window width and window level, and then normalizes it. All processed image frames are loaded and merged into a single, contiguous, normalized image sequence stored in memory, preparing for subsequent parallel processing.
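The preprocessing in step S10 can be sketched as below. The sketch assumes the usual DICOM convention, in which the Rescale Slope and Rescale Intercept attributes map stored values to Hounsfield units and the window level and width then drive normalization to the 0-1 range; the embodiment does not spell out this split. The dictionaries stand in for metadata parsed by a DICOM library, the function names are hypothetical, and sorting by descending slice location as "top to bottom" is also an assumption.

```python
def to_hounsfield(stored, slope, intercept):
    """Standard DICOM rescale: HU = stored value * slope + intercept."""
    return stored * slope + intercept


def window_normalize(hu, level, width):
    """Clamp HU to [level - width/2, level + width/2] and map it to 0..1."""
    low = level - width / 2
    high = level + width / 2
    clamped = min(max(hu, low), high)
    return (clamped - low) / (high - low)


def preprocess(slices, level=40.0, width=400.0):
    """Sort slices by position, then convert and normalize every pixel."""
    ordered = sorted(slices, key=lambda s: s["slice_location"], reverse=True)
    return [
        [window_normalize(to_hounsfield(p, s["slope"], s["intercept"]), level, width)
         for p in s["pixels"]]
        for s in ordered
    ]


# Hypothetical parsed metadata for two tiny two-pixel slices, given out of order.
slices = [
    {"slice_location": -10.0, "slope": 1.0, "intercept": -1024.0, "pixels": [1064, 24]},
    {"slice_location": 0.0, "slope": 1.0, "intercept": -1024.0, "pixels": [1264, 864]},
]
sequence = preprocess(slices)
print(sequence)   # [[1.0, 0.0], [0.5, 0.0]]
```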

[0172] Next, we move to the core step S20, the parallelism adaptation analysis. Six available CPU cores (two P-Cores and four E-Cores) are identified as six independent computing units. A short benchmark test shows that the P-Cores have a baseline throughput of approximately 25 frames per second, while the E-Cores have approximately 12 frames per second. Based on this, the total number of subsequences is set to six, and these six computing units are arranged in descending order of performance. Subsequently, the task gradient allocation model begins iterative solving. After calculation, the model generates a dynamic subsequence partitioning strategy, deciding to allocate more computing tasks to the more powerful P-Cores. The final strategy is: each of the two P-Cores processes 116 frames, and each of the four E-Cores processes 70 frames, totaling 2 × 116 + 4 × 70 = 232 + 280 = 512 frames, perfectly matching the total number of images.
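For comparison, a purely throughput-proportional split can be sketched as follows. This is not the task gradient allocation model of the embodiment, whose internals are not given here; it is a simpler baseline (largest-remainder rounding, with the hypothetical name `proportional_split`) that shows how benchmark throughput can be turned into non-uniform frame counts that sum exactly to the total.

```python
def proportional_split(total_frames, throughputs):
    """Split total_frames in proportion to throughput, using integer arithmetic."""
    total_rate = sum(throughputs)
    shares = [divmod(total_frames * rate, total_rate) for rate in throughputs]
    counts = [quotient for quotient, _ in shares]
    leftover = total_frames - sum(counts)
    # Hand the leftover frames to the units with the largest remainders.
    by_remainder = sorted(range(len(shares)), key=lambda i: shares[i][1], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


rates = [25, 25, 12, 12, 12, 12]          # two P-Cores, four E-Cores (frames/s)
baseline = proportional_split(512, rates)
print(baseline, sum(baseline))            # [130, 130, 63, 63, 63, 63] 512
```

The embodiment's model arrives at 116 and 70 frames instead, a less skewed split than pure proportionality, which suggests it weighs factors beyond the raw throughput ratio.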

[0173] Based on this strategy, in step S30, the 512-frame sequence is non-uniformly divided into 6 logical sub-sequences. In step S40, a thread pool containing 6 threads is created, and each thread is precisely bound to a physical core by setting processor affinity: the first two threads are bound to the P-Core, and the last four are bound to the E-Core. The system uses memory mapping technology to map the data blocks corresponding to each sub-sequence to the private memory space of the corresponding thread with zero copy, and informs each thread of the starting index of the data it is processing in the original sequence. All 6 threads then begin to concurrently execute the three-dimensional Gaussian smoothing filtering algorithm.
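The logical partition of step S30 can be illustrated with a short sketch that derives each subsequence's starting index from the per-unit frame counts and hands each thread a view of the shared sequence without copying. Python's `memoryview` slicing is used here as a stand-in for the memory mapping described above; `make_descriptors` and the 4-byte frame size are hypothetical.

```python
from itertools import accumulate


def make_descriptors(frame_counts):
    """Return (start_index, frame_count) pairs in original sequence order."""
    starts = [0] + list(accumulate(frame_counts))[:-1]
    return list(zip(starts, frame_counts))


descriptors = make_descriptors([116, 116, 70, 70, 70, 70])
print(descriptors)
# [(0, 116), (116, 116), (232, 70), (302, 70), (372, 70), (442, 70)]

FRAME_BYTES = 4                                    # tiny stand-in for a real frame
sequence = bytearray(512 * FRAME_BYTES)            # the normalized image sequence
views = [
    memoryview(sequence)[start * FRAME_BYTES:(start + count) * FRAME_BYTES]
    for start, count in descriptors
]
views[1][0] = 7                                    # a write through the view...
print(sequence[116 * FRAME_BYTES])                 # ...lands in the shared sequence: 7
```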

[0174] During processing, security software on the workstation suddenly initiated a full scan, causing significant preemption of computing resources for the thread (Thread 1) bound to one of the P-Cores. The online monitoring mechanism detected over several monitoring cycles that the core's actual throughput had plummeted to 10 frames per second, far below its baseline of approximately 25 frames per second. At this point, Thread 1 had processed 60 frames, leaving 56 frames remaining. The online fine-tuning mechanism was triggered and calculated that 30 of the 56 remaining frames should be offloaded. The system selected the thread (Thread 5) associated with the E-Core with the lowest current load as a candidate. Through atomic operations, the system dynamically adjusted the memory mapping region, remapping the last 30 frames of image data originally belonging to Thread 1 to Thread 5, and simultaneously updating the data range and starting index information that each thread needed to process.
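The range bookkeeping behind this offload can be sketched as follows. The sketch covers only the index arithmetic; how the mechanism arrives at 30 frames is not reproduced, and the atomic remapping is omitted (a real implementation would guard the update with a lock or atomic operation). `Assignment` and `offload_tail` are hypothetical names, and Thread 5 is assumed to be the third E-Core thread, whose original range starts at index 372.

```python
from dataclasses import dataclass, field


@dataclass
class Assignment:
    """Frame ranges owned by one thread, as (start_index, frame_count) pairs."""
    ranges: list = field(default_factory=list)


def offload_tail(source, target, frames_to_move, processed):
    """Move the last frames_to_move frames of source's final range to target."""
    start, count = source.ranges[-1]
    kept = count - frames_to_move
    if frames_to_move <= 0 or kept < processed:
        raise ValueError("cannot move frames that are already processed")
    source.ranges[-1] = (start, kept)
    target.ranges.append((start + kept, frames_to_move))


thread1 = Assignment([(0, 116)])      # slowed P-Core thread, 60 frames done
thread5 = Assignment([(372, 70)])     # least-loaded E-Core thread
offload_tail(thread1, thread5, frames_to_move=30, processed=60)
print(thread1.ranges)   # [(0, 86)]
print(thread5.ranges)   # [(372, 70), (86, 30)]
```

After the adjustment Thread 1 owns frames 0 to 85, of which 26 are still unprocessed, and Thread 5 additionally owns frames 86 to 115.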

[0175] Finally, the dynamic integration phase of step S50 begins. Assume that the other, unaffected P-Core (Thread 2) completes its 116-frame processing task first. It outputs the result along with its starting index (116). A separate listening thread immediately captures this completion event and precisely writes the processing results of these 116 frames into the corresponding positions in the final result buffer. However, at this point, since the first subsequence starting at index 0 has not yet been completed, the system determines that there is no continuous integrated interval starting from the beginning. Shortly afterward, Thread 1, whose task was reduced to 86 frames by the adjustment, completes its remaining 26 frames, and its results are written to positions 0 to 85 in the buffer. At this time, the listening thread determines that the length of the continuous integrated interval starting from position 0 is 86 (positions 86 to 115 are still pending on Thread 5), and thus marks this part of the data as "committable," allowing the user interface to immediately load and display this portion of the denoised image. As other threads complete their tasks and write their results to the correct positions, this "committable" continuous data area continues to grow until all 512 frames are integrated, forming the final processed image sequence.

[0176] Table 1 below quantitatively demonstrates the technical advantages of the method proposed in the first embodiment of this invention compared to traditional uniform parallel processing schemes by comparing key performance indicators:

[0177] Table 1:

[0178]

[0179] As shown in Table 1 above, this embodiment generates a non-uniform subsequence partitioning strategy through parallelism adaptation analysis, achieving a precise match between computational load and heterogeneous unit capabilities. This results in a several-fold increase in processing speed and an extremely high level of resource utilization. The event-triggered dynamic integration mechanism allows partial results to be output upon completion of any subtask, significantly shortening the waiting time for users to receive initial feedback. Furthermore, the online fine-tuning function endows the system with strong adaptive capabilities, maintaining stable processing performance even when faced with runtime resource fluctuations. These improvements collectively address issues such as low resource utilization, long processing times, and lack of progressive feedback, significantly enhancing the overall efficiency and user experience of medical image processing.

[0180] Referring to Figure 2, the second embodiment of the present invention proposes a medical image analysis method, based on the method for processing DICOM file sequences according to the first embodiment, comprising:

[0181] Step A10: While generating the dynamic subsequence partitioning strategy, determine the execution priority and resource allocation parameters for at least two medical image analysis algorithms to be applied sequentially to the output results of the standardized image sequence or the target image processing algorithm.

[0182] Step A20: After obtaining multiple image subsequences according to the dynamic subsequence partitioning strategy, a composite computing task is constructed for each image subsequence. The composite computing task executes the target image processing algorithm and at least two medical image analysis algorithms in sequence.

[0183] Step A30: Create an independent image processing thread for each of the composite computing tasks and assign it to an independent computing unit. Within each image processing thread, the medical image analysis algorithm is called serially according to the execution priority.

[0184] Step A40: After completing the composite calculation task, each image processing thread generates a result data block containing all algorithm results and records the starting index corresponding to the processed image subsequence.

[0185] Step A50: Start a result aggregation thread to monitor the completion status of each image processing thread. When a thread is detected to be complete, obtain the result data block output by that image processing thread and its corresponding starting index.

[0186] Step A60: The result aggregation thread merges each result data block into the global result sequence according to the original logical order based on the starting index, and performs weighted fusion of the merged results according to the execution priority.

[0187] Step A70: During the process of dynamically integrating all processed image subsequences, a progressive analysis report matching the processing procedure of the composite computing task is generated synchronously based on the integration status of the global result sequence.

[0188] Step A10: While generating the dynamic subsequence partitioning strategy, determine the execution priority and resource allocation parameters for at least two medical image analysis algorithms to be applied sequentially to the output results of the standardized image sequence or the target image processing algorithm.

[0189] In this embodiment, determining the execution priority and resource allocation parameters for at least two medical image analysis algorithms includes the following steps:

[0190] Obtain historical execution performance data of the at least two medical image analysis algorithms, wherein the historical execution performance data includes peak time for a single operation and peak memory usage.

[0191] Based on the data characteristics of the data processed by the medical image analysis algorithms, the degree of dependence of each medical image analysis algorithm on the data characteristics is analyzed, and the dependence level is defined;

[0192] Based on the peak computation time and memory usage in the historical execution performance data, as well as the dependency level, a quantified initial execution priority score is generated for each medical image analysis algorithm through a preset priority calculation function.

[0193] During the generation or application of the dynamic subsequence partitioning strategy, the initial execution priority score is mapped to specific resource allocation parameters based on the baseline throughput and free memory capacity of each computing unit. The resource allocation parameters include the proportion of computing time slices reserved for each algorithm on each computing unit and the maximum available memory quota.

[0194] Based on the resource allocation parameters, the resource quota allocated by each computing unit to the composite computing task is dynamically adjusted.

[0195] In this embodiment, based on the baseline throughput and free memory capacity of each computing unit, the initial execution priority score is mapped to specific resource allocation parameters, including:

[0196] Obtain the baseline performance indicators of each computing unit as determined by the parallelism adaptation analysis, as well as its real-time free memory capacity, and simultaneously obtain the initial execution priority score generated for each medical image analysis algorithm.

[0197] Based on the initial execution priority scores, the proportion of computation time slices reserved for each medical image analysis algorithm in the composite computation task is mapped and determined for each corresponding computing unit, so that the algorithm with the higher score can obtain more processor time.

[0198] By combining the historical peak memory usage data of each medical image analysis algorithm with the real-time free memory capacity of the computing unit, the maximum available memory quota for each algorithm on each computing unit is determined. The determination of this quota must ensure that the peak memory requirement of the algorithm is met and include a safety margin.

[0199] Based on the mapped computation time slice ratio and maximum available memory quota set for each algorithm on each computing unit, the resource quota allocated to each computing unit for the composite computing task is dynamically adjusted.

[0200] Furthermore, based on the resource allocation parameters, the resource quota allocated to each computing unit for the composite computing task is dynamically adjusted, and the method is as follows:

[0201] Based on the resource allocation parameters determined for the composite computing tasks on each computing unit, an independent resource control group is created for each computing unit carrying the composite computing tasks through the resource management interface provided by the operating system.

[0202] For each resource control group created, the maximum available memory quota set for each algorithm on the corresponding computing unit in the resource allocation parameters is used to calculate and determine the overall hard limit of memory usage for the composite computing task, and this hard limit is configured as the memory constraint rule of the resource control group by calling the resource management interface.

[0203] Based on the proportion of computing time slices reserved for each algorithm on the corresponding computing unit in the resource allocation parameters, the overall processor time share that the composite computing task should enjoy on the computing unit is calculated, and this share value is set as the processor scheduling weight of the resource control group through the resource management interface.

[0204] When starting an independent image processing thread created for each of the composite computing tasks, each image processing thread and its associated system process are programmed to be assigned to the resource control group created for its corresponding computing unit.

[0205] During the execution of the composite computing task, its memory consumption and processor time scheduling are both constrained and managed by the rules configured by the resource control group, thereby realizing the dynamic adjustment of the resource quota allocated to each computing unit according to the resource allocation parameters.

[0206] In practice, this step first retrieves detailed historical performance data for all medical image analysis algorithms to be executed from a pre-established algorithm performance knowledge base implemented using, for example, an SQL database or Redis key-value store. This knowledge base is continuously updated by recording and analyzing the algorithm's performance in previous runs, maintaining an accurate performance profile for each algorithm, with its primary key being the algorithm name and version number. Taking a liver segmentation algorithm named "LiverSeg-v2.1" as an example, its historical performance data is not a general time consumption value, but rather includes a detailed statistical distribution of the time consumed in processing a 512x512 pixel CT image slice, recorded as an average time of 150 ms, a median of 145 ms, and a 95th percentile time of 210 ms. This allows the system to more accurately predict its time consumption under different loads. Similarly, the peak memory usage record represents the maximum memory requirement of the algorithm throughout the entire execution cycle, including loading the PyTorch framework and model weights, and caching intermediate activation layers. The statistics show that the average peak is 2.5 gigabytes, and the peak at the 95th percentile is 3.1 gigabytes, thus providing a reliable basis for the pre-allocation of memory resources.
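A minimal sketch of such a knowledge base, using an in-memory SQLite table keyed by algorithm name and version, is given below. The table and column names and the `load_profile` helper are hypothetical; only the figures for "LiverSeg-v2.1" are taken from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE algo_profile (
           name TEXT, version TEXT,
           mean_ms REAL, median_ms REAL, p95_ms REAL,
           mean_peak_gb REAL, p95_peak_gb REAL,
           PRIMARY KEY (name, version))"""
)
# Per-slice timing and peak-memory statistics for one algorithm version.
conn.execute(
    "INSERT INTO algo_profile VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("LiverSeg", "v2.1", 150.0, 145.0, 210.0, 2.5, 3.1),
)


def load_profile(name, version):
    """Fetch the stored performance profile for one algorithm version."""
    row = conn.execute(
        "SELECT mean_ms, median_ms, p95_ms, mean_peak_gb, p95_peak_gb "
        "FROM algo_profile WHERE name = ? AND version = ?",
        (name, version),
    ).fetchone()
    if row is None:
        raise KeyError((name, version))
    keys = ("mean_ms", "median_ms", "p95_ms", "mean_peak_gb", "p95_peak_gb")
    return dict(zip(keys, row))


profile = load_profile("LiverSeg", "v2.1")
print(profile["p95_ms"], profile["p95_peak_gb"])   # 210.0 3.1
```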

[0207] After acquiring basic performance data, the system further analyzes the dependence of each medical image analysis algorithm on the data characteristics of the processed data. Data characteristics refer to the intrinsic properties of the image, such as whether the modality is CT or MRI, the presence of moderate motion artifacts greater than level 2, the signal-to-noise ratio of the image being below 15 dB, or the smoothness of the output result after processing by the preceding denoising algorithm. Different algorithms have varying sensitivities to these characteristics. For example, a blood vessel tracking algorithm based on edge detection may be extremely sensitive to image noise and therefore highly dependent on the quality of the preceding denoising processing result; while a U-Net classification model based on deep learning may have specific requirements for the normalization method of the image's window width and window level. The system uses a built-in expert knowledge base, which stores data in the form of rules or lookup tables, to quantify the dependence of each algorithm on various data characteristics and define it as a specific dependency level. For example, the liver segmentation algorithm "LiverSeg-v2.1" has a dependency level of 4 out of 5 on the data characteristic of "presence of contrast agent," indicating that its performance is heavily dependent on the correct contrast agent timing; while its dependency level on "image signal-to-noise ratio" is 2, showing strong noise resistance. These quantified dependency levels provide a crucial logical basis for determining the execution order of algorithms.

[0208] Next, based on the peak computation time and memory usage in the acquired historical performance data, and the dependency level just defined, a quantified initial execution priority score is generated for each medical image analysis algorithm using a preset priority calculation function. This function is a comprehensive multi-factor weighted model, and its specific form is as follows: [formula not preserved in the source text]. The inputs to this formula are the average computation time and the peak memory usage of the algorithm, each normalized to the 0-1 range among all algorithms to be executed, and the normalized value of the algorithm's dependency level score. Each of these three inputs has an associated preset weighting coefficient; for example, when the system focuses on optimizing time efficiency, the coefficients can be set to 0.5, 0.2, and 0.3. Assuming that besides "LiverSeg-v2.1" (150 ms execution time, 2.5 gigabytes memory, dependency level 4), there is also a "Radiomics-v1.3" feature extraction algorithm (30 ms execution time, 0.2 gigabytes memory, dependency level 2), after normalization and weighted calculation, "LiverSeg-v2.1" has a priority score of 0.45, while "Radiomics-v1.3" has a score of 0.82. This initial execution priority score reflects the inherent importance and resource friendliness of the algorithm itself; the algorithm with the higher score, "Radiomics-v1.3," will receive a higher priority in subsequent scheduling.

[0209] In the generation or application of the dynamic subsequence partitioning strategy, this abstract initial execution priority score is combined with the actual capabilities of each computing unit in the current environment and mapped to specific resource allocation parameters. This process draws on the underlying parallel framework's precise knowledge of the hardware environment. Consider, for example, a high-performance CPU computing unit CU1 with 16 cores and 64 gigabytes of memory and a mid-range CPU computing unit CU2 with 8 cores and 32 gigabytes of memory: for each unit, the framework holds the baseline throughput obtained through benchmark testing and the currently available free memory capacity obtained through real-time querying, here 40 gigabytes and 20 gigabytes, respectively. The mapping process tailors a set of resource quotas for each algorithm on each computing unit to which it is about to be allocated. For example, the resource allocation parameters may include the proportion of computation time slices reserved for each algorithm on each computing unit, meaning that in a composite task consisting of multiple algorithms executed serially, higher-priority algorithms receive more processor time. The parameters also include a maximum available memory limit, which is a hard limit. For example, for the aforementioned composite task to be executed on CU1, the memory limit is set to the larger of the two algorithms' peak memory requirements, i.e., 2.5 gigabytes, multiplied by a safety factor of, for example, 1.2, giving 3.0 gigabytes. This ensures that memory-intensive tasks have sufficient memory, while preventing the composite task as a whole from crashing due to excessive memory consumption.
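The mapping from priority scores to resource allocation parameters can be sketched as follows. The description fixes only the direction of the mapping (a higher score yields more processor time) and the memory rule (peak requirement times a safety factor), so the proportional time-slice rule used here is an assumption, and `map_resources` is a hypothetical name.

```python
def map_resources(scores, peak_memory_gb, free_memory_gb, safety_factor=1.2):
    """Map priority scores to time-slice shares and per-algorithm memory quotas."""
    total = sum(scores.values())
    # Assumed rule: time-slice share proportional to the priority score.
    time_share = {name: score / total for name, score in scores.items()}
    quota = {}
    for name, peak in peak_memory_gb.items():
        wanted = peak * safety_factor           # peak requirement plus margin
        if wanted > free_memory_gb:
            raise MemoryError(f"{name} needs {wanted} GB, only {free_memory_gb} GB free")
        quota[name] = wanted
    # The algorithms run serially, so the task-level hard limit is the largest quota.
    hard_limit = max(quota.values())
    return time_share, quota, hard_limit


scores = {"LiverSeg-v2.1": 0.45, "Radiomics-v1.3": 0.82}
peaks = {"LiverSeg-v2.1": 2.5, "Radiomics-v1.3": 0.2}
time_share, quota, hard_limit = map_resources(scores, peaks, free_memory_gb=40.0)
print({name: round(share, 3) for name, share in time_share.items()})
print(round(hard_limit, 2))   # 3.0
```

With the scores above, "Radiomics-v1.3" receives roughly 65% of the time slices and "LiverSeg-v2.1" roughly 35%, while the composite task's memory limit on CU1 comes out at 3.0 gigabytes.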

[0210] Finally, based on the resource allocation parameters generated above and customized for each algorithm on each computing unit, the resource quota allocated to each computing unit for the upcoming composite computing task is dynamically adjusted. This adjustment is achieved by invoking the resource management mechanism at the operating system level, such as utilizing the resource control functions commonly provided in modern operating system kernels. Before starting processing, the system programmatically creates an independent resource control entity for the composite task that will run on CU1. Subsequently, the system interacts with the operating system kernel interface to set the 3.0 gigabytes of maximum available memory calculated in the previous step as the hard limit for memory usage of this resource control entity. At the same time, based on the overall priority score of the composite task, a relatively high CPU scheduling weight is configured for it. For example, its CPU time slice share is set to a value such as 2048, which is significantly higher than the default share of 1024 for ordinary processes in the system, thereby instructing the operating system scheduler to prioritize allocating computing time to this task when CPU resources are contested. When the image processing thread created for this composite task starts, its corresponding system process or thread will be automatically and programmatically assigned to this specially configured resource control entity. In this way, when a complex computational task containing multiple serial analysis algorithms is scheduled to be executed on a certain computing unit, its resource consumption is subject to pre-planned, differentiated, and fine-grained control from the outset. This series of operations ensures that not only is the allocation of tasks among computing units optimized, but also that the execution of multiple analysis algorithms within each computing unit is resource-controlled and has clearly defined priorities, thus forming a comprehensive intelligent scheduling system from macro to micro levels.
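The description does not name a specific kernel facility. One interface that matches it is Linux control groups version 1, whose `cpu.shares` file defaults to 1024; treating the "resource control entity" as such a group is an assumption. The sketch below only plans the control-file writes and does not touch the file system, since applying them requires root privileges. `plan_cgroup_v1` and the group name are hypothetical, and "gigabytes" is interpreted as GiB.

```python
from pathlib import PurePosixPath

GIB = 1024 ** 3


def plan_cgroup_v1(group, memory_limit_gib, cpu_shares, root="/sys/fs/cgroup"):
    """Return {control file path: value} for one resource control group."""
    base = PurePosixPath(root)
    return {
        # Hard memory limit for every task in the group, in bytes.
        str(base / "memory" / group / "memory.limit_in_bytes"):
            str(int(memory_limit_gib * GIB)),
        # Relative CPU weight; 1024 is the cgroup v1 default.
        str(base / "cpu" / group / "cpu.shares"): str(cpu_shares),
    }


plan = plan_cgroup_v1("composite_cu1", memory_limit_gib=3.0, cpu_shares=2048)
for path, value in plan.items():
    print(path, "<-", value)
```

Under cgroup v1 a thread joins a group when its thread ID is written to the group's `tasks` file. On systems that use cgroup v2 the corresponding files are `memory.max` and `cpu.weight`, and the weight scale differs (its default is 100), so the 2048 versus 1024 figures would need to be translated.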

[0211] Step A20: After obtaining multiple image subsequences according to the dynamic subsequence partitioning strategy, a composite computing task is constructed for each image subsequence. The composite computing task executes the target image processing algorithm and at least two medical image analysis algorithms in sequence.

[0212] In practice, the system first receives the output from step S30, which is a set containing n task descriptors, where n is the number of available computing units. Each task descriptor precisely defines an image subsequence. For example, for the first image subsequence assigned to the first computing unit CU1 in the ordered computing unit sequence, its task descriptor may contain the following information: a starting index of 0, a number of images of 120 frames, and a memory pointer pointing to the starting position of the subsequence in the normalized image sequence memory block. Simultaneously, the system also obtains the execution priority and resource allocation parameters determined for the two medical image analysis algorithms from step A10, such as a liver segmentation algorithm called "LiverSeg-v2.1" and a radiomics feature extraction algorithm called "Radiomics-v1.3".

[0213] Based on this, the system instantiates a composite computation task object for each image subsequence. This object is a self-consistent data structure containing complete processing logic. Taking the processing of the first subsequence containing 120 frames of images as an example, the composite computation task object created by the system encapsulates the following key information: First, it contains a reference to the subsequence data, i.e., the aforementioned task descriptor, ensuring that the task object clearly knows the range of data it needs to process. Second, an ordered algorithm call chain is constructed internally within this object. The order of this chain strictly follows the execution order determined in step A10 based on algorithm dependency and priority scoring. Assuming the target image processing algorithm is a Gaussian blur denoising algorithm, and analysis shows that the "Radiomics-v1.3" algorithm depends on the segmentation result of the "LiverSeg-v2.1" algorithm, then the call chain is constructed as follows: first, Gaussian blur denoising is performed; second, the "LiverSeg-v2.1" algorithm is performed on its output; and third, the "Radiomics-v1.3" algorithm is performed on the segmentation mask generated by the "LiverSeg-v2.1" algorithm.

[0214] To manage the data flow between steps in the algorithm call chain, each composite computation task object also manages a private intermediate result buffer. When the first algorithm in the call chain, Gaussian blur denoising, processes 120 frames of images, its result data, a memory block of 120 × image width × image height × bytes per pixel, is written directly to this private buffer. Subsequently, the second algorithm in the call chain, "LiverSeg-v2.1," is triggered. It reads the denoised image data from this private buffer as its input and processes it. Assuming the generated segmentation mask is a single-channel 8-bit integer image, after processing it writes this mask data block of 120 × image width × image height bytes to the next available area in the private buffer, or overwrites denoised image data that is no longer needed. Finally, the third algorithm, "Radiomics-v1.3," is called. It reads the segmentation mask from the buffer and extracts radiomics features; the final feature results are also stored in this buffer awaiting encapsulation. This design, which encapsulates an intermediate data management mechanism, ensures efficient and conflict-free data transfer within a complex task.

[0215] This construction process is repeated for each of the n image subsequences, ultimately generating a set or queue containing n composite computational task objects. Each object encapsulates the specific image subsequence to be processed, an identical ordered chain of algorithm calls, and a set of resource allocation parameters tailored to the computational units allocated to that task. These objects are now in a schedulable state; they are completely independent atomic task units that encapsulate complete processing logic and resource constraints, providing directly usable input to the thread scheduling module in subsequent step A30, ensuring that each created thread can receive a well-defined, resource-controlled composite computational task to execute.
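The composite computation task object described in step A20 can be sketched as a small data structure. The class and field names below are hypothetical, the three chain entries are placeholder callables rather than real algorithms, and the frame counts and parameter values in the usage lines are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TaskDescriptor:
    """Locates one image subsequence inside the normalized image sequence."""
    start_index: int
    frame_count: int


@dataclass
class CompositeTask:
    """One schedulable unit: data range, ordered call chain, resource limits."""
    descriptor: TaskDescriptor
    call_chain: List[Callable]
    resource_params: Dict[str, float]


def build_tasks(descriptors, call_chain, params_per_unit):
    """Create one composite task per subsequence, all sharing the same chain order."""
    return [
        CompositeTask(descriptor, list(call_chain), dict(params))
        for descriptor, params in zip(descriptors, params_per_unit)
    ]


# Placeholder callables standing in for the three algorithms of the call chain.
def gaussian_denoise(data): return data
def liver_seg(data): return data
def radiomics(data): return data


descriptors = [TaskDescriptor(0, 120), TaskDescriptor(120, 80)]
params = [{"memory_limit_gb": 3.0, "cpu_shares": 2048},
          {"memory_limit_gb": 3.0, "cpu_shares": 1024}]
tasks = build_tasks(descriptors, [gaussian_denoise, liver_seg, radiomics], params)
print(len(tasks), tasks[0].descriptor, [f.__name__ for f in tasks[0].call_chain])
```

Each task receives its own copy of the chain and parameter dictionary, so a later adjustment to one task's limits does not leak into the others.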

[0216] Step A30: Create an independent image processing thread for each of the composite computing tasks and assign it to an independent computing unit. Within each image processing thread, the medical image analysis algorithm is called serially according to the execution priority.

[0217] In this embodiment, step A30 specifically includes:

[0218] Based on the correspondence between the composite computing task and the independent computing unit, an image processing thread is created, and the operating system function is invoked to bind each image processing thread to its corresponding independent computing unit;

[0219] Within each of the image processing threads, the composite computation task object it carries is parsed to obtain the ordered sequence of algorithm calls arranged according to execution priority contained therein;

[0220] The medical image analysis algorithms are executed sequentially following the ordered algorithm call sequence, wherein the execution of the next algorithm is premised on the completion of the previous algorithm, and the processing output of the previous algorithm is used as the input data of the next algorithm.

[0221] After the last algorithm in the ordered algorithm call sequence has been executed, a complete result data block containing the final results of all analysis steps is encapsulated and associated with the original position information of the image subsequence it processed, and the encapsulated result data block is output.

[0222] In practice, the system first receives a set of n composite computing task objects from step A20, where n is equal to the number of available computing units. Then, an image processing thread pool of size n is created. For the i-th computing unit CU_i (i from 1 to n) in the ordered sequence of computing units, the scheduling module creates a dedicated image processing thread Thread_i and passes the i-th composite computing task object to that thread as a startup parameter. For example, if the system identifies four available CPU cores as computing units, it will create four threads, Thread1 to Thread4, and associate the first to fourth composite computing task objects with them respectively.

[0223] After creating threads, a crucial step is to precisely allocate and bind each thread to its corresponding independent computing unit. This operation is achieved by calling the processor affinity setting interface provided by the operating system. The thread scheduling module configures an exclusive processor mask for each newly created thread, Thread_i. For example, suppose computing unit CU1 corresponds to physical CPU core 0, CU2 to core 1, CU3 to core 4, and CU4 to core 5. Then, after starting Thread1, the scheduling module will immediately set an affinity mask for it that contains only CPU core 0 (e.g., in Linux systems, a cpu_set_t structure with only bit 0 set to 1). Similarly, Thread2 will be bound to core 1, Thread3 to core 4, and Thread4 to core 5. The direct effect of this binding is that the operating system scheduler is forced to run the thread only on the specified physical core, thereby preventing expensive migration of threads between different cores and ensuring that the non-uniform task partitioning based on the baseline throughput of each computing unit in step S20 is strictly executed at the physical level.
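On Linux, this binding can be sketched with Python's `os.sched_setaffinity`, which wraps the same kernel call that operates on a `cpu_set_t`. Passing 0 as the first argument targets the calling thread, so each worker pins itself at start-up. `run_pinned` is a hypothetical helper, and the fallback branch for platforms without this API is an assumption of the sketch.

```python
import os
import threading


def run_pinned(core, task):
    """Run task() in a new thread pinned to one CPU core; return (affinity, result)."""
    outcome = {}

    def body():
        if hasattr(os, "sched_setaffinity"):         # available on Linux
            os.sched_setaffinity(0, {core})          # 0 = the calling thread
            outcome["affinity"] = os.sched_getaffinity(0)
        else:
            outcome["affinity"] = None               # cannot pin here; run unpinned
        outcome["result"] = task()

    thread = threading.Thread(target=body)
    thread.start()
    thread.join()
    return outcome["affinity"], outcome["result"]


if hasattr(os, "sched_getaffinity"):
    first_core = min(os.sched_getaffinity(0))        # a core this process may use
else:
    first_core = 0
affinity, result = run_pinned(first_core, lambda: sum(range(10)))
print(affinity, result)
```

The core is chosen from the set the process is already allowed to use, because requesting a core outside that set (for example inside a restricted container) makes the call fail.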

[0224] After a thread is successfully created and bound to its dedicated computing unit, the thread begins executing its assigned composite computing task. The thread's main execution function first parses the passed-in composite computing task object to obtain the ordered chain of algorithm calls contained within it. This chain of calls has been constructed in step A20 according to the execution priority determined in step A10. For example, the chain of calls might be a list containing three function pointers or algorithm objects: [denoising algorithm, liver segmentation algorithm, radiomics feature extraction algorithm].

[0225] The execution logic of a thread is a strictly serial call process. It executes each algorithm sequentially according to the order defined in the call chain. Specifically:

[0226] Step 1: The thread calls the first algorithm in the list, the denoising algorithm, and takes the memory pointer to the current subsequence data recorded in the composite task object as its input. Assuming the current subsequence contains 120 frames of 512x512 16-bit images, the input data block size is 120×512×512×2 bytes. After the denoising algorithm completes its processing, its output (a data block of the same size but with modified content) is stored in the thread-private intermediate result buffer A.

[0227] Step 2: The thread then calls the second algorithm in the list, namely the "LiverSeg-v2.1" liver segmentation algorithm. At this point, the algorithm's input is no longer the original subsequence data, but rather the denoised image data stored in buffer A by the previous step. After the algorithm is executed, it generates, for example, an 8-bit single-channel segmentation mask of 512x512 pixels for each of the 120 frames, and the result is stored in another private intermediate result buffer B.

[0228] Step 3: The thread finally calls the third algorithm in the list, namely the "Radiomics-v1.3" radiomics feature extraction algorithm. The inputs to this algorithm are the segmentation mask in buffer B and the denoised image in buffer A. The algorithm extracts features from the denoised image based on the masked regions. Ultimately, it outputs a structured feature set, such as an array of 120 elements (one per frame), where each element is a vector containing 107 radiomics features. This final result is stored in the thread's output buffer.
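By way of a hedged example, the three-step serial call chain and the role of buffers A and B can be sketched as follows. The functions `denoise`, `segment`, and `extract_features` are toy stand-ins invented for this sketch. They are not the "LiverSeg-v2.1" or "Radiomics-v1.3" algorithms, and each frame is reduced to a short list of integers. The helper `input_block_bytes` reproduces the block-size arithmetic from step 1.

```python
from typing import List

Frame = List[int]  # toy stand-in for one image frame (flattened pixel values)


def input_block_bytes(frames: int, height: int, width: int, bytes_per_pixel: int) -> int:
    """Size of the raw subsequence block, e.g. 120 x 512 x 512 x 2 bytes."""
    return frames * height * width * bytes_per_pixel


def denoise(frames: List[Frame]) -> List[Frame]:
    """Toy denoiser: 3-tap integer mean with edge clamping (same shape as input)."""
    out = []
    for f in frames:
        n = len(f)
        out.append([(f[max(i - 1, 0)] + f[i] + f[min(i + 1, n - 1)]) // 3 for i in range(n)])
    return out


def segment(frames: List[Frame], threshold: int = 50) -> List[Frame]:
    """Toy segmenter: binary mask, 1 where the denoised value reaches the threshold."""
    return [[1 if v >= threshold else 0 for v in f] for f in frames]


def extract_features(frames: List[Frame], masks: List[Frame]) -> List[list]:
    """Toy features per frame: [mask area, mean denoised value inside the mask]."""
    features = []
    for f, m in zip(frames, masks):
        area = sum(m)
        mean = sum(v for v, keep in zip(f, m) if keep) / area if area else 0.0
        features.append([area, mean])
    return features


def run_chain(subsequence: List[Frame]) -> List[list]:
    """Strictly serial execution inside one thread."""
    buffer_a = denoise(subsequence)              # step 1: output kept in buffer A
    buffer_b = segment(buffer_a)                 # step 2: reads A, writes buffer B
    return extract_features(buffer_a, buffer_b)  # step 3: reads both A and B


print(input_block_bytes(120, 512, 512, 2))  # 62914560 bytes, i.e. 60 MiB
```

The sketch keeps both intermediate buffers alive until step 3 because the feature extraction step depends on the outputs of both preceding steps.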

[0229] Throughout the serial call process, all operations are completed within the context of a single thread, ensuring that dependencies between algorithms are satisfied. Data transfer between algorithms is achieved through efficient memory pointer operations, avoiding unnecessary cross-thread communication or data copying. Furthermore, since the thread's process ID is assigned to a resource control entity (such as Linux cgroups) configured based on resource allocation parameters calculated in step A10 (e.g., a 3.0 gigabyte memory limit and a CPU weight of 2048), when the thread executes, for example, the memory-intensive "LiverSeg-v2.1" algorithm, the operating system kernel will actively enforce a hard limit of 3.0 gigabytes for memory usage and grant it higher scheduling priority when CPU resources are scarce.
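For illustration only, the resource allocation parameters mentioned above (a 3.0 gigabyte memory limit and a CPU weight of 2048) can be translated into control-group settings as sketched below. The sketch assumes the cgroup v2 interface files `memory.max` and `cpu.weight`, interprets "gigabyte" as GiB, and only computes the values to be written. It does not touch the file system. The names `ResourceLimits` and `cgroup_v2_settings` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceLimits:
    """Resource allocation parameters for one composite computing task."""
    memory_limit_gib: float  # hard memory cap (assumed to be expressed in GiB)
    cpu_weight: int          # relative CPU share


def cgroup_v2_settings(limits: ResourceLimits) -> dict:
    """Map the parameters to cgroup v2 file names and the strings to write."""
    # cgroup v2 accepts cpu.weight values in the range 1..10000 (default 100).
    if not 1 <= limits.cpu_weight <= 10000:
        raise ValueError("cpu.weight must be between 1 and 10000")
    return {
        "memory.max": str(int(limits.memory_limit_gib * 1024 ** 3)),
        "cpu.weight": str(limits.cpu_weight),
    }


print(cgroup_v2_settings(ResourceLimits(memory_limit_gib=3.0, cpu_weight=2048)))
```

In an actual deployment, each value would be written to the corresponding file of a control group created for the task, which typically requires appropriate privileges.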

[0230] Through the aforementioned thread creation, hardware binding, serial execution within tasks, and runtime resource constraints, step A30 organically combines macro-level parallel strategies with micro-level algorithmic processes, ensuring that each complex computing task can efficiently and orderly complete its complex analysis process on its dedicated and resource-controlled computing unit.

[0231] Step A40: After completing the composite calculation task, each image processing thread generates a result data block containing all algorithm results and records the starting index corresponding to the processed image subsequence.

[0232] In practice, step A40 is the final output stage of each independent image processing thread in its lifecycle, marking the end of the complete processing flow of the specific image subsequence it carries. When an image processing thread successfully executes the target image processing algorithm and all specified medical image analysis algorithms in sequence according to the ordered algorithm call sequence defined in step A30, such as sequentially completing Gaussian blur denoising, U-Net-based liver region segmentation, and radiomics feature extraction for the segmented region, the thread enters the result encapsulation stage.

[0233] The core task at this stage is to integrate and solidify the multiple, heterogeneous intermediate and final results generated by various algorithm steps, distributed in the thread's private memory space, into a unified, self-contained, and securely transferable result data block. First, the thread allocates a new, contiguous storage space in memory according to a predefined encapsulation protocol, large enough to hold all analysis results for all image frames in the current subsequence. Then, the thread accesses its internal intermediate result buffer one by one, copying key results generated during the execution of the composite computation task—such as the binarized segmentation mask data for each image frame and a set of numerical radiomics feature vectors extracted from these mask regions—in an orderly manner corresponding to the original image frames, into this newly allocated storage space, thus forming a structured data set.

[0234] To ensure the parsability and integrity of this result data block, its internal structure typically includes a header region and a subsequent data payload region. The header region acts as a metadata descriptor, and a crucial field within it is the original logical starting index of the image subsequence it processes within the entire normalized image sequence, maintained by the image processing thread since its creation. This index value, such as the integer 501, is precisely written into the header, becoming a unique and immutable identifier for this result data block within the global sequence. Besides the starting index, the header region may also contain other key metadata, such as the total number of image frames in the subsequence (e.g., 120 frames), the data type and dimension of each algorithm's output, and its specific offset within the data payload region.

[0235] Furthermore, to enhance the system's robustness and traceability, the thread also encapsulates its own performance metrics monitored during the execution of the composite computation task, such as the precise time consumed to complete the entire task, the number of CPU cycles, and peak memory usage, into the header of the result data block or associates them with it as an independent metadata structure. This information will provide crucial real-time data for the weighted fusion and performance evaluation in the subsequent step A60. After all data and metadata have been written, the thread may calculate a checksum and store it in the header to ensure data integrity during subsequent transmission. Finally, the thread generates a pointer or handle to this structured and fixed-content result data block that fully encapsulates all algorithm results, the starting index, and other metadata, and uses this handle as its final output, ready to be submitted to the next step of the result aggregation and integration process.
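As one possible, non-authoritative layout, the header and payload structure described above can be sketched with a fixed-size binary header. The field order, the little-endian encoding, and the use of CRC-32 as the checksum are assumptions made for this sketch. The specification does not prescribe a concrete format. The names `HEADER`, `pack_block`, and `unpack_block` are hypothetical.

```python
import struct
import zlib

# Assumed header layout: start index (uint32), frame count (uint32),
# elapsed seconds (float64), CRC-32 of the payload (uint32); little-endian, unpadded.
HEADER = struct.Struct("<IIdI")


def pack_block(start_index: int, frame_count: int, elapsed_s: float, payload: bytes) -> bytes:
    """Build a self-contained result data block: header followed by payload."""
    checksum = zlib.crc32(payload)
    return HEADER.pack(start_index, frame_count, elapsed_s, checksum) + payload


def unpack_block(block: bytes) -> dict:
    """Parse a block and verify payload integrity against the stored checksum."""
    start_index, frame_count, elapsed_s, checksum = HEADER.unpack_from(block)
    payload = block[HEADER.size:]
    if zlib.crc32(payload) != checksum:
        raise ValueError("result data block failed its integrity check")
    return {
        "start_index": start_index,
        "frame_count": frame_count,
        "elapsed_s": elapsed_s,
        "payload": payload,
    }


block = pack_block(501, 120, 10.0, b"masks-and-features")
print(unpack_block(block)["start_index"])  # 501
```

Storing the starting index in the header lets the consumer place the block in the global sequence without inspecting the payload.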

[0236] Step A50: Start a result aggregation thread to monitor the completion status of each image processing thread. When a thread is detected to be complete, obtain the result data block output by that image processing thread and its corresponding starting index.

[0237] In practical implementation, step A50 serves as a crucial bridge connecting distributed computing and centralized integration. Its execution is parallel to the startup of each image processing thread in step A30. After distributing all composite computing tasks and starting the image processing thread pool, the system immediately creates an independent, long-lived result aggregation thread. The core function of this thread is as a dedicated, asynchronous event listener and data collector. It does not participate in any intensive image processing operations, thus ensuring that it can respond to the completion signal of any image processing thread with extremely low latency. To implement this listening function, the system establishes a shared, thread-safe communication channel between the main process and all image processing threads. This channel is preferably implemented as a first-in-first-out (FIFO) blocking concurrent queue, which internally uses synchronization primitives such as mutexes and condition variables to ensure the atomicity of operations and data consistency in a multi-producer, single-consumer scenario.

[0238] Once any image processing thread has completed all the result encapsulation work defined in step A40, it assumes the role of a producer. It encapsulates the lightweight result descriptor object—containing the memory pointer to the result data block, the original logical starting index, and other metadata—as a whole and performs an enqueue operation, atomically pushing it into the aforementioned shared concurrent queue. After completing the enqueue operation, the image processing thread's mission is complete; it can safely release its occupied resources and terminate, or be reclaimed by the thread pool for reuse by subsequent tasks. Simultaneously, the result aggregation thread, as the sole consumer of the queue, performs a blocking dequeue operation in its main loop. The characteristic of this operation is that if the shared queue is empty, the result aggregation thread will automatically relinquish its processor time slice and enter an efficient sleep waiting state until a new result descriptor is pushed into the queue by the producer. This event-driven mechanism completely avoids the resource waste caused by repeatedly checking the completion status of each thread using a high-overhead polling method.

[0239] Technically, the completion of a thread is detected at the moment the result aggregation thread wakes from its blocking dequeue operation and receives a valid result descriptor object. Once this object is retrieved from the queue, the result aggregation thread gains exclusive access to the output of the completed subtask. It immediately parses the result descriptor, extracting two crucial pieces of information: first, the memory address or handle of the result data block that actually stores all the analysis algorithm results; and second, the integer value that uniquely identifies the data block's position in the global sequence, namely the original logical starting index. Through this process, the result aggregation thread not only receives the data outputs generated by the image processing threads but also obtains the logical coordinates of each output within the final complete sequence, providing all the necessary prerequisites for the precise and orderly merging operation in subsequent step A60.
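The multi-producer, single-consumer arrangement of step A50 can be illustrated, under simplifying assumptions, with Python's standard `queue.Queue`, whose `get()` call blocks without polling until an item is available. The class `ResultDescriptor` and the functions `image_worker`, `aggregate`, and `run_demo` are hypothetical names, and the "result data block" is a placeholder list rather than real analysis output.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class ResultDescriptor:
    """Lightweight descriptor pushed by a finished image processing thread."""
    start_index: int
    frame_count: int
    block: list  # placeholder for a pointer or handle to the result data block


def image_worker(start: int, count: int, out_q: queue.Queue) -> None:
    """Producer: finish the (omitted) processing, then enqueue one descriptor."""
    block = [f"result-{i}" for i in range(start, start + count)]
    out_q.put(ResultDescriptor(start, count, block))


def aggregate(out_q: queue.Queue, expected: int, collected: dict) -> None:
    """Sole consumer: blocks in get() until a producer enqueues a descriptor."""
    for _ in range(expected):
        desc = out_q.get()
        collected[desc.start_index] = desc


def run_demo() -> dict:
    tasks = [(0, 3), (3, 2), (5, 4)]  # (start index, frame count), illustrative only
    out_q: queue.Queue = queue.Queue()
    collected: dict = {}
    aggregator = threading.Thread(target=aggregate, args=(out_q, len(tasks), collected))
    aggregator.start()
    workers = [threading.Thread(target=image_worker, args=(s, c, out_q)) for s, c in tasks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    aggregator.join()
    return collected
```

Because descriptors are keyed by starting index, the order in which the worker threads happen to finish does not affect where each block is placed later.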

[0240] Step A60: The result aggregation thread merges each result data block into the global result sequence according to the original logical order based on the starting index, and performs weighted fusion of the merged results according to the execution priority.

[0241] In this embodiment, the merged results are weighted and fused according to the execution priority as follows:

[0242] When the result aggregation thread obtains the result data block and its starting index, it simultaneously receives the actual load index of the computing unit where it is located when processing the composite computing task, which is monitored and reported by each image processing thread.

[0243] Based on the actual load indicators, the effective throughput of the corresponding computing unit when executing various medical image analysis algorithms is re-evaluated.

[0244] The effective throughput is compared with the baseline throughput of the computing unit, and the deviation value is calculated.

[0245] During the process of merging each result data block into the global result sequence, a confidence factor that is dynamically adjusted according to the deviation value is attached to the algorithm result in each result data block. When the deviation value exceeds a preset threshold, the confidence factor is negatively adjusted.

[0246] When it is necessary to fuse multiple results from different result data blocks that belong to the same logical location and originate from the same algorithm, a weighted fusion coefficient is calculated based on the confidence factor attached to each result and the execution priority.

[0247] After completing the fusion processing of all currently available result data blocks, a computing resource efficiency assessment report is generated based on the overall distribution of the deviation values.

[0248] Based on the aforementioned computing resource performance evaluation report, online fine-tuning of the resource allocation parameters of the computing units associated with the image processing threads that have not yet completed their computations is triggered.

[0249] In practice, step A60, as the intelligent core for orderly integration and quality perception of results in the entire analysis process, involves operations far exceeding simple mechanical splicing. For clarity, consider a specific scenario: the result aggregation thread is waiting for processing results. At this time, the image processing thread Thread3, bound to the less powerful computing unit CU3, completes its task first. In step A50, the result aggregation thread retrieves the result descriptor submitted by Thread3 from the shared queue. This descriptor explicitly indicates that the starting index of the processed image subsequence is 201, containing a total of 80 frames. Simultaneously, the descriptor also encapsulates detailed runtime performance metrics reported by Thread3: the total time taken to complete the composite computation task of these 80 frames is 10.0 seconds.

[0250] The result aggregation thread then proceeds to the post-performance evaluation phase. First, based on these firsthand actual load metrics, it recalculates the effective throughput of CU3 during this task execution, using 80 frames divided by 10.0 seconds, resulting in an effective throughput of 8 frames per second. Next, the result aggregation thread retrieves the baseline throughput of 10 frames per second measured for CU3 before the task began from the system configuration. By comparing these two values, the result aggregation thread calculates a performance deviation of (8-10) / 10, a negative deviation of 20%. This value precisely quantifies the degree of performance degradation of CU3 relative to its ideal performance during actual operation.

[0251] Before merging these 80 frames of result data into the corresponding positions in the global result sequence, specifically the interval from index 201 to 280, the result aggregation thread uses this performance deviation value to dynamically apply a confidence factor to all analysis results within this data block, including 80 liver segmentation masks and 80 sets of radiomics feature vectors. The system has a pre-defined confidence adjustment function; for example, the base confidence is 1.0 minus the absolute value of the deviation, i.e., 1.0 - 0.2 = 0.8. Furthermore, the system has a 15% performance deviation warning threshold. Because the 20% deviation exceeds this threshold, the system triggers an additional penalty mechanism, multiplying the calculated confidence by a penalty coefficient of, for example, 0.9. Therefore, the final confidence factor applied to all results in these 80 frames is determined to be 0.8 multiplied by 0.9, i.e., 0.72.
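The arithmetic of this example (80 frames in 10.0 seconds, a baseline of 10 frames per second, a 15% warning threshold, and a penalty coefficient of 0.9) can be reproduced with the following minimal sketch. The function names are hypothetical, and clamping the confidence at zero is an added assumption that is not stated in the text.

```python
def effective_throughput(frames: int, seconds: float) -> float:
    """Frames actually processed per second during the task."""
    return frames / seconds


def relative_deviation(effective: float, baseline: float) -> float:
    """Signed deviation of the effective throughput from the baseline."""
    return (effective - baseline) / baseline


def confidence_factor(deviation: float, warn_threshold: float = 0.15,
                      penalty: float = 0.9) -> float:
    """Base confidence 1 - |deviation|, penalised when the threshold is exceeded."""
    confidence = 1.0 - abs(deviation)
    if abs(deviation) > warn_threshold:
        confidence *= penalty
    return max(confidence, 0.0)  # clamp at zero (assumption of this sketch)


throughput = effective_throughput(80, 10.0)      # 8.0 frames per second
deviation = relative_deviation(throughput, 10)   # -0.2, i.e. 20% below baseline
print(round(confidence_factor(deviation), 2))    # 0.72
```

Passing a different `warn_threshold` changes only whether the penalty coefficient is applied. The base confidence is unaffected.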

[0252] This confidence factor plays a crucial role when fusing results from overlapping regions. Suppose that due to fine-tuning of previous online tasks, the last two frames processed by Thread3, namely frames 279 and 280, overlap with the first two frames of a subsequence processed by Thread1, which was completed slightly later and ran on the high-performance computing unit CU1. Assume that Thread1's performance deviation is minimal, and its result has a confidence factor as high as 0.98. When the result aggregation thread needs to determine the final segmentation result for frame 279, it will face two segmentation masks from Thread1 and Thread3 respectively. At this point, it will calculate a weighted fusion coefficient based on their respective confidence factors: the fusion weight for the Thread3 result is 0.72 / (0.72+0.98), approximately equal to 0.42; the fusion weight for the Thread1 result is 0.98 / (0.72+0.98), approximately equal to 0.58. Ultimately, the official segmentation mask for frame 279 will be generated by pixel-level weighted averaging of the two masks, with the mask from Thread1, which has more stable performance and more reliable results, taking a larger share.
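A minimal sketch of this confidence-weighted fusion is given below. Masks are represented as flat lists of 0 and 1 for brevity, and the weights are derived from the confidence factors only, as in the worked example. How the execution priority would enter the coefficient is not detailed here and is therefore left out. The final re-binarization at 0.5 in `binarize` is an assumption of the sketch, as are all function names.

```python
from typing import List


def fusion_weights(confidences: List[float]) -> List[float]:
    """Normalise confidence factors so that the fusion weights sum to 1."""
    total = sum(confidences)
    return [c / total for c in confidences]


def fuse_masks(masks: List[List[int]], confidences: List[float]) -> List[float]:
    """Pixel-level weighted average of several masks for the same frame."""
    weights = fusion_weights(confidences)
    return [sum(w * p for w, p in zip(weights, pixels)) for pixels in zip(*masks)]


def binarize(values: List[float], threshold: float = 0.5) -> List[int]:
    """Turn the averaged values back into a binary mask (assumed post-step)."""
    return [1 if v >= threshold else 0 for v in values]


w_thread3, w_thread1 = fusion_weights([0.72, 0.98])
print(round(w_thread3, 2), round(w_thread1, 2))  # 0.42 0.58

mask_thread3 = [1, 1, 0, 0]  # toy 4-pixel masks for frame 279
mask_thread1 = [1, 0, 1, 0]
print(binarize(fuse_masks([mask_thread3, mask_thread1], [0.72, 0.98])))  # [1, 0, 1, 0]
```

Where the two masks disagree, the pixel follows the Thread1 mask in this toy case because its weight exceeds 0.5.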

[0253] After merging and integrating all arriving result data blocks, the result aggregation thread extends its responsibilities to macro-level performance review. It summarizes the performance deviation values of all completed subtasks, forming a deviation value sequence, such as [-20%, -5%, -22%, +1%]. Based on this sequence, it calculates the overall average deviation, variance, and the proportion of tasks exceeding warning thresholds, generating a computing resource efficiency assessment report. This report might state: "In this analysis task, computing units CU3 and CU4 exhibited consistently substandard performance, with an average performance deviation exceeding 20%, potentially indicating resource contention or hardware limitations." Finally, this report triggers a feedback control signal. Based on the report's conclusions, the central scheduling module identifies other unfinished tasks still running on these inefficient computing units and performs real-time, targeted online fine-tuning of their resource allocation parameters, such as CPU scheduling priority. This transforms the insights gained from the result assessment into direct optimization of the ongoing computation process, forming a complete intelligent closed loop from performance monitoring and result quality awareness to dynamic real-time tuning.
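The summary statistics named above can be computed, for instance, as in the following sketch, which uses the example deviation sequence [-20%, -5%, -22%, +1%] and the 15% warning threshold from the earlier example. Treating the variance as a population variance and flagging on the absolute value of the deviation are assumptions of the sketch. The function name `efficiency_report` is hypothetical.

```python
from statistics import mean, pvariance
from typing import List


def efficiency_report(deviations: List[float], warn_threshold: float = 0.15) -> dict:
    """Summarise per-task deviations into a small efficiency report."""
    flagged = [i for i, d in enumerate(deviations) if abs(d) > warn_threshold]
    return {
        "mean_deviation": mean(deviations),
        "variance": pvariance(deviations),       # population variance (assumption)
        "flagged_ratio": len(flagged) / len(deviations),
        "flagged_tasks": flagged,                # positions within the input sequence
    }


report = efficiency_report([-0.20, -0.05, -0.22, 0.01])
print(round(report["mean_deviation"], 3))  # -0.115
print(report["flagged_ratio"])             # 0.5
```

The list of flagged tasks is what a scheduler would use to decide which still-running tasks should have their resource allocation parameters adjusted.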

[0254] In step A70, during the process of dynamically integrating all processed image subsequences, a progressive analysis report matching the processing procedure of the composite computing task is generated synchronously based on the integration status of the global result sequence.

[0255] In practice, the execution of step A70 is tightly coupled with the result merging process of step A60, occurring synchronously. Its core function is to transform the intermediate results of background computation into meaningful and continuously enriched analysis reports in real time and in an orderly manner. This function is handled by a dedicated report generation module, designed as a direct downstream consumer of the result aggregation thread. The report generation module is triggered immediately after the result aggregation thread successfully and seamlessly writes one or more result data blocks into the global result sequence, thereby extending the continuous integrated data segment starting from position zero in the sequence. For example, when the global status table shows that all results from frame 0 to frame 80 are in place, the result aggregation thread sends an event notification to the report generation module, precisely indicating that the longest continuous available data range currently formed is 0 to 80.

[0256] Upon receiving this notification, the report generation module does not wait for the entire sequence to finish processing but immediately begins its work. It maintains an internal state variable recording the index of the last frame for which a report was previously generated, initially set to negative one. Based on the received new range of 0 to 80, it identifies the frame interval requiring new report content as the range from the previously recorded index plus one to 80. Subsequently, the module directly accesses this ready-to-use data in the global result sequence buffer. Since each composite computation task contains multiple sequentially executed algorithms, the global result sequence does not store single image data but rather a structured, multi-layered result set. The report generation module parses this result set frame by frame, generating rich visualizations and quantitative content for each frame that match the processing flow of the composite computation task. Taking the composite task of liver segmentation and radiomics feature extraction as an example, for the i-th frame, the report generation module generates a page or chart containing multiple views. This may include a three-window comparison chart: the left window shows the original, normalized image slice; the middle window shows the image after target image processing algorithms such as Gaussian blur denoising; and the right window shows the segmentation contours or masks generated by a liver segmentation algorithm, highlighted in a color such as red and superimposed on the denoised image. Below the visualization is a data table detailing all the quantitative metrics calculated by radiomics feature extraction algorithms for the liver segmentation region of that frame, such as volume, average density, and gray-level co-occurrence matrix features describing texture complexity.

[0257] As the results aggregation thread continuously integrates subsequently arriving result data blocks, assuming the next notification indicates that the continuous integration range has expanded to frame 150, the report generation module will be reactivated. At this point, it will begin generating the same structured report content from frame 81 to frame 150, starting from its previous processing position of frame 80, and seamlessly append it to the existing report document. More importantly, the global summary section of the report will also be updated synchronously. For example, a key indicator on the report's homepage, "Total Liver Volume Analyzed," will be updated in real-time by accumulating the segmented volume of each slice in the newly processed 70 frames. Similarly, for all 151 processed frames of data, the report will recalculate and display the global statistical distribution of all radiomics features, such as mean, standard deviation, and histograms, providing users with a dynamically evolving, macroscopic quantitative understanding of the entire lesion region. This progressive generation method means that users do not need to wait for the total processing time to end (several minutes or even longer). They can start browsing the analysis results from the top of the sequence shortly after processing begins, and see the report content "grow" and become richer in real time as processing progresses. This greatly improves the interactive experience and provides strong technical support for real-time clinical diagnostic decisions.
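The bookkeeping behind this progressive behaviour can be sketched as a small tracker that records integrated blocks, extends the contiguous range starting at frame 0, and returns only the frames that still need report content. The class name `ProgressiveReporter` and its attributes are hypothetical. Report rendering itself is omitted.

```python
class ProgressiveReporter:
    """Tracks the contiguous integrated range and the last frame already reported."""

    def __init__(self) -> None:
        self.pending = {}          # start index -> frame count, not yet contiguous
        self.contiguous_end = -1   # last frame of the contiguous range from frame 0
        self.last_reported = -1    # initially negative one, as in the description

    def add_block(self, start_index: int, frame_count: int) -> range:
        """Register an integrated block; return the frames needing new report content."""
        self.pending[start_index] = frame_count
        # Extend the contiguous range while the next expected block is available.
        while self.contiguous_end + 1 in self.pending:
            self.contiguous_end += self.pending.pop(self.contiguous_end + 1)
        new_frames = range(self.last_reported + 1, self.contiguous_end + 1)
        self.last_reported = self.contiguous_end
        return new_frames


reporter = ProgressiveReporter()
print(list(reporter.add_block(0, 81))[-1])   # 80: frames 0..80 are reported first
print(len(reporter.add_block(200, 10)))      # 0: block is not yet contiguous
print(len(reporter.add_block(81, 70)))       # 70: frames 81..150 are appended
```

A block that arrives out of order is held in `pending` and is reported later, once the gap in front of it has been filled.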

[0258] Continuing with the scenario of the first embodiment, the complexity of the analysis task is increased. It not only requires denoising 512 frames of abdominal CT sequences, but also further execution of deep learning-based automatic liver segmentation, and extraction of a complete set of radiomics features from the segmented liver regions. This workflow constitutes a complex computational task, which will be executed on the same workstation with 2 P-Cores and 4 E-Cores.

[0259] In step A10, when the system generates the same non-uniform partitioning strategy as the previous example (P-Core processes 116 frames, E-Core processes 70 frames), it simultaneously initiates the analysis of the three algorithms (Gaussian denoising, liver segmentation, and feature extraction) in the composite task. The system retrieves data from the knowledge base: the liver segmentation model (such as U-Net) has a high peak memory usage (approximately 3.5GB) and a long computation time; while the feature extraction algorithm depends on the segmentation result. Based on this, the system determines a strict sequential execution priority: denoising → segmentation → extraction. Simultaneously, it specifies resource allocation parameters for the composite task on each computing unit; for example, it allocates a higher CPU scheduling priority to P-Core and sets a 4GB memory usage hard cap (cgroup limit) to ensure the stable operation of the high-consumption segmentation algorithm.

[0260] In steps A20 and A30, the system constructs six composite computation task objects for each of the six image subsequences. Each object encapsulates the aforementioned three-step serial algorithm call chain. The six image processing threads created subsequently are bound to their respective cores and begin execution. Taking thread 1, bound to the P-Core, as an example, it first performs Gaussian denoising on the 116 frames it is responsible for. After completion, the denoising result is used as input to call the liver segmentation model, generating 116 binary liver mask images. Finally, it performs calculations on the denoised images within the mask regions, extracting 116 groups of radiomics feature vectors, each containing 107 feature values.

[0261] Steps A40 and A50 are triggered when thread 4, running on E-Core, completes its 70-frame composite task. The thread encapsulates the three sets of results—denoised image, segmentation mask, and feature vector—along with their starting index (e.g., 348) and the actual execution time of this task (e.g., 45 seconds) into a structured result data block and submits it to the result aggregation thread.

[0262] The result aggregation thread performs in-depth processing on this result data block in step A60. Based on the actual execution time of 45 seconds, it compares it to the E-Core's baseline performance and calculates a performance deviation of -15%. Since this deviation does not exceed the preset severity threshold of -20%, the system attaches a confidence factor (e.g., 0.85) dynamically adjusted based on the deviation to all results within this data block (segmentation masks, feature values, etc.). Subsequently, these results with confidence labels are merged into the global result sequence at starting index 348.

[0263] Simultaneously, the progressive report generation module in step A70 is activated. The reporting module is triggered immediately after the first P-Core (thread 1) completes its tasks for frames 0 to 115 and is integrated. It accesses the 116 frames of data already available in the global results sequence and generates the initial portion of the report. An interactive view is displayed in real-time on the user interface, allowing the user to scroll through the first 116 slices. Each layer clearly displays the original image, an image overlaid with red highlighted segmentation outlines, and a table detailing the radiomics characteristics of the liver region in that slice. The report's summary also begins to form, displaying a preliminary estimate of the "currently analyzed liver volume." As results from other threads arrive and are integrated, the content of this analysis report is continuously expanded and enriched, with global statistics (such as the mean and variance of all features) dynamically updated until the entire sequence analysis is complete, resulting in a final report containing all visualizations and quantitative results.

[0264] Referring to Figure 3, a third embodiment of the present invention provides a medical image analysis system for executing the medical image analysis method described in the second embodiment. The system includes:

[0265] The strategy generation module is used to determine the execution priority and resource allocation parameters for at least two medical image analysis algorithms that are to be applied sequentially to the output results of the standardized image sequence or the target image processing algorithm while generating the dynamic subsequence partitioning strategy.

[0266] The task construction module is used to obtain multiple image subsequences according to the dynamic subsequence partitioning strategy, and then construct a composite computing task for each image subsequence. The composite computing task executes the target image processing algorithm and at least two medical image analysis algorithms in sequence.

[0267] The thread scheduling module is used to create an independent image processing thread for each of the composite computing tasks and allocate it to an independent computing unit. Within each image processing thread, the medical image analysis algorithm is called serially according to the execution priority.

[0268] The result encapsulation module is used by each image processing thread to generate a result data block containing all algorithm results after completing the corresponding composite calculation task, and to record the starting index corresponding to the processed image subsequence.

[0269] The result monitoring and collection module is used to start a result aggregation thread, monitor the completion status of each image processing thread, and when the thread is detected to be completed, obtain the result data block output by that image processing thread and the corresponding starting index.

[0270] The result fusion module is used by the result aggregation thread to merge each result data block into the global result sequence according to the original logical order based on the starting index, and to perform weighted fusion of the merged results according to the execution priority.

[0271] The report generation module is used to synchronously generate a progressive analysis report that matches the corresponding image processing process based on the integration status of the global result sequence during the process of dynamically integrating all processed image subsequences.

[0272] Those skilled in the art will understand that, for the sake of convenience and brevity, the specific working process and related explanations of the system described above can be found in the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0273] It should be noted that the medical image analysis system provided in the above embodiments is only an example of the division of the above functional modules. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the modules or steps in the embodiments of the present invention can be further decomposed or combined. For example, the modules in the above embodiments can be merged into one module, or further divided into multiple sub-modules to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the various modules or steps and are not considered as an improper limitation of the present invention.

[0274] A device according to a third embodiment of the present invention includes:

[0275] At least one processor;

[0276] and a memory communicatively connected to at least one of the processors;

[0277] The memory stores instructions that can be executed by the processor to implement the aforementioned medical image analysis method.

[0278] A fourth embodiment of the present invention provides a computer-readable storage medium storing computer instructions, which are executed by the computer to implement the above-described medical image analysis method.

[0279] Those skilled in the art will understand that, for the sake of convenience and brevity, the specific working process and related descriptions of the storage device and processing device described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.

[0280] Reference is now made to Figure 4, which shows a schematic diagram of the structure of a computer system for implementing embodiments of the systems, methods, and electronic devices of this application. The server shown in Figure 4 is merely an example and should not impose any limitations on the functionality and scope of use of the embodiments of this application.

[0281] As shown in Figure 4, the computer system includes a Central Processing Unit (CPU) 401, which can perform various appropriate actions and processes based on programs stored in Read Only Memory (ROM) 402 or programs loaded from storage section 408 into Random Access Memory (RAM) 403. RAM 403 also stores various programs and data required for system operation. The CPU 401, ROM 402, and RAM 403 are interconnected via bus 404. Input/output (I/O) interface 405 is also connected to bus 404.

[0282] The following components are connected to I/O interface 405: an input section 406 including a keyboard, mouse, etc.; an output section 407 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 408 including a hard disk, etc.; and a communication section 409 including a network interface card such as a LAN (Local Area Network) card, modem, etc. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to I/O interface 405 as needed. A removable medium 411, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on drive 410 as needed so that computer programs read from it can be installed into storage section 408 as needed.

[0283] Specifically, according to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication section 409, and / or installed from removable medium 411. When the computer program is executed by central processing unit (CPU) 401, it performs the functions defined in the methods of this application. It should be noted that the computer-readable medium described above in this application can be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this application, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium can be transmitted using any suitable medium, including but not limited to: wireless, wired, optical fiber cable, RF, etc., or any suitable combination thereof.

[0284] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).

[0285] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0286] The terms “first”, “second”, etc., are used to distinguish similar objects, not to describe or indicate a specific order or sequence.

[0287] The term "comprising" or any other similar term is intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus / device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent in such process, method, article, or apparatus / device.

[0288] The technical solution of the present invention has been described above with reference to the preferred embodiments shown in the accompanying drawings. However, it will be readily understood by those skilled in the art that the scope of protection of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions after these changes or substitutions will all fall within the scope of protection of the present invention.

Claims

1. A method for processing a DICOM file sequence, characterized in that the method comprises the following steps:
acquiring and preprocessing a raw DICOM file sequence to generate a standardized image sequence;
performing parallelism adaptation analysis on the standardized image sequence, wherein the parallelism adaptation analysis generates, based on the total number of images and the currently available computing resources, a dynamic subsequence partitioning strategy that determines the total number of subsequences and the non-uniform number of images in each subsequence;
non-uniformly dividing the standardized image sequence into multiple image subsequences according to the dynamic subsequence partitioning strategy;
assigning each image subsequence to an independent image processing thread, executing the same target image processing algorithm in parallel in each image processing thread, and recording the starting index of each image subsequence in the original sequence; and
dynamically integrating all processed image subsequences based on the starting indexes and the calculation completion status of each image processing thread, wherein completion of calculation by any thread triggers merging of the corresponding processing results according to the order of the image subsequences in the original sequence, to generate the final processed image sequence;
wherein the parallelism adaptation analysis comprises:
mapping the currently available computing resources into multiple computing units and establishing a resource topology model for each computing unit;
calculating the baseline throughput of each computing unit, and setting the total number of subsequences equal to the number of computing units;
arranging the computing units in descending order of baseline throughput to form an ordered computing unit sequence;
constructing a task volume gradient allocation model, which defines, through a preset decay function, the functional relationship between the difference in the number of images allocated to any two adjacent computing units in the ordered computing unit sequence and the difference in the baseline throughput of those two computing units; and
solving the task volume gradient allocation model based on the functional relationship and the total number of images to determine the specific number of images contained in each subsequence, thereby generating the dynamic subsequence partitioning strategy;
wherein solving the task volume gradient allocation model comprises:
setting an initial image quantity baseline value to be allocated for the ordered computing unit sequence;
calculating, based on the preset decay function and the difference between the baseline throughput of the first-ranked computing unit and that of the second-ranked computing unit in the ordered computing unit sequence, the difference in image allocation quantity between the first-ranked and second-ranked computing units;
determining the image allocation quantity of the second-ranked computing unit based on the initial image quantity baseline value and the difference in image allocation quantity;
iteratively applying the preset decay function, together with the number of images allocated to the preceding computing unit in the ordered computing unit sequence, to determine in turn the number of images allocated to each subsequent computing unit;
summing the number of images allocated to all computing units in the ordered computing unit sequence and comparing the sum with the total number of images; and
if the sum is not equal to the total number of images, adjusting the initial image quantity baseline value according to the difference and repeating the steps from calculating the difference in image allocation quantity onward, the iteration terminating when the sum matches the total number of images, at which point the number of images allocated to each computing unit constitutes the dynamic subsequence partitioning strategy.
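
The iterative solution recited in claim 1 can be illustrated with a minimal Python sketch. Several points here are assumptions rather than part of the claim: the decay function is taken to be linear (`linear_decay` with a hypothetical `gain`), shares are kept as floating-point values during the iteration, negative shares are clamped to zero, and the final rounding to whole images uses a largest-remainder rule. The function names are illustrative only.

```python
from typing import Callable, List, Sequence


def linear_decay(throughput_gap: float, gain: float = 2.0) -> float:
    """Hypothetical decay function: the image-count gap grows linearly with the throughput gap."""
    return gain * throughput_gap


def gradient_allocate(
    total_images: int,
    throughputs: Sequence[float],
    decay: Callable[[float], float] = linear_decay,
    max_iter: int = 100,
    tol: float = 1e-9,
) -> List[int]:
    """Return image counts for the units ordered by descending baseline throughput."""
    ordered = sorted(throughputs, reverse=True)
    k = len(ordered)
    if k == 0:
        raise ValueError("at least one computing unit is required")
    base = total_images / k  # initial image quantity baseline value (assumed starting point)
    shares = [base] * k
    for _ in range(max_iter):
        shares = [base]
        for prev, cur in zip(ordered, ordered[1:]):
            # Each unit receives the preceding unit's share minus the decayed throughput gap.
            shares.append(max(0.0, shares[-1] - decay(prev - cur)))
        error = total_images - sum(shares)
        if abs(error) <= tol:
            break
        base += error / k  # adjust the baseline value according to the difference
    # Assumption: fractional shares are rounded with the largest-remainder rule.
    counts = [int(s) for s in shares]
    leftovers = total_images - sum(counts)
    if not 0 <= leftovers <= k:
        raise RuntimeError("allocation did not converge")
    by_fraction = sorted(range(k), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_fraction[:leftovers]:
        counts[i] += 1
    return counts
```

With three units of baseline throughput 10, 8, and 5 and 100 images, this sketch assigns 38, 34, and 28 images, so the count gaps (4 and 6) follow the throughput gaps (2 and 3) under the assumed gain of 2.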

2. The method according to claim 1, characterized in that assigning each image subsequence to an independent image processing thread, executing the same target image processing algorithm in parallel in each image processing thread, and recording the starting index of each image subsequence in the original sequence comprises:
creating, based on the dynamic subsequence partitioning strategy, an image processing thread pool whose size corresponds to the total number of subsequences, and associating each image processing thread in the pool with a physical computing unit in the ordered computing unit sequence;
calculating in turn, based on the non-uniform number of images in each image subsequence, the start index and end index of each image subsequence in the standardized image sequence, and mapping the corresponding image data blocks to the private memory space of each image processing thread through memory mapping;
each image processing thread independently executing the target image processing algorithm on the data in its own memory space and monitoring the actual load indicators of its associated computing unit;
periodically collecting the execution progress and load indicators of all image processing threads, and, when the actual performance indicators of an associated computing unit monitored within a preset continuous monitoring period deviate from the corresponding baseline throughput, triggering, based on the real-time load and the number of remaining unprocessed images, online fine-tuning of the data block range allocated to the image processing thread associated with the deviating computing unit;
performing the online fine-tuning by dynamically adjusting the offset and length of the memory-mapped region corresponding to the affected image processing thread, and, after the adjustment is complete, sending a synchronization event to the affected image processing thread to update the data range and starting index that it processes; and
after each image processing thread finishes processing its current data block, associating the final processing result with the starting index that the thread maintains and outputting it.
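
Two pieces of claim 2 lend themselves to a short Python sketch: deriving the start and end index of each subsequence from the per-subsequence image counts, and deciding when a computing unit has deviated from its baseline throughput for long enough to trigger fine-tuning. The half-open index convention, the relative tolerance `rel_tol`, and the window length `periods` are assumptions for illustration; the memory mapping itself is omitted.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def subsequence_ranges(counts: Sequence[int]) -> List[Tuple[int, int]]:
    """Half-open [start, end) index ranges of each subsequence in the standardized sequence."""
    ends = list(accumulate(counts))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def needs_fine_tuning(
    samples: Sequence[float],
    baseline: float,
    rel_tol: float = 0.2,  # hypothetical tolerated relative deviation
    periods: int = 3,      # hypothetical length of the continuous monitoring window
) -> bool:
    """True when every one of the last `periods` throughput samples deviates from the baseline."""
    recent = list(samples)[-periods:]
    if len(recent) < periods:
        return False
    return all(abs(s - baseline) > rel_tol * baseline for s in recent)
```

Requiring every sample in the window to deviate is one way to avoid reacting to a single transient spike; other smoothing rules would serve equally well.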

3. The method according to claim 2, characterized in that performing the online fine-tuning by dynamically adjusting the offset and length of the corresponding memory-mapped region comprises:
determining whether to trigger a fine-tuning operation based on the degree and duration of the deviation between the actual performance indicators of the associated computing unit and the baseline throughput;
when fine-tuning is triggered, calculating the adjustment amount for the affected thread based on the actual load indicators and the number of remaining unprocessed images;
defining, based on the adjustment amount, the data sub-block to be cut from the data block range currently allocated to the affected thread;
releasing the association between the data sub-block and the original memory-mapped region using atomic operations;
re-establishing the memory mapping for the released data sub-block with the candidate thread that currently has the lowest real-time load, by creating for that candidate thread a private memory-mapped region pointing to the starting physical address of the data sub-block; and
synchronously updating the offset address and length of the memory-mapped regions corresponding to the affected thread and the candidate thread, as well as the start index and end index maintained by each of these threads.
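
The cut-and-reassign step of claim 3 can be sketched in Python under simplifying assumptions: regions are tracked as `(offset, length)` pairs counted in images rather than real memory-mapped regions, a `threading.Lock` stands in for the atomic operations, the sub-block is cut from the tail of the affected thread's region, and the rule in `adjustment_amount` is a hypothetical formula, since the claim does not specify one.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # (offset, length), both counted in images


@dataclass
class Worker:
    name: str
    load: float  # real-time load; lower means more idle (assumed scale)
    regions: List[Region] = field(default_factory=list)


def adjustment_amount(remaining: int, actual: float, baseline: float) -> int:
    """Hypothetical rule: shed the fraction of remaining work the unit cannot keep up with."""
    shortfall = max(0.0, 1.0 - actual / baseline)
    return int(remaining * shortfall)


def cut_and_reassign(
    workers: Dict[str, Worker], affected: str, amount: int, lock: threading.Lock
) -> Tuple[str, Region]:
    """Cut `amount` images from the tail of the affected worker and give them to the idlest one."""
    with lock:  # stands in for the atomic release and re-mapping
        src = workers[affected]
        offset, length = src.regions[-1]
        amount = min(amount, length)
        sub_block = (offset + length - amount, amount)
        src.regions[-1] = (offset, length - amount)  # shrink the affected region
        target = min(
            (w for w in workers.values() if w.name != affected),
            key=lambda w: w.load,
        )
        target.regions.append(sub_block)  # new private region for the candidate
        return target.name, sub_block
```

Cutting from the tail keeps the affected thread's starting index unchanged, so only its length and end index need updating; this is a design choice of the sketch, not a requirement stated in the claim.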

4. The method according to claim 1, characterized in that dynamically integrating all processed image subsequences based on the starting indexes and the calculation completion status of each image processing thread comprises:
setting up a final result buffer whose size matches the total number of images, and starting a separate event listener process to poll the calculation completion status of each image processing thread;
when any thread completes its calculation, immediately obtaining the image subsequence processing result output by that image processing thread and the starting index it recorded;
writing, based on the obtained starting index, the corresponding processing result data sequentially into the corresponding logical storage area of the final result buffer;
marking, in a global state table, the logical range covered by this write as integrated;
continuously determining, based on the global state table, whether there is a continuous, uninterrupted integrated interval starting from the logical start position of the standardized image sequence;
if such an interval exists, marking the buffer data corresponding to the longest continuous integrated interval formed so far as committable; and
repeating the polling, obtaining, writing, marking, and determining steps until the global state table indicates that all logical intervals of the standardized image sequence have been marked as integrated and the complete interval from beginning to end is committable, at which point the data in the final result buffer constitutes the final processed image sequence.
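
The buffer, global state table, and committable prefix described in claim 4 can be sketched as a small Python class. The class and attribute names are hypothetical, the state table is modeled as one boolean per image, and the polling listener is left to the caller, which is expected to invoke `write` whenever a thread reports completion.

```python
from typing import Any, List, Sequence


class ProgressiveIntegrator:
    """Final result buffer with a state table and a contiguous committable prefix."""

    def __init__(self, total_images: int) -> None:
        self.buffer: List[Any] = [None] * total_images
        self.integrated: List[bool] = [False] * total_images  # global state table
        self.committable = 0  # length of the integrated interval starting at index 0

    def write(self, start: int, results: Sequence[Any]) -> int:
        """Write one finished subsequence and return the new committable length."""
        end = start + len(results)
        if start < 0 or end > len(self.buffer):
            raise ValueError("result block falls outside the buffer")
        self.buffer[start:end] = list(results)
        self.integrated[start:end] = [True] * len(results)
        # Extend the committable prefix across every image that is now integrated.
        while self.committable < len(self.buffer) and self.integrated[self.committable]:
            self.committable += 1
        return self.committable

    @property
    def done(self) -> bool:
        return self.committable == len(self.buffer)
```

A block that finishes out of order is stored immediately but only becomes committable once every block before it has arrived, which is what allows results to be released progressively in the original order.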

5. A medical image analysis method, characterized in that it is based on the method for processing a DICOM file sequence according to any one of claims 1 to 4 and comprises:
while generating the dynamic subsequence partitioning strategy, determining the execution priority and resource allocation parameters for at least two medical image analysis algorithms that are to be applied in sequence to the standardized image sequence or to the output of the target image processing algorithm;
after obtaining multiple image subsequences according to the dynamic subsequence partitioning strategy, constructing a composite computing task for each image subsequence, wherein the composite computing task executes the target image processing algorithm and the at least two medical image analysis algorithms in sequence;
creating an independent image processing thread for each composite computing task and assigning it to an independent computing unit, wherein, within each image processing thread, the medical image analysis algorithms are called serially according to the execution priority;
after completing its composite computing task, each image processing thread generating a result data block containing all algorithm results and recording the starting index corresponding to the processed image subsequence;
starting a result aggregation thread to monitor the completion status of each image processing thread and, when a thread is detected to have completed, obtaining the result data block output by that image processing thread and its corresponding starting index;
the result aggregation thread merging each result data block into a global result sequence in the original logical order based on the starting index, and performing weighted fusion of the merged results according to the execution priority; and
during the dynamic integration of all processed image subsequences, synchronously generating, based on the integration status of the global result sequence, a progressive analysis report that matches the processing of the composite computing tasks.
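
The composite computing task and the aggregation of result data blocks in claim 5 can be sketched with Python's standard `concurrent.futures` module. This is a simplified illustration under stated assumptions: a larger priority number is taken to mean earlier execution, the calling thread plays the role of the result aggregation thread, images are plain Python values, and the weighted fusion and report generation are omitted. The function name `run_composite` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List, Sequence, Tuple

Analysis = Tuple[int, str, Callable[[Any], Any]]  # (priority, name, function)


def run_composite(
    images: Sequence[Any],
    ranges: Sequence[Tuple[int, int]],
    target_algorithm: Callable[[Any], Any],
    analyses: Sequence[Analysis],
) -> Dict[str, List[Any]]:
    """Run one composite task per [start, end) range and merge blocks by starting index."""
    # Assumption: a higher priority value means the analysis is called first.
    ordered = sorted(analyses, key=lambda a: a[0], reverse=True)

    def composite_task(start: int, end: int) -> Tuple[int, Dict[str, List[Any]]]:
        processed = [target_algorithm(img) for img in images[start:end]]
        # Analyses are called serially, in priority order, on the processed images.
        block = {name: [fn(p) for p in processed] for _, name, fn in ordered}
        return start, block

    merged: Dict[str, List[Any]] = {name: [None] * len(images) for _, name, _ in ordered}
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        futures = [pool.submit(composite_task, s, e) for s, e in ranges]
        for future in as_completed(futures):  # handle blocks in completion order
            start, block = future.result()
            for name, values in block.items():
                merged[name][start:start + len(values)] = values
    return merged
```

Because each block carries its starting index, the merged sequences come out in the original logical order regardless of which thread finishes first.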

6. The method according to claim 5, characterized in that determining the execution priority and resource allocation parameters for the at least two medical image analysis algorithms comprises the following steps:
acquiring historical execution performance data for the at least two medical image analysis algorithms, wherein the historical execution performance data includes the peak time of a single operation and the peak memory usage;
analyzing, based on the characteristics of the data processed by the medical image analysis algorithms, the degree to which each medical image analysis algorithm depends on those data characteristics, and defining a dependency level;
generating a quantified initial execution priority score for each medical image analysis algorithm through a preset priority calculation function, based on the peak single-operation time and peak memory usage in the historical execution performance data and on the dependency level;
during the generation or application of the dynamic subsequence partitioning strategy, mapping the initial execution priority score to specific resource allocation parameters based on the baseline throughput and free memory capacity of each computing unit, wherein the resource allocation parameters include the proportion of computing time slices reserved for each algorithm on each computing unit and the maximum available memory quota; and
dynamically adjusting, based on the resource allocation parameters, the resource quota that each computing unit allocates to the composite computing task.
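
A possible shape for the priority scoring and resource mapping of claim 6 is sketched below in Python. The claim does not disclose the priority calculation function, so the weighted sum in `priority_score`, its weights, the 1 to 3 dependency scale, and the proportional mapping in `allocate` are all hypothetical. The sketch covers a single computing unit and uses only its free memory; the unit's baseline throughput, which the claim also takes into account, is left out.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class AlgoProfile:
    name: str
    peak_time_s: float      # peak time of a single operation, from historical data
    peak_memory_mb: float   # peak memory usage, from historical data
    dependency_level: int   # assumed scale: 1 (low) to 3 (high)


def priority_score(
    p: AlgoProfile, w_time: float = 0.5, w_mem: float = 0.2, w_dep: float = 0.3
) -> float:
    """Hypothetical priority function: weighted sum of time, memory (in GB), and dependency."""
    return w_time * p.peak_time_s + w_mem * p.peak_memory_mb / 1024 + w_dep * p.dependency_level


def allocate(profiles: Sequence[AlgoProfile], free_memory_mb: float) -> Dict[str, Dict[str, float]]:
    """Map scores to a time-slice proportion and a memory quota on one computing unit."""
    scores = {p.name: priority_score(p) for p in profiles}
    total = sum(scores.values())
    params = {}
    for p in profiles:
        share = scores[p.name] / total
        params[p.name] = {
            "time_slice": share,
            # Never reserve more memory than the algorithm's historical peak.
            "memory_quota_mb": min(p.peak_memory_mb, free_memory_mb * share),
        }
    return params
```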

7. The method according to claim 5, characterized in that performing weighted fusion of the merged results according to the execution priority comprises:
when the result aggregation thread obtains a result data block and its starting index, simultaneously receiving the actual load indicators, monitored and reported by each image processing thread, of the computing unit on which that thread ran while processing the composite computing task;
re-evaluating, based on the actual load indicators, the effective throughput of the corresponding computing unit when executing each medical image analysis algorithm;
comparing the effective throughput with the baseline throughput of the computing unit and calculating a deviation value;
while merging each result data block into the global result sequence, attaching to the algorithm results in each result data block a confidence factor that is dynamically adjusted according to the deviation value, wherein the confidence factor is adjusted downward when the deviation value exceeds a preset threshold;
when multiple results that come from different result data blocks, belong to the same logical position, and originate from the same algorithm need to be fused, calculating a weighted fusion coefficient based on the confidence factor attached to each result and the execution priority;
after fusing all currently available result data blocks, generating a computing resource efficiency assessment report based on the overall distribution of the deviation values; and
triggering, based on the computing resource efficiency assessment report, online fine-tuning of the resource allocation parameters of the computing units associated with the image processing threads that have not yet completed their calculations.
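
The confidence factor and weighted fusion coefficient of claim 7 can be illustrated with a minimal Python sketch. The specific rules are assumptions: the deviation is measured relative to the baseline throughput, the confidence factor stays at 1.0 inside a hypothetical threshold and falls linearly (with a floor of 0.1) beyond it, and the fusion weight is the product of confidence and priority applied to numeric results.

```python
from typing import Iterable, Tuple


def confidence_factor(effective: float, baseline: float, threshold: float = 0.2) -> float:
    """Hypothetical rule: full confidence within the threshold, reduced confidence beyond it."""
    deviation = abs(effective - baseline) / baseline
    if deviation <= threshold:
        return 1.0
    return max(0.1, 1.0 - deviation)


def fuse(results: Iterable[Tuple[float, float, float]]) -> float:
    """Weighted mean of (value, confidence, priority) triples; weight = confidence * priority."""
    items = list(results)
    weights = [confidence * priority for _, confidence, priority in items]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one result with non-zero weight is required")
    return sum(value * w for (value, _, _), w in zip(items, weights)) / total
```

For example, a result from a unit running at half its baseline throughput receives a confidence factor of 0.5 under this rule and therefore contributes half as much to the fused value as an otherwise identical result from a unit running at baseline.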

8. A medical image analysis system for performing the medical image analysis method according to any one of claims 5 to 7, characterized in that the system comprises:
a strategy generation module, configured to generate a dynamic subsequence partitioning strategy and to determine the execution priority and resource allocation parameters for at least two medical image analysis algorithms that are to be applied in sequence to the standardized image sequence or to the output of the target image processing algorithm;
a task construction module, configured to obtain multiple image subsequences according to the dynamic subsequence partitioning strategy and then construct a composite computing task for each image subsequence, wherein the composite computing task executes the target image processing algorithm and the at least two medical image analysis algorithms in sequence;
a thread scheduling module, configured to create an independent image processing thread for each composite computing task and assign it to an independent computing unit, wherein, within each image processing thread, the medical image analysis algorithms are called serially according to the execution priority;
a result encapsulation module, configured to have each image processing thread, after completing its composite computing task, generate a result data block containing all algorithm results and record the starting index corresponding to the processed image subsequence;
a result monitoring and collection module, configured to start a result aggregation thread, monitor the completion status of each image processing thread, and, when a thread is detected to have completed, obtain the result data block output by that image processing thread and the corresponding starting index;
a result fusion module, configured to have the result aggregation thread merge each result data block into the global result sequence in the original logical order based on the starting index and perform weighted fusion of the merged results according to the execution priority; and
a report generation module, configured to synchronously generate, during the dynamic integration of all processed image subsequences and based on the integration status of the global result sequence, a progressive analysis report that matches the corresponding image processing process.