LungSAM large model segmentation method based on lung tumor frequency-aware adapter

By constructing a large LungSAM model based on a lung tumor frequency-aware adapter, the problem of poor adaptability of deep learning models to low contrast in lung tumor segmentation was solved, achieving efficient and accurate lung tumor segmentation and intelligent assisted diagnosis.

CN122244435A · Pending · Publication Date: 2026-06-19 · NANTONG UNIV


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: NANTONG UNIV
Filing Date: 2026-03-05
Publication Date: 2026-06-19

Application Information

Patent Timeline

05 Mar 2026: Application
19 Jun 2026: Publication (CN122244435A)

IPC: G06V10/26; G06T7/00; G06V10/80; G06V10/82; G06N3/045

AI Tagging

Application Domain

Image analysis; character and pattern recognition



AI Technical Summary

Technical Problem

Existing deep learning models adapt poorly to low-contrast tumors in lung tumor segmentation, rely on costly high-quality labeled data, and struggle to achieve accurate, automated segmentation.

Method used

The method constructs a LungSAM large model based on a lung tumor frequency-aware adapter. It uses a lightweight frequency-aware adapter and a triplet attention mechanism, combined with wavelet transform to enhance tumor edge details, and builds an end-to-end inference system.

Benefits of technology

It significantly reduces reliance on large-scale labeled data, improves lung tumor segmentation accuracy, supports fully automatic and interactive segmentation, and provides intelligent auxiliary tools.

✦ Generated by Eureka AI based on patent content.

Abstract figure: CN122244435A_ABST


Abstract

This application discloses a large-scale LungSAM model segmentation method based on a frequency-aware adapter for lung tumors, comprising: analyzing the adaptability defects of SAM in lung tumor segmentation scenarios based on CT image data, and obtaining lung tumor images containing adaptability defects; constructing a lightweight frequency-aware adapter for lung tumor features based on the lung tumor images containing adaptability defects; constructing and training a LungSAM model based on the frequency-aware adapter; and constructing and optimizing an end-to-end inference system for lung tumor segmentation based on the trained LungSAM model. This application fully considers the clinical reality of the difficulty in obtaining high-quality manually annotated data for lung tumor CT images and the high cost of annotation. It utilizes the large-scale SAM model, which has strong zero-shot and few-shot transfer capabilities, as its basic architecture. Through parameter-efficient adapter design, it achieves accurate lung tumor segmentation with only 2.5% parameter fine-tuning, significantly reducing the dependence on large-scale annotated data.


Description

Technical Field

[0001] This application belongs to the field of medical image segmentation and specifically relates to a LungSAM large model segmentation method based on a lung tumor frequency-aware adapter.

Background Technology

[0002] Lung tumors can occur in any lobe of the lung, bronchi, and pleura, exhibiting strong morphological heterogeneity, indistinct borders, and infiltrative growth. Furthermore, lung imaging often suffers from low contrast, noise interference, and artifacts, making accurate diagnosis and segmentation of lung tumors extremely challenging. Lung tumor lesions typically include solid tumor areas, ground-glass opacities, infiltrative areas, and necrotic areas, with significant differences in imaging characteristics among different lesions. Traditional lung tumor segmentation relies primarily on clinicians and radiologists manually delineating tumor boundaries and lesion extent. This method is not only cumbersome and time-consuming but also susceptible to human factors such as physician experience and subjective judgment, making it difficult to guarantee the accuracy and consistency of segmentation results. With the continuous increase in the amount of lung CT and X-ray imaging data, traditional manual segmentation can no longer meet the actual needs of rapid clinical diagnosis, preoperative planning, and efficacy evaluation. Therefore, automated lung tumor segmentation technology based on deep learning has gradually become a core research area, providing a key technical path for achieving rapid and accurate segmentation of lung lesions.

[0003] The rapid development of deep learning and medical imaging technologies has provided effective means to address the challenge of lung tumor segmentation, and deep learning-based segmentation methods have become a core technology for clinically assisted diagnosis of lung tumors. Existing research largely focuses on optimizing U-Net series networks and Transformer architectures, segmenting lung tumor regions by improving network structure and fusing multi-scale features. However, these methods still have significant limitations. Model training relies heavily on large volumes of precisely labeled lung image data, yet lung lesion annotation requires professional radiologists, which makes data acquisition costly and slow and leaves a severe shortage of high-quality labeled datasets. Although visual foundation models such as SAM have brought new breakthroughs to segmentation tasks thanks to their strong generalization capabilities, they adapt poorly to low-contrast lung tumors and are difficult to apply directly to fully automated, accurate lung tumor segmentation. Developing automated segmentation models that can efficiently capture fine-grained features of lung lesions has therefore become an urgent research need in lung tumor diagnosis and treatment.

Summary of the Invention

[0004] This application provides a large-scale LungSAM segmentation method based on a lung tumor frequency-aware adapter to solve the aforementioned technical problems.

[0005] To address the aforementioned technical problems, this application adopts the following technical solution: a LungSAM large model segmentation method based on a lung tumor frequency-aware adapter, comprising:

[0006] S1. Based on CT image data, analyze the adaptability defects of SAM in lung tumor segmentation scenarios and obtain lung tumor images containing adaptability defects;

[0007] S2. Based on the lung tumor images containing adaptability defects, construct a lightweight frequency-aware adapter for lung tumor features;

[0008] S3. Based on the frequency-aware adapter, construct and train the LungSAM model;

[0009] S4. Based on the trained LungSAM model, construct and optimize an end-to-end inference system for lung tumor segmentation.

[0010] Furthermore, the method in step S1 includes:

[0011] S11. Select multiple sets of clinical lung tumor CT images;

[0012] S12. Calculate the sphericity and surface area / volume ratio of the tumor;

[0013] S13. Based on three core indicators, namely the Dice similarity coefficient, intersection over union, and 95% Hausdorff distance, test each image three times and take the average value;

[0014] S14. Quantify the performance limitations of the native SAM model in lung tumor segmentation tasks and obtain lung tumor images containing adaptability defects.

[0015] Furthermore, the method in step S2 includes:

[0016] S21. Feed the lung tumor images containing adaptability defects into the image encoder of the SAM model to obtain the tumor feature map, which is then passed to both the residual connection branch and the frequency-aware enhancement branch;

[0017] S22. In the frequency-aware enhancement branch, first preprocess the tumor feature map to obtain a feature tensor, and reshape it into a format suitable for two-dimensional spatial processing;

[0018] S23. Perform a two-dimensional discrete wavelet transform on the reshaped feature tensor to decompose it into a low-frequency approximation component and vertical, horizontal, and diagonal detail components. At the same time, receive a two-dimensional bounding box prompt and convert it into high-dimensional embedding features to generate a prompt embedding vector;

[0019] S24. Use depthwise separable convolution to optimize global structural information for low-frequency approximation components; calculate spatial attention weights by concatenating vertical, horizontal and diagonal details with low-frequency approximation components respectively, then perform weighted fusion and depthwise separable convolution to complete the enhancement process, and finally reconstruct features by inverse discrete wavelet transform and batch normalization to obtain normalized reconstructed features.

[0020] S25. Input the normalized reconstructed features into the three sub-attention branches of the triplet attention module (channel-width, height-channel, and spatial), and perform a weighted average fusion of the three branch outputs to obtain the final attention features that integrate multi-dimensional dependencies;

[0021] S26. Reshape the final attention features and apply Dropout to prevent overfitting. Then restore the low-dimensional bottleneck representation to the original high-dimensional channel through a linear up-projection layer, and multiply the result by a learnable scaling factor that dynamically adjusts the adapter's contribution, obtaining the scaled adapter branch output;

[0022] S27. Add the scaled adapter branch output to the tumor feature map by residual addition to obtain the final output feature of the fused frequency-aware enhancement and original information, thus completing one adapter forward propagation.

[0023] Furthermore, the method in step S23 includes:

[0024] Acquire the low-frequency approximation component LL, vertical detail LH, horizontal detail HL, and diagonal detail HH;

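[0025] $$(LL,\ LH,\ HL,\ HH) = \mathrm{DWT}_{2D}(F) \tag{1}$$

[0026] where $F \in \mathbb{R}^{B \times d \times H \times W}$ is the reshaped feature tensor, and the low-frequency subband $LL$ and the three high-frequency subbands $LH$, $HL$, $HH$ all have dimensions $B \times d \times \frac{H}{2} \times \frac{W}{2}$.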

[0027] Furthermore, the method in step S24 includes:

[0028] S241. Low-frequency approximation components are processed using depthwise separable convolution to preserve and optimize global information; the specific formula is as follows:

[0029] $$LL' = \mathrm{PWConv}\big(\mathrm{DWConv}(LL)\big) \tag{2}$$

[0030] where $\mathrm{DWConv}$ is depthwise convolution and $\mathrm{PWConv}$ is pointwise convolution;

[0031] S242. Vertical details, horizontal details, diagonal details, and low-frequency approximate components are concatenated, spatial attention weights are calculated, and then weighted, depthwise separable convolution and inverse discrete wavelet transform are performed to obtain the reconstructed features; the specific formula is as follows:

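[0032] $$A_k = \sigma\big(\mathrm{Conv}_{7\times7}([X_k;\ LL'])\big), \qquad X_k' = \mathrm{DSConv}(A_k \odot X_k), \qquad X_k \in \{LH,\ HL,\ HH\} \tag{3}$$

[0033] where $\sigma$ is the sigmoid activation function, $\mathrm{Conv}_{7\times7}$ is a 7×7 convolution, $\mathrm{DSConv}$ is depthwise separable convolution, $[\cdot\,;\cdot]$ denotes channel-wise concatenation, and $LL'$ is the enhanced low-frequency component from Eq. (2);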

[0034] The formula for the inverse discrete wavelet transform is shown below:

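[0035] $$F_{rec} = \mathrm{IDWT}(LL',\ LH',\ HL',\ HH') \tag{4}$$ where $LH'$, $HL'$, $HH'$ are the enhanced detail subbands $X_k'$ from Eq. (3).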

[0036] S243. Normalize the reconstructed features to obtain new normalized features; the specific formula is as follows:

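[0037] $$\hat{F} = \gamma\,\frac{F_{rec} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta \tag{5}$$

[0038] where $F_{rec}$ is the reconstructed feature from Eq. (4), $\mu_B$ and $\sigma_B^2$ are the mean and variance of the current batch, $\gamma$ and $\beta$ are learnable parameters, and $\epsilon$ is a small constant for numerical stability.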

[0039] Furthermore, the method in step S3 includes:

[0040] S31. Load the pre-trained SAM model, which includes the Vision Transformer image encoder, the prompt encoder, and the mask decoder;

[0041] S32. Freeze all backbone parameters of the SAM model and determine the insertion position strategy of the LT-FAA adapter in the SAM model;

[0042] S33. The LT-FAA adapter is pre-trained for domain adaptation using a large-scale general medical image segmentation dataset, so that the SAM model can initially adapt to the texture features and contrast characteristics of medical images;

[0043] S34. Based on the SAM model, load the adapter weights after the first stage of pre-training, use the lung tumor-specific dataset for task-specific training, and obtain the LungSAM model;

[0044] S35. In each training iteration, obtain a batch of CT images and the corresponding tumor mask labels from the data loader, perform forward propagation through the LungSAM model, and calculate the multi-task loss function between the model predictions and the ground-truth labels;

[0045] S36. Based on the multi-task loss function, execute the backpropagation algorithm, and then use the AdamW optimizer to iteratively update the adapter parameters by combining gradient information with preset hyperparameters such as learning rate and weight decay, so as to minimize the value of the multi-task loss function.

[0046] S37. After each training cycle, evaluate the current model performance on an independent validation set, calculate key metrics such as Dice, IoU, and 95%HD, and record the model weights corresponding to the best performance to implement a model checkpoint saving mechanism.

[0047] S38. The performance of the trained LungSAM model is comprehensively evaluated on an independent test set and compared with the current state-of-the-art lung tumor segmentation methods.

[0048] The beneficial effects of this application are:

[0049] (1) This application fully considers the clinical reality that high-quality manual annotations of lung tumor CT images are difficult to obtain and costly to produce. It uses the SAM large model, which has strong zero-shot and few-shot transfer capabilities, as the basic architecture, and through an efficient adapter design achieves accurate lung tumor segmentation by fine-tuning only 2.5% of the parameters, significantly reducing the dependence on large-scale annotated data.

[0050] (2) In view of the characteristics of multi-scale, heterogeneous and blurred boundaries of CT images of lung tumors, this application proposes an LT-FAA adapter, which decomposes the features into multi-frequency sub-bands through wavelet transform, enhances the details of tumor edge by using high-frequency enhancement module, and introduces a triplet attention mechanism to capture channel-space multi-dimensional dependencies. While improving the segmentation accuracy of tumor core area and spiculated boundary, it effectively reduces computational redundancy.

[0051] (3) Based on the LungSAM large model with the frequency-aware adapter, this application constructs an end-to-end inference system that supports both fully automatic and interactive segmentation modes. By displaying the original CT images alongside real-time visualizations of the segmentation results, it allows doctors to adjust lung tumor segmentation results in real time and to perform visual evaluation and clinical feedback optimization, providing an intelligent auxiliary tool for early lung cancer screening, surgical planning, and efficacy evaluation.

Attached Figure Description

[0052] Figure 1 is a flowchart of an embodiment of the LungSAM large model segmentation method based on a lung tumor frequency-aware adapter according to this application;

[0053] Figure 2 is a flowchart of step S1 in an embodiment of Figure 1;

[0054] Figure 3 is a flowchart of step S2 in an embodiment of Figure 1;

[0055] Figure 4 is a flowchart of step S22 in an embodiment of Figure 3;

[0056] Figure 5 is a flowchart of step S24 in an embodiment of Figure 3;

[0057] Figure 6 is a flowchart of step S3 in an embodiment of Figure 1;

[0058] Figure 7 is an overview of the model architecture of an embodiment of the LungSAM large model segmentation method based on a lung tumor frequency-aware adapter of this application;

[0059] Figure 8 is the first visualization of algorithm comparison results for an embodiment of the LungSAM large model segmentation method based on a lung tumor frequency-aware adapter of this application;

[0060] Figure 9 is the second visualization of algorithm comparison results for an embodiment of the LungSAM large model segmentation method based on a lung tumor frequency-aware adapter of this application.

Detailed Implementation

[0061] To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention will be further described in detail below with reference to specific embodiments.

[0062] Numerous specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways than those described herein, and therefore the invention is not limited to the specific embodiments disclosed in the following specification.

[0063] Referring to Figure 1, Figure 1 is a flowchart of an embodiment of the LungSAM large model segmentation method based on a lung tumor frequency-aware adapter according to this application. The method includes:

[0064] S1. Based on CT image data, analyze the adaptability defects of SAM in lung tumor segmentation scenarios and obtain lung tumor images containing adaptability defects.

[0065] For details, refer to Figure 2. The method of step S1 includes:

[0066] S11. 3000 clinical lung tumor CT images were selected, covering the full size range from micro nodules, small nodules, medium and large tumors to giant tumors, including 1600 central tumors and 1400 peripheral tumors;

[0067] S12. The sphericity and surface-area-to-volume ratio of each tumor were calculated; these differ significantly from those of the regularly shaped objects found in natural images;
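Shape descriptors from S12, sketched in Python. The patent does not say which sphericity definition it uses; this sketch assumes the common Wadell definition (surface area of a sphere with the same volume, divided by the tumor's surface area) and takes the volume V and surface area A as given. Estimating V and A from a voxel mask, for example by counting voxels and meshing the surface, is not shown.

```python
import math


def sphericity(volume: float, surface_area: float) -> float:
    """Wadell sphericity: pi^(1/3) * (6V)^(2/3) / A; equals 1.0 for a perfect sphere."""
    return math.pi ** (1 / 3) * (6 * volume) ** (2 / 3) / surface_area


def surface_to_volume_ratio(volume: float, surface_area: float) -> float:
    """Surface area divided by volume (units of 1/length, e.g. 1/mm)."""
    return surface_area / volume


if __name__ == "__main__":
    r = 10.0  # radius in mm of an idealized spherical nodule
    v, a = 4 / 3 * math.pi * r ** 3, 4 * math.pi * r ** 2
    print(round(sphericity(v, a), 6), round(surface_to_volume_ratio(v, a), 6))  # 1.0 0.3
```

For a fixed volume a sphere has the smallest possible surface area, so any lobulated or spiculated tumor scores a sphericity below 1 and a higher surface-to-volume ratio than a sphere of equal volume.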

[0068] S13. Standardized testing was performed with "single point at the tumor center + bounding rectangle" as the unified prompt input. Each image was tested three times and the results were averaged, using three core indicators: Dice similarity coefficient (DSC), intersection over union (IoU), and 95% Hausdorff distance (95%HD);
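The three metrics in S13 can be illustrated on small 2-D masks represented as sets of foreground pixel coordinates. This is a sketch, not the authors' evaluation code: the pooled, nearest-rank 95th-percentile convention for the 95% Hausdorff distance is an assumption (toolkits differ, for example by taking the maximum of the two directed percentiles or by using only boundary pixels), and `spacing` is a hypothetical isotropic pixel spacing in mm. The brute-force distance search is for illustration only.

```python
import math


def dice_iou(pred: set, gt: set) -> tuple:
    """Dice = 2|P & G| / (|P| + |G|); IoU = |P & G| / |P | G|. Empty vs empty counts as a perfect match."""
    inter = len(pred & gt)
    total = len(pred) + len(gt)
    union = len(pred | gt)
    dice = 2 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


def hd95(pred: set, gt: set, spacing: float = 1.0) -> float:
    """95th percentile of the pooled directed nearest-neighbour distances (both sets non-empty)."""
    def directed(src, dst):
        return [min(math.dist(p, q) for q in dst) * spacing for p in src]

    dists = sorted(directed(pred, gt) + directed(gt, pred))
    k = max(0, math.ceil(0.95 * len(dists)) - 1)  # nearest-rank percentile index
    return dists[k]


if __name__ == "__main__":
    gt = {(0, 0), (0, 1), (1, 0), (1, 1)}
    pred = {(0, 0), (0, 1)}
    print(dice_iou(pred, gt), hd95(pred, gt, spacing=0.7))
```

Averaging the three test runs per image, as S13 describes, is then a plain mean of the per-run scores.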

[0069] S14. The performance limitations of the native SAM model (ViT-B, ViT-L, and ViT-H versions) in lung tumor segmentation tasks were quantified to obtain lung tumor images with adaptability defects.

[0070] S2. Based on the lung tumor images containing adaptability defects, construct a lightweight frequency-aware adapter for lung tumor features.

[0071] For details, refer to Figure 3. The method of step S2 includes:

[0072] S21. Feed the lung tumor images containing adaptability defects into the image encoder of the SAM model to obtain the tumor feature map, which is then passed to both the residual connection branch and the frequency-aware enhancement branch.

[0073] S22. In the frequency-aware enhancement branch, first preprocess the tumor feature map to obtain a feature tensor, and reshape it into a format suitable for two-dimensional spatial processing.

[0074] For details, refer to Figure 4. The method of step S22 includes:

[0075] S221. Normalize the input tumor feature map to obtain the normalized features.

[0076] Specifically, the input tumor feature map X is passed to the residual connection branch and the frequency-aware enhancement branch; in the frequency-aware enhancement branch, the input feature X is first subjected to layer normalization to stabilize the training. The specific formula is as follows:

[0077] $$\hat{X} = \gamma \odot \frac{X - \mu}{\sigma + \epsilon} + \beta \tag{6}$$

[0078] where $X$ is the input tumor feature map; $\mu$ and $\sigma$ are the mean and standard deviation along the channel dimension, respectively; $\gamma$ and $\beta$ are learnable parameters; $\epsilon$ is a numerical stability constant; and $\odot$ denotes element-wise multiplication.
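A minimal sketch of the layer normalization in Eq. (6) for a single spatial position, in plain Python. It places ε next to the standard deviation, following the wording above; PyTorch's `nn.LayerNorm` instead divides by sqrt(var + eps), and the difference is negligible for typical ε. In the adapter this normalization runs independently at every (b, h, w) position of the [B, H, W, C] map.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-6):
    """Normalize one token's channel vector x (length C) as in Eq. (6):
    y_i = gamma_i * (x_i - mu) / (sigma + eps) + beta_i, with mu and sigma taken over the channels."""
    c = len(x)
    mu = sum(x) / c
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / c)  # population standard deviation
    return [g * (v - mu) / (sigma + eps) + b for v, g, b in zip(x, gamma, beta)]


if __name__ == "__main__":
    token = [1.0, 2.0, 3.0, 4.0]  # a toy C = 4 channel vector at one (h, w) position
    y = layer_norm(token, [1.0] * 4, [0.0] * 4)
    print([round(v, 4) for v in y])  # [-1.3416, -0.4472, 0.4472, 1.3416]
```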

[0079] S222. The normalized features are compressed from the high-dimensional channel C to the low-dimensional bottleneck representation d through a linear downprojection layer, significantly reducing the number of parameters. The downprojected features are expressed in the following formula:

[0080] $$X_{down} = \hat{X} W_{down} + b_{down} \tag{7}$$

[0081] where $\hat{X}$ is the normalized feature from Eq. (6), $W_{down} \in \mathbb{R}^{C \times d}$ is the down-projection weight matrix, and $b_{down} \in \mathbb{R}^{d}$ is the bias.

[0082] S223. Apply the nonlinear activation function ReLU to the down-projected features to introduce nonlinear transformation capability and obtain the feature tensor $F = \mathrm{ReLU}(X_{down})$;

[0083] S224. Then reshape the feature tensor $F$ from [B, H, W, d] into the spatial format [B, d, H, W] so that subsequent two-dimensional spatial processing can be performed.
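The down-projection, ReLU, and layout change of S222 to S224 can be sketched with NumPy as below. C = 768 matches the width of SAM's ViT-B image encoder; the bottleneck size d = 64 and the random weights are hypothetical. Note that the "reshape" from [B, H, W, d] to [B, d, H, W] must be an axis permutation (`transpose`), not a plain `reshape`, which would keep the memory order and scramble channels with spatial positions.

```python
import numpy as np


def down_project(x_norm: np.ndarray, w_down: np.ndarray, b_down: np.ndarray) -> np.ndarray:
    """S222-S224 for a channels-last feature map.

    x_norm: [B, H, W, C] layer-normalized features; w_down: [C, d]; b_down: [d].
    Returns the ReLU-activated bottleneck tensor in channels-first layout [B, d, H, W].
    """
    z = x_norm @ w_down + b_down      # Eq. (7): linear down-projection C -> d at every position
    f = np.maximum(z, 0.0)            # S223: ReLU
    return f.transpose(0, 3, 1, 2)    # S224: axis permutation [B, H, W, d] -> [B, d, H, W]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 16, 16, 768))   # C = 768, toy 16x16 token grid
    w = rng.standard_normal((768, 64)) * 0.02   # d = 64 is a hypothetical bottleneck size
    out = down_project(x, w, np.zeros(64))
    print(out.shape)  # (2, 64, 16, 16)
```

The projection adds only C·d + d parameters per adapter, which is where most of the parameter saving of the bottleneck comes from.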

[0084] S23. Perform a two-dimensional discrete wavelet transform on the reshaped feature tensor to decompose it into a low-frequency approximation component and vertical, horizontal, and diagonal detail components. At the same time, receive two-dimensional bounding box prompts and convert them into high-dimensional embedding features to generate prompt embedding vectors;

[0085] Acquire the low-frequency approximation component LL, vertical detail LH, horizontal detail HL, and diagonal detail HH;

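[0086] $$(LL,\ LH,\ HL,\ HH) = \mathrm{DWT}_{2D}(F) \tag{1}$$

[0087] where $F \in \mathbb{R}^{B \times d \times H \times W}$ is the reshaped feature tensor, and the low-frequency subband $LL$ and the three high-frequency subbands $LH$, $HL$, $HH$ all have dimensions $B \times d \times \frac{H}{2} \times \frac{W}{2}$.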
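The patent does not name the wavelet basis. Assuming a single-level orthonormal Haar wavelet, Eq. (1) and its inverse can be written directly with strided slicing over the last two axes of a [B, d, H, W] tensor with even H and W. The sub-band naming follows the text (LH = vertical detail, HL = horizontal detail); libraries such as PyWavelets order and label the detail coefficients differently, so conventions should be checked before mixing code.

```python
import numpy as np


def haar_dwt2(f: np.ndarray):
    """One-level orthonormal 2-D Haar DWT over the last two axes ([..., H, W], H and W even)."""
    a, b = f[..., 0::2, 0::2], f[..., 0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = f[..., 1::2, 0::2], f[..., 1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2   # low-frequency approximation
    lh = (a - b + c - d) / 2   # left-right differences: responds to vertical edges
    hl = (a + b - c - d) / 2   # top-bottom differences: responds to horizontal edges
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh) -> np.ndarray:
    """Exact inverse of haar_dwt2: rebuilds each 2x2 block from the four sub-bands."""
    h, w = ll.shape[-2:]
    out = np.empty(ll.shape[:-2] + (2 * h, 2 * w), dtype=np.result_type(ll, 0.5))
    out[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[..., 0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[..., 1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out


if __name__ == "__main__":
    f = np.random.default_rng(0).standard_normal((2, 64, 16, 16))  # [B, d, H, W]
    bands = haar_dwt2(f)
    print([band.shape for band in bands])        # four sub-bands, each (2, 64, 8, 8)
    print(np.allclose(haar_idwt2(*bands), f))    # True: perfect reconstruction
```

Because the transform is orthonormal and exactly invertible, the enhancement branch can modify individual sub-bands and still reconstruct a feature map of the original spatial size.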

[0088] S24. Use depthwise separable convolution to optimize global structural information for low-frequency approximation components; calculate spatial attention weights by concatenating vertical, horizontal and diagonal details with low-frequency approximation components respectively, then perform weighted fusion and depthwise separable convolution to complete the enhancement process, and finally reconstruct features by inverse discrete wavelet transform and batch normalization to obtain normalized reconstructed features.

[0089] Furthermore, the method in step S24 includes:

[0090] S241. Low-frequency approximation components are processed using depthwise separable convolution to preserve and optimize global information; the specific formula is as follows:

[0091] $$LL' = \mathrm{PWConv}\big(\mathrm{DWConv}(LL)\big) \tag{2}$$

[0092] where $\mathrm{DWConv}$ is depthwise convolution and $\mathrm{PWConv}$ is pointwise convolution;
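Eq. (2) applies a depthwise convolution (one filter per channel) followed by a pointwise 1×1 convolution that mixes channels. A NumPy sketch, assuming 3×3 depthwise kernels, zero padding, and no bias; the kernel size is not stated in the text.

```python
import numpy as np


def depthwise_conv(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Per-channel 'same' convolution (cross-correlation, as in deep-learning frameworks).

    x: [B, C, H, W]; k: [C, kh, kw] with odd kh, kw; zero padding, stride 1.
    """
    _, _, h, w = x.shape
    kh, kw = k.shape[1:]
    xp = np.pad(x, ((0, 0), (0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape, dtype=np.result_type(x, k))
    for i in range(kh):
        for j in range(kw):
            out += xp[:, :, i:i + h, j:j + w] * k[:, i, j][None, :, None, None]
    return out


def pointwise_conv(x: np.ndarray, w_pw: np.ndarray) -> np.ndarray:
    """1x1 convolution: w_pw [C_out, C_in] mixes channels independently at every pixel."""
    return np.einsum("oc,bchw->bohw", w_pw, x)


def enhance_ll(ll: np.ndarray, k_dw: np.ndarray, w_pw: np.ndarray) -> np.ndarray:
    """Eq. (2): depthwise convolution followed by pointwise convolution on the LL sub-band."""
    return pointwise_conv(depthwise_conv(ll, k_dw), w_pw)


if __name__ == "__main__":
    d = 64
    ll = np.random.default_rng(0).standard_normal((2, d, 8, 8))
    k_dw = np.zeros((d, 3, 3))
    k_dw[:, 1, 1] = 1.0                                       # identity depthwise kernels
    print(np.allclose(enhance_ll(ll, k_dw, np.eye(d)), ll))   # True
    print(d * 3 * 3 + d * d, "vs", d * d * 3 * 3)             # 4672 vs 36864 weights
```

The last line shows why the factorization keeps the adapter light: for d = 64 channels it needs 4,672 weights instead of 36,864 for a full 3×3 convolution.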

[0093] S242. Vertical details, horizontal details, diagonal details, and low-frequency approximate components are concatenated, spatial attention weights are calculated, and then weighted, depthwise separable convolution and inverse discrete wavelet transform are performed to obtain the reconstructed features; the specific formula is as follows:

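[0094] $$A_k = \sigma\big(\mathrm{Conv}_{7\times7}([X_k;\ LL'])\big), \qquad X_k' = \mathrm{DSConv}(A_k \odot X_k), \qquad X_k \in \{LH,\ HL,\ HH\} \tag{3}$$

[0095] where $\sigma$ is the sigmoid activation function, $\mathrm{Conv}_{7\times7}$ is a 7×7 convolution, $\mathrm{DSConv}$ is depthwise separable convolution, $[\cdot\,;\cdot]$ denotes channel-wise concatenation, and $LL'$ is the enhanced low-frequency component from Eq. (2);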
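A minimal NumPy sketch of the spatial-attention gating for one detail sub-band, under stated assumptions: the 7×7 convolution maps the concatenation of the detail sub-band and the low-frequency component to a single attention map that is broadcast over channels, and the depthwise separable convolution that follows the gating is omitted. The text does not say whether the raw or the enhanced low-frequency component is concatenated, so the function simply takes it as an argument; weight shapes are hypothetical.

```python
import numpy as np


def conv_to_one_map(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """'Same' k x k convolution from C_in channels to one map. x: [B, C_in, H, W]; w: [C_in, k, k]."""
    _, _, h, wd = x.shape
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)))
    out = np.zeros((x.shape[0], 1, h, wd))
    for i in range(k):
        for j in range(k):
            out[:, 0] += np.einsum("c,bchw->bhw", w[:, i, j], xp[:, :, i:i + h, j:j + wd])
    return out


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def gate_detail_band(x_k: np.ndarray, ll: np.ndarray, w7: np.ndarray):
    """A_k = sigmoid(Conv7x7([X_k; LL])); returns (A_k * X_k, A_k).

    x_k, ll: [B, d, H/2, W/2]; w7: [2d, 7, 7] weights of the attention convolution.
    """
    attn = sigmoid(conv_to_one_map(np.concatenate([x_k, ll], axis=1), w7))  # [B, 1, H/2, W/2]
    return attn * x_k, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    lh, ll = rng.standard_normal((2, 1, d, 8, 8))
    gated, attn = gate_detail_band(lh, ll, rng.standard_normal((2 * d, 7, 7)) * 0.05)
    print(gated.shape, attn.shape, float(attn.min()) > 0.0)
```

The same gating is applied to LH, HL, and HH; the 7×7 receptive field lets each weight depend on the neighbourhood of a pixel, which matters for thin, low-contrast tumor edges.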

[0096] The formula for the inverse discrete wavelet transform is shown below:

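[0097] $$F_{rec} = \mathrm{IDWT}(LL',\ LH',\ HL',\ HH') \tag{4}$$ where $LH'$, $HL'$, $HH'$ are the enhanced detail subbands $X_k'$ from Eq. (3).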

[0098] S243. Normalize the reconstructed features to obtain new normalized features; the specific formula is as follows:

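[0099] $$\hat{F} = \gamma\,\frac{F_{rec} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta \tag{5}$$

[0100] where $F_{rec}$ is the reconstructed feature from Eq. (4), $\mu_B$ and $\sigma_B^2$ are the mean and variance of the current batch, $\gamma$ and $\beta$ are learnable parameters, and $\epsilon$ is a small constant for numerical stability.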
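Eq. (5) is standard batch normalization. A NumPy sketch of its training-mode form, with statistics computed per channel over the batch and spatial axes; `eps = 1e-5` mirrors the PyTorch `BatchNorm2d` default and is an assumption here. At inference time, running averages of the statistics would replace the batch values (not shown).

```python
import numpy as np


def batch_norm_train(x: np.ndarray, gamma: np.ndarray, beta: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Eq. (5) in training mode. x: [B, C, H, W]; per-channel mean/variance over batch and spatial axes."""
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)      # biased variance, as used for normalization
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((4, 64, 16, 16)) * 3.0 + 7.0
    y = batch_norm_train(x, np.ones(64), np.zeros(64))
    print(round(float(y.mean()), 6), round(float(y.std()), 4))
```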

[0101] S25. Input the normalized reconstructed features into the three sub-attention branches of the triplet attention module (channel-width, height-channel, and spatial), and perform a weighted average fusion of the three branch outputs to obtain the final attention features that integrate multi-dimensional dependencies.
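As a rough guide to S25, the sketch below follows the publicly described triplet attention design (Misra et al., "Rotate to Attend: Convolutional Triplet Attention Module", WACV 2021): each branch rotates the tensor so that a different pair of dimensions forms the 2-D plane, compresses the remaining dimension with a Z-pool (max and mean), applies a 7×7 convolution and a sigmoid, and rescales the input. The per-branch batch normalization of that design is omitted, the fusion weights default to a plain average, and the kernels are hypothetical; the patent does not give its branch weights.

```python
import numpy as np


def _conv_to_one_map(x, w):
    """'Same' k x k convolution from C_in channels to one map. x: [B, C_in, P, Q]; w: [C_in, k, k]."""
    _, _, p_dim, q_dim = x.shape
    k = w.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((x.shape[0], 1, p_dim, q_dim))
    for i in range(k):
        for j in range(k):
            out[:, 0] += np.einsum("c,bcpq->bpq", w[:, i, j], xp[:, :, i:i + p_dim, j:j + q_dim])
    return out


def _branch(x, pooled_axis, w):
    """Pool away one of C/H/W and gate the plane spanned by the other two."""
    xt = np.moveaxis(x, pooled_axis, 1)                             # [B, N_pooled, P, Q]
    zpool = np.concatenate([xt.max(axis=1, keepdims=True),          # Z-pool: max and mean
                            xt.mean(axis=1, keepdims=True)], axis=1)
    attn = 1.0 / (1.0 + np.exp(-_conv_to_one_map(zpool, w)))       # sigmoid, [B, 1, P, Q]
    return np.moveaxis(xt * attn, 1, pooled_axis)                   # rotate back to [B, C, H, W]


def triplet_attention(x, kernels, weights=(1 / 3, 1 / 3, 1 / 3)):
    """x: [B, C, H, W]; kernels: three [2, 7, 7] arrays for the channel-width, height-channel
    and spatial branches (which pool over H, W and C respectively)."""
    outs = [_branch(x, axis, k) for axis, k in zip((2, 3, 1), kernels)]
    return sum(wt * o for wt, o in zip(weights, outs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 16, 8, 8))                 # [B, d, H, W]
    kernels = [rng.standard_normal((2, 7, 7)) * 0.1 for _ in range(3)]
    print(triplet_attention(x, kernels).shape)            # (2, 16, 8, 8)
```

Only three small 2×7×7 kernels are learned, so the module captures cross-dimension dependencies at almost no parameter cost.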

[0102] S26. Reshape the final attention features and apply Dropout to prevent overfitting. Then restore the low-dimensional bottleneck representation to the original high-dimensional channel through a linear up-projection layer, and multiply the result by a learnable scaling factor that dynamically adjusts the adapter's contribution, obtaining the scaled adapter branch output.

[0103] Specifically, the attention features $F_{att}$ are reshaped from [B, d, H, W] back to [B, H, W, d], and Dropout is applied to the reshaped features to obtain $F_{drop}$, which prevents overfitting and improves the model's generalization ability. A linear up-projection layer then maps $F_{drop}$ from the low-dimensional bottleneck d back to the original high-dimensional channel C, giving the output features $X_{up}$.

[0104] Subsequently, the output features $X_{up}$ are multiplied by a learnable scaling factor α to dynamically adjust the scale of the adapter contribution, giving the scaled adapter branch output:

[0105] $$X_{scaled} = \alpha \cdot X_{up} \tag{8}$$

[0106] where α is a trainable scalar parameter.

[0107] S27. Add the scaled adapter branch output to the tumor feature map X by residual addition to obtain the final output feature Y that fuses the frequency-aware enhancement and the original information, thus completing one adapter forward propagation.
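The tail of the adapter (S26 and S27) can be sketched in NumPy as follows. The dropout rate, weight shapes, and value of α are hypothetical; dropout is written in the usual inverted form so that no rescaling is needed at inference.

```python
import numpy as np


def adapter_tail(f_att, x, w_up, b_up, alpha, drop_p=0.1, training=False, rng=None):
    """S26-S27: reshape, Dropout, up-projection, learnable scaling (Eq. (8)) and residual addition.

    f_att: [B, d, H, W] fused attention features; x: original tumor feature map [B, H, W, C];
    w_up: [d, C]; b_up: [C]; alpha: scalar scaling factor.
    """
    z = f_att.transpose(0, 2, 3, 1)                 # [B, d, H, W] -> [B, H, W, d]
    if training and drop_p > 0:                     # inverted dropout, active only while training
        rng = rng if rng is not None else np.random.default_rng()
        z = z * (rng.random(z.shape) >= drop_p) / (1.0 - drop_p)
    x_up = z @ w_up + b_up                          # linear up-projection d -> C
    return x + alpha * x_up                         # Y = X + alpha * X_up


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B, d, H, W, C = 1, 64, 16, 16, 768
    f_att = rng.standard_normal((B, d, H, W))
    x = rng.standard_normal((B, H, W, C))
    w_up = rng.standard_normal((d, C)) * 0.02
    y = adapter_tail(f_att, x, w_up, np.zeros(C), alpha=0.1, training=True, rng=rng)
    print(y.shape)  # (1, 16, 16, 768)
```

Because Y = X whenever α = 0, initializing α at or near zero (an assumption, not stated in the patent) would let training start from the unmodified SAM features.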

[0108] S3. Based on the frequency-aware adapter, construct the LungSAM model and train it.

[0109] For details, refer to Figure 6. The method of step S3 includes:

[0110] S31. Load the pre-trained SAM model, which includes a Vision Transformer image encoder, a prompt encoder, and a mask decoder, and fully retain the general visual representation capabilities it learned on natural images, as expressed in the formula:

[0111] (9);

[0112] S32. Freeze all backbone parameters of the SAM model (setting requires_grad=False) to prevent damage to its pre-trained general feature extraction capability during lung tumor-specific training, while significantly reducing the number of parameters to be optimized. Determine the insertion position of the LT-FAA adapter in the SAM model structure: the adapter module is inserted in parallel after the Feed-Forward Network layer of each Transformer block of the Vision Transformer encoder, forming a residual connection structure.
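In PyTorch the freezing itself is typically a loop such as `for p in sam.parameters(): p.requires_grad = False`, after which only the adapter parameters are handed to the optimizer. To make the parameter budget concrete, the plain-Python sketch below counts only the adapter's linear layers (down- and up-projection) and one α per block. ViT-B's width (768) and depth (12) are real SAM encoder values, but the bottleneck d and the frozen-parameter count are hypothetical placeholders, and the wavelet-branch convolutions, normalization, and triplet attention weights are ignored, so this is not expected to reproduce the reported 2.5%.

```python
def adapter_linear_params(c: int, d: int, n_blocks: int) -> int:
    """Weights and biases of the down- (C->d) and up- (d->C) projections plus one alpha per block."""
    per_block = (c * d + d) + (d * c + c) + 1
    return per_block * n_blocks


def trainable_fraction(adapter_params: int, frozen_params: int) -> float:
    """Share of all parameters that receive gradients when the backbone is frozen."""
    return adapter_params / (adapter_params + frozen_params)


if __name__ == "__main__":
    n = adapter_linear_params(c=768, d=64, n_blocks=12)   # ViT-B width and depth, hypothetical d
    print(n)  # 1189644
    print(f"{trainable_fraction(n, 90_000_000):.2%}")      # frozen count is a placeholder, not SAM's
```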

[0113] S33. Use a large-scale general medical image segmentation dataset (such as the FLARE22 abdominal organ dataset) to perform domain adaptation pre-training on the LT-FAA adapter, so that the SAM model can initially adapt to the texture features and contrast characteristics of medical images. The learning rate is set to 1e-3 and the training is performed for 100 epochs.

[0114] S34. Based on the SAM model, load the adapter weights after the first stage of pre-training, use the lung tumor-specific dataset for task-specific training, focus on optimizing the tumor boundary segmentation performance, reduce the learning rate to 5e-4, train for 100 epochs, and obtain the LungSAM model.

[0115] S35. In each training iteration, obtain a batch of CT images and the corresponding tumor mask labels from the data loader, perform forward propagation through the LungSAM model, and calculate the multi-task loss function between the model predictions and the ground-truth labels.

[0116] The multi-task loss is a hybrid loss that combines weighted cross-entropy loss and Dice loss, $L = \lambda_{1} L_{WCE} + \lambda_{2} L_{Dice}$, where the weighting factors $\lambda_{1}$ and $\lambda_{2}$ are set to 0.2 and 0.8, respectively.
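A minimal sketch of the hybrid loss on flattened per-pixel probabilities, assuming binary (tumor vs. background) segmentation. The class weights `w_pos` and `w_neg` are hypothetical placeholders because the patent does not give the cross-entropy weighting; the 0.2 and 0.8 factors are the stated values, assigned to the cross-entropy and Dice terms in the order they are listed.

```python
import math


def hybrid_loss(probs, labels, w_pos=1.0, w_neg=1.0, lam_ce=0.2, lam_dice=0.8, eps=1e-7):
    """lam_ce * weighted binary cross-entropy + lam_dice * soft Dice loss.

    probs: predicted foreground probabilities in [0, 1]; labels: 0/1 ground truth (same length).
    """
    n = len(probs)
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]   # avoid log(0)
    ce = -sum(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p)
              for p, y in zip(clipped, labels)) / n
    inter = sum(p * y for p, y in zip(probs, labels))
    dice_loss = 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(labels) + eps)
    return lam_ce * ce + lam_dice * dice_loss


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    print(round(hybrid_loss([0.5] * 4, labels), 4))   # 0.2 * ln 2 + 0.8 * 0.5 = 0.5386
```

The Dice term is insensitive to the large number of background pixels, while the cross-entropy term keeps per-pixel gradients well behaved; weighting Dice more heavily favors overlap with small tumors.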

[0117] S36. Based on the multi-task loss function, execute the backpropagation algorithm, and then use the AdamW optimizer to iteratively update the adapter parameters by combining gradient information with preset hyperparameters such as learning rate and weight decay, so as to minimize the value of the multi-task loss function.
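S36 uses AdamW. To make the update rule concrete, here is a plain-Python sketch of one AdamW step on a list of scalar parameters. The default learning rate 5e-4 is the stage-two value from S34; the betas, ε, and weight decay of 0.01 mirror the `torch.optim.AdamW` defaults and are assumptions, since the patent does not state them. Following PyTorch's formulation, the decoupled weight decay shrinks the parameter before the adaptive step and is not mixed into the gradient moments.

```python
import math


def adamw_step(theta, grad, m, v, t, lr=5e-4, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One AdamW update for lists of scalar parameters; t is the 1-based step count.

    Returns the updated (theta, m, v). Frozen backbone weights are simply never passed in.
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g              # first-moment estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g          # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)                   # bias correction
        v_hat = v_i / (1 - beta2 ** t)
        p = p * (1 - lr * weight_decay)                  # decoupled weight decay
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)    # adaptive gradient step
        new_theta.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


if __name__ == "__main__":
    theta, m, v = [1.0], [0.0], [0.0]
    for t in range(1, 4):
        grad = [2 * theta[0]]                            # gradient of the toy loss theta^2
        theta, m, v = adamw_step(theta, grad, m, v, t, lr=0.1)
    print(theta)
```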

[0118] S37. After each training cycle, evaluate the current model performance on an independent validation set, calculate key metrics such as Dice, IoU, and 95%HD, and record the model weights corresponding to the best performance to implement a model checkpoint saving mechanism.
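The checkpointing in S37 amounts to remembering the weights of the best validation epoch. A framework-agnostic sketch, assuming validation Dice is the selection criterion (the text says only "best performance") and using hypothetical callback hooks for training, evaluation, and weight access:

```python
import copy


def train_with_checkpointing(num_epochs, train_one_epoch, evaluate, get_weights):
    """Run training and keep a deep copy of the weights from the epoch with the best validation Dice.

    train_one_epoch(epoch) -> None, evaluate() -> dict with a "dice" key (IoU and 95% HD may be
    recorded too), get_weights() -> current adapter weights; all three are hypothetical hooks.
    """
    best_epoch, best_dice, best_weights = -1, float("-inf"), None
    for epoch in range(num_epochs):
        train_one_epoch(epoch)
        metrics = evaluate()
        if metrics["dice"] > best_dice:                  # strict improvement only
            best_epoch, best_dice = epoch, metrics["dice"]
            best_weights = copy.deepcopy(get_weights())  # snapshot, unaffected by later updates
    return best_epoch, best_dice, best_weights
```

In PyTorch the snapshot would typically be `copy.deepcopy(model.state_dict())` or a file written with `torch.save`; the deep copy matters because the live weights keep changing after the best epoch.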

[0119] S38. The performance of the trained LungSAM model is comprehensively evaluated on an independent test set and compared with the current state-of-the-art lung tumor segmentation methods.

[0120] S4. Based on the trained LungSAM model, construct and optimize an end-to-end inference system for lung tumor segmentation.

[0121] Specifically, step S4 includes:

[0122] S41. Receive the DICOM sequence of the patient's lung CT scan as system input, automatically extract image data and perform preprocessing, including window width and window level adjustment, grayscale normalization and isotropic resampling, to ensure the consistency and standardization of input data;
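A NumPy sketch of the intensity part of S41: stored DICOM values are converted to Hounsfield units with the RescaleSlope and RescaleIntercept attributes (readable with pydicom as `ds.RescaleSlope` and `ds.RescaleIntercept`), then clipped to a window and scaled to [0, 1]. The lung window level of -600 HU and width of 1500 HU are a commonly used setting, assumed here because the patent gives no values. Isotropic resampling is omitted; one option is `scipy.ndimage.zoom` with a per-axis zoom factor of original spacing divided by target spacing.

```python
import numpy as np


def to_hu(pixels: np.ndarray, slope: float, intercept: float) -> np.ndarray:
    """Stored DICOM pixel values -> Hounsfield units via RescaleSlope / RescaleIntercept."""
    return pixels.astype(np.float32) * slope + intercept


def window_and_normalize(hu: np.ndarray, level: float = -600.0, width: float = 1500.0) -> np.ndarray:
    """Clip to [level - width/2, level + width/2] and rescale linearly to [0, 1]."""
    lo, hi = level - width / 2.0, level + width / 2.0
    return (np.clip(hu, lo, hi) - lo) / (hi - lo)


if __name__ == "__main__":
    hu = to_hu(np.array([[0, 424, 1024, 1174]]), slope=1.0, intercept=-1024.0)  # [-1024, -600, 0, 150]
    print(window_and_normalize(hu))
```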

[0123] S42. Load the trained LungSAM model, which consists of the pre-trained SAM backbone and the LT-FAA adapter optimized for lung tumors; load the model configuration parameters and initialize the inference environment;

[0124] S43. Two segmentation modes are provided for users to choose from: fully automatic segmentation mode and interactive segmentation mode;

[0125] S44. Automatic Mode: Performs automatic lung region segmentation, uses a lightweight convolutional neural network to quickly locate the bilateral lung regions, excludes background and irrelevant tissue, narrows the scope of subsequent processing and improves computational efficiency;

[0126] S45. Interaction Mode: Receives interactive prompts input by the user through the graphical interface, including positive dots (tumor areas), negative dots (non-tumor areas), and bounding boxes, and encodes the prompt information into a format that the model can recognize;
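For the interactive mode in S45, a sketch of the prompt packing, assuming the input convention of `SamPredictor.predict` in the public segment_anything package: point coordinates as an N×2 array in (x, y) pixel order, labels 1 for foreground and 0 for background, and one box in XYXY order. LungSAM's own interface is not described, so the function name and return layout are hypothetical.

```python
import numpy as np


def encode_prompts(pos_points, neg_points, box=None):
    """Pack user clicks and an optional box into SamPredictor.predict-style arrays.

    pos_points / neg_points: lists of (x, y) pixel coordinates; box: (x0, y0, x1, y1) or None.
    Returns (point_coords [N, 2] or None, point_labels [N] or None, box [4] or None).
    """
    coords = list(pos_points) + list(neg_points)
    labels = [1] * len(pos_points) + [0] * len(neg_points)       # 1 = tumor, 0 = non-tumor
    point_coords = np.array(coords, dtype=np.float32) if coords else None
    point_labels = np.array(labels, dtype=np.int64) if coords else None
    box_arr = None
    if box is not None:
        x0, y0, x1, y1 = box                                     # user may drag in any direction
        box_arr = np.array([min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)], dtype=np.float32)
    return point_coords, point_labels, box_arr
```

With a SAM-style predictor whose image has already been set via `set_image`, the arrays would be passed as `predictor.predict(point_coords=pc, point_labels=pl, box=bx, multimask_output=False)`.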

[0127] S46. Detect tumor candidate regions within the lung area to provide initial localization information for subsequent fine segmentation;

[0128] S47. Input the preprocessed CT image and prompt information into the LungSAM model, perform forward inference calculation, and generate a binarized segmentation mask.
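S47 only says the output is binarized. Assuming the decoder returns per-pixel logits, as SAM's mask decoder does, the final step can be sketched as a threshold at 0.0 (the default `mask_threshold` of the `Sam` class in the public segment_anything code), which is equivalent to a probability threshold of 0.5.

```python
import numpy as np


def binarize_mask(logits: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Convert raw mask logits to a 0/1 mask; logit > 0 is the same test as sigmoid(logit) > 0.5."""
    return (logits > threshold).astype(np.uint8)


if __name__ == "__main__":
    logits = np.array([[-3.2, -0.1], [0.4, 5.0]])
    print(binarize_mask(logits))
```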

[0129] To verify the effectiveness of the present invention, this embodiment was compared with the current mainstream SAM-like segmentation model on a clinical lung tumor CT image dataset, and the specific results are shown in Table 1.

[0130] Table 1 shows the comparison and segmentation results.

[0131]

[0132] Table 1 compares the performance of the method in this embodiment with other mainstream large-scale lung tumor segmentation models on an internally constructed lung tumor CT image dataset. Evaluation metrics include Dice, IoU, and 95% HD. The results show that the method in this embodiment outperforms the other comparative models in all three metrics. Specifically, the DSC of this embodiment reaches 94.92%, an improvement of 3.19% to 38.60% compared to other methods; the IoU value is 92.38%, an improvement of 1.49% to 38.11% compared to other methods; and the 95% HD distance is 17.92 mm, a reduction of 10.15 mm to 49.43 mm compared to other methods. This indicates that the method in this embodiment achieves superior segmentation results in the field of lung tumor segmentation.

[0133] This embodiment, by fine-tuning the large SAM model and utilizing LT-FAA tailored to lung tumor features, not only effectively reduces computational costs but also adaptively enhances tumor boundary details, accurately capturing weak boundary features of lung tumors of different sizes, shapes, and densities. It achieved optimal segmentation results across multiple evaluation metrics, validating the effectiveness and superiority of this method.

[0134] Figure 7 provides an overview of the model architecture of the LungSAM large model segmentation method and system based on a lung tumor frequency-aware adapter proposed in this invention. The model consists of three core components: an optimized image encoder, a prompt encoder, and a mask decoder. Modules marked with a flame icon represent the key improvement of this invention: an LT-FAA frequency-aware adapter embedded within the Transformer blocks of the image encoder. Modules marked with a snowflake icon are components retained from the original SAM model, including the prompt encoder and the mask decoder.

[0135] Figures 8 and 9 show visual comparisons of the LungSAM large-model segmentation method and system based on a lung tumor frequency-aware adapter proposed in this invention on a self-built lung tumor CT dataset. The selected cases, comprising solid nodules, ground-glass nodules, and part-solid nodules, differ markedly in boundary morphology. Compared with the other mainstream large segmentation models listed in Table 1, the LungSAM model based on the frequency-aware adapter adaptively distinguishes the different density regions and morphological features of the tumor. It more accurately captures spiculation, lobulation, and the weak boundaries where the tumor adheres to surrounding tissue in CT images, so that its segmentation results closely match the ground-truth boundaries delineated by radiologists.

[0136] The above description is merely an embodiment of this application and does not limit the patent scope of this application. Any equivalent structural or procedural transformations made using the content of this application's specification and drawings, or direct or indirect applications in other related technical fields, are similarly included within the patent protection scope of this application.

Claims

1. A LungSAM large-model segmentation method based on a lung tumor frequency-aware adapter, characterized in that it comprises: S1. based on CT image data, analyzing the adaptation defects of SAM in lung tumor segmentation scenarios and obtaining lung tumor images containing the adaptation defects; S2. based on the lung tumor images containing the adaptation defects, constructing a lightweight frequency-aware adapter for lung tumor features; S3. based on the frequency-aware adapter, constructing and training the LungSAM model; S4. based on the trained LungSAM model, constructing and optimizing an end-to-end inference system for lung tumor segmentation.

2. The method according to claim 1, characterized in that step S1 comprises: S11. selecting multiple sets of clinical lung tumor CT images; S12. calculating the sphericity and the surface-area-to-volume ratio of each tumor; S13. testing each image three times on the three core metrics of Dice similarity coefficient, intersection over union, and 95% Hausdorff distance, and taking the average; S14. quantifying the performance limitations of the native SAM model in the lung tumor segmentation task to obtain the lung tumor images containing the adaptation defects.
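Step S12 does not specify how sphericity and the surface-area-to-volume ratio are computed. A common choice, assumed here, is sphericity $\psi = \pi^{1/3}(6V)^{2/3}/A$, which equals 1 for a perfect sphere and decreases for irregular shapes. The NumPy sketch below estimates the volume $V$ by counting voxels and the surface area $A$ by counting exposed voxel faces (which overestimates the area of curved surfaces); the function names and the voxel `spacing` argument are hypothetical.

```python
import numpy as np

def volume_and_surface_area(mask, spacing=(1.0, 1.0, 1.0)):
    """Voxel-count volume and exposed-face surface area of a 3D binary mask."""
    m = np.pad(mask.astype(np.int8), 1)        # zero border so edge voxels expose faces
    s0, s1, s2 = spacing                       # voxel size along axes 0, 1, 2 (e.g. mm)
    volume = float(m.sum()) * s0 * s1 * s2
    face_area = (s1 * s2, s0 * s2, s0 * s1)    # area of a face perpendicular to each axis
    # Each 0<->1 transition along an axis is one exposed voxel face.
    area = sum(np.count_nonzero(np.diff(m, axis=ax)) * face_area[ax] for ax in range(3))
    return volume, float(area)

def sphericity(mask, spacing=(1.0, 1.0, 1.0)):
    """psi = pi^(1/3) * (6V)^(2/3) / A; equals 1 for a perfect sphere."""
    v, a = volume_and_surface_area(mask, spacing)
    return np.pi ** (1 / 3) * (6 * v) ** (2 / 3) / a

def surface_to_volume(mask, spacing=(1.0, 1.0, 1.0)):
    """Surface-area-to-volume ratio A / V (units 1/mm if spacing is in mm)."""
    v, a = volume_and_surface_area(mask, spacing)
    return a / v
```

For a voxelized cube the face count is exact, so the sketch returns the analytical cube sphericity $(\pi/6)^{1/3} \approx 0.806$; spiculated or lobulated tumors give lower $\psi$ and higher $A/V$.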

3. The method according to claim 2, characterized in that step S2 comprises: S21. transmitting the lung tumor image containing the adaptation defects to the image encoder of the SAM model to obtain a tumor feature map, which is then fed to both a residual connection branch and a frequency-aware enhancement branch; S22. in the frequency-aware enhancement branch, first preprocessing the tumor feature map to obtain a feature tensor and reshaping it into a format suitable for two-dimensional spatial processing; S23. performing a two-dimensional discrete wavelet transform on the reshaped feature tensor to decompose it into a low-frequency approximation component, a vertical detail component, a horizontal detail component, and a diagonal detail component; at the same time, receiving a two-dimensional bounding-box prompt and converting it into high-dimensional embedding features to generate a prompt embedding vector; S24. optimizing the global structural information of the low-frequency approximation component by depthwise separable convolution; concatenating each of the vertical, horizontal, and diagonal detail components with the low-frequency approximation component to compute spatial attention weights, and then completing the enhancement by weighted fusion and depthwise separable convolution; finally, reconstructing the features by inverse discrete wavelet transform and applying batch normalization to obtain normalized reconstructed features; S25. inputting the normalized reconstructed features into the three sub-attention branches of a triplet attention module (channel-width, height-channel, and spatial), and fusing the outputs of the three branches by weighted averaging to obtain final attention features that integrate multi-dimensional dependencies; S26. reshaping the final attention features and applying Dropout to prevent overfitting, then restoring the low-dimensional bottleneck representation to the original high-dimensional channel count through a linear up-projection layer, and multiplying the result by a learnable scaling factor that dynamically adjusts the adapter's contribution, to obtain the scaled adapter branch output; S27. adding the scaled adapter branch output to the tumor feature map by residual addition to obtain the final output feature that fuses the frequency-aware enhancement with the original information, thereby completing one forward pass of the adapter.
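The data flow of S21 to S27 can be summarized by the following NumPy sketch of one adapter forward pass. It illustrates the structure under stated assumptions and is not the patented implementation: the S22 preprocessing is assumed to be a linear down-projection to a bottleneck width `r`, the feature layout is assumed to be `(B, H, W, C)` as in SAM's ViT blocks, the wavelet and triplet-attention steps (S23 to S25) are passed in as a placeholder callable `freq_branch`, and the zero-initialized up-projection is a common adapter convention, not something the claim states. The class name is hypothetical.

```python
import numpy as np

class FrequencyAdapterSketch:
    """Hypothetical NumPy sketch of one LT-FAA-style forward pass (S21-S27)."""

    def __init__(self, dim, bottleneck, freq_branch, p_drop=0.1, init_scale=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        # Assumed S22 preprocessing: linear down-projection C -> r.
        self.w_down = self.rng.normal(0.0, 0.02, size=(dim, bottleneck))
        # Assumed zero init, so the adapter starts as an identity mapping.
        self.w_up = np.zeros((bottleneck, dim))
        self.scale = init_scale          # learnable scaling factor of S26
        self.freq_branch = freq_branch   # stand-in for S23-S25: (B, r, H, W) -> same shape
        self.p_drop = p_drop

    def forward(self, x, training=False):
        # x: tumor feature map from an encoder block, layout (B, H, W, C).
        z = x @ self.w_down                   # (B, H, W, r) low-dimensional bottleneck
        z = z.transpose(0, 3, 1, 2)           # reshape to (B, r, H, W) for 2D processing
        z = self.freq_branch(z)               # frequency-aware enhancement + attention
        z = z.transpose(0, 2, 3, 1)           # back to (B, H, W, r)
        if training and self.p_drop > 0:      # inverted Dropout (S26)
            keep = self.rng.random(z.shape) >= self.p_drop
            z = z * keep / (1.0 - self.p_drop)
        out = (z @ self.w_up) * self.scale    # linear up-projection, then learnable scaling
        return x + out                        # residual addition with the original features (S27)
```

Only `w_down`, `w_up`, `scale`, and the parameters inside `freq_branch` would be trained; the surrounding SAM block stays frozen, which is what keeps the adapter lightweight.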

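4. The method according to claim 3, characterized in that step S23 comprises: acquiring the low-frequency approximation component LL, the vertical detail LH, the horizontal detail HL, and the diagonal detail HH:

$$\{X_{LL},\ X_{LH},\ X_{HL},\ X_{HH}\} = \mathrm{DWT}(X) \qquad (1)$$

where $X \in \mathbb{R}^{C\times H\times W}$ is the reshaped feature tensor, and the low-frequency subband $X_{LL}$ and the three high-frequency subbands $X_{LH}$, $X_{HL}$, $X_{HH}$ all have dimensions $C\times\frac{H}{2}\times\frac{W}{2}$.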
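Claim 4 does not name the wavelet basis. Assuming a single-level orthonormal Haar wavelet (a common choice in feature-level wavelet modules), the decomposition of Eq. (1) and its inverse reduce to sums and differences over 2×2 pixel blocks, as in this NumPy sketch. Which subband is called "vertical" or "horizontal" differs between libraries; the comments follow the LH/HL naming of the claim, and the function names are hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT over the last two axes (H and W must be even)."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2   # low-frequency approximation
    lh = (a - b + c - d) / 2   # left-right differences: responds to vertical edges
    hl = (a + b - c - d) / 2   # top-bottom differences: responds to horizontal edges
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds every 2x2 block from the four subbands."""
    h, w = ll.shape[-2:]
    out = np.empty(ll.shape[:-2] + (2 * h, 2 * w), dtype=np.result_type(ll, lh, hl, hh))
    out[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[..., 0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[..., 1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out
```

Because the transform is orthonormal, it is exactly invertible and preserves feature energy, so the enhancement applied between the forward and inverse transforms is the only thing that changes the features.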

5. The method according to claim 4, characterized in that step S24 comprises: S241. processing the low-frequency approximation component with depthwise separable convolution to preserve and optimize global information, as follows:

$$X'_{LL} = \mathrm{PWConv}\big(\mathrm{DWConv}(X_{LL})\big) \qquad (2)$$

where $\mathrm{DWConv}$ is depthwise convolution and $\mathrm{PWConv}$ is pointwise convolution; S242. concatenating each of the vertical, horizontal, and diagonal details with the low-frequency approximation component, computing spatial attention weights, and then applying the weighting, depthwise separable convolution, and inverse discrete wavelet transform to obtain the reconstructed features, as follows:

$$X'_{k} = \mathrm{DSConv}\Big(\sigma\big(f^{7\times 7}([X_{k};\, X_{LL}])\big) \odot X_{k}\Big), \quad k \in \{LH, HL, HH\} \qquad (3)$$

where $\sigma$ is the sigmoid activation function, $f^{7\times 7}$ is a 7×7 convolution, $[\cdot\,;\cdot]$ denotes channel-wise concatenation, $\odot$ denotes element-wise multiplication, and $\mathrm{DSConv}$ is depthwise separable convolution. The inverse discrete wavelet transform is:

$$X_{rec} = \mathrm{IDWT}\big(X'_{LL},\ X'_{LH},\ X'_{HL},\ X'_{HH}\big) \qquad (4)$$

S243. normalizing the reconstructed features to obtain the normalized features, as follows:

$$\hat{X} = \gamma\,\frac{X_{rec} - \mu_{B}}{\sqrt{\sigma_{B}^{2} + \epsilon}} + \beta \qquad (5)$$

where $\mu_B$ and $\sigma_B^2$ are the mean and variance of the current batch, $\epsilon$ is a small constant for numerical stability, and $\gamma$ and $\beta$ are learnable parameters.
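The low-frequency optimization, spatial attention weighting, and batch normalization of S241 to S243 can be sketched in NumPy on subbands of shape `(B, C, h, w)`. Several details are assumptions not stated in the claim: 3×3 depthwise kernels, separate (untied) weights for each high-frequency subband, a 7×7 convolution that maps the concatenated `2C` channels to a single attention map, no bias terms, and random initialization. All names are hypothetical, and the inverse wavelet transform that sits between `enhance_subbands` and `batch_norm` is omitted.

```python
import numpy as np

def depthwise_conv(x, k):
    """'Same'-padded per-channel cross-correlation; x: (B, C, H, W), k: (C, kh, kw), odd kh, kw."""
    _, kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (0, 0), (ph, ph), (pw, pw)))
    h, w = x.shape[-2:]
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += xp[:, :, i:i + h, j:j + w] * k[:, i, j][None, :, None, None]
    return out

def pointwise_conv(x, w):
    """1x1 convolution mixing channels; w: (C_out, C_in)."""
    return np.einsum("oc,bchw->bohw", w, x)

def ds_conv(x, p):
    """Depthwise separable convolution: depthwise filtering, then pointwise mixing."""
    return pointwise_conv(depthwise_conv(x, p["dw"]), p["pw"])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(c, seed=0):
    """Hypothetical random parameters for C-channel subbands."""
    rng = np.random.default_rng(seed)
    def ds():
        return {"dw": rng.normal(0, 0.1, (c, 3, 3)), "pw": rng.normal(0, 0.1, (c, c))}
    p = {"ll": ds()}
    for n in ("lh", "hl", "hh"):
        p[f"att_{n}"] = rng.normal(0, 0.1, (2 * c, 7, 7))  # 7x7 conv: 2C channels -> 1 map
        p[f"ds_{n}"] = ds()
    return p

def enhance_subbands(ll, lh, hl, hh, p):
    """Optimized LL plus attention-weighted, DS-convolved LH, HL, HH (inputs to the IDWT)."""
    ll_opt = ds_conv(ll, p["ll"])                          # depthwise separable conv on LL
    out = []
    for name, xk in (("lh", lh), ("hl", hl), ("hh", hh)):
        cat = np.concatenate([xk, ll], axis=1)             # (B, 2C, h, w)
        # A 7x7 conv to one channel equals per-channel 7x7 filtering summed over channels.
        att = sigmoid(depthwise_conv(cat, p[f"att_{name}"]).sum(axis=1, keepdims=True))
        out.append(ds_conv(att * xk, p[f"ds_{name}"]))     # weight, then DS conv
    return (ll_opt, *out)

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize with per-channel statistics over the batch and spatial axes, then scale and shift."""
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return gamma[None, :, None, None] * (x - mu) / np.sqrt(var + eps) + beta[None, :, None, None]
```

The attention map has a single channel and values in (0, 1), so it rescales every channel of a detail subband at the same spatial positions, emphasizing locations where the detail and the low-frequency context jointly suggest a tumor edge.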

6. The method according to claim 1, characterized in that step S3 comprises: S31. loading a pre-trained SAM model comprising a Vision Transformer image encoder, a prompt encoder, and a mask decoder; S32. freezing all backbone parameters of the SAM model and determining the insertion-position strategy of the LT-FAA adapter within the SAM model; S33. pre-training the LT-FAA adapter for domain adaptation on a large-scale general medical image segmentation dataset, so that the SAM model initially adapts to the texture and contrast characteristics of medical images; S34. on the basis of the SAM model, loading the adapter weights from the first-stage pre-training and performing task-specific training on a lung-tumor-specific dataset to obtain the LungSAM model; S35. in each training iteration, obtaining a batch of CT images and the corresponding tumor mask labels from the data loader, performing forward propagation through the LungSAM model, and computing the multi-task loss between the model predictions and the ground-truth labels, the multi-task loss being a hybrid of weighted cross-entropy loss and Dice loss:

$$\mathcal{L} = \lambda_{1}\,\mathcal{L}_{wCE} + \lambda_{2}\,\mathcal{L}_{Dice}$$

where the weighting factors $\lambda_1$ and $\lambda_2$ are set to 0.2 and 0.8, respectively; S36. performing backpropagation on the multi-task loss and then using the AdamW optimizer to iteratively update the adapter parameters from the gradient information and preset hyperparameters such as learning rate and weight decay, so as to minimize the multi-task loss; S37. after each training epoch, evaluating the current model on an independent validation set, computing the key metrics Dice, IoU, and 95% HD, and saving the model weights that give the best performance as a checkpoint; S38. comprehensively evaluating the trained LungSAM model on an independent test set and comparing it with current state-of-the-art lung tumor segmentation methods.
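The hybrid loss of S35 and the AdamW update of S36 can be sketched in NumPy for a binary tumor mask. The weights 0.2 and 0.8 come from the claim; everything else is an assumption: the cross-entropy weighting is modeled as a positive-class weight `pos_weight`, the Dice term is a smoothed soft Dice loss, and the learning-rate and weight-decay defaults are placeholders because the claim leaves these hyperparameters unspecified. All function names are hypothetical.

```python
import numpy as np

def weighted_bce(prob, target, pos_weight=1.0, eps=1e-7):
    """Binary cross-entropy with tumor (positive) pixels up-weighted by pos_weight (assumed form)."""
    prob = np.clip(prob, eps, 1.0 - eps)
    loss = -(pos_weight * target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
    return float(loss.mean())

def soft_dice_loss(prob, target, smooth=1.0):
    """1 - smoothed soft Dice between predicted probabilities and the binary mask."""
    inter = float((prob * target).sum())
    return 1.0 - (2.0 * inter + smooth) / (float(prob.sum() + target.sum()) + smooth)

def hybrid_loss(prob, target, lam_ce=0.2, lam_dice=0.8, pos_weight=1.0):
    """lam_ce * weighted CE + lam_dice * Dice, using the weights 0.2 and 0.8 from the claim."""
    return lam_ce * weighted_bce(prob, target, pos_weight) + lam_dice * soft_dice_loss(prob, target)

def adamw_step(theta, grad, state, lr=1e-4, wd=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """One AdamW update for an adapter parameter array; lr and wd are placeholder values."""
    state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", np.zeros_like(theta)) + (1 - b1) * grad
    state["v"] = b2 * state.get("v", np.zeros_like(theta)) + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** state["t"])   # bias-corrected first moment
    v_hat = state["v"] / (1 - b2 ** state["t"])   # bias-corrected second moment
    # Decoupled weight decay: applied to the parameters directly, not added to the gradient.
    return theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * theta)
```

The Dice term dominates with weight 0.8 because it is insensitive to the large background area of a CT slice, while the smaller cross-entropy term keeps per-pixel gradients well behaved early in training. Consistent with S32, only adapter parameters would be passed to `adamw_step`; the frozen SAM backbone is never updated.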