A Deep Learning-Based Body Composition Analysis System and Method for CT Images

By introducing the H-ViTER module into U-Net and pruning the ViT module, combined with image preprocessing and loss function optimization, the problems of accuracy and speed in body composition analysis of CT images are solved, and efficient and accurate segmentation and analysis are achieved in resource-constrained environments.

CN118505615B · Active · Publication Date: 2026-06-30 · ZHEJIANG UNIV +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ZHEJIANG UNIV
Filing Date
2024-04-30
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies for body composition analysis of CT images suffer from poor accuracy, slow speed, and high computational requirements. In particular, deep neural network models have insufficient generalization ability and a high risk of overfitting in resource-limited environments.

Method used

We employ a deep neural network model based on U-Net, introduce a hierarchical visual Transformer-enhanced ResNet (H-ViTER module) and combine it with pruning strategies. We enhance the robustness of the model through image preprocessing, capture local and global contextual information, prune the ViT module to achieve lightweighting, and optimize the model using focal cross-entropy and Dice loss function.

Benefits of technology

It enables efficient and accurate body composition analysis of CT images, and can quickly and accurately segment tissue regions in resource-constrained environments, significantly improving the efficiency of clinical decision-making.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a deep learning-based system and method for body composition analysis of CT images. The system includes an image preprocessing module for preprocessing CT images; an image segmentation module for constructing a U-Net-based deep neural network model, where the encoding layer of U-Net employs an H-ViTER module (a hierarchical visual Transformer-enhanced ResNet), embedding ViT modules at different levels of ResNet and dynamically fusing features from different ViT modules; and a U-Net decoding layer upsampling the features and outputting a tissue region segmentation image. A model optimization module prunes the ViT modules and optimizes the deep neural network model. The composition analysis module calculates the body composition analysis results. This invention improves the efficiency and accuracy of model segmentation, enabling automated body composition analysis based on CT images.

Description

Technical Field

[0001] This invention belongs to the field of medical image processing technology, specifically relating to a deep learning-based CT image body composition analysis system and method.

Background Technology

[0002] Body composition analysis has become an important tool in medical research and clinical practice, particularly in the prognostic assessment of chronic diseases, cancer diagnosis and treatment, and postoperative rehabilitation. Body composition, including but not limited to muscle mass, fat mass, and bone mineral density, has been shown to be closely related to a variety of health outcomes. Especially in the field of oncology, changes in body composition are considered an important indicator of disease progression and are closely related to patient survival, chemotherapy response, and postoperative recovery.

[0003] Currently, clinical body composition calculation still relies on professional rehabilitation physicians spending a significant amount of time manually delineating the cross-section at the patient's third lumbar vertebra and calculating the muscle area and mass. Existing technologies mostly involve manual delineation or AI-based delineation using traditional neural network models, but these methods suffer from poor accuracy, slow speed, and high computational demands. Therefore, developing a fast and efficient CT image body composition analysis tool is of great significance.

[0004] U-Net is a popular deep learning architecture characterized by its speed, accuracy, and low deployment requirements. Primarily used for image segmentation tasks, it is particularly effective in medical image analysis, achieving satisfactory results. It is widely applied in medical imaging such as CT and MRI to segment various tissues and organs, including tumors, blood vessels, and the heart. Through accurate segmentation, doctors can better diagnose diseases, plan treatment strategies, and evaluate treatment effectiveness. Therefore, U-Net greatly helps solve the problem of body composition analysis in CT images.

[0005] Chinese patent application CN114305473A discloses an automatic body composition measurement system based on abdominal CT images and deep learning. Its segmentation method builds on and enhances the U-Net architecture, incorporating a texture attention mechanism that enhances the blurred regions at skeletal muscle edges to capture features such as the size of different types of skeletal muscles and the course of fiber bundles. A 3D encoding branch is designed to extract fiber bundle features. Chinese patent application CN117788435A discloses a physical examination CT image data processing and analysis system and its application. It constructs a deep neural network, Centrum-UNETR++, for joint vertebral body classification and segmentation, consisting of patch embedding, a four-stage encoder-decoder, an efficient paired attention module, convolutional blocks, and a linear classification module. Skip connections are used between the encoder and decoder at the same level to help recover spatial information lost during downsampling. However, because of factors such as differences in how different patients present and differences between scanning equipment, the complexity of CT images poses challenges to deep neural network models, such as insufficient generalization and the risk of overfitting. Furthermore, the design of deep neural network structures in existing methods increases model complexity, which may increase training time and computational resource requirements. In practical applications, especially in resource-constrained environments, these methods still have shortcomings. Therefore, it is necessary to design a more efficient and accurate CT image body composition analysis scheme.

Summary of the Invention

[0006] In view of the above, the purpose of this invention is to provide a deep learning-based CT image body composition analysis system and method. By preprocessing CT images, it provides a foundation for enhancing the robustness and generalization ability of deep neural network models and reducing the risk of overfitting. By introducing a Hierarchical Vision Transformer Enhanced ResNet (H-ViTER module) into a deep neural network model based on the traditional U-Net, it effectively captures local and global contextual information. Combined with pruning strategies, fine-grained pruning is performed within the ViT module to obtain a lightweight ViT module, thereby ensuring the efficiency and accuracy of model segmentation. Ultimately, it achieves automated body composition analysis based on CT images, which has great application potential in promoting personalized patient treatment and patient prognosis assessment.

[0007] To achieve the above-mentioned objectives, the present invention provides the following technical solution:

[0008] In a first aspect, the present invention provides a CT image body composition analysis system based on deep learning, comprising: an image preprocessing module, an image segmentation module, a model optimization module, and a composition analysis module;

[0009] The image preprocessing module is used to preprocess CT images to enhance image quality;

[0010] The image segmentation module is used to construct a deep neural network model based on U-Net to segment tissue regions in preprocessed CT images. The encoding layer of U-Net uses the H-ViTER module, which is a hierarchical visual Transformer enhanced ResNet: ViT modules are embedded at different levels of ResNet, and residual feature fusion and global feature fusion are used to dynamically adjust the features from the ViT modules at different levels. The decoding layer of U-Net upsamples the features and outputs the tissue region segmentation image.

[0011] The model optimization module is used to prune the ViT module and optimize the deep neural network model based on the preprocessed CT images and the constructed loss function;

[0012] The composition analysis module is used to calculate body composition analysis results, including muscle mass, fat mass, and bone density, based on the tissue region segmentation image output by the optimized deep neural network model.

[0013] Preferably, the preprocessing includes: image formatting, particle swarm optimization-based adaptive image enhancement, Gaussian filter-based edge enhancement, and/or data augmentation, wherein image formatting includes converting CT images into a uniform format and size, particle swarm optimization-based adaptive image enhancement includes improving the contrast and visualization of CT images by intelligently searching for optimal enhancement parameters, Gaussian filter-based edge enhancement includes using a Gaussian filter to perform edge enhancement processing on CT images to highlight their edge and texture information, and data augmentation includes randomly rotating, scaling, and/or flipping CT images to obtain new training samples.

[0014] Preferably, the U-Net includes: an input convolutional layer, a pooling layer, an H-ViTER module, a decoding layer, and an output convolutional layer, wherein the pooling layer and the output convolutional layer are connected in a skip connection, and the H-ViTER modules at different levels are connected in a skip connection to the corresponding decoding layers.

[0015] Preferably, the H-ViTER module includes: a first convolutional layer, a pooling layer, a residual module, a second convolutional layer, a third convolutional layer, a first ViT module, a fourth convolutional layer, a second ViT module, a fusion module, an average pooling layer, and a fully connected layer. The second convolutional layer is connected to the first ViT module and the second ViT module in a skip connection, the third convolutional layer is connected to the first ViT module and the second ViT module in a skip connection, the fourth convolutional layer is connected to the second ViT module in a skip connection, and the first ViT module is connected to the second ViT module in a skip connection. By inputting features into the H-ViTER module, feature details, contextual information, and global dependencies are further captured.

[0016] Preferably, the loss function includes: focal cross-entropy loss and Dice loss. Focal cross-entropy loss is used to make the model focus on distinguishing easily confused or misclassified tissue regions to improve the accuracy of multi-classification of different tissue regions. Dice loss is used to make the model focus on capturing fine tissue structure segmentation boundaries to improve the segmentation accuracy of small areas.

[0017] The focal cross-entropy loss $L_{FCE}$ is calculated as:

[0018] $L_{FCE} = -\alpha (1 - p)^{\gamma} \log(p)$

[0019] Where α is the balance parameter, γ is the adjustment parameter, and p is the predicted probability, which measures how closely the prediction approximates the true category y;

[0020] The Dice loss $L_{Dice}$ is calculated as:

[0021] $Dice = \frac{2|A \cap B|}{|A| + |B|}$

[0022] $L_{Dice} = 1 - \frac{1}{N} \sum_{i=1}^{N} Dice(i)$

[0023] Where Dice is the Dice score, Dice(i) is the Dice score of the i-th tissue region among N tissue regions, A is the segmentation region output by the model, and B is the real region;

[0024] The loss function L is calculated as:

[0025] $L = \lambda_1 L_{FCE} + \lambda_2 L_{Dice}$

[0026] Where λ1 and λ2 are the weights of the two losses, respectively.

[0027] Preferably, the pruning of the ViT module includes selectively pruning the attention head or MLP part inside the ViT module to reduce redundant parameters and obtain a lightweight ViT module.

[0028] Preferably, the calculation by the composition analysis module of body composition analysis results, including muscle mass, fat mass, and bone density, based on the tissue region segmentation image output by the optimized deep neural network model, includes:

[0029] The number of pixels of each color in the tissue region segmentation image is counted to establish the proportional relationship between different body components, including muscle, fat and bone. The proportional relationship is then transformed into an accurate area measurement through a mathematical model.

[0030] Based on the area measurement, the image region of each specific body component in the CT image is reconstructed. By comprehensively calculating the pixel gray values within these image regions, the HU value of each body component in the CT image is evaluated.

[0031] In a second aspect, to achieve the above-mentioned objectives, embodiments of the present invention also provide a method for body composition analysis of CT images based on deep learning, comprising the following steps:

[0032] The image preprocessing module is used to preprocess CT images to enhance image quality;

[0033] A deep neural network model based on U-Net is constructed using the image segmentation module to segment tissue regions in preprocessed CT images. The encoding layer of U-Net uses the H-ViTER module, which is a hierarchical visual Transformer enhanced ResNet: ViT modules are embedded at different levels of ResNet, and residual feature fusion and global feature fusion are used to dynamically adjust the features from the ViT modules at different levels. The decoding layer of U-Net upsamples the features and outputs the tissue region segmentation image.

[0034] The ViT module is pruned using the model optimization module, and the deep neural network model is optimized based on the preprocessed CT images and the constructed loss function.

[0035] The composition analysis module calculates body composition analysis results, including muscle mass, fat mass, and bone density, based on the tissue region segmentation images output by the optimized deep neural network model.

[0036] In a third aspect, to achieve the above-mentioned objectives, embodiments of the present invention also provide a CT image body composition analysis device based on deep learning, including a memory and a processor. The memory is used to store a computer program, and the processor is used to perform body composition analysis using the above-mentioned deep learning-based CT image body composition analysis system when the computer program is executed.

[0037] In a fourth aspect, to achieve the above-mentioned objectives, embodiments of the present invention also provide a computer-readable storage medium storing a computer program. When the computer program is executed by a computer, body composition analysis is performed using the aforementioned deep learning-based CT image body composition analysis system.

[0038] Compared with the prior art, the beneficial effects of the present invention include at least the following:

[0039] (1) This invention preprocesses CT images, including image formatting to reduce the computational burden of model training and ensure image quality, image adaptive enhancement based on particle swarm optimization to ensure that the model can accurately identify and learn subtle features from the images, edge enhancement based on Gaussian filtering to help the model more accurately identify tissue boundaries and provide strong support for segmentation tasks, and data augmentation to help the model learn body component features observed from different angles, thereby providing a foundation for enhancing the robustness and generalization ability of deep neural network models and reducing the risk of overfitting.

[0040] (2) This invention uses a deep neural network model to perform CT image segmentation. By introducing the H-ViTER module into the deep neural network model based on the traditional U-Net, and embedding lightweight ViT modules at different feature levels of the H-ViTER module to integrate and exchange global information at each level, global and local information are effectively integrated, model performance is enhanced, and more accurate image segmentation is achieved.

[0041] (3) This invention performs fine-grained pruning on the ViT module, selectively pruning the attention head or MLP part inside the ViT module to obtain a lightweight ViT module, reducing unnecessary computation and parameters, improving processing efficiency while ensuring the model's feature representation capability, providing more model options for different deployment scenarios, reducing model running costs, and making it more suitable for deployment in resource-constrained environments such as edge devices.

Description of the Drawings

[0042] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0043] Figure 1 is a schematic diagram of the structure of the CT image body composition analysis system based on deep learning provided in an embodiment of the present invention;

[0044] Figure 2 is a comparison of the effects before and after CT image preprocessing provided in an embodiment of the present invention;

[0045] Figure 3 is a schematic diagram of the structure of the deep neural network model provided in an embodiment of the present invention;

[0046] Figure 4 is a schematic diagram of the delineation effect of the tissue region segmentation image output by the deep neural network model provided in an embodiment of the present invention;

[0047] Figure 5 is a flowchart of the deep learning-based body composition analysis method for CT images provided in an embodiment of the present invention.

Detailed Description

[0048] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the scope of protection of this invention.

[0049] The inventive concept of this invention is as follows: Existing technologies face insufficient model generalization and overfitting risks caused by the complexity of CT images, existing neural networks are limited in capturing global contextual information and handling long-range dependencies, and the complex design of deep neural network structures may increase training time and computational resource requirements. To address these problems, this invention provides a deep learning-based CT image body composition analysis system and method. Preprocessing techniques, including image formatting, particle swarm optimization-based adaptive image enhancement, Gaussian filtering-based edge enhancement, and data augmentation, are applied to the CT images to reduce the computational burden on the subsequent deep neural network model and enhance image quality, thus supporting the CT image segmentation task. This invention employs a deep neural network model with U-Net as the backbone, incorporating the H-ViTER module (a hierarchical visual Transformer-enhanced ResNet) into U-Net. ViT modules are embedded at different levels of ResNet, and residual feature fusion and global feature fusion are used to dynamically adjust features from different ViT modules, effectively capturing local features and global contextual information. Fine-grained pruning of the ViT modules ensures model performance and efficiency. After the deep neural network model is trained with the loss function, it can be applied to clinical workflows, achieving accurate segmentation of target regions and automated body composition analysis, and significantly improving clinical decision-making efficiency.

[0050] Figure 1 is a schematic diagram of the structure of a deep learning-based CT image body composition analysis system provided in an embodiment of the present invention. As shown in Figure 1, the embodiment provides a CT image body composition analysis system 100 based on deep learning, including: an image preprocessing module 110, an image segmentation module 120, a model optimization module 130, and a composition analysis module 140.

[0051] In this embodiment, the image preprocessing module 110 is used to preprocess CT images to enhance image quality.

[0052] First, a total of 1,022 CT scan images from multiple medical centers were collected, covering a wide range of cases including liver transplantation for liver cancer, liver resection for liver cancer, cholecystitis, routine physical examinations, and chemotherapy for tumors. This ensured the diversity and representativeness of the dataset. Of these, 600 CT images at the third lumbar vertebra, with delineations carefully drawn by professional physicians used as labels, were paired with the original CT images to construct a training dataset for model training. The other 422 images, likewise delineated by professional physicians, were paired with the original images to construct a validation dataset for evaluating the model's generalization ability and accuracy on unseen data.

[0053] Then, in order to improve the model training effect and the final analysis accuracy, the following necessary preprocessing steps were performed on the CT images:

[0054] (1) Image formatting: First, all CT images are converted to a uniform format and size to facilitate subsequent processing, reduce the computational burden during model training, and ensure image quality.

[0055] (2) Adaptive image enhancement based on particle swarm optimization: To improve image contrast and visualization, an adaptive enhancement technique based on particle swarm optimization algorithm was adopted. By intelligently searching for the optimal enhancement parameters, the visibility of the region of interest in the image was effectively improved, especially in the low contrast region. This step is the key to ensuring that the model can accurately identify and learn subtle features from the image.

[0056] (3) Edge enhancement based on Gaussian filtering: The image is enhanced by using a Gaussian filter to highlight the edge and texture information of the image. This step not only helps the model to identify tissue boundaries more accurately, but also provides strong support for subsequent segmentation tasks.

[0057] (4) Data augmentation: In order to increase the robustness and generalization ability of the model, further data augmentation processing was performed on the training dataset, including random rotation, scaling and flipping. These operations can generate new training samples from the original images, help the model learn the body composition features observed from different angles, and effectively reduce the risk of overfitting.
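Steps (3) and (4) above can be illustrated with a minimal NumPy sketch. The embodiment does not state how the Gaussian filter is turned into an edge enhancement, so unsharp masking (adding back the difference between the image and its blurred copy) is used here as an assumption. The function names, `sigma`, `radius`, and `amount` are hypothetical, and the augmentation is restricted to 90-degree rotations and horizontal flips for brevity, whereas the embodiment also mentions scaling.

```python
import numpy as np

def gaussian_kernel1d(sigma: float, radius: int) -> np.ndarray:
    """Normalized 1-D Gaussian kernel of length 2*radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img: np.ndarray, sigma: float = 1.0, radius: int = 2) -> np.ndarray:
    """Separable Gaussian blur with edge-replicating padding (output keeps the input shape)."""
    k = gaussian_kernel1d(sigma, radius)
    padded = np.pad(img.astype(np.float64), radius, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def unsharp_mask(img: np.ndarray, sigma: float = 1.0, radius: int = 2,
                 amount: float = 1.0) -> np.ndarray:
    """Assumed edge enhancement: add back the detail removed by the Gaussian blur."""
    img = img.astype(np.float64)
    return img + amount * (img - gaussian_blur(img, sigma, radius))

def augment(img: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Apply the same random 90-degree rotation and horizontal flip to image and label."""
    k = int(rng.integers(0, 4))
    img, mask = np.rot90(img, k), np.rot90(mask, k)
    if rng.random() < 0.5:
        img, mask = np.fliplr(img), np.fliplr(mask)
    return img.copy(), mask.copy()
```

Applying identical geometric transforms to the image and its label keeps the physician's delineation aligned with the augmented CT slice.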

[0058] Figure 2 compares the CT images before and after preprocessing. The top left image is the original, unprocessed CT image, and the top right image is the image after adaptive enhancement based on particle swarm optimization. The three bottom images show the results of edge enhancement based on Gaussian filtering (with smoothing kernels of 1×1, 3×3, and 5×5 from left to right). Image quality is significantly improved after preprocessing, which can improve the recognition efficiency and speed of the subsequent model.
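The particle swarm search for enhancement parameters can be sketched as follows. The embodiment does not disclose the enhancement transform, the fitness function, or the swarm settings, so everything here is an assumption for illustration: a gamma correction followed by a contrast stretch, histogram entropy as the fitness, and conventional inertia and acceleration coefficients. The image is assumed to be normalized to [0, 1].

```python
import numpy as np

def enhance(img01: np.ndarray, gamma: float, gain: float) -> np.ndarray:
    """Gamma correction followed by a contrast stretch around mid-gray; output in [0, 1]."""
    out = np.power(img01, gamma)
    return np.clip(0.5 + gain * (out - 0.5), 0.0, 1.0)

def entropy(img01: np.ndarray, bins: int = 64) -> float:
    """Shannon entropy of the gray-level histogram (assumed fitness function)."""
    hist, _ = np.histogram(img01, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def pso_enhance(img01: np.ndarray, n_particles: int = 12, n_iters: int = 30, seed: int = 0):
    """Search (gamma, gain) with a basic global-best particle swarm."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array([0.3, 0.5]), np.array([3.0, 3.0])  # hypothetical (gamma, gain) bounds
    pos = rng.uniform(lo, hi, size=(n_particles, 2))
    pos[0] = (1.0, 1.0)  # include the identity transform so the result is never worse than it
    vel = np.zeros_like(pos)
    fit = np.array([entropy(enhance(img01, *p)) for p in pos])
    pbest, pbest_fit = pos.copy(), fit.copy()
    gbest = pbest[np.argmax(pbest_fit)].copy()
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (assumed)
    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        fit = np.array([entropy(enhance(img01, *p)) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return gbest, float(pbest_fit.max())
```

A call such as `params, score = pso_enhance(img01)` returns the best `(gamma, gain)` pair found and its fitness; a clinical pipeline would likely use a fitness that also rewards edge content rather than entropy alone.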

[0059] In this embodiment, the image segmentation module 120 is used to construct a U-Net-based deep neural network model to segment tissue regions in the preprocessed CT image.

[0060] Considering the generally small sample size of clinical case data, this embodiment uses the U-Net architecture to address the CT image delineation problem, replacing the traditional convolutional layers in the U-Net structure with ResNet. However, neural networks mainly rely on local convolutional kernels to extract features, making them relatively weak in capturing global contextual information and handling long-range dependencies. Furthermore, their fixed receptive field size limits their flexibility in processing features at different scales. Therefore, this embodiment introduces a Vision Transformer (ViT) into the ResNet architecture. By embedding customized ViT modules, each ViT module is responsible for further refining its received feature maps, enhancing the ability to capture global dependencies through a self-attention mechanism.

[0061] The deep neural network model uses U-Net as the basic network architecture for CT image segmentation. As shown in Figure 3, the U-Net includes: an input convolutional layer (7×7), pooling layers, a first H-ViTER module, a second H-ViTER module, a third H-ViTER module, a third decoding layer, a second decoding layer, a first decoding layer, and an output convolutional layer (3×3). `concat` denotes the concatenation operation: the pooling layer and the output convolutional layer are connected by a skip connection, and the H-ViTER module at each level is connected by a skip connection to the decoding layer at the corresponding level. This helps preserve high-resolution features extracted from the original image and improves segmentation accuracy. The encoding layer uses 3 H-ViTER modules for downsampling (image sizes 1024×768, 512×384, 256×192), while the decoding layer uses 3 native U-Net decoders for upsampling (image sizes the same as in downsampling). The image size is gradually restored through layer-by-layer upsampling. After each upsampling step, the feature map is further refined through convolutional layers, and low-level features are fused with high-level features. This ensures image quality while restoring more detailed information to generate a refined segmentation result.

[0062] The H-ViTER module is essentially a Hierarchical Vision Transformer Enhanced ResNet. Based on the task's complexity and performance requirements, a suitable ResNet version (such as ResNet-101 or Res2Net) is selected as the basis for feature extraction. The input layer of ResNet (image size 1024×768) is appropriately adjusted according to the size and characteristics of the input CT image to ensure effective capture of details and contextual information. The network structure is designed to extract feature maps from different levels of ResNet (shallow, intermediate, and deep). These feature maps are fed into the subsequent ViT modules for further processing.

[0063] As shown in Figure 3, the H-ViTER module includes: a first convolutional layer (7×7), pooling layers, a residual module, a second convolutional layer (3×3), a third convolutional layer (4×4), a first ViT module, a fourth convolutional layer (5×5), a second ViT module, a fusion module, an average pooling layer, and a fully connected layer. The second convolutional layer has skip connections to both the first and second ViT modules, the third convolutional layer has skip connections to both the first and second ViT modules, the fourth convolutional layer has a skip connection to the second ViT module, and the first ViT module has a skip connection to the second ViT module. Passing features through the H-ViTER module further captures feature details, contextual information, and global dependencies. Feature extraction before the first ViT module is considered the first stage. The second stage embeds the first ViT module in an intermediate layer of ResNet. The feature map at this stage already contains rich spatial information but still retains sufficient resolution, making it suitable for integrating global information. A 2×2 pooling kernel with a stride of 2 is used for downsampling. Embedding a second ViT module before the global feature fusion layer forms the third stage, which obtains richer spatial information and enables relationships within the global information to be identified. The first stage has sufficient resolution to enhance the information density acquired by the ViT modules. Embedding multiple ViT modules within the ResNet framework fully mines features from low to high levels and improves the model's segmentation accuracy for medical CT images. Features from the ViT modules at different levels are dynamically adjusted using residual feature fusion and global feature fusion to further refine features and capture global dependencies. The fused feature representation is obtained through the average pooling layer and the fully connected layer. The feature representation is then upsampled through the U-Net decoding layer and the output convolutional layer to output a tissue region segmentation image.
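How a ViT module can refine a convolutional feature map through self-attention and a residual connection can be sketched in a few lines of NumPy. This is only an illustration under strong simplifying assumptions: a single attention head, no positional embedding, layer normalization, or MLP, and randomly chosen projection matrices `wq`, `wk`, `wv` (hypothetical names). The actual ViT modules and fusion module of the embodiment are not specified at this level of detail.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens: np.ndarray, wq: np.ndarray, wk: np.ndarray,
                   wv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention over (n_tokens, dim) tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Each output token is a weighted mix of all tokens, i.e. global context.
    return softmax(scores, axis=-1) @ v

def vit_refine(feature_map: np.ndarray, wq: np.ndarray, wk: np.ndarray,
               wv: np.ndarray) -> np.ndarray:
    """Flatten a (C, H, W) feature map to H*W tokens, attend, and add the residual."""
    c, h, w = feature_map.shape
    tokens = feature_map.reshape(c, h * w).T               # (H*W, C)
    refined = tokens + self_attention(tokens, wq, wk, wv)  # residual fusion of local + global
    return refined.T.reshape(c, h, w)
```

Because every spatial position attends to every other position, the refined map carries long-range dependencies that a fixed-size convolution kernel cannot capture, while the residual term preserves the local features from ResNet.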

[0064] In this embodiment, the model optimization module 130 is used to prune the ViT module and optimize the deep neural network model based on the preprocessed CT images and the constructed loss function.

[0065] Considering the complexity of tissue structures in CT images, and the potential for overlap, intersection, or embedding between different tissues that increases classification difficulty, a focal cross-entropy loss ($L_{FCE}$) is employed for the imbalanced multi-class classification task. This is an improvement on cross-entropy that reduces the relative loss weight of easily classified samples, allowing the model to focus more on difficult-to-classify samples. At the same time, considering the challenge of accurately segmenting tissue boundary regions in extremely small areas, a Dice loss ($L_{Dice}$) is adopted, which is designed for medical image segmentation tasks, particularly cases where the target region is much smaller than the background.

[0066] The focal cross-entropy loss $L_{FCE}$ is calculated as:

[0067] $L_{FCE} = -\alpha (1 - p)^{\gamma} \log(p)$

[0068] Where α is the balance parameter, γ is the adjustment parameter, and p is the predicted probability, which measures how closely the prediction approximates the true category y. By setting the degree of focus, the model is made to concentrate on distinguishing easily confused or misclassified tissue regions, effectively handling class imbalance without resampling and improving the accuracy of multi-class classification of different tissue regions.

[0069] The Dice loss $L_{Dice}$ is calculated as:

[0070] $Dice = \frac{2|A \cap B|}{|A| + |B|}$

[0071] $L_{Dice} = 1 - \frac{1}{N} \sum_{i=1}^{N} Dice(i)$

[0072] Where Dice is the Dice score, Dice(i) is the Dice score of the i-th tissue region among N tissue regions, A is the segmented region output by the model, and B is the real region. The Dice loss is used to make the model focus on capturing fine tissue structure segmentation boundaries to improve the segmentation accuracy of small regions.

[0073] The loss function L is calculated as:

[0074] $L = \lambda_1 L_{FCE} + \lambda_2 L_{Dice}$

[0075] Where λ1 and λ2 are the weights of the two losses, respectively. The loss function L uses $L_{FCE}$ to improve the accuracy of multi-class classification and avoid problems such as color misjudgment, and uses $L_{Dice}$ to improve the overlap between the predicted region and the actual region to achieve region alignment.
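The combined loss can be sketched in NumPy as follows. The focal term follows the formula given above. The Dice term assumes the common form of one minus the mean per-class Dice score, computed on soft probabilities. The default values of `alpha`, `gamma`, `lam1`, and `lam2` are placeholders, since the embodiment does not state the values it uses.

```python
import numpy as np

def focal_ce(probs: np.ndarray, target: np.ndarray, alpha: float = 0.25,
             gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Mean focal cross-entropy.

    probs:  (n_pixels, n_classes) softmax outputs.
    target: (n_pixels,) integer class ids.
    """
    # p is the predicted probability of the true class for each pixel.
    p = np.clip(probs[np.arange(target.size), target], eps, 1.0)
    return float(np.mean(-alpha * (1.0 - p) ** gamma * np.log(p)))

def dice_loss(probs: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """One minus the mean soft Dice score over classes (assumed form of the Dice loss)."""
    n_classes = probs.shape[1]
    onehot = np.eye(n_classes)[target]                  # (n_pixels, n_classes)
    inter = (probs * onehot).sum(axis=0)                # soft |A ∩ B| per class
    denom = probs.sum(axis=0) + onehot.sum(axis=0)      # |A| + |B| per class
    dice = (2.0 * inter + eps) / (denom + eps)
    return float(1.0 - dice.mean())

def total_loss(probs: np.ndarray, target: np.ndarray,
               lam1: float = 1.0, lam2: float = 1.0) -> float:
    """Weighted sum L = lam1 * L_FCE + lam2 * L_Dice."""
    return lam1 * focal_ce(probs, target) + lam2 * dice_loss(probs, target)
```

The factor `(1 - p) ** gamma` shrinks the contribution of pixels that are already classified confidently, while the Dice term is insensitive to how many background pixels surround a small region.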

[0076] Meanwhile, to address the issue of the high computational cost of the ViT module and its inability to achieve good results with small sample sizes, this embodiment employs Dense Vision Transformer Compression (DC-ViT) to compress the ViT module. This involves selectively pruning the attention heads or MLP parts within the ViT module to reduce redundant parameters, resulting in a lightweight ViT module. This enables effective model compression and optimization even with extremely limited training samples. Furthermore, it allows the deep neural network model in this embodiment to run relatively easily on commercial GPUs. This method allows for fine-grained pruning of the ViT part, rather than simply discarding the entire Transformer block or layer, thus providing more model options for different deployment scenarios, reducing model operating costs, and making it more suitable for deployment in resource-constrained environments such as edge devices.
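Structured pruning of attention heads can be sketched as follows. This does not reproduce the DC-ViT procedure. It only illustrates the general idea of removing whole heads from the projection matrices, using the norm of each head's slice of the output projection as an assumed importance score. The weight layout and function names are hypothetical.

```python
import numpy as np

def head_importance(w_out: np.ndarray, n_heads: int) -> np.ndarray:
    """Score each head by the Frobenius norm of its rows in the output projection.

    w_out: (n_heads * head_dim, model_dim).
    """
    head_dim = w_out.shape[0] // n_heads
    per_head = w_out.reshape(n_heads, head_dim, -1)
    return np.linalg.norm(per_head, axis=(1, 2))

def prune_heads(w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray,
                w_out: np.ndarray, n_heads: int, keep: int):
    """Keep the `keep` highest-scoring heads and drop the rest.

    w_q, w_k, w_v: (model_dim, n_heads * head_dim); w_out: (n_heads * head_dim, model_dim).
    Returns the pruned matrices and the indices of the kept heads.
    """
    head_dim = w_out.shape[0] // n_heads
    scores = head_importance(w_out, n_heads)
    kept = np.sort(np.argsort(scores)[-keep:])  # strongest heads, in original order
    cols = np.concatenate([np.arange(h * head_dim, (h + 1) * head_dim) for h in kept])
    # Removing a head deletes its columns in Q/K/V and its rows in the output projection.
    return w_q[:, cols], w_k[:, cols], w_v[:, cols], w_out[cols, :], kept
```

Because only the slices belonging to the dropped heads are removed, the Transformer block itself stays in place, which matches the fine-grained pruning described above rather than discarding whole blocks or layers. In practice the pruned model would be fine-tuned afterwards to recover accuracy.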

[0077] In this embodiment, the component analysis module 140 is used to calculate body composition analysis results, including muscle mass, fat mass, and bone density, based on the tissue region segmentation image output by the optimized deep neural network model.

[0078] First, the number of pixels of each color in the tissue region segmentation image output by the model is counted to establish the proportional relationship between the different body components, including muscle, fat, and bone. Then, using the real-world scale information contained in the DICOM image file, these proportions are converted by a mathematical model into precise area measurements, giving an accurate measurement of the distribution area of each body component. Next, the HU (Hounsfield Unit) value of each body component is calculated: based on the preceding analysis results, the image region of each specific body component is reconstructed on a virtual blank canvas, and the HU value of each component is then evaluated from the pixel grayscale values within these regions, that is, by analyzing the distribution of pixel gray levels.
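The area and HU steps above can be sketched as follows. The sketch assumes the segmentation is a 2D grid of integer labels aligned with the CT slice, that the physical scale comes from the DICOM Pixel Spacing attribute (row and column spacing in mm), and that stored pixel values convert to HU through the DICOM Rescale Slope and Rescale Intercept. The label numbering, the default slope and intercept, and the function names are hypothetical.

```python
def tissue_area_cm2(mask, label, pixel_spacing_mm):
    """Area of one tissue class: pixel count times the physical area of a pixel."""
    row_mm, col_mm = pixel_spacing_mm
    count = sum(1 for row in mask for v in row if v == label)
    return count * row_mm * col_mm / 100.0  # mm^2 -> cm^2

def mean_hu(ct_pixels, mask, label, slope=1.0, intercept=-1024.0):
    """Mean HU over the pixels of one tissue class (HU = stored * slope + intercept)."""
    values = [
        ct_pixels[r][c] * slope + intercept
        for r, row in enumerate(mask)
        for c, v in enumerate(row)
        if v == label
    ]
    return sum(values) / len(values) if values else float("nan")

# Toy 2x3 slice; labels 1 = muscle, 2 = fat (hypothetical numbering).
mask = [[1, 1, 2],
        [0, 2, 2]]
ct = [[1074, 1084, 924],
      [1024, 914, 934]]
muscle_area = tissue_area_cm2(mask, 1, pixel_spacing_mm=(0.8, 0.8))
muscle_hu = mean_hu(ct, mask, 1)
fat_hu = mean_hu(ct, mask, 2)
```

In practice the spacing, slope, and intercept would be read from the header of each DICOM file rather than passed as defaults.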

[0079] The performance of the deep neural network model is evaluated on validation datasets, with key metrics including accuracy, recall, F1 score, and Dice coefficient. The key performance metrics obtained on the validation datasets are as follows:

[0080] Accuracy: 95.8%;

[0081] Recall: 94.2%;

[0082] F1 Score: 94.9%;

[0083] Dice Coefficient: 95.2%.
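The source does not state how these metrics were computed or averaged across tissue classes. As a point of reference, the sketch below shows the common per-class definitions of precision, recall, and F1 score on flat label maps; the function names are hypothetical.

```python
def class_counts(pred, target, cls):
    """True positives, false positives, and false negatives for one class."""
    tp = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    fp = sum(1 for p, t in zip(pred, target) if p == cls and t != cls)
    fn = sum(1 for p, t in zip(pred, target) if p != cls and t == cls)
    return tp, fp, fn

def precision_recall_f1(pred, target, cls):
    """Per-class precision, recall, and their harmonic mean (F1)."""
    tp, fp, fn = class_counts(pred, target, cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy binary label maps: 3 true positives, 1 false positive, 1 false negative.
pred   = [1, 1, 1, 0, 0, 1, 0, 0]
target = [1, 1, 0, 0, 1, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(pred, target, cls=1)
```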

[0084] These metrics demonstrate that, compared to traditional manual delineation methods, this deep learning model exhibits significant accuracy and consistency in identifying and quantifying body composition. In particular, the high Dice coefficient indicates that the model's segmentation of muscle and adipose tissue closely approximates the actual annotations, accurately capturing the boundaries and details of the target regions. Figure 4 is a schematic diagram of the delineation output by the deep neural network model provided in this embodiment of the invention: the left image shows the manual delineation (label), and the right image shows the model's delineation. In Figure 4, which shows a cross-section at the third lumbar vertebra, the blue area represents fat, the red area represents muscle, the green area represents intramuscular fat, the yellow area represents internal organs, and the black area represents everything else. The model's MIoU is 0.7251 and its per-pixel accuracy is 90.2%, indicating high accuracy.
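For reference, MIoU and per-pixel accuracy are commonly defined as in the sketch below. It assumes flat label maps of integer class indices and skips classes that appear in neither map; the source does not say how absent classes were handled, so that choice and the function names are assumptions.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label matches the annotation."""
    correct = sum(1 for p, t in zip(pred, target) if p == t)
    return correct / len(target)

def mean_iou(pred, target, classes):
    """Mean over classes of |A ∩ B| / |A ∪ B|; classes absent from both maps are skipped."""
    ious = []
    for cls in classes:
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy label maps with three classes; one pixel is misclassified.
pred   = [0, 0, 1, 1, 2, 2, 2, 0]
target = [0, 0, 1, 2, 2, 2, 2, 0]
acc = pixel_accuracy(pred, target)
miou = mean_iou(pred, target, classes=[0, 1, 2])
```

A single misclassified pixel lowers MIoU more than it lowers pixel accuracy when it falls in a small class, which is why the two figures can differ noticeably on the same segmentation.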

[0085] In summary, after meticulous performance optimization, the deep learning-based CT image body composition analysis system has successfully integrated the deep neural network model provided in this embodiment into clinical workflows. By automating the analysis of body composition in CT images, it lets physicians obtain accurate muscle mass, fat mass, and bone density data within 1-3 seconds (depending on server performance), significantly improving the efficiency of clinical decision-making. In-depth analysis and application of the body composition analysis tool developed in this embodiment also showed that clinical decisions based on the data it provides have an accuracy rate of up to 98%, which underscores the tool's value and reliability in supporting high-quality clinical decision-making.

[0086] Based on the same inventive concept, as shown in Figure 5, this embodiment of the invention also provides a deep learning-based method for body composition analysis of CT images, including the following steps:

[0087] S1: The image preprocessing module preprocesses the CT images to enhance image quality.

[0088] S2: The image segmentation module constructs a U-Net-based deep neural network model to segment tissue regions in the preprocessed CT images. The encoding layer of U-Net uses the H-ViTER module, a hierarchical visual Transformer-enhanced ResNet, in which ViT modules are embedded at different levels of ResNet and residual feature fusion and global feature fusion dynamically adjust the features from the ViT modules at different levels. The decoding layer of U-Net upsamples the features and outputs the tissue region segmentation image.

[0089] S3: The model optimization module prunes the ViT module and optimizes the deep neural network model based on the preprocessed CT images and the constructed loss function.

[0090] S4: The component analysis module calculates body composition analysis results, including muscle mass, fat mass, and bone density, based on the tissue region segmentation image output by the optimized deep neural network model.
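At inference time, the steps above amount to a simple chain of stages. The sketch below wires them together with stand-in callables. The function name `analyze_ct` and the toy stages are hypothetical, and S3 (pruning and training) is assumed to have been completed beforehand, so that the `segment` callable wraps the already optimized model.

```python
def analyze_ct(ct_image, preprocess, segment, analyze):
    """Run S1, S2, and S4 in order on one CT image.

    S3 (pruning and training) happens before inference, so `segment` is
    assumed to wrap the already optimized model.
    """
    enhanced = preprocess(ct_image)   # S1: image preprocessing
    label_map = segment(enhanced)     # S2: tissue region segmentation
    return analyze(label_map)         # S4: body composition analysis

# Stand-in callables on a toy 2x2 image, for illustration only.
result = analyze_ct(
    [[10, 200], [220, 15]],
    preprocess=lambda img: img,
    segment=lambda img: [[1 if v > 100 else 0 for v in row] for row in img],
    analyze=lambda mask: {"foreground_pixels": sum(map(sum, mask))},
)
```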

[0091] Based on the same inventive concept, this invention also provides a deep learning-based CT image body composition analysis device, including a memory and a processor. The memory is used to store a computer program, and the processor is used to perform body composition analysis using the aforementioned deep learning-based CT image body composition analysis system when the computer program is executed.

[0092] Based on the same inventive concept, embodiments of the present invention also provide a computer-readable storage medium storing a computer program, which, when executed by a computer, performs body composition analysis using the aforementioned deep learning-based CT image body composition analysis system.

[0093] It should be noted that the deep learning-based CT image body composition analysis method, deep learning-based CT image body composition analysis device, and computer-readable storage medium provided in the above embodiments all belong to the same inventive concept as the deep learning-based CT image body composition analysis system. For details of their specific implementation process, please refer to the embodiments of the deep learning-based CT image body composition analysis system, which will not be repeated here.

[0094] The specific embodiments described above illustrate the technical solution and beneficial effects of the present invention in detail. It should be understood that the above description is only the most preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, additions, and equivalent substitutions made within the scope of the principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A deep learning-based CT image body composition analysis system, characterized in that it comprises: an image preprocessing module, an image segmentation module, a model optimization module, and a component analysis module; the image preprocessing module is used to preprocess CT images to enhance image quality; the image segmentation module is used to construct a U-Net-based deep neural network model to segment tissue regions in the preprocessed CT images; the U-Net comprises: an input convolutional layer, a pooling layer, an H-ViTER module, a decoding layer, and an output convolutional layer, wherein the pooling layer is skip-connected to the output convolutional layer, and the H-ViTER modules at different levels are skip-connected to the corresponding decoding layers; the H-ViTER module comprises: a first convolutional layer, a pooling layer, a residual module, a second convolutional layer, a third convolutional layer, a first ViT module, a fourth convolutional layer, a second ViT module, a fusion module, an average pooling layer, and a fully connected layer, wherein, using skip connections, the second convolutional layer is skip-connected to the first ViT module and the second ViT module, the third convolutional layer is skip-connected to the first ViT module and the second ViT module, the fourth convolutional layer is skip-connected to the second ViT module, and the first ViT module is skip-connected to the second ViT module; features input to the H-ViTER module are processed to further capture feature details, contextual information, and global dependencies; the encoding layer of U-Net uses the H-ViTER module, which is a hierarchical visual Transformer-enhanced ResNet, in which ViT modules are embedded at different levels of ResNet and residual feature fusion and global feature fusion dynamically adjust the features from the ViT modules at different levels; the decoding layer of U-Net upsamples the features and outputs a tissue region segmentation image; the model optimization module is used to prune the ViT module and optimize the deep neural network model based on the preprocessed CT images and the constructed loss function; and the component analysis module is used to calculate body composition analysis results, including muscle mass, fat mass, and bone density, based on the tissue region segmentation image output by the optimized deep neural network model.

2. The CT image body composition analysis system based on deep learning according to claim 1, characterized in that the preprocessing includes: image formatting, particle swarm optimization-based adaptive image enhancement, Gaussian filter-based edge enhancement, and/or data enhancement; image formatting includes converting CT images into a uniform format and size; particle swarm optimization-based adaptive image enhancement includes improving the contrast and visualization of CT images by intelligently searching for optimal enhancement parameters; Gaussian filter-based edge enhancement includes using a Gaussian filter to perform edge enhancement processing on CT images to highlight their edge and texture information; and data enhancement includes randomly rotating, scaling, and/or flipping CT images to obtain new training samples.

3. The CT image body composition analysis system based on deep learning according to claim 1, characterized in that the loss function includes a focal cross-entropy loss and a Dice loss; the focal cross-entropy loss is used to make the model focus on distinguishing easily confused or misclassified tissue regions, to improve the accuracy of multi-class classification of different tissue regions; the Dice loss is used to make the model focus on capturing the segmentation boundaries of fine tissue structures, to improve the segmentation accuracy of small regions; the focal cross-entropy loss L_FCE is calculated from the balance parameter α, the adjustment parameter γ, and p, the degree of closeness of the predicted probability to the true category y; the Dice loss L_Dice is calculated from the Dice score, where Dice(i) is the Dice score of the i-th tissue region among N tissue regions, A is the segmented region output by the model, and B is the real region; and the loss function L is calculated as $L = \lambda_1 L_{FCE} + \lambda_2 L_{Dice}$, where λ1 and λ2 are the weights of the two losses.

4. The CT image body composition analysis system based on deep learning according to claim 1, characterized in that the pruning of the ViT module includes selectively pruning the attention heads or MLP parts inside the ViT module to reduce redundant parameters and obtain a lightweight ViT module.

5. The CT image body composition analysis system based on deep learning according to claim 1, characterized in that the component analysis module being used to calculate body composition analysis results, including muscle mass, fat mass, and bone density, based on the tissue region segmentation image output by the optimized deep neural network model includes: counting the number of pixels of each color in the tissue region segmentation image to establish the proportional relationship between different body components, including muscle, fat, and bone; transforming the proportional relationship into an accurate area measurement through a mathematical model; reconstructing, based on the area measurement, the image region of each specific body component in the CT image; and evaluating the HU value of each body component in the CT image by comprehensively calculating the pixel gray values within these image regions.

6. A deep learning-based method for body composition analysis of CT images, implemented using the deep learning-based CT image body composition analysis system according to any one of claims 1 to 5, characterized in that it includes the following steps: preprocessing CT images with the image preprocessing module to enhance image quality; constructing, with the image segmentation module, a U-Net-based deep neural network model to segment tissue regions in the preprocessed CT images, wherein the encoding layer of U-Net uses the H-ViTER module, which is a hierarchical visual Transformer-enhanced ResNet in which ViT modules are embedded at different levels of ResNet and residual feature fusion and global feature fusion dynamically adjust the features from the ViT modules at different levels, and the decoding layer of U-Net upsamples the features and outputs the tissue region segmentation image; pruning the ViT module with the model optimization module and optimizing the deep neural network model based on the preprocessed CT images and the constructed loss function; and calculating, with the component analysis module, body composition analysis results, including muscle mass, fat mass, and bone density, based on the tissue region segmentation image output by the optimized deep neural network model.

7. A deep learning-based CT image body composition analysis device, comprising a memory and a processor, wherein the memory is used to store a computer program, characterized in that the processor, when executing the computer program, performs body composition analysis using the deep learning-based CT image body composition analysis system according to any one of claims 1 to 5.

8. A computer-readable storage medium storing a computer program thereon, characterized in that, when the computer program is executed by a computer, body composition analysis is performed using the deep learning-based CT image body composition analysis system according to any one of claims 1 to 5.