Three-dimensional full-automatic human body composition analysis method and system based on CT image, and medium
By constructing a spine segmentation model and a body composition segmentation model using a U-Net network, and combining the attention module and the maximum connected component method, a fully automated three-dimensional human body composition analysis of CT images was achieved. This solves the problems of low accuracy and insufficient automation in single-level measurement in existing technologies, and provides a detailed body composition analysis report.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- GUANGDONG GENERAL HOSPITAL
- Filing Date
- 2024-02-06
- Publication Date
- 2026-06-23
AI Technical Summary
Existing body composition analysis methods based on CT images suffer from low accuracy in single-level measurements, inability to achieve fully automated processes, and lack of analysis of intramuscular fat and bone.
By constructing a spine segmentation model based on the U-Net network, and combining attention modules and the maximum connected component method, the automatic localization of the spine is achieved. Combined with the body component segmentation model, a fully automated 3D analysis is performed, including detailed classification of the spine and body components and modeling of their interrelationships.
It improves the accuracy of three-dimensional body composition measurement, reduces human error, provides detailed body composition analysis reports, and supports clinical physician evaluation.
Figure CN118052783B_ABST
Description
Technical Field
[0001] This invention belongs to the technical field of machine learning, specifically relating to a fully automated three-dimensional human body composition analysis method, system, and medium based on CT images.
Background Technology
[0002] Body composition refers to the various tissue components that make up an organism, including fat, muscle, and bone. Under normal circumstances, the content and proportion of these components in the human body are in a state of balance, maintaining the normal function of the body's organs. However, once the content and proportion of these components become imbalanced, it can lead to varying degrees of symptoms or signs, and even trigger serious diseases such as metabolic disorders, diabetes, coronary heart disease, and various cancers. Classifying and quantifying body composition helps doctors diagnose diseases, develop treatment plans, monitor treatment effectiveness, and assess prognosis based on changes in body composition. Therefore, body composition analysis has significant clinical importance.
[0003] Conventional body composition analysis methods mainly include bioelectrical impedance analysis (BIA) and dual-energy X-ray absorptiometry (DXA). These methods require specific hardware equipment, increasing hospital equipment costs and patient examination fees. Furthermore, BIA results are highly volatile and unstable; DXA requires close patient cooperation, increasing ionizing radiation exposure, and results vary significantly between equipment from different manufacturers.
[0004] To address these issues, researchers have recently begun analyzing body composition using CT images. Compared to bioelectrical impedance analysis and dual-energy X-ray absorptiometry, CT image-based body composition analysis offers the following advantages: 1) CT scans are routine procedures; hospitals do not need to purchase specific equipment, patients do not incur additional examination costs, and there is no additional ionizing radiation. 2) CT grayscale values are stable, with minimal differences between devices from different manufacturers. 3) CT images provide a high-resolution description of anatomical structures, reflecting the distribution of body components in specific organs. For example, fatty liver can be diagnosed by quantitatively assessing fat accumulation in the liver, helping doctors make more accurate judgments.
[0005] The core of CT image-based body composition analysis lies in the effective segmentation of fat, muscle, and bone within the images. Traditional segmentation methods primarily rely on thresholding techniques to automatically identify fat and atlas techniques to separate muscle. These methods are susceptible to noise and artifacts in CT images, resulting in low segmentation accuracy and often requiring manual intervention, which is time-consuming and labor-intensive. In contrast, deep learning neural network methods learn the anatomical structural features of various body components through a data-driven approach, leading to more robust and accurate segmentation results.
[0006] Although deep learning methods for body composition analysis of CT images are superior to traditional thresholding and atlas techniques, they still have the following main problems:
[0007] Single-level measurement: Due to the lack of complete labeled data, current body composition analysis methods based on CT images often only select the central layer of the third lumbar vertebra (L3) as recommended by clinical guidelines to represent the body composition of the entire human body. Single-level body composition segmentation cannot fully utilize the three-dimensional information of CT images, resulting in reduced model segmentation accuracy and affecting the final analysis results; the body composition content between adjacent layers of CT images may also vary significantly, and subtle differences in the selection of the central layer may lead to different measurement results; in addition, chest CT scans often do not include the central layer of the third lumbar vertebra, reducing the applicability of single-level measurement.
[0008] The process cannot be fully automated and often requires manual spinal localization: Current body composition analysis methods based on CT images often require doctors to first select the corresponding slice based on the spinal localization, and then perform body composition analysis based on the selected slice. This semi-automated process is cumbersome, and the slices selected by different doctors may also have some individual differences, introducing human error.
[0009] The classification of body components and their interrelationships is not comprehensive enough: current methods often only analyze visceral fat, subcutaneous fat, and skeletal muscle in CT images, lacking analysis of intramuscular fat and bone. Furthermore, the interrelationships between body components reflect the balance between components; therefore, it is necessary to model the interrelationships between each component as well.
Summary of the Invention
[0010] The main objective of this invention is to overcome the shortcomings and deficiencies of the prior art and provide a three-dimensional fully automated human body composition analysis method, system and medium based on CT images. This invention achieves fully automated body composition analysis through automated spinal positioning.
[0011] To achieve the above objectives, the present invention adopts the following technical solution:
[0012] In a first aspect, the present invention provides a fully automated three-dimensional human body composition analysis method based on CT images, comprising the following steps:
[0013] Acquire CT image data and annotate the CT image data;
[0014] Preprocess the labeled CT image data;
[0015] A spinal segmentation model is constructed, and the minimum, maximum, and center values of each vertebra's position are obtained using the maximum connected component method. These minimum, maximum, and center values represent the uppermost, lowermost, and middle layers of the vertebral body required for spinal localization. Bilinear interpolation is performed on all corresponding layers of the vertebral body to obtain a new image and visualization results. The spinal segmentation model incorporates an attention module into the skip connection part of a pre-defined U-Net network. This attention module is used to learn the overall information of the spine, thereby improving the segmentation performance of the spinal segmentation model.
[0016] The spine segmentation model is trained based on the loss function until the spine segmentation model converges; the loss function is the sum of the Dice loss function and the cross-entropy loss function.
[0017] Construct a body component segmentation model and train it using the same loss function and deep supervision method as the spine segmentation model until convergence;
[0018] Based on the trained spine segmentation model and body component segmentation model, the spine localization result and body component segmentation result are obtained. Based on the spine localization result and body component segmentation result, the relevant indicators and interrelationships of human body components are calculated.
[0019] As a preferred technical solution, the annotation of CT image data specifically includes:
[0020] The CT image data is segmented and annotated with the spine and three-dimensional body composition data. The spinal segmentation annotation includes the annotation of the cervical vertebrae, thoracic vertebrae, lumbar vertebrae and sacrum. The three-dimensional body composition data annotation includes the annotation of skeletal muscle, intramuscular fat, visceral fat, subcutaneous fat and bone.
[0021] As a preferred technical solution, the preprocessing includes anonymization, normalization, and resampling, specifically:
[0022] Anonymize patient-related information in CT image data and convert the original DICOM format data to NII format data;
[0023] Read the image grayscale values at the corresponding positions of the annotated CT image data, and calculate the mean, variance, and quantiles of the grayscale values. After the calculation, truncate the grayscale values and standardize them by subtracting the calculated mean and dividing by the variance.
[0024] The standardized image data and its corresponding labeled data are resampled to the resolution corresponding to the mean, and the preprocessing results are stored.
[0025] As a preferred technical solution, the spine segmentation model is constructed based on the U-Net network, with an attention module added to the skip connection part of the U-Net network. When the feature map encoded by the encoder passes through the attention module, it first undergoes a three-dimensional convolution with a convolution kernel of a set size, then non-linear activation through the PReLU activation function, and then another three-dimensional convolution. The attention module obtains global information through strip average pooling. After strip average pooling, bilinear interpolation is used to upsample the pooling result. Finally, the result of the attention module is weighted and superimposed with the original feature map before being sent to the decoder for decoding.
[0026] As a preferred technical solution, the Dice loss function $L_{dice}$ is expressed by the following formula:
[0027]
[0028] where N represents the total number of pixels, $y_{true}$ represents the true segmentation label, $y_{pred}$ represents the predicted label, and i denotes the i-th pixel.
[0029] The cross-entropy loss function $L_{ce}$ is:
[0030]
[0031] Therefore, the total loss function $L_{total}$ is:
[0032] $L_{total} = L_{dice} + \lambda L_{ce}$.
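For reference, one common form of these losses that is consistent with the symbols defined above is sketched here. The formula images for the Dice and cross-entropy losses are not reproduced in this text, so the exact form used (for example a smoothing term, squared terms in the denominator, or a multi-class cross-entropy) is an assumption; the binary cross-entropy shown matches a sigmoid output.

```latex
L_{dice} = 1 - \frac{2\sum_{i=1}^{N} y_{true,i}\, y_{pred,i}}
                    {\sum_{i=1}^{N} y_{true,i} + \sum_{i=1}^{N} y_{pred,i}}

L_{ce} = -\frac{1}{N}\sum_{i=1}^{N}
         \left[\, y_{true,i}\log y_{pred,i} + (1 - y_{true,i})\log\left(1 - y_{pred,i}\right) \right]

L_{total} = L_{dice} + \lambda L_{ce}
```

Here $\lambda$ weights the cross-entropy term relative to the Dice term.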
[0033] As a preferred technical solution, the method of obtaining the minimum, maximum, and center values of each spine position using the maximum connected component method is as follows:
[0034] After the spinal segmentation model outputs the results, it first calculates the sum of all pixels in the sagittal plane of the CT image data that are equal to the label value of the spine to be located. Then, it divides this by the sum of the pixels of the spine in the entire CT image to obtain the normalized result of each layer. Finally, it performs a weighted summation of the normalized result and the index value of each layer to obtain the most complete layer.
[0035] After the most complete layer corresponding to each vertebra is obtained, it is converted into a binary image in which the region segmented as the vertebra has a value of 1 and the rest has a value of 0. The binary image is scanned starting from the top left corner. When a foreground pixel is encountered, it is marked as belonging to a connected component, and its four-connected neighborhood is checked for foreground pixels. Any foreground neighbors are marked as the same connected component; when no further neighbors are found, the labeling of that component ends. When a new, unlabeled foreground pixel is encountered, a new connected component is started. Once all points have been traversed, the two largest connected components are obtained: the connected component of the main vertebral body and the connected component of the spinous process. Based on the characteristics of human anatomical structure, the connected component on the left is taken as the connected component of the main vertebral body.
[0036] Based on the connected component results, obtain the minimum and maximum ordinate values of the connected component, as well as the corresponding centroid coordinate values. These are the uppermost, lowermost, and middle layers of the vertebral body required for spinal localization.
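The labeling and localization steps described above can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the image is assumed to be a list of rows of 0/1 values, "left" is assumed to mean the smaller mean column index, and the middle layer is taken as the rounded mean row of the component.

```python
from collections import deque


def label_components(binary):
    """Return the 4-connected foreground components as lists of (row, col)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):                      # scan from the top left corner
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            queue = deque([(r, c)])            # start a new component
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                # visit the four-connected neighborhood
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def locate_vertebral_body(binary):
    """Return (uppermost, lowermost, middle) rows of the vertebral body."""
    largest_two = sorted(label_components(binary), key=len, reverse=True)[:2]
    if not largest_two:
        raise ValueError("no foreground pixels in the slice")
    # Assumption: the vertebral body is the component with the smaller mean column.
    body = min(largest_two, key=lambda p: sum(x for _, x in p) / len(p))
    ys = [y for y, _ in body]
    return min(ys), max(ys), round(sum(ys) / len(ys))
```

In practice a library routine for connected-component labeling would usually replace the hand-written breadth-first search; the explicit version is shown only to mirror the scan described in the text.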
[0037] As a preferred technical solution, after obtaining the spinal localization result and the body component segmentation result, the spinal localization result is combined with the body component segmentation to obtain the segmentation result of the corresponding voxel. The segmentation result includes:
[0038] Whole volume: Calculate the body component segmentation results over all CT voxels, from the first layer to the last layer;
[0039] Lung-related: Take the uppermost layer of the first thoracic vertebra to the lowermost layer of the twelfth thoracic vertebra, and calculate the body component segmentation results within this range;
[0040] Abdominal and pelvic region: from the top of the first lumbar vertebra to the bottom of the sacrum, calculate the body component segmentation results within this range;
[0041] Specific internal organs: Based on the anatomical information, input two specific vertebrae, and directly obtain the uppermost and lowermost layers based on the positioning results, and calculate the body component segmentation results within this range;
[0042] Specific single vertebral layer: Input a specific vertebra, directly obtain the middle layer of that vertebra, calculate the body component segmentation results for that layer, and compare them with existing single-layer measurement methods.
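The range selection listed above can be illustrated with a short sketch. The function name, the task names, and the layout of the `loc` dictionary are hypothetical; the vertebra names (T1, T12, L1, S) follow the anatomy named in the text.

```python
def slice_range(loc, task, first=None, last=None, num_slices=None):
    """Return an inclusive (start, stop) layer range for one analysis task.

    loc maps a vertebra name to {"top": int, "bottom": int, "center": int},
    i.e. the uppermost, lowermost, and middle layers from spinal localization.
    """
    if task == "whole":                        # first layer to last layer
        return (0, num_slices - 1)
    if task == "single":                       # middle layer of one vertebra
        center = loc[first]["center"]
        return (center, center)
    if task == "thoracic":                     # top of T1 to bottom of T12
        first, last = "T1", "T12"
    elif task == "abdominopelvic":             # top of L1 to bottom of sacrum
        first, last = "L1", "S"
    elif task != "custom":                     # "custom": two given vertebrae
        raise ValueError(f"unknown task: {task}")
    a, b = loc[first]["top"], loc[last]["bottom"]
    # min/max keeps the range valid whichever way the layer index runs
    return (min(a, b), max(a, b))
```

For example, `slice_range(loc, "custom", first="L1", last="L3")` would return the layers from the top of L1 to the bottom of L3.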
[0043] As a preferred technical solution, the calculation of human body composition-related indicators and their interrelationships based on spinal positioning results and body composition segmentation results specifically includes:
[0044] Based on the spinal location and task, select the corresponding voxel range, and calculate the basic volume parameters of body composition analysis based on the body composition segmentation results: skeletal muscle volume, subcutaneous fat volume, intramuscular fat volume, visceral fat volume, and bone volume.
[0045] Based on the segmentation results, calculate the comprehensive parameters: skeletal muscle content: skeletal muscle content = skeletal muscle volume × average muscle radiation density; total fat content: total fat content = skeletal muscle volume × average muscle radiation density.
[0046] Based on the segmentation results, bone standardization results are calculated, and body composition analysis is performed: skeletal muscle-bone volume ratio = skeletal muscle volume / bone volume, subcutaneous fat-bone volume ratio = subcutaneous fat volume / bone volume, intramuscular fat-bone volume ratio = intramuscular fat volume / bone volume, visceral fat-bone volume ratio = visceral fat volume / bone volume.
[0047] The relationships between soft tissues are calculated based on the segmentation results to analyze their distribution and balance: subcutaneous fat-skeletal muscle volume ratio = subcutaneous fat volume / skeletal muscle volume, skeletal muscle fat volume fraction = intramuscular fat volume / skeletal muscle volume, visceral fat-skeletal muscle volume ratio = visceral fat volume / skeletal muscle volume, visceral fat-subcutaneous fat volume ratio = visceral fat volume / subcutaneous fat volume, intramuscular fat-subcutaneous fat volume ratio = intramuscular fat volume / subcutaneous fat volume, intramuscular fat-visceral fat volume ratio = intramuscular fat volume / visceral fat volume;
[0048] When clinical data on the patient's height is available, the height can be entered to standardize the body composition results for height: skeletal muscle volume index = skeletal muscle volume / height²; subcutaneous fat volume index = subcutaneous fat volume / height²; intramuscular fat volume index = intramuscular fat volume / height²; visceral fat volume index = visceral fat volume / height²; bone volume index = bone volume / height².
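The ratio and index definitions above reduce to simple arithmetic on the five basic volumes. A minimal sketch follows; the function name and dictionary keys are hypothetical, volumes are assumed to share one unit, and the intramuscular fat to subcutaneous fat ratio is computed with subcutaneous fat volume as the denominator, as the name of the ratio implies. Zero volumes are not guarded against.

```python
def composition_indices(vol, height=None):
    """Compute bone-standardized, soft-tissue, and height-standardized values.

    vol holds the basic volumes under the keys
    "muscle", "sat", "imat", "vat", and "bone".
    """
    out = {
        # bone standardization
        "muscle_bone": vol["muscle"] / vol["bone"],
        "sat_bone": vol["sat"] / vol["bone"],
        "imat_bone": vol["imat"] / vol["bone"],
        "vat_bone": vol["vat"] / vol["bone"],
        # relationships between soft tissues
        "sat_muscle": vol["sat"] / vol["muscle"],
        "imat_muscle": vol["imat"] / vol["muscle"],
        "vat_muscle": vol["vat"] / vol["muscle"],
        "vat_sat": vol["vat"] / vol["sat"],
        "imat_sat": vol["imat"] / vol["sat"],
        "imat_vat": vol["imat"] / vol["vat"],
    }
    if height is not None:
        # height standardization: volume divided by height squared
        for name, volume in vol.items():
            out[f"{name}_index"] = volume / height ** 2
    return out
```

Calling the function without `height` returns only the ratios, which mirrors the optional nature of the height input in the text.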
[0049] In a second aspect, the present invention provides a three-dimensional fully automated human body composition analysis system based on CT images, which applies the above three-dimensional fully automated human body composition analysis method based on CT images and includes a data acquisition module, a preprocessing module, a first model construction module, a first model training module, a second model construction module, and a segmentation module;
[0050] The data acquisition module is used to collect CT image data and annotate the CT image data;
[0051] The preprocessing module is used to preprocess the labeled CT image data;
[0052] The first model construction module is used to construct a spine segmentation model. It uses the maximum connected component method to obtain the minimum, maximum, and center values of each vertebra's position. The minimum, maximum, and center values are the uppermost, lowermost, and middle layers of the vertebral body required for spine localization. Bilinear interpolation is performed on all corresponding layers of the vertebral body to obtain a new image and visualization results. The spine segmentation model adds an attention module to the skip connection part of the preset U-Net network. The attention module is used to learn the overall information of the spine so that the spine segmentation model can obtain better segmentation results.
[0053] The first model training module is used to train the spine segmentation model based on a loss function until the spine segmentation model converges; the loss function is the sum of the Dice loss function and the cross-entropy loss function;
[0054] The second model construction module is used to construct a body component segmentation model and train it until convergence using the same loss function and deep supervision method as the spine segmentation model.
[0055] The segmentation module is used to obtain the spine localization result and body composition segmentation result based on the trained spine segmentation model, and to calculate the relevant indicators and interrelationships of human body composition based on the spine localization result and body composition segmentation result.
[0056] In a third aspect, the present invention provides a computer-readable storage medium storing a program which, when executed by a processor, implements the aforementioned three-dimensional fully automated human body composition analysis method based on CT images.
[0057] Compared with the prior art, the present invention has the following advantages and beneficial effects:
[0058] 1. This invention enables three-dimensional body composition calculation: By annotating the data in detail, body composition can be calculated over the three-dimensional volume, reducing the error caused by the gap between existing single-slice two-dimensional calculations and the body composition of the whole human body, and further expanding the application value of conventional CT examination.
[0059] 2. This invention achieves fully automated body composition analysis through automated spinal positioning: By utilizing automated spinal positioning technology, fully automated body composition analysis is achieved. Doctors no longer need to manually select corresponding layers for measurement, reducing human error.
[0060] 3. The detailed spinal localization of this invention facilitates the analysis of specific organ performance: Detailed spinal localization helps physicians perform body composition analysis by identifying the internal organs corresponding to the spinal level. This invention provides corresponding optional interfaces for clinicians to input data, improving clinical usability.
[0061] 4. This invention provides a more detailed classification and exploration of the interrelationships of various body components in CT images: This method classifies various body components in CT images in more detail, explores the interrelationships between them and models them, and finally prints a report for easy evaluation by clinicians.
Attached Figure Description
[0062] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0063] Figure 1 is a flowchart of a fully automated three-dimensional human body composition analysis method based on CT images, according to an embodiment of the present invention.
[0064] Figure 2 is a flowchart illustrating the localization results of the spinal segmentation model in an embodiment of the present invention.
[0065] Figure 3 is a structural diagram of the attention mechanism module in the U-Net skip connection according to an embodiment of the present invention.
[0066] Figure 4 is a schematic diagram of the body component segmentation results in an embodiment of the present invention.
[0067] Figure 5 is a block diagram of a fully automated three-dimensional human body composition analysis system based on CT images, according to an embodiment of the present invention.
Detailed Implementation
[0068] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of the present application, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort are within the scope of protection of the present application.
[0069] In this application, a reference to an "embodiment" means that a specific feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor to a separate or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art will understand, both explicitly and implicitly, that the embodiments described in this application can be combined with other embodiments.
[0070] Body components: The various tissue components that make up the human body, such as subcutaneous fat, visceral fat, intramuscular fat, skeletal muscle, and bones.
[0071] Deep learning originates from research on artificial neural networks; it discovers distributed feature representations of data by combining low-level features to form more abstract high-level representations of attribute categories or features.
[0072] Convolutional Neural Network (CNN) is a type of feedforward neural network widely used in image processing and speech recognition, and is one of the main network frameworks in deep learning.
[0073] Instance normalization: also known as contrast normalization. Normalization is used when features in the data have different ranges; bringing the numerical values to a common scale can speed up the optimization process. Instance normalization is applied to each individual image (per sample and per channel) rather than across the entire batch of images.
[0074] LeakyReLU is an activation function for neural networks. When the input is greater than 0, the output is equal to the input. When the input is less than 0, the output is the input multiplied by a small coefficient a, where a is usually between 0.01 and 0.3.
[0075] PReLU: An activation function for neural networks; unlike LeakyReLU, the coefficient a in PReLU is a learnable parameter.
[0076] Sigmoid: An activation function in neural networks that maps the input to a range between 0 and 1.
[0077] Dice coefficient: It is a measure of similarity between two samples, with a value between 0 and 1. The larger the value, the more similar the two samples are.
[0078] Skip connections are an architectural design technique in deep neural networks. Their main purpose is to solve problems such as vanishing and exploding gradients during deep neural network training, and to accelerate training convergence.
[0079] As shown in Figure 1, the fully automated three-dimensional human body composition analysis method based on CT images of this embodiment includes the following steps:
[0080] S1. Collect CT data, complete the data annotation work, and divide the data into training set, validation set, and test set.
[0081] Furthermore, step S1 specifically includes:
[0082] S11. Collect clinical CT image data, first clean the data, and delete CT data with poor image quality or excessive slice thickness.
[0083] S12. Use ITK-SNAP software to perform spine segmentation annotation and three-dimensional annotation of body composition data on image data.
[0084] In one specific embodiment, the spine comprises 25 tags: the first to seventh cervical vertebrae (C1 to C7, tag values 25 to 19), the first to twelfth thoracic vertebrae (T1 to T12, tag values 18 to 7), the first to fifth lumbar vertebrae (L1 to L5, tag values 6 to 2), and the sacrum (S, tag value 1). Body components comprise five tags: skeletal muscle (Muscle, tag value 1), intramuscular fat (IMAT, tag value 2), visceral fat (VAT, tag value 3), subcutaneous fat (SAT, tag value 4), and bone (Bone, tag value 5).
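The label scheme of this embodiment can be written down as two lookup tables. The sketch below only restates the tag values given above; the function and constant names are hypothetical.

```python
def build_spine_labels():
    """Map vertebra names to tag values: C1 = 25 counting down to S = 1."""
    names = (
        [f"C{i}" for i in range(1, 8)]       # C1..C7  -> 25..19
        + [f"T{i}" for i in range(1, 13)]    # T1..T12 -> 18..7
        + [f"L{i}" for i in range(1, 6)]     # L1..L5  -> 6..2
        + ["S"]                              # sacrum  -> 1
    )
    return {name: 25 - idx for idx, name in enumerate(names)}


# Body component tag values as listed in the embodiment.
BODY_COMPONENT_LABELS = {"Muscle": 1, "IMAT": 2, "VAT": 3, "SAT": 4, "Bone": 5}
```

Because the tag values decrease from head to sacrum, a larger value always denotes a more cranial vertebra.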
[0085] S13. Organize the labeled data and divide it into training set, validation set and test set in a ratio of 7:1:2.
[0086] S2. Perform anonymization, standardization, and resampling preprocessing on the data.
[0087] Furthermore, step S2 specifically includes the following:
[0088] S21. Anonymize all patient-related information in all CT image data and convert the original DICOM format data to NII format data.
[0089] S22. For the two tasks of spinal segmentation and body component segmentation, read the grayscale values of the CT images corresponding to all labeled data locations for each task, and calculate the mean, variance, 5th percentile, and 95th percentile of the grayscale values. After the statistics are completed, truncate the grayscale values of the CT image data, with the minimum value being the 5th percentile and the maximum value being the 95th percentile. Standardization is achieved by subtracting the calculated mean and dividing by the variance.
[0090] S23. Obtain the resolution of each CT data, calculate its mean, and resample the standardized image data and its corresponding labeled data in S22 to the resolution corresponding to the mean, and store the preprocessing results.
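The intensity statistics and standardization of step S22 can be sketched as follows. All names are hypothetical, and voxel values are assumed to be flat lists of numbers. One point is an explicit assumption: the text says to divide by the variance, while the sketch divides by the standard deviation by default (the usual z-score form) and offers `divide_by_variance=True` for the literal reading. Resampling (S23) is not covered.

```python
import statistics


def percentile(sorted_vals, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def intensity_stats(foreground):
    """Statistics over the grayscale values at labeled voxel positions."""
    vals = sorted(foreground)
    return {
        "mean": statistics.fmean(vals),
        "variance": statistics.pvariance(vals),
        "p05": percentile(vals, 5),
        "p95": percentile(vals, 95),
    }


def normalize(voxels, stats, divide_by_variance=False):
    """Clip to the [5th, 95th] percentile range, then standardize."""
    scale = stats["variance"] if divide_by_variance else stats["variance"] ** 0.5
    clipped = [min(max(v, stats["p05"]), stats["p95"]) for v in voxels]
    return [(v - stats["mean"]) / scale for v in clipped]
```

Computing the statistics only over labeled positions keeps air and table voxels from dominating the mean and percentiles.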
[0091] S3. Build a spine segmentation model based on CT images, train the network using training set data, and obtain the maximum, minimum and center values of each spine position by extracting the maximum connected component.
[0092] Furthermore, as shown in Figure 2, step S3 specifically includes the following:
[0093] S31. Build a spine segmentation model based on U-Net and add an attention module to the skip connection part of U-Net so that the network can learn the overall information of the spine better and obtain better segmentation results.
[0094] In one specific embodiment, the encoder of the spine segmentation model consists of 5 layers, each containing 2 convolutional blocks. Each convolutional block is composed of a 3D convolution, instance normalization, and a LeakyReLU activation function. The numbers of channels in the five encoder layers are 32, 64, 128, 256, and 320, respectively. All 3D convolutions have 3×3×3 kernels. The stride of the first convolutional block in each of the five layers is 1×1×1, 2×2×2, 2×2×2, 2×2×2, and 1×2×2, respectively, and the stride of the second convolutional block is 1×1×1. The decoder of the spine segmentation model also consists of 5 layers. The bottom layer is a single 3D transposed convolution with 320 channels, a 1×2×2 kernel size, and a 1×2×2 stride. The other four layers each consist of two convolutional blocks and one 3D transposed convolution, with 256, 128, 64, and 32 channels, respectively. In both convolutional blocks the kernel size is 3×3×3 and the stride is 1×1×1; in the 3D transposed convolution the kernel size is 2×2×2 and the stride is 2×2×2. The decoder achieves upsampling through transposed convolution. At the end of the network, the Sigmoid activation function is used to output the result. The basic U-Net architecture passes the encoder's feature maps to the decoder via skip connections, enabling the decoder to learn low-level semantic features from the encoder. Based on this, to enable the network to better learn the mutual information between vertebrae and achieve better segmentation results, this invention adds an attention module to the skip connection part. A schematic diagram of this attention module is shown in Figure 3. When the feature map encoded by the encoder passes through this attention module, it first undergoes a 3D convolution with a kernel size of 3×3×3 and a stride of 1×1×1, followed by non-linear activation using the PReLU activation function, and then another 3D convolution with a stride of 1×1×1. Let the new feature map after the convolutional layers be $F \in \mathbb{R}^{C \times D \times H \times W}$, where H corresponds to the axis along which the vertebrae are related to each other. The attention mechanism obtains global information through strip average pooling, and its formula can be expressed as:
[0095]
[0096] where $x_c(i,j,k)$ represents the value of the c-th feature layer of F at position (i,j,k). Through this attention mechanism, the network gains a larger field of view along H, thereby learning the relationships between the vertebrae. After strip average pooling, to make the output the same size as the original feature map, bilinear interpolation U(·) is used to upsample the pooling result to the original C×D×H×W size. Finally, the attention result $A \in \mathbb{R}^{C \times D \times H \times W}$ is weighted and superimposed with the original feature map to form $F_{out}$, which is then sent to the decoder for decoding.
[0097] $F_{out} = F + \varepsilon \times A$
[0098] Here, ε is a hyperparameter of the network, used to control the proportion of the attention mechanism output.
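The pooling, upsampling, and weighted superposition steps can be illustrated with a small pure-Python sketch. It rests on several assumptions: the pooling formula is not reproduced in this text, so the sketch assumes the strip average is taken along the H axis for each (c, d, w) position; the two convolutions and the PReLU activation are omitted; and the function name and nested-list layout are hypothetical. Upsampling a pooled axis of length 1 back to length H by interpolation simply repeats the pooled value, which is what the sketch does.

```python
def strip_attention(feat, eps=0.1):
    """Return F + eps * A for a feature map stored as nested lists [C][D][H][W].

    A is the strip average of F along H, repeated back along H.
    """
    out = []
    for channel in feat:
        out_channel = []
        for plane in channel:                  # plane has shape [H][W]
            h, w = len(plane), len(plane[0])
            # strip average pooling: one value per column, averaged over H
            strip = [sum(plane[j][k] for j in range(h)) / h for k in range(w)]
            # repeat the pooled strip along H and superimpose it on F
            out_channel.append(
                [[plane[j][k] + eps * strip[k] for k in range(w)] for j in range(h)]
            )
        out.append(out_channel)
    return out
```

With `eps = 0` the module reduces to an ordinary skip connection, which makes the contribution of the attention branch easy to ablate.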
[0099] S32. After the architecture is built, design the loss function for training the spine segmentation model; this spine segmentation model uses the Dice loss function and the cross-entropy loss function. The Dice loss function is expressed by the following formula:
[0100]
[0101] where $N$ represents the total number of pixels, $y_{true}$ represents the true segmentation label, and $y_{pred}$ represents the predicted label. The cross-entropy loss function is:
[0102]
[0103] Therefore, the total loss function is:
[0104] $L_{total} = L_{dice} + \lambda L_{ce}$
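As an illustration of the combined loss, the sketch below assumes the widely used soft Dice form and a binary cross-entropy on sigmoid outputs; the exact expressions used in this embodiment may differ. The smoothing constants, the function names, and the default λ = 1 are assumptions made for the example.

```python
import numpy as np


def dice_loss(y_true, y_pred, smooth=1e-6):
    """Soft Dice loss (assumed form): 1 - 2*sum(t*p) / (sum(t) + sum(p))."""
    intersection = np.sum(y_true * y_pred)
    denominator = np.sum(y_true) + np.sum(y_pred)
    return float(1.0 - (2.0 * intersection + smooth) / (denominator + smooth))


def cross_entropy_loss(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy averaged over all N pixels (assumed form)."""
    p = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))


def total_loss(y_true, y_pred, lam=1.0):
    """L_total = L_dice + lambda * L_ce."""
    return dice_loss(y_true, y_pred) + lam * cross_entropy_loss(y_true, y_pred)
```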
[0105] Furthermore, to enable the network to obtain better semantic information at each feature layer, deep supervision is employed during training. Specifically, an output convolutional layer and a sigmoid layer are added to each layer of the network decoder. The output of each layer is interpolated to the original image size using nearest-neighbor interpolation. The loss function is applied to each layer's output and multiplied by the corresponding coefficient. The coefficients of the five decoder layers, from the bottom layer to the output layer, are set to 0.1, 0.25, 0.5, 0.75, and 1.
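The deep supervision scheme can be sketched as a weighted sum of per-layer losses. The example below is only a sketch: `nearest_resize` is a simple index-based nearest-neighbor resampler written for this illustration, `loss_fn` is any callable taking `(target, prediction)`, and both names are hypothetical.

```python
import numpy as np

# Coefficients from the bottom decoder layer to the output layer.
DS_WEIGHTS = (0.1, 0.25, 0.5, 0.75, 1.0)


def nearest_resize(vol, out_shape):
    """Resize an N-D array to out_shape with nearest-neighbor sampling."""
    index = [
        np.minimum((np.arange(o) * s / o).astype(int), s - 1)
        for o, s in zip(out_shape, vol.shape)
    ]
    return vol[np.ix_(*index)]


def deep_supervision_loss(preds, target, loss_fn, weights=DS_WEIGHTS):
    """Sum the weighted losses of all decoder outputs against the target."""
    if len(preds) != len(weights):
        raise ValueError("one weight is required per decoder output")
    total = 0.0
    for pred, weight in zip(preds, weights):
        # Bring each decoder output to the original image size first.
        resized = nearest_resize(pred, target.shape)
        total += weight * loss_fn(target, resized)
    return total
```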
[0106] S33. After the spine segmentation model outputs its results, it is first necessary to obtain the most complete layer corresponding to each vertebra for subsequent localization.
[0107] Furthermore, step S33 specifically includes:
[0108] First, for each layer of the sagittal plane of the CT image, count the pixels whose value equals the label of the vertebra to be located. Then divide this count by the total number of pixels of that vertebra in the entire CT image to obtain the normalized result for each layer. Next, compute the weighted sum of the normalized results and the layer index values. Finally, round the result to the nearest integer to obtain the most complete layer.
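The layer selection in step S33 amounts to a count-weighted average of layer indices. A minimal sketch follows, assuming the segmentation is a 3D integer label array and that the sagittal layers are indexed along `axis` (the default of 2 is an arbitrary assumption about the array layout); the function name is hypothetical.

```python
import numpy as np


def most_complete_layer(seg, label, axis=2):
    """Return the index of the most complete sagittal layer for one vertebra."""
    mask = seg == label
    other_axes = tuple(a for a in range(seg.ndim) if a != axis)
    # Number of pixels of this vertebra in each layer.
    counts = mask.sum(axis=other_axes).astype(float)
    total = counts.sum()
    if total == 0:
        return None  # the vertebra is absent from the scan
    # Normalize per layer, then take the weighted sum with the layer indices.
    normalized = counts / total
    weighted_index = float(np.dot(normalized, np.arange(counts.size)))
    # Round half up to the nearest integer layer.
    return int(np.floor(weighted_index + 0.5))
```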
[0109] S34. After obtaining the most complete slice for each vertebra, since the main vertebral body and the spinous process are separated in the CT image and the presence of the spinous process affects localization, it is first necessary to obtain the connected component of the main vertebral body. First, convert the most complete slice for each vertebra into a binary image in which the segmented vertebra region has a value of 1 and the rest has a value of 0. Scan the binary image starting from the top-left corner. When a foreground pixel is encountered, mark it as a connected component and check whether there are foreground pixels in its four-connected neighborhood. If there are, mark them as belonging to the same connected component; otherwise, the component is finished. When a new, unmarked foreground pixel is encountered, mark a new connected component. Once all points have been traversed, the two largest connected components can be obtained: the connected component of the main vertebral body and the connected component of the spinous process. Based on the characteristics of human anatomy, the connected component on the left is taken as that of the main vertebral body.
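The scan described in step S34 can be sketched as a breadth-first four-connected labeling. This is an illustrative sketch rather than the embodiment's code: it assumes that "left" corresponds to smaller column indices in the binary image, and the function names are hypothetical.

```python
from collections import deque

import numpy as np


def label_components(binary):
    """Label 4-connected foreground regions, scanning from the top-left."""
    rows, cols = binary.shape
    labels = np.zeros((rows, cols), dtype=int)
    current = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r, c] and labels[r, c] == 0:
                current += 1  # a new, unmarked foreground pixel
                labels[r, c] = current
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels, current


def main_vertebral_body(binary):
    """Return the mask of the left one of the two largest components."""
    labels, count = label_components(binary)
    if count == 0:
        return None
    sizes = np.bincount(labels.ravel())[1:]        # sizes of labels 1..count
    largest = np.argsort(sizes)[::-1][:2] + 1      # up to two largest labels
    # Assumption: the left component has the smaller mean column index.
    left = min(largest, key=lambda k: np.nonzero(labels == k)[1].mean())
    return labels == left
```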
[0110] S35. Based on the connected component result from step S34, obtain the minimum and maximum ordinate values of the connected component, as well as the corresponding centroid coordinate value. These are the uppermost, lowermost, and middle layers of the vertebral body required for spinal localization. Perform bilinear interpolation on all corresponding layers of the vertebral bodies to obtain a new image and the visualization results.
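Reading the three layers off the connected component can be sketched as follows, assuming the ordinate is the row index of the binary sagittal slice and that the middle layer is the rounded row coordinate of the centroid; the function name is hypothetical.

```python
import numpy as np


def vertebra_layers(mask):
    """Return (uppermost, middle, lowermost) row indices of a component mask."""
    rows = np.nonzero(mask)[0]
    if rows.size == 0:
        raise ValueError("empty connected component")
    uppermost = int(rows.min())                       # minimum ordinate
    lowermost = int(rows.max())                       # maximum ordinate
    middle = int(np.floor(rows.mean() + 0.5))         # centroid row, rounded
    return uppermost, middle, lowermost
```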
[0111] S4. Construct a body composition segmentation model based on CT images and train the network using the training set data.
[0112] Furthermore, step S4 specifically includes the following:
[0113] S41. Construct the same basic U-Net framework as in step S31 (without adding an attention module to the skip connection part), and use the loss function and deep supervision method from step S32. Train the network with the expert-segmented body composition results. The segmentation results are visualized in Figure 4: the body composition segmentation model is basically consistent with the expert annotations for subcutaneous fat, skeletal muscle, visceral fat, and bone, with only some error in intramuscular fat.
[0114] S42. Combine the spinal localization results with the body composition segmentation to obtain the segmentation results for the corresponding volume.
[0115] Furthermore, the calculation ranges are preset as follows: (1) Whole volume: the body composition segmentation results are calculated directly from the first layer to the last layer of the CT volume. (2) Lung-related (e.g., body composition analysis for lung cancer): the body composition segmentation results are calculated over the range from the uppermost layer of the first thoracic vertebra (T1) to the lowermost layer of the twelfth thoracic vertebra (T12). (3) Abdominal and pelvic region (e.g., body composition analysis for colorectal cancer): the body composition segmentation results are calculated over the range from the uppermost layer of the first lumbar vertebra (L1) to the lowermost layer of the sacrum (S). (4) Specific visceral organs: the user inputs two specific vertebrae according to the anatomical situation, and the system directly obtains the uppermost and lowermost layers from the localization results and calculates the body composition segmentation results over that range. Since the user does not need to select a specific layer directly, human error caused by subjective differences between users is avoided. (5) Specific single vertebral layer: the user inputs a specific vertebra, and the system directly obtains the middle layer of that vertebra and calculates the body composition segmentation results for that layer, which can be compared with the existing single-layer calculation method.
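The five presets can be sketched as a lookup over the localization results. The sketch assumes a dictionary `loc` that maps a vertebra name to its `(uppermost, middle, lowermost)` layer indices with indices increasing from head to foot; the preset keys and the function name are hypothetical.

```python
def select_layer_range(loc, n_layers, preset, vertebrae=None):
    """Return the inclusive (first, last) layer range for a preset."""
    if preset == "whole":
        return 0, n_layers - 1
    if preset == "lung":
        return loc["T1"][0], loc["T12"][2]
    if preset == "abdomen_pelvis":
        return loc["L1"][0], loc["S"][2]
    if preset == "organ":
        # Two user-chosen vertebrae bound the range.
        first, second = vertebrae
        top = min(loc[first][0], loc[second][0])
        bottom = max(loc[first][2], loc[second][2])
        return top, bottom
    if preset == "single":
        # Middle layer of one user-chosen vertebra.
        middle = loc[vertebrae[0]][1]
        return middle, middle
    raise ValueError(f"unknown preset: {preset}")
```

Because the user names vertebrae rather than layer numbers, two users asking for the same range obtain the same layers.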
[0116] S5. Calculate the relevant indicators and interrelationships of human body composition based on the spinal positioning results and body composition segmentation results.
[0117] S51. Select the corresponding voxel range based on the spinal location and task, and calculate the basic volume parameters of body composition analysis based on the body composition segmentation results: skeletal muscle volume (SMV), subcutaneous fat volume (SFV), intramuscular fat volume (IMFV), visceral fat volume (VFV), and bone volume (BV).
[0118] In one specific embodiment, the calculation method is as follows. First, the resolution information of the CT image is read, which includes the X-axis resolution $S_x$, the Y-axis resolution $S_y$, and the Z-axis layer thickness $S_z$. The actual size of each voxel can then be expressed as the product of the three: $V = S_x \times S_y \times S_z$. After calculating the actual voxel size, it is only necessary to count the total number of voxels occupied by the corresponding body composition segment and multiply it by the actual voxel size to obtain the volume of each body component. At the same time, the average muscle radiation density (AMD) is calculated by reading the gray values of the CT image at the points corresponding to muscle and averaging them.
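The volume and AMD calculation can be sketched directly from the description, assuming the segmentation is an integer label array aligned with the CT array and that `spacing` holds $(S_x, S_y, S_z)$; the function names and label values are hypothetical.

```python
import numpy as np


def component_volume(seg, label, spacing):
    """Volume of one body component: voxel count times V = Sx * Sy * Sz."""
    voxel_size = spacing[0] * spacing[1] * spacing[2]
    return float(np.count_nonzero(seg == label) * voxel_size)


def average_muscle_density(ct, seg, muscle_label):
    """Mean CT gray value over all voxels labeled as muscle."""
    return float(ct[seg == muscle_label].mean())
```

If the spacing is given in millimeters, the result is in cubic millimeters, so dividing by 1000 converts it to cubic centimeters.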
[0119] S52. Calculate the comprehensive parameters based on the segmentation results. Skeletal muscle content: Skeletal muscle content = Skeletal muscle volume × Average muscle radiation density; Total fat content: Total fat content = Skeletal muscle volume × Average muscle radiation density.
[0120] S53. Calculate the bone standardization results based on the segmentation results and perform body composition analysis: skeletal muscle-bone volume ratio = skeletal muscle volume / bone volume, subcutaneous fat-bone volume ratio = subcutaneous fat volume / bone volume, intramuscular fat-bone volume ratio = intramuscular fat volume / bone volume, visceral fat-bone volume ratio = visceral fat volume / bone volume.
[0121] S54. Calculate the interrelationships between soft tissues based on the segmentation results to analyze their distribution and balance: subcutaneous fat-skeletal muscle volume ratio = subcutaneous fat volume / skeletal muscle volume, skeletal muscle fat volume fraction = intermuscular fat volume / skeletal muscle volume, visceral fat-skeletal muscle volume ratio = visceral fat volume / skeletal muscle volume, visceral fat-subcutaneous fat volume ratio = visceral fat volume / subcutaneous fat volume, intermuscular fat-subcutaneous fat volume ratio = intermuscular fat volume / skeletal muscle volume, intermuscular fat-visceral fat volume ratio = intermuscular fat volume / visceral fat volume.
[0122] S55. When clinical data on patient height is available, the system supports inputting the patient's height for height-standardized body composition analysis. Based on the segmentation results, when the user inputs the height, body composition can be standardized for height: skeletal muscle volume index = skeletal muscle volume / height$^2$, subcutaneous fat volume index = subcutaneous fat volume / height$^2$, intramuscular fat volume index = intramuscular fat volume / height$^2$, visceral fat volume index = visceral fat volume / height$^2$, bone volume index = bone volume / height$^2$.
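The bone-standardized ratios of step S53 and the height-standardized indices of step S55 can be sketched as simple dictionary transforms. The sketch covers only these two steps; the abbreviations follow step S51, while the key suffixes and function names are hypothetical.

```python
def bone_standardized_ratios(volumes):
    """Divide each soft-tissue volume by the bone volume (BV)."""
    bone = volumes["BV"]
    return {
        f"{name}/BV": value / bone
        for name, value in volumes.items()
        if name != "BV"
    }


def height_indices(volumes, height):
    """Divide every volume by the square of the patient's height."""
    return {f"{name}I": value / height ** 2 for name, value in volumes.items()}
```

For example, with `volumes = {"SMV": 200.0, "SFV": 100.0, "IMFV": 10.0, "VFV": 50.0, "BV": 40.0}`, the skeletal muscle-bone volume ratio is 200 / 40 = 5.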
[0123] S56. Print out the above results as the corresponding body composition analysis report for easy evaluation by clinicians.
[0124] This invention implements a CT image-based body composition analysis technology. By further utilizing and annotating the data, and through automatic segmentation and localization of the spine, it achieves fully automated three-dimensional body composition analysis, reducing human error. Furthermore, by providing an interface, it offers clinicians more options for body composition calculation. Finally, by performing more detailed modeling of the relationships between body components, it facilitates further evaluation by clinicians.
[0125] As shown in Figure 5, in another embodiment of this application, a three-dimensional fully automated human body composition analysis system 100 based on CT images is provided. The system includes a data acquisition module 101, a preprocessing module 102, a first model construction module 103, a first model training module 104, a second model construction module 105, and a segmentation module 106.
[0126] The data acquisition module 101 is used to acquire CT image data and annotate the CT image data;
[0127] The preprocessing module 102 is used to preprocess the labeled CT image data;
[0128] The first model construction module 103 is used to construct a spine segmentation model. It uses the maximum connected component method to obtain the minimum, maximum, and center values of each vertebra's position. The minimum, maximum, and center values are the uppermost, lowermost, and middle layers of the vertebral body required for spine localization. Bilinear interpolation is performed on all corresponding layers of the vertebral body to obtain a new image and visualization results. The spine segmentation model adds an attention module to the skip connection part of the preset U-Net network. The attention module is used to learn the overall information of the spine so that the spine segmentation model can obtain better segmentation results.
[0129] The first model training module 104 is used to train the spine segmentation model based on a loss function until the spine segmentation model converges; the loss function is the sum of the Dice loss function and the cross-entropy loss function;
[0130] The second model construction module 105 is used to construct a body composition segmentation model and train it until convergence using the same loss function and deep supervision method as the spine segmentation model.
[0131] The segmentation module 106 is used to obtain the spinal positioning result and body composition segmentation result based on the trained spinal segmentation model, and to calculate the relevant indicators and interrelationships of human body composition based on the spinal positioning result and body composition segmentation result.
[0132] It should be noted that the CT image-based three-dimensional fully automated human body composition analysis system of the present invention corresponds one-to-one with the CT image-based three-dimensional fully automated human body composition analysis method of the present invention. The technical features and beneficial effects described in the above embodiments of the method are equally applicable to the embodiments of the system. For details, please refer to the description in the method embodiments of the present invention, which will not be repeated here.
[0133] Furthermore, in the embodiments of the CT image-based three-dimensional fully automated human body composition analysis system described above, the logical division of each program module is merely illustrative. In actual applications, the above functions can be assigned to different program modules as needed, for example, for the sake of corresponding hardware configuration requirements or the convenience of software implementation. That is, the internal structure of the CT image-based three-dimensional fully automated human body composition analysis system can be divided into different program modules to complete all or part of the functions described above.
[0134] This application also provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program can implement the steps and corresponding content of the aforementioned method embodiments, specifically:
[0135] Acquire CT image data and annotate the CT image data;
[0136] Preprocess the labeled CT image data;
[0137] A spinal segmentation model is constructed, and the minimum, maximum, and center values of each vertebra's position are obtained using the maximum connected component method. These minimum, maximum, and center values represent the uppermost, lowermost, and middle layers of the vertebral body required for spinal localization. Bilinear interpolation is performed on all corresponding layers of the vertebral body to obtain a new image and visualization results. The spinal segmentation model incorporates an attention module into the skip connection part of a pre-defined U-Net network. This attention module is used to learn the overall information of the spine, thereby improving the segmentation performance of the spinal segmentation model.
[0138] The spine segmentation model is trained based on the loss function until the spine segmentation model converges; the loss function is the sum of the Dice loss function and the cross-entropy loss function.
[0139] Construct a body composition segmentation model and train it using the same loss function and deep supervision method as the spine segmentation model until convergence;
[0140] Based on the trained spine segmentation model and body composition segmentation model, spine localization results and body composition segmentation results are obtained. Based on the spine localization results and body composition segmentation results, relevant indicators and interrelationships of human body composition are calculated. Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a non-volatile computer-readable storage medium. When executed, the program can include the processes of the embodiments described above. Any references to memory, storage, databases, or other media used in the embodiments provided in this application can include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
[0141] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0142] The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above embodiments. Any changes, modifications, substitutions, combinations, or simplifications made without departing from the spirit and principle of the present invention shall be considered equivalent substitutions and shall be included within the protection scope of the present invention.
Claims
1. A three-dimensional fully automated body composition analysis method based on CT images, characterized in that the method comprises the following steps: collecting CT image data and labeling the CT image data; preprocessing the labeled CT image data; constructing a spine segmentation model, and obtaining the minimum value, maximum value and center value of the position of each spine by using a maximum connected domain method, wherein the minimum value, maximum value and center value are the uppermost layer, lowermost layer and middle layer of the vertebral body required for spine positioning, and a new image is obtained by performing bilinear interpolation on all vertebral body corresponding layers to obtain a visualization result; an attention module is added to the skip connection part of the preset U-Net network, and the attention module is used to learn the overall information of the spine, so that the spine segmentation model obtains a better segmentation effect; the spine segmentation model is constructed based on the U-Net network, and the attention module is added to the skip connection part of the U-Net network; when the feature map encoded by the encoder passes through the attention module, a three-dimensional convolution is first performed with a convolution kernel of a specified size, then a nonlinear activation is performed through a PReLU activation function, and then a three-dimensional convolution is performed again; the attention module obtains global information through strip average pooling; after the strip average pooling, the pooling result is up-sampled through bilinear interpolation; finally, the result of the attention module is weighted and superimposed with the original feature map, and the weighted and superimposed result is sent to the decoder for decoding; training the spine segmentation model based on a loss function until the spine segmentation model converges, the loss function being the sum of a Dice loss function and a cross-entropy loss function; constructing a body composition segmentation model, and training the body composition segmentation model by using the same loss function and deep supervision method as the spine segmentation model until the body composition segmentation model converges; based on the trained spine segmentation model and the body composition segmentation model, obtaining a spine positioning result and a body composition segmentation result, and calculating human body composition related indexes and mutual relationships based on the spine positioning result and the body composition segmentation result; wherein the minimum value, maximum value and center value of the position of each spine are obtained by using the maximum connected domain method as follows: after the spine segmentation model outputs a result, first, for each layer on the sagittal plane of the CT image data, the sum of all pixel points equal to the label value of the spine to be positioned is calculated, then the normalized result of each layer is obtained by dividing by the sum of the pixel points of the spine in the entire CT image, and then the normalized result is weighted and summed with the index value of each layer to obtain the most complete layer; after obtaining the most complete layer corresponding to each spine, the most complete layer corresponding to each spine is converted into a binary image, in which the segmented region of the spine is 1 and the remaining regions are 0; the binary image is scanned from the top left corner; when a foreground pixel is encountered, it is marked as a connected domain, and it is checked whether there is foreground in its four-connected neighborhood; if yes, that foreground is marked as the same connected domain; if no, the connected domain is marked as ended; when a new foreground pixel is encountered, a new connected domain is marked; when all points are traversed, two maximum connected domains can be obtained: the connected domain of the main vertebral body and the connected domain of the spinous process; according to the anatomical structure characteristics of the human body, the left connected domain is the connected domain of the main vertebral body; according to the connected domain result, the minimum and maximum longitudinal coordinate values of the connected domain and the corresponding centroid coordinate value are obtained, that is, the uppermost layer, the lowermost layer and the middle layer of the vertebral body required for the spine positioning.
2. The three-dimensional fully automated body composition analysis method based on CT images according to claim 1, characterized by, The CT image data is labeled, specifically: The CT image data is labeled for spine segmentation and three-dimensional labeling of body composition data; the spine segmentation labeling includes labeling of cervical vertebrae, thoracic vertebrae, lumbar vertebrae and sacrum; the three-dimensional labeling of body composition data includes labeling of skeletal muscle, intermuscular fat, visceral fat, subcutaneous fat and bone.
3. The method of claim 1, wherein the CT image-based three-dimensional full-automatic human body composition analysis method is characterized by, The preprocessing includes anonymization, standardization and resampling, specifically: The information related to the patient in the CT image data is anonymized, and the original DICOM format data is converted to NII format data; The image gray value of the corresponding position of the labeled CT image data is read, and the mean, variance and quantile of the gray value are counted. After counting, the gray value is truncated, and standardization is achieved by subtracting the counted mean and dividing by the variance; The image data after standardization and its corresponding labeled data are resampled to the resolution corresponding to the mean, and the preprocessing result is stored.
4. The fully automated three-dimensional body composition analysis method based on CT images according to claim 1, wherein the Dice loss function is expressed by the following equation: [equation not reproduced]; where $N$ represents the total number of pixels, $y_{true}$ indicates the true segmentation label, $y_{pred}$ indicates the predicted label, and $i$ indicates the $i$-th pixel; the cross-entropy loss function is: [equation not reproduced]; thus, the total loss function is: $L_{total} = L_{dice} + \lambda L_{ce}$.
5. The fully automated three-dimensional body composition analysis method based on CT images according to claim 1, wherein, after obtaining the spine positioning result and the body composition segmentation result, the spine positioning result is combined with the body composition segmentation to obtain the segmentation result of the corresponding volume, and the segmentation result includes: whole volume result: from the first layer to the last layer of the CT, the body composition segmentation result of the CT volume is calculated; lung-related: from the uppermost layer of the first thoracic vertebra to the lowermost layer of the twelfth thoracic vertebra, the body composition segmentation result in the range is calculated; abdominal and pelvic part: from the uppermost layer of the first lumbar vertebra to the lowermost layer of the sacrum, the body composition segmentation result in the range is calculated; specific internal organs: according to the anatomical situation, two specific vertebrae are input, the uppermost layer and the lowermost layer are directly obtained according to the positioning result, and the body composition segmentation result in the range is calculated; specific spine single layer: a specific spine is input, the middle layer of the spine is directly obtained, and the body composition segmentation result in the range is calculated, which can be compared with the existing single layer calculation method.
6. The fully automated three-dimensional body composition analysis method based on CT images according to claim 1, wherein the human body composition related indexes and their relationships are calculated based on the spine positioning result and the body composition segmentation result, specifically: according to the spine positioning and task selection, the corresponding voxel range is selected, and the body composition analysis basic volume parameters are calculated according to the body composition segmentation result: skeletal muscle volume, subcutaneous fat volume, intermuscular fat volume, visceral fat volume and bone volume; according to the segmentation result, the comprehensive parameters are calculated, skeletal muscle content: skeletal muscle content = skeletal muscle volume × average muscle radiation density, total fat content: total fat content = skeletal muscle volume × average muscle radiation density; according to the segmentation result, the bone quality standardization result is calculated, and the body composition analysis is performed: skeletal muscle-skeletal volume ratio = skeletal muscle volume / skeletal volume, subcutaneous fat-skeletal volume ratio = subcutaneous fat volume / skeletal volume, intermuscular fat-skeletal volume ratio = intermuscular fat volume / skeletal volume, visceral fat-skeletal volume ratio = visceral fat volume / skeletal volume; according to the segmentation result, the mutual relationship between soft tissues is calculated to analyze the distribution and balance relationship: subcutaneous fat-skeletal muscle volume ratio = subcutaneous fat volume / skeletal muscle volume, skeletal muscle fat volume fraction = intermuscular fat volume / skeletal muscle volume, visceral fat-skeletal muscle volume ratio = visceral fat volume / skeletal muscle volume, visceral fat-subcutaneous fat volume ratio = visceral fat volume / subcutaneous fat volume, intermuscular fat-subcutaneous fat volume ratio = intermuscular fat volume / skeletal muscle volume, intermuscular fat-visceral fat volume ratio = intermuscular fat volume / visceral fat volume; when the clinical data of patient height can be acquired, the patient height is input and height-standardized body composition analysis is performed; that is, according to the segmentation result, when the user inputs the height, the body composition can be height-standardized: skeletal muscle volume index = skeletal muscle volume / height$^2$, subcutaneous fat volume index = subcutaneous fat volume / height$^2$, intermuscular fat volume index = intermuscular fat volume / height$^2$, visceral fat volume index = visceral fat volume / height$^2$, bone volume index = bone volume / height$^2$.
7. A three-dimensional fully automated body composition analysis system based on CT images, characterized in that the system is applied to the CT image-based three-dimensional full-automatic human body composition analysis method according to any one of claims 1-6 and comprises a data acquisition module, a preprocessing module, a first model construction module, a first model training module, a second model construction module, and a segmentation module. The data acquisition module is configured to acquire CT image data and label the CT image data. The preprocessing module is configured to preprocess the labeled CT image data. The first model construction module is configured to construct a spine segmentation model, obtain a minimum value, a maximum value, and a center value of the position of each spine by using a maximum connected domain method, the minimum value, the maximum value, and the center value being the uppermost layer, the lowermost layer, and the middle layer of the vertebral body required for spine positioning, perform bilinear interpolation on all vertebral body corresponding layers to obtain a new image, and obtain a visualization result; the spine segmentation model is obtained by adding an attention module to a skip connection part of a preset U-Net network, and the attention module is configured to learn overall information of the spine, so that the spine segmentation model has a better segmentation effect. The first model training module is configured to train the spine segmentation model based on a loss function until the spine segmentation model converges; the loss function is a sum of a Dice loss function and a cross-entropy loss function. The second model construction module is configured to construct a body composition segmentation model, and train the body composition segmentation model by using the same loss function and deep supervision method as the spine segmentation model until the body composition segmentation model converges.
The segmentation module is configured to obtain spine positioning results and body composition segmentation results based on the trained spine segmentation model, and calculate human body composition related indexes and mutual relationships based on the spine positioning results and the body composition segmentation results.
8. A computer-readable storage medium storing a program, characterized in that, The program, when executed by a processor, implements the CT image-based three-dimensional full-automatic human body composition analysis method according to any one of claims 1-6.