Lung nodule recognition model based on attribute privilege and capsule network and application thereof
By combining attribute privilege and a capsule network in a lung nodule identification model, the invention addresses the failure of existing techniques to make effective use of lung nodule attribute information and spatial relationships, thereby improving the accuracy and efficiency of lung nodule identification.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: HANGZHOU DIANZI UNIV
- Filing Date: 2023-09-28
- Publication Date: 2026-06-26

Figure CN117253048B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of medical image processing and relates to a lung nodule recognition model based on attribute privilege and a capsule network, and to its application.
Background Technology
[0002] Lung cancer is one of the deadliest cancers. Examining the lungs, detecting lung nodules early, and diagnosing whether they are benign or malignant can significantly improve the survival rate of lung cancer patients. With the development of medical imaging technology, radiologists can examine a patient's lungs using CT scans. However, because a CT scan contains a large number of images, examining them manually is extremely time-consuming and labor-intensive. Furthermore, radiologists' assessments are often based on personal experience, so they can lack objectivity. Computer-aided diagnosis systems therefore allow doctors to make a rapid preliminary diagnosis of a patient's lung condition, improving diagnostic efficiency.
[0003] In the early stages, researchers mostly chose to manually extract features from lung nodule images and use these features to identify the benign or malignant nature of the nodules. With the development of deep learning in image analysis, researchers are now using deep learning techniques to determine the benign or malignant nature of lung nodules. Deep learning can automatically extract lung nodule features from images to identify their benign or malignant nature, achieving end-to-end classification without the need for manual feature extraction, thus avoiding the complexity and tediousness of manual feature extraction.
[0004] In recent years, numerous algorithms using deep learning to identify lung nodules have emerged. Two-dimensional images of lung nodules are easy to obtain and require little data, leading many researchers to explore their use in assessing the benign or malignant nature of lung nodules. Xie et al. classified lung nodules by extracting two-dimensional cross-sections from different perspectives and combining semi-supervised and adversarial networks. Sahu et al. extracted cross-sectional images of lung nodules from multiple perspectives, used multiple lightweight networks to extract corresponding features, and fused features from different perspectives to assess the benign or malignant nature of lung nodules. This method can also be used to select representative cross-sections to determine the degree of malignancy, facilitating result interpretation. Liu et al. proposed a deep network model called Res-trans to identify the benign or malignant nature of lung nodules. A Transformer structure was added to the network to capture global features of lung nodules, and a sequence fusion module was designed to process the feature information of the Transformer structure, improving classification accuracy. Although using two-dimensional images of lung nodules to identify benign or malignant nodules yields good results, two-dimensional images cannot represent the three-dimensional spatial information of lung nodules.
[0005] Three-dimensional images of lung nodules contain the complete three-dimensional spatial information of the nodules, and deep learning methods that use them to determine benignity or malignancy can exploit this information to improve diagnostic performance. Jiang et al. used three-dimensional images of lung nodules to assess their benignity or malignancy; they designed contextual and spatial attention mechanisms in the recognition model to improve feature extraction and used an ensemble strategy to improve classification performance. Xu et al. proposed a method called MSCS-DeepLN to assess the malignancy of lung nodules. They extracted images at three different sizes from CT scans and fed each into a corresponding lightweight network; each network produced a preliminary recognition result, and these results were fused to obtain the final benign or malignant result. Huang et al. extracted three-dimensional images of lung nodules at two different sizes and input them into the network. They extracted intra- and extra-intra ...
Summary of the Invention
[0006] To address the problems that most current lung nodule identification methods do not use lung nodule attribute information and that CNN structures are insensitive to the relative spatial relationships between different parts of a lung nodule image, this invention proposes a lung nodule identification model based on attribute privilege and a capsule network, and its application.
[0007] One aspect of the present invention provides a lung nodule identification model based on attribute privilege and capsule network, including a Res2net network module, a convolution module, a channel and spatial attention module, a first identification performance enhancement module and a second identification performance enhancement module;
[0008] The Res2net network module is used to obtain multiple receptive field outputs of the input features.
[0009] The convolutional module is connected to the output of the Res2net network module and is used for feature extraction of lung nodules.
[0010] The channel and spatial attention module is connected to the output of the convolution module and is used to assign different weights to features of different channels and different spaces in lung nodule features.
[0011] The first recognition performance enhancement module is connected to the output of the channel and spatial attention module and is used to add lung nodule attribute information during training, improving the model's accuracy in recognizing benign and malignant nodules.
[0012] The second recognition performance enhancement module is connected to the output of the channel and spatial attention module. It adds a capsule structure to the recognition model, helping the model recognize the relative spatial relationships between different parts of a lung nodule image and improving its performance in judging whether a lung nodule is benign or malignant.
[0013] The outputs of the first and second identification performance enhancement modules are connected to a fully connected network, which outputs the benign or malignant status of the lung nodules.
[0014] Another aspect of the present invention provides the application of the above-described lung nodule identification model in lung nodule identification.
[0015] The present invention has the following advantages:
[0016] This invention adopts a privileged learning paradigm in which the attribute information of lung nodules serves as privileged information: the attributes are needed only during training and do not need to be obtained in advance when identifying lung nodules.
[0017] To extract information from lung nodule images more effectively, capsule networks (CapsNets) are introduced into the recognition network. Unlike CNNs, which represent feature instances with individual scalar neurons, CapsNets use capsules, groups of neurons whose joint activity forms a vector, and feature information is passed through the network as vectors. This helps the model recognize the relative spatial relationships between different parts of a lung nodule image, improving its accuracy in determining whether a lung nodule is benign or malignant.
Attached Figure Description
[0018] Figure 1 is a diagram of the network model of the present invention.
[0019] Figure 2 is a diagram of the Res2net network structure.
[0020] Figures 3(a), 3(b), and 3(c) show the structure of the CAS attention module, the ECA attention architecture, and the spatial attention architecture, respectively.
[0021] Figure 4 illustrates the process of converting a regular feature map into a capsule structure.
[0022] Figure 5 is a diagram of the baseline model structure.
[0023] Figure 6 shows the ROC curves of each model in the ablation experiment.
[0024] Figure 7 shows some of the lung nodule identification results.
Detailed Implementation Plan
[0025] The embodiments of the present invention are described in detail below with reference to the accompanying drawings. This embodiment is implemented on the basis of the technical solution of the present invention, and detailed implementation schemes and specific operating procedures are provided.
[0026] This application proposes a novel deep learning-based lung nodule identification model. As shown in Figure 1, the implementation of this embodiment mainly consists of four steps: (1) the image is fed into a 3×3 convolutional block named preblock followed by a Res2net network module; (2) the result is fed into a structure named Conv1, consisting of two cascaded 3×3 convolutional layers, which extracts lung nodule features; (3) these features are fed into an attention module named CAS (Channel and Space); (4) the rest of the model is split into two branches: one branch adds lung nodule attribute information during training, and the other adds a capsule structure to the recognition model. The outputs of the two branches are concatenated and passed through a fully connected network (fcfinal) to obtain the final benign or malignant prediction.
[0027] The following is a detailed explanation of each part.
[0028] Figure 2 shows the Res2net network structure, which extracts information from the input features at multiple scales without significantly increasing the number of parameters. In this embodiment, the input features are reduced in dimension by a 1×1 convolution and divided into n parts (n = 4), each with 1/n of the channels. Every part except the first has its own 3×3 convolution kernel for feature extraction. The first part is output directly; each subsequent part is output after its convolution, and that output is also added element-wise to the next part before the next part is convolved. The outputs of the n parts are then concatenated along the channel dimension, restored in dimension by a 1×1 convolution, and finally added element-wise to the initial input features to give the final output.
[0029] In this embodiment, the receptive field grows with each successive 3×3 convolution. Combining the outputs of the different parts therefore yields outputs with multiple receptive fields for the input features.
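The split-and-add pattern of [0028] and [0029] can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: NumPy stands in for PyTorch, batch normalization is omitted, ReLU activations are assumed, and all weights are random placeholders. The helper `split_receptive_fields` is a hypothetical name used only to show how the receptive field of each part grows.

```python
import numpy as np


def conv1x1(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def conv3x3(x, w):
    """3x3 convolution, stride 1, zero padding 1: w is (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def relu(x):
    return np.maximum(x, 0.0)


def res2net_block(x, w_reduce, w_splits, w_expand, n=4):
    """One Res2Net-style block (assumed: ReLU activations, no batch norm)."""
    y = relu(conv1x1(x, w_reduce))            # 1x1 conv reduces the channel count
    parts = np.split(y, n, axis=0)            # n parts, each with 1/n of the channels
    outs = [parts[0]]                         # part 1 is output directly
    prev = None
    for i in range(1, n):
        # from part 3 on, the previous part's conv output is added element-wise first
        inp = parts[i] if prev is None else parts[i] + prev
        prev = relu(conv3x3(inp, w_splits[i - 1]))
        outs.append(prev)
    z = conv1x1(np.concatenate(outs, axis=0), w_expand)  # 1x1 conv restores channels
    return relu(z + x)                        # element-wise residual addition


def split_receptive_fields(n=4):
    """Receptive field (pixels) of each part's output for stride-1 3x3 convs."""
    return [1] + [1 + 2 * i for i in range(1, n)]


rng = np.random.default_rng(0)
c, width, n = 16, 8, 4
x = rng.standard_normal((c, 6, 6))
w_reduce = rng.standard_normal((width, c)) * 0.1
w_splits = [rng.standard_normal((width // n, width // n, 3, 3)) * 0.1 for _ in range(n - 1)]
w_expand = rng.standard_normal((c, width)) * 0.1
out = res2net_block(x, w_reduce, w_splits, w_expand, n)
print(out.shape)  # (16, 6, 6)
```

Because every part after the second also receives the previous part's output, the four parts see receptive fields of 1, 3, 5 and 7 pixels, and the concatenation combines all four scales.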
[0030] In this embodiment, the output of the Conv1 structure is fed into a joint channel and spatial attention module (CAS). With very few parameters, CAS learns during training to adaptively assign different weights to different channels and spatial locations of the lung nodule features. In CAS, channel attention is first applied to the input features, and spatial attention is then applied to the result.
[0031] Figure 3(a) illustrates the structure of the CAS attention module. For the channel attention part, the Efficient Channel Attention (ECA) structure is used to increase the weight of effective features in the different channels of the lung nodule features.
[0032] Figure 3(b) presents the ECA attention architecture. First, global average pooling (GAP) over the spatial dimensions of each channel of the input feature $F_{in}$ yields one coefficient per channel. Then a one-dimensional convolution (Conv1D) with an odd kernel size k is applied to these coefficients, giving the joint response of each channel and its neighboring channels. Finally, the Sigmoid function maps the responses to the range 0 to 1, yielding the channel attention coefficients $A_c$, which are multiplied with the input features to weight them along the channel dimension.
[0033] The formula for calculating ECA attention is:
[0034] $A_c = \sigma\left(\mathrm{Conv1D}\left(\mathrm{GAP}(F_{in})\right)\right)$ (1)
[0035] Where σ represents the Sigmoid activation function.
[0036] Preferably, the size k of the one-dimensional convolution kernel is adaptively determined based on the number of channels in the input feature, and the calculation formula is as follows:
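[0037] $k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{odd}$ (2)
[0038] Where C is the number of channels, $|t|_{odd}$ denotes the odd number nearest to t, γ = 2, and b = 1.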
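A minimal NumPy sketch of the ECA step in equation (1), assuming a single feature map of shape (C, H, W) and a random stand-in for the learned 1-D kernel. The kernel-size helper follows the rounding used in the ECA-Net reference implementation (truncate, then move an even value up to the next odd number) with γ = 2 and b = 1; whether the patent rounds in exactly the same way is an assumption.

```python
import math

import numpy as np


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size from the channel count (ECA-Net rounding)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def eca(f_in, weight):
    """Efficient channel attention on f_in of shape (C, H, W).

    weight is a 1-D kernel of odd length k shared by all channels.
    Returns the reweighted features and the coefficients A_c of shape (C,).
    """
    k = weight.shape[0]
    gap = f_in.mean(axis=(1, 2))           # GAP: one coefficient per channel
    padded = np.pad(gap, k // 2)           # zero padding keeps the length equal to C
    # Conv1D across the channel axis: each channel interacts with its k-1 neighbours
    resp = np.array([padded[i:i + k] @ weight for i in range(gap.shape[0])])
    a_c = sigmoid(resp)                    # map to (0, 1), as in equation (1)
    return f_in * a_c[:, None, None], a_c  # channel-wise reweighting


rng = np.random.default_rng(0)
f = rng.standard_normal((64, 8, 8))
k = eca_kernel_size(64)
out, a_c = eca(f, rng.standard_normal(k))
print(k, out.shape)  # 3 (64, 8, 8)
```

The kernel grows only logarithmically with C (for example 3 for 64 channels and 5 for 256), which is why ECA adds so few parameters.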
[0039] Figure 3(c) illustrates the spatial attention architecture. In the spatial attention component, the maximum and average of the input lung nodule features $F_{ins}$ are taken across channels at each spatial location, giving the average-pooled feature $F_{avg}$ and the max-pooled feature $F_{max}$. Both have the same height and width as $F_{ins}$ but a channel dimension of 1, and they are concatenated along the channel dimension. A 2D convolution with learnable parameters then generates the spatial attention coefficients of the nodule, and a Sigmoid function maps them to the range 0 to 1, yielding the final spatial attention coefficients $A_s$. Multiplying $A_s$ with the input lung nodule features assigns spatial weights that are shared across all channels.
[0040] The formula for calculating spatial attention is:
[0041] $A_s = \sigma\left(\mathrm{Conv2D}\left([F_{avg}; F_{max}]\right)\right)$ (3)
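The spatial attention step and equation (3) can be sketched in the same way. The source does not state the 2D kernel size, so the 7×7 kernel in the usage lines is an assumption (it is the CBAM default), and the convolution weights are random placeholders rather than trained values.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x, w):
    """Stride-1 conv with zero 'same' padding: x (C_in, H, W), w (C_out, C_in, k, k), k odd."""
    k = w.shape[-1]
    p = k // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def spatial_attention(f_ins, w):
    """Spatial attention as in equation (3); w has shape (1, 2, k, k)."""
    f_avg = f_ins.mean(axis=0, keepdims=True)          # (1, H, W) average across channels
    f_max = f_ins.max(axis=0, keepdims=True)           # (1, H, W) maximum across channels
    stacked = np.concatenate([f_avg, f_max], axis=0)   # [F_avg; F_max], shape (2, H, W)
    a_s = sigmoid(conv2d_same(stacked, w))             # (1, H, W), values in (0, 1)
    return f_ins * a_s, a_s                            # same weight map for every channel


rng = np.random.default_rng(0)
f = rng.standard_normal((32, 10, 10))
out, a_s = spatial_attention(f, rng.standard_normal((1, 2, 7, 7)) * 0.1)
print(out.shape, a_s.shape)  # (32, 10, 10) (1, 10, 10)
```

Because $A_s$ has a single channel, broadcasting applies the same spatial weight to every channel, which matches the description of weights allocated across the channel dimension.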
[0042] The rest of the model is divided into two branches: one branch (the first recognition performance enhancement module) incorporates lung nodule attribute information during training, and the other branch (the second recognition performance enhancement module) incorporates a capsule structure into the recognition model. The outputs of the two branches are concatenated and passed through a fully connected network (fcfinal) to obtain the final benign or malignant prediction.
[0043] In one embodiment, one branch consists of a 5×5 convolution named Conv2 followed by a capsule module, which outputs capsule features of dimension 2×16. The lengths (moduli) of the capsule features are used to predict whether a lung nodule is benign or malignant, and the capsule features are also combined with the features of the attribute branch to predict benignity or malignancy jointly.
[0044] The capsule network consists of three parts: first, a two-dimensional convolutional layer with a kernel size of 9, which adjusts the feature map to a size suitable for the subsequent capsule computations; next, a primary capsule layer, which splits the feature channels into groups, turning ordinary scalar features into capsule vector features; and finally, a category capsule layer, which learns the weight parameters between itself and the primary capsule layer through a dynamic routing algorithm, optimizing the extraction of lung nodule features.
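The description names a dynamic routing algorithm without giving its details. The sketch below assumes the routing-by-agreement procedure of Sabour et al. (three iterations, squash non-linearity) and shows how 2×16 class capsules and their lengths could be obtained from primary capsules. The sizes n_p = 128 and d_p = 8, the random transformation matrices, and the mapping of capsule index 1 to "malignant" are illustrative assumptions.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-8):
    """Squash non-linearity: keeps the direction, maps the length into [0, 1)."""
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)


def dynamic_routing(u, w, iterations=3):
    """Routing-by-agreement from primary capsules to class capsules.

    u: (n_p, d_p) primary capsules; w: (n_p, n_cls, d_cls, d_p) transformation matrices.
    Returns class capsules v of shape (n_cls, d_cls).
    """
    u_hat = np.einsum("iodp,ip->iod", w, u)        # prediction vectors u_hat_{j|i}
    b = np.zeros(u_hat.shape[:2])                  # routing logits b_ij
    for _ in range(iterations):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)          # coupling coefficients (softmax over j)
        s = np.einsum("io,iod->od", c, u_hat)      # weighted sum per class capsule
        v = squash(s)
        b = b + np.einsum("iod,od->io", u_hat, v)  # raise b_ij where predictions agree
    return v


rng = np.random.default_rng(0)
n_p, d_p, n_cls, d_cls = 128, 8, 2, 16
u = squash(rng.standard_normal((n_p, d_p)))
w = rng.standard_normal((n_p, n_cls, d_cls, d_p)) * 0.05
v = dynamic_routing(u, w)                          # class capsules, 2 x 16
lengths = np.linalg.norm(v, axis=-1)               # capsule modulus lengths
pred = int(np.argmax(lengths))                     # assumed: 0 = benign, 1 = malignant
print(v.shape)  # (2, 16)
```

Squashing keeps every capsule length below 1, so the two lengths can be read directly as class scores.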
[0045] The process of converting an ordinary feature map into a capsule structure is shown in Figure 4. After a 2D convolutional layer with a kernel size of 9, a stride of 2, and 64 output channels, a single feature map has dimension $C \times H \times W$ (channels × height × width). First, the channel dimension C is split so that each capsule is a vector of dimension $1 \times d_p$; the feature map dimension is then $(C/d_p) \times d_p \times H \times W$. Next, the permute method of the PyTorch framework reorders the dimensions, moving the capsule dimension $d_p$ behind the H and W dimensions, so that the feature map has size $(C/d_p) \times H \times W \times d_p$. Finally, the first three dimensions are merged, and together they give the number of capsules in the capsule feature map. The feature map dimension is then $n_p \times d_p$ (number of capsules × capsule dimension), where $n_p = (C/d_p) \times H \times W$.
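The split, permute and merge steps of [0045] map directly onto array operations. In this sketch NumPy's `transpose(0, 2, 3, 1)` plays the role of PyTorch's `permute(0, 2, 3, 1)` on a single (unbatched) feature map; the capsule dimension d_p = 8 and the 4×4 spatial size are assumptions chosen for illustration, with C = 64 as in the description.

```python
import numpy as np


def to_capsules(feat, d_p):
    """Convert a (C, H, W) feature map into (n_p, d_p) capsules, n_p = (C / d_p) * H * W."""
    c, h, w = feat.shape
    assert c % d_p == 0, "C must be divisible by the capsule dimension d_p"
    x = feat.reshape(c // d_p, d_p, h, w)   # split C into (C/d_p) groups of d_p
    x = x.transpose(0, 2, 3, 1)             # like torch permute(0, 2, 3, 1): (C/d_p, H, W, d_p)
    return x.reshape(-1, d_p)               # merge the first three dims into n_p


feat = np.arange(64 * 4 * 4, dtype=float).reshape(64, 4, 4)
caps = to_capsules(feat, d_p=8)
print(caps.shape)  # (128, 8)
```

Each capsule holds the d_p consecutive channel values of one channel group at one spatial position. In PyTorch, a permuted tensor is generally not contiguous, so the final merge should use `reshape` (or `contiguous().view`) rather than a bare `view`.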
[0046] In one embodiment, the other branch obtains the attribute information of the lung nodules. The feature maps are processed by a convolution block called attConv, consisting of two cascaded 3×3 two-dimensional convolutions with a residual connection. The resulting feature maps are average-pooled (gap) and passed through a fully connected layer (attfc1) to obtain eight attribute rating scores for the lung nodules. The average-pooled features are also concatenated with the capsule neurons after the latter have been transformed by a fully connected layer (attfc2), and the result is passed through a fully connected network (fcfinal) to obtain the final benign or malignant result.
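A sketch of the attribute branch and fusion head in [0046], starting from the attConv output. The layer names attfc1, attfc2 and fcfinal come from the description; their sizes, the absence of bias terms, and the single-logit sigmoid output of fcfinal are assumptions. The sketch also makes the privileged-information point concrete: the eight attribute scores are outputs of the network, so no attribute labels are needed at inference time.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fusion_head(att_map, caps, params):
    """Attribute branch + fusion after attConv (layer sizes are assumptions).

    att_map: (C_a, H, W) output of attConv; caps: (2, 16) class capsules.
    Returns (eight predicted attribute scores, malignancy probability).
    """
    pooled = att_map.mean(axis=(1, 2))                 # gap -> (C_a,)
    attr_scores = params["attfc1"] @ pooled            # attfc1 -> 8 attribute scores
    caps_feat = params["attfc2"] @ caps.reshape(-1)    # attfc2 on the flattened capsules
    fused = np.concatenate([pooled, caps_feat])        # concatenation of both branches
    prob = sigmoid(params["fcfinal"] @ fused)          # fcfinal -> single logit (assumed)
    return attr_scores, float(prob)


rng = np.random.default_rng(0)
c_a, d = 64, 32
params = {
    "attfc1": rng.standard_normal((8, c_a)) * 0.1,
    "attfc2": rng.standard_normal((d, 2 * 16)) * 0.1,
    "fcfinal": rng.standard_normal(c_a + d) * 0.1,
}
attr_scores, prob = fusion_head(
    rng.standard_normal((c_a, 6, 6)), rng.standard_normal((2, 16)), params
)
label = 1 if prob > 0.5 else 0  # threshold 0.5 from the description; 1 = malignant
```

Attribute labels would enter only through a training loss on `attr_scores`; at inference the scores can simply be ignored.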
[0047] To verify the contribution of each component of the model described in this application, namely the Res2net multi-scale module, the capsule structure, the CAS attention structure, and the training paradigm that uses lung nodule attribute information as privileged information, a series of ablation experiments was conducted on the experimental dataset.
[0048] The baseline model has the same network depth as the complete network: each of these parts is replaced with a standard residual structure with the same number of convolutional channels, and the attribute branch and capsule structure are removed. See Figure 5 for details.
[0049] The ablation experiments were conducted by progressively adding network structures and experimental paradigms to the baseline model. The specific experimental setup was as follows:
[0050] Model 1: the baseline model shown in Figure 5, used to identify benign and malignant pulmonary nodules;
[0051] Model 2: Adds the Res2net multi-scale feature extraction structure to the Model 1 model;
[0052] Model 3: To verify the effectiveness of the CAS structure, a CAS structure was added to the Model 2 model;
[0053] Model 4: To verify the effectiveness of attribute information in lung nodule identification, an attribute prediction branch was added to Model 3 and the attribute loss $L_{att}$ was computed;
[0054] Model 5: To verify the effectiveness of the capsule structure in lung nodule identification, a capsule structure was added to Model 4 and the capsule loss $L_{cap}$ was computed; this is the complete attribute-privilege and capsule network.
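The ablation names the losses $L_{att}$ and $L_{cap}$ but does not define them or say how they are weighted. One plausible training objective, offered purely as an assumption, combines a binary cross-entropy on the final probability, a mean squared error on the eight attribute scores, and the standard CapsNet margin loss (Sabour et al.) on the class-capsule lengths. The weights `w_att` and `w_cap` are hypothetical.

```python
import numpy as np


def margin_loss(lengths, target, m_pos=0.9, m_neg=0.1, lam=0.5):
    """CapsNet margin loss for class-capsule lengths of shape (n_cls,) (assumed L_cap)."""
    t = np.zeros_like(lengths)
    t[target] = 1.0
    pos = t * np.maximum(0.0, m_pos - lengths) ** 2
    neg = lam * (1.0 - t) * np.maximum(0.0, lengths - m_neg) ** 2
    return float(np.sum(pos + neg))


def attribute_loss(pred_scores, true_scores):
    """Mean squared error over the eight attribute rating scores (assumed L_att)."""
    return float(np.mean((pred_scores - true_scores) ** 2))


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for the final benign/malignant probability."""
    prob = min(max(prob, eps), 1.0 - eps)
    return float(-(target * np.log(prob) + (1 - target) * np.log(1.0 - prob)))


def total_loss(prob, target, lengths, pred_scores, true_scores, w_att=1.0, w_cap=1.0):
    """Training-time objective; the attribute labels are needed only here."""
    return (bce(prob, target)
            + w_att * attribute_loss(pred_scores, true_scores)
            + w_cap * margin_loss(lengths, target))


loss = total_loss(0.8, 1, np.array([0.2, 0.85]), np.full(8, 3.0), np.full(8, 3.5))
```

Since `true_scores` appears only in the loss, dropping that term at test time leaves the forward pass unchanged, which is the sense in which the attributes are privileged.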
[0055] In each model of the ablation experiments, the added network structures are placed at the same locations as in the complete network. The experimental results are shown in Table 1. To compare the ablation results more intuitively, ROC curves of the identification results of each model were plotted; see Figure 6.
[0056] Table 1 Comparison of ablation experiments
[0057]
[0058] In Table 1, Model 1 (the baseline model) has the lowest overall scores across the four evaluation metrics. As improved structures are added, the metrics generally increase, although the SPE of Model 2 is slightly lower than that of Model 1. Model 5 achieves the highest values on all four metrics and thus the best overall performance. From Table 1 and Figure 6, it can be concluded that each improvement that the identification model of this application makes over the baseline model contributes positively to its lung nodule identification performance.
[0059] To demonstrate the model's ability to identify benign and malignant lung nodules more intuitively, Figure 7 visualizes some of the recognition results. The label of each lung nodule is 0 or 1, where 0 denotes benign and 1 denotes malignant. The first row shows two-dimensional images of the lung nodules; the following rows, from top to bottom, show the true benign/malignant label, the model-predicted benign/malignant value, and the model-predicted probability. When the predicted probability exceeds a threshold of 0.5, the lung nodule is predicted to be malignant; otherwise, it is predicted to be benign. The figure shows that the model described in this application classifies lung nodules relatively well.
[0060] The embodiments described above are merely preferred examples of the present invention and are not intended to limit the scope of the present invention. Various modifications and improvements made by those skilled in the art to the technical solutions of the present invention without departing from the spirit of the present invention should fall within the protection scope defined by the claims of the present invention.
Claims
1. A lung nodule identification model based on attribute privilege and capsule networks, characterized in that: It includes a Res2net network module, a convolution module, a channel and spatial attention module, a first recognition performance enhancement module, and a second recognition performance enhancement module; The Res2net network module is used to obtain multiple receptive field outputs of the input features; The convolution module is connected to the output of the Res2net network module and is used for feature extraction of lung nodules; The channel and spatial attention module is connected to the output of the convolution module and is used to assign different weights to features of different channels and different spaces in the lung nodule features. The first recognition performance enhancement module is connected to the output of the channel and spatial attention module and is used to add lung nodule attribute information during training to improve the accuracy of the model in identifying benign and malignant nodules; the lung nodule attribute information is privileged information that is needed only during training and does not need to be obtained in advance during recognition. The first recognition performance enhancement module includes a convolutional block with a kernel size of ... and a capsule module connected after it, which are used to obtain capsule features. The lengths (moduli) of the capsule features are used to predict the benign or malignant nature of lung nodules, and the capsule features are also combined with the features of the second recognition performance enhancement module to jointly predict the benign or malignant nature of lung nodules. The second recognition performance enhancement module is connected to the output of the channel and spatial attention module.
It is used to add a capsule structure to the recognition model to help the model recognize the relative spatial relationships between different parts of the lung nodule image and improve the performance of judging the benign or malignant nature of lung nodules. The second recognition performance enhancement module includes a two-dimensional convolution consisting of two cascaded kernels of size ... with a residual structure, an average pooling layer, a first fully connected layer, and a second fully connected layer. The average pooling layer performs average pooling on the feature map output by the two-dimensional convolution; the first fully connected layer then produces the eight attribute rating scores of the lung nodules, and the pooled features are concatenated with the capsule features after the latter pass through the second fully connected layer. The capsule structure includes a two-dimensional convolutional layer that adjusts the size of the feature map to facilitate subsequent capsule structure calculations; a primary capsule layer that splits the features into groups, transforming ordinary scalar features into capsule vector features; and a category capsule layer that uses a dynamic routing algorithm between itself and the primary capsule layer to learn the weight parameters between capsules and optimize the extraction of lung nodule features. The outputs of the first and second identification performance enhancement modules are connected to a fully connected network, which outputs the benign or malignant status of the lung nodules.
2. The lung nodule identification model based on attribute privilege and capsule network according to claim 1, characterized in that: It also includes a preprocessing module, located at the input of the Res2net network module, which uses convolutional blocks with a kernel size of 3×3.
3. The lung nodule identification model based on attribute privilege and capsule network according to claim 1, characterized in that: The convolutional module uses two cascaded convolutional layers with a kernel size of 3×3.
4. The lung nodule identification model based on attribute privilege and capsule network according to claim 1, characterized in that: The channel and spatial attention module first applies channel attention processing to the input features, and then applies spatial attention processing to the processed features.
5. The lung nodule identification model based on attribute privilege and capsule network according to claim 4, characterized in that: The specific process for handling channel attention is as follows: The coefficients for each channel are obtained by applying average pooling to the input features along the channel dimension; One-dimensional convolution is used to convolve the coefficients to obtain the common response of each channel with its neighboring channels; By using a non-linear activation function to map the coefficients to the range of 0 to 1, the channel attention coefficients are obtained.
6. The lung nodule identification model based on attribute privilege and capsule network according to claim 4, characterized in that: The maximum value and average value of the input features are extracted across channels along the channel dimension to obtain the average pooling feature and the max pooling feature. The two features are concatenated along the channel dimension, and the spatial location attention coefficients of the nodules are generated using a parameter-learnable two-dimensional convolutional kernel. The spatial attention coefficients are mapped to a range of 0 to 1 using a nonlinear activation function, thus obtaining the final spatial attention coefficients.
7. The application of the lung nodule identification model according to any one of claims 1 to 6 in lung nodule identification.