Three-dimensional convolutional neural network CT image lung nodule detection method and system based on multi-layer channel perception
A three-dimensional convolutional neural network based on multi-layer channel perception addresses false positives and false negatives (missed nodules) in lung nodule detection. It improves detection accuracy and sensitivity, especially for small nodules and nodules connected to the pleura, and reduces false positive results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHONGQING UNIV OF POSTS & TELECOMM
- Filing Date
- 2023-05-26
- Publication Date
- 2026-06-16
AI Technical Summary
Existing computer-aided diagnostic systems produce false positives and false negatives when detecting lung nodules. Detection is not accurate enough, especially for small nodules and nodules connected to the pleura, which leads to inconsistent diagnoses by different doctors.
A three-dimensional convolutional neural network based on multi-channel perception is adopted. By combining a multi-channel perception residual module and a candidate nodule generation module with a false positive filtering module, a three-dimensional neural network is constructed to detect lung nodules in CT images, thereby improving the recall and sensitivity of the detection algorithm.
It improves the accuracy and sensitivity of lung nodule detection, and can effectively detect lung nodules that are easily missed, such as lung nodules connected to the pleura and small lung nodules, reducing false positive results.
Figure CN116580018B_ABST
Description
Technical Field
[0001] This invention belongs to the field of computer vision and relates to a method and system for detecting lung nodules in CT images based on a three-dimensional convolutional neural network with multi-layer channel perception.
Background Technology
[0002] Lung cancer is the most common cancer worldwide, and in its early stages it presents as pulmonary nodules. Detecting pulmonary nodules is therefore a crucial means of early lung cancer detection and can significantly improve the survival rate of lung cancer patients. Recent research shows that annual low-dose computed tomography (LDCT) scans can reduce lung cancer mortality by 24% in men and 33% in women. LDCT works by scanning a layer of the body of a specific thickness with an X-ray beam. A sensor receives the X-rays that pass through this layer and converts them into photoelectric signals, and a computer processes the data by dividing the scanned layer into a number of cuboids of equal volume, called voxels. The scanned information is used to calculate the X-ray attenuation (absorption) coefficient of each voxel, and these coefficients are arranged into a digital matrix. Each number in the digital matrix is converted into a small square of corresponding grayscale, called a pixel, and the pixels are arranged in matrix order to form a CT image. Currently, computer-aided diagnostic (CAD) systems can help doctors diagnose diseases effectively and accurately. However, traditional CAD relies mainly on morphology and doctors' experience, so different doctors may arrive at different diagnostic results.
[0003] Thanks to the development of deep learning, convolutional neural networks such as Faster R-CNN, SSD, and YOLO have been proposed and applied in medical image processing. Comparing the precision and recall curves of nodule detection by two doctors and a neural network, Doctor 1's precision was only 35.92%, with a recall of 70.20%; Doctor 2's precision was only 33.78%, with a recall of 73.80%. The neural network, with the same recall, achieved precision of 42.7% and 37.9% respectively, both higher than the precision of the two doctors.
[0004] Cancer prevention and treatment cannot rely solely on medical advancements. Just as the invention of the refrigerator significantly reduced stomach cancer rates in the United States in the last century, advancements in computer vision and artificial intelligence may hold the key to improving cancer treatment.
Summary of the Invention
[0005] In view of this, the purpose of this invention is to provide a method and system for detecting lung nodules in CT images based on a three-dimensional convolutional neural network with multi-layer channel perception, to overcome the problem of false detection and false negative detection of small nodules in lung CT images, and to improve the recall and sensitivity of the target detection algorithm.
[0006] To achieve the above objectives, the present invention provides the following technical solution:
[0007] A method for detecting lung nodules in CT images based on multi-channel sensing three-dimensional convolutional neural networks includes the following steps:
[0008] S1: Convert the format of the CT image data;
[0009] S2: Map the HU values of the image, and perform preprocessing such as rotation, translation, cropping, and padding on the image before inputting it into the first layer encoder;
[0010] S3: The channel and spatial information are merged and encoded through the multi-channel sensing residual module and then downsampled multiple times;
[0011] S4: Then the encoded feature map is decoded four times through the multi-layer channel perception residual module and the deconvolution layer to restore the feature map. Then the feature maps of the same size in the encoder and decoder are stitched together in the channel direction. Regression and classification operations are performed through the candidate nodule generation module composed of two fully connected layers to obtain candidate nodules.
[0012] S5: The CT image is then downsampled by a convolution with a kernel size of 3×3×3 and a stride of 2 to obtain a feature map. This feature map is concatenated with the feature map of the same size from step S3 to obtain a candidate feature map, which is then compressed by a convolution with a kernel size of 1×1×1. The position coordinates of the candidate nodules obtained in step S4 are mapped onto the compressed feature map. Based on the mapped positions, the feature map is cropped into as many patches as there are candidate nodules. These patches are input into two fully connected layers for regression and classification, and the regression values of non-nodules are filtered out to reduce false positives.
[0013] Furthermore, step S1 specifically includes: processing the CT image in DICOM or MHD format into a binary file format that can be processed by the system.
[0014] Further, step S2 specifically includes: mapping the HU values of the image to the range [0, 255]; performing rotation, translation, and cropping preprocessing on the image, with the input image cropped to a cube of fixed size (the size value is not preserved in this text); and, where a crop at the edge of the image falls short of that size, filling the missing voxels with the value 180 before the crop is input into the first layer encoder.
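As a rough illustration of step S2, the following NumPy sketch maps HU values to [0, 255] and crops a cube, filling voxels that fall outside the image with 180. The HU window (`hu_min`, `hu_max`), the function names, and the crop size are assumptions made for illustration; the text states only the target range and the fill value.

```python
import numpy as np

PAD_VALUE = 180  # fill value stated in the text


def map_hu_to_uint8(volume_hu, hu_min=-1200.0, hu_max=600.0):
    """Linearly map HU values to [0, 255]. The HU window is an assumption."""
    clipped = np.clip(volume_hu, hu_min, hu_max)
    scaled = (clipped - hu_min) / (hu_max - hu_min) * 255.0
    return scaled.astype(np.uint8)


def crop_with_padding(volume, start, size):
    """Crop a cube of edge `size` whose corner is at `start` (z, y, x).

    Voxels of the cube that lie outside `volume` are set to PAD_VALUE.
    """
    out = np.full((size, size, size), PAD_VALUE, dtype=volume.dtype)
    src, dst = [], []
    for s, dim in zip(start, volume.shape):
        lo = max(s, 0)
        hi = max(min(s + size, dim), lo)  # empty overlap collapses to lo
        src.append(slice(lo, hi))
        dst.append(slice(lo - s, hi - s))
    out[tuple(dst)] = volume[tuple(src)]
    return out
```

A crop that starts near the far corner of a volume keeps the voxels that exist and receives 180 everywhere else, so every network input has the same shape.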
[0015] Furthermore, step S3 specifically includes: the downsampling comprises five downsampling operations, each producing a feature map of a given size and ending with the final encoded feature map (the size values are not preserved in this text). In each size, the parameters represent the number of channels, length, width, and height of the feature map, respectively.
[0016] Furthermore, the multi-channel sensing residual module is obtained by fusing the multi-channel sensing module and the residual module;
[0017] The multi-layer channel perception module is used to improve the information fusion between the channels of each feature map. It computes channel feature weights and spatial feature weights. For the channel feature weights, global average pooling converts all channel feature maps into a feature vector representing the channel weights; this vector contains the semantic information of all channels. For the spatial feature weights, average pooling along the channel direction compresses all channel feature maps into a single feature map, obtained by averaging the voxels at each fixed position across the channels, so that each voxel in the compressed feature map carries information from that position in every channel. The compressed feature map is then passed through three dilated convolutions connected in series, with dilation distances of [1, 2, 3], and compressed into a three-dimensional weight matrix with one channel. The channel feature weights and spatial feature weights are calculated as follows:
[0018] to [0020] (Equations not preserved in this text.)
[0021] In these equations, the symbols denote, in order: global average pooling; the length, width, and height of the feature map; the ReLU activation function; the sigmoid activation function; the weight matrices of the two convolutional layers; the input feature map; and the Hadamard product of the channel-aware feature weight matrix and the input.
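The channel feature weights described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the two convolutional layers are modeled as plain matrices `w1` and `w2` acting on the pooled vector, the intermediate width `C_mid` is hypothetical, and all names are invented here rather than taken from the patent.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_weighting(x, w1, w2):
    """Apply channel feature weights to a feature map.

    x:  feature map of shape (C, D, H, W)
    w1: first weight matrix, shape (C_mid, C)   (C_mid is an assumption)
    w2: second weight matrix, shape (C, C_mid)
    Returns the reweighted feature map and the per-channel weights.
    """
    z = x.mean(axis=(1, 2, 3))         # global average pooling -> (C,)
    hidden = np.maximum(w1 @ z, 0.0)   # first layer followed by ReLU
    weights = sigmoid(w2 @ hidden)     # second layer followed by sigmoid
    # Hadamard product: broadcast one weight over each channel's volume.
    return x * weights[:, None, None, None], weights
```

Because the weights pass through a sigmoid, each channel is scaled by a factor between 0 and 1 rather than being switched on or off.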
[0022] to [0024] (Equations not preserved in this text.)
[0025] In these equations, the symbols denote, in order: average pooling along the channel direction; the sigmoid activation function; the dilated convolutions with kernel dilation distances of [3, 2, 1], respectively; the weight matrix of the compressed convolutional layer; and the Hadamard product of the spatially aware feature weight matrix and the input.
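A minimal NumPy sketch of the spatial feature weights follows. It assumes a single-channel path through three 3×3×3 dilated convolutions applied in series with a ReLU after each, and a scalar `w_compress` standing in for the compressing 1×1×1 convolution. These choices and all names are illustrative assumptions; the text does not preserve the exact layer layout.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dilated_conv3d(x, kernel, dilation):
    """Zero-padded, same-size 3x3x3 cross-correlation of a single-channel volume."""
    depth, height, width = x.shape
    padded = np.pad(x, dilation)  # `dilation` zero voxels on every side
    out = np.zeros(x.shape, dtype=float)
    for i in range(3):
        for j in range(3):
            for k in range(3):
                a, b, c = i * dilation, j * dilation, k * dilation
                out += kernel[i, j, k] * padded[a:a + depth, b:b + height, c:c + width]
    return out


def spatial_weighting(x, kernels, dilations=(1, 2, 3), w_compress=1.0):
    """Apply spatial feature weights to a feature map of shape (C, D, H, W)."""
    feat = x.mean(axis=0)  # average pooling along the channel direction
    for kernel, dilation in zip(kernels, dilations):
        feat = np.maximum(dilated_conv3d(feat, kernel, dilation), 0.0)
    weights = sigmoid(w_compress * feat)  # one-channel 3D weight matrix
    return x * weights[None], weights     # Hadamard product with every channel
```

The same weight volume multiplies every channel, so this branch emphasizes positions, whereas the channel branch emphasizes feature types.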
[0026] After feature information is extracted through this two-way perception mechanism along the channel direction and the spatial direction, it is encoded by convolution and the enhanced features are applied back to the original feature map as a Hadamard product. On this basis, the multi-layer channel perception module and the residual module are fused together. The fusion positions are the 2nd, 3rd, 4th, and 5th layers of the feature downsampling network and the corresponding 1st, 2nd, 3rd, and 4th layers of the feature upsampling network.
[0027] Furthermore, a three-dimensional neural network is constructed from a multi-layer channel perception residual module, a candidate nodule generation module, and a nodule false positive filtering module to process the input CT image and obtain the nodule classification and location results. The three-dimensional neural network model consists of two parts: encoding and decoding. The encoding part uses a multi-layer channel perception residual module and a max-pooling layer, while the decoding part uses deconvolution and a multi-layer channel perception residual module. The encoding part includes five modules, d1, d2, d3, d4, and d5, each consisting of a multi-layer channel perception residual module and max-pooling downsampling, to extract feature map information. The decoding part consists of a multi-layer channel perception residual module and deconvolution upsampling, to expand the feature map information. Upsampling consists of four modules: U1, U2, U3, and U4. The candidate nodule generation module integrates the information from U1, U2, U3, and U4 to generate candidate nodule information, including location coordinates and nodule classification information, which contains a large number of false positives. Its loss function is a cross-entropy loss with hard negative sample mining. The nodule false positive filtering module uses the coordinates and categories output by the candidate nodule generation module to find false positive nodules. It downsamples the d2 feature map and the d1 feature map by convolution with a stride of 2 and concatenates them in the channel direction to obtain a new feature map. The candidate nodule information is then used to crop this new feature map and find false positive nodules. Its loss function is the cross-entropy loss function.
[0028] On the other hand, the present invention provides a lung nodule detection system for CT images based on multi-layer channel perception three-dimensional convolutional neural network, including communication equipment, storage equipment, computing equipment, and display equipment;
[0029] The communication device is used to transfer DICOM or MHD format files output by the CT equipment to the storage device;
[0030] The storage device is used to store CT image data and lung nodule detection process and results data;
[0031] The computing device is used to load and execute the above-mentioned method for detecting lung nodules in CT images based on multi-channel perception three-dimensional convolutional neural networks;
[0032] The display device is used to display the results of lung nodule detection.
[0033] The beneficial effects of this invention are as follows: This invention is a three-dimensional convolutional neural network CT image lung nodule detection method based on a multi-layer channel perception module. Through the multi-layer channel perception module, the channel dimension and spatial dimension of the feature map are perceived, thereby improving the sensitivity of the network. The three-dimensional neural network constructed by the channel and spatial attention modules, the residual network module, the candidate nodule generation module, and the nodule false positive filtering module can detect lung nodules in three-dimensional CT images. It can detect lung nodules that are easily missed, such as lung nodules connected to the pleura and small lung nodules.
[0034] Other advantages, objectives, and features of the invention will be set forth in part in the description which follows, and in part will be apparent to those skilled in the art from the following examination, or may be learned from practice of the invention. The objectives and other advantages of the invention can be realized and obtained through the following description.
Description of the Drawings
[0035] To make the objectives, technical solutions, and advantages of the present invention clearer, the preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings, wherein:
[0036] Figure 1 is a flowchart of the lung nodule detection method using a 3D convolutional neural network based on a multi-layer channel perception module in Embodiment 1 of the present invention;
[0037] Figure 2 shows the multi-layer channel perception residual module in Embodiment 1 of the present invention;
[0038] Figure 3 is a flowchart of the lung nodule detection system using a 3D convolutional neural network based on a multi-layer channel perception module in Embodiment 1 of the present invention.
Detailed Implementation
[0039] The following specific examples illustrate the implementation of the present invention. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and various details in this specification can be modified or changed based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of the present invention. Unless otherwise specified, the following embodiments and features can be combined with each other.
[0040] The accompanying drawings are for illustrative purposes only and are schematic diagrams, not actual pictures. They should not be construed as limiting the invention. To better illustrate the embodiments of the invention, some parts in the drawings may be omitted, enlarged, or reduced, and do not represent the actual product dimensions. It is understandable to those skilled in the art that some well-known structures and their descriptions may be omitted in the drawings.
[0041] In the accompanying drawings of the embodiments of the present invention, the same or similar reference numerals correspond to the same or similar components. In the description of the present invention, it should be understood that if terms such as "upper," "lower," "left," "right," "front," and "rear" indicate the orientation or positional relationship based on the orientation or positional relationship shown in the drawings, they are only for the convenience of describing the present invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, the terms used to describe positional relationships in the drawings are only for illustrative purposes and should not be construed as limiting the present invention. For those skilled in the art, the specific meaning of the above terms can be understood according to the specific circumstances.
[0042] As shown in Figure 1, this embodiment provides a lung nodule detection method using a 3D convolutional neural network based on a multi-layer channel perception module, including:
[0043] S1: Acquire 3D CT images and convert the image data from DICOM or MHD format into a binary file specific to this method;
[0044] S2: Map the HU values of the image to the range [0, 255], and perform preprocessing such as rotation, translation, and cropping on the image. The input image is cropped to a cube of fixed size (the size value is not preserved in this text). Where a crop at the edge of the image falls short of that size, the missing voxels are filled with the value 180, and the crop is then input into the first layer encoder;
[0045] S3: The channel and spatial information are then merged and encoded by the multi-layer channel perception residual module and downsampled. The downsampling comprises five downsampling operations, each producing a feature map of a given size and ending with the final encoded feature map (the size values are not preserved in this text), where the size parameters represent the number of channels, length, width, and height of the feature map, respectively;
[0046] S4: The encoded feature map is then decoded four times through a multi-layer channel perception residual module and a deconvolution layer to restore the feature map sizes (the size values are not preserved in this text). Feature maps of the same size in the encoder and decoder are then concatenated in the channel direction, and regression and classification are performed by a candidate nodule generation module composed of two fully connected layers;
[0047] S5: The CT image is then downsampled once by a convolution with a kernel size of 3×3×3 and a stride of 2. The resulting feature map is concatenated with the feature map of the same size from S3 to obtain the candidate feature map, which is compressed by a 1×1×1 convolution (the feature map sizes are not preserved in this text). The position coordinates of the candidate nodules obtained in step S4 are then mapped onto the compressed feature map. Based on the mapped positions, the feature map is cropped into as many fixed-size patches as there are candidate nodules. These patches are input into two fully connected layers for regression and classification, and the regression values of non-nodules are filtered out to reduce false positives.
[0048] Step S2 includes adaptive multi-scale feature map processing. During downsampling, the image size continuously decreases. When the input image size (length, width, and height) is an odd number, the size of the downsampled feature map is recorded. Then, the difference between this downsampled feature map and the upsampled feature map is calculated. Finally, missing feature maps are filled by padding with 180 (the 180 value differs significantly from the voxel value of the lung nodules). The purpose of this is to allow the network to adapt to CT images of different sizes and prevent data loss due to downsampling from affecting the network's data fusion operation.
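The size mismatch that this adaptive padding addresses can be illustrated with a small NumPy sketch. With an odd edge length such as 15, pooling by 2 gives 7 and upsampling by 2 gives 14, one voxel short of the encoder feature map. The helper below pads the shorter feature map at the trailing edge with 180; the function name, the padding side, and the example sizes are assumptions for illustration.

```python
import numpy as np

PAD_VALUE = 180  # fill value stated in the text


def pad_to_match(feature, target_spatial_shape, value=PAD_VALUE):
    """Pad a (C, D, H, W) feature map up to the target (D, H, W) shape.

    Padding is added at the trailing edge of each spatial axis (an assumption).
    """
    pads = [(0, 0)]  # never pad the channel axis
    for have, want in zip(feature.shape[1:], target_spatial_shape):
        if want < have:
            raise ValueError("target shape is smaller than the feature map")
        pads.append((0, want - have))
    return np.pad(feature, pads, mode="constant", constant_values=value)


# Example: an encoder feature map with edge 15 and a decoder feature map with edge 14.
encoder_shape = (15, 15, 15)
decoder_feature = np.zeros((2, 14, 14, 14))
matched = pad_to_match(decoder_feature, encoder_shape)
```

After padding, the two feature maps have the same spatial shape and can be concatenated along the channel axis.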
[0049] Step S2 includes a multi-layer channel perception module, as shown in Figure 2. This module computes channel feature weights and spatial feature weights. For the channel feature weights, global average pooling converts all channel feature maps into a feature vector representing the channel weights; this vector contains the semantic information of all channels. For the spatial feature weights, average pooling along the channel direction compresses all channel feature maps into a single feature map, obtained by averaging the voxels at each fixed position across the channels, so that each voxel in the compressed feature map carries information from that position in every channel. The compressed feature map is then passed through three dilated convolutions connected in series, with dilation distances of [1, 2, 3]. These dilated convolutional layers expand the receptive field from the 7×7×7 (343 voxels) of ordinary convolutions to 13×13×13 (2197 voxels) while keeping the amount of computation the same. The result is then compressed into a three-dimensional weight matrix with one channel. The channel feature weights and spatial feature weights are calculated as follows:
[0050] to [0052] (Equations not preserved in this text.)
[0053] In these equations, the symbols denote, in order: global average pooling; the ReLU activation function; the sigmoid activation function; the weight matrices of the two convolutional layers; the input feature map; and the Hadamard product of the channel-aware feature weight matrix and the input.
[0054] to [0056] (Equations not preserved in this text.)
[0057] In these equations, the symbols denote, in order: average pooling along the channel direction; the ReLU activation function; the sigmoid activation function; the dilated convolutions with kernel dilation distances of [1, 2, 3], respectively; the weight matrix of the compressed convolutional layer; and the Hadamard product of the spatially aware feature weight matrix and the input.
[0058] After feature information is extracted through this two-way perception mechanism along the channel and spatial directions, it is encoded by convolution and the enhanced features are applied back to the original feature map as a Hadamard product. On this basis, the multi-layer channel perception module and the residual module are fused together. The fusion positions are the 2nd, 3rd, 4th, and 5th layers of the feature downsampling network and the corresponding 1st, 2nd, 3rd, and 4th layers of the feature upsampling network, which improves the efficiency of information fusion.
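The receptive-field figures quoted for the dilated convolutions in this module (7×7×7 versus 13×13×13) can be checked with the standard formula for stride-1 convolutions applied in series, where each layer adds (kernel size − 1) × dilation to the receptive field. The helper name is illustrative, and a 3×3×3 kernel is assumed.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (per axis) of stride-1 convolutions applied in series."""
    field = 1
    for dilation in dilations:
        field += (kernel_size - 1) * dilation
    return field


ordinary = receptive_field(3, [1, 1, 1])  # three ordinary 3x3x3 convolutions
dilated = receptive_field(3, [1, 2, 3])   # dilation distances 1, 2, 3
print(ordinary, ordinary ** 3)  # 7 343
print(dilated, dilated ** 3)    # 13 2197
```

Both stacks use three 3×3×3 kernels, so the number of weights per layer is unchanged; only the spacing of the sampled voxels grows.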
[0059] Step S3 includes residual modules. The residual network is a two-dimensional convolutional neural network proposed by four scholars from Microsoft Research. This embodiment extends it to three-dimensional depth and improves it. Instead of using the original fixed network model, 27 residual modules are divided into 9 modules, and each residual module has 2 layers of 3×3×3 convolutions. For multi-channel sensing residual modules, multi-channel sensing modules are inserted into the convolutional layers of the residual modules. Each multi-channel sensing module contains 6 convolutional layers, and there are a total of 156 convolutional layers excluding the preprocessing layer.
[0060] Step S3 includes a multi-channel sensing residual module and max pooling downsampling, the purpose of which is to extract feature map information; downsampling includes five modules: d1, d2, d3, d4, and d5.
[0061] Step S4 includes an information fusion operation, which concatenates the first, second, third, and fourth layers of the feature downsampling network with the second, third, and fourth layers of the feature upsampling network, respectively. Step S4 also includes a multi-channel perceptual residual module and deconvolutional upsampling, the purpose of which is to expand the feature map information; the upsampling includes four modules: U1, U2, U3, and U4.
[0062] Step S5 includes a candidate nodule generation module. This module integrates the information from U1, U2, U3, and U4 to generate candidate nodule information, including location coordinates and nodule classification information, which contains a large number of false positives. Its loss function is a cross-entropy loss with hard negative sample mining. Specifically, anchors are generated on the feature map with sizes listed as 8, 10, 16, 32, and 50 (the exponents are garbled in this text), and the number of channels is then reduced to 64 by a 3D convolution with a kernel size of 1. The number of output channels is then changed to the number of anchors (for classification information) and to 6 times the number of anchors (for coordinate information), respectively. The outputs are flattened and passed through a sigmoid to obtain the feature vectors, which are finally fed into the loss function.
[0063] Step S5 includes a false positive filtering module. This module uses the coordinates and categories output by the candidate nodule generation module to identify false positive nodules. It concatenates the feature map obtained by downsampling the CT image through a convolution with a 3×3×3 kernel and a stride of 2 with the further-downsampled d1 feature map to obtain a new feature map, and then uses the candidate nodule information to search for false positive nodules on this new feature map. Its loss function is the cross-entropy loss function. Specifically, the classification and coordinate information produced by the candidate nodule network is used to generate anchors of different sizes (given in this text as "16... 3"), which are mapped onto the concatenated feature map obtained by downsampling the d2 and d1 feature maps through convolution. The mapped regions are flattened and passed through two fully connected layers and a sigmoid function to obtain new feature vectors, which are then fed into the loss function.
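The mapping of candidate coordinates onto the downsampled feature map, followed by cropping one patch per candidate, might look like the NumPy sketch below. The stride of 2 comes from the text; the patch edge length, the zero padding at the borders, the (z, y, x) coordinate order, and the function name are assumptions for illustration.

```python
import numpy as np


def crop_candidate_patches(feature_map, centers_zyx, stride=2, patch=4):
    """Crop one fixed-size patch per candidate from a (C, D, H, W) feature map.

    centers_zyx: candidate centers in input-image voxel coordinates.
    stride:      downsampling factor between the image and the feature map.
    patch:       patch edge length (hypothetical value).
    """
    channels, depth, height, width = feature_map.shape
    half = patch // 2
    # Pad spatial axes so patches near the border keep the full size.
    padded = np.pad(feature_map, ((0, 0), (half, half), (half, half), (half, half)))
    patches = []
    for center in centers_zyx:
        z, y, x = (int(round(v / stride)) for v in center)
        z = min(max(z, 0), depth - 1)
        y = min(max(y, 0), height - 1)
        x = min(max(x, 0), width - 1)
        # Index z in the feature map is index z + half in the padded map,
        # so the window [z, z + patch) is centered on the candidate.
        patches.append(padded[:, z:z + patch, y:y + patch, x:x + patch])
    if not patches:
        return np.empty((0, channels, patch, patch, patch), dtype=feature_map.dtype)
    return np.stack(patches)
```

The output has one patch per candidate, which matches the description that the feature map is divided into as many pieces as there are candidate nodules before the fully connected layers.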
[0064] Step S5 includes a nodule information generation function, which prints the identified nodule information to a table, the format of which is shown in Table 1:
[0065] Table 1
[0066]
[0067] In step S5, the num hard value for hard negative sample mining in the nodule candidate module is set to 3, and the cross-entropy loss function is:
[0068]
[0069]
[0070] To explain further:
[0071] The cross-entropy loss function of the false positive filtering module in step S5 is:
[0072]
[0073] The total loss function in step S5 is:
[0074]
[0075] The parameters in the loss functions of step S5 denote, in order (the symbols are not preserved in this text): the coordinate information of the candidate nodules, including location coordinates and size; the ground-truth nodule information labeled during training; the confidence of the nodule; the weighting parameter; the total number of classification parameters; and the total number of regression parameters.
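Since the loss equations are not preserved here, the following is only a generic sketch of a binary cross-entropy loss with hard negative mining, not the patent's formula. It keeps every positive anchor and the `num_hard` negative anchors with the highest predicted probability. Treating `num_hard` as a count per call and averaging the selected terms equally are assumptions, as are all names.

```python
import numpy as np


def hard_negative_bce(probs, labels, num_hard=3, eps=1e-7):
    """Cross-entropy over all positives and the hardest negatives.

    probs:  predicted nodule probabilities, one per anchor
    labels: 1 for nodule anchors, 0 for background anchors
    Returns the mean loss and the indices of the selected hard negatives.
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels)
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    # Hard negatives are the background anchors scored most like nodules.
    order = np.argsort(-probs[neg_idx], kind="stable")
    hard_idx = neg_idx[order[:num_hard]]
    pos_loss = -np.log(probs[pos_idx])
    neg_loss = -np.log(1.0 - probs[hard_idx])
    loss = np.concatenate([pos_loss, neg_loss]).mean()
    return float(loss), hard_idx
```

Ignoring the easy negatives keeps the very large number of background anchors in a CT volume from dominating the gradient.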
[0076] A three-dimensional neural network is constructed based on a multi-channel perception module, a residual network module, a candidate nodule generation module, and a nodule false positive filtering module. By processing the input CT image, the classification and location results of the nodules are obtained. During training, image enhancement techniques such as mirroring, inversion, cropping, scaling, and denoising are applied to the 3D CT image.
[0077] In steps S1-S5, a graphics card with at least 20GB of video memory is used for training. The training parameters are: BatchSize=8; epochs=200; the learning rate is 0.01 for the first 130 epochs, then 0.001 for 130 to 160 epochs, and then 0.0001 for 160 to 200 epochs; a false positive suppression network is added at 75 epochs; stochastic gradient descent is used as the optimizer.
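The stepwise learning-rate schedule and the late activation of the false positive suppression network can be written as two small functions. Zero-based epoch numbering and the treatment of the boundary epochs (130, 160, 75) as the first epoch of the new setting are assumptions; the function names are illustrative.

```python
def learning_rate(epoch):
    """Learning rate for a zero-based epoch index, following the stated schedule."""
    if epoch < 130:
        return 0.01
    if epoch < 160:
        return 0.001
    return 0.0001


def false_positive_branch_enabled(epoch, start_epoch=75):
    """Whether the false positive suppression network takes part in training."""
    return epoch >= start_epoch


BATCH_SIZE = 8
EPOCHS = 200
schedule = [learning_rate(e) for e in range(EPOCHS)]
```

In a training loop, the optimizer's learning rate would be set from `learning_rate(epoch)` at the start of each epoch.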
[0078] The following are the metrics for evaluating the method of this invention:
[0079] The FROC (free-response receiver operating characteristic) score is defined as the average of the sensitivities obtained when the number of false positives per image is 0.125, 0.25, 0.5, 1, 2, 4, and 8. Since CT images are three-dimensional, a radius distance is used to determine whether a candidate is a true positive: if a candidate nodule lies within the nodule radius of the labeled coordinates, it is a true positive; otherwise, it is a false positive.
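A minimal NumPy sketch of these two definitions follows. It assumes the FROC curve is available as matching arrays of false positives per image and sensitivity, and that sensitivities at the seven operating points are obtained by linear interpolation; both are assumptions about the evaluation procedure, and the names are illustrative.

```python
import numpy as np

FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)  # false positives per image


def is_true_positive(candidate, nodule_center, nodule_radius):
    """A candidate is a hit if it lies within the nodule radius of the label."""
    offset = np.asarray(candidate, dtype=float) - np.asarray(nodule_center, dtype=float)
    return bool(np.linalg.norm(offset) <= nodule_radius)


def froc_score(fps_per_image, sensitivity):
    """Average sensitivity at the seven false-positive rates.

    fps_per_image must be sorted in increasing order.
    """
    sens_at_rates = np.interp(FP_RATES, fps_per_image, sensitivity)
    return float(sens_at_rates.mean())
```

For example, a candidate at (0, 0, 0) counts as a hit for a nodule centered at (1, 1, 1) with radius 2, because the distance is about 1.73.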
[0080] The experimental results (six-fold cross-validation) of the invention are shown in Table 2.
[0081] Table 2
[0082]
[0083] As shown in Figure 3, this embodiment provides a lung nodule detection system using a 3D convolutional neural network based on a multi-layer channel perception module, including:
[0084] The communication module transmits CT images to the detection system;
[0085] The storage module stores the transmitted CT images in the computer system;
[0086] The computing module applies the above methods to the computer system;
[0087] The display module shows the data results.
[0088] The communication equipment should have basic computer information communication functions, enabling it to transmit DICOM or MHD format files output by the CT equipment to the computer system. The communication equipment should have a firewall and should only connect to the internal network to prevent hacker intrusion and leakage of patient information.
[0089] The storage device has basic computer information storage functions and includes memory and hard disks. The memory should be 64 GB of server memory using DDR4 or a later standard. The choice of hard disk depends on the task: for lung nodule detection, an NVMe solid-state drive using PCIe 3.0 or later with a read/write speed of 3000 MB/s should be used, which offers high speed and good stability. A mechanical hard disk is used to store the lung nodule detection results, which offers good stability and large capacity.
[0090] The computing device is equipped with a GPU with image processing acceleration and a CPU with multi-threaded processing capabilities for computer information processing. It can load and compute the 3D convolutional neural network lung nodule detection method based on a multi-layer channel perception module described in this embodiment. It should use a GPU graphics accelerator with large video memory and high computing power. During training, a GPU graphics accelerator with more than 20GB of video memory (multiple graphics cards can be used) should be used. During detection, a GPU graphics accelerator with more than 10GB of video memory should be used.
[0091] The display device should have basic computer display capabilities and the ability to display images. It should use LCD or LED displays and have the ability to display high-definition images with high resolution and high color gamut. If necessary, a 3D display can be used.
[0092] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.
Claims
1. A method for detecting lung nodules in CT images based on multi-channel perception three-dimensional convolutional neural networks, characterized in that: Includes the following steps: S1: Convert the format of the CT image data; S2: Map the HU values of the image, and perform preprocessing such as rotation, translation, cropping, and padding on the image before inputting it into the first layer encoder; S3: The channel and spatial information are merged and encoded through the multi-channel sensing residual module and then downsampled multiple times; S4: Then the encoded feature map is decoded four times through the multi-layer channel perception residual module and the deconvolution layer to restore the feature map. Then the feature maps of the same size in the encoder and decoder are stitched together in the channel direction. Regression and classification operations are performed through the candidate nodule generation module composed of two fully connected layers to obtain candidate nodules. S5: Then, the CT image is processed by a convolutional downsampling feature map with a kernel size of 3×3×3 and a stride of 2. This feature map is then concatenated with the feature map of the same size from step S3 to obtain a candidate feature map. The candidate feature map is then compressed using a convolutional kernel size of 1×1×1. The position coordinates of the candidate nodules obtained in step S4 are then mapped onto the compressed feature map. Based on the mapped position, the feature map is segmented into a number equal to the number of nodules. This feature map is then input into two fully connected layers for regression and classification. The regression values of non-nodules are filtered out to reduce false positives. 
The multi-channel perception residual module is obtained by fusing the multi-layer channel perception module with the residual module. The multi-layer channel perception module improves the information fusion between the channels of each feature map, and includes channel feature weights and spatial feature weights. The channel feature weights are obtained by converting all channel feature maps, using global average pooling, into a set of feature vectors representing channel weights; these feature vectors contain the semantic information of all channels. The spatial feature weights are obtained by compressing all channel feature maps into a single feature map using average pooling along the channel direction, that is, by averaging the voxels at each fixed position across the channels, so that each voxel in the compressed feature map contains information from that voxel position in all channel feature maps. The compressed feature map is processed by three dilated convolutions with kernel dilation distances of [1, 2, 3], the results are concatenated, and the concatenation is then compressed into a three-dimensional weight matrix with one channel.
The specific calculation of the channel feature weights and spatial feature weights is given by formulas whose symbols are not reproduced in this text. In the channel feature weight formulas, the terms denote, in order: global average pooling; the length, width, and height of the feature map; the ReLU activation function; the sigmoid activation function; the weight matrices of the two convolutional layers; the input feature map; and the Hadamard product of the channel-aware feature weight matrix and the input. In the spatial feature weight formulas, the terms denote, in order: average pooling along the channel direction; the sigmoid activation function; the dilated convolutions with kernel dilation distances of [3, 2, 1], respectively; the weight matrix of the compressed convolutional layer; and the Hadamard product of the input and the spatially perceived feature weight matrix along the channel direction.
After feature information is extracted through the bidirectional perception mechanism of the channel direction and the spatial direction, convolutional encoding is applied and the enhanced features are fed back onto the original feature map in the form of a Hadamard product. On this basis, the multi-layer channel perception module and the residual module are fused together. The fusion positions are the 2nd, 3rd, 4th, and 5th layers of the feature downsampling network and the corresponding 1st, 2nd, 3rd, and 4th layers of the feature upsampling network.
A three-dimensional neural network is constructed using the multi-channel perception residual module, a candidate nodule generation module, and a nodule false positive filtering module to process input CT images and obtain nodule classification and location results. The three-dimensional neural network consists of two parts: encoding and decoding. The encoding part contains five modules, d1, d2, d3, d4, and d5, each composed of a multi-channel perception residual module and max pooling downsampling, with the aim of extracting feature map information. The decoding part contains four modules, U1, U2, U3, and U4, each composed of a multi-channel perception residual module and deconvolution upsampling, with the aim of expanding the feature map information. The candidate nodule generation module integrates the information from U1, U2, U3, and U4 to generate candidate nodule information, including location coordinates and nodule classification information, which contains a large number of false positives. Its loss function is a cross-entropy loss with hard negative sample mining.
The nodule false positive filtering module uses the coordinates and categories output by the candidate nodule generation module to find false positive nodules. Its principle is to downsample the d2 feature map and the d1 feature map by convolution with a stride of 2 and then concatenate them along the channel direction to obtain a new feature map. The candidate nodule information is then used to cut out regions of the new feature map and identify false positive nodules. Its loss function is the cross-entropy loss function.
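The channel feature weights and the channel-direction average pooling described in claim 1 can be illustrated with a minimal sketch in plain Python. It is not the patented implementation: the function names, the nested-list layout (channels × depth × height × width), and the toy weight matrices `w1` and `w2` are assumptions made for illustration, the two convolutional layers are modeled as plain matrix products on the pooled vector, and the dilated convolutions of the spatial branch are omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def global_average_pool(feature_map):
    """Reduce a C x D x H x W nested list to one mean value per channel."""
    pooled = []
    for channel in feature_map:
        voxels = [v for plane in channel for row in plane for v in row]
        pooled.append(sum(voxels) / len(voxels))
    return pooled


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def channel_weights(feature_map, w1, w2):
    """sigmoid(W2 * relu(W1 * GAP(X))): one weight in (0, 1) per channel.

    w1 and w2 are hypothetical stand-ins for the two convolutional layers.
    """
    squeezed = global_average_pool(feature_map)
    hidden = [max(0.0, h) for h in matvec(w1, squeezed)]
    return [sigmoid(z) for z in matvec(w2, hidden)]


def apply_channel_weights(feature_map, weights):
    """Hadamard product: scale every voxel of channel c by weights[c]."""
    return [[[[v * s for v in row] for row in plane] for plane in channel]
            for channel, s in zip(feature_map, weights)]


def channel_mean_map(feature_map):
    """Average the voxels at each position across channels (D x H x W)."""
    n = len(feature_map)
    depth = len(feature_map[0])
    height = len(feature_map[0][0])
    width = len(feature_map[0][0][0])
    return [[[sum(feature_map[c][z][y][x] for c in range(n)) / n
              for x in range(width)]
             for y in range(height)]
            for z in range(depth)]


# Toy input: 2 channels, each 1 x 2 x 2 voxels.
x = [[[[1.0, 1.0], [1.0, 1.0]]],
     [[[3.0, 3.0], [3.0, 3.0]]]]
w = channel_weights(x, w1=[[0.5, 0.5]], w2=[[1.0], [-1.0]])
reweighted = apply_channel_weights(x, w)
spatial_input = channel_mean_map(x)
```

In a full model, `spatial_input` would next pass through the three dilated convolutions and the compressing convolution before the sigmoid, which a deep learning framework would normally provide.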
2. The method for detecting lung nodules in CT images based on multi-channel perception three-dimensional convolutional neural networks according to claim 1, characterized in that step S1 specifically includes: converting CT images in DICOM or MHD format into a binary file format that can be processed by the system.
3. The method for detecting lung nodules in CT images based on multi-channel perception three-dimensional convolutional neural networks according to claim 1, characterized in that step S2 specifically includes: mapping the HU values of the image to the range [0, 255]; performing rotation, translation, and cropping preprocessing on the image, with the input image cropped to a block of size ...; and, when a crop at the edge of the image falls short of that size, filling the missing voxels with the voxel value 180 before inputting the result into the first-layer encoder.
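The mapping and padding in claim 3 can be sketched as follows. This is a hedged illustration only: the claim does not state the HU window, so the bounds -1200 and 600 are assumed defaults, and the crop size, which is elided in the text above, is left as a parameter. The function names and the nested-list volume layout (z, y, x) are hypothetical.

```python
def hu_to_gray(hu, hu_min=-1200.0, hu_max=600.0):
    """Clip a HU value to [hu_min, hu_max] and map it linearly to [0, 255].

    The window bounds are assumptions; the claim only fixes the output range.
    """
    clipped = min(max(hu, hu_min), hu_max)
    return round((clipped - hu_min) / (hu_max - hu_min) * 255)


def crop_block(volume, start, size, pad_value=180):
    """Cut a size x size x size block whose corner is at start = (z, y, x).

    Voxels that fall outside the volume are filled with pad_value
    (180 in the claim).
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    z0, y0, x0 = start

    def voxel(z, y, x):
        inside = 0 <= z < depth and 0 <= y < height and 0 <= x < width
        return volume[z][y][x] if inside else pad_value

    return [[[voxel(z0 + dz, y0 + dy, x0 + dx) for dx in range(size)]
             for dy in range(size)]
            for dz in range(size)]


# Toy 2 x 2 x 2 volume; cropping at the far corner runs past every edge.
volume = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8]]]
block = crop_block(volume, start=(1, 1, 1), size=2)
```

Padding with a constant keeps every block the same shape, so blocks taken at the image border can be batched with interior blocks.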
4. The method for detecting lung nodules in CT images based on multi-channel perception three-dimensional convolutional neural networks according to claim 1, characterized in that step S3 specifically includes: the downsampling consists of five downsampling operations, which successively produce feature maps of the sizes specified in the original claim (the size values are not reproduced in this text) and end with a final feature map, where the parameters in parentheses in each size denote the number of channels, length, width, and height of the feature map, respectively.
5. A three-dimensional convolutional neural network-based CT image lung nodule detection system, characterized in that it includes a communication device, a storage device, a computing device, and a display device; the communication device is used to transfer DICOM or MHD format files output by the CT equipment to the storage device; the storage device is used to store CT image data and the process and result data of lung nodule detection; the computing device is used to load and execute the method for detecting lung nodules in CT images based on multi-channel perception as described in any one of claims 1-4; and the display device is used to display the results of lung nodule detection.