Multi-modal image classification method and device for auxiliary diagnosis of Parkinson's disease and computer readable storage medium

By combining pixel-level and feature-level fusion methods with 3D Simple CNN and 3D Multi-Scale CNN, the problems of information loss and complexity in multimodal image fusion are solved, and efficient, accurate and interpretable image classification for Parkinson's disease diagnosis is achieved.

CN118570537B · Active · Publication Date: 2026-06-30 · QILU UNIVERSITY OF TECHNOLOGY (SHANDONG ACADEMY OF SCIENCES) +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: QILU UNIVERSITY OF TECHNOLOGY (SHANDONG ACADEMY OF SCIENCES)
Filing Date: 2024-05-31
Publication Date: 2026-06-30


Abstract

This invention belongs to the field of medical image processing technology, and more specifically relates to a multimodal image classification method, device, and computer-readable storage medium for auxiliary diagnosis of Parkinson's disease. The method includes acquiring an image; performing local normalization processing on the image to extract local normalization coefficients; fitting the local normalization coefficients to obtain shape parameters and variance; fitting the normalization coefficients, including shape parameters and variance, in each direction of the image using an asymmetric generalized Gaussian function to obtain index features in each direction; using a support vector machine as a regression model to obtain the image quality score; and comparing the image quality score with a set threshold. This invention solves the problem in the prior art that structural images have high resolution and can reflect structural morphological information, but weak metabolic and other functional information, while functional images have low resolution, can reflect functional metabolic information, but have difficulty reflecting structural information.

Description

Technical Field

[0001] This invention belongs to the field of medical image processing technology, and more specifically, relates to a multimodal image classification method, device, and computer-readable storage medium for auxiliary diagnosis of Parkinson's disease.

Background Technology

[0002] Because Parkinson's disease (PD) often presents with no obvious clinical symptoms, and its symptoms are often similar to those of other diseases, it is difficult to detect through routine examinations and is easily misdiagnosed. This means that by the time most PD patients are diagnosed, a large number of dopamine neurons in the substantia nigra and striatum have already died, the condition is very serious, and the golden period for treatment has long passed. Therefore, timely and effective detection of PD is crucial for patient treatment and disease progression control. In recent years, the increasing maturity of radiomics has led to its growing application in disease-aided diagnosis, achieving good results. Multimodal medical image fusion mainly refers to processing medical images of different modalities, utilizing the complementarity and redundancy between different types of information to improve image quality and retain specific features, obtaining more reliable and accurate lesion information than single-modal images.

[0003] Chinese invention patent CN113143246A discloses a Parkinson's disease assisted clinical decision-making system based on multimodal magnetic resonance imaging (MRI), including an input module, a feature extraction module, a feature fusion module, and a decision module. The input module includes MRI multimodal image input and sample data (demographic data and clinical data) input. The feature extraction module is used to extract features from the two modalities of data, including MRI image preprocessing and extracting feature data from regions of interest. The feature fusion module fuses the extracted image feature data, constructs a concatenated feature matrix X from the calculated DTI-ALPS values, and concatenates the sample data into a corresponding matrix Y. The decision module is used to learn the features, perform data regression and classification, and finally obtain the result.

[0004] Currently, pixel-level fusion and feature-level fusion are more commonly used, while decision-level fusion is less frequently used due to its high requirements for feature extraction and higher fusion cost. The images used in fusion are mainly divided into structural images and functional images. Structural images, such as MRI and CT, have high resolution and can reflect structural morphological information, but their metabolic and other functional information is weak. Functional images, such as DTI and PET, have low resolution and can reflect functional metabolic information, but it is difficult to reflect structural information. However, feature-level fusion often fuses the last level of features after convolution, and the amount of information loss in this process is often significant. Furthermore, this process is essentially a "black box," making it difficult to clearly interpret the specific meaning of the extracted information. In practical applications, research on Parkinson's disease images remains very limited.

Summary of the Invention

[0005] The present invention focuses on structural and functional images of Parkinson's disease (PD) and proposes a pixel-level fusion method and a feature-level fusion method for auxiliary diagnosis of PD, filling the gap in research on multimodal medical image fusion in the field of Parkinson's disease. In the pixel-level fusion method, structural and functional images are first rigorously registered while preserving the subject's brain structure and metabolic information, achieving the fusion of structural and functional information. Then, 3D Simple CNN is used to extract features from the fused images and perform final classification, enabling auxiliary diagnosis of Parkinson's disease. In the feature-level fusion method, structural and functional images are input into the parallel branches on the two sides of a 3D Multi-Scale CNN, where 3D Simple CNN performs step-by-step feature extraction. Features at the same level are concatenated with the fusion features of the previous level. The concatenated features are then processed in the fusion block to become the fusion features of that level. The fusion features then proceed to the next level, repeating the fusion process. This step-by-step processing extracts structural and functional information from the images more fully, reduces information loss, makes effective use of the various kinds of lesion information, and improves diagnostic results.

[0006] The detailed technical solution of this invention is as follows:

[0007] A multimodal image classification method for auxiliary diagnosis of Parkinson's disease, comprising:

[0008] S1. Acquire the image, perform local normalization on the image, and extract the local normalization coefficients of the image.

[0009] S2. Fit the local normalization coefficients using a generalized Gaussian distribution to obtain the shape parameters and variance;

[0010] S3. The normalization coefficients of the image in each direction, including shape parameters and variance, are fitted by an asymmetric generalized Gaussian function to obtain the index features of the image in each direction, including the horizontal direction, the vertical direction, and the diagonal direction.

[0011] S4. Using the support vector machine as a regression model, input the index features of the acquired image in each direction to obtain the image quality score.

[0012] Specifically, image quality assessment takes an image as input and outputs its quality score. There are three types of image quality assessment: full-reference assessment (requires a high-quality reference image), reduced-reference assessment (requires only partial reference information, such as a watermarked image), and no-reference assessment (requires no reference image, only the image to be assessed). Given that a large number of reference images generally cannot be provided for the image data, the present invention adopts no-reference assessment and chooses the classic BRISQUE method as the assessment approach.

[0013] S5. Compare the image quality score with the set threshold:

[0014] If the quality score of an image is greater than a set threshold, the image will be placed into the feature-level fusion classification branch for identification.

[0015] If the quality score of an image is less than or equal to a set threshold, the image will be placed into the pixel-level fusion classification branch for identification.
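The routing rule in step S5 can be illustrated with a minimal sketch. The function name `select_branch`, the branch labels, and the numeric values in the usage note are hypothetical; the source does not state a threshold value.

```python
from typing import Literal

Branch = Literal["feature_level_fusion", "pixel_level_fusion"]


def select_branch(quality_score: float, threshold: float) -> Branch:
    """Route an image by its quality score, following step S5."""
    # Strictly greater than the threshold: feature-level fusion branch.
    if quality_score > threshold:
        return "feature_level_fusion"
    # Less than or equal to the threshold: pixel-level fusion branch.
    return "pixel_level_fusion"
```

For example, with an assumed threshold of 0.5, `select_branch(0.7, 0.5)` selects the feature-level branch, while a score exactly equal to the threshold falls into the pixel-level branch.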

[0016] Furthermore, the images include MRI and DTI images. The MRI and DTI images are fused to obtain MRI-DTI images. Then, feature extraction and classification are performed on the MRI-DTI images to obtain classification and diagnostic results.

[0017] Furthermore, the pixel-level fusion classification branch includes pixel-level fusion and pixel-level classification;

[0018] Specifically, the pixel-level fusion method is as follows:

[0019] (a) Skull stripping of MRI images to obtain SS-MRI: Using the BET (Brain Extraction Tool) component of the FSL software, non-brain tissue such as the skull is removed from the MRI images by setting an appropriate fractional intensity threshold and the voxel coordinates of the center of the initial brain surface sphere, reducing noise and irrelevant volume;

[0020] (b) Standard position registration: Using the FLIRT (FMRIB's Linear Image Registration Tool) component, the SS-MRI is affinely registered to MNI152 space with trilinear interpolation to eliminate spatial differences between scanned subjects and obtain standard MNI-MRI images for subsequent operations;

[0021] (c) Mean-DTI image: Since DTI is a four-dimensional image, fslmaths is used to average the DTI volumes along the fourth dimension, yielding a mean image, Mean-DTI, with uniform dimensions;

[0022] (d) Skull stripping of Mean-DTI images: Skull stripping is performed on the Mean-DTI images obtained after mean processing to remove non-brain tissue and obtain SS-DTI;

[0023] (e) Register SS-DTI to MNI-MRI: Using the MNI-MRI obtained in (b) as a reference image, the SS-DTI is mapped to the MNI-MRI by spline interpolation to obtain MRI-DTI with MRI anatomy. That is, the image obtained in this step contains two types of lesion information.
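Steps (a) to (e) above might map onto FSL command lines roughly as in the sketch below. It only builds the argument lists and does not run FSL. The file names, the default threshold of 0.5, and the helper name `build_fusion_commands` are assumptions for illustration; the source does not give concrete parameter values.

```python
from typing import List, Optional, Tuple


def build_fusion_commands(
    mri: str,
    dti: str,
    mni_template: str,
    frac: float = 0.5,
    mri_center: Optional[Tuple[int, int, int]] = None,
) -> List[List[str]]:
    """Return FSL command lines for steps (a)-(e); nothing is executed."""
    bet_mri = ["bet", mri, "ss_mri", "-f", str(frac)]
    if mri_center is not None:
        # Optional centre of the initial brain surface sphere, in voxels.
        bet_mri += ["-c"] + [str(v) for v in mri_center]
    return [
        # (a) Skull stripping of the MRI image with BET -> SS-MRI.
        bet_mri,
        # (b) Affine (12 DOF) registration to MNI152, trilinear interpolation.
        ["flirt", "-in", "ss_mri", "-ref", mni_template, "-out", "mni_mri",
         "-dof", "12", "-interp", "trilinear"],
        # (c) Mean over the fourth dimension of the DTI series -> Mean-DTI.
        ["fslmaths", dti, "-Tmean", "mean_dti"],
        # (d) Skull stripping of Mean-DTI -> SS-DTI.
        ["bet", "mean_dti", "ss_dti", "-f", str(frac)],
        # (e) Register SS-DTI to MNI-MRI with spline interpolation -> MRI-DTI.
        ["flirt", "-in", "ss_dti", "-ref", "mni_mri", "-out", "mri_dti",
         "-interp", "spline"],
    ]
```

With FSL installed, each list could be passed to `subprocess.run(cmd, check=True)` in order, since every step consumes the output of an earlier one.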

[0024] Furthermore, the pixel-level classification method is as follows:

[0025] This paper proposes a 3D Simple CNN for the classification of fused images (MRI-DTI with MRI anatomical structures). Compared with other deep learning models, 3D Simple CNN has a simpler structure and fewer parameters, which can effectively prevent overfitting and other problems that often occur in 3D image training.

[0026] The 3D Simple CNN includes a feature extraction module and an output module. In the initial stage, the feature extraction module uses a 7×7×7 convolution to obtain the overall features of the image. Subsequently, it performs feature extraction step by step through K1 cascaded sampling convolution modules (k,c,s). Each sampling convolution module (k,c,s) includes a downsampling convolution module and a regular convolution module.

[0027] The downsampling convolution module is used to perform downsampling, increase the number of channels, and introduce batch normalization layers and ReLU activation functions to accelerate convergence and prevent gradient vanishing; the ordinary convolution module is used to increase the nonlinearity of the network and help the model learn complex feature representations.

[0028] In the output module, a GlobalAveragePooling layer and a dropout layer with a rate of 0.6 are added to reduce the number of parameters and prevent overfitting. Finally, the diagnostic results are output through FCLayer.
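To make the cascade concrete, the sketch below traces channel count and spatial size through the 7×7×7 stem and a list of sampling convolution modules (k,c,s). The padding of k // 2, the stem stride of 1, and the example (k,c,s) values are assumptions; the source does not specify them.

```python
from typing import List, Tuple


def conv_out(n: int, k: int, s: int) -> int:
    """Output length of a convolution with kernel k, stride s, padding k // 2."""
    p = k // 2
    return (n + 2 * p - k) // s + 1


def trace_simple_cnn(
    size: int,
    stem_channels: int,
    modules: List[Tuple[int, int, int]],
) -> List[Tuple[int, int]]:
    """Return (channels, spatial size) after the stem and after each module."""
    # Stem: 7x7x7 convolution, assumed to use stride 1.
    size = conv_out(size, 7, 1)
    trace = [(stem_channels, size)]
    for k, c, s in modules:
        size = conv_out(size, k, s)  # downsampling convolution (stride s)
        size = conv_out(size, k, 1)  # regular convolution (stride 1)
        trace.append((c, size))
    return trace
```

Under these assumptions, a 64-voxel cube passed through four modules with stride 2 is halved at every module, so `trace_simple_cnn(64, 8, [(3, 16, 2), (3, 32, 2), (3, 64, 2), (3, 128, 2)])` ends at a 4-voxel cube with 128 channels before global average pooling.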

[0029] 3D Simple CNN has a simple structure and serves as a baseline network for evaluating image classification methods.

[0030] Furthermore, this paper proposes a multimodal 3D image feature-level fusion classification method—3D Multi-Scale CNN—to balance high-level and low-level features and improve the comprehensiveness and effectiveness of feature fusion for the feature domain of Parkinson's disease images.

[0031] The process of placing the image into a feature-level fusion classification branch for recognition, namely the feature-level fusion classification method (3D Multi-Scale CNN), specifically includes:

[0032] First, the MRI and DTI images are preprocessed. The MRI images are sequentially subjected to skull stripping and registration, and the DTI images are sequentially subjected to mean processing, skull stripping, and registration. Then, the MRI and DTI images are input into a 3D Multi-Scale CNN for feature extraction, fusion, and classification.

[0033] The 3D Multi-Scale CNN has a three-parallel structure, meaning the network has three columns. The first and third columns are the two sides, namely the left and right feature extraction modules, which are used to extract features from MRI and DTI, respectively. The second column is the middle part, the fusion part, which fuses the features from MRI and DTI. Specifically, for the feature extraction part of MRI and DTI, the feature extraction module of the 3D Multi-Scale CNN initially uses 7×7×7 convolutions to obtain the overall features of the image, and then performs feature extraction step by step through K2 cascaded sampling convolution modules (k,c,s).

[0034] The feature extraction part of the 3D Multi-Scale CNN is based on 3D Simple CNN, and the fusion part of 3D Multi-Scale CNN mainly combines the extracted different types of data to obtain richer feature representations.

[0035] Specifically, the features extracted by the feature extraction module in the 3D Multi-Scale CNN are kept separately for the fusion operation in the second column. That is, the features extracted by each sampling convolution module are kept separately and placed in a waiting queue. After all the sampling convolution modules of the 3D Multi-Scale CNN have completed feature extraction, the features in the waiting queue are passed into the second column in order of hierarchy. In the fusion operation in the second column, the features of the same level are first obtained separately, and then the obtained features are concatenated with the fusion features of the previous level. The concatenated features are then further fused by the fusion module.

[0036] Regarding the data flow of each layer of the 3D Multi-Scale CNN: in the fusion operation of the i-th layer, for example, the two types of features of the i-th layer are first obtained separately and are then concatenated with the fusion features obtained from the (i-1)-th layer. The concatenated features are fed into the fusion module to obtain the fusion features of this layer. The detailed workflow is as follows:

[0037] (1) MRI and DTI respectively use the Conv of 3D Simple CNN to perform first-level feature extraction. The two types of features extracted are concatenated by the concat operation. The concatenated features are sent to the feature fusion module to fuse the two types of features and obtain the first-level fused features of the two types of images.

[0038] (2) MRI and DTI perform second-level feature extraction through module 1 of 3D Simple CNN. The extracted two types of features are concatenated with the fusion features obtained in process (1). The concatenated features are sent to the fusion module to perform fusion of the two types of features at different levels, and obtain the first-level and second-level fusion features of the two types of images.

[0039] (3) Repeat the above steps to obtain the fusion features of the first, second and third layers of the two types of images;

[0040] (4) Continue to repeat the above steps to obtain the fusion features of all layers of the two types of images;

[0041] (5) Finally, the features of all layers of the two types of images are used to obtain the classification results through the output module.

[0042] Specifically, the formula for the data flow direction in a parallel structure is as follows:

[0043]

[0044] In formulas (1)-(3), i denotes the layer of the network, the concatenation operator denotes a serial (concatenation) operation, and the two feature terms denote the MRI and DTI feature representations of the i-th layer. Fusion(x) denotes the fusion operation, which combines the fusion features of the previous layer, in concatenated form, with the two new features of the current layer.
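The formulas themselves do not appear in this text. One form consistent with the description above is sketched here; the symbols M_i and D_i (MRI and DTI features of layer i), F_i (fusion features of layer i), x_i (concatenated input), and ⊕ (concatenation) are assumed names, not the original notation.

```latex
\begin{align}
  F_1 &= \mathrm{Fusion}\left(M_1 \oplus D_1\right) \\
  x_i &= F_{i-1} \oplus M_i \oplus D_i, \qquad i \ge 2 \\
  F_i &= \mathrm{Fusion}\left(x_i\right), \qquad i \ge 2
\end{align}
```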

[0045] In the feature fusion part of the 3D Multi-Scale CNN, a 3D dual-channel pooling method (3D global average pooling and 3D global max pooling) is first used to obtain representative and common features from the two types of images. The two pooled features are then added position-wise to increase the robustness of the features. Next, a convolution (Conv3D) is used to reduce aliasing and channel dimensionality, followed by batch normalization. The sigmoid function is then used to obtain feature weights, and finally the feature weights are multiplied by the initial features obtained in this module to weight the features at this level. The specific process is shown in the following formula:

[0046]

[0047] In formula (4), F_c denotes the concatenated features, · denotes the channel-wise multiplication operation, σ denotes the sigmoid function, GAP(F) and GMP(F) denote global average pooling and global max pooling of features F, respectively, and the addition operator denotes position-wise addition. Through the above fusion process, the features at each level are fine-tuned, highlighting the influence of important parts and weakening the influence of irrelevant parts. This enhances the model's ability to capture key features at each level, so that it focuses on the most informative and representative regions of the image. With this hierarchical optimization strategy, the model can more accurately extract and use important image features when performing classification and diagnostic tasks, thereby improving the model's overall generalization ability.
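The fusion module described above can be sketched in NumPy for a single sample. This is an illustration under stated assumptions, not the patented implementation: the Conv3D is modeled as a 1×1×1 convolution that keeps the channel count (so the weights can multiply F_c), batch normalization is applied in inference form with given running statistics, and the names `fusion_block`, `w`, `bn_mean`, and `bn_var` are hypothetical.

```python
import numpy as np


def fusion_block(
    f_c: np.ndarray,
    w: np.ndarray,
    bn_mean: np.ndarray,
    bn_var: np.ndarray,
    eps: float = 1e-5,
) -> np.ndarray:
    """Reweight concatenated features f_c of shape (C, D, H, W) per channel."""
    # 3D dual-channel pooling: global average and global max per channel.
    gap = f_c.mean(axis=(1, 2, 3))
    gmp = f_c.max(axis=(1, 2, 3))
    # Position-wise addition of the two pooled descriptors.
    pooled = gap + gmp
    # A 1x1x1 Conv3D on a pooled (C, 1, 1, 1) tensor is a channel-mixing matrix.
    mixed = w @ pooled
    # Inference-form batch normalization (scale 1, shift 0 assumed).
    normed = (mixed - bn_mean) / np.sqrt(bn_var + eps)
    # Sigmoid maps the normalized values to per-channel weights in (0, 1).
    weights = 1.0 / (1.0 + np.exp(-normed))
    # Channel-wise multiplication with the initial features.
    return f_c * weights[:, None, None, None]
```

Because every weight lies strictly between 0 and 1, the block can only attenuate channels relative to one another; it never changes the shape of the features it receives.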

[0048] This invention also discloses a multimodal image classification device for Parkinson's disease-assisted diagnosis, the device comprising:

[0049] A processor;

[0050] A memory on which computer programs that can run on the processor are stored;

[0051] The computer program, when executed by the processor, implements the steps of the multimodal image classification method for Parkinson's disease-assisted diagnosis.

[0052] A computer-readable storage medium storing a multimodal image classification program for Parkinson's disease-assisted diagnosis, wherein when the multimodal image classification program for Parkinson's disease-assisted diagnosis is executed by the processor, the multimodal image classification program for Parkinson's disease-assisted diagnosis implements the steps of the multimodal image classification method for Parkinson's disease-assisted diagnosis.

[0053] Compared with the prior art, the beneficial effects of the present invention are as follows:

[0054] 1. This invention proposes a multimodal image classification method, device, and computer-readable storage medium for auxiliary diagnosis of Parkinson's disease. In the pixel-level fusion process, the processed functional image and structural image are matched in terms of function and structure to obtain a comprehensive image that simultaneously contains information on two types of lesions. The fusion process is interpretable.

[0055] 2. This invention proposes a multimodal image classification method, device, and computer-readable storage medium for Parkinson's disease assisted diagnosis. In the pixel-level fusion classification process, a medical image training model (3D Simple CNN) is proposed to balance the relationship between model complexity and feature capture capability, thereby achieving feature extraction and diagnostic classification.

[0056] 3. This invention proposes a multimodal image classification method, device, and computer-readable storage medium for Parkinson's disease auxiliary diagnosis. In feature-level fusion, a fusion method for Parkinson's images, 3D Multi-Scale CNN, is proposed. This method performs step-by-step feature extraction and processing on images of different modalities, making full use of the various types of information contained in Parkinson's images and solving the problem of large information loss in traditional feature-level fusion.

[0057] 4. This invention proposes a multimodal image classification method, device, and computer-readable storage medium for Parkinson's disease assisted diagnosis. The present invention designs a three-parallel structure for the 3D Multi-Scale CNN, with 3D Simple CNNs on both sides and a fusion module in the middle. Images from different modalities enter the network from the two sides, and the 3D Simple CNNs extract features. The fusion module in the middle performs feature fusion and information transfer at the same level, and the model ultimately outputs the diagnostic results through the fusion module in the middle. In the fusion module of the 3D Multi-Scale CNN, 3D dual-channel pooling is also used to improve the model's ability to capture lesion areas and increase model stability. Through sigmoid and channel-wise multiplication, the model can learn the most important features at each level, thereby improving the final diagnostic effect.

Attached Figure Description

[0058] Figure 1 is the overall flowchart described in Example 1.

[0059] Figure 2 is a schematic diagram of the pixel-level fusion classification method described in Example 1.

[0060] Figure 3 is a schematic diagram of the pixel-level fusion method described in Example 1.

[0061] Figure 4 is a schematic diagram of the 3D Simple CNN structure described in Example 1.

[0062] Figure 5 is a schematic diagram of the feature-level fusion classification method described in Example 1.

[0063] Figure 6 is a schematic diagram of the data flow of each layer of the 3D Multi-Scale CNN described in Example 1.

[0064] Figure 7 is a schematic diagram of the feature fusion module structure of the 3D Multi-Scale CNN described in Example 1.

Detailed Implementation

[0065] The present invention will be described in further detail below with reference to the accompanying drawings and embodiments, but is not limited thereto.

[0066] Example 1

[0067] The present invention proposes a multimodal fusion method for Parkinson's disease images at both the pixel and feature levels, enabling intelligent diagnosis of Parkinson's disease from a multimodal image perspective and filling the gap in research on multimodal image fusion in the field of Parkinson's disease. Furthermore, the BRISQUE algorithm is used to evaluate image quality and select an appropriate fusion classification method to improve the overall fusion classification effect. Specifically, this includes:

[0068] S1. Acquire the image, perform local normalization on the image, and extract the local normalization coefficients of the image.

[0069] The images include MRI and DTI images. In this embodiment, MRI images, which are more commonly used in real life, are selected for structural images, and 4D DTI images are selected for functional images. Based on these two types of Parkinson's images, the problems existing in the above-mentioned multimodal medical image fusion are studied.

[0070] S2. Fit the local normalization coefficients using a generalized Gaussian distribution to obtain the shape parameters and variance.

[0071] S3. The normalization coefficients of the image in each direction, including shape parameters and variance, are fitted by an asymmetric generalized Gaussian function to obtain the index features of the image in each direction, including the horizontal direction, the vertical direction, and the diagonal direction.

[0072] S4. Using the support vector machine as a regression model, input the index features of the acquired image in each direction to obtain the image quality score.

[0073] Specifically, image quality assessment takes an image as input and outputs its quality score. There are three types of image quality assessment: full-reference assessment (requires a high-quality reference image), reduced-reference assessment (requires only partial reference information, such as a watermarked image), and no-reference assessment (requires no reference image, only the image to be assessed). Given that a large number of reference images generally cannot be provided for the image data, the present invention adopts no-reference assessment and chooses the classic BRISQUE method as the assessment approach.

[0074] S5. Compare the image quality score with the set threshold:

[0075] If the quality score of an image is greater than a set threshold, the image will be placed into the feature-level fusion classification branch for classification and recognition.

[0076] If the quality score of an image is less than or equal to a set threshold, the image will be placed into a pixel-level fusion classification branch for classification and recognition.
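Steps S1 and S2 above can be sketched for a single 2D slice as follows. This is a minimal illustration of the common BRISQUE formulation, not the exact procedure of this embodiment: the 7×7 Gaussian window, the stabilizing constant of 1, the moment-matching search grid, and the function names are assumptions, and the source does not say how 3D volumes are sliced.

```python
import math

import numpy as np


def gaussian_kernel(size: int = 7, sigma: float = 7 / 6) -> np.ndarray:
    """Normalized 1D Gaussian window used for the local statistics."""
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def local_filter(img: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Separable Gaussian filtering with edge padding; output matches img."""
    padded = np.pad(img, len(k) // 2, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def mscn(img: np.ndarray, c: float = 1.0) -> np.ndarray:
    """S1: local normalization coefficients (img - mu) / (sigma + c)."""
    img = img.astype(np.float64)
    k = gaussian_kernel()
    mu = local_filter(img, k)
    var = local_filter(img * img, k) - mu * mu
    sigma = np.sqrt(np.maximum(var, 0.0))
    return (img - mu) / (sigma + c)


def fit_ggd(x: np.ndarray) -> tuple:
    """S2: moment-matching fit of a zero-mean generalized Gaussian.

    Returns (shape parameter, variance).
    """
    x = np.ravel(x)
    gammas = np.arange(0.2, 10.0, 0.001)
    # Theoretical ratio E[x^2] / (E|x|)^2 for each candidate shape.
    r = np.array([
        math.gamma(1 / g) * math.gamma(3 / g) / math.gamma(2 / g) ** 2
        for g in gammas
    ])
    rho = np.mean(x ** 2) / np.mean(np.abs(x)) ** 2
    shape = gammas[np.argmin(np.abs(r - rho))]
    return float(shape), float(np.mean(x ** 2))
```

As a sanity check on the fit, samples drawn from a standard normal distribution should yield a shape parameter close to 2 and a variance close to 1, since the Gaussian is the generalized Gaussian with shape 2.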

[0077] Specifically, the pixel-level fusion classification branch includes pixel-level fusion and pixel-level classification. As shown in Figure 2, MRI and DTI images are fused to obtain MRI-DTI, and the MRI-DTI images are then fed into 3D Simple CNN for feature extraction and classification to obtain classification and diagnosis results.

[0078] Specifically, the pixel-level fusion method is shown in Figure 3:

[0079] (a) Skull stripping of MRI images: Using the BET (Brain Extraction Tool) component of the FSL software, non-brain tissue such as the skull is removed by setting an appropriate fractional intensity threshold and the voxel coordinates of the center of the initial brain surface sphere, reducing noise and irrelevant volume and obtaining SS-MRI.

[0080] (b) Standard position registration: Using the FLIRT (FMRIB's Linear Image Registration Tool) component, the SS-MRI is affinely registered to MNI152 space with trilinear interpolation to eliminate spatial differences between scanned subjects and obtain standard MNI-MRI images for subsequent operations;

[0081] (c) Mean-DTI: Since DTI is a four-dimensional image, fslmaths is used to average the DTI volumes along the fourth dimension, yielding a mean image, Mean-DTI, with uniform dimensions.

[0082] (d) Skull stripping of Mean-DTI images: Skull stripping is performed on the mean-processed DTI images to remove non-brain tissue and obtain SS-DTI.

[0083] (e) Register SS-DTI to MNI-MRI: Using the MNI-MRI obtained in (b) as a reference image, the SS-DTI is mapped to the MNI-MRI by spline interpolation to obtain MRI-DTI with MRI anatomy. That is, the image obtained in this step contains two types of lesion information.

[0084] Specifically, the pixel-level classification method is as follows:

[0085] Currently, deep learning is widely used in medical imaging due to its excellent performance. However, in previous 2D CNN methods, 3D images are processed slice by slice, which loses anatomical information in directions orthogonal to the 2D plane. A 3D CNN processes the image as a whole, preserving continuous information in the spatial dimensions and improving the overall performance of the model. The present invention proposes a 3D Simple CNN for fused-image classification, balancing dataset size, model complexity, and feature capture capability when a 3D CNN is applied to 3D data. Compared with other deep learning models, 3D Simple CNN has a simpler structure and fewer parameters, which helps prevent the overfitting commonly encountered in 3D image training. The specific structure is shown in Figure 4.

[0086] The 3D Simple CNN includes a feature extraction module and an output module. In the initial stage, the feature extraction module uses a 7×7×7 convolution (Conv 7x7x7) to obtain the overall features of the image. Subsequently, it performs feature extraction step by step through K1 cascaded sampling convolution modules (k,c,s). Preferably, K1=4. Each sampling convolution module (k,c,s) includes a downsampling convolution module and a regular convolution module.

[0087] The downsampling convolution module is used to perform downsampling, increase the number of channels, and introduce batch normalization layers and ReLU activation functions to accelerate convergence and prevent gradient vanishing; the ordinary convolution module is used to increase the nonlinearity of the network and help the model learn complex feature representations.

[0088] In the output module, a GlobalAveragePooling layer and a dropout layer with a rate of 0.6 are added to reduce the number of parameters and prevent overfitting. Finally, the diagnostic results are output through FCLayer.
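The output module described above (global average pooling, dropout at rate 0.6, fully connected layer) can be sketched in NumPy for one sample. The function name, the weight shapes, and the use of inverted dropout scaling during training are assumptions; the source only states the layer types and the dropout rate.

```python
from typing import Optional

import numpy as np


def output_module(
    features: np.ndarray,
    fc_w: np.ndarray,
    fc_b: np.ndarray,
    training: bool = False,
    rate: float = 0.6,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Map features of shape (C, D, H, W) to class logits."""
    # Global average pooling collapses each channel to one value: (C,).
    pooled = features.mean(axis=(1, 2, 3))
    if training:
        if rng is None:
            rng = np.random.default_rng()
        # Inverted dropout: drop with probability `rate`, rescale the rest.
        keep = rng.random(pooled.shape) >= rate
        pooled = pooled * keep / (1.0 - rate)
    # Fully connected layer producing one logit per class.
    return fc_w @ pooled + fc_b
```

At inference time dropout is inactive, so the result depends only on the pooled channel means and the fully connected weights.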

[0089] 3D Simple CNN has a simple structure and serves as a baseline network for evaluating image fusion methods.

[0090] Furthermore, past feature fusion methods primarily relied on deep learning models to extract high-level features from images, and then used various algorithms to fuse and classify these features. However, this fusion approach only operates on high-level features, while in Parkinson's disease medical images, in addition to tissue lesions, there are also structural changes. As deep learning models become increasingly sophisticated, the expression of high-level features becomes more prominent, while low-level structural information gradually weakens. Therefore, when conventional feature-level fusion methods are applied to Parkinson's disease images, they often fail to capture structural changes in the images. Thus, Parkinson's disease images require a more comprehensive fusion strategy to simultaneously focus on both tissue and structural features. To address this, this paper proposes a multimodal 3D image fusion method—3D Multi-Scale CNN—to balance high-level and low-level features, improving the comprehensiveness and effectiveness of feature fusion.

[0091] The feature-level fusion classification method (3D Multi-Scale CNN) includes:

[0092] First, standard preprocessing operations were performed on the MRI and DTI images. The MRI images underwent skull stripping and registration sequentially, while the DTI images underwent mean processing, skull stripping, and registration sequentially. Then, the MRI and DTI images were input into a 3D Multi-Scale CNN for feature extraction, fusion, and classification.

[0093] The 3D Multi-Scale CNN consists of three parallel parts. The first and second columns are the two side parts, namely the left and right feature extraction modules, which are used for feature extraction. The second column is the middle part, namely the fusion part, which is used for feature fusion.

[0094] The feature extraction part is based on 3D Simple CNN, and the fusion part mainly combines the extracted data of different types to obtain richer feature representations. The specific structure is as follows: Figure 5 As shown.

[0095] The 3D Multi-Scale CNN has a three-parallel structure, meaning the network has three columns. The first and third columns are used for feature extraction from MRI and DTI, respectively, while the second column focuses on fusing the features from MRI and DTI. For the feature extraction of MRI and DTI, the 3D Multi-Scale CNN's feature extraction module initially uses 7×7×7 convolutions to obtain the overall features of the image, and then performs feature extraction step by step through K2 cascaded sampling convolutional modules (k,c,s). Preferably, K2=3. Figure 5 As shown, Module 1, Module 2, and Module 3 are all sampling convolution modules;

[0096] Here, a downsampling convolution module and a regular convolution module of the 3D Simple CNN are grouped into one module, namely the sampling convolution module, as illustrated by module N in the bottom right corner of Figure 5.

[0097] Specifically, the features extracted by the feature extraction module in the 3D Multi-Scale CNN are kept separately for the fusion operation in the second column. That is, the features extracted by each sampling convolution module are kept separately and placed in a waiting queue. After all the sampling convolution modules of the 3D Multi-Scale CNN have completed feature extraction, the features in the waiting queue are passed into the second column in sequence according to their hierarchy. In the fusion operation of the second column, features at the same level are first obtained separately, and then the obtained features are concatenated with the fusion features of the previous level. The concatenated features are then further fused by the fusion module.

[0098] The data flow of each layer in the 3D Multi-Scale CNN is shown in Figure 6. For example, in the fusion operation at layer i, the two types of features at layer i are first obtained separately and are then concatenated with the fusion features obtained from layer (i-1) to form the concatenated features. The concatenated features are fed into the fusion module to obtain the fusion features of this layer. The detailed workflow is as follows:

[0099] (1) MRI and DTI respectively use Conv of 3D Simple CNN to extract the first level of features. The two types of features extracted are concatenated by the concat operation. The concatenated features are sent to the feature fusion module to fuse the two types of features and obtain the first level of fused features of the two types of images.

[0100] (2) MRI and DTI perform second-level feature extraction through module 1 of 3D Simple CNN. The extracted two types of features are concatenated with the fusion features obtained in process (1). The concatenated features are sent to the fusion module to perform fusion of the two types of features at different levels, and the first and second level fusion features of the two types of images are obtained.

[0101] (3) Repeat the above steps to obtain the fusion features of the first, second and third layers of the two types of images.

[0102] (4) Continue to repeat the above steps to obtain the fusion features of all layers of the two types of images.

[0103] (5) Finally, the features of all layers of the two types of images are used to obtain the classification results through the output module.
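
Steps (1) to (5) can be traced with a small loop. In this sketch a feature is represented only by the labels of what it contains, and `fuse` is a placeholder for the fusion module, so the code shows the order of concatenation and fusion rather than any real computation; all function names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

Feature = Tuple[str, ...]  # a feature is represented only by the labels it contains


def extract_levels(modality: str, num_levels: int) -> Deque[Feature]:
    """Stand-in for one extraction column: queue one labelled feature per level."""
    return deque((f"{modality}{i}",) for i in range(1, num_levels + 1))


def fuse(concatenated: Feature) -> Feature:
    """Placeholder for the fusion module: only records what was fused."""
    return concatenated


def hierarchical_fusion(num_levels: int) -> List[Feature]:
    mri_queue = extract_levels("MRI", num_levels)
    dti_queue = extract_levels("DTI", num_levels)
    fused_per_level: List[Feature] = []
    previous: Feature = ()  # no fused feature exists before level 1
    while mri_queue:
        # Take the same-level features, then concatenate with the previous fusion.
        concatenated = mri_queue.popleft() + dti_queue.popleft() + previous
        previous = fuse(concatenated)
        fused_per_level.append(previous)
    return fused_per_level


# One initial convolution plus three sampling convolution modules: four levels.
trace = hierarchical_fusion(4)
for level, content in enumerate(trace, start=1):
    print(level, content)
```

The trace shows that the fused feature at each level carries information from every earlier level, which is what the output module receives in step (5).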

[0104] Specifically, the formula for the data flow direction in a parallel structure is as follows:

[0105]
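
Formulas (1)-(3) are not reproduced in this text. Based on the workflow in paragraphs [0098] to [0103], one consistent way to write them is sketched below. The symbols M_i and D_i (MRI and DTI features of layer i), C_i (concatenated features), F_i (fused features), the operator name Concat, and the assignment of the numbers (1)-(3) are notation assumed here, not the original symbols.

```latex
\begin{align}
C_1 &= \mathrm{Concat}(M_1, D_1) \tag{1} \\
C_i &= \mathrm{Concat}(M_i, D_i, F_{i-1}), \quad i \ge 2 \tag{2} \\
F_i &= \mathrm{Fusion}(C_i) \tag{3}
\end{align}
```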

[0106] In formulas (1)-(3), i denotes the layer of the network, the concatenation operator denotes the concatenation (series) operation, and the two feature terms denote the MRI and DTI feature representations of the i-th layer. Fusion(x) denotes the fusion operation, which fuses the concatenation of the previous layer's fusion features with the two new features of the current layer.

[0107] In the fusion module, a 3D dual-channel pooling method is first used to obtain representative and common features from the two types of images. These two features are then added at the same position to increase feature robustness. Next, convolution is used to reduce aliasing and channel dimensionality, and a sigmoid function is used to obtain feature weights. Finally, these feature weights are multiplied by the features initially received by this module, assigning a weight to the features at this level. The specific process is shown in Figure 7, and its formula is expressed as follows:

[0108]
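
Formula (4) is not reproduced in this text. A form consistent with the description in paragraph [0107] is sketched below. The output symbol F', the operator name Conv, and the symbol ⊕ for addition at the same position are assumed notation; claim 3 additionally applies a normalization layer before the sigmoid, which this form omits.

```latex
\begin{equation}
F' = F_c \cdot \sigma\bigl(\mathrm{Conv}\bigl(\mathrm{GAP}(F_c) \oplus \mathrm{GMP}(F_c)\bigr)\bigr) \tag{4}
\end{equation}
```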

[0109] In formula (4), F_c denotes the concatenated features, · denotes the channel-wise multiplication operation, σ denotes the sigmoid function, GAP(F) and GMP(F) denote global average pooling and global max pooling of features F, respectively, and the addition operator denotes addition at the same position. Through this fusion process, the features at each level are fine-tuned, strengthening the influence of important parts and weakening the influence of irrelevant parts. This enhances the model's ability to capture key features at each level and to focus on the most informative and representative regions of the image. With this hierarchical optimization strategy, the model can extract and use important image features more accurately in classification and diagnostic tasks, thereby improving its overall generalization ability.
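
A minimal sketch of this fusion step follows, written with plain Python lists to stay dependency-free. Each channel is stored as a flattened list of voxels, and `weights` is a hypothetical matrix standing in for the learned convolution. The sketch keeps the channel count unchanged so that the weights can be multiplied back onto the input, whereas the description also uses the convolution to reduce channel dimensionality; how the two are reconciled is not stated in the source.

```python
import math
from typing import List

Channel = List[float]    # one channel's voxels, flattened from D x H x W
Feature = List[Channel]  # C channels


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fusion_module(feature: Feature, weights: List[List[float]]) -> Feature:
    """Reweight each channel of the concatenated feature.

    `weights` is a hypothetical C x C matrix standing in for the learned
    convolution applied to the pooled descriptor.
    """
    # 3D dual-channel pooling: one average and one maximum per channel.
    gap = [sum(ch) / len(ch) for ch in feature]
    gmp = [max(ch) for ch in feature]
    # Add the two descriptors at the same position.
    pooled = [a + m for a, m in zip(gap, gmp)]
    # A convolution on a 1x1x1 descriptor reduces to a matrix-vector product.
    mixed = [sum(w * p for w, p in zip(row, pooled)) for row in weights]
    # Sigmoid turns the mixed descriptor into per-channel weights in (0, 1).
    gates = [sigmoid(v) for v in mixed]
    # Multiply every voxel of a channel by that channel's weight.
    return [[g * v for v in ch] for g, ch in zip(gates, feature)]


# Two channels of four voxels each; identity weights are illustrative only.
concatenated = [[0.0, 1.0, 2.0, 3.0], [-1.0, -1.0, -1.0, -1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
reweighted = fusion_module(concatenated, identity)
print(reweighted)
```

In this toy input the first channel has a large pooled response and keeps almost all of its magnitude, while the second channel is scaled down, which mirrors the stated goal of emphasizing important parts and weakening irrelevant ones.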

[0110] Embodiment 2

[0111] This embodiment provides a multimodal image classification device for Parkinson's disease auxiliary diagnosis;

[0112] The device includes:

[0113] A processor;

[0114] A memory on which computer programs that can run on the processor are stored;

[0115] The computer program, when executed by the processor, implements the steps of the multimodal image classification method for Parkinson's disease auxiliary diagnosis.

[0116] This embodiment also provides a computer-readable storage medium for the multimodal image classification method for Parkinson's disease auxiliary diagnosis. The computer-readable storage medium stores a multimodal image classification program for Parkinson's disease auxiliary diagnosis. When a processor executes the program, it implements the steps of the multimodal image classification method for Parkinson's disease auxiliary diagnosis.

[0117] Obviously, the above embodiments of the present invention are merely examples for clearly illustrating the technical solutions of the present invention, and are not intended to limit the specific implementation of the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the claims of the present invention should be included within the protection scope of the claims of the present invention.

Claims

1. A multimodal image classification method for Parkinson's disease auxiliary diagnosis, characterized in that it comprises:
S1. Acquiring an image, performing local normalization on the image, and extracting the local normalization coefficients of the image;
S2. Fitting the local normalization coefficients using a generalized Gaussian distribution to obtain the shape parameters and variance;
S3. Fitting the normalization coefficients of the image in each direction, including the shape parameters and variance, with an asymmetric generalized Gaussian function to obtain the index features of the image in each direction, the directions including the horizontal, vertical, and diagonal directions;
S4. Using a support vector machine as a regression model, inputting the index features of the acquired image in each direction to obtain an image quality score;
S5. Comparing the image quality score with a set threshold: if the quality score of the image is greater than the set threshold, the image is placed into the feature-level fusion classification branch for recognition; if the quality score of the image is less than or equal to the set threshold, the image is placed into the pixel-level fusion classification branch for recognition;
The images include MRI and DTI images. The MRI and DTI images are fused to obtain MRI-DTI images, and feature extraction and classification are then performed on the MRI-DTI images to obtain classification and diagnostic results.
The pixel-level fusion classification branch includes pixel-level fusion and pixel-level classification;
The pixel-level fusion method is as follows:
(a) Skull stripping is performed on the MRI images to obtain SS-MRI;
(b) The SS-MRI is affinely transformed to the MNI152 space to eliminate spatial differences between scanned subjects and obtain standard MNI-MRI images;
(c) The DTI volumes are averaged using fslmaths to obtain the mean image Mean-DTI along this dimension;
(d) Skull stripping is performed on the Mean-DTI obtained after averaging to remove excess tissue, resulting in SS-DTI;
(e) Using the MNI-MRI obtained in (b) as a reference image, the SS-DTI is mapped to the MNI-MRI using the spline interpolation method to obtain the MRI-DTI with MRI anatomical structure;
The pixel-level classification includes: classifying the MRI-DTI with MRI anatomical structure using the 3D Simple CNN;
The 3D Simple CNN includes a feature extraction module and an output module. In the initial stage, the feature extraction module uses a 7×7×7 convolution to obtain the overall features of the image and then performs feature extraction step by step through K1 cascaded sampling convolution modules, wherein each sampling convolution module includes a downsampling convolution module and a regular convolution module. The downsampling convolution module performs downsampling, increases the number of channels, and introduces a batchnorm layer and a ReLU activation function. In the output module, a GlobalAveragePooling layer and a dropout layer with a rate of 0.6 are added to reduce the number of parameters and prevent overfitting. Finally, the diagnostic results are output through an FC layer.
The step of placing the image into the feature-level fusion classification branch for recognition specifically includes: first, the MRI and DTI images are preprocessed.
Specifically, the MRI images are sequentially subjected to skull stripping and registration, and the DTI images are sequentially subjected to mean averaging, skull stripping, and registration. Then, the processed MRI and DTI images are input into a 3D Multi-Scale CNN for feature extraction, fusion, and classification. The 3D Multi-Scale CNN has a three-column parallel structure. The first and third columns are the two sides, namely the left and right feature extraction modules, which extract features from MRI and DTI, respectively. The second column is the middle part, namely the fusion part, which fuses the features from MRI and DTI. The features extracted by each sampling convolution module are kept separately for the fusion operation in the second column; that is, they are kept separately and placed in a waiting queue. For feature extraction from MRI and DTI, the feature extraction module of the 3D Multi-Scale CNN first uses a 7×7×7 convolution to obtain the overall features of the image and then performs feature extraction step by step through K2 cascaded sampling convolution modules (k,c,s). After all modules of the 3D Simple CNN have completed feature extraction, the features in the waiting queue are passed into the second column in order of hierarchy. In the fusion operation of the second column, features of the same level are first obtained separately, and the obtained features are then concatenated with the fusion features of the previous level. The concatenated features are then further fused through the fusion module.

2. The multimodal image classification method for Parkinson's disease auxiliary diagnosis according to claim 1, characterized in that, The specific processing steps for each layer of the 3D Multi-Scale CNN include: (1) MRI and DTI respectively use the Conv of 3D Simple CNN to perform first-level feature extraction. The two types of features extracted are concatenated by the concat operation. The concatenated features are sent to the feature fusion module to fuse the two types of features and obtain the first-level fused features of the two types of images. (2) MRI and DTI perform second-level feature extraction through module 1 of 3D Simple CNN. The extracted two types of features are concatenated with the fusion features obtained in (1). The concatenated features are sent to the fusion module to perform fusion of the two types of features at different levels, and the first and second level fusion features of the two types of images are obtained. (3) Repeat the above steps to obtain the fusion features of the first, second and third layers of the two types of images; (4) Continue to repeat the above steps to obtain the fusion features of all layers of the two types of images; (5) Finally, the features of all layers of the two types of images are used to obtain the classification results through the output module.

3. The multimodal image classification method for Parkinson's disease auxiliary diagnosis according to claim 2, characterized in that, in the feature fusion part of the 3D Multi-Scale CNN, the representative and common features of the two types of images are first obtained using the 3D dual-channel pooling method; these two features are then added at the same position; convolution is then used to reduce aliasing and channel dimensionality; normalization is then performed using a normalization layer, and a sigmoid function is used to obtain feature weights; finally, the feature weights are multiplied by the features initially obtained in this module to assign feature weights to the features at this level. The specific process is expressed by formula (4). In formula (4), F_c denotes the features after concatenation, · denotes the channel-by-channel multiplication operation, σ denotes the sigmoid function, GAP(F) and GMP(F) denote global average pooling and global max pooling of feature F, respectively, and the addition operator denotes addition at the same position.

4. The multimodal image classification method for Parkinson's disease auxiliary diagnosis according to claim 3, characterized in that, The 3D dual-channel pooling method includes 3D global average pooling and 3D global max pooling.

5. A multimodal image classification device for Parkinson's disease assisted diagnosis, characterized in that the device includes: a processor; and a memory on which computer programs that can run on the processor are stored; wherein the computer program, when executed by the processor, implements the steps of the multimodal image classification method for Parkinson's disease assisted diagnosis as described in any one of claims 1 to 4.

6. A computer-readable storage medium, characterized in that: The computer-readable storage medium stores a multimodal image classification program for Parkinson's disease auxiliary diagnosis, which, when executed by a processor, implements the steps of the multimodal image classification method for Parkinson's disease auxiliary diagnosis as described in any one of claims 1 to 4.