A Multimodal Brain Tumor Image Segmentation Method Based on Metasurface Encoders

By combining an electromagnetic metasurface encoder and a deep learning network, the problems of large data volume and high complexity in MRI image processing were solved, achieving efficient and accurate brain tumor image segmentation, and improving diagnostic efficiency and computational resource utilization.

CN117635949B · Active · Publication Date: 2026-06-30 · ZHEJIANG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ZHEJIANG UNIV
Filing Date
2023-12-12
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies for processing MRI images suffer from problems such as large data volume, high complexity, high computational resource consumption, and susceptibility to subjective factors, resulting in low efficiency in brain tumor diagnosis.

Method used

By combining an electromagnetic metasurface encoder and a deep learning network, MRI data is preprocessed using an electromagnetic metasurface to reduce its dimensionality to a low-dimensional feature representation. The U-Net semantic segmentation network is then used for image segmentation, and finally, a high-resolution image is reconstructed using a decoder.

Benefits of technology

It improves the speed and accuracy of MRI image processing, reduces the computational resource requirements, provides more accurate information for brain tumor localization and diagnosis, and optimizes the efficiency and accuracy of image segmentation.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN117635949B_ABST
Patent Text Reader

Abstract

This invention discloses a multimodal brain tumor image segmentation method based on a metasurface encoder, relating to the fields of image processing and neural networks. The method includes: acquiring multimodal magnetic resonance imaging data and preprocessing it using an electromagnetic metasurface encoder to obtain preprocessed data; reducing the dimensionality of the original multimodal brain imaging data through rapid dimensionality reduction to decrease data processing time; inputting the preprocessed data into a deep learning network for image segmentation to obtain image segmentation results; the UNet semantic segmentation model used has been trained to identify and locate brain tumors; and reconstructing a high-resolution image based on the image segmentation results output by the deep learning network using a decoder. Compared with traditional brain tumor image segmentation methods, this invention significantly improves processing speed; due to the effective reduction of data dimensionality, the required storage space and computing resources are greatly reduced, providing a more economical and efficient solution for clinical applications and research.

Description

Technical Field

[0001] This invention relates to the fields of image processing and neural network technology, and more specifically to a multimodal brain tumor image segmentation method based on metasurface encoders.

Background Technology

[0002] Early detection and accurate diagnosis of brain tumors are crucial for improving patient survival rates. Magnetic resonance imaging (MRI), a key tool for diagnosing brain tumors, provides high-resolution images of the brain, helping doctors identify the location and size of the tumor. However, MRI images are massive and structurally complex, and traditional manual segmentation and identification methods are time-consuming and lack accuracy and repeatability. Therefore, developing automated and efficient image processing technologies is essential for improving the clinical application of MRI imaging for brain tumors.

[0003] Since its initial application in 1980, MRI technology has made significant progress. From the initial T1-weighted and T2-weighted imaging to more advanced functional MRI (such as diffusion tensor imaging (DTI) and perfusion imaging), these techniques have provided crucial information for a comprehensive understanding and analysis of brain tumors. Functional MRI is particularly important, enabling physicians to assess the impact of tumors on brain function, thereby better guiding surgical and treatment planning. Although MRI plays a vital role in the diagnosis of brain tumors, the sheer volume and complexity of the data it generates present challenges. Traditional analysis methods based on radiologists' visual examination and interpretation are time-consuming and susceptible to subjective biases. Therefore, the development of efficient, automated MRI image analysis techniques is urgently needed to assist physicians in identifying and quantifying tumor characteristics more quickly and accurately, thereby improving diagnostic efficiency.

[0004] Against this backdrop, deep learning techniques, especially the UNet network, have become a revolutionary advancement in the field of medical image segmentation. The UNet network was initially proposed in 2015 by Olaf Ronneberger, Philipp Fischer, and Thomas Brox. The unique feature of the UNet network is its symmetrical structure and skip connections, which makes it particularly suitable for accurate segmentation of medical images with complex textures and structures.

[0005] Structurally, UNet consists of a contracting path (encoder) and a symmetrical expansive path (decoder). In the contracting path, the network progressively captures image features and reduces their spatial dimensionality: each stage applies two 3x3 convolutional layers, each followed by a ReLU activation function, and then a 2x2 max pooling operation. This stage is repeated several times, allowing the network to capture image features at different scales while improving computational efficiency. The expansive path mirrors the contracting path. It progressively restores the spatial dimensionality of the feature maps through upsampling followed by 2x2 convolutions (up-convolutions), and at each step it concatenates the feature maps with those of the corresponding stage in the contracting path. This concatenation (the skip connection) is a core feature of UNet; by combining the high-resolution features of the encoder with the upsampled features of the decoder, it preserves detail information during segmentation, especially fine details such as image edges.
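The pooling, upsampling, and concatenation steps described above can be sketched as shape operations in NumPy. This is only an illustrative sketch: the convolutions are omitted, nearest-neighbour upsampling stands in for the learned up-convolution, and the function names and the [16, 80, 80] feature map are assumptions rather than anything specified in the patent.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a [C, H, W] array (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample_2x(x):
    """Nearest-neighbour 2x upsampling; a stand-in for the learned up-convolution."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_concat(encoder_feat, decoder_feat):
    """Skip connection: concatenate encoder and decoder maps along the channel axis."""
    return np.concatenate([encoder_feat, decoder_feat], axis=0)

# Hypothetical encoder feature map; the values are random placeholders.
enc = np.random.default_rng(0).random((16, 80, 80))
bottleneck = max_pool_2x2(enc)   # spatial size halves: [16, 40, 40]
dec = upsample_2x(bottleneck)    # spatial size restored: [16, 80, 80]
merged = skip_concat(enc, dec)   # channels add up: [32, 80, 80]
print(bottleneck.shape, dec.shape, merged.shape)
```

The concatenated map carries both the original high-resolution encoder features and the upsampled decoder features, which is why edge detail can survive the downsampling.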

[0006] Furthermore, UNet is particularly suitable for scenarios with a limited number of training samples. Due to its efficient feature extraction and transfer mechanism, UNet can achieve high segmentation accuracy even with a small number of training samples. Therefore, it has become a very popular and effective tool in medical image processing, especially in tissue and cell-level image segmentation.

[0007] While the UNet network excels in feature extraction and image segmentation, it requires reducing feature dimensionality to facilitate efficient and accurate image analysis when dealing with more complex and nonlinear data structures. Traditional techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) struggle with highly complex and nonlinear data structures, making Convolutional Autoencoders (CAEs) particularly important.

[0008] A convolutional autoencoder (CAE) is a special type of neural network primarily used for unsupervised learning, particularly well-suited for feature extraction and dimensionality reduction of image data. This network consists of two main parts: an encoder and a decoder. The encoder transforms the input data (such as an image) into a smaller, denser representation, the encoding in the latent space, while the decoder reconstructs the original input data from this latent representation. In a CAE, the encoder typically consists of multiple convolutional layers and pooling layers. Convolutional layers extract local features from the image, while pooling layers reduce the spatial dimensionality of these features. Through this process, the network can extract important features from the input image and represent them in a more compact form. The decoder typically consists of upsampling layers and convolutional layers, whose task is to map the encoding in the latent space back to the original data space. Upsampling layers increase the spatial dimensionality of the feature map, while convolutional layers help reconstruct the details of the original image. Through layer-by-layer processing, the decoder gradually reconstructs an output similar to the original input.

[0009] A key characteristic of convolutional autoencoders (CAEs) is their ability to automatically learn effective data encoding by minimizing reconstruction errors. This makes them particularly suitable for tasks such as dimensionality reduction, feature extraction, denoising, and generating new image samples. In image processing, CAEs are widely used due to their efficiency and effectiveness in extracting image features. Their convolutional kernels capture local spatial features in images or other high-dimensional data, providing a more efficient low-dimensional representation for subsequent tasks such as image segmentation or classification. Nevertheless, CAEs implemented on traditional computers are still limited by computational resources and energy efficiency, which is particularly evident in applications requiring real-time processing and low power consumption.

[0010] Wave-based analog computing has become a cutting-edge and diverse research field in recent years, attracting significant attention from both academia and industry. This field primarily utilizes photons, rather than electrons, for information processing and transmission, aiming to provide ultrafast computing speeds, parallel processing capabilities, and low power consumption. Recent research trends show that wave-based analog computing, achieved through photonic integrated circuits, photonic crystals, and metamaterials, holds considerable potential, including leveraging the linear and nonlinear properties of light to perform complex computational tasks. Beyond directly utilizing light for computation, researchers are exploring methods to combine light with existing electronic computing techniques to build more efficient and faster information processing systems. For example, to maximize the advantages of each, hybrid optoelectronic convolutional neural networks (CNNs) use optical components as convolutional layers, while electronic components implement the nonlinear activations and fully connected layers.

[0011] Metasurfaces have emerged as a key approach in wave-based analog computing, providing a promising tool for manipulating light at subwavelength scales. As two-dimensional (2D) arrays of artificial atoms or nanostructures, metasurfaces can manipulate electromagnetic waves with a precision unattainable by natural materials. Numerous reports to date have documented remarkable phenomena such as invisibility cloaks, polarization conversion, vortex beams, and holograms. The high degree of freedom in metasurface design makes them strong candidates for wave-based analog operations, including spatial differentiation, integration, and convolution. They also offer new avenues for re-examining image segmentation. Key advantages of metasurface technology include thinness, low loss, and ease of integration, making them more attractive than traditional three-dimensional optical components in many applications. Currently, metasurfaces are used in ultrathin lenses, high-efficiency polarizers, and advanced optical sensors. These applications extend beyond scientific research to industries such as manufacturing, medicine, and the military.

[0012] Diffractive deep neural networks (D2NNs) are an innovative computing platform based on the principle of optical diffraction, attracting widespread attention in information processing and machine learning in recent years. Compared to traditional electronic neural networks, D2NNs utilize the propagation and diffraction of light waves in specific materials to process information, achieving a novel, high-speed, and energy-efficient computing method. These networks typically consist of multiple layers of diffractive elements, including lenses, gratings, and phase modulators, arranged in a specific manner to mimic the function of neurons in biological neural networks. By precisely adjusting the physical parameters of these diffractive elements (such as shape, size, and refractive index), D2NNs can perform complex data processing tasks during the propagation of light waves. The core advantage of D2NNs lies in their exceptional processing speed and parallel computing capabilities. Thanks to the speed of light propagation and diffraction properties, they can process large amounts of data simultaneously, demonstrating extremely high efficiency. Furthermore, these networks do not rely on electrical energy during operation, making them extremely energy-efficient. D2NNs have shown great application potential in multiple fields, including image processing, pattern recognition, and complex data analysis. Especially today, with the increasing demand for high-speed data processing, D2NNs provide a brand-new approach for quickly and accurately processing and analyzing large-scale data.

[0013] In the field of medical imaging, D2NN shows great promise. Leveraging its high speed and efficiency, D2NN can significantly improve the processing speed and analysis accuracy of MRI images. Combined with its pattern recognition capabilities, D2NN can effectively extract key features from complex MRI data, assisting doctors in making more accurate diagnoses and treatment plans. Nevertheless, D2NN still faces some challenges in practical applications, including improving processing accuracy, optimizing network structure design, and improving interfaces with traditional electronic systems. With further research and technological advancements, D2NN is expected to demonstrate broader application value in medical imaging and many other fields.

[0014] Because of their ability to precisely control light waves, metasurfaces offer a novel approach to imaging and data processing as components of diffractive deep neural networks (D2NNs). Each metasurface layer is specially designed to mimic a different processing layer in a D2NN. These layers manipulate the properties of light waves passing through them (such as phase, amplitude, and direction) to achieve operations similar to the weighting and activation functions in traditional neural networks. In this way, metasurface D2NNs can perform complex computational tasks on passing light signals.

[0015] Therefore, how to combine electromagnetic metasurfaces and optoelectronic neural networks with deep learning frameworks to improve the computational speed and efficiency of medical image processing is a technical problem that urgently needs to be solved by those skilled in the art.

Summary of the Invention

[0016] In view of this, the present invention provides a multimodal brain tumor image segmentation method based on a metasurface encoder, which solves the problems described in the background technology.

[0017] A multimodal brain tumor image segmentation method based on metasurface encoders includes the following steps:

[0018] Multimodal magnetic resonance imaging data is acquired and preprocessed using an electromagnetic metasurface encoder to obtain preprocessed data;

[0019] The preprocessed data is input into a deep learning network using a processor to perform image segmentation, and the image segmentation result is obtained.

[0020] Based on the image segmentation results output by the deep learning network, a decoder is used to reconstruct a high-resolution image.

[0021] Optionally, the electromagnetic metasurface encoder consists of multiple subwavelength units, with each subwavelength unit interconnected between adjacent layers via optical diffraction, and each subwavelength unit being an independent neuron.

[0022] Optionally, each subwavelength unit of the electromagnetic metasurface encoder includes a fully polarized metal composite unit with three circular patches, and the radius of the circular patches varies within a preset range.

[0023] Optionally, preprocessing can be performed using an electromagnetic metasurface encoder, specifically as follows:

[0024] Electromagnetic metasurface encoders utilize multilayer subwavelength units to encode high-dimensional information from multimodal magnetic resonance imaging data, thereby obtaining low-dimensional feature representations.

[0025] Optionally, the deep learning network is a U-Net semantic segmentation neural network used to identify and segment brain tumor images from the low-dimensional feature representation output by the electromagnetic metasurface encoder.

[0026] Optionally, the decoder reconstructs high-resolution images using transposed convolution and upsampling techniques.

[0027] As can be seen from the above technical solution, compared with the prior art, this invention discloses a multimodal brain tumor image segmentation method based on a metasurface encoder. It employs an electromagnetic encoder based on metasurface technology to reduce data dimensionality while retaining key feature information in the image, significantly improving image processing speed and ensuring high-resolution image reconstruction and accurate tumor localization. Subsequently, the encoded data is input into the UNet semantic segmentation model, and the output is decoded to obtain a tumor segmentation image. The optimized U-Net network provides higher accuracy and reliability for brain tumor identification and segmentation, while the decoder can reconstruct high-quality images from the compressed data, providing doctors with more accurate diagnostic information. Compared with traditional brain tumor image segmentation methods, this invention greatly improves processing speed; due to the effective reduction of data dimensionality, the required storage space and computing resources are also significantly reduced, thus providing a more economical and efficient solution for clinical applications and research.

Attached Figure Description

[0028] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0029] Figure 1 is a flowchart of the multimodal brain tumor image segmentation method based on a metasurface encoder provided by the present invention;

[0030] Figure 2 is a schematic diagram of the complete optoelectronic network provided by the present invention;

[0031] Figure 3 is a schematic diagram of the working process of the electromagnetic encoder provided by the present invention;

[0032] Figure 4 is a schematic diagram of the metasurface diffraction principle provided by the present invention;

[0033] Figure 5 is a schematic diagram of the traditional convolutional autoencoder training method provided by the present invention;

[0034] Figure 6 is the training loss curve of the traditional convolutional autoencoder provided by the present invention;

[0035] Figure 7 is a schematic diagram of the present invention emulating a traditional autoencoder using a metasurface;

[0036] Figure 8 is the phase distribution diagram of the three-layer metasurface after network convergence provided by the present invention;

[0037] Figure 9 is a schematic diagram of the metasurface unit provided by the present invention, with dimensions a = 13 mm, w = 0.5 mm, h = 3 mm;

[0038] Figure 10 is the dispersion relation curve of the transmissive metasurface provided by the present invention;

[0039] Figure 11 is a schematic diagram of the low-dimensional space representation of the tumor labels provided by the present invention;

[0040] Figure 12 is a schematic diagram of the tumor segmentation results provided by the present invention;

[0041] Figure 13 is a schematic diagram of the low-dimensional feature light field distribution after metasurface electromagnetic encoding provided by the present invention;

[0042] Figure 14 is a schematic diagram of the multimodal brain tumor images acquired according to the present invention.

Detailed Implementation

[0043] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0044] This embodiment discloses a multimodal brain tumor image segmentation method based on a metasurface encoder. As shown in Figure 1, it includes the following steps:

[0045] Multimodal magnetic resonance imaging data is acquired and preprocessed using an electromagnetic metasurface encoder to obtain preprocessed data;

[0046] The preprocessed data is input into a deep learning network using a processor to perform image segmentation, and the image segmentation result is obtained.

[0047] Based on the image segmentation results output by the deep learning network, a decoder is used to reconstruct a high-resolution image.

[0048] In this embodiment, a novel opto-neural network is constructed by introducing the concept of a metasurface encoder as a preprocessor for image segmentation. This network physically performs feature dimensionality reduction on multimodal brain images. The dimensionality-reduced feature map is then input into the traditional image segmentation network U-Net, and a high-resolution tumor segmentation image is reconstructed through a decoding process. This approach combines electromagnetic metasurfaces and opto-neural networks with a deep learning framework, providing a novel approach to improving the computational speed and efficiency of medical image processing.

[0049] Furthermore, the electromagnetic metasurface encoder consists of multiple subwavelength units, with each subwavelength unit interconnected between adjacent layers via optical diffraction. Designed based on Rayleigh-Sommerfeld diffraction theory, it effectively controls the propagation and processing of optical signals, with each subwavelength unit being an independent neuron.

[0050] Furthermore, each subwavelength unit of the electromagnetic metasurface encoder includes a fully polarized metal composite unit with three circular patches, and the radius of the circular patches varies within a preset range.

[0051] Furthermore, preprocessing is performed using an electromagnetic metasurface encoder, specifically as follows:

[0052] Electromagnetic metasurface encoders utilize multi-layer subwavelength units to encode high-dimensional information from multimodal magnetic resonance imaging data, obtaining low-dimensional feature representations. This effectively reduces data dimensionality, accelerates processing speed, and maintains the integrity of key information.

[0053] Furthermore, the deep learning network is the U-Net semantic segmentation neural network, which processes complex heterogeneous brain imaging data. It is used to identify and segment brain tumor images from the low-dimensional feature representation output by the electromagnetic metasurface encoder. The optimized U-Net network provides higher accuracy and reliability for the identification and segmentation of brain tumors.

[0054] Furthermore, the decoder reconstructs high-resolution images using transposed convolution and upsampling techniques, which can reconstruct high-quality images from compressed data, providing doctors with more accurate diagnostic information.

[0055] Next, with reference to Figures 2-14, a specific embodiment is described to provide a deeper understanding of the conventional encoder and the electromagnetic encoder.

[0056] 1. Traditional encoder training

[0057] In the traditional convolutional autoencoder training process, a dataset is first constructed comprising four brain imaging modalities (T1, T2, T1ce, and Flair). Each modality consists of 10,000 images with a resolution of 160×160 pixels, used for training. Additionally, for model validation, 2,500 images are configured as a validation set; these validation images are identical to those in the training set and serve as labels for the autoencoder model.

[0058] The autoencoder model architecture consists of two parts: an encoder and a decoder. The encoder comprises three identically structured convolutional layers, each followed by a rectified linear unit (ReLU) activation function and a batch normalization layer. The first layer accepts a single-channel input and uses a 4×4 convolutional kernel with a stride of 2 and padding of 1, generating an output of size [16, 80, 80]. The second layer further reduces the dimension to [8, 40, 40]. Finally, the third layer transforms its input into a low-dimensional feature representation of dimension [2, 20, 20].
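The layer sizes quoted above follow from the standard convolution output-size formula, out = floor((in + 2·padding − kernel) / stride) + 1. The short check below is a sketch that assumes the second and third layers reuse the 4×4 kernel, stride 2, and padding 1 stated for the first layer; the source says the layers are identical but gives these parameters only for the first one. The helper name `conv_out` is illustrative.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# Output channel counts of the three encoder layers, as given in the text.
channels = [16, 8, 2]

shape = (1, 160, 160)          # single-channel 160x160 input
encoder_shapes = []
for c in channels:
    # Assumed: every layer uses kernel 4, stride 2, padding 1.
    shape = (c, conv_out(shape[1]), conv_out(shape[2]))
    encoder_shapes.append(shape)

# Ratio between the number of input values and the number of latent values.
compression = (1 * 160 * 160) / (2 * 20 * 20)
print(encoder_shapes, compression)
```

Under these assumptions each layer halves the spatial size, and the latent code holds 800 values against 25,600 input pixels, a 32-fold reduction.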

[0059] The decoder aims to reconstruct the original data from these dimensionality-reduced features. It likewise has three identically structured transposed convolutional layers, each followed by a ReLU activation function and a batch normalization layer. The first layer receives the 2-channel encoded output and expands it into a representation of size [8, 40, 40]. The second layer further decodes it into feature maps of size [16, 80, 80]. The final transposed convolutional layer reconstructs the data at the original size of [1, 160, 160]. During the forward pass, the autoencoder returns a low-dimensional spatial representation of the brain image.
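The decoder sizes can be checked the same way with the transposed-convolution formula, out = (in − 1)·stride − 2·padding + kernel (no output padding, dilation 1). The source does not state the decoder's kernel, stride, or padding, so the values below (4, 2, 1, mirroring the encoder) are an assumption chosen because they reproduce the quoted sizes; `deconv_out` is an illustrative name.

```python
def deconv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution (no output padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel

# Output channel counts of the three decoder layers, as given in the text.
channels = [8, 16, 1]

shape = (2, 20, 20)            # latent representation produced by the encoder
decoder_shapes = []
for c in channels:
    # Assumed: kernel 4, stride 2, padding 1 for every transposed layer.
    shape = (c, deconv_out(shape[1]), deconv_out(shape[2]))
    decoder_shapes.append(shape)

print(decoder_shapes)
```

With these parameters each transposed layer exactly doubles the spatial size, so the decoder inverts the encoder's three halvings.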

[0060] After hundreds of training epochs, the loss decreased continuously on both the training and validation sets, and the two loss curves followed consistent trends, indicating that the model is reliable and that overfitting is effectively controlled. This was further confirmed by cross-validation. To demonstrate the model's performance, five brain images from different modalities were selected; the strong similarity between the reconstructed images and the original images indirectly corroborates that the encoder successfully extracted key features.

[0061] After completing the above steps, the two channels of each feature map are concatenated and flattened to form a single-channel image of size [1, 20, 40], which is saved and used as the label for subsequent training stages.
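One way to turn a [2, 20, 20] feature map into a [1, 20, 40] image is to place the two channels side by side along the width. The source does not say which axis or ordering is used, so the layout below is an assumption, and `channels_to_image` is a hypothetical helper name.

```python
import numpy as np

def channels_to_image(features):
    """Place the two 20x20 channels side by side to form a [1, 20, 40] image."""
    assert features.shape[0] == 2, "expects a 2-channel feature map"
    side_by_side = np.concatenate([features[0], features[1]], axis=1)  # [20, 40]
    return side_by_side[np.newaxis, ...]                               # [1, 20, 40]

# Placeholder latent features; real values would come from the trained encoder.
latent = np.random.default_rng(0).random((2, 20, 20))
label_image = channels_to_image(latent)
print(label_image.shape)
```

The operation only rearranges values, so all 800 latent values are kept and the original channels can be recovered by splitting the image down the middle.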

[0062] 2. Design and training of electromagnetic encoders

[0063] The electromagnetic encoder is based on three layers of high-density subwavelength units constructed using a transmissive metasurface. These units are interconnected between adjacent layers via optical diffraction, with each unit acting as an independent neuron. According to the Huygens-Fresnel principle, each point of the light field is regarded as a secondary wave source, and the propagated field is described by the superposition of all secondary waves. Rayleigh-Sommerfeld diffraction theory is used to quantify this process. In the network construction, the first and second layers contain 160×160 neurons, while the third layer contains 40×40 neurons, each neuron measuring 8×8 mm². The interlayer axial distance is 600 mm. This embodiment uses the image data of size [1, 160, 160] originally used to train the traditional convolutional autoencoder as input, and the low-dimensional feature maps of size [1, 20, 40] collected in the previous stage as labels. Each unit of the metasurface contains a three-layer fully polarized metal composite unit with circular patches. According to CST simulation results, as the radius of the circular patches varies within a certain range, the transmission phase covers a full 2π range at the operating frequency. In the gradient-based optimization of the transmission coefficients, the neurons are assumed to have uniform transmission amplitude, and the transmission phase of each unit is initialized and treated as a trainable parameter. After several iterations, the network gradually converges, yielding the phase parameters of the three-layer metasurface. Finally, all data produced by the metasurface's electromagnetic encoding is collected as input to the subsequent image segmentation network.
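The diffractive connection between two layers can be sketched numerically with the Rayleigh-Sommerfeld secondary-wave weight commonly used for diffractive networks, w = (z / r²) · (1/(2πr) + 1/(jλ)) · exp(j2πr/λ), where r is the distance between a source neuron and a destination neuron. This is a minimal, unnormalized sketch and not the patent's implementation: the 8 mm pitch and 600 mm spacing come from the text, while the 30 mm wavelength, the 16×16 demo grid (instead of 160×160), and all function names are assumptions, since the source does not state the operating frequency.

```python
import numpy as np

def grid(n, pitch):
    """Centre coordinates of an n x n neuron array, returned as an (n*n, 2) array."""
    c = (np.arange(n) - (n - 1) / 2) * pitch
    xx, yy = np.meshgrid(c, c, indexing="ij")
    return np.stack([xx.ravel(), yy.ravel()], axis=1)

def rs_kernel(src_xy, dst_xy, z, wavelength):
    """Rayleigh-Sommerfeld weights from every source neuron to every destination neuron."""
    dx = dst_xy[:, None, 0] - src_xy[None, :, 0]
    dy = dst_xy[:, None, 1] - src_xy[None, :, 1]
    r = np.sqrt(dx**2 + dy**2 + z**2)
    return (z / r**2) * (1 / (2 * np.pi * r) + 1 / (1j * wavelength)) \
        * np.exp(2j * np.pi * r / wavelength)

def propagate(field, phase, kernel):
    """Apply a phase-only layer (uniform amplitude), then diffract to the next plane."""
    return kernel @ (field * np.exp(1j * phase))

pitch = 8e-3        # neuron size from the text: 8 mm
z = 0.6             # interlayer distance from the text: 600 mm
wavelength = 0.03   # ASSUMED (about 10 GHz); not given in the source
n = 16              # small demo grid; the patent uses 160 x 160

xy = grid(n, pitch)
K = rs_kernel(xy, xy, z, wavelength)
rng = np.random.default_rng(0)
phase = rng.uniform(0, 2 * np.pi, n * n)     # trainable phases, random here
field_in = np.ones(n * n, dtype=complex)     # plane-wave illumination
field_out = propagate(field_in, phase, K)
intensity = np.abs(field_out) ** 2           # what a detector plane would record
print(field_out.shape, float(intensity.max()))
```

In a training loop the `phase` array would be updated by gradient descent so that the detected intensity matches the [1, 20, 40] labels. The dense kernel used here is only practical for small grids, and an FFT-based propagator would be the usual choice at full resolution.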

[0064] 3. Decoder training and image reconstruction

[0065] A convolutional autoencoder with the same structure as in the first step is constructed and trained using the tumor labels provided in the dataset as both input and output. After multiple iterations, the network gradually converges; the parameters of the decoder portion of the autoencoder and the low-dimensional feature encodings of the tumor labels are then saved.

[0066] The electromagnetic encoding of the brain images obtained in the second step is used as the training and test sets, and then input into the UNet semantic segmentation neural network. The low-dimensional feature encoding of the brain tumor is used as the label. After several iterations, the UNet network gradually converges. At this point, the complete electromagnetic encoding-image segmentation-decoding network has been trained. In the complete optoelectronic hybrid network, multimodal brain images are input into the electromagnetic encoder to obtain low-dimensional optical feature encoding, which is then transmitted to the semantic segmentation network. Finally, the decoder reconstructs the final high-resolution tumor segmentation result.

[0067] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other.

[0068] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A multimodal brain tumor image segmentation method based on a metasurface encoder, characterized in that it includes the following steps: acquiring multimodal magnetic resonance imaging data and preprocessing it using an electromagnetic metasurface encoder to obtain preprocessed data; inputting the preprocessed data into a deep learning network using a processor to perform image segmentation and obtain an image segmentation result; and reconstructing a high-resolution image using a decoder based on the image segmentation result output by the deep learning network; wherein the electromagnetic metasurface encoder consists of multiple subwavelength units, each subwavelength unit is interconnected between adjacent layers through optical diffraction, and each subwavelength unit is an independent neuron; each subwavelength unit of the electromagnetic metasurface encoder contains a fully polarized metal composite unit with three circular patches, and the radius of the circular patches varies within a preset range; and the preprocessing using the electromagnetic metasurface encoder specifically comprises: the electromagnetic metasurface encoder using multi-layer subwavelength units to encode the high-dimensional information of the multimodal magnetic resonance imaging data to obtain a low-dimensional feature representation.

2. The multimodal brain tumor image segmentation method based on a metasurface encoder according to claim 1, characterized in that the deep learning network is a U-Net semantic segmentation neural network, used to identify and segment brain tumor images from the low-dimensional feature representation output by the electromagnetic metasurface encoder.

3. The multimodal brain tumor image segmentation method based on a metasurface encoder according to claim 1, characterized in that the decoder reconstructs high-resolution images using transposed convolution and upsampling techniques.