Multispectral imaging system based on metasurface and target detection method of multispectral imaging system based on metasurface
By combining metasurfaces with multispectral target detection, the problems of large size and severe crosstalk in traditional multispectral optical systems are solved, achieving improved multispectral image quality and target detection accuracy, especially for efficient detection in complex scenes.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XIDIAN UNIV
- Filing Date
- 2026-02-04
- Publication Date
- 2026-06-23
Figure CN122259028A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the fields of spectral imaging and computer vision, and in particular to a metasurface-based multispectral imaging system and a target detection method based on the metasurface-based multispectral imaging system.
Background Technology
[0002] Multispectral detection technology distinguishes camouflaged targets from complex backgrounds by analyzing the radiation characteristics of targets in different spectral bands. However, its performance is limited by the spectral segmentation accuracy of the optical system, its crosstalk suppression capability, and the feature utilization efficiency of the target detection algorithm. Traditional multispectral optical systems suffer from problems such as large size, insufficient optimization of film structure, and severe spectral crosstalk, resulting in low-quality multispectral data. At the same time, existing target detection methods mostly rely on a single band or simple spectral stitching and fail to fully exploit the complementarity of multispectral features, leading to a significant drop in detection accuracy in complex scenarios such as low light and occlusion.
Summary of the Invention
[0003] This application provides a metasurface-based multispectral imaging system and a target detection method based on the metasurface-based multispectral imaging system, which combine metasurfaces with multispectral target detection to improve the spectral segmentation accuracy of multispectral image acquisition and the accuracy of the target detection algorithm.
[0004] In a first aspect, this application provides a metasurface-based multispectral imaging system comprising a front imaging system, an array aperture stop, and a microlens array. The front imaging system forms a primary image plane after a single imaging operation. The array aperture stop includes multiple sub-apertures, each corresponding to a spectral channel, and the primary image plane is divided into a multi-channel image by the sub-apertures. The microlens array, based on metasurface aperture-segmented imaging, focuses the image passing through each sub-aperture onto the corresponding position of the detector according to its spectrum, and a multispectral image is obtained through the detector.
[0005] In the imaging system provided in this embodiment, the micro-nano structure of the metasurface can achieve fine spectral segmentation and precise phase control, promote the miniaturization and low crosstalk of the multispectral imaging system, eliminate chromatic aberration phase, and improve the image quality of multispectral images.
[0006] In a second aspect, this application provides a target detection method based on a metasurface-based multispectral imaging system, comprising: acquiring multispectral images of the target scene through a metasurface-based multispectral optical system, and stitching the multispectral images together to form three-dimensional multispectral data; extracting multi-scale features of the three-dimensional multispectral data in parallel through a multi-branch feature extraction module; determining branch weight parameters, calculating the normalized weight of each branch using the branch weight parameters, and fusing the multi-scale features of each branch using the normalized weights to obtain preliminary fused features; determining the spatial weights of the three-dimensional multispectral data through a spatial attention mechanism, and weighting the preliminary fused features with the spatial weights to obtain final fused features; and detecting the final fused features using a teacher-student model and outputting a prediction result, which includes target category and location information.
[0007] According to the method provided in this embodiment, the multi-branch feature extraction module can process multispectral images in parallel and extract multispectral information, thereby making full use of the complementarity between multispectral information and improving the efficiency and accuracy of target detection.
[0008] In a third aspect, this application provides a target detection device based on a metasurface-based multispectral imaging system, comprising a metasurface-based multispectral optical system and a target detection module. The metasurface-based multispectral optical system is used to acquire multispectral images of the target scene. The target detection module is used to extract multi-scale features of three-dimensional multispectral data in parallel through a multi-branch feature extraction module; determine branch weight parameters, calculate the normalized weight of each branch using the branch weight parameters, and fuse the multi-scale features of each branch using the normalized weights to obtain preliminary fused features; determine the spatial weights of the three-dimensional multispectral data through a spatial attention mechanism, and use the spatial weights to weight the preliminary fused features to obtain final fused features; and use a teacher-student model to detect the final fused features and output prediction results, which include target category and location information.
[0009] In a fourth aspect, this application provides an electronic device including a memory and one or more processors. The memory stores one or more computer programs, each including instructions that, when executed by the processor, cause the electronic device to perform the target detection method based on the metasurface-based multispectral imaging system as described in the second aspect.
[0010] In a fifth aspect, this application provides a computer-readable storage medium storing instructions that, when executed on an electronic device, cause the electronic device to perform the target detection method based on the metasurface-based multispectral imaging system as described in the second aspect.
[0011] In a sixth aspect, this application provides a computer program product that, when run on an electronic device, causes the electronic device to perform the target detection method of the multispectral imaging system based on metasurfaces as described in the second aspect.
[0012] It will be understood that, for the beneficial effects of the target detection device, electronic device, computer-readable storage medium, and computer program product provided above, reference may be made to the beneficial effects of the second aspect, which will not be repeated here.
Attached Figure Description
[0013] Figure 1 is a schematic diagram of the system architecture of a metasurface-based multispectral imaging system provided in an embodiment of this application;
Figure 2 is a schematic diagram of the microlens array structure in an embodiment of this application;
Figure 3 is a schematic flowchart of the target detection method based on the metasurface-based multispectral imaging system provided in an embodiment of this application;
Figure 4 is a schematic diagram of the model structure provided in an embodiment of this application;
Figure 5 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0014] To facilitate a clear description of the technical solutions in the embodiments of this application, the terms "first" and "second" are used in the embodiments of this application to distinguish identical or similar items with substantially the same function and effect. For example, "first chip" and "second chip" are only used to distinguish different chips and do not limit their order. Those skilled in the art will understand that the terms "first" and "second" do not limit the quantity or execution order, and the terms "first" and "second" do not necessarily imply that they are different. It should be noted that in the embodiments of this application, the words "exemplary" or "for example" are used to indicate that they are examples, illustrations, or descriptions. Any embodiment or design scheme described as "exemplary" or "for example" in this application should not be construed as being better or more advantageous than other embodiments or design schemes. Specifically, the use of the words "exemplary" or "for example" is intended to present the relevant concepts in a specific manner. In the embodiments of this application, "at least one" means one or more, and "more than one" means two or more.
[0015] It should be noted that "at the time of..." in the embodiments of this application can be either at the instant when a certain situation occurs, or for a period of time after the occurrence of a certain situation. The embodiments of this application do not make specific limitations on this.
[0016] The implementation of this embodiment will now be described in detail with reference to the accompanying drawings.
[0017] This embodiment first provides a multispectral imaging system based on metasurfaces, which combines metasurfaces for multispectral image acquisition and improves the accuracy of spectral segmentation by using metasurfaces, thereby improving the image quality of multispectral images.
[0018] Therefore, this application integrates metasurface multispectral acquisition, multi-branch feature extraction, and semi-supervised learning to design an efficient multispectral target detection scheme. Metasurfaces, with their unique optical manipulation capabilities based on micro- and nanostructures, can achieve fine-grained spectral segmentation and precise phase control, providing a solution for the miniaturization and high performance of multispectral optical systems. Semi-supervised learning, which combines labeled and unlabeled data, can improve the model's generalization ability when labeled data is insufficient. Multi-branch feature extraction can process multispectral image sequences in parallel, making full use of the complementarity of multispectral information.
[0019] As shown in Figure 1, the metasurface-based multispectral imaging system provided in this application includes a front imaging system, an array aperture stop, and a microlens array. The front imaging system performs a single imaging operation to form a primary image plane; the array aperture stop includes multiple apertures, each corresponding to a spectral channel, and the primary image plane is segmented into a multi-channel image by these apertures; the microlens array, based on metasurface aperture-segmented imaging, focuses the image passing through the apertures onto the corresponding positions of the image sensor according to the spectrum, thus obtaining a multispectral image.
[0020] In this embodiment, the front imaging optical system forms a primary image plane after a single imaging process. The array aperture stop is located behind the primary image plane and divides the system into multiple imaging channels, each corresponding to a different spectral channel, thus achieving aperture-segmented spectral imaging.
[0021] Figure 2 shows a schematic diagram of the microlens array structure. As shown in Figure 2, the microlens array includes a metasurface, an array black matrix, and a lens matrix. The metasurface is arranged with multiple nanounits, each corresponding to a spectral channel. The transmittance of the target wavelength in its corresponding spectral channel is higher than in the other spectral channels, and the target wavelengths of different spectral channels are different. The array black matrix is used to suppress spectral crosstalk between different spectral channels. The lens matrix is a non-uniform array matched one-to-one with the spectral channels.
[0022] In this embodiment, a metasurface is used instead of a traditional filter matrix for spectral segmentation. Each nanounit in the metasurface is composed of a periodic arrangement of the same type of nanopillar structure. The resonant wavelength of the nanopillar structure is pre-adjusted so that the difference between the spectral transmittance of the resonant wavelength and the target wavelength meets a preset condition.
[0023] The multi-channel images passing through the array aperture stop then pass through the spectral channels of the metasurface. Each spectral channel transmits the most light near its target wavelength, so the multi-channel images are focused onto the image sensor according to their different target wavelengths, forming multiple spectral images.
[0024] The metasurface employs a microlens array aperture-segmented imaging architecture to achieve spectral segmentation. The substrate is made of a high-transmittance glass material, and the functional layer is a deposited high-refractive-index dielectric material, such as titanium oxide, silicon nitride, or gallium nitride. First, a corresponding nanounit structure is designed for each channel, ensuring high transmittance near its target wavelength. Based on the different spectral transmittance of each nanopillar structure, the units are arranged in a non-uniform a×b channel filter matrix with a passband half-width ≤ 60 nm, where a and b are natural numbers.
[0025] In an exemplary embodiment, the metasurface may have nine spectral channels, forming a nine-channel spectral filter matrix. The total wavelength range is divided into nine sub-channels Δλ_i with center wavelengths λ_i. Using the center wavelengths λ_1, λ_2, ..., λ_9 of the nine channels as design targets, a corresponding nanounit structure is designed for each channel, ensuring that the unit has high transmittance near its own target wavelength and lower transmittance at the target wavelengths of the other channels. Then, utilizing the Mie resonance of the nanopillars, the resonant wavelength λ is shifted by changing the structural characteristic size parameters (diameter d, height h, period p). Parameter scanning is used to establish the mapping between structural size and spectral transmittance T(λ), and the final structural parameters are those that minimize the difference between the simulated transmission spectrum T_sim(λ) and the ideal target filter spectrum T_target(λ). The imaging surface is then divided into nine sub-regions, each consisting of a periodic arrangement of the same type of nanopillar structure and forming a corresponding filter channel, thereby realizing a nine-channel spectral filter matrix on a single integrated chip.
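The parameter scan described above can be illustrated with a minimal sketch. It assumes a toy Gaussian passband in place of a full-wave electromagnetic simulation; the function names (`toy_transmittance`, `target_spectrum`, `sweep`), the scaling law for the resonance position, and the 30 nm half-width are all hypothetical and are not taken from this application.

```python
import itertools
import numpy as np

def toy_transmittance(wl_nm, d_nm, h_nm, p_nm):
    """Hypothetical stand-in for a full-wave simulation: a Gaussian passband
    whose centre shifts with pillar diameter d, height h and period p."""
    centre = 1.5 * d_nm + 0.2 * h_nm + 0.3 * p_nm  # made-up scaling law
    return np.exp(-0.5 * ((wl_nm - centre) / 20.0) ** 2)

def target_spectrum(wl_nm, centre_nm, half_width_nm=30.0):
    """Ideal filter: transmittance 1 inside the passband, 0 outside."""
    return (np.abs(wl_nm - centre_nm) <= half_width_nm).astype(float)

def sweep(centre_nm, wl_nm, diameters, heights, periods):
    """Grid-scan (d, h, p) and return the combination whose simulated
    spectrum has the smallest mean squared difference from the target."""
    t_target = target_spectrum(wl_nm, centre_nm)
    best, best_err = None, np.inf
    for d, h, p in itertools.product(diameters, heights, periods):
        if d >= p:
            continue  # a pillar must fit inside its unit cell
        t_sim = toy_transmittance(wl_nm, d, h, p)
        err = float(np.mean((t_sim - t_target) ** 2))
        if err < best_err:
            best, best_err = (d, h, p), err
    return best, best_err

wavelengths = np.arange(400.0, 1001.0, 5.0)
best, err = sweep(600.0, wavelengths, list(range(100, 301, 20)), [600], [400])
print(best, err)
```

In a real design flow the toy model would be replaced by simulated or measured spectra, and the scan would be repeated once per channel centre wavelength.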
[0026] The array black matrix is based on chromium oxide and is fabricated using photolithography.
[0027] The lens matrix is precisely matched to the microlenses of each channel to form a non-uniform microlens array, which meets the differential requirements of the algorithm during signal processing.
[0028] This embodiment utilizes the unique optical modulation capabilities of metasurface micro / nano structures to achieve fine spectral segmentation and precise phase control, providing a solution for the miniaturization and high performance of multispectral optical systems.
[0029] This application also provides a target detection method based on the multispectral imaging system of the metasurface. For example, this method can be applied to various electronic devices such as computers (PCs), tablets, virtual reality / augmented reality devices, wearable devices, industrial computers, and vehicle systems; it can also be applied to servers, cloud, server clusters, etc. This embodiment does not make any special limitations on this.
[0030] Figure 3 shows a schematic flowchart of the target detection method based on the metasurface-based multispectral imaging system provided in an embodiment of this application.
[0031] As shown in Figure 3, the target detection method based on the metasurface-based multispectral imaging system may include the following steps:
Step 101: Acquire multispectral images of the target scene using a metasurface-based multispectral optical system, and stitch the multispectral images into three-dimensional multispectral data.
[0032] Step 102: Extract multi-scale features of the three-dimensional multispectral data in parallel using the multi-branch feature extraction module.
[0033] Step 103: Determine the branch weight parameters, calculate the normalized weights of each branch using the branch weight parameters, and use the normalized weights to fuse the multi-scale features of each branch to obtain preliminary fused features.
[0034] Step 104: Determine the spatial weights of the three-dimensional multispectral data through a spatial attention mechanism, and use the spatial weights to weight the preliminary fusion features to obtain the final fusion features.
[0035] Step 105: Use the teacher-student model to detect the final fused features and output the prediction results, which include target category and location information.
[0036] The multi-branch feature extraction module includes multiple sub-networks in each branch. The multi-scale features of the 3D multispectral data are extracted in parallel by the multi-branch feature extraction module, specifically including: extracting output features at different scales through each sub-network, the output features including target class probability, bounding box coordinate offset, and target presence confidence; upsampling and concatenating the output features at each scale of different branches to obtain the concatenated and fused features at that scale; and processing the concatenated and fused features at each scale through a multi-head attention module to obtain the multi-scale features at each scale.
[0037] In this implementation, the teacher-student model is semi-supervised, and its loss function includes the supervision loss for labeled data and the consistency loss for unlabeled data. During object detection, the student module in the trained teacher-student model is used to detect the final fused features. Non-maximum suppression is used to remove duplicate bounding boxes, and the prediction result is output.
[0038] This embodiment provides a target detection model for performing steps 102 to 105 described above. The target detection model includes a multi-branch feature extraction module, a feature extraction and fusion module, and a semi-supervised detection module. Figure 4 shows the architecture of this model. As shown in Figure 4, the method may include the following steps:
Step 1: Obtaining multispectral images through image cropping: the metasurface-based multispectral optical system outputs a×b multispectral images, which are stitched together to form three-dimensional multispectral data N_0 = {I_0, I_1, ..., I_m}, where a single image I_m ∈ R^(H×W×C) (H is the image height, W the image width, and C the number of channels). The pixel values of each band image are normalized to obtain the multispectral sequence data N.
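The cropping, stitching, and normalization in Step 1 might look like the following sketch. It assumes that the detector frame is a single-channel array tiled into a×b equally sized sub-images in row-major order and that normalization is per-band min-max scaling; neither detail is specified in this application, and the function names are hypothetical.

```python
import numpy as np

def frame_to_cube(frame, a, b):
    """Cut a 2-D detector frame into a x b equally sized sub-images and
    stack them along the last axis, giving an (H, W, a*b) data cube."""
    rows, cols = frame.shape
    h, w = rows // a, cols // b
    tiles = [frame[i * h:(i + 1) * h, j * w:(j + 1) * w]
             for i in range(a) for j in range(b)]
    return np.stack(tiles, axis=-1).astype(np.float64)

def normalize_bands(cube, eps=1e-12):
    """Min-max normalise each band independently to the range [0, 1]."""
    lo = cube.min(axis=(0, 1), keepdims=True)
    hi = cube.max(axis=(0, 1), keepdims=True)
    return (cube - lo) / (hi - lo + eps)

frame = np.arange(36, dtype=np.float64).reshape(6, 6)  # toy 6x6 detector frame
cube = normalize_bands(frame_to_cube(frame, 3, 3))     # nine bands of 2x2 pixels
print(cube.shape)
```

A real system would also need per-channel registration, since each sub-aperture views the scene through a slightly different optical path.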
[0039] Step 2: Multi-branch feature extraction: The multispectral sequence data N is input into the image feature extraction module composed of residual blocks. The multi-branch feature extraction module extracts multi-scale features in parallel through its sub-networks, including the target class probability, the bounding box coordinate offset, and the target presence confidence. After processing by the multi-head attention module, the feature extraction process can be represented as follows:
[0040] where the input is the multispectral sequence data and the output is the feature map of each stage of the k-th sub-network. The Conv3, Conv4, and Conv5 feature maps of each sub-network are upsampled and concatenated to achieve multi-scale feature fusion:
[0041] Finally, the Parallel Head module outputs the preliminary predicted features (multi-scale features), including the target class probability, the bounding box coordinate offset, and the target presence confidence.
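The upsample-and-concatenate step for the Conv3, Conv4, and Conv5 feature maps can be sketched as follows. The sketch assumes a (C, H, W) layout, nearest-neighbour upsampling, integer scale ratios, and Conv3 as the target resolution; these are illustrative choices rather than details stated in this application.

```python
import numpy as np

def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return fmap.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_scales(conv3, conv4, conv5):
    """Bring the coarser Conv4/Conv5 maps to Conv3 resolution and
    concatenate all three along the channel axis."""
    f4 = upsample_nearest(conv4, conv3.shape[1] // conv4.shape[1])
    f5 = upsample_nearest(conv5, conv3.shape[1] // conv5.shape[1])
    return np.concatenate([conv3, f4, f5], axis=0)

rng = np.random.default_rng(0)
c3 = rng.normal(size=(4, 8, 8))    # finest scale
c4 = rng.normal(size=(8, 4, 4))
c5 = rng.normal(size=(16, 2, 2))   # coarsest scale
print(fuse_scales(c3, c4, c5).shape)
```

In a trained network the concatenated tensor would typically pass through further convolution or attention layers before prediction.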
[0042] Step 3: Feature Extraction and Fusion: The feature extraction and fusion module combines channel weighting with spatial attention to fully exploit the complementarity of multi-branch features. Using learnable branch weight parameters, the normalized weights are calculated:
[0043] Features of the same type are then weighted and summed across branches to obtain the preliminary fused features:
[0044]
[0045]
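One plausible reading of the branch weighting is sketched below. It assumes that the normalized weights are a softmax over the learnable branch parameters and that all branch features share the same shape; the normalization actually used by this application is not recoverable from the text, so the softmax and the function names are assumptions.

```python
import numpy as np

def normalized_branch_weights(alpha):
    """Softmax over the learnable branch parameters (assumed normalisation)."""
    e = np.exp(alpha - np.max(alpha))  # subtract the max for numerical stability
    return e / e.sum()

def fuse_branches(features, alpha):
    """Weighted sum over the branch axis; features has shape (K, C, H, W)."""
    w = normalized_branch_weights(alpha)
    return np.tensordot(w, features, axes=1)  # contracts the K axis -> (C, H, W)

alpha = np.zeros(3)  # equal parameters give equal weights
feats = np.stack([np.full((2, 4, 4), v) for v in (1.0, 2.0, 3.0)])
print(normalized_branch_weights(alpha), fuse_branches(feats, alpha)[0, 0, 0])
```

Because the weights sum to one, the fused feature stays on the same scale as the individual branch features regardless of how many branches are used.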
[0046] To enhance the representation of target region features, a spatial attention module is introduced to generate a weight map A:
[0047]
[0048] Finally, the spatial attention weights are multiplied pixel-by-pixel with the preliminary fused features to output the final fused features.
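A minimal sketch of a spatial attention weight map and its pixel-wise application follows. It assumes a common design in which channel-wise mean and max maps are combined and passed through a sigmoid; two scalar weights stand in for the learned convolution, and the exact form used in this application is not given in the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(x, w_avg=1.0, w_max=1.0, bias=0.0):
    """Weight map A of shape (1, H, W) computed from a (C, H, W) input.
    The scalars w_avg, w_max and bias are hypothetical stand-ins for a
    learned convolution over the pooled maps."""
    avg = x.mean(axis=0, keepdims=True)  # channel-wise average pooling
    mx = x.max(axis=0, keepdims=True)    # channel-wise max pooling
    return sigmoid(w_avg * avg + w_max * mx + bias)

def apply_attention(fused, source):
    """Multiply the fused features pixel-by-pixel with the weight map
    derived from `source`; both must share the same H and W."""
    return spatial_attention(source) * fused

source = np.zeros((3, 4, 4))  # stand-in for the multispectral data
fused = np.ones((2, 4, 4))    # stand-in for the preliminary fused features
print(apply_attention(fused, source)[0, 0, 0])
```

The weight map broadcasts across channels, so every channel of the fused features is scaled by the same per-pixel factor between 0 and 1.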
[0049] Step 5, Semi-supervised Predictive Detection: The semi-supervised predictive detection module adopts a Teacher-Student dual-module structure and a consistency regularization strategy, and is trained jointly on labeled data (Label) and unlabeled data (Unlabel). The Teacher module generates high-confidence pseudo-labels for weakly augmented unlabeled data. The Student module processes both weakly augmented labeled data and strongly augmented unlabeled data.
[0050] The inference phase uses only the Student module, which takes a sequence of multispectral images to be detected as input, extracts and fuses features, outputs the prediction results, removes duplicate bounding boxes through non-maximum suppression, and finally outputs the target category and location information.
[0051] During training, the Student module parameters are optimized through backpropagation, and the Teacher module parameters are updated using an exponential moving average.
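The combination of a supervised loss and a consistency loss on pseudo-labels can be illustrated with a classification-only sketch. It assumes cross-entropy for both terms, a confidence threshold of 0.9 for accepting Teacher pseudo-labels, and a weighting factor `lam`; box regression terms are omitted, and none of these specifics come from this application.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy of class probabilities (N, K) against integer labels (N,)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + eps)))

def semi_supervised_loss(student_lab, labels, teacher_weak, student_strong,
                         threshold=0.9, lam=1.0):
    """Supervised loss on labeled data plus consistency loss on unlabeled data.
    teacher_weak: Teacher probabilities on weakly augmented unlabeled samples.
    student_strong: Student probabilities on strongly augmented versions."""
    sup = cross_entropy(student_lab, labels)
    conf = teacher_weak.max(axis=1)
    pseudo = teacher_weak.argmax(axis=1)
    keep = conf >= threshold  # keep only high-confidence pseudo-labels
    cons = cross_entropy(student_strong[keep], pseudo[keep]) if keep.any() else 0.0
    return sup + lam * cons, int(keep.sum())

student_lab = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
teacher_weak = np.array([[0.95, 0.05], [0.6, 0.4]])
student_strong = np.array([[0.9, 0.1], [0.5, 0.5]])
print(semi_supervised_loss(student_lab, labels, teacher_weak, student_strong))
```

Filtering by confidence means that an uncertain Teacher prediction contributes nothing to the consistency term rather than a noisy target.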
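Non-maximum suppression, used here to remove duplicate bounding boxes, can be sketched as a greedy procedure. The sketch assumes boxes in [x1, y1, x2, y2] format, an IoU threshold of 0.5, and class-agnostic suppression; these parameters are illustrative and are not stated in this application.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes in [x1, y1, x2, y2] format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring remaining box and
    drop every other box that overlaps it by more than the threshold."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        if rest.size == 0:
            break
        order = rest[iou(boxes[i], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))
```

For multi-class detection the same routine is usually run once per class so that overlapping boxes of different categories do not suppress each other.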
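The exponential moving average update of the Teacher parameters can be sketched as follows. It assumes the parameters are held in dictionaries of NumPy arrays keyed by name and uses a decay of 0.99; the decay value and the data layout are assumptions made for illustration.

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """Return new Teacher parameters as an exponential moving average:
    teacher <- decay * teacher + (1 - decay) * student."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}

teacher = {"w": np.array([1.0, 1.0])}
student = {"w": np.array([0.0, 2.0])}
teacher = ema_update(teacher, student, decay=0.9)
print(teacher["w"])
```

Because the Teacher changes slowly, its pseudo-labels are more stable from step to step than the predictions of the Student that is being trained by backpropagation.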
[0052] Furthermore, this embodiment also provides a target detection device based on a metasurface-based multispectral imaging system, which can be used to execute the above-described target detection method based on a metasurface-based multispectral imaging system.
[0053] The target detection device of the metasurface-based multispectral imaging system specifically includes: a metasurface-based multispectral optical system and a target detection module.
[0054] Among them, the multispectral optical system based on metasurfaces is used to acquire multispectral images of the target scene.
[0055] The target detection module is used to extract multi-scale features of three-dimensional multispectral data in parallel through a multi-branch feature extraction module; determine branch weight parameters, calculate the normalized weight of each branch using the branch weight parameters, and fuse the multi-scale features of each branch using the normalized weights to obtain preliminary fused features; determine the spatial weight of the three-dimensional multispectral data through a spatial attention mechanism, and use the spatial weights to weight the preliminary fused features to obtain final fused features; use a teacher-student model to detect the final fused features and output prediction results, which include target category and location information.
[0056] The specific details of each module or unit in the target detection device of the above-mentioned metasurface-based multispectral imaging system have been described in detail in the corresponding target detection method of the metasurface-based multispectral imaging system, so they will not be repeated here.
[0057] This application also provides an electronic device. Figure 5 shows a schematic diagram of the structure of an electronic device suitable for implementing embodiments of the present disclosure. The electronic device 600 shown in Figure 5 is merely an example and should not be construed as limiting the functionality and scope of use of the embodiments disclosed herein.
[0058] As shown in Figure 5, the electronic device 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes based on a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for system operation. The CPU 601, ROM 602, and RAM 603 are interconnected via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
[0059] The following components are connected to I / O interface 605: an input section 606 including a keyboard, mouse, etc.; an output section 607 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 608 including a hard disk, etc.; and a communication section 609 including a network interface card such as a LAN card, modem, etc. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to I / O interface 605 as needed. A removable medium 611, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on drive 610 as needed so that computer programs read from it can be installed into storage section 608 as needed.
[0060] In particular, according to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this disclosure include a computer program product comprising a computer program carried on a computer-readable storage medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication section 609, and / or installed from removable medium 611. When the computer program is executed by central processing unit (CPU) 601, it performs the functions defined in the embodiments of this application.
[0061] For example, when the computer program is executed by the central processing unit (CPU) 601, it can perform the following: acquire multispectral images of the target scene through a metasurface-based multispectral optical system, and stitch the multispectral images into three-dimensional multispectral data; extract multi-scale features of the three-dimensional multispectral data in parallel through a multi-branch feature extraction module; determine branch weight parameters, calculate the normalized weights of each branch through the branch weight parameters, and fuse the multi-scale features of each branch using the normalized weights to obtain preliminary fused features; determine the spatial weights of the three-dimensional multispectral data through a spatial attention mechanism, and weight the preliminary fused features using the spatial weights to obtain final fused features; detect the final fused features using a teacher-student model, and output prediction results, the prediction results including target category and location information.
[0062] It should be noted that the computer-readable medium disclosed herein may be a computer-readable signal medium or a computer-readable storage medium, or any combination thereof. A computer-readable storage medium may be, for example,—but not limited to—an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In this disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. Computer-readable signal media can also be any computer-readable medium other than computer-readable storage media, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.
[0063] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0064] The units described in the embodiments of this disclosure can be implemented in software or hardware, and the described units can also be located in a processor. The names of these units do not necessarily limit the unit itself.
[0065] In another aspect, this application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments; or it may exist independently and not assembled into the electronic device. The computer-readable medium carries one or more programs, which include instructions that, when executed by the electronic device, cause the electronic device to perform the methods described in the above embodiments.
[0066] It should be noted that although several modules or units for the device used to perform actions have been mentioned in the detailed description above, this division is not mandatory. In fact, according to the embodiments of this application, the features and functions of two or more modules or units described above can be embodied in one module or unit. Conversely, the features and functions of one module or unit described above can be further divided and embodied by multiple modules or units.
[0067] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A metasurface-based multispectral imaging system, comprising: a front imaging system, an array aperture stop, and a microlens array; wherein a single image plane is formed after one imaging operation by the front imaging system; the array aperture stop includes multiple apertures, each aperture corresponding to one light channel, and the single image plane is divided into a multi-channel image by the multiple apertures; and the microlens array, based on metasurface aperture imaging, focuses the image passing through each aperture onto the corresponding position of the image sensor according to the spectrum to obtain a multispectral image.
2. The metasurface-based multispectral imaging system of claim 1, wherein the microlens array consists of a metasurface, an array black matrix, and a lens matrix; the metasurface is arranged with multiple nanounits, each nanounit corresponding to one spectral channel; the transmittance of the target wavelength in its corresponding spectral channel is higher than in the other spectral channels, and the target wavelength differs between spectral channels; the array black matrix is used to suppress spectral crosstalk between different spectral channels; and the lens matrix is a non-uniform array matched one-to-one with the spectral channels.
3. The multispectral imaging system based on metasurfaces according to claim 1, characterized in that each nanounit is composed of a periodic arrangement of identical nanopillar structures, and the resonant wavelength of the nanopillar structure is pre-adjusted so that the difference between the spectral transmittance of the resonant wavelength and the target wavelength meets the preset conditions.
4. The multispectral imaging system based on metasurfaces according to claim 1, characterized in that the metasurface has nine spectral channels, forming a nine-channel spectral filter matrix; and the array black matrix is based on chromium oxide and is fabricated using photolithography.
5. A target detection method based on a multispectral imaging system with metasurfaces, characterized in that the method includes: acquiring a multispectral image of the target scene with the metasurface-based multispectral optical system according to any one of claims 1-4, and stitching the multispectral image together to form three-dimensional multispectral data; extracting multi-scale features of the three-dimensional multispectral data in parallel by a multi-branch feature extraction module; determining branch weight parameters, calculating the normalized weight of each branch using the branch weight parameters, and using the normalized weights to fuse the multi-scale features of each branch to obtain preliminary fused features; determining the spatial weights of the three-dimensional multispectral data by a spatial attention mechanism, and using the spatial weights to weight the preliminary fused features to obtain final fused features; and detecting the final fused features using a teacher-student model and outputting a prediction result, which includes target category and location information.
6. The target detection method of the multispectral imaging system based on metasurfaces according to claim 5, characterized in that each branch of the multi-branch feature extraction module includes multiple sub-networks, and extracting the multi-scale features of the three-dimensional multispectral data in parallel through the multi-branch feature extraction module includes: extracting output features of different scales through each sub-network, the output features including target category probability, bounding box coordinate offset, and target presence confidence; upsampling and concatenating the output features of each scale from the different branches to obtain the concatenated fused features of that scale; and processing the concatenated fused features of each scale by a multi-head attention module to obtain the multi-scale features of each scale.
7. The target detection method of the multispectral imaging system based on metasurfaces according to claim 5, characterized in that the method also includes: performing semi-supervised training on the teacher-student model, wherein the loss function includes the supervision loss for labeled data and the consistency loss for unlabeled data; using the student module in the trained teacher-student model to detect the final fused features; and removing duplicate bounding boxes by non-maximum suppression and outputting the prediction results.
8. A target detection device based on a multispectral imaging system using metasurfaces, characterized in that the device includes: a metasurface-based multispectral optical system and a target detection module; wherein the metasurface-based multispectral optical system is used to acquire multispectral images of the target scene; and the target detection module is used to extract multi-scale features of three-dimensional multispectral data in parallel through a multi-branch feature extraction module; determine branch weight parameters, calculate the normalized weight of each branch using the branch weight parameters, and fuse the multi-scale features of each branch using the normalized weights to obtain preliminary fused features; determine the spatial weights of the three-dimensional multispectral data through a spatial attention mechanism, and use the spatial weights to weight the preliminary fused features to obtain final fused features; and use a teacher-student model to detect the final fused features and output prediction results, which include target category and location information.
9. A computer-readable storage medium storing a computer program, which, when executed by a processor, causes the processor to perform the target detection method of the multispectral imaging system based on metasurfaces as described in any one of claims 5 to 7.
10. An electronic device, characterized in that the device includes a processor and a memory, the memory storing one or more computer programs, the one or more computer programs including instructions that, when executed by the electronic device, cause the electronic device to perform the target detection method of the multispectral imaging system based on metasurfaces as described in any one of claims 5 to 7.
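For illustration only, the weighting steps recited in claim 5 (normalized branch weights, then spatial weighting) might be sketched as follows. The claims do not specify how the weights are normalized or which spatial attention mechanism is used, so the softmax normalization, the parameter-free mean-plus-max attention map, and all function names here are assumptions made for this sketch rather than the applicant's implementation. It also assumes that every branch outputs features of the same shape and the same spatial size as the input data.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array (assumed normalization).
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse_branches(branch_feats, branch_params):
    # branch_feats: list of arrays of identical shape (C, H, W), one per branch.
    # branch_params: raw branch weight parameters, one scalar per branch.
    w = softmax(np.asarray(branch_params, dtype=float))
    stacked = np.stack(branch_feats, axis=0)           # (B, C, H, W)
    fused = np.tensordot(w, stacked, axes=(0, 0))      # weighted sum -> (C, H, W)
    return fused, w

def spatial_weights(cube):
    # cube: three-dimensional multispectral data of shape (bands, H, W).
    # Hypothetical attention: sigmoid of the band-wise mean plus band-wise max.
    score = cube.mean(axis=0) + cube.max(axis=0)       # (H, W)
    return 1.0 / (1.0 + np.exp(-score))

def final_fusion(cube, branch_feats, branch_params):
    prelim, _ = fuse_branches(branch_feats, branch_params)
    # Broadcast the (H, W) spatial weights across all feature channels.
    return prelim * spatial_weights(cube)[None, :, :]

if __name__ == "__main__":
    cube = np.zeros((9, 4, 4))                         # nine spectral channels
    feats = [np.ones((2, 4, 4)), 3.0 * np.ones((2, 4, 4))]
    out = final_fusion(cube, feats, [0.0, 0.0])
    print(out.shape)
```

In a trained network the branch weight parameters and the attention map would typically be learned; fixed values are used here only to keep the sketch self-contained.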
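As a further hedged illustration, the post-processing and training objective recited in claim 7 could look roughly like the sketch below. The claims name non-maximum suppression and a loss combining a supervision term with a consistency term, but give no thresholds or formulas; the IoU threshold of 0.5, the mean-squared-error consistency term, the weighting factor `lam`, and the function names are all assumptions introduced for this example. Boxes are assumed to be `[x1, y1, x2, y2]` with positive area.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    # Intersection-over-union between one box and an array of boxes.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy non-maximum suppression; returns indices of kept boxes,
    # ordered from highest to lowest score.
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(np.asarray(scores, dtype=float))[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        if rest.size == 0:
            break
        ious = iou_one_to_many(boxes[i], boxes[rest])
        order = rest[ious <= iou_thresh]   # drop boxes overlapping the kept one
    return keep

def semi_supervised_loss(sup_loss, student_pred, teacher_pred, lam=1.0):
    # Supervision loss on labeled data plus a consistency loss on unlabeled
    # data (assumed here to be the MSE between student and teacher outputs).
    s = np.asarray(student_pred, dtype=float)
    t = np.asarray(teacher_pred, dtype=float)
    return float(sup_loss + lam * np.mean((s - t) ** 2))

if __name__ == "__main__":
    boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
    print(nms(boxes, [0.9, 0.8, 0.7]))
```

The first two boxes in the usage example overlap with an IoU of about 0.68, so only the higher-scoring one survives at the assumed 0.5 threshold.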