Intelligent Identification and Classification Method for Marine Debris Based on Multispectral UAV Remote Sensing Imagery

By combining multispectral UAV remote sensing imagery with deep learning technology, the problem of insufficient accuracy in marine debris monitoring has been solved, enabling efficient and real-time marine debris identification, especially for small target detection in complex environments.

CN120673112B · Active · Publication Date: 2026-06-30 · MINISTRY OF ECOLOGY & ENVIRONMENT PEARL RIVER BASIN & SOUTH CHINA SEA ECOLOGICAL ENVIRONMENT SUPERVISION & ADMINISTRATION BUREAU ECOLOGICAL ENVIRONMENT MONITORING & SCI RES CENT

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
MINISTRY OF ECOLOGY & ENVIRONMENT PEARL RIVER BASIN & SOUTH CHINA SEA ECOLOGICAL ENVIRONMENT SUPERVISION & ADMINISTRATION BUREAU ECOLOGICAL ENVIRONMENT MONITORING & SCI RES CENT
Filing Date
2025-04-29
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Traditional methods for monitoring marine debris are inefficient and lack precision, making it difficult to achieve high-precision and dynamic monitoring, especially in complex environments where the detection accuracy for small targets is low.

Method used

By combining multispectral UAV remote sensing imagery with deep learning technology, and through color correction networks, multi-scale feature pyramids, and conditional generative adversarial networks, the distinguishability between marine debris and background is enhanced. A lightweight detection and classification model is built using EfficientDet-Lite to achieve intelligent identification of marine debris.

Benefits of technology

It significantly improves the accuracy and robustness of marine debris identification, reduces the false detection rate, meets real-time requirements, is suitable for online monitoring by drones, and improves the efficiency of marine debris detection.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses an intelligent identification and classification method for marine debris based on multispectral UAV remote sensing imagery, belonging to the field of image processing technology. The method includes: acquiring and preprocessing remote sensing images of a predetermined sea area; constructing a color correction network to perform color correction on the preprocessed remote sensing images; acquiring multi-band features and spectral indices to enhance debris detection; fusing multispectral features using a multi-scale feature pyramid structure; training a conditional generative adversarial network (GAN), inputting the remote sensing images and multispectral features, and fusing the enhanced remote sensing images and the color-corrected remote sensing images pixel-by-pixel to obtain a multi-band fused feature map; constructing a lightweight detection network based on EfficientDet-Lite, using the multi-band fused feature map as input, and outputting the bounding boxes and categories of debris using a joint detection-classification architecture. This invention solves problems such as color distortion, blurred contours, and missed detection of small targets in marine debris identification, while ensuring real-time marine debris identification and classification.

Description

Technical Field

[0001] This invention relates to the field of image processing technology, and more specifically, to a method for intelligent identification and classification of marine debris based on multispectral UAV remote sensing images.

Background Technology

[0002] With the increasing severity of global marine pollution, the monitoring and management of marine debris (such as plastics, foam, and fishing nets) has become a crucial issue for environmental protection. Traditional marine debris monitoring mainly relies on manual patrols or satellite remote sensing. However, manual patrols are inefficient and have limited coverage, while satellite remote sensing is limited by spatial resolution and revisit frequency, making it difficult to meet the needs for high-precision and dynamic monitoring. In recent years, unmanned aerial vehicle (UAV) remote sensing technology has gradually become an important means of marine environmental monitoring due to its advantages of high flexibility, low cost, and high resolution.

[0003] Multispectral UAV remote sensing, equipped with multispectral sensors, can acquire richer spectral information than visible light, thereby enhancing the ability to identify marine debris. However, marine debris is complex in distribution and diverse in form, and is affected by factors such as lighting, waves, and suspended matter. Traditional classification methods based on thresholds or manual feature extraction often lack sufficient accuracy and cannot meet the needs of practical applications. In recent years, deep learning technology has made significant progress in the field of image recognition, especially models such as convolutional neural networks (CNNs) and Transformers, which have demonstrated powerful feature learning capabilities in target detection and classification tasks, providing a new technical path for the intelligent identification of marine debris. Therefore, how to effectively utilize multi-band information to enhance the distinction between debris and background, and suppress environmental interference to improve the detection accuracy of small targets, is a problem that urgently needs to be solved.

Summary of the Invention

[0004] To address the aforementioned technical problems, this invention proposes an intelligent identification and classification method for marine debris based on multispectral UAV remote sensing images. This method solves problems such as color distortion, blurred outlines, and missed detection of small targets in marine debris identification, while ensuring the real-time nature of marine debris identification and classification.

[0005] The first aspect of this invention provides a method for intelligent identification and classification of marine debris based on multispectral UAV remote sensing imagery, comprising the following steps:

[0006] The remote sensing images of a preset sea area are collected using a multispectral sensor carried by a drone. The remote sensing images are preprocessed, a color correction network is constructed, and the color of the preprocessed remote sensing images is corrected.

[0007] Multi-band features and spectral indices for enhanced waste detection are obtained, and a multi-scale feature pyramid structure is used to fuse the multi-band features and spectral indices to generate multi-spectral features.

[0008] A conditional generative adversarial network is trained. The remote sensing images and the corresponding multispectral features are input to it to obtain enhanced remote sensing images. The enhanced remote sensing images are then fused pixel by pixel with the color-corrected remote sensing images to obtain a multi-band fused feature map.

[0009] A lightweight detection and classification model is built based on EfficientDet-Lite. The multi-band fused feature map is used as input, and a joint detection-classification architecture is adopted to output the bounding box of the waste and the waste category.

[0010] In this scheme, a multispectral sensor mounted on a drone is used to collect remote sensing images of a predetermined sea area. The remote sensing images are then preprocessed, specifically as follows:

[0011] The drone is equipped with a multispectral sensor to collect remote sensing images of a predetermined ocean area, including visible light, near-infrared and short-wave infrared bands. Metadata during the remote sensing image acquisition process is recorded. The multispectral remote sensing images are grouped by band, and the band with the highest spatial resolution is selected as the reference image, while the other bands are used as images to be registered.

[0012] The remote sensing image is initially coarsely registered based on the metadata, and each band is initially aligned using affine transformation. Histogram matching is then used to make the brightness distribution of the image to be registered approximate that of the reference image.
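
The histogram matching step described above can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: it assumes single-band 2-D arrays, and the function name `match_histogram` is hypothetical.

```python
import numpy as np

def match_histogram(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Remap the levels of `source` so its CDF approximates that of `reference`."""
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distribution functions, normalised to [0, 1].
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source level, take the reference level at the same quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_idx].reshape(source.shape)
```

In this sketch the band to be registered would be passed as `source` and the highest-resolution band as `reference`; each band is matched separately.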

[0013] In the reference image and the image to be registered, the ORB algorithm combined with the feature pyramid is used to detect multi-scale feature points, the fast approximate nearest neighbor algorithm is used to perform preliminary matching of feature points, the Euclidean distance between descriptors is calculated for registration, and mismatched points are removed based on the registration results.
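
As a rough illustration of descriptor matching by Euclidean distance, the sketch below uses an exact brute-force search in place of the fast approximate nearest neighbor algorithm named above. The ratio test used to reject ambiguous matches is an assumption added for illustration and is not stated in the text; `match_descriptors` and its parameters are hypothetical names.

```python
import numpy as np

def match_descriptors(desc_ref, desc_mov, ratio=0.8):
    """Return (i, j) pairs matching reference descriptors to moving descriptors.

    desc_ref: (n_ref, d) array, desc_mov: (n_mov, d) array with n_mov >= 2.
    """
    # Pairwise Euclidean distances, shape (n_ref, n_mov).
    diff = desc_ref[:, None, :] - desc_mov[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    matches = []
    for i, row in enumerate(dist):
        order = np.argsort(row)
        best, second = order[0], order[1]
        # Ratio test (assumed): keep a match only if it clearly beats the runner-up.
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches
```

Pairs that survive this filter would still go through the mismatch removal step described in the text.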

[0014] The reference image and the image to be registered are divided into grids. The local homography transformation is calculated independently for each grid, and gridded local registration is performed. The transformation parameters of adjacent grids are smoothed by the moving least squares method to obtain the registered remote sensing image.

[0015] In this scheme, a color correction network is constructed to perform color correction on the preprocessed remote sensing images, specifically as follows:

[0016] A color correction network is constructed based on U-Net as the backbone network. The input layer channel layer is extended to the number of bands of multispectral remote sensing images. In the encoder part, grouped convolution is used to process different bands independently and extract band features. After each downsampling, an attention mechanism is used to dynamically weight important band features.

[0017] In the bottleneck layer, a cross-band feature interaction module is used to fuse multispectral information. In the decoder part, a multi-scale channel attention mechanism is introduced to adaptively fuse the upsampled features of each layer with the band features extracted by the corresponding encoder. The output layer is used to generate the corrected multispectral remote sensing image.

[0018] A discriminator is added for adversarial training to optimize the color correction network. The generated multispectral remote sensing image and the real label are each cut into image blocks, and the discriminator judges the authenticity of each block. Alternating training is used to optimize the local color output of the color correction network based on the judgment results.

[0019] The preprocessed remote sensing images are imported into the trained color correction network, and the color-corrected multi-band remote sensing images are output.

[0020] In this scheme, multi-band features and spectral indices for enhanced waste detection are obtained. A multi-scale feature pyramid structure is used to fuse the multi-band features and spectral indices to generate multispectral features, specifically:

[0021] After acquiring color-corrected multi-band remote sensing images, low-level features of each band are extracted independently using lightweight convolutional blocks. The low-level features of each band are then input into the EfficientNet backbone network to extract high-level semantic features. Multi-band contextual information is fused, and multi-scale feature maps are output as multi-band features.

[0022] Marine debris detection instances are retrieved and preprocessed, and SHAP local attribution analysis is used to interpret the preprocessed instance samples. The Shapley values corresponding to each spectral index in the instance samples are calculated, and the absolute values of the Shapley values corresponding to each spectral index are taken and normalized to a percentage to quantify local importance.

[0023] The spectral indices involved in the instance samples are sorted according to the local importance, a preset number of spectral indices are selected as key spectral indices, radiometrically corrected multi-band remote sensing images are obtained, and spectral indices for enhanced garbage detection are extracted based on the key spectral indices.
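
The normalization and ranking described in the two paragraphs above amount to simple arithmetic, sketched below. The Shapley values are assumed to have been computed already by a SHAP explainer; the function names and the example values are hypothetical.

```python
def local_importance(shap_values):
    """Convert per-index Shapley values to percentages of total absolute contribution."""
    total = sum(abs(v) for v in shap_values.values())
    if total == 0:
        return {name: 0.0 for name in shap_values}
    return {name: 100.0 * abs(v) / total for name, v in shap_values.items()}

def key_indices(shap_values, k):
    """Names of the k spectral indices with the highest local importance."""
    importance = local_importance(shap_values)
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]

# Hypothetical Shapley values for one detection instance.
example = {"FDI": 0.30, "NDVI": -0.10, "NDWI": 0.05, "PI": 0.05}
print(local_importance(example))
print(key_indices(example, 2))
```

Because absolute values are used, an index that suppresses a detection (negative Shapley value) can still rank as important.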

[0024] The multi-band features and spectral indices are fused step by step using a multi-scale feature pyramid to generate multispectral features. The multispectral features are then batch normalized and nonlinearly enhanced to output optimized multispectral features.

[0025] In this scheme, enhanced remote sensing imagery is acquired, and then the enhanced remote sensing imagery is fused pixel-by-pixel with the color-corrected remote sensing imagery to obtain a multi-band fusion feature map. Specifically:

[0026] A generator based on residual dense blocks combined with multi-scale dilated convolution is constructed to form a conditional generative adversarial network. The preprocessed remote sensing image and its corresponding multispectral features are input to obtain a remote sensing image with enhanced details.

[0027] The enhanced remote sensing image is imported into the discriminator, which determines whether the imported image belongs to real data or generated data. The generator and discriminator of the conditional generative adversarial network are trained alternately using training data, and the parameters are updated until the discriminator can no longer correctly classify the imported image, and the enhanced remote sensing image is output.

[0028] The color-corrected remote sensing image is acquired and subjected to Sobel edge extraction to obtain an edge intensity map. A dynamic weight map is generated based on the edge intensity map. An adaptive weight is obtained based on the dynamic weight map, and the enhanced remote sensing image and the color-corrected remote sensing image are fused pixel by pixel to obtain a multi-band fusion feature map.

[0029] In this solution, a lightweight detection and classification model is built based on EfficientDet-Lite, specifically as follows:

[0030] A lightweight detection and classification model is built based on the EfficientDet-Lite framework. Training and testing data are obtained using marine debris detection instances for model training and testing.

[0031] The training data is imported into the detection and classification model, and the Ghost module is used to replace the traditional convolution for feature extraction. The feature map output by the Ghost module is weighted by coordinate attention to enhance the response at key locations.
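
To show why replacing a traditional convolution with a Ghost module lightens the model, the sketch below compares weight counts. It assumes the usual Ghost module formulation (a primary convolution that produces 1/s of the output channels, plus d×d depthwise operations that generate the remaining maps); the values of `s` and `d` are assumptions, as the text does not state them.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def ghost_params(c_in, c_out, k, s=2, d=3):
    """Weights in a Ghost module; assumes c_out is divisible by s."""
    intrinsic = c_out // s                 # channels from the primary convolution
    primary = c_in * intrinsic * k * k
    cheap = (s - 1) * intrinsic * d * d    # one d x d depthwise filter per ghost map
    return primary + cheap

standard = conv_params(64, 128, 3)
ghost = ghost_params(64, 128, 3)
print(standard, ghost, round(standard / ghost, 2))  # 73728 37440 1.97
```

With s = 2 the weight count is roughly halved, which is consistent with the stated goal of a lightweight model.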

[0032] The detection branch is constructed using a lightweight feature pyramid. A BiFPN structure is adopted, and the convolution in the cross-connection is replaced with Ghost convolution. The weighted feature map is imported into the detection branch. After further feature extraction through the lightweight feature pyramid, three Ghost convolutional layers with shared weights are used to generate bounding box predictions.
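
BiFPN structures typically merge feature maps with learnable, normalized weights. The sketch below shows that fusion rule in NumPy, assuming the inputs have already been resized to a common shape. Whether this exact rule is used here is an assumption, and the Ghost convolution applied after fusion is omitted.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Weighted fusion of same-shaped feature maps with non-negative, normalised weights."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU keeps weights >= 0
    w = w / (eps + w.sum())                                # normalise without a softmax
    return sum(wi * f for wi, f in zip(w, features))

# Two hypothetical pyramid levels already resized to the same resolution.
top_down = np.ones((2, 4, 4))
lateral = 3 * np.ones((2, 4, 4))
fused = fast_normalized_fusion([top_down, lateral], [1.0, 1.0])
```

A weight that training drives negative is clipped to zero, so the corresponding input drops out of the fusion.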

[0033] In the classification branch, the weighted feature maps are concatenated, and different band features are dynamically weighted through channel attention. A classifier is built using two fully connected layers to generate class predictions. After iterative training, the bounding box predictions and class predictions are validated using the test data. When the validation is successful, the network parameters of the current detection classification model are retained, and the trained detection classification model is output.

[0034] In this scheme, the multi-band fused feature map is used as input, and a joint detection-classification architecture is adopted to output the bounding box and category of the debris, specifically:

[0035] The multi-band fused feature map corresponding to the multi-band remote sensing image is imported into the detection and classification model. The debris bounding box prediction and the debris category prediction are obtained through the detection branch and the classification branch, which are computed in parallel.

[0036] Based on the predicted bounding box and category of the debris, time-series processing is performed to obtain the movement path of marine debris. Spatiotemporal features are extracted based on the movement path, and the movement path is predicted using the spatiotemporal features. The predicted movement path is then sent and displayed using a preset method.
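
The text does not specify how detections are linked over time or how the path is predicted. As a deliberately simple stand-in, the sketch below links each frame's nearest box centre into a path and extrapolates with a constant-velocity assumption. All names and the `max_jump` threshold are hypothetical, and category labels are ignored.

```python
def box_center(box):
    """Centre of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def build_path(frames, max_jump=50.0):
    """Follow one object through per-frame box lists by nearest-centre linking."""
    path = [box_center(frames[0][0])]  # start from the first box of the first frame
    for boxes in frames[1:]:
        if not boxes:
            continue  # no detections in this frame
        px, py = path[-1]
        centers = [box_center(b) for b in boxes]
        cx, cy = min(centers, key=lambda c: (c[0] - px) ** 2 + (c[1] - py) ** 2)
        if ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5 <= max_jump:
            path.append((cx, cy))
    return path

def predict_next(path):
    """Constant-velocity extrapolation from the last two path points."""
    (x0, y0), (x1, y1) = path[-2], path[-1]
    return (2 * x1 - x0, 2 * y1 - y0)
```

A learned spatiotemporal model, as the text suggests, would replace `predict_next`; the linking step would also need to handle several objects at once.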

[0037] The second aspect of this invention provides an intelligent marine debris identification and classification system based on multispectral UAV remote sensing imagery. The system includes a data acquisition and preprocessing module, a multispectral feature enhancement module, a target detection and classification module, and a post-processing optimization module.

[0038] The data acquisition and preprocessing module acquires remote sensing images of a preset sea area, preprocesses the remote sensing images, and uses a color correction network to perform color correction on the preprocessed remote sensing images.

[0039] The multispectral feature enhancement module is responsible for acquiring the multi-band features and the spectral indices for enhanced debris detection. It uses a multi-scale feature pyramid structure to fuse the multi-band features and spectral indices to generate multispectral features. It uses a conditional generative adversarial network to generate enhanced remote sensing images and fuses them pixel by pixel with the color correction results to construct a multi-band fused feature map.

[0040] The target detection and classification module is responsible for building a lightweight detection and classification model based on EfficientDet-Lite, taking the multi-band fused feature map as input, and using the detection branch and classification branch to obtain the garbage detection box and garbage category.

[0041] The post-processing optimization module is responsible for optimizing waste detection results, reducing false detections and missed detections, and compressing and accelerating the detection and classification model.

[0042] Compared with the prior art, the beneficial effects of the present invention are as follows:

[0043] This invention significantly improves the accuracy, robustness, and real-time performance of marine debris identification by fusing multispectral data, deep learning enhancement, and lightweight detection techniques. Multi-band feature fusion improves the detection rate of debris such as plastics and foams, especially under complex lighting and water conditions, greatly reducing the false detection rate; the conditional generative adversarial network mitigates contour blurring and color distortion, improving the detection accuracy of small targets (such as microplastic fragments).

[0044] In addition, the lightweight detection model (EfficientDet-Lite) is selected to meet real-time operational requirements and is suitable for online monitoring by UAVs, which greatly improves the efficiency of marine debris detection. It addresses the shortcomings of traditional methods in complex environments, such as low efficiency, missed detection, false detection, and insufficient real-time performance, and provides a feasible intelligent solution for marine environmental protection.

Description of the Drawings

[0045] To more clearly illustrate the technical solutions in the embodiments or examples of the present invention, the drawings used in the embodiments or examples will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained according to these drawings without creative effort.

[0046] Figure 1 shows a flowchart of a method for intelligent identification and classification of marine debris based on multispectral UAV remote sensing imagery.

[0047] Figure 2 shows a flowchart of the generation of multispectral features in the embodiment.

[0048] Figure 3 shows a flowchart of the construction of the lightweight detection and classification model in this embodiment.

[0049] Figure 4 shows a block diagram of a marine debris intelligent identification and classification system based on multispectral UAV remote sensing imagery.

Detailed Implementation

[0050] To better understand the above-mentioned objectives, features, and advantages of the present invention, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, unless otherwise specified, the embodiments and features described in these embodiments can be combined with each other.

[0051] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and therefore the scope of protection of the invention is not limited to the specific embodiments disclosed below.

[0052] Figure 1 shows a flowchart of a method for intelligent identification and classification of marine debris based on multispectral UAV remote sensing imagery.

[0053] As shown in Figure 1, this embodiment provides a method for intelligent identification and classification of marine debris based on multispectral UAV remote sensing imagery, including:

[0054] S102, use the multispectral sensor on the drone to collect remote sensing images of the preset sea area, preprocess the remote sensing images, construct a color correction network, and perform color correction on the preprocessed remote sensing images;

[0055] S104, acquire multi-band features and spectral indices for enhanced garbage detection, and fuse the multi-band features and spectral indices using a multi-scale feature pyramid structure to generate multi-spectral features;

[0056] S106, train the conditional generative adversarial network, input remote sensing images and corresponding multispectral features, obtain enhanced remote sensing images, fuse the enhanced remote sensing images with the color-corrected remote sensing images pixel by pixel, and obtain a multi-band fusion feature map.

[0057] S108. A lightweight detection and classification model is constructed based on EfficientDet-Lite. The multi-band fused feature map is used as input, and the bounding box of the waste and the waste category are output using a joint detection-classification architecture.

[0058] It should be noted that a drone equipped with a multispectral sensor was used to acquire remote sensing images of a predetermined marine area to improve the spectral differentiation of different types of debris (such as plastic, foam, and fishing nets). The multispectral remote sensing images included visible light, near-infrared, and short-wave infrared bands. Metadata from the image acquisition process was recorded, such as flight altitude, lighting conditions, and shooting angle. The multispectral images were grouped by band, and the band with the highest spatial resolution was selected as the reference image, while the remaining bands were used as images to be registered. Initial coarse registration of the remote sensing images was performed based on the GPS and IMU data recorded by the drone. Affine transformations were used to initially align the bands, eliminating significant translation and rotation biases. Histogram matching was used to make the brightness distribution of the images to be registered approximate that of the reference image, reducing contrast deviations caused by differences in lighting or sensors. When fog interference was present, defogging was performed to enhance the extractability of edge features.

[0059] The ORB algorithm combined with feature pyramids is used to detect multi-scale feature points in both the reference image and the image to be registered. ORB is computationally efficient and suits scenarios with high real-time requirements, while the multi-scale pyramid strategy improves adaptability to small targets and weakly textured regions. A fast approximate nearest neighbor algorithm is used for initial feature point matching, and the Euclidean distance between descriptors is calculated for registration. Based on the registration results, mismatched points are removed using random sample consensus (RANSAC). Both the reference image and the image to be registered are divided into grids, and a local homography transformation is calculated independently for each grid cell to perform gridded local registration. This addresses the global registration residuals caused by lens distortion or terrain undulations. Moving least squares is used to smooth the transformation parameters of adjacent grid cells, avoiding obvious seams between blocks, and the registered remote sensing image is obtained. Gridded local registration reduces the computational load of global transformation, making it suitable for real-time processing on UAV platforms. Preferably, registration quality is verified by calculating the mutual information (MI) or structural similarity (SSIM) of the registered images, and registration is repeated until the accuracy requirements are met.
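
One way to compute the mutual information used as a registration quality indicator is from a joint histogram of the two bands, as sketched below. The bin count and the function name are assumptions; the acceptance threshold is left to the application.

```python
import numpy as np

def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 32) -> float:
    """Mutual information (in nats) between two same-sized single-band images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint probability estimate
    px = pxy.sum(axis=1, keepdims=True)       # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of b, shape (1, bins)
    nz = pxy > 0                              # skip empty cells to avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```

A well-registered band pair should score higher than the same pair with a residual shift, so the value can be compared before and after local registration.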

[0060] It should be noted that the color correction network uses U-Net as its backbone. The encoder-decoder structure preserves spatial details, which makes it suitable for pixel-level color correction. The input layer is expanded to as many channels as there are bands in the multispectral remote sensing image, for example 5 channels covering the visible light, near-infrared, and short-wave infrared bands. In the encoder, grouped convolutions process the bands independently, extracting band features and reducing cross-band interference. After each downsampling level, an attention mechanism dynamically weights the important band features; for example, the short-wave infrared band is more critical for plastic detection. In the bottleneck layer, a cross-band feature interaction module fuses the multispectral information through 1×1 convolutions to generate global color correction parameters. A multi-scale channel attention mechanism is introduced in the decoder: after upsampling at each layer, the upsampled features are adaptively fused with the band features extracted by the corresponding encoder level to reduce noise propagation. A 3×3 convolution in the output layer generates the corrected multispectral remote sensing image. A discriminator is added for adversarial training to optimize the color correction network. The generated multispectral remote sensing image and the real label are each cut into image patches, and the discriminator judges the authenticity of each patch. Alternating training based on these judgments optimizes the local color output of the color correction network, forcing the generator to improve local color consistency. The preprocessed remote sensing image is then imported into the trained color correction network, which outputs the color-corrected multi-band remote sensing image.

[0061] Figure 2 shows a flowchart of the generation of multispectral features in an embodiment.

[0062] According to an embodiment of the present invention, multi-band features and spectral indices for enhanced waste detection are obtained, and a multi-scale feature pyramid structure is used to fuse the multi-band features and spectral indices to generate multispectral features, specifically:

[0063] S202: Acquire multi-band remote sensing images after color correction, use lightweight convolutional blocks to independently extract low-level features of each band, import the low-level features of each band as input into the EfficientNet backbone network to extract high-level semantic features, fuse multi-band context information, and output multi-scale feature maps as multi-band features.

[0064] S204, retrieve marine debris detection instances, preprocess the marine debris detection instances, use SHAP local attribution analysis to interpret the preprocessed instance samples, calculate the Shapley value corresponding to each spectral index in the instance samples, take the absolute value of the Shapley value corresponding to each spectral index and normalize it to a percentage to quantify local importance.

[0065] S206, Sort the spectral indices involved in the instance samples according to the local importance, select a preset number of spectral indices as key spectral indices, obtain radiometrically corrected multi-band remote sensing images, and extract spectral indices for enhanced garbage detection based on the key spectral indices.

[0066] S208, the multi-band features and spectral indices are fused step by step using a multi-scale feature pyramid to generate multispectral features, and the multispectral features are batch normalized and nonlinearly enhanced to output optimized multispectral features.

[0067] It should be noted that after acquiring color-corrected multi-band remote sensing images, lightweight convolutional blocks are used to independently extract low-level features for each band, such as edges and textures, while preserving band specificity. The multi-band low-level feature maps are grouped by band and compressed to the standard input channel number using 1×1 convolutions to meet the input requirements of EfficientNet. A 3×3 depthwise separable convolution is used to initially fuse band information. The MBConv module is used for feature extraction, and through multiple downsampling stages, feature maps of different resolutions are output. Cross-band attention is added to these feature maps, and the correlation matrix between different band feature maps is calculated. Softmax is used to generate fusion weights, and after weighted summation, cross-band enhanced features are output. Multi-scale feature alignment is then performed to output high-level semantic features.
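
The cross-band attention step (correlation matrix, Softmax weights, weighted summation) can be illustrated with one feature map per band. The sketch below is a simplified NumPy reading of that description: using a Pearson-style correlation and a row-wise Softmax are assumptions, and the function name is hypothetical.

```python
import numpy as np

def cross_band_attention(feats: np.ndarray):
    """feats: (B, H, W), one feature map per band. Returns (enhanced, weights)."""
    b = feats.shape[0]
    flat = feats.reshape(b, -1)

    # Correlation matrix between band feature maps, shape (B, B).
    centered = flat - flat.mean(axis=1, keepdims=True)
    unit = centered / (np.linalg.norm(centered, axis=1, keepdims=True) + 1e-8)
    corr = unit @ unit.T

    # Row-wise softmax turns correlations into fusion weights.
    e = np.exp(corr - corr.max(axis=1, keepdims=True))
    weights = e / e.sum(axis=1, keepdims=True)

    # Each output band is a weighted sum of all input bands.
    enhanced = (weights @ flat).reshape(feats.shape)
    return enhanced, weights
```

Each row of `weights` sums to 1, so every enhanced band is a convex combination of the input bands, with more weight on bands that correlate strongly with it.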

[0068] Marine debris detection instances are retrieved, each comprising the detection result for a single multispectral image together with the corresponding multi-band data and spectral indices for the region. The instances are preprocessed by calculating the mean band features and mean spectral indices for each pixel, which generates the input vector to be interpreted. SHAP local attribution analysis is used to interpret the preprocessed instance samples, obtaining the Shapley value for each spectral index involved in the marine debris detection instance. The sign of a Shapley value indicates whether the index promotes or inhibits detection, and its magnitude reflects the feature contribution. Local importance is quantified by normalization, and the 5 most important spectral indices are selected as the core discrimination criteria for the instance and marked as key spectral indices. For example, FDI contributes the most to plastic detection, and NDVI has an inhibitory effect on algae that would otherwise be falsely detected.
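
As an example of a spectral index, NDVI is the normalized difference of the near-infrared and red bands. The sketch below computes it per pixel in NumPy; the small `eps` guard against division by zero is an assumption. FDI is not sketched, because its band definition is not given in the text.

```python
import numpy as np

def normalized_difference(band_a: np.ndarray, band_b: np.ndarray, eps: float = 1e-6):
    """Per-pixel (a - b) / (a + b); eps guards against division by zero."""
    a = band_a.astype(float)
    b = band_b.astype(float)
    return (a - b) / (a + b + eps)

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Normalized Difference Vegetation Index from near-infrared and red reflectance."""
    return normalized_difference(nir, red)
```

High NDVI values indicate vegetation such as floating algae, which fits the inhibitory role the text attributes to NDVI.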

[0069] It should be noted that the generator of the conditional generative adversarial network is constructed from two residual dense blocks and multi-scale dilated convolutional branches located between them. The residual dense blocks contain densely connected multi-layer residual modules, and the batch normalization layer is removed to avoid artifacts. The output of each convolutional layer is connected to all subsequent layers to achieve feature reuse and improve the efficiency of multispectral information transmission. Three dilated convolutional branches with different dilation rates run in parallel to process features with different receptive fields, preserving basic spatial features, capturing mesoscale context, and extracting large-scale edge associations, respectively. Weights are dynamically generated through 1×1 convolutions for feature fusion to enhance high-frequency edge information. The preprocessed remote sensing image and its corresponding multispectral features are used as input to the generator to obtain a detail-enhanced remote sensing image. The detail-enhanced remote sensing image is then fed into a PatchGAN-based Markov discriminator, where multi-scale processing is performed to determine whether the input image is real or generated data. The generator and discriminator of the conditional generative adversarial network are trained alternately using training data, with parameter updates performed until the discriminator can no longer correctly classify the input image, at which point the enhanced remote sensing image is output. The color-corrected remote sensing image is then acquired and subjected to Sobel edge extraction to obtain an edge intensity map. A dynamic weight map α = σ(β·E + γ) is generated from this edge intensity map, where α represents the adaptive weight, σ represents the Sigmoid function, E represents the edge intensity map, and β and γ represent learnable parameters. Based on the dynamic weight map, adaptive weights are obtained to fuse the enhanced remote sensing image and the color-corrected remote sensing image pixel by pixel, resulting in a multi-band fused feature map.
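The edge-guided fusion at the end of this paragraph can be illustrated for a single band. The sketch below is an assumption-laden example, not the patented implementation: it treats β and γ as fixed numbers rather than learned parameters, leaves border pixels with zero edge strength, and assumes the blend takes the form α·enhanced + (1 − α)·corrected, which the text does not state explicitly. The function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sobel_magnitude(img):
    """Sobel gradient magnitude of a 2D list; border pixels stay at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = math.hypot(gx, gy)
    return out


def fuse_with_edge_weights(enhanced, corrected, beta=1.0, gamma=0.0):
    """Blend two same-sized bands using alpha = sigmoid(beta * E + gamma)."""
    edges = sobel_magnitude(corrected)  # E is taken from the color-corrected band
    fused = []
    for row_e, row_c, row_g in zip(enhanced, corrected, edges):
        fused_row = []
        for e, c, g in zip(row_e, row_c, row_g):
            alpha = sigmoid(beta * g + gamma)
            # Assumed blend direction: strong edges favor the enhanced image.
            fused_row.append(alpha * e + (1.0 - alpha) * c)
        fused.append(fused_row)
    return fused
```

Under these assumptions, flat regions (E close to 0) receive a weight near σ(γ), while pixels on strong edges approach a weight of 1 and take almost all of their value from the detail-enhanced image. A multi-band image would apply the same step to each band.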

[0070] Figure 3 shows a flowchart of the construction of the lightweight detection and classification model in this embodiment.

[0071] According to an embodiment of the present invention, a lightweight detection and classification model is constructed based on EfficientDet-Lite, specifically as follows:

[0072] S302: A lightweight detection and classification model is built based on the EfficientDet-Lite framework. Training and testing data are obtained from marine debris detection instances for model training and testing.

[0073] S304: The training data is imported into the detection and classification model. The Ghost module replaces the traditional convolution for feature extraction, and coordinate attention weighting is applied to the feature map output by the Ghost module to enhance the response at key locations.

[0074] S306: The detection branch is constructed using a lightweight feature pyramid. A BiFPN structure is adopted, and the convolution in the cross-connection is replaced with Ghost convolution. The weighted feature map is imported into the detection branch, further feature extraction is performed through the lightweight feature pyramid, and three Ghost convolutional layers with shared weights generate the bounding box predictions.

[0075] S308: In the classification branch, the weighted feature maps are concatenated, and the features of different bands are dynamically weighted through channel attention. A classifier is built using two fully connected layers to generate category predictions. After iterative training, the test data is used to validate the bounding box predictions and category predictions. When validation succeeds, the network parameters of the current detection and classification model are retained, and the trained detection and classification model is output.
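The BiFPN structure adopted in step S306 combines feature maps from different pyramid levels with learned, non-negative weights. The sketch below shows the fast normalized fusion rule from the published EfficientDet design, applied to flattened feature vectors; whether the patented model uses exactly this rule is an assumption, and the function name and sample values are hypothetical.

```python
def fast_normalized_fusion(features, raw_weights, eps=1e-4):
    """Fuse same-length feature vectors: sum(w_i * f_i) / (eps + sum(w_i)).

    raw_weights are passed through ReLU so every weight is non-negative,
    which keeps the normalized weights between 0 and 1.
    """
    weights = [max(0.0, w) for w in raw_weights]
    norm = eps + sum(weights)
    length = len(features[0])
    return [
        sum(w * f[i] for w, f in zip(weights, features)) / norm
        for i in range(length)
    ]


# Two hypothetical inputs to one BiFPN node, already resized to the same shape.
top_down = [1.0, 2.0]
lateral = [3.0, 4.0]
fused = fast_normalized_fusion([top_down, lateral], [1.0, 1.0])
```

With equal weights the result is close to the element-wise mean; a weight that training pushes below zero is clipped, so that input simply drops out of the fusion.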

[0076] It should be noted that the standard 3×3 convolutions in the EfficientNet-Lite backbone network are replaced with Ghost convolutions. In the Ghost module, 1×1 convolutions generate a small number of intrinsic feature maps, and depthwise separable convolutions generate ghost feature maps from them. The intrinsic and ghost features are then concatenated as the output, reducing the number of network parameters and the computational cost. A coordinate attention mechanism is inserted at the output of the last three stages of the backbone network. Global pooling is performed on the input features along the height and width directions separately to generate direction-aware features. Spatial attention weights are generated by combining convolutions with non-linear activation, and these weights are multiplied with the original features to enhance the response at key locations. The detection branch adopts a BiFPN structure, replacing the convolutions in the cross-connections with Ghost convolutions and reducing the number of FPN layers to four for further lightweighting. The classification branch concatenates the feature maps after coordinate attention weighting. The channel attention mechanism dynamically weights the features of different bands to represent their differing importance, automatically reducing the weight of bands affected by turbid water and improving the feature representation of rare debris categories (such as rubber). Two fully connected layers then output a Softmax probability distribution. Training of the detection and classification model is divided into two stages: the first stage freezes the backbone network and trains only the detection and classification branches; the second stage fine-tunes all parameters end-to-end.
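A rough parameter count shows why replacing a standard 3×3 convolution with a Ghost module shrinks the network. The sketch below rests on several assumptions that the text does not fix: a ghost ratio of 2 (half of the output channels are intrinsic), a 3×3 depthwise kernel for the cheap operation as in the original GhostNet design, and no bias terms. A pointwise stage in the cheap operation, if present, would add parameters. The function names and channel sizes are hypothetical.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in, c_out, ratio=2, cheap_k=3):
    """Weights in a Ghost module with a 1x1 primary convolution.

    Assumes c_out is divisible by ratio.
    """
    intrinsic = c_out // ratio
    primary = c_in * intrinsic  # 1x1 convolution -> intrinsic maps
    # Each intrinsic map yields (ratio - 1) ghost maps via a depthwise kernel.
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap


standard = standard_conv_params(64, 128)  # 73728
ghost = ghost_module_params(64, 128)      # 4672
reduction = standard / ghost
```

For this hypothetical 64-to-128-channel layer the Ghost module needs about one sixteenth of the weights. Most of that saving comes from the primary convolution being 1×1 rather than 3×3; the ghost ratio contributes roughly a further factor of two.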

[0077] Multi-band fused feature maps corresponding to the multi-band remote sensing images are imported into the detection and classification model. The detection and classification branches, computed in parallel, produce the predicted bounding boxes and categories of marine debris. Temporal processing is performed based on these predictions: the detection results of the current frame are associated and matched with historical trajectories to obtain the movement path of the marine debris, and a filter is used to smooth positional shifts. Spatiotemporal features are extracted from these movement paths, including short-term motion vectors, long-term trend angles, spectral stability, and aggregation indices, which correspond respectively to the instantaneous movement direction, overall migration trend, material change monitoring, and group movement characteristics of the debris. An LSTM prediction model is used to predict the movement path from these spatiotemporal features. The predicted movement path is then transmitted and visualized using a preset method, generating a debris distribution heatmap and predicted path arrows, and providing relevant early warnings.
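Two of the spatiotemporal features named above can be computed directly from a tracked path. The sketch below is illustrative only: the text does not specify which smoothing filter is used, so simple exponential smoothing stands in for it here, and the function names and the definition of the trend angle (direction from the first to the last point, in degrees) are assumptions.

```python
import math


def smooth_path(points, alpha=0.5):
    """Exponentially smooth a list of (x, y) positions.

    alpha = 1 keeps the raw positions; smaller values smooth more strongly.
    """
    smoothed = [points[0]]
    for x, y in points[1:]:
        px, py = smoothed[-1]
        smoothed.append((alpha * x + (1 - alpha) * px,
                         alpha * y + (1 - alpha) * py))
    return smoothed


def motion_features(path):
    """Return (short-term motion vector, long-term trend angle in degrees).

    Requires at least two points in the path.
    """
    (x0, y0), (x1, y1), (x2, y2) = path[0], path[-2], path[-1]
    short_term = (x2 - x1, y2 - y1)  # displacement over the last frame
    trend_deg = math.degrees(math.atan2(y2 - y0, x2 - x0))  # overall heading
    return short_term, trend_deg


# Hypothetical box centers of one debris item over three frames.
track = smooth_path([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)], alpha=1.0)
vector, angle = motion_features(track)
```

Features of this kind, stacked per frame, would form the input sequence for the LSTM prediction model; spectral stability and the aggregation index would need band values and neighboring tracks, which this sketch leaves out.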

[0078] Figure 4 shows a block diagram of a marine debris intelligent identification and classification system based on multispectral UAV remote sensing imagery.

[0079] The second embodiment of the present invention provides a marine debris intelligent identification and classification system 4 based on multispectral UAV remote sensing imagery. The system includes a data acquisition and preprocessing module 401, a multispectral feature enhancement module 402, a target detection and classification module 403, and a post-processing optimization module 404.

[0080] The data acquisition and preprocessing module acquires remote sensing images of a preset sea area, preprocesses the remote sensing images, and uses a color correction network to perform color correction on the preprocessed remote sensing images.

[0081] The multispectral feature enhancement module is responsible for acquiring the multi-band features and the spectral indices that enhance debris detection. It uses a multi-scale feature pyramid structure to fuse the multi-band features and spectral indices to generate multispectral features. It uses a conditional generative adversarial network to generate enhanced remote sensing images and fuses them pixel by pixel with the color correction results to construct a multi-band fused feature map.

[0082] The target detection and classification module is responsible for building a lightweight detection and classification model based on EfficientDet-Lite. It takes the multi-band fused feature map as input, uses the detection branch and classification branch to obtain debris bounding boxes and debris categories, and uses Focal Loss in the lightweight detection and classification model to alleviate the class imbalance problem.
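Focal Loss reduces the contribution of easy, well-classified examples so that rare classes and hard examples dominate the gradient. A minimal binary form is sketched below; the defaults α = 0.25 and γ = 2 are the values commonly used in the literature, not values given in this document, and the function name is hypothetical.

```python
import math


def focal_loss(p, target, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    p: predicted probability of the positive class, in [0, 1].
    target: 1 for a positive example, 0 for a negative one.
    """
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # confident and correct: strongly down-weighted
hard = focal_loss(0.1, 1)  # confident and wrong: keeps most of its loss
```

Setting γ to 0 removes the modulating factor and leaves an α-weighted cross-entropy, which makes the effect of γ easy to check in isolation.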

[0083] The post-processing optimization module is responsible for optimizing the debris detection results, reducing false detections and missed detections, and compressing and accelerating the detection and classification model. For example, it uses an optimized non-maximum suppression (NMS) step to avoid incorrectly suppressing highly overlapping debris of the same type. It combines spectral features (such as the reflectance threshold of plastic in a specific band) to filter out false detection targets such as waves and foam. It uses TensorRT or ONNX Runtime to accelerate model inference and ensure real-time detection on edge computing devices.
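The text does not say which NMS variant is used. One well-known way to avoid discarding heavily overlapping objects of the same class is Gaussian Soft-NMS, which lowers the scores of overlapping boxes instead of deleting them. The sketch below shows that technique as a stand-in, under the assumption that boxes are `(x1, y1, x2, y2)` tuples; the function names, `sigma`, and the score threshold are illustrative choices.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of removing boxes."""
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        box, score = candidates.pop(best)
        if score < score_threshold:
            break  # every remaining score is lower still
        kept.append((box, score))
        # Decay each remaining score by its overlap with the selected box.
        candidates = [
            (b, s * math.exp(-(iou(box, b) ** 2) / sigma))
            for b, s in candidates
        ]
    return kept
```

With hard NMS and a typical IoU threshold of 0.5, two debris items overlapping at an IoU of about 0.68 would collapse into a single detection; here the second box survives with a reduced score, and a later score cut-off decides whether it is reported.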

[0084] A third aspect of the present invention provides a computer-readable storage medium comprising a program for an intelligent identification and classification method for marine debris based on multispectral UAV remote sensing imagery. When the program is executed by a processor, it implements the steps of the intelligent identification and classification method for marine debris based on multispectral UAV remote sensing imagery.

[0085] In the several embodiments provided in this application, it should be understood that the disclosed methods and systems can be implemented in other ways. The system embodiments described above are merely illustrative. For example, the division of modules is only a logical functional division, and in actual implementation, there may be other division methods, such as: multiple modules or components can be combined, or integrated into another system, or some features can be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the various components shown or discussed can be indirect coupling or communication connection through some interfaces, devices, or modules, and can be electrical, mechanical, or other forms. Furthermore, in the various embodiments of the present invention, all functional modules can be integrated into one processing module, or each module can be a separate module, or two or more modules can be integrated into one module; the integrated modules can be implemented in hardware or in the form of hardware plus software functional modules.

[0086] Those skilled in the art will understand that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium. When the program is executed, it performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as mobile storage devices, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0087] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention.

Claims

1. An intelligent identification and classification method for marine debris based on multispectral UAV remote sensing imagery, characterized in that it includes the following steps: Remote sensing images of a preset sea area are collected using a multispectral sensor carried by a drone. The remote sensing images are preprocessed, a color correction network is constructed, and color correction is performed on the preprocessed remote sensing images. Multi-band features and spectral indices for enhanced debris detection are obtained, and a multi-scale feature pyramid structure is used to fuse the multi-band features and spectral indices to generate multispectral features. A conditional generative adversarial network is trained, and the remote sensing images and corresponding multispectral features are input to it to obtain enhanced remote sensing images. The enhanced remote sensing images are then fused pixel by pixel with the color-corrected remote sensing images to obtain a multi-band fused feature map. A lightweight detection and classification model is built based on EfficientDet-Lite. The multi-band fused feature map is used as input, and the lightweight detection and classification model outputs the debris bounding box and the debris category. Obtaining the multi-band features and spectral indices for enhanced debris detection, and using the multi-scale feature pyramid structure to fuse them to generate multispectral features, is specifically as follows: After acquiring color-corrected multi-band remote sensing images, low-level features of each band are extracted independently using lightweight convolutional blocks. The low-level features of each band are then input into the EfficientNet backbone network to extract high-level semantic features. Multi-band contextual information is fused, and multi-scale feature maps are output as multi-band features. Marine debris detection instances are retrieved and preprocessed, and SHAP local attribution analysis is used to interpret the preprocessed instance samples. The Shapley values corresponding to each spectral index in the instance samples are calculated, and the absolute values of the Shapley values corresponding to each spectral index are taken and normalized to a percentage to quantify local importance. The spectral indices involved in the instance samples are sorted according to the local importance, a preset number of spectral indices are selected as key spectral indices, radiometrically corrected multi-band remote sensing images are obtained, and spectral indices for enhanced debris detection are extracted based on the key spectral indices. The multi-band features and spectral indices are fused step by step using a multi-scale feature pyramid to generate multispectral features. The multispectral features are then batch normalized and nonlinearly enhanced to output optimized multispectral features.

2. The method of claim 1, wherein remote sensing images of a preset sea area are acquired using a multispectral sensor mounted on a drone and are then preprocessed, specifically as follows: The drone is equipped with a multispectral sensor to collect remote sensing images of the preset sea area, including visible light, near-infrared, and short-wave infrared bands. Metadata from the remote sensing image acquisition process is recorded. The multispectral remote sensing images are grouped by band, the band with the highest spatial resolution is selected as the reference image, and the other bands are used as images to be registered. The remote sensing images are first coarsely registered based on the metadata, and each band is initially aligned using an affine transformation. Histogram matching is then used to make the brightness distribution of the image to be registered approximate that of the reference image. In the reference image and the image to be registered, the ORB algorithm combined with a feature pyramid is used to detect multi-scale feature points, the fast approximate nearest neighbor algorithm is used to perform preliminary matching of feature points, the Euclidean distance between descriptors is calculated for registration, and mismatched points are removed based on the registration results. Both the reference image and the image to be registered are divided into grids. A local homography transformation is calculated independently for each grid cell, and gridded local registration is performed. The transformation parameters of adjacent grid cells are smoothed using the moving least squares method to obtain the registered remote sensing image.

3. The method of claim 1, wherein a color correction network is constructed to perform color correction on the preprocessed remote sensing images, specifically as follows: A color correction network is constructed with U-Net as the backbone network. The number of input layer channels is extended to the number of bands of the multispectral remote sensing images. In the encoder part, grouped convolution is used to process different bands independently and extract band features. After each downsampling, an attention mechanism is used to dynamically weight important band features. In the bottleneck layer, a cross-band feature interaction module is used to fuse multispectral information. In the decoder part, a multi-scale channel attention mechanism is introduced to adaptively fuse the upsampled features of each layer with the band features extracted by the corresponding encoder layer. The output layer generates the corrected multispectral remote sensing image. A discriminator is added for adversarial training to optimize the color correction network. The generated multispectral remote sensing image and the real label are cut into image blocks, and the discriminator judges the authenticity of each block. Alternating training is used to optimize the local color of the color correction network based on the judgment results. The preprocessed remote sensing images are imported into the trained color correction network, and the color-corrected multi-band remote sensing images are output.

4. The method of claim 1, wherein the enhanced remote sensing image is acquired and then fused pixel by pixel with the color-corrected remote sensing image to obtain a multi-band fused feature map, specifically as follows: A generator based on residual dense blocks combined with multi-scale dilated convolution is constructed to form a conditional generative adversarial network. The preprocessed remote sensing image and its corresponding multispectral features are input to obtain a remote sensing image with enhanced details. The enhanced remote sensing image is imported into the discriminator, which determines whether the imported image belongs to real data or generated data. The generator and discriminator of the conditional generative adversarial network are trained alternately using training data, and the parameters are updated until the discriminator can no longer correctly classify the imported image, at which point the enhanced remote sensing image is output. The color-corrected remote sensing image is acquired and subjected to Sobel edge extraction to obtain an edge intensity map. A dynamic weight map is generated based on the edge intensity map. An adaptive weight is obtained based on the dynamic weight map, and the enhanced remote sensing image and the color-corrected remote sensing image are fused pixel by pixel to obtain a multi-band fused feature map.

5. The method of claim 1, wherein a lightweight detection and classification model is built based on EfficientDet-Lite, specifically as follows: A lightweight detection and classification model is built based on the EfficientDet-Lite framework. Training and testing data are obtained from marine debris detection instances for model training and testing. The training data is imported into the detection and classification model, and the Ghost module is used to replace the traditional convolution for feature extraction. The feature map output by the Ghost module is weighted by coordinate attention to enhance the response at key locations. The detection branch is constructed using a lightweight feature pyramid. A BiFPN structure is adopted, and the convolution in the cross-connection is replaced with Ghost convolution. The weighted feature map is imported into the detection branch. After further feature extraction through the lightweight feature pyramid, three Ghost convolutional layers with shared weights are used to generate bounding box predictions. In the classification branch, the weighted feature maps are concatenated, and the features of different bands are dynamically weighted through channel attention. A classifier is built using two fully connected layers to generate category predictions. After iterative training, the bounding box predictions and category predictions are validated using the test data. When validation succeeds, the network parameters of the current detection and classification model are retained, and the trained detection and classification model is output.

6. The intelligent identification and classification method for marine debris based on multispectral UAV remote sensing imagery according to claim 5, characterized in that the multi-band fused feature map is used as input and the lightweight detection and classification model outputs the debris bounding box and the debris category, specifically: The multi-band fused feature map corresponding to the multi-band remote sensing image is imported into the detection and classification model. The debris bounding box prediction and the debris category prediction are obtained through the detection branch and the classification branch, which are computed in parallel. Based on the predicted bounding box and category of the debris, time-series processing is performed to obtain the movement path of the marine debris. Spatiotemporal features are extracted based on the movement path, and the movement path is predicted using the spatiotemporal features. The predicted movement path is then sent and displayed using a preset method.

7. A marine debris intelligent identification and classification system based on multispectral UAV remote sensing imagery, characterized in that the system implements the intelligent identification and classification method for marine debris based on multispectral UAV remote sensing imagery as described in any one of claims 1-6. The system includes a data acquisition and preprocessing module, a multispectral feature enhancement module, a target detection and classification module, and a post-processing optimization module. The data acquisition and preprocessing module acquires remote sensing images of a preset sea area, preprocesses the remote sensing images, and uses a color correction network to perform color correction on the preprocessed remote sensing images. The multispectral feature enhancement module is responsible for acquiring the multi-band features and the spectral indices that enhance debris detection. It uses a multi-scale feature pyramid structure to fuse the multi-band features and spectral indices to generate multispectral features. It uses a conditional generative adversarial network to generate enhanced remote sensing images and fuses them pixel by pixel with the color correction results to construct a multi-band fused feature map. The target detection and classification module is responsible for building a lightweight detection and classification model based on EfficientDet-Lite, taking the multi-band fused feature map as input, and using the detection branch and classification branch to obtain the debris bounding box and the debris category. The post-processing optimization module is responsible for optimizing the debris detection results, reducing false detections and missed detections, and compressing and accelerating the detection and classification model.