A lightweight marine debris detection method and system for edge computing platforms
By constructing a high-precision basic detection model and combining it with a large kernel convolution, lightweight feature fusion, and residual scale compensation detection head, the problem of balancing detection accuracy and efficiency in marine debris monitoring has been solved, achieving lightweight and high-precision detection on an edge computing platform.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- STATE OCEAN TECH CENT
- Filing Date
- 2026-05-22
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies for marine debris monitoring suffer from a tradeoff between detection accuracy and efficiency. In particular, when deployed on edge computing platforms, the high computational complexity and limited resources lead to decreased detection accuracy, especially for small targets and complex backgrounds.
A high-precision basic detection model is constructed. A large-kernel convolution module is used to enhance global perception. Lightweight feature fusion is used to reduce computational redundancy in the neck area. The residual scale compensation detection head improves the localization accuracy of small targets. The model is optimized through structured pruning and adaptive learning distillation strategies to form a lightweight and high-precision detection model.
The method reduces the number of model parameters and the computational load while maintaining high detection accuracy. It can be adapted to resource-constrained edge computing platforms such as drones and underwater vehicles, achieving high-precision, real-time detection of marine debris.
Figure CN122244736A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of marine environmental monitoring and computer vision technology, and in particular relates to a lightweight marine debris detection method and system for edge computing platforms.
Background Technology
[0002] Marine plastic pollution poses a serious threat to marine ecosystems, fisheries, and human health. Effective monitoring is fundamental to quantifying pollution, assessing risks, guiding cleanup efforts, and developing policies. Traditional methods for monitoring marine debris primarily include the following. Field surveys: litter is manually collected, counted, and sorted within pre-designated transects. While this method provides real-world ground data, it has limited spatial coverage, is labor-intensive and susceptible to observer subjectivity, and is difficult to apply to large-scale assessments of nearshore floating litter or seabed debris.
[0003] Ship-based observation and trawling: used to investigate floating debris, but inefficient due to limitations of weather, cost, and vast sea areas.
[0004] Satellite remote sensing: It can conduct large-scale observations, but its spatial resolution (usually greater than 10 meters) is insufficient, making it difficult to detect small and medium-sized pieces of trash near the shore, on beaches, and on the sea surface, especially common trash with a size of only a few centimeters to a few meters.
[0005] To compensate for the coarse-grained nature of satellite observations and the limitations of ground-based surveys, mobile platforms such as unmanned aerial vehicles (UAVs) and autonomous underwater vehicles (AUVs) have rapidly emerged. They offer advantages such as flexible deployment, low-altitude / near-ground flight to acquire centimeter-level ultra-high-resolution imagery, and access to dangerous or inaccessible areas, making them particularly suitable for monitoring beach litter, seabed debris, and nearshore floating debris. However, the core challenge has shifted from data acquisition to data interpretation, with the manual interpretation of thousands of high-resolution images becoming an unbearable bottleneck.
[0006] Therefore, automated target detection technology based on computer vision and deep learning has been introduced into marine debris monitoring. Single-stage detectors, represented by the YOLO series, have attracted attention due to their good speed-accuracy balance. However, directly applying general target detection models to real marine environments faces the following inherent and severe challenges: 1. Extreme imbalance in target scale: In marine debris datasets (typically including categories such as plastic bottles, plastic bags, fishing nets, ropes, and foam debris), the vast majority of targets are extremely small (normalized width / height < 0.2), while a small number of large targets exist, forming a significant long-tail distribution. General-purpose detectors are severely inadequate in detecting small targets (such as millimeter-sized foam debris), resulting in a high false negative rate.
[0007] 2. Targets with diverse shapes and high aspect ratios: objects such as ropes and fishing nets are markedly elongated and strip-like, so the model must capture long-distance contextual dependencies. Traditional small convolutional kernels struggle to model such global features effectively.
[0008] 3. Complex and dynamic background: Coastal environments contain repetitive wave patterns, sun glint, foam, shadow variations, and natural objects such as seaweed and rocks. This background noise is easily confused visually with small, fragmented debris (such as foam fragments), which greatly interferes with distinguishing real targets from false ones.
[0009] 4. Strict resource constraints of edge deployment: The final monitoring system typically needs to be deployed on platforms such as drones, AUVs, and solar-powered edge monitoring buoys. These platforms have limited battery life and computing power (such as the NVIDIA Jetson series) as well as stringent requirements for real-time inference. High-precision but computationally complex models (millions of parameters and tens of GFLOPs of computation) cannot meet the deployment requirements of real-time performance and low power consumption.
[0010] Existing technologies attempt to adapt to edge devices through model compression (such as pruning and quantization), but this often leads to a decrease in accuracy, especially a sharp drop in detection performance for small targets and difficult samples. Simple knowledge distillation methods lack adaptive mechanisms for the characteristics of marine debris detection tasks, resulting in limited accuracy recovery.
Summary of the Invention
[0011] To address the problems mentioned in the background art, the present invention provides a lightweight detection method and system for marine debris oriented towards edge computing platforms, which solves the problem that it is difficult to balance detection accuracy and efficiency in the prior art.
[0012] The first objective of this invention is to provide a lightweight detection method for marine debris for edge computing platforms, comprising:
S1. Constructing a high-precision basic detection model, which includes: a large-kernel convolution module that enhances global perception of slender targets; a lightweight feature fusion neck that reduces computational redundancy; and a residual scale compensation detection head that improves the localization accuracy of small targets. The residual scale compensation detection head includes: a basic prediction pathway, which generates initial bounding box offset predictions and class probability predictions through lightweight sub-networks; scale awareness and residual generation, in which a compensation residual is generated from the initial scale of the predicted target to optimize the localization of small targets; and residual fusion and final output, in which the final bounding box offset is obtained through a weighted residual connection.
S2. Performing structured pruning on the high-precision basic detection model to obtain a lightweight student model.
S3. Using the high-precision basic detection model as the teacher model and transferring its knowledge to the lightweight student model through an adaptive learning distillation strategy, which simulates the adaptive, phased characteristics of human learning, to obtain the final lightweight, high-precision detection model.
S4. Deploying the lightweight, high-precision detection model on an edge computing device to perform real-time inference on the collected marine environment images and output the category and location information of the debris targets.
[0013] The large-kernel convolution module adopts a parallel path structure during the training phase and uses structural reparameterization during the inference phase to fuse the trained convolution kernels into a single depthwise separable convolutional layer.
[0014] Preferably, the lightweight feature fusion neck adopts a structure based on GSConv and VoV-GSCSP modules to reduce computational complexity while maintaining feature expressive power.
[0015] Preferably, the structured pruning includes: applying L1 regularization to the scaling factors of the batch normalization layers to induce channel sparsity; constructing a network dependency graph, identifying channel dependency groups, and setting up a protection set to preserve the channels of critical modules; and pruning the non-protected dependency groups based on channel importance scores until the target pruning rate is achieved.
[0016] Preferably, in the adaptive learning distillation strategy, the distillation loss weight coefficient decays from its initial value to its final value according to a cosine function over the training period, so as to dynamically balance imitating the soft labels of the teacher model against fitting the true labels.
[0017] A second objective of this invention is to provide a lightweight marine debris detection system for edge computing platforms, comprising: an image acquisition module, used to acquire images of the marine environment; an edge computing processing module, used to run the aforementioned lightweight, high-precision detection model; a positioning and attitude determination module, used to acquire the device's geographical location and attitude information; and a control and communication module, used to coordinate the operation of each module and output the detection results.
[0018] Preferably, the edge computing processing module is deployed on an unmanned aerial vehicle, an autonomous underwater vehicle, or a monitoring buoy.
[0019] A third objective of this invention is to provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the aforementioned method for lightweight marine debris detection for edge computing platforms.
[0020] A fourth objective of this invention is to provide a computer program product, including a computer program that, when executed by a processor, implements the aforementioned method for lightweight marine debris detection for edge computing platforms.
[0021] The advantages and positive effects of this application are as follows. This invention enhances global perception of slender targets through large-kernel convolution, improves the localization accuracy of small targets through the residual scale compensation detection head, and maintains high detection accuracy while significantly reducing the number of model parameters and the computational load through synergistic optimization of pruning and distillation. It is well suited to resource-constrained edge computing platforms such as UAVs and underwater vehicles. Specifically: This invention offers high, well-targeted detection accuracy: the large-kernel convolution (UniRepLKNet Block) effectively models the global context of slender debris, and the RSCD Head significantly improves the localization accuracy of small targets, enabling the basic model to reach an mAP@0.5 of 0.825, an improvement of 9.4% over the baseline, with particular gains on difficult-to-detect targets such as fishing nets, ropes, and foam debris.
[0022] This invention features a high degree of model lightweighting: through the Slim-neck design (GSConv, VoV-GSCSP) and structured pruning, the number of model parameters is compressed from 3.18M to 1.37M (a reduction of approximately 57%, to about 43% of the original), and the computational cost is reduced from 5.3 GFLOPs to 3.3 GFLOPs (a reduction of approximately 38%), significantly reducing storage and computational requirements.
[0023] The present invention achieves good accuracy recovery: the proposed adaptive learning distillation strategy can dynamically balance the learning process from the teacher model and the learning process from the real labels, so that the mAP@0.5 of the lightweight model (OceanTrashNet-Lite) is restored to 0.815, which is almost the same as the performance of the original model. At the same time, the recall rate is improved to 0.742, ensuring the reliability in actual monitoring.
[0024] The invention offers strong real-time performance at the edge: the final model achieves real-time inference speed on typical edge computing devices such as NVIDIA Jetson, meeting the real-time processing requirements of platforms such as UAVs and AUVs and making online, onboard detection possible.
Attached Figure Description
[0025] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0026] Figure 1 is a flowchart of a preferred embodiment of the present invention; Figure 2 is a comparison chart of model performance and computational cost for a preferred embodiment of the present invention; Figure 3 is a comparison diagram of the model attention response for a preferred embodiment of the present invention; Figure 4 is a comparison chart of model detection results for a preferred embodiment of the present invention.
Detailed Implementation
[0027] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0028] Referring to Figure 1, the first embodiment, a lightweight detection method for marine debris for edge computing platforms, mainly includes:
S1. Constructing a high-precision basic detection model, which includes: a large-kernel convolution module that enhances global perception of slender targets; a lightweight feature fusion neck that reduces computational redundancy; and a residual scale compensation detection head that improves the localization accuracy of small targets.
S2. Performing structured pruning on the high-precision basic detection model to obtain a lightweight student model.
S3. Using the high-precision basic detection model as the teacher model and transferring its knowledge to the lightweight student model through an adaptive learning distillation strategy, which simulates the adaptive, phased characteristics of human learning, to obtain the final lightweight, high-precision detection model.
S4. Deploying the lightweight, high-precision detection model on an edge computing device to perform real-time inference on the collected marine environment images and output the category and location information of the debris targets.
[0029] To better understand the technical solution of the present invention, the following non-limiting description is provided. This invention mainly includes the following three stages:
Phase 1: Building a high-precision basic detection model (OceanTrashNet)
Backbone Network Design: A large-kernel convolutional module (UniRepLKNet Block) is introduced. This module employs a parallel path structure during training and utilizes structural reparameterization (SRP) to fuse the trained convolutional kernels into a single depthwise separable convolutional layer during inference. Specifically: during training, the large-kernel convolutional module uses parallel large-kernel and small-kernel convolutional paths to simultaneously capture long-range spatial dependencies (for fishing nets and ropes) and local details; during inference, SRP fuses the multi-branch structure into a single depthwise separable convolutional layer, so the extra branch adds no computational overhead at inference time.
[0030] Lightweight feature fusion neck design: A slim-neck based on GSConv (Generalized Depthwise Separable Convolution) and VoV-GSCSP modules is adopted. By replacing standard convolutions with depthwise separable convolutions and combining channel shuffling, computational redundancy in the feature fusion process is significantly reduced.
[0031] Residual Scale Compensation Detection Head Design: A residual scale compensation detection head (RSCD Head) is designed. Based on the preliminary scale of the predicted target, it generates a residual offset specifically to correct the localization error of small targets, thereby addressing the extreme scale imbalance of marine debris.
[0032] Phase 2: Model Lightweighting and Accuracy Restoration
The structured pruning includes: applying L1 regularization to the scaling factors of the batch normalization layers to induce channel sparsity; constructing a network dependency graph, identifying channel dependency groups (the channel groups that must be pruned simultaneously), and setting up a protection set to preserve the channels of critical modules, with the layers in the RSCD Head added to the protection set to prevent the loss of critical detection capabilities; and pruning the non-protected dependency groups based on channel importance scores until the target pruning rate is achieved.
[0033] Adaptive learning knowledge distillation: This embodiment proposes an adaptive learning distillation strategy. During distillation training, the weight coefficients of the distillation loss decay from their initial values to their final values according to a cosine function over the training period, in order to dynamically balance the relationship between the soft labels of the teacher model and the fitted true labels.
[0034] In the early stages of training, the distillation weight is relatively large, and the student model mainly imitates the softened probability distribution, rich in "hidden knowledge", provided by the teacher model. In the later stages of training, the weight is reduced, and the student model gradually shifts towards fitting the true labels. This dynamic balancing mechanism effectively recovers the accuracy lost to pruning.
[0035] Phase 3: Edge Deployment and Real-time Monitoring
The optimized model is deployed on edge computing devices (such as the NVIDIA Jetson series) and combined with GPS / IMU data to form a real-time pipeline from image input to output of the geographic coordinates of the debris.
[0036] The following section details the formalized implementation process of each stage: I. Formulaic definition of the high-precision basic detection model (OceanTrashNet): Let the input marine environment image be $I \in \mathbb{R}^{H \times W \times 3}$, where $H$ and $W$ are the image height and width. The model's final output is a set of $N$ detected marine debris targets, $\mathcal{D} = \{(b_i, c_i, q_i)\}_{i=1}^{N}$, where $b_i = (x_i, y_i, w_i, h_i)$ gives the center coordinates and the width and height of the bounding box (normalized to the image size), $c_i$ is the debris category label (such as plastic bottles, fishing nets, etc.), $q_i$ is the detection confidence score, and $N$ is the number of marine debris targets detected by the model in the input image.
[0037] 1.1 Image preprocessing and backbone network feature extraction: The input image is first resized to the fixed input size of the model, giving $I_{\text{resized}}$, and then normalized:
$$\hat{I} = \frac{I_{\text{resized}} - \mu}{\sigma}$$
[0038] where $\mu$ is the channel mean vector of the images in the training dataset and $\sigma$ is the channel standard deviation vector of the images in the training dataset.
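The per-channel normalization above can be sketched in a few lines of plain Python. This is only an illustration: the nested-list image layout, the function name `normalize_image`, and the `MEAN` and `STD` values are assumptions made for the example, not values taken from the patent.

```python
from typing import List, Sequence

Image = List[List[List[float]]]  # H x W x C nested lists, values in [0, 1]


def normalize_image(image: Image, mean: Sequence[float], std: Sequence[float]) -> Image:
    """Apply (value - mean[c]) / std[c] independently to every channel c."""
    return [
        [
            [(value - mean[c]) / std[c] for c, value in enumerate(pixel)]
            for pixel in row
        ]
        for row in image
    ]


# Hypothetical statistics; real values would be computed from the training set.
MEAN = (0.5, 0.4, 0.3)
STD = (0.2, 0.2, 0.1)

tiny = [[[0.5, 0.4, 0.3], [0.7, 0.6, 0.4]]]  # a 1 x 2 RGB image
normalized = normalize_image(tiny, MEAN, STD)
```

With these made-up statistics, the first pixel equals the channel means and maps to zero in every channel, while the second pixel lies one standard deviation above the mean in every channel.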
[0039] The normalized image is fed into the backbone network for feature extraction. This invention uses a UniRepLKNet module with integrated large-kernel convolutions as the backbone core to address global context awareness for long, thin debris (such as fishing nets and ropes). The input feature map of layer $l$ is $X^{(l)} \in \mathbb{R}^{C_l \times H_l \times W_l}$, where $l$ is the index of the current network layer, $X^{(l)}$ is the feature map output by the preceding layer and input to the layer-$l$ module, $C_l$ is the number of channels of the input feature map, $H_l$ and $W_l$ are the height and width of the layer-$l$ feature map, and $\mathbb{R}$ indicates that the values in the feature map are real numbers (such as floating-point numbers). A C3k2_UniRepLKNetBlock operation contains parallel paths. During the training phase, this block performs the following operations:
[0040] $$F_{\text{large}} = \mathrm{DWConv}_{K \times K}\big(X^{(l)}\big), \qquad F_{\text{small}} = \mathrm{Conv}_{k \times k}\big(X^{(l)}\big)$$
[0041] $$X^{(l+1)} = \phi\big(\mathrm{BN}(F_{\text{large}} + F_{\text{small}})\big)$$
[0042] where $F_{\text{large}}$ is the feature map extracted by the large-kernel convolution branch; $\mathrm{DWConv}_{K \times K}$ is a depthwise convolution operation, which convolves each input channel independently without mixing channel information and thus significantly reduces the computational cost; the kernel size $K$ is usually a large odd number (e.g., 13, 27); $F_{\text{small}}$ is the feature map extracted by the small-kernel convolution branch; $\mathrm{Conv}_{k \times k}$ is a $k \times k$ convolution operation with $k < K$; $X^{(l+1)}$ is the final feature map output by this layer; $\mathrm{BN}$ is the normalization applied to the summed feature maps; and $\phi$ is a nonlinear activation function, which introduces nonlinearity so that the network can fit the complex morphological distribution of marine debris. Large-kernel convolutions capture long-range spatial dependencies, while small-kernel convolutions assist in learning local details. Outputs are obtained through residual connections and activation functions (such as SiLU).
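[0043] During the inference phase, structural reparameterization is used to equivalently fuse the trained $K \times K$ and $k \times k$ convolution kernels into a single $K \times K$ depthwise separable convolution kernel, yielding an equivalent single $K \times K$ depthwise separable convolutional layer. This process can be formally represented as finding equivalent weights $W_{\text{eq}}$ and bias $b_{\text{eq}}$ such that:
$$\mathrm{DWConv}_{K \times K}\big(X^{(l)}; W_{\text{eq}}, b_{\text{eq}}\big) = \mathrm{BN}\Big(\mathrm{DWConv}_{K \times K}\big(X^{(l)}\big) + \mathrm{Conv}_{k \times k}\big(X^{(l)}\big)\Big)$$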
[0044] In this way, without increasing any computational overhead during inference, large convolutional kernels can simultaneously model both global structure and local details, which is crucial for marine scenarios where large fishing nets and tiny plastic debris coexist.
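The kernel-fusion idea behind structural reparameterization can be checked numerically with a small sketch. It rests on several simplifying assumptions that are not stated in the patent: a single channel, stride 1, zero "same" padding, both branches treated as depthwise, and no batch normalization folding. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(x: Matrix, k: Matrix) -> Matrix:
    """Single-channel cross-correlation with zero padding ("same" output size)."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < h and 0 <= c < w:
                        acc += x[r][c] * k[a][b]
            out[i][j] = acc
    return out


def pad_kernel(k: Matrix, size: int) -> Matrix:
    """Zero-pad an odd-sized kernel to size x size, keeping it centred."""
    off = (size - len(k)) // 2
    padded = [[0.0] * size for _ in range(size)]
    for a, row in enumerate(k):
        for b, v in enumerate(row):
            padded[a + off][b + off] = v
    return padded


def merge_kernels(large: Matrix, small: Matrix) -> Matrix:
    """Fuse the two branches into one equivalent large kernel."""
    small_padded = pad_kernel(small, len(large))
    return [[l + s for l, s in zip(lr, sr)] for lr, sr in zip(large, small_padded)]


# Toy data: a 6 x 6 input, a 5 x 5 "large" kernel and a 3 x 3 "small" kernel.
x = [[float((i * 7 + j * 3) % 5) for j in range(6)] for i in range(6)]
large = [[float((a + 2 * b) % 3 - 1) for b in range(5)] for a in range(5)]
small = [[float((a * b) % 2) for b in range(3)] for a in range(3)]

branch_l, branch_s = conv2d_same(x, large), conv2d_same(x, small)
two_branch = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(branch_l, branch_s)]
fused_kernel = merge_kernels(large, small)
fused = conv2d_same(x, fused_kernel)
max_diff = max(abs(p - q) for r1, r2 in zip(two_branch, fused) for p, q in zip(r1, r2))
```

Because convolution is linear in its kernel, `max_diff` is zero up to floating-point error, so the fused single-kernel layer reproduces the two-branch output at the cost of one convolution. Folding the batch normalization statistics into the fused weights and bias would be an additional step not shown here.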
[0045] The backbone network ultimately outputs a multi-scale feature pyramid $\{P_3, P_4, P_5\}$, whose levels correspond to features of the input image downsampled by factors of 8, 16, and 32, respectively.
[0046] 1.2 Lightweight Feature Fusion Neck (Slim-neck): The neck is responsible for efficiently fusing the multi-scale features $\{P_3, P_4, P_5\}$ to generate an enhanced feature pyramid $\{N_3, N_4, N_5\}$ for use by the detection head. This invention adopts a slim-neck design based on GSConv and VoV-GSCSP modules, aiming to reduce computational complexity while maintaining feature representation capability.
[0047] GSConv operation: the input features $X \in \mathbb{R}^{C \times H \times W}$ are first evenly divided into two parts along the channel dimension, $X_1$ and $X_2$.
[0048] $$Y_1 = \mathrm{Conv}(X_1)$$
[0049] $$Y_2 = \mathrm{DWConv}(X_2)$$
[0050] $$Y = \mathrm{Shuffle}\big(\mathrm{Concat}(Y_1, Y_2)\big)$$
[0051] where $X_1$ and $X_2$ are the two sub-feature maps obtained by splitting uniformly along the channel dimension, $\mathrm{Conv}$ is a standard convolution operation, $\mathrm{DWConv}$ is a depthwise separable convolution, $\mathrm{Concat}$ concatenates the outputs along the channel dimension, and $Y$ is the final output after channel shuffling, in which the concatenated feature maps are rearranged in group order. GSConv significantly reduces the number of parameters and the amount of computation by replacing some standard convolutions with depthwise separable convolutions, while maintaining sufficient feature interaction through channel shuffling.
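The split, two-branch, concatenate, and shuffle data flow described for GSConv can be sketched schematically. In this illustration channels are represented by labels rather than tensors, and the two branch functions are caller-supplied placeholders; the names `channel_shuffle` and `gsconv_sketch` are hypothetical, and a real dense convolution would mix channels rather than map them one to one.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int = 2) -> List[T]:
    """Interleave channels across groups: [a0, a1, b0, b1] -> [a0, b0, a1, b1]."""
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


def gsconv_sketch(
    channels: Sequence[T],
    dense_branch: Callable[[Sequence[T]], Sequence[T]],
    depthwise_branch: Callable[[Sequence[T]], Sequence[T]],
) -> List[T]:
    """Split channels in half, process each half, concatenate, then shuffle."""
    half = len(channels) // 2
    x1, x2 = channels[:half], channels[half:]
    y1 = dense_branch(x1)        # stands in for the standard convolution branch
    y2 = depthwise_branch(x2)    # stands in for the depthwise separable branch
    return channel_shuffle(list(y1) + list(y2), groups=2)


# Schematic branches that only tag each channel with the operation applied.
mixed = gsconv_sketch(
    ["c0", "c1", "c2", "c3"],
    dense_branch=lambda xs: [f"conv({c})" for c in xs],
    depthwise_branch=lambda xs: [f"dw({c})" for c in xs],
)
# mixed == ["conv(c0)", "dw(c2)", "conv(c1)", "dw(c3)"]
```

The shuffled order alternates outputs from the two branches, which is how the cheap depthwise half and the dense half end up adjacent for the next layer.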
[0052] The VoV-GSCSP module is a highly efficient feature aggregation module that employs a "one-time aggregation" structure, avoiding the excessively long gradient paths and computational redundancy associated with traditional chain-stacking methods. It fuses two input features, $F_a$ and $F_b$, for example:
$$F_{\text{out}} = \mathrm{VoVGSCSP}(F_a, F_b)$$
[0053] where $F_a$ and $F_b$ are the input features and $F_{\text{out}}$ is the output of the VoV-GSCSP module. In the Slim-neck of OceanTrashNet, the VoV-GSCSP module replaces the traditional C3k2 module, constructing top-down and bottom-up feature fusion paths and finally outputting the computationally efficient and information-rich features $\{N_3, N_4, N_5\}$.
[0054] 1.3 Residual Scale Compensation Detection Head (RSCD Head): To address the prevalent problem of extreme scale variations in marine debris, particularly the challenge of locating tiny targets such as foam debris, this invention designs a dedicated residual scale compensation detection head. This detection head is attached to each level $N_j$ of the feature pyramid.
[0055] For an anchor point position on the feature map $N_j$, the feature vector is denoted as $f \in \mathbb{R}^{C}$, where $C$ is the number of feature channels. The RSCD Head performs the following steps: a) Basic prediction path: Initial bounding box offset predictions and class probability predictions are generated through a lightweight subnetwork (e.g., two consecutive GSConv layers).
[0056] $$(\Delta_{\text{base}}, p) = \mathcal{H}_{\text{base}}(f)$$
[0057] where $\Delta_{\text{base}} = (t_x, t_y, t_w, t_h)$ is the offset relative to the preset anchor box size, $p \in [0, 1]^{N_c}$ is the initial probability distribution over the debris categories, $N_c$ is the number of debris categories, and $\mathcal{H}_{\text{base}}$ is a lightweight neural network whose function is to extract features and generate the basic initial target prediction.
[0058] b) Scale awareness and residual generation: This step aims to generate a compensation residual based on the initial scale of the predicted target, specifically for optimizing the localization of small targets.
[0059] First, the base offset is decoded into the initial predicted bounding box size (normalized value):
$$w_{\text{pred}} = w_a \, e^{t_w}, \qquad h_{\text{pred}} = h_a \, e^{t_h}$$
[0060] where $t_w$ and $t_h$ are the width and height components of the base offset, and $w_a$ and $h_a$ are the width and height of the anchor box. The logarithm of the predicted box area is calculated as a scale representation, $s = \log(w_{\text{pred}} \, h_{\text{pred}} + \epsilon)$, where $\epsilon$ is a very small positive number. Then, based on the scale representation $s$, the scale condition vector $v_s$ is generated:
$$v_s = \mathrm{MLP}_{\text{scale}}(s)$$
[0061] where $\mathrm{MLP}_{\text{scale}}$ is a multilayer perceptron network that receives the scale information $s$ and outputs the scale condition vector $v_s$. Its core function is to dynamically generate a condition vector based on the scale of the predicted target, guiding subsequent residual generation to focus on features at different scales.
[0062] Subsequently, a residual generation network $\mathcal{G}_{\text{res}}$ takes the anchor feature $f$ and the scale condition vector $v_s$ as input and predicts a residual offset:
$$\Delta_{\text{res}} = \mathcal{G}_{\text{res}}(f, v_s)$$
[0063] where the components of $\Delta_{\text{res}}$ are "fine-tuning" or "residual" terms relative to the base offset, used to correct the error of the base prediction.
[0064] c) Residual Fusion and Final Output: The final bounding box offset is obtained through a weighted residual connection between the base offset $\Delta_{\text{base}}$ and the residual offset $\Delta_{\text{res}}$:
$$\Delta_{\text{final}} = \Delta_{\text{base}} + \eta \, \Delta_{\text{res}}$$
[0065] where $\eta$ is a learnable scalar parameter, initially set to a small positive number (e.g., 0.1), so that the network primarily relies on the basic predictions in the early stages of training and gradually learns to use the residuals for fine-tuning as training progresses. Class prediction directly uses the output of the basic pathway.
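Steps a) to c) of the RSCD Head can be illustrated with a minimal numeric sketch. Several things here are assumptions for the example only: the exponential decoding of width and height, the function names, and in particular `toy_residual_fn`, which is a hand-written stand-in for the learned scale MLP and residual network rather than anything described in the patent.

```python
import math
from typing import Callable, Sequence, Tuple

Offsets = Tuple[float, ...]  # (t_x, t_y, t_w, t_h)


def log_area_scale(base: Offsets, anchor_w: float, anchor_h: float, eps: float = 1e-6) -> float:
    """Decode the base offsets to a box size and return log(area + eps)."""
    w_pred = anchor_w * math.exp(base[2])  # assumed exponential decoding
    h_pred = anchor_h * math.exp(base[3])
    return math.log(w_pred * h_pred + eps)


def fuse_offsets(base: Offsets, residual: Offsets, eta: float = 0.1) -> Offsets:
    """Weighted residual connection: base + eta * residual, per component."""
    return tuple(b + eta * r for b, r in zip(base, residual))


def rscd_refine(
    features: Sequence[float],
    base: Offsets,
    anchor_w: float,
    anchor_h: float,
    residual_fn: Callable[[Sequence[float], float], Offsets],
    eta: float = 0.1,
) -> Offsets:
    """Run scale awareness, residual generation, and residual fusion in order."""
    scale = log_area_scale(base, anchor_w, anchor_h)
    residual = residual_fn(features, scale)
    return fuse_offsets(base, residual, eta)


def toy_residual_fn(features: Sequence[float], scale: float) -> Offsets:
    """Hypothetical stand-in: only small boxes (low log-area) get a correction."""
    gain = 1.0 if scale < -4.0 else 0.0
    return (0.0, 0.0, gain * features[0], gain * features[1])


zero_base = (0.0, 0.0, 0.0, 0.0)
small_box = rscd_refine([0.5, -0.5], zero_base, 0.05, 0.05, toy_residual_fn)
large_box = rscd_refine([0.5, -0.5], zero_base, 0.5, 0.5, toy_residual_fn)
```

In this toy setup the small anchor (5% of the image per side) receives a size correction scaled by `eta`, while the large anchor keeps its base prediction unchanged, which mirrors the intent of compensating small targets specifically.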
[0066] 1.4 Loss Function and Training: The complete OceanTrashNet model is trained and later serves as the teacher model $M_T$. The total loss function $\mathcal{L}_{\text{det}}$ is a weighted combination of the objectness loss, the classification loss, and the bounding box regression loss:
$$\mathcal{L}_{\text{det}} = \lambda_{\text{obj}} \mathcal{L}_{\text{obj}} + \lambda_{\text{cls}} \mathcal{L}_{\text{cls}} + \lambda_{\text{box}} \mathcal{L}_{\text{box}}$$
[0067] $\mathcal{L}_{\text{obj}}$: A binary cross-entropy loss is used to determine whether the anchor point contains a target.
[0068] $\mathcal{L}_{\text{cls}}$: Cross-entropy loss or Focal Loss is used to handle the classification task.
[0069] $\mathcal{L}_{\text{box}}$: CIoU Loss is adopted, which comprehensively considers the overlapping area of the bounding boxes, the distance between center points, and the consistency of aspect ratio. It is defined as follows:
$$\mathcal{L}_{\text{box}} = 1 - \mathrm{IoU} + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$$
[0070] where $\mathrm{IoU}$ is the intersection over union, $\rho(b, b^{gt})$ is the Euclidean distance between the center points of the predicted bounding box and the ground truth bounding box, $c$ is the diagonal length of the smallest enclosing region that contains both the predicted bounding box and the ground truth bounding box, $v$ is a parameter that measures the consistency of aspect ratio, and $\alpha$ is its weighting coefficient.
[0071] $\lambda_{\text{obj}}$, $\lambda_{\text{cls}}$, and $\lambda_{\text{box}}$ are weighting coefficients that balance the various losses.
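The CIoU term can be computed directly from two boxes. The sketch below follows the commonly used CIoU definition, with $v = \frac{4}{\pi^2}\big(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\big)^2$ and $\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}$; these two expressions are the standard ones and are assumed here, since the patent text only names the quantities. Boxes are given as normalized `(cx, cy, w, h)` tuples, and the function name is hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized


def ciou_loss(box: Box, gt: Box, eps: float = 1e-9) -> float:
    """Return 1 - IoU + rho^2 / c^2 + alpha * v for one predicted / ground-truth pair."""
    # Corner coordinates of both boxes.
    x1, y1, x2, y2 = box[0] - box[2] / 2, box[1] - box[3] / 2, box[0] + box[2] / 2, box[1] + box[3] / 2
    gx1, gy1, gx2, gy2 = gt[0] - gt[2] / 2, gt[1] - gt[3] / 2, gt[0] + gt[2] / 2, gt[1] + gt[3] / 2

    # Overlap area and IoU.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = box[2] * box[3] + gt[2] * gt[3] - inter
    iou = inter / (union + eps)

    # Squared centre distance over squared diagonal of the smallest enclosing box.
    rho2 = (box[0] - gt[0]) ** 2 + (box[1] - gt[1]) ** 2
    c2 = (max(x2, gx2) - min(x1, gx1)) ** 2 + (max(y2, gy2) - min(y1, gy1)) ** 2 + eps

    # Aspect-ratio consistency term and its weight.
    v = (4.0 / math.pi ** 2) * (math.atan(gt[2] / gt[3]) - math.atan(box[2] / box[3])) ** 2
    alpha = v / ((1.0 - iou) + v + eps)
    return 1.0 - iou + rho2 / c2 + alpha * v


same = ciou_loss((0.5, 0.5, 0.2, 0.2), (0.5, 0.5, 0.2, 0.2))
apart = ciou_loss((0.2, 0.2, 0.2, 0.2), (0.8, 0.8, 0.2, 0.2))
```

Identical boxes give a loss of essentially zero. For the two disjoint squares, IoU and $v$ are both zero, so the loss reduces to $1 + 0.72 / 1.28 = 1.5625$: unlike plain IoU, the distance term still provides a useful gradient when boxes do not overlap.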
[0072] The model trained through the above process is the high-precision base model (teacher model). Based on ablation experiment data, its parameter count is approximately 3.18M, computational cost is 5.3 GFLOPs, and its mAP@0.5 score reaches 0.825 on the marine debris test set, representing a 9.4% improvement over the YOLOv11n baseline.
[0073] II. Collaborative Optimization of Structured Pruning and Adaptive Knowledge Distillation: To make OceanTrashNet suitable for edge platforms, it needs to be lightweighted and compressed while maintaining accuracy to the maximum extent. This invention proposes a two-stage collaborative optimization method and process.
[0074] 2.1 Structured Pruning Stage. Objective: to obtain a lightweight student model $M_S$ from the teacher model $M_T$.
[0075] a) Channel importance sparsity induction: L1 regularization is applied to the scaling parameters $\gamma$ of all batch normalization (BN) layers in the model to encourage them to approach zero. During fine-tuning training, the loss function becomes:
$$\mathcal{L}_{\text{sparse}} = \mathcal{L}_{\text{det}} + \lambda_s \sum_{\gamma \in \Gamma} |\gamma|$$
[0076] where $\mathcal{L}_{\text{det}}$ is the detection loss, $\lambda_s$ is the regularization strength, and $\Gamma$ is the set of scaling parameters of all batch normalization (BN) layers. After training, the absolute value $|\gamma|$ indicates the importance of the corresponding channel; the smaller the absolute value, the more redundant the channel is.
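As a small worked example of the sparsity term, the sketch below adds an L1 penalty over BN scaling factors to a detection loss. The layer names, the scaling values, the detection loss value, and the strength `1e-4` are all hypothetical placeholders chosen for illustration.

```python
from typing import Dict, List


def sparsity_penalty(bn_scales: Dict[str, List[float]], strength: float) -> float:
    """L1 penalty: strength times the sum of |gamma| over all BN layers."""
    return strength * sum(abs(g) for layer in bn_scales.values() for g in layer)


def sparse_training_loss(det_loss: float, bn_scales: Dict[str, List[float]], strength: float) -> float:
    """Detection loss plus the L1 sparsity term on the BN scaling factors."""
    return det_loss + sparsity_penalty(bn_scales, strength)


# Hypothetical BN scaling factors for two layers.
bn_scales = {
    "backbone.bn1": [0.9, -0.05, 0.4],
    "neck.bn2": [0.02, 0.7],
}
penalty = sparsity_penalty(bn_scales, strength=1e-4)   # 1e-4 * 2.07
total = sparse_training_loss(1.25, bn_scales, strength=1e-4)
```

Channels whose scaling factor is already close to zero (here `-0.05` and `0.02`) contribute little to the output, and the penalty keeps pushing such factors towards zero so that they can later be identified as redundant.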
[0077] b) Dependency-based grouping pruning and critical module protection: Due to operations such as residual connections and feature concatenation in the network, structural dependencies exist between channels. A dependency graph $G$ needs to be constructed to identify the channel groups (dependency groups) that must be pruned simultaneously. For each dependency group $g$, its overall importance score $S(g)$ is calculated, for example as the L2 norm of the $\gamma$ parameters of all BN layers within the group:
$$S(g) = \sqrt{\sum_{\gamma \in g} \gamma^2}$$
[0078] Key protection strategy: Identify and protect modules critical to the task. In OceanTrashNet, the RSCD Head is responsible for small object detection, and its channels are crucial. Therefore, all layers in the RSCD Head are added to the protection set $\mathcal{P}$, and the dependency groups they belong to are exempt from pruning.
[0079] Set the target pruning rate (e.g., 40%). All non-protected dependency groups are sorted in ascending order of their importance score, and channels of the corresponding proportion are removed from each group in turn (i.e., the channels with the smallest scaling-factor magnitude are cut off) until the cumulative parameter count decreases to the target.
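The grouping, protection, and ranking logic can be sketched as a small planning function. This is one possible reading of the procedure, under stated assumptions: the budget is counted in channels as a proxy for parameters, at least one channel is kept per group, and the group names and scaling values are invented for the example. Building the dependency graph itself is not shown.

```python
import math
from typing import Dict, List, Set


def group_importance(scales: List[float]) -> float:
    """L2 norm of the BN scaling factors in one dependency group."""
    return math.sqrt(sum(g * g for g in scales))


def plan_pruning(groups: Dict[str, List[float]], protected: Set[str], target_rate: float) -> Dict[str, List[int]]:
    """Return, per group, the channel indices to remove (least important groups first)."""
    prunable = {name: s for name, s in groups.items() if name not in protected}
    budget = int(target_rate * sum(len(s) for s in groups.values()))
    plan: Dict[str, List[int]] = {}
    removed = 0
    for name in sorted(prunable, key=lambda n: group_importance(prunable[n])):
        if removed >= budget:
            break
        scales = prunable[name]
        # Channels with the smallest |gamma| go first; keep at least one channel.
        order = sorted(range(len(scales)), key=lambda i: abs(scales[i]))
        take = min(len(scales) - 1, budget - removed)
        plan[name] = sorted(order[:take])
        removed += take
    return plan


# Hypothetical dependency groups; the detection-head group is protected.
groups = {
    "backbone.stage2": [0.9, 0.02, 0.5, 0.01],
    "neck.fuse1": [0.05, 0.03, 0.6],
    "rscd_head.p3": [0.01, 0.02, 0.03],
}
plan = plan_pruning(groups, protected={"rscd_head.p3"}, target_rate=0.4)
# plan == {"neck.fuse1": [0, 1], "backbone.stage2": [1, 3]}
```

Note that the protected head group keeps all of its channels even though its scaling factors are the smallest in the example, which is exactly the situation the protection set is meant to handle.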
[0080] After pruning, the model undergoes short-term fine-tuning to stabilize its performance, resulting in the student model $M_S$. After pruning, the number of model parameters decreased to 1.37M, the computational cost decreased to 3.3 GFLOPs, the mAP@0.5 was approximately 0.797, and the recall rate was 0.725, indicating that the core detection capabilities were retained.
[0081] 2.2 Adaptive Knowledge Distillation Stage. Objective: to transfer the "knowledge" of the teacher model $M_T$ to the pruned student model $M_S$, restoring its accuracy and yielding the final deployable, lightweight, high-precision model $M_{\text{final}}$.
[0082] This invention employs an adaptive cosine decay distillation strategy based on the output logits.
[0083] a) Knowledge distillation loss: For the same input image $x$, the output logits (the vector before softmax) of the teacher model are $z^{(t)}$ and the output logits of the student model are $z^{(s)}$.
[0084] Introducing a temperature parameter $T$ "softens" the logits, resulting in smoother probability distributions:
$$p^{(t)}_i = \frac{\exp\big(z^{(t)}_i / T\big)}{\sum_j \exp\big(z^{(t)}_j / T\big)}, \qquad p^{(s)}_i = \frac{\exp\big(z^{(s)}_i / T\big)}{\sum_j \exp\big(z^{(s)}_j / T\big)}$$
[0085] Distillation loss is defined as the Kullback-Leibler divergence between the two, scaled by $T^2$:
$$\mathcal{L}_{\text{KD}} = T^2 \cdot \mathrm{KL}\big(p^{(t)} \,\|\, p^{(s)}\big)$$
[0086] Multiplying by $T^2$ maintains the stability of the gradient magnitude during backpropagation. The softened teacher probability $p^{(t)}$ includes similarity relationships between categories (i.e., "dark knowledge"), such as the similarity between plastic and foam in shape and context, which helps the student model learn more robust classification boundaries.
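A minimal sketch of the temperature-softened distillation loss follows. The temperature value `4.0` and the example logits are assumptions for illustration; the patent text does not give a specific temperature.

```python
import math
from typing import List, Sequence


def softened(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Sequence[float], student_logits: Sequence[float], temperature: float = 4.0) -> float:
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p_t = softened(teacher_logits, temperature)
    p_s = softened(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0.0)
    return temperature ** 2 * kl


# Hypothetical logits over three debris categories.
teacher = [4.0, 2.5, -1.0]
student = [3.0, 0.5, 0.0]
matched = distillation_loss(teacher, teacher)
mismatched = distillation_loss(teacher, student)
sharp = softened(teacher, 1.0)
soft = softened(teacher, 4.0)
```

The loss is zero when the student reproduces the teacher's logits and positive otherwise. Comparing `sharp` and `soft` shows the effect of the temperature: the top-class probability drops and the remaining mass is spread over the other categories, which is where the inter-class similarity information lives.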
[0087] b) Adaptive weighted total loss function: The total loss of the student model during distillation training is a dynamic weighted sum of the ground truth (GT) supervision loss $L_{det}$ and the distillation loss $L_{KD}$, with the distillation term weighted by a coefficient $\alpha(e)$.
[0088] Here $L_{det}$ is the same detection loss used when training the teacher model. The core innovation lies in the design of the weighting coefficient $\alpha(e)$, which changes dynamically with the training epoch $e$ and decays along a cosine curve:
$$\alpha(e) = \alpha_{end} + \tfrac{1}{2}\left(\alpha_{start} - \alpha_{end}\right)\left(1 + \cos\frac{\pi e}{E}\right)$$
[0089] where $e$ is the current training epoch, $E$ is the preset total number of distillation training epochs (e.g., 100), and $\alpha_{start}$ (e.g., 0.7) and $\alpha_{end}$ (e.g., 0.2) are the initial and final distillation loss weights, respectively.
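As an illustration of the schedule described above, the sketch below uses one common form of cosine decay that starts at the initial weight and ends at the final weight. The exact expression, the function names, and in particular the convex combination used in `total_loss` are assumptions; the text only states that the total loss is a dynamic weighted sum of the two terms.

```python
import math


def distill_weight(epoch: int, total_epochs: int = 100,
                   alpha_start: float = 0.7, alpha_end: float = 0.2) -> float:
    """Cosine decay of the distillation weight from alpha_start to alpha_end."""
    progress = min(max(epoch / total_epochs, 0.0), 1.0)   # clamp to [0, 1]
    return alpha_end + 0.5 * (alpha_start - alpha_end) * (1.0 + math.cos(math.pi * progress))


def total_loss(det_loss: float, kd_loss: float, epoch: int,
               total_epochs: int = 100) -> float:
    """Assumed combination: (1 - alpha) * detection loss + alpha * distillation loss."""
    a = distill_weight(epoch, total_epochs)
    return (1.0 - a) * det_loss + a * kd_loss
```

With the example values from the text (0.7, 0.2, 100 epochs), the weight starts at 0.7, passes through the midpoint 0.45 at epoch 50, and reaches 0.2 at the final epoch, changing slowly at both ends and fastest in the middle.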
[0090] Early training phase: the distillation loss weight is relatively large. The student model mainly imitates the softened probability distribution, rich in dark knowledge, provided by the teacher model, which acts as a strong regularizer. This is analogous to students listening to a teacher explain principles in class: it helps the student model quickly establish good feature representations and classification logic, and avoids overfitting to the limited GT labels.
[0091] Late training phase: as the distillation loss weight decays to a smaller value, the student model gradually shifts its learning focus to accurately fitting the true GT labels. This is analogous to students improving their accuracy by working through practice problems, and it completes the final precision fine-tuning.
[0092] c) Distillation training process: The parameters of the teacher model are frozen. The student model is initialized with the pruned and fine-tuned weights. Training runs on the training dataset for the preset number of epochs. In each epoch, each batch of data is fed forward through both the teacher and student models, the total loss is calculated as described above, and backpropagation updates only the parameters of the student model.
[0093] After training, the final OceanTrashNet-Lite model is obtained. Its architecture is identical to that of the pruned student model, but its performance is significantly improved: mAP@0.5 rises to 0.815 and recall to 0.742.
[0094] III. Marine debris detection device and real-time monitoring for edge platforms: The trained model is deployed to a specific edge computing device (such as an onboard computer integrated into a UAV or AUV) to form a complete marine debris detection device. The real-time monitoring method of this device includes the following steps: 3.1 Image acquisition and preprocessing: A camera mounted on the mobile platform (such as a drone) acquires marine environmental images in real time, frame by frame. The onboard processing unit preprocesses each frame: it scales and pads the image to the model input size and applies the same normalization used during training.
[0095] 3.2 Real-time inference with the lightweight model: The preprocessed image tensor is fed into the inference engine of the deployed model (e.g., a runtime optimized with TensorRT, ONNX, etc.). Internally, the engine performs the forward propagation described above and outputs a preliminary set of detection results whose coordinates have been mapped back to the original image scale.
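The scale-and-pad preprocessing and the mapping of box coordinates back to the original image can be sketched as below. This is an illustrative example only: the square 640-pixel input size, the centered padding, and the function names are assumptions, since the model input size is not stated here.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels


def letterbox_params(src_w: int, src_h: int, dst: int = 640) -> Tuple[float, float, float]:
    """Scale factor and left/top padding that fit an image into a dst x dst input."""
    scale = min(dst / src_w, dst / src_h)           # preserve the aspect ratio
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst - new_w) / 2                       # padding split evenly (assumed)
    pad_y = (dst - new_h) / 2
    return scale, pad_x, pad_y


def to_original(box: Box, scale: float, pad_x: float, pad_y: float) -> Box:
    """Undo padding and scaling to express a model-space box in original pixels."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)
```

For a 1920 x 1080 frame and a 640-pixel input, the frame is scaled by one third to 640 x 360 and padded by 140 pixels at the top and bottom; a box predicted in the padded input is mapped back by subtracting the padding and dividing by the scale.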
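[0096] 3.3 Post-processing of detection results: First, confidence filtering removes low-confidence detections:
$$D' = \{\, d_i \in D_{raw} \mid s_i \ge \tau_{conf} \,\}$$
[0097] where $D_{raw}$ is the preliminary detection set, $s_i$ is the confidence of detection $d_i$, and $\tau_{conf}$ is the preset confidence threshold (e.g., 0.25). Then, non-maximum suppression (NMS) is applied to remove highly overlapping redundant boxes:
$$D_{final} = \mathrm{NMS}\left(D', \tau_{IoU}\right)$$
[0098] where $\tau_{IoU}$ is the intersection-over-union (IoU) threshold for NMS (e.g., 0.45), and $D_{final}$ is the final list of garbage detections for the current frame.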
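The two post-processing steps above (confidence filtering, then NMS) can be sketched in plain Python. The detection tuple layout, the function names, and the choice to suppress only boxes of the same class are assumptions made for the example; the thresholds 0.25 and 0.45 are the example values from the text.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[Box, float, str]        # (box, confidence, class name)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(dets: List[Detection], conf_thr: float = 0.25,
                iou_thr: float = 0.45) -> List[Detection]:
    """Drop low-confidence detections, then apply greedy per-class NMS."""
    candidates = sorted((d for d in dets if d[1] >= conf_thr),
                        key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for d in candidates:
        # Keep d unless it overlaps too much with a higher-scoring box of its class.
        if all(d[2] != k[2] or iou(d[0], k[0]) <= iou_thr for k in kept):
            kept.append(d)
    return kept
```

On an edge device this step would normally run inside the inference runtime or a vectorized library routine; the loop form is shown only to make the logic explicit.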
[0099] 3.4 Geospatial coordinate mapping (optional, for mobile platforms): If the detection device is integrated into a mobile platform equipped with GPS, an IMU, and height/depth sensors, image pixel coordinates can be converted into geospatial coordinates to geolocate the waste precisely. The inputs are the camera intrinsic parameter matrix, the platform position at the moment of capture, and the platform attitude described by a rotation matrix. For the center pixel coordinates of a detection box, the geographic coordinates can be calculated using the collinearity equations of photogrammetry or a simplified orthophoto model. The simplified model (applicable to UAV orthophotos or underwater platforms at known altitudes) is:
[0100]
[0101]
[0102] where the quantities involved are the principal point of the image; the ground sampling distance (meters/pixel), which depends on the flight altitude and the camera focal length; and the approximate radius of the Earth.
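A simplified orthophoto mapping of the kind described above can be sketched as follows. This is one possible form under explicit assumptions, not the formulas of the invention: the camera looks straight down, the top of the image points north, the focal length is expressed in pixels so that the ground sampling distance is altitude divided by focal length, and the Earth is treated as a sphere of radius 6,371 km. All names are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0   # approximate mean Earth radius (assumed value)


def ground_sampling_distance(altitude_m: float, focal_px: float) -> float:
    """Meters on the ground per image pixel for a nadir-looking camera."""
    return altitude_m / focal_px


def pixel_to_geo(u: float, v: float, cx: float, cy: float,
                 lat0_deg: float, lon0_deg: float,
                 altitude_m: float, focal_px: float) -> Tuple[float, float]:
    """Map a pixel (u, v) to latitude/longitude around the platform position."""
    gsd = ground_sampling_distance(altitude_m, focal_px)
    east_m = (u - cx) * gsd        # image x grows to the east (assumed north-up image)
    north_m = -(v - cy) * gsd      # image y grows downward, i.e. to the south
    lat = lat0_deg + math.degrees(north_m / EARTH_RADIUS_M)
    lon = lon0_deg + math.degrees(
        east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat0_deg))))
    return lat, lon
```

A platform with non-zero heading or tilt would first need the pixel offsets rotated by the attitude matrix, which is where the full collinearity model becomes necessary.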
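[0103] 3.5 Results output and decision support: The detection device ultimately outputs detection results with timestamps and geographic references, each record carrying the detected waste type, confidence level, image location, and geographic coordinates.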
[0104] The results can be transmitted back to the control center in real time via wireless link, visualized on an electronic map, and used to generate a heat map of waste distribution. They can also be used to guide unmanned cleaning vessels, robotic arms, and other equipment to carry out targeted cleaning operations, or stored for long-term pollution trend analysis.
[0105] Please see Figures 2 to 4. Model performance analysis: 1. OceanTrashNet architecture optimization. To objectively analyze the contribution of each component in the OceanTrashNet framework, a systematic ablation experiment was conducted; the results are summarized in Table 1. The baseline model achieved an mAP50 of 0.754, but its low recall (0.652) exposed a fundamental weakness: it missed nearly 35% of targets when faced with the diverse and challenging garbage in the dataset. This marks a clear performance bottleneck for applying standard architectures in this domain.
[0106] Introducing the UniRepLKNet module based on large kernel convolutions markedly changed the model's behavior. Precision improved by 8.9 percentage points to 0.801, indicating that false positives were substantially suppressed. The large receptive field enables the network to integrate global contextual information early in forward propagation. For example, the model can now infer that a long, winding shape in sand is more likely a rope than a naturally formed ridge, or that a cluster of small white spots is more likely a piece of Styrofoam than sunlight shimmering on waves. This contextual reasoning effectively suppresses false detections. However, the improvement in recall was relatively limited (from 0.652 to 0.677), suggesting that while the model became more "confident" in its predictions, it did not "see" significantly more objects. The global context served more for validation than for initial discovery.
[0107] Integrating the GSConv-based Slimneck architecture directly addresses the recall bottleneck. This modification produced the largest jump in recall, a relative improvement of about 10.3% (from 0.677 to 0.747), and mAP50 correspondingly increased to 0.806. Slimneck's design philosophy focuses on preserving gradient flow and minimizing information loss during multi-scale feature fusion. In practice, this means that subtle features crucial for identifying small debris are not diluted as they pass from the backbone to the detection head. Its efficient design even reduces the computational cost from 6.3 GFLOPs to 6.0 GFLOPs. This result demonstrates that, when detecting a large number of objects of varying sizes in cluttered environments, an efficient and information-rich feature fusion pathway is just as important as a robust backbone network.
[0108] Finally, the residual scale compensation detection head (RSCD Head) was integrated with the aforementioned components to form the complete OceanTrashNet. This model achieved peak performance: the highest overall mAP50 (0.825, a relative improvement of 9.4% over the baseline's 0.754) and the highest mAP50-95 (0.665, an improvement of 13.1%). Crucially, it achieved the best balance between precision (0.792) and recall (0.762). The RSCD Head, with its explicit multi-branch design for different object scales and its residual connections for feature preservation, played a key role in achieving accurate bounding box regression. This is directly reflected in the mAP50-95 metric, which places stringent requirements on localization accuracy. Notably, this accuracy was achieved with the lowest computational cost (5.3 GFLOPs) among the full-size models, demonstrating that the co-designed architecture improved accuracy and efficiency at the same time.
[0109] 2. Performance evaluation of model compression and knowledge distillation. A key contribution of this research is the transition from a high-precision laboratory model to a practical, deployable solution. We evaluated a two-stage compression pipeline. The first stage, structured pruning, compresses the full OceanTrashNet into an OceanTrashNet-Lite (Undistilled) model, reducing parameters by 43.7% and computation by 37.7%. This reduction inevitably incurs a performance cost: mAP50 drops to 0.797, and recall declines even more noticeably, to 0.722. This result suggests that the pruned network loses some of its ability to generate the diverse feature activations needed to propose candidates, particularly for more challenging categories, but retains its ability to correctly classify clear candidates (thus maintaining a high precision of 0.807).
[0110] The subsequent adaptive learning knowledge distillation stage aims to recover the "knowledge" lost during pruning. The final OceanTrashNet-Lite model achieves an mAP50 of 0.815, which recovers 0.018 of the 0.028 mAP50 lost to pruning (about 64%) and reaches 98.8% of the full model's mAP50. Most importantly, recall improved from 0.722 to 0.742. The adaptive learning distillation strategy, which dynamically adjusts the balance between mimicking the teacher network's soft labels and fitting the true labels as training progresses, proved particularly effective in recovering the model's object-finding capability, which is crucial for marine environmental monitoring applications.
[0111] Table 1. Comparison of Model Performance and Computational Cost
[0112] 3. Qualitative visualization and interpretability analysis. Please see Figure 3. The comparative attention heatmaps generated by Grad-CAM provide a visual explanation for the performance improvement. The baseline model's heatmaps exhibit a scattered and localized activation pattern, with attention concentrated on high-contrast edges or isolated core regions of objects in the image. For objects with curved or elongated geometries (such as ropes or plastic products), this localized attention prevents the model from forming a coherent perception of the object's overall contour. The heatmaps therefore often highlight only parts of the object and fail to cover its full length. This explains the missed detections observed in real-world applications: the network fails to build a complete feature-level representation of the global context of such objects. In contrast, the first-component model, which employs large-kernel convolutions, exhibits a distinctly different activation pattern: its heatmaps show a wider, more diffuse attention region that covers a greater extent of the image context. However, this global focus can sometimes make the activated region too large (Figure 3, first row), pulling portions of the background or adjacent irrelevant regions into the high-confidence activation region. In some cases the main activation region also shifts as a whole (Figure 3, third row) and cannot be precisely aligned with the actual location of the target. This explains why its detection boxes may show localization bias or contain too much background interference. The OceanTrashNet model and its lightweight variant OceanTrashNet-Lite effectively combine the advantages of global and local information: their heatmaps have sufficient spatial coverage to capture long-distance dependencies while focusing more accurately on semantically relevant target regions, demonstrating excellent noise suppression and target alignment.
[0113] Please see Figure 4, which provides further qualitative evidence. The baseline model exhibits significant false negatives in multiple scenarios, particularly for categories with complex or elongated shapes (such as ropes and plastics), often detecting only partial fragments of objects or missing the target completely. While the first-component model improves recall, its predicted bounding boxes are often too large (Figure 4, first row), with the box much larger than the actual size of the object, or offset in position (Figure 4, third row), so that the box cannot accurately fit the object. While the second-component model also improves recall, its bounding box accuracy is deficient: as shown in rows 2 and 3 of Figure 4, the shape and size of the predicted bounding boxes match the target objects poorly, and the first row of Figure 4 still shows significant missed detections. This indicates that while its efficient feature fusion improves detection sensitivity, it also sacrifices localization accuracy to some extent. The full OceanTrashNet model, however, achieves significant improvements in all of these areas: it generates more bounding boxes, fits object boundaries more accurately and completely, and performs particularly well on curved objects, slender objects, and small fragments. Its lightweight version, OceanTrashNet-Lite, is visually very similar to the full model, maintaining high recognition accuracy while reducing computational cost, which validates the effectiveness of the distillation technique.
[0114] A lightweight marine debris detection system for edge computing platforms includes: an image acquisition module, mainly comprising an optical camera or video camera, used to acquire marine environmental images of the area to be monitored; an edge computing processing module, which integrates an edge computing chip to store and run the aforementioned lightweight, high-precision detection model and performs image preprocessing, model inference, and result post-processing; and a positioning and attitude determination module, which includes a GPS receiver, an inertial measurement unit (IMU), and an altimeter/depth sensor to acquire the device's own geographical location, attitude, and height/depth information.
[0115] The control and communication module coordinates the work of each module and sends the processing results (detected waste type, confidence level, image location, and geographic coordinates) to a remote monitoring center or local storage via a wireless communication link.
[0116] The edge computing processing module is deployed on unmanned aerial vehicles, autonomous underwater vehicles, or monitoring buoys.
[0117] A computer-readable storage medium storing a computer program that, when executed by a processor, implements the aforementioned lightweight marine debris detection method for edge computing platforms.
[0118] A computer program product includes a computer program that, when executed by a processor, implements the aforementioned lightweight marine debris detection method for edge computing platforms.
[0119] The above embodiments can be implemented, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented, in whole or in part, as a computer program product, the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are carried out. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state drive), etc.
[0120] The above description is only a preferred embodiment of the present invention. It should be noted that, for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.
Claims
1. A lightweight detection method for marine debris for edge computing platforms, characterized in that it comprises:
S1. Constructing a high-precision basic detection model, which includes: a large kernel convolution module that enhances global perception of slender targets; a lightweight feature fusion neck that reduces computational redundancy; and a residual scale compensation detection head that improves the positioning accuracy of small targets; the residual scale compensation detection head includes: a basic prediction pathway, which generates initial bounding box offset predictions and class probability predictions through lightweight sub-networks; scale awareness and residual generation, in which a compensation residual is generated based on the initial predicted scale of the target to optimize the localization of small targets; and residual fusion and final output, in which the final bounding box offset is obtained through weighted residual connections;
S2. Performing structured pruning on the high-precision basic detection model to obtain a lightweight student model;
S3. Using the high-precision basic detection model as the teacher model and, through an adaptive learning distillation strategy that simulates the adaptive, phased characteristics of human learning, transferring its knowledge to the lightweight student model to obtain the final lightweight, high-precision detection model;
S4. Deploying the lightweight, high-precision detection model on an edge computing device to perform real-time inference on the collected marine environment images and output the category and location information of the garbage targets.
2. The lightweight marine debris detection method for edge computing platforms according to claim 1, characterized in that the large kernel convolution module adopts a parallel path structure during the training phase and uses structural reparameterization during the inference phase to fuse the trained convolution kernels into a single depthwise separable convolutional layer.
3. The lightweight marine debris detection method for edge computing platforms according to claim 1, characterized in that the lightweight feature fusion neck adopts a structure based on GSConv and VoV-GSCSP modules to reduce computational complexity while maintaining feature expressive power.
4. The lightweight marine debris detection method for edge computing platforms according to claim 1, characterized in that the structured pruning includes: applying L1 regularization to the scaling factors of the batch normalization layers to induce channel sparsity; constructing a network dependency graph, identifying channel dependency groups, and setting up a protection set to preserve the channels of critical modules; and pruning the non-protected dependency groups based on channel importance scores until the target pruning rate is achieved.
5. The lightweight marine debris detection method for edge computing platforms according to claim 1, characterized in that, in the adaptive learning distillation strategy, the distillation loss weight coefficient decays from its initial value to its final value according to a cosine function over the training period, so as to dynamically balance imitating the soft labels of the teacher model against fitting the true labels.
6. A lightweight marine debris detection system for edge computing platforms, characterized in that it comprises: an image acquisition module, used to acquire images of the marine environment; an edge computing processing module, used to run the lightweight, high-precision detection model as described in any one of claims 1-5; a positioning and attitude determination module, used to acquire the device's geographical location and attitude information; and a control and communication module, used to coordinate the operation of each module and output the detection results.
7. The lightweight marine debris detection system for edge computing platforms according to claim 6, characterized in that the edge computing processing module is deployed on unmanned aerial vehicles, autonomous underwater vehicles, or monitoring buoys.
8. A computer program product, characterized in that it includes a computer program that, when executed by a processor, implements the lightweight marine debris detection method for edge computing platforms as described in any one of claims 1-5.
9. A computer-readable storage medium storing a computer program, characterized in that, when executed by a processor, the program implements the lightweight marine debris detection method for edge computing platforms as described in any one of claims 1-5.