Intelligent recognition method for polymetallic nodules on the seabed based on YOLO-nodules
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- QINGDAO INST OF MARINE GEOLOGY
- Filing Date
- 2025-01-22
- Publication Date
- 2026-06-16

Figure CN120047790B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of polymetallic nodule image target recognition technology, specifically relating to a smart recognition method for seabed polymetallic nodules based on YOLO-Nodules.
Background Technology
[0002] Polymetallic nodules are a naturally occurring polymetallic mineral resource distributed in deep-sea plains and seamounts. Rich in elements such as Fe, Mn, Ni, and Co, these nodules are currently widely used in the chemical industry and high-tech production, such as in solar cells and superconductor materials. Polymetallic nodules are among the most important deep-sea mineral resources and strategic resources, and their further exploration and development are of significant strategic importance. Accurately calculating the distribution density of polymetallic nodules on the seabed and estimating regional productivity and economic value is an essential step in the large-scale mining and utilization of polymetallic nodules.
[0003] Traditional measurement of polymetallic nodules relies mainly on manual estimation and analysis, which is time-consuming, labor-intensive, and inefficient, and is difficult to apply across large areas. Some existing research attempts to introduce supervised classification methods, such as machine learning and deep learning, to automate the identification and analysis of polymetallic nodules, but these methods require a large amount of real and reliable labeled polymetallic nodule data for model training and validation. Manually labeling a large number of polymetallic nodule targets is itself time-consuming, labor-intensive, and inefficient, and without sufficient label data it is difficult to train and optimize a good supervised classification model. Limited by the quantity and variety of manually labeled data, supervised models trained on such data show good recognition performance only in specific regions or scenarios. This makes it difficult to apply supervised classification methods that depend on manually labeled samples directly to other areas; in other words, traditional supervised methods generalize poorly.
[0004] Furthermore, while deep learning methods based on neural networks are widely used for object recognition and classification in natural scenes, the convolution operation can lead to the loss of features in small objects during layer-by-layer convolution, making the recognition of irregular small objects, especially polymetallic nodules, a challenge. Therefore, applying existing deep learning methods or models to the recognition of small objects like polymetallic nodules can easily result in misclassification or omission of nodule targets, leading to poor recognition performance and low accuracy.
[0005] Therefore, an urgent problem to be solved by those skilled in the art is how to automatically generate polymetallic nodule label data, accurately identify and segment small targets such as polymetallic nodules, and thereby achieve automated, intelligent estimation of polymetallic nodules. Solving it would overcome the time-consuming and laborious nature of traditional manual estimation, the difficulty of obtaining nodule label data, and the poor recognition of small targets, and would enable accurate detection and segmentation of polymetallic nodule targets.
Summary of the Invention
[0006] This invention addresses the shortcomings of existing technologies by proposing an intelligent identification method for polymetallic nodules on the seabed based on the YOLO-Nodules model. It automatically generates polymetallic nodule label data and combines it with the constructed YOLO-Nodules model to achieve intelligent detection and segmentation of polymetallic nodule targets. It then calculates polymetallic nodule index parameters within the region. This solves the problems of time-consuming and labor-intensive traditional manual estimation, difficulty in obtaining nodule label data, and poor recognition of small targets, and achieves accurate identification and segmentation of polymetallic nodule targets.
[0007] This invention is achieved using the following technical solution: a smart identification method for seabed polymetallic nodules based on the YOLO-Nodules model, comprising the following steps:
[0008] Step S1: Create a polymetallic nodule image dataset: Combine polymetallic nodule images obtained from seabed photography, randomly select some representative images and perform image segmentation to generate a polymetallic nodule image dataset, which will be used for subsequent nodule label data generation and YOLO-Nodules model training.
[0009] Step S2: Automatically generate polymetallic nodule label data: The polymetallic nodule image dataset is sequentially processed by image enhancement, Segment Anything Model (SAM) processing, and image post-processing to generate semantic segmentation label data and target detection label data;
[0010] Step S3, Intelligent Detection and Segmentation of Polymetallic Nodules: Construct a YOLO-Nodules model suitable for polymetallic nodules, which consists of Conv+BN+SiLU (CBS) module, Space to Depth Conversion (SDC) module, Spatial Context Aware Module (SCAM) module, Spatial Pyramid Pooling Fast (SPPF) module, Cross Stage Partial convolution module with Shuffle Attention (CSPSA) module, and Cross Stage Partial convolution module with two CBS blocks (CSP2C) module.
[0011] The image data of polymetallic nodules and the corresponding semantic segmentation label data are input into the YOLO-Nodules model to train a deep learning model suitable for the semantic segmentation task of polymetallic nodules targets. The trained YOLO-Nodules model for the semantic segmentation task is obtained. Then, the image of the polymetallic nodules to be identified is input into the trained YOLO-Nodules model to obtain the semantic segmentation result image of the polymetallic nodules target.
[0012] The polymetallic nodule image data and the corresponding target detection label data are input into the YOLO-Nodules model to train a deep learning model suitable for the target detection task of polymetallic nodules. The trained YOLO-Nodules model for the target detection task is obtained. Then, the polymetallic nodule image to be identified is input into the trained YOLO-Nodules model to obtain the polymetallic nodule target detection result image.
[0013] Step S4: Calculation of polymetallic nodule index parameters: Based on the polymetallic nodule target detection results and semantic segmentation results, calculate the polymetallic nodule index parameters, including the number of polymetallic nodules, the size of polymetallic nodules, the number of polymetallic nodules distributed per unit area, the proportion of large, medium and small polymetallic nodules, the distribution area of polymetallic nodules, and the coverage rate of polymetallic nodules.
[0014] Further, in step S1, a portion of representative images are randomly selected and image segmentation is performed. Specifically, this includes: taking into account the differences in the shape, distribution area, and image shooting environment of polymetallic nodules, N representative images are randomly selected from the polymetallic nodule image dataset acquired by photography; to facilitate subsequent model training, the selected original images are cropped into W*H (height H pixels, width W pixels), and the overlap rate between adjacent images is R%; finally, a polymetallic nodule image dataset containing M images is obtained.
[0015] Furthermore, in step S2, image enhancement specifically includes the following steps:
[0016] (1) Enhance the brightness of the polymetallic nodule image to ensure that the nodule target is clearly visible in the image.
[0017] (2) Enhance the image sharpness of the polymetallic nodule image after the image brightness is enhanced, so that the edges and details of the image are clearer, thereby improving the overall image clarity.
[0018] (3) Enhance the image contrast of the sharpened polymetallic nodule image to make the difference between the nodule target area and the substrate area in the image more obvious, increase the detail information of the image, and facilitate the detection and segmentation of the nodule target.
[0019] Further, in step S2, the Segment Anything Model (SAM) processing is specifically carried out as follows: the enhanced polymetallic nodule image is input into the SAM model, which uses its publicly released pre-trained weights; the model hyperparameters are tuned; and, without adding any prompts, nodule targets are pre-identified globally in the image to generate the preliminary distribution range of the polymetallic nodules. The result is a grayscale image containing the nodule targets.
[0020] Furthermore, in step S2, the image post-processing specifically includes the following steps:
[0021] (1) To generate semantic segmentation task label data, the steps are as follows:
[0022] By setting a threshold, the grayscale image of the nodule target generated by the SAM model in step S2 is converted into a binary image. Pixels with a value of 1 represent polymetallic nodule pixels, and pixels with a value of 0 represent seabed sediment pixels.
[0023] In the binary image, the contour range of all nodule targets is found, contour coordinates are generated, and the contour coordinates are normalized according to the height and width of the image. Then, the contours of all polymetallic nodule targets are traversed, and all contour coordinates of the nodule targets are stored in a TXT format file. That is, the polymetallic nodule labels are converted from binary image format labels to TXT format labels. The internal format of the TXT file is "object X1 Y1 X2 Y2...", where object represents the nodule target category, X1 represents the X coordinate of the first contour point of the first nodule target, Y1 represents the Y coordinate of the first contour point of the first nodule target, X2 represents the X coordinate of the second contour point of the first nodule target, Y2 represents the Y coordinate of the second contour point of the first nodule target, and so on.
[0024] (2) To generate target detection task label data, the steps are as follows:
[0025] By setting a threshold, the grayscale image of the nodule target generated in step S2 is converted into a binary image. Pixels with a value of 1 represent polymetallic nodule pixels, and pixels with a value of 0 represent seabed sediment pixels.
[0026] A series of morphological processing methods are applied to the binary image, including image erosion, distance transformation, image normalization, image thresholding, and opening, to eliminate speckle noise in the image, solve the problem of some nodule targets being close together or sticking together, and generate a morphologically processed binary image.
[0027] In the morphologically processed binary image, the contour range of all nodule targets is found. Based on the contour range, the coordinates of the upper left and lower right corners of the nodule targets are calculated, and then the coordinates, height, and width of the center point of the nodule targets are calculated. Then, based on the height and width of the image, the coordinates, height, and width of the center point of the nodule targets are normalized. Finally, the contours of all polymetallic nodule targets in the image are traversed, and the coordinates, height, and width of the center point of the nodule targets are calculated and stored in a TXT format file. This converts the polymetallic nodule labels from binary image format labels to TXT format labels. The internal format of the TXT file is "object X1 Y1 W1 H1…", where object represents the nodule target category, X1 represents the X coordinate of the center point of the first nodule target, Y1 represents the Y coordinate of the center point of the first nodule target, W1 represents the width of the first nodule target, H1 represents the height of the first nodule target, and so on.
[0028] Furthermore, in step S3, the specific process of constructing the YOLO-Nodules model is as follows:
[0029] (1) Constructing the Backbone of the YOLO-Nodules model: The main function of the Backbone is to extract multi-scale features of nodule targets from the image. It uses CBS and CSPSA modules as basic units. The CBS module consists of a Convolution layer, a Batch Normalization layer, and a SiLU activation function. The CSPSA module consists of four CBS modules and a Shuffle Attention module. An SCAM module is embedded at the end of the Backbone to construct the global context relationship within the image and consider the relationship between small target objects and the global region. The SCAM module consists of three CBS modules, an Average Pool layer, a Max Pool layer, a Softmax activation function, a Sigmoid activation function, and a SiLU activation function. An SPPF module is also embedded to use pooling operations of different scales to stitch together feature maps of different scales, thereby improving the ability to recognize targets of different sizes. The SPPF module consists of two CBS modules and three Max Pool layers.
[0030] (2) Constructing the Neck part of the YOLO-Nodules model. The main function of the Neck part is multi-scale feature fusion, which fuses feature maps from different stages of the Backbone to enhance feature representation capabilities. It consists of CBS, SDC, CSP2C and SCAM modules; among them, the SDC module is used to convert the spatial dimension of the feature map into the depth dimension to enhance feature representation, thereby making up for the lack of information in low-resolution images and improving the model's ability to process small objects and low-resolution images; the three side outputs of the Neck part are respectively added to the SCAM module to improve the global correlation capability across channels and across spaces, enhance the weak feature representation of small targets, and suppress easily confused backgrounds; the CSP2C module consists of four CBS modules and incorporates a residual mechanism.
[0031] (3) Construct the Prediction part of the YOLO-Nodules model. The YOLO-Nodules model inherits the decoupling head of the YOLOv8 model. The decoupling head enables each task to focus on its own objective during the optimization process, thereby improving its accuracy.
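The space-to-depth rearrangement that the SDC module in (2) is described as performing can be illustrated with a small, library-free Python sketch. It shows only the rearrangement idea (spatial blocks moved into the channel dimension). The function name `space_to_depth`, the block size of 2, and the output channel ordering are assumptions for illustration; the actual SDC module in Figure 5(b) may contain further layers that are not reproduced here.

```python
def space_to_depth(fmap, block=2):
    """Rearrange a C x H x W feature map into (C*block^2) x (H/block) x (W/block).

    fmap is a list of C channels, each an H x W list of lists.
    Channel ordering of the output is an illustrative assumption.
    """
    h, w = len(fmap[0]), len(fmap[0][0])
    if h % block or w % block:
        raise ValueError("height and width must be divisible by the block size")
    out = []
    for dy in range(block):          # row offset inside each block
        for dx in range(block):      # column offset inside each block
            for channel in fmap:
                # Keep every block-th pixel starting at (dy, dx).
                out.append([row[dx::block] for row in channel[dy::block]])
    return out


# One 4 x 4 channel becomes four 2 x 2 channels; no pixel value is discarded.
feature = [[[1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12],
            [13, 14, 15, 16]]]
rearranged = space_to_depth(feature)
print(len(rearranged), rearranged[0])
```

Because every input pixel is kept, the spatial resolution is halved without the information loss of strided convolution or pooling, which is consistent with the stated goal of preserving small-target detail.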
[0032] Further, in step S4, the polymetallic nodule target detection results and semantic segmentation results are combined to calculate the polymetallic nodule index parameters, specifically including:
[0033] (1) Calculate the number of polymetallic nodules and the distribution of polymetallic nodules per unit area: Based on the nodule targets identified in the target detection results, calculate the number of polymetallic nodules in the image. The number of polymetallic nodules in the image divided by the image area equals the distribution of polymetallic nodules per unit area.
[0034] (2) Calculate the size of the polymetallic nodule: Based on the coordinates of the upper left and lower right corners of the recognition box in the target detection result, calculate the length of the diagonal of the recognition box, and then multiply it by the scaling factor to obtain the length of the major axis of the nodule target, i.e. the size of the nodule.
[0035] (3) Calculate the proportion of large, medium and small polymetallic nodules: Based on the definitions of large, medium and small nodules and the nodule size that has been calculated, calculate the proportion of large, medium and small polymetallic nodules.
[0036] (4) Calculate the distribution area of polymetallic nodules: The number of pixels covered by the nodules multiplied by the area of a single pixel equals the distribution area of polymetallic nodules.
[0037] (5) Calculate the polymetallic nodule coverage: The polymetallic nodule distribution area divided by the image area equals the polymetallic nodule coverage.
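The five calculations in (1) to (5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention. The function name `nodule_indices`, the scaling factor `cm_per_px`, and the size-class thresholds `small_max_cm` and `large_min_cm` are hypothetical, since the text does not give the definitions of large, medium and small nodules. Detection boxes are assumed to be `(x1, y1, x2, y2)` pixel corners, and the segmentation result is assumed to be a binary grid with 1 for nodule pixels.

```python
from math import hypot


def nodule_indices(boxes, mask, cm_per_px, small_max_cm, large_min_cm):
    """Compute the index parameters of steps (1)-(5) for one image."""
    height, width = len(mask), len(mask[0])
    pixel_area = cm_per_px ** 2                  # area of a single pixel
    image_area = width * height * pixel_area

    # (2) size = diagonal of the recognition box times the scaling factor
    sizes = [hypot(x2 - x1, y2 - y1) * cm_per_px for x1, y1, x2, y2 in boxes]
    count = len(sizes)                           # (1) number of nodules

    # (3) proportions; the thresholds are caller-supplied placeholders
    small = sum(s < small_max_cm for s in sizes)
    large = sum(s >= large_min_cm for s in sizes)
    medium = count - small - large
    share = (lambda n: n / count) if count else (lambda n: 0.0)

    nodule_area = sum(map(sum, mask)) * pixel_area   # (4) distribution area
    return {
        "count": count,
        "density_per_cm2": count / image_area,       # (1) nodules per unit area
        "sizes_cm": sizes,
        "proportions": {"small": share(small), "medium": share(medium),
                        "large": share(large)},
        "area_cm2": nodule_area,
        "coverage": nodule_area / image_area,        # (5) coverage rate
    }


# Toy input: two boxes and a 10 x 10 mask with four nodule pixels.
toy_mask = [[0] * 10 for _ in range(10)]
for y, x in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    toy_mask[y][x] = 1
result = nodule_indices([(0, 0, 30, 40), (0, 0, 6, 8)], toy_mask,
                        cm_per_px=0.5, small_max_cm=10, large_min_cm=20)
print(result["count"], result["sizes_cm"], result["coverage"])
```

Counts and sizes come from the target detection boxes, while area and coverage come from the segmentation mask, which is why the method needs both outputs.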
[0038] Compared with the prior art, the advantages and positive effects of the present invention are as follows:
[0039] (1) By automatically generating target detection labels and semantic segmentation labels for polymetallic nodules, this scheme can generate a large number of reliable nodule label samples without manual intervention, providing data support for the training of subsequent deep learning models. Moreover, this automatic label generation method can be extended to the automatic generation of single-target label samples in other scenarios or fields, solving the problem of lack of label samples when applying deep learning methods.
[0040] (2) Constructing the YOLO-Nodules model effectively solves the problem that existing deep learning methods or models are prone to poor target recognition and low accuracy when applied to the recognition of small targets such as polymetallic nodules. It enables accurate detection and segmentation of polymetallic nodule targets from images. In addition, it can achieve polymetallic nodule target detection and semantic segmentation tasks based on the same model, realize accurate detection and segmentation of nodule targets, and provide support for the accurate calculation of polymetallic nodule index parameters.
[0041] (3) By combining the target detection results and semantic segmentation results generated by the intelligent recognition method, the automatic calculation of multiple index parameters of polymetallic nodules can be realized, including the number of nodules, the size of nodules, the number of nodules distributed per unit area, the distribution area of nodules, and the coverage rate of nodules. This enables the preliminary estimation of the polymetallic nodule production capacity in a certain area, providing data information support for further seabed mineral resource estimation and development, and serving the marine resource exploration strategy.
Attached Figure Description
[0042] Figure 1 is a flowchart of the intelligent identification method for polymetallic nodules on the seabed according to an embodiment of the present invention;
[0043] Figure 2 is a schematic diagram illustrating the principle of automatic generation of polymetallic nodule label data in an embodiment of the present invention;
[0044] Figure 3 is a schematic diagram illustrating the principle of intelligent detection and segmentation of polymetallic nodules and the calculation of their index parameters according to an embodiment of the present invention;
[0045] Figure 4 is a schematic diagram of the YOLO-Nodules model structure according to an embodiment of the present invention;
[0046] Figure 5 is a schematic diagram of the structure of the CBS, SDC and SCAM modules of the YOLO-Nodules model according to an embodiment of the present invention, wherein (a) is the CBS module, (b) is the SDC module and (c) is the SCAM module;
[0047] Figure 6 is a schematic diagram of the CSPSA, CSP2C and SPPF modules of the YOLO-Nodules model according to an embodiment of the present invention, wherein (a) is the CSPSA module, (b) is the CSP2C module and (c) is the SPPF module;
[0048] Figure 7 is a schematic diagram of some polymetallic nodule target detection labels and semantic segmentation labels automatically generated in an embodiment of the present invention;
[0049] Figure 8 is a schematic diagram of the semantic segmentation results of some polymetallic nodules in an embodiment of the present invention;
[0050] Figure 9 is a schematic diagram of the detection results of some polymetallic nodules in an embodiment of the present invention.
Detailed Implementation
[0051] To better understand the above-described objects, features, and advantages of the present invention, the present invention will be further described below in conjunction with the accompanying drawings and embodiments. Many specific details are set forth in the following description to provide a thorough understanding of the present invention; however, the present invention may be practiced in other ways than those described herein, and therefore, the present invention is not limited to the specific embodiments disclosed below.
[0052] This invention applies deep learning model algorithms to deep-sea mineral identification and detection, proposing a method for intelligent identification of polymetallic nodules on the seabed based on the YOLO-Nodules model. By automatically generating label data and combining it with a deep learning model, intelligent identification of polymetallic nodules is achieved. The YOLO-Nodules model enables accurate detection and segmentation of polymetallic nodule targets. Combined with photographic images acquired from the seabed, this method can generate corresponding nodule target label data for different regions, train a targeted polymetallic nodule identification model, and be applied widely across different ranges and regions.
[0053] Specifically, a method for intelligent identification of polymetallic nodules on the seabed based on the YOLO-Nodules model, as shown in Figure 1, includes the following steps:
[0054] Step S1: Create a polymetallic nodule image dataset: Combine polymetallic nodule images obtained from seabed photography, randomly select some representative images and perform image segmentation to generate a polymetallic nodule image dataset, which will be used for subsequent nodule label data generation and deep learning model training.
[0055] Step S2: Automatically generate polymetallic nodule label data: Input the polymetallic nodule image dataset into the automatic generation method of polymetallic nodule label data to generate semantic segmentation label data and target detection label data; the automatic generation method of polymetallic nodule label data includes image enhancement, Segment Anything Model (SAM) processing, and image post-processing.
[0056] Step S3: Intelligent detection and segmentation of polymetallic nodules:
[0057] A YOLO-Nodules model suitable for polymetallic nodules is constructed. The YOLO-Nodules model consists of the Conv+BN+SiLU (CBS) module, the Space to Depth Conversion (SDC) module, the Spatial Context Aware Module (SCAM) module, the Spatial Pyramid Pooling Fast (SPPF) module, the Cross Stage Partial convolution module with Shuffle Attention (CSPSA) module, and the Cross Stage Partial convolution module with two CBS blocks (CSP2C) module.
[0058] Subsequently, the polymetallic nodule image data and corresponding semantic segmentation label data are input into the YOLO-Nodules model to train a deep learning model suitable for the semantic segmentation task of polymetallic nodule targets, resulting in a trained YOLO-Nodules model for the semantic segmentation task. Then, the polymetallic nodule image to be identified is input into the trained YOLO-Nodules model to obtain the semantic segmentation result image of the polymetallic nodule target. The polymetallic nodule image data and corresponding target detection label data are then input into the YOLO-Nodules model to train a deep learning model suitable for the target detection task of polymetallic nodule targets, resulting in a trained YOLO-Nodules model for the target detection task. Then, the polymetallic nodule image to be identified is input into the trained YOLO-Nodules model to obtain the target detection result image of the polymetallic nodule.
[0059] Step S4: Calculation of polymetallic nodule index parameters: Based on the polymetallic nodule target detection results and semantic segmentation results, calculate the polymetallic nodule index parameters, including the number of polymetallic nodules, the size of polymetallic nodules, the number of polymetallic nodules distributed per unit area, the proportion of large, medium and small polymetallic nodules, the distribution area of polymetallic nodules, and the coverage rate of polymetallic nodules.
[0060] To better understand this invention, the following detailed descriptions of each of the above steps are provided:
[0061] In step S1 above, firstly, based on the polymetallic nodule images obtained from seabed photography, a portion of representative images are randomly selected and image segmentation is performed to generate a polymetallic nodule image dataset, which is used for subsequent nodule label data generation and deep learning model training. In this embodiment, considering the shape of polymetallic nodules, distribution area, and image shooting environment conditions, 13 representative images (4000*3000) are randomly selected from the polymetallic nodule image dataset obtained from photography; to facilitate subsequent model training, the selected original images are cropped to 480*480 (width 480 pixels, height 480 pixels), and the overlap rate between adjacent images is R%, with R generally ranging from 0 to 99, preferably 0 in this embodiment; finally, a polymetallic nodule image dataset containing 624 images is obtained.
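The cropping arithmetic in this embodiment can be checked with a short Python sketch. It assumes that only full tiles are kept and that partial tiles at the right and bottom edges are discarded, which is consistent with the stated total of 624 images but is not spelled out in the text. The function names `tile_origins` and `tile_boxes` are hypothetical. Boxes are returned as `(left, upper, right, lower)`, the form accepted by PIL's `Image.crop`.

```python
def tile_origins(length, tile, overlap_pct=0):
    """Start offsets of full tiles along one axis for a given overlap rate R%."""
    if not 0 <= overlap_pct < 100:
        raise ValueError("overlap_pct must be in [0, 100)")
    stride = max(1, round(tile * (1 - overlap_pct / 100)))
    return list(range(0, length - tile + 1, stride))


def tile_boxes(width, height, tile_w, tile_h, overlap_pct=0):
    """Crop boxes (left, upper, right, lower) covering the image with full tiles."""
    xs = tile_origins(width, tile_w, overlap_pct)
    ys = tile_origins(height, tile_h, overlap_pct)
    return [(x, y, x + tile_w, y + tile_h) for y in ys for x in xs]


# Values from the embodiment: 4000*3000 originals, 480*480 tiles, R = 0.
boxes = tile_boxes(4000, 3000, 480, 480, overlap_pct=0)
print(len(boxes), 13 * len(boxes))  # 48 tiles per image, 624 tiles for 13 images
```

With R = 0 each original yields 8 × 6 = 48 tiles, so 13 originals give the 624 images stated above.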
[0062] In step S2 above, the polymetallic nodule image dataset is input into the automatic polymetallic nodule label data generation method to generate semantic segmentation label data and target detection label data. The automatic polymetallic nodule label data generation method includes image enhancement, Segment Anything Model (SAM) processing, and image post-processing, which are executed sequentially to finally generate label data. In this embodiment, the polymetallic nodule image dataset generated in step S1 is input into the automatic polymetallic nodule label data generation method, whose flowchart is shown in Figure 2. The specific process for automatically generating label data is as follows:
[0063] Step S21: Image enhancement consists of image brightness enhancement, image sharpness enhancement, and image contrast enhancement. These are performed sequentially to achieve image enhancement. The detailed process is as follows:
[0064] (1) Image Brightness Enhancement: In this embodiment, the PIL library of Python is used to achieve this purpose. First, a polymetallic nodule image is input, the brightness value of the current image is calculated, and a brightness enhancement scaling factor is chosen according to the brightness value to obtain the nodule image with enhanced brightness. The scaling factor decreases as the brightness value increases, that is, the larger the brightness value, the smaller the scaling factor. The specific relationship is as follows: when the brightness value ≥ 55, the scaling factor is 1.1; when 50 ≤ brightness value < 55, the scaling factor is 1.2; when 45 ≤ brightness value < 50, the scaling factor is 1.4; when 40 ≤ brightness value < 45, the scaling factor is 1.5; when 35 ≤ brightness value < 40, the scaling factor is 1.8; when 30 ≤ brightness value < 35, the scaling factor is 2.0; when 25 ≤ brightness value < 30, the scaling factor is 2.2; when 20 ≤ brightness value < 25, the scaling factor is 2.6; when 17 ≤ brightness value < 20, the scaling factor is 3.0; when 15 ≤ brightness value < 17, the scaling factor is 4.0; when 10 ≤ brightness value < 15, the scaling factor is 4.8; when 9 ≤ brightness value < 10, the scaling factor is 7.2; when 8 ≤ brightness value < 9, the scaling factor is 8.2; when 5 ≤ brightness value < 8, the scaling factor is 11; when brightness value < 5, the scaling factor is 12.
[0065] (2) Image sharpness enhancement: In this embodiment, the PIL library of Python is used to achieve this purpose; input the nodule image with enhanced brightness, set the image sharpening ratio factor to 2, and obtain the nodule image with image sharpening.
[0066] (3) Image contrast enhancement: In this embodiment, the PIL library of Python is used to achieve this purpose; input the image of the nodule that has been sharpened, set the image contrast enhancement ratio factor to 2, and obtain the nodule image with enhanced contrast.
[0067] Step S22: Input the enhanced polymetallic nodule image into the SAM model loaded with its publicly released pre-trained weights, tune the model hyperparameters, and, without adding any prompts, perform preliminary pre-identification of nodule targets globally in the image to generate the preliminary distribution range of polymetallic nodules as a grayscale image containing nodule targets. In this embodiment, the enhanced polymetallic nodule image generated in step S21 is input into the SAM model. The SAM model type used is "vit_h", and the hyperparameters of the model are set as follows: points_per_side = 32, pred_iou_thresh = 0.90, stability_score_thresh = 0.95, crop_n_layers = 1, crop_n_points_downscale_factor = 2, min_mask_region_area = 100. No prompts, such as points, boxes, or text, are added. Preliminary pre-identification of nodule targets is performed globally in the image to generate the distribution range of the preliminary detection of polymetallic nodules. The output result is a grayscale image containing nodule targets.
[0068] Step S23: Input the grayscale image containing the nodule target into the "image post-processing" flow to generate polymetallic nodule label data. In this embodiment, the grayscale image containing the nodule target generated in step S22 is input into the image post-processing flow. The automatically generated polymetallic nodule target detection labels and semantic segmentation labels provided in this embodiment are shown in Figure 7. The detailed process is as follows:
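As a rough sketch, the hyperparameters listed above map directly onto the automatic mask generator of the publicly available `segment_anything` package. The checkpoint path is a placeholder supplied by the caller. The text does not state how the individual masks are merged into one grayscale image, so painting every mask pixel with the value 255 is an assumption of this example, as is the function name `sam_mask_image`.

```python
# Hyperparameters exactly as listed in the embodiment.
SAM_KWARGS = dict(
    points_per_side=32,
    pred_iou_thresh=0.90,
    stability_score_thresh=0.95,
    crop_n_layers=1,
    crop_n_points_downscale_factor=2,
    min_mask_region_area=100,
)


def sam_mask_image(rgb_image, checkpoint_path):
    """Run prompt-free SAM on an H x W x 3 uint8 array and merge the masks.

    checkpoint_path is a caller-supplied path to the released "vit_h" weights.
    """
    import numpy as np
    from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

    sam = sam_model_registry["vit_h"](checkpoint=checkpoint_path)
    generator = SamAutomaticMaskGenerator(sam, **SAM_KWARGS)
    masks = generator.generate(rgb_image)  # list of dicts, one per mask

    # Assumption: union of all masks, written as gray level 255.
    gray = np.zeros(rgb_image.shape[:2], dtype=np.uint8)
    for mask in masks:
        gray[mask["segmentation"]] = 255
    return gray
```

Because no prompt is supplied, the generator samples a 32 × 32 grid of points over the image and keeps masks that pass the two quality thresholds, which matches the "global pre-identification" described above.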
[0069] (1) If semantic segmentation task label data is generated, the image post-processing steps are as follows:
[0070] The grayscale image containing the nodule target generated in step S22 is converted into a binary image by thresholding: pixels with a value greater than 5 are set to 1, and pixels with a value less than or equal to 5 are set to 0. Pixels with a value of 1 represent polymetallic nodule pixels, and pixels with a value of 0 represent seabed sediment pixels.
[0071] In the generated binary image, the contour range of all nodule targets is found, contour coordinates are generated, and the contour coordinates are normalized according to the height and width of the image. Then, the contours of all polymetallic nodule targets are traversed, and all contour coordinates of the nodule targets are stored in a TXT format file; that is, the polymetallic nodule labels are converted from binary image format labels to TXT format labels. The internal format of the TXT file is "object X1 Y1 X2 Y2...", where object represents the nodule target category, X1 represents the X coordinate of the first contour point of the first nodule target, Y1 represents the Y coordinate of the first contour point of the first nodule target, X2 represents the X coordinate of the second contour point of the first nodule target, Y2 represents the Y coordinate of the second contour point of the first nodule target, and so on.
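The binarization and label-writing steps for the semantic segmentation task can be sketched as follows. Contour extraction itself is left to an image library and is not shown. Contours are assumed to arrive as lists of `(x, y)` pixel coordinates, and one text line is written per nodule. The category value `0` and the function names `binarize` and `polygon_label` are assumptions made for illustration.

```python
def binarize(gray, threshold=5):
    """Pixels above the threshold become 1 (nodule), all others 0 (sediment)."""
    return [[1 if value > threshold else 0 for value in row] for row in gray]


def polygon_label(contour, img_w, img_h, class_id=0):
    """Format one contour as 'object X1 Y1 X2 Y2 ...' with normalized coordinates."""
    fields = [str(class_id)]
    for x, y in contour:
        fields.append(f"{x / img_w:.6f}")   # X normalized by image width
        fields.append(f"{y / img_h:.6f}")   # Y normalized by image height
    return " ".join(fields)


print(binarize([[0, 5, 6, 255]]))
print(polygon_label([(0, 0), (240, 0), (240, 480)], img_w=480, img_h=480))
```

Normalizing by image width and height makes the labels independent of the tile resolution, so the same TXT files remain valid if the images are resized for training.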
[0072] (2) If object detection task label data is generated, the image post-processing steps are as follows:
[0073] The grayscale image containing the nodule target generated in step S22 is converted into a binary image. That is, by setting a threshold, the value of pixels with a value greater than 5 is assigned to 1, and the value of pixels with a value less than or equal to 5 is assigned to 0. Pixels with a value of 1 represent polymetallic nodule pixels, and pixels with a value of 0 represent seabed sediment pixels.
[0074] A series of morphological processing methods are applied to the binary image, including image erosion, distance transformation, image normalization, image thresholding, and opening. These are performed sequentially to eliminate speckle noise in the image and resolve the issue of some nodule targets being too close together or sticking together, generating a morphologically processed binary image. In this embodiment, the PIL library of Python is used to achieve this purpose; the image erosion operation uses a 3×3 convolution kernel with a pixel value of 1 for image erosion processing, with 1 iteration; the distance transformation uses Manhattan distance to represent the distance between non-zero pixels and zero pixels, with a neighborhood size of 3×3; image normalization uses the "min-max normalization" method to normalize pixel values to between 0 and 1; image thresholding uses the "binarization thresholding" method to binarize the image, with 0.25 as the threshold; the opening operation uses a 5×5 convolution kernel with a pixel value of 1 for image opening operation, with 1 iteration.
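To illustrate why the distance transformation, min-max normalization, and 0.25 threshold separate nodules that touch, the following sketch implements only those three middle steps in plain Python. It is not the embodiment's implementation: the erosion and opening steps are omitted, the function names are hypothetical, and treating the threshold as a strict "greater than" comparison is an assumption.

```python
def manhattan_distance_transform(binary):
    """Two-pass city-block distance from each non-zero pixel to the nearest zero pixel."""
    rows, cols = len(binary), len(binary[0])
    far = rows + cols  # larger than any Manhattan distance inside the image
    dist = [[0 if binary[r][c] == 0 else far for c in range(cols)]
            for r in range(rows)]
    # Forward pass: propagate distances from the upper and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    # Backward pass: propagate distances from the lower and right neighbours.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


def min_max_normalize(values):
    """Rescale values to the range 0..1; a constant image maps to all zeros."""
    flat = [v for row in values for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in values]
    return [[(v - lo) / (hi - lo) for v in row] for row in values]


def threshold(values, level=0.25):
    """Binarize: 1 where the value is strictly above the level, else 0."""
    return [[1 if v > level else 0 for v in row] for row in values]


def split_touching_nodules(binary, level=0.25):
    """Distance transform, normalize, then threshold (erosion and opening omitted)."""
    return threshold(min_max_normalize(manhattan_distance_transform(binary)), level)
```

Pixels near the edge of a blob, and thin bridges between two blobs, have small distance values and fall below the threshold, while blob interiors survive. Two 7×7 blobs joined by a single bridge pixel are therefore returned as two separate regions.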
[0075] In the morphologically processed binary image, the contour range of all nodule targets is found. Based on the contour range, the coordinates of the upper left and lower right corners of the nodule targets are calculated, and then the coordinates, height, and width of the center point of the nodule targets are calculated. Then, based on the height and width of the image, the coordinates, height, and width of the center point of the nodule targets are normalized. Finally, the contours of all polymetallic nodule targets in the image are traversed, and the coordinates, height, and width of the center point of the nodule targets are calculated and stored in a TXT format file. This converts the polymetallic nodule labels from binary image format labels to TXT format labels. The internal format of the TXT file is "object X1 Y1 W1 H1…", where object represents the nodule target category, X1 represents the X coordinate of the center point of the first nodule target, Y1 represents the Y coordinate of the center point of the first nodule target, W1 represents the width of the first nodule target, H1 represents the height of the first nodule target, and so on.
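The conversion from a contour to a normalized "object X1 Y1 W1 H1" entry described in paragraph [0075] can be sketched as below. The function names are hypothetical, the class identifier 0 and the one-line-per-target layout are assumptions, and the contour is assumed to be a list of (x, y) pixel coordinates.

```python
def bounding_box(contour):
    """Return the upper-left and lower-right corners (x1, y1, x2, y2) of a contour."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    return min(xs), min(ys), max(xs), max(ys)


def detection_label_line(contour, image_width, image_height, class_id=0):
    """Format one nodule as 'object X Y W H' with values normalized by image size.

    class_id=0 is an assumed identifier for the single nodule category.
    """
    x1, y1, x2, y2 = bounding_box(contour)
    box_width, box_height = x2 - x1, y2 - y1
    x_center = x1 + box_width / 2
    y_center = y1 + box_height / 2
    values = (
        x_center / image_width,
        y_center / image_height,
        box_width / image_width,
        box_height / image_height,
    )
    return " ".join([str(class_id)] + [f"{v:.6f}" for v in values])
```

A rectangular contour spanning pixels (10, 20) to (50, 60) in a 100×200 image has its center at (30, 40) and a 40×40 extent, which normalizes to 0.3, 0.2, 0.4, and 0.2.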
[0076] In step S3 above, a YOLO-Nodules deep learning model suitable for polymetallic nodules is constructed, which consists of CBS, SDC, SCAM, SPPF, CSPSA, and CSP2C modules. In this embodiment, a schematic diagram of the structure of the YOLO-Nodules model constructed by this invention is shown in Figure 4. The structural diagrams of the CBS, SDC, and SCAM modules are shown in Figure 5 (a), (b), and (c), respectively, and the structural diagrams of the CSPSA, CSP2C, and SPPF modules are shown in Figure 6 (a), (b), and (c), respectively.
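Claim 1 describes the SDC module as converting the spatial dimension of the feature map into the depth dimension. The rearrangement itself can be sketched in plain Python as below. This shows only a generic space-to-depth operation under assumptions: the block size of 2, the channel ordering, and the function name are not taken from the text, and any convolution that the SDC module applies afterwards is not shown.

```python
def space_to_depth(feature_map, block=2):
    """Rearrange a [channels][height][width] map into
    [channels * block * block][height // block][width // block].

    Each output channel holds one of the block*block sub-sampled grids of an
    input channel, so no pixel values are discarded.
    """
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for dy in range(block):          # row offset inside each block
        for dx in range(block):      # column offset inside each block
            for c in range(channels):
                out.append([row[dx::block] for row in feature_map[c][dy::block]])
    return out
```

With a block size of 2, a single-channel 4×4 map becomes a four-channel 2×2 map: the spatial resolution is halved while every input value is kept in the channel dimension, which is relevant for small targets that would otherwise be lost by strided downsampling.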
[0077] In step S3, the polymetallic nodule image data and the corresponding semantic segmentation label data are input into the YOLO-Nodules model to train a deep learning model suitable for the semantic segmentation task of polymetallic nodule targets. This yields a trained YOLO-Nodules model for the semantic segmentation task. Then, the polymetallic nodule image to be identified is input into the trained YOLO-Nodules model to obtain the semantic segmentation result image of the polymetallic nodule target. In this embodiment, referring to the flowchart for intelligent detection and segmentation of polymetallic nodule targets shown in Figure 3 and Figure 4, the polymetallic nodule image dataset and the corresponding semantic segmentation label data generated in step S2 are input into the YOLO-Nodules model. The hyperparameters of the model are set as batch=4, epochs=100, imgsz=480, optimizer='SGD', lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, and warmup_epochs=3.0. After training, a YOLO-Nodules model for the semantic segmentation of polymetallic nodule targets is obtained. Then, the polymetallic nodule image to be identified is input into the trained YOLO-Nodules model, with parameters set as imgsz=480, conf=0.35, iou=0.7, vid_stride=1, line_width=2, and max_det=1000000, to obtain the semantic segmentation result image of polymetallic nodules. Some of the semantic segmentation result images of polymetallic nodules provided in this embodiment are shown in Figure 8.
[0078] In step S3, the polymetallic nodule image data and the corresponding target detection label data are input into the YOLO-Nodules model to train a deep learning model suitable for the target detection task of polymetallic nodules. This yields a trained YOLO-Nodules model for the target detection task. Then, the polymetallic nodule image to be identified is input into the trained YOLO-Nodules model to obtain the polymetallic nodule target detection result image. In this embodiment, referring to the flowchart for intelligent detection and segmentation of polymetallic nodule targets shown in Figure 3 and Figure 4, the polymetallic nodule image dataset and the corresponding target detection label data generated in step S2 are input into the YOLO-Nodules model. The hyperparameters of the model are set as batch=4, epochs=100, imgsz=480, optimizer='SGD', lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, and warmup_epochs=3.0. After training, a YOLO-Nodules model for the target detection of polymetallic nodules is obtained. Then, the polymetallic nodule image to be identified is input into the trained YOLO-Nodules model, with parameters set as imgsz=480, conf=0.25, iou=0.7, vid_stride=1, line_width=2, and max_det=1000000, to obtain the polymetallic nodule target detection result image. Some of the polymetallic nodule target detection result images provided in this embodiment are shown in Figure 9.
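The training and inference settings listed in paragraphs [0077] and [0078] are collected below so that the one difference between the two tasks (the confidence threshold) is explicit. The dictionaries contain only values stated in the text. The helper function is a sketch under assumptions: the text does not name a training framework, the parameter names merely match those of the Ultralytics Python package, and the model and dataset YAML paths are hypothetical placeholders. A stock installation would not know the custom YOLO-Nodules modules.

```python
# Training hyperparameters as stated in the embodiment (identical for both tasks).
TRAIN_SETTINGS = {
    "batch": 4,
    "epochs": 100,
    "imgsz": 480,
    "optimizer": "SGD",
    "lr0": 0.01,
    "lrf": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "warmup_epochs": 3.0,
}

# Inference parameters shared by both tasks.
COMMON_PREDICT = {
    "imgsz": 480,
    "iou": 0.7,
    "vid_stride": 1,
    "line_width": 2,
    "max_det": 1000000,
}

# Only the confidence threshold differs between the two tasks.
PREDICT_SETTINGS = {
    "segment": {**COMMON_PREDICT, "conf": 0.35},
    "detect": {**COMMON_PREDICT, "conf": 0.25},
}


def train_and_predict(task, model_yaml, data_yaml, image_path):
    """Hypothetical helper. Assumes the Ultralytics package and a model YAML
    that defines the YOLO-Nodules architecture; neither is given in the text."""
    from ultralytics import YOLO  # assumption: framework not named in the source

    model = YOLO(model_yaml, task=task)
    model.train(data=data_yaml, **TRAIN_SETTINGS)
    return model.predict(source=image_path, **PREDICT_SETTINGS[task])
```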
[0079] In step S4, combining the polymetallic nodule target detection results and semantic segmentation results, polymetallic nodule index parameters are calculated, including the number of polymetallic nodules, the size of polymetallic nodules, the number of polymetallic nodules distributed per unit area, the proportion of large, medium, and small polymetallic nodules, the distribution area of polymetallic nodules, and the coverage rate of polymetallic nodules. In this embodiment, the polymetallic nodule target detection result image generated in step S3 is used to calculate the number of nodules, the size of nodules, the number of nodules distributed per unit area, and the proportion of large, medium, and small nodules in the image; the polymetallic nodule semantic segmentation result image generated in step S3 is used to calculate the distribution area of nodules and the coverage rate of nodules in the image.
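The index parameters of step S4 can be computed as in the following sketch, which follows the definitions given in claim 5 (diagonal length times a scaling factor for size, pixel count times pixel area for distribution area). Several points are assumptions rather than content of the text: the function and key names are hypothetical, pixels are taken to be square so that the area of one pixel is the scaling factor squared, and the size limits `small_max` and `large_min` are placeholders because the text does not state how large, medium, and small nodules are defined.

```python
import math


def nodule_sizes(boxes, scale):
    """Size of each nodule: diagonal of its detection box times the scaling factor."""
    return [math.hypot(x2 - x1, y2 - y1) * scale for x1, y1, x2, y2 in boxes]


def size_proportions(sizes, small_max, large_min):
    """Share of small, medium, and large nodules (limits are hypothetical inputs)."""
    counts = {"small": 0, "medium": 0, "large": 0}
    for size in sizes:
        if size < small_max:
            counts["small"] += 1
        elif size >= large_min:
            counts["large"] += 1
        else:
            counts["medium"] += 1
    total = len(sizes)
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


def nodule_indices(boxes, mask, scale, small_max, large_min):
    """Combine detection boxes and a binary segmentation mask into index parameters.

    boxes: (x1, y1, x2, y2) pixel corners from the detection result.
    mask:  rows of 0/1 values from the segmentation result.
    scale: real-world length of one pixel side (assumes square pixels).
    """
    image_height, image_width = len(mask), len(mask[0])
    pixel_area = scale * scale
    image_area = image_width * image_height * pixel_area
    sizes = nodule_sizes(boxes, scale)
    nodule_area = sum(sum(row) for row in mask) * pixel_area
    return {
        "count": len(boxes),
        "density": len(boxes) / image_area,   # nodules per unit area
        "sizes": sizes,
        "proportions": size_proportions(sizes, small_max, large_min),
        "area": nodule_area,                  # distribution area
        "coverage": nodule_area / image_area,
    }
```

The count, size, density, and proportions come from the detection boxes, while the distribution area and coverage come from the segmentation mask, matching the division of labor described in paragraph [0079].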
[0080] By combining seabed photographic images, this method can generate nodule target label data for different regions and then, together with the YOLO-Nodules model, train a region-specific polymetallic nodule recognition model. The approach can therefore be applied widely across different areas, and it shows excellent recognition performance for small targets such as polymetallic nodules, effectively reducing the probability of misclassification or omission that affects existing deep learning methods for nodule target recognition. Because the label data is generated automatically, seabed photographic images from other regions can be processed in the same way to train a recognition model for each new region. Moreover, existing technologies have paid little attention to deep learning-based deep-sea mineral identification. The intelligent identification method proposed in this invention requires no additional manual processing or intervention and can also be extended to other image-based deep-sea mineral identification applications.
[0081] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any other way. Any person skilled in the art may make changes or modifications to the above-disclosed technical content to create equivalent embodiments for application in other fields. However, any simple modification, equivalent change, or adaptation made to the above embodiments based on the technical essence of the present invention, without departing from the scope of the present invention, shall still fall within the protection scope of the present invention.
Claims
1. A method for intelligent identification of polymetallic nodules on the seabed based on YOLO-Nodules, characterized in that it comprises the following steps: Step S1: Create a polymetallic nodule image dataset: based on polymetallic nodule images obtained from the seabed, randomly select some representative images and perform image segmentation to generate a polymetallic nodule image dataset; Step S2: Automatically generate polymetallic nodule label data: perform image enhancement, SAM model processing and image post-processing on the polymetallic nodule image dataset to generate semantic segmentation label data and target detection label data; Step S3: Intelligent detection and segmentation of polymetallic nodules: construct a YOLO-Nodules model suitable for polymetallic nodules. The YOLO-Nodules model consists of a CBS module, an SDC module, an SCAM module, an SPPF module, a CSPSA module and a CSP2C module. The polymetallic nodule image data and the corresponding semantic segmentation label data are input into the YOLO-Nodules model for training. Then, the polymetallic nodule image to be identified is input into the trained YOLO-Nodules model to obtain the semantic segmentation result image of the polymetallic nodule target. The polymetallic nodule image data and the corresponding target detection label data are input into the YOLO-Nodules model for training. Then, the polymetallic nodule image to be identified is input into the trained YOLO-Nodules model to obtain the polymetallic nodule target detection result image. In step S3, constructing the YOLO-Nodules model suitable for polymetallic nodules specifically includes: (1) Constructing the Backbone part of the YOLO-Nodules model: the Backbone part is used to extract multi-scale features of nodule targets from the image. It uses CBS and CSPSA modules as basic units. The CBS module consists of a Convolution layer, a Batch Normalization layer and a SiLU activation function.
The CSPSA module consists of four CBS modules and a ShuffleAttention module. An SCAM module is embedded at the end of the Backbone to construct the global context relationship within the image. The SCAM module consists of three CBS modules, an Average Pool layer, a Max Pool layer, a Softmax activation function, a Sigmoid activation function and a SiLU activation function. An SPPF module is embedded to stitch together feature maps of different scales using pooling operations of different scales. The SPPF module consists of two CBS modules and three Max Pool layers. (2) Constructing the Neck part of the YOLO-Nodules model: the Neck part is used to achieve multi-scale feature fusion, which fuses feature maps from different stages of the Backbone part. The Neck part consists of a CBS module, an SDC module, a CSP2C module and an SCAM module; among them, the SDC module is used to convert the spatial dimension of the feature map into the depth dimension in order to enhance feature representation; an SCAM module is added to each of the three side outputs of the Neck part to improve the global correlation ability across channels and across spaces; the CSP2C module consists of four CBS modules and adds a residual mechanism; (3) Constructing the Prediction part of the YOLO-Nodules model: the YOLO-Nodules model inherits the decoupled head of the YOLOv8 model so that each task can focus on its own goal during the optimization process; Step S4: Calculation of polymetallic nodule index parameters: based on the polymetallic nodule target detection results and semantic segmentation results obtained in step S3, calculate the polymetallic nodule index parameters; the polymetallic nodule index parameters include the number of polymetallic nodules, the size of polymetallic nodules, the number of polymetallic nodules distributed per unit area, the proportion of large, medium and small polymetallic nodules, the distribution area of polymetallic nodules and the coverage rate of polymetallic nodules.
2. The intelligent identification method for seabed polymetallic nodules based on YOLO-Nodules according to claim 1, characterized in that, in step S1, considering the differences in the shape, distribution area, and image shooting environment of polymetallic nodules, N representative images are randomly selected from the acquired polymetallic nodule images. The selected images are cropped with an overlap rate of R% between adjacent images, resulting in a polymetallic nodule image dataset containing M images.
3. The intelligent identification method for seabed polymetallic nodules based on YOLO-Nodules according to claim 1, characterized in that, in step S2, the image enhancement and SAM model processing are specifically performed in the following ways: Image enhancement: the polymetallic nodule image is sequentially enhanced in terms of brightness, sharpness, and contrast. SAM model processing: optimize the hyperparameters of the SAM model, input the enhanced polymetallic nodule image, perform preliminary identification of nodule targets globally in the image, generate the preliminarily detected distribution range of polymetallic nodules, and generate a grayscale image containing nodule targets.
4. The intelligent identification method for seabed polymetallic nodules based on YOLO-Nodules according to claim 3, characterized in that, in step S2, the image post-processing is specifically implemented in the following manner: (1) Generate semantic segmentation task label data: by setting a threshold, the generated result of the SAM model is converted into a binary image. Pixels with a value of 1 represent polymetallic nodule pixels, and pixels with a value of 0 represent seabed sediment pixels. In the binary image, the contour range of all nodule targets is found, contour coordinates are generated, and the contour coordinates are normalized according to the height and width of the image; then the contours of all polymetallic nodule targets are traversed, and all contour coordinates of the nodule targets are stored in a TXT format file. (2) Generate target detection task label data: by setting a threshold, the generated result of the SAM model is converted into a binary image. Pixels with a value of 1 represent polymetallic nodule pixels, and pixels with a value of 0 represent seabed sediment pixels. A series of morphological processing methods are applied to the binary image, including image erosion, distance transformation, image normalization, image thresholding, and opening, to eliminate speckle noise in the image, solve the problem of some nodule targets being close together or sticking together, and generate a morphologically processed binary image.
In the morphologically processed binary image, the contour range of all nodule targets is found; the coordinates of the upper left and lower right corners of the nodule targets are calculated based on the contour range, and then the coordinates, height, and width of the center point of the nodule targets are calculated; then, the center point coordinates, height, and width of the nodule targets are normalized according to the height and width of the image; finally, the contours of all polymetallic nodule targets in the image are traversed, the center point coordinates, height, and width of the nodule targets are calculated, and stored in a TXT format file.
5. The intelligent identification method for seabed polymetallic nodules based on YOLO-Nodules according to claim 1, characterized in that, in step S4, the polymetallic nodule index parameters are calculated by combining the polymetallic nodule target detection results and semantic segmentation results, specifically including: (1) Calculate the number of polymetallic nodules and the number of polymetallic nodules distributed per unit area: based on the nodule targets identified in the target detection results, calculate the number of polymetallic nodules in the image; the number of polymetallic nodules divided by the image area is the number of polymetallic nodules distributed per unit area; (2) Calculate the size of the polymetallic nodules: based on the coordinates of the upper left and lower right corners of the recognition box in the target detection result, calculate the length of the diagonal of the box, and then multiply it by the scaling factor to obtain the length of the major axis of the nodule target, i.e. the size of the nodule; (3) Calculate the proportion of large, medium and small polymetallic nodules: define large, medium and small nodules, and calculate the proportion of large, medium and small polymetallic nodules based on the nodule size obtained in (2); (4) Calculate the distribution area of polymetallic nodules: the distribution area of polymetallic nodules is the number of pixels covered by the nodules multiplied by the area of a single pixel; (5) Calculate the polymetallic nodule coverage rate: the polymetallic nodule distribution area divided by the image area is the polymetallic nodule coverage rate.