Target detection method, system and device based on underwater image and storage medium

The underwater image target detection model trained by deep learning solves the problem of insufficient target detection accuracy in underwater environments by using feature extraction and fusion techniques, and achieves high-precision and high-efficiency underwater target recognition.

CN116704324B · Active · Publication Date: 2026-06-26 · SOUTH CHINA NORMAL UNIV

Cites: 1 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SOUTH CHINA NORMAL UNIV
Filing Date: 2023-05-26
Publication Date: 2026-06-26

Patent Timeline

26 May 2023: Application
26 Jun 2026: Publication (CN116704324B)

IPC: G06V20/05; G06V10/80; G06V10/774; G06V10/82; G06N3/048; G06N3/08

AI Tagging

Technology Topics: Pattern recognition; Radiology


Similar Technology Patents

Weed detection method and device based on hypergraph enhanced YOLOv11 framework (CN122265829A)
Image filtering method, device, equipment, storage medium and program product (CN122265042A)
Reference frame selection based on camera pose for video encoding (WO2026123146A1)
Video surveillance system with advantageous viewpoint transformation (CN115297295B)
Image processing method, electronic device, and storage medium (CN122265779A)


AI Technical Summary

Technical Problem

The complex light propagation and background interference in the underwater environment result in poor detection accuracy of existing detectors for underwater targets, making it difficult to accurately identify marine life and infrastructure.

Method used

A target detection model based on underwater images is trained using deep learning methods. Feature extraction and fusion are performed through a backbone network, a neck network, and a prediction network. The detection accuracy is improved by utilizing an improved YOLOv7 neural network model and an attention mechanism.

Benefits of technology

It achieves high-precision and high-efficiency detection of target regions in underwater images, improving the accuracy and efficiency of underwater target recognition.

✦ Generated by Eureka AI based on patent content.

Figure CN116704324B_ABST (abstract drawing)


Abstract

The present application relates to the field of image detection, in particular to a target detection method, device and system based on underwater image and storage medium, which adopts a deep learning method to train a target detection model based on underwater image, so as to extract different levels of feature information of the underwater image, perform feature fusion, and realize detection of the target region of the underwater image according to the obtained fusion feature information, thereby improving the accuracy and efficiency of detection.


Description

Technical Field

[0001] This invention relates to the field of image detection, and in particular to a target detection method, apparatus, system, and storage medium based on underwater images.

Background Technology

[0002] Underwater target detection is crucial for a variety of applications, including marine conservation, oceanography, and defense. Accurate detection and classification of these targets are key technologies for monitoring the health of marine ecosystems and identifying potential threats to underwater infrastructure.

[0003] However, because of the many complex factors in the underwater environment, most detectors detect underwater targets with poor accuracy. The main reason is that light is scattered and absorbed by water as it propagates, so images captured by the camera are strongly attenuated and blurred, which makes underwater object detection harder. Complex lighting conditions further limit visibility underwater. Underwater objects, such as marine life and marine debris, vary in size and shape and have complex structures, which makes them difficult to identify and detect. In addition, the underwater environment contains a large amount of background interference, such as algae and rocks, which can be confused with targets and further increases the difficulty of detection.

Summary of the Invention

[0004] Based on this, the purpose of the present invention is to provide a target detection method, device, system, and storage medium based on underwater images. The method employs deep learning to train a target detection model based on underwater images, extracts feature information from different levels of underwater images, performs feature fusion, and realizes the detection of target regions in underwater images based on the obtained fused feature information, thereby improving the accuracy and efficiency of detection.

[0005] In a first aspect, embodiments of this application provide a target detection method based on underwater images, comprising the following steps:

[0006] Obtain a sample underwater image set and a sample label set, wherein the sample underwater image set includes several sample images, and the sample label set includes several label regions of the sample images and the label data corresponding to each label region;

[0007] Obtain a preset detection model comprising a backbone network, a neck network, and a prediction network, and input the sample underwater image set into the backbone network to obtain backbone feature maps of each sample underwater image at several preset scales;

[0008] Input the backbone feature maps of each sample underwater image at the several scales into the neck network for feature fusion to obtain a fused feature map of each sample underwater image;

[0009] Input the fused feature map of each sample underwater image into the prediction network to obtain several prediction regions of each sample underwater image and the label data of those prediction regions;

[0010] Train the detection model based on the label data of the prediction regions and the label data of the label regions of each sample underwater image to obtain a target detection model;

[0011] In response to a detection command, obtain an underwater image to be detected and input it into the target detection model to obtain its detection result.

[0012] In a second aspect, embodiments of this application provide a target detection device based on underwater images, comprising:

[0013] The data acquisition module is used to obtain a sample underwater image set and a sample label set. The sample underwater image set includes several sample images, and the sample label set includes several label regions of several sample images and several label data corresponding to the label regions.

[0014] The backbone feature extraction module is used to obtain a preset detection model, which includes a backbone network, a neck network, and a prediction network. The sample underwater image set is input into the backbone network to obtain backbone feature maps of several scales for each sample underwater image.

[0015] The feature fusion module is used to input the backbone feature maps of each sample underwater image at the several scales into the neck network for feature fusion to obtain the fused feature map of each sample underwater image.

[0016] The prediction module is used to input the fused feature map of each sample underwater image into the prediction network to obtain several prediction regions of each sample underwater image and the label data of those prediction regions.

[0017] The model training module is used to train the detection model based on the label data of the prediction regions and the label data of the label regions of each sample underwater image to obtain the target detection model.

[0018] The detection module is used to respond to a detection command, obtain an underwater image to be detected, input the underwater image to be detected into the target detection model, and obtain the detection result of the underwater image to be detected.

[0019] In a third aspect, embodiments of this application provide a computer device, including a processor, a memory, and a computer program stored in the memory and executable on the processor; when executed by the processor, the computer program implements the steps of the underwater image-based target detection method described in the first aspect.

[0020] In a fourth aspect, embodiments of this application provide a storage medium storing a computer program that, when executed by a processor, implements the steps of the underwater image-based target detection method described in the first aspect.

[0021] In this application embodiment, a target detection method, apparatus, system, and storage medium based on underwater images are provided. A deep learning method is used to train an underwater image-based target detection model to extract feature information at different levels of the underwater image, perform feature fusion, and realize the detection of target regions in the underwater image based on the obtained fused feature information, thereby improving the accuracy and efficiency of detection.

[0022] To better understand and implement this invention, the following detailed description is provided in conjunction with the accompanying drawings.

Description of the Drawings

[0023] Figure 1 is a flowchart of the target detection method based on underwater images provided in the first embodiment of this application;

[0024] Figure 2 is a flowchart of the target detection method based on underwater images provided in the second embodiment of this application;

[0025] Figure 3 is a flowchart of step S2 of the underwater image-based target detection method provided in the first embodiment of this application;

[0026] Figure 4 is a flowchart of the target detection method based on underwater images provided in the third embodiment of this application;

[0027] Figure 5 is a flowchart of step S4 of the underwater image-based target detection method provided in the first embodiment of this application;

[0028] Figure 6 is a flowchart of step S5 of the underwater image-based target detection method provided in the first embodiment of this application;

[0029] Figure 7 is a schematic diagram of the underwater image-based target detection device provided in the fourth embodiment of this application;

[0030] Figure 8 is a schematic structural diagram of a computer device provided in the fifth embodiment of this application.

Detailed Description

[0031] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.

[0032] The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms "a," "an," and "the" used in this application and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

[0033] It should be understood that although the terms first, second, third, etc., may be used in this application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to determining."

[0034] Referring to Figure 1, which is a flowchart of the target detection method based on underwater images provided in the first embodiment of this application, the method includes the following steps:

[0035] S1: Obtain the sample underwater image set and sample label set.

[0036] The underwater image-based target detection method is executed by a detection device. In an optional embodiment, the detection device may be a computer device, a server, or a server cluster composed of multiple computer devices.

[0037] In this embodiment, the detection device can obtain a set of underwater sample images and a set of sample labels input by the user, or it can obtain the set of underwater sample images and a set of sample labels through a preset database. The set of underwater sample images includes several sample images, and the set of sample labels includes several label regions of several sample images and label data corresponding to several label regions.

[0038] Referring to Figure 2, which is a flowchart of the target detection method based on underwater images provided in the second embodiment of this application, the method further includes step S7, performed before step S2, as follows:

[0039] S7: Perform data augmentation on several sample images in the sample underwater image set to obtain the data-augmented sample underwater image set.

[0040] In this embodiment, the detection device performs data augmentation processing on several sample underwater images in the sample underwater image set to obtain a data-augmented sample underwater image set. This data augmentation expands the size of the sample underwater image set used for training, thereby enhancing the generalization ability of the trained model. In an optional embodiment, the data augmentation processing includes one or more combinations of data augmentation transformations such as shearing, rotation, reflection, flipping, scaling, translation, scale transformation, contrast transformation, noise perturbation, and color transformation to increase the size of the dataset.
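A few of the transformations listed in [0040] can be sketched in plain Python. This is a minimal illustration only, assuming grayscale images stored as lists of rows and label regions in the normalized (cx, cy, w, h) format described in [0077]; the function names, probabilities and parameter ranges are hypothetical. It shows the key point that geometric transforms such as flipping must also update the label regions, whereas photometric transforms (contrast, noise) leave them unchanged.

```python
import random


def hflip(image, boxes):
    """Mirror an image left-right and update normalised (cx, cy, w, h) boxes."""
    flipped = [row[::-1] for row in image]
    new_boxes = [(1.0 - cx, cy, w, h) for cx, cy, w, h in boxes]
    return flipped, new_boxes


def adjust_contrast(image, factor):
    """Scale grayscale pixel values about the image mean, clipped to [0, 255]."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[min(255.0, max(0.0, mean + factor * (p - mean))) for p in row] for row in image]


def add_noise(image, sigma, rng):
    """Noise perturbation: add zero-mean Gaussian noise, clipped to [0, 255]."""
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in image]


def augment(image, boxes, rng):
    """Hypothetical random combination of flip, contrast and noise transforms."""
    if rng.random() < 0.5:
        image, boxes = hflip(image, boxes)  # geometric: labels change
    image = adjust_contrast(image, rng.uniform(0.8, 1.2))  # photometric: labels unchanged
    image = add_noise(image, sigma=5.0, rng=rng)
    return image, boxes
```

In a real pipeline these operations would run on arrays with a dedicated augmentation library; the list-based version only makes the label bookkeeping explicit.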

[0041] S2: Obtain a preset detection model, which includes a backbone network, a neck network, and a prediction network. Input the sample underwater image set into the backbone network and obtain backbone feature maps of each sample underwater image at several preset scales.

[0042] The detection model is an improved YOLOv7 neural network model. The YOLOv7 neural network model is the seventh generation of the YOLO series of object detection networks, capable of accurate object detection for small-scale images. The improved YOLOv7 model includes a backbone network, a neck network, and a prediction network (YoloHead).

[0043] In this embodiment, the detection device obtains a preset detection model, inputs the sample underwater image set into the backbone network, and obtains backbone feature maps of each sample underwater image at several preset scales.

[0044] The backbone network of the detection model includes a convolution module and a feature extraction module connected in sequence. The feature extraction module consists of several sub-feature extraction modules connected in sequence. Each sub-feature extraction module has a first branch, containing a first sub-convolution module, and a second branch, containing a second sub-convolution module and a bottleneck convolution module.

[0045] Referring to Figure 3, which is a flowchart of step S2 of the underwater image-based target detection method provided in the first embodiment of this application, step S2 includes steps S21 to S25, as follows:

[0046] S21: Input the sample underwater image into the convolution module to obtain the convolution feature map of the sample underwater image.

[0047] The convolutional module comprises four CBS standard convolutional activation modules, each consisting of a Conv layer, a BN layer, and a SiLU layer connected in sequence. The Conv layer is a convolutional layer, the BN layer is a batch normalization layer, and the SiLU layer is an activation layer.
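The CBS module of [0047] maps directly onto three standard PyTorch layers, and the same structure is what the first feature extraction algorithm $\sigma(\rho(C(X, c)))$ describes. The sketch below is illustrative, not the authors' code: the kernel sizes, strides and channel widths of the four-CBS convolution module are borrowed from the public YOLOv7 stem, which is an assumption since the text does not specify them.

```python
import torch
import torch.nn as nn


class CBS(nn.Module):
    """Conv -> BatchNorm -> SiLU: computes sigma(rho(C(X, c))) with c output channels."""

    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, stride=s, padding=k // 2, bias=False)  # C()
        self.bn = nn.BatchNorm2d(c_out)  # rho(): batch normalisation
        self.act = nn.SiLU()             # sigma(): SiLU activation

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


# Hypothetical four-CBS convolution module (widths/strides assumed from the YOLOv7 stem)
conv_module = nn.Sequential(
    CBS(3, 32, 3, 1),
    CBS(32, 64, 3, 2),
    CBS(64, 64, 3, 1),
    CBS(64, 128, 3, 2),
)
```

With these assumed strides, the module downsamples by a factor of 4, so a 640 × 640 input would yield a 128-channel, 160 × 160 convolutional feature map.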

[0048] In this embodiment, the detection device inputs the sample underwater image set into the convolution module to obtain convolutional feature maps of each sample underwater image. This allows for preliminary convolution processing of each sample underwater image in the sample underwater image set, extracting preliminary feature information.

[0049] S22: Use the convolutional feature map of the sample underwater image as the input feature map of the first sub-feature extraction module in the feature extraction module, and obtain the output feature map of the first branch of that module according to the first feature extraction algorithm in its first branch.

[0050] The first feature extraction algorithm is:

[0051] $F_{cbs}(X) = \sigma(\rho(C(X, c)))$

[0052] In the formula, $F_{cbs}(\cdot)$ represents the output feature map of the first sub-convolution module of the first branch of the sub-feature extraction module, $X$ represents the input feature map of the sub-feature extraction module, $c$ represents the number of output channels of the first branch of the sub-feature extraction module, $\rho(\cdot)$ represents the normalization function, $\sigma(\cdot)$ represents the activation function, and $C(\cdot)$ represents the convolution function.

[0053] In this embodiment, the detection device uses the convolutional feature map of the underwater image of the sample as the input feature map of the first sub-feature extraction module in the feature extraction module, and obtains the output feature map of the first branch of the first sub-feature extraction module according to the first feature extraction algorithm in the first branch of the first sub-feature extraction module.

[0054] S23: According to the second feature extraction algorithm in the second branch of the first sub-feature extraction module, obtain the output feature map of the second branch of the first sub-feature extraction module.

[0055] The second feature extraction algorithm is as follows:

[0056]

[0057] In the formula, $F_{cef}(\cdot)$ represents the output feature map of the bottleneck convolution module of the second branch of the sub-feature extraction module, $f(\cdot)$ is the convolution activation function, $\mathrm{concat}(\cdot)$ is the concatenation function, $c_2$ is the number of output channels of the bottleneck convolution module of the second branch of the sub-feature extraction module, and $g(\cdot)$ is the bottleneck function.

[0058] In this embodiment, the detection device obtains the output feature map of the second branch of the first sub-feature extraction module according to the second feature extraction algorithm in the second branch of the first sub-feature extraction module.

[0059] S24: Concatenate the output feature maps of the first branch and the second branch of the first sub-feature extraction module to obtain the output feature map of the first sub-feature extraction module.

[0060] In this embodiment, the detection device concatenates the output feature maps of the first branch and the second branch of the first sub-feature extraction module to obtain the output feature map of the first sub-feature extraction module, as follows:

[0061] $F = f(\mathrm{concat}(F_{cbs}(\cdot), F_{cef}(\cdot)))$

[0062] In the formula, F is the output feature map of the sub-feature extraction module.
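One way to read steps S22 to S24 is as a CSP/ELAN-style block: branch 1 is a single CBS ($F_{cbs}$), branch 2 is a CBS followed by chained bottlenecks whose intermediate outputs are concatenated ($F_{cef}$), and $f$ fuses the concatenation of both branches. Because the formula for $F_{cef}$ is not reproduced in the text, the bottleneck design, the number of bottlenecks and the channel split below are hypothetical; this is a sketch of the pattern, not the patented module.

```python
import torch
import torch.nn as nn


class CBS(nn.Module):
    """Conv -> BN -> SiLU."""

    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False), nn.BatchNorm2d(c_out), nn.SiLU()
        )

    def forward(self, x):
        return self.block(x)


class Bottleneck(nn.Module):
    """Hypothetical bottleneck g(): 1x1 channel reduction, 3x3 conv back up, residual add."""

    def __init__(self, c):
        super().__init__()
        self.reduce = CBS(c, c // 2, 1)
        self.expand = CBS(c // 2, c, 3)

    def forward(self, x):
        return x + self.expand(self.reduce(x))


class SubFeatureModule(nn.Module):
    """Branch 1 gives F_cbs; branch 2 gives F_cef; output is f(concat(F_cbs, F_cef))."""

    def __init__(self, c_in, c_out, n_bottlenecks=2):
        super().__init__()
        c = c_out // 2
        self.branch1 = CBS(c_in, c, 1)  # first sub-convolution module
        self.branch2 = CBS(c_in, c, 1)  # second sub-convolution module
        self.bottlenecks = nn.ModuleList([Bottleneck(c) for _ in range(n_bottlenecks)])
        self.fuse = CBS(c * (2 + n_bottlenecks), c_out, 1)  # f() on the concatenation

    def forward(self, x):
        f_cbs = self.branch1(x)
        y = self.branch2(x)
        parts = [y]
        for g in self.bottlenecks:
            y = g(y)
            parts.append(y)
        f_cef = torch.cat(parts, dim=1)  # concat() inside the second branch
        return self.fuse(torch.cat([f_cbs, f_cef], dim=1))
```

Chaining several such modules (with downsampling between them, as step S25 implies) and keeping each module's output would give the multi-scale backbone feature maps.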

[0063] S25: Use the output feature map of the first sub-feature extraction module as the input feature map of the next sub-feature extraction module, repeat the above steps to obtain the output feature maps of each sub-feature extraction module, and use them as the backbone feature maps of the several scales.

[0064] In this embodiment, the detection device uses the output feature map of the first sub-feature extraction module as the input feature map of the next sub-feature extraction module and repeats the above steps until the output feature map of the last sub-feature extraction module is obtained; the output feature maps of all sub-feature extraction modules serve as the backbone feature maps at the several scales. Compared with the original YOLOv7 neural network model, introducing a bottleneck convolution module into the backbone network strengthens the feature extraction capability of the branches and improves the accuracy of model training.

[0065] The first sub-convolution module includes several interconnected static convolutional activation layers and dynamic convolutional activation layers. The dynamic convolutional activation layer uses ODConv (omni-dimensional dynamic convolution), which extends dynamic behavior across the dimensions of the convolution kernel, considering the spatial, input-channel, and output-channel dimensions, to obtain more detailed feature information with a larger receptive field and thereby improve the accuracy of model training.
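The idea behind ODConv can be illustrated with a simplified single-kernel version in PyTorch: a small squeeze-and-excitation-style head predicts attention over the input channels, output channels and kernel spatial positions from the input itself, and the resulting per-sample kernel is applied with a grouped convolution. This is a sketch only; the published ODConv additionally mixes several candidate kernels with a kernel-wise attention, and the class name, reduction ratio and initialization here are assumptions. Following it with BatchNorm and SiLU would give a dynamic convolutional activation layer in the sense of [0065].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleODConv2d(nn.Module):
    """Hypothetical single-kernel ODConv-style layer with input/output-channel and spatial attention."""

    def __init__(self, in_ch, out_ch, k=3, stride=1, reduction=4):
        super().__init__()
        self.in_ch, self.out_ch, self.k, self.stride = in_ch, out_ch, k, stride
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)  # shared base kernel
        hidden = max(in_ch // reduction, 4)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(nn.Conv2d(in_ch, hidden, 1), nn.ReLU(inplace=True))
        self.att_in = nn.Conv2d(hidden, in_ch, 1)       # input-channel attention
        self.att_out = nn.Conv2d(hidden, out_ch, 1)     # output-channel attention
        self.att_spatial = nn.Conv2d(hidden, k * k, 1)  # attention over k x k kernel positions

    def forward(self, x):
        b, _, h, w = x.shape
        z = self.fc(self.pool(x))  # global context, shape (b, hidden, 1, 1)
        a_in = torch.sigmoid(self.att_in(z)).view(b, 1, self.in_ch, 1, 1)
        a_out = torch.sigmoid(self.att_out(z)).view(b, self.out_ch, 1, 1, 1)
        a_sp = torch.sigmoid(self.att_spatial(z)).view(b, 1, 1, self.k, self.k)
        # Per-sample kernel of shape (b, out_ch, in_ch, k, k)
        kernel = self.weight.unsqueeze(0) * a_in * a_out * a_sp
        # Fold the batch into groups so each sample is convolved with its own kernel
        y = F.conv2d(
            x.reshape(1, b * self.in_ch, h, w),
            kernel.reshape(b * self.out_ch, self.in_ch, self.k, self.k),
            stride=self.stride,
            padding=self.k // 2,
            groups=b,
        )
        return y.view(b, self.out_ch, y.shape[-2], y.shape[-1])
```

The grouped-convolution trick avoids a Python loop over the batch while still giving every image its own input-dependent kernel.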

[0066] Referring to Figure 4, which is a flowchart of the target detection method based on underwater images provided in the third embodiment of this application, the method further includes step S8, performed before step S3, as follows:

[0067] S8: Using an attention mechanism, transform the backbone feature maps of each sample underwater image at the several scales to obtain enhanced backbone feature maps of each sample underwater image at those scales.

[0068] The attention module is Coordinate Attention (CA). In this embodiment, the detection device inputs the backbone feature maps of each sample underwater image at the several scales into the attention module. The attention mechanism encodes channel relationships and long-range dependencies using precise positional information. Each backbone feature map is average-pooled channel by channel, with pooling kernels encoding each channel along the horizontal and vertical directions, producing one feature map along each direction. These two feature maps then pass through the corresponding convolutions and non-linear activations, and finally the horizontal and vertical feature maps are expanded to obtain the final output feature map. The result is an enhanced backbone feature map at each scale for each sample underwater image, which improves the feature perception and localization of small, distant underwater targets and thus the accuracy of model training.
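Coordinate Attention can be sketched as the following PyTorch module, which follows the structure of the original Coordinate Attention design: directional average pooling along height and width, a shared 1×1 convolution with normalization and activation, then separate 1×1 convolutions and sigmoid gates for each direction that re-weight the input. The reduction ratio and the Hardswish activation are assumptions; the text does not specify them.

```python
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    """Coordinate Attention: direction-aware channel gating that preserves position information."""

    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # average over width  -> (n, c, h, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # average over height -> (n, c, 1, w)
        self.conv1 = nn.Conv2d(channels, mid, 1)        # shared transform for both directions
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = self.pool_h(x)                      # (n, c, h, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (n, c, w, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (n, c, 1, w)
        return x * a_h * a_w  # broadcast both attention maps over the input
```

Because each gate lies in (0, 1), the module only re-weights features; the output has the same shape as the input and can be dropped between backbone and neck at every scale.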

[0069] S3: Input the backbone feature maps of each sample underwater image at the several scales into the neck network for feature fusion to obtain the fused feature map of each sample underwater image.

[0070] In this embodiment, the detection device inputs the backbone feature maps of each sample underwater image at the several scales into the neck network for feature fusion, obtaining a fused feature map of each sample underwater image. Fusing the feature information extracted at different levels yields a more detailed fused feature map and therefore more accurate target detection.
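The text does not detail the neck's internal structure. As a hedged illustration of multi-scale feature fusion, the sketch below implements a minimal PANet-style neck (YOLOv7's own neck is PAN-based but considerably more elaborate): a top-down path carries semantic information from coarse maps into fine ones, and a bottom-up path sends localization detail back to the coarse maps. The class name, channel widths and the omission of normalization and activations are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyPANNeck(nn.Module):
    """Hypothetical minimal PANet-style fusion of backbone maps ordered fine -> coarse."""

    def __init__(self, chs=(128, 256, 512), out_ch=128):
        super().__init__()
        n = len(chs)
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_ch, 1) for c in chs])  # align channels
        self.td_fuse = nn.ModuleList([nn.Conv2d(2 * out_ch, out_ch, 3, padding=1) for _ in range(n - 1)])
        self.down = nn.ModuleList([nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1) for _ in range(n - 1)])
        self.bu_fuse = nn.ModuleList([nn.Conv2d(2 * out_ch, out_ch, 3, padding=1) for _ in range(n - 1)])

    def forward(self, feats):
        lat = [conv(f) for conv, f in zip(self.lateral, feats)]
        # Top-down path: upsample coarser maps and fuse them into finer ones
        td = [lat[-1]]
        for i in range(len(lat) - 2, -1, -1):
            up = F.interpolate(td[0], size=lat[i].shape[-2:], mode="nearest")
            td.insert(0, self.td_fuse[i](torch.cat([lat[i], up], dim=1)))
        # Bottom-up path: downsample finer maps and fuse them into coarser ones
        out = [td[0]]
        for i in range(1, len(td)):
            down = self.down[i - 1](out[-1])
            out.append(self.bu_fuse[i - 1](torch.cat([td[i], down], dim=1)))
        return out  # one fused map per scale, all with out_ch channels
```

Each returned map would then feed one detector of the prediction network.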

[0071] S4: Input the fused feature map of each sample underwater image into the prediction network to obtain several prediction regions of each sample underwater image and the label data of those prediction regions.

[0072] In this embodiment, the detection device inputs the fused feature map of each sample underwater image into the prediction network to obtain several prediction regions of each sample underwater image and the label data of those prediction regions.

[0073] Referring to Figure 5, which is a flowchart of step S4 of the underwater image-based target detection method provided in the first embodiment of this application, step S4 includes steps S41 to S42, as follows:

[0074] S41: Based on the fused feature map of each sample underwater image, divide each sample underwater image into a grid to obtain its grid coordinate information.

[0075] In this embodiment, the detection device performs gridded prediction on the multi-channel fused feature maps of each sample underwater image in turn, applies a convolution to obtain a convolutional feature map with a specified number of channels, and divides each sample underwater image into a grid based on that convolutional feature map to obtain the grid coordinate information of each sample underwater image.

[0076] S42: Based on the grid coordinate information of each sample underwater image and the preset detectors, obtain several prediction regions of each sample underwater image and the label data of those prediction regions.

[0077] The label data includes the center point coordinates, width, and height parameters of the label area, wherein the center point coordinates include the horizontal coordinate and the vertical coordinate of the center point.

[0078] In this embodiment, the detection device is preconfigured with up to three detectors. Each detector contains three anchor boxes, which are typically obtained by clustering the target boxes in the training set with K-means. The anchor calculation is integrated into the model, and the anchor boxes of the different detectors are adapted during training to generate the detection boxes used for prediction. Regression is then performed on each detection box to obtain the position and size of the predicted region.
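Anchor clustering of this kind is commonly done with K-means over the ground-truth widths and heights, using d = 1 - IoU instead of Euclidean distance so that large boxes do not dominate. The sketch below assumes that variant (the text only says K-means) and uses hypothetical function names. With nine anchors sorted by area, consecutive groups of three give one anchor set per detector, e.g. `[anchors[i:i + 3] for i in range(0, 9, 3)]`.

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes given only (w, h), assuming they share the same centre."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    return inter / (box[0] * box[1] + anchor[0] * anchor[1] - inter)


def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """Cluster (w, h) pairs with distance 1 - IoU; return k anchors sorted by area."""
    rng = random.Random(seed)
    centroids = rng.sample(wh, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in wh:
            # Highest IoU is the same as smallest 1 - IoU distance
            best = max(range(k), key=lambda j: iou_wh(box, centroids[j]))
            clusters[best].append(box)
        new = []
        for j, members in enumerate(clusters):
            if members:
                new.append((sum(b[0] for b in members) / len(members),
                            sum(b[1] for b in members) / len(members)))
            else:
                new.append(centroids[j])  # keep an empty cluster's old centroid
        if new == centroids:
            break  # converged
        centroids = new
    return sorted(centroids, key=lambda a: a[0] * a[1])
```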

[0079] In this embodiment, the detection device inputs the grid coordinate information of each sample underwater image into the detectors and calculates the center, width, and height of each prediction region from the detection box parameters and preset regression coefficients. For the center, each regression coefficient is normalized to the range (0, 1), multiplied by 2, and reduced by 0.5, which constrains it to (-0.5, 1.5); combining it with the grid coordinate information gives the center point coordinates of the prediction region. For the size, the normalized regression coefficient is multiplied by 2 and squared, which constrains it to (0, 4); multiplying by the width and height of the prior box gives the width and height of the prediction region.
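The decoding rule above can be written out for a single anchor in a single grid cell. This sketch assumes the normalization is a sigmoid, as in the public YOLOv7 code, and that anchors and the feature-map stride are given in pixels; `make_grid` and `decode_cell` are hypothetical names.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function mapping a raw coefficient into (0, 1)."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def make_grid(nx, ny):
    """Grid division: one (grid_x, grid_y) coordinate per feature-map cell, row by row."""
    return [(gx, gy) for gy in range(ny) for gx in range(nx)]


def decode_cell(t, grid_x, grid_y, anchor_w, anchor_h, stride):
    """Decode raw outputs t = (tx, ty, tw, th) into a pixel box (cx, cy, w, h)."""
    sx, sy, sw, sh = (sigmoid(v) for v in t)  # normalise coefficients to (0, 1)
    cx = (sx * 2.0 - 0.5 + grid_x) * stride   # centre offset in (-0.5, 1.5) cells
    cy = (sy * 2.0 - 0.5 + grid_y) * stride
    w = (sw * 2.0) ** 2 * anchor_w            # scale factor in (0, 4) times prior width
    h = (sh * 2.0) ** 2 * anchor_h
    return cx, cy, w, h
```

For example, all-zero raw outputs decode to a box centred in the middle of cell (3, 4) with exactly the anchor's size: `decode_cell((0, 0, 0, 0), 3, 4, 10, 20, 8)` returns `(28.0, 36.0, 10.0, 20.0)` for stride 8. Bounding the offset and scale this way keeps early training stable compared with an unbounded exponential.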

[0080] S5: Train the detection model based on the label data of the prediction regions and the label data of the label regions of each sample underwater image to obtain the target detection model.

[0081] In this embodiment, the detection device trains the detection model based on the label data of the prediction regions and the label data of the label regions of each sample underwater image to obtain a target detection model.

[0082] Referring to Figure 6, which is a flowchart of step S5 of the underwater image-based target detection method provided in the first embodiment of this application, step S5 includes steps S51 to S52, as follows:

[0083] S51: Based on the detection region of each sample underwater image, the label data of that detection region, the corresponding label region, the label data of that label region, and a preset earth mover's distance loss function, obtain the earth mover's distance loss values corresponding to the data-augmented sample underwater images.

[0084] The earth mover's distance loss function is:

[0085]

[0086] In the formula, $L_1(N_a, N_b)$ is the earth mover's distance function; $N_a$ denotes the detection region and $N_b$ the label region; $cx_a$ and $cy_a$ are the horizontal and vertical coordinates of the center point of the detection region, and $w_a$ and $h_a$ are its width and height; $T$ denotes the transpose; $cx_b$ and $cy_b$ are the horizontal and vertical coordinates of the center point of the label region, and $w_b$ and $h_b$ are its width and height.

[0087] In this embodiment, the detection device obtains the earth mover's distance loss values corresponding to each data-augmented sample underwater image from its detection regions and their label data, the corresponding label regions and their label data, and the preset earth mover's distance loss function.

[0088] The detection device weights the image pixels in the prediction and label regions, giving the highest weight to the center pixel of each bounding box, with the weight decreasing gradually from the center toward the boundary. Each region is thereby modeled as a two-dimensional Gaussian probability distribution, and the normalized Wasserstein distance (NWD) between the two distributions is used to evaluate how similar the prediction and label regions are. This measures the similarity between two small objects better, improves the accuracy of model training, and improves the detection of small-scale objects.
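Since the loss formula itself is not reproduced above, the sketch below follows the standard normalized Wasserstein distance formulation, which is an assumption here: each box (cx, cy, w, h) is modeled as a 2-D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4); the squared 2-Wasserstein distance between two such Gaussians reduces to the squared Euclidean distance between the vectors [cx, cy, w/2, h/2]ᵀ; and exp(-√W²/c) maps it into (0, 1]. The default c = 12.8 is a placeholder, not a value taken from this patent, and the function names are hypothetical.

```python
import math


def wasserstein2_sq(box_a, box_b):
    """Squared 2-Wasserstein distance between boxes modelled as 2-D Gaussians."""
    (cxa, cya, wa, ha), (cxb, cyb, wb, hb) = box_a, box_b
    return ((cxa - cxb) ** 2 + (cya - cyb) ** 2
            + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)


def nwd(box_a, box_b, c=12.8):
    """Normalised Wasserstein distance in (0, 1]; 1 means identical boxes."""
    return math.exp(-math.sqrt(wasserstein2_sq(box_a, box_b)) / c)


def nwd_loss(box_a, box_b, c=12.8):
    """Loss form: 0 for identical boxes, approaching 1 as the boxes move apart."""
    return 1.0 - nwd(box_a, box_b, c)
```

Unlike IoU, this similarity stays informative even when two small boxes do not overlap at all, which is why it is favored for tiny targets.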

[0089] S52: Construct a total loss function from the earth mover's distance loss values of the sample underwater images, and train the improved detection model with the total loss function to obtain the target detection model.

[0090] The total loss function is:

[0091]

[0092] In the formula, $LOSS$ is the total loss function, $k$ is the preset scaling coefficient, $L_{CIoU}$ is the preset detector loss function, $\alpha$ is the first proportionality coefficient, $\beta$ is the second proportionality coefficient, and $c$ is a preset constant.

[0093] To further reflect the correlation between the predicted region and the labeled region, in this embodiment the detection device maps the distribution distance to a probability in the range 0 to 1, constructs a total loss function from the earth mover's distance loss values of the sample underwater images, and trains the improved detection model with this total loss function to obtain the target detection model.
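The total loss includes a CIoU-based detector loss term. For reference, a minimal plain-Python sketch of the standard Complete-IoU loss for one pair of (cx, cy, w, h) boxes is given below: 1 - IoU, plus the squared center distance normalized by the squared diagonal of the smallest enclosing box, plus an aspect-ratio consistency term. How this term is weighted against the Wasserstein-based term through k, α, β and c cannot be recovered from the text, so no combination is shown; the trade-off factor `a` inside CIoU is unrelated to the α of [0092].

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """Standard CIoU loss for two boxes in (cx, cy, w, h) format."""
    px, py, pw, ph = pred
    tx, ty, tw, th = target
    # Corner coordinates of both boxes
    p1x, p1y, p2x, p2y = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t1x, t1y, t2x, t2y = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2
    inter = max(0.0, min(p2x, t2x) - max(p1x, t1x)) * max(0.0, min(p2y, t2y) - max(p1y, t1y))
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)
    # Squared centre distance over squared diagonal of the smallest enclosing box
    cw = max(p2x, t2x) - min(p1x, t1x)
    ch = max(p2y, t2y) - min(p1y, t1y)
    rho2 = (px - tx) ** 2 + (py - ty) ** 2
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off factor
    v = (4 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    a = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + a * v
```

The distance term keeps a useful gradient for non-overlapping boxes, where plain IoU loss is flat at 1.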

[0094] S6: In response to the detection command, obtain the underwater image to be detected, input the underwater image to be detected into the target detection model, and obtain the detection result of the underwater image to be detected.

[0095] The detection command is issued by the user and received by the detection device.

[0096] In this embodiment, the detection device responds to the detection command, obtains the underwater image to be detected, inputs the underwater image to be detected into the target detection model, obtains the detection result of the underwater image to be detected, and displays it on the preset display interface of the detection device.
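Claim 1 states that the prediction network divides each image into grids and derives prediction regions from the grid coordinate information and a preset detector. At inference time, this decoding turns the raw network output into the detection result. The claim does not fix the decoding formula. The sketch below therefore assumes a YOLOv3-style parameterization with a single hypothetical anchor size per grid, and the function name is illustrative.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_grid(raw, stride: int, anchor_wh: tuple[float, float]):
    """Decode raw per-cell outputs (tx, ty, tw, th) on a grid into boxes.

    raw[gy][gx] holds the four raw values predicted for grid cell (gx, gy).
    Returns (cx, cy, w, h) boxes in input-image pixels, using an assumed
    YOLOv3-style parameterization:
        cx = (sigmoid(tx) + gx) * stride,  w = anchor_w * exp(tw)
    """
    anchor_w, anchor_h = anchor_wh
    boxes = []
    for gy, row in enumerate(raw):
        for gx, (tx, ty, tw, th) in enumerate(row):
            # The sigmoid keeps each predicted center inside its own grid cell.
            cx = (sigmoid(tx) + gx) * stride
            cy = (sigmoid(ty) + gy) * stride
            w = anchor_w * math.exp(tw)
            h = anchor_h * math.exp(th)
            boxes.append((cx, cy, w, h))
    return boxes
```

A zero output in every cell decodes to an anchor-sized box centered in that cell, which makes the role of the grid coordinates easy to check.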

[0097] Please refer to Figure 7, which is a schematic diagram of the underwater image-based target detection device provided in the fourth embodiment of this application. The device can be implemented wholly or partly in software, hardware, or a combination of both. The device 7 includes:

[0098] The data acquisition module 71 is used to obtain a sample underwater image set and a sample label set, wherein the sample underwater image set includes a number of sample images, and the sample label set includes a number of label regions of the sample images and label data corresponding to the label regions.

[0099] The backbone feature extraction module 72 is used to obtain a preset detection model, which includes a backbone network, a neck network, and a prediction network. The sample underwater image set is input into the backbone network to obtain backbone feature maps of several scales for each sample underwater image.

[0100] The feature fusion module 73 is used to input the backbone feature maps of several scales of each sample underwater image into the neck network for feature fusion, to obtain the fused feature map of each sample underwater image.

[0101] The prediction module 74 is used to input the fused feature map of each sample underwater image into the prediction network, to obtain several prediction regions of each sample underwater image and the label data of those prediction regions.

[0102] The model training module 75 is used to train the detection model based on the label data of the several prediction regions and the label data of the several label regions of each sample underwater image, to obtain the target detection model.

[0103] The detection module 76 is used to respond to a detection command, obtain an underwater image to be detected, input the underwater image to be detected into the target detection model, and obtain the detection result of the underwater image to be detected.
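The cooperation of modules 72 to 76 can be pictured as a simple pipeline. Module 71 supplies the images and labels. The sketch below wires hypothetical callables (`backbone`, `neck`, `head`, `train_step`) in the order the modules are described. None of these names come from this application, and real networks would replace the stand-in functions used in the test.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class UnderwaterDetector:
    """Hypothetical wiring of modules 72-76; each field is a stand-in callable."""
    backbone: Callable[[Any], list]            # image -> multi-scale feature maps (72)
    neck: Callable[[list], Any]                # multi-scale maps -> fused feature map (73)
    head: Callable[[Any], list]                # fused map -> predicted regions + label data (74)
    train_step: Callable[[list, list], float]  # predictions, labels -> loss (75)

    def predict(self, image: Any) -> list:
        # Backbone -> neck -> prediction network, as in modules 72-74.
        return self.head(self.neck(self.backbone(image)))

    def fit(self, images: Sequence[Any], labels: Sequence[list], epochs: int = 1) -> list:
        # Module 75: train_step is expected to compute the loss and update
        # parameters; this loop only feeds it data and records the mean loss.
        history = []
        for _ in range(epochs):
            total = 0.0
            for image, label in zip(images, labels):
                total += self.train_step(self.predict(image), label)
            history.append(total / max(len(images), 1))
        return history

    def detect(self, image: Any) -> list:
        # Module 76: respond to a detection command with predictions for one image.
        return self.predict(image)
```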

[0104] In this embodiment, the data acquisition module obtains a sample underwater image set and a sample label set. The sample underwater image set includes several sample images, and the sample label set includes several label regions of the sample images and the label data corresponding to those label regions. The backbone feature extraction module obtains a preset detection model comprising a backbone network, a neck network, and a prediction network, and it inputs the sample underwater image set into the backbone network to obtain backbone feature maps at several scales for each sample underwater image. The feature fusion module inputs these backbone feature maps into the neck network, where feature fusion produces the fused feature map of each sample underwater image. The prediction module inputs the fused feature maps into the prediction network to obtain several prediction regions, and their label data, for each sample underwater image. The model training module trains the detection model based on the label data of the prediction regions and the label data of the label regions to obtain the target detection model. The detection module responds to a detection command by obtaining the underwater image to be detected and inputting it into the target detection model to obtain the detection result. By training the underwater image-based target detection model with deep learning, the device extracts feature information at different levels from underwater images and fuses these features. It then detects target regions from the fused feature information, improving detection accuracy and efficiency.
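Claim 2 describes each sub-feature extraction module of the backbone as two branches whose outputs are concatenated, with the stage outputs serving as the multi-scale backbone feature maps. The pure-Python sketch below shows that data flow on tiny feature maps stored as [channel][position] lists. It simplifies under stated assumptions: 1x1 convolutions only, per-channel standardization in place of batch normalization, SiLU as the activation, and a residual two-convolution bottleneck. It also omits downsampling between stages and does not model the dynamic convolution layers of claim 3. All names are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # [channel][flattened spatial position]


def conv1x1(x: FeatureMap, weight: List[List[float]]) -> FeatureMap:
    """1x1 convolution: each output channel is a weighted sum of input channels."""
    n_pos = len(x[0])
    return [[sum(w_oi * x[i][p] for i, w_oi in enumerate(w_o)) for p in range(n_pos)]
            for w_o in weight]


def normalize(x: FeatureMap, eps: float = 1e-5) -> FeatureMap:
    """Per-channel standardization, standing in for batch normalization."""
    out = []
    for ch in x:
        mean = sum(ch) / len(ch)
        var = sum((v - mean) ** 2 for v in ch) / len(ch)
        out.append([(v - mean) / math.sqrt(var + eps) for v in ch])
    return out


def silu(x: FeatureMap) -> FeatureMap:
    """SiLU activation v * sigmoid(v), applied element-wise."""
    return [[v / (1.0 + math.exp(-v)) for v in ch] for ch in x]


def conv_norm_act(x: FeatureMap, weight: List[List[float]]) -> FeatureMap:
    """Convolution -> normalization -> activation block."""
    return silu(normalize(conv1x1(x, weight)))


def bottleneck(x: FeatureMap, w1, w2) -> FeatureMap:
    """Two conv blocks with a residual connection (channel count preserved)."""
    y = conv_norm_act(conv_norm_act(x, w1), w2)
    return [[a + b for a, b in zip(cx, cy)] for cx, cy in zip(x, y)]


def sub_feature_module(x: FeatureMap, p: dict) -> FeatureMap:
    """Branch 1: one conv block. Branch 2: conv block then bottleneck.
    The two branch outputs are concatenated along the channel axis."""
    branch1 = conv_norm_act(x, p["w_branch1"])
    branch2 = bottleneck(conv_norm_act(x, p["w_branch2"]), p["w_bn1"], p["w_bn2"])
    return branch1 + branch2  # list concatenation == channel concatenation


def backbone_stages(x: FeatureMap, modules: List[dict]) -> List[FeatureMap]:
    """Chain sub-modules, keeping every stage output as a backbone feature map."""
    outputs = []
    for p in modules:
        x = sub_feature_module(x, p)
        outputs.append(x)
    return outputs
```

The output channel count of each module is the sum of the two branch widths. The weights of the next module must therefore accept that many input channels.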

[0105] Please refer to Figure 8, which is a schematic structural diagram of the computer device provided in the fifth embodiment of this application. The computer device 8 includes a processor 81, a memory 82, and a computer program 83 stored in the memory 82 and executable on the processor 81. The computer device can store multiple instructions suitable for being loaded by the processor 81 to execute the method steps shown in the first to third embodiments above. For the specific execution process, refer to the descriptions of the first to third embodiments, which are not repeated here.

[0106] The processor 81 may include one or more processing cores. The processor 81 connects to various parts of the server using various interfaces and lines. It executes various functions and processes data of the underwater image-based target detection device 7 by running or executing instructions, programs, code sets, or instruction sets stored in the memory 82, and by calling data from the memory 82. Optionally, the processor 81 may be implemented using at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 81 may integrate one or a combination of several of the following: a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), and a modem. The CPU primarily handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content required to be displayed on the touch screen; and the modem handles wireless communication. It is understood that the modem may also be implemented as a separate chip without being integrated into the processor 81.

[0107] The memory 82 may include random access memory (RAM) or read-only memory. Optionally, the memory 82 may include a non-transitory computer-readable storage medium. The memory 82 can be used to store instructions, programs, code, code sets, or instruction sets. The memory 82 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as touch instructions), instructions for implementing the various method embodiments described above, etc.; the data storage area may store data involved in the various method embodiments described above, etc. Optionally, the memory 82 may also be at least one storage device located remotely from the aforementioned processor 81.

[0108] This application also provides a storage medium that can store multiple instructions suitable for being loaded by a processor to execute the method steps shown in the first to third embodiments described above. For the specific execution process, refer to the descriptions of the first to third embodiments, which are not repeated here.

[0109] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. Furthermore, the specific names of the functional units and modules are only for easy differentiation and are not intended to limit the scope of protection of this application. The specific working process of the units and modules in the above system can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.

[0110] In the above embodiments, the descriptions of each embodiment have different focuses. For parts that are not described in detail or recorded in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0111] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the algorithm. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0112] In the embodiments provided by this invention, it should be understood that the disclosed apparatus / terminal devices and methods can be implemented in other ways. For example, the apparatus / terminal device embodiments described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or other forms.

[0113] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0114] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0115] If the integrated module / unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present invention can also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms.

[0116] This invention is not limited to the above-described embodiments. If any modifications or variations to this invention do not depart from the spirit and scope of this invention, and if such modifications and variations fall within the scope of the claims and equivalent technologies of this invention, then this invention also intends to include such modifications and variations.

Claims

1. A target detection method based on underwater images, characterized in that it comprises the following steps: obtaining a sample underwater image set and a sample label set, wherein the sample underwater image set includes several sample underwater images, and the sample label set includes several label regions of the several sample underwater images and several label data corresponding to the several label regions; obtaining a preset detection model, which includes a backbone network, a neck network, and a prediction network, and inputting the sample underwater image set into the backbone network to obtain backbone feature maps of each sample underwater image at several preset scales; inputting the backbone feature maps of each sample underwater image at the several scales into the neck network for feature fusion to obtain the fused feature map of each sample underwater image; inputting the fused feature map of each sample underwater image into the prediction network, dividing each sample underwater image into grids according to its fused feature map to obtain the grid coordinate information of each sample underwater image, and obtaining, based on the grid coordinate information of each sample underwater image and a preset detector, several prediction regions of each sample underwater image and the label data of the several prediction regions, wherein the label data includes the center point coordinate parameters, width parameter, and height parameter of the label region, and the center point coordinate parameters include a center point horizontal coordinate parameter and a center point vertical coordinate parameter; training the detection model based on the label data of the several prediction regions and the label data of the several label regions of each sample underwater image to obtain a target detection model; and
in response to a detection command, obtaining an underwater image to be detected and inputting the underwater image to be detected into the target detection model to obtain the detection result of the underwater image to be detected.

2. The target detection method based on underwater images according to claim 1, characterized in that: the backbone network of the detection model includes a convolution module and a feature extraction module connected in sequence; the feature extraction module includes several sub-feature extraction modules connected in sequence; each sub-feature extraction module includes a first branch and a second branch, the first branch including a first sub-convolutional module and the second branch including a second sub-convolutional module and a bottleneck convolutional module connected to each other; and the step of inputting the sample underwater image set into the backbone network and obtaining backbone feature maps of each sample underwater image at several preset scales includes the following steps: inputting the sample underwater image into the convolution module to obtain the convolutional feature map of the sample underwater image; taking the convolutional feature map of the sample underwater image as the input feature map of the first sub-feature extraction module in the feature extraction module, and obtaining the output feature map of the first branch of the first sub-feature extraction module according to the first feature extraction algorithm in that first branch, the first feature extraction algorithm being defined over the following terms: the output feature map of the first sub-convolutional module of the first branch of the sub-feature extraction module; the input feature map of the sub-feature extraction module; the number of output channels of the first branch of the sub-feature extraction module; a normalization function; an activation function; and
a convolution function; obtaining the output feature map of the second branch of the first sub-feature extraction module according to the second feature extraction algorithm in the second branch of the first sub-feature extraction module, the second feature extraction algorithm being defined over the following terms: the output feature map of the bottleneck convolutional module in the second branch of the sub-feature extraction module; a convolution activation function; a concatenation function; the number of output channels of the bottleneck convolutional module in the second branch of the sub-feature extraction module; and a bottleneck function; concatenating the output feature map of the first branch and the output feature map of the second branch of the first sub-feature extraction module to obtain the output feature map of the first sub-feature extraction module; and taking the output feature map of the first sub-feature extraction module as the input feature map of the next sub-feature extraction module and repeating the above steps to obtain the output feature map of each sub-feature extraction module, these output feature maps serving as the backbone feature maps of the several scales.

3. The target detection method based on underwater images according to claim 2, characterized in that: The first sub-convolutional module includes several connected static convolutional activation layers and dynamic convolutional activation layers.

4. The target detection method based on underwater images according to claim 2, characterized in that the step of inputting the sample underwater image set into the backbone network and obtaining backbone feature maps of each sample underwater image at several preset scales includes the following step: performing attention transformation on the backbone feature maps of each sample underwater image at the several scales using an attention mechanism, to obtain enhanced backbone feature maps of each sample underwater image at the several scales.

5. The target detection method based on underwater images according to claim 1, characterized in that the step of training the detection model based on the label data of the several prediction regions and the label data of the several label regions of each sample underwater image to obtain a target detection model includes the following steps: obtaining several earth mover's distance loss values for the data-augmented versions of the same sample underwater image, based on the detection area of that sample underwater image and its label data, the corresponding label area and its label data, and a preset earth mover's distance loss function, wherein, in the earth mover's distance loss function, L1(N_a, N_b) denotes the earth mover's distance function, N_a denotes the detection area, N_b denotes the label area, cx_a and cy_a are the x- and y-coordinate parameters of the center point of the detection area, w_a and h_a are the width and height parameters of the detection area, cx_b and cy_b are the x- and y-coordinate parameters of the center point of the label area, w_b and h_b are the width and height parameters of the label area, and T denotes the transpose; and constructing a total loss function based on the earth mover's distance loss values of the several sample underwater images and training the detection model with the total loss function to obtain the target detection model, wherein, in the total loss function, LOSS is the total loss function, k is the preset scaling factor, L_CIOU is the preset detector loss function, α is the first proportionality coefficient, β is the second proportionality coefficient, and c is a preset constant.

6. The target detection method based on underwater images according to claim 1, characterized in that, before the sample underwater image set and the sample label set are input into the improved detection model for training to obtain the target detection model, the method includes the following step: performing data augmentation on the several sample underwater images in the sample underwater image set to obtain a data-augmented sample underwater image set.

7. A target detection device based on underwater images, characterized by comprising: a data acquisition module, used to obtain a sample underwater image set and a sample label set, wherein the sample underwater image set includes several sample underwater images, and the sample label set includes several label regions of the several sample underwater images and several label data corresponding to the label regions; a backbone feature extraction module, used to obtain a preset detection model, which includes a backbone network, a neck network, and a prediction network, and to input the sample underwater image set into the backbone network to obtain backbone feature maps of several scales for each sample underwater image; a feature fusion module, used to input the backbone feature maps of several scales of each sample underwater image into the neck network for feature fusion to obtain the fused feature map of each sample underwater image; a prediction module, used to input the fused feature map of each sample underwater image into the prediction network, divide each sample underwater image into grids according to its fused feature map to obtain the grid coordinate information of each sample underwater image, and obtain, based on the grid coordinate information of each sample underwater image and a preset detector, several prediction regions of each sample underwater image and the label data of the several prediction regions, wherein the label data includes the center point coordinate parameters, width parameter, and height parameter of the label region, and the center point coordinate parameters include a center point horizontal coordinate parameter and a center point vertical coordinate parameter;
a model training module, used to train the detection model based on the label data of the several prediction regions and the label data of the several label regions of each sample underwater image to obtain the target detection model; and a detection module, used to respond to a detection command, obtain an underwater image to be detected, input the underwater image to be detected into the target detection model, and obtain the detection result of the underwater image to be detected.

8. A computer device, characterized by comprising: a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the underwater image-based target detection method according to any one of claims 1 to 6.

9. A storage medium, characterized in that: The storage medium stores a computer program that, when executed by a processor, implements the steps of the underwater image-based target detection method as described in any one of claims 1 to 6.
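Claim 4 applies an attention mechanism to the backbone feature maps but does not name a specific one. As an assumed example only, the sketch below uses a squeeze-and-excitation-style channel attention. Each channel is globally average-pooled, the pooled values pass through two small fully connected layers (ReLU, then sigmoid), and each resulting weight in (0, 1) rescales its channel. The weight matrices `w_reduce` and `w_expand` are hypothetical parameters.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # [channel][flattened spatial position]


def se_channel_attention(x: FeatureMap, w_reduce: List[List[float]],
                         w_expand: List[List[float]]) -> FeatureMap:
    """Squeeze-and-excitation-style reweighting of feature-map channels.

    w_reduce has shape [C/r][C] and w_expand has shape [C][C/r].
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid, producing one weight per channel.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w_reduce]
    scale = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every position of each channel by that channel's weight.
    return [[s * v for v in ch] for s, ch in zip(scale, x)]
```

With all-zero weights every channel is scaled by 0.5. Learned weights instead emphasize the channels that carry the most useful information about the target.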