Marine organism image processing method and device, electronic equipment and readable storage medium

By combining convolutional modules and Transformer codecs, the problem of poor imaging quality in underwater marine organism images is solved, and effective restoration of image details and quality improvement are achieved, making it suitable for image enhancement in complex marine environments.

CN116843912B · Active · Publication Date: 2026-06-23 · HAINAN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HAINAN UNIV
Filing Date
2023-07-06
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Marine life images suffer from poor quality during underwater imaging due to light attenuation and scattering, especially in complex environments where image degradation is severe. Existing technologies struggle to enhance such images effectively.

Method used

Convolutional modules with different kernels are used for progressive downsampling, and multi-level feature extraction and upsampling are performed in combination with Transformer encoder and decoder. Image quality is improved by deepening and fusing multi-level detailed features.

Benefits of technology

It effectively restores underwater image details and improves image quality, especially in complex marine environments, and is suitable for target detection and segmentation tasks.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN116843912B_ABST

Abstract

The application discloses a marine organism image processing method and device, electronic equipment and a readable storage medium, and is applied to the technical field of digital image processing. The method comprises the following steps: performing step-by-step down-sampling on a to-be-processed optical image by using convolution modules with different convolution kernels, so as to obtain a plurality of initial images with different scales; performing global feature extraction on each initial image under different scales, so as to obtain a plurality of multi-level feature maps; inputting the multi-level feature maps into a Transformer encoder, and performing multi-level detail feature extraction on the multi-level feature maps by adopting a multi-level feature deepening extraction fusion mode; and up-sampling the output features of the Transformer decoder and the multi-level detail features, so as to obtain an enhanced optical image. The application can solve the problem that underwater image degradation is serious or details are blurred, and effectively improves the image enhancement effect for marine organism images.

Description

Technical Field

[0001] This application relates to the field of digital image processing technology, and in particular to a method, apparatus, electronic device, and readable storage medium for processing marine biological images.

Background Technology

[0002] With the continuous and in-depth exploration of marine natural resources, underwater observation is an effective means of understanding and utilizing marine resources. Underwater optical images, as data carrying rich underwater information, are widely used in the underwater observation process.

[0003] Imaging environments differ between the ocean and land. Ocean imaging involves more interference, resulting in lower image quality for marine life. For example, light is absorbed and scattered by water, causing attenuation and degrading the color and texture of marine organisms. Furthermore, as depth increases, the propagation of natural light sources becomes more difficult. Different frequencies of light from natural sources have different maximum depths of transmission; therefore, underwater images acquired at greater depths are typically more bluish or yellowish-green, and exhibit decreased quality, blurred details, and a darker appearance.

[0004] The image enhancement effects of related technologies on marine life images are not good, especially for marine life images taken in complex underwater environments or marine life images that have been severely degraded due to various marine factors.

[0005] Therefore, effectively improving the image enhancement effect of marine life images is a technical problem that needs to be solved by technicians in this field.

Summary of the Invention

[0006] This application provides a method, apparatus, electronic device, and readable storage medium for processing marine organism images, which can effectively improve the image enhancement effect of marine organism images.

[0007] To solve the above-mentioned technical problems, this application provides the following technical solution:

[0008] This application provides a method for processing marine biological images, including:

[0009] By using convolution modules with different kernels to downsample the optical image to be processed, multiple initial images of different scales are obtained.

[0010] Global feature extraction at different scales is performed on each initial image to obtain multi-level feature maps;

[0011] The multi-level feature map is input into the Transformer encoder, and multi-level feature deepening extraction and fusion is used to extract multi-level detailed features from the multi-level feature map;

[0012] The output features and multi-level detail features of the Transformer decoder are upsampled to obtain the enhanced optical image.

[0013] Optionally, the convolution module using different convolution kernels performs step-by-step downsampling of the optical image to be processed to obtain multiple initial images at different scales, including:

[0014] A pre-trained downsampling model is provided, comprising a first MBConv module, a second MBConv module, and a third MBConv module. The first MBConv module has a 5*5 convolutional kernel and a stride of 2; the second MBConv module has a 5*5 convolutional kernel and a stride of 1; and the third MBConv module has a 3*3 convolutional kernel and a stride of 1.

[0015] The MBConv modules of the downsampling model are invoked to perform step-by-step downsampling of the optical image to be processed.

[0016] Optionally, the step of extracting global features at different scales for each initial image to obtain multi-level feature maps includes:

[0017] A pre-trained feature extraction model is provided, which includes a first dynamic convolutional layer, a residual block, and a second dynamic convolutional layer in sequence according to the data transmission method. The residual block includes a first convolutional layer, a second convolutional layer, a batch normalization layer, and an activation function layer. The first convolutional layer and the second convolutional layer do not change the size of each initial image.

[0018] The feature extraction model is invoked to perform global feature extraction on each initial image, resulting in feature maps at multiple scales. The multiple feature maps of different sizes of each initial image constitute a multi-level feature map.

[0019] Optionally, the step of using a multi-level feature deepening extraction and fusion method to extract multi-level detailed features from the multi-level feature map includes:

[0020] The target feature maps whose sizes are between the maximum and minimum sizes in the multi-level feature maps are input into the detail enhancement module to extract the first-level detail features;

[0021] The multi-level feature map is input into the global deepening module to extract the second-level detail features;

[0022] The output features of the Transformer encoder are then input into the local enhancement module to extract the third-level detail features.

[0023] Optionally, the upsampling of the output features and multi-level detail features of the Transformer decoder to obtain the enhanced optical image includes:

[0024] Block decoding processing is performed on the output features of the Transformer decoder and the third-level detail features;

[0025] By using transposed convolution, the decoded features obtained from block decoding, the first-level detail features, and the second-level detail features are upsampled to obtain an enhanced optical image.

[0026] Optionally, the step of inputting the multi-level feature map into the global deepening module to extract the second-level detail features includes:

[0027] The global deepening module includes an adaptive pooling layer, a convolutional layer module, and an adaptive spatial fusion layer; the convolutional layer module includes multiple convolutional layers in parallel; the adaptive spatial fusion layer includes a first branch, a second branch, and a weight aggregation layer; the first branch includes a splicing layer, a fourth convolutional layer, a fifth convolutional layer, and a sigmoid function layer;

[0028] The adaptive pooling layer is used to perform a ratio-invariant adaptive pooling operation on the multi-level feature map to generate multiple contextual features of different scales.

[0029] Each convolutional layer of the aforementioned convolutional layer module independently performs convolution processing on a single contextual feature of the input.

[0030] The convolution results for each contextual feature are simultaneously input into the first branch and the second branch, and aggregated into a fused feature map through the weight aggregation layer as a second-level detail feature.

[0031] Optionally, the step of inputting each target feature map whose size is between the maximum and minimum size in the multi-level feature maps to the detail enhancement module to extract the first-level detail features includes:

[0032] The detail enhancement module includes an input layer, a first multilayer perceptron, a layer normalization layer, a second multilayer perceptron, and an output layer; both the first and second multilayer perceptrons include a first fully connected layer, a ReLU activation function, and a second fully connected layer; the outputs of the first and second multilayer perceptrons serve as the inputs to the output layer;

[0033] The target feature maps whose sizes are between the maximum and minimum sizes in the multi-level feature maps are input to the input layer of the detail enhancement module, and are processed sequentially by the first multilayer perceptron, the layer normalization layer, and the second multilayer perceptron, and finally output as the first level detail features through the output layer.

[0034] Another aspect of this application provides a marine organism image processing apparatus, comprising:

[0035] The stepwise downsampling module is used to perform stepwise downsampling of the optical image to be processed using convolution modules with different convolution kernels, so as to obtain multiple initial images of different scales;

[0036] The multi-level feature map extraction module is used to extract global features at different scales for each initial image to obtain multi-level feature maps;

[0037] The encoding module is used to input the multi-level feature map into the Transformer encoder;

[0038] The detail feature extraction module is used to extract multi-level detail features from the multi-level feature map using a multi-level feature deepening extraction and fusion method.

[0039] The image enhancement module is used to upsample the output features and multi-level detail features of the Transformer decoder to obtain an enhanced optical image.

[0040] This application also provides an electronic device including a processor for executing a computer program stored in a memory to implement the steps of the marine life image processing method as described in any of the preceding claims.

[0041] Finally, this application also provides a readable storage medium storing a computer program that, when executed by a processor, implements the steps of the marine life image processing method as described in any of the preceding claims.

[0042] The advantages of the technical solution provided in this application lie in its use of a Transformer codec to encode and decode the original optical image. Applying the Transformer improves the enhancement effect and deepens detail features. Furthermore, multi-level detail enhancement is used to enhance the original optical image, fully utilizing features at different scales and effectively enhancing details. This significantly improves the image quality of degraded or blurred optical images, and is particularly suitable for images taken in complex underwater environments and images heavily degraded by various marine factors. It effectively restores image details, facilitating the recovery of useful information and benefiting downstream tasks such as target detection and segmentation.

[0043] Furthermore, this application also provides corresponding implementation devices, electronic devices, and readable storage media for marine biological image processing methods, further making the methods more practical. The devices, electronic devices, and readable storage media have corresponding advantages.

[0044] It should be understood that the above general description and the following detailed description are merely exemplary and do not limit this application.

Attached Figure Description

[0045] To more clearly illustrate the technical solutions of this application or related technologies, the drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0046] Figure 1 is a flowchart of a marine organism image processing method provided in this application;

[0047] Figure 2 is a schematic diagram of the structural framework of an image enhancement model for an exemplary application scenario provided in this application;

[0048] Figure 3 is a schematic diagram of the structural framework of a global deepening module for an exemplary application scenario provided in this application;

[0049] Figure 4 is a schematic diagram of the structural framework of an adaptive spatial fusion module for an exemplary application scenario provided in this application;

[0050] Figure 5 is a schematic diagram of the structural framework of a detail enhancement module for an exemplary application scenario provided in this application;

[0051] Figure 6 is a schematic diagram of the structural framework of a local deepening module for an exemplary application scenario provided in this application;

[0052] Figure 7 is a structural diagram of a specific embodiment of the marine organism image processing device provided in this application;

[0053] Figure 8 is a structural diagram of one specific embodiment of the electronic device provided in this application.

Detailed Implementation

[0054] To enable those skilled in the art to better understand the present application, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are merely some embodiments of the present application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0055] The terms "first," "second," "third," "fourth," etc., used in the specification, claims, and accompanying drawings of this application are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or apparatus that includes a series of steps or units is not limited to the listed steps or units, but may include steps or units not listed. Various non-limiting embodiments of this application are described in detail below.

[0056] First, referring to Figure 1, which is a flowchart of a marine organism image processing method provided in this application, the method may include the following:

[0057] S101: Using convolution modules with different kernels, the optical image to be processed is downsampled step by step to obtain multiple initial images of different scales.

[0058] In this step, the optical image to be processed can be an underwater image of a marine organism, or an image of a terrestrial environment, or other optical images that are degraded or have blurred details. The initial image is a downsampled version of the optical image to be processed.

[0059] S102: Perform global feature extraction at different scales on each initial image to obtain multi-level feature maps.

[0060] The previous step obtains multiple initial images at different scales. Global feature extraction is performed on the initial image at each scale, yielding multiple feature maps at different scales for each image. These feature maps are collectively referred to as multi-level feature maps.

[0061] S103: Input the multi-level feature map into the Transformer encoder, and use the multi-level feature deepening extraction and fusion method to extract multi-level detailed features from the multi-level feature map.

[0062] This step utilizes a Transformer encoder to encode the multi-level feature maps extracted in the previous step, which can better refine the detailed features and improve the image enhancement effect. To fully utilize the features of the image at different scales, thereby enhancing the details of the final output image, the multi-level feature deepening extraction and fusion method is used to extract detailed features of the image at different scales in different ways and fuse the extracted detailed features into the final output image.

[0063] S104: Upsample the output features and multi-level detail features of the Transformer decoder to obtain the enhanced optical image.

[0064] The optical image in this step is the image obtained after processing the image to be processed by S101-S104. The input to the Transformer decoder is the output feature of the Transformer encoder. The multi-level detail features are the detail features obtained by extracting multi-level detail features from the multi-level feature map using the multi-level feature deepening extraction and fusion method in the previous step. The output feature of the Transformer decoder and the multi-level detail features are the objects of upsampling. After scale restoration by the multi-level upsampling module, combined with the previous multi-level detail deepening module, an enhanced image with good output effect is obtained.
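The data flow of S101 to S104 can be summarized as a small orchestration sketch. This is only an illustration under stated assumptions: the function `enhance` and every callable passed to it (`downsample_stages`, `extract`, `encode`, `decode`, `detail_fns`, `upsample`) are hypothetical placeholders, not names or interfaces from the application, and the sketch says nothing about how each component is actually built.

```python
from typing import Any, Callable, List, Sequence


def enhance(
    image: Any,
    downsample_stages: Sequence[Callable[[Any], Any]],
    extract: Callable[[Any], Any],
    encode: Callable[[List[Any]], Any],
    decode: Callable[[Any], Any],
    detail_fns: Sequence[Callable[[List[Any], Any], Any]],
    upsample: Callable[[Any, List[Any]], Any],
) -> Any:
    """Hypothetical wiring of steps S101-S104; every callable is a placeholder."""
    # S101: step-by-step downsampling, keeping the result of every stage.
    initial_images = []
    current = image
    for stage in downsample_stages:
        current = stage(current)
        initial_images.append(current)

    # S102: global feature extraction on each initial image.
    multi_level = [extract(img) for img in initial_images]

    # S103: encode the multi-level feature maps and extract detail features.
    encoded = encode(multi_level)
    details = [fn(multi_level, encoded) for fn in detail_fns]

    # S104: decode, then upsample the decoder output together with the details.
    decoded = decode(encoded)
    return upsample(decoded, details)
```

Passing the components in as callables keeps the sketch independent of any particular framework; in practice each placeholder would be a trained network module.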

[0065] In the technical solution provided in this application, the original optical image is encoded and decoded using a Transformer codec. Applying the Transformer improves the enhancement effect and deepens detail features. Furthermore, multi-level detail enhancement is used to enhance the original optical image, fully utilizing features at different scales and effectively enhancing details. This significantly improves the image quality of degraded or blurred optical images, and is particularly suitable for images taken in complex underwater environments and images heavily degraded by various marine factors. It effectively restores image details, facilitating the recovery of useful information within the image and benefiting downstream tasks such as target detection and segmentation.

[0066] It should be noted that there is no strict order of execution for the steps in this application. As long as they conform to a logical order, these steps can be executed simultaneously or in a certain preset order. Figure 1 is just an illustration and does not mean that this is the only possible execution order.

[0067] In the above embodiments, there is no limitation on how to perform step S101. This embodiment provides an illustrative downsampling method, which may include the following:

[0068] A pre-trained downsampling model is used, comprising a first MBConv module, a second MBConv module, and a third MBConv module. The depthwise separable convolution kernels of the modules differ in size: the first MBConv module has a 5x5 kernel with a stride of 2, the second MBConv module has a 5x5 kernel with a stride of 1, and the third MBConv module has a 3x3 kernel with a stride of 1. Each MBConv module of the downsampling model is called to progressively downsample the optical image to be processed through dynamic convolution. This progressive downsampling gradually reduces the size of the optical image to be processed, obtaining initial images at different scales while limiting the memory required for subsequent Transformer codec processing, thus improving overall image processing efficiency.
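As a rough check on how the stated kernel sizes and strides affect spatial size, the standard convolution output-size formula can be applied stage by stage. The sketch below assumes a padding of kernel // 2 for every stage; the application does not state the padding, and the names `conv_out_size`, `STAGES`, and `stage_sizes` are illustrative only.

```python
from typing import List


def conv_out_size(n: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# (kernel, stride) of the three MBConv stages as given in the text.
STAGES = [(5, 2), (5, 1), (3, 1)]


def stage_sizes(n: int) -> List[int]:
    """Sizes after each stage, ASSUMING padding = kernel // 2 (not in the source)."""
    sizes = []
    for kernel, stride in STAGES:
        n = conv_out_size(n, kernel, stride, padding=kernel // 2)
        sizes.append(n)
    return sizes


if __name__ == "__main__":
    print(stage_sizes(256))
```

Under this padding assumption only the stride-2 stage changes the spatial size; the text does not specify the padding or any additional resizing, so the numbers are illustrative rather than a statement of the patented configuration.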

[0069] In the above embodiments, there is no limitation on how to perform step S102. This embodiment provides an illustrative global feature extraction method, which may include the following:

[0070] A pre-trained feature extraction model is used, which includes a first dynamic convolutional layer, a residual block, and a second dynamic convolutional layer in sequence according to the data transmission method. The residual block includes a first convolutional layer, a second convolutional layer, a batch normalization layer, and an activation function layer. The first and second convolutional layers do not change the size of each initial image. The feature extraction model is called to perform global feature extraction on each initial image to obtain feature maps of multiple scales. The multiple feature maps of different sizes of each initial image constitute a multi-level feature map.

[0071] This embodiment uses convolution combined with residual blocks to construct a feature extraction model for extracting global features. Dynamic convolution is used to extract features by residual connection. The residual block consists of two convolutional layers that do not change the image size, combined with batch normalization and activation functions, thereby obtaining multi-level feature maps at different sizes, preparing for subsequent feature enhancement.
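A size-preserving residual block of the kind described can be sketched in plain Python on a single-channel image. The ordering (two convolutions, then normalization, then activation, then the skip connection) follows the order in which the layers are listed and is an assumption; `normalize` is a single-image stand-in for batch normalization, and all function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def conv2d_same(img: Image, kernel: Image) -> Image:
    """2D cross-correlation with zero padding, so the output keeps the input size."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def normalize(img: Image, eps: float = 1e-5) -> Image:
    """Zero-mean, unit-variance scaling; a stand-in for batch normalization."""
    vals = [v for row in img for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return [[(v - mean) / (var + eps) ** 0.5 for v in row] for row in img]


def relu(img: Image) -> Image:
    return [[max(0.0, v) for v in row] for row in img]


def residual_block(img: Image, k1: Image, k2: Image) -> Image:
    """conv -> conv -> normalize -> ReLU, added back to the input (assumed order)."""
    y = conv2d_same(img, k1)
    y = conv2d_same(y, k2)
    y = relu(normalize(y))
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(img, y)]
```

Because both convolutions use zero padding of half the kernel size, the block's output has the same height and width as its input, which is what allows the skip connection to be a plain element-wise sum.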

[0072] In the above embodiments, there is no limitation on how to perform step S103. This embodiment provides an illustrative method for multi-level detail feature extraction, which, as shown in Figure 2, may include the following:

[0073] The target feature maps whose sizes are between the maximum and minimum sizes in the multi-level feature map are input into the detail enhancement module to extract the first level of detail features; the multi-level feature maps are input into the global deepening module to extract the second level of detail features; the output features of the Transformer encoder are also input into the local enhancement module to extract the third level of detail features.

[0074] In this case, the smallest feature map in the multi-level feature map can first be divided into blocks for encoding, and the encoded result, together with the features of other sizes, is input to the Transformer decoder. For feature maps of intermediate sizes, the detail enhancement module can be used for detail feature extraction; for feature maps of various sizes in the multi-level feature maps, the global deepening module can be used for detail feature extraction; for the output of the Transformer encoder, the local enhancement module can be used for feature extraction.

[0075] As an optional implementation, as shown in Figure 3, the global deepening module includes an adaptive pooling layer, a convolutional layer module, and an adaptive spatial fusion layer; the convolutional layer module consists of multiple parallel convolutional layers. As shown in Figure 4, the adaptive spatial fusion layer includes a first branch, a second branch, and a weight aggregation layer. The first branch includes a concatenation layer, a fourth convolutional layer, a fifth convolutional layer, and a sigmoid function layer. The adaptive pooling layer performs ratio-invariant adaptive pooling operations on the multi-level feature maps to generate multiple context features of different scales. Each convolutional layer of the convolutional layer module independently performs convolution processing on one of the input context features. The convolution results for each context feature are simultaneously input to the first branch and the second branch, and aggregated into a fused feature map by the weight aggregation layer as the second-level detail features.

[0076] In this embodiment, the global deepening module first employs ratio-invariant adaptive pooling to generate multiple context features with different scales. Each context feature then independently passes through a convolutional layer with a 1×1 kernel and can be upsampled using bilinear interpolation for subsequent fusion. Because interpolation causes aliasing, this step uses an adaptive spatial fusion layer to combine these context features adaptively rather than simply summing them. Specifically, the adaptive spatial fusion layer takes the upsampled features as input and generates a spatial weight map for each feature. The weights are used to aggregate the context features into a fused feature map that carries multi-scale contextual information.
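The pool, upsample, and weighted-fusion idea can be illustrated on a single-channel map in plain Python. Several simplifications are assumptions rather than the disclosed design: pooling ratios of 0.25, 0.5, and 1.0 are arbitrary; nearest-neighbor upsampling replaces the bilinear interpolation named in the text; the 1×1 convolution is omitted (on one channel it is a scalar multiply); and the weight branch is reduced to one per-pixel linear mix (`mix`, hypothetical) followed by a sigmoid instead of two convolutional layers.

```python
import math
from typing import List, Optional, Sequence

Map = List[List[float]]


def adaptive_avg_pool(x: Map, out_h: int, out_w: int) -> Map:
    """Average-pool x to (out_h, out_w) using floor/ceil bin edges."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(out_h):
        y0, y1 = (i * h) // out_h, ((i + 1) * h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            x0, x1 = (j * w) // out_w, ((j + 1) * w + out_w - 1) // out_w
            vals = [x[yy][xx] for yy in range(y0, y1) for xx in range(x0, x1)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def upsample_nearest(x: Map, out_h: int, out_w: int) -> Map:
    """Nearest-neighbor upsampling (a simplification of bilinear interpolation)."""
    h, w = len(x), len(x[0])
    return [[x[(i * h) // out_h][(j * w) // out_w] for j in range(out_w)]
            for i in range(out_h)]


def fuse(contexts: List[Map], mix: List[List[float]]) -> Map:
    """Per-pixel weights = sigmoid(linear mix of all contexts); weighted sum."""
    h, w = len(contexts[0]), len(contexts[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [c[y][x] for c in contexts]
            for k, v in enumerate(vals):
                score = sum(m * u for m, u in zip(mix[k], vals))
                weight = 1.0 / (1.0 + math.exp(-score))
                fused[y][x] += weight * v
    return fused


def global_deepening(x: Map, ratios: Sequence[float] = (0.25, 0.5, 1.0),
                     mix: Optional[List[List[float]]] = None) -> Map:
    """Pool at aspect-ratio-preserving scales, upsample back, fuse adaptively."""
    h, w = len(x), len(x[0])
    contexts = []
    for r in ratios:
        ph, pw = max(1, round(h * r)), max(1, round(w * r))
        contexts.append(upsample_nearest(adaptive_avg_pool(x, ph, pw), h, w))
    if mix is None:
        # Zero scores give every context a weight of sigmoid(0) = 0.5.
        mix = [[0.0] * len(ratios) for _ in ratios]
    return fuse(contexts, mix)
```

Scaling both dimensions by the same ratio is what keeps the pooled maps at the input's aspect ratio; in a trained model the mixing weights would be learned rather than supplied by hand.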

[0077] As another alternative implementation, as shown in Figure 5, the detail enhancement module includes an input layer, a first multilayer perceptron, a layer normalization layer, a second multilayer perceptron, and an output layer. Both the first and second multilayer perceptrons include a first fully connected layer, a ReLU activation function, and a second fully connected layer. The outputs of the first and second multilayer perceptrons serve as the input to the output layer. Target feature maps whose sizes fall between the maximum and minimum dimensions in the multi-level feature maps are input to the input layer of the detail enhancement module, processed sequentially by the first multilayer perceptron, the layer normalization layer, and the second multilayer perceptron, and finally output as the first-level detail features through the output layer.

[0078] In this embodiment, the detail enhancement module is a small feature extraction network built from a multilayer perceptron, used to enhance some details. Specifically, it uses two multilayer perceptron blocks connected across layers. The structure of the multilayer perceptron blocks is not fixed. They are usually connected by alternating fully connected layers and activation functions. The number of layers can be adjusted according to the actual situation.
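The two-perceptron structure with a cross-layer connection can be sketched on a single feature vector. The text says both perceptron outputs feed the output layer but not how they are combined, so the element-wise sum below is an assumption; the names `linear`, `mlp`, `layer_norm`, and `detail_enhance` are illustrative.

```python
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]
Params = Tuple[Mat, Vec, Mat, Vec]  # (w1, b1, w2, b2) of one perceptron block


def linear(x: Vec, w: Mat, b: Vec) -> Vec:
    """Fully connected layer: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def mlp(x: Vec, w1: Mat, b1: Vec, w2: Mat, b2: Vec) -> Vec:
    """Fully connected -> ReLU -> fully connected."""
    hidden = [max(0.0, v) for v in linear(x, w1, b1)]
    return linear(hidden, w2, b2)


def layer_norm(x: Vec, eps: float = 1e-5) -> Vec:
    """Normalize one vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


def detail_enhance(x: Vec, p1: Params, p2: Params) -> Vec:
    """MLP -> LayerNorm -> MLP; both MLP outputs reach the output layer."""
    y1 = mlp(x, *p1)
    y2 = mlp(layer_norm(y1), *p2)
    # ASSUMPTION: the output layer sums the two perceptron outputs.
    return [a + b for a, b in zip(y1, y2)]
```

In a feature map, the same function would be applied independently at every spatial position, with the parameters shared across positions.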

[0079] As another alternative implementation, as shown in Figure 6, the local enhancement module includes an input layer, an adaptive filter, a Transformer decoder, and an output layer. This Transformer decoder has learnable parameters different from those of the main Transformer codec group, so that it can observe a wider variety of detail features.

[0080] As another alternative implementation, as shown in Figure 2, block decoding processing is performed on the output features and third-level detail features of the Transformer decoder. This allows the smallest-size feature map, i.e., the most deeply fused feature map, to be reconstructed from the sequence into an image form through the output of the Transformer decoder. Transposed convolution can be used to upsample the decoded features, first-level detail features, and second-level detail features obtained from block decoding, resulting in an enhanced optical image. Furthermore, learning can be performed simultaneously with upsampling, achieving better restoration results compared to interpolation-based upsampling. At each level of upsampling, the previously deepened multi-level feature maps are fused until the final enhanced output image is obtained. Learnable parameters are used to control the fusion process to achieve a more suitable fusion effect.
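Two mechanical pieces of this step can be illustrated without any learned weights: turning an image into a block sequence and back (block encoding and block decoding), and the standard output-size formula for a transposed convolution. The block size, the row-major block order, and the function names are assumptions for illustration; the application does not specify them.

```python
from typing import List

Image = List[List[float]]


def to_patches(img: Image, p: int) -> List[List[float]]:
    """Split an image into non-overlapping p x p blocks, flattened row-major."""
    h, w = len(img), len(img[0])
    assert h % p == 0 and w % p == 0, "image size must be a multiple of p"
    return [[img[y + i][x + j] for i in range(p) for j in range(p)]
            for y in range(0, h, p) for x in range(0, w, p)]


def from_patches(seq: List[List[float]], h: int, w: int, p: int) -> Image:
    """Inverse of to_patches: rebuild the h x w image from the block sequence."""
    img = [[0.0] * w for _ in range(h)]
    cols = w // p
    for idx, patch in enumerate(seq):
        y0, x0 = (idx // cols) * p, (idx % cols) * p
        for k, v in enumerate(patch):
            img[y0 + k // p][x0 + k % p] = v
    return img


def transposed_conv_size(n: int, kernel: int, stride: int,
                         padding: int = 0, output_padding: int = 0) -> int:
    """Output size of a transposed convolution (dilation 1)."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding
```

For example, a transposed convolution with kernel 4, stride 2, and padding 1 maps a side length of 128 to 256, which is one common way to undo a stride-2 downsampling stage while keeping the upsampling learnable.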

[0081] Finally, to improve the overall practicality of marine life image processing, this application provides another embodiment, which may include the following:

[0082] This embodiment pre-trains an image enhancement model. The network architecture of this model is an end-to-end network. By inputting the optical image to be processed, such as the original underwater image, into the image enhancement model, the enhanced optical image can be directly obtained. The main structure of the image enhancement model uses convolution and Transformer for processing, and incorporates multi-level detail enhancement modules for image enhancement. Optionally, the image enhancement model includes an input layer, a downsampling model, a feature extraction model, a Transformer encoder, a Transformer decoder, a detail enhancement module, a global deepening module, a local enhancement module, an upsampling model, and an output layer.

[0083] A1: The original optical image to be enhanced undergoes some preprocessing, such as noise reduction and size normalization, and is then input into the input layer of the image enhancement model.

[0084] A2: Use the image enhancement model to enhance the degraded optical image.

[0085] A3: Output the final enhanced result image.

[0086] For A1 above, since the size of the input part of the network of the image enhancement model is fixed, it is necessary to preprocess the optical images to be enhanced (such as underwater optical images) obtained from various sources to make the size of the input images appropriate.

[0087] For A2 above, the MBConv modules of the downsampling model in the above embodiments are used to perform step-by-step downsampling of the optical image to be processed through dynamic convolution. Then, the feature extraction model in the above embodiments is used to perform global feature extraction on each initial image obtained by step-by-step downsampling, resulting in feature maps of multiple scales. The multiple feature maps of different sizes of each initial image constitute multi-level feature maps. The multi-level feature maps are input to the Transformer encoder. The target feature maps in the multi-level feature maps whose size is between the maximum and minimum size are input to the detail enhancement module to extract the first level of detail features. The multi-level feature maps are input to the global deepening module to extract the second level of detail features. The output features of the Transformer encoder are also input to the local enhancement module to extract the third level of detail features. Block decoding processing is performed on the output features of the Transformer decoder and the third level of detail features. Transposed convolution is used to upsample the decoded features obtained by block decoding, the first level of detail features, and the second level of detail features to obtain the enhanced optical image. The above multi-level feature extraction modules, together with a main body that adopts multi-level upsampling and downsampling, encoding and decoding, and the Transformer encoder-decoder group, form the entire image enhancement model. The enhanced image obtained after passing through the main enhancement network can provide a clear and high-quality image for subsequent downstream vision tasks.

[0088] This application also provides a corresponding apparatus for marine life image processing methods, further enhancing the practicality of the methods. The apparatus can be described from both functional module and hardware perspectives. The marine life image processing apparatus provided in this application is described below. This apparatus is used to implement the marine life image processing method provided in this application. In this embodiment, the marine life image processing apparatus may include or be divided into one or more program modules. These one or more program modules are stored in a storage medium and executed by one or more processors to complete the marine life image processing method disclosed in Embodiment 1. The program module referred to in this application is a series of computer program instruction segments capable of performing specific functions, which is more suitable than the program itself for describing the execution process of the marine life image processing apparatus in the storage medium. The following description will specifically introduce the functions of each program module in this embodiment. The marine life image processing apparatus described below can be referred to in correspondence with the marine life image processing method described above.

[0089] From the perspective of functional modules, see Figure 7, which is a structural diagram of one specific embodiment of the marine life image processing apparatus provided in this application. The apparatus may include:

[0090] The step-by-step downsampling module 701 is used to perform step-by-step downsampling on the optical image to be processed using convolution modules with different convolution kernels to obtain multiple initial images of different scales.

[0091] The multi-level feature map extraction module 702 is used to extract global features at different scales for each initial image to obtain multi-level feature maps.

[0092] Encoding module 703 is used to input multi-level feature maps into the Transformer encoder.

[0093] The detail feature extraction module 704 is used to extract multi-level detail features from multi-level feature maps using a multi-level feature deepening extraction and fusion method.

[0094] Image enhancement module 705 is used to upsample the output features of the Transformer decoder and the multi-level detail features to obtain an enhanced optical image.

[0095] Optionally, in some embodiments of this example, the step-by-step downsampling module 701 described above can also be used for: pre-training a downsampling model, the downsampling model including a first MBConv module, a second MBConv module and a third MBConv module, the first MBConv module having a 5*5 convolution kernel and a stride of 2; the second MBConv module having a 5*5 convolution kernel and a stride of 1; and the third MBConv module having a 3*3 convolution kernel and a stride of 1; and calling each MBConv module of the downsampling model to perform step-by-step downsampling of the optical image to be processed.
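To make the kernel and stride settings concrete, the following sketch applies the standard convolution output-size formula to the three stages in sequence. The kernel sizes and strides come from the text; the padding is not stated there, so both padding choices shown, the chaining of the three modules, and the names `conv_out_size` and `stage_sizes` are assumptions for illustration.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard output-size formula for a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride) of the first, second and third MBConv modules, per the text.
MBCONV_STAGES = [(5, 2), (5, 1), (3, 1)]


def stage_sizes(size: int, same_padding: bool) -> list:
    """Spatial size after each stage when the stages are applied in sequence.

    same_padding=True pads by kernel // 2 (assumption);
    same_padding=False uses no padding (assumption).
    """
    sizes = []
    for kernel, stride in MBCONV_STAGES:
        padding = kernel // 2 if same_padding else 0
        size = conv_out_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


print(stage_sizes(256, same_padding=True))   # [128, 128, 128]
print(stage_sizes(256, same_padding=False))  # [126, 122, 120]
```

Under the "same" padding assumption only the stride-2 module changes the resolution, whereas without padding every stage shrinks the image slightly; which behaviour produces the "different scales" is a detail the text leaves open.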

[0096] As an optional implementation, the multi-level feature map extraction module 702 can also be used to: pre-train a feature extraction model, which includes a first dynamic convolutional layer, a residual block, and a second dynamic convolutional layer in sequence according to the data transmission method; the residual block includes a first convolutional layer, a second convolutional layer, a batch normalization layer, and an activation function layer; the first convolutional layer and the second convolutional layer do not change the size of each initial image; and call the feature extraction model to perform global feature extraction on each initial image to obtain feature maps of multiple scales, wherein multiple feature maps of different sizes of each initial image constitute a multi-level feature map.
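The residual block can be illustrated with a single-channel, pure-Python sketch. The text lists the layers (two size-preserving convolutions, batch normalization, an activation function) but not their exact wiring, so the order used here, the skip connection added at the end, the choice of ReLU, and the whole-map normalization standing in for batch normalization are all assumptions. The dynamic convolutional layers before and after the block are not modelled.

```python
import math


def conv2d_same(x, kernel):
    """Single-channel 2-D cross-correlation with zero padding.

    For odd kernel sizes the output has the same height and width as x.
    """
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < h and 0 <= c < w:
                        acc += x[r][c] * kernel[a][b]
            out[i][j] = acc
    return out


def normalize(x, eps=1e-5):
    """Zero-mean, unit-variance scaling over the map (stand-in for batch norm)."""
    vals = [v for row in x for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    scale = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * scale for v in row] for row in x]


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def residual_block(x, k1, k2):
    """conv -> conv -> normalize -> ReLU, then add the input back (assumed wiring)."""
    y = conv2d_same(x, k1)
    y = conv2d_same(y, k2)
    y = relu(normalize(y))
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]
```

Because both convolutions preserve the spatial size, the skip connection can add the input and the processed map element by element without any resizing.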

[0097] Optionally, in some embodiments of this example, the above-mentioned detail feature extraction module 704 can also be used to: input each target feature map whose size is between the maximum and minimum size in the multi-level feature map to the detail enhancement module to extract the first level detail features; input the multi-level feature map to the global enhancement module to extract the second level detail features; and input the output features of the Transformer encoder to the local enhancement module to extract the third level detail features.

[0098] As an optional implementation of the above embodiments, the image enhancement module 705 can also be used to: perform block decoding processing on the output features of the Transformer decoder and the third-level detail features; and use transposed convolution to upsample the decoded features obtained by block decoding, the first-level detail features and the second-level detail features to obtain an enhanced optical image.
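Transposed convolution, the upsampling primitive named above, can be shown in one dimension. This is a minimal sketch with no padding; the function name `conv_transpose1d` is hypothetical, and how the decoded features and the first-level and second-level detail features are combined before upsampling is not specified in the text and is not modelled here.

```python
def conv_transpose1d(x, kernel, stride):
    """1-D transposed convolution without padding.

    Each input sample scatters a scaled copy of the kernel into the output,
    so the output length is (len(x) - 1) * stride + len(kernel).
    """
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out


# A kernel of ones with stride 2 doubles the length by repeating each sample.
print(conv_transpose1d([1.0, 2.0, 3.0], [1.0, 1.0], stride=2))
# [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

In a trained network the kernel weights are learned, so the layer can produce a sharper interpolation than fixed sample repetition.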

[0099] As another optional implementation of the above embodiments, the detail feature extraction module 704 can also be used as follows: the global enhancement module includes an adaptive pooling layer, a convolutional layer module, and an adaptive spatial fusion layer; the convolutional layer module includes multiple parallel convolutional layers; the adaptive spatial fusion layer includes a first branch, a second branch, and a weight aggregation layer; the first branch includes a concatenation layer, a fourth convolutional layer, a fifth convolutional layer, and a sigmoid function layer. The adaptive pooling layer performs ratio-invariant adaptive pooling operations on the multi-level feature maps to generate multiple context features of different scales; each convolutional layer of the convolutional layer module independently performs convolution processing on one input context feature; the convolution processing results of the context features are simultaneously input to the first branch and the second branch, and are aggregated through the weight aggregation layer into a fused feature map that serves as the second-level detail feature.
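The pooling-then-fusion idea in this paragraph can be sketched in pure Python on a single-channel map. Several simplifications are assumptions rather than details from the text: "ratio-invariant" is read as keeping the input's aspect ratio at each pooled scale, the per-context convolutions are omitted, a scalar logit per context stands in for the output of the first branch, and nearest-neighbour resizing brings the contexts back to a common size before the weighted sum. All function names are hypothetical.

```python
import math


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def adaptive_avg_pool2d(x, out_h, out_w):
    """Average-pool x to exactly out_h x out_w using adaptive bin boundaries."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(out_h):
        r0, r1 = (i * h) // out_h, ceil_div((i + 1) * h, out_h)
        row = []
        for j in range(out_w):
            c0, c1 = (j * w) // out_w, ceil_div((j + 1) * w, out_w)
            cells = [x[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def multi_scale_contexts(x, ratios):
    """Pool at several scales while keeping the aspect ratio (assumed reading)."""
    h, w = len(x), len(x[0])
    return [adaptive_avg_pool2d(x, max(1, int(h * r)), max(1, int(w * r)))
            for r in ratios]


def upsample_nearest(x, out_h, out_w):
    h, w = len(x), len(x[0])
    return [[x[(i * h) // out_h][(j * w) // out_w] for j in range(out_w)]
            for i in range(out_h)]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def fuse(contexts, logits, out_h, out_w):
    """Weight each resized context by sigmoid(logit) and sum the results."""
    fused = [[0.0] * out_w for _ in range(out_h)]
    for ctx, logit in zip(contexts, logits):
        weight = sigmoid(logit)  # stand-in for the first branch's sigmoid output
        resized = upsample_nearest(ctx, out_h, out_w)
        for i in range(out_h):
            for j in range(out_w):
                fused[i][j] += weight * resized[i][j]
    return fused
```

With a 4x8 input and ratios 0.5 and 0.25 the contexts are 2x4 and 1x2, so both keep the 1:2 aspect ratio of the input.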

[0100] As another optional implementation of the above embodiments, the detail feature extraction module 704 can also be used in the following ways: the detail enhancement module includes an input layer, a first multilayer perceptron, a layer normalization layer, a second multilayer perceptron, and an output layer; both the first and second multilayer perceptrons include a first fully connected layer, a ReLU activation function, and a second fully connected layer; the output of the first and second multilayer perceptrons serves as the input to the output layer. The target feature maps whose sizes are between the maximum and minimum sizes in the multi-level feature maps are input to the input layer of the detail enhancement module, processed sequentially by the first multilayer perceptron, the layer normalization layer, and the second multilayer perceptron, and finally output as first-level detail features through the output layer.
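The detail enhancement module described above can be sketched on a single feature vector. The layer sequence (first multilayer perceptron, layer normalization, second multilayer perceptron) follows the text; how the output layer combines the two perceptron outputs is not specified, so the element-wise sum used here is an assumption, as are the function names and the omission of learned scale and shift in the layer normalization.

```python
import math


def linear(x, weight, bias):
    """Fully connected layer: y = W x + b, with weight given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def mlp(x, params):
    """First fully connected layer -> ReLU -> second fully connected layer."""
    (w1, b1), (w2, b2) = params
    hidden = [max(0.0, v) for v in linear(x, w1, b1)]
    return linear(hidden, w2, b2)


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def detail_enhance(x, mlp1_params, mlp2_params):
    """MLP1 -> layer norm -> MLP2; both MLP outputs feed the output layer."""
    first = mlp(x, mlp1_params)
    second = mlp(layer_norm(first), mlp2_params)
    # Assumption: the output layer adds the two perceptron outputs.
    return [a + b for a, b in zip(first, second)]
```

Feeding the first perceptron's output to the output layer alongside the second's acts like a skip connection, which lets the module refine the features without losing the unnormalized signal.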

[0101] The functions of each functional module of the marine biological image processing device described in this application can be specifically implemented according to the methods in the above method embodiments. The specific implementation process can be referred to the relevant descriptions in the above method embodiments, which will not be repeated here.

[0102] As can be seen from the above, this embodiment can solve the problem of severe underwater image degradation or blurred details, and effectively improve the image enhancement effect of marine life images.

[0103] The marine life image processing device mentioned above is described from the perspective of functional modules. Furthermore, this application also provides an electronic device, which is described from the perspective of hardware. Figure 8 is a schematic diagram of the structure of the electronic device provided in one embodiment of this application. As shown in Figure 8, the electronic device includes a memory 80 for storing a computer program, and a processor 81 for executing the computer program to implement the steps of the marine life image processing method mentioned in any of the above embodiments.

[0104] The processor 81 may include one or more processing cores, such as a quad-core processor or an octa-core processor. The processor 81 may also be a controller, microcontroller, microprocessor, or other data processing chip. The processor 81 may be implemented using at least one hardware form selected from DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 81 may also include a main processor and a coprocessor. The main processor, also known as a CPU (Central Processing Unit), is used to process data in the wake-up state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, the processor 81 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, the processor 81 may also include an AI (Artificial Intelligence) processor, which is used to handle computational operations related to machine learning.

[0105] The memory 80 may include one or more computer-readable storage media, which may be non-transitory. The memory 80 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In some embodiments, the memory 80 may be an internal storage unit of an electronic device, such as a server hard drive. In other embodiments, the memory 80 may be an external storage device of an electronic device, such as a plug-in hard drive on a server, a smart media card (SMC), a secure digital card (SD), a flash card, etc. Furthermore, the memory 80 may include both internal and external storage units of the electronic device. The memory 80 can be used not only to store application software and various types of data installed on the electronic device, such as code for programs executing the marine life image processing method, but also to temporarily store data that has been output or will be output. In this embodiment, the memory 80 is used to store at least the following computer program 801, which, after being loaded and executed by the processor 81, is capable of implementing the relevant steps of the marine life image processing method disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 80 may also include an operating system 802 and data 803, and the storage method may be temporary storage or permanent storage. The operating system 802 may include Windows, Unix, Linux, etc. The data 803 may include, but is not limited to, data corresponding to the results of marine biological image processing.

[0106] In some embodiments, the aforementioned electronic device may further include a display screen 82, an input/output interface 83, a communication interface 84 (or network interface), a power supply 85, and a communication bus 86. The display screen 82 and the input/output interface 83, such as a keyboard, are user interfaces; optional user interfaces may also include standard wired interfaces, wireless interfaces, etc. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, or an OLED (Organic Light-Emitting Diode) touchscreen, etc. The display may also be appropriately referred to as a display screen or display unit, and is used to display information processed in the electronic device and to display a visual user interface. The communication interface 84 may optionally include a wired interface and/or a wireless interface, such as a Wi-Fi interface, a Bluetooth interface, etc., typically used to establish communication connections between the electronic device and other electronic devices. The communication bus 86 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc. This bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, the bus is represented by a single thick line in Figure 8, but this does not mean that there is only one bus or one type of bus.

[0107] Those skilled in the art will understand that the structure shown in Figure 8 does not constitute a limitation on the electronic device, which may include more or fewer components than shown, such as sensors 87 that perform various functions.

[0108] The functions of each functional module of the electronic device described in this application can be specifically implemented according to the methods in the above method embodiments. The specific implementation process can be referred to the relevant descriptions in the above method embodiments, and will not be repeated here.

[0109] As can be seen from the above, this embodiment can solve the problem of severe underwater image degradation or blurred details, and effectively improve the image enhancement effect of marine life images.

[0110] It is understood that if the marine life image processing method in the above embodiments is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the related technology, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and executes all or part of the steps of the methods in the various embodiments of this application. The aforementioned storage medium includes: USB flash drive, mobile hard disk, read-only memory (ROM), random access memory (RAM), electrically erasable programmable ROM, register, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, removable disk, CD-ROM, magnetic disk or optical disk, and other media capable of storing program code.

[0111] Based on this, this application also provides a readable storage medium storing a computer program, which, when executed by a processor, performs the steps of the marine life image processing method described in any of the above embodiments.

[0112] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from the other embodiments. For parts that are similar or identical between embodiments, the embodiments can be referred to one another. Because the hardware disclosed in the embodiments, including the devices and the electronic equipment, corresponds to the methods disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the method section.

[0113] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0114] The above provides a detailed description of a marine life image processing method, apparatus, electronic device, and readable storage medium provided in this application. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the embodiments above are merely for the purpose of helping to understand the method and its core ideas. It should be noted that those skilled in the art can make various improvements and modifications to this application without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of this application.

Claims

1. A method for processing marine organism images, characterized in that the method includes: The optical image to be processed is downsampled stepwise using convolutional modules with different kernels to obtain multiple initial images at different scales. The optical image to be processed is an optical image with degradation or blurred details. A downsampling model is pre-trained. The downsampling is performed by a downsampling model containing a first MBConv module, a second MBConv module, and a third MBConv module. The first MBConv module has a 5*5 convolutional kernel with a stride of 2, the second MBConv module has a 5*5 convolutional kernel with a stride of 1, and the third MBConv module has a 3*3 convolutional kernel with a stride of 1. Global feature extraction at different scales is performed on each initial image to obtain multi-level feature maps. A feature extraction model is pre-trained, and the global feature extraction is performed through this model. The feature extraction model, in the order of data transmission, includes a first dynamic convolutional layer, a residual block, and a second dynamic convolutional layer. The residual block includes a first convolutional layer, a second convolutional layer, a batch normalization layer, and an activation function layer. The first and second convolutional layers do not change the size of each initial image.
The multi-level feature maps are input to a Transformer encoder, and the target feature maps whose sizes are between the maximum and minimum sizes in the multi-level feature maps are input to a detail enhancement module to extract the first-level detail features; the multi-level feature maps are input to a global enhancement module to extract the second-level detail features; the output features of the Transformer encoder are also input to a local enhancement module to extract the third-level detail features; the detail enhancement module includes an input layer, a first multilayer perceptron, a layer normalization layer, a second multilayer perceptron, and an output layer; both the first and second multilayer perceptrons include a first fully connected layer, a ReLU activation function, and a second fully connected layer; the outputs of the first and second multilayer perceptrons are used as the inputs to the output layer; the global enhancement module includes an adaptive pooling layer, a convolutional layer module, and an adaptive spatial fusion layer; the convolutional layer module includes multiple parallel convolutional layers; the adaptive spatial fusion layer includes a first branch, a second branch, and a weight aggregation layer; the first branch includes a concatenation layer, a fourth convolutional layer, a fifth convolutional layer, and a sigmoid function layer; the output features of the Transformer decoder and the multi-level detail features are upsampled to obtain the enhanced optical image.

2. The marine organism image processing method according to claim 1, characterized in that the stepwise downsampling of the optical image to be processed using convolutional modules with different kernels to obtain multiple initial images at different scales includes: invoking the MBConv modules of the downsampling model to perform step-by-step downsampling of the optical image to be processed.

3. The marine organism image processing method according to claim 1, characterized in that the extraction of global features at different scales from each initial image to obtain multi-level feature maps includes: invoking the feature extraction model to perform global feature extraction on each initial image to obtain feature maps at multiple scales, wherein the multiple feature maps of different sizes of each initial image constitute a multi-level feature map.

4. The marine organism image processing method according to claim 1, characterized in that the upsampling of the output features of the Transformer decoder and the multi-level detail features to obtain the enhanced optical image includes: performing block decoding processing on the output features of the Transformer decoder and the third-level detail features; and upsampling, by transposed convolution, the decoded features obtained from block decoding, the first-level detail features, and the second-level detail features to obtain the enhanced optical image.

5. The marine organism image processing method according to claim 1, characterized in that inputting the multi-level feature maps into the global enhancement module to extract the second-level detail features includes: using the adaptive pooling layer to perform a ratio-invariant adaptive pooling operation on the multi-level feature maps to generate multiple context features of different scales; using each convolutional layer of the convolutional layer module to independently perform convolution processing on one input context feature; and inputting the convolution processing results of the context features simultaneously into the first branch and the second branch, and aggregating them through the weight aggregation layer into a fused feature map that serves as the second-level detail feature.

6. The marine organism image processing method according to claim 1, characterized in that inputting the target feature maps whose sizes are between the maximum and minimum sizes in the multi-level feature maps into the detail enhancement module to extract the first-level detail features includes: inputting those target feature maps to the input layer of the detail enhancement module, processing them sequentially through the first multilayer perceptron, the layer normalization layer, and the second multilayer perceptron, and finally outputting them through the output layer as the first-level detail features.

7. A marine organism image processing device, characterized in that the device includes: A progressive downsampling module is used to progressively downsample the optical image to be processed using convolutional modules with different kernels to obtain multiple initial images at different scales. The optical image to be processed is an optical image with degradation or blurred details. A downsampling model is pre-trained, and the downsampling is performed using a downsampling model containing a first MBConv module, a second MBConv module, and a third MBConv module. The first MBConv module has a 5*5 kernel with a stride of 2, the second MBConv module has a 5*5 kernel with a stride of 1, and the third MBConv module has a 3*3 kernel with a stride of 1. A multi-level feature map extraction module is used to extract global features at different scales from each initial image to obtain multi-level feature maps. A pre-trained feature extraction model is used to perform global feature extraction. The feature extraction model, in the order of data transmission, includes a first dynamic convolutional layer, a residual block, and a second dynamic convolutional layer. The residual block includes a first convolutional layer, a second convolutional layer, a batch normalization layer, and an activation function layer. The first and second convolutional layers do not change the size of each initial image. The encoding module is used to input the multi-level feature maps into the Transformer encoder. The detail feature extraction module is used to input the target feature maps whose sizes are between the maximum and minimum sizes in the multi-level feature maps to the detail enhancement module to extract the first-level detail features; input the multi-level feature maps to the global enhancement module to extract the second-level detail features; and input the output features of the Transformer encoder to the local enhancement module to extract the third-level detail features.
The detail enhancement module includes an input layer, a first multilayer perceptron, a layer normalization layer, a second multilayer perceptron, and an output layer. The first multilayer perceptron and the second multilayer perceptron each include a first fully connected layer, a ReLU activation function, and a second fully connected layer. The outputs of the first multilayer perceptron and the second multilayer perceptron serve as the inputs to the output layer. The global enhancement module includes an adaptive pooling layer, a convolutional layer module, and an adaptive spatial fusion layer. The convolutional layer module includes multiple convolutional layers in parallel. The adaptive spatial fusion layer includes a first branch, a second branch, and a weight aggregation layer. The first branch includes a concatenation layer, a fourth convolutional layer, a fifth convolutional layer, and a sigmoid function layer. The image enhancement module is used to upsample the output features of the Transformer decoder and the multi-level detail features to obtain an enhanced optical image.

8. An electronic device, characterized in that it includes a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the steps of the marine life image processing method as described in any one of claims 1 to 6.

9. A readable storage medium, characterized in that the readable storage medium stores a computer program that, when executed by a processor, implements the steps of the marine life image processing method as described in any one of claims 1 to 6.