Shipwreck target sample generation method and system for side scan sonar image based on improved CycleGAN model

By improving the global and local feature extraction and cross-attention mechanism of the CycleGAN model, and combining VGG perceptual loss and SSIM loss, the problem of scarce annotations in side-scan sonar images is solved, generating high-quality shipwreck target samples and improving detection performance.

CN122223352A · Pending · Publication Date: 2026-06-16 · ANHUI UNIVERSITY OF ARCHITECTURE

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: ANHUI UNIVERSITY OF ARCHITECTURE
Filing Date: 2026-03-16
Publication Date: 2026-06-16

Application Information

IPC: G06V10/40; G06V10/54; G06V10/80; G06V10/82; G06N3/0475; G06N3/045; G06N3/094

AI Tagging

Application Domain

Character and pattern recognition; Biological models

Abstract

The application belongs to the technical field of image processing, and particularly relates to a side-scan sonar image shipwreck target sample generation method and system based on an improved CycleGAN model. The method comprises the following steps: optical ship images and side-scan sonar shipwreck images are obtained and preprocessed to generate a sample set; an improved generative adversarial network model is constructed; the sample set is input into the model for training, global features and local features of the shipwreck in the optical images are extracted by a first generator, and bidirectional fusion is performed through a cross-attention mechanism to generate a pseudo side-scan sonar image; sonar echo features of the pseudo image and the real image are extracted based on a VGG network to obtain a perceptual loss, and model parameters are optimized in combination with an SSIM loss; an optical image to be converted is input into the trained model, and a shipwreck target sample is output. The application addresses the lack of sonar image samples; the generated samples have real sonar imaging features, and underwater target detection performance can be effectively improved.

Description

Technical Field

[0001] This invention belongs to the field of image processing technology, specifically relating to a method and system for generating shipwreck target samples from side-scan sonar images based on an improved CycleGAN model.

Background Technology

[0002] Side-scan sonar is an important underwater detection device that generates sonar images by emitting sound waves and receiving echoes from the seabed. It is widely used in underwater target detection, seabed mapping, and other fields. With the rapid development of deep learning technology, target detection methods based on convolutional neural networks have achieved remarkable results in optical image processing. However, applying them to shipwreck target detection in side-scan sonar images faces severe challenges: acquiring annotated sonar image data is costly, requiring specialized domain knowledge for interpretation, making it difficult to construct large-scale, high-quality annotated datasets; traditional data augmentation methods generate samples with limited representativeness and insufficient generalization ability; existing generative adversarial network models suffer from insufficient feature transfer, blurred details in generated images, and insufficient semantic consistency in cross-domain image transformation, making it difficult to meet the requirements for generating high-quality samples.

[0003] To address the data scarcity problem in sonar image processing, existing technologies have proposed several semi-supervised learning methods. For example, Chinese patent application CN116129117A discloses a semi-supervised semantic segmentation method and system for small sonar targets based on multi-head attention. This method introduces the multi-head attention mechanism into a recurrent generative adversarial network (RGAN) and applies the resulting RGAN to a semi-supervised semantic segmentation network model to improve the semi-supervised semantic segmentation effect for small sonar targets. However, this technical solution mainly addresses the semantic segmentation problem of sonar images, aiming to segment small targets from the background rather than generating new sonar image samples. The multi-head self-attention mechanism it employs models long-distance feature dependencies on a single feature path, failing to adequately address the disconnect between global semantic features and local detail features during the transfer from the optical domain to the sonar domain. Therefore, achieving high-quality cross-domain conversion from optical images to side-scan sonar images to generate shipwreck target samples with realistic sonar imaging characteristics remains a pressing technical problem in this field.

Summary of the Invention

[0004] The purpose of this invention is to provide a method and system for generating shipwreck target samples from side-scan sonar images based on an improved CycleGAN model, in order to solve the technical problem in the prior art that it is difficult to generate high-quality shipwreck target samples due to the scarcity of labeled samples in side-scan sonar images and the insufficient feature transfer, blurred details, and insufficient semantic consistency in cross-domain generation methods.

[0005] The present invention achieves the above objectives through the following technical solutions. Firstly, this invention proposes a method for generating shipwreck target samples from side-scan sonar images based on an improved CycleGAN model. The method includes: acquiring optical ship images and side-scan sonar shipwreck images and preprocessing them to generate a sample set; constructing an improved generative adversarial network model, the model including a first generator for forward transformation, a second generator for reverse transformation, and a first discriminator and a second discriminator corresponding to the two generators respectively; inputting the sample set into the model for training, wherein the first generator extracts the hull topology and hull plate texture features of the shipwreck in the optical ship image and fuses them to generate a pseudo side-scan sonar image, the second generator converts the side-scan sonar shipwreck image into a pseudo optical image, and the first discriminator and the second discriminator respectively judge the authenticity of the generated images; extracting high-level semantic features of the pseudo-sonar images and the real sonar images based on a VGG network to obtain a perceptual loss, and obtaining an SSIM loss based on structural similarity; optimizing the model parameters based on the perceptual loss and the SSIM loss to obtain the trained sample generation model; and inputting the optical ship image to be converted into the sample generation model to output the shipwreck target sample.

[0006] Furthermore, the first generator extracts the overall topological structure and aspect ratio features of the shipwreck from the optical ship image as global features $F_g \in \mathbb{R}^{C \times H \times W}$, and extracts the shipwreck's hull plate texture and edge detail features as local features $F_l \in \mathbb{R}^{C \times H \times W}$. The global features are extracted using a 5×5 convolution kernel, and the local features are extracted using a 3×3 convolution kernel.

[0007] Furthermore, the first generator fuses the global features and the local features through a cross-attention mechanism to obtain a fused feature map, and then performs upsampling processing on the fused feature map to initially generate the pseudo-sonar image.

[0008] Furthermore, the global features and the local features are fused through a cross-attention mechanism, specifically including the following. The global features are mapped to a global query matrix $Q_g$, a global key matrix $K_g$, and a global value matrix $V_g$ using a 1×1 convolution, and the local features are mapped to a local query matrix $Q_l$, a local key matrix $K_l$, and a local value matrix $V_l$ through a 1×1 convolution. With the global query matrix $Q_g$ as the semantic query, focusing on the shipwreck's hull outline, aspect ratio, and overall orientation, and with the local key matrix $K_l$ and local value matrix $V_l$ as the detail key and value, the first attention weight matrix is calculated (formula omitted in the source text). The local value matrix $V_l$ is then weighted and summed to obtain the enhanced local features under global guidance (formula omitted in the source text). With the local query matrix $Q_l$ as the detail query, focusing on the deck structure, hull edges, and surface texture of the shipwreck, and with the global key matrix $K_g$ and global value matrix $V_g$ as the structural key and value, the second attention weight matrix is calculated (formula omitted in the source text). The global value matrix $V_g$ is then weighted and summed to obtain the enhanced global features after local correction (formula omitted in the source text). The enhanced local features and the enhanced global features are concatenated along the channel dimension and then compressed through a 1×1 convolution to output a fused feature map (formula omitted in the source text), where Cat() denotes concatenation along the channel dimension, giving an output of size 2C×H×W, and Conv1x1 compresses the channels back to C.
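
The formula images belonging to paragraph [0008] are not present in this text. As a hedged reconstruction only, and assuming the fusion follows standard scaled dot-product attention, the five steps described above could be written as follows. The symbols $A_1$, $A_2$, $\hat{F}_l$, $\hat{F}_g$, $F_{\mathrm{fused}}$ and the scaling factor $\sqrt{d_k}$ are assumptions introduced for illustration and are not taken from the patent.

```latex
\begin{aligned}
A_1 &= \operatorname{softmax}\!\left(\frac{Q_g K_l^{\top}}{\sqrt{d_k}}\right), &
\hat{F}_l &= A_1 V_l, \\
A_2 &= \operatorname{softmax}\!\left(\frac{Q_l K_g^{\top}}{\sqrt{d_k}}\right), &
\hat{F}_g &= A_2 V_g, \\
F_{\mathrm{fused}} &= \operatorname{Conv}_{1\times 1}\big(\operatorname{Cat}(\hat{F}_l, \hat{F}_g)\big).
\end{aligned}
```

Here each matrix is taken to have one row per spatial position (H·W rows), so $A_1$ and $A_2$ are (H·W)×(H·W) weight matrices whose rows sum to 1.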

[0009] Furthermore, the perceptual loss is obtained by extracting the hull outline, deck structure, and bow and stern morphological features of the shipwreck from the pseudo-sonar image and the real sonar image using a pre-trained VGG19 network, constraining the generated image to remain consistent with the real image in terms of the semantic structure of the shipwreck. The perceptual loss is calculated by a formula (omitted in the source text) whose terms are: the real sonar image; $I_{fake}$, the generated pseudo-sonar image; the pre-trained VGG19 network; and the feature extraction function, which is the output of the 20th layer of VGG19.

[0010] Furthermore, the SSIM loss enhances the texture and edge sharpness of the shipwreck's hull by measuring the brightness, contrast, and structural consistency of local regions, making the generated image closely resemble the microstructural features of the shipwreck in real sonar images. The SSIM loss is calculated by a formula (omitted in the source text) whose terms are the number of samples in a training batch and the structural similarity index (SSIM).

[0011] Furthermore, during model training, the pseudo-sonar image is input into the first discriminator for discrimination to obtain a first discrimination result, and the pseudo-optical image and the real optical image are input into the second discriminator for authenticity discrimination to obtain a second discrimination result. Based on the perceptual loss, the SSIM loss, and the first and second discrimination results, the parameters of the first generator, the second generator, the first discriminator, and the second discriminator are alternately optimized until the model converges, resulting in a trained sample generation model.

[0012] Secondly, this invention proposes a system for generating shipwreck target samples from side-scan sonar images based on an improved CycleGAN model, comprising: a preprocessing module, used to acquire optical ship images and side-scan sonar shipwreck images and preprocess them to generate a sample set; a model building module, used to build an improved generative adversarial network model, which includes a first generator for converting optical images into side-scan sonar images, a second generator for converting side-scan sonar images into optical images, and a first discriminator and a second discriminator corresponding to the two generators respectively; a training module, used to input the sample set into the model for training, wherein the first generator extracts the hull topology and hull plate texture features of the shipwreck in the optical ship image and fuses them to generate a pseudo side-scan sonar image, the second generator converts the side-scan sonar shipwreck image into a pseudo optical image, and the first discriminator and the second discriminator respectively judge the authenticity of the generated images; a loss calculation module, used to extract the sonar echo distribution and shadow morphology features of the shipwreck in the pseudo side-scan sonar image and the real side-scan sonar image based on the VGG network to obtain the perceptual loss, and to obtain the SSIM loss based on structural similarity; a parameter optimization module, used to optimize the model parameters based on the perceptual loss and the SSIM loss to obtain the trained sample generation model; and a sample generation module, used to input the optical ship image to be converted into the trained sample generation model and output the side-scan sonar shipwreck target sample.

[0013] Thirdly, the present invention proposes an electronic device, including a processor and a memory, wherein the memory stores a computer program, and the processor executes the program to implement the above-mentioned method for generating side-scan sonar image shipwreck target samples.

[0014] The beneficial effects of this invention are as follows: 1. This invention employs a dual-path feature extraction module, combining global and local approaches, to capture the overall topological structure and detailed texture features of a shipwreck target. A cross-attention mechanism is then used to achieve bidirectional feature fusion and dynamic complementarity. This design effectively solves the problem of disconnect between global semantics and local details in traditional methods, resulting in significantly improved semantic consistency and texture realism in the generated pseudo-sonar images.

[0015] 2. This invention introduces a joint constraint mechanism of VGG perceptual loss and SSIM loss in the loss function design. VGG perceptual loss extracts high-level semantic features through a pre-trained network, ensuring that the generated image maintains semantic consistency with the real sonar image in terms of macroscopic morphology such as hull outline and deck structure. SSIM loss, on the other hand, constrains local regions from three dimensions: brightness, contrast, and structure, effectively enhancing the realism of microscopic details such as hull texture and shadow edges. The dual loss function design enables the generated image to maintain overall structural integrity while possessing texture details closer to the real sonar image, significantly improving the quality and credibility of the generated samples.

Attached Figure Description

[0016] Figure 1 is a flowchart of a method for generating shipwreck target samples from side-scan sonar images in an embodiment of the present invention;
Figure 2 is a diagram of the overall architecture of the DPFI-CycleGAN model in this invention;
Figure 3 is a system block diagram of a side-scan sonar image shipwreck target sample generation system in an embodiment of the present invention;
Figure 4 is a schematic diagram of a cycle-consistent generative adversarial network in an embodiment of the present invention;
Figure 5 is a schematic diagram of the cross-attention mechanism in an embodiment of the present invention;
Figure 6 shows sample sonar images from the experimental cases of this invention;
Figure 7 shows sample optical ship images used in the experimental cases of this invention;
Figure 8 is a comparison of the side-scan sonar images generated in the experimental case of this invention;
Figure 9 is a comparison of the target detection results in the experimental case of this invention.

Detailed Implementation

[0017] The present application will now be described in further detail with reference to the accompanying drawings. It should be noted that the following specific embodiments are only used to further illustrate the present application and should not be construed as limiting the scope of protection of the present application. Those skilled in the art can make some non-essential improvements and adjustments to the present application based on the above application content.

[0018] This invention proposes a modified architecture based on the cycle-consistent generative adversarial network (CycleGAN), namely DPFI-CycleGAN (Dual-Path Feature Interaction CycleGAN). This model, built on the CycleGAN framework, is specifically designed for generating shipwreck target samples from side-scan sonar images. The model constructs parallel feature extraction modules for global and local paths within the generator. The global path uses a large 5×5 convolutional kernel to extract the overall topology and aspect ratio features of the shipwreck, while the local path uses a small 3×3 convolutional kernel to extract the hull plate texture and edge details. A cross-attention mechanism achieves bidirectional interaction and fusion of global semantics and local details, so that the generated pseudo-sonar images maintain both the overall structural integrity of the shipwreck and realistic texture details. Furthermore, the model introduces VGG perceptual loss to constrain the semantic consistency of the generated images and combines it with SSIM loss to enhance the realism of local textures, addressing the insufficient feature transfer and blurred details that traditional generation methods exhibit in cross-domain image transformation.

[0019] Embodiment 1. Referring to Figure 1 and Figure 2, this disclosure proposes a method for generating shipwreck target samples from side-scan sonar images based on an improved CycleGAN model, the method comprising the following steps: S1. Acquire optical ship images and side-scan sonar shipwreck images and preprocess them to generate a sample set.

[0020] Specifically, in this embodiment, the side-scan sonar shipwreck image dataset can be the Ship data from the SeabedObjects public dataset, which contains 700 side-scan sonar shipwreck images. The shipwreck targets in the images have different sizes, attitudes, and seabed backgrounds. The optical ship image dataset can consist of 4,000 representative images selected from the HRSC2016 dataset and the Ships-Aerial-Images dataset, covering different types of ships, shooting angles, and lighting conditions.

[0021] Preprocessing operations include cropping, scaling, and size normalization. Cropping removes the background from the image, retaining the region containing the main features of the ship's hull to reduce background interference. Scaling resizes images that are too large or too small to a uniform size. Size normalization unifies all images to 512×512 pixels to meet the input requirements of the subsequent network model. Optionally, the optical ship image dataset is divided into a training set and a transformation set in an 8:2 ratio, where the training set is used for model training and the transformation set is used to generate pseudo-sonar images; the side-scan sonar shipwreck image dataset is divided into a training set and a test set in a 4:1 ratio, where the training set is used for model training and the test set is used to evaluate the quality of the generated images. Some samples are shown in Figure 6 and Figure 7, where Figure 6 shows sonar image samples and Figure 7 shows optical ship image samples.
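
The split ratios in [0021] can be checked with a short sketch. The file names and the `split_dataset` helper below are hypothetical stand-ins (the patent does not describe how the split is implemented); only the dataset sizes and ratios come from the text.

```python
import random

def split_dataset(items, ratio, seed=0):
    """Shuffle a copy of items and split it into two parts by ratio (a, b)."""
    a, b = ratio
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * a // (a + b)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for the 4,000 optical and 700 sonar images.
optical = [f"optical_{i:04d}.png" for i in range(4000)]
sonar = [f"sonar_{i:03d}.png" for i in range(700)]

optical_train, optical_convert = split_dataset(optical, (8, 2))  # 8:2 split
sonar_train, sonar_test = split_dataset(sonar, (4, 1))           # 4:1 split

print(len(optical_train), len(optical_convert), len(sonar_train), len(sonar_test))
# prints: 3200 800 560 140
```

The resulting 800-image transformation set and 140-image sonar test set match the counts used in the experiment described in [0046].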

[0022] S2. Construct an improved generative adversarial network model, which includes a first generator $G_{os}$ for forward transformation, a second generator $G_{so}$ for inverse transformation, and a first discriminator $D_s$ and a second discriminator $D_o$ corresponding to the two generators respectively.

[0023] In this embodiment, the first generator $G_{os}$ converts optical ship images into side-scan sonar shipwreck images, the second generator $G_{so}$ performs the inverse conversion, the first discriminator $D_s$ judges the authenticity of sonar images, and the second discriminator $D_o$ judges the authenticity of optical images. Figure 4 shows this cycle-consistent network structure. The dual-generator, dual-discriminator architecture forms the basic framework of a cycle-consistent generative adversarial network, enabling cross-domain image transformation even without paired images.

[0024] S3. Input the sample set into the model for training. Extract the hull topology and hull plate texture features of the shipwreck in the optical ship image through the first generator and fuse them to generate a pseudo side-scan sonar image. Convert the side-scan sonar shipwreck image into a pseudo optical image through the second generator. Then, use the first discriminator and the second discriminator to judge the authenticity of the generated image.

[0025] In this step, the training process employs an alternating optimization approach: first, the generator parameters are fixed while the discriminator parameters are optimized, enabling the discriminator to accurately distinguish between real and generated images; then, the discriminator parameters are fixed while the generator parameters are optimized, allowing the generator to generate images that more closely approximate the real distribution. Through this adversarial training mechanism, the generator gradually learns the mapping relationship from the optical domain to the sonar domain.

[0026] Preferably, the generator incorporates a dual-path parallel feature extraction module; please refer to Figure 5. The first generator internally includes global and local feature extraction paths. The global feature extraction path uses a 5×5 convolutional kernel, whose receptive field matches the pixel scale (4-6 pixel units) of the macroscopic geometric features of the shipwreck in the side-scan sonar image. This path extracts the overall topological structure and aspect ratio features of the shipwreck in the optical ship image as global features. The local feature extraction path uses a 3×3 convolutional kernel, whose receptive field matches the pixel scale (2-4 pixel units) of the microscopic scattering texture features of the shipwreck. This path extracts the hull plate texture and edge details as local features. Because the convolutional kernel sizes are chosen for the scale characteristics of side-scan sonar shipwreck images, feature extraction becomes more accurate and captures the multi-scale feature information of the shipwreck target.
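
As a minimal illustration of the dual-path idea, the sketch below applies a 5×5 and a 3×3 kernel to the same single-channel image with zero padding so that both paths keep the same H×W and can later be fused position by position. The random kernels stand in for learned weights, and the single-channel, activation-free setup is a simplifying assumption rather than the patent's actual generator.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive single-channel 2D correlation with zero padding that keeps H x W."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, pad, mode="constant")
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((16, 16))                  # toy stand-in for an optical ship image
global_kernel = rng.standard_normal((5, 5))   # global path: 5x5 receptive field
local_kernel = rng.standard_normal((3, 3))    # local path: 3x3 receptive field

f_global = conv2d_same(image, global_kernel)
f_local = conv2d_same(image, local_kernel)
print(f_global.shape, f_local.shape)
```

Each output pixel of the global path sees a 5×5 neighbourhood and each output pixel of the local path sees a 3×3 neighbourhood, which is the scale difference the paragraph above relies on.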

[0027] It should be noted that, in the context of this invention, global features specifically refer to the overall outline and shape of the shipwreck target. For example, the slender elliptical or spindle-shaped main structure of the hull in sonar images, the directional trend of the bow and stern, and the overall visual skeleton determined by the length-to-width ratio of the hull. These macroscopic features determine the category and basic posture of the shipwreck target. Local features, on the other hand, specifically refer to the fine textural details of the shipwreck surface. For example, the alternating light and dark echo spots formed by the cabin structures on the deck in sonar images, the clear shadow boundary between the ship's edge and the seabed background, and the subtle textural variations at the joints of the hull's steel plates. These microscopic features determine the realism and credibility of the generated image and are key to distinguishing real shipwrecks from blurry artifacts.

[0028] S4. High-level semantic features of pseudo-sonar images and real sonar images are extracted based on VGG network to obtain perceptual loss, and SSIM loss is obtained based on structural similarity.

[0029] In this embodiment, the perceptual loss is achieved by extracting high-level semantic features from the pseudo-sonar image and the real sonar image using a pre-trained VGG19 network. This perceptual loss constrains the generated image to maintain semantic consistency with the real image. Specifically, feature maps from multiple convolutional layers in the VGG19 network are selected, and the L1 distance between the generated image and the real image on these feature maps is calculated as the perceptual loss. This loss function effectively preserves the inherent semantic structure of the shipwreck, such as the hull outline, deck layout, and bow and stern shapes, preventing semantic distortion in the generated image.
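
The L1 feature-map distance described in [0029] can be sketched as follows. The `toy_features` function is a hypothetical stand-in for VGG19 activations (it merely average-pools the image at two scales) so that the example runs without pretrained weights; in practice the feature maps would come from a pre-trained VGG19 network, and the equal weighting of layers is an assumption.

```python
import numpy as np

def perceptual_loss(feats_fake, feats_real):
    """Mean L1 distance between matching feature maps, averaged over layers."""
    assert len(feats_fake) == len(feats_real)
    per_layer = [np.mean(np.abs(f - r)) for f, r in zip(feats_fake, feats_real)]
    return float(np.mean(per_layer))

def toy_features(img):
    """Hypothetical stand-in for VGG19 activations: the image pooled at two scales."""
    h, w = img.shape
    half = img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    quarter = half.reshape(h // 4, 2, w // 4, 2).mean(axis=(1, 3))
    return [half, quarter]

real = np.zeros((8, 8))   # toy "real sonar image"
fake = np.ones((8, 8))    # toy "pseudo-sonar image"
loss = perceptual_loss(toy_features(fake), toy_features(real))
print(loss)
```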

[0030] SSIM loss measures the brightness, contrast, and structural consistency of local regions to enhance the microstructural features of the generated image. The SSIM index is at most 1 and typically falls between 0 and 1 for such image pairs, with values closer to 1 indicating higher structural similarity between the two images. This embodiment uses 1 minus the SSIM index as the loss function so that the generated image closely resembles a shipwreck in a real sonar image in terms of texture details, such as the clarity of the hull texture and the sharpness of shadow edges, thus improving the realism of the generated image.
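
A minimal sketch of the "1 minus SSIM" loss, averaged over the samples in a batch, is given below. It computes SSIM over the whole image as a single window, whereas standard SSIM averages over local sliding windows; the constants 0.01 and 0.03 are the commonly used SSIM defaults. The function names and the single-window simplification are assumptions, not details from the patent.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """SSIM computed over the whole image as one window (simplified)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def ssim_loss(fake_batch, real_batch):
    """Batch mean of 1 - SSIM between generated and real images."""
    vals = [1.0 - ssim_global(f, r) for f, r in zip(fake_batch, real_batch)]
    return float(np.mean(vals))

rng = np.random.default_rng(0)
real_batch = [rng.random((32, 32)) for _ in range(4)]   # batch size 4, as in [0032]
same_loss = ssim_loss(real_batch, real_batch)           # identical images
inverted_loss = ssim_loss([1.0 - r for r in real_batch], real_batch)
print(same_loss, inverted_loss)
```

Identical images give a loss of essentially zero, while contrast-inverted images give a much larger loss, which is the behaviour the training objective depends on.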

[0031] S5. Optimize the model parameters based on the perceptual loss and SSIM loss to obtain the trained sample generation model.

[0032] In this embodiment, the total loss function is composed of adversarial loss, cycle consistency loss, perceptual loss, and SSIM loss. During training, the Adam optimizer is used for parameter updates, with an initial learning rate of 0.0002 and a batch size of 4, for a total of 500 training epochs.
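
The composition of the total loss and the stated hyperparameters can be summarised in a small sketch. The weighting coefficients are hypothetical: the text names the four loss terms but does not give their weights, so the defaults below (including the cycle weight of 10, a common CycleGAN convention) are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class LossWeights:
    # Hypothetical weights; the source does not state them.
    adversarial: float = 1.0
    cycle: float = 10.0
    perceptual: float = 1.0
    ssim: float = 1.0

def total_generator_loss(adv, cyc, perc, ssim, w=None):
    """Weighted sum of the four loss terms named in the embodiment."""
    w = w or LossWeights()
    return w.adversarial * adv + w.cycle * cyc + w.perceptual * perc + w.ssim * ssim

# Hyperparameters stated in the embodiment.
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 2e-4,
    "batch_size": 4,
    "epochs": 500,
}

example = total_generator_loss(adv=0.7, cyc=0.2, perc=0.5, ssim=0.3)
print(example)
```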

[0033] S6. Input the optical ship image to be converted into the sample generation model and output the shipwreck target sample.

[0034] It is understood that in this disclosure, dual-path feature interaction (DPFI) refers to the construction of two parallel feature extraction modules, a global path and a local path, in the generator of a generative adversarial network (GAN), which capture image features at different scales, with bidirectional fusion and dynamic complementarity of the two paths' features achieved through a cross-attention mechanism.

[0035] More specifically, the first generator extracts the overall topological structure and aspect ratio features of the shipwreck from the optical ship image as global features $F_g \in \mathbb{R}^{C \times H \times W}$, and extracts the shipwreck's hull plate texture and edge detail features as local features $F_l \in \mathbb{R}^{C \times H \times W}$. Global features are extracted using a 5×5 convolution kernel, and local features are extracted using a 3×3 convolution kernel. The first generator fuses the global and local features through a cross-attention mechanism to obtain a fused feature map, and then upsamples the fused feature map to initially generate a pseudo-sonar image.

[0036] Preferably, global and local features are fused through a cross-attention mechanism, specifically including: the global features are mapped to the global query matrix $Q_g$, the global key matrix $K_g$, and the global value matrix $V_g$ using a 1×1 convolution, and the local features are mapped to the local query matrix $Q_l$, the local key matrix $K_l$, and the local value matrix $V_l$ through a 1×1 convolution. The purpose of the 1×1 convolution here is to reduce the channel dimension, reduce the computational cost of subsequent attention calculations, and simultaneously achieve feature recalibration.

[0037] Using the global query matrix $Q_g$ as the semantic query, focusing on the shipwreck's hull outline, aspect ratio, and overall orientation, and using the local key matrix $K_l$ and local value matrix $V_l$ as the detail key and value, the first attention weight matrix is calculated (formula omitted in the source text). The local value matrix $V_l$ is then weighted and summed to obtain the enhanced local features under global guidance (formula omitted in the source text). Using the local query matrix $Q_l$ as the detail query, focusing on the deck structure, hull edges, and surface texture of the shipwreck, and using the global key matrix $K_g$ and global value matrix $V_g$ as the structural key and value, the second attention weight matrix is calculated (formula omitted in the source text). The global value matrix $V_g$ is then weighted and summed to obtain the enhanced global features after local correction (formula omitted in the source text). The enhanced local features and enhanced global features are concatenated along the channel dimension and then compressed through a 1×1 convolution to output a fused feature map (formula omitted in the source text), where Cat() denotes concatenation along the channel dimension, giving an output of size 2C×H×W, and Conv1x1 compresses the channels back to C.
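
The bidirectional fusion described in [0036] and [0037] could be sketched in NumPy as below. This is an illustration under stated assumptions, not the patent's implementation: it assumes scaled dot-product attention with a softmax over key positions, uses random matrices in place of learned 1×1 convolution weights, reduces the query and key channels to a hypothetical `d_k` while keeping the value channels at C, and all parameter names (`Wq_g`, `W_out`, and so on) are invented here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def conv1x1(feat, weight):
    """1x1 convolution as a per-pixel linear map: (C_in, H, W) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, feat)

def tokens(feat):
    """Flatten (C, H, W) to (H*W, C) so that each pixel is one token."""
    c, h, w = feat.shape
    return feat.reshape(c, h * w).T

def cross_attend(q_feat, k_feat, v_feat):
    """Attend from the query path to the key/value path of the other branch."""
    q, k, v = tokens(q_feat), tokens(k_feat), tokens(v_feat)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))   # (HW, HW) attention weights
    out = attn @ v                                   # weighted sum of values
    c_v, h, w = v_feat.shape
    return out.T.reshape(c_v, h, w), attn

def dual_path_fusion(f_g, f_l, params):
    q_g, k_g, v_g = (conv1x1(f_g, params[n]) for n in ("Wq_g", "Wk_g", "Wv_g"))
    q_l, k_l, v_l = (conv1x1(f_l, params[n]) for n in ("Wq_l", "Wk_l", "Wv_l"))
    f_l_enh, a1 = cross_attend(q_g, k_l, v_l)   # global query, local key/value
    f_g_enh, a2 = cross_attend(q_l, k_g, v_g)   # local query, global key/value
    cat = np.concatenate([f_l_enh, f_g_enh], axis=0)     # (2C, H, W)
    return conv1x1(cat, params["W_out"]), a1, a2          # compressed back to C

rng = np.random.default_rng(0)
C, H, W, d_k = 8, 4, 4, 4
params = {n: rng.standard_normal((d_k, C)) * 0.1
          for n in ("Wq_g", "Wk_g", "Wq_l", "Wk_l")}
params.update({n: rng.standard_normal((C, C)) * 0.1 for n in ("Wv_g", "Wv_l")})
params["W_out"] = rng.standard_normal((C, 2 * C)) * 0.1

f_g = rng.standard_normal((C, H, W))   # toy global features
f_l = rng.standard_normal((C, H, W))   # toy local features
fused, a1, a2 = dual_path_fusion(f_g, f_l, params)
print(fused.shape, a1.shape)   # prints: (8, 4, 4) (16, 16)
```

Note that the attention matrices are (H·W)×(H·W), so at full 512×512 resolution this step would only be practical on downsampled feature maps; the resolution at which the patent applies it is not stated.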

[0038] This cross-attention fusion mechanism achieves precise alignment and dynamic complementarity between global semantics and local details through bidirectional interaction. Global features act as queries to guide the enhancement of local features, optimizing local texture features under global structural constraints; conversely, local features act as queries to correct global features, integrating detailed information into global structural features for adjustment. This bidirectional enhancement mechanism effectively solves the problem of disconnect between global semantics and local details in traditional generation methods, enabling the generated pseudo-sonar images to maintain both the overall structural integrity of the shipwreck and realistic texture details.

[0039] Preferably, the perceptual loss is obtained by extracting the hull outline, deck structure, and bow and stern morphological features of the shipwreck from the pseudo-sonar image and the real sonar image through a pre-trained VGG19 network, constraining the generated image to remain consistent with the real image in terms of the semantic structure of the shipwreck. The perceptual loss is calculated by a formula (omitted in the source text) whose terms are: the real sonar image; $I_{fake}$, the generated pseudo-sonar image; the pre-trained VGG19 network; and the feature extraction function, which is the output of the 20th layer of VGG19.

[0040] This perceptual loss utilizes the image semantic knowledge learned by the pre-trained network to effectively guide the generated image to approximate the real image at a high-level semantic level, avoiding the semantic drift problem caused by relying solely on pixel-level loss.

[0041] Preferably, the SSIM loss enhances the texture and edge sharpness of the shipwreck's hull by measuring the brightness, contrast, and structural consistency of local regions, making the generated image closely resemble the microstructural features of the shipwreck in the real sonar image. The SSIM loss is calculated by a formula (omitted in the source text) whose terms are the number of samples in a training batch and the structural similarity index (SSIM).

[0042] This loss function focuses on the local structural information of the image, which can effectively improve the realism of texture details in the generated image, especially key features such as the hull texture and shadow edges of the shipwreck target.

[0043] Preferably, during model training, the pseudo-sonar image is input into the first discriminator for discrimination to obtain the first discrimination result, and the pseudo-optical image and the real optical image are input into the second discriminator for authenticity discrimination to obtain the second discrimination result. Based on the perceptual loss, SSIM loss, and the first and second discrimination results, the parameters of the first generator, the second generator, the first discriminator, and the second discriminator are alternately optimized until the model converges, and the trained sample generation model is obtained.

[0044] According to the above embodiments, the adversarial loss of the discriminator prompts the generator to produce more realistic images, while the perceptual loss and SSIM loss constrain the generated images from both semantic and detail dimensions. The joint optimization of multiple loss functions enables the generative model to maintain its adversarial generation advantages while taking into account the semantic consistency and detail fidelity of the images, ultimately generating high-quality side-scan sonar shipwreck target samples.

[0045] Referring to Figure 8 and Figure 9, to verify the effectiveness of the method of the present invention, this embodiment designs model training and testing experiments as well as target detection verification experiments.

[0046] I. Model Training and Testing Experiment

The model training and testing experiments compared several advanced models on the side-scan sonar image generation task: ResNet-CycleGAN, based on residual connections; VGG-CycleGAN, which uses a VGG network for feature extraction; Stable Diffusion, based on a diffusion model; CSLS-CycleGAN, which introduces a channel attention mechanism; and DPFI-CycleGAN, proposed in this invention. 800 images from the optical ship image conversion set were input into the DPFI-CycleGAN model for training, and the corresponding pseudo side-scan sonar images were generated by the first generator. The quality of the generated images was evaluated against 140 images from the real sonar image test set. The evaluation comprised qualitative and quantitative analysis. For the qualitative analysis, refer to Figure 8. The quantitative analysis used the Fréchet Inception Distance (FID), the Maximum Mean Discrepancy (MMD), and 1-NN classification accuracy; refer to Table 1.

Table 1. Accuracy evaluation results [table contents not reproduced in the source text]

The qualitative results show that in (a) the hull is almost pure black and its features cannot be made out; in (b) the shadow features and texture details of the hull are poor; and in (c) and (d) the main features of the hull closely resemble the background and are hard to distinguish. By contrast, the method proposed in this invention, shown in (e), renders both the main hull features and the background texture details more clearly than the other four models.

[0047] The quantitative results show that the pseudo-sonar images generated by this method have an FID of 76.41, an MMD of 0.110, and a 1-NN classification accuracy of 0.72. Compared with the existing methods, this method shows significant improvements in all indicators, indicating that the generated images are stylistically similar to real sonar images and have a similar feature distribution.
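To make the 1-NN indicator concrete, the sketch below implements the usual leave-one-out formulation: real and generated feature vectors are pooled, each sample is classified by the label of its nearest neighbour among the others, and the fraction classified correctly is reported. In this formulation a value near 0.5 means the two sets are hard to tell apart. The text does not say which features or distance the experiments used, so the Euclidean distance and the function name are assumptions.

```python
import math


def one_nn_accuracy(real_feats, fake_feats):
    """Leave-one-out 1-nearest-neighbour accuracy on pooled real/fake features.

    Each feature is a sequence of floats; real samples get label 1, fake 0.
    """
    samples = [(f, 1) for f in real_feats] + [(f, 0) for f in fake_feats]
    if len(samples) < 2:
        raise ValueError("need at least two samples in total")
    correct = 0
    for i, (feat_i, label_i) in enumerate(samples):
        # Nearest neighbour among all other samples (Euclidean distance).
        _, nearest_label = min(
            (
                (math.dist(feat_i, feat_j), label_j)
                for j, (feat_j, label_j) in enumerate(samples)
                if j != i
            ),
            key=lambda pair: pair[0],
        )
        if nearest_label == label_i:
            correct += 1
    return correct / len(samples)
```

When the two sets form well-separated clusters, every sample's nearest neighbour shares its label and the score is 1.0, which signals that generated images are easy to distinguish from real ones.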

[0048] II. Target Detection Verification Experiment

To verify the effectiveness of the generated samples for downstream target detection tasks, this embodiment further constructs two datasets for comparative experiments (refer to Table 2 and Figure 9). The D1 dataset contains 500 original sonar images. The D2 dataset contains 500 original sonar images and 500 pseudo-sonar images generated by the method of this invention.

[0049] Both datasets were divided into training, validation, and test sets in a 7:2:1 ratio. The YOLOv8n object detection model was used for training and testing, and the detection performance is compared in Table 2.

Table 2. Performance comparison of detection models on different datasets
Dataset | Precision | Recall | mAP@0.5 | mAP@0.5:0.95
D1 | 0.881 | 0.893 | 0.811 | 0.510
D2 | 0.940 | 0.893 | 0.823 | 0.541

From D1 to D2, precision rose from 0.881 to 0.940, mAP@0.5 from 0.811 to 0.823, and mAP@0.5:0.95 from 0.510 to 0.541, while recall remained at 0.893.
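The 7:2:1 division described above can be sketched as follows. The text does not say whether the split was random or how pseudo-sonar images were distributed across the three subsets, so the seeded shuffle and the function name `split_7_2_1` are assumptions made for illustration.

```python
import random


def split_7_2_1(items, seed=0):
    """Shuffle items and split them into train/validation/test at 7:2:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


# D1 has 500 images; D2 has 500 original plus 500 pseudo-sonar images.
d1_sizes = [len(part) for part in split_7_2_1(range(500))]
d2_sizes = [len(part) for part in split_7_2_1(range(1000))]
```

Under these assumptions D1 yields 350, 100, and 50 images and D2 yields 700, 200, and 100 images for training, validation, and testing respectively.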

[0050] In Figure 9, the first column (three images) shows the original sonar images, the second column shows the detection results for dataset D1, and the third column shows the detection results for dataset D2. The blue boxes are labeled with confidence scores; a higher confidence score indicates a more accurate detection. The confidence scores are 0.84, 0.88, and 0.77 for dataset D1 and 0.92, 0.93, and 0.95 for dataset D2, so the detection accuracy with dataset D2 is higher than with dataset D1. The experimental results show that adding the pseudo-sonar samples generated by this invention raises precision by 5.9 percentage points, mAP@0.5 by 1.2 percentage points, and mAP@0.5:0.95 by 3.1 percentage points, while recall is unchanged. This demonstrates that the pseudo-sonar samples generated by this invention can effectively expand the training data, improve shipwreck target detection performance, and provide high-quality data support for underwater target detection.
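The reported gains can be checked directly from the D1 and D2 figures given in the text. The short sketch below computes both the absolute gain in percentage points, which matches the quoted 5.9, 1.2, and 3.1, and the relative gain, which is a different quantity. The dictionary layout and function names are illustrative only.

```python
d1 = {"precision": 0.881, "recall": 0.893, "mAP@0.5": 0.811, "mAP@0.5:0.95": 0.510}
d2 = {"precision": 0.940, "recall": 0.893, "mAP@0.5": 0.823, "mAP@0.5:0.95": 0.541}


def point_gains(base, new):
    """Absolute change in percentage points, rounded to one decimal."""
    return {name: round((new[name] - base[name]) * 100, 1) for name in base}


def relative_gains(base, new):
    """Change relative to the baseline value, in percent."""
    return {name: round((new[name] - base[name]) / base[name] * 100, 1) for name in base}


gains = point_gains(d1, d2)
# gains == {"precision": 5.9, "recall": 0.0, "mAP@0.5": 1.2, "mAP@0.5:0.95": 3.1}
relative = relative_gains(d1, d2)
```

The quoted improvements are therefore absolute differences between the two rows; measured relative to the D1 baseline, the precision gain is about 6.7%.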

[0051] Embodiment 2

Referring to Figure 3, this disclosure provides a specific embodiment of a side-scan sonar image shipwreck target sample generation system for implementing the side-scan sonar image shipwreck target sample generation method of Embodiment 1. The system includes:

a preprocessing module, used to acquire optical ship images and side-scan sonar shipwreck images and to preprocess them to generate a sample set;

a model building module, used to build an improved generative adversarial network model, the model including a first generator for converting optical images into side-scan sonar images, a second generator for converting side-scan sonar images into optical images, and a first discriminator and a second discriminator corresponding to the two generators respectively;

a training module, used to input the sample set into the model for training, wherein the first generator extracts the hull topology and hull plate texture features of the shipwreck in the optical ship image and fuses them to generate a pseudo side-scan sonar image, the second generator converts the side-scan sonar shipwreck image into a pseudo optical image, and the first discriminator and the second discriminator respectively judge the authenticity of the generated images;

a loss calculation module, used to extract, based on the VGG network, the sonar echo distribution and shadow morphology features of the shipwreck in the pseudo side-scan sonar image and the real side-scan sonar image to obtain the perceptual loss, and to obtain the SSIM loss based on structural similarity;

a parameter optimization module, used to optimize the model parameters based on the perceptual loss and the SSIM loss to obtain the trained sample generation model; and

a sample generation module, used to input the optical ship image to be converted into the trained sample generation model and to output the side-scan sonar shipwreck target sample.

[0052] Each module of this system corresponds one-to-one with the method steps in Embodiment 1 and achieves the same functions and results. In actual deployment, the system can adopt a client-server architecture, with the preprocessing and sample generation modules deployed on the client side and the model building, training, loss calculation, and parameter optimization modules deployed on the server. The system provides sample generation services to users by calling the trained model.

[0053] Embodiment 3

A specific embodiment of this disclosure provides an electronic device including a processor and a memory, wherein the memory stores a computer program and the processor executes the program to implement the side-scan sonar image shipwreck target sample generation method of Embodiment 1.

[0054] It is understood that the electronic device can be a server, workstation, or personal computer, with a processor employing CPU or GPU for accelerated computing. The computer program stored in the memory contains instructions for implementing the above-described methods, and when the processor executes these instructions, it can complete the entire process from data preprocessing and model training to sample generation. Those skilled in the art will understand that the electronic device may also include necessary components such as input / output interfaces and communication interfaces for data input / output and remote communication.

[0055] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0056] In addition, the functional modules in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0057] The above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.

Claims

1. A method for generating shipwreck target samples from side-scan sonar images based on an improved CycleGAN model, characterized in that the method comprises:
acquiring optical ship images and sonar shipwreck images and preprocessing them to generate a sample set;
constructing an improved generative adversarial network model, the model including a first generator for forward transformation, a second generator for reverse transformation, and a first discriminator and a second discriminator corresponding to the two generators respectively;
inputting the sample set into the model for training, wherein the first generator extracts the hull topology and hull plate texture features of the shipwreck in the optical ship image and fuses them to generate a pseudo side-scan sonar image, the second generator converts the side-scan sonar shipwreck image into a pseudo optical image, and the first discriminator and the second discriminator respectively judge the authenticity of the generated images;
extracting high-level semantic features of the pseudo-sonar image and the real sonar image based on a VGG network to obtain a perceptual loss, and obtaining an SSIM loss based on structural similarity;
optimizing the model parameters based on the perceptual loss and the SSIM loss to obtain a trained sample generation model; and
inputting the optical ship image to be converted into the sample generation model and outputting the shipwreck target sample.

2. The method for generating shipwreck target samples from side-scan sonar images based on an improved CycleGAN model according to claim 1, characterized in that the first generator extracts the overall topological structure and aspect ratio features of the shipwreck from the optical ship image as global features $F_g \in \mathbb{R}^{C \times H \times W}$, and extracts the shipwreck's hull plate texture and edge detail features as local features $F_l \in \mathbb{R}^{C \times H \times W}$; the global features are extracted using a 5×5 convolution kernel, and the local features are extracted using a 3×3 convolution kernel.

3. The method for generating shipwreck target samples from side-scan sonar images based on the improved CycleGAN model according to claim 2, characterized in that the first generator fuses the global features and the local features through a cross-attention mechanism to obtain a fused feature map, and then upsamples the fused feature map to initially generate the pseudo-sonar image.

4. The method for generating shipwreck target samples from side-scan sonar images based on the improved CycleGAN model according to claim 3, characterized in that fusing the global features and the local features through a cross-attention mechanism specifically includes:
mapping the global features through a 1×1 convolution to a global query matrix $Q_g$, a global key matrix $K_g$, and a global value matrix $V_g$, and mapping the local features through a 1×1 convolution to a local query matrix $Q_l$, a local key matrix $K_l$, and a local value matrix $V_l$;
taking the global query matrix $Q_g$ as the semantic query, which focuses on the shipwreck's hull outline, aspect ratio, and overall orientation, and taking the local key matrix $K_l$ and the local value matrix $V_l$ as the detail key-value pair, calculating the first attention weight matrix as follows: [formula not reproduced in the source text];
performing a weighted summation over the local value matrix $V_l$ to obtain the enhanced local features under global guidance, as shown in the following formula: [formula not reproduced in the source text];
taking the local query matrix $Q_l$ as the detail query, which focuses on the deck structure, hull edges, and surface texture of the shipwreck, and taking the global key matrix $K_g$ and the global value matrix $V_g$ as the structural key-value pair, calculating the second attention weight matrix as follows: [formula not reproduced in the source text];
performing a weighted summation over the global value matrix $V_g$ to obtain the enhanced global features after local correction, as shown in the following formula: [formula not reproduced in the source text];
concatenating the enhanced local features and the enhanced global features along the channel dimension and compressing the result through a 1×1 convolution to output the fused feature map, as shown in the following formula: [formula not reproduced in the source text]; where Cat() denotes concatenation along the channel dimension, giving an output of size 2C×H×W, and Conv1x1 compresses the channels back to C.

5. The method for generating shipwreck target samples from side-scan sonar images based on an improved CycleGAN model according to claim 1, characterized in that the perceptual loss is obtained by using a pre-trained VGG19 network to extract the hull outline, deck structure, and bow and stern morphological features of the shipwreck from the pseudo-sonar image and the real sonar image, which constrains the generated image to remain consistent with the real image in the semantic structure of the shipwreck; the perceptual loss is calculated using the following formula: [formula not reproduced in the source text]; where $I_{real}$ is the real sonar image, $I_{fake}$ is the generated pseudo-sonar image, the feature extractor is the pre-trained VGG19 network, and the feature extraction function is the output of the 20th layer of VGG19.

6. The method for generating shipwreck target samples from side-scan sonar images based on an improved CycleGAN model according to claim 1, characterized in that the SSIM loss measures the luminance, contrast, and structural consistency of local regions to enhance the texture and edge sharpness of the shipwreck's hull, so that the generated image closely reproduces the fine structural features of the shipwreck in real sonar images; the SSIM loss is calculated using the following formula: [formula not reproduced in the source text]; in which one term denotes the number of samples in a training batch and SSIM denotes the structural similarity index.

7. The method for generating shipwreck target samples from side-scan sonar images based on an improved CycleGAN model according to claim 1, characterized in that, during model training, the pseudo-sonar image is input into the first discriminator for discrimination to obtain a first discrimination result, and the pseudo optical image and the real optical image are input into the second discriminator for authenticity discrimination to obtain a second discrimination result; based on the perceptual loss, the SSIM loss, and the first and second discrimination results, the parameters of the first generator, the second generator, the first discriminator, and the second discriminator are alternately optimized until the model converges, thereby obtaining the trained sample generation model.

8. A system for generating shipwreck target samples from side-scan sonar images, for implementing the method for generating shipwreck target samples from side-scan sonar images according to any one of claims 1-7, characterized in that the system includes:
a preprocessing module, used to acquire optical ship images and side-scan sonar shipwreck images and to preprocess them to generate a sample set;
a model building module, used to build an improved generative adversarial network model, the model including a first generator for converting optical images into side-scan sonar images, a second generator for converting side-scan sonar images into optical images, and a first discriminator and a second discriminator corresponding to the two generators respectively;
a training module, used to input the sample set into the model for training, wherein the first generator extracts the hull topology and hull plate texture features of the shipwreck in the optical ship image and fuses them to generate a pseudo side-scan sonar image, the second generator converts the side-scan sonar shipwreck image into a pseudo optical image, and the first discriminator and the second discriminator respectively judge the authenticity of the generated images;
a loss calculation module, used to extract, based on the VGG network, the sonar echo distribution and shadow morphology features of the shipwreck in the pseudo side-scan sonar image and the real side-scan sonar image to obtain the perceptual loss, and to obtain the SSIM loss based on structural similarity;
a parameter optimization module, used to optimize the model parameters based on the perceptual loss and the SSIM loss to obtain the trained sample generation model; and
a sample generation module, used to input the optical ship image to be converted into the trained sample generation model and to output the side-scan sonar shipwreck target sample.

9. An electronic device, characterized in that the device includes a processor and a memory, the memory storing a computer program, and the processor executing the program to implement the method for generating shipwreck target samples from side-scan sonar images according to any one of claims 1-7.
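For readers who want a concrete picture of the bidirectional cross-attention fusion recited in claim 4, the following is a minimal sketch and not the patented implementation. It treats each feature map as a list of H·W tokens with C channels. The softmax normalization and the 1/sqrt(C) scaling follow standard scaled dot-product attention and are assumptions, because the claim's formulas are not reproduced in this text. The learned 1×1 convolutions that produce the query, key, and value matrices are replaced by identity mappings, and the final compression uses fixed averaging weights; all function names are hypothetical.

```python
import math


def softmax(row):
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Assumed form: softmax(Q K^T / sqrt(C)) V, one row per token."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def conv1x1(tokens, weight):
    """A 1x1 convolution is a per-token linear map over channels."""
    return [[sum(w * x for w, x in zip(row, tok)) for row in weight]
            for tok in tokens]


def averaging_weight(channels):
    """Hypothetical C x 2C weights: average matching channels of both halves."""
    return [[0.5 if col % channels == c else 0.0 for col in range(2 * channels)]
            for c in range(channels)]


def fuse(global_tokens, local_tokens, weight=None):
    # Identity projections stand in for the learned 1x1 convolutions.
    q_g = k_g = v_g = global_tokens
    q_l = k_l = v_l = local_tokens
    enhanced_local = attend(q_g, k_l, v_l)   # global query over local key/value
    enhanced_global = attend(q_l, k_g, v_g)  # local query over global key/value
    # Channel-wise concatenation: each token grows from C to 2C channels.
    concatenated = [l + g for l, g in zip(enhanced_local, enhanced_global)]
    if weight is None:
        weight = averaging_weight(len(global_tokens[0]))
    return conv1x1(concatenated, weight)     # compress 2C back to C
```

Calling `fuse` on two token lists of equal shape returns a list of the same shape, mirroring the claim's path from two C×H×W inputs through a 2C×H×W concatenation back to C channels.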